3 June 2016 – “Say it out loud and the machines will know”. So said Pat Matkowski of Google Analytics two weeks ago at a Google workshop on search engines, machine learning, chatbots, user interfaces (UIs) and AI as they all move beyond the web and into the messy real world. And boy, what they are finding.
It’s a wacky world out there. There is a major battle in voice and AI brewing between Amazon, Apple, Google, and Microsoft. Google has Google Now. Microsoft has Cortana and Apple has Siri. Most recently, Amazon joined the fray with Alexa on their Echo device platform. So now onto the usual battle: getting people hooked on your platform.
Apple kicked it off, really, when it introduced Siri, although Google had Google Now in the works and followed suit pretty quickly. Interestingly, Microsoft had also been doing a lot of internal work on voice UIs and an AI engine, and took the next big step in making Cortana a voice interface embedded in both Windows Mobile and, more recently, the Windows desktop. Amazon did not have a mobile or desktop play, so they had to build their own hardware to deliver voice/AI, which they did with the Echo.
I started delving into the role of AI in UIs two years ago when I was introduced to a unit of the Microsoft Natural Language Processing group as part of an e-discovery project; you’ll find similar units (at various levels of sophistication and intensity) in all of the corporate players I noted above.
And they all had the same mission: if the technology could ever mature and become powerful enough to deliver a more conversational and contextual approach to the man-machine interface … well, BANG! It could be the most viable way people interact with technology. And now … thanks to more powerful processors and advances in natural language interfaces, neural networks, and deep learning techniques … AI-based voice interfaces are set to become one of the most important ways all of us will interact with technology in the future.
Wow. Amazon (my book guy) has artificial intelligence. More importantly, it has a gizmo which people seem to be buying. Google has fabulous artificial intelligence. The Google I/O conference was a litany of smart software choir members. Now Facebook is ramping up (read “Facebook Is Using ‘Near-Human’ AI to Muscle in on Google’s Home Turf”) to make life tough for the Alphabet kids.
And wouldn’t you know it. IBM is in the game as well! Read “IBM Is Building Cognitive AI to Impact Every Decision Made”, which I assume means decisions at Amazon, Facebook, Google, and the other outfits in the artificial intelligence hyperbole parade. I love the word “every”. According to the write up:
“If it’s digital, it’ll be cognitive,” explained IBM CEO Ginni Rometty in a wide-ranging discussion with Recode’s Kara Swisher on Wednesday during the annual Code Conference.
Another sweeping categorical affirmative. I love it. But, hey: this is the wild and crazy world of the really, really, REALLY Big Big Big Things: Big Data, Big Predictive Analytics, Big Visualization. You other “Big Things”: step aside!
Oh, yeah. Still a long way to go but following the oh-so-typical trajectory of every technological breakthrough: it takes twice as long as we expected, and half as long as we prepared for.
Oops. I digress. Shortly I will have a 6-part AI series on the issues and concepts above. Back to my prison story …
As you can surmise, every call into or out of U.S. prisons is recorded. It can be important to know what’s being said, because some inmates use phones to conduct illegal business on the outside. But a problem: the recordings generate huge quantities of audio that are prohibitively expensive to monitor with human ears. To the rescue: a machine-learning system developed by London firm Intelligent Voice to listen in on the thousands of hours of recordings generated every month.
Note: most of my e-discovery readers know this company well. They developed “Phonetic Search”, which deconstructs your search term and pattern matches “phonemes” – the basic sound units of words – to find approximate matches. It is basically an e-discovery system that provides thematic and other analysis when aggregating text information. Intelligent Voice is just one of scores of new technology companies in the e-discovery ecosystem that get overlooked because we are so damn fixated on predictive coding – which, if you grasp the introductory paragraphs of this post, you can surmise will eventually be a dinosaur. More to come in my AI series.
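For the technically curious, here is a toy sketch (in Python) of the general phonetic-search idea – not Intelligent Voice’s actual code – in which a search term and a candidate phrase are both mapped to phoneme sequences and scored by edit distance, so near-homophones still match. The mini phoneme dictionary is made up for illustration:

```python
# Toy phoneme-based approximate matching (illustration only).
# Hypothetical mini dictionary; real systems use something like CMUdict
# or a grapheme-to-phoneme model.
PHONEMES = {
    "three": ["TH", "R", "IY"],
    "free":  ["F", "R", "IY"],
    "way":   ["W", "EY"],
    "weigh": ["W", "EY"],
}

def to_phonemes(phrase):
    """Flatten a phrase into one phoneme sequence (crude letter fallback)."""
    seq = []
    for word in phrase.lower().split():
        seq.extend(PHONEMES.get(word, list(word.upper())))
    return seq

def edit_distance(a, b):
    """Classic Levenshtein distance over phoneme sequences."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

def phonetic_score(query, candidate):
    """1.0 means the phrases sound identical; 0.0 means nothing matches."""
    q, c = to_phonemes(query), to_phonemes(candidate)
    return 1 - edit_distance(q, c) / max(len(q), len(c))

print(phonetic_score("three way", "free weigh"))  # high: sounds alike
print(phonetic_score("three way", "lawyer"))      # low: sounds nothing alike
```

Real phonetic search engines typically index the phoneme lattice of the audio itself rather than comparing text strings, but the matching idea is the same.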
The Intelligent Voice software was beta tested at a U.S. prison, and its CEO, Nigel Cannings, captured it beautifully: “No one at the prison spotted code words until our software started churning through calls“.
So how does it work? The software saw the phrase “three-way” cropping up again and again in the calls – it was one of the most common non-trivial words or phrases used. At first, prison officials were surprised by the overwhelming popularity of what they thought was a sexual reference.
But then they worked out it was … a code! Clever chaps. Prisoners are allowed to call only a few previously agreed numbers. So if an inmate wanted to speak to someone on a number not on the list, they would call their friends or parents and ask for a “three-way” with the person they really wanted to talk to – “three way” being telephone speak for dialing a third party into the call. No one running the phone surveillance at the prison spotted the code until the software started churning through the recordings.
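To make that concrete, here is a hedged little sketch of the kind of frequency analysis at work – an illustration, not Intelligent Voice’s pipeline – assuming the calls have already been transcribed to text: count word pairs, throw away the trivial stop-word combinations, and rank what is left:

```python
# Surface unusual, frequently repeated phrases from call transcripts
# (an illustration of the idea, not Intelligent Voice's pipeline).
from collections import Counter
import re

STOP_WORDS = {"the", "a", "and", "to", "of", "i", "you", "it", "is", "on", "with"}

def non_trivial_bigrams(transcripts):
    counts = Counter()
    for text in transcripts:
        words = re.findall(r"[a-z']+", text.lower())
        for w1, w2 in zip(words, words[1:]):
            if w1 not in STOP_WORDS and w2 not in STOP_WORDS:
                counts[f"{w1} {w2}"] += 1
    return counts

# Hypothetical transcript snippets standing in for thousands of hours of calls.
calls = [
    "can you set up a three way with marcus tonight",
    "mom i need a three way call to his number",
    "tell her to do the three way like last time",
]
for phrase, n in non_trivial_bigrams(calls).most_common(5):
    print(n, phrase)   # "three way" floats to the top
```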
This story illustrates the speed and scale of analysis that machine-learning algorithms are bringing to the world. Intelligent Voice originally developed the software for use by UK banks, which must record their calls to comply with industry regulations. As with prisons, this generates a vast amount of audio data that is hard to search through.
Cannings has been interviewed many times, and in a recent piece in Nature magazine he noted that the breakthrough came when he decided to see what would happen if he pointed a machine-learning system at the waveform of the voice data — its pattern of spikes and troughs — rather than the audio recording directly. It worked brilliantly. Training his system on this visual representation let him harness powerful existing techniques designed for image classification. He built his dialect classification system on pictures of the human voice.
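Roughly, the recipe is: turn each clip into a spectrogram (a picture of those spikes and troughs over time) and hand that picture to an ordinary image-style classifier. Here is a minimal sketch of the idea; the file names and dialect labels are invented, and the simple classifier at the end is a stand-in for the convolutional image models a real system would more likely use:

```python
# Classify "pictures of the human voice": spectrogram -> image-style classifier.
# File names, labels and the classifier choice are illustrative assumptions.
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram
from sklearn.linear_model import LogisticRegression

def clip_to_image(path, n_frames=200):
    """Read a WAV clip and return a fixed-size log-spectrogram, flattened."""
    fs, samples = wavfile.read(path)
    if samples.ndim > 1:                       # mix stereo down to mono
        samples = samples.mean(axis=1)
    _, _, sxx = spectrogram(samples.astype(float), fs=fs)
    sxx = np.log1p(sxx)[:, :n_frames]          # log-compress, crop in time
    if sxx.shape[1] < n_frames:                # pad short clips
        sxx = np.pad(sxx, ((0, 0), (0, n_frames - sxx.shape[1])))
    return sxx.ravel()

# Hypothetical labelled training clips.
clips = [("call_001.wav", "us_south"), ("call_002.wav", "us_northeast")]
X = np.stack([clip_to_image(path) for path, _ in clips])
y = [label for _, label in clips]

clf = LogisticRegression(max_iter=1000).fit(X, y)       # stand-in for a CNN
print(clf.predict([clip_to_image("call_003.wav")]))     # label a new call
```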
There is no easy way to classify dialects. One of the things I learned at the Microsoft Natural Language Processing group was that you must first select the criteria on which a classification is to be based. Sometimes dialect classification is based strictly on geography; sometimes it is based strictly on the structural features – lexicon, phonology, morphology – of the dialects. Tricky stuff.
The trick with Cannings was that he let his system create its own models for recognizing speech patterns and accents, and those models turned out to be as good as the best hand-coded ones around – models built by dialect and computer science experts. On their first run they were getting something like 88 per cent accuracy. The software then taught itself to transcribe speech by using recordings of U.S. congressional hearings, matching up the audio with the transcripts.
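One common recipe for “matching up the audio with the transcripts” is a connectionist temporal classification (CTC) loss, which lets the network work out the audio-to-text alignment on its own. I am assuming that style of training here purely for illustration; the tiny model, shapes and vocabulary below are toy stand-ins, not Cannings’ system:

```python
# Toy sketch: learn to transcribe from (audio features, transcript) pairs
# with a CTC loss, which learns the alignment itself. Illustrative only.
import torch
import torch.nn as nn

VOCAB = ["<blank>"] + list("abcdefghijklmnopqrstuvwxyz ")
BLANK = 0

class TinyTranscriber(nn.Module):
    def __init__(self, n_features=80, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, len(VOCAB))

    def forward(self, feats):                  # feats: (batch, time, n_features)
        h, _ = self.rnn(feats)
        return self.out(h).log_softmax(-1)     # (batch, time, vocab)

model = TinyTranscriber()
ctc = nn.CTCLoss(blank=BLANK)
optim = torch.optim.Adam(model.parameters(), lr=1e-3)

# Fake batch: 4 clips of 300 spectrogram frames each, plus encoded transcripts.
feats = torch.randn(4, 300, 80)
targets = torch.randint(1, len(VOCAB), (4, 40))           # character indices
input_lens = torch.full((4,), 300, dtype=torch.long)
target_lens = torch.full((4,), 40, dtype=torch.long)

log_probs = model(feats).transpose(0, 1)                  # CTC wants (time, batch, vocab)
loss = ctc(log_probs, targets, input_lens, target_lens)   # no alignment supplied
loss.backward()
optim.step()
print(float(loss))
```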
The power of machines that can listen and watch is not that they can do better than human ears or eyes. In fact, they perform much worse – especially when confronted with data from the real world. Their power, like all applications of computation, lies in speed, scale and the relative cheapness of processing.
Yes, the cost. It works out at about 5 cents per hour of audio, while human transcription can run 1,000 times that – roughly $50 per hour. In fact, an automated transcription service is something Intelligent Voice is considering, but for now they are focusing on search.
As I have noted, most large tech companies are developing neural networks for understanding speech, opening up data sets that were previously difficult, or impossible, to search. Voice-activated virtual assistants like Google Now, Apple’s Siri, Amazon’s Alexa and Microsoft’s Cortana must also make sense of the quirks of human speech. And Facebook recently announced that it has repurposed its image-recognition software to draw maps based on satellite photos of Earth. These maps are of lower quality than those produced by humans but, again, the advantage is speed. Facebook’s system can map the entire land surface of the planet – every road and house – in just a few hours.
A taste of tech to come …
To make this more fun: this entire post was first written in draft form by an AI software called “Hunch” which I am beta testing for a buddy/fellow student of mine in my AI program at ETH Zurich. If you have used anything like the Cubes App … which collects/indexes attachments, photos, videos, documents and links from all of your email … you know what I am talking about.
It works in the background of my web browser to track, analyze and store all the web pages I use to research a post. It forgets nothing, keeps everything. It stores full web page captures of every site I visit, and every search I perform. And it creates a subject index. And I can even integrate past posts/articles I have written.
And I do not need to switch tools to take notes or snap screenshots. It is all embedded.
So even if I travel “back in time” to review web pages or social media accounts that have been suspended or removed, or I get a “404 error”, all my research/data is stored, tracked and accessible. And it all lives on my laptop, so no sensitive data is being stored in that magical cloud somewhere.
Then I program it a bit (sorry, I can’t divulge details) and it punches out a first draft, noting its sources. Full disclosure: right now it does require a LOT of editing. But depending on how well I specify what to pull/collect, it is an amazing time saver, if only as an outline creator.
Oh, the places you’ll go! There is fun to be done!
There are points to be scored.
There are games to be won.
You have brains in your head.
You have feet in your shoes.
You can steer yourself
any direction you choose.
For you could be the winning-est winner of all,
as the whole wide world watches what you do with your ball.
– Dr Seuss