A chat with Google about addressing an obstacle common across the natural language processing world: understanding the ever-changing nature of language

 

23 October 2020 (Paris, France) – It’s hard enough for humans to keep up with what the cool kids are saying – so imagine how much tougher it is for AI. Google has been having a series of Zoom fests on how the Google Analytics team and the Google language teams (there are many) are addressing an obstacle common across the natural language processing (NLP) world: understanding the ever-changing nature of language. Getting this right is critical to making Google Assistant (Google’s artificial intelligence–powered virtual assistant, primarily available on mobile and smart home devices) work well.

Lost for words 

Slang, cultural references, accents, different dialects … attempting to juggle all of these things is like playing whack-a-mole. Machine learning models have to simultaneously play catch-up and stay ahead of the curve. One secret: Google is training one of its models on top music charts, which belt out new slang at full volume.

But as for learning an entirely new language? Before any launch, the team must decide whether the model grasps five overall principles: linguistic accuracy; cultural relevance; how to sound natural; cultural norms; and voice, tone, and slang. We examined a case study: when “Despacito” blew up in 2017, the Assistant’s AI model was not well prepared to handle two languages at once. But increased user interest helped accelerate the model’s learning of how to handle multilingual user queries.

Custom-made 

When learning a new language, customs and cultural references are just as important as words, tense, and sentence structure. That’s why the Google teams test any new launch in a company-wide internal program first – and why Google hired people from the 90+ countries where Assistant is available, who can help catch bugs or issues that non-native language speakers could miss. Google employs 3,000+ people from those countries for this task.

Interesting note: Google Assistant largely understood the names of American sports stars, but a European team member realized it couldn’t recognize an international table tennis champion. It was simply a question that most team members in the U.S. had never tried until that point. This realization prompted more effort to understand uncommon words. They now have 2,000 people focused solely on international sports names.

The obstacles faced by Google Assistant are common to virtually every consumer-facing NLP product (think: Amazon’s Alexa, Apple’s Siri, Microsoft’s Cortana, and more). But there’s a straightforward solution: just hire a diverse team to build and maintain your NLP product/service. Surely everybody has the budget to employ 5,000+ people to focus on that? 🙂

For a related story:  “Antitrust case? What antitrust case? Privacy issues? What privacy issues?” Google just powers on.
