Architecting a Chatbot for Language Recognition
By Insight Editor / 2 Apr 2019, updated on 16 May 2019 / Topics: Customer experience, Artificial Intelligence (AI)
Natural conversations, by nature, allow for the flexibility of switching language midconversation. In fact, for multilingual individuals such as my brothers and me, changing between various languages allows us to emphasize certain concepts without explicitly stating so.
We generally speak Polish (or English if our wives are present), switch to English to fill in words we don’t know in Polish, and use Spanish to provide emphasis or a callback to something that happened in our childhood in Puerto Rico.
Chatbots, in their current state without Artificial General Intelligence (AGI), don’t allow for the nuance of language choice. However, given the state of language recognition and machine translation, we can implement a somewhat intelligent multilingual chatbot.
In this article, I’ll outline the general automatic approach. I’ll also highlight the downsides of this approach and list the problems that need to be solved when creating a production-quality multi-language chatbot experience.
I call the fully automated approach naive. It’s the approach most projects start with: it’s fairly easy to put in place and moves a project into the multilingual realm quickly. But it comes with its own set of challenges. Before I dive into those, let’s review the approach.
Assuming we have a working English natural language model and English content, the bot can implement multilingual conversations as follows:
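The flow can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual sequence of detecting the language, translating the input to English, running the English model, and translating the reply back. The `detect`, `translate`, and `answer_in_english` callables are hypothetical stand-ins for whatever language detection, machine translation, and NLU services a project uses; the toy implementations exist only so the example runs without an external service.

```python
from typing import Callable


def make_naive_bot(
    detect: Callable[[str], str],
    translate: Callable[[str, str, str], str],
    answer_in_english: Callable[[str], str],
) -> Callable[[str], str]:
    """Wrap an English-only bot with machine translation on both sides."""

    def reply(utterance: str) -> str:
        lang = detect(utterance)                       # 1. detect the user's language
        if lang == "en":
            return answer_in_english(utterance)        # English needs no translation
        english_in = translate(utterance, lang, "en")  # 2. translate the input to English
        english_out = answer_in_english(english_in)    # 3. English NLU and English content
        return translate(english_out, "en", lang)      # 4. translate the reply back

    return reply


# Toy stand-ins (assumptions for illustration, not real services).
_PHRASES = {
    ("hola", "es", "en"): "hello",
    ("hello there", "en", "es"): "hola",
}


def toy_detect(text: str) -> str:
    return "es" if text == "hola" else "en"


def toy_translate(text: str, source: str, target: str) -> str:
    # Unknown phrases are returned unchanged.
    return _PHRASES.get((text, source, target), text)


def toy_english_bot(text: str) -> str:
    return "hello there" if text == "hello" else "sorry, I did not get that"


bot = make_naive_bot(toy_detect, toy_translate, toy_english_bot)
print(bot("hola"))
print(bot("hello"))
```

Every non-English message passes through machine translation twice, once on the way in and once on the way out, so translation quality shapes both what the bot understands and what it says.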
This approach works, but the conversation quality is off. Although machine translation has improved by leaps and bounds, cases still exist in which the conversation feels stiff and culturally disconnected. This approach suffers in three main areas:
A more mature approach to a multilingual chatbot involves three key considerations. They vary based on risk aversion, content quality and available resources. Let’s explore options for each item as we progress through them.
Ideally, I like my chatbot solutions to have an NLU model for each supported language. The cost of creating and maintaining these models can be significant. For multilanguage solutions, I always ask for the highest-priority languages a client would like to support.
If an enterprise can support 90% of employees by getting two languages working well, then we can limit the NLU scope to those two languages — and use the automatic approach for any other languages.
In many of my projects, I use Microsoft’s Language Understanding Intelligent Service (LUIS). I might create one model for English and another for Simplified Chinese. That way, Chinese users don’t suffer the nuanced translation tax.
Project stakeholders also need to decide whether the chatbot should support an arbitrary number of languages or limit valid inputs to languages with an NLU model. If it supports arbitrary languages, the automatic approach above is applied to the languages that have no native model.
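One way to picture that decision is as a small router in front of the NLU models. The sketch below is only an illustration: `Router`, `intent_for`, and the toy models are hypothetical names, and a real project would call a service such as LUIS where the toy functions are.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Nlu = Callable[[str], str]  # utterance -> intent name


@dataclass
class Router:
    native_models: Dict[str, Nlu]              # language code -> dedicated NLU model
    pivot_language: str                        # language used by the automatic path
    translate: Callable[[str, str, str], str]  # (text, source, target) -> text
    allow_arbitrary_languages: bool = True     # the stakeholder decision

    def intent_for(self, utterance: str, lang: str) -> Optional[str]:
        model = self.native_models.get(lang)
        if model is not None:
            return model(utterance)            # native model, no translation involved
        if not self.allow_arbitrary_languages:
            return None                        # caller can ask the user to switch languages
        # Automatic path: translate into the pivot language, then use its model.
        pivot_text = self.translate(utterance, lang, self.pivot_language)
        return self.native_models[self.pivot_language](pivot_text)


# Toy models and translator (assumptions for illustration only).
def english_nlu(text: str) -> str:
    return "greeting" if "hello" in text.lower() else "unknown"


def chinese_nlu(text: str) -> str:
    return "greeting" if "你好" in text else "unknown"


def toy_translate(text: str, source: str, target: str) -> str:
    return {"bonjour": "hello"}.get(text, text)


models = {"en": english_nlu, "zh-Hans": chinese_nlu}
open_router = Router(models, "en", toy_translate)
strict_router = Router(models, "en", toy_translate, allow_arbitrary_languages=False)

print(open_router.intent_for("你好", "zh-Hans"))   # handled by the Chinese model
print(open_router.intent_for("bonjour", "fr"))    # translated, then the English model
print(strict_router.intent_for("bonjour", "fr"))  # None: language not supported
```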
In ambiguous language detection, short utterances may be valid in multiple languages. Further complicating the matter, the translation APIs, such as those from Microsoft and Google, don’t return options or confidence levels.
There are numerous approaches to resolving the ambiguous language problem. Two possibilities are 1) run a concatenation of the last N user utterances through the language recognition engine, or 2) maintain a list of ambiguous words that we ignore for language detection and use the user’s last utterance language instead.
Both are different flavors of simply considering the user’s language preference as a conversation-level rather than message-level property. If we’re interested in supporting switching between languages midconversation, a mix of both approaches works well.
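A mix of the two ideas might look like the sketch below. It is a simplified illustration under stated assumptions: `ConversationLanguage` is a hypothetical helper, `detect` stands in for a real language recognition call, and the ambiguous word list and the toy detector are invented for the example.

```python
from collections import deque
from typing import Callable, Deque, Iterable


class ConversationLanguage:
    """Track language as a conversation-level property."""

    def __init__(
        self,
        detect: Callable[[str], str],
        ambiguous_words: Iterable[str],
        window: int = 3,
        default: str = "en",
    ) -> None:
        self._detect = detect
        self._ambiguous = {w.lower() for w in ambiguous_words}
        self._recent: Deque[str] = deque(maxlen=window)  # last N usable utterances
        self.language = default

    def update(self, utterance: str) -> str:
        words = [w.strip(".,!?") for w in utterance.lower().split()]
        # Empty or purely ambiguous input: keep the language we already have.
        if all(w in self._ambiguous for w in words):
            return self.language
        # Otherwise detect on the concatenation of the last N utterances.
        self._recent.append(utterance)
        self.language = self._detect(" ".join(self._recent))
        return self.language


def toy_detect(text: str) -> str:
    """Toy detector: 'es' when at least half the words are known Spanish words."""
    spanish = {"hola", "gracias", "necesito", "ayuda", "por", "favor"}
    words = [w.strip(".,!?") for w in text.lower().split()]
    hits = sum(w in spanish for w in words)
    return "es" if words and hits * 2 >= len(words) else "en"


tracker = ConversationLanguage(toy_detect, ambiguous_words={"no", "ok", "taxi"})
print(tracker.update("Necesito ayuda por favor"))
print(tracker.update("no"))
print(tracker.update("I would like to speak with a human agent"))
```

The window size is the knob to watch. A larger window makes detection steadier on short messages but slows the bot's reaction when the user really does switch languages.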
I encourage clients to maintain the precise localized content sent by the chatbot. This is especially true for public consumer or regulated industry use cases in which any mistranslated content might result in fines or negative brand attention.
This, again, is a risk versus effort calculation that needs to be performed by the right stakeholders. The necessity of controlling localized content and the effort involved in it typically weigh on whether the bot supports arbitrary languages or not.
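In code, that calculation often ends up as a content lookup with an explicit fallback policy. The following is a minimal sketch, assuming a hypothetical `ContentStore` that holds reviewed translations per message key and a `translate` callable that stands in for a machine translation service.

```python
from typing import Callable, Dict, Tuple


class ContentStore:
    """Serve managed localized content, with optional machine translation fallback."""

    def __init__(
        self,
        managed: Dict[str, Dict[str, str]],        # message key -> {language: text}
        translate: Callable[[str, str, str], str], # (text, source, target) -> text
        source_language: str = "en",
        allow_machine_translation: bool = True,
    ) -> None:
        self._managed = managed
        self._translate = translate
        self._source = source_language
        self._allow_mt = allow_machine_translation

    def get(self, key: str, lang: str) -> Tuple[str, str]:
        """Return (text, origin); origin is 'managed', 'machine', or 'source'."""
        by_language = self._managed[key]
        if lang in by_language:
            return by_language[lang], "managed"    # reviewed, localized copy
        source_text = by_language[self._source]
        if self._allow_mt:
            return self._translate(source_text, self._source, lang), "machine"
        return source_text, "source"               # no unreviewed text is ever sent


# Toy data and translator (assumptions for illustration only).
content = {
    "greeting": {"en": "Hello, how can I help?", "zh-Hans": "您好，有什么可以帮您？"},
}


def toy_translate(text: str, source: str, target: str) -> str:
    return f"[{target}] {text}"


flexible = ContentStore(content, toy_translate)
regulated = ContentStore(content, toy_translate, allow_machine_translation=False)

print(flexible.get("greeting", "zh-Hans"))
print(flexible.get("greeting", "fr"))
print(regulated.get("greeting", "fr"))
```

Returning the origin alongside the text lets the bot log or flag machine-translated replies, which is useful when stakeholders want to see how often unreviewed content reaches users.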
Based on all of the above, a true approach to a multilingual chatbot experience might look like this:
The bot in this case:
The managed models and paths to automatic translation add nuance to the automatic approach. If we imagine a spectrum where on one end we find the fully automatic approach and on the other end the fully managed approach, all implementations fall somewhere within this range.
Clients in regulated industries and heavily branded scenarios will lean toward the fully managed end, and clients with internal or less precise use cases will typically find the automatic approach more effective and economical.
The hybrid managed/automatic implementation does take some effort but results in the best conversational chatbot experience.