Engaged customers drive companies in both business-to-business (B2B) and business-to-consumer (B2C) landscapes. In return, they expect consistent service across channels and responsive, real-time support. Providing both has become difficult due to rising costs and a shortage of well-trained service agents, which is pushing companies to adopt conversational artificial intelligence (AI) to improve customer experience.
Conversational AI refers to virtual agents and chatbots that mimic human interactions and can engage human beings in conversation. A quick look around confirms that these tools are fast becoming a way of life. From asking Alexa to “find the nearest restaurant” to asking Siri to “create a reminder”, we regularly interact with virtual assistants and chatbots that answer queries, resolve complaints, make reservations, and much more.
Creating these virtual assistants requires substantial effort, but awareness of the key challenges makes the development process simpler. I have first-hand experience creating a mature chatbot for a recruitment platform and use it as a reference point to explain the problems and their solutions.
Conversational AI Challenges in the Development Process
To build a conversational AI chatbot, developers can use frameworks like RASA, Amazon’s Lex, or Google’s Dialogflow. RASA is often preferred when custom changes are planned or the bot is headed for a mature stage, since it is an open-source framework. The other frameworks are also suitable starting points.
These challenges are easier to understand when classified according to the three major components of a chatbot:
- Natural Language Understanding (NLU) is the ability of a bot to comprehend human dialogue. It performs intent classification, entity extraction, and response retrieval.
- Dialogue Manager is responsible for deciding the set of actions to perform based on the current and previous user inputs. It takes intents and entities as input (along with the context of the conversation so far) and identifies the next response.
- Natural Language Generation (NLG) is the process of generating written or spoken sentences from given data. It frames the response, which is then presented to the user.
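To make the division of labor concrete, here is a minimal sketch of a single chatbot turn in plain Python. Every function and template name is invented for illustration; real frameworks like RASA structure these components differently.

```python
# Minimal sketch of one chatbot turn: NLU -> Dialogue Manager -> NLG.
# All names below are illustrative, not any framework's real API.

def nlu_parse(text: str):
    """Toy NLU: classify the intent and extract entities with keyword rules."""
    lowered = text.lower()
    if lowered.startswith("hi"):
        return "greet", {}
    if "students from" in lowered:
        return "search_students", {"place": text.split("from")[-1].strip()}
    return "out_of_scope", {}

def choose_action(intent: str, state: dict) -> str:
    """Toy dialogue manager: map the intent (plus context) to the next action."""
    return {"greet": "utter_greet",
            "search_students": "action_search"}.get(intent, "utter_fallback")

def render_response(action: str, state: dict) -> str:
    """Toy NLG: fill a response template with conversation data."""
    templates = {
        "utter_greet": "Hello! How can I help you?",
        "action_search": "Searching students from {place}...",
        "utter_fallback": "Sorry, I didn't get that. Could you rephrase?",
    }
    return templates[action].format(**state.get("entities", {}))

def handle_message(text: str, state: dict) -> str:
    intent, entities = nlu_parse(text)                 # 1. NLU
    state.setdefault("entities", {}).update(entities)  # keep conversation context
    action = choose_action(intent, state)              # 2. Dialogue Manager
    return render_response(action, state)              # 3. NLG

print(handle_message("Name of students from IIT Delhi", {}))
```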

1. Drawbacks in NLU
a. Insufficient data
When developers replace FAQs or other support systems with a chatbot, they inherit a decent amount of training data. The same doesn’t happen when they create the bot from scratch. In such cases, developers have to generate training data synthetically.
What to do:
A template-based data generator can produce a decent amount of user queries for training. Once the chatbot is ready, project owners can expose it to a limited number of users to enrich the training data and improve the bot over time.
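As a rough illustration, such a generator can be as simple as filling slot values into query templates. The templates and slot values below are invented for a recruitment-domain example:

```python
import itertools
import random

# Invented templates and slot values for a recruitment-domain bot.
TEMPLATES = [
    "show me {role} openings in {city}",
    "are there any {role} jobs in {city}?",
    "find {role} positions located in {city}",
]
SLOTS = {
    "role": ["data scientist", "backend developer", "QA engineer"],
    "city": ["Delhi", "Bengaluru", "Mumbai"],
}

def generate_training_examples(n: int) -> list:
    """Fill every template with every slot combination, then sample n queries."""
    combos = itertools.product(SLOTS["role"], SLOTS["city"])
    examples = [t.format(role=r, city=c) for r, c in combos for t in TEMPLATES]
    random.shuffle(examples)
    return examples[:n]

for example in generate_training_examples(5):
    print(example)
```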
b. Unsuitable model selection
Appropriate model selection and training data are crucial to getting the best intent classification and entity extraction results. Developers usually train chatbots in a specific language and domain, and most of the available pre-trained models are domain-specific and trained in a single language.
There can also be cases of mixed languages, where polyglot users enter queries that blend languages. For instance, in a French-dominated region, people may write English mixed with French.
What to do:
Using models trained in multiple languages can reduce the problem. A pre-trained model like LaBSE (language-agnostic BERT sentence embedding) can be helpful in such cases. LaBSE is trained on 109 languages and maps sentences with similar meanings, in any of those languages, to nearby embeddings. In our project, it worked really well.
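A quick way to try this, assuming the sentence-transformers package and its hosted LaBSE checkpoint are available, is to check that a mixed-language query lands close to its English counterpart in embedding space:

```python
# Requires: pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

# LaBSE embeds sentences from different languages into one shared space.
model = SentenceTransformer("sentence-transformers/LaBSE")

english_query = "show me open positions"
mixed_query = "montrez-moi les open positions"  # French/English mix

embeddings = model.encode([english_query, mixed_query], convert_to_tensor=True)
similarity = util.cos_sim(embeddings[0], embeddings[1]).item()
print(f"cosine similarity: {similarity:.2f}")  # similar meaning -> high score
```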
c. Improper entity extraction
Chatbots require entities to identify what kind of data the user is searching for. These entities include time, place, person, item, date, etc. However, bots can fail to identify an entity from natural language in cases like these:
- Same context but different entity types. For instance, a bot can mistake an institute for a place when a user types “Name of students from IIT Delhi” and then “Name of students from Bengaluru”.
- Scenarios where entities are mispredicted with low confidence. For example, a bot can identify IIT Delhi as a city, but with low confidence.
- Partial entity extraction by the machine learning model. If a user types “students from IIT Delhi”, the model may identify only “IIT” as an entity instead of “IIT Delhi”.
- Single-word inputs with no context can confuse machine learning models. For example, a word like “Rishikesh” can refer to a person as well as a city.
What to do:
Adding more training examples could be a solution, but there is a limit beyond which adding more doesn’t help, and it’s an endless process. Another solution is to define regex patterns over pre-defined word lists to extract entities with a known set of possible values, like city, country, etc., as in the sketch below.
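For entities with a closed set of values, the pattern can be built straight from that list. A small sketch (the value lists are invented; in practice they would come from a database or configuration):

```python
import re

# Invented closed sets of entity values.
KNOWN_CITIES = ["Delhi", "Bengaluru", "Mumbai"]
KNOWN_INSTITUTES = ["IIT Delhi", "IIT Bombay", "NIT Trichy"]

def build_pattern(values):
    # Longest values first so "IIT Delhi" wins over a bare "Delhi".
    ordered = sorted(values, key=len, reverse=True)
    return re.compile(r"\b(" + "|".join(map(re.escape, ordered)) + r")\b",
                      re.IGNORECASE)

INSTITUTE_RE = build_pattern(KNOWN_INSTITUTES)
CITY_RE = build_pattern(KNOWN_CITIES)

def extract_entities(text: str) -> dict:
    entities = {}
    if m := INSTITUTE_RE.search(text):
        entities["institute"] = m.group(1)
    # Only tag a city if it isn't part of an already-matched institute name.
    if (m := CITY_RE.search(text)) and m.group(1) not in entities.get("institute", ""):
        entities["city"] = m.group(1)
    return entities

print(extract_entities("Name of students from IIT Delhi"))  # {'institute': 'IIT Delhi'}
print(extract_entities("Name of students from Bengaluru"))  # {'city': 'Bengaluru'}
```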
A model reports lower confidence whenever it is not sure about an entity prediction. Developers can use this as a trigger to call a custom component that rectifies the low-confidence entity. Consider the earlier example: if IIT Delhi is predicted as a city with low confidence, the component can search for the value in the database. After failing to find it in the City table, it proceeds to the other tables and eventually finds it in the Institute table, correcting the entity.
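A sketch of that correction step, with an in-memory stand-in for the database (the table names and the 0.6 confidence threshold are assumptions for the example):

```python
# In-memory stand-in for database tables; real code would run DB queries.
TABLES = {
    "City": {"Delhi", "Bengaluru", "Mumbai"},
    "Institute": {"IIT Delhi", "IIT Bombay"},
    "Person": {"Rishikesh"},
}
CONFIDENCE_THRESHOLD = 0.6  # assumed cut-off below which we double-check

def correct_entity(value: str, predicted_type: str, confidence: float) -> str:
    """Re-check a low-confidence entity against known values, table by table."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return predicted_type                    # trust the model
    if value in TABLES.get(predicted_type, set()):
        return predicted_type                    # prediction confirmed
    for table, values in TABLES.items():         # search the remaining tables
        if value in values:
            return table                         # corrected entity type
    return predicted_type                        # nothing better found

# "IIT Delhi" predicted as a City with low confidence gets corrected:
print(correct_entity("IIT Delhi", "City", 0.35))  # -> Institute
```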
d. Wrong intent classification
Every user message has some intent associated with it. Since intents drive the bot’s next course of action, correctly classifying user queries by intent is crucial. Developers must define intents with minimal overlap between them; otherwise, queries get misclassified across similar intents. For example, “Show me open positions” vs. “Show me open position candidates”.
What to do:
There are two ways to differentiate confusing queries: a developer can introduce sub-intents, or the model can disambiguate queries based on the entities identified.
A domain-specific chatbot should be a closed system that clearly identifies what it is and isn’t capable of. Developers should plan domain-specific chatbots in phases, identifying the chatbot’s unsupported features in each phase (via an unsupported intent).
They can also capture what the chatbot cannot handle in an “out of scope” intent. But there can be cases where the bot confuses unsupported and out-of-scope intents. For such scenarios, a fallback mechanism should be in place: if the intent confidence is below a threshold, the model degrades gracefully to a fallback intent, as sketched below.
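A minimal sketch of that threshold check (the 0.4 value and intent names are placeholders; in RASA, comparable behavior is configured through its FallbackClassifier component rather than hand-coded):

```python
FALLBACK_THRESHOLD = 0.4  # placeholder value; tune it on real conversations

def resolve_intent(ranking) -> str:
    """Pick the top intent, or fall back when the model is unsure.

    `ranking` is a list of (intent, confidence) pairs sorted by
    confidence, as most NLU pipelines produce.
    """
    top_intent, confidence = ranking[0]
    if confidence < FALLBACK_THRESHOLD:
        # Degrade gracefully instead of guessing between, say,
        # "unsupported" and "out_of_scope".
        return "fallback"
    return top_intent

print(resolve_intent([("out_of_scope", 0.31), ("unsupported", 0.29)]))  # fallback
print(resolve_intent([("search_students", 0.87), ("greet", 0.05)]))     # search_students
```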
2. Challenges with Dialogue Management
Once the bot identifies the intent of a user’s message, it must send a response back. The bot decides the response based on a defined set of rules and stories. For example, a rule can be as simple as: utter “good morning” when the user greets with “Hi”. Most often, however, conversations with chatbots involve follow-up interactions, and the responses depend on the overall context of the conversation.
What to do:
To handle this, chatbots are fed real conversation examples, called Stories.

Users don’t always interact as intended, and a mature chatbot should handle such deviations gracefully. Designers and developers can ensure this by writing stories that cover not only the happy path but the unhappy paths as well, as in the sketch below.
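To illustrate, here is a happy-path story and an unhappy-path variant written as plain Python data. Frameworks like RASA declare stories in their own format; the intent and action names here are invented:

```python
# Stories as sequences of (user intent, bot action) steps.

happy_path = [
    ("greet", "utter_greet"),
    ("search_jobs", "utter_ask_city"),
    ("inform_city", "action_show_jobs"),
]

# Unhappy path: the user goes off-script mid-flow and the bot recovers.
unhappy_path = [
    ("greet", "utter_greet"),
    ("search_jobs", "utter_ask_city"),
    ("out_of_scope", "utter_cannot_help"),  # deviation handled gracefully
    ("inform_city", "action_show_jobs"),    # ...then back on track
]

for intent, action in unhappy_path:
    print(f"user: {intent:15s} -> bot: {action}")
```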
3. Challenges in Natural Language Generation
Users’ engagement with chatbots relies heavily on the quality of responses. Users might lose interest if the responses feel robotic or out of character. For instance, a user may not like an answer like “You have typed a wrong query” for an invalid input, even though the response is factually correct. The answer doesn’t match the persona of a helpful assistant.
What to do:
A chatbot serves as an assistant, and it should have a specific persona and tone of voice. It should be welcoming and humble, and developers should design conversations and utterances accordingly. The responses should not sound robotic or mechanical. For instance, the bot could say, “Sorry, it seems like I don’t have any details. Could you please re-type your query?” to address an invalid input.
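In practice, much of this comes down to curating response templates. A before/after pair for the same events (all wording invented for the example):

```python
# Robotic phrasing vs. persona-consistent phrasing for the same events.
ROBOTIC = {
    "invalid_input": "You have typed a wrong query.",
    "no_results": "No records found.",
}
ASSISTANT = {
    "invalid_input": "Sorry, it seems like I don't have any details. "
                     "Could you please re-type your query?",
    "no_results": "I couldn't find anything matching that. "
                  "Want to try different keywords?",
}

def respond(event: str, persona: dict = ASSISTANT) -> str:
    return persona[event]

print(respond("invalid_input"))
```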
Other Conversational AI Challenges & Solutions
a. Outdated evaluation and success criteria
A lack of exhaustive test cases can lead to a lot of rework: even a tiny change can produce unexpected outcomes. Test cases should evolve continuously as new features are introduced, to ensure that the model stays consistent with previously introduced features.
What to do:
Bot behavior can evolve over time, either because a new feature is introduced or because the product itself changes. To verify that the behavior stays as expected, create test stories that compare what the bot does with what it should do, as in the sketch below. This ensures that changes in the code don’t break other parts.
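A toy regression harness in that spirit: replay test stories through the bot and diff actual versus expected actions. `predict_action` stands in for the real dialogue policy, and the story data is invented:

```python
def predict_action(intent: str) -> str:
    """Stand-in for the bot's dialogue policy."""
    return {"greet": "utter_greet",
            "search_jobs": "utter_ask_city"}.get(intent, "utter_fallback")

# Each test story pins the expected bot action for every user intent.
TEST_STORIES = {
    "greeting flow": [("greet", "utter_greet")],
    "job search flow": [("search_jobs", "utter_ask_city"),
                        ("out_of_scope", "utter_cannot_help")],  # will fail
}

def run_tests() -> None:
    for name, steps in TEST_STORIES.items():
        for intent, expected in steps:
            actual = predict_action(intent)
            status = "PASS" if actual == expected else f"FAIL (got {actual})"
            print(f"{name}: {intent} -> expected {expected}: {status}")

run_tests()  # the deliberate failure above shows how regressions surface
```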
Once proper evaluation is in place, it’s imperative to define the KPIs that decide the success or failure of the chatbot. Key metrics such as the number of deviations, retries, and failures, and the session duration help define the success criteria.
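Computing such KPIs from logged sessions can be straightforward; the session schema below is invented for illustration:

```python
# Invented log schema: one record per conversation session.
sessions = [
    {"turns": 6, "retries": 1, "fallbacks": 0, "duration_s": 75, "resolved": True},
    {"turns": 3, "retries": 0, "fallbacks": 2, "duration_s": 40, "resolved": False},
    {"turns": 9, "retries": 2, "fallbacks": 1, "duration_s": 130, "resolved": True},
]

total_turns = sum(s["turns"] for s in sessions)
kpis = {
    "avg_session_duration_s": sum(s["duration_s"] for s in sessions) / len(sessions),
    "retry_rate": sum(s["retries"] for s in sessions) / total_turns,
    "fallback_rate": sum(s["fallbacks"] for s in sessions) / total_turns,
    "resolution_rate": sum(s["resolved"] for s in sessions) / len(sessions),
}
for name, value in kpis.items():
    print(f"{name}: {value:.2f}")
```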
b. Lack of proper productionization
Productionizing a chatbot is challenging, primarily because many machine learning models are involved, a typical MLOps scenario. Chatbots rely on users’ conversations with the bot to evolve. Of the seven issues identified as causes of a typical chatbot’s failure, insufficient chatbot evolution and poor performance trigger six (source: https://www.userlike.com/en/blog/chatbots).
What to do:
Proper logging and maintenance of conversations and internal states (slots, entities, intents, etc.) are necessary to backtrack and improve. Combined with continuous refinement of the chatbot, this ensures stability and increases user acceptance.
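A minimal sketch of per-turn structured logging (the field names are an assumption; any structured log store would work):

```python
import json
import time

def log_turn(log_file, user_text: str, intent: str, confidence: float,
             entities: dict, slots: dict, action: str) -> None:
    """Append one structured record per conversation turn for later analysis."""
    record = {
        "ts": time.time(),
        "user_text": user_text,
        "intent": intent,
        "confidence": confidence,
        "entities": entities,
        "slots": slots,
        "action": action,
    }
    log_file.write(json.dumps(record) + "\n")  # JSON Lines: easy to backtrack

with open("conversations.jsonl", "a") as f:
    log_turn(f, "students from IIT Delhi", "search_students",
             0.92, {"institute": "IIT Delhi"}, {"page": 1}, "action_search")
```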

Future Trends
LLM (large language model)-based chatbots like ChatGPT and Bard are game-changing innovations that have expanded the capabilities of conversational AI. They are not only good at open-ended, human-like conversation but can also perform tasks like text summarization and paragraph writing, which earlier required task-specific models.
One of the challenges with traditional chatbot systems is that every sentence must be categorized into an intent, with the response decided accordingly. This approach doesn’t scale to everything users say, and responses like “Sorry, I couldn’t get you” are often irritating. Intent-less chatbot systems are the way forward, and LLMs can help achieve this to a great extent.
LLMs can achieve state-of-the-art results in general named entity recognition, barring certain domain-specific entities. A mixed approach, combining LLMs with a conventional chatbot framework, can produce a more mature and robust chatbot system; one possible shape of that combination is sketched below.
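For instance, the in-house NLU could remain the first pass, escalating to an LLM only when confidence is low. The prompt and model name below are assumptions, and any LLM API could be substituted:

```python
# Requires: pip install openai, with an API key in the OPENAI_API_KEY env var.
from openai import OpenAI

client = OpenAI()

def llm_extract_entities(text: str) -> str:
    """Escalate entity extraction to an LLM when the in-house NLU is unsure."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; swap in whatever is available
        messages=[
            {"role": "system",
             "content": "Extract entities (person, city, institute, date) "
                        "from the user's message and return them as JSON."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

print(llm_extract_entities("Name of students from IIT Delhi"))
```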
With the latest advancements and continuous research in conversational AI, chatbots are getting better every day. Handling complex requests that carry multiple intents, such as “Book a flight to Mumbai and arrange for a cab to Dadar”, is receiving a lot of attention.
Soon, conversations will be personalized based on the characteristics of the user to keep them engaged. For example, if a bot finds that a user is unhappy, it can redirect the conversation to a human agent. Additionally, with ever-increasing chatbot data, deep learning models like those behind ChatGPT can automatically generate responses to queries from a knowledge base.
Conclusion
In this blog post, I discussed the challenges one can face while developing the different components of a chatbot and how to handle them. Getting a chatbot to a mature level is an iterative process. A general approach is conversation-driven development, where improvements are made based on user feedback. Models trained for AI conversation will always have certain limitations; identifying and resolving them is what separates a mature chatbot from an immature one.