You’ve built a chatbot, but it’s not performing as you’d like, and the problem seems to come from your training dataset. Knowing where to start when you want to improve your chatbot’s performance is tricky. In the previous article, we demystified the concepts of benchmarking, the four key metrics of bot performance and the confusion matrix. Here is a list of steps you can take to improve your dataset and make bot building much easier with Recast.AI!
Knowing where to start
We recommend working on your training dataset intent by intent. Start with the intents that can give you the biggest performance boosts: the ones that combine a large number of expressions with a high number of errors.
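To make “many expressions and many errors” concrete, here is a minimal sketch that ranks intents from benchmark results. It assumes you have exported those results as (true intent, predicted intent) pairs; the function name `prioritize_intents`, the intent names and the ordering rule (most errors first, ties broken by size) are illustrative assumptions, not a Recast.AI feature.

```python
from collections import Counter


def prioritize_intents(results):
    """Order intents so that those with the most errors come first,
    breaking ties by number of expressions (an illustrative heuristic).

    `results` is a list of (true_intent, predicted_intent) pairs.
    """
    sizes = Counter(true for true, _ in results)  # expressions per intent
    # Misdetections per intent (expressions of this intent predicted as another one).
    errors = Counter(true for true, predicted in results if true != predicted)
    return sorted(sizes, key=lambda intent: (errors[intent], sizes[intent]), reverse=True)


# Toy benchmark results, invented for illustration.
results = [
    ("greetings", "greetings"), ("greetings", "goodbye"),
    ("get-invoice", "get-invoice"), ("get-invoice", "billing-issue"),
    ("get-invoice", "billing-issue"), ("get-invoice", "get-invoice"),
    ("goodbye", "goodbye"),
]
print(prioritize_intents(results))  # ['get-invoice', 'greetings', 'goodbye']
```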
Understanding what’s going wrong
There are four key metrics to determine the performance of your dataset: precision, recall, F1-score and accuracy. If you are not comfortable with those metrics, the previous article will help you.
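As a refresher, the sketch below computes per-intent precision, recall and F1-score, plus overall accuracy, from a list of (true intent, predicted intent) pairs such as a benchmark might produce. The input format and the function name `per_intent_metrics` are assumptions for illustration. The toy data also shows how errors propagate between intents: the single `get-invoice` expression detected as `billing-issue` lowers both the recall of `get-invoice` and the precision of `billing-issue`.

```python
from collections import Counter


def per_intent_metrics(results):
    """Compute precision, recall and F1 for each intent, plus overall accuracy.

    `results` is a list of (true_intent, predicted_intent) pairs.
    """
    true_counts = Counter(t for t, _ in results)        # how often each intent is expected
    pred_counts = Counter(p for _, p in results)        # how often each intent is predicted
    correct = Counter(t for t, p in results if t == p)  # true positives per intent
    metrics = {}
    for intent in set(true_counts) | set(pred_counts):
        tp = correct[intent]
        precision = tp / pred_counts[intent] if pred_counts[intent] else 0.0
        recall = tp / true_counts[intent] if true_counts[intent] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[intent] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(correct.values()) / len(results) if results else 0.0
    return metrics, accuracy


# Toy results: one "get-invoice" expression is detected as "billing-issue".
results = [
    ("get-invoice", "get-invoice"), ("get-invoice", "get-invoice"),
    ("get-invoice", "get-invoice"), ("get-invoice", "billing-issue"),
    ("billing-issue", "billing-issue"),
]
metrics, accuracy = per_intent_metrics(results)
for intent in sorted(metrics):
    m = metrics[intent]
    print(f"{intent}: precision={m['precision']:.2f} recall={m['recall']:.2f} f1={m['f1']:.2f}")
print(f"accuracy={accuracy:.2f}")
# get-invoice loses recall (0.75), billing-issue loses precision (0.50).
```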
Recall and precision have specific values for each intent, so they are the tools of choice to determine why an intent is not performing well. We advise you to start with recall, because a loss of recall for one intent is a loss of precision for another: when an expression of intent A is detected as intent B, A’s recall and B’s precision both drop. It is also more common to see high precision with very low recall than the opposite. When an intent isn’t performing well, the reason is usually one of the following scenarios:
- There isn’t enough training data (fewer than 30 expressions for bots with fewer than 30 intents; fewer than 150 expressions for bigger bots)
- The intent contains expressions that do not belong in it
- The use case covered by the intent is too broad for the algorithm, lowering performance
- The intent has too many expressions compared to other intents, creating an imbalance
- The use case covered by the intent is too specific, and the algorithm cannot detect it properly
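The first and fourth scenarios can be screened for automatically. The sketch below applies the rule of thumb above, reading it as a minimum number of expressions per intent (an interpretation, not a documented rule), and flags intents that are much larger than the median intent. The function name `diagnose_sizes` and the 3× `imbalance_ratio` are hypothetical choices to tune for your bot.

```python
from statistics import median


def diagnose_sizes(intent_sizes, imbalance_ratio=3.0):
    """Flag intents that look under-trained or oversized.

    `intent_sizes` maps intent name -> number of training expressions.
    The minimum size reads the rule of thumb as per-intent: 30 for bots with
    fewer than 30 intents, 150 otherwise (an interpretation).
    `imbalance_ratio` is a hypothetical threshold: intents larger than this
    multiple of the median intent size are flagged as oversized.
    """
    min_size = 30 if len(intent_sizes) < 30 else 150
    typical = median(intent_sizes.values())
    too_small = sorted(i for i, n in intent_sizes.items() if n < min_size)
    too_big = sorted(i for i, n in intent_sizes.items() if n > imbalance_ratio * typical)
    return too_small, too_big


# Toy intent sizes, invented for illustration.
sizes = {"greetings": 45, "goodbye": 12, "get-invoice": 400, "billing-issue": 50}
print(diagnose_sizes(sizes))  # (['goodbye'], ['get-invoice'])
```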
Solving issues through classification
Now that you have an idea of things that might be causing poor performance, don’t despair: they are fixable! Here are the steps to follow:
The first step is to clean the intent in question as well as all connected intents: what’s messing your intent up might be in another one! Go through each of them and:
- Remove expressions that are not in the right place
- Make sure all expressions are in the right language
- Remove duplicate entries
- Add more training expressions
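Parts of this cleanup can be automated. The sketch below assumes your dataset is available as a Python dict mapping each intent to its list of expressions (a hypothetical export format). It drops duplicates within an intent after normalizing case and whitespace, and reports expressions that appear in several intents, which are strong candidates for “not in the right place”. Language checks and new training expressions are left out of this sketch.

```python
from collections import defaultdict


def normalize(text):
    # Lowercase and collapse whitespace so "Hi  there" and "hi there" match.
    return " ".join(text.lower().split())


def clean_intents(dataset):
    """Remove duplicate expressions within each intent and report
    expressions that appear in more than one intent.

    `dataset` maps intent name -> list of expressions (hypothetical format).
    """
    cleaned = {}
    seen_in = defaultdict(set)  # normalized expression -> intents containing it
    for intent, expressions in dataset.items():
        unique, seen = [], set()
        for expr in expressions:
            key = normalize(expr)
            if key not in seen:  # keep the first occurrence only
                seen.add(key)
                unique.append(expr)
                seen_in[key].add(intent)
        cleaned[intent] = unique
    conflicts = {key: sorted(intents) for key, intents in seen_in.items() if len(intents) > 1}
    return cleaned, conflicts


dataset = {
    "get-invoice": ["I want my last invoice", "i want my  last invoice", "Send me my bill"],
    "billing-issue": ["My bill is wrong", "Send me my bill"],
}
cleaned, conflicts = clean_intents(dataset)
print(cleaned)
print(conflicts)  # {'send me my bill': ['billing-issue', 'get-invoice']}
```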
If some intents are much bigger than others, consider whether removing expressions to rebalance them is a valid solution. When one intent is much bigger than the others, the algorithm may learn to predict that intent whenever it is in doubt: statistically speaking, it has a better chance of being right! But you don’t want your algorithm to rely on luck; you want it to rely on understanding.
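If you decide to rebalance, a simple option is to cap the size of oversized intents. The sketch below does this by random sampling with a fixed seed so the result is reproducible; the `downsample` function, the dict-based dataset format and the `max_size` value are assumptions for illustration.

```python
import random


def downsample(dataset, max_size, seed=0):
    """Cap every intent at `max_size` expressions by random sampling.

    `dataset` maps intent name -> list of expressions (hypothetical format);
    `max_size` is a value you choose; a fixed seed keeps results reproducible.
    """
    rng = random.Random(seed)
    balanced = {}
    for intent, expressions in dataset.items():
        if len(expressions) > max_size:
            balanced[intent] = rng.sample(expressions, max_size)  # sample without replacement
        else:
            balanced[intent] = list(expressions)  # small intents are kept whole
    return balanced


dataset = {
    "get-invoice": [f"invoice request {i}" for i in range(200)],
    "goodbye": ["bye", "see you", "goodbye"],
}
balanced = downsample(dataset, max_size=50)
print({intent: len(exprs) for intent, exprs in balanced.items()})  # {'get-invoice': 50, 'goodbye': 3}
```

Random sampling treats every expression as equally useful. If you would rather choose what to drop, removing near-duplicates of expressions you keep may lose less information.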
Work on intent sizes
If, after taking these steps, you still get mediocre detection scores, your intents might not be split efficiently. If intent A is often detected as B, and B as A, merging them may solve the issue.
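To spot candidates for merging, you can look for pairs of intents that are confused in both directions. This sketch assumes benchmark results as (true intent, predicted intent) pairs; `mutual_confusions` and its `min_count` threshold are hypothetical. Each returned tuple lists the two intents followed by the number of errors in each direction.

```python
from collections import Counter


def mutual_confusions(results, min_count=1):
    """Find pairs of intents where A is predicted as B and B as A.

    `results` is a list of (true_intent, predicted_intent) pairs;
    `min_count` is a hypothetical minimum number of errors in each direction.
    Returns (a, b, errors a->b, errors b->a) tuples, sorted.
    """
    errors = Counter((t, p) for t, p in results if t != p)
    pairs = []
    for (a, b), count in errors.items():
        # Visit each unordered pair once; a missing key in a Counter reads as 0.
        if a < b and count >= min_count and errors[(b, a)] >= min_count:
            pairs.append((a, b, count, errors[(b, a)]))
    return sorted(pairs)


results = [
    ("get-invoice", "billing-issue"), ("get-invoice", "billing-issue"),
    ("billing-issue", "get-invoice"),
    ("greetings", "goodbye"),  # one-way confusion: not reported
    ("get-invoice", "get-invoice"),
]
print(mutual_confusions(results))  # [('billing-issue', 'get-invoice', 1, 2)]
```

A pair that shows up here is a candidate, not a verdict: it may also just need cleaner expressions.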
The use case covered by the intent can also be too broad. To solve that, either split it into several intents or use entities to refine the detection.
While doing that, it is important to think in terms of intents, not final actions. “I want to get my last invoice” and “I need a housing confirmation” lead to the same action (getting an energy invoice), but they are two different ways of asking for something. And that’s what your bot needs to understand!
Solving issues through named entity recognition
Once you have gone through all the above steps and improved your classification performance, you can work on improving entity detection to provide a better experience to your users. Performance issues in named entity recognition (NER) usually come from an important piece of information being detected incorrectly or not at all.
To solve that, there are a few steps you can take:
- Vary your training: add expressions that have very different structures, where entities are placed differently
- Check that only the information you want to extract is tagged: tagging all the words in a sentence is detrimental to performance
- Be careful when tagging adjectives or nouns. In the sentence “I want a detailed invoice”, you might be tempted to tag “detailed” to specify what the user wants. This is tricky but can work in some specific situations. Test it and monitor closely to see whether it improves your performance.
- In rare cases, it can be worth merging two entities that are often confused with one another. Again, test and monitor closely, as the impact can differ from one bot to another.
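The advice against tagging every word can be checked mechanically. The sketch below assumes each training expression is available as a list of (token, entity) pairs, with `None` for untagged tokens; that format, the function name `over_tagged` and the 0.5 threshold are illustrative assumptions, not Recast.AI’s data model. It lists expressions in which more than half of the tokens are tagged so you can review them by hand.

```python
def over_tagged(expressions, max_ratio=0.5):
    """Return expressions in which more than `max_ratio` of tokens are tagged.

    Each expression is a list of (token, entity) pairs, where entity is None
    for untagged tokens (hypothetical format; 0.5 is an arbitrary threshold).
    """
    flagged = []
    for tokens in expressions:
        tagged = sum(1 for _, entity in tokens if entity is not None)
        if tokens and tagged / len(tokens) > max_ratio:
            flagged.append(" ".join(token for token, _ in tokens))
    return flagged


expressions = [
    # 1 of 5 tokens tagged: fine.
    [("I", None), ("want", None), ("a", None), ("detailed", "invoice-type"), ("invoice", None)],
    # 3 of 4 tokens tagged: probably over-tagged.
    [("invoice", "document"), ("for", "document"), ("March", "datetime"), ("please", None)],
]
print(over_tagged(expressions))  # ['invoice for March please']
```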
One word on gazettes: if you have an open gazette but have already listed the entire vocabulary it needs to cover, it might be smart to close it. You can read more about that in the documentation.
If you have taken these steps, you should now have a clean, well-performing dataset! If you run into any difficulties or want to go further, please reach out to the team on our Slack community. We’ll be happy to help!
Happy building 🙂
Also published on Medium.