Human Bias to Data Bias
Machine Learning (ML) and Artificial Intelligence (AI) are revolutionizing the way software is designed. Before the ML and AI days, it was humans who needed to think through and add every foreseeable pathway, leaving room for individual cognitive biases to find their place in the design of the software. Having humans in the loop meant the product would inherit judgements made by the designer.
The advances in ML and AI allow us to skip the rule-based, bias-filled systems and get right to the data-driven, bias-free system. We let the cold, emotionless algorithm replace the human and now have an objective, bias-free solution! Seems like the perfect solution, right?
Not exactly. It is true that in modern AI, explicitly coding rules to solve problems becomes less important. For instance, to understand language, a model performs better when you allow deep learning algorithms to learn the structure of sentences rather than encoding individual grammar rules in complex, error-prone systems.
How Children Learn Language
Think about how children learn language: they interact with parents, friends, and strangers and pick up little nuggets of language along the way. They do not learn from a pre-defined set of rules for grammar and other properties of language, and it works great!
As a result, a child’s learning develops from those interactions with family members and others. Incorrect grammar? Foul language? Stereotypes and biases? A child is like a sponge and will pick up and learn from whatever they are exposed to in their environment.
Or in other words: the child is trained on data.
How Machines Learn Language
This happens to be a common property of machine learning approaches: a machine is only as strong as the data it is trained on. This brings back the issue of biases, just on a different level, because machines lack the awareness and empathy that humans have.
The Power of Data
Introducing bias through underlying data is a common problem – especially when systems rely on that data to grow from niche products to important parts of everyday life.
When Pokemon Go, a location-based game, was released, it had an unforeseen flaw: the data was collected predominantly by white male tech-enthusiasts, who unknowingly biased the data towards their own preferences. The amount of available in-game content positively correlated with their most visited and preferred places. This led to the exclusion of minority neighborhoods, rural areas, and, in general, low-income zones.
Similarly, LinkedIn’s job recommendation engine used to “forget” to recommend high-paying executive positions to women simply because women rarely held those positions in the data sources the engine was trained on.
One example in the Chatbot and Natural Language Understanding (NLU) space is word embeddings (read our primer for an introduction). As a quick summary, word embeddings capture the meaning of a word through its proximity to other words.
As a result, word embeddings can pick up the biases of the underlying data. If an embedding trained on a corpus of news articles is used to compare similarities of words, it will replicate the sexist notions present in those news items.
Table 1: Similarities of different jobs to man/woman.
The term “nurse” is judged as being more similar to “woman” than to “man”, whereas for “boss” and “engineer” the opposite is true. Applying such techniques that model a language based on available texts can lead to these subtle problems.
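The mechanics behind such a comparison can be sketched in a few lines. Note that the vectors below are tiny, made-up toy values for illustration only, not numbers from a real model; an actual check would load pretrained vectors (for example via a library such as gensim) and apply the same similarity function.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings", chosen only to show the mechanics.
embeddings = {
    "man":   [0.9, 0.1, 0.3],
    "woman": [0.1, 0.9, 0.3],
    "nurse": [0.2, 0.8, 0.4],  # closer to "woman" in this toy space
    "boss":  [0.8, 0.2, 0.4],  # closer to "man" in this toy space
}

for job in ("nurse", "boss"):
    sim_man = cosine_similarity(embeddings[job], embeddings["man"])
    sim_woman = cosine_similarity(embeddings[job], embeddings["woman"])
    print(f"{job}: similarity to man={sim_man:.2f}, to woman={sim_woman:.2f}")
```

In a real embedding trained on news text, the same comparison surfaces the skew described above, since the job titles sit closer to one gendered word than the other.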
When translating utterances from a language without grammatical gender into one with it, the translation engine resorts to its learned biases to resolve the ambiguity. For the Turkish “o bir muhendis” (“he/she/it is an engineer”) Google Translate selects “he” as the gender for the English translation, whereas for “o bir hemsire” (“he/she/it is a nurse”) “she” is chosen.
Figure 1: Translation service showing its gender bias.
On a small scale, one could argue that that’s simply how language is and that removing the bias would mean removing important information. On a larger scale, however, such systems can create a feedback loop, reinforcing biases as they interact with users.
How to Build Trust
The previous examples illustrate that biases do not disappear when switching from handmade rule-based systems to ML-based ones. The source of bias has merely moved from the algorithm to the training data, making it less explicit and harder to spot.
What does that mean for collecting data to create a chatbot?
Understand the Risk
First and foremost, it is important to be aware of the risk of biases in the data, which can lead to unintended behavior. Even just this awareness facilitates deeper analysis, as not every chatbot scenario carries the same risk.
Ask the Right Questions
Second, a good understanding of the data is required. For this purpose, the data, or even better the data collection process, should be scrutinized by answering the following questions:
- What is the origin of the data?
- When was it collected and by whom?
- How does the data relate to the target user group interacting with the chatbot?
- Does it capture the user group’s peculiarities?
- Are language dialects of minority communities reflected in the data?
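A first step towards answering questions like these is a simple representation check over the training utterances. The group labels and the threshold below are assumptions for illustration; in practice you would label examples by whatever user-group dimension matters for your chatbot (dialect, region, demographic).

```python
from collections import Counter

def coverage_report(examples, min_share=0.05):
    """Count training examples per user-group label and flag groups
    that fall below a minimum share of the data set.

    `examples` is a list of (utterance, group_label) pairs; the labels
    and the default 5% threshold are illustrative assumptions.
    Returns {group: (share, is_underrepresented)}.
    """
    counts = Counter(group for _, group in examples)
    total = sum(counts.values())
    return {group: (n / total, n / total < min_share)
            for group, n in counts.items()}

# Tiny illustrative data set.
data = [
    ("hi there", "standard"),
    ("howdy y'all", "regional_dialect"),
    ("good morning", "standard"),
    ("hello", "standard"),
]

for group, (share, flagged) in coverage_report(data, min_share=0.3).items():
    marker = "  <-- underrepresented" if flagged else ""
    print(f"{group}: {share:.0%}{marker}")
```

A report like this will not prove the data is unbiased, but it makes gaps in coverage visible before they turn into gaps in the chatbot’s behavior.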
Test for Edge Cases
Additionally, a good old-fashioned testing strategy will go a long way. If you actively search for your chatbot’s edge cases, you can improve your confidence in the data.
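One such edge-case test is a parity check: swap gendered words in a probe sentence and verify the bot responds the same way. The `classify_intent` function below is a hypothetical stand-in for a real model call, and the swap list is deliberately minimal; the point is the testing pattern, not the classifier.

```python
# Stand-in classifier for illustration; real code would call the chatbot.
def classify_intent(utterance):
    if "salary" in utterance:
        return "job_search"
    return "smalltalk"

# A minimal (assumed) list of gendered word pairs to swap.
SWAPS = [("he", "she"), ("his", "her"), ("man", "woman")]

def gender_swapped(utterance):
    """Return the utterance with each gendered word replaced by its pair."""
    swapped = []
    for word in utterance.split():
        for a, b in SWAPS:
            if word == a:
                word = b
                break
            if word == b:
                word = a
                break
        swapped.append(word)
    return " ".join(swapped)

probe = "she asked about the salary for an engineer"
assert classify_intent(probe) == classify_intent(gender_swapped(probe))
```

Running checks like this over a batch of probe sentences turns a vague worry about bias into a concrete, repeatable test.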
Monitor your Source
And lastly, it is always a good idea to keep an eye on who else can modify the training data. It is easy to trust people to do the right thing, and suddenly you end up with a racist teenage chatbot.
Make a Conscious Effort
The advancement towards data-driven systems did not free us from the risk of bias. Instead, it made the threat more implicit, requiring a new awareness: know your data and be proactive about it. Make sure it reflects the future you want instead of reiterating the mistakes of the past.