A Black Mirror

Cognitive bias in machine learning

by Griffin Kao

Illustration by Sophia Meng

published November 16, 2018

“These systems are simply mirrors for a biased society,” says Dr. Jerry Kaplan, a professor at Stanford University, speaking about deep learning models. A renowned expert in the field of artificial intelligence, Kaplan has founded several AI startups and published three books on the subject. His voice quakes with intensity as we talk about the possibility of eliminating cognitive bias in our current artificial intelligence. “If the bias exists in society, these systems are going to be biased, because they’re based on the data they observe in the real world,” he continues.

Deep learning, a subset of artificial intelligence, describes a form of information prediction centered around neural networks, computer systems that model the human brain and nervous system. A neural network is composed of interconnected nodes that pass on signals to one another to propagate and store information through the system, loosely simulating the way neurons fire electrical signals to other neurons through their synapses. In deep learning, computer scientists feed these neural networks training data so that they can learn patterns which they can then apply to predict information, much like a child might learn to recognize what kinds of behaviors will get them in trouble versus what kinds of behaviors will earn praise.
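The signal-passing described above can be sketched as a single artificial neuron: each incoming signal is multiplied by a learned weight, the results are summed, and the total is squashed by an activation function before being passed on. A minimal sketch in Python (the weights and inputs here are invented for illustration; a real network learns its weights from training data):

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of its inputs, passed
    through a sigmoid 'activation' that squashes the result into (0, 1)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))  # sigmoid activation

# Hypothetical weights and inputs, chosen only for this example.
signal = neuron([0.5, 0.8], weights=[1.2, -0.4], bias=0.1)
```

A deep network is just many of these neurons wired together in layers, with training nudging the weights so the network's outputs better match the patterns in its data.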

“Word2vec” models are a type of neural network that essentially takes in a corpus of text as data—anything from famous novels to collections of news articles to large aggregations of Tweets—and determines which words appear near each other in the text in order to represent the words as vectors, or lists of numbers. Specifically, words appearing in the same sentence are considered ‘near’ each other, and each number in the vector representation of a word is more or less a measure of how often it appears alongside another English word. Since similar words show up in similar contexts, their vectors should be close to each other and the cosine distances between them should be low. This isn’t a perfect approximation of word meaning, since opposites like “quiet” and “loud” can occupy the same position in the same sentence, but it allows word2vec models to reconstruct relatively accurate linguistic contexts for words by converting them into these lists of numbers.
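The “closeness” of two word vectors can be measured with cosine similarity (cosine distance is just one minus this value). A toy sketch in Python, using made-up three-dimensional vectors—real word2vec embeddings typically have hundreds of dimensions learned from co-occurrence counts:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: near 1 when they
    point in similar directions, near 0 when they are unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical vectors, invented for illustration: words that appear
# in similar contexts end up with similar lists of numbers.
vectors = {
    "quiet":   [0.9, 0.1, 0.3],
    "silent":  [0.8, 0.2, 0.4],
    "volcano": [0.1, 0.9, 0.2],
}
```

With these toy numbers, "quiet" and "silent" come out far more similar to each other than either is to "volcano", which is exactly the property the article describes.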

Because these word2vec models allow us to represent English words as vectors (called word embeddings), we can perform mathematical operations on words by completing vector arithmetic on their vector forms. For example, if we did something like “king + woman – man,” we would get an output vector closest to “queen.” You can think of it as saying that removing the man part of a king and adding the woman part gives us a queen—an analogous relation of sorts where “king is to man as queen is to woman.”
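The “king + woman – man” arithmetic can be sketched with toy vectors. The embeddings below are invented for illustration (real ones are learned from text), but the mechanics are the same: add and subtract component-wise, then find the vocabulary word whose vector lies closest to the result.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# Toy 2-d embeddings, invented so that the first component loosely
# encodes "royalty" and the second loosely encodes "gender".
embeddings = {
    "king":   [0.9, 0.9],
    "queen":  [0.9, 0.1],
    "man":    [0.1, 0.9],
    "woman":  [0.1, 0.1],
    "prince": [0.9, 0.85],
    "boy":    [0.1, 0.85],
}

def analogy(a, b, c):
    """Return the vocabulary word whose vector is closest to a + b - c."""
    target = [x + y - z for x, y, z in
              zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = set(embeddings) - {a, b, c}
    return max(candidates,
               key=lambda w: cosine_similarity(embeddings[w], target))

analogy("king", "woman", "man")  # → "queen"
```

With these made-up numbers the result falls out exactly; in a real model the output vector only lands near “queen,” and the nearest neighbor is what gets reported.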

Word2vec is the basis of much of natural language processing and allows us to do useful things like text prediction and speech recognition, but it also reveals some alarming cognitive biases in our society. Since the training data that the models learn from is writing produced by people, all of our biases that exist in language data get emulated by the models themselves, as Dr. Kaplan mentioned. In particular, most language models are strikingly sexist: because the bulk of available writing comes from eras more conservative than ours, performing the equation “doctor + woman – man” will often output a vector close to “nurse.” In a world that is far from rooting out unfair biases, it’s nearly impossible to avoid creating such biased intelligent systems. Since such models must be trained on data we produce, they will always be reflections of our imperfect selves, and so we must learn to limit the power we give them.




Cognitive biases like the sexism in our language models are relatively common in the rapidly expanding field of machine learning. As models are given more and more power to make decisions, we’re beginning to see that these biases have farther-reaching consequences. For instance, allowing our models to identify high-risk repeat offenders in the criminal justice system (Equivant’s COMPAS system) or criminal suspects in local law enforcement (Amazon’s Rekognition software) has greatly exacerbated racial profiling and has contributed to mass incarceration by targeting people of color.

A number of tech companies have taken actions that suggest an increasing understanding of the ethical implications of prejudice in their machine learning systems. Google’s CEO Sundar Pichai, for example, published a list of ethical principles this past summer to guide the company’s use of artificial intelligence. Among the principles, he writes that AI should “avoid creating or reinforcing unfair bias” and, more generally, should not be used in “technologies that cause or are likely to cause overall harm.” Likewise, Microsoft has founded a number of internal groups to manage the research and application of new AI developments, like the FATE (Fairness, Accountability, Transparency, and Ethics in AI) research group. Most of these groups are pushing for awareness—an understanding of where these biases come from and the impact they have—as the broader solution to bigoted machine learning. They believe that transparency surrounding the data sets on which AI systems are trained would allow those biases to be identified before they can do harm.

In addition, since much of the cognitive bias witnessed in our smart systems can be traced back to the programmer level, many researchers believe that remedying these oversights begins with educating future computer scientists on the ethics of artificial intelligence—a need that some schools have begun to recognize, offering courses like the one Dr. Kaplan teaches at Stanford, “CS122: Artificial Intelligence – Philosophy, Ethics, and Impact.” When I asked Dr. Kaplan about the syllabus for the course, he told me that it aims to teach students to think critically about the impact that a biased model might have instead of blindly entrusting such models with important tasks. The idea is that to empower students to build machine learning software without a sense of social responsibility is akin to giving someone a gun without safety training. However, this raises the question: does safety training mean the gun itself is any less dangerous?

These measures that push awareness, transparency, or education are only effective to a certain extent. Understanding the cause of our racist, sexist, and typically intolerant machine learning models does not necessarily allow us to produce impartial systems, because the bias is inherent to the available data. For example, if we revisit our problematic word2vec model, we see it’s virtually impossible to find unbiased data on which to train a model; the nature of the task requires a large amount of text—and text that conveys real meaning—so we can’t realistically create new unbiased writing or string together totally random words. In practice, most language models are trained on a number of large, publicly available corpora that aggregate wide-ranging pieces of writing. Some of these include Google Books Ngram Viewer, the American National Corpus, and Brown’s own Standard Corpus of Present-Day American English. A brief examination of Brown’s corpus immediately reveals where our models’ displayed sexism might come from—the corpus includes large sections of religious texts and a significant number of fiction pieces from writers like Charles Dickens (who was a notable misogynist). But even if we consider corpora composed of supposedly objective writing like the North American News Text Corpus, it is not hard to imagine that conservative reporting on problem areas of our society like traditional gender roles or the pay gap might lead our language models to associate men with doctors and women with nurses.




When I speak with Dr. Kaplan he brings up the COMPAS software that learned to use race as a predictor of an incarcerated individual’s likelihood to become a repeat offender. He points out that, “from the standpoint of the courts, it’s not biased since it’s an objective analysis of the proportions present in training data,” but then counters that, “from the standpoint of the individual, it’s completely unfair.” Since a Black individual with the same record as a white one will be labeled with a higher chance of re-offending, with this software Black people are re-incarcerated at much higher rates, both reflecting and perpetuating an already existing racist practice in policing. The reality is that, in order for a model to learn patterns used in predicting accurate information, the model must be fed real data—data produced by a flawed and biased population—and so our models will always echo the problematic thinking they consume. Dr. Kaplan affirms that “you can’t lay the sins at the feet of the programmer, or the techniques, since it’s inherent in the data,” again suggesting that the cognitive bias exhibited by our machine learning models is part of a larger structural issue: the nature of the field means that our intelligent systems will always mirror the world we as humans have created.

But where does that leave us? When I ask Dr. Kaplan what this means for the field of machine learning, he talks about continuing to raise awareness and quantifying bias, but also acknowledges that “the essence of the problem is that these systems can institutionalize and perpetuate bias in electronic form.” Since we will continue to construct racist, sexist, and generally discriminatory machine learning models until we’ve fostered a society free from unfair biases, the only solution in the interim is to curb the influence that they wield.


GRIFFIN KAO B'20 is wondering why he hasn’t had a female computer science professor yet.