AI Bias: Prejudice in Machine Learning

Bias In Machine Learning and AI

At the beginning of 2019, Alexandria Ocasio-Cortez made headlines on the topic of AI bias by stating that algorithms can perpetuate racism:

“[algorithms] always have these racial inequities that get translated, because algorithms are still made by human beings, and those algorithms are still pegged to basic human assumptions. They’re just automated. And automated assumptions — if you don’t fix the bias, then you’re just automating the bias.”

Algorithms are based on mathematical equations, and it is tempting to assume that the answers you can glean from a numeric equation must be objectively true. Many people used this assumption as a retort against Ocasio-Cortez, accusing her of trying to find prejudice where there simply wasn’t any. But algorithms can not only perpetuate racial biases; they are prone to perpetuating any and all societal biases.

What is AI Bias?

The ways that algorithms can discriminate against certain demographics are felt in tangible ways by a lot of people. Algorithms can determine how likely you are to get a job interview, what consequences you face if you commit a crime, whether or not you will get a mortgage, and how likely you are to “randomly” get stopped by the police. Skewed data, prejudiced programmers, and faulty logic can mean that the results aren’t as indisputable as we may assume at first.

A prominent example of race and gender bias can be found in facial recognition software, which is becoming a more and more popular tool in law enforcement. Joy Buolamwini at the Massachusetts Institute of Technology found that three widely used commercial gender-recognition AIs could identify a person’s gender from a photo with 99 percent accuracy – but only if the person in the photo was a white man. This puts women and people of colour at risk of false identification—in fact, accuracy dropped all the way down to 35 percent for women of colour.

Similarly, but perhaps even more worryingly, a report by ProPublica found that AI used to anticipate future criminal behaviour was heavily skewed against black people. It falsely flagged almost twice as many black defendants as potential re-offenders as it did white defendants, and was much more likely to mislabel white defendants who would go on to re-offend as low-risk. Its predictions of violent crime, meanwhile, were almost always wrong: only 20 percent of the people it predicted would commit violent crimes actually went on to do so.

This large, potentially dangerous oversight is likely caused by a lack of diversity in the data used to train the algorithms. If the people programming the AI feed in more data containing white men than any other demographic, then the AI will learn to identify those people with much higher accuracy. It makes sense—AI can only learn from the data it is given.
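A toy simulation can make this concrete. The sketch below is entirely hypothetical (it is not based on any real system’s data): a simple one-threshold classifier is trained on data that is 95 percent group A, and the learned threshold ends up fitting group A far better than group B.

```python
import random

random.seed(0)

def sample(group, label, n):
    # Each group's two classes sit at different points on a single feature;
    # group B's distribution is shifted relative to group A's.
    shift = 0.0 if group == "A" else 1.5
    mean = shift + (0.0 if label == 0 else 2.0)
    return [(random.gauss(mean, 0.7), label) for _ in range(n)]

# Training set: overwhelmingly group A (the over-represented demographic).
train = (sample("A", 0, 950) + sample("A", 1, 950)
         + sample("B", 0, 50) + sample("B", 1, 50))

def accuracy(data, t):
    # Fraction of examples where "feature >= threshold" matches the label.
    return sum((x >= t) == bool(y) for x, y in data) / len(data)

# "Train" by scanning for the threshold that minimises overall training error.
threshold = max((t / 100 for t in range(-100, 400)),
                key=lambda t: accuracy(train, t))

# Evaluate on balanced test sets for each group.
test_a = sample("A", 0, 500) + sample("A", 1, 500)
test_b = sample("B", 0, 500) + sample("B", 1, 500)
print(f"group A accuracy: {accuracy(test_a, threshold):.2f}")
print(f"group B accuracy: {accuracy(test_b, threshold):.2f}")
```

Because group B’s feature distribution is shifted relative to group A’s, the threshold learned from A-heavy data systematically misclassifies group B: the same mechanism, in miniature, as a face-recognition model trained mostly on photos of white men.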


The inequalities reflected in our technology begin with us and our actions, something we have addressed before at Data Relish. All of us carry biases of some kind, and we tend not to recognise them in ourselves – a tendency social scientists call the “bias blind spot”. We do not exist in a vacuum, but have all grown up and been moulded to some extent by the actions and interactions we have with others – and especially with those we trust and admire. bell hooks put it best when she said that “we must consciously work to rid ourselves of the legacy of negative socialization”; the prejudices we live with in our communities rub off on our actions if we are not careful, and we end up contributing to a vicious cycle of structural inequality.

Machine learning and AI cannot operate without data. They learn to behave based on the data that is fed into them. This data is generated by humans, who have bias, and this AI bias is passed on—either consciously or unconsciously—into the algorithms that produce the results. If we expose an algorithm to thousands of pictures of men sitting at desks in offices and women in kitchens, it will act on the assumption that men and women are more suited to those environments, respectively.

Therefore, the issues our society and even our individual communities have with structural inequalities are reflected in machine learning. Amazon only recently ditched an AI recruitment tool that favoured male applicants over female applicants for software engineer positions. The AI did this because Amazon’s computer models were trained to vet applicants by observing patterns in resumes submitted to the company over a 10-year period. Amazon had collected significantly more CVs from men—a direct reflection of the male-dominated workplaces across the tech industry.

Are We Asking Algorithms the Right Questions?

If society didn’t perpetuate the idea that men are better suited than women to jobs in these fields, the tech industry would not be so male-dominated. If the industry were not so male-dominated, then Amazon would not have collected significantly more resumes from men than from women—and if Amazon hadn’t had such an imbalance in resumes, the AI would not have learned to observe patterns that skewed so drastically towards male applicants.

The objectivity that machine learning requires can also pose problems. Another issue with Amazon’s recruitment tool was that the AI registered only words that were explicitly gendered, while failing to recognise the significance of implicitly gendered ones. After discovering that the tool was penalising female candidates, the engineers reprogrammed the AI to ignore explicitly gendered words like ‘women’s’ and ‘men’s’, but didn’t do the same for words that tend to be implicitly associated with, or used more by, men. The applications affected by this AI bias were therefore still prioritised based on words more highly correlated with men than with women, such as ‘captured’ and ‘executed’.
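One way engineers could surface such implicitly gendered words is a simple frequency-ratio check over historical applications. The sketch below uses invented toy word counts (not Amazon’s data): words like ‘executed’ carry no explicit gender marker, so a blocklist of terms like ‘women’s’ would never catch them, but a usage ratio flags them anyway.

```python
from collections import Counter

# Hypothetical word pools from two historical resume sets (invented data,
# purely for illustration -- these are not real corpora).
male_resumes = ("executed captured led executed managed built "
                "executed captured delivered led").split()
female_resumes = ("managed organised led supported delivered built "
                  "collaborated organised supported").split()

male_counts, female_counts = Counter(male_resumes), Counter(female_resumes)

def gender_ratio(word, smoothing=1.0):
    # Smoothed ratio of how often a word appears in male vs female resumes.
    m = (male_counts[word] + smoothing) / (len(male_resumes) + smoothing)
    f = (female_counts[word] + smoothing) / (len(female_resumes) + smoothing)
    return m / f

# Words with a high ratio are implicitly male-associated even though they
# contain no explicit gender marker like "men's" or "women's".
vocabulary = set(male_resumes) | set(female_resumes)
implicit = sorted(w for w in vocabulary if gender_ratio(w) > 2.0)
print(implicit)  # ['captured', 'executed']
```

A check like this would have flagged ‘captured’ and ‘executed’ for review even after the explicit blocklist was in place, which is exactly the gap Amazon’s engineers missed.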

Standard practices in data science aren’t really designed to pick up on these problems, and so they can easily slip through the net. AI bias can affect algorithms even before the data collection process is underway. For example, if an algorithm is created to maximise a company’s profit margins, it may end up pursuing that goal in predatory ways (for instance, prioritising business over fairness) simply because of the way the objective was framed when the model was being created. Machines are not capable of creating nuance on their own, so sincere thought needs to be put into the concepts that frame the goals we have in mind for our machines.
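A tiny, hypothetical example (invented numbers, not any real lender’s system) shows how much the framing of the objective matters: the same candidate pool, ranked once by profit alone and once with a simple parity constraint added to the objective, yields very different approval patterns.

```python
# Hypothetical loan applicants: (expected_profit, group). Invented numbers.
applicants = [
    (9.0, "A"), (8.5, "A"), (8.0, "A"), (7.5, "A"), (7.0, "A"),
    (6.5, "B"), (6.0, "B"), (5.5, "B"), (5.0, "B"), (4.5, "B"),
]
K = 4  # number of loans the model is allowed to approve

# Objective 1: maximise profit alone -- every approval comes from group A.
profit_only = sorted(applicants, key=lambda a: a[0], reverse=True)[:K]

# Objective 2: the same goal, reframed with a parity constraint:
# approve the top K/2 candidates from each group.
def top_from(group, k):
    pool = [a for a in applicants if a[1] == group]
    return sorted(pool, key=lambda a: a[0], reverse=True)[:k]

with_parity = top_from("A", K // 2) + top_from("B", K // 2)

print([g for _, g in profit_only])   # ['A', 'A', 'A', 'A']
print([g for _, g in with_parity])   # ['A', 'A', 'B', 'B']
```

Neither objective is “wrong” mathematically; the difference in outcomes comes entirely from how the goal was framed before any data was collected, which is the point made above.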

What Can We Do About AI Bias?

Mathematics cannot overcome prejudice. Human behaviour is at the heart of machine learning development, and it’s up to us to try to predict how the technology we are influencing with our data is going to act. There are, however, ways we can combat the effects of AI bias in machine learning.

The people who generate, label and annotate the data algorithms need must be aware of their unconscious biases so they can ‘unlearn’ them, and prevent them from being passed on to the technology they work with. It could help to train them to identify and avoid the prejudices they may hold; these can be hard to spot alone, as most of us won’t notice them, or will assume we don’t have them in the first place. A lot of the time, it is hard to accept that you have them at all, since thinking about your ingrained biases can force you to rethink how you see yourself.

It is also hugely important to ensure that data science workplaces are diverse and welcoming to minorities. A diverse group of programmers is more likely to produce diverse data. Moreover, it is vital to make sure that more data is being collected from the demographics who are most likely to face discrimination. A study by Irene Chen looked at an income-prediction system that was twice as likely to wrongly flag female employees as low-income and male employees as high-income. It found that, had the dataset been increased by a factor of 10, these mistakes would have been 40 percent less likely to occur.
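A quick simulation illustrates the mechanism behind that finding (the numbers here are invented and are not from Chen’s study): when a group is under-sampled, estimates about that group are noisier, and collecting more data from that group shrinks the error.

```python
import random
import statistics

random.seed(1)

TRUE_MEAN = 50_000  # hypothetical true average income for an under-sampled group

def estimation_error(n, trials=500):
    # Average error when the group's mean income is estimated from n samples.
    errors = []
    for _ in range(trials):
        draws = [random.gauss(TRUE_MEAN, 15_000) for _ in range(n)]
        errors.append(abs(statistics.fmean(draws) - TRUE_MEAN))
    return statistics.fmean(errors)

small, large = estimation_error(20), estimation_error(200)
print(f"avg error with n=20:  {small:,.0f}")
print(f"avg error with n=200: {large:,.0f}")
```

With 10x the data, the estimate’s error shrinks by roughly the square root of 10 (about 3.2x), so any decision that depends on it, such as an income threshold, goes wrong less often for that group.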

Adding Nuance and Context to AI Bias

As previously mentioned, the programmers and engineers behind these models also need to be aware of the implicit as well as explicit language that is being used to shape behaviour. It’s easy to assume that you are removing any risk of gender discrimination, for example, by preventing your AI from discriminating based on words like ‘women’s’—but what about words that are more frequently used by certain genders because of the ways we’re conditioned to speak and act? It can be difficult to get your head around, as it isn’t something we tend to do consciously; once it is programmed into a model, however, it is applied systematically. Studies show that men, for example, are more likely to use ‘aggressive’ words in their resumes, whereas women are more likely to use ‘soft’ words. This is a much wider societal issue that needs to be studied from several angles, but it definitely needs to be kept in mind when working with AI, so as not to accidentally shut certain demographics out.

It is very important that awareness is raised about these issues. It is possible, with the right amount of investment, time and effort, that discrimination can be wiped out from current datasets. We need to make a conscious decision that it is worth that investment. As AI and machine learning become increasingly popular and more influential in all of our day-to-day lives, we have an important decision to make. We can either use it to continuously perpetuate harmful cycles that we have been in for decades, or we can use it to move away from these prejudices, and contribute to the efforts being made to dismantle ingrained structural biases.

AI Bias and Its Impact on Our Data

It’s not just programmers and data scientists who need to worry about contributing to AI bias and machine learning bias. All of us have become accustomed to freely giving away our data, often with little or no thought about who it is going to or what it could be used for—and there is mounting evidence that this data is not being used appropriately, or in the ways that we are led to believe.

Recent changes brought in by the GDPR make it clear that voluntary consent is needed if a person’s data is being collected or used in any way, whether through social media, job searching sites, online commerce, smartphone apps, or any other online medium. However, we are not always given the information we really need to give informed consent when it comes to handing over access to our data. We are told that the app we are downloading will have access to our data—but we’re not told where that data is going, or what the consequences could be if it ends up in the wrong hands.

The consequences of this when it comes to reinforcing prejudice have the potential to be widespread and varied. On a more personal scale, ingrained AI bias via data collection can affect the products and adverts that are targeted at individual users of social media. Women on Facebook are frequently targeted with ads for post-partum diets and baby clothes, regardless of whether or not they have ever personally expressed an interest in these things (or anything related to pregnancy or children in general). An article by Coding Rights found that one interviewee was targeted with an advertisement encouraging her to “donate your Asian eggs”. If you have an Instagram account, you can even access the data that is gathered by the site in order to target ads at you. Mystifyingly, mine include “infants”, “weddings”, “pregnancy” and “weight loss”, despite me having absolutely no interest in any of these things. I can only begin to imagine how the data they have collected about me and other women my age has managed to land these things in my “Ad Interests”.

Reinforcing Biases Through Data and AI

There are also more severe consequences emerging, however, as a result of the misuse of individuals’ data and lack of informed consent. Our data can be used against us in insidious ways, while not only reinforcing but also potentially worsening ingrained structural biases.

Anyone with a smartphone can use AI technology to manipulate images of themselves — using filters to look like cartoon characters or animals, to swap faces with other people, or to “predict” how they might look when they’re older. Snapchat and, more recently, FaceApp are the most common perpetrators. Apps that let you manipulate an image of yourself to see how you would look with a different haircut or hair colour are becoming increasingly common too.

Many of us also allow airbrushing apps like FaceApp to smooth out our skin, remove our freckles and widen our eyes. FaceApp came under fire in 2017 for a ‘hot filter’ that lightened users’ skin as part of its processing. In an apology, the company identified it as a bug, not a feature: an “unfortunate side effect … caused by the training set bias”. Replicating this bias in an airbrushing selfie app may not be too severe in isolation, but these “unfortunate side effects” don’t exist in a vacuum.

These apps, however – Meitu, Snapchat and FaceApp, among others – are used for fun. It can be argued that this is part of what makes them insidious: because we’re just having a laugh, seeing how we would look at sixty or with our cat’s face, it’s even less likely to occur to us that these images of our faces are being held by these companies forever, and that we’re no longer in control of what they’re used for. It’s not these apps in isolation that are causing problems; rather, our increasingly relaxed attitude to letting our faces be collected as data is blurring the line between what is human and what is AI-created. As that line blurs, the prejudices and inequalities that AI can reinforce become easier and easier to amplify in subtle, frightening ways.

Deepfakes – What Are They, and What is the Problem?

These types of technology are becoming so advanced that it is possible to replicate images of people that are virtually indistinguishable from the actual, living person. Carrie Fisher reportedly didn’t realise that the CGI version of her younger self in Rogue One was fake; she thought it was real footage that she simply couldn’t remember filming. Our ability to replicate such perfect likenesses of people is great for the purposes of entertainment, much in the same way that being able to swap faces with your dad and post it on social media is great for the purposes of entertainment. It is not so great when it is in the wrong hands – and it is in the wrong hands.

‘Deepfake pornography’ has already been used to target a number of female celebrities, such as Daisy Ridley, Michelle Obama and Gal Gadot. Using face-swapping technology similar to what so many of us use on our own smartphones, a victim’s face can be copied into an existing pornographic film and made public without her knowledge or consent. As the technology becomes more accessible, it is increasingly being used against everyday women as a form of ‘revenge porn’, and it is extremely traumatic for the victims.

It is important to note, when talking about the victims of this type of abuse, that it is not the victim’s fault for posting images of themselves on social media, in the same way that it is not the victim’s fault if they face any other form of abuse. The fault lies with the perpetrator, and with a society in which our privacy is breached online and our data is collected without our fully informed consent.

Deepfakes are also beginning to gain prominence in political situations, and they can pose a genuine threat to democracy. Vladimir Putin has allegedly been “mucking around” with Deepfakes in preparation for influencing the 2020 US elections, with many senators and policy insiders believing that Putin, and potentially others, may end up using Deepfake technology to destabilise the election and falsely influence American voters. Jordan Peele recently released a Deepfake video of Barack Obama calling Donald Trump a “dipshit”, in order to draw attention to the potential dangers of using this type of technology in political spaces.

The BBC has stated that Deepfake videos could actively contribute to violent outbreaks. Deepfake technology’s ability to create convincing likenesses of public figures could, according to Clint Watts of the Foreign Policy Research Institute, put public safety at risk “if it was adopted by those pushing ‘false conspiracies’”. Considering that Deepfakes are already being used as political tools, there is a strong chance that their misuse could become massively destructive. In an era of fake news and political mistrust, where many of us already feel anxious and unsure of what and whom to believe, this misuse of AI – and the prejudices, abuse and bigotry it can uphold – is extremely unsettling.

Wrapping Up

We are all living in a climate of uncertainty. Most of us put a significant portion of our lives online, and there is nothing wrong with that exactly; social media has infiltrated our lives in a way that is almost inescapable for most of us, and its popularity makes it a great way to connect with people. Enjoy using social media as much as you like, and take advantage of the benefits it brings — but be careful of your data. Be mindful of who you give your data to, and don’t be afraid to do some research before you agree to give an app free access to your information.

There are numerous benefits to the accelerated development of machine learning and AI. Technology is moving forward in ways that can bring about incredible societal change. But that enormous potential can go either way. We need to be cautious of how data is being used, and who is being given control of it.
