Building AI Fairness by Reducing Algorithmic Bias

Emily Diana explores algorithmic bias in machine learning and outlines three intervention stages for mitigating algorithmic discrimination: pre-processing, in-processing, and post-processing.

Automation is increasingly present in our lives. From recommendation algorithms to speech recognition software, credit-granting decisions to hiring assistance tools, our day-to-day experiences and the important decisions made about us are often informed by statistical processes. There are many benefits to this – calculations that would be labor-intensive for a person can be made near instantaneously, large amounts of data can be collected and processed to derive new insights, and decisions can be made more systematically. The hope is that automated decisions will be more consistent, more reproducible, and less biased than human decisions. While this is sometimes the case, we have seen repeatedly that automating decisions can easily amplify biases and inequities present in the training data and in society. This can be especially harmful if we believe that automation removes our responsibility to mitigate these issues.

Understanding Bias and Fairness in Machine Learning

The concept of bias and fairness in machine learning encompasses a variety of definitions, prompting crucial questions about how models should perform across different population groups. Choosing an appropriate fairness definition is complex, and common mathematical criteria often cannot all be satisfied simultaneously, but acknowledging these distinctions is essential for addressing the problem of algorithmic bias.

Examples of Algorithmic Bias

Facial Recognition Bias: The “Gender Shades” Study

In 2018, Joy Buolamwini and Timnit Gebru published a paper commonly referred to as “Gender Shades” that exposed race and gender biases in three popular commercial facial-analysis programs. Dr. Buolamwini and a team of researchers at MIT had been experimenting with a commercial face-analysis system for an art installation, and the system worked reliably only on her lighter-skinned colleagues; Dr. Buolamwini had to wear a white mask for the system to recognize her face. When Dr. Buolamwini and Dr. Gebru conducted a thorough empirical analysis of how well the three programs identified binary gender from facial images, they found that although accuracy on lighter-skinned men was always over 99 percent, accuracy for darker-skinned women dropped to as low as 65.3 percent. Upon inspection, they found that the training data was heavily skewed toward lighter-skinned and male subjects, and this imbalance led to dramatically different performance across groups.

Hiring Tool Bias: Amazon’s Experience

Similarly, from 2014 to 2015, Amazon worked on a hiring tool intended to help screen and score the vast number of applications it received. When researchers began testing the tool, they noticed that it was systematically penalizing applications from women. Notably, binary sex was not a feature the model could see when evaluating a new application. Instead, the model penalized resumes containing the word “women’s,” as in “women’s chess club captain,” and downgraded applicants from several well-known women’s colleges. These patterns were not coded into the model. Rather, the model had learned them from 10 years of hiring data in a male-dominated industry, where most resumes, especially those of hired candidates, came from men. This is an example of societal bias perpetuated by a model.

Intervention Stages for Bias Mitigation

When trying to mitigate statistical bias in machine learning models, there are three main intervention points: pre-processing, in-processing, and post-processing. 

Pre-processing approaches adjust the data before beginning the model training process. This could be through collecting more data, deriving different features or selecting relevant features, re-weighting the data, or curating a more balanced subset of the data. As many instances of algorithmic bias can be traced to imbalances in the dataset, such as in the Amazon hiring and “Gender Shades” examples, collecting higher-quality data can be very beneficial. The downsides to pre-processing approaches for mitigating bias are that collecting more data can be expensive or difficult, many approaches do not have theoretical guarantees on the downstream bias mitigation effects, and it can be difficult to address embedded societal bias distinct from representational concerns.
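
As a concrete illustration of re-weighting, the short sketch below computes per-sample weights so that every (group, label) combination contributes equal total weight during training. It is only a minimal sketch: the group labels, binary labels, and toy data are hypothetical, and a real pipeline would plug the resulting weights into whatever training API it uses (many accept a sample_weight argument).

```python
import numpy as np

def reweighting_weights(groups: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Give every (group, label) cell the same total weight.

    A minimal pre-processing sketch: under-represented combinations (e.g., positive
    examples from a smaller group) receive larger per-sample weights.
    """
    groups, labels = np.asarray(groups), np.asarray(labels)
    n = len(labels)
    cells = [(g, y) for g in np.unique(groups) for y in np.unique(labels)]
    target = n / len(cells)  # ideal total weight for each (group, label) cell
    weights = np.ones(n, dtype=float)
    for g, y in cells:
        mask = (groups == g) & (labels == y)
        if mask.any():
            weights[mask] = target / mask.sum()
    return weights

# Illustrative usage with a skewed toy dataset (90% group "a", 10% group "b").
rng = np.random.default_rng(0)
groups = rng.choice(["a", "b"], size=1000, p=[0.9, 0.1])
labels = rng.integers(0, 2, size=1000)
weights = reweighting_weights(groups, labels)
print(weights.min(), weights.max())  # samples from the smaller group weigh more
```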

In-processing approaches take the data as given and adjust the model-training process itself. In particular, the training procedure and loss function are modified so that fairness considerations are incorporated rather than overall accuracy alone. For example, mistakes on certain groups, or certain types of mistakes, might be weighted more heavily during training. One in-processing method is to find a statistical model that minimizes the maximal group error rate among pre-defined groups.

In-processing methods can be effective if one wants provable guarantees on bias mitigation or to achieve some type of trade-off between fairness and accuracy of models. For expensive training procedures, however, it may be infeasible to train a new model from scratch to address bias concerns.
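
As a rough sketch of the minimax idea above, the code below alternates between fitting a weighted logistic regression and up-weighting whichever groups currently have higher error, in the spirit of exponentiated-gradient and group-robust training methods. It assumes X, y, and groups are array-like; the number of rounds, the step size eta, and the choice of logistic regression are illustrative placeholders rather than a faithful reproduction of any particular published algorithm.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def minimax_group_training(X, y, groups, rounds=50, eta=1.0):
    """Approximately minimize the maximum per-group error rate.

    A simplified in-processing sketch: keep one multiplicative weight per group,
    refit a weighted logistic regression each round, then up-weight groups in
    proportion to their current error rate.
    """
    X, y, groups = np.asarray(X), np.asarray(y), np.asarray(groups)
    group_ids = np.unique(groups)
    group_weights = {g: 1.0 for g in group_ids}
    model = None
    for _ in range(rounds):
        sample_weight = np.array([group_weights[g] for g in groups])
        model = LogisticRegression(max_iter=1000)
        model.fit(X, y, sample_weight=sample_weight)
        preds = model.predict(X)
        for g in group_ids:
            mask = groups == g
            err = np.mean(preds[mask] != y[mask])
            group_weights[g] *= np.exp(eta * err)  # penalize high-error groups
        # Normalize so the weights stay on a comparable scale across rounds.
        total = sum(group_weights.values())
        group_weights = {g: w * len(group_ids) / total for g, w in group_weights.items()}
    return model
```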

Post-processing approaches take a fully trained model and adjust its outputs to reduce bias. For example, threshold adjustment is a popular approach that applies a different cutoff to a model's score (such as a test score or credit score) for different groups before issuing a positive classification. Other approaches, such as multi-calibration, which has been adopted in medical settings, carefully shift predictions so that they are well calibrated both overall and for intersectional groups. Finally, post-processing could involve last-layer tuning of neural networks or fine-tuning the outputs of large language models. Post-processing approaches tend to be effective at improving overall accuracy and are computationally efficient. However, because the adjustments are typically made as a direct function of sensitive group (or predicted sensitive group) membership, they may not be feasible where such operations are disallowed or not preferred.
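
To make threshold adjustment concrete, the sketch below chooses a separate score cutoff for each group so that positive-prediction rates are roughly equalized; equalizing true-positive rates instead is another common target. The scores, groups, and 0.3 target rate are synthetic and purely illustrative.

```python
import numpy as np

def group_thresholds(scores, groups, target_positive_rate=0.3):
    """Pick a per-group score cutoff so each group's positive rate matches the target.

    A minimal post-processing sketch: the trained model is untouched; only the
    decision threshold applied to its scores differs by group.
    """
    scores, groups = np.asarray(scores), np.asarray(groups)
    thresholds = {}
    for g in np.unique(groups):
        group_scores = scores[groups == g]
        # The (1 - target) quantile leaves roughly target_positive_rate of the group above it.
        thresholds[g] = np.quantile(group_scores, 1.0 - target_positive_rate)
    return thresholds

def classify(scores, groups, thresholds):
    """Apply the per-group thresholds to produce 0/1 decisions."""
    return np.array([int(s >= thresholds[g]) for s, g in zip(scores, groups)])

# Illustrative usage: synthetic scores whose distribution differs by group.
rng = np.random.default_rng(1)
groups = rng.choice(["a", "b"], size=500)
scores = np.where(groups == "a", rng.normal(0.6, 0.1, 500), rng.normal(0.4, 0.1, 500))
thresholds = group_thresholds(scores, groups)
decisions = classify(scores, groups, thresholds)
for g in ["a", "b"]:
    print(g, decisions[groups == g].mean())  # both positive rates land near 0.3
```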

Tackling Algorithmic Discrimination

Algorithmic bias and discrimination are complex topics with many angles – they can occur naturally through the machine learning training process and can be very damaging, especially if we do not anticipate them and do not try to mitigate performance discrepancies. When thinking about how to intervene on algorithmic bias, it can be helpful to think about the three broad types of interventions – pre-processing, in-processing, and post-processing approaches – and carefully consider the pros and cons of each when deciding which to use. They may also be used in conjunction, which leaves ample room for creative problem-solving when tackling machine bias and discrimination.