How to overcome the potential for unintended bias in data algorithms

Lindsey Zuloaga
August 28, 2018 Big Data, Cloud & DevOps


Anyone who has been online recently may have heard that scary, biased algorithms are running wild, unchecked. They are sentencing criminals and deciding who gets fired, who gets hired, who gets loans, and more.

If you read many of the latest articles and books, it is natural to have a visceral negative response. Of course the prospect of racist, sexist robots making important decisions that affect people is terrifying.

While some of the media frenzy is warranted, these issues are not always so clear-cut. Like people, algorithms should not be stereotyped.

Algorithms have the potential to help us overcome rampant human bias. They also have the potential to magnify and propagate that bias. I firmly believe this is a real issue, and it is the duty of data scientists to audit their algorithms for bias.

However, even for the most careful practitioner, there is no clear-cut definition of what makes an algorithm “fair.” In fact, there are many competing notions of fairness, and there are trade-offs among them when it comes to dealing with real-world data.

Let’s talk about three types of algorithms:

  1. The Dream: Algorithms that are audited for bias, adjusted if needed, and still able to predict well without any bias. Everyone is happy!
  2. Problematic Algorithms: Cathy O’Neil’s Weapons of Math Destruction highlights several situations with bad practices: algorithms making high-stakes decisions that are not trained on outcomes, have no transparency, were trained on bad data, use features with poor statistics, are not audited for fairness, and so on.
  3. The Gray Area: Algorithms that are accurate and fair by some definitions of fairness, but not others. 

The less obvious cases described in #3 can get very interesting and controversial. Not all of these algorithms are running wild unchecked, and some have issues that are not the fault of the algorithm, but simply a reflection of what the world is like. 
 

How much are the things that matter in making the decision tied to demographic class?

Let’s say I run a bank and I don’t want to give a home loan (which we will assume is several hundred thousand dollars) to anyone who makes under $15,000 per year. This is a very simple algorithm. Most of us can agree that income is an important factor in the loan decision, but this rule will lead to varied treatment of different classes, since income levels are distributed differently across ethnicity, gender, and age. If the outcome of my decision is that a smaller percentage of one group gets loans compared with another, many people would argue the simple algorithm is unfair.
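To make that concrete, here is a minimal sketch in Python with made-up applicant data; the incomes, group labels, and column names are all hypothetical:

```python
# Minimal sketch of the single-threshold loan rule above, on made-up data.
import pandas as pd

applicants = pd.DataFrame({
    "income": [12_000, 22_000, 48_000, 9_000, 31_000, 14_000, 55_000, 18_000],
    "group":  ["A",    "A",    "A",    "B",   "B",    "B",    "A",    "B"],
})

INCOME_THRESHOLD = 15_000
applicants["approved"] = applicants["income"] >= INCOME_THRESHOLD

# The rule is identical for everyone, yet approval rates differ because the
# income distributions of the two groups differ.
print(applicants.groupby("group")["approved"].mean())
```

Even this toy example shows a gap in approval rates between the two groups, driven entirely by how income is distributed within each group.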

What makes an algorithm “fair?” Let’s say I have a lot more data besides income – things like credit score, job history, etc. I have a large dataset of past outcomes to train an algorithm for future use.

Aiming for accuracy alone will almost definitely result in different treatment of people along age, race, and gender lines. To be fair, should I aim to approve the same percentage of people from each class, even if that means taking some risks?

Alternatively, I could train my algorithm so that, within each class, the same percentage of people who would actually pay back their loan get approved (the true-positive rate, which we can estimate from historical data).

There is a bit of a catch: if I do either of these things, I have to hold the different groups to different standards. Specifically, I would have to say that I will issue a loan to someone of a certain class, but not to someone else of a different class with the exact same credentials, leading to yet another unfair scenario.
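The sketch below makes this trade-off concrete with synthetic data: it compares one shared score threshold against group-specific thresholds and reports the approval rate and true-positive rate per group. The score distributions, threshold values, and group labels are all invented for illustration.

```python
# Synthetic illustration of the trade-off: same standard vs. different standards.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000
group = rng.choice(["A", "B"], size=n)
# Give group B a lower average "repayment score", mimicking unequal base rates.
score = np.where(group == "A",
                 rng.normal(0.60, 0.15, n),
                 rng.normal(0.50, 0.15, n)).clip(0, 1)
repaid = rng.random(n) < score  # higher score -> more likely to repay

df = pd.DataFrame({"group": group, "score": score, "repaid": repaid})

def report(df, thresholds):
    """Print approval rate and true-positive rate per group, given per-group thresholds."""
    for g, sub in df.groupby("group"):
        approved = sub["score"] >= thresholds[g]
        tpr = approved[sub["repaid"]].mean()  # share of actual repayers who get approved
        print(f"group {g}: threshold={thresholds[g]:.2f}  "
              f"approval={approved.mean():.1%}  TPR={tpr:.1%}")

print("Same standard for everyone:")
report(df, {"A": 0.55, "B": 0.55})

print("\nDifferent standards (narrows the approval gap):")
report(df, {"A": 0.58, "B": 0.50})
```

Holding both groups to the same threshold yields different approval and true-positive rates; closing those gaps requires exactly the group-specific standards described above.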

To see how this works with some data, I highly recommend playing with this interactive site created by Google with some artificial credit score data. When determining who gets a loan given that there are two subgroups with different credit score distributions in the data, there is no way to win.

Specifically, there is no situation where you can hold everyone to the same standard (credit score threshold), while also achieving the same approval percentage in each group and the same percentage of true positives (people who should get loans who actually get one).

Data can be biased because it is not diverse or representative, or it can just be “biased” because that is what the world is like; we call this unequal base rates in the data.

Algorithms trained to associate words by their meaning and context (like word2vec) do not strongly associate “woman” and “physicist.” This is not because the people who built the algorithm are biased, but because the percentage of physicists who are women is actually low. Conversely, “woman” and “nurse” are associated more strongly than “man” and “nurse.”
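You can check these kinds of associations yourself with pretrained embeddings, for example via gensim's downloader. This sketch assumes the 'glove-wiki-gigaword-100' vectors (a sizable download on first use); the exact similarity values depend on the corpus and model, so treat the output as illustrative rather than definitive.

```python
# Compare word-pair similarities in pretrained GloVe vectors (via gensim).
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # KeyedVectors; downloaded on first use

for a, b in [("woman", "physicist"), ("man", "physicist"),
             ("woman", "nurse"), ("man", "nurse")]:
    print(f"similarity({a!r}, {b!r}) = {vectors.similarity(a, b):.3f}")
```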

The Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) model was criticized for treating blacks and whites differently. This model was designed to assign a recidivism-risk score to parolees.

The risk score should be taken into account, along with other information, when making a parole decision. Although race was not explicitly part of the data that was fed into the model, non-reoffending blacks were ultimately assigned higher risk scores than whites on average; that is, the two groups had different false positive rates.

Criticism of this model sparked a very interesting conversation around algorithmic fairness. The model is “fair” in one very important way: out of the group assigned x% probability of reoffending, x% of the blacks in the group and x% of the whites in the group actually reoffended (a notion known as “predictive equality”). When this algorithm was trained, it was trained to maximize accuracy and, therefore, public safety.

The critics’ complaint is that the algorithm failed on another notion of fairness, “Equalized Odds,” defined as: the average score received by people who reoffended (or did not reoffend) should be the same in each group.
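To see how these two notions can come apart, here is a rough sketch on fabricated risk scores. The data is synthetic, with unequal base rates baked in, and the score is calibrated by construction, so it passes the first check while still producing different average scores for non-reoffenders in each group.

```python
# Two fairness checks on synthetic recidivism risk scores (all data fabricated).
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 20_000
group = rng.choice(["black", "white"], size=n)
# Unequal base rates are baked into the synthetic data for illustration only.
base_rate = np.where(group == "black", 0.55, 0.40)
score = (base_rate + rng.normal(0, 0.20, n)).clip(0.01, 0.99)
reoffended = rng.random(n) < score  # calibrated by construction

df = pd.DataFrame({"group": group, "score": score, "reoffended": reoffended})

# Check 1 (the "x% of those scored x% reoffend" property): calibration per group.
df["bucket"] = (df["score"] * 10).astype(int) / 10
print(df.groupby(["bucket", "group"])["reoffended"].mean().unstack().round(2))

# Check 2 (part of the Equalized Odds notion quoted above): average score
# among people who did NOT reoffend, per group.
print(df[~df["reoffended"]].groupby("group")["score"].mean().round(3))
```

Despite near-identical calibration curves, the group with the higher base rate ends up with a higher average score among its non-reoffenders, which is exactly the gap critics pointed to.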

As I said, non-reoffending blacks received higher average risk scores than whites. What could the designers of the algorithm do to address this? The problem here comes from the unequal base rates in the data. The reason Equalized Odds is not satisfied stems from the fact that blacks were more likely to reoffend in the past (note: the reason behind why the data is skewed is a bigger societal question that will not be addressed in this post).

To satisfy Equalized Odds here would likely mean sacrificing some Predictive Equality: one way to do it would be to release some higher-risk black people while not granting parole to lower-risk white people. One can see how this system would also be seen as unfair. It would mean that if there were a threshold risk score below which a convict should get parole, that number would be different for blacks and whites.

Satisfying Equalized Odds means that we would have to hold blacks and whites to different standards. This conversation prompted several academic studies, and ProPublica has acknowledged the inherent trade-offs.

Keep in mind that race was not an input to the algorithm. One important issue that could be at play here, however, is whether race was highly correlated with any of the input features. This algorithm would have benefited from thorough auditing to attempt to remove or “repair” features with drastic racial differences.
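One simple form such an audit could take is sketched below: flag numeric input features whose correlation with a protected attribute exceeds some cutoff, then decide whether to drop or repair them. The function name, cutoff, and column names are hypothetical and not part of any particular toolkit.

```python
# Flag input features that look like proxies for a protected attribute.
import pandas as pd

def flag_proxy_features(df: pd.DataFrame, protected: str, cutoff: float = 0.3) -> dict:
    """Return {feature: correlation} for numeric features whose absolute
    correlation with the protected attribute exceeds the cutoff."""
    encoded = pd.get_dummies(df[protected], prefix=protected).astype(float)
    features = df.drop(columns=[protected]).select_dtypes("number")
    flags = {}
    for col in features.columns:
        corr = encoded.corrwith(features[col]).abs().max()
        if corr > cutoff:
            flags[col] = round(corr, 2)
    return flags

# Usage sketch: flagged features could then be removed or "repaired",
# e.g. re-centered within each group, before training the model.
# flagged = flag_proxy_features(training_data, protected="race")
```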

Removing or tweaking features usually leads to decreased accuracy, and in the end, data scientists have to determine the proper level of trade-off. This especially makes sense if we assume the different base rates in the data could be the result of unfair treatment, which is something we do not want to propagate.

Avoiding the base rate issue is much harder in some cases than others. Machine learning in marketing is particularly interesting because it is a field that relies heavily on demographic data (either directly or indirectly) to predict which ads a consumer will likely be interested in.

For example, a lower percentage of women may see a software developer job posting. This comes from the fact that a lower percentage of women are interested in that job (again, the reasons for that are beyond the scope of this post). Expecting total equality here is difficult and would hurt the accuracy of the algorithm.

The next time you read about algorithmic bias, I invite you to consider which definitions of fairness are being highlighted. Is the algorithm treating someone unfairly because of their demographic class (I have a good income and credit score, but the bank won’t loan me money because I have red hair), or because of something else that happens to be correlated with their demographic class (on average, a smaller percentage of loans is issued to redheads than to non-redheads, for reasons other than hair color)?

The first scenario is often what we think of when we talk about bias in everyday language, and should be distinguished from the second scenario, which is more complex and requires research to fully understand.
