This is the question we set out to answer in the season finale of Raw Data. From policing to court decisions to how lawyers assemble their cases, algorithms and data science are beginning to make their way into the justice system, and the stakes are enormous.
In the past few decades, the U.S. has constructed a system of mass incarceration. We currently have around 2.2 million people behind bars – and roughly 20% of them are being detained pretrial, when there is still a presumption of innocence. “Mass incarceration,” says Rutgers history professor Donna Murch, “is also associated with the disproportionate incarceration – some people would call it hyper-incarceration – of black people and other populations of color.” The statistics are grim. Black Americans make up only about 12% of the U.S. population, yet they represent roughly 40% of our prison population.
So are new technologies exacerbating this problem or improving the status quo? One important area to examine is predictive policing, in which algorithmic tools are deployed to predict future locations of crimes or to identify future perpetrators. But given that algorithms are trained on historical data – data that holds the weight of hundreds of years of racial discrimination – how do we evaluate the fairness and efficacy of these tools? As sociologist Sarah Brayne of UT Austin told us, we also need to consider what data inputs are missing, which can skew results: “[The] dark figure of crime [is] the gap between actual crime statistics and official statistics. Issues like sexual assault, for example, have a much bigger dark figure, where the gap between the official crime statistics that are inputted into the algorithm and the actual crime rates might be really different.” If the inputs to these predictive policing algorithms are messy and incomplete, it seems inevitable that their ability to improve public safety and reduce crime will be similarly imperfect.
Predictive policing tools are meant to more efficiently bring people into the criminal justice system, but in recent years, a reform movement aimed at reducing mass incarceration has also led to experiments with data science. Of the hundreds of thousands of people in jail pretrial, many pose a low flight risk and low threat to public safety, so how can we help judges improve their decision making when it comes to setting bail?
Some jurisdictions have begun trying out algorithmic risk assessment tools, which assign defendants scores that a judge may then consult when deciding whether or not to release a person on bail. (The level at which bail is set, and whether the defendant can afford it, is a whole other issue – outside the scope of our episode but worth reading up on.) Stanford computational social scientist Sharad Goel tells us that risk assessment software has led to concrete improvements – judges in these jurisdictions have detained fewer defendants pretrial than they otherwise would have. But just as with predictive policing tools, there are many issues with the data inputs used to predict an individual’s risk score. Explicitly asking for the race of a defendant would be illegal and unconstitutional, but some variables, like zip code, can correlate strongly with a person’s race, especially in a country that remains highly segregated.
Last year, ProPublica’s investigation “Machine Bias” found that COMPAS, a commonly used risk assessment software, has a much higher error rate for black defendants than for white defendants: among defendants who ultimately did not reoffend, black defendants were nearly twice as likely as white defendants to be classified as medium or high risk. In a rebuttal in The Washington Post, Sharad and several colleagues argue that ProPublica’s analysis, while robust, does not fully convey the challenge of making a risk assessment algorithm “fair.” The math is a little wonky – I have to admit that while Mike and I were working on this episode, we banged our heads against the desk for days trying to understand the statistical arguments on both sides. (We do recommend reading both original pieces, however.) But Sharad explained to us that estimating risk for black defendants is just an inherently more difficult statistical problem – again, there is a lot of messiness (noise) in the historical data because of over-policing and unfair treatment.
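The tension at the heart of this debate can be made concrete with a small toy example. The numbers below are entirely hypothetical (not COMPAS data): two groups share a risk score that is perfectly calibrated, meaning a given score implies the same reoffense probability in both groups. But because the groups have different base rates of reoffense, the same cutoff produces very different error rates among people who never reoffend.

```python
# Hypothetical toy populations, not COMPAS data.
# Each tuple is (risk score, reoffended?). The score is calibrated: in both
# groups, a score of 7 means a 75% chance of reoffending and a score of 2
# means ~0%. But group B's base rate is higher (0.60 vs 0.30), so more of
# its defendants sit at the high score.
group_a = [(2, False)] * 60 + [(7, False)] * 10 + [(7, True)] * 30
group_b = [(2, False)] * 20 + [(7, False)] * 20 + [(7, True)] * 60

def false_positive_rate(population, threshold):
    """Among people who did NOT reoffend, the share flagged as high risk."""
    non_reoffenders = [score for score, reoffended in population if not reoffended]
    flagged = [score for score in non_reoffenders if score >= threshold]
    return len(flagged) / len(non_reoffenders)

threshold = 5  # one shared, race-blind cutoff for everyone

fpr_a = false_positive_rate(group_a, threshold)  # 10/70 ≈ 0.14
fpr_b = false_positive_rate(group_b, threshold)  # 20/40 = 0.50
```

Here the score is equally trustworthy in the calibration sense for both groups, yet non-reoffending members of group B get flagged at more than three times the rate of group A. That, in rough outline, is the shape of the disagreement: ProPublica measured the second kind of fairness, the rebuttal emphasized the first, and with unequal base rates a single score generally cannot satisfy both at once.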
So, it comes down to this: we have three options when it comes to using risk assessment tools.
- Don’t use an algorithm – in which case, we return to allowing human judges, with their own sets of biases, the discretion to set bail.
- Set different risk thresholds for different race groups – for example, maybe white people should be detained at a risk score of 3 or higher, while black people should be detained only at a risk score of 6 or higher. The problem here is that this blatantly violates the 14th Amendment (equal protection), so it’s not really even an option.
- Set the same threshold for all defendants regardless of race – say, if you score a 5 or higher, you’re detained – even though we now have analysis showing that we would be locking up more low-risk black people and releasing more high-risk white people.
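The decision rules in options 2 and 3 are simple enough to state in a few lines, and doing so makes the contrast stark: the two rules differ only in whether group membership is ever consulted. The threshold values here are the hypothetical ones from the list above, not any jurisdiction's real policy.

```python
# Illustrative sketch only: thresholds are the hypothetical values from the
# text, and "group_a"/"group_b" stand in for the protected groups.
SINGLE_THRESHOLD = 5

def detain_option_3(risk_score: int) -> bool:
    """Option 3: one race-blind cutoff applied to every defendant."""
    return risk_score >= SINGLE_THRESHOLD

def detain_option_2(risk_score: int, group: str) -> bool:
    """Option 2: per-group cutoffs -- an equal-protection violation in
    practice, shown here only to make the contrast with option 3 explicit."""
    thresholds = {"group_a": 3, "group_b": 6}
    return risk_score >= thresholds[group]
```

The single extra input in option 2 is the whole constitutional problem: two defendants with the same risk score receive different outcomes based solely on group membership.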
Unfortunately, none of these options is great: we either give full discretion to judges, adopt an unconstitutional algorithm, or live with a higher error rate for black defendants. The last scenario is our current reality (and what ProPublica’s investigation highlighted). While I understand that the larger question is how we define “fairness” for an algorithm, I also get where ProPublica is coming from emotionally. That core problem – that it is inherently harder to accurately predict risk for a black defendant – is, again, the product of hundreds of years of discrimination combined with public policy, like the War on Drugs, that has targeted black communities. If we assume we do want to use algorithms, and if option 3 is the only viable path, then we are continuing to lock up black Americans at a disproportionate rate. As a society, we remain shackled to our history.
Honestly, my head is still spinning from this episode. The thorny issues that come with using algorithms and data science in the American criminal justice system have reinforced for me the magnitude of the challenge of distilling complex societal, historical, and cultural relationships into 1’s and 0’s. It seems almost an absurd undertaking, but I’m reminded by socially-conscious researchers like Sharad that these tools – even with their problems – can begin to make a positive difference. I don’t want to lose the point that risk assessment algorithms have reduced the number of people being locked up pretrial. The endeavor to create an ever more equal, just, and peaceful society continues, but as citizens we need to stay awake to our own responsibility for the society we live in, and our individual complicity in tragedies like mass incarceration. Ultimately, the dark edge of algorithms is not just in relinquishing decision-making power – it’s also the temptation to shrug off the weight of the consequences that follow.
In the course of reporting this episode, Mike and I drew from a lot of great media, journalism, and scholarly research. As a very short list, our media essentials include watching 13th, a Netflix Original documentary by Ava DuVernay, and reading The New Jim Crow by Michelle Alexander. Stanford recently published a Q&A with Sharad Goel, and we also recommend checking out a short video featuring Sharad and his colleagues titled “Algorithmic decision making and the cost of fairness.” In our podcast episode, you’ll also hear the voice of Angèle Christin, who studies the social impact of algorithms and big data technologies – check out her work as well. Thanks to Donna Murch for making the time to come into the studio on the short day she was visiting Stanford – her work includes Living for the City: Migration, Education, and the Rise of the Black Panther Party in Oakland, California. Keep an eye out for Sarah Brayne – we think she’s doing some really fascinating research, and we’re looking forward to seeing what else she learns about big data and policing. And finally, while this blog post didn’t touch on it, Pablo Arredondo’s company Casetext is also worth checking out if you’re into legal research.
We also want to give special thanks to Professor Michael Eric Dyson of Georgetown University, who we spoke with as we were putting together this episode. Our conversation with Dr. Dyson helped shape and clarify our ideas. We recommend his most recent book, Tears We Cannot Stop: A Sermon to White America.
This episode is the season finale for season two of Raw Data! Mike and I would like to express huge gratitude to our Worldview family for their support. The inimitable Isha Salian was our intern this season – she’s taking a break from us this summer to work as a tech reporter at the San Francisco Chronicle. You go girl! Thanks also to Jackson Roach, who helped edit and score a few episodes this season. And finally, thanks as always to Allison Berke and everyone else at the Stanford Cyber Initiative for making Raw Data possible.
Thank you for listening and following along on our season 2 journey. Mike and I will be working on stories for season three over the summer. In the meantime, please get in touch with any feedback or story ideas. We’re always at firstname.lastname@example.org. Til soon, take care!