Using Big Data To Reform the Criminal Justice System
October 14, 2019 – Criminal Justice

A comprehensive new database designed to evaluate crime and punishment could revolutionize how we make laws and treat offenders.


Imagine a nationwide database of criminal records that policymakers could explore and analyze to further criminal justice reform. Such data could offer insight into how specific policy changes affect crime rates in various parts of the country, or help policymakers track recidivism patterns.

You might well think such a national database already exists, but you’d be wrong. Most criminal records languish in local law enforcement databases, disconnected from each other.

David Eagleman, Ph.D., founder of the Center for Science and Law, is addressing this deficit with an ambitious plan to create a national clearinghouse of crime information that will allow for large-scale, cross-jurisdictional analyses of criminal arrests. His Criminal Records Database (CRD) has the potential to revolutionize the way we craft criminal law and punish and rehabilitate those who break it.

“We’re leveraging big data to understand what legal changes work,” Eagleman says. “When a state changes a law or a prosecutorial policy, it’s rare that anyone takes a look back to understand whether it made things better or worse. Now we can do that.”

A New Way To Look at Crime

The CRD compiles tens of millions of courthouse records collected from major cities – such as Houston, New York City, and Miami – as well as entire states, such as Alabama, New Mexico, Virginia, and North Carolina. The records span from 1977 to present. Years in the making, the CRD will continue to grow as locations are added and current locations are updated. Updates include removing sealed or expunged cases. By translating all the information from different jurisdictions into a common framework, the CRD allows users to make detailed cross-jurisdictional comparisons—something no other criminal records database has been able to accomplish.

Currently, when someone is arrested, personal information and the reason for the arrest are recorded in a local database. If local law enforcement elects to share this information, the FBI can enter it into the Uniform Crime Reports (UCR) system. For quantitative analysis of crime, the UCR has long been the best resource at our disposal, but unfortunately the system is limited. “The UCR only lists the top-level [charges],” Eagleman says. “If you’re arrested on six counts, it will only show the most serious one. And because there are no identifiers, there is no way to analyze recidivism.”

Since the UCR relies on voluntary reporting, the data is patchy rather than comprehensive, and the system lacks detail about individual crimes. Further, the UCR cannot tell you which charges resulted in plea bargains, the corresponding fine for each offense, or the sentence imposed. Without this vital information, it’s difficult to explore which policies are working and which are not. A researcher trying to find links between specific offenses and the probability of recidivism doesn’t get the full picture.

Creating a database that corrects for such issues was no small feat. To build the CRD, Eagleman, a neuroscientist at Stanford University in California, and his team at the Center for Science and Law in Texas used the Freedom of Information Act, submitting requests for arrest records in various counties, cities, and states across the country. This might not seem like such a difficult task by today’s digital standards, but jurisdictions are protective of their data. And then there’s the cost: while filing a Freedom of Information Act request is free, there is often a fee to obtain the actual information once the request has been accepted. But it worked. “After years of banging on doors, we now have full data sets – every crime on record – from counties and states all over the nation,” Eagleman says.

Once the data was acquired, Eagleman’s work was far from complete. A team of lawyers and computer programmers embarked on a multi-year process of sifting through all the records. Many entries contained typos, and different counties had different names and descriptions for the same criminal offenses. Eagleman’s team had to create a common framework that could facilitate these cross-jurisdictional comparisons. “We translated all the data into a single framework, so that all jurisdictions were directly comparable,” Eagleman says.

To streamline the data, the varied criminal offenses were grouped into 30 broad categories and 150 detailed ones. By setting uniform standards, the database allows users to compare the same crimes across all locations. The team used a Google visualization tool that allows users to easily plot various metrics: how specific crimes have changed over time, the percentage of males and females who have committed a given crime, the percentage of those found guilty, which cases were dismissed, who paid for lawyers, who had court-appointed lawyers, and many more. Such high-dimensional data for each crime was never available before the CRD.
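
As a rough illustration of what translating local records into a common framework can involve, the Python sketch below maps jurisdiction-specific offense labels to a two-level category (broad, detailed). Everything in it is hypothetical: the labels, the category names, the lookup table, and the similarity cutoff are invented for this example. The article does not describe how the team actually performed the mapping, and fuzzy string matching is simply one plausible way to absorb typos.

```python
import difflib

# Hypothetical lookup table: cleaned local label -> (broad category, detailed category).
# These names are invented for illustration; they are not the CRD's actual taxonomy.
COMMON_FRAMEWORK = {
    "burglary of habitation": ("Property", "Residential burglary"),
    "breaking and entering": ("Property", "Residential burglary"),
    "poss marijuana": ("Drugs", "Marijuana possession"),
    "possession of marijuana": ("Drugs", "Marijuana possession"),
}


def normalize(label):
    """Lowercase the label, drop periods, and collapse runs of whitespace."""
    return " ".join(label.lower().replace(".", " ").split())


def classify(label, cutoff=0.85):
    """Return (broad, detailed) for a local offense label, or None if unknown.

    Exact matches are tried first; otherwise the closest known label is used,
    provided its similarity ratio is at least `cutoff` (an assumed threshold).
    """
    cleaned = normalize(label)
    if cleaned in COMMON_FRAMEWORK:
        return COMMON_FRAMEWORK[cleaned]
    matches = difflib.get_close_matches(
        cleaned, list(COMMON_FRAMEWORK), n=1, cutoff=cutoff
    )
    return COMMON_FRAMEWORK[matches[0]] if matches else None


print(classify("POSS. MARIJUANA"))         # abbreviation used by one county
print(classify("Burglery of Habitation"))  # typo in the source record
print(classify("jaywalking"))              # not in the table, so None
```

In this sketch, two differently named local offenses land in the same detailed category, which is what makes a like-for-like comparison between jurisdictions possible. Labels that match nothing are returned as None so that a person can review them rather than having them guessed.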

This type of visualization tool becomes useful if, for example, a legislator evaluating a policy proposal wants to see what happened when similar laws were passed elsewhere in the country. The tool easily reveals whether a law actually did what it was intended to do—say, drive crime down. And if a law has an unintended consequence, like shifting crime into a different category, the user can see that too. “We’ve made the database open to the public,” Eagleman says. “With a few clicks, anybody can look this up.”

Another distinctive feature of the CRD is its use of unique identifiers to track specific offenders anonymously. To properly understand recidivism patterns, the database needed to track individuals over time; Eagleman wanted to do this without revealing any identities. To this end, the database assigns a unique number to each person so users can track specific offenders without knowing who they are. “Why do we care about studying recidivism?” Eagleman asks. “Of several people who’ve done the same crime, some will do it again, while others never will. People are different and have different motivations. Understanding the crimes and situations that correlate with high recidivism is critical from courtrooms to legislation.” The CRD helps researchers understand these issues based on data, not intuition.
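
One way to picture anonymous tracking is the Python sketch below, which derives a stable pseudonymous identifier for each person and then finds identifiers that appear on more than one arrest date. The article says only that each person is assigned a unique number; the keyed hash (HMAC-SHA256), the choice of name and date of birth as inputs, the secret key, and the sample records are all assumptions made for illustration, not a description of the CRD's actual method.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret, held privately by the database maintainers. Without it,
# a published identifier cannot be recomputed from a known name and birth date.
SECRET_KEY = b"replace-with-a-private-key"


def pseudonym(name, dob):
    """Return a stable 16-character identifier for a (name, date of birth) pair."""
    message = f"{name.strip().lower()}|{dob}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()[:16]


def repeat_offenders(records):
    """Return the pseudonymous IDs that appear on more than one arrest date.

    `records` is an iterable of (name, dob, arrest_date) tuples.
    """
    arrest_dates = defaultdict(set)
    for name, dob, arrest_date in records:
        arrest_dates[pseudonym(name, dob)].add(arrest_date)
    return {pid for pid, dates in arrest_dates.items() if len(dates) > 1}


# Invented sample records, for illustration only.
records = [
    ("Jane Doe", "1980-01-01", "2001-05-03"),
    ("Jane Doe", "1980-01-01", "2004-11-20"),
    ("John Roe", "1975-07-15", "2002-02-14"),
]
print(len(repeat_offenders(records)))  # number of people with repeat arrests
```

A researcher working with the output sees only the identifiers, so arrests can be linked to the same person over time without a name ever appearing in the published data.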

Why Does This Matter?

As a neuroscientist, Eagleman might not seem the most obvious person to lead the charge to create such a comprehensive criminal records database. But as someone who studies the human brain and human behavior extensively, he has taken a deep dive into criminal behavior in the hope that it will help influence criminal justice reform.

Eagleman is a leader in the field of Neurolaw, which he describes as the “intersection of modern brain science and the criminal justice system.” He hopes the criminal justice system can use the findings of neuroscience to craft more effective policies and guide productive sentencing reform. A big part of bridging these two worlds is data.

The Criminal Records Database (CRD) is supported by a grant from the Charles Koch Foundation.