Crime prediction software promised to be free of biases. New data shows it perpetuates them

This article was originally published on The Markup by Aaron Sankin, Dhruv Mehrotra for Gizmodo, Surya Mattu, and Annie Gilbertson and was republished under the Creative Commons Attribution-NonCommercial-NoDerivatives license.

Between 2018 and 2021, more than one in 33 U.S. residents were potentially subject to police patrol decisions directed by crime prediction software called PredPol.

The company that makes it sent more than 5.9 million of these crime predictions to law enforcement agencies across the country—from California to Florida, Texas to New Jersey—and we found those reports on an unsecured server.

The Markup and Gizmodo analyzed those reports and found persistent patterns.

Residents of neighborhoods where PredPol suggested few patrols tended to be Whiter and more middle- to upper-income. Many of these areas went years without a single crime prediction.

By contrast, neighborhoods the software targeted for increased patrols were more likely to be home to Blacks, Latinos, and families that would qualify for the federal free and reduced lunch program.

These communities weren’t just targeted more—in some cases they were targeted relentlessly. Crimes were predicted every day, sometimes multiple times a day, sometimes in multiple locations in the same neighborhood: thousands upon thousands of crime predictions over years. A few neighborhoods in our data were the subject of more than 11,000 predictions.

The software often recommended daily patrols in and around public and subsidized housing, targeting the poorest of the poor.

“Communities with troubled relationships with police—this is not what they need,” said Jay Stanley, a senior policy analyst at the ACLU Speech, Privacy, and Technology Project. “They need resources to fill basic social needs.”

Yet the pattern repeated nearly everywhere we looked:

Neighborhoods in Portage, Mich., where PredPol recommended police focus patrols have nine times the proportion of Black residents as the city average. Looking at predictions on a map, local activist Quinton Bryant said, “It’s just giving them a reason to patrol these areas that are predominantly Black and Brown and poor folks.”

In Birmingham, Ala., where about half the residents are Black, the areas with the fewest crime predictions are overwhelmingly White. The neighborhoods with the most have about double the city’s average Latino population. “This higher density of police presence,” Birmingham-based anti-hunger advocate Celida Soto Garcia said, “reopens generational trauma and contributes to how these communities are hurting.”

In Los Angeles, even when crime predictions seemed to target a majority-White neighborhood, like the Northridge area, they were clustered on the blocks that are almost 100 percent Latino. The neighborhoods in the city where the software recommended police spend the most time were disproportionately poor and more heavily Latino than the city overall. “These are the areas of L.A. that have had the greatest issues of biased policing,” said Thomas A. Saenz, president and general counsel of the L.A.-based Latino civil rights group MALDEF.

About 35 miles outside of Boston, in Haverhill, Mass., PredPol recommended police focus their patrols in neighborhoods that had three times the Latino population and twice the low-income population as the city average. “These are the communities that we serve,” said Bill Spirdione, associate pastor of the Newlife Christian Assembly of God and executive director of the Common Ground food pantry.

In the Chicago suburb of Elgin, Ill., neighborhoods with the fewest crime predictions were richer, with a higher proportion than the city average of families earning $200,000 a year or more. The neighborhoods with the most predictions didn’t have a single such family; instead, they had twice as many low-income residents and more than double the percentage of Latino residents as the city average. “I would liken it to policing bias-by-proxy,” Elgin Police Department deputy chief Adam Schuessler said in an interview. The department has stopped using the software.

Overall, we found that the fewer White residents who lived in an area—and the more Black and Latino residents who lived there—the more likely PredPol would predict a crime there. The same disparity existed between richer and poorer communities.

Increase or decrease of populations compared to overall jurisdiction, averaged across all 38 jurisdictions. Sources: The Markup, PredPol, U.S. Census Bureau

“No one has done the work you guys are doing, which is looking at the data,” said Andrew Ferguson, a law professor at American University who is a national expert on predictive policing. “This isn’t a continuation of research. This is actually the first time anyone has done this, which is striking because people have been paying hundreds of thousands of dollars for this technology for a decade.”

It’s impossible for us to know with certainty whether officers spent their free time in prediction areas, as PredPol recommends, and whether this led to any particular stop, arrest, or use of force. The few police departments that answered that question either said they couldn’t recall or that it didn’t result in any arrests, and the National Association of Criminal Defense Lawyers said its members are not informed when crime prediction software leads to charges.

Jumana Musa, director of that group’s Fourth Amendment Center, called the lack of information a “fundamental hurdle” to providing a fair defense.

“It’s like trying to diagnose a patient without anyone fully telling you the symptoms,” Musa said. “The prosecution doesn’t say, ‘The tool that we purchased from this company said we should patrol here.’ ”

That’s because they don’t know either, according to the National District Attorneys Association, which polled a smattering of members and found that none had heard of it being part of a case.

Only one of 38 law enforcement agencies in our analysis, the Plainfield Police Department in New Jersey, provided us with more than a few days of PredPol-produced data indicating when officers were in prediction zones—and that data was sparse. None of it matched perfectly with arrest reports during that period, which were also provided by the agency.

Reports “Found on the Internet”

We found the crime predictions for our analysis through a link on the Los Angeles Police Department’s public website, which led to an open cloud storage bucket containing PredPol predictions for not just the LAPD but also dozens of other departments. When we downloaded the data on Jan. 31, 2021, it held 7.4 million predictions dating back to Feb. 15, 2018. Public access to that page is now blocked.

We limited our analysis to U.S. law enforcement agencies with at least six months of predictions and removed predictions generated outside of contract dates, which were likely testing or trial periods. That left 5.9 million predictions provided to 38 agencies over nearly three years.
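The two filtering rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: the record layout, the `contracts` lookup, the function name, and the 183-day stand-in for "six months" are all assumptions, as is the choice to apply the contract-date filter before measuring each agency's span.

```python
from datetime import date

MIN_DAYS = 183  # rough stand-in for "six months"; the exact cutoff is an assumption


def filter_predictions(predictions, contracts):
    """Keep in-contract predictions for agencies with at least MIN_DAYS of them.

    predictions: list of (agency, day) tuples (hypothetical layout).
    contracts: dict mapping agency -> (start_date, end_date), inclusive.
    """
    # Drop predictions generated outside an agency's contract dates.
    in_contract = [
        (agency, day)
        for agency, day in predictions
        if agency in contracts
        and contracts[agency][0] <= day <= contracts[agency][1]
    ]
    # Measure each agency's span from its first to its last remaining prediction.
    spans = {}
    for agency, day in in_contract:
        first, last = spans.get(agency, (day, day))
        spans[agency] = (min(first, day), max(last, day))
    keep = {a for a, (first, last) in spans.items() if (last - first).days >= MIN_DAYS}
    return [(agency, day) for agency, day in in_contract if agency in keep]
```

In this sketch, an agency whose in-contract predictions cover only a few weeks is dropped entirely, which is one plausible reading of the six-month rule.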

PredPol, which renamed itself Geolitica in March, criticized our analysis as based on reports “found on the internet.” But the company did not dispute the authenticity of the prediction reports, which we provided, acknowledging that they “appeared to be generated by PredPol.”

Company CEO Brian MacDonald said our data was “incomplete,” without further explanation, and “erroneous.” The errors, he said, were that one department inadvertently doubled up on some shifts, resulting in additional predictions, and that the data for at least 20 departments in the cache included predictions that were made after the contract period and not delivered to the agencies.

We explained that we had already discovered date discrepancies for exactly 20 departments and were not using that data in our final analysis, and volunteered to share the analysis dates with him for confirmation. He instead offered to let us use the software for free on publicly available crime data, rather than report on the data we had gathered. After we declined, he did not respond to further emails.

Only 13 law enforcement agencies responded to requests for comment about our findings and related questions, most with a written statement indicating they no longer use PredPol.

One exception was the Decatur Police Department in Georgia. “The program as well as the officers’ own knowledge of where crime is occurring assists our department in utilizing our patrol resources more efficiently and effectively,” public information officer Sgt. John Bender said in an emailed statement. A third of Decatur’s low-income households were in a pair of neighborhoods that were each the subject of more than 11,000 crime predictions in two years.

As the percentage of households making less than $45,000 a year went up, so did predictions. Sources: The Markup, PredPol, U.S. Census Bureau

Except for Elgin, whose deputy chief called the software “bias by proxy,” none of the 38 agencies that used PredPol during our analysis period expressed concern about the stark demographic differences between the neighborhoods that received the most and least predictions.

We asked MacDonald whether he was concerned about the race and income disparities. He didn’t address those questions directly but rather said the software mirrored reported crime rates “to help direct scarce police resources to protect the neighborhoods most at risk of victimization.” The company has long held that because the software doesn’t include race or other demographic information in its analysis, it “eliminates the possibility for privacy or civil rights violations seen with other intelligence-led or predictive policing models.”

Founders Knew About Disparities

Yet according to a research paper, PredPol’s co-founders determined in 2018 that the algorithm would have targeted Black and Latino neighborhoods up to 400 percent more than White neighborhoods in Indianapolis had it been used there.

The company did not provide the study to its law enforcement clients, MacDonald said, because it “was an academic study conducted independently of PredPol.” The authors presented the paper at an engineering conference that’s not part of the usual police circuit, the 2018 IEEE International Conference on Systems, Man and Cybernetics.

The study authors developed a potential tweak to the algorithm that they said resulted in a more even distribution of crime predictions. But they found its predictions were less in line with later crime reports, making it less accurate than the original algorithm, although still “potentially more accurate” than human predictions.

MacDonald said the company didn’t adjust its software in response.

“Such a change would reduce the protection provided to vulnerable neighborhoods with the highest victimization rates,” he said.

While MacDonald responded to some written questions by email, none of the company’s leaders would agree to an interview for this story.

To use PredPol’s algorithm, police departments set up an automatic feed of crime reports, which experts and police said include incidents reported by both the public and by officers, and choose which crimes they want predicted. The algorithm uses three variables to come up with future crime predictions: the date and time, the location, and the type of past crime reports.

The predictions consist of 500-by-500-foot boxes marked on a map listing the police shift during which the crimes are most likely to occur. PredPol advises officers to “get in the box” during free time. Officials in some cities said officers frequently drove to prediction locations and completed paperwork there.
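To make the box geometry concrete, here is a small sketch of how a location could be assigned to a 500-by-500-foot grid cell. It illustrates only the grid idea, not PredPol's software: the function names, the flat coordinate system measured in feet from an arbitrary origin, and the grid alignment are all assumptions.

```python
import math

CELL_FEET = 500  # box side length given in the article


def box_for(x_feet, y_feet, cell=CELL_FEET):
    """Return the (column, row) index of the box containing a point.

    Coordinates are assumed to be feet east and north of an arbitrary origin.
    """
    return (math.floor(x_feet / cell), math.floor(y_feet / cell))


def box_bounds(col, row, cell=CELL_FEET):
    """Return (min_x, min_y, max_x, max_y) of a box, in feet."""
    return (col * cell, row * cell, (col + 1) * cell, (row + 1) * cell)
```

Under these assumptions, two incidents a few hundred feet apart can fall in the same box or in adjacent ones depending on where the grid lines happen to sit.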

Who Reports Crime?

In his email to The Markup and Gizmodo, MacDonald said the company’s choice of input data ensures the software’s predictions are unbiased.

“We use crime data as reported to the police by the victims themselves,” he said. “If your house is burglarized or your car stolen, you are likely to file a police report.”

But that’s not always true, according to the federal Bureau of Justice Statistics (BJS). The agency found that only 40 percent of violent crimes and less than a third of property crimes were reported to police in 2020, which is in line with prior years.

The agency has found repeatedly that White crime victims are less likely to report violent crime to police than Black or Latino victims.

In a special report looking at five years of data, BJS found an income pattern as well. People earning $50,000 or more a year reported crimes to the police 12 percent less often than those earning $25,000 a year or less.

This disparity in crime reporting would naturally be reflected in predictions.

“There’s no such thing as crime data,” said Phillip Goff, co-founder of the nonprofit Center for Policing Equity, which focuses on bias in policing. “There is only reported crime data. And the difference between the two is huge.”

Source: U.S. Department of Justice Bureau of Justice Statistics

Sources: The Markup, PredPol, law enforcement agencies

MacDonald didn’t respond to questions about these studies and their implications, but PredPol’s founders acknowledged in their 2018 research paper that place-based crime prediction algorithms can focus on areas that are already receiving police attention, creating a feedback loop that leads to even more arrests and more predictions there.
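The feedback loop the founders describe can be illustrated with a toy model. The sketch below is not PredPol's algorithm, and every number in it is hypothetical. It assumes two areas where the public reports incidents at the same rate, a single patrol that is always sent to the area with the most recorded incidents, and a fixed number of extra officer-generated reports wherever the patrol goes.

```python
def simulate(periods, initial_reports, public_reports=10, patrol_reports=5):
    """Toy feedback loop: patrols follow recorded incidents, and patrols add records.

    initial_reports: starting recorded-incident count per area (hypothetical).
    public_reports: reports added to every area each period (same everywhere).
    patrol_reports: extra reports added only to the patrolled area.
    """
    recorded = list(initial_reports)
    patrol_history = []
    for _ in range(periods):
        # Send the patrol to the area with the most recorded incidents
        # (ties go to the lower-numbered area).
        target = recorded.index(max(recorded))
        patrol_history.append(target)
        for area in range(len(recorded)):
            recorded[area] += public_reports
        recorded[target] += patrol_reports
    return recorded, patrol_history
```

Starting from `[11, 10]`, a difference of a single report, the patrol goes to the first area in every period and the gap in recorded incidents widens each time, even though the public reports the same number of incidents in both areas.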

We examined more than 270,000 arrests in the 11 cities using PredPol that provided those records to us (most refused) and found that locations with lots of predictions tended to have high arrest rates in general, suggesting the software was largely recommending officers patrol areas they already frequented.

Five cities provided us with data on officer use of force, and we found a similar pattern. In Plainfield, per capita use-of-force rates were nearly double the city average in the neighborhoods with the most predictions. In Niles, Ill., per capita use of force was more than double the city average in high-prediction neighborhoods. In Piscataway, N.J., the arrest rate was more than 10 times the city average in those neighborhoods.

“It’s a reason to keep doing what they’re already doing,” said Soto Garcia, the Birmingham-based activist, “which is saying, ‘This area sucks.’ And now they have the data to prove it.”

Take the 111-unit Buena Vista low-income housing complex in Elgin. Six times as many Black people live in the neighborhood where Buena Vista is located as the city average.

Police made 121 arrests at the complex between Jan. 1, 2018, and Oct. 15, 2020, according to records provided by the city, many for domestic abuse, several for outstanding warrants, and some for minor offenses, including a handful for trespassing by people excluded from the complex.

Those incidents, along with 911 calls, fed the algorithm, according to Schuessler, the Elgin Police Department’s deputy chief.

As a result, PredPol’s software predicted that burglaries, vehicle crimes, robberies, and violent crimes would occur there every day, sometimes multiple times a day—2,900 crime predictions over 29 months.

By comparison, the software only predicted about 5 percent as many crimes, 154, in an area about four miles north of Buena Vista where White residents are the majority.

Schuessler said police spent a lot of time at Buena Vista because of a couple of police programs focused on the complex, not software predictions.

Proportion of neighborhoods’ race and ethnicity, averaged across 38 jurisdictions. Sources: The Markup, PredPol, U.S. Census Bureau

Steep Consequences

Frequent police presence at Buena Vista, whatever led them there, had steep consequences for one family.

Brianna Hernandez had spent two years on a waiting list to get into Buena Vista. When she found an intent-to-evict notice on her door last year, she said she broke down in tears in the kitchen that would no longer be hers. It was November 2020. Daily COVID-19 infection rates in Illinois had spiked to an all-time high, and hospitals were stuffed to capacity with the sick and the dying.

A few months earlier, Hernandez’s longtime boyfriend Jonathan King had stopped by Buena Vista to drop off cash for expenses for her and their three small children.

He was sitting on her car in the parking lot, waiting, when officer Josh Miller of the police department’s Crime Free Housing Unit rolled by in an unmarked car.

“You know you’re not supposed to be here, right?” King remembers Miller asking him.

The city’s crime-free housing ordinance requires all leases to allow eviction if the renters, their relatives, or guests are involved in criminal activity, even nearby, and allows the city to punish landlords that don’t deal with it.

King, now 31, said Buena Vista had banned him years before when he was on parole for a robbery he committed as a minor in Chicago 14 years earlier.

“They told him that once you got off probation you would be able to come back,” Hernandez said. “Apparently, that didn’t happen.”

It was King’s third arrest for trespassing at Buena Vista. He ran for it, and when officers caught up to King, they said they found a gun nearby, which King denies belongs to him. Miller arrested him for trespassing and weapons possession. The arrest came at the time of a PredPol prediction, but Schuessler said that’s not what led to it. That case is still pending.

“I know he’s banned, but what can a man do?” Hernandez asked. “He has kids.”

She said the arrest led to the eviction notice from Buena Vista. (Buena Vista wouldn’t confirm or deny it.) Hernandez remembers her 4-year-old and 5-year-old children asking, “Why are we going to a hotel?” and struggling for an answer. “They want to know why we’re moving stuff out. Why this and why that…. I wanted to sit down and cry.”

Robert Cheetham, the creator of a PredPol competitor, HunchLab, said he wrestled with the vicious cycle crime prediction algorithms could create.

“We felt like these kinds of design decisions mattered,” he said. “We wanted to avoid a situation where people are using the patrol area maps as an excuse for being around too much and in a way that wouldn’t necessarily be helpful.” He said his company tried to solve the problem by evening out the number of predictions delivered to each neighborhood.

Advocates we spoke to in at least six cities were unaware PredPol’s software was being used locally. Even those involved in government-organized social justice committees said they didn’t have a clue about it.

“It did not come up in our meetings,” said Kenneth Brown, the pastor of Haverhill’s predominantly Black and Latino Calvary Baptist Church, who chaired a citywide task force on diversity and inclusion last year.

Calcasieu Parish, La., which started receiving predictions on April 9, 2019, refused to confirm it was using the software. Robert McCorquodale, an attorney with the sheriff’s office who handles public records requests, cited “public safety and officer safety” as the reasons and said that, hypothetically, he wouldn’t want would-be criminals to outwit the software.

“I don’t confess to be an expert in this area,” he said, “but I feel like this is not a public record.”

We kept Calcasieu in our data because its predictions began in the middle of our analysis period and continued until the end, suggesting it is a legitimate new client. Calcasieu’s predictions were not among the most disparate in our data, and removing them would not have meaningfully altered the results of our analysis.

Drug and Sex Crime Predictions

The Markup and Gizmodo also found that some policing agencies were using the software to predict types of crime that PredPol advises against predicting. These include drug crimes, which research has shown are not equally enforced, and sex crimes. MacDonald said the company advises clients against trying to predict either.

We found four municipalities used PredPol to predict drug crimes between 2018 and 2021: Boone County, Ind.; Niles, Ill.; Piscataway; and Clovis, Calif. Clovis was also one of three departments using the software to predict sexual assaults. The other two were Birmingham and Fort Myers, Fla.

When we asked MacDonald about it, he said policing agencies make their own decisions on how to use the software.

“We provide guidance to agencies at the time we set them up and tell them not to include event types without clear victimization that can include officer discretion, such as drug-related offenses,” he wrote. “If they decide to add other event types later that is up to them.”

Thomas Mosier, the police chief in Piscataway, said in an interview that he doesn’t recall receiving any instructions about not predicting certain crime types. The other agencies declined to comment about it or ignored our questions altogether.

Nearly every agency also combined fundamentally different crime types into a single prediction. For instance, authorities in Grass Valley, Calif., mixed assaults and weapons crimes with commercial burglaries and car accidents.

MacDonald said “research and data support the fact that multiple crime types can be concentrated in specific crime hotspots.”

Christopher Herrmann, a criminologist at the John Jay College of Criminal Justice, disagreed.

“Crime is very specific,” Herrmann said. “A serial murderer is not going to wake up one day and start robbing people or start stealing cars or selling drugs. The serial shoplifter isn’t going to start stealing cars. A serial rapist isn’t going to start robbing people.”

A study looking at crime patterns in Philadelphia found that “hot spots of different crime types were not found to overlap much,” and a 2013 book about predictive policing published by the RAND Corporation recommended against mixing crimes for predictions.

“The Wrong Place at the Wrong Time”

When we asked police departments that made arrests at the time and locations of PredPol predictions whether the software had brought them to the locations, they generally wouldn’t comment.

Corey Moses, for instance, was stopped by the LAPD on Feb. 11, 2019, for smoking a Newport cigarette in a nonsmoking area by a train station in MacArthur Park during a crime prediction period there. The officer ran Moses’s name and discovered he had a warrant for an unpaid fine for fare evasion. Moses was cuffed, searched, and thrown in jail for the night.

“Sometimes you gotta really be doing some stupid stuff for the police to bother you, and then sometimes you don’t,” said Moses, who is Black and 41 years old. “You can just be at the wrong place at the wrong time.”

The LAPD didn’t respond to questions about whether the officer was responding to a PredPol prediction.

We did not try to determine how accurately PredPol predicted crime patterns. Its main promise is that officers responding to predictions prevent crimes by their presence.

But several police departments have dropped PredPol’s software in recent years, saying they didn’t find it useful or couldn’t judge its effectiveness. These include Piscataway; West Springfield, Mass.; and Los Angeles, Milpitas, and Tracy, Calif.

“As time went on, we realized that PredPol was not the program that we thought it was when we had first started using it,” Tracy Police Department chief of staff Sgt. Craig Koostra said in a written statement. He did not respond to a request to elaborate.

Some agencies soured on the software quickly. In 2014, a year after signing up, Milpitas Police Department lieutenant Greg Mack wrote in an evaluation that the software was “time consuming and impractical” and found no evidence that using it significantly lowered crime rates.

In his email, MacDonald declined to provide the number of clients the company has now or had during the analysis period but stated that the number of U.S. law enforcement agencies in our analysis was not an accurate count of its clients since 2018. Of the 38 U.S. law enforcement agencies in our analysis, only 15 are still PredPol customers—and two of those said they aren’t using the software anymore, despite paying for it.

Sources: The Markup, PredPol, various law enforcement agencies, media reports

Even PredPol’s original partner, the LAPD, stopped using the software last year.

The department said it was a financial decision due to budget constraints. But it came after the LAPD’s inspector general said it couldn’t determine if the software was effective and members of the Stop LAPD Spying Coalition protested at a police commission meeting, waving signs reading “Data Driven Evidence Based Policing = Pseudoscience” and “Crime Data Is Racist.”

The result was an end to a relationship begun under former police chief Bill Bratton, who had sent one of his lieutenants to UCLA to find interesting research that could be applied to crime-fighting. He ran across P. Jeffrey Brantingham, an anthropologist whose early work involved devising models for how ancient people first settled the Tibetan plateau.

“Each time mathematics interfaces itself with a new discipline, it is invigorated and renewed,” Brantingham and PredPol co-founder George Mohler, now a computer scientist at Indiana University–Purdue University Indianapolis, wrote in a National Science Foundation grant application in 2009. Brantingham’s parents were academics who pioneered the field of environmental criminology, the study of the intersection of geography and crime. And he said he learned a lot at their feet.

“I didn’t realize it, but I was accumulating knowledge by osmosis, hearing about crime and criminal behavior while spending time with my parents,” Brantingham said in a 2013 profile in UCLA’s student newspaper.

“Criminals are effectively foragers,” he added. “Choosing what car to steal is like choosing which animal to hunt.”

Collaborating with LAPD burglary detectives, Brantingham and Mohler developed an algorithm to predict property crime and tested it out. It was credited with lowering property crimes by 9 percent in the division using it, while these crimes rose 0.2 percent in the rest of the city.

The academic research that led to PredPol was funded by more than $1.7 million in grants from the National Science Foundation. UCLA Ventures and a pair of executives from telephone headset manufacturer Plantronics invested $3.7 million between 2012 and 2014 to fund the nascent commercial venture.

Around the same time, the U.S. Department of Justice began encouraging law enforcement agencies to experiment with predictive policing. It has awarded grants to at least 11 cities since March 2009, including PredPol clients in Newark, N.J.; Temple Terrace, Fla.; Carlsbad and Alhambra, Calif.; and the LAPD, which received $3 million for various projects.

But PredPol has now lost luster in academic circles: Last year, more than 1,400 mathematicians signed an open letter begging their colleagues not to collaborate on research with law enforcement, specifically singling out PredPol. Among the signatories were 13 professors, researchers, and graduate students at UCLA.

MacDonald in turn criticized the critics. “It seems irresponsible for an entire profession to say they will not cooperate in any way to help protect vulnerable communities,” he wrote in his email to The Markup and Gizmodo.

Here to Stay

Ferguson, the American University professor, said that whatever PredPol’s future, crime predictions made by software are here to stay—though not necessarily as a standalone product. Rather, he said, it’s becoming part of a buffet of police data offerings from larger tech firms, including Oracle, Microsoft, Accenture, and ShotSpotter, which uses sound detection to report gunshots and bought the crime prediction software HunchLab.

When we reached out to those companies for comment, all except Oracle, which declined to comment, backed away from being associated with “predictive policing,” even though in the past all of them had pitched or publicized their products being used for it and HunchLab was a PredPol competitor.

PredPol’s original name was formed from the words predictive and policing, but even it is now distancing itself from the term—MacDonald called it a “misnomer”—and is branching out into other data services, shifting its focus to patrol-officer monitoring during its rebranding this year as Geolitica.

And that, too, was Ferguson’s point.

“These big companies that are going to hold the contracts for police [data platforms] are going to do predictive analytics,” Ferguson said.

“They’re just not going to call it predictive policing,” he added. “And it’s going to be harder to pull apart for journalists and academics.”