An algorithm widely used in US hospitals to allocate health care to patients has been systematically discriminating against black people, a sweeping analysis has found.
The study, published in Science on 24 October¹, concluded that the algorithm was less likely to refer black people than equally sick white people to programmes that aim to improve care for patients with complex medical needs. Hospitals and insurers use the algorithm and others like it to help manage care for about 200 million people in the United States each year.
This type of study is rare, because researchers often cannot gain access to proprietary algorithms and the reams of sensitive health data needed to fully test them, says Milena Gianfrancesco, an epidemiologist at the University of California, San Francisco, who has studied sources of bias in electronic medical records. But smaller studies and anecdotal reports have documented unfair and biased decision-making by algorithms used in everything from criminal justice to education and health care.
“It is alarming,” says Gianfrancesco of the latest study. “We need a better way of actually assessing the health of the patients.”
Ziad Obermeyer, who studies machine learning and health-care management at the University of California, Berkeley, and his team stumbled onto the problem while examining the impact of programmes that provide additional resources and closer medical supervision for people with multiple, sometimes overlapping, health problems.
When Obermeyer and his colleagues ran routine statistical checks on data they received from a large hospital, they were surprised to find that people who self-identified as black were generally assigned lower risk scores than equally sick white people. As a result, the black people were less likely to be referred to the programmes that provide more-personalized care.
The researchers found that the algorithm assigned risk scores to patients on the basis of the total health-care costs they accrued in one year, in effect treating spending as a stand-in for health need. They say that this assumption might have seemed reasonable because higher health-care costs are generally associated with greater health needs. The average black person in the data set that the scientists used had similar overall health-care costs to the average white person.
But a closer look at the data revealed that the average black person was also substantially sicker than the average white person, with a greater prevalence of conditions such as diabetes, anaemia, kidney failure and high blood pressure. Taken together, the data showed that the care provided to a black person cost an average of US$1,800 less per year than the care given to a white person with the same number of chronic health problems.
The scientists speculate that this spending gap reflects reduced access to care, driven by the effects of systemic racism, ranging from distrust of the health-care system to direct racial discrimination by health-care providers.
And because the algorithm assigned people to high-risk categories on the basis of costs, those biases were passed on in its results: black people had to be sicker than white people before being referred for additional help. Only 17.7% of patients that the algorithm assigned to receive extra care were black. The researchers calculate that the proportion would be 46.5% if the algorithm were unbiased.
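The mechanism described above, in which a score built on spending inherits the spending gap, can be sketched with a toy Python example. Everything in it is hypothetical: the two groups, the cost formula (a US$2,000 baseline plus US$1,500 per chronic condition) and the four referral slots are assumptions made for illustration, not details of Optum's proprietary algorithm. Only the US$1,800 shortfall echoes the average reported in the study, and the count of chronic conditions stands in for a measure of health need.

```python
from dataclasses import dataclass
from typing import Callable

# Toy illustration only: all groups, costs and quotas below are hypothetical.

@dataclass
class Patient:
    group: str       # hypothetical group label, "A" or "B"
    conditions: int  # number of chronic conditions, used here as the measure of need
    cost: float      # one year of health-care spending, in US$

BASE_COST = 2000.0           # assumed baseline spending per patient
COST_PER_CONDITION = 1500.0  # assumed extra spending per chronic condition
GROUP_B_SHORTFALL = 1800.0   # spending gap at equal need, echoing the study's average

def make_patients() -> list[Patient]:
    """One patient per group at each level of need, from 1 to 6 conditions."""
    patients = []
    for n in range(1, 7):
        cost = BASE_COST + COST_PER_CONDITION * n
        patients.append(Patient("A", n, cost))
        patients.append(Patient("B", n, cost - GROUP_B_SHORTFALL))
    return patients

def refer(patients: list[Patient], score: Callable[[Patient], float], slots: int) -> list[Patient]:
    """Refer the `slots` highest-scoring patients (ties keep input order)."""
    return sorted(patients, key=score, reverse=True)[:slots]

def share(selected: list[Patient], group: str) -> float:
    """Fraction of referred patients who belong to `group`."""
    return sum(p.group == group for p in selected) / len(selected)

def least_sick(selected: list[Patient], group: str) -> int:
    """Fewest chronic conditions among referred patients in `group`."""
    return min(p.conditions for p in selected if p.group == group)

patients = make_patients()
by_cost = refer(patients, lambda p: p.cost, slots=4)        # score = spending
by_need = refer(patients, lambda p: p.conditions, slots=4)  # score = illness

print(share(by_cost, "B"), share(by_need, "B"))            # 0.25 0.5
print(least_sick(by_cost, "A"), least_sick(by_cost, "B"))  # 4 6
```

Ranked by spending, group B fills one of the four referral slots, and its least-sick referred patient has six conditions against four for group A; ranked by the number of conditions, the slots split evenly. Swapping the ranking key is the easy part; choosing a measure of need that does not itself carry the bias is the hard part.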
When Obermeyer and his team reported their findings to the algorithm’s developers — Optum of Eden Prairie, Minnesota — the company repeated their analysis and found the same results. Obermeyer is working with the firm without salary to improve the algorithm.
He and his team collaborated with the company to find variables other than health-care costs that could be used to calculate a person’s medical needs, and repeated their analysis after tweaking the algorithm accordingly. They found that making these changes reduced bias by 84%.
“We appreciate the researchers’ work,” Optum said in a statement. But the company added that it considered the researchers' conclusion to be “misleading”. “The cost model is just one of many data elements intended to be used to select patients for clinical engagement programs, including, most importantly, the doctor's expertise.”
Obermeyer says that using cost prediction to make decisions about patient engagement is a pervasive issue. “This is not a problem with one algorithm, or one company — it’s a problem with how our entire system approaches this problem,” he says.
Finding fixes for bias in algorithms — in health care and beyond — is not straightforward, Obermeyer says. “Those solutions are easy in a software engineering sense: you just rerun the algorithm with another variable,” he says. “But the hard part is: what is that other variable? How do you work around the bias and injustice that is inherent in that society?”
Part of the difficulty stems from a lack of diversity among algorithm designers, and a lack of training about the social and historical context of their work, says Ruha Benjamin, author of Race After Technology (2019) and a sociologist at Princeton University in New Jersey.
“We can’t rely on the people who currently design these systems to fully anticipate or mitigate all the harms associated with automation,” she says.
Developers should run tests such as those performed by Obermeyer’s group routinely before deploying an algorithm that affects human lives, says Rayid Ghani, a computer scientist at Carnegie Mellon University in Pittsburgh, Pennsylvania. That kind of auditing is more common now, he says, since reports of biased algorithms have become more widespread.
“Are more doing it now than used to? Yes,” says Ghani. “Are enough of them doing it? No.”
He thinks that the results of these audits should always be compared with human decision-making before anyone assumes that an algorithm is making things worse. Ghani says that his team has carried out unpublished analyses comparing algorithms used in public health, criminal justice and education with human decision-making. They found that the machine-learning systems were biased, but less so than the people.
“We are still using these algorithms called humans that are really biased,” says Ghani. “We’ve tested them and known that they’re horrible, but we still use them to make really important decisions every day.”
Nature 574, 608-609 (2019)