More than thirty years have passed since evidence first emerged that pulse oximeters are less accurate for racial and ethnic minoritized groups1. Yet, testimony before the US Food and Drug Administration (FDA) in November 2022 made clear that new generations of pulse oximeters continue to perpetuate the problem, generating flawed readings that “most likely contributed to the several-fold greater number of deaths in COVID-19 in ethnic minority patients than in white patients,” according to Dr. Amal Jubran, a pulmonary critical care doctor at Loyola Medicine in Chicago1. As of 2022, the FDA’s approval process for medical devices and software does not consider race or ethnicity, with harmful, even fatal, consequences for marginalized groups2. The patient populations on which the FDA currently assesses medical devices before they are brought to market are primarily white men and, to a lesser extent, white women2. In its current form, the FDA’s approval process lacks a health equity focus, as demonstrated by the manifold disparities that arise from medical imaging devices, pulse oximeters and infrared thermometers, which have been deployed without being tested on representative cohorts of patients and have consequently produced erroneous readings3,4,5. Despite these well-documented disparities, the most recent FDA guidelines to improve trial diversity lack concrete accountability measures2. To achieve health equity, the FDA should mandate that manufacturers test their medical devices and software on diverse patient populations. In addition, companies should disclose the composition of the patient cohorts that participated in the design and calibration of each device or software product.

A recent study found that artificial intelligence (AI) deep learning models can identify the race of a patient solely from medical images: chest X-rays (area under the receiver operating characteristic curve (AUC): 0.91–0.99), CT scans (AUC: 0.87–0.96) and mammograms (AUC: 0.81)3. The risk inherent in such imaging devices is that AI models that take medical images as input can conceivably leverage race as a feature when informing decisions, thereby perpetuating or exacerbating existing racial disparities. For medical imaging in particular, this risk is compounded by the fact that human experts cannot identify a patient’s race from medical images, which means that human oversight of AI models is of limited use in mitigating the problem3. This creates an enormous risk for all model deployments in medical imaging: if an AI model relies on its ability to detect racial identity when making medical decisions, and in doing so produces race-specific errors, clinical radiologists (who do not typically have access to racial demographic information) cannot account for those errors, and such factors may unduly influence care decisions.
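To make this risk auditable before deployment, a development team could test whether race is recoverable from a model’s internal representations. The sketch below is illustrative only, not a method from the cited study: the embeddings and labels are synthetic placeholders, and in a real audit the embeddings would come from the imaging model’s penultimate layer and the labels from self-reported patient records.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder inputs: in practice, `embeddings` would be the imaging model's
# penultimate-layer activations and `race_labels` self-reported race from
# patient records (binarized here for simplicity).
embeddings = rng.normal(size=(1000, 128))
race_labels = rng.integers(0, 2, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    embeddings, race_labels, test_size=0.3, random_state=0, stratify=race_labels
)

# A deliberately simple linear probe: if even this recovers race with high
# AUC, the representation plainly encodes it as a feature.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
auc = roc_auc_score(y_test, probe.predict_proba(X_test)[:, 1])

# An AUC near 0.5 means race is not linearly recoverable from these
# embeddings; values approaching the 0.81-0.99 range reported above would
# flag the model for further scrutiny.
print(f"Race-probe AUC: {auc:.2f}")
```

A probe audit of this kind is the sort of quantitative evidence that a diversity-aware premarket review, as proposed below, could require manufacturers to report.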

Similarly, a recent study on pulse oximeters found that Black, Hispanic and Asian patients treated in intensive care units have larger discrepancies than white patients between the oxygen saturation levels detected by pulse oximeters and those measured by blood tests1. Falsely elevated readings heighten the risk of hidden hypoxemia, which occurs at a higher incidence among racial and ethnic minority groups and is associated with higher mortality rates1. Additionally, many clinical AI models rely on oxygen saturation readings to inform diagnoses, which makes such downstream errors difficult to detect. Infrared thermometers mirror the same pattern of error4. A recent study from Emory University showed that, because of differences in skin pigmentation, forehead thermometers had a 26% lower chance of detecting fever in Black patients than oral thermometers, leading to missed fevers, delayed diagnoses and antibiotic treatment, and increased death rates compared with white patients4. Medical devices and software ultimately generate racial bias because manufacturers do not design and calibrate them on diverse patient cohorts before deploying them in clinical settings. As such, it is imperative to create approval processes that ensure broader representation of patients during clinical trials.
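As an illustration of how such a calibration gap could be surfaced during device evaluation, the hedged sketch below compares pulse oximeter readings against paired arterial blood gas values by patient group. The column names and example data are hypothetical; the hidden (occult) hypoxemia definition used here, an arterial saturation below 88% despite a pulse oximeter reading of 92% or higher, follows one commonly cited threshold, and an actual audit would apply whatever thresholds the study protocol specifies.

```python
import pandas as pd

# Hypothetical paired readings: `spo2` from the pulse oximeter and `sao2`
# from an arterial blood gas test, both in percent.
readings = pd.DataFrame({
    "race": ["White", "White", "Black", "Black", "Asian", "Hispanic"],
    "spo2": [96, 93, 95, 92, 94, 93],
    "sao2": [95, 92, 87, 86, 87, 88],
})

# Per-reading overestimation by the oximeter, and a flag for hidden
# hypoxemia: the device reports >= 92% while true saturation is < 88%.
readings["bias"] = readings["spo2"] - readings["sao2"]
readings["hidden_hypoxemia"] = (readings["spo2"] >= 92) & (readings["sao2"] < 88)

# Mean overestimation and hidden-hypoxemia rate per group: a systematically
# larger value for any group indicates the calibration gap described above.
summary = readings.groupby("race")[["bias", "hidden_hypoxemia"]].mean()
print(summary)
```

Requiring this kind of stratified error reporting across a representative cohort is one concrete way the approval process could operationalize the mandate argued for in this piece.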

In its 2022 report entitled “Artificial Intelligence and Machine Learning (AI/ML) Software as a Medical Device Action Plan,” the FDA anticipates that medical devices and software will require a specialized review process, as existing processes were not designed for these technologies2. The FDA calls for a process similar to its most stringent premarket pathway, premarket approval (PMA), which every manufacturer must complete to scientifically demonstrate device safety and effectiveness2. We recommend an extension to the PMA application in the subsection on clinical investigations2. Although the clinical investigation section must adhere to specific requirements promulgated by the FDA, none of these requirements considers that disease may affect different populations differently. And although the FDA recommends that sponsors include clinically relevant populations no later than the end of phase two, it does not mandate this2. Instead, the FDA emphasizes that the independent ethics committee (IEC), the committee of outside experts tasked with ensuring subject safety and rights, should include several members who reflect the sociocultural diversity of the communities from which research participants are drawn. The clinical investigation section of the PMA could be improved in terms of its study protocols, patient information and study design (Table 1).

Table 1 Suggested extensions of FDA premarket approval pathway requirements to advance health equity

Efforts to improve representation in clinical testing are not unprecedented; in 2020, the FDA offered nonbinding recommendations to the pharmaceutical industry to increase the diversity of vaccine clinical trials after COVID-19 disproportionately harmed people of colour (ref. 5). In their COVID-19 vaccine trials, Pfizer and Moderna sought to represent the demographic spread of the US population, in part by partnering with historically Black colleges and universities5. Moderna also slowed its enrollment to ensure that thresholds of representation were met for various minority groups5. Ultimately, these efforts led Pfizer and Moderna to enroll a heterogeneous mixture of clinical trial participants that mirrored the racial composition of the US within a margin of 2–3% for racial minorities5. These efforts are laudable, but since the approval of the COVID-19 vaccines, the FDA’s recommendation has largely been met with words instead of actions, with many organizations merely acknowledging the importance of diversity2. Moreover, a complex interplay of factors limits minority inclusion in trials, including a lack of awareness of the issue among investigators and historically rooted mistrust among patients5. For instance, physicians have been shown to present patients of colour with opportunities to join clinical trials less frequently than white patients5. Enrollment efforts often lack cultural appropriateness or fail to address disparities in language and health literacy5. Finally, trials tend to avoid recruiting from hospitals that are more likely to care for people of colour, because those patients are less likely to have health insurance5.

To facilitate representative samples of racially diverse populations, as seen in the Pfizer and Moderna COVID-19 trials, the FDA should foster collaborations between pharmaceutical companies and minority-serving institutions to enroll additional patients, and should extend development timelines so that recruitment of underrepresented patients can be prioritized. While the roll-out of medical devices and software offers society an array of new technologies, the pharmaceutical industry and the FDA must acknowledge the barriers that the most vulnerable patients face. To realize the equity in health outcomes that all patients deserve, the FDA must require that medical devices and software be examined and tested across racially diverse populations.