Inaccuracies have been reported in pulse oximetry measurements taken from people who identified as Black. Here, we identify substantial ethnic disparities in the population numbers within 12 pulse oximetry databases, which may affect the testing of new oximetry devices and impact patient outcomes.
During the COVID-19 pandemic, there has been unprecedented demand for pulse oximetry, a method of determining the oxygen saturation (SpO2) of the blood, to aid medical decision-making. Pulse oximetry data are also widely used for medical research and algorithm development. The measurement of SpO2 involves shining light onto tissue at two separate wavelengths and deriving the oxygen saturation from the relative changes in light absorption with each heartbeat. This pulsatile component is independent of skin pigmentation; however, other factors, such as the specific properties of the light source and the algorithms used by the product manufacturer, can produce variations that depend on skin pigmentation1.
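The two-wavelength principle described above can be illustrated with a minimal sketch. It assumes the two wavelengths are red and infrared (the text does not name them) and an idealized model in which pigmentation acts only as a constant attenuation factor on one channel. The function name `ratio_of_ratios` and the sample values are hypothetical, and the device-specific calibration that maps the ratio to an SpO2 value is deliberately left out.

```python
from statistics import mean


def ratio_of_ratios(red, infrared):
    """Return R = (AC_red / DC_red) / (AC_ir / DC_ir) for two sample windows.

    Assumption: `red` and `infrared` are light-intensity samples covering at
    least one heartbeat at each of the two wavelengths.
    """
    def normalised_pulsatile(samples):
        dc = mean(samples)                # non-pulsatile baseline level
        ac = max(samples) - min(samples)  # peak-to-peak pulsatile swing
        if dc == 0:
            raise ValueError("baseline (DC) level must be non-zero")
        return ac / dc

    return normalised_pulsatile(red) / normalised_pulsatile(infrared)


if __name__ == "__main__":
    # Made-up sample windows, for illustration only.
    red = [1.00, 1.02, 1.00, 0.98]
    infrared = [2.00, 2.08, 2.00, 1.92]
    print(ratio_of_ratios(red, infrared))
    # A constant attenuation of the red channel leaves R (almost) unchanged.
    print(ratio_of_ratios([0.3 * x for x in red], infrared))
```

In this idealized model a constant attenuation cancels out because each pulsatile swing is divided by its own baseline. The pigmentation-dependent variations the text mentions would enter through effects this sketch ignores, such as the light source properties and the manufacturer's calibration.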
A recent study revealed the clinical importance of racial disparities in pulse oximetry readings2. Specifically, when compared to measurements of arterial oxygen saturation, the pulse oximetry algorithms in these devices were found to produce systematically higher saturation values in Black patients than in white patients. Such systematic racial biases could adversely affect clinical decision-making, such as triage for supplemental oxygen, because the pulse oximetry readings of Black patients appear artificially high. Such inaccuracies may disproportionately increase the risk of unrecognized low oxygen saturation levels in Black patients under certain circumstances, for example in people with COVID-19 (ref. 3).
Recent advancements in artificial intelligence (AI) have relied on public databases for feature extraction from pulse oximetry signals to assess hypertension4, estimate lung function5, and validate algorithms developed for monitoring patients with COVID-19 (ref. 6). To prevent potential disparities in the calibration and accuracy of pulse oximetry devices and their algorithms, the pulse oximeter signals within such public databases need to be representative of the diverse populations on which these devices are used.
Since inaccuracies in pulse oximetry readings have been attributed to differences in skin pigmentation, and skin pigmentation varies with race and ethnicity, it is essential to clarify these terms as used in this article. We have predominantly chosen to use the term ethnicity because that is the specific term used in the public datasets and is also the term that is identified by the patients themselves. For the purpose of discussing health disparities, while both race and ethnicity are social constructs, ethnicity has emerged as the preferred one since it encompasses cultural aspects of social identity7 that extend beyond the more simplistic view of race that is primarily based on shared skin pigmentation or physical characteristics8. While members of a given ethnicity can express a range of skin pigmentation, it is generally agreed that patients who self-identify as Black tend to have darker skin pigmentation than other ethnic groups.
To investigate the proportion of individual ethnicities represented in publicly available pulse oximetry databases, we conducted a comprehensive PubMed search for accessible databases, restricted to 1 January 2012 through 1 January 2022 and using Medical Subject Headings (MeSH) terms and Title/Abstract keywords. Applying the inclusion and exclusion criteria defined in Fig. 1 resulted in 12 research articles describing 12 publicly available datasets that use pulse oximeter data to assess different medical conditions.
In total, as of January 28th 2022, these databases have accumulated 6214 citations according to Google Scholar, including 3544 citations for Medical Information Mart for Intensive Care III (MIMIC-III)9; 1049 citations for MIMIC-II10; 531 citations for IEEEPPG Dataset11; 243 citations for Multiparameter Intelligent Monitoring in Intensive Care I (MIMIC-I)12; 239 citations for WESAD13; 215 citations for Vortal Dataset14; 102 citations for the CapnoBase Dataset15; 87 citations for the University of Queensland Vital Signs Dataset16; 86 citations for PPG-DaLiA17; 63 citations for PPG-BP Dataset18; 50 citations for Wrist PPG Signals Recorded during Exercise19; and 5 citations for Medical Information Mart for Intensive Care IV (MIMIC-IV)20. We evaluated the existence of potential disparities in ethnicity based on the existing patient records as reported in the publicly available databases. In the absence of such information, the numbers of subjects in each category were inferred and quantified based on the locations of the authors’ research institutions or where the data were collected, as shown in Table 1.
To avoid any uncertainty in the results of ethnic disparity analysis for a given population, databases with inferred ethnicity information were excluded from the statistical analysis. Four databases for which data for ethnicity was clearly stated, MIMIC, MIMIC-II, MIMIC-III and MIMIC-IV, were included in the statistical analysis. The distribution of ethnic groups in the four databases is shown in Fig. 2.
We tested for statistically significant differences among all the subjects in the four databases, considering a p-value <0.05 as statistically significant. We analyzed the variance using a one-way ANOVA followed by a post hoc test, based on Tukey’s honest significant difference criterion, to provide simultaneous pairwise comparisons. The results indicated significant differences between the mean distributions of all racial groups: Asian and Black (p = 0.021), Asian and white (p = 4.10 × 10⁻¹⁴), and Black and white (p = 5.01 × 10⁻¹³). The same trend was observed between Other and Asian (p = 9.43 × 10⁻⁵), Other and Black (p = 0.026), and Other and white (p = 4.82 × 10⁻¹²). The results also suggested a higher proportion of white subjects compared to Asian, Black and other populations. These results demonstrate the existence of clear disparities in these key databases. Detailed results on the statistical separability tests for all pairs of demographic groups are provided in Table 2.
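As a rough illustration of the first step of this analysis, the sketch below computes the one-way ANOVA F statistic in plain Python. It is not the authors' code: the function name `one_way_anova_f` is hypothetical, and the sample values are made-up placeholders rather than data from the MIMIC databases, since the per-group observations are not reproduced in this text.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more samples than groups")

    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Placeholder values for illustration only; NOT taken from any database.
    placeholder_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
    print(one_way_anova_f(placeholder_groups))
```

Turning F into a p-value, and running the Tukey honest significant difference comparisons, requires the F and studentized range distributions, which the Python standard library does not provide. In practice one would likely use a statistics package, for example `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd` in SciPy.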
In the remaining databases, in which ethnicity was not explicitly stated, the ethnic disparity is not known. However, if we examine the demographic statistics of each dataset based on its location, we see that significant potential for disparity exists. For example, the Vortal dataset was collected in the UK in 2016, and the authors did not provide the race of each participant. Using government records of UK ethnicity statistics, we can infer the ethnic distribution: 7.5% Asian, 3.4% Black, 0.1% Other, and 80.0% white. The same method to infer ethnicity was used for the remaining databases, as shown in Table 1. Furthermore, the fact that the racial groups were not clearly defined suggests a lax approach to constructing reference databases, particularly for vascular optical measurement technology that can be influenced by skin color characteristics. White subjects appeared in all four MIMIC databases where the ethnicity was clearly stated, constituting an average of 73.19% of the total population. By contrast, Black subjects accounted for an average of only 9.29% of the sample population, and Asian subjects for an average of 2.67% of the total population investigated. Such distributions highlight the potential for racial and ethnic biases in algorithms and devices, leading to possible challenges in their wider application in medicine.
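The location-based inference described above amounts to scaling a cohort size by national population shares. The sketch below shows that arithmetic using the UK shares quoted in the text. The function name `expected_counts` and the cohort size of 100 are hypothetical, since the Vortal cohort size is not given here. The quoted shares sum to 91%, so the remainder is reported separately rather than redistributed.

```python
# UK population shares as quoted in the text (they sum to 0.91, not 1.0).
UK_SHARES = {"Asian": 0.075, "Black": 0.034, "Other": 0.001, "white": 0.800}


def expected_counts(cohort_size, shares):
    """Scale a cohort size by population shares to get inferred group counts.

    The part of the cohort not covered by `shares` is returned under the
    key "unaccounted" instead of being assigned to any group.
    """
    if cohort_size < 0:
        raise ValueError("cohort_size must be non-negative")
    counts = {group: cohort_size * share for group, share in shares.items()}
    counts["unaccounted"] = cohort_size * (1.0 - sum(shares.values()))
    return counts


if __name__ == "__main__":
    # Hypothetical cohort of 100 subjects, for illustration only.
    for group, count in expected_counts(100, UK_SHARES).items():
        print(f"{group}: {count:.1f}")
```

Counts inferred this way describe the surrounding population rather than the actual participants, which is why the databases with inferred ethnicity were excluded from the statistical analysis.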
Our findings highlight clear disparities in pulse oximetry databases. As these biased databases would be used during the premarket phase to adjust pulse oximeter accuracy and to develop algorithms for oxygen saturation determination, they place subjects with darker skin pigmentation at increased risk of unrecognized health conditions3. Such health inequalities necessitate the development of new pulse oximeter databases with more racially balanced populations. Our recommendation does not deny the value of exploiting existing biased databases; rather, it attempts to benefit from using these publicly available databases when testing developed algorithms, as well as aiming for more balanced populations in future databases. Asian and Black populations have low representation in existing databases and it would also be beneficial to create an increased number of databases from different geographical regions.
Since 2021, the US Food and Drug Administration has started to issue new guidelines to evaluate pre- and post-market pulse oximeters3, and to increase awareness of racial and ethnic disparities that can affect the accuracy of pulse oximetry algorithms. As publicly accessible databases are commonly used for the development of many biomedical algorithms and devices, our findings highlight the need to improve device algorithms and expand these databases to better represent a diversity of skin pigmentations regardless of the racial or ethnic group. Improving diversity in public databases would help improve the general accuracy of AI algorithms, especially for measurements that involve frequently life-threatening conditions such as COVID-19.
Supplementary Data 1 contains source data for the main figures in this manuscript. Pulse oximetry databases can be accessed via the following links: MIMIC-I (https://www.physionet.org/content/mimicdb/1.0.0/); CapnoBase (https://dataverse.scholarsportal.info/dataverse/capnobase#:~:text=The%20CapnoBase%20benchmark%20dataset%20contains,that%20may%20arise%20during%20anesthesia.); MIMIC-II (https://archive.physionet.org/physiobank/database/mimic2wdb/); University of Queensland Vital Signs (https://outbox.eait.uq.edu.au/uqdliu3/uqvitalsignsdataset/index.html#:~:text=Introduction,at%20the%20Royal%20Adelaide%20Hospital.); IEEEPPG (https://zenodo.org/record/3902710#.YmsOVNrMKUk); MIMIC-III (https://physionet.org/content/mimiciii/1.4/); Vortal (https://peterhcharlton.github.io/RRest/vortal_dataset.html); Wrist PPG Signals Recorded during Exercise (https://physionet.org/content/wrist/1.0.0/); WESAD (https://archive.ics.uci.edu/ml/datasets/WESAD+%28Wearable+Stress+and+Affect+Detection%29); PPG-BP (https://figshare.com/articles/dataset/PPG-BP_Database_zip/5459299); PPG-DaLiA (https://archive.ics.uci.edu/ml/datasets/PPG-DaLiA); MIMIC-IV (https://physionet.org/content/mimiciv/1.0/).
Colvonen, P. J. Response to: investigating sources of inaccuracy in wearable optical heart rate sensors. npj Digit. Med. 4(1), 1–2 (2021).
Sjoding, M. W., Dickson, R. P., Iwashyna, T. J., Gay, S. E. & Valley, T. S. Racial bias in pulse oximetry measurement. N. Engl. J. Med. 383, 2477–2478 (2020).
FDA. Pulse oximeter accuracy and limitations: FDA safety communication, https://www.fda.gov/medical-devices/safety-communications/pulse-oximeter-accuracy-and-limitations-fda-safety-communication (2021).
Elgendi, M. et al. The use of photoplethysmography for assessing hypertension. npj Digit. Med. 2, https://doi.org/10.1038/s41746-019-0136-7 (2019).
Jiang, W. et al. A wearable tele-health system towards monitoring COVID-19 and chronic diseases. IEEE Rev. Biomed. Eng. 15, 61–84 (2021).
Luks, A. M. & Swenson, E. R. Pulse oximetry for monitoring patients with COVID-19 at home. Potential pitfalls and practical guidance. Ann. Am. Thorac. Soc. 17(9), 1040–1046 (2020).
Ford, C. L. & Harawa, N. T. A new conceptualization of ethnicity for social epidemiologic and health equity research. Soc. Sci. Med. 71(2), 251–258 (2010).
Harawa, N. T. & Ford, C. L. The foundation of modern racial categories and implications for research on black/white disparities in health. Ethn. Dis. 19, 209–217 (2009).
Johnson, A. E. W. et al. MIMIC-III, a freely accessible critical care database. Sci. Data 3, 160035 (2016).
Saeed, M. et al. Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II): a public-access intensive care unit database. Crit. Care Med. 39(5), 952 (2011).
Zhang, Z., Pi, Z. & Liu, B. TROIKA: A general framework for heart rate monitoring using wrist-type photoplethysmographic signals during intensive physical exercise. IEEE Trans. Biomed. Eng. 62, 522–531 (2015).
Moody, G. B. & Mark, R. G. A database to support development and evaluation of intelligent intensive care monitoring. Comput. Cardiol. 1996, 657–660 (1996).
Schmidt, P., Reiss, A., Duerichen, R., Marberger, C. & Laerhoven, K. V. Introducing WESAD, a multimodal dataset for wearable stress and affect detection. in Proceedings of the 20th ACM International Conference on Multimodal Interaction 400–408 (Association for Computing Machinery, 2018).
Charlton, P. H. et al. An assessment of algorithms to estimate respiratory rate from the electrocardiogram and photoplethysmogram. Physiol. Meas. 37, 610–626 (2016).
Karlen, W., Turner, M., Cooke, E., Dumont, G. & Ansermino, J. M. CapnoBase: signal database and tools to collect, share and annotate respiratory signals. 2010 Annual Meeting of the Society for Technology in Anesthesia, West Palm Beach, Florida, January 13–16, 2010 (2010).
Liu, D. et al. University of Queensland vital signs dataset: development of an accessible repository of anesthesia patient monitoring data for research. Anesth. Analg. 114(3), 584–589 (2012).
Reiss, A., Indlekofer, I., Schmidt, P. & Van Laerhoven, K. Deep PPG: large-scale heart rate estimation with convolutional neural networks. Sensors 19, 3079 (2019).
Liang, Y., Chen, Z., Liu, G. & Elgendi, M. A new, short-recorded photoplethysmogram dataset for blood pressure monitoring in China. Sci. Data 5, 180020 (2018).
Jarchi, D. & Casson, A. Description of a database containing wrist PPG signals recorded during physical exercise with both accelerometer and gyroscope measures of motion. Data 2, 1 (2016).
Johnson, A. et al. MIMIC-IV (version 1.0). PhysioNet. https://doi.org/10.13026/s6n6-xd98 (2021).
The authors declare no competing interests.
Peer review information
Communications Medicine thanks Steve Greenwald and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Sinaki, F.Y., Ward, R., Abbott, D. et al. Ethnic disparities in publicly-available pulse oximetry databases. Commun Med 2, 59 (2022). https://doi.org/10.1038/s43856-022-00121-8