Ethnic disparities in publicly-available pulse oximetry databases

Inaccuracies have been reported in pulse oximetry measurements taken from people who identified as Black. Here, we identify substantial ethnic disparities in the population numbers within 12 pulse oximetry databases, which may affect the testing of new oximetry devices and impact patient outcomes.

The authors present an analysis of the racial make-up of available databases containing pulse oximetry information that could be used to create biomedical algorithms. The diversity of available databases is of general interest, and their result, that such databases have limited diversity, is not very surprising yet still important.
It wasn't very clear how the authors performed the "comprehensive assessment" to find all the databases. How did the authors perform the search? What criteria were used to determine if a database should be included in this analysis? These details are important for determining the comprehensiveness of this search, and the validity of the conclusions.
Reviewer #2 (Remarks to the Author):
This study was prompted by a recent report that revealed a racial disparity in the accuracy of pulse oximetry readings. It reviews the results of twenty-seven studies of pulse oximetry signals in subjects with hypertension, drawn from eight public databases, and describes the discrepancies between the numbers of subjects with different skin colours. It also mentions that, of the total number of subjects in all eight databases, a large majority is light-skinned. This is important, and the paper makes this very clear, because the amplitude of photoplethysmographic signals depends on the degree of skin pigmentation. It follows that algorithms used in commercial systems calibrated for light-skinned Caucasians, say, may not yield accurate SpO2 readings for subjects with darker skin. Thus the conclusion of the paper, that consideration be given to the distribution of skin colour among database subjects, is justified.
The main result of this report is that there are significant differences between the 8 databases in the ethnic (and thus skin colour) make-up of their subjects. This disparity would be less of a problem if the source data had been stratified according to ethnicity/skin colour. Presumably this was not the case but it would be useful if the question were briefly discussed. The authors recommend that future databases be ethnically balanced, but is this necessarily the only way to solve the problem? It might be better, when using the databases for algorithm calibration, for instance, to exploit the existing studies with large numbers of a particular skin colour as well as aiming for more balanced populations in future databases.
I have some questions about the statistical methods used. Firstly, a few words would be useful to explain why both parametric and non-parametric tests were used to seek differences in the composition of the databases. I note that the significance of both (at the 5% level) coincides in all but one case (European v East Asian). Which value should be accepted and why? Secondly, I am concerned about the dangers of multiple comparisons between pairs of groups, because this can give rise to contradictions, or at least, inconsistencies. Would 1-way ANOVA followed by a post-hoc test be a more appropriate way to test the hypothesis? Finally, I am not clear why the percentages for each database listed in appendix C don't add up to 100.
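The multiple-comparisons concern above can be illustrated with a minimal sketch. It is not the authors' analysis: the group labels and p-values below are hypothetical placeholders, and the Holm-Bonferroni step-down adjustment is used here only as one common way to control the family-wise error rate when several pairwise tests are run. The helper name `holm_adjust` is likewise an assumption made for this example.

```python
from itertools import combinations


def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical placeholders: four groups give 4 * 3 / 2 = 6 pairwise tests.
groups = ["group_1", "group_2", "group_3", "group_4"]
pairs = list(combinations(groups, 2))

# Hypothetical raw p-values, one per pair, for illustration only.
raw_p = [0.004, 0.030, 0.020, 0.250, 0.045, 0.600]
adjusted_p = holm_adjust(raw_p)

alpha = 0.05
n_raw = sum(p < alpha for p in raw_p)
n_adjusted = sum(p < alpha for p in adjusted_p)

for (g1, g2), p, p_adj in zip(pairs, raw_p, adjusted_p):
    print(f"{g1} vs {g2}: raw p = {p:.3f}, Holm-adjusted p = {p_adj:.3f}")
print(f"Significant at {alpha}: {n_raw} unadjusted, {n_adjusted} after adjustment")
```

With these placeholder values, several comparisons that fall below the 5% level individually no longer do so once the number of tests is accounted for, which is the kind of inconsistency an omnibus test followed by a post-hoc procedure is meant to avoid.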
Minor points.
• Second word of the text "oximeter" --> "oximetry"
• As an adjective, should black be capitalised? In the main text it has been uniformly capitalised but is inconsistent in appendix A.
• In the first line of the penultimate paragraph, what does "warm color spectrum" refer to?

Ethnic Disparities in Pulse Oximetry Databases
Reviewer 1:

It wasn't very clear how the authors performed the "comprehensive assessment" to find all the databases. How did the authors perform the search? What criteria were used to determine if a database should be included in this analysis?
Author reply: Thank you for your valuable question. A comprehensive assessment was implemented by searching for all publicly available databases published in the literature over the last decade.
Author action: We have added a flow chart of study identification, with inclusion and exclusion criteria, as Figure 1 to document the comprehensive search for publicly available PPG datasets, and have revised the wording accordingly, starting at line 50, as follows: "To explore how racial diversity impacts common databases used for academic medical research, we conducted a comprehensive assessment of accessible databases from 1 January 2010 to 1 January 2021 using PubMed consisting of Medical Subject Headings (MeSH) terms and Title/Abstract keywords. Applying the inclusion and exclusion criteria as defined in Figure 1 resulted in 11 research articles describing 11 publicly available datasets to assess different medical conditions using pulse oximeter data."

Reviewer 2:

1. The authors recommend that future databases be ethnically balanced, but is this necessarily the only way to solve the problem? It might be better, when using the databases for algorithm calibration, for instance, to exploit the existing studies with large numbers of a particular skin colour as well as aiming for more balanced populations in future databases.
Author reply: Thank you for your valuable feedback. Using existing databases while developing more diversified ones in future has also been the intention of our recommendation.
Author action: We added the following text to clarify this point, starting at line 127, as follows: "Our recommendation does not deny the value of exploiting existing biased databases; rather, it attempts to benefit from using these publicly available databases when testing developed algorithm, as well as aiming for more balanced populations in future databases creation"