
# The reduction of race and gender bias in clinical treatment recommendations using clinician peer networks in an experimental setting

## Abstract

Bias in clinical practice, particularly in relation to race and gender, is a persistent cause of healthcare disparities. We investigated the potential of a peer-network approach to reduce bias in medical treatment decisions within an experimental setting. We created “egalitarian” information-exchange networks among practicing clinicians who provided recommendations for the clinical management of patient scenarios, presented via standardized patient videos. Each video, standardized for relevant clinical factors, featured either a white male actor or a Black female actor of similar age, wearing the same attire and filmed in the same clinical setting, portraying a patient with clinically significant chest pain symptoms. We found significant disparities in the treatment recommendations given to the white male patient-actor and the Black female patient-actor: translated into real clinical scenarios, these disparities would make the Black female patient significantly more likely to receive unsafe undertreatment rather than the guideline-recommended treatment. In the experimental control group, clinicians who were asked to independently reflect on the standardized patient videos did not show any significant reduction in bias. However, clinicians who exchanged real-time information in structured peer networks significantly improved their clinical accuracy and showed no bias in their final recommendations. These findings indicate that clinician network interventions might be used in healthcare settings to reduce significant disparities in patient treatment.

## Introduction

Bias is an enduring cause of healthcare disparities by race and gender1,2,3,4,5,6,7. Previous experimental work demonstrated that clinicians reviewing video-based vignettes of high-risk patients with chest pain disproportionately referred men compared to women, and white patients compared to Black patients for the guideline-recommended treatment, cardiac catheterization1,7. Proposed solutions for addressing bias have focused on cognitive strategies that increase clinicians’ awareness of their own biases6,8,9,10,11. However, no approaches have yet been found that successfully reduce race and gender bias in clinical treatment recommendations6,10.

Recent research in non-clinical settings has shown that information exchange in large social networks with uniform—i.e. egalitarian12—connectivity can be effective for improving collective intelligence in both health-related and non-health-related risk assessments13,14,15,16. Studies of bias reduction in partisan networks15,16 have found that this process of collective learning in egalitarian networks can effectively reduce, and even eliminate longstanding political biases in the evaluation of novel information16. Here, we integrate this recent work on bias reduction in egalitarian information-exchange networks with theoretical research on medical reasoning17,18,19, which has argued that improving the accuracy of clinicians’ diagnostic assessments should improve the quality of their treatment recommendations19,20,21. We hypothesize that creating structured information-exchange networks among clinicians will lead to improved clinical assessments that may be effective for reducing observed patterns of race and gender bias in clinical treatment recommendations13,14. Despite the broad practical2,3,4 and scientific1,5,6,7 importance of understanding and addressing bias in medical settings, it has not been possible to evaluate this hypothesis because such a test requires the ability to experimentally isolate and measure the direct effects of clinical networks on reducing medical bias and treatment disparities.

We adopted an experimental approach to evaluate whether large, uniform information-exchange networks among clinicians13,14,15,16 might significantly reduce observed race and gender bias in clinicians’ treatment recommendations, relative to a control group of independent clinicians who did not participate in information-exchange networks. We recruited 840 practicing clinicians (see Supplementary Information for details) to participate in the online video-based study, which was administered through a proprietary mobile app for clinicians. Each clinician viewed a standardized patient video of either a white male patient-actor or a Black female patient-actor, and provided clinical assessments and treatment recommendations for the depicted clinical case (see SI, Supplementary Fig. 5 and Supplementary Fig. 6). Both the white male and Black female “patients” in the videos were portrayed by professional actors who appeared approximately 65 years old, were dressed in identical attire, and presented clinically significant chest pain symptoms. The actors followed a single script, providing an identical clinical history that included several risk factors for coronary artery disease (age, hyperlipidemia, and discomfort with exertion). Both videos were accompanied by an identical electrocardiogram exhibit showing abnormalities. (Hereafter, we refer to the patient-actors in the standardized patient videos as “patients”.)

After viewing the patient video, clinicians were asked to provide their initial clinical assessments and treatment recommendations. Clinical assessments took the form of a probability estimate (from 0 to 100) of the patient’s chance of having a major adverse cardiac event within the next 30 days. The most accurate assessment based on the patient’s HEART score is 16%22,23. (Additional analyses show the robustness of our findings across a range of assessment values. See SI “Sensitivity Analyses”, Supplementary Fig. 9 and Supplementary Fig. 10). Clinicians then selected a single treatment recommendation from four multiple choice options: Option A. daily 81 mg aspirin and return to clinic in one week (i.e. unsafe undertreatment); Option B. daily 81 mg aspirin and stress test within two to three days (i.e. undertreatment); Option C. full-dose aspirin and referral to emergency department for evaluation and monitoring (i.e. highest quality, guideline-recommended treatment); or Option D. full-dose aspirin and referral to cardiology for urgent cardiac catheterization (i.e. overtreatment in the context of unconfirmed diagnosis).
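
For readers unfamiliar with the HEART score referenced above, the sketch below illustrates how its five components (History, ECG, Age, Risk factors, Troponin; each scored 0–2) combine into an approximate 30-day MACE risk band. The component values assigned to the video patient here are illustrative assumptions for a case like the one described, not the study’s actual scoring.

```python
# Hedged sketch of HEART scoring. Each component is worth 0-2 points;
# the total (0-10) maps to an approximate 30-day MACE risk band.

def age_points(age_years):
    # <45 years -> 0, 45-64 -> 1, >=65 -> 2
    return 2 if age_years >= 65 else 1 if age_years >= 45 else 0

def risk_factor_points(n_factors):
    # 0 factors -> 0, 1-2 factors -> 1, >=3 factors -> 2
    return 0 if n_factors == 0 else 1 if n_factors <= 2 else 2

def heart_total(history, ecg, age_years, n_factors, troponin):
    """Total HEART score from the five component scores."""
    return (history + ecg + age_points(age_years)
            + risk_factor_points(n_factors) + troponin)

def mace_risk_band(total):
    # Approximate risk bands reported in the HEART-score literature:
    # low (~2%), moderate (~12-17%), high (~50-65%).
    if total <= 3:
        return "low"
    if total <= 6:
        return "moderate"
    return "high"

# Illustrative scoring (an assumption) for the standardized patient:
# moderately suspicious history (1), nonspecific ECG abnormality (1),
# age 65 (2), one risk factor such as hyperlipidemia (1), troponin
# not yet measured (0).
total = heart_total(1, 1, 65, 1, 0)
band = mace_risk_band(total)
```

Under these assumed component values the patient scores 5, a moderate-risk band, which is consistent with the 16% benchmark assessment cited above.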

Option C is the most appropriate treatment based on currently accepted guidelines from the American College of Cardiology23,24 and represents the highest standard of care22,23. (Because some clinicians may choose a less aggressive initial strategy in a patient with atypical symptoms, we conducted sensitivity analyses that accepted both Option B and Option C as correct. As reported in Supplementary Fig. 11 and Supplementary Fig. 12 in SI “Sensitivity Analyses”, these analyses show the robustness of our findings for both Option B and Option C). Our primary measure of bias is the rate at which the white male patient versus the Black female patient was given the highest quality, guideline-recommended care (Option C). Our secondary measure of bias is inequity in the treatment of the Black female and white male patients, in terms of the relative rates at which each patient was recommended for unsafe undertreatment (Option A) rather than the guideline-recommended care (Option C) (see SI). We focus on the relative rates of Options A and C because this aligns with the observed racial disparities in workup and referral rates for chest pain in clinical care25. Option A is unsafe because it inappropriately defers workup to one week later, putting the patient at risk of significant adverse outcomes. Option B (undertreatment) is not as unsafe as Option A because it shortens the period of further evaluation from one week to three days; however, it is not the guideline-recommended care, which advises immediate evaluation for cardiac tissue damage to appropriately triage the patient. Option D is incorrect because, without a troponin measurement (which assesses for acute damage of cardiac tissue), the patient presentation does not warrant an immediate invasive procedure; it exposes the patient to the risk of a potentially unnecessary invasive procedure and wasteful healthcare spending.
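
The two bias measures described above can be operationalized as simple rates over the recommendations in a group. The sketch below shows one plausible operationalization with hypothetical data; the exact statistics used in the study are specified in its SI.

```python
# Hedged sketch of the two bias measures. The recommendation counts below
# are hypothetical, and the ratio used for the secondary measure is an
# illustrative operationalization, not the study's exact statistic.
from collections import Counter

def guideline_rate(recs):
    """Primary measure: fraction of clinicians recommending Option C
    (the guideline-recommended care)."""
    return Counter(recs)["C"] / len(recs)

def unsafe_vs_guideline(recs):
    """Secondary measure: among clinicians choosing A or C, the fraction
    choosing unsafe undertreatment (Option A)."""
    c = Counter(recs)
    return c["A"] / (c["A"] + c["C"])

# Hypothetical recommendations for two groups of 40 clinicians each.
white_male = ["C"] * 24 + ["A"] * 4 + ["B"] * 8 + ["D"] * 4
black_female = ["C"] * 16 + ["A"] * 10 + ["B"] * 10 + ["D"] * 4

gap = guideline_rate(white_male) - guideline_rate(black_female)
```

With these made-up counts, the white male patient receives guideline care at a higher rate (0.6 vs 0.4), and the Black female patient is relatively more often steered toward unsafe undertreatment rather than guideline care.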

In each trial, clinicians were randomized to one of four conditions: (i) network condition with the Black female patient; (ii) network condition with the white male patient; (iii) control (independent reflection) condition with the Black female patient or (iv) control condition with the white male patient. In the two network conditions, clinicians were randomly assigned to a single location in a large, anonymous uniform social network (n = 40), in which every clinician had an equal number of connections (z = 4), which ensured that no single clinician had greater power over the communication dynamics within the network13 (see SI, Supplementary Fig. 4 for network details). Clinicians were anonymous and did not have any information about how many peers they were connected to. Clinicians’ contacts in the network remained the same throughout the experiment. In the two control conditions, clinicians provided their responses in isolation. Because clinicians in the control conditions were independent from one another, fewer overall clinicians were required for the control analyses (n = 20 in each trial). For proper comparison with the experimental conditions, we randomly assigned clinicians in each control condition into bootstrapped “groups” of n = 40, and conducted our analyses at the group level (see SI, “Statistical Analyses”).
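
A uniform network in which every one of the 40 clinicians has exactly z = 4 contacts can be built in several ways; one standard construction is a ring lattice, shown in the sketch below as an assumption (the study’s exact topology is given in its SI, Supplementary Fig. 4). The same sketch illustrates how independent control clinicians can be resampled into bootstrapped pseudo-groups of n = 40 for group-level comparison.

```python
# Hedged sketch: a z-regular "egalitarian" network and bootstrapped
# control groups. The ring-lattice topology is an illustrative choice.
import random

def ring_lattice(n=40, z=4):
    """Each node is linked to z/2 neighbors on each side of a ring,
    so every clinician has exactly z contacts and none is more
    central than any other."""
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for k in range(1, z // 2 + 1):
            j = (i + k) % n
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors

def bootstrap_control_groups(control_ids, group_size=40,
                             n_groups=1000, seed=0):
    """Resample independent control clinicians (with replacement) into
    pseudo-groups the same size as a network, so that group-level
    statistics are comparable across conditions."""
    rng = random.Random(seed)
    return [rng.choices(control_ids, k=group_size)
            for _ in range(n_groups)]

net = ring_lattice()
```

Because the resampled groups are the same size as the networked groups, any statistic computed per group (for example, the rate of Option C recommendations) can be compared directly between conditions.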

In all conditions, clinicians were given three rounds to provide their assessments and treatment recommendations for the presented patient. In the initial round, all clinicians independently viewed their respective videos, and were then given two minutes to provide their clinical assessments and treatment recommendations. In the control conditions, clinicians remained isolated for two additional rounds of evaluation. In round two, they viewed the patient video a second time, and were again given two minutes to respond; they could either keep or modify their earlier responses. In the final round, clinicians repeated this procedure and provided their final responses. In the network conditions, clinicians in round two were again shown the patient video, along with the average assessment responses (i.e. diagnostic estimates) of their network contacts, and were then asked to provide their assessments and recommendations (see SI, Supplementary Fig. 6). They could either keep the responses they gave in the initial round or modify them. In the final round, this procedure was repeated, showing the average responses from round two, and clinicians were asked to provide their final responses.
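
The round-by-round revision process in the network conditions can be illustrated with a stylized averaging model. In the sketch below, the update weight `alpha` is an illustrative assumption: in the experiment, clinicians revised their estimates freely after seeing the mean of their contacts' estimates, rather than following any fixed rule.

```python
# Hedged sketch: a stylized model of one revision round. Each clinician
# sees the mean of their network contacts' current risk estimates and
# (in this toy model) moves a fraction alpha toward it.

def peer_mean(estimates, neighbors, i):
    """Average risk estimate currently held by clinician i's contacts."""
    peers = neighbors[i]
    return sum(estimates[j] for j in peers) / len(peers)

def revision_round(estimates, neighbors, alpha=0.5):
    """One round of revision; alpha is an illustrative assumption."""
    return {i: (1 - alpha) * estimates[i]
               + alpha * peer_mean(estimates, neighbors, i)
            for i in estimates}

# Tiny worked example: three mutually connected clinicians whose initial
# 30-day MACE estimates (out of 100) differ.
nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
est0 = {0: 10.0, 1: 20.0, 2: 30.0}
est1 = revision_round(est0, nbrs)
```

In this toy run, clinician 0 moves from 10 toward the peer mean of 25, landing at 17.5; repeated rounds pull the group's estimates toward a shared value, which is the mechanism the egalitarian-network literature cited above credits with improving collective accuracy.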

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

## Data availability

The data collected for this study are available for download from the Network Dynamics Group website: https://ndg.asc.upenn.edu/experiments/physician-reasoning/. The data are also available at https://github.com/drguilbe/cliniciansCI. Source data are provided with this paper.

## Code availability

The code for this study was written in R, and is deposited on GitHub, available at: https://github.com/drguilbe/cliniciansCI.

## References

1. Chen, J., Rathore, S. S., Radford, M. J., Wang, Y. & Krumholz, H. M. Racial differences in the use of cardiac catheterization after acute myocardial infarction. N. Engl. J. Med. 344, 1443–1449 (2001).
2. Dehon, E. et al. A systematic review of the impact of physician implicit racial bias on clinical decision making. Acad. Emerg. Med. 24, 895–904 (2017).
3. FitzGerald, C. & Hurst, S. Implicit bias in healthcare professionals: a systematic review. BMC Med. Ethics 18, 19 (2017).
4. Hall, W. J. et al. Implicit racial/ethnic bias among health care professionals and its influence on health care outcomes: a systematic review. Am. J. Public Health 105, E60–E76 (2015).
5. Hirsh, A. T., Hollingshead, N. A., Ashburn-Nardo, L. & Kroenke, K. The interaction of patient race, provider bias, and clinical ambiguity on pain management decisions. J. Pain 16, 558–568 (2015).
6. Maina, I. W., Belton, T. D., Ginzberg, S., Singh, A. & Johnson, T. J. A decade of studying implicit racial/ethnic bias in healthcare providers using the implicit association test. Soc. Sci. Med. 199, 219–229 (2018).
7. Schulman, K. A. et al. The effect of race and sex on physicians’ recommendations for cardiac catheterization. N. Engl. J. Med. 340, 618–626 (1999).
8. Burgess, D. J., Beach, M. C. & Saha, S. Mindfulness practice: a promising approach to reducing the effects of clinician implicit bias on patients. Patient Educ. Couns. 100, 372–376 (2017).
9. Croskerry, P. The importance of cognitive errors in diagnosis and strategies to minimize them. Acad. Med. 78, 775–780 (2003).
10. Sherbino, J., Kulasegaram, K., Howey, E. & Norman, G. Ineffectiveness of cognitive forcing strategies to reduce biases in diagnostic reasoning: a controlled trial. CJEM 16, 34–40 (2014).
11. Sukhera, J., Gonzalez, C. & Watling, C. J. Implicit bias in health professions: from recognition to transformation. Acad. Med. 95, 717–723 (2020).
12. Guilbeault, D. & Centola, D. Networked collective intelligence improves dissemination of scientific information regarding smoking risks. PLoS ONE 15, e0227813 (2020).
13. Becker, J., Brackbill, D. & Centola, D. Network dynamics of social influence in the wisdom of crowds. Proc. Natl Acad. Sci. USA 114, E5070–E5076 (2017).
14. Becker, J., Guilbeault, D. & Smith, E. B. The crowd classification problem. Manag. Sci. https://doi.org/10.1287/mnsc.2021.4127 (2021).
15. Becker, J., Porter, E. & Centola, D. The wisdom of partisan crowds. Proc. Natl Acad. Sci. USA 116, 10717–10722 (2019).
16. Guilbeault, D., Becker, J. & Centola, D. Social learning and partisan bias in the interpretation of climate trends. Proc. Natl Acad. Sci. USA 115, 9714–9719 (2018).
17. Elia, F., Apra, F. & Crupi, V. Understanding and improving decisions in clinical medicine (II): making sense of reasoning in practice. Intern. Emerg. Med. 13, 287–289 (2018).
18. Kurvers, R. H. et al. Boosting medical diagnostics by pooling independent judgments. Proc. Natl Acad. Sci. USA 113, 8777–8782 (2016).
19. Pauker, S. G. & Kassirer, J. P. The threshold approach to clinical decision making. N. Engl. J. Med. 302, 1109–1117 (1980).
20. Brush, J. E., Lee, M., Sherbino, J., Taylor-Fishwick, J. C. & Norman, G. Effect of teaching bayesian methods using learning by concept vs learning by example on medical students’ ability to estimate probability of a diagnosis: a randomized clinical trial. JAMA Netw. Open 2, e1918023 (2019).
21. Centor, R. M., Geha, R. & Manesh, R. The pursuit of diagnostic excellence. JAMA Netw. Open 2, e1918040 (2019).
22. Byrne, C., Toarta, C., Backus, B. & Holt, T. The HEART score in predicting major adverse cardiac events in patients presenting to the emergency department with possible acute coronary syndrome: protocol for a systematic review and meta-analysis. Syst. Rev. 7, 148 (2018).
23. Fernando, S. M. et al. Prognostic accuracy of the HEART score for prediction of major adverse cardiac events in patients presenting with chest pain: a systematic review and meta-analysis. Acad. Emerg. Med. 26, 140–151 (2019).
24. Amsterdam, E. A. et al. 2014 AHA/ACC Guideline for the Management of Patients with Non-ST-Elevation Acute Coronary Syndromes: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. J. Am. Coll. Cardiol. 64, e139–e228 (2014).
25. Napoli, A. M., Choo, E. K., Dai, J. & Desroches, B. Racial disparities in stress test utilization in an emergency department chest pain unit. Crit. Pathw. Cardiol. 12, 9–13 (2013).
26. Krockow, E. M. et al. Harnessing the wisdom of crowds can improve guideline compliance of antibiotic prescribers and support antimicrobial stewardship. Sci. Rep. 10, 18782 (2020).
27. Butler, M. et al. in Comparative Effectiveness Reviews, No. 170 (Agency for Healthcare Research and Quality, 2016).
28. Zeidan, A. J. et al. Implicit bias education and emergency medicine training: step one? awareness. AEM Educ. Train. 3, 81–85 (2019).
29. Abimanyi-Ochom, J. et al. Strategies to reduce diagnostic errors: a systematic review. BMC Med. Inf. Decis. Mak. 19, 174 (2019).
30. Henriksen, K. & Brady, J. The pursuit of better diagnostic performance: a human factors perspective. BMJ Qual. Saf. 22, ii1–ii5 (2013).
31. Barnett, M. L., Boddupalli, D., Nundy, S. & Bates, D. W. Comparative accuracy of diagnosis by collective intelligence of multiple physicians vs individual physicians. JAMA Netw. Open 2, e190096 (2019).
32. Chaudhry, B. et al. Systematic review: impact of health information technology on quality, efficiency, and costs of medical care. Ann. Intern. Med. 144, 742–752 (2006).
33. Evans, S. C. et al. Vignette methodologies for studying clinicians’ decision-making: validity, utility, and application in ICD-11 field studies. Int. J. Clin. Health Psychol. 15, 160–170 (2015).
34. Khoong, E. C. et al. Impact of digitally acquired peer diagnostic input on diagnostic confidence in outpatient cases: a pragmatic randomized trial. J. Am. Med. Inf. Assoc. 28, 632–637 (2021).
35. Veloski, J., Tai, S., Evans, A. S. & Nash, D. B. Clinical vignette-based surveys: a tool for assessing physician practice variation. Am. J. Med. Qual. 20, 151–157 (2005).
36. Roland, D., Coats, T. & Matheson, D. Towards a conceptual framework demonstrating the effectiveness of audiovisual patient descriptions (patient video cases): a review of the current literature. BMC Med. Educ. 12, 125 (2012).
37. Adilman, R. et al. Social media use among physicians and trainees: results of a national medical oncology physician survey. J. Oncol. Pract. 12, 79–80 (2016).
38. Cooper, C. P. et al. Physicians who use social media and other internet-based communication technologies. J. Am. Med. Inf. Assoc. 19, 960–964 (2012).
39. Lambe, K. A., Hevey, D. & Kelly, B. D. Guided reflection interventions show no effect on diagnostic accuracy in medical students. Front. Psychol. 9, 2297 (2018).
40. Monteiro, S. D. et al. Reflecting on diagnostic errors: taking a second look is not enough. J. Gen. Intern. Med. 30, 1270–1274 (2015).
41. Carroll, A. E. The high costs of unnecessary care. JAMA 318, 1748–1749 (2017).
42. Hess, P. L. et al. Appropriateness of percutaneous coronary interventions in patients with stable coronary artery disease in US Department of Veterans Affairs Hospitals from 2013 to 2015. JAMA Netw. Open 3, e203144 (2020).
43. Ko, D. T. et al. Regional variation in cardiac catheterization appropriateness and baseline risk after acute myocardial infarction. J. Am. Coll. Cardiol. 51, 716–723 (2008).
44. Lyu, H. et al. Overtreatment in the United States. PLoS ONE 12, e0181970 (2017).
45. Jayles, B. & Kurvers, R. Exchanging small amounts of opinions outperforms sharing aggregated opinions of large crowds. PsyArxiv https://arxiv.org/pdf/2003.06802.pdf (2020).
46. Pletcher, M. J., Kertesz, S. G., Kohn, M. A. & Gonzales, R. Trends in opioid prescribing by race/ethnicity for patients seeking care in US emergency departments. JAMA 299, 70–78 (2008).
47. Burgess, D. J. et al. The effect of cognitive load and patient race on physicians’ decisions to prescribe opioids for chronic low back pain: a randomized trial. Pain Med. 15, 965–974 (2014).
48. Rauscher, G. H., Allgood, K. L., Whitman, S. & Conant, E. Disparities in screening mammography services by race/ethnicity and health insurance. J. Women’s Health 21, 154–160 (2012).
49. Harris, P. A. The impact of age, gender, race, and ethnicity on the diagnosis and treatment of depression. J. Manag. Care Pharm. 10, S2–S7 (2004).

## Acknowledgements

D.C. gratefully acknowledges support from a Robert Wood Johnson Foundation Pioneer Grant, #73593 (D.C.), and thanks Alan Wagner for App development assistance and Joshua Becker for helpful contributions to App development.

## Author information


### Contributions

D.C. designed the project, D.C., E.K., and U.S. developed the research materials, D.G. and J.W. ran the experiments, D.C., D.G., E.K., and J.W. conducted the analyses, D.C. wrote the manuscript. All authors commented on and approved the final manuscript.

### Corresponding author

Correspondence to Damon Centola.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

**Peer review information** Nature Communications thanks Chloe Fitzgerald, Dhruv Kazi, Ralf Kurvers and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

**Publisher’s note** Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Centola, D., Guilbeault, D., Sarkar, U. et al. The reduction of race and gender bias in clinical treatment recommendations using clinician peer networks in an experimental setting. Nat Commun 12, 6585 (2021). https://doi.org/10.1038/s41467-021-26905-5
