The reduction of race and gender bias in clinical treatment recommendations using clinician peer networks in an experimental setting

Bias in clinical practice, particularly in relation to race and gender, is a persistent cause of healthcare disparities. We investigated whether a peer-network approach could reduce bias in medical treatment decisions in an experimental setting. We created “egalitarian” information-exchange networks among practicing clinicians, who provided recommendations for the clinical management of patient scenarios presented in standardized patient videos of actors portraying patients with cardiac chest pain. The videos were standardized for relevant clinical factors and presented either a white male actor or a Black female actor of similar age, wearing the same attire and in the same clinical setting, portraying a patient with clinically significant chest pain symptoms. We found significant disparities between the treatment recommendations given to the white male and Black female patient-actors. If translated into real clinical encounters, these disparities would make the Black female patient significantly more likely to receive unsafe undertreatment rather than the guideline-recommended treatment. In the control group, clinicians who were asked to reflect independently on the standardized patient videos showed no significant reduction in bias. By contrast, clinicians who exchanged information in real time within structured peer networks significantly improved their clinical accuracy and showed no bias in their final recommendations. These findings suggest that clinician network interventions could be used in healthcare settings to reduce significant disparities in patient treatment.


nature research | reporting summary
April 2020
Field-specific reporting: Behavioural & social sciences
Behavioural & social sciences study design
Study description: This is a quantitative, experimental study of how peer consultation networks among clinicians affect the accuracy of their diagnostic assessments and treatment recommendations as a function of patient demographics.
Research sample: Baseline characteristics did not differ significantly between the two groups except for the date of NPI (National Provider Identifier) assignment, with more clinicians whose NPIs were assigned in 2009-2012 allocated to the control condition (2005-2008: 8.6%, 2009-2012: 13.6%, 2013-2016: 32.2%, 2017-present: 45.4%). This sample is not nationally representative; it was gathered as a convenience sample (see "Sampling strategy" below).
Sampling strategy: Clinicians were recruited from across the US through advertisements posted on clinician discussion boards on Reddit and distributed through Facebook's advertising platform. Seven recruitment advertisements were posted on Reddit, specifically on message boards frequented by doctors and resident clinicians. Three advertisements were distributed on Facebook from March to November 2019, using Facebook's advertising platform to target clinicians. Exposure was limited to US residents aged 18 to 65 whose demographic characteristics matched one of the following Facebook-suggested categories: doctor (Dr), medical doctor (MD), and medical director (MD). Beyond online recruitment, resident MD clinicians were also recruited through Penn Medicine's Graduate Medical Education training program: advertisements were circulated to the 2017 cohort of resident clinicians, and clinicians were recruited at outreach events held during Penn Medicine's orientation for incoming residents. Our sampling procedure aimed to maximize the available sample size, given uncertainty about the anticipated effect size. However, effect sizes from prior studies suggested that, assuming a strong effect, 7 trials in each experimental condition would be the minimum required to detect a treatment effect with 80% power.
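To make the power statement concrete, the sketch below computes the per-condition sample size for a two-sided, two-sample comparison using the normal approximation n = 2((z_{1-α/2} + z_{1-β}) / d)^2. The choice of test, the significance level (α = 0.05), and the standardized effect size d are all assumptions for illustration; the summary does not report which test or effect size the prior studies implied. Under these assumptions, a hypothetical "strong" effect of d = 1.5 happens to yield 7 units per condition. This is a consistency check, not the authors' calculation.

```python
import math
from statistics import NormalDist


def required_n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n for a two-sided two-sample test (normal approximation).

    d is a hypothetical standardized effect size (Cohen's d); the study's
    actual test and effect size are not reported in the summary.
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    z_beta = z.inv_cdf(power)             # quantile matching the target power
    n = 2 * ((z_alpha + z_beta) / d) ** 2
    return math.ceil(n)                   # round up to whole trials


def approx_power(n: int, d: float, alpha: float = 0.05) -> float:
    """Approximate power with n units per group (ignores the opposite tail)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(d * math.sqrt(n / 2) - z_alpha)


if __name__ == "__main__":
    for d in (0.5, 1.0, 1.5):
        print(f"d = {d}: {required_n_per_group(d)} trials per condition")
    print(f"power with 7 per condition at d = 1.5: {approx_power(7, 1.5):.2f}")
```

The normal approximation is optimistic at very small sample sizes. A calculation based on the t distribution would require somewhat more trials per condition, and with only about 7 units that difference is not negligible.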
Data collection: To initiate a trial, the app sent push notifications to all 1100 clinicians who had registered for the study (Fig. S3). Once 120 clinicians had responded, they were randomized to conditions in a 2:1 ratio: 80 clinicians were randomized to the network (intervention) condition and 40 to the control condition (Fig. S1). The 80 clinicians in the network condition were then randomized 1:1 to the two network conditions (white male patient or Black female patient), and the 40 clinicians in the control condition were likewise randomized 1:1 to the two control conditions (white male patient or Black female patient). The researchers who collected the data were not blind to the research hypotheses. However, all randomizations were automated through the DxChallenge app, so the experimenters were blind to the assignment of clinicians to conditions. DG and JZ were present for the data collection.
Data exclusions: No data were excluded from this study. All analyses were conducted on the intention-to-treat sample (see Fig. S1).
Non-participation: Across trials, attrition was 14% among participants who entered the game. Several factors may account for this attrition. One possibility is that some clinicians were unexpectedly unable to participate in or complete the DxChallenge task because of responsibilities and demands in their clinical workplace.
Randomization: Once 120 clinicians had responded to a trial's push notification (Fig. S3), they were randomized 2:1 to the network (n = 80) or control (n = 40) condition (Fig. S1) and then randomized 1:1 within each condition to the white male or Black female patient video. All randomizations were automated through the DxChallenge app (see "Statistical Analyses" for greater detail).
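The nested allocation described above can be sketched as follows: a 2:1 split between the network and control arms, then a 1:1 split between the two patient videos within each arm. This is a minimal illustration assuming simple shuffling with Python's `random` module. The function name `assign_conditions`, the divisibility check, and the seeding are hypothetical; the sketch does not reproduce the DxChallenge app's implementation, which the summary does not describe.

```python
import random
from collections import Counter


def assign_conditions(clinician_ids, seed=None):
    """Hypothetical nested allocation: 2:1 network vs. control, then 1:1 video.

    Not the DxChallenge app's code; an illustrative sketch only.
    """
    ids = list(clinician_ids)
    if len(ids) % 6 != 0:
        # 2:1 then 1:1 needs a multiple of 6 for equal cell sizes (120 -> 40/40/20/20)
        raise ValueError("number of clinicians must be divisible by 6")
    rng = random.Random(seed)
    rng.shuffle(ids)  # one random permutation drives every assignment below

    n_network = len(ids) * 2 // 3             # 2:1 split between arms
    arms = (("network", ids[:n_network]), ("control", ids[n_network:]))

    assignment = {}
    for arm, members in arms:
        half = len(members) // 2              # 1:1 split between patient videos
        for cid in members[:half]:
            assignment[cid] = (arm, "white male patient")
        for cid in members[half:]:
            assignment[cid] = (arm, "Black female patient")
    return assignment


if __name__ == "__main__":
    cells = Counter(assign_conditions(range(120), seed=1).values())
    for cell, n in sorted(cells.items()):
        print(cell, n)
```

Because the whole list is shuffled once before it is sliced, every clinician is equally likely to land in any of the four cells while the cell sizes stay fixed at 40, 40, 20, and 20.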