The end game: respecting major sources of population diversity

Human neuroscience is enjoying burgeoning population data resources: large-scale cohorts with thousands of participant profiles of gene expression, brain scanning and sociodemographic measures. The depth of phenotyping puts us in a better position than ever to fully embrace major sources of population diversity as effects of interest to illuminate mechanisms underlying brain health.


Eclipsing population stratification: emerging challenges
Parallel to burgeoning neuroscience databases, there has been a growing trend in designing neuroimaging-based prediction models 4 . These complex, often nonlinear models are poised to provide insights into old challenges in neuroscience in particular and biology in general 5 . Ensuing predictive models require large datasets to enable accurate predictions in the broader population 6 . However, acquiring more participants does not always result in deeper neuroscientific insight that could be ultimately translated into everyday life. Concretely, decisions from diagnostic models based on a limited spectrum of an actual patient population and diseases can potentially lead to dangerous consequences in everyday clinical practice, such as inaccurate mortality risk prediction in pneumonia patients 7 .
Dataset shift is one source of these inaccuracies and mispredictions. In machine learning terminology, dataset shift is at play when the joint distribution of the features and outputs differs between the training and target data. As an example from medicine, predictive models may fail on new participants with differences in biological or cognitive backgrounds such as demographics, handedness, dyslexia diagnosis, disease prevalence or treatment response. In fact, dataset shift is a serious contender for the most critical limitation of genetics in precision medicine 8 . This challenge is rooted in the overwhelming abundance of studies on participants of European descent, leading to a dearth of well-powered studies in globally diverse populations. Specifically, more than three out of four participants in the thousands of existing genome-wide association studies (GWAS) are of European descent 8 . However, this ethnicity makes up only 16% of the world's population (Fig. 1a). Hence, polygenic risk scores (phenotype predictions based on tens of thousands of common genetic variants) estimate individual risk far more accurately in Europeans than in most non-European groups. In a study quantifying the difference in genetic prediction accuracy across 17 anthropometric and blood-panel traits in non-European versus European participants, prediction accuracy relative to Europeans was 38% lower in Hispanic or Latino/Latina Americans, 38% lower in South Asians, 50% lower in East Asians, and 78% lower in Africans on average (Fig. 1b) 8 . These results point to a systemic generalization failure attributable partly to the dominance of participants of European descent in genetic studies.
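To make the notion of dataset shift concrete, the following minimal simulation may help. It is a sketch under stated assumptions, not an analysis of any real cohort: the two groups, the effect sizes in `W_A` and `W_B`, the sample sizes and all function names are hypothetical choices made for illustration. A linear model is trained only on group A and then evaluated on held-out participants from both groups.

```python
import numpy as np

N_FEATURES = 20

# Hypothetical effect sizes (assumption): groups A and B share half of the
# feature-outcome relationships; the other half is attenuated in group B.
W_A = np.ones(N_FEATURES)
W_B = W_A.copy()
W_B[:10] = 0.2


def draw(n, weights, rng):
    """Simulate n participants: standard-normal features, linear outcome, unit noise."""
    X = rng.normal(size=(n, N_FEATURES))
    y = X @ weights + rng.normal(size=n)
    return X, y


def fit(X, y):
    """Ordinary least squares without intercept (all simulated variables have mean zero)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def r_squared(w_hat, X, y):
    """Share of outcome variance explained on held-out data."""
    residual = y - X @ w_hat
    return 1.0 - residual.var() / y.var()


def shift_demo(seed=0):
    """Return {training setup: (R^2 on held-out group A, R^2 on held-out group B)}."""
    rng = np.random.default_rng(seed)
    test_a = draw(5000, W_A, rng)
    test_b = draw(5000, W_B, rng)
    results = {}
    # Train on group A only, with a small and a 100-fold larger sample.
    for n_train in (200, 20000):
        w_hat = fit(*draw(n_train, W_A, rng))
        results[f"A-only n={n_train}"] = (
            r_squared(w_hat, *test_a),
            r_squared(w_hat, *test_b),
        )
    # Train on a modest sample drawn from group B itself.
    w_hat_b = fit(*draw(500, W_B, rng))
    results["B-only n=500"] = (
        r_squared(w_hat_b, *test_a),
        r_squared(w_hat_b, *test_b),
    )
    return results


for setup, (r2_a, r2_b) in shift_demo().items():
    print(f"{setup}: R2 on A = {r2_a:.2f}, R2 on B = {r2_b:.2f}")
```

Under these assumptions, enlarging the group A training sample 100-fold leaves the accuracy in group B essentially unchanged, whereas a far smaller sample recruited from group B recovers it. The simulation deliberately ignores real-world complications such as linkage disequilibrium or differing measurement protocols.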
Failures in single-subject prediction caused by dataset shift cannot be remedied by recruiting more participants. Prioritizing diversity instead of enforcing sample homogeneity showed early promise for polygenic risk scores 8 . By respecting population stratification, we can pinpoint genomic variants that are rare or absent in European populations. Diversity-aware analyses have already yielded insights, including genetic risk variants linked to type 2 diabetes in the Latino/Latina population 9 or prostate cancer in African men 10 . These examples highlight the value of increased diversity in GWAS participants to propel genetic discovery, enhance understanding of genetic diseases, and refine medical care tailored to single patients.
Incomplete knowledge of how predictive models perform in distinct subpopulations hampers neuroscience, genetics and other biomedical research areas. We need further research benchmarking predictive models in minority populations since overfitting to narrow subpopulations increases structural racism and ultimately hurts the quality of patient care. Ethnicity and population diversity are closely interlocked with the genesis and pathophysiology of major brain disorders such as autism spectrum disorder (ASD), schizophrenia and Alzheimer's disease. These medical conditions exhibit subpopulation-related differences in prevalence, symptoms and treatment response. Alzheimer's disease is more prevalent among African Americans and Hispanics than white Americans in the United States, with estimates ranging from 14% to almost 100% higher 11 . Moreover, women are more often diagnosed with Alzheimer's disease than men 12 . Furthermore, in schizophrenia, men and women tend to diverge in several clinical parameters, including onset age, symptoms, disease severity and treatment responses 13 . Similarly, ASD cohorts typically have 3-5 times more male than female members 14 . Consequently, major brain disorders need to be investigated in a diverse pool of participants. These disorders have complex mechanisms that are, in many cases, interlocked with sex, age, ethnicity and potentially many other social identity factors.
A commitment to illuminating disease mechanisms through the prism of population stratification is even more critical in the aftermath of COVID-19. Demographic status has been linked with outcomes of this public-health crisis (Fig. 2a). Many lines of evidence suggest that we will face more mental health concerns due to chronic social isolation and stress 15 . Recent evidence documented the detrimental effect of COVID-19 on minority and marginalized strata of the US population 16 . Charting patterns across >17,000 candidate variables describing the ABCD population cohort, social determinants of inequity, including household income and immigration status, emerged as the primary determinants of negative pandemic experiences (Fig. 2b). Thus, COVID-19 effects can differ by sociological strata (Fig. 2c). That is why modeling the burden on population strata, especially on minority and marginalized racial and ethnic populations, should be part of the first-line approach in analyses of epidemic-related outcomes.
Statistically, ignoring population stratification can lead to a phenomenon called Simpson's paradox or reversal: a trend appears in several different groups of individuals, but disappears or inverts when the groups are combined. Thus, a treatment that appears effective at the population level may have adverse consequences within specific population subgroups. For instance, a higher drug dosage may appear to be associated with higher recovery rates at the population level. However, within specific population strata, a higher drug dosage may actually result in lower recovery rates. As a real example from COVID-19, when stratified into age groups, Belgian men within any age group had a higher COVID-19 infection fatality rate than women. However, in the total population of Belgium, women appeared to show a higher rate. This discrepancy is explained by the fact that there are considerably more older women than men in Belgium 17 . Ignoring such implications of Simpson's paradox can generate misleading conclusions, which can be dangerous, such as false claims of vaccine inefficacy. Concurrently, rarely attended dimensions of population diversity may uncover instances of Simpson's paradox that we are unaware of today.
Attending to dimensions of population diversity does not reduce to including more variables in the statistical model to be estimated. If the goal is to estimate causal effects, the selection of variables or model specification needs to be based on causal grounds. Investigators must propose and defend a plausible causal structure spelling out the assumed (directional) dependencies among the outcome, input variables and relevant confounding variables, including diversity factors 18 . Establishing an assumed causal structure at the beginning of a research endeavor requires consideration of aspects outside the dataset at hand, which can often be challenging. Moreover, the ground-truth causal structure involving some variables, such as socioeconomic status (SES), may be particularly daunting. SES is likely interwoven with inter-individual differences in brain function and susceptibilities to mental and other illnesses throughout the lifespan. Nevertheless, the differences in SES are also likely to arise, at least partly, from differences in brain function and mental illness. Thus, including such causally ambiguous diversity dimensions can lead to deceiving estimates in quantitative models.
Adding as many diversity variables to a statistical model as possible typically makes it more difficult, rarely simpler, to discern what the ultimately obtained statistical model estimates actually mean. The ensuing 'causal salad' refers to the consequences of adding numerous control variables without the necessary attention to causal structure 19 . The growing set of 'control variables' is an invitation to an erroneous causal inference of effects. Statistically controlling for inappropriately picked variables can result in collider bias: the true effects of how input variables relate to the target phenotype are distorted. A collider can refer to a variable affected by both the input variable and outcome. For example, seemingly lower mortality was observed for overweight individuals compared with those with average body mass index when controlling for cardiovascular disease (collider). However, increased body mass index is associated with shorter life expectancy. This statistical distortion of the underlying causal directional graph is because being overweight was associated with a substantially increased risk of developing cardiovascular disease at an earlier age. This, in turn, results in a greater proportion of life with cardiovascular disease morbidity 20 .
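The two statistical pitfalls discussed in this section, Simpson's reversal and collider bias, can be reproduced with a few lines of code. The sketch below rests on assumptions made purely for illustration: the recovery counts in `COUNTS` are hypothetical rather than taken from any study cited here, and the simulated exposure, outcome and collider follow an invented linear model with a true exposure effect of 0.5. All names are hypothetical.

```python
import numpy as np

# Hypothetical counts (recovered, treated) per severity stratum and drug dose.
COUNTS = {
    "mild": {"low": (81, 87), "high": (234, 270)},
    "severe": {"low": (192, 263), "high": (55, 80)},
}


def recovery_rates(counts):
    """Recovery rate per stratum and dose, plus the rate after pooling the strata."""
    rates = {
        stratum: {dose: rec / tot for dose, (rec, tot) in doses.items()}
        for stratum, doses in counts.items()
    }
    rates["pooled"] = {
        dose: sum(counts[s][dose][0] for s in counts)
        / sum(counts[s][dose][1] for s in counts)
        for dose in ("low", "high")
    }
    return rates


def collider_demo(n=50_000, seed=0):
    """Estimate the exposure effect with and without adjusting for a collider."""
    rng = np.random.default_rng(seed)
    exposure = rng.normal(size=n)
    outcome = 0.5 * exposure + rng.normal(size=n)        # true effect: +0.5
    collider = exposure + outcome + rng.normal(size=n)   # caused by both
    ones = np.ones(n)
    unadjusted = np.linalg.lstsq(
        np.column_stack([ones, exposure]), outcome, rcond=None
    )[0][1]
    adjusted = np.linalg.lstsq(
        np.column_stack([ones, exposure, collider]), outcome, rcond=None
    )[0][1]
    return unadjusted, adjusted


for group, by_dose in recovery_rates(COUNTS).items():
    print(f"{group}: low dose {by_dose['low']:.3f}, high dose {by_dose['high']:.3f}")

unadjusted, adjusted = collider_demo()
print(f"exposure coefficient: unadjusted {unadjusted:.2f}, collider-adjusted {adjusted:.2f}")
```

With these counts, the high dose shows the lower recovery rate within each stratum (about 0.87 versus 0.93 in mild cases, 0.69 versus 0.73 in severe cases) yet the higher rate after pooling (about 0.83 versus 0.78), because high doses were mostly given to mild cases. In the simulation, the unadjusted coefficient stays close to the true 0.5, whereas adjusting for the collider flips its sign; under this linear model the adjusted coefficient is -0.25 in expectation.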

Consequences of the diversity blind spot
Respecting distinct population strata can unmask aspects of brain function and organization. For example, existing neuroscientific research emphasizes the importance of handedness in face recognition or language processing. Brain studies that explicitly targeted handedness as a central axis of biology highlighted differences in the functioning of the face-processing network. Based solely on neuroimaging studies that investigated only right-handed participants, face perception was thought to be highly lateralized to the right hemisphere. Therefore, even though earlier investigations deliberately omitted ~10% of the population, the right-hemisphere lateralization of face-sensitive brain areas made it into many neuroscience textbooks 21 . Only recently have studies on handedness in face recognition tasks found that the fusiform face area is commonly lateralized to the right hemisphere in right-handers, whereas asymmetric hemispheric lateralization appears absent in left-handed subpopulations (Fig. 3a) 22 .

Fig. 2 caption (fragment; the description of panel a is truncated in the source): b, Household environment variable loadings from the multivariate pattern analysis. The bar plot reflects the top-ranked household characteristics in families with the primary explanatory mode of the model (pink, positive associations; purple, negative associations). c, A circular bar plot summarizing experienced racism, sleep hygiene and social media consumption associations with the secondary explanatory mode. Panels a-c were plotted from data associated with ref. 16 .

Fig. 3 caption (fragment; the beginning is truncated in the source): Regions activated during a word-generation task are shown in green. Regions activated during a visuospatial attention task are shown in blue. Panels a,b adapted with permission from ref. 21 . c, Handedness as an example of a major source of population diversity, impacting brain and behavior, ties into cultural and geographical differences due to varying societal pushback. The handedness occurrence in the United States is plotted across time. The bar plot highlights the ten jurisdictions with the highest and the five jurisdictions with the lowest left-handedness rates. Panel c plotted from data associated with ref. 35 .

Comment
The interdependence between handedness and interindividual differences in hemispheric specialization has also been observed in language processing. Left-handers show more bilateral language processing while right-handers show language processing lateralized to the left hemisphere (Fig. 3b) 23 . Only a tiny fraction of right-handers (4%) exhibit right-hemispheric dominance of the language network. Yet this share increases to at least 27% in left-handers. Consequently, the whole spectrum of lateralization-related variation will remain hidden as we neglect explicit modeling of hand dominance.
Excluding left-handers in neuroscientific research may not be a coincidence. Handedness has been systematically ignored throughout the history of medical studies, on grounds such as the wish to 'reduce noise' 24 . However, left-handers make up ~10% of the general population and constitute >800 million people on the planet (Fig. 3c). While the neuroscience field has been in the habit of eclipsing the population stratum of left-handers, several societies re-educate left-handers early on as children. Handedness rates thus depend on geographical location and historical period. It is estimated 25 that in the United States, only about 3-4% of individuals born before 1920 developed as left-handed, compared with about 11-12% of those born after 1950. The conversion rate can further depend on biological sex. In Japan, the proportion of females forced to convert to the more common right-handedness is much higher than that of males (95.1% versus 81.0%) 26 . Finally, in many African cultures, using the left hand is considered disrespectful and rude, which may be why only 7.9% of people in Abidjan, Ivory Coast, and 5.1% of people in Khartoum, Sudan, are left-handed 27 . Bias and unrepresentative participant samples led to several erroneous articles, including claims of a reduced lifespan of left-handers 28 , that shaped public opinion. In summary, cultural and sociological factors can drive inter-individual differences in behavior and its brain basis, which need to be accounted for in population-scale neuroscience studies.
In addition to left-handers being sometimes seen as 'non-normal' or needing correction, women were also less often recruited as participants in neuroscientific research over decades. Fortunately, public health recommendations are no longer made on purely male-based datasets such as the Baltimore Longitudinal Study of Aging. This study, which began in 1958 and explored 'normal human aging', did not enroll any women for the first 20 years of its execution 29 . Furthermore, the Physicians' Health Study concluded in 1989 that daily aspirin might reduce the risk of heart disease, based on 22,071 men and 0 women 30 .
Similarly to left-handedness, homosexuality was listed as a mental disorder in the Diagnostic and Statistical Manual of Mental Disorders until 1973. The notion of a disorder can hence be a product of the zeitgeist, scaffolded by historical and social conventions. More broadly, some investigators employ the term 'neurotypicality' or 'normative brain' to describe someone with the brain functions, behaviors and processing considered standard or average. However, studying neurotypicality may go against efforts to embrace diversity. Instead of hunting the illusion of a normal or typical brain, future neuroscience research may benefit from acknowledging key dimensions of 'neurodiversity'. In a broader context, the extended notion calls for the appreciation of all brains, including those with dyslexia, dyspraxia, dyscalculia, synesthesia, attention-deficit/hyperactivity disorder and ASD. Rich dimensions of population diversity should be respected as a natural form of human variation.

The future starts today, not tomorrow
The diversity dimensions available in a dataset play a crucial role, which can be demonstrated by the dependence of machine learning algorithm success on population diversity (Fig. 4) 31 . In the biomedical sciences, it is often challenging to know which aspects of human diversity drive differences in a biomedical dataset or research question before they are studied in the context of an actual phenotype of interest. That is, even given a single dataset or an identical participant cohort, the most relevant diversity dimensions may depend on the research goals of a particular investigator. Opting for pertinent dimensions can build on prior evidence of the dimensions' implication in the target phenotype under study. However, there are a large number of potentially relevant variables that could be considered to capture aspects of diversity. Operationally, we can only explicitly model those dimensions of diversity available to the investigator. The UK Biobank, ABCD and other population datasets with deep phenotyping and broad participant recruitment may offer early opportunities to confront such questions.
Purposefully broadening recruitment and retention of participants from diverse backgrounds is increasingly recognized as a necessary improvement. However, recruiting from marginalized communities still faces substantial challenges, such as mistrust of government entities, economic constraints, or exploitation of a vulnerable population 32 . To address some of these challenges, the Canadian Longitudinal Study on Aging took the initiative from the beginning: geographic areas that contain people with less education and lower SES, on average, were oversampled during the recruitment process 33 . Furthermore, scientific societies called for the inclusion of existing measurements from emerging regions of the world in data initiatives 34 .
In conclusion, the emergence of large-scale population datasets marks a watershed event in twenty-first-century neuroscience. Deep profiling across phenome, brain measurements and genetics puts us in a better position than ever to sensitize studies to sources of brain diversity as we march toward single-subject prediction and precision medicine. Specifically, we argue that it will pay off to treat diversity factors as variables of interest rather than nuisance variables. The future of neuroscience lies in celebrating diversity rather than perpetuating the elusive notion of a 'normal brain'.