When we talk about the pathogenicity of genetic variants, what exactly are we talking about? Although this question may appear trivial or merely philosophical, it is not. It shapes the foundational logic of human genetics research and determines the utility of our work with respect to disease risk and clinical intervention. In brief, our standard definitions of pathogenicity refer to variants that are deleterious, harmful, or increase the probability of disease1. This sounds simple, but it is too simple, because this definition often leads us to ignore a key principle: genes evolve and function in contexts created by their environment, including other genetic variants. These contexts can determine penetrance and thus the ability of a variant to cause disease.

Variant pathogenicity often depends on context

A simple but informative example of the context dependence of pathogenicity is the beta-globin variant that causes hemoglobin S (HbS). An individual who is homozygous for the HbS allele has sickle cell disease, which increases the risk of death at a young age2,3. However, the same allele paired with a second allele that encodes HbA reduces the risk of death at a young age in malaria-endemic regions4,5. This survival advantage is why the HbS allele is common in malaria-endemic regions and has not been culled by evolution6,7. In regions without malaria, being heterozygous for HbS may not affect the risk of death at a young age, unless the carrier’s genome contains another precipitating variant or the carrier experiences hypoxia while exercising at high altitude8,9,10. In these distinct, malaria-free contexts, the HbS variant may again increase the risk of death at a young age. Adding to this complexity, data now indicate that older heterozygotes may have an increased risk of subclinical kidney pathology and higher rates of acute renal failure after SARS-CoV-2 infection11. Finally, variants that decrease the expression of the alpha-globin subunits (HBA1 and HBA2; alpha thalassemia)12,13 or that allow the gamma-globin subunits (HBG1 and HBG2) to remain expressed into adulthood (hereditary persistence of fetal hemoglobin)14 can greatly mitigate the risk of death due to HbS homozygosity. Thus, the pathogenicity of the HbS variant depends heavily on other alleles, the environment, and the health outcome being evaluated. HbS can be considered a “simple” case, but even here, pathogenic potential is strongly shaped by multiple contextual factors (Fig. 1).
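The claim that heterozygote survival keeps HbS common can be made concrete with the textbook heterozygote-advantage model. The sketch below is illustrative only: the fitness costs `s` (malaria mortality in AA homozygotes) and `t` (mortality from sickle cell disease in SS homozygotes) are hypothetical values, not estimates from the cited studies, and random mating is assumed.

```python
# Toy heterozygote-advantage model for the HbS allele.
# All fitness values are hypothetical illustrations, not estimates.


def next_freq(q: float, s: float, t: float) -> float:
    """Frequency of allele S after one generation of viability selection.

    Relative fitnesses: AA = 1 - s (malaria cost), AS = 1, SS = 1 - t
    (sickle cell disease cost). Assumes random mating.
    """
    p = 1.0 - q
    w_aa, w_as, w_ss = 1.0 - s, 1.0, 1.0 - t
    mean_fitness = p * p * w_aa + 2 * p * q * w_as + q * q * w_ss
    # S alleles come from SS homozygotes and from half the alleles of AS carriers.
    return (q * q * w_ss + p * q * w_as) / mean_fitness


def simulate(q0: float, s: float, t: float, generations: int) -> float:
    """Iterate selection for a fixed number of generations."""
    q = q0
    for _ in range(generations):
        q = next_freq(q, s, t)
    return q


def balanced_equilibrium(s: float, t: float) -> float:
    """Stable equilibrium under heterozygote advantage (requires s > 0 and t > 0)."""
    return s / (s + t)


if __name__ == "__main__":
    # Malaria present: heterozygotes outsurvive both homozygotes.
    print(simulate(0.01, s=0.1, t=0.8, generations=2000), balanced_equilibrium(0.1, 0.8))
    # Malaria absent (s = 0): no heterozygote advantage, so S slowly declines.
    print(simulate(0.11, s=0.0, t=0.8, generations=2000))
```

In this toy model the same allele is maintained at an intermediate frequency, s / (s + t), when the malaria context is present and drifts toward loss when it is removed, which is the context dependence described above.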

Fig. 1: The complex determinants of phenotype and pathogenicity: example of a relatively simple case—Hemoglobin S (rs334).

Starting at the center of this schematic and moving radially outward in any direction, different relevant contexts are encountered. These contexts determine the type and severity of the observed phenotypes. The schematic reflects our current understanding and is not intended to be an exhaustive description of all phenotypes linked to rs334; some modifying contexts and relevant phenotypes may yet be discovered. Finally, although it cannot be fully depicted in this figure, phenotypes may act as competing risks for one another, and this becomes more complex with age. For example, a person cannot develop chronic kidney disease in older age if they die of sickling complications at a younger age.

This example shows that a universal pathogenicity assessment applies an oversimplified framework to an inherently complex phenomenon. Even when a variant can cause disease, it often does not, and knowing the modifying factors is critical to evaluating pathogenicity. Assuming that a genetic variant has a single, unidirectional effect on one outcome therefore obscures the complex genetic architecture of disease15. Regulatory processes, genetic buffering, environmental interactions, and epistasis can all play roles in determining the impact of a given variant16,17,18,19, and these contexts cannot be ignored if we want to understand variant pathogenicity15.

Defining pathogenicity is especially hard for variants with low penetrance and variable expressivity

Nonetheless, attempts are still made to produce “universal pathogenicity” assessments20. These assessments may make sense for highly penetrant variants that cause Mendelian disease, but what about low-penetrance variants with variable expressivity? Allelic expression levels, epigenetic changes, cis-acting variants, trans-acting variants, environmental exposures, and other factors, including lifestyle, collectively shape variant impact21,22. Moreover, low-penetrance variants make up a very large proportion of our annotations: when over 5,000 pathogenic and loss-of-function variants were assessed in the UK Biobank and BioMe, the mean penetrance was unexpectedly low (6.9%, 95% CI: 6.0–7.8%)23. Some of this pattern can be explained by the factors that drive the winner’s curse (i.e., the inflated magnitude of initial associations due to low power, publication bias, model overfitting, etc.)24,25, but smaller associations should also be expected when study participants are more diverse. Family-based, clinical, and case-control studies have more homogeneous participants, and because study entry is partly conditioned on disease status, these study groups are enriched for etiologic co-factors. As a result, lower penetrance and smaller effect sizes will often be observed in large population-based cohorts22,26,27, even when there are subgroups in which penetrance is high. When a variant has a small effect size and reduced penetrance in a heterogeneous, population-based sample, it is important to examine that variant in multiple contexts. This can identify potentially sensitive subgroups, such as ancestry groups, environments, or multiplex families, with higher penetrance and pathogenicity. Overall, assessing variants in multiple contexts28,29 is critical to understanding differences in the causal mechanisms of disease across distinct groups.
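The dilution argument can be sketched as a weighted average: the penetrance observed in a sample is the penetrance in each stratum weighted by the share of carriers sampled from that stratum. The strata, shares, and penetrance values below are invented for illustration, and `pooled_penetrance` is a hypothetical helper, not a published method.

```python
# Hypothetical illustration of how pooling strata dilutes penetrance.
# Every number below is invented for illustration.


def pooled_penetrance(strata: list[tuple[float, float]]) -> float:
    """P(disease | carrier) in a sample, given (carrier share, penetrance) per stratum."""
    total_share = sum(share for share, _ in strata)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("carrier shares must sum to 1")
    return sum(share * penetrance for share, penetrance in strata)


# Same two strata, different sampling: a sensitive subgroup (penetrance 0.80)
# and all other carriers (penetrance 0.02).
population_cohort = [(0.05, 0.80), (0.95, 0.02)]        # sensitive carriers are rare
clinically_ascertained = [(0.60, 0.80), (0.40, 0.02)]   # sensitive carriers over-represented

if __name__ == "__main__":
    print(f"population cohort: {pooled_penetrance(population_cohort):.3f}")
    print(f"clinically ascertained sample: {pooled_penetrance(clinically_ascertained):.3f}")
```

The variant biology is identical in both samples; only the mix of carriers differs, yet the observed penetrance changes roughly eightfold (0.059 versus 0.488 in this toy example).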

Downplaying this heterogeneity impairs clinical communication and practice

Regardless of the reason for low penetrance, it creates a problem for pathogenicity assessments and clinical genetic practice. When these annotations are used as screening tests for disease risk, there is a systematic problem with test specificity (i.e., the ability of a test to identify true negatives and avoid false positives30). Because the penetrance of many pathogenic variants is low, most people who carry them will not develop disease. Applied clinically, such annotations can therefore produce a very large number of false positives and subsequent unnecessary interventions. While a strong argument can be made for tolerating false positives (type I error) in the early stages of genetic discovery research31,32, false positives in clinical settings can lead to patient anxiety, needless expense, and harm33.
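Because penetrance is the probability of disease given the variant, a low penetrance translates directly into the share of positive screening results that never become disease. The arithmetic below is a sketch: the cohort size and carrier frequency are hypothetical, the 6.9% figure is the mean penetrance reported above reused purely for illustration, and penetrance is assumed to be defined over the same time window as screening follow-up.

```python
# Illustrative screening arithmetic; cohort size and carrier frequency are hypothetical.


def screening_outcomes(n_screened: int, carrier_freq: float, penetrance: float) -> dict[str, float]:
    """Expected outcomes when carrying a variant is treated as a positive test."""
    carriers = n_screened * carrier_freq
    true_positives = carriers * penetrance        # carriers who develop disease
    false_positives = carriers - true_positives   # carriers who never do
    return {
        "carriers": carriers,
        "true_positives": true_positives,
        "false_positives": false_positives,
        "false_share_of_positives": false_positives / carriers if carriers else 0.0,
    }


if __name__ == "__main__":
    result = screening_outcomes(n_screened=100_000, carrier_freq=0.02, penetrance=0.069)
    for key, value in result.items():
        print(key, round(value, 3))
```

Under these assumptions, about 1,862 of 2,000 positive results (93.1%) would be false positives.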

One way to vet putative pathogenicity is to perform experiments that biologically validate the effects of genetic variants. However, such experiments have limited generalizability, because they are restricted to the conditions under which they are performed. In vitro experiments and animal models can provide clear causal and mechanistic evidence of pathogenicity, but they cannot test or recreate all relevant contexts. For example, the experimental temperature, day-night cycle, diet, air quality, or hormonal milieu may not reflect those of the humans who carry a potentially pathogenic variant. Geneticists are familiar with this dependence of phenotype on environment, formalized as norms of reaction, which have been taught in genetics classes for decades34,35. However, some physicians and members of the public may be less familiar with how this fundamental principle of genetic variation can affect our annotations.

Universal pathogenicity assessments also create a systematic problem with sensitivity (i.e., the ability of a test to identify true positives and avoid false negatives30). This is partly because our annotation guidelines36, even when thoughtfully refined37, have traditionally treated “absence of evidence” as “evidence of absence”. In other words, when a variant is observed in a high number of healthy people (e.g., minor allele frequency [MAF] >5%) and has not yet been linked to disease, it can be labeled benign. Unfortunately, this approach fails to account for the determinants of penetrance. If a key determinant of penetrance was absent from the observed population, a conditionally pathogenic variant can be labeled a Variant of Uncertain Significance or even Benign. This creates many issues, but it seems particularly troublesome in the clinic when sequencing patients to identify the cause of rare syndromes38. Imagine trying to annotate the phenylalanine hydroxylase gene variants that cause phenylketonuria39 in a population with almost no access to foods that contain phenylalanine: these variants would appear benign in that context. Hence, in most cases, variant pathogenicity assessment identifies what can cause disease, but, importantly, it does not identify what will cause disease in a given person at a given time40,41. This context-agnostic approach has utility, but its limitations must be acknowledged and accounted for.
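The “absence of evidence” problem can be sketched with a variant that raises risk only when a trigger exposure is present, in the spirit of the phenylalanine example. The rule below is a deliberate simplification of frequency-based benign criteria (common and not linked to disease implies benign), not the full ACMG/AMP guideline; the function names, allele frequency, and risk values are hypothetical, and “linked to disease” is idealized as any excess risk among carriers (i.e., an infinitely large study).

```python
# Hypothetical model of a conditionally pathogenic variant.
# The "> 5% and no disease link" rule is a deliberate simplification of
# frequency-based benign criteria, not the full ACMG/AMP guideline.


def carrier_risk(exposure_prevalence: float, risk_if_exposed: float, baseline_risk: float) -> float:
    """Disease risk among carriers: elevated only when the trigger exposure is present."""
    return exposure_prevalence * risk_if_exposed + (1 - exposure_prevalence) * baseline_risk


def frequency_rule_label(allele_freq: float, exposure_prevalence: float,
                         risk_if_exposed: float, baseline_risk: float) -> str:
    excess = carrier_risk(exposure_prevalence, risk_if_exposed, baseline_risk) - baseline_risk
    linked_to_disease = excess > 0  # idealized: any excess risk is detected
    if allele_freq > 0.05 and not linked_to_disease:
        return "Benign"
    return "Not benign under this rule"


if __name__ == "__main__":
    # Identical variant biology; only the prevalence of the trigger exposure differs.
    for exposure in (0.0, 0.3):
        label = frequency_rule_label(0.08, exposure, risk_if_exposed=0.9, baseline_risk=0.01)
        print(f"exposure prevalence {exposure}: {label}")
```

In the unexposed population the variant is indistinguishable from a neutral one and is labeled benign, even though 90% of exposed carriers would develop disease in this toy model.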

Existing genomic methods improve when context is considered

Despite the drawbacks of defining pathogenicity as a binary and immutable feature of variants, as is often done, genetic researchers have created many highly useful techniques. For example, molecular algorithms have been developed that predict loss of protein function, and these have high value in many settings42,43,44. We also now have protocols for molecular and clinical validation with laboratory-based functional assays45, and for the longitudinal tracking of sequenced individuals in electronic health records46. Furthermore, several key papers have improved our thinking about the need for diverse, convergent evidence in causal reasoning in genomics31,47,48. Perhaps the most impressive advance in this area is the scoring system developed by ClinGen that assembles and interprets empirical evidence for pathogenicity49. However, these approaches can only do so much when context is not explicitly considered. For example, even if we could develop a prediction algorithm that perfectly determined loss-of-function in any protein, we would still not know whether loss-of-function was good or bad for a given individual (given the rest of their genome, their environment, and the phenotype in question)50,51,52,53,54,55,56. Take, for instance, a protein that converts procarcinogenic compounds to carcinogens. Loss-of-function of this protein may be beneficial in the context of high procarcinogen exposure57. Hence, the context, in this case the environment, can change a variant from pathogenic to beneficial and vice versa.

Therefore, even with the best methods, we can observe conflicting evidence of pathogenicity when we do not explicitly consider context. This is particularly relevant for common variants. If a variant is detrimental in all contexts, it will usually be observed as a rare or de novo variant. In other words, evolution persistently culls variants that reduce reproductive fitness in all contexts, but variants can be maintained when there are contexts in which they do not reduce reproductive fitness. This may be especially evident for pleiotropic variants, because antagonistic pleiotropy appears to play a major role in the persistence of several human disease variants58,59. For example, APOE4, the strongest common genetic risk factor for late-onset Alzheimer’s disease60,61, also prevents death from diarrhea in childhood62,63. Our ancestors probably needed protection against infection for their reproductive fitness, and one of the variants that met this early-life requirement also increased the risk of a late-life disease, Alzheimer’s disease62,63,64,65,66. Thus, it makes very little sense to talk about the universal pathogenicity of any common variant. However, from a practical perspective, it is hard to do anything else.
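Why variants that are harmful in every context stay rare can be quantified with the classical mutation-selection balance approximations from population genetics: roughly μ/(hs) for a partially dominant allele and √(μ/s) for a fully recessive one. The sketch below uses hypothetical parameter values; the first approximation is only reasonable when the dominance coefficient h is not close to zero.

```python
import math

# Standard mutation-selection balance approximations (classical population genetics).
# mu: per-generation mutation rate to the allele; s: selection coefficient against
# homozygotes; h: dominance coefficient. Parameter values below are hypothetical.


def mutation_selection_equilibrium(mu: float, s: float, h: float) -> float:
    """Approximate equilibrium frequency of an allele that is harmful in every context."""
    if h > 0:
        return mu / (h * s)      # partially dominant: selection acts on heterozygotes
    return math.sqrt(mu / s)     # fully recessive: selection acts only on homozygotes


if __name__ == "__main__":
    print(mutation_selection_equilibrium(mu=1e-8, s=1.0, h=0.0))   # recessive lethal
    print(mutation_selection_equilibrium(mu=1e-8, s=1.0, h=0.5))   # semi-dominant lethal
    print(mutation_selection_equilibrium(mu=1e-8, s=0.01, h=0.0))  # mildly harmful recessive
```

With a per-site mutation rate on the order of 1e-8, even a mildly harmful recessive allele settles near 0.001, orders of magnitude below the frequencies of common variants (MAF >5%).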

Context is complex—how can we specify it?

Context is easy to invoke as a concept, but the relevant context, or the determinants of penetrance, can differ for virtually every variant. Thus, when operationalizing research questions, we must ask: What contexts do we measure? What contexts do we analyze? What phenotype do we examine? Even in the simplest research case with a single SNP, the potentially relevant context can be a cryptic and computationally impractical search space. Unfortunately, this search space explodes into intractability with genome-wide association or next-generation sequencing data (millions of SNPs and potentially thousands of exposome variables). So, how can this problem be addressed, and how can the contexts that need attention be identified? It may be most practical to start with common, easily measured “contexts” that are known to have strong biological effects. This will help to optimize precision, statistical power, and the likelihood of documenting context-dependent pathogenicity.
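A back-of-the-envelope count shows how quickly an exhaustive context search grows. The SNP and exposure counts below are hypothetical round numbers matching the orders of magnitude in the text, and Bonferroni correction is used only as the simplest illustration of the resulting multiple-testing burden, not as a recommended analysis.

```python
import math

# Back-of-the-envelope size of an exhaustive context search.
# The counts are hypothetical round numbers matching the orders of magnitude in the text.


def context_search_size(n_snps: int, n_exposures: int) -> dict[str, int]:
    return {
        "snp_x_exposure": n_snps * n_exposures,  # every SNP against every exposure
        "snp_x_snp": math.comb(n_snps, 2),       # every unordered pair of SNPs
    }


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that controls the family-wise error rate at alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    sizes = context_search_size(n_snps=1_000_000, n_exposures=1_000)
    for name, n in sizes.items():
        print(f"{name}: {n:,} tests, Bonferroni threshold {bonferroni_threshold(0.05, n):.1e}")
```

A billion SNP-by-exposure tests would require a per-test threshold near 5e-11, and all pairwise SNP interactions (about 5e11 pairs) push it to about 1e-13, which is why starting with a few strong, easily measured contexts is attractive.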

With these features in mind, biological sex is among the easiest contexts to evaluate. It is easily measured, it divides all human populations approximately in half, and many anatomic, physiologic, and pathophysiologic distinctions align with it. Thus, we can, and probably should, run sex-stratified sensitivity analyses in most genetic research studies67,68,69, especially when a trait is sexually dimorphic70. Failure to do so can obscure important biological patterns. Another step would be to encourage new methods for probing the X chromosome, which is often ignored in association analyses. We have already started down this path by using the female-to-male allele frequency ratio as a tool for discovering pathogenic variants (Equation 1)71. The reasoning is as follows: females have two copies of all non-pseudoautosomal X-chromosome loci, whereas males have only one. Thus, females can be biologically more resilient to harmful recessive variants at these sites. The exception is variants with dominant effects, for which the ratio will not be useful. In any dataset of adult humans, when a non-pseudoautosomal X-chromosome variant is present at a higher proportion in females, this pattern can serve as evidence that the variant may increase the probability of premature death in males (and in homozygous females).
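One common way to follow up a sex-stratified analysis is to test whether the female and male effect estimates differ, using a z-test on their difference. This is a generic sketch rather than the specific method of the cited studies; it assumes the two estimates come from non-overlapping samples (so their errors are independent), and the input values are hypothetical.

```python
import math

# One common way to compare sex-stratified estimates: a z-test on the difference
# between independent female and male effect sizes. Inputs below are hypothetical.


def sex_difference_test(beta_f: float, se_f: float, beta_m: float, se_m: float) -> tuple[float, float]:
    """Return (z, two-sided p) for H0: the effect is equal in females and males.

    Assumes the two estimates come from non-overlapping samples, so their
    sampling errors are independent.
    """
    z = (beta_f - beta_m) / math.sqrt(se_f ** 2 + se_m ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p


if __name__ == "__main__":
    z, p = sex_difference_test(beta_f=0.20, se_f=0.05, beta_m=0.02, se_m=0.05)
    print(f"z = {z:.2f}, p = {p:.3g}")
```

A small p-value here flags a candidate sex difference for follow-up; it does not by itself establish sex-specific pathogenicity, and unequal sample sizes between sexes affect power.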

Following this simple logic, we used gnomAD data72 to characterize this phenomenon. Our methods are fully described elsewhere71, but in short, we obtained exome data from the X chromosomes of 76,702 males and 64,754 females. We then calculated female-to-male allele frequency ratios for the 44,606 variants that had an allele count of at least 5. None of the pseudoautosomal variants had a ratio above 11, but 319 of the non-pseudoautosomal variants had ratios above this empirical threshold.
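The screening step can be sketched as applying Equation 1 to each variant, skipping variants with too few observations, and flagging ratios above 11. This is an illustrative reimplementation, not the authors’ code: it assumes “allele count of at least 5” means the combined minor allele count in both sexes, the first record uses the counts reported in the next paragraph (read as allele counts), and the other records and all identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XVariant:
    variant_id: str  # hypothetical identifier
    v_f: int  # V_f: minor allele count in females
    a_f: int  # A_f: total allele count in females (two per female)
    v_m: int  # V_m: minor allele count in males
    a_m: int  # A_m: total allele count in males (one per male at non-PAR loci)


def allele_proportion_ratio(v: XVariant) -> float:
    """Equation 1: female-to-male allele proportion ratio, with +1 added to every count."""
    return ((v.v_f + 1) / (v.a_f + 1)) / ((v.v_m + 1) / (v.a_m + 1))


def high_ratio_variants(variants: list[XVariant], min_allele_count: int = 5,
                        threshold: float = 11.0) -> list[tuple[str, float]]:
    """Return (id, ratio) for variants with enough observations and a ratio above threshold."""
    hits = []
    for v in variants:
        if v.v_f + v.v_m < min_allele_count:
            continue  # too few observations for a stable ratio
        ratio = allele_proportion_ratio(v)
        if ratio > threshold:
            hits.append((v.variant_id, ratio))
    return hits


toy_variants = [
    XVariant("example_from_text", v_f=18_736, a_f=104_056, v_m=0, a_m=38_527),
    XVariant("balanced_toy", v_f=2_000, a_f=100_000, v_m=1_000, a_m=50_000),
    XVariant("too_rare_toy", v_f=4, a_f=10_000, v_m=0, a_m=100_000),
]

if __name__ == "__main__":
    for variant_id, ratio in high_ratio_variants(toy_variants):
        print(f"{variant_id}: R = {ratio:,.1f}")
```

The +1 pseudocounts keep the ratio finite when no males carry the allele, and the allele-count filter matters: the last toy record has a ratio near 50 but is excluded because only four minor alleles were observed.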

Only 25 of these high-ratio variants were annotated in ClinVar and had an rs number. Most of these variants had high sex-averaged MAFs and no known associations with disease, and they were listed as benign or likely benign (Table 1). For example, one of the 25 variants had a sex-averaged MAF of 0.13, no known disease associations, and was listed as likely benign. This site had 38,527 allele calls in males (one per male) and 104,056 in females (two per female), so there was no shortage of data. Overall, the variant was observed 18,736 times, but not one of these observations came from a male or a homozygous female; it was found only in heterozygous females. Thus, it is likely that this variant is almost 100% lethal (perhaps even embryonic lethal) in males and homozygous females, but without large effect in heterozygous females. The other 24 variants showed similar, though less extreme, patterns.

Table 1 Variants identified in gnomAD exome data that have an allele proportion ratio above 11 and a ClinVar entry.
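The counts reported for the likely-benign example can be turned into a rough expectation of how many male carriers and homozygous females should have been seen if the variant had no effect on survival. This is a sketch under stated assumptions: the female count is read as an allele count (which reproduces the reported sex-averaged MAF of 0.13), males and females are assumed to share one underlying allele frequency, and Hardy-Weinberg proportions are assumed in females.

```python
# Worked example using the counts reported for the likely-benign variant.
# Assumes the female count is an allele count (two alleles per female), that males and
# females share one underlying allele frequency, and Hardy-Weinberg proportions in females.

male_alleles = 38_527      # A_m: one allele per male
female_alleles = 104_056   # A_f: two alleles per female
observed = 18_736          # minor alleles seen, all in heterozygous females

sex_averaged_maf = observed / (male_alleles + female_alleles)
female_af = observed / female_alleles
n_females = female_alleles // 2

# Counts expected if the variant had no effect on survival in either sex.
expected_male_carriers = male_alleles * female_af
expected_homozygous_females = n_females * female_af ** 2

print(f"sex-averaged MAF: {sex_averaged_maf:.3f}")
print(f"expected hemizygous male carriers: {expected_male_carriers:,.0f} (observed 0)")
print(f"expected homozygous females: {expected_homozygous_females:,.0f} (observed 0)")
```

Under these assumptions, roughly 6,900 male carriers and 1,700 homozygous females would be expected, against zero observed in each group.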

To further characterize these variants, we probed them with a diverse set of web-based bioinformatic resources: dbSNP73, VarSome74, OMIM75, and VENUS76,77. These databases provide additional information on evolutionary conservation, gene-phenotype relationships, protein-structure predictions, and other aspects of these variants that need consideration in pathogenicity assessments. We found that:

1. Existing annotation methods can miss sex-specific pathogenicity. We observed that 22 of the 25 (88%) high-ratio variants are listed as Benign or Likely Benign in ClinVar (1 is listed as Conflicting [Uncertain Significance and Benign], and 2 are listed as Uncertain Significance). These variants are commonly observed in healthy heterozygous females and reach high sex-averaged MAFs, so they appear benign, but they are rarely observed in males (i.e., these variants are often not tolerated in males).

2. QC procedures can mislabel evidence of sex-specific pathogenicity as genotyping error. In the second gnomAD dataset (the genomes data), 22 of the 25 (88%) high-ratio variants failed QC filters74. Sex differences in MAF were assumed to be errors rather than potential evidence of sex-specific pathogenicity. Thus, these QC filters may systematically remove variants with sex-specific pathogenicity before they can even be assessed.

3. Our ratio method identified genes that were already linked to clinical syndromes through other variants. In all, 23 of 25 (92%) genes implicated by the high-ratio variants have specific links to clinical syndromes listed in OMIM75. The other two genes have tentative links to pathology described in their OMIM entries.

4. Structural predictions are not available or useful for most of these top-ratio hits. Michelanglo-VENUS structural predictions76,77 were possible for only 6 of the 25 variants (24%). VENUS requires the specification of a particular amino acid substitution at a particular site in the protein. This makes sense for some variants, but 19 of the 25 variants do not cause such a substitution, or their exact impact on the amino acid sequence cannot yet be specified (synonymous, intronic, and splice donor variants, etc.).

5. Additional heterogeneity exists, and some high-ratio variants might be better tolerated by males and homozygous females in specific contexts. Some high-ratio alleles had frequencies that differed by ancestry group, which is consistent with the interpretation that these variants may not have sex-specific pathogenicity in all contexts.

Overall, these five points indicate that seeking and documenting evidence of sex-specific effects could improve pathogenicity annotations. The existing tools for variant characterization can only do so much if context is not explicitly evaluated. Finally, we note that the many potential mechanisms for sex-specific pathogenicity remain to be characterized, but our initial results suggest that regulatory function may sometimes be involved. RegulomeDB evaluations of the 25 high-ratio variants provide diverse and nuanced information on the likelihood of regulatory function at these loci (Table 2). They reveal that 13 of the 25 high-ratio variants (52%) have some indication of regulatory function: a rank less than three or a score greater than 0.5. A rank less than three indicates at least two strong pieces of experimental evidence consistent with regulatory function, and scores greater than 0.5 fall in the top half of possible scores from models that predict transcription factor binding.

Table 2 Evidence of regulatory function among the high ratio variants.
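The regulatory-evidence criterion used for Table 2 (rank less than three, or score greater than 0.5) can be written as a one-line check. This sketch assumes RegulomeDB-style ranks written as a leading digit plus an optional letter (for example “1f”, “2b”, “4”); the helper name and the example annotations are hypothetical and are not the Table 2 values.

```python
# Sketch of the regulatory-evidence criterion described in the text.
# Assumes RegulomeDB-style ranks written as a digit plus optional letter ("1f", "2b", "4", "7").


def has_regulatory_evidence(rank: str, score: float) -> bool:
    """True if the rank category is below 3 or the model score exceeds 0.5."""
    category = int(rank[0])  # leading digit is the evidence category
    return category < 3 or score > 0.5


# Hypothetical (rank, score) pairs, not the Table 2 values.
example_annotations = [("1f", 0.20), ("2b", 0.10), ("4", 0.61), ("5", 0.13), ("3a", 0.50)]

if __name__ == "__main__":
    flagged = sum(has_regulatory_evidence(rank, score) for rank, score in example_annotations)
    print(f"{flagged} of {len(example_annotations)} variants show some regulatory evidence")
```

Note that the score condition is strict, so a score of exactly 0.5 does not count as regulatory evidence on its own.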

Sex differences in allele frequency on the X chromosome are a special case, but this pattern may also be found in autosomal variants that affect disease risk differently in males and females. Very large and very small allele proportion ratios on the autosomes may also indicate sex-specific effects that deserve further investigation. While this area of genetic research is still in its infancy, and thresholds for discovery and confirmatory findings are not yet established, we have already observed extreme female-to-male allele proportion ratios on autosomes (many standard deviations above or below the mean). Work in progress has revealed a distribution of ratios on chromosome 21 that demonstrates this point (Table 3). Ratios this extreme are very unlikely to occur by chance. Finally, we note that biological sex is just the first and simplest context to consider. More complex contexts, such as ancestry and environmental exposures, will need increased attention. For example, we already know that failing to assess ancestry-specific associations can generate ancestry-specific misinterpretations of genetic tests that disproportionately harm marginalized groups78. We need to collect genetic data on diverse ancestry groups79 and explicitly consider this context in order to avoid generating health disparities through ancestry-specific medical error80.

Table 3 Summary statistics for the 21,493 female-to-male allele proportion ratios calculated on chromosome 21 in the gnomAD exomes data.
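Describing autosomal ratios as “many standard deviations above or below the mean” can be sketched as standardizing the ratios and flagging large absolute z-scores. This sketch works on the log2 scale (an assumption: it makes a twofold excess and a twofold deficit symmetric, and the +1 pseudocounts in Equation 1 keep every ratio positive so the log is defined). The cutoff of 4 is hypothetical, since discovery thresholds are not yet established, and the toy ratios are invented.

```python
import math
import statistics

# Sketch: flag ratios that lie many standard deviations from the mean on the log scale.
# The cutoff is a hypothetical choice; discovery thresholds are not yet established.


def extreme_ratios(ratios: list[float], z_cutoff: float = 4.0) -> list[tuple[int, float]]:
    """Return (index, z) for ratios whose log2 value is beyond z_cutoff SDs of the mean."""
    logs = [math.log2(r) for r in ratios]  # log scale makes 2x and 0.5x symmetric
    mean = statistics.fmean(logs)
    sd = statistics.stdev(logs)
    flagged = []
    for i, x in enumerate(logs):
        z = (x - mean) / sd
        if abs(z) > z_cutoff:
            flagged.append((i, z))
    return flagged


# Hypothetical ratios: 220 values between 2**-0.5 and 2**0.5, plus one outlier at 64.
toy_ratios = [2 ** (0.1 * ((i % 11) - 5)) for i in range(220)] + [64.0]

if __name__ == "__main__":
    for index, z in extreme_ratios(toy_ratios):
        print(f"ratio #{index}: R = {toy_ratios[index]:.1f}, z = {z:.1f}")
```

With real data, a single extreme value also inflates the standard deviation, so robust alternatives (for example, median-based scales) may be worth considering.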

Overall, considering context will not solve all the problems in pathogenicity assessment, but it is a necessary step for addressing key clinical and translational issues in genetics. Sex-stratified GWAS70 and female-to-male allele proportion ratios71 can start us on a path that probes multiple determinants of penetrance. Much work remains to determine how best to explore contextual frameworks for variant pathogenicity, and other tools will be needed to evaluate additional factors, such as xenobiotic exposures and ancestry. However, biological sex is an ideal context to start with, because it requires no new data. Information on biological sex can be extracted from virtually all existing genomic data, and these data can be re-evaluated easily and at low cost. Furthermore, it will not be hard or expensive to better evaluate sex differences in allele frequency and to improve the definition of benign in pathogenicity annotations. As an easy first step, ClinVar could present MAFs by sex. Overall, we call on the genetic research community to proactively consider context. While the optimal frameworks for achieving this goal are not fully established, we can start by routinely evaluating the sexes separately and documenting what is known about effect modifiers in our annotations. We have proposed a deeper dive into sex as a common effect modifier, but other strata should also be explored and documented in annotations. Covariates should be collected in our datasets, and exploratory sensitivity analyses should become more routine; otherwise, we will fail to identify many determinants of penetrance that have clinical relevance.

Conclusion

In summary, these strategies will not provide better answers to the old questions; they simply refine the questions so that they are more relevant. The old questions are generally context agnostic, and they have set the basis of our understanding reasonably well, but not well enough. If we want to keep advancing, we must now address the ubiquity of pleiotropy and the contextual determinants of penetrance.

Equation 1. The female-to-male allele proportion ratio71

$$R=\frac{(V_f+1)/(A_f+1)}{(V_m+1)/(A_m+1)}$$

$R$: allele proportion ratio

$V_f$: the minor allele count in females

$A_f$: the total allele count in females

$V_m$: the minor allele count in males

$A_m$: the total allele count in males