A statistical framework for rare disease diagnosis

Despite advances in genomic analysis that have enabled characterization of coding variants responsible for Mendelian disease, the identification of disease mutations outside the coding sequence remains limited. In a new study, Mohammadi et al. report the development of ANEVA (analysis of expression variation), a statistical model that can quantify variation in gene dosage, and demonstrate its potential for identifying genes that harbour pathogenic variants driving rare diseases.

ANEVA combines a generative model of population-level allelic expression (AE) data, which describe the relative expression of the paternal and maternal haplotypes in an individual, with a mechanistic model of cis-regulatory variation to estimate how much genetic regulatory variation each gene harbours in the population. ANEVA provides biologically interpretable estimates of the expected variance in gene dosage that is due to inter-individual genetic differences within a population (VG). Importantly, VG estimates derived by applying ANEVA to AE data from version 7 of the Genotype-Tissue Expression (GTEx) project were consistent with previous GTEx expression quantitative trait locus (eQTL) data and largely concordant with estimates of the cis heritability (h²) of gene expression from two other large studies, illustrating the robustness of ANEVA.
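Why should allelic expression say anything about dosage variance? A toy calculation helps, under assumptions that are ours rather than ANEVA's: each haplotype's log2 expression is an independent normal draw with standard deviation sigma, and total dosage is the sum of the two haplotypes. The allelic log2 ratio h1 - h2 then has variance 2·sigma², while log2 dosage, approximately 1 + (h1 + h2)/2, has variance close to sigma²/2 when effects are small. In this toy setting the population variance of AE is therefore about four times the dosage variance. ANEVA's actual generative and cis-regulatory models are more detailed, and the function and parameter names below are hypothetical.

```python
import math
import random
import statistics


def simulate_population(n_individuals: int, sigma: float, seed: int = 0):
    """Toy model (an assumption, not ANEVA's model): independent normal
    log2 expression effects on each of an individual's two haplotypes."""
    rng = random.Random(seed)
    allelic_ratios, log_dosages = [], []
    for _ in range(n_individuals):
        h1 = rng.gauss(0.0, sigma)  # log2 expression of haplotype 1
        h2 = rng.gauss(0.0, sigma)  # log2 expression of haplotype 2
        allelic_ratios.append(h1 - h2)  # AE: allelic log2 ratio
        log_dosages.append(math.log2(2 ** h1 + 2 ** h2))  # log2 total dosage
    return allelic_ratios, log_dosages


ae, dosage = simulate_population(50_000, sigma=0.2)
var_ae = statistics.pvariance(ae)
var_dosage = statistics.pvariance(dosage)
ratio = var_ae / var_dosage  # close to 4 in this toy setting
vg_estimate = var_ae / 4  # dosage variance inferred from AE data alone

print(f"Var(AE) / Var(log2 dosage) = {ratio:.2f}")
print(f"VG inferred from AE: {vg_estimate:.4f}, simulated: {var_dosage:.4f}")
```

The factor of four rests entirely on the independence and small-effect assumptions of this toy model; the point is only that the spread of AE across a population constrains how much gene dosage can vary.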

The authors then showed that ANEVA captures biological sources of regulatory variation between genes. VG estimates correlated well between different tissue types in GTEx data, and a given gene tended to have a smaller VG in tissues in which it is highly expressed. VG estimates from three European subpopulations and one African subpopulation of the GEUVADIS consortium were also highly correlated, suggesting that the extent of gene dosage variation does not differ substantially between populations. Additionally, a generalized mean of VG across GTEx tissues, calculated for each gene, correlated with published measures of selective constraint on genes and traits derived from analyses of coding and regulatory variation.
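A generalized (power) mean of per-tissue values x1, ..., xn with exponent p is ((x1^p + ... + xn^p) / n)^(1/p), with the geometric mean as the p = 0 limit. The text does not say which exponent was used, so this sketch leaves `p` as a parameter; exponents below 1 pull the summary toward a gene's smallest per-tissue VG. The function name and the tissue values in the usage lines are invented for illustration.

```python
import math
from typing import Sequence


def power_mean(values: Sequence[float], p: float) -> float:
    """Generalized (power) mean of positive values; p = 0 gives the geometric mean."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("values must be positive")
    n = len(values)
    if p == 0:
        # The limit of the power mean as p -> 0 is the geometric mean.
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


# Hypothetical per-tissue VG estimates for one gene (illustrative numbers only).
vg_by_tissue = {"skeletal_muscle": 0.02, "liver": 0.05, "whole_blood": 0.01}
for p in (1, 0, -1):  # arithmetic, geometric and harmonic means
    print(p, round(power_mean(list(vg_by_tissue.values()), p), 4))
```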

ANEVA was next assessed for its ability to identify potentially pathogenic population outlier genes. To this end, the authors developed the ANEVA dosage outlier test (ANEVA-DOT), a statistical test that compares an individual's AE data with population distributions. The pre-calculated VG estimates from ANEVA serve as a reference for identifying potentially pathogenic genetic variants that cause the expression of a gene to fall outside the normal range seen in a healthy population. One example is a heterozygous variant that strongly reduces expression of one allele of a gene that is intolerant to substantial dosage change. Using 466 GTEx skeletal muscle samples, the authors showed that ANEVA-DOT effectively captures rare genetic effects on gene dosage.
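The logic of a dosage outlier test can be sketched with a deliberately simplified stand-in: a normal-approximation z-test on an individual's allelic log2 ratio, whose null variance is the population AE variance for that gene plus read-count sampling noise, followed by Benjamini–Hochberg correction across genes. This is not ANEVA-DOT's model. The function names, the `population_ae_var` input (standing in for a gene's VG-based reference), the 0.5 pseudocounts and the read counts are all assumptions for illustration.

```python
import math
from statistics import NormalDist


def dosage_outlier_pvalue(ref_count: int, alt_count: int, population_ae_var: float) -> float:
    """Two-sided p-value that an individual's allelic log2 ratio is a population outlier.

    Simplified z-test, not ANEVA-DOT: the null is normal with mean 0 and variance
    equal to the population AE variance plus Poisson sampling noise.
    """
    a, b = ref_count + 0.5, alt_count + 0.5  # pseudocounts avoid log(0)
    observed = math.log2(a / b)
    # Delta-method variance of log2(a / b) for independent Poisson counts.
    sampling_var = (1 / a + 1 / b) / math.log(2) ** 2
    z = observed / math.sqrt(population_ae_var + sampling_var)
    return 2 * NormalDist().cdf(-abs(z))


def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    n_rejected = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            n_rejected = rank  # keep the largest rank that passes its threshold
    return sorted(order[:n_rejected])


# Hypothetical allele-specific read counts (ref, alt) for one individual.
counts = {"GENE_A": (52, 48), "GENE_B": (95, 5), "GENE_C": (60, 40)}
genes = list(counts)
pvals = [dosage_outlier_pvalue(*counts[g], population_ae_var=0.05) for g in genes]
outliers = [genes[i] for i in benjamini_hochberg(pvals)]
print(outliers)
```

With these made-up counts only GENE_B, with a 95:5 allelic split, is flagged; the milder 60:40 imbalance of GENE_C is absorbed by the assumed population variance and sampling noise.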

Using the VG reference estimates from the GTEx skeletal muscle samples, the authors applied ANEVA-DOT to AE data from 70 patients with rare Mendelian muscular dystrophies and myopathies. Of the 65 patients with data of sufficient quality for analysis, 32 had a previous pathogenic genetic diagnosis, 21 of which involved variants expected to cause allelic imbalance. ANEVA-DOT accurately detected genes with pathogenic variants in these previously resolved cases: out of a median of 2,190 genes tested per individual, it identified a median of 11 outlier genes, which included the previously diagnosed gene in 76% of the diagnosed patients whose variants were expected to cause allelic imbalance. Furthermore, among the 33 patients who had not received a genetic diagnosis from previous DNA and RNA sequencing analysis, a median of 9 outlier genes was identified per sample, including at least 1 neuromuscular disease-related gene in 12 patients. For one of these potential new diagnoses, RNA sequencing and RT-PCR analysis confirmed a diagnosis in the Mendelian muscle disease gene DES; this diagnosis had previously been missed because the causal variant was intronic. These findings demonstrate how VG reference estimates from GTEx data can inform rare disease diagnosis and help identify non-coding pathogenic variants that current approaches might miss.
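The cohort figures can be checked with a few lines of arithmetic. Because 2,190 and 11 are both medians across individuals, their ratio only roughly indicates how far ANEVA-DOT narrows the candidate list for each person; the variable names are ours.

```python
# Figures as reported in the text; gene counts are medians across individuals.
patients_total = 70
high_quality = 65
diagnosed = 32
undiagnosed = high_quality - diagnosed  # patients without a prior genetic diagnosis

median_genes_tested = 2_190
median_outlier_genes = 11
fraction_flagged = median_outlier_genes / median_genes_tested

print(f"excluded for data quality: {patients_total - high_quality}")
print(f"undiagnosed: {undiagnosed}")
print(f"share of tested genes flagged (ratio of medians): {fraction_flagged:.2%}")
print(f"undiagnosed patients with a neuromuscular candidate: {12 / undiagnosed:.0%}")
```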

Overall, the ANEVA–ANEVA-DOT statistical framework can provide insight into rare and pathogenic variants, including those in non-coding regions, to complement transcriptomics-based diagnostic pipelines for patients with rare Mendelian diseases.


Original article

  1. Mohammadi, P. et al. Genetic regulatory variation in populations informs transcriptome analysis in rare disease. Science 366, 351–356 (2019).

Author information



Corresponding author

Correspondence to Conor A. Bradley.

About this article

Cite this article

Bradley, C.A. A statistical framework for rare disease diagnosis. Nat Rev Genet 21, 2–3 (2020).
