Skip to main content

Thank you for visiting You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • Article
  • Published:

Disease variant prediction with deep generative models of evolutionary data

A Publisher Correction to this article was published on 17 December 2021

This article has been updated


Quantifying the pathogenicity of protein variants in human disease-related genes would have a marked effect on clinical decisions, yet the overwhelming majority (over 98%) of these variants still have unknown consequences1,2,3. In principle, computational methods could support the large-scale interpretation of genetic variants. However, state-of-the-art methods4,5,6,7,8,9,10 have relied on training machine learning models on known disease labels. As these labels are sparse, biased and of variable quality, the resulting models have been considered insufficiently reliable11. Here we propose an approach that leverages deep generative models to predict variant pathogenicity without relying on labels. By modelling the distribution of sequence variation across organisms, we implicitly capture constraints on the protein sequences that maintain fitness. Our model EVE (evolutionary model of variant effect) not only outperforms computational approaches that rely on labelled data but also performs on par with, if not better than, predictions from high-throughput experiments, which are increasingly used as evidence for variant classification12,13,14,15,16. We predict the pathogenicity of more than 36 million variants across 3,219 disease genes and provide evidence for the classification of more than 256,000 variants of unknown significance. Our work suggests that models of evolutionary information can provide valuable independent evidence for variant interpretation that will be widely useful in research and clinical settings.

This is a preview of subscription content, access via your institution

Access options

Buy this article

Prices may be subject to local taxes which are calculated during checkout

Fig. 1: Modelling strategy.
Fig. 2: EVE accurately predicts disease-causing variants.
Fig. 3: EVE is as good as functional experiments at predicting clinical interpretations of variants.
Fig. 4: Predictions for variants in 3,219 genes.

Similar content being viewed by others

Data availability

The data analysed and generated in this study, including multiple sequence alignments used in training, ClinVar annotations used for validation, population frequencies and predictions from our model, are available in Supplementary Information and at Predictions from other computational models are available through data are provided with this paper.

Code availability

The model code is available at,

Change history


  1. Van Hout, C. V. et al. Exome sequencing and characterization of 49,960 individuals in the UK Biobank. Nature 586, 749–756 (2020).

    Article  ADS  PubMed  PubMed Central  CAS  Google Scholar 

  2. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  3. Landrum, M. J. & Kattman, B. L. ClinVar at five years: delivering on the promise. Hum. Mutat. 39, 1623–1630 (2018).

    Article  PubMed  Google Scholar 

  4. Raimondi, D. et al. DEOGEN2: prediction and interactive visualization of single amino acid variant deleteriousness in human proteins. Nucleic Acids Res. 45, W201-W206 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Feng, B. J. PERCH: a unified framework for disease gene prioritization. Hum. Mutat. 38, 243–251 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Ioannidis, N. M. et al. REVEL: an ensemble method for predicting the pathogenicity of rare missense variants. Am. J. Hum. Genet. 99, 877-885 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. Ionita-Laza, I., McCallum, K., Xu, B. & Buxbaum, J. D. A spectral approach integrating functional genomic annotations for coding and noncoding variants. Nat. Genet. 48, 214–220 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Jagadeesh, K. A. et al. M-CAP eliminates a majority of variants of uncertain significance in clinical exomes at high sensitivity. Nat. Genet. 48, 1581-1586 (2016).

    Article  CAS  PubMed  Google Scholar 

  9. Rentzsch, P., Witten, D., Cooper, G. M., Shendure, J. & Kircher, M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 47, D886–D894 (2019).

    Article  CAS  PubMed  Google Scholar 

  10. Adzhubei, I. A. et al. A method and server for predicting damaging missense mutations. Nat. Methods 7, 248–249 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  11. Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  12. Findlay, G. M. et al. Accurate classification of BRCA1 variants with saturation genome editing. Nature 562, 217–222 (2018).

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  13. Glazer, A. M. et al. High-throughput reclassification of SCN5A variants. Am. J. Hum. Genet. 107, 111–123 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. Giacomelli, A. O. et al. Mutational processes shape the landscape of TP53 mutations in human cancer. Nat. Genet. 50, 1381–1387 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  15. Mighell, T. L., Evans-Dutson, S. & O’Roak, B. J. A saturation mutagenesis approach to understanding PTEN lipid phosphatase activity and genotype–phenotype relationships. Am. J. Hum. Genet. 102, 943–955 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Jia, X. et al. Massively parallel functional testing of MSH2 missense variants conferring Lynch syndrome risk. Am. J. Hum. Genet. 108, 163–175 (2021).

    Article  CAS  PubMed  Google Scholar 

  17. Cao, Y. et al. The ChinaMAP analytics of deep whole genome sequences in 10,588 individuals. Cell Res. 30, 717–731 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  18. Gudbjartsson, D. F. et al. Large-scale whole-genome sequencing of the Icelandic population. Nat. Genet. 47, 435–444 (2015).

    Article  CAS  PubMed  Google Scholar 

  19. Esposito, D. et al. MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biol. 20, 223 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  20. Trenkmann, M. Putting genetic variants to a fitness test. Nat. Rev. Genet. 19, 667 (2018).

    Article  CAS  PubMed  Google Scholar 

  21. Rehm, H. L. et al. ClinGen—the Clinical Genome Resource. N. Engl. J. Med. 372, 2235–2242 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  22. Grimm, D. G. et al. The evaluation of tools used to predict the impact of missense variants is hindered by two types of circularity. Hum. Mutat. 36, 513–523 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  23. Hopf, T. A. et al. Mutation effects predicted from sequence co-variation. Nat. Biotechnol. 35, 128–135 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Marks, D. S. et al. Protein 3D structure computed from evolutionary sequence variation. PLoS ONE 6, e28766 (2011).

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  25. Hopf, T. A. et al. Sequence co-evolution gives 3D contacts and structures of protein complexes. eLife 3, e03430 (2014).

    Article  PubMed Central  Google Scholar 

  26. Lapedes, A., Giraud, B. & Jarzynski, C. Using sequence alignments to predict protein structure and stability with high accuracy. Preprint at (2012).

  27. Vaser, R., Adusumalli, S., Leng, S. N., Sikic, M. & Ng, P. C. SIFT missense predictions for genomes. Nat. Protoc. 11, 1–9 (2016).

    Article  CAS  PubMed  Google Scholar 

  28. Reva, B., Antipin, Y. & Sander, C. Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Res. 39, e118 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  29. Rezende, D. J., Mohamed, S. & Wierstra, D. in Proceedings of the 31st International Conference on Machine Learning vol. 32 (eds Xing, E. P. & Jebara, T.) 1278–1286 (PMLR, 2014).

  30. Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. Preprint at (2013).

  31. Riesselman, A. J., Ingraham, J. B. & Marks, D. S. Deep generative models of genetic variation capture the effects of mutations. Nat. Methods 15, 816–822 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. Suzek, B. E., Wang, Y., Huang, H., McGarvey, P. B. & Wu, C. H. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 31, 926–932 (2015).

    Article  CAS  PubMed  Google Scholar 

  33. Kalia, S. S. et al. Recommendations for reporting of secondary findings in clinical exome and genome sequencing, 2016 update (ACMG SF v2.0): a policy statement of the American College of Medical Genetics and Genomics. Genet. Med. 19, 249–255 (2017).

    Article  PubMed  Google Scholar 

  34. Frigo, G. et al. Homozygous SCN5A mutation in Brugada syndrome with monomorphic ventricular tachycardia and structural heart abnormalities. Europace 9, 391–397 (2007).

    Article  PubMed  Google Scholar 

  35. Itoh, H. et al. Asymmetry of parental origin in long QT syndrome: preferential maternal transmission of KCNQ1 variants linked to channel dysfunction. Eur. J. Hum. Genet. 24, 1160–1166 (2016).

    Article  CAS  PubMed  Google Scholar 

  36. Glazer, A. M. et al. Deep mutational scan of an SCN5A voltage sensor. Circ. Genom. Precis. Med. 13, e002786 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. Bouvet, D. et al. Methylation tolerance-based functional assay to assess variants of unknown significance in the MLH1 and MSH2 genes and identify patients with Lynch syndrome. Gastroenterology 157, 421–431 (2019).

    Article  CAS  PubMed  Google Scholar 

  38. Pan, X. et al. Structure of the human voltage-gated sodium channel Nav1.4 in complex with β1. Science 362, eaau2486 (2018).

    Article  PubMed  CAS  Google Scholar 

  39. Fishel, R. et al. The human mutator gene homolog MSH2 and its association with hereditary nonpolyposis colon cancer. Cell 75, 1027–1038 (1993).

    Article  CAS  PubMed  Google Scholar 

  40. Peltomaki, P. Role of DNA mismatch repair defects in the pathogenesis of human cancer. J. Clin. Oncol. 21, 1174-1179 (2003).

    Article  CAS  PubMed  Google Scholar 

  41. Warren, J. J. et al. Structure of the human MutSα DNA lesion recognition complex. Mol. Cell 26, 579–592 (2007).

    Article  CAS  PubMed  Google Scholar 

  42. Brnich, S. E. et al. Recommendations for application of the functional evidence PS3/BS3 criterion using the ACMG/AMP sequence variant interpretation framework. Genome Med. 12, 3 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  43. Lewontin, R. C. The Genetic Basis of Evolutionary Change (Columbia Univ. Press, 1974).

  44. Kreitman, M. Nucleotide polymorphism at the alcohol dehydrogenase locus of Drosophila melanogaster. Nature 304, 412-417 (1983).

    Article  ADS  CAS  PubMed  Google Scholar 

  45. Sunyaev, S. et al. Prediction of deleterious human alleles. Hum. Mol. Genet. 10, 591–597 (2001).

    Article  CAS  PubMed  Google Scholar 

  46. IUCN. The IUCN red list of threatened species. IUCN (2020).

Download references


We thank members of the Marks laboratory, OATML and C. Sander for many valuable discussions. J.F., M.D. and K.B. are supported by the Chan Zuckerberg Initiative CZI2018-191853. K.B. is also supported by the US National Institutes of Health (R01 R01GM120574). P.N. is supported by GSK and the UK Engineering and Physical Sciences Research Council (EPSRC ICASE award no. 18000077). A.G. is a Clarendon Scholar and Open Philanthropy AI Fellow. Y.G. holds a Turing AI Fellowship (Phase 1) at the Alan Turing Institute, which is supported by EPSRC grant reference V030302/1. D.S.M. holds a Ben Barres Early Career Award by the Chan Zuckerberg Initiative as part of the Neurodegeneration Challenge Network, CZI2018-191853.

Author information

Authors and Affiliations



D.S.M. and Y.G. led the research. J.F., P.N. and M.D. conceived and implemented the end-to-end approach. A.G. contributed technical advice. K.B. supported with data preparation. J.K.M. developed the website. J.F., P.N., M.D., Y.G. and D.S.M. wrote the manuscript.

Corresponding authors

Correspondence to Yarin Gal or Debora S. Marks.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Martin Kircher and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Bayesian VAE architecture details.

The Bayesian VAE architecture in EVE is comprised of a symmetric 3-layer encoder & decoder architecture (with 2,000-1,000-300 and 300-1,000-2,000 units respectively) and a latent space of dimension 50. After performing a one-hot encoding of the input sequence across amino acids (zeros in white, ones in green), we flatten the input before performing the forward pass through the network. We use a single set of parameters for the encoder (ϕp) and learn a fully-factorized gaussian distribution over the weights of the decoder (θp): weight samples for the decoder are obtained by sampling a random normal variable (rnv), multiplying that sample by the standard deviation parameters, and subsequently adding the mean parameters. A one-dimensional convolution is applied on the un-flattened output of the decoder to capture potential correlations between amino-acid usage. Finally, a softmax activation turns the final output into probabilities over amino acids at each position of the sequence (low values in white, high values in dark green). The overall network is trained by maximizing the Evidence Lower Bound (ELBO), which forms a tractable lower bound to the log-marginal likelihood (Supplementary Methods and Fig. 1).

Extended Data Fig. 2 Comparison of performance of Bayesian VAE and DeepSequence against 38 deep mutation scans.

Comparison between the performance of the Bayesian VAE architecture in EVE and DeepSequence46 which achieves state-of-the-art performance on the protein function prediction task. “Evolutionary indices” were computed by sampling 2k times from the approximate posterior distribution and by ensembling the obtained indices over 5 independently trained VAEs (Supplementary Methods).

Extended Data Fig. 3 Evolutionary index separates pathogenic and benign variants.

a, Average evolutionary index per protein, and corresponding standard deviations, for variants with known Benign and Pathogenic ClinVar labels across 3,219 proteins (sorted by alphabetical order). On the right, marginal distributions of the means over the 3,219 proteins. Evolutionary index separates pathogenic and benign labels consistently across proteins. b, Two-component Gaussian Mixture Models (GMM) over the distributions of the evolutionary indices (histograms) for all the single amino acid variants of 3,219 proteins combined (top, left) and for P53, PTEN and SCN5A separately (top right, bottom left and right, respectively). The dashed black line is the marginal likelihood for the GMM model, i.e. the likelihood of a variant sequence after marginalizing the latent variable that corresponds to the mixture assignment; the dashed blue and red lines represent the relative share of the marginal likelihood from the benign and pathogenic clusters respectively (i.e. the product of the marginal likelihood by each cluster).

Extended Data Fig. 4 EVE prediction for actionable genes and EVE comparison to other computational methods, including meta-predictors.

a, EVE AUCs versus ClinVar labels for set of ACMG “actionable genes”33 that have 15 or more labels (shown in parentheses). AUCs are computed both for EVE scores of all variants (pale blue), and of the 75% variants with most confident scores (dark blue) (Supplementary Methods). b, Performance comparison of EVE to state-of-the-art computational variant effect predictors: 7 unsupervised, 8 supervised, and 8 supervised meta-prediction methods. Size of marker indicates how many genes for which the method would be relevant (on a per-protein basis validation) (Supplementary Methods, Supplementary Notes 2, Fig. 2, Supplementary Tables 3, 4).

Extended Data Fig. 5 Computational model EVE as good as high-throughput experiments for clinical labels.

(Companion to Fig. 3) a, Comparison of computational model predictions (upper panels EVE score) and experimental assay predictions (lower panels, experimental assay metric) to ClinVar labels (dots) and VUS (crosses) and where pale red and pale blue crosses indicate EVE assignments of VUS. Dashed red and blue lines correspond to EVE predictions after removing the 25% most uncertain variants (computed on all variants across all proteins; see Supplementary Methods). x-axes are position in protein. Experimental measurements data from deep mutational scans of P5314, from left (WT_Nutlin-3, A549_p53NULL_Nutlin-3, A549_p53NULL_Etoposide), SCN5A13, and BRCA112. b, Scatter plots of experiment scores (y-axis) against EVE scores (x-axis). Experimental measurements data from deep mutational scans same as a (Supplementary Methods, Supplementary Table 6).

Extended Data Fig. 6 Comparison of label policies, and comparison of EVE and experimental predictions of clinical labels.

a, The y-axis is the subset of the ACMG actionable protein list with at least 5 benign and 5 pathogenic labels with at least a one-star review status in ClinVar, mean for the 3,219 proteins and mean for this subset. x-axis is AUCs computed using these labels (deep blue), labels with at least a two-star review status (light grey) and a more lenient labelling policy (sky blue), as defined in Supplementary Methods. b, AUC of EVE predictions (blue circle) and experimental predictions (blue cross) computed on ClinVar labels. Whilst most of the papers that provide these experimental results refer to the goal of predicting association to human disease, the assays vary in their relevance to disease phenotype. Results use high-quality labels whenever they are sufficient for robust validation (MSH2, P53, BRCA1) and lenient labels for all other cases, and 2017-release ClinVar data whenever experimental results were used in defining labels reported in 2021 (P53 and BRCA1). Reported averages of all displayed AUC values, and of AUCs computed exclusively on 2017 and 2021-reslease high-quality labels (Supplementary Methods, Supplementary Table 5,6).

Extended Data Fig. 7 EVE has many more genes that can be validated on, compared to supervised methods.

Mean number of genes, for EVE (dark blue) and a supervised method (light blue), that have sufficient labels for validation (5 (left), 10 (middle) and 20 labels (right)). We assume a 90% train 10% test random split of all labels in ClinVar for the supervised methods.

Extended Data Fig. 8 Data provided on our server

Screenshot of for the example of KCNQ2. Our server provides information about each protein: aggregate AUC/Accuracy, performance curves (ROC & Precision-recall), variant-level EVE scores, classification and uncertainties, as well as the multiple sequence alignments used for training. All data is available to download both in bulk and for individual genes.

Extended Data Fig. 9 Fraction of genes per person with more than one variant.

Density function of the fraction of total genes per person with at least two variants, though not necessarily in the same chromosome. Data extracted from 50k genomes of the UK Biobank with self-reported ethnicity backgrounds (Supplementary Methods).

Extended Data Fig. 10 Performance as a function of alignment depth.

Average AUC of EVE scores as a function of N/Lcov for the subset of genes with at least 10 known clinical labels (5 benign and 5 pathogenic). For this subset of genes, the performance of the model can be carefully validated using AUCs. There is no strong correlation between alignment depth and performance: while models with very deep alignments tend to have good performance, models with very low N/Lcov can also have AUC close to 1.

Supplementary information

Supplementary Information

This file contains Supplementary Methods, Supplementary Note 1 on the limitations of supervised modeling methods and Supplementary Note 2 with comments on meta-predictors.

Reporting Summary

Supplementary Table 1

Statistics of multiple sequence alignments used in training.

Supplementary Table 2

Summary of class assignments and combination with other sources of evidence.

Supplementary Table 3

Comparison of performance of EVE and other computational models at predicting ClinVar labels.

Supplementary Table 4

Comparison of performance of EVE and other computational models at predicting results from high-throughput functional assays.

Supplementary Table 5

Sensitivity of predictions to label policy.

Supplementary Table 6

Comparison of performance of high-throughput experiments and EVE at predicting clinical labels.

Supplementary Table 7

Genes with at least 10 labels sorted by EVE performance.

Supplementary Table 8

Variants for which there is disagreement between EVE predictions and ClinVar labels.

Supplementary Table 9

All pairs of variants occurring in the same gene over the UK biobank population for the actionable genes defined by ACMG.

Source data

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Frazer, J., Notin, P., Dias, M. et al. Disease variant prediction with deep generative models of evolutionary data. Nature 599, 91–95 (2021).

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI:

This article is cited by


By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.


Quick links

Nature Briefing

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

Get the most important science stories of the day, free in your inbox. Sign up for Nature Briefing