M-CAP eliminates a majority of variants of uncertain significance in clinical exomes at high sensitivity

Abstract

Variant pathogenicity classifiers such as SIFT, PolyPhen-2, CADD, and MetaLR assist in interpretation of the hundreds of rare, missense variants in the typical patient genome by deprioritizing some variants as likely benign. These widely used methods misclassify 26 to 38% of known pathogenic mutations, which could lead to missed diagnoses if the classifiers are trusted as definitive in a clinical setting. We developed M-CAP, a clinical pathogenicity classifier that outperforms existing methods at all thresholds and correctly dismisses 60% of rare, missense variants of uncertain significance in a typical genome at 95% sensitivity.
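To make the "dismissed at 95% sensitivity" figure concrete, the following minimal sketch shows one common way such a number can be computed from classifier scores: pick the highest score cutoff that still keeps at least 95% of known pathogenic variants at or above it, then count the fraction of benign variants that fall below that cutoff. This is not M-CAP's implementation. The function names and the toy scores are invented for illustration, and the sketch assumes that higher scores mean "more likely pathogenic".

```python
import math


def threshold_at_sensitivity(pathogenic_scores, target=0.95):
    """Highest cutoff that keeps at least `target` of pathogenic scores >= cutoff."""
    ranked = sorted(pathogenic_scores, reverse=True)
    # Number of pathogenic variants that must remain at or above the cutoff.
    # The small tolerance guards against floating-point overshoot in the product.
    needed = max(1, math.ceil(target * len(ranked) - 1e-9))
    return ranked[needed - 1]


def fraction_dismissed(benign_scores, cutoff):
    """Fraction of benign variants scoring strictly below the cutoff."""
    below = sum(1 for score in benign_scores if score < cutoff)
    return below / len(benign_scores)


# Toy scores (hypothetical, not taken from the paper).
pathogenic = [0.98, 0.95, 0.93, 0.90, 0.88, 0.85, 0.82, 0.80, 0.77, 0.74,
              0.70, 0.66, 0.61, 0.55, 0.50, 0.44, 0.38, 0.31, 0.25, 0.04]
benign = [0.02, 0.05, 0.08, 0.11, 0.15, 0.19, 0.22, 0.30, 0.41, 0.60]

cutoff = threshold_at_sensitivity(pathogenic, target=0.95)
sensitivity = sum(1 for s in pathogenic if s >= cutoff) / len(pathogenic)
dismissed = fraction_dismissed(benign, cutoff)

print(f"cutoff={cutoff}, sensitivity={sensitivity:.2f}, dismissed={dismissed:.2f}")
```

Fixing the sensitivity first and then reporting the share of benign variants removed mirrors how the abstract states its result: the cost of a missed pathogenic variant is bounded up front, and classifiers are compared on how much they reduce the remaining candidate list.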

Figure 1: M-CAP outperforms existing pathogenicity likelihood metrics, particularly at the high sensitivity levels required for clinical applications.
Figure 2: M-CAP correctly eliminates the most variants of uncertain consequences as benign at 95% sensitivity.

Acknowledgements

We thank the members of the Bejerano laboratory, particularly J. Notwell, S. Chinchali, and J. Birgmeier, for technical advice and helpful discussions. P.D.S. and D.N.C. receive financial support from Qiagen through a license agreement with Cardiff University. We thank the PolyPhen-2, CADD, Eigen, FATHMM, MutationTaster, and MetaLR teams for making their training and testing data readily available. This work was funded in part by the Stanford Pediatrics Department, DARPA, a Packard Foundation Fellowship, and a Microsoft Faculty Fellowship to G.B.

Author information

Contributions

K.A.J., A.M.W., M.J.B., and G.B. designed the study and analyzed results. K.A.J. and M.J.B. implemented the model and performed the experiments. K.A.J., A.M.W., and H.G. wrote software tools that were used for analysis. P.D.S. and D.N.C. curated the HGMD data and provided feedback. J.A.B. provided patient exome cases and feedback. K.A.J., A.M.W., and G.B. wrote the manuscript. All authors reviewed and commented on the manuscript.

Corresponding author

Correspondence to Gill Bejerano.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Tables 1–4, 6, 9 and 10. (PDF 486 kb)

Supplementary Table 5

M-CAP scores for disease-causing mutations found in BRCA1, BRCA2, CFTR and MLL2. (XLSX 43 kb)

Supplementary Table 7

Clinical phenotypes for case study patients. (XLSX 73 kb)

Supplementary Table 8

Rare missense variants in case study patients. (XLSX 150 kb)

About this article

Cite this article

Jagadeesh, K., Wenger, A., Berger, M. et al. M-CAP eliminates a majority of variants of uncertain significance in clinical exomes at high sensitivity. Nat Genet 48, 1581–1586 (2016). https://doi.org/10.1038/ng.3703
