  • Correspondence
The mutation significance cutoff: gene-level thresholds for variant predictions

Figure 1: Disease-associated mutation features.

Acknowledgements

We thank M. Kircher for information about the CADD method and D.B. Goldstein for gene-level metrics insights. We thank Y. Nemirovskaya, E. Anderson, M. Woollett and D. Papandrea for administrative support. Y.I. was supported in part by grant no. UL1 TR000043 from the National Center for Advancing Translational Sciences (NCATS), US National Institutes of Health (NIH) Clinical and Translational Science Award (CTSA) program. This study was supported by the Rockefeller University and the St. Giles Foundation.

Author information

Corresponding author

Correspondence to Yuval Itan.

Ethics declarations

Competing interests

D.N.C. and P.D.S. are in receipt of funding from Qiagen through a license agreement with Cardiff University. The other authors declare that they have no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Comparison of the performance of impact prediction methods and mutation signatures, on the basis of functional evidence.

(A) ROC curves comparing the performance of CADD, PolyPhen-2 and SIFT in distinguishing true-positive disease-causing missense mutations extracted from HGMD from false-positive neutral private missense variants derived from patients' WES data from which the known disease-causing mutation was removed. (B) Association of the minor allele frequencies (MAF) of disease-associated deleterious mutations with predicted impact scores. Plot of 129,586 HGMD true-positive deleterious mutations against their minor allele frequencies, in bins of MAF = 0, MAF ≤ 0.001 and MAF ≤ 0.01.
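As a reading aid for panel A, the area under a ROC curve can be computed directly from two sets of scores. The sketch below is only illustrative: the function name `roc_auc` and the example scores are hypothetical, and it assumes that higher scores indicate a more damaging prediction (as with CADD; SIFT scores run in the opposite direction and would need to be inverted first).

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (disease-causing)
    variant scores higher than a randomly chosen negative (neutral) variant,
    with ties counted as one half.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical impact scores, for illustration only (not data from the article).
disease_scores = [25.1, 18.3, 30.2, 12.0]
neutral_scores = [5.2, 12.0, 9.8, 20.5]
auc = roc_auc(disease_scores, neutral_scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to a predictor that cannot separate the two sets, and 1.0 to perfect separation.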

Supplementary Figure 2 Density plots of all 127,109 known HGMD disease-associated deleterious mutations and their corresponding 180,305 alleles by different variant-level software.

(A) CADD scores of all disease-associated deleterious mutations. (B) CADD scores of all disease-associated deleterious alleles. (C) PolyPhen-2 scores of all disease-associated deleterious mutations. (D) PolyPhen-2 scores of all disease-associated deleterious alleles. (E) SIFT scores of all disease-associated deleterious mutations. (F) SIFT scores of all disease-associated deleterious alleles.

Supplementary Figure 3 Density plots of 100,000 bootstrapping simulations to estimate TP and TN prediction rates with CADD-based MSC, by randomly partitioning 1,283 genes that contain at least 10 mutations in both HGMD and in the 1,000 Genomes Project database.

(A) TP prediction rate of novel disease-associated deleterious mutations by 90% CI MSC. (B) TN prediction rate of novel disease-associated deleterious mutations by 90% CI MSC. (C) TP prediction rate of novel disease-associated deleterious mutations by 95% CI MSC. (D) TN prediction rate of novel disease-associated deleterious mutations by 95% CI MSC. (E) TP prediction rate of novel disease-associated deleterious mutations by 99% CI MSC. (F) TN prediction rate of novel disease-associated deleterious mutations by 99% CI MSC.

Supplementary Figure 4 Characteristics of CADD-based 95% CI MSC scores generated from HGMD disease-associated mutations.

(A) Density plot of the MSC scores of all 19,698 human protein-coding genes, with a genome-wide MSC median = 10.60. (B) An inverse exponential correlation between MSC and gene damage level measured by the gene damage index (GDI), showing that low MSC genes tend to be highly damaged whereas high MSC genes tend to be only slightly damaged. (C) An exponential correlation between MSC and purifying selection level measured by the neutrality index (NI), showing that high MSC genes tend to be under stronger purifying selection. (D) KEGG pathways functional enrichment of 985 genes with low MSC. The upper panel shows enrichment in the complement and coagulation cascades pathway; the lower panel shows enrichment in the ECM-receptor interaction pathway. (E) 2,288 high MSC genes display a functional enrichment in the Ribosome pathway.
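The captions refer to MSC scores generated at 90%, 95% and 99% CIs from known disease-associated mutations, but this page does not spell out the calculation. The following is a minimal sketch of one way a gene-level cutoff could be derived, assuming that the cutoff is the lower bound of a normal-approximation confidence interval around the mean impact score of a gene's known mutations. That assumption, the function names and the example scores are all hypothetical; the Supplementary Methods remain the authority on the actual procedure.

```python
from statistics import NormalDist, mean, stdev


def ci_lower_bound(scores, confidence=0.95):
    """Lower bound of a two-sided CI around the mean (normal approximation).

    Assumption for illustration: the interval is mean +/- z * s / sqrt(n).
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores to estimate a spread")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean(scores) - z * stdev(scores) / len(scores) ** 0.5


def gene_level_cutoffs(scores_by_gene, confidence=0.95):
    """Map each gene with enough known mutations to its cutoff."""
    return {
        gene: ci_lower_bound(scores, confidence)
        for gene, scores in scores_by_gene.items()
        if len(scores) >= 2
    }


# Hypothetical genes and impact scores, for illustration only.
known_mutation_scores = {
    "GENE_A": [10.0, 20.0, 30.0],
    "GENE_B": [24.0, 26.0, 25.0, 27.0],
    "GENE_C": [15.0],  # skipped: a single score gives no interval
}
cutoffs_95 = gene_level_cutoffs(known_mutation_scores, confidence=0.95)
cutoffs_99 = gene_level_cutoffs(known_mutation_scores, confidence=0.99)
```

Under this assumption a wider interval (99%) gives a lower, more permissive cutoff than a narrower one (90%), and genes with too few known mutations would need a fallback value.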

Supplementary Figure 5 ROC curves comparing the performance of MSC with variant-level methods and the RVIS hot zone approach.

(A) CADD-based MSC generated with 90%, 95% and 99% CIs, compared with CADD prediction (provided by the CADD method, based on a fixed cutoff) and with the RVIS hot zone approach combining RVIS and PolyPhen-2 fixed cutoffs. (B) PolyPhen-2-based MSC generated with 90%, 95% and 99% CIs, compared with PolyPhen-2 prediction (provided by the PolyPhen-2 method, based on a fixed cutoff) and with the RVIS hot zone approach combining RVIS and PolyPhen-2 fixed cutoffs. (C) SIFT-based MSC generated with 90%, 95% and 99% CIs, compared with SIFT prediction (provided by the SIFT method, based on a fixed cutoff). See Supplementary Methods for a full description of the TP and FP sets used.

Supplementary Figure 6 True positive and true negative prediction rates of variant-level methods and MSC, estimated by a set of 4,152 recently acquired HGMD disease-associated deleterious alleles of 1,119 missense mutations.

(A) True positive and true negative (private non-disease-causing variants of patients) prediction rates by CADD, PolyPhen-2 and SIFT, using fixed cutoffs, the hot zone approach (combining RVIS and PolyPhen-2), and MSC estimates with 90%, 95% and 99% CIs generated from the HGMD mutation database. (B) True positive (new deleterious mutations that were not used to generate the ClinVar-based MSC scores) and true negative (private non-disease-causing variants of patients) prediction rates by CADD, PolyPhen-2 and SIFT, using fixed cutoffs, the hot zone approach (combining RVIS and PolyPhen-2), and MSC estimates with 90%, 95% and 99% CIs generated from the ClinVar mutation database.
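The comparison in this figure contrasts a single fixed cutoff with gene-specific cutoffs. A minimal sketch of how true positive and true negative prediction rates could be tallied under either scheme is given below. The function name `prediction_rates`, the gene names, scores and cutoff values are hypothetical, and the sketch assumes that a variant is called damaging when its score is at or above the applicable cutoff.

```python
def prediction_rates(variants, default_cutoff, cutoff_by_gene=None):
    """Return (TP rate, TN rate) for labelled variants.

    variants: iterable of (gene, score, is_disease) tuples.
    cutoff_by_gene: optional gene-specific cutoffs; genes that are missing
    from it fall back to default_cutoff.
    """
    cutoff_by_gene = cutoff_by_gene or {}
    tp = fn = tn = fp = 0
    for gene, score, is_disease in variants:
        predicted_damaging = score >= cutoff_by_gene.get(gene, default_cutoff)
        if is_disease:
            if predicted_damaging:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_damaging:
                fp += 1
            else:
                tn += 1
    tp_rate = tp / (tp + fn) if (tp + fn) else float("nan")
    tn_rate = tn / (tn + fp) if (tn + fp) else float("nan")
    return tp_rate, tn_rate


# Hypothetical test set, for illustration only (not data from the article).
test_variants = [
    ("GENE_A", 12.0, True),
    ("GENE_A", 8.0, False),
    ("GENE_B", 22.0, True),
    ("GENE_B", 18.0, False),
]
fixed_rates = prediction_rates(test_variants, default_cutoff=15.0)
gene_rates = prediction_rates(
    test_variants,
    default_cutoff=15.0,
    cutoff_by_gene={"GENE_A": 10.0, "GENE_B": 20.0},
)
print("fixed cutoff:", fixed_rates)
print("gene-level cutoffs:", gene_rates)
```

In this toy example the fixed cutoff misses the lower-scoring disease variant in one gene and miscalls a neutral variant in the other, whereas the gene-specific cutoffs classify all four correctly; real data would not separate this cleanly.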

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–6, Supplementary Note and Supplementary Methods (PDF 1049 kb)

Supplementary Table 1

A summary of the CADD-based 99% CI MSC protein-coding human genes. (XLSX 1409 kb)

Supplementary Table 2

A summary of the CADD-based 95% CI MSC protein-coding human genes. (XLSX 1425 kb)

Supplementary Table 3

A summary of the CADD-based 90% CI MSC protein-coding human genes. (XLSX 1437 kb)

Supplementary Table 4

A summary of the PolyPhen-2-based 99% CI MSC protein-coding human genes. (XLSX 1689 kb)

Supplementary Table 5

A summary of the PolyPhen-2-based 95% CI MSC protein-coding human genes. (XLSX 1700 kb)

Supplementary Table 6

A summary of the PolyPhen-2-based 90% CI MSC protein-coding human genes. (XLSX 1710 kb)

Supplementary Table 7

A summary of the SIFT-based 99% CI MSC protein-coding human genes. (XLSX 1696 kb)

Supplementary Table 8

A summary of the SIFT-based 95% CI MSC protein-coding human genes. (XLSX 1777 kb)

Supplementary Table 9

A summary of the SIFT-based 90% CI MSC protein-coding human genes. (XLSX 1808 kb)

Supplementary Table 10

KEGG pathway categories displaying high levels of enrichment among genes with low and high MSC scores. (XLSX 40 kb)

Supplementary Table 11

True positive and true negative prediction rates, HGMD-based. (XLSX 45 kb)

Supplementary Table 12

True positive and true negative prediction rates, ClinVar-based. (XLSX 45 kb)

Cite this article

Itan, Y., Shang, L., Boisson, B. et al. The mutation significance cutoff: gene-level thresholds for variant predictions. Nat Methods 13, 109–110 (2016). https://doi.org/10.1038/nmeth.3739

  • DOI: https://doi.org/10.1038/nmeth.3739
