The mutation significance cutoff: gene-level thresholds for variant predictions

Itan, Yuval; Shang, Lei; Boisson, Bertrand; Ciancanelli, Michael J; Markle, Janet G; Martinez-Barricarte, Ruben; Scott, Eric; Shah, Ishaan; Stenson, Peter D; Gleeson, Joseph; Cooper, David N; Quintana-Murci, Lluis; Zhang, Shen-Ying; Abel, Laurent; Casanova, Jean-Laurent

doi:10.1038/nmeth.3739

Correspondence
Published: 28 January 2016

Yuval Itan¹,
Lei Shang¹,
Bertrand Boisson^1,2,3,
Michael J Ciancanelli¹,
Janet G Markle¹,
Ruben Martinez-Barricarte¹,
Eric Scott⁴,
Ishaan Shah¹,
Peter D Stenson⁵,
Joseph Gleeson^4,6,
David N Cooper⁵,
Lluis Quintana-Murci^7,8,
Shen-Ying Zhang^1,2,3^na1,
Laurent Abel^1,2,3^na1 &
Jean-Laurent Casanova^1,2,3,6,9^na1

Nature Methods volume 13, pages 109–110 (2016)

To the Editor:

Next-generation sequencing (NGS) identifies about 20,000 variants per exome, of which only a few may underlie genetic diseases. Variant-level methods such as PolyPhen-2 (polymorphism phenotyping version 2), SIFT (sorting intolerant from tolerant) and CADD (combined annotation–dependent depletion) attempt to predict whether a given variant is benign or deleterious^1,2,3. These methods are commonly interpreted in a binary manner as a means of filtering out benign variants from NGS data, with a single significance cutoff value across all genes. CADD developers propose (but do not recommend for categorical usage) a fixed cutoff value between 10 and 20 on a scale of 1–99, with 99 being the most deleterious. Gene-level methods, including RVIS (residual variation intolerance score, which applies combined fixed gene- and variant-level cutoffs), de novo excess and GDI (gene damage index), are also useful^4,5,6. However, a uniform cutoff is unlikely to be accurate genome-wide (see Supplementary Note).
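
The contrast between a single genome-wide cutoff and gene-level cutoffs can be illustrated with a minimal Python sketch. The gene names, scores and per-gene cutoff values below are hypothetical placeholders chosen for illustration; they are not values from this work, and the filtering functions are an assumption about how such cutoffs might be applied, not the authors' implementation.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    gene: str
    cadd: float  # CADD score; higher means predicted more deleterious


# Hypothetical per-gene cutoffs, for illustration only.
GENE_CUTOFFS = {"GENE_A": 3.5, "GENE_B": 24.0}

# One genome-wide value inside the 10-20 range mentioned in the text.
FIXED_CUTOFF = 15.0


def passes_fixed(v: Variant, cutoff: float = FIXED_CUTOFF) -> bool:
    """Keep a variant if its score reaches the single genome-wide cutoff."""
    return v.cadd >= cutoff


def passes_gene_level(v: Variant, cutoffs=GENE_CUTOFFS, fallback=FIXED_CUTOFF) -> bool:
    """Keep a variant if its score reaches the cutoff of its own gene.

    Genes without a specific cutoff fall back to the genome-wide value.
    """
    return v.cadd >= cutoffs.get(v.gene, fallback)


variants = [Variant("GENE_A", 8.0), Variant("GENE_B", 18.0), Variant("GENE_C", 16.0)]
kept_fixed = [v for v in variants if passes_fixed(v)]
kept_gene = [v for v in variants if passes_gene_level(v)]
```

With these placeholder numbers, the fixed cutoff discards the GENE_A variant and keeps the GENE_B variant, whereas the gene-level cutoffs do the opposite. This is the kind of disagreement a uniform threshold cannot capture.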

**Figure 1: Disease-associated mutation features.**

References

1. Adzhubei, I.A. et al. Nat. Methods 7, 248–249 (2010).
2. Kumar, P., Henikoff, S. & Ng, P.C. Nat. Protoc. 4, 1073–1081 (2009).
3. Kircher, M. et al. Nat. Genet. 46, 310–315 (2014).
4. Petrovski, S., Wang, Q., Heinzen, E.L., Allen, A.S. & Goldstein, D.B. PLoS Genet. 9, e1003709 (2013).
5. Samocha, K.E. et al. Nat. Genet. 46, 944–950 (2014).
6. Itan, Y. et al. Proc. Natl. Acad. Sci. USA 112, 13615–13620 (2015).
7. Stenson, P.D. et al. Hum. Genet. 133, 1–9 (2014).
8. Landrum, M.J. et al. Nucleic Acids Res. 42, D980–D985 (2014).
9. Auton, A. et al. Nature 526, 68–74 (2015).

Acknowledgements

We thank M. Kircher for information about the CADD method and D.B. Goldstein for gene-level metrics insights. We thank Y. Nemirovskaya, E. Anderson, M. Woollett and D. Papandrea for administrative support. Y.I. was supported in part by grant no. UL1 TR000043 from the National Center for Advancing Translational Sciences (NCATS), US National Institutes of Health (NIH) Clinical and Translational Science Award (CTSA) program. This study was supported by the Rockefeller University and the St. Giles Foundation.

Author information

Shen-Ying Zhang, Laurent Abel and Jean-Laurent Casanova: These authors contributed equally to this work.

Authors and Affiliations

1. St. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, New York, USA
Yuval Itan, Lei Shang, Bertrand Boisson, Michael J Ciancanelli, Janet G Markle, Ruben Martinez-Barricarte, Ishaan Shah, Shen-Ying Zhang, Laurent Abel & Jean-Laurent Casanova
2. Laboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM U1163, Paris, France
Bertrand Boisson, Shen-Ying Zhang, Laurent Abel & Jean-Laurent Casanova
3. Paris Descartes University, Imagine Institute, Paris, France
Bertrand Boisson, Shen-Ying Zhang, Laurent Abel & Jean-Laurent Casanova
4. Department of Neurosciences, Neurogenetics Laboratory, University of California, San Diego, San Diego, California, USA
Eric Scott & Joseph Gleeson
5. Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, UK
Peter D Stenson & David N Cooper
6. Howard Hughes Medical Institute, New York, New York, USA
Joseph Gleeson & Jean-Laurent Casanova
7. Human Evolutionary Genetics Unit, Institut Pasteur, Paris, France
Lluis Quintana-Murci
8. Centre National de la Recherche Scientifique, URA 3012, Paris, France
Lluis Quintana-Murci
9. Pediatric Immunology-Hematology Unit, Necker Hospital for Sick Children, Paris, France
Jean-Laurent Casanova

Corresponding author

Correspondence to Yuval Itan.

Ethics declarations

Competing interests

D.N.C. and P.D.S. are in receipt of funding from Qiagen through a license agreement with Cardiff University. The other authors declare that they have no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Comparison of the performance of impact prediction methods and mutation signatures, on the basis of functional evidence.

(A) ROC curves comparing the performance of CADD with that of PolyPhen-2 and SIFT in distinguishing true-positive disease-causing missense mutations extracted from HGMD from false-positive neutral private missense variants derived from patients' WES data from which the known disease-causing mutation was removed. (B) Association of the minor allele frequencies (MAF) of disease-associated deleterious mutations with predicted impact scores. Plot of 129,586 HGMD true-positive deleterious mutations against their minor allele frequencies, in bins of MAF = 0, MAF ≤ 0.001 and MAF ≤ 0.01.
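
For readers unfamiliar with how an ROC curve such as the one in panel (A) is built, the following Python sketch sweeps a score cutoff from high to low and records the resulting false-positive and true-positive rates. The scores and labels are toy values invented for illustration, the function names are hypothetical, and the sketch assumes that higher scores mean more deleterious (as for CADD); it is not the procedure or data used by the authors.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs from sweeping the cutoff from high to low.

    labels: 1 for a known disease-causing variant, 0 for a neutral one.
    Variants with tied scores are added to the curve in a single step.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data: three disease-causing and three neutral variants.
scores = [30, 25, 20, 15, 10, 5]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
```

For a method such as SIFT, where lower scores indicate a more damaging prediction, the scores would need to be negated (or otherwise reversed) before applying the same sweep.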

Supplementary Figure 2 Density plots of all 127,109 known HGMD disease-associated deleterious mutations and their corresponding 180,305 alleles by different variant-level software.

(A) CADD scores of all disease-associated deleterious mutations. (B) CADD scores of all disease-associated deleterious alleles. (C) PolyPhen-2 scores of all disease-associated deleterious mutations. (D) PolyPhen-2 scores of all disease-associated deleterious alleles. (E) SIFT scores of all disease-associated deleterious mutations. (F) SIFT scores of all disease-associated deleterious alleles.

Supplementary Figure 3 Density plots of 100,000 bootstrapping simulations to estimate TP and TN prediction rates with CADD-based MSC, by randomly partitioning 1,283 genes that contain at least 10 mutations in both HGMD and the 1000 Genomes Project database.

(A) TP prediction rate of novel disease-associated deleterious mutations by 90% CI MSC. (B) TN prediction rate of novel disease-associated deleterious mutations by 90% CI MSC. (C) TP prediction rate of novel disease-associated deleterious mutations by 95% CI MSC. (D) TN prediction rate of novel disease-associated deleterious mutations by 95% CI MSC. (E) TP prediction rate of novel disease-associated deleterious mutations by 99% CI MSC. (F) TN prediction rate of novel disease-associated deleterious mutations by 99% CI MSC.
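
The caption does not spell out the simulation procedure, so the following Python sketch shows only one possible reading, under stated assumptions: for a single gene, the known disease-mutation scores are repeatedly split at random into two halves, a cutoff is derived from one half, and the TP rate is the fraction of the held-out half that reaches the cutoff. The cutoff rule (lower bound of a normal-approximation confidence interval on the mean), the function names and the toy scores are all assumptions made for illustration. Note that this is repeated random partitioning, not resampling with replacement.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def lower_bound(scores, confidence):
    """Lower bound of a normal-approximation CI on the mean (an assumption)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean(scores) - z * stdev(scores) / sqrt(len(scores))


def resampled_tp_rates(disease_scores, confidence=0.95, n_iter=1000, seed=0):
    """TP rate on a held-out half, over repeated random partitions.

    Requires at least four scores so that the training half has two or more.
    """
    if len(disease_scores) < 4:
        raise ValueError("need at least four scores")
    rng = random.Random(seed)  # seeded for reproducibility
    half = len(disease_scores) // 2
    rates = []
    for _ in range(n_iter):
        shuffled = list(disease_scores)
        rng.shuffle(shuffled)
        train, held_out = shuffled[:half], shuffled[half:]
        cutoff = lower_bound(train, confidence)
        rates.append(sum(s >= cutoff for s in held_out) / len(held_out))
    return rates


# Toy scores for one hypothetical gene.
toy_scores = [12.0, 15.5, 18.0, 21.0, 22.5, 24.0, 25.5, 27.0, 31.0, 34.0]
rates = resampled_tp_rates(toy_scores, confidence=0.95, n_iter=200)
```

The distribution of `rates` is what a density plot would summarise. A TN rate could be estimated the same way by counting neutral variants that fall below the cutoff.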

Supplementary Figure 4 Characteristics of CADD-based 95% CI MSC scores generated from HGMD disease-associated mutations.

(A) Density plot of the MSC scores of all 19,698 human protein-coding genes, with a genome-wide MSC median of 10.60. (B) An inverse exponential correlation between MSC and gene damage level measured by the gene damage index (GDI), showing that low MSC genes tend to be highly damaged whereas high MSC genes tend to be only slightly damaged. (C) An exponential correlation between MSC and purifying selection level measured by the neutrality index (NI), showing that high MSC genes tend to be under stronger purifying selection. (D) KEGG pathways functional enrichment of 985 genes with low MSC. The upper panel shows enrichment in the complement and coagulation cascades pathway; the lower panel shows enrichment in the ECM-receptor interaction pathway. (E) 2,288 high MSC genes display a functional enrichment in the Ribosome pathway.
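
As a rough illustration of how a confidence-interval-based cutoff could be derived for one gene, the sketch below takes the scores of that gene's known disease-associated mutations and returns the lower bound of a confidence interval on their mean. This rests on two assumptions that the caption does not confirm: that the cutoff is the lower CI bound, and that a normal approximation is adequate. The function name and the toy scores are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def lower_ci_bound(scores, confidence=0.95):
    """Lower bound of a two-sided CI on the mean score (normal approximation).

    scores: impact scores of one gene's known disease-associated mutations.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores to estimate a spread")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean(scores) - z * stdev(scores) / sqrt(len(scores))


# Toy scores for one hypothetical gene.
toy_scores = [20, 22, 24, 26, 28]
cutoff_90 = lower_ci_bound(toy_scores, 0.90)
cutoff_95 = lower_ci_bound(toy_scores, 0.95)
cutoff_99 = lower_ci_bound(toy_scores, 0.99)
```

Under these assumptions a wider interval gives a lower, more permissive cutoff, so the 99% value sits below the 95% value, which sits below the 90% value.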

Supplementary Figure 5 ROC curves comparing the performance of MSC with variant-level methods and the RVIS hot zone approach.

(A) CADD-based MSC generated with 90%, 95% and 99% CIs with CADD prediction (provided by the PolyPhen-2 method, based on a fixed cutoff), as well as the RVIS hot zone approach combining RVIS and PolyPhen-2 fixed cutoffs. (B) PolyPhen-2-based MSC generated with 90%, 95% and 99% CIs with PolyPhen-2 prediction (provided by the PolyPhen-2 method, based on a fixed cutoff), as well as the RVIS hot zone approach combining RVIS and PolyPhen-2 fixed cutoffs. (C) SIFT-based MSC generated with 90%, 95% and 99% CIs with SIFT prediction (provided by the SIFT method, based on a fixed cutoff). See Supplementary Methods for a full description of the TP and FP sets used.

Supplementary Figure 6 True positive and true negative prediction rates of variant-level methods and MSC, estimated by a set of 4,152 recently acquired HGMD disease-associated deleterious alleles of 1,119 missense mutations.

(A) True positive and true negative (private non-disease-causing variants of patients) prediction rates by CADD, PolyPhen-2 and SIFT, using fixed cutoffs, the hot zone approach (combining RVIS and PolyPhen-2), and MSC estimates with 90%, 95% and 99% CIs generated from the HGMD mutation database. (B) True positive (new deleterious mutations that were not used to generate the ClinVar-based MSC scores) and true negative (private non-disease-causing variants of patients) prediction rates by CADD, PolyPhen-2 and SIFT, using fixed cutoffs, the hot zone approach (combining RVIS and PolyPhen-2), and MSC estimates with 90%, 95% and 99% CIs generated from the ClinVar mutation database.

About this article

Cite this article

Itan, Y., Shang, L., Boisson, B. et al. The mutation significance cutoff: gene-level thresholds for variant predictions. Nat Methods 13, 109–110 (2016). https://doi.org/10.1038/nmeth.3739

Published: 28 January 2016
Issue Date: February 2016
DOI: https://doi.org/10.1038/nmeth.3739
