Technical Report

Unsupervised detection of cancer driver mutations with parsimony-guided learning

Nature Genetics volume 48, pages 1288–1294 (2016)

Abstract

Methods are needed to reliably prioritize biologically active driver mutations over inactive passengers in high-throughput sequencing cancer data sets. We present ParsSNP, an unsupervised functional impact predictor that is guided by parsimony. ParsSNP uses an expectation–maximization framework to find mutations that explain tumor incidence broadly, without using predefined training labels that can introduce biases. We compare ParsSNP to five existing tools (CanDrA, CHASM, FATHMM Cancer, TransFIC, and Condel) across five distinct benchmarks. ParsSNP outperformed the existing tools in 24 of 25 comparisons. To investigate the real-world benefit of these improvements, we applied ParsSNP to an independent data set of 30 patients with diffuse-type gastric cancer. ParsSNP identified many known and likely driver mutations that other methods did not detect, including truncation mutations in known tumor suppressors and the recurrent driver substitution RHOA p.Tyr42Cys. In conclusion, ParsSNP uses an innovative, parsimony-based approach to prioritize cancer driver mutations and provides dramatic improvements over existing methods.

Acknowledgements

We thank O.L. Griffith for critically reading the manuscript. Our work was supported by the Alvin J. Siteman Cancer Center, the Ohana Breast Cancer Research Fund, the Foundation for the Barnes-Jewish Hospital (to R.B.), the National Library of Medicine of the National Institutes of Health (R01LM012222 to S.J.S.), and the Canadian Institutes of Health Research (DFS-134967 to R.D.K.).

Author information

Affiliations

  1. Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, Missouri, USA.

    • Runjun D Kumar
    • Ron Bose
  2. Computational and Systems Biology Program, Washington University in St. Louis, St. Louis, Missouri, USA.

    • Runjun D Kumar
    • S Joshua Swamidass
  3. Medical Scientist Training Program, Washington University School of Medicine, St. Louis, Missouri, USA.

    • Runjun D Kumar
  4. Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, Missouri, USA.

    • S Joshua Swamidass

Authors

  1. Runjun D Kumar

  2. S Joshua Swamidass

  3. Ron Bose

Contributions

R.D.K. and S.J.S. designed the study. R.D.K. wrote software and performed the analysis. R.D.K., S.J.S. and R.B. wrote the manuscript. R.B. supervised the project.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to S Joshua Swamidass or Ron Bose.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–12.

Excel files

  1. Supplementary Tables 1–7

About this article

DOI

https://doi.org/10.1038/ng.3658