  • Analysis

Identification of significantly mutated regions across cancer types highlights a rich landscape of functional molecular alterations

Abstract

Cancer sequencing studies have primarily identified cancer driver genes by the accumulation of protein-altering mutations. An improved method would be annotation independent, sensitive to unknown distributions of functions within proteins and inclusive of noncoding drivers. We employed density-based clustering methods in 21 tumor types to detect variably sized significantly mutated regions (SMRs). SMRs reveal recurrent alterations across a spectrum of coding and noncoding elements, including transcription factor binding sites and untranslated regions mutated in up to 15% of specific tumor types. SMRs demonstrate spatial clustering of alterations in molecular domains and at interfaces, often with associated changes in signaling. Mutation frequencies in SMRs demonstrate that distinct protein regions are differentially mutated across tumor types, as exemplified by a linker region of PIK3CA in which biophysical simulations suggest that mutations affect regulatory interactions. The functional diversity of SMRs underscores both the varied mechanisms of oncogenic misregulation and the advantage of functionally agnostic driver identification.
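As a rough illustration of density-based clustering applied to mutation coordinates, the sketch below groups 1-D positions in the style of DBSCAN (ref. 30). The function name `cluster_positions`, the `eps` and `min_pts` values and the toy positions are assumptions made for this example. It is not the authors' pipeline, which also scores clusters against background mutation models.

```python
from bisect import bisect_left, bisect_right


def cluster_positions(positions, eps, min_pts):
    """DBSCAN-style clustering of 1-D coordinates.

    A position is a core point if at least `min_pts` positions (itself
    included) lie within `eps` of it. Clusters grow outward from core
    points; positions reachable from no core point are treated as noise.
    Returns a list of (start, end, n_mutations) tuples sorted by start.
    """
    pts = sorted(positions)
    n = len(pts)
    # Index range [lo, hi) of neighbours within eps of each point.
    lo = [bisect_left(pts, p - eps) for p in pts]
    hi = [bisect_right(pts, p + eps) for p in pts]
    is_core = [hi[i] - lo[i] >= min_pts for i in range(n)]

    labels = [None] * n
    n_clusters = 0
    for seed in range(n):
        if not is_core[seed] or labels[seed] is not None:
            continue
        labels[seed] = n_clusters
        stack = [seed]
        while stack:
            j = stack.pop()
            if not is_core[j]:
                continue  # border points join a cluster but do not extend it
            for k in range(lo[j], hi[j]):
                if labels[k] is None:
                    labels[k] = n_clusters
                    stack.append(k)
        n_clusters += 1

    clusters = []
    for c in range(n_clusters):
        members = [pts[i] for i in range(n) if labels[i] == c]
        clusters.append((members[0], members[-1], len(members)))
    return clusters


# Toy coordinates (hypothetical): two dense runs plus two isolated mutations.
toy = [100, 102, 103, 105, 500, 900, 901, 903, 2000]
print(cluster_positions(toy, eps=5, min_pts=3))
```

Because region boundaries follow the data rather than fixed windows, the clusters returned can vary in size, which is the property the abstract emphasizes. A library implementation such as scikit-learn's DBSCAN (ref. 79) would be the more usual choice in practice.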

Figure 1: Identification of SMRs in 21 cancer types across a broad spectrum of functional elements.
Figure 2: Noncoding SMRs recurrently alter promoters and 5′ UTRs.
Figure 3: Structural mapping of SMRs onto proteins and complexes identifies differentially altered regions among cancers and molecular interfaces targeted by recurrent alterations.
Figure 4: SMRs are associated with distinct molecular signatures.
Figure 5: Structure in the distribution of cancer mutations remains largely uncharacterized.

Accession codes

Accessions

Protein Data Bank

References

  1. Hodis, E. et al. A landscape of driver mutations in melanoma. Cell 150, 251–263 (2012).

  2. Huang, F.W. et al. Highly recurrent TERT promoter mutations in human melanoma. Science 339, 957–959 (2013).

  3. Alexandrov, L.B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).

  4. Lawrence, M.S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).

  5. Lawrence, M.S. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501 (2014).

  6. Ding, L., Wendl, M.C., McMichael, J.F. & Raphael, B.J. Expanding the computational toolbox for mining cancer genomes. Nat. Rev. Genet. 15, 556–570 (2014).

  7. Davies, H. et al. Mutations of the BRAF gene in human cancer. Nature 417, 949–954 (2002).

  8. Parsons, D.W. et al. An integrated genomic analysis of human glioblastoma multiforme. Science 321, 1807–1812 (2008).

  9. Kane, D.P. & Shcherbakova, P.V. A common cancer-associated DNA polymerase ɛ mutation causes an exceptionally strong mutator phenotype, indicating fidelity defects distinct from loss of proofreading. Cancer Res. 74, 1895–1901 (2014).

  10. Dees, N.D. et al. MuSiC: identifying mutational significance in cancer genomes. Genome Res. 22, 1589–1598 (2012).

  11. Tamborero, D., Gonzalez-Perez, A. & Lopez-Bigas, N. OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes. Bioinformatics 29, 2238–2244 (2013).

  12. Porta-Pardo, E. & Godzik, A. e-Driver: a novel method to identify protein regions driving cancer. Bioinformatics 30, 3109–3114 (2014).

  13. Schnall-Levin, M., Zhao, Y., Perrimon, N. & Berger, B. Conserved microRNA targeting in Drosophila is as widespread in coding regions as in 3′ UTRs. Proc. Natl. Acad. Sci. USA 107, 15751–15756 (2010).

  14. Cenik, C. et al. Genome analysis reveals interplay between 5′ UTR introns and nuclear mRNA export for secretory and mitochondrial genes. PLoS Genet. 7, e1001366 (2011).

  15. Stergachis, A.B. et al. Exonic transcription factor binding directs codon choice and affects protein evolution. Science 342, 1367–1372 (2013).

  16. Wolfe, A.L. et al. RNA G-quadruplexes cause eIF4A-dependent oncogene translation in cancer. Nature 513, 65–70 (2014).

  17. Xiong, H.Y. et al. RNA splicing. The human splicing code reveals new insights into the genetic determinants of disease. Science 347, 1254806 (2015).

  18. Gerstberger, S., Hafner, M. & Tuschl, T. A census of human RNA-binding proteins. Nat. Rev. Genet. 15, 829–845 (2014).

  19. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  20. Weinhold, N., Jacobsen, A., Schultz, N., Sander, C. & Lee, W. Genome-wide analysis of noncoding regulatory mutations in cancer. Nat. Genet. 46, 1160–1165 (2014).

  21. Fredriksson, N.J., Ny, L., Nilsson, J.A. & Larsson, E. Systematic analysis of noncoding somatic mutations and gene expression alterations across 14 tumor types. Nat. Genet. 46, 1258–1263 (2014).

  22. Supek, F., Miñana, B., Valcárcel, J., Gabaldón, T. & Lehner, B. Synonymous mutations frequently act as driver mutations in human cancers. Cell 156, 1324–1335 (2014).

  23. Melton, C., Reuter, J.A., Spacek, D.V. & Snyder, M. Recurrent somatic mutations in regulatory regions of human cancer genomes. Nat. Genet. 47, 710–716 (2015).

  24. Hofree, M., Shen, J.P., Carter, H., Gross, A. & Ideker, T. Network-based stratification of tumor mutations. Nat. Methods 10, 1108–1115 (2013).

  25. Leiserson, M.D.M. et al. Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nat. Genet. 47, 106–114 (2015).

  26. Araya, C.L. et al. Regulatory analysis of the C. elegans genome with spatiotemporal resolution. Nature 512, 400–405 (2014).

  27. Stergachis, A.B. et al. Conservation of trans-acting circuitry during mammalian regulatory evolution. Nature 515, 365–370 (2014).

  28. Roadmap Epigenomics Consortium. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).

  29. Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6, 80–92 (2012).

  30. Ester, M., Kriegel, H.-P., Sander, J. & Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. KDD 96, 226–231 (1996).

  31. Futreal, P.A. et al. A census of human cancer genes. Nat. Rev. Cancer 4, 177–183 (2004).

  32. Santarius, T., Shipley, J., Brewer, D., Stratton, M.R. & Cooper, C.S. A census of amplified and overexpressed human cancer genes. Nat. Rev. Cancer 10, 59–64 (2010).

  33. Malhotra, A. et al. Breakpoint profiling of 64 cancer genomes reveals numerous complex rearrangements spawned by homology-independent mechanisms. Genome Res. 23, 762–776 (2013).

  34. Jäger, D. et al. Identification of a tissue-specific putative transcription factor in breast tissue by serological screening of a breast cancer library. Cancer Res. 61, 2055–2061 (2001).

  35. Mei, Y.-P. et al. Small nucleolar RNA 42 acts as an oncogene in lung tumorigenesis. Oncogene 31, 2794–2804 (2012).

  36. Okugawa, Y. et al. Clinical significance of SNORA42 as an oncogene and a prognostic biomarker in colorectal cancer. Gut http://dx.doi.org/10.1136/gutjnl-2015-309359 (15 October 2015).

  37. Budinska, E. et al. Gene expression patterns unveil a new level of molecular heterogeneity in colorectal cancer. J. Pathol. 231, 63–76 (2013).

  38. Uhlén, M. et al. Proteomics. Tissue-based map of the human proteome. Science 347, 1260419 (2015).

  39. Vejnar, C.E. & Zdobnov, E.M. MiRmap: comprehensive prediction of microRNA target repression strength. Nucleic Acids Res. 40, 11673–11683 (2012).

  40. Lara, R., Seckl, M.J. & Pardo, O.E. The p90 RSK family members: common functions and isoform specificity. Cancer Res. 73, 5301–5308 (2013).

  41. Li, J. et al. TCPA: a resource for cancer functional proteomics data. Nat. Methods 10, 1046–1047 (2013).

  42. Samuels, Y. et al. High frequency of mutations of the PIK3CA gene in human cancers. Science 304, 554 (2004).

  43. Thorpe, L.M., Yuzugullu, H. & Zhao, J.J. PI3K in cancer: divergent roles of isoforms, modes of activation and therapeutic targeting. Nat. Rev. Cancer 15, 7–24 (2015).

  44. Cancer Genome Atlas Research Network. Integrated genomic characterization of endometrial carcinoma. Nature 497, 67–73 (2013).

  45. Miled, N. et al. Mechanism of two classes of cancer mutations in the phosphoinositide 3-kinase catalytic subunit. Science 317, 239–242 (2007).

  46. Huang, C.-H. et al. The structure of a human p110α/p85α complex elucidates the effects of oncogenic PI3Kα mutations. Science 318, 1744–1748 (2007).

  47. Gkeka, P. et al. Investigating the structure and dynamics of the PIK3CA wild-type and H1047R oncogenic mutant. PLoS Comput. Biol. 10, e1003895 (2014).

  48. Burke, J.E., Perisic, O., Masson, G.R., Vadas, O. & Williams, R.L. Oncogenic mutations mimic and enhance dynamic events in the natural activation of phosphoinositide 3-kinase p110α (PIK3CA). Proc. Natl. Acad. Sci. USA 109, 15259–15264 (2012).

  49. Haling, J.R. et al. Structure of the BRAF-MEK complex reveals a kinase activity independent role for BRAF in MAPK signaling. Cancer Cell 26, 402–413 (2014).

  50. Kar, G., Gursoy, A. & Keskin, O. Human cancer protein-protein interaction network: a structural perspective. PLoS Comput. Biol. 5, e1000601 (2009).

  51. Ghersi, D. & Singh, M. Interaction-based discovery of functionally important genes in cancers. Nucleic Acids Res. 42, e18 (2014).

  52. Cheng, F. et al. Studying tumorigenesis through network evolution and somatic mutational perturbations in the cancer interactome. Mol. Biol. Evol. 31, 2156–2169 (2014).

  53. Barbieri, C.E. et al. Exome sequencing identifies recurrent SPOP, FOXA1 and MED12 mutations in prostate cancer. Nat. Genet. 44, 685–689 (2012).

  54. Fleming, N.I. et al. SMAD2, SMAD3 and SMAD4 mutations in colorectal cancer. Cancer Res. 73, 725–735 (2013).

  55. Yuen, B.T.K. & Knoepfler, P.S. Histone H3.3 mutations: a variant path to cancer. Cancer Cell 24, 567–574 (2013).

  56. Hornbeck, P.V. et al. PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res. 40, D261–D270 (2012).

  57. Cheng, K.W. et al. The RAB25 small GTPase determines aggressiveness of ovarian and breast cancers. Nat. Med. 10, 1251–1256 (2004).

  58. Zhang, J. et al. Overexpression of Rab25 contributes to metastasis of bladder cancer through induction of epithelial-mesenchymal transition and activation of Akt/GSK-3β/Snail signaling. Carcinogenesis 34, 2401–2408 (2013).

  59. DeNicola, G.M. et al. Oncogene-induced Nrf2 transcription promotes ROS detoxification and tumorigenesis. Nature 475, 106–109 (2011).

  60. Ji, Q. et al. Selective loss of AKR1C1 and AKR1C2 in breast cancer and their potential effect on progesterone signaling. Cancer Res. 64, 7610–7617 (2004).

  61. Stanbrough, M. et al. Increased expression of genes converting adrenal androgens to testosterone in androgen-independent prostate cancer. Cancer Res. 66, 2815–2825 (2006).

  62. Rižner, T.L., Šmuc, T., Rupreht, R., Šinkovec, J. & Penning, T.M. AKR1C1 and AKR1C3 may determine progesterone and estrogen ratios in endometrial cancer. Mol. Cell. Endocrinol. 248, 126–135 (2006).

  63. Zhao, L. & Vogt, P.K. Helical domain and kinase domain mutations in p110α of phosphatidylinositol 3-kinase induce gain of function by different mechanisms. Proc. Natl. Acad. Sci. USA 105, 2652–2657 (2008).

  64. Wu, X. et al. Activation of diverse signalling pathways by oncogenic PIK3CA mutations. Nat. Commun. 5, 4961 (2014).

  65. Puente, X.S. et al. Non-coding recurrent mutations in chronic lymphocytic leukaemia. Nature 526, 519–524 (2015).

  66. Supek, F. & Lehner, B. Differential DNA mismatch repair underlies mutation rate variation across the human genome. Nature 521, 81–84 (2015).

  67. Reijns, M.A.M. et al. Lagging-strand replication shapes the mutational landscape of the genome. Nature 518, 502–506 (2015).

  68. Lord, C.J. & Ashworth, A. The DNA damage response and cancer therapy. Nature 481, 287–294 (2012).

  69. Roberts, S.A. et al. An APOBEC cytidine deaminase mutagenesis pattern is widespread in human cancers. Nat. Genet. 45, 970–976 (2013).

  70. Polak, P. et al. Cell-of-origin chromatin organization shapes the mutational landscape of cancer. Nature 518, 360–364 (2015).

  71. Araya, C.L. et al. A fundamental protein property, thermodynamic stability, revealed solely from large-scale measurements of protein function. Proc. Natl. Acad. Sci. USA 109, 16858–16863 (2012).

  72. Buenrostro, J.D. et al. Quantitative analysis of RNA-protein interactions on a massively parallel array reveals biophysical and evolutionary landscapes. Nat. Biotechnol. 32, 562–568 (2014).

  73. Guenther, U.-P. et al. Hidden specificity in an apparently nonspecific RNA-binding protein. Nature 502, 385–388 (2013).

  74. Oliphant, T.E. Python for scientific computing. Comput. Sci. Eng. 9, 10–20 (2007).

  75. Millman, K.J. & Aivazis, M. Python for scientists and engineers. Comput. Sci. Eng. 13, 9–12 (2011).

  76. McKinney, W. in Proc. 9th Python Sci. Conf. (eds. van der Walt, S. & Millman, J.) 51–56 (2010). ISBN-13: 978-1-4583-4619-3.

  77. Dale, R.K., Pedersen, B.S. & Quinlan, A.R. Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics 27, 3423–3424 (2011).

  78. Van der Walt, S., Colbert, S.C. & Varoquaux, G. The NumPy Array: a structure for efficient numerical computation. Comput. Sci. Eng. 13, 22–30 (2011).

  79. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  80. Cock, P.J.A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).

  81. Boyle, A.P. et al. Comparative analysis of regulatory information and circuits across distant species. Nature 512, 453–456 (2014).

Acknowledgements

We thank the TCGA, ICGC and TCPA for making these large-scale cancer data sets available to the scientific community. We thank H. Tang for discussions regarding statistical analyses. We thank M.M. Winslow, D.M. Fowler, S. Fields and D.E. Webster for critical reading and suggestions to the manuscript. C.L.A. was supported by US National Institutes of Health (NIH) grants 3U54DK10255602 and 1P50HG00773501. C.C. was supported by the Child Health Research Institute, the Lucile Packard Foundation for Children's Health and US NIH Clinical and Translational Science Award grant UL1TR000093. J.A.R. was supported by the Damon Runyon Cancer Research Foundation and US NIH award 1U01HG007919-01. G.K. acknowledges support from the Lawrence Scholars Program, the US NIH Simbios Program (U54GM072970) and the Center for Molecular Analysis and Design at Stanford University. Biophysical simulations were supported by the Blue Waters project via US National Science Foundation awards OCI-0725070 and ACI-1238993 and the state of Illinois. Further support was provided by the National Center for Multiscale Modeling of Biological Systems (P41GM103712-S1) through Anton-1 resources provided by the Pittsburgh Supercomputing Center under grant PSCA13072P. This work was supported by the Rita Allen Foundation.

Author information

Contributions

C.L.A. and W.J.G. conceived of the project, and all authors designed experiments and methods. C.L.A. and C.C. developed methods for the detection and analysis of SMRs. C.L.A. constructed uniform annotations and non-Bayesian mutation probability models and performed density-based clustering, scoring and empirical false-discovery estimation (simulations), as well as regulatory (noncoding), structural (coding), frequency and whole-genome sequencing recurrence analyses. C.C. constructed Bayesian mutation probability models and performed RNA-seq, RPPA and survival outcome analyses. J.A.R. designed and performed luciferase assays. G.K. carried out biophysical simulations, performed hidden Markov model–based state decompositions and computed binding enthalpies with supervision from V.S.P. C.L.A., C.C., J.A.R., G.K., M.P.S. and W.J.G. wrote the manuscript.

Corresponding authors

Correspondence to Carlos L Araya, Michael P Snyder or William J Greenleaf.

Ethics declarations

Competing interests

M.P.S. is a co-founder and a member of the scientific advisory board (SAB) of Personalis and a member of the SABs of Genapsys and Axiomx. W.J.G. is a co-founder of Epinomics. A patent application has been filed by Stanford University with C.L.A., C.C., J.A.R., M.P.S. and W.J.G. named as inventors.

Integrated supplementary information

Supplementary Figure 1 Summary of exome sequencing data.

(a) Exome tumor-normal sample sizes for bladder cancer (BLCA), breast cancer (BRCA), carcinoid (CARC), chronic lymphocytic leukemia (CLLX), colorectal cancer (COLR), diffuse large B cell lymphoma (DLBC), esophageal adenocarcinoma (ESOP), glioblastoma multiforme (GLBM), head and neck cancer (HNSC), kidney clear cell carcinoma (KIRC), acute myeloid leukemia (LAML), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), medulloblastoma (MEDU), melanoma (MELA), multiple myeloma (MUMY), neuroblastoma (NEUB), ovarian cancer (OVAR), prostate cancer (PRAD), rhabdoid tumor (RHAB) and uterine corpus endometrial carcinoma (UCEC). (b) Reference coordinates for mutation impact annotation (SnpEff; ref. 29). CDS, coding sequence.

Supplementary Figure 2 Background mutation models capture variance in somatic mutation rates and are well correlated.

(a) Genome-wide transition/transversion mutation probabilities per tumor type. (b) Absolute difference in the log probabilities of complementary mutations (C>T and G>A) per gene in melanoma for the 'Bayesian' and 'Exonic' mutation probability models. The percentage of genes where complementary mutation probabilities are within one order of magnitude is indicated. (c) The median of Spearman correlations between the average 'Bayesian' and 'Matched' mutation probabilities in distinct tumor types is shown for the sets of tumor types with minimum numbers of samples (x axis). (d) Correlation between observed WGS intronic mutation probability (pan-cancer) and those of the 'Bayesian' (blue) or 'Matched' (gray) models.

Supplementary Figure 3 Density scores are highly correlated and enriched for known cancer driver genes.

(a) Right, the pan-cancer relationship between gene-specific and global binomial probabilities is shown. Left, correlation (Spearman ρ) is plotted as a function of density score in the low-to-mid density range. (b) Somatically altered SNV-driven cancer gene (SCG) fold enrichment (red) and significance of enrichment (blue) of region-associated genes as a function of region density score. (c) Fraction of SCGs that are region associated (blue) and fraction of region-associated genes that are SCGs (red) as a function of region density score.
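To make the idea of a binomial density score concrete, the sketch below computes the upper-tail probability of seeing at least `k` mutations in `n` trials at per-trial probability `p`, and reports it on a –log10 scale. The function names, the choice of an upper-tail test and the toy numbers are assumptions for illustration; the exact scoring used in the study is not specified in this caption.

```python
import math


def log_binom_pmf(k, n, p):
    """Natural log of the Binomial(n, p) probability mass at k (0 < p < 1)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed directly over the tail.

    Summing the tail itself (rather than 1 - CDF) keeps very small
    probabilities from being lost to rounding.
    """
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    total = 0.0
    for i in range(k, n + 1):
        term = math.exp(log_binom_pmf(i, n, p))
        total += term
        # Past the mean the terms shrink; stop once they no longer matter.
        if i > n * p and term < total * 1e-17:
            break
    return min(total, 1.0)


def density_score(k, n, p):
    """Hypothetical density score: -log10 of the binomial tail probability."""
    tail = binom_upper_tail(k, n, p)
    return -math.log10(tail) if tail > 0 else math.inf


# Toy comparison: 8 versus 3 mutations in 1,000 trials at p = 0.001.
print(density_score(8, 1000, 0.001), density_score(3, 1000, 0.001))
```

Under this reading, the gene-specific and global scores in panel a would differ only in the value of `p` supplied, which is one way the two could be compared on a common scale.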

Supplementary Figure 4 Most mutation cluster density scores fit the null distribution and lie on the diagonal in a quantile-quantile plot, indicating that simulations accurately capture the significance of mutation densities.

Quantile-quantile plots of the observed (y axis) and simulated (x axis) density scores (–log10(P_Density)). (a–d) Representative examples from bladder cancer (BLCA) (a), breast cancer (BRCA) (b), colorectal cancer (COLR) (c) and diffuse large B cell lymphoma (DLBC) (d) are shown. The solid line represents the threshold for density score (–log10(P_Density)) that guarantees FDR ≤ 5% in each cancer type. The dashed line indicates the line corresponding to y = x. (e) Violin plots of density scores in an expanded set of 90 additional colorectal cancer simulations. (f) The distributions of density scores in the original (10×; blue) and expanded (90×; yellow) sets of simulations are highly concordant and yield tightly correlated FDR estimates for the observed density scores (inset, r² = 0.99985). Dashed lines indicate thresholds of FDR ≤ 5%. (g) 99.2% (128/129) of SMRs thresholded by FDR (≤5%) are shared by the FDR10× and FDR90× thresholded sets.
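A simulation-based FDR threshold of the kind described in this caption can be sketched as follows. The estimator shown (mean number of simulated scores at or above a threshold, divided by the number of observed scores at or above it) is a common construction and is an assumption here, as are the function names and toy scores; the authors' exact procedure may differ.

```python
def empirical_fdr(threshold, observed, simulated, n_sims):
    """Estimated FDR at a score threshold.

    `simulated` pools the scores from `n_sims` null simulations, so the
    expected number of false positives is the pooled count divided by n_sims.
    """
    n_observed = sum(score >= threshold for score in observed)
    if n_observed == 0:
        return 0.0
    expected_false = sum(score >= threshold for score in simulated) / n_sims
    return min(1.0, expected_false / n_observed)


def fdr_threshold(observed, simulated, n_sims, alpha=0.05):
    """Smallest observed score whose estimated FDR is <= alpha (else None)."""
    for candidate in sorted(set(observed)):
        if empirical_fdr(candidate, observed, simulated, n_sims) <= alpha:
            return candidate
    return None


# Toy scores (hypothetical): ten simulations pooled into one list.
observed = [1, 2, 3, 8, 9, 10]
simulated = [1] * 30 + [2] * 20 + [3] * 10 + [8] * 1
print(fdr_threshold(observed, simulated, n_sims=10))
```

Re-running the same function with a larger pool of simulations and comparing the resulting thresholds mirrors the 10× versus 90× concordance check in panels f and g.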

Supplementary Figure 5 Robust SMRs capture ~95% of high-confidence SMRs from ten cancer types.

Robust SMRs are 58.8-fold enriched for somatic, SNV-driven Cancer Gene Census (CGC) genes (P = 2.4 × 10^−34). (a) Overlap (blue) of robust SMRs (cyan) and high-confidence SMRs (gray). (b,c) Fraction of SMRs per cancer type classified as robust. Analyses in a and b are limited to high-confidence SMRs from the ten cancer types (green) with sufficient intronic mutation clusters for intron-based FDR estimation, as shown in b.

Supplementary Figure 6 Contribution of trinucleotide and APOBEC mutation heterogeneity in SMR identification.

(a) The fraction (ƒ) of mutated sites in endometrial cancer (UCEC) is plotted for each trinucleotide. Trinucleotides are oriented by transcription strand. Trinucleotides associated with APOBEC mutation signatures at high and low rates are labeled orange and pink, respectively. Notably, ƒ_TCT > ƒ_TCA and ƒ_AGA > ƒ_AGT. As shown in the inset (i), SMR mutation sites show a generally reduced fraction of APOBEC-associated trinucleotides as compared to the global set of somatic mutation sites in endometrial cancer. (b) As shown for endometrial cancer (i), the deviation in the observed over the (single-nucleotide) expected trinucleotide representation was compared with the fold change in the trinucleotide representation in SMR mutation sites for cancers with ≥250 SMR mutation sites (positions). These cancer types encompass 79% of all SMRs. On average, trinucleotide mutation heterogeneity not captured by single-nucleotide transition/transversion probabilities contributes to only 7.9% of the change in trinucleotide representation in SMRs. (a,b) Analyses performed with high- and medium-confidence SMRs. (c) Histogram of the fraction of mutations that are APOBEC associated per SMR. (d) Fraction of SMRs in which APOBEC-associated mutations are statistically increased (P < 0.05, Holm-Bonferroni) per cancer. As shown in the inset (i), 4.0% of identified SMRs (n = 872) are driven by APOBEC-associated mutations. Raw (uncorrected) P values would indicate that 12% of SMRs have higher than expected APOBEC mutation signatures.
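The gap between the corrected and raw percentages in panel d follows from how a step-down multiple-testing correction works. The sketch below implements the standard Holm procedure; the function name and the toy P values are assumptions for illustration and are not taken from the study.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down procedure.

    P values are visited from smallest to largest; the k-th smallest
    (k = 0, 1, ...) is compared with alpha / (m - k). Testing stops at the
    first failure. Returns one boolean per input: True means rejected.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # all larger P values are also retained
    return rejected


# Toy P values (hypothetical): all four pass a raw 0.05 cutoff,
# but only two survive the correction.
toy_p = [0.01, 0.04, 0.03, 0.005]
print(holm_bonferroni(toy_p))
print([p < 0.05 for p in toy_p])
```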

Supplementary Figure 7 Histogram of the fraction of somatic mutations within each coding region SMR that are predicted to alter protein sequence or RNA splicing.

Supplementary Figure 8 Histogram of Gini coefficients of dispersion for nonsynonymous mutations per gene.

Gini coefficients were calculated on the basis of the number of nonsynonymous mutations contained per residue mutated in each cancer for CGC genes. For each CGC gene (n = 522), the maximum coefficient across cancers is plotted (refs. 31,32). A set of outliers with extreme Gini coefficients is labeled. 81% of CGC genes with unassociated SMRs have Gini coefficients <0.1.
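For readers unfamiliar with the statistic, a Gini coefficient over per-residue mutation counts can be computed as in the sketch below. The formula is the standard rank-weighted form for a finite sample; the function name and the toy counts are assumptions for illustration, and the study may have applied a different normalization.

```python
def gini(counts):
    """Gini coefficient of a list of non-negative counts.

    0 means mutations are spread evenly over the mutated residues;
    values approaching 1 mean they pile up on a few residues.
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum with 1-based ranks over the ascending values.
    weighted = sum((rank + 1) * value for rank, value in enumerate(values))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Toy counts per mutated residue (hypothetical).
print(gini([1, 1, 1, 1]))  # evenly spread
print(gini([1, 1, 1, 7]))  # one dominant hotspot residue
```

A gene whose mutations are scattered one per residue scores near zero, which is consistent with the caption's observation that most CGC genes without an associated SMR have coefficients below 0.1.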

Supplementary Figure 9 Molecular dynamics analysis of wild-type and mutant PIK3CA in complex with PIK3R1.

(a) Wild-type (WT) PIK3CA in complex with PIK3R1. (b) The K111E mutant of PIK3CA in complex with PIK3R1. (c) The G118D mutant of PIK3CA in complex with PIK3R1. The interaction enthalpy across the full PIK3CA-PIK3R1 binding interface follows a bimodal distribution (as shown in Fig. 3d). “Binding Mode 1” (blue) is preferred by WT PIK3CA and corresponds to binding interactions that are on average 1.8 kcal/mol tighter than those in “Binding Mode 2” (orange), which predominates in the K111E mutant of PIK3CA. The difference between the two binding modes becomes apparent in the salt-bridge pattern of R79. In “Binding Mode 1,” R79 is a key component of the binding interface (with E1215 and E1222 of PIK3R1; shown in gray helices). In “Binding Mode 2,” a salt bridge between R79 and E81 is in direct competition to this binding interaction (orange panel of a). In WT PIK3CA, this competition is attenuated by the interaction of K111 with E81 (shown in the blue panel of a) and to a similar degree by the interaction of R108 with E81 (data not shown). In the K111E mutant of PIK3CA, a similar attenuation can only occur through the simultaneous recruitment of R108 (blue panel of b). Taken together, the data suggest that K111E causes an inversion of the bimodal binding distribution and effectively weakens the interactions between PIK3CA and PIK3R1 as compared to WT PIK3CA. (c) Molecular dynamics simulations of the G118D mutant of PIK3CA show a similar weakening of the binding interactions with R79 at their core, albeit through the reshaping of a more extensive network of salt bridges that involves D118. Data are from 20 independent 0.1-μs molecular dynamics simulations. The individual distributions in Figure 3d correspond to distinct conformational states at the binding interface. Their cumulative populations were normalized and are reported as percentages.

Supplementary Figure 10 Enrichment of CGC genes among SMR-based protein-coding drivers and SMR-identified binding interfaces.

(a) Fraction of SMR- and OncodriveCLUST-identified protein-coding genes in the Cancer Gene Census (CGC). OncodriveCLUST results were obtained from Tamborero et al. (ref. 11). Driver analyses in endometrial (UCEC), ovarian (OVAR) and lung squamous cell carcinoma (LUSC) were performed with the same exome data sets. Breast cancer (BRCA) results were obtained with distinct exome data sets and are therefore not directly comparable. (b) The fraction of SMR-identified and previously reported (ref. 51) protein and DNA interaction interfaces with recurrent cancer somatic mutations. For direct comparison, we consider only interactions with nucleic acids and proteins. All CGC genes with previously reported (ref. 51) somatically altered nucleic acid or protein interfaces are captured by SMRs (inset).

Supplementary Figure 11 Molecular structure and spatial mapping of an SMR on histone H2B.

An SMR on histone H2B (HIST1H2BK.1; orange) is highlighted within the structure of the human nucleosome core particle (PDB, 2CV5). Histone H2B (blue), histone H2A (teal) and histone H4 (green) components are highlighted.

Supplementary Figure 12 NFE2L2 SMRs alter KEAP1-binding interfaces.

The structures of SMR NFE2L2.1 (orange, shown here) and NFE2L2.2 (Fig. 4g) were mapped to NFE2L2 structures (PDB, 2FLU and 3WN7). A sector of recurrent lung adenoma alterations on KEAP1 (teal) with density score FDR ≤ 5% did not meet the 2% mutation frequency cutoff. The structure of NFE2L2.2 mapped to the mouse NFE2L2-KEAP1 co-crystal structure (PDB, 3WN7) is shown in Figure 4g.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–12 and Supplementary Note. (PDF 1620 kb)

Supplementary Tables 1–14

Supplementary Tables 1–14. (XLSX 6343 kb)

Supplementary Table 15: Functional enrichment annotation.

Genes associated with high- and medium-confidence SMRs in each cancer type were analyzed for functional enrichments with DAVID. Analysis was performed only for cancer types with ≥5 high- or medium-confidence SMRs. In total, 331 functional enrichments (P < 0.05, Benjamini-Hochberg) were detected across the following 11 databases: BBID, BIOCARTA, COG_ONTOLOGY, INTERPRO, GOTERM_BP_FAT, KEGG_PATHWAY, OMIM_DISEASE, PIR_SUPERFAMILY, SMART, SP_PIR_KEYWORDS and UP_SEQ_FEATURE. (XLSX 265 kb)
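The Benjamini-Hochberg adjustment named above controls the false-discovery rate rather than the family-wise error rate. The sketch below computes BH-adjusted P values in plain Python; the function name and the toy P values are assumptions for illustration and do not reproduce DAVID's output.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, in the input order.

    Each P value is scaled by m / rank (rank 1 = smallest), then a running
    minimum taken from the largest P value downward keeps the adjusted
    values monotone.
    """
    m = len(p_values)
    descending = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(descending):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy P values (hypothetical); an enrichment is kept when adjusted P < 0.05.
toy_p = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(toy_p))
```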

Supplementary Table 16: Mutation cluster detection methods comparison.

C, protein-coding region; NC, noncoding region; SNVs, single-nucleotide variants; indels, insertions/deletions; IDRs, intrinsically disordered regions. *In 'hotspot' analysis, Weinhold et al. (2014) join mutations within 50 bp and do not consider background models that are mutation type specific. **In mutation clustering analysis, Lawrence et al. (2014) evaluate 3-bp mutation windows and do not consider background models that are mutation type specific. (XLSX 10 kb)

About this article

Cite this article

Araya, C., Cenik, C., Reuter, J. et al. Identification of significantly mutated regions across cancer types highlights a rich landscape of functional molecular alterations. Nat Genet 48, 117–125 (2016). https://doi.org/10.1038/ng.3471

  • DOI: https://doi.org/10.1038/ng.3471
