Identification of significantly mutated regions across cancer types highlights a rich landscape of functional molecular alterations

Araya, Carlos L; Cenik, Can; Reuter, Jason A; Kiss, Gert; Pande, Vijay S; Snyder, Michael P; Greenleaf, William J

doi:10.1038/ng.3471

Analysis
Published: 21 December 2015


Nature Genetics volume 48, pages 117–125 (2016)


Abstract

Cancer sequencing studies have primarily identified cancer driver genes by the accumulation of protein-altering mutations. An improved method would be annotation independent, sensitive to unknown distributions of functions within proteins and inclusive of noncoding drivers. We employed density-based clustering methods in 21 tumor types to detect variably sized significantly mutated regions (SMRs). SMRs reveal recurrent alterations across a spectrum of coding and noncoding elements, including transcription factor binding sites and untranslated regions mutated in up to ∼15% of specific tumor types. SMRs demonstrate spatial clustering of alterations in molecular domains and at interfaces, often with associated changes in signaling. Mutation frequencies in SMRs demonstrate that distinct protein regions are differentially mutated across tumor types, as exemplified by a linker region of PIK3CA in which biophysical simulations suggest that mutations affect regulatory interactions. The functional diversity of SMRs underscores both the varied mechanisms of oncogenic misregulation and the advantage of functionally agnostic driver identification.
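To make the density-based clustering step concrete, here is a minimal, dependency-free sketch of DBSCAN (the density-based clustering algorithm listed among the references) applied to one-dimensional mutation coordinates. It is an illustration under stated assumptions, not the authors' pipeline: the function name `dbscan_1d`, the `eps` (maximum distance in nucleotides) and `min_samples` values, and the example positions are hypothetical, and the scoring and FDR control applied to SMRs are not shown. scikit-learn's `sklearn.cluster.DBSCAN` implements the same algorithm for general inputs.

```python
from bisect import bisect_left, bisect_right


def dbscan_1d(positions, eps, min_samples):
    """Minimal DBSCAN over 1D coordinates (e.g. mutated nucleotide positions).

    Returns one label per input position: 0, 1, ... for clusters, -1 for noise.
    """
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    xs = [positions[i] for i in order]

    def neighbors(j):
        # Indices (into xs) of every point within eps of xs[j], including j itself.
        lo = bisect_left(xs, xs[j] - eps)
        hi = bisect_right(xs, xs[j] + eps)
        return range(lo, hi)

    labels = [None] * len(xs)  # None = not yet visited
    cluster = -1
    for j in range(len(xs)):
        if labels[j] is not None:
            continue
        seeds = neighbors(j)
        if len(seeds) < min_samples:
            labels[j] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1  # j is a core point: start a new cluster
        labels[j] = cluster
        queue = list(seeds)
        while queue:
            k = queue.pop()
            if labels[k] == -1:
                labels[k] = cluster  # noise reachable from a core point -> border point
            if labels[k] is not None:
                continue  # already assigned; border points are not expanded
            labels[k] = cluster
            nbrs = neighbors(k)
            if len(nbrs) >= min_samples:  # k is also a core point: keep expanding
                queue.extend(nbrs)
    # Map labels back to the caller's original ordering.
    out = [0] * len(xs)
    for j, i in enumerate(order):
        out[i] = labels[j]
    return out


# Hypothetical mutated positions within one gene; eps and min_samples are illustrative.
print(dbscan_1d([100, 101, 103, 104, 500, 900, 901, 902], eps=3, min_samples=3))
# [0, 0, 0, 0, -1, 1, 1, 1]
```

Because clusters grow outward from any sufficiently dense point, their extent is set by the data rather than by a fixed window, which is the property that allows variably sized regions to emerge.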


**Figure 1: Identification of SMRs in 21 cancer types across a broad spectrum of functional elements.**

**Figure 2: Noncoding SMRs recurrently alter promoters and 5′ UTRs.**

**Figure 3: Structural mapping of SMRs onto proteins and complexes identifies differentially altered regions among cancers and molecular interfaces targeted by recurrent alterations.**

**Figure 4: SMRs are associated with distinct molecular signatures.**

**Figure 5: Structure in the distribution of cancer mutations remains largely uncharacterized.**


Accession codes


Protein Data Bank

References

1. Hodis, E. et al. A landscape of driver mutations in melanoma. Cell 150, 251–263 (2012).
2. Huang, F.W. et al. Highly recurrent TERT promoter mutations in human melanoma. Science 339, 957–959 (2013).
3. Alexandrov, L.B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
4. Lawrence, M.S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
5. Lawrence, M.S. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501 (2014).
6. Ding, L., Wendl, M.C., McMichael, J.F. & Raphael, B.J. Expanding the computational toolbox for mining cancer genomes. Nat. Rev. Genet. 15, 556–570 (2014).
7. Davies, H. et al. Mutations of the BRAF gene in human cancer. Nature 417, 949–954 (2002).
8. Parsons, D.W. et al. An integrated genomic analysis of human glioblastoma multiforme. Science 321, 1807–1812 (2008).
9. Kane, D.P. & Shcherbakova, P.V. A common cancer-associated DNA polymerase ɛ mutation causes an exceptionally strong mutator phenotype, indicating fidelity defects distinct from loss of proofreading. Cancer Res. 74, 1895–1901 (2014).
10. Dees, N.D. et al. MuSiC: identifying mutational significance in cancer genomes. Genome Res. 22, 1589–1598 (2012).
11. Tamborero, D., Gonzalez-Perez, A. & Lopez-Bigas, N. OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes. Bioinformatics 29, 2238–2244 (2013).
12. Porta-Pardo, E. & Godzik, A. e-Driver: a novel method to identify protein regions driving cancer. Bioinformatics 30, 3109–3114 (2014).
13. Schnall-Levin, M., Zhao, Y., Perrimon, N. & Berger, B. Conserved microRNA targeting in Drosophila is as widespread in coding regions as in 3′ UTRs. Proc. Natl. Acad. Sci. USA 107, 15751–15756 (2010).
14. Cenik, C. et al. Genome analysis reveals interplay between 5′ UTR introns and nuclear mRNA export for secretory and mitochondrial genes. PLoS Genet. 7, e1001366 (2011).
15. Stergachis, A.B. et al. Exonic transcription factor binding directs codon choice and affects protein evolution. Science 342, 1367–1372 (2013).
16. Wolfe, A.L. et al. RNA G-quadruplexes cause eIF4A-dependent oncogene translation in cancer. Nature 513, 65–70 (2014).
17. Xiong, H.Y. et al. RNA splicing. The human splicing code reveals new insights into the genetic determinants of disease. Science 347, 1254806 (2015).
18. Gerstberger, S., Hafner, M. & Tuschl, T. A census of human RNA-binding proteins. Nat. Rev. Genet. 15, 829–845 (2014).
19. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
20. Weinhold, N., Jacobsen, A., Schultz, N., Sander, C. & Lee, W. Genome-wide analysis of noncoding regulatory mutations in cancer. Nat. Genet. 46, 1160–1165 (2014).
21. Fredriksson, N.J., Ny, L., Nilsson, J.A. & Larsson, E. Systematic analysis of noncoding somatic mutations and gene expression alterations across 14 tumor types. Nat. Genet. 46, 1258–1263 (2014).
22. Supek, F., Miñana, B., Valcárcel, J., Gabaldón, T. & Lehner, B. Synonymous mutations frequently act as driver mutations in human cancers. Cell 156, 1324–1335 (2014).
23. Melton, C., Reuter, J.A., Spacek, D.V. & Snyder, M. Recurrent somatic mutations in regulatory regions of human cancer genomes. Nat. Genet. 47, 710–716 (2015).
24. Hofree, M., Shen, J.P., Carter, H., Gross, A. & Ideker, T. Network-based stratification of tumor mutations. Nat. Methods 10, 1108–1115 (2013).
25. Leiserson, M.D.M. et al. Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nat. Genet. 47, 106–114 (2015).
26. Araya, C.L. et al. Regulatory analysis of the C. elegans genome with spatiotemporal resolution. Nature 512, 400–405 (2014).
27. Stergachis, A.B. et al. Conservation of trans-acting circuitry during mammalian regulatory evolution. Nature 515, 365–370 (2014).
28. Roadmap Epigenomics Consortium. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
29. Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w¹¹¹⁸; iso-2; iso-3. Fly (Austin) 6, 80–92 (2012).
30. Ester, M., Kriegel, H.-P., Sander, J. & Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. KDD 96, 226–231 (1996).
31. Futreal, P.A. et al. A census of human cancer genes. Nat. Rev. Cancer 4, 177–183 (2004).
32. Santarius, T., Shipley, J., Brewer, D., Stratton, M.R. & Cooper, C.S. A census of amplified and overexpressed human cancer genes. Nat. Rev. Cancer 10, 59–64 (2010).
33. Malhotra, A. et al. Breakpoint profiling of 64 cancer genomes reveals numerous complex rearrangements spawned by homology-independent mechanisms. Genome Res. 23, 762–776 (2013).
34. Jäger, D. et al. Identification of a tissue-specific putative transcription factor in breast tissue by serological screening of a breast cancer library. Cancer Res. 61, 2055–2061 (2001).
35. Mei, Y.-P. et al. Small nucleolar RNA 42 acts as an oncogene in lung tumorigenesis. Oncogene 31, 2794–2804 (2012).
36. Okugawa, Y. et al. Clinical significance of SNORA42 as an oncogene and a prognostic biomarker in colorectal cancer. Gut http://dx.doi.org/10.1136/gutjnl-2015-309359 (15 October 2015).
37. Budinska, E. et al. Gene expression patterns unveil a new level of molecular heterogeneity in colorectal cancer. J. Pathol. 231, 63–76 (2013).
38. Uhlén, M. et al. Proteomics. Tissue-based map of the human proteome. Science 347, 1260419 (2015).
39. Vejnar, C.E. & Zdobnov, E.M. MiRmap: comprehensive prediction of microRNA target repression strength. Nucleic Acids Res. 40, 11673–11683 (2012).
40. Lara, R., Seckl, M.J. & Pardo, O.E. The p90 RSK family members: common functions and isoform specificity. Cancer Res. 73, 5301–5308 (2013).
41. Li, J. et al. TCPA: a resource for cancer functional proteomics data. Nat. Methods 10, 1046–1047 (2013).
42. Samuels, Y. et al. High frequency of mutations of the PIK3CA gene in human cancers. Science 304, 554 (2004).
43. Thorpe, L.M., Yuzugullu, H. & Zhao, J.J. PI3K in cancer: divergent roles of isoforms, modes of activation and therapeutic targeting. Nat. Rev. Cancer 15, 7–24 (2015).
44. Cancer Genome Atlas Research Network. Integrated genomic characterization of endometrial carcinoma. Nature 497, 67–73 (2013).
45. Miled, N. et al. Mechanism of two classes of cancer mutations in the phosphoinositide 3-kinase catalytic subunit. Science 317, 239–242 (2007).
46. Huang, C.-H. et al. The structure of a human p110α/p85α complex elucidates the effects of oncogenic PI3Kα mutations. Science 318, 1744–1748 (2007).
47. Gkeka, P. et al. Investigating the structure and dynamics of the PIK3CA wild-type and H1047R oncogenic mutant. PLoS Comput. Biol. 10, e1003895 (2014).
48. Burke, J.E., Perisic, O., Masson, G.R., Vadas, O. & Williams, R.L. Oncogenic mutations mimic and enhance dynamic events in the natural activation of phosphoinositide 3-kinase p110α (PIK3CA). Proc. Natl. Acad. Sci. USA 109, 15259–15264 (2012).
49. Haling, J.R. et al. Structure of the BRAF-MEK complex reveals a kinase activity independent role for BRAF in MAPK signaling. Cancer Cell 26, 402–413 (2014).
50. Kar, G., Gursoy, A. & Keskin, O. Human cancer protein-protein interaction network: a structural perspective. PLoS Comput. Biol. 5, e1000601 (2009).
51. Ghersi, D. & Singh, M. Interaction-based discovery of functionally important genes in cancers. Nucleic Acids Res. 42, e18 (2014).
52. Cheng, F. et al. Studying tumorigenesis through network evolution and somatic mutational perturbations in the cancer interactome. Mol. Biol. Evol. 31, 2156–2169 (2014).
53. Barbieri, C.E. et al. Exome sequencing identifies recurrent SPOP, FOXA1 and MED12 mutations in prostate cancer. Nat. Genet. 44, 685–689 (2012).
54. Fleming, N.I. et al. SMAD2, SMAD3 and SMAD4 mutations in colorectal cancer. Cancer Res. 73, 725–735 (2013).
55. Yuen, B.T.K. & Knoepfler, P.S. Histone H3.3 mutations: a variant path to cancer. Cancer Cell 24, 567–574 (2013).
56. Hornbeck, P.V. et al. PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res. 40, D261–D270 (2012).
57. Cheng, K.W. et al. The RAB25 small GTPase determines aggressiveness of ovarian and breast cancers. Nat. Med. 10, 1251–1256 (2004).
58. Zhang, J. et al. Overexpression of Rab25 contributes to metastasis of bladder cancer through induction of epithelial-mesenchymal transition and activation of Akt/GSK-3β/Snail signaling. Carcinogenesis 34, 2401–2408 (2013).
59. DeNicola, G.M. et al. Oncogene-induced Nrf2 transcription promotes ROS detoxification and tumorigenesis. Nature 475, 106–109 (2011).
60. Ji, Q. et al. Selective loss of AKR1C1 and AKR1C2 in breast cancer and their potential effect on progesterone signaling. Cancer Res. 64, 7610–7617 (2004).
61. Stanbrough, M. et al. Increased expression of genes converting adrenal androgens to testosterone in androgen-independent prostate cancer. Cancer Res. 66, 2815–2825 (2006).
62. Rižner, T.L., Šmuc, T., Rupreht, R., Šinkovec, J. & Penning, T.M. AKR1C1 and AKR1C3 may determine progesterone and estrogen ratios in endometrial cancer. Mol. Cell. Endocrinol. 248, 126–135 (2006).
63. Zhao, L. & Vogt, P.K. Helical domain and kinase domain mutations in p110α of phosphatidylinositol 3-kinase induce gain of function by different mechanisms. Proc. Natl. Acad. Sci. USA 105, 2652–2657 (2008).
64. Wu, X. et al. Activation of diverse signalling pathways by oncogenic PIK3CA mutations. Nat. Commun. 5, 4961 (2014).
65. Puente, X.S. et al. Non-coding recurrent mutations in chronic lymphocytic leukaemia. Nature 526, 519–524 (2015).
66. Supek, F. & Lehner, B. Differential DNA mismatch repair underlies mutation rate variation across the human genome. Nature 521, 81–84 (2015).
67. Reijns, M.A.M. et al. Lagging-strand replication shapes the mutational landscape of the genome. Nature 518, 502–506 (2015).
68. Lord, C.J. & Ashworth, A. The DNA damage response and cancer therapy. Nature 481, 287–294 (2012).
69. Roberts, S.A. et al. An APOBEC cytidine deaminase mutagenesis pattern is widespread in human cancers. Nat. Genet. 45, 970–976 (2013).
70. Polak, P. et al. Cell-of-origin chromatin organization shapes the mutational landscape of cancer. Nature 518, 360–364 (2015).
71. Araya, C.L. et al. A fundamental protein property, thermodynamic stability, revealed solely from large-scale measurements of protein function. Proc. Natl. Acad. Sci. USA 109, 16858–16863 (2012).
72. Buenrostro, J.D. et al. Quantitative analysis of RNA-protein interactions on a massively parallel array reveals biophysical and evolutionary landscapes. Nat. Biotechnol. 32, 562–568 (2014).
73. Guenther, U.-P. et al. Hidden specificity in an apparently nonspecific RNA-binding protein. Nature 502, 385–388 (2013).
74. Oliphant, T.E. Python for scientific computing. Comput. Sci. Eng. 9, 10–20 (2007).
75. Millman, K.J. & Aivazis, M. Python for scientists and engineers. Comput. Sci. Eng. 13, 9–12 (2011).
76. McKinney, W. in Proc. 9th Python Sci. Conf. (eds. van der Walt, S. & Millman, J.) 51–56 (2010). ISBN-13: 978-1-4583-4619-3.
77. Dale, R.K., Pedersen, B.S. & Quinlan, A.R. Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics 27, 3423–3424 (2011).
78. Van der Walt, S., Colbert, S.C. & Varoquaux, G. The NumPy Array: a structure for efficient numerical computation. Comput. Sci. Eng. 13, 22–30 (2011).
79. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
80. Cock, P.J.A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
81. Boyle, A.P. et al. Comparative analysis of regulatory information and circuits across distant species. Nature 512, 453–456 (2014).


Acknowledgements

We thank the TCGA, ICGC and TCPA for making these large-scale cancer data sets available to the scientific community. We thank H. Tang for discussions regarding statistical analyses. We thank M.M. Winslow, D.M. Fowler, S. Fields and D.E. Webster for critical reading of and suggestions on the manuscript. C.L.A. was supported by US National Institutes of Health (NIH) grants 3U54DK10255602 and 1P50HG00773501. C.C. was supported by the Child Health Research Institute, the Lucile Packard Foundation for Children's Health and US NIH Clinical and Translational Science Award grant UL1TR000093. J.A.R. was supported by the Damon Runyon Cancer Research Foundation and US NIH award 1U01HG007919-01. G.K. acknowledges support from the Lawrence Scholars Program, the US NIH Simbios Program (U54GM072970) and the Center for Molecular Analysis and Design at Stanford University. Biophysical simulations were supported by the Blue Waters project via US National Science Foundation awards OCI-0725070 and ACI-1238993 and the state of Illinois. Further support was provided by the National Center for Multiscale Modeling of Biological Systems (P41GM103712-S1) through Anton-1 resources provided by the Pittsburgh Supercomputing Center under grant PSCA13072P. This work was supported by the Rita Allen Foundation.

Author information

Carlos L Araya and Can Cenik: These authors contributed equally to this work.

Authors and Affiliations

Department of Genetics, Stanford University School of Medicine, Stanford, California, USA
Carlos L Araya, Can Cenik, Jason A Reuter, Michael P Snyder & William J Greenleaf
Department of Chemistry, Stanford University, Stanford, California, USA
Gert Kiss & Vijay S Pande
Department of Applied Physics, Stanford University, Stanford, California, USA
William J Greenleaf

Contributions

C.L.A. and W.J.G. conceived of the project, and all authors designed experiments and methods. C.L.A. and C.C. developed methods for the detection and analysis of SMRs. C.L.A. constructed uniform annotations and non-Bayesian mutation probability models and performed density-based clustering, scoring and empirical false-discovery estimation (simulations), as well as regulatory (noncoding), structural (coding), frequency and whole-genome sequencing recurrence analyses. C.C. constructed Bayesian mutation probability models and performed RNA-seq, RPPA and survival outcome analyses. J.A.R. designed and performed luciferase assays. G.K. carried out biophysical simulations, performed hidden Markov model–based state decompositions and computed binding enthalpies with supervision from V.S.P. C.L.A., C.C., J.A.R., G.K., M.P.S. and W.J.G. wrote the manuscript.

Corresponding authors

Correspondence to Carlos L Araya, Michael P Snyder or William J Greenleaf.

Ethics declarations

Competing interests

M.P.S. is a co-founder and a member of the scientific advisory board (SAB) of Personalis and a member of the SABs of Genapsys and Axiomx. W.J.G. is a co-founder of Epinomics. A patent application has been filed by Stanford University with C.L.A., C.C., J.A.R., M.P.S. and W.J.G. named as inventors.

Integrated supplementary information

Supplementary Figure 1 Summary of exome sequencing data.

(a) Exome tumor-normal sample sizes for bladder cancer (BLCA), breast cancer (BRCA), carcinoid (CARC), chronic lymphocytic leukemia (CLLX), colorectal cancer (COLR), diffuse large B cell lymphoma (DLBC), esophageal adenocarcinoma (ESOP), glioblastoma multiforme (GLBM), head and neck cancer (HNSC), kidney clear cell carcinoma (KIRC), acute myeloid leukemia (LAML), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), medulloblastoma (MEDU), melanoma (MELA), multiple myeloma (MUMY), neuroblastoma (NEUB), ovarian cancer (OVAR), prostate cancer (PRAD), rhabdoid tumor (RHAB) and uterine corpus endometrial carcinoma (UCEC). (b) Reference coordinates for mutation impact annotation²⁹ (SnpEff). CDS, coding sequence.

Supplementary Figure 2 Background mutation models capture variance in somatic mutation rates and are well correlated.

(a) Genome-wide transition/transversion mutation probabilities per tumor type. (b) Absolute difference in the log probabilities of complementary mutations (C>T and G>A) per gene in melanoma for the ‘Bayesian’ and ‘Exonic’ mutation probability models. The percentage of genes whose complementary mutation probabilities are within one order of magnitude is indicated. (c) The median Spearman correlation between the average ‘Bayesian’ and ‘Matched’ mutation probabilities in distinct tumor types is shown for sets of tumor types with minimum numbers of samples (x axis). (d) Correlation between observed pan-cancer WGS intronic mutation probabilities and those of the ‘Bayesian’ (blue) or ‘Matched’ (gray) models.
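Panel b compares complementary mutation probabilities on a log scale; being "within one order of magnitude" amounts to an absolute difference of at most 1 between base-10 logarithms. The sketch below computes that percentage, assuming per-gene probabilities are supplied as dictionaries keyed by gene name; the function name, input layout and example values are hypothetical.

```python
import math


def percent_within_order_of_magnitude(p_ct, p_ga):
    """Percentage of genes whose C>T and G>A probabilities differ by at most 10-fold.

    p_ct and p_ga are hypothetical dicts mapping gene name -> mutation probability.
    """
    shared = p_ct.keys() & p_ga.keys()  # only genes present in both inputs
    # |log10(p1) - log10(p2)| <= 1 is equivalent to max(p1, p2) / min(p1, p2) <= 10.
    within = sum(
        1 for g in shared if abs(math.log10(p_ct[g]) - math.log10(p_ga[g])) <= 1.0
    )
    return 100.0 * within / len(shared)


print(percent_within_order_of_magnitude(
    {"A": 1e-6, "B": 2e-6, "C": 1e-5},
    {"A": 5e-6, "B": 3e-5, "C": 1e-5},
))
# genes A (5-fold) and C (equal) qualify, B (15-fold) does not: about 66.7
```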

Supplementary Figure 3 Density scores are highly correlated and enriched for known cancer driver genes.

(a) Right, the pan-cancer relationship between gene-specific and global binomial probabilities is shown. Left, correlation (Spearman ρ) is plotted as a function of density score in the low-to-mid density range. (b) Somatically altered SNV-driven cancer gene (SCG) fold enrichment (red) and significance of enrichment (blue) of region-associated genes as a function of region density score. (c) Fraction of SCGs that are region associated (blue) and fraction of region-associated genes that are SCGs (red) as a function of region density score.
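The density score is reported on a −log₁₀ scale of a binomial probability. The sketch below shows one way such a score could be computed, assuming (hypothetically) that n mutations fall independently in a gene and each lands inside a candidate region with probability p, here taken as region length divided by gene length. This caption does not specify the exact null models behind the gene-specific and global probabilities, so the choice of n and p is an assumption, and the function names are illustrative.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def density_score(k, n, p):
    """-log10 of the probability of observing k or more of n mutations in the region."""
    return -math.log10(binomial_upper_tail(k, n, p))


# Hypothetical example: 5 of a gene's 20 mutations fall in a 10-bp window of a 1,000-bp gene.
score = density_score(k=5, n=20, p=10 / 1000)
print(round(score, 2))  # about 5.86
```

Summing exact terms is fine for small counts; for large n or very small tail probabilities the sum can underflow to zero, and a log-space routine such as `scipy.stats.binom.logsf` would be the safer choice.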

Supplementary Figure 4 Most mutation cluster density scores fit the null distribution and lie on the diagonal in a quantile-quantile plot, indicating that simulations accurately capture the significance of mutation densities.

Quantile-quantile plots of the observed (y axis) and simulated (x axis) density scores (−log₁₀ P_Density). (a–d) Representative examples from bladder cancer (BLCA) (a), breast cancer (BRCA) (b), colorectal cancer (COLR) (c) and diffuse large B cell lymphoma (DLBC) (d) are shown. The solid line marks the density score threshold (−log₁₀ P_Density) that yields an estimated FDR ≤ 5% in each cancer type. The dashed line indicates y = x. (e) Violin plots of density scores in an expanded set of 90 additional colorectal cancer simulations. (f) The distributions of density scores in the original (10×; blue) and expanded (90×; yellow) sets of simulations are highly concordant and yield tightly correlated FDR estimates for the observed density scores (inset, r² = 0.99985). Dashed lines indicate thresholds of FDR ≤ 5%. (g) 99.2% (128/129) of SMRs thresholded at FDR ≤ 5% are shared by the FDR(10×)- and FDR(90×)-thresholded sets.
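The FDR thresholds in this figure come from comparing observed density scores with scores from null simulations. A common empirical estimator, assumed here rather than taken from the paper, divides the average number of simulated scores at or above a threshold by the number of observed scores at or above it; the smallest observed score with an estimated FDR ≤ 5% is then used as the cutoff. The function names and toy data are hypothetical.

```python
def empirical_fdr(threshold, observed, simulations):
    """Estimated FDR of calling every observed score >= threshold.

    simulations is a list of lists: the density scores from each null simulation.
    """
    n_obs = sum(1 for s in observed if s >= threshold)
    if n_obs == 0:
        return 0.0
    # Expected number of false calls = average count of null scores above threshold.
    mean_null = sum(sum(1 for s in sim if s >= threshold) for sim in simulations) / len(simulations)
    return min(1.0, mean_null / n_obs)


def fdr_threshold(observed, simulations, alpha=0.05):
    """Smallest observed score whose estimated FDR is <= alpha (None if none qualifies)."""
    for t in sorted(set(observed)):
        if empirical_fdr(t, observed, simulations) <= alpha:
            return t
    return None


observed = [1, 2, 3, 8, 9, 10, 12]                  # hypothetical observed scores
simulations = [[1, 2, 3], [1, 2, 4], [2, 3, 9]]     # hypothetical null simulations
print(fdr_threshold(observed, simulations))  # 10
```

Running more simulations, as in the 10× versus 90× comparison in panel f, changes only the precision of the averaged null count, which is why the two sets of thresholds agree closely.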

Supplementary Figure 5 Robust SMRs capture ~95% of high-confidence SMRs from ten cancer types.

Robust SMRs are 58.8-fold enriched for somatic, SNV-driven Cancer Gene Census (CGC) genes (P = 2.4 × 10⁻³⁴). (a) Overlap (blue) of robust SMRs (cyan) and high-confidence SMRs (gray). (b,c) Fraction of SMRs per cancer type classified as robust. Analyses in a and b are limited to high-confidence SMRs from the ten cancer types (green) with sufficient intronic mutation clusters for intron-based FDR estimation, as shown in b.
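The fold enrichment and P value reported here compare the fraction of CGC genes among SMR-associated genes with the background fraction. The caption does not name the statistical test; a one-sided hypergeometric test is a common choice and is assumed in this sketch, with hypothetical gene counts and function names.

```python
import math


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    total = math.comb(N, n)
    upper = min(K, n)
    # Exact integer arithmetic; math.comb returns 0 when the lower index exceeds the upper.
    return sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, upper + 1)) / total


def fold_enrichment(k, N, K, n):
    """Fraction of hits among selected genes divided by the background fraction."""
    return (k / n) / (K / N)


# Hypothetical: 20 of 100 selected genes are among 200 reference genes in a 20,000-gene background.
print(round(fold_enrichment(20, 20000, 200, 100), 1))  # 20.0
print(hypergeom_upper_tail(20, 20000, 200, 100))
```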

Supplementary Figure 6 Contribution of trinucleotide and APOBEC mutation heterogeneity in SMR identification.

(a) The fraction (ƒ) of mutated sites in endometrial cancer (UCEC) is plotted for each trinucleotide. Trinucleotides are oriented by transcription strand. Trinucleotides associated with APOBEC mutation signatures at high and low rates are labeled orange and pink, respectively. Notably, ƒ_TCT > ƒ_TCA and ƒ_AGA > ƒ_AGT. As shown in the inset (i), SMR mutation sites show a generally reduced fraction of APOBEC-associated trinucleotides compared to the global set of somatic mutation sites in endometrial cancer. (b) As shown for endometrial cancer (i), the deviation of the observed from the (single-nucleotide) expected trinucleotide representation was compared with the fold change in trinucleotide representation at SMR mutation sites for cancers with ≥250 SMR mutation sites (positions). These cancer types encompass 79% of all SMRs. On average, trinucleotide mutation heterogeneity not captured by single-nucleotide transition/transversion probabilities accounts for only 7.9% of the change in trinucleotide representation in SMRs. (a,b) Analyses were performed with high- and medium-confidence SMRs. (c) Histogram of the fraction of mutations per SMR that are APOBEC associated. (d) Fraction of SMRs per cancer in which APOBEC-associated mutations are statistically increased (P < 0.05, Holm-Bonferroni). As shown in the inset (i), 4.0% of identified SMRs (n = 872) are driven by APOBEC-associated mutations. Raw (uncorrected) P values would indicate that 12% of SMRs have higher-than-expected APOBEC mutation signatures.
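Panel d applies a Holm-Bonferroni correction to per-SMR P values. The sketch below computes Holm-adjusted P values (the sorted P values are multiplied by m, m − 1, and so on down to 1, with monotonicity enforced), which can then be compared against 0.05. The function name and example P values are hypothetical.

```python
def holm_adjust(pvalues):
    """Holm-Bonferroni adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The rank-th smallest P value (0-based) is multiplied by (m - rank), capped at 1 ...
        value = min(1.0, (m - rank) * pvalues[i])
        # ... and adjusted values are forced to be non-decreasing along the sorted order.
        running_max = max(running_max, value)
        adjusted[i] = running_max
    return adjusted


adj = holm_adjust([0.01, 0.04, 0.03, 0.005])
print([round(a, 3) for a in adj])  # [0.03, 0.06, 0.06, 0.02]
print([a < 0.05 for a in adj])     # [True, False, False, True]
```

The gap between corrected and raw calls in panel d (4.0% versus 12% of SMRs) reflects exactly this kind of multiplier on the smallest P values.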

Supplementary Figure 7 Histogram of the fraction of somatic mutations within each coding region SMR that are predicted to alter protein sequence or RNA splicing.

Supplementary Figure 8 Histogram of Gini coefficients of dispersion for nonsynonymous mutations per gene.

Gini coefficients were calculated from the number of nonsynonymous mutations at each mutated residue in each cancer for CGC genes. For each CGC gene (n = 522), the maximum coefficient across cancers is plotted³¹,³². A set of outliers with extreme Gini coefficients is labeled. 81% of CGC genes with unassociated SMRs have Gini coefficients <0.1.
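The Gini coefficient summarizes how unevenly a gene's nonsynonymous mutations are spread across its mutated residues: values near 0 mean the mutations are spread evenly, and larger values mean they concentrate in a few hotspot residues. The caption does not give the exact formula, so this sketch uses the standard sorted-values form of the Gini coefficient; the per-residue counts and cancer labels are hypothetical.

```python
def gini(counts):
    """Gini coefficient of non-negative counts (0 = perfectly even)."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_(i)) / (n * sum(x)) - (n + 1) / n, x sorted ascending, i = 1..n.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical nonsynonymous counts per mutated residue of one gene, per cancer type.
per_cancer = {"COLR": [1, 1, 1, 17], "UCEC": [2, 2, 3, 3]}
# As in the caption, keep the maximum coefficient across cancers for the gene.
print(max(gini(c) for c in per_cancer.values()))  # about 0.6, driven by the hotspot residue
```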

Supplementary Figure 9 Molecular dynamics analysis of wild-type and mutant PIK3CA in complex with PIK3R1.

(a) Wild-type (WT) PIK3CA in complex with PIK3R1. (b) The K111E mutant of PIK3CA in complex with PIK3R1. (c) The G118D mutant of PIK3CA in complex with PIK3R1. The interaction enthalpy across the full PIK3CA-PIK3R1 binding interface follows a bimodal distribution (as shown in Fig. 3d). “Binding Mode 1” (blue) is preferred by WT PIK3CA and corresponds to binding interactions that are on average 1.8 kcal/mol tighter than those in “Binding Mode 2” (orange), which predominates in the K111E mutant of PIK3CA. The difference between the two binding modes becomes apparent in the salt-bridge pattern of R79. In “Binding Mode 1,” R79 is a key component of the binding interface (with E1215 and E1222 of PIK3R1; shown in gray helices). In “Binding Mode 2,” a salt bridge between R79 and E81 competes directly with this binding interaction (orange panel of a). In WT PIK3CA, this competition is attenuated by the interaction of K111 with E81 (shown in the blue panel of a) and to a similar degree by the interaction of R108 with E81 (data not shown). In the K111E mutant of PIK3CA, a similar attenuation can only occur through the simultaneous recruitment of R108 (blue panel of b). Taken together, the data suggest that K111E inverts the bimodal binding distribution and effectively weakens the interactions between PIK3CA and PIK3R1 relative to WT PIK3CA. (c) Molecular dynamics simulations of the G118D mutant of PIK3CA show a similar weakening of the binding interactions with R79 at their core, albeit through the reshaping of a more extensive network of salt bridges that involves D118. Data are from 20 independent 0.1-μs molecular dynamics simulations. The individual distributions in Figure 3d correspond to distinct conformational states at the binding interface. Their cumulative populations were normalized and are reported as percentages.

Supplementary Figure 10 Enrichment of CGC genes among SMR-based protein-coding drivers and SMR-identified binding interfaces.

(a) Fraction of SMR- and OncodriveCLUST-identified protein-coding genes in the Cancer Gene Census (CGC). OncodriveCLUST results were obtained from Tamborero et al.¹¹. Driver analyses in endometrial (UCEC), ovarian (OVAR) and lung squamous cell carcinoma (LUSC) were performed with the same exome data sets. Breast cancer (BRCA) results were obtained with distinct exome data sets and are therefore not directly comparable. (b) The fraction of SMR-identified and previously reported⁵¹ protein and DNA interaction interfaces with recurrent cancer somatic mutations. For direct comparison, we consider only interactions with nucleic acids and proteins. All CGC genes with previously reported⁵¹ somatically altered nucleic acid or protein interfaces are captured by SMRs (inset).

Supplementary Figure 11 Molecular structure and spatial mapping of an SMR on histone H2B.

An SMR on histone H2B (HIST1H2BK.1; orange) is highlighted within the structure of the human nucleosome core particle (PDB, 2CV5). Histone H2B (blue), histone H2A (teal) and histone H4 (green) components are highlighted.

Supplementary Figure 12 NFE2L2 SMRs alter KEAP1-binding interfaces.

The structures of SMR NFE2L2.1 (orange, shown here) and NFE2L2.2 (Fig. 4g) were mapped to NFE2L2 structures (PDB, 2FLU and 3WN7). A sector of recurrent lung adenocarcinoma alterations on KEAP1 (teal) with density score FDR ≤ 5% did not meet the 2% mutation frequency cutoff. The structure of NFE2L2.2 mapped to the mouse NFE2L2-KEAP1 co-crystal structure (PDB, 3WN7) is shown in Figure 4g.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–12 and Supplementary Note. (PDF 1620 kb)

Supplementary Tables 1–14

Supplementary Tables 1–14. (XLSX 6343 kb)

Supplementary Table 15: Functional enrichment annotation.

Genes associated with high- and medium-confidence SMRs in each cancer type were analyzed for functional enrichments with DAVID. Analysis was performed only for cancer types with ≥5 high- or medium-confidence SMRs. In total, 331 functional enrichments (P < 0.05, Benjamini-Hochberg) were detected across the following 11 databases: BBID, BIOCARTA, COG_ONTOLOGY, INTERPRO, GOTERM_BP_FAT, KEGG_PATHWAY, OMIM_DISEASE, PIR_SUPERFAMILY, SMART, SP_PIR_KEYWORDS and UP_SEQ_FEATURE. (XLSX 265 kb)
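The enrichments above are reported at P < 0.05 after Benjamini-Hochberg correction. A short sketch of the Benjamini-Hochberg adjustment follows; DAVID reports its own adjusted values, so this is only an illustration, and the function name and P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values (q values), returned in the input order."""
    m = len(pvalues)
    # Walk from the largest P value down so a running minimum enforces monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for pos, i in enumerate(order):
        rank = m - pos  # 1-based rank of this P value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print([round(q, 3) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.005])])
# [0.02, 0.04, 0.04, 0.02]
```

Unlike the Holm-Bonferroni correction used in Supplementary Figure 6, which controls the family-wise error rate, Benjamini-Hochberg controls the false discovery rate and is therefore less conservative.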

Supplementary Table 16: Mutation cluster detection methods comparison.

C, protein-coding region; NC, noncoding region; SNVs, single-nucleotide variants; indels, insertions/deletions; IDRs, intrinsically disordered regions. *In 'hotspot' analysis, Weinhold et al. (2014) join mutations within 50 bp and do not consider background models that are mutation type specific. **In mutation clustering analysis, Lawrence et al. (2014) evaluate 3-bp mutation windows and do not consider background models that are mutation type specific. (XLSX 10 kb)
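The footnotes contrast variably sized SMRs with two simpler clustering rules: joining mutations that lie within 50 bp of one another, and counting mutations in fixed 3-bp windows. The sketch below illustrates both rules on hypothetical positions. Whether "within 50 bp" is inclusive, and how each study scores the resulting groups, are assumptions; as the footnotes note, neither rule here uses a mutation-type-specific background model.

```python
def join_within(positions, max_gap=50):
    """Group positions into runs where consecutive mutations are <= max_gap bp apart."""
    groups = []
    for pos in sorted(positions):
        if groups and pos - groups[-1][-1] <= max_gap:
            groups[-1].append(pos)  # close enough to the previous mutation: extend the run
        else:
            groups.append([pos])  # gap too large: start a new run
    return groups


def densest_window(positions, width=3):
    """Largest number of mutations falling in any window of `width` consecutive bases."""
    xs = sorted(positions)
    best, lo = 0, 0
    for hi, x in enumerate(xs):
        while x - xs[lo] >= width:  # shrink the window until it spans < width bases
            lo += 1
        best = max(best, hi - lo + 1)
    return best


sites = [10, 40, 95, 200, 230, 231]  # hypothetical mutated positions
print(join_within(sites))     # [[10, 40], [95], [200, 230, 231]]
print(densest_window(sites))  # 2
```

The first rule lets groups grow without limit as long as gaps stay small, whereas the second fixes the region size in advance; SMR detection sits between these by letting local density determine the region boundaries.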

About this article

Cite this article

Araya, C., Cenik, C., Reuter, J. et al. Identification of significantly mutated regions across cancer types highlights a rich landscape of functional molecular alterations. Nat Genet 48, 117–125 (2016). https://doi.org/10.1038/ng.3471

Received: 17 June 2015
Accepted: 20 November 2015
Published: 21 December 2015
Issue Date: February 2016
DOI: https://doi.org/10.1038/ng.3471
