Abstract
Cancer sequencing studies have primarily identified cancer driver genes by the accumulation of protein-altering mutations. An improved method would be annotation independent, sensitive to unknown distributions of functions within proteins and inclusive of noncoding drivers. We employed density-based clustering methods in 21 tumor types to detect variably sized significantly mutated regions (SMRs). SMRs reveal recurrent alterations across a spectrum of coding and noncoding elements, including transcription factor binding sites and untranslated regions mutated in up to ∼15% of specific tumor types. SMRs demonstrate spatial clustering of alterations in molecular domains and at interfaces, often with associated changes in signaling. Mutation frequencies in SMRs demonstrate that distinct protein regions are differentially mutated across tumor types, as exemplified by a linker region of PIK3CA in which biophysical simulations suggest that mutations affect regulatory interactions. The functional diversity of SMRs underscores both the varied mechanisms of oncogenic misregulation and the advantage of functionally agnostic driver identification.
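To make the density-based clustering idea concrete, the following is a minimal pure-Python sketch of DBSCAN restricted to one-dimensional genomic coordinates. It is an illustration only: the function name `dbscan_1d`, the `eps` and `min_samples` values and the toy positions are assumptions, and the region scoring and significance testing that define an SMR are not reproduced here.

```python
from bisect import bisect_left, bisect_right


def dbscan_1d(positions, eps, min_samples):
    """Label 1D positions with DBSCAN cluster ids (-1 marks noise).

    eps and min_samples are illustrative parameters, not values from the study.
    """
    pts = sorted(positions)
    n = len(pts)
    # Neighborhood of point i: every point within eps of it, itself included.
    lo = [bisect_left(pts, p - eps) for p in pts]
    hi = [bisect_right(pts, p + eps) for p in pts]
    core = [hi[i] - lo[i] >= min_samples for i in range(n)]

    labels = [-1] * n
    cluster = -1
    prev_core = None
    for i in range(n):
        if not core[i]:
            continue
        # In sorted 1D data, a gap larger than eps between consecutive
        # core points starts a new cluster.
        if prev_core is None or pts[i] - pts[prev_core] > eps:
            cluster += 1
        prev_core = i
        # Claim the core point and any still-unlabeled neighbors (border points).
        for j in range(lo[i], hi[i]):
            if labels[j] == -1:
                labels[j] = cluster
    return list(zip(pts, labels))


# Toy mutation coordinates: two dense groups and one isolated site.
toy_positions = [100, 102, 103, 105, 500, 900, 901, 903]
clusters = dbscan_1d(toy_positions, eps=3, min_samples=3)
```

Because cluster boundaries come from local density rather than from a fixed window, the resulting regions can vary in size, which is the property the abstract emphasizes.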
References
Hodis, E. et al. A landscape of driver mutations in melanoma. Cell 150, 251–263 (2012).
Huang, F.W. et al. Highly recurrent TERT promoter mutations in human melanoma. Science 339, 957–959 (2013).
Alexandrov, L.B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
Lawrence, M.S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
Lawrence, M.S. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501 (2014).
Ding, L., Wendl, M.C., McMichael, J.F. & Raphael, B.J. Expanding the computational toolbox for mining cancer genomes. Nat. Rev. Genet. 15, 556–570 (2014).
Davies, H. et al. Mutations of the BRAF gene in human cancer. Nature 417, 949–954 (2002).
Parsons, D.W. et al. An integrated genomic analysis of human glioblastoma multiforme. Science 321, 1807–1812 (2008).
Kane, D.P. & Shcherbakova, P.V. A common cancer-associated DNA polymerase ε mutation causes an exceptionally strong mutator phenotype, indicating fidelity defects distinct from loss of proofreading. Cancer Res. 74, 1895–1901 (2014).
Dees, N.D. et al. MuSiC: identifying mutational significance in cancer genomes. Genome Res. 22, 1589–1598 (2012).
Tamborero, D., Gonzalez-Perez, A. & Lopez-Bigas, N. OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes. Bioinformatics 29, 2238–2244 (2013).
Porta-Pardo, E. & Godzik, A. e-Driver: a novel method to identify protein regions driving cancer. Bioinformatics 30, 3109–3114 (2014).
Schnall-Levin, M., Zhao, Y., Perrimon, N. & Berger, B. Conserved microRNA targeting in Drosophila is as widespread in coding regions as in 3′ UTRs. Proc. Natl. Acad. Sci. USA 107, 15751–15756 (2010).
Cenik, C. et al. Genome analysis reveals interplay between 5′ UTR introns and nuclear mRNA export for secretory and mitochondrial genes. PLoS Genet. 7, e1001366 (2011).
Stergachis, A.B. et al. Exonic transcription factor binding directs codon choice and affects protein evolution. Science 342, 1367–1372 (2013).
Wolfe, A.L. et al. RNA G-quadruplexes cause eIF4A-dependent oncogene translation in cancer. Nature 513, 65–70 (2014).
Xiong, H.Y. et al. RNA splicing. The human splicing code reveals new insights into the genetic determinants of disease. Science 347, 1254806 (2015).
Gerstberger, S., Hafner, M. & Tuschl, T. A census of human RNA-binding proteins. Nat. Rev. Genet. 15, 829–845 (2014).
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Weinhold, N., Jacobsen, A., Schultz, N., Sander, C. & Lee, W. Genome-wide analysis of noncoding regulatory mutations in cancer. Nat. Genet. 46, 1160–1165 (2014).
Fredriksson, N.J., Ny, L., Nilsson, J.A. & Larsson, E. Systematic analysis of noncoding somatic mutations and gene expression alterations across 14 tumor types. Nat. Genet. 46, 1258–1263 (2014).
Supek, F., Miñana, B., Valcárcel, J., Gabaldón, T. & Lehner, B. Synonymous mutations frequently act as driver mutations in human cancers. Cell 156, 1324–1335 (2014).
Melton, C., Reuter, J.A., Spacek, D.V. & Snyder, M. Recurrent somatic mutations in regulatory regions of human cancer genomes. Nat. Genet. 47, 710–716 (2015).
Hofree, M., Shen, J.P., Carter, H., Gross, A. & Ideker, T. Network-based stratification of tumor mutations. Nat. Methods 10, 1108–1115 (2013).
Leiserson, M.D.M. et al. Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nat. Genet. 47, 106–114 (2015).
Araya, C.L. et al. Regulatory analysis of the C. elegans genome with spatiotemporal resolution. Nature 512, 400–405 (2014).
Stergachis, A.B. et al. Conservation of trans-acting circuitry during mammalian regulatory evolution. Nature 515, 365–370 (2014).
Roadmap Epigenomics Consortium. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6, 80–92 (2012).
Ester, M., Kriegel, H.-P., Sander, J. & Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. KDD 96, 226–231 (1996).
Futreal, P.A. et al. A census of human cancer genes. Nat. Rev. Cancer 4, 177–183 (2004).
Santarius, T., Shipley, J., Brewer, D., Stratton, M.R. & Cooper, C.S. A census of amplified and overexpressed human cancer genes. Nat. Rev. Cancer 10, 59–64 (2010).
Malhotra, A. et al. Breakpoint profiling of 64 cancer genomes reveals numerous complex rearrangements spawned by homology-independent mechanisms. Genome Res. 23, 762–776 (2013).
Jäger, D. et al. Identification of a tissue-specific putative transcription factor in breast tissue by serological screening of a breast cancer library. Cancer Res. 61, 2055–2061 (2001).
Mei, Y.-P. et al. Small nucleolar RNA 42 acts as an oncogene in lung tumorigenesis. Oncogene 31, 2794–2804 (2012).
Okugawa, Y. et al. Clinical significance of SNORA42 as an oncogene and a prognostic biomarker in colorectal cancer. Gut http://dx.doi.org/10.1136/gutjnl-2015-309359 (15 October 2015).
Budinska, E. et al. Gene expression patterns unveil a new level of molecular heterogeneity in colorectal cancer. J. Pathol. 231, 63–76 (2013).
Uhlén, M. et al. Proteomics. Tissue-based map of the human proteome. Science 347, 1260419 (2015).
Vejnar, C.E. & Zdobnov, E.M. MiRmap: comprehensive prediction of microRNA target repression strength. Nucleic Acids Res. 40, 11673–11683 (2012).
Lara, R., Seckl, M.J. & Pardo, O.E. The p90 RSK family members: common functions and isoform specificity. Cancer Res. 73, 5301–5308 (2013).
Li, J. et al. TCPA: a resource for cancer functional proteomics data. Nat. Methods 10, 1046–1047 (2013).
Samuels, Y. et al. High frequency of mutations of the PIK3CA gene in human cancers. Science 304, 554 (2004).
Thorpe, L.M., Yuzugullu, H. & Zhao, J.J. PI3K in cancer: divergent roles of isoforms, modes of activation and therapeutic targeting. Nat. Rev. Cancer 15, 7–24 (2015).
Cancer Genome Atlas Research Network. Integrated genomic characterization of endometrial carcinoma. Nature 497, 67–73 (2013).
Miled, N. et al. Mechanism of two classes of cancer mutations in the phosphoinositide 3-kinase catalytic subunit. Science 317, 239–242 (2007).
Huang, C.-H. et al. The structure of a human p110α/p85α complex elucidates the effects of oncogenic PI3Kα mutations. Science 318, 1744–1748 (2007).
Gkeka, P. et al. Investigating the structure and dynamics of the PIK3CA wild-type and H1047R oncogenic mutant. PLoS Comput. Biol. 10, e1003895 (2014).
Burke, J.E., Perisic, O., Masson, G.R., Vadas, O. & Williams, R.L. Oncogenic mutations mimic and enhance dynamic events in the natural activation of phosphoinositide 3-kinase p110α (PIK3CA). Proc. Natl. Acad. Sci. USA 109, 15259–15264 (2012).
Haling, J.R. et al. Structure of the BRAF-MEK complex reveals a kinase activity independent role for BRAF in MAPK signaling. Cancer Cell 26, 402–413 (2014).
Kar, G., Gursoy, A. & Keskin, O. Human cancer protein-protein interaction network: a structural perspective. PLoS Comput. Biol. 5, e1000601 (2009).
Ghersi, D. & Singh, M. Interaction-based discovery of functionally important genes in cancers. Nucleic Acids Res. 42, e18 (2014).
Cheng, F. et al. Studying tumorigenesis through network evolution and somatic mutational perturbations in the cancer interactome. Mol. Biol. Evol. 31, 2156–2169 (2014).
Barbieri, C.E. et al. Exome sequencing identifies recurrent SPOP, FOXA1 and MED12 mutations in prostate cancer. Nat. Genet. 44, 685–689 (2012).
Fleming, N.I. et al. SMAD2, SMAD3 and SMAD4 mutations in colorectal cancer. Cancer Res. 73, 725–735 (2013).
Yuen, B.T.K. & Knoepfler, P.S. Histone H3.3 mutations: a variant path to cancer. Cancer Cell 24, 567–574 (2013).
Hornbeck, P.V. et al. PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Res. 40, D261–D270 (2012).
Cheng, K.W. et al. The RAB25 small GTPase determines aggressiveness of ovarian and breast cancers. Nat. Med. 10, 1251–1256 (2004).
Zhang, J. et al. Overexpression of Rab25 contributes to metastasis of bladder cancer through induction of epithelial-mesenchymal transition and activation of Akt/GSK-3β/Snail signaling. Carcinogenesis 34, 2401–2408 (2013).
DeNicola, G.M. et al. Oncogene-induced Nrf2 transcription promotes ROS detoxification and tumorigenesis. Nature 475, 106–109 (2011).
Ji, Q. et al. Selective loss of AKR1C1 and AKR1C2 in breast cancer and their potential effect on progesterone signaling. Cancer Res. 64, 7610–7617 (2004).
Stanbrough, M. et al. Increased expression of genes converting adrenal androgens to testosterone in androgen-independent prostate cancer. Cancer Res. 66, 2815–2825 (2006).
Rižner, T.L., Šmuc, T., Rupreht, R., Šinkovec, J. & Penning, T.M. AKR1C1 and AKR1C3 may determine progesterone and estrogen ratios in endometrial cancer. Mol. Cell. Endocrinol. 248, 126–135 (2006).
Zhao, L. & Vogt, P.K. Helical domain and kinase domain mutations in p110α of phosphatidylinositol 3-kinase induce gain of function by different mechanisms. Proc. Natl. Acad. Sci. USA 105, 2652–2657 (2008).
Wu, X. et al. Activation of diverse signalling pathways by oncogenic PIK3CA mutations. Nat. Commun. 5, 4961 (2014).
Puente, X.S. et al. Non-coding recurrent mutations in chronic lymphocytic leukaemia. Nature 526, 519–524 (2015).
Supek, F. & Lehner, B. Differential DNA mismatch repair underlies mutation rate variation across the human genome. Nature 521, 81–84 (2015).
Reijns, M.A.M. et al. Lagging-strand replication shapes the mutational landscape of the genome. Nature 518, 502–506 (2015).
Lord, C.J. & Ashworth, A. The DNA damage response and cancer therapy. Nature 481, 287–294 (2012).
Roberts, S.A. et al. An APOBEC cytidine deaminase mutagenesis pattern is widespread in human cancers. Nat. Genet. 45, 970–976 (2013).
Polak, P. et al. Cell-of-origin chromatin organization shapes the mutational landscape of cancer. Nature 518, 360–364 (2015).
Araya, C.L. et al. A fundamental protein property, thermodynamic stability, revealed solely from large-scale measurements of protein function. Proc. Natl. Acad. Sci. USA 109, 16858–16863 (2012).
Buenrostro, J.D. et al. Quantitative analysis of RNA-protein interactions on a massively parallel array reveals biophysical and evolutionary landscapes. Nat. Biotechnol. 32, 562–568 (2014).
Guenther, U.-P. et al. Hidden specificity in an apparently nonspecific RNA-binding protein. Nature 502, 385–388 (2013).
Oliphant, T.E. Python for scientific computing. Comput. Sci. Eng. 9, 10–20 (2007).
Millman, K.J. & Aivazis, M. Python for scientists and engineers. Comput. Sci. Eng. 13, 9–12 (2011).
McKinney, W. Data structures for statistical computing in Python. in Proc. 9th Python Sci. Conf. (eds. van der Walt, S. & Millman, J.) 51–56 (2010). ISBN-13: 978-1-4583-4619-3.
Dale, R.K., Pedersen, B.S. & Quinlan, A.R. Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics 27, 3423–3424 (2011).
Van der Walt, S., Colbert, S.C. & Varoquaux, G. The NumPy Array: a structure for efficient numerical computation. Comput. Sci. Eng. 13, 22–30 (2011).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Cock, P.J.A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
Boyle, A.P. et al. Comparative analysis of regulatory information and circuits across distant species. Nature 512, 453–456 (2014).
Acknowledgements
We thank the TCGA, ICGC and TCPA for making these large-scale cancer data sets available to the scientific community. We thank H. Tang for discussions regarding statistical analyses. We thank M.M. Winslow, D.M. Fowler, S. Fields and D.E. Webster for critical reading and suggestions to the manuscript. C.L.A. was supported by US National Institutes of Health (NIH) grants 3U54DK10255602 and 1P50HG00773501. C.C. was supported by the Child Health Research Institute, the Lucile Packard Foundation for Children's Health and US NIH Clinical and Translational Science Award grant UL1TR000093. J.A.R. was supported by the Damon Runyon Cancer Research Foundation and US NIH award 1U01HG007919-01. G.K. acknowledges support from the Lawrence Scholars Program, the US NIH Simbios Program (U54GM072970) and the Center for Molecular Analysis and Design at Stanford University. Biophysical simulations were supported by the Blue Waters project via US National Science Foundation awards OCI-0725070 and ACI-1238993 and the state of Illinois. Further support was provided by the National Center for Multiscale Modeling of Biological Systems (P41GM103712-S1) through Anton-1 resources provided by the Pittsburgh Supercomputing Center under grant PSCA13072P. This work was supported by the Rita Allen Foundation.
Author information
Contributions
C.L.A. and W.J.G. conceived of the project, and all authors designed experiments and methods. C.L.A. and C.C. developed methods for the detection and analysis of SMRs. C.L.A. constructed uniform annotations and non-Bayesian mutation probability models and performed density-based clustering, scoring and empirical false-discovery estimation (simulations), as well as regulatory (noncoding), structural (coding), frequency and whole-genome sequencing recurrence analyses. C.C. constructed Bayesian mutation probability models and performed RNA-seq, RPPA and survival outcome analyses. J.A.R. designed and performed luciferase assays. G.K. carried out biophysical simulations, performed hidden Markov model–based state decompositions and computed binding enthalpies with supervision from V.S.P. C.L.A., C.C., J.A.R., G.K., M.P.S. and W.J.G. wrote the manuscript.
Ethics declarations
Competing interests
M.P.S. is a co-founder and a member of the scientific advisory board (SAB) of Personalis and a member of the SABs of Genapsys and Axiomx. W.J.G. is a co-founder of Epinomics. A patent application has been filed by Stanford University with C.L.A., C.C., J.A.R., M.P.S. and W.J.G. named as inventors.
Integrated supplementary information
Supplementary Figure 1 Summary of exome sequencing data.
(a) Exome tumor-normal sample sizes for bladder cancer (BLCA), breast cancer (BRCA), carcinoid (CARC), chronic lymphocytic leukemia (CLLX), colorectal cancer (COLR), diffuse large B cell lymphoma (DLBC), esophageal adenocarcinoma (ESOP), glioblastoma multiforme (GLBM), head and neck cancer (HNSC), kidney clear cell carcinoma (KIRC), acute myeloid leukemia (LAML), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), medulloblastoma (MEDU), melanoma (MELA), multiple myeloma (MUMY), neuroblastoma (NEUB), ovarian cancer (OVAR), prostate cancer (PRAD), rhabdoid tumor (RHAB) and uterine corpus endometrial carcinoma (UCEC). (b) Reference coordinates for mutation impact annotation (SnpEff; ref. 29). CDS, coding sequence.
Supplementary Figure 2 Background mutation models capture variance in somatic mutation rates and are well correlated.
(a) Genome-wide transition/transversion mutation probabilities per tumor type. (b) Absolute difference in the log probabilities of complementary mutations (C>T and G>A) per gene in melanoma for the ‘Bayesian’ and ‘Exonic’ mutation probability models. The percentage of genes where complementary mutation probabilities are within one order of magnitude is indicated. (c) The median of Spearman correlations between the average ‘Bayesian’ and ‘Matched’ mutation probabilities in distinct tumor types is shown for the sets of tumor types with minimum numbers of samples (x axis). (d) Correlation between observed WGS intronic mutation probability (pan-cancer) and those of the ‘Bayesian’ (blue) or ‘Matched’ (gray) models.
Supplementary Figure 3 Density scores are highly correlated and enriched for known cancer driver genes.
(a) Right, the pan-cancer relationship between gene-specific and global binomial probabilities is shown. Left, correlation (Spearman ρ) is plotted as a function of density score in the low-to-mid density range. (b) Somatically altered SNV-driven cancer gene (SCG) fold enrichment (red) and significance of enrichment (blue) of region-associated genes as a function of region density score. (c) Fraction of SCGs that are region associated (blue) and fraction of region-associated genes that are SCGs (red) as a function of region density score.
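One plausible reading of a binomial density score can be sketched as follows. This is an assumption-laden illustration, not the published scoring scheme: it treats the score as –log10 of the upper-tail binomial probability of seeing at least `k` mutations in `n` trials with per-trial mutation probability `p`, and all three parameter names and the toy values are hypothetical. Swapping a gene-specific `p` for a global one would give the two probabilities compared in panel a.

```python
from math import comb, log10


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def density_score(k, n, p):
    """-log10 of the upper-tail probability (hypothetical scoring form)."""
    return -log10(binomial_tail(k, n, p))


# Toy comparison: the same mutation count scored against two background rates.
score_gene_specific = density_score(k=6, n=200, p=0.01)
score_global = density_score(k=6, n=200, p=0.005)
```

Direct summation is adequate for small toy inputs; for realistic trial counts a numerically stable survival function from a statistics library would be the safer choice.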
Supplementary Figure 4 Most mutation cluster density scores fit the null distribution and lie on the diagonal in a quantile-quantile plot, indicating that simulations accurately capture the significance of mutation densities.
Quantile-quantile plots of the observed (y axis) and simulated (x axis) density scores (–log10(PDensity)). (a–d) Representative examples from bladder cancer (BLCA) (a), breast cancer (BRCA) (b), colorectal cancer (COLR) (c) and diffuse large B cell lymphoma (DLBC) (d) are shown. The solid line represents the threshold for density score (–log10(PDensity)) that guarantees FDR ≤ 5% in each cancer type. The dashed line indicates the line corresponding to y = x. (e) Violin plots of density scores in an expanded set of 90 additional colorectal cancer simulations. (f) The distributions of density scores in the original (10×; blue) and expanded (90×; yellow) sets of simulations are highly concordant and yield tightly correlated FDR estimates for the observed density scores (inset, r² = 0.99985). Dashed lines indicate thresholds of FDR ≤ 5%. (g) 99.2% (128/129) of SMRs thresholded by FDR (≤5%) are shared by the FDR10× and FDR90× thresholded sets.
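The idea of choosing a density-score threshold from simulations can be sketched with a simple empirical estimator. This is a minimal illustration under stated assumptions: the estimator (null scores at or above the cutoff, averaged over simulations, divided by the number of observed scores at or above the cutoff), the function names and the toy scores are all hypothetical and are not taken from the study.

```python
def empirical_fdr(observed, simulated, n_sims, threshold):
    """Estimated FDR when calling every observed score >= threshold."""
    # Expected false calls per data set: pooled null hits averaged over simulations.
    expected_false = sum(s >= threshold for s in simulated) / n_sims
    called = sum(o >= threshold for o in observed)
    return expected_false / called if called else 0.0


def fdr_threshold(observed, simulated, n_sims, alpha=0.05):
    """Most permissive observed score whose estimated FDR is <= alpha."""
    passing = [t for t in set(observed)
               if empirical_fdr(observed, simulated, n_sims, t) <= alpha]
    return min(passing) if passing else None


# Toy data: six observed scores and null scores pooled from 10 simulations.
observed = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
simulated = [1.0] * 20 + [2.0] * 10 + [3.0] * 5 + [4.0]
cutoff = fdr_threshold(observed, simulated, n_sims=10)
```

With more simulations the null tail is sampled more finely, which is consistent with the caption's comparison of thresholds derived from 10 and 90 simulations.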
Supplementary Figure 5 Robust SMRs capture ~95% of high-confidence SMRs from ten cancer types.
Robust SMRs are 58.8-fold enriched for somatic, SNV-driven Cancer Gene Census (CGC) genes (P = 2.4 × 10⁻³⁴). (a) Overlap (blue) of robust SMRs (cyan) and high-confidence SMRs (gray). (b,c) Fraction of SMRs per cancer type classified as robust. Analyses in a and b are limited to high-confidence SMRs from the ten cancer types (green) with sufficient intronic mutation clusters for intron-based FDR estimation, as shown in b.
Supplementary Figure 6 Contribution of trinucleotide and APOBEC mutation heterogeneity in SMR identification.
(a) The fraction (ƒ) of mutated sites in endometrial cancer (UCEC) is plotted for each trinucleotide. Trinucleotides are oriented by transcription strand. Trinucleotides associated with APOBEC mutation signatures at high and low rates are labeled orange and pink, respectively. Notably, ƒTCT > ƒTCA and ƒAGA > ƒAGT. As shown in the inset (i), SMR mutation sites show a generally reduced fraction of APOBEC-associated trinucleotides as compared to the global set of somatic mutation sites in endometrial cancer. (b) As shown for endometrial cancer (i), the deviation in the observed over the (single-nucleotide) expected trinucleotide representation was compared with the fold change in the trinucleotide representation in SMR mutation sites for cancers with ≥250 SMR mutation sites (positions). These cancer types encompass 79% of all SMRs. On average, trinucleotide mutation heterogeneity not captured by single-nucleotide transition/transversion probabilities contributes to only 7.9% of the change in trinucleotide representation in SMRs. (a,b) Analyses performed with high- and medium-confidence SMRs. (c) Histogram of the fraction of mutations that are APOBEC associated per SMR. (d) Fraction of SMRs in which APOBEC-associated mutations are statistically increased (P < 0.05, Holm-Bonferroni) per cancer. As shown in the inset (i), 4.0% of identified SMRs (n = 872) are driven by APOBEC-associated mutations. Raw (uncorrected) P values would indicate that 12% of SMRs have higher than expected APOBEC mutation signatures.
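The gap between raw and corrected significance noted in panel d can be illustrated with a short sketch of the standard Holm-Bonferroni step-down procedure. The function name and the toy P values are assumptions for illustration; how the per-SMR P values were computed is not covered here.

```python
def holm_bonferroni(pvalues, alpha=0.05):
    """Return a reject/keep flag per P value using Holm's step-down rule."""
    m = len(pvalues)
    # Visit hypotheses from the smallest to the largest P value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest P value (k = rank + 1) is compared with alpha / (m - k + 1).
        if pvalues[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger P values are retained
    return reject


# Toy P values: all four pass a raw 0.05 cutoff, only two survive correction.
toy_pvalues = [0.01, 0.04, 0.03, 0.005]
raw_calls = [p < 0.05 for p in toy_pvalues]
holm_calls = holm_bonferroni(toy_pvalues)
```

The procedure controls the family-wise error rate, so it is expected to flag fewer SMRs than uncorrected P values do.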
Supplementary Figure 8 Histogram of Gini coefficients of dispersion for nonsynonymous mutations per gene.
Gini coefficients were calculated on the basis of the number of nonsynonymous mutations contained per residue mutated in each cancer for CGC genes. For each CGC gene (n = 522), the maximum coefficient across cancers is plotted (refs. 31,32). A set of outliers with extreme Gini coefficients is labeled. 81% of CGC genes with unassociated SMRs have Gini coefficients <0.1.
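As a worked illustration of the dispersion measure, the sketch below computes a Gini coefficient from per-residue mutation counts using the standard sorted-rank formula. The exact estimator used for this figure is not stated in the caption, so the formula choice, the function name and the toy counts are assumptions.

```python
def gini(counts):
    """Gini coefficient of non-negative counts (0 = perfectly even)."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with ranks i = 1..n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Toy genes: mutations spread evenly over residues versus piled on one residue.
even_gene = gini([5, 5, 5, 5])
hotspot_gene = gini([1, 1, 1, 97])
```

A gene whose mutations concentrate at a single residue scores close to the maximum for its number of mutated residues, whereas evenly mutated genes score near zero.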
Supplementary Figure 9 Molecular dynamics analysis of wild-type and mutant PIK3CA in complex with PIK3R1.
(a) Wild-type (WT) PIK3CA in complex with PIK3R1. (b) The K111E mutant of PIK3CA in complex with PIK3R1. (c) The G118D mutant of PIK3CA in complex with PIK3R1. The interaction enthalpy across the full PIK3CA-PIK3R1 binding interface follows a bimodal distribution (as shown in Fig. 3d). “Binding Mode 1” (blue) is preferred by WT PIK3CA and corresponds to binding interactions that are on average 1.8 kcal/mol tighter than those in “Binding Mode 2” (orange), which predominates in the K111E mutant of PIK3CA. The difference between the two binding modes becomes apparent in the salt-bridge pattern of R79. In “Binding Mode 1,” R79 is a key component of the binding interface (with E1215 and E1222 of PIK3R1; shown in gray helices). In “Binding Mode 2,” a salt bridge between R79 and E81 is in direct competition to this binding interaction (orange panel of a). In WT PIK3CA, this competition is attenuated by the interaction of K111 with E81 (shown in the blue panel of a) and to a similar degree by the interaction of R108 with E81 (data not shown). In the K111E mutant of PIK3CA, a similar attenuation can only occur through the simultaneous recruitment of R108 (blue panel of b). Taken together, the data suggest that K111E causes an inversion of the bimodal binding distribution and effectively weakens the interactions between PIK3CA and PIK3R1 as compared to WT PIK3CA. (c) Molecular dynamics simulations of the G118D mutant of PIK3CA show a similar weakening of the binding interactions with R79 at their core, albeit through the reshaping of a more extensive network of salt bridges that involves D118. Data are from 20 independent 0.1-μs molecular dynamics simulations. The individual distributions in Figure 3d correspond to distinct conformational states at the binding interface. Their cumulative populations were normalized and are reported as percentages.
Supplementary Figure 10 Enrichment of CGC genes among SMR-based protein-coding drivers and SMR-identified binding interfaces.
(a) Fraction of SMR- and OncodriveCLUST-identified protein-coding genes in the Cancer Gene Census (CGC). OncodriveCLUST results were obtained from Tamborero et al. (ref. 11). Driver analyses in endometrial (UCEC), ovarian (OVAR) and lung squamous cell carcinoma (LUSC) were performed with the same exome data sets. Breast cancer (BRCA) results were obtained with distinct exome data sets and are therefore not directly comparable. (b) The fraction of SMR-identified and previously reported (ref. 51) protein and DNA interaction interfaces with recurrent cancer somatic mutations. For direct comparison, we consider only interactions with nucleic acids and proteins. All CGC genes with previously reported (ref. 51) somatically altered nucleic acid or protein interfaces are captured by SMRs (inset).
Supplementary Figure 11 Molecular structure and spatial mapping of an SMR on histone H2B.
An SMR on histone H2B (HIST1H2BK.1; orange) is highlighted within the structure of the human nucleosome core particle (PDB, 2CV5). Histone H2B (blue), histone H2A (teal) and histone H4 (green) components are highlighted.
Supplementary Figure 12 NFE2L2 SMRs alter KEAP1-binding interfaces.
The structures of SMR NFE2L2.1 (orange, shown here) and NFE2L2.2 (Fig. 4g) were mapped to NFE2L2 structures (PDB, 2FLU and 3WN7). A sector of recurrent lung adenoma alterations on KEAP1 (teal) with density score FDR ≤ 5% did not meet the 2% mutation frequency cutoff. The structure of NFE2L2.2 mapped to the mouse NFE2L2-KEAP1 co-crystal structure (PDB, 3WN7) is shown in Figure 4g.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–12 and Supplementary Note. (PDF 1620 kb)
Supplementary Tables 1–14
Supplementary Tables 1–14. (XLSX 6343 kb)
Supplementary Table 15: Functional enrichment annotation.
Genes associated with high- and medium-confidence SMRs in each cancer type were analyzed for functional enrichments with DAVID. Analysis was performed only for cancer types with ≥5 high- or medium-confidence SMRs. In total, 331 functional enrichments (P < 0.05, Benjamini-Hochberg) were detected across the following 11 databases: BBID, BIOCARTA, COG_ONTOLOGY, INTERPRO, GOTERM_BP_FAT, KEGG_PATHWAY, OMIM_DISEASE, PIR_SUPERFAMILY, SMART, SP_PIR_KEYWORDS and UP_SEQ_FEATURE. (XLSX 265 kb)
Supplementary Table 16: Mutation cluster detection methods comparison.
C, protein-coding region; NC, noncoding region; SNVs, single-nucleotide variants; indels, insertions/deletions; IDRs, intrinsically disordered regions. *In 'hotspot' analysis, Weinhold et al. (2014) join mutations within 50 bp and do not consider background models that are mutation type specific. **In mutation clustering analysis, Lawrence et al. (2014) evaluate 3-bp mutation windows and do not consider background models that are mutation type specific. (XLSX 10 kb)
About this article
Cite this article
Araya, C., Cenik, C., Reuter, J. et al. Identification of significantly mutated regions across cancer types highlights a rich landscape of functional molecular alterations. Nat Genet 48, 117–125 (2016). https://doi.org/10.1038/ng.3471
DOI: https://doi.org/10.1038/ng.3471