Abstract
Recurrence is a hallmark of cancer-driving mutations. Recurrent mutations can arise at the same site or affect the same gene at different sites. Here we identified a set of mutations arising in individual samples and altering different cis-regulatory elements that converge on a common gene via chromatin interactions. The mutations and genes identified in this fashion showed strong relevance to cancer, in contrast to noncoding mutations with site-specific recurrence only. We developed a prediction method that identifies potentially recurrent mutations on the basis of the features shared by mutations whose recurrence is observed in a given cohort. Our method was capable of accurately predicting recurrent mutations at the level of target genes but not mutations recurring at the same site. We experimentally validated predicted mutations in distal regulatory regions of the TERT gene. In conclusion, we propose a novel approach to discovering potential cancer-driving mutations in noncoding regions.
Acknowledgements
This work was supported by the Ministry of Health and Welfare funded through the Korea Health Industry Development Institute (HI13C2143), by the KAIST Future Systems Healthcare Project, and by the Ministry of Science, ICT and Future Planning (2013M3A9C4078139 and NRF-2015M3C9A4053251). Research facilities were supported by the CHUNG Moon Soul Centre of KAIST and by the KAIST Institute for the BioCentury.
Author information
Contributions
K.K., K.J., and W.Y. performed all analyses and drafted the manuscript. E.-Y.C., S.-M.P., M.B., and Y.-J.K. performed reporter assays. J.K.C. conceived the study, supervised data analyses, and wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Expression deviation of the genes mapped to gene-wise recurrent mutations in the TCGA breast cancer cohort (top) and the TCGA lung cancer cohort (bottom).
For each gene (each pair of red and blue bars), the average degree of expression deviation is compared between the mutated samples (red bar) and non-mutated samples (blue bar). For 72% (top) and 73% (bottom) of the cases, the mutated samples exhibited a higher degree of expression deviation. Genes were sorted according to the degree of the difference between the mutated and non-mutated groups. All genes are collapsed in the box plots on the right.
Supplementary Figure 2 Effects of copy number alterations versus mutations.
(a) The effects of copy number alterations for genes ordered as in Figure 1c. For each gene (each pair of red and blue bars), the percentage of samples with copy number alterations is compared between the mutated samples (red bar) and non-mutated samples (blue bar). All genes are collapsed in the box plots on the right. (b) Outcome of multiple linear regression quantitatively comparing the contribution of mutation recurrence and copy number alteration to expression perturbation. Copy number alteration was processed through the deviation metric used for expression perturbation. The contribution of mutation burden remained significant after adjusting for copy number alteration.
Supplementary Figure 3 Permutation tests for expression perturbation by mutations.
Gene expression profiles based on RNA sequencing of 92 breast and 90 lung cancer samples were obtained from TCGA and matched with the whole-genome sequences of the TCGA cohort samples. Array-based gene expression profiles of breast cancer samples were obtained from the ArrayExpress archive (E-MTAB-1088) and matched with the whole-genome sequences of the Sanger cohort samples. For each gene, we computed the expression difference between the mutated and non-mutated groups as a t statistic. To estimate the expected expression differences, the sample grouping (mutated versus non-mutated) was randomized while maintaining the sample size of each group. This permutation was repeated 1,000 times, and the t value was recalculated each time.
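The permutation scheme described in this legend can be sketched as follows. This is a minimal illustration, not the authors' code: the legend does not state which form of the t statistic was used, so Welch's unequal-variance t is assumed here, and the function names, the toy expression values, and the empirical P value at the end are all hypothetical additions.

```python
import random
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic (assumed form; the legend only says 't statistic')."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se

def permuted_t_values(mutated, non_mutated, n_perm=1000, seed=0):
    """Shuffle group labels, keeping each group's size, and recompute t each time."""
    rng = random.Random(seed)
    pooled = list(mutated) + list(non_mutated)
    k = len(mutated)
    null = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        null.append(welch_t(pooled[:k], pooled[k:]))
    return null

# Toy expression values for one gene (hypothetical numbers).
mutated = [5.1, 6.3, 5.8, 6.9]
non_mutated = [4.0, 4.4, 3.9, 4.6, 4.2, 4.1]

observed = welch_t(mutated, non_mutated)
null = permuted_t_values(mutated, non_mutated)

# One common way to summarize the comparison (an assumption, not from the legend):
# the fraction of permutations at least as extreme as the observed value.
p_empirical = (1 + sum(abs(t) >= abs(observed) for t in null)) / (1 + len(null))
```

Fixing the seed makes the null distribution reproducible; in a real analysis the same procedure would be run per gene.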
Supplementary Figure 4 Directionality of expression perturbation by mutations.
The x axis shows genes ordered as in Figure 1c. The y axis indicates, for each gene, the fraction of cases (samples) whose expression deviation goes in the same direction (either positive or negative).
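One plausible reading of this y-axis quantity is the share of samples that fall in the majority direction. The sketch below assumes that reading, and also assumes that samples with exactly zero deviation are excluded; neither detail is stated in the legend, and the function name is hypothetical.

```python
def directionality(deviations):
    """Fraction of non-zero deviations that share the majority sign."""
    pos = sum(1 for d in deviations if d > 0)
    neg = sum(1 for d in deviations if d < 0)
    total = pos + neg
    if total == 0:
        raise ValueError("no non-zero deviations to compare")
    return max(pos, neg) / total

# Toy per-sample expression deviations for one gene (hypothetical numbers).
mixed = directionality([1.2, 0.4, -0.3, 2.0])
consistent = directionality([-1.0, -2.5])
```

Under this definition the value ranges from 0.5 (no preferred direction) to 1.0 (all samples deviate the same way).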
Supplementary Figure 5 Recurrence rate (the percentage of mutated samples in the given cohort) was obtained for each mutation (for site-specific recurrence) or each gene (for gene-level recurrence).
The site-specific recurrence rate was mapped to the target gene via the same chromatin interactome used for the gene-level recurrence analysis. The P value indicates the significance of the difference in recurrence rate between known cancer genes and other genes.
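The two recurrence rates defined here can be illustrated with a small sketch. It assumes mutations are given as (sample, site) pairs and that the chromatin interactome has already been reduced to two lookup tables; the function names, table layout, and toy identifiers are hypothetical and not taken from the study.

```python
from collections import defaultdict

def site_recurrence(mutations, n_samples):
    """Fraction of samples carrying a mutation at each exact site."""
    samples_by_site = defaultdict(set)
    for sample, site in mutations:
        samples_by_site[site].add(sample)
    return {site: len(s) / n_samples for site, s in samples_by_site.items()}

def gene_recurrence(mutations, site_to_element, element_to_genes, n_samples):
    """Fraction of samples with a mutation in any element linked to each gene."""
    samples_by_gene = defaultdict(set)
    for sample, site in mutations:
        element = site_to_element.get(site)  # None if the site hits no element
        for gene in element_to_genes.get(element, ()):
            samples_by_gene[gene].add(sample)
    return {gene: len(s) / n_samples for gene, s in samples_by_gene.items()}

# Toy cohort of 4 samples (all identifiers are made up).
mutations = [("s1", "chr5:100"), ("s2", "chr5:250"),
             ("s3", "chr5:900"), ("s3", "chr1:10")]
site_to_element = {"chr5:100": "enhA", "chr5:250": "enhB", "chr5:900": "enhC"}
element_to_genes = {"enhA": ["GENE_X"], "enhB": ["GENE_X"],
                    "enhC": ["GENE_X", "GENE_Y"]}

site_rates = site_recurrence(mutations, n_samples=4)
gene_rates = gene_recurrence(mutations, site_to_element, element_to_genes, n_samples=4)
```

In this toy cohort no site is mutated in more than one sample, yet GENE_X is hit in three of the four samples through three different elements, which is the kind of convergence the gene-level rate is meant to capture.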
Supplementary Figure 6 Permutation tests for biological validity of the chromatin interaction data.
We shuffled chromatin interaction links between enhancers and promoters. This procedure was repeated 1,000 times. High recurrence rates were observed for known cancer genes (Fig. 2c), cancer gene–interacting partners (Fig. 2d), and genes with relevant GO terms (Fig. 2e). We tested whether these patterns disappear when target genes are randomly mapped to mutations. To this end, the average recurrence rate of each of the above three groups of genes was obtained and divided by that of all genes. These relative recurrence rates were significantly higher with the real chromatin interactome map (black bars) than with the 1,000 randomized maps (gray density plots).
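A minimal sketch of this randomization follows. The legend does not specify how links were shuffled, so the sketch assumes one simple scheme: the gene end of every enhancer-gene link is permuted, which preserves the number of links per enhancer and per gene. All names and the toy data are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean

def gene_rates(links, samples_by_enhancer, n_samples):
    """Gene-level recurrence: fraction of samples mutated in any linked enhancer."""
    samples_by_gene = defaultdict(set)
    for enhancer, gene in links:
        samples_by_gene[gene] |= samples_by_enhancer.get(enhancer, set())
    return {g: len(s) / n_samples for g, s in samples_by_gene.items()}

def relative_rate(links, samples_by_enhancer, n_samples, gene_set):
    """Average recurrence of gene_set divided by the average over all genes."""
    rates = gene_rates(links, samples_by_enhancer, n_samples)
    in_set = [r for g, r in rates.items() if g in gene_set]
    return mean(in_set) / mean(list(rates.values()))

def shuffled_links(links, rng):
    """Permute the gene end of each link (assumed shuffling scheme)."""
    enhancers = [e for e, _ in links]
    genes = [g for _, g in links]
    rng.shuffle(genes)
    return list(zip(enhancers, genes))

# Toy interactome: three mutated enhancers converge on one gene of interest.
links = [("e1", "CANCER1"), ("e2", "CANCER1"), ("e3", "CANCER1"),
         ("e4", "G2"), ("e5", "G3"), ("e6", "G4")]
samples_by_enhancer = {"e1": {"s1"}, "e2": {"s2"}, "e3": {"s3"}}
cancer_genes = {"CANCER1"}

observed = relative_rate(links, samples_by_enhancer, 4, cancer_genes)
rng = random.Random(0)
null = [relative_rate(shuffled_links(links, rng), samples_by_enhancer, 4, cancer_genes)
        for _ in range(1000)]
```

Comparing `observed` with the distribution in `null` mirrors the comparison of the black bars with the gray density plots.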
Supplementary Figure 7 Results of site-specific LOOCV.
The proportion of positive votes cast by the 1,000 decision trees in our random forest during site-specific LOOCV, compared between truly recurrent mutations (green) and non-recurrent mutations (red) for each tumor and epigenome type.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–7, Supplementary Tables 1, 2, 4, 7 and 9, and Supplementary Note. (PDF 2003 kb)
Supplementary Table 3
Cancer genes and interacting genes with >10% gene-wise recurrence in either cancer type. (XLSX 37 kb)
Supplementary Table 5
Prediction performance (AUC). (XLSX 21 kb)
Supplementary Table 6
Variable importance. (XLSX 144 kb)
Supplementary Table 8
Random forest features of lung cancer mutations targeting TERT. (XLSX 21 kb)
About this article
Cite this article
Kim, K., Jang, K., Yang, W. et al. Chromatin structure–based prediction of recurrent noncoding mutations in cancer. Nat Genet 48, 1321–1326 (2016). https://doi.org/10.1038/ng.3682
DOI: https://doi.org/10.1038/ng.3682
This article is cited by
- Interpretation of the role of germline and somatic non-coding mutations in cancer: expression and chromatin conformation informed analysis. Clinical Epigenetics (2022)
- Non-coding driver mutations in human cancer. Nature Reviews Cancer (2021)
- Discovery of regulatory noncoding variants in individual cancer genomes by using cis-X. Nature Genetics (2020)
- Genome-wide identification of differentially methylated promoters and enhancers associated with response to anti-PD-1 therapy in non-small cell lung cancer. Experimental & Molecular Medicine (2020)
- Illuminating the noncoding genome in cancer. Nature Cancer (2020)