  • Analysis

Chromatin structure–based prediction of recurrent noncoding mutations in cancer

Abstract

Recurrence is a hallmark of cancer-driving mutations. Recurrent mutations can arise at the same site or affect the same gene at different sites. Here we identified a set of mutations arising in individual samples and altering different cis-regulatory elements that converge on a common gene via chromatin interactions. The mutations and genes identified in this fashion showed strong relevance to cancer, in contrast to noncoding mutations with site-specific recurrence only. We developed a prediction method that identifies potentially recurrent mutations on the basis of the features shared by mutations whose recurrence is observed in a given cohort. Our method was capable of accurately predicting recurrent mutations at the level of target genes but not mutations recurring at the same site. We experimentally validated predicted mutations in distal regulatory regions of the TERT gene. In conclusion, we propose a novel approach to discovering potential cancer-driving mutations in noncoding regions.

Figure 1: Chromatin structure–based recurrence model and mutational effects on transcription.
Figure 2: Oncogenic relevance of gene-wise recurrent mutations.
Figure 3: Recurrence prediction and cohort extension analysis.
Figure 4: Functional validation of predicted distal TERT mutations.

Accession codes

Accessions

ArrayExpress

Acknowledgements

This work was supported by the Ministry of Health and Welfare funded through the Korea Health Industry Development Institute (HI13C2143), by the KAIST Future Systems Healthcare Project, and by the Ministry of Science, ICT and Future Planning (2013M3A9C4078139 and NRF-2015M3C9A4053251). Research facilities were supported by the CHUNG Moon Soul Centre of KAIST and by the KAIST Institute for the BioCentury.

Author information

Contributions

K.K., K.J., and W.Y. performed all analyses and drafted the manuscript. E.-Y.C., S.-M.P., M.B., and Y.-J.K. performed reporter assays. J.K.C. conceived the study, supervised data analyses, and wrote the manuscript.

Corresponding author

Correspondence to Jung Kyoon Choi.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Expression deviation of the genes mapped to gene-wise recurrent mutations in the TCGA breast cancer cohort (top) and the TCGA lung cancer cohort (bottom).

For each gene (each pair of red and blue bars), the average degree of expression deviation is compared between the mutated samples (red bar) and non-mutated samples (blue bar). For 72% (top) and 73% (bottom) of the cases, the mutated samples exhibited a higher degree of expression deviation. Genes were sorted according to the degree of the difference between the mutated and non-mutated groups. All genes are collapsed in the box plots on the right.

Supplementary Figure 2 Effects of copy number alterations versus mutations.

(a) The effects of copy number alterations for genes ordered as in Figure 1c. For each gene (each pair of red and blue bars), the percentage of samples with copy number alterations is compared between the mutated samples (red bar) and non-mutated samples (blue bar). All genes are collapsed in the box plots on the right. (b) Outcome of multiple linear regression quantitatively comparing the contribution of mutation recurrence and copy number alteration to expression perturbation. Copy number alteration was processed through the deviation metric used for expression perturbation. The contribution of mutation burden remained significant after adjusting for copy number alteration.
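
The regression in panel b can be illustrated with a minimal sketch. It fits expression perturbation against mutation burden and copy number deviation by ordinary least squares, using only the standard library. The variable names and toy values are hypothetical, and the sketch estimates coefficients only; it does not reproduce the significance testing reported in the figure.

```python
def ols(X, y):
    """Ordinary least squares with an intercept term.

    X is a list of predictor rows and y the response. The normal
    equations (X'X) b = X'y are solved by Gauss-Jordan elimination.
    """
    rows = [[1.0] + list(r) for r in X]  # prepend intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Hypothetical per-gene values (not data from the study)
mutation_burden = [0, 1, 2, 3, 1, 0]
cna_deviation = [0.5, 0.1, 0.9, 0.3, 0.7, 0.2]
expression_dev = [0.2 + 0.6 * m + 0.3 * c
                  for m, c in zip(mutation_burden, cna_deviation)]

coef = ols(list(zip(mutation_burden, cna_deviation)), expression_dev)
intercept, beta_mutation, beta_cna = coef
```

A mutation coefficient that stays clearly non-zero with copy number in the model is the pattern the caption describes; judging significance would additionally require standard errors.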

Supplementary Figure 3 Permutation tests for expression perturbation by mutations.

Gene expression profiles based on RNA sequencing of 92 breast and 90 lung cancer samples were obtained from TCGA and matched with the whole-genome sequences of the TCGA cohort samples. Array-based gene expression profiles of breast cancer samples were obtained from the ArrayExpress archive (E-MTAB-1088) and matched with the whole-genome sequences of the Sanger cohort samples. For each gene, we quantified the expression difference between the mutated and non-mutated groups as a t statistic. To estimate the expected expression differences, the sample grouping (mutated versus non-mutated) was randomized while maintaining the sample size of each group. This permutation was repeated 1,000 times, and the t value was recalculated each time.
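
The label permutation described here can be sketched for a single gene as follows. This is an illustration under stated assumptions, not the authors' code: the caption does not say which t statistic was used, so Welch's form is assumed, and the expression values, function names, and seed are hypothetical.

```python
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic (assumed variant; the source says only 't statistic')."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / (va / len(a) + vb / len(b)) ** 0.5


def split(expr, labels):
    """Split expression values into mutated and non-mutated groups."""
    mut = [x for x, m in zip(expr, labels) if m]
    non = [x for x, m in zip(expr, labels) if not m]
    return mut, non


def permutation_null(expr, is_mutated, n_perm=1000, seed=0):
    """Recompute t after shuffling the labels; group sizes are preserved."""
    rng = random.Random(seed)
    labels = list(is_mutated)
    null = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        null.append(welch_t(*split(expr, labels)))
    return null


# Hypothetical expression values for one gene across 8 samples
expr = [5.1, 4.8, 5.5, 2.0, 2.3, 1.9, 2.2, 2.1]
is_mutated = [True, True, True, False, False, False, False, False]

observed = welch_t(*split(expr, is_mutated))
null = permutation_null(expr, is_mutated, n_perm=1000)
# Two-sided empirical P value with the usual +1 correction
p_value = (sum(abs(t) >= abs(observed) for t in null) + 1) / (len(null) + 1)
```

Shuffling the label vector rather than resampling keeps the number of mutated and non-mutated samples fixed, which matches the constraint stated in the caption.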

Supplementary Figure 4 Directionality of expression perturbation by mutations.

The x axis shows genes ordered as in Figure 1c. The y axis indicates the fraction of cases (samples) whose expression deviates in the same direction (either positive or negative).
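
One way to read this fraction is sketched below: for each gene, count the mutated samples with a positive deviation and those with a negative deviation, then report the share of the larger group. This majority-sign interpretation is an assumption, and the gene names and values are hypothetical.

```python
def direction_consistency(deviations):
    """Fraction of samples whose expression deviation shares the majority sign."""
    pos = sum(1 for d in deviations if d > 0)
    neg = sum(1 for d in deviations if d < 0)
    return max(pos, neg) / len(deviations)


# Hypothetical signed expression deviations in mutated samples, per gene
deviation_by_gene = {
    "GENE1": [1.2, 0.8, 2.1, -0.3],   # mostly up-regulated
    "GENE2": [-1.0, -0.5, 0.4, 0.9],  # no consistent direction
}

consistency = {g: direction_consistency(d) for g, d in deviation_by_gene.items()}
```

Under this reading, values near 1 indicate that the mutations push expression consistently one way, while values near 0.5 indicate no preferred direction.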

Supplementary Figure 5 Recurrence rate (the percentage of mutated samples in the given cohort) was obtained for each mutation (for site-specific recurrence) or each gene (for gene-level recurrence).

The site-specific recurrence rate was mapped to the target gene via the same chromatin interactome used for the gene-level recurrence analysis. The P value indicates the difference in recurrence rate between known cancer genes and other genes.
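
The two recurrence rates can be contrasted with a small sketch. It assumes each mutation is recorded as a (sample, site, regulatory element) triple and that the chromatin interactome is available as a mapping from element to target genes; these data structures, names, and toy values are hypothetical.

```python
from collections import defaultdict


def site_recurrence(mutations, n_samples):
    """Percentage of cohort samples mutated at each exact site."""
    samples_by_site = defaultdict(set)
    for sample, site, _element in mutations:
        samples_by_site[site].add(sample)
    return {s: 100.0 * len(v) / n_samples for s, v in samples_by_site.items()}


def gene_recurrence(mutations, element_to_genes, n_samples):
    """Percentage of samples with a mutation in any element linked to each gene."""
    samples_by_gene = defaultdict(set)
    for sample, _site, element in mutations:
        for gene in element_to_genes.get(element, ()):
            samples_by_gene[gene].add(sample)
    return {g: 100.0 * len(v) / n_samples for g, v in samples_by_gene.items()}


# Hypothetical cohort of 4 samples: (sample, site, element)
mutations = [
    ("S1", "chr5:100", "enhA"),
    ("S2", "chr5:250", "enhB"),
    ("S3", "chr5:100", "enhA"),
    ("S4", "chr9:700", "enhC"),
]
# Hypothetical interactome: regulatory element -> target genes
element_to_genes = {"enhA": ["GENE1"], "enhB": ["GENE1"], "enhC": ["GENE2"]}

site_rates = site_recurrence(mutations, n_samples=4)
gene_rates = gene_recurrence(mutations, element_to_genes, n_samples=4)
```

In this toy cohort no single site is mutated in more than half of the samples, yet GENE1 is hit in three of four because mutations in two different elements converge on it.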

Supplementary Figure 6 Permutation tests for biological validity of the chromatin interaction data.

We shuffled the chromatin interaction links between enhancers and promoters, repeating this procedure 1,000 times. High recurrence rates were observed for known cancer genes (Fig. 2c), cancer gene–interacting partners (Fig. 2d), and genes with relevant GO terms (Fig. 2e). We tested whether these patterns disappear when target genes are randomly mapped to mutations. To this end, the average recurrence rate of the above three groups of genes was obtained and divided by that of all genes. These relative recurrence rates were significantly higher with the real chromatin interactome map (black bars) than with the 1,000 randomized maps (gray density plots).
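
A minimal sketch of this randomization follows. The caption does not specify the shuffling scheme, so the sketch assumes that target genes are permuted across links, which preserves the number of links per element and per gene. The link list, sample sets, gene names, and seed are hypothetical.

```python
import random
from statistics import mean


def gene_rates(links, samples_by_element, n_samples):
    """Gene-level recurrence: fraction of samples mutated in any linked element."""
    samples_by_gene = {}
    for element, gene in links:
        samples_by_gene.setdefault(gene, set()).update(
            samples_by_element.get(element, ()))
    return {g: len(s) / n_samples for g, s in samples_by_gene.items()}


def relative_rate(links, samples_by_element, n_samples, gene_set):
    """Mean recurrence of a gene set divided by the mean over all genes."""
    rates = gene_rates(links, samples_by_element, n_samples)
    focus = [r for g, r in rates.items() if g in gene_set]
    return mean(focus) / mean(list(rates.values()))


def shuffled_links(links, rng):
    """Permute target genes across links (assumed shuffling scheme)."""
    elements = [e for e, _ in links]
    genes = [g for _, g in links]
    rng.shuffle(genes)
    return list(zip(elements, genes))


# Hypothetical interactome and mutated samples per regulatory element
links = [("e1", "ONC"), ("e2", "ONC"), ("e3", "G2"), ("e4", "G3"), ("e5", "G4")]
samples_by_element = {"e1": {"S1", "S2"}, "e2": {"S3"}, "e3": {"S1"},
                      "e4": set(), "e5": set()}
cancer_genes = {"ONC"}

observed = relative_rate(links, samples_by_element, 4, cancer_genes)
rng = random.Random(0)
null = [relative_rate(shuffled_links(links, rng), samples_by_element, 4, cancer_genes)
        for _ in range(1000)]
# One-sided empirical P value: how often a random map matches or beats the real one
p_value = (sum(r >= observed for r in null) + 1) / (len(null) + 1)
```

Comparing the observed ratio with the distribution of shuffled ratios mirrors the black bars versus gray density plots described in the caption.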

Supplementary Figure 7 Results of site-specific leave-one-out cross-validation (LOOCV).

The proportion of positive votes cast by the 1,000 decision trees in our random forest during site-specific LOOCV, compared between truly recurrent mutations (green) and non-recurrent mutations (red) for each tumor and epigenome type.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–7, Supplementary Tables 1, 2, 4, 7 and 9, and Supplementary Note. (PDF 2003 kb)

Supplementary Table 3

Cancer genes and interacting genes with >10% gene-wise recurrence in either cancer. (XLSX 37 kb)

Supplementary Table 5

Prediction performance (AUC). (XLSX 21 kb)

Supplementary Table 6

Variable importance. (XLSX 144 kb)

Supplementary Table 8

Random forest features of lung cancer mutations targeting TERT. (XLSX 21 kb)

About this article

Cite this article

Kim, K., Jang, K., Yang, W. et al. Chromatin structure–based prediction of recurrent noncoding mutations in cancer. Nat Genet 48, 1321–1326 (2016). https://doi.org/10.1038/ng.3682

  • DOI: https://doi.org/10.1038/ng.3682
