Abstract
Cohesin is present in almost all active enhancer regions, where it is associated with transcription factors1,2. Cohesin frequently colocalizes with CTCF (CCCTC-binding factor), affecting genomic stability, expression and epigenetic homeostasis3,4,5,6. Cohesin subunits are mutated in cancer7,8, but CTCF/cohesin-binding sites (CBSs) in DNA have not been examined for mutations. Here we report frequent mutations at CBSs in cancers displaying a mutational signature where mutations in A•T base pairs predominate. Integration of whole-genome sequencing data from 213 colorectal cancer (CRC) samples and chromatin immunoprecipitation–exonuclease (ChIP-exo) data identified frequent point mutations at CBSs. In contrast, CRCs showing an ultramutator phenotype caused by defects in the exonuclease domain of DNA polymerase ɛ (POLE) displayed significantly fewer mutations at and adjacent to CBSs. Analysis of public data showed that multiple cancer types accumulate CBS mutations. CBSs are a major mutational hotspot in the noncoding cancer genome.
References
Kagey, M.H. et al. Mediator and cohesin connect gene expression and chromatin architecture. Nature 467, 430–435 (2010).
Yan, J. et al. Transcription factor binding in human cells occurs in dense clusters formed around cohesin anchor sites. Cell 154, 801–813 (2013).
Parelho, V. et al. Cohesins functionally associate with CTCF on mammalian chromosome arms. Cell 132, 422–433 (2008).
Rubio, E.D. et al. CTCF physically links cohesin to chromatin. Proc. Natl. Acad. Sci. USA 105, 8309–8314 (2008).
Wendt, K.S. et al. Cohesin mediates transcriptional insulation by CCCTC-binding factor. Nature 451, 796–801 (2008).
Rao, S.S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
Kon, A. et al. Recurrent mutations in multiple components of the cohesin complex in myeloid neoplasms. Nat. Genet. 45, 1232–1237 (2013).
Leiserson, M.D. et al. Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nat. Genet. 47, 106–114 (2015).
Huang, F.W. et al. Highly recurrent TERT promoter mutations in human melanoma. Science 339, 957–959 (2013).
Mansour, M.R. et al. An oncogenic super-enhancer formed through somatic mutation of a noncoding intergenic element. Science 346, 1373–1377 (2014).
Alexandrov, L.B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
Chen, C.L. et al. Impact of replication timing on non-CpG and CpG substitution rates in mammalian genomes. Genome Res. 20, 447–457 (2010).
Lawrence, M.S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
Hoffman, M.M. et al. Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Res. 41, 827–841 (2013).
Hudson, T.J. et al. International network of cancer genome projects. Nature 464, 993–998 (2010).
Guillou, E. et al. Cohesin organizes chromatin loops at DNA replication factories. Genes Dev. 24, 2812–2822 (2010).
Helleday, T., Eshtad, S. & Nik-Zainal, S. Mechanisms underlying mutational signatures in human cancers. Nat. Rev. Genet. 15, 585–598 (2014).
Dulak, A.M. et al. Exome and whole-genome sequencing of esophageal adenocarcinoma identifies recurrent driver events and mutational complexity. Nat. Genet. 45, 478–486 (2013).
Satou, K., Kawai, K., Kasai, H., Harashima, H. & Kamiya, H. Mutagenic effects of 8-hydroxy-dGTP in live mammalian cells. Free Radic. Biol. Med. 42, 1552–1560 (2007).
Narendra, V. et al. CTCF establishes discrete functional chromatin domains at the Hox clusters during differentiation. Science 347, 1017–1021 (2015).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
DePristo, M.A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).
Alexandrov, L.B., Nik-Zainal, S., Wedge, D.C., Campbell, P.J. & Stratton, M.R. Deciphering signatures of mutational processes operative in human cancer. Cell Rep. 3, 246–259 (2013).
Rhee, H.S. & Pugh, B.F. Comprehensive genome-wide protein-DNA interactions detected at single-nucleotide resolution. Cell 147, 1408–1419 (2011).
Guo, Y. et al. Discovering homotypic binding events at high spatial resolution. Bioinformatics 26, 3028 (2010).
Jolma, A. et al. Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Res. 20, 861–873 (2010).
Jolma, A. et al. DNA-binding specificities of human transcription factors. Cell 152, 327–339 (2013).
Rausch, T. et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28, i333–i339 (2012).
Acknowledgements
We thank S. Augsten, L. Hu, H. Metsola, S. Nieminen, A. Ollikainen, M. Rajalaakso, S. Soisalo, I.-L. Svedberg, M. Turunen, I. Vuoristo and A. Zetterlund for technical assistance. Further, we thank M. Aavikko, I. Donner, A. Karhu and M. Mehine for additional help with the manuscript. We acknowledge the International Cancer Genome Consortium (ICGC) for the somatic mutation data. This work was supported by grants from the Academy of Finland (Finnish Center of Excellence Program 2012–2017, 250345, personal grants 137680 and 274474 to O.K.), the University of Helsinki (personal grant to S.T.), the Finnish Cancer Society (also a personal grant to K.P.), the European Research Council (ERC; 268648), the Sigrid Juselius Foundation, SYSCOL (a European Union Framework Programme 7 Collaborative Project, 258236), the Nordic Information for Action eScience Center (NIASC), the Nordic Center of Excellence financed by NordForsk (project 62721, personal grant to K.P.) and State Research Funding of Kuopio University Hospital (B1401). We acknowledge the computational resources provided by the ELIXIR node, hosted at the CSC–IT Center for Science, Finland, and funded by the Academy of Finland (grants 271642 and 263164), the Ministry of Education and Culture, Finland.
Contributions
R.K., K.P. and E.P. performed the somatic mutation analyses. K.D. and M.E. performed the ChIP-seq and ChIP-exo experiments. R.K., T.K., K.P. and E.P. performed the sequence motif analyses. E.P., R.K. and H.R. performed the analyses of ICGC data. U.A.H., T.C., J.K., T.T., A.E.G., S.T. and O.K. prepared the CRC samples. J.-P.M., H.J., L.R.-S. and A.L. contributed patient samples. E.K. and N.V. contributed to the data analysis. E.P. and R.K. performed the primary sequence analysis. L.A.A. and J.T. designed and supervised the study. R.K., K.D., E.P. and K.P. contributed equally to the study. All authors have contributed to writing of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Substitution context weights for extracted signatures.
Relative contribution of the six substitution types, in 5’-Xp[T/C]pZ-3’ sequence contexts to signatures 1, 17 and X. The size of each segment in the pie charts corresponds to the contribution of the context to the signature.
Supplementary Figure 2 Relative exposures of signatures 1 and 17 and substitutions at CBSs in 198 MSS CRCs, 12 MSI CRCs and 3 POLε-exo− CRCs.
(a) In MSS CRCs, signatures 1 and 17 show negative correlation (r = –0.64, P = 3.52 × 10^−24). Symbol size represents the number of CBS substitutions relative to the total substitution load. MSS (n = 198), MSI (n = 12) and POLε-exo− (n = 3) CRC samples cluster distinctively in terms of 3 signatures. (b,c) CBS mutation count has a strong positive correlation with signature 17 (Spearman ρ = 0.54, P = 2.47 × 10^−16) (b) and negative correlation with signature 1 (r = –0.41, P = 2.06 × 10^−9) (c). In b and c, symbol size is relative to the total mutation count of the sample. The saturation levels of red and blue represent the exposure levels of signatures 17 and 1, respectively.
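The legend above reports a rank-based (Spearman) correlation between CBS mutation count and signature exposure. As a rough illustration of how such a coefficient can be obtained, the sketch below ranks both variables (ties receive their average rank) and computes the Pearson correlation of the ranks. The function names and the idea of passing per-sample lists are assumptions made for this example; the software the authors used is not stated in this legend.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, any monotonic relationship between exposure and mutation count gives a coefficient of magnitude 1, regardless of whether it is linear.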
Supplementary Figure 3 Sample-wise distribution of somatic substitutions at CBSs.
The CTCF motif region is indicated by shading. A total of 20 of the 198 tumors with the most CBS mutations are shown.
Supplementary Figure 4 Position-specific expected and observed mutation distributions at CBSs in 198 MSS CRCs.
(a) Left, expected mutation distribution at CBSs under signature 17 for the six substitution types. The expected value was calculated for every base using weights of triplets (sequence context) in signature 17. Right, observed number of mutations for the 6 substitution types at CBSs in 198 MSS CRC samples. (b) Difference in observed and expected mutation counts for the six substitution types, where both observed and expected values have been normalized by dividing by the median count in 10,000-bp regions centered on the CBSs. Substitutions of A:T bases emerge at a significantly higher level than expected from the sequence context alone (see Supplementary Table 3 for statistical analysis).
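The expected distribution in panel a is described as a per-base value derived from triplet (sequence context) weights. A minimal sketch of that idea follows, assuming the signature is supplied as a dictionary keyed by pyrimidine-centred triplet and alternate base, and that the site sequences are already aligned and of equal length. The function names, the data layout and any weights passed in are hypothetical; this is not the authors' implementation and it omits the median normalization described in panel b.

```python
from collections import defaultdict

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def pyrimidine_context(triplet):
    """Orient a triplet so that its middle base is C or T."""
    return triplet if triplet[1] in "CT" else reverse_complement(triplet)


def expected_profile(sequences, weights):
    """Sum signature weights per position and substitution type.

    sequences: equal-length strings, each centred on one binding site.
    weights:   {(triplet, alt): weight}, triplets pyrimidine-centred
               (hypothetical layout chosen for this sketch).
    Returns {substitution type: [summed weight per position]}. The first
    and last positions stay 0 because they lack a full triplet.
    """
    by_context = defaultdict(list)
    for (triplet, alt), weight in weights.items():
        by_context[triplet].append((alt, weight))

    length = len(sequences[0])
    profile = defaultdict(lambda: [0.0] * length)
    for seq in sequences:
        for i in range(1, length - 1):
            ctx = pyrimidine_context(seq[i - 1:i + 2])
            for alt, weight in by_context.get(ctx, ()):
                profile[f"{ctx[1]}>{alt}"][i] += weight
    return dict(profile)
```

Comparing such a profile with the observed per-position counts, after putting both on a common scale, is one way to ask whether a position is mutated more often than its sequence context alone would predict.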
Supplementary Figure 5 Somatic substitution distribution across 28,331 CBSs in 198 MSS CRCs.
The y axis is on a logarithmic scale. A total of 1,966 substitutions were observed in 1,553 of 28,331 sites.
Supplementary Figure 6 Somatic substitutions near CTCF and RAD21 sites in CRC.
(a) Venn diagram showing the numbers of RAD21 and CTCF sites as determined by ChIP-exo. A hit for a CTCF motif was required to be within a 200-bp window for the CTCF-binding site. RAD21 and CTCF were found to colocalize at 28,331 sites (CBSs). Accumulation of somatic substitutions was observed at CBSs but not at sites bound by RAD21 or CTCF alone. Mutation frequencies are indicated for sites bound by RAD21, CTCF or both. (b) Somatic substitutions at 28,364 high-affinity CTCF motif occurrences without a nearby RAD21 and CTCF ChIP-exo peak. The affinity threshold was set to 14.954 to match approximately the number of CBSs and thus to create a sufficiently large set of CTCF motif occurrences for comparison. MSS CRCs show no accumulation of substitutions at these sites. POLε-exo− CRCs do not show a decrease in substitution rate in 400-bp windows flanking CTCF motif occurrences, unlike in the flanking regions of CTCF-binding sites colocalized with RAD21 (that is, CBSs).
Supplementary Figure 7 Somatic substitutions at binding sites for the REST, MAX, KLF5 and HNF4A transcription factors in 198 MSS CRC samples.
Binding sites are indicated by shading.
Supplementary Figure 8 Allelic fractions of somatic substitutions in CRCs.
(a,b) Density plots show the allelic fraction of somatic substitutions in the tumors with the highest signature 17 exposure (top 20) (a) and the tumors with the lowest signature 17 exposure (bottom 20) (b). Allelic fractions for the six substitution types are shown separately.
Supplementary Figure 9 Consensus germline SNPs from the 1000 Genomes Project and conservation (GERP) scores at 28,331 CBSs.
The figure pair at the top illustrates SNPs and GERP scores at CBSs and 1,000-bp flanking regions. The figure pair below that shows CBSs and 30-bp flanking regions. Germline variation increases slightly in the 50-bp regions flanking the CBSs and decreases significantly at CBSs.
Supplementary Figure 10 Sample-wise somatic substitutions at CBSs in ICGC and CRC data sets.
Samples with at least ten CBS substitutions are shown. Different substitution types at CBSs in an individual sample are shown by colored bars. Gray bars show the amount of CBS substitutions relative to the total substitution load in the cancer. Samples with more A:T changes typically show higher relative mutation counts at CBSs. The following eight ICGC data sets are shown: esophageal adenocarcinoma (ESAD-UK), hepatocellular carcinoma (LIRI-JP, LICA-FR, LINC-JP), pancreatic cancer (PACA-CA, PACA-AU), ovarian cancer (OV-AU) and malignant lymphoma (MALY-DE).
Supplementary Figure 11 Somatic substitutions at 28,331 CBSs in ICGC data sets.
A total of eight tumor types, where whole-genome sequencing data were available through ICGC and the substitution count was at least ten, are shown.
Supplementary Figure 12 CTCF motif binding affinity at CBSs in reference sequence and after somatic mutation.
Only sites with a mutation at a core motif are shown. Colors indicate substitution type. Note that outlier points are due to zero-probability bases in the motif. Data are shown for MSS (n = 198), MSI (n = 12) and POLε-exo− (n = 3) CRCs.
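The note about zero-probability bases can be made concrete with a small scoring sketch. It assumes a log-odds score against a uniform background, computed from a position frequency matrix given as one dictionary of base probabilities per motif position. Both the scoring scheme and the names below are assumptions for illustration; the exact affinity model used for this figure is not specified in the legend.

```python
import math


def motif_score(site, pfm, background=0.25):
    """Log2-odds score of a site against a position frequency matrix.

    pfm is a list with one {base: probability} dict per motif position.
    A base with probability 0 yields -inf, which is how extreme outlier
    scores can arise after a substitution.
    """
    score = 0.0
    for base, column in zip(site, pfm):
        p = column[base]
        if p == 0:
            return float("-inf")
        score += math.log2(p / background)
    return score


def score_before_and_after(site, pos, alt, pfm):
    """Return (reference score, mutated score) for one substitution."""
    mutated = site[:pos] + alt + site[pos + 1:]
    return motif_score(site, pfm), motif_score(mutated, pfm)
```

A common way to avoid infinite scores is to add a small pseudocount to every matrix entry before scoring; whether that was done here is not stated, so the sketch leaves the matrix untouched.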
Supplementary Figure 13 Core CTCF motif substitutions and expression of neighboring, opposite-strand gene pairs.
(a–c) Nearby gene pairs without CBSs (a) show stronger correlation in gene expression than gene pairs separated by CBSs (b) and random gene pairs (c). (d) Gene pairs separated by CBSs having one or more somatic mutations did not provide enough data to meaningfully examine possible differences between samples with core CTCF motif substitutions (red points) and samples without core substitutions (gray-black). The x and y axes represent rankings of forward- and reverse-strand gene expression, respectively, and counts are the number of observed rank pairs over all opposite-strand gene pairs (at a distance of <20 kb). Data points closer to the diagonal suggest stronger correlation. Data are shown for 102 ovarian cancer samples; in total, 15 core CTCF substitutions were identified residing between 20 neighboring gene pairs (at a distance of <20 kb).
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–13, Supplementary Tables 3 and 5–26, and Supplementary Note. (PDF 2202 kb)
Supplementary Table 1
Somatic mutation counts in total and at CBSs, and signature exposures. (XLSX 66 kb)
Supplementary Table 2
Genomic coordinates and orientations for all CBSs (GRCh37). (XLSX 1536 kb)
Supplementary Table 4
Somatic mutation clusters in ENCODE and LoVo regions. Includes CTCF position frequency matrix and POLE exonuclease mutations. (XLSX 39 kb)
About this article
Cite this article
Katainen, R., Dave, K., Pitkänen, E. et al. CTCF/cohesin-binding sites are frequently mutated in cancer. Nat Genet 47, 818–821 (2015). https://doi.org/10.1038/ng.3335
DOI: https://doi.org/10.1038/ng.3335