Abstract
Cohesin is present in almost all active enhancer regions, where it is associated with transcription factors [1,2]. Cohesin frequently colocalizes with CTCF (CCCTC-binding factor), affecting genomic stability, expression and epigenetic homeostasis [3–6]. Cohesin subunits are mutated in cancer [7,8], but CTCF/cohesin-binding sites (CBSs) in DNA have not been examined for mutations. Here we report frequent mutations at CBSs in cancers displaying a mutational signature where mutations in A•T base pairs predominate. Integration of whole-genome sequencing data from 213 colorectal cancer (CRC) samples and chromatin immunoprecipitation sequencing (ChIP-exo) data identified frequent point mutations at CBSs. In contrast, CRCs showing an ultramutator phenotype caused by defects in the exonuclease domain of DNA polymerase ɛ (POLE) displayed significantly fewer mutations at and adjacent to CBSs. Analysis of public data showed that multiple cancer types accumulate CBS mutations. CBSs are a major mutational hotspot in the noncoding cancer genome.
References
Kagey, M.H. et al. Mediator and cohesin connect gene expression and chromatin architecture. Nature 467, 430–435 (2010).
Yan, J. et al. Transcription factor binding in human cells occurs in dense clusters formed around cohesin anchor sites. Cell 154, 801–813 (2013).
Parelho, V. et al. Cohesins functionally associate with CTCF on mammalian chromosome arms. Cell 132, 422–433 (2008).
Rubio, E.D. et al. CTCF physically links cohesin to chromatin. Proc. Natl. Acad. Sci. USA 105, 8309–8314 (2008).
Wendt, K.S. et al. Cohesin mediates transcriptional insulation by CCCTC-binding factor. Nature 451, 796–801 (2008).
Rao, S.S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
Kon, A. et al. Recurrent mutations in multiple components of the cohesin complex in myeloid neoplasms. Nat. Genet. 45, 1232–1237 (2013).
Leiserson, M.D. et al. Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nat. Genet. 47, 106–114 (2015).
Huang, F.W. et al. Highly recurrent TERT promoter mutations in human melanoma. Science 339, 957–959 (2013).
Mansour, M.R. et al. An oncogenic super-enhancer formed through somatic mutation of a noncoding intergenic element. Science 346, 1373–1377 (2014).
Alexandrov, L.B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
Chen, C.L. et al. Impact of replication timing on non-CpG and CpG substitution rates in mammalian genomes. Genome Res. 20, 447–457 (2010).
Lawrence, M.S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
Hoffman, M.M. et al. Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Res. 41, 827–841 (2013).
Hudson, T.J. et al. International network of cancer genome projects. Nature 464, 993–998 (2010).
Guillou, E. et al. Cohesin organizes chromatin loops at DNA replication factories. Genes Dev. 24, 2812–2822 (2010).
Helleday, T., Eshtad, S. & Nik-Zainal, S. Mechanisms underlying mutational signatures in human cancers. Nat. Rev. Genet. 15, 585–598 (2014).
Dulak, A.M. et al. Exome and whole-genome sequencing of esophageal adenocarcinoma identifies recurrent driver events and mutational complexity. Nat. Genet. 45, 478–486 (2013).
Satou, K., Kawai, K., Kasai, H., Harashima, H. & Kamiya, H. Mutagenic effects of 8-hydroxy-dGTP in live mammalian cells. Free Radic. Biol. Med. 42, 1552–1560 (2007).
Narendra, V. et al. CTCF establishes discrete functional chromatin domains at the Hox clusters during differentiation. Science 347, 1017–1021 (2015).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
DePristo, M.A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).
Alexandrov, L.B., Nik-Zainal, S., Wedge, D.C., Campbell, P.J. & Stratton, M.R. Deciphering signatures of mutational processes operative in human cancer. Cell Rep. 3, 246–259 (2013).
Rhee, H.S. & Pugh, B.F. Comprehensive genome-wide protein-DNA interactions detected at single-nucleotide resolution. Cell 147, 1408–1419 (2011).
Guo, Y. et al. Discovering homotypic binding events at high spatial resolution. Bioinformatics 26, 3028 (2010).
Jolma, A. et al. Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Res. 20, 861–873 (2010).
Jolma, A. et al. DNA-binding specificities of human transcription factors. Cell 152, 327–339 (2013).
Rausch, T. et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28, i333–i339 (2012).
Acknowledgements
We thank S. Augsten, L. Hu, H. Metsola, S. Nieminen, A. Ollikainen, M. Rajalaakso, S. Soisalo, I.-L. Svedberg, M. Turunen, I. Vuoristo and A. Zetterlund for technical assistance. Further, we thank M. Aavikko, I. Donner, A. Karhu and M. Mehine for additional help with the manuscript. We acknowledge the International Cancer Genome Consortium (ICGC) for the somatic mutation data. This work was supported by grants from the Academy of Finland (Finnish Center of Excellence Program 2012–2017, 250345, personal grants 137680 and 274474 to O.K.), the University of Helsinki (personal grant to S.T.), the Finnish Cancer Society (also a personal grant to K.P.), the European Research Council (ERC; 268648), the Sigrid Juselius Foundation, SYSCOL (a European Union Framework Programme 7 Collaborative Project, 258236), the Nordic Information for Action eScience Center (NIASC), the Nordic Center of Excellence financed by NordForsk (project 62721, personal grant to K.P.) and State Research Funding of Kuopio University Hospital (B1401). We acknowledge the computational resources provided by the ELIXIR node, hosted at the CSC–IT Center for Science, Finland, and funded by the Academy of Finland (grants 271642 and 263164), the Ministry of Education and Culture, Finland.
Author information
Contributions
R.K., K.P. and E.P. performed the somatic mutation analyses. K.D. and M.E. performed the ChIP-seq and ChIP-exo experiments. R.K., T.K., K.P. and E.P. performed the sequence motif analyses. E.P., R.K. and H.R. performed the analyses of ICGC data. U.A.H., T.C., J.K., T.T., A.E.G., S.T. and O.K. prepared the CRC samples. J.-P.M., H.J., L.R.-S. and A.L. contributed patient samples. E.K. and N.V. contributed to the data analysis. E.P. and R.K. performed the primary sequence analysis. L.A.A. and J.T. designed and supervised the study. R.K., K.D., E.P. and K.P. contributed equally to the study. All authors contributed to the writing of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Substitution context weights for extracted signatures.
Relative contribution of the six substitution types in 5′-Xp[T/C]pZ-3′ sequence contexts to signatures 1, 17 and X. The size of each segment in the pie charts corresponds to the contribution of the context to the signature.
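The 5′-Xp[T/C]pZ-3′ notation means that every substitution is described from the strand carrying the pyrimidine (T or C), together with its two neighboring bases. The following is a minimal sketch of that bookkeeping, not the authors' code; the function names and the "A[T>G]C" label format are assumptions made for illustration.

```python
# Sketch only: names and label format are illustrative assumptions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def substitution_class(triplet, alt):
    """Map a substitution to its pyrimidine-centered context class.

    triplet: reference base with one flanking base on each side, 5'->3'.
    alt:     the mutated base replacing the middle base of the triplet.
    Returns a label such as 'A[T>G]C'.
    """
    triplet, alt = triplet.upper(), alt.upper()
    if len(triplet) != 3 or triplet[1] == alt:
        raise ValueError("need a 3-base context and a base that differs from it")
    if triplet[1] in "AG":
        # Purine reference: describe the event on the opposite strand instead.
        triplet = reverse_complement(triplet)
        alt = COMPLEMENT[alt]
    return f"{triplet[0]}[{triplet[1]}>{alt}]{triplet[2]}"
```

With two possible reference bases, three alternatives each and 4 × 4 flanking combinations, this yields the 96 context classes that signature weights of this kind are usually defined over.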
Supplementary Figure 2 Relative exposures of signatures 1 and 17 and substitutions at CBSs in 198 MSS CRCs, 12 MSI CRCs and 3 POLε-exo− CRCs.
(a) In MSS CRCs, signatures 1 and 17 show negative correlation (r = −0.64, P = 3.52 × 10^−24). Symbol size represents the number of CBS substitutions relative to the total substitution load. MSS (n = 198), MSI (n = 12) and POLε-exo− (n = 3) CRC samples form distinct clusters in terms of the three signatures. (b,c) CBS mutation count has a strong positive correlation with signature 17 (Spearman ρ = 0.54, P = 2.47 × 10^−16) (b) and negative correlation with signature 1 (r = −0.41, P = 2.06 × 10^−9) (c). In b and c, symbol size is relative to the total mutation count of the sample. The saturation levels of red and blue represent the exposure levels of signatures 17 and 1, respectively.
Supplementary Figure 3 Sample-wise distribution of somatic substitutions at CBSs.
The CTCF motif region is indicated by shading. Of the 198 tumors, the 20 with the most CBS mutations are shown.
Supplementary Figure 4 Position-specific expected and observed mutation distributions at CBSs in 198 MSS CRCs.
(a) Left, expected mutation distribution at CBSs under signature 17 for the six substitution types. The expected value was calculated for every base using weights of triplets (sequence context) in signature 17. Right, observed number of mutations for the 6 substitution types at CBSs in 198 MSS CRC samples. (b) Difference in observed and expected mutation counts for the six substitution types, where both observed and expected values have been normalized by dividing by the median count in 10,000-bp regions centered on the CBSs. Substitutions of A:T bases emerge at a significantly higher level than expected from the sequence context alone (see Supplementary Table 3 for statistical analysis).
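The expected distribution described in this legend can be illustrated with a short sketch: each base in a set of windows aligned on the CBSs contributes the signature weight of its triplet context, summed per position and per substitution type, and each profile is then divided by its median. This is an assumption-laden illustration rather than the authors' pipeline; the function names, the weight dictionary layout and the label format "A[T>G]C" are hypothetical.

```python
from statistics import median

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def pyrimidine_class(triplet, alt):
    """Label a substitution from the pyrimidine strand, e.g. 'A[T>G]C'."""
    if triplet[1] in "AG":
        triplet = "".join(COMPLEMENT[b] for b in reversed(triplet))
        alt = COMPLEMENT[alt]
    return f"{triplet[0]}[{triplet[1]}>{alt}]{triplet[2]}"


def expected_profile(windows, weights):
    """Sum signature weights per position for each substitution type.

    windows: equal-length sequences aligned on the binding-site center.
    weights: {context class: weight}, e.g. {'A[T>G]C': 0.01, ...} (assumed layout).
    Returns {substitution type such as 'T>G': [expected value per position]}.
    """
    length = len(windows[0])
    profile = {}
    for seq in windows:
        for i in range(1, length - 1):
            triplet = seq[i - 1:i + 2].upper()
            if set(triplet) - set("ACGT"):
                continue  # skip contexts containing N or other ambiguity codes
            for alt in "ACGT":
                if alt == triplet[1]:
                    continue
                cls = pyrimidine_class(triplet, alt)
                sub_type = cls[2:5]  # 'T>G' taken from 'A[T>G]C'
                row = profile.setdefault(sub_type, [0.0] * length)
                row[i] += weights.get(cls, 0.0)
    return profile


def normalize_by_median(counts):
    """Divide a per-position profile by its median, as done for both the
    observed and the expected values before they are compared."""
    m = median(counts)
    if m == 0:
        raise ValueError("median is zero; profile cannot be normalized")
    return [c / m for c in counts]
```

Subtracting the normalized expected profile from the normalized observed profile, position by position, gives the kind of difference plotted in panel b.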
Supplementary Figure 5 Somatic substitution distribution across 28,331 CBSs in 198 MSS CRCs.
The y axis is on a logarithmic scale. A total of 1,966 substitutions were observed in 1,553 of 28,331 sites.
Supplementary Figure 6 Somatic substitutions near CTCF and RAD21 sites in CRC.
(a) Venn diagram showing the numbers of RAD21 and CTCF sites as determined by ChIP-exo. A hit for a CTCF motif was required to be within a 200-bp window for the CTCF-binding site. RAD21 and CTCF were found to colocalize at 28,331 sites (CBSs). Accumulation of somatic substitutions was observed at CBSs but not at sites bound by RAD21 or CTCF alone. Mutation frequencies are indicated for sites bound by RAD21, CTCF or both. (b) Somatic substitutions at 28,364 high-affinity CTCF motif occurrences without a nearby RAD21 and CTCF ChIP-exo peak. The affinity threshold was set to 14.954 to approximately match the number of CBSs and thus to create a sufficiently large set of CTCF motif occurrences for comparison. MSS CRCs show no accumulation of substitutions at these sites. POLε-exo− CRCs do not show a decrease in substitution rate in 400-bp windows flanking CTCF motif occurrences, unlike in the flanking regions of CTCF-binding sites colocalized with RAD21 (that is, CBSs).
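The motif requirement in panel a amounts to a proximity filter between two sets of genomic positions. The sketch below shows one way to express it; it assumes the 200-bp window is centered on the peak (that is, ±100 bp), which the legend does not specify, and the function name and input layout are hypothetical.

```python
from bisect import bisect_left


def peaks_with_motif(peaks, motif_hits, window=200):
    """Keep peaks that have at least one motif hit inside the window.

    peaks, motif_hits: lists of (chromosome, position) tuples.
    window: total window width in bp, assumed to be centered on the peak.
    """
    half = window // 2
    hits_by_chrom = {}
    for chrom, pos in motif_hits:
        hits_by_chrom.setdefault(chrom, []).append(pos)
    for positions in hits_by_chrom.values():
        positions.sort()

    kept = []
    for chrom, pos in peaks:
        positions = hits_by_chrom.get(chrom, [])
        # First hit at or to the right of the window's left edge.
        i = bisect_left(positions, pos - half)
        if i < len(positions) and positions[i] <= pos + half:
            kept.append((chrom, pos))
    return kept
```

Applying the same filter between CTCF sites and RAD21 peaks would separate colocalized sites from sites bound by only one factor, again under the stated window assumption.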
Supplementary Figure 7 Somatic substitutions at binding sites for the REST, MAX, KLF5 and HNF4A transcription factors in 198 MSS CRC samples.
Binding sites are indicated by shading.
Supplementary Figure 8 Allelic fractions of somatic substitutions in CRCs.
(a,b) Density plots show the allelic fraction of somatic substitutions in the tumors with the highest signature 17 exposure (top 20) (a) and the tumors with the lowest signature 17 exposure (bottom 20) (b). Allelic fractions for the six substitution types are shown separately.
Supplementary Figure 9 Consensus germline SNPs from the 1000 Genomes Project and conservation (GERP) scores at 28,331 CBSs.
The figure pair at the top illustrates SNPs and GERP scores at CBSs and 1,000-bp flanking regions. The figure pair below that shows CBSs and 30-bp flanking regions. Germline variation increases slightly in the 50-bp regions flanking the CBSs and decreases significantly at CBSs.
Supplementary Figure 10 Sample-wise somatic substitutions at CBSs in ICGC and CRC data sets.
Samples with at least ten CBS substitutions are shown. Different substitution types at CBSs in an individual sample are shown by colored bars. Gray bars show the amount of CBS substitutions relative to the total substitution load in the cancer. Samples with more A:T changes typically show higher relative mutation counts at CBSs. The following eight ICGC data sets are shown: esophageal adenocarcinoma (ESAD-UK), hepatocellular carcinoma (LIRI-JP, LICA-FR, LINC-JP), pancreatic cancer (PACA-CA, PACA-AU), ovarian cancer (OV-AU) and malignant lymphoma (MALY-DE).
Supplementary Figure 11 Somatic substitutions at 28,331 CBSs in ICGC data sets.
A total of eight tumor types, where whole-genome sequencing data were available through ICGC and the substitution count was at least ten, are shown.
Supplementary Figure 12 CTCF motif binding affinity at CBSs in reference sequence and after somatic mutation.
Only sites with a mutation at a core motif are shown. Colors indicate substitution type. Note that outlier points are due to zero-probability bases in the motif. Data are shown for MSS (n = 198), MSI (n = 12) and POLε-exo− (n = 3) CRCs.
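The note about outliers follows from how motif scores behave when a substitution introduces a base that the motif never contains. The sketch below uses a generic log-odds score against a uniform background to make this concrete; the scoring scheme actually used for the figure is not stated in this legend, and the two-column matrix in the usage note is a made-up toy example, not the CTCF matrix from Supplementary Table 4.

```python
import math


def log_odds_score(seq, pfm, background=0.25):
    """Sum of log2(p / background) over motif positions.

    pfm: list of {base: probability} dictionaries, one per motif position.
    Returns -inf if any base has probability zero in its column.
    """
    if len(seq) != len(pfm):
        raise ValueError("sequence and motif must have the same length")
    score = 0.0
    for base, column in zip(seq.upper(), pfm):
        p = column[base]
        if p == 0:
            return float("-inf")  # zero-probability base: score collapses
        score += math.log2(p / background)
    return score


def score_before_after(ref_seq, offset, alt, pfm):
    """Score the reference site and the same site carrying one substitution."""
    mutated = ref_seq[:offset] + alt + ref_seq[offset + 1:]
    return log_odds_score(ref_seq, pfm), log_odds_score(mutated, pfm)
```

For a toy matrix such as `[{"A": 0.5, "C": 0.25, "G": 0.25, "T": 0.0}, {"A": 0.25, "C": 0.5, "G": 0.125, "T": 0.125}]`, changing the first base of "AC" to T drives the score to negative infinity, which is the kind of point that appears as an outlier in a before-versus-after plot.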
Supplementary Figure 13 Core CTCF motif substitutions and expression of neighboring, opposite-strand gene pairs.
(a–c) Nearby gene pairs without CBSs (a) show stronger correlation in gene expression than gene pairs separated by CBSs (b) and random gene pairs (c). (d) Gene pairs separated by CBSs having one or more somatic mutations did not provide enough data to meaningfully examine possible differences between samples with core CTCF motif substitutions (red points) and samples without core substitutions (gray-black). The x and y axes represent rankings of forward- and reverse-strand gene expression, respectively, and counts are the number of observed rank pairs over all opposite-strand gene pairs (at a distance of <20 kb). Data points closer to the diagonal suggest stronger correlation. Data are shown for 102 ovarian cancer samples; in total, 15 core CTCF substitutions were identified residing between 20 neighboring gene pairs (at a distance of <20 kb).
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–13, Supplementary Tables 3 and 5–26, and Supplementary Note. (PDF 2202 kb)
Supplementary Table 1
Somatic mutation counts in total and at CBSs, and signature exposures. (XLSX 66 kb)
Supplementary Table 2
Genomic coordinates and orientations for all CBSs (GRCh37). (XLSX 1536 kb)
Supplementary Table 4
Somatic mutation clusters in ENCODE and LoVo regions. Includes CTCF position frequency matrix and POLE exonuclease mutations. (XLSX 39 kb)
Cite this article
Katainen, R., Dave, K., Pitkänen, E. et al. CTCF/cohesin-binding sites are frequently mutated in cancer. Nat Genet 47, 818–821 (2015). https://doi.org/10.1038/ng.3335