CTCF/cohesin-binding sites are frequently mutated in cancer


Cohesin is present in almost all active enhancer regions, where it is associated with transcription factors [1,2]. Cohesin frequently colocalizes with CTCF (CCCTC-binding factor), affecting genomic stability, expression and epigenetic homeostasis [3,4,5,6]. Cohesin subunits are mutated in cancer [7,8], but CTCF/cohesin-binding sites (CBSs) in DNA have not been examined for mutations. Here we report frequent mutations at CBSs in cancers displaying a mutational signature where mutations in A•T base pairs predominate. Integration of whole-genome sequencing data from 213 colorectal cancer (CRC) samples and chromatin immunoprecipitation-exonuclease (ChIP-exo) data identified frequent point mutations at CBSs. In contrast, CRCs showing an ultramutator phenotype caused by defects in the exonuclease domain of DNA polymerase ɛ (POLE) displayed significantly fewer mutations at and adjacent to CBSs. Analysis of public data showed that multiple cancer types accumulate CBS mutations. CBSs are a major mutational hotspot in the noncoding cancer genome.

Figure 1: Total number of somatic substitutions occurring at 28,331 CBSs.
Figure 2: Somatic substitution rates and replication timing on chromosome 1.
Figure 3: Somatic mutations at one of the most recurrently mutated CBSs.
Figure 4: Sample-wise somatic substitutions across all 28,331 CBSs in MSS CRCs, virus-associated hepatocellular carcinomas (LIRI-JP and ICGC) and MSI CRCs.
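
The aggregate profiles named in these figure titles can be thought of as a pile-up of substitutions around oriented binding sites. The following is a minimal sketch of that idea, not the authors' pipeline: the function name `pileup`, the `flank` parameter and the toy coordinates are hypothetical, and the only detail taken from the article is that CBSs have genomic coordinates and orientations.

```python
from collections import Counter

def pileup(sites, mutations, flank=500):
    """Count mutations by strand-aware offset from each site centre.

    sites: iterable of (chrom, centre, strand), strand being "+" or "-".
    mutations: iterable of (chrom, position).
    flank: hypothetical half-width of the window around each centre.
    """
    by_chrom = {}
    for chrom, pos in mutations:
        by_chrom.setdefault(chrom, []).append(pos)

    counts = Counter()
    for chrom, centre, strand in sites:
        # Simple quadratic scan; fine for a sketch, too slow for whole genomes.
        for pos in by_chrom.get(chrom, []):
            offset = pos - centre
            if abs(offset) <= flank:
                # Flip offsets for minus-strand sites so motifs line up.
                counts[offset if strand == "+" else -offset] += 1
    return counts

# Toy, made-up coordinates (NOT data from the study).
toy_sites = [("chr1", 1000, "+"), ("chr1", 5000, "-")]
toy_mutations = [("chr1", 1003), ("chr1", 4997), ("chr1", 9000), ("chr2", 1000)]
toy_counts = pileup(toy_sites, toy_mutations)
```

In this toy input both nearby mutations land at the same oriented offset, which is the kind of stacking an aggregate plot would show as a peak.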

Accession codes


European Nucleotide Archive


References

  1. Kagey, M.H. et al. Mediator and cohesin connect gene expression and chromatin architecture. Nature 467, 430–435 (2010).

  2. Yan, J. et al. Transcription factor binding in human cells occurs in dense clusters formed around cohesin anchor sites. Cell 154, 801–813 (2013).

  3. Parelho, V. et al. Cohesins functionally associate with CTCF on mammalian chromosome arms. Cell 132, 422–433 (2008).

  4. Rubio, E.D. et al. CTCF physically links cohesin to chromatin. Proc. Natl. Acad. Sci. USA 105, 8309–8314 (2008).

  5. Wendt, K.S. et al. Cohesin mediates transcriptional insulation by CCCTC-binding factor. Nature 451, 796–801 (2008).

  6. Rao, S.S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).

  7. Kon, A. et al. Recurrent mutations in multiple components of the cohesin complex in myeloid neoplasms. Nat. Genet. 45, 1232–1237 (2013).

  8. Leiserson, M.D. et al. Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nat. Genet. 47, 106–114 (2015).

  9. Huang, F.W. et al. Highly recurrent TERT promoter mutations in human melanoma. Science 339, 957–959 (2013).

  10. Mansour, M.R. et al. An oncogenic super-enhancer formed through somatic mutation of a noncoding intergenic element. Science 346, 1373–1377 (2014).

  11. Alexandrov, L.B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).

  12. Chen, C.L. et al. Impact of replication timing on non-CpG and CpG substitution rates in mammalian genomes. Genome Res. 20, 447–457 (2010).

  13. Lawrence, M.S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).

  14. Hoffman, M.M. et al. Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Res. 41, 827–841 (2013).

  15. Hudson, T.J. et al. International network of cancer genome projects. Nature 464, 993–998 (2010).

  16. Guillou, E. et al. Cohesin organizes chromatin loops at DNA replication factories. Genes Dev. 24, 2812–2822 (2010).

  17. Helleday, T., Eshtad, S. & Nik-Zainal, S. Mechanisms underlying mutational signatures in human cancers. Nat. Rev. Genet. 15, 585–598 (2014).

  18. Dulak, A.M. et al. Exome and whole-genome sequencing of esophageal adenocarcinoma identifies recurrent driver events and mutational complexity. Nat. Genet. 45, 478–486 (2013).

  19. Satou, K., Kawai, K., Kasai, H., Harashima, H. & Kamiya, H. Mutagenic effects of 8-hydroxy-dGTP in live mammalian cells. Free Radic. Biol. Med. 42, 1552–1560 (2007).

  20. Narendra, V. et al. CTCF establishes discrete functional chromatin domains at the Hox clusters during differentiation. Science 347, 1017–1021 (2015).

  21. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  22. DePristo, M.A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).

  23. Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).

  24. Alexandrov, L.B., Nik-Zainal, S., Wedge, D.C., Campbell, P.J. & Stratton, M.R. Deciphering signatures of mutational processes operative in human cancer. Cell Rep. 3, 246–259 (2013).

  25. Rhee, H.S. & Pugh, B.F. Comprehensive genome-wide protein-DNA interactions detected at single-nucleotide resolution. Cell 147, 1408–1419 (2011).

  26. Guo, Y. et al. Discovering homotypic binding events at high spatial resolution. Bioinformatics 26, 3028 (2010).

  27. Jolma, A. et al. Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Res. 20, 861–873 (2010).

  28. Jolma, A. et al. DNA-binding specificities of human transcription factors. Cell 152, 327–339 (2013).

  29. Rausch, T. et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28, i333–i339 (2012).


Acknowledgements

We thank S. Augsten, L. Hu, H. Metsola, S. Nieminen, A. Ollikainen, M. Rajalaakso, S. Soisalo, I.-L. Svedberg, M. Turunen, I. Vuoristo and A. Zetterlund for technical assistance. Further, we thank M. Aavikko, I. Donner, A. Karhu and M. Mehine for additional help with the manuscript. We acknowledge the International Cancer Genome Consortium (ICGC) for the somatic mutation data. This work was supported by grants from the Academy of Finland (Finnish Center of Excellence Program 2012–2017, 250345, personal grants 137680 and 274474 to O.K.), the University of Helsinki (personal grant to S.T.), the Finnish Cancer Society (also a personal grant to K.P.), the European Research Council (ERC; 268648), the Sigrid Juselius Foundation, SYSCOL (a European Union Framework Programme 7 Collaborative Project, 258236), the Nordic Information for Action eScience Center (NIASC), a Nordic Center of Excellence financed by NordForsk (project 62721, personal grant to K.P.), and State Research Funding of Kuopio University Hospital (B1401). We acknowledge the computational resources provided by the ELIXIR node, hosted at the CSC–IT Center for Science, Finland, and funded by the Academy of Finland (grants 271642 and 263164) and the Ministry of Education and Culture, Finland.

Author information

R.K., K.P. and E.P. performed the somatic mutation analyses. K.D. and M.E. performed the ChIP-seq and ChIP-exo experiments. R.K., T.K., K.P. and E.P. performed the sequence motif analyses. E.P., R.K. and H.R. performed the analyses of ICGC data. U.A.H., T.C., J.K., T.T., A.E.G., S.T. and O.K. prepared the CRC samples. J.-P.M., H.J., L.R.-S. and A.L. contributed patient samples. E.K. and N.V. contributed to the data analysis. E.P. and R.K. performed the primary sequence analysis. L.A.A. and J.T. designed and supervised the study. R.K., K.D., E.P. and K.P. contributed equally to the study. All authors contributed to the writing of the manuscript.

Corresponding authors

Correspondence to Jussi Taipale or Lauri A. Aaltonen.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Substitution context weights for extracted signatures.

Relative contribution of the six substitution types in 5’-Xp[T/C]pZ-3’ sequence contexts to signatures 1, 17 and X. The size of each segment in the pie charts corresponds to the contribution of the context to the signature.

Supplementary Figure 2 Relative exposures of signatures 1 and 17 and substitutions at CBSs in 198 MSS CRCs, 12 MSI CRCs and 3 POLε-exo CRCs.

(a) In MSS CRCs, signatures 1 and 17 show negative correlation (r = –0.64, P = 3.52 × 10^−24). Symbol size represents the number of CBS substitutions relative to the total substitution load. MSS (n = 198), MSI (n = 12) and POLε-exo (n = 3) CRC samples cluster distinctly in terms of the three signatures. (b,c) CBS mutation count has a strong positive correlation with signature 17 (Spearman ρ = 0.54, P = 2.47 × 10^−16) (b) and negative correlation with signature 1 (r = –0.41, P = 2.06 × 10^−9) (c). In b and c, symbol size is relative to the total mutation count of the sample. The saturation levels of red and blue represent the exposure levels of signatures 17 and 1, respectively.
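
To make the kind of coefficient reported in this caption concrete, the sketch below computes a plain Pearson correlation between per-sample signature exposure and CBS mutation count. The function name and the toy values are hypothetical and are not data from the study. The caption does not describe how the P values were obtained, so none is computed here. A rank-based coefficient would apply the same formula to the ranks of the values.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either input is constant.
    return cov / sqrt(var_x * var_y)

# Toy, made-up values (NOT data from the study).
toy_sig17_exposure = [0.05, 0.10, 0.20, 0.35, 0.50]
toy_cbs_mutation_count = [1, 2, 4, 9, 12]
toy_r = pearson_r(toy_sig17_exposure, toy_cbs_mutation_count)
```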

Supplementary Figure 3 Sample-wise distribution of somatic substitutions at CBSs.

The CTCF motif region is indicated by shading. The 20 tumors with the most CBS mutations, out of 198, are shown.

Supplementary Figure 4 Position-specific expected and observed mutation distributions at CBSs in 198 MSS CRCs.

(a) Left, expected mutation distribution at CBSs under signature 17 for the six substitution types. The expected value was calculated for every base using weights of triplets (sequence context) in signature 17. Right, observed number of mutations for the 6 substitution types at CBSs in 198 MSS CRC samples. (b) Difference in observed and expected mutation counts for the six substitution types, where both observed and expected values have been normalized by dividing by the median count in 10,000-bp regions centered on the CBSs. Substitutions of A:T bases emerge at a significantly higher level than expected from the sequence context alone (see Supplementary Table 3 for statistical analysis).
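
The per-base expectation described in panel a can be sketched as a lookup of signature weights by trinucleotide context. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the dictionary layout and the toy weights are hypothetical. The one convention taken from the supplementary captions is the pyrimidine-centred context (5’-Xp[T/C]pZ-3’), so purine-centred triplets are reverse-complemented before lookup.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def expected_profile(seq, weights):
    """Expected weight per substitution type for every interior base.

    weights maps (pyrimidine-centred triplet, substitution type such as
    "T>G") to a signature weight; missing contexts count as 0.0.
    Returns one dict per interior position of seq.
    """
    profile = []
    for i in range(1, len(seq) - 1):
        triplet = seq[i - 1:i + 2]
        if triplet[1] in "AG":
            # Express purine-centred contexts on the opposite strand.
            triplet = revcomp(triplet)
        ref = triplet[1]
        profile.append({
            f"{ref}>{alt}": weights.get((triplet, f"{ref}>{alt}"), 0.0)
            for alt in "ACGT" if alt != ref
        })
    return profile

# Toy, made-up weights (NOT the published signature 17 values).
toy_weights = {("CTT", "T>G"): 0.3, ("ATT", "T>G"): 0.1}
toy_profile = expected_profile("ACTTG", toy_weights)
```

Summing such per-position dictionaries over all sites, aligned by offset, would give an expected distribution that can be compared with observed counts after whatever normalization is chosen.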

Supplementary Figure 5 Somatic substitution distribution across 28,331 CBSs in 198 MSS CRCs.

The y axis is on a logarithmic scale. A total of 1,966 substitutions were observed in 1,553 of 28,331 sites.

Supplementary Figure 6 Somatic substitutions near CTCF and RAD21 sites in CRC.

(a) Venn diagram showing the numbers of RAD21 and CTCF sites as determined by ChIP-exo. A hit for a CTCF motif was required to be within a 200-bp window for the CTCF-binding site. RAD21 and CTCF were found to colocalize at 28,331 sites (CBSs). Accumulation of somatic substitutions was observed at CBSs but not at sites bound by RAD21 or CTCF alone. Mutation frequencies are indicated for sites bound by RAD21, CTCF or both. (b) Somatic substitutions at 28,364 high-affinity CTCF motif occurrences without a nearby RAD21 or CTCF ChIP-exo peak. The affinity threshold was set to 14.954 to approximately match the number of CBSs and thus to create a sufficiently large set of CTCF motif occurrences for comparison. MSS CRCs show no accumulation of substitutions at these sites. POLε-exo CRCs do not show a decrease in substitution rate in 400-bp windows flanking CTCF motif occurrences, unlike in the flanking regions of CTCF-binding sites colocalized with RAD21 (that is, CBSs).
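
The site definition in panel a amounts to an interval lookup. The sketch below shows one way to select CTCF peaks that have both a motif hit and a RAD21 peak nearby, on a single chromosome. It is a hypothetical illustration: the function names are invented, the 200-bp window is read as 100 bp on either side, and the same window is reused for RAD21 colocalization, which the caption does not specify.

```python
from bisect import bisect_left

def has_neighbor(sorted_positions, pos, max_dist):
    """True if any value in sorted_positions lies within max_dist of pos."""
    i = bisect_left(sorted_positions, pos - max_dist)
    return i < len(sorted_positions) and sorted_positions[i] <= pos + max_dist

def colocalized_sites(ctcf_peaks, rad21_peaks, motif_hits, window=200):
    """CTCF peaks with both a motif hit and a RAD21 peak in the window.

    All inputs are positions on one chromosome. Treating window as
    +/- window // 2 around the CTCF peak is an assumption of this sketch.
    """
    half = window // 2
    rad21 = sorted(rad21_peaks)
    motifs = sorted(motif_hits)
    return [p for p in sorted(ctcf_peaks)
            if has_neighbor(motifs, p, half) and has_neighbor(rad21, p, half)]

# Toy, made-up coordinates (NOT data from the study).
toy_cbs = colocalized_sites(
    ctcf_peaks=[1000, 5000, 9000],
    rad21_peaks=[1050, 9500],
    motif_hits=[990, 5010, 9020],
)
```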

Supplementary Figure 7 Somatic substitutions at binding sites for the REST, MAX, KLF5 and HNF4A transcription factors in 198 MSS CRC samples.

Binding sites are indicated by shading.

Supplementary Figure 8 Allelic fractions of somatic substitutions in CRCs.

(a,b) Density plots show the allelic fraction of somatic substitutions in the tumors with the highest signature 17 exposure (top 20) (a) and the tumors with the lowest signature 17 exposure (bottom 20) (b). Allelic fractions for the six substitution types are shown separately.

Supplementary Figure 9 Consensus germline SNPs from the 1000 Genomes Project and conservation (GERP) scores at 28,331 CBSs.

The figure pair at the top illustrates SNPs and GERP scores at CBSs and 1,000-bp flanking regions. The figure pair below that shows CBSs and 30-bp flanking regions. Germline variation increases slightly in the 50-bp regions flanking the CBSs and decreases significantly at CBSs.

Supplementary Figure 10 Sample-wise somatic substitutions at CBSs in ICGC and CRC data sets.

Samples with at least ten CBS substitutions are shown. Different substitution types at CBSs in an individual sample are shown by colored bars. Gray bars show the amount of CBS substitutions relative to the total substitution load in the cancer. Samples with more A:T changes typically show higher relative mutation counts at CBSs. The following eight ICGC data sets are shown: esophageal adenocarcinoma (ESAD-UK), hepatocellular carcinoma (LIRI-JP, LICA-FR, LINC-JP), pancreatic cancer (PACA-CA, PACA-AU), ovarian cancer (OV-AU) and malignant lymphoma (MALY-DE).

Supplementary Figure 11 Somatic substitutions at 28,331 CBSs in ICGC data sets.

A total of eight tumor types, where whole-genome sequencing data were available through ICGC and the substitution count was at least ten, are shown.

Supplementary Figure 12 CTCF motif binding affinity at CBSs in reference sequence and after somatic mutation.

Only sites with a mutation at a core motif are shown. Colors indicate substitution type. Note that outlier points are due to zero-probability bases in the motif. Data are shown for MSS (n = 198), MSI (n = 12) and POLε-exo (n = 3) CRCs.
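
The note about outliers can be illustrated with a generic log-odds motif score: when a mutation introduces a base whose probability in the motif matrix is zero, the score drops to negative infinity rather than shifting by a finite amount. The scoring scheme, the uniform background and the three-column toy matrix below are assumptions made for illustration; they are not the CTCF matrix or the affinity measure used in the study.

```python
import math

def log_odds_score(seq, ppm, background=0.25):
    """Sum of log2(p / background) over positions; -inf if any p is 0."""
    if len(seq) != len(ppm):
        raise ValueError("sequence and matrix must have the same length")
    score = 0.0
    for base, column in zip(seq, ppm):
        p = column[base]
        if p == 0.0:
            # A zero-probability base makes the whole site score -inf.
            return float("-inf")
        score += math.log2(p / background)
    return score

# Toy, made-up position probability matrix (NOT the CTCF motif).
TOY_PPM = [
    {"A": 0.0, "C": 1.0, "G": 0.0, "T": 0.0},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.0, "C": 0.0, "G": 0.5, "T": 0.5},
]
toy_ref_score = log_odds_score("CAG", TOY_PPM)  # reference allele
toy_mut_score = log_odds_score("AAG", TOY_PPM)  # C>A at the first position
```

Adding pseudocounts to the matrix is a common way to avoid infinite scores, at the cost of changing every finite score slightly.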

Supplementary Figure 13 Core CTCF motif substitutions and expression of neighboring, opposite-strand gene pairs.

(a–c) Nearby gene pairs without CBSs (a) show stronger correlation in gene expression than gene pairs separated by CBSs (b) and random gene pairs (c). (d) Gene pairs separated by CBSs having one or more somatic mutations did not provide enough data to meaningfully examine possible differences between samples with core CTCF motif substitutions (red points) and samples without core substitutions (gray-black). The x and y axes represent rankings of forward- and reverse-strand gene expression, respectively, and counts are the number of observed rank pairs over all opposite-strand gene pairs (at a distance of <20 kb). Data points closer to the diagonal suggest stronger correlation. Data are shown for 102 ovarian cancer samples; in total, 15 core CTCF substitutions were identified residing between 20 neighboring gene pairs (at a distance of <20 kb).

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–13, Supplementary Tables 3 and 5–26, and Supplementary Note. (PDF 2202 kb)

Supplementary Table 1

Somatic mutation counts in total and at CBSs, and signature exposures. (XLSX 66 kb)

Supplementary Table 2

Genomic coordinates and orientations for all CBSs (GRCh37). (XLSX 1536 kb)

Supplementary Table 4

Somatic mutation clusters in ENCODE and LoVo regions. Includes CTCF position frequency matrix and POLE exonuclease mutations. (XLSX 39 kb)

Cite this article

Katainen, R., Dave, K., Pitkänen, E. et al. CTCF/cohesin-binding sites are frequently mutated in cancer. Nat Genet 47, 818–821 (2015).
