Genome-wide CTCF distribution in vertebrates defines equivalent sites that aid the identification of disease-associated genes

An Erratum to this article was published on 06 September 2011

This article has been updated

Abstract

Many genomic alterations associated with human diseases localize in noncoding regulatory elements located far from the promoters they regulate, making it challenging to link noncoding mutations or risk-associated variants with target genes. The range of action of a given set of enhancers is thought to be defined by insulator elements bound by the 11 zinc-finger nuclear factor CCCTC-binding protein (CTCF). Here we analyzed the genomic distribution of CTCF in various human, mouse and chicken cell types, demonstrating the existence of evolutionarily conserved CTCF-bound sites beyond mammals. These sites preferentially flank transcription factor–encoding genes, often associated with human diseases, and function as enhancer blockers in vivo, suggesting that they act as evolutionarily invariant gene boundaries. We then applied this concept to predict and functionally demonstrate that the polymorphic variants associated with multiple sclerosis located within the EVI5 gene impinge on the adjacent gene GFI1.

Figure 1: Detection and conservation of CTCF-binding sites.
Figure 2: Functional validation of CTCF sites as insulators.
Figure 3: CONSYN-CTCF sites preferentially flank transcription factors involved in developmental processes.
Figure 4: Genes separated by CTCF sites have differential expression patterns and are associated with human diseases.
Figure 5: Constitutive CTCF sites help assign target genes for noncoding mutations.
Figure 6: CTCF sites in the EVI5 gene act as insulators that prevent the interaction of GFI1-associated CREs with the EVI5 promoter.

Accession codes

Gene Expression Omnibus

Change history

  • 03 June 2011

    In the version of this article initially published, the affiliation for authors at the Department of Molecular and Cellular Biology, Centro Nacional de Biotecnología, Madrid, Spain, was incomplete. The full affiliation is "Department of Molecular and Cellular Biology, Centro Nacional de Biotecnología, CSIC, Madrid, Spain." The error has been corrected in the HTML and PDF versions of the article.

Acknowledgements

This research was supported by the following grants: BFU2007-60042/BMC, BFU2010-14839, Petri PET2007_0158, CONSOLIDER CSD2007-00008 (Spanish Ministerio de Ciencia e Innovación (MICINN)) and Proyecto de Excelencia CVI-3488 (Junta de Andalucía) to J.L.G.-S.; BFU2009-07044 (MICINN) and Proyecto de Excelencia CVI 2658 (Junta de Andalucía) to F.C.; FIS PI081636 (ISCIII) to F.M.; PN-SAF2009-11491 (MICINN) and Proyecto de Excelencia P07-CVI-02551 (Junta de Andalucía) to A.A.; BFU2008-00838, CONSOLIDER CSD2007-00008 (MICINN), Regional Government of Madrid (CAM S-SAL-0190-2006) and the Pro-CNIC Foundation to M.M.; BFU2006-12185 and BIO2009-12697 (MICINN) to L.M.; Dirección General de Asuntos del Personal Académico, Universidad Nacional Autónoma de México (IN209403, IN214407 and IN203811) and Consejo Nacional de Ciencia y Tecnología, México (CONACyT: 42653-Q, 58767 and 128464) to F.R.-T.; Intramural Research Program of the US NCBI (NIH) to I.O. and BIO2006-03380, CONSOLIDER CSD2007-00050 (MICINN) and RETICS RD07/0067/0012 (Spanish MICINN) to R.G. L.M. thanks A. Fernández for technical assistance and L. Barrios for statistical analysis. F.R.-T. thanks G.G. Avendaño for technical assistance.

Author information

Contributions

J.L.G.-S. and F.C. conceived the study, designed the experiments, interpreted results and wrote the manuscript. D.M. devised bioinformatics methods, carried out data analysis and wrote the paper. C.P., M.S. and M.A.B. conducted mouse ChIP experiments. C.V.-Q., M.F.-M. and F.R.-T. carried out chicken ChIP experiments. E.C.-M., E.M. and L.M. carried out insulator assays. A.F.M. conducted the 3C experiments. F.M., A.A. and M.F. provided PBMCs and carried out qRT-PCR, CNRA/CNRB luciferase reporter activity assays, quantification of GFI1 relative expression in 108 PBMC samples, genotyping of EVI5 rs11804321 and statistical analysis. O.D. carried out the high-throughput sequencing. O.B., L.T., I.O., S.C. and P.S.P. did data analysis. M.M. and R.G. collaborated in the experimental design, the discussion of results and the writing of the manuscript.

Corresponding authors

Correspondence to Fernando Casares or José Luis Gómez-Skarmeta.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–7 and Supplementary Methods (PDF 4457 kb)

Supplementary Table 1

Genes flanking CTCF sites in human, mouse and chicken (XLS 21996 kb)

Supplementary Table 2

Gene Ontology of genes associated with CTCF sites (Biological Processes) (XLS 15459 kb)

Supplementary Table 3

Gene Ontology of genes associated with CTCF sites (Molecular Function) (XLS 4972 kb)

Supplementary Table 4

Primers used to amplify the human CTCF bound regions and in the 3C experiments (XLS 50 kb)

About this article

Cite this article

Martin, D., Pantoja, C., Miñán, A. et al. Genome-wide CTCF distribution in vertebrates defines equivalent sites that aid the identification of disease-associated genes. Nat Struct Mol Biol 18, 708–714 (2011). https://doi.org/10.1038/nsmb.2059

  • DOI: https://doi.org/10.1038/nsmb.2059
