
  • Review Article
  • Published:

Genomic approaches towards finding cis-regulatory modules in animals

Key Points

  • Predicting cis-regulatory modules (CRMs) can both expand our understanding of biology and have applications in medicine and other fields. For example, most of the genetic variants that are significantly associated with susceptibility to disease are not in protein-coding regions, and we surmise that many affect regulation of gene expression.

  • Clusters of transcription factor binding sites (TFBSs) do not provide sufficient specificity to be used for effective prediction of CRMs in large-scale investigations. However, when applied to identified or likely CRMs, they provide important insights into potentially cooperating transcription factors and help to build a regulatory code.

  • Predicting CRMs on the basis of strong constraint in non-coding sequences finds an important subset of the CRMs: that is, those that control developmental regulatory genes. However, this approach misses a large number, possibly most, of the transcription-factor-occupied segments.

  • High-quality data sets of epigenetic features associated with gene regulation, such as DNase-hypersensitive sites, transcription factor occupancy and diagnostic histone modifications, provide a reasonably unbiased, sensitive view of the regulatory landscape. The integrative analysis of these features is currently the best approach for predicting CRMs.

  • Evolutionary and motif patterns in predicted CRMs should be used to partition the identified regions into categories that could have different functions in regulation.

  • Predicting and testing CRMs is essential for deciphering a regulatory code. Synthetic biology approaches, in which DNA sequences inferred for a particular function are synthesized and tested for that function, can assess the accuracy of models for regulatory codes and point to needed improvements.
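The motif-cluster idea in the Key Points can be illustrated with a minimal sketch. It is not a method taken from this article: the consensus string, the 200 bp window and the three-site threshold are illustrative assumptions, and the function names are hypothetical. Real tools typically score position weight matrices on both strands, whereas this sketch matches IUPAC consensus strings on the forward strand only.

```python
import re

# Subset of IUPAC nucleotide codes, mapped to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def motif_to_regex(motif):
    """Compile a consensus motif into a lookahead regex so overlapping hits are found."""
    return re.compile("(?=(" + "".join(IUPAC[base] for base in motif) + "))")


def find_sites(seq, motifs):
    """Return sorted start positions of all motif matches (forward strand only)."""
    seq = seq.upper()
    hits = []
    for motif in motifs:
        hits.extend(m.start() for m in motif_to_regex(motif).finditer(seq))
    return sorted(hits)


def motif_clusters(seq, motifs, window=200, min_sites=3):
    """Return (first_site, last_site) spans holding at least min_sites hits within window bp.

    The window size and threshold are illustrative assumptions, not published values.
    """
    sites = find_sites(seq, motifs)
    clusters = []
    j = 0
    for i, start in enumerate(sites):
        # Advance j to the first site that falls outside the window opened at this site.
        while j < len(sites) and sites[j] < start + window:
            j += 1
        if j - i >= min_sites:
            end = sites[j - 1]
            if clusters and start <= clusters[-1][1]:
                # Overlaps the previous cluster, so extend it instead of adding a new one.
                prev_start, prev_end = clusters[-1]
                clusters[-1] = (prev_start, max(prev_end, end))
            else:
                clusters.append((start, end))
    return clusters
```

Calling `motif_clusters(sequence, ["WGATAR"])` returns candidate spans for a GATA-like consensus. Because short motifs match frequently by chance, a genome-wide scan of this kind would be expected to return many spurious clusters, which is consistent with the specificity concern raised above.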

Abstract

Differential gene expression is the fundamental mechanism underlying animal development and cell differentiation. However, it is a challenge to identify comprehensively and accurately the DNA sequences that are required to regulate gene expression: namely, cis-regulatory modules (CRMs). Three major features, either singly or in combination, are used to predict CRMs: clusters of transcription factor binding site motifs, non-coding DNA that is under evolutionary constraint and biochemical marks associated with CRMs, such as histone modifications and protein occupancy. The validation rates for predictions indicate that identifying diagnostic biochemical marks is the most reliable method, and understanding is enhanced by the analysis of motifs and conservation patterns within those predicted CRMs.
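As a rough illustration of how the three kinds of evidence might be combined, the sketch below takes segments carrying biochemical marks as the predicted CRMs and then annotates each with conservation and motif-cluster overlap, so that predictions can be partitioned into categories. The interval representation, function names and category keys are assumptions made for this example and do not reproduce any specific published pipeline.

```python
from typing import Dict, List, Tuple

# Half-open [start, end) interval on a single chromosome (an assumption of this sketch).
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def annotate_candidates(marks: List[Interval],
                        constrained: List[Interval],
                        motif_clusters: List[Interval]) -> List[Dict]:
    """Treat each biochemically marked segment as a predicted CRM and record
    whether it overlaps constrained sequence or a motif cluster."""
    annotated = []
    for peak in marks:
        annotated.append({
            "interval": peak,
            "constrained": any(overlaps(peak, c) for c in constrained),
            "motif_cluster": any(overlaps(peak, m) for m in motif_clusters),
        })
    return annotated


def partition(annotated: List[Dict]) -> Dict[Tuple[bool, bool], List[Interval]]:
    """Group predictions by their (constrained, motif_cluster) status."""
    groups: Dict[Tuple[bool, bool], List[Interval]] = {}
    for rec in annotated:
        key = (rec["constrained"], rec["motif_cluster"])
        groups.setdefault(key, []).append(rec["interval"])
    return groups
```

The design choice here follows the abstract's conclusion: biochemical marks define the candidate set, while motifs and conservation serve to interpret and subdivide it rather than to filter it. The linear scan over every pair of intervals is adequate for a small example; a genome-scale analysis would normally use sorted intervals or an interval tree.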


Figure 1: Rationales for three approaches to CRM prediction.
Figure 2: Evolutionary diversity in tissue-specific enhancers.


References

  1. Davidson, E. H. & Erwin, D. H. Gene regulatory networks and the evolution of animal body plans. Science 311, 796–800 (2006).

    Article  CAS  PubMed  Google Scholar 

  2. King, M. C. & Wilson, A. C. Evolution at two levels in humans and chimpanzees. Science 188, 107–116 (1975).

    Article  CAS  PubMed  Google Scholar 

  3. Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. USA 106, 9362–9367 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Tuupanen, S. et al. The common colorectal cancer predisposition SNP rs6983267 at chromosome 8q24 confers potential to enhanced Wnt signaling. Nature Genet. 41, 885–890 (2009).

    Article  CAS  PubMed  Google Scholar 

  5. Jia, L. et al. Functional enhancers at the gene-poor 8q24 cancer-linked locus. PLoS Genet 5, e1000597 (2009).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  6. Harismendy, O. et al. 9p21 DNA variants associated with coronary artery disease impair interferon-γ signalling response. Nature 470, 264–268 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. Farrell, J. J. et al. A 3 bp deletion in the HBS1L-MYB intergenic region on chromosome 6q23 is associated with HbF expression. Blood 117, 4935–4945 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Maston, G. A., Evans, S. K. & Green, M. R. Transcriptional regulatory elements in the human genome. Annu. Rev. Genom. Hum. Genet. 7, 29–59 (2006).

    Article  CAS  Google Scholar 

  9. Schones, D. E. & Zhao, K. Genome-wide approaches to studying chromatin modifications. Nature Rev. Genet. 9, 179–191 (2008).

    Article  CAS  PubMed  Google Scholar 

  10. Rando, O. J. & Chang, H. Y. Genome-wide views of chromatin structure. Annu. Rev. Biochem. 78, 245–271 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  11. Noonan, J. P. & McCallion, A. S. Genomics of long-range regulatory elements. Annu. Rev. Genom. Hum. Genet. 11, 1–23 (2010).

    Article  CAS  Google Scholar 

  12. Hawkins, R. D., Hon, G. C. & Ren, B. Next-generation genomics: an integrative approach. Nature Rev. Genet. 11, 476–486 (2010).

    Article  CAS  PubMed  Google Scholar 

  13. Lenhard, B., Sandelin, A. & Carninci, P. Metazoan promoters: emerging characteristics and insights into transcriptional regulation. Nature Rev. Genet. 13, 233–245 (2012).

    Article  CAS  PubMed  Google Scholar 

  14. Frazer, K. A., Elnitski, L., Church, D., Dubchak, I. & Hardison, R. C. Cross-species sequence comparisons: a review of methods and available resources. Genome Res. 13, 1–12 (2003).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  15. Wasserman, W. W. & Sandelin, A. Applied bioinformatics for the identification of regulatory elements. Nature Rev. Genet. 5, 276–287 (2004).

    Article  CAS  PubMed  Google Scholar 

  16. Elnitski, L., Jin, V. X., Farnham, P. J. & Jones, S. J. Locating mammalian transcription factor binding sites: a survey of computational and experimental techniques. Genome Res. 16, 1455–1464 (2006).

    Article  CAS  PubMed  Google Scholar 

  17. Su, J., Teichmann, S. A. & Down, T. A. Assessing computational methods of cis-regulatory module prediction. PLoS Comput. Biol. 6, e1001020 (2010). This paper provides a comprehensive evaluation and comparison of sequence-based computational approaches for identifying cis -regulatory modules in human and D. melanogaster using large test sets.

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  18. Zhang, Y. et al. Primary sequence and epigenetic determinants of in vivo occupancy of genomic DNA by GATA1. Nucleic Acids Res. 37, 7024–7038 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. Cao, Y. et al. Genome-wide MyoD binding in skeletal muscle cells: a potential for broad cellular reprogramming. Dev. Cell 18, 662–674 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. Cheng, Y. et al. Erythroid GATA1 function revealed by genome-wide analysis of transcription factor occupancy, histone modifications, and mRNA expression. Genome Res. 19, 2172–2184 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  21. Pribnow, D. Nucleotide sequence of an RNA polymerase binding site at an early T7 promoter. Proc. Natl Acad. Sci. USA 72, 784–788 (1975).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  22. Carninci, P. et al. Genome-wide analysis of mammalian promoter architecture and evolution. Nature Genet. 38, 626–635 (2006). This paper demonstrated that high-throughput sequencing of the 5′ ends of transcripts can be used to identify transcription start sites. Using this approach revealed different classes of mammalian promoter architecture.

    Article  CAS  PubMed  Google Scholar 

  23. Banerji, J., Rusconi, S. & Schaffner, W. Expression of a β-globin gene is enhanced by remote SV40 DNA sequences. Cell 27, 299–308 (1981).

    Article  CAS  PubMed  Google Scholar 

  24. Fromm, M. & Berg, P. Simian virus 40 early- and late-region promoter functions are enhanced by the 72 base-pair repeat inserted at distant locations and inverted orientations. Mol. Cell. Biol. 3, 991–999 (1983).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Gillies, S. D., Morrison, S. L., Oi, V. T. & Tonegawa, S. A tissue-specific transcription enhancer element is located in the major intron of a rearranged immunoglobulin heavy chain gene. Cell 33, 717–728 (1983).

    Article  CAS  PubMed  Google Scholar 

  26. Rusche, L. N., Kirchmaier, A. L. & Rine, J. The establishment, inheritance, and function of silenced chromatin in Saccharomyces cerevisiae. Annu. Rev. Biochem. 72, 481–516 (2003).

    Article  CAS  PubMed  Google Scholar 

  27. Martowicz, M. L., Grass, J. A., Boyer, M. E., Guend, H. & Bresnick, E. H. Dynamic GATA factor interplay at a multicomponent regulatory region of the GATA-2 locus. J. Biol. Chem. 280, 1724–1732 (2005).

    Article  CAS  PubMed  Google Scholar 

  28. Jing, H. et al. Exchange of GATA factors mediates transitions in looped chromatin organization at a developmentally regulated gene locus. Mol. Cell 29, 232–242 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  29. Maniatis, T., Goodbourn, S. & Fischer, J. A. Regulation of inducible and tissue-specific gene expression. Science 236, 1237–1245 (1987).

    Article  CAS  PubMed  Google Scholar 

  30. Lettice, L. A. et al. A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum. Mol. Genet. 12, 1725–1735 (2003).

    Article  CAS  PubMed  Google Scholar 

  31. Schirm, S., Jiricny, J. & Schaffner, W. The SV40 enhancer can be dissected into multiple segments, each with a different cell type specificity. Genes Dev. 1, 65–74 (1987).

    Article  CAS  PubMed  Google Scholar 

  32. Ondek, B., Gross, L. & Herr, W. The SV40 enhancer contains two distinct levels of organization. Nature 333, 40–45 (1988).

    Article  CAS  PubMed  Google Scholar 

  33. Arnosti, D. N., Barolo, S., Levine, M. & Small, S. The eve stripe 2 enhancer employs multiple modes of transcriptional synergy. Development 122, 205–214 (1996).

    CAS  PubMed  Google Scholar 

  34. Pennacchio, L. A. et al. In vivo enhancer analysis of human conserved non-coding sequences. Nature 444, 499–502 (2006). The authors used extreme evolutionary conservation to identify predicted enhancers and showed that half of those tested drove reproducible tissue-specific expression patterns in mouse embryos.

    Article  CAS  PubMed  Google Scholar 

  35. Landry, J. R. et al. Expression of the leukemia oncogene Lmo2 is controlled by an array of tissue-specific elements dispersed over 100 kb and bound by Tal1/Lmo2, Ets, and Gata factors. Blood 113, 5783–5792 (2009).

    Article  CAS  PubMed  Google Scholar 

  36. Valenzuela, L. & Kamakaka, R. T. Chromatin insulators. Annu. Rev. Genet. 40, 107–138 (2006).

    Article  CAS  PubMed  Google Scholar 

  37. Wallace, J. A. & Felsenfeld, G. We gather together: insulators and genome organization. Curr. Opin. Genet. Dev. 17, 400–407 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  38. Chung, J. H., Whiteley, M. & Felsenfeld, G. A. 5′ element of the chicken β-globin domain serves as an insulator in human erythroid cells and protects against position effect in Drosophila. Cell 74, 505–514 (1993).

    Article  CAS  PubMed  Google Scholar 

  39. Bell, A. C., West, A. G. & Felsenfeld, G. The protein CTCF is required for the enhancer blocking activity of vertebrate insulators. Cell 98, 387–396 (1999).

    Article  CAS  PubMed  Google Scholar 

  40. Schoborg, T. A. & Labrador, M. The phylogenetic distribution of non-CTCF insulator proteins is limited to insects and reveals that BEAF-32 is Drosophila lineage specific. J. Mol. Evol. 70, 74–84 (2010).

    Article  CAS  PubMed  Google Scholar 

  41. Recillas-Targa, F. et al. Position-effect protection and enhancer blocking by the chicken β-globin insulator are separable activities. Proc. Natl Acad. Sci. USA 99, 6883–6888 (2002).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  42. Huang, S., Li, X., Yusufzai, T. M., Qiu, Y. & Felsenfeld, G. USF1 recruits histone modification complexes and is critical for maintenance of a chromatin barrier. Mol. Cell. Biol. 27, 7991–8002 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  43. Phillips, J. E. & Corces, V. G. CTCF: master weaver of the genome. Cell 137, 1194–1211 (2009).

    Article  PubMed  PubMed Central  Google Scholar 

  44. Kim, T. H. et al. Analysis of the vertebrate insulator protein CTCF-binding sites in the human genome. Cell 128, 1231–1245 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  45. Wasserman, W. W. & Fickett, J. W. Identification of regulatory regions which confer muscle-specific gene expression. J. Mol. Biol. 278, 167–181 (1998). This was one of the first analyses to combine motif discovery and sequence conservation in a predictive model that identifies tissue-specific regulatory sequences.

    Article  CAS  PubMed  Google Scholar 

  46. Frith, M. C., Spouge, J. L., Hansen, U. & Weng, Z. Statistical significance of clusters of motifs represented by position specific scoring matrices in nucleotide sequences. Nucleic Acids Res. 30, 3214–3224 (2002).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  47. Berman, B. P. et al. Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc. Natl Acad. Sci. USA 99, 757–762 (2002).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  48. Markstein, M., Markstein, P., Markstein, V. & Levine, M. S. Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo. Proc. Natl Acad. Sci. USA 99, 763–768 (2002).

    Article  CAS  PubMed  Google Scholar 

  49. Rebeiz, M., Reeves, N. L. & Posakony, J. W. SCORE: a computational approach to the identification of cis-regulatory modules and target genes in whole-genome sequence data. Site clustering over random expectation. Proc. Natl Acad. Sci. USA 99, 9888–9893 (2002).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  50. Halfon, M. S., Grad, Y., Church, G. M. & Michelson, A. M. Computation-based discovery of related transcriptional regulatory modules and motifs using an experimentally validated combinatorial model. Genome Res. 12, 1019–1028 (2002).

    CAS  PubMed  PubMed Central  Google Scholar 

  51. Schroeder, M. D. et al. Transcriptional control in the segmentation gene network of Drosophila. PLoS Biol. 2, E271 (2004).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  52. Zhou, Q. & Wong, W. H. CisModule: de novo discovery of cis-regulatory modules by hierarchical mixture modeling. Proc. Natl Acad. Sci. USA 101, 12114–12119 (2004).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  53. Smith, A. D., Sumazin, P., Xuan, Z. & Zhang, M. Q. DNA motifs in human and mouse proximal promoters predict tissue-specific expression. Proc. Natl Acad. Sci. USA 103, 6275–6280 (2006).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  54. Gertz, J., Siggia, E. D. & Cohen, B. A. Analysis of combinatorial cis-regulation in synthetic and genomic promoters. Nature 457, 215–218 (2009).

    Article  CAS  PubMed  Google Scholar 

  55. Chan, E. T. et al. Conservation of core gene expression in vertebrate tissues. J. Biol. 8, 33 (2009).

    Article  PubMed  PubMed Central  Google Scholar 

  56. Ludwig, M. Z. et al. Functional evolution of a cis-regulatory module. PLoS Biol. 3, e93 (2005).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  57. Hardison, R., Oeltjen, J. & Miller, W. Long human-mouse sequence alignments reveal novel regulatory elements: a reason to sequence the mouse genome. Genome Res. 7, 959–966 (1997).

    Article  CAS  PubMed  Google Scholar 

  58. Hardison, R. C. Conserved noncoding sequences are reliable guides to regulatory elements. Trends Genet. 16, 369–372 (2000).

    Article  CAS  PubMed  Google Scholar 

  59. Pennacchio, L. A. & Rubin, E. M. Genomic strategies to identify mammalian regulatory sequences. Nature Rev. Genet. 2, 100–109 (2001).

    Article  CAS  PubMed  Google Scholar 

  60. Dermitzakis, E. T., Reymond, A. & Antonarakis, S. E. Conserved non-genic sequences — an unexpected feature of mammalian genomes. Nature Rev. Genet. 6, 151–157 (2005).

    Article  CAS  PubMed  Google Scholar 

  61. Tagle, D. A. et al. Embryonic χ and γ globin genes of a prosimian primate (Galago crassicaudatus): nucleotide and amino acid sequences, developmental regulation and phylogenetic footprints. J. Mol. Biol. 203, 7469–7480 (1988).

    Article  Google Scholar 

  62. Gumucio, D. L. et al. Phylogenetic footprinting reveals a nuclear protein which binds to silencer sequences in the human γ and χ globin genes. Mol. Cell. Biol. 12, 4919–4929 (1992).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  63. Hardison, R. et al. Comparative analysis of the locus control region of the rabbit beta-like globin gene cluster: HS3 increases transient expression of an embryonic χ-globin gene. Nucl. Acids Res. 21, 1265–1272 (1993).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  64. Elnitski, L., Miller, W. & Hardison, R. Conserved E boxes function as part of the enhancer in hypersensitive site 2 of the β-globin locus control region: role of basic helix-loop-helix proteins. J. Biol. Chem. 272, 369–378 (1997).

    Article  CAS  PubMed  Google Scholar 

  65. Xie, X. et al. Systematic discovery of regulatory motifs in human promoters and 3′ UTRs by comparison of several mammals. Nature 434, 338–345 (2005).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  66. Stark, A. et al. Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures. Nature 450, 219–232 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  67. Kheradpour, P., Stark, A., Roy, S. & Kellis, M. Reliable prediction of regulator targets using 12 Drosophila genomes. Genome Res. 17, 1919–1931 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  68. Emorine, L., Kuehl, M., Weir, L., Leder, P. & Max, E. E. A conserved sequence in the immunoglobulin Jk-Ck intron: possible enhancer element. Nature 304, 447–449 (1983).

    Article  CAS  PubMed  Google Scholar 

  69. Loots, G. G. et al. Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science 288, 136–140 (2000).

    Article  CAS  PubMed  Google Scholar 

  70. Frazer, K. A. et al. Noncoding sequences conserved in a limited number of mammals in the SIM2 interval are frequently functional. Genome Res. 14, 367–372 (2004).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  71. Grice, E. A., Rochelle, E. S., Green, E. D., Chakravarti, A. & McCallion, A. S. Evaluation of the RET regulatory landscape reveals the biological relevance of a HSCR-implicated enhancer. Hum. Mol. Genet. 14, 3837–3845 (2005).

    Article  CAS  PubMed  Google Scholar 

  72. Johnson, D. S., Davidson, B., Brown, C. D., Smith, W. C. & Sidow, A. Noncoding regulatory sequences of Ciona exhibit strong correspondence between evolutionary constraint and functional importance. Genome Res. 14, 2448–2456 (2004).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  73. Woolfe, A. et al. Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol. 3, e7 (2005). This paper demonstrated that many non-coding regions that are conserved between human and Fugu rubripes show tissue-specific enhancer functions in zebrafish embryos.

    Article  CAS  PubMed  Google Scholar 

  74. Visel, A. et al. Ultraconservation identifies a small subset of extremely constrained developmental enhancers. Nature Genet. 40, 158–160 (2008).

    Article  CAS  PubMed  Google Scholar 

  75. Attanasio, C. et al. Assaying the regulatory potential of mammalian conserved non-coding sequences in human cells. Genome Biol. 9, R168 (2008).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  76. Clark, A. G. et al. Evolution of genes and genomes on the Drosophila phylogeny. Nature 450, 203–218 (2007).

    Article  CAS  PubMed  Google Scholar 

  77. Loots, G. G., Ovcharenko, I., Pachter, L., Dubchak, I. & Rubin, E. M. rVista for comparative sequence-based discovery of functional transcription factor binding sites. Genome Res. 12, 832–839 (2002).

    Article  PubMed  PubMed Central  Google Scholar 

  78. Gibbs, R. A. et al. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428, 493–521 (2004).

    Article  CAS  PubMed  Google Scholar 

  79. Sinha, S., Schroeder, M. D., Unnerstall, U., Gaul, U. & Siggia, E. D. Cross-species comparison significantly improves genome-wide prediction of cis-regulatory modules in Drosophila. BMC Bioinformat. 5, 129 (2004).

    Article  CAS  Google Scholar 

  80. Blanchette, M. et al. Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression. Genome Res. 16, 656–668 (2006). This study combines the identification of motif clusters with sequence constraint to produce a set of cis -regulatory module predictions that capture many known modules.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  81. Heintzman, N. D. et al. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nature Genet. 39, 311–318 (2007). By using a map of histone modifications and binding locations of key transcription factors, the authors have generated a model that predicts novel promoters and enhancers.

    Article  CAS  PubMed  Google Scholar 

  82. Gotea, V. et al. Homotypic clusters of transcription factor binding sites are a key component of human promoters and enhancers. Genome Res. 20, 565–577 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  83. Donaldson, I. J. et al. Genome-wide identification of cis-regulatory sequences controlling blood and endothelial development. Hum. Mol. Genet. 14, 595–601 (2005).

    Article  CAS  PubMed  Google Scholar 

  84. Narlikar, L. et al. Genome-wide discovery of human heart enhancers. Genome Res. 20, 381–392 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  85. Sinha, S. & He, X. MORPH: probabilistic alignment combined with hidden Markov models of cis-regulatory modules. PLoS Comput. Biol. 3, e216 (2007).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  86. Majoros, W. H. & Ohler, U. Modeling the evolution of regulatory elements by simultaneous detection and alignment with phylogenetic pair HMMs. PLoS Comput. Biol. 6, e1001037 (2010).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  87. Taylor, J. et al. ESPERR: learning strong and weak signals in genomic sequence alignments to identify functional elements. Genome Res. 16, 1596–1604 (2006).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  88. Wang, H. et al. Experimental validation of predicted mammalian erythroid cis-regulatory modules. Genome Res. 16, 1480–1492 (2006). In reference 87, the authors describe a 'motif-blind' model for predicting CRMs based on patterns in multi-sequence alignments of known regulatory regions. Reference 88 shows that these the predictions were validated at a good rate.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  89. Miller, W. et al. 28-way vertebrate alignment and conservation track in the UCSC Genome Browser. Genome Res. 17, 1797–1808 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  90. Kantorovitz, M. R. et al. Motif-blind, genome-wide discovery of cis-regulatory modules in Drosophila and mouse. Dev. Cell 17, 568–579 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  91. King, D. C. et al. Finding cis-regulatory elements using comparative genomics: some lessons from ENCODE data. Genome Res. 17, 775–786 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  92. Schmidt, D. et al. Five-vertebrate ChIP–seq reveals the evolutionary dynamics of transcription factor binding. Science 328, 1036–1040 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  93. Boffelli, D., Nobrega, M. A. & Rubin, E. M. Comparative genomics at the vertebrate extremes. Nature Rev. Genet. 5, 456–465 (2004).

    Article  CAS  PubMed  Google Scholar 

  94. Petrykowska, H., Vockley, C. & Elnitski, L. Detection and characterization of silencers and enhancer-blockers in the greater CFTR locus. Genome Res. 18, 1238–1246 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  95. Goldberg, A. D., Allis, C. D. & Bernstein, E. Epigenetics: a landscape takes shape. Cell 128, 635–638 (2007).

    Article  CAS  PubMed  Google Scholar 

  96. Boyd, K. E. & Farnham, P. J. Myc versus USF: discrimination at the cad gene is determined by core promoter elements. Mol. Cell. Biol. 17, 2529–2537 (1997).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  97. Ren, B. et al. Genome-wide location and function of DNA binding proteins. Science 290, 2306–2309 (2000).

    Article  CAS  PubMed  Google Scholar 

  98. Wold, B. & Myers, R. M. Sequence census methods for functional genomics. Nature Methods 5, 19–21 (2008).

    Article  CAS  PubMed  Google Scholar 

  99. Boyle, A. P. et al. High-resolution mapping and characterization of open chromatin across the genome. Cell 132, 311–322 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  100. Hesselberth, J. R. et al. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nature Methods 6, 283–289 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  101. ENCODE Project Consortium. A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 9, e1001046 (2011).

  102. Gerstein, M. B. et al. Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project. Science 330, 1775–1787 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  103. Roy, S. et al. Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science 330, 1787–1797 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  104. Bernstein, B. E. et al. The NIH Roadmap Epigenomics Mapping Consortium. Nature Biotech. 28, 1045–1048 (2010).

    Article  CAS  Google Scholar 

  105. Trinklein, N. D., Aldred, S. J., Saldanha, A. J. & Myers, R. M. Identification and functional analysis of human transcriptional promoters. Genome Res. 13, 308–312 (2003).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  106. Landolin, J. M. et al. Sequence features that drive human promoter function and tissue specificity. Genome Res. 20, 890–898 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  107. Roh, T. Y., Cuddapah, S. & Zhao, K. Active chromatin domains are defined by acetylation islands revealed by genome-wide mapping. Genes Dev. 19, 542–552 (2005).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  108. Heintzman, N. D. et al. Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature 459, 108–112 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  109. Visel, A. et al. ChIP–seq accurately predicts tissue-specific activity of enhancers. Nature 457, 854–858 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  110. Blow, M. J. et al. ChIP–seq identification of weakly conserved heart enhancers. Nature Genet. 42, 806–810 (2010).

    Article  CAS  PubMed  Google Scholar 

  111. Ernst, J. et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49 (2011). This study uses a probabilistic model to combine multiple chromatin marks maps into an integrated model to aid the interpretation of chromatin signatures.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  112. Cheng, Y. et al. Transcriptional enhancement by GATA1-occupied DNA segments is strongly associated with evolutionary constraint on the binding site motif. Genome Res. 18, 1896–1905 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  113. Wilson, N. K. et al. Combinatorial transcriptional control in blood stem/progenitor cells: genome-wide analysis of ten major transcriptional regulators. Cell Stem Cell 7, 532–544 (2010).

    Article  CAS  PubMed  Google Scholar 

  114. Tuan, D. Y., Solomon, W. B., London, I. M. & Lee, D. P. An erythroid-specific, developmental-stage-independent enhancer far upstream of the human “β-like globin” genes. Proc. Natl Acad. Sci. USA 86, 2554–2558 (1989).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  115. West, A. G., Gaszner, M. & Felsenfeld, G. Insulators: many functions, many mechanisms. Genes Dev. 16, 271–288 (2002).

    Article  PubMed  CAS  Google Scholar 

  116. Boyle, A. P. et al. High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells. Genome Res. 21, 456–464 (2011). This paper demonstrates the use of DNase–seq for mapping open chromatin to predict regulatory regions and for footprinting individual transcription factor binding sites.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  117. Rhee, H. S. & Pugh, B. F. Comprehensive genome-wide protein–DNA interactions detected at single-nucleotide resolution. Cell 147, 1408–1419 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  118. Dostie, J. & Dekker, J. Mapping networks of physical interactions between genomic elements using 5C technology. Nature Protoc. 2, 988–1002 (2007).

    Article  CAS  Google Scholar 

  119. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  120. Li, G. et al. Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation. Cell 148, 84–98 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  121. Machanick, P. & Bailey, T. L. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27, 1696–1697 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  122. Kasowski, M. et al. Variation in transcription factor binding among humans. Science 328, 232–235 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  123. Stranger, B. E. et al. Population genomics of human gene expression. Nature Genet. 39, 1217–1224 (2007).

    Article  CAS  PubMed  Google Scholar 

  124. Cheung, V. G. & Spielman, R. S. Genetics of human gene expression: mapping DNA variants that influence gene expression. Nature Rev. Genet. 10, 595–604 (2009).

    Article  CAS  PubMed  Google Scholar 

  125. Gaulton, K. J. et al. A map of open chromatin in human pancreatic islets. Nature Genet. 42, 255–259 (2010).

    Article  CAS  PubMed  Google Scholar 

  126. Kharchenko, P. V. et al. Comprehensive analysis of the chromatin landscape in Drosophila melanogaster. Nature 471, 480–485 (2011).

  127. Hoffman, M. M. et al. Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nature Methods 9, 473–476 (2012).

  128. Badis, G. et al. Diversity and complexity in DNA recognition by transcription factors. Science 324, 1720–1723 (2009).

  129. Gilmour, D. S. & Lis, J. T. Detecting protein–DNA interactions in vivo: distribution of RNA polymerase on specific bacterial genes. Proc. Natl Acad. Sci. USA 81, 4275–4279 (1984).

  130. Johnson, D. S., Mortazavi, A., Myers, R. M. & Wold, B. Genome-wide mapping of in vivo protein–DNA interactions. Science 316, 1497–1502 (2007).

  131. Robertson, G. et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nature Methods 4, 651–657 (2007). These two papers introduce ChIP–seq, which enables the binding locations of transcription factors to be mapped to DNA.

  132. Birney, E. et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799–816 (2007).

  133. He, H. H. et al. Nucleosome dynamics define transcriptional enhancers. Nature Genet. 42, 343–347 (2010).

  134. Staden, R. Methods for calculating the probabilities of finding patterns in sequences. Comput. Appl. Biosci. 5, 89–96 (1989).

  135. Claverie, J. M. & Audic, S. The statistical significance of nucleotide position-weight matrix matches. Comput. Appl. Biosci. 12, 431–439 (1996).

  136. Schones, D. E., Smith, A. D. & Zhang, M. Q. Statistical significance of cis-regulatory modules. BMC Bioinformat. 8, 19 (2007).

  137. Odom, D. T. et al. Tissue-specific transcriptional regulation has diverged significantly between human and mouse. Nature Genet. 39, 730–732 (2007).

  138. Cuellar-Partida, G. et al. Epigenetic priors for identifying active transcription factor binding sites. Bioinformatics 28, 56–62 (2012).

  139. Fujiwara, T. et al. Discovering hematopoietic mechanisms through genome-wide analysis of GATA factor chromatin occupancy. Mol. Cell 36, 667–681 (2009).

  140. Wu, W. et al. Dynamics of the epigenetic landscape during erythroid differentiation after GATA1 restoration. Genome Res. 21, 1659–1671 (2011).

  141. Pollard, K. S., Hubisz, M. J., Rosenbloom, K. R. & Siepel, A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 20, 110–121 (2010).

Acknowledgements

Support is from US National Institutes of Health grants R01 DK065806, RC2 HG005573 and U54 HG004695 and funds from Emory University to J.T.

Author information

Corresponding authors

Correspondence to Ross C. Hardison or James Taylor.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Related links

FURTHER INFORMATION

Ross C. Hardison's homepage

James Taylor's homepage

AlignACE

BLAST

CCGB: Miller lab (for BLASTZ and LASTZ)

ChromHMM (segmentation)

CIS-ANALYST

CisModule

Cistrome Project

COMET

CREAD

ENCODE Project

enhancer_classifier

Ensembl Genome Browser

ESPERR

FindPeaks (peak calling)

Galaxy

Gumby

LAGAN Toolkit Website (for MLAGAN and CHAOS)

MACS (peak calling)

MAQ (peak calling)

Nature Reviews Genetics Series on Regulatory elements

PeakSeq (peak calling)

PipMaker

PReMod database

QuEST (peak calling)

Regulatory sequence analysis tools

Siggia lab Rockefeller University software list (for Ahab)

Software for computational molecular biology (from Miller lab; for yama)

Stormo lab (for PATSER)

Supervised CRM prediction

TAMALPAIS (peak calling)

TFBScluster

UCSC Genome Browser

VISTA tools (for AVID and mVISTA)

VISTA Enhancer Browser

Glossary

Purifying selection

The evolutionary process by which deleterious mutations are removed from a population or genome. Also referred to as negative selection or evolutionary constraint.

Enhancers

DNA sequences that cause increased expression of their target gene (or genes).

Epigenetic features

Molecules and chemical modifications that are associated with genomic DNA, including covalent modifications of DNA and histones, RNA transcribed from the DNA, occupancy of DNA by transcription factors and accessibility of DNA in chromatin to DNases.

Transcription factor binding site

(TFBS). A short segment of DNA that is bound by a particular transcription factor in vivo.

TFBS motif

A short string of DNA base pairs (often 6–10 bp long) constituting the sequence recognized by the DNA-binding domain of the transcription factor.

Promoter

The DNA sequence that directs RNA polymerase to initiate transcription at the correct place.

TFBS consensus

A string of DNA nucleotides describing the most frequently occurring short sequences in a collection of TFBSs, usually including ambiguous positions (for example, R refers to G or A nucleotides).
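
As a rough illustration of how a consensus with ambiguous positions can be matched against a sequence, the sketch below expands the standard IUPAC nucleotide codes into a regular expression. The example consensus WGATAR (a commonly cited form of the GATA factor motif), the test sequence and the function names are chosen here for illustration and do not come from the glossary. Only the forward strand is searched.

```python
import re

# Standard IUPAC nucleotide ambiguity codes (for example, R is A or G).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a compiled regular expression."""
    parts = []
    for code in consensus.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("".join(parts))


def find_consensus_matches(consensus, sequence):
    """Return 0-based start positions of all forward-strand matches.

    Every window is tested separately so that overlapping matches are reported.
    """
    pattern = consensus_to_regex(consensus)
    width = len(consensus)
    seq = sequence.upper()
    return [
        start
        for start in range(len(seq) - width + 1)
        if pattern.fullmatch(seq, start, start + width)
    ]


if __name__ == "__main__":
    # Illustrative sequence containing two matches to WGATAR.
    print(find_consensus_matches("WGATAR", "CCAGATAAGGTGATAGCC"))
```

A consensus match is all-or-nothing: a site that differs at a single position is not reported, which is one reason why matrix-based descriptions are often preferred.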

Position-specific weight matrix

(PWM). A matrix providing the frequency at which each nucleotide is found at the positions of the TFBS consensus.

Silencers

DNA sequences that cause reduced expression of their target gene (or genes).

Insulators

DNA sequences that control the ability of an enhancer to regulate a promoter through an enhancer-blocking activity, a domain barrier function or both.

Position effects

The observation that the level of expression of some genes is affected by their position on chromosomes, with a normal level of expression in one location but altered expression when translocated. For example, proximity to centromeres is associated with lowered expression for many genes.

False positive

In a prediction experiment, a case in which the prediction is positive, but the true class is negative.

Sensitivity

In a prediction experiment, the proportion of the true class that is predicted by the method: that is, (number of true positives)/(number of true positives + number of false negatives).

Positive predictive value

In a prediction experiment, the proportion of positive predictions that are true positives.

Position-specific scoring matrix

(PSSM). A matrix providing the log ratio of frequency at which each nucleotide is found at the positions of the TFBS consensus relative to a background model.

TFBS motif instances

Matches to a TFBS consensus or motif matrix within a longer DNA sequence (for example, a genome or chromosome).
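
The relationship between a frequency matrix, a log-ratio scoring matrix and motif instances can be sketched as follows. This is a minimal example under stated assumptions: the nucleotide counts are invented, the background is assumed to be uniform (0.25 per base), a pseudocount of 1 is added to avoid taking the logarithm of zero, and only the forward strand of a sequence containing A, C, G and T is scanned. All function names are hypothetical.

```python
import math

BASES = "ACGT"


def counts_to_pwm(counts, pseudocount=1.0):
    """Convert per-position nucleotide counts into per-position frequencies (a PWM)."""
    pwm = []
    for column in counts:
        total = sum(column[b] + pseudocount for b in BASES)
        pwm.append({b: (column[b] + pseudocount) / total for b in BASES})
    return pwm


def pwm_to_pssm(pwm, background=None):
    """Convert frequencies into log2(frequency / background frequency) scores (a PSSM)."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumption: uniform background
    return [{b: math.log2(col[b] / background[b]) for b in BASES} for col in pwm]


def score_site(pssm, site):
    """Sum the per-position log-ratio scores for a site of the same width as the motif."""
    if len(site) != len(pssm):
        raise ValueError("site length must equal motif width")
    return sum(col[base] for col, base in zip(pssm, site.upper()))


def find_motif_instances(pssm, sequence, threshold):
    """Return (start, score) for each forward-strand window scoring at or above threshold."""
    width = len(pssm)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        score = score_site(pssm, seq[start:start + width])
        if score >= threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    # Invented counts for a 4-bp motif whose most frequent bases spell GATA.
    counts = [
        {"A": 1, "C": 1, "G": 17, "T": 1},
        {"A": 17, "C": 1, "G": 1, "T": 1},
        {"A": 1, "C": 1, "G": 1, "T": 17},
        {"A": 17, "C": 1, "G": 1, "T": 1},
    ]
    pssm = pwm_to_pssm(counts_to_pwm(counts))
    print(find_motif_instances(pssm, "CCGATACC", threshold=5.0))
```

The threshold trades sensitivity against specificity: lowering it reports more instances, most of which are unlikely to be occupied in vivo.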

Specificity

In a prediction experiment, the proportion of the false class that is not predicted by the method: that is, (number of true negatives)/(number of true negatives + number of false positives).
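
The definitions of sensitivity, positive predictive value and specificity given in this glossary can be written out directly from the four outcome counts of a prediction experiment. The counts in the example are hypothetical and are chosen only to show that high specificity does not guarantee a high positive predictive value when true negatives greatly outnumber true positives.

```python
def sensitivity(true_pos, false_neg):
    """(true positives) / (true positives + false negatives)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """(true negatives) / (true negatives + false positives)."""
    return true_neg / (true_neg + false_pos)


def positive_predictive_value(true_pos, false_pos):
    """(true positives) / (true positives + false positives)."""
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical outcome of testing predicted CRMs against a validated set.
    tp, fp, fn, tn = 80, 120, 20, 9780
    print("sensitivity:", sensitivity(tp, fn))
    print("specificity:", round(specificity(tn, fp), 4))
    print("positive predictive value:", positive_predictive_value(tp, fp))
```

With these made-up counts, specificity is close to 0.99, yet only 40% of the positive predictions are correct.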

Logistic regression

A form of regression used when the output is binary. The predictor is a linear combination of the input variables transformed with the logistic function to form a probability. For classification, the coefficients are learned to maximize the (log) conditional likelihood of the training data.
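
A minimal sketch of this definition follows: a linear combination of the inputs is passed through the logistic function, and the coefficients are adjusted by plain gradient ascent on the log conditional likelihood. The single input feature, the toy data, the learning rate and the function names are assumptions made for illustration; practical analyses would normally use an established statistics library and some form of regularization.

```python
import math


def logistic(z):
    """Logistic function, mapping any real number into the interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # this branch avoids overflow for large negative z
    return ez / (1.0 + ez)


def predict_proba(weights, bias, features):
    """Probability of the positive class for one feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return logistic(z)


def log_likelihood(weights, bias, X, y):
    """Log conditional likelihood of binary labels y given feature vectors X."""
    eps = 1e-12  # clip probabilities so that log(0) is never evaluated
    total = 0.0
    for features, label in zip(X, y):
        p = min(max(predict_proba(weights, bias, features), eps), 1.0 - eps)
        total += math.log(p) if label == 1 else math.log(1.0 - p)
    return total


def fit(X, y, learning_rate=0.1, steps=500):
    """Learn coefficients by batch gradient ascent on the log likelihood."""
    n_features = len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for features, label in zip(X, y):
            error = label - predict_proba(weights, bias, features)
            grad_b += error
            for j, x in enumerate(features):
                grad_w[j] += error * x
        bias += learning_rate * grad_b
        weights = [w + learning_rate * g for w, g in zip(weights, grad_w)]
    return weights, bias


if __name__ == "__main__":
    # Toy data: one hypothetical feature per region; label 1 marks a validated CRM.
    X = [[0.0], [1.0], [2.0], [3.0], [1.5], [1.5]]
    y = [0, 0, 1, 1, 0, 1]
    weights, bias = fit(X, y)
    print(predict_proba(weights, bias, [3.0]), predict_proba(weights, bias, [0.0]))
```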

Homotypic clusters

Clusters of similar transcription factor binding sites that often bind the same transcription factor.

Chromatin immunoprecipitation

(ChIP). A method for purifying the DNA segments that are in close contact with a transcription factor in living cells. After crosslinking DNA to native proteins in cells and preparing sheared chromatin, antibodies that specifically react with one transcription factor are used to isolate the DNA bound to that transcription factor.

ChIP followed by high-throughput sequencing

(ChIP–seq). A technique for mapping the particular segments of DNA purified by chromatin immunoprecipitation (ChIP): it involves massively parallel short-read (second generation) sequencing and then aligning the reads to a reference genome. ChIP–seq is often highly accurate and has close to whole-genome coverage.

Hidden Markov model

A statistical model in which internal states are not visible but the outputs of these states are, and the outputs can therefore be used to infer the internal states. This model can be used to determine biologically relevant states from chromatin immunoprecipitation followed by high-throughput sequencing (ChIP–seq) data sets.
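
To make the idea of inferring hidden states from visible outputs concrete, the sketch below decodes the most probable state path of a two-state model with the Viterbi algorithm. Everything specific in it is an assumption for illustration: the state names, the discretization of signal into "low" and "high" bins, and all probabilities are invented. Segmentation tools learn such parameters from data and use many more states and marks; this example shows only the decoding step.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for a sequence of observations."""
    # best[t][s]: log probability of the best path that ends in state s at position t.
    best = [{
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + math.log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Invented two-state model: "background" bins versus "enriched" bins.
STATES = ("background", "enriched")
START_P = {"background": 0.9, "enriched": 0.1}
TRANS_P = {
    "background": {"background": 0.9, "enriched": 0.1},
    "enriched": {"background": 0.2, "enriched": 0.8},
}
EMIT_P = {
    "background": {"low": 0.9, "high": 0.1},
    "enriched": {"low": 0.2, "high": 0.8},
}

if __name__ == "__main__":
    signal = ["low", "low", "high", "high", "high", "low", "low"]
    print(viterbi(signal, STATES, START_P, TRANS_P, EMIT_P))
```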

Morpholino oligonucleotides

Synthetic oligonucleotides in which the ribose portion of the nucleotide is replaced by a morpholino compound; these are more stable than RNA and can be used to interfere with gene activity in transgenic zebrafish.

ChIP–exo

An extension of ChIP–seq that includes exonuclease trimming after immunoprecipitation to increase the resolution of the mapped transcription-factor-bound sites.

About this article

Cite this article

Hardison, R., Taylor, J. Genomic approaches towards finding cis-regulatory modules in animals. Nat Rev Genet 13, 469–483 (2012). https://doi.org/10.1038/nrg3242

  • DOI: https://doi.org/10.1038/nrg3242
