  • Review Article
  • Published:

In pursuit of design principles of regulatory sequences

Key Points

  • Regulatory DNA sequences contain information on the timing and level at which different genes are expressed. An ongoing challenge is to uncover the means by which this information is encoded and executed, and thereby also advance our understanding of fundamental biological processes such as development, differentiation and disease.

  • High-throughput methods for in vitro characterization of transcription factor (TF) sequence preferences and for in vivo genome-wide measurements of TF binding have improved our understanding of the fundamental 'building blocks' of regulatory sequences, namely TF binding sites (TFBSs). However, these methods also revealed a pronounced gap between predictions based on the in vitro-derived motifs and the binding patterns observed in cells.

  • Different states of regulatory proteins (such as TFs) in different in vivo conditions or cell types, including the formation of complexes and interaction with cofactors, can result in differential sequence preferences. Accounting for these preferences can help to bridge the gap between in vitro characterization of binding specificity and binding profiles obtained in vivo.

  • TFBSs appear in a range of combinations and organizations within regulatory sequences. The extent to which, and the manner in which, the expression outcome depends on the composition and arrangement of these regulatory architectures differ between regulatory sequences. For some regulatory sequences, functional dependencies (for example, on the number of TFBSs or the relative distance between these sites) can be characterized, which provides hints about the mechanisms involved (for example, TF cooperativity).

  • The sequence context in which TFBSs are embedded can have important effects on TF binding and the expression outcome. These include effects of TFBS-flanking base pairs, which may be mediated by DNA shape, and effects related to GC content, which may be mediated by nucleosome occupancy.

  • A surge of new methods that couple the classic reporter assay with high-throughput sequencing provides an efficient means for quantitatively examining the ability of many thousands of different sequences, both derived from native genomic sequences and systematically designed, to drive gene expression. A challenge for future research is to improve our mechanistic and predictive understanding of these data sets, as well as their relationship to the endogenous activity of promoters and enhancers.

  • An improved understanding of various properties of regulatory sequences and their functional implications can facilitate a better interpretation of personal genomes. There are already efforts to incorporate regulatory annotations into genome-wide association studies and expression quantitative trait locus analyses.
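
As a rough illustration of two quantities mentioned in the Key Points above, the following minimal sketch scores candidate TFBSs with a motif-derived log-odds score and computes the GC content of a sequence. The motif values, the uniform background and all function names are hypothetical and chosen only for illustration; they are not taken from the article or from any measured data set. Scores of this kind reflect in vitro-style sequence preference only, which is one reason they may diverge from the binding observed in cells.

```python
import math

# Hypothetical 4-bp motif: per-position base probabilities (illustrative values only).
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def log_odds(site):
    """Sum of per-position log2(motif probability / background) for one site."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(MOTIF, site))


def scan(seq):
    """Score every window of motif width along seq; returns (position, score) pairs."""
    width = len(MOTIF)
    return [(i, log_odds(seq[i:i + width])) for i in range(len(seq) - width + 1)]


def best_site(seq):
    """Return the (position, score) pair with the highest log-odds score."""
    return max(scan(seq), key=lambda hit: hit[1])


def gc_content(seq):
    """Fraction of G and C bases in seq."""
    return sum(base in "GC" for base in seq) / len(seq)
```

Calling best_site on a sequence returns the highest-scoring window under this toy motif, and gc_content gives one simple descriptor of the surrounding sequence context. A predictive model of in vivo binding would need additional inputs such as cofactors, site arrangement and nucleosome occupancy, as the Key Points note.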

Abstract

Instructions for when, where and to what level each gene should be expressed are encoded within regulatory sequences. The importance of motifs recognized by DNA-binding regulators has long been known, but their extensive characterization afforded by recent technologies only partly accounts for how regulatory instructions are encoded in the genome. Here, we review recent advances in our understanding of regulatory sequences that influence transcription and go beyond the description of motifs. We discuss how understanding different aspects of the sequence-encoded regulation can help to unravel the genotype–phenotype relationship, which would lead to a more accurate and mechanistic interpretation of personal genome sequences.


Figure 1: Mechanisms affecting transcription factor binding: beyond simple binding site specificities.
Figure 2: Properties of regulatory sequence architectures.
Figure 3: A possible scheme for the incorporation of regulatory annotations to the analysis of expression quantitative trait loci.

References

  1. Jacob, F. & Monod, J. Genetic regulatory mechanisms in the synthesis of proteins. J. Mol. Biol. 3, 318–356 (1961).

    CAS  PubMed  Google Scholar 

  2. Levine, M. Transcriptional enhancers in animal development and evolution. Curr. Biol. 20, R754–R763 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  3. Williamson, I., Hill, R. E. & Bickmore, W. A. Enhancers: from developmental genetics to the genetics of common human disease. Dev. Cell 21, 17–19 (2011).

    Article  CAS  PubMed  Google Scholar 

  4. Dickel, D. E., Visel, A. & Pennacchio, L. A. Functional anatomy of distant-acting mammalian enhancers. Phil. Trans. R. Soc. B 368, 20120359 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Visel, A., Rubin, E. M. & Pennacchio, L. A. Genomic views of distant-acting enhancers. Nature 461, 199–205 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. Sakabe, N. J., Savic, D. & Nobrega, M. A. Transcriptional enhancers in development and disease. Genome Biol. 13, 238 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Cowper-Sal lari, R. et al. Breast cancer risk-associated SNPs modulate the affinity of chromatin for FOXA1 and alter gene expression. Nature Genet. 44, 1191–1198 (2012).

    Article  CAS  PubMed  Google Scholar 

  9. Sur, I., Tuupanen, S., Whitington, T., Aaltonen, L. A. & Taipale, J. Lessons from functional analysis of genome-wide association studies. Cancer Res. 73, 4180–4184 (2013).

    Article  CAS  PubMed  Google Scholar 

  10. Struhl, K. Yeast transcriptional regulatory mechanisms. Annu. Rev. Genet. 29, 651–674 (1995).

    Article  CAS  PubMed  Google Scholar 

  11. Ptashne, M. & Gann, A. Transcriptional activation by recruitment. Nature 386, 569–577 (1997).

    Article  CAS  PubMed  Google Scholar 

  12. Harbison, C. T. et al. Transcriptional regulatory code of a eukaryotic genome. Nature 431, 99–104 (2004).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Venters, B. J. et al. A comprehensive genomic binding map of gene and chromatin regulatory proteins in Saccharomyces. Mol. Cell 41, 480–492 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. Johnson, D. S., Mortazavi, A., Myers, R. M. & Wold, B. Genome-wide mapping of in vivo protein–DNA interactions. Science 316, 1497–1502 (2007).

    CAS  PubMed  Google Scholar 

  15. Arvey, A., Agius, P., Noble, W. S. & Leslie, C. Sequence and chromatin determinants of cell-type-specific transcription factor binding. Genome Res. 22, 1723–1734 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Rhee, H. S. & Pugh, B. F. Comprehensive genome-wide protein–DNA interactions detected at single-nucleotide resolution. Cell 147, 1408–1419 (2011). This paper presents ChIP-exo, which is an extension of the ChIP–seq protocol. This method substantially improves the accuracy of identifying genomic locations of DNA binding events by using an exonuclease to trim immunoprecipitated DNA to a precise distance from the crosslinking site. It was applied to several yeast TFs and to human CCCTC-binding factor (CTCF). In later studies, it was also applied to yeast pre-initiation complexes and to human initiation factors, which provided mechanistic insights into transcription initiation.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Hesselberth, J. R. et al. Global mapping of protein–DNA interactions in vivo by digital genomic footprinting. Nature Methods 6, 283–289 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Neph, S. et al. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature 489, 83–90 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. Lickwar, C. R., Mueller, F., Hanlon, S. E., McNally, J. G. & Lieb, J. D. Genome-wide protein–DNA binding dynamics suggest a molecular clutch for transcription factor function. Nature 484, 251–255 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. Poorey, K. et al. Measuring chromatin interaction dynamics on the second time scale at single-copy genes. Science 342, 369–372 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  21. Stormo, G. D. & Zhao, Y. Determining the specificity of protein–DNA interactions. Nature Rev. Genet. 11, 751–760 (2010).

    Article  CAS  PubMed  Google Scholar 

  22. MacIsaac, K. D. et al. An improved map of conserved regulatory sites for Saccharomyces cerevisiae. BMC Bioinformatics 7, 113 (2006).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  23. Berger, M. F. & Bulyk, M. L. Universal protein-binding microarrays for the comprehensive characterization of the DNA-binding specificities of transcription factors. Nature Protoc. 4, 393–411 (2009).

    Article  CAS  Google Scholar 

  24. Zhao, Y., Granas, D. & Stormo, G. D. Inferring binding energies from selected binding sites. PLoS Comput. Biol. 5, e1000590 (2009).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  25. Jolma, A. et al. DNA-binding specificities of human transcription factors. Cell 152, 327–339 (2013).

    Article  CAS  PubMed  Google Scholar 

  26. Maerkl, S. J. & Quake, S. R. A systems approach to measuring the binding energy landscapes of transcription factors. Science 315, 233–237 (2007).

    Article  CAS  PubMed  Google Scholar 

  27. Fordyce, P. M. et al. De novo identification and biophysical characterization of transcription-factor binding sites with microfluidic affinity analysis. Nature Biotech. 28, 970–975 (2010).

    Article  CAS  Google Scholar 

  28. Nutiu, R. et al. Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument. Nature Biotech. 29, 659–664 (2011).

    Article  CAS  Google Scholar 

  29. Badis, G. et al. A library of yeast transcription factor motifs reveals a widespread function for Rsc3 in targeting nucleosome exclusion at promoters. Mol. Cell 32, 878–887 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Zhu, C. et al. High-resolution DNA-binding specificity analysis of yeast transcription factors. Genome Res. 19, 556–566 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  31. Grove, C. A. et al. A multiparameter network reveals extensive divergence between C. elegans bHLH transcription factors. Cell 138, 314–327 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. Badis, G. et al. Diversity and complexity in DNA recognition by transcription factors. Science 324, 1720–1723 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Jones, R. B., Gordus, A., Krall, J. A. & MacBeath, G. A quantitative protein interaction network for the ErbB receptors using protein microarrays. Nature 439, 168–174 (2006).

    Article  CAS  PubMed  Google Scholar 

  34. Siggers, T., Duyzend, M. H., Reddy, J., Khan, S. & Bulyk, M. L. Non-DNA-binding cofactors enhance DNA-binding specificity of a transcriptional regulatory complex. Mol. Syst. Biol. 7, 555 (2011).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  35. ENCODE Project Consortium et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  36. Sharon, E. et al. Inferring gene regulatory logic from high-throughput measurements of thousands of systematically designed promoters. Nature Biotech. 30, 521–530 (2012).

    Article  CAS  Google Scholar 

  37. Rajkumar, A. S., Denervaud, N. & Maerkl, S. J. Mapping the fine structure of a eukaryotic promoter input–output function. Nature Genet. 45, 1207–1215 (2013). This study measures the activity of ~200 variants of the PHO5 promoter in yeast that differ in the binding site for the regulating TF Pho4. Temporal promoter activity measurements throughout induction were obtained with a microfluidic-based platform. Previously characterized in vitro affinities were found to be highly predictive of the activity of the corresponding promoter variants in vivo . Subtle tuning of promoter activity could be achieved by manipulating the base pairs flanking the TFBS core.

    Article  CAS  PubMed  Google Scholar 

  38. Mogno, I., Kwasnieski, J. C. & Cohen, B. A. Massively parallel synthetic promoter assays reveal the in vivo effects of binding site variants. Genome Res. 23, 1908–1915 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  39. Kheradpour, P. et al. Systematic dissection of regulatory motifs in 2000 predicted human enhancers using a massively parallel reporter assay. Genome Res. 23, 800–811 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  40. Segal, E., Raveh-Sadka, T., Schroeder, M., Unnerstall, U. & Gaul, U. Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature 451, 535–540 (2008).

    Article  CAS  PubMed  Google Scholar 

  41. Gisselbrecht, S. S. et al. Highly parallel assays of tissue-specific enhancers in whole Drosophila embryos. Nature Methods 10, 774–780 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  42. Liu, X., Lee, C. K., Granek, J. A., Clarke, N. D. & Lieb, J. D. Whole-genome comparison of Leu3 binding in vitro and in vivo reveals the importance of nucleosome occupancy in target site selection. Genome Res. 16, 1517–1528 (2006).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  43. Guertin, M. J., Martins, A. L., Siepel, A. & Lis, J. T. Accurate prediction of inducible transcription factor binding intensities in vivo. PLoS Genet. 8, e1002610 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  44. Zhou, X. & O'Shea, E. K. Integrated approaches reveal determinants of genome-wide binding and function of the transcription factor Pho4. Mol. Cell 42, 826–836 (2011). This paper sets out to bridge the gap between the frequent occurrences of the TF Pho4 motif along the genome and its binding pattern in vivo . It suggests that several mechanisms are at play. Nucleosome occupancy seems to restrict Pho4 binding, which is further tuned by competition with Cbf1 — another TF that has similar sequence preferences. A cooperative interaction between Pho4 and a nearby binding Pho2 is further required to activate transcription.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  45. Gordan, R. et al. Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape. Cell Rep. 3, 1093–1104 (2013). This study uses an extension of PBM to characterize the binding specificities of two E-box-binding TFs — Cbf1 and Tye7 — for putative binding sites in their genomic context. A differential specificity based on the base pairs flanking the core motif is characterized, and a computational model suggests that such specificity is mediated by distinct preferences for three-dimensional DNA shape properties.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  46. Bresnick, E. H., Katsumura, K. R., Lee, H. Y., Johnson, K. D. & Perkins, A. S. Master regulatory GATA transcription factors: mechanistic principles and emerging links to hematologic malignancies. Nucleic Acids Res. 40, 5819–5831 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  47. Siggers, T. & Gordan, R. Protein–DNA binding: complexities and multi-protein codes. Nucleic Acids Res. 42, 2099–2111 (2014).

    Article  CAS  PubMed  Google Scholar 

  48. Slattery, M. et al. Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147, 1270–1282 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  49. Lelli, K. M., Slattery, M. & Mann, R. S. Disentangling the many layers of eukaryotic transcriptional regulation. Annu. Rev. Genet. 46, 43–68 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  50. Biggin, M. D. Animal transcription networks as highly connected, quantitative continua. Dev. Cell 21, 611–626 (2011).

    Article  CAS  PubMed  Google Scholar 

  51. Spitz, F. & Furlong, E. E. Transcription factors: from enhancer binding to developmental control. Nature Rev. Genet. 13, 613–626 (2012).

    Article  CAS  PubMed  Google Scholar 

  52. Papatsenko, D., Goltsev, Y. & Levine, M. Organization of developmental enhancers in the Drosophila embryo. Nucleic Acids Res. 37, 5665–5677 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  53. Erives, A. & Levine, M. Coordinate enhancers share common organizational features in the Drosophila genome. Proc. Natl Acad. Sci. USA 101, 3851–3856 (2004).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  54. Rastegar, S. et al. The words of the regulatory code are arranged in a variable manner in highly conserved enhancers. Dev. Biol. 318, 366–377 (2008).

    Article  CAS  PubMed  Google Scholar 

  55. Lusk, R. W. & Eisen, M. B. Evolutionary mirages: selection on binding site composition creates the illusion of conserved grammars in Drosophila enhancers. PLoS Genet. 6, e1000829 (2010).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  56. Hare, E. E., Peterson, B. K., Iyer, V. N., Meier, R. & Eisen, M. B. Sepsid even-skipped enhancers are functionally conserved in Drosophila despite lack of sequence conservation. PLoS Genet. 4, e1000106 (2008).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  57. Weirauch, M. T. & Hughes, T. R. Conserved expression without conserved regulatory sequence: the more things change, the more they stay the same. Trends Genet. 26, 66–74 (2010).

    Article  CAS  PubMed  Google Scholar 

  58. Zinzen, R. P., Girardot, C., Gagneur, J., Braun, M. & Furlong, E. E. Combinatorial binding predicts spatio-temporal cis-regulatory activity. Nature 462, 65–70 (2009).

    Article  CAS  PubMed  Google Scholar 

  59. Brown, C. D., Johnson, D. S. & Sidow, A. Functional architecture and evolution of transcriptional elements that drive gene coexpression. Science 317, 1557–1560 (2007).

    Article  CAS  PubMed  Google Scholar 

  60. Smith, R. P. et al. Massively parallel decoding of mammalian regulatory sequences supports a flexible organizational model. Nature Genet. 45, 1021–1028 (2013).

    Article  CAS  PubMed  Google Scholar 

  61. Tanay, A. Extensive low-affinity transcriptional interactions in the yeast genome. Genome Res. 16, 962–972 (2006).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  62. Evans, N. C., Swanson, C. I. & Barolo, S. Sparkling insights into enhancer structure, function, and evolution. Curr. Top. Dev. Biol. 98, 97–120 (2012). This review focuses on the sparkling eye enhancer of the D. melanogaster Pax2 (also known as sv ) gene. It discusses various analyses, including the examination of sparkling orthologues and the expression measurements in several cell types of the effects of different manipulations to the composition and arrangement of TFBSs. These analyses reveal a complex combinatorial code that is densely encoded in the enhancer and several highly constrained architectural properties to ensure proper cell-specific expression.

    Article  CAS  PubMed  Google Scholar 

  63. Parker, D. S., White, M. A., Ramos, A. I., Cohen, B. A. & Barolo, S. The cis-regulatory logic of Hedgehog gradient responses: key roles for gli binding affinity, competition, and cooperativity. Sci Signal 4, ra38 (2011).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  64. Rogers, K. W. & Schier, A. F. Morphogen gradients: from generation to interpretation. Annu. Rev. Cell Dev. Biol. 27, 377–407 (2011).

    Article  CAS  PubMed  Google Scholar 

  65. Zaret, K. S. & Carroll, J. S. Pioneer transcription factors: establishing competence for gene expression. Genes Dev. 25, 2227–2241 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  66. Arnosti, D. N. & Kulkarni, M. M. Transcriptional enhancers: intelligent enhanceosomes or flexible billboards? J. Cell Biochem. 94, 890–898 (2005).

    Article  CAS  PubMed  Google Scholar 

  67. Arnosti, D. N., Barolo, S., Levine, M. & Small, S. The eve stripe 2 enhancer employs multiple modes of transcriptional synergy. Development 122, 205–214 (1996).

    CAS  PubMed  Google Scholar 

  68. Liu, F. & Posakony, J. W. Role of architecture in the function and specificity of two Notch-regulated transcriptional enhancer modules. PLoS Genet. 8, e1002796 (2012). This study examines the contribution of architectural properties of two Notch-regulated enhancers to their spatially distinct activities. Although one enhancer is resistant, to a large extent, to manipulations in the arrangement of its constituent TFBSs, the other enhancer is highly sensitive. The authors discuss how this differential reliance on architectural properties may be linked to the different developmental stages and contexts in which these enhancers function.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  69. Senger, K. et al. Immunity regulatory DNAs share common organizational features in Drosophila. Mol. Cell 13, 19–32 (2004).

    Article  CAS  PubMed  Google Scholar 

  70. Crocker, J., Tamori, Y. & Erives, A. Evolution acts on enhancer organization to fine-tune gradient threshold readouts. PLoS Biol. 6, e263 (2008).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  71. Swanson, C. I., Evans, N. C. & Barolo, S. Structural rules and complex regulatory circuitry constrain expression of a Notch- and EGFR-regulated eye enhancer. Dev. Cell 18, 359–370 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  72. Panne, D., Maniatis, T. & Harrison, S. C. An atomic model of the interferon-β enhanceosome. Cell 129, 1111–1123 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  73. Thanos, D. & Maniatis, T. Virus induction of human IFNβ gene expression requires the assembly of an enhanceosome. Cell 83, 1091–1100 (1995).

    Article  CAS  PubMed  Google Scholar 

  74. Junion, G. et al. A transcription factor collective defines cardiac cell fate and reflects lineage history. Cell 148, 473–486 (2012).

    Article  CAS  PubMed  Google Scholar 

  75. Melnikov, A. et al. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nature Biotech. 30, 271–277 (2012).

    Article  CAS  Google Scholar 

  76. Patwardhan, R. P. et al. Massively parallel functional dissection of mammalian enhancers in vivo. Nature Biotech. 30, 265–270 (2012).

    Article  CAS  Google Scholar 

  77. Kwasnieski, J. C., Mogno, I., Myers, C. A., Corbo, J. C. & Cohen, B. A. Complex effects of nucleotide variants in a mammalian cis-regulatory element. Proc. Natl Acad. Sci. USA 109, 19498–19503 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  78. Lagha, M., Bothma, J. P. & Levine, M. Mechanisms of transcriptional precision in animal development. Trends Genet. 28, 409–416 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  79. Guo, Y., Mahony, S. & Gifford, D. K. High resolution genome wide binding event finding and motif discovery reveals transcription factor spatial binding constraints. PLoS Comput. Biol. 8, e1002638 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  80. Tirosh, I. & Barkai, N. Two strategies for gene regulation by promoter nucleosomes. Genome Res. 18, 1084–1091 (2008). This analysis of yeast promoters suggests two typical promoter structures that differ in their nucleosome positions, TFBS composition and location, expression variation and transcriptional plasticity (which is a measure of the degree by which gene expression is modulated across conditions); it contributes to our understanding of how different promoter architectures and dynamics may be used to attain different functional properties of expression.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  81. Field, Y. et al. Distinct modes of regulation by chromatin encoded through nucleosome positioning signals. PLoS Comput. Biol. 4, e1000216 (2008).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  82. Rhee, H. S. & Pugh, B. F. Genome-wide structure and organization of eukaryotic pre-initiation complexes. Nature 483, 295–301 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  83. Leonard, D. A., Rajaram, N. & Kerppola, T. K. Structural basis of DNA bending and oriented heterodimer binding by the basic leucine zipper domains of Fos and Jun. Proc. Natl Acad. Sci. USA 94, 4913–4918 (1997).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  84. Morin, B., Nichols, L. A. & Holland, L. J. Flanking sequence composition differentially affects the binding and functional characteristics of glucocorticoid receptor homo- and heterodimers. Biochemistry 45, 7299–7306 (2006).

    Article  CAS  PubMed  Google Scholar 

  85. Nagaoka, M., Shiraishi, Y. & Sugiura, Y. Selected base sequence outside the target binding site of zinc finger protein Sp1. Nucleic Acids Res. 29, 4920–4929 (2001).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  86. Aow, J. S. et al. Differential binding of the related transcription factors Pho4 and Cbf1 can tune the sensitivity of promoters to different levels of an induction signal. Nucleic Acids Res. 41, 4877–4887 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  87. Rohs, R. et al. Origins of specificity in protein–DNA recognition. Annu. Rev. Biochem. 79, 233–269 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  88. Kornberg, R. D. & Lorch, Y. Twenty-five years of the nucleosome, fundamental particle of the eukaryote chromosome. Cell 98, 285–294 (1999).

    Article  CAS  PubMed  Google Scholar 

  89. Pique-Regi, R. et al. Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res. 21, 447–455 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  90. Kaplan, T. et al. Quantitative models of the mechanisms that control genome-wide patterns of transcription factor binding during early Drosophila development. PLoS Genet. 7, e1001290 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  91. John, S. et al. Chromatin accessibility pre-determines glucocorticoid receptor binding patterns. Nature Genet. 43, 264–268 (2011).

    Article  CAS  PubMed  Google Scholar 

  92. Struhl, K. & Segal, E. Determinants of nucleosome positioning. Nature Struct. Mol. Biol. 20, 267–273 (2013).

    Article  CAS  Google Scholar 

  93. Fu, Y., Sinha, M., Peterson, C. L. & Weng, Z. The insulator binding protein CTCF positions 20 nucleosomes around its binding sites across the human genome. PLoS Genet. 4, e1000138 (2008).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  94. Tillo, D. & Hughes, T. R. G+C content dominates intrinsic nucleosome occupancy. BMC Bioinformatics 10, 442 (2009).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  95. Narlikar, L., Gordan, R. & Hartemink, A. J. A nucleosome-guided map of transcription factor binding sites in yeast. PLoS Comput. Biol. 3, e215 (2007).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  96. Raveh-Sadka, T., Levo, M. & Segal, E. Incorporating nucleosomes into thermodynamic models of transcription regulation. Genome Res. 19, 1480–1496 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  97. Wasson, T. & Hartemink, A. J. An ensemble model of competitive multi-factor binding of the genome. Genome Res. 19, 2101–2112 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  98. Brogaard, K., Xi, L., Wang, J. P. & Widom, J. A map of nucleosome positions in yeast at base-pair resolution. Nature 486, 496–501 (2012). This paper presents a chemical-based approach to map nucleosome positions genome wide at a single-base-pair resolution. This method reveals overlapping positions within the population and allows a high-resolution examination of nucleosome positions relative to sequence and genomic features such as TSS, TFBSs and Pol II pause sites.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  99. Khoueiry, P. et al. A cis-regulatory signature in ascidians and flies, independent of transcription factor binding sites. Curr. Biol. 20, 792–802 (2010).

    Article  CAS  PubMed  Google Scholar 

  100. Segal, E. & Widom, J. Poly(dA:dT) tracts: major determinants of nucleosome organization. Curr. Opin. Struct. Biol. 19, 65–71 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  101. Iyer, V. & Struhl, K. Poly(dA:dT), a ubiquitous promoter element that stimulates transcription via its intrinsic DNA structure. EMBO J. 14, 2570–2579 (1995).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  102. Raveh-Sadka, T. et al. Manipulating nucleosome disfavoring sequences allows fine-tune regulation of gene expression in yeast. Nature Genet. 44, 743–750 (2012).

    Article  CAS  PubMed  Google Scholar 

  103. Tillo, D. et al. High nucleosome occupancy is encoded at human regulatory sequences. PLoS ONE 5, e9129 (2010).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  104. Ballare, C. et al. Nucleosome-driven transcription factor binding and gene regulation. Mol. Cell 49, 67–79 (2013).

    Article  CAS  PubMed  Google Scholar 

  105. White, M. A., Myers, C. A., Corbo, J. C. & Cohen, B. A. Massively parallel in vivo enhancer assay reveals that highly local features determine the cis-regulatory function of ChIP–seq peaks. Proc. Natl Acad. Sci. USA 110, 11952–11957 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  106. Degner, J. F. et al. DNase I sensitivity QTLs are a major determinant of human expression variation. Nature 482, 390–394 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  107. Schaub, M. A., Boyle, A. P., Kundaje, A., Batzoglou, S. & Snyder, M. Linking disease associations with regulatory information in the human genome. Genome Res. 22, 1748–1759 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  108. Gaffney, D. J. et al. Dissecting the regulatory architecture of gene expression QTLs. Genome Biol. 13, R7 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  109. Lee, S. I. et al. Learning a prior on regulatory potential from eQTL data. PLoS Genet. 5, e1000358 (2009).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  110. Manor, O. & Segal, E. Robust prediction of expression differences among human individuals using only genotype information. PLoS Genet. 9, e1003396 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  111. Bintu, L. et al. Transcriptional regulation by the numbers: models. Curr. Opin. Genet. Dev. 15, 116–124 (2005).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  112. Segal, E. & Widom, J. From DNA sequence to transcriptional behaviour: a quantitative approach. Nature Rev. Genet. 10, 443–456 (2009).

    Article  CAS  PubMed  Google Scholar 

  113. Kadonaga, J. T. Perspectives on the RNA polymerase II core promoter. Wiley Interdiscip. Rev. Dev. Biol. 1, 40–51 (2012).

    Article  CAS  PubMed  Google Scholar 

  114. Mogno, I., Vallania, F., Mitra, R. D. & Cohen, B. A. TATA is a modular component of synthetic promoters. Genome Res. 20, 1391–1397 (2010).

  115. Lubliner, S., Keren, L. & Segal, E. Sequence features of yeast and human core promoters that are predictive of maximal promoter activity. Nucleic Acids Res. 41, 5569–5581 (2013).

  116. Schoenberg, D. R. & Maquat, L. E. Regulation of cytoplasmic mRNA decay. Nature Rev. Genet. 13, 246–259 (2012).

  117. Burgess, D. J. Global analyses of determinants of RNA decay. Nature Rev. Genet. http://dx.doi.org/10.1038/nrg3710 (2014).

  118. Kornblihtt, A. R. et al. Alternative splicing: a pivotal step between eukaryotic transcription and translation. Nature Rev. Mol. Cell Biol. 14, 153–165 (2013).

  119. Ingolia, N. T. Ribosome profiling: new views of translation, from single codons to genome scale. Nature Rev. Genet. 15, 205–213 (2014).

  120. Natoli, G. & Andrau, J. C. Noncoding transcription at enhancers: general principles and functional models. Annu. Rev. Genet. 46, 1–19 (2012).

  121. Lam, M. T., Li, W., Rosenfeld, M. G. & Glass, C. K. Enhancer RNAs and regulated transcriptional programs. Trends Biochem. Sci. 39, 170–182 (2014).

  122. Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).

  123. Lohmueller, J. J., Armel, T. Z. & Silver, P. A. A tunable zinc finger-based framework for Boolean logic computation in mammalian cells. Nucleic Acids Res. 40, 5180–5187 (2012).

  124. Teo, W. S. & Chang, M. W. Development and characterization of AND-gate dynamic controllers with a modular synthetic GAL1 core promoter in Saccharomyces cerevisiae. Biotechnol. Bioeng. 111, 144–151 (2013).

  125. Perez-Pinera, P. et al. Synergistic and tunable human gene activation by combinations of synthetic transcription factors. Nature Methods 10, 239–242 (2013).

  126. Gaj, T., Gersbach, C. A. & Barbas, C. F. 3rd. ZFN, TALEN, and CRISPR/Cas-based methods for genome engineering. Trends Biotechnol. 31, 397–405 (2013).

  127. Kasinathan, S., Orsi, G. A., Zentner, G. E., Ahmad, K. & Henikoff, S. High-resolution mapping of transcription factor binding sites on native chromatin. Nature Methods 11, 203–209 (2014).

  128. Vierstra, J., Wang, H., John, S., Sandstrom, R. & Stamatoyannopoulos, J. A. Coupling transcription factor occupancy to nucleosome architecture with DNase–FLASH. Nature Methods 11, 66–72 (2014).

  129. Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nature Methods 10, 1213–1218 (2013).

  130. Rajewsky, N., Vergassola, M., Gaul, U. & Siggia, E. D. Computational detection of genomic cis-regulatory modules applied to body patterning in the early Drosophila embryo. BMC Bioinformatics 3, 30 (2002).

  131. Berman, B. P. et al. Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc. Natl Acad. Sci. USA 99, 757–762 (2002).

  132. Heintzman, N. D. et al. Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature 459, 108–112 (2009).

  133. Ernst, J. et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49 (2011).

  134. Simon, J. M., Giresi, P. G., Davis, I. J. & Lieb, J. D. Using formaldehyde-assisted isolation of regulatory elements (FAIRE) to isolate active regulatory DNA. Nature Protoc. 7, 256–267 (2012).

  135. Khambata-Ford, S. et al. Identification of promoter regions in the human genome by using a retroviral plasmid library-based functional reporter gene assay. Genome Res. 13, 1765–1774 (2003).

  136. Jory, A. et al. A survey of 6,300 genomic fragments for cis-regulatory activity in the imaginal discs of Drosophila melanogaster. Cell Rep. 2, 1014–1024 (2012).

  137. Jenett, A. et al. A GAL4-driver line resource for Drosophila neurobiology. Cell Rep. 2, 991–1001 (2012).

  138. Manning, L. et al. A resource for manipulating gene expression and analyzing cis-regulatory modules in the Drosophila CNS. Cell Rep. 2, 1002–1013 (2012).

  139. Arnold, C. D. et al. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science 339, 1074–1077 (2013).

  140. Dickel, D. E. et al. Function-based identification of mammalian enhancers using site-specific integration. Nature Methods 11, 566–571 (2014).

  141. Murtha, M. et al. FIREWACh: high-throughput functional detection of transcriptional regulatory modules in mammalian cells. Nature Methods 11, 559–565 (2014).

  142. Kinney, J. B., Murugan, A., Callan, C. G. Jr & Cox, E. C. Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. Proc. Natl Acad. Sci. USA 107, 9158–9163 (2010).

  143. Kosuri, S. et al. Composability of regulatory sequences controlling transcription and translation in Escherichia coli. Proc. Natl Acad. Sci. USA 110, 14024–14029 (2013).

  144. Patwardhan, R. P. et al. High-resolution analysis of DNA regulatory elements by synthetic saturation mutagenesis. Nature Biotech. 27, 1173–1175 (2009).

  145. Kleinjan, D. A. & van Heyningen, V. Long-range control of gene expression: emerging mechanisms and disruption in disease. Am. J. Hum. Genet. 76, 8–32 (2005).

  146. Gibcus, J. H. & Dekker, J. The hierarchy of the 3D genome. Mol. Cell 49, 773–782 (2013).

  147. Smallwood, A. & Ren, B. Genome organization and long-range regulation of gene expression by enhancers. Curr. Opin. Cell Biol. 25, 387–394 (2013).

  148. Nagano, T. et al. Single-cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature 502, 59–64 (2013).

  149. Zhang, Y. et al. Chromatin connectivity maps reveal dynamic promoter-enhancer long-range associations. Nature 504, 306–310 (2013).

  150. Phillips-Cremins, J. E. et al. Architectural protein subclasses shape 3D organization of genomes during lineage commitment. Cell 153, 1281–1295 (2013).

  151. Barolo, S. Shadow enhancers: frequently asked questions about distributed cis-regulatory information and enhancer redundancy. Bioessays 34, 135–141 (2012).

Acknowledgements

This work was supported by grants from the European Research Council and the US National Institutes of Health to E.S. M.L. thanks the Azrieli Foundation for the award of an Azrieli Fellowship.

Author information

Corresponding author

Correspondence to Eran Segal.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

Genome-wide association studies

(GWASs). Genome-wide studies that are designed to identify genetic associations with an observable trait, disease or condition.

Expression quantitative trait locus

(eQTL). A locus at which genetic variation is associated with variation in gene expression levels.

In vivo

In the context of this Review, in vivo refers to experiments carried out in living cells, regardless of whether the cells are within or outside a whole organism (sometimes referred to as ex vivo).

Position weight matrices

(PWMs). Representations of the specificity of DNA-binding proteins, in which a score is assigned to every possible base pair at each position in the binding site. The PWM score for a specific sequence is the sum of the position-specific scores of its base pairs.
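
The additive scoring described in this entry can be sketched in a few lines of Python. This is a minimal illustration, not a method from the Review: the matrix values, the names `EXAMPLE_PWM`, `pwm_score` and `best_window`, and the 4-bp site width are all hypothetical.

```python
# Hypothetical 4-position PWM; the scores are invented for illustration only.
# Each position maps every possible base to a position-specific score.
EXAMPLE_PWM = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -0.5},
    {"A": -1.0, "C": 1.5, "G": -0.5, "T": -1.0},
    {"A": -1.0, "C": -0.5, "G": 2.0, "T": -1.0},
    {"A": -0.5, "C": -1.0, "G": -1.0, "T": 1.0},
]


def pwm_score(pwm, site):
    """Return the PWM score of a site: the sum of its position-specific scores."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal the PWM width")
    return sum(column[base] for column, base in zip(pwm, site))


def best_window(pwm, sequence):
    """Scan a longer sequence and return (offset, score) of the top-scoring window."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the PWM")
    scored = [
        (pwm_score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    score, offset = max(scored)
    return offset, score


print(pwm_score(EXAMPLE_PWM, "ACGT"))
print(best_window(EXAMPLE_PWM, "GGACGTAA"))
```

Because the score is a plain sum, each position contributes independently of its neighbours, which is the simplifying assumption built into a PWM.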

Dissociation constant

(Kd). The equilibrium dissociation constant between two molecules (in this context, a transcription factor and a DNA sequence). It is the ratio of the off rate (dissolution of the complex) to the on rate (formation of the complex).
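
As a worked illustration of this ratio, the sketch below computes Kd from a pair of rate constants and the equilibrium occupancy of a single site. It assumes simple 1:1 binding with the free transcription factor concentration approximately equal to its total concentration; the function names and the rate values are hypothetical.

```python
def dissociation_constant(k_off, k_on):
    """Kd = k_off / k_on (k_off in 1/s, k_on in 1/(M*s), so Kd is in M)."""
    if k_on <= 0:
        raise ValueError("k_on must be positive")
    return k_off / k_on


def fraction_bound(tf_concentration, kd):
    """Equilibrium occupancy of one site under 1:1 binding: [TF] / ([TF] + Kd)."""
    return tf_concentration / (tf_concentration + kd)


# Illustrative rate constants, not measured values.
kd = dissociation_constant(k_off=0.01, k_on=1e6)
print(kd)
print(fraction_bound(kd, kd))
```

Under these assumptions a site is half occupied when the transcription factor concentration equals Kd, so a lower Kd corresponds to tighter binding.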

Orthologous

Pertaining to loci in two species that are derived from a common ancestral locus.

Homotypic TFBS cluster

A cluster of multiple transcription factor binding sites for the same transcription factor.

Heterotypic clustering

Clustering of multiple transcription factor binding sites for different transcription factors.

Regression model

A model that describes the relationship between a dependent variable and one or more independent variables.

Linkage disequilibrium

(LD). A nonrandom association of alleles at different loci (as might be observed for particular alleles at neighbouring loci that tend to be co-inherited).
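
The entry does not name a specific statistic. As an assumption, the sketch below uses two standard measures for a pair of biallelic loci, the coefficient D and the squared correlation r², computed from haplotype frequencies; the function name and the example frequencies are hypothetical.

```python
def linkage_disequilibrium(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, r_squared) for two biallelic loci from haplotype frequencies.

    D = p_AB - p_A * p_B measures the departure from independent association.
    """
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # frequency of allele A at the first locus
    p_B = p_AB + p_aB  # frequency of allele B at the second locus
    d = p_AB - p_A * p_B
    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    r_squared = d * d / denominator if denominator > 0 else 0.0
    return d, r_squared


# Alleles always co-inherited: complete LD.
print(linkage_disequilibrium(0.5, 0.0, 0.0, 0.5))
# Alleles associated at random: no LD.
print(linkage_disequilibrium(0.25, 0.25, 0.25, 0.25))
```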

DNase I sensitivity QTLs

(dsQTLs). Locations at which DNase I hypersensitive site sequencing read depth significantly correlates with the genotypes at nearby single-nucleotide polymorphisms, or insertions or deletions.

Bayesian approach

A modelling approach that uses Bayes' rule and that computes a posterior probability that a hypothesis is true using a combination of prior beliefs and observed data.
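
A minimal numerical sketch of Bayes' rule for a single binary hypothesis follows. The function name and the probabilities are hypothetical and chosen only to show how a prior belief is updated by observed data.

```python
def posterior_probability(prior, p_data_given_h, p_data_given_not_h):
    """Bayes' rule: P(H | data) = P(data | H) * P(H) / P(data)."""
    evidence = prior * p_data_given_h + (1 - prior) * p_data_given_not_h
    if evidence == 0:
        raise ValueError("the observed data has zero probability under both hypotheses")
    return prior * p_data_given_h / evidence


# Illustrative numbers: a 1% prior that a variant is regulatory, and data that
# is nine times more likely if the hypothesis is true than if it is false.
print(posterior_probability(prior=0.01, p_data_given_h=0.9, p_data_given_not_h=0.1))
```

With these invented numbers the posterior rises to roughly 8%, which shows that strong evidence still yields a modest posterior when the prior is low.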

k-nearest neighbours

(KNN). A non-parametric regression method that predicts the value of a new point on the basis of the values of the k closest training points in the feature space.
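
The prediction step in this entry can be sketched as follows. This is a minimal illustration that assumes Euclidean distance and an unweighted mean of the k neighbours; neither choice is specified in the entry, and the function name and toy data are hypothetical.

```python
import math


def knn_predict(train_points, train_values, query, k=3):
    """Predict a value for `query` as the mean of its k closest training points."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Pair each training value with its Euclidean distance to the query point.
    by_distance = sorted(
        (math.dist(point, query), value)
        for point, value in zip(train_points, train_values)
    )
    nearest = by_distance[:k]
    return sum(value for _, value in nearest) / k


# Toy two-feature training set with invented values.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
values = [1.0, 2.0, 3.0, 10.0]
print(knn_predict(points, values, query=(0.1, 0.1), k=3))
```

The method is non-parametric in that no functional form is fitted: the training points themselves are the model, and k controls how much the prediction is smoothed.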

About this article

Cite this article

Levo, M., Segal, E. In pursuit of design principles of regulatory sequences. Nat Rev Genet 15, 453–468 (2014). https://doi.org/10.1038/nrg3684

  • DOI: https://doi.org/10.1038/nrg3684
