Regulatory DNA sequences contain information on the timing and level at which different genes are expressed. An ongoing challenge is to uncover the means by which this information is encoded and executed, and thereby also advance our understanding of fundamental biological processes such as development, differentiation and disease.
High-throughput methods for in vitro characterization of transcription factor (TF) sequence preferences and for in vivo genome-wide measurements of TF binding have improved our understanding of the fundamental 'building blocks' of regulatory sequences, namely TF binding sites (TFBSs). However, these methods have also revealed a pronounced gap between predictions based on in vitro-derived motifs and the binding patterns observed in cells.
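The Python below is a minimal sketch of how an in vitro-derived motif is typically applied to a sequence. It converts per-position base counts into a log-odds position weight matrix (PWM) and scans both strands for windows above a score threshold. Several inputs are illustrative assumptions, not values from any cited study:

- the counts, an idealized CACGTG E-box;
- the uniform background;
- the pseudocount;
- the threshold.

As noted above, a high PWM score alone does not guarantee binding in cells.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_pwm(counts, background=None, pseudocount=0.5):
    """Turn per-position base counts into a log2-odds PWM (one dict per position)."""
    background = background or {b: 0.25 for b in BASES}  # assumed uniform background
    pwm = []
    for column in counts:
        total = sum(column[b] + pseudocount for b in BASES)
        pwm.append({
            b: math.log2((column[b] + pseudocount) / total / background[b])
            for b in BASES
        })
    return pwm


def pwm_score(pwm, site):
    """PWM score of a site: the sum of the position-specific scores of its bases."""
    return sum(column[base] for column, base in zip(pwm, site))


def scan(pwm, seq, threshold):
    """Return (start, strand, score) for every window scoring >= threshold on either strand."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        reverse_complement = window.translate(COMPLEMENT)[::-1]
        for strand, site in (("+", window), ("-", reverse_complement)):
            s = pwm_score(pwm, site)
            if s >= threshold:
                hits.append((start, strand, s))
    return hits


# Hypothetical counts: 17 observations of the consensus base, 1 of each other base.
consensus = "CACGTG"
counts = [{b: (17 if b == c else 1) for b in BASES} for c in consensus]
pwm = log_odds_pwm(counts)
hits = scan(pwm, "TTCACGTGAA", threshold=8.0)
print(hits)
```

With these settings only the exact CACGTG window passes. Because the motif is palindromic, it is reported on both strands. Lowering the threshold admits weaker, mismatched sites, which is one reason motif matches can far outnumber observed binding events.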
Different states of regulatory proteins (such as TFs) in different in vivo conditions or cell types, including the formation of complexes and interaction with cofactors, can result in differential sequence preferences. Accounting for these preferences can help to bridge the gap between in vitro characterization of binding specificity and binding profiles obtained in vivo.
TFBSs appear in a range of combinations and arrangements within regulatory sequences. Regulatory sequences differ in the extent to which, and the manner in which, the expression outcome depends on the composition and arrangement of these regulatory architectures. For some regulatory sequences, functional dependencies (for example, on the number of TFBSs or on the relative distance between them) can be characterized, which provides insight into the mechanisms involved (for example, TF cooperativity).
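Dependencies on TFBS number and cooperativity are often formalized with an equilibrium (thermodynamic) occupancy model. The sketch below is a simplified illustration, not the model of any particular study. It enumerates every bound/unbound configuration of n identical sites. Each configuration is weighted by (concentration/Kd) per bound site and by a hypothetical cooperativity factor `omega` per pair of adjacent bound sites. The function returns the expected fraction of sites occupied. Two choices here are assumptions made for illustration:

- treating occupancy as a proxy for expression;
- limiting cooperativity to adjacent sites.

```python
from itertools import product


def mean_occupancy(conc, kd, n_sites, omega=1.0):
    """Expected fraction of n_sites bound by a TF at equilibrium.

    Each configuration's statistical weight is (conc / kd) ** n_bound multiplied by
    omega ** n_adjacent_pairs, where n_adjacent_pairs counts neighbouring sites that
    are both bound (omega > 1: cooperative; omega == 1: independent sites).
    """
    x = conc / kd
    partition_function = 0.0
    weighted_bound = 0.0
    for config in product((0, 1), repeat=n_sites):
        n_bound = sum(config)
        n_adjacent_pairs = sum(a * b for a, b in zip(config, config[1:]))
        weight = x ** n_bound * omega ** n_adjacent_pairs
        partition_function += weight
        weighted_bound += weight * n_bound
    return weighted_bound / (partition_function * n_sites)


for conc in (0.1, 0.5, 1.0, 2.0):
    independent = mean_occupancy(conc, kd=1.0, n_sites=3)
    cooperative = mean_occupancy(conc, kd=1.0, n_sites=3, omega=10.0)
    print(f"[TF]={conc:4.1f}  independent={independent:.3f}  cooperative={cooperative:.3f}")
```

With `n_sites=1` the function reduces to conc/(conc + Kd). With `omega > 1` the occupancy curve becomes higher and more switch-like. This is one way cooperativity can make expression depend on the number of sites. A spacing-dependent `omega` would be a natural extension but is not implemented here.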
The sequence context in which TFBSs are embedded can have important effects on TF binding and on the expression outcome. These include effects of the base pairs flanking a TFBS, which may be mediated by DNA shape, and effects related to GC content, which may be mediated by nucleosome occupancy.
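To make the GC-content point concrete, the sketch below computes two crude sequence features that the cited literature links to intrinsic nucleosome occupancy:

- GC fraction in sliding windows (GC-rich sequence tends to favour nucleosomes);
- runs of consecutive A or T bases, that is, poly(dA:dT) tracts (these tend to disfavour nucleosomes).

The window size, step and minimum tract length are arbitrary assumptions. These features are proxies, not a model of nucleosome occupancy.

```python
import re


def gc_profile(seq, window=50, step=10):
    """GC fraction in sliding windows, returned as (window_start, gc_fraction) pairs."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        profile.append((start, sum(base in "GC" for base in chunk) / window))
    return profile


def poly_da_dt_tracts(seq, min_len=8):
    """(start, end) of runs of >= min_len A's or T's; a T run is an A run on the other strand."""
    pattern = re.compile("A{%d,}|T{%d,}" % (min_len, min_len))
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


seq = "GCGC" * 5 + "A" * 10 + "GC" * 5
print(gc_profile(seq, window=20, step=20))  # [(0, 1.0), (20, 0.5)]
print(poly_da_dt_tracts(seq))               # [(20, 30)]
```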
A surge of new methods that couple the classic reporter assay with high-throughput sequencing provides an efficient means of quantitatively examining the ability of many thousands of different sequences (both native genomic sequences and systematically designed ones) to drive gene expression. A challenge for future research is to improve our mechanistic and predictive understanding of these data sets, as well as of their relationship to the endogenous activity of promoters and enhancers.
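Analyses of such sequencing-based reporter data commonly start from a per-sequence activity estimate of the form log2(RNA/DNA). Barcodes transcribed from the reporter are counted both in RNA and in the input plasmid DNA. The sketch below is one simple version of that computation, assuming a barcode-to-variant map and per-barcode read counts. Several elements are assumptions:

- the names;
- the counts-per-million normalization;
- the pseudocount;
- the minimum-DNA filter.

Published pipelines differ, for example in how they combine barcodes and replicates.

```python
import math
from collections import defaultdict


def mpra_activity(barcode_to_variant, rna_counts, dna_counts, pseudocount=1.0, min_dna=10):
    """Per-variant log2(RNA/DNA) from barcode read counts, after library-size normalisation.

    Barcodes with fewer than min_dna plasmid reads are skipped, because their
    ratios are dominated by sampling noise.
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    rna_cpm = defaultdict(float)  # summed RNA counts per million, per variant
    dna_cpm = defaultdict(float)  # summed DNA counts per million, per variant
    for barcode, variant in barcode_to_variant.items():
        dna = dna_counts.get(barcode, 0)
        if dna < min_dna:
            continue
        rna_cpm[variant] += rna_counts.get(barcode, 0) / rna_total * 1e6
        dna_cpm[variant] += dna / dna_total * 1e6
    return {
        variant: math.log2((rna_cpm[variant] + pseudocount) / (dna_cpm[variant] + pseudocount))
        for variant in dna_cpm
    }


# Hypothetical toy library: two barcodes per designed sequence.
barcodes = {"bc1": "enhA", "bc2": "enhA", "bc3": "enhB", "bc4": "enhB"}
rna = {"bc1": 400, "bc2": 400, "bc3": 100, "bc4": 100}
dna = {"bc1": 100, "bc2": 100, "bc3": 100, "bc4": 5}  # bc4 is filtered out (5 < min_dna)
print(mpra_activity(barcodes, rna, dna))
```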
An improved understanding of various properties of regulatory sequences and their functional implications can facilitate a better interpretation of personal genomes. There are already efforts to incorporate regulatory annotations into genome-wide association studies and expression quantitative trait locus analyses.
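At its simplest, an eQTL test regresses a gene's expression level on the genotype at a nearby variant. The genotype is coded as allele dosage: 0, 1 or 2 copies of the alternative allele. The sketch below fits that ordinary least-squares line and reports the slope, the intercept, the Pearson correlation and the t statistic for the slope. The data are invented. Real analyses also:

- add covariates;
- apply normalization;
- correct for multiple testing;
- convert the t statistic (n - 2 degrees of freedom) into a P value.

```python
import math


def eqtl_regression(dosages, expression):
    """OLS fit of expression ~ genotype dosage.

    Returns (slope, intercept, pearson_r, t_statistic); the t statistic tests
    slope == 0 and has len(dosages) - 2 degrees of freedom.
    """
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return slope, intercept, r, t


# Hypothetical data: six individuals, expression rising with each alternative allele.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
print(eqtl_regression(dosages, expression))
```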
Instructions for when, where and to what level each gene should be expressed are encoded within regulatory sequences. The importance of motifs recognized by DNA-binding regulators has long been known, but their extensive characterization afforded by recent technologies only partly accounts for how regulatory instructions are encoded in the genome. Here, we review recent advances in our understanding of regulatory sequences that influence transcription and go beyond the description of motifs. We discuss how understanding different aspects of the sequence-encoded regulation can help to unravel the genotype–phenotype relationship, which would lead to a more accurate and mechanistic interpretation of personal genome sequences.
Jacob, F. & Monod, J. Genetic regulatory mechanisms in the synthesis of proteins. J. Mol. Biol. 3, 318–356 (1961).
Levine, M. Transcriptional enhancers in animal development and evolution. Curr. Biol. 20, R754–R763 (2010).
Williamson, I., Hill, R. E. & Bickmore, W. A. Enhancers: from developmental genetics to the genetics of common human disease. Dev. Cell 21, 17–19 (2011).
Dickel, D. E., Visel, A. & Pennacchio, L. A. Functional anatomy of distant-acting mammalian enhancers. Phil. Trans. R. Soc. B 368, 20120359 (2013).
Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
Visel, A., Rubin, E. M. & Pennacchio, L. A. Genomic views of distant-acting enhancers. Nature 461, 199–205 (2009).
Sakabe, N. J., Savic, D. & Nobrega, M. A. Transcriptional enhancers in development and disease. Genome Biol. 13, 238 (2012).
Cowper-Sal·lari, R. et al. Breast cancer risk-associated SNPs modulate the affinity of chromatin for FOXA1 and alter gene expression. Nature Genet. 44, 1191–1198 (2012).
Sur, I., Tuupanen, S., Whitington, T., Aaltonen, L. A. & Taipale, J. Lessons from functional analysis of genome-wide association studies. Cancer Res. 73, 4180–4184 (2013).
Struhl, K. Yeast transcriptional regulatory mechanisms. Annu. Rev. Genet. 29, 651–674 (1995).
Ptashne, M. & Gann, A. Transcriptional activation by recruitment. Nature 386, 569–577 (1997).
Harbison, C. T. et al. Transcriptional regulatory code of a eukaryotic genome. Nature 431, 99–104 (2004).
Venters, B. J. et al. A comprehensive genomic binding map of gene and chromatin regulatory proteins in Saccharomyces. Mol. Cell 41, 480–492 (2011).
Johnson, D. S., Mortazavi, A., Myers, R. M. & Wold, B. Genome-wide mapping of in vivo protein–DNA interactions. Science 316, 1497–1502 (2007).
Arvey, A., Agius, P., Noble, W. S. & Leslie, C. Sequence and chromatin determinants of cell-type-specific transcription factor binding. Genome Res. 22, 1723–1734 (2012).
Rhee, H. S. & Pugh, B. F. Comprehensive genome-wide protein–DNA interactions detected at single-nucleotide resolution. Cell 147, 1408–1419 (2011). This paper presents ChIP-exo, which is an extension of the ChIP–seq protocol. This method substantially improves the accuracy of identifying genomic locations of DNA binding events by using an exonuclease to trim immunoprecipitated DNA to a precise distance from the crosslinking site. It was applied to several yeast TFs and to human CCCTC-binding factor (CTCF). In later studies, it was also applied to yeast pre-initiation complexes and to human initiation factors, which provided mechanistic insights into transcription initiation.
Hesselberth, J. R. et al. Global mapping of protein–DNA interactions in vivo by digital genomic footprinting. Nature Methods 6, 283–289 (2009).
Neph, S. et al. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature 489, 83–90 (2012).
Lickwar, C. R., Mueller, F., Hanlon, S. E., McNally, J. G. & Lieb, J. D. Genome-wide protein–DNA binding dynamics suggest a molecular clutch for transcription factor function. Nature 484, 251–255 (2012).
Poorey, K. et al. Measuring chromatin interaction dynamics on the second time scale at single-copy genes. Science 342, 369–372 (2013).
Stormo, G. D. & Zhao, Y. Determining the specificity of protein–DNA interactions. Nature Rev. Genet. 11, 751–760 (2010).
MacIsaac, K. D. et al. An improved map of conserved regulatory sites for Saccharomyces cerevisiae. BMC Bioinformatics 7, 113 (2006).
Berger, M. F. & Bulyk, M. L. Universal protein-binding microarrays for the comprehensive characterization of the DNA-binding specificities of transcription factors. Nature Protoc. 4, 393–411 (2009).
Zhao, Y., Granas, D. & Stormo, G. D. Inferring binding energies from selected binding sites. PLoS Comput. Biol. 5, e1000590 (2009).
Jolma, A. et al. DNA-binding specificities of human transcription factors. Cell 152, 327–339 (2013).
Maerkl, S. J. & Quake, S. R. A systems approach to measuring the binding energy landscapes of transcription factors. Science 315, 233–237 (2007).
Fordyce, P. M. et al. De novo identification and biophysical characterization of transcription-factor binding sites with microfluidic affinity analysis. Nature Biotech. 28, 970–975 (2010).
Nutiu, R. et al. Direct measurement of DNA affinity landscapes on a high-throughput sequencing instrument. Nature Biotech. 29, 659–664 (2011).
Badis, G. et al. A library of yeast transcription factor motifs reveals a widespread function for Rsc3 in targeting nucleosome exclusion at promoters. Mol. Cell 32, 878–887 (2008).
Zhu, C. et al. High-resolution DNA-binding specificity analysis of yeast transcription factors. Genome Res. 19, 556–566 (2009).
Grove, C. A. et al. A multiparameter network reveals extensive divergence between C. elegans bHLH transcription factors. Cell 138, 314–327 (2009).
Badis, G. et al. Diversity and complexity in DNA recognition by transcription factors. Science 324, 1720–1723 (2009).
Jones, R. B., Gordus, A., Krall, J. A. & MacBeath, G. A quantitative protein interaction network for the ErbB receptors using protein microarrays. Nature 439, 168–174 (2006).
Siggers, T., Duyzend, M. H., Reddy, J., Khan, S. & Bulyk, M. L. Non-DNA-binding cofactors enhance DNA-binding specificity of a transcriptional regulatory complex. Mol. Syst. Biol. 7, 555 (2011).
ENCODE Project Consortium et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Sharon, E. et al. Inferring gene regulatory logic from high-throughput measurements of thousands of systematically designed promoters. Nature Biotech. 30, 521–530 (2012).
Rajkumar, A. S., Denervaud, N. & Maerkl, S. J. Mapping the fine structure of a eukaryotic promoter input–output function. Nature Genet. 45, 1207–1215 (2013). This study measures the activity of ~200 variants of the PHO5 promoter in yeast that differ in the binding site for the regulating TF Pho4. Temporal promoter activity measurements throughout induction were obtained with a microfluidic-based platform. Previously characterized in vitro affinities were found to be highly predictive of the activity of the corresponding promoter variants in vivo. Subtle tuning of promoter activity could be achieved by manipulating the base pairs flanking the TFBS core.
Mogno, I., Kwasnieski, J. C. & Cohen, B. A. Massively parallel synthetic promoter assays reveal the in vivo effects of binding site variants. Genome Res. 23, 1908–1915 (2013).
Kheradpour, P. et al. Systematic dissection of regulatory motifs in 2000 predicted human enhancers using a massively parallel reporter assay. Genome Res. 23, 800–811 (2013).
Segal, E., Raveh-Sadka, T., Schroeder, M., Unnerstall, U. & Gaul, U. Predicting expression patterns from regulatory sequence in Drosophila segmentation. Nature 451, 535–540 (2008).
Gisselbrecht, S. S. et al. Highly parallel assays of tissue-specific enhancers in whole Drosophila embryos. Nature Methods 10, 774–780 (2013).
Liu, X., Lee, C. K., Granek, J. A., Clarke, N. D. & Lieb, J. D. Whole-genome comparison of Leu3 binding in vitro and in vivo reveals the importance of nucleosome occupancy in target site selection. Genome Res. 16, 1517–1528 (2006).
Guertin, M. J., Martins, A. L., Siepel, A. & Lis, J. T. Accurate prediction of inducible transcription factor binding intensities in vivo. PLoS Genet. 8, e1002610 (2012).
Zhou, X. & O'Shea, E. K. Integrated approaches reveal determinants of genome-wide binding and function of the transcription factor Pho4. Mol. Cell 42, 826–836 (2011). This paper sets out to bridge the gap between the frequent occurrence of the Pho4 motif along the genome and the binding pattern of Pho4 in vivo. It suggests that several mechanisms are at play. Nucleosome occupancy seems to restrict Pho4 binding, which is further tuned by competition with Cbf1, another TF with similar sequence preferences. A cooperative interaction between Pho4 and Pho2, which binds nearby, is further required to activate transcription.
Gordan, R. et al. Genomic regions flanking E-box binding sites influence DNA binding specificity of bHLH transcription factors through DNA shape. Cell Rep. 3, 1093–1104 (2013). This study uses an extension of PBM to characterize the binding specificities of two E-box-binding TFs — Cbf1 and Tye7 — for putative binding sites in their genomic context. A differential specificity based on the base pairs flanking the core motif is characterized, and a computational model suggests that such specificity is mediated by distinct preferences for three-dimensional DNA shape properties.
Bresnick, E. H., Katsumura, K. R., Lee, H. Y., Johnson, K. D. & Perkins, A. S. Master regulatory GATA transcription factors: mechanistic principles and emerging links to hematologic malignancies. Nucleic Acids Res. 40, 5819–5831 (2012).
Siggers, T. & Gordan, R. Protein–DNA binding: complexities and multi-protein codes. Nucleic Acids Res. 42, 2099–2111 (2014).
Slattery, M. et al. Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147, 1270–1282 (2011).
Lelli, K. M., Slattery, M. & Mann, R. S. Disentangling the many layers of eukaryotic transcriptional regulation. Annu. Rev. Genet. 46, 43–68 (2012).
Biggin, M. D. Animal transcription networks as highly connected, quantitative continua. Dev. Cell 21, 611–626 (2011).
Spitz, F. & Furlong, E. E. Transcription factors: from enhancer binding to developmental control. Nature Rev. Genet. 13, 613–626 (2012).
Papatsenko, D., Goltsev, Y. & Levine, M. Organization of developmental enhancers in the Drosophila embryo. Nucleic Acids Res. 37, 5665–5677 (2009).
Erives, A. & Levine, M. Coordinate enhancers share common organizational features in the Drosophila genome. Proc. Natl Acad. Sci. USA 101, 3851–3856 (2004).
Rastegar, S. et al. The words of the regulatory code are arranged in a variable manner in highly conserved enhancers. Dev. Biol. 318, 366–377 (2008).
Lusk, R. W. & Eisen, M. B. Evolutionary mirages: selection on binding site composition creates the illusion of conserved grammars in Drosophila enhancers. PLoS Genet. 6, e1000829 (2010).
Hare, E. E., Peterson, B. K., Iyer, V. N., Meier, R. & Eisen, M. B. Sepsid even-skipped enhancers are functionally conserved in Drosophila despite lack of sequence conservation. PLoS Genet. 4, e1000106 (2008).
Weirauch, M. T. & Hughes, T. R. Conserved expression without conserved regulatory sequence: the more things change, the more they stay the same. Trends Genet. 26, 66–74 (2010).
Zinzen, R. P., Girardot, C., Gagneur, J., Braun, M. & Furlong, E. E. Combinatorial binding predicts spatio-temporal cis-regulatory activity. Nature 462, 65–70 (2009).
Brown, C. D., Johnson, D. S. & Sidow, A. Functional architecture and evolution of transcriptional elements that drive gene coexpression. Science 317, 1557–1560 (2007).
Smith, R. P. et al. Massively parallel decoding of mammalian regulatory sequences supports a flexible organizational model. Nature Genet. 45, 1021–1028 (2013).
Tanay, A. Extensive low-affinity transcriptional interactions in the yeast genome. Genome Res. 16, 962–972 (2006).
Evans, N. C., Swanson, C. I. & Barolo, S. Sparkling insights into enhancer structure, function, and evolution. Curr. Top. Dev. Biol. 98, 97–120 (2012). This review focuses on the sparkling eye enhancer of the D. melanogaster Pax2 (also known as sv) gene. It discusses various analyses, including the examination of sparkling orthologues and measurements, in several cell types, of how different manipulations of TFBS composition and arrangement affect expression. These analyses reveal a complex combinatorial code that is densely encoded in the enhancer, as well as several highly constrained architectural properties that ensure proper cell-specific expression.
Parker, D. S., White, M. A., Ramos, A. I., Cohen, B. A. & Barolo, S. The cis-regulatory logic of Hedgehog gradient responses: key roles for gli binding affinity, competition, and cooperativity. Sci. Signal. 4, ra38 (2011).
Rogers, K. W. & Schier, A. F. Morphogen gradients: from generation to interpretation. Annu. Rev. Cell Dev. Biol. 27, 377–407 (2011).
Zaret, K. S. & Carroll, J. S. Pioneer transcription factors: establishing competence for gene expression. Genes Dev. 25, 2227–2241 (2011).
Arnosti, D. N. & Kulkarni, M. M. Transcriptional enhancers: intelligent enhanceosomes or flexible billboards? J. Cell Biochem. 94, 890–898 (2005).
Arnosti, D. N., Barolo, S., Levine, M. & Small, S. The eve stripe 2 enhancer employs multiple modes of transcriptional synergy. Development 122, 205–214 (1996).
Liu, F. & Posakony, J. W. Role of architecture in the function and specificity of two Notch-regulated transcriptional enhancer modules. PLoS Genet. 8, e1002796 (2012). This study examines the contribution of architectural properties of two Notch-regulated enhancers to their spatially distinct activities. Although one enhancer is resistant, to a large extent, to manipulations in the arrangement of its constituent TFBSs, the other enhancer is highly sensitive. The authors discuss how this differential reliance on architectural properties may be linked to the different developmental stages and contexts in which these enhancers function.
Senger, K. et al. Immunity regulatory DNAs share common organizational features in Drosophila. Mol. Cell 13, 19–32 (2004).
Crocker, J., Tamori, Y. & Erives, A. Evolution acts on enhancer organization to fine-tune gradient threshold readouts. PLoS Biol. 6, e263 (2008).
Swanson, C. I., Evans, N. C. & Barolo, S. Structural rules and complex regulatory circuitry constrain expression of a Notch- and EGFR-regulated eye enhancer. Dev. Cell 18, 359–370 (2010).
Panne, D., Maniatis, T. & Harrison, S. C. An atomic model of the interferon-β enhanceosome. Cell 129, 1111–1123 (2007).
Thanos, D. & Maniatis, T. Virus induction of human IFNβ gene expression requires the assembly of an enhanceosome. Cell 83, 1091–1100 (1995).
Junion, G. et al. A transcription factor collective defines cardiac cell fate and reflects lineage history. Cell 148, 473–486 (2012).
Melnikov, A. et al. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nature Biotech. 30, 271–277 (2012).
Patwardhan, R. P. et al. Massively parallel functional dissection of mammalian enhancers in vivo. Nature Biotech. 30, 265–270 (2012).
Kwasnieski, J. C., Mogno, I., Myers, C. A., Corbo, J. C. & Cohen, B. A. Complex effects of nucleotide variants in a mammalian cis-regulatory element. Proc. Natl Acad. Sci. USA 109, 19498–19503 (2012).
Lagha, M., Bothma, J. P. & Levine, M. Mechanisms of transcriptional precision in animal development. Trends Genet. 28, 409–416 (2012).
Guo, Y., Mahony, S. & Gifford, D. K. High resolution genome wide binding event finding and motif discovery reveals transcription factor spatial binding constraints. PLoS Comput. Biol. 8, e1002638 (2012).
Tirosh, I. & Barkai, N. Two strategies for gene regulation by promoter nucleosomes. Genome Res. 18, 1084–1091 (2008). This analysis of yeast promoters suggests two typical promoter structures. They differ in nucleosome positions, TFBS composition and location, expression variation and transcriptional plasticity (a measure of the degree to which gene expression is modulated across conditions). The analysis contributes to our understanding of how different promoter architectures and dynamics may be used to attain different functional properties of expression.
Field, Y. et al. Distinct modes of regulation by chromatin encoded through nucleosome positioning signals. PLoS Comput. Biol. 4, e1000216 (2008).
Rhee, H. S. & Pugh, B. F. Genome-wide structure and organization of eukaryotic pre-initiation complexes. Nature 483, 295–301 (2012).
Leonard, D. A., Rajaram, N. & Kerppola, T. K. Structural basis of DNA bending and oriented heterodimer binding by the basic leucine zipper domains of Fos and Jun. Proc. Natl Acad. Sci. USA 94, 4913–4918 (1997).
Morin, B., Nichols, L. A. & Holland, L. J. Flanking sequence composition differentially affects the binding and functional characteristics of glucocorticoid receptor homo- and heterodimers. Biochemistry 45, 7299–7306 (2006).
Nagaoka, M., Shiraishi, Y. & Sugiura, Y. Selected base sequence outside the target binding site of zinc finger protein Sp1. Nucleic Acids Res. 29, 4920–4929 (2001).
Aow, J. S. et al. Differential binding of the related transcription factors Pho4 and Cbf1 can tune the sensitivity of promoters to different levels of an induction signal. Nucleic Acids Res. 41, 4877–4887 (2013).
Rohs, R. et al. Origins of specificity in protein–DNA recognition. Annu. Rev. Biochem. 79, 233–269 (2010).
Kornberg, R. D. & Lorch, Y. Twenty-five years of the nucleosome, fundamental particle of the eukaryote chromosome. Cell 98, 285–294 (1999).
Pique-Regi, R. et al. Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res. 21, 447–455 (2011).
Kaplan, T. et al. Quantitative models of the mechanisms that control genome-wide patterns of transcription factor binding during early Drosophila development. PLoS Genet. 7, e1001290 (2011).
John, S. et al. Chromatin accessibility pre-determines glucocorticoid receptor binding patterns. Nature Genet. 43, 264–268 (2011).
Struhl, K. & Segal, E. Determinants of nucleosome positioning. Nature Struct. Mol. Biol. 20, 267–273 (2013).
Fu, Y., Sinha, M., Peterson, C. L. & Weng, Z. The insulator binding protein CTCF positions 20 nucleosomes around its binding sites across the human genome. PLoS Genet. 4, e1000138 (2008).
Tillo, D. & Hughes, T. R. G+C content dominates intrinsic nucleosome occupancy. BMC Bioinformatics 10, 442 (2009).
Narlikar, L., Gordan, R. & Hartemink, A. J. A nucleosome-guided map of transcription factor binding sites in yeast. PLoS Comput. Biol. 3, e215 (2007).
Raveh-Sadka, T., Levo, M. & Segal, E. Incorporating nucleosomes into thermodynamic models of transcription regulation. Genome Res. 19, 1480–1496 (2009).
Wasson, T. & Hartemink, A. J. An ensemble model of competitive multi-factor binding of the genome. Genome Res. 19, 2101–2112 (2009).
Brogaard, K., Xi, L., Wang, J. P. & Widom, J. A map of nucleosome positions in yeast at base-pair resolution. Nature 486, 496–501 (2012). This paper presents a chemical-based approach to map nucleosome positions genome wide at single-base-pair resolution. This method reveals overlapping nucleosome positions within the cell population. It allows a high-resolution examination of nucleosome positions relative to sequence and genomic features such as transcription start sites (TSSs), TFBSs and Pol II pause sites.
Khoueiry, P. et al. A cis-regulatory signature in ascidians and flies, independent of transcription factor binding sites. Curr. Biol. 20, 792–802 (2010).
Segal, E. & Widom, J. Poly(dA:dT) tracts: major determinants of nucleosome organization. Curr. Opin. Struct. Biol. 19, 65–71 (2009).
Iyer, V. & Struhl, K. Poly(dA:dT), a ubiquitous promoter element that stimulates transcription via its intrinsic DNA structure. EMBO J. 14, 2570–2579 (1995).
Raveh-Sadka, T. et al. Manipulating nucleosome disfavoring sequences allows fine-tune regulation of gene expression in yeast. Nature Genet. 44, 743–750 (2012).
Tillo, D. et al. High nucleosome occupancy is encoded at human regulatory sequences. PLoS ONE 5, e9129 (2010).
Ballare, C. et al. Nucleosome-driven transcription factor binding and gene regulation. Mol. Cell 49, 67–79 (2013).
White, M. A., Myers, C. A., Corbo, J. C. & Cohen, B. A. Massively parallel in vivo enhancer assay reveals that highly local features determine the cis-regulatory function of ChIP–seq peaks. Proc. Natl Acad. Sci. USA 110, 11952–11957 (2013).
Degner, J. F. et al. DNase I sensitivity QTLs are a major determinant of human expression variation. Nature 482, 390–394 (2012).
Schaub, M. A., Boyle, A. P., Kundaje, A., Batzoglou, S. & Snyder, M. Linking disease associations with regulatory information in the human genome. Genome Res. 22, 1748–1759 (2012).
Gaffney, D. J. et al. Dissecting the regulatory architecture of gene expression QTLs. Genome Biol. 13, R7 (2012).
Lee, S. I. et al. Learning a prior on regulatory potential from eQTL data. PLoS Genet. 5, e1000358 (2009).
Manor, O. & Segal, E. Robust prediction of expression differences among human individuals using only genotype information. PLoS Genet. 9, e1003396 (2013).
Bintu, L. et al. Transcriptional regulation by the numbers: models. Curr. Opin. Genet. Dev. 15, 116–124 (2005).
Segal, E. & Widom, J. From DNA sequence to transcriptional behaviour: a quantitative approach. Nature Rev. Genet. 10, 443–456 (2009).
Kadonaga, J. T. Perspectives on the RNA polymerase II core promoter. Wiley Interdiscip. Rev. Dev. Biol. 1, 40–51 (2012).
Mogno, I., Vallania, F., Mitra, R. D. & Cohen, B. A. TATA is a modular component of synthetic promoters. Genome Res. 20, 1391–1397 (2010).
Lubliner, S., Keren, L. & Segal, E. Sequence features of yeast and human core promoters that are predictive of maximal promoter activity. Nucleic Acids Res. 41, 5569–5581 (2013).
Schoenberg, D. R. & Maquat, L. E. Regulation of cytoplasmic mRNA decay. Nature Rev. Genet. 13, 246–259 (2012).
Burgess, D. J. Global analyses of determinants of RNA decay. Nature Rev. Genet. http://dx.doi.org/10.1038/nrg3710 (2014).
Kornblihtt, A. R. et al. Alternative splicing: a pivotal step between eukaryotic transcription and translation. Nature Rev. Mol. Cell Biol. 14, 153–165 (2013).
Ingolia, N. T. Ribosome profiling: new views of translation, from single codons to genome scale. Nature Rev. Genet. 15, 205–213 (2014).
Natoli, G. & Andrau, J. C. Noncoding transcription at enhancers: general principles and functional models. Annu. Rev. Genet. 46, 1–19 (2012).
Lam, M. T., Li, W., Rosenfeld, M. G. & Glass, C. K. Enhancer RNAs and regulated transcriptional programs. Trends Biochem. Sci. 39, 170–182 (2014).
Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
Lohmueller, J. J., Armel, T. Z. & Silver, P. A. A tunable zinc finger-based framework for Boolean logic computation in mammalian cells. Nucleic Acids Res. 40, 5180–5187 (2012).
Teo, W. S. & Chang, M. W. Development and characterization of AND-gate dynamic controllers with a modular synthetic GAL1 core promoter in Saccharomyces cerevisiae. Biotechnol. Bioeng. 111, 144–151 (2013).
Perez-Pinera, P. et al. Synergistic and tunable human gene activation by combinations of synthetic transcription factors. Nature Methods 10, 239–242 (2013).
Gaj, T., Gersbach, C. A. & Barbas, C. F. 3rd. ZFN, TALEN, and CRISPR/Cas-based methods for genome engineering. Trends Biotechnol. 31, 397–405 (2013).
Kasinathan, S., Orsi, G. A., Zentner, G. E., Ahmad, K. & Henikoff, S. High-resolution mapping of transcription factor binding sites on native chromatin. Nature Methods 11, 203–209 (2014).
Vierstra, J., Wang, H., John, S., Sandstrom, R. & Stamatoyannopoulos, J. A. Coupling transcription factor occupancy to nucleosome architecture with DNase–FLASH. Nature Methods 11, 66–72 (2014).
Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nature Methods 10, 1213–1218 (2013).
Rajewsky, N., Vergassola, M., Gaul, U. & Siggia, E. D. Computational detection of genomic cis-regulatory modules applied to body patterning in the early Drosophila embryo. BMC Bioinformatics 3, 30 (2002).
Berman, B. P. et al. Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc. Natl Acad. Sci. USA 99, 757–762 (2002).
Heintzman, N. D. et al. Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature 459, 108–112 (2009).
Ernst, J. et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49 (2011).
Simon, J. M., Giresi, P. G., Davis, I. J. & Lieb, J. D. Using formaldehyde-assisted isolation of regulatory elements (FAIRE) to isolate active regulatory DNA. Nature Protoc. 7, 256–267 (2012).
Khambata-Ford, S. et al. Identification of promoter regions in the human genome by using a retroviral plasmid library-based functional reporter gene assay. Genome Res. 13, 1765–1774 (2003).
Jory, A. et al. A survey of 6,300 genomic fragments for cis-regulatory activity in the imaginal discs of Drosophila melanogaster. Cell Rep. 2, 1014–1024 (2012).
Jenett, A. et al. A GAL4-driver line resource for Drosophila neurobiology. Cell Rep. 2, 991–1001 (2012).
Manning, L. et al. A resource for manipulating gene expression and analyzing cis-regulatory modules in the Drosophila CNS. Cell Rep. 2, 1002–1013 (2012).
Arnold, C. D. et al. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science 339, 1074–1077 (2013).
Dickel, D. E. et al. Function-based identification of mammalian enhancers using site-specific integration. Nature Methods 11, 566–571 (2014).
Murtha, M. et al. FIREWACh: high-throughput functional detection of transcriptional regulatory modules in mammalian cells. Nature Methods 11, 559–565 (2014).
Kinney, J. B., Murugan, A., Callan, C. G. Jr & Cox, E. C. Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. Proc. Natl Acad. Sci. USA 107, 9158–9163 (2010).
Kosuri, S. et al. Composability of regulatory sequences controlling transcription and translation in Escherichia coli. Proc. Natl Acad. Sci. USA 110, 14024–14029 (2013).
Patwardhan, R. P. et al. High-resolution analysis of DNA regulatory elements by synthetic saturation mutagenesis. Nature Biotech. 27, 1173–1175 (2009).
Kleinjan, D. A. & van Heyningen, V. Long-range control of gene expression: emerging mechanisms and disruption in disease. Am. J. Hum. Genet. 76, 8–32 (2005).
Gibcus, J. H. & Dekker, J. The hierarchy of the 3D genome. Mol. Cell 49, 773–782 (2013).
Smallwood, A. & Ren, B. Genome organization and long-range regulation of gene expression by enhancers. Curr. Opin. Cell Biol. 25, 387–394 (2013).
Nagano, T. et al. Single-cell Hi-C reveals cell-to-cell variability in chromosome structure. Nature 502, 59–64 (2013).
Zhang, Y. et al. Chromatin connectivity maps reveal dynamic promoter-enhancer long-range associations. Nature 504, 306–310 (2013).
Phillips-Cremins, J. E. et al. Architectural protein subclasses shape 3D organization of genomes during lineage commitment. Cell 153, 1281–1295 (2013).
Barolo, S. Shadow enhancers: frequently asked questions about distributed cis-regulatory information and enhancer redundancy. Bioessays 34, 135–141 (2012).
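The ChIP-exo annotation above (Rhee & Pugh, 2011) rests on a simple geometric idea. The exonuclease trims each strand from 5' to 3' until it reaches the crosslinked protein. As a result, the 5' ends of forward-strand reads pile up at the left border of the bound region, and those of reverse-strand reads at the right border. The sketch below assumes a single isolated binding event and reads already reduced to (5' position, strand) pairs. It takes the most frequent 5' end on each strand as the borders and their midpoint as the inferred site centre. It is a toy illustration of the principle only; dedicated peak callers use considerably more elaborate models.

```python
from collections import Counter


def exo_borders(reads):
    """Infer crosslinking borders from ChIP-exo read 5' ends around one binding event.

    reads: iterable of (five_prime_position, strand), strand '+' or '-', with
    positions in genome coordinates. Returns (left_border, right_border, centre).
    """
    reads = list(reads)  # allow generators; the reads are iterated twice below
    plus_ends = Counter(pos for pos, strand in reads if strand == "+")
    minus_ends = Counter(pos for pos, strand in reads if strand == "-")
    left = plus_ends.most_common(1)[0][0]
    right = minus_ends.most_common(1)[0][0]
    return left, right, (left + right) / 2


# Hypothetical reads: forward-strand 5' ends pile up at 100, reverse-strand ones at 120.
reads = [(100, "+")] * 5 + [(101, "+")] * 2 + [(120, "-")] * 6 + [(118, "-")]
print(exo_borders(reads))  # (100, 120, 110.0)
```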
This work was supported by grants from the European Research Council and the US National Institutes of Health to E.S. M.L. thanks the Azrieli Foundation for the award of an Azrieli Fellowship.
The authors declare no competing financial interests.
- Genome-wide association studies
(GWASs). Genome-wide studies that are designed to identify genetic associations with an observable trait, disease or condition.
- Expression quantitative trait locus
(eQTL). A locus at which genetic variation is associated with variation in gene expression levels.
- In vivo
In the context of this Review, in vivo refers to experiments carried out in living cells, regardless of whether the cells are within or outside a whole organism (sometimes referred to as ex vivo).
- Position weight matrices
(PWMs). Representations for the specificity of DNA-binding proteins, in which a score is assigned to every possible base pair at each position in the binding site. A PWM score for a specific sequence is the sum of the position-specific scores for each of its base pairs.
- Dissociation constant
(Kd). The equilibrium dissociation constant between two molecules (in this context, a transcription factor and a DNA sequence). It is the ratio of the off-rate constant (dissolution of the complex) to the on-rate constant (formation of the complex); a lower Kd indicates higher affinity.
- Orthologous
Pertaining to loci in two species that are derived from a common ancestral locus.
- Homotypic TFBS cluster
A cluster of multiple transcription factor binding sites for the same transcription factor.
- Heterotypic clustering
Clustering of multiple transcription factor binding sites for different transcription factors.
- Regression model
A model that describes the relationship between a dependent variable and one or more independent variables.
- Linkage disequilibrium
(LD). A nonrandom association of alleles at different loci (as might be observed for particular alleles at neighbouring loci that tend to be co-inherited).
- DNase I sensitivity QTLs
(dsQTLs). Locations at which DNase I hypersensitive site sequencing read depth significantly correlates with the genotypes at nearby single-nucleotide polymorphisms, or insertions or deletions.
- Bayesian approach
A modelling approach that uses Bayes' rule and that computes a posterior probability that a hypothesis is true using a combination of prior beliefs and observed data.
- k-nearest neighbours
(KNN). A non-parametric regression method that predicts the value of a new point on the basis of the values of the k closest training points in the feature space.
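The k-nearest neighbours entry above can be illustrated with a few lines of Python. In this sketch the feature vectors and target values are hypothetical. They could, for example, stand for genotype-derived features and expression levels. Distances are Euclidean and the prediction is the unweighted mean of the k closest training values. Distance-weighted averages and other metrics are common alternatives.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict a value for query as the mean target of its k nearest training points.

    Distances are Euclidean (math.dist); equal distances are ordered by target value.
    """
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    by_distance = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / k


train_x = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
train_y = [1.0, 2.0, 3.0, 50.0, 60.0]
print(knn_predict(train_x, train_y, (0.2, 0.2)))  # 2.0
```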
About this article
Cite this article
Levo, M., Segal, E. In pursuit of design principles of regulatory sequences. Nat Rev Genet 15, 453–468 (2014). https://doi.org/10.1038/nrg3684