Compatibility rules of human enhancer and promoter sequences

Bergman, Drew T.; Jones, Thouis R.; Liu, Vincent; Ray, Judhajeet; Jagoda, Evelyn; Siraj, Layla; Kang, Helen Y.; Nasser, Joseph; Kane, Michael; Rios, Antonio; Nguyen, Tung H.; Grossman, Sharon R.; Fulco, Charles P.; Lander, Eric S.; Engreitz, Jesse M.

doi:10.1038/s41586-022-04877-w

Article
Published: 20 May 2022

Nature volume 607, pages 176–184 (2022)

Abstract

Gene regulation in the human genome is controlled by distal enhancers that activate specific nearby promoters¹. A proposed model for this specificity is that promoters have sequence-encoded preferences for certain enhancers, for example, mediated by interacting sets of transcription factors or cofactors². This ‘biochemical compatibility’ model has been supported by observations at individual human promoters and by genome-wide measurements in Drosophila³⁻⁹. However, the degree to which human enhancers and promoters are intrinsically compatible has not yet been systematically measured, and how their activities combine to control RNA expression remains unclear. Here we designed a high-throughput reporter assay called enhancer × promoter self-transcribing active regulatory region sequencing (ExP STARR-seq) and applied it to examine the combinatorial compatibilities of 1,000 enhancer and 1,000 promoter sequences in human K562 cells. We identify simple rules for enhancer–promoter compatibility, whereby most enhancers activate all promoters by similar amounts, and intrinsic enhancer and promoter activities multiplicatively combine to determine RNA output (R² = 0.82). In addition, two classes of enhancers and promoters show subtle preferential effects. Promoters of housekeeping genes contain built-in activating motifs for factors such as GABPA and YY1, which decrease the responsiveness of promoters to distal enhancers. Promoters of variably expressed genes lack these motifs and show stronger responsiveness to enhancers. Together, this systematic assessment of enhancer–promoter compatibility suggests a multiplicative model tuned by enhancer and promoter class to control gene transcription in the human genome.
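
To illustrate what "multiplicatively combine" means in practice, the sketch below fits an additive model in log₂ space to a synthetic promoter × enhancer expression matrix and reports R². It is a minimal sketch, not the authors' fitting code: the function name `fit_multiplicative`, the synthetic data, and the use of plain row and column means on a complete matrix are all assumptions made for illustration.

```python
import numpy as np

def fit_multiplicative(expr):
    """Fit expr[i, j] ~ scale * promoter[i] * enhancer[j] in log2 space.

    expr: 2D array of positive values (rows = promoters, columns = enhancers),
    with no missing entries (an assumption of this sketch).
    Returns (predicted expression, R^2 computed on log2 values).
    """
    log_expr = np.log2(expr)
    grand = log_expr.mean()
    p_eff = log_expr.mean(axis=1, keepdims=True) - grand  # per-promoter effect
    e_eff = log_expr.mean(axis=0, keepdims=True) - grand  # per-enhancer effect
    log_pred = grand + p_eff + e_eff  # additive in log2 = multiplicative in linear space
    ss_res = np.sum((log_expr - log_pred) ** 2)
    ss_tot = np.sum((log_expr - grand) ** 2)
    return 2.0 ** log_pred, 1.0 - ss_res / ss_tot

# Synthetic example (hypothetical data): multiplicative signal plus log-normal noise.
rng = np.random.default_rng(0)
promoter = rng.lognormal(size=50)
enhancer = rng.lognormal(size=40)
noise = rng.lognormal(sigma=0.3, size=(50, 40))
expr = np.outer(promoter, enhancer) * noise

pred, r2 = fit_multiplicative(expr)
print(f"R^2 of the multiplicative fit in log2 space: {r2:.2f}")
```

Under a purely multiplicative model, the residual `expr / pred` would be close to 1 for every pair; systematic departures from 1 for groups of enhancers and promoters are the kind of class-specific effect the abstract describes.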

**Fig. 2: Enhancer and promoter activities combine multiplicatively.**

**Fig. 3: Compatibility classes of enhancers and promoters.**

**Fig. 4: Promoter classes correspond to enhancer responsive versus ubiquitously expressed genes.**

**Fig. 5: P2 promoters contain built-in enhancer sequences.**

Data availability

Raw and processed data for ExP STARR-seq, motif ExP STARR-seq, HS-STARR-seq and K562 PRO-seq can be found at the NCBI’s Gene Expression Omnibus under accession number GSE184426. Luciferase data can be found in Supplementary Table 3. Datasets used from the ENCODE Project are listed in Supplementary Table 10 and are available at https://www.encodeproject.org. Additional resources and protocols related to this study are available at https://www.engreitzlab.org/resources/.

Code availability

Code for fitting the multiplicative ExP model is available at https://doi.org/10.5281/zenodo.6514733 or https://github.com/broadinstitute/ExP-model-fit.

References

ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
van Arensbergen, J., van Steensel, B. & Bussemaker, H. J. In search of the determinants of enhancer–promoter interaction specificity. Trends Cell Biol. 24, 695–702 (2014).
Emami, K. H., Navarre, W. W. & Smale, S. T. Core promoter specificities of the Sp1 and VP16 transcriptional activation domains. Mol. Cell. Biol. 15, 5906–5916 (1995).
Ohtsuki, S., Levine, M. & Cai, H. N. Different core promoters possess distinct regulatory activities in the Drosophila embryo. Genes Dev. 12, 547–556 (1998).
Emami, K. H., Jain, A. & Smale, S. T. Mechanism of synergy between TATA and initiator: synergistic binding of TFIID following a putative TFIIA-induced isomerization. Genes Dev. 11, 3007–3019 (1997).
Butler, J. E. F. Enhancer–promoter specificity mediated by DPE or TATA core promoter motifs. Genes Dev. 15, 2515–2519 (2001).
Yean, D. & Gralla, J. Transcription reinitiation rate: a special role for the TATA box. Mol. Cell. Biol. 17, 3809–3816 (1997).
Wefald, F. C., Devlin, B. H. & Williams, R. S. Functional heterogeneity of mammalian TATA-box sequences revealed by interaction with a cell-specific enhancer. Nature 344, 260–262 (1990).
Zabidi, M. A., Arnold, C. D., Schernhuber, K. & Pagani, M. Enhancer–core-promoter specificity separates developmental and housekeeping gene regulation. Nature 518, 556–559 (2015).
Banerji, J., Rusconi, S. & Schaffner, W. Expression of a β-globin gene is enhanced by remote SV40 DNA sequences. Cell 27, 299–308 (1981).
Banerji, J., Olson, L. & Schaffner, W. A lymphocyte-specific cellular enhancer is located downstream of the joining region in immunoglobulin heavy chain genes. Cell 33, 729–740 (1983).
Melnikov, A. et al. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nat. Biotechnol. 30, 271–277 (2012).
Arnold, C. D. et al. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science 339, 1074–1077 (2013).
Kermekchiev, M., Pettersson, M., Matthias, P. & Schaffner, W. Every enhancer works with every promoter for all the combinations tested: could new regulatory pathways evolve by enhancer shuffling? Gene Expr. 1, 71–81 (1991).
Tewhey, R. et al. Direct identification of hundreds of expression-modulating variants using a multiplexed reporter assay. Cell 172, 1132–1134 (2018).
Klein, J. C. et al. A systematic evaluation of the design and context dependencies of massively parallel reporter assays. Nat. Methods 17, 1083–1091 (2020).
Muerdter, F. et al. Resolving systematic errors in widely used enhancer activity assays in human cells. Nat. Methods 15, 141–149 (2018).
Nguyen, T. A. et al. High-throughput functional comparison of promoter and enhancer activities. Genome Res. 26, 1023–1033 (2016).
Arnold, C. D. et al. Genome-wide assessment of sequence-intrinsic enhancer responsiveness at single-base-pair resolution. Nat. Biotechnol. 35, 136–144 (2017).
Haberle, V. et al. Transcriptional cofactors display specificity for distinct types of core promoters. Nature 570, 122–126 (2019).
Li, X. & Noll, M. Compatibility between enhancers and promoters determines the transcriptional specificity of gooseberry and gooseberry neuro in the Drosophila embryo. EMBO J. 13, 400–406 (1994).
Fulco, C. P. et al. Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).
van Arensbergen, J. et al. Genome-wide mapping of autonomous promoter activity in human cells. Nat. Biotechnol. 35, 145–153 (2017).
Wall, L., deBoer, E. & Grosveld, F. The human β-globin gene 3′ enhancer contains multiple binding sites for an erythroid-specific protein. Genes Dev. 2, 1089–1100 (1988).
Tuan, D. Y., Solomon, W. B., London, I. M. & Lee, D. P. An erythroid-specific, developmental-stage-independent enhancer far upstream of the human “beta-like globin” genes. Proc. Natl Acad. Sci. USA 86, 2554–2558 (1989).
Thakore, P. I. et al. Highly specific epigenome editing by CRISPR–Cas9 repressors for silencing of distal regulatory elements. Nat. Methods 12, 1143–1149 (2015).
Klann, T. S. et al. CRISPR–Cas9 epigenome editing enables high-throughput screening for functional regulatory elements in the human genome. Nat. Biotechnol. 35, 561–568 (2017).
Fulco, C. P. et al. Systematic mapping of functional enhancer-promoter connections with CRISPR interference. Science 354, 769–773 (2016).
Liu, Y. et al. Functional assessment of human enhancer activities using whole-genome STARR-sequencing. Genome Biol. 18, 219 (2017).
Haberle, V. & Stark, A. Eukaryotic core promoters and the functional basis of transcription initiation. Nat. Rev. Mol. Cell Biol. 19, 621–637 (2018).
Lenhard, B., Sandelin, A. & Carninci, P. Metazoan promoters: emerging characteristics and insights into transcriptional regulation. Nat. Rev. Genet. 13, 233–245 (2012).
Fan, K., Moore, J. E., Zhang, X.-O. & Weng, Z. Genetic and epigenetic features of promoters with ubiquitous chromatin accessibility support ubiquitous transcription of cell-essential genes. Nucleic Acids Res. 49, 5705–5725 (2021).
Xi, H. et al. Identification and characterization of cell type-specific and ubiquitous chromatin regulatory structures in the human genome. PLoS Genet. 3, e136 (2007).
Landolin, J. M. et al. Sequence features that drive human promoter function and tissue specificity. Genome Res. 20, 890–898 (2010).
Weingarten-Gabbay, S. et al. Systematic interrogation of human promoters. Genome Res. 29, 171–183 (2019).
Sahu, B. et al. Sequence determinants of human gene regulatory elements. Nat. Genet. 54, 283–294 (2022).
Yu, M. et al. GA-binding protein-dependent transcription initiator elements. Effect of helical spacing between polyomavirus enhancer a factor 3(PEA3)/ETS-binding sites on initiator activity. J. Biol. Chem. 272, 29060–29067 (1997).
Curina, A. et al. High constitutive activity of a broad panel of housekeeping and tissue-specific cis-regulatory elements depends on a subset of ETS proteins. Genes Dev. 31, 399–412 (2017).
Martinez-Ara, M., Comoglio, F., van Arensbergen, J. & van Steensel, B. Systematic analysis of intrinsic enhancer–promoter compatibility in the mouse genome. Mol. Cell https://doi.org/10.1101/2021.10.21.465269 (2022).
Maricque, B. B., Chaudhari, H. G. & Cohen, B. A. A massively parallel reporter assay dissects the influence of chromatin structure on cis-regulatory activity. Nat. Biotechnol. 37, 90–95 (2019).
Hong, C. K. Y. & Cohen, B. A. Genomic environments scale the activities of diverse core promoters. Genome Res. 32, 85–96 (2022).
Chiang, C. M. & Roeder, R. G. Cloning of an intrinsic human TFIID subunit that interacts with multiple transcriptional activators. Science 267, 531–536 (1995).
Austen, M., Lüscher, B. & Lüscher-Firzlaff, J. M. Characterization of the transcriptional regulator YY1. The bipartite transactivation domain is independent of interaction with the TATA box-binding protein, transcription factor IIB, TAFII55, or cAMP-responsive element-binding protein (CPB)-binding protein. J. Biol. Chem. 272, 1709–1717 (1997).
Sucharov, C., Basu, A., Carter, R. S. & Avadhani, N. G. A novel transcriptional initiator activity of the GABP factor binding ets sequence repeat from the murine cytochrome c oxidase Vb gene. Gene Expr. 5, 93–111 (1995).
Carter, R. S. & Avadhani, N. G. Cooperative binding of GA-binding protein transcription factors to duplicated transcription initiation region repeats of the cytochrome c oxidase subunit IV gene. J. Biol. Chem. 269, 4381–4387 (1994).
Usheva, A. & Shenk, T. YY1 transcriptional initiator: protein interactions and association with a DNA site containing unpaired strands. Proc. Natl Acad. Sci. USA 93, 13571–13576 (1996).
Larsson, A. J. M. et al. Genomic encoding of transcriptional burst kinetics. Nature 565, 251–254 (2019).
The FANTOM Consortium and the RIKEN PMI and CLST (DGT). A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).
Wang, T., Lander, E. S. & Sabatini, D. M. Large-scale single guide RNA library construction and use for CRISPR–Cas9-based genetic screens. Cold Spring Harb. Protoc. 2016, pdb.top086892 (2016).
Engreitz, J. M. et al. Local regulation of gene expression by lncRNA promoters, transcription and splicing. Nature 539, 452–455 (2016).
Nasser, J. et al. Genome-wide enhancer maps link risk variants to disease genes. Nature 593, 238–243 (2021).
Anscombe, F. J. The transformation of Poisson, binomial and negative-binomial data. Biometrika 35, 246–254 (1948).
Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).
Kulakovskiy, I. V. et al. HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-seq analysis. Nucleic Acids Res. 46, D252–D259 (2018).
Core, L. J. et al. Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat. Genet. 46, 1311–1320 (2014).
Vanhille, L. et al. High-throughput and quantitative assessment of enhancer activity in mammals by CapStarr-seq. Nat. Commun. 6, 6905 (2015).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
The R Development Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2014).
Van Rossum, G. & Drake, F. L. Python 3 Reference Manual. (CreateSpace, 2009).
Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
McKinney, W. Data structures for statistical computing in Python. In Proc. 9th Python in Science Conference (eds van der Walt, S. & Millman, J.) 51–56 (SciPy, 2010).
Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
Waskom, M. seaborn: statistical data visualization. J. Open Source Softw. 6, 3021 (2021).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Stovner, E. B. & Sætrom, P. PyRanges: efficient comparison of genomic intervals in Python. Bioinformatics 36, 918–919 (2020).
Seabold, S. & Perktold, J. Statsmodels: econometric and statistical modeling with Python. In Proc. 9th Python in Science Conference (eds van der Walt, S. & Millman, J.) 92–96 (SciPy, 2010).

Acknowledgements

This work was supported by an NHGRI Genomic Innovator Award (R35HG011324 to J.M.E.); Gordon and Betty Moore and the BASE Research Initiative at the Lucile Packard Children’s Hospital at Stanford University (J.M.E.); an NIH Pathway to Independence Award (K99HG009917 and R00HG009917 to J.M.E.); the Harvard Society of Fellows (J.M.E.); the Novo Nordisk Foundation Center for Genomic Mechanisms of Disease (J.M.E.); the Broad Institute (E.S.L.); an AΩA Carolyn L. Kuckein Student Research Fellowship (D.T.B.); NHGRI Ruth L. Kirschstein NRSA Predoctoral Institutional Research Training Grants (T32HG000044, V.L.); and by the National Institute of General Medical Sciences (T32GM007753, L.S.). We thank B. van Steensel and M. Martinez-Ara for sharing data and discussing analysis. We thank C. Vockley, V. Subramanian and members of the Engreitz and Lander research groups for discussions and technical assistance. E.S.L. is currently on leave from the Broad Institute, MIT, and Harvard.

Author information

Charles P. Fulco
Present address: Bristol Myers Squibb, Cambridge, MA, USA
These authors contributed equally: Drew T. Bergman, Thouis R. Jones

Authors and Affiliations

Broad Institute of MIT and Harvard, Cambridge, MA, USA
Drew T. Bergman, Thouis R. Jones, Judhajeet Ray, Evelyn Jagoda, Layla Siraj, Joseph Nasser, Michael Kane, Tung H. Nguyen, Sharon R. Grossman, Charles P. Fulco, Eric S. Lander & Jesse M. Engreitz
Geisel School of Medicine at Dartmouth, Hanover, NH, USA
Drew T. Bergman
Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
Vincent Liu, Helen Y. Kang, Antonio Rios & Jesse M. Engreitz
Biophysics Graduate Program, Harvard University, Cambridge, MA, USA
Layla Siraj
BASE Initiative, Betty Irene Moore Children’s Heart Center, Lucile Packard Children’s Hospital, Stanford University School of Medicine, Stanford, CA, USA
Helen Y. Kang & Jesse M. Engreitz
Department of Biology, MIT, Cambridge, MA, USA
Eric S. Lander
Department of Systems Biology, Harvard Medical School, Boston, MA, USA
Eric S. Lander

Contributions

D.T.B., C.P.F., T.R.J., J.R. and J.M.E. developed the ExP STARR-seq assay. D.T.B., J.R., M.K., A.R. and T.H.N. performed the STARR-seq experiments. M.K. performed the luciferase assay experiments. D.T.B., T.R.J., V.L., E.J., L.S., H.Y.K., J.N., S.R.G. and J.M.E. analysed the STARR-seq data. M.K. and J.M.E. analysed the luciferase assay data. E.S.L. and J.M.E. supervised the work. All authors contributed to writing the manuscript.

Corresponding author

Correspondence to Jesse M. Engreitz.

Ethics declarations

Competing interests

C.P.F. is now an employee and shareholder of Bristol Myers Squibb. J.M.E. is a shareholder of Illumina, Inc., and other biotechnology companies. All other authors declare no competing interests.

Peer review

Peer review information

Nature thanks Alex Nord and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Design and reproducibility of ExP STARR-seq.

a. ExP STARR-seq reporter construct (pA = polyadenylation signal; purple = promoter sequencing adaptors; angled = spliced sequence; trGFP = truncated GFP open reading frame with start and stop codon; BC = 16 bp N-mer plasmid barcode; red = enhancer sequencing adaptors) and 1,000 × 1,000 K562 library contents. b. Correlation of ExP STARR-seq expression between biological replicate experiments, calculated for individual enhancer-promoter pairs with unique plasmid barcodes. Axes represent the average STARR-seq expression (RNA/DNA) of individual biological replicates. Density: number of enhancer-promoter plasmids. c. Fraction of remaining enhancer-promoter plasmids passing DNA (>25) and RNA (>1) thresholds (y-axis) with downsampling of sequencing reads (x-axis). d. Distribution of plasmid barcodes per enhancer-promoter pair; the red dotted line marks the threshold of two plasmid barcodes. e. Correlation between virtual replicates, formed by sampling two nonoverlapping groups of three plasmid barcodes from pairs with at least 6 barcodes, and averaging log₂(RNA/DNA) within groups. f. Correlation between virtual replicates as in (e) for increasing numbers of plasmid barcodes per pair in virtual replicates. g. DNase-seq, H3K27ac ChIP-seq, and PRO-seq (RPM) by increasing quartile of autonomous promoter activity and average enhancer activity in ExP STARR-seq (n = 800). Box: median and interquartile range (IQR). Whiskers: +/− 1.5 x IQR. h. Activation in ExP STARR-seq (expression versus genomic controls in distal position) of GATA1 and HDAC6 promoters by eHDAC6 (chrX:48641342-48641606). Ctrl = activity of promoters with random genomic controls in enhancer position. Error bars: 95% CI across plasmid barcodes. n = 7 (GATA1-ctrl), 381 (HDAC6-ctrl), 4 (eHDAC6-GATA1), 37 (eHDAC6-HDAC6). i. Average enhancer activity (STARR-seq expression of plasmids containing a given enhancer averaged across all promoters) of enhancer sequences derived from random genomic controls (n = 87), accessible elements (n = 725), and genomic enhancers validated in CRISPR experiments (n = 89).
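
The virtual-replicate procedure in panel e can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `virtual_replicates`, the dictionary input (pair identifier mapped to a list of per-barcode log₂(RNA/DNA) values), and the synthetic data are all hypothetical.

```python
import numpy as np

def virtual_replicates(log_ratios_by_pair, group_size=3, seed=0):
    """Split each pair's barcodes into two non-overlapping random groups.

    log_ratios_by_pair: dict mapping an enhancer-promoter pair to a sequence
    of per-barcode log2(RNA/DNA) values (hypothetical input layout).
    Pairs with fewer than 2 * group_size barcodes are skipped.
    Returns two arrays holding the group means for each retained pair.
    """
    rng = np.random.default_rng(seed)
    rep_a, rep_b = [], []
    for values in log_ratios_by_pair.values():
        values = np.asarray(values, dtype=float)
        if values.size < 2 * group_size:
            continue  # not enough barcodes for two disjoint groups
        picked = rng.choice(values.size, size=2 * group_size, replace=False)
        rep_a.append(values[picked[:group_size]].mean())
        rep_b.append(values[picked[group_size:]].mean())
    return np.array(rep_a), np.array(rep_b)

# Synthetic example: 200 pairs, each with a true activity plus barcode-level noise.
rng = np.random.default_rng(1)
pairs = {
    f"pair_{i}": rng.normal(loc=true_activity, scale=0.5, size=rng.integers(2, 12))
    for i, true_activity in enumerate(rng.normal(size=200))
}
a, b = virtual_replicates(pairs)
print(f"{a.size} pairs retained, Pearson r = {np.corrcoef(a, b)[0, 1]:.2f}")
```

Raising `group_size` mimics panel f: averaging over more barcodes per virtual replicate reduces barcode-level noise, at the cost of retaining fewer pairs.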

Extended Data Fig. 2 Comparison of methods of estimating enhancer and promoter activities and the multiplicative model.

a. Intrinsic promoter activity (expression versus random genomic controls in enhancer position) of five selected promoters. Error bars: 95% CI across plasmid barcodes (n = 54-79). Promoter classes (see Methods): DNASE2 (P1), HDAC6 (P1), CD164 (P1), BCAT2 (P1), PPP1R15A (P2). b. Activation (expression versus random genomic controls in enhancer position) of 5 selected promoters by 5 selected enhancers: 1 = chr11:61602148-61602412 (E1), 2 = chr19:49467061-49467325 (E1), 3 = chrX:48641342-48641606 (E1), 4 = chr19:12893216-12893480 (E2), 5 = chr17:40851134-40851398 (E1). Error bars: 95% CI across plasmid barcodes (n = 12-56). c–d. Heatmap of promoter activity (c, expression divided by intrinsic enhancer activity) or enhancer activity (d, expression divided by intrinsic promoter activity) across all pairs of promoter (vertical) and enhancer sequences (horizontal). Axes are sorted by intrinsic promoter and enhancer activities, as in Fig. 2j. Grey: missing data. e. Intrinsic promoter and enhancer activity (y-axis, estimated by a Poisson count model) versus average pairwise Spearman correlation (as in Fig. 2c, d). f–g. Correlation between two estimates of promoter (f) and enhancer (g) activities. One method (“average activity”, x-axis) estimates activity by averaging across elements, and the other method (“intrinsic activity”, y-axis) estimates activity by using coefficients estimated by a Poisson count model (see Methods). h–i. Correlation of intrinsic promoter (h) and enhancer (i) activity estimates from the Poisson model using data from separate replicate experiments. j–k. Fraction of variance explained by promoter activity, enhancer activity and class interaction from the perspective of expression (STARR-seq score) and enhancer activation (fold-activation of an enhancer on a promoter, normalizing out promoter strength), limited to pairs with 2 or more (j) or 20 or more (k) plasmid barcodes. Plot includes pairs with P0 promoters and E0 enhancers. Bar plots show sequential sum of squares (Type-I ANOVA). l. Correlation of the multiplicative enhancer x promoter model with STARR-seq expression comparing enhancer-promoter pairs located within 10 kb, 100 kb, and pairs located on different chromosomes.
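
The legend above refers to intrinsic activities estimated as coefficients of a Poisson count model; the exact specification is in the paper's Methods and the linked repository. As a rough illustration only, the sketch below assumes the simplest such model, RNA count ~ Poisson(DNA count × promoter activity × enhancer activity), and fits it by alternating closed-form maximum-likelihood updates. The model form, the function name `fit_poisson_rank1`, and the synthetic data are assumptions, and missing pairs are not handled.

```python
import numpy as np

def fit_poisson_rank1(rna, dna, n_iter=200):
    """Alternating ML updates for rna[i, j] ~ Poisson(dna[i, j] * p[i] * e[j]).

    rna, dna: 2D arrays (promoters x enhancers) of counts; every row and
    column of rna is assumed to have a positive sum.
    Returns (p, e), scaled so that the geometric mean of e equals 1.
    """
    p = np.ones(rna.shape[0])
    e = np.ones(rna.shape[1])
    for _ in range(n_iter):
        # With e fixed, the Poisson likelihood is maximized by this ratio (and vice versa).
        p = rna.sum(axis=1) / (dna * e[None, :]).sum(axis=1)
        e = rna.sum(axis=0) / (dna * p[:, None]).sum(axis=0)
    # (p * c, e / c) fit equally well for any c > 0, so fix the scale by convention.
    scale = np.exp(np.log(e).mean())
    return p * scale, e / scale

# Synthetic example with known activities (hypothetical data).
rng = np.random.default_rng(0)
p_true = rng.lognormal(size=30)
e_true = rng.lognormal(size=20)
dna = rng.integers(50, 500, size=(30, 20))
rna = rng.poisson(dna * np.outer(p_true, e_true))

p_hat, e_hat = fit_poisson_rank1(rna, dna)
r = np.corrcoef(np.log2(p_hat), np.log2(p_true))[0, 1]
print(f"Correlation of estimated and true log2 promoter activity: {r:.2f}")
```

Compared with averaging RNA/DNA ratios, a count model of this kind weights each pair by its sequencing depth, which is one plausible reason the two estimates in panels f–g can differ for weakly covered elements.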

Extended Data Fig. 3 Validation of enhancer-promoter multiplication via luciferase assays and modeling gene transcription as a function of intrinsic promoter activity and enhancer inputs.

a. ExP luciferase reporter construct. Seven enhancer fragments, with flanking polyadenylation signals, were cloned upstream of five promoter fragments and measured via the dual luciferase assay. b. Autonomous promoter activity of ExP luciferase (average luciferase signal of promoter with negative control) for 5 promoter sequences derived from 3 genes (MYC, PVT1, CCDC26). Error bars are 95% CI from 6 (MYC) or 4 (all other promoters) biological replicates. c. Enhancer activation (luciferase signal versus negative control sequence in the enhancer position) of seven enhancers across five promoter fragments. Error bars are 95% CI from 6 (MYC) or 4 (all other promoters) biological replicates. d–f. Gene transcription (y-axis): PRO-seq read counts in the gene body. d. Promoter Activity (x-axis, left): Intrinsic promoter activity, as measured by ExP STARR-seq. e. Enhancer Input (x-axis, center): enhancer activity (based on measurements of H3K27ac and DHS in the genome) multiplied by enhancer-promoter contact (based on Hi-C measurements), summed across all putative enhancers (DHS peaks) within 5 Mb of the gene promoter (excluding the promoter’s own peak), weighted by Hi-C contact as in the ABC Model²². f. Promoter Activity x Enhancer Input (x-axis, right). Labels: gene symbols for 741 promoters with sequence activity estimates from ExP STARR-seq and enhancer input estimates from ABC. Dotted lines: Line of best fit from linear regression in log₂ space.
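
The "Enhancer Input" quantity described above can be sketched as a contact-weighted sum of enhancer activities. This is a simplified illustration, not the ABC model implementation: the function name `enhancer_input`, its argument layout, and the toy numbers are hypothetical, and activity is taken as the geometric mean of the H3K27ac and DHS signals, which is how the legend for Extended Data Fig. 7 describes the ABC activity measure.

```python
import numpy as np

def enhancer_input(h3k27ac, dhs, contact, distance_bp, is_own_promoter,
                   max_distance_bp=5_000_000):
    """Sum of activity x contact over candidate elements near one promoter.

    h3k27ac, dhs: per-element signal values (hypothetical inputs).
    contact: per-element Hi-C contact frequency with the promoter.
    distance_bp: per-element distance to the promoter, in base pairs.
    is_own_promoter: True for the element that is the promoter's own peak.
    """
    activity = np.sqrt(np.asarray(h3k27ac, dtype=float) * np.asarray(dhs, dtype=float))
    contact = np.asarray(contact, dtype=float)
    within_range = np.abs(np.asarray(distance_bp)) <= max_distance_bp
    keep = within_range & ~np.asarray(is_own_promoter, dtype=bool)
    return float(np.sum(activity[keep] * contact[keep]))

# Toy example: three candidate elements around one promoter.
total = enhancer_input(
    h3k27ac=[4.0, 9.0, 16.0],
    dhs=[1.0, 4.0, 1.0],
    contact=[0.5, 0.1, 1.0],
    distance_bp=[1_000, 6_000_000, 0],      # second element is beyond 5 Mb
    is_own_promoter=[False, False, True],   # third element is the promoter itself
)
print(total)
```

Multiplying this value by the intrinsic promoter activity measured in ExP STARR-seq gives the x-axis quantity of panel f; the regression in the legend is then fitted on log₂-transformed values.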

Extended Data Fig. 4 Enhancer and promoter cluster identification and reproducibility.

a. Heatmap of deviations in enhancer-promoter STARR-seq expression from a multiplicative enhancer-promoter model (color scale: fold-difference between observed expression versus expression predicted by multiplicative model; grey: missing data). Same as Fig. 3a, except including clusters with weak sequences and missing data (E0 and P0). Vertical axis: promoter sequences grouped by class and sorted by responsiveness to E1 vs. E2; horizontal axis: enhancer sequences grouped by class and sorted by activation of P1 vs. P2. b. Distribution of intrinsic enhancer and promoter activity (expression versus genomic controls) by cluster. c. Fraction of enhancer-promoter pairs observed in ExP STARR-seq dataset (>= 2 plasmid barcodes) by cluster. d. Correlation of average promoter activation (expression versus genomic controls in enhancer position) by E2 versus E1 enhancer sequences. Each point is one promoter sequence. Same as Fig. 3c, except including P0 promoter sequences. e. Correlation of average activation of P2 versus P1 promoters. Each point is one enhancer sequence. Same as Fig. 3d, except including E0 enhancer sequences. f. Robustness of enhancer and promoter cluster assignments to downsampling of enhancer and promoter sequences. Clustering was repeated in 100 random downsamplings to 25% of promoter sequences and 25% of enhancer sequences (6.25% of original matrix). Heatmap: Average fraction overlap between cluster assignments from the full and downsampled matrices. g. Correlation of average promoter activation (expression versus genomic controls in enhancer position) by E2 versus E1 enhancer sequences using ‘average activity’ instead of model estimates. Each point is one promoter sequence. h. Correlation of average activation of P2 versus P1 promoters using ‘average activity’ instead of model estimates. Each point is one enhancer sequence.

Extended Data Fig. 5 Classes of enhancer and promoter sequences show distinct patterns of activation and responsiveness.

a. For 6 representative enhancer sequences (3 E1 and 3 E2 sequences), the pairwise correlation of promoter activation (expression versus genomic controls in enhancer position, averaged across plasmid barcodes). Each point is one promoter sequence. b. For 6 representative promoter sequences (3 P2 and 3 P1 sequences), the pairwise correlation of activation by enhancers (expression versus genomic controls in enhancer position, averaged across plasmid barcodes). Each point is one enhancer sequence.

Extended Data Fig. 6 Classes of enhancer sequences correspond to strong and weak genomic enhancers.

a. Volcano plot comparing ChIP-seq and other genomic features for E2 versus E1 enhancer sequences (see Supplementary Table 4). X-axis: ratio of average signal at E2 versus E1 enhancer sequences. Red dots: features with significantly higher signal at E1; no features have significantly higher signal at E2 enhancer sequences. b. Volcano plot comparing transcription factor motifs for E1 versus E2 enhancer sequences (see Supplementary Table 5). X-axis: ratio of average motif counts in E1 and E2 enhancer sequences. Red dots: Motifs significantly more frequent in E1 vs. E2 sequences. c. Volcano plot comparing transcription factor motifs for E1 and E2 versus E0 enhancer sequences (see Supplementary Table 5). X-axis: ratio of average motif counts in E1 and E2 versus E0 sequences. Red dots: Motifs significantly more frequent in E1 and E2 versus E0 sequences (>0) or more frequent in E0 versus E1 and E2 (<0). d. Mean H3K27ac ChIP-seq coverage of genomic elements corresponding to E0, E1, E2, or genomic control enhancer sequences (+/− 95% CI), aligned by DHS peak summit. Dotted lines mark bounds of the enhancer sequences used in ExP STARR-seq. E0 and E2 distributions are overlapping. e. % effect of genomic elements corresponding to E1 vs. E2 enhancer sequences on expression of genes corresponding to P1 promoters in CRISPRi screens, separated by quartiles of 3D contact frequency measured by Hi-C (0.39-11.9 (n = 9), 11.9-23.9 (n = 31), 23.9-58.3 (n = 36), 58.3-100 (n = 34)). *P < 0.05, two-sample, two-sided t-test. Boxes are median and interquartile range, whiskers are +/− 1.5*IQR. f. Cumulative density plot showing the cell-type specificity of enhancer sequences selected for ExP STARR-seq, and DNase peaks or ABC enhancers in K562 cells. X-axis: # of cell types other than K562 in which the element is predicted to be an ABC enhancer. g. GRO-Cap coverage of genomic enhancers used in ExP STARR-seq. Top: Mean coverage of enhancers corresponding to E1 vs. E2 classes. Bottom: Coverage across all individual enhancers. h. Evolutionary conservation of enhancers separated by enhancer class, as measured by mean phastCons score (probability of each nucleotide belonging to a conserved element) and mean phyloP score (-log(p-value) under a null hypothesis of neutral evolution) across each element. P-value from KS test.

Extended Data Fig. 7 Properties of promoter classes.

a. Cumulative density plot showing the cell-type specificity of promoter chromatin activity (of promoters selected for ExP STARR-seq). X-axis: # of biosamples (cell types or tissues) other than K562 in which the promoter is active. Active = top 50% of promoters by activity (geometric mean of H3K27ac and DHS signals, as used in the ABC model). All genes = all genes in the genome. b. Gene ontology log₂-enrichment for P1 promoters using P1 and P2 promoters as a background set. c. Predicted enhancer inputs for each gene (sum of ABC scores for all candidate enhancers within 5 Mb of the TSS, excluding the promoter of the gene itself) for genes in the genome corresponding to P1 versus P2 promoters. P = 0.00083, Mann-Whitney U test. Boxes are median and interquartile range, whiskers are +/− 1.5*IQR. d. DNase-seq signal in K562 cells at P1 and P2 promoters in the genome, aligned by boundaries of the 264-bp ExP STARR-seq promoter sequence (dotted gray lines, see Methods). e. H3K27ac ChIP-seq signal in K562 cells at P1 and P2 promoters in the genome, aligned by boundaries of the 264-bp ExP STARR-seq promoter sequence (dotted gray lines, see Methods). f. Number of nearby accessible elements (within 100 kb of the gene promoter, considering top 150,000 DNase peaks in K562 cells as used in the ABC model²²) for the 14 genes corresponding to P1 promoters and 11 genes corresponding to P2 promoters with comprehensive CRISPR tiling data. P = 0.17, Mann-Whitney U test. Boxes are median and interquartile range, whiskers are +/− 1.5*IQR. g. % effect of CRISPRi perturbations to genomic regulatory elements on genes corresponding to P1 vs. P2 promoters. P = 0.0071, t-test. h. Fraction of promoter sequences containing TATA or CA initiator core promoter motifs. i. GRO-Cap coverage of genomic promoters aligned by TSS. Top: mean coverage of genomic promoters corresponding to P1 vs. P2 classes. Bottom: coverage across all individual promoters. j. Normalized CpG content of P1 and P2 promoter sequences (n = 800), calculated as the ratio of observed to expected CpG = (CpG fraction) / ((GC content)² / 2). Boxes are median and interquartile range, whiskers are +/− 1.5*IQR, P = 1.37 × 10⁻¹⁰, t-test. k. Evolutionary conservation of promoters separated by promoter class, as measured by mean phastCons score (probability of each nucleotide belonging to a conserved element) and mean phyloP score (-log(p-value) under a null hypothesis of neutral evolution) across each element. P-value from KS test. l. Volcano plot comparing frequency of transcription factor motifs in P2 versus P1 promoter sequences (see Supplementary Table 7). X-axis: ratio of average motif counts in P2 versus P1 promoter sequences. Light blue and dark blue dots: motifs significantly more frequent in P1 or P2 promoter sequences, respectively. Red outline: significant motifs for ETS family TFs. m. Volcano plot comparing frequency of transcription factor motifs in P2 and P1 versus P0 promoter sequences (see Supplementary Table 7). X-axis: ratio of average motif counts in P2 and P1 versus P0 promoter sequences. Dark blue dots: motifs significantly more frequent in P2 and P1 vs. P0 promoter sequences. n. Fraction of P2 promoter sequences with YY1 and GABPA binding motifs by nucleotide position, aligned by TSS and separated by strand (see Methods).
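
The activity definition in panel a (geometric mean of H3K27ac and DHS signals, with the top 50% of promoters called active) can be illustrated with a minimal sketch. The function names, the input layout, and the use of a strict median cutoff per biosample are assumptions made for illustration; they are not taken from the authors' code or from the ABC model implementation. To match the panel's x-axis, K562 would be left out of the input.

```python
from math import sqrt
from statistics import median


def activity(h3k27ac, dhs):
    """Geometric mean of the two chromatin signals for one promoter."""
    return sqrt(h3k27ac * dhs)


def count_active_biosamples(signals):
    """Count, per promoter, the biosamples in which it is 'active'.

    signals: {biosample: {promoter: (h3k27ac, dhs)}} (hypothetical layout).
    A promoter is called active in a biosample when its activity is
    strictly above that biosample's median activity (assumed reading
    of 'top 50% of promoters by activity').
    """
    counts = {}
    for per_promoter in signals.values():
        scores = {p: activity(*sig) for p, sig in per_promoter.items()}
        cutoff = median(scores.values())
        for promoter, score in scores.items():
            counts[promoter] = counts.get(promoter, 0) + int(score > cutoff)
    return counts
```

A promoter with a high count under this definition is active across many biosamples, which is the quantity the cumulative density plot summarizes.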

Extended Data Fig. 8 Transcription factors enriched at promoters and enhancers and hybrid-selection STARR-seq in K562 cells.

a. ChIP-seq signal for 5 transcription factors in K562 cells at P1 and P2 promoters in the genome, aligned by boundaries of the 264-bp ExP STARR-seq promoter sequence (see Methods). Top: average ChIP-seq signal normalized to input. Bottom: signal at individual genomic promoters. Black line: average for random genomic control sequences. b. ChIP-seq signal at E1 and E2 enhancers in the genome. Black line: average for random genomic control sequences. c. Correlation between intrinsic promoter activity and responsiveness of promoters to E1 enhancers (average activation by E1 sequences, expression vs. random genomic controls). Each point is one promoter. Same as Fig. 5b, but in linear scale instead of log₂ scale. d. Correlation of HS-STARR-seq expression between biological replicate experiments for promoter and accessible element pools, calculated for individual elements with unique plasmid barcodes. Axes represent the average STARR-seq expression (RNA/DNA, log₁₀ scale) of two biological replicates. Density: number of plasmids. e. Fragment length distribution in HS-STARR-seq in promoter and accessible element pools, for fragments with at least 25 DNA counts. f. Relationship between STARR-seq expression (y-axis) and fragment length (x-axis) in HS-STARR-seq. Density: number of plasmids.

Extended Data Fig. 9 Motif insertion and scramble ExP STARR-seq in K562 cells and generalizability of compatibility rules.

a. Correlation of ExP STARR-seq expression between biological replicate experiments, calculated for individual enhancer-promoter pairs with unique plasmid barcodes. Axes represent the average STARR-seq expression (RNA/DNA) of individual biological replicates. Density: number of enhancer-promoter plasmids. b. Distribution of plasmid barcodes per enhancer-promoter pair. Red dotted line: threshold of two plasmid barcodes. c. STARR-seq expression in smaller-scale validation experiment (y-axis) vs. expression in the original ExP STARR-seq dataset (x-axis) for each enhancer-promoter pair included in both experiments. Dotted gray line: line of best fit from linear regression in log₂ space. d. Change in enhancer activity with P1 or P2 promoters (edited enhancer activity compared with unedited enhancer activity with a promoter) after inserting 2, 4, or 6 GABPA motifs into 1 E0 enhancer sequence. Each point represents one enhancer-promoter pair measured over 4 biological replicates. *P < 0.0001, two-tailed t-test. Boxes are median and interquartile range, whiskers are +/− 1.5*IQR. e. Fraction of variance explained by intrinsic promoter activity and enhancer activity with respect to log₂ reporter expression (reporter assay score) from Martinez-Ara et al. 2021³⁹. Left bars: experiment including promoters and enhancers from the Nanog and Klf2 loci. Right bars: experiment including promoters and enhancers from the Tfcp2l1 locus. For each experiment, values are shown for pairs with 2 or more, or 5 or more plasmid barcodes. Enhancer and promoter activities explain more of the variance when considering enhancer-promoter pairs with at least 5 vs. at least 2 barcodes. Bar plots show sequential sum of squares (Type-I ANOVA) for promoters, then enhancers. f. Correlation of reporter assay expression with the product of intrinsic promoter and enhancer activities from two experiments from Martinez-Ara et al., 2021³⁹. Density color scale: number of enhancer-promoter pairs.
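
The variance decomposition in panel e can be sketched as follows. This is an illustration only, under a simplifying assumption that is not stated in the legend: every promoter is paired with every enhancer exactly once (a complete, balanced grid). In that case the sequential (Type-I) sums of squares for promoters, then enhancers, reduce to the sums of squared deviations of the row and column means from the grand mean in log₂ space. The function name and the input layout are hypothetical; an unbalanced dataset would need a proper sequential least-squares fit instead.

```python
from math import log2


def variance_explained(expr):
    """Fraction of log2-expression variance explained by promoter and enhancer.

    expr: {(promoter, enhancer): expression} on a complete, balanced grid
    (hypothetical layout; one positive value per pair).
    """
    promoters = sorted({p for p, _ in expr})
    enhancers = sorted({e for _, e in expr})
    if len(expr) != len(promoters) * len(enhancers):
        raise ValueError("every promoter must be paired with every enhancer")

    y = {pair: log2(value) for pair, value in expr.items()}
    grand = sum(y.values()) / len(y)

    # Mean log2 expression per promoter (rows) and per enhancer (columns).
    p_mean = {p: sum(y[p, e] for e in enhancers) / len(enhancers) for p in promoters}
    e_mean = {e: sum(y[p, e] for p in promoters) / len(promoters) for e in enhancers}

    ss_total = sum((v - grand) ** 2 for v in y.values())
    ss_promoter = len(enhancers) * sum((m - grand) ** 2 for m in p_mean.values())
    ss_enhancer = len(promoters) * sum((m - grand) ** 2 for m in e_mean.values())

    return {
        "promoter": ss_promoter / ss_total,
        "enhancer": ss_enhancer / ss_total,
        "residual": 1 - (ss_promoter + ss_enhancer) / ss_total,
    }
```

When expression is exactly the product of a promoter activity and an enhancer activity, the model is additive in log₂ space and the residual fraction is zero; departures from a multiplicative model show up as residual variance.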

Extended Data Fig. 10 Model of the effect of an enhancer on RNA expression.

a. Simple rules of enhancer and promoter compatibility. The effects of enhancers on nearby genes in the human genome are controlled by the quantitative tuning of intrinsic promoter activity, intrinsic enhancer activity, enhancer-promoter 3D contact, and enhancer-promoter class compatibility.

Supplementary information

Reporting Summary

Supplementary Table 1

Promoter sequences used in ExP STARR-seq.

Supplementary Table 2

Enhancer sequences used in ExP STARR-seq.

Supplementary Table 3

ExP luciferase elements and data.

Supplementary Table 4

Biochemical feature enrichment in E1 vs E2 enhancers.

Supplementary Table 5

TF motif enrichment in E1 vs E2 enhancers.

Supplementary Table 6

Biochemical feature enrichment in P1 vs P2 promoters.

Supplementary Table 7

TF motif enrichment in P1 vs P2 promoters.

Supplementary Table 8

Genome-wide predictions of promoter class.

Supplementary Table 9

Motifs correlated with enhancer and promoter activity.

Supplementary Table 10

Primer and oligonucleotide sequences.

Supplementary Table 11

ENCODE datasets used to annotate ExP enhancers and promoters.

Supplementary Table 12

Enhancer hybrid selection probe sequences for HS-STARR-seq.

Supplementary Table 13

Promoter HS probe sequences for HS-STARR-seq.

Supplementary Table 14

Promoters used in motif insertion and mutation ExP STARR-seq.

Supplementary Table 15

Enhancers used in motif insertion and mutation ExP STARR-seq.

About this article

Cite this article

Bergman, D.T., Jones, T.R., Liu, V. et al. Compatibility rules of human enhancer and promoter sequences. Nature 607, 176–184 (2022). https://doi.org/10.1038/s41586-022-04877-w

Received: 28 September 2021
Accepted: 17 May 2022
Published: 20 May 2022
Issue Date: 07 July 2022
DOI: https://doi.org/10.1038/s41586-022-04877-w
