Abstract
Gene regulation in the human genome is controlled by distal enhancers that activate specific nearby promoters¹. A proposed model for this specificity is that promoters have sequence-encoded preferences for certain enhancers, for example, mediated by interacting sets of transcription factors or cofactors². This ‘biochemical compatibility’ model has been supported by observations at individual human promoters and by genome-wide measurements in Drosophila³⁻⁹. However, the degree to which human enhancers and promoters are intrinsically compatible has not yet been systematically measured, and how their activities combine to control RNA expression remains unclear. Here we designed a high-throughput reporter assay called enhancer × promoter self-transcribing active regulatory region sequencing (ExP STARR-seq) and applied it to examine the combinatorial compatibilities of 1,000 enhancer and 1,000 promoter sequences in human K562 cells. We identify simple rules for enhancer–promoter compatibility, whereby most enhancers activate all promoters by similar amounts, and intrinsic enhancer and promoter activities multiplicatively combine to determine RNA output (R² = 0.82). In addition, two classes of enhancers and promoters show subtle preferential effects. Promoters of housekeeping genes contain built-in activating motifs for factors such as GABPA and YY1, which decrease the responsiveness of promoters to distal enhancers. Promoters of variably expressed genes lack these motifs and show stronger responsiveness to enhancers. Together, this systematic assessment of enhancer–promoter compatibility suggests a multiplicative model tuned by enhancer and promoter class to control gene transcription in the human genome.
Data availability
Raw and processed data for ExP STARR-seq, motif ExP STARR-seq, HS-STARR-seq and K562 PRO-seq can be found at the NCBI’s Gene Expression Omnibus under accession number GSE184426. Luciferase data can be found in Supplementary Table 3. Datasets used from the ENCODE Project are listed in Supplementary Table 10 and are available at https://www.encodeproject.org. Additional resources and protocols related to this study are available at https://www.engreitzlab.org/resources/.
Code availability
Code for fitting the multiplicative ExP model is available at https://doi.org/10.5281/zenodo.6514733 or https://github.com/broadinstitute/ExP-model-fit.
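As a rough illustration of what "multiplicative" means here, the sketch below fits a log-additive model, log2(expression) ≈ promoter term + enhancer term, to a synthetic promoter × enhancer matrix and reports R². It is not the authors' implementation (that is at the links above, and the figure legends describe a Poisson count model). The function names, the complete-matrix assumption, the least-squares fit and the simulated data are all assumptions made for illustration.

```python
import random

def fit_log_additive(log_expr):
    """Least-squares fit of log_expr[i][j] ~ p[i] + e[j] on a complete matrix.

    Rows are promoters, columns are enhancers (hypothetical layout).
    The grand mean is folded into the promoter terms.
    """
    n_p, n_e = len(log_expr), len(log_expr[0])
    grand = sum(map(sum, log_expr)) / (n_p * n_e)
    p = [sum(row) / n_e for row in log_expr]  # row means
    e = [sum(row[j] for row in log_expr) / n_p - grand for j in range(n_e)]
    return p, e

def r_squared(log_expr, p, e):
    """Fraction of variance in log expression explained by p[i] + e[j]."""
    values = [v for row in log_expr for v in row]
    mean = sum(values) / len(values)
    ss_tot = sum((v - mean) ** 2 for v in values)
    ss_res = sum((log_expr[i][j] - p[i] - e[j]) ** 2
                 for i in range(len(p)) for j in range(len(e)))
    return 1.0 - ss_res / ss_tot

# Synthetic data: expression = promoter activity x enhancer activity x noise,
# which is additive in log2 space. All values are simulated.
rng = random.Random(0)
true_p = [rng.gauss(0, 1) for _ in range(20)]
true_e = [rng.gauss(0, 1) for _ in range(30)]
log_expr = [[pi + ej + rng.gauss(0, 0.5) for ej in true_e] for pi in true_p]

p_hat, e_hat = fit_log_additive(log_expr)
r2 = r_squared(log_expr, p_hat, e_hat)
print(f"R^2 of the log-additive fit: {r2:.2f}")
```

Real ExP STARR-seq matrices have missing pairs and count noise, so row and column means would not be an adequate estimator there. The sketch only shows why a product of two intrinsic activities becomes a sum in log space.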
References
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
van Arensbergen, J., van Steensel, B. & Bussemaker, H. J. In search of the determinants of enhancer–promoter interaction specificity. Trends Cell Biol. 24, 695–702 (2014).
Emami, K. H., Navarre, W. W. & Smale, S. T. Core promoter specificities of the Sp1 and VP16 transcriptional activation domains. Mol. Cell. Biol. 15, 5906–5916 (1995).
Ohtsuki, S., Levine, M. & Cai, H. N. Different core promoters possess distinct regulatory activities in the Drosophila embryo. Genes Dev. 12, 547–556 (1998).
Emami, K. H., Jain, A. & Smale, S. T. Mechanism of synergy between TATA and initiator: synergistic binding of TFIID following a putative TFIIA-induced isomerization. Genes Dev. 11, 3007–3019 (1997).
Butler, J. E. F. Enhancer–promoter specificity mediated by DPE or TATA core promoter motifs. Genes Dev. 15, 2515–2519 (2001).
Yean, D. & Gralla, J. Transcription reinitiation rate: a special role for the TATA box. Mol. Cell. Biol. 17, 3809–3816 (1997).
Wefald, F. C., Devlin, B. H. & Williams, R. S. Functional heterogeneity of mammalian TATA-box sequences revealed by interaction with a cell-specific enhancer. Nature 344, 260–262 (1990).
Zabidi, M. A., Arnold, C. D., Schernhuber, K. & Pagani, M. Enhancer–core-promoter specificity separates developmental and housekeeping gene regulation. Nature 518, 556–559 (2015).
Banerji, J., Rusconi, S. & Schaffner, W. Expression of a β-globin gene is enhanced by remote SV40 DNA sequences. Cell 27, 299–308 (1981).
Banerji, J., Olson, L. & Schaffner, W. A lymphocyte-specific cellular enhancer is located downstream of the joining region in immunoglobulin heavy chain genes. Cell 33, 729–740 (1983).
Melnikov, A. et al. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nat. Biotechnol. 30, 271–277 (2012).
Arnold, C. D. et al. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science 339, 1074–1077 (2013).
Kermekchiev, M., Pettersson, M., Matthias, P. & Schaffner, W. Every enhancer works with every promoter for all the combinations tested: could new regulatory pathways evolve by enhancer shuffling? Gene Expr. 1, 71–81 (1991).
Tewhey, R. et al. Direct identification of hundreds of expression-modulating variants using a multiplexed reporter assay. Cell 172, 1132–1134 (2018).
Klein, J. C. et al. A systematic evaluation of the design and context dependencies of massively parallel reporter assays. Nat. Methods 17, 1083–1091 (2020).
Muerdter, F. et al. Resolving systematic errors in widely used enhancer activity assays in human cells. Nat. Methods 15, 141–149 (2018).
Nguyen, T. A. et al. High-throughput functional comparison of promoter and enhancer activities. Genome Res. 26, 1023–1033 (2016).
Arnold, C. D. et al. Genome-wide assessment of sequence-intrinsic enhancer responsiveness at single-base-pair resolution. Nat. Biotechnol. 35, 136–144 (2017).
Haberle, V. et al. Transcriptional cofactors display specificity for distinct types of core promoters. Nature 570, 122–126 (2019).
Li, X. & Noll, M. Compatibility between enhancers and promoters determines the transcriptional specificity of gooseberry and gooseberry neuro in the Drosophila embryo. EMBO J. 13, 400–406 (1994).
Fulco, C. P. et al. Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).
van Arensbergen, J. et al. Genome-wide mapping of autonomous promoter activity in human cells. Nat. Biotechnol. 35, 145–153 (2017).
Wall, L., deBoer, E. & Grosveld, F. The human β-globin gene 3′ enhancer contains multiple binding sites for an erythroid-specific protein. Genes Dev. 2, 1089–1100 (1988).
Tuan, D. Y., Solomon, W. B., London, I. M. & Lee, D. P. An erythroid-specific, developmental-stage-independent enhancer far upstream of the human “beta-like globin” genes. Proc. Natl Acad. Sci. USA 86, 2554–2558 (1989).
Thakore, P. I. et al. Highly specific epigenome editing by CRISPR–Cas9 repressors for silencing of distal regulatory elements. Nat. Methods 12, 1143–1149 (2015).
Klann, T. S. et al. CRISPR–Cas9 epigenome editing enables high-throughput screening for functional regulatory elements in the human genome. Nat. Biotechnol. 35, 561–568 (2017).
Fulco, C. P. et al. Systematic mapping of functional enhancer-promoter connections with CRISPR interference. Science 354, 769–773 (2016).
Liu, Y. et al. Functional assessment of human enhancer activities using whole-genome STARR-sequencing. Genome Biol. 18, 219 (2017).
Haberle, V. & Stark, A. Eukaryotic core promoters and the functional basis of transcription initiation. Nat. Rev. Mol. Cell Biol. 19, 621–637 (2018).
Lenhard, B., Sandelin, A. & Carninci, P. Metazoan promoters: emerging characteristics and insights into transcriptional regulation. Nat. Rev. Genet. 13, 233–245 (2012).
Fan, K., Moore, J. E., Zhang, X.-O. & Weng, Z. Genetic and epigenetic features of promoters with ubiquitous chromatin accessibility support ubiquitous transcription of cell-essential genes. Nucleic Acids Res. 49, 5705–5725 (2021).
Xi, H. et al. Identification and characterization of cell type-specific and ubiquitous chromatin regulatory structures in the human genome. PLoS Genet. 3, e136 (2007).
Landolin, J. M. et al. Sequence features that drive human promoter function and tissue specificity. Genome Res. 20, 890–898 (2010).
Weingarten-Gabbay, S. et al. Systematic interrogation of human promoters. Genome Res. 29, 171–183 (2019).
Sahu, B. et al. Sequence determinants of human gene regulatory elements. Nat. Genet. 54, 283–294 (2022).
Yu, M. et al. GA-binding protein-dependent transcription initiator elements. Effect of helical spacing between polyomavirus enhancer a factor 3(PEA3)/ETS-binding sites on initiator activity. J. Biol. Chem. 272, 29060–29067 (1997).
Curina, A. et al. High constitutive activity of a broad panel of housekeeping and tissue-specific cis-regulatory elements depends on a subset of ETS proteins. Genes Dev. 31, 399–412 (2017).
Martinez-Ara, M., Comoglio, F., van Arensbergen, J. & van Steensel, B. Systematic analysis of intrinsic enhancer–promoter compatibility in the mouse genome. Mol. Cell https://doi.org/10.1101/2021.10.21.465269 (2022).
Maricque, B. B., Chaudhari, H. G. & Cohen, B. A. A massively parallel reporter assay dissects the influence of chromatin structure on cis-regulatory activity. Nat. Biotechnol. 37, 90–95 (2019).
Hong, C. K. Y. & Cohen, B. A. Genomic environments scale the activities of diverse core promoters. Genome Res. 32, 85–96 (2022).
Chiang, C. M. & Roeder, R. G. Cloning of an intrinsic human TFIID subunit that interacts with multiple transcriptional activators. Science 267, 531–536 (1995).
Austen, M., Lüscher, B. & Lüscher-Firzlaff, J. M. Characterization of the transcriptional regulator YY1. The bipartite transactivation domain is independent of interaction with the TATA box-binding protein, transcription factor IIB, TAFII55, or cAMP-responsive element-binding protein (CPB)-binding protein. J. Biol. Chem. 272, 1709–1717 (1997).
Sucharov, C., Basu, A., Carter, R. S. & Avadhani, N. G. A novel transcriptional initiator activity of the GABP factor binding ets sequence repeat from the murine cytochrome c oxidase Vb gene. Gene Expr. 5, 93–111 (1995).
Carter, R. S. & Avadhani, N. G. Cooperative binding of GA-binding protein transcription factors to duplicated transcription initiation region repeats of the cytochrome c oxidase subunit IV gene. J. Biol. Chem. 269, 4381–4387 (1994).
Usheva, A. & Shenk, T. YY1 transcriptional initiator: protein interactions and association with a DNA site containing unpaired strands. Proc. Natl Acad. Sci. USA 93, 13571–13576 (1996).
Larsson, A. J. M. et al. Genomic encoding of transcriptional burst kinetics. Nature 565, 251–254 (2019).
The FANTOM Consortium and the RIKEN PMI and CLST (DGT). A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).
Wang, T., Lander, E. S. & Sabatini, D. M. Large-scale single guide RNA library construction and use for CRISPR–Cas9-based genetic screens. Cold Spring Harb. Protoc. 2016, db.top086892 (2016).
Engreitz, J. M. et al. Local regulation of gene expression by lncRNA promoters, transcription and splicing. Nature 539, 452–455 (2016).
Nasser, J. et al. Genome-wide enhancer maps link risk variants to disease genes. Nature 593, 238–243 (2021).
Anscombe, F. J. The transformation of Poisson, binomial and negative-binomial data. Biometrika 35, 246–254 (1948).
Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).
Kulakovskiy, I. V. et al. HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-seq analysis. Nucleic Acids Res. 46, D252–D259 (2018).
Core, L. J. et al. Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat. Genet. 46, 1311–1320 (2014).
Vanhille, L. et al. High-throughput and quantitative assessment of enhancer activity in mammals by CapStarr-seq. Nat. Commun. 6, 6905 (2015).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
The R Development Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2014).
Van Rossum, G. & Drake, F. L. Python 3 Reference Manual. (CreateSpace, 2009).
Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
McKinney, W. Data structures for statistical computing in Python. in Proc. 9th Python in Science Conference (eds van der Walt, S. & Millman, J.) 51–56 (SciPy, 2010).
Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
Waskom, M. seaborn: statistical data visualization. J. Open Source Softw. 6, 3021 (2021).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Stovner, E. B. & Sætrom, P. PyRanges: efficient comparison of genomic intervals in Python. Bioinformatics 36, 918–919 (2020).
Seabold, S. & Perktold, J. Statsmodels: econometric and statistical modeling with Python. in Proc. 9th Python in Science Conference (eds van der Walt, S. & Millman, J.) 92–96 (SciPy, 2010).
Acknowledgements
This work was supported by an NHGRI Genomic Innovator Award (R35HG011324 to J.M.E.); Gordon and Betty Moore and the BASE Research Initiative at the Lucile Packard Children’s Hospital at Stanford University (J.M.E.); an NIH Pathway to Independence Award (K99HG009917 and R00HG009917 to J.M.E.); the Harvard Society of Fellows (J.M.E.); the Novo Nordisk Foundation Center for Genomic Mechanisms of Disease (J.M.E.); the Broad Institute (E.S.L.); an AΩA Carolyn L. Kuckein Student Research Fellowship (D.T.B.); NHGRI Ruth L. Kirschstein NRSA Predoctoral Institutional Research Training Grants (T32HG000044, V.L.); and by the National Institute of General Medical Sciences (T32GM007753, L.S.). We thank B. van Steensel and M. Martinez-Ara for sharing data and discussing analysis. We thank C. Vockley, V. Subramanian and members of the Engreitz and Lander research groups for discussions and technical assistance. E.S.L. is currently on leave from the Broad Institute, MIT, and Harvard.
Author information
Contributions
D.T.B., C.P.F., T.R.J., J.R. and J.M.E. developed the ExP STARR-seq assay. D.T.B., J.R., M.K., A.R. and T.H.N. performed the STARR-seq experiments. M.K. performed the luciferase assay experiments. D.T.B., T.R.J., V.L., E.J., L.S., H.Y.K., J.N., S.R.G. and J.M.E. analysed the STARR-seq data. M.K. and J.M.E. analysed the luciferase assay data. E.S.L. and J.M.E. supervised the work. All authors contributed to writing the manuscript.
Ethics declarations
Competing interests
C.P.F. is now an employee and shareholder of Bristol Myers Squibb. J.M.E. is a shareholder of Illumina, Inc., and other biotechnology companies. All other authors declare no competing interests.
Peer review
Peer review information
Nature thanks Alex Nord and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Design and reproducibility of ExP STARR-seq.
a. ExP STARR-seq reporter construct (pA = polyadenylation signal; purple = promoter sequencing adaptors; angled = spliced sequence; trGFP = truncated GFP open reading frame with start and stop codon; BC = 16 bp N-mer plasmid barcode; red = enhancer sequencing adaptors) and 1,000 × 1,000 K562 library contents. b. Correlation of ExP STARR-seq expression between biological replicate experiments, calculated for individual enhancer-promoter pairs with unique plasmid barcodes. Axes represent the average STARR-seq expression (RNA/DNA) of individual biological replicates. Density: number of enhancer-promoter plasmids. c. Fraction of remaining enhancer-promoter plasmids passing DNA (>25) and RNA (>1) thresholds (y-axis) with downsampling of sequencing reads (x-axis). d. Distribution of plasmid barcodes per enhancer-promoter pair. Red dotted line: threshold of two plasmid barcodes. e. Correlation between virtual replicates, formed by sampling two nonoverlapping groups of three plasmid barcodes from pairs with at least 6 barcodes, and averaging log2(RNA/DNA) within groups. f. Correlation between virtual replicates as in (e) for increasing numbers of plasmid barcodes per pair in virtual replicates. g. DNase-seq, H3K27ac ChIP-seq, and PRO-seq (RPM) by increasing quartile of autonomous promoter activity and average enhancer activity in ExP STARR-seq (n = 800). Box: median and interquartile range (IQR). Whiskers: +/− 1.5 x IQR. h. Activation in ExP STARR-seq (expression versus genomic controls in distal position) of GATA1 and HDAC6 promoters by eHDAC6 (chrX:48641342-48641606). Ctrl = activity of promoters with random genomic controls in enhancer position. Error bars: 95% CI across plasmid barcodes. n = 7 (GATA1-ctrl), 381 (HDAC6-ctrl), 4 (eHDAC6-GATA1), 37 (eHDAC6-HDAC6). i. Average enhancer activity (STARR-seq expression of plasmids containing a given enhancer averaged across all promoters) of enhancer sequences derived from random genomic controls (n = 87), accessible elements (n = 725), and genomic enhancers validated in CRISPR experiments (n = 89).
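The virtual-replicate procedure described in this legend can be sketched roughly as follows: for each enhancer-promoter pair with at least six plasmid barcodes, draw two nonoverlapping groups of three barcodes, average log2(RNA/DNA) within each group, and correlate the two group means across pairs. The data layout, function names and simulated values are assumptions for illustration and are not taken from the authors' code.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def virtual_replicates(log_ratios_by_pair, group_size=3, rng=None):
    """Split barcodes of each pair into two disjoint groups and average each.

    log_ratios_by_pair: dict mapping a pair id to a list of per-barcode
    log2(RNA/DNA) values (hypothetical layout). Pairs with fewer than
    2 * group_size barcodes are skipped.
    """
    rng = rng or random.Random()
    rep_a, rep_b = [], []
    for values in log_ratios_by_pair.values():
        if len(values) < 2 * group_size:
            continue
        # Sampling without replacement guarantees the two groups do not overlap.
        picked = rng.sample(values, 2 * group_size)
        rep_a.append(sum(picked[:group_size]) / group_size)
        rep_b.append(sum(picked[group_size:]) / group_size)
    return rep_a, rep_b

# Simulated barcode-level measurements: one true value per pair plus noise.
rng = random.Random(1)
data = {}
for pair_id in range(200):
    true_value = rng.gauss(0, 1.5)
    n_barcodes = rng.randint(2, 12)
    data[pair_id] = [true_value + rng.gauss(0, 1) for _ in range(n_barcodes)]

rep_a, rep_b = virtual_replicates(data, group_size=3, rng=rng)
r = pearson(rep_a, rep_b)
print(f"{len(rep_a)} pairs used, Pearson r = {r:.2f}")
```

Raising `group_size` while tightening the barcode threshold accordingly mimics the comparison across increasing numbers of barcodes per pair in panel f.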
Extended Data Fig. 2 Comparison of methods of estimating enhancer and promoter activities and the multiplicative model.
a. Intrinsic promoter activity (expression versus random genomic controls in enhancer position) of five selected promoters. Error bars: 95% CI across plasmid barcodes (n = 54-79). Promoter classes (see Methods): DNASE2 (P1), HDAC6 (P1), CD164 (P1), BCAT2 (P1), PPP1R15A (P2). b. Activation (expression versus random genomic controls in enhancer position) of 5 selected promoters by 5 selected enhancers: 1 = chr11:61602148-61602412 (E1), 2 = chr19:49467061-49467325 (E1), 3 = chrX:48641342-48641606 (E1), 4 = chr19:12893216-12893480 (E2), 5 = chr17:40851134-40851398 (E1). Error bars: 95% CI across plasmid barcodes (n = 12-56). c–d. Heatmap of promoter activity (c, expression divided by intrinsic enhancer activity) or enhancer activity (d, expression divided by intrinsic promoter activity) across all pairs of promoter (vertical) and enhancer sequences (horizontal). Axes are sorted by intrinsic promoter and enhancer activities, as in Fig. 2j. Grey: missing data. e. Intrinsic promoter and enhancer activity (y-axis, estimated by a Poisson count model) versus average pairwise Spearman correlation (as in Fig. 2c, d). f–g. Correlation between two estimates of promoter (f) and enhancer (g) activities. One method (“average activity”, x-axis) estimates activity by averaging across elements, and the other method (“intrinsic activity”, y-axis) estimates activity by using coefficients estimated by a Poisson count model (see Methods). h–i. Correlation of intrinsic promoter (h) and enhancer (i) activity estimates from the Poisson model using data from separate replicate experiments. j–k. Fraction of variance explained by promoter activity, enhancer activity and class interaction from the perspective of expression (STARR-seq score) and enhancer activation (fold-activation of an enhancer on a promoter, normalizing out promoter strength), limited to pairs with 2 or more (j) or 20 or more (k) plasmid barcodes. Plot includes pairs with P0 promoters and E0 enhancers. Bar plots show sequential sum of squares (Type-I ANOVA). l. Correlation of the multiplicative enhancer x promoter model with STARR-seq expression comparing enhancer-promoter pairs located within 10 kb, 100 kb, and pairs located on different chromosomes.
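A simplified picture of the variance partition in panels j–k: on a complete, balanced promoter × enhancer matrix of log2 expression, the sums of squares for promoter and enhancer identity can be computed directly from row and column means, and in that balanced case they do not depend on the order in which terms enter. The sketch below makes exactly that assumption and omits the class-interaction term. The real dataset has missing pairs, where the sequential (Type-I) order matters and a regression package would be needed. Names and data are hypothetical.

```python
import random

def variance_partition(log_expr):
    """Fractions of total variance attributed to promoter, enhancer, residual.

    Assumes a complete matrix (rows = promoters, columns = enhancers) with
    one value per pair, so the two main-effect sums of squares are orthogonal.
    """
    n_p, n_e = len(log_expr), len(log_expr[0])
    grand = sum(map(sum, log_expr)) / (n_p * n_e)
    row_means = [sum(row) / n_e for row in log_expr]
    col_means = [sum(row[j] for row in log_expr) / n_p for j in range(n_e)]

    ss_total = sum((v - grand) ** 2 for row in log_expr for v in row)
    ss_promoter = n_e * sum((m - grand) ** 2 for m in row_means)
    ss_enhancer = n_p * sum((m - grand) ** 2 for m in col_means)
    ss_residual = ss_total - ss_promoter - ss_enhancer
    return {
        "promoter": ss_promoter / ss_total,
        "enhancer": ss_enhancer / ss_total,
        "residual": ss_residual / ss_total,
    }

# Simulated log2 expression with a larger promoter effect than enhancer effect.
rng = random.Random(2)
promoter_effect = [rng.gauss(0, 2.0) for _ in range(25)]
enhancer_effect = [rng.gauss(0, 1.0) for _ in range(40)]
matrix = [[p + e + rng.gauss(0, 0.5) for e in enhancer_effect]
          for p in promoter_effect]

fractions = variance_partition(matrix)
for term, frac in fractions.items():
    print(f"{term}: {frac:.2f}")
```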
Extended Data Fig. 3 Validation of enhancer-promoter multiplication via luciferase assays and modeling gene transcription as a function of intrinsic promoter activity and enhancer inputs.
a. ExP luciferase reporter construct. Seven enhancer fragments, with flanking polyadenylation signals, were cloned upstream of five promoter fragments and measured via the dual luciferase assay. b. Autonomous promoter activity of ExP luciferase (average luciferase signal of promoter with negative control) for 5 promoter sequences derived from 3 genes (MYC, PVT1, CCDC26). Error bars are 95% CI from 6 (MYC) or 4 (all other promoters) biological replicates. c. Enhancer activation (luciferase signal versus negative control sequence in the enhancer position) of seven enhancers across five promoter fragments. Error bars are 95% CI from 6 (MYC) or 4 (all other promoters) biological replicates. d–f. Gene transcription (y-axis): PRO-seq read counts in the gene body. d. Promoter Activity (x-axis, left): intrinsic promoter activity, as measured by ExP STARR-seq. e. Enhancer Input (x-axis, center): enhancer activity (based on measurements of H3K27ac and DHS in the genome) multiplied by enhancer-promoter contact (based on Hi-C measurements), summed across all putative enhancers (DHS peaks) within 5 Mb of the gene promoter (excluding the promoter’s own peak), weighted by Hi-C contact as in the ABC Model²². f. Promoter Activity x Enhancer Input (x-axis, right). Labels: gene symbols for 741 promoters with sequence activity estimates from ExP STARR-seq and enhancer input estimates from ABC. Dotted lines: line of best fit from linear regression in log2 space.
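The "Enhancer Input" quantity in this legend can be sketched as a sum of activity × contact over candidate elements near a gene. The sketch assumes that activity is the geometric mean of H3K27ac and DHS signals (as the Extended Data Fig. 7 legend states for the ABC model), that the promoter's own peak is excluded, and that only elements within 5 Mb are counted. Field names and numbers are invented for illustration and this is not the ABC model code.

```python
import math

MAX_DISTANCE_BP = 5_000_000  # 5 Mb window stated in the legend

def element_activity(h3k27ac, dhs):
    """Geometric mean of H3K27ac and DHS signal for one element."""
    return math.sqrt(h3k27ac * dhs)

def enhancer_input(elements, max_distance=MAX_DISTANCE_BP):
    """Sum of activity x contact over candidate elements for one gene.

    elements: list of dicts with hypothetical keys 'h3k27ac', 'dhs',
    'contact', 'distance' (bp from the TSS) and 'is_own_promoter'.
    """
    total = 0.0
    for el in elements:
        if el["is_own_promoter"] or el["distance"] > max_distance:
            continue
        total += element_activity(el["h3k27ac"], el["dhs"]) * el["contact"]
    return total

# Toy example with made-up signals for one gene.
elements = [
    {"h3k27ac": 4.0, "dhs": 9.0, "contact": 0.5, "distance": 20_000,
     "is_own_promoter": False},   # activity 6.0, contributes 3.0
    {"h3k27ac": 1.0, "dhs": 16.0, "contact": 0.25, "distance": 300_000,
     "is_own_promoter": False},   # activity 4.0, contributes 1.0
    {"h3k27ac": 25.0, "dhs": 25.0, "contact": 1.0, "distance": 0,
     "is_own_promoter": True},    # excluded: the promoter's own peak
    {"h3k27ac": 9.0, "dhs": 9.0, "contact": 0.1, "distance": 6_000_000,
     "is_own_promoter": False},   # excluded: beyond 5 Mb
]

gene_enhancer_input = enhancer_input(elements)
intrinsic_promoter_activity = 2.0  # hypothetical ExP STARR-seq estimate
combined = intrinsic_promoter_activity * gene_enhancer_input
print(gene_enhancer_input, combined)
```

The product in the last step corresponds to the "Promoter Activity x Enhancer Input" axis. Whether the authors applied any further scaling before plotting is not stated in the legend.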
Extended Data Fig. 4 Enhancer and promoter cluster identification and reproducibility.
a. Heatmap of deviations in enhancer-promoter STARR-seq expression from a multiplicative enhancer-promoter model (color scale: fold-difference between observed expression versus expression predicted by multiplicative model; gray: missing data). Same as Fig. 3a, except including clusters with weak sequences and missing data (E0 and P0). Vertical axis: promoter sequences grouped by class and sorted by responsiveness to E1 vs. E2; horizontal axis: enhancer sequences grouped by class and sorted by activation of P1 vs. P2. b. Distribution of intrinsic enhancer and promoter activity (expression versus genomic controls) by cluster. c. Fraction of enhancer-promoter pairs observed in ExP STARR-seq dataset (>= 2 plasmid barcodes) by cluster. d. Correlation of average promoter activation (expression versus genomic controls in enhancer position) by E2 versus E1 enhancer sequences. Each point is one promoter sequence. Same as Fig. 3c, except including P0 promoter sequences. e. Correlation of average activation of P2 versus P1 promoters. Each point is one enhancer sequence. Same as Fig. 3d, except including E0 enhancer sequences. f. Robustness of enhancer and promoter cluster assignments to downsampling of enhancer and promoter sequences. Clustering was repeated in 100 random downsamplings to 25% of promoter sequences and 25% of enhancer sequences (6.25% of original matrix). Heatmap: Average fraction overlap between cluster assignments from the full and downsampled matrices. g. Correlation of average promoter activation (expression versus genomic controls in enhancer position) by E2 versus E1 enhancer sequences using ‘average activity’ instead of model estimates. Each point is one promoter sequence. h. Correlation of average activation of P2 versus P1 promoters using ‘average activity’ instead of model estimates. Each point is one enhancer sequence.
Extended Data Fig. 5 Classes of enhancer and promoter sequences show distinct patterns of activation and responsiveness.
a. For 6 representative enhancer sequences (3 E1 and 3 E2 sequences), the pairwise correlation of promoter activation (expression versus genomic controls in promoter position, averaged across plasmid barcodes). Each point is one promoter sequence. b. For 6 representative promoter sequences (3 P2 and 3 P1 sequences), the pairwise correlation of activation by enhancers (expression versus genomic controls in enhancer position, averaged across plasmid barcodes). Each point is one enhancer sequence.
Extended Data Fig. 6 Classes of enhancer sequences correspond to strong and weak genomic enhancers.
a. Volcano plot comparing ChIP-seq and other genomic features for E2 versus E1 enhancer sequences (see Supplementary Table 4). X-axis: ratio of average signal at E2 versus E1 enhancer sequences. Red dots: features with significantly higher signal at E1; no features have significantly higher signal at E2 enhancer sequences. b. Volcano plot comparing transcription factor motifs for E1 versus E2 enhancer sequences (see Supplementary Table 5). X-axis: ratio of average motif counts in E1 and E2 enhancer sequences. Red dots: Motifs significantly more frequent in E1 vs. E2 sequences. c. Volcano plot comparing transcription factor motifs for E1 and E2 versus E0 enhancer sequences (see Supplementary Table 5). X-axis: ratio of average motif counts in E1 and E2 versus E0 sequences. Red dots: Motifs significantly more frequent in E1 and E2 versus E0 sequences (>0) or more frequent in E0 versus E1 and E2 (<0). d. Mean H3K27ac ChIP-seq coverage of genomic elements corresponding to E0, E1, E2, or genomic control enhancer sequences (+/− 95% CI), aligned by DHS peak summit. Dotted lines mark bounds of the enhancer sequences used in ExP STARR-seq. E0 and E2 distributions are overlapping. e. % effect of genomic elements corresponding to E1 vs. E2 enhancer sequences on expression of genes corresponding to P1 promoters in CRISPRi screens, separated by quartiles of 3D contact frequency measured by Hi-C (0.39-11.9 (n = 9), 11.9-23.9 (n = 31), 23.9-58.3 (n = 36), 58.3-100 (n = 34)). *P < 0.05, two-sample, two-sided t-test. Boxes are median and interquartile range, whiskers are +/− 1.5*IQR. f. Cumulative density plot showing the cell-type specificity of enhancer sequences selected for ExP STARR-seq, and DNase peaks or ABC enhancers in K562 cells. X-axis: # of cell types other than K562 in which the element is predicted to be an ABC enhancer. g. GRO-Cap coverage of genomic enhancers used in ExP STARR-seq. Top: Mean coverage of enhancers corresponding to E1 vs. E2 classes. Bottom: Coverage across all individual enhancers. h. Evolutionary conservation of enhancers separated by enhancer class, as measured by mean phastCons score (probability of each nucleotide belonging to a conserved element) and mean phyloP score (-log(p-value) under a null hypothesis of neutral evolution) across each element. P-value from KS test.
Extended Data Fig. 7 Properties of promoter classes.
a. Cumulative density plot showing the cell-type specificity of promoter chromatin activity (of promoters selected for ExP STARR-seq). X-axis: # of biosamples (cell types or tissues) other than K562 in which the promoter is active. Active = Top 50% of promoters by activity (geometric mean of H3K27ac and DHS signals, as used in the ABC model). All genes = all genes in the genome. b. Gene ontology log2-enrichment for P1 promoters using P1 and P2 promoters as a background set. c. Predicted enhancer inputs for each gene (sum of ABC scores for all candidate enhancers within 5 Mb of the TSS, excluding the promoter of the gene itself) for genes in the genome corresponding to P1 versus P2 promoters. P = 0.00083, Mann-Whitney U test. Boxes are median and interquartile range, whiskers are +/− 1.5*IQR. d. DNase-seq signal in K562 cells at P1 and P2 promoters in the genome, aligned by boundaries of the 264-bp ExP STARR-seq promoter sequence (dotted gray lines, see Methods). e. H3K27ac ChIP-seq signal in K562 cells at P1 and P2 promoters in the genome, aligned by boundaries of the 264-bp ExP STARR-seq promoter sequence (dotted grey lines, see Methods). f. Number of nearby accessible elements (within 100 Kb of the gene promoter, considering top 150,000 DNase peaks in K562 cells as used in the ABC model²²) for the 14 genes corresponding to P1 promoters and 11 genes corresponding to P2 promoters with comprehensive CRISPR tiling data. P = 0.17, Mann-Whitney U test. Boxes are median and interquartile range, whiskers are +/− 1.5*IQR. g. % Effect of CRISPRi perturbations to genomic regulatory elements on genes corresponding to P1 vs. P2 promoters. P = 0.0071, t-test. h. Fraction of promoter sequences containing TATA or CA initiator core promoter motifs. i. GRO-Cap coverage of genomic promoters aligned by TSS. Top: Mean coverage of genomic promoters corresponding to P1 vs. P2 classes. Bottom: Coverage across all individual promoters. j. Normalized CpG-content of P1 and P2 promoter sequences (n = 800), calculated as the ratio of observed to expected CpG = (CpG fraction) / ((GC content)² / 2). Boxes are median and interquartile range, whiskers are +/− 1.5*IQR, P = 1.37 × 10⁻¹⁰, t-test. k. Evolutionary conservation of promoters separated by promoter class, as measured by mean phastCons score (probability of each nucleotide belonging to a conserved element) and mean phyloP score (-log(p-value) under a null hypothesis of neutral evolution) across each element. P-value from KS test. l. Volcano plot comparing frequency of transcription factor motifs in P2 versus P1 promoter sequences (see Supplementary Table 7). X-axis: ratio of average motif counts in P2 versus P1 promoter sequences. Light blue and dark blue dots: Motifs significantly more frequent in P1 or P2 promoter sequences, respectively. Red outline: significant motifs for ETS family TFs. m. Volcano plot comparing frequency of transcription factor motifs in P2 and P1 versus P0 promoter sequences (see Supplementary Table 7). X-axis: ratio of average motif counts in P2 and P1 versus P0 promoter sequences. Dark blue dots: Motifs significantly more frequent in P2 and P1 vs. P0 promoter sequences. n. Fraction of P2 promoter sequences with YY1 and GABPA binding motifs by nucleotide position, aligned by TSS and separated by strand (see Methods).
Extended Data Fig. 8 Transcription factors enriched at promoters and enhancers and hybrid-selection STARR-seq in K562 cells.
a. ChIP-seq signal for 5 transcription factors in K562 cells at P1 and P2 promoters in the genome, aligned by boundaries of the 264-bp ExP STARR-seq promoter sequence (see Methods). Top: average ChIP-seq signal normalized to input. Bottom: signal at individual genomic promoters. Black line: average for random genomic control sequences. b. ChIP-seq signal at E1 and E2 enhancers in the genome. Black line: average for random genomic control sequences. c. Correlation between intrinsic promoter activity and responsiveness of promoters to E1 enhancers (average activation by E1 sequences, expression vs. random genomic controls). Each point is one promoter. Same as Fig. 5b, but in normal scale instead of log2 scale. d. Correlation of HS-STARR-seq expression between biological replicate experiments for promoter and accessible element pools, calculated for individual elements with unique plasmid barcodes. Axes represent the average STARR-seq expression (RNA/DNA, log10 scale) of two biological replicates. Density: number of plasmids. e. Fragment length distribution in HS-STARR-seq in promoter and accessible element pools, of fragments with at least 25 DNA counts. f. STARR-seq expression (y-axis) and fragment length (x-axis) relationship in HS-STARR-seq. Density: number of plasmids.
Extended Data Fig. 9 Motif insertion and scramble ExP STARR-seq in K562 cells and generalizability of compatibility rules.
a. Correlation of ExP STARR-seq expression between biological replicate experiments, calculated for individual enhancer-promoter pairs with unique plasmid barcodes. Axes represent the average STARR-seq expression (RNA/DNA) of individual biological replicates. Density: number of enhancer-promoter plasmids. b. Distribution of plasmid barcodes per enhancer-promoter pair. Red dotted line: threshold of two plasmid barcodes. c. STARR-seq expression in smaller-scale validation experiment (y-axis) vs. expression in the original ExP STARR-seq dataset (x-axis) for each enhancer-promoter pair included in both experiments. Dotted gray line: line of best fit from linear regression in log2 space. d. Change in enhancer activity with P1 or P2 promoters (edited enhancer activity compared with unedited enhancer activity with a promoter) after inserting 2, 4, or 6 GABPA motifs into 1 E0 enhancer sequence. Each point represents one enhancer-promoter pair measured over 4 biological replicates. *P < 0.0001, two-tailed t-test. Boxes are median and interquartile range, whiskers are ±1.5 × IQR. e. Fraction of variance in log2 reporter expression (reporter assay score) explained by intrinsic promoter activity and enhancer activity, using data from Martinez-Ara et al. (2021; ref. 39). Left bars: experiment including promoters and enhancers from the Nanog and Klf2 loci. Right bars: experiment including promoters and enhancers from the Tfcp2l1 locus. For each experiment, values are shown for pairs with 2 or more, or 5 or more plasmid barcodes. Enhancer and promoter activities explain more of the variance when considering enhancer-promoter pairs with at least 5 vs. at least 2 barcodes. Bar plots show sequential sum of squares (Type-I ANOVA) for promoters, then enhancers. f. Correlation of reporter assay expression with the product of intrinsic promoter and enhancer activities from two experiments from Martinez-Ara et al. (2021; ref. 39). Density color scale: number of enhancer-promoter pairs.
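The sequential sum-of-squares decomposition described in panel e can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the analysis code used for the figure: the function name and the input values are hypothetical, and the grid of promoter-enhancer pairs is assumed to be complete and balanced. In that special case the promoter and enhancer sums of squares do not depend on the order in which the terms are entered; real datasets with missing pairs would need a fitted regression model instead.

```python
from math import log2


def variance_explained(expr):
    """Split variance in log2 expression into promoter, enhancer and residual parts.

    expr[p][e] is the reporter expression of promoter p paired with enhancer e.
    Assumption: every promoter is measured with every enhancer exactly once.
    """
    y = [[log2(v) for v in row] for row in expr]
    n_p, n_e = len(y), len(y[0])

    grand = sum(map(sum, y)) / (n_p * n_e)
    promoter_means = [sum(row) / n_e for row in y]
    enhancer_means = [sum(y[p][e] for p in range(n_p)) / n_p for e in range(n_e)]

    ss_total = sum((v - grand) ** 2 for row in y for v in row)
    # Promoter term entered first, then the enhancer term.
    ss_promoter = n_e * sum((m - grand) ** 2 for m in promoter_means)
    ss_enhancer = n_p * sum((m - grand) ** 2 for m in enhancer_means)

    return {
        "promoter": ss_promoter / ss_total,
        "enhancer": ss_enhancer / ss_total,
        "residual": 1 - (ss_promoter + ss_enhancer) / ss_total,
    }


# Toy grid: expression is exactly (promoter activity) x (enhancer activity),
# with made-up promoter activities 1, 2, 4 and enhancer activities 1, 4.
toy = [[1, 4], [2, 8], [4, 16]]
print(variance_explained(toy))
```

Because the toy grid is purely multiplicative, it becomes additive in log2 space and the residual share is zero; any deviation from the multiplicative model would appear as residual variance.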
Extended Data Fig. 10 Model of the effect of an enhancer on RNA expression.
a. Simple rules of enhancer and promoter compatibility. The effects of enhancers on nearby genes in the human genome are controlled by the quantitative tuning of intrinsic promoter activity, intrinsic enhancer activity, enhancer-promoter 3D contact, and enhancer-promoter class compatibility.
Supplementary information
Supplementary Table 1
Promoter sequences used in ExP STARR-seq.
Supplementary Table 2
Enhancer sequences used in ExP STARR-seq.
Supplementary Table 3
ExP luciferase elements and data.
Supplementary Table 4
Biochemical feature enrichment in E1 vs E2 enhancers.
Supplementary Table 5
TF motif enrichment in E1 vs E2 enhancers.
Supplementary Table 6
Biochemical feature enrichment in P1 vs P2 promoters.
Supplementary Table 7
TF motif enrichment in P1 vs P2 promoters.
Supplementary Table 8
Genome-wide predictions of promoter class.
Supplementary Table 9
Motifs correlated with enhancer and promoter activity.
Supplementary Table 10
Primer and oligonucleotide sequences.
Supplementary Table 11
ENCODE datasets used to annotate ExP enhancers and promoters.
Supplementary Table 12
Enhancer hybrid selection probe sequences for HS-STARR-seq.
Supplementary Table 13
Promoter HS probe sequences for HS-STARR-seq.
Supplementary Table 14
Promoters used in motif insertion and mutation ExP STARR-seq.
Supplementary Table 15
Enhancers used in motif insertion and mutation ExP STARR-seq.
Cite this article
Bergman, D.T., Jones, T.R., Liu, V. et al. Compatibility rules of human enhancer and promoter sequences. Nature 607, 176–184 (2022). https://doi.org/10.1038/s41586-022-04877-w
DOI: https://doi.org/10.1038/s41586-022-04877-w