Transcriptional cofactors display specificity for distinct types of core promoters

Haberle, Vanja; Arnold, Cosmas D.; Pagani, Michaela; Rath, Martina; Schernhuber, Katharina; Stark, Alexander

doi:10.1038/s41586-019-1210-7

Letter
Published: 15 May 2019

Nature volume 570, pages 122–126 (2019)

Abstract

Transcriptional cofactors (COFs) communicate regulatory cues from enhancers to promoters and are central effectors of transcription activation and gene expression¹. Although some COFs have been shown to prefer certain promoter types^2,3,4,5 over others (for example, see refs ^6,7), the extent to which different COFs display intrinsic specificities for distinct promoters is unclear. Here we use a high-throughput promoter-activity assay in Drosophila melanogaster S2 cells to screen 23 COFs for their ability to activate 72,000 candidate core promoters (CPs). We observe differential activation of CPs, indicating distinct regulatory preferences or ‘compatibilities’^8,9 between COFs and specific types of CPs. These functionally distinct CP types are differentially enriched for known sequence elements^2,4, such as the TATA box, downstream promoter element (DPE) or TCT motif, and display distinct chromatin properties at endogenous loci. Notably, the CP types differ in their relative abundance of H3K4me3 and H3K4me1 marks (see also refs ^10,11,12), suggesting that these histone modifications might distinguish trans-regulatory factors rather than promoter- versus enhancer-type cis-regulatory elements. We confirm the existence of distinct COF–CP compatibilities in two additional Drosophila cell lines and in human cells, for which we find COFs that prefer TATA-box or CpG-island promoters, respectively. Distinct compatibilities between COFs and promoters can explain how different enhancers specifically activate distinct sets of genes⁹, alternative promoters within the same genes, and distinct transcription start sites within the same promoter¹³. Thus, COF–promoter compatibilities may underlie distinct transcriptional programs in species as divergent as flies and humans.

**Fig. 1: Differential activation of CP candidates by transcriptional COFs.**

**Fig. 2: Groups of CPs activated preferentially by different COFs contain different CP motifs.**

**Fig. 3: H3K4me1 and H3K4me3 differentially mark promoters activated by distinct COFs.**

**Fig. 4: COF–CP compatibility is a conserved regulatory principle that underlies differential gene and alternative promoter activation.**

Data availability

All raw sequencing and processed data generated in this study have been deposited in the NCBI Gene Expression Omnibus (GEO) under accession numbers GSE116197 (D. melanogaster data) and GSE126221 (human data). Previously published datasets reanalysed in this study are available in the GEO repository under the following accession numbers: GSE47691 (RNA-seq), GSE58955 (GRO-seq), GSE40739 (DHS-seq), GSE22119 (MNase-seq), GSE52029 (ChIP–seq for Tbp and Trf2), GSE97841 (ChIP-exo for TAF1 and M1BP), GSE39664 (ChIP–seq for DREF), GSE64464 (ChIP–seq for P300/CBP), GSE30820 (ChIP–seq for Fsh/Brd4), GSE37864 (ChIP–seq for Mof), GSE47263 (ChIP–seq for Chro), GSE41440 (ChIP–seq for Lpt, Pol II, H3K4me1 and H3K4me3), GSE81795 (ChIP–seq for Set1, Trr and Trx; RNA-seq upon Trx depletion), GSE81649 (PRO-seq upon P300/CBP inhibition), GSE43180 (RNA-seq upon Fsh/Brd4 depletion), GSE95025 (single-cell RNA-seq of D. melanogaster embryo). S2 cells CAGE and Chro ChIP–seq data are available from modENCODE (http://data.modencode.org/, sample ID: 5331 and 5068, respectively). The full sequences of plasmids used in this study are available at www.addgene.org. No restrictions on data availability apply.

Code availability

All custom code used for data processing and computational analyses is available from the corresponding author upon request.

References

1. Zabidi, M. A. & Stark, A. Regulatory enhancer-core-promoter communication via transcription factors and cofactors. Trends Genet. 32, 801–814 (2016).
2. Ohler, U., Liao, G.-C., Niemann, H. & Rubin, G. M. Computational analysis of core promoters in the Drosophila genome. Genome Biol. 3, R87 (2002).
3. Rach, E. A., Yuan, H.-Y., Majoros, W. H., Tomancak, P. & Ohler, U. Motif composition, conservation and condition-specificity of single and alternative transcription start sites in the Drosophila genome. Genome Biol. 10, R73 (2009).
4. Parry, T. J. et al. The TCT motif, a key component of an RNA polymerase II transcription system for the translational machinery. Genes Dev. 24, 2013–2018 (2010).
5. Hoskins, R. A. et al. Genome-wide analysis of promoter architecture in Drosophila melanogaster. Genome Res. 21, 182–192 (2011).
6. Hsu, J.-Y. et al. TBP, Mot1, and NC2 establish a regulatory circuit that controls DPE-dependent versus TATA-dependent transcription. Genes Dev. 22, 2353–2358 (2008).
7. Stampfel, G. et al. Transcriptional regulators form diverse groups with context-dependent regulatory functions. Nature 528, 147–151 (2015).
8. van Arensbergen, J., van Steensel, B. & Bussemaker, H. J. In search of the determinants of enhancer–promoter interaction specificity. Trends Cell Biol. 24, 695–702 (2014).
9. Zabidi, M. A. et al. Enhancer–core-promoter specificity separates developmental and housekeeping gene regulation. Nature 518, 556–559 (2015).
10. Rach, E. A. et al. Transcription initiation patterns indicate divergent strategies for gene regulation at the chromatin level. PLoS Genet. 7, e1001274 (2011).
11. Pérez-Lluch, S. et al. Absence of canonical marks of active chromatin in developmentally regulated genes. Nat. Genet. 47, 1158–1167 (2015).
12. Boija, A. et al. CBP regulates recruitment and release of promoter-proximal RNA polymerase II. Mol. Cell 68, 491–503 (2017).
13. Haberle, V. et al. Two independent transcription initiation codes overlap on vertebrate core promoters. Nature 507, 381–385 (2014).
14. Arnold, C. D. et al. Genome-wide assessment of sequence-intrinsic enhancer responsiveness at single-base-pair resolution. Nat. Biotechnol. 35, 136–144 (2017).
15. Chatterjee, S. & Struhl, K. Connecting a promoter-bound protein to TBP bypasses the need for a transcriptional activation domain. Nature 374, 820–822 (1995).
16. Ptashne, M. & Gann, A. Transcriptional activation by recruitment. Nature 386, 569–577 (1997).
17. Kockmann, T. et al. The BET protein FSH functionally interacts with ASH1 to orchestrate global gene activity in Drosophila. Genome Biol. 14, R18 (2013).
18. Rickels, R. et al. An evolutionary conserved epigenetic mark of Polycomb response elements implemented by Trx/MLL/COMPASS. Mol. Cell 63, 318–328 (2016).
19. Herz, H.-M. et al. Enhancer-associated H3K4 monomethylation by Trithorax-related, the Drosophila homolog of mammalian Mll3/Mll4. Genes Dev. 26, 2604–2620 (2012).
20. Straub, T., Zabel, A., Gilfillan, G. D., Feller, C. & Becker, P. B. Different chromatin interfaces of the Drosophila dosage compensation complex revealed by high-shear ChIP–seq. Genome Res. 23, 473–485 (2013).
21. Ho, J. W. K. et al. Comparative analysis of metazoan chromatin organization. Nature 512, 449–452 (2014).
22. Hochheimer, A. & Tjian, R. Diversified transcription initiation complexes expand promoter selectivity and tissue-specific gene expression. Genes Dev. 17, 1309–1320 (2003).
23. Burke, T. W. & Kadonaga, J. T. Drosophila TFIID binds to a conserved downstream basal promoter element that is present in many TATA-box-deficient promoters. Genes Dev. 10, 711–724 (1996).
24. Wang, Y.-L. et al. TRF2, but not TBP, mediates the transcription of ribosomal protein genes. Genes Dev. 28, 1550–1555 (2014).
25. Gurudatta, B. V., Yang, J., Van Bortle, K., Donlin-Asp, P. G. & Corces, V. G. Dynamic changes in the genomic localization of DNA replication-related element binding factor during the cell cycle. Cell Cycle 12, 1605–1615 (2013).
26. Baumann, D. G. & Gilmour, D. S. A sequence-specific core promoter-binding transcription factor recruits TRF2 to coordinately transcribe ribosomal protein genes. Nucleic Acids Res. 45, 10481–10491 (2017).
27. Karaiskos, N. et al. The Drosophila embryo at single-cell transcriptome resolution. Science 358, 194–199 (2017).
28. Gilchrist, D. A. et al. Pausing of RNA polymerase II disrupts DNA-specified nucleosome organization to enable precise gene regulation. Cell 143, 540–551 (2010).
29. Arnold, C. D. et al. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science 339, 1074–1077 (2013).
30. Heintzman, N. D. et al. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat. Genet. 39, 311–318 (2007).
31. Herschlag, D. & Johnson, F. B. Synergism in transcriptional activation: a kinetic view. Genes Dev. 7, 173–179 (1993).
32. Adelman, K. & Lis, J. T. Promoter-proximal pausing of RNA polymerase II: emerging roles in metazoans. Nat. Rev. Genet. 13, 720–731 (2012).
33. Michel, M. & Cramer, P. Transitions for regulating early transcription. Cell 153, 943–944 (2013).
34. Muerdter, F. et al. Resolving systematic errors in widely used enhancer activity assays in human cells. Nat. Methods 15, 141–149 (2018).
35. Arnold, C. D. et al. Quantitative genome-wide enhancer activity maps for five Drosophila species show functional enhancer conservation and turnover during cis-regulatory evolution. Nat. Genet. 46, 685–692 (2014).
36. Andersen, P. R., Tirian, L., Vunjak, M. & Brennecke, J. A heterochromatin-dependent transcription machinery drives piRNA expression. Nature 549, 54–59 (2017).
37. Brown, J. B. et al. Diversity and dynamics of the Drosophila transcriptome. Nature 512, 393–399 (2014).
38. Batut, P., Dobin, A., Plessy, C., Carninci, P. & Gingeras, T. R. High-fidelity promoter profiling reveals widespread alternative promoter usage and transposon-driven developmental gene expression. Genome Res. 23, 169–180 (2013).
39. The FANTOM Consortium and the RIKEN PMI and CLST (DGT). A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).
40. Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
41. Jayaprakash, A. D., Jabado, O., Brown, B. D. & Sachidanandam, R. Identification and remediation of biases in the activity of RNA ligases in small-RNA deep sequencing. Nucleic Acids Res. 39, e141 (2011).
42. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
43. Philip, P. et al. CBP binding outside of promoters and enhancers in Drosophila melanogaster. Epigenetics Chromatin 8, 48 (2015).
44. Shlyueva, D. et al. Hormone-responsive enhancer-activity maps reveal predictive motifs, indirect repression, and targeting of closed chromatin. Mol. Cell 54, 180–192 (2014).
45. Fuda, N. J. et al. GAGA factor maintains nucleosome-free regions and has a role in RNA polymerase II recruitment to promoters. PLoS Genet. 11, e1005108 (2015).
46. Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-seq. Bioinformatics 25, 1105–1111 (2009).
47. Anders, S., Pyl, P. T. & Huber, W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015).
48. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
49. FitzGerald, P. C., Sturgill, D., Shyakhtenko, A., Oliver, B. & Vinson, C. Comparative genomics of Drosophila and human core promoters. Genome Biol. 7, R53 (2006).
50. Falcon, S. & Gentleman, R. Using GOstats to test gene lists for GO term association. Bioinformatics 23, 257–258 (2007).
51. Gentleman, R. C. et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5, R80 (2004).
52. R Core Team. R: A Language and Environment for Statistical Computing http://www.R-project.org/ (R Foundation for Statistical Computing, Vienna, Austria, 2013).
53. Kent, W. J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).
54. Barberis, A. et al. Contact with a component of the polymerase II holoenzyme suffices for gene activation. Cell 81, 359–368 (1995).

Acknowledgements

The authors thank C. Plaschka, L. Cochella, P. R. Andersen and Life Science Editors for comments on the manuscript; the IMP/IMBA Graphics Department for help with Fig. 4; J. Wysocka, T. Swigut and K. Dorighi (Stanford University), M. Seimiya and R. Paro (ETH Zürich), and P. R. Andersen and J. Brennecke (IMBA) for sharing MLL3, Trx and Trf2 cDNAs. Deep sequencing was performed at the Vienna Biocenter Core Facilities GmbH. V.H. is supported by the Human Frontier Science Program (grant no. LT000324/2016-L). Research in the Stark group is supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 647320) and by the Austrian Science Fund (FWF, P29613-B28 and F4303-B09). Basic research at the IMP is supported by Boehringer Ingelheim GmbH and the Austrian Research Promotion Agency (FFG).

Author information

These authors contributed equally: Vanja Haberle, Cosmas D. Arnold

Authors and Affiliations

Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Vienna, Austria
Vanja Haberle, Cosmas D. Arnold, Michaela Pagani, Martina Rath, Katharina Schernhuber & Alexander Stark
Medical University of Vienna, Vienna Biocenter (VBC), Vienna, Austria
Alexander Stark

Contributions

V.H., C.D.A. and A.S. conceived the project. C.D.A. and M.P. performed the (COF-) STAP-seq screens, C.D.A. performed the luciferase experiments, and M.P., M.R. and K.S. cultured cells and performed transfections. V.H. performed the computational analyses. V.H., C.D.A. and A.S. interpreted the data and wrote the manuscript. A.S. supervised the project.

Corresponding author

Correspondence to Alexander Stark.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Selection of CP candidates and COFs.

a, List of initial 13 D. melanogaster COFs used in this study (see Extended Data Fig. 7 for ten additional COFs). For each COF, relevant information about its function is shown (functional domain, enzymatic activity and protein complex) and the name of the respective mammalian homologue from Ensembl database. b, CP candidates from the D. melanogaster genome were selected sequentially (in order of the white arrow) based on TSSs from datasets that map endogenous transcription initiation (CAGE³⁷ and RAMPAGE³⁸), TSSs in reporter assays (STAP-seq¹⁴), or FlyBase (v.5.57) and Ensembl (v.78) gene annotations (for each new dataset, only TSSs that were more than 10 bp away from TSSs already present in the selection were added). As negative controls, random positions without any evidence of initiation were selected. A total of 72,000 TSSs were used as reference points to design CP oligos encompassing 66 bp upstream and 66 bp downstream of the TSS. c, Overview of COF-recruitment STAP-seq (COF-STAP-seq), a high-throughput activator bypass^15,16,54-like assay that we created by combining a plasmid-based high-throughput promoter-activity assay, self-transcribing active core promoter-sequencing (STAP-seq)¹⁴ with the GAL4-DBD-mediated recruitment of individual COFs⁷. The D. melanogaster CP candidate library, pre-mixed with the D. pseudoobscura CP spike-in mix, was co-transfected with an expression plasmid for one of the GAL4-DBD–COF fusion proteins. If binding of a GAL4-DBD–COF to the 4xUAS array activates transcription from a candidate CP, this generates reporter RNAs with a short 5′ sequence tag, derived from the 3′ end of the corresponding CP. These reporter transcripts are captured with a 5′ RNA linker that includes a 10-nt-long UMI, allowing counting of individual reporter RNA molecules. In addition, the RNA linker contains a 4-nt sample barcode (BC), used for sample identification, enabling pooled processing of up to eight samples after linker ligation. 
This is followed by selective reverse transcription, PCR amplification, deep sequencing and mapping of the 5′ sequence tags to quantify productive initiation events at single-base-pair resolution for all candidate CPs in the library and spike-in CPs.
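
The sequential TSS selection in panel b and the 66-bp flanks of the CP oligos can be illustrated with a minimal Python sketch. The function names, the `(chrom, strand, pos)` tuple layout, the choice to measure distances per chromosome and strand, and the choice to also compare against TSSs added earlier from the same dataset are assumptions made for illustration; they are not details taken from the study.

```python
from bisect import bisect_left, insort

def select_tss(datasets, min_dist=10):
    """Add TSSs dataset by dataset (in priority order), keeping a TSS only if it
    lies more than `min_dist` bp from every TSS already selected.

    `datasets` is a list of lists of (chrom, strand, pos) tuples (assumed layout).
    Returns a dict mapping (chrom, strand) to a sorted list of kept positions.
    """
    selected = {}
    for tss_list in datasets:
        for chrom, strand, pos in tss_list:
            kept = selected.setdefault((chrom, strand), [])
            i = bisect_left(kept, pos)
            # Only the nearest neighbours on each side need to be checked.
            near_left = i > 0 and pos - kept[i - 1] <= min_dist
            near_right = i < len(kept) and kept[i] - pos <= min_dist
            if not (near_left or near_right):
                insort(kept, pos)
    return selected

def oligo_window(tss_pos, flank=66):
    """Return the inclusive (start, end) window spanning `flank` bp upstream and
    `flank` bp downstream of the TSS; the window is symmetric, so the strand
    does not change its coordinates."""
    return tss_pos - flank, tss_pos + flank
```

With the default flank, each window covers the TSS plus 66 bp on either side, that is, 133 positions in total.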

Extended Data Fig. 2 COF recruitment reproducibly activates transcription preferentially from annotated CP sequences.

a, Pairwise comparisons of normalized STAP-seq tag counts between three independent biological replicates per COF across all 72,000 tested CP candidates. The PCC is denoted for each comparison. b, Total unique STAP-seq tag counts for P65, GFP and the 13 COFs (left, raw counts; right, counts relative to spike-in). Bar heights, mean counts; error bars, s.d. n = 3 independent biological replicates for each COF. c, Distribution of normalized STAP-seq tag counts from all COFs at candidates grouped by different annotated genomic regions (FlyBase v.5.57). CP regions were defined as 100-bp regions from 50 bp upstream to 50 bp downstream of annotated gene TSSs, and ‘proximal promoter’ as regions up to 250 bp upstream of annotated gene TSSs. ‘Gene body’ includes both exons and introns, but excludes 5′ UTRs, which form a separate category. ‘Random negative regions’ represent candidates selected as negative controls (see Extended Data Fig. 1b) irrespective of their genomic location. n, number of independent CP candidates per box; boxes show median and interquartile range; dots are mean; whiskers indicate 5th and 95th percentiles. d, Genomic distribution of CP candidates (top; n = 72,000) and of unique STAP-seq tags; that is, transcripts initiated at CP candidates upon activation by any of the COFs (bottom; n = 41,069,770). Annotated gene CPs (red) are highly enriched for STAP-seq tags.
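
Panel b reports unique tag counts both raw and relative to the spike-in. A minimal Python sketch of that bookkeeping follows. The read layout `(library, cp_id, position, umi)`, the exact-match collapsing of UMIs, and the division by the summed spike-in count are assumptions for illustration; the normalization actually used is described in the Methods of the paper and may differ.

```python
from collections import defaultdict

def unique_tag_counts(reads):
    """Collapse reads to unique molecules: count distinct UMIs per
    (library, cp_id, position). `reads` is an iterable of
    (library, cp_id, position, umi) tuples (assumed layout)."""
    umis = defaultdict(set)
    for library, cp_id, position, umi in reads:
        umis[(library, cp_id, position)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}

def relative_to_spike_in(counts, spike_library="spike"):
    """Divide every non-spike-in count by the total unique tag count of the
    spike-in library (hypothetical label `spike_library`)."""
    spike_total = sum(n for (lib, _, _), n in counts.items() if lib == spike_library)
    if spike_total == 0:
        raise ValueError("no spike-in tags found; cannot normalize")
    return {
        key: n / spike_total
        for key, n in counts.items()
        if key[0] != spike_library
    }
```

Dividing by a spike-in total puts samples on a common scale, so overall activity can be compared between COFs rather than only the relative ranking of CPs within one sample.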

Extended Data Fig. 3 COFs have characteristically different CP-activation profiles.

a, COF-STAP-seq signals (transcription initiation events) of each of the 13 COFs and the positive and negative controls (P65 and GFP, respectively) from CP candidates in the representative genomic locus (same as in Fig. 1b but showing all 13 COFs). Negative values denote transcription initiation on the antisense strand. b, Principal component analysis of STAP-seq tag count normalized to spike-ins for 30,936 CPs significantly activated above GFP by at least one COF (≥twofold enrichment over GFP and Student’s t-test FDR ≤ 0.06; see Methods) in three biological replicates per tested COF and controls. Scatter plot of projections onto the first two principal components (left) and the per cent of variance explained by each principal component (right) are shown. c, Hierarchical clustering of individual biological replicates per COF based on PCCs across 30,936 CPs activated by at least one COF. All biological replicates cluster closely together and reproduce the functional COF groups shown in Fig. 1c derived from merged replicates. Blue-to-red shading indicates the PCC for each comparison. d, Comparison of CP activation above GFP (induction) in STAP-seq (x axis) and luciferase (y axis) for 50 CPs tested with P65 and four different COFs. PCC indicated for each comparison.
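
The activation criterion in panel b combines a fold-enrichment threshold over GFP with an FDR cutoff. The sketch below shows one way to apply such a filter in Python. It assumes that per-CP t-test P values have already been computed elsewhere, uses the Benjamini–Hochberg procedure as the FDR method (an assumption; the caption does not name the correction used for this step), and takes the FDR cutoff as a caller-supplied parameter. All function and argument names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def activated_cps(cof_counts, gfp_counts, pvalues, max_fdr, min_fold=2.0):
    """Return indices of CPs whose COF signal is at least `min_fold` times the
    GFP signal and whose adjusted P value is at most `max_fdr`."""
    fdr = benjamini_hochberg(pvalues)
    hits = []
    for i, (cof, gfp) in enumerate(zip(cof_counts, gfp_counts)):
        if gfp > 0:
            fold = cof / gfp
        else:
            # Assumption: any signal over a zero control counts as enriched.
            fold = float("inf") if cof > 0 else 0.0
        if fold >= min_fold and fdr[i] <= max_fdr:
            hits.append(i)
    return hits
```

A CP would then count as activated by at least one COF if it appears in the result for any of the tested COFs.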

Extended Data Fig. 4 COF–CP compatibilities are cell-type independent.

a, Representative genomic locus showing differential COF-STAP-seq signals for recruitment of MED25, Lpt, Chro and Mof in three D. melanogaster cell lines. Each COF preferentially activates the same CPs in all three cell lines (S2, OSC and Kc167 cells), and these preferences differ between COFs. STAP-seq data is the merge of three independent biological replicates. b, Hierarchical clustering of P65 and six COFs tested in all three cell lines based on PCC of CP activation in each cell line. c, Activation of all 72,000 CP candidates by different COFs in the three cell lines. For each COF, the CPs are first sorted by activation in S2 cells and then the activation in OSC and Kc167 cells is displayed in the same order. PCCs (right) were calculated by comparing OSC or Kc167 with S2 cells, respectively. d, COF-STAP-seq activation of 50 CPs selected for luciferase assays in S2 cells (see Fig. 1d) by different COFs and P65 in the three cell lines (subset of c). Differential activation of CPs by each COF is consistent across all cell lines. e, Pairwise comparison of CP activation by different COFs above GFP (induction) in OSC versus S2 cells (top row) and Kc167 versus S2 cells (bottom row) for all 72,000 CP candidates.
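
The comparison in panel c (sort CPs by activation in one cell line, display another cell line in the same order, and report a PCC) can be sketched in plain Python as follows. The function names and the list-based inputs are assumptions for illustration only.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)

def order_by_reference(reference, other):
    """Sort CPs by decreasing activation in `reference` and return `other`
    rearranged into that same order."""
    order = sorted(range(len(reference)), key=lambda i: -reference[i])
    return [other[i] for i in order]
```

Here `reference` would hold the per-CP activation in S2 cells and `other` the activation of the same CPs in OSC or Kc167 cells.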

Extended Data Fig. 5 COFs preferentially activate CPs of their endogenously bound and regulated target genes.

a–e, Binding of Trr¹⁸ (a), Lpt¹⁹ (b), Mof²⁰ (c) and Trx¹⁸ (e) in S2 cells and Chro in D. melanogaster embryos²¹ (d) to 5,933 CPs active in COF-STAP-seq and endogenously in S2 cells (as in Fig. 1e but for additional COFs). Per COF, CPs are sorted by STAP-seq activation (left) and ChIP–seq coverage is shown in heat maps and box plots (−150 to +50-bp window around the TSS; n = 297 independent CPs per box; box shading, mean STAP-seq tag count; boxes show median and interquartile range; whiskers indicate 5th and 95th percentiles; one-sided Wilcoxon rank-sum test; all ChIP–seq data from previous publications; see Supplementary Table 1 for details and references). For all COFs, the most strongly activated CPs in COF-STAP-seq are significantly more strongly bound by the respective COF in their endogenous genomic context compared to CPs that are activated weakly (note that even though this also holds for Lpt, the trend for Lpt starts only after the most strongly activated CPs (first two bins), which are less strongly bound than expected). f, Expression fold change upon Trx depletion by RNAi for genes associated with top and bottom 25% CPs by activation with Trx (RNA-seq data from ref. ¹⁸; see also Supplementary Table 1). Only CPs associated with genes that are active in S2 cells and activated in COF-STAP-seq by at least one COF are included. g, STAP-seq tag count for CPs of genes downregulated upon Trx depletion by RNAi versus CPs of all other genes expressed in S2 cells and activated by at least one COF (RNA-seq data from ref. ¹⁸; n, number of independent CPs; boxes show median and interquartile range; whiskers indicate 5th and 95th percentiles; one-sided Wilcoxon rank-sum test).

Extended Data Fig. 6 Defining and validating CP groups activated preferentially by different COFs.

a, Spike-in normalized COF-STAP-seq tag counts (left heat map) for 30,936 CP candidates (columns) clustered based on their preferential activation by different COFs (rows). These tag counts were transformed for each CP separately into Z-scores (right heat map) to highlight the differential activation by different COFs independently of the overall activity of the CP. We then used these Z-score-transformed values to cluster the CPs into five groups of respectively similar activation profiles across all COFs irrespective of absolute activation levels using k-means clustering (the CPs in both heat maps are organized identically according to these groups, see coloured bar on top). The line plot on the left shows the average spike-in normalized COF-STAP-seq tag count across all CPs of each group for each of the 13 COFs and the two controls. b, Per cent of variance in the data explained by clustering CPs into different number of clusters with k-means (k ranging from 1 to 10). Increasing the number of clusters beyond five is of little benefit in explaining the variance in the data. c, Gain of per cent variance explained by increasing the number of clusters in steps of one from three to six. d, Distribution of sum of squared distances to centroids of the clusters for number of clusters ranging from one to ten, using a fivefold cross-validation approach. The data was binned randomly into five equally sized bins, one bin was left aside as a test set and clustering was performed on the remaining four bins. Sum of squared distances to the nearest centroid for each data point in the test set was then calculated. The procedure was repeated for each number of clusters (k). Increasing the number of clusters beyond five does not lead to substantially more coherent or dense clusters. For each box, n = 30,936 independent CPs. 
e–g, Clustering of 30,936 CPs (columns) based on their preferential activation by different COFs (rows) as in a, but using data for only one replicate as indicated. k-means clustering (k = 5) for each individual replicate reproduces qualitatively the same groups obtained with the merged replicates (see a). h, Agreement between assignment of CPs to groups in individual replicates and in the pooled data (left). In each replicate, around 85% of CPs are assigned to the same group as in the assignment based on pooled replicates. Bar plot, number of replicates that reproduce group assignment for individual CPs is shown on the right. For around 94% of CPs, the group assignment is reproduced in at least two replicates. i, Pairwise distances in CP response to six COFs and two controls for CPs belonging to the same (intra-) or different (inter-) clusters (defined in S2 cells) in all three D. melanogaster cell lines. n = 115,508,123 and 362,994,457 independent CP pairs for intra and inter-cluster boxes, respectively. *P ≤ 0.01; one-sided Wilcoxon rank-sum test. j, Induction (activation above GFP) of CPs (five groups defined in S2 cells; see a) by P65 and six COFs in S2 (top), OSC (middle) and Kc167 (bottom) cells. Each of the six COFs preferentially activates the same CP groups in all three cell lines; that is, COF–CP preferences appear to be cell-type independent. n = 5,723, 11,538, 3,203, 5,038 and 5,434 CPs, for groups 1 to 5, respectively. In d, i, j, boxes show median and interquartile range; whiskers indicate 5th and 95th percentiles.
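
The grouping procedure described in panels a and b (per-CP Z-score transformation across COFs, k-means clustering, and the per cent of variance explained for a given k) can be illustrated with a small, dependency-free Python sketch. The deterministic farthest-first initialization, the use of the population standard deviation, and all function names are assumptions made so that the example is reproducible; the caption does not specify the k-means implementation or its initialization.

```python
from statistics import mean, pstdev

def zscore_rows(matrix):
    """Z-score each CP (row) across COFs so that clustering reflects the
    activation profile rather than the overall activity of the CP."""
    out = []
    for row in matrix:
        mu, sd = mean(row), pstdev(row)
        out.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return out

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest
    from its nearest already-chosen centroid (an illustrative choice)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stable, so centroids are already up to date
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels, centroids

def variance_explained(points, labels, centroids):
    """1 - (within-cluster sum of squares / total sum of squares)."""
    grand = [mean(col) for col in zip(*points)]
    total = sum(sq_dist(p, grand) for p in points)
    within = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return 1.0 - within / total if total > 0 else 0.0
```

Running `kmeans` on the Z-scored matrix for k from 1 to 10 and plotting `variance_explained` against k gives the kind of curve shown in panel b, from which a value of k beyond which the gain flattens can be read off.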

Extended Data Fig. 7 CP preferences of ten additional COFs.

a, List of ten additionally tested D. melanogaster COFs. For each COF, relevant information about its function is shown (functional domain, enzymatic activity and protein complex) as well as the name of the respective mammalian homologue. b, Total COF-STAP-seq tag counts relative to spike-in for GFP (negative control) and the ten COFs. Bar heights, mean counts; error bars, s.d.; n = 3 independent biological replicates per COF. c, Per cent of variance in the data explained by clustering CPs into different numbers of clusters with k-means (k ranging from 1 to 10) using the original dataset containing 13 COFs, P65 and GFP (as in Extended Data Fig. 6b; blue) or the extended dataset with ten additional COFs (23 total; red). The curves are highly similar for both datasets; that is, the same number of clusters explains the same amount of variance in both the original and the extended dataset. d, As in Extended Data Fig. 6a but for the extended dataset of 23 COFs: spike-in normalized STAP-seq tag counts (left heat map) for 30,936 CPs (columns) clustered based on their preferential activation by 23 different COFs and two controls (rows). Tag counts were transformed into Z-scores (right heat map), which were used to cluster CPs into five clusters with k-means. For comparison, groups defined on the dataset containing 13 COFs and two controls (Extended Data Fig. 6a) are shown in the top row and groups defined with this extended dataset are shown below. e, Correlation between each of the six activating COFs in the extended dataset and the 13 COFs of the original dataset. *PCC ≥ 0.9.

Extended Data Fig. 8 CPs activated by distinct COFs discriminate between housekeeping and developmental gene regulation.

a, Expression variability between around 8,000 single cells of a stage 6 D. melanogaster embryo for genes associated with each of the five different CP groups (single-cell RNA-seq data from ref. ²⁷). b, GO term enrichment analysis (GOStats R/Bioconductor package v.2.34.0) for genes associated with the five different CP groups. c, d, Activation of 72,000 CP candidates by a developmental (dev; from the gene zfh1) and a housekeeping (hk; from the gene ssp3) enhancer (enhancers and enhancer-less control obtained from refs ^9,14). CPs are grouped into five groups as in Extended Data Fig. 6a. The enhancer-less control reflects the basal activity of the CPs. Group 3 CPs have the highest basal activity but are further activated by the hk enhancer. n = 5,723, 11,538, 3,203, 5,038 and 5,434 independent CPs, for groups 1 to 5, respectively; boxes show median and interquartile range; whiskers indicate 5th and 95th percentiles. e, f, Transcription-factor motif enrichment analysis in the sequence 500 bp upstream of the TSS (e) or within the nearest developmental or housekeeping enhancer (from ref. ⁹; f) for the five CP groups. n = 5,723, 11,538, 3,203, 5,038 and 5,434 independent CPs, for groups 1 to 5, respectively. NS, not significant (two-sided Fisher’s exact test; P-values corrected for multiple testing by Benjamini–Hochberg procedure; FDR > 0.01).

Extended Data Fig. 9 CPs activated preferentially by distinct COFs differ in their sequence and in endogenous chromatin features.

a, Occurrence of specific dinucleotides (see label in each heat map) relative to TSSs for CPs of the five groups defined in Extended Data Fig. 6a. Within each group, CPs are sorted decreasingly by the COF-STAP-seq tag count of the respective strongest COFs (denoted on the left). Darker shade reflects higher density of the respective dinucleotides at specific positions. b, c, Examples of genomic loci with CPs active in S2 cells that are differentially activated by COFs in STAP-seq. All supporting data tracks are from S2 cells and reanalysed from previous publications (see Supplementary Table 1 for details and references). b, CPs of KLHL18 and Spt3 (group 3), and GCC185 and DCAF12 (group 4), are preferentially activated by Mof and Chro, respectively, and have high levels of H3K4me3 downstream of their TSSs. By contrast, the CP of Ect3 (group 1) is preferentially activated by P300 and has high levels of H3K4me1 both upstream and downstream of the TSS but almost no H3K4me3, although Ect3 is expressed and the CP is endogenously active in S2 cells. c, CPs of CkIIalpha-i3 (group 4) and CG13896 (group 3) are preferentially activated by Chro and Mof, respectively, and both bear high levels of H3K4me3 and low levels of H3K4me1 downstream of the TSS. By contrast, the CP of CG13895 (group 1) is preferentially activated by P300 and is marked by higher levels of H3K4me1, but lower levels of H3K4me3, although the gene is expressed in S2 cells. d, Average H3K4me1 ChIP–seq coverage in the 500-bp window upstream (left) and 500-bp window downstream (right) of the TSS for five groups of CPs active in S2 cells (as in Fig. 3b). n = 646, 363, 1,842, 1,885 and 179 CPs, for groups 1 to 5, respectively. e, Heat maps showing endogenous expression (as measured by RNA-seq (left) and GRO-seq (right)) of genes associated with CPs active in S2 cells from the five CP groups (RNA-seq and GRO-seq data from refs ^44,45; see Supplementary Table 1 for details and references). 
Within each group, CPs are sorted decreasingly by STAP-seq of the respective strongest COFs (denoted on the left). f, Gene expression for genes associated with five groups of CPs as in e but shown as box plots. n = 646, 363, 1,842, 1,885 and 179 CPs, for groups 1 to 5, respectively. In d, f, boxes show median and interquartile range; whiskers indicate 5th and 95th percentiles. g, Example of differentially activated alternative promoters. h, Example of differentially activated closely spaced TSSs (g, h, merge of three independent biological replicates).

Extended Data Fig. 10 Sequence-encoded COF−CP compatibility is conserved in humans.

a, Total unique STAP-seq tag counts relative to spike-in for P65, GFP and five human COFs from COF-STAP-seq in human HCT116 cells. Bar heights, mean counts; error bars, s.d.; n = 3 independent biological replicates for each COF. b, COF-STAP-seq signals (transcription initiation) activated by P65, and the five human COFs for the CPs of MMP1 (TATA-box promoter; left) and CIZ1 (CpG-island promoter; right; STAP-seq data: merge of three independent biological replicates). c, Hierarchical clustering of independent biological replicates for all tested human COFs based on PCCs across 12,000 human CP candidates. d, Occurrence of different dinucleotides (TA, AT, AA, CG and GC) around TSSs in CPs sorted by the ratio between COF-STAP-seq signals with MED15 and MLL3, for 9,607 CPs activated by either COF.
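
The dinucleotide occurrence maps in panel d (and in Extended Data Fig. 9a) rest on locating each dinucleotide relative to the TSS. A minimal sketch of that bookkeeping is given below; the function names are hypothetical, and it assumes that all CP sequences have equal length and share the same TSS index.

```python
def dinucleotide_positions(sequence, dinucleotide, tss_index):
    """Return positions (relative to the TSS) at which the dinucleotide starts."""
    seq = sequence.upper()
    return [i - tss_index for i in range(len(seq) - 1) if seq[i:i + 2] == dinucleotide]

def occurrence_profile(sequences, dinucleotide, tss_index):
    """Fraction of sequences carrying the dinucleotide at each relative position."""
    length = len(sequences[0])  # assumed: all sequences have the same length
    counts = [0] * (length - 1)
    for seq in sequences:
        for rel in dinucleotide_positions(seq, dinucleotide, tss_index):
            counts[rel + tss_index] += 1
    return {i - tss_index: c / len(sequences) for i, c in enumerate(counts)}

# Made-up sequences with the TSS assumed at index 4.
toy_profile = occurrence_profile(["GGTATAAAG", "CCTACCCCG"], "TA", 4)
```

Sorting the CPs by a signal ratio (such as MED15 versus MLL3) before plotting one row per CP would then give a heat map of the kind described.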

Supplementary information

Reporting Summary

Supplementary Table 1 – Used published datasets

List of all previously published datasets reanalysed in this study, with respective references, GEO and SRA accessions and mapping statistics.

Supplementary Table 2 – COF expression and spike-in CPs expression driver sequences

Sequences of Drosophila pseudoobscura enhancers and promoters used as drivers for cofactor expression and for expression of spike-in core promoters, with respective primers used to amplify them. Sequence of the 4xUAS array and the gBlock used for cloning pSTAP-seq_human-4xUAS.

Supplementary Table 3 – Human COFs

Sequences of primers used to clone the human COFs and cDNA sequences of the human BRD4, EMSY, EP300, MED15 and MLL3 COFs.

Supplementary Table 4 – COF STAP-seq primers

Sequences of primers used in the COF STAP-seq pipeline, including library cloning primers, nested PCR and sequencing-ready PCR primers, and 5′ RNA linkers.

Supplementary Table 5 – CP candidates included in the genome-wide Drosophila library

Table of all 72,000 Drosophila melanogaster CP candidates included in the STAP-seq library, with genomic coordinates, dataset supporting the choice and oligo sequence for each candidate.

Supplementary Table 6 – CP candidates included in the focused human library

Table of all 12,000 human CP candidates included in the STAP-seq library, with genomic coordinates, dataset supporting the choice and oligo sequence for each candidate.

Supplementary Table 7 – Human spike-in CPs

Sequences of Mus musculus promoters used as spike-in core promoters, with respective genomic coordinates, full DNA sequence, primers used to amplify them, and concentrations of individual spike-in plasmids used for creating the spike-in mix co-transfected in STAP-seq. Sequence of the gBlock used for cloning pSTAP-seq_human_spike-in.

Supplementary Table 8 – Luciferase assay primers

Sequences of core promoters and primers for 50 core promoter candidates selected for validation in luciferase assay.

Supplementary Table 9 – COF STAP-seq mapping statistics

Summary of total sequenced reads, mapped reads and unique STAP-seq tags (after collapsing by UMI) for 78 independent COF STAP-seq and 4 enhancer STAP-seq datasets in S2 cells, 24 COF STAP-seq datasets in OSC cells, 24 COF STAP-seq datasets in Kc167 cells and 21 COF STAP-seq datasets in human HCT116 cells. Counts mapping to the reference CP candidate library (Drosophila melanogaster or human) and to spike-in CPs (Drosophila pseudoobscura or Mus musculus) are reported.
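
Collapsing reads by UMI into unique tags can be sketched as below. This is a simplified illustration rather than the authors' procedure: it assumes that reads sharing the same CP, position, strand and exact UMI sequence are duplicates, and it does not tolerate sequencing errors within the UMI. The tuple layout and function name are hypothetical.

```python
from collections import Counter

def unique_tag_counts(reads):
    """reads: iterable of (cp_id, position, strand, umi) tuples.

    Returns a Counter mapping each CP to its number of unique tags.
    """
    unique = set(reads)  # identical tuples are treated as PCR duplicates
    return Counter(cp_id for cp_id, _pos, _strand, _umi in unique)

# Made-up reads: the first two are duplicates of one another.
toy_reads = [
    ("cp1", 100, "+", "AAC"),
    ("cp1", 100, "+", "AAC"),
    ("cp1", 100, "+", "GGT"),
    ("cp2", 50, "-", "AAC"),
]
toy_tags = unique_tag_counts(toy_reads)
```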

Supplementary Table 10 – Spike-in CPs tag counts and normalization factors

Unique STAP-seq tag counts mapping to each of the 9 Drosophila pseudoobscura (for fly samples) or Mus musculus (for human samples) spike-in CPs, along with the calculated normalization factors used to scale down each of the independent COF STAP-seq datasets within a single batch.
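
One way such scale-down factors could be derived is sketched below. The formula is an assumption made for illustration and is not taken from the source: each dataset in a batch is scaled so that its total spike-in tag count matches the lowest spike-in total in that batch. The function names and numbers are hypothetical.

```python
def spikein_factors(spikein_totals):
    """Map each sample to a factor (<= 1) that scales its spike-in total
    down to the smallest spike-in total in the batch (assumed scheme)."""
    reference = min(spikein_totals.values())
    return {sample: reference / total for sample, total in spikein_totals.items()}

def normalize(tag_counts, factor):
    """Apply one sample's factor to its per-CP tag counts."""
    return {cp: count * factor for cp, count in tag_counts.items()}

# Made-up spike-in totals for three samples in one batch.
toy_factors = spikein_factors({"GFP": 1000, "P300": 2000, "Mof": 4000})
toy_normalized = normalize({"cp1": 80}, toy_factors["Mof"])
```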

Supplementary Table 11 – Normalized tag counts for fly CPs activated in STAP-seq

List of 30,936 CP candidates activated significantly above GFP by at least one cofactor (COF) with normalized tag counts per COF (averaged across the 3 biological replicates).

Supplementary Table 12 – Normalized tag counts for human CPs

List of 12,000 CP candidates with normalized tag counts per COF (averaged across the 3 biological replicates).

Supplementary Table 13 – Non-redundant set of activated CPs in fly

Non-overlapping subset of activated fly CPs such that only a single oligo per promoter region is kept (the one with the highest overall activity), with normalized tag counts per COF.
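
The selection of a single oligo per promoter region can be sketched as follows. This is a hedged illustration: the record layout and function name are hypothetical, the assignment of oligos to promoter regions is taken as given, and "overall activity" is assumed here to be the sum of normalized tag counts across COFs.

```python
def non_redundant(oligos):
    """oligos: list of dicts with 'region', 'oligo' and 'counts' (COF -> tag count).

    Keeps, for each region, the oligo with the highest summed activity.
    """
    best = {}
    for record in oligos:
        activity = sum(record["counts"].values())  # assumed definition of overall activity
        current = best.get(record["region"])
        if current is None or activity > sum(current["counts"].values()):
            best[record["region"]] = record
    return list(best.values())

# Made-up records: two overlapping oligos in region A, one in region B.
toy_oligos = [
    {"region": "A", "oligo": "o1", "counts": {"P300": 5, "Mof": 1}},
    {"region": "A", "oligo": "o2", "counts": {"P300": 2, "Mof": 10}},
    {"region": "B", "oligo": "o3", "counts": {"P300": 1, "Mof": 1}},
]
toy_kept = sorted(r["oligo"] for r in non_redundant(toy_oligos))
```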

About this article

Cite this article

Haberle, V., Arnold, C.D., Pagani, M. et al. Transcriptional cofactors display specificity for distinct types of core promoters. Nature 570, 122–126 (2019). https://doi.org/10.1038/s41586-019-1210-7

Received: 03 July 2018
Accepted: 15 April 2019
Published: 15 May 2019
Issue Date: 06 June 2019
DOI: https://doi.org/10.1038/s41586-019-1210-7
