Previous methods to systematically characterize sequence-intrinsic activity of promoters have been limited by relatively low throughput and the length of the sequences that could be tested. Here we present 'survey of regulatory elements' (SuRE), a method that assays more than 10⁸ DNA fragments, each 0.2–2 kb in size, for their ability to drive transcription autonomously. In SuRE, a plasmid library of random genomic fragments upstream of a 20-bp barcode is constructed and decoded by paired-end sequencing. This library is used to transfect cells, and barcodes in transcribed RNA are quantified by high-throughput sequencing. Applying SuRE to the human genome, we achieve 55-fold genome coverage, allowing us to map autonomous promoter activity genome-wide in K562 cells. By computational modeling we delineate subregions within promoters that are relevant for their activity. We show that antisense promoter transcription is generally dependent on the sense core promoter sequences, and that most enhancers and several families of repetitive elements act as autonomous transcription initiation sites.
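The abstract describes decoding the plasmid library by paired-end sequencing, and the supplementary legends note that only barcodes linked to a unique genomic fragment were counted. A minimal sketch of that filtering step, assuming each decoded read pair has already been reduced to a (barcode, fragment) record with the fragment given as hypothetical (chrom, start, end, strand) coordinates; the function name `build_barcode_map` is illustrative, not the authors' software.

```python
from collections import defaultdict


def build_barcode_map(pairs):
    """Map each 20-bp barcode to its genomic fragment.

    pairs: iterable of (barcode, fragment) records, where fragment is a
    hypothetical (chrom, start, end, strand) tuple from paired-end decoding.
    Barcodes seen with more than one distinct fragment are discarded.
    """
    seen = defaultdict(set)
    for barcode, fragment in pairs:
        if len(barcode) != 20:  # ignore reads with a truncated or overlong barcode
            continue
        seen[barcode].add(fragment)
    # Keep only barcodes linked to exactly one genomic fragment
    return {bc: next(iter(frags)) for bc, frags in seen.items() if len(frags) == 1}
```

Collecting fragments per barcode in a set means repeated reads of the same barcode-fragment pair do not count as ambiguity; only genuinely conflicting assignments are dropped.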
Gene Expression Omnibus
We thank the NKI Genomics Core Facility for technical support, J. Omar Yáñez Cuna for scripts and advice, members of our laboratories for helpful discussions, and in particular T. Rube for suggesting the 2D analysis of CpG content. Supported by ERC Advanced Grant 293662 and NWO-ALW VICI (B.v.S.); NIH grants R01HG003008 and S10OD021764 (H.J.B.); and NIH grants R01HG003008, S10OD021764 and T32GM008281 (V.D.F.). Addgene plasmid #49157 was a gift from James Thomson, University of Wisconsin, Madison.
The authors declare no competing financial interests.
Integrated supplementary information
See Methods for a detailed description. a. Size-selected, A-tailed random fragments ('queries') of the human genome are ligated in bulk into barcoded T-overhang plasmids. BC, barcode; ORF, open reading frame; PAS, polyadenylation signal. b. The library is digested with the homing endonuclease I-CeuI to release the barcode together with its query sequence. The released fragment is self-ligated and digested again with a frequently cutting restriction enzyme to reduce the insert size. After a second self-ligation, the circle is linearized, PCR-amplified and subjected to high-throughput sequencing. c. Per biological replicate, ~100 million cells are transfected. Plasmids whose query has promoter activity directed toward the barcode transcribe the barcode into RNA. Cells are harvested after 24 h; RNA is extracted, poly(A)-selected, reverse-transcribed, PCR-amplified and subjected to high-throughput sequencing. Normalizing cDNA barcode counts to the estimated barcode frequencies in the SuRE plasmid library yields a genome-wide SuRE expression profile.
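Panel c ends with normalizing cDNA barcode counts to barcode frequencies in the plasmid library. A minimal sketch of one way to do this, assuming the estimate is a simple ratio of relative frequencies (cDNA share divided by plasmid share) with no pseudocount or replicate pooling; the function name `sure_enrichment` is hypothetical and the authors' exact normalization is described in their Methods, not here.

```python
def sure_enrichment(cdna_counts, plasmid_counts):
    """Per-barcode expression as (cDNA frequency) / (plasmid-library frequency).

    cdna_counts, plasmid_counts: dicts mapping barcode -> read count.
    Barcodes absent from plasmid_counts (or with zero plasmid reads) are
    skipped, because their input frequency cannot be estimated.
    """
    total_cdna = sum(cdna_counts.values())
    total_plasmid = sum(plasmid_counts.values())
    result = {}
    for bc, n_plasmid in plasmid_counts.items():
        if n_plasmid == 0:
            continue
        cdna_freq = cdna_counts.get(bc, 0) / total_cdna
        plasmid_freq = n_plasmid / total_plasmid
        result[bc] = cdna_freq / plasmid_freq
    return result
```

Dividing by the plasmid frequency corrects for barcodes that are over- or under-represented in the transfected library, so a value of 1 means the barcode is transcribed in proportion to its abundance.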
a. Coverage of the human genome by unique elements in the SuRE library. b. Distribution (fold enrichment) of SuRE peaks among the 25 chromatin states¹. c. Correlation of SuRE enrichment between biological replicates at TSSs. d. Correlation between CAGE¹ and SuRE at TSSs. e. Same as Fig. 1e, but with histone genes indicated in red: correlation between relative promoter autonomy (log₁₀(SuRE/GRO-cap)) and tissue specificity (number of cell types and tissues, out of 889 tested², in which each TSS is active). Grey line shows linear fit. f. Correlation between relative promoter autonomy and the total number of promoters (ENCODE chromatin state 'Tss') found within a fixed window of 5–50 kb from the TSS. g. Size distribution of genomic fragments in the SuRE library. h. Number of reads of barcodes in cDNA (per individual replicate). Only barcodes linked to a unique genomic fragment were counted. i. Venn diagram of the overlap between the summits of SuRE peaks called by the MACS algorithm³ and ENCODE-annotated promoters ('Tss') and enhancers ('Enh' and 'EnhW' combined)¹. Because more than one peak summit can overlap an ENCODE annotation, overlaps are given for each direction of the comparison, in the color of the annotation. j. Relative SuRE expression (SuRE/GRO-cap) of SuRE fragments whose 3′ end lies in either an intron (black) or an exon (red). Expression is normalized to GRO-cap to avoid systematic biases resulting from possible correlations between gene structure and expression level. A LOESS curve was fit separately to the log ratios of all exon- and intron-terminal fragments as a function of the distance downstream of the corresponding TSS at which each fragment ended; the predicted ratios were then normalized to a maximum of 1.
1. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
2. FANTOM Consortium. A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).
3. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
a. Correlation between biological replicates for the focused SuRE library. Data are shown for all TSSs within the BAC library. b. Correlation between SuRE enrichment obtained with the genome-wide library (x-axis) and the focused library (y-axis) for all peaks overlapping the BAC library. c. Same as (b) but for all TSSs in the BAC library. d. Correlation between SuRE enrichment obtained with the genome-wide library (x-axis) and a conventional reporter assay (y-axis) for 23 promoters. Grey line shows linear fit. e. Correlation between pre-transfection and post-transfection read counts for all TSSs in the BAC library.
Average PRO-seq run-on transcription activity⁴ around LTR12C elements as in Fig. 5e, but in antisense orientation.
4. Core, L.J. et al. Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat. Genet. 46, 1311–1320 (2014).
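The legend above describes an average run-on profile around LTR12C elements in antisense orientation. A minimal sketch of a strand-aware aggregate profile, assuming per-base read counts stored separately per strand and elements given as (chrom, position, strand) anchors; `average_profile` and its data layout are hypothetical, not the authors' pipeline.

```python
def average_profile(signal, anchors, flank, orientation="sense"):
    """Mean per-base signal in a window of +/- flank bp around each anchor.

    signal: dict mapping (chrom, strand) -> list of per-base read counts
    anchors: list of (chrom, position, strand) for element positions
    orientation: "sense" reads the element's own strand, "antisense" the opposite one
    """
    flip = {"+": "-", "-": "+"}
    totals = [0.0] * (2 * flank + 1)
    n = 0
    for chrom, pos, strand in anchors:
        track_strand = strand if orientation == "sense" else flip[strand]
        track = signal.get((chrom, track_strand))
        if track is None or pos - flank < 0 or pos + flank >= len(track):
            continue  # skip anchors whose window runs off the chromosome
        window = track[pos - flank: pos + flank + 1]
        if strand == "-":
            window = window[::-1]  # orient upstream -> downstream relative to the element
        for i, value in enumerate(window):
            totals[i] += value
        n += 1
    return [t / n for t in totals] if n else totals
```

Reversing windows for minus-strand elements puts every element in the same orientation before averaging, so the profile's left side is always upstream.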
a. Mean enrichment of four chromatin marks centered on the summits of unannotated SuRE peaks, i.e., peaks that did not overlap ENCODE-annotated promoters or enhancers ('Tss' or 'Enh' chromatin state) or repetitive elements of the ERV1 or ERVL-MaLR family. b. Same as (a) but for SuRE peaks that overlapped ENCODE-annotated promoters. c. Mean SuRE enrichment for all peaks overlapping ENCODE-annotated promoters (green) and for unannotated SuRE peaks. d. Same as (c) but for mean GRO-cap signal.
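Panel a defines unannotated SuRE peaks as those overlapping none of the 'Tss' or 'Enh' chromatin states or ERV1/ERVL-MaLR repeats. A minimal sketch of that filter, assuming 0-based half-open [start, end) intervals (as in BED files) and annotations grouped by a hypothetical label and chromosome; the linear scan is for clarity only, and a genome-scale version would use an interval index.

```python
def interval_overlap(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def unannotated_peaks(peaks, annotations,
                      excluded=("Tss", "Enh", "ERV1", "ERVL-MaLR")):
    """Keep peaks that overlap no interval carrying any excluded label.

    peaks: list of (chrom, start, end)
    annotations: dict label -> dict chrom -> list of (start, end)
    """
    keep = []
    for chrom, start, end in peaks:
        hit = any(
            interval_overlap(start, end, a_start, a_end)
            for label in excluded
            for a_start, a_end in annotations.get(label, {}).get(chrom, [])
        )
        if not hit:
            keep.append((chrom, start, end))
    return keep
```

With half-open coordinates, intervals that merely touch (one ends where the next begins) do not count as overlapping.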
a. Current SuRE reporter construct for promoter detection. b. Envisioned reporter construct for enhancer detection. Query: genomic fragment, BC: barcode, ORF: open reading frame, PAS: polyadenylation signal, mPR: minimal promoter.
Supplementary Figures 1–6 and Supplementary Tables 1 and 2 (PDF 1042 kb)
Software for SuRE sequencing data processing (ZIP 148 kb)
Software for generalized linear modeling (ZIP 198 kb)
Genomic coordinates of SuRE peaks, and their overlap with promoters, enhancers and repetitive elements. (ZIP 1637 kb)
About this article
Cite this article
van Arensbergen, J., FitzPatrick, V., de Haas, M. et al. Genome-wide mapping of autonomous promoter activity in human cells. Nat Biotechnol 35, 145–153 (2017). https://doi.org/10.1038/nbt.3754