Genome-wide mapping of autonomous promoter activity in human cells

Abstract

Previous methods to systematically characterize sequence-intrinsic activity of promoters have been limited by relatively low throughput and the length of the sequences that could be tested. Here we present 'survey of regulatory elements' (SuRE), a method that assays more than 10^8 DNA fragments, each 0.2–2 kb in size, for their ability to drive transcription autonomously. In SuRE, a plasmid library of random genomic fragments upstream of a 20-bp barcode is constructed, and decoded by paired-end sequencing. This library is used to transfect cells, and barcodes in transcribed RNA are quantified by high-throughput sequencing. When applied to the human genome, we achieve 55-fold genome coverage, allowing us to map autonomous promoter activity genome-wide in K562 cells. By computational modeling we delineate subregions within promoters that are relevant for their activity. We show that antisense promoter transcription is generally dependent on the sense core promoter sequences, and that most enhancers and several families of repetitive elements act as autonomous transcription initiation sites.
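
Fold coverage, as quoted in the abstract, is the total length of the mapped fragments divided by the genome size. The Python sketch below shows only that arithmetic; the function name `fold_coverage` and the toy numbers are illustrative assumptions and are not taken from the published SuRE software.

```python
def fold_coverage(fragment_lengths, genome_size):
    """Return total fragment length divided by genome size (fold coverage)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(fragment_lengths) / genome_size


# Toy example (hypothetical values): three fragments on a 1,000-bp "genome".
toy_lengths = [200, 2000, 800]
print(fold_coverage(toy_lengths, genome_size=1000))  # 3.0
```

With real data, `fragment_lengths` would hold the lengths of the uniquely mapped library fragments and `genome_size` the length of the reference assembly used.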

Figure 1: SuRE provides a genome-wide map of autonomous promoter activity.
Figure 2: Autonomous divergent promoter activity.
Figure 3: Partially overlapping query fragments allow for delineation of regions that drive promoter activity.
Figure 4: Relationship between CpG islands and gene expression.
Figure 5: Autonomous transcription from enhancers.
Figure 6: Autonomous transcription from specific repeat elements.

Accession codes

Primary accessions

Gene Expression Omnibus

Referenced accessions

Gene Expression Omnibus

Acknowledgements

We thank the NKI Genomics Core Facility for technical support, J. Omar Yáñez Cuna for scripts and advice, and members of our laboratories for helpful discussions, in particular T. Rube for suggesting the 2D analysis of CpG content. Supported by ERC Advanced Grant 293662 and NWO-ALW VICI (B.v.S.); NIH grants R01HG003008 and S10OD021764 (H.J.B.); and NIH grants R01HG003008, S10OD021764 and T32GM008281 (V.D.F.). Addgene plasmid #49157 was a gift from James Thomson, University of Wisconsin, Madison.

Author information

Contributions

J.v.A. conceived and developed the SuRE assay, designed and performed experiments, analyzed data and wrote the manuscript. V.D.F. developed algorithms, analyzed data and wrote the manuscript. L.P. developed algorithms and analyzed data. M.d.H. performed experiments. J.S. performed experiments. H.J.B. developed algorithms, analyzed data and wrote the manuscript. B.v.S. designed experiments, analyzed data and wrote the manuscript.

Corresponding authors

Correspondence to Joris van Arensbergen or Harmen J Bussemaker or Bas van Steensel.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Detailed schematic representation of SuRE methodology.

See Methods for detailed description. a. Size-selected and A-tailed random fragments (‘queries’) of the human genome are inserted in bulk into barcoded T-overhang plasmids by ligation. BC, barcode; ORF, open reading frame; PAS, polyadenylation signal. b. The library is digested by endonuclease I-CeuI so that the barcode with the query sequence is released. This is then self-ligated and again digested with a frequent cutter restriction enzyme to reduce the insert size. After another self-ligation the circle is linearized, PCR amplified and subjected to high-throughput sequencing. c. Per biological replicate ~100 million cells are transfected. Those plasmids that contain promoter activity in the direction of the barcode will transcribe the barcode into RNA. Cells are harvested after 24 hours, RNA is extracted, polyA purified, reverse transcribed, PCR amplified and subjected to high-throughput sequencing. By normalization to estimated barcode frequencies in the SuRE plasmid library a genome-wide SuRE expression profile is generated.
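
The normalization step in panel c can be sketched as follows: each barcode's share of the cDNA reads is divided by its share of the plasmid library reads. This is a minimal Python illustration under simplifying assumptions; the function name `sure_enrichment`, the dictionary inputs and the toy barcodes are hypothetical, and the published pipeline (Supplementary Source Code 1) may differ in its details.

```python
def sure_enrichment(cdna_counts, library_counts):
    """Per-barcode cDNA frequency divided by plasmid-library frequency.

    Both arguments map barcode -> read count. Barcodes absent from the
    library (count 0 or missing) are skipped, because their representation
    in the transfected pool cannot be estimated.
    """
    cdna_total = sum(cdna_counts.values())
    lib_total = sum(library_counts.values())
    if cdna_total == 0 or lib_total == 0:
        raise ValueError("both count tables must contain at least one read")

    enrichment = {}
    for barcode, lib_n in library_counts.items():
        if lib_n == 0:
            continue
        cdna_freq = cdna_counts.get(barcode, 0) / cdna_total
        lib_freq = lib_n / lib_total
        enrichment[barcode] = cdna_freq / lib_freq
    return enrichment


# Toy example (hypothetical barcodes): equal library representation,
# three-fold more cDNA reads for the first barcode.
library = {"ACGT": 10, "TTGA": 10}
cdna = {"ACGT": 30, "TTGA": 10}
print(sure_enrichment(cdna, library))
```

A genome-wide profile would additionally require assigning each barcode's value to the genomic interval and strand of its query fragment and combining overlapping fragments; that step is omitted here.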

Supplementary Figure 2 SuRE genome coverage, reproducibility and peaks.

a. Coverage of the human genome by unique elements in the SuRE library. b. Distribution (fold enrichment) of SuRE peaks among the 25 types of chromatin (ref. 1). c. Correlation of SuRE enrichment between biological replicates at TSSs. d. Correlation between CAGE (ref. 1) and SuRE at the TSSs. e. Same as Fig. 1e but with histone genes indicated in red. Correlation between relative promoter autonomy (log10(SuRE/GRO-cap)) and tissue specificity (number of cell types and tissues in which each TSS is active, out of 889 tested (ref. 2)). Grey line shows linear fit. f. Correlation between relative promoter autonomy and the total number of promoters (ENCODE chromatin type ‘Tss’) that are found in a fixed window of 5–50 kb from the TSS. g. Size distribution of genomic fragments in the SuRE library. h. Number of reads (per individual replicate) of barcodes in cDNA. Only barcodes linked to a unique genomic fragment were counted. i. Venn diagram representing the overlap between the summits of SuRE peaks as called by the MACS algorithm (ref. 3) and ENCODE-annotated promoters (‘Tss’) and enhancers (‘Enh’ and ‘EnhW’ combined) (ref. 1). Because more than one peak summit can overlap an ENCODE annotation, overlaps are given for each direction of the comparison in the color of the annotation. j. Relative SuRE expression (SuRE/GRO-cap) of SuRE fragments for which the 3′ end lies either in an intron (black) or an exon (red). Expression is normalized to GRO-cap to avoid systematic biases resulting from possible correlations between gene structure and expression level. A LOESS curve was separately fit to the log ratios for all exon- and intron-terminal fragments using the distance each fragment ended downstream of the corresponding TSS, then predicted ratios were normalized to a maximum of 1.

1. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
2. FANTOM Consortium. A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).
3. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
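
The quantity plotted in panel e, relative promoter autonomy, is defined in the legend as log10(SuRE/GRO-cap), and the grey line is a linear fit. The Python sketch below computes that ratio and an ordinary least-squares line. The function names and toy values are illustrative assumptions; the legend does not state which fitting routine the authors used.

```python
import math


def relative_autonomy(sure_signal, gro_cap_signal):
    """log10 ratio of SuRE signal to GRO-cap signal for one TSS."""
    if sure_signal <= 0 or gro_cap_signal <= 0:
        raise ValueError("both signals must be positive to take a log ratio")
    return math.log10(sure_signal / gro_cap_signal)


def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy example (hypothetical values): x = number of tissues in which a TSS
# is active, y = relative autonomy computed from paired signals.
tissue_counts = [10, 100, 500]
autonomy = [relative_autonomy(s, g) for s, g in [(1, 10), (5, 5), (40, 4)]]
print(linear_fit(tissue_counts, autonomy))
```

Positive values of `relative_autonomy` indicate a promoter that is more active in the plasmid assay than in its native context, and negative values the reverse.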

Supplementary Figure 3 Focused BAC library.

a. Correlation between biological replicates for the focused SuRE library. Data are shown for all TSSs within the BAC library. b. Correlation between SuRE enrichment obtained with the genome-wide library (x-axis) and the focused library (y-axis) for all peaks overlapping the BAC library. c. Same as (b) but for all TSSs in the BAC library. d. Correlation between SuRE enrichment obtained with the genome-wide library (x-axis) and a conventional reporter assay (y-axis) for 23 promoters. Grey line shows linear fit. e. Correlation between pre-transfection read counts and post-transfection read counts for all TSSs in the BAC library.

Supplementary Figure 4 Run-on transcription around LTR12C elements, antisense.

Average PRO-seq run-on transcription activity (ref. 4) around LTR12C elements as in Fig. 5e, but in antisense orientation.

4. Core, L.J. et al. Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat. Genet. 46, 1311–1320 (2014).

Supplementary Figure 5 Chromatin marks associated to unannotated SuRE peaks.

a. Mean enrichment for four chromatin marks centered on the summit of unannotated SuRE peaks, i.e. peaks that did not overlap ENCODE-annotated promoters or enhancers (‘Tss’ or ‘Enh’ chromatin state) or repetitive elements of the ERV1 or ERVL-MaLR family. b. Same as (a) but for SuRE peaks that overlapped ENCODE-annotated promoters. c. Mean SuRE enrichment for all peaks overlapping ENCODE-annotated promoters (green) and unannotated SuRE peaks. d. Same as (c) but for mean GRO-cap signal.

Supplementary Figure 6 Envisioned SuRE methodology for enhancer detection.

a. Current SuRE reporter construct for promoter detection. b. Envisioned reporter construct for enhancer detection. Query: genomic fragment, BC: barcode, ORF: open reading frame, PAS: polyadenylation signal, mPR: minimal promoter.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–6 and Supplementary Tables 1 and 2 (PDF 1042 kb)

Supplementary Source Code 1

software for SuRE sequencing data processing (ZIP 148 kb)

Supplementary Source Code 2

software for Generalized Linear Modeling (ZIP 198 kb)

Supplementary Data Set

Genomic coordinates of SuRE peaks, and their overlap with promoters, enhancers and repetitive elements. (ZIP 1637 kb)

Cite this article

van Arensbergen, J., FitzPatrick, V., de Haas, M. et al. Genome-wide mapping of autonomous promoter activity in human cells. Nat Biotechnol 35, 145–153 (2017). https://doi.org/10.1038/nbt.3754
