Identification of active transcriptional regulatory elements from GRO-seq data

Danko, Charles G; Hyland, Stephanie L; Core, Leighton J; Martins, Andre L; Waters, Colin T; Lee, Hyung Won; Cheung, Vivian G; Kraus, W Lee; Lis, John T; Siepel, Adam

doi:10.1038/nmeth.3329

Article
Published: 23 March 2015


Nature Methods volume 12, pages 433–438 (2015)


Abstract

Modifications to the global run-on and sequencing (GRO-seq) protocol that enrich for 5′-capped RNAs can be used to reveal active transcriptional regulatory elements (TREs) with high accuracy. Here, we introduce discriminative regulatory-element detection from GRO-seq (dREG), a sensitive machine learning method that uses support vector regression to identify active TREs from GRO-seq data without requiring cap-based enrichment (https://github.com/Danko-Lab/dREG/). This approach allows TREs to be assayed together with gene expression levels and other transcriptional features in a single experiment. Predicted TREs are more enriched for several marks of transcriptional activation—including expression quantitative trait loci, disease-associated polymorphisms, acetylated histone 3 lysine 27 (H3K27ac) and transcription factor binding—than those identified by alternative functional assays. Using dREG, we surveyed TREs in eight human cell types and provide new insights into global patterns of TRE function.


**Figure 1: dREG schematic and validation.**

**Figure 2: Comparison of putative TREs detected using dREG, DNase I and ChromHMM.**

**Figure 3: Sequence-specific TFs identified using dREG transcribed TREs.**

**Figure 4: eQTL and GWAS SNP enrichments in the four classes of functional element.**


Accession codes

Primary accessions

Gene Expression Omnibus

GSE66031

Referenced accessions

Gene Expression Omnibus

GSE60456

References

Calo, E. & Wysocka, J. Modification of enhancer chromatin: what, how, and why? Mol. Cell 49, 825–837 (2013).
ENCODE Project Consortium. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Giresi, P.G. & Lieb, J.D. Isolation of active regulatory elements from eukaryotic chromatin using FAIRE (Formaldehyde Assisted Isolation of Regulatory Elements). Methods 48, 233–239 (2009).
Buenrostro, J.D., Giresi, P.G., Zaba, L.C., Chang, H.Y. & Greenleaf, W.J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218 (2013).
Cusanovich, D.A., Pavlovic, B., Pritchard, J.K. & Gilad, Y. The functional consequences of variation in transcription factor binding. PLoS Genet. 10, e1004226 (2014).
Arnold, C.D. et al. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science 339, 1074–1077 (2013).
Kim, T.-K. et al. Widespread transcription at neuronal activity-regulated enhancers. Nature 465, 182–187 (2010).
Core, L.J., Waterfall, J.J. & Lis, J.T. Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science 322, 1845–1848 (2008).
Seila, A.C. et al. Divergent transcription from active promoters. Science 322, 1849–1851 (2008).
Kapranov, P. et al. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316, 1484–1488 (2007).
Hah, N. et al. A rapid, extensive, and transient transcriptional response to estrogen signaling in breast cancer cells. Cell 145, 622–634 (2011).
Hah, N., Murakami, S., Nagari, A., Danko, C. & Kraus, W.L. Enhancer transcripts mark active estrogen receptor binding sites. Genome Res. 23, 1210–1223 (2013).
Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
Melgar, M.F., Collins, F.S. & Sethupathy, P. Discovery of active enhancers through bidirectional expression of short transcripts. Genome Biol. 12, R113 (2011).
Wu, H. et al. Tissue-specific RNA expression marks distant-acting developmental enhancers. PLoS Genet. 10, e1004610 (2014).
Kwak, H., Fuda, N.J., Core, L.J. & Lis, J.T. Precise maps of RNA polymerase reveal how promoters direct initiation and pausing. Science 339, 950–953 (2013).
Preker, P. et al. RNA exosome depletion reveals transcription upstream of active human promoters. Science 322, 1851–1854 (2008).
Kruesi, W.S., Core, L.J., Waters, C.T., Lis, J.T. & Meyer, B.J. Condensin controls recruitment of RNA polymerase II to achieve nematode X-chromosome dosage compensation. eLife 2, e00808 (2013).
Core, L.J. et al. Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat. Genet. 46, 1311–1320 (2014).
Drucker, H., Burges, C.J.C., Kaufman, L., Smola, A. & Vapnik, V. in Proc Adv. Neural Inf. Process. Syst. (NIPS, 1996).
Danko, C.G. et al. Signaling pathways differentially affect RNA polymerase II initiation, pausing, and elongation rate in cells. Mol. Cell 50, 212–222 (2013).
Luo, X., Chae, M., Krishnakumar, R., Danko, C.G. & Kraus, W.L. Dynamic reorganization of the AC16 cardiomyocyte transcriptome in response to TNFα signaling revealed by integrated genomic analyses. BMC Genomics 15, 155 (2014).
Wang, I.X. et al. RNA-DNA differences are generated in human cells within seconds after RNA exits polymerase II. Cell Rep. 6, 906–915 (2014).
Ernst, J. et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49 (2011).
John, S. et al. Chromatin accessibility pre-determines glucocorticoid receptor binding patterns. Nat. Genet. 43, 264–268 (2011).
Boyle, A.P. et al. High-resolution mapping and characterization of open chromatin across the genome. Cell 132, 311–322 (2008).
Creyghton, M.P. et al. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc. Natl. Acad. Sci. USA 107, 21931–21936 (2010).
Guertin, M.J., Martins, A.L., Siepel, A. & Lis, J.T. Accurate prediction of inducible transcription factor binding intensities in vivo. PLoS Genet. 8, e1002610 (2012).
Heintzman, N.D. et al. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat. Genet. 39, 311–318 (2007).
Guertin, M.J. & Lis, J.T. Chromatin landscape dictates HSF binding to target DNA elements. PLoS Genet. 6, e1001114 (2010).
Pique-Regi, R. et al. Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res. 21, 447–455 (2011).
Sherwood, R.I. et al. Discovery of directional and nondirectional pioneer transcription factors by modeling DNase profile magnitude and shape. Nat. Biotechnol. 32, 171–178 (2014).
Neph, S. et al. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature 489, 83–90 (2012).
Jolma, A. et al. DNA-binding specificities of human transcription factors. Cell 152, 327–339 (2013).
Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013).
Schaub, M.A., Boyle, A.P., Kundaje, A., Batzoglou, S. & Snyder, M. Linking disease associations with regulatory information in the human genome. Genome Res. 22, 1748–1759 (2012).
Hadjur, S. et al. Cohesins form chromosomal cis-interactions at the developmentally regulated IFNG locus. Nature 460, 410–413 (2009).
Dimitriadou, E., Hornik, K., Leisch, F., Meyer, D. & Weingessel, A. e1071: Misc Functions of the Department of Statistics (e1071) (TU Wien, 2010).
Chang, C.-C. & Lin, C.-J. in ACM Trans. Intell. Syst. Technol. 2, 27 (ACM, 2011).
Hoffman, M.M. et al. Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Res. 41, 827–841 (2013).
Ernst, J. & Kellis, M. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat. Biotechnol. 28, 817–825 (2010).
Neph, S. et al. BEDOPS: high-performance genomic feature operations. Bioinformatics 28, 1919–1920 (2012).
Quinlan, A.R. & Hall, I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Kittler, R. et al. A comprehensive nuclear receptor network for breast cancer cells. Cell Rep. 3, 538–551 (2013).

Acknowledgements

We thank I. Jonkers and N. Dukler for comments and helpful discussions on an early manuscript draft, and B. Gulko for critical discussions about support vector machines. This work was made possible by generous seed grants from the Cornell University Center for Vertebrate Genomics (CVG) and Center for Comparative and Population Genetics (3CPG), a US National Human Genome Research Institute grant (5R01HG007070-02) to A.S. and J.T.L., and US National Institutes of Health R01 (DK058110) to W.L.K. The content is solely the responsibility of the authors and does not necessarily represent the official views of the US National Institutes of Health.

Author information

Leighton J Core, Colin T Waters & Adam Siepel
Present addresses: Department of Molecular and Cell Biology, Institute for Systems Genomics, University of Connecticut, Storrs, Connecticut, USA (L.J.C.); Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, Massachusetts, USA (C.T.W.); Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA (A.S.).

Authors and Affiliations

Baker Institute for Animal Health, Cornell University, Ithaca, New York, USA
Charles G Danko
Department of Biomedical Sciences, Cornell University, Ithaca, New York, USA
Charles G Danko
Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
Charles G Danko & Adam Siepel
Tri-Institutional Training Program in Computational Biology and Medicine, New York, New York, USA
Stephanie L Hyland
Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA
Leighton J Core, Colin T Waters, Hyung Won Lee & John T Lis
Graduate Field in Computational Biology, Cornell University, Ithaca, New York, USA
Andre L Martins
Life Sciences Institute, University of Michigan, Ann Arbor, Michigan, USA
Vivian G Cheung
Howard Hughes Medical Institute, Chevy Chase, Maryland, USA
Vivian G Cheung
Laboratory of Signaling and Gene Regulation, Cecil H. and Ida Green Center for Reproductive Biology Sciences, University of Texas Southwestern Medical Center, Dallas, Texas, USA
W Lee Kraus
Division of Basic Research, Department of Obstetrics and Gynecology, University of Texas Southwestern Medical Center, Dallas, Texas, USA
W Lee Kraus

Contributions

C.G.D. designed the dREG tool. C.G.D., A.L.M. and S.L.H. designed and implemented the software. C.G.D., S.L.H., A.L.M., L.J.C., J.T.L. and A.S. analyzed the data and interpreted the results. L.J.C., C.T.W., C.G.D., H.W.L., J.T.L., W.L.K. and V.G.C. contributed data and helped to troubleshoot experiments. C.G.D., A.S., J.T.L., L.J.C., S.L.H. and A.L.M. wrote the manuscript.

Corresponding authors

Correspondence to Charles G Danko, John T Lis or Adam Siepel.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Browser shot demonstrating the dREG technique.

Browser shot depicts raw dREG scores and dREG ‘peaks’ alongside PRO-seq, GRO-cap, DNase-I, and ENCODE ChIP-seq data for H3K27ac, H3K4me1, and H3K4me3.

Supplementary Figure 2 Illustration of the dREG feature vector and the resulting accuracy of TRE identification.

(a) The signal for +dREG TREs, depicted as bar charts at decreasing window sizes (from top to bottom). Boxes represent consecutive, non-overlapping windows. The number of windows in the optimal feature vector and the size represented by each bar are shown at right. (b) ROC plot showing how accurately dREG separates positive (TRE) from negative examples, given equal numbers of each (AUC = 0.99).
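The multiscale feature vector in panel a can be illustrated with a small sketch. It assumes that reads have already been reduced to sorted 5′-end coordinates on each strand, that windows are tiled around the candidate site, and that the (window size, window count) pairs are arbitrary illustrative values rather than the optimized ones used by dREG. The function names are hypothetical, and this is not the dREG implementation.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def window_counts(positions: Sequence[int], center: int, size: int, n: int) -> List[int]:
    """Count reads in n consecutive, non-overlapping windows of width `size`.

    The windows start (n // 2) * size bases upstream of `center`, so they are
    centered on it when n is even. `positions` must be sorted.
    """
    start = center - (n // 2) * size
    counts = []
    for i in range(n):
        lo = start + i * size
        hi = lo + size  # half-open window [lo, hi)
        counts.append(bisect_left(positions, hi) - bisect_left(positions, lo))
    return counts


def feature_vector(plus: Sequence[int], minus: Sequence[int], center: int,
                   scales: Sequence[Tuple[int, int]]) -> List[int]:
    """Concatenate plus- and minus-strand window counts at every scale.

    `scales` lists hypothetical (window_size, n_windows) pairs, coarse to fine.
    """
    features: List[int] = []
    for size, n in scales:
        features += window_counts(plus, center, size, n)
        features += window_counts(minus, center, size, n)
    return features


# Toy example: two scales around a candidate site at position 100.
plus_reads = [105, 110, 180, 260]
minus_reads = [40, 60, 95]
print(feature_vector(plus_reads, minus_reads, 100, [(50, 4), (10, 2)]))
# [0, 0, 2, 1, 1, 2, 0, 0, 0, 1, 1, 0]
```

Counting with `bisect` on sorted coordinates keeps each window lookup logarithmic, which matters when the same reads are summarized at many scales and many candidate sites.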

Supplementary Figure 3 Sensitivity to read depth and library quality.

(a,b) dREG sensitivity at a 10% false discovery rate (FDR) at the indicated read depth or asymptotic library complexity. Dotted lines indicate models trained specifically on the indicated library; solid lines indicate the model trained on the native K562 PRO-seq libraries. Pink and cyan denote GRO-cap sites and pairs, respectively. (c,d) Support vector regression (SVR) score threshold required to achieve a 10% FDR for models that have (dotted) or have not (solid) been trained specifically for the indicated parameters.
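Choosing a score threshold that achieves a target FDR can be sketched as a scan over labeled scores. This sketch assumes the FDR is estimated empirically as FP / (TP + FP) from held-out positive (for example, GRO-cap) and negative examples with known labels, and that the lowest qualifying threshold is chosen; `threshold_at_fdr` is a hypothetical helper, not part of dREG.

```python
from typing import Sequence, Tuple


def threshold_at_fdr(pos_scores: Sequence[float], neg_scores: Sequence[float],
                     max_fdr: float = 0.10) -> Tuple[float, float]:
    """Return (threshold, sensitivity) for the lowest score threshold whose
    empirical FDR, FP / (TP + FP), is at most `max_fdr`.

    Sites with score >= threshold are called. If no threshold qualifies,
    returns (inf, 0.0).
    """
    if not pos_scores:
        raise ValueError("need at least one positive example")
    labeled = sorted([(s, True) for s in pos_scores] + [(s, False) for s in neg_scores],
                     key=lambda item: item[0], reverse=True)
    best = (float("inf"), 0.0)
    tp = fp = 0
    i = 0
    while i < len(labeled):
        t = labeled[i][0]
        # Add every example tied at score t, since one threshold calls them all.
        while i < len(labeled) and labeled[i][0] == t:
            if labeled[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / (tp + fp) <= max_fdr:
            best = (t, tp / len(pos_scores))
    return best


pos = [0.9, 0.8, 0.7, 0.6, 0.3]  # scores of known TREs (toy values)
neg = [0.65, 0.2, 0.1]           # scores of negative examples (toy values)
print(threshold_at_fdr(pos, neg))        # (0.7, 0.6)
print(threshold_at_fdr(pos, neg, 0.20))  # (0.3, 1.0)
```

Because the empirical FDR is not monotone in the threshold, the loop keeps updating as it moves down the ranked list and returns the last (lowest) threshold that still satisfies the bound.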

Supplementary Figure 4 dREG TREs are associated with chromatin marks characteristic of both promoters and enhancers.

More than 90% of dREG TREs intersect post-translational histone modifications previously associated with either promoters or enhancers, as interpreted by ChromHMM.

Supplementary Figure 5 Chromatin marks associated with three classes of DNase I–hypersensitive sites.

DNase I–hypersensitive sites identified by the UW assay alone, the Duke assay alone, or both assays (their intersection) are associated with the indicated fractions of regulatory marks (blue), transcribed regions (red), or repeat/heterochromatin (purple), as annotated by ChromHMM.

Supplementary Figure 6 Fraction of TREs in each class among four cell types.

Barplots represent the fraction of TREs in each of the four nested classes of TRE compared across four cell types for which all sources of data are available (K562, GM12878, CD4+ T-cells, and HeLa carcinoma cells).

Supplementary Figure 7 High-confidence DNase I peaks covered by dREG.

Fraction of DNase-I peaks (excluding CTCF-bound insulators) that intersect a dREG site (Y-axis) as a function of the PRO-seq read depth (X-axis) in K562 cells.
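The statistic on the Y-axis, the fraction of peaks touched by at least one dREG site, can be computed with a sorted-interval lookup, similar in spirit to `bedtools intersect -u`. This sketch assumes BED-style half-open coordinates and that CTCF-bound insulators have already been filtered out of the peak set; the helper names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style half-open


def merge_by_chrom(intervals: Iterable[Interval]) -> Dict[str, List[Tuple[int, int]]]:
    """Sort intervals per chromosome and merge overlapping or adjacent ones."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        out = [list(ivs[0])]
        for start, end in ivs[1:]:
            if start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [(s, e) for s, e in out]
    return merged


def fraction_overlapping(peaks: Iterable[Interval], sites: Iterable[Interval]) -> float:
    """Fraction of `peaks` that overlap at least one interval in `sites`."""
    merged = merge_by_chrom(sites)
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in merged.items()}
    hits = total = 0
    for chrom, start, end in peaks:
        total += 1
        ivs = merged.get(chrom)
        if not ivs:
            continue
        # Last merged site starting before this peak ends. Merged sites are
        # disjoint and sorted, so it is the only one that needs checking.
        i = bisect_left(starts[chrom], end) - 1
        if i >= 0 and ivs[i][1] > start:
            hits += 1
    return hits / total if total else 0.0


sites = [("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 500, 600), ("chr2", 0, 50)]
peaks = [("chr1", 290, 310), ("chr1", 300, 400), ("chr1", 450, 501),
         ("chr2", 50, 60), ("chr3", 0, 10)]
print(fraction_overlapping(peaks, sites))  # 0.4
```

Repeating this calculation on dREG sites called from subsampled libraries would trace out one curve of the kind plotted against read depth.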

Supplementary Figure 8 Association of TREs in each class to independent functional marks.

(a,b) Comparison of read densities for H3K9ac (a) and H3K4me3 (b) in each class of functional element. (c) The fraction of ENCODE peak calls for the specified mark (H3K4me1, H3K27ac, H3K9ac, and CTCF) in each of the four classes. The ‘other’ category denotes peaks for the indicated mark that fall outside the putative TREs identified by other assays. (d) ENCODE MNase-seq signal centered on the indicated class of TRE.

Supplementary Figure 9 Enrichment of H3K27ac and PRO-seq signal intensity.

Enrichment of H3K27ac at dREG TREs that lack an H3K27ac peak call (left), and PRO-seq signal on the plus (red) and minus (blue) strands at H3K27ac peak calls without a dREG TRE prediction (right).

Supplementary Figure 10 Sequence-specific transcription factors distinguish between DNase I–hypersensitive and transcribed regulatory TREs.

(a) TFs are associated either with DNase I–hypersensitive peaks that are actively transcribed (+dREG) or with peaks that are open but not transcribed (-dREG and DNase I–hypersensitive insulators); transcription is indicated by the presence of Pol II (red rocket). ROC plots depict the accuracy with which these classes of TRE can be distinguished in three cell types on the basis of TF binding patterns. (b) Logistic regression coefficients, from a 1,000-sample bootstrap, for TFs correlated with transcription initiation (positive, red) or repression (negative, blue).
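The bootstrap in panel b can be approximated as follows: resample TREs with replacement, refit a logistic regression of class label (1 = transcribed, 0 = not transcribed) on TF binding indicators, and summarize each coefficient across replicates. This is a minimal pure-Python sketch; the gradient-ascent fitter, its learning rate and epoch count, and the use of the bootstrap median as the summary are assumptions rather than details given in the caption, and the toy data are invented.

```python
import math
import random
import statistics
from typing import List, Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Maximum-likelihood logistic regression by batch gradient ascent.

    Returns [intercept, w_1, ..., w_p].
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = yi - _sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def bootstrap_coefficients(X, y, n_boot: int = 1000, seed: int = 0, **fit_kw) -> List[float]:
    """Refit on n_boot resamples drawn with replacement; return median coefficients."""
    rng = random.Random(seed)
    n = len(X)
    fits = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        fits.append(fit_logistic([X[i] for i in idx], [y[i] for i in idx], **fit_kw))
    return [statistics.median(col) for col in zip(*fits)]


# Toy data: x = [TF A bound, TF B bound]; y = 1 if the TRE is transcribed.
rows = ([([1, 0], 1)] * 4 + [([1, 0], 0)] + [([0, 1], 0)] * 4 + [([0, 1], 1)]
        + [([1, 1], 1), ([1, 1], 0), ([0, 0], 1), ([0, 0], 0)]) * 3
X = [x for x, _ in rows]
y = [label for _, label in rows]
coef = bootstrap_coefficients(X, y, n_boot=20)  # 20 replicates to keep the demo fast
print(coef)  # [intercept, TF A, TF B]; TF A positive, TF B negative
```

In practice a library fitter (with regularization, since bootstrap resamples can become perfectly separable) would replace the hand-written gradient loop; the resampling logic stays the same.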

Supplementary Figure 11 Browser shot of CTCF ChIP-seq and PRO-seq signal.

UCSC Genome Browser tracks comparing dREG, GRO-cap, CTCF ChIP-seq, and PRO-seq signal in the indicated region of chr1 in K562 cells.

Supplementary Figure 12 Distance of each class to the nearest RefSeq annotated transcription start site.

Each point shows the fraction of TREs in the indicated class whose distance to the nearest RefSeq-annotated transcription start site exceeds the value on the X-axis (i.e., 1 − cumulative distribution function). Separate lines show the distribution for all +dREG TREs (red, dotted) and for the subset that intersects chromatin marks indicative of enhancers (red, solid).
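A 1 − CDF curve of this kind can be computed by finding each TRE's distance to the nearest TSS and then counting, for each X-axis value, how many distances exceed it. This sketch assumes each TRE is represented by its midpoint and that TSS coordinates are already grouped by chromosome; TREs on chromosomes with no annotated TSS are treated as infinitely distant. The function names are hypothetical.

```python
import math
from bisect import bisect_left, bisect_right
from typing import Dict, List, Sequence, Tuple


def nearest_tss_distance(pos: int, tss_sorted: Sequence[int]) -> float:
    """Distance from `pos` to the closest TSS in a sorted list (inf if empty)."""
    i = bisect_left(tss_sorted, pos)
    best = math.inf
    if i < len(tss_sorted):
        best = tss_sorted[i] - pos
    if i > 0:
        best = min(best, pos - tss_sorted[i - 1])
    return best


def distance_survival(tres: Sequence[Tuple[str, int, int]], tss: Dict[str, List[int]],
                      grid: Sequence[int]) -> List[float]:
    """For each x in `grid`, the fraction of TREs farther than x from any TSS,
    i.e. 1 - CDF of the distance distribution."""
    sorted_tss = {chrom: sorted(v) for chrom, v in tss.items()}
    dists = sorted(
        nearest_tss_distance((start + end) // 2, sorted_tss.get(chrom, []))
        for chrom, start, end in tres
    )
    if not dists:
        return [0.0 for _ in grid]
    n = len(dists)
    # bisect_right counts distances <= x; the remainder are strictly greater.
    return [(n - bisect_right(dists, x)) / n for x in grid]


tss = {"chr1": [5000, 1000]}
tres = [("chr1", 900, 1100), ("chr1", 2900, 3100), ("chr1", 4000, 4200), ("chr2", 0, 100)]
print(distance_survival(tres, tss, [0, 500, 1000, 5000]))  # [0.75, 0.75, 0.5, 0.25]
```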

Supplementary Figure 13 PhyloP scores among the placental mammals in each class of TRE.

Violin plots denote the distribution of the maximum PhyloP score within each occurrence of the indicated class of TRE in GM12878 cells.

Supplementary Figure 14 Cell type–specific differences in TRE class.

Heatmap denotes the median frequency with which the indicated class of TRE in one cell type (‘From’ axis) intersects with the indicated TRE class in a second cell type (‘To’ axis).

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–14, Supplementary Tables 1–4 and Supplementary Discussion (PDF 1779 kb)

Supplementary Software

dREG software package. The models used for running dREG are located at https://github.com/Danko-Lab/dREG. (ZIP 3517 kb)


About this article

Cite this article

Danko, C., Hyland, S., Core, L. et al. Identification of active transcriptional regulatory elements from GRO-seq data. Nat Methods 12, 433–438 (2015). https://doi.org/10.1038/nmeth.3329

Received: 06 August 2014
Accepted: 24 February 2015
Published: 23 March 2015
Issue Date: May 2015
DOI: https://doi.org/10.1038/nmeth.3329

