A comparison of experimental assays and analytical methods for genome-wide identification of active enhancers

Yao, Li; Liang, Jin; Ozer, Abdullah; Leung, Alden King-Yung; Lis, John T.; Yu, Haiyuan

doi:10.1038/s41587-022-01211-7

Article
Published: 17 February 2022

A comparison of experimental assays and analytical methods for genome-wide identification of active enhancers

Nature Biotechnology volume 40, pages 1056–1065 (2022)Cite this article

10k Accesses
18 Citations
44 Altmetric
Metrics details

Subjects

Abstract

Mounting evidence supports the idea that transcriptional patterns serve as more specific identifiers of active enhancers than histone marks; however, the optimal strategy to identify active enhancers both experimentally and computationally has not been determined. Here, we compared 13 genome-wide RNA sequencing (RNA-seq) assays in K562 cells and show that nuclear run-on followed by cap-selection assay (GRO/PRO-cap) has advantages in enhancer RNA detection and active enhancer identification. We also introduce a tool, peak identifier for nascent transcript starts (PINTS), to identify active promoters and enhancers genome wide and pinpoint the precise location of 5′ transcription start sites. Finally, we compiled a comprehensive enhancer candidate compendium based on the detected enhancer RNA (eRNA) transcription start sites (TSSs) available in 120 cell and tissue types, which can be accessed at https://pints.yulab.org. With knowledge of the best available assays and pipelines, this large-scale annotation of candidate enhancers will pave the way for selection and characterization of their functions in a time- and labor-efficient manner.

Access through your institution

Buy or subscribe

This is a preview of subscription content, access via your institution

Access options

Access through your institution

Buy this article

Purchase on Springer Link
Instant access to full article PDF

Buy now

Prices may be subject to local taxes which are calculated during checkout

**Fig. 1: Comparison of currently available assays for detection of eRNAs.**

**Fig. 2: Evaluation of assay sensitivity in eRNA detection.**

**Fig. 3: Characterization of factors affecting assay sensitivity and evaluation of assay specificity in eRNA detection.**

**Fig. 4: PINTS achieves optimal balance among resolution, robustness, sensitivity, specificity and computational resources required.**

**Fig. 5: A comprehensive human enhancer compendium.**

**Fig. 6: Interactive PINTS web server.**

Inferring gene regulatory networks from single-cell multiome data using atlas-scale external data

Article Open access 12 April 2024

Simultaneous single-cell three-dimensional genome and gene expression profiling uncovers dynamic enhancer connectivity underlying olfactory receptor choice

Article Open access 15 April 2024

Assessing GPT-4 for cell type annotation in single-cell RNA-seq analysis

Article Open access 25 March 2024

Data availability

Processed TRE calls are publicly accessible via our web portal (https://pints.yulab.org). Data that support the findings of this study are available within the paper and its Supplementary information files. All sequencing data analyzed in this study were retrieved from public databases (NCBI GEO and ENCODE portal); lists of accessions are available in Supplementary Tables 1 and 4. Source data are provided with this paper.

Code availability

The source code of PINTS is publicly available at https://github.com/hyulab/PINTS; scripts and pipelines used to generate results reported in this study can be retrieved from https://github.com/hyulab/PINTS_analysis.

References

Heintzman, N. D. et al. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat. Genet. 39, 311–318 (2007).
Article CAS PubMed Google Scholar
Calo, E. & Wysocka, J. Modification of enhancer chromatin: what, how, and why? Mol. Cell 49, 825–837 (2013).
Article CAS PubMed Google Scholar
Kim, T.-K. et al. Widespread transcription at neuronal activity-regulated enhancers. Nature 465, 182–187 (2010).
Article CAS PubMed PubMed Central Google Scholar
Descostes, N. et al. Tyrosine phosphorylation of RNA polymerase II CTD is associated with antisense promoter transcription and active enhancers in mammalian cells. eLife 3, e02105 (2014).
Article PubMed PubMed Central Google Scholar
Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
Article CAS PubMed PubMed Central Google Scholar
Tippens, N. D. et al. Transcription imparts architecture, function and logic to enhancer units. Nat. Genet. 52, 1067–1075 (2020).
Article PubMed PubMed Central CAS Google Scholar
Core, L. J. et al. Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat. Genet. 46, 1311–1320 (2014).
Article CAS PubMed PubMed Central Google Scholar
Tome, J. M., Tippens, N. D. & Lis, J. T. Single-molecule nascent RNA sequencing identifies regulatory domain architecture at promoters and enhancers. Nat. Genet. 50, 1533–1541 (2018).
Article CAS PubMed PubMed Central Google Scholar
Kruesi, W. S., Core, L. J., Waters, C. T., Lis, J. T. & Meyer, B. J. Condensin controls recruitment of RNA polymerase II to achieve nematode X-chromosome dosage compensation. eLife 2, e00808 (2013).
Article PubMed PubMed Central CAS Google Scholar
Kwak, H., Fuda, N. J., Core, L. J. & Lis, J. T. Precise maps of RNA polymerase reveal how promoters direct initiation and pausing. Science 339, 950–953 (2013).
Article CAS PubMed PubMed Central Google Scholar
Henriques, T. et al. Widespread transcriptional pausing and elongation control at enhancers. Genes Dev. 32, 26–41 (2018).
Article CAS PubMed PubMed Central Google Scholar
Kodzius, R. et al. CAGE: cap analysis of gene expression. Nat. Methods 3, 211–222 (2006).
Article CAS PubMed Google Scholar
Batut, P., Dobin, A., Plessy, C., Carninci, P. & Gingeras, T. R. High-fidelity promoter profiling reveals widespread alternative promoter usage and transposon-driven developmental gene expression. Genome Res. 23, 169–180 (2013).
Article CAS PubMed PubMed Central Google Scholar
Hirabayashi, S. et al. NET-CAGE characterizes the dynamics and topology of human transcribed cis-regulatory elements. Nat. Genet. 51, 1369–1379 (2019).
Article CAS PubMed Google Scholar
Duttke, S. H., Chang, M. W., Heinz, S. & Benner, C. Identification and dynamic quantification of regulatory elements using total RNA. Genome Res. 29, 1836–1846 (2019).
Article CAS PubMed PubMed Central Google Scholar
Policastro, R. A., Raborn, R. T., Brendel, V. P. & Zentner, G. E. Simple and efficient profiling of transcription initiation and transcript levels with STRIPE-seq. Genome Res. 30, 910–923 (2020).
Article CAS PubMed PubMed Central Google Scholar
Core, L. J., Waterfall, J. J. & Lis, J. T. Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science 322, 1845–1848 (2008).
Article CAS PubMed PubMed Central Google Scholar
Nojima, T. et al. Mammalian NET-seq reveals genome-wide nascent transcription coupled to RNA processing. Cell 161, 526–540 (2015).
Article CAS PubMed PubMed Central Google Scholar
Paulsen, M. T. et al. Coordinated regulation of synthesis and stability of RNA during the acute TNF-induced proinflammatory response. Proc. Natl Acad. Sci. USA 110, 2240–2245 (2013).
Article CAS PubMed PubMed Central Google Scholar
Magnuson, B. et al. Identifying transcription start sites and active enhancer elements using BruUV-seq. Sci. Rep. 5, 17978 (2015).
Article CAS PubMed PubMed Central Google Scholar
Chen, H. et al. A pan-cancer analysis of enhancer expression in nearly 9000 patient samples. Cell 173, 386–399 (2018).
Article CAS PubMed PubMed Central Google Scholar
Zhang, Z. et al. Transcriptional landscape and clinical utility of enhancer RNAs for eRNA-targeted therapy in cancer. Nat. Commun. 10, 4562 (2019).
Article PubMed PubMed Central CAS Google Scholar
Azofeifa, J. G. & Dowell, R. D. A generative model for the behavior of RNA polymerase. Bioinformatics 33, 227–234 (2017).
Article CAS PubMed Google Scholar
Danko, C. G. et al. Identification of active transcriptional regulatory elements from GRO-seq data. Nat. Methods 12, 433–438 (2015).
Article CAS PubMed PubMed Central Google Scholar
Wang, Z., Chu, T., Choate, L. A. & Danko, C. G. Identification of regulatory elements from nascent transcription using dREG. Genome Res. 29, 293–303 (2019).
Article CAS PubMed PubMed Central Google Scholar
Chu, T. et al. Chromatin run-on and sequencing maps the transcriptional regulatory landscape of glioblastoma multiforme. Nat. Genet. 50, 1553–1564 (2018).
Article CAS PubMed PubMed Central Google Scholar
Adiconis, X. et al. Comprehensive comparative analysis of 5′-end RNA-sequencing methods. Nat. Methods 15, 505–511 (2018).
Article CAS PubMed PubMed Central Google Scholar
Frith, M. C. et al. A code for transcription initiation in mammalian genomes. Genome Res. 18, 1–12 (2008).
Article CAS PubMed PubMed Central Google Scholar
Thakore, P. I. et al. Highly specific epigenome editing by CRISPR-Cas9 repressors for silencing of distal regulatory elements. Nat. Methods 12, 1143–1149 (2015).
Article CAS PubMed PubMed Central Google Scholar
Fulco, C. P. et al. Systematic mapping of functional enhancer-promoter connections with CRISPR interference. Science 354, 769–773 (2016).
Article CAS PubMed PubMed Central Google Scholar
Wakabayashi, A. et al. Insight into GATA1 transcriptional activity through interrogation of cis elements disrupted in human erythroid disorders. Proc. Natl Acad. Sci. USA 113, 4434–4439 (2016).
Article CAS PubMed PubMed Central Google Scholar
Klann, T. S. et al. CRISPR-Cas9 epigenome editing enables high-throughput screening for functional regulatory elements in the human genome. Nat. Biotechnol. 35, 561–568 (2017).
Article CAS PubMed PubMed Central Google Scholar
Xie, S., Duan, J., Li, B., Zhou, P. & Hon, G. C. Multiplexed engineering and analysis of combinatorial enhancer activity in single cells. Mol. Cell 66, 285–299 (2017).
Article CAS PubMed Google Scholar
Gasperini, M. et al. A genome-wide framework for mapping gene regulation via cellular genetic screens. Cell 176, 377–390 (2019).
Article CAS PubMed PubMed Central Google Scholar
Fulco, C. P. et al. Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).
Article CAS PubMed PubMed Central Google Scholar
Xie, S., Armendariz, D., Zhou, P., Duan, J. & Hon, G. C. Global analysis of enhancer targets reveals convergent enhancer-driven regulatory modules. Cell Rep. 29, 2570–2578 (2019).
Article CAS PubMed PubMed Central Google Scholar
Schraivogel, D. et al. Targeted Perturb-seq enables genome-scale genetic screens in single cells. Nat. Methods 17, 629–635 (2020).
Article CAS PubMed PubMed Central Google Scholar
Kheradpour, P. et al. Systematic dissection of regulatory motifs in 2000 predicted human enhancers using a massively parallel reporter assay. Genome Res. 23, 800–811 (2013).
Article CAS PubMed PubMed Central Google Scholar
Kwasnieski, J. C., Fiore, C., Chaudhari, H. G. & Cohen, B. A. High-throughput functional testing of ENCODE segmentation predictions. Genome Res. 24, 1595–1602 (2014).
Article CAS PubMed PubMed Central Google Scholar
Ulirsch, J. C. et al. Systematic functional dissection of common genetic variation affecting red blood cell traits. Cell 165, 1530–1545 (2016).
Article CAS PubMed PubMed Central Google Scholar
Ernst, J. et al. Genome-scale high-resolution mapping of activating and repressive nucleotides in regulatory regions. Nat. Biotechnol. 34, 1180–1190 (2016).
Article CAS PubMed PubMed Central Google Scholar
Maricque, B. B., Chaudhari, H. G. & Cohen, B. A. A massively parallel reporter assay dissects the influence of chromatin structure on cis-regulatory activity. Nat. Biotechnol. 37, 90–95 (2019).
Article CAS Google Scholar
Rathert, P. et al. Transcriptional plasticity promotes primary and acquired resistance to BET inhibition. Nature 525, 543–547 (2015).
Article CAS PubMed PubMed Central Google Scholar
Dao, L. T. M. et al. Genome-wide characterization of mammalian promoters with distal enhancer functions. Nat. Genet. 49, 1073–1081 (2017).
Article CAS PubMed Google Scholar
Lee, D. et al. STARRPeaker: uniform processing and accurate identification of STARR-seq active regions. Genome Biol. 21, 298 (2020).
Article CAS PubMed PubMed Central Google Scholar
Wang, X. et al. High-resolution genome-wide functional dissection of transcriptional regulatory regions and nucleotides in human. Nat. Commun. 9, 5380 (2018).
Article CAS PubMed PubMed Central Google Scholar
Schwalb, B. et al. TT-seq maps the human transient transcriptome. Science 352, 1225–1228 (2016).
Article CAS PubMed Google Scholar
Core, L. J. et al. Defining the status of RNA polymerase at promoters. Cell Rep. 2, 1025–1035 (2012).
Article CAS PubMed PubMed Central Google Scholar
Mchaourab, Z. F., Perreault, A. A. & Venters, B. J. ChIP-seq and ChIP-exo profiling of Pol II, H2A.Z, and H3K4me3 in human K562 cells. Sci. Data 5, 180030 (2018).
Article CAS PubMed PubMed Central Google Scholar
Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012).
Article CAS PubMed PubMed Central Google Scholar
O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
Article PubMed CAS Google Scholar
Jurka, J. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 16, 418–420 (2000).
Article CAS PubMed Google Scholar
Djebali, S. et al. Landscape of transcription in human cells. Nature 489, 101–108 (2012).
Article CAS PubMed PubMed Central Google Scholar
Field, A. & Adelman, K. Evaluating enhancer function and transcription. Annu. Rev. Biochem. 89, 213–234 (2020).
Article CAS PubMed Google Scholar
Andersson, R. & Sandelin, A. Determinants of enhancer and promoter activities of regulatory elements. Nat. Rev. Genet. 21, 71–87 (2020).
Article CAS PubMed Google Scholar
Palazzo, A. F. & Koonin, E. V. Functional long non-coding RNAs evolve from junk transcripts. Cell 183, 1151–1161 (2020).
Article CAS PubMed Google Scholar
ENCODE Project Consortium et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).
Wang, D. et al. Reprogramming transcription by distinct classes of enhancers functionally defined by eRNA. Nature 474, 390–394 (2011).
Article CAS PubMed PubMed Central Google Scholar
Chae, M., Danko, C. G. & Kraus, W. L. groHMM: a computational tool for identifying unannotated and cell type-specific transcription units from global run-on sequencing data. BMC Bioinformatics 16, 222 (2015).
Article PubMed PubMed Central CAS Google Scholar
Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
Article PubMed PubMed Central CAS Google Scholar
Schneider, V. A. et al. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res. 27, 849–864 (2017).
Article CAS PubMed PubMed Central Google Scholar
Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Pennacchio, L. A., Bickmore, W., Dean, A., Nobrega, M. A. & Bejerano, G. Enhancers: five essential questions. Nat. Rev. Genet. 14, 288–295 (2013).
Article CAS PubMed PubMed Central Google Scholar
Vo Ngoc, L., Huang, C. Y., Cassidy, C. J., Medrano, C. & Kadonaga, J. T. Identification of the human DPR core promoter element using machine learning. Nature 585, 459–463 (2020).
Article CAS PubMed PubMed Central Google Scholar
Fornes, O. et al. JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 48, D87–D92 (2020).
CAS PubMed Google Scholar
Landrum, M. J. et al. ClinVar: improvements to accessing data. Nucleic Acids Res. 48, D835–D844 (2020).
Article CAS PubMed Google Scholar
Vahrenkamp, J. M. et al. FFPEcap-seq: a method for sequencing capped RNAs in formalin-fixed paraffin-embedded samples. Genome Res. 29, 1826–1835 (2019).
Article CAS PubMed PubMed Central Google Scholar
Yao, L., Wang, H., Song, Y. & Sui, G. BioQueue: a novel pipeline framework to accelerate bioinformatics analysis. Bioinformatics 33, 3286–3288 (2017).
Article CAS PubMed Google Scholar
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: An ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Article PubMed PubMed Central CAS Google Scholar
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Article CAS PubMed Google Scholar
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Article CAS PubMed PubMed Central Google Scholar
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Article PubMed PubMed Central CAS Google Scholar
Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).
Article PubMed PubMed Central CAS Google Scholar
Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
Article CAS PubMed PubMed Central Google Scholar
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
Article CAS PubMed PubMed Central Google Scholar
Seabold, S. & Perktold, J. Statsmodels: econometric and statistical modeling with Python. https://doi.org/10.25080/majora-92bf1922-011 (2010).
Dale, R. K., Pedersen, B. S. & Quinlan, A. R. Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics 27, 3423–3424 (2011).
Article CAS PubMed PubMed Central Google Scholar
Cock, P. J. A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
Article CAS PubMed PubMed Central Google Scholar
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Article CAS PubMed PubMed Central Google Scholar
Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).
Article CAS PubMed PubMed Central Google Scholar
Preker, P. et al. RNA exosome depletion reveals transcription upstream of active human promoters. Science 322, 1851–1854 (2008).
Article CAS PubMed Google Scholar
van Arensbergen, J. et al. Genome-wide mapping of autonomous promoter activity in human cells. Nat. Biotechnol. 35, 145–153 (2017).
Article PubMed CAS Google Scholar
Shivram, H. & Iyer, V. R. Identification and removal of sequencing artifacts produced by mispriming during reverse transcription in multiple RNA-seq technologies. RNA 24, 1266–1274 (2018).
Article CAS PubMed PubMed Central Google Scholar
Bedi, K., Paulsen, M. T., Wilson, T. E. & Ljungman, M. Characterization of novel primary miRNA transcription units in human cells using Bru-seq nascent RNA sequencing. NAR Genom. Bioinform. 2, lqz014 (2020).
Article PubMed CAS Google Scholar
Zacher, B. et al. Accurate promoter and enhancer identification in 127 ENCODE and roadmap epigenomics cell types and tissues by GenoSTAN. PLoS ONE 12, e0169249 (2017).
Article PubMed PubMed Central CAS Google Scholar

Download references

Acknowledgements

Computation was performed on a cluster administered by the Biotechnology Resource Center at Cornell University. We thank members of the Yu and Lis laboratories and the ENCODE Consortium (specifically A. Mortazavi, M. Ljungman and J. E. Moore) for helpful discussions and guidance; and H. Zhu for her suggestions on concept visualization. This work was supported by grants from the National Institutes of Health (no. UM1HG009393 to J.T.L. and H.Y. and nos. R01DK115398, R01DK127778 and R01HD082568 to H.Y.). L.Y. was supported by the Cornell Presidential Life Sciences Fellowship.

Author information

Authors and Affiliations

Department of Computational Biology, Cornell University, Ithaca, NY, USA
Li Yao, Alden King-Yung Leung, John T. Lis & Haiyuan Yu
Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
Li Yao, Jin Liang, Alden King-Yung Leung & Haiyuan Yu
Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
Abdullah Ozer & John T. Lis

Authors

Li Yao
View author publications
You can also search for this author in PubMed Google Scholar
Jin Liang
View author publications
You can also search for this author in PubMed Google Scholar
Abdullah Ozer
View author publications
You can also search for this author in PubMed Google Scholar
Alden King-Yung Leung
View author publications
You can also search for this author in PubMed Google Scholar
John T. Lis
View author publications
You can also search for this author in PubMed Google Scholar
Haiyuan Yu
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Conceptualization was performed by L.Y., J.T.L. and H.Y. Methodology was carried out by L.Y. Software was the responsibility of L.Y. L.Y. carried out formal analysis. J.L. performed investigations. Data curation was carried out by L.Y., J.L. and A.K.-Y.L. L.Y. and J.L. wrote the original draft. Writing, review and editing were performed by J.L., A.O., J.T.L. and H.Y. Visualization was the responsibility of L.Y., J.L., A.O. and H.Y. J.T.L. and H.Y. supervised the study.

Corresponding authors

Correspondence to John T. Lis or Haiyuan Yu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Biotechnology thanks Leng Han and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 An extended evaluation of eRNA detection sensitivity of different assays.

a and c are the extended versions for Fig. 2a,b, respectively. a and b show the capability of different assays to capture previously identified enhancers. The color of stacked bars indicates the detection of eRNAs originated from either one or both strands of the enhancer loci. The transparency level shows the number of reads for an enhancer locus to be considered as covered. The top track in a is derived from the CRISPR or CRISPRi based reference set (n = 803), the bottom track is derived from consensus loci validated by STARR-seq and MPRA (n = 550). b, Sensitivity evaluated in the other cell line, GM12878, with orientation-independent enhancers identified from previous studies (n = 3,544)^6,46. c, Differences in read coverage among stable (n = 13,861) and unstable (n = 6,380) transcripts. The error bars in the top track show the extrema of effect sizes (n = 5,000). The center dots, box limits, and whiskers in the bottom track of c denote the median, upper and lower quartiles, and 1.5× interquartile range, respectively.

Source data

Extended Data Fig. 2 Effect of technical artifacts on eRNA capture.

a, A new strategy for evaluating strand specificity without the interference from promoter-upstream transcripts (PROMPTs)⁸¹. Red and blue colors indicate reads’ mapping direction; the highlighted (yellow) region indicates a previously validated⁸² PROMPT. Only the first exon in green was used for evaluation. b, Strand specificities of three stranded and unstranded RNA-seq libraries with our strategy. The p-value was estimated by a two-sided t test; c, Strand specificity for all libraries evaluated with our strategy. Values and error bars represent the mean and SD. n = 2 (GRO-cap, CoPRO, csRNA-seq, PRO-seq, GRO-seq, mNET-seq), n = 3 (STRIPE-seq), n = 4 (CAGE and RAMPAGE), n = 8 (BruUV-seq, total RNA-seq), n = 9 (Bru-seq). d, Distribution of 3-mers at flush end sites⁸³ for RIP-seq and TGIRT-seq. The dashed red lines stand for the frequency of RT3-mers (sequence identical to the last three nts for the RT primer [for RIP-seq] or the 3′ adapter [for TGIRT-seq]) in the genome. e, Log odds ratios (LORs) of observed RT3-mer at flushing end sites versus in the genome (top) and internal priming rates (bottom) of assays when the internal priming could be detected from the sequencing data. f, The overlap between enhancers in the RppH library (Capped+Uncapped as ‘C + U’) that are also covered in the Capped library (C). The x-axis shows the minimum number of reads required for an enhancer locus to be considered as covered. g, Difference of log-transformed read counts between the capped (C) and RppH (C + U) libraries. The effect size was measured by Cohen’s d. In the box plot, the center dots, box limits, and whiskers denote the median, upper and lower quartiles, and 1.5× interquartile range, respectively. h, Pearson’s r of log-transformed reads from promoters of expressed transcripts (TPM > 5) was quantified using PRO-seq and POLR2A ChIP-exo. n = 4,747 (low), n = 9,058 (medium), and n = 2,470 (high).

Source data

Extended Data Fig. 3 Analyses of factors affecting assays’ sensitivity in detecting eRNAs.

a is the extended version for Fig. 3a. b, An example shows that divergent transcripts detected by NT-assays can originate from two overlapping genes (MMP23B and SLC35E2B) instead of from a regulatory element. Sequencing reads were RPM-normalized. c, Proportion of mappable reads from different assays originated from various abundant RNA families. d, Effects of rRNA depletion in eRNA enrichment. For each category, three downsampled libraries were included. BruUV-seq libraries from a previously published study⁸⁴ were used for this analysis. The p-value for rRNA percentage was calculated by two proportions z test (two-sided, p-value: 0); the p-value for true enhancer coverage was calculated by McNemar’s test (two-sided, p-value: 2.1 × 10⁻²⁵). Values and error bars represent the mean and SD. e, The distribution of sequencing reads (in RPM) around GENCODE-annotated splicing junction sites. The shaded area indicates the 95% confidence interval of mean values estimated via bootstrap.

Source data

Extended Data Fig. 4 Extended evaluations of assays’ specificity.

a, Epigenomic and transcription factor binding profiles for the enhancer and non-enhancer sets. For H3K27ac and CTCF, the profiles are presented as fold-changes over control; for DHS, the profile is shown as normalized sequencing depth. Solid lines represent mean densities, and shades depict the 95% confidence interval of mean values estimated via bootstrap. KE: known enhancers; NE: non-enhancers. b Signal-to-noise ratios evaluated in K562. n = 803 for known enhancers, n = 6,777 for non-enhancers. c, Signal-to-noise ratios evaluated in GM12878. n = 3,544 (Known enhancers), and n = 153,809 (Non-enhancers). For b and c, 10,000 bootstrapped samples were used for calculating the fold enrichment (FE). The center dots, box limits, and whiskers in b and c denote the median, upper and lower quartiles, and 1.5× interquartile range, respectively. d, False discovery rates estimated by the overlap between the top 5,000, 10,000, 20,000, and 100,000 genomic bins and the true and non-enhancer sets. Downsampled libraries were used (n = 3); values and error bars represent the mean and SD.

Source data

Extended Data Fig. 5 Assessments of transcript unit prediction and schematic illustration of PINTS.

a, The consistencies vary greatly between transcription units annotated in GENCODE (Annot.) and those predicted by different tools^58,59,85 (Pred.). Lines in the violin plot indicate the 25th, 50th, and 75th quartiles, respectively. b, Schematic plot of PINTS. i, Improvement of TSS identification resolution by focusing only on read ends and using zero-inflated Poisson (ZIP) models to fit local background to address the substantially increased sparsity of signals. The thin grey lines indicate sequencing reads with the 5′ ends highlighted in red. ii, The existence of other potential true peaks (pink) elevates the estimation of read density in the local background. iii, A schematic plot shows how IQR-ZIP works. The blue box shows the read density distribution of the local background; the purple dot shows the density of the peak to be tested; the pink dot shows the density of a potential true peak close to the peak to be tested, whose read density is a clear outlier and thus excluded from local background estimation.

Source data

Extended Data Fig. 6 Profiles of peak calls generated by different peak callers for various assays.

a, Aggregated profiles of epigenomic marks, transcription binding sites, and chromatin accessibility in true enhancer regions and distal TREs identified by different peak callers for TSS- and NT-assays. The shaded area indicates the 95% confidence interval of mean values estimated via bootstrap; b, An example demonstrating why MACS2 is not suitable for identifying TREs. c, Distribution of element sizes identified from 12 assays by all applicable peak callers. In the box plot, the center lines, box limits, and whiskers denote the median, upper and lower quartiles, and 1.5× interquartile range, respectively; points show observations that are not in the range of quartiles ±1.5 × (Q₃ − Q₁). A table of sample sizes is available in Supplementary Table 5.

Extended Data Fig. 7 Extended analyses on the robustness of element predictions.

a, A previous study showed that the sequences between hg19 and hg38 are very similar as hg38 has 0.09% more ungapped non-centromeric sequences than hg19, only 0.17% of ungapped hg19 sequences are not in hg38⁶¹. Here we show the distribution of sequencing reads in the genome. The read counts of each assay were summarized against their frequency in a log scale with hg19 as blue lines and hg38 as orange lines. The p-values were calculated by two-sided Student’s t tests. b, Robustness (Jaccard index) of different peak callers when applying them to experimental data with technical and biological replicates. Correlations between alignments (Sample cor.) were calculated as Pearson’s r of log-transformed read counts among genomic bins (500 bp).

Source data

Extended Data Fig. 8 Performance evaluation of peak callers under different sequencing depths.

a, Epigenomic patterns of the true positive (enhancers, promoters) and true negative (non-enhancers) sets used for ROC calculation for peak calling from GRO-cap. b~d, Sensitivity and specificity of different peak callers when analyzing TSS-libraries (n=7) downsampled to 18.9 (b), 15 (c), and 10 (d) million mappable reads. The corresponding shaded areas show the 95% confidence interval of the means (via bootstrap). For tools where ROCs cannot be calculated, solid dots represent their performance with default parameters. Values and error bars show mean and SD.

Source data

Extended Data Fig. 9 Profiles of unique distal elements identified by different tools.

a, Comparison of the epigenomic signals (fold change over control) in elements uniquely identified by PINTS and other tools. b, Enrichment (measured as log odds ratios) of TF-binding motifs in PINTS unique TREs compared to other tools. The circles indicate the corresponding p-values (−log₂ p, two-sided z tests), and the error bars indicate the 90% confidence interval.

Source data

Extended Data Fig. 10 A summary of the computational tools compared in this study.

The features of different algorithms are summarized and grouped by their roles in the peak calling procedure (colored blocks). Features utilized by each tool to call peaks from nascent transcript sequencing data are indicated.

Supplementary information

Supplementary Information

Supplementary Notes

Reporting Summary

Supplementary Tables 1–5.

Supplementary Table 1: Summaries of sequencing libraries analyzed in this study. Supplementary Table 2: Known enhancer sets. Supplementary Table 3: Non-enhancer set. Supplementary Table 4: Datasets integrated in the PINTS web server. Supplementary Table 5: Sample size for TREs and each tool predicted in different assays.

Source data

Source Data Fig. 2

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Extended Data Fig. 1

Statistical source data.

Source Data Extended Data Fig. 2

Statistical source data.

Source Data Extended Data Fig. 3

Statistical source data.

Source Data Extended Data Fig. 4

Statistical source data.

Source Data Extended Data Fig. 5

Statistical source data.

Source Data Extended Data Fig. 7

Statistical source data.

Source Data Extended Data Fig. 8

Statistical source data.

Source Data Extended Data Fig. 9

Statistical source data.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Yao, L., Liang, J., Ozer, A. et al. A comparison of experimental assays and analytical methods for genome-wide identification of active enhancers. Nat Biotechnol 40, 1056–1065 (2022). https://doi.org/10.1038/s41587-022-01211-7

Download citation

Received: 03 June 2021
Accepted: 06 January 2022
Published: 17 February 2022
Issue Date: July 2022
DOI: https://doi.org/10.1038/s41587-022-01211-7

This article is cited by

Acetylation of histone H2B marks active enhancers and predicts CBP/p300 target genes
- Takeo Narita
- Yoshiki Higashijima
- Chunaram Choudhary
Nature Genetics (2023)
Transposable elements as tissue-specific enhancers in cancers of endodermal lineage
- Konsta Karttunen
- Divyesh Patel
- Biswajyoti Sahu
Nature Communications (2023)
Toward a comprehensive catalog of regulatory elements
- Kaili Fan
- Edith Pfister
- Zhiping Weng
Human Genetics (2023)
Using Synthetic DNA Libraries to Investigate Chromatin and Gene Regulation
- Holly Kleinschmidt
- Cheng Xu
- Lu Bai
Chromosoma (2023)
Population-level variation in enhancer expression identifies disease mechanisms in the human brain
- Pengfei Dong
- Gabriel E. Hoffman
- Panos Roussos
Nature Genetics (2022)

Subjects

Abstract

Access options

Similar content being viewed by others

Data availability

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Peer review

Peer review information

Additional information

Extended data

Supplementary information

Source data

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Search

Quick links