The identification of transcriptional enhancers in the human genome is a prime goal in biology. Enhancers are typically predicted via chromatin marks, yet their function is primarily assessed with plasmid-based reporter assays. Here, we show that such assays are rendered unreliable by two previously reported phenomena relating to plasmid transfection into human cells: (i) the bacterial plasmid origin of replication (ORI) functions as a conflicting core promoter and (ii) a type I interferon (IFN-I) response is activated. These cause confounding false positives and negatives in luciferase assays and STARR-seq screens. We overcome both problems by employing the ORI as core promoter and by inhibiting two IFN-I-inducing kinases, enabling genome-wide STARR-seq screens in human cells. In HeLa-S3 cells, we uncover strong enhancers, IFN-I-induced enhancers, and enhancers endogenously silenced at the chromatin level. Our findings apply to all episomal enhancer activity assays in mammalian cells and are key to the characterization of human enhancers.
Gene Expression Omnibus
Shlyueva, D., Stampfel, G. & Stark, A. Transcriptional enhancers: from properties to genome-wide predictions. Nat. Rev. Genet. 15, 272–286 (2014).
Santiago-Algarra, D., Dao, L.T.M., Pradel, L., España, A. & Spicuglia, S. Recent advances in high-throughput approaches to dissect enhancer function. F1000Res. 6, 939 (2017).
Lemp, N.A., Hiraoka, K., Kasahara, N. & Logg, C.R. Cryptic transcripts from a ubiquitous plasmid origin of replication confound tests for cis-regulatory function. Nucleic Acids Res. 40, 7280–7290 (2012).
Zabidi, M.A. et al. Enhancer-core-promoter specificity separates developmental and housekeeping gene regulation. Nature 518, 556–559 (2015).
Saragosti, S., Moyne, G. & Yaniv, M. Absence of nucleosomes in a fraction of SV40 chromatin between the origin of replication and the region coding for the late leader RNA. Cell 20, 65–73 (1980).
Arnold, C.D. et al. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science 339, 1074–1077 (2013).
Arnold, C.D. et al. Genome-wide assessment of sequence-intrinsic enhancer responsiveness at single-base-pair resolution. Nat. Biotechnol. 35, 136–144 (2017).
Juven-Gershon, T., Cheng, S. & Kadonaga, J.T. Rational design of a super core promoter that enhances gene expression. Nat. Methods 3, 917–922 (2006).
Pine, R., Levy, D.E., Reich, N. & Darnell, J.E. Jr. Transcriptional stimulation by CaPO4-DNA precipitates. Nucleic Acids Res. 16, 1371–1378 (1988).
Ishikawa, H., Ma, Z. & Barber, G.N. STING regulates intracellular DNA-mediated, type I interferon-dependent innate immunity. Nature 461, 788–792 (2009).
Paludan, S.R. & Bowie, A.G. Immune sensing of DNA. Immunity 38, 870–880 (2013).
Bridge, A.J., Pebernard, S., Ducraux, A., Nicoulaz, A.-L. & Iggo, R. Induction of an interferon response by RNAi vectors in mammalian cells. Nat. Genet. 34, 263–264 (2003).
Dao, L.T.M. et al. Genome-wide characterization of mammalian promoters with distal enhancer functions. Nat. Genet. 49, 1073–1081 (2017).
Landry, J.J.M. et al. The genomic and transcriptomic landscape of a HeLa cell line. G3 (Bethesda) 3, 1213–1224 (2013).
Tewhey, R. et al. Direct identification of hundreds of expression-modulating variants using a multiplexed reporter assay. Cell 165, 1519–1529 (2016).
Nguyen, T.A. et al. High-throughput functional comparison of promoter and enhancer activities. Genome Res. 26, 1023–1033 (2016).
Chen, Q., Sun, L. & Chen, Z.J. Regulation and function of the cGAS–STING pathway of cytosolic DNA sensing. Nat. Immunol. 17, 1142–1149 (2016).
Chan, Y.K. & Gack, M.U. Viral evasion of intracellular DNA and RNA sensing. Nat. Rev. Microbiol. 14, 360–373 (2016).
Nejepinska, J., Malik, R., Wagner, S. & Svoboda, P. Reporters transiently transfected into mammalian cells are highly sensitive to translational repression induced by dsRNA expression. PLoS One 9, e87517 (2014).
Nejepinska, J., Malik, R., Moravec, M. & Svoboda, P. Deep sequencing reveals complex spurious transcription from transiently transfected plasmids. PLoS One 7, e43283 (2012).
Clark, K., Plater, L., Peggie, M. & Cohen, P. Use of the pharmacological inhibitor BX795 to study the regulation and physiological roles of TBK1 and IkappaB kinase epsilon: a distinct upstream kinase mediates Ser-172 phosphorylation and activation. J. Biol. Chem. 284, 14136–14146 (2009).
Jammi, N.V., Whitby, L.R. & Beal, P.A. Small molecule inhibitors of the RNA-dependent protein kinase. Biochem. Biophys. Res. Commun. 308, 50–57 (2003).
McLean, C.Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501 (2010).
Guertin, M.J. & Lis, J.T. Mechanisms by which transcription factors gain access to target sequence elements in chromatin. Curr. Opin. Genet. Dev. 23, 116–123 (2013).
Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nat. Methods 9, 215–216 (2012).
Imrichová, H., Hulselmans, G., Atak, Z.K., Potier, D. & Aerts, S. i-cisTarget 2015 update: generalized cis-regulatory enrichment analysis in human, mouse and fly. Nucleic Acids Res. 43, W57–W64 (2015).
White, R.J. Transcription by RNA polymerase III: more complex than we thought. Nat. Rev. Genet. 12, 459–463 (2011).
Oler, A.J. et al. Human RNA polymerase III transcriptomes and relationships to Pol II promoter chromatin and enhancer-binding factors. Nat. Struct. Mol. Biol. 17, 620–628 (2010).
Schuettengruber, B., Chourrout, D., Vervoort, M., Leblanc, B. & Cavalli, G. Genome regulation by polycomb and trithorax proteins. Cell 128, 735–745 (2007).
Bonn, S. et al. Tissue-specific analysis of chromatin state identifies temporal signatures of enhancer activity during embryonic development. Nat. Genet. 44, 148–156 (2012).
Rada-Iglesias, A. et al. A unique chromatin signature uncovers early developmental enhancers in humans. Nature 470, 279–283 (2011).
Creyghton, M.P. et al. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc. Natl. Acad. Sci. USA 107, 21931–21936 (2010).
Smit, A., Hubley, R. & Green, P. RepeatMasker Open-4.0. 2013–2015 http://repeatmasker.org (2014).
Friedli, M. & Trono, D. The developmental control of transposable elements and the evolution of higher species. Annu. Rev. Cell Dev. Biol. 31, 429–451 (2015).
Chuong, E.B., Elde, N.C. & Feschotte, C. Regulatory evolution of innate immunity through co-option of endogenous retroviruses. Science 351, 1083–1087 (2016).
Kunarso, G. et al. Transposable elements have rewired the core regulatory network of human embryonic stem cells. Nat. Genet. 42, 631–634 (2010).
Stamatoyannopoulos, J.A. What does our genome encode? Genome Res. 22, 1602–1611 (2012).
Li, W., Notani, D. & Rosenfeld, M.G. Enhancers as non-coding RNA transcription units: recent insights and future perspectives. Nat. Rev. Genet. 17, 207–223 (2016).
van Arensbergen, J. et al. Genome-wide mapping of autonomous promoter activity in human cells. Nat. Biotechnol. 35, 145–153 (2017).
Barakat, T.S. et al. Functional dissection of the enhancer repertoire in human embryonic stem cells. bioRxiv 146696, 10.1101/146696 (2017).
Wang, X. et al. High-resolution genome-wide functional dissection of transcriptional regulatory regions in human. bioRxiv 193136, 10.1101/193136 (2017).
Nehlsen, K., Broll, S. & Bode, J. Replicating minicircles: generation of nonviral episomes for the efficient modification of dividing cells. Gene Ther. Mol. Biol. 10, 233–244 (2006).
Walters, A.A. et al. Comparative analysis of enzymatically produced novel linear DNA constructs with plasmids for use as DNA vaccines. Gene Ther. 21, 645–652 (2014).
Shen, S.Q. et al. Massively parallel cis-regulatory analysis in the mammalian central nervous system. Genome Res. 26, 238–255 (2016).
Inoue, F. et al. A systematic comparison reveals substantial differences in chromosomal versus episomal encoding of enhancer activity. Genome Res. 27, 38–52 (2017).
Maricque, B.B., Dougherty, J.D. & Cohen, B.A. A genome-integrated massively parallel reporter assay reveals DNA sequence determinants of cis-regulatory activity in neural cells. Nucleic Acids Res. 45, e16 (2017).
Rickels, R. et al. An evolutionary conserved epigenetic mark of polycomb response elements implemented by Trx/MLL/COMPASS. Mol. Cell 63, 318–328 (2016).
Lanoix, J. & Acheson, N.H. A rabbit beta-globin polyadenylation signal directs efficient termination of transcription of polyomavirus DNA. EMBO J. 7, 2515–2522 (1988).
Ishida, Y. & Leder, P. RET: a poly A-trap retrovirus vector for reversible disruption and expression monitoring of genes in living cells. Nucleic Acids Res. 27, e35 (1999).
Vitter, J.S. Random sampling with a reservoir. ACM Trans. Math. Softw. 11, 37–57 (1985).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
Quinlan, A.R. & Hall, I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Stark, A. et al. Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures. Nature 450, 219–232 (2007).
Aken, B.L. et al. Ensembl 2017. Nucleic Acids Res. 45, D635–D642 (2017).
Alexa, A. & Rahnenfuhrer, J. topGO: enrichment analysis for gene ontology. (2016).
Bailey, T.L. & Gribskov, M. Combining evidence using p-values: application to sequence homology searches. Bioinformatics 14, 48–54 (1998).
Saldanha, A.J. Java Treeview—extensible visualization of microarray data. Bioinformatics 20, 3246–3248 (2004).
Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012).
Bray, N.L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).
Kolde, R. pheatmap: Pretty heatmaps. (2015).
FitzGerald, P.C., Sturgill, D., Shyakhtenko, A., Oliver, B. & Vinson, C. Comparative genomics of Drosophila and human core promoters. Genome Biol. 7, R53 (2006).
Ohler, U., Liao, G.-C., Niemann, H. & Rubin, G.M. Computational analysis of core promoters in the Drosophila genome. Genome Biol. 3, RESEARCH0087 (2002).
Parry, T.J. et al. The TCT motif, a key component of an RNA polymerase II transcription system for the translational machinery. Genes Dev. 24, 2013–2018 (2010).
Livak, K.J. & Schmittgen, T.D. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods 25, 402–408 (2001).
Stark, A. et al. STARR-seq library preparation. Nat. Protoc. 10.1038/nprot.2017.144 (2017).
Stark, A. et al. STARR-seq screening protocol. Nat. Protoc. 10.1038/nprot.2017.148 (2017).
Stark, A. et al. qPCR assay to measure ISG expression in human cells. Nat. Protoc. 10.1038/nprot.2017.145 (2017).
Stark, A. et al. qPCR based reporter assay on luciferase transcripts. Nat. Protoc. 10.1038/nprot.2017.146 (2017).
We thank T. Decker and G. Versteeg (Max F. Perutz Laboratories & University of Vienna), S. Aerts (VIB-KU Leuven), P. Svoboda (Institute of Molecular Genetics of the ASCR), P. Andersen (IMBA), P. Heine and E. Jans (MaxCyte Inc.) and J. Zuber (IMP) for helpful discussions and reagents. Deep sequencing was performed at the VBCF Next-Generation Sequencing Unit (http://vbcf.ac.at). F.M. was supported by an EMBO long-term fellowship (EMBO ALTF 491–2014). Research in the Stark group is supported by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement no. 647320) and by the Austrian Science Fund (FWF, F4303-B09). Basic research at the IMP is supported by Boehringer Ingelheim GmbH and the Austrian Research Promotion Agency (FFG).
The authors declare no competing financial interests.
Integrated supplementary information
a, The core-promoter motif content is shown for the ORI of the pGL4 family. The track above is taken from Figure 1a, panel for SCP1. b, Percentage of sites within the indicated plasmid elements (backbone, ORI, SCP1) that initiate at the indicated threshold counts (1, 2: background; 3–5: very weak; compare Figure 1a, Supplementary Figure 1a). c, Representative STARR-seq enhancer activity profiles obtained for the reporter setups from Figure 1a. The RefSeq (GRCh37) gene track is shown above; the dashed boxes indicate luciferase-validated enhancers and the red asterisk indicates high background signal. d, Representative enhancer activity profiles over three gene loci (indicated above; H3K27ac data from ENCODE, Supplementary Table 3) for an SCP1 STARR-seq screen (see Supplementary Figure 1c and text). The upper profile is obtained from all reporter transcripts and the lower two from transcripts that initiated either in the ORI (top) or in SCP1 (bottom; see Methods for details on stratification). e, Scatter plot of STARR-seq signal-over-background between screens employing SCP1 or the ORI as a core promoter. Shown are all peaks called in either screen, highlighting predicted enhancers (n=39; green) and luciferase-validated enhancers (n=4; red; compare to Figure 1d). f, STARR-seq signal-to-noise between screens employing SCP1 or the ORI as a core promoter over 10 luciferase-tested candidates (4 positive over 6 negative). P-value as stated (two-sided paired t-test, lines indicate pairing). g, Scatter plot of STARR-seq signal-over-background between screens employing SCP1 or the ORI as a core promoter in HCT-116 cells (details as in Supplementary Figure 1e; predicted enhancers: n=27; compare to Figure 1d). h, Proposed plasmid-based reporter setups for luciferase assays, barcode-based MPRAs, and STARR-seq (for results using the proposed luciferase and STARR-seq setups see main text). For all setups, we propose to position the enhancer candidate downstream, which allows the specific assessment of enhancer rather than promoter activity, and to use the ORI as the core promoter. i, Proposed cloning strategies for different reporter setups employing the ORI as a core promoter. j, Alternative strategy to block ORI-derived transcripts based on the insertion of poly-adenylation sites and splice acceptors downstream of the ORI. Note that the introduction of poly-adenylation sites alone (as present, for example, in pGL3 and pGL4) is ineffective due to extensive splicing of the ORI-derived transcript (compare blocking constructs 1 and 2 to 3 and 4). The right panel depicts signal-over-background for the indicated constructs over predicted enhancers (n=39). Bars represent mean signal, error bars indicate 75% confidence intervals, P-values as stated (two-sided paired Fisher’s LSD test).
Supplementary Figure 2 Most ENCODE cell lines are likely capable of mounting an IFN-I response to cytosolic nucleic acids
a, Hierarchical clustering of ENCODE cell lines based on their expression profiles of four genes involved in DNA- and RNA-triggered innate immunity. Blue and orange stars indicate a functioning or inactive cGAS/STING pathway, respectively. b, c, qPCR-based assessment of interferon-stimulated gene (ISG) induction after plasmid transfection in cells likely to have a non-functional (b, orange) or functioning (c, blue) DNA-sensing pathway (TUBB serves as control). Colored bars represent the mean fold change in endogenous mRNA expression levels (log2) after plasmid transfection across three independent transfections (grey dots). d, e, qPCR-based assessment of mRNA induction of endogenous ISGs in HeLa-S3 cells electroporated with different plasmid types (d) and transfected by different means (e). In e, RNA extraction was performed at two different time points to account for the different kinetics of DNA delivery between electroporation and chemical transfection. In all cases, mean fold change (colored bars) in mRNA expression levels (log2) after transfection is shown across three independent transfections (grey dots).
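The log2 fold changes plotted in these panels can be illustrated with a minimal sketch. It assumes the 2^(-ΔΔCT) method listed in the reference list (Livak & Schmittgen) was applied, which this caption does not state explicitly. The function name, the choice of normalizing reference gene, and all Ct values below are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean

def log2_fold_change(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Return log2 fold change under the 2^(-ddCt) method, i.e. -ddCt.

    Assumption: Ct values of a target gene are normalized to a reference
    gene, then compared between transfected and untransfected cells.
    """
    d_treated = ct_target_treated - ct_ref_treated   # dCt, transfected
    d_control = ct_target_control - ct_ref_control   # dCt, untransfected
    return -(d_treated - d_control)                  # -ddCt = log2(FC)

# Hypothetical Ct values for three independent transfections:
# (target transfected, reference transfected, target control, reference control)
replicates = [
    (22.0, 18.0, 27.0, 18.0),
    (22.5, 18.1, 27.1, 18.0),
    (21.8, 17.9, 26.9, 18.1),
]

per_replicate = [log2_fold_change(*r) for r in replicates]  # the grey dots
mean_log2_fc = mean(per_replicate)                          # the colored bar
```

In this sketch each entry of `per_replicate` corresponds to one grey dot and `mean_log2_fc` to the height of a colored bar.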
a, The five most significant disease ontology terms and their enrichments reported by GREAT (ref. 23) for the top 500 accessible peaks in STARR-seq screens without and with TBK1/IKK/PKR inhibitors. Shown are log10-transformed FDR-adjusted P-values (as reported by GREAT) and fold-enrichments (shades of purple). b, The 10 most significantly enriched GO terms for genes proximal to the top 1000 peaks in the HeLa-S3 STARR-seq screen with TBK1/IKK/PKR inhibitors. Shown are log10-transformed FDR-adjusted P-values (one-sided Fisher’s exact test) and fold-enrichments (shades of purple; compare to Figure 2c). c, The 10 most significantly enriched GO terms and their enrichments among genes proximal to enhancers that are at least 5-fold down-regulated upon TBK1/IKK/PKR inhibitor treatment (FDR-adjusted P-value < 0.001). d, HeLa-S3 genome-wide STARR-seq screens with TBK1/IKK/PKR inhibitors vs. without inhibitors. Differential peaks are highlighted as indicated at fold-change cutoffs of 2 and 5 (adjusted P-value < 0.001). 39 peaks close to the TSS of IFN-I-stimulated genes (‘ISG’; based on GO annotation) are shown as black circles. e, Representative STARR-seq enhancer activity profiles for canonical ISGs (H3K27ac data from ENCODE, Supplementary Table 3). Note that their differential activity in screens without (red) and with (green) TBK1/IKK/PKR inhibition is a good indicator of their inducibility by IFN-I. The peak ranks are listed below each strong enhancer, indicating that these regions are among the most highly active in STARR-seq screens performed without inhibitors. f, g, Bar graphs and scatter plots of average signal-over-background of STARR-seq screens in cells with or without TBK1/IKK/PKR inhibitor treatment for HeLa-S3 cells (f) and HCT-116 cells (g). Shown are all peaks called in either screen; highlighted are predicted enhancers (green, n=39 for HeLa-S3; n=27 for HCT-116) and luciferase-validated enhancers (red; n=4). Bars represent mean signal, error bars indicate 75% confidence intervals, P-values as stated (two-sided paired t-test). h, Percent recovery of peak calls in focused STARR-seq screens in HeLa-S3 cells employing the ORI or SCP1 as core promoters, as a fraction of peak calls in the ORI setup with inhibitor treatment. i, Number of peak calls at the indicated enrichment cutoffs in focused STARR-seq screens in HeLa-S3 cells employing the ORI or SCP1 as core promoters.
Supplementary Figure 4 STARR-seq enhancers are mostly intergenic or intronic and are enriched in enhancer-like chromatin states
a, Average percentages of genomic annotations for the human genome (top) and STARR-seq peaks (bottom). CDS = coding sequence, 3’ / 5’ UTR = 3’ / 5’ untranslated region, n.c. exon = non-coding exon, upstream = -2 kb from the TSS. b, Enrichment of different ChromHMM states (ref. 25) within non-overlapping bins of 500 STARR-seq enhancers (ranked by corrected fold-enrichment, left) and shifted control regions (+50 kb, right). c, Normalized enrichment scores for different HeLa-S3 ChIP-seq datasets (NES, i-cisTarget, ref. 26) for ChromHMM strong enhancers (‘Enh’) without STARR-seq support and open STARR-seq enhancers that do not overlap ChromHMM strong enhancers (‘Enh’) and the respective fold-differences (right, log2). d, Odds of motif occurrence (odds > 8 in any condition; FDR-adjusted P-value < 10⁻⁵, two-sided Fisher’s exact test) in proximal, distal intergenic or distal intronic STARR-seq enhancers over random regions. e, Average ENCODE ChIP-seq signal of ELK1, ELK4, JUN and FOS for proximal, distal intergenic or distal intronic STARR-seq enhancers compared to random regions (grey). f, Average ENCODE ChIP-seq signal (read coverage) of H3K4me1, H3K4me3 and H3K27ac for proximal, distal intergenic or distal intronic STARR-seq enhancers compared to random regions (grey).
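The motif odds in panel d rest on a two-sided Fisher's exact test followed by FDR adjustment. The sketch below shows how such a value could be computed from a 2x2 table of enhancers and random regions with or without a motif. It is illustrative only: the function names are hypothetical, and the Benjamini-Hochberg procedure is an assumption, since the caption does not say which FDR method was used.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Rows: enhancers / random regions; columns: motif present / absent.
    Returns (odds_ratio, p_value).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x motif hits in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as likely as the observed one.
    p_value = sum(p for p in map(prob, range(lo, hi + 1))
                  if p <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(1.0, p_value)

def benjamini_hochberg(pvals):
    """FDR-adjusted P-values (assumed method: Benjamini-Hochberg)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from largest to smallest P
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts: 3 of 4 enhancers vs. 1 of 4 random regions carry a motif.
odds, p = fisher_exact_two_sided(3, 1, 1, 3)
adj = benjamini_hochberg([0.01, 0.04, 0.03])
```

A motif would then be reported in a panel like d only if its odds exceed the stated cutoff and its adjusted P-value falls below the stated threshold.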
a, Cumulative percentage of STARR-seq peaks explained as DNase-I-hypersensitive sites in ENCODE cell lines (blue; grey indicates chance expectation based on random regions). From left to right, each additional dataset is intersected with regions not yet explained by the previous dataset (HeLa-S3 first, followed by additional cell types in alphabetical order). b, Odds ratios (FDR-adjusted P-value < 0.05, two-sided Fisher’s exact test) of indicated transposable elements in STARR-seq enhancers inaccessible in HeLa-S3 cells (DHS, P-value < 0.05, binomial test) vs. 1 × 10⁶ random control regions.
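The cumulative procedure described for panel a can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes the overlap of each peak with each DHS dataset has already been computed (for example with an interval tool) and is available as a set of peak identifiers. The function name, the peak identifiers, and the dataset names other than HeLa-S3 are hypothetical.

```python
def cumulative_percent_explained(peaks, datasets):
    """Cumulative percentage of peaks explained by an ordered list of datasets.

    peaks:    set of peak identifiers.
    datasets: ordered list of (name, set of peak ids overlapping that DHS set).
    Each dataset is only credited for peaks not explained by earlier ones.
    """
    remaining = set(peaks)
    explained = 0
    curve = []
    for name, hits in datasets:
        newly = remaining & hits        # peaks first explained here
        remaining -= newly
        explained += len(newly)
        curve.append((name, 100.0 * explained / len(peaks)))
    return curve

# Hypothetical example: 10 peaks, HeLa-S3 first, then other cell types
# in alphabetical order.
peaks = set(range(1, 11))
datasets = [
    ("HeLa-S3", {1, 2, 3, 4, 5, 6}),
    ("cell_line_A", {5, 6, 7}),
    ("cell_line_B", {7, 8}),
]
curve = cumulative_percent_explained(peaks, datasets)
```

Running the same function on random control regions would give the chance expectation that the caption describes as the grey curve.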
Supplementary Figures 1–5 (PDF 1206 kb)
Life Sciences Reporting Summary (PDF 130 kb)
Supplementary Tables 1, 3, and 5. (PDF 110 kb)
STARR-seq Library preparation protocol (PDF 330 kb)
STARR-seq Screening protocol (PDF 429 kb)
qPCR assay to measure ISG expression in human cells (PDF 241 kb)
qPCR based reporter assay on luciferase transcripts (PDF 385 kb)
STARR-seq peaks and shortlisted regions (XLSX 7816 kb)
STARR-seq cloning sequences and oligos (XLSX 47 kb)
Muerdter, F., Boryń, Ł., Woodfin, A. et al. Resolving systematic errors in widely used enhancer activity assays in human cells. Nat Methods 15, 141–149 (2018). https://doi.org/10.1038/nmeth.4534