Massively parallel reporter assays (MPRAs) enable nucleotide-resolution dissection of transcriptional regulatory regions, such as enhancers, but only for a few regions at a time. Here we present a combined experimental and computational approach, Systematic high-resolution activation and repression profiling with reporter tiling using MPRA (Sharpr-MPRA), that allows high-resolution analysis of thousands of regions simultaneously. Sharpr-MPRA combines dense tiling of overlapping MPRA constructs with a probabilistic graphical model to recognize functional regulatory nucleotides, and to distinguish activating from repressive nucleotides, using their inferred contribution to reporter gene expression. We used Sharpr-MPRA to test 4.6 million nucleotides spanning 15,000 putative regulatory regions tiled at 5-nucleotide resolution in two human cell types. Our results recovered known cell-type-specific regulatory motifs and evolutionarily conserved nucleotides, and distinguished known activating and repressive motifs. Our results also showed that endogenous chromatin state and DNA accessibility are both predictive of regulatory function in reporter assays, identified retroviral elements with activating roles, and uncovered 'attenuator' motifs with repressive roles in active chromatin.
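The dense-tiling idea can be illustrated with a minimal sketch. It assumes a 295-nucleotide region and 145-nucleotide constructs; these lengths are illustrative assumptions, and only the 5-nucleotide step comes from the text above. The per-nucleotide score here is a plain average over the tiles covering each position, a deliberately naive stand-in that is not the probabilistic graphical model used by Sharpr-MPRA. The function names and the toy activity values are hypothetical.

```python
from typing import List, Sequence


def tile_starts(region_len: int, tile_len: int, step: int) -> List[int]:
    """Start offsets of overlapping tiles that fit entirely inside the region."""
    if step <= 0 or tile_len <= 0 or tile_len > region_len:
        raise ValueError("need step > 0 and 0 < tile_len <= region_len")
    return list(range(0, region_len - tile_len + 1, step))


def naive_position_scores(
    region_len: int,
    tile_len: int,
    starts: Sequence[int],
    activities: Sequence[float],
) -> List[float]:
    """Average the activity of every tile that covers each nucleotide.

    This is a simple smoothing heuristic for illustration only,
    NOT the Sharpr-MPRA inference procedure.
    """
    totals = [0.0] * region_len
    counts = [0] * region_len
    for start, activity in zip(starts, activities):
        for pos in range(start, start + tile_len):
            totals[pos] += activity
            counts[pos] += 1
    # Positions covered by no tile get NaN rather than a misleading zero.
    return [t / c if c else float("nan") for t, c in zip(totals, counts)]


# Illustrative dimensions (assumptions, except the 5-nucleotide step).
REGION_LEN = 295
TILE_LEN = 145
STEP = 5

starts = tile_starts(REGION_LEN, TILE_LEN, STEP)

# Toy data: a tile is "active" only if it contains position 100.
activities = [1.0 if s <= 100 < s + TILE_LEN else 0.0 for s in starts]

scores = naive_position_scores(REGION_LEN, TILE_LEN, starts, activities)
```

Even this crude average shows why overlapping tiles help: each nucleotide is measured by many constructs, so differences between neighboring tiles localize activity more finely than the tile length. A proper model is still needed to separate activating from repressive contributions.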
We thank P. Kheradpour and J.-P. Vert for useful discussions related to this work. This work was supported by US National Institutes of Health (NIH) grants R01ES024995, U01HG007912 and U01MH105578 (J.E.), R01HG006785 (T.S.M.), R01GM113708, U01HG007610, R01HG004037, U54HG006991 and U41HG007000 (M.K.), a US National Science Foundation CAREER Award #1254200, and an Alfred P. Sloan Fellowship (J.E.).
The Broad Institute has filed patents (US20140200163, EP2705152) on the original MPRA technology with T.S.M., A.M., L.W. and X.Z. among the authors. Patent protection for Sharpr-MPRA is currently being pursued with J.E. and M.K. among the authors.
Supplementary Figures 1–39 and Supplementary Notes 1–3 (ZIP 13193 kb)
Pilot activating and repressive coordinates. (XLSX 46 kb)
Scale-up motif analysis. (XLSX 543 kb)
Pilot sequences and count data. (ZIP 615 kb)
Pilot normalized data. (ZIP 88 kb)
Scale-up sequences and count data. (ZIP 25795 kb)
Sharpr-MPRA HepG2 and K562 scores. (ZIP 36598 kb)
Visualization of overlapping regions. (ZIP 23181 kb)
HepG2 and K562 activating and repressive visualizations. (ZIP 32394 kb)
Pair visualization of HepG2 and K562 big differences and values. (ZIP 105421 kb)
Listing of all regions tested (HTML and tab-delimited format). (ZIP 2106 kb)
Source code for the SHARPR software. (ZIP 1726 kb)
Cite this article
Ernst, J., Melnikov, A., Zhang, X. et al. Genome-scale high-resolution mapping of activating and repressive nucleotides in regulatory regions. Nat Biotechnol 34, 1180–1190 (2016). https://doi.org/10.1038/nbt.3678