Genome-scale high-resolution mapping of activating and repressive nucleotides in regulatory regions

Abstract

Massively parallel reporter assays (MPRAs) enable nucleotide-resolution dissection of transcriptional regulatory regions, such as enhancers, but only for a few regions at a time. Here we present a combined experimental and computational approach, Systematic high-resolution activation and repression profiling with reporter tiling using MPRA (Sharpr-MPRA), that allows high-resolution analysis of thousands of regions simultaneously. Sharpr-MPRA combines dense tiling of overlapping MPRA constructs with a probabilistic graphical model to recognize functional regulatory nucleotides, and to distinguish activating and repressive nucleotides, using their inferred contribution to reporter gene expression. We used Sharpr-MPRA to test 4.6 million nucleotides spanning 15,000 putative regulatory regions tiled at 5-nucleotide resolution in two human cell types. Our results recovered known cell-type-specific regulatory motifs and evolutionarily conserved nucleotides, and distinguished known activating and repressive motifs. Our results also showed that endogenous chromatin state and DNA accessibility are both predictive of regulatory function in reporter assays, identified retroviral elements with activating roles, and uncovered 'attenuator' motifs with repressive roles in active chromatin.
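
The idea of inferring per-position contributions from overlapping tiles can be illustrated with a minimal sketch. It is not the Sharpr-MPRA model: ridge regression stands in for the probabilistic graphical model, each tile's reporter signal is assumed to be the average activity of the 5-nucleotide bins it covers, and the region length (295), tile length (145), ridge penalty, threshold and all function names are hypothetical values chosen for illustration. Only the 5-nucleotide step comes from the text above.

```python
import numpy as np


def tile_starts(region_len, tile_len, step):
    """Start offsets of overlapping tiles that fit inside the region."""
    return list(range(0, region_len - tile_len + 1, step))


def design_matrix(region_len, tile_len, step):
    """Rows are tiles, columns are step-sized bins.

    Each row holds equal weights over the bins the tile covers, so a
    tile's modeled signal is the mean activity of those bins (an assumption).
    """
    starts = tile_starts(region_len, tile_len, step)
    n_bins = -(-region_len // step)  # ceiling division
    X = np.zeros((len(starts), n_bins))
    for i, s in enumerate(starts):
        first = s // step
        last = (s + tile_len - 1) // step
        X[i, first:last + 1] = 1.0
    return X / X.sum(axis=1, keepdims=True)


def infer_bin_activity(X, y, ridge=1.0):
    """Ridge estimate: argmin_a ||X a - y||^2 + ridge * ||a||^2."""
    n_bins = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(n_bins), X.T @ y)


def classify(activity, threshold):
    """Label bins by the sign of their inferred activity."""
    return np.where(activity > threshold, "activating",
                    np.where(activity < -threshold, "repressive", "neutral"))


# Simulated region: one activating and one repressive segment (made-up values).
rng = np.random.default_rng(0)
X = design_matrix(region_len=295, tile_len=145, step=5)
true_activity = np.zeros(X.shape[1])
true_activity[20:25] = 2.0    # hypothetical activating bins
true_activity[35:40] = -2.0   # hypothetical repressive bins
y = X @ true_activity + rng.normal(scale=0.01, size=X.shape[0])

inferred = infer_bin_activity(X, y, ridge=0.01)
labels = classify(inferred, threshold=0.1)
print(len(tile_starts(295, 145, 5)), "tiles;", X.shape[1], "bins")
```

Because there are fewer tiles than bins in this toy setup, the system is underdetermined and the penalty term decides how signal is shared among neighboring bins; the inferred labels should be read as a smoothed estimate, not as exact recovery of the simulated segments.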

Figure 1: Experimental design.
Figure 2: Tiling enhancer regions in pilot design revealed regulatory segments at 30-bp resolution.
Figure 3: Scale-up design permits dissection of regulatory regions at high resolution.
Figure 4: Comparison of Sharpr-MPRA with motif annotations.
Figure 5: Regulatory activity of ERV1 and LINE repeats.
Figure 6: Endogenous chromatin state is predictive of reporter activity.

Accession codes

Primary accessions

Gene Expression Omnibus

Acknowledgements

We thank P. Kheradpour and J.-P. Vert for useful discussions related to this work. This work was supported by US National Institutes of Health (NIH) grants R01ES024995, U01HG007912 and U01MH105578 (J.E.), R01HG006785 (T.S.M.), R01GM113708, U01HG007610, R01HG004037, U54HG006991 and U41HG007000 (M.K.), a US National Science Foundation CAREER Award #1254200, and an Alfred P. Sloan Fellowship (J.E.).

Author information

Contributions

J.E. and M.K. designed the sequences, developed the computational methods and analyzed the results. A.M., X.Z., L.W., P.R. and T.S.M. conducted the experimental work. T.S.M. oversaw the experimental work. J.E. and M.K. wrote the paper with substantial input from T.S.M.

Corresponding authors

Correspondence to Jason Ernst or Manolis Kellis.

Ethics declarations

Competing interests

The Broad Institute has filed patents (US20140200163, EP2705152) on the original MPRA technology with T.S.M., A.M., L.W. and X.Z. among the authors. Patent protection for Sharpr-MPRA is currently being pursued with J.E. and M.K. among the authors.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–39 and Supplementary Notes 1–3 (ZIP 13193 kb)

Supplementary Table 1

Pilot activating and repressive coordinates. (XLSX 46 kb)

Supplementary Table 2

Scale-up motif analysis. (XLSX 543 kb)

Supplementary Data 1

Pilot sequences and count data. (ZIP 615 kb)

Supplementary Data 2

Pilot normalized data. (ZIP 88 kb)

Supplementary Data 3

Scale-up sequences and count data. (ZIP 25795 kb)

Supplementary Data 4

Sharpr-MPRA HepG2 and K562 scores. (ZIP 36598 kb)

Supplementary Data 5

Visualization of overlapping regions. (ZIP 23181 kb)

Supplementary Data 6

HepG2 and K562 activating and repressive visualizations. (ZIP 32394 kb)

Supplementary Data 7

Pair visualization of HepG2 and K562 big differences and values. (ZIP 105421 kb)

Supplementary Data 8

Listing of all regions tested (HTML and tab-delimited formats). (ZIP 2106 kb)

Supplementary Source Code

Source code for the SHARPR software. (ZIP 1726 kb)

Cite this article

Ernst, J., Melnikov, A., Zhang, X. et al. Genome-scale high-resolution mapping of activating and repressive nucleotides in regulatory regions. Nat Biotechnol 34, 1180–1190 (2016). https://doi.org/10.1038/nbt.3678
