Article

Genome-scale high-resolution mapping of activating and repressive nucleotides in regulatory regions

Nature Biotechnology volume 34, pages 1180–1190 (2016)

Abstract

Massively parallel reporter assays (MPRAs) enable nucleotide-resolution dissection of transcriptional regulatory regions, such as enhancers, but only a few regions at a time. Here we present a combined experimental and computational approach, Systematic high-resolution activation and repression profiling with reporter tiling using MPRA (Sharpr-MPRA), that allows high-resolution analysis of thousands of regions simultaneously. Sharpr-MPRA combines dense tiling of overlapping MPRA constructs with a probabilistic graphical model to recognize functional regulatory nucleotides, and to distinguish activating and repressive nucleotides, using their inferred contribution to reporter gene expression. We used Sharpr-MPRA to test 4.6 million nucleotides spanning 15,000 putative regulatory regions tiled at 5-nucleotide resolution in two human cell types. Our results recovered known cell-type-specific regulatory motifs and evolutionarily conserved nucleotides, and distinguished known activating and repressive motifs. Our results also showed that endogenous chromatin state and DNA accessibility are both predictive of regulatory function in reporter assays, identified retroviral elements with activating roles, and uncovered 'attenuator' motifs with repressive roles in active chromatin.
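To make the tiling idea concrete, the following Python sketch lays overlapping tiles across a region at a fixed step and then assigns each nucleotide the average activity of the tiles that cover it. The function names, the region length of 295 nucleotides, and the tile length of 145 nucleotides are illustrative assumptions that are not stated in the abstract; only the 5-nucleotide step comes from the text. The averaging step is a deliberately crude stand-in and is not the probabilistic graphical model used by Sharpr-MPRA.

```python
from typing import List, Sequence


def tile_starts(region_len: int, tile_len: int, step: int) -> List[int]:
    """Return start offsets of tiles of length tile_len placed every `step` nt."""
    if step <= 0 or tile_len <= 0 or tile_len > region_len:
        raise ValueError("need 0 < tile_len <= region_len and step > 0")
    return list(range(0, region_len - tile_len + 1, step))


def naive_nucleotide_scores(
    region_len: int,
    tile_len: int,
    starts: Sequence[int],
    tile_activity: Sequence[float],
) -> List[float]:
    """Average the activity of all tiles covering each position.

    This is a simple illustration of how overlapping tiles can be combined
    into a per-nucleotide signal; it is NOT the Sharpr-MPRA inference model.
    """
    totals = [0.0] * region_len
    counts = [0] * region_len
    for start, activity in zip(starts, tile_activity):
        for pos in range(start, start + tile_len):
            totals[pos] += activity
            counts[pos] += 1
    # Positions not covered by any tile get a score of 0.0.
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


if __name__ == "__main__":
    # Hypothetical design: 295-nt region, 145-nt tiles, 5-nt step.
    starts = tile_starts(region_len=295, tile_len=145, step=5)
    print(len(starts), "tiles per region")

    # Toy example: three tiles with made-up reporter activities.
    toy = naive_nucleotide_scores(20, 10, [0, 5, 10], [1.0, 0.0, -1.0])
    print(toy)
```

In the toy example, positions covered by a single tile inherit that tile's activity, while positions covered by two tiles receive their mean, which shows how a shift between adjacent tiles localizes a change in signal to the few nucleotides by which they differ.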

Accessions

Primary accessions

Gene Expression Omnibus

Acknowledgements

We thank P. Kheradpour and J.-P. Vert for useful discussions related to this work. This work was supported by US National Institutes of Health (NIH) grants R01ES024995, U01HG007912 and U01MH105578 (J.E.), R01HG006785 (T.S.M.), R01GM113708, U01HG007610, R01HG004037, U54HG006991 and U41HG007000 (M.K.), a US National Science Foundation CAREER Award #1254200, and an Alfred P. Sloan Fellowship (J.E.).

Author information

Affiliations

  1. Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, California, USA.

    • Jason Ernst
  2. Computer Science Department, University of California, Los Angeles, Los Angeles, California, USA.

    • Jason Ernst
  3. Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research at University of California, Los Angeles, Los Angeles, California, USA.

    • Jason Ernst
  4. Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, California, USA.

    • Jason Ernst
  5. Molecular Biology Institute, University of California, Los Angeles, Los Angeles, California, USA.

    • Jason Ernst
  6. Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, Massachusetts, USA.

    • Alexandre Melnikov
    • Xiaolan Zhang
    • Li Wang
    • Peter Rogov
    • Tarjei S Mikkelsen
    • Manolis Kellis
  7. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    • Manolis Kellis

Contributions

J.E. and M.K. designed the sequences, developed the computational methods and analyzed the results. A.M., X.Z., L.W., P.R. and T.S.M. conducted the experimental work. T.S.M. oversaw the experimental work. J.E. and M.K. wrote the paper with substantial input from T.S.M.

Competing interests

The Broad Institute has filed patents (US20140200163, EP2705152) on the original MPRA technology with T.S.M., A.M., L.W. and X.Z. among the authors. Patent protection for Sharpr-MPRA is currently being pursued with J.E. and M.K. among the authors.

Corresponding authors

Correspondence to Jason Ernst or Manolis Kellis.

Supplementary information

Zip files

  1. Supplementary Text and Figures

    Supplementary Figures 1–39 and Supplementary Notes 1–3

  2. Supplementary Data 1

    Pilot sequences and count data.

  3. Supplementary Data 2

    Pilot normalized data.

  4. Supplementary Data 3

    Scale-up sequences and count data.

  5. Supplementary Data 4

    Sharpr-MPRA HepG2 and K562 scores.

  6. Supplementary Data 5

    Visualization of overlapping regions.

  7. Supplementary Data 6

    HepG2 and K562 activating and repressive visualizations.

  8. Supplementary Data 7

    Pair visualization of HepG2 and K562 big differences and values.

  9. Supplementary Data 8

    Listing of all regions tested (HTML and tab-delimited format).

  10. Supplementary Source Code

    Source code for the SHARPR software.

Excel files

  1. Supplementary Table 1

    Pilot activating and repressive coordinates.

  2. Supplementary Table 2

    Scale-up motif analysis.

About this article

DOI

https://doi.org/10.1038/nbt.3678
