Technical Report

An interactive environment for agile analysis and visualization of ChIP-sequencing data

Nature Structural & Molecular Biology volume 23, pages 349–357 (2016)

Abstract

To empower experimentalists with a means for fast and comprehensive chromatin immunoprecipitation sequencing (ChIP-seq) data analyses, we introduce an integrated computational environment, EaSeq. The software combines the exploratory power of genome browsers with an extensive set of interactive and user-friendly tools for genome-wide abstraction and visualization. It enables experimentalists to easily extract information and generate hypotheses from their own data and public genome-wide datasets. For demonstration purposes, we performed meta-analyses of public Polycomb ChIP-seq data and established a new screening approach to analyze more than 900 datasets from mouse embryonic stem cells for factors potentially associated with Polycomb recruitment. EaSeq, which is freely available and works on a standard personal computer, can substantially increase the throughput of many analysis workflows, facilitate transparency and reproducibility by automatically documenting and organizing analyses, and enable a broader group of scientists to gain insights from ChIP-seq data.
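The abstract does not say how EaSeq computes its genome-wide abstractions. To show what such an abstraction can look like, here is a minimal, generic sketch of one common ChIP-seq summary: an average signal profile, meaning the mean read count per bin in a window centered on a set of anchor sites such as transcription start sites or peak summits. This is not EaSeq's implementation. The function name `average_profile`, its parameters, and the input layout are hypothetical. The sketch assumes reads are already reduced to single genomic positions (for example, shifted to estimated fragment centers).

```python
from bisect import bisect_left
from typing import Dict, List, Sequence, Tuple


def average_profile(
    reads: Dict[str, Sequence[int]],
    anchors: Sequence[Tuple[str, int]],
    flank: int = 1000,
    bin_size: int = 100,
) -> List[float]:
    """Hypothetical helper: mean read count per bin in [center - flank, center + flank).

    reads   -- chromosome -> read positions (assumed already shifted to one point per read)
    anchors -- (chromosome, center) pairs, e.g. TSSs or peak summits
    """
    if (2 * flank) % bin_size:
        raise ValueError("2 * flank must be a multiple of bin_size")
    n_bins = 2 * flank // bin_size

    # Sort positions once per chromosome so each window can be located by binary search.
    sorted_reads = {chrom: sorted(pos) for chrom, pos in reads.items()}

    totals = [0.0] * n_bins
    for chrom, center in anchors:
        positions = sorted_reads.get(chrom, [])
        start, end = center - flank, center + flank
        # Reads with start <= pos < end fall inside this anchor's window.
        lo = bisect_left(positions, start)
        hi = bisect_left(positions, end)
        for pos in positions[lo:hi]:
            totals[(pos - start) // bin_size] += 1

    if not anchors:
        return totals
    # Average over anchors so profiles from different site sets are comparable.
    return [t / len(anchors) for t in totals]


if __name__ == "__main__":
    demo_reads = {"chr1": [100, 150, 950, 1050, 1999, 2000]}
    print(average_profile(demo_reads, [("chr1", 1000)], flank=1000, bin_size=500))
    # -> [2.0, 1.0, 1.0, 1.0]  (the read at 2000 lies outside the half-open window)
```

Averaging over anchors, rather than summing, keeps the profile on a per-site scale. If the per-anchor rows were kept instead of summed, the same loop would yield the matrix behind a heatmap.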


Accessions

Gene Expression Omnibus


Acknowledgements

We would like to thank J. Christensen, P. Cloos, N. Dietrich, N. Jungersen, J. Wegeberg and K. Helin for inspiring discussions, suggestions and proofreading; K. Isaksen for help with easeq.net; and T. Rasborg for usability consultancy. This work was supported by a grant from the Danish National Research Foundation (DNRF82) (K.H.).

Author information

Affiliations

  1. Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark.
     Mads Lerdrup, Jens Vilstrup Johansen, Shuchi Agrawal-Singh & Klaus Hansen
  2. Centre for Epigenetics, University of Copenhagen, Copenhagen, Denmark.
     Mads Lerdrup, Shuchi Agrawal-Singh & Klaus Hansen
  3. The Bioinformatics Centre, Department of Biology, University of Copenhagen, Copenhagen, Denmark.
     Jens Vilstrup Johansen


Contributions

M.L. contributed software design, algorithms, general programming and manuscript writing. J.V.J. contributed algorithms, benchmarking, validation and manuscript proofreading. S.A.-S. contributed testing, usability testing and manuscript and software proofreading. K.H. contributed software design, funding, usability testing and manuscript writing.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Mads Lerdrup or Klaus Hansen.


Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–6 and Supplementary Note

Excel files

  1. Supplementary Table 1

    List of accession numbers, references to primary publications, and metadata for all datasets used for main and supplementary figures except Figure 6e–g.

  2. Supplementary Table 2

    List of accession numbers, metadata and Z-scores for all datasets used for main Figure 6e–g.

Zip files

  1. Supplementary Software

    EaSeq

About this article


DOI

https://doi.org/10.1038/nsmb.3180
