
  • Technical Report

An interactive environment for agile analysis and visualization of ChIP-sequencing data

Abstract

To empower experimentalists with a means for fast and comprehensive chromatin immunoprecipitation sequencing (ChIP-seq) data analyses, we introduce an integrated computational environment, EaSeq. The software combines the exploratory power of genome browsers with an extensive set of interactive and user-friendly tools for genome-wide abstraction and visualization. It enables experimentalists to easily extract information and generate hypotheses from their own data and public genome-wide datasets. For demonstration purposes, we performed meta-analyses of public Polycomb ChIP-seq data and established a new screening approach to analyze more than 900 datasets from mouse embryonic stem cells for factors potentially associated with Polycomb recruitment. EaSeq, which is freely available and works on a standard personal computer, can substantially increase the throughput of many analysis workflows, facilitate transparency and reproducibility by automatically documenting and organizing analyses, and enable a broader group of scientists to gain insights from ChIP-seq data.


Figure 1: EaSeq covers comprehensive ChIP-seq analysis workflows.
Figure 2: EaSeq facilitates analysis and exploration through a high level of visualization and interaction.
Figure 3: Overview of the versatile collection of tools for analysis and visualization in EaSeq, along with short descriptions of their main functions.
Figure 4: Examples of selected tools for data normalization and control.
Figure 5: EaSeq's peak-finding procedure generates peak sets of the same quality as those generated by widely used algorithms.
Figure 6: PcG enrichment at subtypes of CGIs and characterization of the chromatin environment at CGIs with differential PcG recruitment.
Figure 7: PcGs are more correlated to local umCpG and H2AUb than H3K27me3 levels within short ranges.
Figure 8: H3K27me3 tends to flank Suz12 peaks and CGIs to a greater extent than umCpG and H2AUb do.


Accession codes

Accessions

Gene Expression Omnibus


Acknowledgements

We would like to thank J. Christensen, P. Cloos, N. Dietrich, N. Jungersen, J. Wegeberg and K. Helin for inspiring discussions, suggestions and proofreading; K. Isaksen for help with easeq.net; and T. Rasborg for usability consultancy. This work was supported by a grant from the Danish National Research Foundation (DNRF82) (K.H.).

Author information

Contributions

M.L. contributed software design, algorithms, general programming and manuscript writing. J.V.J. contributed algorithms, benchmarking, validation and manuscript proofreading. S.A.-S. contributed testing, usability testing and manuscript and software proofreading. K.H. contributed software design, funding, usability testing and manuscript writing.

Corresponding authors

Correspondence to Mads Lerdrup or Klaus Hansen.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Regionsets in EaSeq can be ordered and generated from other types of data, and can contain parameters such as quantified levels of ChIP-seq signal.

(a) Sorting of regions by a graphical interface (middle). Left: H3K4me3 heatmap at 19,903 TSSs sorted by gene expression in CD4+ T cells, with the highest expression at the base. Right: data after sorting by quantified H3K4me3 levels (‘Sort’ tool). (b) Overview of how new regionsets are generated from datasets using the ‘Peaks’ tool (top), from genesets using the ‘Extract’ tool (middle), or from other regionsets using the ‘Controls’ tool (bottom). (c) New parameters are generated and added to an existing regionset using the ‘Quantify’ tool for quantitation of ChIP-seq signal at the regions. Left: examples of coordinates from the 19,903 genes in the regionset. Middle: example tracks generated by the ‘Quantify’ tool, showing levels of H3K4me3, H3K4me2, and RNA-PolII signal surrounding the start of the regions of interest. Signal within the regions was automatically colored orange, and signal outside the regions black. The blue square overlaid by the tool illustrates the quantified area on the X-axis and the quantified value on the Y-axis. Right: results from the quantitation of the four example regions. Each column corresponds to a new parameter added to the existing regionset.

Supplementary Figure 2 Side-by-side comparison of tools in EaSeq and frequently used command-line counterparts.

(a, b) Tracks from EaSeq (top) and the UCSC genome browser (bottom) showing high resemblance in the positions of individual reads from an H3K27me3 ChIP (SRR946804) mapped to mm9. Tracks show the Ecel1 locus (a) and a 1 kbp window of the Sgk3 locus (b). Each plot is unmodified output from EaSeq or the UCSC genome browser. (c, d) 2D-histograms showing the distribution of read counts at the +/-5 kbp surrounding mm9 CGIs counted using EaSeq's quantitation tool and Bedtools intersect -c (c) or HTSeq (d), together with the Pearson's correlation coefficient, r. (e) Pseudocolored 2D-histogram showing the distribution of read counts as in (d) in relation to the distance between the CGIs. Coloring reflects the average log10 distance to the nearest CGI (measured from center to center); red, green, and blue reflect average distances of approximately 10 kbp, 3 kbp, and 1 kbp, respectively. (f) 2D-histogram showing the distribution of read counts at the +/-5 kbp surrounding mm9 CGIs that are at least 10 kbp + one read length (150 bp) away from the nearest neighboring CGI. Reads were counted as in (d), and the Pearson's correlation coefficient, r, is shown. (g-k) 2D-histograms showing the quantified values of read counts at the +/-5 kbp surrounding mm9 CGIs counted from two H3K27me3 datasets from mESCs (SRR946804 and SRR946806) using EaSeq's quantitation tool and left unnormalized (g) or normalized using quantile normalization in R (h), quantile normalization in EaSeq (i), or LOESS normalization in EaSeq (k). (j) Relationship between the quantile normalization of SRR946804 done using R (X-axis) and EaSeq (Y-axis), together with the Pearson's correlation coefficient.
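Panels (c) to (k) rest on two generic operations: comparing two vectors of per-region read counts by Pearson's r, and quantile-normalizing two datasets against each other. The following is a minimal Python sketch of both, not EaSeq's or R's implementation. The function names `pearson_r` and `quantile_normalize` are hypothetical, and ties are resolved by input order, which may differ from how a given tool treats tied values.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def quantile_normalize(samples):
    """Quantile-normalize several equal-length lists of per-region values.

    Each value is replaced by the mean, across samples, of the values
    holding the same rank. Ties are broken by input order (an assumption
    made for this sketch).
    """
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Mean of the i-th smallest value across all samples.
    rank_means = [
        sum(s[i] for s in sorted_samples) / len(samples) for i in range(n)
    ]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # indices by ascending value
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy counts for four regions in two hypothetical datasets.
counts_a = [5, 2, 3, 4]
counts_b = [40, 10, 30, 20]
norm_a, norm_b = quantile_normalize([counts_a, counts_b])
```

After normalization both samples share the same set of values, so any remaining difference between them lies in how regions are ranked rather than in overall signal scale.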

Supplementary Figure 3 Side-by-side comparison of tools in EaSeq and frequently used command-line counterparts.

(a) Histograms showing the size distributions of 100 bp regions generated from larger CGIs using EaSeq’s homogenization tool (left) and Bedtools makewindows (right). (b) Illustration of how a single CGI (green) at the Zdhhc8 locus is subdivided into smaller 100 bp regions using EaSeq’s homogenization tool (red) and Bedtools makewindows (blue). Coordinates refer to mm9. (c) 2D-histograms showing the distribution of distances between CGIs and the nearest TSS identified using EaSeq's colocalization tool (Y-axis) and Bedtools closest (X-axis), together with the number of CGIs (n) visualized in each plot. The left plot shows all CGIs, the middle plot only those that do not overlap with a TSS, and the right plot those that overlap. The orange circle and arrow illustrate the location in the histogram of the three CGIs shown in (d). (d) Illustrations of individual CGIs (green) at three different loci where EaSeq's colocalization tool (red) and Bedtools closest (blue) reach divergent results when assigning a TSS to each CGI. Coordinates refer to mm9. (e) Heatmaps of CGIs sorted by overlap with gene bodies and by size, and colored according to overlap with particular sets of regions identified using EaSeq tools or their Bedtools intersect counterparts. Upper panel: the parts of the CGIs overlapping with the regions computed by EaSeq or Bedtools were colored according to the color illustrated for each heatmap, whereas non-overlapping CGIs or parts thereof were colored black. Lower panel: the subset of regions in each heatmap was derived from the CGI population using the EaSeq or Bedtools algorithm described at the origin of each arrow, and colored according to the colors used for that subset of regions in the upper panel. The parts of those subsets that were derived using EaSeq and overlapped with regions derived using the Bedtools counterpart, or with a Bedtools algorithm that generates a mutually exclusive set of regions, were highlighted in the color of the Bedtools set of regions, and vice versa, illustrating the extent of overlap between EaSeq and Bedtools counterparts.
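Two of the operations compared in this figure, subdividing a region into fixed-size windows and finding the distance to the nearest TSS, can be sketched in a few lines of Python. This is an illustration only. The helper names `make_windows` and `distance_to_nearest` are hypothetical, coordinates are assumed to be half-open, a shorter final window is kept when the region length is not a multiple of the window size, and distances are measured from a single reference position rather than from region edges. Neither EaSeq's nor Bedtools' exact edge handling is reproduced here.

```python
from bisect import bisect_left


def make_windows(start, end, size=100):
    """Split the half-open interval [start, end) into consecutive windows.

    The last window is shorter when (end - start) is not a multiple of size.
    """
    return [(s, min(s + size, end)) for s in range(start, end, size)]


def distance_to_nearest(position, sorted_sites):
    """Absolute distance from position to the closest entry of sorted_sites.

    sorted_sites must be a non-empty, ascending list of positions on the
    same chromosome (for example TSS coordinates).
    """
    i = bisect_left(sorted_sites, position)
    # Only the neighbors immediately left and right of the insertion
    # point can be the closest site.
    candidates = sorted_sites[max(i - 1, 0):i + 1]
    return min(abs(position - site) for site in candidates)


# Toy example: one 250 bp region and three hypothetical TSS positions.
windows = make_windows(0, 250, size=100)
tss_positions = [100, 500, 900]
nearest = distance_to_nearest(450, tss_positions)
```

Differences such as those shown in panel (d) typically arise from exactly the choices labeled as assumptions above: which point of a region is used as reference and how overlapping features are scored.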

Supplementary Figure 4 EaSeq’s ‘adaptive local thresholding’ (ALT) peak-finding procedure generates peak sets of the same quality as those generated by widely used peak-finding procedures.

(a) Number of peaks found for the transcription factors NRSF and GABP. (b, c) Graphs depicting the fraction of a set of PCR validated true positives (b) or identified motifs (c) that were identified as a function of the ranking of peaks found by ChIP-seq for the transcription factor GABP. (d) Box plot illustrating the positional accuracy of GABP peaks found by EaSeq compared to those identified in the same dataset by CisGenome and the two versions of MACS. Y-axis values represent the genome-wide distances from the apex of each peak to the nearest GABP motif (if any). (e) Graph depicting the fraction of GABP-motif containing-peaks in relation to peak rank.
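The rank-based curves in panels (b), (c) and (e) share one idea: walk down the ranked peak list and track what fraction of a reference set (validated sites or motifs) has been recovered so far. A minimal Python sketch of that calculation follows. The name `recovery_curve` and the tuple layouts are hypothetical, a reference site counts as recovered when it falls inside a half-open peak interval, and the reference set is assumed to be non-empty.

```python
def recovery_curve(ranked_peaks, reference_sites):
    """Cumulative fraction of reference sites recovered versus peak rank.

    ranked_peaks: list of (chrom, start, end), best-ranked peak first.
    reference_sites: non-empty iterable of (chrom, position).
    Returns one fraction per peak, in rank order.
    """
    remaining = set(reference_sites)
    total = len(remaining)
    found = 0
    curve = []
    for chrom, start, end in ranked_peaks:
        hits = {
            site for site in remaining
            if site[0] == chrom and start <= site[1] < end
        }
        remaining -= hits  # each reference site is counted only once
        found += len(hits)
        curve.append(found / total)
    return curve


# Toy example with three ranked peaks and three reference sites.
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50)]
sites = [("chr1", 150), ("chr2", 10), ("chr1", 900)]
curve = recovery_curve(peaks, sites)
```

A peak caller whose curve rises faster places true positives higher in its ranking, which is the property such panels are designed to compare.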

Supplementary Figure 5 Heat maps show elevated levels of PcGs at CGIs near active genes, but the signals of some negative controls are also elevated at CGIs in general.

(a) Heat maps of PcG levels at +/-2.5 kbp around the center of all mouse CGIs (arrowheads = center) show an elevated vPRC1 signal at H3K4me3-enriched CGIs. The order of the CGIs was derived from nearest neighbor chain hierarchical clustering of PcGs, H3K27me3, H2AUb, input, and IgG control signals within +/-2.5 kbp of the CGI centers using the ‘ClusterP’ tool in EaSeq, which is based on a previously described algorithm (Juan, Les Cahiers de l'Analyse des Données 7, 6, 1982; Benzécri, Les Cahiers de l'Analyse des Données 7, 9, 1982). H3K4me3 and H3K36me3 marks were included to visualize the transcriptional status of the loci surrounding the CGIs. * vPRC1 signal at active CGIs. (b) Heat maps of several negative controls within 5 kbp of CGIs (arrowheads) frequently show marked enrichment at CGIs. CGIs were sorted according to the K4me3-K27me3 balance used in Fig. 6a-c. Fr/M/kbps: DNA fragments per million reads per kilobase pair. Gray bars mark whether the control samples are of the input or IgG type.
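The Fr/M/kbps unit defined above scales a raw fragment count by sequencing depth and by region length. As a small worked example in Python, assuming the plain reading of the definition (the function name `fragments_per_million_per_kbp` is hypothetical, and EaSeq may apply additional corrections not described here):

```python
def fragments_per_million_per_kbp(fragment_count, total_reads, region_length_bp):
    """Fragments per million reads per kilobase pair.

    fragment_count: fragments counted within the region.
    total_reads: total number of reads in the dataset.
    region_length_bp: length of the quantified region in base pairs.
    """
    per_million = fragment_count / (total_reads / 1_000_000)
    return per_million / (region_length_bp / 1_000)


# 50 fragments in a 5 kbp window from a dataset of 10 million reads.
signal = fragments_per_million_per_kbp(50, 10_000_000, 5_000)
```

With these toy numbers the signal is 50 / 10 / 5 = 1.0 Fr/M/kbp, which makes datasets of different depth and windows of different width directly comparable.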

Supplementary Figure 6 Overview of the methodology used to calculate correlation of factors affecting PcGs.

(a) Heatmaps showing the Pearson’s correlation coefficients (r-values) for the positive control test of six Ezh2 and ten Suz12 datasets from mESCs. Data were quantified and analyzed in 200 bp windows derived from peaks from the very same Ezh2 dataset. The average correlation from all 60 combinations is shown below and in Fig. 7a. SRR-containing names refer to SRA accession numbers (http://www.ncbi.nlm.nih.gov/sra) and can be found in Supplementary Table 1. (b) Heatmaps showing the r-values from each combination in the positive control test of six Ezh2 and ten Suz12 datasets in relation to the size of the window used for quantifying Suz12 signal. The regions used for the analyses were derived from the very same Ezh2 dataset as analyzed, by extending each peak to the closest size divisible by 200 and subdividing it into 200 bp fragments. Then, the Ezh2 signal was quantified within these 200 bp regions, and the windows used for quantifying Suz12 ranged from 0.2 to 2 kbp in steps of 0.2 kbp, 2 to 20 kbp in steps of 2 kbp, and 20 to 200 kbp in steps of 20 kbp. The r-values were calculated for each combination of window sizes. (c) Example of how the domain sizes were derived from the range of r-values from two combinations of Ezh2 and Suz12 datasets. For each range of r-values, the domain size is set to the window size resulting in the highest r-value. (d) Visualization of the range of domain sizes at a single locus within the Ptprd gene together with tracks of selected ChIP-seq signals. The triangle illustrates the extent of the domains in relation to the color coding in the heatmaps in c, e, and f, and the gray area in the middle illustrates the extent of the 200 bp window used for quantitating PcGs (Ezh2 in the case of a-f). (e) Heatmap showing the domain sizes as derived in d for each combination of Ezh2 and Suz12 datasets. (f) Heatmap showing the domain sizes in e ranked and presented in one dimension together with the median domain size (bottom). (g, h) Test of the robustness of the calculation of domain sizes for the combination of datasets exemplified in e. The sizes of the datasets were reduced randomly to the target sizes on the axes in steps of 1M reads (Ezh2) or 2M reads (Suz12), then correlated and analyzed as in e. Calculated domain sizes were largely independent of dataset size, and therefore of the signal strength of Ezh2. For Suz12, the calculated domain size was only increased when dataset sizes were ≤ 4M reads. PcGs are tightly correlated to H2AUb levels, whereas H3K27me3 association is weaker and long-range. (i) Heatmaps showing the average of Pearson’s correlation coefficients from combinations of PcG members and chromatin features that potentially affect PcG recruitment. The setup is similar to that of Fig. 6a, but the window size used for analyzing the features varies from 200 bp to 200 kbp in steps of 200 bp for sizes up to 2 kbp, in steps of 2 kbp for sizes from 2 kbp to 20 kbp, and in steps of 20 kbp for sizes above that. Analysis was only done in areas scored as peaks for each PcG dataset.
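The window-size scan in (b) and the domain-size rule in (c) can be expressed compactly. The Python sketch below builds the three-tier list of window sizes stated in the legend and picks the window size with the highest r-value. The function names `window_sizes_bp` and `domain_size` are hypothetical, the r-values in the example are made-up toy numbers rather than results from the paper, and ties are resolved in favor of the smallest window (an assumption, since the legend does not say how ties are handled).

```python
def window_sizes_bp():
    """Window sizes in bp: 0.2-2 kbp in 0.2 kbp steps, 2-20 kbp in 2 kbp
    steps, and 20-200 kbp in 20 kbp steps, without duplicate boundaries."""
    sizes = list(range(200, 2_001, 200))
    sizes += list(range(4_000, 20_001, 2_000))
    sizes += list(range(40_000, 200_001, 20_000))
    return sizes


def domain_size(r_by_window):
    """Return the window size (bp) that gives the highest r-value.

    r_by_window: dict mapping window size in bp to Pearson's r.
    Windows are visited in ascending size, so the smallest window
    wins when several share the maximum.
    """
    return max(sorted(r_by_window), key=r_by_window.get)


sizes = window_sizes_bp()
# Toy r-values for a few window sizes; not data from the paper.
toy_r = {200: 0.30, 2_000: 0.70, 20_000: 0.50, 200_000: 0.10}
best = domain_size(toy_r)
```

In a full analysis, each entry of the dictionary would come from correlating the 200 bp Ezh2 quantification with the Suz12 signal quantified in a window of that size, repeated for every pair of datasets.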

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–6 and Supplementary Note (PDF 1256 kb)

Supplementary Table 1

List of accession numbers, reference to primary publication, and metadata for all datasets used for main and supplementary figures except Figure 6e–g. (XLSX 23 kb)

Supplementary Table 2

List containing accession numbers, metadata, and Z-scores for all datasets used for main Figure 6e–g (XLSX 6266 kb)

Supplementary Software

EaSeq (ZIP 39300 kb)


About this article


Cite this article

Lerdrup, M., Johansen, J., Agrawal-Singh, S. et al. An interactive environment for agile analysis and visualization of ChIP-sequencing data. Nat Struct Mol Biol 23, 349–357 (2016). https://doi.org/10.1038/nsmb.3180


  • DOI: https://doi.org/10.1038/nsmb.3180
