An interactive environment for agile analysis and visualization of ChIP-sequencing data

Lerdrup, Mads; Johansen, Jens Vilstrup; Agrawal-Singh, Shuchi; Hansen, Klaus

doi:10.1038/nsmb.3180

Technical Report
Published: 29 February 2016

Mads Lerdrup^1,2, Jens Vilstrup Johansen^1,3, Shuchi Agrawal-Singh^1,2 & Klaus Hansen^1,2

Nature Structural & Molecular Biology volume 23, pages 349–357 (2016)

Abstract

To empower experimentalists with a means for fast and comprehensive chromatin immunoprecipitation sequencing (ChIP-seq) data analyses, we introduce an integrated computational environment, EaSeq. The software combines the exploratory power of genome browsers with an extensive set of interactive and user-friendly tools for genome-wide abstraction and visualization. It enables experimentalists to easily extract information and generate hypotheses from their own data and public genome-wide datasets. For demonstration purposes, we performed meta-analyses of public Polycomb ChIP-seq data and established a new screening approach to analyze more than 900 datasets from mouse embryonic stem cells for factors potentially associated with Polycomb recruitment. EaSeq, which is freely available and works on a standard personal computer, can substantially increase the throughput of many analysis workflows, facilitate transparency and reproducibility by automatically documenting and organizing analyses, and enable a broader group of scientists to gain insights from ChIP-seq data.

**Figure 1: EaSeq covers comprehensive ChIP-seq analysis workflows.**

**Figure 2: EaSeq facilitates analysis and exploration through a high level of visualization and interaction.**

**Figure 3: Overview of the versatile collection of tools for analysis and visualization in EaSeq, along with short descriptions of their main functions.**

**Figure 4: Examples of selected tools for data normalization and control.**

**Figure 5: EaSeq's peak-finding procedure generates peak sets of the same quality as those generated by widely used algorithms.**

**Figure 6: PcG enrichment at subtypes of CGIs and characterization of the chromatin environment at CGIs with differential PcG recruitment.**

**Figure 7: PcGs are more correlated to local umCpG and H2AUb than H3K27me3 levels within short ranges.**

**Figure 8: H3K27me3 tends to flank Suz12 peaks and CGIs to a greater extent than umCpG and H2AUb do.**

Accession codes

Gene Expression Omnibus: GSE1133

Acknowledgements

We would like to thank J. Christensen, P. Cloos, N. Dietrich, N. Jungersen, J. Wegeberg and K. Helin for inspiring discussions, suggestions and proofreading; K. Isaksen for help with easeq.net; and T. Rasborg for usability consultancy. This work was supported by a grant from the Danish National Research Foundation (DNRF82) (K.H.).

Author information

Authors and Affiliations

Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark
Mads Lerdrup, Jens Vilstrup Johansen, Shuchi Agrawal-Singh & Klaus Hansen
Centre for Epigenetics, University of Copenhagen, Copenhagen, Denmark
Mads Lerdrup, Shuchi Agrawal-Singh & Klaus Hansen
Department of Biology, The Bioinformatics Centre, University of Copenhagen, Copenhagen, Denmark
Jens Vilstrup Johansen

Contributions

M.L. contributed software design, algorithms, general programming and manuscript writing. J.V.J. contributed algorithms, benchmarking, validation and manuscript proofreading. S.A.-S. contributed testing, usability testing and manuscript and software proofreading. K.H. contributed software design, funding, usability testing and manuscript writing.

Corresponding authors

Correspondence to Mads Lerdrup or Klaus Hansen.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Regionsets in EaSeq can be ordered and generated from other types of data, and can contain parameters such as quantified levels of ChIP-seq signal.

(a) Sorting of regions by a graphical interface (middle). Left: H3K4me3 heatmap at 19,903 TSSs sorted by gene expression in CD4+ T-cells, with the highest expression at the base. Right: data after sorting by quantified H3K4me3 levels (‘Sort’ tool). (b) Overview of how new regionsets are generated from datasets using the ‘Peaks’ tool (top), from genesets using the ‘Extract’ tool (middle), or from other regionsets using the ‘Controls’ tool (bottom). (c) New parameters are generated and added to an existing regionset using the ‘Quantify’ tool for quantitation of ChIP-seq signal at the regions. Left: examples of coordinates from the 19,903 genes in the regionset. Middle: example tracks generated by the ‘Quantify’ tool, showing levels of H3K4me3, H3K4me2, and RNA-PolII signal surrounding the start of the regions of interest. Signal within the regions was automatically colored orange, and signal outside the regions black. The blue square overlaid by the tool illustrates the quantified area on the X-axis and the quantified value on the Y-axis. Right: results from the quantitation of the four example regions. Each column corresponds to a new parameter added to the existing regionset.

Supplementary Figure 2 Side-by-side comparison of tools in EaSeq and frequently used command-line counterparts.

(a, b) Tracks from EaSeq (top) and the UCSC genome browser (bottom) showing high resemblance in the positions of individual reads from an H3K27me3 ChIP (SRR946804) mapped to mm9. Tracks show the Ecel1 locus (a) and a 1 kbp window of the Sgk3 locus (b). Each plot is unmodified output from EaSeq or the UCSC genome browser. (c, d) 2D-histograms showing the distribution of read counts at the +/-5 kbp surrounding mm9 CGIs, counted using EaSeq’s quantitation tool and Bedtools intersect -c (c) or HTSeq (d), as well as the Pearson’s correlation coefficient, r. (e) Pseudocolored 2D-histogram showing the distribution of read counts as in (d) in relation to the distance between the CGIs. Coloring reflects the average log10 distance to the nearest CGI (measured from center to center); red, green, and blue reflect average distances of approximately 10 kbp, 3 kbp, and 1 kbp, respectively. (f) 2D-histogram showing the distribution of read counts at the +/-5 kbp surrounding mm9 CGIs that are at least 10 kbp + one read length (150 bp) from the nearest neighboring CGI. Reads were counted as in (d); the Pearson’s correlation coefficient, r, is also shown. (g-k) 2D-histograms showing the quantified read counts at the +/-5 kbp surrounding mm9 CGIs, counted from two H3K27me3 datasets from mESCs (SRR946804 and SRR946806) using EaSeq’s quantitation tool and left unnormalized (g) or normalized using quantile normalization in R (h), quantile normalization in EaSeq (i), or LOESS normalization in EaSeq (k). (j) Relationship between the quantile normalization of SRR946804 done using R (X-axis) and EaSeq (Y-axis), as well as the Pearson’s correlation coefficient.
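
The two operations compared in panels c to j, Pearson's correlation between paired read counts and quantile normalization of two datasets, can be illustrated with a minimal Python sketch. This is not EaSeq's or R's implementation: the function names are hypothetical, and ties are broken by input order, which is a simplification.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def quantile_normalize(columns):
    """Quantile-normalize several equally long lists of per-region values.

    Each value is replaced by the mean, across samples, of the values that
    share its rank. Ties are broken by input order (an assumption of this
    sketch, not necessarily what EaSeq or R does).
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the k-th smallest value across all samples.
    rank_means = [sum(col[k] for col in sorted_cols) / len(columns)
                  for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical read counts at four regions in two datasets.
sample_a = [5, 2, 3, 4]
sample_b = [40, 10, 30, 20]
norm_a, norm_b = quantile_normalize([sample_a, sample_b])
```

After normalization both samples contain the same set of values, so differences between them reflect only the ranking of regions, which is the property that makes the R and EaSeq results in panel j directly comparable.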

Supplementary Figure 3 Side-by-side comparison of tools in EaSeq and frequently used command-line counterparts.

(a) Histograms showing the size distributions of 100 bp regions generated from larger CGIs using EaSeq’s homogenization tool (left) and Bedtools makewindows (right). (b) Illustration of how a single CGI (green) at the Zdhhc8 locus is subdivided into smaller 100 bp regions using EaSeq’s homogenization tool (red) and Bedtools makewindows (blue). Coordinates refer to mm9. (c) 2D-histograms showing the distribution of distances between CGIs and the nearest TSS identified using EaSeq’s colocalization tool (Y-axis) and Bedtools closest (X-axis), as well as the number of CGIs (n) visualized in each plot. Left plot shows all CGIs, middle plot only those that do not overlap with a TSS, and right plot those that overlap. The orange circle and arrow illustrate the location in the histogram of the three CGIs shown in (d). (d) Illustrations of individual CGIs (green) at three different loci where EaSeq’s colocalization tool (red) and Bedtools closest (blue) reach divergent results when assigning a TSS to each CGI. Coordinates refer to mm9. (e) Heatmaps of CGIs sorted by overlap with gene bodies and by size, and colored according to overlap with particular sets of regions identified using EaSeq tools or their Bedtools intersect counterparts. Upper panel: the parts of the CGIs overlapping with the regions computed by EaSeq or Bedtools were colored according to the color illustrated for each heatmap, whereas non-overlapping CGIs or parts thereof were colored black. Lower panel: the subsets of regions in each heatmap were derived from the CGI population using the EaSeq or Bedtools algorithm described at the origin of each arrow, and colored according to the colors used for that subset of regions in the upper panel. The parts of those subsets that were derived using EaSeq and overlapped with regions derived using the Bedtools counterpart, or a Bedtools algorithm that generates a mutually exclusive set of regions, were highlighted in the color of the Bedtools set of regions (and vice versa), illustrating the extent of overlap between EaSeq and Bedtools counterparts.
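
Two of the operations compared in this figure, subdividing a region into fixed-size windows and assigning the nearest TSS to a region, can be sketched in a few lines of Python. The sketch makes its own assumptions and does not reproduce either tool: coordinates are half-open, the final window is truncated at the region end, and a TSS inside the region has distance 0. Differences in exactly these conventions are one plausible reason why two tools can assign different TSSs, as in panel d. The function names are hypothetical.

```python
def make_windows(start, end, size=100):
    """Split the half-open interval [start, end) into consecutive windows.

    The last window is truncated if the interval length is not a multiple
    of `size` (a convention chosen for this sketch).
    """
    return [(s, min(s + size, end)) for s in range(start, end, size)]


def nearest_tss(region, tss_positions):
    """Return (tss, distance) for the TSS closest to a half-open region.

    Distance is 0 when the TSS lies inside the region; otherwise it is the
    number of bases between the TSS and the nearest base of the region.
    """
    start, end = region

    def distance(pos):
        if start <= pos < end:
            return 0
        if pos < start:
            return start - pos
        return pos - (end - 1)

    best = min(tss_positions, key=distance)
    return best, distance(best)


# Hypothetical coordinates, not taken from mm9.
windows = make_windows(0, 250)
overlapping = nearest_tss((1000, 1500), [200, 1600, 1200])
flanking = nearest_tss((1000, 1500), [200, 1600])
```

Measuring from region edges, as here, and measuring from region centers give different answers for wide regions, so the convention should be stated whenever distances from two tools are compared.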

Supplementary Figure 4 EaSeq’s ‘adaptive local thresholding’ (ALT) peak-finding procedure generates peak sets of the same quality as those generated by widely used peak-finding procedures.

(a) Number of peaks found for the transcription factors NRSF and GABP. (b, c) Graphs depicting the fraction of a set of PCR-validated true positives (b) or identified motifs (c) that were identified as a function of the ranking of peaks found by ChIP-seq for the transcription factor GABP. (d) Box plot illustrating the positional accuracy of GABP peaks found by EaSeq compared to those identified in the same dataset by CisGenome and the two versions of MACS. Y-axis values represent the genome-wide distances from the apex of each peak to the nearest GABP motif (if any). (e) Graph depicting the fraction of GABP motif-containing peaks in relation to peak rank.
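
The kind of curve shown in panel b, the fraction of validated sites recovered as progressively lower-ranked peaks are included, can be computed as in the following minimal Python sketch. It assumes half-open peak intervals and point-like validated sites; the function name and the example coordinates are hypothetical and are not taken from the benchmark data.

```python
def recovery_curve(ranked_peaks, validated_sites):
    """Fraction of validated sites covered by the top-k peaks, for each k.

    ranked_peaks: list of (chrom, start, end), best-ranked first, half-open.
    validated_sites: list of (chrom, position).
    Returns a list whose element k-1 is the fraction of validated sites
    overlapped by at least one of the k best-ranked peaks.
    """
    found = set()
    curve = []
    for chrom, start, end in ranked_peaks:
        for i, (site_chrom, pos) in enumerate(validated_sites):
            if site_chrom == chrom and start <= pos < end:
                found.add(i)
        curve.append(len(found) / len(validated_sites))
    return curve


# Hypothetical peaks and validated sites.
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
sites = [("chr1", 150), ("chr2", 120), ("chr1", 900), ("chr1", 550)]
curve = recovery_curve(peaks, sites)
```

A peak finder that ranks true binding sites highly produces a curve that rises steeply at low ranks, which is how such plots allow peak sets of different sizes to be compared.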

Supplementary Figure 5 Heat maps show elevated levels of PcGs at CGIs near active genes, but the signals of some negative controls are also elevated at CGIs in general.

(a) Heat maps of PcG levels at +/-2.5 kbp near the center of all mouse CGIs (arrowheads = center) show an elevated vPRC1 signal at H3K4me3-enriched CGIs. The order of the CGIs was derived from nearest neighbor chain hierarchical clustering of PcGs, H3K27me3, H2AUb, input, and IgG control signals within +/-2.5 kbp of the CGI centers using the ‘ClusterP’ tool in EaSeq, which is based on the algorithm described in (Juan, Les Cahiers de l'Analyse des Données 7, 6, 1982; Benzécri, Les Cahiers de l'Analyse des Données 7, 9, 1982). H3K4me3 and H3K36me3 marks were included to visualize the transcriptional status of the loci surrounding the CGIs. * vPRC1 signal at active CGIs. (b) Heat maps of several negative controls within 5 kbp of CGIs (arrowheads) frequently show marked enrichment at CGIs. CGIs were sorted according to the K4me3-K27me3 balance used in Fig. 6a-c. Fr/M/kbps: DNA fragments per million reads per kilobase pair. Grey bars mark whether the control samples are of the input or IgG type.
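
The unit used in these heat maps, DNA fragments per million reads per kilobase pair, amounts to dividing a raw fragment count by the dataset size in millions of reads and by the region size in kbp. The following Python sketch spells out that arithmetic; the function name and the numbers are hypothetical, and EaSeq may apply further steps not described in the legend.

```python
def fragments_per_million_per_kbp(fragment_count, total_reads, region_bp):
    """Normalize a raw fragment count to dataset size and region length.

    fragment_count: fragments counted within the region.
    total_reads: total number of reads in the dataset.
    region_bp: length of the quantified region in base pairs.
    """
    per_million = fragment_count / (total_reads / 1_000_000)
    return per_million / (region_bp / 1_000)


# Hypothetical example: 50 fragments in a 5 kbp window of a 20M-read dataset.
signal = fragments_per_million_per_kbp(50, 20_000_000, 5_000)
```

Expressing signal this way makes datasets of different sequencing depth, and regions of different size, comparable on one color scale.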

Supplementary Figure 6 Overview of the methodology used to calculate correlation of factors affecting PcGs.

(a) Heatmaps showing the Pearson’s correlation coefficients (r-values) for the positive control test of six Ezh2 and ten Suz12 datasets from mESCs. Data were quantified and analyzed in 200 bp windows derived from peaks from the very same Ezh2 dataset. The average correlation from all 60 combinations is shown below and in Fig. 7a. SRR-containing names refer to SRA accession numbers (http://www.ncbi.nlm.nih.gov/sra) and can be found in Supplementary Table 1. (b) Heatmaps showing the r-values from each combination in the positive control test of six Ezh2 and ten Suz12 datasets in relation to the size of the window used for quantifying Suz12 signal. The regions used for the analyses were derived from the very same Ezh2 dataset as analyzed, by extending each peak to the closest size divisible by 200 and subdividing it into 200 bp fragments. Then, the Ezh2 signal was quantified within these 200 bp regions, and the windows used for quantifying Suz12 ranged from 0.2 to 2 kbp in steps of 0.2 kbp, 2 to 20 kbp in steps of 2 kbp, and 20 to 200 kbp in steps of 20 kbp. The r-values were calculated for each combination of window sizes. (c) Example of how the domain sizes were derived from the range of r-values from two combinations of Ezh2 and Suz12 datasets. For each range of r-values, the domain size is set to the window size resulting in the highest r-value. (d) Visualization of the range of domain sizes at a single locus within the Ptprd gene together with tracks of selected ChIP-seq signals. The triangle illustrates the extent of the domains in relation to the color coding in the heatmaps in c, e, and f, and the gray area in the middle illustrates the extent of the 200 bp window used for quantitating PcGs (Ezh2 in the case of a–f). (e) Heatmap showing the domain sizes as derived in d for each combination of Ezh2 and Suz12 datasets. (f) Heatmap showing the domain sizes in e ranked and presented in one dimension together with the median domain size (bottom). (g, h) Test of the robustness of the calculation of domain sizes for the combination of datasets exemplified in e. The sizes of the datasets were reduced randomly to the target sizes on the axes in steps of 1M reads (Ezh2) or 2M reads (Suz12), correlated and analyzed as in e. Calculated domain sizes were largely independent of dataset size, and therefore of the signal strength of Ezh2. For Suz12, the calculated domain size increased only when dataset sizes were ≤ 4M reads. PcGs are tightly correlated with H2AUb levels, whereas the H3K27me3 association is weaker and long-range. (i) Heatmaps showing the average of Pearson’s correlation coefficients from combinations of PcG members and chromatin features that potentially affect PcG recruitment. The setup is similar to that of Fig. 6a, but the window size used for analyzing the features varies from 200 bp to 200 kbp in steps of 200 bp for sizes up to 2 kbp, in steps of 2 kbp for sizes from 2 kbp to 20 kbp, and in steps of 20 kbp for sizes above that. Analysis was only done in areas scored as peaks for each PcG dataset.
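
The procedure in panels b and c can be summarized as three steps: build the series of quantification window sizes, cut each peak into 200 bp regions, and take as domain size the window size that gives the highest r-value. The Python sketch below illustrates these steps under stated assumptions: boundary sizes (2 kbp and 20 kbp) are counted once, peaks are extended symmetrically around their center when rounding up to a multiple of 200 bp, and ties in r are resolved in favor of the smallest window. None of these details is specified in the legend, and all function names are hypothetical.

```python
import math


def window_sizes_bp():
    """Window sizes in bp: 0.2-2 kbp, 2-20 kbp and 20-200 kbp series."""
    small = [200 * i for i in range(1, 11)]       # 200 bp ... 2 kbp
    medium = [2_000 * i for i in range(2, 11)]    # 4 kbp ... 20 kbp
    large = [20_000 * i for i in range(2, 11)]    # 40 kbp ... 200 kbp
    return small + medium + large


def split_peak(start, end, size=200):
    """Extend a peak to a multiple of `size` and cut it into windows.

    The extension is distributed around the peak center (assumption).
    """
    length = end - start
    target = math.ceil(length / size) * size
    new_start = start - (target - length) // 2
    return [(s, s + size) for s in range(new_start, new_start + target, size)]


def domain_size(r_by_window):
    """Window size (bp) giving the highest correlation coefficient.

    r_by_window maps window size in bp to the r-value obtained with it.
    """
    return max(sorted(r_by_window), key=lambda size: r_by_window[size])


# Hypothetical peak and r-values, for illustration only.
sizes = window_sizes_bp()
regions = split_peak(1000, 1450)
best = domain_size({200: 0.41, 400: 0.55, 600: 0.52})
```

With these conventions the series contains 28 window sizes, so each pair of datasets yields 28 r-values from which a single domain size is read off.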

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–6 and Supplementary Note (PDF 1256 kb)

Supplementary Table 1

List of accession numbers, reference to primary publication, and metadata for all datasets used for main and supplementary figures except Figure 6e–g. (XLSX 23 kb)

Supplementary Table 2

List of accession numbers, metadata, and Z-scores for all datasets used for main Figure 6e–g. (XLSX 6266 kb)

Supplementary Software

EaSeq (ZIP 39300 kb)

About this article

Cite this article

Lerdrup, M., Johansen, J., Agrawal-Singh, S. et al. An interactive environment for agile analysis and visualization of ChIP-sequencing data. Nat Struct Mol Biol 23, 349–357 (2016). https://doi.org/10.1038/nsmb.3180

Received: 12 June 2015
Accepted: 29 January 2016
Published: 29 February 2016
Issue Date: April 2016
DOI: https://doi.org/10.1038/nsmb.3180
