A computational pipeline for comparative ChIP-seq analyses

Abstract

Chromatin immunoprecipitation (ChIP) followed by deep sequencing can now easily be performed across different conditions, time points and even species. However, analyzing such data is not trivial and standard methods are as yet unavailable. Here we present a protocol to systematically compare ChIP-sequencing (ChIP-seq) data across conditions. We first describe technical guidelines for data preprocessing, read mapping, read-density visualization and peak calling. We then describe methods and provide code with specific examples to compare different data sets across species and across conditions, including a threshold-free approach to measure global similarity, a strategy to assess the binary conservation of binding events and measurements for quantitative changes of binding. We discuss how differences in binding can be related to gene functions, gene expression and sequence changes. Once established, this protocol should take about 2 d to complete and be generally applicable to many data sets.
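
As a rough illustration of what a threshold-free measure of global similarity could look like, the sketch below bins read positions into fixed-size windows and computes the Pearson correlation between two samples. This is an assumption made for illustration, not necessarily the measure used in the protocol; the function names `bin_counts` and `pearson`, the window size and the toy read positions are all hypothetical.

```python
from math import sqrt


def bin_counts(read_starts, chrom_length, bin_size):
    """Count reads per fixed-size window along one chromosome.

    read_starts: 0-based read start positions (hypothetical input format).
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length count vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


# Toy example: two samples on a 100-bp "chromosome" with 10-bp windows.
sample_a = bin_counts([5, 15, 16, 95], chrom_length=100, bin_size=10)
sample_b = bin_counts([7, 12, 18, 19, 91], chrom_length=100, bin_size=10)
similarity = pearson(sample_a, sample_b)
```

Because no peak threshold is applied, every window contributes to the score, so the result depends on the chosen window size and on how the samples are normalized.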

Figure 1: Computational pipeline for comparative analyses of ChIP-seq data.
Figure 2: Choice of sensitive thresholds when comparing ChIP-seq samples.
Figure 3: Assessing choice of thresholds and its impact on conservation estimates.
Figure 4: Binding conservation at different peak ranks.
Figure 5: Sensitive estimation of conservation in vertebrates.
Figure 6: Comparative ChIP-seq data of the C. elegans transcription factor PHA4/FOXA across conditions.
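
The idea of assessing binding conservation at different peak ranks (Figure 4) can be sketched as follows, assuming that peaks from both samples are already in the same coordinate system and are represented as BED-style half-open intervals. The helper names `overlaps` and `conservation_by_rank` and the toy peaks are hypothetical, and the protocol's own definition of a conserved peak may differ.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def conservation_by_rank(ranked_peaks, other_peaks, rank_cutoffs):
    """Fraction of the top-N peaks of one sample that overlap any peak of another.

    ranked_peaks: peaks of sample A, sorted from strongest to weakest.
    other_peaks:  peaks of sample B in the same coordinates.
    rank_cutoffs: values of N to evaluate.
    """
    result = {}
    for n in rank_cutoffs:
        top = ranked_peaks[:n]
        conserved = sum(
            1 for p in top if any(overlaps(p, q) for q in other_peaks)
        )
        result[n] = conserved / len(top) if top else 0.0
    return result


# Toy peaks (hypothetical coordinates).
peaks_a = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
peaks_b = [("chr1", 150, 250), ("chr2", 300, 400)]
fractions = conservation_by_rank(peaks_a, peaks_b, rank_cutoffs=[1, 2, 3])
```

The nested scan is quadratic in the number of peaks, which is acceptable for a sketch; for genome-scale peak lists an interval tree or a sorted sweep would be more appropriate.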

Acknowledgements

We thank M. Jaritz; I. Tamir; A. Sommer; O. Yanez-Cuna; D. Gerlach (Institute of Molecular Pathology); J. Steinmann (Institute of Molecular Biotechnology (IMBA)); and S. Meier and C. Seidel (Stowers Institute for Medical Research) for discussions, help and advice. A.F.B. was supported by the Austrian Ministry for Science and Research through the Genome Research in Austria (GEN-AU) Bioinformatics Integration Network III. J.Z. is a Pew scholar. A.S. is supported by a European Research Council (ERC) Starting Grant from the European Community's Seventh Framework Programme (FP7/2007-2013)/ERC grant agreement no. 242922. Basic research at the IMP is supported by Boehringer Ingelheim.

Author information

Contributions

A.F.B. and A.S. established the analysis pipeline. Q.H. and J.Z. performed the comparative ChIP-seq experiments. A.F.B., A.S. and J.Z. wrote the manuscript.

Corresponding authors

Correspondence to Julia Zeitlinger or Alexander Stark.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Tables 1-3

Table S1: Performance of coordinate translation in vertebrates
Table S2: Read mapping sensitivity for Drosophila species
Table S3: Read translation sensitivity for Drosophila species
(DOC 60 kb)

Supplementary Data

Correlation (TXT 0 kb)

About this article

Cite this article

Bardet, A., He, Q., Zeitlinger, J. et al. A computational pipeline for comparative ChIP-seq analyses. Nat Protoc 7, 45–61 (2012). https://doi.org/10.1038/nprot.2011.420

  • DOI: https://doi.org/10.1038/nprot.2011.420
