A complete workflow for the analysis of full-size ChIP-seq (and similar) data sets using peak-motifs

Abstract

This protocol explains how to use the online integrated pipeline 'peak-motifs' (http://rsat.ulb.ac.be/rsat/) to predict motifs and binding sites in full-size peak sets obtained by chromatin immunoprecipitation–sequencing (ChIP-seq) or related technologies. The workflow combines four time- and memory-efficient motif discovery algorithms to extract significant motifs from the sequences. Discovered motifs are compared with databases of known motifs to identify potentially bound transcription factors. Sequences are scanned to predict transcription factor binding sites and analyze their enrichment and positional distribution relative to peak centers. Peaks and binding sites are exported as BED tracks that can be uploaded into the University of California Santa Cruz (UCSC) genome browser for visualization in the genomic context. This protocol is illustrated with the analysis of a set of 6,000 peaks (8 Mb in total) bound by the Drosophila transcription factor Krüppel. The complete workflow is achieved in about 25 min of computational time on the Regulatory Sequence Analysis Tools (RSAT) Web server. This protocol can be followed in about 1 h.
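The last two steps described above (positional distribution of sites relative to peak centers, and export as BED tracks) can be illustrated with a minimal sketch. This is not the peak-motifs implementation or its output format: the `Site` record, the function names and the 50-bp bin width are hypothetical, and the sketch assumes that site coordinates have already been converted to 0-based, half-open genomic coordinates as used by BED.

```python
from collections import Counter
from typing import NamedTuple


class Site(NamedTuple):
    """Hypothetical record for one predicted binding site."""
    chrom: str
    start: int    # 0-based, inclusive (BED convention)
    end: int      # 0-based, exclusive (BED convention)
    name: str     # motif label
    score: float  # scan score
    strand: str   # "+" or "-"


def to_bed(sites, track_name="predicted_sites"):
    """Format sites as a BED6 track preceded by a UCSC track line."""
    lines = [f'track name="{track_name}" description="{track_name}"']
    for s in sorted(sites, key=lambda s: (s.chrom, s.start)):
        # The BED score column is an integer in 0..1000; clamp the rounded score.
        bed_score = max(0, min(1000, round(s.score)))
        lines.append(
            f"{s.chrom}\t{s.start}\t{s.end}\t{s.name}\t{bed_score}\t{s.strand}"
        )
    return "\n".join(lines) + "\n"


def center_offsets(peak, sites):
    """Offsets of site midpoints from the peak center, for sites inside the peak."""
    chrom, p_start, p_end = peak
    center = (p_start + p_end) // 2
    return [
        (s.start + s.end) // 2 - center
        for s in sites
        if s.chrom == chrom and s.start >= p_start and s.end <= p_end
    ]


def histogram(offsets, bin_width=50):
    """Count offsets per bin; each key is the lower bound of its bin."""
    return Counter((o // bin_width) * bin_width for o in offsets)


if __name__ == "__main__":
    peak = ("chr2L", 1000, 2000)  # made-up coordinates for illustration
    sites = [
        Site("chr2L", 1490, 1500, "Kr", 7.2, "+"),
        Site("chr2L", 1600, 1610, "Kr", 5.9, "-"),
        Site("chr3R", 1500, 1510, "Kr", 6.1, "+"),
    ]
    print(to_bed(sites), end="")
    print(histogram(center_offsets(peak, sites)))
```

A strong concentration of counts in the bins nearest to offset 0 would suggest that a motif is enriched around peak centers; the text returned by `to_bed` can be saved to a file and loaded as a custom track.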

Figure 1
Figure 2: Screenshot of the peak-motifs web form.
Figure 3: Input sequence treatment (top) and motif discovery (bottom) options.
Figure 4: Options for motif comparisons (top) and predicted sites visualization (bottom).
Figure 5: Sequence lengths and composition.
Figure 6: Dinucleotide composition and derived background models.
Figure 7: Reference motifs.
Figure 8: Discovered motifs grouped by algorithm.
Figure 9: Discovered motifs with motif comparisons.
Figure 10: Motif comparisons.
Figure 11: Predicted sites visualized in their genomic contexts on the UCSC genome browser.
Figure 12: Motif discovery approaches.

Accession codes

Accessions

Gene Expression Omnibus

References

  1. Robertson, G. et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat. Methods 4, 651–657 (2007).

  2. Johnson, D.S., Mortazavi, A., Myers, R.M. & Wold, B. Genome-wide mapping of in vivo protein-DNA interactions. Science 316, 1497–1502 (2007).

  3. Pepke, S., Wold, B. & Mortazavi, A. Computation for ChIP-seq and RNA-seq studies. Nat. Methods 6, S22–S32 (2009).

  4. Boeva, V. et al. De novo motif identification improves the accuracy of predicting transcription factor binding sites in ChIP-Seq data analysis. Nucleic Acids Res. 38, e126 (2010).

  5. Machanick, P. & Bailey, T.L. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27, 1696 (2011).

  6. Bailey, T.L. DREME: Motif discovery in transcription factor ChIP-seq data. Bioinformatics 27, 1653–1659 (2011).

  7. Rusk, N. Focus on next-generation sequencing data analysis. Nat. Methods 6, S1 (2009).

  8. McPherson, J.D. Next-generation gap. Nat. Methods 6, S2–S5 (2009).

  9. Thomas-Chollier, M. et al. RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets. Nucleic Acids Res. 40, e31 (2012).

  10. Salmon-Divon, M., Dvinge, H., Tammoja, K. & Bertone, P. PeakAnalyzer: genome-wide annotation of chromatin binding and modification loci. BMC Bioinformatics 11, 415 (2010).

  11. Portales-Casamar, E. et al. JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids Res. 38, D105–D110 (2010).

  12. Wingender, E. The TRANSFAC project as an example of framework technology that supports the analysis of genomic regulation. Brief Bioinform. 9, 326–332 (2008).

  13. Gama-Castro, S. et al. RegulonDB version 7.0: transcriptional regulation of Escherichia coli K-12 integrated within genetic sensory response units (Gensor Units). Nucleic Acids Res. 39, D98–D105 (2011).

  14. Medina-Rivera, A. et al. Theoretical and empirical quality assessment of transcription factor-binding motifs. Nucleic Acids Res. 39, 808–824 (2011).

  15. Chen, X. et al. Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell 133, 1106–1117 (2008).

  16. Cline, M.S. et al. Integration of biological networks and gene expression data using Cytoscape. Nat. Protoc. 2, 2366–2382 (2007).

  17. Fujita, P.A. et al. The UCSC Genome Browser database: update 2011. Nucleic Acids Res. 39, D876–D882 (2011).

  18. Fullwood, M.J., Wei, C.L., Liu, E.T. & Ruan, Y. Next-generation DNA sequencing of paired-end tags (PET) for transcriptome and genome analyses. Genome Res. 19, 521–532 (2009).

  19. Lee, T.I. et al. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298, 799–804 (2002).

  20. Sanford, J.R. et al. Splicing factor SFRS1 recognizes a functionally diverse landscape of RNA transcripts. Genome Res. 19, 381–394 (2009).

  21. van Helden, J., del Olmo, M. & Perez-Ortin, J.E. Statistical analysis of yeast genomic downstream sequences reveals putative polyadenylation signals. Nucleic Acids Res. 28, 1000–1010 (2000).

  22. Sand, O., Thomas-Chollier, M., Vervisch, E. & van Helden, J. Analyzing multiple data sets by interconnecting RSAT programs via SOAP Web services: an example with ChIP-chip data. Nat. Protoc. 3, 1604–1615 (2008).

  23. van Helden, J., Andre, B. & Collado-Vides, J. Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J. Mol. Biol. 281, 827–842 (1998).

  24. van Helden, J., Rios, A.F. & Collado-Vides, J. Discovering regulatory elements in non-coding sequences by analysis of spaced dyads. Nucleic Acids Res. 28, 1808–1818 (2000).

  25. Thomas-Chollier, M. et al. RSAT 2011: regulatory sequence analysis tools. Nucleic Acids Res. 39, W86–W91 (2011).

  26. Kulakovskiy, I.V., Boeva, V.A., Favorov, A.V. & Makeev, V.J. Deep and wide digging for binding motifs in ChIP-Seq data. Bioinformatics 26, 2622–2623 (2010).

  27. Agius, P., Arvey, A., Chang, W., Noble, W.S. & Leslie, C. High resolution models of transcription factor-DNA affinities improve in vitro and in vivo binding predictions. PLoS Comput. Biol. 6, e1000916 (2010).

  28. Mercier, E. et al. An integrated pipeline for the genome-wide analysis of transcription factor binding sites from ChIP-Seq. PLoS ONE 6, e16432 (2011).

  29. Kuttippurathu, L. et al. CompleteMOTIFs: DNA motif discovery platform for transcription factor binding experiments. Bioinformatics 27, 715–717 (2010).

  30. van Heeringen, S.J. & Veenstra, G.J. GimmeMotifs: a de novo motif prediction pipeline for ChIP-sequencing experiments. Bioinformatics 27, 270–271 (2011).

  31. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).

  32. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).

  33. Sand, O., Turatsinze, J.V. & van Helden, J. Evaluating the prediction of cis-acting regulatory elements in genome sequences. in Modern Genome Annotation: The BioSapiens Network (eds. Frishman, D. & Valencia, A.) (Springer, 2008).

  34. Bradley, R.K. et al. Binding site turnover produces pervasive quantitative changes in transcription factor binding between closely related Drosophila species. PLoS Biol. 8, e1000343 (2010).

  35. Barrett, T. et al. NCBI GEO: archive for functional genomics data sets—10 years on. Nucleic Acids Res. 39, D1005–D1010 (2011).

  36. Goecks, J., Nekrutenko, A. & Taylor, J. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 11, R86 (2010).

  37. Valouev, A. et al. Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data. Nat. Methods 5, 829–834 (2008).

  38. Bergman, C.M., Carlson, J.W. & Celniker, S.E. Drosophila DNase I footprint database: a systematic genome annotation of transcription factor binding sites in the fruitfly, Drosophila melanogaster. Bioinformatics 21, 1747–1749 (2005).

  39. Flicek, P. et al. Ensembl 2011. Nucleic Acids Res. 39, D800–D806 (2011).

  40. Harrison, M.M., Li, X.Y., Kaplan, T., Botchan, M.R. & Eisen, M.B. Zelda binding in the early Drosophila melanogaster embryo marks regions subsequently activated at the maternal-to-zygotic transition. PLoS Genet. 7, e1002266 (2011).

  41. Kanodia, J.S. et al. Pattern formation by graded and uniform signals in the early Drosophila embryo. Biophys. J. 102, 427–433 (2012).

  42. Tsurumi, A. et al. STAT is an essential activator of the zygotic genome in the early Drosophila embryo. PLoS Genet. 7, e1002086 (2011).

  43. Blow, M.J. et al. ChIP-Seq identification of weakly conserved heart enhancers. Nat. Genet. 42, 806–810 (2010).

  44. Zhu, L.J. et al. FlyFactorSurvey: a database of Drosophila transcription factor binding specificities determined using the bacterial one-hybrid system. Nucleic Acids Res. 39, D111–D117 (2011).

  45. Defrance, M., Janky, R., Sand, O. & van Helden, J. Using RSAT oligo-analysis and dyad-analysis tools to discover regulatory signals in nucleic sequences. Nat. Protoc. 3, 1589–1603 (2008).

  46. Turatsinze, J.V., Thomas-Chollier, M., Defrance, M. & van Helden, J. Using RSAT to scan genome sequences for transcription factor binding sites and cis-regulatory modules. Nat. Protoc. 3, 1578–1588 (2008).

Acknowledgements

This work was supported by the Alexander von Humboldt foundation to M.T.-C.; the Agence Nationale de Recherche (ANR) partner of the ERASysBio+ initiative supported under the EU ERA-NET Plus scheme in FP7 to C.H.; ANR Young Researchers Grant 'CardiHox' to C.H.; the Belgian Program on Interuniversity Attraction Poles, initiated by the Belgian Federal Science Policy Office (project P6/25 (BioMaGNet)); EU-funded Cooperation in Science and Technology (COST) action (BM1006 'SEQAHEAD—Next-Generation Sequencing Data Analysis Network'); FP7 MICROME Collaborative Project ('Microbial genomics and bio-informatics', contract number 222886-2). We acknowledge the colleagues who helped to install and maintain the RSAT Web servers: R. Leplae (ULB, Belgium), R. Zayas-Lagunas (UNAM, Mexico), E. Bongcam-Rudloff (Uppsala, Sweden), F.-X. Théodule (Aix Marseille Université, France), P. Vincens (Ecole Normale Supérieure, France) and F. Joubert (Pretoria, South Africa).

Author information

Authors and Affiliations

Authors

Contributions

J.v.H., M.T.-C. and M.D. initiated and developed the peak-motifs software tool. E.D., C.H. and D.T. contributed to improving the tool and analyzed the study case for this protocol. All authors edited the manuscript.

Corresponding authors

Correspondence to Morgane Thomas-Chollier or Jacques van Helden.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Table 1

Comparison of software tools used for analyzing motifs in ChIP-seq peak sequences. This is an updated version of Table 1 from the original peak-motifs publication (ref. 9), summarizing the tasks, algorithms and usability properties so that users can compare the different software options. Adapted from Morgane Thomas-Chollier, Carl Herrmann, Matthieu Defrance, Olivier Sand, Denis Thieffry, Jacques van Helden, RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets, Nucleic Acids Research, 2012, 40(4), by permission of Oxford University Press. (PDF 808 kb)

About this article

Cite this article

Thomas-Chollier, M., Darbo, E., Herrmann, C. et al. A complete workflow for the analysis of full-size ChIP-seq (and similar) data sets using peak-motifs. Nat Protoc 7, 1551–1568 (2012). https://doi.org/10.1038/nprot.2012.088

  • DOI: https://doi.org/10.1038/nprot.2012.088
