An integrated software system for analyzing ChIP-chip and ChIP-seq data

Abstract

We present CisGenome, a software system for analyzing genome-wide chromatin immunoprecipitation (ChIP) data. CisGenome is designed to meet all basic needs of ChIP data analyses, including visualization, data normalization, peak detection, false discovery rate computation, gene-peak association, and sequence and motif analysis. In addition to implementing previously published ChIP–microarray (ChIP-chip) analysis methods, the software contains statistical methods designed specifically for ChIP sequencing (ChIP-seq) data obtained by coupling ChIP with massively parallel sequencing. The modular design of CisGenome supports both interactive analyses through a graphical user interface and customized batch-mode computation for advanced data mining. A built-in browser allows visualization of array images, signals, gene structure, conservation, and DNA sequence and motif information. We demonstrate the use of these tools by a comparative analysis of ChIP-chip and ChIP-seq data for the transcription factor NRSF/REST, a study of ChIP-seq analysis with or without a negative control sample, and an analysis of a new motif in Nanog- and Sox2-binding regions.
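The abstract lists peak detection and false discovery rate computation for ChIP-seq data with a negative control, but does not say how they are computed. The following is a minimal, hypothetical sketch of one generic approach, not CisGenome's actual algorithm: reads are counted in fixed-width windows, each window's ChIP count is tested against a binomial null conditioned on the window's total (ChIP plus control) read count, with the null probability assumed to equal the ChIP library's share of all reads, and Benjamini-Hochberg q-values control the FDR. The function names (`window_counts`, `binom_upper_tail`, `benjamini_hochberg`, `call_peaks`) and the `width` and `alpha` parameters are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def window_counts(positions: Sequence[int], chrom_len: int, width: int) -> List[int]:
    """Count read start positions falling in consecutive fixed-width windows."""
    counts = [0] * ((chrom_len + width - 1) // width)
    for pos in positions:
        if 0 <= pos < chrom_len:  # ignore reads mapped off the chromosome
            counts[pos // width] += 1
    return counts


def binom_upper_tail(x: int, n: int, p: float) -> float:
    """P(X >= x) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(x, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals


def call_peaks(chip: Sequence[int], control: Sequence[int], chrom_len: int,
               width: int = 100, alpha: float = 0.05) -> List[Tuple[int, int, int, int, float]]:
    """Return (start, end, chip_count, control_count, q_value) for enriched windows."""
    c = window_counts(chip, chrom_len, width)
    k = window_counts(control, chrom_len, width)
    n_chip, n_ctrl = sum(c), sum(k)
    if n_chip == 0 or n_ctrl == 0:
        raise ValueError("both ChIP and control need reads on this chromosome")
    # Assumed null: a read in any window comes from the ChIP library with
    # probability equal to the ChIP share of all reads (library-size ratio).
    p0 = n_chip / (n_chip + n_ctrl)
    tested = [(w, c[w], c[w] + k[w]) for w in range(len(c)) if c[w] + k[w] > 0]
    pvals = [binom_upper_tail(x, n, p0) for _, x, n in tested]
    qvals = benjamini_hochberg(pvals)
    return [(w * width, min((w + 1) * width, chrom_len), x, n - x, q)
            for (w, x, n), q in zip(tested, qvals) if q <= alpha]


if __name__ == "__main__":
    # Toy data: uniform background in both samples plus 40 extra ChIP reads in one window.
    control = [w * 100 + j * 10 for w in range(10) for j in range(5)]
    chip = [w * 100 + j * 10 + 5 for w in range(10) for j in range(5)] + list(range(300, 340))
    print(call_peaks(chip, control, chrom_len=1000))
```

On this toy input only the 300-400 window (45 ChIP reads versus 5 control reads) passes the 5% FDR cutoff. Conditioning on the window total makes the test insensitive to local biases that affect ChIP and control equally; without a control sample, a one-sample background model would be needed instead.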

Figure 1: The basic framework of CisGenome.
Figure 2: ChIP-seq data processing.
Figure 3: Comparisons between NRSF ChIP-seq and ChIP-chip data.
Figure 4: Analysis of a novel motif in Sox2 and Nanog binding regions.

Accession codes

Gene Expression Omnibus

Acknowledgements

We thank W. Li for assistance with analyzing the ChIP-chip spike-in data. This research was supported by National Institutes of Health grant HG003903 (to W.H.W.) and the National Human Genome Research Institute's ENCODE project (to R.M.M.). H. Ji is partially supported by the Johns Hopkins Bloomberg School of Public Health Richard L. Gelb Cancer Research Fund.

Author information

Contributions

H. Ji conceived the study, developed the CisGenome GUI and data analysis algorithms, carried out data analyses and drafted the manuscript. H. Jiang developed the CisGenome browser. W.M. participated in algorithm development and carried out data analyses. D.S.J. and R.M.M. generated NRSF ChIP-chip data. W.H.W. conceived the study and drafted the manuscript. All authors read and revised the manuscript.

Corresponding author

Correspondence to Wing H Wong.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–17, Supplementary Tables 1–15, Supplementary Notes, Supplementary Methods, Supplementary Data (PDF 2354 kb)

About this article

Cite this article

Ji, H., Jiang, H., Ma, W. et al. An integrated software system for analyzing ChIP-chip and ChIP-seq data. Nat Biotechnol 26, 1293–1300 (2008). https://doi.org/10.1038/nbt.1505