  • Analysis

Design and analysis of ChIP-seq experiments for DNA-binding proteins

Abstract

Recent progress in massively parallel sequencing platforms has enabled genome-wide characterization of DNA-associated proteins using the combination of chromatin immunoprecipitation and sequencing (ChIP-seq). Although a variety of methods exist for analysis of the established alternative ChIP microarray (ChIP-chip), few approaches have been described for processing ChIP-seq data. To fill this gap, we propose an analysis pipeline specifically designed to detect protein-binding positions with high accuracy. Using previously reported data sets for three transcription factors, we illustrate methods for improving tag alignment and correcting for background signals. We compare the sensitivity and spatial precision of three peak detection algorithms with published methods, demonstrating gains in spatial precision when an asymmetric distribution of tags on positive and negative strands is considered. We also analyze the relationship between the depth of sequencing and characteristics of the detected binding positions, and provide a method for estimating the sequencing depth necessary for a desired coverage of protein binding sites.
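
The strand asymmetry mentioned in the abstract can be illustrated with a strand cross-correlation profile: the correlation between positive-strand and negative-strand tag densities as one strand is shifted relative to the other. The following is a minimal sketch on toy data, not the authors' implementation; the function names, the per-base count representation, and the toy binding sites are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def strand_cross_correlation(plus, minus, max_shift):
    """Correlate plus-strand counts at x with minus-strand counts at x + d.

    plus, minus: hypothetical per-base tag counts on each strand (equal length).
    Returns a dict mapping each shift d in 0..max_shift to its correlation.
    """
    n = len(plus)
    return {d: pearson(plus[: n - d], minus[d:]) for d in range(max_shift + 1)}


def best_shift(profile):
    """Shift with the highest cross-correlation."""
    return max(profile, key=profile.get)


# Toy data (assumed): two binding sites, with plus-strand tags piling up
# 15 bp upstream and minus-strand tags 15 bp downstream of each site.
length = 200
plus = [0] * length
minus = [0] * length
for site in (50, 120):
    plus[site - 15] += 5
    plus[site - 14] += 3
    minus[site + 15] += 5
    minus[site + 16] += 3

profile = strand_cross_correlation(plus, minus, max_shift=60)
peak_shift = best_shift(profile)
print(peak_shift)
```

In this toy setup the profile peaks at the shift equal to the distance between the plus-strand and minus-strand tag clusters. On real data the peak is broader, and its position and magnitude depend on fragment size and tag quality.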

Figure 1: Protein-binding detection from ChIP-seq data.
Figure 2: Selecting informative tag classes based on the change in strand cross-correlation magnitude.
Figure 3: Examples of anomalies in background tag distributions.
Figure 4: Binding position detection methods and their relative sensitivity.
Figure 5: Accuracy of determined binding positions.
Figure 6: Analysis of sequencing depth.

Acknowledgements

We would like to thank Dustin Schones and Keji Zhao for providing raw data and detailed descriptions for the CTCF data set, and Ali Mortazavi and Barbara Wold for providing sequence tag data for NRSF binding. This work was supported by grants from the National Institutes of Health to P.J.P. (U01HG004258, R01GM082798, UL1RR024920).

Author information

Corresponding authors

Correspondence to Peter V Kharchenko or Peter J Park.

About this article

Cite this article

Kharchenko, P., Tolstorukov, M. & Park, P. Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat Biotechnol 26, 1351–1359 (2008). https://doi.org/10.1038/nbt.1508

  • DOI: https://doi.org/10.1038/nbt.1508
