PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls

Abstract

Chromatin immunoprecipitation (ChIP) followed by tag sequencing (ChIP-seq) using high-throughput next-generation instrumentation is fast replacing chromatin immunoprecipitation followed by genome tiling array analysis (ChIP-chip) as the preferred approach for mapping sites of transcription-factor binding and chromatin modification. Using two deeply sequenced data sets for human RNA polymerase II and STAT1, each with matching input-DNA controls, we describe a general scoring approach to address unique challenges in ChIP-seq data analysis. Our approach is based on the observation that sites of potential binding are strongly correlated with signal peaks in the control, likely revealing features of open chromatin. We develop a two-pass strategy called PeakSeq to compensate for this open-chromatin signal, which is revealed by including the controls. The first pass identifies putative binding sites and compensates for genomic variation in the 'mappability' of sequences. The second pass filters out sites not significantly enriched compared to the normalized control, computing precise enrichments and significances. Our scoring procedure enables us to optimize experimental design by estimating the depth of sequencing required for a desired level of coverage and demonstrating that more than two replicates provide only a marginal gain in information.
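The second pass described above (normalize the control, then keep only significantly enriched sites) can be illustrated with a minimal sketch. The abstract does not state which normalization or statistical test is used, so everything here is an assumption made for illustration: the control is scaled by a least-squares slope through the origin, significance is a one-sided binomial test with p = 0.5, and the names `scaling_factor`, `binomial_upper_tail`, `score_regions` and the cutoff `p_cutoff` are hypothetical. It is not the PeakSeq implementation.

```python
from math import comb


def scaling_factor(sample_counts, control_counts):
    """Least-squares slope through the origin, so that sample ~ alpha * control.

    Assumption: the bins passed in are background bins, not candidate sites.
    """
    num = sum(s * c for s, c in zip(sample_counts, control_counts))
    den = sum(c * c for c in control_counts)
    return num / den


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly (fine for small n)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def score_regions(regions, alpha, p_cutoff=0.05):
    """Keep candidate regions whose sample tag count exceeds the scaled control.

    regions: iterable of (name, sample_tags, control_tags) from a first pass.
    Returns a list of (name, enrichment, p_value) for regions that pass.
    """
    kept = []
    for name, sample, control in regions:
        scaled_control = round(alpha * control)
        n = sample + scaled_control
        # Under the null, a tag is equally likely to come from sample or control.
        p_value = binomial_upper_tail(sample, n) if n else 1.0
        # Floor the denominator at 1 tag to avoid division by zero.
        enrichment = sample / max(alpha * control, 1.0)
        if p_value <= p_cutoff:
            kept.append((name, enrichment, p_value))
    return kept


if __name__ == "__main__":
    # Toy numbers, invented for the example.
    alpha = scaling_factor([10, 20, 30], [5, 10, 15])
    candidates = [("site_a", 40, 5), ("site_b", 12, 6)]
    for name, enrichment, p_value in score_regions(candidates, alpha):
        print(name, round(enrichment, 2), p_value)
```

In this toy input, `site_b` has no more tags than the scaled control and is filtered out, while `site_a` is retained. A real analysis would also correct the p-values for multiple testing, which this sketch omits.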

Figure 1: ChIP-seq signal profile maps.
Figure 2: PeakSeq scoring procedure.
Figure 3: ChIP-seq target list scaling.
Figure 4: ChIP-seq versus ChIP-chip signal tracks and target binding sites for Pol II and STAT1.
Figure 5: Depth of sequencing and value of replicates.

Accession codes

Accessions

Gene Expression Omnibus

Acknowledgements

This work was supported by grants from the National Institutes of Health (NIH) and made use of the Yale University Life Sciences Computing Center (NIH grant RR19895). We acknowledge Mike Wilson's assistance with submission of data to GEO.

Author information

Contributions

J.R. conceived and developed the scoring methodology, analyzed the data presented in the paper and wrote the manuscript. G.E. generated the experimental data. R.K.A. assisted with the analysis in the paper and with editing the manuscript. Z.D.Z. was involved in the conceptualization of the scoring methodology. T.G. assisted in the coding of the PeakSeq scoring procedure. R.B. and N.C. developed the code for generating indexed mappability maps of a genome and assisted with analysis. M.S. helped conceive the scoring methodology and edit the manuscript. M.B.G. helped conceive the scoring methodology and supervised the analysis and writing of the manuscript.

Corresponding authors

Correspondence to Joel Rozowsky or Mark B Gerstein.

Supplementary information

Supplementary Text and Figures

Supplementary Figure 1, Supplementary Tables 1 and 2 and Supplementary Notes (PDF 717 kb)

About this article

Cite this article

Rozowsky, J., Euskirchen, G., Auerbach, R. et al. PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nat Biotechnol 27, 66–75 (2009). https://doi.org/10.1038/nbt.1518

  • DOI: https://doi.org/10.1038/nbt.1518
