PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls

Rozowsky, Joel; Euskirchen, Ghia; Auerbach, Raymond K; Zhang, Zhengdong D; Gibson, Theodore; Bjornson, Robert; Carriero, Nicholas; Snyder, Michael; Gerstein, Mark B

doi:10.1038/nbt.1518

Article
Published: 04 January 2009

Joel Rozowsky¹,
Ghia Euskirchen²,
Raymond K Auerbach³,
Zhengdong D Zhang¹,
Theodore Gibson¹,
Robert Bjornson⁴,
Nicholas Carriero⁴,
Michael Snyder¹,² &
Mark B Gerstein¹,³,⁴

Nature Biotechnology volume 27, pages 66–75 (2009)

Abstract

Chromatin immunoprecipitation (ChIP) followed by tag sequencing (ChIP-seq) using high-throughput next-generation instrumentation is fast replacing chromatin immunoprecipitation followed by genome tiling array analysis (ChIP-chip) as the preferred approach for mapping sites of transcription-factor binding and chromatin modification. Using two deeply sequenced data sets for human RNA polymerase II and STAT1, each with matching input-DNA controls, we describe a general scoring approach to address unique challenges in ChIP-seq data analysis. Our approach is based on the observation that sites of potential binding are strongly correlated with signal peaks in the control, likely revealing features of open chromatin. We develop a two-pass strategy called PeakSeq to compensate for this signal caused by open chromatin, as revealed by inclusion of the controls. The first pass identifies putative binding sites and compensates for genomic variation in the 'mappability' of sequences. The second pass filters out sites not significantly enriched compared to the normalized control, computing precise enrichments and significances. Our scoring procedure enables us to optimize experimental design by estimating the depth of sequencing required for a desired level of coverage and by demonstrating that more than two replicates provide only a marginal gain in information.
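To make the second pass more concrete, the following is a minimal sketch of how putative sites might be scored against a normalized control. It is an illustration under stated assumptions, not the PeakSeq implementation: the abstract does not specify how the control is normalized or which test is applied, so the least-squares scaling factor, the one-sided binomial test, and all function names (`scale_factor`, `binomial_sf`, `score_regions`) are hypothetical choices made for this example.

```python
from math import comb


def scale_factor(chip_counts, control_counts):
    """Least-squares slope (through the origin) of ChIP counts on control counts.

    Assumption: a single genome-wide factor is enough to normalize the control.
    """
    num = sum(c * k for c, k in zip(control_counts, chip_counts))
    den = sum(c * c for c in control_counts)
    return num / den


def binomial_sf(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def score_regions(regions, alpha):
    """Score putative sites from a first pass against the scaled control.

    regions: iterable of (name, chip_tags, control_tags) with integer tag counts.
    alpha:   scaling factor applied to the control counts.
    Returns a list of (name, enrichment, p_value).
    """
    scored = []
    for name, chip, control in regions:
        scaled = alpha * control
        n = chip + round(scaled)
        # Null hypothesis (an assumption of this sketch): a tag in the region is
        # equally likely to come from the ChIP sample or the scaled control.
        p_value = binomial_sf(chip, n) if n > 0 else 1.0
        enrichment = chip / scaled if scaled > 0 else float("inf")
        scored.append((name, enrichment, p_value))
    return scored


if __name__ == "__main__":
    # Toy background bins used only to estimate the scaling factor.
    alpha = scale_factor([10, 20, 30], [5, 10, 15])
    for name, enrichment, p in score_regions([("a", 30, 5), ("b", 10, 5)], alpha):
        print(f"{name}\tenrichment={enrichment:.2f}\tp={p:.4g}")
```

In this toy input, region `a` has three times as many ChIP tags as scaled control tags and receives a small p-value, whereas region `b` matches the scaled control and would be filtered out at any reasonable threshold. A real analysis would also need a multiple-testing correction across all candidate sites, which is omitted here.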

**Figure 1: ChIP-seq signal profile maps.**

**Figure 2: PeakSeq scoring procedure.**

**Figure 3: ChIP-seq target list scaling.**

**Figure 4: ChIP-seq versus ChIP-chip signal tracks and target binding sites for Pol II and STAT1.**

**Figure 5: Depth of sequencing and value of replicates.**

Accession codes

Accessions

Gene Expression Omnibus

Acknowledgements

This work was supported by grants from the National Institutes of Health (NIH) and made use of the Yale University Life Sciences Computing Center (NIH grant RR19895). We acknowledge Mike Wilson's assistance with submission of data to GEO.

Author information

Authors and Affiliations

Molecular Biophysics & Biochemistry Dept., Yale University, PO Box 208114, New Haven, 06520-8114, Connecticut, USA
Joel Rozowsky, Zhengdong D Zhang, Theodore Gibson, Michael Snyder & Mark B Gerstein
Molecular, Cellular & Developmental Biology Dept, Yale University, New Haven, 06520, Connecticut, USA
Ghia Euskirchen & Michael Snyder
Program in Computational Biology and Bioinformatics, Yale University, New Haven, 06520, Connecticut, USA
Raymond K Auerbach & Mark B Gerstein
Department of Computer Science, Yale University, New Haven, 06520, Connecticut, USA
Robert Bjornson, Nicholas Carriero & Mark B Gerstein

Contributions

J.R. conceived and developed the scoring methodology, analyzed the data presented in the paper and wrote the manuscript. G.E. generated the experimental data. R.K.A. assisted with the analysis in the paper and with editing the manuscript. Z.D.Z. was involved in the conceptualization of the scoring methodology. T.G. assisted in the coding of the PeakSeq scoring procedure. R.B. and N.C. developed the code for generating indexed mappability maps of a genome and assisted with analysis. M.S. helped conceive the scoring methodology and edit the manuscript. M.B.G. also helped conceive the scoring methodology and supervised the analysis and writing of the manuscript.

Corresponding authors

Correspondence to Joel Rozowsky or Mark B Gerstein.

Supplementary information

Supplementary Text and Figures

Supplementary Figure 1, Supplementary Tables 1 and 2 and Supplementary Notes (PDF 717 kb)

About this article

Cite this article

Rozowsky, J., Euskirchen, G., Auerbach, R. et al. PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nat Biotechnol 27, 66–75 (2009). https://doi.org/10.1038/nbt.1518

Received: 13 August 2008
Accepted: 03 December 2008
Published: 04 January 2009
Issue Date: January 2009
DOI: https://doi.org/10.1038/nbt.1518
