PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls

Nature Biotechnology volume 27, pages 66–75 (2009)

Abstract

Chromatin immunoprecipitation (ChIP) followed by tag sequencing (ChIP-seq) using high-throughput next-generation instrumentation is fast replacing chromatin immunoprecipitation followed by genome tiling array analysis (ChIP-chip) as the preferred approach for mapping sites of transcription-factor binding and chromatin modification. Using two deeply sequenced data sets for human RNA polymerase II and STAT1, each with matching input-DNA controls, we describe a general scoring approach to address unique challenges in ChIP-seq data analysis. Our approach is based on the observation that sites of potential binding are strongly correlated with signal peaks in the control, likely revealing features of open chromatin. We develop a two-pass strategy called PeakSeq to compensate for this signal, which is caused by open chromatin and revealed by inclusion of the controls. The first pass identifies putative binding sites and compensates for genomic variation in the 'mappability' of sequences. The second pass filters out sites not significantly enriched compared to the normalized control, computing precise enrichments and significances. Our scoring procedure enables us to optimize experimental design by estimating the depth of sequencing required for a desired level of coverage and by demonstrating that more than two replicates provide only a marginal gain in information.
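The second pass described in the abstract can be illustrated with a small sketch. This is not the PeakSeq code: the function names, the through-the-origin least-squares scaling of the control, the binomial test with p = 0.5, and the Benjamini-Hochberg correction are assumptions chosen to show one plausible way of filtering candidate sites against a normalized control. The abstract itself specifies only that sites are compared with the normalized control and assigned enrichments and significances.

```python
from math import comb


def scale_factor(sample_bins, control_bins):
    """Least-squares slope through the origin, so that sample ≈ alpha * control.

    Assumption: the bins passed in represent background signal only.
    """
    num = sum(s * c for s, c in zip(sample_bins, control_bins))
    den = sum(c * c for c in control_bins)
    return num / den if den else 1.0


def binomial_sf(k, n, p=0.5):
    """Upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def second_pass(regions, alpha, q_cutoff=0.05):
    """Keep candidate regions enriched over the scaled control.

    regions: list of (name, sample_tag_count, control_tag_count) tuples
             (hypothetical input format for this sketch).
    """
    pvals = []
    for _, sample, control in regions:
        scaled = round(alpha * control)  # control tags after normalization
        pvals.append(binomial_sf(sample, sample + scaled))
    qvals = benjamini_hochberg(pvals)
    return [(name, q) for (name, _, _), q in zip(regions, qvals) if q <= q_cutoff]


if __name__ == "__main__":
    alpha = scale_factor([2, 4, 6], [1, 2, 3])
    candidates = [("site_a", 40, 5), ("site_b", 10, 5)]
    for name, q in second_pass(candidates, alpha):
        print(name, q)
```

In this toy input, a candidate whose tag count is well above the scaled control is retained, while one whose count matches the scaled control is filtered out. The cutoff `q_cutoff` is an illustrative parameter.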

Accessions

Gene Expression Omnibus

Acknowledgements

This work was supported by grants from the National Institutes of Health (NIH) and made use of the Yale University Life Sciences Computing Center (NIH grant RR19895). We acknowledge Mike Wilson's assistance with submission of data to GEO.

Author information

Affiliations

  1. Molecular Biophysics & Biochemistry Dept., Yale University, PO Box 208114, New Haven, Connecticut 06520-8114, USA.

    • Joel Rozowsky
    • Zhengdong D Zhang
    • Theodore Gibson
    • Michael Snyder
    • Mark B Gerstein
  2. Molecular, Cellular & Developmental Biology Dept., Yale University, New Haven, Connecticut 06520, USA.

    • Ghia Euskirchen
    • Michael Snyder
  3. Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA.

    • Raymond K Auerbach
    • Mark B Gerstein
  4. Department of Computer Science, Yale University, New Haven, Connecticut 06520, USA.

    • Robert Bjornson
    • Nicholas Carriero
    • Mark B Gerstein

Authors

  1. Joel Rozowsky
  2. Ghia Euskirchen
  3. Raymond K Auerbach
  4. Zhengdong D Zhang
  5. Theodore Gibson
  6. Robert Bjornson
  7. Nicholas Carriero
  8. Michael Snyder
  9. Mark B Gerstein

Contributions

J.R. conceived and developed the scoring methodology, analyzed the data presented in the paper and wrote the manuscript. G.E. generated the experimental data. R.K.A. assisted with the analysis in the paper and with editing the manuscript. Z.D.Z. was involved in the conceptualization of the scoring methodology. T.G. assisted in the coding of the PeakSeq scoring procedure. R.B. and N.C. developed the code for generating indexed mappability maps of a genome and assisted with analysis. M.S. helped conceive the scoring methodology and edit the manuscript. M.B.G. also helped conceive the scoring methodology and supervised the analysis and writing of the manuscript.

Corresponding authors

Correspondence to Joel Rozowsky or Mark B Gerstein.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figure 1, Supplementary Tables 1 and 2 and Supplementary Notes

About this article

DOI

https://doi.org/10.1038/nbt.1518
