Systematic evaluation of factors influencing ChIP-seq fidelity

Abstract

We evaluated how variations in sequencing depth and other parameters influence the interpretation of chromatin immunoprecipitation–sequencing (ChIP-seq) experiments. Using Drosophila melanogaster S2 cells, we generated ChIP-seq data sets for a site-specific transcription factor (Suppressor of Hairy-wing) and a histone modification (H3K36me3). We detected a chromatin-state bias: open chromatin regions yielded higher coverage, which led to false positives if not corrected. This bias had a greater effect on detection specificity than any base-composition bias. Paired-end sequencing revealed that single-end data underestimated ChIP-library complexity at high coverage. Removal of reads originating at the same base reduced false positives but had little effect on detection sensitivity. Even at a mappable-genome coverage depth of 1 read per base pair, 1% of the narrow peaks detected on a tiling array were missed by ChIP-seq. Evaluation of widely used ChIP-seq analysis tools suggests that adjustments or algorithm improvements are required to handle data sets with deep coverage.
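
As an illustration only, the two quantities the abstract mentions (removal of reads originating at the same base, and coverage depth expressed as reads per mappable base pair) can be sketched in a few lines of Python. This is not the authors' pipeline: the `Read` record, the helper names, and the choice to key duplicates on chromosome, strand and 5' position are assumptions made for the example, and the depth function simply takes "reads per base pair" literally.

```python
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal record for one aligned single-end read."""
    chrom: str
    start: int    # 0-based leftmost aligned position
    length: int   # aligned read length in bases
    strand: str   # "+" or "-"


def five_prime(read: Read) -> int:
    """Reference position of the read's 5' end (the base it originates at)."""
    return read.start if read.strand == "+" else read.start + read.length - 1


def drop_same_start_reads(reads: Iterable[Read]) -> List[Read]:
    """Keep only the first read seen per (chrom, strand, 5' position).

    Assumption: reads sharing this key are treated as redundant copies.
    """
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.strand, five_prime(read))
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


def reads_per_mappable_bp(reads: List[Read], mappable_bp: int) -> float:
    """Coverage depth taken literally as number of reads per mappable base."""
    return len(reads) / mappable_bp


reads = [
    Read("chr2L", 100, 36, "+"),
    Read("chr2L", 100, 36, "+"),  # same 5' base and strand as the first read
    Read("chr2L", 65, 36, "-"),   # 5' end at position 100, opposite strand
    Read("chr2L", 100, 36, "-"),  # 5' end at position 135
]
kept = drop_same_start_reads(reads)
depth = reads_per_mappable_bp(kept, mappable_bp=6)
```

Keying on strand as well as position keeps reads from opposite strands that happen to share a coordinate, since they come from different fragment ends. A real analysis would read alignments from a file and use a measured mappable genome size rather than the toy value used here.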

Figure 1: Impact of genomic sequence composition and chromatin state on read coverage.
Figure 2: Comparison of several features between the paired-end and single-end reads, and an evaluation of the effect of DNA fragment size.
Figure 3: Quality of Su(Hw) peaks.
Figure 4: Comparison of identified narrow peaks and the dynamic range between the sequencing and the tiling array platform.
Figure 5: Evaluation of reproducibility across replicates for six peak callers.

Accession codes

Gene Expression Omnibus

Acknowledgements

We thank the authors of all of the algorithms that we evaluated in this study: H. Ji, R. Jothi, P. Kharchenko, W. Li, D. Nix, J. Rozowsky and A. Valouev. We thank N. Bild, D. Roqueiro and M. Sabala for help in performing PeakSeq on the Bionimbus Cloud, D. Schmidt and D. Odom for sharing their sequencing data of the ENCODE spike-in sample, A. Kundaje for sharing his unpublished results on IDR analysis of H3K36me3 in humans, N. Rashid for sharing the mappability data of the Drosophila genome, M. Greenberg for support in the early stage of this project, and E. Birney, M. Snyder, J. Ahringer, M. Gerstein, M. Kellis, P. Park and other members of the modENCODE consortium for helpful discussions. This work was partially funded by the US National Institutes of Health (HG4069 to X.S.L., 3U01 HG004270-03S1 to X.S.L. and J.D.L., and U01HG004264 to K.P.W.).

Author information

Contributions

Y.C. performed bioinformatic analysis. N.N. performed cell culture, ChIP experiments and library preparation with help from J.Z. J.O.M. performed library preparation and sequencing experiments. Q.L. and P.J.B. contributed code for the IDR method. Q.L. participated in writing the description of the IDR method and in interpreting the IDR analysis results. M.S. performed ChIP–quantitative (q)PCR validation of the selected array-specific Su(Hw) peaks and analyzed the ChIP-qPCR data. T.L., Y.Z., T.-K.K., H.H.H., Y.R., R.M.M. and B.J.W. contributed to the early development of the project. B.J.W., K.P.W., J.D.L. and X.S.L. conceived the project. T.-K.K., H.H.H., Y.R. and R.M.M. performed pilot experiments. Y.C., J.D.L. and X.S.L. wrote the manuscript with help from the other authors.

Corresponding authors

Correspondence to Kevin P White, Jason D Lieb or X Shirley Liu.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–16, Supplementary Table 1, Supplementary Notes, Supplementary Methods (PDF 4131 kb)

About this article

Cite this article

Chen, Y., Negre, N., Li, Q. et al. Systematic evaluation of factors influencing ChIP-seq fidelity. Nat Methods 9, 609–614 (2012). https://doi.org/10.1038/nmeth.1985
