Abstract
Recent progress in massively parallel sequencing platforms has enabled genome-wide characterization of DNA-associated proteins using the combination of chromatin immunoprecipitation and sequencing (ChIP-seq). Although a variety of methods exist for analysis of the established alternative ChIP microarray (ChIP-chip), few approaches have been described for processing ChIP-seq data. To fill this gap, we propose an analysis pipeline specifically designed to detect protein-binding positions with high accuracy. Using previously reported data sets for three transcription factors, we illustrate methods for improving tag alignment and correcting for background signals. We compare the sensitivity and spatial precision of three peak detection algorithms with published methods, demonstrating gains in spatial precision when an asymmetric distribution of tags on positive and negative strands is considered. We also analyze the relationship between the depth of sequencing and characteristics of the detected binding positions, and provide a method for estimating the sequencing depth necessary for a desired coverage of protein binding sites.
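The gain from considering strand asymmetry can be illustrated with a toy calculation. The sketch below is not the pipeline described in this article; it is a minimal illustration that assumes tags are given as lists of 5' end coordinates per strand, and the names `strand_shift_profile` and `best_shift` are hypothetical. It scores each candidate offset by how many positive-strand and negative-strand tag pairs are separated by exactly that offset, then picks the best-scoring offset.

```python
from collections import Counter


def strand_shift_profile(plus_tags, minus_tags, max_shift):
    """Return {shift: score} for shifts 0..max_shift.

    The score for a shift is the number of (plus, minus) tag pairs whose
    5' ends are exactly `shift` bases apart, with the minus-strand tag
    downstream of the plus-strand tag.
    """
    plus_counts = Counter(plus_tags)
    minus_counts = Counter(minus_tags)
    profile = {}
    for shift in range(max_shift + 1):
        profile[shift] = sum(
            n * minus_counts.get(pos + shift, 0)
            for pos, n in plus_counts.items()
        )
    return profile


def best_shift(profile):
    """Pick the highest-scoring shift; ties go to the smallest shift."""
    return min(profile, key=lambda s: (-profile[s], s))


# Toy data (made up for illustration): two hypothetical binding sites where
# minus-strand tags sit 100 bp downstream of the plus-strand tags.
plus_tags = [1000, 1002, 1005, 5000, 5003]
minus_tags = [1100, 1102, 1105, 5100, 5103]

profile = strand_shift_profile(plus_tags, minus_tags, max_shift=200)
shift = best_shift(profile)
print(shift, profile[shift])
```

Under these assumptions, a binding position could then be approximated as lying roughly midway between a positive-strand tag cluster and its paired negative-strand cluster. Real data would need smoothing and handling of duplicate or ambiguous alignments, which this sketch omits.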
Acknowledgements
We would like to thank Dustin Schones and Keji Zhao for providing raw data and detailed descriptions for the CTCF data set, and Ali Mortazavi and Barbara Wold for providing sequence tag data for NRSF binding. This work was supported by grants from the National Institutes of Health to P.J.P. (U01HG004258, R01GM082798, UL1RR024920).
Supplementary information
Supplementary Text and Figures
Figures 1–20, Tables 1–3 (PDF 4157 kb)
Cite this article
Kharchenko, P., Tolstorukov, M. & Park, P. Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat Biotechnol 26, 1351–1359 (2008). https://doi.org/10.1038/nbt.1508