Computation for ChIP-seq and RNA-seq studies

Pepke, Shirley; Wold, Barbara; Mortazavi, Ali

doi:10.1038/nmeth.1371

Review Article
Published: 15 October 2009

Nature Methods volume 6, pages S22–S32 (2009)

Abstract

Genome-wide measurements of protein-DNA interactions and transcriptomes are increasingly done by deep DNA sequencing methods (ChIP-seq and RNA-seq). The power and richness of these counting-based measurements come at the cost of routinely handling tens to hundreds of millions of reads. Whereas early adopters necessarily developed their own custom computer code to analyze the first ChIP-seq and RNA-seq datasets, a new generation of more sophisticated algorithms and software tools is emerging to assist in the analysis phase of these projects. Here we describe the multilayered analyses of ChIP-seq and RNA-seq datasets, discuss the software packages currently available to perform tasks at each layer and describe some upcoming challenges and features for future analysis tools. We also discuss how software choices and uses are affected by specific aspects of the underlying biology and data structure, including genome size, positional clustering of transcription factor binding sites, transcript discovery and expression quantification.

**Figure 1: A hierarchical overview of ChIP-seq and RNA-seq analyses.**

**Figure 2: ChIP-seq peak types from various experiments.**

**Figure 3: ChIP-seq peak calling subtasks.**

**Figure 4: The impact of fragment length and complex peak structures in ChIP-seq.**

**Figure 6: Approaches to handle spliced reads.**

Acknowledgements

This work was supported by the Beckman Foundation, the Beckman Institute, the Simons Foundation and US National Institutes of Health (NIH) grant U54 HG004576 to B.W.; fellowships from the Gordon and Betty Moore Foundation, Caltech's Center for the Integrative Study of Cell Regulation, and the Beckman Institute to A.M.; and support from the Gordon and Betty Moore Foundation to S.P. The authors especially thank G. Marinov and P. Sternberg for many helpful discussions of this manuscript.

Author information

Authors and Affiliations

Center for Advanced Computing Research,
Shirley Pepke
Division of Biology, California Institute of Technology, Pasadena, California, USA
Barbara Wold & Ali Mortazavi

Corresponding author

Correspondence to Ali Mortazavi.

About this article

Cite this article

Pepke, S., Wold, B. & Mortazavi, A. Computation for ChIP-seq and RNA-seq studies. Nat Methods 6 (Suppl 11), S22–S32 (2009). https://doi.org/10.1038/nmeth.1371

Published: 15 October 2009
Issue Date: November 2009
DOI: https://doi.org/10.1038/nmeth.1371
