Article

Differential analysis of gene regulation at transcript resolution with RNA-seq

Nature Biotechnology volume 31, pages 46–53 (2013)

Abstract

Differential analysis of gene and transcript expression using high-throughput RNA sequencing (RNA-seq) is complicated by several sources of measurement variability and poses numerous statistical challenges. We present Cuffdiff 2, an algorithm that estimates expression at transcript-level resolution and controls for variability evident across replicate libraries. Cuffdiff 2 robustly identifies differentially expressed transcripts and genes and reveals differential splicing and promoter-preference changes. We demonstrate the accuracy of our approach through differential analysis of lung fibroblasts in response to loss of the developmental transcription factor HOXA1, which we show is required for lung fibroblast and HeLa cell cycle progression. Loss of HOXA1 results in significant expression level changes in thousands of individual transcripts, along with isoform switching events in key regulators of the cell cycle. Cuffdiff 2 performs robust differential analysis in RNA-seq experiments at transcript resolution, revealing a layer of regulation not readily observable with other high-throughput technologies.
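As a rough illustration of two ideas in the abstract, scaling an expression change by the variability seen across replicate libraries and detecting a shift in the relative abundance of a gene's isoforms, the Python sketch below uses deliberately simple stand-ins. It is not Cuffdiff 2's statistical model: the helper names `replicate_log2_fc` and `js_distance`, the pseudocount of 1, the delta-method standard error and the square-root Jensen-Shannon measure are all illustrative assumptions.

```python
import math
from statistics import mean, variance


def replicate_log2_fc(fpkm_a, fpkm_b, pseudocount=1.0):
    """Toy replicate-aware comparison of one transcript between two conditions.

    Hypothetical helper, NOT Cuffdiff 2's model. Returns (log2 fold change,
    z-like statistic), where the statistic divides the log ratio by a
    delta-method standard error built from each condition's replicate
    variance. Each condition needs at least two replicates.
    """
    mu_a = mean(fpkm_a) + pseudocount
    mu_b = mean(fpkm_b) + pseudocount
    lfc = math.log2(mu_b / mu_a)
    ln2_sq = math.log(2) ** 2
    # Delta method: Var(log2 X) ~= Var(X) / (E[X]^2 * ln(2)^2), applied to the
    # sample mean, whose variance is s^2 / n.
    var_a = variance(fpkm_a) / len(fpkm_a) / (mu_a ** 2 * ln2_sq)
    var_b = variance(fpkm_b) / len(fpkm_b) / (mu_b ** 2 * ln2_sq)
    se = math.sqrt(var_a + var_b)
    if se == 0:
        # Identical replicates: there is no variability to scale by.
        return lfc, 0.0 if lfc == 0 else math.copysign(math.inf, lfc)
    return lfc, lfc / se


def js_distance(abund_a, abund_b):
    """Square root of the base-2 Jensen-Shannon divergence between two
    isoform-abundance vectors (normalized here to fractions).
    0 means identical isoform usage; 1 means completely disjoint usage."""
    total_a, total_b = sum(abund_a), sum(abund_b)
    p = [x / total_a for x in abund_a]
    q = [x / total_b for x in abund_b]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x == 0 contribute nothing; y > 0 wherever x > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))


# Example: three replicates per condition, and a two-isoform gene whose
# dominant isoform switches between conditions.
lfc, z = replicate_log2_fc([10, 12, 11], [40, 44, 42])
switch = js_distance([90, 10], [10, 90])
print(f"log2 FC = {lfc:.2f}, statistic = {z:.1f}, isoform sqrt(JS) = {switch:.2f}")
```

A statistic that is large relative to replicate noise flags a candidate expression change, while a sqrt(JS) value near 1 indicates a near-complete isoform switch. A real analysis would additionally need a null distribution for each measure and correction for multiple testing.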

Accessions

Primary accessions

Gene Expression Omnibus

Acknowledgements

We are grateful to D. Kelley for a careful reading of the manuscript, and B. Wold for sharing the hESC RNA-seq data. We are also thankful for the ongoing development efforts of A. Roberts, B. Langmead, D. Kim, G. Pertea, H. Pimentel and S. Salzberg. C.T. and D.G.H. are Damon Runyon Postdoctoral Fellows. J.L.R. is a Damon Runyon-Rachleff Innovator fellow. This work was supported by US National Institutes of Health grants DP2OD006670, P01GM099117, P50HG006193 and R01ES020260 (to J.L.R.) and R01 HG006129 and R01 DK094699 (to L.P.).

Author information

Author notes

    Cole Trapnell and David G Hendrickson contributed equally to this work.

    John L Rinn and Lior Pachter contributed equally to this work.

Affiliations

  1. Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts, USA.

    Cole Trapnell, David G Hendrickson, Martin Sauvageau, Loyal Goff & John L Rinn

  2. The Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts, USA.

    Cole Trapnell, David G Hendrickson, Martin Sauvageau, Loyal Goff & John L Rinn

  3. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    Loyal Goff

  4. Department of Mathematics, University of California Berkeley, Berkeley, California, USA.

    Lior Pachter

  5. Department of Molecular & Cell Biology, University of California Berkeley, Berkeley, California, USA.

    Lior Pachter

Authors

  Cole Trapnell, David G Hendrickson, Martin Sauvageau, Loyal Goff, John L Rinn & Lior Pachter

Contributions

C.T. and L.P. developed the mathematics and statistics. D.G.H. and M.S. performed the experiments. D.G.H. and C.T. designed the experiments and performed the analysis. C.T. and L.G. implemented the software. L.P., J.L.R., D.G.H. and C.T. conceived the research. All authors wrote and approved the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to John L Rinn or Lior Pachter.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–87 and Supplementary Tables 1–3

About this article

DOI

https://doi.org/10.1038/nbt.2450