Differential analysis of gene and transcript expression using high-throughput RNA sequencing (RNA-seq) is complicated by several sources of measurement variability and poses numerous statistical challenges. We present Cuffdiff 2, an algorithm that estimates expression at transcript-level resolution and controls for the variability evident across replicate libraries. Cuffdiff 2 robustly identifies differentially expressed transcripts and genes and reveals differential splicing and promoter-preference changes. We demonstrate the accuracy of our approach through differential analysis of lung fibroblasts in response to loss of the developmental transcription factor HOXA1, which we show is required for cell cycle progression in both lung fibroblasts and HeLa cells. Loss of HOXA1 results in significant expression-level changes in thousands of individual transcripts, along with isoform switching events in key regulators of the cell cycle. By performing differential analysis at transcript resolution, Cuffdiff 2 reveals a layer of regulation not readily observable with other high-throughput technologies.
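Isoform switching and promoter-preference changes concern shifts in the *relative* abundance of a gene's isoforms rather than in its total expression. One common way to score such a shift is the Jensen–Shannon distance (the square root of the Jensen–Shannon divergence) between the isoform-fraction distributions of a gene in two conditions; Cuffdiff's output for its splicing and promoter tests reports a `sqrt(JS)` value of this kind. The sketch below is a minimal illustration under stated assumptions: the function names and abundance values are hypothetical, and it computes only the point statistic. It is not Cuffdiff 2's implementation, which additionally models estimation uncertainty and cross-replicate variability before assigning significance.

```python
import math
from typing import Sequence


def _entropy(p: Sequence[float]) -> float:
    """Shannon entropy in bits, treating 0 * log2(0) as 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def normalize(abundances: Sequence[float]) -> list[float]:
    """Hypothetical helper: turn isoform abundances (e.g. FPKM) into within-gene fractions."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("gene must have positive total abundance")
    return [a / total for a in abundances]


def js_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Square root of the base-2 Jensen-Shannon divergence; lies in [0, 1]."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same isoforms")
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture distribution
    jsd = _entropy(m) - (_entropy(p) + _entropy(q)) / 2
    return math.sqrt(max(jsd, 0.0))  # clamp tiny negative rounding error


# Hypothetical two-isoform gene whose dominant isoform flips between conditions.
control = normalize([90.0, 10.0])
knockdown = normalize([10.0, 90.0])
print(js_distance(control, knockdown))
```

Identical isoform fractions give a distance of 0 and a complete swap between two isoforms gives 1, so the statistic captures a change in isoform usage even when the gene's total expression is unchanged, as in the example above.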
Supplementary Text and Figures: Supplementary Figures 1–87 and Supplementary Tables 1–3