We evaluated 25 protocol variants of 14 independent computational methods for exon identification, transcript reconstruction and expression-level quantification from RNA-seq data. Our results show that most algorithms are able to identify discrete transcript components with high success rates but that assembly of complete isoform structures poses a major challenge even when all constituent elements are identified. Expression-level estimates also varied widely across methods, even when based on similar transcript models. Consequently, the complexity of higher eukaryotic genomes imposes severe limitations on transcript recall and splice product discrimination that are likely to remain limiting factors for the analysis of current-generation RNA-seq data.
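The gap between identifying transcript components and assembling complete isoforms can be illustrated with a small sketch. The example below is a toy under stated assumptions: the exon coordinates, the two annotated isoforms and the single predicted transcript are invented, and the two recall functions (`exon_recall`, `transcript_recall`) are simplified, hypothetical definitions rather than the evaluation metrics used in the study.

```python
# Toy illustration only: coordinates and isoforms are invented,
# and these metrics are simplified assumptions, not the study's protocol.

def exon_recall(annotated, predicted):
    """Fraction of annotated exons found in at least one predicted transcript."""
    ref_exons = {exon for transcript in annotated for exon in transcript}
    pred_exons = {exon for transcript in predicted for exon in transcript}
    return len(ref_exons & pred_exons) / len(ref_exons)


def transcript_recall(annotated, predicted):
    """Fraction of annotated transcripts whose full exon chain is predicted exactly."""
    ref_chains = {tuple(transcript) for transcript in annotated}
    pred_chains = {tuple(transcript) for transcript in predicted}
    return len(ref_chains & pred_chains) / len(ref_chains)


# Four exons as (start, end) pairs on a hypothetical gene.
e1, e2, e3, e4 = (100, 200), (300, 400), (500, 600), (700, 800)

# Two annotated isoforms that use exon 2 and exon 3 mutually exclusively.
annotated = [[e1, e2, e4], [e1, e3, e4]]

# A hypothetical assembler output that chains every exon into one transcript.
predicted = [[e1, e2, e3, e4]]

print(exon_recall(annotated, predicted))        # 1.0
print(transcript_recall(annotated, predicted))  # 0.0
```

In this toy case every exon is recovered, yet neither annotated isoform is reconstructed, which is the kind of discrepancy between component-level and isoform-level performance described above.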
Supplementary information: Supplementary Figures 1–31, Supplementary Tables 1–10 and Supplementary Note.