Massively parallel sequencing of cDNA has enabled deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here we present the Trinity method for de novo assembly of full-length transcripts and evaluate it on samples from fission yeast and mouse, as well as from whitefly, whose reference genome is not yet available. By efficiently constructing and analyzing sets of de Bruijn graphs, Trinity fully reconstructs a large fraction of transcripts, including alternatively spliced isoforms and transcripts from recently duplicated genes. Compared with other de novo transcriptome assemblers, Trinity recovers more full-length transcripts across a broad range of expression levels, with a sensitivity similar to methods that rely on genome alignments. Our approach provides a unified solution for transcriptome reconstruction in any sample, especially in the absence of a reference genome.
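To make the de Bruijn graph idea concrete, here is a minimal generic sketch. It is not Trinity's own implementation. It assumes error-free reads from a single strand (no reverse complements, no error correction), and the function names `build_de_bruijn` and `simple_paths` are hypothetical. Each k-mer in a read becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. Maximal non-branching paths are then spelled out as contigs, so overlapping reads merge while branch points, such as those arising from alternative splicing, are kept as separate paths.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, one edge per distinct k-mer.

    Returns a dict mapping each (k-1)-mer to the set of (k-1)-mers that follow it.
    Assumes single-stranded, error-free reads (no reverse-complement handling).
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # prefix -> suffix edge
    return graph


def simple_paths(graph):
    """Spell out maximal non-branching paths (unitigs) as contig strings.

    Note: isolated cycles, where every node has exactly one in- and one
    out-edge, are not reported by this simple sketch.
    """
    indeg = defaultdict(int)
    for dsts in graph.values():
        for d in dsts:
            indeg[d] += 1
    nodes = set(graph) | set(indeg)

    def is_one_in_one_out(n):
        return indeg[n] == 1 and len(graph.get(n, ())) == 1

    contigs = []
    for n in sorted(nodes):
        if is_one_in_one_out(n):
            continue  # interior nodes are visited while walking a path
        for nxt in sorted(graph.get(n, ())):
            path, cur = n, nxt
            while True:
                path += cur[-1]  # each step adds one new base
                if not is_one_in_one_out(cur):
                    break  # stop at a branch point or dead end
                cur = next(iter(graph[cur]))
            contigs.append(path)
    return contigs


# Two overlapping reads collapse into one contig.
print(simple_paths(build_de_bruijn(["ATGGCGT", "GGCGTGC"], 4)))
# A shared prefix with two alternative endings yields a branch.
print(simple_paths(build_de_bruijn(["AAGTC", "AAGTG"], 3)))
```

In the second call the graph forks at the node `GT`, so the sketch reports the shared stem `AAGT` and the two alternatives `GTC` and `GTG` rather than choosing one. A full-length transcript assembler must go further and decide which paths through such branches correspond to real isoforms, which this sketch does not attempt.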
Supplementary information: Supplementary Text and Figures, comprising Supplementary Tables 1–3, Supplementary Methods, Supplementary Note and Supplementary Figures 1–9.