Abstract

Massively parallel sequencing of cDNA has enabled deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here we present the Trinity method for de novo assembly of full-length transcripts and evaluate it on samples from fission yeast and mouse, as well as from whitefly, whose reference genome is not yet available. By efficiently constructing and analyzing sets of de Bruijn graphs, Trinity fully reconstructs a large fraction of transcripts, including alternatively spliced isoforms and transcripts from recently duplicated genes. Compared with other de novo transcriptome assemblers, Trinity recovers more full-length transcripts across a broad range of expression levels, with a sensitivity similar to that of methods that rely on genome alignments. Our approach provides a unified solution for transcriptome reconstruction in any sample, especially in the absence of a reference genome.
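The de Bruijn graph idea mentioned above can be illustrated with a toy Python sketch. This is not Trinity's implementation, and it does not model the Inchworm, Chrysalis or Butterfly components. It assumes error-free reads, a small k and an acyclic graph. The function names and example sequences are hypothetical. Each k-mer becomes an edge between its (k-1)-mer prefix and suffix. Two transcripts that share sequence and then diverge, as alternatively spliced isoforms can, show up as a node with more than one successor. Walking every source-to-sink path then spells out each variant.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_de_bruijn(reads, k):
    """Toy de Bruijn graph: nodes are (k-1)-mers, each k-mer adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branch_points(graph):
    """Return (k-1)-mers with more than one successor, i.e. where paths diverge."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)


def spell_paths(graph):
    """Enumerate source-to-sink paths and spell each one as a sequence.

    Assumes the graph is acyclic; real transcript graphs can contain cycles and
    error-induced branches, which this sketch does not handle.
    """
    targets = {t for succ in graph.values() for t in succ}
    sources = [n for n in graph if n not in targets]
    results = []

    def walk(node, seq):
        succ = graph.get(node)  # .get does not insert missing keys into the defaultdict
        if not succ:
            results.append(seq)  # sink reached: emit the spelled sequence
            return
        for nxt in sorted(succ):
            walk(nxt, seq + nxt[-1])  # each step extends the sequence by one base

    for src in sorted(sources):
        walk(src, src)
    return sorted(results)


if __name__ == "__main__":
    # Two hypothetical "isoforms" that share a prefix and then diverge.
    reads = ["ATGGCGTACG", "ATGGCGTTCA"]
    graph = build_de_bruijn(reads, k=4)
    print(branch_points(graph))  # ['CGT']
    print(spell_paths(graph))    # ['ATGGCGTACG', 'ATGGCGTTCA']
```

Building the graph costs a single pass over the k-mers of each read. Because overlaps are implicit in shared (k-1)-mers, no pairwise read alignment is needed.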

Accessions

Primary accessions

Gene Expression Omnibus

Sequence Read Archive

Acknowledgements

We thank L. Gaffney for help with figure preparation, J. Bochicchio for project management, the Broad Sequencing Platform for all sequencing work, A. Papanicolaou and M. Ott for Inchworm software testing and code enhancements, and F. Ribeiro for helpful discussions regarding error pruning. The work was supported in part by a grant from the National Human Genome Research Institute (NIH 1 U54 HG03067, Lander), the Howard Hughes Medical Institute, a National Institutes of Health PIONEER award, a Burroughs Wellcome Fund–Career Award at the Scientific Interface (A.R.), the US-Israel Binational Science Foundation (N.F. and A.R.), and funds from the National Institute of Allergy and Infectious Diseases under contract no. HHSN27220090018C. M.Y. was supported by a Clore Fellowship. K.L.-T. is a recipient of the European Young Investigator Award (EURYI) funded by the European Science Foundation. A.R. is a researcher of the Merkin Foundation for Stem Cell Research at the Broad Institute.

Author information

Author notes

    Manfred G Grabherr, Brian J Haas & Moran Yassour

    These authors contributed equally to this work.

Affiliations

  1. Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts, USA.

    Manfred G Grabherr, Brian J Haas, Moran Yassour, Joshua Z Levin, Dawn A Thompson, Ido Amit, Xian Adiconis, Lin Fan, Raktima Raychowdhury, Qiandong Zeng, Zehua Chen, Evan Mauceli, Nir Hacohen, Andreas Gnirke, Federica di Palma, Bruce W Birren, Chad Nusbaum, Kerstin Lindblad-Toh & Aviv Regev

  2. School of Computer Science, Hebrew University, Jerusalem, Israel.

    Moran Yassour & Nir Friedman

  3. Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    Moran Yassour & Aviv Regev

  4. Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts, USA.

    Nicholas Rhind

  5. Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden.

    Kerstin Lindblad-Toh

  6. Alexander Silberman Institute of Life Sciences, Hebrew University, Jerusalem, Israel.

    Nir Friedman

  7. Howard Hughes Medical Institute, Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    Aviv Regev

Authors

Manfred G Grabherr, Brian J Haas, Moran Yassour, Joshua Z Levin, Dawn A Thompson, Ido Amit, Xian Adiconis, Lin Fan, Raktima Raychowdhury, Qiandong Zeng, Zehua Chen, Evan Mauceli, Nir Hacohen, Andreas Gnirke, Nicholas Rhind, Federica di Palma, Bruce W Birren, Chad Nusbaum, Kerstin Lindblad-Toh, Nir Friedman & Aviv Regev

Contributions

M.G.G., M.Y., B.J.H., K.L.-T., N.F. and A.R. conceived and designed the study. B.J.H., M.G.G. and M.Y. developed the Inchworm, Chrysalis and Butterfly components, respectively. N.R., F.D.P., B.W.B., C.N. and K.L.-T. contributed to the study's conception and execution. J.Z.L., D.A.T., X.A., L.F., R.R., I.A., N.H., A.R. and A.G. designed and performed all experiments. Q.Z., Z.C. and E.M. contributed computational analyses. M.G.G., B.J.H. and M.Y. designed, implemented and evaluated all methods. A.R., N.F., M.G.G., B.J.H. and M.Y. wrote the manuscript, with input from all authors. A.R. and N.F. contributed equally to this paper.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Nir Friedman or Aviv Regev.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Tables 1–3, Supplementary Methods, Supplementary Note and Supplementary Figures 1–9

About this article

Publication history

Received

Accepted

Published

DOI

https://doi.org/10.1038/nbt.1883
