Full-length transcriptome assembly from RNA-Seq data without a reference genome

Grabherr, Manfred G; Haas, Brian J; Yassour, Moran; Levin, Joshua Z; Thompson, Dawn A; Amit, Ido; Adiconis, Xian; Fan, Lin; Raychowdhury, Raktima; Zeng, Qiandong; Chen, Zehua; Mauceli, Evan; Hacohen, Nir; Gnirke, Andreas; Rhind, Nicholas; di Palma, Federica; Birren, Bruce W; Nusbaum, Chad; Lindblad-Toh, Kerstin; Friedman, Nir; Regev, Aviv

doi:10.1038/nbt.1883

Article
Published: 15 May 2011

Nature Biotechnology volume 29, pages 644–652 (2011)

Abstract

Massively parallel sequencing of cDNA has enabled deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here we present the Trinity method for de novo assembly of full-length transcripts and evaluate it on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available. By efficiently constructing and analyzing sets of de Bruijn graphs, Trinity fully reconstructs a large fraction of transcripts, including alternatively spliced isoforms and transcripts from recently duplicated genes. Compared with other de novo transcriptome assemblers, Trinity recovers more full-length transcripts across a broad range of expression levels, with a sensitivity similar to methods that rely on genome alignments. Our approach provides a unified solution for transcriptome reconstruction in any sample, especially in the absence of a reference genome.
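
For readers unfamiliar with the data structure named in the abstract, the following is a minimal Python sketch of a generic de Bruijn graph built from short reads. It is not Trinity's implementation: the function names `build_de_bruijn` and `branch_nodes`, the toy reads and the choice of k = 4 are illustrative assumptions. Nodes are (k-1)-mers, each k-mer contributes one weighted edge, and a node with more than one successor marks a point where two sequences (for example, two isoforms sharing a prefix) diverge.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to its (k-1)-mer successors with k-mer counts."""
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        # Slide a window of width k over the read; each k-mer is one edge.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    # Convert to plain dicts so lookups of missing nodes do not create entries.
    return {node: dict(successors) for node, successors in graph.items()}


def branch_nodes(graph):
    """Return nodes with more than one outgoing edge, in sorted order."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


# Two hypothetical reads that share the prefix ATGGC and then diverge.
toy_reads = ["ATGGCGT", "ATGGCAA"]
toy_graph = build_de_bruijn(toy_reads, k=4)
print(toy_graph["ATG"])         # edge supported by both reads
print(branch_nodes(toy_graph))  # node where the two reads diverge
```

In this toy input the shared edge ATG to TGG is counted twice and the only branching node is GGC. Real assemblers add error pruning and path resolution on top of such a graph, which this sketch omits.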

**Figure 2: Trinity correctly reconstructs the majority of full-length transcripts in fission yeast and mouse.**

**Figure 3: Trinity improves the yeast annotation.**

**Figure 4: Trinity resolves closely paralogous genes.**

**Figure 5: Comparison of Trinity to other mapping-first and assembly-first methods.**

**Figure 6: Trinity reconstructs polymorphic transcripts in whitefly.**

Accession codes

Primary accessions

Gene Expression Omnibus

GSE29209

Sequence Read Archive

SRP005611

Acknowledgements

We thank L. Gaffney for help with figure preparation, J. Bochicchio for project management, the Broad Sequencing Platform for all sequencing work, A. Papanicolaou and M. Ott for Inchworm software testing and code enhancements, and F. Ribeiro for helpful discussions regarding error pruning. The work was supported in part by a grant from the National Human Genome Research Institute (NIH 1 U54 HG03067, Lander), the Howard Hughes Medical Institute, a National Institutes of Health PIONEER award, a Burroughs Wellcome Fund–Career Award at the Scientific Interface (A.R.), the US-Israel Binational Science Foundation (N.F. and A.R.), and funds from the National Institute of Allergy and Infectious Diseases under contract no. HHSN27220090018C. M.Y. was supported by a Clore Fellowship. K.L.-T. is a recipient of the European Young Investigator Award (EURYI) funded by the European Science Foundation. A.R. is a researcher of the Merkin Foundation for Stem Cell Research at the Broad Institute.

Author information

Manfred G Grabherr, Brian J Haas and Moran Yassour: These authors contributed equally to this work.

Authors and Affiliations

Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts, USA
Manfred G Grabherr, Brian J Haas, Moran Yassour, Joshua Z Levin, Dawn A Thompson, Ido Amit, Xian Adiconis, Lin Fan, Raktima Raychowdhury, Qiandong Zeng, Zehua Chen, Evan Mauceli, Nir Hacohen, Andreas Gnirke, Federica di Palma, Bruce W Birren, Chad Nusbaum, Kerstin Lindblad-Toh & Aviv Regev
School of Computer Science, Hebrew University, Jerusalem, Israel
Moran Yassour & Nir Friedman
Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
Moran Yassour & Aviv Regev
Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts, USA
Nicholas Rhind
Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
Kerstin Lindblad-Toh
Alexander Silberman Institute of Life Sciences, Hebrew University, Jerusalem, Israel
Nir Friedman
Department of Biology, Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
Aviv Regev

Contributions

M.G.G., M.Y., B.J.H., K.L.-T., N.F. and A.R. conceived and designed the study. B.J.H., M.G.G. and M.Y. developed the Inchworm, Chrysalis and Butterfly components, respectively. N.R., F.D.P., B.W.B., C.N. and K.L.-T. contributed to the study's conception and execution. J.Z.L., D.A.T., X.A., L.F., R.R., I.A., N.H., A.R. and A.G. designed and performed all experiments. Q.Z., Z.C. and E.M. contributed computational analyses. M.G.G., B.J.H. and M.Y. designed, implemented and evaluated all methods. A.R., N.F., M.G.G., B.J.H. and M.Y. wrote the manuscript, with input from all authors. A.R. and N.F. contributed equally to this paper.

Corresponding authors

Correspondence to Nir Friedman or Aviv Regev.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Tables 1–3, Supplementary Methods, Supplementary Note and Supplementary Figures 1–9 (PDF 394 kb)

About this article

Cite this article

Grabherr, M., Haas, B., Yassour, M. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29, 644–652 (2011). https://doi.org/10.1038/nbt.1883

Received: 03 December 2010
Accepted: 28 April 2011
Published: 15 May 2011
Issue Date: July 2011
DOI: https://doi.org/10.1038/nbt.1883
