De novo assembly and analysis of RNA-seq data

Journal name: Nature Methods
Volume: 7
Pages: 909–912
DOI: doi:10.1038/nmeth.1517

We describe Trans-ABySS, a de novo short-read transcriptome assembly and analysis pipeline that addresses variation in local read densities by assembling read substrings with varying stringencies and then merging the resulting contigs before analysis. Analyzing 7.4 gigabases of 50-base-pair paired-end Illumina reads from an adult mouse liver poly(A) RNA library, we identified known, new and alternative structures in expressed transcripts, and achieved high sensitivity and specificity relative to reference-based assembly methods.
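As a rough illustration of the merging step, the sketch below drops contigs that occur as an exact substring of a longer contig from a different k-value assembly (the 'buried' contigs of Figure 1b). It is not the Trans-ABySS implementation: the function names, the dictionary input format and the reverse-complement check are assumptions made for illustration, and a real pipeline would use an aligner rather than a quadratic substring scan.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_buried(assemblies):
    """Find contigs with an exact match inside a longer contig of another assembly.

    assemblies: hypothetical input format, a dict mapping each k value to a
    list of contig sequences. Returns a set of (k, index) identifiers.
    """
    buried = set()
    for k, contigs in assemblies.items():
        for i, seq in enumerate(contigs):
            rc = reverse_complement(seq)  # assumption: either strand may match
            for other_k, parents in assemblies.items():
                if other_k == k:
                    continue  # only compare against other assemblies
                if any(len(p) > len(seq) and (seq in p or rc in p) for p in parents):
                    buried.add((k, i))
                    break
    return buried


def merge_assemblies(assemblies):
    """Return all contigs that are not buried, in (k, index) order."""
    buried = find_buried(assemblies)
    return [
        seq
        for k in sorted(assemblies)
        for i, seq in enumerate(assemblies[k])
        if (k, i) not in buried
    ]


# Toy input: two short contigs from k=26, two longer contigs from k=28.
toy = {
    26: ["ACGTACGTAC", "TTGACCA"],
    28: ["GGACGTACGTACCC", "AATGGTCAAGG"],
}
merged = merge_assemblies(toy)
```

In the toy input, both k=26 contigs are contained in k=28 contigs (the second only as a reverse complement), so `merged` keeps the two k=28 sequences.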

Figures

  1. Figure 1: Representation of transcripts and contigs across assemblies.

    (a) Distributions of normalized mean transcript coverage from read-to-genome alignments and assembly k-mer length, for unmerged contigs from assemblies for every other k value between 26 and 50 bp (left to right, with the curve for each k value in a different color). Results are shown for all Ensembl v54 mouse transcripts (gray), and for contigs that cover at least 80% of the transcript's total exon length. Inset, distribution of transcripts for each k value. (b) Result of contig merging for main contigs from assemblies with k values of 26–50 bp. 'Buried' contigs are those with an exact sequence match within a longer 'parent' contig from another assembly. 'Untouched' contigs have no sequence match in another assembly.

  2. Figure 2: Performance comparisons between ABySS and reference-based transcriptome analysis tools.

    (a) Number of Ensembl v54 transcripts reconstructed to 80% by a single contig from ABySS, Cufflinks and Scripture, as a function of mean read coverage, C. (b) Intron-level sensitivity and specificity of Trans-ABySS, Cufflinks, Scripture and TopHat, relative to all 298,893 nonredundant introns from UCSC genome browser, RefSeq, Ensembl and AceView transcript models. TopHat split-read alignments are shown as a curve for intron support levels ranging from 1 to 100 reads. Alignments of Trans-ABySS de novo contigs, and reference-based TopHat Cufflinks and Scripture generated contigs, are represented as points, each of which represents a set of contigs. Two points are shown for ABySS: non-reference-based filtering with (light blue) or without (dark blue) contig-level splice-site filtering.
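The intron-level comparison in Figure 2b can be sketched as a set comparison between predicted and annotated introns. The snippet below is a minimal illustration, assuming the convention common in gene-structure evaluation, where sensitivity is the fraction of reference introns recovered and specificity is the fraction of predicted introns that match a reference intron. The caption does not state the exact definitions used, so both the formulas and the `(chrom, start, end, strand)` tuple format are assumptions.

```python
def intron_sensitivity_specificity(predicted, reference):
    """Compare two collections of introns by exact coordinate match.

    Each intron is a hypothetical (chrom, start, end, strand) tuple.
    Returns (sensitivity, specificity) under the assumed definitions:
      sensitivity = matched / number of reference introns
      specificity = matched / number of predicted introns
    """
    predicted, reference = set(predicted), set(reference)
    matched = len(predicted & reference)
    sensitivity = matched / len(reference) if reference else 0.0
    specificity = matched / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Toy data: four annotated introns, three predictions, two of which match.
reference_introns = [
    ("chr1", 100, 200, "+"),
    ("chr1", 300, 450, "+"),
    ("chr2", 50, 120, "-"),
    ("chr2", 400, 900, "-"),
]
predicted_introns = [
    ("chr1", 100, 200, "+"),
    ("chr2", 50, 120, "-"),
    ("chr3", 10, 80, "+"),
]
sens, spec = intron_sensitivity_specificity(predicted_introns, reference_introns)
```

Sweeping a minimum read-support threshold over the predicted set and recomputing both values at each threshold would trace a curve of the kind described for the TopHat alignments.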

Author information

Affiliations

  1. Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, Canada.

    • Gordon Robertson,
    • Jacqueline Schein,
    • Readman Chiu,
    • Richard Corbett,
    • Matthew Field,
    • Shaun D Jackman,
    • Karen Mungall,
    • Hisanaga Mark Okada,
    • Jenny Q Qian,
    • Malachi Griffith,
    • Anthony Raymond,
    • Nina Thiessen,
    • Timothee Cezard,
    • Yaron S Butterfield,
    • Richard Newsome,
    • Simon K Chan,
    • Rong She,
    • Richard Varhol,
    • Baljit Kamoh,
    • Anna-Liisa Prabhu,
    • Angela Tam,
    • YongJun Zhao,
    • Richard A Moore,
    • Martin Hirst,
    • Marco A Marra,
    • Steven J M Jones &
    • Inanc Birol
  2. Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, Canada.

    • Sam Lee &
    • Pamela A Hoodless
  3. Department of Medical Genetics, University of British Columbia, Vancouver, Canada.

    • Marco A Marra,
    • Steven J M Jones &
    • Pamela A Hoodless
  4. Present address: University of Edinburgh, Edinburgh, UK.

    • Timothee Cezard

Contributions

G.R. and J.S. wrote the paper. J.S., G.R. and K.M. reviewed predictions and recommended analysis methods. G.R. coordinated analysis and validation. B.K., A.-L.P. and A.T. constructed libraries under the supervision of YJ.Z. S.L. generated biological material and performed RT-PCR validation. R.A.M. supervised sequencing activities. Y.S.B., T.C., R. Corbett, R. Chiu, M.F., M.G., J.Q.Q., R.N., H.M.O., N.T., R.V., S.K.C. and R.S. developed analysis methods and code and performed analyses. R. Corbett and R. Chiu performed comparisons with reference-based methods. S.D.J. develops and maintains ABySS and generated the ABySS assemblies. A.R. contributed algorithms and code for ABySS. M.A.M., S.J.M.J. and P.A.H. directed research. S.J.M.J. suggested analysis methods. YJ.Z. and M.H. developed the WTSS protocol. J.S. supervised activities. P.A.H. supervised validation. I.B. developed ABySS and Trans-ABySS and directed bioinformatics work.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Text and Figures (2M)

    Supplementary Figures 1–21, Supplementary Tables 1–4, Supplementary Note
