Date: 2017-11-13
Abstract
We introduce Scallop, an accurate reference-based transcript assembler that improves reconstruction of multi-exon and lowly expressed transcripts. Scallop preserves long-range phasing paths extracted from reads while producing a parsimonious set of transcripts and minimizing coverage deviation. On 10 human RNA-seq samples, Scallop produces 34.5% and 36.3% more correct multi-exon transcripts than StringTie and TransComb, respectively, and identifies 67.5% and 52.3% more lowly expressed transcripts than these two assemblers. Scallop achieves higher sensitivity and precision than previous approaches over a wide range of coverage thresholds.
Acknowledgements
We thank Cong Ma and Juntao Liu for helpful suggestions and discussions. This research is funded in part by the Gordon and Betty Moore Foundation's Data-Driven Discovery Initiative through Grant GBMF4554 to C.K., by The Shurl and Kay Curci Foundation, by the US National Science Foundation (CCF-1256087, CCF-1319998), and by the US National Institutes of Health (R01HG007104 and R01GM122935).
Supplementary information
PDF files
1. Supplementary Text and Figures: Supplementary Figures 1–21, Supplementary Tables 1–3, Supplementary Notes 1–7
2. Life Sciences Reporting Summary
Zip files
1. Supplementary Code: Source Code of Scallop