Accurate assembly of transcripts through phase-preserving graph decomposition

Abstract

We introduce Scallop, an accurate reference-based transcript assembler that improves reconstruction of multi-exon and lowly expressed transcripts. Scallop preserves long-range phasing paths extracted from reads, while producing a parsimonious set of transcripts and minimizing coverage deviation. On 10 human RNA-seq samples, Scallop produces 34.5% and 36.3% more correct multi-exon transcripts than StringTie and TransComb, and respectively identifies 67.5% and 52.3% more lowly expressed transcripts. Scallop achieves higher sensitivity and precision than previous approaches over a wide range of coverage thresholds.
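
The three goals named in the abstract (preserving phasing paths, keeping the transcript set small, and keeping coverage deviation low) can be illustrated on a toy splice graph. The sketch below is not Scallop's algorithm; it only scores two hand-written candidate decompositions. The graph, the edge coverages, the phasing path, and the function names are all hypothetical, and two definitions are assumed: a phasing path counts as preserved when it appears as a contiguous run of vertices inside some transcript, and coverage deviation is the summed absolute difference between observed and predicted edge coverage.

```python
from collections import defaultdict


def is_subpath(phase, path):
    """Return True if `phase` occurs as a contiguous run of vertices in `path`."""
    phase, path = tuple(phase), tuple(path)
    m = len(phase)
    return any(path[i:i + m] == phase for i in range(len(path) - m + 1))


def preserves_phasing(transcripts, phasing_paths):
    """Every phasing path must be contained in at least one transcript."""
    return all(
        any(is_subpath(phase, path) for path, _ in transcripts)
        for phase in phasing_paths
    )


def coverage_deviation(transcripts, observed):
    """Sum over edges of |observed coverage - coverage implied by transcripts|."""
    predicted = defaultdict(float)
    for path, abundance in transcripts:
        for edge in zip(path, path[1:]):
            predicted[edge] += abundance
    edges = set(observed) | set(predicted)
    return sum(abs(observed.get(e, 0.0) - predicted.get(e, 0.0)) for e in edges)


# Hypothetical splice graph: two alternative splicing events (b vs. c, e vs. f)
# separated by a shared vertex d. Values are observed edge coverages.
observed = {
    ("s", "a"): 10, ("a", "b"): 6, ("a", "c"): 4,
    ("b", "d"): 6, ("c", "d"): 4,
    ("d", "e"): 6, ("d", "f"): 4,
    ("e", "t"): 6, ("f", "t"): 4,
}

# Hypothetical phasing path: some read spans vertices b, d, and f.
phasing_paths = [("b", "d", "f")]

# Candidate A: two transcripts, fits the coverage, but pairs b with e only.
candidate_a = [
    (("s", "a", "b", "d", "e", "t"), 6),
    (("s", "a", "c", "d", "f", "t"), 4),
]

# Candidate B: three transcripts, fits the coverage and contains b-d-f.
candidate_b = [
    (("s", "a", "b", "d", "f", "t"), 4),
    (("s", "a", "b", "d", "e", "t"), 2),
    (("s", "a", "c", "d", "e", "t"), 4),
]

for name, cand in (("A", candidate_a), ("B", candidate_b)):
    print(name, len(cand), preserves_phasing(cand, phasing_paths),
          coverage_deviation(cand, observed))
```

In this toy case both candidates fit the observed coverage equally well, and only the larger one is consistent with the read-derived phasing path. Edge coverage alone therefore cannot choose between them, which is why the phasing information matters.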

Figure 1: Comparison of the three methods (StringTie, TransComb, and Scallop) over the five testing samples.
Figure 2: Overview of Scallop.

Acknowledgements

We thank Cong Ma and Juntao Liu for helpful suggestions and discussions. This research is funded in part by the Gordon and Betty Moore Foundation's Data-Driven Discovery Initiative through Grant GBMF4554 to C.K., by The Shurl and Kay Curci Foundation, by the US National Science Foundation (CCF-1256087, CCF-1319998), and by the US National Institutes of Health (R01HG007104 and R01GM122935).

Author information

Contributions

M.S. and C.K. designed the method, and M.S. implemented it. M.S. and C.K. designed the experiments, and M.S. conducted them. M.S. and C.K. wrote the manuscript.

Corresponding author

Correspondence to Carl Kingsford.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–21, Supplementary Tables 1–3, Supplementary Notes 1–7 (PDF 2473 kb)

Life Sciences Reporting Summary (PDF 176 kb)

Supplementary Code

Source Code of Scallop (ZIP 178 kb)

About this article

Cite this article

Shao, M., Kingsford, C. Accurate assembly of transcripts through phase-preserving graph decomposition. Nat Biotechnol 35, 1167–1169 (2017). https://doi.org/10.1038/nbt.4020
