Brief Communication

Accurate assembly of transcripts through phase-preserving graph decomposition

Abstract

We introduce Scallop, an accurate reference-based transcript assembler that improves reconstruction of multi-exon and lowly expressed transcripts. Scallop preserves long-range phasing paths extracted from reads while producing a parsimonious set of transcripts and minimizing coverage deviation. On 10 human RNA-seq samples, Scallop produces 34.5% and 36.3% more correct multi-exon transcripts than StringTie and TransComb, respectively, and identifies 67.5% and 52.3% more lowly expressed transcripts than those two assemblers. Scallop achieves higher sensitivity and precision than previous approaches over a wide range of coverage thresholds.

Acknowledgements

We thank Cong Ma and Juntao Liu for helpful suggestions and discussions. This research is funded in part by the Gordon and Betty Moore Foundation's Data-Driven Discovery Initiative through Grant GBMF4554 to C.K., by The Shurl and Kay Curci Foundation, by the US National Science Foundation (CCF-1256087, CCF-1319998), and by the US National Institutes of Health (R01HG007104 and R01GM122935).

Author information

Affiliations

  1. Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.

    • Mingfu Shao & Carl Kingsford

Contributions

M.S. and C.K. designed the method, and M.S. implemented it. M.S. and C.K. designed the experiments, and M.S. conducted them. M.S. and C.K. wrote the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Carl Kingsford.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–21, Supplementary Tables 1–3, Supplementary Notes 1–7

  2. Life Sciences Reporting Summary

Zip files

  1. Supplementary Code

    Source Code of Scallop