Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms

Abstract

We introduce Sailfish, a computational method for quantifying the abundance of previously annotated RNA isoforms from RNA-seq data. Because Sailfish entirely avoids mapping reads, a time-consuming step in all current methods, it provides quantification estimates much faster than do existing approaches (typically 20 times faster) without loss of accuracy. By facilitating frequent reanalysis of data and reducing the need to optimize parameters, Sailfish exemplifies the potential of lightweight algorithms for efficiently processing sequencing reads.

Figure 1: Overview of the Sailfish pipeline.
Figure 2: Speed and accuracy of Sailfish.

Accession codes

Accessions

Sequence Read Archive

Acknowledgements

This work has been partially funded by the US National Science Foundation (CCF-1256087, CCF-1053918, and EF-0849899) and US National Institutes of Health (R21AI085376, R21HG006913 and R01HG007104). C.K. received support as an Alfred P. Sloan Research Fellow. We would like to thank A. Roberts for helping to diagnose and resolve an artifact in an earlier version of this manuscript pertaining to the synthetic data generated by the Flux Simulator.

Author information

Contributions

R.P., S.M.M. and C.K. designed the method and algorithms, devised the experiments, and wrote the manuscript. R.P. implemented the Sailfish software.

Corresponding author

Correspondence to Carl Kingsford.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–7, Supplementary Table 1 and Supplementary Notes 1–3 (PDF 1337 kb)

Supplementary Data

Version 0.6.3 of the Sailfish source code (ZIP 666 kb)

Cite this article

Patro, R., Mount, S. & Kingsford, C. Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms. Nat Biotechnol 32, 462–464 (2014). https://doi.org/10.1038/nbt.2862
