Comparison of high-throughput sequencing data compression tools

Abstract

High-throughput sequencing (HTS) data are commonly stored as raw sequencing reads in FASTQ format or as reads mapped to a reference, in SAM format, both with large memory footprints. Worldwide growth of HTS data has prompted the development of compression methods that aim to significantly reduce HTS data size. Here we report on a benchmarking study of available compression methods on a comprehensive set of HTS data using an automated framework.

Acknowledgements

This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Frontiers program 'Cancer Genome Collaboratory' project (S.C.S., F.H., I.N.); the Vanier Canada Graduate Scholarships program (I.N.); the National Institutes of Health (NIH) (R01GM108348 to S.C.S.); the National Science Foundation (NSF) (1619081 to S.C.S.); the Indiana University Grand Challenges Program Precision Health Initiative (S.C.S.); the Wellcome Trust (098051 to J.K.B.); a Leibniz Universität Hannover eNIFE grant (J.V. and J.O.); and the Swiss Platform for Advanced Scientific Computing (PASC) PoSeNoGap project (C.A. and M.M.). We also thank the authors of the evaluated compression tools for supporting their tools and responding to our bug reports.

Author information

Contributions

The study was initiated by I.N., C.A. and M.M. I.N. designed the benchmarking framework and performed the experiments. J.K.B. evaluated the framework. I.N., J.K.B., J.V., J.O., F.H., C.A., M.M. and S.C.S. contributed to writing the manuscript. S.C.S. and F.H. oversaw the project.

Corresponding author

Correspondence to S. Cenk Sahinalp.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figure 1, Supplementary Tables 1–7 and Supplementary Notes 1–6. (PDF 2002 kb)

About this article

Cite this article

Numanagić, I., Bonfield, J., Hach, F. et al. Comparison of high-throughput sequencing data compression tools. Nat Methods 13, 1005–1008 (2016). https://doi.org/10.1038/nmeth.4037

  • DOI: https://doi.org/10.1038/nmeth.4037
