Brief Communication

Comparison of high-throughput sequencing data compression tools

Nature Methods volume 13, pages 1005–1008 (2016)

Abstract

High-throughput sequencing (HTS) data are commonly stored as raw sequencing reads in FASTQ format or as reads mapped to a reference, in SAM format, both with large memory footprints. Worldwide growth of HTS data has prompted the development of compression methods that aim to significantly reduce HTS data size. Here we report on a benchmarking study of available compression methods on a comprehensive set of HTS data using an automated framework.

Acknowledgements

This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Frontiers program 'Cancer Genome Collaboratory' project (S.C.S., F.H., I.N.); the Vanier Canada Graduate Scholarships program (I.N.); the National Institutes of Health (NIH) (R01GM108348 to S.C.S.); the National Science Foundation (NSF) (1619081 to S.C.S.); the Indiana University Grand Challenges Program Precision Health Initiative (S.C.S.); the Wellcome Trust (098051 to J.K.B.); a Leibniz Universität Hannover eNIFE grant (J.V. and J.O.); and the Swiss Platform for Advanced Scientific Computing (PASC) PoSeNoGap project (C.A. and M.M.). We also thank the authors of the evaluated compression tools for supporting their tools and replying to our bug reports.

Author information

Author notes

    • Ibrahim Numanagić & James K Bonfield

    These authors contributed equally to this work.

Affiliations

  1. School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada.

    • Ibrahim Numanagić, Faraz Hach & S Cenk Sahinalp

  2. Wellcome Trust Sanger Institute, Hinxton, UK.

    • James K Bonfield

  3. Vancouver Prostate Centre, Vancouver, British Columbia, Canada.

    • Faraz Hach & S Cenk Sahinalp

  4. Institut für Informationsverarbeitung, Leibniz Universität, Hannover, Germany.

    • Jan Voges & Jörn Ostermann

  5. École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland.

    • Claudio Alberti & Marco Mattavelli

  6. School of Informatics and Computing, Indiana University, Bloomington, Indiana, USA.

    • S Cenk Sahinalp

Authors

  1. Ibrahim Numanagić

  2. James K Bonfield

  3. Faraz Hach

  4. Jan Voges

  5. Jörn Ostermann

  6. Claudio Alberti

  7. Marco Mattavelli

  8. S Cenk Sahinalp

Contributions

The study was initiated by I.N., C.A. and M.M. I.N. designed the benchmarking framework and performed the experiments. J.K.B. evaluated the framework. I.N., J.K.B., J.V., J.O., F.H., C.A., M.M. and S.C.S. contributed to writing the manuscript. S.C.S. and F.H. oversaw the project.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to S Cenk Sahinalp.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figure 1, Supplementary Tables 1–7 and Supplementary Notes 1–6.

About this article

DOI

https://doi.org/10.1038/nmeth.4037

Further reading