  • Brief Communication

A comparison of single-coverage and multi-coverage metagenomic binning reveals extensive hidden contamination

Abstract

Metagenomic binning has revolutionized the study of uncultured microorganisms. Here we compare single- and multi-coverage binning on the same set of samples, and demonstrate that multi-coverage binning produces better results than single-coverage binning and identifies contaminant contigs and chimeric bins that other approaches miss. Although resource-intensive, multi-coverage binning is a superior approach and should always be preferred over single-coverage binning.
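To illustrate why extra coverage information helps, here is a minimal sketch. It assumes that a contig's "coverage" is its mean read depth in a sample. Single-coverage binning sees one number per contig, the depth in the sample it was assembled from. Multi-coverage binning maps reads from every sample to the assembly, so each contig has a depth profile. The contig names, depth values and both distance functions are hypothetical and chosen for illustration. They are not the authors' pipeline, which is available from the repository under Code availability.

```python
import math

# Hypothetical mean read depths for three contigs across four samples.
# Index 0 is the sample the contigs were assembled from.
depths = {
    "contig_A": [20.0, 5.0, 40.0, 10.0],
    "contig_B": [21.0, 5.5, 38.0, 11.0],  # assumed to come from the same genome as A
    "contig_C": [20.5, 30.0, 2.0, 25.0],  # assumed to come from a different genome
}


def single_coverage_distance(a, b):
    """Compare contigs using only the depth in the assembly sample."""
    return abs(a[0] - b[0])


def multi_coverage_distance(a, b):
    """Euclidean distance between log-scaled depth profiles across all samples."""
    return math.sqrt(sum((math.log1p(x) - math.log1p(y)) ** 2 for x, y in zip(a, b)))


A, B, C = depths["contig_A"], depths["contig_B"], depths["contig_C"]
for name, dist in (("single", single_coverage_distance), ("multi", multi_coverage_distance)):
    print(f"{name}: A-B={dist(A, B):.2f}  A-C={dist(A, C):.2f}")
```

In the single-coverage view, contig_C appears closer to contig_A than contig_B does (0.50 versus 1.00). A depth-only signal could therefore place the wrong contig in A's bin. Across four samples, the profiles separate clearly. Real binners such as CONCOCT and MetaBAT 2 also use sequence composition, which this sketch omits.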

Fig. 1: A comparison of single- and multi-coverage metagenomic binning.
Fig. 2: The worst-performing single-coverage bin according to the mean pairwise correlation coefficient.

Data availability

Raw rumen FASTQ datasets are available under BioProject accession PRJEB21624. Raw human FASTQ datasets are available under BioProject accession PRJNA278393. Bins from Rampelli et al., assembled by Pasolli et al., are available from http://segatalab.cibio.unitn.it/data/Pasolli_et_al.html. Metagenome assemblies of the Rampelli et al. data, assembled by Pasolli et al., are available from https://www.dropbox.com/s/5qqtbyuufmgycp6/RampelliS_2015.tar.bz2. Finally, our analysis of the rumen and human datasets and bins can be downloaded from https://doi.org/10.6084/m9.figshare.19733509.

Code availability

Code for producing single- and multi-coverage assemblies and bins is available at https://github.com/WatsonLab/single_and_multiple_binning.

References

  1. Parks, D. H. et al. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nat. Microbiol. 2, 1533–1542 (2017).

  2. Almeida, A. et al. A new genomic blueprint of the human gut microbiota. Nature 568, 499–504 (2019).

  3. Pasolli, E. et al. Extensive unexplored human microbiome diversity revealed by over 150,000 genomes from metagenomes spanning age, geography, and lifestyle. Cell 176, 649–662 (2019).

  4. Nayfach, S., Shi, Z. J., Seshadri, R., Pollard, K. S. & Kyrpides, N. C. New insights from uncultivated genomes of the global human gut microbiome. Nature 568, 505–510 (2019).

  5. Almeida, A. et al. A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat. Biotechnol. 39, 105–114 (2021).

  6. Stewart, R. D. et al. Compendium of 4,941 rumen metagenome-assembled genomes for rumen microbiome biology and enzyme discovery. Nat. Biotechnol. 37, 953–961 (2019).

  7. Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods 11, 1144–1146 (2014).

  8. Kang, D. D. et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 7, e7359 (2019).

  9. Kang, D. D., Froula, J., Egan, R. & Wang, Z. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ 3, e1165 (2015).

  10. Stewart, R. D. et al. Assembly of 913 microbial genomes from metagenomic sequencing of the cow rumen. Nat. Commun. 9, 870 (2018).

  11. Glendinning, L., Genç, B., Wallace, R. J. & Watson, M. Metagenomic analysis of the cow, sheep, reindeer and red deer rumen. Sci. Rep. 11, 1990 (2021).

  12. Wilkinson, T. et al. 1200 high-quality metagenome-assembled genomes from the rumen of African cattle and their relevance in the context of sub-optimal feeding. Genome Biol. 21, 229 (2020).

  13. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).

  14. Orakov, A. et al. GUNC: detection of chimerism and contamination in prokaryotic genomes. Genome Biol. 22, 178 (2021).

  15. Rampelli, S. et al. Metagenome sequencing of the Hadza hunter-gatherer gut microbiota. Curr. Biol. 25, 1682–1693 (2015).

  16. Krueger, F., James, F., Ewels, P., Afyounian, E. & Schuster-Boeckler, B. FelixKrueger/TrimGalore: v0.6.7 - DOI via Zenodo. Zenodo https://doi.org/10.5281/zenodo.5127899 (2021).

  17. Li, D., Liu, C.-M., Luo, R., Sadakane, K. & Lam, T.-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–1676 (2015).

  18. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).

  19. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  20. Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics 36, 1925–1927 (2020).

  21. Segata, N., Börnigen, D., Morgan, X. C. & Huttenhower, C. PhyloPhlAn is a new method for improved phylogenetic and taxonomic placement of microbes. Nat. Commun. 4, 2304 (2013).

  22. Seshadri, R. et al. Cultivation and sequencing of rumen microbiome members from the Hungate1000 Collection. Nat. Biotechnol. 36, 359–367 (2018).

  23. Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 49, W293–W296 (2021).

  24. Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 11, 2864–2868 (2017).

  25. R: A Language and Environment for Statistical Computing (R Core Team, 2021).

  26. Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer, 2016).

  27. Xiao, N. ggsci: scientific journal and sci-fi themed color palettes for ‘ggplot2’. (2018).

  28. Yu, G. ggplotify: convert plot to ‘grob’ or ‘ggplot’ object. (2021).

  29. Wilke, C. O. cowplot: streamlined plot theme and plot annotations for ‘ggplot2’. (2020).

  30. Pedersen, T. L. patchwork: the composer of plots. (2020).

  31. Murrell, P. & Wen, Z. gridGraphics: redraw base graphics using ‘grid’ graphics. (2020).

  32. Wickham, H., François, R., Henry, L. & Müller, K. dplyr: a grammar of data manipulation. (2021).

Acknowledgements

The Roslin Institute forms part of, and is supported by, the Royal (Dick) School of Veterinary Studies, University of Edinburgh. This project was supported by the Biotechnology and Biological Sciences Research Council (BBSRC; BB/S006680/1, BB/R015023/1 and BB/V018450/1), including institute strategic program grant BBS/E/D/30002276.

Author information

Contributions

J.M. and M.W. carried out all analyses and wrote the paper.

Corresponding author

Correspondence to Mick Watson.

Ethics declarations

Competing interests

Mick Watson is an employee of DSM. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Anders Andersson, C. Titus Brown and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Lin Tang, in collaboration with the Nature Methods team. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Replication of results in human microbiome data, rationale for using the Pearson correlation coefficient, challenges in implementing our approach, and Supplementary Fig. 1.

Reporting Summary

Peer Review File

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mattock, J., Watson, M. A comparison of single-coverage and multi-coverage metagenomic binning reveals extensive hidden contamination. Nat Methods 20, 1170–1173 (2023). https://doi.org/10.1038/s41592-023-01934-8

  • DOI: https://doi.org/10.1038/s41592-023-01934-8