Abstract
Metagenomic binning has revolutionized the study of uncultured microorganisms. Here we compare single- and multi-coverage binning on the same set of samples and demonstrate that multi-coverage binning produces better results than single-coverage binning and identifies contaminant contigs and chimeric bins that other approaches miss. Although resource-intensive, multi-coverage binning is the superior approach and should always be preferred over single-coverage binning.
Data availability
Raw rumen FASTQ datasets are available under BioProject accession PRJEB21624. Raw human FASTQ datasets are available under BioProject accession PRJNA278393. Bins from Rampelli et al., assembled by Pasolli et al., are available from http://segatalab.cibio.unitn.it/data/Pasolli_et_al.html. Metagenome assemblies of the Rampelli et al. data, assembled by Pasolli et al., are available from https://www.dropbox.com/s/5qqtbyuufmgycp6/RampelliS_2015.tar.bz2. Finally, our analysis of the rumen and human datasets and bins can be downloaded from https://doi.org/10.6084/m9.figshare.19733509.
Code availability
Code for producing single- and multi-coverage assemblies and bins is available at https://github.com/WatsonLab/single_and_multiple_binning.
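Independently of that repository, the intuition behind multi-coverage binning can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy depths and the 0.9 threshold are all hypothetical, and the use of a per-sample median as the bin's reference profile is an assumption made for this example. The idea is that contigs from one genome should rise and fall together in depth across samples, so a contig whose depth profile correlates poorly (Pearson) with the rest of its bin is a candidate contaminant, even when its depth in any single sample looks unremarkable.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def flag_outlier_contigs(coverage, threshold=0.9):
    """Return contigs whose depth profile correlates poorly with their bin.

    coverage maps contig name -> list of mean depths, one value per sample.
    threshold is a hypothetical cut-off chosen only for this illustration.
    """
    flagged = []
    for name, profile in coverage.items():
        others = [p for other, p in coverage.items() if other != name]
        # Reference profile: per-sample median depth of the remaining contigs.
        reference = [median(column) for column in zip(*others)]
        if pearson(profile, reference) < threshold:
            flagged.append(name)
    return flagged


# Toy bin of four contigs measured in four samples (made-up depths).
# In the first sample alone every contig sits at a depth of about 10,
# so single-sample coverage cannot tell contig_4 apart from the others.
bin_coverage = {
    "contig_1": [10.0, 40.0, 5.0, 22.0],
    "contig_2": [11.0, 38.0, 6.0, 20.0],
    "contig_3": [9.0, 41.0, 4.0, 23.0],
    "contig_4": [10.0, 3.0, 30.0, 8.0],
}

print(flag_outlier_contigs(bin_coverage))  # ['contig_4']
```

In practice, per-sample depths would come from mapping every sample's reads to each assembly, which is where the extra computational cost of multi-coverage binning arises.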
References
Parks, D. H. et al. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nat. Microbiol. 2, 1533–1542 (2017).
Almeida, A. et al. A new genomic blueprint of the human gut microbiota. Nature 568, 499–504 (2019).
Pasolli, E. et al. Extensive unexplored human microbiome diversity revealed by over 150,000 genomes from metagenomes spanning age, geography, and lifestyle. Cell 176, 649–662 (2019).
Nayfach, S., Shi, Z. J., Seshadri, R., Pollard, K. S. & Kyrpides, N. C. New insights from uncultivated genomes of the global human gut microbiome. Nature 568, 505–510 (2019).
Almeida, A. et al. A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat. Biotechnol. 39, 105–114 (2021).
Stewart, R. D. et al. Compendium of 4,941 rumen metagenome-assembled genomes for rumen microbiome biology and enzyme discovery. Nat. Biotechnol. 37, 953–961 (2019).
Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods 11, 1144–1146 (2014).
Kang, D. D. et al. MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ 7, e7359 (2019).
Kang, D. D., Froula, J., Egan, R. & Wang, Z. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ 3, e1165 (2015).
Stewart, R. D. et al. Assembly of 913 microbial genomes from metagenomic sequencing of the cow rumen. Nat. Commun. 9, 1–11 (2018).
Glendinning, L., Genç, B., Wallace, R. J. & Watson, M. Metagenomic analysis of the cow, sheep, reindeer and red deer rumen. Sci. Rep. 11, 1990 (2021).
Wilkinson, T. et al. 1200 high-quality metagenome-assembled genomes from the rumen of African cattle and their relevance in the context of sub-optimal feeding. Genome Biol. 21, 229 (2020).
Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
Orakov, A. et al. GUNC: detection of chimerism and contamination in prokaryotic genomes. Genome Biol. 22, 178 (2021).
Rampelli, S. et al. Metagenome sequencing of the Hadza hunter-gatherer gut microbiota. Curr. Biol. 25, 1682–1693 (2015).
Krueger, F., James, F., Ewels, P., Afyounian, E. & Schuster-Boeckler, B. FelixKrueger/TrimGalore: v0.6.7 - DOI via Zenodo. Zenodo https://doi.org/10.5281/zenodo.5127899 (2021).
Li, D., Liu, C.-M., Luo, R., Sadakane, K. & Lam, T.-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–1676 (2015).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at arXiv (2013).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics 36, 1925–1927 (2020).
Segata, N., Börnigen, D., Morgan, X. C. & Huttenhower, C. PhyloPhlAn is a new method for improved phylogenetic and taxonomic placement of microbes. Nat. Commun. 4, 2304 (2013).
Seshadri, R. et al. Cultivation and sequencing of rumen microbiome members from the Hungate1000 Collection. Nat. Biotechnol. 36, 359–367 (2018).
Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 49, W293–W296 (2021).
Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 11, 2864–2868 (2017).
R: A Language and Environment for Statistical Computing (R Core Team, 2021).
Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer, 2016).
Xiao, N. ggsci: scientific journal and sci-fi themed color palettes for ‘ggplot2’. (2018).
Yu, G. ggplotify: convert plot to ‘grob’ or ‘ggplot’ object. (2021).
Wilke, C. O. cowplot: streamlined plot theme and plot annotations for ‘ggplot2’. (2020).
Pedersen, T. L. patchwork: the composer of plots. (2020).
Murrell, P. & Wen, Z. gridGraphics: redraw base graphics using ‘grid’ graphics. (2020).
Wickham, H., François, R., Henry, L. & Müller, K. dplyr: a grammar of data manipulation. (2021).
Acknowledgements
The Roslin Institute forms part of, and is supported by, the Royal (Dick) School of Veterinary Studies, University of Edinburgh. This project was supported by the Biotechnology and Biological Sciences Research Council (BBSRC; BB/S006680/1, BB/R015023/1 and BB/V018450/1), including institute strategic program grant BBS/E/D/30002276.
Author information
Contributions
J.M. and M.W. carried out all analyses and wrote the paper.
Ethics declarations
Competing interests
Mick Watson is an employee of DSM, and the remaining authors declare no competing interests.
Peer review
Peer review information
Nature Methods thanks Anders Andersson, C. Titus Brown and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling editor: Lin Tang, in collaboration with the Nature Methods team. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Replication of results in human microbiome data, rationale for using Pearson correlation coefficient, challenges in implementation of our approach and Supplementary Fig. 1.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mattock, J., Watson, M. A comparison of single-coverage and multi-coverage metagenomic binning reveals extensive hidden contamination. Nat Methods 20, 1170–1173 (2023). https://doi.org/10.1038/s41592-023-01934-8
DOI: https://doi.org/10.1038/s41592-023-01934-8