Challenges in benchmarking metagenomic profilers

Sun, Zheng; Huang, Shi; Zhang, Meng; Zhu, Qiyun; Haiminen, Niina; Carrieri, Anna Paola; Vázquez-Baeza, Yoshiki; Parida, Laxmi; Kim, Ho-Cheol; Knight, Rob; Liu, Yang-Yu

doi:10.1038/s41592-021-01141-3

Analysis
Published: 13 May 2021

Nature Methods volume 18, pages 618–626 (2021)

Abstract

Accurate microbial identification and abundance estimation are crucial for metagenomics analysis. Various methods for classification of metagenomic data and estimation of taxonomic profiles, broadly referred to as metagenomic profilers, have been developed. Nevertheless, benchmarking of metagenomic profilers remains challenging because some tools are designed to report relative sequence abundance while others report relative taxonomic abundance. Here we show how misleading conclusions can be drawn by neglecting this distinction between relative abundance types when benchmarking metagenomic profilers. Moreover, we show compelling evidence that interchanging sequence abundance and taxonomic abundance will influence both per-sample summary statistics and cross-sample comparisons. We suggest that the microbiome research community pay attention to potentially misleading biological conclusions arising from this issue when benchmarking metagenomic profilers, by carefully considering the type of abundance data that were analyzed and interpreted and clearly stating the strategy used for metagenomic profiling.
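The distinction between the two abundance types can be made concrete with a minimal sketch. It assumes a simplified model that the abstract does not spell out: reads are sampled in proportion to genome length times genome copy number, with no ploidy, coverage or database bias. The function names, taxon labels and genome lengths below are hypothetical and chosen only for illustration.

```python
def taxonomic_to_sequence(tax_abund, genome_len):
    """Convert relative taxonomic abundance (fraction of genomes)
    to relative sequence abundance (fraction of sequenced bases).

    Assumption: each taxon contributes sequence in proportion to
    its genome length times its share of genomes.
    """
    weighted = {t: tax_abund[t] * genome_len[t] for t in tax_abund}
    total = sum(weighted.values())
    return {t: w / total for t, w in weighted.items()}


def sequence_to_taxonomic(seq_abund, genome_len):
    """Inverse conversion: divide each sequence share by genome
    length, then renormalize so the values sum to 1."""
    weighted = {t: seq_abund[t] / genome_len[t] for t in seq_abund}
    total = sum(weighted.values())
    return {t: w / total for t, w in weighted.items()}


# Hypothetical two-taxon community with equal numbers of genomes
# but a fourfold difference in genome length (base pairs).
genome_len = {"taxon_A": 2.0e6, "taxon_B": 8.0e6}
tax_abund = {"taxon_A": 0.5, "taxon_B": 0.5}

seq_abund = taxonomic_to_sequence(tax_abund, genome_len)
round_trip = sequence_to_taxonomic(seq_abund, genome_len)

for taxon in tax_abund:
    print(taxon, round(seq_abund[taxon], 3), round(round_trip[taxon], 3))
```

Under these assumptions the community is evenly split by taxonomic abundance, yet taxon_B accounts for about 80% of the sequence abundance. Scoring a profiler that reports one type against a ground truth expressed in the other would therefore penalize it even if its output were exactly correct.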

**Fig. 1: Comparison of profiling results.**

**Fig. 2: Correlation between sequence and taxonomic abundance in synthetic profiles based on different kingdoms.**

**Fig. 3: Quantitative and qualitative benchmarking results of four representative metagenomic profilers using 25 simulated communities.**

**Fig. 4: Alpha diversity based on sequence and taxonomic abundance.**

**Fig. 5: Ordination analyses of simulated profiles based on rJSD.**

Data availability

All simulated datasets can be downloaded from https://figshare.com/projects/Challenges_in_Benchmarking_Metagenomic_Profilers/79916. Source data are provided with this paper.

Code availability

R scripts used in this paper are available at https://github.com/shihuang047/re-benchmarking.

Acknowledgements

Research reported in this publication was supported by grant nos. R01AI141529, R01HD093761, R01AG067744, UH3OD023268, U19AI095219 and U01HL089856 from the National Institutes of Health. This work was also supported by IBM Research through the AI Horizons Network, UC San Diego AI for Healthy Living program in partnership with the UC San Diego Center for Microbiome Innovation.

Author information

These authors contributed equally: Zheng Sun, Shi Huang.

Authors and Affiliations

Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
Zheng Sun & Yang-Yu Liu
Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
Shi Huang, Qiyun Zhu, Yoshiki Vázquez-Baeza & Rob Knight
Center for Microbiome Innovation, Jacobs School of Engineering, University of California, San Diego, La Jolla, CA, USA
Shi Huang, Qiyun Zhu, Yoshiki Vázquez-Baeza & Rob Knight
Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, China
Meng Zhang
IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
Niina Haiminen & Laxmi Parida
IBM Research Europe, The Hartree Centre, Warrington, UK
Anna Paola Carrieri
AI and Cognitive Software, IBM Research-Almaden, San Jose, CA, USA
Ho-Cheol Kim
Department of Computer Science & Engineering, University of California, San Diego, La Jolla, CA, USA
Rob Knight
Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
Rob Knight

Contributions

Y.-Y.L. and R.K. conceived and designed the analysis. Z.S. and S.H. led the analysis. M.Z., Q.Z., N.H., A.P.C., Y.V.-B., L.P. and H.-C.K. contributed evaluation strategies. All authors analyzed the results. Z.S., S.H., Y.-Y.L. and R.K. wrote the paper. All authors edited the paper.

Corresponding authors

Correspondence to Rob Knight or Yang-Yu Liu.

Ethics declarations

Competing interests

This work received support from IBM Research through the AI Horizons Network. Coauthors N.H., A.P.C., L.P. and H.-C.K. are employees of IBM. The authors declare no other competing interests.

Additional information

Peer review information Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Lin Tang was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sun, Z., Huang, S., Zhang, M. et al. Challenges in benchmarking metagenomic profilers. Nat Methods 18, 618–626 (2021). https://doi.org/10.1038/s41592-021-01141-3

Received: 16 November 2020
Accepted: 02 April 2021
Published: 13 May 2021
Issue Date: June 2021
DOI: https://doi.org/10.1038/s41592-021-01141-3
