Abstract
Microbiome data, metadata and analytical workflows have become ‘big’ in terms of volume and complexity. Although the infrastructure and technologies to share data have been established, the interdisciplinary and multi-omic nature of the field can make resources difficult to identify and use. Following best practices for data deposition requires substantial effort, with sometimes little obvious reward. Gaps remain where microbiome-specific resources for data sharing or reproducibility do not yet exist. We outline available best practices, challenges to their adoption and opportunities in data sharing in microbiome research. We showcase examples of best practices and advocate for their enforcement and incentivization for data sharing. This includes recognition of data curation and sharing endeavours by individuals, institutions, journals and funders. Opportunities for progress include enabling microbiome-specific databases to incorporate future methods for data analysis, integration and reuse.
Acknowledgements
We thank E. Pelletier for providing helpful input and acknowledge funding by the German Research Foundation (NFDI4Microbiota, project no. 460129525 to A.C.M.) and the NIH National Institute of Diabetes and Digestive and Kidney Diseases (grant no. R24DK110499 to C.H.).
Author information
Contributions
C.H. and A.C.M. wrote the paper with comments from R.D.F. All authors discussed the content.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Microbiology thanks Adina Howe and Elisha Wood-Charlson for their contribution to the peer review of this work.
About this article
Cite this article
Huttenhower, C., Finn, R.D. & McHardy, A.C. Challenges and opportunities in sharing microbiome data and analyses. Nat Microbiol 8, 1960–1970 (2023). https://doi.org/10.1038/s41564-023-01484-x
DOI: https://doi.org/10.1038/s41564-023-01484-x