Abstract
Microbial signatures have emerged as promising biomarkers for disease diagnostics and prognostics, yet their variability across studies calls for a standardized approach to biomarker research. We therefore introduce xMarkerFinder, a four-stage computational framework for microbial biomarker identification with comprehensive validation across cross-cohort datasets. The four stages are differential signature identification, model construction, model validation and biomarker interpretation. xMarkerFinder enables the identification and validation of reproducible biomarkers for cross-cohort studies, along with the establishment of classification models and the exploration of potential microbiome-induced mechanisms. Originally developed for gut microbiome research, xMarkerFinder has an adaptable design that makes it applicable to various microbial habitats and data types. Unlike existing biomarker research tools, which typically concentrate on a single aspect, xMarkerFinder combines a feature selection process designed to address heterogeneity between cohorts, extensive internal and external validations, and detailed specificity assessments. Execution time varies with sample size, the selected algorithm and the available computational resources. xMarkerFinder is accessible via GitHub (https://github.com/tjcadd2020/xMarkerFinder) and supports users with diverse levels of expertise through several execution options: step-by-step scripts with detailed tutorials and frequently asked questions, a single-command execution script, a ready-to-use Docker image and a user-friendly web server (https://www.biosino.org/xmarkerfinder).
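To make the idea of external validation across cohorts concrete, the following is a minimal sketch of leave-one-cohort-out evaluation: a model is trained on all cohorts except one and then tested on the held-out cohort. It is not the xMarkerFinder implementation. The nearest-centroid classifier, the function names and the toy abundance values are all assumptions chosen only to keep the example self-contained.

```python
from statistics import mean


def fit_centroids(samples):
    """Return the per-class mean feature vector from (features, label) pairs."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def leave_one_cohort_out(cohorts):
    """Train on every cohort but one, test on the held-out cohort.

    Returns a dict mapping each held-out cohort name to its accuracy.
    """
    results = {}
    for held_out, test_samples in cohorts.items():
        train = [
            sample
            for name, samples in cohorts.items()
            if name != held_out
            for sample in samples
        ]
        centroids = fit_centroids(train)
        correct = sum(predict(centroids, f) == y for f, y in test_samples)
        results[held_out] = correct / len(test_samples)
    return results


# Hypothetical toy data: relative abundances of two made-up taxa per sample.
toy_cohorts = {
    "cohort_A": [([0.8, 0.1], "case"), ([0.7, 0.2], "case"),
                 ([0.2, 0.7], "control"), ([0.1, 0.8], "control")],
    "cohort_B": [([0.9, 0.2], "case"), ([0.6, 0.3], "case"),
                 ([0.3, 0.6], "control"), ([0.2, 0.9], "control")],
    "cohort_C": [([0.7, 0.1], "case"), ([0.4, 0.5], "control"),
                 ([0.1, 0.9], "control")],
}

loco_accuracy = leave_one_cohort_out(toy_cohorts)
```

In practice, a large gap between within-cohort and held-out-cohort performance is one sign that selected features reflect cohort-specific effects rather than reproducible biomarkers.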
Key points
- The authors describe xMarkerFinder, a four-stage computational framework for microbial biomarker identification with comprehensive validations from cross-cohort datasets.
- xMarkerFinder is the first computational framework aggregating meta-analyses and machine learning models for the establishment and validation of universally robust microbial biomarkers across multiple cohorts.
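As a generic illustration of what aggregating evidence across cohorts can look like, the sketch below pools per-cohort effect sizes for a single feature using inverse-variance (fixed-effect) weighting. This is a textbook meta-analysis formula, not necessarily the method used inside xMarkerFinder. The function name and the example effect sizes are hypothetical.

```python
import math


def pool_fixed_effect(effects, std_errors):
    """Inverse-variance weighted pooled effect and its standard error.

    effects:     per-cohort effect sizes for one feature (e.g., log fold changes)
    std_errors:  matching per-cohort standard errors (all must be > 0)
    """
    if len(effects) != len(std_errors) or not effects:
        raise ValueError("effects and std_errors must be non-empty and equal length")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical effect sizes for one taxon in two cohorts; the more precise
# cohort (smaller standard error) receives the larger weight.
pooled_effect, pooled_se = pool_fixed_effect([0.5, 1.5], [0.5, 1.0])
```

A random-effects model would be the usual alternative when between-cohort heterogeneity is expected to be substantial.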
Data availability
Example input/output files, code, frequently asked questions and a user message board for xMarkerFinder are provided in our GitHub repository (https://github.com/tjcadd2020/xMarkerFinder). Source data used to generate Figs. 3 and 9 are available as Supplementary information.
Code availability
The single-command execution option and the step-by-step scripts for xMarkerFinder can be obtained at https://github.com/tjcadd2020/xmarkerfinder. The ready-to-use Docker image can be pulled from Docker Hub (https://hub.docker.com/r/tjcadd2022/xmarkerfinder), and the Dockerfile used to create it is also provided. The user-friendly web server for xMarkerFinder is available at https://www.biosino.org/xmarkerfinder/. The code in this protocol has been peer reviewed.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (grant numbers 92251307 to R.Z. and G.Z., 82170542 to R.Z., 82000536 to N.J.), and the National Key Research and Development Program of China (grant number 2021YFF0703702 to R.Z.). We thank X. Huang, K. Chen and Y. Huang for testing xMarkerFinder and providing their constructive feedback.
Author information
Contributions
N.J., R.Z. and G.Z. conceived and designed the study. W.G. and W.Lin wrote the code and the step-by-step protocol. Q.L. and W.Li designed the user interface and wrote the related code. W.G., W.Lin and N.J. drafted the manuscript. W.C., W.Y., X.Z. and S.G. tested the protocol. L.L., D.W., G.Z., R.Z. and N.J. reviewed and edited the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Protocols thanks Kang Ning, Jiachao Zhang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
Key references using this protocol
Wu, Y. et al. Nat. Commun. 12, 3063 (2021): https://doi.org/10.1038/s41467-021-23265-y
Liu, N.-N. et al. Nat. Microbiol. 7, 238–250 (2022): https://doi.org/10.1038/s41564-021-01030-7
Gao, S. et al. Gut Microbes 15, 2221428 (2023): https://doi.org/10.1080/19490976.2023.2221428
Gao, W. et al. Gut Microbes 15, 2245562 (2023): https://doi.org/10.1080/19490976.2023.2245562
Zhu, X. et al. Commun. Biol. 7, 24 (2024): https://doi.org/10.1038/s42003-023-05714-0
Key data used in this protocol
Liu, N.-N. et al. Nat. Microbiol. 7, 238–250 (2022): https://doi.org/10.1038/s41564-021-01030-7
Supplementary information
Supplementary Information
Supplementary Figs. 1 and 2 and Table 1.
Supplementary Data 1
Example data for the execution of xMarkerFinder.
Source data
Source Data Fig. 3
Statistical source data.
Source Data Fig. 9
Statistical source data.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gao, W., Lin, W., Li, Q. et al. Identification and validation of microbial biomarkers from cross-cohort datasets using xMarkerFinder. Nat Protoc 19, 2803–2830 (2024). https://doi.org/10.1038/s41596-024-00999-9
DOI: https://doi.org/10.1038/s41596-024-00999-9