  • Protocol
Identification and validation of microbial biomarkers from cross-cohort datasets using xMarkerFinder

Abstract

Microbial signatures have emerged as promising biomarkers for disease diagnostics and prognostics, yet their variability across studies calls for a standardized approach to biomarker research. We therefore introduce xMarkerFinder, a four-stage computational framework for identifying microbial biomarkers from cross-cohort datasets with comprehensive validation. The four stages are differential signature identification, model construction, model validation and biomarker interpretation. xMarkerFinder enables the identification and validation of reproducible biomarkers for cross-cohort studies, along with the establishment of classification models and the exploration of potential microbiome-induced mechanisms. Originally developed for gut microbiome research, xMarkerFinder has an adaptable design that makes it applicable to various microbial habitats and data types. Unlike existing biomarker research tools, which typically concentrate on a single aspect, xMarkerFinder combines a feature selection process designed specifically to address heterogeneity between cohorts, extensive internal and external validations, and detailed specificity assessments. Execution time varies depending on the sample size, selected algorithm and computational resources. Accessible via GitHub (https://github.com/tjcadd2020/xMarkerFinder), xMarkerFinder supports users with diverse expertise levels through different execution options, including step-by-step scripts with detailed tutorials and frequently asked questions, a single-command execution script, a ready-to-use Docker image and a user-friendly web server (https://www.biosino.org/xmarkerfinder).
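As a rough illustration of what external validation across cohorts can look like, the sketch below builds leave-one-cohort-out splits: each cohort in turn is held out for testing while the remaining cohorts are used for training. This is a generic scheme assumed here for illustration, not necessarily the exact procedure implemented in xMarkerFinder; the function name `leave_one_cohort_out` and the cohort labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_cohort_out(
    cohort_labels: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_cohort, train_indices, test_indices) for each cohort."""
    # Group sample indices by the cohort they belong to.
    by_cohort: Dict[str, List[int]] = defaultdict(list)
    for index, cohort in enumerate(cohort_labels):
        by_cohort[cohort].append(index)

    # Hold out one cohort at a time; train on all the others.
    for held_out in sorted(by_cohort):
        test_idx = by_cohort[held_out]
        train_idx = [
            i
            for cohort, members in by_cohort.items()
            if cohort != held_out
            for i in members
        ]
        yield held_out, sorted(train_idx), test_idx


# Hypothetical sample-to-cohort assignment, for illustration only.
cohorts = ["A", "A", "B", "C", "B", "C"]
splits = list(leave_one_cohort_out(cohorts))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

A classifier would be fitted on the training indices and scored on the held-out cohort, so that each reported score reflects performance on a study the model has never seen.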

Key points

  • The authors describe xMarkerFinder, a four-stage computational framework for microbial biomarker identification with comprehensive validations from cross-cohort datasets.

  • xMarkerFinder is the first computational framework aggregating meta-analyses and machine learning models for the establishment and validation of universally robust microbial biomarkers across multiple cohorts.
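To make the idea of aggregating evidence across cohorts concrete, here is a minimal sketch of a fixed-effect, inverse-variance-weighted meta-analysis of per-cohort effect sizes for a single microbial feature, plus a check that the effect points in the same direction in every cohort. It is a textbook formula shown under stated assumptions, not the statistical method xMarkerFinder itself uses; the function names and the numbers are hypothetical.

```python
import math
from typing import List, Tuple


def pooled_effect(effects: List[float], variances: List[float]) -> Tuple[float, float]:
    """Return the inverse-variance-weighted pooled effect and its standard error."""
    if not effects or len(effects) != len(variances):
        raise ValueError("effects and variances must be non-empty and of equal length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    # Each cohort is weighted by the precision (1 / variance) of its estimate.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    estimate = sum(w * e for w, e in zip(weights, effects)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return estimate, standard_error


def consistent_direction(effects: List[float]) -> bool:
    """True if every cohort shows an effect with the same sign."""
    return all(e > 0 for e in effects) or all(e < 0 for e in effects)


# Hypothetical per-cohort effect sizes and variances for one feature.
cohort_effects = [0.5, 0.3, 0.4]
cohort_variances = [0.04, 0.04, 0.04]

estimate, standard_error = pooled_effect(cohort_effects, cohort_variances)
same_sign = consistent_direction(cohort_effects)
print(round(estimate, 3), round(standard_error, 3), same_sign)
```

Features whose pooled effect is clearly non-zero and whose direction is consistent across cohorts are the kind of candidates that would then be passed on to a classification model.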

Fig. 1: The procedure for selecting biomarkers from cross-cohort microbial profiles.
Fig. 2: The performance of microbial SNV biomarkers for the detection of adenoma (ref. 10).
Fig. 3: Validation of the microbial biomarkers for adenoma diagnosis (ref. 10).
Fig. 4: Overview of the xMarkerFinder workflow.
Fig. 5: Example output files for model construction and validation (ref. 7).
Fig. 6: The initial interface of Gephi.
Fig. 7: Gephi import report.
Fig. 8: The interface for network construction.
Fig. 9: An example microbial network (ref. 8).

Data availability

Example input/output files, code, frequently asked questions and a user message board for xMarkerFinder are provided in our GitHub repository (https://github.com/tjcadd2020/xMarkerFinder). Source data used for generating Figs. 3 and 9 can be accessed as Supplementary information.

Code availability

The single-command execution option and the step-by-step scripts for xMarkerFinder can be obtained at https://github.com/tjcadd2020/xmarkerfinder. The ready-to-use Docker image can be pulled from Docker Hub (https://hub.docker.com/r/tjcadd2022/xmarkerfinder), and the Dockerfile used to create this image is also provided. The user-friendly web server for xMarkerFinder is available at https://www.biosino.org/xmarkerfinder/. The code in this protocol has been peer reviewed.

References

  1. Cullin, N., Azevedo Antunes, C., Straussman, R., Stein-Thoeringer, C. K. & Elinav, E. Microbiome and cancer. Cancer Cell 39, 1317–1341 (2021).

  2. LaCourse, K. D., Johnston, C. D. & Bullman, S. The relationship between gastrointestinal cancers and the microbiota. Lancet Gastroenterol. Hepatol. 6, 498–509 (2021).

  3. Fan, Y. & Pedersen, O. Gut microbiota in human metabolic health and disease. Nat. Rev. Microbiol. 19, 55–71 (2021).

  4. Britton, G. J. et al. Microbiotas from humans with inflammatory bowel disease alter the balance of gut Th17 and RORγt+ regulatory T cells and exacerbate colitis in mice. Immunity 50, 212–224.e214 (2019).

  5. Mima, K. et al. Fusobacterium nucleatum in colorectal carcinoma tissue and patient prognosis. Gut 65, 1973–1980 (2016).

  6. McQuade, J. L., Daniel, C. R., Helmink, B. A. & Wargo, J. A. Modulating the microbiome to improve therapeutic response in cancer. Lancet Oncol. 20, e77–e91 (2019).

  7. Wu, Y. et al. Identification of microbial markers across populations in early detection of colorectal cancer. Nat. Commun. 12, 3063 (2021).

  8. Liu, N.-N. et al. Multi-kingdom microbiota analyses identify bacterial–fungal interactions and biomarkers of colorectal cancer across cohorts. Nat. Microbiol. 7, 238–250 (2022).

  9. Ma, S. et al. Population structure discovery in meta-analyzed microbial communities and inflammatory bowel disease using MMUPHin. Genome Biol. 23, 208 (2022).

  10. Gao, W. et al. Multimodal metagenomic analysis reveals microbial single nucleotide variants as superior biomarkers for early detection of colorectal cancer. Gut Microbes 15, 2245562 (2023).

  11. Zhu, X. et al. Multi-kingdom microbial signatures in excess body weight colorectal cancer based on global metagenomic analysis. Commun. Biol. 7, 24 (2024).

  12. Gao, S. et al. Microbial genes outperform species and SNVs as diagnostic markers for Crohn’s disease on multicohort fecal metagenomes empowered by artificial intelligence. Gut Microbes 15, 2221428 (2023).

  13. Relman, D. A. The human microbiome and the future practice of medicine. JAMA 314, 1127–1128 (2015).

  14. Gao, L. et al. Oral microbiomes: more and more importance in oral cavity and whole body. Protein Cell 9, 488–500 (2018).

  15. Byrd, A. L., Belkaid, Y. & Segre, J. A. The human skin microbiome. Nat. Rev. Microbiol. 16, 143–155 (2018).

  16. Wang, Z. et al. Inflammatory endotype-associated airway microbiome in chronic obstructive pulmonary disease clinical stability and exacerbations: a multicohort longitudinal analysis. Am. J. Respir. Crit. Care Med. 203, 1488–1502 (2021).

  17. Zhou, J. et al. Signatures of mucosal microbiome in oral squamous cell carcinoma identified using a random forest model. Cancer Manag. Res. 12, 5353–5363 (2020).

  18. Zhang, L., Liu, Y., Zheng, H. J. & Zhang, C. P. The oral microbiota may have influence on oral cancer. Front. Cell Infect. Microbiol. 9, 476 (2019).

  19. Zhao, H. et al. Variations in oral microbiota associated with oral cancer. Sci. Rep. 7, 11773 (2017).

  20. Sunagawa, S. et al. Structure and function of the global ocean microbiome. Science 348, 1261359 (2015).

  21. Salazar, G. et al. Gene expression changes and community turnover differentially shape the global ocean metatranscriptome. Cell 179, 1068–1083.e1021 (2019).

  22. Govaere, O. et al. Transcriptomic profiling across the nonalcoholic fatty liver disease spectrum reveals gene signatures for steatohepatitis and fibrosis. Sci. Transl. Med. 12, eaba4448 (2020).

  23. Pantano, L. et al. Molecular characterization and cell type composition deconvolution of fibrosis in NAFLD. Sci. Rep. 11, 18045 (2021).

  24. Gerhard, G. S. et al. Transcriptomic profiling of obesity-related nonalcoholic steatohepatitis reveals a core set of fibrosis-specific genes. J. Endocr. Soc. 2, 710–726 (2018).

  25. Segata, N. et al. Metagenomic biomarker discovery and explanation. Genome Biol. 12, R60 (2011).

  26. Xiao, L., Zhang, F. & Zhao, F. Large-scale microbiome data integration enables robust biomarker identification. Nat. Comput. Sci. 2, 307–316 (2022).

  27. Wirbel, J. et al. Microbiome meta-analysis and cross-disease comparison enabled by the SIAMCAT machine learning toolbox. Genome Biol. 22, 93 (2021).

  28. Jiang, J. MIIDL: a Python package for microbial biomarkers identification powered by interpretable deep learning. Preprint at https://arxiv.org/abs/2109.12204 (2021).

  29. Wang, Y. & LeCao, K. A. Managing batch effects in microbiome data. Brief. Bioinform. 21, 1954–1970 (2020).

  30. Ling, W. et al. Batch effects removal for microbiome data via conditional quantile regression. Nat. Commun. 13, 5418 (2022).

  31. Gibbons, S. M., Duvallet, C. & Alm, E. J. Correcting for batch effects in case-control microbiome studies. PLoS Comput. Biol. 14, e1006102 (2018).

  32. Dai, Z., Wong, S. H., Yu, J. & Wei, Y. Batch effects correction for microbiome data with Dirichlet-multinomial regression. Bioinformatics 35, 807–814 (2018).

  33. Knight, R. et al. Best practices for analysing microbiomes. Nat. Rev. Microbiol. 16, 410–422 (2018).

  34. Coelho, L. P. et al. NG-meta-profiler: fast processing of metagenomes using NGLess, a domain-specific language. Microbiome 7, 84 (2019).

  35. Faul, F., Erdfelder, E., Lang, A.-G. & Buchner, A. G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav. Res. Methods 39, 175–191 (2007).

  36. Faul, F., Erdfelder, E., Buchner, A. & Lang, A.-G. Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses. Behav. Res. Methods 41, 1149–1160 (2009).

  37. Kelly, B. J. et al. Power and sample-size estimation for microbiome studies using pairwise distances and PERMANOVA. Bioinformatics 31, 2461–2468 (2015).

  38. Casals-Pascual, C. et al. Microbial diversity in clinical microbiome studies: sample size and statistical power considerations. Gastroenterology 158, 1524–1528 (2020).

  39. Pasolli, E. et al. Accessible, curated metagenomic data through ExperimentHub. Nat. Methods 14, 1023–1024 (2017).

  40. Dai, D. et al. GMrepo v2: a curated human gut microbiome database with special focus on disease markers and cross-dataset comparison. Nucleic Acids Res. 50, D777–D784 (2022).

  41. Janssens, Y. et al. Disbiome database: linking the microbiome to disease. BMC Microbiol. 18, 50 (2018).

  42. Shoaie, S. et al. Global and temporal state of the human gut microbiome in health and disease. Preprint at Research Square https://doi.org/10.21203/rs.3.rs-339282/v1 (2021).

  43. Weiss, S. et al. Normalization and microbial differential abundance strategies depend upon data characteristics. Microbiome 5, 27 (2017).

  44. Molder, F. et al. Sustainable data analysis with Snakemake. F1000Res 10, 33 (2021).

Acknowledgements

This work was supported by the National Natural Science Foundation of China (grant numbers 92251307 to R.Z. and G.Z., 82170542 to R.Z., 82000536 to N.J.), and the National Key Research and Development Program of China (grant number 2021YFF0703702 to R.Z.). We thank X. Huang, K. Chen and Y. Huang for testing xMarkerFinder and providing their constructive feedback.

Author information

Contributions

N.J., R.Z. and G.Z. conceived and designed the study. W.G. and W.Lin wrote the code and the step-by-step protocol. Q.L. and W.Li designed the user interface and wrote the related code. W.G., W.Lin and N.J. drafted the manuscript. W.C., W.Y., X.Z. and S.G. tested the protocol. L.L., D.W., G.Z., R.Z. and N.J. reviewed and edited the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Guoqing Zhang, Ruixin Zhu or Na Jiao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Protocols thanks Kang Ning, Jiachao Zhang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this protocol

Wu, Y. et al. Nat. Commun. 12, 3063 (2021): https://doi.org/10.1038/s41467-021-23265-y

Liu, N.-N. et al. Nat. Microbiol. 7, 238–250 (2022): https://doi.org/10.1038/s41564-021-01030-7

Gao, S. et al. Gut Microbes 15, 2221428 (2023): https://doi.org/10.1080/19490976.2023.2221428

Gao, W. et al. Gut Microbes 15, 2245562 (2023): https://doi.org/10.1080/19490976.2023.2245562

Zhu, X. et al. Commun. Biol. 7, 24 (2024): https://doi.org/10.1038/s42003-023-05714-0

Key data used in this protocol

Liu, N.-N. et al. Nat. Microbiol. 7, 238–250 (2022): https://doi.org/10.1038/s41564-021-01030-7

Supplementary information

Supplementary Information

Supplementary Figs. 1 and 2 and Table 1.

Reporting Summary

Supplementary Data 1

Example data for the execution of xMarkerFinder.

Source data

Source Data Fig. 3

Statistical source data.

Source Data Fig. 9

Statistical source data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gao, W., Lin, W., Li, Q. et al. Identification and validation of microbial biomarkers from cross-cohort datasets using xMarkerFinder. Nat Protoc 19, 2803–2830 (2024). https://doi.org/10.1038/s41596-024-00999-9

  • DOI: https://doi.org/10.1038/s41596-024-00999-9
