Predicting metabolomic profiles from microbial composition through neural ordinary differential equations

A preprint version of the article is available at bioRxiv.

Abstract

Characterizing the metabolic profile of a microbial community is crucial for understanding its biological function and its impact on the host or environment. Metabolomics experiments that directly measure these profiles are difficult and expensive, whereas sequencing methods that quantify the species composition of microbial communities are well developed and relatively cost-effective. Computational methods capable of predicting metabolomic profiles from microbial compositions could therefore substantially reduce the experimental effort required for metabolomic profiling. Yet, despite existing efforts, we still lack a computational method with high predictive power, general applicability and good interpretability. Here we develop a method called metabolomic profile predictor using neural ordinary differential equations (mNODE), based on a state-of-the-art family of deep neural network models. We show compelling evidence that mNODE outperforms existing methods in predicting the metabolomic profiles of human microbiomes and several environmental microbiomes. Moreover, in the case of human gut microbiomes, mNODE can naturally incorporate dietary information to further enhance the prediction of metabolomic profiles. Furthermore, susceptibility analysis of mNODE enables us to reveal microbe–metabolite interactions, which can be validated using both synthetic and real data. The results demonstrate that mNODE is a powerful tool to investigate the microbiome–diet–metabolome relationship, facilitating future research on precision nutrition.
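The abstract describes mNODE only at a high level. As a rough illustration of how a neural ODE can map a microbial composition to a metabolomic profile, the following minimal NumPy sketch assumes a hypothetical three-stage architecture: a linear input layer, a neural ODE block whose vector field is a one-layer tanh network, and a linear output layer. The layer sizes, the random untrained weights (`W_in`, `W_ode`, `W_out`), the helper names and the fixed-step fourth-order Runge–Kutta solver (used here instead of the adaptive solvers common in neural ODE work) are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical mNODE-style forward pass: input layer -> neural ODE block -> output layer.
# Weights are random and untrained; this only illustrates the data flow.
rng = np.random.default_rng(0)
N_SPECIES, N_HIDDEN, N_METABOLITES = 10, 16, 5

W_in = rng.normal(scale=0.3, size=(N_HIDDEN, N_SPECIES))    # input layer weights
W_ode = rng.normal(scale=0.3, size=(N_HIDDEN, N_HIDDEN))    # vector-field weights
b_ode = np.zeros(N_HIDDEN)                                  # vector-field bias
W_out = rng.normal(scale=0.3, size=(N_METABOLITES, N_HIDDEN))  # output layer weights


def ode_rhs(h):
    """Vector field dh/dt = tanh(W_ode h + b_ode), a one-layer network."""
    return np.tanh(W_ode @ h + b_ode)


def integrate_rk4(h0, t1=1.0, n_steps=20):
    """Integrate the hidden state from t = 0 to t = t1 with fixed-step RK4."""
    h, dt = h0.copy(), t1 / n_steps
    for _ in range(n_steps):
        k1 = ode_rhs(h)
        k2 = ode_rhs(h + 0.5 * dt * k1)
        k3 = ode_rhs(h + 0.5 * dt * k2)
        k4 = ode_rhs(h + dt * k3)
        h = h + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return h


def predict_metabolome(x):
    """Map a relative-abundance vector x (length N_SPECIES) to N_METABOLITES outputs."""
    h0 = W_in @ x             # input layer: embed the composition
    h1 = integrate_rk4(h0)    # neural ODE block
    return W_out @ h1         # output layer: read out the metabolite profile


x = rng.dirichlet(np.ones(N_SPECIES))  # toy composition that sums to 1
y_hat = predict_metabolome(x)
print(y_hat.shape)  # (5,)
```

Under the same assumptions, dietary features could simply be concatenated to `x` before the input layer; this sketch does not reproduce how mNODE actually incorporates them.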

Fig. 1: The mNODE workflow to predict metabolomic profiles from microbial species compositions and dietary information.
Fig. 2: Model comparison and validation of mNODE on synthetic data generated by the MiCRM.
Fig. 3: Performance comparison between mNODE and existing methods on real microbial community datasets.
Fig. 4: Performance of mNODE with different combinations of data types (microbial compositions, food profiles and nutritional profiles) included in the input.
Fig. 5: Using susceptibility of metabolite concentrations to microbial compositions of well-trained mNODE to infer microbe–metabolite interactions on both synthetic and real data (PRISM + NLIBD).

Data availability

The datasets of PRISM + NLIBD (ref. 33), lung samples (ref. 21) and soil biocrust samples (ref. 34) can be found at https://github.com/YDaiLab/MiMeNet/tree/master/data. The microbiome sequencing data, food frequency data and metabolomics data from VDAART (refs. 35–38) are part of the Environmental influences on Child Health Outcomes (ECHO) consortium; ECHO consortium members and other interested scientists can obtain the data directly from the ECHO Data Coordinating Center. The FNDDS (USDA’s Food and Nutrient Database for Dietary Studies; ref. 43) can be found at https://data.nal.usda.gov/dataset/food-and-nutrient-database-dietary-studies-fndds. Source data are provided with this paper.

Code availability

All code for running mNODE can be found at https://github.com/wt1005203/mNODE (ref. 58).

References

  1. Donia, M. S. & Fischbach, M. A. Small molecules from the human microbiota. Science 349, 1254766 (2015).

  2. Koh, A., De Vadder, F., Kovatcheva-Datchary, P. & Bäckhed, F. From dietary fiber to host physiology: short-chain fatty acids as key bacterial metabolites. Cell 165, 1332–1345 (2016).

  3. Koppel, N., Rekdal, V. M. & Balskus, E. P. Chemical transformation of xenobiotics by the human gut microbiota. Science 356, eaag2770 (2017).

  4. Myhrstad, M. C., Tunsjø, H., Charnock, C. & Telle-Hansen, V. H. Dietary fiber, gut microbiota, and metabolic regulation—current status in human randomized trials. Nutrients 12, 859 (2020).

  5. Lin, R., Liu, W., Piao, M. & Zhu, H. A review of the relationship between the gut microbiota and amino acid metabolism. Amino Acids 49, 2083–2090 (2017).

  6. Tyson, G. W. et al. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428, 37–43 (2004).

  7. Flint, H. J., Scott, K. P., Louis, P. & Duncan, S. H. The role of the gut microbiota in nutrition and health. Nat. Rev. Gastroenterol. Hepatol. 9, 577–589 (2012).

  8. Lloyd-Price, J. et al. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature 569, 655–662 (2019).

  9. Yang, Q. et al. Metabolomics biotechnology, applications, and future trends: a systematic review. RSC Adv. 9, 37245–37257 (2019).

  10. Castelli, F. A. et al. Metabolomics for personalized medicine: the input of analytical chemistry from biomarker discovery to point-of-care tests. Anal. Bioanal. Chem. 414, 759–789 (2022).

  11. Dias-Audibert, F. L. et al. Combining machine learning and metabolomics to identify weight gain biomarkers. Front. Bioeng. Biotechnol. 8, 6 (2020).

  12. Zheng, C., Zhang, S., Ragg, S., Raftery, D. & Vitek, O. Identification and quantification of metabolites in ¹H NMR spectra by Bayesian model selection. Bioinformatics 27, 1637–1644 (2011).

  13. Information Resources Management Association. Bioinformatics: Concepts, Methodologies, Tools, and Applications (IGI Global, 2013).

  14. Johnson, C. H. & Gonzalez, F. J. Challenges and opportunities of metabolomics. J. Cell. Physiol. 227, 2975–2981 (2012).

  15. Ayling, M., Clark, M. D. & Leggett, R. M. New approaches for metagenome assembly with short reads. Brief. Bioinform. 21, 584–594 (2020).

  16. Brumfield, K. D., Huq, A., Colwell, R. R., Olds, J. L. & Leddy, M. B. Microbial resolution of whole genome shotgun and 16S amplicon metagenomic sequencing using publicly available NEON data. PLoS ONE 15, e0228899 (2020).

  17. Garza, D. R., van Verk, M. C., Huynen, M. A. & Dutilh, B. E. Towards predicting the environmental metabolome from metagenomics with a mechanistic model. Nat. Microbiol. 3, 456–460 (2018).

  18. Noecker, C. et al. Metabolic model-based integration of microbiome taxonomic and metabolomic profiles elucidates mechanistic links between ecological and metabolic variation. mSystems 1, e00013-15 (2016).

  19. Yin, X. et al. A comparative evaluation of tools to predict metabolite profiles from microbiome sequencing data. Front. Microbiol. 11, 3132 (2020).

  20. Kettle, H., Louis, P., Holtrop, G., Duncan, S. H. & Flint, H. J. Modelling the emergent dynamics and major metabolites of the human colonic microbiota. Environ. Microbiol. 17, 1615–1630 (2015).

  21. Quinn, R. A. et al. Niche partitioning of a pathogenic microbiome driven by chemical gradients. Sci. Adv. 4, eaau1908 (2018).

  22. Wang, T., Goyal, A., Dubinkina, V. & Maslov, S. Evidence for a multi-level trophic organization of the human gut microbiome. PLoS Comput. Biol. 15, e1007524 (2019).

  23. Goyal, A., Wang, T., Dubinkina, V. & Maslov, S. Ecology-guided prediction of cross-feeding interactions in the human gut microbiome. Nat. Commun. 12, 1335 (2021).

  24. Mallick, H. et al. Predictive metabolomic profiling of microbial communities using amplicon or metagenomic sequences. Nat. Commun. 10, 3136 (2019).

  25. Le, V., Quinn, T. P., Tran, T. & Venkatesh, S. Deep in the bowel: highly interpretable neural encoder–decoder networks predict gut metabolites from gut microbiome. BMC Genom. 21, 256 (2020).

  26. Reiman, D., Layden, B. T. & Dai, Y. MiMeNet: exploring microbiome–metabolome relationships using neural networks. PLoS Comput. Biol. 17, e1009021 (2021).

  27. Morton, J. T. et al. Learning representations of microbe–metabolite interactions. Nat. Methods 16, 1306–1314 (2019).

  28. Chen, R. T., Rubanova, Y., Bettencourt, J. & Duvenaud, D. Neural ordinary differential equations. In Advances in Neural Information Processing Systems 31, 6572–6583 (NeurIPS, 2018).

  29. Lu, Y., Zhong, A., Li, Q. & Dong, B. Beyond finite layer neural networks: bridging deep architectures and numerical differential equations. In International Conference on Machine Learning 3276–3285 (PMLR, 2018).

  30. Qiu, C., Bendickson, A., Kalyanapu, J. & Yan, J. Accuracy and architecture studies of residual neural network solving ordinary differential equations. Preprint at https://doi.org/10.48550/arXiv.2101.03583 (2021).

  31. Dutta, S., Rivera-Casillas, P. & Farthing, M. W. Neural ordinary differential equations for data-driven reduced order modeling of environmental hydrodynamics. Preprint at https://doi.org/10.48550/arXiv.2104.13962 (2021).

  32. Marsland III, R. et al. Available energy fluxes drive a transition in the diversity, stability, and functional structure of microbial communities. PLoS Comput. Biol. 15, e1006793 (2019).

  33. Franzosa, E. A. et al. Gut microbiome structure and metabolic activity in inflammatory bowel disease. Nat. Microbiol. 4, 293–305 (2019).

  34. Swenson, T. L., Karaoz, U., Swenson, J. M., Bowen, B. P. & Northen, T. R. Linking soil biology and chemistry in biological soil crust using isolate exometabolomics. Nat. Commun. 9, 19 (2018).

  35. Litonjua, A. A. et al. Effect of prenatal supplementation with vitamin D on asthma or recurrent wheezing in offspring by age 3 years: the VDAART randomized clinical trial. JAMA 315, 362–370 (2016).

  36. Litonjua, A. A. et al. Six-year follow-up of a trial of antenatal vitamin D for asthma reduction. N. Engl. J. Med. 382, 525–533 (2020).

  37. Lee-Sarwar, K. A. et al. Integrative analysis of the intestinal metabolome of childhood asthma. J. Allergy Clin. Immunol. 144, 442–454 (2019).

  38. Lee-Sarwar, K. et al. Association of the gut microbiome and metabolome with wheeze frequency in childhood asthma. J. Allergy Clin. Immunol. 147, AB53 (2021).

  39. Harvard Willett Food Frequency Questionnaire (T.H. Chan School of Public Health, Department of Nutrition, Harvard Univ., 2015).

  40. Plan and Operation of the Third National Health and Nutrition Examination Survey, 1988–94 (National Center for Health Statistics, 1994).

  41. Nelson, K. M., Reiber, G. & Boyko, E. J. Diet and exercise among adults with type 2 diabetes: findings from the third National Health and Nutrition Examination Survey (NHANES III). Diabetes Care 25, 1722–1728 (2002).

  42. Marriott, B. P., Olsho, L., Hadden, L. & Connor, P. Intake of added sugars and selected nutrients in the United States, National Health and Nutrition Examination Survey (NHANES) 2003–2006. Crit. Rev. Food Sci. Nutr. 50, 228–258 (2010).

  43. Moshfegh, A. Food and Nutrient Database for Dietary Studies (US Department of Agriculture, Agricultural Research Service, Food Surveys Research Group, 2022); http://www.ars.usda.gov/nea/bhnrc/fsrg

  44. Ridlon, J. M., Kang, D.-J. & Hylemon, P. B. Bile salt biotransformations by human intestinal bacteria. J. Lipid Res. 47, 241–259 (2006).

  45. Bachmann, V. et al. Bile salts modulate the mucin-activated type VI secretion system of pandemic Vibrio cholerae. PLoS Negl. Trop. Dis. 9, e0004031 (2015).

  46. Ramírez-Pérez, O., Cruz-Ramón, V., Chinchilla-López, P. & Méndez-Sánchez, N. The role of the gut microbiota in bile acid metabolism. Ann. Hepatol. 16, 21–26 (2018).

  47. Jia, W., Xie, G. & Jia, W. Bile acid-microbiota crosstalk in gastrointestinal inflammation and carcinogenesis. Nat. Rev. Gastroenterol. Hepatol. 15, 111–128 (2018).

  48. Heinken, A. et al. Systematic assessment of secondary bile acid metabolism in gut microbes reveals distinct metabolic capabilities in inflammatory bowel disease. Microbiome 7, 75 (2019).

  49. Duboc, H. et al. Connecting dysbiosis, bile-acid dysmetabolism and gut inflammation in inflammatory bowel diseases. Gut 62, 531–539 (2013).

  50. Thomas, J. P., Modos, D., Rushbrook, S. M., Powell, N. & Korcsmaros, T. The emerging role of bile acids in the pathogenesis of inflammatory bowel disease. Front. Immunol. 13, 246 (2022).

  51. Kristal, A. R., Peters, U. & Potter, J. D. Is it time to abandon the food frequency questionnaire? Cancer Epidemiol. Biomarkers Prev. 14, 2826–2828 (2005).

  52. Scalbert, A. et al. The food metabolome: a window over dietary exposure. Am. J. Clin. Nutr. 99, 1286–1308 (2014).

  53. Bolyen, E. et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat. Biotechnol. 37, 852–857 (2019).

  54. Callahan, B. J. et al. DADA2: high-resolution sample inference from Illumina amplicon data. Nat. Methods 13, 581–583 (2016).

  55. Evans, A. M. et al. High resolution mass spectrometry improves data quantity and quality as compared to unit mass resolution mass spectrometry in high-throughput profiling metabolomics. Metabolomics 4, 1 (2014).

  56. Blum, R. E. et al. Validation of a food frequency questionnaire in Native American and Caucasian children 1 to 5 years of age. Matern. Child Health J. 3, 167–172 (1999).

  57. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://doi.org/10.48550/arXiv.1412.6980 (2014).

  58. Wang, T. wt1005203/mnode: initial release. Zenodo https://doi.org/10.5281/zenodo.7602940 (2023).

Acknowledgements

Y.-Y.L. acknowledges grants from the National Institutes of Health (R01AI141529, R01HD093761, RF1AG067744, UH3OD023268, U19AI095219 and U01HL089856). K.A.L.-S. acknowledges support from the National Institutes of Health (K08HL148178). We thank N. Laranjo for VDAART data support.

Author information

Contributions

Y.-Y.L. conceived the project. T.W. and Y.-Y.L. designed the project. T.W. performed all the numerical calculations and data analysis. T.W. processed the real data with assistance from X.-W.W. and K.A.L.-S. All authors analysed the results. T.W. and Y.-Y.L. wrote the paper. All authors edited and approved the paper.

Corresponding author

Correspondence to Yang-Yu Liu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Robert Marsland and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Comparison of SCCs of annotated metabolites on the test set NLIBD.

For each annotated metabolite, the SCC between its predicted and true values across samples is computed for every computational method. a Comparison of SCCs of all annotated metabolites between MiMeNet and ResNet. b Comparison of SCCs of well-predicted annotated metabolites between MiMeNet and ResNet; well-predicted metabolites are those with an SCC larger than 0.5 under either MiMeNet or ResNet. c Comparison of SCCs of all annotated metabolites between MiMeNet and mNODE. d Comparison of SCCs of well-predicted annotated metabolites between MiMeNet and mNODE; well-predicted metabolites are those with an SCC larger than 0.5 under either MiMeNet or mNODE.
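The per-metabolite SCC used throughout these captions can be computed column by column. The sketch below assumes predictions and measurements are arranged as arrays of shape (samples, metabolites); the helper names `per_metabolite_scc` and `well_predicted_mask` are hypothetical, and the 0.5 threshold mirrors the caption's definition of a well-predicted metabolite.

```python
import numpy as np
from scipy.stats import spearmanr


def per_metabolite_scc(y_true, y_pred):
    """Spearman correlation (SCC) between true and predicted values, per metabolite.

    y_true, y_pred: arrays of shape (n_samples, n_metabolites).
    Returns an array of length n_metabolites.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sccs = np.empty(y_true.shape[1])
    for j in range(y_true.shape[1]):
        rho, _ = spearmanr(y_true[:, j], y_pred[:, j])
        sccs[j] = rho
    return sccs


def well_predicted_mask(scc_a, scc_b, threshold=0.5):
    """True for metabolites whose SCC exceeds `threshold` under either method."""
    return (np.asarray(scc_a) > threshold) | (np.asarray(scc_b) > threshold)


# Toy usage: method A tracks the truth closely, method B is unrelated noise.
rng = np.random.default_rng(1)
y_true = rng.normal(size=(30, 4))
scc_a = per_metabolite_scc(y_true, y_true + 0.1 * rng.normal(size=y_true.shape))
scc_b = per_metabolite_scc(y_true, rng.normal(size=y_true.shape))
mask = well_predicted_mask(scc_a, scc_b)
```

A metabolite whose predicted or true values are constant yields an undefined (NaN) SCC from `scipy.stats.spearmanr`; how such cases were handled in the paper is not stated here.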

Extended Data Fig. 2 Bray–Curtis dissimilarity of the food consumption profiles in FFQs and nutritional profiles across samples in VDAART.

Histograms of the Bray–Curtis dissimilarity between all pairs of samples, computed for the food consumption profiles in FFQs and for the nutritional profiles.
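A pairwise Bray–Curtis histogram like this one can be sketched with SciPy's `pdist`. This assumes each row is one sample's food consumption (or nutrient) profile; whether the profiles were normalized before computing dissimilarities is not stated, so none is applied here, and `pairwise_bray_curtis` is a hypothetical helper.

```python
import numpy as np
from scipy.spatial.distance import pdist


def pairwise_bray_curtis(profiles):
    """Bray–Curtis dissimilarity for every pair of samples (rows of `profiles`).

    For samples u and v: sum_k |u_k - v_k| / sum_k |u_k + v_k|.
    Returns the condensed vector of n*(n-1)/2 pairwise values.
    """
    return pdist(np.asarray(profiles, dtype=float), metric="braycurtis")


# Toy usage: three samples over two food items.
profiles = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 0.0]])
d = pairwise_bray_curtis(profiles)           # pairs (0,1), (0,2), (1,2)
counts, edges = np.histogram(d, bins=5, range=(0.0, 1.0))
print(d)        # [1. 0. 1.]
print(counts)   # [1 0 0 0 2]
```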

Extended Data Fig. 3 Using susceptibility of metabolite concentrations to microbial relative abundances of well-trained mNODE to infer microbe–metabolite interactions on new synthetic data.

New synthetic data in this figure are generated by the microbial consumer-resource model with species-specific by-product generation and without overlap between the consumption and production interactions linking microbes and metabolites. See Supplementary Section 4 for more details of this model. a The susceptibility of the concentration of metabolite α (\(y_{\alpha }\)) to the relative abundance of species i (\(x_{i}\)), denoted \(s_{\alpha i}\), is defined as the ratio between the deviation in the concentration of metabolite α (\(\Delta y_{\alpha }\)) and the perturbation in the relative abundance of species i (\(\Delta x_{i}\)), that is, \(s_{\alpha i}=\Delta y_{\alpha }/\Delta x_{i}\). b Susceptibility values for all microbe–metabolite pairs in the synthetic data. c The ground-truth consumption matrix and corresponding rates in the synthetic data. All consumption rates are shown as negative values for convenience of comparison with panel b. d The ground-truth production matrix and corresponding rates in the synthetic data. e The ROC (receiver operating characteristic) curve based on TP (true positive) and FP (false positive) rates, obtained by varying the susceptibility threshold used to classify interactions.
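The susceptibility definition in panel a amounts to a finite-difference derivative of the trained model's output with respect to each species' abundance, and panel e scores those values against ground truth. The sketch below is an illustration under stated assumptions: `model` is any callable standing in for a trained mNODE, each species is perturbed by a hypothetical step `delta` without renormalizing the composition, interactions are ranked by |s| (the caption does not say whether signed values were thresholded), and the ROC AUC is computed with the Mann–Whitney formulation rather than by plotting the curve. The toy linear model makes the expected answer known: its susceptibility matrix equals its coefficient matrix.

```python
import numpy as np


def susceptibility(model, x, delta=1e-3):
    """Finite-difference estimate of s[alpha, i] = Delta y_alpha / Delta x_i.

    model: callable mapping a composition x, shape (n_species,), to
           metabolite concentrations y, shape (n_metabolites,).
    Each species i is perturbed by +delta in turn; the perturbed composition
    is NOT renormalized (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    y0 = np.asarray(model(x), dtype=float)
    s = np.empty((y0.size, x.size))
    for i in range(x.size):
        x_pert = x.copy()
        x_pert[i] += delta
        s[:, i] = (np.asarray(model(x_pert), dtype=float) - y0) / delta
    return s


def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney formulation: the probability that a true
    interaction scores higher than a non-interaction (ties count one half).
    This equals the area under the TP-rate vs FP-rate curve traced by sweeping
    a threshold over the scores."""
    labels = np.asarray(labels, dtype=bool).ravel()
    scores = np.asarray(scores, dtype=float).ravel()
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties


# Toy ground truth: a sparse signed interaction matrix A and a linear "model" y = A x,
# whose susceptibility matrix is exactly A.
rng = np.random.default_rng(2)
n_met, n_sp = 6, 8
truth = np.eye(n_met, n_sp, dtype=bool) | np.eye(n_met, n_sp, k=2, dtype=bool)
A = np.where(truth, rng.choice([-1.0, 1.0], size=truth.shape)
             * rng.uniform(0.5, 1.5, size=truth.shape), 0.0)
x0 = rng.dirichlet(np.ones(n_sp))
S = susceptibility(lambda x: A @ x, x0)
auc = roc_auc(truth, np.abs(S))   # classify interactions by |s|, ignoring sign
```

For a nonlinear model the finite-difference estimate depends on both `delta` and the reference composition, so in practice one might average over samples; whether mNODE's susceptibilities were averaged this way is not stated in this caption.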

Extended Data Fig. 4 The training sample size needed to reach good predictive performance scales linearly with the number of species in synthetic data.

Synthetic data in this figure are generated by the microbial consumer-resource model with nutrient sampling probability \(p_{n}=1.0\). For the case with 100 species and varying numbers of metabolites (100, 200, or 300), three metrics are used to compare model performance: a1 the mean SCC \(\bar{\rho }\), a2 the top-50 mean SCC \({\bar{\rho }}_{50}\), and a3 the number of metabolites with SCCs larger than 0.8 divided by the total number of metabolites, \(N_{\rho >0.8}/N_{m}\). b1-b3 The same performance metrics for the case with 200 species and varying numbers of metabolites (100, 200, or 300).
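The three summary metrics in panels a1-a3 and b1-b3 follow directly from a vector of per-metabolite SCCs. This minimal sketch (the function name `summary_metrics` is hypothetical) assumes all SCCs are finite; NaN values would need to be dropped or handled separately.

```python
import numpy as np


def summary_metrics(sccs, top_k=50, threshold=0.8):
    """Mean SCC, top-k mean SCC, and fraction of metabolites with SCC > threshold."""
    sccs = np.asarray(sccs, dtype=float)
    mean_scc = sccs.mean()                                        # bar{rho}
    top_k_mean = np.sort(sccs)[::-1][:top_k].mean()               # bar{rho}_k over the k best metabolites
    frac_above = np.count_nonzero(sccs > threshold) / sccs.size   # N_{rho>t} / N_m
    return mean_scc, top_k_mean, frac_above


mean_scc, top2_mean, frac = summary_metrics([0.9, 0.85, 0.5, 0.2], top_k=2)
print(mean_scc, top2_mean, frac)
```

Dropping the division by `sccs.size` gives the raw counts \(N_{\rho >t}\) that appear in the captions of Extended Data Figs. 7 and 8.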

Extended Data Fig. 5 The training sample size needed to reach good predictive performance scales linearly with the number of species in synthetic data for MelonnPan, MiMeNet, and mNODE.

Synthetic data in this figure are generated by the microbial consumer-resource model with nutrient sampling probability \(p_{n}=1.0\). For the case with 100 species and varying numbers of metabolites (100, 200, or 300), three metrics are used to compare model performance: the mean SCC \(\bar{\rho }\), the top-50 mean SCC \({\bar{\rho }}_{50}\), and the number of metabolites with SCCs larger than 0.8 divided by the total number of metabolites, \(N_{\rho >0.8}/N_{m}\).

Extended Data Fig. 6 For the PRISM+NLIBD dataset, the Spearman correlation coefficients ρ between predicted and true metabolite values on the test set correlate well with those on the training set.

Extended Data Fig. 7 More performance metrics to compare mNODE with existing methods on five real microbial community datasets.

For the dataset PRISM+NLIBD, seven performance metrics are used: a1 the top-3 mean SCC \({\bar{\rho }}_{3}\), a2 the top-5 mean SCC \({\bar{\rho }}_{5}\), a3 the top-10 mean SCC \({\bar{\rho }}_{10}\), a4 the top-50 mean SCC \({\bar{\rho }}_{50}\), a5 the number of metabolites with SCCs larger than 0.3, \(N_{\rho >0.3}\), a6 the number of metabolites with SCCs larger than 0.4, \(N_{\rho >0.4}\), and a7 the number of metabolites with SCCs larger than 0.5, \(N_{\rho >0.5}\). b1-b7 Performance of methods measured by seven metrics on the data from lung samples of patients with cystic fibrosis. c1-c7 Performance of methods on the data from soil biocrust samples after 5 wetting events. d1-d7 Performance of methods on the data from faecal samples of children at age 3. e1-e7 Performance of methods on the data from blood plasma samples of children at age 3.

Extended Data Fig. 8 Fifty mNODE training repeats with different initializations show highly consistent predictive performance on real microbial community datasets.

For the dataset PRISM+NLIBD, three performance metrics are adopted for comparing model performance: the mean SCC \(\bar{\rho }\), the top-50 mean SCC \({\bar{\rho }}_{50}\), and the number of metabolites with SCCs larger than 0.5, 0.4 and 0.3 (denoted \(N_{\rho >0.5}\), \(N_{\rho >0.4}\) and \(N_{\rho >0.3}\), respectively). All datasets except PRISM+NLIBD, for which NLIBD serves as the test set, are randomly divided into training and test sets at an 80/20 ratio. In all box plots, the middle orange line is the median, the box extends from the first quartile (Q1) to the third quartile (Q3) of the data, the black whiskers extend from the box by 1.5 × IQR (where IQR is the interquartile range), and outliers (unfilled black circles) are points beyond the range defined by the two whiskers. The sample size is n = 50 for all box plots.
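Two procedures in this caption can be made concrete: the random 80/20 split of samples and the box-plot summary of the repeats. In the sketch below, `split_80_20` and `box_plot_stats` are hypothetical helpers; the whiskers follow matplotlib's default `whis=1.5` convention of ending at the most extreme data point within 1.5 × IQR of the box, which is an assumption about how the plots were drawn.

```python
import numpy as np


def split_80_20(n_samples, seed=0):
    """Randomly assign 80% of sample indices to training and 20% to testing."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_train = int(round(0.8 * n_samples))
    return perm[:n_train], perm[n_train:]


def box_plot_stats(values, whis=1.5):
    """Quartiles, median, whisker ends and outliers for one box in a box plot."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    lo_limit, hi_limit = q1 - whis * iqr, q3 + whis * iqr
    inside = v[(v >= lo_limit) & (v <= hi_limit)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # whiskers stop at the most extreme observations inside the whis*IQR limits
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        # points beyond the whiskers are drawn as outliers
        "outliers": v[(v < lo_limit) | (v > hi_limit)],
    }


train_idx, test_idx = split_80_20(100)
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

In the toy box, the value 100 lies far above Q3 + 1.5 × IQR and is reported as an outlier, so the upper whisker ends at 9.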

Extended Data Fig. 9 Computed susceptibility values from five mNODE training repeats with different initializations are highly correlated with each other on the PRISM+NLIBD dataset.
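One way to quantify agreement between repeats is to flatten each repeat's metabolite-by-species susceptibility matrix and correlate the resulting vectors. The caption does not specify the correlation measure, so this sketch assumes Pearson correlation via `np.corrcoef`; `repeat_correlations` is a hypothetical helper and the simulated repeats are toy data.

```python
import numpy as np


def repeat_correlations(susceptibility_runs):
    """Pearson correlation between flattened susceptibility matrices of repeated trainings.

    susceptibility_runs: array of shape (n_repeats, n_metabolites, n_species).
    Returns an (n_repeats, n_repeats) correlation matrix.
    """
    runs = np.asarray(susceptibility_runs, dtype=float)
    flat = runs.reshape(runs.shape[0], -1)   # one row per repeat
    return np.corrcoef(flat)                 # rows are treated as variables


# Toy usage: five noisy copies of one underlying susceptibility matrix.
rng = np.random.default_rng(3)
base = rng.normal(size=(4, 6))
runs = np.stack([base + 0.1 * rng.normal(size=base.shape) for _ in range(5)])
C = repeat_correlations(runs)
off_diag = C[~np.eye(5, dtype=bool)]
print(off_diag.min())
```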

Supplementary information

Supplementary Information

Supplementary Discussion and technical details.

Reporting Summary

Supplementary Table 1

Susceptibility values for five datasets of real microbiomes.

Source data

Source Data Fig. 2

Source data for the bar and line plots.

Source Data Fig. 3

Source data for the bar plots.

Source Data Fig. 4

Source data for the bar plots.

Source Data Fig. 5

Source data for the heatmap plots and statistical source data.

Source Data Extended Data Fig. 1

Source data for the scatter plots.

Source Data Extended Data Fig. 2

Source data for the histogram plot.

Source Data Extended Data Fig. 3

Source data for the heatmap plots and statistical source data.

Source Data Extended Data Fig. 4

Source data for the line plots.

Source Data Extended Data Fig. 5

Source data for the line plots.

Source Data Extended Data Fig. 6

Source data for the scatter plot.

Source Data Extended Data Fig. 7

Source data for the bar plots.

Source Data Extended Data Fig. 8

Source data for the box plots.

Source Data Extended Data Fig. 9

Source data for the scatter plots.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, T., Wang, XW., Lee-Sarwar, K.A. et al. Predicting metabolomic profiles from microbial composition through neural ordinary differential equations. Nat Mach Intell 5, 284–293 (2023). https://doi.org/10.1038/s42256-023-00627-3

  • DOI: https://doi.org/10.1038/s42256-023-00627-3
