Learning representations of microbe–metabolite interactions

Matters Arising to this article was published on 04 January 2021


Integrating multiomics datasets is critical for microbiome research; however, inferring interactions across omics datasets poses multiple statistical challenges. We solve this problem by using neural networks to estimate the conditional probability that each molecule is present given the presence of a specific microorganism. Using known environmental (desert soil biocrust wetting) and clinical (cystic fibrosis lung) examples, we show that the method recovers microbe–metabolite relationships, and we demonstrate how it can discover relationships between microbially produced metabolites and inflammatory bowel disease.
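As a rough illustration of what "estimating the conditional probability that each molecule is present given a specific microorganism" can look like, the sketch below computes a row-normalized probability table from low-rank embeddings with a softmax. The embedding parameterization, the variable names (`U`, `V`, `bias`) and the random values are assumptions made for this example; this is not the mmvec implementation, and no training step is shown.

```python
import numpy as np

def conditional_probabilities(U, V, bias):
    """Return a matrix whose row i estimates P(metabolite j | microbe i).

    U    : (n_microbes, latent_dim) hypothetical microbe embeddings
    V    : (latent_dim, n_metabolites) hypothetical metabolite embeddings
    bias : (n_metabolites,) hypothetical per-metabolite offsets
    """
    logits = U @ V + bias
    # Subtract the row maximum before exponentiating for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    expd = np.exp(logits)
    # Softmax: each row sums to 1, so it is a distribution over metabolites.
    return expd / expd.sum(axis=1, keepdims=True)

# Toy dimensions and random parameters, for illustration only.
rng = np.random.default_rng(0)
n_microbes, n_metabolites, latent_dim = 4, 6, 2
U = rng.normal(size=(n_microbes, latent_dim))
V = rng.normal(size=(latent_dim, n_metabolites))
bias = rng.normal(size=n_metabolites)

P = conditional_probabilities(U, V, bias)
```

Each row of `P` is a probability distribution over metabolites for one microbe, so ranking the entries of a row gives the molecules most associated with that microbe under the (here untrained) parameters.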

Fig. 1: Input data types and mmvec neural network architecture.
Fig. 2: Simulation benchmarks.
Fig. 3: Metabolites released by M. vaginatus after the biocrust wetting event.
Fig. 4: Investigation of P. aeruginosa-associated molecules.
Fig. 5: Microbe–metabolite co-occurrences across study of hepatocarcinoma progression in the context of innate immunity in a mouse model.
Fig. 6: Microbe–metabolite interactions of the human microbiome in association with IBD samples.

Data availability

The cystic fibrosis sequencing data and metadata can be found at under study ID 10863. The corresponding GNPS analysis can be accessed at The biocrust soil data were retrieved from the supplemental section in Swenson et al.30. The HFD murine model case study 16S rRNA data can be found at under study ID 10856. The HFD murine model case study data are publicly available at under MassIVE ID MSV000080918. The GNPS analysis for this study can be accessed at The IBD data used can be found under

Code availability

The software implementing the mmvec algorithm can be found under Differential abundance analyses in the HFD study were performed using L2-regularized multinomial regression using software available at The software used to build the multiomics network can be found at Biplots were generated using Emperor47.


References

  1. Jansson, J. K. & Baker, E. S. A multi-omic future for microbiome studies. Nat. Microbiol. 1, 645 (2016).

  2. Noecker, C. et al. Metabolic model-based integration of microbiome taxonomic and metabolomic profiles elucidates mechanistic links between ecological and metabolic variation. mSystems 1, e00013–e00015 (2016).

  3. Mallick, H. et al. Predictive metabolomic profiling of microbial communities using amplicon or metagenomic sequences. Nat. Commun. 10, 3136 (2019).

  4. Knight, R. et al. Best practices for analysing microbiomes. Nat. Rev. Microbiol. 16, 410–422 (2018).

  5. Meng, C. et al. Dimension reduction techniques for the integrative analysis of multi-omics data. Brief. Bioinform. 17, 628–641 (2016).

  6. Le Gall, G. et al. Metabolomics of fecal extracts detects altered metabolic activity of gut microbiota in ulcerative colitis and irritable bowel syndrome. J. Proteome Res. 10, 4208–4218 (2011).

  7. Rohart, F. et al. mixOmics: an R package for ‘omics feature selection and multiple data integration. PLoS Comput. Biol. 13, e1005752 (2017).

  8. Wang, B. et al. Similarity network fusion for aggregating data types on a genomic scale. Nat. Methods 11, 333 (2014).

  9. Argelaguet, R. et al. Multi-omics factor analysis—a framework for unsupervised integration of multi-omics data sets. Mol. Syst. Biol. 14, e8124 (2018).

  10. Ter Braak, C. J. F. & Verdonschot, P. F. M. Canonical correspondence analysis and related multivariate methods in aquatic ecology. Aquat. Sci. 57, 255–289 (1995).

  11. Witten, D. M., Tibshirani, R. & Hastie, T. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10, 515–534 (2009).

  12. Bodein, A., Chapleur, O., Droit, A. & Lê Cao, K. A. A generic multivariate framework for the integration of microbiome longitudinal studies with other data types. Preprint at bioRxiv (2019).

  13. Friedman, J. & Alm, E. J. Inferring correlation networks from genomic survey data. PLoS Comput. Biol. 8, e1002687 (2012).

  14. Kurtz, Z. D. et al. Sparse and compositionally robust inference of microbial ecological networks. PLoS Comput. Biol. 11, e1004226 (2015).

  15. Weiss, S. et al. Correlation detection strategies in microbial data sets vary widely in sensitivity and precision. ISME J. 10, 1669–1681 (2016).

  16. Vandeputte, D. et al. Quantitative microbiome profiling links gut community variation to microbial load. Nature 551, 507–511 (2017).

  17. Gloor, G. B. et al. Microbiome datasets are compositional: and this is not optional. Front. Microbiol. 8, 2224 (2017).

  18. Tang, K., Page, J. S. & Smith, R. D. Charge competition and the linear dynamic range of detection in electrospray ionization mass spectrometry. J. Am. Soc. Mass Spectrom. 15, 1416–1423 (2004).

  19. King, R., Bonfiglio, R., Fernandez-Metzler, C., Miller-Stein, C. & Olah, T. Mechanistic investigation of ionization suppression in electrospray ionization. J. Am. Soc. Mass Spectrom. 11, 942–950 (2000).

  20. Matuszewski, B. K., Constanzer, M. L. & Chavez-Eng, C. M. Strategies for the assessment of matrix effect in quantitative bioanalytical methods based on HPLC–MS/MS. Anal. Chem. 75, 3019–3030 (2003).

  21. Kalivodová, A. et al. PLS-DA for compositional data with application to metabolomics. J. Chemom. 29, 21–28 (2015).

  22. Jansson, J. K. & Baker, E. S. A multi-omic future for microbiome studies. Nat. Microbiol. 1, 16049 (2016).

  23. Lovell, D., Pawlowsky-Glahn, V., Egozcue, J. J., Marguerat, S. & Bähler, J. Proportionality: a valid alternative to correlation for relative data. PLoS Comput. Biol. 11, e1004075 (2015).

  24. Tipton, L. et al. Fungi stabilize connectivity in the lung and skin microbial ecosystems. Microbiome 6, 12 (2018).

  25. Morton, J. T. et al. Establishing microbial composition measurement standards with reference frames. Nat. Commun. 10, 2719 (2019).

  26. Abadi, M. et al. TensorFlow: a system for large-scale machine learning. In Proc. 12th Symposium on Operating Systems Design and Implementation 265–283 (USENIX Association, 2016).

  27. Quinn, R. A. et al. Niche partitioning of a pathogenic microbiome driven by chemical gradients. Sci. Adv. 4, eaau1908 (2018).

  28. Shalapour, S. et al. Inflammation-induced IgA+ cells dismantle anti-liver cancer immunity. Nature 551, 340–345 (2017).

  29. Lloyd-Price, J. et al. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature 569, 655–662 (2019).

  30. Swenson, T. L., Karaoz, U., Swenson, J. M., Bowen, B. P. & Northen, T. R. Linking soil biology and chemistry in biological soil crust using isolate exometabolomics. Nat. Commun. 9, 19 (2018).

  31. Baran, R. et al. Exometabolite niche partitioning among sympatric soil bacteria. Nat. Commun. 6, 8289 (2015).

  32. Quinn, R. A. et al. A Winogradsky-based culture system shows an association between microbial fermentation and cystic fibrosis exacerbation. ISME J. 9, 1024–1038 (2015).

  33. Moree, W. J. et al. Interkingdom metabolic transformations captured by microbial imaging mass spectrometry. Proc. Natl Acad. Sci. USA 109, 13811–13816 (2012).

  34. Wang, M. et al. Sharing and community curation of mass spectrometry data with global natural products social molecular networking. Nat. Biotechnol. 34, 828–837 (2016).

  35. Maier, R. M. & Soberon-Chavez, G. Pseudomonas aeruginosa rhamnolipids: biosynthesis and potential applications. Appl. Microbiol. Biotechnol. 54, 625–633 (2000).

  36. Wood, T. L. et al. Rhamnolipids from Pseudomonas aeruginosa disperse the biofilms of sulfate-reducing bacteria. NPJ Biofilms Microbiomes 4, 22 (2018).

  37. Allen, L. et al. Pyocyanin production by Pseudomonas aeruginosa induces neutrophil apoptosis and impairs neutrophil-mediated host defenses in vivo. J. Immunol. 174, 3643–3649 (2005).

  38. Quinn, R. A. et al. Chemical impacts of the microbiome across scales reveal novel conjugated bile acids. Preprint at bioRxiv (2019).

  39. Paczosa, M. K. & Mecsas, J. Klebsiella pneumoniae: going on the offense with a strong defense. Microbiol. Mol. Biol. Rev. 80, 629–661 (2016).

  40. Tiratterra, E. et al. Role of bile acids in inflammatory bowel disease. Ann. Gastroenterol. 31, 266 (2018).

  41. Hofmann, A. F. & Eckmann, L. How bile acids confer gut mucosal protection against bacteria. Proc. Natl Acad. Sci. USA 103, 4333–4334 (2006).

  42. Begley, M., Gahan, C. G. M. & Hill, C. The interaction between bacteria and bile. FEMS Microbiol. Rev. 29, 625–651 (2005).

  43. Okada, Y. et al. Propionibacterium freudenreichii component 1,4-dihydroxy-2-naphthoic acid (DHNA) attenuates dextran sodium sulphate induced colitis by modulation of bacterial flora and lymphocyte homing. Gut 55, 681–688 (2006).

  44. Foligne, B. et al. Immunomodulation properties of multi-species fermented milks. Food Microbiol. 53, 60–69 (2016).

  45. Ple, C. et al. Combining selected immunomodulatory Propionibacterium freudenreichii and Lactobacillus delbrueckii strains: reverse engineering development of an anti-inflammatory cheese. Mol. Nutr. Food Res. 60, 935–948 (2016).

  46. Colliou, N. et al. Commensal Propionibacterium strain UF1 mitigates intestinal inflammation via Th17 cell regulation. J. Clin. Invest. 127, 3970–3986 (2017).

  47. Vázquez-Baeza, Y., Pirrung, M., Gonzalez, A. & Knight, R. Emperor: a tool for visualizing high-throughput microbial community data. Gigascience 2, 16 (2013).

  48. Nasrabadi, N. M. Pattern recognition and machine learning. J. Electron. Imaging 16, 049901 (2007).

  49. Pawlowsky-Glahn, V., Egozcue, J. J. & Tolosana-Delgado, R. Modeling and Analysis of Compositional Data (John Wiley & Sons, 2015).

  50. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S. & Dean, J. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26 (eds Burges, C. J. C. et al.) 3111–3119 (NIPS, 2013).

  51. Koren, Y., Bell, R. & Volinsky, C. Matrix factorization techniques for recommender systems. Computer 42, 30–37 (2009).

  52. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at arXiv (2014).

  53. Blei, D. M., Ng, A. Y. & Jordan, M. I. Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003).

  54. Sankaran, K. & Holmes, S. P. Latent variable modeling for the microbiome. Biostatistics 20, 599–614 (2019).

  55. Aitchison, J. & Greenacre, M. Biplots of compositional data. J. R. Stat. Soc. Ser. C Appl. Stat. 51, 375–392 (2002).

  56. Aitchison, J. & Ng, K. W. Conditional compositional biplots: theory and application. DUGiDocs (2005).

  57. Martín-Fernández, J. A., Pawlowsky-Glahn, V., Egozcue, J. J. & Tolosana-Delgado, R. Advances in principal balances for compositional data. Math. Geosci. 50, 273–298 (2018).

  58. Bolyen, E. et al. Reproducible, interactive, scalable, and extensible microbiome data science using QIIME 2. Nat. Biotechnol. 37, 852–857 (2019).



Acknowledgements

We would like to thank V. Pawlowsky, J. J. Egozcue and S. Holmes for their insights on the geometry of this neural network model. We would also like to thank N. Bokulich for feedback and contributions on the mmvec software package. T.L.S., M.W.V.G. and T.R.N. acknowledge funding from the Office of Science Early Career Research Program, Office of Biological and Environmental Research of the U.S. Department of Energy under contract number DE-AC02-05CH11231 to Lawrence Berkeley National Laboratory. This study was in part supported by grant P41GM103484 for the Center for Computational Mass Spectrometry and instrument support through National Institutes of Health grants S10RR029121 and R03 CA211211 on reuse of metabolomics data. Y.V.B. is funded by the Janssen Human Microbiome Institute through a collaboration with the Center for Microbiome Innovation. J.T.M. was funded by National Science Foundation grant GRFP DGE-1144086. R.K. and S.J.S. have been funded by Janssen under grant number 20175015 and the Alfred P. Sloan Foundation under grant number G-2017-9838.

Author information




Contributions

J.T.M. wrote the mmvec algorithm, conducted the benchmarks and ran all of the analyses. A.A.A. and L.F.N. preprocessed and annotated the metabolomics data. A.A.A. provided insights into the HFD study. J.R.F. provided insights behind word2vec and topic modeling. M.H.B. benchmarked SPIEC-EASI. R.A.Q. provided insights behind the cystic fibrosis study and simulations. Y.V.-B. provided insights behind the interpretation of the IBD analysis. M.W. developed the GNPS workflow for mmvec. N.A.B. developed the heat map visualizations. A.W. developed the network visualizations. T.L.S., M.W.V.G. and T.N. provided insights into the biocrust soils experiment. R.B. provided insights behind the simulation benchmarks. S.J.S. provided ecological insights. P.C.D. provided insights behind metabolomics. All authors were involved with writing the manuscript.

Corresponding author

Correspondence to Rob Knight.

Ethics declarations

Competing interests

Mingxun Wang is the founder of Ometa Labs LLC. The remaining authors declare no competing interests.

Additional information

Peer review information Lei Tang was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Description of the compositionality issue.

(a) An illustration of how false negatives can occur: in the absolute abundance data, there is a strong Pearson correlation between the microbes and the metabolites across samples (n=50). These correlations disappear when considering the corresponding proportions. (b) An illustration of how false positives can occur: in the absolute abundance data, there is no correlation between the dark green molecule and the dark blue microbe (n=50 samples). However, the proportions of the same dataset show a very strong correlation between the dark blue microbe and the dark green molecule.
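The false-positive case in panel (b) can be reproduced with a small synthetic example. The sketch below is a toy construction, not the data behind the figure: all variable names and values are assumptions, with n=50 chosen only to mirror the caption. A microbe and a metabolite are built to be uncorrelated in absolute abundance, while a second microbe and a second metabolite both grow across samples; converting each dataset to proportions then induces a strong spurious correlation.

```python
import numpy as np

n = 50
i = np.arange(n)

# Absolute abundances of the microbe and metabolite of interest.
# The +/-1 patterns are chosen so that their covariance is exactly zero.
microbe = 10.0 + np.where(i % 2 == 0, 1.0, -1.0)
metabolite = 10.0 + np.where(i % 4 < 2, 1.0, -1.0)

# Hypothetical second microbe and second metabolite that both increase
# across samples, changing the total load in each dataset.
other_microbe = np.linspace(10.0, 1000.0, n)
other_metabolite = np.linspace(10.0, 1000.0, n)

# Proportions within each dataset (microbes and metabolites are
# normalized separately, as they come from separate measurements).
microbe_prop = microbe / (microbe + other_microbe)
metabolite_prop = metabolite / (metabolite + other_metabolite)

r_abs = np.corrcoef(microbe, metabolite)[0, 1]
r_prop = np.corrcoef(microbe_prop, metabolite_prop)[0, 1]

print(f"Pearson r, absolute abundances: {r_abs:.3f}")
print(f"Pearson r, proportions:         {r_prop:.3f}")
```

The absolute-abundance correlation is zero by construction, whereas the proportions of both features shrink together as the other features grow, so their Pearson correlation is close to one.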

Supplementary Figure 2 Illustration of how excessive misannotation rates can occur.

(a) Absolute abundances and relative abundances of microbes/metabolites observed in an environment over time, with each microbe/metabolite colored according to its rate of increase / decrease (n=30). (b) A scale-invariance comparison of statistical methodologies. Points are colored by the corresponding microbes in the interactions; triangle markers represent increasing metabolites and decreasing metabolites. Mmvec is the only method that remains consistent between the absolute and relative abundances.

Supplementary Figure 3 Comparison of Pearson and mmvec on Cystic Fibrosis study.

(a) Estimates of P. aeruginosa associated molecules between Pearson and the conditional probabilities calculated from the mmvec applied to the cystic fibrosis study dataset. The annotations correspond to level 2 or 3 of the metabolomics standards initiative and may correspond to different isomeric species (n=462 molecules). (b) Ranks of Pearson coefficients and conditional probabilities from the mmvec for the Rhamnolipids (n=462 molecules). (c) Pyochelin proportions vs P. aeruginosa proportions.

Supplementary Figure 4 Negative log likelihood and prediction accuracy of mmvec.

TensorBoard visualization of training error and cross-validation error of mmvec on the IBD dataset. Five different runs with differing initialization conditions are shown.

Supplementary Figure 5 GNPS job output.

An example of a job on the GNPS website with the job description and the downloadable output files from mmvec.

Supplementary information

Supplementary Information

Supplementary Figs. 1–5 and Supplementary Note

Reporting Summary

About this article

Cite this article

Morton, J.T., Aksenov, A.A., Nothias, L.F. et al. Learning representations of microbe–metabolite interactions. Nat Methods 16, 1306–1314 (2019).
