We recently presented a three-pronged association study that integrated human intestinal microbiome data derived from shotgun metagenomic sequencing with untargeted serum metabolome data and measures of host physiology. Metabolome and microbiome data are high-dimensional, posing a major challenge for data integration. Here, we present a step-by-step computational protocol that details and discusses the dimensionality-reduction techniques used and the methods for subsequent integration and interpretation of such heterogeneous data types. Dimensionality reduction was achieved through a combination of data normalization approaches, binning of co-abundant genes and metabolites, and integration of prior biological knowledge. The use of prior knowledge to overcome functional redundancy across microbial species is a central advance of our method over available alternative approaches. Applying this framework, other investigators can integrate various ‘-omics’ readouts with variables of host physiology or any other phenotype of interest (e.g., connecting host and microbiome readouts to disease severity or treatment outcome in a clinical cohort) in a three-pronged association analysis to identify potential mechanistic links to be tested in experimental settings. Although we originally developed the framework for a human metabolome–microbiome study, it is generalizable to other organisms and environmental metagenomes, as well as to studies including other ‘-omics’ domains such as transcriptomics and proteomics. The provided R code runs in ~1 h on a standard PC.
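The following Python sketch illustrates the three-pronged idea described above. The protocol itself provides R code; Python is used here only to give a self-contained example. The sketch makes two moves:

- It collapses gene abundances into functional modules through a prior-knowledge gene-to-module map, so that the same function carried by different species is counted together.
- It keeps (phenotype, metabolite cluster, module) triplets only when all three pairwise Spearman correlations pass a cutoff and their signs agree.

Several details are assumptions for illustration, not the authors' implementation:

- the function names;
- the fixed |ρ| ≥ 0.5 cutoff;
- the sign-consistency rule;
- the precomputed metabolite-cluster summary values;
- the toy data.

A real analysis would use significance tests with multiple-testing control, and possibly adjustment for confounders, rather than a fixed threshold.

```python
"""Toy three-pronged association screen (illustrative sketch, not the published R code)."""
from itertools import product


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks (0.0 if either input is constant)."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def aggregate_modules(gene_abundance, gene_to_module):
    """Sum per-sample gene abundances into modules using a hypothetical prior-knowledge map.

    Summing every gene annotated to a module, whichever species carries it, is one
    simple way to absorb functional redundancy across species.
    """
    modules = {}
    for gene, values in gene_abundance.items():
        module = gene_to_module.get(gene)
        if module is None:
            continue  # unannotated genes are dropped in this sketch
        totals = modules.setdefault(module, [0.0] * len(values))
        for i, v in enumerate(values):
            totals[i] += v
    return modules


def three_pronged(phenotypes, metabolite_clusters, modules, min_abs_rho=0.5):
    """Keep (phenotype, metabolite cluster, module) triplets linked on all three edges."""
    hits = []
    for (p, pv), (m, mv), (f, fv) in product(
        phenotypes.items(), metabolite_clusters.items(), modules.items()
    ):
        r_pm, r_mf, r_pf = spearman(pv, mv), spearman(mv, fv), spearman(pv, fv)
        if min(abs(r_pm), abs(r_mf), abs(r_pf)) < min_abs_rho:
            continue
        # Sign consistency: the direct phenotype-module sign must match the sign
        # implied by the path through the metabolite cluster.
        if (r_pm * r_mf > 0) != (r_pf > 0):
            continue
        hits.append((p, m, f, round(r_pm, 3), round(r_mf, 3), round(r_pf, 3)))
    return hits


# Toy data for six samples; all names and values are invented for illustration.
phenotypes = {"HOMA-IR": [1, 2, 3, 4, 5, 6]}
metabolite_clusters = {"cluster_1": [1.1, 2.0, 2.9, 4.2, 5.1, 6.3]}
gene_abundance = {
    "geneA": [1, 2, 3, 4, 5, 6],
    "geneB": [0, 1, 1, 2, 2, 3],
    "geneC": [6, 5, 4, 3, 2, 1],
    "geneD": [1, 3, 2, 3, 1, 2],
    "geneE": [5, 5, 5, 5, 5, 5],
}
gene_to_module = {"geneA": "M_up", "geneB": "M_up", "geneC": "M_down", "geneD": "M_noise"}

modules = aggregate_modules(gene_abundance, gene_to_module)
hits = three_pronged(phenotypes, metabolite_clusters, modules)
for hit in hits:
    print(hit)
```

Running the script retains two triplets, one through `M_up` and one through `M_down`. The `M_down` triplet survives because its negative correlations are directionally consistent. `M_noise` fails the correlation cutoff, and the unannotated `geneE` never enters a module.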


Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Key references using this protocol

Pedersen, H. K. et al. Nature 535, 376–381 (2016): https://doi.org/10.1038/nature18646




Acknowledgements

This research received funding from the European Community’s Seventh Framework Programme (FP7/2007–2013): MetaHIT, grant agreement HEALTH-F4-2007-201052 and MetaCardis, grant agreement HEALTH-2012-305312. The Department of Bio and Health Informatics, Technical University of Denmark, and the Novo Nordisk Foundation Center for Basic Metabolic Research also received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement no. 115317 (DIRECT), the resources of which are composed of financial contributions from the European Union’s Seventh Framework Programme (FP7/2007–2013) and EFPIA companies’ in-kind contribution. The Novo Nordisk Foundation Center for Protein Research received funding from the Novo Nordisk Foundation (grant agreement NNF14CC0001). The Novo Nordisk Foundation Center for Basic Metabolic Research is an independent research center at the University of Copenhagen partially funded by an unrestricted donation from the Novo Nordisk Foundation (http://www.metabol.ku.dk). A.Ø.P. received funding from the Lundbeck Foundation (grant R218-2016-1367) and S.D.E. received funding from Agence Nationale de la Recherche MetaGenoPolis grant ‘Investissements d’avenir’ ANR-11-DPBS-0001.

Author information

Author notes

  1. These authors contributed equally: Helle Krogh Pedersen, Sofia K. Forslund


  1. The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark

    • Helle Krogh Pedersen
    • , Trine Nielsen
    • , Torben Hansen
    •  & Oluf Pedersen
  2. Experimental and Clinical Research Centre, a joint center of Max Delbrück Centre for Molecular Medicine & Charité University Hospital, Berlin, Germany

    • Sofia K. Forslund
  3. Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany

    • Sofia K. Forslund
    • , Falk Hildebrand
    •  & Peer Bork
  4. Department of Bio and Health Informatics, Technical University of Denmark, Kongens Lyngby, Denmark

    • Valborg Gudmundsdottir
    • , Anders Østergaard Petersen
    •  & Søren Brunak
  5. MTM Research Centre, School of Science and Technology, Örebro University, Örebro, Sweden

    • Tuulia Hyötyläinen
  6. MetaGénoPolis (MGP), INRA, Université Paris-Saclay, Jouy-en-Josas, France

    • S. Dusko Ehrlich
  7. Centre for Host-Microbiome Interactions, Dental Institute Central Office, Guy’s Hospital, King’s College London, London, UK

    • S. Dusko Ehrlich
  8. Novo Nordisk Foundation Center for Protein Research, Disease Systems Biology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark

    • Søren Brunak
  9. School of Medical Sciences, Örebro University, Örebro, Sweden

    • Matej Oresic
  10. Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, Turku, Finland

    • Matej Oresic
  11. Clinical Microbiomics A/S, Copenhagen, Denmark

    • Henrik Bjørn Nielsen




Contributions

The protocol was written by S.K.F., H.K.P., V.G., A.Ø.P., F.H., T. Hyötyläinen, T.N. and H.B.N., together with T. Hansen, S.D.E., S.B., M.O., P.B. and O.P., with reusable code and example data compiled and tested by H.K.P. and V.G.

Competing interests

The authors declare no competing interests.

Corresponding author

Correspondence to Oluf Pedersen.

Integrated supplementary information

  1. Supplementary Figure 1: Composition and concentration range of lipids and polar metabolites detected in the human serum samples in the MetaHIT cohort.

    The polar metabolites include both fully identified metabolites and metabolites identified only at the compound-class level (n = 778), whereas the lipids include only fully identified lipids (n = 287).

Supplementary information

  1. Supplementary Figure 1

    Supplementary Figure 1 and Supplementary Methods

  2. Supplementary Table 1

  3. Supplementary Data
