A computational framework to integrate high-throughput ‘-omics’ datasets for the identification of potential mechanistic links


We recently presented a three-pronged association study that integrated human intestinal microbiome data derived from shotgun-based sequencing with untargeted serum metabolome data and measures of host physiology. Metabolome and microbiome data are high dimensional, posing a major challenge for data integration. Here, we present a step-by-step computational protocol that details and discusses the dimensionality-reduction techniques used and methods for subsequent integration and interpretation of such heterogeneous types of data. Dimensionality reduction was achieved through a combination of data normalization approaches, binning of co-abundant genes and metabolites, and integration of prior biological knowledge. The use of prior knowledge to overcome functional redundancy across microbiome species is one central advance of our method over available alternative approaches. Applying this framework, other investigators can integrate various ‘-omics’ readouts with variables of host physiology or any other phenotype of interest (e.g., connecting host and microbiome readouts to disease severity or treatment outcome in a clinical cohort) in a three-pronged association analysis to identify potential mechanistic links to be tested in experimental settings. Although we originally developed the framework for a human metabolome–microbiome study, it is generalizable to other organisms and environmental metagenomes, as well as to studies including other -omics domains such as transcriptomics and proteomics. The provided R code runs in ~1 h on a standard PC.
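The binning idea mentioned above can be illustrated with a minimal sketch. It is not the protocol's R code: it is a simplified greedy stand-in written in Python, and the function names (`bin_coabundant`, `bin_profile`), the Pearson threshold of 0.9 and the toy metabolite profiles are all assumptions made for illustration. Features whose abundance profiles track a bin's first member across samples are grouped together, and each bin is then summarized by a single profile, so that downstream association tests run on a few bins rather than on every feature.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)


def zscore(x):
    """Center and scale a profile; a constant profile maps to zeros."""
    m, s = mean(x), pstdev(x)
    return [(v - m) / s if s else 0.0 for v in x]


def bin_coabundant(profiles, threshold=0.9):
    """Greedily assign each feature to the first bin whose seed it correlates with.

    profiles: dict mapping feature name -> list of abundances (one per sample).
    threshold: illustrative cut-off, not a value taken from the protocol.
    """
    bins = []
    for name, profile in profiles.items():
        for members in bins:
            seed = profiles[members[0]]
            if pearson(profile, seed) >= threshold:
                members.append(name)
                break
        else:
            bins.append([name])  # no existing bin matched: start a new one
    return bins


def bin_profile(profiles, members):
    """Summarize a bin as the per-sample mean of its z-scored member profiles."""
    scaled = [zscore(profiles[m]) for m in members]
    return [mean(column) for column in zip(*scaled)]


# Hypothetical toy data: three metabolites measured in five samples.
profiles = {
    "met_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "met_b": [2.1, 4.0, 6.2, 8.1, 9.9],
    "met_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
bins = bin_coabundant(profiles)
summary = bin_profile(profiles, bins[0])
print(bins)
print(summary)
```

Here `met_a` and `met_b` fall into one bin and `met_c` forms its own, so three features reduce to two summary profiles. A greedy seed-based pass depends on the order of the features; it is used here only because it is short enough to read at a glance.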


Fig. 1: Overview of the protocol workflow integrating human phenotype, serum metabolome and gut microbiome data.
Fig. 2: Schematic illustration of the leave-one-MGS-out approach in the driver-species analysis.
Fig. 3: Sample plot produced by the Procedure in Step 15.
Fig. 4: Sample plot produced by the Procedure in Step 19 for the leave-one-MGS-out analysis, here shown for the combined BCAA–biosynthesis module (KEGG modules M00019, M00570, M00535 and M00432; together constituting 13 KOs).
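The leave-one-MGS-out idea named in the captions of Figs. 2 and 4 can be sketched as follows. The captions do not spell out the procedure, so this is one plausible reading and an assumption: the abundance of a functional module is taken as the sum of the contributions from each metagenomic species (MGS), its Spearman correlation with a phenotype is computed, and the correlation is then recomputed with each MGS excluded in turn. The function names, the toy data and the use of a simple difference in correlation as the readout are hypothetical, and the sketch is in Python rather than the protocol's R.

```python
def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks (0.0 if undefined)."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def leave_one_mgs_out(contributions, phenotype):
    """Return the full-module correlation and the drop caused by removing each MGS.

    contributions: dict mapping MGS name -> per-sample abundance of the module's
    genes carried by that MGS (an assumed input layout).
    """
    n = len(phenotype)
    total = [sum(c[i] for c in contributions.values()) for i in range(n)]
    full = spearman(total, phenotype)
    shifts = {}
    for left_out in contributions:
        # Re-sum the remaining species instead of subtracting, to avoid
        # floating-point noise creating or breaking ties in the ranks.
        reduced = [
            sum(c[i] for name, c in contributions.items() if name != left_out)
            for i in range(n)
        ]
        shifts[left_out] = full - spearman(reduced, phenotype)
    return full, shifts


# Hypothetical toy data: six samples, three species carrying the module.
phenotype = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
contributions = {
    "MGS_driver": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "MGS_flat": [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
    "MGS_noise": [0.3, 0.1, 0.2, 0.1, 0.3, 0.2],
}
full, shifts = leave_one_mgs_out(contributions, phenotype)
print(full, shifts)
```

In this toy example the module correlates perfectly with the phenotype, and only the removal of `MGS_driver` abolishes that correlation, which marks it as the candidate driver species. A real analysis would also need to handle confounders and multiple testing, which this sketch leaves out.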




This research received funding from the European Community’s Seventh Framework Programme (FP7/2007–2013): MetaHIT, grant agreement HEALTH-F4-2007-201052 and MetaCardis, grant agreement HEALTH-2012-305312. The Department of Bio and Health Informatics, Technical University of Denmark, and the Novo Nordisk Foundation Center for Basic Metabolic Research have in addition received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement no. 115317 (DIRECT), the resources of which are composed of financial contributions from the European Union’s Seventh Framework Programme (FP7/2007–2013) and EFPIA companies’ in-kind contribution. The Novo Nordisk Foundation Center for Protein Research received funding from the Novo Nordisk Foundation (grant agreement NNF14CC0001). The Novo Nordisk Foundation Center for Basic Metabolic Research is an independent research center at the University of Copenhagen partially funded by an unrestricted donation from the Novo Nordisk Foundation (http://www.metabol.ku.dk). A.Ø.P. received funding from the Lundbeck Foundation (grant R218-2016-1367) and S.D.E. received funding from Agence Nationale de la Recherche MetaGenoPolis grant ‘Investissements d’avenir’ ANR-11-DPBS-0001.

Author information




The protocol was written by S.K.F., H.K.P., V.G., A.Ø.P., F.H., T. Hyötyläinen, T.N. and H.B.N., together with T. Hansen, S.D.E., S.B., M.O., P.B. and O.P., with reusable code and example data compiled and tested by H.K.P. and V.G.

Corresponding author

Correspondence to Oluf Pedersen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related link

Key references using this protocol

Pedersen, H. K. et al. Nature 535, 376–381 (2016): https://doi.org/10.1038/nature18646

Integrated supplementary information

Supplementary Figure 1 Composition and concentration range of lipids and polar metabolites detected in the human serum samples in the MetaHIT cohort.

The polar metabolites include both fully identified metabolites and metabolites identified only at the compound class level (n = 778), whereas the lipids include only fully identified lipids (n = 287).

Supplementary information

Supplementary Figure 1

Supplementary Figure 1 and Supplementary Methods

Supplementary Table 1

Supplementary Data


About this article


Cite this article

Pedersen, H.K., Forslund, S.K., Gudmundsdottir, V. et al. A computational framework to integrate high-throughput ‘-omics’ datasets for the identification of potential mechanistic links. Nat Protoc 13, 2781–2800 (2018). https://doi.org/10.1038/s41596-018-0064-z
