Abstract
We recently presented a three-pronged association study that integrated human intestinal microbiome data derived from shotgun-based sequencing with untargeted serum metabolome data and measures of host physiology. Metabolome and microbiome data are high dimensional, posing a major challenge for data integration. Here, we present a step-by-step computational protocol that details and discusses the dimensionality-reduction techniques used and methods for subsequent integration and interpretation of such heterogeneous types of data. Dimensionality reduction was achieved through a combination of data normalization approaches, binning of co-abundant genes and metabolites, and integration of prior biological knowledge. The use of prior knowledge to overcome functional redundancy across microbiome species is one central advance of our method over available alternative approaches. Applying this framework, other investigators can integrate various ‘-omics’ readouts with variables of host physiology or any other phenotype of interest (e.g., connecting host and microbiome readouts to disease severity or treatment outcome in a clinical cohort) in a three-pronged association analysis to identify potential mechanistic links to be tested in experimental settings. Although we originally developed the framework for a human metabolome–microbiome study, it is generalizable to other organisms and environmental metagenomes, as well as to studies including other -omics domains such as transcriptomics and proteomics. The provided R code runs in ~1 h on a standard PC.
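The reduction-then-association idea described above can be illustrated with a small, self-contained sketch. This is not the protocol's R code. It is a toy Python illustration under stated assumptions: co-abundant metabolites are binned with a simple greedy correlation threshold (a stand-in for a proper co-abundance clustering method), microbial genes are summed by a hypothetical functional annotation to mimic the use of prior knowledge, and all names (`bin_coabundant`, `bin_by_annotation`, `func_X`, the toy profiles) are invented for the example.

```python
from statistics import mean, pstdev

def ranks(values):
    """Average 1-based ranks, with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def bin_coabundant(features, threshold=0.8):
    """Greedy binning (assumption): join the first bin whose seed correlates."""
    bins = []
    for name, profile in features.items():
        for members in bins:
            if spearman(profile, features[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            bins.append([name])
    return bins

def summarize(features, members):
    """Per-sample mean of z-scored member profiles (one value per sample)."""
    scaled = []
    for name in members:
        m, s = mean(features[name]), pstdev(features[name])
        scaled.append([(v - m) / s if s else 0.0 for v in features[name]])
    return [mean(col) for col in zip(*scaled)]

def bin_by_annotation(gene_abundance, annotation):
    """Sum gene profiles sharing a functional annotation (prior knowledge)."""
    totals = {}
    for gene, profile in gene_abundance.items():
        prev = totals.get(annotation[gene], [0.0] * len(profile))
        totals[annotation[gene]] = [p + v for p, v in zip(prev, profile)]
    return totals

# Toy data for six samples (all values are made up).
metabolites = {
    "met_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "met_b": [1.1, 2.3, 2.9, 4.2, 5.1, 6.3],
    "met_c": [6.0, 1.0, 4.0, 2.0, 5.0, 3.0],
}
genes = {  # gene_1 and gene_2 come from different species, same function
    "gene_1": [1.0, 0.0, 3.0, 0.0, 5.0, 0.0],
    "gene_2": [0.0, 2.0, 0.0, 4.0, 0.0, 6.0],
    "gene_3": [3.0, 3.0, 1.0, 2.0, 1.0, 2.0],
}
annotation = {"gene_1": "func_X", "gene_2": "func_X", "gene_3": "func_Y"}
phenotype = [0.9, 2.1, 3.2, 3.8, 5.5, 5.9]

modules = bin_coabundant(metabolites)
module_profiles = {"+".join(m): summarize(metabolites, m) for m in modules}
functions = bin_by_annotation(genes, annotation)

# Three prongs per pair: function~metabolites, metabolites~phenotype,
# function~phenotype.
links = {
    (f, m): (spearman(fp, mp), spearman(mp, phenotype), spearman(fp, phenotype))
    for f, fp in functions.items()
    for m, mp in module_profiles.items()
}
```

In this toy case neither `gene_1` nor `gene_2` alone tracks the phenotype, but their summed functional profile does, which is the kind of functional redundancy that annotation-based binning is meant to overcome. A real analysis would also need normalization, confounder adjustment and multiple-testing correction, none of which are shown here.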
Acknowledgements
This research received funding from the European Community’s Seventh Framework Programme (FP7/2007–2013): MetaHIT, grant agreement HEALTH-F4-2007-201052 and MetaCardis, grant agreement HEALTH-2012-305312. The Department of Bio and Health Informatics, Technical University of Denmark, and the Novo Nordisk Foundation Center for Basic Metabolic Research have additionally received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement no. 115317 (DIRECT), the resources of which are composed of financial contributions from the European Union’s Seventh Framework Programme (FP7/2007–2013) and EFPIA companies’ in-kind contribution. The Novo Nordisk Foundation Center for Protein Research received funding from the Novo Nordisk Foundation (grant agreement NNF14CC0001). The Novo Nordisk Foundation Center for Basic Metabolic Research is an independent research center at the University of Copenhagen partially funded by an unrestricted donation from the Novo Nordisk Foundation (http://www.metabol.ku.dk). A.Ø.P. received funding from the Lundbeck Foundation (grant R218-2016-1367), and S.D.E. received funding from Agence Nationale de la Recherche MetaGenoPolis grant ‘Investissements d’avenir’ ANR-11-DPBS-0001.
Author information
Contributions
The protocol was written by S.K.F., H.K.P., V.G., A.Ø.P., F.H., T. Hyötyläinen, T.N. and H.B.N., together with T. Hansen, S.D.E., S.B., M.O., P.B. and O.P., with reusable code and example data compiled and tested by H.K.P. and V.G.
Ethics declarations
Competing interests
The authors declare no competing interests.
Related link
Key references using this protocol
Pedersen, H. K. et al. Nature 535, 376–381 (2016): https://doi.org/10.1038/nature18646
Integrated supplementary information
Supplementary Figure 1 Composition and concentration range of lipids and polar metabolites detected in the human serum samples in the MetaHIT cohort.
The polar metabolites include both fully identified metabolites and metabolites identified only at the compound-class level (n = 778), whereas the lipids include only fully identified species (n = 287).
Supplementary information
Supplementary Figure 1
Supplementary Figure 1 and Supplementary Methods
About this article
Cite this article
Pedersen, H.K., Forslund, S.K., Gudmundsdottir, V. et al. A computational framework to integrate high-throughput ‘-omics’ datasets for the identification of potential mechanistic links. Nat Protoc 13, 2781–2800 (2018). https://doi.org/10.1038/s41596-018-0064-z
DOI: https://doi.org/10.1038/s41596-018-0064-z
This article is cited by
- Statistical methods and resources for biomarker discovery using metabolomics. BMC Bioinformatics (2023)
- Integrated multi-omics reveal important roles of gut contents in intestinal ischemia–reperfusion induced injuries in rats. Communications Biology (2022)
- Intrapopulation adaptive variance supports thermal tolerance in a reef-building coral. Communications Biology (2022)
- Multi-omics analyses of airway host–microbe interactions in chronic obstructive pulmonary disease identify potential therapeutic interventions. Nature Microbiology (2022)
- Longitudinal multi-omics analyses of the gut–liver axis reveals metabolic dysregulation in hepatitis C infection and cirrhosis. Nature Microbiology (2022)