Abstract
To study the effect of host genetics on gut microbiome composition, the MiBioGen consortium curated and analyzed genome-wide genotypes and 16S fecal microbiome data from 18,340 individuals (24 cohorts). Microbial composition showed high variability across cohorts: only 9 of 410 genera were detected in more than 95% of samples. A genome-wide association study of host genetic variation regarding microbial taxa identified 31 loci affecting the microbiome at a genome-wide significant (P < 5 × 10⁻⁸) threshold. One locus, the lactase (LCT) gene locus, reached study-wide significance (genome-wide association study signal: P = 1.28 × 10⁻²⁰), and it showed an age-dependent association with Bifidobacterium abundance. Other associations were suggestive (1.95 × 10⁻¹⁰ < P < 5 × 10⁻⁸) but enriched for taxa showing high heritability and for genes expressed in the intestine and brain. A phenome-wide association study and Mendelian randomization identified enrichment of microbiome trait loci in the metabolic, nutrition and environment domains and suggested the microbiome might have causal effects in ulcerative colitis and rheumatoid arthritis.
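The two P-value thresholds quoted in the abstract define three tiers of evidence. As a minimal sketch (not the consortium's code), the snippet below sorts a single association P value into those tiers. The function name `significance_tier` and the tier labels are illustrative assumptions; only the two threshold values and the LCT P value come from the abstract.

```python
# Thresholds quoted in the abstract; everything else here is illustrative.
GENOME_WIDE_P = 5e-8      # genome-wide significance threshold
STUDY_WIDE_P = 1.95e-10   # study-wide significance threshold


def significance_tier(p_value: float) -> str:
    """Return the evidence tier for one association P value (hypothetical helper)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("P value must lie in [0, 1]")
    if p_value < STUDY_WIDE_P:
        return "study-wide"
    if p_value < GENOME_WIDE_P:
        return "genome-wide (suggestive)"
    return "below threshold"


# The LCT locus P value reported in the abstract lands in the top tier.
lct_tier = significance_tier(1.28e-20)
print(lct_tier)
```

Under this reading, a locus with P between the two thresholds counts among the 31 genome-wide significant loci but is labelled suggestive, which matches how the abstract describes every association other than LCT.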
Data availability
Full GWAS summary statistics for mbQTLs are available at www.mibiogen.org, a site built using the MOLGENIS framework (ref. 80).
16S data availability:
BSPSPC and FOCUS data are available from the Sequence Read Archive (SRA) under accession PRJNA673102.
CARDIA data, including 16S rRNA sequencing, cannot be made publicly available due to confidentiality restrictions. The data can be requested from the CARDIA Study Data Coordinating Center at the University of Alabama at Birmingham, following CARDIA Confidentiality Certification rules. The process for obtaining data through CARDIA is outlined at https://www.cardia.dopm.uab.edu/publications-2/publications-documents.
COPSAC data are available on SRA (PRJNA683912).
DanFunD data are not deposited in public databases due to legal and ethical restrictions. Access to the data and biological material can be granted by the DanFunD steering committee (https://www.frederiksberghospital.dk/ckff/sektioner/SBE/danfund/Sider/How-to-collaborate.aspx).
FGFP data are available on the European Genome-Phenome Archive (EGA) under accession EGAS00001004420.
GEM data are available on the SRA (PRJEB14839).
Generation R and Rotterdam Study data cannot be made publicly available due to ethical and legal restrictions; these data are available upon request to the data manager of the Rotterdam Study (f.vanrooij@erasmusmc.nl) or of the Generation R Study (c.kruithof@erasmusmc.nl), subject to local rules and regulations.
HCHS/SOL data are available from the European Nucleotide Archive (ENA) under accession ERP117287.
KSCS data are available in a public repository, the Clinical and Omics data archives of the Korea National Institute of Health, under accession R000635.
LLD and MIBS data are available from EGA (EGAS00001001704 and EGAS0000100924).
METSIM data are available on the SRA (SRP097785).
NGRC data are available on the ENA (ERP016332).
The NTR has a data access committee that reviews data requests and will make data available to interested researchers. The data come from extended twin families and pedigree structures with twins, which creates privacy concerns; the data therefore cannot be shared in publicly available databases. Researchers may contact eco.de.geus@vu.nl for data requests.
PNP data are available on the ENA (PRJEB11532).
POPCOL data are available on the EGA (EGAS00001004869).
SHIP and SHIP-TREND data can be obtained from the SHIP data management unit via an online data access application form (https://www.fvcm.med.uni-greifswald.de/dd_service/data_use_intro.php).
TwinsUK data are available on the ENA under accession ERP015317.
Code availability
All code used in the study is available on the Consortium GitHub (https://github.com/alexa-kur/miQTL_cookbook) or on the websites of corresponding software packages.
References
Gilbert, J. A. et al. Current understanding of the human microbiome. Nat. Med. 24, 392–400 (2018).
Zhernakova, A. et al. Population-based metagenomics analysis reveals markers for gut microbiome composition and diversity. Science 352, 565–569 (2016).
Falony, G. et al. Population-level analysis of gut microbiome variation. Science 352, 560–564 (2016).
Rothschild, D. et al. Environment dominates over host genetics in shaping human gut microbiota. Nature 555, 210–215 (2018).
Goodrich, J. K. et al. Human genetics shape the gut microbiome. Cell 159, 789–799 (2014).
Goodrich, J. K. et al. Genetic determinants of the gut microbiome in UK twins. Cell Host Microbe 19, 731–743 (2016).
Bonder, M. J. et al. The effect of host genetics on the gut microbiome. Nat. Genet. 48, 1407–1412 (2016).
Wang, J. et al. Genome-wide association analysis identifies variation in vitamin D receptor and other host factors influencing the gut microbiota. Nat. Genet. 48, 1396–1406 (2016).
Turpin, W. et al. Association of host genome with intestinal microbial composition in a large healthy cohort. Nat. Genet. 48, 1413–1417 (2016).
Kurilshikov, A., Wijmenga, C., Fu, J. & Zhernakova, A. Host genetics and gut microbiome: challenges and perspectives. Trends Immunol. 38, 633–647 (2017).
Wang, J. et al. Meta-analysis of human genome–microbiome association studies: the MiBioGen consortium initiative. Microbiome 6, 101 (2018).
Sinha, R. et al. Assessment of variation in microbial community amplicon sequencing by the Microbiome Quality Control (MBQC) project consortium. Nat. Biotechnol. 35, 1077–1086 (2017).
Sinha, R., Abnet, C. C., White, O., Knight, R. & Huttenhower, C. The microbiome quality control project: baseline study design and future directions. Genome Biol. 16, 276 (2015).
Vandeputte, D., Tito, R. Y., Vanleeuwen, R., Falony, G. & Raes, J. Practical considerations for large-scale gut microbiome studies. FEMS Microbiol. Rev. 41, S154–S167 (2017).
Quast, C. et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41, D590–D596 (2012).
Cole, J. R. et al. The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res. 37, D141–D145 (2009).
Louis, P., Young, P., Holtrop, G. & Flint, H. J. Diversity of human colonic butyrate-producing bacteria revealed by analysis of the butyryl-CoA:acetate CoA-transferase gene. Environ. Microbiol. 12, 304–314 (2010).
Westra, H.-J. et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 45, 1238–1243 (2013).
Wason, J. M. S. & Dudbridge, F. A general framework for two-stage analysis of genome-wide association studies and its application to case–control studies. Am. J. Hum. Genet. 90, 760–773 (2012).
Võsa, U. et al. Unraveling the polygenic architecture of complex traits using blood eQTL metaanalysis. Preprint at bioRxiv https://doi.org/10.1101/447367 (2018).
Zhernakova, D. V. et al. Individual variations in cardiovascular-disease-related protein levels are driven by genetics and gut microbiome. Nat. Genet. 50, 1524–1532 (2018).
Blekhman, R. et al. Host genetic variation impacts microbiome composition across human body sites. Genome Biol. 16, 191 (2015).
Machiela, M. J. & Chanock, S. J. LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. Bioinformatics 31, 3555–3557 (2015).
Kashyap, P. C. et al. Genetically dictated change in host mucus carbohydrate landscape exerts a diet-dependent effect on the gut microbiota. Proc. Natl Acad. Sci. USA 110, 17059–17064 (2013).
Crost, E. H. et al. Mechanistic insights into the cross-feeding of Ruminococcus gnavus and Ruminococcus bromii on host and dietary carbohydrates. Front. Microbiol. 9, 2558 (2018).
Yoshii, K., Hosomi, K., Sawane, K. & Kunisawa, J. Metabolism of dietary and microbial vitamin B family in the regulation of host immunity. Front. Nutr. 6, 48 (2019).
Haas, M. E. et al. Genetic association of albuminuria with cardiometabolic disease and blood pressure. Am. J. Hum. Genet. 103, 461–473 (2018).
Rowley, C. A. & Kendall, M. M. To B12 or not to B12: five questions on the role of cobalamin in host–microbial interactions. PLoS Pathog. 15, e1007479 (2019).
Xu, Y. et al. Cobalamin (vitamin B12) induced a shift in microbial composition and metabolic activity in an in vitro colon simulation. Front. Microbiol. 9, 2780 (2018).
Gysemans, C. et al. Interferon regulatory factor-1 is a key transcription factor in murine beta cells under immune attack. Diabetologia 52, 2374–2384 (2009).
Watanabe, K., Taskesen, E., van Bochoven, A. & Posthuma, D. Functional mapping and annotation of genetic associations with FUMA. Nat. Commun. 8, 1826 (2017).
Watanabe, K. et al. A global overview of pleiotropy and genetic architecture in complex traits. Nat. Genet. 51, 1339–1348 (2019).
Nicklas, T. A. et al. Self-perceived lactose intolerance results in lower intakes of calcium and dairy foods and is associated with hypertension and diabetes in adults. Am. J. Clin. Nutr. 94, 191–198 (2011).
Shin, S.-Y. et al. An atlas of genetic influences on human blood metabolites. Nat. Genet. 46, 543–550 (2014).
Suhre, K. et al. Metabolic footprint of diabetes: a multiplatform metabolomics study in an epidemiological setting. PLoS ONE 5, e13953 (2010).
Hemani, G. et al. The MR-Base platform supports systematic causal inference across the human phenome. eLife 7, e34408 (2018).
Qin, J. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012).
Koeth, R. A. et al. Intestinal microbiota metabolism of l-carnitine, a nutrient in red meat, promotes atherosclerosis. Nat. Med. 19, 576–585 (2013).
Coit, P. & Sawalha, A. H. The human microbiome in rheumatic autoimmune diseases: a comprehensive review. Clin. Immunol. 170, 70–79 (2016).
Vatanen, T. et al. Variation in microbiome LPS immunogenicity contributes to autoimmunity in humans. Cell 165, 842–853 (2016).
O’Mahony, S. M., Clarke, G., Borre, Y. E., Dinan, T. G. & Cryan, J. F. Serotonin, tryptophan metabolism and the brain–gut–microbiome axis. Behav. Brain Res. 277, 32–48 (2015).
Karlsson, F. H. et al. Gut metagenome in European women with normal, impaired and diabetic glucose control. Nature 498, 99–103 (2013).
Wade, K. H. & Hall, L. J. Improving causality in microbiome research: can human genetic epidemiology help? Wellcome Open Res. 4, 199 (2019).
Brooks, J. P. et al. The truth about metagenomics: quantifying and counteracting bias in 16S rRNA studies. BMC Microbiol. 15, 66 (2015).
Coluccia, E. et al. Congruency of genetic predisposition to lactase persistence and lactose breath test. Nutrients 11, 1383 (2019).
Lapides, R. A. & Savaiano, D. A. Gender, age, race and lactose intolerance: is there evidence to support a differential symptom response? a scoping review. Nutrients 10, 1956 (2018).
Valles-Colomer, M. et al. The neuroactive potential of the human gut microbiota in quality of life and depression. Nat. Microbiol. 4, 623–632 (2019).
Lloyd-Price, J. et al. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature 569, 655–662 (2019).
Vich Vila, A. et al. Gut microbiota composition and functional changes in inflammatory bowel disease and irritable bowel syndrome. Sci. Transl. Med. 10, eaap8914 (2018).
Ottosson, F. et al. Connection between BMI-related plasma metabolite profile and gut microbiota. J. Clin. Endocrinol. Metab. 103, 1491–1501 (2018).
Tun, H. M. et al. Roles of birth mode and infant gut microbiota in intergenerational transmission of overweight and obesity from mother to offspring. JAMA Pediatr. 172, 368–377 (2018).
Finnicum, C. T. et al. Metataxonomic analysis of individuals at BMI extremes and monozygotic twins discordant for BMI. Twin Res. Hum. Genet. 21, 203–213 (2018).
Sanna, S. et al. Causal relationships among the gut microbiome, short-chain fatty acids and metabolic diseases. Nat. Genet. 51, 600–605 (2019).
Jia, J. et al. Assessment of causal direction between gut microbiota-dependent metabolites and cardiometabolic health: a bidirectional Mendelian randomization analysis. Diabetes 68, 1747–1755 (2019).
Yang, Q., Lin, S. L., Kwok, M. K., Leung, G. M. & Schooling, C. M. The roles of 27 genera of human gut microbiota in ischemic heart disease, type 2 diabetes mellitus, and their risk factors: a Mendelian randomization study. Am. J. Epidemiol. 187, 1916–1922 (2018).
Rinninella, E. et al. What is the healthy gut microbiota composition? a changing ecosystem across age, environment, diet and diseases. Microorganisms 7, 14 (2019).
Plichta, D. R., Graham, D. B., Subramanian, S. & Xavier, R. J. Therapeutic opportunities in inflammatory bowel disease: mechanistic dissection of host–microbiome relationships. Cell 178, 1041–1056 (2019).
Frank, D. N. et al. Molecular-phylogenetic characterization of microbial community imbalances in human inflammatory bowel diseases. Proc. Natl Acad. Sci. USA 104, 13780–13785 (2007).
Morgan, X. C. et al. Dysfunction of the intestinal microbiome in inflammatory bowel disease and treatment. Genome Biol. 13, R79 (2012).
Tursi, A. et al. Treatment of relapsing mild-to-moderate ulcerative colitis with the probiotic VSL#3 as adjunctive to a standard pharmaceutical treatment: a double-blind, randomized, placebo-controlled study. Am. J. Gastroenterol. 105, 2218–2227 (2010).
Scher, J. U. et al. The lung microbiota in early rheumatoid arthritis and autoimmunity. Microbiome 4, 60 (2016).
McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48, 1279–1283 (2016).
Howie, B., Marchini, J. & Stephens, M. Genotype imputation with thousands of genomes. G3 1, 457–470 (2011).
Howie, B., Fuchsberger, C., Stephens, M., Marchini, J. & Abecasis, G. R. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat. Genet. 44, 955–959 (2012).
Carmi, S. et al. Sequencing an Ashkenazi reference panel supports population-targeted personal genomics and illuminates Jewish and European origins. Nat. Commun. 5, 4835 (2014).
Deelen, P. et al. Genotype harmonizer: automatic strand alignment and format conversion for genotype data integration. BMC Res. Notes 7, 901 (2014).
Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487 (2016).
Cochran, W. G. The combination of estimates from different experiments. Biometrics 10, 101–129 (1954).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Pers, T. H., Timshel, P. & Hirschhorn, J. N. SNPsnap: a Web-based tool for identification and annotation of matched SNPs. Bioinformatics 31, 418–420 (2015).
Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).
Bowden, J. et al. A framework for the investigation of pleiotropy in two-sample summary data Mendelian randomization. Stat. Med. 36, 1783–1802 (2017).
Hartwig, F. P., Davey Smith, G. & Bowden, J. Robust inference in summary data Mendelian randomization via the zero modal pleiotropy assumption. Int. J. Epidemiol. 46, 1985–1998 (2017).
Bowden, J., Davey Smith, G. & Burgess, S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int. J. Epidemiol. 44, 512–525 (2015).
Verbanck, M., Chen, C.-Y., Neale, B. & Do, R. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat. Genet. 50, 693–698 (2018).
Shim, H. et al. A multivariate genome-wide association analysis of 10 LDL subfractions, and their response to statin treatment, in 1868 caucasians. PLoS ONE 10, e0120758 (2015).
Burgess, S., Butterworth, A. & Thompson, S. G. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet. Epidemiol. 37, 658–665 (2013).
Swertz, M. A. et al. The MOLGENIS toolkit: rapid prototyping of biosoftware at the push of a button. BMC Bioinformatics 11, S12 (2010).
Acknowledgements
Information on cohort funding and acknowledgements is available in the Supplementary Note. We thank J. Senior and K. McIntyre for critically reading the manuscript.
Author information
Contributions
A.K., A.Z., R. Kraaij, C.M.-G., L.F. and J.R. conceived and designed the study. A.K., C.M.-G., R.B., D.R. and J.W. were responsible for coordinating and performing the meta-analysis. A.D., C.L.R., J.A.R.G., C.T.F., X.L., D.Z. and M.J.B. led the specific downstream analyses and should be considered as shared second authors. Specifically, A.D. performed the PheWAS analysis, C.L.R. and C.T.F. performed the heritability analysis in the TwinsUK and NTR cohorts, respectively, and J.A.R.G. performed the age-related analysis of the LCT locus. X.L. ran and interpreted the FUMA analysis, and D.Z. ran and interpreted the MR analysis. M.J.B. substantially contributed to the development of the analysis pipeline and protocols. R.K., J.R. and A.Z. jointly supervised the project. A.v.d.G., A.C., H.-J.W., Urmo V., M.J.B., S.S. and L.F. developed the pipeline for the meta-analysis and contributed to the methodology and statistical analysis. K.W. contributed to the PheWAS enrichment analysis. A.K., C.M.-G., R.B., D.R., J.W., A.D., C.L.R., J.A.R.G., C.T.F., X.L., D.Z., M.J.B., M.D.A., S.S., R. Kraaij, J.R. and A.Z. wrote the manuscript, with contributions from all authors. K.A.M., L.J.L. and M.F. collected and managed the CARDIA cohort. A.D.P., J.A.R.G., K.C., L.B. and W.T. collected and managed the GEM cohort. H.B., J.S., J.T., S.A.S. and S.J.S. collected and managed the COPSAC study. D.B., O.P., T.H., T.J. and T.H.H. collected and managed the DanFunD study. D.A.H., G.F., J.R., J.W., K.H.W., M.J., N.J.T., R.Y.T., R.B. and S.V.-S. collected, genotyped and managed the FGFP study. C.M.-G., F.R., H.A.M., L.D. and V.W.V.J. collected and managed the Generation R study. H.-N.K., H.S. and H.-L.K. collected and managed the KSCS study. C.W., J.F., A.Z., L.F., S.S. and A.K. collected and managed the LLD cohort. A.J.L., E.O., K.L., M. Laakso and M.B. collected and managed the METSIM cohort. A.A.M.M., D.M.A.E.J., D.K. and Z.M. collected and managed the MIBS-CO cohort. H.P. and Z.D.W. collected and managed the NGRC cohort. C.T.F., D.I.B., E.J.C.G., G.E.D., G.W. and R.G.I. collected and managed the NTR cohort. D. Rothschild, E.B., E.S. and O.W. collected and managed the PNP cohort. A.A., L.A., M.D.A., S. Walter and X.L. collected and managed the PopCol cohort. A.F., C.B., M.C.R., M. Laudes and W.L. collected and managed the BSPSPC and FOCUS cohorts. A.G.U., C.M.v.D., D. Radjabzadeh and R. Kraaij collected and managed the RS cohort data. F.F., F.U.W., G.H., H.V., M.M.L., S. Weiss and U. Völker collected and managed the SHIP and TREND cohorts. L.Y.M., Q.Q., R. Knight, R.C.K. and R.D.B. collected and managed the SOL cohort. C.I.L.R., C.J.S., J.T.B., M.A.J. and T.D.S. collected and managed the TwinsUK cohort. A.A.V. and J.S.-T. contributed to the discussion. All authors approved the final manuscript.
Ethics declarations
Competing interests
All authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Notes 1–4 and Supplementary Figs. 1–5
About this article
Cite this article
Kurilshikov, A., Medina-Gomez, C., Bacigalupe, R. et al. Large-scale association analyses identify host factors influencing human gut microbiome composition. Nat Genet 53, 156–165 (2021). https://doi.org/10.1038/s41588-020-00763-1
DOI: https://doi.org/10.1038/s41588-020-00763-1