Low concordance between studies that examine the role of microbiota in human diseases is a pervasive challenge that limits the capacity to identify causal relationships between host-associated microorganisms and pathology. The risk of obtaining false positives is exacerbated by wide interindividual heterogeneity in microbiota composition¹, probably due to population-wide differences in human lifestyle and physiological variables² that exert differential effects on the microbiota. Here we infer the greatest, generalized sources of heterogeneity in human gut microbiota profiles and also identify human lifestyle and physiological characteristics that, if not evenly matched between cases and controls, confound microbiota analyses to produce spurious microbial associations with human diseases. We identify alcohol consumption frequency and bowel movement quality as unexpectedly strong sources of gut microbiota variance that differ in distribution between healthy participants and participants with a disease and that can confound study designs. We demonstrate that for numerous prevalent, high-burden human diseases, matching cases and controls for confounding variables reduces observed differences in the microbiota and the incidence of spurious associations. On this basis, we present a list of host variables that we recommend should be captured in human microbiota studies for the purpose of matching comparison groups, which we anticipate will increase robustness and reproducibility in resolving the members of the gut microbiota that are truly associated with human disease.
The sequencing data of the AGP used herein are available at the EBI (https://www.ebi.ac.uk/) database under study accession ID: MGYS00000596. External validation cohort data are available at NCBI BioProject PRJNA589036 (for alcohol consumption replication) and NCBI BioProject PRJEB18535 (for BMQ replication).
Source code for machine-learning analyses can be obtained at: https://github.com/jacksklar/AGPMicrobiomeHostPredictions. Source code for the remaining analyses including determination of mismatched host variables, case–control matching algorithms, and construction of permuted case–control cohorts can be obtained at: https://github.com/ivanvujkc/AGP_confounders.
The Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214 (2012).
Falony, G. et al. Population-level analysis of gut microbiome variation. Science 352, 560–564 (2016).
Hsiao, E. Y. et al. Microbiota modulate behavioral and physiological abnormalities associated with neurodevelopmental disorders. Cell 155, 1451–1463 (2013).
Plovier, H. et al. A purified membrane protein from Akkermansia muciniphila or the pasteurized bacterium improves metabolism in obese and diabetic mice. Nat. Med. 23, 107–113 (2017).
Belkaid, Y. & Hand, T. W. Role of the microbiota in immunity and inflammation. Cell 157, 121–141 (2014).
Callahan, B. J. et al. DADA2: high-resolution sample inference from Illumina amplicon data. Nat. Methods 13, 581–583 (2016).
McDonald, D. et al. American gut: an open platform for citizen science. mSystems 3, e00031-18 (2018).
Forslund, K. et al. Disentangling type 2 diabetes and metformin treatment signatures in the human gut microbiota. Nature 528, 262–266 (2015).
Qin, J. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012).
Thingholm, L. B. et al. Obese individuals with and without type 2 diabetes show different gut microbial functional capacity and composition. Cell Host Microbe 26, 252–264.e10 (2019).
Mandal, S. et al. Analysis of composition of microbiomes: a novel method for studying microbial composition. Microb. Ecol. Health Dis. 26, 27663 (2015).
Larsen, N. et al. Gut microbiota in human adults with type 2 diabetes differs from non-diabetic adults. PLoS ONE 5, e9085 (2010).
Egshatyan, L. et al. Gut microbiota and diet in patients with different glucose tolerance. Endocr. Connect. 5, 1–9 (2016).
Karlsson, F. H. et al. Gut metagenome in European women with normal, impaired and diabetic glucose control. Nature 498, 99–103 (2013).
He, Y. et al. Regional variation limits applications of healthy gut microbiome reference ranges and disease models. Nat. Med. 24, 1532–1535 (2018).
Gevers, D. et al. The treatment-naive microbiome in new-onset Crohn’s disease. Cell Host Microbe 15, 382–392 (2014).
Vich Vila, A. et al. Gut microbiota composition and functional changes in inflammatory bowel disease and irritable bowel syndrome. Sci. Transl. Med. 10, eaap8914 (2018).
Lloyd-Price, J. et al. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature 569, 655–662 (2019).
Vujkovic-Cvijin, I. et al. HIV-associated gut dysbiosis is independent of sexual practice and correlates with noncommunicable diseases. Nat. Commun. 11, 2448 (2020).
Llopis, M. et al. Intestinal microbiota contributes to individual susceptibility to alcoholic liver disease. Gut 65, 830–839 (2016).
Ciocan, D. et al. Bile acid homeostasis and intestinal dysbiosis in alcoholic hepatitis. Aliment. Pharmacol. Ther. 48, 961–974 (2018).
Dubinkina, V. B. et al. Links of gut microbiota composition with alcohol dependence syndrome and alcoholic liver disease. Microbiome 5, 141 (2017).
Le Roy, C. I. et al. Red wine consumption associated with increased gut microbiota α-diversity in 3 independent cohorts. Gastroenterology 158, 270–272.e2 (2020).
Valles-Colomer, M. et al. The neuroactive potential of the human gut microbiota in quality of life and depression. Nat. Microbiol. 4, 623–632 (2019).
Reese, A. T. et al. Using DNA metabarcoding to evaluate the plant component of human diets: a proof of concept. mSystems 4, e00458-19 (2019).
Noguera-Julian, M. et al. Gut microbiota linked to sexual preference and HIV infection. EBioMedicine 5, 135–146 (2016).
Amir, A. et al. Correcting for microbial blooms in fecal samples during room-temperature shipping. mSystems 2, e00199-16 (2017).
Vujkovic-Cvijin, I. et al. Dysbiosis of the gut microbiota is associated with HIV disease progression and tryptophan catabolism. Sci. Transl. Med. 5, 193ra91 (2013).
Deschasaux, M. et al. Depicting the composition of gut microbiota in a population with varied ethnic origins but shared geography. Nat. Med. 24, 1526–1531 (2018).
Yasuda, K. et al. Biogeography of the intestinal mucosal and lumenal microbiome in the rhesus macaque. Cell Host Microbe 17, 385–391 (2015).
Cadwell, K. et al. Virus-plus-susceptibility gene interaction determines Crohn’s disease gene Atg16L1 phenotypes in intestine. Cell 141, 1135–1145 (2010).
Zhernakova, A. et al. Population-based metagenomics analysis reveals markers for gut microbiome composition and diversity. Science 352, 565–569 (2016).
Wilck, N. et al. Salt-responsive gut commensal modulates TH17 axis and disease. Nature 551, 585–589 (2017).
Korem, T. et al. Bread affects clinical parameters and induces gut microbiome-associated personal glycemic responses. Cell Metab. 25, 1243–1253.e5 (2017).
Ojala, M. & Garriga, G. C. Permutation tests for studying classifier performance. J. Mach. Learn. Res. 11, 1833–1863 (2010).
Barter, R. L. & Yu, B. Superheat: an R package for creating beautiful and extendable heatmaps for visualizing complex data. J. Comput. Graph. Stat. 27, 910–922 (2018).
Seidell, J. C. & Halberstadt, J. The global burden of obesity and the challenges of prevention. Ann. Nutr. Metab. 66 (Suppl. 2), 7–12 (2015).
Palleja, A. et al. Recovery of gut microbiota of healthy adults following antibiotic exposure. Nat. Microbiol. 3, 1255–1265 (2018).
Pasolli, E. et al. Accessible, curated metagenomic data through ExperimentHub. Nat. Methods 14, 1023–1024 (2017).
This research was supported by the Intramural Research Program of the National Institute of Allergy and Infectious Diseases (NIAID), National Institutes of Health (NIH). I.V.-C. was funded by the Cancer Research Institute Irvington Postdoctoral Fellowship Award and the Intramural AIDS Research Fellowship Award (NIH). Y.B. was funded by the NIAID Division of Intramural Research (ZIA-AI001115 and ZIA-AI001132), the NIH Director’s Challenge Award program and the Deputy Director for Intramural Research Innovation Award program. R.K. was funded by the NIH Pioneer Award (DP1 AT010885-01). L.N. was partially supported by the National Institute of Diabetes and Digestive and Kidney Diseases (1R01DK110541-01A1). We thank P. Grayson (National Institute of Arthritis and Musculoskeletal and Skin Diseases/NIH), P. Reiss (University of Amsterdam), A. Stacy (NIAID/NIH) and S.-J. Han (NIAID/NIH) for helpful discussion; as well as all members, contributors, administrators and volunteers of the American Gut Consortium for facilitating the AGP as an open-access resource for the microbiome science community.
R.K. is a director of the Center for Microbiome Innovation at UC San Diego, which receives industry research funding for various microbiome initiatives, but no industry funding was provided for this project. The remaining authors declare no competing interests.
Peer review information Nature thanks Eran Elinav and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Raw V4 16S rRNA reads were processed using dada2 and samples were filtered and selected as described in the text and Methods to form the ‘core sample population’. Balanced cohorts were constructed for each binary questionnaire variable, and Random Forests analyses were repeated 25 times over 75/25 splits. Concurrently, sample classes were randomly permuted to simulate noise and the same procedure was performed to facilitate empirical P value estimations.
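The permutation-based empirical P value described in this legend can be illustrated with a minimal sketch in plain Python. It is not the authors' pipeline: as a simplifying assumption, a fixed score per sample stands in for the Random Forests classifier and its repeated 75/25 splits, and the function names `auroc` and `empirical_p` are hypothetical.

```python
import random

def auroc(scores, labels):
    """Rank-based AUROC: probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

def empirical_p(scores, labels, n_perm=1000, seed=0):
    """Compare the observed AUROC with AUROCs obtained after shuffling class labels.

    Assumption: scores are held fixed; in a full analysis the classifier
    would be retrained on each set of permuted labels.
    """
    rng = random.Random(seed)
    observed = auroc(scores, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # simulate noise by permuting sample classes
        if auroc(scores, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (exceed + 1) / (n_perm + 1)
```

Calling `empirical_p(scores, labels)` returns the observed AUROC together with the fraction of permuted-label runs that matched or exceeded it.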
Extended Data Fig. 2 Machine-learning evaluation of common exclusion criteria and variables for matching.
a, Random Forests analysis was performed on binary metadata variables commonly used as exclusion/inclusion criteria in comparative gut microbiota surveys (n = 4,038 subjects). Red labels represent variables chosen for exclusion while blue labels represent included subjects. Centre lines represent median values of 100-repeat mean AUROCs, boxes denote interquartile ranges, and whiskers denote 1.5 × interquartile ranges. b, Support vector machine analysis was performed on subjects by age group. Shown is a normalized confusion matrix, averaged across all cross-validation folds. Hierarchical clustering using Euclidean distances with average weighting is shown to the right. c, Random Forests AUROC values for all variables with empirical P < 0.05, shown by variable category. Analysis results for “disease-inclusive” cohorts (with only T2D and IBD removed as per final exclusion criteria, n = 5,878) are shown as well as results using only subjects reporting no medical diagnoses of diseases (“disease-exclusive” cohorts, n = 2,971). Centre lines represent median values of 100-repeat mean AUROCs, boxes denote interquartile ranges, and whiskers denote 1.5 × interquartile ranges. d, Random Forests AUROC values for physiological, lifestyle, and diet variables in subjects reporting no medical diagnoses of diseases (“disease-exclusive” cohorts, n = 2,971; x-axis) compared to disease-inclusive cohorts (n = 5,878). Outlined in black are representative cohorts for all variables chosen for matching. For frequency-based variables, the frequency categories (for example, daily, regular) with highest AUROC in the disease-exclusive cohorts are outlined. e, Spearman co-correlation heatmap of all top microbiota-associated variables (those with median AUROC >0.7 and P < 0.05 by Random Forests). Absolute values of Spearman rho correlation coefficients are shown for each variable pair at their intersections.
f, Whole grain consumption frequency between non-coeliac subjects reporting no dietary gluten intake (a binary variable that exhibited mean AUROC >0.7 and P < 0.05 by Random Forests). g, Whole grain consumption frequency between coeliac subjects and non-coeliac subjects that report no special gluten-free diet (also mean AUROC >0.7 and P < 0.05). h, Subjects taking vitamin supplements are older than those not taking vitamin supplements. As in f and g, significance assessed by two-sided Mann–Whitney U test. i, Age and smoking frequency display a non-monotonic association. Accordingly, ‘Hoeffding’s D’ statistical test was used to find a significant non-monotonic association between the two variables.
Extended Data Fig. 3 Evaluation of Random Forest microbiota association strengths compared to beta diversity assessments and as a function of sample size.
Shown are plots wherein each dot represents results for a single binary cohort representing a single variable including all those listed in Supplementary Table 1. Cohort sizes were capped at 1,500 cases and controls. P values and non-parametric Spearman correlation coefficients are shown in each plot for each comparison. a, Random Forest AUROC values correlate with beta diversity-based PERMANOVA F statistics, and Random Forests analysis finds significant differences between cases and controls for fewer cohorts than does PERMANOVA. b, Sample size exhibits no significant correlation with Random Forests AUROC values. c, Sample size correlates with PERMANOVA F statistics. d, Sample size correlates strongly with PERMANOVA R² effect size values for each variable. e, From binary and frequency host variables, variables were selected that had n > 800 samples and mean AUROC >0.65 (total n = 21 host variables). Sample cohorts for each variable were systematically down-sampled by random selection of subjects such that one case-control cohort was constructed with n = 50, 100, 150, 200, and then in size increments of 100 until reaching the final cohort size. Mean AUROC values were calculated for each cohort and mean values are represented by red dots with blue depicting the 95% confidence interval. f, Cohort size for maximal model accuracy was determined as the first cohort size at which Random Forests empirical P reached a value less than 0.05 and mean AUROC reached a 90% interval of the final AUROC (that of the full cohort).
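The down-sampling schedule in panel e and the selection rule in panel f can be sketched as below. This is a hedged illustration, not the authors' code: the function names are hypothetical, the full cohort size is assumed to be appended as the last step of the schedule, and "reached a 90% interval of the final AUROC" is interpreted here as a mean AUROC of at least 90% of the full-cohort value.

```python
def downsample_sizes(final_n):
    """Cohort sizes 50, 100, 150, 200, then steps of 100, ending at final_n."""
    sizes = [n for n in (50, 100, 150, 200) if n < final_n]
    n = 300
    while n < final_n:
        sizes.append(n)
        n += 100
    sizes.append(final_n)  # assumption: the full cohort is always the last point
    return sizes

def size_for_maximal_accuracy(results, fraction=0.9, alpha=0.05):
    """Return the first cohort size meeting both criteria, or None.

    results: list of (size, mean_auroc, empirical_p) tuples sorted by size,
    with the full cohort as the final entry.
    """
    final_auroc = results[-1][1]
    for size, mean_auroc, p in results:
        if p < alpha and mean_auroc >= fraction * final_auroc:
            return size
    return None
```

For a hypothetical variable with 850 samples, `downsample_sizes(850)` yields the sizes at which a cohort would be drawn, and `size_for_maximal_accuracy` is then applied to the per-size results.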
Extended Data Fig. 4 Comparison of microbiota–disease association strengths between disease-inclusive and disease-exclusive cohorts.
a, b, Differences in PERMANOVA F statistics between matched and unmatched cohorts within disease-exclusive analyses with all subjects reporting medical diagnoses removed, analogous to Fig. 2. Subjects in ‘matched’ cohorts were matched for confounding variables shown to differ between cases and controls (purple) in panel a on a per-disease basis. Boxes represent interquartile ranges in F statistics from 25 permuted cohorts per matched/unmatched condition. Centre lines within boxes represent median F statistic values. c, F statistics denoting differences between cases and controls for each disease among unmatched (location-only matched) cohorts comparing disease-exclusive to disease-inclusive results. Spearman rho = 0.81, P = 3.2 × 10⁻⁵. d, F statistics denoting differences between cases and controls for each disease among confounder-matched cohorts comparing disease-exclusive to disease-inclusive results. Spearman rho = 0.64, P = 2.9 × 10⁻³. e, Concordance in whether matching reduces or increases case-control microbiota differences was examined for disease-inclusive and disease-exclusive results. Differences in F statistics between matched and unmatched cohorts for each disease were calculated. Shown are F statistics differences for disease-inclusive cohorts (x-axis) and disease-exclusive cohorts (y-axis). Chi-square P = 0.0073, assuming random distribution of points across quadrants as the null hypothesis.
Extended Data Fig. 5 Machine-learning and compositional analyses for diseases before and after confounder matching.
a, Matching cases and controls for key microbiota confounding variables substantially reduces observed microbiota differences between cases and controls, as assessed by machine-learning methods. Random Forests analysis was performed as in Fig. 2 on location-paired unmatched case control cohorts (red boxes) and case control cohorts matched for confounding variables shown in Fig. 2 (blue boxes). Empirical P value significance based on comparison of AUROCs to permuted ‘shuffled’ data was calculated as described in Methods. Boxes represent interquartile ranges in 100-repeat mean AUROC values per matched/unmatched condition. Centre lines within boxes represent median AUROC values. b, Numbers of differentially abundant ASVs in disease cases versus controls before and after matching cohorts for confounding variables. ANCOM W score thresholds were calculated and ASVs are shown that met each threshold. Notably for type 2 diabetes, 26 ASVs differed significantly before matching, while zero ASVs differed post-matching.
Linear mixed effects analyses were performed as described for Fig. 4j. a, Shown are ASVs that passed unadjusted P < 0.05 in the comparison of diabetics to non-diabetic controls via linear mixed effects models, as compared to the more conservative cutoffs shown in Fig. 4j (Benjamini-Hochberg Q value <0.05). b, Shown are ASVs with Benjamini-Hochberg Q value <0.05 in the comparison of diabetics to non-diabetic controls via linear mixed effects models, with ASVs associated with confounding variables identified by ANCOM as having a W score indicating rejection of the null hypothesis for >80% of log ratio comparisons for that ASV.
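The contrast between the unadjusted P < 0.05 cutoff in panel a and the Benjamini-Hochberg Q value cutoff can be illustrated with a short sketch of the standard Benjamini-Hochberg adjustment. The function name and the example P values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted Q values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that each Q value
    # is no larger than the Q values of all less significant tests.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

# Hypothetical P values for four ASVs.
p = [0.01, 0.04, 0.03, 0.20]
q = benjamini_hochberg(p)
passing_unadjusted = sum(1 for v in p if v < 0.05)
passing_adjusted = sum(1 for v in q if v < 0.05)
```

In this toy example, three features pass the unadjusted threshold but only one passes after adjustment, which is why the Q value cutoff is the more conservative of the two.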
Extended Data Fig. 7 Validation of confounding effects of host variables in external independent cohorts of type 2 diabetes and metabolic syndrome.
a, Microbiota-associated host variable distributions between cases and controls in prominent type 2 diabetes gut microbiota surveys. Unpaired t-tests were performed where raw data were available. For studies in which raw data were partial or not found, P values reported in each original publication are shown (Forslund et al., Egshatyan et al.). Centre lines denote mean and whiskers denote standard deviation. b, Matched and unmatched T2D case-control cohorts were constructed from independent studies shown. Student’s t-test was used to compare PERMANOVA F statistic values between randomly selected unmatched cohorts to cohorts that were matched for available confounder metadata (age, BMI, and BMI respectively). Cohort selections were bootstrapped by re-selecting case and control subjects 25 times for both unmatched and matched cohorts. Metformin+T2D were selected for comparison to non-diabetic controls for the study by Forslund et al. Success of matching was assessed using Wilcoxon signed-rank tests and matched cohorts exhibited median Q > 0.05 (ns) for each available confounding variable. c, Metabolic syndrome was examined in an external independent study. BMI, age, and sex were found to differ between location-matched (matched by district in Guangdong) subjects and metabolic syndrome cases. Subjects were matched by these variables including district, and F statistics were compared to unmatched (district-only-matched) case-control cohorts. Centre lines represent median values, boxes denote interquartile ranges, and whiskers denote 1.5 × interquartile ranges. *P ≤ 0.05; **P ≤ 0.01; ***P ≤ 0.001; ****P ≤ 0.0001.
Extended Data Fig. 8 Assessment of strength of confounding effects for microbiota-associated confounding host variables.
a, Cases and controls were matched for all relevant matching variables except one that was held out (‘leave one out’ (LOO)). The effect of the single variable held out was then assessed by comparing the increase of PERMANOVA F statistic between cases and controls to that of the total change in F statistic from fully matched to unmatched case-control cohorts. Thus, an assessment for the relative independent contribution of each variable to confounding effects in the setting of matching for all other variables was obtained for each variable for each disease. b, Matching by a single variable was performed and resulting F statistics were similarly compared to the difference in F statistics from unmatched to fully matched cohorts, as described in Methods. In a and b, centre lines represent median values, boxes denote interquartile ranges, and whiskers denote 1.5 × interquartile ranges.
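One way to read the comparisons in panels a and b is as a ratio of F statistic changes, sketched below. The exact formula is an assumption inferred from the legend wording rather than taken from the Methods, and the function names and numbers are hypothetical.

```python
def loo_contribution(f_unmatched, f_matched, f_loo):
    """Fraction of the total matching effect regained when one variable is left unmatched.

    Assumed formula: (F_LOO - F_matched) / (F_unmatched - F_matched).
    """
    total = f_unmatched - f_matched
    if total == 0:
        raise ValueError("matching did not change the F statistic")
    return (f_loo - f_matched) / total

def single_variable_contribution(f_unmatched, f_matched, f_single):
    """Fraction of the total matching effect achieved by matching on one variable alone.

    Assumed formula: (F_unmatched - F_single) / (F_unmatched - F_matched).
    """
    total = f_unmatched - f_matched
    if total == 0:
        raise ValueError("matching did not change the F statistic")
    return (f_unmatched - f_single) / total
```

With hypothetical values F = 10 unmatched and F = 2 fully matched, a leave-one-out F of 4 would attribute a quarter of the total change to the held-out variable.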
Extended Data Fig. 9 Examination of the effects of alcohol consumption on the gut microbiota with external validation.
a, ASV abundances were collapsed to the genus level and log10 mean fold changes were calculated between daily versus never drinkers in the AGP dataset (x axis) and compared to log10 mean fold changes in daily/weekly versus monthly/never drinkers in an external validation dataset (y axis). Spearman correlation test P = 5 × 10⁻⁵. b, ASVs in differential abundance in all alcohol consumer cohorts compared to matched control non-drinker subjects, by ANCOM. Matched cohorts were constructed by selecting controls matched for all confounding variables and ANCOM was performed. ASVs found to differ significantly between cases and controls are marked by green circles and denoted by their ANCOM threshold. c, Alcohol consumption frequency, number per session, and cumulative weekly consumption are confounded for various microbiota-associated host variables. d, Microbiota covariate association strength as estimated by Random Forests empirical P value tests for alcohol consumption cohorts. Alcohol subjects were matched to never-drinker controls for confounding variables shown in Extended Data Fig. 9a and Random Forests analysis was performed as in Fig. 2. Bars denote interquartile ranges of AUROCs from 100 repeats. Empirical P = 0.0739, P = 0.0495. n = 350 participants per group. e, Subjects reporting drinking only one type of alcohol (beer/cider, red wine, white wine, or spirits/hard alcohol), were compared to non-drinkers matched for variables shown in (c). Cohort sample sizes were increased when including drinkers who consumed multiple types, and significant median PERMANOVA P values were observed: P = 0.004, P = 0.007, P = 0.021, P = 0.076. In d and e, centre lines represent median values, boxes denote interquartile ranges, and whiskers denote 1.5 × interquartile ranges. f, Alpha diversity was calculated for subjects reporting consumption of each alcohol type (inclusive of those who also drink other types).
Lines depict differences in median alpha diversity between cases and controls for each of the 25 re-sampled case-control cohorts. Unadjusted two-sided paired Student’s t-tests were performed. †P ≤ 0.10, *P ≤ 0.05.
a, Subjects reporting solid or loose bowel movement (BM) quality were compared to subjects reporting normal BM quality in terms of their distribution of microbiota-confounding variables. All BM subject cohorts were thus subsequently matched for sex, alcohol, BMI, whole grain, and salted snack consumption (for Fig. 4e, f). b, ASV abundances were collapsed to the genus level and log10 mean fold changes were calculated between solid versus normal BM quality subjects in the AGP dataset (x axis) and compared to log10 mean fold changes in solid versus normal BM quality subjects in an external validation dataset¹⁵ (y axis). Spearman correlation test P = 10⁻¹⁶.
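The cross-cohort comparison of genus-level log10 mean fold changes can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the published analysis: the fold change is taken as the log10 ratio of group means, the pseudocount is an added safeguard against zero abundances that the legend does not mention, and all function names are hypothetical.

```python
import math

def log10_fold_change(case_abundances, control_abundances, pseudocount=1e-6):
    """log10 of the ratio of mean abundances for one genus (cases over controls)."""
    mean_case = sum(case_abundances) / len(case_abundances)
    mean_ctrl = sum(control_abundances) / len(control_abundances)
    return math.log10((mean_case + pseudocount) / (mean_ctrl + pseudocount))

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Each genus contributes one fold change per dataset, and `spearman_rho` is then computed across genera between the two lists of fold changes.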
Supplementary Table 1: A) Random Forests analysis of common exclusion criteria. Included are class definitions for each cohort, sample and population sizes, and exclusion criteria used. B) Random Forests analysis of all questionnaire variables, with new exclusion criteria imposed. Included are class definitions for each cohort, sample and population sizes, and exclusion criteria used.
Supplementary Table 2: Random Forest and PERMANOVA output values for all location-paired cohorts representing each non-disease host metadata variable.
Supplementary Table 3: Random Forest ASV importance values denoting relative contribution of each ASV to classifiers for each non-disease unmatched cohort. Mean fold changes for each ASV in each non-disease unmatched cohort. ANCOM threshold values for each OTU passing ANCOM filters in each non-disease unmatched cohort.
Supplementary Table 4: A) Distribution of microbiota-associated variables in unmatched disease cases versus controls and statistical assessments of skewing. B) Case/control cohort sample sizes and median PERMANOVA P values and F statistics for all matched and unmatched disease cohorts.
Supplementary Table 5: Statistical data on ASVs differing in abundance between T2D cases and controls with cohorts that were unmatched and adjusted, unmatched without adjustment, and with matched cohorts.
Supplementary Table 6: ASVs in differential abundance between matched cases and controls for all queried human diseases by ANCOM.
About this article
Cite this article
Vujkovic-Cvijin, I., Sklar, J., Jiang, L. et al. Host variables confound gut microbiota studies of human disease. Nature 587, 448–454 (2020). https://doi.org/10.1038/s41586-020-2881-9