Host variables confound gut microbiota studies of human disease

Vujkovic-Cvijin, Ivan; Sklar, Jack; Jiang, Lingjing; Natarajan, Loki; Knight, Rob; Belkaid, Yasmine

doi:10.1038/s41586-020-2881-9

Article
Published: 04 November 2020

Nature volume 587, pages 448–454 (2020)

Abstract

Low concordance between studies that examine the role of microbiota in human diseases is a pervasive challenge that limits the capacity to identify causal relationships between host-associated microorganisms and pathology. The risk of obtaining false positives is exacerbated by wide interindividual heterogeneity in microbiota composition¹, probably due to population-wide differences in human lifestyle and physiological variables² that exert differential effects on the microbiota. Here we infer the greatest, generalized sources of heterogeneity in human gut microbiota profiles and also identify human lifestyle and physiological characteristics that, if not evenly matched between cases and controls, confound microbiota analyses to produce spurious microbial associations with human diseases. We identify alcohol consumption frequency and bowel movement quality as unexpectedly strong sources of gut microbiota variance that differ in distribution between healthy participants and participants with a disease and that can confound study designs. We demonstrate that for numerous prevalent, high-burden human diseases, matching cases and controls for confounding variables reduces observed differences in the microbiota and the incidence of spurious associations. On this basis, we present a list of host variables that we recommend should be captured in human microbiota studies for the purpose of matching comparison groups, which we anticipate will increase robustness and reproducibility in resolving the members of the gut microbiota that are truly associated with human disease.

**Fig. 1: Physiological, lifestyle and dietary characteristics strongly associate with the composition of the gut microbiota.**

**Fig. 2: Human participants with a disease vary from healthy controls in critical microbiota-associated variables that confound microbiota analyses.**

**Fig. 3: Variation in microbiota due to confounding variables spuriously increases observations of disease-associated microbiota differences.**

**Fig. 4: Alcohol intake and BMQ are associated with robust effects on microbiota composition that confound microbiota studies of human disease.**

Data availability

The sequencing data of the AGP used herein are available at the EBI (https://www.ebi.ac.uk/) database under study accession ID: MGYS00000596. External validation cohort data are available at NCBI BioProject PRJNA589036 (for alcohol consumption replication) and NCBI BioProject PRJEB18535 (for BMQ replication).

Code availability

Source code for machine-learning analyses can be obtained at: https://github.com/jacksklar/AGPMicrobiomeHostPredictions. Source code for the remaining analyses including determination of mismatched host variables, case–control matching algorithms, and construction of permuted case–control cohorts can be obtained at: https://github.com/ivanvujkc/AGP_confounders.

References

1. The Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214 (2012).
2. Falony, G. et al. Population-level analysis of gut microbiome variation. Science 352, 560–564 (2016).
3. Hsiao, E. Y. et al. Microbiota modulate behavioral and physiological abnormalities associated with neurodevelopmental disorders. Cell 155, 1451–1463 (2013).
4. Plovier, H. et al. A purified membrane protein from Akkermansia muciniphila or the pasteurized bacterium improves metabolism in obese and diabetic mice. Nat. Med. 23, 107–113 (2017).
5. Belkaid, Y. & Hand, T. W. Role of the microbiota in immunity and inflammation. Cell 157, 121–141 (2014).
6. Callahan, B. J. et al. DADA2: high-resolution sample inference from Illumina amplicon data. Nat. Methods 13, 581–583 (2016).
7. McDonald, D. et al. American gut: an open platform for citizen science. mSystems 3, e00031-18 (2018).
8. Forslund, K. et al. Disentangling type 2 diabetes and metformin treatment signatures in the human gut microbiota. Nature 528, 262–266 (2015).
9. Qin, J. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012).
10. Thingholm, L. B. et al. Obese individuals with and without type 2 diabetes show different gut microbial functional capacity and composition. Cell Host Microbe 26, 252–264.e10 (2019).
11. Mandal, S. et al. Analysis of composition of microbiomes: a novel method for studying microbial composition. Microb. Ecol. Health Dis. 26, 27663 (2015).
12. Larsen, N. et al. Gut microbiota in human adults with type 2 diabetes differs from non-diabetic adults. PLoS ONE 5, e9085 (2010).
13. Egshatyan, L. et al. Gut microbiota and diet in patients with different glucose tolerance. Endocr. Connect. 5, 1–9 (2016).
14. Karlsson, F. H. et al. Gut metagenome in European women with normal, impaired and diabetic glucose control. Nature 498, 99–103 (2013).
15. He, Y. et al. Regional variation limits applications of healthy gut microbiome reference ranges and disease models. Nat. Med. 24, 1532–1535 (2018).
16. Gevers, D. et al. The treatment-naive microbiome in new-onset Crohn’s disease. Cell Host Microbe 15, 382–392 (2014).
17. Vich Vila, A. et al. Gut microbiota composition and functional changes in inflammatory bowel disease and irritable bowel syndrome. Sci. Transl. Med. 10, eaap8914 (2018).
18. Lloyd-Price, J. et al. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature 569, 655–662 (2019).
19. Vujkovic-Cvijin, I. et al. HIV-associated gut dysbiosis is independent of sexual practice and correlates with noncommunicable diseases. Nat. Commun. 11, 2448 (2020).
20. Llopis, M. et al. Intestinal microbiota contributes to individual susceptibility to alcoholic liver disease. Gut 65, 830–839 (2016).
21. Ciocan, D. et al. Bile acid homeostasis and intestinal dysbiosis in alcoholic hepatitis. Aliment. Pharmacol. Ther. 48, 961–974 (2018).
22. Dubinkina, V. B. et al. Links of gut microbiota composition with alcohol dependence syndrome and alcoholic liver disease. Microbiome 5, 141 (2017).
23. Le Roy, C. I. et al. Red wine consumption associated with increased gut microbiota α-diversity in 3 independent cohorts. Gastroenterology 158, 270–272.e2 (2020).
24. Valles-Colomer, M. et al. The neuroactive potential of the human gut microbiota in quality of life and depression. Nat. Microbiol. 4, 623–632 (2019).
25. Reese, A. T. et al. Using DNA metabarcoding to evaluate the plant component of human diets: a proof of concept. mSystems 4, e00458-19 (2019).
26. Noguera-Julian, M. et al. Gut microbiota linked to sexual preference and HIV infection. EBioMedicine 5, 135–146 (2016).
27. Amir, A. et al. Correcting for microbial blooms in fecal samples during room-temperature shipping. mSystems 2, e00199-16 (2017).
28. Vujkovic-Cvijin, I. et al. Dysbiosis of the gut microbiota is associated with HIV disease progression and tryptophan catabolism. Sci. Transl. Med. 5, 193ra91 (2013).
29. Deschasaux, M. et al. Depicting the composition of gut microbiota in a population with varied ethnic origins but shared geography. Nat. Med. 24, 1526–1531 (2018).
30. Yasuda, K. et al. Biogeography of the intestinal mucosal and lumenal microbiome in the rhesus macaque. Cell Host Microbe 17, 385–391 (2015).
31. Cadwell, K. et al. Virus-plus-susceptibility gene interaction determines Crohn’s disease gene Atg16L1 phenotypes in intestine. Cell 141, 1135–1145 (2010).
32. Zhernakova, A. et al. Population-based metagenomics analysis reveals markers for gut microbiome composition and diversity. Science 352, 565–569 (2016).
33. Wilck, N. et al. Salt-responsive gut commensal modulates T_H17 axis and disease. Nature 551, 585–589 (2017).
34. Korem, T. et al. Bread affects clinical parameters and induces gut microbiome-associated personal glycemic responses. Cell Metab. 25, 1243–1253.e5 (2017).
35. Ojala, M. & Garriga, G. C. Permutation tests for studying classifier performance. J. Mach. Learn. Res. 11, 1833–1863 (2010).
36. Barter, R. L. & Yu, B. Superheat: an R package for creating beautiful and extendable heatmaps for visualizing complex data. J. Comput. Graph. Stat. 27, 910–922 (2018).
37. Seidell, J. C. & Halberstadt, J. The global burden of obesity and the challenges of prevention. Ann. Nutr. Metab. 66 (Suppl. 2), 7–12 (2015).
38. Palleja, A. et al. Recovery of gut microbiota of healthy adults following antibiotic exposure. Nat. Microbiol. 3, 1255–1265 (2018).
39. Pasolli, E. et al. Accessible, curated metagenomic data through ExperimentHub. Nat. Methods 14, 1023–1024 (2017).

Acknowledgements

This research was supported by the Intramural Research Program of the National Institute of Allergy and Infectious Diseases (NIAID), National Institutes of Health (NIH). I.V.-C. was funded by the Cancer Research Institute Irvington Postdoctoral Fellowship Award and the Intramural AIDS Research Fellowship Award (NIH). Y.B. was funded by the NIAID Division of Intramural Research (ZIA-AI001115 and ZIA-AI001132), the NIH Director’s Challenge Award program and the Deputy Director for Intramural Research Innovation Award program. R.K. was funded by the NIH Pioneer Award (DP1 AT010885-01). L.N. was partially supported by the National Institute of Diabetes and Digestive and Kidney Diseases (1R01DK110541-01A1). We thank P. Grayson (National Institute of Arthritis and Musculoskeletal and Skin Diseases/NIH), P. Reiss (University of Amsterdam), A. Stacy (NIAID/NIH) and S.-J. Han (NIAID/NIH) for helpful discussion; as well as all members, contributors, administrators and volunteers of the American Gut Consortium for facilitating the AGP as an open-access resource for the microbiome science community.

Author information

Jack Sklar
Present address: Communications Technology Laboratory, National Institute of Standards and Technology, Boulder, CO, USA

Authors and Affiliations

Metaorganism Immunity Section, Laboratory of Immune Systems Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, USA
Ivan Vujkovic-Cvijin, Jack Sklar & Yasmine Belkaid
National Institute of Allergy and Infectious Diseases Microbiome Program, National Institutes of Health, Bethesda, MD, USA
Jack Sklar & Yasmine Belkaid
Division of Biostatistics, University of California San Diego, La Jolla, CA, USA
Lingjing Jiang & Loki Natarajan
Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
Rob Knight
Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
Rob Knight
Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
Rob Knight
Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA
Rob Knight

Contributions

I.V.-C. conceived and led the study, wrote the manuscript with contributions from all authors, and performed all beta diversity-based ecological analyses, internal validations, validations on external cohorts, visualizations, selection of exclusion/matching criteria and quantification of confounding effects. J.S. designed the machine-learning strategies. J.S. and I.V.-C. designed and implemented the participant matching algorithms. L.J., L.N. and R.K. contributed statistical analyses including compositionally based differential abundance tests and benchmarking of case–control matching processes. Y.B. oversaw project completion, secured funding and contributed to the manuscript.

Corresponding authors

Correspondence to Ivan Vujkovic-Cvijin or Yasmine Belkaid.

Ethics declarations

Competing interests

R.K. is a director of the Center for Microbiome Innovation at UC San Diego, which receives industry research funding for various microbiome initiatives, but no industry funding was provided for this project. The remaining authors declare no competing interests.

Additional information

Peer review information Nature thanks Eran Elinav and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Data processing and machine-learning analysis framework.

Raw V4 16S rRNA reads were processed using dada2 and samples were filtered and selected as described in the text and Methods to form the ‘core sample population’. Balanced cohorts were constructed for each binary questionnaire variable, and Random Forests analyses were repeated 25 times over 75/25 splits. Concurrently, sample classes were randomly permuted to simulate noise and the same procedure was performed to facilitate empirical P value estimations.
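The empirical P value idea in this caption can be illustrated with a minimal sketch. It assumes that classifier scores are already available and, as a simplification, shuffles labels against fixed scores rather than retraining a Random Forests model on each permuted cohort as the caption describes. The function names (`auroc`, `permutation_null`, `empirical_p`) and the +1 correction are illustrative choices, not taken from the authors' repository.

```python
import random


def auroc(scores, labels):
    """AUROC via the Mann-Whitney identity; tied pairs count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_null(scores, labels, n_perm=1000, seed=0):
    """AUROCs obtained after randomly permuting class labels (simulated noise)."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(auroc(scores, shuffled))
    return null


def empirical_p(observed, null_values):
    """One-sided empirical P: share of null AUROCs at least as large as observed.

    The +1 in numerator and denominator (an assumption of this sketch)
    keeps the estimate from ever being exactly zero.
    """
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)
```

A typical call would compute `auroc` on held-out predictions, build the null with `permutation_null`, and pass both to `empirical_p`.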

Extended Data Fig. 2 Machine-learning evaluation of common exclusion criteria and variables for matching.

a, Random Forests analysis was performed on binary metadata variables commonly used as exclusion/inclusion criteria in comparative gut microbiota surveys (n = 4,038 subjects). Red labels represent variables chosen for exclusion while blue labels represent included subjects. Centre lines represent median values of 100-repeat mean AUROC’s, boxes denote interquartile ranges, and whiskers denote 1.5*interquartile ranges. b, Support vector machine analysis was performed on subjects by age group. Shown is a normalized confusion matrix, averaged across all cross-validation folds. Hierarchical clustering using Euclidean distances with average weighting is shown to the right. c, Random Forests AUROC values for all variables with empirical P < 0.05, shown by variable category. Analysis results for “disease-inclusive” cohorts (with only T2D and IBD removed as per final exclusion criteria, n = 5,878) are shown as well as results using only subjects reporting no medical diagnoses of diseases (“disease-exclusive” cohorts, n = 2,971). Centre lines represent median values of 100-repeat mean AUROC’s, boxes denote interquartile ranges, and whiskers denote 1.5*interquartile ranges. d, Random Forests AUROC values for physiological, lifestyle, and diet variables in subjects reporting no medical diagnoses of diseases (“disease-exclusive” cohorts, n = 2,971; x-axis) compared to disease-inclusive cohorts (n = 5,878). Outlined in black are representative cohorts for all variables chosen for matching. For frequency-based variables, the frequency categories (for example, daily, regular) with highest AUROC in the disease-exclusive cohorts are outlined. e, Spearman co-correlation heatmap of all top microbiota-associated variables (those with median AUROC >0.7 and P < 0.05 by Random Forests). Absolute values of Spearman rho correlation coefficients are shown for each variable pair at their intersections. 
f, Whole grain consumption frequency between non-coeliac subjects reporting no dietary gluten intake (a binary variable that exhibited mean AUROC >0.7 and P < 0.05 by Random Forests). g, Whole grain consumption frequency between coeliac subjects and non-coeliac subjects that report no special gluten-free diet (also mean AUROC >0.7 and P < 0.05). h, Subjects taking vitamin supplements are older than those not taking vitamin supplements. As in f and g, significance assessed by two-sided Mann–Whitney U test. i, Age and smoking frequency display a non-monotonic association. Accordingly, ‘Hoeffding’s D’ statistical test was used to find a significant non-monotonic association between the two variables.

Extended Data Fig. 3 Evaluation of Random Forest microbiota association strengths compared to beta diversity assessments and as a function of sample size.

Shown are plots wherein each dot represents results for a single binary cohort representing a single variable including all those listed in Supplementary Table 1. Cohort sizes were capped at 1,500 cases and controls. P values and non-parametric Spearman correlation coefficients are shown in each plot for each comparison. a, Random Forests AUROC values correlate with beta diversity-based PERMANOVA F statistics, and Random Forests analysis finds significant differences between cases and controls for fewer cohorts than does PERMANOVA. b, Sample size exhibits no significant correlation with Random Forests AUROC values. c, Sample size correlates with PERMANOVA F statistics. d, Sample size correlates strongly with PERMANOVA R² effect size values for each variable. e, From binary and frequency host variables, variables were selected that had n > 800 samples and mean AUROC >0.65 (total n = 21 host variables). Sample cohorts for each variable were systematically down-sampled by random selection of subjects such that one case-control cohort was constructed with n = 50, 100, 150, 200, and then in size increments of 100 until reaching the final cohort size. Mean AUROC values were calculated for each cohort and mean values are represented by red dots with blue depicting 95% confidence interval. f, Cohort size for maximal model accuracy was determined as the first cohort size at which Random Forests empirical P reached a value less than 0.05 and mean AUROC reached a 90% interval of the final AUROC (that of the full cohort).
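For readers unfamiliar with the PERMANOVA quantities named in this caption, the following is a minimal sketch of the standard pseudo-F statistic and R² computed from a pairwise distance matrix. It is a textbook formulation, not the authors' code; the function name `permanova_f` and the list-of-lists input format are assumptions, and the permutation step that yields a P value is omitted.

```python
def permanova_f(dist, groups):
    """Return (pseudo-F, R^2) for a square distance matrix and group labels.

    dist   : list of lists, dist[i][j] is the distance between samples i and j
    groups : list of group labels, one per sample (e.g. "case" / "control")
    """
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    if a < 2 or n <= a:
        raise ValueError("need at least two groups and more samples than groups")

    # Total sum of squares from all pairwise squared distances.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n

    # Within-group sum of squares, one term per group.
    ss_within = 0.0
    for g in labels:
        idx = [i for i, x in enumerate(groups) if x == g]
        pair_sum = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pair_sum / len(idx)

    ss_among = ss_total - ss_within
    if ss_within == 0:
        raise ValueError("within-group variation is zero; pseudo-F is undefined")
    pseudo_f = (ss_among / (a - 1)) / (ss_within / (n - a))
    r_squared = ss_among / ss_total
    return pseudo_f, r_squared
```

With Euclidean distances on a single variable, this reduces to the ordinary one-way ANOVA F statistic, which is a convenient sanity check.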

Extended Data Fig. 4 Comparison of microbiota–disease association strengths between disease-inclusive and disease-exclusive cohorts.

a, b, Differences in PERMANOVA F statistics between matched and unmatched cohorts within disease-exclusive analyses with all subjects reporting medical diagnoses removed, analogous to Fig. 2. Subjects in ‘matched’ cohorts were matched for confounding variables shown to differ between cases and controls (purple) in panel a on a per-disease basis. Boxes represent interquartile ranges in F statistics from 25 permuted cohorts per matched/unmatched condition. Centre lines within boxes represent median F statistic values. c, F statistics denoting differences between cases and controls for each disease among unmatched (location-only matched) cohorts comparing disease-exclusive to disease-inclusive results. Spearman rho = 0.81, P = 3.2 × 10⁻⁵. d, F statistics denoting differences between cases and controls for each disease among confounder-matched cohorts comparing disease-exclusive to disease-inclusive results. Spearman rho = 0.64, P = 2.9 × 10⁻³. e, Concordance in whether matching reduces or increases case-control microbiota differences was examined for disease-inclusive and disease-exclusive results. Differences in F statistics between matched and unmatched cohorts for each disease were calculated. Shown are F statistics differences for disease-inclusive cohorts (x-axis) and disease-exclusive cohorts (y-axis). Chi-square P = 0.0073, assuming random distribution of points across quadrants as the null hypothesis.

Extended Data Fig. 5 Machine-learning and compositional analyses for diseases before and after confounder matching.

a, Matching cases and controls for key microbiota confounding variables substantially reduces observed microbiota differences between cases and controls, as assessed by machine-learning methods. Random Forests analysis was performed as in Fig. 2 on location-paired unmatched case control cohorts (red boxes) and case control cohorts matched for confounding variables shown in Fig. 2 (blue boxes). Empirical P value significance based on comparison of AUROCs to permuted ‘shuffled’ data was calculated as described in Methods. Boxes represent interquartile ranges in 100-repeat mean AUROC values per matched/unmatched condition. Centre lines within boxes represent median AUROC values. b, Numbers of differentially abundant ASVs in disease cases versus controls before and after matching cohorts for confounding variables. ANCOM W score thresholds were calculated and ASVs are shown that met each threshold. Notably for type 2 diabetes, 26 ASVs differed significantly before matching, while zero ASVs differed post-matching.
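To make the notion of confounder matching concrete, here is a minimal sketch of a generic greedy one-to-one matcher. It is not the authors' algorithm (their implementation is in the repository listed under Code availability); the function name `match_controls`, the dictionary record format, and the use of exact keys plus numeric tolerances are all assumptions made for illustration.

```python
def match_controls(cases, controls, exact_keys, tolerances):
    """Greedily pair each case with the first unused control that matches.

    cases, controls : lists of dicts holding host variables per subject
    exact_keys      : names of categorical variables that must be identical
    tolerances      : dict mapping numeric variable name -> max absolute difference
    Cases with no eligible control are dropped from the matched cohort.
    """
    available = list(controls)
    pairs = []
    for case in cases:
        for idx, ctrl in enumerate(available):
            same_category = all(case[k] == ctrl[k] for k in exact_keys)
            within_tolerance = all(
                abs(case[k] - ctrl[k]) <= tol for k, tol in tolerances.items()
            )
            if same_category and within_tolerance:
                pairs.append((case, ctrl))
                del available[idx]  # each control is used at most once
                break
    return pairs
```

For example, `match_controls(cases, controls, exact_keys=("sex",), tolerances={"bmi": 2.0})` would pair each case with a same-sex control whose BMI lies within 2 units. Greedy matching is order dependent, so a real analysis would typically repeat it over shuffled inputs or use an optimal matching method.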

Extended Data Fig. 6 Assessment of the capacity for statistical methods to correct for mismatching.

Linear mixed effects analyses were performed as described for Fig. 4j. a, Shown are ASVs that passed unadjusted P < 0.05 in the comparison of diabetics to non-diabetic controls via linear mixed effects models, as compared to the more conservative cutoffs shown in Fig. 4j (Benjamini-Hochberg Q value <0.05). b, Shown are ASVs with Benjamini-Hochberg Q value <0.05 in the comparison of diabetics to non-diabetic controls via linear mixed effects models, with ASVs associated with confounding variables identified by ANCOM as having a W score indicating rejection of the null hypothesis for >80% of log ratio comparisons for that ASV.
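The Benjamini-Hochberg Q values referred to in this caption follow a standard procedure, sketched below in plain Python. The function name `benjamini_hochberg` is an illustrative choice; the authors' own analysis code may compute Q values differently (for example, through a statistics package).

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted values (Q values), in input order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonic Q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q
```

An ASV would then be reported at the threshold used in the caption when its Q value is below 0.05.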

Extended Data Fig. 7 Validation of confounding effects of host variables in external independent cohorts of type 2 diabetes and metabolic syndrome.

a, Microbiota-associated host variable distributions between cases and controls in prominent type 2 diabetes gut microbiota surveys. Unpaired t-tests were performed where raw data was available. For studies in which raw data was partial or not found, P values reported in each original publication are shown (Forslund et al., Egshatyan et al.). Centre lines denote mean and whiskers denote standard deviation. b, Matched and unmatched T2D case-control cohorts were constructed from independent studies shown. Student’s t-test was used to compare PERMANOVA F statistic values between randomly selected unmatched cohorts to cohorts that were matched for available confounder metadata (age, BMI, and BMI respectively). Cohort selections were bootstrapped by re-selecting case and control subjects 25 times for both unmatched and matched cohorts. Metformin+T2D were selected for comparison to non-diabetic controls for the study by Forslund et al. Success of matching was assessed using Wilcoxon signed-rank tests and matched cohorts exhibited median Q > 0.05 (ns) for each available confounding variable. c, Metabolic syndrome was examined in an external independent study. BMI, age, and sex were found to differ between location-matched (matched by district in Guangdong) subjects and metabolic syndrome cases. Subjects were matched by these variables including district, and F statistics were compared to unmatched (district-only-matched) case-control cohorts. Centre lines represent median values, boxes denote interquartile ranges, and whiskers denote 1.5* interquartile ranges. *P ≤ 0.05; **P ≤ 0.01; ***P ≤ 0.001; ****P ≤ 0.0001.

Extended Data Fig. 8 Assessment of strength of confounding effects for microbiota-associated confounding host variables.

a, Cases and controls were matched for all relevant matching variables except one that was held out (‘leave one out’ (LOO)). The effect of the single variable held out was then assessed by comparing the increase of PERMANOVA F statistic between cases and controls to that of the total change in F statistic from fully matched to unmatched case-control cohorts. Thus, an assessment for the relative independent contribution of each variable to confounding effects in the setting of matching for all other variables was obtained for each variable for each disease. b, Matching by a single variable was performed and resulting F statistics were similarly compared to the difference in F statistics from unmatched to fully matched cohorts, as described in Methods. In a and b, centre lines represent median values, boxes denote interquartile ranges, and whiskers denote 1.5* interquartile ranges.
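One way to read the two comparisons described in this caption is as a fraction of the total change in F statistic between unmatched and fully matched cohorts. The sketch below encodes that reading; the exact formula is an assumption inferred from the caption wording, and the function names are hypothetical.

```python
def loo_contribution(f_unmatched, f_matched, f_leave_one_out):
    """Share of the total F-statistic change regained when one variable is left unmatched.

    Assumed formula: (F_LOO - F_matched) / (F_unmatched - F_matched).
    """
    total_change = f_unmatched - f_matched
    if total_change == 0:
        raise ValueError("matching did not change the F statistic")
    return (f_leave_one_out - f_matched) / total_change


def single_variable_contribution(f_unmatched, f_matched, f_single_matched):
    """Share of the total F-statistic change removed by matching on one variable only.

    Assumed formula: (F_unmatched - F_single) / (F_unmatched - F_matched).
    """
    total_change = f_unmatched - f_matched
    if total_change == 0:
        raise ValueError("matching did not change the F statistic")
    return (f_unmatched - f_single_matched) / total_change
```

Under this reading, a value near 1 would indicate that a single variable accounts for most of the confounding effect, and a value near 0 that it contributes little beyond the other matched variables.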

Extended Data Fig. 9 Examination of the effects of alcohol consumption on the gut microbiota with external validation.

a, ASV abundances were collapsed to the genus level and log₁₀ mean fold changes were calculated between daily versus never drinkers in the AGP dataset (x axis) and compared to log₁₀ mean fold changes in daily/weekly versus monthly/never drinkers in an external validation dataset (y axis). Spearman correlation test P = 5 * 10⁻⁵. b, ASVs in differential abundance in all alcohol consumer cohorts compared to matched control non-drinker subjects, by ANCOM. Matched cohorts were constructed by selecting controls matched for all confounding variables and ANCOM was performed. ASVs found to differ significantly between cases and controls are marked by green circles and denoted by their ANCOM threshold. c, Alcohol consumption frequency, number per session, and cumulative weekly consumption are confounded for various microbiota-associated host variables. d, Microbiota covariate association strength as estimated by Random Forests empirical P value tests for alcohol consumption cohorts. Alcohol subjects were matched to never-drinker controls for confounding variables shown in Extended Data Fig. 9a and Random Forests analysis was performed as in Fig. 2. Bars denote interquartile ranges of AUROCs from 100 repeats. Empirical P = 0.0739, P = 0.0495. n = 350 participants per group. e, Subjects reporting drinking only one type of alcohol (beer/cider, red wine, white wine, or spirits/hard alcohol), were compared to non-drinkers matched for variables shown in (c). Cohort sample sizes were increased when including drinkers who consumed multiple types, and significant median PERMANOVA P values were observed: P = 0.004, P = 0.007, P = 0.021, P = 0.076. In d and e, centre lines represent median values, boxes denote interquartile ranges, and whiskers denote 1.5* interquartile ranges. f, Alpha diversity was calculated for subjects reporting consumption of each alcohol type (inclusive of those who also drink other types). 
Lines depict differences in median alpha diversity between cases and controls for each of the 25 re-sampled case-control cohorts. Unadjusted two-sided paired Student’s t-tests were performed. †P ≤ 0.10, *P ≤ 0.05.

Extended Data Fig. 10 Bowel movement quality matching and external validation.

a, Subjects reporting solid or loose bowel movement (BM) quality were compared to subjects reporting normal BM quality in terms of their distribution of microbiota-confounding variables. All BM subject cohorts were thus subsequently matched for sex, alcohol, BMI, whole grain, and salted snack consumption (for Fig. 4e, f). b, ASV abundances were collapsed to the genus level and log₁₀ mean fold changes were calculated between solid versus normal BM quality subjects in the AGP dataset (x axis) and compared to log₁₀ mean fold changes in solid versus normal BM quality subjects in an external validation dataset¹⁵ (y axis). Spearman correlation test P = 10⁻¹⁶.

Supplementary information

Reporting Summary

Supplementary Table 1: A) Random Forests analysis of common exclusion criteria. Included are class definitions for each cohort, sample and population sizes, and exclusion criteria used. B) Random Forests analysis of all questionnaire variables, with new exclusion criteria imposed. Included are class definitions for each cohort, sample and population sizes, and exclusion criteria used.

Supplementary Table 2: Random Forest and PERMANOVA output values for all location-paired cohorts representing each non-disease host metadata variable.

Supplementary Table 3: Random Forest ASV importance values denoting relative contribution of each ASV to classifiers for each non-disease unmatched cohort. Mean fold changes for each ASV in each non-disease unmatched cohort. ANCOM threshold values for each OTU passing ANCOM filters in each non-disease unmatched cohort.

Supplementary Table 4: A) Distribution of microbiota-associated variables in unmatched disease cases versus controls and statistical assessments of skewing. B) Case/control cohort sample sizes and median PERMANOVA P values and F statistics for all matched and unmatched disease cohorts.

Supplementary Table 5: Statistical data on ASVs differing in abundance between T2D cases and controls with cohorts that were unmatched and adjusted, unmatched without adjustment, and with matched cohorts.

Supplementary Table 6: ASVs in differential abundance between matched cases and controls for all queried human diseases by ANCOM.

About this article

Cite this article

Vujkovic-Cvijin, I., Sklar, J., Jiang, L. et al. Host variables confound gut microbiota studies of human disease. Nature 587, 448–454 (2020). https://doi.org/10.1038/s41586-020-2881-9

Received: 21 November 2019
Accepted: 28 September 2020
Published: 04 November 2020
Issue Date: 19 November 2020
DOI: https://doi.org/10.1038/s41586-020-2881-9
