The resident microorganisms in the human body, termed the microbiota, represent diverse communities of microbial species comprising a complex ecology of tens of trillions of mainly bacterial cells1. Our gut microbiota, the largest and most diverse of these communities, is in constant interaction with our body’s cells and systems (such as the immune system)2, and it both shapes, and is being shaped by, our health status. The particular composition and diversity of the gut microbiota are associated with many health conditions3. However, it is usually not known whether such associations are just correlative or a consequence of the health condition, or whether they might cause, or contribute to, the illness. Addressing this problem is highly challenging because of the many physiological and lifestyle differences that can exist between individuals who are healthy and those who have the illness of interest. Such confounders — the variables that correlate with both microbiota and health status — might underlie the many discrepancies observed between the outcomes of different studies linking the composition of the gut microbiota and human health4.
Writing in Nature, Vujkovic-Cvijin et al.5 tackle this problem. First, they consider physiological and lifestyle differences between people with and without a particular disease, and identify differences that might themselves be associated with the composition of the gut microbiota. Such differences can cause variation in the composition of gut microbes between healthy individuals and those who have the disease. Without knowing about these differences, it would be easy to misclassify a correlative and confounding association between lifestyle and the microbiota as being an informative causal association between disease and microbiota composition.
Next, the authors attempted to deal with such confounders by taking the approach of one-to-one matching6 of individuals who had a particular condition with healthy individuals who were similar to them with regard to such potential confounders (Fig. 1). An example might be matching with an individual of the same age, gender and body mass index (a value used in assessing a person’s weight that takes height into consideration). This type of matching procedure is often used in observational studies in which individuals cannot be assigned randomly to two groups and subjected to the two different scenarios being compared7.
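To make the idea of one-to-one matching concrete, here is a minimal sketch of a greedy matching procedure. It is an illustration only, not the authors' method: the `Person` record, the `match_one_to_one` function, the age and body-mass-index tolerances and the toy participants are all hypothetical assumptions chosen for readability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    pid: str
    age: int
    sex: str
    bmi: float


def match_one_to_one(cases, controls, max_age_gap=5, max_bmi_gap=2.0):
    """Greedy matching: pair each case with the closest unused control.

    The tolerances are illustrative assumptions, not values from the study.
    """
    available = list(controls)
    pairs = []
    for case in cases:
        # Keep only controls of the same sex within both tolerances.
        candidates = [
            c for c in available
            if c.sex == case.sex
            and abs(c.age - case.age) <= max_age_gap
            and abs(c.bmi - case.bmi) <= max_bmi_gap
        ]
        if not candidates:
            continue  # a case with no acceptable control is left unmatched
        # Prefer the smallest age gap, then the smallest BMI gap.
        best = min(
            candidates,
            key=lambda c: (abs(c.age - case.age), abs(c.bmi - case.bmi)),
        )
        available.remove(best)  # each control is used at most once
        pairs.append((case.pid, best.pid))
    return pairs


# Hypothetical toy participants.
cases = [
    Person("case1", 52, "F", 31.0),
    Person("case2", 40, "M", 27.5),
    Person("case3", 70, "F", 22.0),
]
controls = [
    Person("ctrl1", 50, "F", 30.2),
    Person("ctrl2", 41, "M", 26.9),
    Person("ctrl3", 53, "F", 32.5),
    Person("ctrl4", 25, "M", 21.0),
]

pairs = match_one_to_one(cases, controls)
print(pairs)
```

In this toy example, the third case finds no control within the tolerances and is dropped, which shows one cost of matching: the matched cohort can be smaller than the original one.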
Vujkovic-Cvijin et al. report that gender, age, bowel-movement quality (categorized as stools that are solid, normal or loose), body mass index and level of alcohol consumption are among the strongest potential confounders that could hinder efforts to identify true associations between disease and gut-microbiota composition. This is because these characteristics are strongly associated both with microbiota composition and with disease status. When examining the differences between individuals with a condition such as type 2 diabetes and people who do not have this condition (but who might have other diseases), there seem to be many statistically significant associations between disease status and the abundances of different gut bacteria. By contrast, if individuals who have or do not have the disease are matched using some of the confounder criteria mentioned, many of these associations cease to be statistically significant. This implies that some gut-microbiota changes previously attributed to certain diseases might instead stem from other underlying causes related to these confounders.
For example, alcohol consumption causes gut-microbiota changes, and individuals who have certain diseases consume less alcohol than average (perhaps because of the drugs that they take). Therefore, failing to match individuals on their level of alcohol consumption could result in a misleading conclusion that microbiota changes associated with the disease are attributable to the disease itself, rather than to a below-average alcohol intake.
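The alcohol example can be sketched with a small constructed data set. Everything here is a hypothetical assumption made for illustration: the abundance values, the group sizes and the helper names `make_group` and `mean_abundance`. By construction, the abundance of a single bacterial taxon depends only on whether a person drinks alcohol, and the comparison is made within strata of alcohol use (a simpler stand-in for one-to-one matching).

```python
from statistics import mean


def make_group(has_disease, n_drinkers, n_abstainers):
    """Build records of (has_disease, drinks_alcohol, taxon_abundance).

    Abundance is set by alcohol use alone: 10.0 for drinkers, 4.0 otherwise.
    """
    drinkers = [(has_disease, True, 10.0)] * n_drinkers
    abstainers = [(has_disease, False, 4.0)] * n_abstainers
    return drinkers + abstainers


# People with the disease drink less often than healthy people.
people = make_group(True, 2, 8) + make_group(False, 8, 2)


def mean_abundance(records, has_disease, drinks=None):
    """Mean abundance in a disease group, optionally within one alcohol stratum."""
    values = [
        abundance
        for disease, drinker, abundance in records
        if disease == has_disease and (drinks is None or drinker == drinks)
    ]
    return mean(values)


# Unadjusted comparison: disease seems to be linked to lower abundance.
crude_gap = mean_abundance(people, True) - mean_abundance(people, False)

# Comparison within each alcohol stratum: the apparent link disappears.
gap_drinkers = (
    mean_abundance(people, True, drinks=True)
    - mean_abundance(people, False, drinks=True)
)
gap_abstainers = (
    mean_abundance(people, True, drinks=False)
    - mean_abundance(people, False, drinks=False)
)

print(round(crude_gap, 2), gap_drinkers, gap_abstainers)
```

The unadjusted gap is non-zero even though disease has no effect on the taxon in this toy data, whereas the gap within each alcohol stratum is zero.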
A potential problem with Vujkovic-Cvijin and colleagues’ approach is that some of the suggested confounders might be associated with disease symptoms, rather than being lifestyle choices; people in these confounding categories could in that case already be sick but undiagnosed, or on the path to being ill. In such cases, matching with healthy individuals might actually introduce bias8. For example, matching people on their level of alcohol intake makes no sense when studying alcoholic liver disease. Moreover, even if potential confounders are not linked to the defining symptoms of the disease in question, or are not unique to that disease, it should still be a cause for concern if matching for the confounder would mean that the resulting matched group is not representative of healthy individuals. For instance, matching people who have lung cancer with individuals who do not have it but who have smoked heavily for the same number of years will not provide a truly healthy control group.
With that in mind, people with inflammatory bowel disease should not be matched with a healthy control group on the basis of bowel-movement quality. Nor should people who have type 2 diabetes be matched with a healthy cohort on the basis of blood levels of glycated haemoglobin (HbA1c), a marker of long-term excess blood sugar levels; the authors do not match on this measure. Researchers should also be suspicious of matching people who have type 2 diabetes with a healthy cohort on the basis of body mass index.
In an effort to address this issue, the authors repeated their analysis using a smaller cohort, in which none of the individuals in the healthy group self-reported any type of disease at all (the previous criterion for healthy individuals was just those who did not self-report the specific disease of interest). They found similar associations between disease status and the physiological and lifestyle differences, although these associations were now either less statistically significant than in the original analysis or no longer significant. Unfortunately, removing individuals with any self-reported disease does not rule out matching the people from the disease cohort with control individuals who might nevertheless be undiagnosed, or whose disease status might be borderline; this could happen if, for example, people who have diabetes are matched with those who are pre-diabetic. This problem, whose scope extends beyond this study, raises a key question for all medical studies: what constitutes a healthy cohort?
Finally, it is important to remember that identifying potential confounders of the association between gut-microbiota composition and human health does not imply that the two are unrelated. Nor does it imply a lack of causality where a relationship does exist. For example, if alcohol consumption causes changes to the microbiota that, in turn, contribute to developing type 2 diabetes, then a causal effect exists between the microbiota and the disease; but this will not be seen after matching individuals on their level of alcohol consumption. The same will be true if inflammatory bowel disease results in the types of microbiota change that cause diarrhoea, and individuals are matched on their bowel-movement quality. Thus, Vujkovic-Cvijin and colleagues’ results do not rule out the microbiota having a causal effect.
The question of causality between the microbiota and human disease is a central topic in studies in this area. These findings will certainly continue to fuel research in the field for years to come, and Vujkovic-Cvijin et al. have advanced our thinking on this issue.
Nature 587, 373-374 (2020)