Gut microbiota is associated with obesity and cardiometabolic disease in a population in the midst of Westernization

Westernization and its accompanying epidemiological transitions are associated with changes in gut microbiota. While the extremes of this lifestyle spectrum have been compared (hunter-gatherers, industrialized countries), populations undergoing such shifts have received little attention. To fill the knowledge gap about how the microbiome evolves following broad lifestyle changes and how disease-associated dysbiosis emerges, we performed a cross-sectional study in which we characterized the microbiota of 441 Colombian adults through 16S rRNA gene sequencing and determined its relationship with demographic, health-related and dietary parameters. We showed that the gut microbiota of this cohort harbors taxa characteristic of both hunter-gatherers (Prevotella, Treponema) and citizens of industrialized countries (Bacteroides, Bifidobacterium, Barnesiella); the relative abundances of these taxa differed from those in Western and non-Western populations. We also showed that the Colombian gut microbiota is composed of five consortia of co-abundant microorganisms that are differentially associated with lifestyle, obesity and cardiometabolic disease, and highlighted metabolic pathways that might explain associations between microbiota and host health. Our results give insights into the evolution of the gut microbiota and underscore the importance of this community to human health. Promoting the growth of specific microbial consortia could help ameliorate physiological conditions associated with Western lifestyles.


Prevotella-Bacteroides co-exclusion.
A pattern commonly observed in microbiome studies is the co-exclusion of Prevotella and Bacteroides 1-5, which has been suggested to suffice for describing enterotypes 6. We took advantage of the curatedMetagenomicData 7 package to analyze the breadth of this co-exclusion in 16 benchmark metagenomic studies. This meta-analysis confirmed the co-exclusion between these two taxa, which was stronger in Western (Spearman's rho=-0.32, p=0.002) than in non-Western populations (rho=-0.21, p<0.001). The negative correlation between Prevotella and Bacteroides in the Colombian cohort was intermediate between those of Western and non-Western populations (Spearman's rho=-0.26, p<0.0001); this co-exclusion did not distinguish clear types of microbiota (Fig. SR1).
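The co-exclusion statistic reported above is a Spearman rank correlation between the relative abundances of the two genera across individuals. A minimal sketch of that calculation follows; it is not the authors' code, the function names are hypothetical, and the five abundance values are invented toy data chosen only to show a perfectly inverse ranking. It computes rho only, not the p-value.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy relative abundances for five hypothetical individuals (invented values).
prevotella = [0.40, 0.30, 0.05, 0.01, 0.20]
bacteroides = [0.02, 0.10, 0.35, 0.50, 0.15]

rho = spearman_rho(prevotella, bacteroides)
print(f"Spearman's rho = {rho:.2f}")
```

A negative rho indicates that individuals with more Prevotella tend to have less Bacteroides; values such as -0.26 in the text correspond to a much weaker inverse relationship than in this toy example.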

Replicability of the CAG clustering.
We explored the replicability of the five CAGs identified in the Colombian dataset using genus-level abundances from most of the benchmark metagenomic studies available in the curatedMetagenomicData 7 package (11 countries comprising 1600 individuals; we excluded datasets from Austria, Germany, Luxembourg, Peru and Tanzania because they included too few individuals for robust CAG inference). Note that OTU abundances were unavailable in these datasets.
For each dataset, we applied the methodology used to define CAGs in the Colombian dataset (see Methods in the main text) and compared the taxonomic composition of each CAG with that of the CAGs identified in the Colombian cohort. In this way, we counted the number of times taxa clustered with "expected" microbes and the number of times there were "unexpected" associations. As an example, consider stool data from the Human Microbiome Project. In this dataset, we detected a CAG containing Bifidobacterium, Collinsella, Coprococcus, Dorea, Faecalibacterium, Ruminococcus and Streptococcus. All these microbes, except Streptococcus, clustered within the Lachnospiraceae-CAG in our Colombian cohort; Streptococcus was expected to cluster within the Pathogen-CAG. We counted the first six cases as checked (1 point was given to each taxon), while Streptococcus counted as unchecked (0 points). The replicability of a given CAG was calculated as the sum across datasets of all checked cases for the taxa "expected" to cluster in that CAG, divided by the sum of checked and unchecked cases.
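The scoring scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the text does not say how a detected CAG is matched to a reference CAG, so the sketch assumes a majority rule (the reference CAG to which most members belong). The function names are hypothetical, and the reference table lists only the seven genera named in the worked example.

```python
from collections import Counter, defaultdict

# Reference assignment (genus -> CAG in the Colombian cohort).
# Only the genera named in the worked example are listed here.
EXPECTED_CAG = {
    "Bifidobacterium": "Lachnospiraceae",
    "Collinsella": "Lachnospiraceae",
    "Coprococcus": "Lachnospiraceae",
    "Dorea": "Lachnospiraceae",
    "Faecalibacterium": "Lachnospiraceae",
    "Ruminococcus": "Lachnospiraceae",
    "Streptococcus": "Pathogen",
}


def score_detected_cag(members, expected=EXPECTED_CAG):
    """Give each known taxon 1 point (checked) or 0 points (unchecked).

    ASSUMPTION: the detected CAG is matched to the reference CAG shared by
    most of its members; ties are resolved by first occurrence.
    """
    known = [t for t in members if t in expected]
    if not known:
        return {}
    majority = Counter(expected[t] for t in known).most_common(1)[0][0]
    return {t: int(expected[t] == majority) for t in known}


def replicability(datasets, expected=EXPECTED_CAG):
    """Checked / (checked + unchecked) per reference CAG, summed over datasets."""
    checked = defaultdict(int)
    total = defaultdict(int)
    for detected_cags in datasets.values():
        for members in detected_cags:
            for taxon, point in score_detected_cag(members, expected).items():
                reference = expected[taxon]  # points accrue to the taxon's expected CAG
                checked[reference] += point
                total[reference] += 1
    return {reference: checked[reference] / total[reference] for reference in total}


# The Human Microbiome Project example from the text: one detected CAG.
datasets = {
    "HMP": [[
        "Bifidobacterium", "Collinsella", "Coprococcus", "Dorea",
        "Faecalibacterium", "Ruminococcus", "Streptococcus",
    ]],
}

scores = replicability(datasets)
for cag, value in sorted(scores.items()):
    print(f"{cag}-CAG replicability: {value:.0%}")
```

With this single detected CAG, the six Lachnospiraceae-CAG genera are checked and Streptococcus is unchecked, so the unchecked case lowers the score of the Pathogen-CAG rather than that of the Lachnospiraceae-CAG. Adding more datasets to `datasets` pools the counts before the ratio is taken, as in the text.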
This analysis indicated that most datasets formed well-defined CAGs, some of which overlapped with the five CAGs uncovered in the Colombian cohort. In particular, the Akkermansia-Bacteroidales-, Pathogen- and Lachnospiraceae-CAGs were 70-80% replicable across datasets, whereas the Prevotella- and Ruminococcaceae-CAGs were less common (Table SR1). This is not surprising given that important taxa aggregating within the latter CAGs are enriched in non-Western populations (e.g., Prevotella, Ruminococcaceae), while the datasets used for comparison originated mostly from Westernized populations, where these taxa are rarer. Even though co-abundance patterns are fundamentally dataset-dependent, several of the CAGs that we identified in the Colombian cohort were partly replicable and might represent general ecological associations in the human gut microbiota.

For each food, the proportion of individuals who reported having eaten it in the last 24 hours and the mean intake in grams are given. Macronutrient intake in the studied population, expressed as the percentage of calories contributed by total carbohydrates, protein and total fat, was (mean ± SD) 55.4 ± 3.0%, 15.7 ± 1.4% and 28.7 ± 2.5%, respectively. Fiber intake was 17.7 ± 5.1 g.

Marker taxa
Sample size, Westernization status and relative abundances of the marker taxa of the countries included in the analysis of publicly available datasets.

Spearman's rho
Spearman's correlation coefficients of the operational taxonomic units (OTUs) that had a median abundance ≥0.01% and that were significantly correlated (q-value<0.05) with one of the first three axes of the weighted UniFrac principal coordinates analysis (PCoA). Rho<-0.3 or rho>0.3, and q-value<0.05 are highlighted in bold. The percentage of explained variation is given for each PCoA axis.

Abundances of inferred metabolic modules for each co-abundance group (CAG) of microbes in the subset of participants forming high-abundance poles (HAPs; n=114). P-values (from Kruskal-Wallis tests) and q-values (Benjamini-Hochberg correction) denote differences in the functional potential among CAGs.

Correlations between the relative abundances of metabolic modules depicted in Fig. 5 and CAG abundances in the complete dataset (n=441). Spearman's rho and q-values are shown.