Response of the human gut and saliva microbiome to urbanization in Cameroon

Urban populations from highly industrialized countries are characterized by a lower gut bacterial diversity as well as by changes in composition compared to rural populations from less industrialized countries. To unveil the mechanisms and factors leading to this diversity loss, it is necessary to identify the factors associated with urbanization-induced shifts at a smaller geographical scale, especially in less industrialized countries. To do so, we investigated potential associations between a variety of dietary, medical, parasitological and socio-cultural factors and the gut and saliva microbiomes of 147 individuals from three populations along an urbanization gradient in Cameroon. We found that the presence of Entamoeba sp., a commensal gut protozoan, followed by stool consistency, were major determinants of the gut microbiome diversity and composition. Interestingly, urban individuals have retained most of their gut eukaryotic and bacterial diversity despite significant changes in diet compared to the rural areas, suggesting that the loss of bacterial microbiome diversity observed in industrialized areas is likely associated with medication. Finally, we observed a weak positive correlation between the gut and the saliva microbiome diversity and composition, even though the saliva microbiome is mainly shaped by habitat-related factors.

Enterotype and stomatotype distribution in the studied populations.
Enterotypes Stomatotypes  F1  F2  F3  F4  S1  S2  Rural  26  9  32  11  20  22  Semi-urban  11  12  4  4  14  11  Urban  5  19  0  8  26  5 The human gut and saliva microbiomes along a gradient of urbanization in Cameroon (Lokmer et al.) the F3 enterotype. In addition, the urban population had almost twice more individuals with F4 (25%) compared to the semi-urban (13%) and rural population (14%). For all three data subsets, enterotypes were best predicted by the presence of Entamoeba sp., which was found in 60% of F2 and 75% of F3 individuals, but only in 28% of F1 and 9% of F4 individuals. The next most important factors were the water source and urbanization level, further followed by the Bristol stool scale score for the whole dataset only, with the highestscoring individuals hosting mostly F1 and the lowest-scoring ones hosting F2, while F3 and F4 individuals were characterized by intermediate values.
The high prevalence of Prevotella-dominated enterotypes (F1 and F3) is not surprising for a country characterized by low level of industrialization, but it is interesting that some previous studies did not identify the Ruminococcaceae enterotype in lowly industrialized African populations 5,6 , which was common in our study. On the other hand, Mobeen et al. 6 reported a Ruminococcaceae-dominated enterotype as typical for urban Colombians. Finally, whereas the Bacteroides enterotype is expected to increase in prevalence with increasing urbanization and industrialization [5][6][7][8] , it is surprising that we found no direct association with diet, despite the dietary differences between rural and urban populations and the previously established links with diet 7 . Still, it is possible that this is due to the fact that no single dietary item, but rather an average diet is associated with the enterotype variation and is thus in our case best predicted by the urbanization level as a whole. Finally, similar to Vandeputte et al. 9 and Falony et al. 10 who studied industrialized populations, we observed an association between the Bristol stool scale score and enterotypes, with Ruminococcaceae scoring lowest and Prevotella enterotype highest on the scale. In conclusion, the observed discrepancies with the previous studies could partially be explained by the scarcity of data on enterotypes in Africa 11 , taken that the identity and prevalence of enterotypes may vary considerably with geography 5,6,11,12 .

Stomatotypes
We identified two stomatotypes in the studied populations: S1, dominated by Prevotella, Streptococcus and to lesser extent by Veillonella, Neisseria and Pasteurellaceae, and S2, enriched in an Enterobacteriaceae genus. 84% of individuals in urban population hosted the S1 stomatotype, which was significantly different from the semi-urban and rural populations (Fisher's exact test, p=0.005 and 0.054), where both stomatotypes were equally represented.
Regarding the best stomatotype predictors, only the rural-urban data subsets had Kappa values > 0.3 (representing a "fair agreement" 13 ) and were considered for interpretation. In both cases, the stomatotype was best predicted by habitat-related factors, and additionally with the consumption of peanuts if FFQ data were included. Specifically, S2 individuals were more likely to live in houses without cement floor and tap water, to have animals in household and eat more peanuts, all of which were associated with the rural environment.
Stomatotypes 4 have received much less attention than enterotypes so far. Ding & Schloss 3 identified four different saliva bacterial community types, whereas Willis et al. 4 found two. Our S2, dominated by Prevotella, overall corresponds to Ding's & Schloss's 3 A and C and Willis's et al. 4 stomatotype 2. On the other hand, Enterobacteriaceae (enriched in our S1) have been previously found in significant concentrations in the saliva microbiome of African populations [14][15][16] , whereas they are virtually absent in other parts of the world 3,4,14,16,17 ). Li et al. 16 proposed that high abundance of Enterobacteriaceae could be due to high temperatures, but this does not explain the absence of Enterobacteriaceae in hunter-gatherer and traditional agriculturist populations from Phillipines 17 . Whereas Ding & Schloss 3 found significant association between the Prevotella dominated stomatotypes and enterotypes, we found no such correlation. However, it is important to mention that Prevotella was abundant in both stomatoypes defined here, despite the Enterobacteriacae dominance in S1. Finally, it is noteworthy that grouping of the saliva microbiomes into three stomatotypes was only slightly worse according to the goodness-of-fit statistics. This third stomatotype was dominated by Streptococcus, followed by Prevotella, Neisseria and unclassified Pasteurellaceae. This stomatoype shared some characteristics with the S1 of Willis et al. 4 (higher relative abundance of Neisseria and to certain extent Haemophilus) and was mostly dominant in the urban population.

Phylofactorization
(Supplementary Table 6, Supplementary Fig. 5, two figures below) Although ASV (amplicon sequence variants) seem to be best suited for assessing fine ecological differences between similar ecosystems 18,19 , different factors may act at different scales (e.g. 20 ) and clustering may be a better approach in some cases 18 . However, it is often difficult to know a priori which scales are important and, in addition, multiple scales may be affected by the factor(s) of interest. In order to address this issue, Washburne et al. 21 developed a method called phylofactorization (implemented in the R package phylofactor), based on a graph-partitioning algorithm that cuts the phylogenetic tree into groups of lineages ("phylofactors") whose relative abundance changes the most in response to the studied factor, in our case to urbanization. Briefly, phylofactors are created based on some objective function. We used a custom function -a generalized least squares model as implemented in the function gls in the R package nlme 22 , with variance structure specified if needed (see Methods for the alpha diversity analysis. We used the maximization of F-statistic as a criterion for phylofactor selection. According to the authors 21 , in this way the selected phylofactors represent the lineages that most predictably vary with the studied factor. We did not pre-specify the number of phylofactors, but used a KS (Kolmogorov-Smirnov) test as a stopping criterion as described in 21 .
The human gut and saliva microbiomes along a gradient of urbanization in Cameroon (Lokmer et al.) Phylofactors in the gut microbiome Phylofactorization detected 104 lineages (ASVs or clades) consistently affected by urbanization in the gut microbiome (based on a tree with 561 edges). The fact that only a minority (17) of these phylofactors encompassed more than a single ASV indicates that urbanization in this case affected the gut microbiome mostly at a fine phylogenetic scale. The results were overall concordant with ALDex2 (see Results and Supplementary Table 5), although the relative importance and statistical significance differed to certain extent between the methods. For example, the most differentially abundant OTU (Succinivibrio) according to ALDEx2 was selected only as the 17 th phylofactor (Aeromonadales). However, the direction of changes and qualitative conclusions remained concordant. Interestingly, 16 phylofactors were composed of one or more Prevotella ASVs that all had consistently higher abundance in rural samples, suggesting that different strains disappear at different rates in response to urbanization. Prevotella is often found in high relative abundance in rural populations 23 Table S7). Similarly, different Ruminococcacae primarily responded differently to the urbanization level, but some could be to certain extent associated with the presence of Entamoeba (Supplementary Table 7). Notably, the first phylofactor was composed of a Ruminococcaceae ASV whose abundance decreased with urbanization. The largest phylofactor, F48, contained ten bacteria mostly classified as Erysipelotrichaceae, with relative abundances decreasing along the urbanization gradient. Interestingly, Erysipelotrichaceae have been linked with high-fat diet 33 and obesity 34 , whereas their link with inflammatory bowel diseases is less clear (for review see 35 ). Phylofactors in the gut microbiome -labels consist of ASV ID, lowest assigned taxonomic level, phylofactor number and +/-for increase/decrease with urbanization -single ASV phylofactors are marked by pink dot, multi-ASV by shaded rectangles The human gut and saliva microbiomes along a gradient of urbanization in Cameroon (Lokmer et al.)

Phylofactors in the saliva microbiome
For the saliva microbiome, 26 edges out of 217 tested significantly differentiated between the three populations, with nine phylofactors composed of more than one ASV. Unlike the bacteria in the gut microbiome, the response direction was generally consistent for most genera (e.g. Streptococcus, Veilonella), with an interesting exception of Prevotella (which always decreased with urbanization in the gut microbiome). Interestingly, the 20 th phylofactor reflected the enrichment in Enterobacteriaceae the rural populations, which characterizes the stomatotype S1 discussed above. Due to this strong enrichment, relative abundances of the most phylofactors increase with increasing urbanization, with the exception of several Prevotella and Pasteurellaceae ASVs, both of which have been associated with various diseases 36 , also with the healthy oral microbiomes of rural populations from non-industrialized countries 17 .
Overall, phylofactorization offers a complementary approach to the methods with a priori defined taxonomic scales. However, its results depend on the quality of the phylogenetic tree 21 and high-quality trees will be needed in order to use its full potential. Still, it highlights the need to investigate the response of the gut microbiome to urbanization at the finest taxonomic scales in order to decipher ecological interactions within the gut ecosystem. Otu0029 Enterobacteriaceae_unclassified F18− Phylofactors in the saliva microbiome -labels consist of ASV ID, lowest assigned taxonomic level, phylofactor number and +/-for increase/decrease with urbanization -single ASV phylofactors are marked by pink dot, multi-ASV by shaded rectangles The human gut and saliva microbiomes along a gradient of urbanization in Cameroon (Lokmer et al.) Effect of OTU clustering on the results (Supplementary Fig. 11-13, Supplementary Table 2, 5, 7, 8 & table on figshare) Association between microbiome features and host factors can depend on the phylogenetic resolution at which OTUs are defined 20,37 . To examine the effect of OTU clustering on our conclusions, we compared the results based on ASVs to the results based on the OTUs clustered by opticlust or bdtt methods, with cutoffs between 0.01 and 0.15. The effect of urbanization on alpha diversity was overall largely consistent across the clustering methods and cutoffs for both the gut and saliva microbiome ( Supplementary Fig.  11). This was also generally the case for the best predictors of the gut microbiome alpha diversity, although the selected set of predictors somewhat varied across the data subsets (whole, rural-urban with and without FFQ data) and diversity indices (Supplementary Table  3, Supplementary Fig. 12). For the saliva microbiome, apart from the overall consistent results obtained for 1 D ph and 2 D ph , the predictive power of the models based on other indices was low and therefore not considered as reliable for interpretation. Regarding the gut microbiome community composition, the results were consistent across the clustering methods and cutoffs, with the amount of variation explained by the ordistepselected models slightly higher with increasing cutoff levels ( Supplementary Fig. 13). It is noteworthy that the time since the last use of antibiotics was among the most important explanatory variables for some of the higher cutoffs, but not for the fine-scale OTUs and ASVs. Here again, the composition of the saliva microbiome varied in a less predictable manner and the explanatory factors were less consistent across the data subsets, clustering methods and levels (Supplementary Table 5, 7 & 8, Supplementary Fig. 13). Regarding the relationship between the gut and saliva microbiome, species and lineage richness were weakly but positively correlated for all cutoffs and both methods (mean Pearson r = 0.23 and r = 0.22, respectively, Supplementary Fig. 9). Although we observed some additional -always weak and positive -correlations for other indices, they varied across the clustering methods and cutoffs. In contrast to ASVs (marginally significant, p = 0.052), we found no evidence that the gut and saliva microbiome within individual shared more species than a random gut-saliva microbiome pair when controlling for the urbanization level (but the effect was significant for free permutations, Supplementary Fig. 10).    Supplementary Fig. 4. The most important predictors of ASV diversity for the rural-urban data subset selected by random forest regression. Results for the gut and saliva microbiome with (A, C) and without (B, D) FFQ data included. The values represent the mean of 10 model replicates. Depicted variables were selected by kmeans clustering of the variable importance values as described in Methods section, only for the models with R-squared > 0.1. Index explanations can be found in Table 1. See also Fig. 3. and Supplementary  Relative MSE decrease + 95%CI Supplementary Fig. 5. Major phylofactors in the gut and saliva microbiome associated with the urbanization gradient. Phylofactors are taxa or group of taxa that best differentiate between the tested groups, in this case between the urbanization levels (see also Supplementary Results). Pairs of plots for the gut microbiome phylofactors are on the left, for the saliva microbiome phylofactors on the right. The left plot in the pair shows the predicted change in relative abundance of the phylofactor on the log scale, the plot on the right shows the observed ratio of geometric means of the phylofactor vs. the rest of the community on the original scale.The taxon name is the lowest common ancestor of the taxa in the given phylofactor. Only the first six phylofactors are shown for each microbiome type

Group1/Group2
Ratio of geometric means: factor 6 S  show the first and the third axis of the microbiome variation for the whole dataset (see also Fig. 4 and Fig. 5), E-L show the first three axes of varitarion in he rural-urban data subset only. Individuals colored by urbanization level and shapes representing the community types (enterotypes or stomatotypes, see Supplementary Results) are depicted in lthe left column, contextual variables selected by ordistep and ASVs associated either with these variables or with the ordination axes (only the ASVs and variables within the lower or upper 2.5% quantile are shown, for all variables (see Supplementary Table 7 -8) are shown in the right column. ASVs and numerical variables are represented by arrows, factors are represented by their levels' names placed at the centroid of individuals of the given category. M-P Variation in community composition explained by different factors selected by envfit (right, each variable tested independently) and ordistep (left, the factors retained in the best non-redundant model) for the gut (M-N) and saliva (O-P) microbiome in the rural-urban data subset; dr = 24h-recall, ffq = food frequency questionnaire. Compete results for each ASV, explanatory variable and clusting method can be found in Supplementary upplementary Fig. 7. Enterotypes and stomatotypes identified by Dirichlet multinomial modelling. A) Enterotypes (gut) and B) stomatotypes (saliva) dentified by Dirichlet multinomial models of genus-level agglomerated data, showing the taxa accounting for 90% of the difference between the chosen and the null model. The broader olumns represent the predicted community type and the narrower columns represent the individual samples. Additional information on community types can be found in Supplementary ig. 8 and Supplementary Results.   Fig. 8. Best predictors of enterotypes and stomatotypes identified by random forest classification. A) Kappa and B) accuracy values for community types of both microbiomes, for all three data subsets. The values were acalculated either on a separated test data (pink) or as out-of-bag estimates (green).
Only the models with test Kappa >=0.3 were considered for interpretation. Variable importance for enterotype (C, E, G) and stomatotype (D, F, H) prediction in the complete (C, D) and rural-urban datasets with (E, F) and without FFQ (G, H) data, respectively. Only the variables with mean importance higher than a random variable (see Methods) are shown. The significant variables, i.e. the variables selected by kmeans clustering are shown in red bold. Community types area represented in Supplemenatry Fig. 7 and described in detail in Supplementary Results. Within location Supplementary Fig. 10. Are the gut and saliva microbiome of the same individual more similar to each other than a random gut-saliva microbiome pair? A) p-values and standardized effect size expressing if and how much the gut and saliva microbiome of a single individual are more similar (in terms of taxon turnover) to each other than a random gut-saliva pair (based on 1000 free permutations); B) p-values and standardized effect size expressing if and how much the gut and saliva microbiome of a single individual are more similar to each other than a random gut-saliva pair in the same(rural, semiurban, urban) population; C) relationship between log10 (p-value) for the free and urbanization level constrained results; D) relationship between the standardized effect sizes for the free and urbanization-level constrained results. Significant effect sizes are in red. The diagonal line in C-D represents perfect correlation.  Supplementary Fig. 11. Effect of OTU clustering on the relationship between alpha diversity and urbanization. Urbanization gradient is represented by linear polynomial contrast ("linear trend"). Bold points denote significant effects. Points represent % change with 95 % confidence intervals.       Fig. 12. Effect of OTU clustering on the selection of alpha diversity predictors. Best predictors of alpha diversity by random forest regression across the clustering methods and cutoffs. The plots show the overlap (or lack of it) between the significant predictors (selected by kmeans clustering of random forest importances, see Methods) of alpha diversity across the clustering methods and cutoffs. Only the predictors from models with R2 > 0.1 were considered for interpretation. Columns represent the results for the 1) whole dataset and for the rural-urban subset 2) with and 3) without FFQ (food frequency questionnaire data included). First six rows show the results for the gut, the last six for the saliva microbiome (each row shows the results for one index). If there is a single multicolored column -the results are completely consistent for all the significant models (R2>=0.1) across the methods and cutoffs. Note that very few models are significant for non-phylogenetic indices for the saliva microbiome. See also Supplementary  Supplementary Fig. 13. Effect of OTU clustering on the amount of variance explained and the variables explaining the variation in the gut and saliva microbiome composition. Variance explained by the ordistep-selected best model across clustering methods and cutoffs for the gut (left) and saliva (right) microbiome. Two plots are available for each data subset (whole, rural-urban with and without FFQ): the one in the first row shows the effect of individual variables, the one in the second row shows variables grouped by type (e.g. medical, dietary…). Each column represents one of the datasets obtained by two clustering methods (opticlust and bdtt) for clustering cutoffs 0.01-0.15 at 0.01 clustering steps as described in Methods. The first column represents the ASV dataset on which the main conclusions of the study are based. The variables are ordered by their contribution to the explained variation for each method and cutoff separately. dr = 24h-recall data.

List of supplementary tables
The tables are available in a separate *xls file.
Supplementary Table 1. Contextual information about the sampled individuals.
Supplementary Table 2. Alpha diversity of the gut and saliva microbiome along the urbanization gradient.
Supplementary Table 3. Best predictors of alpha diversity identified by random forest regression.
Supplementary Table 4. Correlations between the collected contextual variables (metadata); Supplementary Table 5. Differences in ASV/OTU abundances along the urbanization gradient (ALDEx2). Table 6. Phylofactors associated with the urbanization gradient for the gut and saliva microbiome. Table 7. ASVs and OTUs associated with the variation in the gut and saliva microbiome composition or metadata. Table 8. PCA coordinates of the variables selected by envfit for each data subset and microbiome type. Supplementary Table 9. Effect of quality control on number of reads and unique sequences in total and per sample.