Western and non-western gut microbiomes reveal new roles of Prevotella in carbohydrate metabolism and mouth–gut axis

The abundance and diversity of host-associated Prevotella species have a profound impact on human health. To investigate the composition, diversity, and functional roles of Prevotella in the human gut, a population-wide analysis was carried out on 586 healthy samples from western and non-western populations, including the largest Indian cohort, comprising 200 samples, and on 189 Inflammatory Bowel Disease samples from western populations. A higher abundance and diversity of Prevotella copri species enriched in enzymes that metabolize complex plant polysaccharides, particularly pullulanase-containing polysaccharide-utilization loci (PULs), were found in Indian and non-western populations. A higher diversity of Prevotella species associated with oral inflammation, together with an enrichment of virulence factors and antibiotic resistance genes in the gut microbiome of western populations, suggests the existence of a mouth-gut axis. The study revealed the landscape of Prevotella composition in the human gut microbiome and its impact on health in western and non-western populations.

b) Relative abundance of Prevotella genomes retrieved from the NCBI database and MAGs present in more than 80% of samples in non-western populations. c) Relative abundance of Prevotella genomes retrieved from the NCBI database and MAGs having average relative abundance > 0.5. Axis label: relative abundance of genomes belonging to the Prevotella genus (> 1% relative abundance).

Supplementary Figure 13: Composition of P. copri clades in different healthy populations. a) Representation of genomes/bins from each clade in different healthy populations. It is represented using four different criteria: for each population, the percentage of genomes/bins from each clade was calculated by requiring that they be present in at least one sample, in more than 10% of the samples, in more than 50% of the samples, and in more than 70% of the samples. b) Relative abundance of the four P. copri clades in each population.

Supplementary Note 1: Classification of samples as western and non-western
The classification of populations as 'western' or 'non-western' was made on the basis of traditional lifestyle, diet, and geographic and sociodemographic definitions. Western populations include Europe, the USA and Canada 1-4 (see https://www.worldatlas.com/articles/list-of-western-countries.html). A clear definition of the western and non-western classification of cohorts was given in a recent study entitled "Extensive Unexplored Human Microbiome Diversity Revealed by Over 150,000 Genomes from Metagenomes Spanning Age, Geography, and Lifestyle". That study describes westernization and urbanization as synonymous terms for a complex process that occurred during the last few centuries and involved profound lifestyle changes compared to populations prior to the modern era. The non-western populations considered in our study were a subset of the cohorts labeled as 'non-western' by that study. Following it, we adopt "Westernized" and "non-Westernized" as umbrella terms for populations that differ by at least the majority of the above factors, even though this definition comprises very heterogeneous populations. Prevotella genome abundance data showed that the location of sample collection makes the largest, and a statistically significant, contribution to the inter-sample variation in the Indian population (barplot).

Supplementary Note 4: Clustering of Prevotella genomes and examining differentially abundant Prevotella genomes in western and non-western populations
Using the 2,204 genomes, we generated individual distance trees for the Prevotella genus using the 'complete' hierarchical clustering method implemented in the Fastcluster R package 5. We calculated the number of clusters recovered using distance cut-offs of 0.05 (95% ANI), 0.03 (97% ANI) and 0.01 (99% ANI) (Almeida et al., 2020) 6, which yielded 228, 502 and 1,856 clusters, respectively.
We also clustered the 2,204 Prevotella genomes at a distance cut-off of 0.00 (100% ANI). This yielded 2,204 clusters, indicating that no two genomes/bins are 100% identical.
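The cut-off procedure described above can be illustrated with a minimal sketch. It assumes SciPy's complete-linkage implementation in place of the Fastcluster R package used in the study, and the 4 x 4 distance matrix (distance = 1 - ANI) is a made-up toy example, not data from this work. The function name `count_clusters` is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def count_clusters(dist_matrix, cutoffs):
    """Count complete-linkage clusters at each distance cut-off.

    dist_matrix: symmetric matrix of pairwise distances (1 - ANI).
    cutoffs: iterable of distance thresholds, e.g. 0.05 for 95% ANI.
    """
    # linkage() expects the condensed (upper-triangle) form of the matrix.
    condensed = squareform(np.asarray(dist_matrix, dtype=float), checks=True)
    tree = linkage(condensed, method="complete")
    counts = {}
    for cutoff in cutoffs:
        # Cut the tree so that members of a cluster are at most `cutoff` apart.
        labels = fcluster(tree, t=cutoff, criterion="distance")
        counts[cutoff] = len(set(labels))
    return counts


# Toy distances for four hypothetical genomes (assumption, for illustration).
toy = [
    [0.000, 0.020, 0.040, 0.200],
    [0.020, 0.000, 0.045, 0.210],
    [0.040, 0.045, 0.000, 0.220],
    [0.200, 0.210, 0.220, 0.000],
]
counts = count_clusters(toy, [0.05, 0.03, 0.01])
print(counts)  # {0.05: 2, 0.03: 3, 0.01: 4}
```

As in the text, tightening the cut-off from 0.05 to 0.01 splits the genomes into progressively more clusters.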
Clustering at a distance cut-off of 0.05 (95% ANI; species-level clustering) resulted in 228 clusters (Supplementary Data 7). These 228 clusters were designated species-level clusters, and 36 of them (comprising 1,740 Prevotella genomes; ~78.95%) contained >= 10 genomes each. The clusters were annotated by examining where the 547 NCBI-annotated Prevotella genomes among the 2,204 total genomes were placed. 213 out of 228 (93.42%) clusters contained NCBI-annotated Prevotella genomes (Supplementary Data 2, sheet 2). We extracted a representative from each species-level cluster and plotted a phylogenetic tree (Supplementary Figure 7). Two species-level clusters (Clusters 3 and 1) contained > 100 genomes (379 and 280, respectively), and both turned out to be Prevotella copri (Supplementary Figure 8).
Cluster 5 also contains one P. copri genome, P. copri indica. The high inter-genome diversity of P. copri may explain why its genomes were assigned to different clusters. We have also plotted separate phylogenetic trees for the 36 species-level clusters having >= 10 genomes (Supplementary Figures 9 and 10).
The designated strain JS262T of Prevotella koreensis was isolated from human subgingival plaque of a periodontitis lesion. The biochemical characteristics of this strain were similar to those of P. intermedia and P. nigrescens. P. pallens was initially called a P. intermedia/nigrescens-like organism (PINLO) because of its phenotypic resemblance to these microbes 13,14. The designated strains of P. ihumii and P. koreensis were isolated recently, and their functional properties remain unexplored. The phylogenetic relatedness of P. ihumii to the P. intermedia group of bacteria has been reported in previous studies. It is evident from the literature that these Prevotella species are also involved in the complex process of biofilm formation, which is potentially connected with poor oral hygiene and contributes to oral diseases such as gingivitis and periodontitis. The phylogenetic tree given below indicates the relatedness of the newly characterized P. ihumii species and the P. intermedia group of bacteria.

Species name: Prevotella intermedia

Reports regarding association with inflammatory conditions:
• P. intermedia is an established oral pathogen belonging to the "orange complex".
• The association between gingivitis and species of this group of bacteria seems to be most pronounced for P. intermedia, which shows a recovery rate of > 70% in adults with gingivitis.
• P. intermedia was identified as the second most dominant bacterium in chronic periodontitis (after Po. gingivalis). A large contribution of its transcripts to the total mapped reads in periodontitis was observed. Interestingly, Prevotella spp. were found to be predictive for the onset of early childhood caries.
• The acute periodontal abscess seems most consistently to be associated with either Porphyromonas gingivalis or Prevotella intermedia, or both.
• P. intermedia is a rare causative agent of aortitis.
• Periodontitis and P. intermedia are associated with severe asthma.

References: 15

Figure 18a).
Further, this analysis clustered the US and Netherlands populations, which also displayed the lowest average inter-sample distances based on the Bray-Curtis distance matrix. This corroborates the clustering of these two populations based on Prevotella genome abundance (Figure 2) and suggests a similarity in Prevotella-associated carbohydrate-metabolizing activity in these two populations (Supplementary Figure 18b).
... and GH13 were among the five GHs highly abundant in the Indian population. The GH12 and GH78 families were also more abundant in the Indian cohort. GH27 and GH36 also discriminated the Indian population from other populations, though a comparable abundance was observed in the US and Netherlands populations (Figure 6a, b). Further, 773 out of 782 contigs have Neopullulanase_SusA and pullulanase (involved in the metabolism of the α-1,4 and α-1,6 linkages present in starch-derived glucans, respectively) in the same PUL. Operon prediction analysis (using operon-mapper) revealed that these two genes are in the same operon (with >95% probability).
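For reference, the average inter-sample Bray-Curtis distance mentioned above can be computed as in the following minimal sketch. The abundance profiles are invented toy values, and the function names are hypothetical; this is not the pipeline used in the study.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance profiles."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    if denominator == 0:
        raise ValueError("both profiles are empty (all zeros)")
    return numerator / denominator


def mean_pairwise_distance(samples):
    """Average Bray-Curtis dissimilarity over all pairs of samples."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        raise ValueError("need at least two samples")
    return sum(bray_curtis(u, v) for u, v in pairs) / len(pairs)


# Toy profiles for three hypothetical samples (assumption, for illustration).
profiles = [[10, 0], [5, 5], [0, 10]]
average = mean_pairwise_distance(profiles)
```

A population whose samples have similar profiles gives a value near 0, while samples that share no features give a value of 1.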
The order and frequency of genes present in the pullulanase-containing PULs gave a clearer understanding of the role of these PULs in the starch and non-starch metabolism of plant polysaccharides, mainly from cereal grains (Tables (i), (ii), (iii), (iv)).
... genomes/bins). Pairwise alignment of the contigs in the CBs against the PPG database was performed using BLASTN. The rationale was that strains of the same species should share homology across most contigs, so contigs that fail this condition probably represent contamination. Contigs in the CBs that failed to align at ≥70% nucleotide identity over ≥25% of their length to any of the closely related genomes were flagged for removal. Further, we clustered these 693 genomes/bins using Mash v1.1 with the criteria of Mash distance ≤ 0.05 and P ≤ 0.001. This resulted in 260 clusters, and contigs that were assigned to a different Mash cluster than the one to which the highest number of contigs aligned were also marked as contamination.
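The identity and length thresholds above can be sketched as a simple filter. This is an illustration only: it assumes each BLASTN hit has been reduced to a (contig, percent identity, alignment length) tuple and that coverage is taken from a single alignment, which may differ from how the study aggregated hits. The contig names, lengths and the function `flag_contigs` are hypothetical.

```python
def flag_contigs(contig_lengths, hits, min_identity=70.0, min_coverage=0.25):
    """Return contigs with no hit meeting both thresholds.

    contig_lengths: dict mapping contig name to its length in bp.
    hits: iterable of (contig, percent_identity, alignment_length) tuples.
    """
    passing = set()
    for contig, identity, aln_len in hits:
        coverage = aln_len / contig_lengths[contig]
        # A contig is kept if at least one hit satisfies both criteria.
        if identity >= min_identity and coverage >= min_coverage:
            passing.add(contig)
    return sorted(c for c in contig_lengths if c not in passing)


# Toy input (assumption, for illustration).
lengths = {"c1": 1000, "c2": 1000, "c3": 2000, "c4": 500}
blast_hits = [
    ("c1", 95.0, 800),  # passes both thresholds
    ("c2", 65.0, 900),  # identity too low
    ("c3", 80.0, 400),  # covers only 20% of the contig
]
flagged = flag_contigs(lengths, blast_hits)
print(flagged)  # ['c2', 'c3', 'c4']
```

Contig c4 is flagged because it has no hit at all, which matches the rule that a contig must align to at least one closely related genome.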

Refinement of bins on the basis of taxonomic annotation
As a next level of bin refinement, we assigned taxonomy to each contig of each bin using the CAT taxonomic assignment tool. The most recent version of the preconstructed database files was downloaded from https://tbb.bio.uu.nl/bastiaan/CAT_prepare/ and used for mapping of predicted ORFs and taxonomic assignment of each contig in the bins. All contigs in each bin that were classified to the Prevotellaceae family were retained, and the other contigs were removed. After applying the HQ binning criteria with CheckM and manually removing redundancy among the bins, 25 remained for further analysis. We also carried out BAT analysis (assigning taxa to bins) on the final set of bins to reconfirm the taxonomic lineage of each bin.
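The family-level retention step can be pictured with the short sketch below. It assumes the per-contig classifications have already been parsed into a dictionary of lineage names; this dictionary layout, the contig names and the function `retain_family_contigs` are hypothetical and do not reproduce the CAT output format.

```python
def retain_family_contigs(contig_lineages, family="Prevotellaceae"):
    """Split contigs into those whose lineage includes `family` and the rest.

    contig_lineages: dict mapping contig name to a list of taxon names.
    """
    kept = [c for c, lineage in contig_lineages.items() if family in lineage]
    kept_set = set(kept)
    removed = [c for c in contig_lineages if c not in kept_set]
    return kept, removed


# Toy lineages for one hypothetical bin (assumption, for illustration).
bin_lineages = {
    "contig_1": ["Bacteria", "Bacteroidales", "Prevotellaceae", "Prevotella"],
    "contig_2": ["Bacteria", "Bacteroidales", "Bacteroidaceae"],
    "contig_3": ["Bacteria"],
}
kept, removed = retain_family_contigs(bin_lineages)
```

Here contig_3 is removed because it is classified only to a rank above the family level, so its membership in Prevotellaceae cannot be confirmed.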