Gut-microbiota in children and adolescents with obesity: inferred functional analysis and machine-learning algorithms to classify microorganisms

The fecal microbiome of 55 obese children and adolescents (BMI-SDS 3.2 ± 0.7) and of 25 normal-weight subjects, matched both for age and sex (BMI-SDS − 0.3 ± 1.1) was analysed. Streptococcus, Acidaminococcus, Sutterella, Prevotella, Sutterella wadsworthensis, Streptococcus thermophilus, and Prevotella copri positively correlated with obesity. The inferred pathways strongly associated with obesity concern the biosynthesis pathways of tyrosine, phenylalanine, tryptophan and methionine pathways. Furthermore, polyamine biosynthesis virulence factors and pro-inflammatory lipopolysaccharide biosynthesis pathway showed higher abundances in obese samples, while the butanediol biosynthesis showed low abundance in obese subjects. Different taxa strongly linked with obesity have been related to an increased risk of multiple diseases involving metabolic pathways related to inflammation (polyamine and lipopolysaccharide biosynthesis). Cholesterol, LDL, and CRP positively correlated with specific clusters of microbial in obese patients. The Firmicutes/Bacteroidetes-ratio was lower in obese samples than in controls and differently from the literature we state that this ratio could not be a biomarker for obesity.

Alpha and beta diversity analysis. Different alpha diversity profiling indices have been used that estimate either the community richness (Chao1-index) or richness and evenness (Shannon or Simpson indexes). It is important to highlight that none of the analyzed alpha diversity indexes reached statistical significance in the comparison between the complete case series of obese patients (55 fullOB) and the 25nwHD. In detail, the trend of Chao1 mean values were slightly higher in pathological samples (143. 2 ± 22.3) rather than in nwHDs (141.5 ± 30.1) ( Supplementary Fig. 1a). Simpson (Si) and Shannon (Sh) indexes behaved in the opposite way, where both were slightly higher in nwHDs (Si: 0.946 ± 0.014-Sh: 5.057 ± 0.320) rather than fullOB (Si: 0.87 ± 0.10-Sh: 2.83 ± 0.45) ( Supplementary Fig. 1a). The Bray-Curtis index used for beta diversity analysis clearly showed statistical significance (PERMANOVA p < 0.02) in the comparison between fullOB and nwHDs, thus indicating a difference in microbiome composition between patients and normal weight controls ( Supplementary Fig. 1b).
Comparative analysis: fullOBvsnwHD. The specific associations between taxa present in fullOB and nwHD were analyzed using the Calypso 13,14 package which considers the sparsity (i.e., a dataset with many values equal to zero) and the compositional origin of microbiome data. Specifically, we used the sparse Partial Least Squares-Discriminant Analysis (sPLS-DA) 14 that associates the importance of a specific taxon to describe a group of samples (Fig. 1). The results indicated that among the first 5 genera ordered for importance, 3 of them showed a positive correlation with fullOB (Streptococcus, Acidaminococcus and Sutterella with importance scores of 0.40, 0.39, and 0.36, respectively). The relative abundance analysis, using 4 different algorithms within MicrobiomeAnalyst, and the sparse correlation for compositional data (SparCC) analysis confirmed this result ( Fig. 2a and Supplementary Table 1). A positive correlation with fullOB was found for Sutterella wadsworthensis and Streptococcus thermophilus, both characterized by 10 times higher abundances in obese patients compared to nwHD (EdgeR log 2 fold change values 3.4795 and 3.7707 respectively with p-value < 0.01; see Supplementary  Table 1). Prevotella genus and PrevotellA we correlated with full rather than nwHD (Fig. 2a and Supplementary  Table 1). The Microbial Dysbiosis index (MD-index) for this comparison is 1.5764 (EdgeR log 2 fold change values 3.3504 and 3.3511 respectively with p-value < 0.05; see Fig. 2a) indicating a high imbalance (dysbiosis) in the microbial flora of obese patients compared to controls (nwHD).
Comparative analyses: OB-G vs nwHD and OBc-G vs nwHD. The analysis of the more relevant genera using supervised Random-Forest indicated that Streptococcus was a genus tightly linked to OB-G, followed by Sutterella, Clostridium, and Lactobacillus ( Fig. 2b and Supplementary Table 2). The species that showed a positive association with OB-Group of samples were Sutterella wadsworthensis and Blautiaproducta(OBB error:0.322; Sensitivity: 0.64 and 0.70 of Specificity; see Supplementary Table 2). Both the Sutterella genus and the Sutterella wadsworthensis species have confirmed a stronger association with OB-G rather than OBc-G, considering the number of different methods reaching statistical significance and also the result of the relative abundance analysis (Fig. 2b,c. Supplementary Tables 2 and 3 EdgeR log 2 fold change 3.7642 and 4.6938 with p-value < 0.05 and Supplementary Table 3 EdgeR log 2 fold change 2.9930 and 4.1985 with p-value < 0.05). The same was true for the Streptococcus genus (EdgeR log 2 fold change 2.6028 with p-value < 0.05)and their descendent species (i.e., S. australis, S. salivarius and S. thermophilus. EdgeR log 2 fold change 2.4211, 2.1653 and 5.0708 with p-value < 0.05) that were better described in the OB-Gpatients (Supplementary Table 2). Random-forest's supervised analysis indicated that the genera Lactobacillus, Gemminger (OBB error 0.413; Sensitivity: 0.61 and 0.56 of Specificity), and the species Coprococcus comes and Bacteroides massiliensis (OBB error 0.37; Sensitivity: 0.64 and 0.61 of Specificity) showed higher "mean decrease accuracy" and thus a positive association with the OBc-G patients (Supplementary Table 2). More in detail, in relative abundance analysis the Bacteroides massiliensis showed a statistically significant value only in the comparison between OBc-G and nwHD (Supplementary Table 3 EdgeRlog 2 fold change 2.9527 with p-value < 0.05), but not in the comparison between OB-G and nwHD (Supplementary Table 2). On the contrary, Coprococcus comes showed a positive association with both OB-G and with OBc-G group of patients (Supplementary Tables 2-3 www.nature.com/scientificreports/ It is to be noted that some Bacteroides genera behave in opposite ways, indeed Bacteroides fragilis, Bacteroides plebeius, and Bacteroides thetaiotaomicron (EdgeR log 2 fold change 3.7523, zero-inflated Gaussian fit: 0.0189 and EdgerR log 2 fold change 2.1903 with p-value < 0.05) showed higher abundances in OB-G patients compared to nwHD. While, on the contrary, Bacteroides faeces and Bacteroides massiliensis (EdgeR log 2 fold change 3.1409 and 2.9527 with p-value < 0.05) showed a positive association with OBc-G compared to nwHD (Supplementary Tables 2-3). The MD-index behaves in opposite ways in the OB-G and OBc-G correlation networks with respect to nwHD. Indeed, MD-index computed for OB-G was 1.3993, showing high dysbiosis, while in OBc-G was-0.2786 (Fig. 2b,c, respectively) which still underlinesa dysbiosis but much less evident for OB-G samples. Indeed, in the group of obese patients without any complication (OB-G), there was a slightly unbalanced overrepresentation of some genera concerning the microbiome of control subjects. Whereas in obese patients with complications, some genera were underrepresented concerning controls (nwHD).

Supervised Random Forest analysis and normal weight healthy donors (nwHD).
We applied the Random Forest machine-learning algorithm 15 to identify taxa able to discriminate between patients with obesity and normal-weight donors with good classification performances. We already described the correlations between specific taxa and the OB-G or OBc-G. Here we closely analyzed taxa that classify the normal-weight healthy donor group. Alistipes genus (Supplementary Tables 1, 2: zero inflated-Gaussian fit of 0.0027 and 0.0015), different Alistipes species (Supplementary Table 1 Tables 3 and 4).
Correlation analysis of taxa clusters with physiological parameters. In the analysis of taxa related to samples of OBc-G, we found interesting positive correlations between some clusters of species and the total cholesterol, LDL levels (Fig. 3a,c and Supplementary Table 5a) and CRP (Fig. 3a (Fig. 3b). In the comparisons of the species analyzed entirely versus the same species analyzed after the feature reduction ( Fig. 3a vs. b), we found common species (i) positively associated with CRP and (ii) negatively associated with glycemia (0′, 60′, 120′): specifically, we found (i) Alistipes indistinctus, Clostridium innocuum, Desulfovibrio piger Prevotella ruminicola and Prevotella in common between clusters 18 and 14, while (ii) Acidaminococcus fermentans, Clostridium cocleatum and Clostridium ramosum in common between clusters 20-21 and cluster 11 (see Supplementary Table 5a,b).While the connection of Alistipes indistinctus and Clostridium innocuum with obesity or with clinical parameters related to obesity is not known in the literature, Desulfovibriopiger, Prevotella ruminocula and Prevotella species specie are already known to be associated with inflammation, insulin-resistance, hyperglycemia and type 2-diabetes 16 . It is important to highlight that Desulfovibrio piger (Clusters 6 for OB-G and Clusters 14, 18 for OBc-G) showed that the increase of the abundance of this species is always associated with the increase in the value of some clinical parameters critical for obesity (such as TRG for OB-G and CRP for OBc-G, respectively). Interestingly enough, in the clusters of taxa found negatively correlated with glycemia ( Fig. 3d,e, and Supplementary Table 5d,e) we found that the family Oxalobacteraceae and two genera descending from it, namely Herbaspirillum and Oxalobacter, were already known to be associated with a decreased of insulin-resistance and glycemia 17 .
Finally, in the analysis performed on the nwHD group, BMI-SDS showed a negative association with five species, enclosed in Cluster 19 (PC: − 0.64 with adj-p-value = 0.05) ( Fig. 3f and Supplementary Table 5f). This data strongly suggests that BMI-SDS could represent the most sensible clinical parameter to correctly classify children and adolescents as normal weight, overweight, and obese subjects and that the species belonging to this cluster could be considered protective against obesity.

Inferring functional (metabolic) pathways characterizing OB-G and OBc-G.
At least 20 different metabolic pathways, inferred with PICRUSt2 and present in the MetaCycdatabase 18 , showed statistical significance in the comparison between OB-G vs nwHD (Fig. 4a). Different pathways involved in amino acid biosynthesis showed a positive correlation with obesity in pediatric patients. Thus, pathways involving the phenylalanine (PWY-6628) and the tyrosine (PWY-6630) aromatic amino acids were among the entries with the highest statistical importance. In addition, tryptophan (PWY-6629) and methionine biosynthesis (HSERMET-ANA-PWY and HOMOSER-METSYN-PWY) showed higher abundances in OB-G, while an additional pathway (PWY-5345) for methionine biosynthesis showed an opposite behavior. More, different pathways involved in polyamine biosynthesis known to play a role in bacterial pathogenicity and biofilm formation showed higher abundances in the microbiome of OB-G. Among them, the POLYAMINSYN3-PWY is the one showing the higher importance of all pathways analyzed by Random-Forest analysis (mean decrease accuracy 0.006). Others like POLYAMSYN-PWY and the pro-inflammatory lipopolysaccharide 19 pathway (LPSSYN-PWY) showed higher abundances in OB-G compared to lean subjects. More, the second in order of importance in the Random-Forest analysis (mean decrease accuracy 0.004) was the super-pathway of (R,R)-butanediol biosynthesis (P125-PWY) which was overabundant only in nwHDs.

Discussion
Microbiome studies might be hampered by different technical problems related to the methods used for their analysis 21 not evident to the average readers. The 16S gene is structured in nine variable regions useful to define microbial taxonomy [22][23][24]  www.nature.com/scientificreports/ composition. Indeed methods of clustering the 16S sequences, bioinformatic pipeline, and the parameters used in the analysis might slightly modify the microbiome composition 21 . All these issues might be potential biases to complicate the comparison of microbiome biomarker (taxa or Firmicutes/Bacteroidetes-ratio) composition in different publications, even for the same pathology. Due to the not ideal primer pairs design that usually targets a single 16S variable region, it is likely to have bacterial taxa in the analyzed microbiome might be underrepresented. Thus, to overcome these problems, a possibility is to increase the number of the variable regions to be studied and to use different 16S-ribosomal sequences reference databases, bioinformatic pipelines, and www.nature.com/scientificreports/ parameters, thus improving the definition of biomarkers associated with the pathological microbiome. Our aim was not only an academical re-analysis of taxa associated with obesity but with the introduction of the analysis of multiple 16S-variable regions in the microbiome analysis together with the use of rarely used bioinformatic pipelines and parameters try to define which biomarkers were still associated with the obese patients. We believe that comparing the biomarkers defined using different primer pairs, chemistry, and methods with the ones already shown in published reports indicates the taxa that are more strictly associated with the pathology. The relative abundance analysis using the complete cohort of fecal samples of obese patients (fullOB) compared to nwHD confirmed that Acidaminococcus, Sutterella, Streptococcus, Prevotella, Lactobacillus, and some Bacteroides species correlated with obesity, as shown by different published articles (see a meta-analysis) 25 . Indeed, the Acidaminococcus genus was reported to be significantly associated with obesity in adult Hispanic subjects 26 and with pro-inflammatory diets 27 . In our data, Acidaminococcus was better associated with the group of patients with Severe Obesity (OB-SO-G) (Supplementary Table 4). Therefore, it was not astonishing that others have found such an increase in patients with type 2 diabetes (T2D) 28 . Sutterella genus was already described to correlate positively with obesity in obese Chinese children 29 , but others found the opposite in adults 30 . In line with this data, Sutterella wadsworthenis was reported to be positively associated with insulin resistance 31 . Our data showed taxa strongly associated with the microbiome of Obese patients (fullOB, OB-G, and OBc-G) in the comparisons with the fecal flora of normal weight control subjects (nwHD) (Supplementary Tables 1-4).
Streptococcus genus was already shown to be correlated in adult cases with BMI 32 . In this study, Streptococcus descendant like S.thermophilus was found to be associated with obesity also in our data. These also indicated a positive Prevotellaassociation between and its descendants in obesity. Indeed, the role of the Prevotell and Prevotella copri in obesity is still debated since beneficial and detrimental roles in health have been described for both taxa 33,34 . Prevotellaceae and the genus Prevotella have been associated with inflammation and insulin resistance 25,35 . It is also of note that Prevotellacoprishowed to be associated with an altered glucose metabolism leading to glucose intolerance and reduced insulin sensitivity due to the presence of the LeuBgene 31 . It is also known that Bacteroides and Prevotella genera have a negative correlation with serum leptin levels and a positive correlation with GHrelin serum levels 36 . Since leptin is known to inhibit hunger while GHrelin increases the drive to eat, the effect of these taxa on obesity is consistent 36 . Indeed, the leptin sensitizer butendiol (produced in P125-PWY pathways) was found to have a lower abundance in our simple-obese patients than in controls. Thus, the mechanism inducing obesity relative to the P125-PWY might induce a less efficient regulation of appetite by leptin 37 .
Obesity has been related to an increased risk of multiple diseases involving oxidative stress and inflammation 38 , and Prevotella species have been already described as being more abundant in obese patients with an inflammatory condition 26 . Along this line, in our obese patients with complications, at least ten different species were found to correlate positively with standard CRP levels, a marker of chronic inflammation. Among them, we www.nature.com/scientificreports/ found Prevotella ruminicola, Prevotella sp., the genus Mitsuokella and Desulfovibriopiger. Indeed, Desulfovibrio descendant species are known to produce endotoxins and favor alteration of gut permeability leading to the induction of pro-inflammatory responses [39][40][41] . Thisalso applies to the genus Collinsella, which we have shown with greater abundance in severely obese patients 39 . Furthermore, in our data, Bacteroides correlated with obesity, as described by others who have found a higher abundance in obesity and a positive correlation with BMI 42 . Some authors have found an inverse correlation between obesity and Bacteroides thetaiotamicron 43 compared to us. This discrepancy is intriguing since this specie is known to produce high amounts of small-chain fatty acids (acetate and propionate), and the overproduction of acetate is known to induce hepatic de novo lipogenesis and increase adiposity 44 .
Our data showed that descendant species belonging to the Gordonibacter genus have a positive correlation with plasma levels of total and LDL cholesterol. Along this line, species belonging to this genus were already known to have a positive association with total cholesterol plasma level 45 .
A recent report showed that different taxa belonging to the Phylum of the Firmicutes were found in the microbiome of adult obese or overweight patients 30 . Also, our data showed that Firmicutes descendants Acidaminococcus spp., Lachnospiraceae, Mitsuokella, Ruminococcus spp., Streptococcaceae, and Streptococcus are among the taxa that shared higher abundance values in obese patients. One exception is represented by the Odoribacter genus, although belonging to Firmicutes, which showed a higher relative abundance in normal-weight healthy (nwHD) donors. Indeed, the fact that this genus associates better with nwHDs is not surprising due to the anti-inflammatory potential of this microorganism 46 . Although the Firmicutes/Bacteroidetes-ratio was believed to be a biomarker for obesity 11,12 , its role in this condition was found to be contradictory 47 since some studies reported a positive correlation between F/B-ratio and the BMI values 48 , others, like our work, found no correlation or showed an opposite trend [48][49][50][51] and essentially a dominance of the Bacteroides genus in obesity 52 . About the inferred metabolic pathways associated with obesity, it is interesting to point out that the abundance of aromatic amino acids (tyrosine, phenylalanine, and tryptophan) has already been reported to be associated with obesity and insulin resistance 48 . In particular, tyrosine was shown to be the more prevalent amino acid associated with insulin resistance in obese children 53 . Tryptophan has also been implicated in the pathogenesis of metabolic disorders such as obesity 54 . Indeed increased levels of tryptophan have been linked to over-nutrition and might be responsible for obesity-related inflammation pathways 55 . Pro-inflammatory conditions are even supported by the pathways LPSSYN-PWY lipopolysaccharide and PWY-6478, both involved in LPS synthesis and assembly 19 . These pathways, along with PWY-341 (glycolysis V), are known to play a central role in promoting a pro-inflammatory environment that supports the production of inflammatory mediators by macrophages, thus contributing to insulin resistance. Furthermore, pyruvate, the final product of glycolysis, is metabolized into acetyl-CoA, which is essential for cholesterol and lipid synthesis 56 . All these conditions are linked to obesity and its complications. In addition, it was shown that the transfer of stools from lean donor recipients into metabolic syndrome patients increased insulin sensitivity of the latter and the abundance of 16 different taxa, including Oxalobacter formigenes. This is in line with the negative correlation found between fasting plasma glucose and Taxa Cluster 1, 2, and 3 shown in Supplementary Table 3a 17 .
The main limitation of our study is not many pediatric patients enrolled compared to the number of obese subjects in our region.The major strengths include: (i) the mono-centric recruitment, with strict patients selection; (ii) the use of BMI-SDS to define pediatric obesity (in detail, all overweight patients have been excluded); (iii) similar protocols of selection have been used for classifying regular weight healthy donors; (iv) the analysis of multiple 16S-polymorphic regions to define taxonomy and also (v) the use of different algorithms to analyze the microbiome composition among different groups. Furthermore,it is important to highlight thatour results have not been influenced by the use of drugs known to interfere with microbiota composition, such as metformin and liraglutide 8,9 .

Patients. Patients were stratified into different groups based on the BMI-SDS values obtained comparing
WHO growing curves corrected for sex and month age following the references shown https:// www. who. int/ tools/ growth-refer ence-data-for-5to19-years/ indic ators. https:// www. who. int/ tools/ growth-refer ence-data-for-5to19-years/ indic ators. The patients were defined as overweight if the BMI-SDS values had a standard deviation (SD): > + 1 SD < + 2; obese when > + 2 SD < + 3, and severely obese SD: > + 3. Monogenic or syndromic obesity were ruled out in all patients. Thus, we evaluated the fecal-microbiome of a total of 55 obese children and adolescents (fullOB) recruited at Giannina Gaslini Institute in Genoa, Italy, between February 2016, and October 2021 (mean age 13.1 ± 2.9, median 13.0, 36% female; BMI-SDS 3.2 ± 0.7). Inclusion criteria were: Caucasian subjects living in Northern Italy, personal history negative for acute or chronic gastrointestinal diseases, and/ or antibiotics or probiotics administration in the previous month. All patients were negative for autoimmune disease screening (i.e celiac and thyroid diseases). Among these fullOB samples, 32 patients were grouped based on BMI-SDS as severely obese (mean age 13.5 ± 3.5, median 13.6, 34% female; BMI-SDS 3.6 ± 0.5), while 22 of them were grouped as obese (mean age 13.2 ± 2.4, median 12.9, 36% female; BMI-SDS 2.6 ± 0.2) (see Tables 1, 2).
Research was performed in accordance with the Declaration of Helsinki. The study was approved by the local Ethical Committee of Liguria Region (approval letter, enclosed) and by Giannina Gaslini Institute (authorization letter enclosed).
Fecal microbiota analysis. DNA extraction from fecal samples was performed as reported 57 and it was used for the 16S amplification reaction performed with Ion 16S™ Metagenomics Kit (Thermo-Fisher Scientific). This method allows the PCR-amplification of 7 out of 9 informative 16S polymorphic regions 58 . Then up to 16 differently bar-coded libraries were automatically loaded onto an Ion-520-chip by the Ion-Chef and sequenced by the GeneStudio-S5-system (Thermo-Fisher Scientific). Data analysis was performed with the Ion-Reporter™ suite (v 5.18.0.2) using the curated-Greengenes (v13.5) and the MicroSEQ ID 16S-rRNA reference library (v2013.1) databases using standard parameters. Data analysis. Compositional/functional profiling and comparative-analysis of microbiome data were performed with Microbiome Analyst and Calypso web-tools 13,14,59,60 . All p-values have been adjusted to correct for multiple hypotheses, using Benjamini and Hochberg false discovery rate (FDR < 0.05), unless differently specified. Sparse Correlations for Compositional data (SparCC) 61 was applied after data-filtering to remove lowquality or uninformative features to study the network of correlation among taxa from the microbiome of obese patients and nwHD controls. In addition, we computed the Microbial Dysbiosis index (MD-index) as the logarithm of the sum of all taxa that increase their abundance over the sum of all taxa that decrease it 57 . WGCNA identified groups of taxa, or modules that were present across a set of clinical conditions, computing a similarity measure, such as Pearson's correlation coefficient, to calculate the relationship between pairs of taxa. These relationships are then used to construct a weighted network of taxa clustered to identify highly interconnected microorganisms, which are assumed to have a similar biological function or to be commonly regulated. For all the analysis Pearson's correlation coefficients were computed and its associated p-values were corrected for multiple comparisons using False Discovery Rate (FDR).The Multivariate clustering methodology based on the weighted correlation network analysis (WGCNA) was used to verify correlations of taxa clusters with the clinical parameters characterizing OB-G,OBc-G and nwHD 62 . In particular, we considered the following list of clinical parameters: sex, age (in months), BMI, BMI-SDS, serum levels of total HDL, LDL and cholesterol, triglycerides (TRG), glycemia-0' , glycemia-60' , glycemia-120' during oral glucose tolerance test, glycated-HbA1c, totalinsulin, ultrasensitive CRP (C-reactive protein), liver steatosis by ultrasound. For the normal weight healthy control group, namely nwHD we considered a subselection of the above-mentioned clinical parameters, which are sex, age, BMI and BMI-SDS. The red and blu colors in the WGCNA heatmaps indicate respectively the identified positive or negative Pearson correlations: a positive correlation means that the abundance increase of that specific cluster(s) of taxa is associated with the increase of a specific clinical or metabolic parameter while a negative correlation is associated with a decrease of that specific clinical or metabolic parameter. Furthermore, for the species of OB-G and OBc-G samples groups we proceed with the analysis of the complete data set, but also with a manual selection of the species (indicated by the label "sel-specie"), namely a feature reduction, considering only those species that we found relevant in both supervised random forest and relative abundance analyses. The metagenome functional content was predicted using PICRUSt2 63 from the biom file, to get the KEGG Orthology (KO) terms table and the inferred MetaCycpathways 18 . These data were analyzed with the Shotgun-data-profiling module of Microbiome Analyst to identify a list of the most significant pathways able to discriminate cases (OB-G or OBc-G) from controls (nwHD). The abundance of the pathways between the groups was also analyzed with the Wilcoxon test and the statistically significant pathways were clustered, considering the Pearson correlation measureand plotted using Morpheus tool (Morpheus, https:// softw are. broad insti tute. org/ morph eus).
Informed consent. Informed consent was obtained from all individual participants or their families included in the study.

Data availability
Raw 16S rRNA gene reads were deposited at the short read archive (SRA_BioProject ID PRJNA794317).