Effect of metformin and metformin/linagliptin on gut microbiota in patients with prediabetes

Lifestyle modifications, metformin, and linagliptin reduce the incidence of type 2 diabetes (T2D) in people with prediabetes. The gut microbiota (GM) may enhance the efficacy of these interventions. We determined the effect of linagliptin/metformin (LM) vs metformin (M) on GM composition and its relationship to insulin sensitivity (IS) and pancreatic β-cell function (Pβf) in patients with prediabetes. A cross-sectional study was conducted at different times (basal, six, and twelve months) in 167 Mexican adults with prediabetes. Both treatments increased the abundance of SCFA-producing GM bacteria: Fusicatenibacter and Blautia with M, and Roseburia, Bifidobacterium, and [Eubacterium] hallii group with LM. We performed a mediation analysis with structural equation models (SEM). In conclusion, M and LM therapies improve insulin sensitivity and Pβf in patients with prediabetes. GM is only partially associated with these improvements, since the SEM models suggest a weak association between specific bacterial genera and improvements in IS and Pβf.

On the other hand, linagliptin, a widely used DPP-4i in T2D, is known for its cardiovascular and renal safety. Its efficacy in prediabetes has been demonstrated, improving glucose metabolism and pancreatic islet function 7,16. Incretin-based therapies utilizing DPP-4is are based on the insulinotropic action of glucagon-like peptide 1 (GLP-1) 17. By increasing endogenous GLP-1 and insulin levels and reducing glucagon secretion 18, DPP-4i effectively lowers postprandial blood glucose levels by inhibiting incretin hormone degradation. Although some studies suggest a link between DPP-4is and GM, their exact mechanisms remain unclear.
To evaluate how changes in GM are associated with the clinical response, we assessed the impact of linagliptin/metformin (LM) versus metformin alone (M) on GM composition and its association with insulin sensitivity (Matsuda Index, IS) and pancreatic β-cell function (Pβf) in Mexican patients with prediabetes. Our results indicate that linagliptin/metformin is more clinically effective than metformin alone, and the contribution of GM to the clinical response is relatively low.

Participants and clinical outcomes after the intervention
The patients in this study were part of the diabetes prevention trial PRELLIM, a double-blind, randomized parallel clinical trial comparing linagliptin + metformin + lifestyle (LM) to metformin + lifestyle (M) in terms of their effects on glucose metabolism, insulin resistance (IR), and pancreatic islet function [ClinicalTrials.gov ID: NCT03004612 (22/12/2016)] 7. All participants in PRELLIM were encouraged to participate in the microbiome study. Between August 2018 and December 2019, 222 participants were assessed for eligibility and 167 eligible participants were included. These participants were randomly assigned to one of two treatment groups. Specifically, 51 participants were included in the no-treatment group (basal evaluation), 55 at six months follow-up (35 in the M group and 20 in the LM group), and 61 at 12 months follow-up (28 in the M group and 33 in the LM group) (Fig. 1). Unfortunately, not all participants provided a stool sample during follow-up, and some did not complete the prescribed number of follow-up sessions. Accordingly, we organized the data for analysis as a cross-sectional study of groups at different points in time. Specifically, 205 samples were included in the analysis: 65 in the untreated group (baseline evaluation), 77 at the six-month follow-up (44 in the M group and 33 in the LM group), and 63 at the 12-month follow-up (28 in the M group and 35 in the LM group). Since the original objective of the study was to observe changes in GM composition over time and its implications on clinical response, we obtained 38 stool samples corresponding to the participants' follow-up at months 6 (n = 12) and 12 (n = 24). Additional results from a dependent contrast analysis are shown in the supplementary material only for the subgroup of 24 participants who completed follow-up at 6 and 12 months (Table S3). Multiple cardio-metabolic risk factors were present in the whole studied population: obesity (51.6%), high total cholesterol (32%), low HDL (72%), high triglycerides (63.7%), and high blood pressure (25.5%), without any significant difference between groups. None of the patients took medications or supplements affecting glucose metabolism or gut microbiota composition.

Insulin sensitivity and pancreatic β-cell function improved at six and 12 months of follow-up
Age was lower in subjects without pharmacological treatment than at six and 12 months of follow-up (p = 0.0347). Consistent with the original PRELLIM publication, BMI and adiposity showed progressive reductions in this subset of patients at 6 and 12 months. Glucose levels, insulin resistance (IR) index and Pβf significantly improved at six and 12 months of follow-up compared to the baseline group (Table 1) (Tables S1, S2, and S3). Pβf significantly improved in both treatment groups at six months [C-d 1.113 (0.716-1.507), p = 0.00001] and 12 months [C-d 1.056 (0.650-1.457), p = 0.00001], with a more pronounced improvement in the LM group at both six months [G-Δ 0.3597 (0.189-0.798)] and 12 months [C-d 0.2537 (0.104-0.659)] (Fig. 2 and Table S4).

Supervised machine learning: explainable analysis approach
Using the Random Forest algorithm with post-hoc explanations, we classified different hypoglycemic drugs (LM and M) (Fig. 3). Therapeutic interventions modified GM composition at six and 12 months of follow-up. AUC-ROC (Area Under the Receiver Operating Characteristic Curve) mean values were 0.79 and 0.74 for basal vs. M groups at six and 12 months, respectively (Table S5). Genera Subdoligranulum, Ruminococcaceae_DTU089, Catabacter, Ruminiclostridium 5, and Escherichia-Shigella identified the basal vs. M group at six months (Fig. 4A).
On the other hand, we identified bacterial genera in both M and LM intervention groups at six and 12 months of follow-up. At six months, Rothia, [Eubacterium] ruminantium group, Christensenellaceae_uncultured, Ruminiclostridium 5, Turicibacter, Ruminococcaceae UCG-002, Barnesiella, Clostridium sensu stricto 1, and Oscillibacter classified the LM group (Fig. 4E). In contrast, at 12 months with LM, Enterobacteriaceae, Holdemania, and Christensenellaceae_uncultured were characteristic genera (Fig. 4F). The mean AUC-ROC for the M vs LM comparison was 0.66 and 0.74, respectively. Interestingly, we observed that the compositional change produced by LM at six

Hypoglycemic drugs and the change in the clinical-metabolic condition explain the increase in insulin sensitivity and pancreatic β-cell function
After the machine learning analysis, in which bacteria were classified by hypoglycemic drug and month of follow-up, we fitted Generalized Linear Mixed Models (GLMM) for the IS indices and Pβf. Thirty-five significant genera were identified for IS (Matsuda index) 27 S6).

Figure caption (analysis workflow): The sequences obtained in the sequencing were processed using the QIIME2 (Quantitative Insights Into Microbial Ecology) analysis platform 19. The taxonomic classification of the sequence variants was performed using the SILVA database (version 132). The analysis of the GM was performed using the phyloseq object 20; through this object, the α and β diversity of the study groups in the different months of follow-up was obtained. Subsequently, from the abundance of the bacterial genera, the genera that classify each pharmacological treatment in the different months of follow-up were obtained using machine learning algorithms (random forest) 21. Once the bacteria that classified the subjects with M and LM treatment were identified, hierarchical linear regression models were performed to eliminate the effect of confounding variables (obesity, age, and gender). GLMM and, finally, a mediation analysis using SEM were then performed. Created with BioRender.com.
We evaluated the Bacteria X Treatment and Bacteria X Time interaction for IS and Pβf. In the Bacteria X Treatment interaction, we found a significant increase in IS in the LM group associated with Erysipelotrichaceae UCG-003 (p = 0.008). For Pβf, Erysipelotrichaceae UCG-003 showed statistical significance in the interaction with both LM (p = 0.025) and M (p = 0.048), while Lachnospiraceae UCG-004 (p = 0.032) and Turicibacter (p = 0.003) showed significance with M. In the Bacteria X Time interaction, a significant increase in IS at six months was observed with LM treatment for the bacterial genera Roseburia (p = 0.031), Erysipelotrichaceae UCG-003 (p = 0.018), Lachnospiraceae NK4A136 (p = 0.035), and Bilophila (p = 0.035). For Pβf, Erysipelotrichaceae UCG-003 (p = 0.004), Lachnospiraceae NK4A136 (p = 0.048), Granulicatella (p = 0.044), and Turicibacter (p = 0.046) showed significance at 6 months, and Bilophila (p = 0.028) and Lachnospiraceae UCG-004 (p = 0.011) showed significance at 12 months. There was no effect of the Treatment X Time interaction. Statistically significant bacteria from the GLMM models (with bidirectional effects) were used in subsequent statistical analyses: hierarchical linear regression and structural equation modeling (SEM). These analyses explored the relationship between GM composition, IS, and Pβf in prediabetes patients treated with hypoglycemic drugs. Hierarchical block models for linear regression revealed that hypoglycemic drugs and BMI significantly explained the increase in IS and Pβf. In the IS model, hypoglycemic drugs explained 10% of the increase (model 1 in Table 3). The addition of gender and age increased the explanation to 14% (model 2 in Table 3), and BMI further increased it to 25% (model 3 in Table 3). Including the 17 representative genera raised the explanation to 33% (model 4 in Table 3). Among the bacteria analyzed, two were statistically significant: Catenibacterium (β = 0.079, p < 0.05) and Catabacter (β = −0.083, p < 0.05).
Table note: The intro method was selected, in which all block variables are introduced in a single step. Stepwise regression: hypoglycemic drugs in model 1; gender and age in model 2; BMI in model 3; and bacterial genera in model 4.

Structural equation modeling (SEM)
To assess the mediating effect of GM on the increase in IS and Pβf in subjects treated with hypoglycemic drugs, we conducted SEM using two significant genera from the GLMM analysis along with a hierarchical model. The SEM also considered BMI to evaluate the impact of weight reduction on clinical variables. Catabacter and Lachnospiraceae UCG-004 were selected as mediator genera for IS and Pβf, respectively, based on the highest beta values and statistical significance obtained in model four of the hierarchical model (see Tables 3 and 4).
In the following SEM, the Matsuda index (IS) is considered the dependent variable, and the genus Catabacter is the mediating variable. BMI is strongly associated with the Matsuda index, with significant coefficients in the basal group (b = −0.24, p = 0.022), the M group (b = −0.27, p = 0.170), and the LM group (b = 0.48, p = 0.001). However, BMI showed a weak effect on the genus Catabacter in all groups (b = 0.23, p = 0.917 basal group; b = −0.14, p = 0.411 M group; b = −0.19, p = 0.960 LM group) and a negligible impact on the Matsuda index (b = −0.27, p = 0.062 baseline group; b = −0.064, p = 0.095 M group; b = −0.18, p = 0.510 LM group), except for the LM group, where it was statistically significant (Fig. 5B).
To examine the impact of hypoglycemic drugs and the new metabolic condition on bacterial genera abundance, we conducted GLMM (Tweedie distribution) with bacterial genera as the dependent variable and hypoglycemic drugs as the independent variable. The model was adjusted for gender, age, BMI, and the Matsuda index or Pβf. We observed negative effects for Catabacter on both IS (β = −0.503, Eta-Squared = 0.031, p = 0.031) (Table S7) and Pβf (β = −0.990, Eta-Squared = 0.064, p = 0.002) (Table S8). These results indicate that the drug-induced changes in clinical indices significantly influence the abundance of the bacteria. This underscores the importance of considering the variation in the metabolic condition to comprehend the complexity of the ecological structure of the intestinal microbiota in subjects with prediabetes.

Discussion
The PRELLIM study was the first randomized clinical trial to evaluate the effect of combined (linagliptin/metformin) therapy on GM in prediabetes subjects with a basal control and metformin monotherapy as a comparator. Consistent with previous reports 7, at six and 12 months, both drugs significantly improved anthropometric, biochemical, and clinical parameters. The M group increased IS more, while LM favored Pβf. M and LM interventions impacted GM composition at the genus level, resembling findings in T2D. However, novel and specific changes in microbial structure were identified at six and 12 months with both drugs, affecting bacterial genera and ASVs.
The LM intervention impacts GM composition at the genus level, similar to effects observed in animal models using other DPP-4is. Qian Zhang et al. demonstrated vildagliptin's ability to increase butyrate-producing bacteria in rats with induced T2D 31. Lin Wang et al. explored GM modulation by liraglutide (GLP-1 receptor agonist) and saxagliptin (DPP-4i); they reported increased levels of Lactobacillus, Allobaculum, and Turicibacter in mice treated with saxagliptin, indicating a possible increase in incretins and their effect on glucose homeostasis 32. Our study results are consistent with an enrichment of butyrate-producing bacteria, including Lactobacillus, Roseburia, Fusicatenibacter, and Blautia. These genera promote peptide production in the ileum, indirectly reducing hepatic expression of proinflammatory cytokines in T2D 13,33. Notably, changes in GM composition with each hypoglycemic drug over time differed, particularly in the number of increased SCFA-producing bacteria in each group. Nevertheless, microbial functionality was maintained in each group, a characteristic of GM known as functional redundancy, suggesting species interchangeability within a given microbiota in terms of function 34.
Current knowledge has established that metformin reduces GM diversity in diabetic mice fed with a high-fat diet, which increases Akkermansia muciniphila abundance and other SCFA-producing and mucin-degrading genera 9. A meta-analysis confirmed the alteration of GM composition by metformin 35. In our study, metformin induced GM changes over time in prediabetes patients, partly differing from T2D evidence. At six months, SCFA-producing bacteria (Fusicatenibacter, Blautia, and [Ruminococcus] gauvreauii) increased, inhibiting enteropathogenic and LPS-producing bacteria (Proteobacteria, Escherichia-Shigella, and Enterococcus). These differences may result from distinct physiopathological deterioration between prediabetes and T2D 36. Interestingly, after a 12-month follow-up, we identified eight abundant SCFA-producing genera: Fusicatenibacter, Atopobiaceae, Coprococcus 1, Lachnospiraceae ND 3007 group, Anaerostipes, Dorea, Lachnospiraceae FCS020 group, and Blautia. Surprisingly, Akkermansia and other mucin-degrading bacteria were not significant in subjects with prediabetes, contrary to T2D reports [37][38][39]. However, the reduction in opportunistic pathogens and T2D-associated genera found in the M-treated group for 12 months aligns with worldwide studies 23,38. Contrasting both interventions, LM increased bacterial genera, including SCFA producers and opportunistic pathogens, without establishing a clear pattern on the functional redundancy of the GM. Our results suggest that multiple coexisting and taxonomically distinct organisms perform diverse metabolic functions 34,40,41 (see Fig. 4). We observed that the increase in SCFA-producing genera mediated the change in GM composition.

Table 3. A hierarchical model for the IS (Matsuda Index). *Adjustment for hypoglycemic drugs was set for all models. The intro method was selected, and all block variables were introduced in a single step. Stepwise regression: hypoglycemic drugs in model 1; gender and age in model 2; BMI in model 3; and bacterial genera in model 4.

SCFA-producing genera are crucial for host health. Butyrate is vital to human insulin sensitivity (IS) through incretins. In a high-fat diet mouse model, butyrate supplementation prevents weight gain and increases IS. Butyrate and propionate induce intestinal gluconeogenesis, improving peripheral glucose production and IS 42. GM changes influence the gut metabolome, affecting butyrate and acetate production 43, key gut-derived metabolites in insulin resistance (IR) and glycemic control. Increased intestinal gluconeogenesis from these SCFAs in rodents reduces hepatic gluconeogenesis, appetite, and weight, leading to better glucose homeostasis 44.
Most studies focus on short-term GM and glucose homeostasis associations. Moreover, the impact of hypoglycemic drugs on GM changes and metabolic improvement remains unclear 45. Our findings show low associations (r = 0.1-0.2) between bacterial abundance of specific genera and clinical variables (fasting glucose, postprandial glucose, glucose AUC, HOMA-IR, or Matsuda Index) in diverse populations and study models. However, hypoglycemic drugs affect GM structure, anthropometric (reduced weight, body fat, and waist circumference), biochemical (increased IS and Pβf), and clinical (lower systolic and diastolic blood pressure) parameters, influencing the associations. In our study, hypoglycemic drugs improved clinical indices with a low GM contribution, as previously reported 46. Using SEM, we found a strong relationship of BMI (not total adiposity) with IS and Pβf. Metformin and linagliptin effectively reduced weight and fat percentage in overweight and obese insulin-resistant outpatients 47. Weight loss was the primary predictor of improved IS, while weight regain  48 .
In contrast, the bacterial abundance of SCFA-producing genera weakly explained the changes in IS and Pβf, with an eta squared of 0.01 in both cases, compared to the effects of hypoglycemic drugs and weight loss (see Tables S6 and S7). Thus, we conclude that hypoglycemic drugs strongly impact metabolic conditions (IS and secretion) and moderately influence GM composition, whereas GM has a smaller effect on metabolic changes. To reinforce this, we evaluated bacterial genera abundance using a GLMM, adjusting for hypoglycemic drugs, change in IS, and increased Pβf. We found that the shift in metabolic condition modifies the GM's structure.
Recent findings suggest that gut dysbiosis is linked to metabolic diseases like obesity, diabetes, and nonalcoholic fatty liver disease 49. These discoveries support the coevolution theory between humans and the GM, profoundly affecting various host responses. Multiple variables clearly influence glucose metabolism in prediabetes and diabetes prevention, and it cannot be explained by a single factor; in this context, GM seems to play a limited role, which remains to be elucidated in more detail. A limitation of our study is that we only measured GM composition and did not analyze metabolites or other measures of microbiota functionality. Nonetheless, our study provides the first evaluation of the effect of DPP-4 inhibitors on GM composition in humans with prediabetes.

Conclusions
Our study reveals that changes in GM have a low impact in mediating the effect of lifestyle, metformin, and linagliptin/metformin on glucose metabolism, IS, and Pβf in individuals with prediabetes. Despite the observed increase in SCFA-producing bacteria in the GM following these treatments, the SEM suggests a weak association between specific bacterial genera and improvements in IS and Pβf. Therefore, the primary mechanism of metabolic improvement in prediabetic patients is more directly attributable to the pharmacological effects of the hypoglycemic drugs, with only a partial modulation of the GM. Future omics studies with long-term follow-up will determine the extent of the drugs' hypoglycemic effect via GM modifications and its role in T2D development and progression.

Trial design and oversight
This study was part of a randomized, double-blind, placebo-controlled clinical trial [ClinicalTrials.gov ID: NCT03004612 (22/12/2016)]. Participants were enrolled between August 2018 and December 2019 as part of the PRELLIM project 7 (Prevention of diabetes with linagliptin, lifestyle, and metformin). Further details are in the PRELLIM article 7.

Participants and intervention procedure
Eligible participants with prediabetes (per ADA criteria) and no prior glycemic medication were randomly assigned to two groups in a 1:1 ratio: (i) Linagliptin + metformin + lifestyle (LM group): patients started on linagliptin/metformin 2.5/850 mg once daily for a month, then increased to twice daily until study end. (ii) Metformin + lifestyle (M group): patients began with 850 mg metformin once daily, then increased to twice daily. Identical envelopes contained metformin 850 mg and linagliptin/metformin 2.5/850 mg. Both groups received the same lifestyle program. Monthly follow-up visits assessed adherence and side effects and included nutritional evaluation. OGTTs were done at baseline, six, and 12 months. Primary outcomes were changes in the GM composition; glucose levels, insulin resistance, and pancreatic β-cell function were secondary outcomes 7.

The detailed inclusion criteria
A total of 167 patients were screened with anthropometric, nutritional, biochemical, and metabolic evaluation, including oral glucose tolerance test and hyperglycemic clamp at the Metabolic Research Laboratory, Hospital Regional de Alta Especialidad del Bajío. Patients were eligible for enrollment in the study based on the following criteria: (i) IGT (2-h glucose levels 140-199 mg/dL) during oral glucose tolerance test, ± IFG (fasting glucose 100-125 mg/dL); (ii) age 18-65 years; (iii) ≥ 2 T2D risk factors per ADA 50.
This research was conducted in accordance with applicable ethical standards, the Regulations of the General Health Law on Research for Health, the Declaration of Helsinki of the World Medical Association (52nd General Assembly, Edinburgh, Scotland, October 2000, with the note of clarification on paragraph 29 added by the General Assembly, Washington 2002), and current international codes and standards of good clinical research practice. Written informed consent was obtained from all participants before enrollment in this study. The Research and Ethical Committee at the Hospital Regional de Alta Especialidad del Bajío approved the study (CI-HRAEB-2017-048 and CEI-22-16 extension), registered with the Mexico Secretary of Health.

Anthropometric measures
Weight and body composition were assessed via the Tanita SC-240 Scale: Monthly weight recording and bioimpedance every six months in fasting conditions.Total body fat was measured in %, and visceral fat was measured in arbitrary units 7 .

Oral glucose tolerance test (OGTT)
Subjects arrived at the University of Guanajuato's Metabolic Research Laboratory between 7 and 8 a.m., fasting. An intravenous catheter was placed, and the first blood sample was drawn. Next, they ingested 75 g of glucose. Serum samples for glucose and insulin measurement were drawn at −15 and 0 min and every 30 min afterward for two hours, with 4 ml of blood taken each time 7.
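As an illustration of how OGTT samples translate into the indices used in this work, the following minimal sketch computes the Matsuda index in its widely used form, 10,000 / √(fasting glucose × fasting insulin × mean OGTT glucose × mean OGTT insulin), and a trapezoidal glucose AUC. The exact computation used in PRELLIM is not detailed here, so the use of simple means over the 0 to 120 min samples, the function names, and the example values are all assumptions.

```python
from math import sqrt


def matsuda_index(glucose_mg_dl, insulin_uU_ml):
    """Matsuda index from OGTT samples ordered 0, 30, 60, 90, 120 min.

    Assumption: simple (unweighted) means of the OGTT samples are used.
    """
    fasting_glucose = glucose_mg_dl[0]
    fasting_insulin = insulin_uU_ml[0]
    mean_glucose = sum(glucose_mg_dl) / len(glucose_mg_dl)
    mean_insulin = sum(insulin_uU_ml) / len(insulin_uU_ml)
    return 10000 / sqrt(
        fasting_glucose * fasting_insulin * mean_glucose * mean_insulin
    )


def trapezoid_auc(times_min, values):
    """Area under the curve by the trapezoidal rule."""
    points = list(zip(times_min, values))
    return sum(
        (t1 - t0) * (v0 + v1) / 2
        for (t0, v0), (t1, v1) in zip(points, points[1:])
    )


# Hypothetical OGTT values, for illustration only.
times = [0, 30, 60, 90, 120]
glucose = [100, 160, 180, 150, 140]   # mg/dL
insulin = [10, 60, 80, 70, 50]        # µU/mL

print(round(matsuda_index(glucose, insulin), 2))
print(trapezoid_auc(times, glucose))
```

Lower values of the index indicate lower insulin sensitivity; the −15 min sample is not used in this sketch.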

Randomization and masking
Patients were randomly assigned in a 1:1 ratio to receive a fixed combination of linagliptin/metformin 2.5/850 mg every 12 h + lifestyle modification program or metformin pills of 850 mg every 12 h + lifestyle modification program. Randomization was performed by a nutritionist who was not involved in the patients' follow-up, using an electronic random number assignment system. Participants and investigators involved in the patients' follow-up and outcome measurements were masked to treatment allocation during the entire study using identical envelopes for pills 7.

Interventions
(i) Linagliptin + metformin + lifestyle (LM group): Patients allocated to this group started fixed combination pills of linagliptin/metformin 2.5/850 mg once daily during the first month, and after that, the dose was increased to 2.5/850 mg twice daily until the end of the study. (ii) Metformin + lifestyle (M group): Patients in this group started taking metformin pills of 850 mg once daily during the first month and increased to 850 mg twice daily until the end of the study. Pills of metformin 850 mg and linagliptin/metformin 2.5/850 mg were prepared using identical envelopes. Both groups received the same lifestyle implementation program based on a prescribed diet to reduce their body weight by at least 5-7%, adjusting their energy requirements based on their weight, and composed of 55-60% carbohydrates, 25-30% fat, and 10-15% proteins. Patients were advised to start with 45 min/week of mild-moderate exercise and increase the duration and frequency or intensity of exercise every two weeks until reaching 150 min/week of moderate activity or 75 min/week of intense activity 7.

Fecal sample collection and processing protocol
Fecal samples from intervention and control groups were collected in sterile containers at zero, six, and twelve months.Samples were homogenized and stored at − 80 °C in sterile 1 ml screw-cap tubes before DNA extraction.DNA extraction and 16S rRNA Gene Amplification and sequencing protocol are shown in supplementary material protocol S1 23 .

Processing of 16S sequencing data
Demultiplexed MiSeq FASTQ files were analyzed in QIIME2 using the DADA2 workflow. To ensure high read quality, reads were filtered and trimmed before processing. The first 10 bp at the 5′ end of all reads were trimmed, and reads were truncated at the 3′ end to a maximum of 240 bp (forward) and 200 bp (reverse) because of a drop in quality. Reads with > 2 expected errors under the Illumina base-quality model were removed. Filtered and trimmed reads were grouped by sequencing run, and the error model was fitted separately for each run using DADA2 default parameters. Sequence variants were obtained for each run separately using the calculated error models and dereplicated input sequences. Sequence variants and counts were joined across all runs into the complete sequence table, and de novo chimera removal was run on the entire table 23.
Taxonomy was assigned to the final sequence variants with DADA2's RDP classifier using the SILVA database (version 132). Species were identified separately via exact sequence matches (SILVA version 132). The results were joined with clinical metadata and saved as a phyloseq object for downstream analyses 23.
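The read-filtering rule described above can be illustrated with a minimal sketch. It is not DADA2's code: it only mirrors the stated parameters (10 bp trimmed at the 5′ end, truncation at 240 bp for forward reads, at most 2 expected errors) and assumes the usual definition of expected errors as the sum of 10^(−Q/10) over the retained Phred scores. The function names and the choice to discard reads shorter than the truncation length are assumptions.

```python
def expected_errors(phred_scores):
    """Expected number of erroneous bases given per-base Phred scores."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def filter_and_trim(seq, quals, trim_left=10, trunc_len=240, max_ee=2.0):
    """Return the trimmed (seq, quals) pair, or None if the read is discarded.

    Assumption: reads shorter than trunc_len are discarded, and expected
    errors are evaluated on the bases that remain after trimming.
    """
    if len(seq) < trunc_len:
        return None
    seq = seq[trim_left:trunc_len]
    quals = quals[trim_left:trunc_len]
    if expected_errors(quals) > max_ee:
        return None
    return seq, quals


# Hypothetical forward reads of 250 bp with uniform quality.
good_read = filter_and_trim("A" * 250, [30] * 250)   # 230 bases at Q30
noisy_read = filter_and_trim("A" * 250, [20] * 250)  # 230 bases at Q20

print(len(good_read[0]))   # 230
print(noisy_read)          # None
```

For reverse reads, the same sketch would be called with trunc_len=200.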

Taxonomic and ecological analysis
A Phyloseq object was used to calculate alpha diversity indexes (i.e., Chao 1, Simpson, Shannon, and Pielou indexes) and β diversity index (Jaccard), computed by R Phyloseq library 1.34.0 52 .
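For readers unfamiliar with these indices, the following minimal sketch shows how they can be computed from a vector of genus counts. It is not the phyloseq implementation: the Simpson index is taken as 1 − Σp², Chao 1 uses the bias-corrected form, and the Jaccard index is computed on presence/absence, all of which are assumptions about the variants used. The example counts are hypothetical.

```python
from math import log


def shannon(counts):
    """Shannon index H' = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson diversity, taken here as 1 - sum(p^2)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def pielou(counts):
    """Pielou evenness: Shannon index divided by ln(observed richness)."""
    richness = sum(1 for c in counts if c > 0)
    return shannon(counts) / log(richness) if richness > 1 else 0.0


def chao1(counts):
    """Bias-corrected Chao 1 richness estimate from singletons and doubletons."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def jaccard_distance(counts_a, counts_b):
    """Presence/absence Jaccard distance between two samples."""
    present_a = {i for i, c in enumerate(counts_a) if c > 0}
    present_b = {i for i, c in enumerate(counts_b) if c > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)


sample_a = [5, 1, 1, 2, 0]   # hypothetical genus counts
sample_b = [0, 3, 0, 2, 4]
print(chao1(sample_a))                       # 4.5
print(jaccard_distance(sample_a, sample_b))  # 0.6
```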

Supervised machine-learning: explainability analysis approach
To identify bacterial genera associated with different treatments (LM and M), we used the Random Forest algorithm, an ensemble method based on uncorrelated decision trees using the bagging technique. We compared various algorithms (decision tree, logistic regression, naive Bayes, and XGBoost) and selected Random Forest for its predictive performance and interpretability with SHAP values. Python 3 (version 3.9.7) with the software library was used for calculations. We labeled data count matrices for M and LM-treated patients as 0 and 1, respectively. 75% of the samples were randomly chosen as the training dataset and the rest as the test dataset. To validate, the Random Forest model was built and evaluated with K-fold cross-validation (n_split = 5) to ensure independent results (Figueroa et al., 2012; Mentch and Hooker, 2016). This involved dividing the data into five equal proportions, using four for training and one for testing in each run. Model performance was assessed with the AUC of ROC curves (see Table S5).
Random Forest classifiers that support the (place which) in the main text are reported in the machine learning section at https://github.com/resendislab/Microbiome_two_treatments_Metformine-Linagliptine.
Moreover, we assessed the relevance of bacterial genera with SHAP values (Shapley additive explanations) using TreeExplainer for the Random Forest algorithm 53. SHAP values use a game-theoretic approach to model interpretation and explanation.
Finally, we pairwise compared baseline groups vs both treatments (M and LM) and treatments (M vs LM) at 6 and 12 months of follow-up. SHAP values were used to explore the relevance of each genus in the classification process, quantifying its contribution to classification 53. Microbiota data face challenges of technical noise, zero-inflated abundance distribution, and high dimensionality 54. However, the random forest model effectively classifies and analyzes microbiome data under these conditions 55. AUC-ROC was 0.79 and 0.74 for the basal vs M group at 6 and 12 months, respectively, indicating successful training and test data set selection.
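The validation scheme described above (five folds, AUC of the ROC curve) can be sketched without any external library. The example below does not fit a Random Forest; it only illustrates how samples are partitioned into five folds and how AUC-ROC is obtained from predicted scores as the probability that a randomly chosen LM sample (label 1) scores higher than a randomly chosen M sample (label 0). Function names, the seed, and the example scores are assumptions.

```python
import random


def kfold_indices(n_samples, n_splits=5, seed=0):
    """Yield (train, test) index lists for shuffled K-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for k in range(n_splits):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


def roc_auc(labels, scores):
    """AUC-ROC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        (p > n) + 0.5 * (p == n) for p in positives for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (0 = M, 1 = LM) and classifier scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75

for train, test in kfold_indices(20, n_splits=5):
    print(len(train), len(test))  # 16 4 on each of the five runs
```

In practice, a classifier would be fitted on each training fold and scored on the corresponding test fold, and the five AUC values averaged.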
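To make the game-theoretic idea concrete, the sketch below computes exact Shapley values by brute force for a small toy model. This is not the TreeExplainer algorithm, which exploits the tree structure and handles absent features differently; here, absent features are simply set to a baseline value. The toy model, the baseline, and the function names are hypothetical.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values, averaging marginal contributions over all orderings.

    Features not yet added to the coalition take their baseline value
    (an assumption made for this illustration).
    """
    n_features = len(instance)
    totals = [0.0] * n_features
    orderings = list(permutations(range(n_features)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for feature in order:
            current[feature] = instance[feature]
            value = predict(current)
            totals[feature] += value - previous
            previous = value
    return [t / len(orderings) for t in totals]


def toy_model(x):
    """Hypothetical score with an interaction between features 0 and 1."""
    return 2 * x[0] + x[1] + x[0] * x[1] - 0.5 * x[2]


phi = shapley_values(toy_model, [1, 1, 1], [0, 0, 0])
print(phi)       # [2.5, 1.5, -0.5]
print(sum(phi))  # 3.5, the prediction minus the baseline prediction
```

The contributions add up to the difference between the prediction and the baseline prediction, and the interaction term is split equally between the two features involved.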

Statistical analyses for clinical parameters
We estimated the required sample size to observe an effect on GM changes among the treated groups. Briefly, the sample size for this study was determined both a priori and a posteriori using different analytical approaches. In the a priori analysis, we employed an ANOVA (Analysis of Variance) with repeated measures, factoring in two effective groups and three time-point measurements. We aimed for an effect size of 0.25 and set the β error (Type II error) at 20%. This calculation indicated a required sample size of 48 patients. To account for potential dropouts and ensure robustness in our data, we increased this number by 20%, including 167 subjects. For the a posteriori analysis, our approach differed slightly. We utilized an ANOVA without repeated measures and considered three effective groups. The effect size was set at 0.01 (eta squared). Under these parameters, the power of our sample was calculated to be 75%. The primary analysis addressed GM composition change and its relation to IR and insulin secretion in prediabetes patients. The effect of hypoglycemic drugs on IS and β-cell function was analyzed at two levels: (1) changes over follow-up months, and (2) considering drugs and follow-up months. For the first level, Student's t-test included an independent contrast between basal and 6/12-month groups and a dependent contrast between 6- and 12-month subjects. Cohen's d (C-d) and 100-repetition bootstrap 95% confidence intervals were calculated as standardized effect sizes. For the second level, one-way ANOVA was performed, comparing means of clinical parameters for three treatment groups and follow-up months. Two-way ANOVA examined the Time X Treatment impact, and Scheffé's post-hoc test was used to identify differences between groups. A base-10 logarithmic transformation was used to normalize skewed quantitative variables, and the Van der Waerden transformation was applied to bacterial abundances.
After the machine learning analysis had identified the bacteria classifying each group by hypoglycemic drugs and follow-up months, GLMM was performed with IS indices and Pβf as dependent variables. Fixed factors included study group, follow-up month, sex, age, bacterial genera used for classification (Random Forest), and anthropometric parameters (BMI, weight, % body fat, waist circumference). Random factors were stool samples of study subjects. Eta-squared and 95% confidence intervals were calculated with a 100-repetition bootstrap. Interactions (Bacteria X Treatment, Bacteria X Time, and Treatment X Time) were evaluated. For the bidirectional effect, models were run with diversity and microbial abundance as dependent variables, including study groups, sex, age, IS and Pβf indices, or anthropometric parameters (BMI, weight, % body fat, waist circumference) as fixed factors. Bacteria with significance in both models were considered for subsequent statistical and structural equation modeling.
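The standardized effect size mentioned above can be sketched as follows: Cohen's d with a pooled standard deviation, and a percentile bootstrap interval from 100 resamples. This is an illustration under stated assumptions (independent groups, approximate percentile cut-offs, a fixed random seed), not the exact routine of the statistical packages used; the data are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups using the pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def bootstrap_ci(group_a, group_b, repetitions=100, seed=0):
    """Approximate 95% percentile bootstrap interval for Cohen's d."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(repetitions):
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        estimates.append(cohens_d(resample_a, resample_b))
    estimates.sort()
    lower = estimates[int(0.025 * repetitions)]
    upper = estimates[int(0.975 * repetitions) - 1]
    return lower, upper


# Hypothetical index values at follow-up and at baseline.
follow_up = [5.1, 6.2, 7.3, 5.9, 6.8, 7.1, 6.0, 6.5]
baseline = [4.0, 5.2, 4.8, 5.5, 4.1, 4.9, 5.0, 4.6]

print(round(cohens_d(follow_up, baseline), 3))
print(bootstrap_ci(follow_up, baseline))
```

With only 100 repetitions, the interval bounds are coarse; more repetitions would give more stable percentiles.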

Statistical analysis of 16S metagenomics and its relationship with clinical parameters
To establish GM composition's relationship with IR and insulin secretion in prediabetes patients on hypoglycemic drugs, hierarchical linear regression, GLMM, and structural equation modeling tests were performed. Van der Waerden transformation was applied to each bacterial genus for the tests. Statistical analysis was done using Stata/SE 16.0, IBM-SPSS version 25, and RStudio version 4.1.1.
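As a minimal sketch of the Van der Waerden transformation, each abundance can be replaced by the standard normal quantile of its rank divided by n + 1. The tie handling (average ranks), the function names, and the example values are assumptions for illustration; the source does not describe how ties were treated.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of the 1-based ranks in the block
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def van_der_waerden(values):
    """Map values to normal scores: inverse normal CDF of rank / (n + 1)."""
    n = len(values)
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf(rank / (n + 1)) for rank in average_ranks(values)]


# Hypothetical relative abundances of one genus (illustration only).
abundances = [0.012, 0.340, 0.051, 0.051, 0.002]
print(van_der_waerden(abundances))
```

Because the scores depend only on ranks, the transformation is insensitive to the heavy right skew typical of abundance data.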
Linear regression was used with IS indices and Pβf as dependent variables. Initially, regression models estimated treatment and follow-up months (by subject) as the main effects and interactions. Subsequently, models were estimated by visit, sex, and age. Residuals and effect sizes were estimated for each model.
The structural equation model (SEM) was used to assess mediation among BMI, bacterial abundance, significant components, insulin secretion, and IS indices. Coefficients were estimated by a robust method.
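The mediation idea can be illustrated with a simplified product-of-coefficients sketch using ordinary least squares. This is not the robust SEM estimation used in the study. The single-mediator path BMI → bacterial abundance → Matsuda index, the function names, and the example values are assumptions for illustration only.

```python
from statistics import mean


def centered_cross_product(u, v):
    """Sum of products of deviations from the mean (a centered cross-product)."""
    mean_u, mean_v = mean(u), mean(v)
    return sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))


def simple_mediation(x, m, y):
    """Single-mediator model fitted by OLS with intercepts.

    a: slope of m on x; b: slope of y on m adjusted for x;
    direct: slope of y on x adjusted for m; total: slope of y on x alone.
    """
    s_xx = centered_cross_product(x, x)
    s_mm = centered_cross_product(m, m)
    s_xm = centered_cross_product(x, m)
    s_xy = centered_cross_product(x, y)
    s_my = centered_cross_product(m, y)
    det = s_xx * s_mm - s_xm ** 2  # zero only if x and m are perfectly collinear
    a = s_xm / s_xx
    b = (s_xx * s_my - s_xm * s_xy) / det
    direct = (s_mm * s_xy - s_xm * s_my) / det
    total = s_xy / s_xx
    return {"a": a, "b": b, "indirect": a * b, "direct": direct, "total": total}


# Hypothetical values (illustration only, not study data).
bmi = [24.0, 27.5, 30.1, 33.2, 28.4, 35.0, 26.3, 31.7]
abundance = [0.8, 0.1, -0.3, -0.9, 0.2, -1.1, 0.5, -0.4]  # normal scores of a genus
matsuda = [4.9, 3.8, 3.1, 2.2, 3.6, 1.9, 4.3, 2.8]

paths = simple_mediation(bmi, abundance, matsuda)
print(paths)
```

In this linear setting the total effect equals the direct effect plus the indirect effect (a × b), so a small indirect share corresponds to a weak mediating role for the genus.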
Mixed models were used to study the effects of hypoglycemic drugs and of new metabolic conditions on bacterial genera. GLMM (Tweedie distribution) was used, with bacterial genera as the dependent variable and hypoglycemic drugs as the independent variable. The model was adjusted for gender, age, BMI, and Matsuda index or Pβf. The stool sample was used as a random effect to account for within-patient correlation across repeated measures.

Figure 1. Study profile. The data were analyzed in two distinct analytical categories: (1) Participants: 51 in the no-treatment group at the baseline evaluation, 55 at the six-month follow-up (35 in the M group and 20 in the LM group), and 61 at the 12-month follow-up (28 in the M group and 33 in the LM group). (2) Gut microbiota samples: 65 in the untreated group (baseline evaluation), 77 at the six-month follow-up (44 in the M group and 33 in the LM group), and 63 at the 12-month follow-up (28 in the M group and 35 in the LM group). Created with BioRender.com. LM: combination of linagliptin + metformin + lifestyle. M: metformin only + lifestyle.

Figure 2. Insulin sensitivity and pancreatic β-cell function. (A) IS at baseline, six and 12 months in groups M and LM: upper panel, Matsuda index; lower panel, HOMA-IR. (B) Pβf at baseline, six and 12 months in groups M and LM: upper left panel, AIR; upper right panel, ORAL-DI; lower left panel, Disp_Index2; lower right panel, AUCinsgluc_OGTT. (C) Glucose and insulin levels during OGTT at baseline and at the six-month and 12-month follow-ups: upper panel, glucose; lower panel, insulin. Follow-up months were compared with one another. *P < 0.05, **P < 0.01 and ***P < 0.001, one-way ANOVA. IS insulin sensitivity, Pβf pancreatic β-cell function, HOMA-IR Insulin Resistance Index, AIR acute insulin response, β-Cell function 2 (Matsuda × (IncAUCins 0-120 / IncAUCgluc 0-120)), Oral_DI insulin disposition index, AUCinsgluc_OGTT glucose area under the curve, OGTT curve of oral glucose tolerance.

Figure 3. Schematic diagram of the proposed procedure for the clinical and GM analysis of the data. It consists of (A) PRELLIM data, (B) taxonomic and ecological analysis, and (C) explainable machine learning analysis, generalized linear mixed models (GLMM), and structural equation models (SEM). The sequences obtained in the sequencing were processed using the QIIME2 (Quantitative Insights Into Microbial Ecology) analysis platform 19. The taxonomic classification for end sequence variants was performed using the SILVA database (version 132). The analysis of the GM was performed using the phyloseq object 20; through this object, the α and β diversity of the study groups in the different months of follow-up was obtained. Subsequently, with the abundance of the other bacterial genera, the genera that classify each pharmacological treatment in the different months of follow-up were obtained using machine learning algorithms (random forest) 21. Once the bacteria that classified the subjects with M and LM treatment were identified, hierarchical linear regression models were performed to eliminate the effect of confounding variables (obesity, age, and gender). In addition, GLMM and, finally, mediation analysis using SEM were performed. Created with BioRender.com.

Figure 4. SHAP graph for each hypoglycemic drug and month of follow-up. The figure shows the first ten bacterial genera with the most significant contribution to classifying patients without hypoglycemic drugs (baseline) and those treated with M and LM at six and 12 months of follow-up and between treatments. The bacterial genera are ordered from most to least relevant from top to bottom. Blue and red represent low and high abundance of bacterial genera, respectively. The higher positive values on the SHAP axis establish the relevance of bacterial genera to classify patients with M or LM at six or 12 months of follow-up, while the negative values establish the relevance of bacterial genera for patients with prediabetes without treatment. (A) Baseline vs M with six-month follow-up. (B) Baseline vs M with 12-month follow-up. (C) Baseline vs LM with six-month follow-up. (D) Baseline vs LM with 12-month follow-up. (E) M vs LM with six-month follow-up. (F) M vs LM with 12-month follow-up.

Figure 5. Heatmap from the GLMM analysis and SEM with standardized path coefficients. (A) The color scale represents the β coefficient of the GLMM: red indicates a positive β and blue a negative β. All bacteria represented in the heatmap are statistically significant. (B) Catabacter SEM: the baseline model shows a relationship between BMI and IS (Matsuda index) and between BMI and the genus Catabacter. Lachnospiraceae UCG-004 SEM: the baseline model shows a relationship between BMI and IS (Matsuda index) and between BMI and the genus Lachnospiraceae UCG-004. M is the model for metformin treatment, and LM is the model for linagliptin/metformin treatment. **p < 0.05.

Table 1. Characteristics of the study populations. SBP systolic blood pressure, DBP diastolic blood pressure, BMI body mass index, WC waist circumference, AC hip circumference, AUC area under the curve, IncAUC increase in area under the curve, OGTT curve of oral glucose tolerance, AIR acute insulin response, HOMA-IR homeostasis model assessment for insulin resistance, HOMA-B homeostasis model assessment beta-cell, Oral_DI insulin disposition index, HDL-c high-density lipoprotein cholesterol, LDL-c low-density lipoprotein cholesterol, VLDL-c very low-density lipoprotein cholesterol, Effect size = Cohen's d. *p < 0.05.

Table 2. GLMM for IS, IR, and Pβf indices and effect size and confidence interval for each bacterium.

Table 4. A hierarchical model for the Pβf. Adjustment for hypoglycemic drugs was set for all models.

predicted reduced IS. Weight loss maintenance programs are crucial for preserving metabolic benefits. Physical activity and a balanced diet increase IS in patients with obesity and T2D

Scientific Reports | (2024) 14:9678 | https://doi.org/10.1038/s41598-024-60081-y