Abstract
The escalating prevalence of insulin resistance (IR) and type 2 diabetes mellitus (T2D) underscores the urgent need for improved early detection techniques and effective treatment strategies. In this context, our study presents a proteomic analysis of post-exercise skeletal muscle biopsies from individuals across a spectrum of glucose metabolism states: normal, prediabetes, and T2D. This enabled the identification of significant protein relationships indicative of each specific glycemic condition. Our investigation primarily leveraged the machine learning approach, employing the white-box algorithm relative evolutionary hierarchical analysis (REHA), to explore the impact of regulated, mixed mode exercise on skeletal muscle proteome in subjects with diverse glycemic status. This method aimed to advance the diagnosis of IR and T2D and elucidate the molecular pathways involved in its development and the response to exercise. Additionally, we used proteomics-specific statistical analysis to provide a comparative perspective, highlighting the nuanced differences identified by REHA. Validation of the REHA model with a comparable external dataset further demonstrated its efficacy in distinguishing between diverse proteomic profiles. Key metrics such as accuracy and the area under the ROC curve confirmed REHA’s capability to uncover novel molecular pathways and significant protein interactions, offering fresh insights into the effects of exercise on IR and T2D pathophysiology of skeletal muscle. The visualizations not only underscored significant proteins and their interactions but also showcased decision trees that effectively differentiate between various glycemic states, thereby enhancing our understanding of the biomolecular landscape of T2D.
Similar content being viewed by others
Introduction
Type 2 diabetes mellitus (T2D) is a global health challenge characterized by elevated blood glucose levels and defective glucose metabolism in adipose, hepatic and skeletal muscle tissues. T2D is marked by abnormal glucose, protein and lipid metabolism, which arises from insulin resistance and insufficient hormonal response, leading to progressive pancreatic beta cell dysfunction. This disease not only affects millions worldwide but also imposes significant clinical, social, and economic burdens1,2,3,4.
With the global incidence of diabetes on the rise, there is an urgent need for earlier diagnosis and more effective treatment strategies. Modern lifestyle factors, such as reduced physical activity and unhealthy dietary habits, contribute significantly to the increasing prevalence of the condition. Skeletal muscle, as the largest tissue mass in the body, plays a central role in whole-body energy and glucose metabolism under both physiological and pathological conditions5. It adapts to various stimuli, changing in size, fiber type, and metabolic properties, and is pivotal in glucose metabolism and insulin sensitivity6,7.
Recent advances in omics technologies, particularly proteomics, have been instrumental in identifying and quantifying proteins and their modifications, offering insights into disease mechanisms. These high-throughput techniques enable the exploration of altered protein expression profiles and molecular pathways contributing to insulin resistance and T2D. However, the complexity and high dimensionality of proteomics data necessitates the use of Machine Learning methods not only at the level of protein identification and quantitation, but also interpretation of biological relationships.
In this study, we focus on using a dedicated machine learning approach, the relative evolutionary hierarchical analysis (REHA), which integrates elements of the decision tree model and evolutionary algorithms. REHA’s main innovation lies in its use of the Relative eXpression Analysis (RXA)8,9 and the multi-test concept10, based on clusters of co-expressed proteins. This approach allows us to explore associations among proteins identified by untargeted skeletal muscle proteomics, advancing both the understanding of their biological relationships and biomarker discovery11.
Our research is focused on improving the accuracy of IR and type 2 diabetes (T2D) predictions, while also deepening our understanding of the impact of exercise on key molecular pathways involved in the disease’s progression. We focus on protein expression alterations in skeletal muscle among individuals with normal, prediabetic, and diabetic metabolic states following a regulated exercise intervention. This approach enables us to investigate changes in protein expression associated with the response to exercise in IR and T2D, elucidating the roles of proteins and signaling pathways in skeletal muscle tissue. This may potentially lead to the identification of new therapeutic targets for insulin resistance and related metabolic disorders.
In this study, we assessed post-exercise skeletal muscle biopsies collected from patients with distinct glycemic state disturbances: 13 with normoglycemia, 11 with prediabetes, and 8 with T2D. Analysis was conducted on 2 independent biopsies from each participant (total of 64 samples), leading to identification of 1139 proteins, after stringent application of FDR, significance and differential expression fold-change cutoffs. Additionally, our study incorporated data from 42 samples from public repository, ranging from prediabetic to healthy states, further enriching our understanding of the protein dynamics in various glycemic conditions.
To validate our machine learning-based findings, we employed Statistics tailored for proteome data analysis. For the comparison between the NG, PD, and T2D groups, only proteins present in at least 50% of the samples (Q-value percentile 0.5), with a minimum of 2 unique peptides per protein group, displaying at least 50% expression difference (− 0.585 ≤ log2 ≤ 0.585), and possessing an FDR-adjusted p-value < 0.05 (Q-value < 0.05, − Log10 p-val > 1.13), were considered as significant12.
This statistical validation was essential for identifying potential biomarkers and for providing a one-dimensional view to complement the multi-dimensional insights offered by the REHA algorithm. We compared these statistical results with those obtained from REHA to draw comprehensive conclusions about the protein expressions in different glycemic states.
Furthermore, our investigation was enhanced through a robust external dataset, from the experiment performed on comparable study group. We integrated data from a proteomic analysis of skeletal muscle that was aligned with our classification groups, focusing on 645 proteins shared between both datasets. This step was crucial for validating the performance of the REHA models against external data, which not only demonstrated the versatility of our approach but also provided insights into the generalizability of our findings.
In addition to REHA, we tested several popular machine learning algorithms, both white-box and black-box approaches13. These diverse methodologies allowed us to thoroughly evaluate and compare the effectiveness of different analytical approaches in handling high-dimensional proteomics data. The performance of each algorithm, including REHA, was rigorously validated using metrics such as F1-score, weighted F1-score, accuracy, and the area under the ROC curve.
The simplicity and interpretability of the REHA algorithm hold promise for uncovering significant protein relations indicative of both the effect of exercise and different glycemic conditions. We anticipate that this approach will enable the discovery of new alternative pathways, offering fresh perspectives in understanding complex protein interactions involved in the response to exercise in IR and T2D subjects. This study aims not only to enrich our current knowledge base but also to open avenues for future research and potential therapeutic interventions.
Results
The aim of this study was to identify complex protein associations that could shed light on the underlying mechanisms of insulin resistance, pathogenesis of type 2 diabetes (T2D) in subjects subjected to mixed-mode exercise. Figure 1 presents the comprehensive workflow of our experiments, from the collection and preprocessing of post-exercise skeletal muscle biopsies in three cohorts to the resulting analyses. The process began with rigorous data quality control (QC), transformation, and filtering to pinpoint 1139 proteins. Subsequently, three principal investigative segments: proteomics-tailored analytical pipeline, machine learning methods, and external dataset validation; were pursued in parallel, each contributing to a multifaceted understanding of the proteomic landscape in relation to glucose tolerance and diabetes.
Machine learning approach
In the context of our machine learning exploration, REHA was a key component, providing a clear and intuitive model that differed from proteomics-tailored statistical pipeline standard methods. By adopting this innovative method, we could explore the dataset deeply, identifying protein patterns across varying levels of glucose tolerance.
Our analysis focused on binary classification tasks to identify unique patterns in protein expression distinguishing NG-PD (normal glucose tolerance to prediabetes), NG-T2D (normal glucose tolerance to type 2 diabetes), and T2D-PD (type 2 diabetes to prediabetes) subjects which underwent physical exercise regime. REHA was instrumental in crafting predictive models and uncovering the complex interactions of proteins within these groups. To ensure a comprehensive analysis, we assessed REHA’s effectiveness with various established machine learning algorithms, supported by thorough statistical evaluation.
The results provided by REHA and a comparative evaluation with other algorithms is presented in Table 1. It details the performance metrics of the predictions from the tenfold cross-validation process. The analysis reveals that REHA can effectively distinguish NG from PD, suggesting that specific protein interactions change with glucose tolerance. However, differentiating between T2D and PD was more challenging, with REHA’s results displaying greater variability in this comparison. When comparing REHA’s results with those of other white-box classifiers, its effectiveness is clearly visible. The use of statistical tests, particularly the Friedman test followed by Dunn’s multiple comparison post-hoc test14 at a 0.05 significance level, showed that REHA outperformed popular interpretable classifiers like J48, CART, and JRip in terms of accuracy, making these differences noteworthy. Conversely, Naive Bayes (NB) and black-box classifiers (such as random forest (RF) and Support Vector Machine (SVM)), showed enhanced accuracy, which was anticipated given their more complex designs. Additional information including ROC curves of REHA algorithm are included in Appendix J.
In order to understand the molecular intricacies of glycemic variations, we leveraged the REHA algorithm to highlight key proteins that may serve as biomarkers across different states of glucose tolerance. The identification of these proteins is crucial, as it helps in understanding common pathological pathways and potential targets for therapeutic intervention. To visually show the interaction and significance of these proteins, we present a Venn diagram in Fig. 2 that clearly outlines the unique and shared proteins involved in each classification task. Each protein description together with its importance is described in Appendix C and D.
In the varied context of our machine learning analysis, REHA’s robustness is demonstrated by its ability to generate straightforward yet effective rules for protein interaction. Figure 3 provide visual examples of this, where the most commonly generated REHA models are presented for each classification challenge. These decision trees provide insight into the majority relationships between pairs of proteins at each node, offering a detailed view of the decision-making process of the algorithm. Below each tree, a set of statistics is provided, which includes a confusion matrix, underscoring the predictive success and reliability of the models.
From Fig. 3, we see that the classification trees generated by REHA are straightforward and easily interpretable. Each test at the internal node examines a simple ordering relationship within the samples, directly derived from the concept of Relative eXpression Analysis (RXA). This approach ensures that we do not fall into the pitfalls of overfitting and remain more resilient to data inconsistencies since we are not focused on the actual expression values of the proteins.
Furthermore, it is observable that some splits involve more than one comparison (test). Such a scenario indicates that the node was constructed using a multi-test concept11. This complex test can encompass several simple RXA comparisons that partition the data in a similar manner. The REHA algorithm typically employs this strategy at the root of the tree to enhance the stability of decisions and to reduce the risk of underfitting. In such tests, all the individual tests cast votes (each with equal weight) and collectively determine the data split outcome at the node.
The trees presented in Fig. 3 are further elaborated upon in the discussion section and each one is described in detail in Appendix B with proteins description in Appendix C and D. However, it is important to consider that REHA uses an evolutionary algorithm15, which means each run may yield a different decision model. The examples showcased here represent some of the most recurrent models or their variants produced by multiple runs of the algorithm.
Statistics tailored for proteome data analysis
In our study, we opted to employ a proteomics-specific statistical approach to analyze our data, recognizing the limitations of conventional statistical methods for our complex proteomic dataset. Traditional methods, such as the Wilcoxon signed-rank test or the t-test, adjusted with Bonferroni correction for multiple comparisons, often fall short in accommodating the intricacies of proteomic data, which include high dimensionality and variable data completeness across a large set of observations. These foundational biostatistical analyses identified only a limited number of proteins as significantly different across conditions: five for NG-PD, one for T2D-PD, and 42 for NG-T2D, as detailed in Appendix E.
Given these constraints, our study employs a more tailored approach better suited to address the specific challenges of proteomic data. This includes criteria such as protein presence in 50% of the samples, requiring at least two unique peptides per protein, and a threshold for differential expression set at 50% of up or down-regulation. Additionally, we applied multiple testing corrections through False Discovery Rate (FDR) control to correct differential expression p-value set at < 0.05 cutoff.
This proteomics-specific approach led to the identification of 207 significantly different proteins in T2D-NG, 98 in PD-NG, and 79 in T2D-PD comparison. It also revealed overlaps that suggest shared pathological mechanisms across these conditions. Appendix F provides a comprehensive list of all proteins identified through our proteomics-specific statistical approach. This appendix ensures that readers interested in exploring the full breadth of our data can access this detailed information.
To manage the extensive data and focus on the most significant findings, the top 50% of proteins with the highest p-values for each class are shown in Fig. 4A, provided they exist in at least two classes. For proteins unique to one class, only the top 25% are displayed. This selective visualization allows us to draw attention to the most impactful proteins while maintaining clarity in our presentation.
Furthermore, to illustrate the integration of machine learning findings with our statistical results, we include three separate Venn diagrams (one for each challenge) in Fig. 4B–D. These diagrams display the proteins found significant by the machine learning approach (REHA), by the proteomics-specific statistical approach, and those that overlap, thereby emphasizing the synergy between computational modeling and empirical data analysis in understanding complex biological systems. This comparative visualization helps to validate the effectiveness of the REHA model and highlights the complementary nature of machine learning and statistical approaches in identifying key proteins relevant to the progression of glucose intolerance to diabetes.
External dataset case study
The robustness of the REHA models and their comparative performance alongside other machine learning methodologies were tested in the third part of our study. We utilized an external dataset from a proteomic analysis of skeletal muscle16, which mirrored our classification groups: NG, PD, and T2D. To facilitate a coherent integration, we merged this external data with our own, focusing on 645 shared proteins and employing normalization procedures. The last step, while not necessary for the REHA algorithm due to its reliance on order relations, proved essential for ensuring compatibility with other machine learning algorithms.
Upon combining the datasets, a noticeable decline in REHA's performance on the merged dataset was observed, a trend that was mirrored by competing algorithms, suggesting a common challenge posed by the integration of external data. Table 2 aggregates the predictive performance statistics of these algorithms, revealing variations across different metrics such as: accuracy (ACC), area under the ROC curve (AUC), F1 score, and Weighted F1 score (WF1), all averaged over tenfold cross-validation. Additional information including ROC curves of REHA algorithm are included in Appendix J.
Despite the inherent challenges of dataset merging and a reduced feature set, REHA's performance, a white-box model, remained robust, albeit with a decline from its performance on the unmerged dataset. The other white-box models, in contrast, experienced a more significant performance decrease across all comparisons. This underscores the challenges these models face in adapting to datasets with fewer features, which may limit their ability to perform fine-grained data separations based on explicit decision rules. Similarly, the NB algorithm struggled with the merged data, indicating difficulties in accommodating newly integrated data points.
Meanwhile, black-box models, known for their complex, non-linear decision-making capabilities, also saw a performance drop post-merge. However, the SVM algorithm and, to some extent, the RF algorithm maintained relatively high-performance metrics, reflecting their resilience to the challenges posed by data merging, which often arise from disparities in feature representation and integration-related noise.
We have conducted a thorough comparison between the protein selections used in the REHA prediction models for the original datasets and those identified after merging. The Venn diagrams developed from this data, illustrated in Fig. 5, offer visual insight into the overlap of protein selection across different glycemic states: NG-PD, NG-T2D, and T2D-PD. Intriguingly, our analysis revealed that despite the consolidation of datasets, which inherently entails a reduction in the number of features available for selection, there is a notable retention of key proteins across the datasets. Specifically, we observed that approximately 15–20% of the proteins, contingent on the specific condition under investigation, consistently re-emerged in the merged dataset. Each protein description together with its importance is described in Appendix G.
Finally, the merged dataset served as a foundation from which we derived two additional, distinct datasets: one exclusively containing our internal data and the other solely comprising data from external studies; each restricted to the same set of 645 proteins. The strategic intent behind this partitioning was to enable an in-depth comparison of the decision-making processes employed by the REHA model. We sought to ascertain the consistency of the rules generated by REHA and to explore whether the model’s reasoning would transcend the particularities of the datasets. This methodical curation aimed to provide a clear understanding of the model's performance and the potential universal applicability of the identified biomarkers across diverse data origins.
In the NG-PD context (see Fig. 6A), the REHA model's decision tree constructed from the merged dataset utilizes biomarkers like PREY (mitochondrial protein preY, methyltransferase and chaperone involved in coenzyme Q biosynthesis) and KAP2 (cAMP-dependent protein kinase type II-alpha regulatory subunit, Regulatory subunit of cAMP-dependent protein kinases involved in cellular cAMP signaling), which remain significant even when the analysis is narrowed to our dataset restricted to the same 645 proteins. Interestingly, even the external data's tree, while operating under the same protein limitation, reveals a different but informative narrative, showcasing the REHA model's ability to adapt and identify relevant biomarker patterns regardless of the dataset.
Exploring the NG-T2D problem (see Fig. 6B), the merged dataset's decision tree presents proteins like NNRD and ABCF1 as possible biomarkers, with our dataset's analysis bringing PFKAM and ANK3 to the fore. The external dataset, again consistent in protein numbers, allows REHA to adapt its pattern recognition, suggesting a flexible but robust model capable of navigating through dataset-specific nuances.
For T2D-PD, proteins such as TLN2 and SAMP are identified as biomarkers within the merged dataset's tree, with these markers also proving pivotal in our dataset's analysis (see Fig. 6C). The model's interaction with the external dataset continues this theme, endorsing some biomarkers. Additional network diagram illustrating protein–protein interactions and comparisons, grouped based on the REHA machine learning approach for NG versus PD, NG versus T2D and T2D versus PD problems is included in Appendix I.
Although finding a universal rule applicable to all datasets was challenging due to the vast number of proteins in comparison to the dataset sizes, REHA has demonstrated its capacity for high-precision classification. The discerned patterns underscore the model’s ability to simplify complex data into actionable insights, offering a clear view of how even a restrained subset of proteins can be pivotal in distinguishing between intricate health states.
Discussion
Proteomics poses specific challenges in data analysis and interpretation primarily due to the high-dimensional nature of the data and the intricate relationships between proteins. Therefore, adopting a more comprehensive approach is crucial for uncovering subtle yet significant meaningful differences in protein expression. It may involve employing the proteomic-tailored pipeline as well as application of some “white box” machine learning methods. This comprehensive dual strategy is invaluable for revealing novel patterns of changes and generating focused hypotheses.
Our exploratory analysis of protein expression in skeletal muscle across varying glycemic states after exercise challenge offers valuable preliminary insights into the metabolic complexities of IR and T2D. By integrating proteomic analysis with the REHA machine learning algorithm, we have identified interesting patterns and potential interactions in protein expression. Next, by using statistical methods dedicated to proteome data analysis, we found a total of 250 unique proteins as statistically significant between experimental groups. Among these, the expression of 98 proteins were significantly affected in the PD-NG- comparison, 207 in T2D-NG, and 79 in T2D-PD.
In the following discussion, we will examine the results of these dual analysis. Due to the length restrictions, we will focus here only on PD-NG comparison, while the rest of the discussion as well as the Functional Enrichment Analysis are enclosed in Appendix H and I in the supplementary materials. Figure 7 illustrates network diagrams generated using String software17 for proteins identified by REHA approach (top) and statistical approach dedicated to proteomics data that are shown in Fig. 4.
Prediabetes versus normoglycemic state—proteomic pipeline
In the context of comparing PD to NG using proteomic-tailored pipeline, the network analysis leads to a distinction between two main clusters of interconnected proteins and identifies several smaller ones (see Fig. 7).
In the center of the Red Cluster, there are Caveolin-1 (CAV1) and Caveolae-associated protein 4 (CAVIN4, CAVIN4 gene). Both are involved in the formation and function of caveolae, which are 50–100 nm sized invaginations in the cell membranes crucial for multiple cellular functions such as endocytosis, cholesterol homeostasis, signal transduction, and mechano-protection18,19. These structures have been shown to play a role in insulin receptor-mediated signaling and GLUT4 trafficking in skeletal muscle, with impaired function potentially leading to insulin resistance. Yoon Sin Oh et al. found that the expression of caveolins, particularly caveolin-1, along with insulin-related proteins, was increased by resistance exercise training in the skeletal muscle of 50-week-old Sprague Dawley rats. The upregulation of caveolin-1 was specifically associated with improved insulin sensitivity, especially in type 2 (EDL) muscle fibers. This suggests that the increase in the expression of caveolin-1 in skeletal muscle is dependent on the type of muscle fiber and the type of exercise performed, which may be important for improving insulin sensitivity in skeletal muscle18. The downregulation of caveolin-1 in the prediabetes group compared to normoglycemic individuals in our study could imply insufficient exercise among our patients and a possible transition from normoglycemia to a prediabetic state, which could impair caveolae function and insulin signaling.
Cavin-4 is predominantly expressed in cardiac and skeletal muscle. It regulates the morphology of caveolae and recruits MAPK1/3 to caveolae in cardiomyocytes. CAVIN4 activates MAPK1/3, regulating alpha-1 adrenergic receptor-induced hypertrophic responses, and contributes to the proper membrane localization and stabilization of caveolin-3 (CAV3) in cardiomyocytes. The distribution of Cavin-4 is disrupted in human muscle diseases related to Caveolin-3 dysfunction20. The lower expression of Cavin-4 in our study might indicate a compromised caveolae function, which could be contributing to impaired insulin signaling and glucose uptake in prediabetic patients.
Another important interconnection within this cluster involves Mitogen-activated protein kinase 1 (MK01, MAPK1 gene), which is downregulated in our study in the prediabetes group compared to normoglycemic individuals. MAPK signal transduction pathway proteins control cellular functions such as growth, differentiation, apoptosis, and proliferation21,22.
Furthermore, within the Red Cluster, there are members of the Ras GTPase protein family: Ras-related protein R-Ras2 (RRAS2), Ras GTPase-activating protein 4 (RASL2, RASA4 gene), and Ras GTPase-activating protein 4B (RAS4B, RASA4B gene), involved in regulating various cellular processes, such as the MAPK signaling pathway. Inhibition of Ras signaling has been shown to improve insulin sensitivity and glucose uptake both in vitro and in vivo. Beneficial effects of Ras inhibition have been observed, leading to attenuation of hyperglycemia in conventional models of type 2 diabetes23. In our study, we found lower expression of these proteins in the prediabetes group compared to normoglycemic individuals. This observation is not consistent with the expected trend, as reduced Ras signaling typically correlates with improved insulin sensitivity and glucose uptake, which might suggest an atypical response or compensatory mechanism in the initial stages of metabolic dysfunction.
Additionally, our study has shown lower expression of a member of the tail-anchored membrane proteins superfamily, Sarcolemmal membrane-associated protein (SLMAP), in the prediabetes group compared to normoglycemic individuals. SLMAP is found in various locations within the cell, including the endoplasmic reticulum (ER), mitochondria, cell surface membranes, and the nuclear envelope. Studies have indicated that SLMAP participates in vital physiological processes, including ion channel regulation and membrane fusion, with increased expression associated with endothelial dysfunction in diabetes24. This discrepancy might indicate an early compensatory mechanism in prediabetes or a distinct regulatory pathway.
Other notable interactions include Protein S100-A13 (S10AD, S100A13 gene) and Protein S100-A4 (S10A4, S100A4 gene) of the S100 protein family, which exhibit various functions, including stress-induced release of growth factors and adipogenesis regulation, with potential implications for obesity and insulin resistance25,26,27. Moreover, in vitro findings suggest that S100A4 may have an anti-inflammatory impact27. Our results showing lower expression of these proteins in prediabetic patients seem inconsistent with other studies that recognize S100A4 as an adipokine linked to insulin resistance (IR) and white adipose tissue (WAT) dysfunction in prepubertal populations, where variations in plasma S100A4 levels accompany longitudinal trajectories of IR during pubertal development in children26. On the other hand, Taxerås et al. showed that increased levels of S100A4 signify IR in adults with obesity but not in prepubertal children27.
Moesin (MOES, MSN gene) acts as a crucial link between the actin cytoskeleton and the plasma membrane, regulating specific cellular regions. It is consistently expressed in endothelial cells and is situated downstream of pathways mediating advanced glycation end products (AGEs). AGEs-induced moesin phosphorylation has been shown to be associated with subsequent hyperpermeability and endothelial cell barrier dysfunction in diabetes28.
Annexin A1 (ANXA1) plays roles in the inflammatory response and immune system modulation. It is found in vasculature, specifically in endothelial cells and vascular smooth muscle cells. Patients with T2D exhibited increased levels of ANXA1. Additionally, research on ANXA1-deficient mice showed that the absence of this protein exacerbates the diabetic phenotype, including dyslipidemia and insulin resistance, indicating a crucial role for ANXA1 in regulating these processes. Endogenous ANXA1 influenced the passive mechanics of the mesenteric artery in the context of insulin resistance, with its absence worsening the pathological remodeling of blood vessels in insulin-resistant mice29. ANXA1 has a dynamic role in regulating the metabolic state. At the prediabetes stage, its lower expression could be an early indicator of metabolic disturbances.
Proteins related to muscle structure and function highlight the network's complexity. Skeletal muscle fiber transformations take place in various biological processes, such as changes in neuromuscular activity, muscular disorders, age-related muscle loss, and during myogenesis30. We have shown higher expression of Alpha-actinin-3 (ACTN3), which is associated with muscle function31. The sarcomeric α-actinin proteins, actinin-2 and -3, are crucial components of the Z-line structure in skeletal muscles. They were previously believed to provide mechanical support during muscle contraction. However, recent research has revealed that these proteins are key adaptor proteins that interact with a range of structural, signaling, and metabolic proteins, including the key metabolic regulator glycogen phosphorylase (GPh)31. This absence of α-actinin-3 is associated with reduced muscle strength and mass, as well as a shift towards a more efficient oxidative metabolism. Therefore, deficiency in α-actinin-3 can negatively impact athletic performance31,32.
Additionally, in a small cohort of T2D patients, an increase in the number of individuals with α-actin-3 deficiency was noted, regardless of their BMI32,33. The study investigating the association between α-actinin-3 deficiency and obesity in both mice and humans showed mixed results regarding the role of ACTN3 R577X in weight gain and obesity among individuals of European descent. While the Actn3 KO 129X1/SvJ mice gained less weight compared to the wild-type (WT) mice when fed a high-fat diet (HFD) comprising 45% of calories from fat, the study found no significant contribution of the ACTN3 genotype alone to BMI or obesity in humans across six independent cohorts. However, this does not exclude the possibility of ACTN3 R577X playing a role in conditions linked to obesity, such as T2D and chronic heart failure (CHF)32.
Triadin (TRDN) regulates lumenal Ca2+ release from the sarcoplasmic reticulum through RYR1 and RYR2 channels, an essential step in triggering skeletal and heart muscle contraction. It ensures proper triad junction organization and is necessary for normal skeletal muscle strength.
Leiomodin is an actin filament nucleator in muscle cells related to tropomodulin, a capping protein localized at the pointed end of the thin filaments34,35. The role of leiomodin is crucial for the assembly and maintenance of thin filaments34,36. We have found lower expression of Leiomodin-2 (LMOD2) in prediabetic patients compared to healthy individuals, both after exercise intervention. Similarly, the expression of Leiomodin-1 (LMOD1) is typically reduced following either aerobic or resistance training. However, its role in skeletal muscle adaptations after exercise training is currently unclear34,37.
Additionally, we have identified a decrease in the expression of Ankyrin-3 (ANK3) in the prediabetes group compared to normoglycemic individuals. Ankyrins are adaptor molecules found in eukaryotic cells. They serve as crucial scaffolding proteins involved in anchoring to the muscle membrane, contributing to muscle development, neurogenesis, and synapse formation38. Functional variants of ankyrin-B have been linked to certain human diseases, including T2D39. Lorenzo et al. demonstrated that ankyrin-B (AnkB) deficiency in adipose tissue (AT) causes cell-autonomous adiposity, rendering mice more susceptible to becoming obese with age or when fed a high-fat diet, and causing other metabolic complications40. Our findings of decreased ANK3 expression in prediabetic patients could be consistent with ankyrins' involvement in metabolic regulation and disorders. Therefore, it will be important to understand the individual metabolic roles of ankyrins in other tissues, along with the resulting inherent effects of their deficiencies and how they affect metabolic organ cross-talk.
Additionally, this cluster includes Heat shock protein 90B1 HSP90B1/ENPL Endoplasmin. Heat shock proteins (HSPs) are known as chaperones and have roles in cell signaling and regulation of metabolism. In patients with diabetes, the expression and levels of HSPs decline as we shown in our study; however, these chaperones can help alleviate some diabetic complications, including oxidative stress and inflammation41.
This cluster mainly comprises proteins associated with cellular structure, signaling, and stress responses, reflecting key adaptive changes in prediabetes conditions affecting various cellular pathways.
Within the Yellow Cluster notable members include Complement C4-A (CO4A, C4A gene), a part of the classical complement pathway important for immune response and linked to the development of diabetes mellitus42. The upregulation of Complement C4-A in prediabetic individuals may align with the findings of D. E. McMillan, showing that serum levels of both C4 and C3 are increased in patients with diabetes43.
Creatine kinase B-type (KCRB, CKB gene) plays a significant role in ATP formation, which is necessary for energy-demanding processes in human and animal cells, especially muscle contractions44. The observed lower expression in prediabetes group could indicate a reduced capacity for efficient ATP production in muscle cells, potentially contributing to insulin resistance and other metabolic complications. However, studies in both animals and humans have shown that Creatine kinase is associated with insulin resistance and has been identified as a risk marker for cardiovascular disease, primarily due to its relationship with hypertension and increased body mass index45. It was also correlated with glycated haemoglobin in a nondiabetic general population46.
Heterogeneous nuclear ribonucleoproteins C-like 1, 2 ,3, 4, C1/C2 (HNRNPCL1, HNRNPCL2, HNRNPCL3, HNRNPCL1, HNRNPC genes) are involved in RNA processing47 and may potentially impact on insulin sensitivity and glucose metabolism48. We demonstrated the downregulation of these proteins in the prediabetes group participating in our study, which is consistent with the studies performed by Zhao M et al. They observed that reduced expression of hnRNP A1 in skeletal muscle affects metabolic properties and systemic insulin sensitivity by inhibiting glycogen synthesis, further supporting the crucial role of hnRNPs in maintaining metabolic homeostasis48.
Additionally, Serine-threonine kinase receptor-associated protein (STRAP) and Small nuclear ribonucleoprotein-associated proteins B and B' (RSMB, SNRPB gene) and N (RSMN, SNRPN gene) are involved in RNA splicing processes and may be related to various diseases49,50.
This cluster emphasizes features proteins crucial for immune response, ATP formation, insulin sensitivity, and RNA processing. These proteins may contribute to conditions like IR and T2D, indicating their significance in disease pathways.
Short/branched-chain specific acyl-CoA dehydrogenase, mitochondrial ACADS (ACADS gene) is one of the acyl-CoA dehydrogenases involved in the catabolism of short branched-chain acyl-CoA derivatives. It emerges as a pivotal node within the smaller Blue Cluster, demonstrating a regulatory effect on the metabolism of free fatty acids (FFAs), triglycerides (TGs), and cholesterol (CHOL)51. The lower expression of this enzyme observed in prediabetes could impair the breakdown of L-isoleucine and contribute to disruptions in lipid metabolism, thereby affecting overall metabolic homeostasis.
Other interconnected proteins in this cluster include Branched-chain-amino-acid aminotransferase, mitochondrial (BCAT2), which catalyzes the first reaction in the catabolism of the crucial branched-chain amino acids (BCAAs): leucine, isoleucine, and valine. BCAAs and related metabolites are now widely recognized as being strong biomarkers of obesity, insulin resistance, type 2 diabetes (T2D), and cardiovascular diseases in humans51. The skeletal muscle is a major site of BCAA catabolism, with BCAT2 being predominantly expressed in this tissue52.
The lower expression of BCAT2 in the prediabetes group compared to normoglycemic individuals, as shown in our studies, suggests decreased activity of branched-chain-amino-acid aminotransferase (BCAT) in muscle. Impairments in BCAAs metabolism are linked to insulin resistance and T2D by the accumulation of possibly toxic intermediates and BCAAs levels in plasma53,54.
This aligns with findings observed in BCAT2-/- mice, which also exhibit reduced levels of BCAA metabolites in peripheral tissues55. Despite continuous activation of mTORC1, BCAT2-/- mice do not develop insulin resistance; instead, they show improved glycemic control and insulin sensitivity with high energy expenditure53.
Additionally, we have shown downregulation of Alpha-aminoadipic semialdehyde dehydrogenase (ALDH7A1, ALDH7A1 gene), which is pivotal in lysine catabolism; Propionyl-CoA carboxylase alpha chain, mitochondrial (PCCA), which is essential for the catabolism of odd-chain fatty acids, branched-chain amino acids (isoleucine, threonine, methionine, valine), and other metabolites; and Aldehyde dehydrogenase family 3 member A2 (ALDH3A2, ALDH3A2 gene), which oxidizes medium to long-chain aliphatic aldehydes into fatty acids. This pathway, an alternative to glycolysis, provides essential reducing power and metabolic intermediates for biosynthetic processes. Deficiencies in these enzymes' activity may affect the metabolic pathways they engage in, leading to an accumulation of their respective substrates and potentially toxic intermediates.
The cluster also includes: Endoplasmic bifunctional protein GDH/6PGL (G6PE, H6PD gene), which operates within the endoplasmic reticulum, facilitating the initial steps of the pentose phosphate pathway and ATPase GET3 (GET3 gene), which is crucial for post-translational transportation of tail-anchored (TA) proteins to the endoplasmic reticulum (ER). There is also Cytosolic non-specific dipeptidase (CNDP2, CNDP2 gene), an enzyme that primarily breaks down Cys-Gly dipeptides but can hydrolyze various other dipeptides, including L-carnosine. It may function as a tumor suppressor, as elevated CNDP2 levels have been shown to induce cell apoptosis, while lower levels promote cell proliferation. Our study also revealed lower expression of catechol O-methyltransferase domain-containing protein 1 (CMTD1, COMTD1 gene). The genetic variation in catechol-O-methyltransferase (COMT), an enzyme that degrades catecholamines, is linked to cardiometabolic risk factors and incident cardiovascular disease (CVD). The COMT rs4680 high-activity G-allele was associated with lower HbA1c levels and modest protection from type 2 diabetes56.
Additionally, Carnosine synthase 1 (CRNS1, CARNS1 gene), vital for synthesizing carnosine and homocarnosine, is present. Carnosine is the most common naturally occurring histidyl dipeptide present in human skeletal muscle with a number of functions, including buffering pH levels and scavenging by-products of lipid peroxidation, as well as quenching reactive oxygen species and chelating first transition metals57,58,59. Studies have shown that exercise can increase carnosine levels, while levels of this dipeptide are significantly reduced in the muscles of both normal humans and T2D patients57,60,61. Moreover, carnosine scavenging of glucolipotoxic free radicals has been shown to have beneficial effects on glucose homeostasis by increasing insulin secretion and skeletal muscle glucose uptake59. Thus, our findings showing lower expression of Carnosine synthase 1 align with the broader trend, indicating that reduced expression of CARNS1 may lead to decreased carnosine levels, which can negatively impact glucose metabolism in prediabetic individuals.
Finally, we have shown downregulation of Glutathione peroxidase 3 (GPX3 gene) in the prediabetes group, which is vital in managing oxidative stress and enhancing antioxidant defense mechanisms. Our results are consistent with Chung et al.'s findings that GPx3 facilitates the antioxidant impact of peroxisome proliferator-activated receptor γ (PPARγ) in human skeletal muscle cells, suggesting that reduced GPx3 expression may impair antioxidant defenses and contribute to insulin resistance in prediabetic individuals62.
This cluster mainly comprises proteins involved in lipid metabolism, BCAA catabolism, and oxidative stress responses, reflecting significant metabolic alterations in prediabetes that impact insulin sensitivity and overall metabolic homeostasis.
The Green Cluster (on the left side) prominently features proteins crucial for protein synthesis. Among these are 60S ribosomal protein L21 (RL21, RPL21 gene), 60S ribosomal protein L24 (RL24, RPL24 gene), 60S ribosomal protein L27 (RL27, RPL27 gene), 60S ribosomal protein L37a (RL37A, RPL37A gene), and 40S ribosomal protein S26 (RS26, RPS26 gene). Additionally, nucleophosmin (NPM, NPM1 gene) participates in various cellular processes such as ribosome biogenesis, cell proliferation, and regulation of tumor suppressors, while Translation Machinery Associated 7 homolog (TMA7) is involved in translation regulation, and ATP-dependent RNA helicase DDX19A (DD19A, DDX19A gene) plays a role in mRNA export from the nucleus. Other notable proteins within this cluster include Signal Recognition Particle 14 kDa protein (SRP14), which participates in targeting secretory proteins to the rough endoplasmic reticulum membrane, as the ER is one of the key cellular structures that determines the maintenance of normal properties and functions of synthesized proteins.
The reduced expression of these proteins in muscle samples from prediabetic patients participating in our study is consistent with findings that indicate decreased protein synthesis. Adaptation to chronic insulin resistance may involve a reduction in ribosomal protein expression, leading to disruptions in muscle function52.
Furthermore, Ubiquitin Carboxyl-Terminal Hydrolase Isozyme L1 (UCHL1) is involved in processing ubiquitin precursors and ubiquitinated proteins, contributing to protein processing and axonal integrity63.
Lastly, Myosin-Binding Protein H (MYBPH) exhibits a strong affinity for myosin and regulates sarcomere assembly and stability. It interacts with thick myofilaments in the A-band of the sarcomere, playing a critical role in muscle contractile function and structure maintenance64.
This cluster primarily includes proteins essential for protein synthesis, cellular processes, and muscle function, highlighting significant adaptations in muscle metabolism and structure in prediabetic conditions.
On the right side we can distinguish the Lime Green Cluster with downregulated Myosin-11 (MYH11) and Myosin light polypeptide 6 (MYL6), both crucial in muscle contraction regulation. It also includes Tubulin beta-6 chain (TBB6, TUBB6 gene), a major constituent of microtubules. The modulation of a β-tubulin isotype is involved in muscle differentiation and regeneration65. Additionally, Decorin (PGS2, DCN gene) may influence the rate of fibril formation. Actin, aortic smooth muscle (ACTA, ACTA2 gene), in its intermediate form, contributes to various types of cell motility66. Cysteine-rich protein 1 (CRIP1) appears to be involved in zinc absorption and may function as an intracellular zinc transport protein. Dermatopontin (DERM, DPT gene) mediates adhesion via cell surface integrin binding and enhances TGFB1 activity. Mimecan (MIME, OGN gene) is a significant constituent of the skeletal muscle secretome, expressed differentially throughout muscle development67. Protein phosphatase 1 regulatory subunit 7 (PP1R7, PPP1R7 gene) acts as a regulatory subunit of protein phosphatase 1. Prolargin (PRELP) potentially anchors basement membranes to underlying connective tissue. Transgelin (TAGL, TAGLN gene) is an actin-binding cytoskeletal protein vital for maintaining muscle structure. Proteomic profiling of chronically low-frequency stimulated fast muscle has identified transgelin, along with cofilin-2, an endothelial marker and actin-binding protein, as novel biomarkers for evaluating muscle transformation30. Additionally, according to several studies, transgelin has been linked to increased cell motility and migration. Aldeiri et al. proposed that transgelin-expressing myofibroblasts play a primary role in mediating TGFβ signaling during ventricular septum (VBW) morphogenesis68.
This cluster consists of proteins involved in muscle differentiation, regeneration, and motility, reflecting significant disruptions in muscle structure and function in prediabetic conditions.
Prediabetes versus normoglycemic state—machine learning approach
When comparing network diagrams derived from the PD versus NG analysis and grouped using the REHA machine learning approach, alongside statistical methods tailored for proteomic data, several common themes and distinct differences emerge. This highlights the unique contributions of both machine learning and traditional statistical methods in comprehending complex biological datasets.
The network analysis contrasting prediabetic (PD) and normoglycemic (NG) states grouped based on the REHA machine learning approach reveals three main/distinct clusters of proteins reflecting the complexity of metabolic processes (see Fig. 7).
The Red Cluster includes proteins such as Adenylate kinase 2, mitochondrial (KAD2, AK2 gene), crucial for cellular energy homeostasis, ATP synthase subunit d, mitochondrial (ATP5H, ATP5PD gene), which plays a role in skeletal muscle endocrine signaling and is associated with insulin resistance69, and Mitochondrial coenzyme A transporter SLC25A42 (S2542, SLC25A42 gene) mediating the transport of coenzyme A (CoA) in mitochondria in exchange for intramitochondrial (deoxy)adenine nucleotides and adenosine 3',5'-diphosphate. Furthermore, it incorporates previously described Short-chain specific acyl-CoA dehydrogenase, mitochondrial (ACADS), which regulates lipid metabolism and participates in L-isoleucine metabolism51, Branched-chain-amino-acid aminotransferase (BCAT2), vital for branched-chain amino acid catabolism, are present70.
Overall, it highlights proteins involved in mitochondrial functions and energy metabolism, potentially influencing cellular responses to prediabetic conditions.
The Green Cluster encompasses a diverse range of proteins with significant implications for various cellular processes and metabolic pathways. Among them is Dynamin-1-like protein (DNM1L, DNM1L gene), crucial for mitochondrial and peroxisomal division, and Methylcrotonoyl-CoA carboxylase beta chain (MCCB, MCCC2 gene), which is essential for leucine catabolism30, linked with obesity and insulin resistance72. Other proteins within this cluster include: Pyruvate dehydrogenase protein X component, mitochondrial (ODPX, PDHX gene), which plays a crucial role in maintaining the functionality of the pyruvate dehydrogenase complex, a key enzyme in cellular energy metabolism and Epoxide hydrolase 1 (HYEP, EPHX1 gene) involved in the metabolism of endogenous lipids such as epoxide-containing fatty acids. Additionally, it comprises also detected before using statistical approach Mitogen-activated protein kinase 1 (MK01, MAPK1 gene), a serine/threonine kinase pivotal in the MAP kinase signaling pathway21. Overall, this cluster is intricately connected to cellular metabolism, signaling, and metabolic stress responses.
Lastly, the Blue Cluster contains 26S proteasome regulatory subunit 6B (PRS6B, PSMC4 gene), crucial for the ubiquitin–proteasome system, which is vital for protein homeostasis. Dysregulation of this system is linked to various diseases, notably diabetes73. Other important proteins present within this cluster include Eukaryotic translation initiation factor 4B (IF4B, EIF4B gene), essential for mRNA translation initiation, and 60S ribosomal protein L30 (RL30, RPL30 gene), involved in protein synthesis. Among others, Glycerol-3-phosphate dehydrogenase 1 (GPDA, GPD1 gene) connects carbohydrate and lipid metabolism, contributing electrons to the mitochondrial electron transport chain74, Bifunctional glutamate/proline–tRNA ligase (SYEP, EPRS1 gene) is a multifunctional protein involved in the aminoacylation of tRNA molecules within the aminoacyl-tRNA synthetase multienzyme complex. Furthermore, Programmed cell death protein 5 (PDCD5) may play a role in apoptosis. The proteins in this cluster appear to be engaged in crucial cellular processes such as maintaining cellular structure, regulating signaling pathways, and facilitating protein synthesis.
Both methods underscore the importance of proteins involved in mitochondrial function and energy metabolism, a crucial aspect given the metabolic disruptions associated with prediabetes.
Despite these commonalities, the machine learning-based method tends to reveal more nuanced insights into cellular mechanisms, perhaps due to its ability to parse complex patterns in high-dimensional data. For example, it delineates proteins essential for cellular protein homeostasis like 26S proteasome regulatory subunit 6B (PRS6B, PSMC4 gene). Proteasomal dysfunction can exacerbate metabolic disorders in type 2 diabetes by creating an insulin-resistant signature in skeletal muscle tissue75. Studies in insulin-resistant models, have shown that enhanced proteasome activity can be a result of diminished signaling through IRS-1 and Akt, a phenomenon observed in insulin-resistant human muscle76,77. Additionally, abnormal ubiquitination in peripheral tissues like skeletal muscle can impair insulin-stimulated glucose metabolism, mitochondrial function, and muscle mass78. The machine learning approach highlights additional crucial proteins related to cellular structure and signaling, such as Eukaryotic translation initiation factor 4B (IF4B, EIF4B gene), which is essential for mRNA translation initiation. This indicates a possibly deeper exploration of the proteomic landscape compared to the traditional statistical approach used in statistical approach.
On the other hand, the pipeline tailored for proteomic data, while perhaps less detailed in distinguishing subtle biological themes, offers robust validation of key proteins identified by machine learning, thereby reinforcing the relevance of these proteins in prediabetes. It highlights a diverse array of proteins implicated in critical cellular functions as well as broader categories, spanning from structural integrity to signaling regulation, and stress and immune response mechanisms. These proteins collectively reflect the complex adaptive changes occurring in prediabetic conditions, suggesting a more generalized view in mechanisms of IR and T2D development.
Material and methods
Patients and groups
The subjects included in this experiment were selected from participants recruited as part of the “Bialystok Exercise Study in Diabetes”, conducted by the Department of Endocrinology, Diabetology and Internal Medicine and Clinical Research Centre of the Medical University of Bialystok, as described earlier79,80. The study cohort involved physically inactive men with various degrees of dysglycemia, living in the city of Bialystok, Poland. All participants were engaged in a 3-month exercise program that included supervised training sessions at a local fitness center, as described earlier79,80. The investigation followed the ethical principles outlined in the 1964 Declaration of Helsinki, along with its subsequent amendments and was approved by the Ethics Committee of the Medical University of Bialystok (No. R-I-002/469/2014). Each participant signed informed consent before sample collection.
The study cohort consisted of 32 men aged 35–65 years, with a Body Mass Index (BMI) of 25–35 kg/m2 and leading a sedentary lifestyle (assessed using the Polish version of International Physical Activity Questionnaire—Long Form (IPAQ-LF)81 with the ability to undertake physical activity.
The participants were divided according to American Diabetes Association and Polish Diabetes Association dysglycemia recommendations into three groups:
-
NORMOGLYCEMIA (NG group; n = 13)—subjects with normal fasting glucose and normal glucose tolerance, defined as fasting plasma glucose (FPG) < 100 mg/dL and 2-h glucose during oral glucose tolerance test (2 h-OGTT-GLU) < 140 mg/dL.
-
PREDIABETES (PD group; n = 11)—subjects with impaired fasting glucose and impaired glucose tolerance, defined as FPG 100–125 mg/dL and 2 h-OGTT-GLU 140–199 mg/dL.
-
T2D (n = 8)—subjects with type 2 diabetes, diagnosed within last 3–5 years, treated with metformin only as an anti-diabetic drug.
For this study, we included only the post-exercise biopsies from all individuals. Analysis was conducted on 2 independent biopsies from each participant (total of 64 samples). Individuals within these groups were matched based on their age for, BMI, the number of training sessions completed during the intervention, and changes in daily kcal consumption (diet by confirming non-significant differences according to t-tests). For the group with type 2 diabetes, an additional inclusion criterion was Glycated hemoglobin (Hb1Ac) < 6.5% with only metformin approved in pharmacotherapy.
Exclusion criteria included smoking, drug or alcohol addiction, declared physical activity (highly active lifestyle), chronic diseases (with the exception of: hypertension, obesity with BMI ≤ 35 kg/m2, type 2 diabetes), pharmacotherapy (with the exception of the use of metformin by type 2 diabetics and angiotensin-converting-enzyme inhibitors due to hypertension), and abnormalities and medical contraindications to participate in planned exercise sessions found in the physical examination, including chronic neurovascular complications of diabetes. Additionally, there was no difference in the proportion of patients receiving angiotensin-converting-enzyme inhibitors as a treatment for hypertension between the groups.
The individuals of the Bialystok Exercise Study in Diabetes underwent clinical assessment, including an oral glucose tolerance test (OGTT), skeletal muscle and adipose tissue biopsy, and maximal cardiopulmonary exercise test (CPET), which were conducted both before and after the intervention. The tissue samples were then processed according to suitable protocols for proteomic assays as detailed below and in Appendix A.
Sample preparation, LC/MS/MS analysis and data treatment
The preparation of protein samples involved a sodium deoxycholate (SDC) assisted method with phase transfer extraction and lipid depletion, following the approach described by Leon et al.82 (see Fig. 8). About 25–30 mg of muscle tissue was pulverized in liquid nitrogen (LN2), suspended in a lysis buffer (1:10 w/v; containing 50 mM ammonium bicarbonate (ABC), 5% sodium deoxycholate (SDC), and 5 mM Tris(2-carboxyethyl)phosphine hydrochloride (TCEP)), sonicated, and then denatured by heating for 30 min at 60 °C. Reduced cysteine residues were alkylated with iodoacetamide (IAA) (1 M in ABC buffer, 15 mM final concentration) for 30 min at room temperature in the dark. After centrifugation, protein concentration was measured and normalized. Samples containing 20 µg of already reduced and alkylated protein mixture are tenfold diluted over initial conditions with 50 mM ABC buffer to 0.5% final concentration of SDC, then the proteases Trypsin/Lys-C mix (from 0.4 µg/µL stock) are added at 1:25 enzyme/protein ratio (1:50 individual enzyme/protein ratio). The digestion was performed in ThermoBlock at 37 °C for 16 h with shaking (600 rpm). Enzymes activity was inhibited by acidification with 10% trifluoroacetic acid (TFA) in LC/MS water up to a 0.5% concentration. SDC and lipids were removed by extraction, with 100 μL (1:1) of ethyl acetate (EA), repeated three times. Furthermore, 10 injection equivalents of iRT peptides were added to each sample. Injection-ready samples were stored at − 80 °C for further analysis.
Liquid-chromatography was performed on the Thermo Scientific 3500RSLC nanoLC system configured for trap-elute. Protein assignment was carried out through MS/MS analysis using a Q-Exactive MS instrument. Data-dependent (DDA) and independent analyses (DIA) were conducted. The staggered windows approach was applied for DIA method, with ion-chromatogram library constructed with gas-phase fractionation (GPF) approach.
Ion-chromatogram libraries from GPF runs were generated using Spectronaut version 15.4.210913.50606 (Rubin). The raw files were searched with the built-in Pulsar engine against the Homo sapiens Uniprot reference proteome FASTA file (UP000005640_9606, one sequence per protein) using Pulsar standard settings with a few modifications. Detailed description of proteomic analysis pipeline is presented in Appendix A.
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE83 partner repository with the dataset identifier PXD051677.
Functional enrichment analysis using string software
The network diagrams for Fig. 7 and additional figures in Appendix H, as well as the Functional Enrichment Analysis (supplementary materials), were generated using String software. The halo color is based on the occurrence of the protein in the submitted set in analyzing decision trees generated by the REHA algorithm or P-value (-Log10) in case of statistical analysis. We submitted data to normal gene set analysis using gene names with occurrence or P-value (-Log10) and following parameters:
-
FDR stringency: medium (5%)
-
Initial sort order: enrichment score
-
Organism: homo sapiens
-
Network type: full STRING network
-
Meaning of network edges: confidence active interaction sources: Text mining, Experiments, Databases, Co‑expression, Neighborhood, Gene Fusion, Co‑occurrence
-
Minimum required interaction score: medium confidence (0.500)
-
Max number of interactors to show: 1 shell:—none/query proteins only -; 2shell:—none
-
Network display options: hide disconnected nodes in the network,
-
Clustering Options: k-means
-
Number of clusters: 3, 6 or 6
Machine learning methods
For machine learning methods, protein identification and quantitation using DIA approach was performed in the same manner as for proteomic-tailored analysis. As a input, we used individual protein expression values from each sample (corresponding to mean combined peptide signal), calculated after peptide identification, protein folding and FDR correction (1% FDR for both precursors and peptides at the experiment level and 5% for proteins at the run-wise level), without applying cut-offs for experiment-level differential expression protein Log2 fold change and -Log10 p-value. Seven classification models were used. The relative evolutionary hierarchical analysis (REHA)10 stands out as a progressive algorithm that merges different variants of Relative eXpression Analysis (RXA)8 systems and reshapes the way inter-feature relations are understood, with a specific focus on gene expression data. By introducing hierarchical gene clusters; subsets of gene families associated with related proteins; REHA enhances data classification and attribute selection in the genomic field.
At its core, REHA integrates the Top-Scoring Pair (TSP)8 methodology from RXA with a decision tree's multi-test approach, enriched by several advancements9. It embeds a predetermined discriminative power of genes, employs a nuanced two-level mutation within genetic operators, and employs a multi-objective fitness function that evaluates classification accuracy, cluster consistency, and protein rankings.
Adhering to a typical evolutionary algorithm structure15, REHA utilizes an unstructured population and generational selection, with superior genes being propagated through a linear ranking selection and an elitist approach13. The algorithm's knowledge is encoded in a hierarchical tree form, where internal nodes link to gene clusters and branches marks the diverging paths based on gene co-expression and epistatic interactions, leading to leaves denoted by class labels. The tree's splits, each defined as a gene cluster, identify a primary gene pair and several surrogate pairs that underpin the primary pair's division, ranked by how closely they resemble the primary pair's sub-group divisions. This idea is similar to the multi-test concept11. The data split within REHA follows a majority voting system where all gene pairs have equal influence, and surrogate pairs can potentially override the primary pair’s decision when there’s an equal split. These clusters incorporate a gene’s discrimination rank, which is influenced by tools like Relief-F and impacts the algorithm’s operation across various stages.
REHA's fitness function is designed to favor models that are accurate, with substantial gene clusters that have closely resembling gene pairs and maintain a low complexity in terms of the number of attributes in the cluster. This addresses the unique challenge in gene expression data of balancing the risk of underfitting with simple models and overfitting with complex ones. The evolutionary process of REHA includes selection, crossover, and mutation to develop decision trees, which are then pruned to achieve simplicity without sacrificing accuracy. The algorithm has been preliminarily validated on cancer-related gene expression datasets, where it outperformed existing RXA solutions. Its interpretability and direct applicability highlight REHA’s potential for revealing intricate genomic patterns and suggests its versatility for different omics data types.
In the course of our experiments, we primarily utilized the REHA algorithm; however, for comparative purposes, we also employed a range of other popular state-of-the-art solutions. This diverse selection included both black-box and white-box models, thereby covering a spectrum of machine learning approaches that range from highly interpretable to those that are more complex and less transparent in their decision-making processes.
The IntelliOmics platform84, was used to prepare and transform datasets used in the performed experiments. Next, the algorithms (except the REHA) were optimized and tested using WEKA software85. Each algorithm has its own set of specific parameters that can be tuned to improve the performance of the model. An automatic search for the best combination of parameter values was used by iterating over a range of possible values and testing each combination against a performance metric (such as accuracy or AUC) to see which produces the best results. The setup and fine-tuning of the parameters were carried out on a subset of the training dataset and performed using AutoWeka86. Here are short descriptions of each algorithm:
-
J48 Decision Tree: The J48 decision tree is an implementation of the C4.5 algorithm87 in Java. It is known for its interpretability, as it creates a tree that can be visualized and understood, where each node represents a feature in the dataset, and each branch represents a decision rule. This ultimately leads to a decision output at the tree's leaves.
-
CART Decision Tree: CART algorithm88 is similar to J48 but typically uses a binary tree structure. It is used for both classification and regression tasks and offers high interpretability. The model is considered a white-box because the path from the root to any leaf can be easily translated into an if–then rule.
-
JRip Rule Learner: JRip is an implementation of the RIPPER algorithm89, a rule-based learning algorithm that is used for classification. It is considered a white-box model because the rules generated by JRip can be easily inspected and understood by humans.
-
Naive Bayes: NB classifiers90 are a family of simple “probabilistic classifiers” based on applying Bayes' theorem with strong (naive) independence assumptions between the features. It is sometimes considered a white box model because the probability model and the effect of each feature on the final decision are straightforward to understand.
-
Random Forest: RF91 is an ensemble learning method that constructs a set of decision trees at training time and outputs the mode of the classes for classification or mean prediction for regression of the individual trees. While individual trees are interpretable, the ensemble as a whole is often treated as a black-box due to the complexity of aggregating multiple decision paths.
-
Support Vector Machine: SVM92 is an algorithm for solving the quadratic programming problem that arises during the training of Support Vector Machines. SVMs are typically considered black-box models, especially with non-linear kernels, as the decision function and support vectors do not provide a clear rule-based decision path.
Since the datasets were not pre-divided into training and testing parts, a typical tenfold Cross-Validation was performed, which is a standard technique used for validation. The process of classification was carried out without performing any feature selection beforehand, meaning that all available features or variables in the dataset were used in the model. Presented results show an average score of 50 runs due to the existence of nondeterministic algorithm—REHA. Along with the confusion matrix, the area under the curve (AUC) and ROC curve was generated for each solution.
Data availability
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE 47 partner repository with the dataset identifier PXD051677. Data generated or analyzed during this study are included in the article and its supplementary Information files.
Abbreviations
- ABC:
-
Ammonium bicarbonate
- ACC:
-
Accuracy
- AUC:
-
Area under the ROC curve
- BMI:
-
Body mass index
- CART:
-
Classification and regression tree
- CPET:
-
Maximal cardiopulmonary exercise test
- DDA:
-
Data-dependent analysis
- DIA:
-
Data-independent analysis
- F1:
-
F1-score
- FDR:
-
False discovery rate
- LC/MS/MS:
-
Liquid chromatography with tandem mass spectrometry
- NB:
-
Naive Bayes
- NG:
-
Normoglycemic
- OGTT:
-
Oral glucose tolerance test
- PD:
-
Prediabetes
- REHA:
-
Relative evolutionary hierarchical analysis
- RF:
-
Random forest
- ROC:
-
Receiver operating characteristic
- RXA:
-
Relative eXpression Analysis
- SDC:
-
Sodium deoxycholate
- SVM:
-
Support vector machine
- T2D:
-
Type 2 diabetes mellitus
- WF1:
-
Weighted F1-score
References
Chen, Z. Z. & Gerszten, R. E. Metabolomics and proteomics in type 2 diabetes. Circ. Res. 126, 1613–1627 (2020).
Ahola-Olli, A. V. et al. Circulating metabolites and the risk of type 2 diabetes: A prospective study of 11,896 young adults from four finnish cohorts. Diabetologia 62, 2298–2309 (2019).
Gan, W. Z., Ramachandran, V., Lim, C. S. Y. & Koh, R. Y. Omics-based biomarkers in the diagnosis of diabetes. J. Basic Clin. Physiol. Pharmacol. https://doi.org/10.1515/jbcpp-2019-0120 (2020).
Freitas, P. A. C., Ehlert, L. R. & Camargo, J. L. Glycated albumin: A potential biomarker in diabetes. Arch. Endocrinol. Metab. 61, 296–304 (2017).
Gao, H. et al. UCHL1 regulates oxidative activity in skeletal muscle. PLoS One 15, e0241716 (2020).
Bengal, E., Aviram, S. & Hayek, T. p38 MAPK in glucose metabolism of skeletal muscle: Beneficial or harmful?. Int. J. Mol. Sci. 21, 6480 (2020).
Nakamura, T. et al. Double-stranded RNA-dependent protein kinase links pathogen sensing with stress and metabolic homeostasis. Cell 140, 338–348 (2010).
Eddy, J. A., Sung, J., Geman, D. & Price, N. D. Relative expression analysis for molecular cancer diagnosis and prognosis. Technol. Cancer Res. Treat. 9, 149–159 (2010).
Czajkowski, M. Relative relations in biomedical data classification. Encycl. Data Sci. Mach. Learn. https://doi.org/10.4018/978-1-7998-9220-5.CH161 (2023).
Czajkowski, M. & Kretowski, M. Relative evolutionary hierarchical analysis for gene expression data classification. GECCO 2019: Proc. 2019 Genet. Evol. Comput. Conf. https://doi.org/10.1145/3321707.3321862 (2019).
Czajkowski, M. & Kretowski, M. Decision tree underfitting in mining of gene expression data. An evolutionary multi-test tree approach. Expert Syst. Appl. 137, 392–404 (2019).
Bruderer, R. et al. Optimization of experimental parameters in data-independent mass spectrometry significantly increases depth and reproducibility of results. Mol. Cell. Proteom. 16, 2296–2309 (2017).
Kretowski, M. Evolutionary Decision Trees in Large-Scale Data Mining (Springer Publishing Company, 2019).
Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006).
Freitas, A. A. A review of evolutionary algorithms for data mining. Soft Comput. Knowl. Discov. Data Min. https://doi.org/10.1007/978-0-387-69935-6_4/COVER (2008).
Diamanti, K. et al. Organ-specific metabolic pathways distinguish prediabetes, type 2 diabetes, and normal tissues. Cell Rep. Med. 3, 100763 (2022).
Szklarczyk, D. et al. The STRING database in 2023: Protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 51, D638–D646 (2023).
Oh, Y. S. et al. Exercise type and muscle fiber specific induction of caveolin-1 expression for insulin sensitivity of skeletal muscle. Exp. Mol. Med. 39, 395–401 (2007).
Haddad, D., Al Madhoun, A., Nizam, R. & Al-Mulla, F. Role of caveolin-1 in diabetes and its complications. Oxid. Med. Cell. Longev. 2020, 9761539 (2020).
Bastiani, M. et al. MURC/cavin-4 and cavin family members form tissue-specific caveolar complexes. J. Cell Biol. 185, 1259–1273 (2009).
Arkun, Y. Dynamic modeling and analysis of the cross-talk between insulin/AKT and MAPK/ERK signaling pathways. PLoS One 11, e0149684 (2016).
Ozaki, K.-I. et al. Targeting the ERK signaling pathway as a potential treatment for insulin resistance and type 2 diabetes. Am. J. Physiol. Endocrinol. Metab. 310, E643–E651 (2016).
Mor, A., Aizman, E., George, J. & Kloog, Y. Ras inhibition induces insulin sensitivity and glucose uptake. PLoS One 6, e21712 (2011).
Lakshmanan, A. P., Samuel, S. M., Triggle, C., Tuana, B. S. & Ding, H. A role for sarcolemmal membrane-associated protein (SLMAP) in mediating autophagy in endothelial cells through an AMPK-dependent mechanism. FASEB J. 30, 948–1 (2016).
Gonzalez, L. L., Garrie, K. & Turner, M. D. Role of S100 proteins in health and disease. Biochim. Biophys. acta. Mol. cell Res. 1867, 118677 (2020).
Anguita-Ruiz, A. et al. The protein S100A4 as a novel marker of insulin resistance in prepubertal and pubertal children with obesity. Metabolism 105, 154187 (2020).
Taxerås, S. D. et al. Differential association between S100A4 levels and insulin resistance in prepubertal children and adult subjects with clinically severe obesity. Obes. Sci. Pract. 6, 99–106 (2020).
Liu, Y. et al. Role of moesin in the effect of glucagon-like peptide-1 on advanced glycation end products-induced endothelial barrier dysfunction. Cell. Signal. 90, 110193 (2022).
Jelinic, M. et al. Annexin-A1 deficiency exacerbates pathological remodelling of the mesenteric vasculature in insulin-resistant, but not insulin-deficient, mice. Br. J. Pharmacol. 177, 1677–1691 (2020).
Donoghue, P. et al. Proteomic profiling of chronic low-frequency stimulated fast muscle. Proteomics 7, 3417–3430 (2007).
Quinlan, K. G. R. et al. Alpha-actinin-3 deficiency results in reduced glycogen phosphorylase activity and altered calcium handling in skeletal muscle. Hum. Mol. Genet. 19, 1335–1346 (2010).
Houweling, P. J. et al. Exploring the relationship between α-actinin-3 deficiency and obesity in mice and humans. Int. J. Obes. (Lond) 41, 1154–1157 (2017).
Riedl, I., Osler, M. E., Benziane, B., Chibalin, A. V. & Zierath, J. R. Association of the ACTN3 R577X polymorphism with glucose tolerance and gene expression of sarcomeric proteins in human skeletal muscle. Physiol. Rep. 3, e12314 (2015).
Chereau, D. et al. Leiomodin is an actin filament nucleator in muscle cells. Science 320, 239–243 (2008).
Ly, T. et al. The N-terminal tropomyosin- and actin-binding sites are important for leiomodin 2’s function. Mol. Biol. Cell 27, 2565–2575 (2016).
Tolkatchev, D. et al. Leiomodin creates a leaky cap at the pointed end of actin-thin filaments. PLoS Biol. 18, e3000848 (2020).
Pillon, N. J. et al. Transcriptomic profiling of skeletal muscle adaptations to exercise and inactivity. Nat. Commun. 11, 470 (2020).
Chagula, D. B., Rechciński, T., Rudnicka, K. & Chmiela, M. Ankyrins in human health and disease: An update of recent experimental findings. Arch. Med. Sci. 16, 715–726 (2020).
Lorenzo, D. N. et al. Ankyrin-B metabolic syndrome combines age-dependent adiposity with pancreatic β cell insufficiency. J. Clin. Invest. 125, 3087–3102 (2015).
Lorenzo, D. N. & Bennett, V. Cell-autonomous adiposity through increased cell surface GLUT4 due to ankyrin-B deficiency. Proc. Natl. Acad. Sci. U. S. A. 114, 12743–12748 (2017).
Zilaee, M. & Shirali, S. Heat shock proteins and diabetes. Can. J. Diabet. 40, 594–602 (2016).
Shim, K., Begum, R., Yang, C. & Wang, H. Complement activation in obesity, insulin resistance, and type 2 diabetes mellitus. World J. Diabet.es 11, 1–12 (2020).
McMillan, D. E. Elevation of complement components in diabetes mellitus. Diabet. Metab. 6, 265–270 (1980).
Field, M. L., Khan, O., Abbaraju, J. & Clark, J. F. Functional compartmentation of glycogen phosphorylase with creatine kinase and Ca2+ ATPase in skeletal muscle. J. Theor. Biol. 238, 257–268 (2006).
Bekkelund, S. I. Leisure physical exercise and creatine kinase. activity The Tromsø study. Scand. J. Med. Sci. Sports 30, 2437–2444 (2020).
Bekkelund, S. I. Creatine kinase is associated with glycated haemoglobin in a nondiabetic population. The Tromsø study. PLoS One 18, e0281239 (2023).
Mo, L. et al. An analysis of the role of HnRNP C dysregulation in cancers. Biomark. Res. 10, 19 (2022).
Zhao, M. et al. Loss of hnRNP A1 in murine skeletal muscle exacerbates high-fat diet-induced onset of insulin resistance and hepatic steatosis. J. Mol. Cell Biol. 12, 277–290 (2020).
Will, C. L. & Lührmann, R. Spliceosome structure and function. Cold Spring Harb. Perspect. Biol. 3, a003707 (2011).
Manoharan, R., Seong, H.-A. & Ha, H. Dual roles of serine-threonine kinase receptor-associated protein (STRAP) in redox-sensitive signaling pathways related to cancer development. Oxid. Med. Cell. Longev. 2018, 5241524 (2018).
White, P. J. et al. Insulin action, type 2 diabetes, and branched-chain amino acids: A two-way street. Mol. Metab. 52, 101261 (2021).
Liu, Z.-J. & Zhu, C.-F. Causal relationship between insulin resistance and sarcopenia. Diabetol. Metab. Syndr. 15, 46 (2023).
Yoon, M.-S. The emerging role of branched-chain amino acids in insulin resistance and metabolism. Nutrients 8, 405 (2016).
Czajkowska, A. et al. Altered metabolome of amino acids species: A source of signature early biomarkers of T2DM BT. In Biomarkers in Diabetes (eds Patel, V. B. & Preedy, V. R.) 83–125 (Springer International Publishing, 2023). https://doi.org/10.1007/978-3-031-08014-2_5.
She, P. et al. Disruption of BCATm in mice leads to increased energy expenditure associated with the activation of a futile protein turnover cycle. Cell Metab. 6, 181–194 (2007).
Hall, K. T. et al. Catechol-O-methyltransferase association with hemoglobin A1c. Metabolism. 65, 961–967 (2016).
Posa, D. K. & Baba, S. P. Intracellular pH regulation of skeletal muscle in the milieu of insulin signaling. Nutrients 12, 2910 (2020).
Baba, S. P. et al. Role of aldose reductase in the metabolism and detoxification of carnosine-acrolein conjugates. J. Biol. Chem. 288, 28163–28179 (2013).
Cripps, M. J. et al. Carnosine scavenging of glucolipotoxic free radicals enhances insulin secretion and glucose uptake. Sci. Rep. 7, 13313 (2017).
Mannion, A. F., Jakeman, P. M., Dunnett, M., Harris, R. C. & Willan, P. L. Carnosine and anserine concentrations in the quadriceps femoris muscle of healthy humans. Eur. J. Appl. Physiol. Occup. Physiol. 64, 47–50 (1992).
Gualano, B. et al. Reduced muscle carnosine content in type 2, but not in type 1 diabetic patients. Amino Acids 43, 21–24 (2012).
Chung, S. S. et al. Glutathione peroxidase 3 mediates the antioxidant effect of peroxisome proliferator-activated receptor gamma in human skeletal muscle cells. Mol. Cell. Biol. 29, 20–30 (2009).
Bishop, P., Rocca, D. & Henley, J. M. Ubiquitin C-terminal hydrolase L1 (UCH-L1): Structure, distribution and roles in brain function and dysfunction. Biochem. J. 473, 2453–2462 (2016).
Conti, A. et al. Increased expression of myosin binding protein H in the skeletal muscle of amyotrophic lateral sclerosis patients. Biochim. Biophys. Acta 1842, 99–106 (2014).
Randazzo, D. et al. Persistent upregulation of the β-tubulin tubb6, linked to muscle regeneration, is a source of microtubule disorganization in dystrophic muscle. Hum. Mol. Genet. 28, 1117–1135 (2019).
Fang, H. et al. Correlation between single nucleotide polymorphisms of the ACTA2 gene and coronary artery stenosis in patients with type 2 diabetes mellitus. Exp. Ther. Med. 7, 970–976 (2014).
Freire, P. P. et al. Osteoglycin inhibition by microRNA miR-155 impairs myogenesis. PLoS One 12, e0188464 (2017).
Aldeiri, B. et al. Transgelin-expressing myofibroblasts orchestrate ventral midline closure through TGFβ signalling. Development 144, 3336–3348 (2017).
Formentini, L. et al. Mitochondrial H(+)-ATP synthase in human skeletal muscle: Contribution to dyslipidaemia and insulin resistance. Diabetologia 60, 2052–2065 (2017).
Jiang, P. et al. Transcriptomic analysis of short/branched-chain acyl-coenzyme a dehydrogenase knocked out bMECs revealed its regulatory effect on lipid metabolism. Front. Vet. Sci. 8, 744287 (2021).
Hu, J. J. et al. Discovery, structure, and function of filamentous 3-methylcrotonyl-CoA carboxylase. Structure 31, 100-110.e4 (2023).
Meierhofer, D., Halbach, M., Şen, N. E., Gispert, S. & Auburger, G. Ataxin-2 (Atxn2)-knock-out mice show branched chain amino acids and fatty acids pathway alterations. Mol. Cell. Proteom. 15, 1728–1739 (2016).
Kanayama, H. O. et al. Demonstration that a human 26S proteolytic complex consists of a proteasome and multiple associated protein components and hydrolyzes ATP and ubiquitin-ligated proteins by closely linked mechanisms. Eur. J. Biochem. 206, 567–578 (1992).
Harding, J. W. J., Pyeritz, E. A., Copeland, E. S. & White, H. B. 3rd. Role of glycerol 3-phosphate dehydrogenase in glyceride metabolism. Effect of diet on enzyme activities in chicken liver. Biochem. J. 146, 223–229 (1975).
Al-Khalili, L. et al. Proteasome inhibition in skeletal muscle cells unmasks metabolic derangements in type 2 diabetes. Am. J. Physiol. Cell Physiol. 307, C774–C787 (2014).
Hwang, H. et al. Proteomics analysis of human skeletal muscle reveals novel abnormalities in obesity and type 2 diabetes. Diabetes 59, 33–42 (2010).
Wang, X., Hu, Z., Hu, J., Du, J. & Mitch, W. E. Insulin resistance accelerates muscle protein degradation: Activation of the ubiquitin-proteasome pathway by defects in muscle cell signaling. Endocrinology 147, 4160–4168 (2006).
Stocks, B. & Zierath, J. R. Post-translational modifications: The signals at the intersection of exercise, glucose uptake, and insulin sensitivity. Endocr. Rev. 43, 654 (2022).
Szczerbinski, L. et al. Metabolomic profile of skeletal muscle and its change under a mixed-mode exercise intervention in progressively dysglycemic subjects. Front. Endocrinol. (Lausanne) 12, 778442 (2021).
Szczerbinski, L. et al. The response of mitochondrial respiration and quantity in skeletal muscle and adipose tissue to exercise in humans with prediabetes. Cells 10, 3013 (2021).
Craig, C. L. et al. International physical activity questionnaire: 12-country reliability and validity. Med. Sci. Sports Exerc. 35, 1381–1395 (2003).
León, I. R., Schwämmle, V., Jensen, O. N. & Sprenger, R. R. Quantitative assessment of in-solution digestion efficiency identifies optimal protocols for unbiased protein analysis. Mol. Cell. Proteom. 12, 2992–3005 (2013).
Perez-Riverol, Y. et al. The PRIDE database resources in 2022: A hub for mass spectrometry-based proteomics evidences. Nucleic Acids Res. 50, D543–D552 (2022).
Reska, D. et al. Integration of solutions and services for multi-omics data analysis towards personalized medicine. Biocybern. Biomed. Eng. 41, 1646–1663 (2021).
Frank, E. et al. Weka-A machine learning workbench for data mining. Data Min. Knowl. Discov. Handb. https://doi.org/10.1007/978-0-387-09823-4_66 (2009).
Thornton, C., Hutter, F., Hoos, H. H. & Leyton-Brown, K. Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms. in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data, Part F128815, pp. 847–855 (2013).
Salzberg, S. L. C4.5: Programs for Machine Learning by J. Ross Quinlan. Morgan Kaufmann Publishers, Inc., 1993. Mach. Learn. 16, 235–240 (1994).
Breiman, L., Friedman, J. H., Olshen, R. A. & Stone, C. J. Classification and regression trees. Classif. Regres. Trees https://doi.org/10.1201/9781315139470/CLASSIFICATION-REGRESSION-TREES-LEO-BREIMAN (2017).
Cohen, W. W. Fast Effective Rule Induction. in (eds. Prieditis, A. & Russell, S. B. T.-M. L. P. 1995) 115–123 (Morgan Kaufmann, 1995). https://doi.org/10.1016/B978-1-55860-377-6.50023-2.
Watson, T. J. An empirical study of the naive Bayes classifier. in (2001).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Platt, J. Sequential minimal optimization : A fast algorithm for training support vector machines. Microsoft Res. Tech. Rep. (1998).
Acknowledgements
This study was supported by funds from the Ministry of Education and Science of Poland, within the project “Excellence Initiative—Research University”, the Ministry of Health of Poland, within the project “Center for Artificial Medicine at the Medical University of Bialystok”, and the Polish National Science Centre allocated on the basis of decision 2019/33/B/ST6/02386.
Author information
Authors and Affiliations
Contributions
A.C., P.Z. and M.C. planned project conception. L.S., A.K. provided clinical samples. A.C., P.Z. and M.C. designed the experiments. A.C., P.Z. acquired and analyzed data. M.C., K.J., M.K., W.K. and D.R. performed bioinformatics and machine learning analyses. A.K., L.S. and M.C. acquired funding. A.C., M.C. wrote the manuscript. P.Z., A.K., M.K., L.S., W.K., K.J. and D.R. assisted with manuscript editorial correction. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Czajkowska, A., Czajkowski, M., Szczerbinski, L. et al. Exploring protein relative relations in skeletal muscle proteomic analysis for insights into insulin resistance and type 2 diabetes. Sci Rep 14, 17631 (2024). https://doi.org/10.1038/s41598-024-68568-4
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-024-68568-4
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.