Identification of markers that distinguish adipose tissue and glucose and insulin metabolism using a multi-modal machine learning approach

Henninger, Josefin; Eliasson, Björn; Smith, Ulf; Rawshani, Aidin

doi:10.1038/s41598-021-95688-y

Published: 23 August 2021

Josefin Henninger^1,2,
Björn Eliasson^1,2,
Ulf Smith^1,2 &
…
Aidin Rawshani^1,2,3

Scientific Reports volume 11, Article number: 17050 (2021)

Abstract

The study of metabolomics has improved our knowledge of the biology behind type 2 diabetes and its related metabolic physiology. We aimed to investigate markers of adipose tissue morphology, as well as insulin and glucose metabolism, in 53 non-obese male individuals. The participants underwent extensive clinical, biochemical and magnetic resonance imaging phenotyping, and we also investigated non-targeted serum metabolites. We used a multi-modal machine learning approach to evaluate which serum metabolomic compounds predicted markers of glucose and insulin metabolism, adipose tissue morphology and distribution. Fasting glucose was associated with metabolites of intracellular insulin action and beta-cell dysfunction, namely cysteine-s-sulfate and N-acetylarginine, whereas fasting insulin was predicted by myristoleoylcarnitine, propionylcarnitine and other metabolites of beta-oxidation of fatty acids. OGTT-glucose levels at 30 min were predicted by 7-Hoca, a microbiota-derived metabolite, as well as eugenol sulfate and fatty acid metabolites. Both insulin clamp and HOMA-IR were predicted by metabolites involved in beta-oxidation of fatty acids and biodegradation of triacylglycerol, namely tartrate and 3-phosphoglycerate, as well as pyruvate, xanthine and liver fat. OGTT glucose area under the curve (AUC) and OGTT insulin AUC were associated with bile acid metabolites, subcutaneous adipocyte cell size, liver fat, and fatty acids and their derivatives, such as isovalerylcarnitine. Finally, subcutaneous adipocyte size was associated with long-chain fatty acids, markers of sphingolipid metabolism, increasing liver fat and dopamine-sulfate 1. Ectopic liver fat was predicted by methylmalonate, adipocyte cell size, glutathione-derived metabolites and fatty acids. Ectopic heart fat was predicted by visceral fat, gamma-glutamyl tyrosine and 2-acetamidophenol sulfate. Adipocyte cell size, age, alpha-tocopherol and blood pressure were associated with visceral fat.
We identified several biomarkers associated with adipose tissue pathophysiology and insulin and glucose metabolism using a multi-modal machine learning approach. In our models, serum metabolites had higher relative importance than traditional clinical and biochemical variables for most endpoints.

Introduction

Obesity is a multifactorial and heterogeneous disorder that is generally associated with metabolic alterations such as insulin resistance and type 2 diabetes, and it is a major risk factor for cardiovascular morbidity and mortality. Adipose tissue consists of subcutaneous, visceral and peripheral “ectopic” fat depots, and functional variations between these depots mediate differences in metabolic and atherosclerotic risk. Failure of adipocyte growth and differentiation results in acquired lipodystrophy and pathologic fat accumulation. Upon excess caloric intake, energy is preferably stored in subcutaneous adipose tissue, which initially expands by hyperplastic growth; in predisposed individuals, however, the subcutaneous adipose tissue fails to do so and instead exhibits cell dysfunction associated with adipocyte hypertrophy, mild inflammation and fibrotic remodeling. Adipose tissue dysfunction is considered a hallmark of type 2 diabetes and a major contributor to the development of insulin resistance, which, together with β-cell dysfunction and impaired insulin secretion, forms the cornerstone of type 2 diabetes biology^1,2,3.

Understanding of the biological mechanisms underpinning these conditions is constantly evolving, and the addition of metabolomics has improved the diagnosis and prognosis of metabolic disorders and increased our understanding of adipocyte biology and of insulin and glucose metabolism^4,5. Previous research indicates that metabolites reflecting glycolytic and tricarboxylic acid (TCA) cycle intermediates, branched-chain and aromatic amino acids, and long-chain fatty acids are associated with metabolic disorders^6,7,8.

Recently, our research group presented data showing that certain metabolites correlated with genetic predisposition to type 2 diabetes, impaired glucose tolerance, insulin resistance, adipocyte hypertrophy and ectopic fat accumulation in healthy, lean study participants with and without heredity for type 2 diabetes⁹.

In this study, using adipose tissue biopsies and magnetic resonance spectroscopy, we set out to investigate candidate markers for morphological characterization of subcutaneous adipose tissue and its dysfunction, along with markers for visceral adipose tissue and lipid accumulation in ectopic depots. In addition, we investigated markers of insulin and glucose metabolism based on clinical characteristics, biochemical variables, non-targeted metabolites and magnetic resonance spectroscopy data. To this end, we constructed multi-modal predictive machine learning models to manage this high-dimensional dataset, with emphasis on untargeted serum metabolomics.

Methods

Ethics statement

All subjects received oral and written information and gave informed consent to participate. The study protocol was approved by the local Ethical Committees at the Sahlgrenska Academy at the University of Gothenburg (approvals 384-12 and T803-13). The study was performed in agreement with the Declaration of Helsinki.

Study population

We recruited 53 subjects via newspaper advertisements and through earlier studies performed at the laboratory. Inclusion criteria were male sex and general good health. The collection of biochemical, radiological and clinical variables has been described previously⁹.

Clinical variables

Lifestyle factors, as well as number of relatives diagnosed with type 2 diabetes mellitus, were evaluated through a questionnaire filled out in the laboratory.

Body weight and height, and waist and hip circumferences were recorded. We used bioelectrical impedance (single frequency, 50 kHz; Animeter, HTS, Odense, Denmark) to determine the proportions of body fat and lean body mass. Blood pressure was measured with a mercury sphygmomanometer in a sitting position after a 5 min rest.

Biochemical variables

After 12 h of fasting, all subjects underwent an OGTT (75 g glucose orally) to assess glucose tolerance status. Samples for measurement of plasma glucose and serum insulin were drawn after 0, 30, 60 and 120 min. Using fasting plasma insulin and fasting plasma glucose from the OGTT, we calculated a HOMA-IR index using the formula HOMA-IR = (fasting plasma glucose × fasting plasma insulin)/22.5¹⁰. M and M/I from the euglycemic clamps were used to validate the HOMA-IR.
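The HOMA-IR formula above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes glucose in mmol/L and insulin in µU/mL (the units with which the constant 22.5 is conventionally used), and the function names and the pmol/L conversion factor of 6.0 are assumptions introduced here.

```python
# Assumed conversion: 1 µU/mL of insulin ~ 6.0 pmol/L (other factors are in use).
PMOL_PER_MICRO_UNIT = 6.0


def homa_ir(glucose_mmol_l: float, insulin_micro_u_ml: float) -> float:
    """HOMA-IR = (fasting glucose x fasting insulin) / 22.5."""
    return glucose_mmol_l * insulin_micro_u_ml / 22.5


def homa_ir_from_pmol(glucose_mmol_l: float, insulin_pmol_l: float) -> float:
    """Same index, converting insulin from pmol/L first (assumed factor)."""
    return homa_ir(glucose_mmol_l, insulin_pmol_l / PMOL_PER_MICRO_UNIT)
```

Because insulin is reported in pmol/L elsewhere in this article, the second helper shows where a unit conversion would have to happen before the index is computed.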

To determine the first and second phases of insulin secretion, an intravenous glucose tolerance test (IVGTT) was performed after another overnight fast. A bolus of glucose (300 mg/kg in a 50% solution) was given within 30 s into the antecubital vein. Samples for the measurement of plasma glucose and insulin (arterialised venous blood) were drawn at −5, 0, 2, 4, 6, 8, 10, 20, 30, 40, 50 and 60 min. Using the trapezoidal method, we calculated the acute and the late insulin responses, i.e. the incremental area under the insulin curve (AIR, 0–10 min; LIR, 10–60 min). These parameters were not included in the prediction models because of collinearity with the oral glucose and insulin tolerance tests. Preliminary prediction models suggested that OGTT-derived predictors had greater relative importance than IVGTT predictors.
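The incremental area under the insulin curve described above can be sketched with the trapezoidal rule as follows. The choice of baseline (for example the 0 min value) and the decision to let values below baseline contribute negative area are assumptions made for this illustration; the function names are hypothetical.

```python
def incremental_auc(times_min, values, baseline):
    """Trapezoidal area of (value - baseline) over the sampling times."""
    if len(times_min) != len(values) or len(times_min) < 2:
        raise ValueError("need at least two matched time/value pairs")
    area = 0.0
    for i in range(1, len(times_min)):
        width = times_min[i] - times_min[i - 1]
        left = values[i - 1] - baseline
        right = values[i] - baseline
        area += width * (left + right) / 2.0
    return area


def window(times_min, values, start, stop):
    """Keep only the samples with start <= t <= stop."""
    pairs = [(t, v) for t, v in zip(times_min, values) if start <= t <= stop]
    return [t for t, _ in pairs], [v for _, v in pairs]
```

With the IVGTT sampling times, an acute response would be obtained as `incremental_auc(*window(times, insulin, 0, 10), baseline=insulin_at_0)` and a late response with the 10–60 min window.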

All subjects underwent a hyperinsulinemic euglycaemic clamp (insulin infusion: 240 pmol m⁻² min⁻¹ for 120 min), after another 12 h fast, to assess insulin sensitivity¹¹. Whole blood glucose was clamped at 5.0 mmol/l for the next 120 min by infusion of 20% glucose at various rates according to glucose measurements performed at 5 min intervals (YSI, Yellow Springs Instrument Company, OH). The M value (insulin sensitivity) was calculated as the mean glucose infusion rate during the last 30 min of the clamp adjusted for total body weight. M/I was calculated as the M value corrected for steady-state insulin concentrations.
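As a small worked sketch of the two clamp indices, the functions below compute M from the infusion rates recorded in the final 30 min and then divide by the steady-state insulin concentration. Units are left to the caller, "corrected for" is read here as a simple division, and the function names are illustrative assumptions rather than the authors' implementation.

```python
def m_value(glucose_infusion_rates, body_weight_kg):
    """Mean glucose infusion rate over the supplied window, per kg body weight."""
    if not glucose_infusion_rates:
        raise ValueError("no infusion rates supplied")
    mean_rate = sum(glucose_infusion_rates) / len(glucose_infusion_rates)
    return mean_rate / body_weight_kg


def m_over_i(m, steady_state_insulin):
    """M divided by the steady-state insulin concentration."""
    return m / steady_state_insulin
```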

Plasma glucose was measured using standard laboratory methods (Department of Chemistry, Sahlgrenska University Hospital, Gothenburg, Sweden). Plasma insulin was measured at the University of Tübingen, Germany, by micro-particle enzyme immunoassay (Abbott Laboratories, Tokyo, Japan).

From each subject we obtained a subcutaneous abdominal adipose tissue biopsy to assess subcutaneous adipose tissue cell size. The biopsies (approximately 1–200 mg) were obtained with a needle aspiration technique and further processed to evaluate adipose tissue cell size as previously described^9,12. All metabolites were measured in serum after a 12 h fast.

Radiological variables

Magnetic resonance imaging (MRI) was used to assess the amount of intra-abdominal and subcutaneous fat. Localised ¹H-magnetic resonance spectroscopy (MRS) was used to assess liver fat and heart lipids. MRI and MRS were performed using a 1.5 T MR-system (Intera/Achieva, software release 3.2) with the vendor’s 16 channel SENSE XI Torso coil (Philips Medical Systems, Best, The Netherlands). The software used included a research package enabling navigator-triggered MRS and field-map-based B₀-shimming. MRI images were evaluated at the level between the 4th and 5th lumbar vertebrae using T1-weighted axial images. MRI data were processed using an in-house developed segmentation program written in MATLAB (MATLAB R2014b, The MathWorks Inc., USA). The surface of intra-abdominal and subcutaneous adipose tissue was quantified. Bone, muscle, lean tissue and inter-muscular fat were excluded. The fat fractions are reported as ratios to total body volume. MRS liver data and MRS cardiac data were processed using the jMRUI software. The magnetic resonance methods are described in more detail in previous work⁹.

Statistical analysis

Baseline characteristics for clinical, biochemical, metabolic and imaging markers are presented as mean ± SD for all study participants and for the cluster subgroups identified with the k-means clustering method (Table 1).

Table 1. Baseline characteristics for all study participants, including 3 unique clusters that were identified through the k-means clustering method.

Scaling of predictors in dataset

We constructed extreme gradient boosting models to identify predictors for certain endpoints. Decision trees are generally considered invariant to monotonic transformations of features, since a node split on one scale has a corresponding split on the transformed scale. However, extreme gradient boosting also includes a linear booster, and regularized regression models can be sensitive to feature scaling. Therefore, we constructed both primary machine learning analyses based on Pareto-scaled values for predictors and ancillary analyses of unscaled predictors. The ancillary analyses are presented in the supplementary Appendix.
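For readers unfamiliar with Pareto scaling, the sketch below shows the usual definition: each feature is mean-centred and divided by the square root of its standard deviation. The use of the sample standard deviation and the function name are assumptions for this example; the article does not state which variant was applied.

```python
import math
import statistics


def pareto_scale(values):
    """Mean-centre a feature and divide by the square root of its sample SD."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("constant feature cannot be Pareto scaled")
    return [(v - mean) / math.sqrt(sd) for v in values]
```

Compared with unit-variance scaling, this keeps more of the original differences in magnitude between features while still damping the influence of very large values.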

Prediction models

Predictive machine-learning models were constructed with extreme gradient boosting, a decision-tree-based, non-parametric ensemble algorithm that applies a gradient boosting framework. Our multi-modal and high-dimensional data necessitate a robust and validated predictive machine learning model to examine relative variable importance, i.e. the predictive ability of a broad range of predictors.

Extreme gradient boosting uses a parallelized implementation of sequential tree construction. Each tree is split up to the maximum depth, which is defined through hyperparameter optimization, and is then pruned backwards by removing splits whose loss reduction is negative. The algorithm is also sparsity-aware and uses LASSO and Ridge regularization to prevent overfitting. Hyperparameter optimization was performed for each machine-learning model on the entire dataset by automated grid search over the number of trees, the maximum depth of a tree, L2 regularization, the learning rate, the fraction of observations and of parameters to be randomly sampled for each tree, and the minimum sum of weights of all observations required in a tree node. Finally, each optimized model was validated with repeated cross-validation; using 5 to 10 iterations, depending on the model, proved to be optimal and allowed hyperparameter optimization to be based on the entire dataset. Moreover, in each prediction model, in parallel with our examination of the optimal number of folds for the repeated cross-validation, we also scrutinized the pattern of feature importance in order to present a final model with maximum consistency in feature importance. For each outcome, five different machine learning models were constructed: one that assessed feature importance on the entire dataset (henceforth referred to as the complete dataset), and four additional prediction models that included various data pre-processing techniques for dimension reduction of metabolomics data. For each outcome, the features with the highest relative importance from the five prediction models were then presented in a final figure. A graphical illustration of the model construction, optimization and validation for the primary analyses is presented in the supplementary Appendix (Fig. S6). A similar modeling approach was applied to identical outcomes on the unscaled dataset, and these results are presented in Figs. S3–S5.
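The two building blocks named above, an exhaustive grid over hyperparameters and repeated k-fold cross-validation, can be sketched generically in Python. This is not the authors' R pipeline: the `evaluate` callback (which would fit a model on the training indices and return an error on the held-out indices), the grid keys and the seeding are all hypothetical, and the sketch scores each grid point by cross-validated error rather than reproducing the exact order of optimization and validation described in the text.

```python
import itertools
import random


def repeated_kfold(n_samples, n_folds, n_repeats, seed=0):
    """Yield (train_indices, test_indices) for each fold of each repeat."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        for fold in range(n_folds):
            test = order[fold::n_folds]
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test


def grid_search(grid, evaluate, n_samples, n_folds=5, n_repeats=5):
    """Return (mean_error, params) for the grid point with the lowest error."""
    names = sorted(grid)
    best = None
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        errors = [
            evaluate(params, train, test)
            for train, test in repeated_kfold(n_samples, n_folds, n_repeats)
        ]
        mean_error = sum(errors) / len(errors)
        if best is None or mean_error < best[0]:
            best = (mean_error, params)
    return best
```

With only 53 participants, repeating the fold assignment several times reduces the dependence of the error estimate on any single random split, which is the usual motivation for repeated cross-validation in small cohorts.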

Feature extraction

In this study, we applied different dimensionality reduction techniques to non-targeted metabolomics parameters in order to reduce the dimensions of the feature space whilst minimizing information loss. The non-targeted metabolomics data contain an excessive number of predictors for this dataset, and presumably many metabolites have no relationship with the endpoints being investigated. Principal component analysis (PCA) was performed to project scaled metabolomics data into a lower-dimensional space, reveal inherent data structure and provide a reduced-dimensional representation of the original parameters. Principal component analysis was performed on the metabolomics data separately, and each machine-learning model included the first 20 principal components, which together accounted for 75% of the cumulative variance.
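The relationship between the number of retained components and cumulative variance can be illustrated with a short helper. It takes the component variances (eigenvalues) as given and returns the smallest number of leading components that reaches a target fraction; the function name and inputs are hypothetical, and the PCA itself is assumed to have been computed elsewhere.

```python
def components_for_variance(eigenvalues, target_fraction):
    """Smallest number of leading components reaching the target variance share."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")
    running = 0.0
    ordered = sorted(eigenvalues, reverse=True)
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target_fraction:
            return k
    return len(ordered)
```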

In addition, we used a non-linear dimension reduction method, t-distributed stochastic neighbor embedding (T-SNE), with an initial PCA step, a perplexity of 10, a theta of 0.5 and 500 iterations; the metabolomics data were ultimately represented as three dimensions that were included in every prediction model. Moreover, exploratory factor analysis (EFA), a data reduction technique, aims to explain the relationships among many observed variables with a relatively small number of factors. The number of factors for EFA was decided using a simulated parallel analysis. We generated regression scores with 12 factors for EFA, using varimax rotation and minimum residual as the factoring method.

Metabolomic data transformed with PCA, T-SNE and EFA were included in each gradient boosting model prior to automated grid search and hyperparameter optimization for the final model. Furthermore, we constructed two additional models: one based on recursive feature elimination with random forest and a complete dataset model that used all predictors (approximately 670 predictors). In the prediction models that demonstrated strong predictive ability for a parameter generated by means of dimension reduction techniques, we identified the unique predictors with peak scores in each dimension reduction model and included these predictors in the final linear regression model. In some instances, the prediction models based on the complete dataset or recursive feature elimination displayed the same metabolites as a model based on dimension reduction parameters. These metabolites were included only once in the final linear regression model.

Cluster analyses

In order to distinguish unique metabolic phenotypes with distinct differences in baseline characteristics or prediction modelling, we used k-means and hierarchical clustering. The k-means clustering was validated with the elbow, silhouette and gap statistic methods. The optimal number of clusters (k) ranged between 2 and 3. ANOVA was performed for the metabolic markers of interest, and baseline characteristics for individuals in the cluster groups are presented in Table 1. Results from k-means clustering were compared to hierarchical clustering. The supplementary Appendix displays the tanglegram results for hierarchical clustering, which was computed with the complete and Ward methods, using a Euclidean distance matrix.
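Of the three validation criteria mentioned above, the silhouette is the simplest to write down. The sketch below computes the mean silhouette width for a given cluster assignment using Euclidean distance; it is a plain-Python illustration with hypothetical names, not the R routine used in the study, and it assigns a width of 0 to singleton clusters by convention.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette width of a clustering, using Euclidean distance."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            widths.append(0.0)  # singleton cluster: width defined as 0
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        widths.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(widths) / len(widths)
```

Running k-means for several values of k and comparing this score across them is one common way to choose the number of clusters; higher values indicate tighter, better separated groups.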

Linear regression models

Predictors with the greatest relative importance in the machine learning models were included in linear regression for assessment of effect size and significance level. The machine-learning models thus served as a feature elimination step prior to feature selection for linear regression. The regression estimates and 95% confidence intervals are presented next to each machine learning model. The regression models were generated using log-transformed variables and standardized regression coefficients. Predictors that reached statistical significance in linear regression were regarded as variables of importance.
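To make "log-transformed variables and standardized regression coefficients" concrete, the sketch below handles the single-predictor case: both variables are log-transformed and z-scored, and the slope of one on the other is returned. The natural logarithm, the sample standard deviation and the restriction to one predictor are assumptions for this example; the study's regression models may have contained several predictors.

```python
import math
import statistics


def standardized_beta(x, y):
    """Slope of z-scored log(y) on z-scored log(x) (single-predictor case)."""
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]

    def z_scores(values):
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)
        return [(v - mean) / sd for v in values]

    zx, zy = z_scores(log_x), z_scores(log_y)
    # Least-squares slope through the origin of the standardized variables.
    return sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)
```

In this single-predictor setting the standardized coefficient equals the Pearson correlation of the log-transformed variables, so it is bounded between -1 and 1.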

AUC for glucose and insulin metabolism

We applied the following trapezoid formula to assess the area under the curve for glucose and insulin levels after the oral glucose tolerance test: \(\mathrm{AUC}(X) = \dfrac{X(t_{0}) + 2 \times X(t_{30}) + 3 \times X(t_{60}) + 2 \times X(t_{120})}{4}\), where \(X\) denotes insulin or glucose and \(t_{0}\), \(t_{30}\), \(t_{60}\) and \(t_{120}\) are the OGTT sampling times in minutes.
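Assuming the division by 4 applies to the whole weighted sum, the formula is the trapezoidal rule over the 0, 30, 60 and 120 min samples with time expressed in hours. The sketch below shows both forms so that the equivalence can be checked; the function names and the example values used to compare them are illustrative only.

```python
def ogtt_auc(t0, t30, t60, t120):
    """Closed-form weighted sum: (t0 + 2*t30 + 3*t60 + 2*t120) / 4."""
    return (t0 + 2 * t30 + 3 * t60 + 2 * t120) / 4


def trapezoid_auc_hours(times_min, values):
    """Generic trapezoidal rule with interval widths converted to hours."""
    area = 0.0
    for i in range(1, len(times_min)):
        width_h = (times_min[i] - times_min[i - 1]) / 60.0
        area += width_h * (values[i - 1] + values[i]) / 2.0
    return area
```

The weights 1, 2, 3 and 2 arise because the 30 and 60 min samples each border two intervals, and the final interval is twice as long as the first two.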

Imputation

We used the missForest package in R, which is based on the random forest algorithm, to impute missing data for study participants. We compared distributions and means before and after imputation and observed virtually no differences. In general, the dataset had only minor missing data. A p-value of less than 0.05 was considered to indicate statistical significance.

Calculations were performed in R (v 4.0.2) using the following machine learning libraries: XGBoost, Rtsne, Cluster, missForest, Caret, Psych, GPArotation, ggRandomForests, Party, GridExtra, mlr3, factoextra, Boruta, and Matrix.

Results

Study population

The study includes 53 men with a mean age of 42 ± 8 years. Initially, the study cohort was constructed to investigate adipose tissue morphology and metabolism in middle-aged, healthy, lean or mildly overweight non-diabetic individuals with heredity for type 2 diabetes, henceforth referred to as first-degree relatives (FDR), compared to individuals without heredity, henceforth referred to as control subjects (CTR). Almost half of the study participants (n = 22) had a known family history of type 2 diabetes. For all study participants, mean body mass index was 25 ± 3 kg/m², mean fasting plasma glucose was 4.9 ± 0.4 mmol/L and mean fasting serum insulin was 45 ± 22 pmol/L. All study participants had normal liver function, systolic- and diastolic blood pressure and no ongoing pharmacological therapy. Baseline characteristics for k-means identified cluster groups are presented in Table 1. Mean values for age, body mass index, waist-circumference, MRS—subcutaneous fat, MRS—whole abdomen and insulin clamp ratio, differed among the three cluster groups.

Insulin- and glucometabolic markers

Figure 1 (Panel A–F) displays machine learning models for insulin- and glucometabolic markers along with linear regression models for the most important predictors identified through variable importance from the predictive models. Each machine learning model treated the metabolomics data differently.

Hyperinsulinemic-euglycemic clamp

As shown in Fig. 1 Panel A, the strongest predictors of insulin clamp in prediction models based on scaled values were tartrate, 3-phosphoglycerate and a fatty-chain acid metabolite, whereas models with unscaled predictors showed that tartrate, 3-phosphoglycerate and MRS—liver fat were the most important predictors (Supp Fig. S3 Panel A). As seen in Fig. 1 Panel A, the model with the complete dataset did not generate predictors with strong predictability. The direction of the standardized beta-coefficients is presented in Fig. 1 Panel A. In linear regression, the unstandardized beta-coefficients associated with insulin clamp were tartrate (β 1.25; 95% CI, 1.07 to 1.46), 3-phosphoglycerate (β 0.79; 95% CI, 0.68 to 0.92) and MRS—liver fat (β 0.94; 95% CI, 0.87 to 1.007) (Supp Fig. S3 Panel A).

OGTT S-insulin after 30 min

In Fig. 1 Panel B, the prediction model based on exploratory factors (EFA) had the highest R² (0.51) and lowest RMSE (12.9). The recursive feature elimination, T-SNE and PCA models displayed poor model diagnostics. Linear regression for the most important scaled predictors revealed that myristoleoylcarnitine, enyl-stearoyl-2-oleoyl and 5-alpha-androstan-diol-sulfate were associated with serum-insulin after 30 min, whereas the unscaled models identified body mass index, flavin-adenine dinucleotide-fad and 1.1 enyl palmitoyl-2-oleoyl-gpe as statistically significant predictors (Fig. 1 Panel B and Supplementary Fig. S3 Panel B).

OGTT fasting plasma-glucose

In Fig. 1 Panel C, the following predictors displayed strong predictability for fasting plasma-glucose: N-acetylarginine and cysteine-s-sulfate. All prediction models demonstrated relatively poor model diagnostics for the target variable. Linear regression, from both the scaled and unscaled prediction models, showed that cysteine-s-sulfate and N-acetylarginine were important and significant predictors for this outcome (Fig. 1 Panel C and Supplementary Fig. S3 Panel C).

OGTT fasting serum-insulin

Figure 1 Panel D shows the results for OGTT fasting insulin levels. The predictors propionylcarnitine, body weight and serum-bilirubin displayed strong predictability in several machine learning models, in both the scaled and unscaled datasets. Linear regression demonstrated that body weight, propionylcarnitine and s-bilirubin were almost statistically significant (Supp Fig. S3 Panel D).

OGTT plasma-glucose after 30 min

For the prediction model of plasma-glucose after 30 min, data pre-processing techniques demonstrated poor model diagnostics in both the scaled and unscaled datasets. Prediction models based on scaled predictors (Fig. 1 Panel E) suggest that 7-Hoca, a microbiota-derived metabolite, was the most important predictor. Model diagnostics (R² and RMSE) were greater in the unscaled models. The unscaled models identified body weight, eugenol sulfate, S-ALAT and MRS—liver fat as important predictors; however, no predictor demonstrated statistical significance in the regression model, although eugenol sulfate was nearly significant (β 1.04; 95% CI, 0.98 to 1.09) (Supp Fig. S3 Panel E).

HOMA2-IR

Feature importance for HOMA-IR (Fig. 1 Panel F) in the scaled dataset showed that acesulfame, an artificial sweetener, was the only significant predictor. Tartronate-hydroxymalonate and methyl-4-hydroxybenzoate-sulfate were nearly significant in these models. In prediction models with unscaled predictors, MRS—liver fat (β 1.13; 95% CI, 1.011 to 1.26), pyruvate (β 2.28; 95% CI, 1.42 to 3.67) and xanthine (β 1.67; 95% CI, 1.047 to 2.67) were the strongest predictors (Supplementary Fig. S3 Panel F).

Predictors for glucose tolerance test

Predictors for insulin and glucose metabolism derived by means of the oral glucose tolerance test were combined with a trapezoid formula to describe the area under the curve for OGTT-related insulin and glucose variables. Fig. 2 Panel A–B displays the distribution of OGTT values for insulin and glucose, whilst Fig. 2 Panel C shows the scatter plot for AUC glucose and AUC insulin, along with the correlation between these newly constructed variables.

Mean insulin (AUC insulin)

In Fig. 2 Panel D, the prediction model based on recursive feature elimination demonstrated superior model diagnostics (R² 0.63). Prediction models with scaled predictors revealed that the most important and statistically significant predictors for mean insulin (AUC insulin) were adipocyte cell size and serum-bilirubin, whereas propionylcarnitine was almost significant (Fig. 2 Panel D). The unscaled models revealed that adipocyte cell size and 1-palmitoyl-2-alpha-linolenoyl-gpc were the most important predictors; however, only the latter was significant in the regression model (β 1.52; 95% CI, 1.009 to 2.12) (Supp Fig. S4 Panel B).

Mean glycemia (AUC glucose)

In Fig. 2 Panel E, the strongest predictors for mean glycemia (AUC glucose) were glutamate and 1-nonadecanoyl-gpc, whereas the unscaled models revealed that trans-urocanate, isovalerylcarnitine, MRS—liver fat and hyocholate were the most important and significant predictors (Supp Fig. S4 Panel A).

Adipose tissue morphology

MRS—liver fat

In Fig. 3 Panel A, prediction models for liver fat demonstrated relatively low R² scores but comparable RMSE values between models. Linear regression based on machine learning models for the scaled dataset suggests that liver transaminases, methylmalonate and 1-nonadecanoyl-gpc were statistically significant (Fig. 3 Panel E). Linear regression based on the unscaled dataset revealed that adipocyte size (β 5.89; 95% CI, 0.74 to 46.7), transaminases ratio (β 0.46; 95% CI, 0.23 to 0.91), 1-nonadecanoyl-gpc-19.0 (β 0.24; 95% CI, 1.10 to 0.57) and gamma-glutamylphenylalanine (β 10.5; 95% CI, 2.60 to 43.01) were statistically significant (Supp Fig. S5 Panel A).

MRS—visceral fat

In Fig. 3 Panel B, virtually all data pre-processing techniques demonstrated robust model diagnostics, with high R² values and relatively comparable RMSE values. Adipocyte cell size, age and alpha-tocopherol were prevalent in most gradient boosting models and statistically significant in the linear regression model (Fig. 3 Panel E). In the unscaled models, adipocyte cell size, age and systolic blood pressure were important in the prediction models and statistically significant in linear regression (Supp Fig. S5 Panel B and Panel E).

MRS—cardiac lipids

For cardiac lipids, prediction models had low R² values in both the unscaled and scaled models. In Fig. 3 Panel C, visceral fat and age demonstrated high feature importance in three prediction models. However, linear regression based on the scaled prediction models showed that age, 2-acetamidophenol-sulfate, gamma-glutamyltyrosine and diastolic blood pressure were statistically significant. Results from the unscaled dataset revealed feature importance relatively similar to that of the scaled dataset (Supp Fig. S5 Panel C); however, linear regression showed that MRS—visceral fat was nearly statistically significant (β 2.27; 95% CI, 0.96 to 5.40) (Supp Fig. S5 Panel E).

Subcutaneous adipocyte cell size

In Fig. 3 Panel D, liver fat according to magnetic resonance spectroscopy, dopamine sulfate-1 and methyl-4-hydroxybenzoate sulfate were the most important predictors for adipocyte cell size and were statistically significant in the linear regression model. In supplementary Fig. S5 Panel D (unscaled models), prediction models identified methyl-4-hydroxybenzoate sulfate and MRS—liver fat as important predictors, whereas linear regression showed that MRS—liver fat, dopamine-sulfate 1 and sphingomyelin d18.1 were statistically significant.

Clustering analyses

K-means clustering was used to distinguish study participants with unique metabolic phenotypes. We experimented with two to four clusters as the optimal number of cluster groups. Results from K-means clustering (k = 3) are presented in Fig. 4, along with the three different methods that were used to identify the optimal number of clusters. Predictors were scaled prior to clustering, similar to our approach for dimensionality reduction. Predictors for study participants belonging to unique clusters were thereafter transformed back to their unscaled original values, and characteristics between the groups were analyzed with ANOVA. In Fig. 4 Panel B, mean insulin (AUC insulin) and liver fat were the only predictors whose differences between the cluster groups were nearly statistically significant. Baseline characteristics for cluster groups are presented in Table 1.

In Fig. 5, a summarizing figure is presented to describe specific predictors or biological processes that were identified through the machine learning models and linear regression analyses.

Ancillary analyses

In supplementary Fig. S1 Panel A, cumulative variance for the first 20 principal components is presented. In Supp Fig. S1 Panel B–C, the parallel analysis to assess the optimal number of exploratory factors and the hierarchical clustering are presented, respectively. Supplementary Fig. S2 Panel A–B presents density plots for outcome variables and an ancillary analysis to identify the metabolites with the strongest correlation to age.

Discussion

The data obtained and analyzed in this study, as well as in previously published articles on the 53 subjects, are to our knowledge unique in their extensive nature, combining clinical, biochemical, radiological and untargeted serum metabolomics data for comprehensive phenotypic metabolic characterization and enhancing our knowledge of adipocyte biology and of insulin and glucose metabolism. In this study, our primary objective was to examine markers of insulin and glucose metabolism whilst considering complete untargeted serum metabolomics. Moreover, we presented k-means and hierarchical cluster analyses in an attempt to identify unique metabolic phenotypes within our high-dimensional dataset. Predictive machine learning models were constructed in a stepwise fashion with additional pre-processing techniques to reduce the number of predictors for each outcome. In our models, serum metabolites had higher relative importance than traditional clinical and biochemical variables for most endpoints.

Predictive machine learning models based on oral glucose and insulin tolerance tests highlighted several metabolites as the most important predictors for glucose and insulin metabolism. Fasting glucose was associated with a known biomarker of obesity, namely cysteine-s-sulfate, which is involved in intracellular insulin action¹³, and with N-acetylarginine, which has been suggested to modulate glucose homeostasis and insulin sensitivity and to promote lipolysis through arginine-nitric oxide modulation of intracellular AMPK and PI3K¹⁴. In addition, cysteine is involved in glutathione synthesis, which is known for its relation to beta-cell dysfunction.

Fasting insulin was predicted by body weight, serum bilirubin and propionylcarnitine. The high relative importance of propionylcarnitine, a fatty ester lipid molecule, suggests that dysregulated fatty acid and lipid metabolism in the beta-oxidation of long-chain fatty acids might cause lipid accumulation in tissues, which supports its role as an important metabolite for fasting insulin levels. Carnitine is essential for cellular energy production because it transports long-chain fatty acids into the mitochondria for beta-oxidation and transports toxic compounds out of this organelle to prevent their accumulation. Body weight also demonstrated high relative importance in both scaled and unscaled models, while serum bilirubin was nearly statistically significant.

Glucose levels at 30 min were predicted by 7-Hoca, a microbiota-derived metabolite, as well as by fatty-chain acids and the metabolite eugenol sulfate; eugenol has been shown to lower blood glucose and blood lipids, as well as markers of inflammation¹⁵. In animal models, eugenol improves insulin sensitivity and stimulates glucose uptake in skeletal muscle tissue through activation of the GLUT4-AMPK signaling pathway.

Both insulin clamp and HOMA-IR were predicted by metabolites involved in beta-oxidation of fatty acids and biodegradation of triacylglycerol. Tartrate is considered a xenobiotic metabolite that is related to BMI, insulin resistance and adiponectin, while 3-phosphoglycerate is an important intermediate in glycolysis as well as a non-ATP product of PGK1, which is critical for serine synthesis and insulin secretion¹⁶. According to unscaled prediction models, important predictors included a medium-chain fatty acid, liver fat measured by magnetic resonance spectroscopy, pyruvate and xanthine. Increasing serum levels of xanthine and xanthine oxidoreductase (XOR) have previously been shown to be associated with greater production of reactive oxygen species, endothelial dysfunction, body mass index, fasting plasma insulin and insulin resistance. According to the scaled models, the artificial sweetener acesulfame and methyl-4-hydroxybenzoate sulfate, as well as tartronate (hydroxymalonate), which is involved in fatty acid biosynthesis and mitochondrial energy production, proved to be important predictors for HOMA2-IR. Acesulfame has previously been associated with increasing BMI and glucose intolerance.

Information derived from the OGTT was used to calculate an area under the curve (AUC) value for both glucose and insulin measures. These newly constructed endpoint variables were associated with several of the examined metabolites. In scaled prediction models, nonadecanoyl-gpc and glutamate were almost statistically significant. According to unscaled prediction models, AUC for glucose was associated with bile acid metabolites, fatty acid esters (valerylcarnitine) and liver fat measured by magnetic resonance spectroscopy.
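
An AUC for an OGTT curve is commonly computed with the trapezoidal rule. The sketch below assumes that rule, and its sampling times and glucose values are hypothetical; the integration method actually used in this study is not stated in this section. Dividing the AUC by the test duration gives a time-weighted mean concentration.

```python
def trapezoid_auc(times, values):
    """Area under a sampled curve using the trapezoidal rule."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two paired samples")
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        # Each interval contributes its width times the mean of its endpoints.
        area += dt * (values[i] + values[i - 1]) / 2.0
    return area


# Hypothetical OGTT samples for illustration only; not data from the study.
times_min = [0, 30, 60, 120]      # minutes after glucose load
glucose = [5.0, 8.0, 7.0, 6.0]    # plasma glucose, mmol/L

auc_glucose = trapezoid_auc(times_min, glucose)               # mmol/L x min
mean_glucose = auc_glucose / (times_min[-1] - times_min[0])   # mmol/L
print(auc_glucose, mean_glucose)
```

The same function applies unchanged to insulin samples; only the units of the result differ.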

AUC insulin was predicted by subcutaneous adipocyte size, by a metabolite of sphingolipid metabolism, a compound involved both in intracellular signaling and cell membrane turnover, and by serum bilirubin. Sphingolipids have previously been shown to be associated with insulin resistance, possibly via alterations in downstream insulin signaling⁶. In addition, the scaled prediction models identified serum bilirubin and propionylcarnitine as important predictors for AUC insulin. Adipocyte hypertrophy has been extensively studied as a mediator in the development of insulin resistance and hyperinsulinemia, and our results are in line with previous results³.

Finally, markers of sphingolipid metabolism, dopamine-sulfate 1, liver fat and methyl-4-hydroxybenzoate sulfate were important predictors for subcutaneous adipocyte cell size. Previous research has suggested a regulatory role for peripheral dopamine-sulfate in adipose tissue.

Clustering analyses identified three unique phenotypic groups, between which levels of insulin resistance, as defined by insulin clamp, differed significantly. The amount of visceral liver fat also tended to differ between the groups but did not reach statistical significance. We found several markers of amino acid metabolism that predicted visceral adipose tissue, a finding in line with previous research showing that amino acid metabolites predict insulin resistance¹⁷. We also found that a bile acid metabolite and a glycolysis metabolite predicted visceral liver fat; both cellular processes were noted above as being associated with insulin resistance.

In our previous research on these subjects, we observed that both visceral and subcutaneous fat area by MRS evaluation were predicted by metabolites of fatty acid oxidation. Lipid oxidation metabolites also predicted liver lipid accumulation, and cardiac lipid storage was predicted by a metabolite of branched-chain amino acid (BCAA) turnover⁹. BCAAs have previously been linked to IGT and overt type 2 diabetes, and our findings are in line with these results⁶,¹⁸. Our findings in this study thus add to previous findings. Ectopic lipid accumulation in the liver was predicted by subcutaneous adipocyte cell size, liver transaminases, methylmalonate, lipid metabolites and gamma-glutamylphenylalanine. According to scaled models, predictors for visceral fat were subcutaneous adipocyte cell size, ectopic liver fat and insulin clamp. However, linear regression showed that only adipocyte cell size, age and alpha-tocopherol were associated with visceral fat. Our data cannot distinguish whether visceral fat accumulation precedes ectopic fat storage in the liver. In general, repeated cross-validation performance of the machine learning model for ectopic adipose tissue surrounding the heart was poor; nevertheless, age, diastolic blood pressure, 2-acetamidophenol sulfate, gamma-glutamyltyrosine and visceral fat were the best predictors.

A major strength of this study is the extensive examination of subjects using clinical and biochemical variables, imaging data and untargeted metabolomics. Some limitations of our study should be considered. The relatively small number of subjects limits our ability to cross-validate and generalize our machine learning models. Validation of models on a test dataset was impracticable in some cases because of the size of the cohort. We believe that a trade-off between a lower root-mean-square error (RMSE) and R² is satisfactory in this dataset for identifying the superior model for each endpoint.
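
To make the two model-selection metrics concrete, the sketch below computes RMSE (root-mean-square error) and R² for two candidate sets of predictions. All values are hypothetical. R² is defined here as one minus the ratio of residual to total sum of squares, which is an assumption and may differ from the definition used by the authors' software.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed outcome and predictions from two candidate models.
observed = [1.0, 2.0, 3.0, 4.0]
model_a = [1.1, 1.9, 3.2, 3.8]
model_b = [2.0, 2.0, 3.0, 3.0]

for name, predicted in (("A", model_a), ("B", model_b)):
    print(name, round(rmse(observed, predicted), 3),
          round(r_squared(observed, predicted), 3))
```

In this illustration model A has both the lower RMSE and the higher R², so the two criteria agree. When they disagree across resamples, a judgment such as the trade-off described above is needed.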

Conclusion

Using a multi-modal machine learning approach, we identified several biomarkers associated with markers of adipose tissue dysfunction and morphology and with insulin and glucose metabolism.

References

1. Kahn, S. E., Cooper, M. E. & Del Prato, S. Pathophysiology and treatment of type 2 diabetes: Perspectives on the past, present, and future. Lancet (Lond., Engl.) 383(9922), 1068–1083 (2014).
2. Hammarstedt, A., Gogg, S., Hedjazifar, S., Nerstedt, A. & Smith, U. Impaired adipogenesis and dysfunctional adipose tissue in human hypertrophic obesity. Physiol. Rev. 98(4), 1911–1941 (2018).
3. Smith, U. & Kahn, B. B. Adipose tissue regulates insulin sensitivity: Role of adipogenesis, de novo lipogenesis and novel lipids. J. Intern. Med. 280(5), 465–475 (2016).
4. Johnson, C. H., Ivanisevic, J. & Siuzdak, G. Metabolomics: Beyond biomarkers and towards mechanisms. Nat. Rev. Mol. Cell Biol. 17(7), 451–459 (2016).
5. Friedrich, N. Metabolomics in diabetes research. J. Endocrinol. 215(1), 29–42 (2012).
6. Yang, Q., Vijayakumar, A. & Kahn, B. B. Metabolites as regulators of insulin sensitivity and metabolism. Nat. Rev. Mol. Cell Biol. 19(10), 654–672 (2018).
7. Peddinti, G. et al. Early metabolic markers identify potential targets for the prevention of type 2 diabetes. Diabetologia 60(9), 1740–1750 (2017).
8. Gonzalez-Franquesa, A., Burkart, A. M., Isganaitis, E. & Patti, M. E. What have metabolomics approaches taught us about type 2 diabetes? Curr. Diabet. Rep. 16(8), 74 (2016).
9. Rawshani, A. et al. Adipose tissue morphology, imaging and metabolomics predicting cardiometabolic risk and family history of type 2 diabetes in non-obese men. Sci. Rep. 10(1), 9973 (2020).
10. Matthews, D. R. et al. Homeostasis model assessment: Insulin resistance and beta-cell function from fasting plasma glucose and insulin concentrations in man. Diabetologia 28(7), 412–419 (1985).
11. DeFronzo, R. A., Tobin, J. D. & Andres, R. Glucose clamp technique: A method for quantifying insulin secretion and resistance. Am. J. Physiol. 237(3), E214–E223 (1979).
12. Gustafson, B. & Smith, U. The WNT inhibitor Dickkopf 1 and bone morphogenetic protein 4 rescue adipogenesis in hypertrophic obesity in humans. Diabetes 61(5), 1217–1224 (2012).
13. Carter, R. N. & Morton, N. M. Cysteine and hydrogen sulphide in the regulation of metabolism: Insights from genetics and pharmacology. J. Pathol. 238(2), 321–332 (2016).
14. Hu, S. et al. L-arginine modulates glucose and lipid metabolism in obesity and diabetes. Curr. Protein Pept. Sci. 18(6), 599–608 (2017).
15. Al-Trad, B., Alkhateeb, H., Alsmadi, W. & Al-Zoubi, M. Eugenol ameliorates insulin resistance, oxidative stress and inflammation in high fat-diet/streptozotocin-induced diabetic rat. Life Sci. 216, 183–188 (2019).
16. Huang, M. & Joseph, J. W. Assessment of the metabolic pathways associated with glucose-stimulated biphasic insulin secretion. Endocrinology 155(5), 1653–1666 (2014).
17. Wiklund, P. et al. Insulin resistance is associated with altered amino acid metabolism and adipose tissue dysfunction in normoglycemic women. Sci. Rep. 6, 24540 (2016).
18. Bloomgarden, Z. Diabetes and branched-chain amino acids: What is the link? J. Diabetes 10(5), 350–352 (2018).

Funding

Open access funding provided by University of Gothenburg. Funding was provided by The Swedish Heart and Lung Foundation (Grant No. 2018-0366).

Author information

Authors and Affiliations

The Lundberg Laboratory for Diabetes Research, Department of Molecular and Clinical Medicine, The Sahlgrenska Academy at the University of Gothenburg, 413 45, Gothenburg, Sweden
Josefin Henninger, Björn Eliasson, Ulf Smith & Aidin Rawshani
Department of Molecular and Clinical Medicine, Institute of Medicine, University of Gothenburg, Gothenburg, Sweden
Josefin Henninger, Björn Eliasson, Ulf Smith & Aidin Rawshani
Wallenberg Laboratory for Cardiovascular and Metabolic Research, Institute of Medicine, University of Gothenburg, Gothenburg, Sweden
Aidin Rawshani

Contributions

The first author wrote the first draft of the manuscript: J.H. The study was designed by the third and last author: U.S., A.R. The last author performed all statistical analyses: A.R. The authors vouch for the data, interpretation and analyses: J.H., B.E., U.S., A.R. All of the authors participated in data collection and interpretation: J.H., B.E., U.S., A.R. All authors vouch for the accuracy and completeness of the data and analyses, and made the decision to submit the manuscript for publication: J.H., B.E., U.S., A.R. All named authors meet the International Committee of Medical Journal Editors criteria for authorship for this article, take responsibility for the integrity of the work as a whole, and have given their approval for this version to be published.

Corresponding author

Correspondence to Aidin Rawshani.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Henninger, J., Eliasson, B., Smith, U. et al. Identification of markers that distinguish adipose tissue and glucose and insulin metabolism using a multi-modal machine learning approach. Sci Rep 11, 17050 (2021). https://doi.org/10.1038/s41598-021-95688-y

Received: 06 April 2021
Accepted: 21 July 2021
Published: 23 August 2021
DOI: https://doi.org/10.1038/s41598-021-95688-y

This article is cited by

An early prediction model for gestational diabetes mellitus based on metabolomic biomarkers
- Melissa Razo-Azamar
- Rafael Nambo-Venegas
- Berenice Palacios-González
Diabetology & Metabolic Syndrome (2023)
