Adipose tissue morphology, imaging and metabolomics predicting cardiometabolic risk and family history of type 2 diabetes in non-obese men

We evaluated the importance of body composition, amount of subcutaneous and visceral fat, liver and heart ectopic fat, adipose tissue distribution and cell size as predictors of cardio-metabolic risk in 53 non-obese male individuals. Known family history of type 2 diabetes was identified in 25 individuals. The participants also underwent extensive phenotyping together with measuring different biomarkers and non-targeted serum metabolomics. We used ensemble learning and other machine learning approaches to identify predictors with considerable relative importance and their intricate interactions. Visceral fat and age were strong individual predictors of ectopic fat accumulation in liver and heart along with markers of lipid oxidation and reduced glucose tolerance. Subcutaneous adipose cell size was the strongest individual predictor of whole-body insulin sensitivity and also a marker of visceral and ectopic fat accumulation. The metabolite 3-MOB along with related branched-chain amino acids demonstrated strong predictability for family history of type 2 diabetes.

Several large clinical studies have shown that individuals with known genetic markers of insulin resistance are characterized by reduced subcutaneous adipose tissue 8,9 , which is consistent with our findings of reduced adipogenesis in first-degree relatives and T2D 4,10 . A further level of complexity is that there is considerable heterogeneity among different adipose tissue depots in regards to their metabolic regulation and endocrine secretion.
In the present study, we sought to identify novel predictors and pathways that can improve the characterization of patho-physiological alterations associated with metabolic risk. Extensive identification of biomarkers and their intricate interactions should allow new insight into the patho-physiological progression and risk of developing T2D and its consequences.
For this purpose, we used unbiased machine learning approaches to analyze this comprehensive database consisting of extensive phenotyping combined with radiological imaging and body composition analysis, subcutaneous adipose tissue mass and morphology, and non-targeted serum metabolomics in healthy lean/overweight individuals. Approximately half of our cohort consisted of individuals with a family history of T2D (FDR) and these were compared to subjects without known heredity.

Results
Study population. We included 53 men; 25 with a known family history of T2D (First-Degree Relatives; FDR) and 28 without. Mean age was 42 ± 8 years but FDR were significantly younger (39.3 vs 45.6 yrs, p < 0.004). The groups had similar body mass index and other basal phenotypic data with only minor differences as demonstrated in Table 1. Thus, the cohort exhibited no metabolic abnormalities and was comprised of middle-aged, healthy, lean or mildly overweight non-diabetic individuals with normal liver function and blood pressure levels and without any ongoing pharmacological therapy. The complete variable list of baseline characteristics for predictors used in the statistical models are presented in Supplementary Table 1. In a next step, we compared imaging and metabolomics data between FDR and study participants without heredity for T2D, henceforth referred to as control subjects. We observed no significant differences in distribution of total body fat, abdominal subcutaneous fat, visceral fat, and ectopic fat in liver or heart. In addition, we constructed a generalized linear model, adjusted for age, body mass index and group (i.e. FDR or control subjects), which showed no significant differences between baseline characteristics, with the exception of age and 3-MOB, where a p-value of less than 0.05 was considered to indicate statistical significance (see Supplementary Table 2 for more information).
These findings, in conjunction the machine learning models, justified our decision in pooling the imaging and metabolomics data between FDR and control subjects. Nonetheless, we performed sensitivity analyses by constructing separate prediction models for FDR and control subjects, as reported later.
Magnetic resonance spectroscopy -Liver lipids. The data for liver lipid accumulation was first analyzed with conditional random forest, a machine learning model that included all phenotype-, imaging-, protein www.nature.com/scientificreports www.nature.com/scientificreports/ and serum metabolomics data. As shown in Fig. 1 Panel A, the strongest predictor of liver lipids was amount of visceral fat followed by markers of degree of glucose tolerance, two of the metabolomics markers and insulin response despite the fact that all subjects were non-diabetic/IGT. Gradient boosting model demonstrated similar results as conditional random forest; i.e.; that visceral fat is the strongest predictor followed by the metabolomics markers 2-hydroxypalmitate and 2-hydroxystearate, as markers of lipid oxidation. Also, the marker of glucose tolerance, expressed as glucose levels at 60 min was a strong predictor, see Fig. 1 Panel A.
Liver fat according to computer tomography also revealed that 2-hydroxybutyrate/2-hydroxyisobutyrate as markers of lipid oxidation were the most important predictors, followed by visceral fat and imidazole propionate, see Fig. 1 Panel C.
Partial dependence analysis ( Fig. 2 Panel A) showed that increasing levels of visceral fat and marker of glucose tolerance were strong interacting factors. The partial dependence plot revealed that the interaction between markers of insulin sensitivity/resistance and visceral fat took place first at higher levels. Conditional random forest and gradient boosting for radiological examinations of the lipid accumulation in liver, heart and visceral fat -Relative variable importance by mean decrease accuracy and relative influence These figures (Panels A-C) display the relative importance of phenotype-, imaging-and metabolomic markers for distribution of visceral and ectopic fat according to magnetic resonance spectroscopyand imaging, using predictive machine learning models. Predictors that display a pronounced increase in relative importance (MDA or relative influence) compared to other predictors are strong predictors for the outcome. The strongest predictors identified from the machine learning models were included in generalized linear regression (Panel D) to demonstrate the risk for each unit increase for the most important predictors.
Magnetic resonance spectroscopy -cardiac lipids. MR spectroscopy of the heart ( Fig. 1 Panel B) showed for the amount of cardiac lipids that age was the strongest individual predictor followed by intensity of physical activity. The conditional random forest model and gradient boosting analysis showed relatively diverse results. However, the gradient boosting model had lowest RSME score at 1.67. Age was a strong predictor in the gradient boosting model followed by 3-methyl-2-oxobutyrate (3-MOB) as the strongest metabolomic predictor, and heredity for diabetes. As shown later, 3-MOB was also the strongest predictor of family history for T2D. The conditional random forest model validated the predictive effect of age and physical activity level, Fig. 1 Panel B. In order to properly assess the importance of heredity for diabetes, we constructed additional machine learning Figure 2. Random forest models for radiological examinations of liver, heart, visceral-and subcutaneous fat and abdominal cross section -Partial dependence plots for strongest predictors These figures (Panels A-F) are two-way partial dependence plots that show the dependence between radiological examinations of fat distribution and the most important predictors, marginalizing over the values of all other features. These are generated with conditional random forest models and enable us to visualize interactions among predictors. models with a binomial predictor for heredity called group (i.e. first-degree relatives vs. control subject), which resulted in similar outcomes.
The partial dependence analyses ( Fig. 2 Panel B) revealed minor interactions between the most important predictors for lipid accumulation in heart, according to the machine learning models. However, heredity for diabetes demonstrated strong interaction effects between age and visceral fat. Thus, like liver lipids, cardiac lipids were predicted by amount of visceral fat but physical activity and age were unexpectedly strong predictors. Linear regression showed that only age was a significant predictor for lipid accumulation in heart (RR 1.15; 95% CI, 1.05 to 1.26), Fig. 1 Panel D.
Magnetic resonance imaging -Visceral fat area. Visceral fat area ( Fig. 1 Panel C) was most strongly associated with other adipose tissue stores, in particular subcutaneous adipose cell size and amount followed by age and liver lipids while the strongest metabolomics predictor was again 2-hydroxypalmitate, a marker of lipid oxidation.
Partial dependence analysis ( Fig. 2 Panel D) showed that abdominal subcutaneous adipocyte size and amount of liver lipids were strong interacting predictors.
Gradient boosting ( Fig. 1 Panel C) validated the strong impact of adipocyte size, liver lipids and age. These findings were confirmed in predictive models for MRI -abdominal cross section (L4/L5) as seen in Fig. 3 Panel A. Thus, these data show that ectopic fat accumulation in the liver, subcutaneous adipose cell size and markers of lipid oxidation such as 2-hydroxybutyrate/2-hydroxyisobutyrate, are important predictors of each other. Linear regression for visceral fat showed that age was the only significant predictor (RR 1.04; 95% CI, 1.02 to 1.05).
Magnetic resonance imaging -Abdominal subcutaneous fat. In agreement with the analysis above, amount of subcutaneous adipose tissue was best predicted by subcutaneous adipocyte size followed by visceral fat, markers of insulin sensitivity ( Fig. 3 Panel B), and the fatty acid metabolites tetradecanedioate and 2-hydroxypalmitate were also important predictors ( Fig. 3 Panel B). However, the partial dependence analysis revealed that adipocyte size and visceral fat were not strong interacting predictors ( Fig. 2 Panel F) and also in the gradient boosting analysis ( Fig. 3 Panel B). The strongest metabolomics markers were 2-hydroxypalmitate and the dicarboxyl fatty acid tetradecaneoates which have previously been shown to be markers of elevated blood pressure and all-case mortality in the Twins UK and KORA cohorts and also to have effects on blood pressure in animal models 11 . Heredity of type 2 diabetes. We also performed predictive machine learning models for heredity of T2D since the cohort included 25 first-degree relatives. The random forest and gradient boosting models demonstrated that 3-MOB closely followed by waist-hip ratio and caprate, which is an ester of decanoic acid, were the strongest predictors for heredity of diabetes ( Fig. 4 Panel A). Stepwise modeling of the most important predictor in the previous model demonstrated that branched-chain amino acids (valine and isoleucine), acetoacetate and insulin sensitivity (OGTT S-Insulin t-30) were the strongest predictors of 3-MOB ( Fig. 4 Panel B). The metabolite 3-hydroxybutyrylcarnitine, which is associated with insulin resistance, type 2 diabetes and fatty acid oxidation in the liver, was also an important predictor of 3-MOB and acetoacetate ( Fig. 4 Panel B). Partial dependence plots revealed interaction effects between branched-chain amino acids and markers of lipid metabolism, such as the interaction between 3-hydroxybutyrylcarnitine and valine ( Fig. 4 Panel D).

Subcutaneous adipose cell size and insulin sensitivity (HOMA2-IR).
Adipose cell size is rarely measured in clinical studies but appeared as a strong predictor of cardiometabolic risk factors including amount of visceral fat and liver lipids (see Fig. 1 Panels A-C). Thus, we set out to identify predictors of adipose cell size. As shown in Fig. 5 Panels A-B, fat in visceral and subcutaneous depots were strong predictors followed by measures of insulin sensitivity and waist circumference. We constructed additional models as sensitivity analyses to evaluate the strongest predictor for abdominal subcutaneous adipocyte size in a model without imaging variables and waist circumference. This revealed that insulin sensitivity (euglycemic clamp data) was the most important predictor closely followed by mono-and dicarboxyl fatty acids such as trans-urocanate and tetra-decanedioates as the strongest metabolomics markers of subcutaneous adipose cell size.
Furthermore, we scrutinized predictors of HOMA2-IR; a widely used clinical marker of insulin sensitivity. The initial machine learning models revealed¸ as expected, that glucose tolerance and insulin sensitivity were the strongest predictors since they are used to construct the HOMA index. These were, therefore, excluded from the analysis. Under these conditions, subcutaneous adipocyte cell size was the most important predictor for HOMA2-IR and explained approximately 40-60% of overall predictability, Fig. 5 Panel B. The conditional random forest model for HOMA2-IR, not including radiological examination and serum insulin after oral glucose tolerance test, demonstrated similar importance for subcutaneous adipocyte cell size. 3-hydroxybutyrate was also strongly related to HOMA2-IR. Taken together, subcutaneous adipose cell size is an integrated marker of amount of ectopic fat and adipose tissue mass and, as such, an excellent marker of whole-body inulin sensitivity.
Relative contribution of imaging data and heredity for type 2 diabetes. In addition to the relative importance of each predictor, we also calculated the relative contribution, i.e. the percentage of each predictor explained in every machine learning model and, thereafter, calculated the average of these percentages for each predictor from the random forest and gradient boosting models. Figure 6 displays a mapped network between these predictors and how they are intertwined based on the relative contribution. Our machine learning analyses show that visceral fat, subcutaneous adipose tissue and waist-hip ratio are all strong predictors of adipose tissue mass and subcutaneous adipose cell size which, in turn, is a strong predictor for insulin sensitivity (HOMA2-IR). (2020) 10:9973 | https://doi.org/10.1038/s41598-020-66199-z www.nature.com/scientificreports www.nature.com/scientificreports/ Moreover, adipose tissue mass, most strongly visceral fat, is an important predictor for ectopic fat distribution in liver and heart. The metabolite 3-methyl-2-oxobutyrate (3-MOB), shown to be a marker of impaired fasting glucose (IFG) 12 and a strong predictor of ectopic fat accumulation. For heredity of type 2 diabetes, 3-MOB was a strong predictor along with glucose tolerance and insulin sensitivity. We performed stepwise modeling, which validated that BCAA and related metabolite (3-MOB) were strong predictors of heredity for type 2 diabetes.

Discussion
The purpose of this extensive study was to identify predictors of cardiometabolic risk profile based on different imaging analyses, glucose tolerance, insulin sensitivity, family history of diabetes and untargeted serum metabolomics. Family history is well known to be one of the most prominent risk factors for future development of T2D 13 . We sought to eliminate differences in baseline characteristics between control subjects and first-degree relatives. Therefore, all study participants were matched for BMI, blood pressure, waist circumference, fasting glucose and insulin levels, with the exception of age, which was significantly higher in first-degree relatives (6 years) Figure 3. Conditional random forest and gradient boosting for radiological examinations of an abdominal cross section, subcutaneous fat and liver fat according to magnetic resonance imaging and computer tomography -Relative variable importance by mean decrease accuracy and relative influence. These figures (Panel A-C) display the relative importance of phenotype-, imaging-and metabolomic markers for distribution of tissue in abdominal cross section, subcutaneous fat and liver fat according to magnetic resonance imaging and computer tomography. Strongest predictors were identified with machine learning models. Predictors that display a pronounced increase in importance compared to other predictors are strong predictors for the outcome. The regression estimates in Panel D shows the strongest predictors from the random forest models associated with risk increase and which are subsequently included in linear regression models to demonstrate the risk for each unit increase for those predictors.
It is clear that amount of visceral fat is the best marker of liver lipids, a strong marker of heart lipids and it was also a major marker of amount of subcutaneous adipose tissue and its adipose cell size. We have previously also shown that waist circumference, a well-established marker of visceral fat 14 , is strongly correlated to subcutaneous adipose cell size and the current data corroborate this 15 .
Interestingly, age was strongly associated with both heart, visceral and liver lipids. This is also consistent with previous findings showing that amount of visceral fat increases with age independent of BMI 14 . This increased visceral and ectopic fat accumulation may be caused by aging-associated cell senescence, which prevents the formation of new functional cells in the subcutaneous adipose tissue and leads to lipid spill-over 4,16 . Surprisingly we observed that degree of physical activity was positively related to amount of heart lipids, which is similar to what has been shown for lipids in skeletal muscles 17 . Thus, this may be a general effect of physical activity and unrelated to negative metabolic consequences. Conditional random forest and gradient boosting models for heredity of type 2 diabetes with stepwise modeling for the strongest predictor in preceding model -Relative variable importance by mean decrease of accuracy and relative influence. These figures (Panels A-D) display the relative importance of phenotype-, imaging-and metabolomic markers for heredity of type 2 diabetes and the predictors that are strongest in each preceding machine learning models as a stepwise modeling approach to identify the metabolite pathway. Predictors that display a pronounced increase in relative importance (MDA or relative influence) compared to other predictors are strong predictors for the outcome. The beta coefficients (95% confidence intervals) next to the strongest predictors in the random forest models are generated from generalized linear regression models, in order to demonstrate the risk for each unit increase. The asterisk (*) denotes that decreasing values for that predictor is associated with risk increase.

Scientific RepoRtS |
(2020) 10:9973 | https://doi.org/10.1038/s41598-020-66199-z www.nature.com/scientificreports www.nature.com/scientificreports/ Expansion of subcutaneous adipose cells is strongly associated with reduced whole-body insulin sensitivity, enhanced inflammation and it is also an independent predictor of future risk of developing T2D 2,18,19 . It is a consequence of inability to recruit new functional adipose cells 10 , which, in turn, is associated with a genetic predisposition for T2D, but, not for obesity 6,7 , and increased precursor cell senescence 4 . Several recent studies have also shown that individuals with genetic markers of insulin resistance are characterized by reduced subcutaneous adipose tissu 8,9 , which is expected to increase the adipose cell size and accumulation of ectopic lipids.
Interestingly, we also found that adipose cell size is a major predictor of whole-body insulin sensitivity measured as HOMA index or with euglycemic clamps and that visceral fat is a predictor of subcutaneous adipose cell size as well as a marker of glucose tolerance. Alfa-hydroxybutyrate (α-HB) was the strongest metabolomics predictor of HOMA insulin sensitivity. Alfa-hydroxybutyrate has also previously been shown as a marker of insulin resistance, IGT and progression to developing T2D 20,21 .
Glucose tolerance and degree of insulin sensitivity were strong predictors of family history of T2D. However, the strongest predictor was 3-methy-2-oxobutyrate (3-MOB), a key metabolite of branched-chain amino acid Figure 5. Conditional random forest, gradient boosting and partial dependence plots for adipocyte size and HOMA2-IR, with linear regression -Relative variable importance by mean decrease accuracy. Panels A to B display the relative importance of phenotype-, imaging-and metabolomic markers for adipocyte size and insulin sensitivity (HOMA), using machine learning models. Predictors that display a pronounced increase in importance compared to other predictors are strong predictors for the outcome. Panels C to D show partial dependence plots for the most important predictors identified by predictive machine learning models. Panel E shows risk increase for each unit increase according to linear regression models. *Group variable denotes a binomial predictor for heredity of type 2 diabetes, i.e., either first-degree relative or control subject.

Scientific RepoRtS |
(2020) 10:9973 | https://doi.org/10.1038/s41598-020-66199-z www.nature.com/scientificreports www.nature.com/scientificreports/ (BCAA) 22 and isoleucine as well as valine (not shown) were also strong predictors. We also found mannose as a marker of insulin resistance and future risk of developing both T2D and CVD 23,24 to be a predictor of family history and its phenotype. Detailed partial dependence plots also showed that isoleucine and a-HB strongly interacted to predict 3-MOB levels and this was also found for valine (data not shown). Previous work has shown that a-HB levels predict IGT while 3-MOB is more closely associated with IFG 20 . However, all our subjects had normal GTT and no IGT or IFG were included. This suggests that these biomarkers of family history actually predict inherent factors associated with family history and future risk of developing glucose intolerance.
Amount of liver lipids and heart lipids showed quite distinct differences in their predictors. Amount of visceral fat, lipid oxidation products, glucose tolerance and amount of abdominal subcutaneous fat and adipose cell size mainly predicted liver lipids. Heart lipids were also predicted by amount of visceral fat and liver lipids, but more strongly by age and degree of physical activity as discussed. In fact, age was an important predictor of amount of visceral fat as well as ectopic fat. This is likely a consequence of increased subcutaneous adipose tissue progenitor cell senescence 4,16 preventing recruitment of new cells and promoting lipid storage in other depots.
The strength of this study is the extensive phenotyping together with imaging data and metabolomics markers while a weakness is the limited number of individuals included. However, most previous studies have not been able to integrate all information as generated in this study. In Fig. 6, we have summarized our key findings of predictors and their apparent associations. It is clear that amount of adipose tissue, also in this non-obese cohort, and subcutaneous adipose cell size are central factors associated with both insulin sensitivity (HOMA and clamp data) and ectopic liver lipids. In fact, the most prominent predictor of HOMA, excluding glucose and insulin levels, was adipose cell size while a-HB was the most prominent metabolomics marker. Increased subcutaneous adipose cell size is associated with amount of adipose tissue and, in particular, visceral fat and waist circumference. Our study comprised exclusively of non-obese men, therefore, it is conceivable that these findings are imprecise for women.
Taken together, our current data show that subcutaneous adipose tissue mass and cell size, even in the non-obese state, are key predictors of whole-body insulin sensitivity as well as visceral and ectopic hepatic lipid accumulation. Family history of T2D is predicted by both OGTT and HOMA, but the most prominent marker was the BCAA metabolite 3-MOB as well as mannose and a-HB. It is likely that family history of T2D and its associated genetic effects are central for both the expanded adipose cell size and accumulation of ectopic fat, which are all components of the cardio-metabolic risk profile.

Study population and Methods
The local Ethical Committee at the Sahlgrenska Academy at the University of Gothenburg approved the study protocol. The study was performed in agreement with the Declaration of Helsinki. All subjects received oral and written information and gave their informed consent to participate. Study population. We recruited in total 25 first-degree relative and 28 control persons via newspaper advertisements and through earlier studies performed at the laboratory. Inclusion criteria in the first-degree relative group were male sex, at least one first-degree relative with a diagnosis of type 2 diabetes mellitus and general good health. Inclusion criteria in the control group were male sex, no first-degree relative with type 2 diabetes mellitus and general good health. Most recruited individuals had previously undergone examinations in the laboratory and had gained around 8% in body weight since previous investigations. We also recruited other individuals from previous recent investigations and they were placed on a supervised hypercaloric diet to also increase body weight with around 8% before inclusion in the study. This was done to ensure similar changes in body weight and hypercaloric diet intake in the groups. Subcutaneous adipose tissue biopsies were performed from the lower abdominal wall and processed for cell sizing as previously reported 10 . This study was approved by Figure 6. Relative importance between imaging, phenotype and metabolomics data, in individuals with-and without heredity for type 2 diabetes. The relative importance for predictors was ranked according to highest relative contribution of each predictor in both the random forest model and gradient boosting model. The size of the arrows does not indicate greater strength between the predictors and outcomes.

Scientific RepoRtS |
(2020) 10:9973 | https://doi.org/10.1038/s41598-020-66199-z www.nature.com/scientificreports www.nature.com/scientificreports/ the regional ethical committee (approvals 384-12 and T803-13). All general clinical investigation methods have been described previously 25,26 . clinical variables. Lifestyle factors were evaluated through a questionnaire filled out at the laboratory. Body weight and height, and waist and hip circumferences were recorded and BMI was calculated. The proportions of body fat and lean body mass were determined using bioelectrical impedance (single frequency, 50 kHz; Animeter, HTS, Odense, Denmark). Blood pressure was measured in a sitting position after a five minutes rest with a mercury sphygmomanometer.
To evaluate glucose tolerance status, fasting blood samples were drawn after 12 hours of fasting and were followed by an OGTT (75 g glucose orally). Samples for measurement of plasma glucose and serum insulin were drawn after 0, 30, 60 and 120 minutes.
Again after 12 hours of fasting, an intravenous glucose tolerance test (IVGTT) was performed to determine the first and second phases of insulin secretion. A bolus of glucose (300 mg/kg in a 50% solution) was given within 30 seconds into the antecubital vein. Samples for the measurement of plasma glucose and insulin (arterialized venous blood) were drawn at −5, 0, 2, 4, 6, 8, 10, 20, 30, 40, 50 and 60 minutes. The acute and the late insulin responses, i.e. incremental area under the insulin curve, (AIR, 0-10 minutes; LIR, 10-60 minutes) were calculated using the trapezoidal method.
In subgroup 1 the IVGTT was followed by a hyperinsulinemic euglycaemic clamp was initiated (insulin infusion: 240 pmol m −2 min −1 for 120 min) to evaluate insulin sensitivity as previously reported 27 . Whole blood glucose was clamped at 5.0 mmol/l for the next 120 minutes by infusion of 20% glucose at various rates according to glucose measurements performed at five minutes intervals (YSI, Yellow Springs Instrument Company, OH). Insulin sensitivity (M) was calculated as the mean glucose infusion rate during the last 30 minutes of the clamp adjusted for body weight, and M/I was calculated as the M-value corrected for steady-state insulin concentrations. In all subgroups, fasting plasma insulin and fasting plasma glucose from the OGTT were used to calculate a HOMA-IR index using the formula HOMA-IR = (fasting plasma glucose × fasting plasma insulin)/22.5 M and M/I were used to validate the HOMA-IR. Only the HOMA-IR value is reported here to assess insulin sensitivity. All metabolites were measured in serum subsequent to overnight fasting.
Plasma glucose was measured using standard laboratory methods (Department of Chemistry, Sahlgrenska University Hospital, Gothenburg, Sweden). Plasma insulin was measured at the University of Tübingen, Germany, by micro-particle enzyme immunoassay (Abbott Laboratories, Tokyo, Japan).
A subcutaneous abdominal adipose tissue biopsy was performed. The biopsies (approximately 20-30 mg) were obtained with a needle aspiration technique, from the paraumbilical region after local infiltrative anesthesia with lidocaine (20 ml, 0.5%) 28 .
Isolation of adipocytes was performed by initially washing the biopsied to remove traces of blood, followed by and treatment with collagenase (1 mg/ml) (Sigma, St Louise, MO, USA) for 60 minutes at 37 °C in a shaking water bath. Isolated adipocytes were filtered through a 250 mm nylon mesh and washed with fresh medium. Adipocyte cells were placed on a siliconized glass slide and 100 consecutive cell diameters were measured with a calibrated ocular and expressed as the average value in μm.
MRi and MRS for fat determination. MR imaging and localized 1 H-MR spectroscopy was performed using a 1.5 T MR-system (Intera/Achieva, software release 3.2) using the vendor´s 16 channel SENSE XL Torso coil (Philips Medical Systems, Best, The Netherlands). The software used included a research package enabling navigator triggered MRS and a field map based B 0 -shimming 29,30 . MRi to assess abdominal fat. The amount of intra-abdominal and subcutaneous fat was assessed at the level between the 4th and the 5th lumbar vertebra (L4/L5) using T1 weighted axial images acquired with a TE of 5.24 ms, TR of 91 ms, 80 degree flip angle, pixel size of 1*1 mm 2 and a 10 mm slice thickness.
Data processing was performed using an in-house developed segmentation program written in MatLab (MATLAB R2014b, The MathWorks Inc, USA). An intensity threshold value for fat signal was determined for each individual patient and the surface of intra-abdominal and subcutaneous adipose tissue was quantified. Inter-muscular fat, as well as muscle, bone and lean tissue was excluded. The fat fraction is reported as intra-abdominal to total volume, and subcutaneous adipose tissue to total volume ratios.
MRS to asses liver fat. Liver spectra were acquired in end expiration using point resolved spectroscopy (PRESS) with a TE of 30 ms and TR of 2700 ms. Four non-water suppressed signals and 32 water suppressed (CHESS) signals were acquired with 1.0 kHz spectral bandwidth and 1024 data points. The voxel (2.5 × 2.5 × 2.5 cm 3 ) was positioned within the right liver lobe and care was taken to exclude large intrahepatic blood vessels, bile ducts and abdominal adipose fat.
The liver MRS data were processed using the jMRUI software. In jMRUI the spectra were eddy current corrected, and residual water and base line were removed using a Hankel-Lanczos filter (HLSVD, a single decomposition method). For the non-water suppressed reference spectra all metabolite signals, except water, were removed using the HLSVD algorithm. In the processed spectra water (H 2 O, 4,7 ppm), methylene (CH 2 , 1.3 ppm) and methyl (CH 3 , 0.9 ppm) were quantified using the AMARES algorithm and all metabolites were corrected for T2 relaxation using T2 values from the literature [31][32][33] . The fat fraction(%), denoted MRS-liver lipids in results, is calculated as (CH 2 + CH 3 )/H 2 O. MRS to assess heart lipids. Cardiac MRS measurements were performed using PRESS with a TE of 35 ms.
The cardiac MRS data were processed using the jMRUI software. In jMRUI the spectra were eddy current corrected and residual water and base line were removed using HLSVD. In the processed spectra trimethylamines (TMA), creatin (Cr), methylene (CH 2 ), methyl (CH 3 ) and an additional lipid complex at 2.1 ppm were quantified using the AMARES algorithm. All metabolites were corrected for T2 relaxation using T2 values from the literature and the fat fraction (%), denoted as MRS cardiac lipids in results, was calculated as (CH 2 + CH 3 )/H 2 O. Statistical analysis. Our analysis of data involves baseline characteristics for phenotype-, imaging-and metabolomic variables for all study participants and according to hereditary status for diabetes, presented as mean ± SD (Table 1 and supplementary Table 1). Correlation between covariates in persons with first-degree relatives and control subjects were tested by Spearman correlation coefficient (supplementary figure 1). A general linear model were applied to test for associations while controlling for the potential confounders, i.e. body mass index, age and a variable termed group, denoting first-degree relative or control subjects, see supplementary Table 2 for more details.
predictive models. We constructed predictive models with Conditional Random forest and Gradient boosting to examine relative variable importance, i.e. predictive ability of a broad range of predictors for amount of ectopic-, visceral-and abdominal subcutaneous fat accumulation, assessed with various radiological examinations. Random forest and Gradient boosting are both machine learning algorithms.
conditional random forest. The random forest models were based on conditional inference trees, in order to reduce bootstrap sample bias and variable selection bias, resulting in more accurate measurements of variable importance, particularly for categorical predictors 34,35 . Random forest is a nonparametric machine learning model that consists of a collection of decision trees with random feature selection. Each regression tree is tested on a bootstrap sample of two-third of the original dataset for training instances, features and one-third for out-of-bag (OOB) measures. Conditional random forest has built-in mechanisms for estimation of test error and certainty in each prediction model by using OOB-error rate. Initially, internal-validation mechanism was utilized to construct conditional random forest models that minimalized the OOB-error rate. Thereafter, results from internal validation techniques were compared to k-fold cross validation with test and training dataset, using 3 to 5 iterations for various models turned out to be superior.

Gradient boosting models.
In addition to conditional random forest models, we constructed gradient boosting models to compare results from various models. Gradient boosting is another machine learning model that constructs ensembles of decision trees and measures variables importance in similar ways as conditional random forest models. However, the prediction is performed differently. In Gradient boosting each tree is grown on the residuals of the previous tree and prediction is subsequently accomplished by weighting the ensemble outputs of all the regression trees. In the gradient boosting models, we assumed Gaussian distribution for reducing the squared-error loss and the shrinkage factor applied to each tree was set to 0.001 36 . With these settings, the gradient boosting models showed rather similar predictions as the conditional random forest models.

Model building.
Hyperparameter optimization was performed for every machine learning model, using automated grid search. Parameters that were evaluated in the conditional random forest model included number of trees, predictors in each split, minimum sum of weights in a node to be considered for splitting and the proportion of observations needed to establish a terminal node. Machine learning models were based on hyperparameters that minimize root mean squared error (RSME). Grid search for the gradient boosting models included the parameters interaction depth, number of trees, shrinkage factor and number of observations in a terminal node.
Each conditional random forest models contains a unique subset of predictors since features with near-zero influence (negative MDA-score) and correlated predictors were excluded in the preliminary machine learning models. Feature selection for each model was based on subject matter knowledge, to improve signal extraction of predictors, improve model diagnostics and metrics (RSME). In the gradient boosting models, predictors with low relative influence were sequentially excluded until relative importance plot demonstrated stability for the strongest predictors. Each random forest and gradient boosting model were built using between 600 to 3,500 regression-or classification trees. Additional model diagnostics was performed with measurement of r-squared (R 2 ) on test sets for each model. However, due to small sample size, we noticed relatively large fluctuations in R 2 depending on partition size of test samples. Therefore, model building was performed to minimize RMSE and optimize R 2 values for each model.
Relative importance of predictors is presented as mean decrease of accuracy (MDA) and relative influence for conditional random forest and gradient boosting, respectively, a large relative importance indicates that the predictor is important, whereas a small MDA or relative influence value indicates that the predictor is less important for that outcome. partial dependence plots. Comprehensive model building revealed interactions between strong predictors for certain outcomes. Therefore, we constructed two-way partial dependence plots (Figs. 2, 4 and 5), based on random forest models, which demonstrates the interaction effects of varying values for the most important predictors. Partial dependence plots visualizes the relationship between predictors, whether it is linear, monotonous, more complex or if it'is increasing or decreasing values that is related with the outcome 37 .
Linear-and logistic regression models. The most important predictors identified by means of machine learning models were subsequently included in linear and logistic regression models. The regression estimates