Non-targeted urine metabolomics and associations with prevalent and incident type 2 diabetes

Better risk prediction and new molecular targets are key priorities in type 2 diabetes (T2D) research. Little is known about the role of the urine metabolome in predicting the risk of T2D. We aimed to use non-targeted urine metabolomics to discover biomarkers and improve risk prediction for T2D. Urine samples from two community cohorts of 1,424 adults were analyzed by ultra-performance liquid chromatography/mass spectrometry (UPLC-MS). In a discovery/replication design, three out of 62 annotated metabolites were associated with prevalent T2D, notably lower urine levels of 3-hydroxyundecanoyl-carnitine. In participants without diabetes at baseline, LASSO regression in the training set selected six metabolites that improved prediction of T2D beyond established risk factors over up to 12 years' follow-up in the test sample, from C-statistic 0.866 to 0.892. Our results in one of the largest non-targeted urinary metabolomics studies to date demonstrate the role of the urine metabolome in identifying persons at risk of T2D and suggest urine 3-hydroxyundecanoyl-carnitine as a biomarker candidate.

Type 2 diabetes mellitus (T2D) is a metabolic disease characterized by raised fasting glucose levels due to insulin resistance and impaired insulin production. It is a leading cause of cardiovascular disease, blindness and kidney failure 1 . The 2017 global estimate of 425 million persons with diabetes is projected to increase by 48% to 629 million by 2045 2 . A continuing challenge is the identification of persons at high risk of T2D, particularly in the absence of established risk factors such as obesity and poor diet 3,4 . This need is underscored by a 2017 survey by the charity Diabetes UK, where a top research priority for persons affected by T2D was to "identify people at high risk of type 2 diabetes and help to prevent the condition from developing" 5 . Another challenge has been the identification of currently unknown molecular mechanisms of T2D that could act as novel treatment targets 6 .
Scientific Reports | (2020) 10:16474 | https://doi.org/10.1038/s41598-020-72456-y | www.nature.com/scientificreports/
Non-targeted (or untargeted) metabolomics describes the assessment of small molecules (< 1,500 Daltons in molecular weight) in biological specimens and comprises a broad range of peptides, carbohydrates, lipids and nucleic acids. Non-targeted methods such as ultra-performance liquid chromatography coupled to quadrupole time-of-flight mass spectrometry (UPLC-QTOFMS) capture all metabolite signals detectable by the method at hand without a priori selection. In an electrospray ionization (ESI) source, effluents from the liquid chromatography system are nebulized at atmospheric pressure, and ionization occurs through the application of a strong electric field on the surface of the effluent droplets as they elute from the nebulizer. The size of the charged droplets diminishes as the formed molecular ions and molecular adducts travel towards the mass spectrometer for analysis, collision-induced dissociation, and mass detection. The accurate mass, mass spectra, and retention time of each molecular ion are matched to metabolites by comparison to internal and external standards or public databases [7][8][9] . Serum and plasma metabolomics have been used to discover biomarkers and improve risk prediction for insulin resistance and T2D [10][11][12][13] . Far less attention has been paid to the urinary metabolome.
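The accurate-mass matching step described above can be illustrated with a minimal Python sketch. It is not the annotation pipeline used in this study: the two-entry library, the 10 ppm tolerance and all function names are assumptions chosen for illustration. Only the formula C18H35NO5 for 3-hydroxyundecanoyl-carnitine is taken from the text.

```python
# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727646  # hydrogen atom minus one electron

def neutral_mass(formula_counts):
    """Monoisotopic mass of a neutral molecule given element counts."""
    return sum(MONOISOTOPIC[el] * n for el, n in formula_counts.items())

def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def match_feature(observed_mz, library, tol_ppm=10.0):
    """Return library names whose [M+H]+ m/z lies within tol_ppm (assumed tolerance)."""
    hits = []
    for name, counts in library.items():
        theoretical = neutral_mass(counts) + PROTON
        if abs(ppm_error(observed_mz, theoretical)) <= tol_ppm:
            hits.append(name)
    return hits

# Hypothetical mini-library; a real search would query a public database.
LIBRARY = {
    "3-hydroxyundecanoyl-carnitine": {"C": 18, "H": 35, "N": 1, "O": 5},
    "carnitine": {"C": 7, "H": 15, "N": 1, "O": 3},
}

mz_theory = neutral_mass(LIBRARY["3-hydroxyundecanoyl-carnitine"]) + PROTON
print(round(mz_theory, 4))
print(match_feature(346.2590, LIBRARY))
```

In practice, mass alone rarely identifies a metabolite uniquely, which is why the text also mentions fragment spectra and retention time as matching criteria.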
A genome-wide association study showed about a two-thirds overlap between urinary and plasma metabolite loci in the genome 14,15 . One study in ~3,900 healthy persons found correlations between five-year change in glycated hemoglobin levels and baseline levels of urinary metabolites such as betaine and trimethylamine 16 . A cross-sectional study reported 94 metabolites in plasma, urine or saliva samples that differed between persons with and without T2D 17 . We are unaware of any published study that uses large-scale non-targeted urinary metabolomics for biomarker discovery or risk prediction of incident T2D.
Here, we use non-targeted UPLC-MS urinary metabolomics in two community-based cohorts of > 1,400 Swedish adults to discover metabolites associated with prevalent T2D and to assess whether urinary metabolomics improves risk prediction of incident T2D beyond an established clinical risk score.

Results
We included 789 participants of the PIVUS study (108 prevalent cases of T2D) and 635 participants of the ULSAM study (89 cases of prevalent T2D). Figure 1 shows the study flow, and baseline characteristics are displayed in Table 1.
In the discovery sample PIVUS, 7 out of 62 preliminarily annotated metabolites measured in both cohorts were associated with prevalent T2D after adjustment for sex, age and urinary creatinine at a false discovery rate (FDR) < 0.05: 3-methylxanthine (odds ratio, OR, per standard deviation increase and 95% confidence interval, CI, 0.  Table 2).
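The FDR threshold used for the discovery step can be made concrete with a short sketch of the Benjamini-Hochberg procedure. The text does not state which FDR method was applied, so Benjamini-Hochberg is an assumption here, and the p-values below are invented purely for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical unadjusted p-values for five metabolites.
p = [0.001, 0.008, 0.039, 0.041, 0.60]
q = benjamini_hochberg(p)
# Indices of metabolites that would be carried forward to replication.
significant = [i for i, qi in enumerate(q) if qi < 0.05]
print(significant)
```

Note that the third and fourth metabolites have unadjusted p-values below 0.05 but do not pass the FDR threshold, which is the behaviour that motivates the correction.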
To assess associations with incident T2D, we combined both cohorts after excluding prevalent cases of T2D at baseline (n = 1,227) and randomly split the sample into a two-thirds training set (n = 818) and a one-third test set (n = 409). Over a maximum of 12 years' follow-up (mean 6.32 ± 3.1 years), there were 36 and 10 incident cases of T2D in the training and test sets, respectively. LASSO regression in the training set that forced cohort, age, sex, urinary creatinine and the FOS variables into the model selected six out of the 62 metabolites as the optimal parsimonious model to predict risk of T2D (C5H14S, indoleacrylic acid, sotalol, tranexamic acid, trans-ferulic acid, (3a,5b,7a,12a)-24-[(carboxymethyl)amino]-1,12-dihydroxy-24-oxocholan-3-yl-b-D-glucopyranosiduronic acid). In the holdout test set, the baseline model C statistic was 0.866 (95% CI, 0.786-0.946, Nagelkerke's  In contrast, calibration plots of observed and predicted risk indicated that while the baseline model was well calibrated, the baseline-plus-metabolites model showed signs of underestimation of risk (Fig. 3). This discrepancy may be due to the small number of cases, particularly in the test set, which resulted in low statistical power of the formal calibration test and does not allow reliable conclusions about the merits of the model.
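For readers less familiar with the C statistic reported above, the following sketch computes it for a binary outcome as the proportion of case/non-case pairs in which the case received the higher predicted risk. The function name and the six toy predictions are assumptions for illustration and are unrelated to the study data.

```python
def c_statistic(predicted_risk, outcome):
    """Concordance for a binary outcome; tied predictions count as half."""
    cases = [r for r, y in zip(predicted_risk, outcome) if y == 1]
    non_cases = [r for r, y in zip(predicted_risk, outcome) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for risk_case in cases:
        for risk_non_case in non_cases:
            if risk_case > risk_non_case:
                concordant += 1.0
            elif risk_case == risk_non_case:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))

# Hypothetical predicted risks and observed outcomes (1 = incident T2D).
risks = [0.9, 0.4, 0.35, 0.2, 0.1, 0.4]
observed = [1, 1, 0, 0, 0, 0]
print(c_statistic(risks, observed))
```

With only 10 cases in a test set, each case contributes a tenth of all comparisons, which helps explain the wide confidence interval around the reported estimate.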

Discussion
In 1,424 Swedish adults enrolled in two community-based cohorts, we discovered associations between prevalent T2D and lower urinary levels of 3-hydroxyundecanoyl-carnitine and the sodiated adduct of nonanoyl-carnitine (Supplementary Text). We also found indications of improved risk prediction for incident T2D over an average 6-year follow-up period after adding six urinary metabolites to an established diabetes risk score, although the improvement did not reach statistical significance. The small number of cases demands cautious interpretation of the prediction results for incident T2D.

Association between urinary 3-hydroxyundecanoyl-carnitine level and T2D. 3-Hydroxyundecanoyl-carnitine (C18H35NO5, HMDB0061637) belongs to the acylcarnitines, organic compounds in which a fatty acid is attached to carnitine by an ester bond and which are essential intermediates in fatty acid metabolism. This odd-numbered C11-carnitine occurs at relatively low abundance in the circulation and tissues compared with even-chain acylcarnitines. The principal origin of odd-numbered medium-chain acylcarnitines remains elusive; odd-chain acylcarnitines originate both from branched-chain amino acid catabolism and, to a lesser extent, from peroxisomal alpha-oxidation of fatty acids 18 . Despite the low abundance of odd-numbered acylcarnitines in biological matrices, the use of mass spectrometric methods has enabled the detection of C11-carnitine and other odd-numbered acylcarnitines in animal and human plasma [19][20][21] , urine 20,22 , as well as liver and kidney 23,24 . However, the distribution of odd-numbered and other acylcarnitines between tissues, plasma, and renal excretion remains poorly understood 25 . Whilst levels of various acylcarnitines in the circulation 18,19 and urine 20 have been associated with increased risk of T2D and insulin resistance 21 , there is a dearth of evidence linking the odd-numbered medium-chain C11-carnitine specifically to diabetes. In a comprehensive analysis including > 110 acylcarnitines in plasma and urine of leptin receptor-deficient (db/db) mice, an accumulation of plasma medium- and long-chain acylcarnitines was accompanied by a decrease in urinary odd-numbered (C7, C9, and C11) medium-chain acylcarnitine levels 20 . None of the other seven acylcarnitines (variants of C5, C6, C8 and C10-carnitine) among the 62 automatically annotated metabolites in our study were statistically significantly associated with prevalent T2D (Supplementary Text).
Our study is the first in human participants to report an association between lower levels of urinary C11-carnitine and prevalent T2D. Lack of power to detect associations with other carnitine metabolites cannot be excluded, as our sample size was limited. Annotation certainty of this metabolite in our sample at Metabolomics Standards Initiative (MSI) confidence level 2 is comparatively good (Supplementary Text and Supplementary Figures 1-3), although our inability to synthesize authentic standards for external validation of the annotation leaves some uncertainty.

Table 2. Associations between urinary metabolite levels and prevalent T2D in the discovery and replication samples in logistic regression adjusted for age, sex (for PIVUS only) and urinary creatinine per standard deviation unit increase in metabolite level. Metabolites associated at a false discovery rate of 5% in the discovery sample PIVUS were tested in ULSAM. The names reflect the initial, automated data-driven annotation as explained in the Methods section. The shown P values are unadjusted.
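Table 2 reports odds ratios per standard deviation increase in metabolite level. As a minimal sketch of how such an estimate is derived from a logistic regression coefficient, the code below scales a variable to standard deviation units and converts a coefficient and its standard error into an odds ratio with a Wald 95% confidence interval. The coefficient and standard error are invented values, not results from this study.

```python
import math

def standardize(values):
    """Scale values to mean 0 and (sample) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def odds_ratio_ci(beta, se, z=1.959964):
    """Odds ratio and Wald 95% CI from a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical coefficient for a metabolite already scaled to SD units.
beta_hat, se_hat = math.log(0.5), 0.1
or_est, ci_low, ci_high = odds_ratio_ci(beta_hat, se_hat)
print(round(or_est, 2), round(ci_low, 2), round(ci_high, 2))
```

Because the metabolite is standardized before fitting, an odds ratio below 1 reads directly as lower odds of prevalent T2D per one standard deviation higher urinary level.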

Figure 2. Associations of the replicated urinary metabolites and prevalent T2D in the combined sample (n = 1,424). Results from logistic regression adjusted for age, sex, cohort and urinary creatinine (red color) and with additional adjustment for BMI, HDL-cholesterol, triglycerides, systolic and diastolic blood pressure, hypertension and family history of diabetes (blue color). Error bars denote 95% CI around odds ratios per standard deviation increase in urinary metabolite level.

Associations between another urinary carnitine metabolite and T2D. Lower urinary levels of C-571 were associated with prevalent T2D both before and after additional adjustment for established T2D risk markers (Fig. 2). This metabolite was initially computationally annotated as N-jasmonoylisoleucine, but review of the spectral data strongly suggests that the signal is the sodiated adduct of nonanoyl-carnitine (Supplementary Text, Supplementary Figures 10-15). The precursor molecular ion (M + H) of nonanoyl-carnitine was not associated with any of the outcomes in our study, and the association of its sodiated adduct could, in our opinion, be a statistical artefact. We are therefore unable to further explore the possible biology behind this association but provide detailed information on the annotation in the Supplementary Text and Supplementary Figures.
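The relationship between a protonated ion and its sodiated adduct is simple arithmetic: the two differ by the mass of sodium minus the mass of hydrogen. The sketch below illustrates this under stated assumptions. The formula C16H31NO4 for nonanoyl-carnitine is derived here from carnitine plus nonanoic acid minus water and is not given in the text, and the 0.005 Da pairing tolerance and function names are hypothetical.

```python
# Monoisotopic masses (u).
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146, "Na": 22.9897693}
MASS_ELECTRON = 0.00054858

def adduct_mz(neutral_mass, adduct):
    """m/z of a singly charged [M+H]+ or [M+Na]+ ion."""
    cation = {"[M+H]+": MASS["H"], "[M+Na]+": MASS["Na"]}[adduct]
    return neutral_mass + cation - MASS_ELECTRON

def could_be_sodiated_pair(mz_low, mz_high, tol=0.005):
    """True if two m/z values differ by Na minus H within tol Da (assumed tolerance)."""
    return abs((mz_high - mz_low) - (MASS["Na"] - MASS["H"])) <= tol

# Assumed formula for nonanoyl-carnitine: C16H31NO4.
nonanoyl_carnitine = 16 * MASS["C"] + 31 * MASS["H"] + MASS["N"] + 4 * MASS["O"]
mz_protonated = adduct_mz(nonanoyl_carnitine, "[M+H]+")
mz_sodiated = adduct_mz(nonanoyl_carnitine, "[M+Na]+")
print(round(mz_protonated, 4), round(mz_sodiated, 4))
print(could_be_sodiated_pair(mz_protonated, mz_sodiated))
```

Two features that co-elute and differ by this mass are candidates for the same compound, which is the kind of check a manual spectral review can use to revise an automated annotation.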

Strengths and Limitations.
We report the first epidemiological study of non-targeted urinary metabolomics to assess the risk of prevalent and incident T2D in two independent community-based cohorts. Strict statistical controls for multiple testing, a discovery/replication design, over 10 years of follow-up in the ULSAM cohort, and the unbiased non-targeted metabolomics method are strengths of our study. Limitations include the limited power for the incident T2D analysis and annotation uncertainties. The ULSAM cohort included only men, whilst the PIVUS cohort had a balanced sex ratio (all analyses were adjusted for sex). Our study used deep-frozen urine samples collected several years before the UPLC-MS technology became available, necessitating analysis of spot urine samples in PIVUS and 24-h urine collections in ULSAM. Analyses were adjusted for type of sample collection and difference in urine concentration (using creatinine levels as a proxy), but the different sampling methods may have affected the results. In the absence of external validation and reanalysis of the samples (which were used up in the analysis), our annotation of metabolites remains unconfirmed. Future studies should strive for more controlled settings with regard to the collection of urine samples.

Conclusion.
In our metabolomics study in over 1,400 adults, lower urinary levels of 3-hydroxyundecanoyl-carnitine were associated with prevalent T2D. We were unable to assign a molecular identity to another T2D-associated signal, but provide extensive discussion of the mass spectral characteristics and possible identities. We report our complete results despite remaining annotation uncertainties as a pioneering effort to study non-targeted urinary metabolomics and T2D without a priori selection of potential metabolites or biomarkers of interest, and because our explanation of the analytical pipeline makes an innovative and informative contribution to the field of human metabolism research. The field of non-targeted metabolomics is young, and the growing availability of comparison structures in molecular databases will improve the identification of metabolites in the future.

Participants. Uppsala Longitudinal Study of Adult Men (ULSAM).
Between 1970 and 1973, ULSAM enrolled 2,322 (81.7%) of all 2,841 men born between 1920 and 1924 who were residents of Uppsala county, Sweden 26 . Regular biomedical assessments have been carried out ever since, as detailed here (https://www.pubcare.uu.se/ulsam/). The current study used data and a 24-h urine collection at age 77 years. Participants were followed up until assessment at 93 years of age or death according to the Swedish Death register. Urine metabolomics data were available from 635 of the 839 individuals who attended the assessment (individuals are missing because of missing urine samples or insufficient sample quality, as metabolomics was carried out in the 2010s on biobank samples obtained at assessment age 77 years in the early 1990s).

Prospective Investigation of the Vasculature in Uppsala Seniors (PIVUS).
In 2001, the PIVUS study (https://www.medsci.uu.se/pivus/) enrolled 50% (n = 1,016) of a random sample of Uppsala community residents aged 70 years with the aim of comparing different measures of arterial compliance 27 . The current study is based on the assessment at age 75 years, where spot urine samples were collected, and participants were followed until reassessment at 80 years of age or death. We included urine metabolomics data from 789 participants who had deep-frozen urine samples of sufficient quality available at the point of analysis in the 2010s.  29 was used to cluster features into spectra, interpretMSSpectrum 30 was used to infer the molecular ion, and MS-Finder 31 was used to annotate metabolites.

Non-targeted metabolomics.
Only annotated, quality-controlled metabolite features measured in both PIVUS and ULSAM were included in this study. Because this data-driven, non-manual annotation is liable to statistical artefacts, we refer to it as "preliminary/initial annotation" in the text. For all outcome-associated features, we went back to the original UPLC-MS data and carried out a manual in-depth review to verify or refute the preliminary annotation. We present these validation steps for the main results of this study in the Supplementary Text  in PIVUS) to test associations between each urinary metabolite (scaled to standard deviation units) and prevalent T2D at baseline. Urinary creatinine was included as a covariate because it was strongly associated with the dominant principal components in principal component analysis (implemented as part of the XCMS normalization steps), and to control for between-sample variation in urine concentration and sampling method (24 h versus spot collection). Metabolites associated at a false discovery rate (FDR) < 0.05 in the discovery sample PIVUS were tested in the replication sample ULSAM.
In part 2, we used LASSO (L1-regularised) logistic regression to select urinary metabolites that together improved risk prediction for incident T2D when added to the risk factors in the Framingham Offspring Study (FOS) diabetes risk score 32 . We combined both cohorts, excluded all cases of prevalent T2D at baseline and randomly split the dataset into a 2/3 training and 1/3 holdout test set. The training dataset was used to develop the LASSO model by tenfold bootstrapped internal cross-validation, and the test set was used only once to evaluate performance of the selected model with regard to risk discrimination (C statistic), calibration (plots of observed against predicted risk), goodness-of-fit (Hosmer-Lemeshow test) and explained variance (Nagelkerke's pseudo-R²).
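The idea of forcing established risk factors into a LASSO model while letting the penalty shrink only the metabolite coefficients can be sketched as follows. This is not the authors' R code, which is not shown in the text; it is a swapped-in proximal-gradient implementation in plain Python, and the toy data, step size, iteration count and penalty value are all assumptions for illustration. Column 0 stands in for a forced covariate and columns 1 and 2 for penalized metabolites.

```python
import math

def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_lasso_logistic(X, y, penalized, lam, step=0.1, n_iter=2000):
    """Logistic regression where only columns flagged in `penalized` get an L1 penalty."""
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals (predicted probability minus outcome) at the current estimate.
        resid = [sigmoid(intercept + sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        intercept -= step * sum(resid) / n
        for j in range(p):
            updated = beta[j] - step * grad[j]
            # Forced covariates take a plain gradient step; metabolites are shrunk.
            beta[j] = soft_threshold(updated, step * lam) if penalized[j] else updated
    return intercept, beta

# Hypothetical standardized data: [forced covariate, metabolite A, metabolite B].
X = [[-1.0, 0.8, 0.3], [-0.5, -0.2, -0.7], [0.0, 1.1, 0.5], [0.5, -0.9, -0.1],
     [1.0, 0.4, -0.4], [1.5, -1.2, 0.6], [-1.5, 0.1, 0.2], [0.2, 0.9, -0.6]]
y = [0, 0, 1, 0, 1, 1, 0, 1]
PENALIZED = [False, True, True]

intercept, beta = fit_lasso_logistic(X, y, PENALIZED, lam=10.0)
print(beta)
```

With a very large penalty, the metabolite coefficients are exactly zero while the forced covariate keeps a non-zero coefficient; in a real analysis the penalty would be chosen by cross-validation in the training set rather than fixed in advance.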
To develop the model in the training set, we forced cohort status and the FOS variables (age, sex, parental history of diabetes, body mass index, blood pressure, fasting glucose, HDL-cholesterol, triglycerides) into the model and allowed free shrinkage on all 62 urinary metabolite regression coefficients. Analyses were carried out in R version 3.3.3.

Study approval. All participants provided written informed consent. The study was approved by the

Data and resource availability. Individual level data from ULSAM and PIVUS are not deposited in the public domain, as existing ethical permits and Swedish/EU data protection regulations do not allow this. Full datasets are made available to researchers who meet the criteria for confidential data access as stipulated by participant informed consent and institutional review board/ethics committee permission at Uppsala University (Uppsala, Sweden). Data access in ULSAM is granted through the Interdisciplinary Collaboration Team on Uppsala Longitudinal Studies (ICTUS; https://www2.pubcare.uu.se/ULSAM/res/proposal.htm; contact: vilmantas.giedraitis@pubcare.uu.se). Data from the PIVUS study can be applied for at the PIVUS steering committee (https://www.medsci.uu.se/pivus/; contact: lars.lind@medsci.uu.se).
De-identified raw mass spectrometry data (without phenotype or other identifying information) and the analysis code can be obtained without prior ethical or legal approval from the main author (christoph.nowak@ki.se).

Data availability
The authors report that, for approved reasons, some access restrictions apply to the data in this study. Individual level data from ULSAM and PIVUS are not deposited in the public domain, as existing ethical permits do not allow this. Full datasets are made available to researchers who meet the criteria for confidential data access as stipulated by participant informed consent and institutional review board/ethics committee permission at Uppsala University (Uppsala, Sweden). Data access in ULSAM is granted through the Interdisciplinary Collaboration Team on Uppsala Longitudinal Studies (ICTUS; https://www2.pubcare.uu.se/ULSAM/res/proposal.