Characterization of a metabolomic profile associated with responsiveness to therapy in the acute phase of septic shock

The early metabolic signatures associated with the progression of septic shock and with responsiveness to therapy can be useful for developing target therapy. The Sequential Organ Failure Assessment (SOFA) score is used for stratifying risk and predicting mortality. This study aimed to verify whether different responses to therapy, assessed as changes in SOFA score at admission (T1, acute phase) and 48 h later (T2, post-resuscitation), are associated with different metabolite patterns. We examined the plasma metabolome of 21 septic shock patients (pts) enrolled in the Shockomics clinical trial (NCT02141607). Patients for which SOFAT2 was >8 and Δ = SOFAT1 − SOFAT2 < 5, were classified as not responsive to therapy (NR, 7 pts), the remaining 14 as responsive (R). We combined untargeted and targeted mass spectrometry-based metabolomics strategies to cover the plasma metabolites repertoire as far as possible. Metabolite concentration changes from T1 to T2 (Δ = T2 − T1) were used to build classification models. Our results support the emerging evidence that lipidome alterations play an important role in individual patients’ responses to infection. Furthermore, alanine indicates a possible alteration in the glucose-alanine cycle in the liver, providing a different picture of liver functionality from bilirubin. Understanding these metabolic disturbances is important for developing any effective tailored therapy for these patients.


Metabolic fingerprinting by untargeted metabolomics. A rapid untargeted analysis by flow
injection-TOF-MS was done to screen for metabolic features significantly characterizing the responsiveness (R) group and non-responsive (NR) group to therapy in septic shock patients. Statistical analyses on the species identified from the untargeted approach showed that at T1 the two groups were similar and most of the differences were seen at T2. None of the species identified had a significant difference between R and NR at T1, except 4 species: stearic acid was lower in NR, while pyruvic acid, lactic acid and histidine were higher (Fig. 1). The changes in peak intensities from T1 to T2 were verified in the two groups separately and then compared (Table 3). There was a general increase in circulating essential aminoacids such as arginine, tyrosine, threonine and lysine at T2 in R and NR, but only lysine and threonine rose significantly in both groups. Similarly, the abundance of acetylcarnitine was significantly lower in R and NR (Table 3). Only NR showed a significant reduction in circulating fatty acids, mainly saturated and monosaturated (myristic acid, palmitoleic acid, palmitic acid, oleic acid and stearic acid). The endogenous kynuramine, derived from tryptophan, increased with time in both groups, although significantly only in R. The trend (Δ = T2 − T1) showed three species significantly differed between the two groups: creatinine decreased in R and increased in NR, while myristic acid and oleic acid significantly decreased in NR only (Fig. 2).
Metabolic profiling by targeted metabolomics. We applied a mass spectrometry-based quantitative metabolomics profiling to unambiguously identify and quantify glycerophospholipids, aminoacids, acylcarnitines and biogenic amines in the plasma of the study subjects.
Similarly to the untargeted metabolomics approach, univariate statistical analyses showed that most of the differences in metabolite levels arose at T2.
Only four metabolites had significant differences in concentrations in the two groups at T1 (Supplemental Table S1 and Figure S1), and 23 metabolites at T2 (Table 4). At T2 the plasma levels of six species of lysophosphatidylcholines (lysoPCs), seven diacyl-phosphatidylcholines (PCaa), two acyl-alkyl phosphatidylcholines (PCae), two long-chain sphingomyelins (SM) and glutamic acid were lower in NR than in R, whereas there was greater abundance in NR for aminoacids such as alanine, methionine, phenylalanine and histidine (Table 4 and Fig. 3).
No metabolites changed significantly in the NR group from T1 to T2, whereas 54 significantly changed in R. In the R group eight species of lysoPCs, seventeen PCaa, twenty-two PCae, eleven SM, seven amino acids (AAs) and two biogenic amines significantly increased from T1 to T2, while histidine, creatinine and taurine decreased significantly. For 38 metabolites the trends (Δ = T2 − T1) were different between R and NR (Table 5).
Moreover, kynurenine, a product of the trypthophan catabolism increased in the NR group significantly more than R group as shown in Fig. 4.
Regression analysis for targeted metabolomics data. As combinations of features can give more information than features considered singly, we used classification models with the aim of identifying a set of metabolites that are mostly associated with the target class, i.e. patients not responsive to therapy (NR). The coefficients of the models obtained are reported in Supplemental Table S2. The interpretation of the coefficients in the logistic regression it is not trivial. If we express the odd-ratio as exponential of linear combination of the independent variables, we can say that if the coefficient β i is positive then the increase of the feature x i will be associated with the increase of the odd ratio, i.e. the probability to belong to class 1 (NR) is higher than to class 0 (R), given all other x j variables being equal. On the contrary, if the coefficient β i is negative then the increase of the feature x i will be associated with the decrease of the odd ratio, i.e. the probability to belong to class 1 is lower than to class 0.
Three metabolites were selected in all models: PC ae C40:2, PC ae C38:0 and alanine. Figure 5A shows the coefficient values of the model built according to the criterion of minimal deviance on the first 20 ranked features. All the obtained models correctly classify the observations in the testing set so the performances are better than the ones from the metabolites considered individually for patient classification (the average AUC of the metabolites is below 0.8, as shown in Supplemental Figure S2).
Regression models for targeted and untargeted metabolomics data. We built the integrated models by using 10 features from untargeted metabolomics data and the first 20 ranked metabolites from targeted analysis, as explained in the Methods section. We can notice that the set of features selected includes again lysoPCs, PCs and alanine. Moreover, six further species not measured by targeted analysis entered in the models: stearic acid, palmitoleic acid, palmitic acid, oleic acid, myristic acid and citric acid. The coefficients of the models are reported in Supplemental Table S3. Figure 5B shows the coefficients of the model built according to the criterion of minimal deviance on the first 20 ranked features. We can notice that alanine, PC ae C38:0, PC aa C38:1, myristic acid and palmitoleic acid are selected by all models. It is worth to underline that PC ae C38:0 and alanine have coefficients with the same sign as the coefficients of the models built on targeted metabolomics data only.

Discriminant Analysis.
The coefficient values of the LDA models and the VIP scores of the PLS-DA models built on the first 10 and 20 ranked features after mRMR are reported in Tables 6 and 7 for targeted metabolomics and for targeted plus untargeted metabolomics data respectively. We cannot use the entire subset of 30 features due to the lower number of observations (i.e. 21 patients only). In fact, the computation of the boundary region requires the covariance matrix would be invertible and this is not the case.
In the targeted metabolomics model, it is worth to underline that PC ae C38:0, which already played an important role in the regression models, occupies the first position in the VIP ranking when considering 20 features. Similarly, when considering PC ae C40:2, we can notice that it is in the first position when considering 10 features. This latter metabolite also has the highest score in the integrated model. Three-dimensional PLS-DA score plots on 20 features for the two models are shown in Fig. 6 (targeted metabolomics only in panel A; integrated model of targeted and untargeted metabolomics in panel B). In both cases, only one subject (i.e. NR in the targeted model and R in the integrated one) was misclassified by the models.
Plasma level of sPLA2-IIA. As shown in Fig. 7A, sPLA2-IIA plasma levels were significantly higher in NR than in R patients at T2. To note that responders showed a markedly significant reduction of sPLA2-IIA from T1 to T2. We also compared the sPLA2-IIA variation overtime, expressed as Δ = T2 − T1 (panel B) and sPLA2 -IIA decreased significantly in R only. Some NR patients reduced the level of sPLA2-IIA, however, the population trend was not significant and clear as in R.

Discussion
Improvement of organ function as assessed by a drop in SOFA score in the first days of sepsis and septic shock is associated with better outcomes 19,20 , but the mechanisms behind organ improvement remain to be fully elucidated. We made a comprehensive metabolomics study of septic shock patients stratified as responders and not responders to standard therapy on the basis of the changes in SOFA score within the first 48 hours after the enrollment.
To the best of our knowledge, this is the first study to show that plasma metabolome changes are associated with initial responsiveness to therapy in septic shock patients.
Combining untargeted and targeted metabolomics methods by collecting data for fast untargeted MS data acquisition and high-resolution MRM transitions for targeting multiple metabolites, we obtained a wider picture of patients' metabolic states and their metabolic trajectory during the first 48 hours after ICU admission. Univariate analysis and the classification models confirmed that NR group presented an overall lipidome alteration, as previously reported 14,15,18,[21][22][23] .
LysoPCs have a very complex role in metabolism. They are primarily generated by phospholipase A2 (PLA2) enzyme activity, and like this enzyme, they have a direct role in toxic inflammatory responses. Low plasma lysoPC levels have been noted in sepsis patients and systemic treatment with lysoPCs has proved to be therapeutic in rodent models of sepsis and ischemia 24 .
These observations suggest that elevation of plasma levels of these lipids can actually help relieve serious inflammatory conditions. Cunningham et al. 24 reported that specific lysoPCs act as uncompetitive product inhibitors of plasma secreted PLA2 enzymes (sPLA2s), especially under conditions of elevated enzyme activity, thus providing a feedback mechanism for the anti-inflammatory effects of these compounds 25 . Indeed, in the present work we showed that the level of sPLA2-IIA significantly decreased together with an increase of lysoPC species (i.e. lysoPC C16:00 and C18:0) in R patients only. This result is in line with other experimental evidence that reduction of sPLA2-IIA may slow down the inflammatory cascade and increase the probability of responsiveness to therapy 26,27 (see also Figure S5).
The reduction in circulating lysoPC in NR patients may simply reflect their enhanced conversion to lysophosphatidic acid (LPA), which induces a multitude of cellular responses through its action on immunologically relevant cells 28 . Conceivably the low lysoPC may also promote an excessive immune response, with detrimental effect in NR patients 21,29 . Low circulating levels of lysoPC 16 and 18 species have also been reported in inflammatory liver disease [30][31][32] . NR patients have also showed a marked decrease in PC species, which originate in the liver. The imbalance of lysoPC/PCs cycling suggests that hepatic homeostasis and function is compromised even before any clinical manifestation, and bilirubin alone cannot give a clear picture of the liver's condition.
In addition, NR patients had lower levels of PC species containing long-chain polyunsaturated fatty acids (LCPUFAs), such as PC aa C38:6, PC aa C36:6, PC aa C40:5, with further elongation/desaturation products. This profile agrees with our previous finding of the different composition of PC species as potential metabolic determinants of mortality in septic shock 14 . Here again we can speculate that, since LCPUFAs reduce T-cell activation and dampen inflammation 33 , a decrease in PC-containing LCPUFAs might hamper their protective effects, including a concerted action of either withdrawing pro-inflammatory eicosanoids or incrementing anti-inflammatory eicosanoids. Eicosanoids and pro-resolving lipid profiles have been recently correlated with survival and clinical outcome in sepsis 34 .
The multivariate models showed that less change in plasmalogen concentrations (plasmenylcholines PC ae C44:6, PC ae 40:2, PC ae 40:5, plasmanylcholine PC ae 38:0), in lysoPC species (e.g. saturated lys-oPC C16:0, C18:0), and in fatty acids in combination with larger increment of alanine were associated with non-responsiveness ( Fig. 5 and Table 7). Plasmalogens serve as endogenous antioxidants, mediators of membrane structure and dynamics, storage for polyunsaturated fatty acids and lipid mediators 35 . Raising plasmalogen levels protects human endothelial cells during hypoxia 36 . Reduced plasmalogens abundance in NR might reflect an increased oxidative imbalance probably due to an excessive systemic inflammatory response with a resulting high level of oxidative stress. A low plasmalogen level has been reported as a surrogate marker of oxidative stress in elderly septic patients 37 . Furthermore, an exaggerated systemic inflammatory response in NR would be in accordance with the high levels of kynurenine, supporting the role of an accelerated tryptophan catabolism along the kynurenine pathway in sepsis outcome 14,38 .
A novelty of this study is the emerging role of alanine. Alanine is a gluconeogenic amino acid and plays a key role in the glucose-alanine cycle, a series of reactions in which amino groups and carbons from muscle are transported to the liver. When muscles degrade amino acids for energy needs, the resulting nitrogen is transaminated to pyruvate to form alanine. This is done by the enzyme alanine transaminase, which converts L-glutamate and pyruvate into α-ketoglutarate and L-alanine. The resulting L-alanine is shuttled to the liver where the nitrogen enters the urea cycle and the pyruvate is used to make glucose. Enhanced elaboration of glucose by the liver (hepatic gluconeogenesis) is a prominent feature of the solid organ response to injury and provides fuel to the cellular elements of the inflammatory response. Indeed, the high respiratory exchange ratio for carbohydrate oxidation is the proof that fat oxidation requires more oxygen than carbohydrate oxidation to produce ATP, making glucose more efficient as a fuel source. In this regard, it is well known in sepsis that circulating cytokines and catecholamines cause a shift in metabolism towards the stress state: activation of glycogenolysis and hepatic gluconeogenesis, activation of hepatic lipolysis, increase in muscle protein catabolism and high production of lactate level 39 .
The higher plasma alanine in NR may be a sign of lower hepatic capacity for conversion of alanine to glucose. The higher levels of pyruvic acid and lactic acid in NR patients at T2 (Fig. 1) seem to further support this interpretation.
We recovered also the data of liver functionality markers from the medical records of our patients (Supplemental Table S4 and Supplemental Figures S6 and S7). The values at the two time points of aspartate transaminase (ASAT or AspAT or AAT), alanine transaminase (ALAT or ALT), alkaline phosphatase (AP or ALP), gamma-gamma transaminases (gGT), albumin and total bilirubin were not significantly different between the two groups. Only one patient in R group and one patient in NR group were diagnosed as moderate/severe liver dysfunction at the enrollment. The inflammatory response affects organs functionality, liver included, and the high variability in the values distribution can be easily explained by the shock condition and by antibiotics therapy.
Our hypothesis is that alanine plays a role in the energy metabolism and this pathway could be affected by liver dysfunctionality, but this is a further factor to be investigated in ad-hoc clinical trial.
The results presented here highlight biological pathways that could have a clinical impact on septic shock progression and management. Animal experiments are now in progress to understand the time course of metabolome change in this condition better.
We are aware of the risk of overfitting the classifier model to our limited set of subjects in this investigation, despite attempts to minimize these effects with the statistical methods used. Furthermore, these analyses could not take into account all the possible confounding factors such as different renal and hepatic functions, type of nutrition (parental or enteral), and latent insulin resistance.

Conclusion
In conclusion, the data presented here reinforce the emerging evidence that lipidome alterations are important in the individual patient's response to infection. Changes in the levels of metabolites over time can distinguish a positive response to therapy. Under conditions of severe inflammatory stress and subsequent elevation of PLA2 enzymes activity, elevation of circulating levels of lysoPCs may promote the consequent inhibition of PLA2 enzymes, thus favoring cytoprotection. The emerging role of alanine suggests a different approach for monitoring hepatic function, which will be more specific than bilirubin. Further studies should investigate whether such metabolic dysregulation could be exploited for a more effective targeted therapy.

Material and Methods
Study design, patients and clinical data. This is an ancillary study from the multicenter prospective observational trial named Shockomics (see ClinicalTrials.gov Identifier NCT02141607). Details of the protocol are described in the work of Aletti et al. 40 . The study was approved by the Geneva regional research ethics committee (Commission cantonale d'éthique de la recherché, President Prof. Bernhard Hirschel, study number 14-041).

Responsive to therapy (R)
Not responsive to therapy (NR)  Between October 2014 and December 2015, patients admitted with septic shock to the 38-bed mixed ICU of Geneva University Hospital were screened for inclusion criteria according to researchers' availability. We included adults (>18 years old) with an admission SOFA score ≥6 and arterial lactate ≥2 mmol/l. Patients with a high risk of death within the first 24 hours after admission, systemic immunosuppression, hematological diseases, metastatic cancer, pre-existing dialysis, decompensated cirrhosis or who had received more than 4 units of red blood cells or any fresh frozen plasma before ICU admission were excluded 40 . Informed consent was obtained from patients or proxies.
Patients were managed by the clinical care team according to international guidelines 41 . Blood samples collected within 16 hours of ICU admission (T1, acute-phase) and 48 hours after enrollment (T2, post-resuscitation phase) were processed for metabolomics analysis 40 . Figure 2. Untargeted metabolomics. Metabolites whose change in peak intensity from T1 to T2 in the two groups is statistically significant. Box-plots in the top right corner show differences in metabolite peak intensity between T1 and T2 expressed as delta (Δ = T2 − T1). We did the Wilcoxon rank-sum test for the delta of the two groups and Wilcoxon signed rank between T1 and T2 in each group separately. Significant differences are marked with *(p-value < 0.05).  Table 4) significantly differing in the two groups at T2 as example. Distributions are shown as box-plots where the central line is the median concentration, the edges of the box are the 25th and 75th percentiles and the outliers are defined as 1.5 times the interquartile range and highlighted by +.
SCIeNtIFIC RePoRts | 7: 9748 | DOI:10.1038/s41598-017-09619-x We used the SOFA score to classify patients in two groups according to their responsiveness to early resuscitation as responders (R) or non-responders (NR). Patients with a SOFA score at T2 higher than 8 and no drop of at least 4 points in their SOFA scores from T1 to T2 (ΔSOFA T2−T1 ) were classified NR.
Survival at 28 days after ICU admission for patients surviving hospital discharge was assessed by consultation of the Geneva Canton death registry and by telephone call to the patient/proxies.

Untargeted metabolomics by Flow Injection Analysis-Time-of-Flight Mass Spectrometry (FIA-TOF-MS). Plasma samples.
Metabolites were extracted by adding four volumes of cold methanol to the plasma sample (10 μL); samples were vortexed and incubated at −20 °C for 1 h. They were then centrifuged 10 min at 14,000 × g, and the supernatant was collected, dried in a SpeedVac and resuspended in 50 μL of 0.1% formic acid 42 . A portion (15 μL) of the extract was analyzed by mass spectrometry.
Flow Injection-TOF MS/MS. Analysis was done on an Agilent 1290 infinity Series coupled to an Agilent 6550 iFunnel Q-TOF mass spectrometer (Agilent, Santa Clara, CA) equipped with an electrospray source operated in negative and positive mode. The flow rate was 150 μL/min of mobile phase consisting of isopropanol/water (60:40, v/v) buffered with 5 mM ammonium at pH 9 for negative mode and methanol/water (60:40, v/v) with 0.1% formic acid at pH 3 for positive mode. Reference masses for internal calibration were used in continuous infusion during the analysis (m/z 121.050873, 922.009798 for positive and m/z 11.9856, 1033.9881 for negative ionization). Mass spectra were recorded from m/z 50 to 1100. Source temperature was set at 320 °C with 15 L/min drying gas and nebulizer pressure of 35 psig. Fragmentor, skimmer, and octopole voltages were set to 175, 65, and 750 V, respectively. MS/MS fragmentation patterns of statistically significant features were collected and used to confirmed metabolite identity.

MS Data
Processing. All steps of data processing and analysis were done with Matlab R2016a (The Mathworks, Natick) using in-house developed script following the workflow proposed by Fuhrer 12 . Centroid m/z lists were exported to csv format. Briefly, in this procedure, we applied a cut-off to filter peaks of less than 500 ion counts for negative and 1000 ion counts for positive ionization to avoid detection of background noise. Centroid m/z lists from different samples were merged to a single matrix by binning the accurate centroid masses within the tolerance given by the instrument resolution (about 10 ppm). The output m x n matrix contains the m peak intensities of each mass for the n analyzed samples. Because mass axis calibration is applied online during acquisition, no m/z correction was applied during processing to correct for potential drifts.  Metabolite identification. A total of 14001 and 2190 metabolite masses were measured as peak intensities in positive and negative ion mode respectively. Given the large number of masses measured, we ran preliminary statistical analyses to select only the most significant ones for subsequent metabolite identification. Details of the statistical workflow are reported in Supplemental methods. Significant metabolic species were then identified by database searches (METLIN, http://metlin.scripps.edu; HMBD, http://www.hmdb.ca/) in positive and negative ionization (Table S4).
Targeted plasma metabolomics analysis. A targeted quantitative approach using a combined direct flow injection and liquid chromatography (LC) tandem mass spectrometry (MS/MS) assay (AbsoluteIDQ 180 kit, Biocrates, Innsbruck, Austria) was applied for the metabolomics analysis to EDTA-plasma samples stored at −80 °C, as previously reported by our group 7 . The method of AbsoluteIDQ p180 kit has been proved in conformance with the FDA Guideline 'Guidance for Industry-Bioanalytical Method Validation' , which implies proof of reproducibility within a given error range. The method combines derivatization and extraction of analytes with the selective mass-spectrometric detection using multiple reaction monitoring (MRM) pairs. Isotope-labeled internal standards are integrated into the platform for metabolite absolute quantification. This strategy allows simultaneous quantification of 186 metabolites (40 amino acids and biogenic amines, 40 acylcarnitines, 90 glycerophospholipids, 15 sphingomyelins, 1 monosaccharide). For data preprocessing, analytical specification and statistical analyses refer to Supplemental Methods.  Table 5). Box-plots in the top right corner show the differences in metabolite concentrations between T1 and T2, expressed as delta (Δ = T2 − T1). We did the Wilcoxon rank-sum test for the delta of the two groups and Wilcoxon signed rank between T1 and T2 in each group separately. Significant differences are marked with *(p-value < 0.05).

Continued
A metabolite was excluded from further analysis if its concentration did not meet all the following criteria: (1) fewer than 20% of missing values (non-detectable peak) for each quantified metabolite in each experimental group (2) 50% of all sample concentrations for the metabolite had to be above the limit of detection (LOD) 7 . In total, 130 of the 186 metabolites were used for statistical analysis. All the measurable metabolites are listed in Supplemental Table S5.
Determination of sPLA2-IIA levels. sPLA2-IIA levels in plasma were detected by using the sPLA2-IIA (human type IIA) ELISA kit (Cayman Chemical, USA) according to manufacturer's instructions. Concentrations in plasma were tested in duplicate and determined against standard curves at wavelength of 450 nm (Tecan Infinite M200 Plate Reader, Tecan Trading AG, Switzerland).
Statistical analyses. For those species identified by untargeted metabolomics and also quantified by targeted approach, we verified that their peak intensities and concentrations were correlated using the Spearman correlation analysis (see Supplemental Figure S3). For the identified metabolites, we examined the ability to separate the two groups of each metabolite individually by computing the area under the ROC curve, applying leave-one-out cross-validation (CV) (Supplemental Figure S4). Details of the statistical analysis for the comparisons of peak intensities are given in the Supplemental Methods.
We compared the metabolite concentrations measured by the targeted approach at T1 and T2 of the R and NR groups by the Wilcoxon rank-sum test. Changes in metabolite concentrations from T1 to T2 were compared separately for the R and NR groups by the Wilcoxon signed-rank test. Finally, for each metabolite, we compared the time-trend changes in metabolite concentrations (i.e. ∆ = T2 − T1) between the two groups by the Wilcoxon  Table 5. Metabolite concentrations (µM) from T1 to T2 in the two groups. Significant differences between T1 and T2 are marked with *(Wilcoxon sign-rank test p < 0.05), § marks differences in the delta between R and NR (Wilcoxon rank-sum test p < 0.05). Metabolite abundance changed significantly from T1 to T2 in the NR group.
rank sum test. To deal with the large number of statistical comparisons, we also computed the false discovery rate (FDR), assessed after the bootstrapping procedure. The sample size was increased from 14 to 20 subjects for the R group and from 7 to 10 for NR by bootstrapping with replacement, for a total of 30 observations. The bootstrapping procedure was only used for the FDR assessment. Results were considered significant when p < 0.05 and FDR < 0.15. For the peak intensities, we also examined the ability to separate the two groups of each metabolite individually by computing the area under the ROC curve, applying leave-one-out cross-validation (CV) (Supplemental Figure S4).
Finally, we compared sPLA2-IIA plasma levels between R and NR by means of Wilcoxon rank-sum test both at T1 and T2 and for the ∆ = T2 − T1. We also evaluated changes in sPLA2-IIA concentrations from T1 to T2 within the same group applying the Wilcoxon signed-rank test.
Multivariate analyses. Targeted metabolomics analysis. The aim of our model was to classify NR patients. We built the model on metabolite concentration changes from T1 to T2 (Δ = T2 − T1). The concentrations are highly correlated and, when the number of observations (21 patients) is much lower than the number of features (130 metabolites) as in our case, one must filter and reduce the number of features entering the model. We adopted the method proposed by Peng et al. 43 , which is the minimal-redundancy-maximal-relevance (mRMR). This algorithm sorts the features according to their relevance to the outcome (maximum relevance criterion) and their redundancy (minimum redundancy criterion) in relation to the other variables. The ranking is based on the mutual information between the outcome and each feature and on the mutual information between  Table 6. VIP scores of PLS-DA and coefficients of LDA for the targeted metabolomics models. each couple of features. We discretized the features distribution according to the interquartile range as suggested by Peng et al. 43 in order to apply the mRMR algorithm. Multivariate analysis was done using the Elastic Net technique 44 , a shrinkage regression method effective in case of several highly correlated variables. It performs continuous variable selection, causing some of the regression coefficients to be exactly zero, thus reducing the variance of the regression estimates by eliminating redundant features. The subset of variables corresponding to non-zero coefficients can be considered as the variables mainly associated with the outcome. The elastic net technique can be used with any linear regression model. Since we have a binary outcome (R/NR), we applied the logistic regression.
We considered the first 10, 20 and 30 metabolites ranked by mRMR to build three different classification models. The dataset was divided into a training and test set as two third and one third of the observations, respectively. Data were normalized (Z score normalization) before performing the elastic net. We adopted two strategies to further select a smaller subset of features. We performed 50 times an elastic net logistic model using a logit function to fit the training set data. We considered a binary classification (R = 0, NR = 1) and the output of the model is a value between 0 and 1, which represents a sort of probability. We then selected the coefficients of the model with the minimal deviance. We also applied another strategy, we used the shrinkage parameter λ, corresponding to the model with the minimal deviance, to fit another elastic net model and to obtain the coefficients of   Table 7. VIP scores of PLS-DA and coefficients of LDA for the integrated targeted and untargeted metabolomics models. Figure 7. Plasma sPLA2-IIA levels (μg/L) in responsive (R) and non-responsive (NR) patients at T1 and T2 (panel A) and comparison of time trend variation, expressed as delta (Δ = T2 − T1), between the two groups (panel B). Distributions are shown as box-plots where the central line is the median concentration, the edges of the box are the 25th and 75th percentiles and the outliers are defined as 1.5 times the interquartile range and highlighted by +. Significant differences between groups are marked with *(Wilcoxon rank-sum test p-value < 0.05), whereas significant differences from T1 to T2 are marked with **(Wilcoxon sign-rank test p-value < 0.05).
the logistic regression. In both cases, the models were then evaluated on the testing set and the performance was assessed by the number of correct imputations. Linear Discriminant Analysis (LDA) and Partial Least Squares Discriminant Analysis (PLS-DA) were also implemented. More precisely, LDA was performed on the first 10 ranked metabolites and the coefficients for the linear boundary between the first and second classes were retrieved. PLS-DA was performed both on the first 10 and 20 ranked metabolites, considering 3 PLS components. Since the groups are unbalanced, the data matrix was weighted centered in order to avoid having a decision boundary shifted towards the most numerous group. The variable importance in projection (VIP) scores, which represent the weights of each feature in PLS-DA model, and the coefficients of LDA were compared to those of logistic regression.
The performance of the classification models was evaluated by the number of correct imputations.
Integration of data from targeted and untargeted analysis. We built an integrated model by using the concentration of the metabolites and the peak intensities of the species identified by untargeted analysis. We must precise that for those metabolites quantified also in the targeted approach (i.e. acetylcarnitine, tyrosine and histidine), we used the concentration values instead of peak intensities as they are more reliable. Therefore, the metabolites identified from untargeted approach and used for these analyses were: acetylcarnitine, pyruvic acid, lactic acid, stearic acid, kynuramine, citric acid, myristic acid, palmitoleic acid, palmitic acid, oleic acid, tyrosine, histidine. We built the model on the changes from T1 to T2 (Δ = T2 − T1). Untargeted metabolomics data (10 features in total) were then combined with the first 20 ranked metabolites from targeted analysis. We considered all the 30 features and we performed again the mRMR algorithm to find the first 10, 20 and 30 ranked features. The classification models were built using the regularized logistic regression on normalized data as described in the previous paragraph. LDA and PLS-DA were also performed as stated above.