The specific metabolome profiling of patients infected by SARS-CoV-2 supports the key role of tryptophan-nicotinamide pathway and cytosine metabolism

The biological mechanisms involved in SARS-CoV-2 infection are only partially understood. Thus, we explored the plasma metabolome of patients infected with SARS-CoV-2 to search for diagnostic and/or prognostic biomarkers and to improve the knowledge of metabolic disturbance in this infection. We analyzed the plasma metabolome of 55 patients infected with SARS-CoV-2 and 45 controls by LC-HRMS at the time of viral diagnosis (D0). We first evaluated the ability to predict the diagnosis from the metabotype at D0 in an independent population. Next, we assessed the feasibility of predicting the disease evolution at the 7th and 15th day. Plasma metabolome allowed us to generate a discriminant multivariate model to predict the diagnosis of SARS-CoV-2 in an independent population (accuracy > 74%; sensitivity and specificity > 75%). We identified the role of the cytosine and tryptophan-nicotinamide pathways in this discrimination. However, metabolomic exploration only modestly explained the disease evolution. Here, we present the first metabolomic study in SARS-CoV-2 patients, which showed a highly reliable prediction of early diagnosis. We have highlighted the role of the tryptophan-nicotinamide pathway, clearly linked to inflammatory signals and microbiota, and the involvement of cytosine, previously described as a coordinator of cell metabolism in SARS-CoV-2. These findings could open new therapeutic perspectives as indirect targets.

During the full-scan acquisition (full MS, AGC target 1e6, maximum injection time 250 ms), which ranged from 58 to 870 m/z, the instrument operated at 70,000 resolution (m/z = 200).
As required for all biological analyses, the pre-analytical and analytical steps of the experiment were validated using Quality Control (QC) samples. The QC solution was prepared by pooling 20 randomly selected samples from our cohort. This pool was therefore representative of the cohort, and it was extracted with exactly the same process as the patients' samples. Consequently, the QCs captured both technical variation and extraction variation. Coefficients of variation [CV% = (standard deviation / mean) × 100] were calculated for all metabolites. Metabolites with a CV in QCs > 30% were excluded from the final dataset. QCs were analyzed at the beginning of the run, every 10 samples and at the end of the run.
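The QC filter described above can be sketched in a few lines of Python. This is only an illustration: the metabolite names and peak areas are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent: (standard deviation / mean) x 100.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return stdev(values) / mean(values) * 100.0


def filter_by_qc_cv(qc_areas, max_cv=30.0):
    """Return {metabolite: CV%} for metabolites whose QC CV does not exceed max_cv."""
    kept = {}
    for name, areas in qc_areas.items():
        cv = cv_percent(areas)
        if cv <= max_cv:
            kept[name] = cv
    return kept


# Hypothetical peak areas measured in four QC injections.
qc_areas = {
    "metabolite_A": [100.0, 105.0, 95.0, 102.0],  # stable signal
    "metabolite_B": [100.0, 40.0, 160.0, 90.0],   # unstable signal
}
kept = filter_by_qc_cv(qc_areas)
print(sorted(kept))
```

With these illustrative values, only the stable metabolite passes the 30% threshold.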
A targeted analysis was applied to the samples, based on a library of standard compounds (Mass Spectrometry Metabolite Library (MSML) of standards, IROA Technologies). The following criteria were used to identify the metabolites: (1) retention time of the detected metabolite within ± 20 s of the standard reference, (2) exact measured molecular mass of the metabolite within a range of 10 ppm around the known molecular mass of the reference compound, and (3) correspondence between isotopic ratios of the metabolite and the standard reference. The signal value was calculated using Xcalibur software (Thermo Fisher Scientific, San Jose, CA) by integrating the chromatographic peak area corresponding to the selected metabolite. At this step, the dataset contained the identity of the metabolites and the corresponding area for all the samples, after analysis and validation by the mass spectrometry specialist. Data were normalized to the sum, log-transformed and autoscaled before statistical analysis.
Statistical analysis. Univariate analysis. Comparison of continuous parameters defining population characteristics, such as age, BMI, Charlson index and delay between SARS-CoV-2 RTPCR and sample collection, was done by Student's t-test or the Wilcoxon test, depending on the normality of the distribution (Shapiro test). The WHO ordinal scale at each time point was considered by class, defined by the cut-off of 3 (oxygen requirement), and was analysed by the Chi-squared test, as were all qualitative data.
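Criteria (1) and (2) of the metabolite identification step above can be illustrated as follows. This is a minimal sketch, not the authors' software: the function names are hypothetical, the m/z and retention-time values are illustrative, and criterion (3), the isotopic ratio comparison, is not covered.

```python
def ppm_error(measured_mz, reference_mz):
    """Signed mass error in parts per million relative to the reference mass."""
    return (measured_mz - reference_mz) / reference_mz * 1e6


def matches_standard(measured_mz, measured_rt_s, reference_mz, reference_rt_s,
                     ppm_tol=10.0, rt_tol_s=20.0):
    """True if both the mass (ppm) and retention time (seconds) tolerances are met."""
    mass_ok = abs(ppm_error(measured_mz, reference_mz)) <= ppm_tol
    rt_ok = abs(measured_rt_s - reference_rt_s) <= rt_tol_s
    return mass_ok and rt_ok


# Illustrative values only: a reference ion at m/z 112.0505 eluting at 60 s.
print(matches_standard(112.0510, 65.0, 112.0505, 60.0))  # about 4.5 ppm, 5 s shift
print(matches_standard(112.0530, 65.0, 112.0505, 60.0))  # about 22 ppm, rejected
```

A candidate peak is retained only when both tolerances hold; a peak that matches in mass but elutes more than 20 s away from the standard is rejected as well.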
The univariate analysis of metabolite levels between groups was based on fold-change (FC) values and significance thresholds, using the volcano plot and the non-parametric Wilcoxon test in the free software MetaboAnalyst, version 4.0 (www.metaboanalyst.ca/faces/home.xhtml). In the volcano plot, the x-axis represents the fold change between the subject groups (log scale) and the y-axis represents the adjusted p-value for t-tests of differences between samples (negative log scale). We also used the False Discovery Rate (Benjamini and Hochberg) to account for multiple testing and to highlight the most discriminant parameters. We highlighted only metabolites with p < 0.1 and FC (C −/C +) > 1.5 or < 0.67. When more than 2 groups were compared, the Kruskal-Wallis test was performed.
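The multiple-testing correction and the selection thresholds above can be sketched as follows. The Benjamini-Hochberg step is the standard procedure; the p-values are hypothetical, and applying the p < 0.1 cut-off to the adjusted (rather than raw) p-value is an assumption made for this example.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_highlighted(fold_change, adjusted_p, p_max=0.1, fc_up=1.5, fc_down=0.67):
    """Selection rule: p below p_max and fold change above fc_up or below fc_down."""
    return adjusted_p < p_max and (fold_change > fc_up or fold_change < fc_down)


# Hypothetical raw p-values for four metabolites.
raw_p = [0.01, 0.04, 0.03, 0.5]
adj_p = benjamini_hochberg(raw_p)
print(adj_p)
```

Note that an adjusted p-value produced this way is never smaller than the corresponding raw p-value.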
Multivariate analysis. Multivariate analysis was performed (1) to evaluate the relation between metabolome and diagnosis as well as clinical evolution within the C + cohort (based on WHO ordinal scale evolution at D7 and the future of patients at D15 i.e. hospital discharge, hospitalization or death), and (2) to test the ability to predict the diagnosis and the clinical evolution within the C + cohort, defined as previously.
Classification was performed by unsupervised Principal Component Analysis (PCA), to evaluate the distribution of samples and identify outliers, and by supervised analysis based on Partial Least-Squares Discriminant Analysis (PLS-DA). The score plots show the classified samples; the Variable Influence on Projection (VIP) values represent the importance of each compound (metabolite) for the PLS-DA models. MetaboAnalyst modelling includes leave-one-out cross-validation (LOOCV). To test the relevance of the selected compounds, the quality of the model built from them was assessed by prediction accuracy and permutation test. The performance measures of the permuted data usually form a normal distribution, and if the performance score of the original data lies outside this distribution, the results are considered significant.
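The logic of the permutation test can be illustrated with a small sketch. To keep it self-contained, the PLS-DA performance score is replaced here by a much simpler stand-in statistic (absolute difference of group means on one variable); the data, the function names and the empirical p-value formula (hits + 1) / (permutations + 1) are assumptions for illustration and do not reproduce MetaboAnalyst's implementation.

```python
import random
from statistics import mean


def abs_mean_difference(values, labels):
    """Stand-in performance score: absolute difference between group means."""
    pos = [v for v, lab in zip(values, labels) if lab == 1]
    neg = [v for v, lab in zip(values, labels) if lab == 0]
    return abs(mean(pos) - mean(neg))


def permutation_p_value(values, labels, statistic, n_permutations=1000, seed=0):
    """Empirical p-value: share of label shufflings scoring at least as well as the real labels."""
    rng = random.Random(seed)
    observed = statistic(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if statistic(values, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Hypothetical, well-separated measurements for 6 positive and 6 negative subjects.
values = [5.1, 5.3, 4.9, 5.2, 5.0, 5.4, 3.0, 3.2, 2.9, 3.1, 3.3, 2.8]
labels = [1] * 6 + [0] * 6
p_value = permutation_p_value(values, labels, abs_mean_difference)
print(p_value)
```

When the score obtained with the true labels is rarely matched by shuffled labels, the p-value is small and the separation is unlikely to be due to chance.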
Then, we established an independent validation of the prediction models, using two independent cohorts (training and test sets) to establish the reliability of our models. We randomly divided our dataset into a training set comprising 80% of the C + and C − participants, and a test set consisting of the remaining 20% of patients (random selection of subjects within each group: C + and C −). Receiver Operating Characteristic (ROC) curves were generated by Monte-Carlo cross validation (MCCV) using balanced sub-sampling. In each MCCV, two thirds (2/3) of the samples were used to evaluate the feature importance. The top 5, 10 …100 (max) important features were then used to build classification models, which were validated on the 1/3 of the samples that were left out. The procedures were repeated multiple times to calculate the performance and confidence interval of each model. The method for classification and for feature ranking was PLS-DA, as previously defined. Thus, (1) we generated different PLS-DA models according to variable numbers of features in the training set, (2) we chose the model that provided the highest AUC with less than 10 metabolites to keep a population/metabolite ratio at about 10, and (3) finally we selected the metabolites most often discriminant in the models to predict the diagnosis or the clinical evolution in the test set. We independently repeated this process 5 times to estimate the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) of prediction in the test set.
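The 80/20 split with random selection within each group can be sketched as below. This is a hedged illustration: the function name, the seed and the rounding of group sizes are assumptions, and the MCCV and PLS-DA steps are not reproduced.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices into training and test sets, drawing test_fraction from each group."""
    rng = random.Random(seed)
    train, test = [], []
    for group in sorted(set(labels)):
        indices = [i for i, lab in enumerate(labels) if lab == group]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)  # assumption: round to nearest integer
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Cohort sizes from the study: 55 C+ and 45 C- subjects.
labels = ["C+"] * 55 + ["C-"] * 45
train_idx, test_idx = stratified_split(labels)
n_test_pos = sum(1 for i in test_idx if labels[i] == "C+")
print(len(train_idx), len(test_idx), n_test_pos)
```

With these cohort sizes, the test set contains 20 subjects (11 C + and 9 C −) and the training set the remaining 80; repeating the call with different seeds gives independent splits.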
Pathway analysis. Venn diagrams, aiming at revealing the metabolites most significantly associated with the diagnosis or disease evolution after the different predictions based on PLS-DA models, were constructed (free software Venny 2.1, https://bioinfogp.cnb.csic.es/tools/venny/). Enrichment and pathway analyses were systematically performed on all discriminant metabolites highlighted in the PLS-DA models. Metabolic pathway enrichment analysis and pathway topology analysis were conducted using the MetaboAnalyst computational platform (https://www.metaboanalyst.ca/), which computes a single p value for each metabolic pathway. Pathway topology analysis applies graph theory to measure the importance of a given experimentally identified metabolite in a pre-defined metabolic pathway. Measurements were computed using centrality to estimate the relative importance of individual nodes to the overall network.
Pathway analysis calculated pathway impact that represents a combination of the centrality and pathway enrichment results; higher impact values represent the relative importance of the pathway, relative to all pathways included in the analysis. The pathway impact value was calculated as the sum of importance measures of the metabolites, normalized by the sum of importance measures of all metabolites in each pathway 14 . Metabolic pathways are represented as a network of chemical compounds with metabolites as nodes and reactions as edges. Major criteria are used to perform an informative analysis regarding the quality of pathway data 15 .
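The pathway impact definition given above (sum of the importance measures of the matched metabolites, divided by the sum over all metabolites of the pathway) can be written out as a short sketch. The pathway, the node names and the centrality values are hypothetical, and how the importance measures themselves are derived from the network is not shown.

```python
def pathway_impact(importance, matched):
    """Sum of importance of matched metabolites, normalized by the pathway total."""
    total = sum(importance.values())
    matched_sum = sum(value for name, value in importance.items() if name in matched)
    return matched_sum / total


# Hypothetical centrality-based importance of each node in one pathway.
importance = {"node_A": 0.5, "node_B": 0.3, "node_C": 0.2}
# Hypothetical set of experimentally identified, discriminant metabolites.
matched = {"node_A", "node_C"}
impact = pathway_impact(importance, matched)
print(round(impact, 3))
```

An impact of 1 would mean that every node of the pathway was matched; matching only peripheral, low-centrality nodes yields a value close to 0.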

Patients.
A total of 100 patients were included, of which 45 were C − and 55 were C +. Patients' characteristics are summarized in Table 1. Twenty four percent of C − patients had systematic SARS-CoV-2 screening depending on environmental context, 15% had cardiac failure, 20% had respiratory dysfunction and the remaining had diverse disorders (organic dysfunction or other infections). Populations were gender-matched (51% and 49% male in C − and C +, respectively). Age was similar in the groups since mean ± standard deviation (SD) age was 75.9 ± 17.5 in C − patients and 77.5 ± 16.0 years old in C + patients (p = 0.83). Importantly, BMI was similar in C − and C + patients (p = 0.22). The Charlson comorbidity index as well as distinct known risk factors for severe COVID-19 were similar between the groups (p = 0.22). We noted that C + patients had more symptoms as 71% of C + patients had more than 2 symptoms (out of the main 4 symptoms) compared to 48% in C− patients (p = 0.024). Cough was more frequent in C + patients. The delay between plasma sample collection and SARS-CoV-2 RTPCR was similar (2.6 ± 1.8 vs. 3.6 ± 2.6 days in C − and C +, respectively; p = 0.07). After correction for multiple testing, biological parameters were not significantly different between groups (supplementary table S1). We observed a trend of lower troponin concentrations for C + patients but it was not significant after correction of Benjamini and Hochberg (raw p-value = 0.03, adjusted p-value: 0.003).
Although the percentage of patients with WHO ordinal scale > 3 was similar between C − and C + groups, it was higher in C + (37.3%) versus C − groups (15.9%) at t2 (p = 0.022). The variation of WHO ordinal scale over 7 days showed that 12.7% of C + patients versus 4.5% of C − patients had worse respiratory function (i.e. decrease of WHO scale, NS). Interestingly, the percentage of C − patients under mechanical ventilation decreased from 11.90% at D0 to 6.98% at D7, whereas it increased from 17.31% at D0 to 20.75% at D7 in C + patients (not significant). Taking into account the specific patients' management, we found no statistical difference in pO2, spO2, and respiratory frequency at any time between both groups of patients (not shown). Although most C − patients left the hospital after the acute episode of SARS-CoV-2 suspicion, C + patients were still hospitalized 15 days later, and a higher proportion died (p < 0.0001).

Metabolic findings.
According to the process of data pre-treatment, the final dataset contained only metabolites presenting low pre-analytical and analytical variabilities. After filtering, we obtained 67 out of 233 metabolites for C18-negative mode, 71 out of 303 metabolites for C18-positive mode and 80 out of 334 for HILIC column. Redundant metabolites were then removed, leaving 160 metabolites in the end. So we kept only 25% of the metabolites initially detected (supplementary table S2). The median CV of the metabolites from the final data set was 10.5%. To give an overview of all metabolic pathways that could be evaluated with reliability in our study, we present the list of pathways corresponding to the 160 metabolites retained in the final dataset (supplementary figure S1).
Unsupervised analysis (PCA) did not reveal any outliers. PLS-DA divided patients into C + and C − groups with satisfactory performance, defined by an accuracy of 74% and a significant permutation test (p < 0.01), ensuring the robustness of the model. The score plot and the important features contributing to this model are represented in Fig. 1B and C, respectively. The eight metabolites that had a VIP score higher than 2 were the same as those discriminant between groups after univariate analysis. The metabolic pathways associated with the discriminant

Metabolome profile can correctly predict SARS-CoV-2 infection on an independent population.
We obtained quite similar performances of the PLS-DA models in the 5 different training sets (not shown). Figure 2 is based on an example of a prediction (out of the 5 predictions realized) and shows the main results of the different steps previously described to predict the diagnosis in the test set from the training set. The ROC curves built from PLS-DA based on a variable number of features in the training set showed excellent performances. The ROC curve providing the best performances with less than 10 features revealed an AUC at 0.763 for this example of prediction. The 10 features used to predict diagnosis in the test set enabled the construction of a ROC curve with AUC at 0.879 (p < 0.01) and provided a probability of group prediction as shown in Fig. 2.
In the present case, 3 patients were poorly predicted out of 20. According to the same strategy applied for the 5 different and independent predictions, we obtained satisfactory reproducibility of performance criteria between models with sensitivity, specificity, PPV and NPV > 75% in the independent populations (i.e. test sets). Interestingly, Venn diagrams (not shown) revealed that 6 metabolites were present in the 5 independent predictions: L-asparagine, 1-NH2-cyclopropane-1-carboxylate, cytosine, L-isoleucine, L-leucine and 2-aminophenol. As expected, all of them were also discriminant in the entire cohort.
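The performance criteria reported above derive from a 2 × 2 confusion table. The sketch below shows the arithmetic on an illustrative test set of 20 subjects with 3 misclassifications; how these 3 errors split into false negatives and false positives is hypothetical and is not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV, NPV and accuracy from confusion-table counts."""
    return {
        "sensitivity": tp / (tp + fn),  # infected subjects correctly predicted
        "specificity": tn / (tn + fp),  # non-infected subjects correctly predicted
        "ppv": tp / (tp + fp),          # predicted positives that are truly infected
        "npv": tn / (tn + fn),          # predicted negatives that are truly non-infected
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts: 11 C+ (9 found, 2 missed) and 9 C- (8 found, 1 missed).
metrics = diagnostic_metrics(tp=9, fp=1, tn=8, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under this hypothetical split, all four criteria exceed 75%, which is the kind of result the threshold quoted in the text refers to.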
Metabolome analysis did not predict clinical outcomes of COVID-19 patients with high reliability.
We explored whether metabolome profiles predicted outcomes 7 and 15 days after sampling. After 7 days, the WHO ordinal scale increased (i.e. clinical worsening) in 7 patients, was stable in 27 patients, and decreased (i.e. clinical improvement) in 21 patients. Univariate analysis did not highlight any discriminant metabolite. The PLS-DA model showed separation of the subgroups although accuracy was poor (58%, Fig. 3A), with a non-significant permutation test (p = 0.65). Among the 15 metabolites involved (Fig. 3B) in the separation of the subgroups based on WHO ordinal scale, we found guanidinoacetate and proline involved in arginine and proline metabolism, as well as xanthine and adenosine involved in purine metabolism. We also tested a model to distinguish only stationary patients from those who improved their respiratory function (i.e. exclusion of patients who became more severe). This model was also not significant. After 15 days, 6 patients had died, 37 patients were still hospitalized and 12 had been discharged from hospital. Kruskal-Wallis analysis of metabolite levels between these outcome groups revealed some metabolites, presented in figure S2. Multivariate analysis (Fig. 3C,D) showed a model with good accuracy (> 75%) but a non-significant permutation test. The model was not improved after exclusion of patients who died. Among the most discriminant metabolites, we found L-ornithine and L-glutamine involved in arginine metabolism, xanthine and adenine involved in purine metabolism, as well as 4-guanidinobutanoate and L-ornithine, both involved in arginine and proline metabolism.

Table 1. Demographic and clinical characteristics of patients infected by SARS-CoV-2 (C +) and controls (C −).

Non COVID-19 patients (C −) COVID-19 patients (C +) p-value
A Venn diagram including all metabolites found in the 2 models performed to predict clinical outcomes is presented in Fig. 4A. We noted 2 common metabolites, thymine and xanthine and the most involved metabolic pathways associated with clinical outcomes were spermidine/spermine biosynthesis, purine, arginine and proline metabolism (Fig. 4C) that are not directly connected as shared metabolites were < 25% of the total number of their combined metabolite sets (Fig. 4B).

Discussion
To our knowledge, this study is the first to evaluate the plasma metabolome profile of patients with COVID-19 at the time of the viral diagnosis (very close to their infection, as the screening strategy aimed at the earliest diagnosis) compared to patients suspected of SARS-CoV-2 infection but with negative RTPCR. Our findings revealed a clear clustering of patients according to their status, infected (C +) or not (C −), and also highlighted the power of the multivariate model to predict patients' diagnosis. The weaker relation between plasma metabolome and clinical severity within the C + group did not allow us to consider the metabolome profile as a prognostic biomarker, mainly due to the low number of patients with clinical degradation. However, this strategy helped us to identify some interesting metabolic pathways.
Characteristics of the study population.
The population in this study mostly comprised patients with mild or moderate disease both in the C − and C + groups. Criteria for prescription of SARS-CoV-2 RTPCR at our hospital did not change during the recruitment period, ensuring homogeneity of collected data over the inclusion period. Contrary to the recent work studying proteomic and metabolomic profiles of COVID-19 patients 11 , our samples were collected at the time of the diagnosis in a population naïve for anti-viral treatment. Taking into account the heterogeneity of disease evolution, this difference in inclusion criteria may explain some different findings. Although the C + and C − populations were comparable for most parameters, there was a higher frequency of cough and a higher proportion of patients with more than 2 symptoms in the C + group. Both C − and C + were frequently overweight 16 . Importantly, age and BMI were similar between groups. Likewise, diabetes was as frequent in both groups, thus limiting the putative repercussions of differential complications of diabetes on metabolome pattern [17][18][19] .
As metabolite concentrations may be associated with BMI, diabetes or age, for example, these parameters must be carefully controlled, as was done in the present study. Altogether, the matching of both groups suggested the absence of confounding factors that could alter metabolomics findings in our study. A fraction of patients with COVID-19 experiences clinical worsening typically 7-10 days after diagnosis 20 ; thus, we collected clinical data 7 and 15 days after SARS-CoV-2 RTPCR to assess the ability of metabolome profiles to predict outcomes. The fraction of patients under mechanical ventilation among C + patients at D7 (20.75%) contributes to the increased WHO ordinal scale, linked with respiratory failure described in the literature 21 .
As expected, the decrease of the WHO ordinal scale from D0 to D7 was less frequent in C + patients. Likewise, patients of the C + group were more likely to be hospitalized or deceased at the 15th day after diagnosis, comparable with literature data 5 . This probably directly reflected severity of COVID-19 rather than other ailments since both groups were otherwise similar at baseline. In addition, since our hospital was not saturated even at the peak of the COVID-19 outbreak, it is unlikely that unusually early discharge of C − patients accounts for the difference in hospitalization rates between groups.
The strategy used in the present study overcomes many of the obstacles of metabolomics experiments through robust internal validation coupled with a validation in an independent cohort (performed 5 times, randomly). The step of independent validation adds robustness to the important concepts of reproducibility, validity, and generalizability, and produces independent confirmation of metabolic markers 23 . In this context of modelling designed to avoid overfitting, we found satisfactory prediction performance (criteria > 75%), which encourages us to pursue this approach and to combine the metabolome profile with other parameters, such as inflammatory factors. We highlighted some highly discriminant metabolites (cytosine, indole 3 acetic acid) and the pathways of biotin metabolism and of nicotinate and nicotinamide metabolism.

Involvement of tryptophan-nicotinamide pathway in SARS-CoV-2.
Univariate and multivariate analyses have identified two metabolites that are central in the tryptophan-nicotinamide pathway. The 3-indole acetic acid is a breakdown product of tryptophan metabolism. Nicotinamide, an amide derivative of nicotinic acid, is a precursor for generation of the coenzymes NAD + and NADP +, which are essential for many metabolic pathways. The tryptophan-nicotinamide pathway consists of two parts. The first part is from tryptophan to quinolinic acid, and the second is from quinolinic acid to N1-methyl-2-pyridone-5-carboxamide and N1-methyl-4-pyridone-3-carboxamide, including the NAD cycle and nicotinamide catabolism 24 . These metabolomics findings may be linked with the widely described inflammatory signals of tryptophan-kynurenine metabolism 25,26 . The role of peptidyl-dipeptidase A 2 (ACE2) has been widely described in SARS-CoV-2 infection 27 . ACE2-dependent changes in epithelial immunity and the gut microbiota can be directly regulated by tryptophan 28 . The tryptophan-nicotinamide pathway can also act on mTOR activation, which is involved in cell proliferation, survival, transcription and expression of intestinal antimicrobial peptides 29 . The putative consequences of antimicrobial peptides on the composition of the gut microbiota open perspectives on the role of the microbiome in this infection 30 . Interestingly, tryptophan represents a metabolic node that involves serotonin synthesis, the kynurenine pathway and the indole/aryl hydrocarbon receptor (AHR) pathway. Indole acetic acid is a ligand of AHR, which has been implicated in many diseases involving immune and inflammatory processes. All these mechanisms, also found in the microbiota, support the promising perspective of combining peripheral metabolism and microbiome exploration 31 . Omics findings of Shen et al. 11 , and more recently Thomas et al. 32 , also reported the activation of the kynurenine pathway in COVID-19 patients.
They suggested that NAD synthesized from tryptophan modulates macrophage activity such as the release of interleukin-6 and tumor necrosis factor alpha 33 . Moreover, Farsalinos et al. 34 considered SARS-CoV-2 infection as a disease of the nicotinic cholinergic system. They suggested that the inflammatory response observed in SARS-CoV-2 patients 7,35 leads to clinical characteristics that could be linked to a modification of the cholinergic anti-inflammatory pathway. Consequently, this group suggested that nicotine should be protective in SARS-CoV-2 patients due to its anti-inflammatory role. Metabolomics applied to therapeutics, i.e. pharmacometabolomics, should be of great interest in this context.

Increased cytosine levels in COVID-19 patients.
Cytosine was the main discriminant metabolite between C + and C − patients. Cytosine belongs to the pyrimidine class and is one of the four main bases found in DNA and RNA. Viral infections are known to cause significant metabolic changes in host cells, such as upregulation of pyrimidine nucleotide biosynthesis. Danchin et al. 36 reported the importance of cytosine as a coordinator of cell metabolism in SARS-CoV-2. It has been previously shown that the base composition of human mRNA and SARS-CoV-2 RNA is quite different. Indeed, the cytosine amount is lower in the SARS-CoV-2 RNA genome (17.6% C vs. 30.2% A, 19.9% G, 32.4% U) than in human RNA. The increased plasma cytosine levels in COVID-19 patients may correspond to the coupling between synthesis of viral particles and the host cell's metabolism, and this loss in cytosine may be associated with escape from innate immunity. Moreover, cytosine appears to be key in virus evolution, as cytosine availability drives RNA viruses to evolve new progeny 36 . According to our findings, we suggest that the reduced percentage of cytosine in the SARS-CoV-2 genome may be associated with increased levels in biological fluids of the host, as the base synthesis machinery may be upregulated, cytosine is probably little used by this virus, and cytosine may be released by the cell lysis inherent to infection.

Modest association between metabolic disturbance and COVID-19 severity.
The impact of the immune response on SARS-CoV-2 severity is still an enigma and we suggested that metabolic status may be an additional tool to characterize clinical evolution. The most described predictors of disease severity are older age and comorbidities 4 . PLS-DA models (Fig. 3) showed a mild separation between groups but with a high overlap between these groups, and a high heterogeneity within each subgroup of disease evolution. Although exclusion of the most severe patients showed a better separation of groups, this was most probably due to the low statistical power, which was more appropriate to separate 2 instead of 3 groups in a small population. Other modelling approaches have been tested (Orthogonal PLS-DA, random forest), but the models were still not significant (not shown).
Purine and pyrimidine pathways [i.e. xanthine, thymine (Fig. 4A)] have been highlighted and may be linked with purine and pyrimidine release from cell lysis 37 . ATP and NAD + are described as excitatory molecules and adenosine as an anti-inflammatory effector on immune cells. Importantly, purines play a crucial role in controlling the activation and differentiation of immune cells. Inflammation induced by SARS-CoV-2 infection may be controlled in part by the release and metabolism of purines, modulated by environmental factors such as hypoxia 37 . As these findings are consistent with the involvement of the tryptophan-nicotinamide pathway previously discussed, further studies with a complete evaluation of inflammation status and evolution would be of great interest to better understand both mechanisms in this infection. Consistently, the relation between metabolomics profile and inflammatory cytokines evaluated in mice with H1N1 influenza virus infection showed effects of the virus infection on tryptophan and other amino acids, and effects on pathways such as purines, pyrimidines and lipids 38 . Moreover, other mechanisms may be related to purine and pyrimidine release independently from cell lysis 39 , for example in response to biochemical, mechanical or physical stimuli. Nucleotide storage in and release from secretory granules, described in airway epithelial ciliated and goblet cells in lung tissues, may be involved in this kind of infection and may explain our findings 39 .
Arginine and proline metabolism as well as end products (polyamine: spermidine) were also related to clinical evolution. Arginine is an essential amino acid for NO homeostasis, and polyamines are small aliphatic polycations, ubiquitously found in living cells, that have multiple functions, which are only partially understood 40 , including protection against stress induced by Reactive Oxygen Species (ROS) 41 and the induction of autophagy 42 . Interestingly, our findings are consistent with those of Shen et al. 11 , who described enrichment in some amino acids, including metabolites involved in arginine metabolism.

Perspectives and conclusion
To our knowledge, this work represents the first metabolomics study into early diagnosis or prognosis biomarkers of COVID-19. Whether metabolomics analysis can provide such a tool remains to be explored in prospective studies, but to date, we report a significant plasma metabolome profile of COVID-19 patients with involvement of the tryptophan-nicotinamide pathway as well as cytosine metabolism. This strategy should be considered helpful to characterize patients and to determine homogeneous subgroups of infected patients, probably in combination with viral biomarkers. Metabolomics must be applied to an independent cohort with more severe patients, and in parallel with inflammation exploration, to confirm the suspected mechanisms associated with this infection. Omics approaches in host cells infected by this virus may complete the data about the dynamics of the metabolism-inflammation link, as well as the short and long term consequences on cell homeostasis. Our findings open the perspective of combining metabolomics with proteomics and lipidomics to expand the metabolic coverage 43 .