The plasma metabolome of long COVID patients two years after infection

One of the major challenges currently faced by global health systems is the prolonged COVID-19 syndrome (also known as “long COVID”) which has emerged as a consequence of the SARS-CoV-2 epidemic. It is estimated that at least 30% of patients who have had COVID-19 will develop long COVID. In this study, our goal was to assess the plasma metabolome in a total of 100 samples collected from healthy controls, COVID-19 patients, and long COVID patients recruited in Mexico between 2020 and 2022. A targeted metabolomics approach using a combination of LC–MS/MS and FIA MS/MS was performed to quantify 108 metabolites. IL-17 and leptin were measured in long COVID patients by immunoenzymatic assay. The comparison of paired COVID-19/long COVID-19 samples revealed 53 metabolites that were statistically different. Compared to controls, 27 metabolites remained dysregulated even after two years. Post-COVID-19 patients displayed a heterogeneous metabolic profile. Lactic acid, lactate/pyruvate ratio, ornithine/citrulline ratio, and arginine were identified as the most relevant metabolites for distinguishing patients with more complicated long COVID evolution. Additionally, IL-17 levels were significantly increased in these patients. Mitochondrial dysfunction, redox state imbalance, impaired energy metabolism, and chronic immune dysregulation are likely to be the main hallmarks of long COVID even two years after acute COVID-19 infection.


Results
Demographic, clinical data and symptoms description. Table 1 shows baseline characteristics of patients enrolled in the study. Age was statistically different between negative (healthy) controls and COVID-19 patients. However, differences were not found in the self-reported comorbidities. Six patients (12.5%) developed mild disease, 37 patients (77%) developed moderate/severe disease, and five patients (10.4%) developed critical disease. Six patients (12.5%) were reinfected during 2021 and 2022. All patients were fully vaccinated during the period of 2021-2022. Additional information from class A long COVID patients, class B long COVID patients, and recovered (non-long COVID) patients is provided in Supplementary Table 1. The questionnaire answered by the patients revealed the most persistent symptoms, which were grouped into five broad categories: systemic, neurologic, psychiatric, cardiologic, and respiratory. The most predominant symptoms were loss of memory (73.3%), sleep disorders, arthralgia, fatigue, exercise intolerance, myalgia (66.7%), and anxiety (60.0%) (Fig. 1). Class A patients experienced mainly neuropsychiatric symptoms with a co-occurrence of five or fewer symptoms, while class B patients experienced both neuropsychiatric and systemic symptoms, with more than five concomitant symptoms (Supplementary Fig. 1).
The multivariate analysis (PLS-DA) showed a clear separation between both classes (accuracy: 1; R²: 0.98; Q²: 0.89) (Fig. 3c). The VIP score plot (Fig. 3d) shows that the most important variables that can be used to differentiate negative controls from long COVID-19 patients are phenylalanine, glutamine/glutamate ratio, taurine and glutamine.
Investigating post-COVID-19 patients: comparison between long COVID class A, class B patients and recovered (non-long COVID). Differences were found within the post-COVID-19 group, both in the frequency of symptoms reported and in the plasma levels of some metabolites such as lactic acid, which showed a bimodal distribution across the group. Therefore, these patients were subclassified according to our own scale as a surrogate for disease severity. Eighteen patients did not report any symptoms (recovered or non-long COVID), 18 patients reported fewer than five persistent symptoms (class A long COVID), and 12 reported more than five symptoms (class B long COVID).
We measured the levels of ammonia (in the form of plasma urea) in class B patients. The concentration of urea was 43.8 ± 7.35 mg/dL in the COVID-19 phase and 32.8 ± 3.1 mg/dL in the long COVID phase. Although the urea levels were lower during the long COVID phase (falling within normal values), no significant differences were found between the two phases (t-test for paired samples, p = 0.146). Blood urea nitrogen (BUN) was similar for all post-COVID patients. Figure 4 shows the box and whisker plots based on one-way ANOVA for class A, class B, and fully recovered patients. The lactate/pyruvate ratio (adjusted p value = 5.9 × 10⁻⁶), lactate (adjusted p value = 1.3 × 10⁻⁵), arginine (2.0 × 10⁻³), and the ornithine/citrulline ratio (adjusted p value = 2.0 × 10⁻³) were the variables best able to differentiate long COVID patients with more than five symptoms from patients with fewer than five symptoms. Arginine negatively correlated with the number of symptoms. The differences in glucose levels between class A and class B patients were non-significant (p = 0.09). To evaluate the correlation of symptoms, significant variables, and the laboratory data, an integration of the metadata was done (Supplementary Fig. 2). A negative correlation
Pathway analysis. Our pathway enrichment analysis (Fig. 5) shows that the top five metabolic pathways significantly dysregulated (FDR < 0.05) in post-COVID patients (relative to controls) were: phospholipid biosynthesis, gluconeogenesis, the glucose-alanine cycle, the Warburg effect, and taurine and hypotaurine metabolism. When comparing class B patients with those recovered, the top five metabolic pathways (FDR < 0.05) were: pyruvate metabolism, gluconeogenesis, glycine and serine metabolism, urea cycle metabolism, and the Warburg effect.
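The adjusted p values and the FDR < 0.05 threshold reported above both depend on a multiple-testing correction. The text does not state which procedure was applied, so the following is only a minimal sketch that assumes the Benjamini-Hochberg method; the function name and the raw p values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values for four metabolites (illustrative only).
raw = [0.001, 0.008, 0.039, 0.041]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
```

With this procedure, each raw p value is scaled by the number of tests divided by its rank, and a metabolite or pathway is retained when the adjusted value stays below the chosen threshold.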
Plasma IL-17 and leptin. Figure 6 shows plasma concentrations of IL-17 and leptin as measured by ELISA.
IL-17 was significantly increased in class B patients relative to class A patients (Mann-Whitney test, p = 0.0073) and recovered patients (Mann-Whitney test, p = 0.002). Leptin did not show any statistically significant differences in the three-group comparison.
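For readers unfamiliar with the Mann-Whitney test used for the IL-17 comparison, the sketch below computes the U statistic and a two-sided p value from the normal approximation. It is an illustration only: the concentrations are hypothetical, the function names are ours, and the approximation omits tie and continuity corrections, so it will not necessarily match the software used in the study.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their ranks."""
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]


def mann_whitney_u(x, y):
    """Return (U for x, two-sided p value from the normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Standard deviation of U under the null, without tie correction.
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_x, p


# Hypothetical IL-17 concentrations (arbitrary units), not study data.
class_b = [5.1, 6.3, 7.8, 9.0]
class_a = [1.2, 2.4, 3.3, 4.0]
u, p = mann_whitney_u(class_b, class_a)
```

Because the test works on ranks rather than raw concentrations, it does not assume normally distributed cytokine levels, which is one common reason for choosing it with small groups.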

Discussion
Cumulative evidence from the last three years supports the dysregulation of metabolic and immune markers due to SARS-CoV-2 infection 14. A retrospective cohort study has demonstrated that COVID-19 patients have a significantly higher risk of developing subsequent autoimmune diseases such as rheumatoid arthritis, ankylosing spondylitis, systemic sclerosis, and type I diabetes mellitus, among others 15. In the present work, our aim was to evaluate the persistence of long-term metabolic alterations in long COVID-19 patients, as well as to measure immune markers that, when chronically produced, can trigger autoimmune diseases.
Since well-defined classification or diagnostic criteria are not available for the long COVID assessment, there is an urgent need for molecular methods able to stratify patients according to the severity of the symptoms they are experiencing. Quantitative and validated scales, such as HAM-A, HAM-D, MoCA and mMRC, are considered gold standards for neurocognitive impairment and for dyspnea assessment. However, their practical utility could be limited for complex conditions such as long COVID where a broader range of self-reported symptoms with different severity and duration are present. It has been reported that some long COVID-19 patients complain about extreme cognitive disorders (self-reported symptoms) but without any objective alterations, while others do not report symptoms but exhibit severe cognitive disorders after 6 to 9 months following SARS-CoV-2 infection 16.
In our study, several symptoms were corroborated through objective measures, but with lower rates when using validated scales. Therefore, molecular markers are urgently needed for the correct classification of patients. Our results revealed that 50% of analyzed plasma metabolites showed statistical differences between COVID-19 and long COVID-19 phases in patients with a more complicated evolution. One of the most dysregulated metabolites was glucose. Montefusco et al. 17 reported glycemic abnormalities in recovered patients two months after the onset of disease. The hyperglycemic state has been reported to be even worse in hospitalized patients, pointing to a possible causal role of administered drug regimens, including remdesivir and corticosteroids. These drugs stimulate hepatic gluconeogenesis from amino acids released from muscles, which then inhibits glucose uptake 18 . Long COVID has been associated with new-onset insulin resistance which may contribute to the onset of depressive symptoms by enhancing overall neurotoxicity 19 .
A number of other metabolites were also found to be dysregulated. Increased plasma pyruvate levels could be both a consequence of glycolytic dysregulation and protein degradation. The increase in putrescine levels in the long COVID phase may be an indicator of increased protein degradation to help fuel pyruvate metabolism.
Taurine and spermidine were found to be significantly decreased in the long COVID phase, although a trend towards normalization was observed when compared with controls. Decreased levels of serum taurine have been observed in patients with Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) 20. The depleted levels observed in the long COVID-19 phase could explain, at least in part, the fatigue, since taurine has multiple roles in skeletal muscle, the central nervous system, and energy metabolism. Nevertheless, we did not find any correlation between fatigue or myalgia and taurine concentrations in long COVID-19 patients. Holmes et al. 21 found that taurine levels were increased in post-COVID-19 patients, suggesting hepatic injury, hepatotoxicity, or muscle damage. However, the cohort evaluated in the Holmes study had a three-month follow-up after the initial infection, which is much shorter than the follow-up used in this study.
Furthermore, we observed increased levels of kynurenine, and a trend towards normalization in tryptophan and the kynurenine/tryptophan ratio in long COVID-19 patients. This indicates that, although lower in magnitude, the inflammatory conditions attributable to the hyperactivation of this metabolic pathway are still present and may account for some persistent physiological symptoms in these patients. Increased levels of hippuric acid in the long COVID-19 phase could be associated with a residual intestinal dysbiosis. This metabolite has been found to be increased in patients with chronic kidney disease and several age-related conditions 22. An increase in the plasma levels of hippuric acid may also result from an increased fruit and vegetable intake.
Our study revealed increased levels of metabolites associated with collagen metabolism in long COVID patients. Among these metabolites, proline is particularly noteworthy due to its involvement in protein structure and function, as well as its role in maintaining cellular redox homeostasis through the generation of ATP and reactive oxygen species (ROS) during its catabolism. Proline can be synthesized from arginine through various enzymes, including arginase (both type I and type II), ornithine aminotransferase, and P5C reductase 23. The glutamate/P5C synthase pathway in the intestine is responsible for most of the proline synthesis in the body. The increased levels of proline may arise from arginine or glutamine pathways, potentially in response to hypoxia 24 or tissue damage. Elevated blood levels of hydroxyproline have been proposed as a biomarker for diseases characterized by fibrosis, indicating an increased demand for proline in collagen synthesis. Trans-hydroxyproline plays a crucial role in collagen synthesis and contributes to the thermodynamic stability of the triple-helical conformation of collagen and associated tissues 25. In our study, both class A and class B patients had partial lung recovery, as evidenced by persistent lung function alterations observed in the lung CT scans (Supplementary Fig. 3). However, the measured levels of trans-hydroxyproline did not show a significant correlation with these findings. Nonetheless, the potential presence of collagen vascular diseases cannot be ruled out, as they may contribute to a more widespread systemic dysfunction, sometimes associated with viral infections 26. The increase in glutamine (and decrease in glutamate levels) indicates a partial reestablishment of critical processes that took place during the COVID-19 infection phase, such as severe immunometabolic dysregulation.
During the COVID-19 phase, a decrease in circulating levels of glutamine has been widely described 27–32. This depletion is associated with the consumption of glutamine to generate ATP and precursors (purines and pyrimidines) for the synthesis of macromolecules to assemble progeny viruses; to fuel the TCA cycle (as in cancer cell metabolism); to regulate the function of immune cells for maximal cytokine production, lymphocyte function, and for the growth, plasma T cell differentiation, and antibody production by B lymphocytes; and to promote interorgan nitrogen exchange, ammonia detoxification, and pH homeostasis. The glutamine/glutamate pathway is therefore closely related to energy metabolism. Dysregulations in this axis have been previously associated with increased risk of type 2 diabetes and other inflammatory diseases 33,34. In the post-COVID phase, by contrast, an increase in glutamine concentrations and a decrease in glutamate (higher glutamine/glutamate ratio) could be associated with long-term recovery, since glutamine demand for immune activation, nucleotide synthesis and ammonia detoxification has decreased. However, in long COVID patients experiencing multiple symptoms (neuropsychiatric and systemic), immune dysregulation persists, leading to a persistent imbalance in glutamine/glutamate metabolism, since glutamine may act as a signaling metabolite. Disruption within the glutamatergic pathway can lead to important neurological consequences, such as cognitive deficits 35,36. In fact, this exquisitely sensitive glutamine/glutamate homeostasis has been reported to be disturbed in schizophrenia 37 and frontotemporal dementia 38. The glutaminergic dysfunction could be associated with some psychiatric and neurologic symptoms like those reported in the present work.
Alterations in lipid metabolism are evident in most long COVID-19 patients. These patients exhibited significantly higher levels of carnitine and some short, medium, and long acylcarnitines. These alterations have been largely associated with altered fatty acid metabolism, dysfunctional mitochondria-dependent lipid catabolism, and immune processes or the lysis of white blood cells. Similar results have been reported by Guntur et al. 39 , pointing to mitochondrial dysfunction, as was also recognized during COVID-19 acute phase. Besides, decreased levels of LysoPC 16:0, LysoPC 17:0, LysoPC 18:0, and LysoPC 20:4, were found with respect to negative controls. These reductions have been reported in other inflammatory conditions 40 and other septic processes 41 . Depleted levels of lysophosphatidylcholines and phospholipid ethers, as well as depleted levels of PCs, can impede mitochondrial respiration, as has been also demonstrated in ME/CFS 42 . In line with the lipid dysregulation demonstrated by the targeted metabolomic analysis, routine clinical laboratory tests exhibited elevated levels of total cholesterol, triglycerides, and VLDL, as well as normal levels of HDL and LDL. Xu et al. 43 found increased LDL, triglycerides, total cholesterol, and decreased HDL in survivors of COVID-19, based on a large observational study with participants from the US Department of Veterans Affairs database compared to controls who had never tested positive for COVID-19.
As positive findings for the metabolic state of long COVID patients, we found that 30 metabolites fell within normal levels. Phenylalanine, which has been widely associated with sepsis and COVID disease severity 14 decreased to normal levels. Beta-hydroxybutyric acid and citric acid were also normalized, indicating partial recovery of the tricarboxylic acid cycle 14,44 . Butyric acid and propionic acid, two short-chain fatty acids that were found to be altered during COVID-19 phase, also fell within normal levels in post-COVID-19 patients, probably indicating that the leaky gut phenomenon and gut dysbiosis detected during COVID infection could be partially reestablished 45 . Spermidine was also normalized. The decrease in spermidine levels could reflect a trend for normalization in overall redox balance. Although excessive levels of spermidine (as those reported in COVID-19 patients) trigger the production of superoxide radicals, optimal concentrations mitigate oxidative stress and diminish overall ROS production 46 .
In addition, sphingomyelins and long-chain monounsaturated and saturated LysoPCs were found to be within normal levels. We previously noted altered sphingolipids levels during COVID-19 infection 14 . Sphingolipids play a crucial role in the regulation of signal transduction pathways and in certain pathological conditions, such as inflammation-associated illnesses and innate immune response.
In a recent report, Holmes et al. 21 found a high degree of interindividual variability in follow-up patients, reflecting the heterogeneity of post-COVID-19 patients and the fact that long COVID is a spectrum of disorders. Indeed, computational modeling of the long COVID phenotype data based on electronic healthcare records found six distinct clusters, each with distinct profiles of phenotypic abnormalities 47. Since symptom classification is still highly subjective, we decided to arbitrarily classify long COVID patients as: class A (fewer than five symptoms, mainly neuropsychiatric disorders), and class B (more than five symptoms, with a broad spectrum of systemic disorders). In a recent article 48, the authors proposed a definition of PASC based on self-reported symptoms, identifying four distinct clusters with concomitant symptoms ranging from two to six, to address the heterogeneity of long COVID. The four PASC subgroups were identified as follows: cluster 1: loss of or change in smell or taste; cluster 2: post-exertional malaise and fatigue; cluster 3: brain fog, post-exertional malaise, and fatigue; cluster 4: fatigue, post-exertional malaise, dizziness, brain fog, gastrointestinal symptoms, and palpitations. Our own classification scale aligns with this study, both in terms of the number and types of symptoms included.
We believe that metabolic information may complement, and partially explain, the phenotypic differences among long COVID-19 patients. Xu et al. 49 found increased levels of triacylglycerols, phosphatidylcholines, prostaglandin E2, and arginine, and decreased levels of betaine and adenosine, in patients with abnormal pulmonary function. In our work, lactic acid levels were increased in patients with more than five symptoms and systemic disorders (class B patients). Ghali et al. 50 found that patients with ME/CFS exhibited elevated blood lactate at rest. Mitochondrial dysfunction, with increased blood lactate, low levels of ATP, and increased levels of oxidative stress markers, has been associated with these alterations 51, as well as a relative deficiency of mitochondria type I fibers in muscle biopsies and low intracellular pH during the recovery phase 52–54. De Boer et al. 55 also reported altered lactate levels in long COVID patients, suggesting that long COVID patients have significant impairment in fat beta-oxidation and increased blood lactate accumulation even during low-intensity exercise. In contrast, Guntur et al. 39 reported low levels of lactic acid and pyruvate in long COVID patients. However, this study was conducted in non-hospitalized patients who had recovered from COVID-19 in March 2020.
The increased lactate/pyruvate ratio in class B patients is another important indicator of mitochondrial dysfunction. The lactate/pyruvate ratio has been proposed as a marker for mitochondrial disorders since it indirectly reflects the NADH/NAD+ redox state 56, lipid metabolism (fat oxidation), and ATP generation. In our study, both markers (lactate and the lactate/pyruvate ratio) were positively correlated with fatigue, myalgia and arthralgias (Spearman correlation, R > 0.6, p < 0.05).
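To make the two quantities in this paragraph concrete, the sketch below derives a lactate/pyruvate ratio per patient and computes a Spearman correlation against a symptom score. All values are hypothetical, lactate and pyruvate are assumed to share the same concentration units, and the helper names are ours; it illustrates the calculation, not the study's analysis pipeline.

```python
from math import sqrt


def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their ranks."""
    return [
        sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
        for x in values
    ]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical plasma values for five patients (same units for both analytes).
lactate = [1.0, 1.4, 2.1, 2.8, 3.5]
pyruvate = [0.10, 0.10, 0.11, 0.12, 0.12]
ratio = [l / p for l, p in zip(lactate, pyruvate)]

# Hypothetical fatigue severity scores for the same patients.
fatigue = [0, 1, 1, 2, 3]
rho = spearman(ratio, fatigue)
```

Because the coefficient is computed on ranks, it captures any monotonic association between the ratio and symptom severity, not only a linear one.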
The increased ornithine/citrulline ratio level in class B patients reflects abnormal metabolic activity in the urea cycle. It is notable that Yamano et al. 57 reported a similarly increased ornithine/citrulline ratio in CFS patients. An adequate balance of citrulline and ornithine is vital for the clearance of ammonia via urea cycle 58 . If ammonia accumulates intracellularly, the aerobic utilization of pyruvate to feed the TCA cycle is inhibited, resulting in lactate production, which further contributes to fatigue.
In addition, class B patients had decreased levels of arginine in comparison with the other subgroups. The reduced bioavailability of arginine to produce adequate levels of nitric oxide in endothelial cells and vascular tissues leads to the impairment of multiple physiological functions of skeletal muscles, including contractile functions, and muscle repair 59 . Arginine is also a substrate for ornithine production by arginase. It is well known that under certain inflammatory conditions, arginase activity is increased 59 , producing an excess of ornithine and an imbalance in the urea cycle.
Previous studies have pointed to the persistent immune dysregulation following COVID-19 infection 60 . We found increased levels of monocytes in class B patients. Nuber-Champier et al. 61 found that monocyte percentage in the acute phase of the disease allowed them to distinguish between patients with anosognosia for memory deficits in the chronic phase (6-9 months after SARS-CoV-2 infection) and nosognosic patients.
We also measured IL-17 levels in post-COVID-19 patients since it is well known that this cytokine is persistently altered in several chronic inflammatory and autoimmune diseases 62 , and previous reports have indicated an increased risk of such diseases in COVID-19 patients 15 . IL-17 is a proinflammatory cytokine mainly produced by T helper type 17 cells, playing a vital role in the regulation of host immune response against SARS-CoV-2. IL-17-induced dysregulated immune responses have been shown to potentially cause hyperinflammatory COVID-19 disease 63 . It has been reported that IL-17 downregulates protein phosphatase 6, resulting in increased arginase-1 expression in psoriatic keratinocytes 64 . IL-17A has been found to be associated with neurological sequelae and pulmonary fibrosis in post-COVID-19 patients 65,66 . Fluctuations in IL-17 have been associated with fatigue and fatigue severity in ME/CFS patients 67 . Additionally, we measured leptin as it is believed to cause inflammatory fatigue. Leptin is produced mostly by adipose tissue and plays a role in regulating food intake, basal metabolism, and the β-oxidation of fatty acids. In metabolic diseases such as obesity, chronically elevated levels of leptin are observed, which can induce the production of proinflammatory molecules and impair immune self-tolerance, predisposing to develop conditions such as rheumatoid arthritis, inflammatory bowel disease, multiple sclerosis, and others 68 . Increased levels of leptin have been associated with higher fatigue scores in people with CFS 69 . In a study conducted by Stringer et al. 69 , the authors demonstrated that daily fatigue severity was significantly correlated with daily serum leptin levels in women with CFS. However, in our study, we did not find any statistical differences in leptin levels, despite observing higher levels in class B patients. 
Based on our results, levels of leptin cannot be associated with fatigue, which suggests that the action of IL-17 on metabolic pathways may play a more significant role in this regard.
Metabolomics is not only useful in providing a snapshot of transient physiological or pathophysiological processes taking place in a living organism, but it has also proven to be a powerful tool for proposing and monitoring therapeutic interventions. In the case of long COVID, a common situation worldwide is that patients have reported an absence of adequate support and a poor recognition of their condition, initially attributed to psychiatric issues. People with long COVID have tried a vast range of self-prescribed medicines, supplements, remedies, and dietary changes to manage the disease and to overcome the effects it has on their quality of life and work capacity. Based on our findings, some interventions could be tested for treating long COVID patients: (1) supplementation of taurine (reducing musculoskeletal disorders); (2) supplementation of citrulline (enhancing ammonia clearance and reducing blood lactate, as well as increasing arginine bioavailability for adequate NO production); (3) supplementation of glutamine (primary source for neurotransmitters and immune function balancing); (4) supplementation of antioxidants such as N-acetylcysteine or NAD + (redox balance); (5) supplementation of arginine (targeting endothelial dysfunction in Long-COVID), as has been previously suggested by Tosato et al. 70 . Similarities found in our results with the ME/CFS pathophysiology may pave the way to common therapeutic interventions for both diseases.
Scientific Reports | (2023) 13:12420 | https://doi.org/10.1038/s41598-023-39049-x www.nature.com/scientificreports/
We need to acknowledge several limitations of this study. The small sample size was due to the limited number of patients who agreed to participate. While several objective measures of mood and cognition (HAM-A, HAM-D, MoCA, mMRC) were used, the sample size did not allow for stratification of patients according to the different test scores obtained, and only self-reported symptoms were used for sub-group classification. Furthermore, we were unable to track in detail the treatments, medications or alternative therapies used during the period evaluated. This limited our interpretation with regard to the impact of pharmacological interventions on the metabolome. Also, some compounds such as hexoses were measured by direct injection (DI); therefore, it was not possible to differentiate glucose (the most abundant sugar) from its epimers. We did not have access to Ct values from the electronic files of the patients indicating the initial viral load. All the patients were infected with the original strain, which was the predominant strain circulating in 2020. There are limited studies examining the association between the initial viral load, as determined by Ct values, and the various long COVID-19 effects 71. A study conducted in Mexico found that patients with a low Ct value experienced between 15 and 20 symptoms, while patients with a high Ct value experienced fewer than six symptoms. Asymptomatic patients with Ct values between 33 and 36 showed no or very few post-COVID-19 symptoms 72. However, in larger studies conducted more recently 73, the focus has shifted towards viral persistence as a potential mechanism for long COVID, rather than solely considering the initial viral load. Patients from two different hospitals participated in our study: one private hospital located in Chihuahua city, and one public hospital located in Zacatecas city.
In general, private hospital patients have relatively high incomes while public hospital patients have lower incomes. Logistic regression models showed no effects of sex, age, comorbidities, vaccination status or severity during the acute phase in the metabolomic profile associated with long COVID. However, patients from the public hospital reported more systemic symptoms in general, while patients from the private hospital reported principally neuropsychiatric symptoms. A recent study found that patients diagnosed with a post-COVID-19 condition were more likely to be unemployed or on public health insurance, illustrating racial and social disparities in access to and experience with healthcare, at least in the USA 74 . Whether the socioeconomic conditions and lifestyles, along with causes of biological origin influence the metabolic phenoreversion of patients recruited in our study, needs to be further investigated. This is particularly important in countries with significant health system disparities and significant differences in population life conditions.
At the time of this study, most of the negative controls recruited in 2020 had tested positive for COVID-19 in subsequent waves in 2021 or 2022. Therefore, we could not compare the prevalence of symptoms in post-COVID-19 patients with a non-COVID-19 matched group. In a study conducted by Ballering et al. 75, of the 76,422 participants longitudinally surveyed before and after COVID-19, 4,231 (5.5%) had COVID-19 and were compared with 8,462 matched controls. The proportion of participants who had at least one core symptom of substantially increased severity to at least moderate was 21.4% in COVID-19 participants versus 8.7% in controls, suggesting that core symptoms could be attributed to COVID-19 in 12.7% of participants. These numbers are at present increasing as more epidemiological data are reported worldwide.
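The 12.7% figure follows directly from the two proportions quoted above, as the short calculation below shows. The variable names are ours; the numbers are those cited from Ballering et al. 75.

```python
# Proportions with at least one core symptom of increased severity.
covid_pct = 21.4    # COVID-19 participants
control_pct = 8.7   # matched controls

# Excess proportion attributed to COVID-19 (difference of the two rates).
attributable_pct = round(covid_pct - control_pct, 1)

# Share of the surveyed cohort that had COVID-19.
surveyed = 76422
infected = 4231
infected_pct = round(100 * infected / surveyed, 1)

# Number of matched controls per COVID-19 participant.
controls_per_case = 8462 / infected
```

The difference of rates treats the control group as the background level of symptoms, which is why a matched non-COVID-19 group is needed to estimate how many symptoms are attributable to the infection.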

Conclusions
To our knowledge, this study is the first to describe quantitative metabolic perturbations two years after the initial acute COVID-19 infection using targeted metabolomics. The evolution of post-COVID-19 patients is heterogeneous, and symptoms are associated with distinctive metabolic patterns resembling, to some extent, the ME/CFS condition. Moreover, the differences observed between the phenotypes of post-COVID-19 patients reveal potential biomarkers that, once validated in larger and heterogeneous populations and integrated with clinical and sociodemographic data, will enable a more accurate and precise classification of long COVID patients beyond classification via self-reported symptoms.

Methods
Patient recruitment. For the aims of this study, we recruited COVID-19 survivors (with a confirmed diagnosis based on a positive PCR for SARS-CoV-2) who developed mild, severe, or critical disease and were admitted to (or hospitalized in) the Instituto Mexicano de Seguridad Social (Zacatecas city, Mexico) or Christus Muguerza del Parque Hospital (Chihuahua city, Mexico) between March and November 2020. Participants were contacted for a face-to-face interview. They were invited to respond to a questionnaire and to donate a blood sample. Plasma was isolated from the donated blood. COVID-19 patients from the Instituto Mexicano de Seguridad Social were recruited from an initial set of 124 COVID-19 patients enrolled in a previous research study 76. Of these, 44 (35.6%) passed away during hospitalization or in the months following hospital discharge. Of the 80 survivors, it was possible to contact 36 (by their Social Security Number or personal/relative phone number kept in hospital records), and 15 agreed to participate. For these 15 patients, paired plasma samples from the first diagnosis of the acute disease (COVID-19 group) and from the post-COVID phase were available.
Additionally, from a cohort of patients that were hospitalized in 2020 in Christus Muguerza del Parque Hospital, 33 were randomly selected by age stratification. For those patients, a basal blood sample was not available; however, all clinical information and chest computed tomography (CT) scans were recorded in the hospital archive.
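The recruitment numbers can be cross-checked with simple arithmetic. The tally below uses only counts stated in the text; the breakdown of the 100 plasma samples mentioned in the abstract (post-COVID samples, paired acute-phase samples, and the 37 negative controls described in the Methods) is our assumption about how that total arises, not something the text states explicitly.

```python
# Public hospital cohort (counts from the recruitment description).
previously_enrolled = 124
deceased = 44
survivors = previously_enrolled - deceased
contacted = 36
agreed = 15

# Private hospital cohort, selected by age stratification.
private_hospital = 33

# All post-COVID participants across both hospitals.
post_covid = agreed + private_hospital

# Severity during the acute phase, as reported in the Results.
mild, moderate_severe, critical = 6, 37, 5

# Assumed sample breakdown: post-COVID + paired acute-phase + controls.
paired_acute = agreed
negative_controls = 37
total_samples = post_covid + paired_acute + negative_controls
```

Under this reading, the 48 post-COVID participants match the sum of the severity categories, and the three sample groups add up to the 100 samples reported in the abstract.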
For the neuropsychological assessment, the validated Hamilton Anxiety Rating Scale (HAM-A) 77 was used. For depression assessment, the Hamilton Depression scale (HAM-D) was used 78. The Montreal Cognitive Assessment (MoCA) was employed for cognitive impairment 79. For dyspnea assessment, the modified Medical Research Council (mMRC) dyspnea scale was implemented 80. Basic blood biochemical markers (i.e., hemoglobin, platelets, leukocytes, lymphocytes, and creatinine) were measured for all enrolled patients.
To assess differences in the severity of long COVID, we made our own (arbitrary) classification based on the number of concomitant symptoms. Recovered patients were those who did not report persistent symptoms. Long COVID was considered if patients reported at least one persistent neurologic, psychiatric, gastrointestinal, cardiologic, respiratory, or systemic symptom. Class A long COVID patients were those reporting fewer than five persistent symptoms (17 patients), and class B long COVID patients were those reporting five or more persistent symptoms (13 patients). As negative controls and an indicator of the normal population, stored plasma samples from 37 individuals who tested negative for SARS-CoV-2 in 2020 were used.
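The classification rule described above can be written as a small function. This is only an illustrative sketch: the function name `classify_patient` and the label strings are hypothetical, and the thresholds are taken directly from the text (no persistent symptoms, fewer than five, five or more).

```python
def classify_patient(n_persistent_symptoms: int) -> str:
    """Assign a follow-up class from the number of persistent symptoms.

    Thresholds follow the text; the label strings are illustrative only.
    """
    if n_persistent_symptoms < 0:
        raise ValueError("symptom count cannot be negative")
    if n_persistent_symptoms == 0:
        return "recovered"   # no persistent symptoms reported
    if n_persistent_symptoms < 5:
        return "class A"     # one to four persistent symptoms
    return "class B"         # five or more persistent symptoms
```

For example, a patient reporting four persistent symptoms would fall into class A, and a patient reporting five would fall into class B.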

Metabolomics analysis.
A combination of direct injection mass spectrometry with a reverse-phase LC-MS/MS custom assay was used, as previously described 76 . Briefly, metabolites were measured using a locally developed LC-MS/MS metabolomics assay called The Metabolomics Innovation Centre (TMIC) Prime (TMIC PRIME®) Assay. This assay provides quantitative results for up to 143 endogenous metabolites, including biogenic amines, amino acids, organic acids, lipids, and lipid-like compounds. The method combines the derivatization and extraction of analytes, and the selective mass-spectrometric detection using multiple reaction monitoring (MRM) pairs. Isotope-labeled internal standards and other internal standards were used for metabolite quantification. The custom assay uses a 96 deep-well plate with a filter plate attached via sealing tape, and reagents and solvents used to prepare the plate assay. The first 14 wells of the 96-well plate were used for calibration and quality control with one double blank, three zero samples, seven calibration standards and three quality control samples. To measure all metabolites except organic acids, samples were first thawed on ice and were vortexed. 10 µL of each sample was loaded onto the center of the filter on the upper 96-well plate and dried under a stream of nitrogen. Subsequently, phenyl-isothiocyanate (PITC) was added for derivatization. After incubation, the filter spots were dried again using an evaporator. Extraction of the metabolites was then achieved by adding 300 µL of extraction solvent. The extracts were obtained by centrifugation into the lower 96-deep well plate, followed by a dilution step with the mass spectrometry running solvent.
For organic acid analysis, 150 µL of ice-cold methanol and 10 µL of isotope-labeled internal standard mixture were added to 50 µL of each plasma sample for overnight protein precipitation. Each sample was then centrifuged at 13,000×g for 15 min. 50 µL of supernatant was loaded into the center of the wells of a 96-deep well plate, followed by the addition of 13C-labeled 3-nitrophenylhydrazine (3-NPH) as an isotopic labeling reagent (for quantification). After incubation for 2 h, butylated hydroxytoluene (as a stabilizer) and water were added before LC-MS injection.
Mass spectrometric analysis of the PITC-derivatized and 3-NPH-derivatized samples was performed on an ABSciex 4000 Qtrap® tandem mass spectrometry instrument (Applied Biosystems/MDS Analytical Technologies, Foster City, CA) equipped with an Agilent 1260 series UHPLC system. Organic acids, biogenic amines, amino acids, and amino acid derivatives were detected and quantified via LC-MS, while lipids, acylcarnitines, and glucose were detected and quantified via a direct injection (DI) method. Analyst 1.6.2 and MultiQuant 3.0.3 were used for quantitative analysis. An individual seven-point calibration curve was generated to quantify organic acids, amino acids, biogenic amines, and derivatives. The ratio of each analyte's signal intensity to that of its corresponding isotope-labelled internal standard was plotted against the known concentrations and fitted by quadratic regression with 1/x² weighting. For lipids, acylcarnitines, and glucose, a single-point calibration of a representative analyte was built for each group of compounds sharing the same core structure, assuming a linear regression through zero.
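To make the calibration step concrete, the following sketch fits a quadratic curve with 1/x² weighting to a seven-point standard series and then back-calculates a concentration from a measured analyte/internal-standard ratio. It is not the vendor software's implementation: the function names `fit_weighted_quadratic` and `back_calculate` are hypothetical, and the solver is a plain weighted least-squares fit written with the Python standard library only.

```python
import math

def fit_weighted_quadratic(conc, ratio):
    """Fit ratio = a + b*conc + c*conc**2 by least squares with 1/x^2 weights."""
    # Augmented 3x4 matrix of the weighted normal equations (X^T W X | X^T W y).
    m = [[0.0] * 4 for _ in range(3)]
    for x, y in zip(conc, ratio):
        w = 1.0 / (x * x)              # 1/x^2 weighting (x must be non-zero)
        basis = (1.0, x, x * x)
        for i in range(3):
            for j in range(3):
                m[i][j] += w * basis[i] * basis[j]
            m[i][3] += w * basis[i] * y
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, 4):
                    m[r][k] -= f * m[col][k]
    return tuple(m[i][3] / m[i][i] for i in range(3))   # (a, b, c)

def back_calculate(coeffs, ratio):
    """Invert the fitted curve to obtain a concentration from a measured ratio."""
    a, b, c = coeffs
    if abs(c) < 1e-12:                 # effectively linear curve
        return (ratio - a) / b
    disc = b * b - 4.0 * c * (a - ratio)
    if disc < 0:
        raise ValueError("ratio is outside the range of the fitted curve")
    # Root that reduces to the linear solution as c approaches zero.
    return (-b + math.sqrt(disc)) / (2.0 * c)
```

The 1/x² weights give the low-concentration standards more influence on the fit, which is the usual motivation for this weighting when a calibration range spans several orders of magnitude.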
Plasma IL-17 and leptin determinations. ELISA kits were used for the quantification of IL-17 (catalog number RAB0262, Sigma-Aldrich, St. Louis, MO, USA) and leptin (catalog number ab108879, Abcam, Cambridge, UK), following the manufacturers' instructions. Briefly, standard solutions (or plasma samples) were added to each type of pre-coated 96-well plate and incubated overnight at 4 °C. The plates were then incubated with the corresponding detection antibodies (100 μL/well) for 1 h at room temperature. Streptavidin solution (100 μL) was then added to each well and the plates were incubated for 45 min. After the antibody-HRP incubation, TMB one-step substrate reagent (100 μL) was added to the wells and the plates were incubated for another 30 min before the addition of a stop solution (50 μL/well). Absorbance values (at 450 nm) were used to calculate protein concentrations (pg/mL) by comparison with an appropriate standard curve.
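As an illustration of reading a concentration off a standard curve, the sketch below uses piecewise linear interpolation between adjacent standards. This is an assumption made for simplicity: the text does not state which curve model was used, and kit protocols often recommend other fits (for example a four-parameter logistic). The function name `interpolate_concentration` is hypothetical and the standards in the usage note are invented.

```python
def interpolate_concentration(standards, absorbance):
    """Estimate a concentration (pg/mL) from an absorbance reading at 450 nm.

    standards: list of (concentration_pg_per_mL, absorbance) pairs.
    Uses piecewise linear interpolation (an assumption, not the kit's method).
    """
    pts = sorted(standards, key=lambda p: p[1])        # order by absorbance
    if not pts[0][1] <= absorbance <= pts[-1][1]:
        raise ValueError("absorbance lies outside the standard curve")
    for (c0, a0), (c1, a1) in zip(pts, pts[1:]):
        if a0 <= absorbance <= a1:
            if a1 == a0:                               # flat segment
                return c0
            frac = (absorbance - a0) / (a1 - a0)
            return c0 + frac * (c1 - c0)
```

With made-up standards `[(0, 0.05), (50, 0.25), (100, 0.45), (200, 0.85)]`, a reading of 0.35 falls halfway between the 50 and 100 pg/mL standards. Readings outside the curve raise an error rather than being extrapolated.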

Statistical analysis.
To describe baseline characteristics of negative controls (non-COVID-19), COVID-19, and post-COVID-19 patients, medians with interquartile ranges (IQRs) or means with standard deviations (s.d.) were used for continuous data, and frequencies (%) were used for categorical data. Normality was assessed using the D'Agostino-Pearson normality test. Student's t-test or the Mann-Whitney test was used for continuous data. For categorical variables (e.g., sex, smoking, symptoms, and comorbidities), Pearson's chi-square (χ²) test or Fisher's exact test was used. P-values less than 0.05 (p < 0.05) were considered statistically significant. Analyses were conducted using SPSS (IBM, version 24).
Metabolite analysis was performed with MetaboAnalyst 5.0 81. Metabolites with more than 20% missing values were removed from further analysis. For the remaining metabolites, values below the limit of detection (LOD) were imputed using 1/5 of the minimum positive value of each variable. The data were then subjected to autoscaling to generate appropriate Gaussian metabolite concentration distributions. Differences in mean metabolite values between controls, COVID-19, post-COVID-19, and long COVID patients were assessed using a parametric t-test or one-way ANOVA [adjusted p-value (FDR) cut-off = 0.05]. For the paired study, t-tests were performed and volcano plots of log-transformed p-values were generated to identify significant metabolites. Principal component analysis (PCA) and two-dimensional partial least squares discriminant analysis (2-D PLS-DA) scores plots were used to compare plasma metabolite data across and between study groups; 2000-fold permutation tests were used to assess statistical significance and to minimize the possibility that the observed separation of the PLS-DA clusters was due to chance. Differentiating metabolites were identified by variable importance in projection (VIP), using a score cutoff of > 1.5. Heat maps of the top 50 significant metabolites (via t-test or ANOVA) were created with MetaboAnalyst.
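The three preprocessing steps (removal of metabolites with more than 20% missing values, imputation with 1/5 of the minimum positive value, and autoscaling) can be sketched as follows. This is not MetaboAnalyst's code. The function name `preprocess` and the dictionary layout are hypothetical, missing and below-LOD values are both represented as `None` (a simplifying assumption), and autoscaling is taken to mean mean-centering followed by division by the sample standard deviation of each variable.

```python
import statistics

def preprocess(table, max_missing=0.20):
    """Filter, impute, and autoscale a metabolite table.

    table: dict mapping metabolite name -> list of concentrations,
           with None marking a missing or below-LOD value (assumption).
    """
    scaled = {}
    for name, values in table.items():
        n_missing = sum(v is None for v in values)
        if n_missing / len(values) > max_missing:
            continue                       # drop: more than 20% missing
        positives = [v for v in values if v is not None and v > 0]
        fill = min(positives) / 5.0        # 1/5 of the minimum positive value
        filled = [fill if v is None else v for v in values]
        mean = statistics.fmean(filled)
        sd = statistics.stdev(filled)      # sample standard deviation (n - 1)
        if sd == 0:
            continue                       # constant variable cannot be scaled
        scaled[name] = [(v - mean) / sd for v in filled]
    return scaled
```

After this step every retained metabolite has mean 0 and standard deviation 1, so metabolites measured on very different concentration scales contribute comparably to PCA, PLS-DA, and regression models.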

Scientific Reports | (2023) 13:12420 | https://doi.org/10.1038/s41598-023-39049-x

Pathway analysis was done using the Metabolite Set Enrichment Analysis (MSEA) and Metabolomic Pathway Analysis (MetPA) modules of MetaboAnalyst 5.0 81. The Homo sapiens pathway library was used. The global test was selected as the pathway enrichment analysis method, and relative betweenness centrality was used as the node importance measure for topological analysis.
The metabolites with the highest VIP scores were used to create metabolite panels for predicting long COVID using multivariate logistic regression. Models were additionally adjusted for relevant potential confounders such as sex, age, and relevant comorbidities (i.e., DM-II, HTN, and obesity), so that only statistically significant variables (p < 0.05) remained in the final models. Logistic regression analysis was performed with the auto-scaled data. K-fold cross-validation (CV) was used to ensure that the logistic regression models were robust. To determine the performance of each model, the area under the receiver operating characteristic curve (AUROC or AUC) was calculated, as were sensitivity and specificity.
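The performance measures named above can be illustrated with a short, dependency-free sketch. It assumes that predicted probabilities from an already fitted model are available. The function names `auroc`, `sensitivity_specificity`, and `k_fold_indices` are hypothetical, the 0.5 decision threshold is an assumption, and the fold splitter uses contiguous folds (in practice samples would usually be shuffled or stratified first).

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity and specificity at a given probability threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size
```

In a cross-validated workflow, the model would be refitted on each `train` split and the metrics computed on the corresponding `test` split, then averaged across folds.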
Ethics declarations. This study was conducted in accordance with the Declaration of Helsinki (1976). It was also reviewed and approved by the Research and Ethics Committees of the Instituto Mexicano de Seguridad Social (registration number R-2022-3301-038) and Christus Muguerza del Parque Hospital (HCMP-CEI-15042020-3 and HCMP-CEI-28022022-A01). Informed consent was obtained from all participants. All patients included in this study were informed in writing regarding the collection of their samples for research aims and were given the right to refuse participation.