Introduction

Nearly 772 million people have been infected by the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) as of November 8, 2023, including ca. 7 million deaths (https://covid19.who.int/), with more than 13.5 billion vaccine doses administered (as of November 5, 2023). SARS-CoV-2 induces a condition known as coronavirus disease 2019 (COVID-19), characterized by a wide range of clinical presentations and possible life-threatening complications1.

According to the National Institute for Health and Care Excellence (NICE) guidelines, different time phases of COVID-19 might be identified: “acute COVID-19 (signs and symptoms of COVID-19 for up to 4 weeks); ongoing symptomatic COVID-19 infection (signs and symptoms of COVID-19 from 4 to 12 weeks); and post-COVID-19 syndrome (signs and symptoms that develop during or after an infection consistent with COVID-19, continue for more than 12 weeks, and are not explained by an alternative diagnosis)”2. Overall, it is now clear that the convalescent phase of COVID-19 can present a number of clinical manifestations3 even in individuals who have had mild or moderate disease4,5.

Patients recovering from COVID-19 may develop so-called long COVID, also referred to as “post-acute sequelae of COVID-19” (PASC)6. At least 70 million people around the world present long COVID7,8,9. They experience several symptoms, including cardiovascular, thrombotic and cerebrovascular disease, limited lung function with reduced lung capacities and volumes, respiratory muscle weakness, changes in radiographic and tomographic findings, type 2 diabetes, chronic fatigue syndrome, limitation in exercising, decreased functional capacity, and an overall reduced quality of life9,10. Symptoms can last for years11, with increasing public health costs and a growing economic burden12,13.

For the pathogenesis of long COVID, several hypotheses have been put forward, including the persisting presence of SARS-CoV-2 in tissues, immune dysregulation with or without reactivation of underlying pathogens, alteration of the microbiota, microvascular blood clotting with endothelial dysfunction, among others14. The heterogeneity and complexity of post COVID-19 should be dealt with by specifically defining the targets for clinical interventions, with the aim of defining a multidisciplinary model of care to avoid burdening the patients and the health care systems with useless and costly over-investigation15,16. Physiological parameters obtained from a multiomics strategy can carefully define the patients’ status during the COVID-19 phases, recognizing and defining biological features related to and most likely predicting long-COVID manifestations14.

In this paper, we investigated the biomarker landscape of long-COVID patients with several omics approaches to uncover molecular parameters that could suggest specific clinical management. We first defined the phenotype difference between patients and healthy controls by using nuclear magnetic resonance (NMR)-based metabolomics of their exhaled breath condensate (EBC)17 before the patients entered a pulmonary rehabilitation (PR) program, which has been shown to be highly effective in improving post-acute symptoms18. This difference was also highlighted by assessing alterations in EBC-derived microRNAs (miRNAs) related to COVID-19. Finally, joining metabolomics data and clinical parameters collected during the rehabilitation program, we obtained a clear description of the pathophysiological condition of patients, highlighting the presence of persistent inflammation, dysregulation of liver, endovascular thrombotic and pulmonary processes, and physical impairment, which should be the primary targets in a management protocol of the post-acute sequelae of COVID-19.

Results

Patients

The study design is presented in Fig. 1. We screened 60 convalescent COVID-19 patients, all negativized from the wild-type SARS-CoV-2, which was the predominant form in South Italy at the time, although the presence of the D614G variant was also reported. Of these, only 40 (92% males, mean age 58.8 years) were enrolled. Two of the 40 patients were excluded because of low-quality NMR spectra. Samples and clinical data of 38 age- and sex-matched non-COVID-19 subjects (92% males, mean age 57.9 years) were used as controls. They belonged to an irreversibly deidentified Maugeri historical cohort of healthy volunteers selected from the hospital staff, whose samples (including EBC) and clinical data were previously collected and stored. The absence of significant respiratory, cardiac and/or metabolic diseases was anamnestic.

Figure 1

Schematic diagram illustrating the overall study design.

Their major demographic and clinical characteristics are reported in Table 1 as mean ± standard deviation (SD). All patients presented a long-COVID condition, with lingering, recurrent symptoms after recovering from the severe/critical condition. EBC samples and all clinical and instrumental data were collected from the 38 post-COVID patients before entering the rehabilitation cycle, and, in parallel, from the 38 control subjects. For the 38 patients, clinical and instrumental data were also collected after the rehabilitation cycle.

Table 1 Characteristics and clinical parameters of the subjects enrolled in the study.

In brief, convalescent COVID-19 patients were middle-aged male subjects with a recent history of severe (44.7%) or critical (55.3%) COVID-19 according to the National Institutes of Health (NIH) classification (https://www.covid19treatmentguidelines.nih.gov/overview/clinical-spectrum). Of the patients, 63.2% were transferred from an acute care setting after a hospitalization of 14.3 (7–47) days, while all 38 enrolled patients underwent a rehabilitation program of 24.3 (5–57) days (Table 1). Rehabilitation affected several clinical characteristics of post-COVID patients (p-value column in Table 1). Statistically significant variations were observed for the pulmonary parameters (PaO2, and from SpO2 down to Barthel index in Table 1), and for BMI, weight, diastolic pressure, glycemia, urea, uricemia, AST, ALT, CRP and D-dimer values, demonstrating the successful impact of rehabilitation.

NMR-based metabolomics of patients’ EBC

To define the post-COVID physiological state, we profiled the patients’ EBC by NMR and compared the profiles with those of healthy subjects. Figure S1 compares the NMR spectrum of the EBC sample from a healthy subject (a) with that of a patient (b); resonance assignments are reported in Table S1. Notably, neither sample showed saliva contamination, as the most intense saliva signals, which originate from carbohydrates and resonate between 3.3 and 6.0 ppm, are absent. PCA was used to explore data trends and possible outliers (data not shown). We then carried out supervised OPLS-DA, which yielded a regression model with high-quality parameters (R2 = 0.81, Q2 = 0.87 and CV-ANOVA p = 2.3 × 10⁻¹²), and a clear class discrimination (Fig. 2). In the associated loadings plot (not shown), the post-COVID group, with respect to controls, presented upregulation of ethanol, lactate and acetoin, and downregulation of acetate, acetone, fatty acids, isocaproate, isovalerate, methanol and valerate. Their statistical significance is reported as box-and-whisker plots in Supplementary Figs. 13. These results indicate that patients present a metabotype completely different from that of healthy subjects.

Figure 2

Orthogonal projections to latent structures discriminant analysis (OPLS-DA) of EBC samples from post-COVID patients and controls. Scores plot showing the degree of separation of the model between post-COVID (red circles) and controls (blue circles). The model presents strong regression (95%, CV-ANOVA p < 2.3 × 10⁻¹²) and high-quality parameters (R2 = 81% and Q2 = 87%). The labels t[1] and to[1] along the axes represent the scores of the predictive component and of the first orthogonal component of the model, respectively, which are sufficient to build a satisfactory classification model.

The discriminating biomarkers were used to identify the metabolic networks altered in post-COVID. Metabolic enrichment analysis indicated the potential biological mechanisms producing the separation between post-COVID and controls. With a threshold of p < 0.05, we uncovered synthesis and degradation of ketone bodies, pyruvate metabolism, propanoate metabolism, butanoate metabolism, cAMP signaling pathway, inflammatory mediator regulation of TRP channels and carbon metabolism as the most probably activated pathways. They mark the differences between the post-COVID-19 metabotype and that of controls. The results of the enrichment analysis are reported in Supplementary Table S2.

miRNA analysis

Potential genes related to the altered metabolites found in EBC were derived from gene-metabolite interaction network analysis (Supplementary Table S3). Putative miRNAs involved in the modulation of these genes were uncovered by an in silico analysis using the miRNet tool. This approach integrated the metabolomic analysis and miRNA modulation in the same samples. Enrichment analysis based on the hypergeometric test identified 20 significantly modulated miRNA functions (p < 0.05). For the validation of miRNAs through qRT-PCR, we considered the functions cell cycle (74 hits, Gene Ontology (GO) annotation number GO:0007049), regulation of stem cell proliferation (74 hits, GO:0072091), cell death (73 hits, GO:0008219), aging (70 hits, GO:0007568), hematopoiesis (68 hits, GO:0030097) and angiogenesis (66 hits, GO:0001525) (Supplementary Table S4).
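As an illustration of the hypergeometric enrichment test mentioned above, the following minimal Python sketch computes the upper-tail probability of observing at least a given number of hits for one function. It is not the miRNet implementation, whose details may differ; the function name, argument names and the numbers in the usage example are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(universe: int, annotated: int,
                           selected: int, hits: int) -> float:
    """Upper-tail hypergeometric p-value, P(X >= hits).

    universe  : total number of miRNAs considered (assumed background)
    annotated : miRNAs in the universe annotated to the function
    selected  : miRNAs in the query list
    hits      : query miRNAs annotated to the function
    """
    if not (0 <= annotated <= universe and 0 <= selected <= universe):
        raise ValueError("annotated and selected must lie in [0, universe]")
    if hits < 0:
        raise ValueError("hits must be non-negative")
    max_hits = min(annotated, selected)
    total = comb(universe, selected)
    # Sum the probability mass for every outcome at least as extreme as 'hits'.
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    p = hypergeom_enrichment_p(universe=2000, annotated=300, selected=150, hits=74)
    print(f"enrichment p-value: {p:.3g}")
```

In practice, the p-values obtained for all tested functions would usually also be corrected for multiple testing; that step is omitted here.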

Among the miRNAs associated with the above functions, we identified hsa-miR-145-5p, hsa-miR-221-3p, hsa-miR-221-5p, hsa-miR-17-5p, hsa-miR-222-3p and hsa-miR-34a-5p as common to all six functions, hsa-miR-146a-5p as common to five functions, and hsa-miR-126-3p and hsa-miR-223-3p as common to four functions. A PubMed search (“miRNAs name” AND “COVID-19”) indicated that hsa-miR-34a-5p, hsa-miR-146a-5p, hsa-miR-126-3p and hsa-miR-223-3p are associated with COVID-19 pathogenesis; we therefore searched for these four miRNAs in EBC samples of post-COVID-19 patients. hsa-miR-34a-5p was below the limit of detection (Cq ≥ 35) in more than 80% of samples and was not considered further, whereas the other three miRNAs were differentially modulated. Compared with healthy controls, patients presented up-regulation of hsa-miR-146a-5p (Fig. 3b), while hsa-miR-126-3p and hsa-miR-223-3p were down-regulated (Fig. 3a,c, respectively). These miRNAs are involved in inflammatory responses and immune regulation, and their alterations in post-COVID-19 indicate the persistence of pathophysiological processes.
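The detection criterion described above (a miRNA is dropped when Cq ≥ 35 in more than 80% of samples) can be sketched as a simple filter. This is only an illustrative sketch: the function name, the use of None for wells without amplification, and the Cq values in the usage example are assumptions, not details reported in the study.

```python
from typing import Optional, Sequence


def keep_mirna(cq_values: Sequence[Optional[float]],
               cq_cutoff: float = 35.0,
               max_undetected_fraction: float = 0.8) -> bool:
    """Return True if the miRNA is detected often enough to be analysed.

    A sample counts as undetected when its Cq is at or above the cutoff,
    or when no amplification was recorded (None, an assumed convention).
    """
    if not cq_values:
        raise ValueError("at least one Cq value is required")
    undetected = sum(1 for cq in cq_values if cq is None or cq >= cq_cutoff)
    # Drop the miRNA only when MORE than the allowed fraction is undetected.
    return undetected / len(cq_values) <= max_undetected_fraction


if __name__ == "__main__":
    # Hypothetical Cq values, for illustration only.
    mostly_undetected = [36.1, 37.4, None, 35.0, 38.2, 36.6, None, 35.9, 36.0, 29.8]
    well_detected = [27.3, 28.1, 30.4, 36.2, 29.9]
    print(keep_mirna(mostly_undetected))
    print(keep_mirna(well_detected))
```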

Figure 3

Relative expression of miRNAs in EBC samples obtained from enrolled subjects. (a) hsa-miR-126-3p. (b) hsa-miR-146a-5p. (c) hsa-miR-223-3p. Blue bars refer to control subjects, while red bars refer to post-COVID patients. Analysis was performed with qRT-PCR. p-values are shown.

Correlation of EBC metabolites with clinical parameters

The post-COVID-19 metabolites from EBC were associated with the clinical parameters obtained at hospital admission, before rehabilitation. The heatmap in Fig. 4 shows the significant Pearson correlation coefficients (p < 0.05) between the metabolites and at least one clinical parameter. Considering a threshold value of |ρ| ≥ 0.5, we identified a positive correlation of 0.7 between propionate/isobutyrate (label 2 in Fig. 4) and creatinine (dark blue box with a red double asterisk, see the color code in Fig. 4). Positive correlations of 0.5 were observed between acetoin (label 1) and propionate/valine (label 3) with creatinine, isobutyrate (label 5) and alanine aminotransferase (ALT), lactate (label 10) and pH, glycine (label 21) and leukocytes, 3-hydroxyisobutyrate (label 22) and leukocytes, and ethanol (label 23) with platelets (blue boxes with a red asterisk).

Figure 4

Heatmap based on Pearson correlation coefficients between EBC metabolites and values obtained from clinical tests in negativized COVID-19 patients. Rows and columns are rearranged according to the centroid-based correlation matrix-based hierarchical clustering (CMBHC). Blue tones indicate positive correlations between metabolites and clinical data, whereas light tones indicate negative correlations. Correlation values |ρ| = 0.7 are marked with a double asterisk, while values |ρ| = 0.5 are labeled with a single asterisk. EBC metabolites are: 1, acetoin; 2, propionate/isobutyrate; 3, propionate/valine; 4, pyruvate; 5, isobutyrate; 6, methanol; 7, 3-hydroxyisovalerate; 8, isovalerate; 9, fatty acids (FA); 10, lactate; 11, formate; 12, trimethylamine; 13, 2-hydroxyisovalerate; 14, isocaproate; 15, isovalerate; 16, valerate; 17, acetate; 18, acetone; 19, serine; 20, isopropanol; 21, glycine; 22, 3-hydroxybutyrate; 23, ethanol. The arrows on top label significant EBC metabolites, whose trend, with respect to healthy subjects, is symbolized by the arrow direction. Statistically significant clinical data are underlined.

Negative correlations of −0.7 were observed between acetoin, propionate/isobutyrate and propionate/valine with FEV1/FVC (labels 1, 2 and 3, respectively, pale yellow boxes with a black double asterisk). Negative correlations of −0.5 involved methanol (label 6) with weight and the six-minute walking distance (6MWD), acetone (label 18) and 6MWD, and glycine (label 21) and pH (light green boxes with a black asterisk in Fig. 4). Positive correlation indicates similar behavior between metabolites and clinical values (increase/increase, decrease/decrease), while negative correlation refers to opposite behavior (increase/decrease, decrease/increase). Such correlations indicate that clinical parameters can be monitored via metabolites, which could become noninvasive markers of the clinical status.
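The correlation screening described in this section can be illustrated with a minimal Python sketch that computes Pearson coefficients between each metabolite and each clinical parameter and keeps the pairs with |ρ| ≥ 0.5. The function names and the numbers in the usage example are hypothetical. The significance filter (p < 0.05) is not reproduced here; it could be obtained, for example, with scipy.stats.pearsonr, which returns both the coefficient and its p-value.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def strong_pairs(metabolites: Dict[str, Sequence[float]],
                 clinical: Dict[str, Sequence[float]],
                 threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Return (metabolite, clinical parameter, r) for every |r| >= threshold."""
    pairs = []
    for m_name, m_values in metabolites.items():
        for c_name, c_values in clinical.items():
            r = pearson_r(m_values, c_values)
            if abs(r) >= threshold:
                pairs.append((m_name, c_name, r))
    return pairs


if __name__ == "__main__":
    # Hypothetical values for five patients, for illustration only.
    metabolites = {"acetoin": [1.2, 1.9, 2.4, 3.1, 3.8]}
    clinical = {"FEV1/FVC": [82.0, 79.0, 77.5, 74.0, 70.0],
                "pH": [7.41, 7.39, 7.42, 7.40, 7.41]}
    for metabolite, parameter, r in strong_pairs(metabolites, clinical):
        sign = "same direction" if r > 0 else "opposite direction"
        print(f"{metabolite} vs {parameter}: r = {r:.2f} ({sign})")
```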

Rehabilitation of post-COVID-19 patients: analysis of the clinical data between admission and discharge

The effects of rehabilitation were evaluated by comparing the clinical/laboratory data of each patient at admission (in) to a rehabilitation cycle lasting 24.3 days on average (Table 1) and at discharge (out). The multilevel PLS-DA scores plot of Fig. 5 shows that the discharge status (black dots, out) differs from the admission status (red dots, in). In particular, at admission, patients presented higher values of creatine, triglycerides (TGs), leukocytes, urea, red blood cell count, systolic blood pressure, total cholesterol (TC), platelets, hematocrit, weight, diastolic blood pressure, hemoglobin, glycemia, C-reactive protein (CRP), ALT, D-dimer, aspartate aminotransferase (AST), FEV1/FVC and CAT. At discharge, patients were characterized by higher values of pH, total lung capacity (TLC), HCO3, uricemia, albumin, PaCO2, DLCO/VA, DLCO, SpO2, Barthel, PaO2, FEV1%, FVC, FVC%, FEV1 and 6MWD. This is depicted in Fig. 6, which reports the contribution plot related to the above multilevel PLS-DA model, where each bar represents the loading value of each variable on the principal component PC1 at admission (red bars) and at discharge (black bars).
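A multilevel analysis of paired data is commonly built on the within-subject variation, obtained by subtracting each patient's own mean over the two time points before the discriminant model is fitted. The sketch below shows only this decomposition step, under the assumption that the model used here follows that standard scheme; the function name and the example values are hypothetical, and the PLS-DA fit itself is not shown.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def within_subject_variation(admission: Sequence[Sequence[float]],
                             discharge: Sequence[Sequence[float]]
                             ) -> Tuple[Matrix, Matrix]:
    """Remove each patient's mean over the two time points.

    Both inputs hold one row per patient and one column per clinical
    variable, with patients and variables in the same order.
    """
    if len(admission) != len(discharge):
        raise ValueError("the same patients are needed at both time points")
    within_in: Matrix = []
    within_out: Matrix = []
    for row_in, row_out in zip(admission, discharge):
        if len(row_in) != len(row_out):
            raise ValueError("each patient needs the same variables twice")
        # Per-patient, per-variable mean across admission and discharge.
        means = [(a + b) / 2 for a, b in zip(row_in, row_out)]
        within_in.append([a - m for a, m in zip(row_in, means)])
        within_out.append([b - m for b, m in zip(row_out, means)])
    return within_in, within_out


if __name__ == "__main__":
    # Hypothetical values (e.g. D-dimer and 6MWD) for two patients.
    admission = [[900.0, 250.0], [700.0, 300.0]]
    discharge = [[500.0, 400.0], [450.0, 420.0]]
    w_in, w_out = within_subject_variation(admission, discharge)
    print(w_in)
    print(w_out)
```

Removing the between-patient baseline in this way lets the model focus on the change that each patient undergoes between admission and discharge.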

Figure 5

Multilevel PLS-DA scores plot for post-COVID patients. The labels X-variate 1 and X-variate 2 along the axes represent the scores (the first 2 partial least-squares components) of the model, which are sufficient to build a satisfactory classification model. Admission variables (IN) are shown in red, while discharge variables (OUT) are in black.

Figure 6

Contribution plot of the principal component PC1 of the multilevel PLS-DA model including the clinical parameters of the post-COVID patients. Each bar represents the loading value for each variable on PC1. Admission variables (IN) are shown in red, while discharge variables (OUT) are in black.

Statistical significance was found for AST, ALT, D-dimer, CAT, Barthel, DLCO, SpO2, PaO2, FEV1/FVC, FEV1, FVC, FEV1%, FVC% and 6MWD (Table 1), which are the principal clinical parameters that are carried over upon negativization. Therefore, post-COVID-19 patients should be monitored for liver damage (AST and ALT), endovascular thrombotic processes (D-dimer), persisting pulmonary symptoms (CAT, Barthel, DLCO, SpO2, PaO2, FEV1, FVC, FEV1/FVC, FEV1%, FVC%), and physical impairment (6MWD). The relationship between the above parameters and the statistically significant EBC metabolites (top arrows in Fig. 4) indicated negative correlations between increased acetoin (label 1) and decreased FEV1/FVC (ρ = −0.7, underlined in Fig. 4), decreased methanol (label 6) and acetone (label 18) with increased 6MWD, and increased ethanol (label 23) and D-dimer (all presenting ρ = −0.5).

Taken together, the metabolomic, miRNA and clinical data indicate that post-COVID patients still present dysregulation of the liver, endovascular and pulmonary parameters.

Discussion

Our results show that post-COVID-19 patients present several dysfunctions from which post-acute sequelae could originate. In particular, the post-COVID-19 group showed persistent lung inflammation, as indicated by upregulation of ethanol, lactate and acetoin, and downregulation of acetate, methanol, acetone, fatty acids, isocaproate, isovalerate and valerate. In fact, increased acetoin level is associated with airway inflammation19, and reduction of methanol was observed in the EBC of lung cancer patients20. The short-chain fatty acids (SCFAs) acetate, isovalerate, valerate and isocaproate are involved in the regulation of several leukocyte functions linked to the production of cytokines, eicosanoids and chemokines, and are reported to affect leukocyte migration to the inflammation foci21. Acetone and lactate were detected in the bronchoalveolar lavage fluid of cystic fibrosis patients with varying levels of inflammation22. In addition, lactate excess can bring about a noticeable rise in ROS and apoptosis in A549 alveolar cells23. It was reported that non-survivor COVID-19 patients had higher lactate levels than survivors at intensive-care unit admission24. Furthermore, lactate is the main end product of anaerobic metabolism, and it is well known that COVID-19 patients present hypoxic lung damage and respiratory failure, and that hypoxia is an indicator of COVID-19 mortality25. Significantly different concentrations between COVID-19 patients within 21 days from clinical diagnosis and post-COVID-19 groups were observed for acetate, acetone and lactate also in plasma26.

Correlation of EBC metabolites with clinical data from patients showed statistically significant relationships between increased acetoin and reduced FEV1/FVC (ρ = −0.7), decreased methanol and acetone with increased 6MWD (ρ = −0.5), and increased ethanol and decreased D-dimer (ρ = −0.5), which indicate that these metabolite alterations are manifestations of the corresponding physiological functions. As a confirmation, reduction of methanol and acetone and the corresponding 6MWD increase were observed in chronic obstructive pulmonary disease (COPD) patients after a 5-week rehabilitation program27, and ethanol can reduce the global fibrinolytic capacity of whole blood, measured as D-dimer production during incubation of blood clots28.

From the above metabolites we identified the most probable dysregulated metabolic pathways, namely synthesis and degradation of ketone bodies, pyruvate metabolism, propanoate metabolism, butanoate metabolism, cAMP signaling pathway, and inflammatory mediator regulation of TRP channels. Interestingly, upregulation of ketone body and pyruvate metabolism has been observed in previous NMR-based metabolomics studies of serum/plasma samples from post-COVID patients26,29,30,31.

Ketone bodies (KBs) are produced by hepatocytes’ mitochondria where fatty acids enter upon adipocytokine signaling. Interestingly, two adipocytokines, IL-6 and tumor necrosis factor-alpha (TNFα), are related to COVID-19 severity and patients’ death32. Degradation of KBs (ketolysis) implies elevated levels of KBs in the blood and urine (ketosis). Ketosis shows an anti-inflammatory activity since β-hydroxybutyrate (β-HB), derived from the KB acetoacetate, is a key regulator of inflammation pathways like the NLRP3 inflammasome33. It has been suggested that in SARS-CoV-2 infection, treatments increasing β-HB levels could improve host defenses against respiratory viral infection while decreasing inflammation34. Additionally, the high levels of triglycerides and triglycerides-rich lipoproteins observed in COVID plasma26 could be generated by a limited oxidation of acetyl-CoA inside the mitochondria, therefore favoring the synthesis of ketone bodies and the high levels of β-HB, acetoacetate and acetone in COVID-19 patients35.

cAMP is involved in several inflammatory pathways, being able to inhibit ROS generation and proinflammatory cytokine production, primarily IL-6 and TNF-α36. Furthermore, preserving the cAMP concentration in the pulmonary tissue can improve lung functions36, which are essential in COVID-19 patients. Interestingly, anosmia and ageusia, which have been observed in COVID-19 patients, have also been related to the intracellular levels of cAMP37.

The propanoate and butanoate metabolism pathways describe the metabolism of the SCFAs propionate and butyrate. SCFAs mediate the communication between the intestinal microbiome and the immune cells via free fatty acid receptors (FFARs), and dysregulation of the FFAR2/3 receptors’ expression favored the onset of respiratory diseases38. We observed that post-COVID-19 patients showed, with respect to controls, alteration of acetate, isocaproate, isovalerate and valerate (all SCFAs), and of fatty acids, which are involved in the production of cytokines, eicosanoids, and chemokines responsible for the lung hyperinflammation in severe COVID-19 patients39.

Transient receptor potential (TRP) channels are widely expressed in tissues that are infected by SARS-CoV-2 and have been proposed as targets for adjuvant therapies against COVID-1940. Most of the clinical manifestations of COVID-19 activate different TRP channels. For example, TRPV4 is involved in the recruitment of neutrophils and macrophages during lung injury41 and relates to hearing loss/impairment40. Loss of either TRPM4 or TRPM5 channels may significantly impair taste42 and olfaction43. TRP channels also contribute to several cardiac complications (arrhythmias, cardiac fibrosis and myocyte hypertrophy) observed in COVID-19 patients44.

Using miRNet, from the discriminating metabolites we identified the perturbed genes, which in turn pointed to the miRNAs altered in EBC. miRNAs have emerged as regulators of COVID-1945,46. In particular, we found hsa-miR-126-3p and hsa-miR-223-3p downregulated in post-COVID-19, while hsa-miR-146a-5p was upregulated. They are involved in the regulation of ACE2, the binding site of the virus, and in the inflammatory responses and immune regulation47. hsa-miR-126-3p attenuates lung inflammation via different pathways that reduce many proinflammatory cytokines including IL-648, which in COVID-19 has been linked to high mortality risk32. In COVID-19 patients, the serum level of hsa-miR-126-3p was considerably reduced with the increase of disease grade49, and this pattern was also observed in patients non-responsive to therapies50. hsa-miR-126-3p downregulation was also detected in plasma samples of COVID-19 patients with respect to a healthy control group, while no downregulation was observed between severe and mild patients51, in contrast to a previous report52. Furthermore, a positive correlation between miR-126-3p and neutrophil levels, and a significant negative correlation with IL-6 and D-dimer, were observed53. Interestingly, in vitro hsa-miR-126-3p exhibited neutralizing activity against SARS-CoV-2 infection49.

We here found that hsa-miR-126-3p does not return to the pre-COVID-19 values, and this is an indication of the persistent inflammation status after negativization. hsa-miR-126-3p also shows a pro-angiogenic role by stimulating endothelial cell proliferation54. The post-acute COVID-19 syndrome is associated with a persistent endothelial dysfunction, directly correlated with the severity of pulmonary impairment55, whose recovery is normally related to maintaining the physiological endothelial functions. Therefore, in line with the above results, the decrease we observed for hsa-miR-126-3p suggests the persistent presence of endothelial damage in patients.

Serum hsa-miR-223-3p directly inhibits the viral S protein expression and SARS-CoV-2 replication56, and is implicated in the regulation of inflammatory responses by inhibiting the action of the NLRP3 inflammasome and modulating the expression of inflammatory chemokines and cytokines57. In a possible mechanism, a decrease of hsa-miR-223-3p expression should increase NLRP3 expression levels and promote pyroptosis58,59. Therefore, the reduced level of hsa-miR-223-3p observed in post-COVID-19 patients confirms that inflammation is still present after negativization. Interestingly, hsa-miR-223-3p was increased by long-term physical exercise56, and we here found that 6MWD is the most important factor that characterizes the hospital discharge after post-COVID-19 rehabilitation. Taken together, this suggests a beneficial action of hsa-miR-223-3p with the consequent reduction of inflammation56.

Upon a viral infection, hsa-miR-146a is primarily produced to regulate the innate immune response and inflammation by negatively regulating the NF-κB pathway60,61. Therefore, its expression in COVID-19 decreases inflammatory disorders in target organs such as the lungs, heart, brain, skin, and underlying vascular disease61,62. The hsa-miR-146a-5p increase we observed in post-COVID-19 patients is an indication of the path to recovery, as the levels of IL-1, IL-6 and TNF-α cytokines are inversely correlated to hsa-miR-146a production63,64. In fact, hsa-miR-146a-5p was found ca. threefold higher in a COVID-19 post-acute group than in the acute group65,66, and COVID-19 patients who did not respond to tocilizumab treatment presented a reduction of hsa-miR-146a-5p with respect to responders, and its reduction in non-responders was associated with a higher risk of adverse outcomes53.

The above miRNAs are involved in cell cycle, regulation of stem cell proliferation, cell death, aging, hematopoiesis, and angiogenesis functions. Although nonspecific, cell cycle, regulation of stem cell proliferation and cell death could reflect the impact of COVID-19 on several multiorgan cellular processes, in line with the results of a proteomic analysis of autopsy samples from seven organs in COVID-19 patients67. Furthermore, the regulation of stem cell proliferation promotes remodeling and lung tissue regeneration after COVID-19-induced pneumonia and can help patients’ recovery68. Similarly, for cell death, acutely ill COVID-19 patients revealed an upregulation of cell death program genes, acting in a tissue-specific manner69.

More specific are aging, hematopoiesis and angiogenesis. Aging is a main risk factor for severe COVID-19 and its worst outcomes because it induces immunosenescence, which hampers the response to the virus70, and inflammaging, a low-grade diffuse inflammation71. Hematopoiesis alteration is associated with severe and fatal COVID-19 as SARS-CoV-2 alters the bone marrow microenvironment, weakening hematopoiesis and causing hemocytopenia72. Furthermore, IL-6, which increases dramatically in COVID-19, is important for regulation of hematopoiesis as it stimulates the production of bone marrow neutrophils73. Regarding angiogenesis, autopsy lungs from patients who died from SARS-CoV-2 infection showed significant new vessel growth and a corresponding differential upregulation of angiogenesis-associated genes74. Such a compensatory angiogenesis mechanism was also observed in heart, liver, kidney, brain and lymphoreticular organs in patients who died from COVID-1975.

Comparing clinical data from post-COVID-19 patients before and after a rehabilitation cycle, we detected dysregulation of parameters related to liver damage (AST and ALT), endovascular thrombotic processes (D-dimer), persisting pulmonary symptoms (CAT, Barthel, DLCO, SpO2, PaO2, FEV1, FVC, FEV1/FVC, FEV1%, FVC%), and physical impairment (6MWD). SARS-CoV-2-infected subjects present alterations of liver biochemistry76. Since AST and ALT increase is associated with the reduction of peripheral oxygen saturation in viral pneumonias77, it is expected that systemic hypoxia in COVID-19 may also alter AST and ALT levels. In fact, a five-fold increase of AST and ALT levels in COVID-19 with respect to normal is associated with an increased risk of death78, causing elevated levels of CRP (which is synthesized by the liver), D-dimer, ferritin and IL-676. Therefore, the increased CRP values in patients before rehabilitation again confirm the persistence of liver alteration and inflammation.

Both venous and arterial thromboses characterize COVID-19 pathology79. D-dimer is an indirect marker of active coagulation and thrombin formation, and represents a mirror of the endovascular thrombotic processes. Higher levels of D-dimer are observed in severe patients infected with SARS-CoV-2 compared to nonsevere ones, and, significantly, increased D-dimer has been reported in COVID-19 nonsurvivors with respect to survivors, and the concentration continues to rise until death80.

The following limitations of the study should be considered. First, although the number of enrolled patients encompasses that indicated by backward analysis, our results depend on a relatively limited number of subjects (38 patients and 38 controls). For this reason, we combined different types of biomarkers (EBC, miRNAs and clinical parameters), which represent complementary physiological aspects. Second, the patients were not consecutively recruited since they were selected from those of the rehabilitation division. As such, parameters like hospitalization for acute cases and rehabilitation period were variable, ranging between 7 and 50 days, and 5 and 57 days, respectively. Furthermore, only 3 females (8%) were included in each group because the patients admitted were typically males. Also, we cannot exclude the possibility that conditions/treatments not recorded because of the flexibility at hospital admission affect our conclusions. We are aware that such uncontrolled heterogeneity may contribute to variations in our findings, which therefore need to be validated in a larger cohort of patients with more balanced parameters. However, this heterogeneity reflects a real clinical setting during the COVID-19 pandemic emergency. Third, although metabolomics was untargeted, miRNAs were assessed after comparing those derived from an in silico analysis with those related to COVID-19. Obviously, other miRNAs may have clinical significance, and their clinical role may have been underestimated. However, since individual miRNAs lack specificity and should be used in combination with other omics parameters, we related miRNAs with metabolomics markers of EBC, which, to the best of our knowledge, has not been reported thus far.

Notwithstanding the above limitations, we were able to build a satisfactory description of the metabolic processes going on in post-COVID-19 patients, characterized by persistent inflammation, dysregulation of liver, endovascular thrombotic and pulmonary processes, and physical impairment. A clear correlation was found between the metabolic response of patients and the clinical outcomes, which suggests selective interventions to address the pathophysiological status of patients and possibly counteract the post-acute sequelae of COVID-19. In addition, information on the rehabilitation could be obtained according to the biomarkers that characterize the post-COVID-19 metabotype.

All things considered, the results shown here provide sufficient evidence that combining breath metabolomics, miRNAs and clinical parameters can generate a reasonable understanding of the complex pathophysiological status of negativized SARS-CoV-2 patients. Our approach is essentially noninvasive and could suggest an unbiased personalized approach to achieve an optimal use of healthcare resources.

Methods

Patients

Convalescent COVID-19 patients referred to the Pulmonary Rehabilitation Unit of Istituti Clinici Scientifici Maugeri IRCCS, Telese Terme, Italy, were screened from October 2020 to February 2021 for enrollment within 2 months of swab test negativization after infection with wild-type SARS-CoV-2, which was the predominant form in South Italy at the time, although the presence of the D614G variant was also reported. Inclusion criteria were: age ≥ 18 years; recent SARS-CoV-2 infection with severe-to-critical COVID-19 according to the NIH classification (https://www.covid19treatmentguidelines.nih.gov/overview/clinical-spectrum/); a long-COVID condition, with lingering or recurrent symptoms after recovery from the severe/critical condition and a negative swab test; indication for a multidisciplinary rehabilitation program. Exclusion criteria were: recent (< 6 months) major surgery or any previous lung surgery; current malignancy; any history of chronic respiratory disease (e.g., asthma, COPD) other than COVID-19; inability to understand or sign the informed consent. Clinical data and EBC samples from age- and sex-matched healthy volunteers were also included in the study as controls. They belonged to an irreversibly de-identified dataset of the electronic Maugeri database containing records of people selected from the hospital staff, whose samples (including EBC) had previously been collected and stored at −80 °C. The absence of significant respiratory, cardiac and/or metabolic diseases was established from the medical history.

Participants with missing data for the outcome of interest were excluded from the study.

We followed the STROBE reporting guidelines81, in line with the 1975 Declaration of Helsinki. The Ethics Committee of Istituto Nazionale Tumori, Fondazione Pascale, Naples, Italy approved the study (n. ICS 3/20).

Study procedures

After signing the informed consent, all convalescent COVID-19 patients underwent a detailed collection of key demographic and clinical information related to the acute phase of COVID-19, lung function, physical performance, comorbidities and treatment(s). For control subjects, data were extracted from an irreversibly de-identified electronic dataset, applying the same exclusion criteria as for convalescent COVID-19 patients. Venous blood samples were used for the common hemato-chemical parameters. Arterial blood samples were collected to measure oxygen (PaO2) and carbon dioxide (PaCO2) tension using a blood gas analyzer (ABL 825® FLEX BGA, Radiometer Medical Aps, Copenhagen, Denmark). Spirometry parameters and diffusing lung capacity for carbon monoxide (DLCO) were also evaluated with automated equipment (Vmax® Encore, Vyasis Healthcare, Milan, Italy) according to previously reported protocols82,83. Forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and DLCO were expressed both as numerical values and as percentages of predicted values (FEV1%, FVC% and DLCO%, respectively). The COPD Assessment Test (CAT)84 and the Barthel index were also administered to patients to evaluate the impact of the disease on daily living. Exercise capacity was tested by measuring the 6MWD85. All clinical and instrumental analyses were carried out at admission (in) and at discharge (out) after rehabilitation.

Rehabilitation

The rehabilitation protocol followed the official ATS/ERS guidelines (Supplementary Information)86. In brief, patients undertook a 5-week exercise-based program of 6 sessions/week (30 sessions). Physical exercise, based on treadmill walking, stationary cycling, arm ergometry, flexibility, stretching and strengthening exercises with body and fixed weights, was the cornerstone of the program, which also included dietary and psychosocial counselling. Participation was monitored and supervised by a physiotherapist.

EBC collection, NMR sample preparation and spectra acquisition

EBC samples were collected from negativized patients (post-COVID) before they entered the rehabilitation program. Control samples were from a cohort of healthy volunteers belonging to an irreversibly de-identified dataset of the electronic Maugeri database containing records of people selected from the hospital staff, whose EBC samples had previously been collected and stored at −80 °C. The absence of significant respiratory, cardiac and/or metabolic diseases was established from the medical history. Sample preparation and NMR spectra acquisition were carried out as described (Supplementary Information)87,88.

Power analysis

In metabolomics, an a priori power analysis is not feasible because the concentration variations of biomarkers are unknown before analysis89. The required sample size was therefore estimated by backward analysis, varying the 1 − α and 1 − β parameters from 95 to 99.9% and from 80 to 99.9%, respectively. Using the accuracy percentages obtained in our validation tests89, for 1 − α = 95% and 1 − β = 80% we derived 24 ± 3 post-COVID-19 patients for all classes, while for 1 − α = 1 − β = 99.9% we obtained 28 ± 3 patients. To account for possible drop-outs or protocol adherence problems, we screened 60 post-COVID-19 patients, and the final number of enrolled patients was greater than that indicated by the backward analysis (40 vs. 24/28). However, 1 − α = 95% and 1 − β = 80% are commonly used settings, whereas 99.9% is an extreme one.

Multivariate data analysis

EBC proton spectra were automatically subdivided into 420 discrete regions (‘buckets’) of equal width (0.02 ppm) and integrated (Supplementary Information)87,88. Each integral was normalized to the total spectrum area to account for possible dilution effects. NMR data were imported into the SIMCA-P+ 14 package (Umetrics, Umeå, Sweden) for Principal Components Analysis (PCA) and Orthogonal Projections to Latent Structures Discriminant Analysis (OPLS-DA) after Pareto scaling. Model quality was evaluated by using the goodness-of-fit parameter (R²) and the goodness-of-prediction parameter (Q²)90, together with an internal iterative 7-round cross-validation, a permutation test (800 repeats) and ANalysis Of VAriance testing of Cross-Validated predictive residuals (CV-ANOVA). Quantification was achieved with the OriginPro 9.1 software package (OriginLab Corporation, Northampton, USA). Statistical significance for selected metabolites was determined by parametric (ANOVA with Bonferroni correction) or non-parametric (Mann–Whitney U) tests according to the results of the normality tests performed to evaluate data distribution (Shapiro–Wilk and Kolmogorov–Smirnov tests). p < 0.05 was considered statistically significant. To evaluate possible covariates, propensity score matching was used to further estimate discriminant metabolites between the control and post-COVID classes before rehabilitation. The propensity scores were estimated in R with the MatchIt package91 using logistic regression based on the weight, systolic pressure and diastolic pressure reported in Table 1, while considering the other variables for correlation purposes. One-to-one nearest neighbor matching was used, and 37/38 patients were matched to a non-COVID subject in the dataset.
Statistical differences in ethanol (p = 0.006), methanol (p = 0.003), acetone (p = 0.007), acetate (p = 2.05 × 10⁻⁷), acetoin (p = 0.01), lactate (p = 0.006), CH2 portion of fatty acids (p = 1.38 × 10⁻⁷), isovalerate (p = 1.02 × 10⁻⁷), valerate (p = 0.005) and isocaproate (p = 3.58 × 10⁻⁶) levels between the post-COVID and control classes were evaluated through multiple linear models, and cluster-robust variance was used to estimate the standard error. None of the models showed a statistically significant effect of the considered covariates.
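The preprocessing chain described above (equal-width bucketing, normalization to total area, Pareto scaling) can be illustrated with a minimal Python sketch. It is not the SIMCA-P+ or vendor implementation: the function names, the spectral window of 0 to 8.4 ppm and the synthetic spectra are assumptions introduced only for illustration, and bucket integrals are approximated by sums of intensities on a uniform ppm grid.

```python
import numpy as np

def bucket_spectrum(ppm, intensity, lo, hi, width=0.02):
    """Sum intensities into equal-width ppm buckets spanning [lo, hi].

    Assumes every ppm value lies within [lo, hi] and that the ppm grid is
    uniform, so that a sum is proportional to the bucket integral.
    """
    n_buckets = int(round((hi - lo) / width))
    edges = np.linspace(lo, hi, n_buckets + 1)
    # Map each point to its bucket; a point at the upper edge goes to the last bucket.
    idx = np.clip(np.digitize(ppm, edges) - 1, 0, n_buckets - 1)
    return np.bincount(idx, weights=intensity, minlength=n_buckets)

def normalize_total_area(buckets):
    """Divide each spectrum (row) by its total area to correct for dilution."""
    return buckets / buckets.sum(axis=1, keepdims=True)

def pareto_scale(X):
    """Mean-center each bucket (column) and divide by the square root of its SD."""
    centered = X - X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant columns unscaled
    return centered / np.sqrt(sd)

# Synthetic example: 5 hypothetical spectra on an assumed 0-8.4 ppm window,
# which yields 420 buckets of 0.02 ppm.
rng = np.random.default_rng(0)
ppm = np.linspace(0.0, 8.4, 8400, endpoint=False)
spectra = rng.random((5, ppm.size)) + 0.1
buckets = np.vstack([bucket_spectrum(ppm, s, 0.0, 8.4) for s in spectra])
X = pareto_scale(normalize_total_area(buckets))
print(X.shape)  # (5, 420)
```

Pareto scaling divides by the square root of the standard deviation rather than by the standard deviation itself, which reduces the dominance of intense signals while inflating baseline noise less than unit-variance scaling does.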

NMR data before rehabilitation were integrated with clinical parameters to generate a correlation map with hierarchical clustering analysis (HCA) using the R software92. Clinical test values and selected bin integrals of significant metabolites were combined through their pairwise Pearson correlations. For clustering, the Euclidean distance was used as the metric and the centroid method as the clustering criterion.
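As an illustration of this kind of correlation map, the following Python sketch computes a Pearson correlation matrix and reorders it by hierarchical clustering with Euclidean distance and centroid linkage. The original analysis was run in R; the use of SciPy here, the function name, the choice to cluster the rows of the correlation matrix, and the random data are all assumptions made for the sake of the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

def clustered_correlation_map(data):
    """Return the Pearson correlation matrix reordered by centroid-linkage HCA.

    data: array of shape (n_subjects, n_variables), where the variables are
    clinical test values and selected metabolite bin integrals.
    """
    corr = np.corrcoef(data, rowvar=False)
    # Cluster variables by the Euclidean distance between their correlation profiles.
    Z = linkage(corr, method="centroid", metric="euclidean")
    order = leaves_list(Z)
    return corr[np.ix_(order, order)], order

# Synthetic example: 38 hypothetical subjects, 6 hypothetical variables.
rng = np.random.default_rng(0)
data = rng.normal(size=(38, 6))
corr_map, order = clustered_correlation_map(data)
print(order)  # column order to apply to the variable labels when plotting
```

The reordered matrix can be passed to any heatmap routine; the returned `order` is needed to permute the variable labels consistently.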

Clinical parameters discriminating post-COVID-19 patients at admission (in) and at discharge (out) after rehabilitation were evaluated by analyzing paired data with multilevel PLS-DA93 using the R software and the mixOmics package94. The paired-samples Wilcoxon test was used to assess statistical significance.
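The two ideas behind this paired analysis can be sketched in Python under stated assumptions. The multilevel step is represented here only by its core operation, removing each subject's own mean so that the within-subject (in vs. out) variation remains; this is not the mixOmics implementation, and no PLS-DA model is fitted. The function name and the 6MWD values are hypothetical.

```python
import numpy as np
from scipy.stats import wilcoxon

def within_subject_variation(X, subject_ids):
    """Subtract each subject's mean from that subject's rows.

    X: array of shape (n_observations, n_variables) with repeated measures.
    subject_ids: one identifier per row, shared by rows of the same subject.
    """
    X = np.asarray(X, dtype=float)
    subject_ids = np.asarray(subject_ids)
    Xw = np.empty_like(X)
    for s in np.unique(subject_ids):
        rows = subject_ids == s
        Xw[rows] = X[rows] - X[rows].mean(axis=0)
    return Xw

# Hypothetical 6MWD values (m) for 8 patients at admission and at discharge.
walk_in = np.array([310, 295, 402, 350, 280, 330, 365, 300], dtype=float)
walk_out = np.array([355, 330, 450, 372, 341, 360, 420, 338], dtype=float)

# Paired-samples Wilcoxon signed-rank test on the in/out differences.
result = wilcoxon(walk_in, walk_out)
print(result.pvalue)

# Within-subject matrix for the same data: rows stacked as in, then out.
X = np.concatenate([walk_in, walk_out])[:, None]
ids = np.tile(np.arange(walk_in.size), 2)
Xw = within_subject_variation(X, ids)
```

Removing the subject means before discriminant analysis prevents large between-patient differences from masking the smaller change produced by rehabilitation.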

Network analysis

Enrichment analysis on metabolites from the post-COVID-19 vs. controls model was applied using the diffusion method computed with the FELLA package in R95. The Homo sapiens database in the Kyoto Encyclopedia of Genes and Genomes (KEGG)96 was used. The resulting network and subnetwork were evaluated with a threshold of p < 0.001. The results are reported in Supporting Table S1.

miRNet in silico analysis

We used the miRNet tool97 to predict gene-modulated miRNAs. Genes were identified by a gene-metabolite interaction network analysis that uncovered interactions between metabolites and genes98. The miRNA-target data were derived from the miRTarBase v7.0, TarBase v7.0 and miRecords databases. Putative miRNA functions were identified using the hypergeometric test. Network size and complexity were reduced using miRNA-Function, the database for functional enrichment analysis. The functional implications of the miRNAs were uncovered by using Tam299. Gene Ontology and GO annotation data were obtained with QuickGO100.
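The hypergeometric test mentioned above can be written out explicitly. The sketch below is a generic over-representation calculation, not miRNet's internal code; the function name and the example counts are hypothetical. It gives the probability of observing at least k annotated items in a selection of n drawn from a background of N items, K of which carry the annotation.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric p-value, P(X >= k).

    k: annotated items observed in the selection
    n: size of the selection
    K: annotated items in the background
    N: size of the background
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Hypothetical example: 10 background miRNAs, 4 annotated to a function,
# 3 selected, 2 of which are annotated.
p = hypergeom_enrichment_p(k=2, n=3, K=4, N=10)
print(round(p, 4))  # 0.3333
```

With many functions tested at once, the resulting p-values would normally be adjusted for multiple comparisons; the correction used by the tool is not stated here.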

RNA isolation and quantitative real time PCR (qRT-PCR) miRNAs validation

After controlling for age (airway miRNAs may be age-dependent101), total RNA was extracted from ca. 1 mL of EBC from 20 subjects (10 healthy controls and 10 post-COVID-19 patients), a sample size that offered sufficient power to assess twofold changes. The purification kit (Norgen Biotek Corporation, Thorold, ON, Canada) was used according to the manufacturer’s instructions. RNA quantity and quality were analyzed with a NanoDrop spectrophotometer (Thermo Fisher Scientific, Monza, MB, Italy), and the RNA was subsequently stored at −80 °C until use. The concentration of total RNA in each sample ranged from 3 to 11 ng/μL, and the RNA was used in agreement with the transcription kit protocol. An exogenous spike-in, cel-hsa-miR-39-3p, was added in a standardized amount to all samples prior to RNA extraction to allow for normalization of technical variability. Furthermore, it was possible to estimate the purity of the RNA with respect to contamination from complex carbohydrates and proteins. For good RNA preparations, the A260/A280 purity ratio must be 1.8–2.0, as observed for all our samples. A lower ratio indicates the presence of contaminants (phenol or proteins absorbing near 280 nm). In this study, we specifically selected a phenol-free procedure with several filtering steps (removing larger particles) to obtain a pure total RNA extract. In addition, the Norgen kit protocol is also designed for biofluids and required no modification for our EBC samples. Isolated RNA was used to synthesize cDNA using a reverse transcription kit (Applied Biosystems, Foster City, CA, USA).

The expression of selected miRNAs (hsa-miR-34a-5p; hsa-miR-146a-5p; hsa-miR-126-3p; hsa-miR-223-3p; cel-hsa-miR-39-3p) was quantified using the TaqMan MicroRNA assay (Applied Biosystems, Foster City, CA, USA), and qRT-PCR was performed on an ABI Prism 7500 Real Time PCR System (Applied Biosystems, Foster City, CA, USA). miRNAs are reported as relative expression normalized to the mean of a synthetic spiked-in non-human cel-hsa-miR-39-3p (5′-UCACCGGGUGUAAAUCAGCUUG; Life Technologies Europe BV, Bleiswijk, the Netherlands). The relative expression of each miRNA was reported as 2^−ΔCt, with ΔCt being the difference between the Ct of the specific miRNA and that of cel-hsa-miR-39-3p. Each reaction was performed in triplicate.
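The relative-expression calculation can be made concrete with a short Python sketch. Averaging the triplicate Ct values before taking the difference is an assumption about how the replicates were combined, and the function name and Ct values are hypothetical.

```python
from statistics import mean

def relative_expression(ct_target, ct_spike_in):
    """Relative expression 2^(-dCt), with dCt = mean Ct(target) - mean Ct(spike-in).

    ct_target: triplicate Ct values of the miRNA of interest
    ct_spike_in: triplicate Ct values of the spike-in reference
    """
    delta_ct = mean(ct_target) - mean(ct_spike_in)
    return 2.0 ** (-delta_ct)

# Hypothetical triplicates: the target crosses the threshold 2 cycles after
# the spike-in, so its relative expression is 2^-2.
print(relative_expression([30.0, 30.0, 30.0], [28.0, 28.0, 28.0]))  # 0.25
```

Because each cycle corresponds ideally to a doubling of product, a ΔCt of 1 equals a twofold difference in abundance relative to the spike-in, under the assumption of 100% amplification efficiency.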