A patient-centric modeling framework captures recovery from SARS-CoV-2 infection

Ruffieux, Hélène; Hanson, Aimee L.; Lodge, Samantha; Lawler, Nathan G.; Whiley, Luke; Gray, Nicola; Nolan, Tui H.; Bergamaschi, Laura; Mescia, Federica; Turner, Lorinda; de Sa, Aloka; Pelly, Victoria S.; Kotagiri, Prasanti; Kingston, Nathalie; Bradley, John R.; Holmes, Elaine; Wist, Julien; Nicholson, Jeremy K.; Lyons, Paul A.; Smith, Kenneth G. C.; Richardson, Sylvia; Bantug, Glenn R.; Hess, Christoph

doi:10.1038/s41590-022-01380-2

Download PDF

Resource
Open access
Published: 30 January 2023

A patient-centric modeling framework captures recovery from SARS-CoV-2 infection

Nature Immunology volume 24, pages 349–358 (2023)Cite this article

8230 Accesses
16 Citations
64 Altmetric
Metrics details

Subjects

Infectious diseases

Abstract

The biology driving individual patient responses to severe acute respiratory syndrome coronavirus 2 infection remains ill understood. Here, we developed a patient-centric framework leveraging detailed longitudinal phenotyping data and covering a year after disease onset, from 215 infected individuals with differing disease severities. Our analyses revealed distinct ‘systemic recovery’ profiles, with specific progression and resolution of the inflammatory, immune cell, metabolic and clinical responses. In particular, we found a strong inter-patient and intra-patient temporal covariation of innate immune cell numbers, kynurenine metabolites and lipid metabolites, which highlighted candidate immunologic and metabolic pathways influencing the restoration of homeostasis, the risk of death and that of long COVID. Based on these data, we identified a composite signature predictive of systemic recovery, using a joint model on cellular and molecular parameters measured soon after disease onset. New predictions can be generated using the online tool http://shiny.mrc-bsu.cam.ac.uk/apps/covid-19-systemic-recovery-prediction-app, designed to test our findings prospectively.

Necroptosis blockade prevents lung injury in severe influenza

Article 10 April 2024

Long COVID: major findings, mechanisms and recommendations

Article 13 January 2023

Age-specific nasal epithelial responses to SARS-CoV-2 infection

Article Open access 15 April 2024

Main

Coronavirus disease 2019 (COVID-19), which is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has a wide spectrum of clinical manifestations and has led to over 6.6 million deaths worldwide by the end of 2022 (ref. ¹). When acute infection is resolved, health is restored in most individuals, yet some develop prolonged symptoms, known as long COVID^2,3,4. SARS-CoV-2 can induce an important acute-phase reaction (systemic inflammation), profound changes in organismal metabolism and alterations across many elements of the immune system^5,6. Evolution of these parameters over time is highly heterogeneous between patients^7,8. How failure to restore organismal homeostasis relates to recovery from acute infection and development of long COVID remains unclear.

In this study, we exploited existing and new data from a previously described COVID-19 cohort⁹. Published results using this cohort focused on the acute and convalescent phases of illness using clinical, immune cell and inflammation parameters measured up to 3 months after disease onset. Here we adopted a wider perspective to understand organismal recovery, based on the same parameters as before⁹ (immune cell subsets, serum cytokines and C-reactive protein (CRP) levels) but collected over an extended follow-up period of 12 months, as well as newly established data (polar metabolites, glycoproteins and lipoproteins), and patient questionnaires addressing long-term symptoms of disease. We devised a patient-centric longitudinal modeling framework, based on functional principal component analysis (FPCA), to examine the cellular, metabolic and inflammatory drivers of patient variability in disease trajectories, and study recovery from COVID-19 in a broad, systemic sense.

Results

Cellular and metabolic changes associate with inflammation

We recruited 215 SARS-CoV-2 PCR-positive patients (hereafter COVID-19 patient, CovP) between 31 March 2020 and 7 August 2020 in Cambridge Hospitals (Wuhan strain). CovPs were categorized according to five severity classes based on peak clinical severity: A, asymptomatic (n = 18); B, mild symptomatic (n = 40); C, hospitalized without supplemental oxygen (n = 50); D, hospitalized with supplemental oxygen (n = 38); and E, hospitalized with assisted ventilation (n = 69). Symptomatic CovPs had onset of disease between 20 February 2020 and 18 June 2020, and hospital outcome (discharged, other hospital or facility, or deceased) for inpatients occurred after a median of 23 d after symptom onset (interquartile range (IQR), 32.25 d). Briefly, inpatient classes (C to E) were sampled at enrollment, approximately weekly up to 4 weeks, and then every 2 weeks up to 12 weeks. Discharged CovPs were asked to provide a follow-up sample 4–8 weeks after enrollment. Outpatient classes (A and B) were sampled at enrollment and subsequently after approximately 2 and 4 weeks. Participant recall beyond the original study period⁹ occurred at approximately 3, 6 and 12 months following recruitment. Additionally, 14 CovPs were newly included, after discharge from hospital. CRP levels, five serum cytokines (interferon (IFN)-γ, interleukin (IL)-10, IL-1β, IL-6, tumor necrosis factor (TNF)), 36 polar metabolites, 103 glycoproteins and lipoproteins, and 33 immune cell subsets were quantified from the blood samples collected at the above time points. Follow-up questionnaires¹⁰ assessing long-term symptoms were obtained from symptomatic CovPs (classes B to E) between 3 and 11 months after symptom onset (up to three questionnaires per CovP, median interval between two consecutive questionnaires for a same CovP: 5 months, IQR 1.5 month). The cohort also comprised 45 uninfected healthcare workers (hereafter healthy control, HC), defined by negative PCR test and serology, for whom blood samples were obtained at enrollment only.

First, to relate immunologic and metabolic changes in our cohort to findings from the published literature^5,6, we examined the impact of SARS-CoV-2 infection across clinical, metabolic and cellular variables over a 7-week window after positive swab or symptom onset. Univariate linear mixed models were used, accounting for repeated participant measurements within the 7-week window, with each biologic parameter modeled as the dependent variable. A differential abundance analysis between HCs and CovPs across all severity groups indicated widespread systemic dysregulation (Extended Data Fig. 1). CovPs had a metabolic signature characterized by increased expression of intermediates from the kynurenine pathway (3-hydroxykynurenine, kynurenine, quinolinic acid) and depletion of the upstream amino acid tryptophan (Extended Data Figs. 1 and 2). Compared to HCs, plasma from CovPs also showed reduced abundance of other amino acids, most pronounced for citrulline, histidine and ornithine (Extended Data Fig. 1). There was further a decrease in high-density lipoproteins compared to HCs, in particular apolipoproteins, and an increase in very-low-density lipoproteins, in particular triglycerides and free cholesterol, and in the N-acetyl glycoprotein signals GlycA and GlycB (Extended Data Fig. 1). Absolute numbers of neutrophils, plasmablasts and activated CD8⁺ T cells were increased in CovPs compared to HCs, while absolute counts of Vγ9 Vδ2 γδ T cells, MAIT cells, CD4⁺ T_EMRA cells, CD4⁺ T_EM cells and total B_M cells (marginal zone-like B cells) were decreased (Extended Data Fig. 1), as previously reported⁹. Mixed-model association analyses, again accounting for serial participant measurements within the first 7 weeks after symptom onset or positive swab, revealed significant linear relationships between CRP concentration in serum—an indicator of the acute-phase response in COVID-19 patients^11,12—and most metabolic and cellular variables in CovPs across all severity classes (A to E; Extended Data Fig. 3). Together, these analyses captured marked molecular and cellular abnormalities across the first 7 weeks after disease onset, and extensive covariation of these abnormalities with CRP.

Functional principal component analysis captures heterogeneity of patient C-reactive protein trajectories

Exploring study populations fails to provide patient-level insight. To overcome this limitation, we applied an FPCA framework to the parameter signatures discussed above. Based on the premise that inflammation is a central disruptor of homeostatic functions, we first aimed to identify the main modes of variation of CRP trajectories in CovPs over the first 7 weeks after symptom onset. Time points of analysis for asymptomatic CovPs (class A) were measured from the date of their first positive swab, while time points for symptomatic CovPs (classes B to E) were measured from the date of their symptom onset. As such, in order to prevent bias due to temporal shifts across participant trajectories (Supplementary Figs. 1 and 2), and ease the interpretation of the notion of ‘recovery’, we only analyzed the data from the 113 symptomatic CovPs whose CRP was measured at least twice. We used FPCA to model the CRP trajectory of each CovP as a mean function, reflecting the average CRP course across all CovPs, plus a truncated sum of patient-specific random deviations from the mean. These deviations were represented as a linear combination of orthonormal eigenfunctions, weighted by patient-specific scores, where the eigenfunctions and scores were estimated from the CovPs’ CRP data at all time points within 7 weeks after symptom onset. The first two eigenfunctions jointly accounted for more than 99% of the variance and, along with their corresponding scores, they were interpretable in terms of disease profiles: the first eigenfunction and associated scores (FPC1) acted as a proxy for inflammation severity (Fig. 1a,b). CovPs with positive FPC1 scores had higher CRP than the average CRP levels of all CovPs over the 7-week period (for example, CovP CV0047; Fig. 1b), while CovPs with negative FPC1 scores had lower CRP than the average CovP levels (for example, CovP CV0046; Fig. 1b). This interpretation was independently corroborated by the strong association of the scores with the B to E severity classes (Fig. 1c), establishing the clinical relevance of our FPCA inferences. We thus refer to the FPC1 scores as ‘severity scores’. The second eigenfunction and associated scores (FPC2) acted as a proxy for recovery from inflammation, with CovPs that had large positive FPC2 scores showing a marked improvement of the inflammatory status over time (for example, CovP CV0115; Fig. 1b). By contrast, inflammation resolved more slowly, or became more severe, compared to the mean function in CovPs with negative FPC2 scores (for example, CovP CV0212; Fig. 1b). Hence, we refer to the FPC2 scores as ‘recovery scores’. Of note, of the 113 CovPs, 40 CovPs from class C, D or E had either proven or suspected secondary infection—mostly hospital-acquired bacterial pneumonia (Supplementary Figs. 3 and 4). Additional analyses indicated an association between this status (infection proven, suspected or not suspected) and severity and recovery scores (Supplementary Fig. 3), in line with the expectation that a secondary infection would further increase inflammation levels. However, the FPCA estimates for CovPs with no suspected secondary infection were robust to the inclusion of CovPs with super-infection in the inference framework (Supplementary Fig. 5).

**Fig. 1: Functional principal component analysis of COVID-19 patients’ C-reactive protein levels.**

Gaussian mixture model (GMM) clustering of the severity and recovery scores uncovered three patient groups (Fig. 1b) with distinct inflammation trajectories. Group 1 had absent or mild inflammation over the 7-week window after symptom onset; group 2 had early, resolving inflammation, while group 3 had persisting inflammation (Fig. 1d), confirming that the patient-level FPCA estimates encompassed information on the magnitude and temporal profile of systemic inflammation as captured by CRP. Of note, all 12 CovPs who died belonged to group 3 (Fig. 1b). As expected¹³, additional analyses indicated that the groups were significantly associated with age and gender (Fig. 1e). Notably, groups 1–3 were not only entirely driven by disease severity, but also by the type of recovery profile (Fig. 1b). In particular, CovPs in group 2 had higher than average recovery scores, and each of the three groups comprised CovPs from multiple severity classes (Fig. 1c and Supplementary Table 1). We therefore refer to groups 1–3 as ‘recovery groups’.

The GMM clustering estimated probabilities of assignment of CovPs to the groups. Inspection of these probabilities indicated that some CovPs near the group boundaries were weakly allocated, that is, their assigned recovery group was uncertain (Fig. 1b). However, sensitivity analyses evaluating the impact of potential misallocation on all group-level findings indicated that estimates were barely changed by purposely reassigning all weakly allocated CovPs to another group (Supplementary Figs. 6–11). Collectively, these analyses showed that variability in CRP trajectories could be decomposed into separate ‘severity’ and ‘recovery’ latent contributions to the patient inflammation profiles, which enabled identifying three patient recovery groups with distinct inflammation dynamics.

Dynamics across biologic systems are coordinated

To test how longitudinal post-infection profiles, as reflected by recovery groups 1, 2 and 3, related to systemic recovery, we examined all biologic parameters quantified for the 113 symptomatic CovPs over the 7-week window following symptom onset. Mixed-effect modeling with time encoded as a continuous variable, indicated that the trajectories of the cellular and molecular parameters largely reflected the inflammation profile that characterized each recovery group, as exemplified by five parameters from each of the data types (Fig. 2a). However, correlation analyses conducted in each recovery group—stratifying the patient data into an acute-infection phase (weeks 0–3 after symptom onset) and a protracted-infection or convalescence phase (weeks 4–7 after symptom onset)—further indicated significant covariations across data types, for all recovery groups in both time bins, which were not observed in data from HCs (Extended Data Fig. 4). This suggested that, irrespective of disease severity, as a population, infected individuals did not make full immune and metabolic recovery by week 7 after symptom onset.

**Fig. 2: Group-level and patient-level estimates of cellular and molecular trajectories.**

To explore the interplay between the temporal profiles of the inflammatory, immune cell and metabolomic responses at the patient level, we performed additional multivariate FPCA studies on sets of parameters whose alterations were identified to be signatures of active SARS-CoV-2 infection in our population-level analyses, that is, apolipoproteins (HDA1, HDA2, VLAB), glycoproteins (GlycA, GlycB), kynurenine-pathway metabolites (3-hydroxykynurenine, kynurenine, quinolinic acid, tryptophan), serum cytokines (IFN-γ, IL-10, IL-1β, IL-6, TNF) and subsets of lymphocytes (CD4⁺ T_EMRA cells, CD4⁺ T_N cells, non-naive HLA-DR⁺CD38⁺ CD4⁺ T cells, CD8⁺ T_EMRA cells, CD8⁺ T_N cells, non-naive HLA-DR⁺CD38⁺ CD8⁺ T cells, total B_M cells, γδ T cells, mucosal-associated invariant T (MAIT) cells, natural killer (NK) cells, NKT cells, plasmablasts). Trajectories reconstructed for three individual CovPs (CV0261, CV0115 and CV0212) from each recovery group largely co-evolved over the disease course (Fig. 2b). The temporal profiles for quinolinic acid in these CovPs broadly agreed with the group-level longitudinal estimates (Fig. 2a). More generally, the 95% confidence bands of the estimated trajectories for CovP CV0261 (group 1, symptomatic but not hospitalized) covered the normal levels (HCs’ IQR; Fig. 2b), suggesting absent or mild alterations. The trajectories of CovP CV0115 (group 2, supplemental oxygen) were slightly above or below normal levels, but with no sign of deterioration over time (Fig. 2b). Finally, CovP CV0212 (group 3, assisted ventilation; died 44 d after symptom onset) exhibited a clear deterioration in all parameters, with the estimated trajectories departing from normal levels (Fig. 2b). The trajectories of cellular-level, molecular-level and inflammation-level parameters, and the severity and recovery scores for all CovP analyzed in the FPCA framework, can be inspected interactively at http://shiny.mrc-bsu.cam.ac.uk/apps/covid-19-patient-trajectories/, along with basic anthropomorphic and clinical information on each CovP. These reconstructed trajectories indicated that, although individual parameter courses tended to be representative of the general parameter evolution within the groups, some CovPs displayed exceptional trajectories, emphasizing the value of patient-specific estimates to resolve the covariation of cellular and molecular parameters with inflammation at the patient level.

In addition, the FPCA studies on the cellular and metabolic parameters more broadly defined the interplay between data types in response to infection. Similarly to the CRP FPCA, the first and second eigenfunctions could be interpreted as proxies for ‘severity’ and ‘recovery’ (or ‘normalization’) of the parameter trajectories, respectively, and the different sets of severity scores were highly correlated across data types (Fig. 2c), confirming that inflammatory, immunologic and metabolomic alterations were closely interlinked. Inspecting the partial correlation structure among severity scores further indicated direct relationships between the CRP severity scores and all other severity scores, except for the lipoprotein scores (with which they were indirectly associated through the glycoprotein and cytokine scores; Fig. 2d), corroborating the central role of CRP scores in assessing severity profiles. The severity scores from all data types echoed the clinical severity classes (B to E; Fig. 2c), indicating that the variability in the various trajectories reflected pathophysiologic signatures relevant to clinical disease. Correlation of recovery scores across the different types of parameters was somewhat weaker, although still significant in most cases (Fig. 2c), which may reflect distinct dynamics between cellular compartments or differing half-lives. Alternatively, it could indicate persistent disruption of select biologic systems despite resolution of inflammation. Together, these data captured the complexity of the mostly coordinated dynamic changes observed across the immune, metabolic and inflammatory systems.

Disruption of biologic parameters can last for months

To examine the long-term dynamics of cellular and molecular recovery, we modeled differential parameter abundance between each recovery group and HCs over an extended time frame of 1 year from infection, using linear mixed models and binning the CovPs data into five time windows (weeks 0–3, weeks 4–7, weeks 8–12, weeks 13–27, weeks 28–52 after symptom onset; chosen so as to involve similar numbers of observations). As expected, group 1 (mild or absent inflammation) showed minor early disruptions, linked to plasmablasts, citrulline, glutamic acid, glutamine, ornithine and several metabolic ratios that were significantly different from HCs in weeks 0–3 after symptom onset (Fig. 3a–d). In group 2, many parameters (including several lipoproteins, glycoproteins and lymphocytes) were altered up to week 4 (67%) or week 8 (33%) after symptom onset, but were indistinguishable from HCs in the subsequent time windows (that is, from week 8 onwards; Fig. 3a–d), thus aligning with the recovery from inflammation defining the group. Group 3 (persistent inflammation) showed widespread and long-lasting cellular alterations. Most notably, plasmablast and non-naive HLA-DR⁺CD38⁺ CD8⁺T cell numbers were still increased, whereas the numbers of plasmacytoid dendritic cells, CD4⁺ follicular helper T-like cells, Vγ9 Vδ2^lo γδ T cells, MAIT cells, naive B cells and CD4⁺ regulatory T (T_reg) cells were still decreased in the fourth time window (weeks 13–27; Fig. 3a). Similarly, there were persistent metabolic alterations (weeks 13–27) in group 3 compared to HCs, such as increased kynurenine, quinolinic acid and glutamic acid, and decreased levels of tryptophan, indole-3-acetic acid and serotonin (Fig. 3b), as well as increased levels of VLAB, GlycA and GlycB (Fig. 3c). Using group-level longitudinal mixed models with time modeled as a continuous variable, we next assessed baseline group effects (that is, group differences at symptom onset) and group × time interaction effects. Nearly half of the parameters included in this study had significant baseline and/or interaction effect(s) (for instance, NK cells, CD8⁺ T_EM cells, myeloid dendritic cells, CD8⁺ T_N cells, GlycA and GlycB displayed both baseline and interaction effects, CD4⁺ follicular helper T-like cells, Vγ9 Vδ2^lo γδ T cells, MAIT cells, kynurenine, quinolinic acid and tryptophan had significant baseline effects only, and HLA-DR⁺CD38⁺ CD8⁺ T cells had significant interaction effects only; Fig. 3a–c). These parameters, when measured early after infection, might therefore contain information regarding an individual’s ability to recover. In all, these results indicated that recovery groups 1–3 had dissimilar parameter recovery rates, with group-3 CovPs experiencing persisting biologic disruptions up to 6 months after symptom onset.

**Fig. 3: Long-term recovery-group trajectories for biologic parameters (n = 113 CovPs).**

Biologic recovery and clinical outcomes are interlinked

We next asked how biologic recovery profiles related to long COVID, using questionnaires on long-term symptoms (covering respiratory, neurological, gastrointestinal and other physical sequelae)¹⁰. These questionnaires were collected from 65 CovPs (54% female, median age 51 years old, IQR 29 years old) between months 2 and 11 after symptom onset. A comparison of the reported symptoms indicated that CovPs in group 3 reported more neurological symptoms (fatigue, muscle weakness, pain, difficulty eating, drinking, swallowing) compared to group 1 (Fig. 4a). Of note, in group 3, fatality and mechanical ventilation (a source of non-infection-related sequelae) added unavoidable limitations. We also conducted simple and partial correlation analyses (Supplementary Fig. 12), which indicated that the neurological symptoms were largely interconnected, with the exception of anosmia and dysgeusia. A latent factor analysis, which aimed to characterize the joint manifestation of the reported symptoms, identified two latent factors, the first of which (LFA1) appeared to be driven by the neurological symptoms. In particular, new neurology in limbs, fatigue and muscle weakness were attributed the largest loadings for LFA1 (Fig. 4a). As it accounted for most of the modeled variability, LFA1 also served as a natural patient-level latent proxy for long-term clinical manifestations, and was associated with the recovery groups (Fig. 4b). As such, LFA1 was significantly higher for CovPs from group 3 (poorer score), and lower for CovPs from group 1 (Fig. 4b). However, there was substantial variability within each recovery group. In particular, two patients, CV0165 in group 1 and CV0201 in group 2, had high composite scores, although their cellular, inflammatory and molecular trajectories had essentially returned to normal levels by week 7 after symptom onset, as estimated by the FPCA studies (and seen at http://shiny.mrc-bsu.cam.ac.uk/apps/covid-19-patient-trajectories/). This suggested that systemic, subjectively perceived sequelae persisted in these individuals despite absent, or rapidly resolving, inflammation and cellular/molecular disruption. Finally, a survival analysis found an increased risk of death for group 3 compared to groups 1 and 2 (log-rank test, P = 0.0007; Fig. 4c). Together, these results showed that the inflammatory response to infection, as encompassed by the three recovery groups, was tightly linked with survival outcomes and long-term clinical sequelae up to 1 year after symptom onset, despite substantial patient-to-patient variations.

**Fig. 4: Long-term symptom characterization of the recovery groups by latent factor and survival analyses.**

Joint model finds early markers of incomplete recovery

To test whether the type of recovery profile of individual CovPs could be predicted soon after infection, we trained an integrative model on samples collected during the early phase of the disease. Specifically, we applied a sparse generalized canonical correlation analysis (sGCCA) algorithm, extended for supervised analysis, on all cellular and molecular parameters jointly, also including age and gender as a separate canonical vector. sGCCA involved an internal selection procedure for identifying a subset of parameters predictive of incomplete recovery from the nearly 200 candidate markers. Briefly, sGCCA implemented a trade-off between maximizing the correlation of biologic parameters (immune cell subsets, polar metabolites, glycoproteins, lipoproteins and diverse metabolic ratios) and maximizing the discrimination between unfavorable recovery profiles (group 3) and favorable recovery profiles (group 1 and group 2 merged). We used the first sample from each CovP, provided it was taken within 3 weeks from symptom onset, and considered a training–test set split involving 70% and 30% of the samples, respectively. The signatures for the first two latent components (Fig. 5a) and the circular plot (Fig. 5b) were obtained using the training samples, while the receiver operating characteristic (ROC) curves (Fig. 5c) were obtained using the left-out test samples.

**Fig. 5: Predictive modeling for the recovery groups.**

The sign of the average signature parameter abundances (Fig. 5a) aligned with published results^5,6,9. NK cells appeared with the largest weight in the first signature (average cell numbers lower in CovPs with poor prognosis; Fig. 5a). Metabolic intermediates from the kynurenine pathway, and corresponding ratios, also appeared as important markers of the type of recovery (Fig. 5a), corroborating reports on the involvement of the pathway in adaptive immunity and inflammation¹⁴. Quinolinic acid, tryptophan, kynurenic acid, kynurenine and 3-hydroxykynurenine were selected in the first signature, while serotonin, a neurotransmitter derived from tryptophan catabolism, was selected in the second signature. These metabolites appeared together with a series of innate immune cells. The area under the curve (AUC) on the test set reached 90.8% (Fig. 5c). The predictive performance when restricting the signatures to each data type was also high (AUCs > 84.7%), suggesting strong interdependence of the biologic processes captured in our study. The AUC assessing the predictive value of age and gender was the second lowest (85.2%) after that of the cell-type block (84.7%), suggesting that the signal encompassed in the molecular markers has greater predictive value (>87.8%) compared to anthropomorphic characteristics only. Moreover, while age and gender already contain substantial information regarding the risk of incomplete recovery at the population level (Fig. 5c), at the individual level—that is, clinically most relevant—a low risk in terms of anthropomorphic factors does not rule out the possibility of incomplete organismal recovery, as our FPCA findings capture. For instance, the individual-level estimates for CovP CV0209, 38 years old, in group 3 (accessible at http://shiny.mrc-bsu.cam.ac.uk/apps/covid-19-patient-trajectories) indicated unfavorable molecular parameter trajectories and scores; this young CovP did not survive. Hence, relevant biologic markers measured early after disease onset in young individuals may correctly predict a poor outcome.

Finally, we developed an interactive tool to browse predictions based on selected markers from the two systemic recovery signatures available at http://shiny.mrc-bsu.cam.ac.uk/apps/covid-19-systemic-recovery-prediction-app/. For any new prediction run through this tool, an estimated prediction score conveys the level of confidence in the predicted recovery profile, and the predictive performance on the test set is re-evaluated based on the subset of markers supplied. This internal performance assessment of the sGCCA model on the Cambridge cohort suggested that the cellular and molecular markers from the estimated signatures contain valuable predictive information for anticipating incomplete recovery, already when measured soon after symptom onset.

Discussion

Our joint modeling framework of the longitudinal immunologic, metabolic and clinical data on a SARS-CoV-2 cohort indicated protracted temporal covariation patterns that highlighted acute-phase inflammation as a common denominator interlinked with incomplete clinical, immunologic and metabolic recovery up to a year after disease onset, and suggested that coordinated dynamics of the innate immune system, kynurenine and host lipid metabolism were likely underpinning restoration of overall homeostasis. Specifically, the reconstruction of parameter trajectories at the patient level, and the estimation of patient-specific latent ‘severity’ and ‘recovery’ scores, allowed the characterization of the inter-patient and intra-patient variability in responses to infection, and revealed three types of patient profiles in an unsupervised fashion, agnostic of any clinical assessments subject to interpretation. These profiles constituted distinct patient ‘systemic recovery’ categories, which characterized disease course, risk of death and of long COVID beyond peak clinical severity. Furthermore, our patient-centric FPCA framework permitted confronting the biology of group profiles with that of individual patients (http://shiny.mrc-bsu.cam.ac.uk/apps/covid-19-patient-trajectories/). Finally, an early prognosis of recovery for a new patient can be obtained from our pilot predictive model (http://shiny.mrc-bsu.cam.ac.uk/apps/covid-19-systemic-recovery-prediction-app/), whose excellent performance in our cohort warrants independent validation to evaluate clinical actionability.

Our data suggested that a limited number of pathophysiologic processes impact many of the parameters appearing together in early composite signatures predictive of poor prognosis. Of note, both the inflammatory immunopathology of severe acute SARS-CoV-2 infection (note that upper airway viral persistence was excluded for all severity classes⁹), and hemophagocytic lymphohistiocytosis¹⁵, a cytokine storm syndrome, are characterized by elevated triglycerides—likely linked to protracted hyper-inflammation. NK cells play a central role in antiviral immunity through the secretion of pro-inflammatory cytokines and cytotoxic activity, and NK cell dysfunction is also a key criterion of hemophagocytic lymphohistiocytosis^15,16. NK cells had the largest weight in the first predictive signature, with low absolute counts associated with poor outcome. This reduction of NK cells in peripheral blood from individuals with unresolving CRP (group 3) suggests an inflammation-driven perturbation of the NK cell compartment. The precise contribution of NK cell (dys)function to SARS-CoV-2-related acute and post-acute pathology remains unresolved^17,18,19.

Pro-inflammatory cytokines activate the kynurenine pathway through induction of indoleamine 2,3-dioxygenase (IDO-1), as observed in our cohort²⁰. Kynurenine-pathway activation has been implied in linking inflammation and central nervous system alterations by favoring the degradation of tryptophan toward 3‐hydroxykynurenine and quinolinic acid²¹, both of which appeared in the first predictive signature, and by reducing serotonin production²², which appeared in the second signature. It is plausible that abnormal levels of kynurenine-pathway intermediates, coupled with the significant reduction of serotonin abundance, contribute to neurological sequelae (for example, fatigue, weakness, chronic pain) of long COVID. Pathologic smell and taste, by contrast, may be related to more direct nerve sheath damage²³. In addition, kynurenine and some of its metabolites are endogenous ligands of the transcription factor AhR. AhR is a regulator of balance between T_reg and the T_H17 subset of helper T cells^24,25, and binding of kynurenine to AhR in CD4⁺ T cells promotes the expression of the transcription factor FoxP3 and T_reg cell differentiation²⁶. In addition, 3-hydroanthranillic acid—a kynurenine-pathway metabolite—also enhances T_reg cell differentiation²⁷. It will be interesting to define how function, frequency and absolute T_reg cell counts integrate to impact the pathophysiology of COVID-19.

The Cambridge cohort involved unvaccinated patients infected by the Wuhan variant and therefore constituted a clean floor for studying the immunologic and metabolic response triggered by the original strain of the SARS-CoV-2 virus. However, our study does not reflect the current epidemiological situation, which is also a limitation. Importantly however, our statistical framework is transferable to new cohorts (vaccinated, treated), facilitating systematic comparative work. By extension, long COVID and Epstein–Barr virus (EBV)-associated post-acute-infection syndrome (EBV-PAIS) share symptoms, and EBV reactivation after SARS-CoV-2 infection has been described^28,29. Assessing how EBV-PAIS and SARS-CoV-2-related EBV reactivation relate to disruption of immune and metabolic homeostasis, and interlinked disease processes, using our methodology may be of value.

Altogether, our framework allows studying the molecular and clinical correlates of organismal biologic recovery. It further offers patient-centric lenses to dissect the heterogeneity of the immune and metabolic responses to SARS-CoV-2 infection, which facilitates the formulation of mechanistic hypotheses as well as the development and testing of personalized intervention strategies.

Methods

Cohort and samples

The cohort and initial sampling timeline have been described in earlier work⁹. Late-time-point samples have subsequently been collected and new metabolic data have been quantified. Briefly, study participants were CovPs attending Addenbrooke’s Hospital, Royal Papworth Hospital NHS Foundation Trust or Cambridge and Peterborough Foundation Trust with a confirmed diagnosis of SARS-CoV-2 infection, as well as SARS-CoV-2-positive healthcare workers recruited from a staff screening program. The initial sampling program collected blood samples for 201 CovPs at enrollment (first samples collected on 31 March 2020), and at regular intervals up to 3 months after symptom onset, that is, inpatients were sampled approximately weekly up to 4 weeks after enrollment, then every 2 weeks up to 12 weeks after enrollment; discharged CovPs were asked to provide a follow-up sample 4–8 weeks after enrollment; outpatients were sampled after approximately 2 and 4 weeks after enrollment. Late-time-point samples were obtained up to a year after symptom onset, approximately at months 3, 6 and 12 following recruitment. A total of 14 additional hospitalized CovPs were included alongside the original cohort for the long-term follow-up after their discharge from Addenbrooke’s Hospital, providing late-time-point samples from month 3 after symptom onset onwards. Each participant was assigned to one of following categories of clinical severity: A, asymptomatic healthy workers (n = 18); B, symptomatic healthy workers (still working or self-isolating, n = 40); C, CovPs who presented to hospital but never required oxygen supplementation (n = 50); D, CovPs who were admitted to hospital and whose maximal respiratory support was supplemental oxygen (n = 38); E, CovPs who at some point required assisted ventilation (n = 69). Controls (n = 45) were SARS-CoV-2 PCR-negative hospital staff members with a negative serology. No statistical methods were used to predetermine sample sizes but our sample sizes are similar to those reported in previous publications^5,6. No randomization or blinding was applicable as this was not an intervention study. Blood samples were drawn in EDTA, sodium citrate, serum and PAXgene Blood RNA tubes (BD Biosciences) and processed by the CITIID-NIHR COVID BioResource Collaboration group. No unique material was used in this study. All study participants provided written informed consent before enrollment. Ethics approval was obtained from the East of England–Cambridge Central Research Ethics Committee (‘NIHR BioResource’ REC ref. 17/EE/0025, and ‘Genetic variation AND Altered Leucocyte Function in health and disease–GANDALF’ REC ref. 08/H0308/176).

C-reactive protein and cytokines

High-sensitivity CRP and serum cytokines (IFN-γ, IL-6, IL-10, IL-1β, TNF) were quantified by laboratories in Cambridge using standard assays⁹.

Flow immunophenotyping and CyTOF assays

The assays are detailed in previous work on the early follow-up of the study cohort⁹. Briefly, peripheral blood mononuclear cells were obtained from peripheral venous blood collected in 10% sodium citrate tubes (up to 27 ml per sample). They were isolated using Leucosep tubes (Greiner Bio-One) with Histopaque 1077 (Sigma) by centrifugation at 800g for 15 min at room temperature. Peripheral blood mononuclear cells at the interface were collected, rinsed twice with autoMACS running buffer (Miltenyi Biotech) and cryopreserved in FBS with 10% dimethylsulfoxide. All samples were processed within 4 h of collection.

Nuclear magnetic resonance spectroscopy and mass spectrometry based quantitative metabolic phenotyping

¹H nuclear magnetic resonance (NMR) sample preparation was performed according to Bruker IVDr protocols³⁰ and recommended procedures for IVDr metabolic analysis of COVID-19 plasma samples³¹. Plasma samples were stored at −80 °C until required, defrosted and centrifuged at 13,000g for 10 min at 4 ^°C. The plasma supernatant was mixed with buffer (75 mM Na₂HPO₄, 2 mM NaN₃, 4.6 mM sodium trimethylsilyl propionate-[2,2,3,3-2H₄] in 80% deuterium oxide, pH 7.4 ± 0.1; 1:1 ratio), and transferred into a Bruker SampleJet NMR tube (5 mm). NMR measurements were performed on a Bruker 600 MHz Avance III HD spectrometer (IVDr) equipped with a BBI probe and fitted with Bruker SampleJet robot with the cooling system set to 5 °C. A quantitative calibration was completed before the analysis³². For each sample, a ¹H one-dimensional experiment with solvent pre-saturation (32 scans, 98,304 data points, spectral width of 18,028.85 Hz) and a DIRE experiment was run (64 scans, 98,304 data points, spectral width of 18,028.85 Hz)³³. Lipoprotein reports itemizing 112 lipoprotein parameters for each plasma sample were generated using the Bruker IVDr Lipoprotein Subclass Analysis (B.I.LISA) method³⁰. ERETIC correction³⁴ was applied to all DIRE spectra and spectra were calibrated by setting the spectral reference value to 0 (SR = 0). Integration of the alpha-1-acid glycoprotein N-acetyl signals GlycA and GlycB, and supramolecular phospholipid composite was performed using in-house scripts. GlycA, GlycB and supramolecular phospholipid composite integrations were performed using the following regions: δ 2.05–2.09, δ 2.09–2.12 and δ 3.15–3.35.

For biogenic amines, amino acids and tryptophan metabolic pathway analysis, plasma samples were thawed at 4 °C and prepared following methods previously reported^35,36. For the quantification of biogenic amines and amino acid metabolites, separation was performed by ultra-high-performance liquid chromatography (UHPLC) using an Acquity UPLC (Waters) coupled to a Bruker impact II QToF mass analyzer (Bruker Daltonics). Resulting data files were processed for peak integration and quantification using Target Analysis for Screening Quantification (TASQ) software v2.2 (Bruker Daltonics). For the measurement of tryptophan and associated catabolites, separation was performed using an Acquity UPLC (Waters) coupled to a Xevo TQ-XS mass spectrometer (Waters). Raw files were processed for peak integrations and metabolite quantification using the TargetLynx package within MassLynx v4.2 (Waters). For both analytical assays, calibration curves were linearly fitted using a weighting factor of 1/x and quality control checks were performed.

Data preprocessing and quality control

All statistical analyses were conducted using the R software³⁷. Except for ratios for which no systematic transformation was applied, CRP, cytokines and all other cellular and molecular variables were log-transformed, using $x \mapsto {{{\mathrm{log}}}}_2\left( {x + 1} \right)$ (with the offset ‘+ 1’ accounting for zero counts while ensuring positivity), to stabilize variances and improve approximations of normality in statistical tests (although the normality assumption was not formally tested for each variable). For each molecular dataset, the presence of extreme measurements and/or batch effects was assessed using PCA visualization. No batch effect was observed. A standard box plot rule was applied to discard extreme samples, that is, with >20% of their measurements falling outside the Tukey outer fences. Following this procedure, two immune cell-type samples (1.1%) and ten metabolomic samples (1.6%) were removed from all downstream analyses (all glycoprotein and lipoprotein samples were retained).

Differential abundance analysis and association with C-reactive protein

Differential abundance analysis between CovPs and HCs was conducted using linear mixed modeling to account for serial patient measurements over a window of 7 weeks after symptom onset or positive swab. Analyses also included gender and age as fixed effects. The following model was implemented using lmerTest R package (R notation):

$${{{\mathrm{dep}}}}\_{{{\mathrm{var}}}}\sim {{{\mathrm{covid}}}}\_{{{\mathrm{status}}}} + {{{\mathrm{age}}}} + {{{\mathrm{gender}}}} + (1|{{{\mathrm{subject}}}}\_{{{\mathrm{id}}}})$$

where the different molecular variables were taken in turn as the dependent variable dep_var and covid_status is a binary variable coding for ‘COVID positive’ or ‘COVID negative’. Significance of the covid_status effect was assessed using a type 3 F test and Satterthwaite’s method (to estimate the degrees of freedom for fixed effects), and adjustment for multiple testing across each molecular dataset was performed using an FDR level of 5%. When significance was reached, the molecular variable was called ‘upregulated’ or ‘downregulated’ based on the sign of the fold change.

The same linear mixed-model framework was used to test the association between CRP and each cellular/molecular variable, replacing the covid_status categorical variable with the quantitative (log-transformed) CRP variable.

Unless specified otherwise, all the analyses described hereafter are adjusted for multiple testing per data type, using an FDR level of 5%.

Functional principal component analysis

FPCA was conducted to characterize the inter-patient and intra-patient variability and estimate individual disease trajectories using parameters reflecting inflammation, as well as the immunometabolic response to infection. These parameters were CRP levels, five cytokines (IFN-γ, IL-10, IL-1β, IL-6 and TNF), twelve lymphocyte subsets (CD4⁺ T_EMRA cells, CD4⁺ T_N cells, non-naive HLA-DR⁺CD38⁺ CD4⁺ T cells, CD8⁺ T_EMRA cells, CD8⁺ T_N cells, non-naive HLA-DR⁺CD38⁺ CD8⁺ T cells, total B_M cells, γδ T cells, MAIT cells, NK cells, NKT cells and plasmablasts), three lipoproteins (HDA1, HDA2 and VLAB), two glycoprotein signals (GlycA and GlycB) and four metabolites from the kynurenine pathway (3-hydroxykynurenine, kynurenine, quinolinic acid and tryptophan).

The R packages face³⁸ and mfaces³⁹ were used to implement the univariate FPCA study (for CRP levels) and multivariate FPCA studies (for all other groups of parameters listed above). Briefly, FPCA modeled the individual parameter trajectories as a mean function plus a truncated sum of random deviations from the mean, expressed as a linear combination of orthonormal eigenfunctions, weighted by patient-specific scores. In the multivariate setting, each parameter had a corresponding set of eigenfunctions but the scores were common to all parameters. Smooth estimates of variance and autocorrelation functions were also estimated as part of the FPCA framework, as well as cross-correlation functions in the multivariate setting. FPCA trajectories were estimated over a time window of 7 weeks after symptom onset. Asymptomatic individuals (severity class A) were not considered in the FPCA framework and all downstream analyses to avoid ambiguity when discussing recovery and to limit the risk of bias due to their different temporal alignment compared to individuals from the other classes (time was measured from the date of symptom onset for classes B to E, and from the date of first positive swab for class A; Supplementary Figs. 1 and 2).

Gaussian mixture model clustering and sensitivity analysis for recovery-group assignment

GMM clustering with unconstrained covariances was performed on the CRP scores corresponding to the first two FPCA eigenfunctions to uncover groups of CovPs with similar disease trajectories. For the n individuals, the GMM estimated an n × K matrix of probabilities, whose (i, k)^th entry was the probability that subject i belonged to cluster k. The optimal classification was obtained by assigning each individual to the cluster corresponding to their highest probability.

To assess the impact of weakly assigned individuals on the subsequent group-level analyses, we conducted cluster membership sensitivity analyses by purposely misassigning individuals to clusters as follows: we identified all weakly assigned individuals to be individuals with maximum cluster probability below 0.8 (that is, for which the probability that they do not belong to their optimal recovery group exceeds 20%) and reassigned them to the cluster corresponding to the second-highest probability (that is, their ‘next best’ cluster). We then re-performed all analyses based on the resulting ‘perturbed’ recovery groups, and compared all estimates with those based on the original optimal recovery-group clustering (Supplementary Figs. 6–11).

Fisher’s exact tests were used to characterize the groups based on gender, and ANOVA and one-versus-all t-test (two-sided) were used to assess differences in age. For all subsequent analyses, significance labels are based on FDR-adjusted P values.

Simple and partial correlation estimates

Correlation between the parameters considered in the FPCA studies (lymphocytes, lipoproteins, glycoproteins, kynurenine-pathway metabolites, cytokines and CRP levels) was assessed during acute infection (weeks 0–3 after symptom onset) and convalescence (weeks 4–7 after symptom onset; Extended Data Fig. 4). For each time window and each individual, the multiple measurements per person were averaged, before computing correlation within recovery groups. For HCs, samples were available at a single time point for each individual so a single correlation matrix was computed. FDR-adjusted correlation tests were performed using the R package TestCor separately for each recovery group and for the HCs.

Pairwise Pearson simple and partial correlation among the severity and recovery FPCA scores for lymphocytes, lipoproteins, glycoproteins, metabolites from the kynurenine pathway, as well as cytokines and CRP was also estimated.

Longitudinal modeling

Univariate mixed models were used to estimate the temporal profile of each cellular/molecular parameter for the recovery groups 1, 2 and 3, that is, with random effects to account for serial measurements of patients. Polynomial splines of degree 2 were used to model the parameters with respect to the interaction between the time from symptom onset (time) and the recovery groups (group). The following model was fitted for each molecular variable (dep_var):

$${{{\mathrm{dep}}}}\_{{{\mathrm{var}}}}\sim {{{\mathrm{time}}}} \times {{{\mathrm{group}}}} + ({{{\mathrm{1}}}}|{{{\mathrm{subject}}}}\_{{{\mathrm{id}}}})$$

The significance of baseline effects (that is, difference between groups at time zero) and interaction effects (that is, difference in the group temporal courses) were tested using LR tests and adjusted for multiplicity across all variables of same data type.

Direct comparisons with HC levels were also conducted using mixed models whereby samples were grouped into five time windows (involving similar numbers of samples): (0, 3), (4, 7), (8, 12), (13, 27) and (28, 52) weeks after symptom onset. For each molecular variable and each time window, the following model was fitted:

$${{{\mathrm{dep}}}}\_{{{\mathrm{var}}}}\sim {{{\mathrm{category}}}} + ({{{\mathrm{1}}}}|{{{\mathrm{subject}}}}\_{{{\mathrm{id}}}})$$

where category is a categorical variable coding for the HC and three recovery groups, with the former used a reference factor level. To assess the discrepancy between each group’s parameters and HC parameters, the significance of the category levels was examined, adjusting for multiplicity over all parameters of same data type. Significance was not reported if ≤25 samples were available in the group and the time window under consideration.

Long-term symptom analyses

Long-term symptoms were collected under the form of questionnaires given to CovPs between 2 and 11 months after symptom onset (average of 6 months). The questionnaires have been developed from Turner-Stokes et al.¹⁰, and were ethically reviewed for research use. The list of symptoms was: dyspnea; cough; chest pain on exertion; palpitations or swollen ankles; persisting fever (2 months or more); new leg swelling in one leg or shortness of breath with chest pain; new skin rashes or sores; voice alteration; difficulties eating, drinking or swallowing; constant noisy breathing or throat whistling; anosmia or dysgeusia; difficulty to gain or maintain weight, loss of appetite; new neurology in one or more limbs; new pain in one or more parts of the body; muscle weakness, balance or range of movement of joints; fatigue; cognition—memory, concentration and thinking skills. The questionnaire grades were recoded so the scores range from 0 (no symptom) to 5 (extreme manifestation of the symptom); the description of these grades for each symptom is detailed in the Supplementary Data Table 1. Up to three questionnaires per patient were collected, and the scores of CovPs with more than one questionnaire were averaged before the analysis. Correlation and partial correlation analyses were conducted on all symptoms apart from persisting fever (not experienced by any patient and hence a constant variable), adjusting for multiplicity (FDR 5%; Supplementary Fig. 12). Exploratory latent factor analysis by principal axes with oblique factor rotation was performed to characterize the joint manifestation of the symptoms, treating the data on the Likert scale. Nonparametric tests were used for overall comparison across recovery groups (Kruskal–Wallis rank-sum test), as well as for pairwise tests (two-sided Wilcoxon rank-sum tests).

Survival analysis

The risk of death was studied by Kaplan–Meier survival analysis, using the three recovery groups 1, 2 and 3 as strata. Difference in survival between the groups was assessed using a log-rank test.

Predictive modeling

Predictive modeling of systemic recovery was carried out to classify CovPs, shortly after disease onset, in terms of unfavorable (group 3) or favorable disease progression (groups 1 and 2, merged). An integrative sGCCA approach, adapted for supervised analysis, DIABLO⁴⁰, was applied on the first sample of each patient from the cohort, provided that this sample was taken within 3 weeks after symptom onset. CovPs were randomly assigned to a training set or a left-out test set, according to a 70%–30% split (using the R package caret to balance the recovery-category distributions within the training and test sets). The sGCCA framework jointly accommodated all parameters across the different cellular and molecular data types (33 immune cell types, 36 polar metabolites, 103 glycoproteins and lipoproteins, and 19 selected metabolic ratios), as well as age and gender in a separate canonical vector. The training procedure used threefold cross-validation to select the numbers of predictors within the sparse latent components (composite predictive signatures).

Systemic recovery-prediction tool

The predictive model based on CovPs’ early samples allows generating predictions for new samples collected when a patient presents to the clinic. An interactive tool to browse recovery prognoses is provided at http://shiny.mrc-bsu.cam.ac.uk/apps/covid-19-systemic-recovery-prediction-app/. For a new patient, each marker from the two signatures identified by the model (Fig. 5a) can be set in terms of percentiles of the empirical distribution formed by all Cambridge cohort patient and HC measurements. The colors appearing on the bar suggest normal ranges (gray, corresponding to the HCs’ IQR), low values (blue, smaller than the HCs’ first quartile) or high values (red, larger than the HCs’ third quartile). The initial values correspond to the median of HCs’ measurements. The tool runs the prediction based on the input values and it outputs the systemic recovery prognosis (1 + 2 or 3), along with a predicted score ranging from 0.5 to 1 and conveying the degree of confidence about the prediction (the larger, the higher the confidence).

As only a subset of markers from the two signatures may be quantified from blood tests collected in the clinic, the prediction can be based on a selection of markers chosen from the drop-down menus; the deselected markers are omitted in the linear combination corresponding to their latent component. To assess the expected performance of the model when restricted to a subset of markers, ROC curves are recomputed based on this subset using the Cambridge left-out test set (the curves are updated when selecting or deselecting markers). A poor performance on the test set (for example, AUC < 0.7) suggests that the selection of markers is insufficient to provide reliable predictions for the new patient. In that case, the predicted category should be disregarded and values for additional markers should be supplied where possible. As Vg9Vd2^hi γδ T appears in both signatures, its slider bar is displayed in the first column only.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The immune cell type, cytokine and CRP data, and the metadata for the early (0–3 months) follow-up⁹ are available at https://www.covid19cellatlas.org/patient/citiid/. The polar metabolite, glycoprotein and lipoprotein data for the full year post disease onset are available at https://doi.org/10.5281/zenodo.7277164.

Code availability

The source code for the two R Shiny Apps can be accessed at https://doi.org/10.5281/zenodo.7527518 and https://doi.org/10.5281/zenodo.7528488. A repository with the scripts used for all analyses is also publicly accessible at https://doi.org/10.5281/zenodo.7527330.

References

ReliefWeb. Coronavirus Disease (COVID-19): Weekly Epidemiological Update (4 January 2023)—World https://reliefweb.int/report/world/coronavirus-disease-covid-19-weekly-epidemiological-update-4-january-2023
Alwan, N. A. Track COVID-19 sickness, not just positive tests and deaths. Nature 584, 170 (2020).
Article CAS Google Scholar
Huang, C. et al. 6-month consequences of COVID-19 in patients discharged from hospital: a cohort study. Lancet 397, 220–232 (2021).
Article CAS Google Scholar
Sudre, C. H. et al. Attributes and predictors of long COVID. Nat. Med. 27, 626–631 (2021).
Article CAS Google Scholar
Lucas, C. et al. Longitudinal analyses reveal immunological misfiring in severe COVID-19. Nature 584, 463–469 (2020).
Article CAS Google Scholar
Masuda, R. et al. Integrative modeling of plasma metabolic and lipoprotein biomarkers of SARS-CoV-2 infection in Spanish and Australian COVID-19 patient cohorts. J. Proteome Res. 20, 4139–4152 (2021).
Article CAS Google Scholar
Holmes, E. et al. Incomplete systemic recovery and metabolic phenoreversion in post-acute-phase nonhospitalized COVID-19 patients: implications for assessment of post-acute COVID-19 syndrome. J. Proteome Res. 20, 3315–3329 (2021).
Article CAS Google Scholar
Peluso, M. J. et al. Markers of Immune activation and inflammation in individuals with postacute sequelae of severe acute respiratory syndrome coronavirus 2 infection. J. Infect. Dis. 224, 1839–1848 (2021).
Article CAS Google Scholar
Bergamaschi, L. et al. Longitudinal analysis reveals that delayed bystander CD8⁺ T cell activation and early immune pathology distinguish severe COVID-19 from mild disease. Immunity 54, 1257–1275 (2021).
Article CAS Google Scholar
Turner-Stokes, L. et al. The post-ICU presentation screen (PICUPS) and rehabilitation prescription (RP) for intensive care survivors part I: development and preliminary clinimetric evaluation. J. Intensive Care Soc. 23, 253–263 (2022).
Article Google Scholar
Chen, W. et al. Plasma CRP level is positively associated with the severity of COVID-19. Ann. Clin. Microbiol. Antimicrob. 19, 18 (2020).
Article CAS Google Scholar
Herold, T. et al. Elevated levels of IL-6 and CRP predict the need for mechanical ventilation in COVID-19. J. Allergy Clin. Immunol. 146, 128–136 (2020).
Article CAS Google Scholar
Selvan, M. E. Risk factors for death from COVID-19. Nat. Rev. Immunol. 20, 407–407 (2020).
Article CAS Google Scholar
Lawler, N. G. et al. Systemic perturbations in amine and kynurenine metabolism associated with acute SARS-CoV-2 infection and inflammatory cytokine responses. J. Proteome Res. 20, 2796–2811 (2021).
Article CAS Google Scholar
Buyse, S. et al. Critical care management of patients with hemophagocytic lymphohistiocytosis. Intensive Care Med. 36, 1695–1702 (2010).
Article Google Scholar
Ahmed, F., Jo, D.-H. & Lee, S.-H. Can natural killer cells be a principal player in anti-SARS-CoV-2 immunity? Front. Immunol. 11, 586765 (2020).
Article CAS Google Scholar
Maucourant, C. et al. Natural killer cell immunotypes related to COVID-19 disease severity. Sci. Immunol. 5, eabd6832 (2020).
Article CAS Google Scholar
Leem, G. et al. Abnormality in the NK-cell population is prolonged in severe COVID-19 patients. J. Allergy Clin. Immunol. 148, 996–1006 (2021).
Article CAS Google Scholar
Witkowski, M. et al. Untimely TGF-β responses in COVID-19 limit antiviral functions of NK cells. Nature 600, 295–301 (2021).
Article CAS Google Scholar
Correale, J. Immunosuppressive amino-acid catabolizing enzymes in multiple sclerosis. Front. Immunol 11, 600428 (2021).
Article Google Scholar
Stone, T. W. Development and therapeutic potential of kynurenic acid and kynurenine derivatives for neuroprotection. Trends Pharmacol. Sci. 21, 149–154 (2000).
Article CAS Google Scholar
Remus, J. L. & Dantzer, R. Inflammation models of depression in rodents: relevance to psychotropic drug discovery. Int. J. Neuropsychopharmacol. 19, pyw028 (2016).
Article Google Scholar
Wu, Y. et al. Nervous system involvement after infection with COVID-19 and other coronaviruses. Brain. Behav. Immun. 87, 18–22 (2020).
Article CAS Google Scholar
Quintana, F. J. et al. Control of T_reg and T_H17 cell differentiation by the aryl hydrocarbon receptor. Nature 453, 65–71 (2008).
Article CAS Google Scholar
Veldhoen, M. et al. The aryl hydrocarbon receptor links T_H17-cell-mediated autoimmunity to environmental toxins. Nature 453, 106–109 (2008).
Article CAS Google Scholar
Mezrich, J. D. et al. An interaction between kynurenine and the aryl hydrocarbon receptor can generate regulatory T cells. J. Immunol. 185, 3190–3198 (2010).
Article CAS Google Scholar
Yan, Y. et al. IDO upregulates regulatory T cells via tryptophan catabolite and suppresses encephalitogenic T cell responses in experimental autoimmune encephalomyelitis. J. Immunol. 185, 5953–5961 (2010).
Article CAS Google Scholar
Choutka, J., Jansari, V., Hornig, M. & Iwasaki, A. Unexplained post-acute infection syndromes. Nat. Med. 28, 911–923 (2022).
Article CAS Google Scholar
Su, Y. et al. Multiple early factors anticipate post-acute COVID-19 sequelae. Cell 185, 881–895 (2022).
Article CAS Google Scholar
Jiménez, B. et al. Quantitative lipoprotein subclass and low molecular weight metabolite analysis in human serum and plasma by ¹H NMR spectroscopy in a multilaboratory trial. Anal. Chem. 90, 11962–11971 (2018).
Article Google Scholar
Loo, R. L. et al. Quantitative in vitro diagnostic NMR spectroscopy for lipoprotein and metabolite measurements in plasma and serum: recommendations for analytical artifact minimization with special reference to COVID-19/SARS-CoV-2 samples. J. Proteome Res. 19, 4428–4441 (2020).
Article CAS Google Scholar
Dona, A. C. et al. Precision high-throughput proton nmr spectroscopy of human urine, serum, and plasma for large-scale metabolic phenotyping. Anal. Chem. 86, 9887–9894 (2014).
Article CAS Google Scholar
Lodge, S. et al. Diffusion and relaxation edited proton NMR spectroscopy of plasma reveals a high-fidelity supramolecular biomarker signature of SARS-CoV-2 infection. Anal. Chem. 93, 3976–3986 (2021).
Article CAS Google Scholar
Akoka, S., Barantin, L. & Trierweiler, M. Concentration measurement by proton NMR using the ERETIC method. Anal. Chem. 71, 2554–2557 (1999).
Article CAS Google Scholar
Gray, N. et al. A simultaneous exploratory and quantitative amino acid and biogenic amine metabolic profiling platform for rapid disease phenotyping via UPLC–QToF–MS. Talanta 223, 121872 (2021).
Article CAS Google Scholar
Whiley, L. et al. Ultrahigh-performance liquid chromatography tandem mass spectrometry with electrospray ionization quantification of tryptophan metabolites and markers of gut health in serum and plasma—application to clinical and epidemiology cohorts. Anal. Chem. 91, 5207–5216 (2019).
Article CAS Google Scholar
R Software. The R Project for Statistical Computing. https://www.r-project.org/
Xiao, L., Li, C., Checkley, W. & Crainiceanu, C. Fast covariance estimation for sparse functional data. Stat. Comput. 28, 511–522 (2018).
Article Google Scholar
Li, C., Xiao, L. & Luo, S. Fast covariance estimation for multivariate sparse functional data. Stat 9, e245 (2020).
Article Google Scholar
Singh, A. et al. DIABLO: an integrative approach for identifying key molecular drivers from multi-omics assays. Bioinformatics 35, 3055–3062 (2019).
Article CAS Google Scholar

Download references

Acknowledgements

We thank NIHR BioResource volunteers for their participation, and gratefully acknowledge NIHR BioResource centers, NHS Trusts and staff for their contribution. We thank the National Institute for Health and Care Research, NHS Blood and Transplant and Health Data Research UK as part of the Digital Innovation Hub Programme. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. We are also grateful for the generous support of CVC Capital Partners, the Evelyn Trust (20/75), UKRI COVID Immunology Consortium, Addenbrooke’s Charitable Trust (12/20A), the UKRI/NIHR through the UK Coronavirus Immunology Consortium (UK-CIC), the COVID-19 Rapid Response DCF (RG93172/HESS/40558) and the Botnar Research Centre for Child Health (BRCCH) for their financial support. H.R. thanks the Lopez–Loreta Foundation for their support. J.N., J.W. and E.H. acknowledge the Australian Federal Government’s Medical Research Future Fund (MRFF-ARAPC ARG76435), the Western Australian Department of Health, the Spinnaker Health Research Foundation and the McCusker Charitable Foundation for their support and contributions to this research program. K.G.C.S. acknowledges support from Wellcome Investigator and Collaborative Awards (200871/Z/16/ Z and 219506/Z/19/Z) and a Medical Research Council Programme Grant (MR/L019027). We also thank the NIHR Cambridge Clinic Research Facility outreach team for enrollment of patients, and the NIHR Cambridge Biomedical Research Centre Cell Phenotyping Hub and the CRUK Cambridge Institute flow cytometry core facility for their support with flow and mass cytometry.

Author information

Authors and Affiliations

MRC Biostatistics Unit, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
Hélène Ruffieux, Tui H. Nolan & Sylvia Richardson
Cambridge Institute of Therapeutic Immunology and Infectious Disease, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, UK
Aimee L. Hanson, Laura Bergamaschi, Federica Mescia, Lorinda Turner, Aloka de Sa, Victoria S. Pelly, Prasanti Kotagiri, Paul A. Lyons, Kenneth G. C. Smith & Christoph Hess
Department of Medicine, University of Cambridge, Addenbrooke’s Hospital, Cambridge, UK
Aimee L. Hanson, Laura Bergamaschi, Federica Mescia, Lorinda Turner, Aloka de Sa, Victoria S. Pelly, Prasanti Kotagiri, John R. Bradley, Paul A. Lyons, Kenneth G. C. Smith & Christoph Hess
Australian National Phenome Centre, Health Futures Institute, Murdoch University, Perth, Western Australia, Australia
Samantha Lodge, Nathan G. Lawler, Luke Whiley, Nicola Gray, Elaine Holmes, Julien Wist & Jeremy K. Nicholson
Center for Computational and Systems Medicine, Health Futures Institute, Murdoch University, Perth, Western Australia, Australia
Samantha Lodge, Nathan G. Lawler, Luke Whiley, Nicola Gray, Elaine Holmes, Julien Wist & Jeremy K. Nicholson
Perron Institute for Neurological and Translational Science, Nedlands, Western Australia, Australia
Luke Whiley
Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
Nathalie Kingston
NIHR BioResource, Cambridge University Hospitals NHS Foundation, Cambridge Biomedical Campus, Cambridge, UK
Nathalie Kingston & John R. Bradley
Department of Metabolism, Digestion and Reproduction, Faculty of Medicine, Imperial College London, London, UK
Elaine Holmes
Chemistry Department, Universidad del Valle, Cali, Colombia
Julien Wist
Institute of Global Health Innovation, Imperial College London, London, UK
Jeremy K. Nicholson
Department of Biomedicine, University and University Hospital Basel, Basel, Switzerland
Glenn R. Bantug & Christoph Hess
Botnar Research Centre for Child Health (BRCCH) University Basel & ETH Zurich, Basel, Switzerland
Glenn R. Bantug & Christoph Hess

Authors

Hélène Ruffieux
View author publications
You can also search for this author in PubMed Google Scholar
Aimee L. Hanson
View author publications
You can also search for this author in PubMed Google Scholar
Samantha Lodge
View author publications
You can also search for this author in PubMed Google Scholar
Nathan G. Lawler
View author publications
You can also search for this author in PubMed Google Scholar
Luke Whiley
View author publications
You can also search for this author in PubMed Google Scholar
Nicola Gray
View author publications
You can also search for this author in PubMed Google Scholar
Tui H. Nolan
View author publications
You can also search for this author in PubMed Google Scholar
Laura Bergamaschi
View author publications
You can also search for this author in PubMed Google Scholar
Federica Mescia
View author publications
You can also search for this author in PubMed Google Scholar
Lorinda Turner
View author publications
You can also search for this author in PubMed Google Scholar
Aloka de Sa
View author publications
You can also search for this author in PubMed Google Scholar
Victoria S. Pelly
View author publications
You can also search for this author in PubMed Google Scholar
Prasanti Kotagiri
View author publications
You can also search for this author in PubMed Google Scholar
Nathalie Kingston
View author publications
You can also search for this author in PubMed Google Scholar
John R. Bradley
View author publications
You can also search for this author in PubMed Google Scholar
Elaine Holmes
View author publications
You can also search for this author in PubMed Google Scholar
Julien Wist
View author publications
You can also search for this author in PubMed Google Scholar
Jeremy K. Nicholson
View author publications
You can also search for this author in PubMed Google Scholar
Paul A. Lyons
View author publications
You can also search for this author in PubMed Google Scholar
Kenneth G. C. Smith
View author publications
You can also search for this author in PubMed Google Scholar
Sylvia Richardson
View author publications
You can also search for this author in PubMed Google Scholar
Glenn R. Bantug
View author publications
You can also search for this author in PubMed Google Scholar
Christoph Hess
View author publications
You can also search for this author in PubMed Google Scholar

Consortia

The Cambridge Institute of Therapeutic Immunology and Infectious Disease-National Institute of Health Research (CITIID-NIHR) BioResource COVID-19 Collaboration

Contributions

Conceptualization, H.R. and C.H. with input from A.L.H., T.H.N., P.K., J.K.N., P.A.L., K.G.C.S., S.R. and G.R.B. Methodology, H.R., T.H.N., S.R., G.R.B. and C.H. Data collection and curation, A.H.L., S.L., N.G.L., L.W., N.G., L.B., F.M., L.T., A.d.S., V.S.P., E.H. and J.W. Data analysis, H.R., A.L.H., S.L., N.G.L., J.W., G.R.B. and C.H. Interpretation, H.R., A.L.H., S.L., N.G.L., T.H.N., J.W., J.K.N., P.A.L., K.G.C.S., S.R., G.R.B. and C.H. Writing—original draft, H.R., G.R.B. and C.H. with help of A.L.H., S.L., N.G.L. and J.W. Writing—review & editing, H.R., A.L.H., S.L., N.G.L., L.W., N.G., T.H.N., L.B., F.M., L.T., A.d.S., V.S.P., P.K., N.K., J.R.B., E.H., J.W., J.K.N., P.A.L., K.G.C.S., S.R., G.R.B. and C.H. Organization of the clinical trial cohort, N.K., J.R.B., P.A.L. and K.G.C.S.

Corresponding authors

Correspondence to Hélène Ruffieux or Christoph Hess.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Immunology thanks Randy Cron and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Ioana Visan, in collaboration with the Nature Immunology team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Volcano plots for differentially abundant biological parameters, comparing COVID- positive patients with HCs, using linear mixed modeling accounting for repeated subject observations (n = 215 CovPs, n = 45 HCs).

a. Immune cell subsets. x-axis, log₂ fold-change; y-axis, −log₁₀ adjusted p-value; dashed horizontal line, 5% FDR threshold. Red and blue indicate significant up- resp. downregulation in CovPs vs HCs. b. Polar metabolites. Axes and colors as in a. c. Main classes of glyco- & lipoproteins. Axes as in a. Colors refer to compounds. d. Metabolic ratios. Axes and colors as in a.

Extended Data Fig. 2 The kynurenine and serotonin pathways.

Metabolites: TRP - tryptophan, FKYN - formyl-kynurenine, KYN - kynurenine, 3-HK - 3-hyroxykynurenine, 3-HAA - 3-hydroxyanthranillic acid, ACMS - 2-amino-3-carboxymuconate-6-semialdehyde, QA - quinolinic acid, KA - kynurenic acid, XA - xanthurenic acid, PA - picolinic acid, 5-HTP - 5-hydroxy-L-tryptophan, SER - serotonin. Enzymes: IDO - indoleamine-2,3-dioxygenase, AFMID - kynurenine aryl formidase, KMO - kynurenine-3-monooxygenase, KYNU - kynureninase, 3-HAO - 3-hydroxyanthranilate dioxy-genase, ACMSD - amino-b-carboxymuconate-semialdehyde-decarboxylase, KAT - kynurenine aminotransferase, TPH - tryptophan hydroxylase, AADC - aromatic-L-amino acid-L-tryptophan decarboxylase.

Extended Data Fig. 3 Association effects between biological parameters and CRP abundance with 95% confidence intervals (n = 215 CovPs, n = 45 HCs).

a. Immune cell subsets. Red and blue indicate positive and negative associations, respectively, at FDR 5%. Effect sizes are not on the original scale as both the dependent (biological parameter) and independent (CRP) variables have been log-transformed. b. Polar metabolites. Colors and scale as in a. c. Main classes of glyco- & lipoproteins. Colors and scale as in a. d. Metabolic ratios. Colors and scale as in a.

Extended Data Fig. 4 Correlation among the variables modeled by FPCA for the three recovery groups during acute phase (weeks 0-3) and convalescence (weeks 4–7).

Lower diagonal part of all heatmaps, correlations in CovPs’ data. Upper diagonal part, correlations in HCs’ data (same for all heatmaps). Stars, significance of correlation tests at FDR 5%.

Supplementary information

Supplementary Information

Supplementray Data 1–4 and Figures 1–14. Contains sensitivity analyses and complementary results: FPCA estimates for asymptomatic individuals; impact of secondary infections on the recovery profiles; recovery-group assignment sensitivity analysis; complement on the long-term symptom analyses.

Reporting Summary

Supplementary Table 1

Cohort characteristics by recovery groups 1–3. ‘Length of hospital stay’ is the number of days from hospital admission to discharge, transfer or death in hospital. sd, standard deviation.

Supplementary Data Table 1

Description of the grades for the long-term symptoms.

Supplementary Note

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Ruffieux, H., Hanson, A.L., Lodge, S. et al. A patient-centric modeling framework captures recovery from SARS-CoV-2 infection. Nat Immunol 24, 349–358 (2023). https://doi.org/10.1038/s41590-022-01380-2

Download citation

Received: 25 May 2022
Accepted: 03 November 2022
Published: 30 January 2023
Issue Date: February 2023
DOI: https://doi.org/10.1038/s41590-022-01380-2

This article is cited by

The long-term health outcomes, pathophysiological mechanisms and multidisciplinary management of long COVID
- Jingwei Li
- Yun Zhou
- Chengdi Wang
Signal Transduction and Targeted Therapy (2023)
Immunologic prediction of long COVID
- Randy Q. Cron
Nature Immunology (2023)