As of June 2022, coronavirus disease 2019 (COVID-19) has resulted in more than 500 million confirmed cases and 6 million deaths in more than 200 countries and regions [1]. With the increasing number of patients recovering from COVID-19, the long-term consequences of SARS-CoV-2 infection are now being highlighted. Evolving epidemiological and clinical evidence indicates that a proportion of survivors report ongoing health problems, which are described by a series of terms such as long COVID-19, long-haul COVID-19 or post-acute COVID-19 syndrome [2].

A number of studies have shown that COVID-19 has long-term effects on multiple organ systems, including the pulmonary, cardiovascular, neurological, gastrointestinal, psychiatric and dermatologic systems [3,4,5,6,7,8,9,10]. According to a meta-analysis of COVID-19 sequelae, more than half of survivors reported at least one sequelae symptom up to 12 months after the acute phase [7]. The most common findings included abnormalities on lung CT, abnormal pulmonary function tests, generalized symptoms, psychiatric symptoms, and neurological symptoms, mainly cognitive deficits and memory impairment. In a two-year hospital-based follow-up study of COVID-19 patients, 55% reported at least one sequelae symptom, with fatigue or muscle weakness, sleep disturbances and smell disorder consistently among the most common symptoms [11]. There is growing concern about brain damage caused by COVID-19. The recent WHO definition of post-COVID-19 includes fatigue, dyspnea and cognitive dysfunction as the three most common symptoms [12]. Moreover, stressful medical care, long hospital stays with numerous restrictions on mobility, social isolation and stigma can contribute to lingering mental symptoms in some people, including depression, anxiety, insomnia and post-traumatic stress disorder (PTSD) symptoms [3,4,5, 7, 13,14,15,16,17,18].

Because COVID-19 survivors experience a wide array of complex, multifactorial symptoms, precise classification of COVID-19 sequelae is an integral part of rehabilitation therapy and can guide health care practitioners in providing targeted interventions based on sequelae phenotype. Few studies have identified clinical phenotypes of long COVID, and those that have used different clustering methods. A UK multicenter cohort study of hospitalized COVID-19 survivors identified four recovery clusters by cluster analysis, but did not further investigate the factors associated with those clusters [4]. A UK community study identified a subset of COVID-19 survivors with predominantly respiratory symptoms and another subset with tiredness [19]. Another study from Spain applied clustering analysis not only to sequelae symptoms but also to clinical characteristics such as medical comorbidities and the prevalence of symptoms at hospital admission [20]. Several studies focused only on symptomatology without characterizing sequelae phenotype. The longest cohort study of COVID-19 sequelae followed patients for up to two years after discharge, but it was based at a hospital built specifically for COVID-19 treatment [11], so its participants had characteristics specific to that hospital. There is an urgent need for community-based follow-up studies that recruit COVID-19 survivors from a wider range of hospital types, to explore the general features and phenotypes of the long-term consequences of COVID-19 for precise classification and intervention. We conducted a community-based cohort study to identify sequelae phenotypes and the factors that predict cluster membership, providing an evidence-based foundation for future tailored recovery interventions.


Study design and participants

We recruited COVID-19 survivors from communities in Hongshan District, Wuhan City, Hubei Province, from October 12 to November 19, 2021. We invited all individuals with laboratory-confirmed or clinician-diagnosed COVID-19 according to their medical records; the patient lists for health checkups were obtained by Wuchang Hospital from the corresponding management agency of Hongshan District. We excluded participants under 15 years old from this analysis. We provided participants with an electronic or paper questionnaire covering physical, cognitive and mental health. Most participants received the Montreal Cognitive Assessment (MoCA) to assess cognitive function. The results of antibody and nucleic acid tests were collected from their medical records.

This study received approval from the ethics committee of Peking University Sixth Hospital (Institute of Mental Health). This study followed the American Association for Public Opinion Research (AAPOR) reporting guidelines and the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement (appendix pp 2–4). Informed consent was obtained from all study participants prior to all study procedures.

Measurement of long-term COVID-19 sequelae

Demographic characteristics included sex, age, occupation and education. Smoking, drinking, COVID-19 vaccination status, physical comorbidities, a history of mental disorders and a family history of mental disorders were also collected. According to the Chinese Clinical Guidance for COVID-19 Pneumonia Diagnosis and Treatment, disease severity was classified into three types: mild (clinical symptoms were mild and there was no sign of pneumonia on chest imaging), moderate (fever and respiratory symptoms, with radiologic signs of pneumonia) and severe/critical. Severe cases met any of the following conditions: (1) shortness of breath, respiratory rate (RR) ≥ 30 breaths/min; (2) oxygen saturation ≤93% at rest; (3) arterial oxygen partial pressure/fraction of inspired oxygen (PaO2/FiO2) ≤300 mmHg; (4) pulmonary imaging showing lesion progression of > 50% within 24–48 h. Critical cases met any of the following conditions: (1) respiratory failure requiring mechanical ventilation; (2) shock; (3) other organ failure requiring ICU monitoring and treatment [21].
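The severity rules above can be written as a small decision function. The following is only an illustrative sketch: the `AcuteFindings` record, its field names and the `classify_severity` function are hypothetical and are not taken from the study's data dictionary, and "respiratory failure requiring mechanical ventilation" is simplified to a single boolean flag.

```python
from dataclasses import dataclass


@dataclass
class AcuteFindings:
    """Hypothetical record of acute-phase findings (field names are assumptions)."""
    respiratory_rate: float        # breaths per minute
    spo2_rest: float               # oxygen saturation at rest, percent
    pao2_fio2: float               # PaO2/FiO2 ratio, mmHg
    lesion_progression_pct: float  # imaging lesion progression within 24-48 h, percent
    pneumonia_on_imaging: bool
    mechanical_ventilation: bool = False
    shock: bool = False
    other_organ_failure_icu: bool = False


def classify_severity(f: AcuteFindings) -> str:
    """Return 'mild', 'moderate' or 'severe/critical' following the listed criteria."""
    # Critical: any one of the three critical conditions.
    critical = f.mechanical_ventilation or f.shock or f.other_organ_failure_icu
    # Severe: any one of the four severe conditions.
    severe = (
        f.respiratory_rate >= 30
        or f.spo2_rest <= 93
        or f.pao2_fio2 <= 300
        or f.lesion_progression_pct > 50
    )
    if critical or severe:
        return "severe/critical"
    # Moderate: radiologic signs of pneumonia without severe or critical criteria.
    if f.pneumonia_on_imaging:
        return "moderate"
    return "mild"
```

Severe and critical cases are merged into one label here because the analysis treats them as a single category.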

A total of 27 sequelae symptoms from five organ systems (generalized, neurological, mental, cardiopulmonary and digestive) were collected from participants. Among them, 21 self-reported symptoms were evaluated with the symptom questionnaire, and the remaining 6 symptoms were measured with validated scales. The contents of the symptom questionnaire and the outcome assessment tools are shown in the appendix (Supplementary Table S1). Specifically, dyspnea was assessed with the modified British Medical Research Council (mMRC) dyspnea scale, a five-category scale that characterizes the level of dyspnea with physical activity, in which higher scores correspond to more severe dyspnea; a total score higher than 0 was considered dyspnea [22]. We also assessed quality of life using the EuroQol five-dimension three-level (EQ-5D-3L) questionnaire and the EuroQol Visual Analogue Scale (EQ-VAS) [23].

Cognitive function was measured objectively with the validated Chinese version of the Montreal Cognitive Assessment (MoCA) [24] and subjectively with the abbreviated Cognitive Failure Questionnaire-14 (CFQ-14) [25]. The MoCA evaluates global cognitive function with a total score of 0–30, which was interpreted as follows: normal cognitive function (26–30), mild (18–25), moderate (10–17) and severe (0–9) cognitive impairment. The CFQ-14 measures cognitive failures in daily life with 14 questions on a 5-point Likert scale, yielding a factored score ranging from 0 to 100, with a score of ≥43 indicating cognitive failures [26]. Survivors with MoCA scores lower than 18 (moderate or severe cognitive impairment) or CFQ-14 scores higher than 43 (cognitive failure) were defined as having cognitive impairment.
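As a minimal sketch of the scoring rule just described, the functions below map a MoCA total to its category and apply the combined definition of cognitive impairment. The function names are hypothetical. Two assumptions are made: the CFQ-14 comparison follows the final definition ("higher than 43") rather than the "≥43" wording, and a missing MoCA (not every participant received it) is represented as `None` and judged on the CFQ-14 alone.

```python
from typing import Optional


def moca_category(score: int) -> str:
    """Map a MoCA total score (0-30) to the category used in the text."""
    if not 0 <= score <= 30:
        raise ValueError("MoCA total score must be between 0 and 30")
    if score >= 26:
        return "normal"
    if score >= 18:
        return "mild"
    if score >= 10:
        return "moderate"
    return "severe"


def has_cognitive_impairment(moca: Optional[int], cfq14: float) -> bool:
    """Cognitive impairment: MoCA < 18 or CFQ-14 > 43 (assumed strict inequality)."""
    moca_impaired = moca is not None and moca < 18
    cfq_impaired = cfq14 > 43
    return moca_impaired or cfq_impaired
```

For example, `has_cognitive_impairment(17, 10)` is true because of the MoCA score alone, while `has_cognitive_impairment(20, 43)` is false under the strict-inequality assumption.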

Mental health included depression symptoms measured by the Patient Health Questionnaire-9 (PHQ-9) [27], anxiety symptoms measured by the Generalized Anxiety Disorder-7 (GAD-7) [28], insomnia symptoms measured by the Insomnia Severity Index (ISI) [29], and PTSD symptoms measured by the PTSD checklist for DSM-5 (PCL-5) [30]. The total scores of these scales were interpreted as follows: PHQ-9, normal (0–4), mild (5–9), moderate to severe (10–27) depression symptoms; GAD-7, normal (0–4), mild (5–9), moderate to severe (10–21) anxiety symptoms; ISI, normal (0–7), subthreshold (8–14), moderate to severe (15–28) insomnia symptoms. We used the following cut-off scores to define the presence of depression (PHQ-9 > 4), anxiety (GAD-7 > 4), insomnia (ISI > 7) or PTSD symptoms (PCL-5 ≥ 33).
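The bands and cut-offs listed above translate directly into a lookup table. The sketch below is illustrative only; `SEVERITY_BANDS`, `POSITIVE_FROM`, `severity_band` and `screens_positive` are hypothetical names, and total scores are assumed to be integers, so that "PHQ-9 > 4" is the same as "PHQ-9 ≥ 5".

```python
# Severity bands as (lowest score, highest score, label), taken from the text.
SEVERITY_BANDS = {
    "PHQ-9": [(0, 4, "normal"), (5, 9, "mild"), (10, 27, "moderate to severe")],
    "GAD-7": [(0, 4, "normal"), (5, 9, "mild"), (10, 21, "moderate to severe")],
    "ISI": [(0, 7, "normal"), (8, 14, "subthreshold"), (15, 28, "moderate to severe")],
}

# Lowest integer score counted as having symptoms on each scale.
POSITIVE_FROM = {"PHQ-9": 5, "GAD-7": 5, "ISI": 8, "PCL-5": 33}


def severity_band(scale: str, score: int) -> str:
    """Return the severity label for PHQ-9, GAD-7 or ISI total scores."""
    for low, high, label in SEVERITY_BANDS[scale]:
        if low <= score <= high:
            return label
    raise ValueError(f"score {score} is outside the range of {scale}")


def screens_positive(scale: str, score: int) -> bool:
    """Apply the binary cut-off used to define the presence of symptoms."""
    return score >= POSITIVE_FROM[scale]
```

No bands are defined for the PCL-5 because the text gives only its binary cut-off.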

Statistical analysis

Continuous variables were presented as mean (SD) or median (IQR). Binary and categorical variables were presented as counts and percentages. Characteristics of participants were stratified by the severity of the acute illness. A chi-square test was used to identify differences in proportions across multiple categories. For normally distributed continuous data, one-way analysis of variance (ANOVA) was used to test differences across categories, with Kruskal–Wallis tests used for non-normally distributed data.

To identify sequelae clusters among COVID-19 survivors, we performed unsupervised clustering of self-reported symptoms together with dyspnea, cognitive function and mental health evaluated by scales, using latent class analysis. The final number of classes was determined based on conceptual meaning, the smallest estimated class proportion, and statistical model fit indices such as the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), adjusted BIC (aBIC) and entropy [31]. The Lo-Mendell-Rubin (LMR) test was also used to determine the number of classes in latent class analysis [32]. This test compares the k-class model with the k-1 class model: the k-class run also fits a k-1 class model and uses the derivatives from both models to compute the p-value. A low p-value rejects the k-1 class model in favor of the k-class model. To explore predictive factors of sequelae phenotype, we fitted a multinomial logistic regression with the sequelae clusters derived from latent class analysis as the outcome. We adjusted for sex, age, physical comorbidities, history of mental disorders, duration of hospitalization and acute disease severity. All tests were two-tailed and p values of less than 0.05 were considered statistically significant. Latent class analysis was done with Mplus version 8.3. Other analyses used Stata MP version 16.
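For readers unfamiliar with the fit indices named above, the sketch below computes them with their standard textbook formulas: AIC = -2LL + 2p, BIC = -2LL + p ln(n), the sample-size adjusted BIC with n replaced by (n + 2)/24, and relative entropy from the posterior class probabilities. It is not the Mplus implementation, and the function names and inputs are hypothetical.

```python
import math


def information_criteria(log_likelihood: float, n_params: int, n_obs: int) -> dict:
    """AIC, BIC and sample-size adjusted BIC for a fitted model (lower is better)."""
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "aBIC": deviance + n_params * math.log((n_obs + 2) / 24),
    }


def relative_entropy(posteriors: list) -> float:
    """Relative entropy in [0, 1]; values near 1 indicate clear class separation.

    `posteriors` holds one row per participant with that participant's
    posterior probability of belonging to each latent class.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    uncertainty = sum(-p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - uncertainty / (n * math.log(k))
```

In practice these values would be compared across models with different numbers of classes, alongside class size and interpretability, as described above.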



We invited 1791 COVID-19 survivors from the communities to take part in our survey, of whom 1783 were aged 15 years or older and eligible. A total of 1023 questionnaires were received, and 1000 survivors were included in the final analysis after excluding 21 repeated questionnaires and 2 participants under 15 years, giving a response rate of 56.1% and an effective questionnaire rate of 97.8% (Fig. 1). The characteristics of COVID-19 survivors included in the final analysis were similar to those not included (Supplementary Table S4).
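The two rates can be re-derived from the counts reported in this paragraph. The short check below uses only those counts; the variable names are illustrative.

```python
# Counts reported in the text.
eligible_invited = 1783           # invited survivors meeting the age criterion
questionnaires_received = 1023
repeated_questionnaires = 21
respondents_under_15 = 2

# Participants retained for analysis after the two exclusions.
included = questionnaires_received - repeated_questionnaires - respondents_under_15

response_rate = included / eligible_invited          # included / eligible invited
effective_rate = included / questionnaires_received  # included / questionnaires received

print(included)                  # 1000
print(f"{response_rate:.1%}")    # 56.1%
print(f"{effective_rate:.1%}")   # 97.8%
```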

Fig. 1: Flow chart of the study.

*A total of 1791 COVID-19 survivors were invited to the survey; 1002 were willing to participate and 789 did not participate. After exclusion of 8 survivors under 15 years, 1783 survivors were eligible (1000 eligible survivors included in the analysis and 783 eligible survivors not included). Response rate: 1000 eligible survivors included in analysis / 1783 eligible survivors invited = 56.1%. Effective rate: 1000 eligible questionnaires / 1023 completed questionnaires = 97.8%.

The mean age of the participants was 55.9 (SD 13.8) years, and 446 (44.6%) were men. The majority of survivors had a high school education or lower (high school, 32.3%; middle school, 21.7%; primary school or lower, 7.3%). A total of 514 (51.4%) had at least one physical comorbidity; the most common were hypertension (27.5%), fatty liver (15.7%), hyperlipemia (12.1%) and diabetes (10.4%). Forty-three patients (4.3%) had a history of mental disorders and 10 (1.0%) had a family history of mental disorders. Eighty-five patients (8.5%) were diagnosed as severe or critical cases in the acute phase. The median length of hospital stay was 31.0 days (IQR 21.0–48.0). The participants were assessed at a median of 625.0 days (IQR 615.0–634.0) after COVID-19 diagnosis. IgG seropositivity was observed in most patients (94.9%), whereas only a few were seropositive for IgM (2.2%). No reinfection with SARS-CoV-2 was observed according to nucleic acid tests. In total, 77.3% of participants had received at least one dose of a COVID-19 vaccine (Table 1).

Table 1 Characteristics of COVID-19 survivors included in analysis by severity of acute illness.

Phenotype of four sequelae clusters

Among the 1000 participants, 27 symptoms from five systems were classified by latent class analysis (Supplementary Table S2), and we identified four sequelae clusters. In total, 44.9% of individuals were classified into the no or mild group (cluster 1), which included survivors reporting a median of 1 symptom, with the lowest probabilities of physical and mental symptoms. A further 29.2% were classified as having moderate symptoms with mainly physical impairment (cluster 2; median 4.0 symptoms), and 9.6% as having moderate symptoms with mainly cognitive and mental health impairment (cluster 3; median 5.0 symptoms). Both cluster 2 and cluster 3 had a moderate proportion of sequelae symptoms, but physical symptoms (generalized, 95.2%; cardiopulmonary, 49.7%; digestive, 20.9%) were more common in cluster 2, whereas neurological symptoms (32.3%), especially cognitive impairment (25.0%), and mental symptoms (100.0%) were more common in cluster 3. The remaining 16.3% were classified into the severe group (cluster 4), which had a median of 11.0 symptoms, with a greater proportion of physical (generalized, 100.0%; cardiopulmonary, 89.0%; digestive, 73.6%), neurological (any, 92.6%; cognitive impairment, 40.5%) and mental symptoms (98.2%) (Fig. 2 and Fig. 3).

Fig. 2: Sequelae symptoms stratified by sex, age, severity of acute illness and sequelae clusters.

*indicates a subgroup test p-value of less than 0.05.

Fig. 3: The rate of sequelae symptoms by four sequelae clusters.

A The rate of sequelae symptoms by four sequelae clusters. B The rate of five organ systems by four sequelae clusters. Cluster 1: no or mild group (no or mild physical, cognitive and mental health impairment) (44.9%). Cluster 2: moderate group with mainly physical impairment (29.2%). Cluster 3: moderate group with mainly cognitive and mental health impairment (9.6%). Cluster 4: severe group (severe physical, cognitive and mental health impairment) (16.3%).

The demographic and clinical characteristics differed between the four sequelae clusters (Supplementary Table S3). The severe cluster (cluster 4) had a greater proportion of participants with at least one physical comorbidity (65.6%) and a history of mental disorders (11.7%) than the other three clusters. All individuals without any persistent symptoms were in the no or mild cluster. The median symptom count in the severe cluster (11.0 [10.0–14.0]) was significantly higher than in the mild cluster (1.0 [0.0–1.0]) and the moderate clusters (mainly physical impairment, 4.0 [3.0–6.0]; mainly cognitive and mental health impairment, 5.0 [4.0–6.5]). The total duration of hospitalization was longer in the severe cluster (cluster 4, 38.0 [23.0–57.0] days) than in the other clusters. On the EQ-5D-3L, there were higher rates of reporting some or extreme problems with mobility (23.5%), self-care (7.4%), usual activities (11.7%), pain/discomfort (74.1%) and anxiety/depression (60.5%); EQ-VAS scores in the moderate cluster with mainly cognitive and mental health impairment (66.5 [56.5–80.0]) and the severe cluster (67.0 [50.5–75.0]) were lower than in the other two clusters.

Predictive factors for sequelae clusters

We explored the risk factors associated with the moderate and severe clusters. Compared with the mild cluster, participants in the moderate cluster with mainly physical impairment were more likely to be younger than 60 years (OR 1.43, 95% CI 1.03–1.98), to have at least one physical comorbidity (OR 1.68, 95% CI 1.22–2.31), and to have been severe or critical cases in the acute phase (OR 1.81, 95% CI 1.01–3.23). Individuals who were female (OR 1.69, 95% CI 1.05–2.73), were younger than 60 years (OR 2.08, 95% CI 1.23–3.51), or had longer hospital stays (≥60 days, OR 2.57 [1.34–4.93]) were more likely to be classified into the moderate cluster with mainly cognitive and mental impairment than into the mild cluster. Individuals who were younger than 60 years (OR 2.10, 95% CI 1.38–3.19), had at least one physical comorbidity (OR 2.71, 95% CI 1.81–4.08), had a history of mental disorders (OR 6.79, 95% CI 2.70–17.08), had longer hospital stays (≥30 days, OR 1.67 [1.09–2.56]; ≥60 days, OR 2.65 [1.53–4.59]) or were diagnosed as severe or critical cases in the acute phase (OR 2.29, 95% CI 1.20–4.39) had a significantly higher risk of being in the severe cluster than in the mild cluster (Table 2).
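Odds ratios from a logistic model are exponentiated coefficients, and their Wald confidence intervals are symmetric on the log scale. The sketch below, which assumes the reported intervals are Wald 95% intervals (z = 1.96), recovers an approximate coefficient and standard error from a published OR and CI and converts them back. The function names are hypothetical and this is not output from the study's Stata models.

```python
import math


def log_scale_summary(odds_ratio: float, ci_low: float, ci_high: float, z: float = 1.96):
    """Approximate log-odds coefficient and standard error from an OR and its CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


def odds_ratio_ci(beta: float, se: float, z: float = 1.96):
    """Convert a log-odds coefficient and standard error to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Age < 60 years, moderate cluster with mainly physical impairment vs mild cluster.
beta, se = log_scale_summary(1.43, 1.03, 1.98)
or_, low, high = odds_ratio_ci(beta, se)
print(round(or_, 2), round(low, 2), round(high, 2))  # 1.43 1.03 1.98
```

That the reconstructed interval matches the published one to two decimals is consistent with the Wald assumption, although it does not prove how the intervals were computed.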

Table 2 Factors that influence sequelae clusters according to multinomial logistic regression.


This study is one of the first to report phenotype clusters of long-term COVID-19 sequelae across five organ systems among 1000 community-based survivors after 20 months of recovery from COVID-19. We identified four clusters: nearly half of participants had no or mild impairment, nearly three-tenths had moderate impairment with mainly physical symptoms, nearly one-tenth had moderate impairment with mainly cognitive and mental symptoms, and about one-sixth had severe impairment. Participants who had physical comorbidities or a history of mental disorders, had longer hospital stays, or had severe illness in the acute phase were at high risk of being in the severe or moderate clusters. Physical comorbidities and long hospital stays were associated with the physical and the mental or cognitive sequelae clusters. The findings indicate that survivors in the four clusters need different, targeted intervention and management strategies. Our findings introduce an important framework for mapping numerous symptoms to a precise classification of clinical sequelae phenotype, and provide information for reserving and allocating medical resources in the post-pandemic era.

Different combinations of acute illness characteristics predicted the four-phenotype classification. We identified that nearly half of survivors had no or mild sequelae symptoms, which aligns with the prevalence of the mild cluster among the recovery clusters derived from the Post-hospitalisation COVID-19 study (PHOSP-COVID) in the UK [4]. Our results revealed a proportion of patients with moderate impairment mainly affecting cognitive function and mental health. This suggests a possible relationship between SARS-CoV-2 infection and the development or deterioration of cognitive impairment and complaints. Because of the wide range of cognitive assessment tools, differing definitions of cognitive impairment and varied populations, the reported prevalence of cognitive impairment in COVID-19 survivors varies. In an international online survey of individuals with suspected or confirmed COVID-19, 58.4% of patients reported cognitive dysfunction after six months [3]. Cognitive impairment was also found after other coronavirus infections, such as SARS and MERS [33], which may help in extrapolating the number of COVID-19 patients with neurological deficits. COVID-19 had a profound negative effect on survivors’ mental health. Our study found a high rate of any mental symptoms, which were the second most common sequelae, and all survivors in the moderate cluster with mainly cognitive and mental health impairment had psychological symptoms. It is worth noting that even mildly affected patients can exhibit psychological symptoms, including depression, anxiety and insomnia symptoms. A meta-analysis of COVID-19 sequelae also reported an overall prevalence of mental symptoms of 19.7% [7]. We observed that both mental and neurological symptoms persisted up to 20 months, whereas a 2-year retrospective cohort study found disparities in the trajectories of these two groups of conditions.
The researchers demonstrated that, compared with controls with other respiratory diseases, the increased risk of mood and anxiety disorders in the COVID-19 convalescent phase was transient, but the risk of psychotic disorder, cognitive deficit, dementia, and epilepsy or seizures was still higher than in controls at 2-year follow-up [34]. It is difficult for us to estimate the relative risk of long COVID because of the lack of controls, which may account for this difference. Several mechanisms have been proposed to explain SARS-CoV-2’s effects on the brain, including viral neurotropism, widespread systemic inflammation, and the psychological burden of the pandemic across the world [35].

A few studies have reported that women infected with SARS-CoV-2 were more likely to have long-lasting symptoms than men [5, 11], which is consistent with our findings that women had a higher percentage of sequelae and were more likely to be classified into the moderate cluster with mainly cognitive and mental health impairment. Sex differences in long COVID-19 syndrome may be explained by differences in immune system function between females and males. More rapid and robust immune responses can protect women from SARS-CoV-2 infection and serious illness, but may also lead to long-lasting autoimmune conditions in the convalescent phase, which are related to long COVID-19 [36, 37]. There is also a distinctive age distribution in COVID-19 survivors. According to our study, the risk of developing lingering disorders was relatively higher in survivors aged less than 60 years than in older people. There is no consensus on age differences in COVID-19 sequelae. Some studies reported that increased age was associated with more reports of long COVID-19 symptoms [5, 7], while others reported the opposite [4, 38, 39]. The PHOSP-COVID study also found that middle age (40–59 years) was associated with not recovering from COVID-19, and its authors stated in the ‘news feature’ section of Nature that these findings may be due to ‘survivors bias’ [4, 40]. We also speculate that older people were unwilling to participate in the survey because of limited mobility, whereas young people with more symptoms of long COVID were willing to accept assessment, which would cause selection bias and a higher observed proportion of long COVID in young people. However, according to a prospective cohort of home-isolated COVID-19 patients, young people with mild acute illness who did not need hospital treatment were also at risk of persistent dyspnea and cognitive symptoms. Therefore, we still need to pay attention to long COVID in young people [41].
A positive association of comorbidities with an increased risk of moderate impairment with mainly physical symptoms and of severe impairment was observed in our study. Consistent with previous reports, comorbidities such as diabetes, chronic cardiovascular or kidney disease, and cancer are associated with both an increased risk of severe acute illness and subsequent poor recovery [2, 4]. We found that the sequelae clusters were related to the severity of acute illness, except for the moderate group with mainly cognitive and mental health impairment, further supporting the view that long COVID-19 syndrome is hard to attribute solely to the severity of acute injury to the lungs and other organs. Longer hospital stays, which not only reflect disease severity but also bring numerous restrictions on mobility and lifestyle, may have affected mental health sequelae [7]. Moreover, medical interventions provided during a long hospital stay to alleviate acute infection, especially mechanical ventilation, were also related to severe long-term sequelae symptoms.

We found that nearly three-tenths of survivors had moderate impairment with mainly physical symptoms, of which fatigue was the most common. Similar results were observed in a two-year follow-up study of long COVID-19, in which fatigue or muscle weakness was consistently the most frequent symptom [11]. It is not surprising that some people who had COVID-19 developed debilitating chronic fatigue [42], because post-infectious fatigue syndromes can follow infection with several agents, such as SARS [43] and Ebola virus [44]. COVID-19 and other infectious diseases with lingering fatigue during recovery may share a common pathogenic mechanism irrespective of the acute symptoms. In the moderate group with mainly physical symptoms, cardiopulmonary sequelae were observed in nearly half of survivors, and dyspnea was the most prevalent sequela, indicating that it is necessary to pay attention to long-term lung impairment and to carry out serial objective assessments such as lung CT scans, pulmonary function tests and the six-minute walk test (6MWT) during convalescence.

Our findings indicate that precise management and intervention strategies should be administered for the different sequelae phenotypes in community COVID-19 survivors. Based on the identified risk factors for sequelae clusters, specific management can be designed for survivors in the convalescent phase according to their demographic characteristics, including age and sex, and their clinical characteristics, including medical comorbidities, duration of hospitalization and severity of acute illness. For survivors in the severe sequelae cluster, a multidisciplinary model of care should be provided on the basis of comprehensive evaluations, including physical therapy, occupational therapy, physical medicine and psychotherapy. For the moderate group with mainly physical symptoms, physical therapy can be provided in the post-acute period to help survivors adapt, improve functioning and return to normal life. For the moderate group with mainly cognitive and mental health impairment, psychotherapy or behavioral interventions can help survivors cope with mental health difficulties, including depression, anxiety, insomnia and PTSD. Cognitive rehabilitation, including retraining of orientation, memory, attention and executive functioning skills, is also vital for survivors in this group with cognitive impairment [45,46,47].

The strengths of our community-based cohort study include the comprehensive assessment of physical, cognitive and mental health outcomes in a large community-based sample of COVID-19 patients after acute infection, the identification of sequelae clusters, and the investigation of predictive factors for sequelae classification. However, this study has several limitations. Firstly, we cannot conclude whether the sequelae symptoms were caused by SARS-CoV-2 infection or by premorbid diseases, because there were no controls without a COVID-19 diagnosis. For the same reason, it is difficult to compare the prevalence of symptoms and to estimate the relative risk of sequelae. The sequelae results need to be validated in a large cohort study with healthy controls in the future. Secondly, since most symptoms were self-reported, more objective assessments, such as CT, MRI and PET, are required to better characterize the long-term symptoms in future studies. Thirdly, the response rate was relatively low and sampling was not random; however, there was no difference between eligible COVID-19 survivors included and not included in the final analysis, suggesting that our participants are representative (Supplementary Table S4). Fourthly, it is unclear how the sequelae clusters will progress during the recovery process, which needs to be investigated in further follow-up studies. Finally, information about treatment and condition during the hospital stay, such as occurrence of delirium, ICU admission, use of mechanical ventilation and medical treatment, was not available; such information would have contributed to better interpretation of long-term health outcomes.

We identified four sequelae clusters among COVID-19 survivors according to physical, cognitive and mental health symptoms, and investigated predictive factors associated with the clusters. About half of survivors need stratified, specific management and intervention of differing intensity 20 months after recovery from COVID-19. Our findings support the need to characterize clinical phenotypes according to validated assessment of overall health during convalescence, and suggest an important framework for mapping numerous symptoms to different clusters to guide future precise recovery interventions.