Persistent symptoms following SARS-CoV-2 infection in a random community sample of 508,707 people

Introduction Long COVID, describing the long-term sequelae after SARS-CoV-2 infection, remains a poorly defined syndrome. There is uncertainty about its predisposing factors and the extent of the resultant public health burden, with estimates of prevalence and duration varying widely. Methods Within rounds 3-5 of the REACT-2 study, 508,707 people in the community in England were asked about a prior history of COVID-19 and the presence and duration of 29 different symptoms. We used uni- and multivariable models to identify predictors of persistence of symptoms (12 weeks or more). We estimated the prevalence of symptom persistence at 12 weeks, and used unsupervised learning to cluster individuals by symptoms experienced. Results Among the 508,707 participants, the weighted prevalence of self-reported COVID-19 was 19.2% (95% CI: 19.1,19.3). 37.7% of 76,155 symptomatic people post COVID-19 experienced at least one symptom, while 14.8% experienced three or more symptoms, lasting 12 weeks or more. This gives a weighted population prevalence of persistent symptoms of 5.75% (5.68, 5.81) for one and 2.22% (2.1, 2.26) for three or more symptoms. Almost a third of people 8,771/28,713 (30.5%) with at least one symptom lasting 12 weeks or more reported having had severe COVID-19 symptoms ('significant effect on my daily life') at the time of their illness, giving a weighted prevalence overall for this group of 1.72% (1.69,1.76). The prevalence of persistent symptoms was higher in women than men (OR: 1.51 [1.46,1.55]) and, conditional on reporting symptoms, risk of persistent symptoms increased linearly with age by 3.5 percentage points per decade of life. Obesity, smoking or vaping, hospitalisation , and deprivation were also associated with a higher probability of persistent symptoms, while Asian ethnicity was associated with a lower probability. Two stable clusters were identified based on symptoms that persisted for 12 weeks or more: in the largest cluster, tiredness predominated, while in the second there was a high prevalence of respiratory and related symptoms. Interpretation A substantial proportion of people with symptomatic COVID-19 go on to have persistent symptoms for 12 weeks or more, which is age-dependent. Clinicians need to be aware of the differing manifestations of Long COVID which may require tailored therapeutic approaches. Managing the long-term sequelae of SARS-CoV-2 infection in the population will remain a major challenge for health services in the next stage of the pandemic.

Clustering identified two distinct groups of individuals with different symptom profiles at 12 weeks, highlighting the heterogeneity of clinical presentation.The smaller cluster had higher prevalence of respiratory and related symptoms, while for those in the larger cluster tiredness was the dominant symptom, with lower prevalence of organ-specific symptoms.

Implications of available evidence
There is a high prevalence of persistent symptoms beyond 12 weeks after acute COVID-19, with little evidence of decline thereafter.This highlights the needs for greater support for patients, both through specialised services and, for those from low-income settings, financial support.The understanding that there are distinct clusters of persistent symptoms, the most common of which is dominated by fatigue, is important for the recognition and clinical management of the condition outside of specialised services.

Introduction
The UK has experienced one of the largest epidemics of COVID-19 in Europe.As a new disease, the natural history beyond the immediate illness and the possible long-term sequelae remain largely unknown.2][3][4][5] Patients sharing their experience on social media and establishing support groups were key to raising awareness of persistent symptoms and coined the term Long COVID.
The frequency and duration of persistent symptoms from COVID-19 are poorly understood and represent a major knowledge gap if effective treatments and management strategies are to be developed.7][8][9] It has been suggested that Long COVID describes a group of disparate conditions, including post-viral syndromes, long-term tissue or organ damage and ongoing inflammation. 3,7,10,11currence of Long COVID appears to be associated with the severity of acute COVID-19 symptoms; for example, high prevalence of persistent symptoms has been reported among people hospitalised with COVID-19. 12,13The number of acute symptoms has also been associated with risk of long COVID, alongside older age and female sex. 6ile many Long COVID studies so far have focused on hospitalised COVID-19 cases, [12][13][14][15][16] here we report data from random community-based samples of the population in England.These involved over half a million people who took part in rounds three to five of the Real-Time Assessment of Community Transmission-2 (REACT-2) study between September 2020 and February 2021.We estimate symptom prevalence and investigate co-occurrence of symptoms among participants reporting symptoms lasting 12 weeks or more following suspected or confirmed COVID-19.

Participants
The REACT-2 programme evaluates community prevalence of SARS-CoV-2 anti-spike protein antibody positivity in England.Random population samples of adults in England were invited to take part every 2-4 months using the National Health Service (NHS) patient list to achieve similar numbers of participants in each of 315 lower-tier local authority (LTLA) areas. 17rticipants registered via an online portal or by telephone.Those registered were sent a test kit by post that included a self-administered point-of-care lateral flow immunoassay (LFIA) test with instructions and a link to an online video.Participants completed a survey (online/telephone) upon completion of their self-test.Participants provided information on demographics, household composition, whether or not they thought that they had had COVID-19, whether or not they had had a PCR test, comorbidities, symptoms related to COVID-19, severity of symptoms, and duration of any of a list of 29 symptoms. 18In addition, we asked participants to report any other symptoms in free text.Personalised invitations were sent to between 560,000 and 600,000 individuals aged 18 years and above in each of rounds three to five of the REACT-2 study, carried out from 15 to 28 September 2020 (round 3), 27 October to 10 November 2020 (round 4) and 25 January to 8 February 2021 (round 5).Registrations closed after ~190,000 people had signed up at each round.

Data analysis
We obtained prevalence estimates for reporting of one or more of the 29 symptoms by sex, age and other characteristics, at time of suspected or confirmed COVID-19, and for persistence of symptoms at four and 12 weeks.Our main analyses focused on individual symptoms reported as lasting for 12 weeks (84 days) or more (this excluded a small number of participants with inconsistent or missing data, see Supplementary Figure S1).Prevalence estimates were weighted by sex, age, ethnicity, LTLA population and index of multiple deprivation, to take account of the sampling design that gave approximately equal numbers of participants in each LTLA, and differential response rates, to obtain prevalence estimates that were representative of the population of England as a whole.
. We used logistic regression (univariable, and sex, age adjusted) to investigate the associations of demographic and lifestyle factors with persistence of symptoms at 12 weeks or more, and gradient boosted tree models 19 to investigate predictive ability (area under the curve, AUC) changes from adding variables to the model for persistent symptoms at 12 weeks or more.
To identify a more specific set of persistent symptoms associated with history of COVID-19, in sensitivity analyses, we carried out variable selection in a 30% subset of symptomatic participants: in univariable models, we identified a subset of persistent symptoms (12 or more weeks) that were positively associated with a reported prior positive PCR test, and estimated the population prevalence of persistence of one or more of these symptoms.We also repeated the logistic and gradient boosted tree modeling with this subset of symptoms as outcome variables.
Generalised additive models (GAMs) were constructed with likelihood of symptom persistence at 12 weeks or more modelled as a smoothed function of sex and age.A default thin plate spline was used and the smoothed functions were plotted to visualise the relationship between risk of persistent symptoms and age.
We used free-text analysis to identify single and co-occurring words to indicate other symptoms recorded by participants, and plotted these in a network.
To identify symptom clusters segmenting participants, two binary matrices were constructed for presence or absence (1 or 0) of each of the 29 surveyed symptoms at (i) time of symptom onset and (ii) 12 weeks after, for each participant.Clustering was performed, separately, both row-wise (to identify groups of participants with similar symptoms) and column-wise (to group symptoms based on their co-occurrence) using the CLustering LARge Applications (CLARA) extension of the Partitioning Around Medoids (PAM) algorithm, implemented in the R package fpc. 20Briefly, PAM searches for the most representative data points to become cluster centroids by minimising the sum of dissimilarities between data points and their assigned centroids.CLARA uses a sampling approach to reduce the computational burden for large data sets.We used Hamming distance as a measure of dissimilarity between participants (row-wise clustering) and symptoms (column-wise clustering).We determined the optimal number of clusters using the average silhouette width.We used two methods to assess cluster stability.First, we bootstrapped and re-clustered 100 times, then quantified the difference .between bootstrapped and non-bootstrapped clusters using the Jaccard coefficient, which can range from 0 (no overlap) to 1 (perfect overlap). 21Second, we removed each symptom in turn, re-clustered, then calculated the average proportion of non-overlap (APN) between these and whole-dataset clusters as a proxy for the individual variable importance and contribution to the population segmentation. 22 further describe patterns of symptom co-occurrence, we took the cross-product of the symptom matrix at symptom onset and at 12 weeks to find pairwise symptom co-occurrence counts, and visualised them as heatmaps.

Prevalence of persistent symptoms
A total of 508,707 people took part in REACT-2 rounds three to five and completed surveys.The weighted prevalence of self-reported COVID-19 was 19.2% [19.1,19.3]with 92,116 people reporting one or more of 29 symptoms, of whom 76,155 (82.7%) reported a valid date of symptom onset 12 weeks or more before their survey date (Supplementary Figure S1A).Of those self-reporting COVID-19, 28,713/76,155 (37.7%) experienced at least one symptom for 12 weeks or more and 11,241 (14.8%) experienced at least three symptoms for the same period.This gives a weighted population prevalence of persistent symptoms of 5.75% (5.68, 5.81) for one and 2.22% (2.18, 2.26) for three or more symptoms for England to early February 2021.
Almost a third of people with at least one symptom lasting 12 weeks or more (8,771/28,713 [30.5%]) reported having had severe COVID-19 symptoms ("significant effect on my daily life") at the time of their illness, giving a weighted prevalence overall of people with persistent symptoms at 12 weeks who had reported severe symptoms of 1.72% (1.69,1.76)(Supplementary Table S1A).
Figure 1 shows how the proportion of people with one or multiple symptoms declined over time since infection.There was a rapid drop-off by four weeks, a further, smaller drop by 12 weeks, but then little evidence of further decline over time up to ~22 weeks for both men and women, with higher prevalence of symptoms at each time point among women.

Factors associated with persistent symptoms
. Among symptomatic people, the persistence of one or more symptoms for 12 weeks or more was higher in women than men (age-adjusted OR:  2, Supplementary Table S2).
There was a higher proportion with persistent symptoms among those with low incomes at 51.0% (49.5, 52.4) compared with high incomes at 28.7% (27.2, 30.4) and among people living in the most deprived areas at 42.6% (41.5, 43.6) compared with the most affluent areas at 34.7% (34.0, 35.3) (Table 1).
Prevalence of persistent symptoms at 12 or more weeks was around 50% or more among people reporting co-morbidities, ranging up to 67.9% (65.6,70.1)for "other lung condition" (Table 1).
In addition to the 29 symptoms enquired about on the questionnaire, 8,370 respondents gave free-text descriptions of other symptoms, of whom 1,860 reported symptoms that persisted for 12 weeks or more.Free-text analysis of co-occurring words indicated common additional symptoms which were not in our survey, including brain-fog, hair-loss, blood-pressure, heartpalpitations, severe-joint-pain (Supplementary Figure S3).

Sensitivity analysis
In sensitivity analyses, we assessed the extent by which the observed prevalence of, and factors associated with, persistent symptoms were altered by using a smaller set of symptoms.To improve specificity of the symptoms for COVID-19, we selected symptoms that were positively predictive of self-reported PCR positivity (among a 30% holdout set of people, Supplementary Figure S1), thus identifying a subset of 15 of the surveyed 29 symptoms (Supplementary Table S1B, Supplementary Figure S4).Based on this more limited set, the number of people with symptoms reporting one or more persistent symptoms at 12 weeks was 24,867/76,155 (32.7%), giving a weighted population prevalence of 4.95% (4.89, 5.01) (Supplementary Table S1B).
Logistic regression modelling with persistence of any one of these 15 symptoms at 12 weeks as a binary outcome showed similar results to the full 29-symptom analysis (Figure 2, Supplementary Figure S5).

Clustering analysis
In clustering analysis, two stable clusters of participants were identified based on symptom profiles at 12 weeks.Participants in Cluster L1 ("tiredness cluster") experienced high prevalence of tiredness, which co-occurred with muscle aches, difficulty sleeping and shortness of breath (Supplementary Figure S6).Participants in Cluster L2 ("respiratory cluster") experienced high prevalence of respiratory symptoms including shortness of breath and tight chest, as well as chest pain (Figure 3, Figure 4).A higher proportion of people in the respiratory cluster reported severe symptoms at the time of their COVID-19 illness (43.5%, [42.0,44.9])than in the tiredness cluster (27.4%, [26.7,28.1])(Supplementary Table S3).
We obtained research ethics approval from the South Central-Berkshire B Research Ethics Committee (IRAS ID: 283787).The REACT Public Advisory Panel provides regular review of the study processes and results.

Discussion
In this large community-based study of symptoms following COVID-19 among adults aged 18 years and above in England, participants reported high prevalence of persistent symptoms lasting 12 weeks or more.Estimates ranged from 5.8% of the population experiencing one or more persistent symptoms post-COVID-19 (corresponding to over 2 million adults in England), to 2.2% for three or more persistent symptoms (just under a million adults in England), and 1.7% with one or more symptoms lasting at least 12 weeks in people who reported severe COVID-19 symptoms affecting their daily life at the time of their illness.
Our estimates of the proportion of people with persistent COVID-19 symptoms are higher than in many other studies, although previous estimates have varied widely.At the low end, one study found that 2.3% of people with COVID-19 still reported symptoms at 12 weeks; 6 other studies have reported 13.7% of people were symptomatic at 84 days, 7 14.8% symptomatic at 90 days, 8 27% at 60 days, 23 34.7% at 7 months, 24 and as high as 46% at six months. 9Our comparatively high estimate, at 37.7% of people with COVID-19 experiencing one or more symptoms at 12 weeks, may partly reflect the large list of symptoms we surveyed, many of which are common and not specific to COVID-19.However, we asked participants only about symptoms that they related to a confirmed or suspected episode of COVID-19, and not to symptoms more generally.In addition, to improve specificity, in a sensitivity analysis we restricted COVID-19 symptoms to a subset of 15 that were predictive of reported PCR positivity and which included among others, loss or change of sense of smell or taste, tiredness, shortness of breath, muscle aches, heavy arms/legs, severe fatigue, tight chest and chest pain.This gave a similar proportion of people with persistent symptoms at 12 weeks (32.7%), and a population prevalence (5%) similar to that obtained when all 29 surveyed symptoms were considered.
Increasing age, female sex, BMI, hospitalisation and co-morbidites have previously been identified as risk factors for Long COVID 6,25 .Our finding of a linear association between age and persistent symptoms in people with symptomatic COVID-19 contrasts with some other studies that suggest the highest prevalence is found in middle-aged groups. 7Our finding is conditional on symptomatic COVID-19, reflecting the fact that older age groups in the community have lower infection rates than younger people 26 and are more likely to be asymptomatic. 27Our findings of an association of persistent symptoms with deprivation, and with smoking or vaping, are not well described in the literature, although smokers have been found to have a higher risk of severe COVID-19. 28r identification of two stable and well-differentiated symptom clusters at 12 weeks supports the characterisation of Long COVID as a diverse set of overlapping conditions.
Previous studies have taken a similar unsupervised approach to characterising subtypes of Long COVID, albeit at earlier time points: Sudre et al. 6 identified two symptom clusters at 28 days post-symptom-onset, which resembled the early clusters identified in our study but not the 12week clusters.Huang et al. 23 identified five clusters at 61 days, two of which reported high prevalences of respiratory symptoms similar to cluster L2 in our study.

Strengths and Limitations
This study uses a large random community sample with a high response rate (26-29% across the three rounds) to describe the persistence of COVID-19 symptoms.It is therefore more likely to be representative of the range of disease severity in the population compared to some others, especially those based on hospitalised cases alone 5 .The focus in our questionnaire on persistence of self-reported COVID-19 symptoms, without specific reference to Long COVID (in contrast to the questions in some other studies 7 ) has allowed us to investigate the persistence of a wide range of specific symptoms that have been suggested as relevant to Long COVID. 3,5However, it is clear that a wide spectrum of symptoms and clinical presentations post-COVID-19 may be involved; our open free-text question identified a number of symptoms not included in our questionnaire including "brain fog", "palpitations" and "hair loss". 29However, as the study was based on self-reported data and because many of the symptoms are common and not specific to COVID-19, we may, as noted, have overestimated the prevalence of persistent symptoms.
A further limitation is the retrospective study design, which introduces the possibility of recall bias.Nonetheless, in earlier analyses we have shown that participant reports of date of onset of their symptoms produce an epidemic curve that very closely tracks the epidemic. 27,30,31spondents were restricted to reporting a single date of (initial) symptom onset which does not allow for delayed onset of some symptoms, nor does it allow for the reporting of relapsing symptoms which appear to be a feature of Long COVID. 6A further limitation, despite the high response rate for a community surveillance study, is the possibility of participation bias as the REACT-2 study included a home antibody self-test; 27 it is plausible that people with persistent symptoms may have been more likely to participate in order to ascertain their antibody status.

Implications
The scale of morbidity identified in this study presents significant challenges for the affected individuals and their families, and for health services and society more broadly.After the initial decline in symptom prevalence between 4 and 12 weeks the prevalence of persistent symptoms plateaued indicating that large numbers of people may have chronic symptoms requiring investigation and intervention including rehabilitation.We show here that economically disadvantaged people and those in deprived areas appear to have a higher burden of persistent symptoms post COVID-19, compounding the excess burden of severe illness and mortality from COVID-19 experienced by these groups 32 .Inability to work due to disability from Long COVID could have a high impact on these populations.Given the high potential population health burden, investment is urgently needed to expand the network of services investigating and managing people with Long COVID, to link these services to national research studies and consistent data collection to improve our understanding, and to create the infrastructure for trials in the same way as has been achieved for acute COVID-19.
We identified two clusters of participants based on their symptoms.Individuals with predominantly respiratory symptoms comprised around one quarter of those with persistent symptoms --the larger proportion comes from a cluster of less organ-specific symptoms, particularly fatigue.Clinicians will need to be aware of the range of presenting symptoms to best support their patients towards recovery and there will need to be education of healthcare professionals to recognise and respect the experiences of people reporting these symptoms.
Societal recognition of Long COVID as an important outcome of the pandemic (including as an occupational illness for those acquiring it through work) is needed, including for those who may require benefits as well as rehabilitation.The Long COVID patient groups provide much needed support to those involved, but will require resources if they are to be able to work effectively with health services, especially to reach people in more deprived and marginalised groups.  .Numbers and proportions of participants who reported one or more symptoms (from a list of 29 surveyed symptoms) of COVID-19 at i) time of symptom onset, ii) 4 weeks post symptom onset, and iii) 12 weeks post symptom onset, among the 76,155 symptomatic participants for whom we have 12 weeks' follow-up and complete data.Figure 3 Results of clustering on symptom profile at time of symptom onset and then reclustering at 12 weeks, using CLARA (partitioning around medoids) algorithm.Central Sankey plot shows transitions between early and 12-week clusters.Two stable clusters of symptomatic infections were identified at t0: cluster A1 was characterised primarily by higher prevalence of flu-like symptoms (muscle aches, headache, fever, appetite loss, chills) and respiratory symptoms (shortness of breath, tight chest, new persistent cough).Two stable clusters were identified at 12 weeks.Cluster L1 ("tiredness cluster") had high prevalence of tiredness.Cluster L2 ("respiratory cluster") was a smaller subset of 4,441 participants who had high prevalence of shortness of breath and tight chest as well as chest pain.Table S1 Numbers and weighted prevalence of currently or previously reported persistent symptoms in the full REACT-2 study population (rounds 3-5).

Figures and Tables
Table 1A shows prevalences of any of the 29 surveyed symptoms; Table 1B shows prevalences of any of a subset of 15 symptoms identified as predictive of PCR positivity.The final three columns show prevalences weighted to be representative of the adult population of England, by age, sex, ethnicity, lower-tier local authority population, and index of multiple deprivation (IMD).
(A) All 29 symptoms

Figure S4
Results of logistic model with self-reported PCR positivity as the binary outcome variable, and the 29 surveyed symptoms as binary independent variables, among a 30% variable selection holdout data set.The 15 variables that were identified as positively predictive of PCR positivity (shown in bold colours) were taken forward as a sensitivity analysis of prevalence of persistent symptoms beyond 12 weeks, and as outcome variables in logistic regression of risk factors for persistence, among this reduced symptom set.

Figure S5
Plots showing the results of predictive modelling, with one or more of a subset of 15 symptoms at 12 weeks (y/n) as the binary outcome variable.Left panel shows odds ratios from logistic regression models, both crude and adjusted on sex and age.Right panel shows the mean contribution in area under the curve (AUC) that each variable makes to a multivariable boosted tree model, derived using variable permutation.Age, sex, BMI, smoking status and prior hospitalisation with COVID-19 are the strongest predictors of persistent symptoms in multivariable modelling, while Asian ethnicity is associated with a lower risk of persistent symptoms at 12 weeks; Asian ethnicity is selected as a predictor in multivariable boosted tree modelling.

Figure 1 Figure 2
Figure 1Plots showing persistence of symptoms as a proportion of those who reported symptoms at any time• Plots account for 'censoring'i.e. the denominator in the proportion at day X is the number of respondents who reported a symptomatic infection X or more days prior to responding to the REACT-2 survey (the denominator is shown in the bottom panel plots).Women have higher rates of persistent symptoms; a slower decline in symptom prevalence is observed after 12 weeks in both sexes.The vertical dashed lines show 4 and 12 weeks.

Figure 4
Figure 4 Persistence of symptoms for all symptomatic participants (top panel) and in 12-week symptoms clusters L1 and L2 (bottom panels).Dashed lines show 4 and 12 weeks post symptom onset.

Figure S1
Figure S1 Study population flow chart

Figure S2 Figure S3
Figure S2Plots showing (top row) modelled probability of persistent symptoms at 12 weeks as a function of age and sex, using generalised additive models with splines on age and interactions between age and sex (top row), and (bottom row) weighted prevalence of persistent symptoms at 12 weeks.From left to right, plots show (i) probability of experiencing COVID-19 symptoms in the population, (ii) probability of experiencing persistent symptoms at 12 weeks in the population, and (iii) probability of experiencing persistent symptoms at 12 weeks, conditional on symptomatic infection.Conditional on symptomatic infection (likelihood of which decreases after age 50, see central panels), risk of persistent symptoms increases linearly with age (right panels).

39 htFigure S6
Figure S6 Pairwise co-occurrence heatmaps of symptoms at time of symptom onset (left) and 12 weeks after symptom onset (right), among 76,155 symptomatic participants.Deeper blue colour indicates that more people had both of the paired symptoms at symptom onset or 12 weeks post symptom onset; bar plots indicate marginal counts of participants with each symptom at symptom onset and at 12 weeks post symptom onset.
1.51 [1.46,1.55]),andincreasedwithage,with a linear increase of 3.5 percentage points per decade of life (Table1, Figure2, Supplementary (B) 15 symptoms (loss or change of sense of smell, loss or change of sense of taste, tiredness, shortness of breath, heavy arms/legs, severe fatigue, dizziness, numbness/tingling, appetite loss, red, itchy areas on skin, difficulty sleeping, tight chest, headache, chest pain).

Table S2
Odds ratios for persistent symptoms at 12 weeks among symptomatic respondents, derived from logistic regression models (forest plot shown in Figure2)