Introduction

Estimates of the prevalence of long-COVID vary widely. The WHO estimates that 10–20% of people continue to have, or develop, at least one symptom more than three months after SARS-CoV-2 infection [1]. However, the UK Office for National Statistics estimates that self-reported long-COVID at population level is much lower, at 2.7% [2]. By contrast, a recent meta-analysis of 194 studies including 735,006 participants estimated that, at an average follow-up of 126 days, 45% of COVID-19 survivors had at least one unresolved symptom [3].

Apart from altered taste and smell, the symptoms of long-COVID are non-specific. Symptoms attributed to long-COVID may therefore have other causes, yet most studies of long-COVID do not include a comparison group. A study from the Netherlands compared persistent symptoms in 4231 participants who had previously had COVID-19 with those in 8462 matched controls. At 3–5 months after SARS-CoV-2 infection, 21.3% of previously infected participants had at least one symptom, compared with 8.7% of uninfected controls, suggesting that the true prevalence of long-COVID may be nearer 12.6% [4].
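The comparison-group reasoning reduces to a difference in prevalence. Using the Dutch figures quoted above, and writing the attributable prevalence as \(P_{\mathrm{attr}}\) (our notation, not the cited study's):

```latex
\[
P_{\mathrm{attr}} = P_{\mathrm{infected}} - P_{\mathrm{controls}}
                  = 21.3\% - 8.7\%
                  = 12.6\ \text{percentage points}
\]
```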

The Long-COVID in Scotland Study (Long-CISS) is a population cohort comprising people with laboratory-confirmed SARS-CoV-2 infection and an age-, sex-, and socioeconomically matched group of people who have never been infected [5]. Using the Long-CISS cohort, we aimed to determine the true prevalence of long-COVID at 6, 12, and 18 months, overall and by subgroup. This work extends our previous analysis [5] by including additional waves of questionnaires and by focusing on the prevalence of one or more ongoing symptoms attributable to SARS-CoV-2 infection. Here, we show that the crude attributable prevalence is 13.8%, 12.8%, and 16.3% at 6, 12, and 18 months, respectively. Following adjustment for potential confounders, these figures are 6.6%, 6.5%, and 10.4%, respectively.

Results and discussion

Overall, 345,673 questionnaires were completed by 288,173 individuals, of whom 257,341 (89%) consented to linkage of their responses to their test results. Following linkage, 53,530 individuals were excluded because they reported a previous positive test that was not recorded on the database, and 5,715 because they had asymptomatic infections. Of the remaining 198,096 individuals, 98,666 (49.8%) had previous symptomatic, laboratory-confirmed SARS-CoV-2 infection and 99,430 (50.2%) had never had a positive test. PCR tests took place between 20 April 2020 and 31 May 2022, and questionnaires were completed between 10 May 2021 and 14 November 2022. Compared with those who did not consent to linkage, participants in the final sample were more likely to be female (58.8% vs 51.8%; p < 0.001), were older (aged >40 years: 64.0% vs 51.1%; p < 0.001) and were slightly more deprived (most deprived Scottish Index of Multiple Deprivation (SIMD) quintile: 20.8% vs 20.4%; p < 0.001).
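As a quick consistency check, the participant flow above can be reproduced arithmetically. This is a minimal sketch: the helper `participant_flow` and its parameter names are hypothetical, and its only inputs are the counts reported in the text.

```python
# Participant flow reconstructed from the counts reported in the text.
# `participant_flow` is a hypothetical helper written for this illustration.

def participant_flow(respondents: int, consented: int,
                     unrecorded_positive: int, asymptomatic: int,
                     infected: int) -> dict:
    """Apply the two exclusions and split the analysed sample by infection status."""
    analysed = consented - unrecorded_positive - asymptomatic
    never_infected = analysed - infected
    return {
        "pct_consented": round(100 * consented / respondents),
        "analysed": analysed,
        "never_infected": never_infected,
        # Percentages rounded to one decimal place, as in the text.
        "pct_infected": round(100 * infected / analysed, 1),
        "pct_never_infected": round(100 * never_infected / analysed, 1),
    }


flow = participant_flow(respondents=288_173, consented=257_341,
                        unrecorded_positive=53_530, asymptomatic=5_715,
                        infected=98_666)
print(flow)
# {'pct_consented': 89, 'analysed': 198096, 'never_infected': 99430, 'pct_infected': 49.8, 'pct_never_infected': 50.2}
```

The reconstructed totals match the reported 198,096 analysed participants and the 49.8%/50.2% split.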

Infected individuals were less likely than never-infected individuals to have pre-existing health conditions and more likely to have been vaccinated (Table 1). Because new first infections accumulated over time, fewer people remained never infected in later periods of the pandemic, so these periods were less common in the never-infected group. Whilst 64.5% of infected participants reported at least one symptom six months following SARS-CoV-2 infection, so did 50.8% of those never infected (Table 2). Results were similar at 12 months (67.8% versus 55.0%) and 18 months (72.6% versus 56.2%) follow-up. The crude prevalence of at least one symptom attributable to SARS-CoV-2 infection was 13.8% (95% CI 13.2%, 14.3%), 12.8% (11.9%, 13.6%), and 16.3% (14.4%, 18.2%) at six, 12 and 18 months, respectively. Following adjustment for potential confounders, these figures were 6.6% (6.3%, 6.9%), 6.5% (6.0%, 6.9%) and 10.4% (9.1%, 11.6%), respectively (Supplementary Table 1). The attributable prevalence was higher in women and in those who had received more vaccine doses before infection, and lower in those with more pre-existing health conditions (Fig. 1). The adjusted attributable prevalence was higher for people infected later in the pandemic: at six months follow-up it was 6.7% (6.2%, 7.1%) and 7.9% (6.9%, 9.0%) for the delta and omicron variants, respectively, compared with 3.9% (3.2%, 4.6%) for the alpha variant (Supplementary Table 1).

Table 1 Characteristics of participants by SARS-CoV-2 infection status
Table 2 Crude prevalence of individual and any symptoms at 6, 12 and 18 months following SARS-CoV-2 infection
Fig. 1: Adjusted attributable prevalence of long-COVID at 6 months following symptomatic SARS-CoV-2 infection.

Data are presented as adjusted attributable prevalence values with 95% confidence intervals. SIMD, Scottish Index of Multiple Deprivation; LTC, long-term condition; VOC, variant of concern. N = 132,879. Adjusted for age, sex, SIMD quintile, number of LTCs, ethnic group, vaccination status, and variant period. Numerical values of the estimates are provided in Supplementary Table 1.

Of the 98,666 participants with previous symptomatic infection, 2256 (2.29%) had severe infection. At six months follow-up, the crude prevalence of at least one symptom was 64.3% following mild infection compared with 79.3% following severe infection. The corresponding values were 67.8% and 82.5% at 12 months, and 71.7% and 84.0% at 18 months follow-up.

Our finding that the true prevalence of long-COVID was 6.5–10.4% is broadly consistent with the 12.7% reported in the Netherlands [4] and the WHO estimate of 10–20% [1]. Based on these three sources, the UK Office for National Statistics estimate of 2.7% may be an underestimate. In our previous analysis of the same cohort, 48% of people self-reported that they had not fully recovered six months following symptomatic SARS-CoV-2 infection [5]. Similarly, a meta-analysis of published studies reported that 45% had unresolved symptoms at 4 months follow-up [3]. However, the current findings suggest that, whilst 64.5–72.6% of people report at least one symptom six to 18 months following SARS-CoV-2 infection, only 6.5–10.4% are likely to have long-COVID. The symptoms of the remainder would probably have occurred without SARS-CoV-2 infection, but some of these people may mistakenly attribute them to long-COVID. Further work is required to refine the definition and diagnosis of long-COVID and to support appropriate management.

A national cohort study in England used a similar methodology to estimate the prevalence of long-COVID in adolescents aged 11–17 years [6]. Potential participants were identified from Public Health England's SARS-CoV-2 testing database and invited by letter to complete an online questionnaire; the response rate was 13%. Those who tested positive for SARS-CoV-2 (n = 3065) were matched by month of test, age, sex, and geographical region to adolescents who tested negative (n = 3739). At 3 months follow-up, the crude prevalence of at least one symptom attributable to infection was 13.2%, very close to our estimate in adults of 13.8% at 6 months follow-up.

Symptoms, reduced quality of life, impairment of activities of daily living, and self-reported non-recovery or partial recovery following SARS-CoV-2 infection are more common among people with pre-existing health problems, especially multimorbidity [5]. However, the findings of this study do not support the conclusion that their worse health following SARS-CoV-2 infection is due to a higher prevalence of long-COVID. This conclusion rests on our modification of the WHO definition of long-COVID as one or more persistent or new symptoms. We could not examine whether, for example, their existing symptoms deteriorated more as a result of SARS-CoV-2 infection than would otherwise have occurred. In addition, our modified definition does not incorporate the WHO minimum symptom duration of at least 2 months, because we could not determine whether participants' reported symptoms had lasted this long.

Whilst the percentage of people reporting one or more symptoms at six months was slightly lower following omicron (63.3%) than following the alpha (66.8%) and delta (66.7%) variants, the true prevalence of long-COVID at six months was higher following omicron and delta than following alpha. Our adjusted result contradicts the findings of studies without comparison groups, which concluded that long-COVID is less prevalent following the omicron variant [7, 8]. In a Norwegian prospective cohort study, Magnusson et al. found that, compared with individuals who tested negative for SARS-CoV-2, the risks of ongoing symptoms posed by the omicron and delta variants were comparable at 14–126 days follow-up [9].

Strengths of this study include its large, unselected sample recruited from the general population, laboratory confirmation of infection status, and inclusion of a comparison group. To minimise bias, the comparison group was matched by age, sex and deprivation, and we adjusted for a wide range of other confounders. Nonetheless, residual confounding is possible in any observational study and may explain the finding of a higher prevalence of long-COVID among people who had received more vaccine doses before infection. This finding conflicts with that of Antonelli et al. [10], who reported reduced odds of long-duration (≥28 days) symptoms following two vaccine doses compared with no vaccination.

Similarly, the apparently higher prevalence of long-COVID 18 months after infection may reflect the onset of new symptoms; residual confounding due to over-representation of infections from early in the pandemic, despite adjustment for dominant variant; or retention bias, whereby people with symptoms are more likely to remain in the study. Both groups age by 6 months between questionnaires, and most symptoms become more common with age. Moreover, both SARS-CoV-2 infection and many of the symptoms reported at follow-up vary by season. However, because the 6- and 18-month questionnaires fall in the same season, whereas the 12-month questionnaire falls six months from both, seasonality is more likely to explain differences between 6 and 12 months and between 12 and 18 months than between 6 and 18 months. Deaths from long-COVID over time could contribute to a fall in its prevalence over follow-up; however, our findings do not show such a fall.
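The seasonal argument can be made concrete. The sketch below assumes, purely for illustration, that each questionnaire is completed exactly 6, 12 or 18 months after the index test (real completion dates scatter around these targets); the function name is hypothetical. For any index month, the 6- and 18-month questionnaires land in the same calendar month, while the 12-month questionnaire lands six months away from both.

```python
# Illustrative only: calendar month of each follow-up questionnaire, assuming
# (hypothetically) completion exactly 6, 12 and 18 months after the index test.

def follow_up_calendar_months(index_month: int, offsets=(6, 12, 18)) -> dict:
    """Map each follow-up offset (in months) to a calendar month numbered 1-12."""
    if not 1 <= index_month <= 12:
        raise ValueError("index_month must be between 1 and 12")
    # Shift to 0-based months, add the offset, wrap around the year, shift back.
    return {offset: (index_month - 1 + offset) % 12 + 1 for offset in offsets}


months = follow_up_calendar_months(3)  # index test in March
print(months)  # {6: 9, 12: 3, 18: 9}, i.e. September, March, September
```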

Selection bias may be present among those who were tested for SARS-CoV-2, those who completed the questionnaire, and those who consented to linkage. During the period when index PCR tests were conducted, testing was available to everyone free of charge. However, people with mild symptoms might have been less likely to be tested, introducing some bias in testing. Furthermore, selection bias in questionnaire completion could lead to overestimation of associations if having ongoing symptoms made participation more likely, or to underestimation if more severe ongoing symptoms reduced the ability to participate. The direction of any bias arising from linkage consent is difficult to determine. Despite these limitations, our methodology represents a pragmatic recruitment method that allows a representative response at population level.

The crude prevalence of long-COVID was higher following severe infection than following mild infection. However, we were unable to calculate the adjusted attributable prevalence stratified by infection severity. Population attributable risk cannot be calculated by severity because severity is a refinement of the exposure variable (test status): only infected participants have a severity, so severity and test status are strongly correlated. Future work should explore other indicators of severity and COVID-19 history.

There is potential for misclassification bias. Antigen test results were not available to the study. Moreover, some individuals in the comparison group may have had SARS-CoV-2 infection that was not detected by a PCR test. This risk was reduced by excluding participants who had only negative PCR tests recorded but who self-reported that they had had SARS-CoV-2 infection. Nevertheless, the risk of misclassification due to undiagnosed, asymptomatic infection remains.

Methods

The Long-COVID in Scotland Study (Long-CISS) is an ambidirectional general population cohort. Every adult (>16 years) in Scotland with a positive PCR test was invited, along with a comparison group of people who had had a negative test but never a positive test, matched by age, sex, deprivation quintile, and time period (in units of three months) [5]. Members of the comparison group were reallocated to the infected group if, and when, they tested positive. People who had asymptomatic SARS-CoV-2 infections were excluded. The National Health Service (NHS) Scotland platform that provided PCR result notifications identified eligible participants and invited them via automated SMS text messages. The COVID-19 & Respiratory Surveillance in Scotland Dashboard (https://scotland.shinyapps.io/phs-respiratory-covid-19/) provides information on testing and positivity rates over time. An online questionnaire (Supplementary Fig. 1), self-completed at six, 12 and 18 months following the index PCR test (the first positive test or, for the comparison group, the most recent negative test), collected information on pre-existing health conditions and 26 current symptoms (harmonised with the ISARIC questionnaire) [11].
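The group-assignment rules described here, together with the exclusions reported in the Results, can be sketched as a small classification function. This is a simplified illustration under stated assumptions: the `Participant` record layout and the `classify` function are hypothetical, all of a person's PCR results are assumed to be available at once (whereas the study reallocated people over time), and "most recent negative test" is taken over all recorded tests.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Participant:
    # Hypothetical record layout; the study's linked data structure is not described here.
    pcr_results: List[Tuple[date, bool]] = field(default_factory=list)  # (test date, positive?)
    self_reported_positive: bool = False  # questionnaire: reports a previous positive test
    symptomatic: bool = True              # symptoms at the time of infection


def classify(p: Participant) -> Tuple[str, Optional[date]]:
    """Return (group, index test date) following the rules described in the text."""
    positives = sorted(d for d, positive in p.pcr_results if positive)
    negatives = sorted(d for d, positive in p.pcr_results if not positive)

    if positives:
        # Any recorded positive places the person in the infected group; this also
        # covers comparison-group members who later tested positive (reallocation).
        if not p.symptomatic:
            return "excluded", None            # asymptomatic infections were excluded
        return "infected", positives[0]         # index test = first positive test
    if p.self_reported_positive:
        # Self-reported infection without a recorded positive PCR: excluded to
        # limit misclassification in the comparison group.
        return "excluded", None
    if negatives:
        return "never_infected", negatives[-1]  # index test = most recent negative test
    return "excluded", None                     # no PCR record: not eligible


comparison_then_positive = Participant(
    pcr_results=[(date(2021, 1, 5), False), (date(2021, 6, 1), True)])
print(classify(comparison_then_positive))  # ('infected', datetime.date(2021, 6, 1))
```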

Linkage to the test database provided the date and result of the index PCR test plus age, sex and postcode of residence. Postcode was used to derive the Scottish Index of Multiple Deprivation (SIMD) [12]. Additional data on hospitalisations (Scottish Morbidity Record 01/04), dispensed prescriptions (Prescribing Information System), vaccinations, and death certificates (General Register Office) were obtained through linkage to electronic health records, covering both the five years before the index test and the period after it (up to January 2022). Severe infection was defined as hospital admission for SARS-CoV-2 infection. A SARS-CoV-2 variant was defined as dominant if it accounted for ≥95% of cases genotyped that week in the UK population (https://sars2.cvr.gla.ac.uk/cog-uk/); weeks in which no single variant reached this threshold were classed as having no dominant variant. Pre-existing health conditions were ascertained from self-report in the questionnaire and from linkage to previous hospitalisations and dispensed prescriptions. The methodology is described in detail elsewhere [5].
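The dominant-variant definition can be expressed as a simple weekly rule. This sketch assumes a hypothetical input of genotyped case counts per variant for a single week; it does not reproduce the format of the COG-UK data, the function names are hypothetical, and the counts in the usage lines are invented.

```python
from typing import Dict, Optional

DOMINANCE_THRESHOLD = 0.95  # ">=95% of cases genotyped that week", as stated in the text


def dominant_variant(weekly_counts: Dict[str, int],
                     threshold: float = DOMINANCE_THRESHOLD) -> Optional[str]:
    """Return the variant accounting for >= `threshold` of a week's genotyped cases, else None."""
    total = sum(weekly_counts.values())
    if total == 0:
        return None  # nothing genotyped that week: no dominant variant can be assigned
    variant, count = max(weekly_counts.items(), key=lambda item: item[1])
    return variant if count / total >= threshold else None


def label_week(weekly_counts: Dict[str, int]) -> str:
    """Label a week with its dominant variant or 'no dominant variant'."""
    return dominant_variant(weekly_counts) or "no dominant variant"


print(label_week({"Delta": 9_800, "Alpha": 200}))      # Delta
print(label_week({"Delta": 6_000, "Omicron": 4_000}))  # no dominant variant
```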

Our primary outcome was long-COVID, defined as one or more self-reported symptoms at follow-up. Prevalence was calculated separately for those with previous symptomatic infection and those never infected, and the crude attributable prevalence was estimated as the difference between these values. The adjusted attributable prevalence was calculated using the regpar command in Stata following logistic regression, adjusting for potential confounders (age, sex, deprivation quintile, ethnic group, individual and total number of long-term conditions, vaccination status, and dominant variant). Analyses were stratified by follow-up time (six, 12, and 18 months). Estimates were calculated for the whole study population and then by subgroup.
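The crude attributable prevalence is a difference of two proportions, and adjustment can be illustrated by standardisation. The sketch below is not the study's analysis. It computes the crude difference with a Wald 95% interval (an assumption; the interval method is not stated here) and an adjusted difference by marginal standardisation over a single stratifying confounder, weighting stratum-specific differences by the infected group's distribution. This is similar in spirit to deriving adjusted estimates from a fitted logistic model, but it is not a reimplementation of regpar, whose scenario definitions may differ. All function names and counts are hypothetical.

```python
import math
from typing import Dict, Tuple

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def crude_attributable_prevalence(cases_inf: int, n_inf: int,
                                  cases_ctrl: int, n_ctrl: int) -> Tuple[float, float, float]:
    """Prevalence in infected minus never infected, with a Wald 95% confidence interval."""
    p_inf, p_ctrl = cases_inf / n_inf, cases_ctrl / n_ctrl
    diff = p_inf - p_ctrl
    se = math.sqrt(p_inf * (1 - p_inf) / n_inf + p_ctrl * (1 - p_ctrl) / n_ctrl)
    return diff, diff - Z_95 * se, diff + Z_95 * se


def standardised_attributable_prevalence(
        strata: Dict[str, Tuple[int, int, int, int]]) -> float:
    """Confounder-adjusted difference by marginal standardisation.

    `strata` maps each confounder stratum to (cases_inf, n_inf, cases_ctrl, n_ctrl).
    Stratum-specific differences are weighted by the infected group's share in each stratum.
    """
    total_inf = sum(n_inf for _, n_inf, _, _ in strata.values())
    return sum(
        (n_inf / total_inf) * (cases_inf / n_inf - cases_ctrl / n_ctrl)
        for cases_inf, n_inf, cases_ctrl, n_ctrl in strata.values()
    )


# Invented counts: infected people are concentrated in a stratum with a high
# background symptom prevalence, so the crude difference overstates the effect.
toy = {
    "high background": (640, 800, 140, 200),
    "low background": (100, 200, 360, 800),
}
totals = [sum(v[i] for v in toy.values()) for i in range(4)]  # collapse the strata
crude, lower, upper = crude_attributable_prevalence(*totals)
adjusted = standardised_attributable_prevalence(toy)
print(f"crude {crude:.3f} ({lower:.3f}, {upper:.3f}); adjusted {adjusted:.3f}")
```

With these invented counts the crude difference is 0.24, whereas the standardised difference is 0.09, an exaggerated version of how adjustment reduced the study's estimates. Stratifying on every combination of confounders quickly leaves empty cells, which is one reason a regression model is used in practice; an interval for the adjusted estimate would also need the delta method or a bootstrap.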

Ethics statement

Participants provided informed electronic consent for both data collection and data linkage, and study approval was obtained from the West of Scotland Research Ethics Committee (ref. 21/WS/0020) and the Public Benefit and Privacy Panel (ref. 2021-0180).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.