Introduction

England has experienced a large outbreak of SARS-CoV-2 infection leading to the highest excess mortality in Europe by June 20201. The first recorded COVID-19 death occurred on 28 February, with in-hospital deaths peaking by mid-April2. Hospital admission and mortality data show an asymmetrical burden of COVID-19 in England, with high rates in older people and those living in long-term care, and in people of minority ethnic groups, particularly Black and Asian (mainly South Asian) individuals3,4,5,6. It is unclear how much of this excess is due to differences in exposure to the virus, e.g. related to workplace exposures and structural inequality, and how much is due to differences in outcome, including access to health care7,8,9.

As part of its response to controlling the spread of the virus, the UK Government announced a national lockdown on 23 March that prohibited all but essential activities. The UK came out of lockdown from mid-May: restrictions were gradually eased, more businesses were allowed to reopen, and the public was encouraged to use face coverings in situations where social distancing could not be maintained.

Antibody data provide a long-lasting measure of SARS-CoV-2 infection, enabling analyses of the timing and extent of the recent epidemic. Most infected people mount an IgG antibody response detectable after 14–21 days although levels may start to wane after ~90 days10. Uncertain validity of the available antibody tests, inconsistencies in sampling methods, small numbers and use of selected groups have made many studies difficult to interpret11. Different acceptability criteria may apply to community-based studies where population-wide results are required than for studies focused on individual risk11,12,13,14. While not generally approved for individual care, self-administered lateral flow immunoassay (LFIA) tests done at home provide a means for obtaining reliable community-wide prevalence estimates rapidly and at scale, at reasonable cost15,16, by adjusting the results for known test performance17.

Here, we obtained estimates of the cumulative community prevalence of IgG antibodies for SARS-CoV-2 infection among a representative sample of over 100,000 adults aged over 18 years in England, and specific sub-groups of the population, e.g. by ethnicity and occupation, to mid-July 202018. We used home-based self-testing with a LFIA that had been extensively evaluated for sensitivity and specificity in both laboratory and clinic settings and for acceptability and usability among the public19,20. The tests were delivered by post to randomly selected individuals who were given detailed instructions (including by video) on how to carry out the procedure. Participants were asked to upload a photograph of the completed test and to complete a brief questionnaire either online or by telephone (see the “Methods” section and published protocol18). As well as measuring community prevalence and identifying groups at most risk of infection, we estimated the total number of infected individuals in England and the infection fatality ratio (IFR) overall and by age, sex and ethnicity.

Results

Of the 121,976 people who were sent test kits, 109,076 (89.4%) completed the questionnaire, of whom 105,651 (96.9%) completed the test, during the period 20 June–13 July 2020; 5544 (5.2%) were IgG positive, 94,364 (89.3%) IgG negative and 5743 (5.4%) reported an invalid or unreadable result, giving a crude prevalence of 5.6% (95% CI 5.4–5.7). After adjusting for the performance characteristics of the test and re-weighting to be representative of the population, overall antibody prevalence was 6.0% (95% CI 5.8–6.1). This equates to 3.36 (3.22, 3.51) million adults in England who had antibodies to SARS-CoV-2 to mid-July 2020.

Prevalence was highest at ages 18–24 years (7.9%, 95% CI 7.3, 8.5) and in London (13.0%, 95% CI 12.3, 13.6) (Supplementary Table 1). Highest prevalence by ethnic group was found in people of Black (includes Black Caribbean, African and Black British) (17.3%, 95% CI 15.8, 19.1) and Asian (mainly South Asian) ethnicities (11.9%, 95% CI 11.0, 12.8), compared to 5.0% (95% CI 4.8, 5.2) in people of white ethnicity (Supplementary Table 1). There was some variation within these broad ethnic categories (Supplementary Table 2), with the highest prevalence in people of Black African ethnicity (19.21%, 95% CI 15.57, 23.42). The increased prevalence among non-white ethnicities was partially but not fully explained by covariates. For example, in an unadjusted logistic regression model, compared to white ethnicity, Black ethnicity was associated with a three-fold increase in odds of being antibody positive (OR 3.2, 95% CI 2.7, 3.9), which reduced to OR 2.0 (1.7, 2.5) after adjustment for covariates (Table 1, Supplementary Fig. 1). Essential workers, particularly those with public-facing roles, also had increased prevalence: among those working in residential care facilities (care homes) with client-facing roles, prevalence was 16.5% (95% CI 13.7, 19.8) and it was 11.7% (95% CI 10.5, 13.1) among health care workers with patient contact, with 3-fold (3.1; 2.5, 3.8) and 2-fold (2.1; 1.9, 2.4) odds of infection, respectively, compared with non-essential workers (Table 1, Supplementary Table 3, Supplementary Fig. 1). Those living in more deprived areas or in larger households, particularly without children, had higher prevalence than those in more affluent areas or who lived alone, although the increased odds were partially attenuated in the adjusted models. In contrast, higher household income was associated with increased prevalence of antibodies (Table 2, Supplementary Fig. 1), which was greater in younger age groups (Supplementary Table 4).

Table 1 Logistic regression analysis for prevalence of IgG antibodies to SARS-CoV-2.
Table 2 Infection fatality ratio (IFR) and numbers of SARS-CoV-2 infections by age, sex, ethnicity.

Of the 5544 IgG positive people, 3406 (61.4%; 60.1, 62.7) reported one or more typical symptoms (fever, persistent cough, loss of taste or smell), 353 (6.4%; 5.8, 7.0) reported atypical symptoms only, and 1785 (32.2%; 31.0, 33.4) reported no symptoms. This varied by age, with people over 65 being more likely to report no symptoms (392/801, 48.9%, 45.4, 52.4) than those aged 18–34 (418/1,393, 30.0%, 27.6, 32.4) or 35–64 years (975/3,350, 29.1%, 27.6, 30.6) (P < 0.001). Prevalence was higher in those with more severe symptoms, and in those who had contact with a confirmed or suspected case. Those who were overweight or obese had higher prevalence than those with normal weight, while current smokers had a lower prevalence than non-smokers (3.2% vs. 5.2%, OR 0.6, 95% CI 0.6, 0.7), consistent with findings from other studies4,21 (Table 1, Supplementary Fig. 1, Supplementary Table 3).

Fig. 1: Reconstruction of epidemic curve from REACT-2 alongside national reported deaths from COVID-19.

Number of symptomatic infections by week (dotted line; right y axis) based on the date of onset among 3493 antibody-positive participants who reported symptoms in the REACT-2 study, compared with deaths by week in England (solid line; left y axis; data from Office for National Statistics46). Source data are provided with this paper.

Figure 1 shows how the epidemic evolved between January and June 2020. An epidemic curve was generated from dates of reported suspected or confirmed COVID-19 among symptomatic cases with antibodies (n = 3493; asymptomatic individuals and symptomatic people whose date of infection was unknown are excluded). The plot shows the epidemic curve from the present study, alongside national mortality for England by date of death: this tracks 2–3 weeks later than our epidemic curve, which peaked in the first week of April at the height of the epidemic in England. Figure 2 shows the proportionate distribution of cases from our data by employment. As the epidemic grew there was a shift towards a greater proportion of cases in essential workers, particularly those in resident-facing and patient-facing roles in care homes and health care.

Fig. 2: Proportion of infections by employment and month of symptom onset.

Proportion of total monthly symptomatic infections based on the date of onset among 3493 antibody-positive participants who reported symptoms in the REACT-2 study, by employment status and month of symptom onset. Source data are provided with this paper.

The estimated community IFR (excluding care homes) was 0.90% (0.86, 0.94). It was higher in males (1.07%, 1.00, 1.15) than females (0.71%, 0.67, 0.75) and increased with age from 0.52% (0.49, 0.55) at ages 45–64 years to 11.64% (9.22, 14.06) at ages 75+ years (Table 2). Sensitivity analyses indicate an IFR as high as 1.58% (1.51, 1.65) if excess rather than COVID-specific deaths are used and care home deaths are included (Supplementary Table 5). There was no difference in estimated IFR for people of Black, Asian and white ethnicities when stratified by age and sex (Table 2).

Discussion

Overall we estimate a prevalence of SARS-CoV-2 antibody of 6.0% corresponding to 3.4 million adults in England infected by the virus to mid-July 2020. The majority of people who developed antibodies reported symptoms during the peak of the epidemic in March and April 2020. As the epidemic took off it became more concentrated in specific groups including Black, Asian and other minority ethnic groups, and in essential workers, particularly those working in health and residential social care. While partially attenuated in the adjusted analyses, the higher risks persisted among these groups and reflect a starkly uneven experience of the COVID-19 epidemic across society.

An unequal burden of COVID-19 morbidity and mortality is emerging from other countries as well as the UK8,22,23,24,25. Our study has the advantage of including ethnicity data alongside information about employment, deprivation, household size and other potential explanatory variables. This allows a more nuanced exploration of the reasons underlying these unequal outcomes7. In the UK context our finding of a higher prevalence of infection, with no apparent difference in IFRs, may explain the observed excess mortality in minority ethnic groups. Therefore there is a need to better understand the occupational, social and environmental factors that may have led to higher prevalence in these groups26.

Our estimated IFR of 0.90% is consistent with a recent large study in Spain which reported 0.83–1.07%, lower than the IFR described in Italy (2%), and higher than that reported from a German study (0.38%)27,28,29. In estimating the IFR, we may have underestimated the number of infected individuals (leading to higher estimates of IFR), as a result of weakened or absent antibody response in some people, and waning antibody over time30. For the analyses of IFR nationally, we excluded deaths in care home residents since few such residents were included in our community sample. Inclusion of care home residents increased our estimates of IFR, since, like many countries, England experienced high numbers of cases and deaths in care home residents31. We included care home residents in our analyses of IFR by ethnicity as data excluding these individuals were not available.

The clinical spectrum of infection is wide, with just under one-third of people with antibodies reporting no symptoms, rising to nearly one half of people over 65 years, as also reported for individuals in long-term care32. The national prevalence study in Spain reported that 28.5% or 32.7% were asymptomatic depending on the test14, similar to our findings overall, although a systematic review of 16 clinical studies puts the figure at 40–50%33. The high prevalence of asymptomatic infection means that such cases will be missed by many routine testing campaigns that are based wholly or mainly on symptomatic individuals.

Our finding that current smokers have a lower prevalence of SARS-CoV-2 infection than non-smokers may reflect unmeasured confounding, differential adoption of preventive behaviours (given the known associations of COVID-19 severity with smoking-related co-morbidities), or some biological basis. For example, the effect of nicotine on angiotensin converting enzyme 2 (ACE2) receptors, a route of viral entry into cells, has been proposed as a potential mechanism34.

Our study has a number of limitations. As in almost all population surveys, our study had unequal participation, with lower response among people from minority ethnic groups and in more deprived areas. We re-weighted the sample to account for such differential response, although this may not have overcome unknown participation biases. An important limitation was the exclusion of children for regulatory reasons, as the tests were approved for research use in adults only. Furthermore, our sampling approach only allowed for one individual per household to take part in the study, thus limiting our ability to explore the impact of household transmission on associations seen with other covariates. However, we did control for household size in our regression analysis to account for this. Numbers were too small to report the ethnic breakdown of antibody prevalence according to more detailed categories; as such, important differences between ethnic sub-groups with respect to occupation, deprivation, and region may not have been fully captured. We used self-administered home LFIA tests as opposed to “gold standard” laboratory tests on a blood draw. However, this followed an extensive evaluation of the selected LFIA which showed it to have acceptable performance (sensitivity and specificity) in comparison with confirmatory laboratory tests19. We also took steps to measure and improve usability, including ability to perform and read an LFIA test at home, through public involvement and evaluation in a national study of 14,000 people20.

Use of the LFIA enabled us to obtain antibody tests on large numbers of people over an 18-day period, without the need for laboratory testing or health care personnel. Antibodies were strongly associated with clinical history of confirmed or suspected COVID-19, providing face validity. Although there was a theoretical potential for reporting bias as respondents were not blinded to their test results, there was high concordance of self-reported with clinician-read results from the uploaded photographs20. Our results closely tracked other indicators of the epidemic curve. We believe that use of home-based self-tests is a sustainable model for community-based prevalence studies in other populations, avoiding the biases of surveillance that relies solely on self-referral for testing. Continued scrutiny of antibody response by clinical features, and persistence of antibodies over time, will be needed for ongoing surveillance, as waning antibodies mean that prevalence estimates may not fully capture cumulative exposure over time.

In conclusion, our finding of substantial inequalities in prevalence of SARS-CoV-2 infection by ethnicity runs counter to suggestions that the increased risk of hospitalisation and mortality from COVID-19 among minority ethnic groups is due predominantly to comorbidities or other biological factors. Work with at-risk communities is urgently needed to identify appropriate interventions to reduce health inequalities related to risk of SARS-CoV-2 infection.

Methods

The REal-time Assessment of Community Transmission-2 (REACT-2) programme is evaluating community prevalence of SARS-CoV-2 infection in England. We obtained a random population sample of adults in England, using the National Health Service (NHS) patient list, which includes name, address, age and sex of everyone registered with a general practitioner (almost the entire population). Personalised invitations were sent via post to 315,000 individuals aged 18 years and above to achieve similar numbers in each of 315 lower-tier local authority areas (LTLAs). Participants registered via an online portal or by telephone with registration closed after ~120,000 people had signed up. To attain approximately the same number of registrations per LTLA, the number of invitations sent varied based on the LTLA response profile achieved when conducting similar population surveys in England35.

Those registered were sent a test kit by post, including a self-administered point-of-care LFIA test and instructions, with a link to an online video. The questionnaires are available at the study website: https://www.imperial.ac.uk/medicine/research-and-impact/groups/react-study/react-2-study-materials/. The LFIA (Fortress Diagnostics, Northern Ireland) was selected following evaluation of performance characteristics (sensitivity and specificity) against pre-defined criteria for detection of IgG19, and extensive public involvement and user testing20. The LFIA uses the coronavirus structural spike (S) protein as the target antigen for the antibody-based detection of SARS-CoV-2. Compared to results from at least one of two in-house ELISAs, sensitivity and specificity of finger-prick blood (self-read) were 84.4% (70.5%, 93.5%) in RT-PCR confirmed cases and 98.6% (97.1%, 99.4%) in 500 pre-pandemic sera19. The in-house ELISAs used in that evaluation of the LFIA were the spike protein ELISA (S-ELISA) and a hybrid spike protein receptor-binding domain double antigen-bridging assay (hybrid DABA)19. Samples for sensitivity testing were collected from adult NHS workers who had previously tested positive for SARS-CoV-2 by PCR, had not been hospitalised and were at least 21 days from the onset of symptoms19.

Participants completed a short registration questionnaire (online/telephone) and a further survey upon completion of their self-test. This included information on demographics, household composition, recent symptoms and an uploaded photograph of the result. A validation study of the photographs showed substantial concordance between participant- and clinician-interpreted results in over 500 tests (kappa: 0.89, 95% CI: 0.88–0.92)20.
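The concordance statistic reported above can be illustrated with a minimal sketch. It assumes the reported kappa is Cohen's kappa for two raters (the text does not name the variant), and the readings in the example are invented for illustration only; they are not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / n ** 2
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical readings (NOT study data): participant vs. clinician
participant = ["pos", "neg", "neg", "pos", "neg", "invalid", "neg", "neg"]
clinician = ["pos", "neg", "neg", "neg", "neg", "invalid", "neg", "neg"]
print(f"kappa = {cohens_kappa(participant, clinician):.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters here because most results are negative and two readers would agree often even if reading at random.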

Prevalence was calculated as the proportion of individuals with a positive IgG result, adjusted for test performance using:

$$p = \frac{q + \mathrm{specificity} - 1}{\mathrm{sensitivity} + \mathrm{specificity} - 1} \qquad (1)$$

where p is the adjusted proportion positive, q is the observed proportion positive17. Prevalence estimates at national level were weighted for age, sex, region, ethnicity and deprivation to account for the geographic sample design and for variation in response rates, so as to be representative of the population (18+ years) of England. Details of the weighting approach used and the sample population profile are in the Supplementary Information. Logistic regression models were adjusted for age, sex and region, and additionally for ethnicity, deprivation, household size and occupation. We used complete case analysis without imputation.
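As a worked illustration of Eq. (1), the sketch below applies the test-performance adjustment to the crude prevalence and the self-read sensitivity and specificity quoted in the text. The function name and the clamping of the result to the interval [0, 1] are assumptions of this sketch, not part of the published method, and the survey re-weighting step is not included, so the output is not expected to reproduce the weighted national estimate.

```python
def adjust_prevalence(q: float, sensitivity: float, specificity: float) -> float:
    """Adjust the observed proportion positive q for test performance, per Eq. (1)."""
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    p = (q + specificity - 1.0) / denominator
    # Assumption of this sketch: clamp to a valid proportion. The raw formula
    # can go below 0 when q is smaller than the false-positive rate.
    return min(max(p, 0.0), 1.0)


# Inputs quoted in the text: crude prevalence 5.6%, sensitivity 84.4%,
# specificity 98.6% (finger-prick blood, self-read)
p = adjust_prevalence(q=0.056, sensitivity=0.844, specificity=0.986)
print(f"adjusted prevalence before re-weighting: {100 * p:.1f}%")
```

The numerator removes the expected false positives from the observed proportion, and the denominator rescales for the true positives the test misses.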

Regions are the highest tier of sub-national division in England and are predominantly used for statistical and some administrative purposes, London being the most dense and urban region, and the South West the least dense and most rural (further details in Supplementary Information). Index of Multiple Deprivation 2019 (IMD) was used as a measure of relative deprivation, based on seven domains at a small local area level across England (income, employment, education, health, crime, barriers to housing and services, and living environment)36.

We estimated the total number of SARS-CoV-2 infections from the start of the epidemic until mid-July 2020 by multiplying the antibody prevalence, adjusted for test characteristics and re-weighted for sampling, by the mid-year population size at ages 18+ years in England37. To correct for survival bias we added to the seropositive population the deaths that mentioned COVID-19 on the death certificate during this period. The Office for National Statistics (ONS) COVID-19 death registration data used include deaths where COVID-19 was recorded as a cause of death on the death certificate, whether or not there was a laboratory-confirmed test and, at the time, irrespective of the interval from date of testing positive for those who were tested38. We then estimated the IFR by dividing the total number of COVID-19 deaths, excluding care home residents, by our estimate of the total number of infections14. We obtained an overall IFR estimate and estimates stratified by age and sex15. We calculated the IFR without care home deaths since we did not have sufficient numbers of care home residents in our study to be able to get an accurate estimate of prevalence of infection in this population. Early data suggested that the rate of infection in care homes was higher than in the general population39, and therefore including care home deaths would overestimate the IFR. We present separate IFR estimates by ethnicity because data on COVID-19 deaths disaggregated by both ethnicity and care home residency were not available; we therefore could not exclude COVID-19 deaths in care home residents from each ethnic group. Confidence bounds were obtained using the Delta method. As a sensitivity analysis we calculated IFR and total infections including care home residents, with all-cause excess deaths and stratified by age and sex. ONS excess mortality is defined as the number of deaths in 2020 which are above the number expected based on mortality rates in earlier years38.

We obtained research ethics approval from the South Central-Berkshire B Research Ethics Committee (IRAS ID: 283787), and MHRA approval for use of the LFIA for research purposes only, and participants provided informed consent.
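The IFR calculation described earlier in this section can be sketched as follows. This is a simplified illustration rather than the study's code. It assumes the death count is known without error, so that only the prevalence estimate contributes uncertainty, and it uses a first-order Delta-method approximation. The names `estimate_ifr` and `IFREstimate` and all numeric inputs in the example are hypothetical placeholders, not study data.

```python
from dataclasses import dataclass


@dataclass
class IFREstimate:
    infections: float  # seropositive population plus deaths
    ifr: float         # deaths / infections
    lower: float       # lower confidence bound
    upper: float       # upper confidence bound


def estimate_ifr(prevalence: float, prevalence_se: float,
                 population: float, deaths: float, z: float = 1.96) -> IFREstimate:
    """IFR with Delta-method bounds, treating deaths as fixed (an assumption)."""
    seropositive = prevalence * population
    # Survival-bias correction: people who died cannot appear in the survey
    infections = seropositive + deaths
    ifr = deaths / infections
    # Delta method: d(IFR)/d(prevalence) = -deaths * population / infections^2
    gradient = -deaths * population / infections ** 2
    ifr_se = abs(gradient) * prevalence_se
    return IFREstimate(infections, ifr, ifr - z * ifr_se, ifr + z * ifr_se)


# Hypothetical placeholder inputs (NOT study data)
est = estimate_ifr(prevalence=0.06, prevalence_se=0.001,
                   population=40_000_000, deaths=25_000)
print(f"infections: {est.infections:,.0f}")
print(f"IFR: {100 * est.ifr:.2f}% ({100 * est.lower:.2f}, {100 * est.upper:.2f})")
```

A fuller treatment would also propagate uncertainty in the death count and in the test sensitivity and specificity.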

Data were analysed using the statistical package R version 4.0.040.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.