Main

Most community SARS-CoV-2 PCR tests in England are processed by one of six national ‘Lighthouse’ laboratories. Among the mutations carried by the B.1.1.7 lineage, also known as variant of concern (VOC) 202012/01, is a six-nucleotide deletion that prevents the amplification of the S gene target by the commercial PCR assay that is currently used in three of the Lighthouse laboratories1. By linking individual records of positive community tests with and without S gene target failure (SGTF) to a comprehensive line list of deaths associated with COVID-19 in England, we estimate the relative hazard of death associated with infection with the B.1.1.7 variant. We define confirmed SGTF as a compatible PCR result with cycle threshold (Ct) < 30 for orf1ab, Ct < 30 for the nucleocapsid (N) gene and no detectable S (Ct > 40); confirmed non-SGTF as any compatible PCR result with Ct < 30 for each of the orf1ab, N and S genes; and an inconclusive (missing) result as any other positive test, including tests processed by a laboratory that is incapable of assessing SGTF.
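The three-way definition above can be sketched in Python. This is a minimal illustration rather than the authors' code: the function name `classify_sgtf` and the convention that an undetected or unmeasured target is passed as `None` are assumptions made for the example.

```python
from typing import Optional

CT_DETECTED_MAX = 30.0    # a target counts as clearly detected when Ct < 30
CT_UNDETECTED_MIN = 40.0  # S counts as undetectable when Ct > 40 (or no signal)


def classify_sgtf(orf1ab_ct: Optional[float],
                  n_ct: Optional[float],
                  s_ct: Optional[float]) -> str:
    """Classify one positive test as 'SGTF', 'non-SGTF' or 'inconclusive'.

    Assumption: None means the target was not detected or not measured.
    """
    # Both orf1ab and N must be clearly detected for a conclusive reading.
    if orf1ab_ct is None or n_ct is None:
        return "inconclusive"
    if not (orf1ab_ct < CT_DETECTED_MAX and n_ct < CT_DETECTED_MAX):
        return "inconclusive"
    # No detectable S gene target: S gene target failure.
    if s_ct is None or s_ct > CT_UNDETECTED_MIN:
        return "SGTF"
    # S clearly detected alongside orf1ab and N: confirmed non-SGTF.
    if s_ct < CT_DETECTED_MAX:
        return "non-SGTF"
    # S detected only weakly (Ct between 30 and 40): treated as missing.
    return "inconclusive"
```

A test from a laboratory that cannot assess SGTF would arrive with no Ct values at all and therefore falls into the inconclusive category under this sketch.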

Characteristics of the study population

The study sample (Extended Data Table 1) comprises 2,245,263 individuals who had a positive community (‘Pillar 2’) test between 1 November 2020 and 14 February 2021. Just over half of those tested (1,146,534; 51.1%) had a conclusive SGTF reading and, of these, 58.8% had SGTF. Female individuals comprised 53.6% of the total sample; 44.3% of individuals were aged 1–34 years, 34.4% aged 35–54 years, 15.1% aged 55–69 years, 4.3% aged 70–84 years and 1.9% aged 85 years or older. The majority of individuals (93.7%) lived in residential accommodation (defined as a house, flat, sheltered accommodation or house in multiple occupancy) and 3.1% lived in a care or nursing home. On the basis of self-identified ethnicity, 74.0% were white, 13.6% were Asian, 4.6% were Black and 7.8% were of other, mixed or unknown ethnicity. All seven NHS England regions are represented, with the London region contributing 22.5% of tests and the South West 5.9%. The first three weeks of the study period (1–21 November) contributed 15.5% of the total tests, and the final three weeks (24 January–14 February) 12.8%. The period between 3 and 23 January contributed 31.6% of tests.

For those samples for which SGTF status was measured, SGTF prevalence was similar in male and female individuals, but lower in the older age groups: 59.0% in 1–34-year-old individuals compared with 55.4% in those aged 85 years and older. In keeping with these age patterns, SGTF prevalence was lower in individuals living in a care or nursing home (54.3%) than those in residential accommodation (58.8%). SGTF prevalence by self-identified ethnicity was 58.0% in the white group, 57.6% in the Asian group, 69.6% in the Black group and 64.8% in the other, mixed or unknown ethnicity group. SGTF prevalence was the lowest in the most-deprived quintile of the index of multiple deprivation3 (IMD) (53.9%). The highest prevalences of SGTF over the study period were observed in the East of England (77.5%), South East (77.3%) and London (75.4%) regions, and the prevalence of SGTF was the lowest in the North East and Yorkshire region (41.2%). The prevalence of SGTF increased steeply over time (Fig. 1a), from 5.8% during 1–21 November 2020 to 94.3% during 24 January–14 February 2021.

Fig. 1: Descriptive analyses.

a, The number of samples with and without SGTF by day from 1 November 2020 to 14 February 2021, the period covered by our main analysis. b, Number of deaths within 28 days of a positive test by specimen date for all data included in the analysis. c, Kaplan–Meier plot showing survival (point estimates and 95% confidence intervals) among individuals tested in the community in England with and without SGTF, in the subset for whom SGTF was measured. The inset shows the full y-axis range. d–i, Crude death rates (point estimates and 95% confidence intervals) among SGTF versus non-SGTF cases (in the subset for whom SGTF was measured; n = 1,146,534) for deaths within 28 days of a positive test stratified by broad age groups and sex (d), residence type (e), ethnicity (f), IMD decile (g), region of NHS England (h) and specimen date (i). Horizontal bars show the overall crude death rates (point estimates and 95% confidence intervals) by age group irrespective of SGTF status.


Missing SGTF status was strongly associated with age and residence type. The proportion with SGTF status missing was similar in age groups 1–34 (48.3%), 35–54 (47.8%) and 55–69 (48.2%), and then increased to 54.4% in the 70–84 age group and to 77.7% in the 85 and older age group. SGTF status was missing in 87.9% of tests for individuals living in a care or nursing home, compared with 47.4% of tests among individuals in residential accommodation. This is due in part to more extensive use of lateral flow immunoassay tests in care homes, which do not yield an SGTF reading. Missingness in SGTF status also differed substantially between regions of NHS England, ranging from 21.2% in the North West to 71.1% in the South West, which is largely explained by proximity to a Lighthouse laboratory that is capable of producing an SGTF reading (Extended Data Fig. 1). Missingness also depended on the date of the specimen, with the percentage missing being lower for the earlier specimen dates and highest (54.4%) in the 21-day period that contributed the most tests (3–23 January). There were also minor differences in missingness depending on ethnicity and IMD. Overall, 48.9% of tests had missing SGTF status: 5.1% of all tests were inconclusive owing to high Ct values and the remaining 43.8% of all tests were not assessed for SGTF.

In total, 19,615 people in the study sample are known to have died (0.87% of 2,245,263). Crude death rates were substantially higher in the elderly and in those living in a care or nursing home (Supplementary Table 1). The standard definition of a death associated with COVID-19 in England is any death that occurred within 28 days of the first positive SARS-CoV-2 test of an individual; 17,452 of the observed deaths (89.0%) met this criterion (Fig. 1b). Among those with known SGTF status, the crude death rate associated with COVID-19 was 1.86 deaths per 10,000 person-days of follow-up in the SGTF group, versus 1.42 deaths per 10,000 person-days in the non-SGTF group (Fig. 1c and Extended Data Table 2). Stratifying by broad age groups and by sex, residence type, ethnicity, IMD, region and specimen date, death rates within 28 days of a positive SARS-CoV-2 test were higher among SGTF than non-SGTF cases in 98 of the 104 strata assessed (94%; Fig. 1d–i and Supplementary Table 2).

Cox regression analyses

To estimate the association between SGTF and mortality while controlling for observed confounding (Extended Data Fig. 2), we fitted a series of Cox proportional hazards models4 to the data. We stratified the baseline mortality hazard by lower-tier local authority (LTLA) and specimen date to control for geographical and temporal differences in the hazard—for example, due to changes in hospital pressure during the study period—and used spline terms for age and IMD and fixed effects for sex, ethnicity and residence type in the hazard model. All models were fitted twice, once using complete cases only, that is, by simply excluding individuals with missing SGTF status, and once using inverse probability weighting (IPW), that is, accounting for missingness by upweighting individuals whose characteristics—age, sex, IMD, ethnicity, residence type, NHS England region of residence and sampling week—are underrepresented among complete cases. The analysis of the complete cases assumes that whether an individual dies is independent of whether their SGTF status is observed or missing, given the individual’s other characteristics included in the survival model, whereas the IPW analysis assumes that whether an individual has SGTF is independent of whether their SGTF status is observed or missing, given the individual’s other characteristics included in the model used to derive weights for IPW5.

For the analysis of the complete cases, the estimated hazard ratio for SGTF was 1.55 (95% confidence interval, 1.39–1.72), indicating that the hazard of death in the 28 days after a positive test is 55% (39–72%) higher for SGTF than for non-SGTF cases.

To assess the model assumption of proportional hazards, we added an interaction term between SGTF and time since a positive test. There was strong evidence of non-proportionality of hazards (likelihood ratio test \(P({\chi }_{1}^{2}=11)=0.0009\)) (Fig. 2a and Extended Data Fig. 3), with the estimated time-varying hazard ratio increasing over time: 1.14 (0.92–1.40) on day 1 after the positive test, 1.58 (1.42–1.75) on day 14 and 2.24 (1.75–2.87) on day 28. Adding higher-order functions of time into the interaction terms did not significantly improve the fit of the model (likelihood ratio test \(P({\chi }_{1}^{2}=3.3)=0.07\)). We found no evidence that the effect of SGTF varied depending on age group (likelihood ratio test \(P({\chi }_{4}^{2}=5.8)=0.22\)), sex (\(P({\chi }_{1}^{2}=0.057)=0.81\)), IMD (\(P({\chi }_{9}^{2}=11)=0.31\)), ethnicity (\(P({\chi }_{3}^{2}=1.2)=0.75\)) or residence type (\(P({\chi }_{2}^{2}=0.33)=0.85\)). We note, however, that the relatively small number of deaths among 1–34-year-old individuals during the study period (44 deaths) does not permit robust assessment of the effect of SGTF in this age group. Other time-covariate interactions suggested that the time from positive test to death was slightly shorter among women, care home residents and elderly individuals; see Supplementary Note 1 for more details on models with interaction terms.
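For readers who wish to check how a likelihood ratio statistic maps to a P value, the upper tail of a χ² distribution has a closed form for one degree of freedom, erfc(√(x/2)), and for two degrees of freedom, exp(−x/2). The sketch below uses only the Python standard library; the function names are illustrative and do not come from the paper, and the inputs are the rounded statistics as printed, so agreement is only to the reported precision.

```python
import math


def lr_statistic(loglik_reduced: float, loglik_full: float) -> float:
    """Likelihood ratio statistic comparing nested models."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-squared variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_sf_2df(x: float) -> float:
    """Upper-tail probability of a chi-squared variable with 2 df."""
    return math.exp(-x / 2.0)


# Rounded statistics quoted in the text above.
print(round(chi2_sf_1df(3.3), 2))    # higher-order time terms
print(round(chi2_sf_1df(0.057), 2))  # SGTF-by-sex interaction
print(round(chi2_sf_2df(0.33), 2))   # SGTF-by-residence-type interaction
```

Tests with other degrees of freedom need the general incomplete gamma function, which is not in the standard library.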

Fig. 2: Survival analyses.

a–d, Estimated hazard ratio of death (point estimate and 95% confidence intervals) within 28 days of a positive test for the SGTF analysis for complete cases (a), SGTF analysis with IPW (b), pVOC analysis for complete cases (c) and pVOC analysis with IPW (d) in a model stratified by LTLA and specimen date and adjusted for the other covariates. e, Estimated hazard ratio of death (point estimates and 95% confidence intervals) across each model investigated. Death types are coded as follows: dX, all deaths within X days of a positive test; dNA, all deaths with no restriction on follow-up time; c28, death-certificate-confirmed deaths associated with COVID-19 within 28 days; e60, all deaths within 60 days plus all death-certificate-confirmed deaths associated with COVID-19 within any time period. S, spline term (for age or IMD); L, linear term (for age or IMD); NHSE, NHS England region (n = 7); UTLA, upper-tier local authority (n = 150); LTLA, lower-tier local authority (n = 316). LTLA start date signifies a start date chosen separately for each LTLA; Y:tstop signifies an interaction term between covariate Y and time since positive test (eth: ethnicity, res: residence type); pVOC2 signifies sequence-based misclassification adjustment (see Methods).


For IPW analysis, a model to predict missingness is required. We evaluated a series of such models, including a cauchit model, which is a robust alternative to logistic regression that is suitable for IPW5. We selected the cauchit model as it fit well and resulted in less extreme weights than other models (Extended Data Fig. 4). The IPW analysis gave similar results to the analysis of the complete cases, yielding a hazard ratio of 1.58 (1.40–1.78). Similar to the analysis of the complete cases, the IPW analysis recovered an increasing hazard ratio with time since a positive test, but the increase was less marked (Fig. 2b) and did not significantly differ from zero (Wald test \(P({\chi }_{1}^{2}=1.4)=0.23\)).

Misclassification analysis

Before the emergence of B.1.1.7, a number of minor circulating SARS-CoV-2 lineages with mutations in the S gene could also cause SGTF1. Our main analyses are restricted to specimens from 1 November 2020 onwards to avoid diluting the measured effect of B.1.1.7 on mortality due to non-B.1.1.7 lineages that cause SGTF. As an alternative approach, we undertook a misclassification analysis6, modelling the relative frequency of SGTF over time for each NHS England region as a low, time-invariant frequency of non-B.1.1.7 samples with SGTF plus a logistically growing2 frequency of B.1.1.7 samples. This allowed us to estimate the probability, pVOC, that a given SGTF sample was B.1.1.7 based on its specimen date and NHS England region (Extended Data Fig. 5). Again restricting the analysis to specimens from 1 November 2020 onward, we find a hazard ratio associated with pVOC of 1.58 (1.42–1.76) for the analysis of the complete cases and 1.61 (1.42–1.82) for the IPW analysis (Fig. 2c, d).

Absolute risks

To put these results into context, we estimated absolute mortality risks by applying hazard ratios for SGTF to the baseline risk of death among individuals tested in the community between August and October 2020 (assumed to be illustrative of the case fatality ratio associated with pre-existing variants of SARS-CoV-2) (Table 1). For the analysis of the complete cases, in women aged 70–84 years, the estimated risk of death within 28 days of a positive SARS-CoV-2 test increases from 2.9% without SGTF to 4.4% with SGTF (95% confidence interval, 4.0–4.9%) and for women 85 years or older, the risk increases from 13% to 19% (17–21%). For men aged 70–84 years, the risk of death within 28 days increases from 4.7% to 7.2% (6.4–7.9%) and for men 85 years or older, the risk increases from 17% to 25% (23–27%). Estimates based on the IPW analysis corrected for misclassification were marginally higher. These estimates reflect a substantial increase in absolute risk among older age groups, but the risk of death associated with COVID-19 after a positive test in the community remains below 1% in most individuals who are younger than 70 years old. Note that these estimates capture the fatality ratio among people tested in the community, and are thus likely to be higher than the infection fatality ratio, as many individuals with a SARS-CoV-2 infection are never tested.

Table 1 Absolute 28-day mortality risk for B.1.1.7

Further investigations

We conducted a number of sensitivity analyses to verify the robustness of our results. Our main results were largely insensitive to: restriction of the analysis to deaths caused by COVID-19 confirmed on the death certificate; any follow-up time of 21 days or longer; coarseness of geographical and temporal stratification; use of linear versus spline terms for age and IMD; analysis start date; follow-up time–covariate interactions; removal of the 10-day death registration cut-off; and restriction of the analysis to individuals with a full 28-day follow-up period (Fig. 2e). Generally, the IPW analysis yielded marginally higher hazard ratios, with greater uncertainty. As a further sensitivity analysis, we adjusted for an indicator in community testing data for whether the individual was tested because of symptoms or owing to asymptomatic screening. Although we caution that symptomatic screening status may lie on the causal pathway between SGTF status and death, we found that this adjustment had no effect on the relative hazard of SGTF (1.54 (1.39–1.71); analysis of complete cases).

Discussion

We previously found that B.1.1.7 is substantially more transmissible than pre-existing SARS-CoV-2 variants, but could not robustly identify any associated change in disease severity using population-level analysis of early data2. This analysis of individual-level data, which controls for factors that could confound the association between B.1.1.7 infection and death, reveals an increase in COVID-19 mortality associated with the B.1.1.7 lineage. We stratify our analyses by test time and geographical location—mimicking matching on these variables—to account for changes in testing rates and changing pressures on hospital services over time and by region. Our findings are consistent with earlier reports by ourselves and other groups7 and with contemporaneous studies8,9,10,11 assessing the risk of severe outcomes among individuals with B.1.1.7 infection. Notably, our study is limited to individuals tested in the community. Indicators for infection with the B.1.1.7 variant are not currently available for most people who die from COVID-19 in England, as they are tested in the hospital rather than in the community and hospitals do not routinely collect genotypic data. However, this restricted focus allows us to capture the combined effect of an altered risk of hospitalization given a positive test and an altered risk of death given hospitalization, while only the latter would be measurable in a study of hospitalized patients only. Unfortunately, we were unable to account for vaccination status in this analysis.

We do not identify any mechanism for the increased mortality here. Infections with the B.1.1.7 variant are associated with higher viral concentrations in nasopharyngeal swabs, as measured by Ct values using PCR testing (Extended Data Fig. 6). Higher viral load could therefore be partly responsible for the observed increase in mortality; this could be assessed using a mediation analysis. Alternatively, changes in test-seeking behaviour could, in principle, explain our results. If B.1.1.7-associated infections were less likely to cause symptoms, but symptomatic cases of B.1.1.7 were more severe, then our study could overestimate changes in the infection fatality rate. However, we find no clear difference in SGTF frequency among community tests relative to a random sample of SARS-CoV-2 infections in the population (Extended Data Fig. 7), which suggests that variant-associated changes in test-seeking propensity do not explain our findings.

Methods

Ethical approval

This study was approved by the Observational/Interventions Research Ethics Committee at the London School of Hygiene and Tropical Medicine (reference number 24020). Participant consent is not required for national infectious disease notification datasets in England.

Data sources

We linked three datasets provided by Public Health England: a line list of all positive tests in Pillar 2 (community) testing for SARS-CoV-2 for England, containing specimen date and demographic information on the participants; a line list of cycle threshold (Ct) values for the orf1ab, N (nucleocapsid), and S (spike) genes for positive tests that were processed in one of the three national laboratories (Alderley Park, Glasgow or Milton Keynes) using the Thermo Fisher TaqPath COVID-19 assay; and a line list of all deaths associated with COVID-19 in England, which combines and deduplicates deaths reported by hospitals in England, by the Office for National Statistics, through direct reporting from Public Health England Health Protection Teams, and through Demographic Batch Service tracing of laboratory-confirmed cases12. We linked these datasets using a numeric identifier for Pillar 2 tests (‘FINALID’) common to all three datasets. We define SGTF as any test with Ct < 30 for orf1ab and N targets but no detectable S gene, and non-SGTF as any test with Ct < 30 for orf1ab, N and S targets. A small proportion (10.4%) of SGTF tests are inconclusive. The study population of interest is defined as all individuals who received a positive Pillar 2 test between 1 November 2020 and 14 February 2021. For our main analysis, we included only tests from 1 November 2020 onwards to avoid including an excess of tests with SGTF not resulting from infection by lineage B.1.1.7. In sensitivity analyses, we also consider extending the population to include tests performed between 1 September and 31 October 2020.

Our analysis does not include individuals who first tested positive in hospital, that is, those patients who presented to the hospital after the onset of symptoms without first being tested in the community. This is because the cycle threshold values used to ascertain SGTF status are not available for individuals who were not tested in the community. Of the 57,750 deaths associated with COVID-19 in England during the study period, 17,642 deaths (30.5%) can be linked to a positive Pillar 2 test; among these, 4,945 have non-missing SGTF status. So, although our study includes 1,146,534 Pillar 2 tests with non-missing SGTF status, which represents 51.1% of the 2,245,263 Pillar 2 tests over this period and 41.9% of the 2,736,806 combined Pillar 1 (hospital) and Pillar 2 (community) SARS-CoV-2 tests over this period, we can only assess SGTF status for 8.6% (4,945/57,750) of the individuals who died from COVID-19 over the study period. This is explained by differing mortality rates among individuals who first test positive in a hospital compared to those who are tested in the community, as the former group are much more likely to have a severe illness, as well as by missingness in the SGTF data.

There was a small amount of missing data for sex (n = 14, <0.01%), age (n = 171, <0.01%), and IMD and regional covariates (n = 3,817, 0.16%). There were no missing specimen dates. Individuals with missing age, sex or geographical location were excluded. We also excluded individuals from the dataset whose age was recorded as zero, as there were 17,913 age-0 individuals compared to 10,132 age-1 individuals in the dataset, suggesting that many of these age-0 individuals may have been miscoded. There were some missing data on ethnicity (n = 47,491, 2.1%) and we created a category that combines missing values with ‘other’ and ‘mixed’. Missing values for residence type (n = 63,905, 2.8%) were also combined with an ‘other’ category. The full dataset used for the main analysis comprises 2,245,263 individuals, with SGTF status missing or inconclusive for 1,098,729 (48.9%). Missing data on the exposure is addressed in the analysis, described below.

We grouped residence types into three categories: residential, which included the ‘residential dwelling (including houses, flats and sheltered accommodation)’ and ‘house in multiple occupancy’ groups; care or nursing home; and other or unknown, which included the ‘medical facilities (including hospitals and hospices, and mental health)’, ‘no fixed abode’, ‘other property classifications’, ‘overseas address’, ‘prisons, detention centres and secure units’, ‘residential institution (including residential education)’ and ‘undetermined’ groups, as well as an unspecified residence type. We grouped ethnicities into four categories according to the broad categories used in the 2011 UK Census: Asian, which included the ‘Bangladeshi (Asian or Asian British)’, ‘Chinese (other ethnic group)’, ‘Indian (Asian or Asian British)’, ‘Pakistani (Asian or Asian British)’ and ‘any other Asian background’ groups; Black, which included the ‘African (Black or Black British)’, ‘Caribbean (Black or Black British)’ and ‘any other Black background’ groups; white, which included the ‘British (white)’, ‘Irish (white)’ and ‘any other white background’ groups; and ‘other, mixed or unknown’, which included the ‘any other ethnic group’, ‘white and Asian (mixed)’, ‘white and Black African (mixed)’, ‘white and Black Caribbean (mixed)’, ‘any other mixed background’ and ‘Unknown’ groups.

Statistical methods

There are several factors that we expect are associated with both SGTF and with risk of death, thus confounding the association between SGTF and risk of death in those individuals who were tested. Area of residence and specimen date were expected to be potentially strong confounding factors. Area of residence is expected to be strongly associated with SGTF status due to different virus variants circulating in different areas, and specimen date because the prevalence of SGTF is known to have greatly increased over time. Area of residence and specimen date are also expected to be associated with risk of death after a positive test, including due to differences associated with differential pressure on hospital resources by area and time. The following variables were also identified as potential confounding factors: sex, age, residence type (residential, care or nursing home, or other or unknown), ethnicity (white, Asian, Black or other, mixed or unknown) and IMD. The potential confounding factors are referred to collectively as the covariates. For descriptive analyses, age (in years) was categorized as 1–34, 35–54, 55–69, 70–84, or 85 and older.

Descriptive analyses were performed. We tabulated the distribution of the covariates in the whole study sample, the association between each covariate and SGTF status in the subset for whom SGTF was measured, and the association between each covariate and missing data in SGTF status (Extended Data Table 1). The subset for whom SGTF status was measured are referred to as the complete cases. The unadjusted association between SGTF and mortality in the complete cases was assessed using a Kaplan–Meier plot (Fig. 1c), and Kaplan–Meier plots and crude 28-day mortality rates are also presented separately according to the categories of the covariates (Extended Data Table 2 and Extended Data Fig. 2). Crude overall mortality rates (that is, not restricted to 28 days after a positive test) were obtained for the whole sample, by SGTF status in the complete cases, and in those with missing SGTF status, according to the categories of each covariate (Supplementary Table 1). We also obtained mortality rates by SGTF status (in the complete cases) for the categories of each covariate stratified by age group (Fig. 1d–i). Exact Poisson confidence intervals are used for mortality rates, assuming constant rates.
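The crude rates and exact Poisson intervals described above can be sketched as follows. This is an illustrative standard-library implementation of the usual exact (Garwood) interval, found by bisection on the Poisson distribution function; it is not the authors' code, the function names are hypothetical, and the example counts in the test are invented for illustration only.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), with terms computed on the log scale."""
    if k < 0:
        return 0.0
    log_lam = math.log(lam)
    total = sum(math.exp(i * log_lam - lam - math.lgamma(i + 1))
                for i in range(k + 1))
    return min(1.0, total)


def _bisect(func, lo, hi, target, increasing, iters=200):
    """Solve func(x) = target for a monotone func on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (func(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exact_poisson_ci(deaths: int, alpha: float = 0.05):
    """Exact two-sided interval for the Poisson mean given an observed count."""
    bracket = deaths + 10.0 * math.sqrt(deaths + 1.0) + 20.0
    if deaths == 0:
        lower = 0.0
    else:
        # Lower limit: P(X >= deaths | lam) = alpha / 2.
        lower = _bisect(lambda lam: 1.0 - poisson_cdf(deaths - 1, lam),
                        0.0, bracket, alpha / 2.0, increasing=True)
    # Upper limit: P(X <= deaths | lam) = alpha / 2.
    upper = _bisect(lambda lam: poisson_cdf(deaths, lam),
                    0.0, bracket, alpha / 2.0, increasing=False)
    return lower, upper


def crude_rate_per_10k(deaths: int, person_days: float):
    """Deaths per 10,000 person-days with an exact 95% interval."""
    lower, upper = exact_poisson_ci(deaths)
    scale = 10_000.0 / person_days
    return deaths * scale, lower * scale, upper * scale
```

The interval treats person-time as fixed and the rate as constant over follow-up, matching the constant-rate assumption stated above.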

Approximately 49% of individuals in the study sample are missing data on SGTF status, due to their test not having been processed at one of the three laboratories using the Thermo Fisher TaqPath COVID-19 assay or the test being inconclusive. We performed analysis on the complete cases, restricted to the subset for whom SGTF status was measured and conclusive. This analysis of complete cases assumes that for each analysis, the missing data (in this case, missing SGTF status) are independent of the outcome of interest given the variables included in the models. This is a specific type of ‘missing not at random’ assumption, as in particular missingness is allowed to depend on the underlying value of SGTF. We also performed an analysis of the complete cases using inverse probability weights5 (IPW) to address the missing data on SGTF, under a ‘missing at random’ assumption. In the analysis, each individual with SGTF status measured is weighted by the inverse of their probability of having SGTF status measured based on their covariates. For the IPW, the missingness model estimated the probability of missingness using logistic regression with age (restricted cubic spline), sex, IMD decile (restricted cubic spline), ethnicity, residence type by asymptomatic screening indicator and NHS region by specimen week as predictors. We also considered a cauchit and a Gosset link for the missingness model, including the same predictors, as this was expected to provide better stability for the weights5. The fit of the missingness model was assessed using a QQ plot (Extended Data Fig. 4), and Hosmer–Lemeshow and Hinkley tests were used to choose the most appropriate model.
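To illustrate why the choice of link function matters for the stability of the weights, the sketch below converts a linear predictor for the probability that SGTF status is observed into an inverse probability weight under a logit and a cauchit link. The function names are hypothetical and the linear predictor values are arbitrary; in a fitted model the coefficients would differ between links, so this only shows the heavier tails of the cauchit link, not the authors' fitted weights.

```python
import math


def inv_logit(eta: float) -> float:
    """Inverse logit link: logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_cauchit(eta: float) -> float:
    """Inverse cauchit link: standard Cauchy distribution function."""
    return 0.5 + math.atan(eta) / math.pi


def ipw_weight(eta_observed: float, inverse_link) -> float:
    """Weight for a complete case: 1 / P(SGTF status observed | covariates).

    eta_observed is the linear predictor for being observed. If the model
    is instead fitted for missingness, use 1 / (1 - P(missing)).
    """
    p_observed = inverse_link(eta_observed)
    return 1.0 / p_observed


# At the same extreme linear predictor the cauchit probability stays
# further from zero, so the resulting weight is smaller.
for eta in (0.0, -2.0, -4.0):
    print(eta, round(ipw_weight(eta, inv_logit), 1),
          round(ipw_weight(eta, inv_cauchit), 1))
```

Only individuals with measured SGTF status receive a weight; those with missing status drop out of the weighted analysis.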

Cox regression4 was used to estimate the association between SGTF and the hazard of mortality, conditioning on the potential confounding factors listed above. The analyses described here were applied to the complete cases and using IPW. For IPW analyses, the standard errors accounted for the weights, although the fact that the weights were estimated was not accounted for; this results in conservative standard errors. The baseline hazard in the Cox model was stratified by both specimen date and LTLA, therefore finely controlling for these variables. The stratification gives a large number of strata matched by specimen date and LTLA. Only those strata that contain individuals who died and individuals who survived contribute to the analysis. The analysis is therefore similar to the analysis that would be performed had we created a matched nested case–control sample. The remaining variables were included as covariates in the model (sex, age, residence type, ethnicity and IMD decile). Age was included as a restricted cubic spline with five knots, and IMD decile was included as a restricted cubic spline with three knots. The time origin for the analysis was specimen date and we considered deaths up to 28 days after the specimen date for the main analyses. Individuals who did not die within 28 days were censored at the earlier of 28 days after the specimen date and the administrative censoring date, which we chose as the date of the most recent death linkable to SGTF status minus 10 days (that is, 14 February 2021) to minimize any potential bias due to late reporting of deaths. We began by assuming the proportionality of the hazards for SGTF and the covariates included in the model. The assumption of proportional hazards was assessed by including in the model an interaction between each covariate and time, which was performed separately for SGTF and for each other covariate. Schoenfeld residual plots were also obtained for each covariate (Extended Data Fig. 3). 
We assessed whether the association between SGTF and the hazard was modified by age, sex, IMD, ethnicity and residence type. Models with and without interactions were compared using likelihood ratio tests for the analyses of the complete cases. For the analysis using IPW, we used Wald tests based on robust standard errors13.
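In standard textbook notation (not taken from the paper, and ignoring ties and IPW weights), the stratified Cox partial likelihood that underlies the model described above can be written as

$$L(\beta )=\prod _{s}\,\prod _{i\in {D}_{s}}\frac{\exp ({x}_{i}^{{\rm{\top }}}\beta )}{{\sum }_{j\in {R}_{s}({t}_{i})}\exp ({x}_{j}^{{\rm{\top }}}\beta )}$$

where s indexes strata (here, combinations of LTLA and specimen date), D_s is the set of individuals in stratum s who died during follow-up, R_s(t_i) is the set of individuals in stratum s still at risk at the death time t_i of individual i, x_i is the covariate vector (including SGTF) and β is the vector of log hazard ratios. A stratum with no deaths contributes an empty product, and a death whose risk set contains only the individual who died contributes a factor of 1, so neither carries information about β. The baseline hazard never appears, which is why it can differ freely between strata.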

The analysis assumes that censoring is uninformative, which is plausible as all censoring is administrative.
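The follow-up rule described above, with deaths counted up to 28 days and censoring at the earlier of day 28 and the administrative censoring date, can be sketched as below. This is an illustrative reading of the text, not the authors' code: the function name is hypothetical, and treating a death on exactly day 28 as an event is an assumption.

```python
from datetime import date
from typing import Optional, Tuple

FOLLOW_UP_DAYS = 28
ADMIN_CENSOR_DATE = date(2021, 2, 14)  # administrative censoring date in the text


def follow_up(specimen_date: date,
              death_date: Optional[date] = None) -> Tuple[int, int]:
    """Return (time_in_days, event) with the specimen date as time origin.

    event is 1 for a death within the follow-up window and 0 for censoring.
    """
    # Follow-up ends at day 28 or at administrative censoring, if earlier.
    end_day = min(FOLLOW_UP_DAYS, (ADMIN_CENSOR_DATE - specimen_date).days)
    if death_date is not None:
        days_to_death = (death_date - specimen_date).days
        if days_to_death <= end_day:
            return days_to_death, 1
    # Survivors, and deaths after the window closes, are censored.
    return end_day, 0
```

An individual tested on 1 February 2021 can therefore contribute at most 13 days of follow-up, whereas one tested on 1 November 2020 contributes the full 28 days.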

Misclassification analysis

The exposure of SGTF is subject to misclassification, because a number of minor circulating SARS-CoV-2 lineages in addition to B.1.1.7 are also associated with failure to amplify the S gene target. Accordingly, a positive test with SGTF is not necessarily indicative of infection with B.1.1.7. A negative test of SGTF is assumed to be indicative of an absence of infection with B.1.1.7. Misclassification of an exposure can result in bias in its estimated association with the outcome. We fitted a logistic model to Pillar 2 SGTF frequencies by NHS region to estimate a ‘background’ rate of SGTF in the absence of B.1.1.7, assuming a beta-binomial likelihood. This model is then used to estimate the probability that an individual testing positive for SGTF is infected with B.1.1.7, separately for individuals in each NHS region. These probabilities can then be used in place of the indicator of SGTF exposure in the Cox models. This is the regression calibration approach6 to correcting for bias due to measurement error in an exposure.

We fitted models accounting for false-positive results (modelled as regionally varying background rates of SGTF associated with non-B.1.1.7 variants) to the SGTF data. Our logistic model for B.1.1.7 growth over time is as follows:

$$\mathrm{logit}(f(t))=\mathrm{slope}\times (t-\mathrm{intercept})$$
$$s(t)=f(t)+(1-f(t))\times \mathrm{FP}$$
$${k}_{t}\sim \mathrm{betaBinomial}\left(n={n}_{t},\ \alpha =s(t)\times (\mathrm{conc}-2)+1,\ \beta =(1-s(t))\times (\mathrm{conc}-2)+1\right)$$
$$\mathrm{slope}\sim \mathrm{normal}(\mu =0,\sigma =1)$$
$$\mathrm{intercept}\sim \mathrm{normal}(\mu =0,\sigma =1{,}000)$$
$$\mathrm{FP}\sim \mathrm{beta}(\alpha =1.5,\beta =15)$$
$$\mathrm{conc}\sim \mathrm{normal}(\mu =0,\sigma =500),\quad \mathrm{conc}\ge 2$$

where f(t) is the predicted frequency of B.1.1.7 among positive tests at time t (in days since 1 September 2020) based on the terms slope and intercept; s(t) is the predicted frequency of SGTF at time t due to the combination of B.1.1.7 and a background false-positive rate (FP) among non-B.1.1.7 variants; conc is the ‘concentration’ parameter (conc = α + β) of a beta distribution with mode s(t); \({k}_{t}\) is the number of SGTFs detected at time t; \({n}_{t}\) is the total number of tests at time t; and the tilde (~) signifies ‘distributed as’. All priors above are chosen to be vague, and the truncation of the concentration parameter to values greater than 2 ensures a unimodal distribution for the proportion of tests that are SGTF. The model above is fitted separately for each NHS England region. Then, pVOC for a test with SGTF = 1 at time t is equal to f(t)/s(t), and pVOC = 0 for all tests with SGTF = 0. The model was fitted using Markov chain Monte Carlo with 10,000 iterations of burn-in and 5,000 iterations of sampling.
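Given values for slope, intercept and FP, the quantities f(t), s(t) and pVOC defined above follow directly. The sketch below evaluates them for a single region; it does not perform the Markov chain Monte Carlo fit, the function names are hypothetical, and the parameter values used in the usage lines are arbitrary illustrations rather than fitted estimates.

```python
import math


def b117_frequency(t: float, slope: float, intercept: float) -> float:
    """f(t): logistic growth, logit(f(t)) = slope * (t - intercept)."""
    return 1.0 / (1.0 + math.exp(-slope * (t - intercept)))


def sgtf_frequency(t: float, slope: float, intercept: float, fp: float) -> float:
    """s(t): B.1.1.7 plus background SGTF among non-B.1.1.7 samples."""
    f = b117_frequency(t, slope, intercept)
    return f + (1.0 - f) * fp


def p_voc(t: float, slope: float, intercept: float, fp: float,
          sgtf: bool) -> float:
    """Probability that a sample is B.1.1.7: f(t)/s(t) if SGTF, else 0."""
    if not sgtf:
        return 0.0
    f = b117_frequency(t, slope, intercept)
    return f / sgtf_frequency(t, slope, intercept, fp)


def beta_parameters(s: float, conc: float):
    """Beta shape parameters with mode s and concentration conc = alpha + beta."""
    alpha = s * (conc - 2.0) + 1.0
    beta = (1.0 - s) * (conc - 2.0) + 1.0
    return alpha, beta


# Arbitrary illustrative values: at t = intercept, f(t) = 0.5.
print(p_voc(t=100.0, slope=0.1, intercept=100.0, fp=0.1, sgtf=True))
print(beta_parameters(s=0.55, conc=12.0))
```

For a beta distribution with both shape parameters above 1, the mode is (α − 1)/(α + β − 2), which reduces to s(t) under this parameterization and explains the role of the conc ≥ 2 truncation.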

The model above was fitted using the same data source (that is, SGTF frequencies among Pillar 2 community tests for SARS-CoV-2) as our survival analysis. To verify the robustness of this model, we performed a sensitivity analysis in which pVOC was instead estimated from sequencing data from the COVID-19 UK Genomics Consortium14 (https://www.cogconsortium.uk/), downloaded from the Microreact platform15 (https://microreact.org/) on 11 January 2021. In this alternative analysis, we estimated pVOC for each NHS England region and date as the number of samples that were VOC 202012/01 (that is, lineage B.1.1.7 with mutations ∆69/∆70 and N501Y in spike) divided by the number of samples that were SGTF (that is, any lineage with ∆69/∆70, the deletion that causes SGTF) for that NHS England region and date. We set pVOC = 1 for all dates later than 31 December 2020, as no sequencing data were available past this date, and filled any gaps in the data using linear interpolation. This yielded nearly identical results in our survival analysis compared with the analysis that uses the modelled pVOC described above (Fig. 2e).
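The sequencing-based estimate of pVOC can be sketched as follows for a single NHS England region. This is a minimal illustration under stated assumptions: the function name, the input layout (a mapping from date to a pair of counts) and the handling of gaps that are not bracketed by two observed dates (left unfilled here) are hypothetical choices that the text above does not specify.

```python
from datetime import date, timedelta

# Last date with sequencing data; pVOC is set to 1 after this date.
CUTOFF = date(2020, 12, 31)


def sequencing_p_voc(counts: dict, start: date, end: date) -> dict:
    """Daily pVOC for one region from sequencing counts.

    counts maps a date to (n_voc, n_sgtf): the number of VOC 202012/01
    samples and the number of samples carrying the SGTF-causing deletion.
    """
    # Observed ratio on dates that have at least one SGTF sample.
    observed = {
        d: n_voc / n_sgtf
        for d, (n_voc, n_sgtf) in counts.items()
        if n_sgtf > 0 and d <= CUTOFF
    }
    known = sorted(observed)
    result = {}
    day = start
    while day <= end:
        if day > CUTOFF:
            result[day] = 1.0
        elif day in observed:
            result[day] = observed[day]
        else:
            before = [d for d in known if d < day]
            after = [d for d in known if d > day]
            if before and after:
                # Linear interpolation between the nearest observed dates.
                d0, d1 = before[-1], after[0]
                w = (day - d0).days / (d1 - d0).days
                result[day] = observed[d0] + w * (observed[d1] - observed[d0])
            # Assumption: gaps without an observation on both sides stay unfilled.
        day += timedelta(days=1)
    return result


# Hypothetical counts, for illustration only.
toy = {date(2020, 12, 1): (2, 4), date(2020, 12, 3): (4, 4)}
series = sequencing_p_voc(toy, date(2020, 12, 1), date(2021, 1, 2))
```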

Absolute risks

The final Cox models were used to estimate the absolute risk of death within 28 days of a positive test, for SGTF and for pVOC. Given the strong influence of age on risk of death, we present absolute risks by sex and age group (in years; 1–34, 35–54, 55–69, 70–84, 85 and older). Absolute risks of death (case fatality rate) within 28 days were estimated by age group and sex using data on individuals tested during August–October 2020; this is referred to as the baseline risk. The absolute risks of death for individuals with SGTF were then estimated as follows. If the baseline absolute risk of death in a given age group is 1 − A (so that A is the baseline probability of surviving 28 days), then the estimated absolute risk of death with SGTF is 1 − A^HR (that is, A raised to the power HR), where HR denotes the estimated hazard ratio obtained from the Cox model assuming proportional hazards. Standard errors were obtained via the delta method, and confidence intervals were based on normal approximations.
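The calculation can be illustrated with a short Python sketch. It makes two assumptions that the text above does not state: the baseline risk is treated as a fixed, known quantity, and the delta method is applied on the log hazard ratio scale, so that only uncertainty in the hazard ratio is propagated. The function name and the input values in the usage line are hypothetical.

```python
import math


def absolute_risk(baseline_risk: float, log_hr: float, se_log_hr: float,
                  z: float = 1.96) -> tuple[float, float, float]:
    """Absolute 28-day risk of death and a normal-approximation interval.

    baseline_risk is 1 - A, with 0 < baseline_risk < 1; log_hr and
    se_log_hr are the log hazard ratio and its standard error.
    Assumption: the baseline risk carries no uncertainty.
    """
    a = 1.0 - baseline_risk          # baseline survival probability A
    hr = math.exp(log_hr)
    risk = 1.0 - a ** hr             # 1 - A^HR under proportional hazards
    # Delta method: d(1 - A^HR) / d(log HR) = -A^HR * ln(A) * HR
    gradient = -(a ** hr) * math.log(a) * hr
    se = abs(gradient) * se_log_hr
    return risk, risk - z * se, risk + z * se


# Hypothetical inputs, for illustration only.
risk, lower, upper = absolute_risk(baseline_risk=0.01, log_hr=math.log(2.0), se_log_hr=0.1)
```

When the baseline risk is small, 1 − A^HR is close to HR × (1 − A), so the absolute risk scales almost in proportion to the hazard ratio.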

Sensitivity analyses

Several sensitivity analyses were performed. After establishing the final model using the process outlined above, we investigated the effect of using different variables for the stratification of the baseline hazard: region measured at a coarser level (the upper-tier local authority or NHS England region) and test specimen time measured at a coarser level (week rather than exact date). We also explored adjusting for these variables instead of stratifying by them. Finally, we repeated the main analysis restricting data to specimens collected from September onwards, October onwards, November onwards or December onwards.

To assess the effect of imposing an administrative cut-off to follow-up time of 10 days before data extraction, we reanalysed the data without this cut-off and, separately, restricted the analysis to individuals with a follow-up of at least 28 days.

Finally, we adjusted for symptomatic status associated with the test (asymptomatic versus symptomatic), which relates to whether the test was given for asymptomatic screening purposes or on the basis of a request by a (presumed symptomatic) individual, as only symptomatic individuals may request a community SARS-CoV-2 test in England.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.