Main

The SARS-CoV-2 lineage B.1.1.7 spread rapidly across England between November 2020 and January 2021. This variant possesses a large number of non-synonymous substitutions of immunological importance2. The N501Y replacement on the spike protein has been shown to increase ACE2 binding3,4 and cell infectivity in animal models5, and the P681H replacement on the spike protein adjoins the furin-cleavage site6. B.1.1.7 also possesses a deletion at positions 69 and 70 of the spike protein (Δ69–70) that has been associated with failure of diagnostic tests using the ThermoFisher TaqPath probe, which targets the spike protein7. Although other variants with Δ69–70 are also circulating in the UK, the absence of detection of the S gene target in an otherwise positive PCR test appears to be a highly specific biomarker for the B.1.1.7 lineage. Data from national community testing in November 2020 showed a rapid increase in S gene target failure (SGTF) during PCR testing for SARS-CoV-2, coinciding with a rapid increase in the frequency of B.1.1.7 observed in genomic surveillance. The B.1.1.7 lineage was designated Variant of Concern (VOC) 202012/01 by Public Health England (PHE) in December 2020.

Phylogenetic studies carried out by the UK COVID-19 Genomics Consortium (COG-UK) (https://www.cogconsortium.uk)8 provided the first indication that B.1.1.7 has an unusual accumulation of substitutions and was growing at a higher rate than other circulating lineages. We investigated time trends in the frequency of sampling VOC genomes and the proportion of PCR tests exhibiting SGTF across the UK, which we calibrated as a biomarker of VOC infection. Using multiple approaches and both genetic and SGTF data, we conclude that B.1.1.7 is associated with a higher reproduction number (R) than previous non-VOC lineages.

We examined whole-genome SARS-CoV-2 sequences from randomly sampled residual materials obtained from community-based COVID-19 testing in England, collected between 1 October 2020 and 16 January 2021. These data included 31,390 B.1.1.7 sequences for which the time and location of sample collection were known. Over the same period, 52,795 non-VOC genomes were collected. VOC sequences were initially concentrated in London (n = 9,134), the South East (n = 5,609), and the East of England (n = 4,413), but are now widely distributed across England. Overall, we estimate the median posterior additive difference in growth rates between B.1.1.7 and co-circulating variants to be 0.69 per week (95% credible interval (CrI) 0.61–0.76) (Fig. 1a, Extended Data Fig. 1, Supplementary Methods section 2), and this difference was largest in November. However, in tandem with geographic expansion of the VOC and imposition of lockdown measures in 2021, this difference declined gradually to 0.43 per week (95% CrI 0.33–0.52) for the week ending 16 January.
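As a rough illustration of what an additive difference in growth rates implies for variant frequency: if two lineages each grow exponentially, the log-odds of the VOC frequency change linearly in time, with slope equal to the growth rate difference. The Python sketch below applies this simplified relationship; it is not the Bayesian regression described in Supplementary Methods section 2. The starting frequency of 1% and all function names are assumptions chosen for illustration, and the growth rate difference is held constant even though the text reports that it declined over time.

```python
import math


def logit(p):
    """Log-odds of a proportion p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of logit: map log-odds back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def voc_frequency(p0, delta_r, weeks):
    """VOC frequency after `weeks`, assuming a constant additive
    growth-rate difference `delta_r` (per week) and initial frequency p0."""
    return inv_logit(logit(p0) + delta_r * weeks)


def weeks_to_reach(p0, p_target, delta_r):
    """Weeks needed for the frequency to rise from p0 to p_target."""
    return (logit(p_target) - logit(p0)) / delta_r


DELTA_R = 0.69  # per week; median estimate quoted in the text
P0 = 0.01       # hypothetical starting frequency (assumption)

weeks_to_half = weeks_to_reach(P0, 0.5, DELTA_R)
print(f"Weeks from {P0:.0%} to 50% frequency: {weeks_to_half:.1f}")
```

Under these assumptions the frequency rises from 1% to 50% in a little under seven weeks; a smaller growth rate difference lengthens this time proportionally.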

Fig. 1: Expansion of lineage B.1.1.7 relative to co-circulating lineages in England.

a, Estimated frequency of sampling the VOC (lines) over time in NHS regions (n = 84,185). Shaded regions, 95% credible region based on Bayesian regression; points, empirical proportions of the VOC in each week; error bars, 95% CI based on binomial sampling error. b, Effective population size over time for lineage B.1.1.7 and estimates based on a matched sample of the most abundant co-circulating lineage, B.1.177 (n = 3,000). Shaded regions, 95% bootstrap CI. c, The effective reproduction number inferred from growth of effective population size for both lineages in b.

The rate of genetic diversification of the VOC lineage over time allows epidemic growth rates to be estimated using phylodynamic modelling9,10. To contrast VOC and non-VOC growth patterns, we randomly sampled 3,000 VOC sequences paired with up to 3,000 non-VOC sequences and matched by week of sample collection and location (Supplementary Methods section 1). Phylodynamic modelling (Supplementary Methods section 3) of the effective population sizes of B.1.1.7 and the previously dominant non-VOC B.1.177 lineage11 gave an estimated growth rate difference of 0.33 per week (95% confidence interval (CI) 0.09–0.62), and further indicated that the VOC overtook the B.1.177 lineage on 10 December (Fig. 1b), close to the date at which VOC sampling frequency exceeded 50% in England (3 December). Thus, we estimate that B.1.1.7 reached 50% frequency within 2.5 to 3 months after its emergence in England.

We estimated the ratio of VOC to non-VOC reproduction numbers using a renewal-equation-based approach (Fig. 1c, Extended Data Fig. 2, Supplementary Methods section 4). This estimator depends on the absolute growth rate of the non-VOC, estimated using the phylodynamic model. We estimate the ratio of reproduction numbers between 25 October 2020 and 16 January 2021 to be 1.89 (95% CrI 1.43–2.65), assuming a gamma-distributed generation time with mean 6.4 days and coefficient of variation of 0.66 (ref. 12). This ratio is sensitive to the assumption that the generation time distribution is identical between variants. However, even if the VOC generation time is half that of previous variants, the estimated ratio of reproduction numbers was still 1.53 (95% CrI 1.27–1.79). The ratio trended downwards over time, coinciding with the increasing frequency of the VOC. By mid-January, the ratio had fallen from 1.89 to 1.54 (95% CrI 1.34–1.82) (Extended Data Fig. 2).
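The dependence of this ratio on the generation time can be sketched with the standard relationship between an exponential growth rate r and the reproduction number for a gamma-distributed generation time, R = (1 + r × scale)^shape. The Python example below is a minimal illustration of that relationship, not the estimator of Supplementary Methods section 4. The non-VOC growth rate of −0.10 per week is a hypothetical value, the 0.33 per week difference is the phylodynamic estimate quoted above, and the function name is an assumption.

```python
def growth_rate_to_R(r_per_day, mean_gt_days=6.4, cv=0.66):
    """Reproduction number implied by exponential growth rate r (per day)
    for a gamma-distributed generation time with the given mean and
    coefficient of variation: R = (1 + r * scale) ** shape."""
    shape = 1.0 / cv ** 2            # gamma shape parameter
    scale = mean_gt_days * cv ** 2   # gamma scale parameter (days)
    base = 1.0 + r_per_day * scale
    if base <= 0.0:
        raise ValueError("growth rate too negative for this generation time")
    return base ** shape


NON_VOC_GROWTH_PER_WEEK = -0.10  # hypothetical value (assumption)
DELTA_R_PER_WEEK = 0.33          # growth rate difference quoted in the text

# Convert weekly growth rates to daily rates to match the generation time.
r_non = NON_VOC_GROWTH_PER_WEEK / 7.0
r_voc = (NON_VOC_GROWTH_PER_WEEK + DELTA_R_PER_WEEK) / 7.0

# Same generation time for both lineages.
ratio_same_gt = growth_rate_to_R(r_voc) / growth_rate_to_R(r_non)

# VOC generation time halved (mean 3.2 days), non-VOC unchanged.
ratio_half_gt = growth_rate_to_R(r_voc, mean_gt_days=3.2) / growth_rate_to_R(r_non)

print(f"R ratio, identical generation times: {ratio_same_gt:.2f}")
print(f"R ratio, VOC generation time halved: {ratio_half_gt:.2f}")
```

With a growing VOC, a shorter assumed VOC generation time yields a smaller implied ratio of reproduction numbers for the same observed growth rates, which is why the estimate is sensitive to this assumption.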

Trends in SGTF attributed to the VOC

Infection with the VOC lineage results in a diagnostic failure on the S gene target in an otherwise positive PCR test using the ThermoFisher TaqPath assay, which is widely used for SARS-CoV-2 community PCR testing in the UK. Consequently, we gained a more detailed picture of the spatial and demographic spread of B.1.1.7 by using the much more abundant diagnostic data with SGTF than by using whole-genome sequencing only. Several SARS-CoV-2 variants can result in SGTF, but since mid-November 2020, more than 97% of PCR tests with SGTF were due to the B.1.1.7 lineage1. Approximately 35% of positive test results in UK community PCR testing use the TaqPath assay, and so provide S gene target results. Before mid-November 2020, SGTF frequency among PCR positives was a poor proxy for VOC frequency. We therefore developed a spatiotemporal model to predict the proportion of SGTF cases attributable to the VOC by area and week (Supplementary Methods section 5), here termed the true positive proportion (TPP). False positives were attributed to the S-gene-positive case (S+) category. We found that the estimated effective population size for B.1.1.7 was highly correlated with TPP-adjusted S− counts (Extended Data Fig. 3).

Figure 2a–c (and Supplementary Data 1, Extended Data Fig. 4) shows the spatiotemporal trends of SGTF cases (S−), S+ and total PCR-positive cases by National Health Service (NHS) England Sustainability and Transformation Plan (STP) areas (a geographical subdivision of NHS regions). Visually, it is clear that during the second England lockdown, when schools were open, S+ case numbers decreased but S− case numbers increased. However, during the third lockdown, when schools were closed, the incidence of both S− and S+ cases declined.

Fig. 2: Trends of diagnosed cases and SGTF over time and between regions, and reproduction numbers of the VOC inferred from SGTF.

a–c, The number of diagnosed cases over time for three English STP regions that represent a wide spectrum of outcomes in terms of time of VOC introduction into the region. Each line segment is shaded with the frequency of SGTF in each week (scale at top). Vertical shaded regions represent the times of the second and third UK lockdowns. d, The estimated (Bayesian posterior) multiplicative transmission advantage of the VOC over time inferred from STP-level SGTF count data. Shaded regions, 95% CrI. e, The reproduction number of S-gene-negative cases versus the reproduction number of S-gene-positive cases over time and among STP regions for epidemiological weeks 45–55 (1 November 2020 to 16 January 2021).

We jointly estimated weekly effective reproduction number (Rt) values for the VOC and non-VOC in each of the 42 STP areas using a semi-mechanistic epidemiological model13 (Supplementary Methods section 6). The model parametrizes VOC Rt as a multiple of non-VOC Rt, and was fitted to case numbers obtained by multiplying overall PHE case numbers by TPP-corrected SGTF frequencies. We estimated Rt for epidemiological weeks 45–55 (1 November 2020 to 16 January 2021) (Fig. 2d), as before November there were insufficient VOC cases to reliably estimate VOC reproduction numbers across England. VOC Rt was greater than non-VOC Rt for all STP–week pairs (points above the diagonal in Fig. 2e). The estimated mean ratio of Rt for the VOC and non-VOC strains was 1.79 (95% CI 1.22–2.49) over weeks 45–55. As in the phylodynamic analysis, the multiplicative advantage in Rt for the VOC declined over the time window examined, to approximately 1.5 in week 55 (Fig. 2d).
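The construction of the input case series can be illustrated with a small worked example. The Python sketch below splits a total case count into VOC and non-VOC cases using an SGTF frequency and a TPP, with SGTF false positives assigned to the non-VOC (S+) category as described above. The function name and all input values are hypothetical, and the sketch covers only the input arithmetic, not the semi-mechanistic model itself.

```python
def split_cases(total_cases, sgtf_freq, tpp):
    """Split total cases into (VOC, non-VOC) counts.

    total_cases: all PCR-positive cases in an area and week
    sgtf_freq:   fraction of S-gene-typed positives showing SGTF
    tpp:         true positive proportion, the fraction of SGTF
                 cases attributable to the VOC
    SGTF cases not attributed to the VOC are counted as non-VOC.
    """
    if not (0.0 <= sgtf_freq <= 1.0 and 0.0 <= tpp <= 1.0):
        raise ValueError("sgtf_freq and tpp must lie in [0, 1]")
    voc = total_cases * sgtf_freq * tpp
    non_voc = total_cases - voc
    return voc, non_voc


# Hypothetical area-week: 1,000 cases, 40% SGTF, TPP of 0.97.
voc_cases, non_voc_cases = split_cases(1000, 0.40, 0.97)
print(f"VOC cases: {voc_cases:.0f}, non-VOC cases: {non_voc_cases:.0f}")
```

When the TPP is low, as it was before mid-November 2020, most SGTF cases are reassigned to the non-VOC series, which is why the raw SGTF frequency alone was a poor proxy in that period.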

The greater Rt estimates of the VOC, even where Rt of non-VOC variants was below 1, indicate that B.1.1.7 has a transmission advantage, and that the observed frequency trends cannot be explained solely by a reduction in the mean generation time. We repeated the joint estimation of VOC and non-VOC Rt with the assumption of a 25% reduction in the mean generation time of the VOC (Extended Data Fig. 5), and this estimated the mean ratio of Rt to be 1.60 (95% CI 1.09–2.23) over weeks 45–55. Incorporating a shorter generation time for the VOC into the model reduced, but did not eliminate, the decreasing trend in transmission advantage over time.

To test whether VOC transmissibility differed by age, we first examined the age distributions of S+ and S− cases. Case numbers were age-standardized at STP area level, and then case age distributions were calculated for each STP–week (Supplementary Methods section 7). Figure 3 shows that individuals aged 19–49 years were consistently over-represented among observed cases relative to their share in the population (40%), with little difference between VOC and non-VOC cases. Secondary school-aged children (11–18 years) were also over-represented among observed cases relative to their share in the population (9%), and the difference between VOC and non-VOC cases was statistically significant for three weeks in November (Fig. 3, Extended Data Fig. 6). This period coincides with the second England lockdown (5 November to 2 December 2020) when schools remained open, and the differing age distributions between variants could arise from altered contact patterns when children were at greater risk of infection from all variants compared to adults.

Fig. 3: Age distribution of S-gene-positive and -negative cases over time in England.

Observed cases were age-standardized at the level of the STP area, and age distributions were calculated for each week in STP areas and then aggregated. Shaded regions, CIs computed by bootstrapping over STP areas within NHS regions for each week.

Next, we formulated models that incorporated a difference in VOC transmission between age groups (Supplementary Methods section 7). The models were fitted variously to genome-derived and/or SGTF-derived VOC frequencies, as well as total age-specific cases in each week and region, and compared using Bayesian leave-one-out cross-validation.

Model comparison consistently favoured models that allowed the transmission advantage to vary over time and between regions, using either genomic or SGTF data. However, models that incorporated an age effect were not significantly favoured (Extended Data Table 1). Indeed, the observed fluctuations in the age distribution are equally well captured by models that do not incorporate age-specific transmission advantages (Extended Data Fig. 6). We also used these model comparisons to test the hypothesis that differences in the VOC growth rates are a consequence of a reduced generation time in B.1.1.7. In principle, it is possible to statistically identify such a difference, because the data cover a period during which the overall Rt has been above and below one. Models that incorporate a change in the mean generation time were sometimes favoured (Extended Data Table 1), but the estimated ratio of mean generation times was not well identified: it varied between 0.75 and 0.96, depending on the model and the data being fitted. The mean ratio of Rt between the VOC and non-VOC ranged between 1.6 and 2.01, depending on model variant. The best-fit model to both SGTF and genomic data gave an estimate of 1.74 (95% CrI 1.03–2.75), which is highly consistent with the estimates obtained from the phylodynamic analysis and the direct estimation of Rt for VOC and non-VOC described above. This model also reproduces the decline in transmission advantage over time seen in our other analyses (Extended Data Fig. 7).

Discussion

While substitutions in the B.1.1.7 lineage are associated with substantial changes in viral phenotype3,4,5,14, the extent to which these substitutions lead to meaningful differences in transmission between humans is unclear, and cannot be evaluated experimentally. When randomized experimental studies are not possible, observational studies can provide strong evidence if consistent patterns are seen in multiple locations and at multiple times. Increasing frequency of a new lineage is consistent with a selective advantage, but changes in frequency can also result from founder effects, especially for genetic variants that are repeatedly introduced from overseas11,15. However, in contrast to previous variants that have achieved high prevalence, we see expansion of the VOC from within the UK.

We find some evidence that the multiplicative transmission advantage of B.1.1.7 (that is, ratio of reproduction numbers) declined in late December 2020 to January 2021, coincident with stricter social distancing, school closures, and the subsequent third England lockdown (Fig. 2d, Extended Data Figs. 2, 7). A number of mechanisms could generate this effect. First, a shorter generation time of the VOC would reduce the ratio of VOC to non-VOC growth rates for small values of the non-VOC growth rate. Thus, as interventions reduce both reproduction numbers, their ratio would decline, even in the absence of any underlying change in transmission advantage. Some weak support for this hypothesis is provided by our age-specific model fits to SGTF data, where model comparison generally favours models that include a change in mean generation time (Extended Data Table 1). Second, social distancing changes human contact networks, reducing the number of people contacted per day, but increasing the duration and proximity of remaining (mostly household) contacts. In such circumstances, saturation of transmission probabilities can lead to a reduction in the transmission advantage of the VOC (Extended Data Fig. 8, Supplementary Methods section 7). The observation that secondary attack rates in contacts identified through routine national contact tracing were 30–40% higher for the VOC than for non-VOC cases16 provides some support for this hypothesis, given that the large majority of contacts identified through the UK Test and Trace system are household contacts.
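The saturation mechanism can be made concrete with a toy calculation. The Python sketch below assumes that the probability of transmission during a contact follows a constant-hazard model, p = 1 − exp(−hazard × duration), and that the VOC multiplies the hazard by a fixed factor. This model, the hazard of 0.01 per hour and the multiplier of 1.7 are all assumptions made for illustration; they are not taken from Extended Data Fig. 8 or the Supplementary Methods.

```python
import math


def transmission_probability(hazard_per_hour, hours):
    """Probability of transmission over a contact of the given duration,
    assuming a constant per-hour hazard (illustrative assumption)."""
    return 1.0 - math.exp(-hazard_per_hour * hours)


def per_contact_advantage(hours, base_hazard=0.01, hazard_multiplier=1.7):
    """Ratio of VOC to non-VOC transmission probability for one contact.
    base_hazard and hazard_multiplier are hypothetical values."""
    p_non_voc = transmission_probability(base_hazard, hours)
    p_voc = transmission_probability(base_hazard * hazard_multiplier, hours)
    return p_voc / p_non_voc


# Brief casual contacts versus prolonged household contacts (in hours).
for hours in (0.1, 10.0, 100.0, 1000.0):
    print(f"{hours:7.1f} h contact: advantage = {per_contact_advantage(hours):.3f}")
```

For brief contacts the per-contact advantage approaches the hazard multiplier, whereas for prolonged contacts both probabilities approach 1 and the advantage shrinks towards 1. A shift towards longer household contacts would therefore reduce the apparent advantage without any change in the virus.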

The data included in this study were collected as part of routine surveillance of community testing and are not representative of SARS-CoV-2 infections in England. However, previous comparisons of community case data to random household prevalence surveys have shown very strong agreement in epidemic trends17,18. Furthermore, estimates of the growth advantage of B.1.1.7 obtained during earlier iterations of this study1 have largely been predictive of its subsequent spread in January, both in the UK and internationally. Independent observations of secondary attack rates inferred from UK contact tracing data have confirmed these findings19.

The substantial transmission advantage that we and others20,21 have estimated has increased the challenges in controlling COVID-19. The B.1.1.7 lineage was identified quickly owing to extensive genomic surveillance in the UK, but other lineages with similar concerning features22,23 have emerged almost concurrently, and lineages with similar features may be circulating undetected. Improving global genomic surveillance will be important for the control of COVID-19 in the presence of multiple emerging lineages with enhanced transmission or potential for immune escape.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.