Genomic reconstruction of the SARS-CoV-2 epidemic in England

The evolution of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus leads to new variants that warrant timely epidemiological characterization. Here we use the dense genomic surveillance data generated by the COVID-19 Genomics UK Consortium to reconstruct the dynamics of 71 different lineages in each of 315 English local authorities between September 2020 and June 2021. This analysis reveals a series of subepidemics that peaked in early autumn 2020, followed by a jump in transmissibility of the B.1.1.7/Alpha lineage. The Alpha variant grew when other lineages declined during the second national lockdown and regionally tiered restrictions between November and December 2020. A third more stringent national lockdown suppressed the Alpha variant and eliminated nearly all other lineages in early 2021. Yet a series of variants (most of which contained the spike E484K mutation) defied these trends and persisted at moderately increasing proportions. However, by accounting for sustained introductions, we found that the transmissibility of these variants is unlikely to have exceeded the transmissibility of the Alpha variant. Finally, B.1.617.2/Delta was repeatedly introduced in England and grew rapidly in early summer 2021, constituting approximately 98% of sampled SARS-CoV-2 genomes on 26 June 2021. A study of the evolution of the SARS-CoV-2 virus in England between September 2020 and June 2021 finds that interventions capable of containing previous variants were insufficient to stop the more transmissible Alpha and Delta variants.

ranging from 5% in winter 2020 to 38% in early summer 2021, and filtered to remove cases that were associated with international travel (Methods and Extended Data Fig. 1a, b). Overall, a total of 328 SARS-CoV-2 lineages were identified using the PANGO lineage definition 5 . As some of these lineages were only rarely and intermittently detected, we collapsed these on the basis of the underlying phylogenetic tree into a set of 71 lineages for modelling (Fig. 1b-d  and Supplementary Tables 1 and 2).
These data reveal a diversity of lineages in the fall of 2020 followed by sweeps of the Alpha and Delta variants ( Fig. 1b and Supplementary  Tables 2 and 3). Figure 1c shows the geographical distribution of cases and of different lineages, studied at the level of 315 English lower tier local authorities (LTLAs), administrative regions with approximately 100,000-200,000 inhabitants.

Modelling the dynamics of SARS-CoV-2
We developed a Bayesian statistical model that tracks the fraction of genomes from different lineages in each LTLA in each week and fits the daily total number of positive Pillar 2 tests (Methods and Extended Data Fig. 2). The multivariate logistic regression model is conceptually similar to previous approaches in its estimation of relative growth rates 10,11 . It accounts for differences in the epidemiological dynamics between LTLAs, and enables the introduction of new lineages (Fig. 2a-c). Despite the sampling noise in a given week, the fitted proportions recapitulate the observed proportions of genomes as revealed by 35 example LTLAs covering the geography of England (Fig. 2b, c and Supplementary Notes 1 and 2). The quality of fit is confirmed by different probabilistic model selection criteria (Extended Data Fig. 3) and also evident at the aggregated regional level (Extended Data Fig. 4).
Although the relative growth rate of each lineage is modelled as identical across LTLAs, the local viral proportions change dynamically due to the timing and rate of introduction of different lineages. The model also calculates total and lineage-specific local incidences and time-dependent growth rates and approximate reproduction numbers R t by negative binomial spline fitting of the number of daily positive PCR tests (Methods, Fig. 2d and Extended Data Fig. 2c). Together, this enables a quantitative reconstruction of different periods of the epidemic, which we will discuss in chronological order.

Multiple subepidemics in autumn 2020
Autumn 2020 was characterized by a surge of cases-concentrated in the north of England-that peaked in November, triggering a second national lockdown (Fig. 1a, c). This second wave initially featured B.1 and B.1.1 sublineages, which were slightly more prevalent in the south and north of England, respectively (Fig. 2b, c). Yet, the proportion of B.1.177 and its geographically diverse sublineages steadily increased across LTLAs from around 25% at the beginning of September to 65% at the end of October. This corresponds to a growth rate of between 8% (growth per 5.1 d; 95% confidence interval (CI) = 7-9%) and 12% (95% CI = 11-13%) greater than that of B.1 or B.1.1. The trend of B.1.177 expansion relative to B.1 persisted throughout January (Extended Data Fig. 5a) and involved a number of monophyletic sublineages that arose in the UK, and similar patterns were observed in Denmark 17 (Extended Data Fig. 5b). Such behaviour cannot easily be explained by international travel, which was the major factor in the initial spread of B.1. throughout Europe in summer 2020 (ref. 7 ). However, the underlying biological mechanism is unclear as the characteristic A222V spike variant is not believed to confer a growth advantage 7 .

The spread of Alpha during restrictions
The subsequent third wave from December 2020 to February 2021 was almost exclusively driven by Alpha/B.1.1.7, as described previously 10,11,18 . The rapid sweep of Alpha was due to an estimated transmissibility advantage of 1.52 compared with B.1.1 (growth per 5.1 d; 95% CI = 1.50-1.55; Fig. 2a), assuming an unchanged generation interval distribution 19 . The growth advantage is thought to stem, at least in part, from spike mutations that facilitate ACE2 receptor binding (N501Y) 20,21 and furin cleavage (P681H) 22 . Alpha grew during a period of restrictions, which proved to be insufficient to contain its spread (Fig. 3a). The second national lockdown from 5 November to 1 December 2020 successfully reduced the total number of cases, but this masked a lineage-specific increase (R t > 1; defined as growth per 5.1 d) in Alpha and a simultaneous decrease in other hitherto dominant lineages (R t < 1) in 78% (246/315) of LTLAs 23 (Fig. 3b, c). This pattern of Alpha-specific growth during lockdown is supported by a model-agnostic analysis of raw case numbers and proportions of Alpha genomes (Fig. 3e).
Three levels of regionally tiered restrictions were introduced in December 2020 (ref. 24 ) (Fig. 3a). The areas under different tiers of restrictions visibly and quantitatively coincide with the resulting local R t values, with greater R t values in areas with lower restrictions (Fig. 3a-c). The reopening caused a surge of cases across all tiers with R t > 1, which is also evident in selected time series (Fig. 3d). As Alpha cases surged, more areas were placed under tier 3 restrictions, and stricter tier 4 restrictions were introduced. Nevertheless, Alpha continued to grow (R t > 1) in most areas, presumably driven by increased social interaction over Christmas (Fig. 3c).
After the peak of 72,088 daily cases on 29 December 2020 (Fig. 1a), a third national lockdown was announced on 4 January 2021 (Fig. 3a). The lockdown and increasing immunity derived from infection and increasing vaccination 25 led to a sustained contraction of the epidemic to approximately 5,500 daily cases by 8 March, when restrictions began to be lifted by reopening schools (further steps of easing occurred on 12 April and 17 May). In contrast to the second national lockdown 93% (296/315) of LTLAs exhibited a contraction in both Alpha and other lineages (Fig. 3e).

Elimination of lineages in early 2021
The lineage-specific rates of decline during the third national lockdown and throughout March 2021 resulted in large differences in lineage-specific incidence. Cases of Alpha contracted nationally from a peak of around 50,000 daily new cases to approximately 2,750 on 1 April 2021 (Fig. 4a). At the same time, B.1.177-the most prevalent lineage in November 2020-fell to less than an estimated 10 cases per day. Moreover, the incidence of most other lineages present in autumn 2020 was well below 1 after April 2021, implying that the majority of them have been eliminated. The number of observed distinct PANGO lineages declined from a peak of 137 to only 22 in the first week of April 2021 (Fig. 4b). Although this may be attributed in part to how PANGO lineages were defined, we note that the period of contraction did not replenish the genetic diversity lost due to the selective sweep by Alpha (Extended Data Fig. 6).

Refractory variants with E484K mutations
Parallel to the elimination of many formerly dominant SARS-CoV-2 lineages, a number of new variants were imported or emerged (Fig. 4a). These include the VOCs B.1.351/Beta and P.1/Gamma, which carry the spike variant N501Y that is also found in B.1.1.7/Alpha and a similar pair of mutations (K417N/T and E484K) that were each shown to reduce the   binding affinity of antibodies from vaccine-derived or convalescent sera 20,[26][27][28][29] . The ability to escape from previous immunity is consistent with the epidemiology of Beta in South Africa 8 Fig. 5a, b). The proportion of these E484K-containing variants was consistently 0.3-0.4% from January to early April 2021. A transient rise, especially of the Beta and Gamma variants, was observed in May 2021 ( Fig. 5a, b). Yet, the dynamics were largely stochastic and characterized by a series of individual and localized outbreaks, possibly curtailed by local surge testing efforts against Beta and Gamma variants (Fig. 5c). Consistent with the transient nature of these outbreaks, the estimated growth rates of these variants were typically lower than Alpha (Fig. 2a).
Sustained imports from international travel were a critical driving mechanism behind the observed number of non-Alpha cases. A phylogeographical analysis establishing the most parsimonious sets of monophyletic and exclusively domestic clades, which can be interpreted as individual introductions, confirmed that A.23.1 with E484K (1 clade) probably has a domestic origin as no genomes of the same clade were observed internationally (Methods, Fig. 5d and Extended Data Fig. 7). The estimated number of introductions was lowest for B.1.1.318 (3 introductions, range = 1-6), and highest for Beta (49 introductions, range = 45-58) and Eta (30 introductions, range = 18-34). Although our data exclude genomes sampled directly from travellers, these repeated introductions show that the true rate of transmission is lower than the observed increase in the number of surveillance genomes.

The rise of Delta from April to June 2021
The B.1.617.1/Kappa and B.1.617.2/Delta lineages, which were first detected in India in 2020, first appeared in English surveillance samples in March 2021. In contrast to other VOCs, Delta/Kappa do not contain N501Y or E484K mutations, but their L452R mutation may reduce antibody recognition 27 and P681R enhances furin cleavage 30 , similar to the P681H mutation of Alpha. The frequency of Delta, which harbours further spike mutations of unknown function, increased rapidly and reached levels of 98% (12,474/12,689) on 26 June 2021 (Fig. 5a, b). Although initially constrained to a small number of large local clusters, such as in Bolton, in May 2021 (Fig. 5c), Delta was detected in all LTLAs by 26 June 2021 (Fig. 1c). The sweep of Delta occurred at a rate of around 59% (growth per 5.1 d, CI = 53-66) higher than Alpha with minor regional variation ( Table 4).
The rapid rise of Delta contrasts with Kappa, which grew more slowly despite being introduced at a similar time and into a similar demographic background (Figs. 2a and 5b). This is also evident in the phylogeographical analysis (based on data as of 1 May 2021). The 224 genomes of Delta derive from larger clades (23 introductions, range = 6-40; around 10 genomes for every introduction) compared with the 80 genomes of Kappa (17 introductions, range = 15-31; around 3-4 genomes per introduction) and also other variants ( Fig. 5d and Extended Data Fig. 8). The AY.1 lineage, derived from Delta and containing an additional K417N mutation, appeared only transiently (Fig. 5b).
The sustained domestic growth of Delta and its international spread 31 relative to the Alpha lineage are the first evidence of a biological growth advantage. The causes appear to be a combination of increased transmissibility and immune evasion. Evidence for higher transmissibility   Article includes the fast growth in younger unvaccinated age groups, reports of elevated secondary attack rates 32 and a higher viral load 33 . Furthermore, vaccine efficacy against infection by Delta is diminished, depending on the type of vaccine 34,35 , and reinfection is more frequent 36 , both supported by experimental research demonstrating the reduced antibody neutralization of Delta by vaccine-derived and convalescent sera 37,38 .
The higher growth rate of Delta-combined with gradual reopening and proceeding vaccination-repeated the dichotomous pattern of lineage-specific decline and growth, although now with declining Alpha (R t < 1) and growing Delta (R t > 1; Fig. 5e, f). Overall, we estimate that the spread of more transmissible variants between August 2020 and early summer 2021 increased the average growth rate of circulating SARS-CoV-2 in England by a factor of 2.39 (95% CI = 2.25-2.42; Fig. 5g). Thus, previously effective interventions may prove to be insufficient to contain newly emerging and more transmissible variants.

Discussion
Our dense genomic surveillance analysis identified lineages that consistently grew faster than others in each local authority and, therefore, at the same time, under the same restrictions and in a comparable population. This pinpointed a series of variants with elevated transmissibility, in broad agreement with other reports 10,11,13,15,31 . However, a number of limitations exist. The growth rates of rare new variants are stochastic due to introductions and superspreading. Local outbreaks of the Beta and Gamma variants triggered asymptomatic surge testing, which may have reduced their spread. Furthermore, transmission depends both on the viral variant and the immunity of the host population, which changed from less than 20% to over 90% in the study period 39 . This will influence the growth rates of variants with immune evasion capabilities over time. The effect of immunity is currently not modelled, but may become more important in the future as SARS-CoV-2 becomes endemic. Further limitations are discussed in the Limitations section of the Methods.
The third and fourth waves in England were each caused by more transmissible variants, which outgrew restrictions that were sufficient to suppress previous variants. During the second national lockdown, Alpha grew despite falling numbers for other lineages and, similarly, Delta took hold in April and May when cases of Alpha were declining. The fact that such growth was initially masked by the falling cases of dominant lineages highlights the need for dense genomic surveillance and rapid analysis to devise optimal and timely control strategies. Such surveillance should ideally be global as, even though Delta was associated with a large wave of cases in India, its transmissibility remained unclear at the time due to a lack of systematic genomic surveillance data.
The 2.4-fold increase in growth rate during the study period as a result of new variants is also likely to have consequences for the future course of the pandemic. If this increase in growth rate was explained solely by higher transmissibility, it would raise the basic reproduction number R 0 from a value of around 2.5-3 in spring 2020 (ref. 40 ) to the range of 6-7 for Delta. This is likely to spur new waves of the epidemic in countries that have to date been able to control the epidemic despite low vaccination rates, and it may exacerbate the situation elsewhere. Although the exact herd-immunity threshold depends on contact patterns and the distribution of immunity across age groups 41,42 , it is worth considering that Delta may increase the threshold to values around 0.85. Given current estimates of vaccine efficacy 34,35,43 this would require nearly 100% vaccination coverage. Even though more than 90% of adults had antibodies against SARS-CoV-2 (ref. 39 ) and close to 70% had received two doses of vaccination, England saw rising Delta variant cases in the first weeks of July 2021. It can therefore be expected that other countries with high vaccination coverage are also likely to experience rising cases when restrictions are lifted.
SARS-CoV-2 is likely to continue its evolutionary adaptation process to humans 44 . To date, variants with considerably higher transmissibility have had strongest positive selection, and swept through England during the 10 months of this investigation. However, the possibility that an increasingly immune population may now select for variants with better immune escape highlights the need for continued systematic and, ideally, global genomic surveillance.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-021-04069-y.
as outlined above and used to fit the logarithmic incidence. The derivatives of the original basis f′(t) are used to calculate the underlying growth rates and R t values, as shown further below. The convolved spline basis f*(t) is used to fit the per capita incidence in each LTLA as (Extended Data Fig. 2b): This implies that fitting the incidence function for each of the m local authorities is achieved by a suitable choice of coefficients , that is one coefficient for each spline function for each of the LTLAs. The parameters B have a univariate normal prior distribution each, which reads for LTLA i and spline j: The s.d. of the prior regularizes the amplitude of the splines and is chosen as σ = 0.2 j for weekly splines and σ = 1 j for monthly splines. This choice was found to reduce the overall variance resulting from the high number of weekly splines, meant to capture rapid changes in growth rates, but which can lead to instabilities particularly at the end of the time series, when not all effects of changes in growth rates are observed yet. The less regularized monthly splines reflect trends on the scale of several weeks and are therefore subject to less noise.
Finally, we introduce a term accounting for periodic differences in weekly testing patterns (there are typically 30% lower specimens taken on weekends; Fig. 1a): where the scalar δ t δ t i i ( ) = ( − × 7) ∀ ∈N and prior distribution δ t ( ) LogNormal(0, 1) ∼ for t = 1, …, 6 and δ(0) = 1. The total incidence was fitted to the observed number of positive daily tests X by a negative binomial with a dispersion ω = 10. The overdispersion buffers against non-Poissonian uncorrelated fluctuations in the number of daily tests.
The equation above assumes that all elements of X(t) are independent, conditional on t ( ) μ .

Growth rates and R t values
A convenient consequence of the spline basis of log( ) = µ λ, is that the delay-adjusted daily logarithmic growth rate r(t) = λ ′ (t) of the local epidemic simplifies to: represents the first derivative of the jth cubic spline basis function.
To express the daily growth rate as an approximate reproductive number R t , one needs to consider the distribution of the intergeneration time, which is assumed to be gamma distributed with mean 6.3 d (α = 2.29, β = 0.36) 46 . The R t value can be expressed as a Laplace transform of the intergeneration time distribution 47 . Effectively, this shortens the relative time period because the exponential dynamics put disproportionally more weight on stochastically early transmissions over late ones. For reasons of simplicity and being mindful also of the uncertainties of the intergeneration time distribution, we approximate R t values by multiplying the logarithmic growth rates with a value of τ e = 5.1 d, which was found to be a reasonable approximation to the convolution required to calculate R t values (denoted here by the lower case symbol ρ t ( ) in line with our convention for vector-variate symbols and to avoid confusion with the epidemiological growth rate r t ), Thus, the overall growth rate scaled to an effective inter generation time of 5.1 d can be readily derived from the derivatives of the spline basis and the corresponding coefficients. The values derived from the approach are in very close agreement with those of the method of ref. 48 , but shifted according to the typical delay from infection to test (Extended Data Fig. 2b).

Genomic prevalence
The dynamics of the relative frequency P(t) of each lineage was modelled using a logistic-linear model in each LTLA, as described above. The logistic prevalence of each lineage in each LTLA is defined as t t ( ) = logit( ( )) L P . This is modelled using the piecewise linear expression L C b t t ( ) = + ⋅ , + where b may be interpreted as a lineage-specific growth advantage and C as an offset term of dimension (LTLA × lineages). Time + t is measured since introduction t 0 and is defined as and accounts for the fact that lineages can be entirely absent prior to a stochastically distributed time period preceding their first observation. This is because, in the absence of such a term, the absence of a lineage prior to the point of observation can only be explained by a higher growth rate compared with the preceding lineages, which may not necessarily be the case. As the exact time of introduction is generally unknown, a stochastic three-week period of ∼ t t Unif(−14, 0) + 0 0 obs prior to the first observation t 0 obs was chosen.
As the inverse logit transformation projects onto the l − 1 dimensional simplex S l−1 and therefore loses one degree of freedom, B.
The scalar parameter α α = 0.01 0 can be interpreted as a weak prior with expectation 1/n, making the model less sensitive to the introduction of single new lineages, which can otherwise exert a very strong effect. Furthermore, the array α = 1 cases 2 increases the variance to account for the fact that, especially at high sequencing coverage (genomes ≈ cases), cases and therefore genomes are likely to be correlated and overdispersed as they may derive from a single transmission event. Other choices such as α α = 1, 000 1 , which make the model converge to a standard multinomial, leave the conclusions qualitatively unchanged. This model aspect is illustrated in Extended Data Fig. 2c.

Lineage-specific incidence and growth rates
From the two definitions above it follows that the lineage-specific incidence is given by multiplying the total incidence in each LTLA µ(t) with the corresponding lineage frequency estimate P(t)for lineage j at each time point Further corresponding lineage-specific R t values R(t) in each LTLA can be calculated from the lineage-agnostic average R t value ρ(t) and the lineage proportions P(t) as

Inference
The model was implemented in numpyro 49,50 and fitted using stochastic variational inference 51 . Guide functions were multivariate normal distributions for each row (corresponding to an LTLA) of B, C to preserve the correlations across lineages and time as well as for (b, c) to also model correlations between growth rates and typical introduction.

Phylogeographic analyses
To infer VOC introduction events into the UK and corresponding clade sizes, we investigated VOC genome sequences from GISAID (https:// www.gisaid.org/) available from any country. We downloaded multiple sequence alignments of genome sequences with the release dates 17 April 2021 (for the analysis of the lineages A.23.1, B.1.1.318 On each VOC/VUI phylogeny, we inferred the minimum and maximum number of introductions of the considered SARS-CoV-2 lineage into the UK compatible with a parsimonious migration history of the ancestors of the considered samples; we also measured clade sizes for one specific example parsimonious migration history. We counted only introduction events into the UK that resulted in at least one descendant from the set of UK samples that we considered in this work for our hierarchical Bayesian model; similarly, we measured clade sizes by the number of UK samples considered here included in such clades. Multiple occurrences of identical sequences were counted as separate cases, as this helped us to identify rapid SARS-CoV-2 spread. When using parsimony, we considered only migration histories along a phylogenetic tree that are parsimonious in terms of the number of migration events from and to the UK (in practice, we collapse all of the non-UK locations into a single one). Furthermore, as SARS-CoV-2 phylogenies present substantial numbers of polytomies, that is, phylogenetic nodes where the tree topology cannot be reconstructed due to a lack of mutation events on certain branches, we developed a tailored dynamic programming approach to efficiently integrate over all possible splits of polytomies and over all possible parsimonious migration histories. The idea of this method is somewhat similar to typical Bayesian phylogeographic inference 54 in that it enables us to at least in part integrate over phylogenetic uncertainty and uncertainty in migration history; however, it also represents a very simplified version of these analyses, more so than ref. 16 , as it considers most of the phylogenetic tree as fixed, ignores sampling times and uses parsimony instead of a likelihood-based approach. Parsimony is expected to represent a good approximation in the context of SARS-CoV-2, due to the shortness (both in time and substitutions) of the phylogenetic branches considered 55,56 . The main advantage of our approach is that, owing to the dynamic programming implementation, it is more computationally efficient than Bayesian alternatives, as the most computationally demanding step is the inference of the maximum likelihood phylogenetic tree. This enables us to infer plausible ranges for numbers of introduction events for large datasets and to quickly update our analyses as new sequences become available. The other advantage of this approach is that it enables us to easily customize the analysis and to focus on inferred UK introductions that result in at least one UK surveillance sample, while still making use of non-surveillance UK samples to inform the inferred phylogenetic tree and migration history. Note that possible biases due to uneven sequencing rates across the world 55 apply to our approach as well as other popular phylogeographic methods. Our approach works by traversing the maximum likelihood tree starting from the terminal nodes and ending at the root (postorder traversal).
Here, we define a 'UK clade' as a maximal subtree of the total phylogeny for which all terminal nodes are from the UK, all internal nodes are inferred to be from the UK and at least one terminal node is a UK surveillance sample; the size of a UK clade is defined as the number of UK surveillance samples in it. At each node, using values already calculated for all children nodes (possibly more than two children in the case of a multifurcation), we calculate the following quantities: (1) the maximum and minimum number of possible descendant UK clades of the current node, over the space of possible parsimonious migration histories, and conditional on the current node being UK or non-UK; (2) the number of migration events compatible with a parsimonious migration history in the subtree below the current node, and conditional on the current node being UK or non-UK; (3) the size so far of the UK clade the current node is part of, conditional on it being UK; and (4) a sample of UK clade sizes for the subtree below the node. To calculate these quantities, for each internal node, and conditional on each possible node state (UK or non-UK), we consider the possible scenarios of having 0 of 1 migration events between the internal node and its children nodes (migration histories with more than 1 migration event between the node and its children are surely not parsimonious in our analysis and can be ignored).
To confirm the results of our analyses based on parsimony, we also used the new Bayesian phylogenetic approach Thorney BEAST 16 (https://beast.community/thorney_beast) for VOCs for which it was computationally feasible, that is, excluding B.1.351. For each VOC, we used in Thorney BEAST the same topology inferred with FastTree2 as for our parsimony analysis; we also used treetime 57 v.0.8.2 to estimate a timed tree and branch divergences for use in Thorney BEAST. We used a two-state (UK and non-UK) migration model 54 of migration to infer introductions into the UK but again counted, from the posterior sample trees, only UK clades with at least one UK surveillance sample. We used a Skygrid 58 tree coalescent prior with six time intervals. The comparison of parsimony and Bayesian estimates is shown in Extended Data Fig. 8d.
Comparison of ONS incidence estimates with hospitalization, case and death rates was conducted by estimating infection trajectories separately from observed cases, hospitalizations and deaths 59,60 , convolving them with estimated PCR detection curves 61 , and dividing the resulting PCR prevalence estimates by the estimated prevalence from the ONS Community Infection Survey at the midpoints of the two-week intervals over which prevalence was reported in the survey.

Limitations
A main limitation of the analysis is that the transmission model is deterministic, whereas the spread of variants is a stochastic process. Although the logistic growth assumption is a consistent estimator of the average transmission dynamics, individual outbreaks may deviate from these averages and therefore produce unreliable estimates.
Stochastic growth effects are accounted for only in terms of (uncorrelated) overdispersion and the offset at the time of the introduction. For these reasons, the estimated growth rates may not accurately reflect the viral transmissibility, especially at a low prevalence. It is therefore important to assess whether consistent growth patterns in multiple independent areas are observed. We note that the posterior distribution of the growth rates of rare variants tends to be biased to the baseline due to the centred prior.
In its current form, the model accounts for only a single introduction event per LTLA. Although this problem is in part alleviated by the high spatial resolution, which spreads introductions across 315 LTLAs, it is important to investigate whether sustained introductions inflate the observed growth rates, as in the case of the Delta variant or other VOCs and VUIs. This can be achieved by a more detailed phylogeographic assessment and through the assessment of monophyletic sublineages.
Furthermore, there is no explicit transmission modelled from one LTLA to another. As each introduction is therefore modelled separately, this makes the model conservative in ascertaining elevated transmission as single observed cases across different LTLAs can be explained by their introduction.
The inferred growth rates also cannot identify a particular mechanism of altered transmission. Biological mechanisms include a higher viral load, longer infectivity or greater susceptibility. Lineages could potentially differ by their intergeneration time, which would lead to nonlinear scaling. Here we did not find convincing evidence in incidence data for such effects, in contrast to previous reports 23 . However, contact-tracing data indicate that the intergeneration time may be shortening for more transmissible lineages such as Delta 33,62 . Cases of the Beta and Gamma VOCs may have been more intensely contact traced and triggered asymptomatic surge testing in some postcode areas. This may have reduced the observed growth rates relative to other lineages.
Lineages, such as Beta, Gamma or Delta also differ in their ability to evade previous immunity. As immunity changes over time, this might lead to a differential growth advantage over time. It is therefore advisable to assess whether a growth advantage is constant over periods in which immunity changes considerably.
A further limitation underlies the nature of lineage definition and assignment. The PANGO lineage definition 5 assigns lineages to geographical clusters, which have by definition expanded, and this can induce a certain survivor bias, often followed by winner's curse. Another issue results from the fact that very recent variants may not be classified as a lineage despite having grown, which can inflate the growth rate of ancestral lineages over sublineages.
As the total incidence is modelled on the basis of the total number of positive PCR tests, it may be influenced by testing capacity; the total number of tests approximately tripled between September 2020 and March 2021. This can potentially lead to a time trend in recorded cases and therefore baseline R t values if the access to testing changed, for example, by too few tests being available tests during periods of high incidence, or changes to the eligibility to intermittently test with fewer symptoms. Generally, the observed incidence was in good agreement with representative cross-sectional estimates from the ONS 63,64 , except for a period of peak incidence from late December 2020 to January 2021 (Extended Data Fig. 1d). Values after 8 March 2021 need to be interpreted with caution as Pillar 2 PCR testing was supplemented by lateral flow devices, which increased the number of daily tests to more than 1.5 million. Positive cases were usually confirmed by PCR and counted only once.
The modelled curves are smoothed over intervals of approximately 7 d using cubic splines, creating the possibility that later time points influence the period of investigation and cause a certain waviness of the R t value pattern. An alternative parameterization using piecewise linear basis functions per week (that is, constant R t values per week) leaves the overall conclusions and extracted parameters broadly unchanged.

Ethical approval
This study was performed as part of surveillance for COVID-19 under the auspices of Section 251 of the National Health Service Act 2006. It therefore did not require individual patient consent or ethical approval. The COG-UK study protocol was approved by the Public Health England Research Ethics Governance Group.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
Corresponding author(s): Moritz Gerstung Last updated by author(s): Aug 27, 2021 Reporting Summary Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Research policies, see our Editorial Policies and the Editorial Policy Checklist.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly The statistical test(s) used AND whether they are one-or two-sided Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted

Software and code
Policy information about availability of computer code Data collection Consensus Fasta sequences were created using the ARTIC nextflow processing pipeline and SARS-CoV-2 lineage assignments using the Pangolin software (01-04-2021 and 23-04-2021 version) and FastTree2 (2.1.11).

Data
Policy information about availability of data All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: -Accession codes, unique identifiers, or web links for publicly available datasets -A list of figures that have associated raw data -A description of any restrictions on data availability PCR test data are publicly available at https://coronavirus.data.gov.uk/. SARS-CoV-2 genome data and geolocations can be obtained under controlled access from https://www.cogconsortium.uk/data/.