Introduction

The SARS-CoV-2 pandemic continues to cause global impact. Spike protein-based SARS-CoV-2 vaccines have shown effectiveness in several countries1,2 enabling the relaxation of non-pharmaceutical interventions in some settings. The emergence of SARS-CoV-2 variants prompts questions about the ongoing protection elicited by existing vaccines. The risk of vaccine escape, whereby vaccine-generated immunity is insufficient to provide protection against disease, is a concern. High virus transmission in combination with the presence of convalescent or vaccine-mediated immunity may drive selection of escape mutants. These theoretical concerns are broadly supported by in vitro data showing reduction in neutralising antibody titres, but efficacy or clinical effectiveness of existing spike-based vaccines against the Alpha (B.1.1.7) variant of concern (VOC) does not seem to be compromised3,4. However, such vaccines have reduced efficacy against the Beta (B.1.351) VOC, which possesses additional spike mutations5,6,7, and this may translate into reduced vaccine effectiveness8. Nevertheless, several lines of evidence indicate that efficacy against severe disease may be preserved against currently identified VOCs.

Brazil has experienced more than 16 million confirmed cases and over 450,000 deaths to date9, with the Amazon region being particularly severely affected10. Lineages B.1.1.33 and B.1.1.28 were dominant throughout Brazil during 202011. Towards the end of 2020, two sublineages of B.1.1.28, designated Zeta (P.2) and Gamma (P.1), emerged and spread rapidly through the population11,12. Prior infection with earlier lineages may not confer adequate or sustained protection in the face of emerging variants13. For example, areas in Brazil with suspected high seroprevalence rates have seen subsequent exponential growth of infections10. This contrasts with the protection from reinfection for a median of 7 months duration seen in a large healthcare worker (HCW) study in the UK during a period when B.1 lineages were circulating and the Alpha (B.1.1.7) variant arose14. Symptomatic reinfections in immunocompetent adults with Gamma (P.1) and Zeta (P.2) sublineages have been described (following B.1 and B.1.133 infections respectively)15,16.

Both the Gamma (P.1) and Zeta (P.2) sublineages harbour the E484K mutation in the receptor binding domain (RBD) of the spike protein. E484K has been associated with in vitro immune escape from therapeutic monoclonal antibodies17,18,19, prompting the withdrawal of the emergency use authorisation for bamlanivimab in the US20. The E484K mutation is observed to have arisen independently in other variants such as Beta (B.1.351)21 and features as an additional mutation in recent samples of established VOCs such as Alpha (B.1.1.7)22. Whilst P.2 harbours no other lineage-specific spike mutations, Gamma (P.1) has additional RBD mutations, most notably K417T and N501Y. The coincident emergence of N501Y, K417T/N and E484K mutations in Gamma (P.1) and Beta (B.1.351)21,23 is suggestive of convergent evolution12.

The shared triplet of RBD mutations might suggest that the pattern of in vitro responses24,25 and reduced efficacy of ChAdOx1 nCov-195 for Beta (B.1.351) may be echoed for Gamma (P.1). However, early in vitro data showed two monoclonal antibodies retained activity against Gamma (P.1) while showing no neutralization against Beta (B.1.351)19,24. Convalescent sera from individuals infected early in the pandemic and from mRNA and viral-vectored vaccine recipients showed a reduction in Gamma (P.1) neutralization activity for both pseudovirus and live coronavirus19 but not to the extent seen for Beta (B.1.351)26. Early data for an inactivated vaccine in Brazil when Gamma (P.1) was dominant also suggest there may be a reduction in effectiveness27.

In this paper, we report the findings from a multisite Brazilian COVID-19 vaccine efficacy study assessing the efficacy of the ChAdOx1 nCoV-19 vaccine in preventing symptomatic COVID-19 disease caused by the individual circulating SARS-CoV-2 lineages.

Results

There were 10416 participants enrolled and randomised into the study between June 23, 2020 and December 1, 2020. Of these, 9433 participants received two doses and met the criteria for inclusion in this analysis. Reasons for exclusion are shown in Fig. 1.

Fig. 1: CONSORT Flow diagram.

Flow chart showing: the number of participants randomised and vaccinated with ChAdOx1 nCoV-19 or control vaccines; the number of participants included in the primary efficacy analysis and reasons for exclusion; the number of participants receiving vaccines after unblinding; and cases occurring after unblinding. Feb 28, 2021 was the data cut-off date for this analysis and events occurring after this date are not included in the data set for analysis.

677 clinical samples were shipped and processed. Of these, 307 (45%) came from cases of primary symptomatic COVID-19 meeting the definition for inclusion in the efficacy analysis, and 236 (77%) of these primary cases had sufficient intact specimen for lineage assignment through sequencing or genotyping. Some participants had more than one positive swab for the same event.

Demographic and baseline characteristics of the primary efficacy cohort were well balanced (Table 1). A total of 82% of the participants were aged 18–55 years and 70% identified as white. A total of 65% worked in a health or social care setting.

Table 1 Demographics and baseline characteristics of primary efficacy cohort.

The most prevalent lineage identified was Zeta (P.2) in 153 cases, followed by the ancestral B.1.1.28 lineage in 49 cases. Unblinding of study participants began at a similar time as the appearance of the Gamma (P.1) variant and only 18 cases were able to be included in the analysis (Fig. 2a, b). There were 46 cases of symptomatic NAAT + COVID-19 that occurred after a single dose or before the receipt of the second dose, including 22 cases of Zeta (P.2). These are summarised by lineage in Supplementary Table 4.

Fig. 2: Distribution of SARS-CoV-2 lineages from nose/throat swabs over time.

a Stacked bar chart of cases of NAAT + SARS-CoV-2 each week during the study, with lineage assigned by sequencing or genotyping where available. b Stacked bar chart of cases of NAAT + SARS-CoV-2 each week during the study, by study site, with lineage assigned by sequencing or genotyping where available. The 6 study sites are: Sao Paulo, Rio de Janeiro, Salvador, Natal, Santa Maria, and Porto Alegre (see map of sites in Supplementary Fig. 3). X-axis labels show calendar year and week number. Numbers above the x-axis show the number of cases of NAAT + SARS-CoV-2 that occurred in the study during that week. Swabs were available for sequencing and genotyping only if participants were tested at a study site laboratory and the study sample was stored. An early sample from August 2020 was assigned to Gamma (P.1) due to the presence of the K417T mutation. Phylogeographic analyses suggest emergence of the dominant P.1 lineage in November 2020, with a most recent common ancestor of all P.1-like (K417T) viruses estimated at August 202048. As the low viral load of this sample in our dataset precluded sequencing, we were unable to further refine its phylogenetic lineage. It is therefore plausible that this sample was a precursor to the likely ‘true’ Gamma (P.1) lineage or carried a spontaneous K417T mutation. In keeping with national surveillance data, multiple instances of Gamma (P.1) samples were observed in our data from January 2021.

Vaccine efficacy after two doses was 73% (95% CI, 46, 86) for B.1.1.28, and for Zeta (P.2) was 69% (95% CI, 55, 78). Fewer cases were available for analysis of efficacy for B.1.1.33 (VE 88.2%, 95% CI, 5, 99) and Gamma (P.1) (64%, 95% CI, −2, 87), both of which had wide confidence intervals. Efficacy was not computed for cases of N.9 (N = 4), N.6 (N = 1) or Alpha (B.1.1.7) (N = 1) as there were fewer than 5 instances of each. Lineages for swabs that were not available for sequencing because the participant had accessed PCR testing at a non-study lab were imputed using a multiple imputation (MI) model. The MI analysis gave estimates of 65.8% (95% CI, 5, 88) for Gamma (P.1) and 65.2% (95% CI, 53, 74) for Zeta (P.2) (Table 2).

Table 2 Efficacy of 2 doses of ChAdOx1 nCoV-19 against primary symptomatic COVID-19, by SARS-CoV-2 lineage.

Primary outcome hospitalisation cases, occurring more than two weeks after a second dose, were present in 1 and 18 participants in the ChAdOx1 nCoV-19 and control groups respectively, VE 95% (95% CI 61, 99). The one hospitalised participant in the ChAdOx1 nCoV-19 group had a WHO score of 5, but no swab was available for processing to determine lineage. There were no severe cases nor deaths in the vaccinated arm. Among the participants meeting the criteria for primary efficacy analysis, there was one death due to COVID-19 in the control arm, and 6 further cases were classified as severe COVID-19 (WHO score ≥ 6), also in the control arm, giving 100% efficacy (95% CI 34, NE) against severe COVID-19 with two doses of vaccine (Table 3, Supplementary Table 1). A second death occurred in the control arm more than 21 days after the first dose of vaccine but before the second dose was received.

Table 3 Hospitalisations (WHO score >= 4) by SARS-CoV-2 lineage and WHO clinical progression score.

Viral load varied by SARS-CoV-2 lineage (p = 0.0002) with the Gamma (P.1) lineage having the highest median viral load (Fig. 3, Supplementary Table 2).

Fig. 3: Viral Load in nose and throat swabs by SARS-CoV-2 lineage.

Box plot of viral load (IU/mL) from NAAT + SARS-CoV-2 cases in Brazil, in vaccinated and control participants combined. Lineages were assigned by sequencing and genotyping. Number of cases included in the analysis are shown below each box. Boxes show the median and 25th to 75th percentile range (bounds of the boxes) and whiskers to the last data point within 1.5 x interquartile range from the 25th or 75th percentile. Kruskal-Wallis test across all four groups: two-sided p = 0.0002. Different colours represent different lineages as labelled on the x-axis.

Discussion

In this post-hoc exploratory analysis, ChAdOx1 nCoV-19 provided protection against severe disease and death in Brazil, the key endpoints to protect lives and safeguard medical infrastructure from being overwhelmed. This analysis also shows vaccine efficacy against the dominant lineages causing symptomatic COVID-19 infection in our participants: Zeta (P.2) and B.1.1.28. There were relatively few cases of the B.1.1.33 and Gamma (P.1) lineages observed in the timeframe prior to unblinding, and assessment of efficacy for these variants was underpowered. The distribution of P.1 cases observed suggests that protection against symptomatic disease for this variant may be maintained but slightly reduced in comparison with Zeta (P.2) or the parent lineage B.1.1.28. However, the limited number of cases available for analysis makes it difficult to draw firm conclusions.

All first-generation spike-based COVID-19 vaccines that are currently in clinical use were generated from the ancestral Wu-1 spike gene sequence, raising the potential for loss of vaccine efficacy as SARS-CoV-2 accumulates mutations during viral evolution. Our observations of vaccine efficacy of ChAdOx1 nCoV-19 against symptomatic disease in this report are consistent with our primary combined analysis of efficacy from studies in Brazil, the UK, and South Africa, in which VE was 66.7% (57.4 to 74.0). The single-dose adenovirus vectored vaccine (Ad26.COV2.S) phase 3 data showed efficacy against moderate to severe-critical COVID-19 disease of 68.1% (95% CI, 48.8 to 80.7) where Zeta (P.2) formed the majority of the sequences obtained28. A recent pre-print of a test-negative Canadian case-control study showed vaccine effectiveness of 48% (28 to 63) >14 days after 1 dose of ChAdOx1 nCoV-19 against combined Beta (B.1.351)/Gamma (P.1). Only single dose data were available given the timing of authorisation of ChAdOx1 nCoV-19 in Canada. That study had insufficient specimens to distinguish between these lineages, which were thus combined; this emphasises the difficulty of achieving an adequate number of sequences for statistical comparison29.

Our data are also in keeping with the high levels of protection against severe disease caused by other variants such as Beta (B.1.351) by BNT162b2 and NVX-CoV237330,31. However, the positive findings from this study for Zeta (P.2) are in contrast to the lack of observed efficacy seen for ChAdOx1 nCoV-19 against mild-moderate disease caused by Beta (B.1.351)5. There is a wide clinical spectrum of SARS-CoV-2 infection, from asymptomatic to severe COVID-19 disease requiring multi-organ support. The immune responses required to protect from asymptomatic disease may differ in nature or magnitude from those required to protect against severe disease, which may in turn have implications for the ability of SARS-CoV-2 vaccines to reduce transmission. Animal data from the ChAdOx1 nCoV-19 vaccinated hamster model showed a reduction in virus neutralising antibody titre with Beta (B.1.351) compared with Alpha (B.1.1.7). However, when challenged with either of these lineages, the vaccinated animals did not have infectious virus or gross pathology in their lungs, yet virus was detectable in the upper respiratory tract of both vaccinated and control animals32.

Ongoing antigenic drift of the SARS-CoV-2 virus due to error-prone RNA replication is inevitable and it is possible that vaccines will drive the selection of variants towards escape from neutralising antibodies and to increased transmissibility. Many of the RBD mutations that have arisen appear to be associated with immune evasion, transmissibility or both. The only lineage-defining RBD mutation for the Zeta (P.2) lineage is E484K, whilst Gamma (P.1) and Beta (B.1.351) harbour multiple RBD mutations. E484K (and a similar mutation, E484Q) are being rapidly accumulated by lineages across distinct epidemiological and geographic settings and the addition of E484K/Q mutations to existing VOCs (such as Alpha (B.1.1.7)) is associated with evasion of neutralising antibodies17,33. The observation that vaccine efficacy in our trial was preserved for P.2 may indicate that E484K, when occurring as an isolated RBD mutation, is responsible for only a minimal reduction in protection. However, it is not known what relative impact E484K/Q mutations may have on vaccine efficacy when occurring as part of a constellation of RBD mutations. A cautious approach to variants containing E484K and other RBD mutations is warranted whilst our understanding of their individual impact improves.

The viral load was highest in the Gamma (P.1) cases consistent with other analyses12,34. Higher viral loads may result in more shedding of virus, contributing to the greater transmissibility seen with this variant12. It has been suggested that the time between onset of illness and NAAT testing might vary during the progression of the pandemic, confounding attempts to compare viral loads for different variants35. In our study there was a consistent median 4-day difference between illness onset and the collection date of the swab across all identified lineages thus comparisons of viral load are not confounded by this potential source of bias. Of note, samples with undetermined lineages had a larger median 8-day interval (IQR 6, 12) between illness onset and NAAT swab which may have resulted in the sample being taken at a time of reduced viral load making it more difficult to assign a lineage to the event.

The limitations of these data are that the sample size was determined by the number of samples from which a sequence sufficient to define lineage could be generated and was not sufficient to enable comparisons of efficacy between lineages. The evolution of the virus over time and between geographically distant trial sites resulted in a dataset with limited numbers in some lineage groups for efficacy analysis. Our study sites were situated in the South and East of Brazil, which may explain the relatively small proportion of Gamma (P.1) cases by the time of data cut-off. Phylogenetic evidence suggests this lineage arose in North West Brazil and a corresponding delay in observations from other parts of the country would be expected in line with epidemic spread12. In addition, the trial participants were unblinded as to allocation arm to allow participants to be vaccinated once efficacy was established, as requested by the ethics committee, thereby necessarily truncating the participants’ ongoing inclusion for efficacy analysis. The unblinding of the study occurred at a time when Gamma (P.1) infections were growing rapidly in our study site areas. There were 18 cases of Gamma (P.1) included in the efficacy analysis and 160 that occurred after unblinding, which could not be included in the efficacy analyses (supplementary table S2). However, every effort was made to assign a lineage for relevant samples obtained prior to unblinding by using a novel allele specific PCR method, and missing data were imputed in a sensitivity analysis which yielded similar efficacy estimates to the complete case analysis. Our trial participants were also predominantly younger (<56 years) with relatively few co-morbidities; despite this, there was still evidence of protection against severe disease and death.

National roll-out of 2 COVID-19 vaccines, the Sinovac Biotech Ltd and Oxford/AstraZeneca vaccines, began in Brazil in January 2021, prior to study unblinding, and further vaccines have been subsequently approved for use. More than 20.1% of the population (total population ~212 million) had received at least one dose of a COVID-19 vaccine by 25th May 2021. Vaccine effectiveness studies are underway to evaluate real world impact on the pandemic in Brazil as vaccine trial efficacy of first-generation vaccines in most settings will no longer be attainable due to population vaccine roll out.

For next generation vaccines, studies to ascertain efficacy are likely to be based upon immunogenicity data showing equivalence to an as yet undefined immune correlate for protection which will be established from phase 3 trials. However, the variability of vaccine efficacy may be underpinned by genetic mismatch between the vaccine lineage and currently circulating virus36. Defining the correct immune correlate is challenging in the face of continued antigenic drift, and selection pressure from previous infection and vaccine induced immunity. Work is ongoing to establish the role of variant vaccines, heterologous schedules and booster regimes.

The likelihood that vaccine effectiveness may vary against emerging SARS-CoV-2 variants emphasises the need for the infrastructure for ongoing viral genomic surveillance. This is particularly important in countries where both viral transmission is high and vaccination coverage is limited, and may need support from international agencies.

Methods

Overview

This ongoing randomised controlled phase 3 multi-site trial of the efficacy of the ChAdOx1 nCoV-19 vaccine in Brazil began on June 23, 2020. Efficacy, safety, and immunogenicity data, including the primary and secondary outcomes of the study, as well as the full study protocol, have been previously published as part of a prespecified analysis of pooled data from the UK, Brazil, and South Africa37,38.

Study design and participants

This multi-centre study assessing the safety and efficacy of ChAdOx1 nCoV-19 vaccine was performed at six sites across Brazil (São Paulo, Rio de Janeiro, Salvador, Natal, Santa Maria, Porto Alegre) (Supplementary Fig. 3). Participants were individuals aged 18 and over at high risk of exposure to SARS-CoV-2, with healthcare workers prioritised for enrolment.

Participants were screened for inclusion and exclusion criteria, underwent medical history review, clinical observations, history-directed clinical examination and provided informed consent.

Participants were randomised 1:1 using REDCap 10.6.13 to receive either ChAdOx1 nCoV-19 (3.5–6.5 × 10^10 viral particles) or MenACWY conjugate vaccine as a control, administered as an intramuscular injection. Participants randomised to the control group received saline as their second dose. In response to emerging data from our phase 1 ChAdOx1 nCoV-19 study showing a rise in neutralising antibody with a second dose39, all participants were offered a second dose with a dose interval of between 4 and 12 weeks (median 35 days, IQR 32, 47). Participants, clinical investigators and laboratory staff were blinded to vaccine allocation. Following emergency use authorisation of ChAdOx1 nCoV-19 and an inactivated SARS-CoV-2 viral vaccine in Brazil on 17th January 2021, all trial participants were unblinded to vaccine allocation but remained in the trial and continued with follow up. Participants in the control group were offered 2 doses of ChAdOx1 nCoV-19 within the trial with a dose interval in line with the national programme or could choose to accept the inactivated viral vaccine as part of the Brazilian national immunisation programme.

Participants were asked to contact their study site if they developed any one of: fever of ≥37.8 °C, cough, shortness of breath or anosmia/ageusia. They were reminded weekly to do so throughout the trial. Symptomatic participants were invited for nasopharyngeal and oropharyngeal swabbing and a SARS-CoV-2 nucleic acid amplification test (NAAT) at their local clinical site. Samples were processed using commercial NAAT assays at local diagnostic laboratories listed in Supplementary Table S5. Swabs were shipped to Oxford for sequencing and genotyping as described in the supplementary methods section.

Outcomes

The primary objective of the trial was to evaluate efficacy of the ChAdOx1 nCoV-19 vaccine against NAAT-confirmed COVID-19. The primary outcome was virologically-confirmed, symptomatic COVID-19, defined as a NAAT-positive swab combined with at least one of: fever ≥37.8 °C, cough, shortness of breath, anosmia or ageusia. All NAAT positive cases occurring before participant unblinding and vaccination were reviewed by a blinded independent endpoint adjudication committee who assigned severity scores using the WHO clinical progression score40. Only cases adjudicated by the committee as primary outcome cases were included in the analysis. Participants continued to be followed up for SARS-CoV-2 infection after unblinding and subsequent vaccination and these cases were adjudicated by an internal adjudication committee and are not included in efficacy analyses.

Analysis by lineage is a post-hoc exploratory analysis.

Statistical methods

Participants were included in primary efficacy analyses if they were seronegative to the nucleocapsid protein at baseline, received two doses of vaccine, had follow up for at least 15 days after the second dose, and no prior evidence of infection. Cases were included in the efficacy analysis if a lineage was obtained from processing the swab taken for diagnosis, COVID-19 symptoms occurred on day 15 after the second dose or later, and before the participant was unblinded as to the vaccines they had received. In addition, some participants received a COVID-19 vaccine outside of the trial and were censored in the analysis at this time point.

Symptomatic cases occurring more than 21 days after a first dose but before the 15 day post-second dose timepoint were considered secondary endpoints for efficacy analyses.

Vaccine efficacy was defined as 100% x (1 – relative risk (RR)), where RR was estimated from an unadjusted robust Poisson model using SAS proc genmod. The log of the number of days of follow up was included as an offset in the model.
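As an illustration of the definition above, the following minimal Python sketch computes VE from aggregate counts. It relies on the fact that, for a Poisson model with a single binary arm covariate and a log follow-up offset, the estimated RR equals the crude ratio of incidence rates. The function name and the example counts are hypothetical, and the interval shown is a simple model-based Wald interval, not the robust (sandwich) variance used in the trial analysis in SAS.

```python
import math

def vaccine_efficacy(cases_vax, days_vax, cases_ctl, days_ctl, z=1.96):
    """Return (VE %, lower %, upper %) from case counts and follow-up days.

    Hypothetical helper for illustration only. Requires at least one case
    in each arm (log and 1/cases are undefined for zero counts).
    """
    # Relative risk as the ratio of incidence rates (cases per day of follow-up).
    rr = (cases_vax / days_vax) / (cases_ctl / days_ctl)
    # Model-based Wald standard error of log(RR); NOT the robust estimator.
    se_log_rr = math.sqrt(1 / cases_vax + 1 / cases_ctl)
    rr_low = math.exp(math.log(rr) - z * se_log_rr)
    rr_high = math.exp(math.log(rr) + z * se_log_rr)
    # A higher RR means lower efficacy, so the bounds swap places.
    return 100 * (1 - rr), 100 * (1 - rr_high), 100 * (1 - rr_low)

# Made-up counts, not trial data: 10 vs 40 cases with equal follow-up.
ve, ve_low, ve_high = vaccine_efficacy(10, 100_000, 40, 100_000)
print(f"VE {ve:.1f}% (95% CI {ve_low:.1f}, {ve_high:.1f})")
```

With equal follow-up in both arms the estimate reduces to 100% x (1 – cases in the vaccine arm / cases in the control arm).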

To determine if the SARS-CoV-2 lineage affected the viral load for the case, viral load data was compared across variants for swabs from cases included in the efficacy analysis, and separately from all processed swabs combined regardless of vaccines received. Viral loads were compared using the Kruskal-Wallis test.
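For readers unfamiliar with the Kruskal-Wallis test, the sketch below computes its H statistic (with the standard tie correction) in plain Python. It is an illustration of the technique only, not the code used for the trial analysis; the function name and the example values are hypothetical. In practice a statistics package would also supply the p-value, which is obtained by comparing H with a chi-squared distribution on (number of groups – 1) degrees of freedom.

```python
from collections import Counter

def kruskal_wallis_h(groups):
    """Return (H, degrees of freedom) for a list of numeric samples.

    Hypothetical helper for illustration. Raises ZeroDivisionError if every
    observation is identical (the tie correction is then zero).
    """
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    # Mid-ranks: tied values share the average of the ranks they occupy.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    # H from the squared rank sums of each group.
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    ties = Counter(pooled)
    correction = 1 - sum(t ** 3 - t for t in ties.values()) / (n ** 3 - n)
    return h / correction, len(groups) - 1

# Made-up log10 viral loads for three lineages, not trial data.
h_stat, dof = kruskal_wallis_h([[4.1, 5.0, 5.6], [5.9, 6.3, 6.8], [7.0, 7.4, 8.1]])
print(f"H = {h_stat:.2f} on {dof} degrees of freedom")
```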

Swabs were not available from all cases as some participants accessed NAAT tests at non-study sites and at one site a freezer malfunctioned. A sensitivity analysis was conducted using multiple imputations to impute the missing lineage data from unavailable swabs under a missing at random assumption. The imputation model generated a value from a three-component multinomial (categorical) variable in which the three components corresponded to ‘Gamma (P.1) variant’, ‘Zeta (P.2) variant’ or ‘other variants’. The probabilities used to generate the imputed value were obtained from the site-specific distribution of Gamma (P.1) to Zeta (P.2) to other variants on the week the case occurred, by vaccine arm. This allowed for the chronological and geographical spread of new variants to be incorporated into the imputation, and for any potential difference in efficacy by variant to be incorporated in the imputation model. 100 imputation datasets were generated and the estimate and its standard error stored for each iteration. The 100 imputed estimates were combined using Rubin’s rules41,42 in SAS proc mianalyze.
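The two steps of this sensitivity analysis can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SAS code used in the trial: the function names, the example probabilities and the example estimates are hypothetical, and the scale on which estimates were pooled (for example log RR) is assumed rather than taken from the text. The first function draws a lineage from a categorical distribution; the second combines per-imputation estimates with Rubin's rules (pooled estimate is the mean; total variance is the within-imputation variance plus (1 + 1/m) times the between-imputation variance).

```python
import math
import random
import statistics

def impute_lineage(probs, rng):
    """Draw one lineage label from a categorical distribution {label: probability}."""
    labels = list(probs)
    return rng.choices(labels, weights=[probs[k] for k in labels], k=1)[0]

def pool_rubin(estimates, std_errors):
    """Combine m >= 2 imputed estimates; returns (pooled estimate, pooled SE)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)
    within = statistics.fmean([se ** 2 for se in std_errors])
    between = statistics.variance(estimates)  # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)

# Hypothetical site-week-arm distribution, not trial data.
rng = random.Random(2021)
week_probs = {"Gamma (P.1)": 0.2, "Zeta (P.2)": 0.7, "other": 0.1}
print(impute_lineage(week_probs, rng))

# Hypothetical log(RR) estimates and standard errors from three imputed datasets.
pooled, pooled_se = pool_rubin([-1.10, -1.05, -1.20], [0.30, 0.31, 0.29])
print(f"pooled log RR {pooled:.3f}, SE {pooled_se:.3f}")
```

A full analysis would repeat the draw for every case with a missing lineage, refit the efficacy model on each completed dataset, and derive confidence limits from the pooled standard error.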

The data cut-off date for this analysis was February 28, 2021, at which point the majority of participants in the trial were unblinded and vaccinated. Follow up continues; however, cases that accrued after unblinding and vaccination do not contribute to efficacy analyses.

Data collection was done using REDCap version 9.5.22. Statistical analysis was done using SAS version 9.4 and R version 4.0.4. Bioinformatics analysis was conducted in Python with Pandas 0.25.3. Consensus sequences were aligned using MAFFT version 7.402.

The trial was conducted according to the principles of the Declaration of Helsinki and was approved by the Brazilian National Research Ethics Committee (ref: 32604920.5.0000.5505), and the Oxford Tropical Research Ethics Committee (ref. 20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36).

The trial is registered at ISRCTN89951424.

RNA extraction and viral load quantification

For 96% of the sample set (650/676 samples, 516/518 of efficacy cohort), RNA was extracted from primary samples shipped at −80 °C from participating sites in Brazil to the University of Oxford. The remaining samples were shipped as pre-extracted RNA. SARS-CoV-2 viral RNA was extracted from swab samples using the Quick-DNA/RNA Viral kit (Zymo Research): 200 µL of sample was mixed with 200 µL of DNA/RNA shield, before being extracted according to the manufacturer’s spin column protocol. RNA was eluted in 50 µL of DNAse/RNAse-free water and frozen at −80 °C. SARS-CoV-2 RNA was quantified by real-time polymerase chain reaction (RT-PCR) using the CDC N1 oligonucleotide set (https://www.cdc.gov/coronavirus/2019-ncov/lab/rt-pcr-panel-primer-probes.html) and the Quantitect Probe RT-PCR kit (QIAGEN) in a 25 µL reaction volume containing 2 µL of extracted RNA. Oligonucleotides (ATDBio) were resuspended in ultrapure water. RT-PCR was performed on an Applied Biosystems StepOnePlus Real-Time PCR system (ThermoFisher Scientific) with the following settings: 50 °C for 30 min (reverse transcription), 95 °C for 10 min (hot-start polymerase activation), and 40 cycles of 94 °C for 15 sec (denaturation) and 60 °C for 1 min (combined annealing and extension). Intra-assay variation was controlled through use of a standard curve of synthetic RNA control 19/304 (NIBSC https://www.nibsc.org/products/brm_product_catalogue/detail_page.aspx?CatId=19/304) serially diluted from 1,000 copies/reaction to 10 copies/reaction. RT-PCR Ct values were converted to copy number/reaction using the standard curve, and to international units/mL by the conversion rate provided by NIBSC for samples with known processing volumes.
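The standard-curve step can be illustrated with a short Python sketch: Ct is regressed on log10(copies per reaction) for the diluted standards, and the fitted line is inverted to convert a sample Ct to copies per reaction. The function names and the Ct values below are hypothetical, chosen only to make the arithmetic easy to follow. The subsequent conversion to international units/mL depends on the NIBSC conversion rate and processing volumes, which are not reproduced here.

```python
import math

def fit_standard_curve(copies, cts):
    """Least-squares fit of Ct = intercept + slope * log10(copies)."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

def ct_to_copies(ct, slope, intercept):
    """Invert the standard curve to give copies per reaction for a sample Ct."""
    return 10 ** ((ct - intercept) / slope)

# Made-up Ct values for standards at 1,000, 100 and 10 copies/reaction.
slope, intercept = fit_standard_curve([1000, 100, 10], [30.0, 33.3, 36.6])
print(f"slope {slope:.2f}, intercept {intercept:.2f}")
print(f"Ct 31.5 corresponds to about {ct_to_copies(31.5, slope, intercept):.0f} copies/reaction")
```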

Sequencing

Samples with Ct<31 were taken forward for veSEQ sequencing as previously described43, using 30 µL RNA per sample as input volume and performing target capture on batches of 90 samples, alongside a series of quantification standards and positive and negative controls. Samples were demultiplexed using unique dual indexes (UDI), and read output was validated against Ct values to confirm sample integrity. Genomes were assembled from sequencing reads using the ShiverCovid pipeline v1.8 (https://github.com/BDI-pathogens/ShiverCovid) with variant frequencies calculated using shiver (tools/AnalysePileup.py)44, using default settings of no base alignment quality and maximum pileup depth of 1000000. Lineages were assigned by Pangolin version 2.4.2 (lineages version 2021-04-28) combined with phylogenetic placement within the relevant clade, using the determined consensus genome for each sequenced sample. For incomplete genomes, lineages were assigned based on presence of lineage-defining mutations for Gamma (P.1) and Zeta (P.2) in the sequencing reads (https://github.com/phe-genomics/variant_definitions/blob/main/variant_yaml/) and by genotyping as described below.

Phylogenetic reconstruction

Consensus sequences were aligned using MAFFT version 7.40245 with the default settings (algorithm FFT-NS-2, 6merpair, retree 2, weighting factor 2.7, gap opening 1.53, gap extension 0.123). Phylogenetic reconstruction was performed on the alignment consisting of consensus sequences rooted with the Wuhan-Hu-1 reference sequence (RefSeq NC_045512), using IQ-TREE version 1.6.1246, with the generalised time reversible + FreeRate model and 1000 bootstrap replicates.

Genotyping

Samples for which genome sequencing did not give a clear lineage classification, or which showed evidence of RNA degradation (as identified by unexpectedly low read yield and library fragment sizes <200b; typical median fragment size 380b), were genotyped using allele specific PCR (ASP)-based assays47. Custom Gamma (P.1) and Zeta (P.2) ASP assays were designed to identify lineage-specific and highly sensitive single-nucleotide polymorphisms (SNPs) S:K417T (Gamma, P.1) and ORF1a:L3468V (Zeta, P.2). The ASP utilizes two dye-labelled probes that differ only in the SNP location, and leverages differential binding affinities of each probe due to primer-target mismatches to genotype the SNP with a higher sensitivity than sequencing. The assays were validated using sequence-confirmed Gamma (P.1) and Zeta (P.2) samples from the present dataset, with samples from other non-Gamma/Zeta (P.1/P.2) lineages as controls (Supp. Figs. 1 and 2). ASP was performed using the Quantitect Probe RT-PCR kit (QIAGEN) in a 25 µL reaction volume containing 5 µL of extracted RNA and performed on the Applied Biosystems StepOnePlus Real-Time PCR system using a genotyping program. Gamma (P.1) and Zeta (P.2) oligonucleotide sequences and reaction concentrations are listed in Supplementary Table 5. The P.1 ASP was performed with the following settings: 50 °C for 30 min, 60 °C for 30 seconds (pre-amplification read), 95 °C for 10 min, 45 cycles of 95 °C for 15 sec, 58 °C for 20 seconds, and 60 °C for 45 seconds, and 60 °C for 30 seconds (post-amplification read). The Zeta (P.2) ASP was performed with the following settings: 50 °C for 30 min, 66.5 °C for 30 seconds (pre-amplification read), 95 °C for 10 min, 50 cycles of 95 °C for 15 sec and 66.5 °C for 1 minute, and 66.5 °C for 30 seconds (post-amplification read). 
cDNA of sequence-confirmed samples was generated using the SuperScript III First-Strand Synthesis System (ThermoFisher Scientific) according to the manufacturer’s instructions for gene-specific primers, except reverse transcription of the Zeta (P.2) cDNA controls was performed at 50 °C. Serially diluted cDNA aliquots of sequence-confirmed Gamma (P.1), Zeta (P.2), and non-Gamma/Zeta (P.1/P.2) samples were used as discrimination controls and ultrapure water served as no-template controls (NTCs). The change in fluorescent signal between pre-amplification and post-amplification reads for both dye-labelled probes was plotted on a cartesian plane. SNPs were designated based on their clustering with discrimination controls. Samples that failed to achieve a change in signal in either probe greater than those of the NTCs or lacked evidence of amplification were designated “undetermined.” Samples that were genotyped as non-Gamma/Zeta (P.1/P.2) by ASP and had no sequence data were classified as “Other lineage (non-Gamma/Zeta, P.1/P.2)”. Samples that could not be assigned a lineage by either sequencing or genotyping were classified as “Undetermined”.
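The calling logic described above can be sketched in Python as below. This is an illustrative approximation only: the text states that SNPs were designated by clustering with discrimination controls, and the nearest-centroid rule used here is an assumed stand-in for that step, not the authors' procedure. The function name, labels and fluorescence values are hypothetical, and the separate check for evidence of amplification is omitted.

```python
import math

def call_genotype(sample, controls, ntc_deltas):
    """Assign a genotype from the change in signal of two dye-labelled probes.

    sample:     (delta_probe1, delta_probe2) for the test sample
    controls:   {label: [(delta_probe1, delta_probe2), ...]} discrimination controls
    ntc_deltas: [(delta_probe1, delta_probe2), ...] no-template controls
    """
    # Undetermined if neither probe rises above the largest NTC signal change.
    ntc_max1 = max(d[0] for d in ntc_deltas)
    ntc_max2 = max(d[1] for d in ntc_deltas)
    if sample[0] <= ntc_max1 and sample[1] <= ntc_max2:
        return "undetermined"

    def centroid(points):
        return (
            sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points),
        )

    # Assumed rule: label of the closest control cluster on the cartesian plane.
    return min(controls, key=lambda label: math.dist(sample, centroid(controls[label])))

# Made-up fluorescence changes, not assay data.
controls = {
    "variant allele": [(0.1, 2.0), (0.2, 2.2)],
    "reference allele": [(2.0, 0.1), (2.1, 0.2)],
}
ntcs = [(0.05, 0.05), (0.10, 0.08)]
print(call_genotype((0.15, 1.9), controls, ntcs))
print(call_genotype((0.05, 0.05), controls, ntcs))
```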

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.