Vaccines targeting the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) have been highly effective in preventing symptomatic illness and in reducing hospitalizations and deaths from coronavirus disease 2019 (COVID-19)1,2,3,4,5,6,7. Previous studies have also suggested that vaccination may reduce viral loads in persons with breakthrough SARS-CoV-2 infection who have received at least one dose7,8,9, thus decreasing infectiousness and mitigating transmission. However, most of these studies were done before the emergence of ‘antibody-resistant’ SARS-CoV-2 variants of concern/variants of interest (VOCs/VOIs) carrying key mutations that have been shown to decrease antibody (Ab) neutralization (L452R/Q, E484K/Q and/or F490S). These include the Beta (B.1.351), Gamma (P.1), Delta (B.1.617.2), Epsilon (B.1.427/B.1.429) and Lambda (C.37) variants, but not the Alpha (B.1.1.7) variant10,11,12,13,14. Breakthrough infections have been reported in a small proportion of vaccine recipients3,15,16,17,18,19,20,21,22,23,24, yet little is known about the relative capacity of different circulating variants to escape vaccine-induced immunity and facilitate ongoing spread within highly vaccinated communities.

In San Francisco County, a sharp decline in COVID-19 cases following a 2020–2021 winter outbreak of the Epsilon variant in California10,25 preceded mass vaccination efforts (Fig. 1a). From February to June 2021, the number of cases per day continued to gradually decrease, despite a nationwide outbreak from the Alpha variant in the United States26 and the continual introduction of other VOCs/VOIs into the community25. In late June, there was an uptick in Delta variant cases presaging a surge of infections from this variant in the county and nationwide25.

Fig. 1: Overview of vaccination and SARS-CoV-2 whole-genome sequencing data from the San Francisco Bay Area.

a, Plot showing the percentage of eligible individuals in San Francisco County who had received an FDA-authorized vaccine from the beginning of mass vaccine rollout until 30 June 2021 (blue line) and the number of new SARS-CoV-2 positive cases per 100,000 people in San Francisco County (orange line). The peach-coloured area denotes the study timeframe for sample collection (1 February to 30 June 2021). b, Plot showing the proportion of sequenced genomes with identified lineages from all cases (black line) and from fully vaccinated breakthrough cases (dark red line). The dotted line shows the proportion of sequenced genomes from breakthrough cases, regardless of whether a lineage was identified. c, Plot showing the distribution of SARS-CoV-2 lineages identified using the Pangolin algorithm31 from sequenced COVID-19 cases from San Francisco County, aggregated biweekly from 1 February to 30 June 2021. d, Plots showing the distribution of Pangolin-identified SARS-CoV-2 lineages from all sequenced (top), unvaccinated (middle) and fully vaccinated breakthrough (bottom) cases. e, Plots showing the proportion of sequenced genomes carrying mutations associated with antibody resistance (L452R/Q, E484K/Q and/or F490S) among all sequenced (top), unvaccinated (middle) and fully vaccinated breakthrough (bottom) cases. f, Plots showing the proportion of sequenced genomes carrying mutations associated with enhanced infectivity (L452R/Q, N501Y/T and/or F490S) among all sequenced (top), unvaccinated (middle) and fully vaccinated breakthrough (bottom) cases. Proportions were calculated relative to the total number of sequenced cases and aggregated biweekly from 1 February to 30 June 2021.
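The biweekly proportions in panels e and f amount to placing each sequenced case into a 14-day window and dividing the flagged cases by all cases in that window. The sketch below is illustrative and is not the study's code. It assumes each case has already been reduced to a collection date and a boolean flag, for example whether the genome carries ≥1 antibody-resistance mutation. It also assumes that bins are counted from 1 February 2021. `biweekly_proportions` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import date, timedelta

STUDY_START = date(2021, 2, 1)  # first day of the sample-collection window


def biweekly_proportions(cases, start=STUDY_START):
    """Fraction of flagged cases in consecutive 14-day bins.

    `cases` is an iterable of (collection_date, flag) pairs, where `flag`
    marks, e.g., a genome carrying >=1 antibody-resistance mutation.
    Returns {bin_start_date: proportion}, ordered by bin start.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for collected, flag in cases:
        bin_index = (collected - start).days // 14
        bin_start = start + timedelta(days=14 * bin_index)
        totals[bin_start] += 1
        hits[bin_start] += int(flag)
    # Denominator is every sequenced case that fell into the bin.
    return {b: hits[b] / totals[b] for b in sorted(totals)}
```

Days 0 to 13 after the start date fall into the first bin, days 14 to 27 into the second, and so on.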

Here we performed whole-genome sequencing and viral load measurements of nasal swabs in conjunction with retrospective medical chart review from COVID-19 infected persons over a 5-month period to investigate dynamic longitudinal shifts in the distribution of SARS-CoV-2 variants over time and identify potential correlates of breakthrough infections in a progressively vaccinated community.


We performed whole-genome sequencing of available remnant mid-turbinate nasal, nasopharyngeal and/or oropharyngeal swab samples collected from 1,373 polymerase chain reaction (PCR)-positive COVID-19 cases from San Francisco County from 1 February to 30 June 2021. During this study period, the percentage of eligible persons vaccinated in the county increased from 2 to 70%, while the number of new cases per 100,000 population declined from 23 to 2 (Fig. 1a). The cohort included COVID-19 patients seen in hospitals and clinics at University of California, San Francisco (UCSF, n = 598, 43.6%) and infected persons identified by community testing in San Francisco County performed by a commercial laboratory (Color Genomics, n = 775, 56.4%). Using the U.S. Centers for Disease Control and Prevention (CDC) definition of a vaccine breakthrough infection as a positive SARS-CoV-2 RNA or antigen test ≥14 d after completion of all recommended doses of a U.S. Food and Drug Administration (FDA)-authorized vaccine27, 125 (9.1%) of infections in the cohort were vaccine breakthroughs (Extended Data Fig. 1). Of the 125 breakthrough cases, 122 (97.6%) were confirmed to have received the Pfizer-BioNTech (BNT162b2) COVID-19 mRNA, Moderna (mRNA-1273) COVID-19 mRNA, or Johnson & Johnson/Janssen (JnJ) COVID-19 viral vector (adenovirus) vaccine. The percentage of sequenced cases that were vaccine breakthroughs increased from 0% to 31.8% from February to June (Fig. 1b). Among the viruses sequenced from the 1,373 cases, 69% (945 of 1,373) were unambiguously assigned to a SARS-CoV-2 lineage (Fig. 1b and Extended Data Fig. 1). The remaining 31% were not assigned a lineage due to insufficient genome coverage, and hence were not used for variant identification or for phylogenetic analysis.
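The CDC breakthrough definition used above can be written as a simple date rule. This is a minimal sketch under several assumptions. The function name and status labels are hypothetical. The dose counts per product (2 for the two mRNA vaccines, 1 for the JnJ vaccine, keyed here by the product names BNT162b2, mRNA-1273 and Ad26.COV2.S) are illustrative. Doses recorded after the test date are ignored.

```python
from datetime import date, timedelta

# Doses needed to complete the primary series (illustrative assumption:
# two for the mRNA vaccines, one for the JnJ viral vector vaccine).
DOSES_REQUIRED = {"BNT162b2": 2, "mRNA-1273": 2, "Ad26.COV2.S": 1}
FULLY_VACCINATED_AFTER = timedelta(days=14)


def vaccination_status(test_date, vaccine=None, dose_dates=()):
    """Classify a positive test using the CDC breakthrough definition.

    Returns 'unvaccinated', 'partial', 'breakthrough' or 'unknown'
    (doses recorded but the vaccine product is not recognized).
    """
    doses = sorted(d for d in dose_dates if d <= test_date)
    if not doses:
        return "unvaccinated"
    required = DOSES_REQUIRED.get(vaccine)
    if required is None:
        return "unknown"
    if len(doses) < required:
        return "partial"
    series_completed = doses[required - 1]  # date of the final required dose
    if test_date - series_completed >= FULLY_VACCINATED_AFTER:
        return "breakthrough"
    return "partial"


if __name__ == "__main__":
    # Second mRNA dose on 22 March 2021, positive test 40 days later.
    print(vaccination_status(date(2021, 5, 1), "BNT162b2",
                             [date(2021, 3, 1), date(2021, 3, 22)]))
```

Under this rule, a test exactly 14 days after the final required dose already counts as a breakthrough, which matches the "≥14 d" wording.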

Multiple variant lineages were found to be circulating in San Francisco County during the 5-month study period (Fig. 1c,d). The distribution of study lineages among all (Fig. 1d, top) and unvaccinated (Fig. 1d, middle) cases reflected the community distribution, which was based on all 1,191 available reference genomes in the Global Initiative on Sharing All Influenza Data (GISAID) database, which also hosts SARS-CoV-2 sequences (Fig. 1c). In contrast, the distribution of study lineages among fully vaccinated cases was skewed. Antibody-resistant lineages (those containing ≥1 of the L452R/Q, E484K/Q and/or F490S mutations10,11,12,13,14), including Beta, Gamma, Delta, Epsilon and Iota, were overrepresented, and the number and proportion of Alpha variant cases were correspondingly decreased (Fig. 1d, bottom). Among all and unvaccinated cases (Fig. 1e, top and middle), the proportion of antibody-resistant variants increased from approximately 40% to 90% over the study period. Antibody-resistant lineages comprised a higher percentage of fully vaccinated cases than of unvaccinated or all cases (Fig. 1e, bottom). The proportion of variants with increased infectivity (those containing ≥1 of the L452R/Q, N501Y/T and/or F490S mutations10,11,12,13,14,28) increased over time to >95% in all, unvaccinated and fully vaccinated cases (Fig. 1f). By phylogenetic analysis, the genomes from fully vaccinated cases were intermixed with those from unvaccinated cases and broadly distributed across all the major viral subclades (Fig. 2a,b). Proportionally more of these genomes were assigned to subclades associated with antibody-resistant variants, and there was no evidence of clustering by either genetic distance (number of mutations) (Fig. 2a) or time (Fig. 2b). A multiple sequence alignment comparing 42 representative SARS-CoV-2 genomes, including 6 variant lineages and 1 non-VOC/VOI lineage, showed similar, overlapping mutation patterns between vaccinated and unvaccinated cases (Extended Data Fig. 2).
After stratifying by month, we identified statistically significant differences in the proportion of antibody-resistant variants between fully vaccinated and unvaccinated cases in April, May, and June 2021 (Fig. 2c), but not in February and March, when the numbers of vaccine breakthrough cases were very low.

Fig. 2: Phylogenetic and temporal analyses of SARS-CoV-2 genomes.

a, Phylogenetic tree showing the patterns of divergence and clade assignments of 797 sequenced SARS-CoV-2 positive genomes from California in the study, along with reference genomes from a globally representative dataset obtained from Nextstrain47 (grey branches). Study genomes from fully vaccinated (n = 103, red circles) or unvaccinated (n = 694, grey circles) cases are shown. Clades corresponding to the major SARS-CoV-2 variants identified in the study are named, including those that carry antibody-resistant mutations (dark red boldface text). The length of each branch is proportional to the number of mutations occurring on that branch. b, The same phylogenetic tree, except that the length of each branch is determined by inference of the molecular clock phylogeny using TreeTime50, which positions terminal nodes according to the sampling times and internal nodes at the most likely time of divergence. c, Pie charts showing the proportions of SARS-CoV-2 genomes containing identified mutations associated with antibody resistance (L452R/Q, E484K/Q and/or F490S) among fully vaccinated (top row) and unvaccinated (bottom row) cases per month over a 5-month period (from February to June 2021). The charts are shaded according to genomes carrying ≥1 mutation associated with antibody resistance (L452R/Q, E484K/Q and/or F490S) (red), or lacking an antibody-resistant mutation (black). Fisher’s Exact test (two-tailed) was used to calculate P values. NS, non-significant; *P < 0.05, **P < 0.01.

Among unvaccinated cases, most viruses were non-resistant variants (57% and 61% based on hospital and community testing, respectively) (Fig. 3a, left). Among vaccinated cases, by contrast, the proportions of non-resistant variants fell to 34% and 20%, respectively (Fig. 3a, right). The variant distribution in unvaccinated cases was dominated by Alpha and Epsilon. Infections by the Gamma and Delta variants, which cause more pronounced decreases in Ab neutralization than other VOCs11,29, were more frequent among fully vaccinated breakthrough infections. The variant distribution in partially vaccinated cases was similar to that in unvaccinated cases (Extended Data Fig. 3). Overall, fully vaccinated cases were significantly more likely than unvaccinated cases to be infected by resistant variants (77.6% versus 47.7%, P = 1.96 × 10⁻⁸), but not by variants associated with increased infectivity (84.7% versus 76.8%, P = 0.092) (Fig. 3b, left and Extended Data Fig. 1). A proportionally greater number of infections from antibody-resistant variants in fully vaccinated cases was also observed for the subset of immunocompetent patients (Fig. 3b, middle), who exhibited a similar distribution of variants (Fig. 3a, inner circles). This pattern was not observed for immunocompromised patients (Fig. 3b, right).

Fig. 3: Comparisons of lineage distribution and proportion of mutations associated with antibody resistance and increased infectivity in fully vaccinated and unvaccinated cases.

a, Pie charts showing the distribution of SARS-CoV-2 variant lineages in fully vaccinated and unvaccinated cases from UCSF hospitals and clinics (top row) and from Color Genomics Laboratory (bottom row). For the UCSF charts, the lighter-shaded inner circles show cases in immunocompetent patients only, while the outer circles include both immunocompetent and immunocompromised individuals. Variants carrying mutations associated with antibody resistance are highlighted in dark red boldface text. b, Pie charts showing the proportions of SARS-CoV-2 genomes carrying mutations associated with antibody resistance (L452R/Q, E484K/Q and/or F490S) and increased infectivity (N501Y/T, L452R/Q and/or F490S) in fully vaccinated (top row) and unvaccinated cases (bottom row). The pie charts include genomes corresponding to all sequenced cases containing identifiable mutations (left) and immunocompetent (middle) or immunocompromised (right) patients from UCSF hospitals and clinics. The charts are shaded according to genomes carrying ≥1 mutation associated with antibody resistance (red), ≥1 mutation associated with increased infectivity (green), or neither type of mutation (black). Fisher’s Exact test (two-tailed) was used to calculate P values. *P < 0.05, ****P < 0.0001.
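The two-tailed Fisher's exact test used for these comparisons holds the 2×2 table margins fixed. It sums the hypergeometric probabilities of every table that is no more probable than the observed one. The standard-library sketch below implements that definition from scratch. In practice, SciPy's `scipy.stats.fisher_exact(table, alternative='two-sided')` is a common choice. The cell layout (rows = vaccination status, columns = resistant or not) is an assumption, and no study counts are used.

```python
from math import comb


def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's exact test P value for the table [[a, b], [c, d]].

    Illustrative layout (an assumption): rows = fully vaccinated /
    unvaccinated, columns = antibody-resistant / not resistant.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    # Feasible values of the top-left cell when all four margins are fixed.
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Hypergeometric probability of each feasible table.
    probs = {x: comb(row1, x) * comb(n - row1, col1 - x) / total
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum all tables no more probable than the observed one; the small
    # relative tolerance keeps exact ties from being lost to rounding.
    return sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
```

For example, the classic table [[3, 1], [1, 3]] gives P = 34/70 ≈ 0.486.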

Viral RNA loads in the full patient cohort, including the 125 vaccine breakthrough cases, were estimated from cycle threshold (Ct) values obtained by quantitative PCR with reverse transcription (RT–qPCR) using a standard curve (Extended Data Fig. 4). Groups were compared by their mean Ct values. There was no difference in viral RNA loads between fully vaccinated breakthrough and unvaccinated cases, either overall (P = 0.99) or by lineage (P = 0.09–0.78) (Fig. 4a,b). Infections from VOCs/VOIs had 2× higher viral loads than those from non-VOC/VOI lineages overall (P = 0.017) (Fig. 4c). With respect to individual VOCs/VOIs, higher viral RNA loads were observed for infections by the Gamma (4×, P = 0.00076), Delta (3×, P = 0004) and Epsilon (2×, P = 0.047) variants, but not for the Alpha, Beta and Iota variants (Fig. 4d).

Fig. 4: Comparison of viral loads according to vaccination status, SARS-CoV-2 lineage and clinical symptom status.

a, Grouped box-and-whisker plots and swarm plots showing the differences in mean cycle threshold (Ct) values and calculated viral loads in copies per ml (cp ml⁻¹) between fully vaccinated and unvaccinated cases overall (left) and stratified by lineage (right). For all comparisons, the difference in viral loads was not statistically significant (P > 0.05). For the lineage plots on the right, the mean Ct values and approximate viral loads for unvaccinated and fully vaccinated cases are shown below each lineage name as μ (unvaccinated) | μ (vaccinated) and cp/ml (unvaccinated) | cp/ml (vaccinated), respectively. b, Box-and-whisker and swarm plots showing the differences in viral loads between symptomatic and asymptomatic cases, stratified by vaccination status. c, Box-and-whisker and swarm plots showing the differences in viral load between specific lineages identified as VOC/VOI and other lineages that were not designated VOCs or VOIs at the time of this study (‘non-VOC/VOI’). Identified VOCs/VOIs included Alpha, Beta, Gamma, Delta, Epsilon and Iota variants, following the World Health Organization (WHO) nomenclature scheme. d, Box-and-whisker and swarm plots showing differences in viral loads between each VOC/VOI and non-VOC/VOI. For all box-and-whisker plots, the box outlines denote the interquartile range (IQR), the solid line inside the box denotes the median, the dashed line inside the box denotes the mean (μ) Ct value, and the whiskers extend to the minimum and maximum values. Welch’s t-test was used for significance testing. A standard curve was used to determine the estimated viral load for a corresponding Ct value (Extended Data Fig. 4).
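Welch's t-test compares two group means without assuming equal variances and uses the Welch–Satterthwaite approximation for the degrees of freedom. The sketch below computes only the t statistic and degrees of freedom with the Python standard library. The two-sided P value then comes from the t distribution, for example via `scipy.stats.ttest_ind(x, y, equal_var=False)`. The function name and the input layout (plain lists of Ct values) are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    x, y: Ct values for two groups (e.g. vaccinated vs unvaccinated).
    statistics.variance is the n-1 sample variance; the variances of the
    two groups are not pooled.
    """
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df
```

Because Ct is inversely related to viral load, a positive t for (x, y) means that group x has the lower mean load.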

We investigated potential correlations between variant identification, clinical symptomatology, vaccine type and viral load in vaccine breakthrough infections. Retrospective medical chart review was performed for a subset of patients from UCSF hospitals and clinics with available clinical and demographic data (n = 598) (Table 1). Among the 39 breakthrough infections in this subset, the average age was 49 years (range 22 to 97), and the majority of patients were women (54%). The median interval between completion of all vaccine doses and COVID-19 breakthrough infection was 73.5 d (range 15 to 140). Of the vaccine breakthrough patients, 20 (51%) received the Pfizer-BioNTech (BNT162b2) COVID-19 mRNA vaccine, 12 (31%) received the Moderna (mRNA-1273) COVID-19 mRNA vaccine and 4 (10%) received the Johnson & Johnson/Janssen (JnJ) COVID-19 viral vector (adenovirus) vaccine (Extended Data Fig. 5).

Table 1 Clinical and demographic characteristics in vaccine breakthrough and unvaccinated cases

Nine (23%) of the vaccine breakthrough patients were immunocompromised, 28 (72%) were identified as symptomatic and 10 (26%) as asymptomatic. Among the symptomatic breakthrough infections, 6 patients (15.4%) were hospitalized for COVID-19 pneumonia, 1 patient (2.6%) required care in the intensive care unit (ICU) and none died. Among the unvaccinated infections (n = 433), 287 (66%) patients presented with symptoms and 132 (30%) reported no symptoms. Of these unvaccinated patients, 51 (11.8%) required hospitalization, 27 (6.2%) were admitted to the ICU and 5 (1.2%) died with COVID-19 reported as the primary cause of death. Among these clinical and demographic variables, only age >65 years was significantly associated with vaccine breakthrough infection as compared with unvaccinated infection (P = 0.035, odds ratio 2.36 (95% confidence interval (CI) 0.97–5.45)). Viral RNA loads were significantly higher for symptomatic than for asymptomatic infections among both unvaccinated (P = 9.8 × 10⁻⁵, ΔCt = 2.8 (95% CI 1.42–4.21), 4.5×) and vaccine breakthrough (P = 0.0014, ΔCt = 8.8 (95% CI 4.0–13.7), 164×) cases (Fig. 4b). Differences in viral RNA loads between vaccine breakthrough and unvaccinated cases were non-significant for symptomatic cases (P = 0.64, ΔCt = 0.6 (95% CI −2.0 to 3.2), 1.6×) but were significant for asymptomatic cases (P = 0.023, ΔCt = −5.4 (95% CI -9.9 – 0.09), 0.043×). Viral RNA loads of hospitalized patients with COVID-19 did not differ significantly from those of outpatients (Extended Data Fig. 6).

We sought to understand the serologic basis of some of the vaccine breakthrough infections in the study cohort. Plasma samples were available from 5 of the 39 (12.8%) patients with clinical metadata. These samples were used for qualitative testing of nucleoprotein immunoglobulin G (IgG) and spike immunoglobulin M (IgM) levels and for measurement of neutralizing Ab titres using a cytopathic effect (CPE) endpoint neutralization assay as previously described10. For 4 of the 5 patients, serially collected samples were available. Neutralizing activity was tested against cultures of a D614G-carrying non-VOC control virus and the Alpha, Beta, Gamma, Delta and Epsilon variants (Fig. 5 and Extended Data Fig. 7). Of the 4 immunocompromised patients, 3 (Fig. 5 and Extended Data Fig. 7, P1, P2 and P5) failed to mount detectable qualitative and neutralizing Ab responses to the vaccine, likely due to their immunocompromised status. The interpretation was indeterminate for the fourth immunocompromised patient (Fig. 5 and Extended Data Fig. 7, P3). Only samples from day 11 post-breakthrough or later were available for this patient, by which time the patient had probably generated a robust antibody response to the breakthrough infection. The remaining patient (Fig. 5 and Extended Data Fig. 7, P4) was immunocompetent, had received the JnJ vaccine and had also had COVID-19 before vaccination. This patient's plasma, collected 2 d after a positive RT–qPCR test on a nasopharyngeal swab, was negative for qualitative nucleoprotein IgG and spike protein IgM Ab. However, strong positivity for spike protein IgG Ab and a high neutralizing Ab titre against the D614G-carrying control virus suggested a robust antibody response to vaccination. Neutralizing Ab levels were lowest against the Beta, Delta and Epsilon variants, consistent with the patient’s breakthrough infection by Delta.

Fig. 5: Qualitative and neutralizing antibody studies in vaccine breakthrough patients.

Graphical timeline showing results of qualitative spike and nucleoprotein antibody testing and quantitative neutralizing antibody testing against 5 variants (Alpha, Beta, Gamma, Delta and Epsilon) and a ‘non-VOC/VOI’ D614G-carrying control strain at various timepoints relative to the date of positive SARS-CoV-2 testing (t = 0) for 5 patients with vaccine breakthrough infections (P1–P5). Immunocompromised patients (P1, P2, P3 and P5) are highlighted in red text. Qualitative antibody results are shown in peach-coloured boxes. IC50 (50% inhibitory dose) neutralizing antibody titres against the 5 variants and the D614G control virus are plotted as a bar graph on a log scale. The black zigzag marks a break in the timeline.


Here we used variant identification by SARS-CoV-2 whole-genome sequencing, quantitative viral load analysis and antibody studies, along with retrospective medical chart review, to compare vaccine breakthrough (n = 125, 9.1%) and unvaccinated (n = 1,169, 85.1%) cases in both community and hospitalized settings from northern California. Previous reports have shown that the distribution of VOCs/VOIs in breakthrough cases generally reflects the estimated community prevalence in the unvaccinated population3,15,16,17,18,19,20,21,22,23,24. These reports, however, investigated breakthrough cases over a limited timeframe during which typically only a single predominant lineage was circulating. The current study spanned 5 months, during which the study population became progressively vaccinated (from 2% to >70%). Over the same period, the population went through 3 successive surges of infection from the Epsilon (February–March 2021)10, Alpha (March–June 2021)26 and Delta (June 2021) variants25, with simultaneous circulation of multiple viral variants in the community. In contrast to these previous studies, we found that vaccine breakthrough infections are overrepresented by immunity-evading variants as compared with unvaccinated infections. Phylogenetic analyses revealed that the viruses in vaccine breakthrough and unvaccinated cases had similar genomes, with assignment to subclades representing all major variants circulating in the community. The overrepresentation of infection from immunity-evading variants in vaccinated cases thus most likely arises from differences in the effectiveness of neutralizing antibodies against multiple circulating lineages. Notably, a decreased proportion of vaccine breakthrough infections from the Alpha variant was observed, despite its documented higher infectivity relative to all VOCs except Delta and Gamma10,30. Decreased Alpha infections are consistent with the higher effectiveness of available SARS-CoV-2 vaccines against Alpha relative to other VOCs2,11,13,21.

The predominance of immune-evading variants among post-vaccination cases indicates possible selective pressure for antibody-resistant escape variants circulating locally over time in the vaccinated population. In particular, the Delta variant, which was the predominant circulating lineage in the United States as of July 2021, has been shown to be both more resistant to vaccine-induced immunity and more infectious than Alpha13,21,29. Our data suggest that vaccination at levels below the threshold for achieving herd immunity may increase selection for antibody-resistant variants. It is notable, however, that vaccine breakthrough infections comprised only a minority of total infections (9%, 125 of 1,373 cases). These findings are consistent with previous reports showing that vaccination is effective in decreasing viral transmission31,32,33, probably reducing the rate at which new variants emerge and spread in the community34. Among demographic and clinical factors, age was the only one significantly associated with vaccine breakthrough infection, consistent with the prioritized rollout of the vaccine in the elderly population2. Several studies have demonstrated that the vaccine remains highly effective in preventing symptomatic breakthrough infections resulting in serious illness leading to hospitalization and/or death1,2,3,4,5,6,7. Our findings are consistent with these studies, as there were fewer hospital admissions and no deaths in vaccinated patients as compared with unvaccinated patients. These differences were not statistically significant, owing to low case numbers.

We also found that differences in viral RNA loads (as estimated using Ct values) between vaccine breakthrough and unvaccinated infections were non-significant (P = 0.99), regardless of lineage. A previous study of a community outbreak of Delta infections in the state of Massachusetts also found that viral loads were similar for vaccinated and unvaccinated persons with COVID-1917. These findings probably formed the basis for revised indoor mask guidance in July 2021 from the US Centers for Disease Control and Prevention35. Our results show that comparably high viral loads in vaccine breakthrough infections are not confined to Delta; indeed, the highest viral loads were observed for Gamma. Notably, viral RNA loads in symptomatic vaccine breakthrough cases were approximately 164× higher than in asymptomatic breakthrough cases (P = 0.0014), and similar to those in symptomatic unvaccinated cases (P = 0.64). However, significantly lower viral RNA loads were observed in asymptomatic breakthrough cases as compared with asymptomatic unvaccinated cases (0.043×, P = 0.023). Taken together, these data suggest that symptomatic breakthrough cases are probably as infectious as symptomatic unvaccinated cases, and thus may contribute to ongoing SARS-CoV-2 transmission, even in a highly vaccinated community. These findings thus reinforce the importance of mask-wearing recommendations for symptomatic persons to control community spread, regardless of vaccination status35. The data also suggest that asymptomatic transmission from breakthrough cases may be less efficient, given the lower viral loads. Contact tracing investigation of vaccine breakthroughs is likely needed to ascertain the role, if any, of asymptomatic transmission from vaccinated persons in SARS-CoV-2 spread.

Our antibody analyses, although performed on a small number of cases (n = 5), suggest that vaccine breakthrough cases are generally associated with low or undetectable qualitative and neutralizing antibody levels in response to vaccination. These findings are consistent with studies that have correlated high antibody levels with vaccine efficacy36. We identified 3 cases of breakthrough infection from Alpha, all in immunocompromised patients who failed to mount detectable levels of neutralizing antibody to both wild-type and VOC SARS-CoV-2 lineages. This failure to mount adequate neutralizing antibody responses may explain why a proportionally higher rate of infection by antibody-resistant variants was not observed among immunocompromised patients in the current study. We also report 1 case of Delta breakthrough infection in a patient who had contracted COVID-19 in 2020 and had also received the JnJ vaccine. As the patient was asymptomatic, we do not know exactly when the breakthrough infection occurred, which may explain why a reduced but detectable neutralizing antibody response to Delta was observed. Although singular, this case suggests that convalescent antibodies generated from previous infection may be inadequate to protect against future infection, especially against emerging antibody-resistant VOCs. It is also consistent with the reduced effectiveness of the JnJ vaccine relative to the mRNA vaccines against the Delta variant37.

There are several limitations to our study. First, breakthrough infections in this study were identified by testing persons presenting to a tertiary hospital and clinic or as part of community-based testing by a commercial laboratory, hence sampling bias may be present. This limitation is mitigated by our results showing similar variant distributions and viral load comparisons across two separate test cohorts. Second, the total number of vaccinated persons with breakthrough infections was relatively small at 125, and detailed clinical and epidemiologic metadata were available for only 39 of them. Third, the minimum sequencing depth of 5-fold was not sufficient to identify intra-host diversity and viral quasi-species subpopulations. Fourth, clinical data were obtained by retrospective medical chart review and thus may have had missing data or may have been subject to inaccurate reporting. Finally, in the absence of contact tracing metadata, we were unable to assess transmission and secondary attack rates from vaccinated persons to exposed contacts.

In summary, our results reveal that vaccine breakthrough infections are overrepresented by immune-evading variants such as Gamma11,38 and Delta11,13,29, probably due to selection pressure in a highly vaccinated community (>71% fully vaccinated as of early August 2021), and that high-titre symptomatic post-vaccination infections may be a key contributor to viral spread. Waning immunity resulting in decreased effectiveness of the vaccine in preventing symptomatic infection over time39, relaxation of COVID-19 restrictions and complacency due to ‘pandemic fatigue’, and in particular, the emergence of the Delta variant with both higher infectivity and antibody resistance13,21,29,40,41 may explain the steep rise in COVID-19 cases in San Francisco County (Fig. 1)42 and nationwide25 in July–August 2021. Targeted booster vaccinations to increase protective neutralizing antibody levels against antibody-resistant variants33,36, potentially guided by monitoring of immune correlates of vaccine efficacy36, will probably be needed in the near future to control viral spread in the community.


Human sample collection, ethics statement and public health surveillance data

Remnant nasopharyngeal and/or oropharyngeal samples and plasma samples from laboratory confirmed SARS-CoV-2 positive patients were retrieved from the UCSF Clinical Laboratories and stored in a biorepository until processed. Remnant samples were biobanked and retrospective medical chart reviews for relevant clinical and demographic metadata were performed under a waiver of consent and according to protocols approved by the UCSF Institutional Review Board (protocol numbers 10-01116 and 11-05519).

De-identified samples from community COVID-19 testing were obtained from Color Genomics Laboratory as part of a research collaboration. Vaccine breakthrough data corresponding to the de-identified samples from Color Genomics were obtained from the San Francisco Department of Public Health. Approval for sequencing and analysis of these de-identified samples and metadata was obtained from the UCSF Institutional Review Board (protocol number 11-05519).

Data regarding the number of COVID-19 cases in San Francisco County during the study period per 100,000 population and the percentage of the eligible persons in the county who were vaccinated were obtained from publicly available records42,43.

Viral whole-genome sequencing

For primary nasopharyngeal and/or oropharyngeal swab samples from UCSF hospitals and clinics, remnant samples collected in UTM/VTM were diluted with DNA/RNA shield (Zymo Research, R1100-250) in a 1:1 ratio (100 μl primary sample + 100 μl shield). The Omega BioTek MagBind Viral DNA/RNA Kit (Omega Biotek, M6246-03) and the KingFisher Flex Purification System with a 96 deep-well head (ThermoFisher, 5400630) were then used for viral RNA extraction. For mid-turbinate nasal swab samples sent to Color Genomics for commercial laboratory testing, dry swabs were collected and transported to the laboratory with no added media. At the laboratory, the swabs were resuspended in 1.3 ml lysis buffer and RNA was extracted using the Chemagic 360 system (Perkin-Elmer). Remnant RNA was then aliquoted for viral whole-genome sequencing.

Extracted RNA was reverse transcribed to complementary DNA, and tiling multiplexed amplicon PCR was performed using version 3 of the ARTIC SARS-CoV-2 primers according to a published protocol44. Adapter ligation was performed using the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs, E7645L). Libraries were barcoded using NEBNext Multiplex Oligos for Illumina (96 unique dual-index primer pairs) (New England Biolabs, E6440L) and purified with AMPure XP (Beckman-Coulter, A63881). Amplicon libraries were then sequenced on either an Illumina NextSeq 550 or a NovaSeq 6000 as 1×300 single-end reads (300 cycles).

SARS-CoV-2 viral genome assembly and analyses

SARS-CoV-2 viral genome reads were assembled and variants were identified using an in-house bioinformatics pipeline as previously described45. BCL files generated by the Illumina sequencers (NextSeq 550 or NovaSeq 6000) were simultaneously demultiplexed and converted to FASTQ files. Raw FASTQ files were first screened for SARS-CoV-2 sequences using BLASTn (BLAST+ package 2.9.0) and aligned against the Wuhan-Hu-1 SARS-CoV-2 reference genome (National Center for Biotechnology Information (NCBI) GenBank accession number NC_045512.2). Adapter sequences, ARTIC primer sequences and low-quality reads were filtered out using BBDuk (version 38.87), and the remaining reads were mapped to the NC_045512.2 reference genome using BBMap (version 38.87). Variants were called with CallVariants (version 38.87), and a read depth cutoff of 5 was used to generate the final assembly. Lineages were assigned using Pangolin software (version 3.0.2)46. Using a custom in-house script (code available at Zenodo, doi: 10.5281/zenodo.5207242), consensus FASTA files generated by the genome assembly pipeline were scanned to confirm the presence or absence of resistance-associated (L452R, L452Q, E484K and/or F490S)10,11,12,13,14 and infectivity-associated (L452R/Q, N501Y/T and/or F490S)10,11,12,13,14,28 mutations. Only genomes with defined lineages were included in this analysis. Phylogenetic analyses were performed using Nextstrain v9 (CLI version 3.0.3)47, which runs the Augur (version 13.0.2) bioinformatics pipeline consisting of MAFFT v7.45348 for alignments, IQ-TREE v1.6 for estimating maximum-likelihood phylogenies49, TreeTime (version 0.8.4) for dating and ancestral inference50, and Auspice (version 2.31.0) for tree visualization. Multiple sequence alignment of SARS-CoV-2 genomes was performed using the MAFFT aligner v7.38848 as implemented in Geneious v11.1.551.
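The mutation scan described above can be illustrated with a minimal sketch. This is not the authors' Zenodo script; it assumes that each consensus sequence is in NC_045512.2 coordinates (no indels shifting positions, low-depth sites written as `N`) and that the spike open reading frame starts at nucleotide 21563 of that reference. The function names `spike_residue` and `scan` are hypothetical. The resistance panel follows the Fig. 1 legend (L452R/Q, E484K/Q, F490S); drop E484Q to match the narrower list given in the text.

```python
from itertools import product

# Assumption: consensus is in NC_045512.2 coordinates; S ORF begins at nt 21563 (1-based).
SPIKE_START = 21563

# Standard genetic code built from the conventional TCAG codon ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

# Panels: spike residue -> (reference amino acid, mutant amino acids of interest).
RESISTANCE = {452: ("L", "RQ"), 484: ("E", "KQ"), 490: ("F", "S")}
INFECTIVITY = {452: ("L", "RQ"), 490: ("F", "S"), 501: ("N", "YT")}


def spike_residue(consensus: str, position: int) -> str:
    """Translate spike residue `position` (1-based); 'X' if the codon is ambiguous."""
    start = SPIKE_START - 1 + 3 * (position - 1)  # 0-based index of the codon's first base
    codon = consensus[start:start + 3].upper()
    return CODON_TABLE.get(codon, "X")  # codons containing N or gaps are not callable


def scan(consensus: str, panel: dict) -> tuple:
    """Return (detected mutations such as 'L452R', residues that could not be called)."""
    detected, uncalled = [], []
    for pos, (ref, alts) in sorted(panel.items()):
        aa = spike_residue(consensus, pos)
        if aa == "X":
            uncalled.append(pos)
        elif aa in alts:
            detected.append(f"{ref}{pos}{aa}")
    return detected, uncalled
```

Reporting uncalled residues separately, rather than treating them as wild type, keeps "absent" distinct from "not covered" when sequencing depth is insufficient at a site.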

RT–qPCR and viral load analysis

The TaqPath COVID-19 Combo Kit (ThermoFisher) was used to determine Ct values. This multiplex real-time RT–qPCR assay targets the SARS-CoV-2 nucleocapsid (N), spike (S) and ORF1ab genes. For simplicity, only the N gene Ct value was used to quantify RNA viral loads in this study. A standard curve was generated by serially diluting a SARS-CoV-2 positive control of known concentration in triplicate and determining the N gene Ct value at each concentration. The log of each known concentration was plotted against its Ct value, the data were fitted by linear regression, and the correlation coefficient was calculated. Viral loads in copies per ml were then interpolated from the derived standard curve.
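A minimal sketch of the standard-curve step, assuming the usual orientation in which Ct is regressed on log10 concentration (Ct = slope × log10(copies/ml) + intercept) and the fitted line is inverted to interpolate viral load. The function names and example values are hypothetical and are not data from this study.

```python
import math


def fit_standard_curve(copies_per_ml, ct_values):
    """Least-squares fit of Ct = slope * log10(copies/ml) + intercept.

    Returns (slope, intercept, r), where r is the Pearson correlation coefficient.
    """
    x = [math.log10(c) for c in copies_per_ml]
    y = list(ct_values)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)  # negative: Ct falls as template concentration rises
    return slope, intercept, r


def ct_to_copies(ct, slope, intercept):
    """Interpolate viral load (copies/ml) from an N gene Ct by inverting the fitted line."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical triplicate tenfold dilution series (1e3 to 1e7 copies/ml) on an ideal line.
concentrations = [10 ** k for k in range(3, 8) for _ in range(3)]
cts = [40.0 - 3.32 * math.log10(c) for c in concentrations]
slope, intercept, r = fit_standard_curve(concentrations, cts)
```

Replicate wells are kept as separate points rather than averaged, so the fit and the correlation coefficient reflect replicate scatter.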

Antibody assays

SARS-CoV-2-specific antibodies were measured using the Abbott ARCHITECT SARS-CoV-2 IgG (N-based), AdviseDx SARS-CoV-2 IgM (spike receptor-binding domain (RBD)-based) and AdviseDx SARS-CoV-2 IgG II (spike RBD-based) tests according to the manufacturer’s instructions.

CPE endpoint neutralization assays using VOC lineage viruses

Cytopathic effect (CPE) endpoint neutralization assays were performed following the limiting dilution model52, using P1 stocks of a D614G-carrying control and of B.1.1.7, B.1.617.2, B.1.429, B.1.351 and P.1 lineage viruses. Convalescent patient plasma was diluted 1:10 and heat-inactivated at 56 °C for 30 min. Serial twofold dilutions of plasma were made in BSA-PBS. Plasma dilutions were mixed at a 1:1 ratio with 100 TCID50 of each virus diluted in BSA-PBS (160 μl plasma dilution and 160 μl virus input) and incubated for 1 h at 37 °C. Final plasma dilutions in the plasma–virus mixtures ranged from 1:100 to 1:12,800. Plasma–virus mixtures (100 μl) were inoculated in triplicate onto confluent monolayers of Vero-81 cells in 96-well plates and incubated at 37 °C in a 5% CO2 incubator. After incubation, 150 μl MEM containing 5% FCS was added to each well, and plates were incubated at 37 °C with 5% CO2 until consistent CPE was seen in virus control wells (no neutralizing plasma added). Positive and negative controls, cell control wells and a viral back titration to verify the TCID50 input were included. Individual wells were scored for CPE as a binary outcome (‘infection’ or ‘no infection’), and the IC50 was calculated using the Spearman–Kärber method. All steps were performed in a Biosafety Level 3 laboratory using approved protocols.
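The Spearman–Kärber calculation can be sketched as follows. With x0 the log10 of the lowest reciprocal dilution tested, d the log10 of the dilution factor (log10 2 for a twofold series) and p_i the fraction of wells without CPE at each dilution, the 50% endpoint is log10 titer = x0 − d/2 + d·Σp_i. This is a minimal sketch under stated assumptions: a constant dilution step, full protection at the lowest dilution and none at the highest. The function name is hypothetical and the well counts in the test are illustrative, not study data.

```python
import math


def spearman_karber_titer(reciprocal_dilutions, protected, total):
    """Spearman-Karber 50% neutralization endpoint, as a reciprocal plasma dilution.

    reciprocal_dilutions: increasing reciprocal dilutions with a constant fold step
    protected: number of wells scored 'no infection' at each dilution
    total: wells per dilution (3 for triplicate)
    """
    # Assumption: the series brackets the endpoint (all wells protected at the
    # lowest dilution, none at the highest); otherwise the estimate is extrapolated.
    if protected[0] != total or protected[-1] != 0:
        raise ValueError("dilution series must run from full to zero protection")
    x = [math.log10(r) for r in reciprocal_dilutions]
    d = x[1] - x[0]  # log10 dilution factor, assumed constant across the series
    p_sum = sum(k / total for k in protected)
    log_titer = x[0] - d / 2 + d * p_sum
    return 10 ** log_titer


# Twofold series from 1:100 to 1:12,800, matching the final dilutions described above.
dilutions = [100 * 2 ** i for i in range(8)]
```

For example, triplicate counts of 3, 3, 3, 2, 1, 0, 0, 0 protected wells place the endpoint halfway (on the log scale) between 1:800 and 1:1,600, a titer of about 1,131.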

Statistical analyses

Statistical analyses were performed using the Python SciPy package (version 1.5.2) and the rstatix package (version 0.7.0) in R (version 4.0.3). Mean Ct values were compared using Welch’s t-test as implemented in Python (version 3.7.10). Fisher’s exact test was used to assess associations of demographic and clinical variables with vaccination status. Box-and-whisker and swarm plots were generated using the Python matplotlib (version 3.3.2) and seaborn (version 0.11.0) packages. All statistical tests were two-sided, with a significance level of 0.05.
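For readers reproducing these tests, the dependency-free sketch below shows the Welch t statistic with Welch–Satterthwaite degrees of freedom and a two-sided Fisher's exact test on a 2×2 table. It is illustrative only: the study used SciPy and rstatix, where `scipy.stats.ttest_ind(a, b, equal_var=False)` and `scipy.stats.fisher_exact(table)` return the corresponding P values. The function names here are hypothetical, and the Welch sketch stops at the statistic and degrees of freedom (the P value requires the t distribution).

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors (n - 1 variances)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact P value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables no more probable than the observed one; the small
    # relative tolerance keeps floating-point ties from being dropped.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

Welch's test is preferred here over Student's t-test because it does not assume equal variances in the two groups being compared.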

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.