Pattern of SARS-CoV-2 variant B.1.1.519 emergence in Alaska

Alaska has the lowest population density in the United States (US) with a mix of urban centers and isolated rural communities. Alaska’s distinct population dynamics compared to the contiguous US may have contributed to unique patterns of SARS-CoV-2 variants observed in early 2021. Here we examined 2323 SARS-CoV-2 genomes from Alaska and 278,635 from the contiguous US collected from December 2020 through June 2021 because of the notable emergence and spread of lineage B.1.1.519 in Alaska. We found that B.1.1.519 was consistently detected from late January through June of 2021 in Alaska with a peak prevalence in April of 77.9% unlike the rest of the US at 4.6%. The earlier emergence of B.1.1.519 coincided with a later peak of Alpha (B.1.1.7) compared to the contiguous US. We also observed differences in variant composition over time between the two most populated regions of Alaska and a modest increase in COVID-19 cases during the peak incidence of B.1.1.519. However, it is difficult to disentangle how social dynamics conflated changes in COVID-19 during this time. We suggest that the viral characteristics, such as amino acid substitutions in the spike protein, likely contributed to the unique spread of B.1.1.519 in Alaska.

www.nature.com/scientificreports/ In this study, we report the emergence, prevalence, and viral traits of SARS-CoV-2 lineages circulating in Alaska from 2233 sequenced specimens collected between the first week of December 2020 through the last week of June 2021, after which B.1.1.519 was no longer detected. Using SARS-CoV-2 sequence data from the United States available on NCBI GenBank by the CDC and its contracted labs, we explore how the trend in B.1.1.519 and other variants observed in Alaska compared to trends across the contiguous United States. These results highlight the importance of continued genomic surveillance for aiding efforts in controlling the pandemic by providing a context to compare the potential for the next VOC emergence.

Materials and methods
Retrieving SARS-CoV-2 sequence data for Alaska. The data presented here were generated as part of the Alaska SARS-CoV-2 Sequencing Consortium which is a partnership between the University of Alaska and the Alaska Division of Public Health (AKDPH) with the aim to increase genomic surveillance of variants. On February 14th, 2022, we downloaded 2,323 sequences from samples collected between 2020-11-29 to 2021-06-26 available on GISAID for subsequent analysis 8,9 . Genome sequencing in Alaska is from a non-targeted sample of cases, which is the best available approximation of random samples despite potential disparate coverage across Alaska's economic regions. We used these data to estimate the prevalence of lineages per week on dates beginning on Sunday.
Analysis of SARS-CoV-2 sequence data for Alaska. Lineages 11 . Nextclade's quality control incorporates six individual rules based on missing data, mutations, and amino acid changes as described in the online documentation.
We collected metadata on the number of cases for Alaska from the AKDPH COVID-19 CSV Files Database Cases Data B dataset on 2022-2-14 12 . We collected 2020 population estimates for economic regions of Alaska from the Alaska Department of Labor and Workforce Development population estimates page 13 . Information on the timeline of COVID-19 policies (in terms of restrictions, opening, and vaccines) in Alaska was collected from the Johns Hopkins Coronavirus Resource Center 14 .
SARS-CoV-2 sequence data for the contiguous United States. On February 14th, 2022, we downloaded metadata available on GenBank for 1,331,799 sequences including Pangolin assignment, Isolate, and GeoLocation for all samples from the United State of America collected between 2020-11-29 to 2021-06-26 sequenced by the CDC. We then filtered out cases from Alaska, Hawaii, and US territories to limit our comparisons to the contiguous United States. We collected daily case data with seven day rolling averages from the CDC COVID Data Tracker site for the entire United States 15 .

Results and discussion
Higher prevalence of B.1.1.519 in Alaska versus the contiguous United States. We found a striking difference in the percentage of sequenced cases, a metric that can be used as an estimate of prevalence, assigned to the lineage B.1.1.519 between Alaska (Fig. 1B) and the rest of the United States (Fig. 1D)  Unlike the contiguous United States where Alpha comprised approximately 5% of sequenced cases by the end of January, in Alaska, Alpha was not consistently detected in sequenced cases until March of 2021 (Fig. 1B). By the first week of March in Alaska, B.1.1.519 had already reached an estimated 60.8% of sequenced cases compared to 4.6% in the rest of the US, which was the peak prevalence for the contiguous United States (Fig. 1B). During B.1.1.519's peak in the contiguous United States, Alpha made up 30.5% of sequenced cases. By contrast, in Alaska, B.1.1.519 reached a peak prevalence of 77.9% by the week of 4 April 2021, 5 weeks after the peak in the contiguous United States.
In terms of COVID-19 cases reported in Alaska, there was a sharp decline the first several weeks of December 2020 followed by an increase in cases through the first week of January 2021 and then another decline (Fig. 1A). During this time, cases in the United States increased until the second week of January 2021 followed by a decline through the third week of February 2021 (Fig. 1C). In both Alaska and the rest of the United States, new COVID-19 cases remained at a relatively stable rate while the cases attributed to VOC, VOI, and VBMs were rising (Fig. 1A,C). Given the complexity of factors that affect case rates, it is difficult to attribute the stabilization Alaska's vast geographic expanse and limited travel options, populations of people tend to interact more frequently within economic regions of the state. In Alaska there are six economic regions defined by the Department of Labor and Workforce Development: the Anchorage-Mat Su, Interior, Gulf Coast, Southeast, Southwest, and Northern regions in order from highest to lowest population. Based on available but limited data, we observed distinct trends in SARS-CoV-2 across economic regions of Alaska (Fig. S1). However, in our analysis, we excluded all but the two most populous economic regions, the Anchorage-Mat Su and Interior, because of the low proportion of sequenced cases in the other economic regions of the state. We can draw more robust conclu- www.nature.com/scientificreports/ sions about estimates of prevalence because these two regions sequenced greater than 5% of the newly reported cases February through June of 2021 (Fig. S2).
Between the Anchorage-Mat Su and Interior economic regions there were distinct dynamics in the prevalence of variants over time (Fig. 2). For example, Alpha was consistently detected earlier in the Anchorage-Mat Su (Fig. 2B) than the Interior (Fig. 2D), 360 miles apart. However, in the Interior, Epsilon comprised a high percentage of sequenced cases from January through March 2021 (Fig. 2D), and the emergence and persistence of B.1.1.519 differed between these two regions of the state.
The first case of B.1.1.519 in Alaska was detected from a sample collected 4 February 2021 in the Southwest region of the state. In both the Anchorage-Mat Su and Interior regions, B.1.1.519 was detected in specimens with collection dates one day apart from one another, 7 and 8 February 2021, respectively. By the end of February, B.1.1.519 comprised 37.8% of sequenced cases in the Anchorage-Mat Su region and 10.5% of cases in the Interior (Fig. 2). By February VBMs detected in the Anchorage-Mat Su included Alpha, Epsilon, and Gamma. On the other hand, in the Interior, Epsilon was the only VBM detected and comprised 38.5% of sequenced cases in February (Fig. 2D). In the Interior, it appears that Epsilon, which was first detected 6 January 2021, was able to establish itself before B.1.1.519. However, one month after B.1.1.519 was first detected in the Interior, B.1.1.519 overtook Epsilon. By April, the majority of sequenced cases were assigned to B.1.1.519 in the Interior (77.2%) and the Anchorage-Mat Su (72.6%). In both regions, there is a noticible increase in cases around this peak in B.1.1.519 prevlaence ( Fig. 2A,C). After peaking in April, prevalence of B.1.1.519 in May only marginally decreased for the Interior to 74.1% but sharply decreased for the Anchorage-Mat Su region to 38.8% (Fig. 2).
The sharp decline of B.1.1.519 in the Anchorage-Mat Su region may have occurred earlier than in the Interior due to the emergence of other variants that had established themselves in the Anchorage-Mat Su region well before the Interior. For example, in the Anchorage-Mat Su region, Alpha was detected on 20 December 2020-97 days before it was first detected in the Interior. This gave Alpha more time to establish itself within the population and consequently reached 42.9% of sequenced cases by May in the Anchorage-Mat Su region versus 9.8% in the Interior. Delta also emerged in the Anchorage-Mat Su region weeks before it was first detected in

High prevalence of B.1.1.519 in Alaska driven mutational advantage. Mutations in the SARS-
CoV-2 genomes from Alaskan cases occurred in an almost clockwise fashion (Fig. 3A). Although most of these mutations are expected to have neutral effects, some changes can confer selective advantages that enhance fitness by increasing immune evasion, transmissibility, and/or infectivity 16 . Mutations that cause amino acid changes to the spike (S) glycoprotein may confer selective advantages given the role of the S protein in COVID-19 pathogenesis and tropism. The mature S protein is cleaved into a fusion domain (S2), and a S1 domain containing an N-terminal head and the receptor binding domain (RBD) that binds to host cell human angiotensin converting enzyme-2 (hACE2) receptor. The S1 domain and RBD particularly are key targets for binding of neutralizing antibodies induced by infection or vaccines, principally IgG 17 . Given the role of the S protein in receptor binding and subsequent membrane fusion and viral entry into host cells, mutations affecting S1, especially the RBD, can impact hACE2 affinity, viral entry, infectivity, and immune evasion 18,19 . B.1.1.519 has 11 shared amino acid (AA) mutations relative to the original SARS-CoV-2 (Wuhan-1) genome, with 4 in S: T478K and D614G in the RBD, P681H in the S1/S2 cleavage site, and T732A in the S2 domain 5 . T478K also arose in B.1.617.2 (Delta) 20 and may contribute to antibody evasion 21 . P681H also occurs in B.1.1.7 (Alpha) and has a weak ability to increase proteolytic cleavage of S1/S2 in vitro 22 . The functional significance of B.1.1.519 harboring these mutations in S is unclear. In a study from the Netherlands, there was no evidence of increased viral load conferred by B.1.1.519 23 .
Interestingly, based on the regression of total substitutions over time, which exhibited an increase at a steady rate, B.1.1.519 genomes had more total substitutions across the genome (Fig. 3A, red line) than the background of non-emerging lineages (Fig. 3A, gray and black lines). B.1.1.519 had significantly more AA substitutions in the spike protein than lineages detected in Alaska prior to the week of B.1.1.519's peak prevalence, 4 April 2021, suggesting a potential competitive advantage over previously circulating lineages (B.1.1.519 = 4.18 ± 0.01, Prepeak lineages = 2.65 ± 0.10 S AA substitutions, Wilcoxon W = 42,302, p-value < 2e−16; Fig. 3B). The period of time after B.1.1.519's peak prevalence was dominated by the emergence of many VOC. Genomes sequenced after B.1.1.519's peak prevalence had significantly more AA changes in the spike protein than B.1.1.519, dominated by more divergent Alpha (Fig. 3A, yellow line) and Delta VOC (Post-peak lineages = 7.75 ± 0.07 S AA substitutions, Wilcoxon W = 1e+05, p-value < 2e−16; Fig. 3B). These VOC likely held a selective advantage that allowed them to outcompete B.1.1.519 in Alaska, a phenomenon of variant replacement observed across Alpha and Delta waves in the USA 24 .
B.1.1.519 established itself within the circulating population of SARS-CoV-2 viruses in Alaska weeks before other variants, such as Alpha, were consistently detected (Fig. 1B). The difference in prevalence of B.1.1.519 during Alpha's emergence in Alaska versus the contiguous United States, paired with the finding of more S protein amino acid changes in this variant than previous lineages in Alaska, suggests that in Alaska B.1.1.519 likely emerged and became dominant due the unique diversity of variants. In this context, the initial transmission events of B.1.1.519 led to this variant outcompeting the previously circulating lineages because of a selective www.nature.com/scientificreports/ advantage that that variant already possessed. The same trend was likely not possible in the other states because Alpha, which proved to outcompete B.

Conclusion
Using genomic data, we identified the unique dynamics of B.1.1.519 spread in Alaska compared to the contiguous United States. The distinct pattern of emergence and spread in Alaska emphasizes both how novel introductions impact regional differences in circulating SARS-CoV-2 lineages and the significance of having robust sequencing efforts in place to detect this variation. There are several resources that compile SARS-CoV-2 sequence data to help examine patterns of emergence and spread. One such resource, outbreak.info, uses genomic data from the sequence repository GISAID to estimate prevalence, emergence, and spread of SARS-CoV-2 lineages globally 4 . When considering data from this source, the highest worldwide prevalence of B.1.1.519 was in early March 2021, a month before Alaska's peak prevalence. At a country level, B.1.1.519 was found to comprise the highest percentage of sequenced genomes in Mexico (15% of sequenced cases compared to 1% of cases sequenced in the United States) over the course of the pandemic. On 14 February 2022, when this analysis was conducted, B.1.1.519 comprised 10.4% of all of Alaska's SARS-CoV-2 sequences to date. While this is high compared to other geographical locations, spatial and temporal heterogeneity in sequencing efforts is known to bias genomic surveillance and complicate comparisons between locations.
In Alaska, disparate sequencing coverage affected our ability to generate robust analyses in rural regions of the state where sampling and subsequent sequencing efforts were lower than more populated regions. Despite this disparate coverage, genomic surveillance continues to be an essential tool for informing public health decisionmaking throughout the state. Genomic surveillance paired with epidemiological findings will be key to detecting new threats as they emerge and is a critical tool in shaping policy to address the clinical impacts of local variation in circulating virus lineages. At a patient level, variant surveillance can inform which monoclonal antibodies will be most effective for early treatment to directly reduce individual mortality and morbidity. At a community level, variant proportion data paired with epidemiological findings can inform the broader population of their risks of reinfection, evaluate vaccine efficacy, and assess the need for appropriate treatment stockpiles and quantities. Isolated populations are vulnerable to variants with significant clinical outcomes that rise to local dominance. While the WHO did monitor the abundance of B.1.1.519, this variant did not receive the VOC classification as it lacked a globally spread. It is cases like these that stress the importance of genomic surveillance in places like Alaska. Genomic surveillance information will continue to be of great importance for public health policy decisions as SARS-CoV-2 becomes endemic in Alaska and around the globe.

Data availability
All data used in this study are available online through publicly open databases. SARS-CoV-2 genomes for Alaska were retrieved from the global initiative on sharing all influenza data (GISAID) (https:// www. gisaid. org/). GISAID Accession Numbers are provided in the Supplementary file. United States SARS-CoV-2 metadata was retrieved from NCBI GenBank (https:// www. ncbi. nlm. nih. gov/ sars-cov-2/). Metadata on the number of cases for Alaska is available on the Alaska COVID-19 Summary Dashboard (https:// covid 19. alaska. gov/).