Genomic surveillance of SARS-CoV-2 in Puerto Rico enabled early detection and tracking of variants

Background Puerto Rico has experienced the full impact of the COVID-19 pandemic. Since SARS-CoV-2, the virus that causes COVID-19, was first detected on the island in March of 2020, it spread rapidly through the island’s population and became a critical threat to public health. Methods We conducted a genomic surveillance study through a partnership with health agencies and academic institutions to understand the emergence and molecular epidemiology of the virus on the island. We sampled COVID-19 cases monthly over 19 months and sequenced a total of 753 SARS-CoV-2 genomes between March 2020 and September 2021 to reconstruct the local epidemic in a regional context using phylogenetic inference. Results Our analyses reveal that multiple importation events propelled the emergence and spread of the virus throughout the study period, including the introduction and spread of most SARS-CoV-2 variants detected world-wide. Lineage turnover cycles through various phases of the local epidemic were observed, where the predominant lineage was replaced by the next competing lineage or variant after ~4 months of circulation locally. We also identified the emergence of lineage B.1.588, an autochthonous lineage that predominated in Puerto Rico from September to December 2020 and subsequently spread to the United States. Conclusions The results of this collaborative approach highlight the importance of timely collection and analysis of SARS-CoV-2 genomic surveillance data to inform public health responses.

Overall Impression: The work provides a thorough and comprehensive analysis of the main SARS-CoV-2 lineages circulating in Puerto Rico during the study period. It is well written, the data and results are mostly clearly presented, and it is worth publishing. However, the work presented provides an overall view of changing lineages in Puerto Rico and is not primarily focused on the emergence of the B.1.588 lineage as the title suggests. The Discussion can be improved by including comparisons to data from, and studies conducted in, countries other than the United States, specifically any island states.
Specific comments and suggestions: Line 99 -The authors claim that Puerto Rico is an ideal setting in which to monitor and track SARS-CoV-2 lineage introduction and spread due to it being a geographically isolated location. This claim should be substantiated with further discussion and comparisons to similar studies conducted in other similar locations.
Line 130 -Mention is made of the data included in the analyses including these imported cases. However, no further mention is made of these cases. It would be worth indicating on the relevant figure where these sequences fall within the larger dataset and discussing in the text if it is known whether any of these cases contributed to introduction events to Puerto Rico.
Page 168, Figure C -Panel C can be moved to the Supplementary Information. It is referenced only once (page 7, lines 163-165) and the difference in the groupings of the dates used in panels A and B distracts from the figure.
Line 177 -"... due to high frequency." Is this why it was considered separately, or because it is a lineage of focus due to it possibly having emerged in Puerto Rico?
Line 189 -Is it possible to comment on the number of introductions given the results of the analysis, as well as on how much earlier the lineages were detected than they would have been by traditional epidemiological methods? This is worth discussing for the main lineages presented and discussed in the paper.
Lines 220-222 -The replacement of B.1.588 and timeline stated is not clear in Figure 2 as expected given this text. Perhaps include a sub-panel showing the specific area of phylogeny being referred to? Or indicate using a graphic on the phylogeny the area?
Line 248 -Studies conducted in other countries and regions should be considered in order to better contextualize the results presented here.
Line 285 -Reference is made to "low node support". Support values over a certain threshold and those of interest should be indicated on the tree in Figure 5.
Line 347 -The use of the word "Interestingly" to describe the decline in frequency of B.1.588 observed is questionable given that Alpha was possibly introduced at that point. An attempt is made to compare the fitness and infectivity rates of B.1.588 and Alpha in the next paragraph. However, given the focus on B.1.588 in this paper, more discussion is needed on B.1.588 and its behaviour observed in other countries and comparisons of fitness, infectivity etc. to other main SARS-CoV-2 lineages.
Line 360-361 -Reference needed
Line 392 -More discussion is needed of the relevance of the island geography to the results presented (see comment for Line 99)

Reviewer #3 (Remarks to the Author): This study describes the SARS-CoV-2 genomic epidemiology in Puerto Rico (PR) from the first detection in March 2020 through to September 2021. To do this, in association with the health agencies and academic institutions, the authors sequenced 753 genomes covering all health regions in PR. Key results include detection of multiple introductions, lineage turnover over time, and the emergence of B.1.588 locally, which subsequently spread to the U.S. Finally, through detailed analysis of the Alpha and Delta sequences, the study shows extensive migration between PR, the US and the Caribbean.
As Puerto Rico has experienced a severe pandemic, the data presented here have the potential to improve understanding of regional dynamics. However, the description of the Results is not clear in many cases and sometimes does not accurately portray the Results, and there was some confusion in epidemic terminology (Major comment below), indicating the manuscript needs a thorough revision for rigour and clarity. As the study remains largely descriptive, it is unclear how the epidemic or regional migration patterns changed as the control measures changed, except for the first wave.
Major. 1. Several problems with terminology make it difficult to follow. Mainly, "epidemic peaks/peaks" is incorrectly used throughout instead of "epidemic wave". For example, in line 142 "Circulation of lineage B.1.588 declined during the **first peak** of the epidemic in the winter of 2020" should instead be "first epidemic wave in the winter of". "sub-tree" has a different meaning in phylogenetics -it should just be "tree" in most cases as these were constructed separately.

Statements in the Results that need further clarification
Results: Local epidemic and variant detection. -Lines 118-119 "During March-July 2020, the number of confirmed COVID-19 cases remained low, associated with the strict stay-at-home order." is not fully correct as substantial cases were detected in July as stated in the next sentence, indicating cases rose before lifting stay-at-home orders.
-Line 136-138 is not clear. "The initial phase of the epidemic was characterised by the detection of a wide diversity of B.1x lineages that circulated at low frequency for short periods of time, suggesting that the local epidemic was initiated by multiple introduction events. " Wide diversity is not apparent in Figure 1.
Results: Phylogenetic reconstruction of the local pandemic
-Line 195-196 "Our analysis also showed the emergence and spread of the SARS-CoV-2 variants detected in Puerto Rico." Rephrase as the spread in PR is not shown.
-Line 199-200 "The observed clustering patterns indicate multiple virus introductions with rapid and explosive expansion across the island in a short period of time." The phylogenies indicate introductions, but the 'explosive expansion' is not apparent in Figure.
Results: Detection and spread of autochthonous lineage B.1.588
-This section is better explained, except for the part specifying the origins of B.1.588 from within Puerto Rico. The root of this tree is sparsely sampled with long branch lengths, indicating that better sampling is needed in the tree Figure.
Results: Emergence of SARS-CoV-2 variants
-In this section, the authors describe the circulation of Alpha and Delta in more detail, showing extensive migration between PR and US. However, this section remains largely descriptive, with the same conclusion for both: "multiple introductions throughout x-x months propelled the emergence and transmission of this variant in the island". These could be combined, and a summary of the number of migration events, or similar quantities in relation to control measures over time, could be a meaningful presentation of the introductions.
3. Methods * Specify how the sequence alignment was treated. Specify if the sites deemed as problematic for phylogenetics have been removed. * Yang96 was used for dating -specify why this codon-based model, and was the alignment trimmed to codon regions? "concatenated" in line 707, should be changed to "combined".
Delete "inference" in Line 711.

2 Line 130 -Mention is made of the data included in the analyses including these imported cases. However, no further mention is made of these cases. It would be worth indicating on the relevant figure where these sequences fall within the larger dataset and discussing in the text if it is known whether any of these cases contributed to introduction events to Puerto Rico.
This study generated 753 complete genomes, which are represented with red dots in the phylogeny in Figure 2. Because the initial imported cases were A lineage and showed no evidence of further spread, we decided not to mark these in the phylogeny. However, we classified as imported cases those genomes obtained from patients with reported travel history or genomes closely associated with sequences from CONUS. Importation events were also inferred by detection of a wide diversity of B.1x lineages that predominated in CONUS, some of which show no evidence of further spread in Puerto Rico considering the available sampling at the time. To clarify the limitation of travel history data, we added the following on lines 416-418 of the highlighted-marked version of the revised manuscript "The availability of case metadata, such as travel history, was also limited which would have facilitated an in-depth analysis on the impact of importations on the island."
The authors prefer to maintain panel C as part of Figure 1, considering the evidence it provides that multiple importations harboring a variety of genotypes were received on the island. Tick marks and columns representing 1 month are aligned between panels A and B to facilitate interpretation.

4 Line 177 -"... due to high frequency." Is this why it was considered separately or because it is a lineage of focus due to it possibly having emerged in Puerto Rico?
The authors considered B.1.588 as a separate set for both reasons listed by the reviewer. The frequency of B.1.588 genomes detected was substantially higher than that of other B.1x lineages at the time, in addition to the nature of the lineage having diverged in Puerto Rico as highlighted in this report. To clarify this, line 181 of the highlighted-marked version of the revised manuscript now reads "…due to high frequency and focus of this study."

5 Line 189 -Is it possible to comment on the number of introductions given the results of the analysis, as well as comment on how much earlier that detection of the lineages by traditional epidemiological methods? This is worth discussing and for the main lineages presented and discussed in the paper.
The authors cannot confirm with accuracy which of the genome samples come from imported cases considering the metadata available. As in the response to comment #2, the authors classified as imported cases those genomes obtained from patients with reported travel history or genomes closely associated with sequences from CONUS. Importation events were also inferred by detection of a wide diversity of B.1x lineages, some of which show no evidence of further spread considering the available sampling at the time. Ancestor reconstruction allowed the inference of the date of divergence or emergence of variants on the island. Since genomics continues to be the only method for variant detection, we cannot compare dates to detection by traditional epidemiological methods, which rely on molecular diagnostics.

6 Lines 220-222 -The replacement of B.1.588 and timeline stated is not clear in Figure 2 as expected given this text. Perhaps include a subpanel showing the specific area of phylogeny being referred to? Or indicate using a graphic on the phylogeny the area?

8 Line 285 -Reference is made to "low node support". Support values over a certain threshold and those of interest should be indicated on the tree in Figure 5.
The authors consider nodes with less than 75% bootstrap support to have low node support. Considering the compressed graphical representation of the tree, the authors prefer not to include bootstrap values in the figure. To clarify node support, the authors added the following to line 294 of the highlighted-marked version of the revised manuscript "…with low node support, less than 75% bootstrap value (Figure 5)."

9 Line 347 -The use of the word "Interestingly" to describe the decline in frequency of B.1.588 observed is questionable given that Alpha was possibly introduced at that point. An attempt is made to compare the fitness and infectivity rates of B.1.588 and Alpha in the next paragraph. However, given the focus on B.1.588 in this paper more discussion is needed on B.1.588 and its behaviour observed in other countries and comparisons of fitness, infectivity etc. to other main SARS-CoV-2 lineages.
The authors replaced the word "Interestingly" with "Curiously" because we find "curious" the switch of predominant lineage during a high transmission period, suggesting potential competition of lineages in a population whose immunological scenario was changing due to the vaccination campaign. However, the authors are not able to provide additional discussion on the behavior of B.1.588 because little is known about this lineage and it has not been reported by other groups, to date. To clarify this, we added the following statement to lines 353-354 of the highlighted-marked version of the revised manuscript "However, since this lineage was not considered a VBM, little is known about its phenotype or impact on other regions."
We also speculate that these cycles could be related to the limitations of an island geography limiting access to the island only by restricted air travel at the time…".

Reviewer #3 comments Authors' response and corrections
1 Several problems with terminology make it difficult to follow. Mainly, "epidemic peaks/peaks" is incorrectly used throughout instead of "epidemic wave". For example, in line 142 "Circulation of lineage B.1.588 declined during the **first peak** of the epidemic in the winter of 2020" should instead be "first epidemic wave in the winter of". "sub-tree" has a different meaning in phylogenetics -it should just be "tree" in most cases as these were constructed separately.
The authors agree. The correct term should be epidemic wave. We have replaced the word "peak" with "wave" throughout the manuscript. The term "sub-tree" was used to refer to a focused tree on a smaller subset of the parental dataset. To clarify this, we replaced the term "sub-tree" with "focused phylogenetic tree" throughout the manuscript as well.
2 Lines 118-119 "During March-July 2020, the number of confirmed COVID-19 cases remained low, associated with the strict stay-at-home order." is not fully correct as substantial cases were detected in July as stated in the next sentence, indicating cases rose before lifting stay-at-home orders.
The authors confirm that the steep increase in cases followed the lifting of the stay-at-home order issued by the local government, though the exact timing of the steep increase was difficult to assess considering the accessibility and precision of the data reported by the local government surveillance portal. To accommodate this uncertainty, we now start the statement in line 119 of the highlighted-marked version of the revised manuscript with "Around the time when the order was lifted…" The next sentence states that the following epidemic wave was detected in November 2020, line 122.
3 Line 136-138 is not clear. "The initial phase of the epidemic was characterised by the detection of a wide diversity of B.1x lineages that circulated at low frequency for short periods of time, suggesting that the local epidemic was initiated by multiple introduction events. " Wide diversity is not apparent in Figure 1.
The authors maintain that a wide diversity of B.1x lineages was detected during the first year of the local epidemic. All of these B.1x lineages are grouped within the "Other" category, shown as the lavender-colored bar in Figure 1B. Lines 121-122 of the highlighted-marked version of the revised manuscript reference Figure 1A and indicate that the peaks of the epidemic waves occurred in November 2020, April 2021, and August 2021. To clarify the in-text citation, line 121 now reads "…we observed 3 epidemic waves with high points in November 2020, April 2021, and August 2021." The reader can now refer to each epidemic wave in Figure 1A.