Impact of US vaccination strategy on COVID-19 wave dynamics

We employ the epidemic Renormalization Group (eRG) framework to understand, reproduce and predict the COVID-19 pandemic diffusion across the US. The human mobility across different geographical US divisions is modelled via open source flight data alongside the impact of social distancing for each such division. We analyse the impact of the vaccination strategy on the current pandemic wave dynamics in the US. We observe that the ongoing vaccination campaign will not impact the current pandemic wave and therefore strict social distancing measures must still be enacted. To curb the current and the next waves our results indisputably show that vaccinations alone are not enough and strict social distancing measures are required until sufficient immunity is achieved. Our results are essential for a successful vaccination strategy in the US.


Methodology
In this section we briefly review our methods: the open source flight data, their interplay with the eRG mathematical framework and, finally, the interplay with vaccine deployment and implementation.
www.nature.com/scientificreports/
Data description. The flight data come from the OpenSky Network, a non-profit association that provides open access to real-world air traffic control data for research purposes 31. The OpenSky COVID-19 Flight Dataset (opensky-network.org) was made available in April 2020 and is currently updated on a monthly basis, with the purpose of supporting research on the spread of the pandemic and the associated economic impact. This dataset has been used to investigate mobility in the early months of the pandemic 32 as well as the pandemic's effect on economic indicators 33.
The data provides information about the origin and destination airports as well as the date and time of all flights worldwide. For our analyses we considered domestic flights in the US only. We aggregated the data, to obtain the number of flights between all pairs of airports per day, from the beginning of April until the end of October, 2020. Subsequently, the airports in each state and the number of flights associated with them were combined, to give the number of within and between state flights, on a day to day basis for the whole period.
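The aggregation step described above can be sketched as follows. This is only an illustration under stated assumptions: the record layout (day, origin airport, destination airport), the function name `daily_state_flights` and the small airport-to-state lookup are hypothetical and are not taken from the OpenSky dataset schema.

```python
from collections import Counter

# Hypothetical airport-to-state lookup; a real analysis needs a complete table.
AIRPORT_STATE = {"KJFK": "NY", "KLGA": "NY", "KEWR": "NJ", "KATL": "GA"}

def daily_state_flights(flights, airport_state=AIRPORT_STATE):
    """Count flights per (day, origin_state, destination_state).

    `flights` is an iterable of (day, origin_airport, destination_airport).
    Flights touching an airport missing from the lookup are skipped, which
    is how non-domestic flights are dropped in this sketch.
    """
    counts = Counter()
    for day, origin, destination in flights:
        if origin in airport_state and destination in airport_state:
            key = (day, airport_state[origin], airport_state[destination])
            counts[key] += 1
    return counts

# Made-up sample records, for illustration only.
sample = [
    ("2020-04-01", "KJFK", "KEWR"),
    ("2020-04-01", "KLGA", "KEWR"),
    ("2020-04-01", "KJFK", "KLGA"),  # within-state flight
    ("2020-04-02", "KATL", "KJFK"),
    ("2020-04-02", "KATL", "EGLL"),  # international flight, dropped
]
counts = daily_state_flights(sample)
```

Within-state flights appear as keys with identical origin and destination state, so the same table gives both the within-state and the between-state daily counts.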
The number of daily infected cases, which is also used for the analysis in this paper, is provided by the open source online repository Opendatasoft (public.opendatasoft.com/explore/dataset/testing-data-covid19-usa/).
Mathematical modeling. The states within the US have different populations and demographic distributions. A state-by-state mathematical modeling is, therefore, challenged by statistical artifacts. For this reason we group the states following the census divisions (US Census Bureau), as summarized in Table 1 and illustrated in Fig. 1. Note that, contrary to the official definitions, we include Maryland and Delaware in Mid-Atlantic instead of South Atlantic. The main reason is that the population of these two states is more connected to states in Mid-Atlantic, as indicated by the diffusion timing of the virus.
Building upon our successful understanding of the COVID-19 temporal evolution 34, we apply our framework to the US case. We employ the following eRG set of first order differential equations 15 to describe the time evolution of the cumulative number of infected cases within the US divisions:
$$\frac{d\alpha_i}{dt} = \gamma_i\, \alpha_i \left(1 - \frac{\alpha_i}{a_i}\right) + \sum_{j \neq i} \frac{k_{ij}}{n_{mi}} \left(e^{\alpha_j - \alpha_i} - 1\right), \qquad (1)$$
where
$$\alpha_i(t) = \ln I_i(t),$$
with $I_i(t)$ being the cumulative number of infected cases per million inhabitants for the division $i$, $\ln$ indicating the natural logarithm and $n_{mi}$ the population of division $i$ in millions. These equations embody, within a small number of parameters, the pandemic spreading dynamics across coupled regions of the world via the temporal evolution of $\alpha_i(t)$. The parameters $\gamma_i$ and $a_i$ can be extracted from the data within each single wave. The fit methodology is described in 14,15.
In the US, it is well known that the COVID-19 pandemic started in NE and MA (mainly in New York City) and then spread to the other divisions. Thus, we define the US first wave period from March to the end of August, as shown in Fig. 2. In particular, one observes a peak of new infections in NE and MA around April, while for the other divisions the main peak occurs around July. We also observe an initial feature in the latter divisions that we did not attempt to model except for ENC (mostly located in Chicago) and WNC. For these two divisions, we treated the early feature and the main peak as two independent first wave components. The US second wave is thus associated with the episode starting in October, 2020.
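To illustrate how coupled eRG equations of this kind produce a delayed wave in a region seeded by another one, the following minimal sketch integrates them with a forward-Euler scheme. It assumes the form commonly used in the eRG literature, $d\alpha_i/dt = \gamma_i \alpha_i (1 - \alpha_i/a_i) + \sum_{j \neq i} (k_{ij}/n_{mi}) (e^{\alpha_j - \alpha_i} - 1)$, with $n_{mi}$ the population in millions. The function names and all parameter values are made up for the example and are not the fitted values of Tables 2 and 3.

```python
import math

def erg_rhs(alpha, gamma, a, k, n_m):
    """Right-hand side of the assumed coupled eRG equations."""
    n = len(alpha)
    rhs = []
    for i in range(n):
        logistic = gamma[i] * alpha[i] * (1.0 - alpha[i] / a[i])
        coupling = sum(
            k[i][j] / n_m[i] * (math.exp(alpha[j] - alpha[i]) - 1.0)
            for j in range(n) if j != i
        )
        rhs.append(logistic + coupling)
    return rhs

def integrate(alpha0, gamma, a, k, n_m, weeks, dt=0.01):
    """Forward-Euler integration; returns the state at every step."""
    alpha = list(alpha0)
    trajectory = [list(alpha)]
    for _ in range(int(round(weeks / dt))):
        rhs = erg_rhs(alpha, gamma, a, k, n_m)
        alpha = [x + dt * r for x, r in zip(alpha, rhs)]
        trajectory.append(list(alpha))
    return trajectory

def half_time(trajectory, i, a_i, dt=0.01):
    """First time (in weeks) at which alpha_i exceeds a_i / 2, else None."""
    for step, state in enumerate(trajectory):
        if state[i] > a_i / 2:
            return step * dt
    return None

# Illustrative two-division setup: division 0 is the source, division 1
# starts at the fixed point alpha = 0 and is ignited only by the coupling.
gamma = [0.5, 0.5]            # infection rates per week (made up)
a = [9.0, 9.0]                # log of final cases per million (made up)
k = [[0.0, 1e-3], [1e-3, 0.0]]
n_m = [10.0, 10.0]            # populations in millions (made up)
trajectory = integrate([1.0, 0.0], gamma, a, k, n_m, weeks=60)
```

In this toy setup the delay between the two half-times plays the role of the delay between peaks that is used to constrain the couplings $k_{ij}$.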
As a first method, working under the assumption that the US pandemic indeed originated in New York (MA), we determine the $k_{ij}$ matrix entries between the division MA and the others. The values are chosen to reasonably reproduce the delay between the main peaks of the first wave in pairs of divisions (cf. the top section in Table 2). Interestingly, with the exception of NE, the entries of the $k$ matrix are comparable to the ones in Ref. 34. For NE, a large coupling is needed due to the tight connections between the two regions, in particular New York City with the neighbouring states and Massachusetts.
Table 2. Values of the $k_{ij}$ entries among US divisions. In the top section, the values between Mid-Atlantic (MA) and the other divisions are obtained from fits of the first wave timing. In the central and bottom sections, the complete matrix (except the entries between MA and NE) is obtained using flight data for the first wave (from April 1st to May 31st) and the second (from September 1st to October 31st), respectively.
As a second method, we used the flight data to estimate the number of travellers between different divisions, under the assumption that the $k_{ij}$ matrix entries are proportional to this set of data. To have a realistic matrix for $k_{ij}$, we first take the mean number of flights from division $i$ to division $j$ during the period from April 1st to May 31st for the first wave, and from September 1st to October 31st for the second wave. Then, we multiply the number of flights by an effective average number of passengers, and normalize it by $10^6$, following the definition of $k_{ij}$ 15. For the first wave, the optimal average number of passengers is found to be 10, while for the second wave we find an optimal value of 5.
Note that these values do not correspond to the actual number of passengers in the flights: in fact, the values of the couplings $k_{ij}$ also take into account the probability that the passengers carry the infection, as compared to the average in the division of origin. A low value might suggest that the sample of passengers in a flight is less infectious than average, as people with symptoms tend not to travel. Controls at airports may also contribute to this. The key information we extract from the flight data is the relative flux of infections among different divisions.
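The conversion from flight counts to couplings described above can be sketched as follows. The helper names `mean_flights` and `coupling_matrix` and the flight numbers are hypothetical; the time unit of the mean (per day or per week) is left exactly as unspecified as in the text, and only the effective passenger numbers (10 and 5) come from the paper.

```python
def mean_flights(counts):
    """Mean number of flights over the chosen period."""
    return sum(counts) / len(counts)

def coupling_matrix(flights_by_pair, effective_passengers):
    """k_ij = mean flights from i to j * effective passengers / 10^6."""
    return {
        pair: mean_flights(counts) * effective_passengers / 1e6
        for pair, counts in flights_by_pair.items()
    }

# Made-up flight counts between division pairs, for illustration only.
flights_by_pair = {
    ("MA", "SA"): [280, 300, 320],
    ("MA", "P"): [110, 120, 130],
}
k_first_wave = coupling_matrix(flights_by_pair, effective_passengers=10)
k_second_wave = coupling_matrix(flights_by_pair, effective_passengers=5)
```

Changing the effective passenger number rescales every entry by the same factor, so the relative flux between division pairs is fixed by the flight data alone.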
The results are listed in the middle and bottom sections of Table 2. We keep the same value from the previous fit only for MA-NE. The reason behind this choice is the tight connection between the two divisions, where most of the human mobility is attributable to road transport.
By the end of November, we clearly observe a new rise in the number of infections, signalling the onset of a second wave pandemic in the US (see Fig. 2). Using our framework, we model and then simulate the second wave across the different US divisions.
Finally, to check the geographical diffusion of the virus during the various phases of the pandemic in the US, we define an indicator of the uniformity of the new case incidence 18. This indicator can be defined week by week via a $\chi^2$-like variable, given by:
$$\chi^2(t) = \frac{1}{9} \sum_{i=1}^{9} \left( \frac{I'_i(t) - \langle I'(t) \rangle}{\langle I'(t) \rangle} \right)^2,$$
where $I'_i(t)$ is the number of new cases per week in division $i$ at time $t$ and $\langle I'(t) \rangle$ the mean of the same quantity in the 9 divisions. The parameter $\chi^2$ quantifies the geographical diffusion of the SARS-CoV-2 virus in the US: the smaller its value, the more uniform the pandemic spread within the whole country. The result is shown in Fig. 3: during the first peak in April (light gray shade), the value of $\chi^2$ is large, signalling that the epidemic diffusion is localized in a few divisions; during the second peak of the first wave (gray shade), the value has dropped, signalling that the epidemic has been spreading to all divisions. Finally, the data for the ongoing second wave (dark gray shade) show that $\chi^2$ is dropping towards zero, as expected for a more diffuse incidence of infections.
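A uniformity indicator of this kind is straightforward to compute. The sketch below assumes a particular normalization, namely the mean over divisions of the squared relative deviation from the division average; this choice and the function name `uniformity_chi2` are assumptions of the example, and the weekly case numbers are made up.

```python
def uniformity_chi2(new_cases):
    """Mean squared relative deviation of weekly new cases from their average.

    `new_cases` holds one value per division for a given week. The
    normalization by the mean is an assumption of this sketch.
    """
    mean = sum(new_cases) / len(new_cases)
    deviations = [((x - mean) / mean) ** 2 for x in new_cases]
    return sum(deviations) / len(deviations)

# Made-up weekly incidences (cases per million) for 9 divisions.
localized = [900, 0, 0, 0, 0, 0, 0, 0, 0]  # outbreak in a single division
uniform = [100] * 9                         # same incidence everywhere

chi2_localized = uniformity_chi2(localized)
chi2_uniform = uniformity_chi2(uniform)
```

A localized outbreak gives a large value while a perfectly uniform incidence gives zero, which is the qualitative behaviour described for Fig. 3.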

Vaccine deployment and implementation. Various vaccines have been developed for the COVID-19 pandemic, and their deployment in the US started on December 14th (https://www.washingtonpost.com). The effect of the immunization due to the vaccine has been studied in the context of compartmental models, like SEIR 30. In our mathematical model, the simplest and most intuitive effect is a reduction of the total number of infections during a single wave, $a_i$, and/or of the effective diffusion rate of the virus, $\gamma_i$, in each division.
To validate this working hypothesis, and to understand how the vaccination of a portion of the population affects the values of $a$ and $\gamma$ in the eRG framework, we studied the effect of immunization in a simple percolation model, which has been shown to be in the same universality class as simple compartmental models 35. To do so, we set up a Monte Carlo simulation on a square grid in which each node represents an individual. Each node can be in one of four mutually exclusive states: Susceptible (S), Infected (I), Recovered (R) or Vaccinated (V). At each time step of the simulation, for each node we generate a random number $r$ between 0 and 1: if the node is in state S, is in proximity to a node in state I and $r < \gamma^*$, we switch its state to I, else it remains S; if the node is in state I and $r < \epsilon^*$, we switch its state to R, else it remains I; if the node is in state R or V, it will not change. This model reproduces the diffusion of the infection, where $\gamma^*$ is the infection probability on the lattice and $\epsilon^*$ is the recovery rate. Finally, we fit the data from the simulation to the solution of a simple eRG equation to extract $\gamma$ and $a$.
Figure 3. Evolution of the uniformity indicator $\chi^2$ over time (weekly basis). The shaded bands indicate the period when epidemic peaks are recorded.
The vaccination is implemented by setting a random fraction $R_v$ of nodes to the state V before the simulation starts. The values of $a$ and $\gamma$ as a function of the fraction of vaccinations are shown in Fig. 4: we observe that both parameters are reduced by the same percentage as the vaccinated fraction, up to $R_v$ of 25%. Above this fraction of vaccinated nodes, the simulation is unstable and the result cannot be trusted. This result nevertheless supports our expectation that the vaccination reduces both parameters $a$ and $\gamma$ proportionally.
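The lattice update rules above can be sketched in a few lines. This is a minimal illustration, not the simulation used for Fig. 4: the grid size, the single infected seed at the centre, the synchronous update, the open boundaries and the reading of "proximity" as the four nearest neighbours are all assumptions of the example, as are the function name and parameter values.

```python
import random

S, I, R, V = "S", "I", "R", "V"

def simulate(size=30, gamma_star=0.5, eps_star=0.2, r_v=0.0, steps=200, seed=0):
    """Run the S/I/R/V lattice model; return (cumulative infected, final grid)."""
    rng = random.Random(seed)
    grid = [[S] * size for _ in range(size)]
    centre = (size // 2, size // 2)
    # Vaccinate a random fraction r_v of the nodes before the run starts
    # (the seed node is excluded so that the outbreak can begin).
    nodes = [(x, y) for x in range(size) for y in range(size) if (x, y) != centre]
    for x, y in rng.sample(nodes, int(r_v * len(nodes))):
        grid[x][y] = V
    grid[centre[0]][centre[1]] = I
    cumulative = []
    for _ in range(steps):
        new = [row[:] for row in grid]  # synchronous update
        for x in range(size):
            for y in range(size):
                r = rng.random()
                state = grid[x][y]
                if state == S:
                    neighbours = ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
                    exposed = any(
                        0 <= u < size and 0 <= w < size and grid[u][w] == I
                        for u, w in neighbours
                    )
                    if exposed and r < gamma_star:
                        new[x][y] = I
                elif state == I and r < eps_star:
                    new[x][y] = R
                # Nodes in state R or V never change.
        grid = new
        cumulative.append(sum(cell in (I, R) for row in grid for cell in row))
    return cumulative, grid

curve, final_grid = simulate(size=20, r_v=0.25, steps=100, seed=1)
```

The returned cumulative curve is the quantity one would fit with the solution of a single eRG equation to extract $\gamma$ and $a$ for each vaccinated fraction.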
In a realistic scenario, the vaccination of the population can only be implemented gradually, so that the vaccination campaign has a finite duration. We can thereby assume that a fraction $R_v$ of the population is vaccinated in a time interval $\Delta t$. The rate of vaccinations is therefore $c = R_v/\Delta t$. This implies that the variation in $\gamma$, during the time interval from $t_v$ to $t_v + \Delta t$, is given by:
$$\frac{d\gamma}{dt} = -c\, \gamma(t_v),$$
where $\gamma(t_v)$ is the effective infection rate before the start of the vaccination campaign. The solution for the time-dependent effective infection rate is
$$\gamma(t) = \gamma(t_v) \left[1 - c\,(t - t_v)\right]$$
until $t = t_v + \Delta t$, after which $\gamma$ remains constant again at the reduced value $\gamma(t_v)(1 - R_v)$.
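As a small worked example, the time-dependent infection rate can be written as a piecewise function. The sketch assumes that $\gamma$ decreases linearly at the constant rate $c\,\gamma(t_v)$ during the campaign, which is the form consistent with the final value $\gamma(t_v)(1 - R_v)$ quoted above; the function name and the numerical values are illustrative only.

```python
def gamma_with_vaccination(t, gamma0, t_v, r_v, delta_t):
    """Effective infection rate at time t.

    gamma0  : infection rate before the campaign, gamma(t_v)
    t_v     : start time of the campaign
    r_v     : fraction of the population vaccinated by the end
    delta_t : duration of the campaign
    """
    c = r_v / delta_t  # vaccination rate
    if t <= t_v:
        return gamma0
    if t >= t_v + delta_t:
        return gamma0 * (1.0 - r_v)
    return gamma0 * (1.0 - c * (t - t_v))

# Illustrative values: 20% of the population vaccinated over 10 weeks.
before = gamma_with_vaccination(0.0, 0.5, t_v=2.0, r_v=0.2, delta_t=10.0)
midway = gamma_with_vaccination(7.0, 0.5, t_v=2.0, r_v=0.2, delta_t=10.0)
after = gamma_with_vaccination(20.0, 0.5, t_v=2.0, r_v=0.2, delta_t=10.0)
```

Halfway through the campaign half of the final reduction has been applied, and the function is continuous at both ends of the vaccination interval.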
To find the variation of $a(t)$ within the vaccination interval $t_v$ to $t_v + \Delta t$, we assume that the not-yet-infected individuals are vaccinated at the same rate $c$ as the total population. Thus, at any given time, the variation in the number of individuals that will be exposed to the infection, $I_{\rm exp}(t) = e^{a(t)}$, is proportional to the difference $e^{a(t)} - e^{\alpha(t)}$. This leads to the following differential equation:
$$\frac{da}{dt} = -c \left(1 - e^{\alpha(t) - a(t)}\right).$$
This equation needs to be solved in a coupled system with the eRG one. Note that the derivative is zero outside of the time interval $[t_v, t_v + \Delta t]$. In the numerical solutions for the effect of the vaccine, we will add one equation for each $a_i(t)$, assuming that the vaccination rate $c$ is the same in all divisions.
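The coupled evolution of $\alpha(t)$ and $a(t)$ can be sketched for a single isolated division. The example assumes that the proportionality constant is the vaccination rate $c$, so that $d e^{a}/dt = -c\,(e^{a} - e^{\alpha})$, i.e. $da/dt = -c\,(1 - e^{\alpha - a})$, together with a linear decrease of $\gamma$ during the campaign. The forward-Euler scheme, the function name and all numbers are choices made for this illustration.

```python
import math

def solve_with_vaccination(alpha0, a0, gamma0, t_v, r_v, delta_t, weeks, dt=0.001):
    """Euler integration of a single-division eRG equation with vaccination.

    Returns the final (alpha, a). Inter-division couplings are omitted.
    """
    c = r_v / delta_t
    alpha, a = alpha0, a0
    for step in range(int(round(weeks / dt))):
        t = step * dt
        if t < t_v:                      # before the campaign
            gamma, da = gamma0, 0.0
        elif t < t_v + delta_t:          # during the campaign
            gamma = gamma0 * (1.0 - c * (t - t_v))
            da = -c * (1.0 - math.exp(alpha - a))
        else:                            # after the campaign
            gamma, da = gamma0 * (1.0 - r_v), 0.0
        dalpha = gamma * alpha * (1.0 - alpha / a)
        alpha, a = alpha + dt * dalpha, a + dt * da
    return alpha, a

# Illustrative run: 20% of the population vaccinated over 10 weeks,
# starting 2 weeks into a wave that is already growing.
alpha_vax, a_vax = solve_with_vaccination(3.0, 9.0, 0.5, 2.0, 0.2, 10.0, 40.0)
alpha_ref, a_ref = solve_with_vaccination(3.0, 9.0, 0.5, 2.0, 0.0, 10.0, 40.0)
```

Because the factor $1 - e^{\alpha - a}$ shrinks as the wave approaches its plateau, a campaign that starts late in a wave lowers $a$ by less than the vaccinated fraction would naively suggest.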

Results
Validating the eRG on the first wave data. The epidemic data (cf. Fig. 2) show that the MA division (New York City) was first hit hard by the COVID-19 pandemic, and was followed closely by NE. The other divisions witnessed a comparable peak of new infections 3-4 months later. Note that we are using cases normalized per million to facilitate the comparison between divisions with different populations. As a first study, we want to test the eRG equations (1) against the hypothesis that the epidemic has been diffusing from MA to the other divisions. The parameters $a_i$ and $\gamma_i$ are fixed by fitting the data, as shown in Table 3. Thus, the timing of the peaks in the divisions is determined by the entries of the $k_{ij}$ matrix. Determining all 81 entries from the data is not possible, as we only have 9 epidemiological curves. Thus, we assume that only the couplings between the source MA and any other division are relevant. The results of the fits are shown in the top block of Table 2, and will be used as a control benchmark.
Except for $k_{21}$, which links NE and MA, all the other $k_{2j}$ are of order $10^{-3}$, thus confirming the range we found for the European second wave 34. The value of $k_{21}$ is of order unity, which implies that there is a stronger connection between the two divisions. This may be explained by the fact that there exists a significant flow of people between New York City and the neighbouring states (including Massachusetts) in New England. Work commutes and weekend travelling by car explain the required high number of travellers per week. Another interesting feature is the presence of a small peak of infections for ENC and WNC, around March. This feature cannot originate from the MA division, as that would imply a $k$-value of order 10, which is clearly unrealistic 15. The only viable explanation is that the epidemic hit these two divisions from abroad. On the other hand, the second peak observed around August can be explained by the interaction with MA.
The values of $k_{ij}$ are, in principle, determined by the flow of people between different divisions. Thus, we could use any set of mobility data 36 to estimate the relative numbers of the entries, while the normalization also depends on the effective infection power of the travelling individuals and can be determined from the data. With the help of mobility data, we can reduce the 81 parameters to a single one. Due to the large distances across divisions, we decided to focus on the flight data, as described in the methodology section. The values of the entries are reported in the middle section of Table 2. Note that for MA-NE we used the same value obtained from the previous fit, as the flow of people is mostly dominated by land movements.
Using this matrix of $k_{ij}$ to simulate the spread of the first wave across the country, as originating from MA, we obtain the curves in the left panels of Fig. 5. For nearly all divisions, we obtain the correct timing for the peak, with the exception of SA and ESC (for ENC and WNC, the anomaly may be linked to the presence of a mild early peak and the absence of a prominent second peak). The results are more accurate for divisions far from MA, thus validating the method, as the diffusion of the virus seems to depend on the people travelling (by air) among divisions. For SA, the predicted curve peaks substantially earlier than the data: this discrepancy may be explained by the presence of an air hub in Atlanta, GA, so that many of the passengers of flights landing there do not stop in the division but instead take an immediate connecting flight.
Understanding the second wave. The US states are currently witnessing a second wave, which is raging in all 9 divisions with comparable intensity. Previous studies in the eRG framework have uncovered two possible origins for an epidemic wave: one is the coupling with an external region with a raging epidemic 15, the second is the instability represented by a strolling phase in between waves 17,18. We have shown that the former mechanism can account for the peak structure during the first wave.
As a first step, we will try to use the same method to understand the second wave. Since travelling to the US from abroad has been strongly reduced and regulated, we will consider the divisions that witnessed a peak in July-August as the source for the second wave. To this purpose, we define a Region-X 15 as an average sum of all the divisions with a pandemic peak occurring in the July-August period. The parameters are chosen to reproduce the number of cases in the totality of the relevant 7 divisions (SA, ESC, WSC, ENC, WNC, M and P) normalized by the total population. For each division, we optimized $a_i$ and $\gamma_i$ to reproduce the current data, updated as of December 16th (cf. Table 3). For the couplings $k_{ij}$ we use the flight data, except for the usual MA-NE couplings (cf. bottom section of Table 2). Finally, the $k_{0j}$ connecting the 9 divisions to the source Region-X are computed by summing the $k$ entries between the division $j$ and the 7 divisions used to model Region-X (also derived from flight data).
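The construction of the Region-X couplings can be sketched as a simple sum over the source divisions. The function name, the dictionary layout keyed by (origin, destination), the exclusion of a division's coupling with itself and the numerical values are assumptions of this example; the seven source divisions are written with the census abbreviation WSC for West South Central.

```python
SOURCE_DIVISIONS = ("SA", "ESC", "WSC", "ENC", "WNC", "M", "P")

def region_x_couplings(k, divisions, sources=SOURCE_DIVISIONS):
    """k_0j = sum of k[(s, j)] over the source divisions s different from j.

    Pairs missing from `k` are treated as having zero coupling.
    """
    return {
        j: sum(k.get((s, j), 0.0) for s in sources if s != j)
        for j in divisions
    }

# Toy couplings between three divisions (made-up numbers).
k_toy = {
    ("SA", "MA"): 0.001,
    ("P", "MA"): 0.002,
    ("SA", "P"): 0.003,
    ("P", "SA"): 0.004,
}
k0 = region_x_couplings(k_toy, divisions=("MA", "SA", "P"), sources=("SA", "P"))
```

Since each $k_{0j}$ adds up several entries, it is naturally larger than any individual division-to-division coupling.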
The results of the eRG equations are shown in the right panels of Fig. 5 and are in good agreement with the data. The values of the $k_{0j}$ of the Region-X are one order of magnitude larger than the others. This fact can be interpreted as due to the presence of hotspots in each division, which also contribute significantly to the new wave. In other words, travelling among divisions cannot be the only factor responsible for the onset of the second wave in the US. This hypothesis can also be validated by studying the uniformity of the distribution of the new infections in various states during the three peaks, as shown in Fig. 3. Comparing the three peak regions, we see that the uniformity indicator is systematically decreasing, thus indicating a more geographically uniform presence of the virus.
It is also interesting to notice that the value of $\gamma_i$ for the second wave is systematically smaller than the infection rate during the first wave. This is in agreement with the results we found in Refs. 17,18, where we modelled the multi-wave structure of the pandemic via an instability inside each region. The result of this simple analysis supports the hypothesis that the virus is now endemic in all states in the US, thus a multi-wave pattern will continue to emerge. Travelling among states (or divisions) is less relevant at this stage.
The result of our eRG analysis shows that the current wave will end in March-April 2021. Note, however, that we have not taken into account the potentially disastrous effect of the Christmas and New Year holidays, which could lead to an increase in the infection rates. In some divisions there is an increase at the end of November, which can be attributed to the Thanksgiving holiday.
Table 3. Parameters of the eRG model for the first and second wave in the 9 divisions. For the first wave, we report the values from the fit, including the $1\sigma$ error. For the second wave, the values are chosen to reproduce the current data, updated to December 16th.
Effect of the current vaccination strategy. Following the development of multiple vaccines for the SARS-CoV-2 virus (https://www.nature.com/articles/d41586-020-03626-1), vaccination campaigns have started in many countries, including the US. This will influence the development of the current wave, and help in curbing the future ones. The vaccination campaign started on December 14th in the US (https://www.washingtonpost.com). We also know that the US has purchased 100 million doses from Pfizer (plus an additional 100 million from Moderna) (https://www.forbes.com), so that at least 20% of the population may be vaccinated in this first campaign. As of December 28, 0.64% of the total population has been vaccinated (https://ourworldindata.org) in a little over 1 week, thus in our study we will use this as a benchmark weekly rate. The data listed above define our starting benchmark for the current vaccination campaign.
To study the effect of the vaccinations, we have solved the eRG equations for the second wave, with the addition of the reduction of $a_i$ and $\gamma_i$, as detailed in the methodology section. We show the result for two sample divisions in Fig. 6 (dashed curves) as compared to the same solutions without vaccines (solid curves). A vaccination at a 0.64% rate per week does not affect the peak of new infections. As a reference, we also increased the vaccination rates to 1% and 2%: in these cases, an important flattening of the epidemic curve can be observed for SA, where the vaccination started early compared to the peak of infections. This situation may be realized, as the vaccine is being administered to the population that is more at risk of being infected by the virus. In the other extreme case, represented by WNC, the vaccine is ineffective in changing the current wave because the peak had already been attained before the vaccination campaign started.
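A quick piece of arithmetic helps to put the three benchmark rates in perspective. Assuming a constant weekly rate and a campaign that stops once 20% of the population is vaccinated (the target used for the scenarios of Fig. 6), the duration of the campaign follows directly; the function name is hypothetical.

```python
def weeks_to_target(weekly_rate, target=0.20):
    """Weeks needed to vaccinate `target` of the population at a constant rate."""
    return target / weekly_rate

# Benchmark weekly rates considered in the text: 0.64%, 1% and 2%.
durations = {rate: weeks_to_target(rate) for rate in (0.0064, 0.01, 0.02)}
for rate, weeks in durations.items():
    print(f"{rate:.2%} of the population per week: about {weeks:.0f} weeks")
```

At the benchmark rate the campaign lasts roughly 31 weeks, far longer than the time remaining before the peak of the wave, whereas at 2% per week it would take about 10 weeks.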
Our results confirm that the current vaccination strategy, which is performed during a peak episode, is not effective in substantially slowing down the spread of the virus. On the other hand, its effectiveness for future waves is not in question. It would, in fact, be very efficient to administer the vaccine to a larger portion of the population before the start of the next wave.
Update of the vaccination to the first quarter 2021. As shown in the right column in Fig. 5, our simulation of the second wave, done in mid December 2020, reproduces very well the epidemiological data up to March 17, 2021. The only exception is Pacific, which has seen a sharper drop in the number of new infections. Furthermore, one can clearly see a rebound in January that can be attributed to the Christmas holidays. Nevertheless, this small effect does not have a significant impact on the agreement of our prediction with the data.
In the first quarter of 2021, the vaccination campaign has also taken off steadily, with nearly a quarter of the US population having received at least one shot of vaccine. Furthermore, since February 27 the FDA has authorised the use of the Janssen mono-dose vaccine (https://www.fda.gov/emergency-preparedness-and-response/coronavirus-disease-2019-covid-19/janssen-covid-19-vaccine), which is now being administered together with the two-dose Pfizer and Moderna vaccines. The data show that the rate of vaccinations has been increasing approximately linearly with time, thus we updated the prediction to take into account a vaccination fraction $c(t)$ growing linearly with time, with the slopes for each division given in Table 4.
Figure 6. Evolution of the number of infections without vaccination ($c = 0$) and with a vaccination rate of 0.64%/week, 1%/week and 2%/week starting on December 14th and stopping at 20% of the population vaccinated. We show the results for two sample divisions: South Atlantic and West North Central.
The new results are shown in Fig. 7 for the 9 divisions. We consider both people that received at least one shot (partial vaccination, in dash-dotted lines) and fully vaccinated ones (dashed lines), with an offset of 4 weeks between the beginning of the two. We consider them as two extreme cases, defining a systematic error in our account of vaccinations. In all divisions, the effect is minor, as the vaccination campaign has started too close to the peak of the second wave. The only exception is Pacific, where taking into account vaccinations substantially improves the agreement with the data.
The updated results confirm that a vaccination campaign operating during a wave will not significantly affect the timing and height of the peak. Social distancing and containment measures remain necessary. Conversely, vaccinating a large portion of the population will certainly help to curb a possible next wave.

Discussion
In this paper we employ the epidemic Renormalization Group (eRG) framework in order to understand, reproduce and predict the diffusion of the COVID-19 pandemic across the US as well as the effect of vaccination strategies. By using flight data, we are able to see the changes in mobility across the divisions, and observe how these changes affect the spread of the virus. Furthermore, we show that the impact of the vaccination campaign on the current wave of the pandemic in the US is marginal. Social distancing therefore remains important. We also demonstrate that the current wave is due to the endemic diffusion of the virus. Therefore, building upon our previous results 18, in order to control the next pandemic wave the number of daily new cases per million must be around or less than 10-20 during the next inter-wave period. This conclusion is further corroborated in Ref. 37 for Europe.
We learnt that the number of infected individuals in the current wave is not affected measurably by the vaccination campaign. However, it is foreseeable that it will impact specific compartments such as the overall number of deceased individuals. Our study included an immunization rate between 0.64% and 2% of the total population each week. We also updated the results with the actual rates of vaccination in the different divisions, as of March 24, 2021. The results of our eRG model agree remarkably well with the new data from December 28, 2020, to March 17, 2021. To curb the current and the next waves, our results indisputably show that vaccinations alone are not enough and strict social distancing measures are required until sufficient immunity is achieved.
Figure 7. Results of the eRG solutions for the second wave with a vaccination campaign based on the data. Here, we consider a vaccination rate linearly increasing in time, with slopes given in Table 4. The eRG parameters are the same used for Fig. 5, based on data until December 28, 2020.