Infection kinetics of Covid-19 and containment strategy

The devastating trail of Covid-19 is characterized by one of the highest mortality-to-infected ratio for a pandemic. Restricted therapeutic and early-stage vaccination still renders social exclusion through lockdown as the key containment mode.To understand the dynamics, we propose PHIRVD, a mechanistic infection propagation model that Machine Learns (Bayesian Markov Chain Monte Carlo) the evolution of six infection stages, namely healthy susceptible (H), predisposed comorbid susceptible (P), infected (I), recovered (R), herd immunized (V) and mortality (D), providing a highly reliable mortality prediction profile for 18 countries at varying stages of lockdown. Training data between 10 February to 29 June 2020, PHIRVD can accurately predict mortality profile up to November 2020, including the second wave kinetics. The model also suggests mortality-to-infection ratio as a more dynamic pandemic descriptor, substituting reproduction number. PHIRVD establishes the importance of early and prolonged but strategic lockdown to contain future relapse, complementing futuristic vaccine impact.

www.nature.com/scientificreports/ and over-reliance on infection statistics in predicting mortality rate. 20 addressed this, but it lacked the probabilistic kernel of 21 . Another issue that has often been overlooked is the best possible containment strategy in coping with the disease. Standard approaches include social distancing 23 , contact tracing 24 , social seclusion between comorbid and healthy, selfquarantine of the infected (including asymptomatic). The target in all of these is to block the epidemic spread network so that the infection chain can be broken 25 .
Vaccines have led the fight against COVID-19 26 . Multiple vaccines are now available for public use that use differential chemical pathways, e.g. mRNA replication (Pfizer 27 , Moderna 28 ), viral vectors (Oxford-AstraZeneca 29 , Sputnik V 30 ), antibody formation through attachment to spike proteins (Covaxin 31 , Sinofarm 32 ), double-stranded DNA cloning (Janssen 33 ), genetic engineering of the SARS-Cov-2 spike proteins (Novavax 34 ). The vaccine arsenal is fast getting reinforced with newer additions, all targeted to mitigate the viral load as also to provide long term immunity. While expected to be a major immunity booster going forward, given the expected timeframes of vaccine rollout and perceived mutation towards newer strains of the virus (e.g. Indian variant B.1.617 35 , South African B.1.351 36 ) that have at times restricted the efficacy rates of vaccines [34][35][36] , the major defence front will still rely on transmission mitigation through restricted movement, mask usage, sanitation codes and avoiding public gatherings, the collective impact of which could be enumerated from the PHIRVD model.

Results
Infection kinetics of healthy and comorbid susceptible.  infection propagation epidemiology clearly points to the need for analyzing the vastly different infection and mortality profiles of the healthy versus the comorbid susceptible groups. Our key target is to study this interactive infection propagation and then predict future mortality and infection profiles, emphasizing mortality as the key policy indicator. The present article is to marry a robust Susceptible(S)-Infection(I)-Recovered(R)-Vaccinated(V) (SIRV) structure, estimating the reproduction number 37 , together with a Machine Learning (ML) prediction kernel, using a multi-layered error filtration structure, to generate a predictive model called PHIRVD (see "Methods" section). PHIRVD delivers three major successes at an unprecedented level of accuracy: prediction of the number of infected and dead over the next 30 days (validated using sparse data) for each of the 18 countries considered, a comparative analysis of the impact of lockdown using multiple withdrawal dates for 6 worst-hit countries with high ongoing infection rates, and a detailed temporal profile of future reproduction numbers that can be (and have been) verified against real data. PHIRVD also establishes mortality-to-infection ratio as the key dynamic pandemic descriptor instead of reproduction number.
Mathematical model-PHIRVD. Our compartmentalised Covid-19 pandemic kinetics uses a 6-dimensional dynamical system as in Eq. (1), combining SIR and SEIR kernels 38,39 , schematically outlined in Fig. 1:  www.nature.com/scientificreports/ The parameters in this model, that we call PHIRVD, characterize the infection rate of healthy agents ( β 1 ), infection rate of agents with pre-existing health conditions ( β 2 ), relapse rate ( β 3 ), conversion rates of recovered to healthy susceptible ( q 1H ) and previously "immuned" to healthy susceptible ( q 2H ), conversion rates of recovered to pre-existing susceptible ( q 1P ) and previously "immuned" to pre-existing susceptible ( q 2P ), death rate due to non-Covid interference ( γ ), additional death rate due to agents with pre-existing conditions ( δ ) and that due to infected ( ζ ), recovery rate (w), rate at which healthy ( h 2v ) and pre-existing susceptible ( p 2v ) groups are quarantined. Our focus being Covid-19 infection and mortality statistics, we neglect death rate ( γ = 0 ) and additional death rate ( δ = 0 ) due to all non-Covid causes. Since death rate of healthy infected is a lot lower than that of the comorbid and elderly death rate (https:// www. cdc. gov/ coron avirus/ 2019-ncov/ need-extra-preca utions/ older-adults. html) 40,41 ; hence we have added a practical constraint in our model to account for this effect that expresses in the form of β 1 < β 2 . Hence, the infection rate of H-group is considered to be a small fraction ( ) of the P-group, i.e. β 1 = β 2 . The death variable D thus acts like a "sink" of the dynamical system ensuring a population conservation inbuilt within the model ( H + P + I + R + V + D = constant). The PHIRVD model can be easily extended to incorporate the impact of upcoming and available vaccines. The impact points would be at the transitory phases between prolonged lockdown, characterized by low susceptible-infected coupling, to a lockdown withdrawal, typically leading to a surge in the infection/mortality traffic, a case of human reaction to maximize social expression.
In training our model, we find it useful to define an extra variable I c (t) , which represents the cumulative number of those infected upto a given date. In other words, it includes not only those who are currently infected, but also those who have since recovered or died, i. e. dI c dt = (β 1 Hβ 2 P + β 3 R)I. Since we have considered relapse in our model, it is to be noted that I c (t) = I(t) + R(t) + D(t).

Data repositories.
Identifying the infection kinetics of Covid-19 as an interactive evolution process involving six time evolving population density variables: healthy susceptible (H), susceptible with pre-existing conditions or comorbidity (P), infected (I), recovered (R), naturally immuned (i.e. a clone for vaccinated V) and dead (D), the PHIRVD model uses statistics from the Johns Hopkins Covid-19 database 42 to accurately predict mortality and infection statistics of 18 Asian, European and American countries. Data threshold was set beyond the first 19 days of low (or no) infection, followed by data training between 10 February 2020 to 29 June 2020. Results were later cross-verified from other databases e.g. US: https:// usafa cts. org; EU: https:// data. europa. eu/; UK: https:// coron avirus. data. gov. uk/; India: https:// www. covid 19ind ia. org/. The Bayesian Markov Chain Monte Carlo (MCMC) 43 infrastructure in PHIRVD trains the repository data to probabilistically predict the 17 parameters of the infection kinetic model (see "Methods" section). Unlike previous predictive Machine Learning models 14,[19][20][21][22] , this structure allows more dynamic adaptive control of the infection kinetic estimation resulting in a highly accurate predictive module.
Mortality and infection: prediction against reality. The 18 countries or regions under study were divided into 4 infection classes, the first three based on decreasing mortality-to-infection ratio for countries past their infection peak: UK, Netherlands, Sweden, New York State (Class A); Germany, Korea, Australia, Russia, Vietnam (Class B); and Italy, Spain, Hubei (Class C). Class Class D comprises India, Poland, Iran, France, Portugal and Brazil, with ongoing infection regimes. We deliberately chose New York State instead of the entire United States due to its high population density and tourist/ worker traffic that is quite different from the national average.
With the number of reported cases being highly dependent on the number of daily testings, not necessarily in agreement with the actual disease propagation dynamics, we observe some deviations between the simulated I(t) and the actual number of reported cases. On the other hand, D(t) is less affected by the testing rate. Since we are using mortality statistics with the same weightage as the infected data, we prioritize mortality prediction. We note that daily training of any epidemiological model will invariably achieve better data match, as many studies have shown. However, our ML embedded propagation kinetic model thrives on long term predictions, as much as possible.
Comparative statistics for our Class A representative, the UK, is shown in Fig. 2. The blue region marks the training zone that fixes the parameters. Based on the highest mortality to infection ratio in each group, the representative countries for the other 3 classes are Germany (Class B), Italy (Class C), India (Class D). Figures 3, 4 and 5 represent infection statistics for Class B (Germany), C (Italy) and D (India) respectively (other plots in Appendix II). Chi-square tested (see "Methods" section for Chi-squared statistic used) accuracy chart in Table 1 clearly points to the veracity of the accuracy claim made. On the other hand, Vietnam presents an interesting case. With a reported zero mortality rate notwithstanding high population density, it has been repeatedly cited as an example of early quarantine success. The model tracks even such an exceptional case to a moderate level of accuracy (in Appendix II). The outsets and insets respectively outline the cumulative versus the daily infection traffic. Details for other countries, for 4 infection classes, are provided in Appendix II. Table 2 presents a comparative chart of the PHIRVD model predictions versus real data, separately for the numbers of infected and dead, for countries representing the 4 classes with data trained between 10 February to 29 June: Class A (UK), Class B (Germany), Class C (Italy) and Class D (India). Futuristic prediction is shown until 12 July. For other countries in each individual class, with data training between 10 February to 10 May, 30 days' prediction until 9 June establishes the predictive strength of this model (see Tables S2-S5, Appendix III), error validated as shown in Table S1 (see Appendix I). Table 3 compares second wave mortality prediction obtained from PHIRVD against real data, based on data training until 29 June 2020. The result can be substantially improved if data is trained within a month of the www.nature.com/scientificreports/ resurgent wave, as in November 2020. But the reliability of prediction stretching up to 150 days beyond last data training is unprecedented to our knowledge and affirms the robustness of the model. The expected number of secondary cases produced from each infected individual is traditionally defined as the basic reproduction number. The detailed calculation of R e is provided in the "Methods" section. Figure 6 depicts the time evolution of basic reproduction number that indirectly reflects the emerging infection (and fatality) rate for the 4 representative countries from infection classes A-D, represented by the basic reproduction number R 0 44-46 (see "Methods" section). R 0 kinetics of all other countries are provided in Appendix I. Class A countries consistently show the sharpest drop in R 0 and the flattest stability period, followed by progressive R0 decay and waiting time, often the 'gestation time' , reflected by the plateau regions of the respective plots for classes B, C and D respectively. The point of note here is that while Germany and Italy show higher levels of infection than the UK, the gestation period for the UK is a lot larger than both. India shows a similar trend although the absolute numbers for India are a lot lower than the other three, indicating a complicated relationship between Full Width at Half Maximum (FWHM) and gestation period.

Discussion
Combining conventional infection kinetic modeling with a predictive Bayesian MCMC, PHIRVD quantifies the impact of lockdown as a containment tool. It estimates mortality statistics with high significance for 18 countries, accurate upto the next 30 days, beyond the last date of data training. Ideal lockdown imposition and withdrawal times have been predicted and validated, including for ongoing regimen e.g. India. PHIRVD also predicts secondary relapse timings and establishes mortality-to-infection ratio as the key pandemic predictive descriptor instead of reproduction number. PHIRVD is also capable of analyzing the impact of migration, an  /20  403  472  155  91  376  126  14  12  142  121  23  23  18641  17298  507  570   01/07/20  60  455  176  88  475  121  5  11  182  116  21  22  19160  17471  434  579   02/07/20  4  439  89  84  477  117  11  11  201  110  30  21  20903  17628  379  588   03/07/20  502  424  136  81  410  112  4  11  223  106  15  20  22771  17767  442  596   04/07/20  624  408  67  79  418  108  10  10  235  101  21  19  24850  17889  613    www.nature.com/scientificreports/ ongoing project. Our findings clearly suggest that phased lockdown is a potent containment tool but needs to be strategically imposed, where the correct implementation and withdrawal times are paramount. Secondary infection and mortality prediction will be key to future strategic quarantine imposition and analyzing impact of future therapeutics. PHIRVD leads to three key outcomes. First, we present highly accurate probabilistic predictions for the numbers of infected and dead for each country for a total of 18 countries, typically 3 weeks beyond the last date of (Machine Learned) data training. Our PHIRVD model depicts a high degree of reliability between model prediction against real data validation across the range of countries considered.
Our model can also be used to identify a better strategy for lockdown imposition, to minimize the fatality. The full simulations plots (in Appendix II) clearly outlines how an increasing infection profile initially matches with decreasing numbers of pre-existing susceptible and increasing statistics for the recovered, that then slows down as the infection peak arrives, eventually to tail off in to a no-infection landscape. While the qualitative trends are similar for all classes (A, B, C, D) of countries, the impact of lockdown on the first peak, and then a second (relapse) peak, hint at the internal health versus econometrics of the countries concerned. To prove this point, we compare infection (and mortality) propagation kinetics of 2 chosen countries for two different dates, one on the recess (UK: Fig. 7), the other with uprising infection level (India: Fig. 8). As opposed to the recent furore about school children being exposed to the Covid-19 menace as a result of early lockdown withdrawal, our result clearly shows that there is practically no difference in mortality between a withdrawal on June 1, 2020 as against a later withdrawal e.g. July 1, 2020 (although a withdrawal on May 1 would have been disastrous). The 1 June (almost equally safe) withdrawal would, of course, be favoured on economic and social grounds.
The third key outcome of our analysis is the establishment of mortality:infection ratio as the key descriptor of pandemic over and above reproduction number, that has conventionally been used for the purpose. The proof of this is in the accurate prediction of the secondary infection relapse time that the reproductive number fails to predict. As can be seen from Fig. 7a,b, this relapse time period could be deferred with a late lockdown withdrawal on July 1 (as compared to June 1) although the peak mortality rates are not hugely different (ca 200 at 1 July compared to ca 400 at 1 June). Using 1 July 2020 as the UK lockdown withdrawal date, there is a clear signature of secondary relapse in the first week of September (identified as the second peak in Fig. 7). The Indian situation is clearly more challenging, though, as shown in Fig. 8. While perhaps economically unsustainable, India could benefit with a lockdown even beyond 31 July, 2020. For other nations like Iran, Portugal, France and Poland, our predictions of non-trivial secondary relapses (all in late June) match almost perfectly with data, both infected and dead. As the second wave data is now available for the UK, we simulated it using our PHIRVD model. Results shown in Fig. 9 demonstrate excellent agreement with real statistics (data trained only up to 29 June 2020), that reaffirms the strength of the model.
A real point of contention amongst politicians, health professionals and medical scientists has, for long, been the correct lockdown implementation and withdrawal times. In statistical parlance, this effectively amounts to an estimation of the FWHM as has been estimated for Wuhan at 2.6 weeks from initial infection 47 . To analyze these counterclaims, we incorporate the effects of withdrawal of lockdown as a country specific, dynamically evolving quantity.
The availability of the awaited vaccines 26 , and of late, the therapeutic range 48,49 , have provided major immunity tools in the Covid firefight. The impacts of these vaccines are most likely to be futuristic antibody switch though,

Methods
Motivation of the PHIRVD model. PHIRVD To evaluate the reproduction number R e , we have to break the equation of dI dt into two parts F , V , i.e., where F = (β 1 H + β 2 P + β 3 R)I and V = (ζ + w)I . Now, F = ∂F ∂I | DFE and � = ∂V ∂I | DFE . Then Lockdown dynamics. During the time period, over which we trained our model, most of the countries (except Sweden), of our interest, were under lockdown. Therefore, we studied the effects of withdrawal/relaxation of lockdown for some countries by introducing a time varying parameter L(t) in the model in Eq. (1) substituting β 1,2,3 with β 1,2,3 L(t) respectively, where L(t) = 1 for t ≤ t 0 , and α for t ≥ t 0 + k . For t 0 < t < t 0 + k , Here t 0 marks the lockdown withdrawal time point, k is the approximate time duration during which the susceptible and infected population mixes well (e.g. within one week or one month etc.), where α is the parameter quantifying the intensity of mixing between susceptible and infected population. A larger α value implies a higher mixing rate among susceptible and infected individuals. The function L(t) is such that before lockdown withdrawal, it does not alter the contact probability while after withdrawal, it linearly increases from the value 1 to α over a time interval of k days, ensuring that the contact probability between susceptible and infected increases from a low to a high value within this time period.
Parameter estimation. The Bayesian MCMC data training leading to supervised learning is itself conducted in two steps using a double-filtration process. First, infection data alone are used to arrive at a preliminary set of values, characterizing each country. The said values are then filtered through combined infected and mortality statistics for a second training to sequentially converge to a preset upper limit. The training schedule is repeated multiply to ensure accurate predictions of the training dataset. Estimation of the equilibrium reproduction number is strategically used to reduce the effective parameter space from 13 to 8 parameters, perfectly conforming with the Bayesian MCMC prediction which shows that value fluctuations with other parameters do not contribute much to the infection kinetics. The model clearly separates the H and P infection classes to reflect their differential levels of infection and mortality. Another constituent is the death rate kinetics embedded in the central structure. The infection propagation model outlined in Eq. (1) is a multi-parameter model whose parameters are evaluated using predictive data modeling within the Bayesian MCMC construct. Similar structures have been selectively used in 21,22 albeit for single-country specific models without any explicit mortality dynamics. Over-reliance on infection statistics has often led to incorrect estimation for mortality statistics, whose accurate prediction is our first key target, an aim that is remarkably well served by our ML-embedded compartmentalised model. We present both the cumulative and daily (inset plots) statistics of infected population over 400 days, data trained between 10 February 2020 to 29 June 2020 (140 days) and then predicted up to the next 8 weeks (shown up to 12 July 2020 in Table 1).

The Bayesian Markov chain Monte Carlo (MCMC) algorithm.
To understand how the algorithm uses the data to determine the parameters, it is useful to recall some elements of Bayesian statistics 50,51 . Let D = (D 1 , D 2 , . . . , D n ) represent the full data vector that is being used to train the algorithm. For our case, the subscripts run over both the time intervals (daily) as well as the data types, such as I c (t i ) and D(t i ) . Similarly, let = (θ 1 , θ 2 , . . . , θ α ) represent the vector of parameters. A key ingredient is the prior probabibility distribution (Bayesian priors) for each θ i . While the absence of any knowledge of the system would call for a prior that is flat in the physically allowed region, the incorporation of such knowledge (which, in the present context, could be divined from the analysis of, say even part of the data for a single country in a given class) quickly gives the prior a somewhat peaked structure. In other words, one could as well start with a normal-distributed prior, viz., � ∼ N(� 0 , σ ) , where the vector 0 represents the mean of the parameters and σ = (σ 1 , σ 2 , . . . , σ α ) the standard deviation. As it turns out, the dependence of the final result on the prior is quite insignificant.Given a , it is straightforward to calculate the conditional probability P (D| ) of obtaining a realization D for the data. Using Bayes' theorem, the posterior probability for given the data is expressed as where P (D) = � P(D|�)P (�)d� , with denoting the whole parameter space. This, immediately leads us to the likelihood ratio of two parameter vectors 1 and 2 , namely www.nature.com/scientificreports/ We now resort to a 3-step algorithm: 1. Choose parameters (including initial conditions) through a random walk in the parameter space. The nature of the random walk is determined by the prior probability distributions for the parameters, including initial conditions. 2. Calculate the likelihood ratio function for the parameters, given the data. 3. Decide whether to accept the suggested parameter set or not.
Step 1: Let S i = (S i1 , S i2 , . . . , S in ) be the simulated vector at the ith step for parameter values i = (θ i1 , θ i2 , . . . , θ iα ) . Compared to the total population, the data I c (t), D(t) etc. are quasi-continuous and can be assumed to be drawn from a Normal distribution with respective standard deviations Ŵ = (γ 1 , γ 2 , . . . , γ n ) and means S i = (S i1 , S i2 , . . . , S in ) . Therefore, the posterior probability (or likelihood, in case of continuous probability density) of the parameter vector i is, Next, we execute a random walk in -space with distribution N(� i , σ ) to find i+1 , and calculate again the posterior likelihood function, with the simulated data vector S i+1 , corresponding to the parameter vector i+1 as Step 2: The likelihood ratio is now calculated to be P ( i+1 |D)/P ( i |D).
Step 3: Next, we generate a uniform random number r ∼ U[0, 1] . If r < P ( i+1 |D)/P ( i |D) , we accept i+1 , otherwise we go back to Step 1 and repeat the procedure.

Estimation of the reproduction number kinetics.
Understandably, the basic reproduction number R 0 is no longer a constant. Defining R 0 (t) as the average number of secondary infections from a primary case at a given epoch t, and similarly I d (t) as the number of daily new cases, we have where g(τ ) is the probability density function of the generation time τ , defined as the time required for a new secondary infection to be generated from a primary infection. In other words, τ is the time interval between the onset of a primary case to the onset of a secondary case, generated from this primary case. As is reported 37 , the mean generation time is approximately 6.5 days, we assume g(τ ) has a Gamma distribution with g(τ ) = Gamma(6.5, 0.62) . We represent R 0 (t) as a function of time as We approximate the denominator of Eq. (8) directly from our simulated data, by a discrete sum, and evaluate R 0 at nth day as (4) P ( 2 |D) P ( 1 |D) = P (D| 2 )P ( 2 ) P (D| 1 )P ( 1 ) . ( 0 < ǫ < 1 ), where D i are observed data and S i the simulated data for the i th day, we quantify the accuracy of our model fitting with the real data. Understandably, the data for daily new infections and daily new deaths are contaminated by noise, more severely than the corresponding cumulative data. Hence, a Chi-square test applied on cumulative data will always give a high p-value. However, to test the power of our predictive machine learning algorithm, we calculated the p-values on daily new data of deaths and infected. Assuming the real data are drawn from a normal distribution with mean value same as the simulated data, and with a standard deviation equal to some fraction of the simulated data, we derive our Chi-square statistic. Although, the real data of infected and dead are always positive, as the infection increases, this assumption is very well valid, except for a very small time interval at the starting of infection in a population. .