Abstract
The devastating trail of Covid-19 is characterized by one of the highest mortality-to-infected ratios for a pandemic. With restricted therapeutics and early-stage vaccination, social exclusion through lockdown remains the key containment mode. To understand the dynamics, we propose PHIRVD, a mechanistic infection propagation model that Machine Learns (Bayesian Markov Chain Monte Carlo) the evolution of six infection stages, namely healthy susceptible (H), predisposed comorbid susceptible (P), infected (I), recovered (R), herd immunized (V) and mortality (D), providing a highly reliable mortality prediction profile for 18 countries at varying stages of lockdown. Trained on data between 10 February and 29 June 2020, PHIRVD can accurately predict the mortality profile up to November 2020, including the second wave kinetics. The model also suggests the mortality-to-infection ratio as a more dynamic pandemic descriptor, substituting the reproduction number. PHIRVD establishes the importance of early and prolonged but strategic lockdown to contain future relapse, complementing futuristic vaccine impact.
Introduction
Deadlier than most pandemics in the last 100 years, barring HIV and plague, Covid-19 rages on despite the imposition of movement restrictions as well as clinical testing and community health measures^{1,2}. As of 4 August 2020, SARS-CoV-2 has infected ca 18.5 million worldwide with ca 700,000 dead. Covid-19 containment has been a major strategic issue for governments worldwide, with particular emphasis on the correct lockdown timing and span. Alarming belated infection spurts have been registered in overpopulated countries like India, Brazil and Iran with early and extensive lockdowns. While the low mortality rates exhibited by low-resourced yet densely populated Asian countries have been attributed to the relative youth of the populations^{3}, sparsely populated Sweden depicts an alarming dead-to-infected ratio in contrast to its European neighbours^{4}.
Quarantine has been advised as the best infection control measure^{5,6}. This has led to key questions as to the ideal start point and the absolute span of the ensuing lockdown. Major cases in support of lockdown are Vietnam and Cuba, which have claimed almost no deaths^{7,8}, although such claims have been questioned^{9}. In countries like Italy, the UK, the US, Sweden and Brazil, with strategic reluctance for early lockdown, comparatively softer lockdown protocols have admittedly led to worrisome statistics. On the other hand, European countries like Germany, the Netherlands, Belgium and France, as well as non-European countries like Australia, New Zealand and Korea, which enforced early lockdowns, initially registered remarkably low infection and mortality rates^{10}, with \(1.0<R_0<2.0\) during lockdown, that spiked later (www.worldometers.info). Many suffered from re-infection relapse^{11,12} with a sudden spurt in infection^{13}. Regions like India, Iran and New York State, with variable quarantine measures, have all seen late infection surges. While India resorted to an early clampdown with an early withdrawal, New York State resorted to a late lockdown, but both with similar numerical implications, a feature attributed to the inevitable movement of migrant workers^{14}.
Analyses of the SARS epidemic of 2003 showed that case isolation and contact tracing^{1,15}, while highly effective if implemented at early stages, become ineffectual if the basic infection spread occurs before symptomatic detection^{16,17}. This finding was revisited in Covid-19 transmission kinetics^{18}, pointing to the importance of appropriate early (pre-symptomatic) stage strategizing. Other studies stress the importance of combining isolation^{19} and social distancing with widespread testing^{20} and contact tracing^{2}. Initial predictive models^{14} used data from Wuhan and Italy^{20}. Both efforts suffer from a lack of robustness due to inaccurate future prediction that is reliant on sparse data, devoid of any inherent ML training protocol to emphasize prediction rather than data fitting. The first predictive study used a Bayesian inference structure on a simplistic SIRV model^{21,22}, using infection statistics from Germany. While a move in the right direction, it suffered from two key deficiencies: lack of a time-evolving death rate as an independent dynamical variable and over-reliance on infection statistics in predicting the mortality rate. Ref.^{20} addressed this, but it lacked the probabilistic kernel of ref.^{21}. Another issue that has often been overlooked is the best possible containment strategy in coping with the disease. Standard approaches include social distancing^{23}, contact tracing^{24}, social seclusion between comorbid and healthy, and self-quarantine of the infected (including asymptomatic). The target in all of these is to block the epidemic spread network so that the infection chain can be broken^{25}.
Vaccines have led the fight against COVID-19^{26}. Multiple vaccines are now available for public use that use differential chemical pathways, e.g. mRNA replication (Pfizer^{27}, Moderna^{28}), viral vectors (Oxford-AstraZeneca^{29}, Sputnik V^{30}), antibody formation through attachment to spike proteins (Covaxin^{31}, Sinopharm^{32}), double-stranded DNA cloning (Janssen^{33}), genetic engineering of the SARS-CoV-2 spike proteins (Novavax^{34}). The vaccine arsenal is fast getting reinforced with newer additions, all targeted to mitigate the viral load as well as to provide long term immunity. Vaccines are expected to be a major immunity booster going forward. However, given the expected timeframes of vaccine rollout and the perceived mutation towards newer strains of the virus (e.g. Indian variant B.1.617^{35}, South African B.1.351^{36}) that have at times restricted the efficacy rates of vaccines^{34,34,36}, the major defence front will still rely on transmission mitigation through restricted movement, mask usage, sanitation codes and avoiding public gatherings, the collective impact of which could be enumerated from the PHIRVD model.
Results
Infection kinetics of healthy and comorbid susceptible
COVID-19 infection propagation epidemiology clearly points to the need for analyzing the vastly different infection and mortality profiles of the healthy versus the comorbid susceptible groups. Our key target is to study this interactive infection propagation and then predict future mortality and infection profiles, emphasizing mortality as the key policy indicator. The present article marries a robust Susceptible (S)-Infected (I)-Recovered (R)-Vaccinated (V) (SIRV) structure, estimating the reproduction number^{37}, with a Machine Learning (ML) prediction kernel, using a multi-layered error filtration structure, to generate a predictive model called PHIRVD (see “Methods” section). PHIRVD delivers three major successes at an unprecedented level of accuracy: prediction of the number of infected and dead over the next 30 days (validated using sparse data) for each of the 18 countries considered, a comparative analysis of the impact of lockdown using multiple withdrawal dates for 6 worst-hit countries with high ongoing infection rates, and a detailed temporal profile of future reproduction numbers that can be (and have been) verified against real data. PHIRVD also establishes the mortality-to-infection ratio as the key dynamic pandemic descriptor instead of the reproduction number.
Mathematical model—PHIRVD
Our compartmentalised Covid-19 pandemic kinetics uses a 6-dimensional dynamical system, Eq. (1), combining SIR and SEIR kernels^{38,39}, schematically outlined in Fig. 1.
The parameters in this model, which we call PHIRVD, characterize the infection rate of healthy agents (\(\beta _1\)), infection rate of agents with pre-existing health conditions (\(\beta _2\)), relapse rate (\(\beta _3\)), conversion rates of recovered to healthy susceptible (\(q_{1H}\)) and previously “immuned” to healthy susceptible (\(q_{2H}\)), conversion rates of recovered to pre-existing susceptible (\(q_{1P}\)) and previously “immuned” to pre-existing susceptible (\(q_{2P}\)), death rate due to non-Covid interference (\(\gamma\)), additional death rate due to agents with pre-existing conditions (\(\delta\)) and that due to infected (\(\zeta\)), recovery rate (w), and the rates at which healthy (\(h_{2v}\)) and pre-existing susceptible (\(p_{2v}\)) groups are quarantined. Our focus being Covid-19 infection and mortality statistics, we neglect the death rate (\(\gamma =0\)) and additional death rate (\(\delta =0\)) due to all non-Covid causes. Since the death rate of the healthy infected is a lot lower than that of the comorbid and elderly (https://www.cdc.gov/coronavirus/2019ncov/needextraprecautions/olderadults.html)^{40,41}, we have added a practical constraint in our model to account for this effect, expressed as \(\beta _1<\beta _2\). Hence, the infection rate of the H-group is considered to be a small fraction (\(\lambda\)) of that of the P-group, i.e. \(\beta _1=\beta _2 \lambda\). The death variable D thus acts like a “sink” of the dynamical system, ensuring a population conservation inbuilt within the model (\(H+P+I+R+V+D\) = constant). The PHIRVD model can be easily extended to incorporate the impact of upcoming and available vaccines. The impact points would be at the transitory phases between prolonged lockdown, characterized by low susceptible-infected coupling, and a lockdown withdrawal, typically leading to a surge in the infection/mortality traffic, a case of human reaction to maximize social expression.
In training our model, we find it useful to define an extra variable \(I_c(t)\), which represents the cumulative number of those infected up to a given date. In other words, it includes not only those who are currently infected, but also those who have since recovered or died, i.e. \(\frac{dI_c}{dt}=(\beta _1 H + \beta _2 P + \beta _3 R) I.\) Since we have considered relapse in our model, it is to be noted that \(I_c(t) \ne I(t) + R(t) + D(t)\).
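The six-compartment system can be illustrated with a short numerical sketch. Because the full set of equations is not reproduced in this text, the flows below are a reconstruction and should be read as an assumption: they are chosen to agree with the parameter list, the conservation law \(H+P+I+R+V+D\) = constant, and the disease free equilibrium quoted in the “Methods” section. The function names (`phirvd_rhs`, `rk4_step`, `simulate`) and all numerical values are hypothetical and are not fitted to any country.

```python
def phirvd_rhs(y, p):
    """Right-hand side of a reconstructed PHIRVD system with gamma = delta = 0.

    y = (H, P, I, R, V, D, Ic) as population fractions; p is a dict of rates.
    The flow structure is an assumption chosen to conserve H+P+I+R+V+D.
    """
    H, P, I, R, V, D, Ic = y
    b1 = p["lam"] * p["beta2"]  # beta_1 = lambda * beta_2
    new_inf = (b1 * H + p["beta2"] * P + p["beta3"] * R) * I
    dH = -b1 * H * I + p["q1H"] * R + p["q2H"] * V - p["h2v"] * H
    dP = -p["beta2"] * P * I + p["q1P"] * R + p["q2P"] * V - p["p2v"] * P
    dI = new_inf - (p["zeta"] + p["w"]) * I
    dR = p["w"] * I - p["beta3"] * R * I - (p["q1H"] + p["q1P"]) * R
    dV = p["h2v"] * H + p["p2v"] * P - (p["q2H"] + p["q2P"]) * V
    dD = p["zeta"] * I  # deaths act as a sink
    return (dH, dP, dI, dR, dV, dD, new_inf)  # last entry is dIc/dt


def rk4_step(f, y, p, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, s):
        return tuple(a + s * b for a, b in zip(y, k))

    k1 = f(y, p)
    k2 = f(shifted(k1, dt / 2), p)
    k3 = f(shifted(k2, dt / 2), p)
    k4 = f(shifted(k3, dt), p)
    return tuple(a + dt / 6 * (c1 + 2 * c2 + 2 * c3 + c4)
                 for a, c1, c2, c3, c4 in zip(y, k1, k2, k3, k4))


def simulate(y0, p, days, dt=0.1):
    """Integrate for `days` days and return one state per day."""
    steps_per_day = round(1 / dt)
    y = tuple(y0)
    out = [y]
    for _ in range(days):
        for _ in range(steps_per_day):
            y = rk4_step(phirvd_rhs, y, p, dt)
        out.append(y)
    return out


# Illustrative (hypothetical) rates and initial fractions.
params = dict(beta2=0.5, lam=0.3, beta3=0.01, q1H=0.001, q1P=0.001,
              q2H=0.01, q2P=0.005, h2v=0.02, p2v=0.02, zeta=0.005, w=0.1)
y0 = (0.7, 0.299, 0.001, 0.0, 0.0, 0.0, 0.001)
traj = simulate(y0, params, days=140)
```

Under these assumed flows the cumulative count \(I_c\) grows faster than \(I+R+D\) whenever recovered individuals relapse or return to a susceptible class, which is the inequality noted above.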
Data repositories
Identifying the infection kinetics of Covid-19 as an interactive evolution process involving six time evolving population density variables, namely healthy susceptible (H), susceptible with pre-existing conditions or comorbidity (P), infected (I), recovered (R), naturally immuned (i.e. a clone for vaccinated V) and dead (D), the PHIRVD model uses statistics from the Johns Hopkins Covid-19 database^{42} to accurately predict mortality and infection statistics of 18 Asian, European and American countries. The data threshold was set beyond the first 19 days of low (or no) infection, followed by data training between 10 February 2020 and 29 June 2020. Results were later cross-verified against other databases, e.g. US: https://usafacts.org; EU: https://data.europa.eu/; UK: https://coronavirus.data.gov.uk/; India: https://www.covid19india.org/. The Bayesian Markov Chain Monte Carlo (MCMC)^{43} infrastructure in PHIRVD is trained on the repository data to probabilistically predict the 17 parameters of the infection kinetic model (see “Methods” section). Unlike previous predictive Machine Learning models^{14,19,20,21,22}, this structure allows more dynamic adaptive control of the infection kinetic estimation, resulting in a highly accurate predictive module.
Mortality and infection: prediction against reality
The 18 countries or regions under study were divided into 4 infection classes, the first three based on decreasing mortality-to-infection ratio for countries past their infection peak: UK, Netherlands, Sweden, New York State (Class A); Germany, Korea, Australia, Russia, Vietnam (Class B); and Italy, Spain, Hubei (Class C). Class D comprises India, Poland, Iran, France, Portugal and Brazil, with ongoing infection regimes. We deliberately chose New York State instead of the entire United States due to its high population density and tourist/worker traffic, which is quite different from the national average.
The number of reported cases depends strongly on the number of daily tests, which is not necessarily in agreement with the actual disease propagation dynamics, so we observe some deviations between the simulated I(t) and the actual number of reported cases. On the other hand, D(t) is less affected by the testing rate. Since we are using mortality statistics with the same weightage as the infected data, we prioritize mortality prediction. We note that daily training of any epidemiological model will invariably achieve a better data match, as many studies have shown. However, our ML embedded propagation kinetic model is aimed at predictions over as long a term as possible.
Comparative statistics for our Class A representative, the UK, are shown in Fig. 2. The blue region marks the training zone that fixes the parameters. Based on the highest mortality-to-infection ratio in each group, the representative countries for the other 3 classes are Germany (Class B), Italy (Class C) and India (Class D). Figures 3, 4 and 5 represent infection statistics for Class B (Germany), C (Italy) and D (India) respectively (other plots in Appendix II). The Chi-square tested accuracy chart in Table 1 (see “Methods” section for the Chi-square statistic used) supports the accuracy claim made. Vietnam, on the other hand, presents an interesting case. With a reported zero mortality rate notwithstanding high population density, it has been repeatedly cited as an example of early quarantine success. The model tracks even such an exceptional case to a moderate level of accuracy (in Appendix II). The outsets and insets respectively outline the cumulative versus the daily infection traffic. Details for other countries, for the 4 infection classes, are provided in Appendix II.
Table 2 presents a comparative chart of the PHIRVD model predictions versus real data, separately for the numbers of infected and dead, for countries representing the 4 classes with data trained between 10 February and 29 June: Class A (UK), Class B (Germany), Class C (Italy) and Class D (India). Futuristic prediction is shown until 12 July. For other countries in each individual class, with data training between 10 February and 10 May, 30 days’ prediction until 9 June establishes the predictive strength of this model (see Tables S2–S5, Appendix III), error validated as shown in Table S1 (see Appendix I).
Table 3 compares second wave mortality prediction obtained from PHIRVD against real data, based on data training until 29 June 2020. The result can be substantially improved if data is trained within a month of the resurgent wave, as in November 2020. But the reliability of prediction stretching up to 150 days beyond last data training is unprecedented to our knowledge and affirms the robustness of the model.
The expected number of secondary cases produced from each infected individual is traditionally defined as the basic reproduction number. The detailed calculation of \(R_{\text {e}}\) is provided in the “Methods” section. Figure 6 depicts the time evolution of the basic reproduction number \(R_0\)^{44,42,46} (see “Methods” section), which indirectly reflects the emerging infection (and fatality) rate, for the 4 representative countries from infection classes A to D. \(R_0\) kinetics of all other countries are provided in Appendix I. Class A countries consistently show the sharpest drop in \(R_0\) and the flattest stability period, followed by progressive \(R_0\) decay and waiting time, often called the ‘gestation time’, reflected by the plateau regions of the respective plots for classes B, C and D respectively. The point of note here is that while Germany and Italy show higher levels of infection than the UK, the gestation period for the UK is a lot larger than both. India shows a similar trend, although the absolute numbers for India are a lot lower than the other three, indicating a complicated relationship between Full Width at Half Maximum (FWHM) and gestation period.
Discussion
Combining conventional infection kinetic modeling with a predictive Bayesian MCMC, PHIRVD quantifies the impact of lockdown as a containment tool. It estimates mortality statistics with high significance for 18 countries, accurate up to the next 30 days beyond the last date of data training. Ideal lockdown imposition and withdrawal times have been predicted and validated, including for ongoing regimes, e.g. India. PHIRVD also predicts secondary relapse timings and establishes the mortality-to-infection ratio as the key pandemic predictive descriptor instead of the reproduction number. PHIRVD is also capable of analyzing the impact of migration, an ongoing project. Our findings clearly suggest that phased lockdown is a potent containment tool but needs to be strategically imposed, where the correct implementation and withdrawal times are paramount. Secondary infection and mortality prediction will be key to future strategic quarantine imposition and to analyzing the impact of future therapeutics.
PHIRVD leads to three key outcomes. First, we present highly accurate probabilistic predictions for the numbers of infected and dead for each of a total of 18 countries, typically 3 weeks beyond the last date of (Machine Learned) data training. Our PHIRVD model shows a high degree of agreement between model prediction and real data across the range of countries considered.
Second, our model can also be used to identify a better strategy for lockdown imposition, to minimize the fatality. The full simulation plots (in Appendix II) clearly outline how an increasing infection profile initially matches with decreasing numbers of pre-existing susceptible and increasing statistics for the recovered, then slows down as the infection peak arrives, eventually tailing off into a no-infection landscape. While the qualitative trends are similar for all classes (A, B, C, D) of countries, the impact of lockdown on the first peak, and then a second (relapse) peak, hints at the internal health versus econometrics of the countries concerned. To prove this point, we compare infection (and mortality) propagation kinetics of 2 chosen countries for two different dates, one on the recess (UK: Fig. 7), the other with a rising infection level (India: Fig. 8). As opposed to the recent furore about school children being exposed to the Covid-19 menace as a result of early lockdown withdrawal, our result clearly shows that there is practically no difference in mortality between a withdrawal on June 1, 2020 and a later withdrawal, e.g. July 1, 2020 (although a withdrawal on May 1 would have been disastrous). The 1 June (almost equally safe) withdrawal would, of course, be favoured on economic and social grounds.
The third key outcome of our analysis is the establishment of the mortality:infection ratio as the key descriptor of a pandemic over and above the reproduction number, which has conventionally been used for the purpose. The proof of this is in the accurate prediction of the secondary infection relapse time, which the reproduction number fails to predict. As can be seen from Fig. 7a,b, this relapse time period could be deferred with a late lockdown withdrawal on July 1 (as compared to June 1), although the peak mortality rates are not hugely different (ca 200 at 1 July compared to ca 400 at 1 June). Using 1 July 2020 as the UK lockdown withdrawal date, there is a clear signature of secondary relapse in the first week of September (identified as the second peak in Fig. 7). The Indian situation is clearly more challenging, though, as shown in Fig. 8. While perhaps economically unsustainable, India could benefit from a lockdown even beyond 31 July 2020. For other nations like Iran, Portugal, France and Poland, our predictions of non-trivial secondary relapses (all in late June) match almost perfectly with data, both infected and dead. As the second wave data are now available for the UK, we simulated it using our PHIRVD model. Results shown in Fig. 9 demonstrate excellent agreement with real statistics (data trained only up to 29 June 2020), which reaffirms the strength of the model.
A real point of contention amongst politicians, health professionals and medical scientists has, for long, been the correct lockdown implementation and withdrawal times. In statistical parlance, this effectively amounts to an estimation of the FWHM as has been estimated for Wuhan at 2.6 weeks from initial infection^{47}. To analyze these counterclaims, we incorporate the effects of withdrawal of lockdown as a country specific, dynamically evolving quantity.
The availability of the awaited vaccines^{26}, and of late, the therapeutic range^{48,49}, have provided major immunity tools in the Covid firefight. The impacts of these vaccines are most likely to be futuristic antibody switch though, as is clearly evidenced by the huge second/third phase outbreaks in countries like India, Bangladesh and Russia that survived the initial onslaught well. With growing mortality profile, sometimes attributed to newer viral strains, the impact of quarantine measures, namely what and how to choose and when to implement or withdraw, has now assumed crucial importance, for which our model can serve as a future benchmark.
Methods
Motivation of the PHIRVD model
PHIRVD uniquely combines a dynamically evolving infection propagation model that tracks the phenomenology of infection kinetics with a probabilistic predictive algorithm, the latter chosen as a Bayesian Markov Chain Monte Carlo (MCMC) kernel. The Bayesian MCMC is used to train past data to predict time independent generic parameters that can predict the future statistics. The choice is guided by the strength of Bayesian MCMC in a range of dynamical modeling studies in complementary fields^{50,51}.
Reproduction number \(R_{\text {e}}\) at fixed point
For \(\gamma =0, \delta =0\), from Eq. (1) the disease free equilibrium (DFE) or fixed point is given by \(P^* =H^*\frac{h_{2v} q_{2P}}{p_{2v} q_{2H}}\), \(I^*=0, R^*=0\), \(V^*=H^* \frac{h_{2v}}{ q_{2H}}\). To evaluate the reproduction number \(R_{\text {e}}\), we break the equation for \(\frac{dI}{dt}\) into two parts \({\mathcal {F}}\) and \({\mathcal {V}}\), i.e.,
\[\frac{dI}{dt}={\mathcal {F}}-{\mathcal {V}}, \tag{2}\]
where \({\mathcal {F}}=(\beta _1 H + \beta _2 P + \beta _3 R) I\) and \({\mathcal {V}}=(\zeta +w) I\). Now, \(F=\left. \frac{\partial {\mathcal {F}}}{\partial I}\right| _{DFE}\) and \(\Sigma =\left. \frac{\partial {\mathcal {V}}}{\partial I}\right| _{DFE}\). Then \(R_{\text {e}}=\frac{F}{\Sigma }=\frac{ H^* \left( \frac{\beta _{2} h_{2v} q_{2P}}{p_{2v} q_{2H}}+\beta _{1}\right) }{\zeta +w}\).
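As a small worked example, the closed form for \(R_{\text {e}}\) at the disease free equilibrium can be evaluated directly from the rates. The function name `reproduction_number` and the numerical values are hypothetical and serve only to show how the terms combine; since \(R^*=0\), the relapse rate \(\beta _3\) does not enter.

```python
def reproduction_number(H_star, beta1, beta2, h2v, p2v, q2H, q2P, zeta, w):
    """R_e = F / Sigma evaluated at the disease free equilibrium."""
    # Equilibrium size of the pre-existing susceptible group.
    P_star = H_star * h2v * q2P / (p2v * q2H)
    # F = d(calF)/dI at the DFE; the beta_3 * R term vanishes because R* = 0.
    F = beta1 * H_star + beta2 * P_star
    # Sigma = d(calV)/dI: removal of infected by death or recovery.
    Sigma = zeta + w
    return F / Sigma


# Illustrative (hypothetical) values, not fitted parameters.
R_e = reproduction_number(H_star=0.5, beta1=0.15, beta2=0.5,
                          h2v=0.02, p2v=0.02, q2H=0.01, q2P=0.005,
                          zeta=0.005, w=0.1)
```

With these illustrative numbers \(P^*=0.25\) and \(F=0.2\), so \(R_{\text {e}}=0.2/0.105\approx 1.9\); raising the recovery rate w or lowering either infection rate pushes the value towards the threshold of 1.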
Lockdown dynamics
During the time period over which we trained our model, most of the countries of interest (except Sweden) were under lockdown. Therefore, we studied the effects of withdrawal/relaxation of lockdown for some countries by introducing a time varying parameter L(t) in the model in Eq. (1), substituting \(\beta _{1,2,3}\) with \(\beta _{1,2,3}\,L(t)\) respectively, where \(L(t) = 1 \,\,\text {for} \,\,t \le t_0, \,\,\text {and}\,\,\alpha \,\,\text {for}\,\,t \ge t_0+k\). For \(t_0<t<t_0+k\), \(L(t) = \frac{1}{k} [ \alpha (t-t_0)+(t_0+k-t) ]\). Here \(t_0\) marks the lockdown withdrawal time point, k is the approximate time duration during which the susceptible and infected populations mix well (e.g. within one week or one month etc.), and \(\alpha\) is the parameter quantifying the intensity of mixing between the susceptible and infected populations. A larger \(\alpha\) value implies a higher mixing rate among susceptible and infected individuals. The function L(t) is such that before lockdown withdrawal it does not alter the contact probability, while after withdrawal it increases linearly from the value 1 to \(\alpha\) over a time interval of k days, ensuring that the contact probability between susceptible and infected increases from a low to a high value within this time period.
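The piecewise definition of L(t) translates directly into a few lines of code. The sketch below is illustrative; the function name `lockdown_factor` and the example values of \(t_0\), k and \(\alpha\) are hypothetical.

```python
def lockdown_factor(t, t0, k, alpha):
    """Multiplier L(t) applied to beta_1, beta_2 and beta_3.

    Equals 1 up to the withdrawal time t0, alpha from t0 + k onwards,
    and interpolates linearly in between.
    """
    if t <= t0:
        return 1.0
    if t >= t0 + k:
        return float(alpha)
    return (alpha * (t - t0) + (t0 + k - t)) / k


# Example: withdrawal on day 10, mixing completed over k = 7 days, alpha = 3.
before = lockdown_factor(5, t0=10, k=7, alpha=3.0)      # 1.0
midway = lockdown_factor(13.5, t0=10, k=7, alpha=3.0)   # 2.0
after = lockdown_factor(30, t0=10, k=7, alpha=3.0)      # 3.0
```

The interpolation is continuous at both ends, since the bracket evaluates to k at \(t=t_0\) and to \(\alpha k\) at \(t=t_0+k\).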
Parameter estimation
The Bayesian MCMC data training leading to supervised learning is itself conducted in two steps using a double-filtration process. First, infection data alone are used to arrive at a preliminary set of values characterizing each country. These values are then filtered through combined infected and mortality statistics in a second training to sequentially converge to a preset upper limit. The training schedule is repeated multiple times to ensure accurate predictions of the training dataset. Estimation of the equilibrium reproduction number is strategically used to reduce the effective parameter space from 13 to 8 parameters, perfectly conforming with the Bayesian MCMC prediction, which shows that value fluctuations in the other parameters do not contribute much to the infection kinetics. The model clearly separates the H and P infection classes to reflect their differential levels of infection and mortality. Another constituent is the death rate kinetics embedded in the central structure. The infection propagation model outlined in Eq. (1) is a multi-parameter model whose parameters are evaluated using predictive data modeling within the Bayesian MCMC construct. Similar structures have been selectively used in refs.^{21,22}, albeit for single-country specific models without any explicit mortality dynamics. Over-reliance on infection statistics has often led to incorrect estimation of mortality statistics, whose accurate prediction is our first key target, an aim that is remarkably well served by our ML-embedded compartmentalised model. We present both the cumulative and daily (inset plots) statistics of the infected population over 400 days, with data trained between 10 February 2020 and 29 June 2020 (140 days) and then predicted up to the next 8 weeks (shown up to 12 July 2020 in Table 1).
The Bayesian Markov chain Monte Carlo (MCMC) algorithm
To understand how the algorithm uses the data to determine the parameters, it is useful to recall some elements of Bayesian statistics^{50,51}. Let \(\varvec{D}=(D_1, D_2, \ldots , D_n)\) represent the full data vector that is being used to train the algorithm. For our case, the subscripts run over both the time intervals (daily) as well as the data types, such as \(I_c(t_i)\) and \(D(t_i)\). Similarly, let \(\varvec{\Theta }=(\theta _1, \theta _2, \ldots , \theta _\alpha )\) represent the vector of parameters. A key ingredient is the prior probability distribution (Bayesian prior) for each \(\theta _i\). While the absence of any knowledge of the system would call for a prior that is flat in the physically allowed region, the incorporation of such knowledge (which, in the present context, could be divined from the analysis of, say, even part of the data for a single country in a given class) quickly gives the prior a somewhat peaked structure. In other words, one could as well start with a normal-distributed prior, viz., \(\varvec{\Theta } \sim N(\varvec{\Theta _0},\varvec{\sigma })\), where the vector \(\varvec{\Theta _0}\) represents the mean of the parameters and \(\varvec{\sigma }=(\sigma _1, \sigma _2, \ldots , \sigma _\alpha )\) the standard deviation. As it turns out, the dependence of the final result on the prior is quite insignificant. Given a \(\varvec{\Theta }\), it is straightforward to calculate the conditional probability \({\mathscr {P}}(\varvec{D}|\varvec{\Theta })\) of obtaining a realization \(\varvec{D}\) for the data. Using Bayes’ theorem, the posterior probability for \(\varvec{\Theta }\) given the data is expressed as
\[{\mathscr {P}}(\varvec{\Theta }|\varvec{D})=\frac{{\mathscr {P}}(\varvec{D}|\varvec{\Theta })\,{\mathscr {P}}(\varvec{\Theta })}{{\mathscr {P}}(\varvec{D})}, \tag{3}\]
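where \({\mathscr {P}}(\varvec{D})=\int _\Omega {\mathscr {P}}(\varvec{D}|\varvec{\Theta })\, {\mathscr {P}}(\varvec{\Theta })\,d\varvec{\Theta }\), with \(\Omega\) denoting the whole parameter space. This immediately leads us to the likelihood ratio of two parameter vectors \(\varvec{\Theta _1}\) and \(\varvec{\Theta _2}\), namely
\[\frac{{\mathscr {P}}(\varvec{\Theta _1}|\varvec{D})}{{\mathscr {P}}(\varvec{\Theta _2}|\varvec{D})}=\frac{{\mathscr {P}}(\varvec{D}|\varvec{\Theta _1})\,{\mathscr {P}}(\varvec{\Theta _1})}{{\mathscr {P}}(\varvec{D}|\varvec{\Theta _2})\,{\mathscr {P}}(\varvec{\Theta _2})}. \tag{4}\]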
We now resort to a 3step algorithm:

1. Choose parameters (including initial conditions) through a random walk in the parameter space. The nature of the random walk is determined by the prior probability distributions for the parameters, including initial conditions.
2. Calculate the likelihood ratio function for the parameters, given the data.
3. Decide whether to accept the suggested parameter set or not.
Step 1:
Let \(\varvec{S_i}=(S_{i1}, S_{i2}, \ldots , S_{in})\) be the simulated vector at the ith step for parameter values \(\varvec{\Theta _i}=(\theta _{i1}, \theta _{i2}, \ldots , \theta _{i\alpha })\). Compared to the total population, the data \(I_c(t), D(t)\) etc. are quasi-continuous and can be assumed to be drawn from a Normal distribution with respective standard deviations \(\varvec{\Gamma }=(\gamma _1, \gamma _2, \ldots , \gamma _n)\) and means \(\varvec{S_i}=(S_{i1}, S_{i2}, \ldots , S_{in})\). Therefore, the posterior probability (or likelihood, in case of continuous probability density) of the parameter vector \(\varvec{\Theta _i}\) is
\[{\mathscr {P}}(\varvec{\Theta _i}|\varvec{D}) \propto \prod _{j=1}^{n} \frac{1}{\sqrt{2\pi }\,\gamma _j} \exp \left[ -\frac{(D_j-S_{ij})^2}{2\gamma _j^2}\right] . \tag{5}\]
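Next, we execute a random walk in \(\varvec{\Theta }\)-space with distribution \(N(\varvec{\Theta _i},\varvec{\sigma })\) to find \(\varvec{\Theta _{i+1}}\), and calculate again the posterior likelihood function, with the simulated data vector \(\varvec{S_{i+1}}\) corresponding to the parameter vector \(\varvec{\Theta _{i+1}}\), as
\[{\mathscr {P}}(\varvec{\Theta _{i+1}}|\varvec{D}) \propto \prod _{j=1}^{n} \frac{1}{\sqrt{2\pi }\,\gamma _j} \exp \left[ -\frac{(D_j-S_{(i+1)j})^2}{2\gamma _j^2}\right] . \tag{6}\]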
Step 2:
The likelihood ratio is now calculated to be \({\mathscr {P}}(\varvec{\Theta _{i+1}}|\varvec{D}) / {\mathscr {P}}(\varvec{\Theta _i}|\varvec{D})\).
Step 3:
Next, we generate a uniform random number \(r \sim U[0,1]\). If \(r < {\mathscr {P}}(\varvec{\Theta _{i+1}}|\varvec{D})/{\mathscr {P}}(\varvec{\Theta _i}|\varvec{D})\), we accept \(\varvec{\Theta _{i+1}}\); otherwise we go back to Step 1 and repeat the procedure.
We have used cumulative infected and dead data as the vector \(\varvec{D}\) and we normalize (as described above) the data vector \(\varvec{D}\), as well as the simulated vector \(\varvec{S_i}\) at every step, before calculating the likelihood ratio in Step 2 above. We have used \(\varvec{\sigma } = (\varvec{\sigma _P}, \varvec{\sigma _{IC}})\), where \(\varvec{\sigma _P} = (0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01)\) for the parameter part, \(\varvec{\sigma _{IC}} = (0.1, 0.1, 0.001, 0.0, 0.0, 0.0)\) for the initial data part, and \(\varvec{\Gamma }=(\gamma _1, \gamma _2, \ldots , \gamma _n)\), where \(\gamma _j = (0.1-0.05)(j-1)/(n-1)+0.05\). The initial days in the data (where the numbers are low) are given relatively smaller weightage in the fitting than the later days, as the noise level is initially higher than the signal.
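The three steps can be sketched as a random-walk Metropolis sampler with a Gaussian likelihood. This is a minimal illustration under stated assumptions, not the authors' code: the prior is taken as flat, so the posterior ratio reduces to the likelihood ratio; the comparison is done in log space for numerical stability; and a one-parameter linear ramp stands in for the full PHIRVD simulation. The names `noise_scale`, `log_likelihood`, `metropolis` and `toy_simulate` are hypothetical.

```python
import math
import random


def noise_scale(j, n, lo=0.05, hi=0.1):
    """gamma_j = (hi - lo)(j - 1)/(n - 1) + lo for j = 1..n."""
    return (hi - lo) * (j - 1) / (n - 1) + lo


def log_likelihood(data, sim, gammas):
    """Log of a product of independent Normal densities with means `sim`."""
    return sum(-0.5 * ((d - s) / g) ** 2 - math.log(g * math.sqrt(2 * math.pi))
               for d, s, g in zip(data, sim, gammas))


def metropolis(simulate, data, theta0, sigma, gammas, n_steps, rng):
    """Random-walk Metropolis; returns the chain of parameter vectors."""
    theta = list(theta0)
    ll = log_likelihood(data, simulate(theta), gammas)
    chain = [tuple(theta)]
    for _ in range(n_steps):
        # Step 1: propose Theta_{i+1} ~ N(Theta_i, sigma).
        proposal = [rng.gauss(t, s) for t, s in zip(theta, sigma)]
        # Step 2: likelihood ratio, kept as a difference of logs.
        ll_new = log_likelihood(data, simulate(proposal), gammas)
        # Step 3: accept if r < ratio, otherwise keep the current point.
        if rng.random() < math.exp(min(0.0, ll_new - ll)):
            theta, ll = proposal, ll_new
        chain.append(tuple(theta))
    return chain


# Toy stand-in for the model: a normalized ramp with unknown slope theta[0].
n = 50
xs = [j / (n - 1) for j in range(n)]


def toy_simulate(theta):
    return [theta[0] * x for x in xs]


data = toy_simulate([0.8])  # synthetic, noise-free "observations"
gammas = [noise_scale(j, n) for j in range(1, n + 1)]
chain = metropolis(toy_simulate, data, theta0=[0.3], sigma=[0.01],
                   gammas=gammas, n_steps=5000, rng=random.Random(0))
```

In this sketch a rejected proposal leaves the chain at its current point, which is the usual Metropolis convention; for the full model, `toy_simulate` would be replaced by a routine that integrates the compartmental equations and returns the normalized cumulative infected and dead series.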
Estimation of the reproduction number kinetics
Understandably, the basic reproduction number \(R_0\) is no longer a constant. Defining \(R_0(t)\) as the average number of secondary infections from a primary case at a given epoch t, and similarly \(I_d(t)\) as the number of daily new cases, we have
\[I_d(t)=\int _0^\infty R_0(t)\, I_d(t-\tau )\, g(\tau )\, d\tau , \tag{7}\]
where \(g(\tau )\) is the probability density function of the generation time \(\tau\), defined as the time required for a new secondary infection to be generated from a primary infection. In other words, \(\tau\) is the time interval between the onset of a primary case and the onset of a secondary case generated from this primary case. As the mean generation time is reported^{37} to be approximately 6.5 days, we assume \(g(\tau )\) has a Gamma distribution with \(g(\tau ) = \mathrm {Gamma}(6.5, 0.62)\). We represent \(R_0(t)\) as a function of time as
\[R_0(t)=\frac{I_d(t)}{\int _0^\infty I_d(t-\tau )\, g(\tau )\, d\tau }. \tag{8}\]
We approximate the denominator of Eq. (8) directly from our simulated data, by a discrete sum, and evaluate \(R_0\) at the nth day as
\[R_0(n)=\frac{I_d(n)}{\sum _{\tau =0}^{n} I_d(n-\tau )\, g(\tau )}. \tag{9}\]
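The discrete estimate of \(R_0\) on day n can be sketched as follows. One assumption needs flagging: \(\mathrm {Gamma}(6.5, 0.62)\) is read here as a Gamma density with mean 6.5 days and coefficient of variation 0.62, which is a common parameterization for generation times but is not spelled out in the text. The function names `gamma_pdf` and `reproduction_series` are hypothetical.

```python
import math


def gamma_pdf(tau, mean=6.5, cv=0.62):
    """Gamma density with the given mean and coefficient of variation."""
    if tau <= 0:
        return 0.0
    shape = 1.0 / cv ** 2
    scale = mean / shape
    return (tau ** (shape - 1) * math.exp(-tau / scale)
            / (math.gamma(shape) * scale ** shape))


def reproduction_series(daily_cases):
    """R_0(n) = I_d(n) / sum_{tau=0..n} I_d(n - tau) g(tau) for each day n."""
    series = []
    for n in range(len(daily_cases)):
        denom = sum(daily_cases[n - tau] * gamma_pdf(tau)
                    for tau in range(n + 1))
        # Early days have no usable history, so the ratio is undefined there.
        series.append(daily_cases[n] / denom if denom > 0 else float("nan"))
    return series


# Sanity checks on synthetic input: flat incidence versus 10% daily growth.
flat = reproduction_series([100.0] * 80)
growing = reproduction_series([math.exp(0.1 * n) for n in range(80)])
```

For flat daily incidence the estimate settles close to 1 once enough history is available, while exponentially growing incidence gives values above 1, as expected from the renewal relation.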
Statistical error estimation and p-values
Using the Chi-square statistic \(\chi ^{2} \equiv \sum \nolimits _{i=1}^n \left( \frac{D_{i}-S_{i}}{\epsilon S_{i}+1}\right) ^{2}\) (\(0<\epsilon <1\)), where \(D_i\) are the observed data and \(S_i\) the simulated data for the \(\text {i}\)th day, we quantify the accuracy of our model fitting against the real data. Understandably, the data for daily new infections and daily new deaths are contaminated by noise more severely than the corresponding cumulative data. Hence, a Chi-square test applied to cumulative data will always give a high p-value. However, to test the power of our predictive machine learning algorithm, we calculated the p-values on daily new data of deaths and infected. We derive our Chi-square statistic by assuming that the real data are drawn from a normal distribution with the same mean as the simulated data and with a standard deviation equal to some fraction of the simulated data. Although the real data of infected and dead are always positive, this assumption holds well as the infection increases, except for a very small time interval at the start of infection in a population.
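The statistic itself is a one-line computation. In the sketch below the function name `chi_square` and the value \(\epsilon =0.1\) are hypothetical; the text only constrains \(0<\epsilon <1\).

```python
def chi_square(observed, simulated, eps=0.1):
    """Sum of squared residuals scaled by (eps * S_i + 1).

    The +1 keeps the denominator finite on days where the simulated
    count is close to zero.
    """
    return sum(((d - s) / (eps * s + 1)) ** 2
               for d, s in zip(observed, simulated))


# Worked example: one day with 12 observed against 10 simulated gives
# ((12 - 10) / (0.1 * 10 + 1))**2 = 1.0.
single_day = chi_square([12.0], [10.0], eps=0.1)
perfect_fit = chi_square([10.0, 20.0], [10.0, 20.0])
```

Converting the statistic to a p-value additionally requires the Chi-square distribution with the appropriate number of degrees of freedom, which is left out of this sketch.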
Data availability
Data from the Johns Hopkins repository (https://github.com/CSSEGISandData/Covid19) were used, together with country-specific repositories, e.g. US: https://usafacts.org; EU: https://data.europa.eu/; UK: https://coronavirus.data.gov.uk/; India: https://www.covid19india.org/. All the epidemiological information we used is documented in the Extended Data and Supplementary Tables. The code and relevant files are available through the Aston Data Repository.
References
 1.
Davies, N. G., Kucharski, A. J., Eggo, R. M., Gimma, A. & Edmunds, W. J. Effects of non-pharmaceutical interventions on COVID-19 cases, deaths, and demand for hospital services in the UK: A modelling study. The Lancet Public Health https://doi.org/10.1016/S24682667(20)30133X (2020).
 2.
Gatto, M. et al. Spread and dynamics of the COVID-19 epidemic in Italy: Effects of emergency containment measures. PNAS 117(19), 10484–10491 (2020).
 3.
Koff, W. C. & Williams, M. A. Covid-19 and immunity in aging populations: A new research agenda. NEJM https://doi.org/10.1056/NEJMp2006761 (2020).
 4.
Giesecke, J. The invisible pandemic. The Lancet 395(10238), E98. https://doi.org/10.1016/S01406736(20)310357 (2020).
 5.
Moghadas, M. S. et al. The implications of silent transmission for the control of COVID-19 outbreaks. PNAS 117(30), 17513–17515 (2020).
 6.
Funk, S. et al. The impact of control strategies and behavioural changes on the elimination of Ebola from Lofa County, Liberia. Philos. Trans. R. Soc. Lond. B Biol. Sci. 372, 20160302 (2017).
 7.
Nguyen, T. A., Nguyen, Q. C., Le Kim, A. T., Nguyen, H. N. & Nguyen, T. T. H. Modelling the impact of control measures against the Covid-19 pandemic in Vietnam. BMJ. https://doi.org/10.1101/2020.04.24.20078030 (2020).
 8.
Li, R. et al. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2). Science 368(6490), 489–493 (2020).
 9.
Barton, C. M. et al. Call for transparency of Covid-19 models. Science 368(6490), 482–483 (2020).
 10.
European Centre for Disease Prevention and Control. https://www.ecdc.europa.eu/en/cases2019ncoveueea.
 11.
Ota, M. Will we see protection or reinfection in COVID-19? Nat. Rev. Immunol. 20, 351 (2020).
 12.
Chen, D. et al. Recurrence of positive SARS-CoV-2 RNA in COVID-19: A case report. Int. J. Inf. Dis. 93, 297–299 (2020).
 13.
Kissler, S. M., Tedijanto, C., Goldstein, E., Grad, Y. H. & Lipsitch, M. Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period. Science 368, 860–868 (2020).
 14.
Kucharski, A. J. et al. Early dynamics of transmission and control of Covid-19: A mathematical modelling study. Lancet Infect. Dis. 20, 553–58 (2020).
 15.
Glasser, J. W., Hupert, N., McCauley, M. M. & Hatchett, R. Modeling and public health emergency responses: Lessons from SARS. Epidemics 3, 32–37 (2011).
 16.
Fraser, C., Riley, S., Anderson, R. M. & Ferguson, N. M. Factors that make an infectious disease outbreak controllable. Proc. Natl. Acad. Sci. USA 101, 6146–51 (2004).
 17.
Peak, C. M., Childs, L. M., Grad, Y. H. & Buckee, C. O. Comparing nonpharmaceutical interventions for containing emerging epidemics. Proc. Natl. Acad. Sci. USA 114, 4023–28 (2017).
 18.
He, X. et al. Temporal dynamics in viral shedding and transmissibility of Covid-19. Nat. Med. 26, 672–675 (2020).
 19.
Hellewell, J. et al. Feasibility of controlling Covid-19 outbreaks by isolation of cases and contacts. Lancet Glob. Health 8, e488–e496 (2020).
 20.
Giordano, G. et al. Modelling the Covid-19 epidemic and implementation of population-wide interventions in Italy. Nat. Med. Lett. https://doi.org/10.1038/s4159102008837 (2020).
 21.
Dehning, J. et al. Inferring change points in the spread of Covid-19 reveals the effectiveness of intervention. Science https://doi.org/10.1126/science.abb9789 (2020).
 22.
Jo, H., Son, H. & Jung, S. Y. Analysis of COVID-19 spread in South Korea using the SIR model with time-dependent parameters and deep learning. BMJ https://doi.org/10.1101/2020.04.13.20063412 (2020).
 23.
Lewnard, A. J. & Lo, N. C. Scientific and ethical basis for social-distancing interventions against COVID-19. Lancet Infect. Dis. https://doi.org/10.1016/S14733099(20)301900 (2020).
 24.
O’Hallahan, J. et al. From secondary prevention to primary prevention: A unique strategy that gives hope to a country ravaged by meningococcal disease. Vaccine https://doi.org/10.1016/j.vaccine.2005.01.061 (2005).
 25.
Matamalas, J. T., Arenas, A. & Gómez, S. Effective approach to epidemic containment using link equations in complex networks. Sci. Adv. 4(12), eaau4212. https://doi.org/10.1126/sciadv.aau4212.
 26.
https://www.who.int/emergencies/diseases/novelcoronavirus2019/covid19vaccines.
 27.
Sahin, U. et al. COVID-19 vaccine BNT162b1 elicits human antibody and TH1 T cell responses. Nature 586, 594–599 (2020).
 28.
Corbett, K. S. et al. SARS-CoV-2 mRNA vaccine design enabled by prototype pathogen preparedness. Nature 586, 567–571 (2020).
 29.
Mahase, E. BMJ 372. https://doi.org/10.1136/bmj.n86 (2021).
 30.
Baraniuk, C. BMJ 372. https://doi.org/10.1136/bmj.n743 (2021).
 31.
https://www.nytimes.com/interactive/2021/health/bharatbiotechcovid19vaccine.html.
 32.
Zhang, Y. et al. Safety, tolerability, and immunogenicity of an inactivated SARS-CoV-2 vaccine in healthy adults aged 18–59 years: A randomised, double-blind, placebo-controlled, phase 1/2 clinical trial. The Lancet 21, 181 (2021).
 33.
Mercado, N. B. et al. Single-shot Ad26 vaccine protects against SARS-CoV-2 in rhesus macaques. Nature 586, 583–588 (2020).
 34.
Callaway, E. & Mallapaty, S. Novavax offers first evidence that COVID vaccines protect people against variants. Nature 590, 17. https://doi.org/10.1038/d41586021002689.
 35.
Culbertson, A. COVID-19: Does the Indian variant make vaccines less effective and how concerned should we be? https://news.sky.com/story/istheindiancovidvariantmoreinfectiousandshouldtheukbeconcerned12280387.
 36.
Cohen, J. South Africa suspends use of AstraZeneca’s COVID-19 vaccine after it fails to clearly stop virus variant. Science https://doi.org/10.1126/science.abg9559.
 37.
Aliou, M. A. & Baldé, T. Fitting SIR model to COVID-19 pandemic data and comparative forecasting with machine learning. BMJ https://doi.org/10.1101/2020.04.26.20081042 (2020).
 38.
Prem, K. et al. The effect of control strategies to reduce social mixing on outcomes of the Covid-19 epidemic in Wuhan, China: A modelling study. Lancet Public Health 5(5), E261–E270 (2020).
 39.
Grela, E., Stich, M. & Chattopadhyay, A. K. Epidemiological impact of waning immunization on a vaccinated population. Eur. Phys. J. B 91, 267 (2018).
 40.
Zhou, P. et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature https://doi.org/10.1038/s4158602020127 (2020).
 41.
Hamming, I., Timens, W., Bulthuis, M. L. C., Lely, A. T., Navis, G. J. V. & Goor, H. V. Tissue distribution of ACE2 protein, the functional receptor for SARS coronavirus. A first step in understanding SARS pathogenesis. J. Pathol. https://doi.org/10.1002/path.1570 (2004).
 42.
Johns Hopkins Covid-19 repository. https://github.com/CSSEGISandData/Covid19.
 43.
Endo, A., Leeuwen, E. V. & Baguelin, M. Introduction to particle Markov-chain Monte Carlo for disease dynamics modellers. Epidemics 29, 100363 (2019).
 44.
Flaxman, S. et al. Estimating the number of infections and the impact of non-pharmaceutical interventions on COVID-19 in 11 European countries. Imp. Coll. Lond. https://doi.org/10.25561/77731 (2020).
 45.
Nishiura, H. Correcting the actual reproduction number: A simple method to estimate R0 from early epidemic growth data. Int. J. Environ. Res. Public Health 7(1), 291–302 (2010).
 46.
Cori, A., Ferguson, N. M., Fraser, C. & Cauchemez, S. A new framework and software to estimate time-varying reproduction numbers during epidemics. Am. J. Epidemiol. 178(9), 1505–1512 (2013).
 47.
Tomie, T. Understanding the present status and forecasting of COVID-19 in Wuhan. https://doi.org/10.1101/2020.02.13.20022251.
 48.
 49.
https://www.reuters.com/article/healthcoronavirussprayintidUSKBN2BE267.
 50.
Gelman, A. et al. Bayesian Data Analysis (CRC Press, 2013).
 51.
Ramsay, J. & Hooker, G. Dynamic Data Analysis (Springer, 2017).
Acknowledgements
AKC acknowledges Darren Flower for his comments and advice on the manuscript. The authors acknowledge VAXFARM Life Sciences for insightful discussions on vaccinology and therapeutics.
Author information
Contributions
A.K.C. and D.C. designed the core model, which S.K.N. subsequently modified. S.K.N. led the MCMC computation and model simulation, while A.K.C. and B.K. led the analytical sections. D.C. and G.G., together with S.K.N. and B.K., were in charge of the comparative statistical error estimation. All authors wrote and approved the manuscript and contributed equally to the final output.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Chattopadhyay, A.K., Choudhury, D., Ghosh, G. et al. Infection kinetics of Covid-19 and containment strategy. Sci Rep 11, 11606 (2021). https://doi.org/10.1038/s41598021906982
DOI: https://doi.org/10.1038/s41598021906982