Spatio-temporal predictive modeling framework for infectious disease spread

A novel predictive modeling framework for the spread of infectious diseases using high-dimensional partial differential equations is developed and implemented. A scalar function representing the infected population is defined on a high-dimensional space and its evolution over all the directions is described by a population balance equation (PBE). New infections are introduced among the susceptible population from a non-quarantined infected population based on their interaction, adherence to distancing norms, hygiene levels and any other societal interventions. Moreover, recovery, death, immunity and all aforementioned parameters are modeled on the high-dimensional space. To epitomize the capabilities and features of the above framework, prognostic estimates of Covid-19 spread using a six-dimensional (time, 2D space, infection severity, duration of infection, and population age) PBE is presented. Further, scenario analysis for different policy interventions and population behavior is presented, throwing more insights into the spatio-temporal spread of infections across duration of disease, infection severity and age of the population. These insights could be used for science-informed policy planning.


Results
The population balance model. Let T ∞ be a given final time and � := � x ⊗ � ℓ be the computational domain of interest. Here, x ⊂ R 2 is the spatial domain defining the geographical region of interest and � ℓ ⊂ R n , where n is the number of internal directions. Each of the n-internal directions represents the property of the population on which a distribution needs to be predicted. Suppose the properties of interest are the infection severity, duration of the infection and age of the population, then a model with three internal directions could be used as follows. Let � ℓ := L v × L d × L a be the internal domain, where L v = [0, 1] denotes the infection severity interval, L d = [0, d ∞ ] denotes the duration of infection, d ∞ is the maximum duration of infection, L a = [0, a ∞ ] denotes the age interval and a ∞ is the maximum age of the population. The infection index ℓ v ∈ L v quantifies the severity of the infection among the infected population. Specifically, the population with infection index ℓ v = 0 is completely disease-free, with ℓ v = 1 has maximum severity, with ℓ v ≥ v sym shows symptoms and those with ℓ v < v sym are asymptomatic. The duration of infection index ℓ d ∈ L d quantifies the time since a population has been exposed to and contracted the disease. Specifically, the population that just contracted the disease has ℓ d = 0 . Typically, a person is asymptomatic until they reach ℓ v = v sym , and the duration elapsed ℓ d is the incubation period in which the disease is sub-clinical and that population is actively spreading the disease. After recovery, a population doesn't necessarily go to ℓ v = 0 , rather they reach ℓ v < v reco .
Let I(t, x, ℓ) , where t ∈ (0, T ∞ ], x ∈ � x and ℓ ∈ � ℓ , be the infected number density function of the population. To describe the evolution of the active infected population size distribution, we propose the population balance equation Here, u denotes the advection vector that quantifies the multiscale spatial movement of the population in a differential neighbourhood of x (e.g., migrant laborers, daily commute for work, logistics-related travel, periodic gathering for religious and social events), n is the outward unit normal vector to x , ∂� − := {x ∈ ∂� x | u · n < 0 } , g n is the flux that quantifies the net addition of the infected population into x from outside (the spatial movement of the population across the border of the domain ∂� x ), and I 0 is the initial distribution of the infected population. Further, Here, β is the immunity of the infected population, γ is the pre-medical history of the infected population and α is the effective treatment index. Next, we define the rate term C = C R + C ID , where C R (t, x, ℓ) is a recovery rate function that quantifies the rate of recovery of the population from the infection, and C ID (t, x, ℓ) is the infectious death rate. We also define a source term F = C T (t, x, ℓ) that quantifies the point-to-point movement of infected population (e.g., by air, train etc) within x , which are not included in u and g n . Moreover, C T and u need to be defined in such a way that the net internal movement of infected population within x is conserved. Moreover, B nuc is the nucleation function that quantifies the infection transmission from the infected to the susceptible population and it is a function of several parameters as follows Here X ∈ , σ , H and S D are the interactivity, hygiene and social distancing indices respectively. Finally, the total population N(t) at a given time t ∈ (0, T ∞ ] is defined by Here, N S , N B , N R , N I , N Q N ID and N D are the number of susceptible, newborn, recovered, infected (symptomatic/ asymptomatic), quarantined, infectious death and natural death populations, respectively. The given initial and boundary conditions and the above defined parameters close the population balance system.

Modeling of parameters.
The proposed population balance model (1) is comprehensive and built on the basis of several parameters as defined above. This framework is very versatile and provides us the means to incorporate the effects of different parameters that describe the complex infectious disease spread dynamics into a singe PDE. Fitting these parameters accurately using either data-driven approaches 10 or appropriate assumptions from literature makes the model very robust. In this section, we describe the modeling of each parameter. (1) (2) B nuc = B nuc t, X, σ , H, S D , N S , N Q , I . www.nature.com/scientificreports/ Nucleation term. The nucleation term B nuc quantifies the new infection number density that is added to the system at ℓ d = 0 for all t, x , ℓ v , and ℓ a . Depending on how the susceptible population interacts with the infected population, new infections are added to the system. We call this addition as nucleation (borrowing the terminology from process engineering), which is modelled as (3) determines the fraction of the infected population in quarantine and it can be modeled as in equation (5). Further, the factor γ Q is dependent on the level of screening including testing, strictness of enforcing isolation and compliance of susceptible general public. Suppose γ Q = 1 , i.e., if the entire infected population is kept under strict isolation, newly infected population will be zero and eventually there will be no spread of disease. However, due to economic, social and democratic reasons, implementing such a strategy is nearly impossible and there is bound to be spread, i.e., γ Q < 1 . Moreover, the integral on the right-hand side of equation (3) is the total non-quarantined number density of the infected population at (t, x) , and R is the rate at which the non-quarantined population infects the susceptible population. The factor R is modelled as in equation (4), where R 0 is the basic reproduction rate,

Scientific Reports
. Suppose σ = 0 then everything is under perfect lockdown and R → 0 . In case S D = 1 , everyone is following perfect social distancing and R → 0 . Moreover, the newly infected population has to be added at different age ( ℓ a ) and infection (ℓ v ) levels for which the factors f 4 and f 5 are introduced. We propose to use logistic functions fitted to data from literature for f 1 , f 2 , f 3 ; the normalized demography at (t, x) for f 4 (t, x, ℓ a ) , and a Gaussian mixture with two components so that maxima is at v sym and tails are proportional to the interval length over [0, v sym ] and [v sym , 1] for f 5 (ℓ v ) . In addition, the constant in f 5 is chosen such that the integral of f 5 over its support is one. This condition is imposed to ensure that R 0 can be interpreted as the basic reproduction rate used in standard epidemiological models 12  Growth factor. The growth factor G ℓ v quantifies how the infected number density is advected along the direction of l v , that is, how the infection becomes mild to severe/critical and vice-versa in the infected population. We can model it as a function of the medical history, immunity of the population, which in turn are functions of the age l a , treatment and socio-economic status. Nevertheless, as a simple first order model, we propose a nonlinear function of the age, where K g is a non-dimensionalization factor, p is a power of nonlinearity and a risk is the age offset.
Recovery rate and infectious death rate. In general, the recovery rate C R and infectious death rate C ID depend on ℓ v , and in turn are functions of hospital facilities, age, and health state of the population. These rates can be modeled directly from clinical data for all ordinates ℓ v , ℓ d , ℓ a . For the exact functional forms refer to the Supplementary Information.
Initial infection number density. The initial number density I 0 (x, ℓ) can be estimated directly from available official data on the day of starting the simulation. However, the data is available only in terms of total number of tested and confirmed cases at a x-location and the dependence on ℓ needs to be estimated via appropriate datadriven and analytical functions. As such, first we utilize data from a period of 14 days, along with the log-normal distribution of incubation period 13 to calculate the initial number density N D (x) at all the spatial points x , but integrated over the three internal ordinates ( ℓ v , ℓ d , ℓ a ), i.e., . www.nature.com/scientificreports/ where N i (x) is the data of tested and positive. For the distribution along the internal ordinates, we propose to use the following initial infection number density distribution Here the first term in the square brackets is the normalized demography function (same as f 4 ), second term is the log-normal incubation period function with fitted 13 a 2 = 0.42 and b 2 = 1.62 , and f 5 is same as before.

Covid-19 epidemic spread predictions.
To exhibit the capabilities of the proposed model, the forecast of Covid-19 spread in India is presented here. The numerical scheme and the fitted model parameters are given in the Supplementary Information. The proposed model and numerical schemes are implemented in our in-house finite element package 14,15 and have been verified in our earlier studies with applications to process engineering 16,17 .
With the spread of Covid-19 in India, the federal government imposed a nation-wide lockdown from March 25, 2020. To simulate the spread of infections starting from March 23, 2020, the initial distribution of infected population is estimated using the data of active cases from March 23 to April 5 according to equations (7) and (8). Then the infection spread forecast for one year is computed by solving the PBE system (1). Further, data until June 21, 2020 is utilized to select the parameters (e.g., S D , C R , C ID , γ Q ) that best explains the actual data. Thereafter, the control parameter S D is varied to perform scenario analyses as presented next.
Scenario analysis. Different future scenarios are predicted by varying S D (t) based on the anticipated individual behavior (social distancing, hygiene practice, compliance to government rules etc.) and government policies (quarantine rules, lockdown rules etc.). The first scenario, named Current Trend follows business as usual assuming further relaxation to lockdown rules. A second variant named Better Scenario assumes better compliance in the social distancing and other measures to control the spread of the disease. Sunday, and Sunday & Wednesday lockdowns are imposed on the Current Trend scenario to formulate the third and fourth scenarios respectively. These lockdown scenarios are introduced to measure the impact of periodic lockdowns on the effectiveness of these strategies to control the disease spread. The active ( N I ), recovered (cumulative N R ), deaths (cumulative N ID ) and total (sum of active, recovered and deaths) predicted by the four scenarios for the duration between March 23, 2020 and March 22, 2021 are shown as time-series plots in Fig. 1. In the Current Trend, a peak of 0.975 million ' Active Cases' is predicted in the last week of October 2020, and there will be around 21 million ' Active Cases' , 450,000 deaths and 9.1 million total cases at the end of March 2021. The peak of the Better Scenario is predicted in the second week of September 2020 with 0.478 million ' Active Cases' , which is lower than the Current Trend. Further, there will be around 14,200 ' Active Cases' , 0.188 million deaths and 3.74 million total cases at the end of March 2021. The weekly lockdown scenarios assume that a complete lockdown is imposed on Sunday or Sunday and Wednesday. During this lockdown, there is a complete restriction of people's movement similar to the nationwide lockdown imposed between Mar 25 and April 14 in India. With Sunday Lockdown, a peak of 0.365 million ' Active Cases' is sustained for about two weeks during 5-20 September 2020, and there will be around 30,200 thousand ' Active Cases' , 0.167 million deaths and 3.32 million total cases at the end of March www.nature.com/scientificreports/ 2021. With Sunday and Wednesday lockdown, a peak of 0.197 million ' Active Cases' is sustained for the period 27 June to 15 July 2020, and there will around 2800 ' Active Cases' , 70,300 deaths and 1.39 million total cases at the end of March 2021. The insets in each panel of Fig. 1 show the comparison with actual data and thereby validate the model. In addition, the time series plots for other scenarios including a worse-case scenario can be found at IISc-Model website 18 .
A key insight needed for the central government in India was to identify which states needed additional assistance in containing the spread of the virus. For this purpose, we first fit the parameters of the model to match the reported national data. Thereafter, the estimates are computed for individual states with the above tuned parameters. Figure 2 shows the actual data (until July 14, 2020) and the computed estimates for the states of Karnataka (KA) and Maharashtra (MH). The data of KA follows the computed estimate closely, whereas the actual data indicates more infections and active cases in MH than the estimate. This observation indicates that MH needs further assistance than KA (as of July 14, 2020) to contain covid-19 spread. Such insights can be used by the authorities to introduce state-wise lockdown policies and to plan infrastructure for quarantine, treatments etc. The estimates computed with the above tuned parameters for other states (as of July 14, 2020) can be seen at our IISc-Model website 18 . Further, to capture the recent dynamics of the spread in the individual states following the graded re-opening of the economy, we have re-started the simulation with the updated initial condition on Aug 1, 2020. These updated predictions can also be seen at our website 18 .
Population distribution. Our PBE model in fact predicts the distribution of the infected population over all the internal ordinates ℓ v , ℓ d , ℓ a . In the previous section, we have shown only the total number of infected, recovered and deceased populations. Now, to showcase one of the unique features of the model, we present and discuss the predicted population distribution for the Sunday lockdown scenario. Figure 3 shows the predicted distribution of active Covid-19 infected population over the ordinates ℓ v , ℓ d and ℓ a at different time instances (day 60, 120, 180, 240, 300 and 365) with their corresponding dates.
Predicting the severity of the infected population is crucial to plan the hospital requirements including antiviral treatments, quarantine, hospitalisation, ventilator support, and oxygen support. In particular, the information of asymptomatic and symptomatic infected population helps the policymakers to plan quarantine rules. Moreover, the death rate is a function of the severity of infection and is crucial to predict the causalities arising from the infection spread. The duration of infection plays a key role in epidemic modeling. Classical models usually assume a constant duration. However, the recovery and the death of the patient depend on the immunity, age and health of the patient, medical treatments etc and thus the duration of infection need not be a constant. In the proposed model, the duration of infection is considered as an independent internal ordinate. The recovered population distribution over the duration of infection and other internal ordinates for days 60, 120, 180, 240, 300 and 365 is shown in Fig. 4 along with their corresponding dates. Crucially, the predicted distribution with duration of infection, especially at initial stages, is key to plan for testing and to make effective decisions on quarantine, hospitalization and discharging from hospitals. Finally, the age of the population is pivotal in epidemics like Covid-19 since it affects children and aged population severely. Therefore, it is incorporated into the proposed model as another independent ordinate. In fact, the newly infected population is added from the susceptible population across the age distribution through the nucleation term. Moreover, the response to the antiviral treatment, death and recovery rates depend on age-specific health complications such as diabetics, cardiovascular disease, can also be incorporated in the PBE model with appropriate functions that depend on the age of the population.

Discussion
Our spatio-temporal modeling framework is the first comprehensive partial differential equation model for predicting infectious disease spread. Computationally, our model is efficient compared to agent-based stochastic models. Mathematically, our PDE system is more compact and comprehensive compared to ODE-based compartmental models. Specifically, the PDE is a continuum description of the infected population whereas the compartmental models are a discrete representation. Crucially, in contrast to the existing models, our model www.nature.com/scientificreports/ provides an insight into the distribution of infected population (presented in previous sections). This information is important to plan policy interventions, especially in Covid-19 like pandemics. Not only prognostic estimates, but also diagnostic estimates for more detailed analysis using distribution can be performed with the proposed framework.
With more data and employing data-driven and machine learning approaches, we could further refine the parameters and functional forms of different model components to derive more insightful predictions. For example, to derive insights into the reopening of the workplace and educational institutions, the nucleation and  www.nature.com/scientificreports/ advection vector could be modeled to account for interactions between different age groups and movement of people between homes and these places. The potential options for refining the model are virtually endless. In particular, there is no restriction on the choice of number of internal coordinates. For example, in addition to ℓ v , ℓ d , ℓ a , profession, mobility history, etc can also be added as internal coordinates. Even though we have emphasized Covid-19 pandemic in the present paper, the proposed model can readily be used for forecasting any other infectious disease spread. In future, a data assimilative framework for a realtime update of forecasts can also be implemented.