Fractional SIR epidemiological models

The purpose of this work is to make a case for epidemiological models with fractional exponent in the contribution of sub-populations to the incidence rate. More specifically, we question the standard assumption in the literature on epidemiological models, where the incidence rate dictating propagation of infections is taken to be proportional to the product between the infected and susceptible sub-populations; a model that relies on strong mixing between the two groups and widespread contact between members of the groups. We contend, that contact between infected and susceptible individuals, especially during the early phases of an epidemic, takes place over a (possibly diffused) boundary between the respective sub-populations. As a result, the rate of transmission depends on the product of fractional powers instead. The intuition relies on the fact that infection grows in geographically concentrated cells, in contrast to the standard product model that relies on complete mixing of the susceptible to infected sub-populations. We validate the hypothesis of fractional exponents (1) by numerical simulation for disease propagation in graphs imposing a local structure to allowed disease transmissions and (2) by fitting the model to the JHU CSSE COVID-19 Data for the period Jan-22-20 to April-30-20, for the countries of Italy, Germany, France, and Spain.

where I(t), S(t) represent the size of infected and susceptible sub-populations. The proportionality factor β is typically determined on a case-by-case basis. Thus, if R(t) represents the size of the recovered population, assuming that all individuals undergo full recovery and thereby the total population S(t) + I(t) + R(t) remains constant, the most basic model for transmissions is in the form of the following system of equations, known as SIR model, where α is the recovery rate (with time constant of recovery τ := 1/α ) and η a parameter regulating the rate at which immunity is lost over time. See 1,18 for all the details about the SIR model together with an extensive list of references. Multi-compartmental models that include infected but asymptomatic individuals, deceased, or exposed, have also been considered. However, throughout, the basic feedback that drives the infection, r(t), is invariably as in (1).
In departure from this well-studied SIR paradigm, we propose a fractional SIR (fSIR) model with rate where one or, possibly, both sub-populations are scaled by exponents that are typically less than 1. The justification for such a model stems from the fact that, at least during the initial phase of an epidemic, infection propagates outwards from infected cells to the general population. In such a scenario, where for instance S(t) ≫ I(t) (much greater), the boundary of infected cells which would roughly account for most new infections, scales as a (2) r(t) = βI(t) γ S(t) κ ,

Models of discrete transmissions
To provide insight and justification for our hypothesis on the validity of Eq. (2), we develop a discrete model for direct transmissions between individuals consisting of nodes (individuals) on a graph that captures contacts between them. In the present work, the graph is fixed, while in future work we plan to explore the possibility of time-varying links between nodes as well as modeling control actions, such as social-distancing, so as to study the effects of such mediation protocols.
Model: probabilistic SIR on a graph. Consider a simple undirected graph of size n with adjacency matrix A, where A ij = 1 if node i and j are connected, and A ij = 0 otherwise . The graph is used to model the spread of infection over a network of nodes representing individual people. Every node can be in one the three states {S, I, R} . We use x i (t) ∈ {S, I, R} to represent the state of node i at time t ∈ {0, 1, 2, 3, . . .} . We consider the unit of time to be one day. Here, {x 1 (t), . . . , x n (t)} evolves, as a Markov chain on 3 n states, according to the following transition probabilities dictating transition at the node level for x i (t) to x i (t + 1), www.nature.com/scientificreports/ Here, β is the transmission rate, α is the recovery rate, and η is the susceptibility rate (quantifying loss of immunity over time). The notation N where 1 {·} = 1 when {·} holds and is 0 otherwise. With regard to the structure of the graph, specifying contacts between individuals, we describe results considering the following options: Two-dimensional grid-graph. We work out two rudimentary models where individuals (nodes) are placed on a 2-dimensional grid (vertex set) We carry out experiments for two cases, where the edge set is defined by with d ∈ {1, 2} . Thus, when d = 1 , each node is connected to k = 4 nearest neighbors, while in the case where d = 2 each node is connected to k = 8 nearest neighbors (von Neumann neighborhood with Manhattan distances 1 and 2, respectively). The graph structure for the case d = 1 is depicted in Fig. 1a. In either case, the infectious model is simulated and the results discussed in the section on experiments.
Two-dimensional random graph. We postulate a distribution of nodes on R 2 according to a Gaussian mixture model (Fig. 1b). Each node is connected to its 4 nearest neighbor. Analogous conclusions are drawn and discussed in the section on experiments as well.  www.nature.com/scientificreports/ Table 1 includes summary of notation for modeling parameters. We experiment with a randomized initialization, where a number of small initial cells of infected individuals are sprinkled randomly with probability p 0 inside the general population. A localized initialization, where a specified initial collection of neighboring nodes are infected and the contagion begins from these nodes, gives similar results.
We are interested in studying the dynamics of the subpopulations of susceptible, infected, and recovered individuals, denoted by S(t), I(t), and R(t), respectively. These quantities are calculated from the state of the individuals x i (t): for each ξ ∈ {S, I, R}.
We model the change in the number of infected individuals �I(t) := I(t) − I(t − 1) as a function of aggregate quantities S(t), I(t), and R(t). The change is mainly due to two factors, (1) an increase due to transmission of infection by contact between infected and susceptible individuals, denoted by �I trans (t) , and (2) a decrease due to recovery of individuals, denoted by �I recovery (t) . Thus, In terms of the state of the Markov chain, �I trans (t) and I recovery are given by It is expected that �I recovery (t) ≈ αI(t) on average. However, the model for I trans is not simple. We hypothesize the fractional model where c is a constant, and γ and κ are exponents. We test this hypothesis in the next section by carrying out Monte Carlo simulations on a network model and fitting the data to the proposed fractional model.

Experiments
Two-dimensional grid-graph. Simulation results of infection spread on the two-dimensional grid-graph with size 100 × 100 , for d ∈ {1, 2} (and hence, each node connected to the k ∈ {4, 8} nearest nodes), were carried out for the model (4) and are presented in Figs. 2, 3 and Figs. 4, 5, respectively. The documented experiments were carried out for η = 0.01 , initial infection probability p 0 = 10 −3 , combination of parameters for α = {0.05, 0.1} and β ∈ {0.2, 0.3} , and a time period of 50 days. In order to take into account the randomness of the stochastic model, 100 Monte Carlo simulations were carried out. Each simulation involves independent random initialization of the infected individuals, while the structure of the graph remains fixed.
The top panel in each figure depicts the number of susceptible, infected, and recovered individuals as a function of time, for a single realization of the stochastic model. The bottom panel (of the three panels) in each figure depicts in 3D the number of newly infected population due to transmission, �I trans (t) , vs. the number of infected people I(t), and the number of susceptible people S(t), with solid blue lines marking 100 independent simulations. The center panel represents the two-dimensional projection of the data points on the (log(�I trans (t)), log(I)) plane.
In order to capture the relationship between �I trans (t) and I(t) and S(t), a parametric curve of the form �I trans (t) = cI(t) γ S(t) κ is fitted to the data-points obtained from each simulation. Thus, the parameters c, γ , and κ are obtained by a least-squares fit of the linear relation between respective logarithms (with zero entries for �I trans (t), I(t), S(t) being ignored), The linear regression implicitly assumes an additive Gaussian perturbation, as in (8), on the log-transformed data. Such an additive perturbation of the logarithm is natural in population models as it ensures, here, that �I trans (t) remains positive and that �I trans (t) is zero when either S(t) or I(t) is zero; for further discussion, see 30 .
Note that c, γ , κ are random and differ in each realization of the stochastic model. The mean and standard deviation of γ , κ over 100 simulations are reported in Tables 2a,b for the cases d = 1 and d = 2 , respectively. Also, the curve corresponding to the mean values of the parameters is depicted in the second and third panels for comparison with the simulation data points (in orange color). We observe that in all cases, the standard deviation of the fitted exponent γ is small, around 0.04. However, the standard deviation over κ is large, in some cases around 0.4. This is due to some extent to the fact that the simulations were carried out over short time window, so that the change in S is not sufficiently significant to lead to a reliable exponent κ (i.e., the fractional change in S is relatively small, hence there is an inherent insensitivity to the actual value of the exponent).
log(�I trans (t)) = log(c) + γ log(I(t)) + κ log(S(t)). www.nature.com/scientificreports/ As an indicator of the statistical significance of the fractional SIR model, as compared to the traditional model with exponents equal to one, we evaluated the standard model selection Akaike Information Criterion (AIC). The AIC score is defined as where the maximum likelihood is obtained by maximizing the probability of observing the data set for the parameters of the model. The assumed fSIR model is with ǫ(t) independent Gaussian random noise, for each t, having mean zero and unknown variance denoted by σ 2 . Because of the Gaussian assumption for the noise, the log-likelihood takes the following form www.nature.com/scientificreports/ where T is the number of data-points. In our results, we report the log-likelihood normalized by the number of data points, i.e. 1 T log-likelihood. For the purpose of comparison, we consider four different models: (1) our proposed fSIR model where γ and κ are free, (2) γ is free and κ = 1 , (3) γ = 1 and κ is free, and (4) γ = 1 and κ = 1 . The number of unknown parameters for the first model is 4, namely, c, γ , κ, σ . For the second and the third model, the number of unknown parameters is 3, and for the last one it is 2. We computed the AIC scores and the normalized maximum loglikelihood for all four models averaged over 100 Monte-Carlo simulations for the two-dimensional grid network structure. The results for the setting (Manhattan distance) d = 1 and d = 2 are reported in Tables 3a,b, respectively. The reported values suggest a preferance for the fSIR model over models with exponents equal to one; the goodness of the fit more than compensates for the increased "complexity in the model" due to the additional parameters. For illustration purposes, we compare the four models fit to the data in Fig. 6.
In order to study the effects of model parameters α and β on the fitted exponents γ and κ , we carried out additional experiments and reported the results in Fig. 7. Varying β does not affect the fitted exponent significantly, however a change in α does. This is consistent with the exponent being impacted by the interface (boundary) between the infected population and susceptible population; a change in β affects the speed that the interface propagates. A change in α impacts the interface by changing the rate at which the population inside clusters become susceptible again. www.nature.com/scientificreports/ Two-dimensional random graph. Simulation result on a two-dimensional random graph model were carried out and documented in Fig. 8. The spatial distribution of 10 4 node-coordinates ((x, y)-coordinates, cf. Fig. 1b) has been selected from the mixture of three Gaussians distributions on the node-coordinate plane, Here, N(v, R) denotes a Gaussian distribution with mean v and covariance R. Initially, for the experiment documented in Fig. 8, the nodes are connected to their 4 nearest neighbors. The infection spread model is once again the one specified in (4). The parameters selected for the results in Fig. 8 are β = 0.3 and η = 0.01 , while we compare the effect of α ∈ {0.05, 0.1} . The numerical result contains 100 independent simulations where each simulation involves independent random construction of the graph according to the Gaussian mixture model, and also random initialization of infected nodes. As before, the first layer of panels depicts the number of susceptible, infected, and recovered individuals as a function of time, for a single realization. The two-dimensional projection of the (�I(t), I(t), S(t)) curve on the (log(�I(t)), log(I(t))) plane is depicted in the second layer of panels and compared to the distribution of (�I(t), I(t)) point set, while the third layer of panels compares the fit of curve to the (�I(t), I(t), S(t)) data set in a three-dimensional plot once again in logarithmic scales. The mean and the standard deviation of the fitted exponents are reported in Table 2c. We carry out the model selection procedure, described for the two-dimensional grid, for the random graph and report the results in Table 3c.  It appears, as hypothesized, that the exponent γ is affected by the local structure of the graph. If vertices are only connected to nearest neighbors, then γ is small. When random connections are introduced (thereby reducing the effective diameter of the graph, i.e., "small world" effect), γ increases to 1. Note that only changing the number of connections to a nearest neighbor does not affect the exponent as shown in Fig. 10a.

COVID-19 data-set.
We utilized the COVID-19 data repository by the center for systems science and engineering (CSSE) at Johns Hopkins University 7,9 . We study the relationship between the number of newly infected �I trans (t) and the total number of infected individuals I(t) for the duration January 22 to September 13, in Italy, Germany, France, and Spain. The number of newly infected individuals is available as the number of newly confirmed cases each day. The total number of infected individuals (active cases) is calculated by subtracting the Figure 5. Spread of infection on a two-dimensional grid, with connection to k = 8 nearest neighbors (i.e., d = 2 in Eq. (5)) and transmission rate β = 0.3 (the recovery rate α is fixed to either 0.05 or 0.1). The top panels depict the number of susceptible, infected, and recovered individuals as a function of time over a range of 50 days, for a single realization of the stochastic model described in "Model: probabilistic SIR on a graph". The center and bottom panel depict (�I trans (t), I(t)) and (�I trans (t), I(t), S(t)) respectively, for 100 Monte-Carlo simulations (with solid blue lines). The fractional model fit is illustrated with orange dashed curve. The fitting procedure maximizes the log-likelihood (9) over the free parameters. The value of the fitted exponents are shown in the legend. www.nature.com/scientificreports/    Table 3. Model selection results for simulated and real data. For the simulated data, the fitting models are based on (8) with parameters c, γ , κ, σ . The models differ in that the exponents γ or κ are fixed or optimally selected (free). For the real data, the size of susceptible population S(t) is assumed to be constant and the exponent κ is ignored. The AIC represents the Akaike information criterion defined according to (7) and MLL denotes the normalized maximum log-likelihood computed by maximizing (9) over free parameters. The reported results for the simulated data are averaged over 100 Monte-Carlo simulations. For the simulated data, the infection parameters α and β are assumed to take different values as indicated in the top row of the tables.  www.nature.com/scientificreports/ number of deaths and recovered from the cumulative sum of confirmed cases. The time evolution of the number of confirmed cases and active cases, for these four different countries, is depicted in Fig. 11. We fitted the model to the data, for the parameter c and exponent γ , for the first 100 days of the pandemic (starting on January 22). Due to the fact that S(t) ≫ I(t) during these initial stages of infection spread, S(t) is treated as constant. The result for the four different countries is depicted in Fig. 12. The model selection result, in comparison to the model where γ = 1 is fixed, is presented in Table 3d.

Discussion
The recent onset of the COVID-19 pandemic has underscored the urgency of accurate models for the spread of infectious diseases. These may help guide the allocation of resources, intervention, and mediation strategies, and may help quantify the impact of lifestyle changes on the progression of the epidemic and the threshold for herd immunity 8 . In particular, one immediate practical question in the current COVID-19 pandemic is to decide when an intervention-such as reinstitution of social distancing (after this has been relaxed)-is appropriate. Thus, while efforts to develop accurate models go back almost a century 16 (see also 4,13 ), the subject is especially urgent today.
(10) log(�I trans (t)) = log(c) + γ log(I(t)) Figure 6. Comparison of four fSIR models with free and partially specified choice of exponents (with the SIR model corresponding to γ = 1 and κ = 1 ) based on simulation data obtained from two-dimensional grid model with infection parameter α = 0.05 and β = 0.3 . The AIC and maximum likelihood scores for these models are reported in Table 3. www.nature.com/scientificreports/ The main thesis of this work is that models of epidemics, especially during the early phases, incorrectly assume that the contagion depends on the product of infected and susceptible populations. Contagion takes place at the boundary of infected cells and as a result it is the topology of the distribution of infected cells that dictate the spread. An analogous situation takes place in tumor growth, where models suggest (see 3,21,23 ) fractional exponent for the contribution of tumor volume, as this more accurately captures the size of the boundary that affects growth. Thus, based on an analogous rationale, we propose a fractional-power alternative, fSIR, to the standard SIR model of disease dynamics. The value of exponents depend on a number of factors including the nature of the boundary between infected cells and the general susceptible population. Specifically, the exponent relates to the level mixing between infected and healthy individuals at interface between the two sub-populations, and may be quantified by the diameter of the graph that represents contacts between individuals.
Our thesis is supported by simulation results as well as by fitting this fSIR model to recent COVID-19 datasets. Specifically, the two-dimensional discrete probabilistic SIR models in Figs. 2 and 3 (with k = 4 nearest-neighbor connection) and in Figs. 4 and 5 (with k = 8 nearest-neighbor connection), that simulate disease propagation on a discrete domain of nodes (representing individuals in contact with one another), suggest exponents γ in the range between 0.66 and 0.76 for the contribution of I(t) on the infection rate. Similar results are observed in Fig. 8 for a two-dimensional random distribution of nodes (vertex set) with four nearest-neighbor contacts The top panels depict the number of susceptible, infected, and recovered individuals as a function of time over a range of 50 days, for a single realization of the stochastic model described in "Model: probabilistic SIR on a graph". The center and bottom panel depict (�I trans (t), I(t)) and (�I trans (t), I(t), S(t)) respectively, for 100 Monte-Carlo simulations (with solid blue lines). The fractional model fit is illustrated with orange dashed curve. The fitting procedure maximizes the log-likelihood (9) over the free parameters. The value of the fitted exponents are shown in the caption.  Fig. 10 highlight the weak dependence of γ on the number of short-range contacts (nearest neighbors) and the strong dependence on even a few long-range contacts amongst the general population. The fit of the COVID-19 data-set 9 gives exponents for the contribution of I(t) on the infection rate in the range of 0.6-0.8. In this data-set, the value of S(t) (that includes the remaining of a rather large total population) varies insignificantly over time, and hence may be treated as constant. A limitation of our experiment is that the value of I(t) is only an estimate since recording of all infected individuals is not guaranteed. Our findings relate to patterns reported in recent studies on COVID-19 11,28 . In particular 28 , reports a power growth law of the infected  www.nature.com/scientificreports/ population with time, which is consistent with our proposal since, in general, fractional exponents lead to power laws (Assuming for simplicity that S(t) is constant, and the recovery rate α = 0 , then the fSIR equation simplifies to İ = βI(t) γ . When γ ∈ (0, 1) , the solution can be readily expressed as I(t) = (c 0 + β(1 − γ )t) 1 1−γ for a constant c 0 . This represents a power law in t, unless γ = 1 , in which case the solution I(t) = c 0 e βt is exponential).
An important direction for future research lies in the better understanding of how fractional powers in macroscopic dynamics arise from the probabilistic-network epidemic models. The majority of existing literature on epidemics on networks deals with steady state analysis and is aimed at assessing percolation thresholds and conditions for the appearance of an endemic state (see the review paper 25 ). It will be also of great interest to explore connections with a body of literature on reaction diffusion dynamics 6,15,31,32 for the purpose of gaining insight into macroscopic laws for epidemics.
The authors believe that it is imperative that a deeper and more extensive study is carried out, whereupon the values of I(t), �I(t), R(t) are estimated from more extensive datasets. The effect of mediation efforts, such as social distancing, should be recorded as well and taken into account by differentiating data for the periods before and after such mediation protocols take effect. It is the authors' hope that questions raised in this work, as to the validity of the basic assumption in SIR models, lead to more reliable and robust ways to estimate the progression of epidemics as well as the progression of the current COVID-19. Figure 11. Number of confirmed and active cases in four different countries during the period 01/22/2020 to 09/13/2020. The shaded area is the time period where the fractional model is used to fit the data in Fig. 12. The number of active cases is computed by subtracting the total deaths and recovered from total confirmed cases.