Stochastic analysis using public data for forecasting of epidemic spreading of the novel coronavirus disease

We propose a stochastic model for epidemic spreading of the novel coronavirus based in data supported by the Brazilian health agencies. Furthermore, we performed an analysis using the Fokker-Planck equation estimating the novel cases in the day t as the mean half-width of the distribution of novel cases P ( N , t ) . Our results display that the model based in the Itô diffusion adjusts well to the results supplied by health Brazilian agencies due to large uncertain in the ofﬁcial data and to the low number of tests realized in the population.


Introduction
The use of models and mathematical methods for theoretical researchers to the study the spread of contagious diseases goes back a least to some works by Daniel Bernoulli in XVIII century on smallpox. 1 One of these powerful techniques are nonlinear differential equations (NL) which are an important topic in different branches of the science and engineering due to their ability to explain several complex behaviors in the nature. [1][2][3][4][5] Diseases as the typhoid fever, smallpox and also the COVID-19 are spread largely by individuals who can transmit the disease but who do not exhibit overt symptoms. Considering x and y the proportions of susceptible and carriers, respectively, in the population, suppose that carriers are identified and removed from the population at a rate dy/dt = −αy. If the disease spreads at a rate proportional to the product x and y, then we will have dx/dt = −γxy. We can solve the set of equations for y at instant t subject to initial condition y(0) = y 0 to obtain the proportion of the population that escapes of the pandemic as the the limit value of x as t → ∞ from the initial condition x(0) = x 0 . For the smallpox, once contracted and survived, the individual present a permanent immunity. Thus considering a group of individuals born in a given time t and considering n(t) the number of the individuals surviving a time t later. We make x(t) the number of members of this group who do not have smallpox at time t and who are therefore susceptible and we let α be the rate at which susceptibles contract smallpox. Furthermore, we let ν, the rate at which people who contract smallpox die from the disease. Finally, We let µ(t) be the death rate from all cases other than smallpox. Then dx/dt (the rate at which the number of susceptible declines), is given by dx/dt = − [α + µ(t)] x, where the first term is the rate at which the susceptibles contract smallpox, and the second term is the rate at which they die from all other cases. Furthermore dn/dt = −ναx − µ(t)n, where dn/dt is the death rate. The two terms on the right side are the death rates due to all other causes, respectively. In the end, we define the variable z given by z = x/n and obtain that z satisfies the following nonlinear differential equation dz/dt = −αz(1 − νz), with the initial condition z(0) = 1 which is independent of µ(t). The ν and α parameters were estimated as ν = α = 1/8. From these values, one can determine the proportion of individuals who never had acquired smallpox.
We can obtain a more realistic model adding some randomness in the differential equation above, where the size of the population at time t and α(t), the relative rate of growth as a function of time t, α(t) are not deterministic, but subject to some random due to environmental noise, so that, α(t) = r(t) + "noise". The exact behaviour of the 'noise term' is not known. However, its probability distribution, yes. The function r(t) is assumed to be non random. [6][7][8] In this work, we investigate the Itô diffusion model with additive white noise and nonlinear terms as a possible model for the spread of the COVID-19 in Brazil. Due to uncertainly in the official data about the real cases numbers generated by the low number of tests made in the population generates a large randomness in the data and therefore, makes the use of the stochastic analysis greatly adequate to treat the spread of time evolution of the COVID-19. Furthermore, we derive the stochastic differential equation, in Itô calculus, corresponding to nonlinear Fokker-Planck equation derived in framework of the non-additive statistical mechanics, which must also obey to stylized features of the financial market as inverse cubic law [9][10][11][12][13][14][15][16][17] .
The plan of this paper is the following. In section 1, we describe the stochastic model. In section 2, we make the connection

A phenomenological Langevin equation for dynamics of the COVID-19
The behavior of the cumulative total cases number P(t) of infected by coronavirus registered in the Brazil as function of time (days) from 15th March, 2020 is displayed in Fig. 2. From fit of least minimums squares to the set of officials data of the Brazilian ministry of health, we estimate the curve for the behavior of the total cases number of COVID-19 given by a polynomial of fourth degree of t (dashed-blue-line) in the Fig. 2 given as f (t) = −4397.7 − 34.3t + 26.2t 2 − 0.8t 3 + 0.01t 4 . The data were registered in the period 03 − 15 -06 − 05. The points of non differentiability is due to an uncertainty of the official data and to population isolation conditions. Hence, for modeling of this behavior, we add a random term together with nonlinear terms in Eq. (1) with aim to simulate the effect of this uncertainty.
where f (t) is e polynomial of least squares fit (n = 4 degree). In Fig. 1, we obtain the behavior of f (t) given by f (t) = −4397.7 − 34.3t + 26.2t 2 − 0.8t 3 + 0.01t 4 . Furthermore, we have A(P(t),t), given by Bernoulli's model: A(P(t),t) = αP(t)(1 − νP(t)) and B(P(t),t) = β 0 t 3 (multiplicative white noise) in Eq. (1) and we perform the calculation for different values of β 0 (constant). We obtain a strong oscillation of the curve with the increasing of β 0 indicating so, an increase of growing of the uncertainly of the cumulative total case number. The data range considered is from 15-March until 06-June.

Nonlinear Fokker-Planck equation 2.1 Nonlinear Fokker-Planck equation within a Itô prescription
The nonlinear Fokker-Planck equation is given in the non-additive statistical mechanics by 18 We obtain that Eq. (2) is equivalent to following Itô stochastic differential equation where is a stochastic process defined on probability space Ω, being the triple (Ω, F , P) a probability space, where F is a σ -algebra and P a probability measure that is, a function that to every set A ∈ F assigns a number in range [0, 1], where P(Ω) = 1, P( / 0) = 0 and The random variable X, defined on Ω with the property that for every Borel subset B of R, the subset of Ω given by {X ∈ B} = {ω ∈ Ω; X(ω) ∈ B} is in the σ -algebra F . Moreover, P(X(t),t), given X a random variable on a probability space (Ω, F , P) is the probability measure of X. µ X assigns to each Borel subset B of R the mass µ X (B) = P{X ∈ B} 7, 8 . Furthermore, we have that dW is the Wiener increment, where W (t) is a Wiener process also known as Brownian motion. We have an environment stochastic white noise ζ (t) that if relate with the Wiener process W (t) by W (t) = t 0 ζ (t )dt . Even though in the literature had been explained differently 19 , we have that W (t) which is the integral of ζ (t), is not the derivative of the Wiener process, ζ (t) = dW (t)/dt, being therefore W (t) not differentiable 6 The Itô integral for φ is given by where ms− lim means square limit. W (t) is a Markovian process, presenting normal distribution, satisfying the conditions We can write the Eq. (2) as where we define A(x,t) = K(x,t) and φ (X(t),t) = [P(X(t),t)] 1−q 2 to obtain the correspondent Itô stochastic differential equation

3/9
which is the Eq. (3). The solution for X(t) using the Stratonovich integral is given as This implies that X(t) is the solution of the following modified Itô equation where φ denotes the derivative of φ (x,t) with respect to x. Therefore, the Eq. (2) in Itô calculus is different of the Stratonovich interpretation.
From the Feynman-Kac theorem, let h(x) be a Borel-measurable function. Fix T > 0, and let t ∈ [0, T ] be given, we define the function where Furthermore, we assume E t,x |h(X(T ))| < ∞ for all t and x, then g(x,t) satisfies the partial differential equation with the terminal condition g(x,t) = h(x) for all x, where we assume that the stochastic process g(X(t),t), 0 ≤ t ≤ T is a martingale. 7,8 . From Eq. (3), we obtain the time development of an arbitrary f (X(t)) using the Itô formula 6 Taking the average of both sides in the equation above, we obtain and using We integrate by parts and discard surface terms to obtain and therefore Thus, we have a complete equivalence between the diffusion process defined by a drift coefficient K(x,t) and a diffusion coefficient given as φ (x,t) = [P(x,t)] 1−q , in which the diffusion process can be locally approximated by an Itô stochastic differential equation.

4/9
Correspondence to Stratonovich stochastic differential equation with K s = K − 1 2 φ ∂ x φ and using the correspondence between the Itô stochastic differential equation and the Fokker-Planck equation, we have a equivalent Fokker-Planck equation which is known as Stratonovich form of the Fokker-Planck equation 6 . However, it is different from Eq. (2). Therefore, we have that the corresponding nonlinear Fokker-Planck equation in the Stratonovich prescription is different from nonlinear equation obtained in the Itô prescription, being the Itô stochastic differential equation more usually employed to make the connection with the Fokker-Planck equation, contrary to Ref. 19 . In spite of both definitions can be related by the choosing of i by τ i = αt i + (1 − α)t i−1 , α ∈ (0, 1) they generate different definitions of stochastic integral (Itô integral and Stratonovich )integral) and consequently to different stochastic differential equations, even though the Itô stochastic differential equation is equivalent to an another Stratonovich equation however, with an additional term 6 .

Existence and Uniqueness
We can investigate the existence and uniqueness of solutions of nonlinear differential equations utilizing the well-known existence and uniqueness theorem for stochastic differential equations 7 . Let T > 0 and K(x,t) : for some constant C and such that for some constant D. Let Z be a random variable which is independent on the σ -algebra F m ∞ generated by W s (·), s ≤ 0 and such that the expectation E |Z| 2 < ∞. Then the stochastic differential equation has a unique t-continuous solution X t (ω) with the property that X t (ω) is adapted to filtration F Z t generated by Z and W s (·); 3 Analysis by Fokker-Planck equation

Stochastic model for novel cases
The behavior of the novel cases number N(t) of infected by coronavirus registered in the Brazil as function of time (days) registered since 15th March, 2020 is displayed in Fig. 2. From fit of least minimums squares to the set of officials data of the Brazilian ministry of health, we estimate the curve for the behavior of the novel cases of COVID-19 given by a polynomial of fourth order of t (solid-blue-line) in the Fig. 2 given as N(t) = −1152.5 + 126.1t − 4.0925t 2 + 0.064749t 3 − 2.6 × 10 −5 t 4 . The data are by May 22 th . The zig-zag behavior into the range of t considered reflects in an increase of the uncertainty in the data and to population isolation conditions. For modeling of the situation, we added a random term together with nonlinear terms in Eq. (1) with aim to simulate the effect of this uncertainty. The α and ν parameters and the nonlinear polynomial added comes from the Bernoulli model for the spreading of the smallpox with the value α = ν = 1/8 obtained by him. As the COVID-19 as the smallpox, once contracted and survived, confers a lifetime immunity, considering a group of individuals in a given time t where we have a number of these individuals surviving a time t later. We propose then that the model for the novel cases in each day t obeys to the following stochastic model where f (t) is a polynomial of n degree (n = 4). With a drift term in the form of a standard Bernoulli model of disease spread and is driven by a Wiener process with normal distribution. The deterministic part can adjust to the reported novel cases, in the average.

Analytical Results
We can perform the simulation of the model Eq. (25) with term βW (t), Winner increment and an additive white noise of standard deviation σ w = √ ∆t. We can write the Winner increment as β dW (t) ∼ √ dtβ R G , where R G is an aleatory generator number with Gaussian distribution of mean zero and variance σ 2 w = 1. From stochastic equation Eq. (25), we obtain the time development of an arbitrary function f (X(t)) by using of the Itô formula where higher order terms have been discarded, and (dW (t)) 2 = dt. Taking the average of both sides in the equation above and defining γ = β 2 , (β constant) we obtain Using We integrate by parts and discard surface terms to obtain and hence The associated Fokker-Planck equation to the above model is given by taking the Fourier transform of the Fokker-Planck equation, we can guarantee the normalization of the probability density in which P(x) is reasonably well behaved. We take the boundaries at infinity for P(x,t) as lim x→∞ P(x,t) = 0 and therefore ∂ x P(x) being reasonably well behaved. As lim x→∞ ∂ x P(x,t) = 0 thus, a nonzero current of probability at infinity will usually require that the terms in the equation above become infinite there 6 . We use the initial condition P(x = x 0 ,t = 0) = P 0 . For solving the Fokker-Plank equation above independent on time we make the power series expansion P(x,t) = ∑ ∞ n=0 a n (t)x n to obtain n(n − 1)a n x n−2 . (32) We obtain the relations α ∞ ∑ n=0 a n x n − 2να ∞ ∑ n=1 a n−1 x n + ∞ ∑ n=1 na n x n − αν ∞ ∑ n=2 (n − 1)a n−1 x n + γ 2 ∞ ∑ n=0 (n + 1)(n + 2)a n+2 x n = k, where k is a separation constant. For k = 0, one obtains the following recurrence relations for 0 ≤ n ≤ 2, and a n+2 = 2 γ [αν(n + 1)a n−1 − (α + n)a n ] (n + 2)(n + 1) , for n > 2. Additionally, we have Therefore, we obtain P(x) in the form The constants a 0 and a 1 are determined by the initial conditions P(0, 0) = P 0 and ∂ x P(x, 0) = 0 in x = 0. We obtain a 0 = P 0 and a 1 = 0. From the normalization condition, the second term in the density probability above must be zero and therefore, all coefficients a 1 must cancel. Therefore, we have To ensure the normalization of the probability density, P 0 must be non zero only within interval −ε ≤ x ≤ ε and zero out it. For k = 0, we have from Eq. (33) that n = 0 and a 2 + αa 0 /γ = k and all a n higher are zero. Consequently, we obtain from integration of the Eq. (35) We obtain the mean half-width of the distribution as function of time as given as Furthermore, we have (n = 0) da 0 (t)/dt = 0, or a 0 (t) = c, a 1 = k/2 f (t) and P(t) = P 0 t + c − k/2, where we define p 0 = c − k/2. Therefore How the first cases are registered on March 15th (t = 15), we obtain p 0 = −1.5 an P 0 = 0.1 for the official results of the novel cases number. In Fig. 3(left), we plot the time series of novel cases registered in the Brazil and the time series of the model Eq. (25) We obtain that the model adjust well to the data within range considered. In right plot we obtain the mean half-width as function of time t (days). We calculate the variance of the distribution of novel cases, where the standard deviation gives an estimating of the number of the novel cases at day t. The results fit qualitatively to the official data of ministry of healthy. According of the results obtained we obtain a forecasting of growing in the new cases number in the next weeks.
In Fig. 4, we display the behavior of the kurtosis (excess), λ 4 (t) as function of time. The excess kurtosis relates to the deviation of the tail of the distribution, as compared to Gaussian. The negative value obtained for the kurtosis for large t values indicates that the shape of the distribution is near to the Wigner semicircle. Furthermore, at range of small t where the kurtosis is nearest to zero we have that the distribution is nearest of a Gaussian (λ 4 = 0). The small oscillation in the range short time from the beginning of the registration of cases should be due to the low number of data in this range of time.

Conclusions
In Brief, we propose a stochastic model for the time evolution of the SARS-CoV-2 (COVID- 19) in Brazil based in the nonlinear Itô's diffusion. Our results are compared with official data supplied by the Brazilian healthy agencies where due to large uncertain in the results generated principally by the low number of tests made in the population and hence, to under reporting, generates a large uncertainly in the official results and consequently, the stochastic differential equation analysis becomes the more realistic model for growing of the total number of infected P and novel cases number N(t). We can use an approach beyond white noise limit to try to better describe the expansion dynamics what can be done in a future work. The model reported here is based on Brazilian data from March 15 which shows an upward trend in the coming weeks.