Estimation of COVID-19 recovery and decease periods in Canada using delay model

We derive a novel model escorted by large scale compartments, based on a set of coupled delay differential equations with extensive delays, in order to estimate the incubation, recovery and decease periods of COVID-19, and more generally any infectious disease. This is possible thanks to some optimization algorithms applied to publicly available database of confirmed corona cases, recovered cases and death toll. In this purpose, we separate (1) the total cases into 14 groups corresponding to 14 incubation periods, (2) the recovered cases into 406 groups corresponding to a combination of incubation and recovery periods, and (3) the death toll into 406 groups corresponding to a combination of incubation and decease periods. In this paper, we focus on recovery and decease periods and their correlation with the incubation period. The estimated mean recovery period we obtain is 22.14 days (95% Confidence Interval (CI) 22.00–22.27), and the 90th percentile is 28.91 days (95% CI 28.71–29.13), which is in agreement with statistical supported studies. The bimodal gamma distribution reveals that there are two groups of recovered individuals with a short recovery period, mean 21.02 days (95% CI 20.92–21.12), and a long recovery period, mean 38.88 days (95% CI 38.61–39.15). Our study shows that the characteristic of the decease period and the recovery period are alike. From the bivariate analysis, we observe a high probability domain for recovered individuals with respect to incubation and recovery periods. A similar domain is obtained for deaths analyzing bivariate distribution of incubation and decease periods.

The outbreak of coronavirus disease 2019 (COVID- 19), reported early in Wuhan (China) 1 and spread around the world, is creating dramatic and daily changes with profound impacts worldwide. As a consequence the outbreak was declared a pandemic by the World Health Organization (WHO) in March 2020 2 , and by the end of 2020, COVID-19 has infected about 79.2 millions of people in the world, with an approximate cumulative global mortality of 3.2% 2 . To limit the impact of this deadly virus, a rapid and widespread vaccination of the population is now in place. However, it is established that vaccine are not 100% effective to stop the transmission or infection of COVID- 19 1 India) which make the situation more challenging. In this circumstance to get a complete feature of COVID-19, it is essential to fully understand the key (incubation, recovery and decease) periods.
We already successfully estimated the incubation period of COVID-19 in Canada 3 . The previous model 3 simply allowed for the calculation of the incubation period, while the current model allows for the calculation of all key periods, incubation, recovery and decease. In the present context, we focus on the recovery and decease periods and their correlation with the incubation period. In the current framework, we define the recovery period as the time from the contraction of the coronavirus to recovery, i.e., the incubation period plus the onset time from the symptom to recovery; the latter is the same as the viral shedding of SARS-CoV-2. We describe the decease period in the same way as the recovery period. Understanding the recovery period of disease is very useful information in the struggle against the disease. If the incidence of a disease is remarkably high and the recovery period of the disease is also high then the prevalence of the disease in the country is likely to increase which in turn puts extra health, economic and social burden on this country. Understanding the recovery period of the disease will help governments to plan proper strategies to counter the disease and to organize the requirements such as hospitals, doctors, medical staffs, medical equipment's, etc. It will also help to implement different social and economic policies which will be essential to fight the disease.
To the best of our knowledge, we demonstrate for the first time a substantial compartment based model, with a total 830 partitions, in order to estimate the key (incubation, recovery, decease periods) periods of COVID-19 as well as the bivariate distribution of incubation and recovery periods, and the bivariate distribution of incubation and decease periods. This will be achieved using publicly available database 30 of the total number of corona-positive cases, recovery and death toll. There is no scope to verify the database which is the only limitation of the present study. Using the novel model, demonstrated here, we divide the publicly available database into thousands of groups, and these separated classes are the key source for estimating all the key periods. This approach is free from any special type of samples in order to produce the distributions of those periods; it only involves large scale computations for estimating about thousand model parameters. After a single calculation of this method, we can generate the current distributions as well as previous distributions of those periods. In the statistical based approaches, it is usually difficult to consider large incubation, recovery and decease periods if the sample size is small. However, in our approach, we can go well beyond 14 days, the maximum incubation period that we have set in this paper, and beyond the interval 2 weeks to 6 weeks, the range of recovery as well as decease periods that we have considered in the current computations. As of May 23, 2021, the World Health Organization (WHO) had confirmed a total of 1,359,180 cases of COVID-19 in Canada, including 25,231 deaths 2 . As of May 23, 2021 there are five provinces, out of eleven provinces where we observe significant effect of COVID-19, in Canada with death toll more than 1000 (Fig. 1a), and the recovery and death rates are respectively 96.1% and 0.7% (Fig. 1c). During the first wave of COVID-19 in Canada, January 22, 2020 to July 16, 2020, the recovery and death rates were respectively 66.5% and 8.1% (Fig. 1b). Here, we assume that the recovery and decease periods of COVID-19 remain unchanged i.e., these periods during the first wave and the present time are almost identical, and under this assumption we merely consider the database of first wave, the period before vaccination, for the calculation. In the present context, we do not consider the patient's gender or age due to lack of the required data.
There are various studies on recovery period, and no result is reported (to the best of our knowledge) on bivariate distributions as mentioned above. The key periods may depend on age 31 (median-age/country), hard immunity, public health system, corona testing capacities, daily corona cases, etc. For a better estimation of the key periods for a particular region, we need to study local patients. Data collection is a bottleneck in studying those key periods for COVID-19 or other infectious diseases using clinical survey, and we need a sample of large size for bivariate analysis. However, key periods can easily be estimated using the approach we propose here, the publicly available database along with optimization algorithms.

Results
The proposed model assists us to generate new refined recovery and death toll database, R ik and D ik , by dividing the total recovered individuals and the total number of deaths as of July 16, 2020 into myriad of groups. The new database is the key source for studying all kinds of distributions, reported in the article.
Validation of the proposed model. After estimating the model parameters with sufficiently small values of error functions, we obtain a good agreement ( Fig. 1e-g) between the calculated values of the model variables such as total corona-positive cases, number of recovered individuals, etc. and the available data 30 . The population of the infected group gradually increased until end of April 2020, and thereafter slowed down (Fig. 1d).
Using the elements D ik for i = 1, 2, . . . , 14 and k = 1, 2, . . . , 29 , we obtain a bivariate histogram (Fig. 4a) for the incubation and decease periods. There are two peaks at the points (3,22), i.e., for τ i = 3 and η k = 22 , and (9, 23), i.e., for τ i = 9 and η k = 23 , corresponding to the high densities of deaths. We estimate the histogram using 4b) where the variables τ and η represent the incubation and decease periods, respectively. The mean m (d) τ and standard deviation σ (d) τ of the incubation period are 6.56 (95% CI 6.36-6.76) and 3.00 (95% CI 2.86-3.15), respectively; the mean m η and standard deviation σ η of the decease period are 21.64 (95% CI 21.33-21.94) and 4.43 (95% CI 4.23-4.65), respectively; the correlation between incubation and decease periods ρ τ η is − 0.008. The two dimensional representation of the bivariate normal distribution (Fig. 4c) shows that the highly probable decease region (red in the figure) is a nested domain of τ = 6.56 and η = 21.64 . To precisely analyze the highly probable regions, we estimate the histogram (Fig. 4a) using a nonparametric density function with a width of 40%, low and high percentage give a more local and global estimation, respectively, and obtain a distribution with two peaks (Fig. 4d), one in the high probability region (red in figure) and another one in the second high probability region (yellow in figure). The highly probable region is surrounding the point τ = 8.17 , η = 21.86 (Fig. 4e). The bivariate mixture distribution analysis shows that we can estimate the histogram of D ik for i = 1, 2, . . . , 14 and k = 1, 2, . . . , 29 using a combination of two bivariate normal distributions, where the superscript 1 (resp. 2) represents the parameters for first (resp. second) component. The parameters of the first component are m (d1) Onset time from symptom to recovery. Using the fact that τ + θ = ζ and the property of expecta-

Discussion
In the present context, we estimate the recovery as well as decease periods using a novel compartment based model and publicly available database. Here, we consider a maximum length of the incubation period of 14 days, and the ranges of the recovery and decease periods are from 2 to 6 weeks. However, in our method, we can go well beyond all those ranges; the longer ranges simply require a long computational time. Notice that our method could apply the proposed model to estimate key periods for any infectious disease, as along as similar data are available. Calculating the incubation and recovery periods for other counties is naturally possible using the proposed model. The multi-group database R ik , i = 1, 2, . . . , 14 and k = 1, 2, . . . , 29 , generated from the model, is the key source to compute all types of distribution of the recovery period, univariate, bimodal and bivariate. The bimodal gamma distribution of the recovery period, 0.9365 Ŵ(ζ , K r1 , θ r1 ) + 0.0635 Ŵ(ζ , K r2 , θ r2 ) , demonstrates that the recovery period of 93.65% recovered individuals obeys the distribution Ŵ(ζ , K r1 , θ r1 ) , and that of 6.35% recovered individuals obeys the distribution Ŵ(ζ , K r2 , θ r2 ) . Thus, there are two groups of recovered individuals with short recovery period, 21.02 days (on average), and long recovery period, 38.88 days (on average). The characteristics of those two groups may depend on age, underlying health condition, immunity, etc. The database of numerous groups D ik , i = 1, 2, . . . , 14 and k = 1, 2, . . . , 29 , generated from the model, is the key source to compute all types of distribution of the decease period, univariate, bimodal and bivariate. The bimodal gamma distribution of the decease period, 0.9508 Ŵ(ζ , K d1 , θ d1 ) + 0.0492 Ŵ(ζ , K d2 , θ d2 ) , demonstrates that the decease period of 95.08% deaths obeys the distribution Ŵ(ζ , K d1 , θ d1 ) , and that of 4.92% deaths obeys the distribution Ŵ(ζ , K d2 , θ d2 ) . Thus, there are two groups of deaths with short decease period, 21.18 days (on average), and long decease period, 38.41 days (on average). The characteristics of those two groups may depend on age, underlying health condition, immunity, etc. The calculated results employing the proposed model show that the recovery and decease periods are the same. It seems that the survival period of the coronavirus is the same as that of human, in the form of immunity.
The bivariate normal distribution of incubation and recovery periods indicates a recovery window of 4.82 ≤ τ ≤ 8.49 and 19.27 ≤ ζ ≤ 25.72 as the highly probable domain for recovery. The bivariate normal distribution of incubation and decease periods indicates a decease window of 4.55 ≤ τ ≤ 8.45 and 19.35 ≤ η ≤ 24.85 as the highly probable domain of deaths. The study shows that the recovery and decease windows almost coincide within these key periods. To determine precisely the recovery as well as the decease windows, we use nonparametric distributions. Under the nonparametric analysis we identify two recovery windows, 2.27 ≤ τ ≤ 4.38 , 18.41 ≤ ζ ≤ 22.79 and 6.42 ≤ τ ≤ 9.63 , 17.81 ≤ ζ ≤ 23.93 , and one decease window, 6.34 ≤ τ ≤ 9.41 , 20.17 ≤ η ≤ 23.69 . Nonparametric analysis provides some discrepancy between the recovery and decease windows.

Conclusions
We have developed a novel compartment based model to divide the publicly available database of total confirmed cases, recovered cases, and number of deaths into numerous subgroups to obtain the distributions of the recovered and decease periods. The outcomes of this study can be divided into three categories; these are univariate, univariate (bimodal) and bivariate distributions. We obtain mean recovery and decease periods from the univariate distribution. We observe two groups of recovered individuals as well as deaths: a short recovery (decease) period and a long recovery (decease) period from the univariate (bimodal) distribution. From the bivariate analysis, we investigate the correlation between the incubation and recovery periods as well as the correlation between incubation and decease periods. The model itself and the procedure to solve it, are the core of this work, and it can be applied to any infectious disease in any region. We obtain the distributions of the key periods from the population, considering all types of cases (non-hospitalized, non-ICU, ICU) of recovered individuals and deaths, which is naturally better than any sample-dependent result. In this approach, we do not need any clinical survey; the publicly available data on confirmed cases, recovery and death toll, are sufficient to analyze the univariate and bivariate distributions. The current model can be extended to study age-based key periods, but for this purpose we need an age dependent database. The monotonic iteration scheme, introduced for better estimation, can be applied to numerical analysis problems.

Methods
In this section, we introduce a compartment based infectious disease model including a large number of partitions, Lockdown, Susceptible, Removed, Infected, fourteen compartments of Confirmed cases, hundreds of compartments of Recovered and Deaths. The model is constructed as a set of coupled delay differential equations involving few thousands of variables and parameters, and will be used, not as a prediction tool, but (1) for constructing the myriad groups of recovered individuals and death tools and (2) estimating accurately the recovery and decease periods. This model will however have to be parameterized and validated using existing data, in order to justify its accuracy and its application in the proposed methodology.
The model. Modeling the spread of pandemics is an essential tool for projecting its outcome. By estimating important epidemiological parameters using the available database and optimization techniques, we can make predictions of different intervention scenarios. Compartment based model, where the population of a region is distributed into several population groups, such as susceptible, infected, total cases, etc., is a simple but useful tool to demonstrate the panorama of an epidemic. The proposed model is an extension of our previous work 3 , including a very large number of compartments of recovered and deaths individuals; the schematic diagram of the model is presented in Fig. 5a. The following are the underlying principles of the present model.
• The total population is constant (neglecting the migrations, births and unrelated deaths) and initially every individual is assumed susceptible to contract the disease. • The disease is spread through the direct (face-to-face meeting) or indirect (through air current, common used or delivery items like door handles, grocery products) contact of susceptible individuals with the infected individuals. • The quarantined area or the compartment for corona cases contains only members of the infected population who are tested corona-positive. • The virus kills a part of the people it infects; the survivors represent the recovered group.
• There is a non-pharmaceutical policy (stay at home), commonly known as lockdown, to stop the spread of the disease. • The group of asymptomatic patients is a part of infected individuals, and the never-tested recovered asymptomatic patients can be removed from the infected group. If an asymptomatic patient dies, it is counted after investigation.
Based on the above principles, we consider the following compartments: • Lockdown (insusceptible) (L): the group of persons who are keeping themselves safe.  In the present context, we assume that there is no overlap between these two compartments, infected (I) and confirmed cases (C). In other words, tested corona-positive individuals are assumed to be unable to substantially (1)  ik represents the total corona-positive cases corresponding the incubation period τ i , recovered individuals corresponding the incubation period τ i and onset time θ ik i.e., recovery period ζ k = τ i + θ ik and death toll corresponding the incubation period τ i and onset time µ ik i.e., decease period η k = τ i + µ ik , respectively, presented in Fig. 5a.
The time-dependent model is the following set of coupled delay differential equations, for i = 1, . . . , J: where the real positive modeling parameters α , β , γ δ i , ik , κ ik and ν are the rate of lockdown, the rate of infection, the rate of recovery from the asymptomatic group, the rate of tested corona-positive corresponding the incubation period τ i , the rate of recovery corresponding the recovery period ζ k , the rate of decease corresponding the decease period η k and the rate of transit from lockdown compartment to susceptible compartment, respectively. The variables S(t − τ i ) and I(t − τ i ) denote the cumulative data of (t − τ i ) days, i.e., total number of suspected and infected individuals of (t − τ i ) days. The factors /N convey the rate of individuals who were infected τ i days ago, the rate of individuals who were infected τ i + θ ik days ago and recovered, the rate of individuals who were infected τ i + µ ik days ago and died, respectively. It follows from (4), that for any t where N (constant) is the total population size. We can define a group of new variable T i for i = 1, . . . , J such that and where T, total confirmed cases, is the group of individuals who tested corona positive (active cases + recovered + deaths). From Eq.(4) we can generate three different sets of coupled delay differential equations for i = 1, . . . , J and k = 1, . . . , M  (9) and (10) can be used to calculate incubation period 3 , recovery period and decease period, respectively. In the present context, we focus on recovery as well as decease periods. We solve Eqs. (8), (9) and (10) using matlab inner-embedded function dde23 with particular sets of model parameters. To solve the initial value problem, in the interval [t 0 , t 1 ] , we consider L(t 0 ) , S(t 0 ) , I(t 0 ) , T(t 0 ) , R(t 0 ) and D(t 0 ) as follows: where T(t 0 ) , R(t 0 ) and D(t 0 ) are the available data at time t 0 , and q is the initial value adjusting parameters. Initially, there are no lockdown individual and no removed individuals from the asymptomatic group so that we can consider L(t 0 ) = 0 and V (t 0 ) = 0. It follows from (7) and (11) In the present context T (t 0 ) = 0 , since there were no corona-positive cases reported on January 22, 2020. As a consequence, we also take T i (t 0 ) = 0 for i = 1, 2, . . . , J , and the similar assumptions are valid for R ik (t 0 ) and D ik (t 0 ) i.e., R ik (t 0 ) = 0 and D ik (t 0 ) = 0 for i = 1, 2, . . . , J and k = 1, 2, . . . , M.
Parameter estimation. We focus on the exponential growth phase of the COVID-19 epidemic in Canada; one can use this approach to estimate the incubation, recovery period and decease periods for any region affected by this infectious disease. The time resolved (daily updated) database 30 provides the number of total corona-positive cases, the number of recovered individuals and the death toll. We define two groups of model parameters: primary parameters, the parameters involved in Eq. (8) i.e., q, α , β , γ , δ i for i = 1, 2, . . . , J and ν , and secondary parameters, the parameters involved in Eqs. (9) and (10) other than the primary parameters i.e., ik and κ ik for i = 1, 2, . . . , J and k = 1, 2, . . . , M . We use the estimated values of the primary parameters to optimize the secondary parameters. The optimal values of the primary parameters P 0 = (q, α, β, δ 1 (t), . . . , δ J (t), ν) T , q is the initial value of I(t), is obtained by minimizing the error function Er(P 0 ) , defined as where T (m) is the available data of total corona-positive cases on the particular mth day, and T (m) is the calculated results obtained from the system (8). The integer K, used in (13), is the size of the data set. Due to the complexity of the error function, the minimization using the matlab function fminsearch requires a very large number of iterations. We use the similar error functions Er(P 1 ) and Er(P 2 ) to optimize the secondary parameters   www.nature.com/scientificreports/ Numerical experiment. In this section, we present a detailed description of the computational procedure for the proposed model. On 23 January 2020, a 56-year old man admitted to Toronto hospital emergency department in Toronto with a new onset of fever and nonproductive cough, and returning from Wuhan, China, the day prior 34,35 . It is believed this is the first confirmed case of 2019-nCoV in Canada, and according to the government report, the novel coronavirus arrived on the Canadian coast on January 25, 2020, first reported case. The above information suggests that the start date of the current pandemic in Canada is possibly to be January 22, 2020. Additionally, some research studies reported that the estimation of the incubation period is from 2 to 14 days, and recovery as well as decease period of COVID-19 is from 2 to 6 weeks 2,36 . As a consequence, in the present study we consider J = 14 i.e, 14 delays for the incubation period, and M = 42 − 14 + 1 = 29 i.e, 29 delays for the recovery as well as decease periods. Here we consider a calculation of 177 days, from January 22, 2020 to July 16, 2020, duration of the first wave in Canada. The purpose of the model is to separate the publicly available database T, R and D into myriad groups T i , R ik and D ik for i = 1, 2, . . . , 14 and k = 1, 2, . . . , 29 (Fig. 5b).
Then the local minimum computed by the optimization algorithm depends on the initial values of the parameters: for q, α , β , ν we consider any positive random number less than unity, where as a choice of δ = (δ 1 , . . . , δ 14 ) T is tricky. For this purpose, we consider a vector of 14 positive random numbers δ such that δ 1 < · · · < δ 4 < δ 5 > δ 6 > · · · > δ 14 and 14 i=1 δ i = 0.9 . We observe, from numerous numerical experiments, the renormalization factor 0.9 works perfectly for the computation. The estimated values of the primary parameters P op 0 are presented in Fig. 5e, and the value of the error function Er(P 0 ) = 41.64. The estimated values of the primary parameters are related to Eq. (8), the set of coupled delay differential equations, and Eq. (13), the error function. Using the estimated values of the primary parameters, we optimize the secondary parameters ik for i = 1, 2, . . . , 14 and k = 1, 2, . . . , 29 related to Eqs. (9) and (14). The choice of the initial values of ik is such that for any fixed i, i = 1, 2, . . . , 14 , the first fourteen ik s i.e., { i1 , i2 , . . . , i14 } are in ascending order, and the rest i.e., { i15 , i16 , . . . , i29 } are in descending order; and 29 k=1 ik = 0.72 . After optimization, we obtain the value of the error function Er(P 1 op ) = 236.47. Using the estimated values of the primary parameters, we optimize the secondary parameters κ ik for i = 1, 2, . . . , 14 and k = 1, 2, . . . , 29 related to Eqs. (10) and (15). The choice of the initial values of κ ik is such that for any fixed i, i = 1, 2, . . . , 14 , the first fourteen κ ik s i.e., {κ i1 , κ i2 , . . . , κ i14 } are in ascending order, and the rest i.e., {κ i15 , κ i16 , . . . , κ i29 } are in descending order; and 29 k=1 κ ik = 0.1 . After optimization, we obtain the value of the error function Er(P • Step 3 In the second iteration, we optimize the subdomain 2 and keeping the other subdomains of � (1) unchanged. After second iteration, we get estimated parameters The optimization of the subdomain 2 , demonstrated in Step3, is related to minimizing the error function such that Er(� (1) ) ≥ Er(� (2) ) ; the equality sign holds for op 2 = 2 . The error function of the (n + 1) th iteration, Er(� (n+1) ) cannot be greater than that of nth iteration, Er(� (n) ) , because of this characteristic of the error function we define the approach as MIS. The flow chat of the optimization scheme is presented in Fig. 5d. The upper and lower panels of Fig. 5f show the estimated values of the secondary parameters ik and κ ik , respectively, obtained from the MIS. The upper and lower panels of Fig. 5g show the values of the error functions E R , using MIS to optimize ik , and E D , using MIS to optimize κ ik , for fourteen iteration steps and Er(P  Figure 5g shows that the MIS works efficiently to get better estimations.

Data availibility
The data used to estimate the model parameters are publicly available and are available with the code.