Distribution of incubation periods of COVID-19 in the Canadian context

We propose a novel model based on a set of coupled delay differential equations with fourteen delays in order to accurately estimate the incubation period of COVID-19, employing publicly available data of confirmed corona cases. In this goal, we separate the total cases into fourteen groups for the corresponding fourteen incubation periods. The estimated mean incubation period we obtain is 6.74 days (95% Confidence Interval(CI): 6.35 to 7.13), and the 90th percentile is 11.64 days (95% CI: 11.22 to 12.17), corresponding to a good agreement with statistical supported studies. This model provides an almost zero-cost computational complexity to estimate the incubation period.

The outbreak of coronavirus disease 2019 (COVID- 19), reported early in Wuhan (China) and spread around the world 1 , is creating dramatic and daily changes with profound impacts worldwide. People with underlying medical condition, respiratory disease, diabetes, cancer, etc., and older people are vulnerable to severe complications and death from coronavirus, although we are discovering new features of COVID-19 every day. In the absence of vaccination and proper medication, we can merely obey some non-pharmaceutical containments, lockdown, social distancing, hand hygiene, face masking, mobile app to trace corona-positive individuals, to help prevent the spread of this infectious disease. To minimize transmission of the virus through human-to-human interaction, quarantine of individuals with exposure to infectious pathogen is an effective strategy for containing contagious diseases. To govern a quarantine period for asymptomatic as well as presymptomatic individuals, it is essential to fully understand the incubation period of COVID-19 disease.
The incubation period of an infectious disease is the time between when a person is infected by a virus and when the first symptoms of the disease are noticed. Estimates of the incubation period for COVID-19 range from 2 to 14 days, according to the investigation so far. Precise knowledge of the incubation period is crucial to control infectious disease like COVID-19; a long incubation period means a high risk of further spreading the disease. The distribution of the incubation period can be used to estimate the basic reproduction number R 0 , a key factor of epidemics, in order to measure the potential for disease transmission. It is indeed difficult to obtain a good estimate of the incubation period on the basis of limited data.
To the best of our knowledge, we are proposing for the first time a mathematical model, comprising fourteen delays, to estimate the incubation period utilizing publicly available data of the total number of corona-positive cases. This approach is free from any special type of samples in order to produce the distribution of the incubation period. It is then almost cost free, as it only involves a small scale computations. After a single calculation employing this method, we can generate the current distribution as well as previous distributions of the incubation period. We can also observe the change in the incubation period. In the statistical based approach, it is usually difficult to consider a large incubation period if the sample size is small. However, in this approach, we can go well beyond 14 days, the maximum incubation period we have set for the current work. In this context, we demonstrate the incubation period of the COVID-19 epidemic in Canada employing publicly available data of confirmed corona-positive cases 28 . As of November 7, 2020, the World Health Organization (WHO) had confirmed a total of 251,338 cases of COVID-19 in Canada, including 10,381 deaths 1 .
There are several studies on incubation period mainly based on Chinese patients that can only provide a rough estimation for rest of the world. The incubation period may depends on age 29  www.nature.com/scientificreports/ of the incubation period for a particular region, we need to study local patients. Data collection is a bottleneck in studying the incubation period. However, one can easily estimate the incubation period using the approach we propose and publicly available data of confirmed cases.

Results and discussion
After estimating the model parameters with sufficiently small values of the error functions, we compare in Fig. 1 total corona cases calculated with our model and the available data 28 . This shows excellent agreement. In Fig. 2, the confirmed cases 211,735 of 276 days are divided into fourteen groups. The ith compartment T i , defined in Eq. (3), is the confirmed cases of 276 days corresponding to the incubation period of i day(s) for i = 1, 2, . . . , 14 .
In addition T i is the frequency of the incubation period of i day(s), and using the bar chat, we obtain a mean incubation period of 6.89 days, a median of the incubation period of 6 days, 90th percentile of 11 days, 95th percentile of 12 days and 99th percentile of 13.5 days. The bar chat shows that mode of the incubation period is of 6 days, and there is a second peak for the incubation period at 10 days. However, the second peak is strongly dominated by the first one. From the bar chat presented in Fig. 2, we can also obtain the probability densities of incubation periods of the first k days during the epidemic, the total confirmed cases of the first k days starting  10 , an oscillatory behaviour is observed. The descending pattern for 2 ≤ i ≤ 5 and the ascending order for 7 ≤ i ≤ 9 and 11 ≤ i ≤ 14 indicate that the mean incubation period is rising. Now, we fit the frequency data for the first 276 days, presented in the bar chat Fig. 2, with the lognormal distribution function and obtain the lognormal distribution parameters µ = 1.79 and σ = 0.52 . Fig. 4 shows the lognormal distribution function of the incubation period for the first 276 days and population size 211,735 with p The estimated incubation period, obtained using lognormal distribution, has a Figure 3. Probability densities of incubation period, presented in Eq. (1). The 'first 100 days' indicates that the density of incubation period based on the cumulative data of the first 100 days during the epidemic starting from January 22, 2020 and similar for other two.
. The estimated incubation period, obtained from the frequency table of 276th day and population size of 2258, has a mean of 7.14 days, a median of 7 days, the 90th percentile of 11 days, 95th percentile of 12.5 days and 99th percentile of 14 days. We generate the lognormal distribution function from the 276th day's frequency data and obtain the lognormal distribution parameters µ = 1.83 and σ = 0.53 . Fig. 5 shows the lognormal distribution of 276th day along with p (276) i . The estimated incubation period of 276th day, population size of 2258 and obtained using a lognormal distribution, has a mean of 6.98 days (95% CI: 6.41 to 7.55) and the 90th percentile of 12.29 days. A list of several studies along with the present calculation "Math.-Model" are presented in Table 1; we present the mean incubation period for the raw data as well as the lognormal distribution function. The list shows that our calculated data are closed the value reported by Ma et. al. 9 using a larger sample size of 587. The calculated mean incubation period using two different ways, the raw data as well as the lognormal distribution are indeed closed, indicating that the raw data calculated using our mathematical

Methods
In this section, we introduce a compartment based infectious disease model including a total of seventeen partitions, Lockdown, Susceptible, Infected and fourteen compartments of Total confirmed cases (LSIT). The model is constructed as a set of coupled delay differential equations involving several variables and parameters.
The model. Modeling the spread of epidemic is an essential tool for projecting its outcome. By estimating important epidemiological parameters using the available database, we can make forecasts of different intervention scenarios. In the context of compartment based model, where the population of a region is distributed into several population groups, such as susceptible, infected, total cases, etc., is a simple but useful tool to demonstrate the panorama of an epidemic. In this article, we introduce an infectious disease model, extending the standard SIR model, including the phenomenon lockdown, a non-pharmaceutical way to prevent the spread of the epidemic. The schematic diagram of the model is presented in Fig. 6 with several compartments and various model parameters. The following are the underlying principles of the present model.
• The total population is constant (neglecting the migrations, births and unrelated deaths) and initially every individual is assumed susceptible to contract the disease. • The disease is spread through the direct (face-to-face meeting) or indirect (through air current, common used or delivery items like door handles, grocery products) contact of susceptible individuals with the infected individuals. • The quarantined area or the compartment for corona cases contains only members of the infected population who are tested corona-positive. • The virus always kills some percent of the people it infects; the survivors percent represents the recovered group. • There is a non-pharmaceutical policy (stay at home), commonly known as lockdown, to stop the spread of the disease.
Based on the above principles, we consider several compartments: • Susceptible (S): the group of individuals who can be infected.
• Infected (I): the group of people who are spreading the contiguous disease.
• Total cases (T): the group of individuals who tested corona-positive (Active cases + Recovered + Deaths).
• Lockdown (insusceptible) (L): the group of persons who are keeping themselves safe.
In the model, we assume that there is no overlap between these two compartments, infected(I) and total cases (T). In other words, tested corona-positive individuals are assumed to be unable to substantially spread the disease due to isolation and are immune to re-infection after recovery 30 . The goal of the present model is to estimate the distribution of the incubation periods of COVID-19. In this goal, we split the compartment T into J subcomponents T 1 , . . . , T J , where i represents the total corona-positive cases corresponding the incubation period τ i , presented in Fig. 6. The time-dependent model is the following set of coupled delay differential equations: where α(t) , β(t) , δ i (t) , for i = 1, . . . , J and ν(t) are real positive parameters respectively modeling the rate of lockdown, the rate of infection, the rate of tested corona-positive corresponding the incubation period τ i and the rate of transit from lockdown compartment to susceptible compartment, respectively. The coefficient α(t) can be expressed in the mathematical form as where α (1) and α (2) are constants, and the similar expressions are valid for other time-dependent parameters.
Here t 0 , t d and t 1 are the initial, intermediate and final times, respectively. The variables S(t − τ i ) and I(t − τ i ) denote the cumulative data of (t − τ i ) days, i.e., total number of suspected and infected individuals of (t − τ i ) days. The factor δ i (t)β(t)S(t − τ i )I(t − τ i )/N conveys the rate of individuals who were infected τ i days ago. It follows from (4), that for any t where N (constant) is the total population size.
We solve (4) using matlab inner-embedded function dde23 with particular sets of model parameters. To solve the initial value problem (4), in the interval [t 0 , t 1 ] , we consider L(t 0 ) , S(t 0 ) , I(t 0 ) and T(t 0 ) as follows: where T(t 0 ) is the available data at time t 0 , and q is the initial value adjusting parameters. Initially, there is no lockdown individual so that we can consider L(t 0 ) = 0 . It follows from (3) and (6) that In the present context T (t 0 ) = 0 , since there were no corona-positive cases reported on January 22, 2020. As a consequence, we also take T i (t 0 ) = 0 for i = 1, 2, . . . , J.
Parameter estimation of the model. We focus on the exponential growth phase of the COVID-19 epidemic in Canada; one can use the approach to estimate the incubation period distribution for any region affected by the infectious disease. The time resolved (daily updated) database 28 provides the number of total coronapositive cases. The optimal values of p(t) = (q, α(t), β(t), δ 1 (t), . . . , δ J (t), ν) T , that is the set of initial values and model parameters, is obtained by minimizing the root mean square error function E(p(t)) , defined as where T (k) is the available data of total corona-positive cases on the particular kth day, and T (k) is the calculated results obtained from System (4). The integer M, used in (8), is the size of the data set. Due to the complexity of the error function, the minimization using the matlab function fminsearch requires a very large number of iterations.
Numerical experiment. In this section, we propose a detailed description of the computational procedure for the proposed model. On 23 January 2020, a 56-year old man admitted to Toronto hospital emergency department in Toronto with a new onset of fever and nonproductive cough, and returning from Wuhan, China, the day prior 31,32 . It is believed this is the first confirmed case of 2019-nCoV in Canada, and according to the government report, the novel coronavirus arrived on the Canadian coast on January 25, 2020, first reported case. The above information suggests that the start date of the current pandemic in Canada is possibly xsto be January 22, 2020. Additionally, some research studies reported that the estimation of the incubation period of COVID-19 is from 2 to 14 days 1,33 . As a consequence, in the present study we consider 14 delays, τ 1 = 1 day, τ 2 = 2 days, . . . , τ 14 = 14 days. Here we consider a calculation of 276 days, from January 22, 2020 to October 23, 2020. We decompose the time domain of 276 days into two parts : the time domain splitter is in the interval where the first wave is slowed www.nature.com/scientificreports/ down and the "second wave" begins, i.e. the splitter is in the interphase of two different scenarios. In this goal, we can choose the parameters p(t) as where p (1) = (q, α (1) , β (1) , δ 14 , ν (1) ) T and p (2) = (α (2) , β (2) , δ 1 , . . . , δ 14 , ν (2) ) T are some constants. The capability of an optimization package depends on the initial values of the parameters: for q, α , β , ν we consider any positive random number less than unity, where as a choice of δ = (δ 1 , . . . , δ 14 ) T is tricky. For this purpose, we consider a vector of 14 positive random numbers δ such that δ 1 < · · · < δ 4 > δ 5 > δ 6 > · · · > δ 14 and 14 i=1 δ i = 0.9 . We observe, from numerous numerical experiments, the renormalization factor 0.9 works perfectly for the computation.
For a complete calculation, we run the matlab code twice. Firstly, we run the code for the period January 22, 2020 to July 19, 2020 to obtain the estimated value p (1) est of p (1) , presented in Table 2, and the value of the error function E(p (1) est ) = 54.92 for q = 0.005. Then, we run the code for the entire period from January 22, 2020 to October 23, 2020, but using the estimated value p (1) est for the interval January 22, 2020 to July 19, 2020, and obtain the estimated value p (2) est of p (2) for the rest of the period, presented in Table 2, and the value of the error function E(p (1) est , p (2) est ) = 48.38 , defined as where M 1 corresponds the date July 19, 2020.

Conclusion
In this paper, we have derived a mathematical model based on a set of coupled delay differential equations, which was used to estimate the incubation period with good agreement with statistical works. Using the proposed model and publicly available data of confirmed cases, one could accurately estimate the incubation period in any region. We obtain the distribution of the incubation period from the population, so that it is better than any sample-dependent result. We have considered fourteen delays, but it is possible to consider an arbitrary number of delays. We can still use the model in the early stages of the disease, when the incubation period is unknown. We can select the maximum incubation period and then calculate the split confirmed corona cases (as shown in Fig. 2). The fit log-normal distribution of the split data with sufficiently small p-value will provide the good estimation. On the other hand, for example, suppose that the maximum incubation period is 16 days, and we consider 18 days. After calculation, we will only get few corona-positive cases for the incubation periods 17 and 18 days. After estimating the model parameters, one can estimate the incubation period of confirmed cases over a long period, over a small time interval, and even over a single day. The present approach can be used with a (9) p(t) = p (1) from January 22, 2020 to July 19, 2020 , p(t) = p (2) from July 20, 2020 to October 23, 2020 , (T (k) (p (2) est ) − T (k) ) 2 ,