A recursive bifurcation model for early forecasting of COVID-19 virus spread in South Korea and Germany

Early forecasting of COVID-19 virus spread is crucial to decision making on lockdown or closure of cities, states or countries. In this paper we design a recursive bifurcation model for analyzing COVID-19 virus spread in different countries. The bifurcation facilitates recursive processing of infected population through linear least-squares fitting. In addition, a nonlinear least-squares fitting procedure is utilized to predict the future values of infected populations. Numerical results on the data from two countries (South Korea and Germany) indicate the effectiveness of our approach, compared to a logistic growth model and a Richards model in the context of early forecast. The limitation of our approach and future research are also mentioned at the end of this paper.

www.nature.com/scientificreports/ untraced contacts 25 , undetected international cases 26 , and actual infected cases 27 . Statistical reasoning 28,29 and stochastic simulation 30,31 were also explored by a few researchers. The second group of investigations was based on dynamic modelling. Susceptible exposed infectious recovered model (SEIR) was used in assessing various measures in the COVID-19 outbreak [32][33][34][35] . Furthermore, it was utilized in investigating the effect of lockdown 36 , transmission process 37 , transmission risk 32 , and the effect of quarantine 32 . The SEIR model with time delays was also developed for studying the period of incubation and recovery 38,39 .
Richards 40 developed a flexible growth function for empirical use in the context of plant data based on von Bertalanffy's growth function 41 , which was originally designed for animals. This model was later used for fitting the single-phase outbreaks of severe acute respiratory syndrome (SARS) in Hong Kong 42 and Taiwan 43 as well as a multi-phase outbreak of SARS in Toronto 44 . The Georgia State group 45 recently studied short-term forecasts of the COVID-19 epidemic in Guangdong and Zhejiang, China between February 13 and 23, 2020 via a generalized logistic growth model 46 , the Richards model, and a sub-epidemic model 47 . As an extension to a logistic growth model, the Richards mode can be described by a single differential equation: where P(t) represents the cumulative number of infected cases at time point t, r denotes a growth rate, K refers to the maximum asymptote, and a is a scaling parameter. One solution of Eq. (1) is where t 0 is the time value of the sigmoid's midpoint. When a = 1, this model is degenerated into a simple logistic growth mode with three parameters [K, r, t 0 ]. Equation (2) represents a four-parameter model [a, K, r, t 0 ] and other variations of the Richards model could consist of up to 6 parameters. In this paper, our comparison and discussion are limited to Eq. (2).
Although there have been many recent studies with respect to the COVID-19 virus spread, an accurate forecasting model for the virus spread based on data at a very early time point is still elusive. Such a model is crucial to a decision-making process for strategic plans to achieve a balance between reduction in life loss and avoidance of economic crisis due to lockdown. In this paper, we develop a new recursive bifurcation model, apply it to the recent data in two countries (South Korea and Germany), and compare it with a simple logistic growth model and the Richards model in the context of COVID-19 virus spread.
The rest of this paper is organized as follows. In "Recursive bifurcation model", a recurve bifurcation model is introduced to model the COVID-19 spread. A bifurcation analysis is given in "Bifurcation analysis of COVID-19 virus spread" on the data of infected population in South Korea. "Early forecasting of COVID-19 virus spread" describes the forecasting of COVID-19 virus spread based on our model and a comparative study with two existing models, followed by some concluding remarks in "Conclusions".

Recursive bifurcation model
In this paper, we focus on the cumulative number of infected population, which is an important metric to measure the extent of the COVID-19 spread in different countries. Although the infected population in most countries follows a pattern of a logistic or sigmoid function, the logarithm of the infected population may provide more information, as shown in Fig. 1b.
The countries, in which a bifurcation pattern occurred, include South Korea, Germany, United States, France, Canada, Australia, Malaysia, and Ecuador. Figure 2 shows the pattern in the last 5 countries of the list. The detailed information of Germany and United States data are given in "Early forecasting of COVID-19 virus spread". By utilizing the bifurcation, we can find out the intrinsic parameters (such as growth rate) in cycle 1 and Figure 1. The number of infected population in South Korea as of April 5, 2020. www.nature.com/scientificreports/ apply those parameters in the prediction for cycle 2 or beyond. More importantly, the bifurcation model performs better than the Richards model in the early forecasting of COVID-19 virus spread. A more detailed discussion about this improvement is provided in "Early forecasting of COVID-19 virus spread". Following the above idea, we introduce a recursive Tanh function to describe the cumulative number of infected population within each cycle of an entire virus spread process: where i refers to the i-th cycle, P is the cumulative number of infected population at any time point in the i-th cycle, D represents the number of days since the initiation of virus spread, P i−1 stands for the cumulative number of infected population at the end of the (i − 1)-th cycle, r i is the spread rate in the i-th cycle, and D i−1 refers to the number of days at the end of the (i − 1)-th cycle. The purpose of adding 1 in the logarithm calculation is to avoid an infinity caused by the case where P = 0. www.nature.com/scientificreports/ Note that Eq. (3) is not strictly a recursive formula in a conventional sense. The reason for us to call it as a recursive one is that Eq. (3) should be recursively solved starting from cycle 1 toward cycle n, if n is the last cycle for the virus spread. When n = 1, this equation is degenerated to a regular Tanh function:

Bifurcation analysis of COVID-19 virus spread
In order to validate Eq. (3) for the analysis of COVID-19 virus spread, we have to select a complete virus spread process. Among all the countries, South Korea seems to be the best choice for this validation because the country provides reasonably reliable data and the virus spread in that country has been stabilized. r i in Eq. (3) represents an intrinsic attribute of the virus spread rate. It can be estimated by a linear leastsquares fit of the following linear equation in a parameter space: Figure 3a shows the result of determining the virus spread rate, r 1 . By using this r value, we predict the infected population, y p , which is very close to the true data, y, as shown in Fig. 3b.
Furthermore, by using r 1 in cycle 2 of South Korea data, we want to validate whether Eq. (3) is still valid by introducing Eq. (6), and α should be unity (Fig. 4): Fig. 4, α = 0.968 , which is very close to unity. This indirectly indicates the correctness of Eq. (3) for cycle 2 with growth rate, r 1 , from cycle 1. The bifurcation in Fig. 1a is easy to identify visually. An automatic algorithm can be created on the basis of discontinuity of tangential direction with a traverse on the curve. Since it is not the main focus of this paper, we do not explore this aspect any further herein.

Early forecasting of COVID-19 virus spread
Based on the model in "Bifurcation analysis of COVID-19 virus spread", we design an algorithm for early forecasting of COVID-19 virus in South Korea and Germany, as given in Table 1. Since the infected population has recently been stabilized in these two countries, it is then possible to validate the accuracy of this early forecasting model.
We first use the following formula to estimate β n through a linear least-squares fit: Nonlinear Levenberg-Marquart least-squares fitting 48 is computed to determine three unknown parameters ( β n , θ n and D n ) simultaneously in the following equation for future forecasting: where θ n is an extra parameter, which plays a role of slope control. Once β n , θ n and D n are determined, Eq. (8) can be used to predict the values of infected population at future time points. Figure 5a defines several different time points to investigate the performance of early forecast on the "future" infected population. Here, the "future" is termed only in a sense with respect to a selected time point (i.e., a reference point after which the "future" is factitiously defined) even though we already have the true infected population data for a period after that time point. t i refers to the time value of the inflection point. 0.9t i , 0.8t i , ..., and 0 equally divide the range [0, t i ] into ten intervals. We use a similar interval for time values that are greater than t i : 1.1t i , 1.2t i , ..., and mt i , where m could be any real number and should be greater than unity.
It is shown in Fig. 5b that the infection point, t i , appeared 12 days later than the cycle transition point, t c , between cycles 1 and 2 on South Korea data; a similar pattern was also observed in U.S., Germany and the countries listed in Fig. 2. This provides a better opportunity for our bifurcation method to produce an early forecast, compared to existing methods such as the Richards method, which generates a reasonably good prediction only at a time point after the inflection point because the right part of a curve after the inflection point can't normally be mirrored from the left part of the curve before the inflection point. In the case of Germany data (Fig. 5c), t c appeared 38 days earlier than t i . This provides an opportunity for our bifurcation method to utilize the cycle information for early forecast on the COVID virus spread. Tables 2 and 3 are a comparison in the 95 percent confidence interval of prediction errors of three models at a reference time point defined in Fig. 5a. The second column of these tables contains a mean value and 95 percent confidence bounds in a pair of parentheses. In general, our bifurcation model performs relatively better for an (7a) log(P + 1) − log(P n−1 + 1) =β n 2 1 + e −2r n( D−D n−1) − 1 , (7a) y =β n Z,  Step 1 Determine the virus spread rate in cycle 1, r 1 , based on a least-squares fitting of Eq. (5) Step 2 Recursively analyze the infected population in cycles 2 through n − 1 Step 3 Let the virus spread rate in cycle n,r n = r n−1 Step 4 Estimate an initial value of the logarithm of infected population in cycle n by a linear least-squares fitting of β n in Eq. (7) Step 5 Determine β n , θ n and D n by using a nonlinear Levenberg-Marquart least-squares fitting through Eq. (8) Step 6 Predict the future infected population based on Eq.  Figure 6 shows the curve fitting of two-country data based on the three models used in this paper. The parameters associated with each model are given in Table 4.  www.nature.com/scientificreports/ Note that for the data from South Korea and Germany, there is no multi-stage pattern if the simple logistic growth model or the Richards model is used. Only through the special treatment in our bifurcation model, did a two-stage pattern appear, allowing a more accurate forecast at an early time point (such as 0.8 T i or 0.9 T i ) on the infected population growth at a later time point (for example, 2.0 T i ).
The importance of our model is supported by the results in Tables 2 and 3, where our model performs significantly better than the existing models in terms of early forecast on the growth of COVID-19 virus spread. Consequently, our model has a potential to be used in decision making for the events of virus spread in the future.
The data from United States presents a challenge to our approach and also reflects a limitation of the method. As shown in Fig. 7a, for the period from January 22, 2020 to April 17, 2020, there was no inflection point and our cycle transition point appeared very early (Day 33). This case indicates that the cycle transition point appeared at  www.nature.com/scientificreports/ least 52 days earlier than the inflection point. However, since the inflection point did not appear even on August 3, 2020 (Fig. 7b), it is difficult to validate the accuracy of any early forecasting models on U.S. data at the time of writing this paper. It is still elusive how to design and evaluate an early forecasting model when the entire virus spread history is at its early stage without the occurrence of its inflection point. This will be a future research topic.

Conclusions
In this paper, we propose a recursive bifurcation approach for early forecasting of COVID-19 virus spread. An algorithm is developed to predict the future infected population based on ongoing existing data as of June 14, 2020. Numerical analyses were conducted in comparison with two existing models (a logistic growth model and a Richards model). The results indicate that our bifurcation model performs relatively better than the two existing models at 0.8 T i or 0.9 T i time point, where T i refers to the inflection point of an infected population-time curve. We presented an important observation in which the cycle transition time point, T c , appeared much earlier than T i . This allows our bifurcation model to perform well in the early forecasting of COVID-19 virus spread in South Korea and Germany. However, the ongoing infection spread in United States presents a challenge to our model. It will be a future research topic on how to evaluate the forecasting of COVID-19 infection spread when its inflection point has not occurred as of August 3, 2020.

Data availability
The www.nature.com/scientificreports/ Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.