A modified SEIR model to predict the behavior of the early stage in coronavirus and coronavirus-like outbreaks

COVID-19 is a highly infectious disease that emerged in China at the end of 2019. The COVID-19 pandemic is the first known pandemic caused by a coronavirus, namely, the new and emerging SARS-CoV-2 coronavirus. In the present work, we present simulations of the initial outbreak of this new coronavirus using a modified transmission rate SEIR model that takes into account the impact of government actions and the perception of risk by individuals in reaction to the proportion of fatal cases. The parameters related to these effects were fitted to the number of infected cases in the 33 provinces of China. The data for Hubei Province, the probable site of origin of the current pandemic, were considered as a particular case for the simulation and showed that the theoretical model reproduces the behavior of the data, thus indicating the importance of combining government actions and individual risk perceptions when the proportion of fatal cases is greater than \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$4\%$$\end{document}4%. The results show that the adjusted model reproduces the behavior of the data quite well for some provinces, suggesting that the spread of the disease differs when different actions are evaluated. The proposed model could help to predict outbreaks of viruses with a biological and molecular structure similar to that of SARS-CoV-2.


Scientific Reports
| (2021) 11:16331 | https://doi.org/10.1038/s41598-021-95785-y www.nature.com/scientificreports/ combining statistical analysis and mechanistic mathematical models 6 . He et al. 7 propose a mechanistic mathematical model that incorporates the effect of school closures, human behavior responses, and weather changes as the most plausible actions from a health data correlation study about the flu pandemic of 1918 in London 3 . Lin et al. 8 use a modified version of the method introduced by He et al. 7 to simulate the spread of COVID-19 in the city of Wuhan, China, thus emphasizing the effects of the measures adopted by the government and the individual reaction on the risk associated with the disease infection rate. They use this model because of the similarities between both diseases regarding the spreading velocity. However, they eliminate the effect of weather changes because there is no evidence about its relationship with the COVID-19 infection rate. The benefits of the proposed model are related to its consideration of zoonotic transmission and high migration of people during a short period as well as government measures and individual actions. The results of the simulations show a disease spread tendency consistent with the data on infected individuals reported in the city of Wuhan from January 27 to mid-February 2020. Nevertheless, the authors do not report simulations with extended periods that would allow us to verify the mechanistic model validity regarding the most recent data and the correlation study published recently by Tian et al. 5 .
In the present work, we intend to model the early stage of the spread of COVID-19 (we started this research in February 2020 and finished the modeling and writing in May 2020) using a modified version of the simple SEIR (susceptible-exposed-infected-recovered [9][10][11] ) and coupled it with the mechanistic model on disease infection rates proposed by He et al. 7 . For this infection rate, we eliminated the effect of weather changes but incorporated the effects studied by Tian et al. 5 as a function of time and individual reactions to deaths, which is one of the model parameters. Our purpose is to show predictions about the disease spread in the short and initial epidemic phases for the entire province of Hubei, China, and compared these finding with the health data reported for their respective periods. Although the complete model is not sophisticated, the results obtained in our simulations show an acceptable and valuable match with the health data reported for Hubei for the initial 90-day period. Hence, we consider that the results can be used by decision-makers to plot and implement policies, as well as contingency plans, to face possible new epidemic outbreaks in the early stage. In the near future, we plan to add new features to discuss more realistic scenarios, not only for COVID-19 but also for other diseases caused by other viruses that may occur in the future.
At the time of writing this paper, different variants have arisen during the pandemic. These variants might differ substantially in the characteristics that affect the dynamics of the simulations, for example, the transmission rate or the case fatality proportion 12 . In addition, the values of the adjusted parameters are appropriate for the data provided in a particular case, and the model describes the mean behavior of the state variables. Thus, the results given for a model simulation represent the behavior of specific data provided at a specific time. Hence, we consider the procedure to be applicable to other data sets as a framework, and comparisons of these data sets with the obtained data provides a measure of the method's effectiveness. The authors suggest that the methodology should be applied in future works for other regions.
This paper is structured as follows. In "Mathematical model", we introduce the mathematical aspects of the model, defining all state variables and equations. In "Data used in the model", we describe the health data used to fit the parameters. In "Numerical simulations", we describe all the simulation runs. In "Results and discussion", we provide a discussion. Finally, in "Conclusions", we present the conclusions of this work.

Mathematical model
We adopt the classical SEIR (Susceptible-Exposed-Infected-Recovered) framework as a baseline. In the SEIR model, the total population, which is represented by the variable N(t) for all t ≥ 0 , is regrouped into sets of individuals who are seen as units 11 . In this sense, the variable S(t) represents the number of persons susceptible to infection, E(t) represents the number of persons exposed to the infection, I(t) represents the number of persons infected after exposure, and R(t) represents the number of persons recovered after infection. Additionally, we add the variable D(t), which represents those patients who do not recover from infection, are not infectious but ultimately die, and the variable M(t), which represents the number of deaths from the disease. All these variables, of course, depend on time.
We consider the parameter φ related to the case fatality proportion (CFP) leaving the infected unit, which represents the individuals who are not susceptible to becoming infected again because they are in the process of dying within time g −1 and whose time until death is measured by the rate of change of the variable D(t). This fraction of individuals is dynamically accounted for by the rate of variation with respect to time of the variable M(t).
We now describe the equations governing the system in terms of the normalized variables s(t), e(t), i(t), r(t), d(t) and m(t). In other words, we rescale the variables described above with respect to the total population, which allows the suitable management of the model from a numerical perspective without the effects of differences in scale. In this sense, we assume that the change in the s(t) fraction decreases proportionally to that of , where β(t) is the transmission rate, which converts s(t) into e(t), and before spending a mean amount of time, it converts to σ −1 into the unit of infected individuals i(t). The zoonotic transmission is implemented with a stepwise function f(t). Denoting the derived operator with respect to time by ∂ t , our model is described as follows: www.nature.com/scientificreports/ The next layer in our model represents individuals that recovered with a rate of (1 − φ)γ . After a period of illness γ −1 (mean infectious period), the i(t) is converted into r(t): The rates of change for the variables d(t) and m(t) are given as follows: This model is complemented by the following equation used to determine the total population at each instant of time t > 0: The total individual population N(t) decreases for a positive migration rate µ(t) and increases for a negative migration rate (the population model remains constant with µ = 0 ). Although we consider the parameter µ as a time-dependent stepwise function, as shown in Table 1, we remove the time dependency for simplicity of notation.
All dynamical quantities in our novel SEIR model, which is normalized by N(t), satisfy the following relationship: Figure 1 shows the flow diagram that summarizes the model. We use the transmission rate function β(t) defined by He et al. 3 that incorporates the impact of governmental actions (all actions that impact mobility) and the decreasing contacts among individuals in response to the proportion of deaths or severe cases (i.e., the severity of the epidemic) as follows: where the quantity (1 − α) represents the seasonality of governmental actions (quarantine, airport closure, shopping center closure, social distancing, curfews, etc.) 8 for all α ∈ [0, 1] , and κ is a parameter that represents the intensity of perception of the risk p(t) that the individuals exhibit during the pandemic. This public perception of risk is modeled as follows: where −1 is the mean duration of impact of deaths on public perception. For instance, the spread of a disease without any action from part of the susceptible population is κ = 0 (naive spread). Governmental actions, such as quarantine and lockouts, are considered when α = 0 . A goal of this model is to analyze the effects of individual

Data used in the model
We use the data provided by the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University 2 . From this repository, we use the time series of confirmed cases, number of deaths and number of recovered individuals in China's 33 administrative dependencies from January 22, 2020, until 90 days later. For example, Fig. 2 depicts a plot of cumulative cases in the province of Hubei. From the beginning of the outbreak, we assume the timeline given in Fig. 3. The outbreak started on December 1, 2019, in a seafood market in the city of Wuhan in Hubei 8 . Then, because of the Chinese New Year holiday, a strong migration occurred from December 31 until January 22, 2020, when the Chinese government started the first soft control measures, with additional stronger governmental measures taken on January 29 that imposed circulation control, social distancing, educational lockout, etc. In our model, following the settings by Lin et al. 8 for the transmission function in Eq. (9), we set the parameter α to values of α = 0 from December 1 until January 23, α = 0.4239 from January 23 to January 29, and α = 0.8478 after January 29 . The zoonotic transmission function f(t) in Eq. (1) is the normalized value of the zoonotic transmission rate F(t) defined in Table 1. For the province of Hubei, F(t) is set as a stepwise function, with the value of F 0 = 10 from December 1 until December 31 and zero the rest of the time. For the rest of the administrative dependencies, the zoonotic transmission rate is always set to zero.  including the province of Hubei, which is the case study considered in this work. The data set used in our analysis consists of the first 90 days from January 22, 2020. We use the maximum-likelihood method, which closely follows the procedure used by He et al. 3 , to fit the free parameters N 0 , σ , and β 0 . The latter is split into two different base transmission rates, β 01 and β 02 . The splitting of β 0 corresponds to the finding that the disease transmission rate changes in response to the different governmental actions on two different dates.

Numerical simulations
We organize the numerical simulations into two main sets. For the first set, which is presented in subsection entitled Model simulations, we run our model for four different values of the CFP to elucidate its effect on the evolution of the disease spread related to government actions and the public perception of risk. For the second set, which is presented in subsection entitled Fitting using the health data from Chinese provinces, we perform data fitting with the health data using the maximum-likelihood method. Fitting using the health data from Chinese provinces. We fit the parameters shown in Table 2 using the maximum-likelihood method as described in He et al. 3 . We consider the following free parameters: the initial susceptible population, N 0 ; the inverse of the mean latent period, σ ; and the baseline transmission rate, β 0 . In some cases, we split this parameter into two, e.g., β 01 and β 02 , to consider different dynamics for disease spread. Once the best fit of the parameters is found, we identify the 1 − σ error estimation (1 standard deviation). It is worth mentioning that some of the fits are of good quality, for which we have modest to high confidence, whereas some are of low quality. Nevertheless, we present all of these fits for the sake of completeness of the survey. Figure 6 shows the results of fitting the number of infections per day I(t) for a subset of 15 provinces in China, and the remaining results are shown in Fig. 7. Figure 8 depicts the results of the model using the Hubei data, using N 0 = 1.14 × 10 6 , β 01 = β 02 = 0.5 , which was estimated in the data fitting, and CFP φ = 4.5% , which is very similar to the actual value. The rest of the parameters are the same as in Table 1

Results and discussion
In Figs. 4 and 5, we observe a reduction in the number of infections per day based on a comparison of the naive model, the individual reaction model and the individual reaction with government reaction model. The findings show that individual reactions and government measures led to a reduction in the number of infections per day. The curve representing the naive simulation did not show variations with respect to the changes in the CFP parameter φ , while these changes produced a remarkable effect on the other two models. The simulations showed that a larger value of parameter φ corresponds to a lower peak in the number of infections per day. The reduction with respect to the naive simulation is approximately 5:1 when φ = 20% , while it is barely perceptible when φ = 0.5%. At first glance, one can assume that the CFP is not related to the number of infected individuals. However, the higher the number of deaths is, the more cautious individuals become because the risk perception increases; thus, the possibility of contagion decreases. Equation (5) indicates that our model reproduces this behavior in a consistent manner. In fact, the variable i(t) is directly related to the variable d(t) (and then to m(t)) and inversely related to the parameter φ . Thus, for a higher value of φ , a lower value of i(t) is obtained.
In Fig. 4a, when φ = 10% , we also observe another small peak produced by the simulations around day 160, although this is only observed when only the individual reaction is considered. In contrast, we do not observe this second peak for the other model categories with a lower value of φ . These secondary peaks might represent rebounding outbreaks; thus, the model consistently reproduces this behavior that we expect in the short and long term, which is similar to that observed for other infectious diseases 3 . It is worth mentioning that there is almost no difference in the curves when we add the government measurements in addition to the individual response, which indicates that government action does not have a significant impact on the change in the curve. From our simulations, we understand that it is more important to consider actions taken by citizens than actions taken by the government that are not assimilated by individuals. Such a perspective centers contagion prevention mainly on individuals instead of measure enforcement.
With respect to the number of deaths (M(t)), the simulations are also consistent. Figures 4 and 5 (plots (c) and (d)) clearly show that a larger the CFP corresponds to a higher number of deaths. Additionally, we assess the influence of public risk perception and government action regarding deaths when varying the CFP, φ . A  Table 1  www.nature.com/scientificreports/ larger value of φ corresponds to a larger difference in the M(t) curves. In Fig. 4c, the effect of the individual and government actions reduces the height of the curve for the extreme CFP by four-fifths, while the reduction in plot (d) is less than 10in Fig. 5c, the reduction in the number of deaths is approximately one-fifth compared with that of the high CFP but is barely noticeable for the low CFP. This result highlights the importance of different measures (individual or collective) for reducing the effect of disease impacts. For fitting the parameters, in Figs. 6 and 7, as mentioned in subsection entitled Fitting using the health data from Chinese provinces, some parameters are of good quality and promote modest to high confidence, while others are of low quality. However, we can highlight that our model is generally able to reproduce the disease outbreak rebound observed in the data, which was pointed out in our previous discussion. Additionally, it is worth noting that the free parameter β 01 is mainly fitted in the range from 0.5944 to 1.68, which is similar to the values reported by Lin et al. 8 .
The large differences in the values of σ obtained in our data fittings probably indicate a contribution of the rapid onset of contagion once an individual is exposed, although they might also be due to differences in the population densities among the provinces of China. The contagion rate β represents the likelihood that a susceptible individual (S(t)) is exposed (E(t)), and this depends on the outbreak intensity. However, once an individual is exposed, the time until they are infected is given by σ . This value contributes to a higher velocity of disease spread once an individual is exposed, thus indicating the importance of personal security measures.
The values of the parameters that determine the dynamics of disease spread are closely related to the period of time considered in the data, as well as the appearance of new variants. Viruses, such as SARS-CoV-2, mutate continuously and may have varying effects over time 15 . Thus, models need to adjust the values of the parameters accordingly and mainly consider the appearance of variants of concern (VOCs). Future work could consider a parameter factor that includes the variation rate of viruses or the appearance of VOCs, or they could even include more equations to represent several virus variants 12 .
Finally, for more detailed insights into the case of Hubei, we obtain the simulation results depicted in Fig. 8. The simulation time starts on December 1, 2019, using the timeline shown in Fig. 3 and the data from Hubei: the initial susceptible population N 0 = 1.14 × 10 6 , β 01 = β 02 = 0.5 , as the value we obtain from the data fitting; the rest of the parameters are the same as in Table 1, and they are based on both government action and  Table 1  Although we fit the values of the free parameters using a time series starting on January 22, the simulation qualitatively shows good agreement with the Hubei data. The peak of the number of contagious individuals occurs around day 50, which is close to the end of January, consistent with the results of Lin et al. 8 , who noted that there is a delay in values after processing the tests. Thus, the simulation results must be offset by approximately 15 days with respect to the data time series. However, the simulation shows an underestimation of the peak. The number of deaths M(t) is very similar to that reported in the data. With respect to the number of recovered individuals and total cases, the results show qualitatively good agreement but are quantitatively overestimated.

Conclusions
In this paper, we present an SEIR model for computationally simulating the COVID-19 outbreak, and it considers the combined effect of governmental actions, public perception of contagion risk and the case fatality proportion (CFP). The outcome is a theoretical model that qualitatively reproduces the behavior expected for the COVID-19 outbreak based on the three scenarios of no governmental action or individual reaction due the perception of contagion risk, which is labeled as the naive category; only the individual reaction based on the perception www.nature.com/scientificreports/ of risk; and the combination of individual reaction and governmental actions. In all scenarios, we also consider low ( 0.5% ), middle ( 1% ), high ( 2% ), and extreme ( 20% ) values of CPF. The results of the simulations show that the influence of CFP variations is more important when considering the individual reaction in terms of the perception of risk. In one case, a disease outbreak rebound appears when only the individual reaction to risk is considered; however, this rebound effect disappears when government action is considered as a complementary measure. We conclude that individual actions with regard to the perception of contagion risk are more effective in reducing disease spread; however, government measures reinforce the reduction in spread when people relax after a reduction in the death rate. We deploy a maximum-likelihood method to fit health data from the 33 provinces of China provided by the Johns Hopkins University COVID-19 database. We fit the data for the free parameters of the initial population N 0 , the inverse of the main latent period σ , and the baseline transmission rate β 0 , which is split into two parameters, β 01 and β 02 , acting at different times. The data fitting is reported for all 33 provinces. By adjusting certain parameters, the model can capture the transmission dynamics of the COVID-19 outbreak. Figure 6. Fit of the first 15 Chinese provinces produced using the data from the CSSE repository and our model using the values of the parameters as in Table 1, with F 0 = 10 for Hubei and F 0 = 0 otherwise. Occasionally, β 02 has to be removed from the fit to reproduce the smaller second outbreaks seen in some provinces. We give the best fit parameters and their quality of fit ( R 2 ) in Table 2.
Scientific Reports | (2021) 11:16331 | https://doi.org/10.1038/s41598-021-95785-y www.nature.com/scientificreports/ Figure 7. Fits from the remaining 18 to 33 Chinese provinces produced using the data from the CSSE repository and our model using the values of the parameters as in Table 1, with F 0 = 10 , for Hubei and F 0 = 0 , otherwise. Occasionally, β 02 has to be removed from the fit to reproduce the smaller second outbreaks seen in some provinces. We give the best fit parameters and their quality of fit ( R 2 ) in Table 2.