Introduction

The Coronavirus disease 2019 (COVID-19), has severely affected different aspects of human health all over the world. The high transmissibility rate and lack of efficient therapeutics have transformed the situation into a major challenge1,2,3. Ranking fifth according to the outbreak, announced the first cases on February 19, 2020. The pandemic quickly impacted most regions of the country with the highest rates in Tehran, Qom and Isfahan provinces4,5.

Virus epidemics have been known as one of the major health issues causing a high mortality rate in human communities throughout history. The Spanish flue emerged in 1918, caused 50 million deaths for just over 2 years6. Also, since the early reports of HIV infection in 1980, more than 32 million deaths have been reported around the world due to virus infection7. Tragically, not only these older global epidemics but also, the local spread of SARS, MERS, and Ebola viruses in recent years have not made countries ready for such crises. The lack of preparation for the outbreak of COVID-19 beside the high rate of transmissibility, unknown complications, and inappropriate medications led to a world-wide disaster paralyzing the health care systems even in developed countries. More importantly, the future course of the disease is not yet clearly known in many affected countries and the estimates of the ultimate spread of the epidemic are not coherent. This makes policy-making and planning for the required resources very difficult for the governments8. In this unknown situation, forecasting approaches are of particular importance.

Since the emergence of COVID-19, different mathematical modeling approaches have been employed to simulate the disease course9; the artificial intelligence-based models10 are interesting but as they depend on many learning steps, their validity can be questioned in the absence of sufficient training datasets. Day-level forecasting based on time-series data is another approach11 that simply follows previous patterns and cannot predict the trend of changes. Agent-based modeling is also a rational approach for predicting the disease course, which simulates the fellow of the individuals (agents) to calculate the disease spread in the community12. Nevertheless, such models rely on the population-level parameters such as rates of movements, distancing and virus infectivity parameters which are not yet known. Ordinary differential equation (ODE)-based models have been used for a long time to simulate the classical dynamics of epidemics13. This type of models was first proposed by Kermack and McKendrick in 1927 to simulate the transmission of infectious disease such as measles and rubella14. Such models assume susceptible (S), infective (I), and recovered (R) fractions in a close population and calculate the rate of changes in each fraction with ODEs15,16.

Over the past year, various simple and modified SIR models have been developed to predict the course of COVID-19; Anastassopoulou et al. provided a preliminary estimate of the fatality and recovery ratios for the population of Wuhan by a discrete-time SIR model17. Moreover, a model was developed by Giordano et al. that predicted the effect of different lockdown strategies on the epidemic in Italy. They considered different sub-groups regarding the stage and severity of the disease in the infected individuals18. Similarly, using an SEIR (Susceptible-Exposed-Infectious-Recovered) framework, Lin et al. predicted the effect of government policies and individual actions on the spread of the epidemic in China19. It should be noted that the modified SIR models require more complex data for the development, and due to the presence of little information and lack of reliable data regarding this newly emerged disease, the simple SIR model has been the choice of many investigators20.

Despite the simplicity and effectiveness of SIR-based models in forecasting a variety of other infectious diseases, their capability in the case of the COVID-19 pandemic is a matter of debate. In this regard, an SIR model was developed using the data of the first 3 months of COVID-19 spread in Isfahan province and the conformity and accuracy of the model was evaluated using the actual data in later months. Results of this study scrutinize the appropriateness of SIR models for the simulation and prediction of the COVID-19 epidemics.

Material and methods

SIR model

In the SIR models, the population is considered to be closed and the sum of susceptible, infective or recovered fractions denoted by \(i\left( t \right)\), \(s\left( t \right)\), and \(r\left( t \right)\), respectively, is equal to 1 for each \(t \ge 0\). The model is defined by the following set of ODEs:

$$\frac{ds}{{dt}} = - \lambda si,$$
(1)
$$\frac{di}{{dt}} = \lambda si - \gamma i,$$
(2)
$$\frac{dr}{{dt}} = \gamma i,$$
(3)

where \(\lambda\) is the infection rate and \(\gamma\) is the recovery rate. R0 is known as the basic reproduction number and defined as:

$$R_{0} = \frac{\lambda }{\gamma }.$$
(4)

R0 is an essential determinant of outbreaks and can be interpreted as the expected number of new cases directly caused by an infectious individual before the recovery.

All simulations were performed using MATLAB R2017a21.

Epidemiological data

The data used in this study was provided by the medical care monitoring center (MCMC) of Isfahan University of Medical Sciences, Isfahan, Iran. This data consists of the daily hospitalized cases and deaths from all cities of Isfahan province except Kashan, Aran and Bidgol which are in the zone of Kashan University of Medical Sciences. The population of the studied region is approximately 4,632,000. Considering the low negative predictive value of PCR test for the diagnosis of COVID-1922, all patients admitted to hospitals with the clinical suspicion of the disease were considered as infected cases. This can describe the potential discrepancy between the actual cases assumed in this study and other reports that rely on the molecular diagnosis of the infection.

Results

Using the epidemiologic data for the first 3 months, an SIR-based model was constructed to predict the disease course

In the SIR model, \(r\left( t \right) + i\left( t \right){ }\) determines all the individuals who are no longer susceptible. The term can be interpreted as the accumulative number of new cases in each time-point that can be recovered, died or are still infectious due to the course of disease. According to recent studies, 30% of infected individuals are asymptomatic that are not identified in the population unless in active case finding programs23. Out of the remaining infected cases that manifest the disease, 70% show mild to moderate symptoms and are treated in out-patient settings without any need to hospitalization. whereas, 30% of symptomatic cases have severe manifestations and must be admitted to hospitals for a while during the infection24,25. Hence, the cumulative number of new hospital-admitted cases can be estimated from Eq. (5):

$$Cumulative{ }\,hospitalized\,{ }patients{ } = { }0.21{ }\left( {r\left( t \right) + i\left( t \right)} \right).$$
(5)

In addition, the number of cumulative deaths is assumed as:

$$Cumulative\, death = K \times \left( {r\left( t \right) + i\left( t \right)} \right).$$
(6)

Although a mortality rate of 2–3% is reported for COVID-19 patients26, the actual rate depends on a variety of parameters including the average age of the population and the capacity of healthcare systems. In this regard, the parameter \(K\) is determined for each population individually based on the reported mortality rates.

To estimate the R0 value for the COVID-19 epidemic, the model was first fitted to the cumulative new cases data reported until April 10th, 2020 from Wuhan and Italy and the epidemic final size (Fig. 1a,b), Cumulative hospitalized cases (Fig. 1c,d), and the Cumulative deaths (Fig. 1e,f) were predicted. Then the model was fine-tuned based on Isfahan statistics.

Figure 1
figure 1

Modeling COVID-19 epidemic in Wuhan and Italy. To provide a rough estimation of the epidemic dynamics, the model was first generated with the reported data of Wuhan and Italy. (a,b) Epidemic final size. (c,d) Cumulative hospitalized cases. (e,f) Cumulative deaths.

An R0 of 1.0078 could best describe the data of Italy. Also, the curves of Wuhan could be appropriately simulated when R0 was assumed 1.0146 (Fig. 1d). Notably, the number of new hospital-admitted cases in Italy showed a decline in days 40–50 which could be attributed to the strict social distancing policies imposed by the government (Fig. 1c). To fit the model to the mortality data, parameter \(K\) (Eq. 6) was set at 0.025 for Italy. Notably, when the same value was considered for Wuhan, the simulation curve was far higher than the reported official data (Fig. 1f). In agreement with this observation, the Chinese authorities have later declared the inaccuracy of the initial reports of deaths in Wuhan and boosted the number by 50%27.

The simulations for Wuhan and Italy allowed starting the modeling of the epidemic in Isfahan with a rough estimation of R0 value. Although, each outbreak is described with a unique R0 in classical SIR models28,29, we decided to use a set of R0 values for different time intervals to account for the variations of the community behavior and inconsistency of social distancing regulations. At the time of model construction, the actual epidemiological data were available for the episode from Feb 14th to April 11th. The model had the best fit to the actual data when four different R0 values were considered. Since the changes in social distancing are reflected in the hospital admission rates with a time delay of about 2 weeks30, the four R0 values were corroborated with community behavior variations as represented by city traffic reports (data not shown).

To forecast the epidemic in this province, three scenarios based on the strictness level of social distancing were assumed and different R0 values were considered in each case (Table 1). From Feb 14th to April 11th, the highest value of R0 was 1.0165 (R01) which is attributed to the beginning of the epidemic when people were not aware of the outbreak and thus, no restriction was imposed. In the “bad scenario”, R0 again reached the same value after a transitional step. In addition, during the mentioned period, the smallest value of R0 was 1.0040 (R03). Hence, considering the practical issues, it is assumed that R0 may again reach to a value as small as 1.0050 in the “good scenario”. Also, an intermediate value of 1.0095 was considered for the “feasible scenario”.

Table 1 R0 values used in the SIR model.

In the good scenario achieved by strict social distancing, the SIR model predicted that about 13,000 cases would be cumulatively hospitalized by the end of the epidemic and the total number of deaths would reach 800 cases. In this scenario, the curve of cumulative cases approaches a plateau on May 4 and by June 10, only 0–2 new cases would be admitted to hospitals on a daily basis (Fig. 2a).

Figure 2
figure 2

Modeling COVID-19 epidemic in Isfahan province using the first 3 month data. To forecast the epidemic, Good (a), Feasible (b) and Bad (c) Scenarios are assumed.

The feasible scenario was attributed to closing schools, universities, cultural centers and limiting social gatherings and businesses that are not critical. In this scenario, the model predicted that total hospitalized cases and deaths would reach about 15,000 and 920, respectively. Also, the curve of cumulative cases would approach a plateau on May 24 and by July 18, it predicted that only 0–2 new cases would be admitted to hospitals on a daily basis (Fig. 2b). In the bad scenario, a situation was assumed in which the community is back to the routine life style. In this scenario, 22,000 cases would be totally admitted to the hospitals and the number of victims would be more than 1300. The model also forecasted that a second peak is unlikely and the epidemic would last up to mid-August (Fig. 2c). The results of the three scenarios are summarized in Table 2.

Table 2 Predicted outcomes of the three proposed scenarios.

Evaluation of model predictions with actual epidemiologic data after 9 months

The prediction patterns in the three scenarios were evaluated by their conformity to actual data over the 9 months since the start of the epidemic. Although the good-scenario predictions were far from the pattern of epidemic (data not shown), the trend of feasible scenario could appropriately match with the daily hospitalized cases until mid-June (Fig. 3a). However, a new long-lasting huger peak was appeared at the beginning of July which was not anticipated even in the bad scenario, leading to the increase of total infections to 43,293 until October 22 (Fig. 3b). Although the model could forecast disease course patterns in the short term, the predicted values were far from the actual data in later time intervals and the constructed model was unsuccessful at long term prediction of epidemic course.

Figure 3
figure 3

Evaluation of model predictions after 9 months. Actual data until October 22 are demonstrated in line with model predictions in Feasible (a) and Bad (b) scenarios. In each graph, the time-intervals for model construction and validation are indicated.

Discussion

COVID-19 pandemic is emerged as a massive health burden all over the world. Mathematical modeling is a useful approach for the prediction of disease dynamics to take best decisions. Currently, various mathematical approaches have been developed for modeling the COVID-19 pandemic which mostly depend on a number of parameters for the simulation31. Due to the lack of access to adequate information about this newly emerged pandemic, simple strategies such as SIR-based models have been implemented by several investigators. However, their accuracy and suitability for prediction of the COVID-19 disease course is a matter of debate. In this study, using the data of the first 3 months of the COVID-19 epidemic in Isfahan, an SIR model was developed and the outcome was predicted with three scenarios that differ in terms of social distancing stringency. Then, the constructed model was evaluated for conforming the predicted values to the actual data in the six consecutive months. Although the model could appropriately simulate the patterns until the mid-June, it was unable to predict the second peak of epidemic emerged in July.

The inability of our model to predict COVID-19 pandemic in Isfahan motivated us to assess the conformity of the predictions of SIR models developed by other investigators to the actual data of different regions around the world. Although the SIR model developed by Ahmetolan et al. could predict the course of disease in the countries with controlled epidemics, such as China, Korea, and Singapore, it was unable to forecast the larger peak mainly started at the late summer in many countries located in the northern hemisphere32. Similarly, the modified SIR model constructed by Cooper et al., which was manually adjusted to fit actual data in a period of time33, was unable to predict the next surges, and the estimates of infected populations were also far from reality. For instance, the model predicted 330,000 infections at the end of September in India, but it actually turned to be around 900,00034. Morover, the extended SIR model of Wangping and colleagues estimated that the epidemic in Italy would be ended in August with a total number of 182,051 infected cases. Nevertheless, according to actual data, a second peak started in October with more than 700,000 total infected cases35. Similar inconsistencies between predictions of SIR-based models and actual data for the COVID-19 epidemic could be observed in other studies28,36,37,38.

The failure of SIR-based models to forecast the COVID-19 pandemic can be described by a variety of reasons. These over-simplified models ignore the factors that have a great effect on the course of disease. For instance, various studies demonstrate that air pollution is a main reason for the observed variations in disease contagion39,40. The other issue affecting the spread of virus is the behavioral changes considerably related to the social and cultural context of the population41. As mostly a single R0 value is considered in the SIR models, the unexpected changes in the social behaviors of the population are missed and the model would be unable to follow the emerging alterations. Indeed, ceremonies and national gatherings have a great impact on disease spread. For instance, a religious meeting in Malaysia, held on February 27 to March 3 is supposed to be the source of virus spread in India and Pakistan42. Similar social events in Isfahan, at least partly, may describe the unpredicted peak starting from July.

Another explanation for the failure of SIR-based models to predict the COVID-19 epidemics is that the modeling is based on the assumptions that are not necessarily true; the population is considered closed in the SIR models, whereas, complete isolation was not followed in most regions, making them vulnerable to changes in the neighboring communities. In addition, the recovered individuals are assumed as immunized in the SIR models, which are no longer susceptible. This assumption contrasts with new findings suggesting there is a possibility of the reactivation of the virus or reinfection of previously infected individuals43,44,45. Indeed, a recent study indicates that the longevity of neutralizing antibodies in infected individuals is only around 50–60 days 46.

Conclusion

Taken together, the COVID-19 pandemic features are not coherent with the SIR modeling framework and the dynamics of this outbreak is under the influence of various parameters for most of which quantitative information is not yet available. More sophisticated modeling approaches in line with more precise epidemiological and biomedical data are urgently required to make the pandemic forecasting feasible.