Deep learning forecasting using time-varying parameters of the SIRD model for Covid-19

Accurate epidemiological models are necessary for governments, organizations, and individuals to respond appropriately to the ongoing novel coronavirus pandemic. One informative metric epidemiological models provide is the basic reproduction number (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_0$$\end{document}R0), which can describe if the infected population is growing (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_0 > 1$$\end{document}R0>1) or shrinking (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_0 < 1$$\end{document}R0<1). We introduce a novel algorithm that incorporates the susceptible-infected-recovered-dead model (SIRD model) with the long short-term memory (LSTM) neural network that allows for real-time forecasting and time-dependent parameter estimates, including the contact rate, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta$$\end{document}β, and deceased rate, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu$$\end{document}μ. With an accurate prediction of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta$$\end{document}β and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu$$\end{document}μ, we can directly derive \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_0$$\end{document}R0, and find a numerical solution of compartmental models, such as the SIR-type models. Incorporating the epidemiological model dynamics of the SIRD model into the LSTM network, the new algorithm improves forecasting accuracy. Furthermore, we utilize mobility data from cellphones and positive test rate in our prediction model, and we also present a vaccination model. Leveraging mobility and vaccination schedule is important for capturing behavioral changes by individuals in response to the pandemic as well as policymakers.

In this work, we combine a compartmental model with a recurrent neural network that incorporates mobility data as well as the positive test rate. We (1) predict the time-dependent parameters β and µ using a neural network; (2) forecast the infection rates when mobility decreases or increases; and (3) forecast the change in infection rate based on different vaccination schedules. The goal of this paper is to provide a method to predict time-varying parameters β and µ (and hence R 0 ) as well as to solve the SIRD equations.
The method under consideration in our paper combines the two aforementioned approaches. We first introduce a version of recurrent neural networks to predict the time-varying parameters β and µ . Since γ is assumed to be constant, one can easily find R 0 = β/γ from the neural network. We then obtain the compartments, S, I, R, and D, by numerically solving the SIRD equation over a certain time period (e.g. 7 days). To test the performance of our approach, we used publicly available data for different countries, France, United Kingdom, Germany, and South Korea, provided by Johns Hopkins University. For more detail, we provide an illustration of the algorithm in Fig. 9. We also include two additional datasets: mobility data from cellphones and the positive test rate. Studies reveal that both mobility and positive test rate have been shown to influence the spread of Covid-19 considerably [22][23][24][25] .
In this paper, we present an accurate computational scheme to predict the reproduction number which enables Covid-19 forecasting. We use this scheme to forecast different scenarios by increasing or decreasing the mobility parameter. In doing so, our model can help study the effect of government-imposed lockdowns on R 0 . Furthermore, we make use of a SIRD model with vaccination to see how vaccination affects the spread of the virus. Among many other vaccination models 26,27 , our study focuses on the model introduced in 28,29 as it is sufficient to capture important dynamics in the experiments. By leveraging parameters relative to the vaccination rates, our simulations show how the vaccination rate affects the number of infectious cases. Such experiments can show how different public health interventions may affect the outcome of the epidemic.

Results
In this section, we describe a sequence of numerical experiments of our algorithm further detailed in the Method section below. First, we present the estimated values of our time-dependent parameters β and µ using the Levenber-Marquardt algorithm. Then, the accuracy of the algorithm is demonstrated using in-sample data, and out-of-sample predictions for the next 10 weeks. Lastly, forecasting depending on mobility and vaccination rate is examined. In summary, our main contributions consist of three key findings; (i) our SIRD-LSTM combined network outperforms classical prediction models; (ii) we incorporate the mobility and vaccination as inputs of our neural network to increase the accuracy of our parameters predictions; (iii) we forecast Covid-19 trends when mobility decreases or increases.
Parameter Estimates. A significant finding of our paper is that treating the parameters β and µ as timedependent increases model accuracy. Figure 1 shows (β, µ) for four countries (France, United Kingdom, Germany, and South Korea) generated by the Levenberg-Marquardt algorithm. From this, we can find the basic reproduction number, R 0 = β γ , with γ = 1/14 , which is useful to study the dynamics of the infectious class 30 . We compare real infection data from France, the United Kingdom, Germany, and South Korea with a SIRD model using constant β or time-dependent β . Figure 2 shows the difference between a β and µ constant, that we estimate using the Levemberg-Marquardt algorithm over one year, with β and µ estimated over just 1 week. The time-dependent model more accurately forecasts the infection rate over seven days across each country regardless of the time period. Therefore, it is necessary to consider β and µ as time-dependent variables. Accuracy of our model. To test the forecasting capability of the SIRD-LSTM combined network, we compare the number of predicted confirmed Covid-19 cases under various measures for within sample scenarios. The in-sample fit of the model is an essential indicator for the validity of the model's prediction of the parameters, whereas the out-of-sample forecasts can provide an important guideline for decision/policymakers. Figure 3 depicts the prediction of the time varying parameters (β, µ) compared with (β, µ) from the dataset. We randomly choose N T test data amongst 365 days, and make use of them as a test set. To measure the accuracy, we use the relative-L 2 errors of β , µ , S, I, R, and D such that where Y i true is the ith true dataset of β , µ , S, I, R, or D, and Y i pred is ith predicted values from our algorithm. We observe that the predicted and true parameters are close to each other. Table 1 demonstrates quantitative results on accuracy of our computation. Table 1 shows the relative L 2 error of β is between 3.13 × 10 −3 and 6.29 × 10 −2 , the relative L 2 error of µ is between 9.26 × 10 −2 and 1.73 × 10 −1 . The relative L 2 error with N T = 14 (2 weeks), of the compartments, S, I, R, and D, is also displayed in Table 1. Figure 4 depicts mobility, positive test rate, cumulative infectious individuals, and contact ratio β against the time. The positive test rate and cumulative infectious individuals follow similar trends as opposed to mobility and the positive test rate. The countries under consideration enforce lockdowns as cumulative infectious individuals increased. Hence, the trend plots reveal that greater mobility leads to an increase in infectious individuals. www.nature.com/scientificreports/ Out-of-sample forecast. We next conduct an out-of-sample forecast analysis of our SIRD-LSTM combined model. Figure 5 demonstrates a prediction of R 0 of each country using β generated by the LSTM networks. By forecasting β , in Fig. 6, we show a short-term prediction of the SIRD model up to 10 weeks. In the simulation, we assume that the positive test rate and mobility are the same as the final observation from the dataset. Both the SIRD and vaccinated SIRD models are computed and demonstrated in Fig. 6. In France, Germany, and South Korea, the depicted curves of the infections for the next 10 weeks are increasing, while the infection curve for the next 10 weeks tends to slightly decrease in the United Kingdom. In fact, it has been reported from various sources in May 2021 that the vaccination strategy and lockdowns in the United Kingdom were successful 31 .
Forecasting depending on mobility. Policymakers have sought to decrease the rate of infection in their populations by decreasing population mobility through lockdowns and, more recently, increasing vaccinations. Here, we model the effect of decreasing mobility and increasing vaccination rate on infection rate. If the mobility is increased by 30% of the normal mobility (baseline mobility), the model shows that the peak of infectious individuals increases drastically, see Fig. 7. The data show how visits to places are changing compared to the baseline. A baseline day represents a normal value for that day of the week. The baseline day is the median value www.nature.com/scientificreports/ from the 5 weeks Jan 3-Feb 6, 2020; for more information, see e.g. 32 . Figure 7 shows that in France, South Korea, and Germany, increased mobility results in a drastic change in the number of new Covid-19 cases. On the other hand, if mobility restrictions are decreased to 30% normal mobility, the model predicts that the peak of infectious individuals decreases compared to the baseline mobility.
Forecasting depending on the vaccination rate. In addition, with vaccination, the Covid-19 cases are noticeably decreasing for all of the countries under study in our work. The countries whose reproduction number ( R 0 = β/γ ) is close to 1 such as the United Kingdom and South Korea, have a better vaccination effect than the other countries. Figure 8 displays forecasting of infectious cases under various vaccination schedules within 70 days. In the experiment, we assume that the vaccine is evenly distributed with respect to time. The plots reveal that high vaccination rates are important in reducing the number of infectious cases. Figure 7 shows the models' forecast for infections with different mobility levels in each country. Given mobility information, the combined SIRD-LSTM model can predict the time-varying parameters (β, µ) . With those predicted parameters, the number of infectious individuals are implemented with or without vaccination. Based on the projected forecasts, we observe that a continuation of quarantine level mobility will result in low case counts.

Discussion
We introduced a novel algorithm that incorporates deep learning and compartmental models allowing for forecasts and evaluation of the current Covid-19 outbreak worldwide. We combined the SIRD model with the LSTM network and observed advantages of real-time forecasting and parameter estimation. The new algorithm integrates the forecasting accuracy of LSTM networks with the epidemiological model dynamics of the SIR-type model. Compared to the classical SIRD model in the literature, we forecast time-varying parameters predicted by the LSTM neural network. To forecast the parameters, mobility and positive test rate data are used in the architecture. We find that these inputs are important in improving the model's ability to fit the data. In addition, incorporating these data is essential for capturing behavioral changes by individuals in response to the pandemic as well as to observe the effect of policy decisions to increase vaccination and decrease mobility. As in other approaches, we conduct our research on publicly available datasets. We demonstrate how a new algorithm can be developed to better exploit quantitative measures in the fight against Covid-19. By utilizing reliable metrics and infection dynamics, we provide an approach that is deeply data-driven and computer-based. The proposed simulations can provide a tool for forecasting the effects of different mobility scenarios. Furthermore, as the proposed algorithm is compatible and generalizable, this allows for additional compartments in the SIR model or additional input datasets in the network which makes the method accessible to policymakers. Our developments point towards several extensions of great importance. In particular, we evaluated the impact of the imposition and relaxation of lockdown measures by inputting these changes into the LSTM neural network. We found that employing lockdown rules for each country can help to capture interesting regional dynamics of the Covid-19, and may give specific information to the policymakers. Another direction is to study    www.nature.com/scientificreports/ highly nonlinear capabilities of the neural network can be used to conduct inference on latent parameters of the SIR model.

Methodology
In this section, we explore our numerical method and prediction algorithm considered in this research. To begin, we describe the compartmental models, the SIRD equations, and the Runge-Kutta method. Then, we present the Levenberg-Marquardt algorithm. Lastly, we illustrate the combined SIRD-LSTM architecture which is the heart of our approach. We confirm that all methods were performed in accordance with the relevant guidelines and regulations.
Compartmental model: SIRD model. In this study, we represent the spread of Covid-19 using the susceptible-infected-recovered-dead (SIRD) model. Compartmental models have been used to simplify the mathematical modeling of infectious diseases 34,35 . One of the well-known (and simplest) models is the SIR model, and many models including SIRD are derivatives of this basic form [36][37][38] . The SIRD model predicts how a disease spreads, the total number infected, or the duration of an epidemic, and estimate important epidemiological parameters such as the reproductive number. Regarding the compartmental model, the population is assigned to compartments with labels: In addition, N is the total number of people in the area at time t with N = S(t) + I(t) + R(t) . The SIRD model is given by the following expressions 15 : where the parameter β , called the contact ratio, represents the effective contact rate, i.e. expected number of people infected by an infectious person, and γ is defined as recovery rate, i.e. expected number of people removed from the infected state. The ratio of β and γ is called as reproduction number, i.e. R 0 = β/γ . The reproduction number ( R 0 ) shows the average number of secondary infections coming from an infected person. The parameter µ is defined as a deceased rate. We assume that the recovered subjects are no longer susceptible to infection; the number of deaths due to other reasons is neglected. Further, the region under consideration is assumed to be isolated from other regions. This is a reasonable assumption as containment measures such as travel restriction have been enforced in most countries. By introducing the vaccination rate, the S(t) and R(t) terms can be modified for the vaccination model. We add the vaccination rate, ν , and the vaccine efficacy factor, ε , into our SIRD model to study an extended SIRD model with vaccination. For instance, ε = 0.95 for the Moderna and Pfizer vaccine 39 . More precisely, we introduce a multiplier factor δ = (1 − ε).
(2) www.nature.com/scientificreports/ We now write the following SIRD model which incorporates vaccination 28,29 With the SIRD model, we generate a deep neural network to predict β and µ . Subsequently, the SIRD with vaccination model provides the dynamics of the vaccination with predicted parameters β and µ. The contact rate, β = β(t) , and death rate, µ = µ(t) , of many acute infectious diseases varies significantly in time and frequently exhibits significant seasonal dependence 40,41 . Epidemiological models can be used to predict contact and death rate, which are important for measuring the spread of disease. A substantial body of research predicts the contact and death rate, β and µ , of infectious diseases via the discrete compartmental model [42][43][44] . The rest of this section introduces an algorithm to compute the time-dependent parameters directly from our data and the discrete SIRD model.

Levenberg-Marquardt algorithm.
To estimate the contact rate, β , and the death rate, µ , we use the Levenberg-Marquardt algorithm. To apply the algorithm, we solve the SIRD equations using a numerical approximation. In the present study, we use the fourth-order Runge-Kutta methods (RK4) which give the following discrete version of the SIRD model. For simplicity, we set then (2) can be recast Figure 8. Forecasting of of the number of Covid-19 infections for France, the United Kingdom, Germany, and South Korea under various vaccination schedules. Here, " 12% ", " 20% ", and " 38% " mean 12%, 20%, 38% of the population is vaccinated, respectively. www.nature.com/scientificreports/ The RK4 of (2) can be written as Given a dataset (y(t)) , using the Levenberg-Marquardt algorithm, we aim to find the parameters (β n , µ n ) := (β(t n ), µ(t n )) of the model curve with the least-squares curve-fitting 45 , We note that the Covid-19 dataset for each country is obtained from the Google mobility report 32 .
Neural network architecture. Long short term memory networks-so-called LSTM-are variants of recurrent neural network (RNN), capable of learning long-term dependencies. They were introduced by Hochreiter and Schmidhuber 46 , and are widely used in many fields such as time series prediction 47 , speech recognition 48 , and robot control 49 among many other applications.
Classic RNNs can keep track of arbitrary long-term dependencies in the input sequences. However, there is a computational drawback to the standard RNNs. In standard RNNs, this repeating module will have a very simple structure, such as a single layer. When training a classical RNN with back-propagation, the gradients which are back-propagated may tend to zero (vanish gradient problem) because the RNN remembers data for just a small duration of time. In other words, if we need the information after a small-time it may be reproducible, but once a lot of information is fed in, this information may get lost somewhere. This issue can be resolved by applying a variant of RNNs such as the LSTM network. The LSTMs are explicitly designed to avoid the long-term dependency problem as remembering information for long periods is practically their default behavior.
The compact forms of the LSTM with a forget gate can be described by the following system of equations: where x t is input vector, c t is a memory cell, and {i t , f t , o t } denote the input, forget, and output gates, respectively; for more details, see for instance 46,50,51 . Here, the operator • denotes the Hadamard product (element-wise product), and subscript t indexes the time step.
In the proposed neural network, we couple the SIRD model (2) and the LSTM network. By the Levenberg-Marquardt algorithm, predictions on β and µ are made by curve-fitting methods. With this, input data consists of x t = {S t , I t , R t , D t , p t , m t , β t , µ t } where p t is a positive rate (the percentage of all coronavirus tests performed that are actually positive) and m t is a mobility trend at time t obtained from Google's mobility report. The reports chart movement trends over time by geography, across different categories of places such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential. The parameters β t and µ t are predicted by the Levenberg-Marquardt algorithm. The output of the LSTM network is ( β t+1 , µ t+1 ) . When implementing cost functions, we apply a mean-squared forecasting error metric as well as mean-absolute percentage errors.
The network structure and activation of each hidden unit in the hidden layers are determined by the neurons in the previous layers. The activity of each layer is given by the nonlinear activation function σ such as a sigmoid function or ReLU function. The final output of the coupled model is obtained by combining the network output of confirmed cases with the SIR model forecast. More precisely, the collective dataset generated from the SIRD model is used as inputs for the LSTM whose outputs provide the parameters β and µ for the next time period. By predicting the parameters, we are able to solve the SIRD moded, which gives {S, I, R, D} for the next time period. The coupled models given in Fig. 9 illustrate the Neural LSTM-SIRD architecture. The network architecture we use is an LSTM with ReLU activation functions, and is trained by using Adam optimizer with a mean-squared error loss function. The model is not constrained to a particular setup and we could search over various hyperparameters to manipulate the number of neurons, with similar results.
(5) (β,μ) = arg min β,µ m i=1 y n+1,i − y n,i + h 6 (k 1 + 2k 2 + 2k 3 + k 4 ) 2 . (6) i t = σ g (W i x t + U i h t−1 + b i ), c t = σ c (W c x t + U c h t−1 + b c ), h t = o t • σ h (c t ), Figure 9. A description of the combined SIRD-LSTM model structure with Covid-19 community mobility (mobility) and positive test rate (Pos. Test Rate) to generate forecasts of time varying parameters (β, µ) . The ODE solver based on the Runge-Kutta fourth order method makes use of the predicted parameters in the numerical discretization.