Abstract
Coronavirus disease 2019 (COVID-19) has spread rapidly to countries all around the world since the end of 2019, with a great impact on global health and on many countries. Since there is still no effective treatment, effective predictions are essential so that relevant departments can respond and make arrangements in advance. With limited data, the prediction error of the LSTM model increases over time, and the model is prone to large bias in medium- and long-term prediction. To overcome this problem, our study proposes an LSTM-Markov model, which uses a Markov model to reduce the prediction error of the LSTM model. Based on confirmed case data in the US, Britain, Brazil and Russia, we calculated the training errors of the LSTM model and constructed the probability transition matrix of the Markov model from these errors. Finally, the prediction results were obtained by combining the output of the LSTM model with the prediction errors of the Markov model. The results show that, compared with the classical LSTM model, the average prediction error of LSTM-Markov is reduced by more than 75%, the RMSE is reduced by more than 60%, and the mean \({R}^{2}\) of LSTM-Markov is over 0.96. These indicators demonstrate that the proposed LSTM-Markov model predicts COVID-19 more accurately than the LSTM model.
Introduction
COVID-19 spread to countries around the world in a very short period and has had a huge impact on many of them. As of February 2021, more than 100 million people worldwide had been diagnosed and more than 2 million had died^{1}. Unlike other infectious diseases, COVID-19 has mutated. The first wave of the epidemic broke out around March 2020. After a series of measures, the epidemic was alleviated to some extent. From September 2020, the epidemic broke out again, this time combined with the influenza virus^{2}. In the second wave, the number of confirmed cases in European countries increased dramatically, which is a worrying situation. The number of diagnosed people in each country is shown in Fig. 1. Although a vaccine has been developed, there are still many problems with the rollout of vaccination^{3}, so we still need to minimize the spread of the disease through policies such as isolation, social distancing and mask wearing^{4}. Therefore, predicting the future trend of the epidemic remains extremely important for helping relevant departments and personnel to develop policies that control its spread and to plan the production of medical supplies.
In the field of infectious disease prediction, the main methods fall into three categories: statistics-based methods, deep learning methods and machine learning methods. Commonly used models include the SEIR model^{5}, the SVM model^{6}, the ARIMA model^{7} and the LSTM model^{8}. For example, the SIR epidemic model of Kermack and McKendrick has been used to predict the development of COVID-19; it assumes that the transmission rate and mortality rate of the disease are fixed during the study period, a hypothesis that does not suit COVID-19^{9}. Benvenuto et al. adopted a statistical method based on the autoregressive integrated moving average (ARIMA) model to make predictions. ARIMA is a linear model, which holds that there is a linear relationship between future and past phenomena. Even though the model performs well in short-term prediction, it does not apply to long-term prediction of COVID-19^{10}. Choi used the seasonal autoregressive integrated moving average (SARIMA) model to estimate the mortality of COVID-19^{11}. Gumaei et al. adopted a gradient boosting regression model to estimate the mortality of COVID-19; this model is an optimized combination of multiple weak regressors and can only predict a single variable^{12}. All of these are statistical methods.
Recently, more and more scholars have applied deep learning methods to prediction. For instance, Bandyopadhyay et al. used a gated recurrent network and the LSTM model to predict and estimate the numbers of COVID-19 diagnosed, dead and cured cases^{13}. Huang et al. used a deep learning method based on the convolutional neural network to predict the cumulative number of COVID-19 deaths^{14}. Zang et al.^{15} demonstrated that CNN–LSTM, LSTM and CNN models were more accurate than ANN and SVM models in the short-term forecasting of global horizontal irradiance (GHI). Bock et al.^{16} compared the performance of machine learning and deep learning models while changing the amount of input data. The results showed that the accuracy of deep learning models tends to increase as the amount of training data increases. These studies show that the prediction accuracy of the LSTM model increases with the amount of training data, that it can overcome the vanishing and exploding gradient problems, and that it has a good memory.
The purpose of this study is to develop a model that can accurately predict future epidemic trends over long periods based on historical case data. However, the LSTM model still has some problems when it comes to forecasting. (1) The LSTM model uses existing data to train its parameters; the parameters obtained with a large amount of data are accurate, but otherwise the model may not train well^{17}. (2) The LSTM model can only predict short-term rather than long-term data. Moreover, with limited data, the accuracy of the prediction results decreases as the prediction period lengthens^{18}. (3) The Forget Gate in the standard LSTM model tends to ignore and exclude relevant content in long-sequence tasks. The Forget Gate reduces the participation of the previous hidden state and gives priority to calculating the unit state from the input of the current state^{19}. These drawbacks limit the accuracy of predictions. Improvements to the model can be divided into two categories. One is to adopt small variants of the LSTM model, that is, to improve the structure of the model itself, including the Peephole connection^{20} and the Gated Recurrent Unit (GRU) model^{21}. The other is to combine the LSTM model with other models, typically the CNN-LSTM model^{22} and the SVM-LSTM model^{23}, to improve its prediction accuracy. These improvements all aim at improving the accuracy of the data input at the early stage of model training, so as to improve the prediction accuracy of the LSTM model. However, the decreasing accuracy of the LSTM model in long-term prediction remains. The Markov model is a probabilistic prediction model based on statistics: a probability transition matrix is constructed from the data available before prediction, and this matrix is used to predict the data^{24}.
The Markov model supports a detailed division of the data into states, so it can be used to correct the errors of other models, which makes up for the disadvantage that the errors of the LSTM model increase with time. In view of this, the Markov model is proposed to reduce the prediction error of the LSTM model for the number of daily confirmed cases, so as to improve the prediction performance of the LSTM model. This is the theoretical basis for combining the two models in this study.
The experimental results show that the combination of the LSTM and Markov models effectively improves the prediction accuracy of the epidemic trend, and the predictions are in line with reality, which gives the model guiding significance for actual epidemic prediction. The contributions of this paper are summarized below:

(1)
An LSTM model from deep learning was combined with a Markov model from statistics to predict the number of confirmed cases of COVID-19.

(2)
The prediction errors of our proposed method (LSTM-Markov) are much smaller than those of the LSTM model.

(3)
The LSTM-Markov model can improve the accuracy and precision of medium- and long-term trend prediction of COVID-19.
Methods
LSTM model
The LSTM model is an improvement on the Recurrent Neural Network (RNN) and has been widely used in many fields, such as text recognition^{25}, finance^{26} and industrial engineering^{27}. The LSTM consists of an input layer, an output layer and hidden layers. After the input data pass through the input layer, they enter the hidden layers. The hidden part is the most complex and may contain multiple layers. Each hidden layer of the LSTM consists of three gate units and one memory state unit. As the input information passes through the three gate units and the memory unit in turn, the useful information is stored in the memory unit and the invalid information is discarded, which makes it possible to predict the subsequent data. Each gate has a different function, and the detailed structure of the LSTM is shown in Fig. 2.
The function of each Gate in Fig. 2 can be described as follows:

(1)
Forget Gate The information first passes through the Forget Gate, which determines which information from the previous layer will be discarded and which will be retained in the current state. It can be expressed as follows:
$${f}_{t}=\sigma \left({W}_{f}\cdot \left[{h}_{t-1},{x}_{t} \right]+{b}_{f}\right).$$(1) 
(2)
Input Gate After the information enters, the data are updated. The Input Gate applies the \(sigmoid\) function to update the data and then determines which information to store in the memory cells. The specific formula is as follows:
$${i}_{t}=\sigma \left({W}_{i}\cdot \left[{h}_{t-1},{x}_{t} \right]+{b}_{i}\right).$$(2) 
(3)
Output Gate The Output Gate determines the output of the model and controls the proportion of the unit state \({C}_{t}\) that is output to the hidden layer elements of the current LSTM model. The initial output is obtained by the \(sigmoid\) activation function; the unit state is then scaled to the range −1 to 1 by the \({\text{tan}}h\) function and multiplied by the output of the \(sigmoid\) to obtain the result, which can be expressed as follows:
$${o}_{t}=\sigma \left({W}_{o}\cdot \left[{h}_{t-1},{x}_{t} \right]+{b}_{o} \right),$$(3)
$${h}_{t}={o}_{t}\cdot \text{tanh}\left({C}_{t}\right).$$(4) 
(4)
Memory Cell The line located at the top is the Memory Cell. It uses the \(tanh\) function to generate new candidate values, and then combines the input information of the Input Gate with the current state information to update the memory state. It determines the information currently stored and the information transmitted to the next step, so that historical information can be used to predict future data. The calculation formula is as follows:
$$\stackrel{\sim }{{C}_{t} }=\text{tanh}\left({W}_{C}\cdot \left[{h}_{t-1},{x}_{t} \right]+{b}_{C} \right).$$(5)
In the above formulas, \(\sigma \) represents the \(sigmoid\) function; \({W}_{f}, {W}_{i},{W}_{C},{W}_{o}\) represent the weights of the Forget Gate, the Input Gate, the Memory unit and the Output Gate, respectively; and \({b}_{f}, {b}_{i}, {b}_{C},{b}_{o}\) represent the corresponding biases. The weights and biases are all generated by a random initialization function. \({h}_{t-1}\) is the value of the hidden unit calculated at the previous time step, and \({x}_{t}\) is the input information at the current moment.
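As a minimal sketch of how Eqs. (1)–(5) fit together in one time step, the following pure-Python function evaluates the three gates and the candidate value on the concatenation \([{h}_{t-1},{x}_{t}]\). The cell-state update \({C}_{t}={f}_{t}\cdot {C}_{t-1}+{i}_{t}\cdot \stackrel{\sim }{{C}_{t}}\) is not written out in the text; it is assumed here in its standard LSTM form. The names `lstm_step`, `affine` and the `params` layout are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W·v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step.

    params maps "f", "i", "o", "c" to a (W, b) pair; each W acts on the
    concatenation [h_{t-1}, x_t], as in Eqs. (1)-(3) and (5).
    """
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]

    def gate(name, activation):
        W, b = params[name]
        return [activation(a) for a in affine(W, z, b)]

    f_t = gate("f", sigmoid)      # Eq. (1): Forget Gate
    i_t = gate("i", sigmoid)      # Eq. (2): Input Gate
    o_t = gate("o", sigmoid)      # Eq. (3): Output Gate
    c_hat = gate("c", math.tanh)  # Eq. (5): candidate values

    # Assumed standard cell-state update (not stated explicitly in the text).
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    # Eq. (4): hidden state from the Output Gate and the squashed cell state.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved at each step; this gives a quick sanity check of the wiring.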
Markov model
The Markov model is a statistical stochastic prediction model: predictions can be made simply by calculating the state transition matrix that corresponds to the evolution characteristics of the event itself^{28}. Markov models are often used for tasks such as compressing images^{29} and predicting the service time of buildings^{30}. The process of the Markov model is shown in Fig. 3, and its principles are described as follows:
Definition 1
Let \({X}_{1},{X}_{2},\cdots ,{X}_{n}\) be a discrete sequence of random variables, denoted {\({X}_{n}\)}. The set of all possible values of \({X}_{n}\) is called the state space of {\({X}_{n}\)}, denoted \(E=\){\({x}_{1},{x}_{2},\cdots ,{x}_{n}\)}. If, for any positive integer \(n\) and any \({x}_{{i}_{1}},{x}_{{i}_{2}},\ldots ,{x}_{{i}_{n+1}}\in E\) with \(P\)(\({X}_{1}={x}_{{i}_{1}},{X}_{2}={x}_{{i}_{2}}, \ldots ,{X}_{n}={x}_{{i}_{n}}\))\(>0\), we have
$$P\left({X}_{n+1}={x}_{{i}_{n+1}}|{X}_{1}={x}_{{i}_{1}},\ldots ,{X}_{n}={x}_{{i}_{n}}\right)=P\left({X}_{n+1}={x}_{{i}_{n+1}}|{X}_{n}={x}_{{i}_{n}}\right),$$
then {\({X}_{n}\)} is called a Markov chain.
Definition 2
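Assume that {\({X}_{n}\)} is a Markov chain. If, for any \({x}_{i},{x}_{j}\in E\) and any positive integer \(n\),
$$P\left({X}_{n+1}={x}_{j}|{X}_{n}={x}_{i}\right)={P}_{ij}$$
always holds, that is, the transition probability does not depend on \(n\), then {\({X}_{n}\)} is called a homogeneous Markov chain.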
Definition 3
If {\({X}_{n}\)} is a homogeneous Markov chain, then \(P\){\({X}_{n+k}={x}_{j}|{X}_{n}={x}_{i}\)} is called the k-step transition probability from the state \({x}_{i}\) to the state \({x}_{j}\) of {\({X}_{n}\)} and is denoted \({P}_{ij}\left(k\right).\) The matrix with \({P}_{ij}\left(k\right)\) as its elements is called the k-step transition matrix of {\({X}_{n}\)}, denoted \({P}_{k}\).
Definition 4
If, for every \(i\), the elements of the matrix \({({a}_{ij})}_{n\times n}\) satisfy \({a}_{ij}\ge 0\) and \(\sum_{j=1}^{n}{a}_{ij}=1\), then the matrix \({({a}_{ij})}_{n\times n}\) is called a stochastic matrix.
Definition 5
If each element \({a}_{ij}\left(n\right)\) of the matrix
$$A\left(n\right)=\left(\begin{array}{ccc}{a}_{11}\left(n\right)& \cdots & {a}_{1m}\left(n\right)\\ \vdots & \ddots & \vdots \\ {a}_{m1}\left(n\right)& \cdots & {a}_{mm}\left(n\right)\end{array}\right)$$
is the term of a sequence of numbers {\({a}_{ij}\left(n\right)\)}, then the matrix \(A\)(\(n\)) is called a sequence matrix. If, for every \(i,j=\text{1,2},\cdots ,m\), the limit \({a}_{ij}\) of the sequence {\({a}_{ij}\left(n\right)\)} exists as \(n\) tends to infinity, then \(A=\)(\({a}_{ij}\)) is called the limit of \(A\)(\(n\)).
According to Definition 2, if the limit matrix \(P\)(\(k\)) of the k-step transition matrix of a homogeneous Markov chain exists, then as the system continues to evolve, the transition probabilities between the final system states remain unchanged, the system shows statistical regularity, and it evolves into a stable system. All systems considered in this article have a finite number of states.
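The definitions above can be illustrated with a small numerical sketch. For a homogeneous chain with a finite state space, the k-step transition matrix equals the k-th power of the one-step matrix (the Chapman–Kolmogorov relation), each power is again a stochastic matrix, and for the example below the powers approach a limit matrix with identical rows. The two-state matrix `P` and the helper names `mat_mul`, `is_stochastic` and `k_step` are made up for illustration and are not taken from the experiments.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def is_stochastic(P, tol=1e-9):
    """Check Definition 4: non-negative entries and unit row sums."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1.0) <= tol for row in P
    )


def k_step(P, k):
    """k-step transition matrix of a homogeneous chain, computed as P^k."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, P)
    return result


# Hypothetical one-step transition matrix with two states.
P = [[0.9, 0.1],
     [0.5, 0.5]]

P2 = k_step(P, 2)        # two-step transition probabilities
P_limit = k_step(P, 60)  # both rows approach the stationary vector (5/6, 1/6)
```

Once the powers stop changing, the row of the limit matrix gives the long-run share of time spent in each state, which is the "stable system" behaviour described above.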
Proposed model
In this study, we used the Markov model to correct the prediction error of the LSTM model. From the literature, we know that the ADAM optimizer outperforms other optimizers^{31}. To avoid overfitting, we set the dropout to 0.02 and used one hidden layer^{32} with 4 nodes. Our experiment is as follows. First, the LSTM model was trained with the confirmed COVID-19 cases of four countries. Then, the difference between the number of confirmed cases predicted by the LSTM and the actual number of confirmed cases was calculated and taken as the input data of the Markov model to calculate the probability transition matrix. Finally, the LSTM model was used to predict the cumulative number of confirmed cases, and the Markov model was used to correct the error of the prediction, so as to obtain the final forecasting results. The experimental process of our proposed method is shown in Fig. 4.
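The correction step can be sketched as follows. The text does not specify how the LSTM errors are divided into states, so the equal-width intervals, the use of interval midpoints as representative errors, the sign convention (error = actual − predicted) and the function names `assign_states`, `transition_matrix` and `correct_forecast` are all illustrative assumptions rather than the configuration used in the experiments.

```python
def assign_states(errors, n_states):
    """Map each error to one of n_states equal-width intervals (assumed scheme)."""
    lo, hi = min(errors), max(errors)
    width = (hi - lo) / n_states or 1.0  # guard against a constant error series
    states = [min(int((e - lo) / width), n_states - 1) for e in errors]
    centers = [lo + (s + 0.5) * width for s in range(n_states)]
    return states, centers


def transition_matrix(states, n_states):
    """Estimate the one-step transition matrix from consecutive state pairs."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    P = []
    for s, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # No observed exit from this state: assume it persists (hypothetical choice).
            P.append([1.0 if j == s else 0.0 for j in range(n_states)])
        else:
            P.append([c / total for c in row])
    return P


def correct_forecast(lstm_forecast, last_state, P, centers):
    """Add the expected Markov error to each LSTM forecast value."""
    n = len(P)
    dist = [1.0 if s == last_state else 0.0 for s in range(n)]
    corrected = []
    for y_hat in lstm_forecast:
        # Propagate the state distribution one step: dist <- dist · P.
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
        expected_error = sum(p * c for p, c in zip(dist, centers))
        corrected.append(y_hat + expected_error)
    return corrected


# Toy training errors (actual minus LSTM prediction) and a two-day LSTM forecast.
train_errors = [1.0, 3.0, 1.0, 3.0, 1.0, 3.0]
states, centers = assign_states(train_errors, n_states=2)
P = transition_matrix(states, n_states=2)
corrected = correct_forecast([100.0, 110.0], states[-1], P, centers)
```

Propagating the whole state distribution, rather than only the most likely state, keeps the correction defined even when several transitions are equally probable; other choices are possible.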
Experiment and discussion
Data source
The statistics used in this study were collected by Johns Hopkins University^{33} and cover four countries, the United States, Britain, Brazil and Russia, from March 1, 2020 to December 31, 2020. We extracted date and death data for these countries from the repository. These four countries are among the most seriously affected by the epidemic and have the most confirmed cases in the world. Most importantly, their curves are smooth, with no temporary surges in the middle. The numbers of cases in these countries have also been increasing, so it makes sense for us to make predictions.
Data processing
In this study, the LSTM and LSTM-Markov models were applied to understand the future transmission dynamics of COVID-19. The experiments were conducted with open-source libraries such as NumPy, Pandas and TensorFlow. Python, a high-level general-purpose programming language, was used to interact with the deep learning libraries through their application programming interfaces (APIs). These APIs were used to design the model structure for the above neural network variants.
Firstly, we divided the case data into four groups by country. The data set for each country was considered as a time series. According to the statistical method, the data falling outside the interval (\(\upmu -3\upsigma ,\upmu +3\upsigma \)) of each data series are regarded as outliers^{34}. No outliers were found in the four datasets. Then, the data were normalized according to the following formula:
$${x}^{*}=\frac{x-min}{max-min},$$
where \(x\) is the original value, \({x}^{*}\) is the normalized value, \(min\) is the minimum value of the data and \(max\) is the maximum value of the data.
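A minimal sketch of this preprocessing step is given below, assuming the usual three-sigma rule and min-max scaling to the interval [0, 1]. Whether the population or the sample standard deviation was used is not stated, so the population form is an assumption, and the function names are hypothetical.

```python
import statistics


def three_sigma_outliers(series):
    """Return the values lying outside (mu - 3*sigma, mu + 3*sigma)."""
    mu = statistics.mean(series)
    sigma = statistics.pstdev(series)  # population standard deviation (assumed)
    return [x for x in series if x < mu - 3 * sigma or x > mu + 3 * sigma]


def min_max_normalize(series):
    """Scale a series to [0, 1] using its own minimum and maximum."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("a constant series cannot be min-max normalized")
    return [(x - lo) / (hi - lo) for x in series]


def denormalize(scaled, lo, hi):
    """Map normalized values back to the original scale."""
    return [s * (hi - lo) + lo for s in scaled]
```

The inverse mapping is needed after prediction, because the model outputs are on the normalized scale while case counts are reported on the original scale.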
Secondly, each set of data was divided into two parts: 70% of the data was used for training the parameters of the LSTM-Markov model, and the rest was used for testing and prediction. The number of test days is about 100.
Thirdly, we set the optimal model parameters. From the literature, we know that the ADAM optimizer outperforms other optimizers, so we chose ADAM as the model optimizer. We initially determined the range of the input time step^{35}; then, by trial and error, we chose the best window size and assigned each country its best time step. The prediction effects of different parameters are shown in Tables 1 and 2. In the end, the time steps of the US, Britain, Brazil and Russia were set to 9, 7, 10 and 7 days, respectively. That is, in the US, confirmed cases in the first 9 days were used to predict cases on the 10th day; in Britain and Russia, confirmed cases in the first 7 days were used to predict cases on the 8th day; and in Brazil, the number of input days is 10. For the epochs, as shown in Fig. 5, the loss converges to its minimum when the number of epochs is 50, so 50 epochs is appropriate. With the optimal parameters, the resulting model is also optimal in weights and biases. Tables 1, 2 and 3 show the setting of model parameters in the four countries:
Finally, the trained LSTM model and the LSTM-Markov model were each used to predict the number of daily confirmed cases in each country up to February 20, 2021.
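The 70/30 split and the time-step windows described above can be sketched as follows. The split is assumed to be chronological (earliest 70% for training), which is the usual choice for time series but is not stated explicitly; the function names are hypothetical.

```python
def chronological_split(series, train_fraction=0.7):
    """Split a time series into an earlier training part and a later test part."""
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]


def make_windows(series, time_step):
    """Pair each run of `time_step` consecutive values with the value that follows."""
    X, y = [], []
    for start in range(len(series) - time_step):
        X.append(series[start:start + time_step])
        y.append(series[start + time_step])
    return X, y


# Toy series standing in for cumulative confirmed cases.
cases = list(range(1, 13))
train, test = chronological_split(cases)
# With a 9-day time step (the US setting), days 1-9 are the input for day 10.
X, y = make_windows(cases, time_step=9)
```

A series of length n yields n minus `time_step` windows, so a longer time step leaves fewer training samples; this is one reason the best window differs between countries.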
Assessment indicator
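There are errors between the predicted data and the actual data. In this paper, the RMSE (root-mean-square error) was used to evaluate the degree of dispersion of the error. To evaluate the fitting degree of the models, we chose the \({R}^{2}\) (R-squared) index, and we used the error rate to evaluate the accuracy of the prediction. They are defined as follows:
$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}},$$
$${R}^{2}=1-\frac{\sum_{i=1}^{n}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}}{\sum_{i=1}^{n}{\left({y}_{i}-\overline{y }\right)}^{2}},$$
$$error\, rate=\frac{\left|y-\widehat{y}\right|}{y},$$
where \(y\) is the true value, \(\widehat{y}\) is the predicted value, \(\overline{y }\) is the mean of the true values and \(n\) is the number of values.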
The root mean square error (RMSE) of the LSTM model and of the improved model proposed in this paper were compared to determine whether the prediction accuracy was improved^{36,37}. The smaller the RMSE, the better the performance. \({R}^{2}\) was used to evaluate the fitting degree of the two models^{38}: the closer to 1, the better the model fits. The \(error rate\) was used to estimate the accuracy of prediction: the closer to 0, the more accurate.
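The three indicators can be computed as in the sketch below, assuming the standard definitions of RMSE and \({R}^{2}\) and taking the error rate to be the mean relative absolute error; the last of these is an assumption about the intended definition, and the function names are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between true and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def error_rate(y_true, y_pred):
    """Mean relative absolute error (assumed definition of the error rate)."""
    return sum(abs(y - p) / abs(y) for y, p in zip(y_true, y_pred)) / len(y_true)


# Toy values: true and predicted cumulative counts.
y_true = [100.0, 200.0, 300.0]
y_pred = [110.0, 190.0, 300.0]
scores = (rmse(y_true, y_pred), r_squared(y_true, y_pred), error_rate(y_true, y_pred))
```

Because cumulative case counts differ by orders of magnitude between countries, the relative error rate is easier to compare across countries than the RMSE, which is on the scale of the counts themselves.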
Experimental results
In this paper, the LSTM model and the proposed LSTM-Markov model were each applied to predict the daily total number of COVID-19 infected cases in the four countries mentioned above. The results are shown in Fig. 6.
As can be seen from Fig. 6, the curves keep rising as time goes on and rise steeply after October 2020, which implies that the situation became more severe in October. We predicted that by January 2021 the number of cases in Britain would stabilize at 3.5 million, after which its epidemic would be brought under control. In the US and Russia, the number of daily confirmed cases would increase further, but the curves were starting to flatten and growth would slow down around February. Brazilian cases, by contrast, would continue to increase rapidly, with no signs of slowing down; we predicted that more than 8 million people would be infected by February 2021.
In addition, the prediction errors of the LSTM model and the LSTM-Markov model were calculated and compared, as shown in Fig. 7.
According to Fig. 7, the prediction errors of the LSTM model increase very fast, and they increase fastest at about 30 days. In the US, the prediction errors of the LSTM-Markov model are always smaller than those of the LSTM model. In the other countries, the errors of the LSTM-Markov model are slightly larger than those of the LSTM model in the initial stage, but far smaller in the middle and late stages. By February 2021, the errors of the LSTM-Markov model are smaller than those of the LSTM model by 4 million in the US, 1 million in Britain and Brazil, and 40,000 in Russia. The result indicates that the proposed LSTM-Markov model greatly reduces the prediction error of the LSTM model.
We calculated the RMSE and \({R}^{2}\) of the LSTM model and the LSTM-Markov model, which are shown in Fig. 8.
To verify the effectiveness of our proposed method, the cumulative numbers of infected cases predicted by the two models for December 5, 2020, January 5, 2021 and February 5, 2021 were compared with the real values, as shown in Table 4 and Fig. 9.
As can be seen from panel a of Fig. 8, in the US, Britain, Brazil and Russia, the \({R}^{2}\) values of LSTM-Markov are 0.96, 0.94, 0.97 and 0.98, with an average greater than 0.96 and close to 1; all are larger than those of the LSTM model. The proposed model therefore fits better than the LSTM model. From panel b of Fig. 8, the RMSE of the LSTM-Markov model is nearly 40% of that of the LSTM model, which shows that the LSTM-Markov model greatly improves forecasting precision. According to Fig. 9, compared with the number of reported cases, the average LSTM-Markov error rates for the US, Britain, Brazil and Russia were 0.040, 0.044, 0.032 and 0.037, respectively. Its average prediction error rate was 0.038, whereas the average error rate of the LSTM model was 0.152. As a result, the error was reduced by more than 75% relative to the LSTM model, and the accuracy was improved by 60%. Both the short-term and long-term prediction error rates of the LSTM-Markov model are lower than those of the LSTM model.
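The averages quoted above can be checked directly from the per-country values. Using the rounded figures reported in the text (the unrounded values are not available here, so the last digit may differ slightly):

$$\overline{{R}^{2}}=\frac{0.96+0.94+0.97+0.98}{4}=0.9625,$$
$$\overline{error\, rate}=\frac{0.040+0.044+0.032+0.037}{4}=0.03825\approx 0.038,$$
$$\frac{0.152-0.038}{0.152}=0.75.$$

With the rounded inputs, the relative reduction in the average error rate therefore comes out at about 75%.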
Discussion
As can be seen from Fig. 6, the prediction curve of our proposed model follows the same trend as the actual curve and is closer to it than the prediction curve of the LSTM model. We predicted that the number of cases would continue to increase in these countries; in Britain, the curve of cumulative confirmed cases would gradually flatten in January 2021 and the number of cases would stabilize at about 3.5 million, so the epidemic would be brought under control. The number of cases would continue to increase in the US, Russia and Brazil, and Brazil’s growth would not slow. Figs. 7 and 9 and Table 2 show that the prediction error curve of the LSTM-Markov model is much lower than that of the LSTM model. The average error rate of the LSTM model is 0.152, while that of the LSTM-Markov model is 0.038. Both the short-term and long-term prediction error rates of the LSTM-Markov model are smaller than those of the LSTM model. Fig. 8 shows, through the \({R}^{2}\) and RMSE values, that the prediction accuracy of the LSTM-Markov model is much higher than that of the LSTM model.
After the new president of the US took office, he paid special attention to epidemic prevention. He signed an executive order requiring the nation to wear masks and issued a quarantine order. He announced that the national strategy would be driven by scientists and public health experts, who would communicate directly with the public^{39}. The United States recently began to lift the lockdown gradually and had distributed nearly 4 million vaccines across the country by February 2021. The vaccine acceptance rate in the US is 56.9%^{40}. Fig. 6 also shows that the number of people diagnosed in the United States increased rapidly in January and gradually slowed at the end of January, indicating the effectiveness of the U.S. policy. The British government has also taken many measures to control the epidemic. The National Health Service (NHS) handed out £4.2 million in December 2020 to vaccinate the groups most in need and reduce vaccine inequality^{41}. The UK has also committed to rolling out vaccines as a top priority for care home residents and staff. Since the new year, the delivery system in England has comprised the original hospital hubs and primary care services, now supplemented by mass vaccination centers and community pharmacy services. By the end of January, more than 300,000 vaccinations were being given each day^{42}. In conclusion, our experimental results show that in February the number of diagnoses gradually slowed in both countries and the epidemic was brought under control, which is consistent with what we predicted.
The Russian government did not pay enough attention to COVID-19 in the early days, leading to a rapid outbreak. Later, because of the abolition of unprofitable hospitals, polyclinics and infectious disease beds, the shortage of doctors and the heavy workload of medical institutions, the number of confirmed cases in Russia will continue to increase for some time to come^{43}. In Brazil, the governmental response to COVID-19 has been marked by a lack of leadership at the federal level, distrust of science, denial of the importance of the virus and progressive cuts to health and research funding. There are racial and gender differences in the fight against the novel coronavirus^{44}. Brazil has by far the worst outbreak, and its number of confirmed cases is still rising dramatically, which our experimental results also indicate.
Summary
COVID-19 has been declared a global pandemic and has drawn great attention from countries all over the world. This study proposes an LSTM model combined with a Markov model (LSTM-Markov) to address the prediction deviation of the traditional LSTM model. First, the model was trained on confirmed case data from four countries: the US, Britain, Brazil and Russia. Then, the number of confirmed cases in each country up to February 20, 2021 was predicted by using the Markov model to correct the LSTM model. Finally, \({R}^{2}\), RMSE and \(error\, rate\) were used to evaluate the effectiveness of our proposed model.
We predicted that the number of cases would stabilize and the epidemic would be brought under control in Britain by February 2021, while the number of cases would continue to rise in the US, Brazil and Russia. The results show that the prediction curve of the proposed LSTM-Markov model is closer to the real epidemic curve: the mean RMSE is only 40% of that of the LSTM model, the \({\text{R}}^{2}\) values are all close to 1, and the average error is reduced by more than 75%. Thus, the forecasting accuracy of LSTM-Markov is far higher than that of the LSTM model. A comparison of the \(error \,rate\) of the two models likewise shows that the LSTM-Markov model predicts better. Compared with other research results^{45,46,47}, our improvement of the LSTM model is also better. In conclusion, the LSTM-Markov model can predict confirmed cases effectively, and the predicted results can provide help and reference for government decision-making in formulating relevant measures, which gives them practical significance.
Threads
However, this method still has some shortcomings. We did not experiment with more countries to see whether the model works for all countries. Later, if possible, we will apply the model to other countries to improve it. In addition, the only influencing factor considered is the number of confirmed cases; factors such as gender, age, occupation or location are not included. In the future, we will continue to improve the model by adding a variety of influencing factors to further improve the accuracy of prediction.
Data availability
The datasets generated during and/or analyzed during the current study are available in the GitHub repository [https://github.com/CSSEGISandData/COVID-19].
References
 1.
Su, C.-M., Wang, L. & Yoo, D. Activation of NF-κB and induction of proinflammatory cytokine expressions mediated by ORF7a protein of SARS-CoV-2. Sci. Rep. 11, 1–12 (2021).
 2.
Engelbrecht, F. A. & Scholes, R. J. Test for Covid-19 seasonality and the risk of second waves. One Health 12, 100202 (2021).
 3.
French, J., Deshpande, S., Evans, W. & Obregon, R. Key guidelines in developing a pre-emptive COVID-19 vaccination uptake promotion strategy. Int. J. Environ. Res. Public Health 17, 5893 (2020).
 4.
Jia, J. S. et al. Population flow drives spatio-temporal distribution of COVID-19 in China. Nature 582, 389–394 (2020).
 5.
Cooper, I., Mondal, A. & Antonopoulos, C. G. A SIR model assumption for the spread of COVID-19 in different communities. Chaos Solitons Fractals 139, 110057 (2020).
 6.
Singh, V. et al. Prediction of COVID-19 corona virus pandemic based on time series data using Support Vector Machine. J. Discr. Math. Sci. Cryptogr. 23, 1583–1597 (2020).
 7.
Aslam, M. Using the Kalman filter with ARIMA for the COVID-19 pandemic dataset of Pakistan. Data Brief 31, 105854 (2020).
 8.
Wang, P., Zheng, X., Ai, G., Liu, D. & Zhu, B. Time series prediction for the epidemic trends of COVID-19 using the improved LSTM deep learning method: Case studies in Russia, Peru and Iran. Chaos Solitons Fractals 140, 110214 (2020).
 9.
Kermack, W. O. & McKendrick, A. G. A contribution to the mathematical theory of epidemics. Proc. R. Soc. Lond. Ser. A 115, 700–721 (1927).
 10.
Benvenuto, D., Giovanetti, M., Vassallo, L., Angeletti, S. & Ciccozzi, M. Application of the ARIMA model on the COVID-2019 epidemic dataset. Data Brief 29, 105340 (2020).
 11.
Choi, K. & Thacker, S. B. Mortality during influenza epidemics in the United States, 1967–1978. Am. J. Public Health 72, 1280–1283 (1982).
 12.
Gumaei, A. et al. Prediction of COVID-19 confirmed cases using gradient boosting regression method. Comput. Mater. Continua 66, 315 (2021).
 13.
Bandyopadhyay, S. K. & Dutta, S. Machine learning approach for confirmation of covid-19 cases: Positive, negative, death and release. MedRxiv. https://doi.org/10.5281/zenodo.3822623 (2020).
 14.
Li, D., Huang, G., Zhang, G. & Wang, J. Driving factors of total carbon emissions from the construction industry in Jiangsu Province, China. J. Clean. Prod. 276, 123179 (2020).
 15.
Zang, H. et al. Short-term global horizontal irradiance forecasting based on a hybrid CNN-LSTM model with spatiotemporal correlations. Renew. Energy 160, 26–41 (2020).
 16.
Bock, S. & Weiß, M. 2019 International Joint Conference on Neural Networks (IJCNN), 1–8 (IEEE).
 17.
Zhang, L. et al. 2020 15th IEEE Conference on Industrial Electronics and Applications (ICIEA), 1848–1853 (IEEE).
 18.
Liang, Y., Li, W., Lou, P. & Hu, J. Thermal error prediction for heavy-duty CNC machines enabled by long short-term memory networks and fog-cloud architecture. J. Manuf. Syst. https://doi.org/10.1016/j.jmsy.2020.10.008 (2020).
 19.
Fanta, H., Shao, Z. & Ma, L. SiTGRU: Single-tunnelled gated recurrent unit for abnormality detection. Inf. Sci. 524, 15–32 (2020).
 20.
Rahman, M. & Siddiqui, F. H. An optimized abstractive text summarization model using peephole convolutional LSTM. Symmetry 11, 1290 (2019).
 21.
Dey, R. & Salem, F. M. 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), 1597–1600 (IEEE).
 22.
Vidal, A. & Kristjanpoller, W. Gold volatility prediction using a CNN-LSTM approach. Expert Syst. Appl. 157, 113481 (2020).
 23.
Moradzadeh, A., Pourhossein, K., Mohammadi-Ivatloo, B., Khalili, T. & Bidram, A. 2021 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), 1–5 (IEEE).
 24.
Büyükşahin, Ü. Ç. & Ertekin, Ş. Improving forecasting accuracy of time series data using a new ARIMA-ANN hybrid method and empirical mode decomposition. Neurocomputing 361, 151–163 (2019).
 25.
Su, M.-H., Wu, C.-H., Huang, K.-Y. & Hong, Q.-B. 2018 First Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia), 1–6 (IEEE).
 26.
Wang, X., Xie, X., Chen, Y. & Zhao, B. A machine learning approach to forecasting carry trade returns. Appl. Econom. Lett. https://doi.org/10.1080/13504851.2021.1918624 (2021).
 27.
Zollanvari, A., Kunanbayev, K., Bitaghsir, S. A. & Bagheri, M. Transformer fault prognosis using deep recurrent neural network over vibration signals. IEEE Trans. Instrum. Meas. 70, 1–11 (2020).
 28.
Sobaszek, Ł., Gola, A. & Kozłowski, E. Predictive scheduling with Markov chains and ARIMA models. Appl. Sci. 10, 6121 (2020).
 29.
Wang, C., Feng, Y., Li, T., Xie, H. & Kwon, G.-R. A new encryption-then-compression scheme on gray images using the Markov random field. Comput. Mater. Continua 56, 107–121 (2018).
 30.
Ullah, I., Ahmad, R. & Kim, D. A prediction mechanism of energy consumption in residential buildings using hidden Markov model. Energies 11, 358 (2018).
 31.
Jiang, S. & Chen, Y. Pacific Rim Conference on Multimedia 743–753 (Springer, 2018).
 32.
Rice, L., Wong, E. & Kolter, Z. International Conference on Machine Learning 8093–8104 (PMLR, 2020).
 33.
Dong, E., Du, H. & Gardner, L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect. Dis. 20, 533–534 (2020).
 34.
Catoni, O. Statistical Learning Theory and Stochastic Optimization: Ecole d’Eté de Probabilités de Saint-Flour, XXXI-2001 Vol. 1851 (Springer, 2004).
 35.
Nabi, K. N., Tahmid, M. T., Rafi, A., Kader, M. E. & Haider, M. A. Forecasting COVID-19 cases: A comparative analysis between recurrent and convolutional neural networks. Results Phys. 24, 104137 (2021).
 36.
Montoye, A. H., Begum, M., Henning, Z. & Pfeiffer, K. A. Comparison of linear and nonlinear models for predicting energy expenditure from raw accelerometer data. Physiol. Meas. 38, 343 (2017).
 37.
Dhamodharavadhani, S., Rathipriya, R. & Chatterjee, J. M. COVID-19 mortality rate prediction for India using statistical neural network models. Front. Public Health. https://doi.org/10.3389/fpubh.2020.00441 (2020).
 38.
Athab, N. A. An analytical study of cervical spine pain according to the mechanical indicators of the administrative work staff. Indian J. Public Health 10, 1349 (2019).
 39.
Tanne, J. H. Covid-19: Biden launches national plan based on “science and public health alone”. BMJ 372, n210, https://doi.org/10.1136/bmj.n210 (2021).
 40.
Sallam, M. COVID-19 vaccine hesitancy worldwide: A concise systematic review of vaccine acceptance rates. Vaccines 9, 160 (2021).
 41.
Iacobucci, G. Covid-19: NHS England pledges extra funding to local areas to reduce vaccine inequalities. BMJ, n580, https://doi.org/10.1136/bmj.n580 (2021).
 42.
Sim, F. Early Covid-19 vaccination rollout: A commentary from England. Isr. J. Health Policy Res. 10, 1–4 (2021).
 43.
Velikorossov, V., Maksimov, M. & Prodanova, N. On the evaluation of the effectiveness of states’ measures to overcome the Covid-19 crisis: statistics and common sense. repository 10, https://repository.mruni.eu/handle/007/17158 (2021).
 44.
Ribeiro, K. B., Ribeiro, A. F., de Sousa Mascena Veras, M. A. & de Castro, M. C. Social inequalities and COVID-19 mortality in the city of São Paulo, Brazil. Int. J. Epidemiol. 50, 732 (2021).
 45.
Yan, B. et al. An improved method for the fitting and prediction of the number of COVID-19 confirmed cases based on LSTM. Preprint at http://arXiv.org/2005.03446 (2020).
 46.
Aktar, S. et al. Machine learning approach to predicting COVID-19 disease severity based on clinical blood test data: Statistical analysis and model development. JMIR Med. Inf. 9, e25884 (2021).
 47.
Jörges, C., Berkenbrink, C. & Stumpe, B. Prediction and reconstruction of ocean wave heights based on bathymetric data using LSTM neural networks. Ocean Eng. 232, 109046 (2021).
Funding
This research was supported by the Fundamental Research Funds for the Central Universities under Grant Nos. 2652020002 and 2652020004.
Author information
Contributions
X.Z. and H.L. provided the initial idea and research plan. R.M. collected receipts, calculated results and wrote the first draft of the manuscript. P.W. participated in the model design and the analysis of the calculation results. C.Z. participated in the design of the calculation plan and the supplementary experiments. All authors participated in the analysis and discussion of the results and in the revision of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ma, R., Zheng, X., Wang, P. et al. The prediction and analysis of COVID-19 epidemic trend by combining LSTM and Markov method. Sci Rep 11, 17421 (2021). https://doi.org/10.1038/s41598-021-97037-5