Dual-attention-based recurrent neural network for hand-foot-mouth disease prediction in Korea

Hand–foot–mouth disease (HFMD) is a viral disease that occurs primarily in children. In Korea, meteorological factors have a significant impact on its annual prevalence. This study proposes a new HFMD prediction model using a dual-attention-based recurrent neural network (DA-RNN) and identifies the weather factors that are important for HFMD in Korea. First, the DA-RNN was used to predict suspected HFMD cases in each state from meteorological factors. Second, a sensitivity analysis was conducted after dividing the weather factors into six categories: temperature, wind, rainfall, day length, humidity, and air pollution. In predicting the weekly number of suspected HFMD cases, the proposed model outperformed the other RNN methods. The sensitivity analysis showed that air pollution and rainfall play an important role in HFMD in Korea. This model provides information for HFMD prevention and control and can be extended to predict other infectious diseases.


Sensitivity analysis
The sensitivity test results are shown in Fig. 3. The sensitivity analysis compared the evaluation scores (MAE, RMSE, MAPE, and R²) obtained when each group of meteorological factors was removed in turn, as shown in Table S2.
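The group-wise removal amounts to a leave-one-group-out loop over the feature columns. The sketch below assumes a NumPy feature matrix whose column names map onto the six groups. `ablate_groups` and the `evaluate` callable (which would retrain the model on the reduced features and return its MAE, RMSE, MAPE, and R²) are hypothetical names used for illustration, not the authors' code.

```python
from typing import Callable, Dict, List

import numpy as np


def ablate_groups(
    X: np.ndarray,
    feature_names: List[str],
    groups: Dict[str, List[str]],
    evaluate: Callable[[np.ndarray], Dict[str, float]],
) -> Dict[str, Dict[str, float]]:
    """Re-evaluate the model once per group, with that group's columns removed.

    `evaluate` is a hypothetical callable that trains and tests a model on the
    given feature matrix and returns its scores (e.g. MAE, RMSE, MAPE, R2).
    """
    missing = {m for members in groups.values() for m in members} - set(feature_names)
    if missing:
        raise KeyError(f"unknown factor names: {sorted(missing)}")
    results = {}
    for group, members in groups.items():
        # Keep every column whose name is not in the removed group.
        keep = [i for i, name in enumerate(feature_names) if name not in members]
        results[group] = evaluate(X[:, keep])
    return results
```

Each entry of the returned dictionary corresponds to one row of a table such as Table S2: the scores obtained without that group.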
To identify the important meteorological groups objectively from the evaluation scores of the sensitivity experiment, the scores were converted into defined importance values and plotted as a spider map (Fig. 3A). The importance score was calculated from the area of this spider map: the larger the value, the greater the importance (Fig. 3B).
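The exact area formula is not given here. One standard way to compute the area enclosed by a spider (radar) chart with n equally spaced axes is to sum the n triangles formed by the origin and each pair of adjacent points, giving A = ½ sin(2π/n) Σᵢ rᵢ rᵢ₊₁ (indices taken cyclically). The sketch below implements that formula. `radar_area` is a hypothetical helper, and the way the four metrics are turned into importance values is not specified in the text.

```python
import numpy as np


def radar_area(values) -> float:
    """Area of the polygon drawn on a radar (spider) chart.

    The n values are plotted on n equally spaced axes. The origin and each
    pair of adjacent points form a triangle of area 0.5 * r_i * r_{i+1} * sin(2*pi/n).
    """
    r = np.asarray(values, dtype=float)
    n = r.size
    if n < 3:
        raise ValueError("a radar chart needs at least three axes")
    # np.roll(r, -1) pairs each r_i with r_{i+1}, wrapping the last to the first.
    return 0.5 * np.sin(2 * np.pi / n) * float(np.sum(r * np.roll(r, -1)))
```

With four axes (for example MAE, RMSE, MAPE, and R² importance values), the area depends on the order of the axes, so the same axis order must be used for every group when comparing scores.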
After removing the five elements of the temperature group, the importance score was 0.0892. With the two elements of the wind group removed, the importance score was 0.6455; with the two elements of the rainfall group removed, 0.891; with the two elements of the day-length group removed, 0; with the two elements of the humidity group removed, 0.5624; and with the five elements of the air pollution group removed, 0.6485. The importance scores for each group are shown in Fig. 3B, and Supplementary Fig. S2 shows the detailed importance score results of the sensitivity analysis.
Because of its structure, DA-RNN calculates the attention scores of the features at every time step, which makes it possible to track changes in the importance of the meteorological factors. Figure 3C shows these changes during the test period. Although the scores vary over time, the rainfall and air pollution groups have high importance scores throughout the period.
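Tracking group importance over time requires collapsing the per-factor encoder attention weights into per-group weights. The sketch below assumes an array of attention weights with one row per test-period time step, for example the last-step weights of each forecast window, and sums the weights of the factors in each group. `group_attention` and `rank_groups` are hypothetical helpers.

```python
import numpy as np


def group_attention(alpha, groups):
    """Aggregate per-factor attention weights into per-group weights over time.

    alpha: (time_steps, n_factors) encoder attention weights; each row sums to 1.
    groups: {group_name: [column indices of its factors]}.
    Returns {group_name: (time_steps,) summed weights}.
    """
    alpha = np.asarray(alpha, dtype=float)
    return {name: alpha[:, idx].sum(axis=1) for name, idx in groups.items()}


def rank_groups(group_weights):
    """Order groups by their mean weight over the whole period (highest first)."""
    means = {name: float(w.mean()) for name, w in group_weights.items()}
    return sorted(means, key=means.get, reverse=True)
```

Summing favors groups with more members (temperature and air pollution each have five factors here). Averaging within each group is an alternative if a size-neutral comparison is wanted.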

Method
Figure 4 summarizes the HFMD prediction workflow using DA-RNN. After data collection and normalization, DA-RNN was trained with the weekly HFMD data as the regression target. The model predicts weekly suspected HFMD cases per 1,000 people and identifies the important meteorological factor groups.

Data collection
HFMD data for 2011 to 2020, given as the number of suspected cases per 1,000 people per state, were obtained from the Korea Centers for Disease Control and Prevention 19. For meteorological factors, 15 types of data on temperature, wind speed, precipitation, day length, and humidity from 2011 to 2020 were collected from the Korea Meteorological Data Open Portal 20. Five types of air pollution data for 2011 to 2020 were collected through AirKorea 21.

Data normalization
Normalization pre-processing is essential when features differ in scale. Min-max normalization rescales each feature to the range between zero and one 22. The min-max normalization equation is as follows:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the feature.
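The min-max equation can be sketched in NumPy as shown below. The text does not say which period the minimum and maximum come from. The sketch assumes they are computed on the training period only, so that no test-period information leaks into training. Under that assumption, test values can fall slightly outside [0, 1]. The function names are illustrative.

```python
import numpy as np


def fit_min_max(train):
    """Column-wise minimum and range, computed on the training period only."""
    train = np.asarray(train, dtype=float)
    x_min = train.min(axis=0)
    x_range = train.max(axis=0) - x_min
    # A constant column would give a zero range; use 1 to avoid division by zero.
    x_range = np.where(x_range == 0, 1.0, x_range)
    return x_min, x_range


def min_max_transform(x, x_min, x_range):
    """x' = (x - x_min) / (x_max - x_min)."""
    return (np.asarray(x, dtype=float) - x_min) / x_range


def min_max_inverse(x_scaled, x_min, x_range):
    """Map scaled predictions back to the original units (e.g. cases per 1,000 people)."""
    return np.asarray(x_scaled, dtype=float) * x_range + x_min
```

The inverse transform is what turns the model's scaled output back into weekly suspected cases before the errors are computed.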

Estimation of suspected HFMD patients using DA-RNN
The traditional RNN has been used successfully for time-series prediction, but it suffers from the vanishing gradient problem. LSTM and GRU have been used to overcome this limitation, and their structures are shown in Figure S3 23,24. However, the same problem reappears as the time series becomes longer. Encoder-decoder networks, of which Seq2Seq is a representative example, were then adopted 25,26. Their performance also degrades when the input sequence is long, so this structure was extended with an attention mechanism that assigns a score to past information. Qin et al. proposed DA-RNN as a time-series prediction model 18. DA-RNN overcomes the shortcomings of the RNN models used in earlier time-series studies by placing two attention mechanisms within an encoder-decoder structure. The structure of the model used in this study is shown in Fig. 5. Rather than weighting the input data equally at each forecasting point, the attention mechanism computes an attention score for the data relevant to that forecasting point and uses it for prediction. Among the various attention scoring functions, the Bahdanau (concat) method was used 26.
To confirm the prediction performance of DA-RNN, this study compares its results with those obtained using LSTM, GRU, and Seq2Seq. The importance of each weather factor was calculated in the encoder, and the temporal importance was calculated in the decoder. To confirm the contribution of each attention structure, the results obtained with a single attention mechanism in either the encoder or the decoder were also compared. DA-RNN is an encoder-decoder-based algorithm with an attention mechanism in both the encoder and the decoder. The input consists of n driving series over T time steps together with the previous T−1 values of the target series, and the output is the target value at time step T. The encoder consists of the input attention and an encoder LSTM. The input attention shown in Fig. 5 can be expressed mathematically as follows:

$$e_t^k = v_e^\top \tanh\left(W_e\,[h_{t-1}; s_{t-1}] + U_e\,x^k\right)$$

$$\alpha_t^k = \frac{\exp(e_t^k)}{\sum_{i=1}^{n} \exp(e_t^i)}$$

$$\tilde{x}_t = \left(\alpha_t^1 x_t^1, \alpha_t^2 x_t^2, \ldots, \alpha_t^n x_t^n\right)^\top$$

where $x^k$ denotes the kth weather variable, $h_{t-1}$ denotes the encoder hidden state, $s_{t-1}$ denotes the encoder cell state, and $v_e$, $W_e$, and $U_e$ are learnable parameters. The attention score $e_t^k$ of the kth variable at time t is calculated using the Bahdanau method. The softmax function then gives the attention distribution $\alpha_t^k$, which produces the attention-weighted input $\tilde{x}_t$ at time t. The encoder LSTM has the same structure as a general LSTM and is expressed mathematically as follows:

$$f_t = \sigma\left(W_f\,[h_{t-1}; \tilde{x}_t] + b_f\right)$$

$$i_t = \sigma\left(W_i\,[h_{t-1}; \tilde{x}_t] + b_i\right)$$

$$o_t = \sigma\left(W_o\,[h_{t-1}; \tilde{x}_t] + b_o\right)$$

$$s_t = f_t \odot s_{t-1} + i_t \odot \tanh\left(W_s\,[h_{t-1}; \tilde{x}_t] + b_s\right)$$

$$h_t = o_t \odot \tanh(s_t)$$

where W denotes the weights and b the biases; $f_t$ is the forget gate, $i_t$ is the input gate, $o_t$ is the output gate, $s_t$ is the cell state, and $h_t$ is the hidden state. $\sigma$ and $\odot$ denote the sigmoid function and element-wise multiplication, respectively.
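The following NumPy forward pass turns the input attention and encoder LSTM equations into code for one input window. It follows the parameter shapes of Qin et al. for hidden size m: $W_e \in \mathbb{R}^{T \times 2m}$, $U_e \in \mathbb{R}^{T \times T}$, $v_e \in \mathbb{R}^{T}$. The weights are untrained random values, and stacking the four gate matrices into a single `W` is an assumption made for compactness. This is an illustrative sketch, not the authors' implementation.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def input_attention(X, h_prev, s_prev, W_e, U_e, v_e):
    """Attention weights alpha_t^k over the n driving series at one time step.

    X: (T, n) window; column k is the whole series x^k.
    h_prev, s_prev: (m,) encoder hidden and cell states at t-1.
    """
    query = W_e @ np.concatenate([h_prev, s_prev])  # W_e [h_{t-1}; s_{t-1}], shape (T,)
    # Bahdanau (additive) score e_t^k = v_e^T tanh(W_e[h; s] + U_e x^k), for all k at once
    scores = np.tanh(query + X.T @ U_e.T) @ v_e     # shape (n,)
    return softmax(scores)


def lstm_step(x, h_prev, s_prev, W, b):
    """One LSTM step; W (4m, m + d) stacks the forget, input, output, and candidate weights."""
    m = h_prev.size
    z = W @ np.concatenate([h_prev, x]) + b
    f, i, o = sigmoid(z[:m]), sigmoid(z[m:2 * m]), sigmoid(z[2 * m:3 * m])
    s = f * s_prev + i * np.tanh(z[3 * m:])  # s_t = f_t ⊙ s_{t-1} + i_t ⊙ tanh(...)
    h = o * np.tanh(s)                       # h_t = o_t ⊙ tanh(s_t)
    return h, s


def encode(X, W_e, U_e, v_e, W, b, m):
    """Run input attention + encoder LSTM over a (T, n) window."""
    T, n = X.shape
    h, s = np.zeros(m), np.zeros(m)
    hidden, alphas = [], []
    for t in range(T):
        alpha = input_attention(X, h, s, W_e, U_e, v_e)  # uses h_{t-1}, s_{t-1}
        x_tilde = alpha * X[t]  # (alpha_t^1 x_t^1, ..., alpha_t^n x_t^n)
        h, s = lstm_step(x_tilde, h, s, W, b)
        hidden.append(h)
        alphas.append(alpha)
    return np.stack(hidden), np.stack(alphas)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, n, m = 8, 20, 16  # hypothetical window length, 20 factors, hidden size
    X = rng.normal(size=(T, n))
    H, A = encode(X, rng.normal(size=(T, 2 * m)), rng.normal(size=(T, T)),
                  rng.normal(size=T), rng.normal(size=(4 * m, m + n)) * 0.1,
                  np.zeros(4 * m), m)
    print(H.shape, A.shape)  # (8, 16) (8, 20)
```

The rows of `A` are the per-time-step attention distributions over the weather factors that a plot such as Fig. 3C visualizes.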
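The decoder consists of the temporal attention and a decoder LSTM. The encoder hidden states $h_1, \ldots, h_T$ are used as the input for the temporal attention, which is expressed mathematically as follows:

$$l_t^i = v_d^\top \tanh\left(W_d\,[d_{t-1}; s'_{t-1}] + U_d\,h_i\right), \quad 1 \le i \le T$$

$$\beta_t^i = \frac{\exp(l_t^i)}{\sum_{j=1}^{T} \exp(l_t^j)}$$

$$c_t = \sum_{i=1}^{T} \beta_t^i\, h_i$$

where $d_{t-1}$ and $s'_{t-1}$ denote the decoder hidden and cell states, and $v_d$, $W_d$, and $U_d$ are learnable parameters. As in the encoder, the attention score $l_t^i$ is calculated using the Bahdanau method. The softmax function then gives the attention distribution $\beta_t^i$, which yields the temporal attention value (context vector) $c_t$ at time t. In the decoder LSTM, the attention value $c_{t-1}$ and the previous target value $y_{t-1}$ are concatenated and used as the input.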
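The temporal attention equations can be sketched in the same NumPy style. Shapes again follow Qin et al.: encoder hidden size m, decoder hidden size p, $W_d \in \mathbb{R}^{m \times 2p}$, $U_d \in \mathbb{R}^{m \times m}$, $v_d \in \mathbb{R}^{m}$. `decoder_input` is a hypothetical helper showing the concatenation of $y_{t-1}$ and $c_{t-1}$ described in the text. This is a sketch, not the authors' code.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()


def temporal_attention(H, d_prev, s_prev, W_d, U_d, v_d):
    """Temporal attention over encoder hidden states h_1..h_T.

    H: (T, m) encoder hidden states; d_prev, s_prev: (p,) decoder hidden and
    cell states at t-1; W_d: (m, 2p); U_d: (m, m); v_d: (m,).
    Returns beta_t (T,) and the context vector c_t (m,).
    """
    query = W_d @ np.concatenate([d_prev, s_prev])  # W_d [d_{t-1}; s'_{t-1}], shape (m,)
    scores = np.tanh(query + H @ U_d.T) @ v_d       # l_t^i for i = 1..T
    beta = softmax(scores)
    context = beta @ H                              # c_t = sum_i beta_t^i h_i
    return beta, context


def decoder_input(y_prev: float, context_prev: np.ndarray) -> np.ndarray:
    """Concatenate the previous target y_{t-1} with c_{t-1} for the decoder LSTM."""
    return np.concatenate([[y_prev], context_prev])
```

Because `beta` is a probability distribution over the T past time steps, the context vector is a weighted average of the encoder hidden states.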
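The decoder LSTM has the same form as the encoder LSTM. The concatenated vector $[y_{t-1}; c_{t-1}]$ takes the place of $\tilde{x}_t$, and the decoder states $d_t$ and $s'_t$ take the place of $h_t$ and $s_t$:

$$d_t = f_{\mathrm{LSTM}}\left(d_{t-1}, [y_{t-1}; c_{t-1}]\right)$$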
The DA-RNN model was implemented in Python 3.9.

Discussion
This study explores the meteorological factors that predict and influence the prevalence of HFMD in Korea. It aimed to predict the number of suspected HFMD cases per 1,000 people per week and to analyze the sensitivity of the prediction to meteorological factors. HFMD prediction and sensitivity analysis were conducted with the time-series forecasting method DA-RNN. First, we estimated the number of suspected HFMD cases per 1,000 people for 2019 to 2020 using the 2011 to 2018 data and all meteorological factors. To evaluate the performance of DA-RNN, we also tested LSTM, GRU, Seq2seq, encoder attention-based seq2seq, and decoder attention-based seq2seq under the same experimental conditions. Second, the meteorological factors were divided into six groups: temperature, wind, rainfall, day length (sunshine), humidity, and air pollution. MAE, RMSE, MAPE, and R² were then compared under the same conditions, with each group excluded in turn, to determine the most influential group.
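The setup described above implies two preprocessing steps: slicing the aligned weekly series into DA-RNN samples (T weeks of driving series plus T−1 previous target values, with the target at week T as the label) and splitting them chronologically so that windows whose label falls in 2019 or later form the test set. The window length T is not reported in this section. The helper names and the 2019 cut-off parameter are illustrative assumptions.

```python
import numpy as np


def make_windows(features, target, T):
    """Build DA-RNN style samples from aligned weekly series.

    features: (N, n) driving series; target: (N,) weekly suspected cases.
    Sample j uses features at weeks j..j+T-1, the target at weeks j..j+T-2 as
    history, and the target at week j+T-1 as the label.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    X, y_hist, y = [], [], []
    for j in range(len(target) - T + 1):
        X.append(features[j:j + T])
        y_hist.append(target[j:j + T - 1])
        y.append(target[j + T - 1])
    return np.array(X), np.array(y_hist), np.array(y)


def chronological_split(label_years, split_year=2019):
    """Boolean masks: samples labelled before split_year train, the rest test."""
    years = np.asarray(label_years)
    return years < split_year, years >= split_year
```

With a per-week array of calendar years `week_years`, the year of each sample's label is `week_years[T - 1:]`, which is what `chronological_split` expects. Splitting by time rather than at random keeps future weeks out of the training data.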
DA-RNN outperformed the other methods in terms of MAE, RMSE, MAPE, and R² (0.8544, 2.7117, 0.3163, and 0.9333, respectively). The comparison with the single-attention models confirmed this result and showed the importance of weighting the weather factors in the encoder attention mechanism. Among the six meteorological groups, the sensitivity test identified the rainfall, air pollution, wind, and humidity groups as the most important overall (see Supplementary Information).
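The four metrics can be computed as shown below. The formulas are the standard definitions. The sketch reports MAPE as a fraction rather than a percentage, and the text does not state which convention the reported 0.3163 uses. `regression_scores` is a hypothetical helper name.

```python
import numpy as np


def regression_scores(y_true, y_pred):
    """MAE, RMSE, MAPE (as a fraction), and R^2 for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    # MAPE is undefined when any true value is zero (possible in low-incidence weeks).
    mape = np.mean(np.abs(err / y_true))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}
```

If the weekly series contains zero counts, MAPE must be computed on the nonzero weeks only or replaced by a scale-free alternative.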
A GeoDetector-based analysis found that the meteorological factors affecting HFMD in Guangxi, an inland region of China, include temperature and rainfall. However, because Korea is surrounded by sea on three sides, not only rainfall but also humidity has a significant impact. This is consistent with a study in Japan, which has a similar topography, where relative humidity was found to be an important factor in HFMD 10. In addition to the meteorological factors examined in previous studies, we added the day-length and air pollution groups. The prevalence pattern of HFMD in 2020 differed significantly from that in 2011–2019. This indirectly confirms the large influence of the air pollution group, which reflects the level of social activity that changed owing to COVID-19.
This study has several limitations. First, the pattern of HFMD in 2020 changed significantly because of COVID-19. We chose the air pollution group as an indirect factor; however, the pattern was not sufficiently learned because directly related data were lacking. Therefore, factors related to social phenomena and population density should be considered in future research. Second, the regional (spatial) perspective was not considered in this study; a system that can evaluate risk levels by region and provide alerts is needed. Third, only meteorological factors were considered, which may have led to inaccurate forecasts given that other factors might be important. Social and regional factors should therefore be considered in future work to improve the accuracy of the DA-RNN model for HFMD prediction. To the best of our knowledge, this study is the first to apply a DA-RNN to HFMD in Korea and to reveal the important meteorological factors. By identifying which meteorological factors should be monitored when predicting HFMD in Korea, this model can significantly influence government policy. This study's framework could be extended to other epidemiological studies and time-series problems.

Conclusion
This study proposes a new model that predicts the weekly number of suspected HFMD cases using 20 meteorological factors. The meteorological factors were divided into six groups, and a sensitivity test was conducted to determine the most influential group. Our model uses DA-RNN and, compared with the other models, gives good predictions even for the 2019–2020 test period, which is difficult to predict. The results showed that the rainfall, air pollution, wind, and humidity groups significantly affect HFMD. These results indicate that governments should consider meteorological factors in HFMD prevention guidelines.

Figure 1. Epidemiological and meteorological characteristics. (A) Number of suspected HFMD cases per thousand people per week. (B) Average, maximum, and minimum temperatures among the temperature group. (C) Maximum instantaneous wind speed and average wind speed among the wind group. (D) Maximum rainfall and 1 h maximum rainfall among the rainfall group. (E) Day length per year among the sunshine group. (F) Average and minimum humidity among the humidity group. (G) SO₂, O₃, and NO₂ among the air pollution group.

Figure 2. Estimation of suspected HFMD patients. (A) LSTM; (B) GRU; (C) Seq2seq; (D) encoder attention-based seq2seq; (E) decoder attention-based seq2seq; (F) DA-RNN. The orange, blue, and green lines represent the actual values, the predicted values for the training set, and the predicted values for the test set, respectively.

Figure 3. Sensitivity analysis results. (A) Spider map of the importance values from the sensitivity analysis; (B) importance scores; (C) encoder attention scores of the DA-RNN over the test period.

Table 2. Meteorological factors used in forecasting and their descriptions.