Fuzzy inference-based LSTM for long-term time series prediction

Long short-term memory (LSTM) based time series forecasting methods suffer from multiple limitations, such as accumulated error, diminishing temporal correlation, and lacking interpretability, which compromises the prediction performance. To overcome these shortcomings, a fuzzy inference-based LSTM with the embedding of a fuzzy system is proposed to enhance the accuracy and interpretability of LSTM for long-term time series prediction. Firstly, a fast and complete fuzzy rule construction method based on Wang–Mendel (WM) is proposed, which can enhance the computational efficiency and completeness of the WM model by fuzzy rules simplification and complement strategies. Then, the fuzzy prediction model is constructed to capture the fuzzy logic in data. Finally, the fuzzy inference-based LSTM is proposed by integrating the fuzzy prediction fusion, the strengthening memory layer, and the parameter segmentation sharing strategy into the LSTM network. Fuzzy prediction fusion increases the network reasoning capability and interpretability, the strengthening memory layer strengthens the long-term memory and alleviates the gradient dispersion problem, and the parameter segmentation sharing strategy balances processing efficiency and architecture discrimination. Experiments on publicly available time series demonstrate that the proposed method can achieve better performance than existing models for long-term time series prediction.

Fuzzy systems are characterized by universal approximation capability and outstanding interpretability, providing an effective paradigm for handling uncertain data, representing latent knowledge, and exhibiting the inference process 25. Some models attempt to enhance deep learning-based models by embedding fuzzy set theory, such as the fuzzy deep convolutional neural network 26, the deep fuzzy echo state network 27, and the fuzzy recurrent neural network 28. For time series forecasting in particular, related works integrate the fuzzy system with LSTM to carry the advantages of both fuzzy logic and deep learning. Li et al. 29 propose a Type-2 fuzzy LSTM neural network to perform traffic volume prediction. Tang et al. 30 propose a granule time series forecasting model by integrating the trend fuzzy granule and the LSTM network. The innovations of these models lie mainly in taking fuzzy information as input data and training the network's parameters with a fuzzy system. However, the overall structure of LSTM remains unchanged, and the interpretability of the fuzzy system is reduced to some extent.
The extraction of fuzzy rules from the training data is a crucial component in modeling a fuzzy system. The Wang-Mendel (WM) model is a powerful tool for directly extracting fuzzy rules using only one pass over the training data 31,32. However, the effectiveness of the WM model can be greatly degraded by the excessive generation of fuzzy rules. To overcome this issue, an improved WM model utilizing fuzzy c-means is proposed 33, but determining the number of clusters remains a challenging task. Zhai et al. 34 propose an on-line WM fuzzy inference model, which can adaptively acquire the fuzzy rules from training data. However, the performance of the model can be limited by redundant rules and by missing rules for regions not covered by the training data.
Taking all the above observations into consideration, this paper proposes a fuzzy inference-based LSTM for time series forecasting, which enhances the accuracy and interpretability of LSTM by embedding a fuzzy system. To improve the computational efficiency and completeness of the WM model, a fuzzy rule base construction method based on the WM model is proposed. Then, the fuzzy prediction model based on the improved WM model is constructed. Finally, the fuzzy inference-based LSTM is proposed to carry out prediction by integrating the fuzzy prediction fusion, the strengthening memory layer, and the parameter segmentation sharing strategy into the LSTM network. In summary, the main contributions of this paper are as follows: (i) A fast and complete fuzzy rule construction method based on the WM model is proposed, which can enhance the computational efficiency and completeness of the WM model by fuzzy rules simplification and complement strategies. (ii) A strengthening memory layer is constructed by integrating the current output with the cell state, which can strengthen the long-term memory and alleviate the gradient dispersion problem of LSTM. (iii) A parameter segmentation sharing strategy that divides the overall output layer into different parts is proposed, which can balance processing efficiency and architecture discrimination. (iv) A fuzzy inference-based LSTM with the embedding of a fuzzy system is proposed, which can enhance the accuracy and interpretability of LSTM for long-term time series prediction. (v) Extensive experiments demonstrate the better performance of the proposed method in comparison with related models.

Prerequisites
This paper focuses on improving the interpretability and accuracy of deep neural networks with a fuzzy inference model for the time series prediction problem. This section introduces two related methods, LSTM and the WM model.

Long Short-Term Memory Neural Network (LSTM)
RNNs have achieved good performance in processing and learning time series information, but they cannot successfully learn long-term dependencies due to exploding or vanishing gradients. LSTM is an extension of RNNs, which introduces the "gate" cell to retain and learn long-term dependencies. An LSTM network can capture important features from inputs and store the information over a long period of time, thus it has achieved good results in long-term forecasting. In general, the critical components of the LSTM network architecture are three gates: forget, input, and output gates, denoted by $f$, $i$, and $o$, respectively. The calculation procedure for each gate is as follows:
(1) Forget Gate. Determines what information needs to be retained in the memory cell with the help of the sigmoid function. The output is expressed as follows:
$$f_t = \sigma\left(W_f [h_{t-1}, x_t] + b_f\right)$$
where $x_t$ and $h_{t-1}$ represent the input at time step $t$ and the hidden state at time step $t-1$, respectively, $W_f$ represents the weight matrix, $b_f$ represents a constant bias, and $\sigma(\cdot)$ represents the sigmoid function.
(2) Input Gate. Determines whether the new information should be saved to the memory cell by the sigmoid layer and the tanh layer. The outputs of the two layers are computed in the following form:
$$i_t = \sigma\left(W_i [h_{t-1}, x_t] + b_i\right), \qquad \tilde{c}_t = \tanh\left(W_c [h_{t-1}, x_t] + b_c\right)$$
The update of the memory cell is achieved by the combination of these two layers, where the current memory is obtained by retaining previous information and introducing new cell state information. The mathematical equation is expressed in the following form:
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
where $c_t$ represents the cell state at time step $t$, and $\odot$ denotes the Hadamard product.
(3) Output Gate. Determines what part of the memory contributes to the current output and maps the output between −1 and 1 by the tanh function. The outputs can be computed by the following equations:
$$o_t = \sigma\left(W_o [h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t \odot \tanh(c_t)$$
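The three gates can be traced in a few lines of code. The following is a minimal sketch of one time step of the standard LSTM cell in plain Python, not the authors' implementation; the parameter names (`W_f`, `b_f`, and so on) and the list-based matrix layout are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Return W v + b, where W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, p):
    """One standard LSTM step; p holds hypothetical weights W_* and biases b_*."""
    z = h_prev + x_t  # list concatenation [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(p["W_f"], z, p["b_f"])]    # forget gate
    i = [sigmoid(a) for a in affine(p["W_i"], z, p["b_i"])]    # input gate
    g = [math.tanh(a) for a in affine(p["W_c"], z, p["b_c"])]  # candidate state
    o = [sigmoid(a) for a in affine(p["W_o"], z, p["b_o"])]    # output gate
    # Keep part of the old memory, add part of the candidate (Hadamard products).
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c
```

With all weights and biases set to zero, every gate equals 0.5 and the candidate state is 0, so the cell state is simply halved at each step, which is a convenient sanity check.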

Wang-Mendel model
The WM model is a simple and powerful tool for generating the fuzzy rule base from sample data. However, the effectiveness of the WM model can be greatly degraded by a huge amount of data. Each training sample generates a fuzzy rule, so the rule extraction strategy is not efficient enough. Thus, improving the rule generation mechanism becomes crucial, and a fast and complete fuzzy rule base construction method based on the WM model will be proposed. The simplification strategy for redundant rules and conflict rules is proposed to simplify the fuzzy rule base. The complement strategy is proposed to complement the fuzzy rule base.
Fuzzy rules extraction. Given the time series, define the length of the antecedents and consequents of the fuzzy rule, and divide the domain of each antecedent variable into several fuzzy subsets to extract input-output sample pairs. Each feature of an input-output sample pair is assigned to the fuzzy set with the highest membership degree, these membership degrees are used to calculate the weight of the fuzzy rule, and finally an unorganized fuzzy rule base is generated.
Fuzzy rule arrangement. When the sample size is large, it is easy to generate redundant rules. To solve this problem, when adding a rule to the rule base, first check whether the antecedents of the rule already exist in the rule base. If they do not exist, add the rule to the rule base; otherwise, keep the rule with the highest weight.
Fuzzy rule-based prediction. A center-average defuzzification inference engine is used to organize rules with the same antecedents in the fuzzy rule base, obtain the consequent of the rules for fuzzy inference, and get the final fuzzy rule inference base.
The drawback of this model is that the generated fuzzy rule base lacks completeness and robustness, resulting in low model accuracy. Therefore, to improve the accuracy of the model, the method of constructing the fuzzy rule inference system needs to be optimized so that the system can be built quickly and comprehensively.

The proposed fuzzy prediction model
The construction of the fuzzy rule base is crucial for a fuzzy rule-based prediction model. The fuzzy rule base constructed with the WM model may contain redundant rules, and may lack corresponding rules for newly available samples because some fuzzy regions are not covered by the training data. To improve the computational efficiency and completeness of the WM model, a fast and complete fuzzy rule base construction method based on the WM model is proposed, and the prediction is then performed based on the fuzzy rule base. The framework of the proposed fuzzy prediction model is shown in Fig. 1. In what follows, we explain the detailed steps of the proposed model.

Fuzzy rules extraction
Given the time series $T = \{x_1, x_2, \ldots, x_n\}$, each input-output sample pair for training can be constructed as $\{x_i, x_{i+1}, \ldots, x_{i+h-1}, y_i\}$, $i = 1, 2, \ldots, n-h$, where $\{x_i, x_{i+1}, \ldots, x_{i+h-1}\}$ is the input sample, $h$ is the length of the input sample, and $y_i = x_{i+h}$ is the output sample. The domain of discourse is divided into $q$ regions, and the triangular fuzzy sets $A_1, A_2, \ldots, A_q$ are defined on these regions as shown in Fig. 2. Each feature of an input-output sample pair is assigned to the fuzzy set with the highest membership degree, i.e., $x_i$ is fuzzified into $A_{1,i}$ with the membership degree $U_{1,i}$. The fuzzy rules can be extracted using the WM method as follows:
$$R_i: \text{IF } x_i \text{ is } A_{1,i} \text{ and } \cdots \text{ and } x_{i+h-1} \text{ is } A_{h,i}, \text{ THEN } y_i \text{ is } A_{y,i}$$
where $A_{j,i}$ is the $j$th antecedent, and $A_{y,i}$ is the consequence. Rules generated from the training data are called data-generated rules, and the fuzzy rule base is denoted as $R = \{R_1, R_2, \ldots, R_n\}$.
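The extraction step can be illustrated with a short sketch. It assumes evenly spaced triangular fuzzy sets whose peaks sit at the centers and whose supports reach the neighbouring centers; the exact partition of Fig. 2 may differ, and the function names are hypothetical. The rule weight is taken as the product of the antecedent membership degrees.

```python
def triangular_centers(lo, hi, q):
    """Centers of q evenly spaced triangular fuzzy sets covering [lo, hi] (assumed layout)."""
    step = (hi - lo) / (q - 1)
    return [lo + k * step for k in range(q)]

def membership(x, centers, k):
    """Membership of x in the k-th triangular set: 1 at its center, 0 at the neighbouring centers."""
    width = centers[1] - centers[0]
    return max(0.0, 1.0 - abs(x - centers[k]) / width)

def fuzzify(x, centers):
    """Return (index, degree) of the fuzzy set with the highest membership for x."""
    x = min(max(x, centers[0]), centers[-1])  # clamp to the domain of discourse
    degrees = [membership(x, centers, k) for k in range(len(centers))]
    k = max(range(len(centers)), key=degrees.__getitem__)
    return k, degrees[k]

def extract_rules(series, h, centers):
    """One data-generated rule per window: (antecedent labels, consequent label, weight)."""
    rules = []
    for i in range(len(series) - h):
        window, target = series[i:i + h], series[i + h]
        labels, degrees = zip(*(fuzzify(x, centers) for x in window))
        y_label, _ = fuzzify(target, centers)
        weight = 1.0
        for d in degrees:
            weight *= d  # product of antecedent membership degrees
        rules.append((labels, y_label, weight))
    return rules
```

For example, with five sets on [0, 4] and `h = 2`, the series `[0.0, 1.0, 2.0, 3.0]` yields two rules, the first reading "IF x1 is A1 and x2 is A2, THEN y is A3" (zero-based labels `(0, 1)` and `2`).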

Fuzzy rules simplification
When the sample set is massive, a large number of fuzzy rules are generated. Many of these fuzzy rules share the same characteristics, such as redundant rules and conflict rules. Redundant rules are rules with the same antecedents and consequence, and conflict rules are rules that have the same antecedents but different consequences. To simplify the fuzzy rule base, the simplification strategy for redundant rules and conflict rules is proposed as follows:
(1) Redundant rules simplification. Find the group of data-generated rules that have the same antecedents and consequences, then keep only one fuzzy rule among them and delete the rest of the group from the fuzzy rule base.
(2) Conflict rules simplification. Find the group of data-generated rules that have the same antecedents but different consequences, and then integrate the information of all fuzzy rules in the group to generate a new fuzzy rule. Delete the group from the fuzzy rule base and add the new fuzzy rule to the fuzzy rule base.
The process of conflict rules simplification is explained as follows. Assume the group found is $R'_1, R'_2, \ldots, R'_m$, where the fuzzy rule $R'_i$ can be expressed as:
$$R'_i: \text{IF } x_{i1} \text{ is } A_{i1} \text{ and } \cdots \text{ and } x_{ih} \text{ is } A_{ih}, \text{ THEN } y_i \text{ is } A_{y_i}$$
The weight $w_i$ of each fuzzy rule $R'_i$ can be computed as the product of the membership function values of its antecedents:
$$w_i = \prod_{j=1}^{h} U_{j,i}$$
where $U_{j,i}$ is the membership degree of $x_{ij}$ to $A_{ij}$. Then the value $\hat{y}$ can be obtained by using the center-average defuzzification mechanism:
$$\hat{y} = \frac{\sum_{i=1}^{m} w_i \bar{y}_i}{\sum_{i=1}^{m} w_i}$$
where $\bar{y}_i$ is the central value of the fuzzy set $A_{y_i}$. Assuming that $A_{\hat{y}}$ is the fuzzy set on which $\hat{y}$ achieves the maximum membership, the new fuzzy rule is generated as follows:
$$R': \text{IF } x_{i1} \text{ is } A_{i1} \text{ and } \cdots \text{ and } x_{ih} \text{ is } A_{ih}, \text{ THEN } y \text{ is } A_{\hat{y}}$$
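Both simplification steps can be sketched together. The code below is an illustration under stated assumptions, not the authors' implementation: rules are `(antecedents, consequent_index, weight)` tuples, the retained redundant rule is the one with the largest weight (the text does not say which copy is kept), and the fuzzy sets are evenly spaced triangles, so the set with maximum membership is the one with the nearest center.

```python
from collections import defaultdict

def simplify_rules(rules, centers):
    """Merge redundant and conflict rules; returns {antecedents: consequent_index}."""
    groups = defaultdict(dict)
    for antecedents, consequent, weight in rules:
        # Redundant rules (same antecedents and consequent) collapse to one entry.
        # Keeping the largest weight is an assumption made for this sketch.
        best = groups[antecedents].get(consequent, 0.0)
        groups[antecedents][consequent] = max(best, weight)

    rule_base = {}
    for antecedents, by_consequent in groups.items():
        # Conflict rules: weighted average of the consequent centers
        # (center-average defuzzification), then snap to the nearest fuzzy set.
        total = sum(by_consequent.values())
        y_hat = sum(w * centers[c] for c, w in by_consequent.items()) / total
        rule_base[antecedents] = min(
            range(len(centers)), key=lambda k: abs(y_hat - centers[k])
        )
    return rule_base
```

A group without conflicts passes through unchanged, because the weighted average of a single center is that center. Note that the merged consequent can be a fuzzy set that none of the conflicting rules proposed.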

Fuzzy rules complement
The fuzzy rules are extracted from the fuzzy regions that contain sample data, thus the data-generated fuzzy rule base is in general not complete. To extrapolate the data-generated fuzzy rule base over the regions not covered by the obtained rules, the fuzzy rule base should be complemented to cover the whole domain of discourse. Especially for the forecasting problem, a complete fuzzy rule base is crucial because the rules should be well-defined at all samples in the domain of discourse. To complement the fuzzy rule base, the complement strategy is proposed as the following three steps.
Step 1) For each combination of antecedents that does not appear in the fuzzy rule base, find the group of data-generated fuzzy rules that differ from the combination in only $i$ antecedents, and call this group the $i$-group. Determine the first group that is not empty, i.e., the $t$-group.
Step 2) For all fuzzy rules in the $t$-group, compute:
$$\hat{y} = \frac{1}{n_t} \sum_{i=1}^{n_t} y_i$$
where $n_t$ is the number of fuzzy rules in the $t$-group, and $y_i$ is the central value of the fuzzy set that is the consequence of the $i$th fuzzy rule in the $t$-group.
Step 3) Find the fuzzy set $A_{\hat{y}}$ on which $\hat{y}$ achieves the maximum membership. Assuming that the combination of antecedents is "$x_{i1}$ is $A_{i1}$ and $\cdots$ and $x_{ih}$ is $A_{ih}$", the extrapolating rule is generated as:
$$\text{IF } x_{i1} \text{ is } A_{i1} \text{ and } \cdots \text{ and } x_{ih} \text{ is } A_{ih}, \text{ THEN } y \text{ is } A_{\hat{y}}$$
The process is repeated until all the extrapolating rules are constructed. The complete fuzzy rule base is obtained by integrating the extrapolating rules and the data-generated rules.
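The three steps can be sketched as follows. This is an illustrative reading that takes the value computed in Step 2 to be the plain average of the consequent centers in the t-group and that assumes evenly spaced triangular sets (maximum membership equals nearest center); the rule base is a hypothetical dictionary from antecedent label tuples to consequent indices.

```python
from itertools import product

def complement_rules(rule_base, q, h, centers):
    """Return a rule base covering all q**h antecedent combinations.

    rule_base: non-empty {antecedents: consequent_index} of data-generated rules.
    """
    complete = dict(rule_base)
    for combo in product(range(q), repeat=h):
        if combo in rule_base:
            continue
        # Step 1: bucket data-generated rules by the number of differing antecedents.
        by_distance = {}
        for antecedents, consequent in rule_base.items():
            d = sum(a != b for a, b in zip(antecedents, combo))
            by_distance.setdefault(d, []).append(consequent)
        t_group = by_distance[min(by_distance)]  # first non-empty i-group
        # Step 2: average the consequent centers of the t-group (assumed form).
        y_hat = sum(centers[c] for c in t_group) / len(t_group)
        # Step 3: pick the fuzzy set on which y_hat has maximum membership.
        complete[combo] = min(range(q), key=lambda k: abs(y_hat - centers[k]))
    return complete
```

Only data-generated rules are consulted when extrapolating, so the result does not depend on the order in which missing combinations are visited. The enumeration grows as q**h, which is one reason to build the rule base once, offline.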

Fuzzy rule-based prediction
Let $\{x_{n-h+1}, x_{n-h+2}, \ldots, x_n\}$ be the testing sample, where each feature $x'_i$ is fuzzified into a fuzzy set $A'_i$. The antecedents of the fuzzy rule are obtained as "$x_{n-h+1}$ is $A'_1$ and $\cdots$ and $x_n$ is $A'_h$", and the matching fuzzy rule can be extracted from the fuzzy rule base as:
$$\text{IF } x_{n-h+1} \text{ is } A'_1 \text{ and } \cdots \text{ and } x_n \text{ is } A'_h, \text{ THEN } y \text{ is } A'_y$$
The predicted value is obtained by $\hat{y} = y'$, where $y'$ is the center of the fuzzy set $A'_y$. In this section, an improved Wang-Mendel model for rapid construction of fuzzy inference systems is proposed, which remedies, in a simple way, the incompleteness of the fuzzy rule inference base that the Wang-Mendel model may generate. Thus a complete fuzzy rule inference base is built. Building this fuzzy inference system adds little extra time or computational overhead.
The addition of the fuzzy prediction module affects the computational efficiency of the model. In the experiments, we improve the computational efficiency of the fuzzy prediction module as much as possible through the following methods.
1) The data in the input part of the experiment is fixed. To reduce the calculation cost of the fuzzy rule module, the construction of the fuzzy rule base is performed offline in advance.
2) For the data in the prediction part of the experiment, a branch-and-bound search algorithm is used to reduce the computational cost of finding the corresponding rules in the fuzzy prediction inference base.
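Once the complete rule base exists, prediction reduces to fuzzifying the last h observations and reading off the center of the matching consequent. The sketch below makes several assumptions: evenly spaced triangular sets (so the best set is the nearest center), a dictionary lookup standing in for the branch-and-bound search mentioned above, and a recursive `forecast` helper that feeds each prediction back as input, which is one common way to extend a one-step predictor and is not taken from the paper.

```python
def nearest_set(x, centers):
    """Index of the fuzzy set with maximum membership, assuming evenly spaced triangles."""
    return min(range(len(centers)), key=lambda k: abs(x - centers[k]))

def predict_next(history, h, centers, rule_base):
    """One-step prediction: fuzzify the last h values and return the consequent center."""
    labels = tuple(nearest_set(x, centers) for x in history[-h:])
    return centers[rule_base[labels]]

def forecast(history, steps, h, centers, rule_base):
    """Multi-step forecast by feeding each prediction back in (assumed scheme)."""
    values = list(history)
    for _ in range(steps):
        values.append(predict_next(values, h, centers, rule_base))
    return values[len(history):]
```

Because the rule base is complete, the lookup `rule_base[labels]` cannot fail for any input inside the domain of discourse, which is the practical payoff of the complement strategy.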

Fuzzy inference-based LSTM for time series prediction
In this section, the fuzzy inference-based LSTM (FLSTM) for time series forecasting is proposed. The proposed method incorporates the fuzzy prediction fusion, the strengthening memory layer, and the parameter segment sharing strategy into the LSTM network. The fuzzy prediction fusion model combines the fuzzy prediction with the three gates in LSTM to enhance the fuzzy reasoning capacity of the network. The strengthening memory layer integrates the hidden state and the cell state to strengthen the long-term memory. The parameter segment sharing strategy divides the overall output layer into different parts to balance processing efficiency and architecture discrimination. The proposed forecasting model is shown in Fig. 3 and described in detail in the following sections.

Fuzzy prediction fusion
The fuzzy prediction model is embedded in the LSTM to enhance the network reasoning capability and interpretability. The fuzzy rules can capture the dynamic characteristics of data change, and the reasoning relationship between the latest information and the historical information is extracted in the form of rules. The fuzzy prediction model can take full advantage of the latest information to predict future behavior. Therefore, combining the LSTM with the fuzzy prediction model can effectively overcome the insufficient utilization of the latest data.
LSTM utilizes gate cells to control information flow in recurrent computations. Therefore, the input gate, forget gate, and output gate are combined with the fuzzy prediction to produce new outputs, which integrates the fuzzy prediction information into the recurrent computations. The mathematical expressions are as follows:
$$f_t = \sigma\left(W_f [h_{t-1}, x_t] + W_{ff} r_t + b_f\right)$$
$$i_t = \sigma\left(W_i [h_{t-1}, x_t] + W_{if} r_t + b_i\right)$$
$$o_t = \sigma\left(W_o [h_{t-1}, x_t] + W_{of} r_t + b_o\right)$$
where $r_t$ is the output of the fuzzy prediction model at time step $t$, and $W_{ff}$, $W_{if}$, $W_{of}$ are the weight matrices of $r_t$ for the forget gate, input gate, and output gate, respectively.
After the model is trained, these weights represent the strengths of the fuzzy rules in the different gates, so the proposed input gate, forget gate, and output gate make the results more interpretable. Meanwhile, fusing the fuzzy prediction information into the recurrent process increases the convergence speed of training.
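As a rough illustration of how a fuzzy prediction can enter a gate, the sketch below adds a separately weighted term for $r_t$ inside the sigmoid. The additive form, the function name `fused_gate`, and the list-based weight layout are assumptions made for illustration; the paper's exact gate equations should be taken from the original article.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fused_gate(W, W_r, b, z, r):
    """Gate activation sigma(W z + W_r r + b).

    z is the concatenation [h_{t-1}, x_t]; r is the fuzzy prediction r_t.
    W_r plays the role of W_ff, W_if, or W_of depending on the gate.
    """
    out = []
    for row, row_r, bias in zip(W, W_r, b):
        pre = sum(w * zi for w, zi in zip(row, z))
        pre += sum(w * ri for w, ri in zip(row_r, r))  # contribution of the fuzzy rule output
        out.append(sigmoid(pre + bias))
    return out
```

Under this reading, a large entry in `W_r` means the gate reacts strongly to the fuzzy prediction, which is the sense in which the learned weights can be inspected after training.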

Strengthening memory layer
LSTM can learn long-term dependencies through deliberate design, and the critical component is the memory cell. To strengthen the long-term memory and alleviate the gradient dispersion problem of LSTM, the output needs to be determined by both the current output and the cell state, thus the strengthening memory layer is proposed. In the strengthening memory layer, the current output and the cell state are combined to form a new unit. Then, the convolution Conv1d and tanh functions are used to extract more effective features to form the new memory cell. Finally, the output is generated by adding the current and new cell states. Due to the addition of the new state, the latest information is strengthened, and through the addition of new features, more information can be saved. The two kinds of feature information are combined by summation, which strengthens the impact of the new state on the final result to a certain extent and makes the results more comprehensive.

Parameter segment sharing strategy
Parameter sharing is a necessary method for controlling the number of parameters that the model has to learn, which makes model processing more efficient. However, it also results in coupled optimization among different candidates, making architectures less discriminative. Therefore, a parameter segment sharing strategy is proposed for LSTM to achieve a better trade-off between processing efficiency and architecture discrimination. Let the prediction length be $L$ and the number of shared parameters be $s$; then $k = L/s$ output layers are constructed for prediction. Different output layers can capture temporal features from different time periods, which improves the architecture discrimination. Meanwhile, each output layer with $s$ shared parameters guarantees the model processing efficiency. Finally, the output layer can be expressed in the form:
$$y_t = W_{yk} \hat{h}_t + b_y$$
where $y_t$ is the forecast result, $W_{yk}$ is the weight matrix of the $k$th output layer, $\hat{h}_t$ is the output of the strengthening memory layer, and $b_y$ is the bias.

FLSTM model
FLSTM is based on the LSTM model and integrates the fuzzy system to leverage the advantages of both fuzzy logic and deep learning. FLSTM combines the fuzzy prediction fusion, the strengthening memory layer, and the parameter segment sharing strategy.

Experimental study
To verify the prediction performance of FLSTM, a comparison with twenty-two prediction methods on seven collected real-world datasets is conducted. The time series prediction methods selected for the comparative experiments include three classical prediction methods, ARIMA 11, SVR 35, and naive 36; six deep learning-based prediction methods, GRU 37, DRNN 38, LSTM 39, Reformer 22, LogSparse self-attention 23, and Efficient attention 40; seven LSTM-based fuzzy inference methods, FD-LSTM 41, FIS-LSTM 42, SEIT2FNN 43, RIT2NFS-WB 44, MclT2FIS-UM 45, MclT2TIS-US 45, and eIT2FNN-LSTM 46; an LSTM-based fuzzy Gaussian prediction method, LFIGLSTM 30; a fuzzy Gaussian based fuzzy inference prediction method, LFIGFIS 47; a fuzzy prediction method, FPFTS 48; a hybrid method, MLP-Arima 8; and a nonlinear autoregressive neural network, NAR 49. The seven real-world datasets cover crucial indicators in electric power deployment, air quality assessment, daily number of Covid-19 cases, monthly sunspot numbers, and daily maximum temperatures, among others, i.e., Electricity Transformer Temperature (ETT) 50, PM2.5 51, daily Covid-19 cases 52, monthly sunspot numbers 53, daily maximum temperatures 54, abalone age 51, and miles per gallon 51. To evaluate the prediction effectiveness of the proposed method, six performance indexes, MSE, MAE, RMSE, SMAPE, MAPE, and MASE, are adopted 8,40,48. For the sake of fairness, the prediction lengths for each dataset are consistent with those in the original papers of the compared models. The results of the compared models are taken from the results reported in the original papers.
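For reference, the six performance indexes can be written compactly. The definitions below are commonly used textbook forms and are assumptions here: SMAPE and MASE in particular have several variants, and the exact ones used in the cited papers are not restated in this section. MASE is scaled by the in-sample one-step naive forecast error.

```python
import math

def mse(y, p):
    return sum((a - b) ** 2 for a, b in zip(y, p)) / len(y)

def rmse(y, p):
    return math.sqrt(mse(y, p))

def mae(y, p):
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)

def mape(y, p):
    """Mean absolute percentage error in percent; undefined if any true value is 0."""
    return 100.0 / len(y) * sum(abs(a - b) / abs(a) for a, b in zip(y, p))

def smape(y, p):
    """Symmetric MAPE in percent, using the 2|e| / (|y| + |p|) variant."""
    return 100.0 / len(y) * sum(
        2.0 * abs(a - b) / (abs(a) + abs(b)) for a, b in zip(y, p)
    )

def mase(y, p, train):
    """MAE scaled by the in-sample MAE of the one-step naive forecast."""
    naive = sum(abs(train[t] - train[t - 1]) for t in range(1, len(train)))
    return mae(y, p) / (naive / (len(train) - 1))
```

A relative improvement such as "an RMSE decrease of 7.0%" is then `(rmse_baseline - rmse_model) / rmse_baseline * 100`.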

Experiment 1: Electricity Transformer Temperature time series
These time series are collected from Electricity Transformer Temperature (ETT) 50, shown in Fig. 4, where ETTh1 and ETTh2 are created at the 1-hour level from 2 years of data from two separate counties in China, and ETTm1 and ETTm2 are created at the 15-minute level from the same datasets. Experimental parameters are set as follows: parameters are updated with the Adam optimizer, the batch size is set to 32, the learning rate is set to 0.001, the number of training epochs is set to 100, the number of experiment runs is set to 6, the dimension of the hidden layer is set to 64, and the input and output channels are set to 64 for the Conv1d in the strengthening memory layer.
Algorithm 1. Fuzzy inference-based LSTM (FLSTM).
For the ETTh1 and ETTh2 time series, the prediction lengths are set to 3, 6, 12, 18, 24, 36, 48, and 168. For the ETTm1 and ETTm2 time series, the prediction lengths are set to 4, 8, 12, 16, 24, 32, 48, 96, and 288. The prediction performance of ARIMA 11, GRU 37, DRNN 38, LSTM 39, FD-LSTM 41, FIS 42, Reformer 22, LogTrans 23, Efficient-att 40, and the proposed method with different prediction lengths on the 4 time series is listed in Tables 1, 2, 3 and 4. The best results are highlighted in boldface and the winning-counts are listed in the last column. … 29.0% (at 168) on average. This reveals that FLSTM significantly improves the performance of LSTM. In comparison with ARIMA, GRU, DRNN, Reformer, and LogTrans, FLSTM outperforms these methods across all datasets. FLSTM mostly beats Efficient-att in winning-counts, i.e., 14 > 3 and 15 > 3, and surpasses Efficient-att at longer prediction lengths (≥ 36). From Tables 3 and 4, we can see that FLSTM achieves better results than LSTM on MSE, decreasing it by 28.9% (at 96) and 32.4% (at 288) on average. This demonstrates that FLSTM significantly improves the performance of LSTM. In comparison with ARIMA, GRU, DRNN, LSTM, FD-LSTM, FIS and Reformer, FLSTM outperforms these methods across all datasets. FLSTM mostly beats LogTrans and Efficient-att in winning-counts, i.e., 12 > 8 and 12 > 1 for ETTm1, …

Experiment 2: PM2.5 time series
Currently, research on PM2.5 data has generated great enthusiasm, and more and more deep learning based models have been proposed and applied to long-term PM2.5 generation 55,56. Therefore, we extend our study to time series prediction of PM2.5 data. These time series are collected from PM2.5 data 51, shown in Fig. 5, where BeijingPM and ShanghaiPM are the PM2.5 data of Beijing and Shanghai in China from 2010 to 2015, including 50387 and 51892 observations respectively. Experimental parameters are set as follows: parameters are updated with the Adam optimizer, the batch size is set to 32, the learning rate is set to 0.001, the number of training epochs is set to 100, the number of experiment runs is set to 6, the dimension of the hidden layer is set to 64, and the input and output channels are set to 64 for the Conv1d in the strengthening memory layer. The prediction lengths are set to 200, 400, and 600. Tables 5 and 6 summarize the evaluation results of ARIMA 11, LSTM 39, FPFTS 48, and FLSTM with the three long-term prediction lengths. The best results are highlighted in boldface and the winning counts are listed in the last row. Table 5 demonstrates that FLSTM outperforms the other methods for the PM2.5 time series of Beijing in terms of all evaluation metrics, except that FPFTS has the smallest RMSE when the prediction length is 600. The proposed method mostly surpasses FPFTS in winning-counts, i.e., 5 > 1. In comparison with LSTM, the proposed method has an RMSE decrease of 7.0% (at 200), 8.0% (at 400), and 8.1% (at 600). This demonstrates that FLSTM acquires better prediction performance than LSTM. From Table 6, we can see that FLSTM outperforms ARIMA, LSTM, and WM on the two evaluation metrics at all prediction lengths for the PM2.5 time series of Shanghai. FLSTM mostly surpasses FPFTS in winning-counts, i.e., 4 > 2. In comparison with LSTM, FLSTM has an RMSE decrease of 38.2% (at 200), 26.6% (at 400), and 16.7% (at 600). This demonstrates that FLSTM acquires better prediction performance than LSTM. The experiment shows the success of FLSTM in improving the prediction capacity for long-term prediction.

Experiment 3: Daily number of Covid-19 cases time series
This time series is collected from the daily number of Covid-19 cases database owned by the organization Our World In Data (OWID) 52, and consists of the number of daily cases in the world until April 25th, 2021, as shown in Fig. 6. Experimental parameters are set as follows: parameters are updated with the Adam optimizer, the batch size is set to 32, the learning rate is set to 0.001, the number of training epochs is set to 100, the number of experiment runs is set to 6, the dimension of the hidden layer is set to 64, and the input and output channels are set to 64 for the Conv1d in the strengthening memory layer. The prediction lengths are set to 7, 14, and 28, the same as in the literature 8. The prediction performance of ARIMA(2,0,4)(0,1,2), MLP(14,5,1), MLP-Arima 8, and FLSTM for short (7 days), medium (14 days), and long (28 days) prediction lengths is listed in Table 7. The best results are highlighted in boldface and the winning counts are listed in the last row. Table 7 demonstrates that FLSTM outperforms the other methods on MASE and SMAPE at all prediction lengths for the daily number of Covid-19 cases time series. In comparison with the state-of-the-art method MLP-Arima 8, FLSTM has a MASE decrease of 6.6% (at 7), 79.5% (at 14), and 19.1% (at 28). This demonstrates that FLSTM acquires better prediction performance than MLP-Arima. From Table 7, we can see that FLSTM surpasses the comparative methods in all winning-counts. The experiment shows the success of FLSTM in improving the prediction capacity for different prediction lengths.

Experiment 4: Monthly sunspot numbers time series
This time series is collected from sunspot data, shown in Fig. 7, where SUNSPOT 53 is the Zuerich monthly sunspot numbers from 1749 to 1983, including 2819 observations. Experimental parameters are set as follows: parameters are updated with the Adam optimizer, the batch size is set to 32, the learning rate is set to 0.001, the number of training epochs is set to 100, the number of experiment runs is set to 6, the dimension of the hidden layer is set to 64, and the input and output channels are set to 64 for the Conv1d in the strengthening memory layer. The prediction lengths are set to 1, 55, 110, and 165. Table 8 summarizes the evaluation results of LFIGLSTM 30, LFIGFIS 47, LSTM 39, NAR 49, ARIMA 11, SVR 35, naive 36 and FLSTM with the four prediction lengths. The best results are highlighted in boldface and the winning counts are listed in the last column. Table 8 demonstrates that FLSTM outperforms the other methods on RMSE, MAPE and MAE at all prediction lengths for the monthly sunspot numbers time series. In comparison with the state-of-the-art method LFIGLSTM, FLSTM has an RMSE decrease of 85.2% (at 1), 50.5% (at 55), 34.8% (at 110) and 27.2% (at 165). This demonstrates that FLSTM acquires better prediction performance than LFIGLSTM. From Table 8, we can see that FLSTM surpasses the comparative methods in all winning-counts. The experiment shows that FLSTM has advantages over classical prediction models, deep learning prediction models, and hybrid prediction models in both short-term and long-term prediction tasks.

Experiment 5: Daily maximum temperatures time series
This time series is collected from temperature data, shown in Fig. 8, where Tmax 54 is the daily maximum temperatures in Melbourne from 1981 to 1990, including 3649 observations. Experimental parameters are set as follows: parameters are updated with the Adam optimizer, the batch size is set to 32, the learning rate is set to 0.001, the number of training epochs is set to 100, the number of experiment runs is set to 6, the dimension of the hidden layer is set to 64, and the input and output channels are set to 64 for the Conv1d in the strengthening memory layer. The prediction lengths are set to 1, 178, 356, and 534. Table 9 summarizes the evaluation results of LFIGLSTM 30, LFIGFIS 47, LSTM 39, NAR 49, ARIMA 11, SVR 35, naive 36 and FLSTM with the four prediction lengths. The best results are highlighted in boldface and the winning counts are listed in the last column.
Table 9 demonstrates that FLSTM outperforms the other methods for the daily maximum temperatures in terms of all evaluation metrics, except that LFIGLSTM has the smallest MAE when the prediction length is 356 and 534. In comparison with the state-of-the-art method LFIGLSTM, FLSTM has an RMSE decrease of 18.9% (at 1), 13.8% (at 178), 15.4% (at 356) and 9.5% (at 534). The experiment shows that FLSTM has significant advantages over these prediction models in both short-term and long-term prediction tasks.

Experiment 6: Abalone age time series
The abalone age (ABALONE) time series 51 is collected from the UCI machine learning repository, as shown in Fig. 9, and includes 4177 observations. Experimental parameters are set as follows: parameters are updated with the Adam optimizer, the learning rate is set to 0.001, the number of training epochs is set to 200, the number of experiment runs is set to 6, the dimension of the hidden layer is set to 64, and the input and output channels are set to 64 for the Conv1d in the strengthening memory layer. … Table 10 demonstrates that FLSTM outperforms the other methods for the abalone age prediction problem in terms of RMSE. In comparison with the state-of-the-art method eIT2FNN-LSTM, FLSTM has an RMSE decrease of 4.5%. The experiment shows that FLSTM has significant advantages over these prediction models.

Experiment 7: Miles-Per-Gallon time series
The Miles-Per-Gallon (MPG) 51 time series is collected from the UCI machine learning repository, as shown in Fig. 10, and includes 392 observations. Experimental parameters are set as follows: parameters are updated with the Adam optimizer, the learning rate is set to 0.001, the number of training epochs is set to 200, the number of experiment runs is set to 6, the dimension of the hidden layer is set to 64, and the input and output channels are set to 64 for the Conv1d in the strengthening memory layer. The prediction length is set to 120. Table 11 summarizes the evaluation results of SEIT2FNN 43, RIT2NFS-WB 44, MclT2FIS-UM 45, MclT2TIS-US 45, eIT2FNN-LSTM 46, and FLSTM. The best result is highlighted in boldface. Table 11 demonstrates that FLSTM outperforms the other methods for the Miles-Per-Gallon prediction problem in terms of RMSE. In comparison with the state-of-the-art method eIT2FNN-LSTM, FLSTM has an RMSE decrease of 9.7%. The experiment shows that FLSTM has significant advantages over these prediction models.

Ablation study
To demonstrate the respective roles of the different components of the proposed method, namely the fuzzy prediction fusion (FPF), the strengthening memory layer (SML), and the parameter segment sharing (PSS) strategy, an ablation study is carried out on the ETTh1 dataset. For a finer analysis, the experimental results for different combinations of LSTM, FPF, SML, and PSS are shown in Tables 12 and 13 for different prediction lengths.
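
The model variants that such an ablation can compare are easy to enumerate mechanically. The following Python sketch lists every combination of the base LSTM with the three components. The function name and the "+"-joined labels are illustrative assumptions, and the source does not state whether Tables 12 and 13 report all eight variants.

```python
from itertools import combinations

# Component abbreviations as used in the text.
COMPONENTS = ("FPF", "SML", "PSS")


def ablation_variants(base="LSTM", components=COMPONENTS):
    """Return labels for the base model combined with every subset of components."""
    variants = []
    for k in range(len(components) + 1):
        for subset in combinations(components, k):
            variants.append("+".join((base,) + subset))
    return variants


VARIANTS = ablation_variants()
```

The last entry, `LSTM+FPF+SML+PSS`, corresponds to the full proposed method, and the first entry is the plain LSTM baseline.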
Results presented in Tables 12 and 13 reveal that the proposed method FLSTM achieves the best results among all combinations of LSTM, FPF, SML, and PSS for short-term and long-term predictions in terms of MSE and MAE. The three combinations with different components all improve the accuracy of LSTM, which verifies the respective roles of FPF, SML, and PSS. Although some combinations with two components, such as LSTM+SML+PSS, obtain the same best results as the proposed method at short-term prediction lengths, performance drops for all long-term prediction lengths when any one component is removed from the proposed method. This is attributed to the fact that each component has a positive impact on prediction capacity. The proposed method gathers the benefits of the three components and achieves the best performance for all prediction lengths.

fuzzy inference prediction method LFIGFIS, a fuzzy prediction method FPFTS, and a hybrid method MLP-Arima, a nonlinear autoregressive neural network NAR. In comparison with the classical prediction method, FLSTM achieves better prediction performance across all datasets. In comparison with the hybrid method, FLSTM achieves better prediction performance for all prediction lengths. In comparison with the deep learning-based prediction methods, FLSTM beats these methods in winning-counts. In comparison with the fuzzy prediction method, FLSTM also performs better in terms of winning-count. The experiments show the success of FLSTM in improving the prediction capacity for long-term prediction.

FLSTM has disadvantages in computational complexity. FLSTM can only predict one step at a time, so the time cost grows as the prediction length increases. The fixed fuzzy rule generation mechanism also limits the flexibility of prediction. These limitations also provide ideas for future research, which will include the following: (1) supporting multi-step prediction at a time; (2) providing fuzzy reasoning with different cycle lengths; (3) extending the LSTM network to more complex data; (4) applying the proposed method to other appealing directions.
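
One-step-at-a-time prediction implies that the number of model calls grows linearly with the prediction length. The following Python sketch illustrates this, assuming that each predicted value is fed back into the input window. That feedback scheme is an assumption, since the source does not describe the rollout procedure. `one_step_model` is a hypothetical stand-in for a trained one-step predictor, not the authors' FLSTM implementation.

```python
def recursive_forecast(one_step_model, history, horizon, window):
    """Roll a one-step predictor forward for `horizon` steps.

    one_step_model: callable mapping a list of `window` values to the next value
                    (hypothetical stand-in for a trained model).
    history:        observed values, oldest first.
    """
    if window < 1 or len(history) < window:
        raise ValueError("history must contain at least `window` values")
    buffer = list(history[-window:])
    predictions = []
    for _ in range(horizon):              # one model call per predicted step
        next_value = one_step_model(buffer)
        predictions.append(next_value)
        # Slide the window: drop the oldest value, append the new prediction.
        buffer = buffer[1:] + [next_value]
    return predictions


# Toy usage: a "model" that adds 1 to the latest value, counting its calls.
calls = []

def toy_model(window_values):
    calls.append(1)
    return window_values[-1] + 1

forecast = recursive_forecast(toy_model, [1, 2, 3], horizon=3, window=2)
```

In the toy usage the model is invoked exactly once per predicted step, which is the linear growth in time cost noted above.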

Figure 1. Framework of the proposed fuzzy prediction model.

Figure 6. Illustration of the daily number of Covid-19 cases time series.

Figure 7. Illustration of the Zuerich monthly sunspot numbers time series.

Figure 8. Illustration of the Melbourne daily maximum temperatures time series.
parameter segment sharing strategy to enhance the accuracy and interpretability of LSTM for long-term time series prediction. First, the proposed fuzzy prediction model is utilized to obtain the fuzzy rule-based prediction value. This information is fused into the input gate, forget gate, and output gate of LSTM. Then, the strengthening memory layer sums the hidden state and the cell state to form a new state, and extracts more effective features from it using convolution and tanh functions. The new state and its feature-extracted counterpart are added to generate the strengthening memory state. The parameter segment sharing strategy can be flexibly adjusted to different datasets and to various prediction cycles and lengths, improving the model's ability to extract periodic features from time series data and effectively managing the growth in the number of network parameters. Algorithm 1 shows the details of the FLSTM model.
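
The data flow of the strengthening memory layer described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the states are treated as single-channel vectors, the convolution is a zero-padded one-dimensional cross-correlation with a hypothetical odd-length kernel, and all function names are invented for the example. The experiments in the paper use a Conv1d with 64 input and output channels.

```python
import math


def conv1d_same(signal, kernel):
    """Zero-padded 1-D cross-correlation whose output has the input's length."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def strengthening_memory(hidden, cell, kernel):
    """Sketch: s = h + c, f = tanh(conv(s)), output = s + f."""
    if len(hidden) != len(cell):
        raise ValueError("hidden and cell states must have the same length")
    # Step 1: sum the hidden state and the cell state to form a new state.
    new_state = [h + c for h, c in zip(hidden, cell)]
    # Step 2: extract features with a convolution followed by tanh.
    features = [math.tanh(v) for v in conv1d_same(new_state, kernel)]
    # Step 3: add the new state and its extracted features.
    return [s + f for s, f in zip(new_state, features)]


# Toy usage with an identity kernel (assumed values, for illustration only).
memory = strengthening_memory([0.5, 0.0], [0.5, 0.0], kernel=[0.0, 1.0, 0.0])
```

The additive path in step 3 acts like a residual connection, so the summed state reaches the output unchanged alongside the extracted features.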

Table 1. Time series forecasting results on the ETTh1 dataset.

Table 2. Time series forecasting results on the ETTh2 dataset.

Table 3. Time series forecasting results on the ETTm1 dataset.

Table 4. Time series forecasting results on the ETTm2 dataset.

Table 5. Time series forecasting results on PM2.5 time series of Beijing.

Table 6. Time series forecasting results on PM2.5 time series of Shanghai.

Table 7. Time series forecasting results on the daily number of Covid-19 cases time series.

Table 9. Time series forecasting results on maximum temperatures.

Table 10. Time series forecasting results on abalone age prediction.

Figure 10. Illustration of the Miles-Per-Gallon time series.

Table 11. Time series forecasting results on Miles-Per-Gallon prediction.