Hydrological time series prediction based on IWOA-ALSTM

The prediction of hydrological time series is of great significance for developing flood and drought prevention approaches and is an important component in research on smart water resources. The nonlinear characteristics of hydrological time series are important factors affecting the accuracy of predictions. To enhance the prediction of the nonlinear component in hydrological time series, we employed an improved whale optimisation algorithm (IWOA) to optimise an attention-based long short-term memory (ALSTM) network. The proposed model is termed IWOA-ALSTM. Specifically, we introduced an attention mechanism between two LSTM layers, enabling adaptive focus on distinct features within each time unit to gather information pertaining to a hydrological time series. Furthermore, given the critical impact of the model hyperparameter configuration on the prediction accuracy and operational efficiency, the proposed improved whale optimisation algorithm facilitates the discovery of optimal hyperparameters for the ALSTM model. In this work, we used nonlinear water level information obtained from Hankou station as experimental data. The results of this model were compared with those of genetic algorithms, particle swarm optimisation algorithms and whale optimisation algorithms. The experiments were conducted using five evaluation metrics, namely, the RMSE, MAE, NSE, SI and DR. The results show that the IWOA is effective at optimising the ALSTM and significantly improves the prediction accuracy of nonlinear hydrological time series.


Basic methods, LSTM concepts and models
LSTM is a special recurrent neural network (RNN) 18 that addresses the gradient explosion and vanishing problems RNNs encounter when processing time series by introducing a gate mechanism to control which information is retained and which is forgotten 19 . Compared with a conventional RNN, LSTM uses three cyclic gating units, an input gate, an output gate and a forget gate, to fully exploit the features of the temporal data. A schematic diagram of the LSTM structure of a single cell is shown in Fig. 1.
In the structure diagram, h t−1 denotes the hidden state of the neuron at moment t − 1 , C t−1 denotes the state of the memory unit of the neuron at moment t − 1 , x t denotes the input value at moment t, f t , i t , and o t denote the forget gate, the input gate, and the output gate, respectively, and C t denotes the status update value of the memory cell.
The forget gate f t in the LSTM determines which information the memory unit should discard. The forget gate reads the previous hidden state h t−1 together with the input value x t and outputs a vector of values between 0 and 1, where 0 indicates that all information in the memory cell C t−1 is forgotten and 1 indicates that all information is retained.
The input gate i t determines whether new information is added to the memory cell. First, the input value x t and the previous hidden state h t−1 are passed into the sigmoid activation function, which outputs a vector i t with the same range of values as f t .
Then, using the input value x t and the information h t−1 of the previous hidden layer, a new state value is output by the tanh activation function.
Then, the memory cell C t−1 is updated. C t−1 * f t multiplies the previous cell state by the forget gate to determine the information forgotten from C t−1 ; i t * C t multiplies the sigmoid output by the tanh output to determine the information added to the memory cell. The two parts are summed to obtain the new information, which updates the cell state. The output gate o t determines the information passed to the next hidden layer. First, the input value x t and the previous hidden state h t−1 are fed into the sigmoid activation function.
Then, the tanh activation function is applied to the updated memory cell information C t , and finally the two activation function values are multiplied to obtain the state variable h t of the current hidden layer.
The internal structure of an LSTM network is more complex than that of a conventional RNN. The internal memory unit in LSTM can freely select the content of its memory in each time step, thus alleviating the gradient explosion and vanishing problems of RNNs and making LSTM more suitable for processing time series.
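The gate computations described above can be condensed into a single step function. The sketch below is a minimal NumPy implementation of one LSTM cell update; stacking the four gate transforms into single weight matrices W, U and bias b is a common implementation layout assumed here, not a detail taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U and b hold the stacked parameters of the
    forget (f), input (i), candidate (g) and output (o) transforms."""
    z = W @ x_t + U @ h_prev + b          # shape (4*hidden,)
    H = h_prev.shape[0]
    f = sigmoid(z[0:H])                   # forget gate: what to discard from c_prev
    i = sigmoid(z[H:2 * H])               # input gate: what new information to admit
    g = np.tanh(z[2 * H:3 * H])           # candidate cell state
    o = sigmoid(z[3 * H:4 * H])           # output gate: what to expose as h_t
    c_t = f * c_prev + i * g              # summing the two parts updates the memory cell
    h_t = o * np.tanh(c_t)                # new hidden state
    return h_t, c_t
```

Because o lies in (0, 1) and tanh in (−1, 1), the hidden state h t is always bounded in (−1, 1), which is part of what keeps gradients well behaved.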

Attention mechanism in LSTM
The long short-term memory (LSTM) neural network is adept at capturing long-term dependencies within hydrological time series data. The LSTM architecture incorporates gate units that enable the network to retain contextual memory from the hydrological time series, making it a widely employed technique for hydrological time series prediction. Nevertheless, LSTM processes sequential information incrementally during prediction, treating the input from each time step and each feature equally. In practice, however, the proximity of the time intervals significantly influences the prediction outcome. To address this issue, in this paper, an attention mechanism is introduced into the LSTM neural network. By placing the attention mechanism between two LSTM layers, the importance of the various features at each time step can be assessed through attention weights, enabling the adaptive selection of input vectors with varying degrees of relevance and thereby enhancing the prediction accuracy.
The ALSTM model consists of six parts: the input layer, the first LSTM layer, the attention layer, the second LSTM layer, the fully connected layer and the output layer. The ALSTM hydrological prediction model is shown in Fig. 2. According to the characteristics of hydrological time series, 12 months is usually used as the observation period. The observed hydrological data, such as water level and water potential, are formed into a sequence of feature vectors [x 1 , x 2 , . . ., x n ] , where x i ∈ R N and N is the number of features in the sequence data. The feature vector sequence x = [x 1 , x 2 , . . ., x n ] is passed through the first LSTM layer to obtain the new hidden layer vector h 1 1 , h 1 2 , . . ., h 1 n and memory cell vector c 1 1 , c 1 2 , . . ., c 1 n at moment t.
The attention layer computes a weighted sum over the input feature vector sequence x together with the hidden layer vector and memory cell vector output by the first LSTM layer to obtain the attention vector.
The attention weight a i represents the importance of the ith feature at moment t; the softmax function normalises the weights so that all the attention weights sum to 1.
where ω T ∈ R N , W ∈ R N×HiddenSize , and U ∈ R N×N are weight matrices, and b ∈ R N is the bias term learned by the attention mechanism.
The attention layer assigns weights to the input feature vector sequence x to obtain a new vector sequence. This new sequence of feature vectors is used as the input to the second LSTM layer, and the information in the sequence data is extracted, memorised and learned. The mapping from x t to h 2 t at time t is obtained through the learning of the second LSTM layer.
The hidden layer vector output by the second LSTM layer is dimensionally transformed in the fully connected layer. Finally, in the output layer, the output length is selected, and the result is transformed into the predicted dimensionality.
where [y 1 , y 2 , . . ., y m ] denotes the prediction results in m time periods, H denotes the input matrix formed by h 2 1 , h 2 2 , . . ., h 2 n , W d denotes the weight matrix, and activation denotes the activation function.
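As a rough illustration of the attention layer described above, the sketch below scores the N input features at one time step from the first LSTM layer's hidden state and normalises the scores with a softmax. The weight shapes follow the dimensions listed earlier (ω ∈ R N, W ∈ R N×HiddenSize, U ∈ R N×N, b ∈ R N), but the scoring function itself is an assumed form, since the paper's exact equations are not reproduced here.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: weights are positive and sum to 1."""
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_weights(x_t, h_t, W, U, omega, b):
    """Score each of the N input features at one time step from the
    first LSTM layer's hidden state h_t and the raw features x_t.
    Illustrative scoring form, consistent with the stated weight shapes."""
    e = omega * np.tanh(W @ h_t + U @ x_t + b)   # one score per feature
    return softmax(e)

def apply_attention(x_t, a):
    """Reweight the feature vector before it enters the second LSTM layer."""
    return a * x_t
```

In the full model this reweighting is applied at every time step, so features that matter more for the current prediction receive proportionally larger inputs in the second LSTM layer.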

Whale optimisation algorithm
The whale optimisation algorithm (WOA) is a novel population intelligence optimisation algorithm that was proposed in 2016 20 . The algorithm simulates the bubble-net feeding method used by humpback whales when hunting, and its performance far exceeds that of traditional algorithms. The algorithm consists of three main stages: surrounding prey, performing a bubble-net attack, and searching for prey.

Surrounding prey
When humpback whales locate their prey, they swim towards the best-positioned whale to surround it. It is assumed that the best-positioned humpback whale indicates the target prey.
where t indicates the current number of iterations, A and C are coefficient vectors, X * is the position vector of the best humpback whale found so far, and X is the current humpback whale position vector. The optimal solution X * is updated during the iterative process.
The convergence factor a decreases linearly from 2 to 0 as the number of iterations increases.
where t indicates the current number of iterations and T max is the maximum number of iterations.

Performing a bubble-net attack
When hunting, whales blow bubbles to form bubble nets that drive their prey, and the following mathematical model is used to simulate this predatory behaviour. Shrinking encirclement mechanism: the behaviour of whales closing in on their prey is simulated by decreasing the value of a during the iterative process.
A also decreases as a decreases and fluctuates within the interval [−a, a]. When A is a random value in [−1, 1], the new position of the humpback whale can be any position between its original position and the position of the current optimal individual.
Spiral position update: the distance between the humpback whale at (X, Y) and the prey at (X * , Y * ) is first calculated, and then the spiral movement performed by the humpback whale is modelled using the spiral equation.
where (X, Y) is the position of the humpback whale, (X * , Y * ) is the position of the prey, D ′ denotes the distance between the ith humpback whale and the target prey, b is a constant, and l is a random number in [−1, 1].
As a humpback whale closes in on its prey, it simultaneously shrinks the encircling circle and follows a spiral path. To model these two simultaneous behaviours, each is selected with the same probability when updating the whale's position.

Searching for prey
In the search phase, i.e., when |A| > 1, the humpback whale does not move towards the position of the best individual in the population. Instead, the position of a randomly selected humpback whale is used as the reference for the update, with the aim of conducting a global search.
where X rand is the position vector of a randomly selected humpback whale and D is the distance between the current humpback whale and the randomly selected one.
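The three behaviours above (encircling, spiral bubble-net attack, and random search) can be condensed into a single position-update routine. The following is a generic WOA sketch, not the paper's code: the linear convergence factor and the equal-probability choice between encircling and the spiral follow the standard WOA, and the norm test on the coefficient vector A is a simplification of the |A| > 1 condition.

```python
import numpy as np

def woa_step(X, X_best, t, T_max, b=1.0, rng=None):
    """One WOA position update for every whale in X (shape: n x d)."""
    rng = rng or np.random.default_rng()
    n, d = X.shape
    a = 2.0 * (1.0 - t / T_max)                 # linear convergence factor, 2 -> 0
    X_new = np.empty_like(X)
    for i in range(n):
        if rng.random() < 0.5:                  # shrinking encirclement / search
            r = rng.random(d)
            A = 2 * a * r - a                   # A fluctuates in [-a, a]
            C = 2 * rng.random(d)
            if np.linalg.norm(A) < 1:           # exploit: encircle the best whale
                D = np.abs(C * X_best - X[i])
                X_new[i] = X_best - A * D
            else:                               # explore: reference a random whale
                X_rand = X[rng.integers(n)]
                D = np.abs(C * X_rand - X[i])
                X_new[i] = X_rand - A * D
        else:                                   # spiral (bubble-net) update
            l = rng.uniform(-1, 1)
            D_prime = np.abs(X_best - X[i])
            X_new[i] = D_prime * np.exp(b * l) * np.cos(2 * np.pi * l) + X_best
    return X_new
```

Because a shrinks over the iterations, |A| > 1 (global search) becomes progressively less likely, which is how the standard WOA shifts from exploration to exploitation.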

Improved whale optimisation algorithm
The whale optimisation algorithm achieves good convergence accuracy and convergence speed and has the advantages of simple operation and few parameters. However, the algorithm suffers from an imbalance between its global search ability and its local exploitation ability, and it easily falls into local optima. A high-quality initial population contributes significantly to the solution accuracy and convergence speed of the algorithm. To further improve the accuracy of the whale optimisation algorithm, an improved whale optimisation algorithm named the IWOA is proposed. The algorithm first uses a reverse learning approach to initialise the population and then applies a nonlinear convergence factor during the position updates, achieving a balance between the global and local search capabilities.

Reverse learning to initialise populations
The quality of the initial population directly affects the subsequent iterations of the algorithm, and a high-quality population can effectively improve the convergence speed and accuracy of the iterative process. Because of the stochastic nature of intelligent optimisation algorithms, the initial population of the original WOA is generated randomly, making the WOA inefficient in its search. To ensure the diversity of the initial population, a reverse learning approach is introduced into the WOA. The N individuals of the initial population are combined with the N individuals obtained after reverse learning to form a new population of 2N individuals, and the N individuals with the greatest diversity are then selected from this new population by a ranking algorithm to form the new initial population.
Reverse learning 21 is based on specifying the range boundaries of the variables and finding their corresponding reverse solutions via certain rules. If the size of the whale population is N and the search space is d-dimensional, the position of the ith whale in the d-dimensional space can be expressed as X i = (x 1 i , x 2 i , . . ., x d i ) (i = 1, 2, . . ., N). Here, a j i and b j i (j = 1, 2, . . ., d) denote the lower and upper bounds of x j i , respectively, and the reverse solution is generated with respect to the midpoint of each range, scaled by a random number rand in [0, 1]: if rand is 1, the position of the whale is unchanged, and if rand is 0, the position becomes the midpoint of the upper and lower bounds. A new individual is created after each initial population individual is reverse-learned, so the combined population contains 2N individuals. The individuals in the population are then stratified by a noninferiority sorting algorithm, which divides them into L levels according to the dominance relationships between individuals. Individuals at the same level have the same rank, denoted Level i ; the individuals in the first level, Level 1 , have the highest rank. For individuals within the same level, the crowding distance sorting method is used.
where Dist(i) denotes the crowding distance of an individual, with the initial value Dist(i) = 0. f Mmax and f Mmin are the maximum and minimum values of the Mth objective function, respectively, and f M (i + 1) and f M (i − 1) are the values of the Mth objective function for the two individuals adjacent to i on the same level. The individuals of the population are ordered as follows: each individual has two attributes, Level(i) and Dist(i). For any two individuals i and j at different levels, Level(i) and Level(j) are compared; if Level(i) < Level(j), then i is ranked higher than j. When they are at the same level, the crowding distances are compared, and the individual with the greater crowding distance is retained; that is, when Dist(i) > Dist(j), i is ranked higher than j.
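The crowding distance described above can be sketched as follows. Assigning infinite distance to the boundary individuals on each objective, so they are always retained, is a standard convention assumed here.

```python
import numpy as np

def crowding_distance(F):
    """Crowding distance for the individuals on one level.
    F has shape (n_individuals, n_objectives)."""
    n, M = F.shape
    dist = np.zeros(n)
    for m in range(M):
        order = np.argsort(F[:, m])                  # sort by the mth objective
        fmin, fmax = F[order[0], m], F[order[-1], m]
        dist[order[0]] = dist[order[-1]] = np.inf    # boundary individuals kept
        if fmax > fmin:
            for k in range(1, n - 1):
                i = order[k]
                # normalised gap between the two neighbours of i on this objective
                dist[i] += (F[order[k + 1], m] - F[order[k - 1], m]) / (fmax - fmin)
    return dist
```

Individuals in sparsely populated regions of objective space receive larger distances, so selecting by crowding distance within a level preserves population diversity.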

Convergence factor update
The traditional whale optimisation algorithm decides whether to perform a global or local search by means of the parameter A. However, the update of A relies mainly on the linear change of the convergence factor a. The linear transformation makes the convergence rate constant, so a nonlinear convergence factor is designed in this paper.
where T max denotes the maximum number of iterations, and µ and ϕ are the relevant parameters. In this paper, µ = 0.5 and ϕ = 0 are chosen.
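The paper's exact nonlinear expression is not reproduced here; as a purely illustrative stand-in, the sketch below contrasts the original linear factor with a hypothetical nonlinear decay that uses the reported parameter values µ = 0.5 and ϕ = 0.

```python
def a_linear(t, T_max):
    """Linear convergence factor used by the original WOA: 2 -> 0."""
    return 2.0 * (1.0 - t / T_max)

def a_nonlinear(t, T_max, mu=0.5, phi=0.0):
    """Hypothetical nonlinear convergence factor (illustrative only; not
    the paper's formula). With mu < 1, a decays faster in the early
    iterations, leaving more of the run for local exploitation."""
    return 2.0 * (1.0 - (t / T_max) ** mu) + phi
```

The point of any such nonlinear schedule is the same: by reshaping how a (and hence A) shrinks, the algorithm can rebalance how much of the run is spent on global search versus local exploitation.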

IWOA-optimised ALSTM model
The ALSTM model requires the determination of six main parameters: the number of nodes in the first LSTM hidden layer, the number of nodes in the second LSTM hidden layer, the number of nodes in the fully connected layer, the learning rate, the batch size and the number of iterations. With a large sample size, the prediction accuracy of a neural network model varies with the structure of the network. The learning rate determines the step size of the weight updates; too large a step size will prevent the model from converging, and too small a step size will slow convergence. A large batch size reduces the training time and improves the stability of the model, but beyond a point, increasing the batch size degrades performance; as the number of iterations increases, the neural network fits increasingly well and eventually overfits. In this paper, the IWOA is used to determine the above six parameters, and the optimised network parameters are used in the final prediction model. The structure of the IWOA-optimised ALSTM is shown in Fig. 3.
The optimisation of the proposed ALSTM model using the IWOA proceeds in six main steps, which are explained below.
• Step 1: The maximum and minimum boundary values for the number of LSTM hidden layer neurons, the number of fully connected neurons, the learning rate, the batch size, and the number of iterations are set, and the IWOA selects the minimum boundary value as the initial value and encodes it.
• Step 2: The IWOA is initialised with parameters such as the population size N, the maximum number of iterations T max and the probability p. The initial population is selected by reverse learning and individual sorting, and the parameters are passed into the ALSTM model, which calculates the fitness value of the model and derives the current optimal solution X * .
• Step 3: The IWOA updates the population and the parameters a and A. If |A| > 1, a global search is performed; otherwise, a local search is performed, and the population update is completed.
• Step 4: The updated population parameters are passed into the ALSTM, and the fitness value is calculated. If the new fitness value is smaller, it overwrites the current optimal solution and its corresponding fitness value; otherwise, the current solution and its fitness are retained.
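The steps above can be condensed into the following skeleton. Everything specific in it is an illustrative assumption: the search bounds, the cheap stand-in fitness function (in practice, this would train the ALSTM with the decoded hyperparameters and return its validation error), the linear convergence factor, and the fitness-based selection of the opposition candidates.

```python
import numpy as np

# Illustrative search ranges for the six ALSTM hyperparameters:
# [lstm1 units, lstm2 units, dense units, learning rate, batch size, iterations]
LOWER = np.array([16.0, 16.0, 8.0, 1e-4, 16.0, 10.0])
UPPER = np.array([256.0, 256.0, 128.0, 1e-1, 256.0, 200.0])

def fitness(params):
    """Stand-in for 'train the ALSTM with these hyperparameters and
    return the validation error'; a cheap quadratic bowl is used here."""
    x = (params - LOWER) / (UPPER - LOWER)
    return float(np.sum((x - 0.3) ** 2))

def iwoa_search(n_whales=10, T_max=30, seed=0):
    rng = np.random.default_rng(seed)
    d = len(LOWER)
    # Steps 1-2: random population plus its reverse solutions, keep the best half.
    X = rng.uniform(LOWER, UPPER, size=(n_whales, d))
    X = np.vstack([X, LOWER + UPPER - X])               # opposition-based candidates
    X = X[np.argsort([fitness(x) for x in X])][:n_whales]
    best = X[0].copy()
    for t in range(T_max):
        a = 2.0 * (1.0 - t / T_max)                     # convergence factor
        for i in range(n_whales):
            A = 2 * a * rng.random(d) - a
            C = 2 * rng.random(d)
            if rng.random() < 0.5:                      # Step 3: encircle or explore
                ref = best if np.linalg.norm(A) < 1 else X[rng.integers(n_whales)]
                X[i] = ref - A * np.abs(C * ref - X[i])
            else:                                       # spiral update
                l = rng.uniform(-1, 1)
                X[i] = np.abs(best - X[i]) * np.exp(l) * np.cos(2 * np.pi * l) + best
            X[i] = np.clip(X[i], LOWER, UPPER)          # keep within bounds
            if fitness(X[i]) < fitness(best):           # Steps 4-5: keep the best
                best = X[i].copy()
    return best                                         # Step 6: build the final ALSTM
```

In the real model, the integer-valued hyperparameters (unit counts, batch size, iterations) would be rounded before being handed to the ALSTM; that decoding step is omitted here.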

Experiment and analysis
To investigate the performance of the IWOA-ALSTM model, we verify the iterative convergence behaviour of the IWOA convergence factor and examine whether the IWOA can effectively perform a hyperparameter search for the ALSTM and predict the nonlinear components of the water level time series. In this section, experiments are conducted on the nonlinear subcomponent series of hourly water level data from Hankou station in the Yangtze River Basin.

Data and environment
The hydrological dataset for this experiment is derived from the nonlinear components of the measured hydrological data from Hankou station in the Yangtze River Basin, with the raw data collected at 60-minute intervals. The data span from 8:00 on June 17, 2016, to 8:00 on June 16, 2017, for a total of 8736 experimental samples (including 106 missing values). The first 7862 samples were selected as the training data, and the final 874 samples were used as the test set to evaluate the accuracy of the model predictions. The nonlinear component variation curves of the water level at Hankou station are shown in Fig. 4.

Missing value filling
For missing values in the water level data, linear interpolation was used to fill in the values.
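A minimal sketch of this gap filling with pandas, on hypothetical hourly readings (the values and timestamps below are illustrative, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical hourly water levels with two missing readings.
levels = pd.Series(
    [27.10, np.nan, 27.30, 27.45, np.nan, 27.80],
    index=pd.date_range("2016-06-17 08:00", periods=6, freq="60min"),
)
filled = levels.interpolate(method="linear")  # straight line between neighbours
```

Linear interpolation replaces each missing reading with the value on the straight line between its nearest valid neighbours, which is reasonable for slowly varying hourly water levels.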

Data normalisation
As observed in the graphs of water level changes, the water levels fluctuate over a wide range and have different magnitudes. The data are therefore processed using min-max normalisation and scaled to the [0, 1] range.
where X max and X min are the maximum and minimum values of the sequence, respectively, X ′ i denotes the normalised value, and X i denotes the corresponding element of the original sequence.

In terms of the prediction accuracy, the RMSE and MAE of the IWOA-ALSTM model are 0.868 and 0.012 lower, respectively, than those of the WOA-ALSTM model, the NSE is 0.0058 greater, and the SI of the IWOA-ALSTM model is better than that of the WOA-ALSTM model. The DR results of the IWOA-ALSTM model are the same as those of the WOA-ALSTM model but better than those of the PSO-ALSTM and GA-ALSTM models. Overall, GA-ALSTM yields the worst results and IWOA-ALSTM the best; PSO-ALSTM performs better than GA-ALSTM but worse than WOA-ALSTM. The GA and PSO algorithms are less effective at finding the best ALSTM configuration, and the WOA-ALSTM is significantly inferior to the IWOA-ALSTM. In terms of operational efficiency, the overall time overhead is smallest for WOA-ALSTM, followed by IWOA-ALSTM. Although the training time of the IWOA-ALSTM model is slightly longer than that of the WOA-ALSTM model, it improves the prediction accuracy of the water level data. Thus, the experiments show that using the IWOA to search for ALSTM hyperparameters yields better values than using the GA, the PSO algorithm or the WOA, while improving the accuracy of hydrological time series predictions.
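Of the five evaluation metrics, the RMSE, MAE and NSE have unambiguous standard definitions, sketched below; the SI and DR are defined in several different ways in the literature and are therefore omitted here.

```python
import numpy as np

def rmse(obs, pred):
    """Root mean square error: penalises large deviations quadratically."""
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def mae(obs, pred):
    """Mean absolute error: average magnitude of the deviations."""
    return float(np.mean(np.abs(obs - pred)))

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 means the model is
    no better than always predicting the mean of the observations."""
    return float(1 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2))
```

Lower RMSE and MAE and higher NSE indicate better predictions, which is the sense in which the comparisons in the paragraph above should be read.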
The above experiments show that the designed IWOA is slightly inferior to the WOA in terms of the search time but greatly improves the prediction accuracy. This result shows that the improved whale optimisation algorithm can effectively perform a parameter search for the designed ALSTM model and can improve the predictive power of the model.

Conclusion
In this paper, an improved whale optimisation algorithm named the IWOA is proposed. The algorithm is used to perform a hyperparameter optimisation search for the ALSTM model, and experiments are carried out on nonlinear water level component data from Hankou station. Five evaluation metrics, i.e., the RMSE, MAE, NSE, SI and DR, are used to validate the accuracy of the proposed method. The experimental results show that the method proposed in this paper achieves the highest prediction accuracy. The IWOA-ALSTM model utilises an attention mechanism to extract the more important features and the powerful parameter optimisation ability of the IWOA to achieve more accurate prediction of hydrological time series.
Although this study presents promising results, there are some questions that deserve further consideration and exploration in the future:


• Step 5: It is determined whether the training has reached the maximum number of iterations T max . If so, the optimal ALSTM hyperparameters are obtained and assigned to the ALSTM model; if the number of iterations is less than T max , the process returns to Step 3.
• Step 6: An ALSTM model is built based on the obtained optimal ALSTM hyperparameters to predict and analyse the hydrological time series.

Table 1 .
Error values and training times for the four algorithms.