Introduction

In recent years, the frequent occurrence of extreme rainfall events has posed a significant threat to socioeconomic well-being and human safety. Accurate simulation of rainfall processes holds crucial significance for water resources management1,2. The formation of precipitation is influenced by multiple factors, exhibiting a high degree of complexity and uncertainty3,4. Existing rainfall prediction models can generally be classified into two main categories: process-driven and data-driven5. The latter, data-driven models, operate without the need to consider the physical mechanisms underlying runoff occurrence. Instead, they solely entail mathematical analysis of time series data to establish functional relationships between input and output variables. Consequently, these models exhibit greater operational feasibility6,7,8. With the development of artificial intelligence and big data in recent years, the application of machine learning has presented new opportunities for rainfall prediction9,10. The Bidirectional Long Short-Term Memory Neural Network (BiLSTM) model, characterized by its simple structure, strong fault tolerance, and ability to capture long-term dependencies, has achieved successful applications in numerous rainfall prediction studies11,12,13. However, the predictive capability of a single model is limited. Coupling machine learning models with data decomposition methods such as ensemble empirical mode decomposition, wavelet decomposition, and variational mode decomposition (VMD) to establish decomposition ensemble models can effectively enhance model accuracy14. Among these methods, VMD demonstrates the ability to control central frequency aliasing phenomena and noise levels, making it easier to improve the predictive performance of decomposition ensemble models15.

Establishing coupled forecasting models to enhance rainfall prediction accuracy has emerged as a current research focus16,17,18. However, traditional decomposition methods typically involve initially decomposing the entire rainfall sequence and then partitioning the decomposed sub-sequences into training and testing periods. This approach of decomposition before partitioning leads to the premature utilization of testing period data prior to model training, which falls short of meeting actual forecasting requirements. Comparative analyses conducted by Du et al.19 and Wei et al.20 scrutinized the outcomes of various ensemble hydrological prediction models employing decomposition. They discovered that treating forecast factor data from the testing period as known information for constructing ensemble decomposition models results in "false" high-precision prediction outcomes. Some scholars have proposed stepwise decomposition techniques, wherein the observed sequence is initially divided into training and testing periods, followed by decomposition modeling of the training period, thus preventing the incorporation of future information into the training samples. Models developed based on stepwise decomposition techniques demonstrate reliable performance21,22. Wei et al. introduced a stepwise decomposition sampling technique to construct accurate decomposition ensemble models, aiming to enhance the prediction accuracy of correctly decomposed ensemble models20.

This study utilizes weekly rainfall data from the Nansi Lake Basin and employs a stepwise decomposition technique combined with variational mode decomposition (VMD) to construct training and testing sets for rainfall prediction models. A VMD-BiLSTM coupled model is developed to forecast the weekly rainfall sequence in the Nansi Lake Basin, with model parameters optimized using an improved particle swarm optimization (IPSO) algorithm. This model effectively addresses the issue of traditional ensemble decomposition models incorporating future information, thereby enhancing the prediction accuracy of decomposition ensemble models.

Models and methods

Variational mode decomposition

The variational mode decomposition (VMD)23 method is an innovative, fully non-recursive data decomposition approach that is adaptive in nature. This method achieves the decomposition of the original signal x into a series of Intrinsic Mode Functions (IMFs) by seeking the optimal solution to a constrained variational problem.

$$ \left\{ {\begin{array}{*{20}l} {\mathop {\min }\limits_{{\left\{ {\mu_{k} } \right\},\left\{ {\omega_{k} } \right\}}} \sum\limits_{k = 1}^{K} {\left\| {\partial_{t} \left[ {\left( {\delta \left( t \right) + \frac{j}{\pi t}} \right) \otimes \mu_{k} \left( t \right)} \right]e^{{ - j\omega_{k} t}} } \right\|_{2}^{2} } } \hfill \\ {\sum\limits_{k = 1}^{K} {\mu_{k} = x} } \hfill \\ \end{array} } \right., $$
(1)

where \(K\) is the number of IMFs; \(\left\{ {\mu_{k} (t)} \right\} = \left\{ {\mu_{1} ,\mu_{2} , \cdots ,\mu_{K} } \right\}\) is the set of modal components; \(\mu_{k} \left( t \right)\) is the value of the \(k\)th modal component at time \(t\); \(\left\{ {\omega_{k} } \right\} = \left\{ {\omega_{1} ,\omega_{2} , \cdots ,\omega_{K} } \right\}\) are the corresponding central frequencies; \(t\) stands for time; \(\partial_{t}\) denotes the first-order partial derivative with respect to time \(t\); \(\delta (t)\) is the unit impulse function; \(j\) is the imaginary unit; \(\otimes\) signifies the convolution operation.

$$ \begin{gathered} L(\left\{ {\mu_{k} } \right\},\left\{ {\omega_{k} } \right\},\lambda ) = \alpha \sum\limits_{k = 1}^{K} {\left\| {\partial_{t} \left[ {\left( {\delta \left( t \right) + \frac{j}{\pi t}} \right) \otimes \mu_{k} \left( t \right)} \right]e^{{ - j\omega_{k} t}} } \right\|_{2}^{2} } + \hfill \\ \left\| {x(t) - \sum\limits_{k = 1}^{K} {\mu_{k} (t)} } \right\|_{2}^{2} + \left\langle {\lambda (t),x(t) - \sum\limits_{k = 1}^{K} {\mu_{k} (t)} } \right\rangle , \hfill \\ \end{gathered} $$
(2)

where \(\alpha\) is the quadratic penalty factor and \(\lambda\) is the Lagrange multiplier; \(\lambda \left( t \right)\) is the value of \(\lambda\) at time \(t\), and \(x\left( t \right)\) is the value of \(x\) at time \(t\). The alternating direction method of multipliers (ADMM) is employed to find the saddle point of Eq. (2).

Improved particle swarm optimization

Because the parameters \(\omega\), \(c_{1}\), and \(c_{2}\) in the basic particle swarm optimization algorithm are constants, the search is highly susceptible to becoming trapped in local optima, and the algorithm performs relatively poorly on problems with multiple objective functions and constraints. An improvement to the basic particle swarm optimization algorithm is therefore proposed to make it better suited to multi-objective problem solving.

$$ \left\{ {\begin{array}{*{20}l} {\omega^{\prime} = \omega_{\min } + \left( {\omega_{\max } - \omega_{\min } } \right)\left( {\frac{{t_{cur} }}{{t_{\max } }}} \right)^{2} } \hfill \\ \begin{gathered} c^{\prime}_{1} = c_{1i} + (c_{1f} - c_{1i} )\sqrt {\frac{{t_{cur} }}{{t_{\max } }}} \hfill \\ c^{\prime}_{2} = c_{2i} + (c_{2f} - c_{2i} )\left( {\frac{{t_{cur} }}{{t_{\max } }}} \right)^{2} \hfill \\ \end{gathered} \hfill \\ \end{array} } \right., $$
(3)

where \(\omega^{\prime}\) is the improved inertia weight factor, with \(\omega_{\max }\) set to 0.9 and \(\omega_{\min }\) to 0.2; \(c^{\prime}_{1}\) and \(c^{\prime}_{2}\) are the refined learning factors; \(t_{cur}\) represents the current generation count; \(t_{\max }\) is the maximum number of iterations; \(c_{1f}\) and \(c_{2f}\) are the termination values for \(c_{1}\) and \(c_{2}\), set to 0.5 and 2, respectively; \(c_{1i}\) and \(c_{2i}\) are the initial values, taken as 2 and 0.5, respectively.

$$ \overline{P} = \frac{1}{N}\sum\limits_{i = 1}^{N} {P_{ij}^{t} } , $$
(4)

where \(\overline{P}\) is the mean of the personal best positions of all particles; \(N\) is the number of particles; \(P_{ij}^{t}\) is the personal best position of particle \(i\) in dimension \(j\) at iteration \(t\). The improved velocity update of the algorithm is:

$$ \upsilon_{ij}^{t + 1} = \omega^{\prime}\upsilon_{ij}^{t} + c^{\prime}_{1} r_{1} (\overline{P} - x_{ij}^{t} ) + c^{\prime}_{2} r_{2} (P_{gj}^{t} - x_{ij}^{t} ), $$
(5)

where \(\upsilon_{ij}^{t + 1}\) is the velocity of the particle; \(t\) is the iteration number; \(r_{1}\) and \(r_{2}\) are random numbers in the interval \([0,1]\); \(x_{ij}^{t}\) is the position of the particle at iteration \(t\); \(P_{gj}^{t}\) is the global best position found by the swarm so far.
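As a concrete illustration, the IPSO update rules of Eqs. (3)–(5) can be sketched as below. The function names (`ipso_coefficients`, `ipso_velocity`) are hypothetical helpers introduced for this sketch; the constants follow the values stated in the text (\(\omega \in [0.2, 0.9]\), \(c_{1}: 2 \to 0.5\), \(c_{2}: 0.5 \to 2\)).

```python
import random

# Constants as given in the text: inertia weight bounds and the
# initial/terminal values of the two learning factors.
W_MAX, W_MIN = 0.9, 0.2
C1_I, C1_F = 2.0, 0.5
C2_I, C2_F = 0.5, 2.0

def ipso_coefficients(t_cur, t_max):
    """Time-varying inertia weight and learning factors, Eq. (3)."""
    ratio = t_cur / t_max
    w = W_MIN + (W_MAX - W_MIN) * ratio ** 2
    c1 = C1_I + (C1_F - C1_I) * ratio ** 0.5
    c2 = C2_I + (C2_F - C2_I) * ratio ** 2
    return w, c1, c2

def ipso_velocity(v, x, p_mean, p_gbest, t_cur, t_max, rng=random):
    """Velocity update of Eq. (5). Note that the usual personal-best term
    is replaced by the swarm-average best position p_mean of Eq. (4)."""
    w, c1, c2 = ipso_coefficients(t_cur, t_max)
    r1, r2 = rng.random(), rng.random()
    return w * v + c1 * r1 * (p_mean - x) + c2 * r2 * (p_gbest - x)
```

At `t_cur = 0` the coefficients equal their initial values and at `t_cur = t_max` their terminal values, so the cognitive term decays while the social term grows over the run.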

Bidirectional long short-term memory neural network

LSTM24 is a deep neural network that learns long-term dependencies accurately and efficiently by introducing a gating mechanism, which lets the model selectively retain and transmit long-term temporal information25. As shown in Fig. 1, an LSTM unit consists of three gates, the input gate, the output gate, and the forget gate, together with one core computing node. The three gates jointly control the cell state, selectively adding information to it or removing information from it.

Figure 1

Structure of LSTM unit.

The BiLSTM26 network is composed of a forward and a backward LSTM, which train on the time series in both directions and effectively improve the comprehensiveness and completeness of feature extraction. The structure of BiLSTM27 is shown in Fig. 2.

Figure 2

BiLSTM structure.

The output \(\vec{h}_{t}\) of the forward LSTM layer in Fig. 2 is combined with the output \(\mathop{h}\limits^{\leftarrow} _{t}\) of the backward LSTM layer by weighted fusion to obtain the final output value \(O_{t}\). The BiLSTM equations are:

$$ \vec{h}_{t} = \overrightarrow {{L_{LSTM} }} (h_{t - 1} ,i_{t} ),\,\,\,t = 1,2, \cdots ,n $$
(6)
$$ \mathop{h}\limits^{\leftarrow} _{t} = \overleftarrow {{L_{LSTM} }} (h_{t + 1} ,i_{t} ),\,\,t = 1,2, \cdots ,n $$
(7)
$$ O_{t} = f\left( {W_{{\vec{h}}} \vec{h}_{t} + W_{{\mathop{h}\limits^{\leftarrow} }} \mathop{h}\limits^{\leftarrow} _{t} + b_{t} } \right), $$
(8)

where \(i_{t}\) is the input feature vector; \(\vec{h}_{t}\) and \(\mathop{h}\limits^{\leftarrow} _{t}\) are the forward and backward hidden-state outputs; \(\overrightarrow {{L_{LSTM} }} ( \cdot )\) and \(\overleftarrow {{L_{LSTM} }} ( \cdot )\) denote the forward and backward network computations; \(W_{{\vec{h}}}\) and \(W_{{\mathop{h}\limits^{\leftarrow} }}\) are the output connection weight matrices of the two directions; \(b_{t}\) is the output-layer bias; and \(O_{t}\) is the final prediction output of the network.
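The bidirectional recursion of Eqs. (6)–(8) can be sketched with a toy recurrent cell standing in for the full LSTM gates. This is a minimal illustration only: `toy_cell` is a hypothetical one-parameter stand-in, not the actual gated LSTM computation, and the fusion weights are arbitrary.

```python
import math

def toy_cell(h_prev, x, w_h=0.5, w_x=1.0):
    """Stand-in for the LSTM cell: one tanh recurrence instead of gates."""
    return math.tanh(w_h * h_prev + w_x * x)

def bilstm_sketch(inputs, w_f=0.6, w_b=0.4, b=0.0):
    n = len(inputs)
    h_fwd, h = [], 0.0
    for x in inputs:                  # Eq. (6): left-to-right pass
        h = toy_cell(h, x)
        h_fwd.append(h)
    h_bwd, h = [0.0] * n, 0.0
    for t in range(n - 1, -1, -1):    # Eq. (7): right-to-left pass
        h = toy_cell(h, inputs[t])
        h_bwd[t] = h
    # Eq. (8): weighted fusion of the two hidden states at each step
    return [w_f * hf + w_b * hb + b for hf, hb in zip(h_fwd, h_bwd)]
```

The design point the sketch preserves is that the output at step \(t\) sees context from both ends of the series, which is what gives BiLSTM its advantage over a unidirectional LSTM for feature extraction.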

Model construction

Stepwise decomposition for sample construction

  (1)

    The rainfall sequence \((S_{1} ,S_{2} , \cdots ,S_{N} )\) is divided into a training set \((S_{1} ,S_{2} , \cdots ,S_{P} )\) and a test set \((S_{P + 1} ,S_{P + 2} , \cdots ,S_{N} )\). As shown in Fig. 3, training-to-test ratios of 9:1, 8:2, and 7:3 are considered. The violin plots illustrate the distribution of the rainfall sequences under the three allocation ratios.

Figure 3

Scale violin diagram for the division of training and testing periods.

According to Fig. 3, the rainfall data for Heze, Jining, Xuzhou, and Zaozhuang exhibit relatively uniform overall distributions. The kernel density distributions of the training and testing datasets are closer under the 8:2 and 7:3 allocations. Considering the closest match in mean and standard deviation, as indicated in Table 1, the training-to-testing ratios are finally set to 7:3 for Heze and Zaozhuang and 8:2 for Jining and Xuzhou.

Table 1 Precipitation characteristic indexes in training set and testing set.
  (2)

    Sequence \((S_{1} ,S_{2} , \cdots ,S_{m} )\) is decomposed into \(K\) sub-sequences. Data point \(S_{m + 1}\) is appended to sequence \((S_{1} ,S_{2} , \cdots ,S_{m} )\) to form a new sequence \((S_{1} ,S_{2} , \cdots ,S_{m} ,S_{m + 1} )\), which is again decomposed into \(K\) sub-sequences. New data points are sequentially appended to \((S_{1} ,S_{2} , \cdots ,S_{m} )\) in this way, and each extended sequence is decomposed.

  (3)

    Each extended sequence is decomposed into \(K\) sub-sequences, and the last \(m\) elements of each sub-sequence are extracted as explanatory variables. These explanatory variables serve as input data for the coupled model, which is then fine-tuned to predict the values of the response variable.

  (4)

    The response variables of the sub-sequences obtained through decomposing sequence \((S_{1} ,S_{2} , \cdots ,S_{m} , \cdots ,S_{p - 1} ,S_{p} )\) are illustrated in Fig. 4, depicting the stepwise decomposition sampling technique12.
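The sampling procedure of steps (2)–(3) can be sketched as below. To keep the example self-contained, a simple moving-average split into trend and residual stands in for VMD; in the actual model each prefix would be decomposed into \(K\) IMFs by VMD instead. The helper names (`decompose`, `stepwise_samples`) and the parameters `m_start` and `m_lags` are hypothetical, introduced only for this illustration.

```python
def decompose(seq, window=3):
    """Stand-in for VMD: split a sequence into a moving-average trend
    and a residual, giving K = 2 sub-sequences that sum back to seq."""
    trend = [sum(seq[max(0, i - window + 1): i + 1]) /
             len(seq[max(0, i - window + 1): i + 1]) for i in range(len(seq))]
    resid = [s - t for s, t in zip(seq, trend)]
    return [trend, resid]

def stepwise_samples(series, m_start, m_lags):
    """For each growing prefix (S1..Sm), (S1..Sm+1), ..., decompose the
    prefix and keep the last m_lags values of every sub-sequence as
    explanatory variables; the next observation is the response.
    The response is never part of the decomposed prefix, so no future
    information leaks into the training samples."""
    samples = []
    for end in range(m_start, len(series)):
        subseqs = decompose(series[:end])
        features = [sub[-m_lags:] for sub in subseqs]
        response = series[end]
        samples.append((features, response))
    return samples
```

Contrast this with the traditional approach, which decomposes the full series once and therefore lets testing-period values influence the sub-sequences used for training.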

Statistical evaluation indicators

To validate the predictive superiority of the stepwise decomposition-based VMD-IPSO-BiLSTM model, it is compared with the IPSO-BiLSTM model. Additionally, to assess the advantages of the IPSO optimization algorithm, the IPSO algorithm is compared with the conventional PSO algorithm. The predictive performance of the models is evaluated using three metrics: mean absolute error (MAE), root mean square error (RMSE), and Nash–Sutcliffe efficiency (NSE). Smaller MAE and RMSE values and an NSE closer to 1 indicate better point-prediction performance. Because of the stochastic nature of the coupled model, each model is run 20 times on the weekly rainfall series of the four cities, and the results are recorded for evaluation. The metrics are computed as follows:

$$ RMSE = \sqrt {\frac{{\sum\limits_{i = 1}^{n} {(P\left( i \right) - P^{*} \left( i \right))^{2} } }}{n}} , $$
(9)
$$ NSE = 1 - \frac{{\sum\limits_{i = 1}^{n} {\left( {P\left( i \right) - P^{*} \left( i \right)} \right)^{2} } }}{{\sum\limits_{i = 1}^{n} {\left( {P\left( i \right) - \overline{P} } \right)^{2} } }}, $$
(10)
$$ MAE = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left| {P^{*} (i) - P(i)} \right|} ,\quad i = 1,2, \ldots ,n, $$
(11)

where \(P\) is the observed value, \(P^{*}\) is the predicted value, and \(\overline{P}\) is the mean of the observed values.
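For reference, Eqs. (9)–(11) translate directly into code; a minimal sketch (function names are illustrative):

```python
import math

def rmse(obs, pred):
    """Root mean square error, Eq. (9)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def nse(obs, pred):
    """Nash-Sutcliffe efficiency, Eq. (10): 1 minus the ratio of residual
    variance to the variance of the observations about their mean."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def mae(obs, pred):
    """Mean absolute error, Eq. (11)."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)
```

A perfect prediction gives MAE = RMSE = 0 and NSE = 1; an NSE of 0 means the model is no better than predicting the observed mean.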

Figure 4

Stepwise decomposition flow chart12.

Case studies

Data sources

Nansi Lake, located in Shandong Province, China, is the largest lake in the province and one of the largest freshwater lakes in China. With a basin area of 31,700 km², it fulfills multiple ecological functions and serves as a crucial reservoir for the South-to-North Water Diversion Project, playing a significant role in water diversion and storage. The average annual precipitation in the Nansi Lake (South Four Lakes) basin is 731 mm, with a total storage capacity of 4.731 billion m³ and an average annual available water volume of 1.273 billion m³. An overview of the study area is presented in Fig. 5. The map in Fig. 5 was created using ArcGIS software version 10.8, available at http://www.esri.com/software/arcgis.

Figure 5

Research situation of Nansi Lake basin.

The Nansi Lake Basin encompasses the cities of Heze, Jining, Xuzhou, and Zaozhuang. Daily rainfall data for these four cities were obtained from the National Meteorological Science Data Center (https://data.cma.cn/). The 7671 daily rainfall data points spanning January 1, 2000, to December 31, 2020, were aggregated into 1096 weekly rainfall values. The data were then split according to the allocation ratios chosen for the training and testing periods. For Heze and Zaozhuang, the training period spans January 1, 2000, to September 26, 2014, and the testing period September 27, 2014, to December 31, 2020. For Jining and Xuzhou, the training period extends from January 1, 2000, to October 28, 2016, and the testing period from October 29, 2016, to December 31, 2020. Please refer to Fig. 6 for details.

Figure 6

Weekly rainfall series of four cities in the South Four Lakes basin.

Model training

The BiLSTM model features a bidirectional recurrent neural network with input and output layers as well as hidden layers. The Adam optimizer is selected to update the model parameters. The IPSO optimization model is configured with a population size of 50 and an inertia weight of 0.7, with the individual learning factor \(C_{1}\) and social learning factor \(C_{2}\) as shown in Table 2.

Table 2 Table of optimal parameter combinations for each model.

To address issues such as manual parameter tuning and slow convergence in the BiLSTM predictive model, the particle swarm optimization (PSO) algorithm is employed to optimize the iteration count (N), the learning rate (L), and the numbers of nodes in the first and second hidden layers (\(H_{1}\), \(H_{2}\)). The optimal parameter combinations ultimately obtained are presented in Table 2.

Model prediction results

To better analyze the predictive performance of each model during the testing period, Fig. 7 presents a scatter plot of observed values against predicted values. The red line in the figure represents the linear fitting line for observed and predicted values. A comparison between PSO-BiLSTM and IPSO-BiLSTM reveals that the scatter distribution of IPSO-BiLSTM is more compact, and the slope of the linear fitting line is closer to 1 during the testing period for each city. Additionally, comparing the predictive results of VMD-PSO-BiLSTM and VMD-IPSO-BiLSTM shows that the slopes of VMD-IPSO-BiLSTM are all greater than 0.78, indicating a more compact scatter distribution. This suggests that the IPSO-optimized VMD-BiLSTM model outperforms the VMD-BiLSTM model optimized with conventional PSO. The IPSO algorithm demonstrates its ability to enhance the predictive accuracy of both individual models and decomposition ensemble models. In comparison to the two individual models (PSO-BiLSTM and IPSO-BiLSTM), the hybrid models (VMD-PSO-BiLSTM and VMD-IPSO-BiLSTM) exhibit superior performance in rainfall prediction across the four cities.

Figure 7

Scatterplot of rainfall series prediction results for each city.

Multi-model comparison

Table 3 presents the model accuracy evaluation results during the training and testing periods for the IPSO-optimized models. During the training period, PSO-BiLSTM, IPSO-BiLSTM, and VMD-PSO-BiLSTM demonstrated optimal simulation results in Heze, while VMD-IPSO-BiLSTM exhibited superior performance in simulating Zaozhuang. In the testing period, VMD-IPSO-BiLSTM outperformed other models in terms of performance across all cities.

Table 3 Evaluation results of each model in the training and testing periods.

As shown in Table 3, comparing the prediction results of PSO-BiLSTM and IPSO-BiLSTM for each city in the testing period, the MAE and RMSE values of the IPSO-BiLSTM model are smaller and its NSE values are closer to 1. Across the cities, the MAE decreases from 6.30–7.65 to 6.13–6.97, the RMSE decreases from 10.94–12.99 to 9.66–11.91, and the NSE improves from 0.43–0.50 to 0.309–0.630, indicating that the IPSO-optimized BiLSTM model performs better than the BiLSTM model optimized with ordinary PSO. The IPSO algorithm can mitigate model overfitting and thereby improve predictive performance.

Comparing the predictions of IPSO-BiLSTM and VMD-IPSO-BiLSTM, the VMD-IPSO-BiLSTM model demonstrates the largest reduction in MAE in Jining, reaching 45.73%. The RMSE sees the most noticeable decrease in Zaozhuang, reaching 48.78%. This implies that the VMD-based stepwise decomposition technique significantly improves the performance of the coupled prediction model, yielding at least a 45% increase in NSE values for rainfall testing across all cities.

To assess model accuracy, Taylor diagrams were employed, as depicted in Fig. 8, comparing the correlation coefficients and standard deviations of the different models for the four cities. Concerning the correlation coefficients, the simulation results for Heze, Jining, and Xuzhou fall within the range of 0.9 to 0.99, while Zaozhuang's results range from 0.8 to 0.99. Regarding standard deviations, VMD-IPSO-BiLSTM most closely approximates the observed values across all four cities.

Figure 8

Taylor diagram comparing the prediction performance of multiple models.

Discussion

The traditional decomposition-prediction coupling technique, when applied to decompose time series data, may inadvertently introduce testing-period data into the training period. A decomposition ensemble model constructed this way can produce "false" high-accuracy predictions, making it difficult for the model to meet the demands of practical forecasting19,20. The VMD-IPSO-BiLSTM model, based on the stepwise decomposition technique, instead sequentially appends new data \(S_{m + 1}\) to the existing sequence segment \((S_{1} ,S_{2} , \cdots ,S_{m} )\) for decomposition, gradually extending the segment and extracting the corresponding explanatory samples.

Compared to previous VMD-BiLSTM approaches28,29, the model performance is significantly enhanced by optimizing model parameters through the IPSO optimization algorithm. Notably, the VMD-IPSO-BiLSTM model does not utilize testing period data during model training. However, in contrast to optimized prediction models based on traditional decomposition methods30, the predictive accuracy of the stepwise decomposition model is relatively lower. Therefore, further considerations are needed to improve model accuracy in the future.

Conclusion

To enhance the accuracy of rainfall prediction models, this paper introduces a novel VMD-IPSO-BiLSTM stepwise decomposition ensemble model. Comparative analysis of the prediction results with the PSO-BiLSTM, IPSO-BiLSTM, and VMD-PSO-BiLSTM models reveals the following research findings:

  (1)

    Compared with PSO-BiLSTM, the MAE of the IPSO-BiLSTM model decreases from 6.30–7.65 to 6.13–6.97, the RMSE decreases from 10.94–12.99 to 9.66–11.91, and the NSE improves from 0.43–0.50 to 0.309–0.630, indicating that the IPSO-optimized BiLSTM model achieves better performance.

  (2)

    A contrast between the IPSO-BiLSTM and VMD-IPSO-BiLSTM predictions reveals that the VMD-IPSO-BiLSTM model achieves the most substantial reduction in MAE in Jining, reaching 45.73%. RMSE values experience the most notable decrease in Zaozhuang, at 48.78%. This highlights the significant performance improvement brought by the VMD-based stepwise decomposition technique, which yields at least a 45% increase in NSE values during the rainfall testing period across the cities.

  (3)

    VMD-IPSO-BiLSTM effectively addresses the issue of traditional decomposition ensemble prediction models erroneously using forecast factor data from the validation period. NSE values during the testing period exceed 0.88 in all cities, indicating higher predictive accuracy. The model provides a valuable reference for the correct construction of decomposition ensemble rainfall prediction models and a basis for the practical forecasting of non-stationary, non-linear rainfall sequences.