Introduction

Dams are large hydraulic structures that serve several purposes, such as water supply for drinking and agriculture, industrial use, flood control, navigation, and hydropower generation. These structures alter the sediment balance of natural rivers: a dam reservoir significantly reduces the flow velocity, acts as a sediment trap, and causes deposition of a portion of the incoming sediment load. While fine sediments are transported into the reservoir, coarse materials deposit immediately at the head of the reservoir, forming delta deposits1.

Sediment deposition reduces and eventually eliminates reservoir storage volume. Loss of storage volume degrades reservoir function and shortens the dam's practical lifetime, resulting in economic losses. According to the literature, 0.5–1% of the global water storage volume is lost annually to sediment deposition; moreover, this annual loss exceeds the volume of newly constructed reservoirs2. The construction of new dams is complicated by environmental restrictions, construction and design costs, and the scarcity of appropriate locations3. Managing existing reservoirs to control progressive sediment deposition and ensure their long-term use is therefore necessary, which requires knowledge of sedimentation processes in reservoirs and of dredging methods4.

Measures for reducing long-term sediment deposition in reservoirs fall into three general categories: reducing the sediment inflow from upstream, rerouting sediments to minimize deposition, and removing the sediment accumulated in the reservoir. Reducing the sediment inflow is in turn divided into two approaches: (1) reducing sediment production (soil and channel erosion control at the source) and (2) trapping sediment upstream of the reservoir. Sediment rerouting comprises techniques to bypass sediments around the reservoir or pass them through tunnels or canals. Deposited sediments can be removed by flushing and by mechanical removal. The choice of sediment management strategy depends on many factors, including climate, reservoir operation, water supply needs, and catchment and reservoir characteristics; therefore, no single sustainable strategy fits every reservoir. Among the strategies for controlling reservoir sedimentation, flushing is an efficient hydraulic technique for restoring and conserving storage capacity. Hydraulic flushing of reservoirs is classified into pressure flushing and drawdown flushing (free-flow sediment flushing)3.

Machine learning models are flexible, non-parametric algorithms that can connect input and output variables without deep knowledge of the physical behaviour of the system. The successful performance of machine learning models has attracted the attention of many researchers. For instance, Emamgholizadeh et al. reported the superiority of Artificial Neural Networks (ANN) over the Adaptive Neuro-Fuzzy Inference System (ANFIS) in predicting the geometry of the flushing half-cone5. Li et al. reported the suitable performance of a Back-Propagation Artificial Neural Network (BP-ANN) in analyzing the sediment flushing efficiency of the Three Gorges Reservoir (TGR)6. Cao et al. compared a novel SVR-Henry Gas Solubility Optimization (SVR-HGSO) model with the SVR, SVR-FA, SVR-Particle Swarm Optimization (SVR-PSO), SVR-Ant Lion Optimizer (SVR-ALO), SVR-Dragonfly Algorithm (SVR-DA), and SVR-Salp Swarm Algorithm (SVR-SSA) algorithms and found that SVR-HGSO outperformed the others7. Qaderi et al. used hybrid models based on the Group Method of Data Handling (GMDH) to predict bedform dimensions of alluvial channels and concluded that the hybrid models perform better than the ordinary GMDH and empirical equations8. EL Bilali et al. evaluated the ANN against a modified universal soil loss equation coupled with multiple linear regression (MUSLE-MLR) for predicting yearly reservoir sedimentation and found that the ANN outperformed the other methods9. Qaderi et al. compared GMDH-based models for predicting the daily dew point and reported that the GMDH-HS and GMDH-SCE models produced better results than the other developed models10. Qaderi et al. used the support vector machine (SVM), ANFIS, ANN, gene-expression programming (GEP), and GMDH integrated with the harmony search (HS) and shuffled complex evolution (SCE) algorithms to predict bridge pier scour depth; the results indicated the superiority of ANFIS over the other developed models11. Zeynoddin et al. compared a generalized linear stochastic model (GLSM) with ANFIS, ANN, GEP, and SVM tuned by the firefly algorithm (SVM-FA) for forecasting weekly and monthly lake water levels; the results revealed that the GLSM had higher accuracy than the other models12. EL Bilali et al. compared SVR, K-nearest neighbour (K-NN), random forest (RF), and ANN models for groundwater level prediction and found the ANN to be the most accurate non-linear model13. Sayari et al. evaluated ANFIS, GMDH, the multi-layer perceptron neural network (MLPNN), support vector regression (SVR), multivariate linear regression (MLR), and their integration with the firefly algorithm (FA) for predicting the infiltrated water volume in furrow irrigation; the results demonstrated the superiority of the MLPNN-FA and SVR-FA models14. Roy et al. developed an Equilibrium Optimizer-based Extreme Learning Machine (EO-ELM) for rainfall-runoff modelling and compared it with a Deep Neural Network (DNN), ELM, Kernel ELM (KELM), particle swarm optimization-based ELM (PSO-ELM), SVR, ANN, and a Gradient Boosting Machine (GBM); the results demonstrated that the EO-ELM was the most accurate model15. Mahdavi-Meymand et al. applied several variants of GMDH integrated with the PSO and HGSO algorithms to simulate the maximum hydro-suction dredging depth and demonstrated that the GMDH-HGSO algorithm provides an excellent fit to the observed data16. Ezzaouini et al. used RF, adaptive boosting (AdaBoost), SVR, K-NN, and ANN models to predict the suspended sediment load and showed that all models predict the daily suspended sediment load with good accuracy17.

Many researchers have studied reservoir behaviour during sediment flushing, including field and experimental research. To the authors' knowledge, machine learning models have not previously been used to model free-flow flushing channel formation, which is a highlight of this study. Here, the capabilities of machine learning models (MLR, MLPNN, GMDH, ANFIS, GMDH-HGSO, GMDH-EO, SVR-HGSO, and SVR-EO) are compared in predicting free-flow flushing channel formation. Another highlight of this study is the integration of two new meta-heuristic algorithms (EO and HGSO) with SVR and GMDH; these novel models are analyzed exclusively in this study.

Methodology

Data collection

In this study, 160 datasets in total were used to investigate and compare the ability of classical and novel integrated machine learning models to predict Qs during free-flow flushing. The datasets were extracted from the experimental study report of Janssen18, who investigated the effect of reservoir drawdown on flushing channel formation. The experiments were performed in a concrete flume 50 m long, 2.4 m wide, and 1.5 m high. The sediments were non-cohesive, with a mean grain size (D50) of 1.25 mm and a saturated density of 1270 kg/m3. The deposits were set 10 cm above the valve gate threshold and paved with side slopes of 1%. Water levels in the reservoir, bed levels in the flushing channel, flushing channel width, outflow, and sediment discharge were measured over time to investigate the variation of flushing channel characteristics along the reservoir.

The water level in the reservoir (hw), bed level in the flushing channel (hb), outflow (Qout), inflow (Qin), and elapsed time of flushing (T) are considered as input parameters to predict sediment discharge (Qs). Of the datasets, 112 (70%) were used for training the developed models, 24 (15%) served as validation data to prevent overtraining, and the remaining 24 (15%) were used to evaluate the accuracy of the implemented models. A summary of the statistical characteristics of the utilized datasets is given in Table 1.

Table 1 Statistical characteristics of data.

Moreover, all variables (inputs and output) were normalized before the training process as follows:

$${X}_{i}^{^{\prime}}=\frac{{X}_{i}-{X}_{min}}{{X}_{max}-{X}_{min}},$$
(1)

where \({X}_{i}\) and \({X}_{i}^{^{\prime}}\) are the original and normalized values of a variable, and its minimum and maximum values are denoted by \({X}_{min}\) and \({X}_{max}\), respectively.
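Equation (1) is standard min-max scaling; a minimal sketch in Python (the numbers below are illustrative, not the study's data):

```python
import numpy as np

def min_max_normalize(x):
    """Scale a variable to [0, 1] using Eq. (1): x' = (x - xmin) / (xmax - xmin)."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min)

# Example: normalize a column of (hypothetical) sediment-discharge values
qs = np.array([12.0, 45.0, 88.0, 130.0])
print(min_max_normalize(qs))  # smallest value maps to 0, largest to 1
```

The same `x_min`/`x_max` computed on the training set would be reused to scale the validation and testing sets, so the three subsets stay on a common scale.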

Conventional machine learning models

Group method of data handling (GMDH)

The GMDH neural network is a basic self-organizing data mining technique introduced by Ivakhnenko19 as a rival to the stochastic approximation method20. The algorithm is based on a multi-layer structure that provides a framework for simulating and modelling complex phenomena, image processing, and data mining. A complicated discrete function known as the Ivakhnenko polynomial is used to connect the input and output variables in the GMDH.

In the current study, a polynomial function was used as a transfer function in the neurons of the middle and output layers as follows:

$$Y={W}_{0}+{W}_{1}{X}_{1}+{W}_{2}{X}_{2}+{W}_{3}{X}_{1}^{2}+{W}_{4}{X}_{2}^{2}+{W}_{5}{X}_{1}{X}_{2},$$
(2)

where W is the coefficient vector (network weights), X is the input vector, and Y is the output. In the conventional GMDH, the coefficients are determined by least-squares estimation (LSE)21.
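Fitting the weights of Eq. (2) by LSE amounts to an ordinary least-squares problem in the six polynomial terms. A minimal sketch on synthetic data (not the paper's implementation):

```python
import numpy as np

def fit_gmdh_neuron(x1, x2, y):
    """Fit the quadratic polynomial of Eq. (2),
    Y = W0 + W1*x1 + W2*x2 + W3*x1^2 + W4*x2^2 + W5*x1*x2,
    by least-squares estimation (LSE)."""
    A = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict_gmdh_neuron(w, x1, x2):
    return w[0] + w[1]*x1 + w[2]*x2 + w[3]*x1**2 + w[4]*x2**2 + w[5]*x1*x2

# Synthetic check: recover known coefficients from noiseless quadratic data
rng = np.random.default_rng(0)
x1, x2 = rng.uniform(0, 1, 50), rng.uniform(0, 1, 50)
y = 1.0 + 2.0*x1 - 3.0*x2 + 0.5*x1**2 + 0.25*x2**2 + 1.5*x1*x2
w = fit_gmdh_neuron(x1, x2, y)
print(np.round(w, 3))
```

In a full GMDH, every neuron pairs two inputs and fits such a polynomial; the best-performing neurons of one layer become the inputs of the next.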

Support vector regression (SVR)

SVR is an extension of the support vector machine (SVM) developed by Vapnik22 to solve regression problems. The SVR model uses the structural risk minimization technique to find the best regression hyperplane, defined by the following equation:

$$Y=\vartheta \varphi \left(z\right)+c.$$
(3)

In this equation, Y denotes the non-linear regression function used to predict the target vector, \(\vartheta\) is the weight vector, and φ(z) is a non-linear mapping of the input data into a higher-dimensional feature space. The coefficients c and \(\vartheta\) are estimated as follows:

$$Minimize :{R}_{reg}=\left[\frac{1}{2}{\Vert \vartheta \Vert }^{2}+P\sum_{i=1}^{n}{(\xi }_{i}+{\xi }_{i}^{*})\right],$$
(4)
$$Subject\, to: \left\{\begin{array}{c}{Y}_{i}-\left(\vartheta \varphi \left({z}_{i}\right)+{c}_{i}\right)\le \varepsilon +{\xi }_{i}\\ \left(\vartheta \varphi \left({z}_{i}\right)+{c}_{i}\right)-{Y}_{i}\le \varepsilon +{\xi }_{i}^{*}\\ {\xi }_{i},{\xi }_{i}^{*}\ge 0\end{array}\right.,$$
(5)

where P is the regularization cost parameter, \(\varepsilon\) is the acceptable error margin, and \({\xi }_{i},{\xi }_{i}^{*}\) are non-negative slack variables23.
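In practice, the optimization of Eqs. (4)–(5) is delegated to a library solver. A hedged sketch using scikit-learn's `SVR` on synthetic stand-in data (the flushing datasets of Janssen18 are not reproduced here; `C` plays the role of P and `epsilon` the error margin):

```python
import numpy as np
from sklearn.svm import SVR

# Illustrative toy data standing in for the five inputs (hw, hb, Qout, Qin, T) -> Qs
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, (120, 5))
y = 10*X[:, 0] - 4*X[:, 1] + 6*X[:, 2]*X[:, 3] + 0.5*X[:, 4]

# RBF-kernel SVR; gamma relates to the kernel constant sigma
model = SVR(kernel="rbf", C=10.0, epsilon=0.01, gamma="scale")
model.fit(X, y)
print(model.predict(X[:3]))
```

The quality of the fit depends strongly on (C, ε, σ), which is exactly why the paper tunes them with meta-heuristics.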

Adaptive neuro-fuzzy inference system (ANFIS)

ANFIS is a hybrid model developed on the basis of the Takagi–Sugeno (TS) fuzzy inference system24. ANFIS combines the advantages of neural networks (e.g., trainability, parallel processing, and connectionist structure) and fuzzy logic (e.g., simplicity, flexibility, fast training, and easy combination with other techniques) in a single framework. This structure makes ANFIS a robust model for formulating non-linear problems and forecasting time series phenomena. ANFIS uses a fuzzy inference system (FIS) whose membership structure and parameters are calibrated by a combination of methods such as the least-squares method and the back-propagation algorithm. The ANFIS structure comprises five layers, each containing different nodes: the fuzzy layer, the production layer, the normalization layer, the de-fuzzy layer, and the total output layer. The first layer consists of several membership functions that convert the input variables to fuzzy inputs. The second layer applies the fuzzy rules determined for its nodes. In the third layer, the firing strengths of the second-layer outputs are normalized by the rules. The fourth layer receives the normalized values from the third layer and performs the defuzzification process. The fifth (last) layer uses the defuzzified values from the fourth layer to produce the system output as a numerical variable25.

Artificial neural networks (ANNs)

ANNs are computational intelligence systems composed of many units, or nodes, called neurons. Neurons are connected by weights and aggregated into separate layers. The multi-layer perceptron neural network (MLPNN) is a class of feed-forward ANN consisting of three kinds of layers: the input layer (which introduces data to the network), one or more hidden layers (data processing), and the output layer (results of the data processing). Except for the input neurons, each neuron applies a non-linear activation function to the weighted sum of the outputs of the previous layer. The structure of this model is highly influenced by the problem variables, and a full interconnection strategy is used in the training step14.
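As an illustration only (the paper's actual network configuration is not restated in this section), a minimal MLPNN can be set up as follows, again on synthetic stand-in data:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Toy data with five inputs, mimicking the shape of the flushing problem
rng = np.random.default_rng(3)
X = rng.uniform(0, 1, (120, 5))
y = 6*X[:, 0] - 2*X[:, 1] + 4*X[:, 2]*X[:, 3]

# One hidden layer of 10 tanh neurons; lbfgs converges well on small datasets
mlp = MLPRegressor(hidden_layer_sizes=(10,), activation="tanh",
                   solver="lbfgs", max_iter=5000, random_state=0).fit(X, y)
print(mlp.score(X, y))  # training R^2
```

The hidden-layer sizes and activation here are assumptions chosen for the sketch, not values reported by the study.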

Multiple linear regression (MLR)

Multiple linear regression is the extension of simple linear regression to cases with more than one explanatory variable. The basic equation of this model is:

$$Y={\beta }_{0}+{\beta }_{1}{X}_{1}+{\beta }_{2}{X}_{2}+\dots +{\beta }_{k}{X}_{k}+\varepsilon ,$$
(6)

where Y is the output, X is the input vector, β represents the regression coefficient vector, and ε is the standard estimation error26.
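The coefficient vector of Eq. (6) has the closed-form ordinary least-squares solution; a short sketch on synthetic data:

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares estimate of the beta vector of Eq. (6);
    a column of ones is prepended for the intercept beta_0."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

# Recover known coefficients from noiseless synthetic data
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (60, 3))
y = 2.0 + 1.5*X[:, 0] - 0.5*X[:, 1] + 3.0*X[:, 2]
print(np.round(fit_mlr(X, y), 3))
```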

Meta-heuristic algorithms

Henry gas solubility optimization (HGSO)

HGSO is a new meta-heuristic optimization method imitating Henry's law, presented by Hashim et al.27. According to Henry's law, the amount of a gas dissolved in a liquid is proportional to its partial pressure above the liquid. The law was formulated by William Henry as the following equation:

$${C}_{g}={H}_{g}\times {P}_{g},$$
(7)

where Cg is the solubility of the gas, Pg denotes the gas partial pressure, and Hg is Henry's constant, which depends on the system temperature.

The HGSO algorithm consists of the following steps: initialization (the positions of the population are initialized); clustering (the population is divided into equal clusters); evaluation (the clusters are evaluated to determine the gas that reaches the highest equilibrium); updating Henry's coefficient and the gas solubility; updating the positions; and escaping from local optima (selecting and re-initializing a number of the worst agents). A more detailed presentation of the HGSO algorithm can be found in Hashim et al.27.

Equilibrium optimizer (EO)

EO is a new swarm-based meta-heuristic algorithm proposed by Faramarzi et al.28 and inspired by control-volume mass balance models. The search agents in EO are particles (solutions), and a particle's concentration represents its position. The positions of the search agents are updated randomly with respect to the best current solutions (the equilibrium candidates) until the equilibrium state (the optimal result) is reached. Like other stochastic optimization algorithms, EO starts from an initial population whose positions are randomly initialized in the search space. At the beginning of the optimization process, the particles are ranked to specify the equilibrium candidates, which provide the search pattern for the particles. The equilibrium pool contains five candidates: the four particles with the best fitness in the population plus a fifth based on their average. The four best particles improve exploration, while the average candidate helps to increase the exploitation ability of the algorithm. The population size can be chosen according to the type of problem28.
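The equilibrium-pool idea can be sketched as follows. This is a deliberately simplified illustration: it keeps the pool of four best particles plus their average and a shrinking randomized step, but omits the exponential F term, the generation rate, and other details of the full EO of Faramarzi et al.28.

```python
import numpy as np

def eo_minimize(f, dim, bounds, n_particles=30, iters=200, seed=0):
    """Simplified Equilibrium Optimizer sketch: particles drift toward a
    randomly chosen member of the equilibrium pool (4 best + their mean)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, (n_particles, dim))
    for t in range(iters):
        fit = np.array([f(p) for p in pos])
        best4 = pos[np.argsort(fit)[:4]]
        pool = np.vstack([best4, best4.mean(axis=0)])  # equilibrium pool
        a = 1.0 - t / iters                            # step size decays over time
        for i in range(n_particles):
            ceq = pool[rng.integers(len(pool))]        # random equilibrium candidate
            r = rng.uniform(size=dim)
            # contract each particle toward its candidate; contraction strengthens as a -> 0
            pos[i] = np.clip(ceq + (pos[i] - ceq) * a * r, lo, hi)
    fit = np.array([f(p) for p in pos])
    i = int(fit.argmin())
    return pos[i], float(fit[i])

# Smoke test: minimize the sphere function
x_best, f_best = eo_minimize(lambda x: float(np.sum(x**2)), dim=5, bounds=(-10.0, 10.0))
print(f_best)
```

Even this stripped-down variant converges on smooth test functions, which conveys why the pool-guided update is effective; the full EO adds mechanisms to balance exploration and exploitation more carefully.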

Integrated machine learning models

Meta-heuristic algorithms are robust optimization strategies for low- and high-dimensional complex problems. In machine learning models, meta-heuristic algorithms optimize the structure, calibrate the unknown weight coefficients, or both. In this study, HGSO and EO were applied to optimize the weights of GMDH and the unknown parameters of SVR. A GMDH with two middle layers and a maximum of ten neurons per layer was considered, and the weights of the quadratic polynomial transfer functions of the neurons were obtained by EO and HGSO. The regularization parameter (C), the insensitive loss coefficient (ε), and the kernel constant (σ) are the three SVR parameters that affect the output results; HGSO and EO were used to obtain their optimal values. Figures 1 and 2 illustrate the flowcharts of the integrated SVR and GMDH models, respectively. Like most meta-heuristic algorithms, HGSO and EO have parameters that must be initialized before the optimization process. In this study, the HGSO and EO parameter values were initialized based on the original studies27,28. The initial parameter values of the meta-heuristic algorithms are shown in Table 2.
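The overall tuning scheme, in which each candidate triple (C, ε, σ) is scored by validation error, can be sketched generically. Plain random search stands in below for HGSO/EO, which would only change how candidates are proposed; the data and search ranges are illustrative assumptions, not the study's settings.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic stand-in data with a ~70/30 train/validation split
rng = np.random.default_rng(7)
X = rng.uniform(0, 1, (150, 5))
y = 8*X[:, 0] - 3*X[:, 1] + 5*X[:, 2]*X[:, 3]
Xtr, ytr, Xva, yva = X[:105], y[:105], X[105:], y[105:]

def val_rmse(params):
    """Fitness of one search agent: validation RMSE of the SVR it encodes."""
    C, eps, gamma = params
    m = SVR(kernel="rbf", C=C, epsilon=eps, gamma=gamma).fit(Xtr, ytr)
    return float(np.sqrt(np.mean((m.predict(Xva) - yva) ** 2)))

best, best_rmse = None, np.inf
for _ in range(40):                          # one fitness evaluation per candidate
    cand = (10**rng.uniform(-1, 3),          # C in [0.1, 1000]
            10**rng.uniform(-3, 0),          # epsilon in [0.001, 1]
            10**rng.uniform(-2, 1))          # gamma, related to 1/sigma^2
    r = val_rmse(cand)
    if r < best_rmse:
        best, best_rmse = cand, r
print(best, best_rmse)
```

Swapping the random proposals for HGSO or EO position updates yields the SVR-HGSO and SVR-EO schemes of Figure 1.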

Figure 1
figure 1

SVR-HGSO and SVR-EO flowchart.

Figure 2
figure 2

GMDH-HGSO and GMDH-EO flowchart.

Table 2 Considered initial values of developed EO and HGSO algorithms for this study.

Evaluation criteria

The results of the developed models are evaluated by four standard statistical criteria: the root mean square error (RMSE), the mean absolute error (MAE), the correlation coefficient (R2), and the mean absolute relative error (MARE). The RMSE indicates the difference between predicted and observed values. The MAE is the average of the absolute differences between predicted and observed values and is indifferent to the direction of errors. A model predicts best when its RMSE and MAE values are close to zero29. The correlation coefficient measures the degree of similarity between predicted and measured data; R2 = 1 corresponds to a perfect match between the model and the observed data. Predicted and actual values are similar when the MARE value is close to 030. These indices are calculated as follows:

$$MARE=\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\left|{X}_{Oi}-{X}_{Pi}\right|}{{X}_{Oi}}\right),$$
(8)
$$MAE=\frac{1}{n}\sum_{i=1}^{n}\left|{X}_{Pi}-{X}_{Oi}\right|,$$
(9)
$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}{({X}_{Pi}-{X}_{Oi})}^{2}},$$
(10)
$${R}^{2}=\frac{\sum_{i=1}^{n}({X}_{Pi}-{\overline{X} }_{P}){(X}_{Oi}-{\overline{X} }_{O})}{\sqrt{\sum_{i=1}^{n}{({X}_{Pi}-{\overline{X} }_{P})}^{2}{{(X}_{Oi}-{\overline{X} }_{O})}^{2}}},$$
(11)

where the observed and predicted values are denoted by \({X}_{Oi}\) and \({X}_{Pi}\), respectively. The \({\overline{X} }_{O}\) and \({\overline{X} }_{P}\) represent the mean of the observed and predicted values, and n is the number of observations.
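A compact implementation of Eqs. (8)–(11) (the R2 expression below mirrors Eq. (11) exactly as printed):

```python
import numpy as np

def evaluate(obs, pred):
    """Compute the four criteria of Eqs. (8)-(11) for observed/predicted arrays."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    mare = np.mean(np.abs(obs - pred) / obs)                     # Eq. (8)
    mae = np.mean(np.abs(pred - obs))                            # Eq. (9)
    rmse = np.sqrt(np.mean((pred - obs) ** 2))                   # Eq. (10)
    dp, do = pred - pred.mean(), obs - obs.mean()
    r2 = np.sum(dp * do) / np.sqrt(np.sum(dp**2) * np.sum(do**2))  # Eq. (11)
    return {"MARE": mare, "MAE": mae, "RMSE": rmse, "R2": r2}

# Illustrative values, not the study's measurements
obs = np.array([10.0, 20.0, 30.0, 40.0])
pred = np.array([12.0, 18.0, 33.0, 39.0])
print(evaluate(obs, pred))
```

Note that MARE divides by the observed values, so it is undefined when an observation is zero; the other three criteria have no such restriction.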

Results

In this paper, the capability of several newly developed machine learning models for modelling free-flow flushing sediment discharge was investigated. The performance of each model was evaluated by the standard statistical criteria MARE, RMSE, MAE, and R2 for the training, validation, and testing phases. Simulation results for the training, validation, and testing datasets are presented in Table 3.

Table 3 Statistical indices of the proposed model in the training, validation and testing datasets.

Table 3 shows that the R2 values of all models exceed 0.8, meaning that all applied models perform well during both the training and validation phases. In the training phase, the MLPNN, SVR-EO, and SVR-HGSO models produced the highest R2 values (0.99, 0.98, and 0.98, respectively). In the validation set, however, the SVR-EO and SVR-HGSO models have the highest R2 (0.96 for both), showing the good accuracy of these two models in predicting sediment discharge in the training and validation sets. Among the GMDH-based models, the plain GMDH performed better than its integrated versions (GMDH-HGSO and GMDH-EO) during both the training and validation phases: the RMSE values of GMDH-HGSO and GMDH-EO are 42.99 g/s and 37.10 g/s for the training dataset and 38.51 g/s and 36.98 g/s for the validation dataset, whereas those of GMDH are 33.09 g/s and 30.82 g/s, respectively. In the testing phase, however, the GMDH-HGSO and GMDH-EO models outperform GMDH. In the training, validation, and testing datasets, the MLR model obtained very high RMSE values and the lowest R2 values among all the applied models; in other words, there is no linear relation between the variables of the free-flow flushing phenomenon. A comparison between GMDH and ANFIS indicates that their results are very similar in the testing phase, although ANFIS is slightly more accurate than GMDH in the training phase in terms of R2, RMSE, MAE, and MARE. For predicting Qs in the testing phase, GMDH-HGSO obtained the best R2, RMSE, and MAE statistics (0.96, 22.37, and 15.65, respectively), with GMDH-EO in second place (0.95, 23.71, and 18.35, respectively). SVR-EO and SVR-HGSO are slightly better than GMDH-HGSO and GMDH-EO in terms of MARE, while the GMDH-EO model performed slightly better than SVR-EO and SVR-HGSO overall.
Generally, the performances of GMDH-HGSO, GMDH-EO, SVR-EO, and SVR-HGSO are better than those of the other models in the testing phase, indicating the ability of the HGSO and EO algorithms to determine the optimal GMDH coefficients and SVR parameters. Figure 3 presents the scatter plots of the Qs values predicted by the developed models in the testing phase. The −25% and +25% lines represent two dispersion bounds, while the 1:1 line denotes the line of agreement.

Figure 3
figure 3

Scatter plots of the measured and predicted Qs in the testing phase. (a) MLR; (b) ANFIS; (c) MLPNN; (d) GMDH; (e) GMDH-EO; (f) GMDH-HGSO; (g) SVR-EO; (h) SVR-HGSO.

Data points located below the 1:1 line reveal under-prediction by the applied models, while over-predicted data lie above the 1:1 line. As can be seen from Fig. 3, for all models except MLR, most of the scatter points fall between the −25% and +25% lines. According to Fig. 3, about 75% of the scatter points of GMDH-HGSO, and 71% of those of GMDH-EO, SVR-EO, and SVR-HGSO, are located within ±25% of the 1:1 line (the line of agreement); moreover, 58% of the predicted values of the GMDH model are located in this area. It can therefore be concluded that using the HGSO and EO algorithms improves prediction accuracy. From Fig. 3 and Table 3, among the newly developed machine learning models, the GMDH-HGSO, GMDH-EO, SVR-EO, and SVR-HGSO models demonstrate acceptable performance, and GMDH-HGSO is the best model for predicting sediment discharge in the free-flow flushing channel. The performance of the best models in predicting sediment discharge in the training, validation, and testing phases is illustrated in Fig. 4.

Figure 4
figure 4

Performance of the best models in sediment discharge prediction in the training, validation, and testing phases (a) GMDH-HGSO; (b) GMDH-EO; (c) SVR-HGSO; (d) SVR-EO.

The Taylor diagram is a graphical tool for evaluating and comparing the performance of different models using the Pearson correlation coefficient, the centred root-mean-square error (CRMSE), and the standard deviation. The distance between each model point and the observed point measures the model's performance: when the two coincide, the Pearson correlation coefficient is close to 1, the CRMSE is close to 0, and the two standard deviations are equal31. Figure 5 illustrates the Taylor diagram of the developed models. According to Fig. 5, all machine learning models have relatively high correlation and low CRMSE, indicating the good performance of the applied models, although the MLR has a low pattern correlation. The GMDH and the hybrid forms of GMDH and SVR show relatively small differences from the observed values; the GMDH-HGSO model, with relatively high correlation and low CRMSE, is the most accurate.

Figure 5
figure 5

Taylor diagram of the applied models (testing data).

Discussion

Reservoir sedimentation represents a serious threat to water management, flood control, and energy production and therefore merits investigation. Classical and novel integrated models can be valuable tools for sediment management. In this study, several models, including MLR, ANN, ANFIS, GMDH, and four hybrid forms of GMDH and SVR combined with the HGSO and EO algorithms, were used to predict the sediment discharge (Qs) in free-flow flushing. The results revealed that these models predict the sediment discharge in free-flow flushing with acceptable accuracy and confirm that non-linear models are more accurate than linear methods.

The models in this study were trained on the experimental data of Janssen18. Whether the models were trained on data covering a sufficient domain is a challenging question. For an actual project with a maximum water level of around 10 m during free flushing, the geometric scale factor relative to the Janssen18 experiments, computed from Froude similarity, would be about 115. At this scale, the maximum prototype discharge would be about 1617 m3/s, which is realistic. Nevertheless, training the machine learning methods on more data with broader parameter ranges would produce more advanced models.
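The Froude-similarity arithmetic behind these figures can be checked directly: discharge scales with the length-scale factor to the power 2.5. The implied model discharge computed below is an inference from the stated prototype values, not a figure restated from the Janssen18 report.

```python
# Froude-similarity scaling of discharge: Q_prototype = Q_model * scale**2.5
scale = 115                             # geometric scale factor from the text
q_prototype = 1617.0                    # prototype discharge in m^3/s, from the text
q_model = q_prototype / scale**2.5      # implied model discharge in m^3/s
print(round(q_model * 1000, 1), "L/s")  # roughly 11 L/s at laboratory scale
```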

Meta-heuristic algorithms are robust tools for optimizing complicated problems, and sediment transport is a complex hydraulic phenomenon. The results of this study confirmed that using novel algorithms such as HGSO and EO improves the performance of machine learning models. The application of new integrative machine learning models is recommended for simulating other hydraulic problems. Designing new large-scale experiments or conducting field measurements to broaden the data domain is also a promising direction for future study.

Conclusions

Free-flow flushing is an efficient method for reducing deposited sediments in reservoirs, and accurate prediction of free-flow flushing parameters is essential for economical reservoir design and sediment management. In this study, newly developed integrated machine learning models were applied to predict the sediment discharge (Qs) of free-flow flushing. The models used included MLR, ANFIS, MLPNN, GMDH, and the integration of GMDH and SVR with EO and HGSO. The performances of the advanced machine learning models are close to each other; however, GMDH-HGSO is the most accurate model. The results revealed that the HGSO and EO algorithms are robust optimization tools for training machine learning models, improving the GMDH performance by about 5% and 3%, respectively. The application of the GMDH-HGSO model in reservoir management for predicting sediment output is recommended. However, broadening the parameter domain and analyzing the models' performance on new datasets remains a good topic for future study.