Abstract
In river research, accurately forecasting flow velocity in vegetated channels is a significant challenge. The forecasting performance of various standalone and hybrid machine learning (ML) models is thus quantified for the first time in this work. Using flow velocity measurements from both natural rivers and laboratory flume experiments, we assess the efficacy of four distinct standalone ML techniques: Kstar, M5P, reduced error pruning tree (REPT) and random forest (RF) models. In addition, we test eight hybrid ML algorithms built with Additive Regression (AR) and Bagging (BA) (ARKstar, ARM5P, ARREPT, ARRF, BAKstar, BAM5P, BAREPT and BARF). A comparison of their predictive capabilities, along with a sensitivity analysis of the influencing factors, indicated that: (1) vegetation height emerged as the most sensitive parameter for determining the flow velocity; (2) all ML models outperformed empirical equations; and (3) nearly all ML algorithms performed best when the model was built using all of the input parameters. Overall, the findings showed that hybrid ML algorithms outperform standalone ML algorithms and empirical equations at forecasting flow velocity. ARM5P (R^{2} = 0.954, R = 0.977, NSE = 0.954, MAE = 0.042, MSE = 0.003, and PBias = 1.466) turned out to be the optimal model for forecasting flow velocity in vegetated rivers.
Introduction
Vegetation in an aquatic environment, such as aquatic herbs, plants, saplings, and shrubs that grow around the water body, may be either submerged in the flow or emergent. The presence of vegetation decreases flow velocity and promotes local sedimentation by enhancing hydraulic roughness. Thus, being able to accurately forecast flow velocity is important for estimating flow resistance and the shear stress acting on the bed, and for producing estimates of flow depth and sediment transport. Nevertheless, our understanding of the overall impact of vegetation cover on river hydraulics, encompassing factors such as size, density, arrangement of vegetation stems, height of submergence, stem flexibility, geometry, and spacing, remains incomplete^{1}, making flow velocity forecasting in vegetated alluvial channels a significant challenge in river science.
The velocity profiles generated by submerged and emergent vegetation differ due to a contrast in height and flexibility of the vegetation. The complexity of estimating these profiles escalates when the boundary roughness varies with the vegetation's growth stage, and with how these changes align in time with seasonal differences in river flows, which often affects whether the vegetation is submerged or emergent^{2}. For example, Kouwen et al.^{3} performed various laboratory flume experiments and concluded that the velocity profile above the vegetation layer followed the logarithmic law.
Velasco et al.^{4} performed numerous laboratory experiments to ascertain the flow resistance occurring due to varying densities of flexible vegetation. Their results showed that the velocity profile within the canopy differed from a logarithmic profile due to the existence of vegetation stems in the flow, and that the profile shape is related to the deflected height of the plants. Wilson et al.^{5} also concluded that plant form has a significant effect on the mean flow field. A similar vertical change in flow structure was also observed by Chen et al.^{6}. Their experiments showed a considerable variation in the flow field at the sheath section and at the top of a plant clump. The plant's foliage thus contributes to the plant's overall resistance, reaching 40% of the total drag^{7}.
Other researchers have focused on understanding how flow dynamics are impacted by the existence of vegetation. For example, Ikeda and Kanazawa^{8} conducted experiments to examine the three-dimensional, organized vortices generated above flexible vegetation. Liu et al.^{9} performed laboratory experiments to examine velocity profiles under rigid acrylic dowels. Their findings support the idea that the flow along the riverbed and atop vegetation exhibits notable instability, leading to the formation of coherent structures and significant exchange of mass and momentum.
Stoesser et al.^{10} also showed that the spacing between vegetation elements impacts turbulence by altering the 3D flow patterns. Their study found that cylinder (or vegetation) density had a greater impact on flow and turbulence than the cylinder Reynolds number. Flow velocity in vegetated channels can be forecasted using four main types of model: theoretical, numerical/mathematical, empirical and machine-learning approaches^{11}. Theoretical and numerical attempts have included using first-order and higher-order closure models^{12,13,14,15}. Neary^{11} showed that reasonable forecasting of velocity profiles is achieved by adopting universal values for all model coefficients.
Choi and Kang^{16} worked on numerical simulations and found that flow quantities are best forecasted using the Reynolds stress model compared with other approaches. Theoretical descriptions are usually complex, however, and often require poorly understood closure parameters; at times, there are also practical difficulties in collecting such data, especially in natural rivers. To overcome these difficulties, others have developed empirically based regression models to estimate depth-averaged velocity. For example, Green^{17} utilized natural vegetated fields to generate percentiles of blockage factor (the fraction of a cross-section blocked by vegetation), which were then regressed against vegetation resistance. The best results were obtained using an exponential best-fit relationship based on the 69th blockage-factor percentile.
Huthoff^{18} proposed an alternate model for flow velocity within submerged vegetation. The model was constructed based on a two-layer approach, with distinct characterizations for the flow above and through the plant layer. Other linear empirical models, developed mainly from experimental datasets, include Kouwen and Fathi-Moghadam^{19}, Stephan and Gutknecht^{20}, Stone and Shen^{21}, Velzen et al.^{22}, Huthoff^{18}, and Baptist et al.^{12}. These equations provide an underlying relation between flow velocity and vegetation interactions, but their applicability beyond the conditions in which they were derived and developed is limited.
In natural rivers, flow conditions depend on flow resistance and roughness type, with bedform dynamics regulating flow resistance. Manning's equation is commonly used for predicting roughness. Mir and Patel^{23} used ML models to predict Manning's roughness coefficient (n) based on six input features. Random forest, extra trees regression, and extreme gradient boosting models performed exceptionally well (R^{2} = 0.99), while Lasso Regression showed moderate efficiency. Sensitivity analysis revealed the energy grade line as a crucial predictor, providing deeper insights into riverbed characteristics and the complex relationship between roughness and other parameters.
Kouwen and Fathi-Moghadam^{19} proposed a modified model for estimating coniferous tree resistance coefficients in open-channel flow that takes into account species flexibility variations. Experiments have validated that model, which effectively incorporates vegetation-flow interactions while improving accuracy over existing methods. Key findings include a method for estimating Manning's n value, which improves flow resistance predictions in vegetated channels. Stephan and Gutknecht^{20} investigated the impact of roughness caused by submerged macrophytes on flow dynamics, emphasizing their adaptability and variable nature in various flow scenarios. Conventional flow formulas are inadequate for this complexity, necessitating the development of a hydraulic roughness parameter based on deflected plant height. Laboratory experiments with three types of aquatic vegetation revealed a relationship between hydraulic roughness and deflected plant height, resulting in a more precise quantification method.
Stone and Shen^{21} conducted extensive flume experiments to study flow hydraulics in an open channel with circular cylindrical roughness. The results showed that flow resistance varies with flow depth, stem concentration, length, and diameter and is best expressed as the maximum depth-averaged velocity between stems. They developed and validated physically based formulas for flow resistance and velocities in roughness and surface layers, which enable the calculation of channel hydraulic conditions. Velzen et al.^{22} submitted a RIZA report on floodplain vegetation flow resistance for the Directorate of Public Works and Water Management in the East Netherlands, which summarizes office studies conducted in collaboration with WL/Delft Hydraulics. The first section of the report is a manual that details flow resistance for various vegetation structures, while the second section discusses resistance formulations, vegetation structural properties, and the parameters used. The key findings include detailed descriptions and validated formulas for estimating flow resistance across various vegetation types.
Huthoff^{18} investigated methods for describing vegetation impact on flow fields, which is important for river flood studies because vegetation-covered floodplains influence flow during high discharge. The study emphasizes the importance of incorporating vegetation obstruction into river-reach hydraulic models with simple, measurable input parameters that require little computational effort. The proposed method effectively meets these requirements while improving flow behavior predictions. Baptist et al.^{12} developed vegetation-induced roughness equations using a variety of methods, including two analytical methods and a numerical turbulence model. The first analytical approach simplified the vertical flow profile, whereas the second addressed the momentum balance for flow through and over vegetation. They also demonstrated the use of genetic programming to generate roughness expressions from synthetic data, which were then validated against flume experiment results. Key findings include the effective development and validation of these roughness estimation methods.
Recently, machine learning (ML) models have been widely used to model different catchment phenomena such as floods^{24}, landslides^{25,26}, and incipient sediment motion^{27,28}. ML methods are widely adopted these days because they are able to forecast complex and nonlinear environmental phenomena, require less data than other model types, are user friendly, have a nonlinear structure, and, without any knowledge of the underlying phenomenon, are able to formulate a nonlinear and robust relationship between inputs and output. Thus, these models can have a higher predictive power than both theoretical and empirical equations^{28}. Data-driven and ML approaches have been widely used in various hydraulic applications in rivers.
For instance, Wang et al.^{29} estimated river velocity based on GAN image enhancement and multi-feature fusion. Their results revealed that ML models can produce high levels of accuracy, up to 92%. Hussain and Khan^{30} found that Random Forest models had a 17.8% and 33.6% higher performance than ANN and SVM methods for forecasting river stream flow. Others have shown that ANN models used to forecast the hydraulic geometry of irrigation canals^{31} and gravel-bed rivers^{32} outperform empirical equations. Tahershamsi et al.^{33} forecasted the width of alluvial channels using multilayer perceptron (MLP) and radial basis function (RBF) models. The performance of both models was satisfactory. Gene Expression Programming has been used to estimate bed shear stress distributions within channels, demonstrating superior performance to a well-established entropy-based model^{34}.
Hybrid ML methods combine multiple independent ML methods to generate a more resilient predictive method. The aim is to leverage the benefits of different base ML methods to improve the overall accuracy, robustness, and generalizability of the forecast, particularly when applied to new data. Hybrid ML methods are widely used in several fields due to their ability to tackle complex problems and enhance accuracy.
Investigating changes in flow characteristics in open channels is crucial for understanding water ecosystems, as these changes influence sediment deposition and water quality. Maji^{35} used machine learning, specifically polynomial regression techniques, to validate laboratory experimental data of turbulent flow in a channel with emergent vegetation, showing close matches between experimental and theoretical data. Deng and Liu^{36} used a hybrid ML model, combining Bayesian Optimization with Least Squares Support Vector Machine (BO-LSSVM), to predict depth-averaged velocity in submerged vegetation flows, improving accuracy over traditional ML models and empirical formulas. Nondimensionalization as a preprocessing method further enhanced prediction performance. BO-LSSVM outperformed standalone LSSVM, SVM, and MLP models, achieving superior results and demonstrating the highest reliability in uncertainty analysis. Sensitivity analysis revealed that frictional resistance parameters are more critical than bed slope parameters.
Kumar et al.^{37} evaluated multiple standalone and hybrid ML methods to predict flow velocity in vegetative alluvial channels using diverse datasets. Among the six ML methods analyzed, ARM5P demonstrated the highest prediction accuracy. Sensitivity analysis identified vegetation height as the most critical variable in predicting flow velocity. Meddage et al.^{38} proposed tree-based ML models (Decision Tree, Extra Trees, XGBoost) to predict bulk-average velocity and surface layer friction factor (f_{S}), with SHAP for interpretation. Existing regression models, despite their accuracy, lack feature importance and causality insights. XGBoost performed best in predicting bulk-average velocity (R = 0.984) and f_{S} (R = 0.92). SHAP enhances understanding by revealing prediction rationale, dependencies, and feature importance, aligning with observed flow behaviors and increasing trust in the predictions.
Boraah and Kumar^{39} investigated the impact of vegetation on the transport of sediment and the flow of water in river channels. They discovered that aquatic plants regulate the mean flow and turbulence, reduce discharge, and increase sediment accumulation. Given the limitations of traditional methods, the study employed the Group Method of Data Handling (GMDH) soft computing technique to model flow-vegetation interactions and predict flow resistance. The GMDH model efficiently optimized predictions and highlighted the impact of a variety of factors on the velocity profile by capturing the relationship between input and output parameters.
Barman and Kumar^{40} looked at how bank angle and floodplain vegetation emergence affect flow in compound channels. They used 45-degree and 90-degree bank angles, as well as three vegetation setups: no vegetation, fully submerged, and partially emergent. The findings indicate that vegetation has a significant impact on slopes, with steeper banks (90 degrees) experiencing higher velocity, Reynolds shear stress (RSS), and turbulent kinetic energy (TKE), resulting in greater instability. Increased vegetation emergence in floodplains exacerbates slope vulnerability, providing insights for improved hydraulic engineering and bank stability maintenance.
Arora et al.^{41} investigated flow structure changes at the interface of partially and fully vegetated sections and recommended fully vegetated sections near riverine structures for improved flow management. Partially vegetated sections show helical flow and increased turbulent kinetic energy downstream, while fully vegetated areas show more transverse flux and intermixing. These findings indicate that fully vegetated covers improve safety and effectiveness in managing flow around critical river structures.
Barman et al.^{42} used three soft computing techniques to predict flow velocity in vegetated channels. They discovered that the group method of data handling (GMDH) model is better at making predictions than the optimizable Gaussian process regression (GPR) model and the ensemble tree model with Bayesian optimization (ETB). However, ETB converges more quickly.
Barman et al.^{43} investigated flow past homogeneous and heterogeneous vegetation heights in a controlled setting, accounting for submerged and emergent vegetation cases. They discovered that while height variations in fully submerged heterogeneous vegetation influence main channel flow, increased vegetation emergence and density significantly impact flow near the floodplain interaction zone. Near the water's surface, fully emergent cases show a dip effect with specific velocity gradients and negative streamwise Reynolds shear stress. Near the channel bed, sweep and ejection events are more common.
Despite the fact that all of this earlier research has demonstrated that ML algorithms have greater predictive capacity than conventional equations, they have yet to be used to forecast flow velocity in vegetated channels. As a result, there exists a significant gap in knowledge concerning the potential of machine learning algorithms and the identification of the most flexible and accurate algorithm.
Research gap

1. Very few studies have addressed the prediction of flow velocity in vegetated alluvial channels.
2. The application of hybrid ML methods, along with a sensitivity analysis of the input parameters used, is often missing from existing studies. At times, researchers are not in a position to capture all the parameters due to various limitations. Using sensitivity analysis, researchers can learn which parameters are important and which are relatively less important.
3. Using multiple datasets from laboratory flumes as well as natural rivers ensures that a robust model is developed, one which incorporates the uncertainties of the various data collected.
The current paper aims to address this knowledge gap by achieving the following objectives: (1) forecast flow velocity in vegetated alluvial channels using four standalone ML techniques (Kstar, M5P, reduced error pruning tree (REPT) and random forest (RF) models) in addition to eight hybrid ML methods built with Additive Regression (AR) and Bagging (BA), viz. ARKstar, ARM5P, ARREPT, ARRF, BAKstar, BAM5P, BAREPT and BARF; (2) compare and contrast the predictive capabilities of these proposed ML models with four frequently employed empirical equations; and (3) conduct a sensitivity analysis on the input combination that yields the highest forecasting accuracy.
This work is the first attempt to predict flow velocity in vegetated channels using a variety of machine learning methods. Based on simple flow and channel factors, the research offers new insights into ML techniques that might be used for precise and effective flow velocity forecasting.
Methodology
Proposed architecture
Figure 1 presents the proposed architecture utilized in this research work for the forecasting of flow velocity. The methodology can be summarized in eight steps:

1. Data collection from different sources
2. Dimensional analysis to find the effective input parameters
3. Divide data sets for model training and testing
4. Construct different input scenarios
5. Find the effectiveness of each input parameter on the modeled results, based on sensitivity analysis
6. Develop standalone and hybrid ML approaches
7. Optimize the models' hyperparameters
8. Compare and contrast the efficacy of the proposed models against existing approaches.
Dimensional analysis and functional formula
Yen^{1} analyzed a number of flow resistance equations with respect to their dependent and independent parameters, revealing that the following functional form can characterize flow-vegetation interactions.
where V is the flow velocity, α is the channel slope, h_{v} is the height of the vegetation, D_{f} is the flow depth, N_{v} is the number of cylinders per unit vegetated area, d_{v} is the diameter of cylindrical vegetation, and β_{d} is the nondimensional drag coefficient. Equation (1) applies to homogeneous vegetation having a fixed diameter and height of stems. The channel flow is assumed to be steady, 2D and uniform. All data come from wide channels and thus sidewall effects are neglected^{44}. In this study, V is viewed as the dependent variable, which depends on the remaining factors in Eq. (1). With this in mind, Eq. (1) can be rewritten as:
\(V=f\left(\alpha ,{h}_{v},{D}_{f},{N}_{v},{d}_{v},{\beta }_{d}\right)\)
Dataset
We compiled 447 data points from different sources. These datasets included Einstein and Banks^{45}, Fenzl^{46}, Kouwen et al.^{3}, Ree and Crow^{47}, Murota^{48}, Tsujimoto and Kitamura^{49}, Tsujimoto^{50}, Tsujimoto^{51}, Shimizu^{52}, Dunn et al.^{53}, Ikeda and Kanazawa^{8}, Meijer^{54}, Jarvela^{55}, Rowinski and Kubrak^{56}, Stone and Shen^{21}, Poggi et al.^{15}, Carollo et al.^{57}, and Murphy et al.^{58}. These studies include results from both laboratory-based flume experiments and experiments conducted on natural rivers.
After ascertaining the optimal input combination and selecting the optimal hyperparameters, the data were split into two parts^{59}, with 70% reserved for training and 30% for testing. This ratio produced 314 data points for training, while 133 data points were allotted to the testing phase. Table 1 presents the statistical metrics related to both the training and testing sets, as well as the entire dataset.
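As a minimal sketch of this partitioning step, the snippet below shuffles a list of placeholder records and splits it into 314 training and 133 testing points. The random shuffle, the seed, and the function name `split_dataset` are assumptions made for illustration; the text does not state how records were assigned to each subset.

```python
import random


def split_dataset(records, n_train, seed=0):
    """Shuffle a copy of the records and split it into train and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # assumed: random assignment
    return shuffled[:n_train], shuffled[n_train:]


# 447 placeholder records standing in for the compiled data points.
records = list(range(447))
train, test = split_dataset(records, n_train=314)
print(len(train), len(test))  # 314 133
```

Fixing the training count at 314 reproduces the sizes reported above; 314 of 447 is roughly 70% of the data.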
Determination of optimal input parameter combination
Six parameters (α, h_{v}, D_{f}, N_{v}, d_{v}, and β_{d}) were considered as potential effective parameters. The correlation coefficient between each of these six parameters and V was utilized to construct different input combinations. In total, six input combinations were formulated, starting with the parameter exhibiting the highest correlation with flow velocity (i.e., N_{v}), followed by the inclusion of the parameter with the second highest correlation, and subsequently incorporating the parameter with the third highest correlation, continuing this sequence until all parameters were utilized (see Table 2). This approach was grounded in the assumption that parameters with the highest correlation would exert the most significant influence on forecasting power.
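The construction of these nested input combinations can be sketched as follows. The correlation values are those reported in the Results section; ranking by absolute value is an assumption inferred from the order in which the parameters are listed there, and the variable names and the helper `nested_input_combinations` are illustrative only.

```python
# Correlation of each candidate input with flow velocity V, as reported in
# the Results section (keys are illustrative variable names).
reported_r = {
    "N_v": 0.27,
    "D_f": 0.21,
    "alpha": 0.18,
    "beta_d": -0.08,
    "h_v": 0.05,
    "d_v": -0.04,
}


def nested_input_combinations(correlations):
    """Rank inputs by |r| (assumed) and return the nested sets Input 1..n."""
    ranked = sorted(correlations, key=lambda name: abs(correlations[name]),
                    reverse=True)
    return [ranked[:k] for k in range(1, len(ranked) + 1)]


combinations = nested_input_combinations(reported_r)
for number, inputs in enumerate(combinations, start=1):
    print(f"Input {number}: {inputs}")
```

Under this ranking, Input 5 omits only d_{v} and Input 4 omits d_{v} and h_{v}, which agrees with the combinations described in the Results.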
Model descriptions
Machine learning models
Kstar
The Kstar procedure^{60} is an instance-based model that was inspired by the k-Nearest Neighbor regression model. In k-Nearest Neighbor, the Euclidean metric is used to evaluate the distance between the instances, while K^{*} uses an entropy-based metric. The complexity of transforming one instance into another is calculated by the K^{*} distance:
where the probability of paths between instances is represented by P*. In the case of real numbers, \({P}^{*}({\beta }_{k}/{\alpha }_{k})\) depends on the difference between \({\beta }_{k}\) and \({\alpha }_{k}\).
where j = \(|{\alpha }_{k}-{\beta }_{k}|\) and s is a parameter whose value is between zero and one.
M5Prime (M5P)
The M5P model, proposed by Wang and Witten^{61}, extends the M5 model that was initially proposed by Quinlan et al.^{62}. One of the valuable features of the M5P model is that it handles large datasets consisting of a high number of features and dimensions. The model is also robust when it comes to handling missing data points in the dataset.
The M5P model begins by partitioning the input space into multiple subspaces, ensuring that each subspace encompasses data points with common features. To minimize the variability within a particular subspace, a linear regression is used. This information is utilized to create several nodes; at these nodes a splitting process is carried out according to a given attribute. These steps help create an inverted tree-like structure with the root at the top and leaves at the bottom. When a new record enters the system, it starts at the root and traverses the tree until it reaches a leaf node. This process helps in knowledge derivation. Model development consists of three important steps:
Step 1: To construct a tree, the input space is divided into several subspaces, and the specified splitting criterion is employed to minimize intra-subspace variability. To measure the variability, the standard deviation of the values that reach a node is used. During the M5P tree-growing procedure, the standard deviation reduction (SDR)^{63} is maximized to ensure optimal model performance. The equation for SDR is given by:
\(SDR=sd\left(S\right)-\sum_{i}\frac{\left|{S}_{i}\right|}{\left|S\right|}\,sd\left({S}_{i}\right)\)
where S represents the collection of data records that reach the node, S_{i} are the sets resulting from dividing the node based on a specified attribute, and sd represents the standard deviation.

Step 2: Pruning of the tree is carried out to remove unnecessary subtrees. This phase aims to mitigate overfitting, a phenomenon wherein a machine learning model accurately predicts training data but struggles with testing or new data.

Step 3: The pruning process may induce sharp discontinuities between the adjacent linear models at the leaves of the pruned tree^{64}. As a final stage, a smoothing process is therefore implemented to address this issue.
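The splitting criterion in Step 1 can be illustrated with a short sketch. It assumes the usual form of the standard deviation reduction, sd(S) minus the size-weighted sum of sd(S_i), uses the population standard deviation, and searches thresholds on a single attribute; the function names and the toy data are hypothetical and are not taken from the study.

```python
import statistics


def sdr(parent, subsets):
    """Standard deviation reduction: sd(S) - sum(|S_i| / |S| * sd(S_i))."""
    n = len(parent)
    weighted = sum(len(s) / n * statistics.pstdev(s) for s in subsets)
    return statistics.pstdev(parent) - weighted


def best_split(xs, ys):
    """Return (threshold, SDR) of the binary split on xs that maximizes SDR."""
    pairs = sorted(zip(xs, ys))
    best = (None, float("-inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical attribute values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = sdr(ys, [left, right])
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Toy data: the target jumps between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]
threshold, gain = best_split(xs, ys)
print(threshold, round(gain, 3))  # 3.5 0.4
```

In a full model tree this search would be repeated over every attribute at every node, followed by the pruning and smoothing of Steps 2 and 3, which are not shown here.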
Reduced error pruning tree (REPT)
The machine learning model called the Reduced Error Pruning Tree (REPT) starts with building a decision tree and works its way up to a complete representation of the data. A pruning procedure is then used to remove superfluous branches, which avoids overfitting and enhances generalization to fresh data. After that, rules are extracted from this pruned tree, yielding a more straightforward and understandable model. The REPT model is useful in situations where precise forecasts and a thorough comprehension of the elements influencing decisions are crucial because it finds a balance between complexity and transparency.
Random forest (RF)
Breiman^{65} introduced RF, a tree-based ensemble learning model that is used for regression as well as classification problems. In RF, multiple weak learner trees are combined to compose a strong learner, so each tree contributes to the RF error. A collection of trees is known as a forest, and trees that are fully grown (unpruned) are considered deep trees. These deep trees have low bias but high variance, so they are appropriate choices for the RF model as it focuses on reducing variance. To decrease the variance, numerous subsets are drawn from the dataset by sampling with replacement, a method known as bootstrap sampling.
However, RF also uses another sampling method, called feature sampling, in which a random subset of the features is used to build each tree. This method can also help in reducing the variance. Both sampling methods, introduced for RF by Dong et al.^{66}, prevent the overfitting problems that can arise when multiple decision trees use the same features to make their decisions. Hence, the RF model can be seen as an enhancement of the bagging model with feature sampling.
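The two sampling mechanisms described above can be sketched in a few lines. This is an illustration only: the feature names, the number of features per tree, and the seed are assumptions, and the feature subset is drawn once per tree for brevity, whereas Breiman's formulation redraws the candidate features at each split.

```python
import random


def bootstrap_indices(n_rows, rng):
    """Draw n_rows row indices with replacement (bootstrap sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def feature_subset(feature_names, n_features, rng):
    """Draw a random subset of features without replacement."""
    return rng.sample(feature_names, n_features)


rng = random.Random(42)
features = ["alpha", "h_v", "D_f", "N_v", "d_v", "beta_d"]

for tree_id in range(3):
    rows = bootstrap_indices(447, rng)      # rows used to grow this tree
    cols = feature_subset(features, 2, rng)  # features visible to this tree
    print(tree_id, len(set(rows)), cols)
```

Because rows are drawn with replacement, each tree sees only part of the distinct records, which is what makes the individual trees differ from one another.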
Additive regression (AR)
Additive Regression (AR) is an ML approach that focuses on increasing the forecasting accuracy by combining the predictions of multiple regression models. AR methods involve the creation of individual regression models for each predictor variable and then combining their outputs. The AR method aims to utilize the additive effects of each predictor on the response variable. AR models usually perform well when predictor variables interact nonlinearly, as they possess the flexibility to model complex relationships. The final model is an additive composition of these individual regression models, providing a comprehensive representation of the overall relationship between predictors and the response variable.
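The additive composition described above can be illustrated with a minimal sketch in which one simple linear term is fitted per predictor to the residual left by the previous terms, and the terms are summed at prediction time. The linear form of each term, the single pass over the predictors, and all names are assumptions for illustration; the implementation used in the study, or in a given ML toolkit, may differ (for example, by repeatedly fitting a base learner such as M5P to the residuals).

```python
def fit_simple_linear(x, y):
    """Least-squares slope and intercept of y on a single predictor x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def fit_additive(columns, y):
    """Fit one linear term per predictor on the running residual."""
    intercept = sum(y) / len(y)
    residual = [v - intercept for v in y]
    terms = {}
    for name, x in columns.items():
        slope, offset = fit_simple_linear(x, residual)
        terms[name] = (slope, offset)
        residual = [r - (slope * a + offset) for r, a in zip(residual, x)]
    return intercept, terms


def predict_additive(model, row):
    """Sum the intercept and every per-predictor term for one record."""
    intercept, terms = model
    return intercept + sum(s * row[name] + o for name, (s, o) in terms.items())


# Toy data generated from y = 1 + 2*x1 + 3*x2 with uncorrelated predictors.
columns = {"x1": [-1, 1, -1, 1], "x2": [-1, -1, 1, 1]}
y = [-4, 0, 2, 6]
model = fit_additive(columns, y)
print(predict_additive(model, {"x1": 1, "x2": 1}))
```

With uncorrelated predictors, as in this toy case, a single pass recovers the generating coefficients; correlated predictors would generally need repeated passes.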
Bootstrap aggregation (bagging/BA)
Bootstrap aggregation (bagging) is an ensemble methodology used for both regression and classification problems. In many cases, decision tree models suffer from high variance, which can be circumvented by the Bagging approach. Bagging is usually applied when the amount of data is limited, and a robust estimate of a statistical feature is required. The model uses multiple random training data samples to train multiple models for forecasting. To provide a reliable forecast, the forecasting accuracy of each of these many models is evaluated, and the averaged findings are used. By reducing the effect of individual model variances, this averaging strategy improves the forecasts' overall reliability.
For a set of n independent observations k_{1}, k_{2}, …, k_{n}, each having variance \({\sigma }^{2}\), the variance of the mean K of the observations is \({\sigma }^{2}/n\). Thus, averaging reduces the variance, and increasing the size of the training sample reduces it further, enhancing the forecasting accuracy. Given C sample training sets, C models are produced, \({f}_{1}{\prime}(x),{f}_{2}{\prime}(x),{f}_{3}{\prime}(x),..,{f}_{C}{\prime}(x),\) where \(x<k\). These models are averaged to obtain a low-variance model:
However, in many instances large sample sizes are not available. To overcome this, bootstrapping is used to randomly sample multiple datasets and the averaged model is given by:
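The bootstrap-and-average procedure can be sketched as follows. A one-split regression stump is used as the base learner purely for brevity; the base learner, the number of models, the seed, and the toy data are assumptions and do not reflect the configuration used in the study.

```python
import random


def fit_stump(xs, ys):
    """Regression stump: one threshold, predicting the mean of y on each side."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        sse = (sum((y - mean_left) ** 2 for y in left)
               + sum((y - mean_right) ** 2 for y in right))
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, mean_left, mean_right)
    _, threshold, mean_left, mean_right = best
    return lambda x: mean_left if x <= threshold else mean_right


def fit_bagged(xs, ys, n_models=25, seed=0):
    """Train one stump per bootstrap resample and average their forecasts."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)


# Toy data: the target steps from 0.2 to 0.8 at x = 10.
xs = list(range(20))
ys = [0.2 if x < 10 else 0.8 for x in xs]
predict = fit_bagged(xs, ys)
print(predict(0), predict(19))
```

Each stump places its threshold slightly differently depending on its resample, so the averaged forecast near the step is smoother than that of any single stump.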
Empirical equations
The proposed approach is compared to four commonly used empirical equations (Eqs. 8–11) (Huthoff^{18}; Velzen et al.^{22}; Baptist et al.^{12}; Stone and Shen^{21}):
Model performance metrics
To evaluate the effectiveness of the proposed models in forecasting the mean velocity of flow in a vegetated channel, the following six metrics were used: R^{2}, R, MAE, MSE, NSE, and PBias. Their mathematical formulation is given below:
where \(\hat{V}\) and V refer to the forecasted and actual values, \(\overline{\hat{V}}\) and \(\overline{V}\) denote the mean forecasted and mean actual value, respectively, and N is the total number of data points used in the study. Ideal values of R^{2}, R, NSE, MAE, MSE and PBias are 1, 1 or − 1, 1, 0, 0 and 0 respectively. Model performance can be classified using the NSE values (between − ∞ and 1; Moriasi et al.^{67}): (i) unsatisfactory: NSE ≤ 0.40; (ii) acceptable: 0.40 < NSE ≤ 0.50; (iii) satisfactory: 0.50 < NSE ≤ 0.65; (iv) good: 0.65 < NSE ≤ 0.75; (v) very good: 0.75 < NSE ≤ 1.00.
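For readers who wish to reproduce the metrics, the sketch below computes all six from paired observed and forecasted velocities, and applies the NSE classes listed above. It assumes the conventional definitions: R is the Pearson correlation, R^{2} is taken as its square, and PBias is expressed in percent as 100·Σ(V − \(\hat{V}\))/ΣV, so that positive values indicate underestimation. Function names and the toy data are illustrative.

```python
import math


def metrics(observed, forecast):
    """Return R, R2, MAE, MSE, NSE and PBias for paired value lists."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_f = sum(forecast) / n
    errors = [f - o for o, f in zip(observed, forecast)]
    sxy = sum((o - mean_o) * (f - mean_f) for o, f in zip(observed, forecast))
    sxx = sum((o - mean_o) ** 2 for o in observed)
    syy = sum((f - mean_f) ** 2 for f in forecast)
    r = sxy / math.sqrt(sxx * syy)
    return {
        "R": r,
        "R2": r * r,  # assumed: square of the Pearson correlation
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": sum(e * e for e in errors) / n,
        "NSE": 1 - sum(e * e for e in errors) / sxx,
        # Positive PBias means the forecasts are, on balance, too low.
        "PBias": 100 * sum(o - f for o, f in zip(observed, forecast))
                 / sum(observed),
    }


def classify_nse(nse):
    """Map an NSE value to the performance classes listed in the text."""
    if nse <= 0.40:
        return "unsatisfactory"
    if nse <= 0.50:
        return "acceptable"
    if nse <= 0.65:
        return "satisfactory"
    if nse <= 0.75:
        return "good"
    return "very good"


observed = [0.2, 0.4, 0.6, 0.8]
forecast = [0.18, 0.36, 0.54, 0.72]  # every forecast is 10% too low
result = metrics(observed, forecast)
print(result["PBias"], classify_nse(result["NSE"]))
```

In this toy case the forecasts are perfectly correlated with the observations yet systematically low, which shows why R alone is not sufficient and why PBias and NSE are reported alongside it.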
For visual examination, Taylor diagrams, box plots, and line and scatter plots were utilized in this study. The Taylor diagram offers the advantage of incorporating two primary statistics, standard deviation (SD) and correlation (R), providing a comprehensive visualization of model performance^{68}. The reference point for a Taylor diagram refers to the measured data. The stronger the forecasting capability of a given model, the nearer the forecasted value to the reference value in terms of R and SD. A box plot can demonstrate how effectively a model predicts values at the extremes, median, and quartile ranges; the closer the quartile line of the forecasted value to the actual quartile, and more generally, the greater the similarity in boxplot shape, the better the model performance.
Results
Ascertaining the optimal input parameter combination
Spider and heat map plots of the correlation coefficient in Fig. 2 show that the number of cylinders per unit vegetated area had the strongest correlation with flow velocity (R = 0.27), followed by flow depth (R = 0.21), channel slope (R = 0.18), nondimensional drag coefficient (R = − 0.08), height of the vegetation (R = 0.05), and diameter of cylindrical vegetation (R = − 0.04).
Table 3 shows the effectiveness of the different input combinations, based on the R and MSE values. Input 6 (all input parameters involved) was the optimal combination for seven of the 12 models (ARM5P, ARREPT, ARRF, BAKstar, BAM5P, BARF, and RF). Input 5 (all involved except d_{v}) was optimal for four models (ARKstar, BAREPT, Kstar, and M5P), and the REPT model performed most strongly with Input 4 (all involved except d_{v} and h_{v}).
Model performance
Using the testing dataset, it can be observed that all ML models exhibit high performance (Fig. 3), and hybrid models are more capable than standalone models at capturing extreme values (minimum and maximum V values).
To benchmark this performance, Table 4 compares the performance metrics of the twelve ML models with those of four empirical equations. In all cases, the ML models perform far better than the empirical equations. All the models except the empirical equations demonstrate very good forecasting capabilities in terms of R^{2} (R^{2} > 0.7). Based on the NSE model performance classification proposed by Moriasi et al.^{67}, all ML models performed very well, while empirical equations had unsatisfactory performance.
The PBias metric shows the level of bias in model performance. The optimal value of PBias is 0. Usually, |PBias| ≤ 10 corresponds to very good model performance^{69}. A positive PBias indicates an underestimation, while a negative PBias signifies overestimation. Although all ML models perform very well, Table 4 shows that the PBias values for the standalone and hybrid versions of the Kstar model are close to zero. All models, except the empirical equations of Baptist et al.^{12} and Stone and Shen^{21}, underestimated flow velocity.
The comparison in Table 4 also reveals which of the models had the highest performance. For all metrics but PBias, the ARM5P model had the highest forecasting power. In the case of PBias, the Kstar model was judged to be the best-performing model. For all metrics the hybridized ML models outperformed their standalone counterparts.
Box plots are presented to compare the performance of both standalone and hybridized machine learning models (Fig. 4). The results show the quartiles of the ARM5P and observed data almost coincide. In contrast, the quartile for ARREPT shows higher deviation, indicating low performance. In terms of the maximum V value, the RF model and its hybridized versions (ARRF, BARF) showed higher performance, while ARM5P more accurately captured the lowest V value than the other models.
Figure 5a–d shows the box plots of forecasted flow velocity for the empirical equations, plotted separately from those in Fig. 4 because they overestimate flow velocity by a very large margin. The equation developed by Baptist et al.^{12} performed better than the other empirical equations, but none of these equations was able to forecast V accurately.
In the Taylor plot (Fig. 6), the ARM5P model was in close proximity to the observed reference point, indicating that the forecasted standard deviation of flow velocity closely matched the observed data standard deviation, and the correlation was highest among the models evaluated. On the Taylor plot, the ARKstar, BAKstar, ARRF, and BAREPT data points nearly coincide, indicating comparable model performance. Stone and Shen’s^{21} empirical equation had the lowest performance.
Sensitivity analysis
A sensitivity analysis was undertaken to understand the impact of each input parameter on flow velocity by removing one parameter at a time from the model construction and evaluating the effect on model performance. The input combinations for the sensitivity analysis are shown in Table 5. For example, Input combination A removed the parameter d_{v} and used the remaining five parameters (N_{v}, D_{f}, α, β_{d}, and h_{v}), Input combination B removed parameter h_{v}, and so on. The removal of the h_{v} parameter from the input variable combination produced the largest increase in MAE and MSE values, and thus the largest deterioration in model performance, compared to the other parameters (Fig. 7). Therefore, the h_{v} parameter was the most sensitive and effective input parameter for the forecasting of flow velocity, followed by D_{f}, α, N_{v}, β_{d}, and d_{v}.
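The leave-one-parameter-out procedure can be sketched in a few lines of Python. This is an illustration under explicit assumptions, not the authors' workflow: a simple k-nearest-neighbour regressor stands in for the ML models of the study, the data are synthetic, and the feature names "a", "b", "c" and all function names are hypothetical.

```python
import random


def knn_predict(train_X, train_y, x, k=3):
    """Predict by averaging the targets of the k nearest training rows."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((p - q) ** 2 for p, q in zip(train_X[i], x)),
    )
    return sum(train_y[i] for i in order[:k]) / k


def mae(train_X, train_y, test_X, test_y, k=3):
    """Mean absolute error of the stand-in model on the test rows."""
    errors = [abs(knn_predict(train_X, train_y, x, k) - y) for x, y in zip(test_X, test_y)]
    return sum(errors) / len(errors)


def drop_column(X, j):
    """Return a copy of X without column j."""
    return [row[:j] + row[j + 1:] for row in X]


def sensitivity(train_X, train_y, test_X, test_y, names):
    """Test MAE obtained when each named input is removed in turn."""
    return {
        name: mae(drop_column(train_X, j), train_y, drop_column(test_X, j), test_y)
        for j, name in enumerate(names)
    }


# Synthetic data (assumption): the target depends strongly on "a",
# weakly on "b", and not at all on "c".
random.seed(0)
names = ["a", "b", "c"]
X = [[random.random() for _ in names] for _ in range(200)]
y = [5.0 * row[0] + 0.1 * row[1] for row in X]

scores = sensitivity(X[:150], y[:150], X[150:], y[150:], names)
most_sensitive = max(scores, key=scores.get)
print(scores, most_sensitive)
```

The input whose removal yields the largest error is read as the most sensitive one, which is the same logic applied to h_{v} in the text.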
Discussion
Comparing and contrasting the efficacy of empirical, standalone, and proposed hybrid machine learning models
The paper used numerous datasets collected from various sources, in which flow velocity had been measured in differing ways in vegetated channels in varied natural and laboratory conditions, to investigate the efficiency of each model. The empirical equations performed poorly, confirming these relations should be used with due caution outside the conditions for which they were developed. In contrast, all ML models performed well because they can learn and adapt to the changing data.
Among the standalone models, the RF model outperformed the others. Several factors explain this result: (1) RF is better at handling datasets that contain null or missing values; (2) each base tree is constructed independently of the others, so training can be parallelized; (3) the algorithm is extremely stable, since the average response of a large number of trees is used; and (4) the model preserves diversity, since not all attributes are evaluated when creating each base tree. This last feature has the added advantage of reducing the feature space, making RF robust to the curse of dimensionality (the situation in which the number of features is large compared to the number of observations in the dataset). Thus, RF can handle larger datasets, both in size and in number of attributes. The hybridized models outperformed their standalone counterparts. This enhanced performance occurred because hybridization led to a coupled model that is more flexible, is better trained, and has a nonlinear structure^{70}. Given the nonlinearity of the relationships between the variables and the weak connection between the individual variables and flow velocity, this flexibility and structure are particularly crucial for the forecasting of flow velocity.
Several factors explain why the hybrid M5P models outperformed all other hybridized models. First, M5P is a comparatively simple and interpretable algorithm, which makes the model's output easier to understand. Second, M5P can handle both continuous and categorical data, which is beneficial when working with datasets containing both types of variable. Third, the M5P model is built in two key stages: growth and pruning. The growth stage splits nodes based on the values of the input attributes, aiming to reduce the forecasting error for numerical responses at terminal nodes while increasing the tree's depth. The pruning stage assesses the contribution of each attribute to a node's forecasting error and subsequently prunes unnecessary branches. Fourth, hybrid models that combine M5P with other algorithms can capitalize on the strengths of both, resulting in a model that is more robust and accurate.
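To illustrate the growth stage, the sketch below evaluates candidate splits of a single attribute with the standard-deviation-reduction (SDR) criterion commonly associated with M5-type model trees. The source does not state which splitting criterion its implementation uses, so the criterion, the function names, and the toy values are assumptions made for illustration.

```python
import math


def std_dev(values):
    """Population standard deviation (0.0 for an empty list)."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def sdr(targets, left, right):
    """Standard deviation reduction achieved by splitting targets in two."""
    n = len(targets)
    return (
        std_dev(targets)
        - (len(left) / n) * std_dev(left)
        - (len(right) / n) * std_dev(right)
    )


def best_split(x, y):
    """Return (threshold, sdr) of the best binary split on attribute x."""
    pairs = sorted(zip(x, y))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical attribute values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        gain = sdr(y, left, right)
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Toy example: the response jumps between x = 2 and x = 3.
threshold, gain = best_split([1.0, 2.0, 3.0, 4.0], [0.1, 0.1, 0.9, 0.9])
print(threshold, gain)
```

In a full model tree, a split of this kind would be applied recursively, with linear regression functions fitted at the leaves and pruning applied afterwards.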
Impact of input variables on the accuracy of model forecasting
The combination of input variables significantly influenced the predictive capability of the models, underscoring that identifying the optimal combination is a crucial step in developing an accurate machine learning model. For instance, the input combination with variable h_{v} removed exhibited over three times the forecasting accuracy (in terms of NSE) of the worst-performing input combination. Consequently, a variety of input variable combinations must be explored during the optimization of machine learning models. We employed a manual approach to determine the optimal input combination. Methods such as PCA and the gamma test can also identify an optimal input combination, but they return only a single combination. Manually ascertaining the optimal combination can yield models with superior forecasting performance, because varying the inputs makes it possible to identify the most sensitive parameters and to understand how the model responds to each of them.
The current paper showed that, in most of the cases, the optimal input combination corresponded to the inclusion of all the input parameters. Even parameters with low correlation with flow velocity, such as vegetation height and diameter, contributed to better forecasting power. This result further highlights the complex, nonlinear nature of the interaction of vegetation with flow mechanics, and the requirement for multiple input parameters to represent this interaction. Consequently, a variety of distinct input variable combinations must be taken into account during the optimization of machine learning models, even when channel, flow and vegetation parameters might a priori be considered ineffective.
Capturing impact on flow velocity of vegetated alluvial channels through AI models
Vegetative elements significantly influence flow velocity in vegetated channels, and understanding these effects is critical for accurate forecasting and effective river management. Taller plants extend further into the flow, giving the water more surface area to push against, increasing drag force and decreasing flow velocity. Furthermore, dense canopies formed by taller vegetation greatly restrict water flow, increasing resistance and decreasing velocity, whereas shorter vegetation allows water to flow more freely, resulting in higher velocities. Taller vegetation contributes to more turbulence in the water column. The turbulence dissipates the water's kinetic energy, further reducing its velocity. In contrast, shorter plants have less surface area in contact with the water, resulting in less drag and faster water flow. In addition, shorter vegetation produces less turbulence, preserving the water's kinetic energy and maintaining a higher flow velocity.
These complex interactions between vegetation and flow mechanics demonstrate the nonlinearity of flow velocity in vegetated channels. Machine learning models, particularly hybrid models, have demonstrated great potential for capturing these complex nonlinear interactions. These models can learn the relationships between various vegetative parameters (such as height, density, and flexibility) and flow velocity by utilizing heterogeneous datasets. The superior performance of hybrid models demonstrates their ability to accurately forecast flow velocity in a variety of vegetative and channel conditions. The proposed ARM5P model, for example, couples Additive Regression, in which successive base learners are fitted to the residuals left by the previous ones, with tree-based algorithms to detect nonlinear patterns in data.
This study compared twelve ML models, including hybrids like ARM5P, to traditional empirical equations. The results showed that hybrid ML models outperformed empirical equations for predicting flow velocity in vegetated channels. These models excelled at accounting for the diverse and complex effects of vegetative elements on flow velocity. However, more research is needed to investigate how these models perform across a wider range of vegetation types and channel morphologies. Vegetation flexibility, spacing, and seasonal variations in vegetation characteristics all have an impact on model accuracy and should be taken into account in future studies.
Applying machine learning methods to forecast flow velocity in vegetated channels
The results indicate that hybrid M5P models, particularly M5P models trained with an Additive Regression algorithm, have the potential to generate accurate forecasts of flow velocity in vegetated river channels. Such methods can be easily employed in regions/countries where understanding of the flow-vegetation processes in river systems is limited. The ML models developed in this paper offer primary advantages in terms of simplicity, ease of construction, and low operational costs. This stands in contrast to theoretical and numerical models, which frequently demand substantial prior knowledge and resources for their development. The main disadvantages are twofold. First, in line with other statistical approaches, the models formulated in this research are tailored to the specific rivers under examination, and employing them in different river settings might not produce comparable forecasting accuracy. The input parameter range in other rivers may be wider than that examined in this paper, despite the use of datasets compiled from a variety of sources from both lab and field investigations. Thus, future studies should develop and apply ML models to rivers with differing channel and plant morphologies to test their wider applicability. Second, as a result of their 'black box' structure, these models offer limited explanation of their results and are unable to provide insight into the physical factors that determine flow velocity.
The current study has considered six controlling parameters, revealing that flow depth, channel slope, nondimensional drag coefficient, vegetation height and diameter, and the number of cylinders per unit vegetated area must all be accounted for in ML models of flow velocity. Future studies should take into account how other characteristics, such as vegetation flexibility and spacing, affect the effectiveness of these models where data are available (e.g. Haslam^{71}; Sand-Jensen^{72}). Doing so would assist in identifying the key parameters influencing flow velocity and in elucidating the reasons behind their variations among rivers characterized by distinct vegetation and channel properties.
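The Additive Regression idea referred to above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: a one-split regression stump replaces the M5P base learner to keep the example short, the shrinkage value and number of rounds are arbitrary, and all function names are hypothetical.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression stump to 1-D data; return a predictor.

    Assumes x contains at least two distinct values.
    """
    best = None
    distinct = sorted(set(x))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        sse = sum((r - mean_l) ** 2 for r in left) + sum((r - mean_r) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, mean_l, mean_r)
    _, t, mean_l, mean_r = best
    return lambda xi: mean_l if xi <= t else mean_r


def fit_additive_regression(x, y, n_rounds=20, shrinkage=0.5):
    """Start from the mean, then fit each new learner to the residuals."""
    base = sum(y) / len(y)
    residuals = [yi - base for yi in y]
    learners = []
    for _ in range(n_rounds):
        stump = fit_stump(x, residuals)
        learners.append(stump)
        # Only a shrunken fraction of each learner's output is removed.
        residuals = [r - shrinkage * stump(xi) for xi, r in zip(x, residuals)]
    return lambda xi: base + shrinkage * sum(s(xi) for s in learners)


def mse(model, x, y):
    """Mean squared error of a fitted predictor on (x, y)."""
    return sum((model(xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(y)


# Toy nonlinear data (assumption): y = x squared on a regular grid.
xs = [i / 10.0 for i in range(20)]
ys = [xi ** 2 for xi in xs]
one_round = fit_additive_regression(xs, ys, n_rounds=1)
many_rounds = fit_additive_regression(xs, ys, n_rounds=20)
print(mse(one_round, xs, ys), mse(many_rounds, xs, ys))
```

Each round corrects part of the error left by the previous rounds, so the training error falls as rounds are added. The same residual-fitting loop applies when the base learner is a model tree rather than a stump.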
Applying hybrid ML models for forecasting natural issues
The ARM5P algorithm, a hybrid approach combining Additive Regression (AR) with the M5P model tree, has demonstrated superior performance in our prediction tasks. This model’s effectiveness can be leveraged in several critical natural and environmental domains. ARM5P can be used to model and predict climate variables such as temperature, precipitation, and sea-level rise. Its ability to handle both linear and nonlinear relationships makes it particularly suitable for capturing the complex interactions inherent in climate systems. For instance, it can predict temperature anomalies or precipitation patterns, which are crucial for understanding and mitigating the impacts of climate change.
In order to estimate pan evaporation rates using meteorological data from three Iraqi stations, Elbeltagi et al.^{73} investigated the coupling of the additive regression model (AR) with four machine learning models including M5P. The ARM5P model, which used wind speed, relative humidity, and minimum and mean temperatures, showed that hybrid methods can accurately predict complex hydrological relationships.
Elbeltagi et al.^{74} used five intelligent and hybrid metaheuristic machine learning algorithms (AR, ARBagging, ARRandomSubspace, ARM5P, and ARREPTree) to predict monthly mean daily reference evapotranspiration using climatic data from two semi-arid regions in Pakistan (1987–2016). The results revealed that all models predicted monthly mean daily reference evapotranspiration with high precision, with the ARM5P model achieving the highest accuracy.
The increasing need for agricultural production and frequent droughts require accurate estimation of actual evapotranspiration for effective irrigation management. Granata^{75} compared three machine learning models along with ARM5P with different input variables to predict evapotranspiration using data from Central Florida. Vishwakarma et al.^{76} used the M5P model to assess dams' impact on river hydrology and daily water temperature in the Yangtze River at Cuntan, emphasizing the importance of accurate water temperature prediction for ecological and operational planning. These models offer dependable and cost-effective tools for forecasting water temperature, which helps with reservoir planning and environmental management.
In summary, the ARM5P algorithm’s robustness and flexibility make it a valuable tool for addressing a wide range of natural and environmental issues. Its ability to integrate and analyze multifaceted datasets allows for more accurate predictions and informed decision-making, ultimately contributing to the sustainability and resilience of natural systems.
Explainability of machine learning approaches used
Explainability of machine learning approaches in the context of ARM5P
Explainability in ML refers to the ability to describe the inner workings and decision-making processes of models in a way that is understandable to humans. This is crucial for validating model predictions, ensuring user trust, and facilitating regulatory compliance. In the context of our study, the explainability of the ARM5P algorithm can be discussed as follows:
Model structure and decision rules
The ARM5P algorithm combines Additive Regression with M5P model trees, which are inherently more interpretable than many black-box models. The M5P model tree generates decision rules in the form of linear regression functions at its leaves. These rules can be easily inspected and interpreted to understand how the model makes predictions based on input features. For example, the decision paths in the tree can be traced to see how specific variables contribute to the final prediction.
Feature importance
The ARM5P model provides insights into feature importance by indicating which variables are used in the decision nodes of the tree. By analyzing the frequency and impact of features at different nodes, we can identify the most influential variables driving the predictions. This helps in understanding the relative importance of each feature in the context of the model. Further, we tested multiple input combinations in this study, thereby assessing the importance of each feature relative to the others. We also carried out a sensitivity analysis to find the most influential parameter.
Model simplification
While ARM5P is more interpretable than many complex models, further simplification techniques, such as pruning the decision tree, can enhance interpretability without significantly compromising accuracy. Simplified models are easier to interpret and explain, making them more accessible to non-technical stakeholders.
In summary, the ARM5P algorithm offers several avenues for explainability, from its inherently interpretable model structure to feature-importance analysis and model simplification. By leveraging these methods, we can enhance the transparency and interpretability of our ML predictions, thereby fostering greater trust and understanding among users and stakeholders.
Limitations of the study
Predicting flow velocity in a vegetative alluvial channel can be quite challenging due to the numerous variables that require consideration. This study utilized a range of datasets from the literature, including the number of cylinders per unit vegetated area (N_{v}), flow depth (D_{f}), channel slope (α), vegetation height (h_{v}), cylindrical vegetation diameter (d_{v}), and nondimensional drag coefficient (β_{d}). However, various factors, such as the shape of the channel bed, the Froude number, the amount of water flowing through the channel, and more, can all impact the prediction of flow velocity. Our dataset was missing these factors, so our proposed methods did not take their influence into consideration. In addition, the range of variables plays a crucial role in the training of the ML method. Although our dataset includes data from various field and laboratory studies, there are instances where the input variables exceed the values considered by the authors. In these two instances, the proposed method may not perform as well as it currently does. These concerns are common in most MLbased methods, as training heavily depends on the dataset and its characteristics.
Gaussian noise is a key concept in signal processing and machine learning. It refers to a type of random variation that follows a Gaussian distribution. By injecting Gaussian noise into the data, we impose a level of unpredictability specified by this particular distribution. This has the potential to significantly alter the performance and analysis of the ML approach. In our method, we apply 10%, 20%, and 30% Gaussian noise to each column sequentially. This methodology introduces a specified level of disruption into the data, which is useful for assessing the resilience of ML approaches and their ability to generalize to new data sets. An investigation of the influence of Gaussian noise on ML method performance typically involves analyzing the method's ability to handle noisy inputs and determining whether it can still produce accurate predictions despite the increased variability (Table 6).
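One way to realize this column-by-column perturbation is sketched below. The text does not define what "10% noise" means, so the sketch assumes zero-mean Gaussian noise whose standard deviation is the stated fraction of the standard deviation of the perturbed column. The function name `add_gaussian_noise` and the sample rows are hypothetical.

```python
import random
import statistics


def add_gaussian_noise(rows, column, level, rng):
    """Return a copy of rows with Gaussian noise added to one column.

    Assumption: the noise standard deviation is `level` times the
    population standard deviation of that column.
    """
    sd = statistics.pstdev([row[column] for row in rows])
    noisy = [list(row) for row in rows]
    for row in noisy:
        row[column] += rng.gauss(0.0, level * sd)
    return noisy


# Illustrative rows only (two input columns, not data from the study).
data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
rng = random.Random(42)

# Perturb each column in turn at 10%, 20% and 30% noise.
perturbed = {}
for level in (0.10, 0.20, 0.30):
    for column in range(len(data[0])):
        perturbed[(level, column)] = add_gaussian_noise(data, column, level, rng)

print(len(perturbed))  # 3 noise levels x 2 columns = 6 perturbed copies
```

Each perturbed copy would then be passed to the trained model, and the change in the error metrics relative to the clean data indicates how robust the model is to noise in that input.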
Conclusion and future work
The precise forecasting of flow velocity in vegetated channels is important for estimating flooding and sediment transport. As a result of the nonlinear interactions between vegetation and flow mechanics, machine learning methods have great potential for forecasting flow velocity with high accuracy. Using flow velocity measurements in natural and laboratory flume experiments, this research evaluated the performance of twelve ML models (Kstar, ARKstar, BAKstar, M5P, ARM5P, BAM5P, REPT, ARREPT, BAREPT, RF, BARF, ARRF) for forecasting of flow velocity in an alluvial channel with submerged vegetation. Their performance was compared against those of four empirical equations, using a large number of datasets available in the literature. The main findings were as follows:

(1) Results from a sensitivity analysis indicated that the most influential factor on flow velocity was vegetation height, followed by flow depth, the number of cylinders per unit vegetated area, channel slope, nondimensional drag coefficient, and vegetation diameter.

(2) The ARM5P model had the greatest predictive ability. According to Nash–Sutcliffe Efficiency values, all machine learning models displayed ‘very good’ performance and outperformed empirical models, which had ‘unsatisfactory’ performance. All models, except two empirical equations, underestimated flow velocity.

(3) Compared to standalone machine learning and empirical models, hybrid models have superior forecasting power because they are more flexible in their internal structure and reproduce nonlinear interactions between vegetation, channel, and flow characteristics more effectively.

(4) Nearly all ML methods performed best when all input parameters were utilized in model construction. Input variables exhibiting low correlation coefficients with flow velocity were found to enhance the accuracy of forecasting. As a result, the optimization of machine learning models necessitates the consideration of a diverse array of input variable combinations.
The results of this study show that hybrid ML models possess tremendous potential in forecasting flow velocity and examining nonlinear flow-vegetation interactions, particularly in situations where the physical processes under consideration are not fully understood. Consequently, understanding this potential over a wider range of vegetation and channel morphologies, and considering how other factors affect the performance of these models, such as vegetation flexibility and spacing, is a crucial research avenue for river scientists.
Future work on flow velocity prediction in vegetated channels could proceed in a number of directions. First, the ML method could enhance its ability to capture nontrivial patterns in the data by incorporating cutting-edge deep learning architectures such as convolutional neural networks (CNNs) or recurrent neural networks (RNNs). Incorporating domain-specific properties related to fluid dynamics and hydrodynamics may also enhance the prediction capacity of the ML approach. As a result, predictions for different flow patterns, vegetation types, and scenarios may be more accurate and reliable. In addition, hybrid methods that combine data-driven machine learning with physics-based techniques could offer the benefits of both approaches, making predictions more accurate without sacrificing insight into the underlying physical processes. Furthermore, acquiring a larger and more diverse dataset encompassing a wide range of flow conditions, geometries, and sizes would facilitate the training of machine learning algorithms that can handle real-world scenarios with greater precision and reliability. Finally, developing easy-to-use software tools or platforms for predicting flow velocity in vegetated channels, such as web-based tools or Android apps, could improve their uptake and allow this work to be applied in practice to river flow management and environmental protection.
Data availability
Data will be made available on request from the corresponding author Vishal Deshpande at deshpande@iitp.ac.in .
Abbreviations
V: Flow velocity
N_{v}: Number of cylinders per unit vegetated area
D_{f}: Flow depth
α: Channel slope
h_{v}: Height of the vegetation
d_{v}: Diameter of cylindrical vegetation
β_{d}: Nondimensional drag coefficient
g: Gravitational acceleration
References
Yen, B. C. Open channel flow resistance. J. Hydraul. Eng. 128, 20–39 (2002).
Clark, S. D. A. et al. Modelling river flow through instream natural vegetation for a gravel-bed river reach. In Recent Trends in Environmental Hydraulics: 38th International School of Hydraulics 33–41 (2020).
Kouwen, N., Unny, T. E. & Hill, H. M. Flow retardance in vegetated channels. J. Irrig. Drain. Div. 95, 329–342 (1969).
Velasco, D., Bateman, A., Redondo, J. M. & Demedina, V. An open channel flow experimental and theoretical study of resistance and turbulent characterization over flexible vegetated linings. Flow, Turbul. Combust. 70, 69–88 (2003).
Wilson, C. A. M. E., Stoesser, T., Bates, P. D. & Pinzen, A. B. Open channel flow through different forms of submerged flexible vegetation. J. Hydraul. Eng. 129, 847–853 (2003).
Chen, S. C., Kuo, Y. M. & Li, Y. H. Flow characteristics within different configurations of submerged flexible vegetation. J. Hydrol. 398, 124–134 (2011).
Armanini, A., Righetti, M. & Grisenti, P. Direct measurement of vegetation resistance in prototype scale. J. Hydraul. Res. 43, 481–487 (2005).
Ikeda, S. & Kanazawa, M. Three-dimensional organized vortices above flexible water plants. J. Hydraul. Eng. 122, 634–640 (1996).
Liu, D., Diplas, P., Fairbanks, J. D. & Hodges, C. C. An experimental study of flow through rigid vegetation. J. Geophys. Res. Earth Surf. 113, (2008).
Stoesser, T., Kim, S. J. & Diplas, P. Turbulent flow through idealized emergent vegetation. J. Hydraul. Eng. 136, 1003–1017 (2010).
Cheng, S. et al. Improved understanding of how catchment properties control hydrological partitioning through machine learning. Water Resour. Res. 58, e2021WR031412 (2022).
Hoffmann, M. R. & Hoffmann, R. D. On inducing equations for vegetation resistance. J. Hydraul. Res. 47, 281 (2009).
Defina, A. & Bixio, A. C. Mean flow and turbulence in vegetated open channel flow. Water Resour. Res. 41, 1–12 (2005).
Neary, V. S. Numerical solution of fully developed flow with vegetative resistance. J. Eng. Mech. 129, 558–563 (2003).
Poggi, D., Krug, C. & Katul, G. G. Hydraulic resistance of submerged rigid vegetation derived from first-order closure models. Water Resour. Res. 45, (2009).
Choi, S. U. & Kang, H. Reynolds stress modeling of turbulent open-channel flows. Water Resour. Res. Prog. 42, 351–414 (2008).
Green, J. C. Effect of macrophyte spatial variability on channel resistance. Adv. Water Resour. 29, 426–438 (2006).
Huthoff, F. Modeling hydraulic resistance of floodplain vegetation. 171 (2007).
Kouwen, N. & Fathi-Moghadam, M. Friction Factors for Coniferous Trees along Rivers. J. Hydraul. Eng. 126, 732–740 (2000).
Stephan, U. & Gutknecht, D. Hydraulic resistance of submerged flexible vegetation. J. Hydrol. 269, 27–43 (2002).
Stone, B. M. & Shen, H. T. Hydraulic resistance of flow in channels with cylindrical roughness. J. Hydraul. Eng. 128, 500–506 (2002).
Van Velzen, E., Jesse, P., Cornelissen, P. & Coops, H. Stromingsweerstand vegetatie in uiterwaarden deel 1 handboek versie 1.0. RIZA, Arnhem 157 (2003).
Mir, A. A. & Patel, M. Machine learning approaches for adequate prediction of flow resistance in alluvial channels with bedforms. Water Sci. Technol. 89, 290–318 (2024).
Munawar, H. S., Hammad, A. W. A. & Waller, S. T. A review on flood management technologies related to image processing and machine learning. Autom. Constr. 132, 103916 (2021).
Kavzoglu, T., Colkesen, I. & Sahin, E. K. Machine learning techniques in landslide susceptibility mapping: a survey and a case study. Landslides Theory Pract. Model. 283–301 (2019).
Tehrani, F. S., Calvello, M., Liu, Z., Zhang, L. & Lacasse, S. Machine learning and landslide studies: Recent advances and applications. Nat. Hazards 114, 1197–1245 (2022).
Najafzadeh, M. & Oliveto, G. Riprap incipient motion for overtopping flows with machine learning models. J. Hydroinformatics 22, 749–767 (2020).
Bizimana, H. & Altunkaynak, A. Investigating the effects of bed roughness on incipient motion in rigid boundary channels with developed hybrid Geno-Fuzzy versus Neuro-Fuzzy Models. Geotech. Geol. Eng. 39, 3171–3191 (2021).
Wang, Y., Chen, W. & Wang, Y. Prediction and estimation of river velocity based on GAN and multi-feature fusion. Comput. Intell. Neurosci. 2022, (2022).
Hussain, D. & Khan, A. A. Machine learning techniques for monthly river flow forecasting of Hunza River Pakistan. Earth Sci. Inf. 13, 939–949 (2020).
Mohamed, H. I. Design of alluvial Egyptian irrigation canals using artificial neural networks method. Ain Shams Eng. J. 4, 163–171 (2013).
Gholami, A., Bonakdari, H., Ebtehaj, I., Shaghaghi, S. & Khoshbin, F. Developing an expert group method of data handling system for predicting the geometry of a stable channel with a gravel bed. Earth Surf. Process. Landforms 42, 1460–1471 (2017).
Tahershamsi, A., Majdzade Tabatabai, M. R. & Shirkhani, R. An evaluation model of artificial neural network to predict stable width in gravel bed rivers. Int. J. Environ. Sci. Technol. 9, 333–342 (2012).
Khozani, Z. S., Bonakdari, H. & Ebtehaj, I. An expert system for predicting shear stress distribution in circular open channels using gene expression programming. Water Sci. Eng. 11, 167–176 (2018).
Maji, S., Senapati, A. & Mondal, A. Investigation and validation of flow characteristics through emergent vegetation patch using machine learning technique. Smart Innov. Syst. Technol. 267, 131–139 (2022).
Deng, Y. & Liu, Y. Prediction of depth-averaged velocity for flow though submerged vegetation using least squares support vector machine with bayesian optimization. Water Resour. Manag. 38, 1675–1692 (2024).
Kumar, S., Kumar, B., Deshpande, V. & Agarwal, M. Predicting flow velocity in a vegetative alluvial channel using standalone and hybrid machine learning techniques. Expert Syst. Appl. 232, 120885 (2023).
Meddage, D. P. P. et al. Predicting bulk average velocity with rigid vegetation in open channels using tree-based machine learning: A novel approach using explainable artificial intelligence. Sensors 22, 4398 (2022).
Boraah, N. & Kumar, B. Prediction of submerged vegetated flow in a channel using GMDH-type neural network approach. River Hydraul. Hydraul. Water Resour. Coast. Eng. Vol. 2 191–205 (2022).
Barman, J. & Kumar, B. Flow in multilayered vegetated compound channels with different bank slopes. Phys. Fluids 35, (2023).
Arora, S., Patel, H. K., Srinivasulu, G. & Kumar, B. Turbulent characteristics at interface of partly vegetated alluvial channel. Int. J. Civ. Eng. 22, 75–85 (2024).
Barman, B., Kashyap, S. N. & Kumar, B. Flow velocity prediction in a vegetated channel using soft computing techniques. Multiscale Multidiscip. Model. Exp. Des. 1–11 (2024).
Barman, J., Kumar, B. & Balachandar, R. Hydrodynamics in channels with partial vegetation cover: Investigating the effects of homogeneous and heterogeneous vertical vegetation distribution. Adv. Water Resour. 185, 104642 (2024).
Borovkov, V. S. & Yurchuk, M. Hydraulic resistance of vegetated channels. Hydrotechnical Constr. 28, (1995).
Einstein, H. A. & Banks, R. B. Fluid resistance of composite roughness. Eos, Trans. Am. Geophys. Union 31, 603–610 (1950).
Fenzl, R. N. Hydraulic Resistance of Broad Shallow Vegetated Channels (University of California, 1962).
Ree, W. O. & Crow, F. R. Friction Factors for Vegetated Waterways of Small Slope. ARS-S-151 (Agricultural Research Service, US Department of Agriculture, 1977).
Murota, A., Fukuhara, T. & Sato, M. Turbulence structure in vegetated open channel flows. J. Hydrosci. Hydraul. Eng. 2, 47–61 (1984).
Tsujimoto, T. & Kitamura, T. Velocity profile of flow in vegetated-bed channels. KHL Progress. Rep. 1, 43–55 (1990).
Tsujimoto, T., Kitamura, T. & Okada, T. Turbulent structure of flow over rigid vegetation-covered bed in open channels. KHL-Communication 31–40 (1991).
Tsujimoto, T. Turbulent structure of open-channel flow over flexible vegetation. KHL-Communication 37–46 (1993).
Shimizu, Y. & Tsujimoto, T. Numerical analysis of turbulent open-channel flow over a vegetation layer using a k-ε turbulence model. J. Hydrosci. Hydraul. Eng. 11, 57–67 (1994).
Dunn, C., Lopez, F. & Garcia, M. Mean Flow and Turbulence in a Laboratory Channel with Simulated Vegetation. Hydraulic Engineering Series vol. 51 http://hdl.handle.net/2142/12229 (1996).
Meijer, D. G. Modelproeven overstroomd riet. HKV Lijn in Water (1998).
Jarvela, J. Flow resistance of flexible and stiff vegetation: A flume study with natural plants. J. Hydrol. 269, 44–54 (2002).
Rowinski, P. M. & Kubrak, J. A mixing-length model for predicting vertical velocity distribution in flows through emergent vegetation. Hydrol. Sci. J. 47, 893–904 (2002).
Carollo, F. G., Ferro, V. & Termini, D. Flow resistance law in channels with flexible submerged vegetation. J. Hydraul. Eng. 131, 554–564 (2005).
Murphy, E., Ghisalberti, M. & Nepf, H. Model and laboratory study of dispersion in flows with submerged vegetation. Water Resour. Res. 43, (2007).
Chung, C.-J. F. & Fabbri, A. G. Validation of spatial prediction models for landslide hazard mapping. Nat. Hazards 30, 451–472 (2003).
Cleary, J. G. & Trigg, L. E. K*: An instance-based learner using an entropic distance measure. In Machine Learning Proceedings 1995, pp. 108–114 (Elsevier, 1995).
Wang, Y. & Witten, I. H. Induction of model trees for predicting continuous classes (1996).
Quinlan, J. R. et al. Learning with continuous classes. In 5th Australian Joint Conference on Artificial Intelligence vol. 92, pp. 343–348 (1992).
Zhan, C., Gan, A. & Hadi, M. Prediction of lane clearance time of freeway incidents using the M5P tree algorithm. IEEE Trans. Intell. Transp. Syst. 12, 1549–1557 (2011).
Wang, Y. & Witten, I. H. Inducing model trees for continuous classes. Proc. Ninth Eur. Conf. Mach. Learn. 9, 128–137 (1997).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Dong, X., Yu, Z., Cao, W., Shi, Y. & Ma, Q. A survey on ensemble learning. Front. Comput. Sci. 14, 241–258 (2020).
Moriasi, D. N. et al. Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Trans. ASABE 50, 885–900 (2007).
Taylor, K. E. Summarizing multiple aspects of model performance in a single diagram. J. Geophys. Res. Atmos. 106, 7183–7192 (2001).
Legates, D. R. & McCabe, G. J. Evaluating the use of ‘goodness-of-fit’ measures in hydrologic and hydroclimatic model validation. Water Resour. Res. 35, 233–241 (1999).
De’Ath, G. & Fabricius, K. E. Classification and regression trees: A powerful yet simple technique for ecological data analysis. Ecology 81, 3178–3192 (2000).
Haslam, S. M. River plants; the macrophytic vegetation of watercourses. (1978).
Sand-Jensen, K. Drag and reconfiguration of freshwater macrophytes. Freshw. Biol. 48, 271–283 (2003).
Elbeltagi, A., Al-Mukhtar, M., Kushwaha, N. L., Al-Ansari, N. & Vishwakarma, D. K. Forecasting monthly pan evaporation using hybrid additive regression and data-driven models in a semi-arid environment. Appl. Water Sci. 13, 42 (2023).
Elbeltagi, A. et al. Data intelligence and hybrid metaheuristic algorithms-based estimation of reference evapotranspiration. Appl. Water Sci. 12, 152 (2022).
Granata, F. Evapotranspiration evaluation models based on machine learning algorithms—A comparative study. Agric. Water Manag. 217, 303–315 (2019).
Vishwakarma, D. K. et al. Pre- and post-dam river water temperature alteration prediction using advanced machine learning models. Environ. Sci. Pollut. Res. 29, 83321–83346 (2022).
Funding
This work was supported by JSPS KAKENHI Grant Number 22KK0160.
Author information
Contributions
Conceptualization: M.A.; Data Curation, Formal Analysis and Investigation: S.K., V.D.; Methodology: M.A.; Visualization: N.R.; Writing the Original Draft: K.H.K.; Editing and Reviewing: U.R., K.K.; Validation: J.R.C., Y.H.; Supervision and Project Administration: U.R.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kumar, S., Agarwal, M., Deshpande, V. et al. AI-driven predictions of geophysical river flows with vegetation. Sci Rep 14, 16368 (2024). https://doi.org/10.1038/s41598-024-67269-2
DOI: https://doi.org/10.1038/s41598-024-67269-2