Abstract
Water is stored in reservoirs for various purposes, including regular distribution, flood control, hydropower generation, and meeting the environmental demands of downstream habitats and ecosystems. However, these objectives often conflict with each other and make the operation of reservoirs a complex task, particularly during flood periods. An accurate forecast of reservoir inflows is required to evaluate water releases that provide safe space for capturing high flows without resorting to hazardous and damaging releases. This study aims to improve informed decision-making for reservoir management and water prerelease before a flood occurs by means of a method for forecasting reservoir inflow. The forecasting method applies 1- and 2-month time-lag patterns with several Machine Learning (ML) algorithms, namely Support Vector Machine (SVM), Artificial Neural Network (ANN), Regression Tree (RT), and Genetic Programming (GP). The proposed method is applied to evaluate the performance of the algorithms in forecasting inflows into the Dez, Karkheh, and Gotvand reservoirs in Iran during the flood of 2019. Results show that RT, with an average error of 0.43% in forecasting the largest reservoir inflows in 2019, is superior to the other algorithms. The best forecasting accuracy was obtained with the 2-month time-lag pattern for the Dez and Karkheh reservoirs and with the 1-month time-lag pattern for the Gotvand reservoir. The proposed method exhibits accurate inflow forecasting using SVM and RT. The development of accurate flood-forecasting capability is valuable to reservoir operators and decision-makers who must deal with streamflow forecasts in their quest to reduce flood damages.
Introduction
Floods are natural hazards that affect an average of 80 million people annually and cause more deaths and financial losses than any other natural disaster1,2. One of the traditional ways to control floods is building dams and reservoirs, which are operated to create flood control space to store and regulate high flows. Water is released gradually according to the safe discharge in the rivers downstream to meet the required flood control space. Accurate forecasts of reservoir inflows must be made before the flood events. Identifying appropriate algorithms for forecasting future reservoir inflow is paramount to reservoir operators. An example of Forecast-Informed Reservoir Operation (FIRO) has been practised in Mendocino Lake, California, during the past few decades3. FIRO is a strategy that improves informed decisions about releasing water from reservoirs and increases flexibility in the operation and management of reservoirs by improving hydrologic forecasting3,4.
Physically-based and statistical models have been applied to forecast reservoir inflows5. Physically-based models simulate the involved hydrological processes and estimate reservoir inflow6,7,8. Physically-based models such as the Soil and Water Assessment Tool (SWAT)9, the watershed-scale Long-Term Hydrologic Impact Assessment model (watershed-scale L-THIA)10 and the Hydrological Simulation Program—Fortran (HSPF)11 are used to simulate water cycle components12. Physically-based models can be applied to simulate flood events accounting for the key hydrologic processes involved. They often require large volumes of hydro-geomorphological data, detailed information about the characteristics and dynamic changes of a watershed, and are computationally expensive13. Besides, physically-based models make simplifications of hydrologic processes14 and involve parameters that must be calibrated, sometimes with in-depth effort, which causes model forecasts to vary greatly among models15.
Recent advancements in Machine Learning (ML) modeling techniques can address and overcome the difficulties that beset physically-based models, giving impetus to using data-driven algorithms and ML modeling in reservoir inflow forecasting, among other applications. ML algorithms can be applied to forecast reservoir inflow by relying on relevant data rather than simulating the hydrological processes involved16. The advantages of ML algorithms are easier and faster implementation, less computational effort, and reduced complexity compared to physically-based models, particularly those of the distributed type17,18. A variety of ML algorithms have been applied to analyze big data and large-scale systems, in particular for hydrologic modeling and water resources management19,20,21,22,23. For example, Support Vector Machine (SVM) was implemented for lake water level forecasting24, modelling daily reference evapotranspiration25, soil moisture estimation26, water quality forecast modelling27, and groundwater quality characterization28. Artificial Neural Networks (ANNs) were applied to forecasting the runoff coefficient29, river discharge forecasting30, water demand forecasting under climate change31, wastewater temperature forecasting32, and groundwater level simulation33. Genetic Programming (GP) was applied to forecasting rainfall-runoff response34, suspended sediment modeling35, calculation of the optimal operation of an aquifer-reservoir system36, groundwater modelling37, and crop yield estimation38.
Several previous studies have forecasted river flow for flood routing39, flood susceptibility mapping40 and calculating flood damages41 in unregulated rivers. This work proposes a river flow forecasting method to improve flood mitigation by reservoirs and guide FIRO to reduce flood damages.
Heavy and continuous precipitation in 2019 led to severe floods in large areas of Iran, which caused great material and human losses. The southwestern basins of the country received the largest share of precipitation and suffered significant flood damages. River flow forecasts did not accurately capture the magnitude of the reservoir inflows, which led to inadequate flood control by reservoir operation42. The 2019 flood event raised questions about the poor river flow forecasting performance, and this work addresses these questions. It develops methods for forecasting the timing and magnitude of floods to allow operators to release water from reservoirs and route the floods with minimal or no damage. The flood forecasts are made with 1- and 2-month time-lag patterns in the algorithms. Each time-lag pattern produces four flood projections, which correspond to the wettest months in the study area. Specifically, the flood forecasts provide operators with information about the reservoir inflows likely to occur during the wettest months of the year (January, February, March and April) with one month of lead time (from the forecasts based on the 1-month time-lag pattern) and with two months of lead time (from the forecasts based on the 2-month time-lag pattern). This study’s flood forecasting methodology considers the effect that practical limitations, such as data scarcity, have on the accuracy of the forecasts. A challenge in developing countries is the scarcity of hydro-climate data due to the lack of modern hydrologic and weather monitoring stations. This paper’s data-driven flood forecasting methodology is intended to support FIRO and reduce flood damages.
Methods
This study applies SVM, ANN, RT, and GP to forecast monthly reservoir inflow with 1- and 2-month time lags. The historical inflow data for the Dez, Karkheh, and Gotvand reservoirs were collected and used to build the ML algorithms. The inputs to the algorithms for the Dez, Karkheh, and Gotvand reservoirs are the monthly inflows for 1965–2019, 1957–2019, and 1961–2019, respectively. Four projections were designed for each of the 1-month and 2-month time-lag patterns based on the input and output months, as depicted in Fig. 1. Figure 2 displays the flowchart of this paper’s methodology.
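The time-lag idea can be illustrated with a minimal sketch. The exact input and output months of each projection are defined in Fig. 1, which is not reproduced here, so the pairing rule below (the input is the inflow observed `lag` months before the target month) is an assumption, and the function name `make_lagged_pairs` and the inflow values are hypothetical.

```python
def make_lagged_pairs(inflows, lag):
    """Pair each monthly inflow with the inflow observed `lag` months earlier.

    Assumed pairing rule for illustration only; the paper's own
    input/output months are specified in its Fig. 1.
    """
    if lag < 1 or lag >= len(inflows):
        raise ValueError("lag must be between 1 and len(inflows) - 1")
    inputs = inflows[:-lag]   # inflow at month t - lag
    targets = inflows[lag:]   # inflow at month t
    return list(zip(inputs, targets))


# Hypothetical monthly inflows (m3/s), October through April
monthly = [120.0, 150.0, 210.0, 340.0, 410.0, 520.0, 610.0]
one_month_pairs = make_lagged_pairs(monthly, lag=1)
two_month_pairs = make_lagged_pairs(monthly, lag=2)
```

Each pair is one (input, target) training sample; a longer lag yields fewer samples from the same record but a longer lead time.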
Support vector machine
Support Vector Machine was introduced by Vapnik et al.43. SVM performs classification and regression based on statistical learning theory44. The regression form of SVM is named support vector regression (SVR). Vapnik et al.45 defined two functions for SVR design. The first is the error function (Eq. 1; see Fig. 3). The second is a linear function that calculates output values from the input, weight, and deviation values (Eq. 2):
where \(y\), \(f(x)\), \(\varepsilon\), \(\xi\), \(W\), \(b\), and \(T\) denote respectively the observational value, the output value calculated by SVR, a function sensitivity value, a model penalty, the weight applied to the variable \(x\), the deviation of \(W^{T} x\) from \(y\), and the vector/matrix transpose operator.
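Equations (1) and (2) are not reproduced in this text. Assuming the standard \(\varepsilon\)-insensitive formulation of SVR, and using only the symbols defined above, they can be written as:

\[
\left| y - f(x) \right|_{\varepsilon} =
\begin{cases}
0, & \left| y - f(x) \right| \le \varepsilon \\
\left| y - f(x) \right| - \varepsilon = \xi, & \text{otherwise}
\end{cases}
\tag{1}
\]

\[
f(x) = W^{T} x + b \tag{2}
\]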
It is seen in Fig. 3 that the first function (Eq. 1) does not apply a penalty to the points where the difference between the observed value and the calculated value falls within the range of \(( - \varepsilon , + \varepsilon )\). Otherwise, a penalty \(\xi\) is applied. SVR solves an optimization problem that minimizes the forecast error (Eq. 3) to improve the model’s forecast accuracy. Equations (4) and (5) represent the constraints of the optimization problem.
Subject to:
where \(C\), \(m\), \(\xi_{i}^{ - }\), \(\xi_{i}^{ + }\), \(y_{i}\), and \(\| \cdot \|\) denote respectively the penalty coefficient, the number of input data to the model in the training phase, the penalty for violating the lower bound of \(( - \varepsilon , + \varepsilon )\), the penalty for violating the upper bound of \(( - \varepsilon , + \varepsilon )\), the i-th observational value, and the vector magnitude. The values of W and b are calculated by solving the optimization problem embodied by Eqs. (3)–(5) with the Lagrange method, and they are substituted in Eq. (2) to calculate the SVR output. SVR is capable of modeling nonlinear data, in which case it relies on transfer functions to transform the data such that linear functions can be fitted to them. Reservoir inflow forecasting with SVR was performed with the Tanagra software. The transfer function selected and used in this study is the Radial Basis Function (RBF), which provided better results than other transfer functions. The weight vector W is calculated using the Soft Margin method46, and the optimal values of the parameters \(\xi_{i}^{ - }\), \(\xi_{i}^{ + }\), and C were herein estimated by trial and error.
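Equations (3)–(5) are likewise missing from this text. A sketch of the standard soft-margin SVR problem that matches the symbols defined above is given below; the assignment of \(\xi_{i}^{+}\) to observations lying above the \(\varepsilon\)-tube and \(\xi_{i}^{-}\) to those lying below it is an assumption, as are the index \(i\) on \(x_{i}\) and the non-negativity conditions.

\[
\min_{W,\,b,\,\xi^{-},\,\xi^{+}} \;\; \frac{1}{2} \left\| W \right\|^{2} + C \sum_{i=1}^{m} \left( \xi_{i}^{-} + \xi_{i}^{+} \right) \tag{3}
\]

\[
y_{i} - \left( W^{T} x_{i} + b \right) \le \varepsilon + \xi_{i}^{+}, \qquad \xi_{i}^{+} \ge 0 \tag{4}
\]

\[
\left( W^{T} x_{i} + b \right) - y_{i} \le \varepsilon + \xi_{i}^{-}, \qquad \xi_{i}^{-} \ge 0 \tag{5}
\]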
Regression tree (RT)
RT involves a clustering tree with post-pruning processing (CTP). The clustering tree algorithm has been reported in various articles as the forecasting clustering tree47 and the monothetic clustering tree48. The clustering tree algorithm is based on the top-down induction algorithm of decision trees49. This algorithm takes a set of training data as input and forms a new internal node, provided the best acceptable test can be placed in a node. The algorithm selects the best test as the one that yields the lowest variance. The smaller the variance, the greater the homogeneity of the cluster and the greater the forecast accuracy. If none of the tests significantly reduces the variance, the algorithm generates a leaf and tags it as being representative of the data47,48.
The CTP algorithm is similar to the clustering tree algorithm, except that its post-pruning process is performed with a pruning set to create the right size of the tree50.
The RT used in this study was programmed in MATLAB. The minimum leaf size, the minimum node size for branching, the maximum tree depth, and the maximum number of classification ranges were set by trial and error in this paper’s application.
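The variance-based test selection described above can be sketched for a single input variable. This is a simplified illustration, not the MATLAB implementation used in the study: the function name `best_split` and the `min_leaf` parameter are hypothetical, pruning is omitted, and "significantly reduces the variance" is simplified to "reduces the variance at all".

```python
from statistics import pvariance


def best_split(xs, ys, min_leaf=1):
    """Return (threshold, weighted_variance) for the best binary split,
    or None when no split reduces the variance (the node becomes a leaf)."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    parent_variance = pvariance(ys)
    best = None
    for k in range(min_leaf, n - min_leaf + 1):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates identical input values
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        # Size-weighted variance of the two child clusters
        weighted = (len(left) * pvariance(left) + len(right) * pvariance(right)) / n
        if weighted < parent_variance and (best is None or weighted < best[1]):
            threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best = (threshold, weighted)
    return best


# Hypothetical data: previous-month inflow (input) and target-month inflow (output)
previous = [100, 120, 140, 400, 420, 450]
target = [110, 130, 150, 500, 520, 540]
split = best_split(previous, target)
```

With these hypothetical values the chosen threshold falls between the low-flow and high-flow groups, which is the split that makes both child clusters most homogeneous.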
Genetic programming (GP)
GP, developed by Cramer51 and Koza52, is a type of evolutionary algorithm that has been used effectively in water management to carry out single- and multi-objective optimization53. GP finds functional relations between input and output data by combining operators and mathematical functions relying on structured tree searches44. GP starts the search by generating a random set of trees in the first iteration. The length of a tree defines its depth; the greater the depth of the tree, the more accurate the GP functional relation is54. In a tree structure, the variables and the operators constitute the terminal and function sets, respectively. Figure 4 shows mathematical relational functions generated by GP. Genetic programming consists of the following steps:
- Select the terminal sets: these are the independent variables of the problem and the system state variables.
- Select a set of functions: these include arithmetic operators (÷, ×, −, +), Boolean functions (such as "or" and "and"), mathematical functions (such as sin and cos), conditional expressions (such as if-then-else), and other required statements based on the problem objectives.
- Select an index for measuring algorithmic accuracy: it determines to what extent the algorithm is performing correctly.
- Set the control components: these are numerical components and qualitative variables used to control the algorithm's execution.
- Set the stopping criterion: it determines when the execution of the algorithm is terminated.
The GeneXproTools software was used in this study to implement GP. The GP parameters, operators, and linking functions were chosen based on the lowest RMSE. The GP model's parameters and operators applied in this study are listed in Table 1.
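The roles of the terminal set, the function set, the tree depth, and the accuracy index can be sketched with a small expression-tree evaluator. This is not the GeneXproTools implementation: the nested-tuple representation, the variable name `Q_prev`, the particular function set, and the "protected" division (returning 1.0 on division by zero, a common GP convention) are all assumptions made for illustration, and the evolutionary search itself is omitted.

```python
import math
import operator

# Function set: name -> (arity, implementation)
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "/": (2, lambda a, b: a / b if b != 0 else 1.0),  # protected division (assumed)
    "sin": (1, math.sin),
}


def evaluate(tree, variables):
    """Evaluate a tree written as nested tuples, e.g. ("+", "Q_prev", 2.0)."""
    if isinstance(tree, tuple):
        name, *children = tree
        arity, func = FUNCTIONS[name]
        if len(children) != arity:
            raise ValueError(f"{name} expects {arity} argument(s)")
        return func(*(evaluate(child, variables) for child in children))
    if isinstance(tree, str):
        return variables[tree]  # terminal: input variable
    return float(tree)          # terminal: numeric constant


def depth(tree):
    """Depth of a tree; a lone terminal has depth 0."""
    if isinstance(tree, tuple):
        return 1 + max(depth(child) for child in tree[1:])
    return 0


def rmse(tree, samples):
    """Accuracy index: RMSE over (variables, observed_output) samples."""
    squared = [(evaluate(tree, v) - observed) ** 2 for v, observed in samples]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical candidate relation: inflow = 1.5 * Q_prev + 10
candidate = ("+", ("*", 1.5, "Q_prev"), 10.0)
```

A GP run would generate many such trees at random, score each with the accuracy index, and recombine the best ones until the stopping criterion is met.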
Artificial neural network (ANN)
ANN, developed by McCulloch and Pitts55, is an artificial intelligence-based computational method that features an information processing system that employs interconnected data structures to emulate information processing by the human brain56. A neural network does not require precise mathematical algorithms and, like humans, can learn through input/output analysis without relying on explicit instructions57. A simple neural network contains one input layer, one hidden layer, and one output layer. Deep-learning networks have multiple hidden layers58. After learning the functional relations between inputs and outputs during training, an ANN takes new inputs and forecasts the corresponding outputs.
This study applies the Multi-Layer Perceptron (MLP), a three-layer feed-forward ANN that features a processing element, an activation function, and a threshold function, as shown in Fig. 5. In MLP, the weighted sum of the inputs and the bias term is passed through a transfer function to create the output.
The output is calculated with a nonlinear function as follows:
where \(W_{i}\), \(X_{i}\), \(b\), \(f\), and \(Y\) denote the i-th weight factor, the i-th input vector, the bias, the conversion function, and the output, respectively.
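Equation (6) is not reproduced in this text. Based on the description of the weighted sum and the symbols defined above, it presumably takes the standard form below, where the upper limit \(N\) (the number of inputs) is a symbol introduced here for illustration:

\[
Y = f\left( \sum_{i=1}^{N} W_{i} X_{i} + b \right) \tag{6}
\]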
The ANN was coded in MATLAB. The number of epochs, the optimal number of hidden layers, and the number of neurons of the hidden layers were found through a trial-and-error procedure. The model output sensitivity was assessed with various training algorithms; the best forecasting skill was achieved with the Levenberg–Marquardt (LM) algorithm59, and the weight vector W is calculated using the Random Search method60. Furthermore, the Tangent Sigmoid and linear transfer functions were chosen by trial and error and used in the hidden and output layers, respectively.
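The forward pass of such a network can be sketched as follows. This is an illustration rather than the study's MATLAB code: the function name `mlp_forward` and the weights used in the example are hypothetical, training (Levenberg–Marquardt) is omitted, and the tangent sigmoid is written as `tanh`, to which it is mathematically equivalent.

```python
import math


def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer with tanh activation, followed by a linear output.

    hidden_weights holds one weight list per hidden neuron.
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical network with two inputs and two hidden neurons
forecast = mlp_forward(
    x=[0.4, 0.7],
    hidden_weights=[[0.5, -0.2], [0.1, 0.3]],
    hidden_biases=[0.0, 0.1],
    output_weights=[1.0, 2.0],
    output_bias=0.7,
)
```

In practice the inputs would be scaled inflows and the weights would come from training rather than being set by hand.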
70% of the total data were randomly selected and used for training SVM, ANN, RT, and GP. The remaining 30% of the data were applied for testing the forecasting algorithms.
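A random 70/30 partition of this kind can be sketched as below. The function name `split_train_test`, the fixed seed, and the use of years as stand-in records are assumptions for illustration; the study does not state how its random selection was seeded or implemented.

```python
import random


def split_train_test(records, train_fraction=0.7, seed=0):
    """Shuffle the records and split them into training and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Stand-in records: one entry per year of the Dez record (1965-2019)
years = list(range(1965, 2020))
train_years, test_years = split_train_test(years)
```

Fixing the seed keeps the comparison fair, because every algorithm is then trained and tested on the same partition.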
Performance-evaluation indices
The forecasting skill of the ML algorithms (SVM, ANN, RT, and GP) was evaluated with the Correlation Coefficient (R), the Nash–Sutcliffe Efficiency (NSE), the Root Mean Square Error (RMSE), and the Mean Absolute Error (MAE) in the training and testing phases. The closer the R and NSE values are to 1, and the closer the RMSE and MAE values are to 0, the better the performance of the algorithms20. Equations (7)–(10) describe the performance indices:
in which \( Q_{fore,i}\), \(Q_{obs,i}\), \(Q_{mean \; fore}\), \(Q_{mean \; obs}\), \(i\), and \(n\) denote the forecasted inflow, observed inflow, mean forecasted inflow, mean observed inflow, time step, and the total number of time steps during training and testing phases, respectively.
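Equations (7)–(10) are not reproduced in this text. The standard definitions of the four indices, written with the symbols above, are given below; the numbering assumes that MAE is Eq. (9) and RMSE is Eq. (10), as cited in the discussion of the results.

\[
R = \frac{\sum_{i=1}^{n} \left( Q_{obs,i} - Q_{mean \; obs} \right)\left( Q_{fore,i} - Q_{mean \; fore} \right)}{\sqrt{\sum_{i=1}^{n} \left( Q_{obs,i} - Q_{mean \; obs} \right)^{2} \sum_{i=1}^{n} \left( Q_{fore,i} - Q_{mean \; fore} \right)^{2}}} \tag{7}
\]

\[
NSE = 1 - \frac{\sum_{i=1}^{n} \left( Q_{obs,i} - Q_{fore,i} \right)^{2}}{\sum_{i=1}^{n} \left( Q_{obs,i} - Q_{mean \; obs} \right)^{2}} \tag{8}
\]

\[
MAE = \frac{1}{n} \sum_{i=1}^{n} \left| Q_{obs,i} - Q_{fore,i} \right| \tag{9}
\]

\[
RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( Q_{obs,i} - Q_{fore,i} \right)^{2}} \tag{10}
\]

A minimal sketch of these definitions in code, with a hypothetical function name and hypothetical inflow values:

```python
import math


def performance_indices(observed, forecasted):
    """Return R, NSE, MAE and RMSE for paired observed/forecasted inflows."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_fore = sum(forecasted) / n
    covariance = sum((o - mean_obs) * (f - mean_fore) for o, f in zip(observed, forecasted))
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_fore = sum((f - mean_fore) ** 2 for f in forecasted)
    squared_error = sum((o - f) ** 2 for o, f in zip(observed, forecasted))
    absolute_error = sum(abs(o - f) for o, f in zip(observed, forecasted))
    return {
        "R": covariance / math.sqrt(ss_obs * ss_fore),
        "NSE": 1 - squared_error / ss_obs,
        "MAE": absolute_error / n,
        "RMSE": math.sqrt(squared_error / n),
    }


# Hypothetical inflows (m3/s)
indices = performance_indices([100.0, 200.0, 300.0], [110.0, 190.0, 310.0])
```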
Ethics approval
All authors complied with the ethical standards.
Consent to participate
All authors consent to participate.
Consent to publish
All authors consent to publish.
Case study
The Great Karun Basin, Iran, is part of the Persian Gulf catchment. It is located in southwestern Iran, with an area of about 67,257 km2. The main river of the basin, the Karun, with a length of about 950 km, stems from the Yellow Mountains and flows through mountainous areas in Indika and Masjed Soleyman and ultimately discharges into the Persian Gulf. Dez and Gotvand are the two main reservoirs which are located in this basin.
Karkheh Basin is located in western Iran, in the middle and southwestern regions of the Zagros Front. The area of this basin is about 51,604 km2. Karkheh reservoir is located in this basin. Table 2 lists the characteristics of the Dez, Karkheh, and Gotvand reservoirs. Figure 6 shows the location of Dez and Gotvand reservoirs in the Great Karun basin and the Karkheh reservoir in the Karkheh basin.
During March and April 2019 Iran faced three major waves of extreme precipitation, leading to extreme floods with long return periods in large parts of the country61,62. Before the 2019 flood many parts of Iran had suffered drought and the drying of lakes and rivers for almost 30 years due to climatic change63. The southwestern regions of Iran, including the Great Karun and Karkheh basins, endured the brunt of the second and third waves of precipitation and suffered severe damages due to fluvial floods. The Dez, Gotvand and Karkheh reservoirs received large volumes of precipitation and river flows. Table 3 shows the average, minimum, and maximum inflows to the Dez, Karkheh, and Gotvand reservoirs during January through April. This study develops a method to forecast reservoir inflows in the Great Karun and Karkheh basins, which can be applied to future events.
Results and discussion
Dez reservoir evaluation
The values of the performance indices for SVM, ANN, RT and GP with the time-lag patterns in the Dez reservoir are listed in Table 4. It is seen that SVM had minimal RMSE, and RT had minimal MAE with the 1-month time-lag pattern applied to the January and April projections. SVM and RT performed better than the other algorithms in the testing phase. SVM had the best RMSE (MAE), 74.27 (74.26) for the February projection. RT achieved the best results for the March projection by having RMSE (MAE) of 33.37 (8.34). Appendix 1 presents the performance of the applied forecasting algorithms corresponding to the 1-month time-lag pattern for the Dez reservoir for the four projections.
The results listed in Table 4 indicate that the RT’s RMSE (MAE) obtained with the 2-month time-lag pattern applied to the January projection is 181.07 (63), which means a better forecast than the other algorithms in the testing phase. SVM had the best RMSE (MAE), 146.67 (144.15) for the February projection. SVM had the best values of RMSE for the other projections, and RT had the lowest values of the MAE. The 2-month time-lag pattern results associated with the Dez reservoir are presented in Appendix 2.
Karkheh reservoir evaluation
It is seen in Table 5 that SVM and RT have the best RMSE and MAE values, respectively, with the 1-month time-lag pattern applied to the January projection and the February projection and produced more accurate forecasts than the other algorithms. The smallest RMSE and MAE recorded in the testing phase corresponded to SVM for the other projections. The 1-month time-lag pattern results corresponding to the Karkheh reservoir under the four projections herein considered are presented in Appendix 3.
The results in Table 5 indicate that RT had the best accuracy according to the RMSE and MAE values for the 2-month time-lag pattern in the testing phase for the January projection and April projection. The highest accuracy corresponded to SVM and RT according to the RMSE and MAE values, respectively, for the February and March projections. Appendix 4 presents the 2-month time-lag pattern results for the Karkheh reservoir with the applied forecasting algorithms.
Gotvand reservoir evaluation
The SVM, RT, ANN, and GP results associated with the Gotvand reservoir are listed in Table 6. SVM and RT had the lowest RMSE and MAE values, respectively, for the January and April projections with the 1-month time-lag pattern in the testing phase. SVM produced the lowest RMSE (MAE), 93.46 (91.12) for the February projection. RT had the lowest RMSE (MAE), 257.91 (60.79) for the April projection. Appendix 5 presents the performance of the applied forecasting algorithms corresponding to the 1-month time-lag pattern for the four projections associated with the Gotvand reservoir.
It is seen in Table 6 that RT had the best RMSE (MAE) value, 256.84 (209.1) corresponding to the 2-month time-lag pattern in the testing phase for the January projection. SVM had the lowest RMSE and MAE for the February and March projections. SVM had the lowest RMSE (260.76), and RT had the lowest MAE (111.73) for the April projection. Appendix 6 confirms the accurate forecasting skill of SVM and RT for inflow to Gotvand reservoir with the 2-month time-lag pattern compared to the other forecasting algorithms.
RT has the lowest MAE for several projections with both time-lag patterns in the three reservoirs, while the minimal RMSE was obtained by SVM. It is seen in Appendixes 1–6 that RT calculated excellent forecasts for most years for the four projections; yet, RT had a large forecast error in some years. In contrast, SVM forecasted inflows with a relatively constant error. The MAE (Eq. (9)) calculates the mean of the absolute values of the differences between the observed and forecasted inflows to the reservoirs assigning the same weights to the differences. This is the main reason RT had lower MAE values than SVM under most projections, as RT forecasted most of the observed inflows well. On the other hand, the RMSE is the root of the mean square differences, which assigns more weight to the large differences because of the squaring applied [see Eq. (10)]. This caused SVM to produce lower RMSE than RT.
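The contrast between the two error measures can be checked with a small worked example. The error values below are hypothetical and chosen only to mimic the two behaviours described above: one model that is almost always exact but occasionally far off, and one with a steady moderate error.

```python
import math


def mae(errors):
    """Mean of the absolute errors: every error carries the same weight."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root of the mean squared error: large errors are weighted more heavily."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


occasional_large = [0.0, 0.0, 0.0, 0.0, 100.0]  # RT-like behaviour (hypothetical)
steady = [30.0, 30.0, 30.0, 30.0, 30.0]         # SVM-like behaviour (hypothetical)

print(mae(occasional_large), mae(steady))    # 20.0 versus 30.0
print(rmse(occasional_large), rmse(steady))  # about 44.7 versus 30.0
```

The first error series wins on MAE and loses on RMSE, which is the same ordering reported for RT and SVM.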
Tables 4, 5 and 6 establish that all the applied algorithms had their lowest forecasting accuracy under the January projection with the 2-month time-lag pattern in the three reservoirs, judging by the significant drop in the values of the performance indices relative to the other projections. This is so because the hydrologic or water year starts in September–October in Iran, and the algorithms for the January projection with a 2-month time lag forecast the reservoir inflows relying only on the October input data. It is evident in Fig. 7 that the reservoir inflows in October 2019 were affected by the long-term reservoir inflows and prolonged drought. Therefore, forecasting reservoir inflow for the January projection with a 2-month time lag is more uncertain than for the other projections.
A more detailed evaluation of the results is provided by the average improvement percentages (AIPs) of R and RMSE for SVM and the AIPs of MAE for RT relative to the other forecasting algorithms in the testing phase under the 1-month and 2-month time-lag patterns. Table 7 shows the clear superiority of the average R and RMSE of the SVM model under the 1-month time-lag pattern; that is, SVM features positive AIPs of R and RMSE when compared with RT, ANN, and GP. The largest AIPs of R and RMSE for SVM were obtained relative to ANN and GP (in the Dez and Gotvand reservoirs), respectively. Under the 2-month time-lag pattern, SVM featured negative AIPs of R compared with RT and GP in the Dez reservoir and compared with ANN, RT, and GP in the other reservoirs, as shown in Table 7. SVM also had negative AIPs of RMSE in comparison with RT in the Karkheh and Gotvand reservoirs for the 2-month time-lag pattern. The negative AIPs of R and RMSE for SVM stem from the decline in SVM's performance, relative to the other algorithms, for the January projection with a 2-month time lag. The most negative AIP of R for SVM was obtained when compared with RT. Therefore, under the 2-month time-lag pattern, RT had higher accuracy on average than SVM, ANN and GP with respect to R. It is evident from Table 7 that RT had positive AIPs of MAE compared to SVM, ANN and GP except for the 2-month time-lag pattern in the Gotvand reservoir. The largest positive AIPs of MAE for RT were obtained when compared with GP except for the 1-month time-lag pattern in the Karkheh reservoir.
Evaluation of time-lag patterns
The distribution of the forecast errors is examined with boxplots for further evaluation of the forecasting algorithms’ performance. The error equals the difference between the observed and forecasted inflows to the reservoirs. Positive and negative error values indicate under-estimation and over-estimation, respectively. One-fourth of the errors fall at or below the lower quartile (Q25) and three-fourths fall at or below the upper quartile (Q75); therefore, the upper quartile is more significant than the lower quartile for comparing the algorithms’ performance. Figure 8a–d shows the SVM, GP, RT, and ANN results, respectively. It is seen that the upper quartiles for the 1-month time-lag pattern were equal to 19.183, 86.703, 0.0003, and 84.515, respectively, which were lower than the upper quartiles for the 2-month time-lag pattern (138.243, 79.172, 0.0004, and 123.067, respectively), except for GP. Therefore, SVM, RT, and ANN applying the 1-month time-lag pattern and GP applying the 2-month time-lag pattern had better accuracy in forecasting the inflow to the Dez reservoir. It is seen in Fig. 9 that SVM was more accurate with the 1-month time-lag pattern (Q75 = 92.978), whereas GP, RT, and ANN, with Q75 = 84.991, 0.0008, and 74.838, respectively, performed better with the 2-month time-lag pattern than with the 1-month time-lag pattern in the Karkheh reservoir. The minimum upper quartiles were equal to 181.679 and 0.0012 for SVM and RT, respectively, with the 1-month time-lag pattern, as can be seen in Fig. 10. GP and ANN, on the other hand, had better performance with the 2-month time-lag pattern based on the low values of their upper quartiles (equal to 197.765 and 206.622, respectively) in forecasting inflow to the Gotvand reservoir.
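The upper quartile used in this comparison can be computed as in the sketch below. The error values are hypothetical, and the interpolation rule (the default "exclusive" method of Python's `statistics.quantiles`) is an assumption, since the text does not state which quartile convention the boxplots use.

```python
from statistics import quantiles


def upper_quartile(errors):
    """Q75 of the forecast errors (observed minus forecasted inflow)."""
    q25, median, q75 = quantiles(errors, n=4)
    return q75


# Hypothetical forecast errors (m3/s); negative values are over-estimates
errors = [-40.0, -10.0, 0.0, 5.0, 20.0, 60.0, 150.0]
q75 = upper_quartile(errors)
```

A lower Q75 means that three-fourths of the errors are confined below a smaller value, which is why it serves as the ranking criterion between the two time-lag patterns.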
Evaluation of the performance of the applied algorithms in forecasting reservoirs inflow in 2019
Figures 11, 12 and 13 display the performance of the applied forecasting algorithms corresponding to the 1- and 2-month time-lag patterns in forecasting reservoir inflow in 2019. As shown in Figs. 11, 12 and 13, the observed inflows to the Dez, Karkheh and Gotvand reservoirs in April and February 2019 are larger than in the other months. A comparison of the observed reservoir inflows reveals that the largest inflow in February corresponds to the Gotvand reservoir, and in April it corresponds to the Karkheh reservoir.
Dez reservoir
It is seen in Fig. 11a that, under the January projection, RT and SVM with a 1-month time lag forecasted the Dez reservoir inflow with lower errors (42.7 and 55.4 m3/s, respectively) than the other algorithms and the other time-lag pattern. Figure 11b,c shows that ANN and RT with a 1-month time lag were more accurate in forecasting Dez reservoir inflows in February and March. The error values for the February projection are 32.7 and 71.1 m3/s, respectively, and for the March projection are 27.19 and −44.31 m3/s, respectively. According to Fig. 11d, for the April projection the RT model with a 2-month time lag has an error of 38.0 m3/s, and the SVM model with a 1-month time lag has an error of 236.6 m3/s.
Karkheh reservoir
Comparison of the forecasted inflows to the Karkheh reservoir in 2019 shows that the RT model with 1- and 2-month time lags is superior to the other algorithms for the January projection (with errors equal to 6.9 and 13.9 m3/s, respectively) and the February projection (with errors equal to 1.2 and 5.0 m3/s, respectively) (see Fig. 12a,b). The minimum forecast errors of Karkheh reservoir inflows for the March projection belong to RT with a 2-month time lag and to ANN with a 1-month time lag (with errors equal to 89.4 and 149.7 m3/s, respectively), and for the April projection they belong to RT and ANN with a 2-month time lag (with errors equal to 3.2 and 6.7 m3/s, respectively).
Gotvand reservoir
Figure 13a,b shows the superiority of RT and ANN with the 2-month time-lag pattern for the January projection (with errors equal to 22.4 and 1.2 m3/s, respectively) and for the February projection (with errors equal to 15.8 and 74.2 m3/s, respectively) in forecasting Gotvand reservoir inflows in 2019 compared to the other applied algorithms and the other time-lag pattern. Furthermore, RT with 1- and 2-month time lags for the March projection (with errors equal to 39.1 and 31.8 m3/s, respectively) and for the April projection (with errors equal to 18.4 and 29.1 m3/s, respectively) had better accuracy in forecasting Gotvand reservoir inflows according to Fig. 13c,d.
Concluding remarks
This study presents a method for forecasting reservoir inflow. SVM, ANN, RT, and GP were selected to forecast the monthly inflows to the Dez, Karkheh, and Gotvand reservoirs in Iran. The proposed method is applied to evaluate the forecasting performance of the algorithms during the large flood of 2019. The applied algorithms were developed based on the 1-month and 2-month time-lag patterns. Monthly reservoir inflows were used to train the forecasting algorithms. The forecasting skill of the algorithms was compared using the Correlation Coefficient, Root Mean Square Error, Nash–Sutcliffe Efficiency, and Mean Absolute Error. The capacity of RT to forecast the largest reservoir inflows in 2019 indicates that the reservoir inflows in 2019 could have been forecasted accurately. The results showed that SVM and RT were the most accurate of the algorithms. In forecasting the Karkheh reservoir’s inflow, the SVM model performed better (by 22.14%) with the 1-month time-lag pattern than with the 2-month time-lag pattern according to the upper quartile (Q75) of the forecast error distribution. In contrast, the RT model had better accuracy (by 99%) with the 2-month time-lag pattern. Furthermore, SVM and RT performed better with the 1-month time lag based on the low value of Q75 in forecasting inflow to the Dez (by 86.12 and 25%, respectively) and Gotvand (by 1 and 7.69%, respectively) reservoirs.
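The Dez percentages quoted above are consistent with a relative reduction in Q75 between the two time-lag patterns, using the Q75 values reported in the evaluation of time-lag patterns (19.183 versus 138.243 for SVM and 0.0003 versus 0.0004 for RT). The formula below is inferred from those numbers rather than stated in the text, and the function name is hypothetical.

```python
def q75_improvement(q75_better, q75_worse):
    """Percent reduction in Q75 relative to the less accurate time-lag pattern."""
    return 100.0 * (q75_worse - q75_better) / q75_worse


# Dez reservoir: 1-month Q75 versus 2-month Q75
svm_dez = q75_improvement(19.183, 138.243)
rt_dez = q75_improvement(0.0003, 0.0004)
print(round(svm_dez, 2), round(rt_dez, 2))  # 86.12 and 25.0
```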
This study’s results guide FIRO for improved reservoir management, decision-making and planning, and optimal reservoir storage allocation for flood control. Accurate forecasting of reservoir inflow is imperative for effective and timely flood control, reduction of damages, and for reducing the risk of not meeting downstream water demands.
Future research may develop ensemble models and compare their performance with that of the ML algorithms in forecasting the 2019 reservoir inflows. Furthermore, comparing the forecasting skill of the ML algorithms with that of physically-based models for forecasting reservoir inflows would provide a comprehensive assessment of the relative advantages of these forecasting methods. Employing remote sensing data in data-sparse areas, especially in developing countries, would be worth pursuing in future work.
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Code availability
The codes that support the findings of this study are available from the corresponding author upon reasonable request.
References
Li, S. et al. Assessment of the catastrophic Asia floods and potentially affected population in summer 2020 using VIIRS flood products. Remote Sens. 12, 3176. https://doi.org/10.3390/rs12193176 (2020).
Kundzewicz, Z. & Schellnhuber, H. Floods in the IPCC TAR perspective. Nat. Hazards 31, 111–128. https://doi.org/10.1023/B:NHAZ.0000020257.09228.7b (2004).
Delaney, C. et al. Forecast informed reservoir operations using ensemble streamflow predictions for a multi-purpose reservoir in Northern California. Water Resour. Res. https://doi.org/10.1029/2019WR026604 (2020).
Xiang, Z. et al. Urban drought challenge to 2030 sustainable development goals. Sci. Total Environ. 693, 133536. https://doi.org/10.1016/j.scitotenv.2019.07.342 (2019).
Lee, J. E., Heo, J.-H., Lee, J. & Kim, N. W. Assessment of flood frequency alteration by dam construction via SWAT simulation. Water 9, 264. https://doi.org/10.3390/w9040264 (2017).
Tayefi, V., Lane, S. N., Hardy, R. J. & Yu, D. A comparison of one- and two-dimensional approaches to modelling flood inundation over complex upland floodplains. Hydrol. Process 21, 3190–3202 (2007).
Leandro, J., Chen, A., Djordjević, S. & Savic, D. Comparison of 1D/1D and 1D/2D coupled (sewer/surface) hydraulic models for urban flood simulation. J. Hydrol. Eng. 135, 495–504. https://doi.org/10.1061/(ASCE)HY.1943-7900.0000037 (2009).
Rene, J.-R. et al. A real-time pluvial flood forecasting system for Castries, St. Lucia. J. Flood Risk Manag. 11, S269–S283. https://doi.org/10.1111/jfr3.12205 (2015).
Rocha, J. et al. Impacts of climate change on reservoir water availability, quality and irrigation needs in a water scarce Mediterranean region (southern Portugal). Sci. Total Environ. 736, 139477. https://doi.org/10.1016/j.scitotenv.2020.139477 (2020).
Ryu, J. et al. Development of a watershed-scale long-term hydrologic impact assessment model with the asymptotic curve number regression equation. Water 8, 307. https://doi.org/10.3390/w8070307 (2016).
Albek, M., Albek, E., Goncu, S. & Uygun, B. Ensemble streamflow projections for a small watershed with HSPF model. Environ. Sci. Pollut. Res. 26, 36023–36036. https://doi.org/10.1007/s11356-019-06749-9 (2019).
Hong, J. et al. Development and evaluation of the combined machine learning models for the prediction of dam inflow. Water 12, 2927. https://doi.org/10.3390/w12102927 (2020).
Nayak, P. C., Sudheer, K., Rangan, D. & Ramasastri, K. Short-term flood forecasting with a neurofuzzy model. Water Resour. Res. 41, 04004. https://doi.org/10.1029/2004WR003562 (2005).
Thavhana, M. P., Savage, M. J. & Moeletsi, M. E. SWAT model uncertainty analysis, calibration and validation for runoff simulation in the Luvuvhu River catchment, South Africa. Phys. Chem. Earth 105, 115–124. https://doi.org/10.1016/j.pce.2018.03.012 (2018).
Ficklin, D. L. & Barnhart, B. L. SWAT hydrologic model parameter uncertainty and its implications for hydroclimatic projections in snowmelt-dependent watersheds. J. Hydrol. 519, 2081–2090. https://doi.org/10.1016/j.jhydrol.2014.09.082 (2014).
Ke, Q. et al. Urban pluvial flooding prediction by machine learning approaches: A case study of Shenzhen city, China. Adv. Water Resour. 145, 103719. https://doi.org/10.1016/j.advwatres.2020.103719 (2020).
Mekanik, F., Imteaz, M. A., Gato-Trinidad, S. & Elmahdi, A. Multiple regression and Artificial Neural Network for long-term rainfall forecasting using large scale climate modes. J. Hydrol. 503, 11–21. https://doi.org/10.1016/j.jhydrol.2013.08.035 (2013).
Mosavi, A. & Ozturk, P. Flood prediction using machine learning models: Literature review. Water 10, 1536. https://doi.org/10.3390/w10111536 (2018).
Coulibaly, P., Anctil, F. & Bobée, B. Daily reservoir inflow forecasting using artificial neural networks with stopped training approach. J. Hydrol. 230, 244–257 (2000).
Wang, W.-C., Cheng, C.-T. & Qiu, L. A comparison of performance of several artificial intelligence methods for forecasting monthly discharge time series. J. Hydrol. 374, 294–306. https://doi.org/10.1016/j.jhydrol.2009.06.019 (2009).
Erdal, H. İ & Karakurt, O. Advancing monthly streamflow prediction accuracy of CART models using ensemble learning paradigms. J. Hydrol. 477, 119–128. https://doi.org/10.1016/j.jhydrol.2012.11.015 (2013).
Bozorg-Haddad, O., Zarezadeh-Mehrizi, M., Abdi Dehkordi, M., Loaiciga, H. & Mariño, M. A self-tuning ANN model for simulation and forecasting of surface flows. Water Resour. Manag. 30, 2907–2929. https://doi.org/10.1007/s11269-016-1301-2 (2016).
Meng, E. et al. A robust method for non-stationary streamflow prediction based on improved EMD-SVM model. J. Hydrol. 568, 462–478. https://doi.org/10.1016/j.jhydrol.2018.11.015 (2018).
Khan, M. & Coulibaly, P. Application of support vector machine in lake water level prediction. J. Hydrol. Eng. 11, 199–205. https://doi.org/10.1061/(ASCE)1084-0699(2006)11:3(199) (2006).
Wen, X. et al. Support-vector-machine-based models for modeling daily reference evapotranspiration with limited climatic data in extreme arid regions. Water Resour. Manag. 29, 3195–3209. https://doi.org/10.1007/s11269-015-0990-2 (2015).
Gill, M., Asefa, T., Kemblowski, M. & McKee, M. Soil moisture prediction using support vector machines. J. Am. Water Resour. Assoc. 42, 1033–1046. https://doi.org/10.1111/j.1752-1688.2006.tb04512.x (2007).
Yahya, A. et al. Water quality prediction model based support vector machine model for ungauged river catchment under dual scenarios. Water 11, 1231. https://doi.org/10.3390/w11061231 (2019).
Bilali, A., Abdeslam, T. & Brouziyne, Y. Groundwater quality forecasting using machine learning algorithms for irrigation purposes. Agric. Water Manag. 245, 106625. https://doi.org/10.1016/j.agwat.2020.106625 (2020).
Parida, B. P., Moalafhi, D. & Kenabatho, P. Forecasting runoff coefficients using ANN for water resources management: The case of Notwane catchment in Eastern Botswana. Phys. Chem. Earth 31, 928–934. https://doi.org/10.1016/j.pce.2006.08.017 (2006).
Awchi, T. River discharges forecasting in northern Iraq using different ANN techniques. Water Resour. Manag. 28, 801–814. https://doi.org/10.1007/s11269-014-0516-3 (2014).
Shrestha, M., Manandhar, S. & Shrestha, S. Forecasting water demand under climate change using artificial neural network: A case study of Kathmandu Valley, Nepal. Water Supply 20, 1823–1833 (2020).
Golzar, F., Nilsson, D. & Martin, V. Forecasting wastewater temperature based on artificial neural network (ANN) technique and monte carlo sensitivity analysis. Sustainability 12, 6386. https://doi.org/10.3390/su12166386 (2020).
Trichakis, I. C., Nikolos, I. K. & Karatzas, G. Artificial neural network (ANN) based modeling for karstic groundwater level simulation. Water Resour. Manag. 25, 1143–1152 (2011).
Liong, S.-Y. et al. Genetic programming: A new paradigm in rainfall runoff modeling. J. Am. Water Resour. Assoc. 38, 705–718 (2002).
Aytek, A. & Kisi, O. A genetic programming approach to suspended sediment modeling. J. Hydrol. 351, 288–298. https://doi.org/10.1016/j.jhydrol.2007.12.005 (2008).
Fallah-Mehdipour, E., Bozorg-Haddad, O. & Mariño, M. A. Prediction and simulation of monthly groundwater levels by genetic programming. J. Hydro-Environ. Res. 7, 253–260. https://doi.org/10.1016/j.jher.2013.03.005 (2013).
Fallah-Mehdipour, E., Bozorg-Haddad, O. & Mariño, M. Genetic programming in groundwater modeling. J. Hydrol. Eng. 19, 04014031. https://doi.org/10.1061/(ASCE)HE.1943-5584.0000987 (2014).
Babaee, M., Maroufpoor, S., Jalali, M., Zarei, M. & Elbeltagi, A. Artificial intelligence approach to estimating rice yield. Irrig. Drain. 70, 732–742 (2021).
Nikoo, M., Hadzima-Nyarko, M., Nyarko, K. & Nikoo, M. Flood-routing modeling with neural network optimized by social-based algorithm. Nat. Hazards 82, 1–24. https://doi.org/10.1007/s11069-016-2176-5 (2016).
Tehrany, M., Pradhan, B. & Jebur, M. Spatial prediction of flood susceptible areas using rule based decision tree (DT) and a novel ensemble bivariate and multivariate statistical models in GIS. J. Hydrol. 504, 69–79. https://doi.org/10.1016/j.jhydrol.2013.09.034 (2013).
Lee, E. H. & Kim, J. Development of a flood-damage-based flood forecasting technique. J. Hydrol. 563, 181–194. https://doi.org/10.1016/j.jhydrol.2018.06.003 (2018).
Bozorg-Haddad, O. et al. Investigation of Floods in 2019 from the Perspective of Reservoir Management. Report No. 1, 1-304 (Special Committee on National Flood Report, 2019).
Vapnik, V. The Nature of Statistical Learning Theory (Springer, 1995).
Sarzaeim, P., Bozorg-Haddad, O., Bozorgi, A. & Loaiciga, H. Runoff projection under climate change conditions with data-mining methods. J. Irrig. Drain. Eng. 143, 0001205. https://doi.org/10.1061/(ASCE)IR.1943-4774.0001205 (2017).
Vapnik, V. Statistical Learning Theory (Wiley-Interscience, 1998).
Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20, 273–297 (1995).
Vens, C. et al. Inductive Databases and Constraint-Based Data Mining 365–387 (Springer, 2010).
Chavent, M. A monothetic clustering method. Pattern Recogn. Lett. 19, 989–996 (1998).
Quinlan, J. Induction of decision trees. Mach. Learn. 1, 1–81 (1986).
Blockeel, H. & De Raedt, L. Top-down induction of first-order logical decision trees. Artif. Intell. 101, 285–297 (1998).
Cramer, N. L. In Proceedings of an International Conference on Genetic Algorithms and the Applications 183–187.
Koza, J. R. Genetic Programming: On the Programming of Computers by Means of Natural Selection (MIT Press, 1992).
Fallah-Mehdipour, E. & Haddad, O. B. Handbook of Genetic Programming Applications 59–70 (Springer, 2015).
Orouji, H., Bozorg-Haddad, O., Fallah-Mehdipour, E. & Marino, M. Flood routing in branched river by genetic programming. Water Manag. 166, 115–123. https://doi.org/10.1680/wama.12.00006 (2013).
McCulloch, W. S. & Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biol. 5, 115–133 (1943).
Bozorg Haddad, O., Aboutalebi, M., Ashofteh, P.-S. & Loáiciga, H. A. Real-time reservoir operation using data mining techniques. Environ. Monit. Assess. 190, 1–22 (2018).
Arefinia, A., Bozorg-Haddad, O., Oliazadeh, A. & Loaiciga, H. Reservoir water quality simulation with data mining models. Environ. Monit. Assess. 192, 482. https://doi.org/10.1007/s10661-020-08454-4 (2020).
Kelleher, J. D. & Tierney, B. Data Science (MIT Press, 2018).
Marquardt, D. W. An algorithm for least-squares estimation of nonlinear parameters. J. Soc. Ind. Appl. Math. 11, 431–441 (1963).
Solis, F. J. & Wets, R.J.-B. Minimization by random search techniques. Math. Oper. Res. 6, 19–30 (1981).
Sadeghi, M. et al. Application of remote sensing precipitation data and the CONNECT algorithm to investigate spatiotemporal variations of heavy precipitation: Case study of major floods across Iran (Spring 2019). J. Hydrol. 600, 126569. https://doi.org/10.1016/j.jhydrol.2021.126569 (2021).
Bozorg-Haddad, O., Zolghadr-Asli, B., Chu, X. & Loaiciga, H. Intense extreme hydro-climatic events take a toll on society. Nat. Hazards 108, 2385–2391. https://doi.org/10.1007/s11069-021-04749-y (2021).
Yadollahie, M. The flood in Iran: A consequence of the global warming?. Int. J. Occup. Environ. Med. 10, 54 (2019).
Aminyavari, S., Saghafian, B. & Sharifi, E. Assessment of precipitation estimation from the NWP models and satellite products for the spring 2019 severe floods in Iran. Remote Sens. 11, 2741. https://doi.org/10.3390/rs11232741 (2019).
Acknowledgements
The authors thank Iran’s National Science Foundation (INSF) for its support for this research.
Contributions
M.Z.; Formal analysis, Writing—Original Draft. O.B.-H.; Conceptualization, Supervision, Project administration. S.B.; Formal analysis, Writing—Original Draft. M.D.; Formal analysis, Writing—Original Draft. E.G.; Validation, Writing—Review & Editing. H.A.L.; Validation, Writing—Review & Editing.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zarei, M., Bozorg-Haddad, O., Baghban, S. et al. Machine-learning algorithms for forecast-informed reservoir operation (FIRO) to reduce flood damages. Sci Rep 11, 24295 (2021). https://doi.org/10.1038/s41598-021-03699-6