Machine-learning algorithms for forecast-informed reservoir operation (FIRO) to reduce flood damages

Water is stored in reservoirs for various purposes, including regular distribution, flood control, hydropower generation, and meeting the environmental demands of downstream habitats and ecosystems. However, these objectives often conflict with each other and make the operation of reservoirs a complex task, particularly during flood periods. An accurate forecast of reservoir inflows is required to evaluate water releases from a reservoir, with the aim of providing safe space for capturing high flows without having to resort to hazardous and damaging releases. This study aims to improve informed decisions for reservoir management and water prerelease before a flood occurs by means of a method for forecasting reservoir inflow. The forecasting method applies 1- and 2-month time-lag patterns with several Machine Learning (ML) algorithms, namely Support Vector Machine (SVM), Artificial Neural Network (ANN), Regression Tree (RT), and Genetic Programming (GP). The proposed method is applied to evaluate the performance of the algorithms in forecasting inflows into the Dez, Karkheh, and Gotvand reservoirs located in Iran during the flood of 2019. Results show that RT, with an average error of 0.43% in forecasting the largest reservoir inflows in 2019, is superior to the other algorithms. The best forecasting accuracy was obtained with the 2-month time-lag pattern for the Dez and Karkheh reservoirs and with the 1-month time-lag pattern for the Gotvand reservoir. The proposed method exhibits accurate inflow forecasting using SVM and RT. The development of accurate flood-forecasting capability is valuable to reservoir operators and decision-makers who must deal with streamflow forecasts in their quest to reduce flood damages.


Methods
This study applies SVM, ANN, RT, and GP to forecast monthly reservoir inflow with 1- and 2-month time lags. The historical data for inflow to the Dez, Karkheh, and Gotvand reservoirs were collected and used to build the ML algorithms. The inputs to the algorithms for the Dez, Karkheh, and Gotvand reservoirs are the monthly inflows for 1965-2019, 1957-2019, and 1961-2019, respectively. Four projections were designed for the 1-month time-lag and the 2-month time-lag patterns based on the input and output months, as depicted in Fig. 1. Figure 2 displays the flowchart of this paper's methodology.

Support vector machine. Support Vector Machine was introduced by Vapnik et al. 43. SVM performs classification and regression based on statistical learning theory 44. The regression form of SVM is named support vector regression (SVR). Vapnik et al. 45 defined two functions for SVR design. The first function is the error function (Eq. (1), see Fig. 3):

$$ |y - f(x)|_{\varepsilon} = \begin{cases} 0 & \text{if } |y - f(x)| \le \varepsilon \\ |y - f(x)| - \varepsilon = \xi & \text{otherwise} \end{cases} \tag{1} $$

The second function is a linear function that calculates output values for input, weight, and deviation values (Eq. 2):

$$ f(x) = W^{T} x + b \tag{2} $$

where $y$, $f(x)$, $\varepsilon$, $\xi$, $W$, $b$, and $T$ denote respectively the observational value, the output value calculated by SVR, a function sensitivity value, a model penalty, the weight applied to the variable $x$, the deviation of $W^{T} x$ from $y$, and the vector/matrix transpose operator. It is seen in Fig. 3 that the first function (Eq. 1) does not apply a penalty to the points where the difference between the observed value and the calculated value falls within the range $(-\varepsilon, +\varepsilon)$. Otherwise, a penalty $\xi$ is applied. SVR solves an optimization problem that minimizes the forecast error (Eq. 3) to improve the model's forecast accuracy. Equations (4) and (5) represent the constraints of the optimization problem:

$$ \text{minimize} \quad \frac{1}{2} \|W\|^{2} + C \sum_{i=1}^{m} \left( \xi_i^{-} + \xi_i^{+} \right) \tag{3} $$

Subject to:

$$ (W^{T} x_i + b) - y_i \le \varepsilon + \xi_i^{+} \tag{4} $$

$$ y_i - (W^{T} x_i + b) \le \varepsilon + \xi_i^{-} \tag{5} $$

where $C$, $m$, $\xi_i^{-}$, $\xi_i^{+}$, $y_i$, and $\| \cdot \|$ denote respectively the penalty coefficient, the number of input data to the model in the training phase, the penalty for the lower bound of $(-\varepsilon, +\varepsilon)$, the penalty for the upper bound of $(-\varepsilon, +\varepsilon)$, the i-th observational value, and vectorial magnitude. The values of $W$ and $b$ are calculated by solving the optimization problem embodied by Eqs. (3)-(5) with the Lagrange method, and they are substituted in Eq. (2) to calculate the SVR output. SVR is capable of modeling nonlinear data, in which case it relies on transfer (kernel) functions to transform the data such that linear functions can be fitted to the data. Reservoir inflow forecasting with SVR was performed with the Tanagra software. The transfer function selected and used in this study is the Radial Basis Function (RBF), which provided better results than other transfer functions. The weight vector $W$ is calculated using the Soft Margin method 46, and the optimal values of the parameters $\xi_i^{-}$, $\xi_i^{+}$, and $C$ were herein estimated by trial and error.
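As an illustration of the error function and the RBF transfer function described above, the following Python sketch evaluates the ε-insensitive penalty for a single forecast and the RBF similarity between two input vectors. It is not the Tanagra implementation used in the study; the function names, the `gamma` width parameter, and the inflow values are assumptions introduced here for illustration only.

```python
import math

def eps_insensitive_penalty(y_obs, y_calc, eps):
    """Penalty xi for one point: zero inside the (-eps, +eps) range,
    otherwise the amount by which the absolute deviation exceeds eps."""
    return max(0.0, abs(y_obs - y_calc) - eps)

def rbf_kernel(x, z, gamma):
    """Radial basis function exp(-gamma * ||x - z||^2).
    gamma is an assumed width parameter, not a value from the study."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Hypothetical inflows in m^3/s, for illustration only.
inside = eps_insensitive_penalty(500.0, 480.0, eps=25.0)   # deviation 20 is within eps
outside = eps_insensitive_penalty(500.0, 440.0, eps=25.0)  # deviation 60 exceeds eps by 35
same = rbf_kernel([120.0, 150.0], [120.0, 150.0], gamma=0.01)  # identical inputs
print(inside, outside, same)
```

The penalty grows linearly once the deviation leaves the ε range, which is why the formulation tolerates small errors while still limiting large ones.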
Regression tree (RT). RT involves a clustering tree with post-pruning processing (CTP). The clustering tree algorithm has been reported in various articles as the forecasting clustering tree 47 and the monothetic clustering tree 48. The clustering tree algorithm is based on the top-down induction algorithm of decision trees 49. This algorithm takes a set of training data as input and forms a new internal node, provided the best acceptable test can be placed in a node. The algorithm selects the best test as the one that yields the lowest variance. The smaller the variance, the greater the homogeneity of the cluster and the greater the forecast accuracy. If none of the tests significantly reduces the variance, the algorithm generates a leaf and tags it as being representative of the data 47,48. The CTP algorithm is similar to the clustering tree algorithm, except that its post-pruning process is performed with a pruning set to create the right size of the tree 50.

Figure 3. Illustration of the error function of SVR.
The RT used in this study was programmed in MATLAB. The minimum leaf size, the minimum node size for branching, the maximum tree depth, and the maximum number of classification ranges were set by trial and error in this paper's application.
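The variance-based choice of a test at a node can be sketched as follows. This is a minimal Python illustration for a single input variable, not the MATLAB code used in the study; the function names, the `min_leaf` argument, and the sample values are assumptions made for the example, and pruning is not shown.

```python
from statistics import pvariance

def weighted_variance(left, right):
    """Size-weighted variance of the two groups produced by a candidate test."""
    n = len(left) + len(right)
    return (len(left) * pvariance(left) + len(right) * pvariance(right)) / n

def best_split(x, y, min_leaf=1):
    """Return (threshold, variance) for the test 'x <= threshold' that
    minimises the weighted variance of y, or None if no test is admissible."""
    pairs = sorted(zip(x, y))
    best = None
    for k in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # identical x values cannot be separated by a threshold
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [target for _, target in pairs[:k]]
        right = [target for _, target in pairs[k:]]
        score = weighted_variance(left, right)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical lagged inflow (x) and target inflow (y) in m^3/s.
x = [100.0, 120.0, 130.0, 900.0, 950.0]
y = [110.0, 125.0, 140.0, 980.0, 1000.0]
split = best_split(x, y)
print(split)
```

In this toy data set the selected threshold separates the three low-flow years from the two high-flow years, because that test leaves the most homogeneous groups.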
Genetic programming (GP). GP, developed by Cramer 51 and Koza 52, is a type of evolutionary algorithm that has been used effectively in water management to carry out single- and multi-objective optimization 53. GP finds functional relations between input and output data by combining operators and mathematical functions relying on structured tree searches 44. GP starts the search process by generating a random set of trees in the first iteration. The length of a tree defines its depth; the greater the depth of the tree, the more accurate the GP functional relation 54. In a tree structure, all the variables and operators are assumed to be the terminal and function sets, respectively. Figure 4 shows mathematical relational functions generated by GP. Genetic programming consists of the following steps:
• Select the terminal sets: these are the independent variables of the problem and the system state variables.
• Select a set of functions: these include arithmetic operators (÷, ×, −, +), Boolean functions (such as "or" and "and"), mathematical functions (such as sin and cos), conditional expressions (such as if-then-else), and other required statements based on problem objectives.
• Algorithmic accuracy measurement index: it determines to what extent the algorithm is performing correctly.
• Control components: these are numerical components and qualitative variables used to control the algorithm's execution.
• Stopping criterion: it determines when the execution of the algorithm is terminated.

The GeneXproTools software was used in this study to implement GP. The GP parameters, operators, and linking functions were chosen based on the lowest RMSE in this study. The GP model's parameters and operators applied in this study are listed in Table 1.

Artificial Neural Network (ANN). ANN, developed by McCulloch and Pitts 55, is an artificial intelligence-based computational method that features an information processing system that employs interconnected data structures to emulate information processing by the human brain 56. A neural network does not require precise mathematical algorithms and, like humans, can learn through input/output analysis relying on explicit instructions 57. A simple neural network contains one input layer, one hidden layer, and one output layer. Deep-learning networks have multiple hidden layers 58. After the functional relations between inputs and outputs have been learned in training, the ANN forecasts the output corresponding to new inputs with a specific algorithm.
This study applies the Multi-Layer Perceptron (MLP), a three-layer feed-forward ANN that features a processing element, an activation function, and a threshold function, as shown in Fig. 5. In MLP, the weighted sum of the inputs and the bias term is passed through a transfer function to produce the output.
The output is calculated with a nonlinear function as follows:

$$ Y = f\left( \sum_{i} W_i X_i + b \right) \tag{6} $$

where $W_i$, $X_i$, $b$, $f$, and $Y$ denote the i-th weight factor, the i-th input vector, the bias, the conversion function, and the output, respectively. The ANN was coded in MATLAB. The number of epochs, the optimal number of hidden layers, and the number of neurons of the hidden layers were found through a trial-and-error procedure. The model output sensitivity was assessed with various algorithms; however, the best forecasting skill was achieved with the Levenberg-Marquardt (LM) algorithm 59, and the weight vector $W$ is calculated using the Random Search method 60. Furthermore, the Tangent Sigmoid and linear transfer functions were chosen by trial and error and used in the hidden and output layers, respectively.

Of the total data, 70% were randomly selected and used for training SVM, ANN, RT, and GP. The remaining 30% of the data were applied for testing the forecasting algorithms.
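The preparation of input/output pairs for a projection and the random 70%/30% division can be sketched as follows. The exact input months of each projection are defined in Fig. 1; this Python sketch assumes one reading that is consistent with the statement that the January projection with a 2-month time lag relies only on the October inflow, namely that the inputs run from October (taken here as the start of the water year) up to the month that precedes the target by the time lag plus one. The names `WATER_YEAR`, `lagged_sample`, and `split_70_30`, the fixed random seed, and the inflow values are assumptions for illustration.

```python
import random

# Assumed ordering of months, starting at the water year (October).
WATER_YEAR = ["Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr"]

def lagged_sample(year_flows, target, lag):
    """Build one (inputs, output) pair for a projection.

    year_flows: dict mapping month name to inflow (m^3/s) for one water year.
    target:     month to forecast, e.g. "Jan".
    lag:        number of months skipped between the last input month and
                the target month (assumed meaning of the time lag).
    """
    t = WATER_YEAR.index(target)
    last_input = t - lag - 1
    if last_input < 0:
        raise ValueError("no input months available for this projection")
    inputs = [year_flows[m] for m in WATER_YEAR[: last_input + 1]]
    return inputs, year_flows[target]

def split_70_30(samples, seed=0):
    """Randomly assign 70% of the samples to training and 30% to testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical inflows (m^3/s) for one water year.
flows = {"Oct": 50.0, "Nov": 80.0, "Dec": 120.0, "Jan": 200.0,
         "Feb": 260.0, "Mar": 400.0, "Apr": 650.0}
jan_lag2 = lagged_sample(flows, "Jan", lag=2)  # uses October only
jan_lag1 = lagged_sample(flows, "Jan", lag=1)  # uses October and November
train, test = split_70_30(list(range(10)))     # ten hypothetical yearly samples
print(jan_lag2, jan_lag1, len(train), len(test))
```

Under this reading, a longer time lag leaves fewer input months for the early projections, which matches the larger uncertainty reported for the January projection with the 2-month time lag.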
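Performance-evaluation indices. The forecasting skill of the ML algorithms (SVM, ANN, RT, and GP) was evaluated with the Correlation Coefficient (R), the Nash-Sutcliffe Efficiency (NSE), the Mean Absolute Error (MAE), and the Root Mean Square Error (RMSE) in the training and testing phases. The closer the R and NSE values are to 1, and the closer the RMSE and MAE values are to 0, the better the performance of the algorithms 20. Equations (7)-(10) describe the performance indices:

$$ R = \frac{\sum_{i=1}^{n} \left( Q_{obs,i} - Q_{obs}^{mean} \right) \left( Q_{fore,i} - Q_{fore}^{mean} \right)}{\sqrt{\sum_{i=1}^{n} \left( Q_{obs,i} - Q_{obs}^{mean} \right)^{2} \sum_{i=1}^{n} \left( Q_{fore,i} - Q_{fore}^{mean} \right)^{2}}} \tag{7} $$

$$ NSE = 1 - \frac{\sum_{i=1}^{n} \left( Q_{obs,i} - Q_{fore,i} \right)^{2}}{\sum_{i=1}^{n} \left( Q_{obs,i} - Q_{obs}^{mean} \right)^{2}} \tag{8} $$

$$ MAE = \frac{1}{n} \sum_{i=1}^{n} \left| Q_{obs,i} - Q_{fore,i} \right| \tag{9} $$

$$ RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( Q_{obs,i} - Q_{fore,i} \right)^{2}} \tag{10} $$

in which $Q_{fore,i}$, $Q_{obs,i}$, $Q_{fore}^{mean}$, $Q_{obs}^{mean}$, $i$, and $n$ denote the forecasted inflow, observed inflow, mean forecasted inflow, mean observed inflow, time step, and the total number of time steps during training and testing phases, respectively.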
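The four indices can be computed directly from paired observed and forecasted inflows. The Python sketch below uses the standard definitions of R, NSE, MAE, and RMSE; the function name `performance_indices` and the inflow values are assumptions for illustration and are not data from the study.

```python
import math

def performance_indices(obs, fore):
    """Return R, NSE, MAE, and RMSE for paired observed/forecasted inflows."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_fore = sum(fore) / n
    cov = sum((o - mean_obs) * (f - mean_fore) for o, f in zip(obs, fore))
    ss_obs = sum((o - mean_obs) ** 2 for o in obs)      # spread of observations
    ss_fore = sum((f - mean_fore) ** 2 for f in fore)   # spread of forecasts
    sq_err = sum((o - f) ** 2 for o, f in zip(obs, fore))
    return {
        "R": cov / math.sqrt(ss_obs * ss_fore),
        "NSE": 1.0 - sq_err / ss_obs,
        "MAE": sum(abs(o - f) for o, f in zip(obs, fore)) / n,
        "RMSE": math.sqrt(sq_err / n),
    }

# Hypothetical inflows in m^3/s.
observed = [100.0, 200.0, 300.0, 400.0]
forecasted = [110.0, 190.0, 310.0, 390.0]
indices = performance_indices(observed, forecasted)
print(indices)
```

With every forecast off by exactly 10 m³/s, MAE and RMSE coincide; they diverge only when the errors differ in size.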

Ethics approval. All authors complied with the ethical standards.
Consent to participate. All authors consent to participate.

Case study
The Great Karun Basin, Iran, is part of the Persian Gulf catchment. It is located in southwestern Iran, with an area of about 67,257 km². The main river of the basin, the Karun, with a length of about 950 km, stems from the Yellow Mountains and flows through mountainous areas in Indika and Masjed Soleyman and ultimately discharges into the Persian Gulf. Dez and Gotvand are the two main reservoirs located in this basin.
Karkheh Basin is located in western Iran, in the middle and southwestern regions of the Zagros Front. The area of this basin is about 51,604 km². Karkheh reservoir is located in this basin. Table 2 lists the characteristics of the Dez, Karkheh, and Gotvand reservoirs. Figure 6 shows the location of Dez and Gotvand reservoirs in the Great Karun basin and the Karkheh reservoir in the Karkheh basin.
During March and April 2019, Iran faced three major waves of extreme precipitation, leading to extreme floods with long return periods in large parts of the country 61,62. Before the 2019 flood, many parts of Iran had suffered drought and the drying of lakes and rivers for almost 30 years due to climatic change 63. The southwestern regions of Iran, including the Great Karun and Karkheh basins, endured the brunt of the second and third waves of precipitation and suffered severe damages due to fluvial floods. The Dez, Gotvand, and Karkheh reservoirs received large volumes of precipitation and river flows. Table 3 shows the average, minimum, and maximum inflows to the Dez, Karkheh, and Gotvand reservoirs during January through April. This study develops a method to forecast reservoir inflows in the Great Karun and Karkheh basins, which can be applied to future events.

Karkheh reservoir evaluation. Table 5 shows that SVM and RT had the best RMSE and MAE values, respectively, with the 1-month time-lag pattern applied to the January and February projections, and produced more accurate forecasts than the other algorithms. The smallest RMSE and MAE recorded in the testing phase corresponded to SVM for the other projections. The 1-month time-lag pattern results corresponding to the Karkheh reservoir under the four projections herein considered are presented in Appendix 3.
The results in Table 5 indicate that RT had the best accuracy according to the RMSE and MAE values for the 2-month time-lag pattern in the testing phase for the January projection and April projection. The highest accuracy corresponded to SVM and RT according to the RMSE and MAE values, respectively, for the February

Table 4. Results of the applied algorithms obtained with the 1-month and 2-month time-lag patterns in Dez reservoir.

RT has the lowest MAE for several projections with both time-lag patterns in the three reservoirs, while the minimal RMSE was obtained by SVM. Appendixes 1-6 show that RT calculated excellent forecasts for most years for the four projections; yet, RT had a large forecast error in some years. In contrast, SVM forecasted inflows with a relatively constant error. The MAE (Eq. (9)) calculates the mean of the absolute values of the differences between the observed and forecasted inflows to the reservoirs, assigning the same weight to all differences. This is the main reason RT had lower MAE values than SVM under most projections, as RT forecasted most of the observed inflows well. On the other hand, the RMSE is the root of the mean square differences, which assigns more weight to the large differences because of the squaring applied [see Eq. (10)]. This caused SVM to produce lower RMSE than RT. Tables 4, 5 and 6 establish that all the applied algorithms had the lowest forecasting accuracy under the January projection with the 2-month time-lag pattern in the three reservoirs compared with the other projections, judging by the significant drop in the values of the performance indices. This is so because the hydrologic or water year starts in September-October in Iran, and the algorithms for the January projection with a 2-month time lag forecast the reservoir inflows relying only on the October input data. It is evident in Fig. 7 that the reservoir inflows in October 2019 are affected by the long-term reservoir inflows and prolonged drought. Therefore, forecasting reservoir inflow for the January projection with a 2-month time lag is more uncertain than for the other projections.

Table 5. Results of the applied algorithms obtained with the 1-month and 2-month time-lag patterns in Karkheh reservoir.
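The contrast between MAE and RMSE can be made concrete with two hypothetical error series: one that is exact in most years but misses badly once (the behaviour attributed to RT), and one with a steady moderate error (the behaviour attributed to SVM). The Python sketch below uses invented error values chosen only to illustrate the effect of squaring; they are not results from the study.

```python
import math

def mae(errors):
    """Mean of the absolute errors; every error carries the same weight."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root of the mean squared error; squaring emphasises large errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical forecast errors (m^3/s) over five years, for illustration only.
mostly_exact = [0.0, 0.0, 0.0, 0.0, 40.0]       # accurate except one large miss
steady_error = [10.0, 10.0, 10.0, 10.0, 10.0]   # roughly constant error

print(mae(mostly_exact), rmse(mostly_exact))
print(mae(steady_error), rmse(steady_error))
```

The first series has the lower MAE, while the second has the lower RMSE, so the two indices can rank the same pair of algorithms in opposite order.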
A more detailed evaluation of the results is provided by the average improvement percentages (AIPs) of R and RMSE for SVM and the AIPs of the MAE for RT compared with the other forecasting algorithms in the testing phase when applying the 1-month and 2-month time-lag patterns. Table 7 shows the clear superiority of the average R and RMSE associated with the SVM model when using the 1-month time-lag pattern; that is, SVM features positive AIPs of R and RMSE when compared with RT, ANN, and GP. The largest AIPs of R and RMSE for SVM were obtained relative to ANN and GP (in Dez and Gotvand reservoirs), respectively. SVM featured negative AIPs of R compared with RT and GP in Dez reservoir and compared with ANN, RT, and GP in the other reservoirs under the 2-month time-lag pattern, as shown in Table 7. Also, SVM had negative AIPs of RMSE in comparison with RT in Karkheh and Gotvand reservoirs for the 2-month time-lag pattern. The reason for the negative AIPs of R and RMSE for SVM was the decline in SVM's performance for the January projection with a 2-month time lag compared with the other algorithms in forecasting the reservoir inflow. The most negative AIPs of R for SVM were obtained when compared with RT. Therefore, under the 2-month time-lag pattern, RT had higher accuracy on average than SVM, ANN, and GP with respect to R. It is evident from Table 7 that RT had positive AIPs of MAE compared with SVM, ANN, and GP except for the 2-month time-lag pattern in Gotvand reservoir. The largest positive AIPs of MAE for RT were obtained when compared with GP except for the 1-month time-lag pattern in the Karkheh reservoir.

Table 6. Results of the applied algorithms obtained with the 1-month and 2-month time-lag patterns in Gotvand reservoir.
Evaluation of time-lag patterns. The distribution of the forecast errors is examined with boxplots for further evaluation of the forecasting algorithms' performance. The error equals the difference between the observed and forecasted inflows to the reservoirs. Positive and negative error values indicate under-estimation and over-estimation, respectively. One-fourth and three-fourths of the errors fall below the lower quartile (Q25) and the upper quartile (Q75), respectively; therefore, the upper quartile is more significant than the lower quartile for comparing the algorithms' performance. Figure 8a-d shows the SVM, GP, RT, and ANN results, respectively. The upper quartiles for the 1-month time-lag pattern were equal to 19.183, 86.703, 0.0003, and 84.515, respectively, which were lower than the upper quartiles for the 2-month time-lag pattern (138.243, 79.172, 0.0004, and 123.067, respectively), except for GP. Therefore, SVM, RT, and ANN applying the 1-month time-lag pattern and GP applying the 2-month time-lag pattern had better accuracy in forecasting the inflow to the Dez reservoir. Figure 9 shows that SVM was more accurate with the 1-month time-lag pattern (Q75 = 92.978); however, GP, RT, and ANN, with Q75 = 84.991, 0.0008, and 74.838, respectively, performed better with the 2-month time-lag pattern than with the 1-month time-lag pattern in Karkheh reservoir. The minimum upper quartiles were equal to 181.679 and 0.0012 for SVM and RT, respectively, with the 1-month time-lag pattern, as can be seen in Fig. 10.

Dez reservoir. It is seen in Fig. 11a

Concluding remarks
This study presents a method for forecasting reservoir inflow. SVM, ANN, RT, and GP were selected to forecast the monthly inflows to the Dez, Karkheh, and Gotvand reservoirs in Iran. The proposed method is applied to evaluate the forecasting performance of the algorithms during the large flood of 2019. The applied algorithms were developed based on the 1-month and 2-month time-lag patterns. Monthly reservoir inflows were used to train the forecasting algorithms. The forecasting skill of the algorithms was compared using the Correlation Coefficient, Root Mean Squared Error, Nash-Sutcliffe efficiency, and Mean Absolute Error. The capacity of RT to forecast the largest reservoir inflows in 2019 indicates that the reservoir inflows in 2019 could have been forecasted accurately. The results showed that SVM and RT had the best accuracy among the algorithms. The SVM model with the 1-month time-lag pattern performed better (22.14%) than with the 2-month time-lag pattern according to the upper quartile (Q75) of the forecast error distribution in forecasting the Karkheh reservoir's inflow. In contrast, the RT model had better accuracy (99%) with the 2-month time-lag pattern. Furthermore, SVM and RT had better performance with the 1-month time lag based on the low value of Q75 in forecasting inflow to the Dez (86.12 and 25%, respectively) and Gotvand (1 and 7.69%, respectively) reservoirs. This study's results guide FIRO for improved reservoir management, decision-making and planning, and optimal reservoir storage allocation for flood control. Accurate forecasting of reservoir inflow is imperative for effective and timely flood control, reduction of damages, and for reducing the risk of not meeting downstream water demands.
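The percentages quoted for the Dez reservoir are consistent with a relative reduction of the upper quartile, that is, the difference between the two Q75 values divided by the larger one. The short Python check below applies that assumed formula to the Q75 values reported for the Dez reservoir; the function name `q75_improvement` is introduced here for illustration.

```python
def q75_improvement(q75_better, q75_worse):
    """Percentage reduction of the upper quartile (Q75) of forecast errors,
    assuming the improvement is measured relative to the larger Q75."""
    return 100.0 * (q75_worse - q75_better) / q75_worse

# Q75 values reported for the Dez reservoir (1-month vs 2-month time-lag pattern).
svm_dez = q75_improvement(19.183, 138.243)
rt_dez = q75_improvement(0.0003, 0.0004)
print(round(svm_dez, 2), round(rt_dez, 2))
```

Under this reading the check yields about 86.12% for SVM and 25% for RT, which agrees with the values stated for the Dez reservoir.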
Future research may develop ensemble models and compare their performance with the ML algorithms in forecasting the 2019 reservoir inflows. Furthermore, comparing the forecasting skill of the ML algorithms with those of physically-based models for forecasting reservoir inflows would provide a comprehensive assessment of the relative advantages of these forecasting methods. Employing remote sensing data in data-sparse areas, especially for developing countries, would be worth pursuing in future works.

Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Code availability
The codes that support the findings of this study are available from the corresponding author upon reasonable request.