Novel Hybrid Evolutionary Algorithms for Spatial Prediction of Floods

Adaptive neuro-fuzzy inference system (ANFIS) includes two novel GIS-based ensemble artificial intelligence approaches called imperialistic competitive algorithm (ICA) and firefly algorithm (FA). This combination could result in ANFIS-ICA and ANFIS-FA models, which were applied to flood spatial modelling and its mapping in the Haraz watershed in Northern Province of Mazandaran, Iran. Ten influential factors including slope angle, elevation, stream power index (SPI), curvature, topographic wetness index (TWI), lithology, rainfall, land use, stream density, and the distance to river were selected for flood modelling. The validity of the models was assessed using statistical error-indices (RMSE and MSE), statistical tests (Friedman and Wilcoxon signed-rank tests), and the area under the curve (AUC) of success. The prediction accuracy of the models was compared to some new state-of-the-art sophisticated machine learning techniques that had previously been successfully tested in the study area. The results confirmed the goodness of fit and appropriate prediction accuracy of the two ensemble models. However, the ANFIS-ICA model (AUC = 0.947) had a better performance in comparison to the Bagging-LMT (AUC = 0.940), BLR (AUC = 0.936), LMT (AUC = 0.934), ANFIS-FA (AUC = 0.917), LR (AUC = 0.885) and RF (AUC = 0.806) models. Therefore, the ANFIS-ICA model can be introduced as a promising method for the sustainable management of flood-prone areas.


Description of the Study Area
Located between longitudes of 51° 43′ to 52° 36′E, and latitude of 35° 45′ to 36° 22′N, Haraz watershed is situated in the south of Amol City in Mazandaran Province, Northern Iran. This watershed drains an area of 4,014 km 2 . Elevation of the watershed ranges from 328 m to about 5595 m at the highest point and its ground slope varies from flat to 66° (Fig. 1). The average annual rainfall of Mazandaran province is 770 mm, and for the Haraz watershed case study, it is 430 mm. The maximum rainfall occurs in January, February, March, and October with an average monthly rainfall of 160 mm (Iran's Meteorological Organization) during these four months. The climate in the study area is a combination of the mild humid climate of the Caspian shoreline area and the moderate cold climate of mountainous regions. The area is almost entirely encompassed by rocks from Mesozoic (56.4%), Cenozoic (38.9%), and Paleozoic (4.7%) eras. The rangeland covers around 92% of the study area. Other land use patterns present in the area include forest, irrigation land, bare land, garden land, and residential area.
The main residential centres located in Haraz watershed include Abasak, Baladeh, Gaznak, Kandovan, Noor, Polur, Rineh, Tashal, and Tiran 5 . The Haraz watershed was considered for the present study due to the many devastating floods that have occurred annually, claiming lives and incurring damage to the property in recent years. The major reason for flooding in these areas is high rainfall intensity within a short period of time, major land use changes specifically from rangelands to farmlands, deforestation, recent extensive changes from orchards to residential areas, and the lack of control measures required to curb flooding. In the study area, floods lead to fatalities and damages to infrastructure, and disruption of traffic, trade and public services.

Results and Discussion
Spatial relationship between floods and influential factors using SWARA model. The first step in this study was determining the sub-factor weights based on the SWARA algorithm, as presented in Table 1. Table  republished from 27 . In this study, 10 factors showing a significant impact on flood occurrence were selected based on literature review, the characteristics of watershed, and data availability 5 . One of the most significant factors in flood modelling is the slope angle. The slope map was developed and then classified into seven classes by the natural break classification method. The SWARA value was the highest for the class 0-0.5 (0.4). Hence, the results revealed that the higher the slope angle, the less the probability of flooding will be. Another important factor was elevation which was classified into nine classes. The results showed that the first class, 328-350 m, with a weight of 0.63, had the highest impact on flooding such that the higher the elevation, the lower the flooding probability will be. This was confirmed by Khosravi et al. 5 and Tehrany et al. 19 who explained that slope angle, elevation, and distance to river had higher impacts on flooding such that once the slope angle, elevation, and distance to river increased, the probability of flooding diminished. Meanwhile, curvature is representative of topography of the ground surface. This factor was prepared and classified into three classes including concave, convex, and flat. For curvature, the highest SWARA value was found for concave followed by flat and convex.
The results of SPI revealed that the class of 2000-3000 (0.32) showed a higher impact on flood occurrence than other classes such that higher SPIs were associated with higher flood probability. In the case of topographic wetness index (TWI), the SWARA value showed a decreasing trend when the TWI value increased. The highest SWARA value belonged to the class of 6.96-11.5 (0.08). TWI indicates the wetness of land; wetness was higher in areas with lower slope than mountainous areas. This suggests that larger TWI values are associated with higher flood probability. River density results also showed that the river density values higher than 3.66 had the highest correlation with flood occurrence (0.37). The results of the SWARA model revealed that as the river density grew, the probability of flood occurrence also increased. For distance-to-river factor, the results demonstrated that the first class of 0-50 m (0.59) had a relatively higher susceptibility to flood occurrence. The farther the distance to the river, the less probable the flood occurrence was. For lithology, Teryas formations had the highest SWARA value (0.31) suggesting the highest probability of flood occurrence.
The land use map was constructed for seven types including water bodies, residential area, garden, forest land, grassland, farming land, and barren land. The water bodies demonstrated a higher impact on flood occurrence (0.75), followed by residential area (0.15), garden (0.06), forest land (0.02), grassland (0.01), farming land (0), and barren land (0). The highest SWARA value (0.4) was assigned to the lowest amount of rainfall (188-333), where the higher the rainfall level, the less the flood occurrence probability was. The reason why rainfall increase was still insignificant for flooding was that rainfall increased in areas where elevation increased and these areas were not prone to flooding 5 . Sensitivity analysis of conditioning factors. The results of sensitivity analysis are reported in Table 2.
Based on the testing error values, the results indicated that in the ANFIS-ICA hybrid model, when the slope angle, elevation, curvature, TWI, SPI, rainfall, distance to river, river density, lithology, and land use were removed, the RMSE values were 0.196, 0.141, 0.172, 0.145, 0.133, 0.143, 0.134, 0.172, 0.145, and 0.134, respectively. It was observed that when all factors were considered in the modelling process, the error value was 0.130. It was also seen that all conditioning factors had a positive role in the flood modelling process. On the other hand, the results of sensitivity analysis for ANFIS-FA hybrid model concluded that with removing the slope angle, elevation, curvature, TWI, SPI, rainfall, distance to river, river density, lithology, and land use factors, the RMSE values were computed as 0.269, 0.196, 0.257, 0.193, 0.173, 0.198, 0.171, 0.260, 0.201, and 0.171, respectively. In this model, when all factors were introduced into the model, the error value was 0.170. Overall, the results suggested that all factors were also significant for flood modelling in ANFIS-FA hybrid model.

Application of ANFIS ensemble models.
Using the MATLAB programming language, two hybrid models called ANFIS with complementary models of FA and ICA were developed, leading to ANFIS-ICA and ANFIS-FA, which were initially trained. The accuracy of training was then calculated using test data. To that end, the dataset related to flood and non-flood was segmented into 70% for training and 30% for testing. Then, the two methods started extracting and learning the relationship between the SWARA values for conditioning factors and flood (1) as well as non-flood (0) values.

Model results and validation.
The results of the model prediction capacity were evaluated using Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) for both modelling and validation 26   model was the model that predicted the results of testing data with higher accuracy. The values of MSE and RMSE for ANFIS-ICA and ANFIS-FA were obtained as 0.41 and 0.35, and 0.169 and 0.129, respectively. Since MSE and RMSE of ANFIS-ICA were lower than of those of ANFIS-FA, it can be concluded that ANFIS-ICA was a more optimal model than ANFIS-FA, since ANFIS-ICA had higher accuracy in both the training and testing.
Preparation of flood susceptibility mapping using ANFIS ensemble models. In this study, the ensembles of ANFIS based on the ICA and FA algorithms were developed by the training dataset. The developed models were then applied for calculating the flood susceptibility index (FSI) which was assigned to all pixels of the study area to create flood susceptibility maps (FSM). In the first step, each pixel of the study area was assigned to a unique flood susceptibility index; then, these indices were exported in Arc GIS 10.2 format and used to construct the final flood susceptibility mapping.
There are different techniques for map classification in the Arc GIS 10.2 software including natural break, equal interval, geometrical interval, quantile, standard deviation, and even manual technique. They should be tested to select the most appropriate one to classify any map in the study area. Generally, selection of the classification methods is based on the distribution of flood susceptibility indexes 28 . For example, if the distribution of FSI values is close to normal, to prepare an equal interval, FSM or standard deviation classification methods should be used. However, when FSI has positive or negative skewness, the best classification methods are quantile or natural break 29 . In the current study, the data distribution histogram revealed that quantile classifier was capable of generating better results compared to other classifiers. Thus, the approach of quantile data classification was selected, and FSI was classified in four classes of susceptibility: low, moderate, high, and very high 19 . Finally, based on the hybrid model, the two maps of flood susceptibility were constructed, as depicted in Figs 4 and 5.
Validation and comparison of new hybrid flood susceptibility models. Validation of the proposed ensemble models. The reliability and predictive power of these two new hybrid models for flood modelling were assessed using success and prediction rate curves. The outcomes of success and prediction rate were prepared for 70% (training data) and 30% (testing data that were not used in training) of data. Since the success rate curve was drawn using the training data, It was impossible to evaluate the predictive power of the models 30 . On the other hand, the AUC for prediction rate curve showed how well the model predicted floods 31 . The area under ROC curve was considered for evaluating the overall performance of the models. According to that, Larger AUC represented better performance of the model. The ROC method has been one of the most popular techniques to evaluate the efficiency of models, as this method quantitatively calculates the efficiency of models 5 . The results of success-rate showed that the AUC values for the two models of ANFIS-ICA and ANFIS-FA were 0.95 and 0.93, respectively (Fig. 6a). The prediction rate which was not used in the modelling was applied to the assessment of the model capacity in predicting flood-prone areas. The values of area under the prediction-rate curve for ANFIS-ICA and ANFIS-FA were 0.94 and 0.91, respectively (Fig. 6b). Then, the highest predictive power for  flood-prone areas in the Haraz watershed was provided by the new hybrid ANFIS-ICA model, which had also a lower standard error of 0.35 compared to the ANFIS-FA (0.41) model. The results were compatible with those of the RMSE and MSE values in both the training and testing phases. These two new hybrid models had also a reasonable ROC; yet, the ANFIS-ICA model represented the best performance in predicting the flood susceptibility, followed by the ANFIS-FA model. In addition to the success and prediction rate curves, Friedman and Wilcoxon signed-rank tests were also utilized for assessing the validity of the two new hybrid models (the performance of the models). They determined whether there were significant statistical differences between performances of the two hybrid models. The results of Freidman test reported in Table 3 indicated that the average ranking values for the ANFIS-ICA and ANFIS-FA hybrid models were 1.29 and 1.71, respectively. Although chi-square value was 35.945, due to having a significant level of less than 5%, the Friedman test was not applicable to assess the validity of models. To overcome this challenge, the Wilcoxon signed-rank test was performed for determining the pairwise differences between the two flood hybrid models at the 5% significance level (Table 4). No significant difference was detected between the two flood hybrid models at the significance level of 5% when the null hypothesis was rejected. On the other hand, in this case their results were not the same. The results of Wilcoxon signed-rank test were specified based on the p-value and z-value. If the p-value was less than 5% (0.05), and z-value was larger than −1.96 and +1.96, the performances of the two models were significantly different 32 . The results of Wilcoxon signed-rank test, shown in Table 3, imply that there was a statistical difference between the two flood susceptibility models.
Comparison with some state-of-the-art sophisticated machine learning techniques. Chapi et al. 24 introduced a novel ensemble data mining model called Bagging-logistic model tree (LMT) for mapping of flood susceptibility in this study area. Then, the results of this new model were compared with those of some state-of-the-art sophisticated    algorithm, ANFIS-ICA (AUC = 0.947) outperformed all these models, but ANFIS-FA was unable to show a greater performance than the Bagging-LMT, BLR and LMT models. However, AUC ANFIS-FA was better than the LR and RF models. Overall, the new proposed model, ANFIS-ICA, was a powerful ensemble model which improved the prediction accuracy among all models in this study area.

Conclusions
The dynamic nature of floods will always necessitate new approaches and models for its management. This is the main reason that no one has been able to introduce the best model. Unpredicted spatial-temporal changes of this phenomenon have forced scientists to continuously seek for better approaches to generate better results and outcomes; sometimes new individual models and sometimes combinations of the individual models known as hybrid models. In the current study, two new hybrid models, ANFIS-ICA and ANFIS-FA, were applied to enhance the predictive power of flood spatial modelling. A total of 10 conditioning factors were selected for the spatial modelling in the Haraz watershed, Northern Iran. The factors were divided into several classes using the SWARA model and their maps were generated for modelling purposes. The new hybrid models were then used for modelling spatial floods in the study area. The models' outputs were compared to the results of several soft computing approaches that had been successfully used before in the study area whose accuracy had been approved. The ANFIS-ICA model was the most successful among all the models and its results provided the most appropriate congruence with reality. Although the results of ANFIS-FA were reasonable, the model was not ranked as the best. Accordingly, this study introduced a new model, ANFIS-ICA, for spatial prediction of floods in the study area and other similar watersheds. This model can be evaluated as a tool for more appropriate mitigation and management of floods in Northern Iran.

Materials and Methods
Data collection and preparation. Flood inventory maps, which are generated in accordance with historical flood data, are the cornerstones of spatial prediction of floods. The quality of historical data is closely related to the accuracy of the prediction model 33     Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Global DEM (https://gdex.cr.usgs.gov/gdex/) with a resolution of 30 × 30 m), and geological maps in the Haraz watershed were also used in data collection and preparation. Eventually, 201 flood points and non-flood points were selected utilizing satellite images. In order to build spatial prediction models and to test the validity of model results, these data points were randomly divided into a training (70%; 141 locations) and a testing (30%; 60 locations) set 24 . In the present study, 10 flood conditioning factors were adopted: curvature, distance to river, elevation, land use, lithology, rainfall, river density, SPI, slope, and TWI. These factors were extracted using DEM and geological maps by ArcGIS software.
Sensitivity analysis. The relative importance of different conditioning factors affecting the modelling process and outputs should be assessed using sensitivity analysis approaches 34,35 . Ilia and Tsangaratos (2016) reported that the sensitivity analysis is performed using three methods including: (i) Changing criteria values, (ii) Changing relative importance of criteria, and (iii) Changing criteria weights 36 . The current study adopted the second method, changing relative importance of criteria, such that each conditioning factor was firstly removed and then modelling was performed with other factors. Accordingly, the obtained results were compared with the condition taking all conditioning factors for the modelling into account by error estimation such as RMSE measure.

Theoretical background of the methods used. Stepwise Assessment Ratio Analysis (SWARA). The
Step-wise Assessment Ratio Analysis (SWARA), which was first proposed by Keršuliene in 2010, is one of the Multi-Criteria Decision Making (MCDM) methods. Experts play an important part in calculating the weights in this method, and hence the SWARA method is called an expert-oriented method 37 . The expert assigns the highest rank to the most valuable and the lowest rank to the least valuable criteria. Subsequently, the average value of these ranks determines the overall ranks. The SWARA method follows these steps 37 : Step 1: Criteria are developed and determined. The expert must develop decision-making models from the determined factors. Moreover, the criteria are arranged in accordance with their priority and the importance assigned by the expert's viewpoint and then the influential criteria are sorted in a descending order.
Step 2: criteria's weighting is calculated. Then, the weights are assigned to all criteria based on expert's knowledge, information gained for the case study, and the previous experience as follows: The respondent expresses the comparative importance of the criterion j compared to the previous (j − 1) criterion starting from the second criterion, with the trend repeating the same way for each particular criterion. The trend, according to Keršuliene, determines the relative importance of the average value, S j in this manner 37 : where, n represents the number of experts; A i stands for the ranks suggested by the experts for each factor; and j shows the number of factors involved. The coefficient K j is calculated as: The recalculation of Q j is done by: The relative weights of evaluation criteria can be expressed as: where, W j denotes the relative weight of the j-th criterion, and m represents the total number of criteria.
Adaptive neuro-fuzzy inference systems (ANFIS). Adaptive Neuro Fuzzy Inference System (ANFIS) is a data-driven model belonging to the family of neuro-fuzzy methods. The ANFIS is a hybrid algorithm integrating neural network with fuzzy logic which generates input-output data pairs 38 . This algorithm, first proposed by Jang in 39 , is used in many fields such as data processing of landslides 40 , flooding 26 , and groundwater 41 . The fuzzy inference system has two inputs x 1 and x 2 , one output z, the fuzzy set A 1 , A 2 , B 1 , B 2 , and p i , q i , and r i which are the consequence of output function parameters in a set of fuzzy IF-THEN rules based on the first-order Takagi-Sugeno fuzzy model 42 as follows: x A y B f px qy r Rule 1: if is and is , then The ANFIS is a feedforward neural network with a multi-layer structure 43 . The most important point is the parameters of ANFIS which must be optimized by other methods. In this study, the imperialist competitive algorithm (ICA) and the Firefly algorithm (FA) were used for optimization.
Firefly algorithm (FA). As an evolutionary algorithm, Firefly algorithm was first introduced by Yang in 44 . In recent years, many researchers have been using Firefly algorithm for optimization purposes 45 . The results of this algorithm for solving optimization problems have been more satisfactory than other algorithms including SA, GA, PSO, and HAS 46 . This meta-heuristic was inspired by firefly's flashing and their communication 44 . There are about 2000 firefly species, and most of them produce rhythmic and short flashes 47 . Similar to other swarm intelligence algorithms whose details can capture a problem solution, in this algorithm each firefly is a solution to a problem. The light intensity also matches the objective function value. Notably, a firefly which has a brighter light is the solution and this firefly absorbs others.
Imperialist Competitive Algorithm (ICA). One of the novel evolutionary algorithms developed according to socio-political relations is the colonial competition, introduced originally by Atashpaz-Gargari and Lucas 48 . Nowadays, ICA is known as a rich meta-heuristic algorithm and is used for optimization 49 . As with other evolutionary algorithms, ICA is made up of a series of components each indicating a solution to problems trying to find the best solution. A general flowchart of flood susceptibility mapping can be seen in Fig. 8.
Evaluation and comparison methods. Statistical error-index. Evidently, when a new hybrid model is introduced, the performance of models should be evaluated and compared for both training and testing datasets. Basically, the results of modelling using a training dataset represent the degree of fit of models, while the results of a testing/validation dataset indicate the predictive power of modelling 30 .
In this study, root mean square error (RMSE) and mean square error (MSE) were applied as statistical error evaluation criteria along with ROC curve for evaluating and comparing the performance of the two flood hybrid models. RMSE and MSE can be calculated as 50 : Receiver operating characteristic curve. Receiver Operating Characteristic (ROC) is another standard technique to determine the general performance of models, which has been used in geoscience 51 . It is constructed by plotting the values of two statistical indexes, "sensitivity" and "100-specificity", on the y-axis and x-axis, respectively 52 . Based on the definition, the number of positive cases (flood) which are correctly classified as positive (flood) class refers to sensitivity, while 100-specificity is considered as the number of negative (non-flood) cases correctly classified as negative (non-flood) class. The area under the ROC curve (AUROC) quantitatively evaluates the models' performance and their capability of predicting an event's occurrence or non-occurrence 53 . The AUROC represents how well a flood model generally performs. It ranges between 0.5 (inaccuracy) and 1 (perfect model/high accuracy), where the higher the AUROC, the better the model performance is 54 . The AUROC can be formulated as: AUROC TP TN P N (9) = ∑ + ∑ + where, TP refers to the percentage of positive instances which are classified correctly. TN shows the percentage of negative instances which are classified correctly. Then, P denotes the total number of events (flood), and N is the total number of no-events (non-flood).
Non-parametric statistical tests. The performance of the two new hybrid models for flood modelling was assessed by parametric and non-parametric statistical tests 24 . If the data follows a normal distribution with an equal variance, parametric methods are applied 55 . Although non-parametric tests are free from any statistical assumptions, they are safer and their results are stronger than those of parametric tests, since the former do not assume normal distribution or homogeneity of variance 56 . Since the conditioning factors have been classified as categorical classes, non-parametric statistical tests should be used. For this reason, Freidman 57 and Wilcoxon 58 sign rank tests were used to assess the differences between treatments of hybrid models. The objective of these tests was to reject or accept the null hypothesis stating that the performances of flood hybrid models were not different at the significance level of α = 0.05 (or 5%). The Friedman test was first used for the data to compare the significant differences between two or more models. Bui et al. 54 reported that in the Friedman test, if p-value is less than 0.05, then the result is not applicable and we are not able to judge the significant differences between two or more models 54 . Therefore, the Wilcoxon signed-rank test was applied to specify the statistical significance of systematic pairwise differences between the two flood hybrid models using two criteria including p-value and the z-value. Eventually, the null hypothesis was rejected according to the p-value < 0.05 and z-value exceeding the critical values of z (−1.96 and +1.96). Accordingly, the performance of the susceptibility models differed significantly 54 .