Credal decision tree based novel ensemble models for spatial assessment of gully erosion and sustainable management

We introduce novel hybrid ensemble models in gully erosion susceptibility mapping (GESM) through a case study in the Bastam sedimentary plain of Northern Iran. Four new ensemble models including credal decision tree-bagging (CDT-BA), credal decision tree-dagging (CDT-DA), credal decision tree-rotation forest (CDT-RF), and credal decision tree-alternative decision tree (CDT-ADTree) are employed for mapping the gully erosion susceptibility (GES) with the help of 14 predictor factors and 293 gully locations. The relative significance of GECFs in modelling GES is assessed by random forest algorithm. Two cut-off-independent (area under success rate curve and area under predictor rate curve) and six cut-off-dependent metrics (accuracy, sensitivity, specificity, F-score, odd ratio and Cohen Kappa) were utilized based on both calibration as well as testing dataset. Drainage density, distance to road, rainfall and NDVI were found to be the most influencing predictor variables for GESM. The CDT-RF (AUSRC = 0.942, AUPRC = 0.945, accuracy = 0.869, specificity = 0.875, sensitivity = 0.864, RMSE = 0.488, F-score = 0.869 and Cohen’s Kappa = 0.305) was found to be the most robust model which showcased outstanding predictive accuracy in mapping GES. Our study shows that the GESM can be utilized for conserving soil resources and for controlling future gully erosion.

effects including land degradation, soil fertility loss, and accumulation of sediments, landslide, flooding and decline of water quality [5][6][7] . GE not only causes environmental deterioration but also immensely impacts the socio-economic aspects 8 . Previous studies have shown the main role of GE in transporting sediments from upper region of the catchments 9 . Thus, a precise evaluation of gully erosion susceptibility (GSE) is an essential requirement for planners and decision-makers in controlling the subsequent problems of GE and for a sustainable management of soil resources 3 .
Various factors including topographic, geologic, hydrologic, environmental, climatic and anthropogenic activities, instigate the process of GE [10][11][12] . Rahmati et al. 10 reported that drainage density, distance to stream and land use also play a vital role in triggering GE. Zhao et al. 12 noted that GE is mostly initiated by natural processes rather than anthropogenic activities and that the density of gullies is reliant on the intensity of vegetation cover and topographic features.
Most of the physically based models reported in earlier studies of gully erosion were not aimed at predicting the gully hotspots, but focused on quantifying the erosion rates 11 . For predicting the evolution of gullies, dynamic and static models have been utilized previously based on the development phase of the gully 2 . However, both these models require different erosion factors which are hard to quantify for a large area. Thus, for the gully erosion susceptibility mapping (GESM), researchers utilized various models such as knowledge based, statistical and machine learning algorithms (MLAs) coupled with geographical information system (GIS) and remote sensing (RS) 13 . The knowledge-based models include multi-criteria decision-making models (MCDM) that involve the decision made by experts to prepare the GESM. Even though there are more than nearly 20 MCDM models available, the derived factor weights based on these models are still subjective 14 . Several bivariate and multivariate statistical models such as frequency ratio 15 , logistic regression 16 , weights of evidence 17 , and certainty factor 18 also used for generating GESM. The benefit of employing statistical models is that various types of predictor variable can be easily accommodated in the evaluation 13 . The disadvantages of using simple bivariate models are that these could be ad-hoc processes owing to the poor probability distribution that the bivariate models depend on 15 . In the case of parametric multivariate models, the resultant spatial maps become smoother than in MLAs, and provide more elaborate maps of GES 19 .
Various MLAs including random forest 20 , logistic model tree (LMT) 13 , support vector machine(SVM) 21 , naive Bayes tree (NBT) 13 , multivariate adaptive regression spline (MARS) 22 , generalized linear model (GLM) 23 , artificial neural network (ANN) 24 , boosted regression tree (BRT) 22 , mixture discriminant analysis (MDA) 18 , classification and regression trees (CART) 25 , and functional data analysis 14 are commonly utilized for the creation of GESM. The MLAs exhibit a superior predictive accuracy than statistical models in GESM owing to their advantage in handling huge datasets and potential ability in assessing the intricate relationship between dependent and predictor variables 26 . Performance of individual models can be enhanced using hybrid ensemble methods 27,28 . Hybrid ensemble methods outperform the forecast preciseness of individual MLA 29 . Arabameri et al. 30 showed that meta-classifiers increase the classification accuracy of the base classifiers in gully erosion susceptibility modelling. It is essential to test a novel base classifier using different meta-classifiers 11 . Chowdhuri et al. 31 reported high predictive accuracy of hybrid ensemble BRT-bagging (BA) algorithm in comparison with the individual BRT and bagging algorithms. Similar results were displayed by Roy and Saha 32 in their study in which the authors reported Multilayer perceptron neural network-dagging (DA) ensemble.
In this study, we propose novel hybrid ensemble models for mapping GES based on a case study on the Bastam sedimentary plain of Northern Iran. Apart from individual credal decision trees (CDT) model, we integrated four meta-classifiers including bagging, dagging, rotation forest (RF) and alternating decision tree (ADTree) with a base-classifier, i.e., the CDT for GESM. To our knowledge, no previous study has employed the CDT both as a base classifier in a hybrid ensemble model and as an individual model for predicting the GES. The four hybrid ensemble models, namely CDT-BA, CDT-DA, CDT-RF and CDT-ADTree along with CDT were compared, and the best model is identified. The significance of the gully erosion conditioning factors (GECFs) for mapping GES is evaluated using the random forest model. The predictor variables used in this work for forecasting GES include clay content, bulk density, elevation, distance to road, distance to stream, drainage density, lithology, land use/ land cover (LU/LC), normalized difference vegetation index (NDVI), rainfall, terrain rugged index (TRI), slit content, slope degree, and topography wetness index (TWI).

Results
Outcome of multi-collinearity test. The values of VIF and tol used for testing the multi-collinearity among GECFs are given in Table 2. The NDVI shows minimum VIF value of 1.099 and TRI has maximum VIF value of 4.184 and, since the tol is the reciprocal of VIF, the NDVI and TRI acquired the maximum (0.910) and minimum tol value (0.239). The VIF and tol values of GECFs from Table 1 indicate that there is no linear dependency among the GECFs and confirms that all the selected fourteen GECFs can be utilized for the generation of GESMs (Table 2).
Relative significance of GECFs. This study employed the random forest algorithm for assessing the significance of GECFs in mapping GES. The confusion matrix created by random forest with gully presence (1) and gully absence (0) information is provided in Table 3. The algorithm generated an OOB error of 6.54%, which infers that the precision of the predicted values is equivalent to 93.46%. From Table 2, it can be observed that among 201 non-gully locations, 190 were identified as non-gully locations and 11 were determined to be gully locations. On the other hand, among 212 gully locations, 196 were predicted as gully locations while 16 were identified as non-gully locations. The outcome of the relative significance of GECFs assessed using the mean decrease in accuracy and mean decrease Gini of the random forest algorithm is provided in   www.nature.com/scientificreports/  www.nature.com/scientificreports/ tively (Fig. 1b). The percentage of gully pixels present in very high to very low GES classes are 76.79%, 14.33%, 4.78%, 2.73% and 1.37%, respectively ( Table 4). The very high and high GES categories comprise 225 and 42 gully pixels whereas the moderate, low and very low GES categories comprised 14, 8, and 4 gully pixels, respectively. The total quantity of pixels in each GES zones of CDT-DA model is shown in Table 4.

CDT-alternative decision tree (ADTree).
In the case of GESM generated by CDT-ADTree, the percentage of pixels covering very high and high GES categories are 26.75% and 21.20% whereas those of other GES categories including moderate, low and very low classes were 21.93%, 14.74%, and 15.37%, respectively (Fig. 1c). The percentage of gully pixels in very low, low, moderate, high and very high GES regions is 76.11%, 16.72%, 3.41%, 2.37%, and 1.02% whereas the number of gully pixels present in the same order of GES regions was 223, 49, 10, 8 and 3, respectively ( Table 4). The information on the number of pixels in each susceptibility class of CDT-ADTree model is given in Table 4.
CDT-bagging (BA). The GESM predicted by CDT-BA (Fig. 1d) Table 4). The number of gully pixels existed in the same order of GES classes are 223, 45, 14, 9, and 2, respectively. The number of pixels present in each category of GES generated by CDT-BA is displayed in Table 4.

Outcome of validation measures and model comparison.
In this study, we assessed the predictive performance of CDT-DA, CDT, CDT-ADTree, CDT-RF, and CDT-BA models with the help of different validation metrics such as accuracy, sensitivity, specificity, F-score, AUROC, Cohen's Kappa, and RMSE using both calibration (Fig. 2) and testing dataset (Fig. 7).     5). The outcome of validation techniques including accuracy, sensitivity, specificity, F-score, AUROC, Cohen's Kappa, odd ratio and RMSE displayed the excellent predictive ability of models in mapping GES. Based on the training and testing performance of the models, it is found that CDT-RF was the best model followed by CDT-ADTree, CDT-BA, CDT-DA and CDT models.
The values of SCAI ( Fig. 6) generated from GES of CDT-DA, CDT, CDT-ADTree, CDT-RF, and CDT-BA models increased from very high to very low susceptibility. This outcome of SCAI reveals the enhanced predictive performance of the GES models employed in this study.
Even though the newly developed approaches have advanced from traditional statistical techniques to the MLAs 57-60 , recent studies attempt to formulate novel/hybrid models that could achieve better predictive performance than previously employed approaches. Thus, several studies have successfully enhanced the forecast ability of the MLAs by employing diverse novel ensemble methods. In this study, study we presented a novel hybrid ensemble for GESM in Bastam sedimentary plain of Northern Iran. We employed five MLAs for modelling GES among which four were novel hybrid ensemble models constructed by combining BA, DA, ADTree  Fourteen GECFs including clay content, bulk density, elevation, distance to road, distance to stream, drainage density, lithology, LU/LC, normalized difference vegetation index (NDVI), rainfall, terrain rugged index (TRI), slit content, slope degree and topography wetness index (TWI) were chosen for the modelling of GES. The dependency test among the GECFs was carried out which exposed that there was no correlation, thus making it applicable for processing the outcome. The importance of GECFs in modelling GES was assessed using the random forest algorithm, which revealed that drainage density, distance to road, rainfall and NDVI were the most influential factors of GES whereas slope degree, elevation, silt content, bulk density, TWI and TRI exhibited moderate control over the GES. Similarly, Pourghasemi et al. 8 showed that drainage density, distance to stream, soil content and altitude largely influence the initiation of GE. Likewise, Arabameri et al. 61 determined distance to stream and distance to road to influence the GES most. Capra et al. (2009) 62 reported that formation of GE is higher when the vegetation cover decreases, and soil wetness increases due to high rainfall. Kariminejad et al. 63 determined that silt content and slope angle influence GES. Arabameri et al. 11 showed that topographic factors such as TWI, TRI and elevation has moderate control over the instigation of GE.
The process-response of a river catchment area is highly influenced by several environmental factors, among which drainage is the most vital one, which has a strong positive correlation with gully head cut retreat 11 . The pattern of drainage is also critical in the initiation and further development of gullies. The drainage pattern in a river catchment area is highly affected by nature and structure of the geological formation, soil characteristics, density of vegetation coverage, infiltration rate, and slope degree 22 . Previous studies on gully erosion have shown that initiation and development of gullies are connected to the stream networks and gullying by streams are responsible where favorable conditions are available for their development 20 . The slope instability of an area is causes by initiation of river and the associated toe erosion and fluctuations of groundwater level. Moreover, the degree of surface incision is highly dependent on the pattern of drainage network of an area. The development and pattern of drainage of an area is directly related to the power of degree of surface incision 22 . The road and undercutting construction work gradually increases the strain and stress of the slope which significantly influences slope disturbances and failure 20 . The pattern and rate of surface runoff is mainly determined through road networks, and the concentrated surface runoff flow from one catchment area to another leads to steady increase in watershed size which is ultimately responsible for the process of gullying 20 . The major finding of this research is that CDT-RF (AUSRC = 0.942, AUPRC = 0.945, accuracy = 0.869, specificity = 0.875, sensitivity = 0.864, RMSE = 0.488, F-score = 0.869 and Cohen's Kappa = 0.305) was determined to be the finest model having superior accuracy than the rest of the hybrid models. The CDT-RF is followed by CDT-ADTree, CDT-BA, CDT-DA and CDT. This clearly shows that RF meta-classifier enhances the predictive performance of individual CDT model. It is also true in the case of other meta-classifiers, namely ADTree, BA and DA, which improves the forecast accuracy of the base classifier. The higher performance of RF can be due to utilization of the feature abstraction method to augment the learning groups for calibrating the base classifiers.
The low predictive accuracy of CDT can be owing to the subset in that the sub-dataset formed is dissimilar from a particular issue field which generates fairly diverse trees 64 . It should also be noted that RF is a powerful MLA that is derived from random forest algorithm. He et al. 65 also showed that RF increases the predictive ability CDT than any other meta-classifiers such as BA and multiBoostAB (ABM). Nguyen et al. 66 also determined that different meta-classifiers ABM and radial basis function network (RBFN) increases the forecast ability of CDT. Similarly, both Pham et al. 67 and Nguyen et al. 68 demonstrated that meta-classifier helps base classifier CDT in improving the predictive performance in modelling landslide and flash flood vulnerability. From the present study, it is evident that combining meta-classifier such as RF, ADTree, BA and DA with the base-classifier such as CDT would increase its performance in accurately predicting GES. The general advantage of meta-classifiers

Concluding remarks
Identifying precise and robust algorithms for decreasing inaccuracies in GESM and demarcating GES zones is crucial. This research employed four novel hybrid ensemble models (CDT-RF, CDT-ADTree, CDT-BA and CDT-DA) for predicting GES with the aid of fourteen GECFs and 293 gully locations. Various validation measures including SRC, PRC, specificity, sensitivity, Cohen's Kappa, F-score, accuracy, RMSE and odd ratio were employed for assessing the model outcome using both calibration as well as testing dataset. The outcome of cross-checking revealed that all the employed models had excellent predictive accuracy, among which CDT-RF is identified to be the most robust model. In addition, the outcome of SCAI also suggests the better performance of the models in predicting GES. Our study reveals that meta-classifiers increase the predictive efficacy of base classifiers in modelling GES. The models used in this research can be also applied in other study areas. The GESM generated by CDT-RF model for Bastam sedimentary plain of Northern Iran can therefore be utilized in controlling the occurrence of future gullies and sustainable management of soil resources.

Methods
Description of the study area. The Bastam sedimentary plain is one of the most GE prone watersheds located in the Semnan Province of Northern Iran (Fig. 7).  71,72 . Among the several soil types found in the present study area, aridisols cover the maximum portion, constituting the dominant soil type. The evaluation of gullies has indicated that this area is highly susceptible to gully erosion as nearly 10.34% of the study area is affected by ephemeral gully erosion. The low slope area is found to be highly susceptible for gully erosion, with the south-central part more prone to gully erosion as this region is dominated by low slope zone. On the other side, steep slope zone with rocky outcrops in the northern portion of the study area is conquered by a small number of gullies. Morphometric analysis of gullies indicates that the length of gullies ranges from few meters to several hundred meters. The www.nature.com/scientificreports/ width also varies from few centimeters to several meters and depths can be as much as several meters. The length of the gullies ranges from 364 m (maximum) to 0.95 m (minimum) and depths vary from 6.3 to 0.63 m. Our field survey also reveals that northern part of the study area is dominated by V-shaped cross-section of gullies as this area is characterized by rocky outcrops and steep slope. However, the central and southern parts are dominated by U-shaped gullies, as this area is low slope zone with coverage of more erodible soils and more concentrated runoff and associated erosional activities.

Methodology
The mapping of GES with the help of novel ensemble models, including CDT-BA, CDT-DA, CDT-RF and CDT-NBT was executed based on the four following phases (Fig. 8).
(1) Initially, the spatial distribution of existing gullies (dependent variable) and GECFs (predictor variables) were prepared for GESM. (2) This was followed by the assessment of multi-collinearity among GECFs. This evaluation is implemented to eliminate noisy GECFs and to confirm that there is no correlation among the predictor variables that could affect the prediction of GE.
(3) With the aid of calibration dataset, GESM is generated based on the five models (CDT, CDT-BA, CDT-DA, CDT-RF and CDT-ADTree). The generation of GESMs is followed by the assessment of each independent factor's influence in predicting the GES using random forest model. 4) Using testing dataset, various validation measures www.nature.com/scientificreports/ such as the area under receiver operating characteristic curve (AUROC), accuracy, sensitivity, specificity, root mean square error (RMSE), F-score, odd ratio, Cohen Kappa and seed cell area index (SCAI) were applied for cross-checking the predictive ability of the GESM.

Preparation of gully inventory map. Mapping the extent in the location of gullies in the study area is
indispensable for predicting the GES 13 . This is because the susceptibility to most of the natural hazards, including GE is spatially modelled based on the presumption that gullies that occur in future may follow the identical conditions that triggered the existing ones 61 . Thus, understanding the association between the conditioning factors and previously existing gullies are essential 61 . We carried out detailed field investigations using the global positioning system for the preparation of gully inventory map (Fig. 9). A total of 293 gullies were identified in the Bastam sedimentary plain. These were arbitrarily split into 70% (206 gullies) and 30% (87 gullies) for model calibration and testing the predictive ability of the model 13 . In addition, an identical number of non-gully locations were also identified for the processes of model training and validation. GECFs employed in this study were created using ArcGIS 10.5. The primary and secondary topographic factors including elevation, slope degree, TWI and TRI were acquired from ALOS DEM having a spatial resolution of 12.5 m. The stream network and roads were derived from topographical map with a scale of 1:50,000. The 30 years of rainfall data from 9 stations were utilized for the interpolation of rainfall map using Inverse Distance Weighting 63 . Inverse spatial mapping of soil was performed for the areas occupied by gully headcut (GH) morphology. Around 395 soil samples were obtained from the inlets and outlets of GH by digging profile pits ranging between 0 and 2 m in size. While conducting the field investigation, 2 kg of each sample was collected and transported to the lab, where these were air-dried, followed by soil particle size analyses based on the hydrometer technique 71,72 , without eliminating the carbonates, organic matter, and secondary oxides. Secondly, the core approach 73 was utilized for estimating the bulk density. Following this, the techniques proposed by Walkley and Black (1934) 74 and Van Bavel 75 were employed in measuring the organic matter content and stability of the soil. Ultimately, the prepared soil layers were added individually to ArcGIS 10.5 and were processed to the scale of 12.5 m × 12.5 m for additional examination. The foremost soil properties, i.e., bulk density, percentages of silt, and clay content were estimated employing approved petrological techniques and mapped in the GIS. The lithological units were extracted from maps generated by 1:100,000 ( Table 5). The LU/LC of the study area is acquired from Landsat-8 data. Elevation is considered to be a significant factor that influences the occurrence of gullies 13 . It controls the processes of GE owing to its association with various factors such as precipitation, soil texture, run-off, vegetation type and cover 13 . The elevation of Bastam sedimentary plain ranges between 1359 and 2249 m. As slope angle influences runoff and drainage density, it is one of the many important factors that govern gully formation 24 . The slope angle varies from 0 to 57.96%. The TWI is generally applied for assessing the impact of topography on the infusion of water into the saturated zones of runoff generation 24 . TWI is also an effective factor that is essential for GESM owing to its association with soil erosion 11 , and is computed as follows 24 :

Preparation of gully erosion conditioning factors. GE is an intricate
where Ds and μ denote the upslope contributing region and slope incline, respectively. It also aids in assessing the water content present in the soil owing to upstream catchment area and slope 24 . TWI of the Bastam (1) TWI = ln D s tan µ www.nature.com/scientificreports/ sedimentary plain ranges from 1.728 to 21.04. TRI reflects the terrain morphology and has a considerable effect on surface runoff 24 . TRI values range between 0 and 35.45. Since gully initiation is closely associated with stream networks 61 , the distance to stream plays a major role in gully formation. The maximum and minimum distance to stream was 1050 and 0 m. Drainage density is another important factor to be considered while modelling GES as most of the previous studies have revealed that drainage density is the most influential factor in gully formation 8 . The drainage density of the Bastam sedimentary plain ranges from 0.37 and 3.63 km/km 2 . Building of roads increases the rigidity of gradients, which also leads to gully formation 11 . The minimum and maximum distance to roads are 0 and 9021.57 m. Couper 76 showed that increase in the content of silt and content of clay would lead to vertical incising of soil, which eventually results in the formation of gullies. The content of clay varies between 32 and 14%, whereas content of silt ranges from 12 to 43%. The increase in the bulk density of soil decreases the potential of plants to reduce the soil erosion. The maximum and minimum bulk density ranges between 1622 and 1491 g cm −3 . The rainfall is also a significant factor that controls surface flow and erosivity 11 . The high and low rainfall ranges between 381.12 and 159.20 mm. Vegetation cover has an inverse association with soil erosion 8 . In this study, the red band (b4) and infra-red band (b5) from Landsat 8 data were used for the computation of NDVI as follows 8 : The value of NDVI ranges from -1 to 1, where values < 0.2 indicates non-vegetation and > 0.2 denotes vegetation presence. The NDVI of the study area ranges between 0.15 and − 0.55. The wearing down of bare lithological structures also impacts GE 17 . Table 1 and Fig. 10m provide information of the lithological units existing in the Bastam sedimentary plain. LU/LC is also an important factor considered for GESM 5 . Six types of LU/LC are witnessed in the Bastam sedimentary plain.

Evaluation of multi-collinearity.
It is vital to assess the dependency among the GSCFs before employing these for GESM as the presence of any correlation would impact the consistency and understanding of model outcome 11 . There are numerous techniques including Pearson correlation, variance inflation factors (VIF), ridge regression, the least absolute shrinkage and selection operator (LASSO), conditional index, elastic net, tolerance (tol), and jack-knife tests using which multi-collinearity is evaluated. However, commonly, all multi-collinearity evaluation technique would estimate the dependence between the predictor factors 63 . In this study, we adopted VIF and tol approach for assessing the linear dependency among the GECFs. The expressions of VIF and tol are as follows: where r 2 i is attained by reversing all remaining variables in a multivariate regression 11 . Since there has been no approved values of VIF and tol for denoting the collinearity among predictor variables, commonly established values: tol ≤ 0.1 and VIF ≥ 5 indicates that there is dependency among the independent variables 11 .
Credal decision tree (CDT). Abellan and Moral (2003) 77 introduced CDT for n classification issues through the application of credal sets 78 . It utilizes a unique partitioning condition which was created with the help of uncertainty computation along with inexact possibilities 78 . To circumvent the intricate decision tree (DT) generation while constructing CDT, an innovative idea was developed, which administered to suspend the categorization process from growing the cumulative uncertainty owing to the consequence of DT branching 78 . A modernized approach was developed with the help of the Dempster and Shafe theory, which is utilized for the quantification of overall uncertainty from credal sets 79 . The aforementioned approach is expressed as follows: where, n denotes a credal set; CU signifies the complete uncertainty value; and NC and RC are functions that refers to the common non-specificity and common randomness, respectively. The creators of CDT obtained series of outcomes and successes compared to CU measurement, and furthermore, the computation method of CU and its attributes are explained orderly in related sources 79 . The inexact possibility method 78 was selected to investigate the possibility of interims of discrete variables 79 . Assuming 'W' as a variable whose values are denoted with the help of wj, and the identical possibility order p(wj) meets the following expression 79 : where, m wj refers to the total number of incidence (W = w j ); M represents the sample size and h denotes the hyperparameter (value: 1 or 2) 79 .
Bagging (BA). The BA, also popularly known as bootstrap aggregating, enhances the predictive capabilities of MLAs 80 . Recent studies show that BA has been successfully employed for precise forecasting of susceptibility to various natural hazards 80 . Even a minute variation in the calibration data could create a great difference in the model outcome 80 . BA involves the following stages: (a) arbitrary and independently choosing data from calibration dataset; (b) formation of several classifier models (CMs) with the help of subgroup datasets and (c) www.nature.com/scientificreports/ model generation through the accumulation of every single CMs 81 . Integrating the rule of base classifiers has been confirmed to have a distinguished impact on BA predicting capability 81 . Assume C (ai, bi) as a subset of calibration data which is arbitrarily chosen repetitively from a Calibration dataset (ai, bi), where ai represents gully presence and bi refers to gully absence. Multiple CMs are generated based on all subset where Vi(a) represents the created CM. Then finally, every individual classifier (Fi) is combined to form the model outcome (F′). The final prediction of F′ is performed based on the following expression 81 . Dagging (DA). The DA is widely used as an ensemble method that is frequently employed for the creation of meta-classifiers 82 . There are numerous variations between DA and other techniques such as boosting and BA, where boosting flexibly alters the calibration dataset according to distribution while the BA adjust the calibration dataset speculatively and raises bases according to the efficiency of all classifiers as a weight for choosing 82 . In DA, the prediction of a model is carried out based on the top vote 82 . The algorithm utilizes the maximum vote concept for integrating several classifiers in order to enhance the forecast preciseness of the base classifier. DA can be employed in case of base classifiers that are a worst case in timely performance 82 . Rotation forest (RF). The RF is an established integration method which aids weak classifiers in performing better 1,31 . It was introduced by Rodríguez et al. 83 . It is employed in advancing the variation and precision of base classifiers according to the feature transformation 83 . Random forest algorithm serves as the base for the development of RF, still, RF has the improved capability in handling both multi-dimensional and small dataset 83 . The classification possibility of RF algorithm is assessed with the help of the following expressions 83 : where, a refers to a classification sample; D represents common groups; l indicates the overall quantity of base classifiers and S b j specifies the rotation matrix.
Alternative decision tree (ADTree). ADTree was proposed by Freund and Mason (1999) and is by far the highly effective decision tree model which is rooted upon the principle of boosting and is widely applied for modelling purposes 19 . ADT was hardly employed for GESM in previous studies. It provides good accuracy and consistency for categorization and forecast issues 19 . ADTree comprises of two nodes, namely forecast nodes and judgement nodes 19 . The components of a calibration dataset are partitioned into forecast nodes through separation tests, and the equivalent extrapolative values of forecast nodes are acquired. Moreover, through the repetitive estimation, producing and clipping, the ADTree meta-classifier is created that has the affirmative capability to handle intricate and large datasets. The following expression defines the partition testing of forecast node 19 : where, V + (b) and V − (b) refers to the complete weight of the calibration data which fulfils the circumstance of c; V′ denotes the overall weight of the dataset which does not fit for the forecast node, and c represents partition testing. The optimal partition testing is attained by determining the least value of T. The appropriate repetitive split test is assessed based on a top to bottom approach in ADTree, and the pruning method applied in this approach is given as follows 19 : where, Tpure refers to the lowest threshold of T that is employed for pruning the estimation of few forecast nodes.
Relative importance assessment of GECFs using random forest. Random forest is a popular nonparametric MLA which comprises a horde of classification and regression trees 61 . Several studies have employed random forest for the evaluation of the significance of predictive variables 84 . RF competently handles vagueness and unknown data and has the exceptional operational ability even with massive and extremely complex datasets 84 . RF comprises two major internal stages. Firstly, it builds several bootstrap samples that are considered to be calibration sets and then constructs classification rules for every tree. In this process, a few datasets that were not employed are leftover; these are known as out-of-bag trials (OOB). OOBs are used to evaluate the inaccuracies in the categorization and to approximate the precision of the prediction 61 .
Validation measures. Evaluation of the prediction exactness of a model is essential for concluding the technical importance of an investigation 85 . In this study, both training and testing data of GIM is utilized for the cross-checking of the model outcome 1,39 . There are two types of validation metrics, i.e. cut-off-independent and dependent 86 . The computation of validation metrics stated above is executed with the help of contingency table www.nature.com/scientificreports/ which comprises of four components namely TP (true positive), TN (true negative), FN (false negative), and FP (false positive) 87 . Apart from these measures, SCAI has also been employed in this study to assess the prediction accurateness of the calibrated model.

Cut-off-independent metrics.
The AUROC curve is an extensively utilized metric in various branches of science for accuracy and efficacy evaluation of predictive model outcomes 88,89 . It plots the sensitivity on the Y-axis and 1-specificity on the X-axis 90 . The value of AUROC varies between 0 and 1, where the value equivalent to unity signifies perfect predictive capability 87 . In this research, assessment of success rate curve (SRC) and prediction rate curve (PRC) were carried out using the calibration and testing data of GIM, where the former is employed to estimate the learning ability of the algorithm whereas the latter is applied to determine the forecast capability 90 . The only difference between PRC and SRC is that testing data is replaced with calibration data in PRC 89 .
Cut-off-dependent metrics. The measures such as accuracy, sensitivity, specificity, F-score, odd ratio and Cohen Kappa belongs to the cut-off dependent approach 89 . The sensitivity refers to the possibility of predicting the gullies precisely as witnessed in actuality, whereas the specificity targets to approximate the likelihood of predicting non-gullies as perceived in actuality 20 . The accuracy represents the efficacy of the model as it reveals the complete success of the forecast model. The F-score is defined as the harmonic average of precision and recall. The values of F-score varies between 0 and 1 where value near 1 represents high precision and recall. Odd ratio estimates the chances that an outcome will appear provided a selective display, related to the chances of the outcome happening in the nonexistence of that display 30 . Cohen's Kappa tests the robustness of the model and aids the modeller to completely comprehend the actual model outcome 32 . These cut-off-dependent approaches were utilized for assessing both the training as well as the testing performance of the models used in this study.
The following expressions are employed for the computation of cut-off-dependent metrics 20 : Seed cell area index (SCAI). Süzen and Doyuran 91 introduced the SCAI method which is known as the proportion between the total amount of pixels of the particular GES category and the total amount of pixels of prevailing gullies in that particular GES category 86 . Numerous studies have employed SCAI for assessing the performance of the forecast models 20 . The very high value of SCAI for very high susceptibility class and low value of SCAI for low susceptibility class indicates a perfect model and any contrary outcome of this values denotes the poor predictive performance of the model.

Statistical measures.
The RMSE is employed in this study for the validating the model's calibration as well as testing performance. The RMSE of 0.7 and below indicates better predictive ability while a value greater than 0.7 signifies the poor predictive performance of the model 20,32 . The RMSE is assessed using the following expression: where, Vp refers to the value present in calibration or testing data; Va represents the forecast values produced for the GESMs and z indicates the total number of calibration or testing data.