Groundwater recharge potential zonation using an ensemble of machine learning and bivariate statistical models

Many regions in Iran are currently experience water crisis, largely driven by frequent droughts and expanding agricultural land combined with over abstraction of groundwater. Therefore, it is extremely important to identify potential groundwater recharge (GWR) zones to help in prevent water scarcity. The key objective of this research is to applying different scenarios for GWR potential mapping by means of a classifier ensemble approach, namely a combination of Maximum Entropy (ME) and Frequency Ratio (FR) models in a semi-arid mountainous, Marboreh Watershed of Iran. To consider the ensemble effect of these models, 15 input layers were generated and used in two models and then the models were combined in seven scenarios. According to marginal response curves (MRCs) and the Jackknife technique, quaternary formations (Qft1 and Qft2) of lithology, sandy-clay-loam (Sa. Cl. L) class of soil, 0–4% class of slope, and agriculture & rangeland classes of land use, offered the highest percolation potential. Results of the FR model showed that the highest weight belonged to Qft1 rocks and Sa. Cl. L textures. Seven scenarios were used for GWR potential maps by different ensembles based on basic mathematical operations. Correctly Classified Instances (CCI), and the AUC indices were applied to validate model predictions. The validation indices showed that scenarios 5 had the best performance. The combination of models by different ensemble scenarios enhances the efficiency of these models. This study serves as a basis for future investigations and provides useful information for prediction of sites with groundwater recharge potential through combination of state-of-the-art statistical and machine learning models. The proposed ensemble model reduced the machine learning and statistical models’ limitations gaps and promoted the accuracy of the model where combining, especially for data-scarce areas. The results of present study can be used for the GWR potential mapping, land use planning, and groundwater development plans.

www.nature.com/scientificreports/ the groundwater dependency has been largely increased over the last decades. Specifically, the machine learning (ME) and a bivariate statistical method (FR) were used for assess ensemble of spatial prediction of GWR potential scenarios. Whereas, Jackknife test as a simple but effective method utilized for screening important casual factors based on the random sampling in training dataset. The performance of spatial prediction models was compared by area under curve (AUC) and Correctly Classified Instances (CCI).

Methodology
Study area. Marboreh Watershed is located between 33 o 12′ to 33 o 51′ N and 49 o 03′ to 49 o 57′ E in the west of Iran, Lorestan Province (Fig. 1). The total area of Marboreh Watershed is about 2560 km 2 with a cold semi-arid climate. The annual rainfall decreases from south (900 mm) towards eastern (280 mm) parts of the watershed with the average annual rainfall and the average annual temperature of 412 mm and 13.8 ℃, respectively (LPMO (Lorestan Province Meteorological Organization.)). Rainfall is not uniformly distributed through the year, with the majority occurring in March and April. In the summer time, the rainfall mostly occurs once or twice nearly every month. Topographically, the region is a part of Zagros Chain with altitude range from 1442 to 4056 m, with average altitude of 2749 m. The main river, Marboreh, originates in the Aligudarz Mountains at the east of watershed, making its 110.22 km journey westwards along the entire length study area. Smaller tributaries contributing to the river discharge such as Azna, Aligudarz, and Kamandan. Finally, Marboreh River joins to Tireh River and form the Sezar, which is one of the main branches of Dez basin that flows in Karun River, the longest river in Iran which finally drains into the Persian Gulf. Geologically, the watershed lies within Zagros Fold and 29 types of geological formation cover this watershed, and the dominate land use is agriculture (49.5%), followed by rangeland (42%). The flowchart shown in Fig. 2, illustrate the method employed in this study and briefly given in the following sections: Permeability samples inventory map. The key link between surface water and groundwater is groundwater recharge 4 and quantifying the Groundwater recharge is difficult at large scale. On the other hand, the gross groundwater recharge rate may be depending on the permeability and soil's absorption, which make it easier to measure 52 . In order to evaluate the performance of the models in GWR potential mapping, double-ring infiltrometer method and soil sampling (Fig. 3) were used for understanding the spatial variability of GWR 37,53 . The double-ring infiltrometer consist of an inner and outer ring that are driven 5-cm into the subsurface ground. The inner and the outer rings were filled with water. The head is kept constant during the experiments. The procedure continue until the infiltration rate is considered steady and set up by the Sangab Zagros Consulting Engineering Company 22 standard. Due to the extent of the studied area and the cost of the sampling process, information from previous studies 22 in the studied area were also used. The percolation points were randomly split into two datasets: the first group for model training (70%) and to validate the training dataset, a separate independent set of data (30%) to evaluate modelled and observed maps. GWR  www.nature.com/scientificreports/ availability, 15 effective factors on GWR potential, namely elevation, slope percent, slope aspect, profile curvature, plan curvature, drainage density, distance from rivers, topographic wetness index (TWI), stream power index (SPI), Rainfall, lithology, fault distance, land use (LU), Normalized Difference Vegetation Index (NDVI), and soil texture were used. Out of these selected thematic maps, elevation, slope degree, slope aspect, drainage density, TWI and SPI were generated from ASTER DEM data with a pixel size of 30 × 30-m, while the remaining maps were extracted from Landsat-8 images and through conventional data, such as soil and geology data.

GWR influencing factors.
Topographic influencing factors. Elevation play in key role on vegetation and climate that have connection with recharge distribution area. Here, the elevation ranges from 1442 to 4056 m, and the hypsometric map extracted from DEM and classified into five classes (Fig. 4a). In groundwater recharge investigations, slope angel is a major determinant factor and highlights the importance of topography and the size of runoff-saturated zones 56,57 . The regions of steep slope facilities high runoff, whereas at the lower slope gradients, runoff generation reduced and eventually increase infiltration rate and recharge the saturated zone 58 . The slope map extracted from DEM and was classified into 5 classes (Fig. 4b). Slope aspect influences GWR as indicator for solar radiation which strongly influences infiltration rate 54 . Slope aspect is the orientation of slope, measured clockwise in degrees from 0 to 360. This factor was derived from the DEM in ArcGIS10.6 (https ://www.arcgi s.com) and classified into five main categories (Fig. 4c). Flow distribution on the surface depends on topography, which is represented by profile curvature. Profile curvature influences on the overland flow velocity, whereas plan curvature mainly affects flow convergence and divergence ( Fig. 4d-e).
Hydrological influencing factors. Drainage density is another effective factor for controlling GWR. The drainage density demonstrates the signature of surface to subsurface formation 59 and it implies indirectly soil characteristics, which is assigned soil infiltration capacity 55 . Drainage density was extracted from DEM with help of ArcHydro Tool of ArcGIS10.6. Then the extracted drainage networks (km -1 ) were overlaid on vector digitized stream map of LPRWA (Lorestan Province Regional Water Authority.) for improved drainage direction map and the area was categorized into five classes (Fig. 4f). The low drainage density indicates low surface runoff and hence high infiltration rate 57 and consequently a high GWR potential. Rivers are one of the sources of GWR, and thus influence GWR potential in a watershed. The Euclidean distance tool from spatial analyst tools in Arc-GIS10.6 was used to generate distance categories. The distance map was reclassified into five classes (Fig. 4g). Topographic wetness index (TWI) and stream power index (SPI) are as secondary topographic indices. In the GWR potential zone, one of the important secondary topographic factors was TWI. TWI was proposed by Moore et al. (1991) to indicate local groundwater potential. TWI relies on upslope area which highlights the potential exfiltration groundwater by topography effects; the lower TWI, the greater GWR potential in the specific class. The TWI of the region ranges from 0.891 to 22.553 and the TWI was further classified in five classes (Fig. 4h). SPI index describes stream power and its value ranges from 6.257 to 26.100. Five categories of SPI values were generated based on the natural break method as (< 9.04), (9.04-10.75), (10.75-12.46), (12.46-15.02), and (> 15.02). (Fig. 4i). The mean annual rainfall was used to determine the groundwater recharge investigations of www.nature.com/scientificreports/ the Marboreh watershed. The mean annual rainfall over the study area was interpolated based on the Kriging method from 12 gauging station. The mean annual rainfall ranges from 280 to 900 mm ( Fig. 4o.).
Geological influencing factors. The geological factors generally demonstrate the distribution of various rock units, which have a notable effect on GWR and water availability through hydraulic conductivity 60 . In this study, the lithological map was extracted from the national geological map of Iran at a 1:100,000-scale, and encompassed 26 major lithological groups (Fig. 4j). Fault map (Fig. 4k) was extracted from the national geological map of Iran at a 1:100,000-scale. This layer was used to infer groundwater storage and in regional scale, ground water flow direction is controlled by fault systems.
Ecological influencing factors. Land use (LU), NDVI, and soil are ecological parameters, which are commonly used factors for GWR potential zones 27,37,54,55 . The LU types can be significantly effective on runoff, permeability, www.nature.com/scientificreports/  www.nature.com/scientificreports/ and evapotranspiration. As stated by Gee et al 61 , recharge in vegetated areas is much lower than non-vegetated areas. In addition, recharge is greater in agricultural lands and grass lands than perennial lands including, shrub and forest areas 62 . The Landsat8 data (downloaded from https ://www.earth explo rerus gs.gov; column 165 and row 37, 17 June 2018) were used to determine the types of LU. Also, LU was classified by a supervised classification technique according to the maximum likelihood approach into 8 classes, namely; agriculture, bare land, forest, orchard, rangeland, rocky areas, urban, and water. LU within the study area is dominated by agriculture (49.6%) and rangeland (43.4%) (Fig. 4l). The vegetation density and coverage were represented using the Normalized Difference Vegetation Index (NDVI) map. The NDVI layer was prepared in ENVI 5.3. The value range of NDVI is -1 to 1, and higher value of NDVI represents dense vegetation. NDVI values were grouped based on the natural break method into five classes, namely (< 0.11), (0.11-0.17), (0.17-0.23), (0.23-0.33), and (> 0.33) (Fig. 4m). Soil texture is another effective factor used to specify sites suitable for recharge. Soil type is a major criterion in groundwater recharge and agricultural production. Soil texture provides essential information on infiltration rate 29 . For example, sandy soils have high permeability, which consequently might cause in reduced GWR. The soil map was obtained from Agricultural Research, Education, and Promotion Organization of Lorestan Province in 1:250,000-scale. The most prominent soil type in region were Loamy-sandy and Silty-loamy which cover 15.2% and 14.3%, respectively of the study area. The soil texture map of the study area was classified into 9 soil texture classes (Fig. 4n).

Models.
Maximum entropy (ME) model. We applied a family of machine learning techniques based on multinomial logistic regression know as maximum entropy (ME) to yield the potential of GWR from ground base observation and the condition factors. Jaynes introduced ME technique in 1957 based on probability distribution of maximum entropy to derive the probability of given number of individuals occurrence across the any given area based on a set of condition factors. The MaxEnt software (version 3.3.3 k), which is based on the ME approach, is conducted for GWR potential mapping. Groundwater potential mapping 63,64 , flood 30 , landslide 65,66 , rangeland 67,68 , earthquake 69 , species distribution 70 , springs and wells 71 , and groundwater quality 72 .
Frequency ratio (FR) model. The FR model, as the probability of incidence for a special attribute, is a simple geospatial assessment tool and is used to compute the eventual relation between GRW potential and GWR´s effective factors 30 . In specifying the frequency ratio, the recharge occurrence ratio in each conditioning factors sub-class is obtained toward the total recharge. Then, each class's surface ratio is computed compared to the total area of the watershed.
The FR values were identified using Eq. (1) for each sub-class of GWR potential effective factors based on their correlation with GWR potential inventory: where α is the number of recharge pixels in each subclass, β is the total number of pixels in the area, γ is the total number of recharge pixels of the entire area. ɘ indicates the number of pixels in every subclass of conditioning factors, θ represents the percentage of recharge occurrences in any sub-class of conditioning factors, and ɷ is the relative percentage of the area of each subclass.
Ensemble modeling. In data analytics and predictive modeling, an individual model based on one data sample could have great variability, a large number of inaccuracies, or extensive biases, which affect the reliability of results 73 (Rokach, 2010). The effects of these restrictions can be reduced by analyzing multiple samples or combining various models which can help provide better information to decision makers and improve modeling algorithms 39,73 . In the current study, we integrated the ME and FR models in various scenarios using basic mathematical operations ( Validation of the built models. The accuracy of predictive models should be analyzed by comparing the generated data with a training and testing dataset (existing GWR percolation points). To assess the performance of individual and ensembles models, we utilized the area under the Receiving Operator Curve (AUC-ROC) 74,75 and the Correctly Classified Instances (CCI) 76,77 . The AUC value ranges from 0 to 1, in which the higher the value, the greater perfect discrimination 37 . The CCI derived from the corresponding confusion matrix that calculated true positive (TP), true negative (TN), false positive (FP), false negative (FN), with a range from 1 to 100. The greater CCI value, the more accurate prediction 78,79 .

Result
Conditioning factors importance. The Jackknife test was implemented during MaxEnt model building to identify conditioning factors contribution (Fig. 5). According to Fig. 5, soil provided the most critical conditioning factor in GWR potential prediction, and lithology was second important having high training gain contained most unique information when they used independently. Also, DEM and rain conditioning factors provided high gain when used independently. While, distance from fault, drainage density, distance from river, slope and NDVI had moderate gains respectively, when used alone. Furthermore, soil and lithology reduced training gain the most when was excluded from the model, thus had the most information that were not present in other conditioning factors.
Application of MaxEnt. Firstly, we examined the response of GWR potential to effective factors. Response curves identified the quantitative relationship between the logistic probability of the GWR potential and effective factors. MRCs include curves representing the separate effect of each independent factor on the dependent variable (i.e. GWR potential). According to these graphs (Fig. 6), the highest probability of percolation was seen at elevations between 1810 and 1820 m, with a probability of 77% (Fig. 6a). These conditions are in accordance with the inverse relationship between elevation and percolation since percolation level is higher at the lowest points or in the plains, and decreases with increasing altitude. Due to the altitude range of the area, this elevation www.nature.com/scientificreports/ class has the most suitable conditions for GWR. Slope and aspect are also among the significant considerations in site selection for groundwater recharge. Steep slopes concentrate water on the lower slopes and contribute to the buildup of hydrostatic pressure 80 . Runoff on lower slopes has a better opportunity to concentrate and percolate. In Marboreh watershed, the slope layer was classified into five class and the highest probability (67%) was observed for the 1st class, which ranges from 0 to 4% (Fig. 6b). This slope range comprises the suitable slopes for recharging. Aspect does not directly affect runoff. Its role is in determining the rate of runoff generation due to the difference in microclimate on the different slopes. The highest probability of percolation (66%) was recorded for the southwest aspects (Fig. 6c). In order to determine the geomorphometric characteristics of shapes, the second derivative is used in digital elevation (curvature) model. This characteristic is associated with geomorphologic processes and has two distinct types of curvature with vertical properties that are called plan and profile curvature. Areas with concave morphology have the most potential for percolation. Concave morphology can concentrate water and moisture and create suitable areas for GWR 81 . Permeability will be higher in areas with lower drainage densities because drainage by conduction of water on the ground prevents its penetration into the depths and thus reduces underground recharge 82 . The range for the drainage density factor at the studied watershed was between 0 and 1.39 km. The results showed that most places with recharge potential had drainage densities (with 67% probability) between 0.56 and 0.58 km (Fig. 6f). Results indicate that the probability of percolation in areas close to rivers was greater. The range of distance from rivers spanned 0-5.36 km (Fig. 6g). For the variable distance from river, the curve represents a steep decline with increasing distance, indicating that the more percolation area located near the river. In this study, the highest probability of percolation was observed at a distance of 0.52 km from rivers, with 0.67 percent. The curve shows a steep decline with increasing precipitation, indicating that this species likely prefers drier areas.
The TWI index is mainly used for quantitative topographic assessment of hydrological processes as well as showing the effect of topography on the position and size of groundwater flow, soil moisture, and saturated sources of runoff production. The higher index resulted in higher recharge potential (Fig. 6h). The stream power index, assuming the proportion of drainage to the surface area of the watershed, is a topographical combination feature. In this study, the highest percolation probability (69%) was seen in areas with SPI values in the 4-6 range (Fig. 6i).
The lithology of the studied area includes 26 geomorphic units. Areas with Qft2 quaternary lithology had the highest percolation potential because these layers are more closely associated with foothills and alluvial fans, and have high absorbance and percolation (Fig. 6j). This type of lithological unit, covering 33.65% of the watershed, is the largest lithological section in the region. Faults, due to the creation of crushed areas and the formation of water conduits, are very important in the development of morphology and the formation of subsidence in the region 54,83 . Faults facilitate the penetration of water to lower levels and increase the potential for the formation of underground liquidation cavities. The range of distance from faults was 0-28.9. The highest probability of percolation was 0.7% (Fig. 6k).
Croplands and rangelands (constituting 93% of the watershed's land use) are known to be conducive to GWR due to presence of vegetation and permeable soil textures. According to the results, the highest percolation and recharge probability occurred in agricultural land and rangeland (Fig. 6l) where the soil crust of farmlands being broken by plowing and vegetation cover reducing the velocity of rain or surface water and providing more time for percolation. Cropland and rangelands, especially in areas with other suitable effective factors such as slope, elevation, and geology, show the highest recharge potential. Vegetation is a good sign of underground water and has a direct relationship with percolation potential; the greater the amount of vegetation in a region, the greater the permeability. NDVI was used to measure vegetation cover (ranging from 0.6 to 0.68). As depicted in Fig. 6m, the highest probability of recharge potential based on NDVI was found to be 70%. In Marboreh watershed, nine soil texture classes were identified. The sandy clay loam (Sa. Cl. L) class has the highest probability of percolation with 57% (Fig. 6n). Soil texture greatly influences water percolation, permeability, and water-holding capacity. The Sa. Cl. L class, due to its combination of coarse and fine grains, is one of the most permeable textures. According to the response curve of annual rainfall, the probability of GWRP occurrence (78%) increased in area with 600 mm and decreased smoothly after that (Fig. 6o). Application of FR model. The outcomes of FR model calculated for each class of 15 effective factors based on their relationship with GWR potential locations are presented in Fig. 7, and a larger FR value means a higher probability on GWR potential of the corresponding effective factor. The ratio of the area with GWR potential to the entire area was calculated. In terms of elevation, high potential occurs mainly in the elevation range between 1442 and 1964.8 m. In the case of slope aspect, the FR weights were the highest for the north area (2.26) and flat area (1.19), while the West aspects have the lowest value of 0.01. For slope degree, the values of FR were decreasing through to increasing the slope degree. According to the FR, weights of profile curvature, convex, flat, and concave are 1.53, 0.77, and 0.64, respectively. For plan curvature, concave class had the highest GWR potential. In term of the drainage density, the class 0.28-0.56 had the greatest FR value of 1.96, while the class 1.11-1.39 had the lowest value of 0.24. Among various distance in the river network, the area closes to river had the most correlation with GWR potential, while the greatest distance from river had no GWR potential. TWI with higher values have larger weights of FR., while SPI with higher values have smaller weights of FR. The relationship between GWR potentials and the annual rainfall show that the high annual rainfalls have a high frequency of GWR potentials. The highest FR value (1.99) was observed for class > 622.5 mm.
Lithology FR conditioning factor was the highest in Qft1 (9.39) and has the highest FR value than other conditioning factors. Furthermore, Qft2 also had a high FR value of 2.82. In the study area, 0.40% and 33.7% of the area were covered by Qft1 and Qft2, respectively. Result showed that among the fault distance classes, the second class (5.78-11.56) and the third class ( www.nature.com/scientificreports/ impact on GWR potential in Marboreh watershed. The results obtained in the present study showed that among the land use types, agricultural land (1.32) and range land (0.78) had the highest probability of GWR potential. According to the NDVI, the greater NDVI value had the higher potential for GWR. Another factor affecting GWR potential is the soil texture. The result revealed that the Sa. Cl. L soil texture weighing 4.66 had the highest probability on GWR potential and might indicate the high possibility of GWR potential in this thematic layer.
Groundwater recharge mapping and validation of the built models. The Maxent model performed well for predicting the GWR potential with the given set of training and test data with overall of 0.993 for AUC and 86.5 for CCI in testing data ( Table 1). The GWR probability index was divided into four potential classes of low, moderate, high, and very high using a natural break method as seen in Fig. 8a. The model predicted approximately 7.26% of the studied region included very high GWR potential area ( Table 2). The GWR with high and moderate potential encompass 8.59 and 29.45 percent of the studied area (Table 2). Through the AUC and CCI techniques the obtained map using the FR was evaluated. In the present study, the AUC value www.nature.com/scientificreports/ show 0.975 which revealed an "excellent" predictive accuracy in the model prediction. Also, the validation using CCI in total was 61.50% which has a satisfactory prediction. Ultimately, the map was generated and classified into four potential classes by natural break classification method, i.e., low, moderate, high and very high (Fig. 8b). Around 2.92% of the studied area falls into the very high GWR potential zone, while 36.63% of the land belongs to the low GWR potential category (Table 2). Based on the FR model, most parts of the studied area (45.20%) have a moderate GWR potential.
Application of hybrid model. The ensemble method, where a set of simple mathematic scenarios represented in Table 1 combining different base models to get better predictive performance compared to an individual model. Thus, to provide the GWR potential map with the ensemble model, the weights obtained from FR model and ME model were integrated to acquire a better GWR potential classifier. Seven ensemble classifier models were identified for GWR potential zones. AUC and CCI were considered as measures of models' effectiveness. Table 1 showed AUC and CCI measures for seven classification models assessed. Compared with individual models (scenario 8 and scenario 9), it is relatively obvious that the proposed ensemble models outperformed with respect to AUC. CCI is also applied to assess the performance of the models, for which our results exhibited a satisfactory performance for GWR potential mapping. For the different scenarios, AUC and CCI ranged from 0.984 to 0.990 and 54.6 to 90.9, respectively (Table 1). Effective classification of ensemble models in comparative to individual models indicated that the best ensemble classifier, scenario 5, is characterized by better validation measures in both training datasets and testing datasets. In addition, the results of both individual models clearly indicated satisfactory output for GWR prediction, while the FR model attained slightly lower predictive performance than ME model. The best ensemble model in terms of performance was scenario 5. Then, GWR potential map of scenario 5 was developed using ArcGIS10.6 software. Each pixel has a value that represents the occurrence of GWR potential. Consequently, the map was divided into four classes of low, moderate, high and very high based on natural break method (Fig. 9). The area of each class in scenario 5, was represented in Table 2. Based on the scenario 5, most parts of the studied area had a low GWR potential (73.79%). A very high GWR potential is only found in 3.01% studied area.

Discussions
Nowadays, groundwater depletion became a critical global problem with serious consequences for sustainability of water supplies. Knowledge of spatial ground water recharge potential zones has placed considerable emphasis to addressing effectively planning and better managing water resources. Correspondingly, there has been growing numbers of research addressing zoning of groundwater recharge potential in recent years 19,27,29,36,37,54,55 . Although  www.nature.com/scientificreports/ there still remains much work to understanding the effect of multiple drivers on the distribution and pattern of GWR potential. Population growth along with increased development led to heavy withdrawals from the plain aquifer system of the Lorestan province and subsequently decline in water table. Mountainous areas play a critical role in delivering water flow to the lowlands; therefore, it is extremely important to understand the GWR potential zones. Depending on hydrogeology, climate, physiography, and groundwater consumption, the volume of stored groundwater may vary considerably in different places 19,29,84,85 . While physical models are promising to quantitative estimation of natural recharge rates to aquifer system, they often require large hydrogeological data which are not easily measurable in the field 86 and require numerically intensive simulation of the behavior. Furthermore, in-depth physical knowledge of the relevant hydrological process is requiring when developing physical based models (Kim et al., 2015). Moreover, many developing countries suffer from lack of information, accordingly susceptibility and potential mapping with help of statistical methods and machine learning methods which requiring few parameters without an explicit need about the underlying physical process becomes a useful tool to subdivide area into regions with significant GWR potential zone. Nevertheless, significant improvements in model prediction and decreased the prediction uncertainty through an ensemble approaches were recently published by researchers 32,41,44,46 . The current state of scientific researches on disaster susceptibility and natural potential by an ensemble models are in the early stage of development. In our study, GWR potential zones were investigated in Marboreh watershed in Lorestan province, Iran, using a combination of the machine learning (ME) and a bivariate statistical method (FR), into one predictive model in order to improve predictions. These two individual methods are commonly used by researchers in the field of Groundwater potential mapping 41,43 , landslide 47,48 , flood 44,45 . In order to make clearly accurate evaluations of the ensemble methods, FR and ME were also performed independently. The result showed that the ensemble method in scenario 5 performed "excellent' with AUC and CCI values of 0.990 and 0.907, respectively. This supports the previous findings from Tehrany et al 74 ; Razavi-Termeh et al 28 and Di Napoli et al 87 that ensemble classifiers allow more accurate prediction than classical models and alternative to the individual models in susceptibility and potential mapping. A research carried out in Qued Guenniche in the north of Tunisia to produce a GWR potential map by Chenini and Msaddek 36 . The authors found that the bivariate and multivariate statistical approaches provide more accurate results than the AHP. Pourghasemi et al 37 compared a number of machine learning methods to investigate groundwater recharge potential zones in Firuzkuh County, Iran. The results showed that SVM and MARS outperformed RF in terms of accuracy based on the data from 2000 double ring infiltrometer.
Here, we also address importance effective factors of GWR potential. Precisely, to quantify effects of topographical, hydrological, geological and ecological factors on GWR potential using data from 850 observations. Jackknife test was applied to evaluate the relative importance of each causal factors. The result indicated that soil had the highest training gain. This is followed by lithology, aspect and DEM. Our findings are in agree with pervious researches that highlight the influence of the soil, lithology, elevation, aspect and drainage density in GWR potential identification 29,55 . Similar results were found by Pourghasemi et al 37 in the Firuzkuh County, Iran, who noted that the mean annual rainfall, drainage density, elevation and slope angel are dominate factors that influence groundwater recharge. Rainfall data have been widely used in understanding the GWR potential zones 21  www.nature.com/scientificreports/ the high effective GWR potential in gentle slope. In contrast, the least GWR potential can be found in higher drainage density zones 25,54 . Additionally, high altitude along with steep slope areas could identified poor GWR potential. Furthermore, based on Jackknife test, SPI, plan and profile curvatures have no particular impact on GWR potential in studied area. Moreover, the final GWR potential maps indicated that the areas with high percolation potential are mainly distributed along the south and southwestern parts of the study area.

Conclusion
Marboreh Watershed continue to an experience water scarcity, largely driven by frequent droughts and expanding agricultural land combined with over abstraction of groundwater. Therefore, there is a need to identify potential groundwater recharge zones to help prevent water scarcity. In this study to address water scarcity issue we discuss the combination of the predictions achieved by FR and MA models in the ensemble in order to develop a GWR potential zones for the Marboreh watershed in Lorestan province, Iran. Moreover, the spatial mapping for managed aquifer recharge based on topographic factors, hydrological factors, geological factors and ecological factors and assessing their likely contribution on GWR is explored. The results of the research reveal that evaluating the individual's models using the AUC and CCI indices reach relatively high performance. The ensemble classifier with scenario 5 may achieves more accurate result than other ensemble models and individual models with AUC and CCI values of 0.990 and 0.907, respectively. Additionally, GWR potential areas were found to vary largely based on soil, lithology, aspect and elevation. The results of the study revealed that about 11.85% of the Marboreh watershed was found to have a very high to high recharge potential and 73.79% falls in the low zone class. It has been generally recognized that, the accuracy of the outputs of interest is dependent mainly on the input uncertainty. There are large uncertainties in soil spatial information, insufficient knowledge of underlying groundwater system and inadequate of data measurements. Additionally, higher uncertainties are associated with aquifer characteristics and recharge process needs to be address of further studies. Here, since there were no available data for hydraulic conductivity, double-ring infiltrometer method was used to identify varying groundwater recharge potential. Thus, priority needs for validate model predictions in area with monitoring wells and resistivity imaging of water movement in vadose zone.
The results of this study could be used in the planning, management, and control of surface runoff in highdischarge events. It is recommended to construct aquifer recharge facilities in areas with "high" and "very high" percolation potential. Using these findings, it is possible to obtain higher quantities of surface water and recharge aquifers, as well as managing water deficit. The studied area heavily relies on the abstraction of groundwater to sustain irrigated agriculture. Consequently, identifying the groundwater recharge zones is promising, especially for future droughts, but requires more scientific support and supervision to succeed. The studied area is suffering from a combination of water crisis and groundwater pollution. Therefore, identifying zone with a high GWR potential is critical to enhance water storage and taking acting by local managers to define conservation priorities areas to reduce nutrient losses and enhances aquifer recharge.