Normalized difference vegetation index as the dominant predicting factor of groundwater recharge in phreatic aquifers: case studies across Iran

The estimation of long-term groundwater recharge rate (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${GW}_{r}$$\end{document}GWr) is a pre-requisite for efficient management of groundwater resources, especially for arid and semi-arid regions. Precise estimation of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${GW}_{r}$$\end{document}GWr is probably the most difficult factor of all measurements in the evaluation of GW resources, particularly in semi-arid regions in which the recharge rate is typically small and/or regions with scarce hydrogeological data. The main objective of this study is to find and assess the predicting factors of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${GW}_{r}$$\end{document}GWr at an aquifer scale. For this purpose, 325 Iran’s phreatic aquifers (61% of Iran’s aquifers) were selected based on the data availability and the effect of eight predicting factors were assessed on \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${GW}_{r}$$\end{document}GWr estimation. The predicting factors considered include Normalized Difference Vegetation Index (NDVI), mean annual temperature (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T$$\end{document}T), the ratio of precipitation to potential evapotranspiration (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${P/ET}_{P}$$\end{document}P/ETP), drainage density (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${D}_{d}$$\end{document}Dd), mean annual specific discharge (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${Q}_{s}$$\end{document}Qs), Mean Slope (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$S$$\end{document}S), Soil Moisture (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${SM}_{90}$$\end{document}SM90), and population density (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${Pop}_{d}$$\end{document}Popd). The local and global Moran’s I index, geographically weighted regression (GWR), and two-step cluster analysis served to support the spatial analysis of the results. The eight predicting factors considered are positively correlated to \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${GW}_{r}$$\end{document}GWr and the NDVI has the greatest influence followed by the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$P/{ET}_{P}$$\end{document}P/ETP and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${SM}_{90}$$\end{document}SM90. In the regression model, NDVI solely explained 71% of the variation in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${GW}_{r}$$\end{document}GWr, while other drivers have only a minor modification (3.6%). The results of this study provide new insight into the complex interrelationship between \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${GW}_{r}$$\end{document}GWr and vegetation density indicated by the NDVI. The findings of this study can help in better estimation of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${GW}_{r}$$\end{document}GWr especially for the phreatic aquifers that the hydrogeological ground-data requisite for establishing models are scarce.

www.nature.com/scientificreports/ GIS technology can provide a very effective means to map crops, due to their fast response, periodic observations, and low cost 44 .
In this study, we aimed to clarify and emphasize the explanatory power of long-term NDVI as a proxy or characterization of vegetation density for estimating GW r in phreatic aquifers. To reveal the importance of NDVI in estimation of GW r , a range of factors including climatic (precipitation, potential evapotranspiration, and temperature), hydrological (specific discharge), geomorphological (slope and drainage density), human (population density) and soil properties (soil moisture) are considered and their relations with GW r are analysed. A quantitative understanding of the extent of changes of surface vegetation and associated impacts on GW r is crucial. We use stepwise and geographically weighted regression (GWR) models along with the two-step cluster analysis to identify the main predicting factors of GW r in 325 Iran's phreatic aquifers. This synthesis is, to our knowledge, the first attempt globally to quantify the relative importance of predicting factors (especially vegetation) on GW r .

Materials and methods
Techniques used in this study for spatial analysis of the relationship between GW r and predicting factors considered for 325 Iran's phreatic aquifers are shown schematically in Fig. 1. They include developing stepwise regression, GWR model and cluster analysis to classify hydrologically distinct regions on the degree of impact of each driver on GW r estimation. Predicting factors for GW r considered in this study include: long-term (30-year during 1989-2019) NDVI as an explanatory proxy or measure of land cover factor; long-term (30-year) mean annual temperature ( T ) and the ratio of precipitation to potential evapotranspiration ( P/ET P ) as climatic factors; long-term (30-year) mean annual specific discharge ( Q s ) as a hydrologic factor; mean slope ( S ) and drainage density ( D d ) as geomorphological factors, soil moisture ( SM 90 ) as soil factor and population density ( Pop d ) as a proxy of urbanization effect.
Study areas and datasets. Iran's WRM Co 17 explored 535 unconsolidated aquifers across the country (494 phreatic aquifers and 41 phreatic-confined aquifers) based on geology and geophysics studies, exploration wells logs, type of sediments, and depth of bedrock investigations. The consolidated aquifers (e.g. karstic aquifers) generally located in the mountainous areas specially Zagros Mountain in west and southwest part of Iran www.nature.com/scientificreports/ and not considered in this study. Based on the hydrogeological data availability, 325 Iran's unconfined aquifers (61% of unconsolidated aquifers) were selected and consisted of our research areas as shown in Fig. 2. Moreover, the selected aquifers mostly (89.5%) located in an arid and semi-arid climate, and less in humid (5.3%) and Mediterranean (5.2%) as shown in Fig. 2.
Natural groundwater recharge estimation.  average values of natural recharge for 325 phreatic aquifers across Iran calculated have previously been calculated using a water balance equation by Iran's Water Resources Management Company 17 . This is considered as a response variable. For this purpose, a lumped water balance model was adopted by Iran's WRM Company to determine the long-term GW r for each aquifer. The total natural groundwater recharge to a phreatic aquifer using the general groundwater balance equation can be defined as 45 where GW r is total natural groundwater recharge from rainfall, river seepage, return flows from water used for irrigation, domestic, and commercial sectors, and groundwater inflow from other basins; D g is draft from groundwater by pumping wells, springs, and qanats;S r is Groundwater drainage into surface water (e.g. lake and streamflow); O g is groundwater outflow to other basins; and V is the change in groundwater storage. All components of the water balance equation are computed by Iran's WRM Co 17 using independent methods; involve errors due to uncertainties in method's data required and shortcomings of the techniques used. In many cases, the water balance equation does not balance. The discrepancy of the water balance equation arises from the errors in the calculation of the components and/or components which are not considered is known as residuals term. In Iran, the component of GW r is calculated as the unbalanced term (i.e. residual term) of Eq. 1. To reduce the error, long-term averages values of components are considered which generally have smaller errors of estimation than short term averages. According to previous studies (e.g. 46,47 ) the accuracy of water budgets decreases as shorter time frames are considered.
Calculating NDVI. We used remote sensing data acquired by the Landsat 5 TM and Landsat 7 ETM + satellites as its 30-m resolution can target a fine spatial configuration. This avoids the inclusion of potentially irrigated   , 660 Landsat images  with 16-day interval and 30-m resolution during 1989-2019 for each aquifer (totally, 214,500 images for 325  aquifers) were processed in Google Earth Engine platform. Following atmospheric correction, NDVI was calculated using Eq. (2) for all images using bands 3 and 4 in Landsat which have been calibrated to sense radiation in the visible ( Red ) and near-infrared ( NIR ) regions of the spectrum respectively 48 : We used mean, 10th and 90th percentiles of NDVI values for our analysis to assess the effect of low, average and high levels of vegetation coverage of aquifer surface, respectively, on GW r estimations. The low, and high vegetation coverage conditions denote vegetation during non-growing and growing season of crops, respectively 22, 49, 50 . Calculating other predicting factors of GW recharge. Potential evapotranspiration ( ET P ) over study aquifers were computed in monthly scale by Hargreaves-Samani Equation 51 . Hargreaves equation is one of the most precise and simplest empirical equations which is used to estimate ET P and relies on monthly minimum, maximum, and average temperature and extraterrestrial radiation ( R a ) [51][52][53][54] . This method is more accurate for arid and semi-arid regions and gives reliable results [55][56][57] .
The ratio of long-term annual average values of precipitation ( P ) over the aquifer area to the ET P gives the predicting factor of P/ET P . long-term mean annual specific discharge ( Q s ) was calculated using dividing longterm mean annual streamflow ( Q ) by the area of the aquifer ( A ). The Drainage Density ( D d ) over each study aquifer is obtained by dividing the total length of all the streams over the aquifer area by the area of the aquifer ( A ). The mean slope ( S ) was calculated using DEM and Slope tools in ArcGIS software 58 . Population density ( Pop d ) is calculated by dividing the total number of peoples living over the aquifer area by the aquifer area ( A ). The long-term soil moisture content in the upper layer (depth of 0-273 mm) of the vadose zone is obtained by SMAP satellite images with 3-day interval and 27-km resolution during 2015-2019, totally 186,875 images over the 325 study aquifer process in Google Earth Engine platform.
By adopting the above factors, we aimed to emphasize the role of surface-motivated predicting factors of GW r , especially NDVI. Considering other parameters (e.g. hydrogeological properties of aquifer) that may also be correlated to GW r are not the primary aim of this study. We believe that relating the GW r to the factors which may obtained by the remote sensing techniques (e.g. GEE platform) could be used as a preliminary tool for estimation of GW r magnitude, especially in the regions with scares ground-data pre-requisite for model establishing. where n is the total number of aquifers, x i and x j are the values of attribute feature of x at location i and j , w ij is the element of the space weight matrix, W in row i th and column jth, used to express the neighboring relationship of spatial regions at n location, and − x is the average of all observations for the attribute feature of x in n study areas. This index reflects only the differences in the spatial average. While local Moran's I ( I L ) examines the distribution pattern of individual attribute values distributed in a heterogeneous space and can measure the degree of local spatial correlation between each area and its surrounding areas 61, 62 : where z i and z j are the values normalized to regions i and j , and w ij is an element of the space weight matrix of W.
Stepwise regression model. The stepwise regression model (SRM) is a linear regression that filters independent variables (i.e. predicting factors) that have the most significant influence on the dependent variable ( GW r ) in a step by step way. When the given explanatory variables are no longer significant, the regression is culled. This process is repeated until all independent variables in the regression are significant 63 . Geographically weighted regression (GWR) model. To test the spatial non-stationarity between the most influential explanatory variables (predicting factors) on GW r identified by SRM, the GWR model is adopted 64 . The model outputs coefficients of correlation for all aquifers, which are then mapped and tested spatially against raw values to understand what is predicting the most sensitive local relationships 65 . This cartographic approach illustrates the spatial distribution of the sign, magnitude, and significance of the influence of each predicting factor on the dependent variable ( GW r ). The GWR model reflects the non-stationarity of parameters in different spaces and allows the relationships between variables to change with the spatial position, www.nature.com/scientificreports/ which provides more realistic results 62 . The formula used by the GWR adopted in this study is the logarithmic transformation of a nonlinear regression as follows 66 : where GW r is the estimated value of GW recharge as a dependent variable for i th aquifer; (u i , v i ) are the geographic coordinates for i th aquifer; a 0 (u i , v i ), and a k (u i , v i ) are the intercept and local coefficients for i th aquifer, respectively; n p is the number of predicting factors included in regression (i.e. five variable); x ik is the k th explanatory variable for i th aquifer, and ε i is the random error term for i th aquifer. According to the Eq. 5, the logarithm of GW r values for 325 aquifers and corresponding five predicting factors considered as GWR inputs. Following the typical estimation method 67 , the regression coefficients for i th aquifer are estimated as follows: where α i is the k × 1 vector of regression coefficients for aquifer i with coordinate (u i , v i ); w i is a diagonal matrix m × m of spatial weights obtained by the weighting functions quantifying the proximities of aquifer i to its m neighborhoods; X is the variable matrix m × k ; and GW r is the vector of estimated value of GW recharge value k × 1 . GWR typically employs a kernel weighting function 68 , to allow data points located nearer to the location of interest to have more influence in the regression calculations. For GWR calculations in this study, we used the Gaussian distance-decay based weighting function as follows 69 : where w Tkj is the weight of the observation at site k on the observation at site j for GW r when the independent variable is c ; d kj is the distance between site k and site j , b Tc is the kernel bandwidth for GW r when the independent variable is c , and exp is the exponential function. When the distance is greater than the kernel bandwidth ( d > b ), the weight rapidly approaches zero ( w → 0 ). In this study, the optimal bandwidth was determined by minimizing the corrected Akaike's Information Criterion (AIC) value 70,71 . All GWR modeling was done using the GWR4 software package version 4.09, which is freely available online 72 . GWR can be used to calculate a set of local regression results including a localparameter estimate, a local R 2 value, and a local residual for each sampling site 73 .
Cluster analysis. The GWR model generated a large number of results which provides a challenge for interpretation 74 . Therefore, based on GWR results, a clustering analysis usually served to further scrutinize the results. Two-step cluster method used in this study is a clustering method that determines the optimal number of clusters 75 . through two steps: first, all records are investigated by distance to construct the classification feature tree, while records in the same tree node have high similarity. In the second step, the nodes are classified using the cohesion method and each clustering result is evaluated using an appropriate criterion (i.e. Bayesian information criterion) which yields the final clustering result 62 .

Results
Spatial distribution of dependent/independent variables. Groundwater recharge rate, GW r . The spatial distribution of long-term (~ 30-year) GW r values for 325 study phreatic aquifers calculated by Iran's WRM Company by the year of 2014. The GW r is calculated as the sum of recharge from the rainfall, river seepage, return flows from water used for irrigation, domestic, and commercial sectors, and groundwater inflow from other basins as given in the Table S1 in the Supplementary Data. The GW r values (as the dependent variable) are in the range of 8.92-1346.8 mm/year (with an average of 257.5 mm/year) as summarized in Table 1.
The aquifers with greater GW r values are located in southwest, west and northwest of Iran and are associated with the semi-arid, humid and Mediterranean regions (Fig. 2). While 53% of study aquifers receive a recharge rate less than 200 mm/year (Fig. 3b), only 2% of aquifers recharged annually at rates greater than 1000 mm/year mainly located in the southwest region of Iran (due to high precipitation). Noteworthy, Tehran aquifer (located in northern Iran) has received a recharge rate more than 800 mm/year mainly due to leakage from water supply network and sewage network in Tehran city 76 . The histogram of the GW r values (Fig. 3b)    Mean annual temperature, T. Long-term mean annual temperatures over study aquifers were also computed by the inverse distance weighted (IDW) method in ARC GIS 58 based on analysis of monthly data of 3128 synoptic and climatological stations during 1989-2019. The spatial distribution of T values is shown in Fig. 3e. The T values over study aquifers ranged between 6.95 and 28.2 °C, with an average of 17.6 °C ( Table 1). The histogram of the T values (Fig. 3f) indicates a weak skewness (0.27).
Precipitation to potential evapotranspiration, P/ET P . As another predicting factor of GW r , the ratio of the long-term mean annual precipitation to potential evapotranspiration (calculated by Hargreaves-Samani equation) was considered. For this purpose, the monthly temperature data of 3128 synoptic and climatological stations from 1989 to 2019 and the extraterrestrial radiation ( R a ) for each aquifer were utilized. The spatial distribution of P/ET P over the study aquifers is shown in Fig. 3g with relies on the range of 0.03-0.78. The data of P/ET P indicates a positive skewness (0.99) as can be observed in the corresponding histogram in Fig. 3h.
Mean annual specific discharge, Q s . River base flow is taken as equivalent to the total groundwater recharge of a basin and the system is assumed steady state such that groundwater discharge is assumed to equal to the recharge 78 . In the study areas, due to lack of continuous streamflow data, the annual averaged river-flow ( Q ) divided by the area of the aquifer ( A ) is considered as another predicting factor of GW r . The spatial distribution and histogram of Q s values are shown in Fig. 3i,j. A strong positive skewness is observed for this set of data (9.55).
The Q s values are in the range of 0.0 (no surface flow) to 59.2 MCM/km 2 for the aquifers located in southwest Iran (with an average value of 2 MCM/km 2 ) according to Table 1.
Mean surface slope, S. Another factor that may have a strong influence on GW r for the phreatic aquifers is topography. For this purpose, the average surface slope of study aquifers ( S ) is calculated by using ALOS DEM 12.5 m and Slope tools in ArcGIS software 58 . The obtained values of S are in the range of 1.8-23.0% (with an average value of 5.2%) as shown in Table 1 and also Fig. 3k,l.  Table 1.
Soil moisture content, SM 90 . The long-term value of soil moisture content ( SM ) in the upper layer of soil (in the depth of 0-273 mm) for the study aquifers are considered as another predicting factor of GW r . Because GW r occurs when the infiltrated water exceeds the maximum soil moisture capacity ( SM max ), the 90th percentile of daily data of soil moisture content ( SM 90 ) could be a good approximation for this threshold value 79 . In deep phreatic aquifers, which is true for the most of Iran's aquifers (mean depth to water table 34 m), the presence of thick unsaturated zone buffers the water table response to rainfall. www.nature.com/scientificreports/ www.nature.com/scientificreports/ Due to lack of ground-data for SM , such data is derived through remote sensing from SMAP satellite. In total, 186,875 images (27-km resolution and 3-day intervals) were analyzed in GEE platform for 325 study aquifers (575 images for each aquifer) during 2015-2019. Using the time-series data of SM over the aquifer area, the 90th percentile of SM data ( SM 90 ) for each aquifer were calculated, daily and considered as one of predicting factor of GW r . The spatial distribution of long-term SM 90 data for the study aquifers are shown in Fig. 3o,p. According to Table 1, the values of SM 90 for the study aquifers are in the range of 7.2-130.5 mm (44.0 mm on average).
Population density, Pop d . In the urban areas, GW r results from rainfall infiltration (across the pervious areas), and leakage from water supply and sewerage networks and thus, varies widely with population density and development level 80 . In this study, the population density ( Pop d ) over the aquifer areas was considered as an available explanatory variable of urbanization and the human effect on GW r . The Pop d values have a broad range of 0.0 (without habitant) to 5137 people/km 2 (averagely 224 people/km 2 ) as shown in Fig. 3q and Table 1. The data of Pop d indicates strong positive skewness (5.45) as can be seen from their corresponding histogram (Fig. 3r).
The correlation matrix of dependent and independent variables based on the initial data shown in Table 2 gives more information about the relation between the GW r and the predicting factors. The GW r has the maximum correlation with NDVI (0.74), followed by the P/ET P (0.46) and SM 90 (0.39). Dense surface vegetation (i.e. higher values of NDVI) may be associated with the potential zones for recharge to underlying layers. While previous studies reported that the groundwater level (GWL) has a strong linear relationship with NDVI, especially for shallow aquifers during dry years (e.g. 40 ), a non-significant relationship is observed between NDVI and GWL. This may be due to the depth of GWL in Iran's aquifers (mean 34 m) are much greater than interacts with surface vegetation. Of note is that the GW r has a positive correlation (direct relationship) with all considered predicting factors. The positive correlation of GW r and Pop d in Table 2 (0.23) can be interpreted with increasing population the rate of return flows of domestic use to groundwater resources increases, especially for the urban areas that the source of domestic water supplies from outside of aquifer basin. Surprisingly, increasing the surface slope of the study aquifers ( S ) increased the GW r rate, due to the topography, geomorphological and climatic condition of Iran's basins. Usually, in Iran, lower slope lands consist of with fine-grain sediments (e.g. clay and silt) which have a low rate of GW r .
While P/ET P and SM 90 are the strong drivers of NDVI with direct relationship, the temperature has negatively correlated with NDVI (Table 2). This is consistent with other studies which suggested temperature to be negatively correlated with NDVI during spring 81 and summer 82 . This negative relationship can be due to lower soil moisture caused by higher temperature, especially in the regions with limited rainfall. The importance of evaporation and its negative impact on NDVI has been reported in other studies 81 . Global and local Moran's indicator. The spatial characteristics of GW r in Iran's phreatic aquifers is investigated by the global and local Moran's I indicator. The global Moran's I for GW r data is 0.373 ( p< 0.01), which indicates an important positive spatial correlation for this variable. To better understand the spatial characteristics and distribution of GW r across the study aquifers, the local Moran's I indicator was used. Figure 4 shows the results of Moran's I indicator, locally. There are several aquifers of high-high cluster in southwest Iran (especially Fars Province) which indicates these aquifers have high value of GW r and neighboring to such aquifers. One of aquifer included in this cluster is Roudan in Hormozgan Province (southern Iran) with an annual recharge of 826 mm/year. The high value of recharge rate associated with this aquifer leads to Hosseini et al. 11 ranked it as the third Iran's aquifers from viewpoint of sustainable management. The aquifers which classified as low-low clusters Stepwise regression (SR) model. The SR model was used to select the most influenced predicting factors of GW r in the study aquifers. For this purpose, four criteria of the adjusted coefficient of determination ( R 2 ), standard error of estimation ( SE ), Akaike's information criterion ( AIC ), and variance inflation factor ( VIF ) were used and computed for all regressions. As discussed previously, before the variables enter the SR model, the data series of both dependent and independent variables were transformed using the logarithm function. This transformation was used to obtain a constant variance of the residuals about the regression line, and to linearize the relation between the variables to use linear least squares regression techniques. The equations and goodnessof-fit criteria of five types of SR models (SR 1 to SR 5) that includes one (including NDVI with 10th percentile, 90th percentile and mean) to five (NDVI, Pop d , T , Q s and D d ) independent variables are shown in Table 3. These regressions are statistically significant at α= 0.01 based on t-test values and F-test. Since all t-test and F-values of regressions given in Table 3 are much smaller than 0.01, therefore, the regression results for the selected parameters show the significance of these factors (with a confidence level of 0.99) in explaining the GW r . Results given in Table 3 indicate that the SR 5 model that includes five independent variables of NDVI (mean values), Pop d , T , Q s and D d indicates better performance in the estimation of GW r . The positive exponent of predicting factors in the regressions reveals the direct relationships of these variables on the magnitude of GW r . It is worth noting that the adjusted R 2 values (in Table 3) associated with the regression models varies between 0.711 for SR1 (including only mean NDVI as an independent variable) to 0.747 for SR5 (including five independent variables). This reveals that the estimation of GW r values in the study aquifers using the mean NDVI can solely explain 71% of the GW r variations. Adding the four predicting factors of Pop d , T , Q s and D d will improve the regression efficiency as 3.6% in term of R 2 , and 4.5% in terms of SE and AIC criteria. The VIF values calculated for five SR models are less than 2, which indicates that there are not problems of serious multi-collinearity among the independent variables (the VIF value above 5 indicates high correlation that may be problematic).
The effect of 10th, 90th percentiles and mean of NDVI (i.e. explanation of low, high and average levels of vegetation coverage of aquifer surface) on GW r in estimations were also investigated in SR modeling. Results given in Table 3 indicate that the mean values of NDVI shows better correlation with GW r than 10th and 90th percentiles according to the goodness of fit criteria of R 2 , AIC , VIF , and SE . This reveals the pivotal role of average condition of surface vegetation coverage (i.e. mean values of NDVI) in estimation of GW r for phreatic aquifers rather than high-level (growing period of crops) and low-level (non-growing period of crops) of vegetation coverage. Thus, the mean values of NDVI better represents the inter-annual surface vegetation variability and its role of the GW r .
Results shown in Table 3 indicate that the regression cannot explain 29% (for SR1) to 25% (for SR5) of the variations in the GW r . This may be due to discounting other influential factors that are difficult to quantify (e.g. groundwater inflow/outflow from adjacent basins). Developing a single-variable regression model including NDVI to estimate GW r has great promise due to simplicity of the deriving vegetation related index from GEE platform especially for the aquifers with scarce ground data. The five drivers of GW r selected by the SR model (i.e. SR 5) are considered as GWR model inputs to obtain the locally-varying relationships between the variables (five predictors and GW r ). Since the SR model uses the multi-collinearity diagnostic test, thus, the variables of SM 90 , P/ET P , and S are excluded from the regressions since they are maximum correlated to the NDVI in the study aquifers. GWR model. Spatial analysis of the relationship between GW r and five predicting factors (NDVI, Pop d , T , Q s and D d ) through GWR model were performed for the study aquifers. Table 4 shows the descriptive statistics of the coefficients. According to Table 4, NDVI and Temperature were the variables with the greatest and lowest spatial coefficients (mean 0.889 and 0.092, respectively) and they can predict GW r with a direct relationship.
In Fig. 5a-e, the coefficients of independent variables were divided into five classes based on the Natural Breaks method 58 . According to Fig. 5a-e, the coefficient of NDVI ranges from 0.67 to 1.02. Significant relationships between NDVI and GW r occur in the southwest and south of Iran. The GW r for the aquifer located in southern part is highly affected by NDVI. The coefficient for the drainage density ( D d ) ranges from 0.00 to 0.34 and its coefficient gradually increases from the northwest to the southeast of Iran. The aquifers that are most affected by D d are ones located in southeast part. The coefficient for the Pop d ranges from 0.05 to 0.16 and the significant relationships are observed in the aquifers in southeast and east parts of Iran. According to spatial distribution of Pop d coefficients, the aquifers located in eastern Iran are highly affected by it/them. The coefficient of Q s ranges from 0.0 (for west aquifers) to 0.361 (for east aquifers) and its coefficient gradually decrease from the east to the west of Iran. The coefficient of temperature ( T ) ranges from 0.01 to 0.12, and the highest values are observed for the northwest aquifers and the lowest for the southwest regions of Iran. The aquifer which is mostly affected by T is ones located in the northwest part.
The spatial distribution of R 2 values of GWR model is shown in Fig. 5f. The local R 2 value ranges from 0.67 (for northwest aquifers) to 0.80 (for central aquifer). The study aquifers show diversity in degrees of fit (high variation of R 2 ). Based on Fig. 5f, the local R 2 value increases when moving towards central aquifers of Iran. In other words, The GWR model has the best fit in the central Iran. These results indicate that in the aquifers in the central parts of Iran, the relationship between predicting factors and GW r is better in the regression model. www.nature.com/scientificreports/ the study aquifers were divided automatically into six categories based on the Bayesian Information Criterion (BIC) method. The statistical and mapping result of the five predicting factors in each cluster is given in Table 5 and Fig. 6a,b. The coefficients with larger values have a greater impact on the prediction of GW r . The results shown in Fig. 6 provide valuable information about the recharge predicting factor(s) in each aquifer system. Two-step clustering results reveal that the NDVI (as the most influential predictor of GW r ) has the highest effect on the estimation of GW r for all classes of aquifers (Table 5). Figure 6 reveals that the effect of NDVI for Table 2. Cross-correlation matrix of dependent ( GW r ) and predicting factors (NDVI, T , P/ET P , Q s , S , D d , SM 90 , Pop d ) considered in this study (original data without transformation were considered). GW r long-term mean annual groundwater recharge, NDVI long-term mean annual normalized difference vegetation index, T long-term mean annual temperature, P/ET P ratio of long-term mean annual precipitation to potential evapotranspiration, Q s mean annual specific discharge , S mean surface slope, D d drainage density, SM 90 90th percentile of soil moisture content, Pop d : population density. *Significant at 90% confidence level. **Significant at 95% confidence level. ***Significant at 99% confidence level. www.nature.com/scientificreports/ the aquifers located in south and southwest Iran (aquifers categorized in class 4) is higher than other parts of Iran (the attribute value of NDVI for class 4 is 0.97). These aquifers are characterized by high NDVI and the significant GW r (mean 310 mm/year). The lowest effect of NDVI are for the aquifers located in northwest part (aquifers in class 1) which are characterized by NDVI value 0.75 (Table 5). Drainage density, D d (as the second most influential predictor of GW r ) has the most effect on aquifers located in southeast of Iran (class 6) which characterized by drainage density > 90 m/km 2 . Population density, Pop d (as the third most influential predictor of GW r ) has the highest effect in increasing of GW r for the aquifers located in southeast, east, northeast and north of Iran (aquifer of classes 6, 3 and 2). The presence of population on aquifer surface has the lowest effect on GW r for the aquifers located in south and southwest parts (class 4). Specific Discharge, Q s (as the fourth most influential predictor) has the highest effect on the aquifers located in southeast and northeast of Iran which characterized by mean annual specific discharge 18 MCM/km 2 (aquifers of classes 6 and 2). Mean annual temperature, T (as the least influential predictor of GW r ) has the most effect on aquifers located in south, southwest and northwest of Iran (classes 4 and 1) which are characterized by mean annual temperature 7-28 °C.

Conclusion
In this study, the effects of different explanatory variables of climate ( T , P/ET P ), geomorphologic ( S and D d ), hydrologic ( Q s ), soil ( SM 90 ), human ( Pop d ), and land cover (as NDVI) were analyzed for explaining groundwater recharge rate ( GW r ) for 325 of Iran's phreatic aquifers. Of these variables, the stepwise regression consistently indicates the predominant effects of NDVI, Pop d , T , Q s and D d on GW r in the study aquifers. All these predictors are positively correlated with the GW r . To support the spatial analysis of the results, local and global Moran's I index, GWR model, and two-step cluster analysis were employed. Results indicated that NDVI is consistently the dominant predictor of GW r , and followed by the P/ET P and SM 90 . Thus, land cover is the dominant control on groundwater recharge in all studies areas of Iran. A consistent and robust story has emerged in terms of the relationships between the predicting factors, especially NDVI, and GW r for the phreatic aquifers used as case studies here. Remotely sensed NDVI has allowed rapid collection of data not only across sizeable aquifers, but more importantly, across a time span of years.
In this way, the use of remote sensing along with the GEE cloud platform can be viewed as a strength to provide a large number of hydrological data points for the wide spatial and temporal scales. However, the lack of field studies to verify remotely-sensed observations (NDVI and SM ) with ground truthed data is a limitation of this study. The results indicate that combining a geographically weighted regression model with two-step cluster analysis can be a valuable tool for identifying the spatial heterogeneity of GW r predictors. Table 3. Results of stepwise regression (SR) analysis and evaluation criteria (all variables are significant at α = 0.01) for estimation of groundwater recharge rate over the study aquifers. The equations given in table are back transformed logarithmically from linear regression. GW r natural groundwater recharge rate (mm/year), NDVI mean values of normalized difference vegetation index obtained by Landsat satellite images with interval 16-day and resolution 30-m during 1989-2019, NDVI 10 and NDVI 90 are the 10th and 90th percentile of the NDVI, T : mean annual temperature (°C), Pop d population density (people/km 2 ), Q s mean annual specific discharge (MCM/km 2 ), D d drainage density (m/km 2 ), R 2 determination coefficient of regression, AIC Akaike information criteria, VIF variance inflation factor, SE standard error of estimations.

Model evaluation criteria
Adjusted www.nature.com/scientificreports/ In conclusion, relating the remotely sensed data (e.g. NDVI) with GW r in the phreatic aquifers will help landuse decisions for sustainable groundwater management, especially where the field data for precise calculation of GW r through traditional models does not exist. Among the explanatory variables we investigated, population density ( Pop d ) and surface vegetation (NDVI) are of manageable ones through human intervention on aquifer surface that are directly related to GW r magnitude and its spatial pattern. www.nature.com/scientificreports/