Spatial–temporal characterization of rainfall in Pakistan during the past half-century (1961–2020)

Spatial–temporal rainfall assessments are integral to climate/hydrological modeling, agricultural studies, and water resource planning and management. Herein, we evaluate spatial–temporal rainfall trends and patterns in Pakistan for 1961–2020 using nationwide data from 82 rainfall stations. To assess optimal spatial distribution and rainfall characterization, twenty-seven interpolation techniques from geo-statistical and deterministic categories were systematically compared, revealing that the empirical Bayesian kriging regression prediction (EBKRP) technique was best. Hence, EBKRP was used to produce and utilize the first normal annual rainfall map of Pakistan for evaluating spatial rainfall patterns. While the largest under-prediction was estimated for Hunza (− 51%), the highest and lowest rainfalls were estimated for Malam Jaba in Khyber Pakhtunkhwa province (~  1700 mm), and Nok-kundi in Balochistan province (~  50 mm), respectively. A gradual south-to-north increase in rainfall was spatially evident with an areal average of 455 mm, while long-term temporal rainfall evaluation showed a statistically significant (p = 0.05) downward trend for Sindh province. Additionally, downward inter-decadal regime shifts were detected for the Punjab and Sindh provinces (90% confidence). These results are expected to help inform environmental planning in Pakistan; moreover, the rainfall data produced using the optimal method has further implications in several aforementioned fields.

www.nature.com/scientificreports/ methods exist in the current literature for estimating unknown rainfall values, for this study, spatial interpolation methods were first systematically analyzed to assess the degree to which these different methods affect the unknown estimated rainfall values. Most of the existing studies in this field provided a limited comparison of interpolation methods, relying on only one or a few selected error statistics for selecting the optimal method. In this study, nearly all generally known interpolation methods were analyzed; further, different models within these methods were considered in order to discern the most optimal method for interpolating rainfall data in the study area. Based on a comprehensive literature review, we selected the interpolation techniques under two different categories (i.e., deterministic and stochastic) 21 . For the stochastic category, different semi-variogram models were also considered (i.e., Circular, Spherical, Exponential, Gaussian, Hole effect, K-Bessel, and J-Bessel), as they are known to significantly influence the prediction of unknown climate variable values. The codes for these methods are provided in Fig. 1, which will be used hereinafter. In this study, cross-validation focusing range parameter estimation was used to optimize the semi-variogram models and associated parameters such as nugget, sill, and range 21 . It should be noted that the area interpolation method was excluded from this comparison, as that method requires data assigned to polygons (areas) and our data were collected from individual stations 34 . The deterministic category represents four methods, including inverse distance weighted (IDW), global polynomial interpolation (GPI), local polynomial interpolation (LPI), and radial basis functions (RBF). In contrast, the stochastic category presents five methods including ordinary kriging (OK), simple kriging (SK), universal kriging (UK), empirical Bayesian kriging (EBK), and empirical Bayesian kriging regression prediction (EBKRP). www.nature.com/scientificreports/ The common objective of all the selected interpolation methods is to estimate the unknown value of Ẑ at the point x 0 . This is given as: where Z(x i ) represents the measured value at point x i , n is the total number of existing data points, and i is the weight function allocated to data points 35 . The details of the different methods selected for the comparison are provided in the following sub-sections and can be found at: https:// bit. ly/ 3k2Io 5a.
Deterministic methods. Inverse distance weighting (IDW) is one of the most widely used, exact, and quick local interpolation methods. This method requires no subjective assumptions or pre-modelling in selecting the semivariogram model, giving it advantage over other methods, particularly kriging. Under this method, the values are estimated using a linear combination of the value at the sampled point weighted by an inverse function of the distance between the two points. This method assumes that the observation points which are closer to the prediction points are more similar as compared to more distant points. We calculate the weights using: where d i represents the distance between the measured point x i and the predicted point x 0 , n represents the total number of measured observations used, and p represents a power parameter. The power parameter ( p ) defines the decrease in the weight as the distance increases. The power parameter for IDW used in our study was 3.41, after optimizing the model. The searching neighborhood shape is selected in such a way that all observations are used for the prediction.
Global polynomial interpolation (GPI) is a global, smooth (inexact), and deterministic trend-surface approach, ideal for addressing the need to make few decisions in the context of model parameters. This deterministic method has an underlying supposition that the fitting pattern in each predicted continuous surface represents underlying, gradually varying surface trends in the area. While the order of the polynomial defines the shape of the resultant surface, the first-order polynomial functions were used, as this gave a relatively large value (63) of the exploratory trend surface analysis (ETSA) after optimizing the model. The larger the value of ETSA, the more local the interpolation. Moreover, this method results in a smoothly varying surface using low-order polynomials. The simplest form of the first-order polynomial also known as linear polynomial is given as: in which Z x i , y i represents the datum at x i , y i location, β the parameters, and ε x i , y i represents random error.
Local polynomial interpolation (LPI) is a more flexible method than GPI, being a quick, local, and smooth trend-surface approach. Rather than adjusting a polynomial over the entire area, such as is done in GPI, the LPI adjusts different polynomials overlapping within a specified neighborhood. To make the comparison similar to the GPI, the first-order polynomial was used. This method is capable of capturing local variations in the data and can adapt to non-stationary and heterogeneous datasets. This local capturing is done using a moveable "window"-fitting local trends. The surface value, µ 0 x i , y i , at the center of this window is computed at each point. For first-order polynomial it can be represented as: and so on. The parameters β are estimated when the center point as well as the window moves in space.
Radial basis function (RBF) is a form of artificial neural network based on a structure that is three-layer feedforward (i.e., input-layer, hidden-layers, and output-layer) 36 . The interpolators covered by the RBF include a thin-plate spline, tension-based spline, regularized spline, and multi-quadric function. Splining can be visualized by fitting a bendable flexible rubber sheet through the sampled observations while minimizing the total curvature of the surface. All of these interpolators use a basic equation that depends on the distance between the measured and predicted points 37 . Under RBF, the predictor is a linear combination of the basis function, which is: in which Ẑ RBF (x 0 ) is estimated using the sum of n radial basis functions ϕ having different centers c i and weighted as per the coefficients i .
Geo-statistical methods. Simple kriging (SK) is a flexible interpolator in nature and can be both smooth and exact. Therefore, it results in various surfaces such as probability, standard error, and prediction surfaces. SK is based on the kriging estimator: www.nature.com/scientificreports/ where i represents the kriging weight, n is the total measured points, Z(x i ) is the measured variable value at a given data point i , and µ represents a known stationary mean, also called the trend component. µ is calculated as the mean of the data and is assumed to be constant over the entire study area. The weight i is derived using a semi-variogram or covariance function. In this study, a semi-variogram was used because of its wide application in different interpolation approaches, and was estimated using: where γ (h) represents the semi-variance and h represents the distance between the measured and predicted data points. It should be noted that various semi-variogram models were used in this study to examine how selecting a different model influences the predictions. These models included circular, spherical, exponential, Gaussian, hole effect, K-Bessel, and J-Bessel. In SK, it is assumed that the trend component is an exactly known constant for the whole study area, which is approximated using the average value of the measured data, µ(x 0 ) = µ , such that: Ordinary kriging (OK) is also described as the acronym BLUE, representing "best linear unbiased estimator" 38 . It is given as: The only difference between OK and SK is that in OK, µ (the unknown trend constant) needs to be approximated. One of the key considerations of OK is that it assumes that the mean values remain constant over the whole are to be interpolated.
Universal kriging (UK) is also known as regression kriging, kriging with a trend, and external drift-based kriging. UK is also known as a multivariate extension of OK, which uses a higher deterministic or linear trend function µ(x i ) rather than relying on a constant trend function µ . In this case, the local trend function is given as: It should be noted that an exponential kernel function was used in this study as a trend function, as it results in the most satisfactory results 21 .
Empirical Bayesian kriging (EBK) is a robust and straightforward interpolation technique that requires minimal interactive modeling. In this geo-statistical interpolation technique, the most difficult aspects of a kriging model building are automated. This implies that, rather than adjusting the parameters to receive accurate results manually, the EBK computes the parameters automatically using a sub-setting and simulation procedure. The key difference between EBK and other kriging approaches is that EBK accounts for the errors caused by the estimation of the underlying semi-variogram. The other kriging approaches underestimate the prediction standard errors because they do not account for the semi-variogram's uncertainty. It is noted that EBK is more of an algorithm based on 6 different steps (see Gribov and Krivoruchko 2019 for further details) designed to automate the most difficult aspects of building a valid kriging model through a process of sub-setting and simulations. It is based on two different geo-statistical models (i.e., linear mixed model and intrinsic random function kriging). Readers are encouraged to see 36 for further details on this.
Empirical Bayesian kriging regression prediction (EBKRP) is a relatively new geo-statistical interpolation approach. It is an advanced form of EBK and considers different additional explanatory indicators (in raster format) that are known to influence the estimation of the dependent variable-acting as prior information. In this approach, regression analysis is coupled with the kriging method to make the interpolations more precise than the ones estimated by kriging and regression alone. This approach is a hybrid interpolation method that uses simple kriging (Eq. 4) and ordinary least squares (OLS) regression. The dependent variable (rainfall in this case) is approximated based on kriging and regression models by separating the estimation of the average variable value and an error term. This can be expressed as Dependentvariable = avarage + error . Here, the difference between OLS and kriging approach is that the main emphasize in OLS is to model average whereas, kriging models error through a semi-variogram. In EBKRP, both regression and a semi-variogram models for average and error term, respectively, are modeled simultaneously. This simultaneous operation on both terms results in more precise prediction of the variable as compared with individual application. In this study, elevation data was used as an explanatory variable due to its influence on rainfall. Further, a 30-m spatial resolution digital elevation model-based raster was used as an input parameter for the dependent variable.

Comparing interpolation methods based on multiple cross-validation parameters. Different
parameters of cross-validation (a well-known statistical approach to assess the preciseness of the interpolated data) were used to evaluate the performance of the different interpolation methods. In cross-validation, the performance of interpolation is determined by iteratively removing an observed value from the dataset and re-estimating it using the remaining values. An error is determined by estimating the difference between each observed (measured) and predicted value. The cross-validation statistics performed on these errors are expected to be reasonable evaluation estimators for comparing the different interpolation models and forming the basis of the method selection process. Based on extensive literature review, we focus on six cross-validation methods: www.nature.com/scientificreports/ mean error (ME), root mean square error (RMSE), Pearson R 2 (R2), mean standardized error (MSE), root mean square standardized error (RMSSE), and average standard error (ASE) [17][18][19][20][21]39,40 . Few studies have included all six cross-validation parameters to determine the optimal interpolation technique due to the complexity and higher likelihood of human error. Most studies have used only one or two parameters to determine the most suitable interpolation technique, with RMSE and R2 being the most widely used 30,41,44 .
To facilitate this selection process, an appropriateness index (AI) was computed based on the underlying cross-validation indicator/parameters and their contribution towards the suitability of a certain interpolation method. These validation methods are considered to be positively contributing if the increase in the indicator value results in the interpolation method being more appropriate than others, and vice versa. Among these parameters, an increase in the parameter value (for all parameters except R 2 ) indicates that the interpolation technique is less suitable. To compute the AI, the values of all the parameters were normalized to non-dimensional using a minimum-maximum normalization method [45][46][47] . For the normalization of ME, RMSSE, ASE, RMSE, and MSE, the following was used: due to their negative contribution towards the suitability. For R2, the following was used: because of its positive contribution towards suitability. Here, X j represents the parameter value for the interpolation method j. This approach helps to distribute the values of each parameter between 1 and 0, making them non-dimensional, where a value closer to 1 indicates that the technique is more suitable, and vice versa. Based on these normalized values of the cross-validation parameters, the AI was computed using the following equation: where AI is the appropriateness index, R is the normalized value of the cross-validation parameter for interpolation technique i, and n is the total number of parameters considered for the computation of the AI. It should be noted that because the EBK and EBKRP were identified as the most optimal methods, a continuous ranked probability score (CRPS) was further considered, as this score is only available for these two methods in ArcGIS. The CRPS is known to be a good evaluating scoring rule of probabilistic forecasts in the context of a univariatequantity 39 . The CRPS is oriented negatively, implying that, the smaller the values, the better the performance of the method. It is calculated as: where P represents the cumulative distribution function of the density forecast and x represents the observed rainfall (normalized). The values of CRPS inside 90% and 95% confidence intervals were used to compare the methods. The details of all cross-validation parameters are presented in Table 1.

Temporal analysis of rainfall in Pakistan.
To analyze the temporal rainfall trends in Pakistan, the wellknown Mann-Kendall test and Sen's Slope estimator were selected for long-term assessment 46 . The Mann-Kendall test is given by: where q is the number of tied groups and t p is the number of data values in the "pth" group. The values of S and VRS(S) were used to compute the test statistic "Z" as follows: The two-tailed test at 0.001, 0.01, 0.05 and 0.1 levels of significance was used to detect a positive or negative value of "Z. " The null hypothesis was rejected if the absolute value of "Z" was greater than Z 1-α/2 , where Z 1-α/2 was obtained from the standard normal cumulative distribution tables.
Further, Sen's non-parametric method was used to estimate the true slope of an existing trend as change per unit of time (year, in this case). Once the trend was determined, Sen's slope estimation was used to calculate the magnitude of the trend slope, which is: where t refers to the year, Q refers to the trend slope (the tendency is more obvious when Q is greater), and B is constant. www.nature.com/scientificreports/ Additionally, a regime shift detection algorithm was applied to assess the inter-decadal rainfall trends in Pakistan 48,49 . This algorithm is based on the Student's t-test, which is sensitive to deviation in the successive running averages of the variable values under a given cut-off length (10 years, in this case). These regimes shift (inter-decadal variations) were analyzed with 90% confidence. Both the long-and short-term evaluations were conducted at national and sub-national (provincial) levels in order to provide more detailed insights.

Results and discussion
Optimal interpolation method for rainfall interpolation in Pakistan. The statistics of cross-validation and the results on the AI are presented in Table 2. A two-step evaluation approach was used to select the most optimal interpolation technique. In the first step, the AI was computed based only on three cross-validation parameters (i.e., MR, RMSE, and Pearson R 2 , represented by AI-3 in Table 2), as these parameters were available for all possible interpolation techniques including the IDW, GPI, and RBF. Based on the ranking of all interpolation methods, it was evident that none of the typical techniques were optimal for Pakistan. Based on the AI-3 ranking, the EBKRP method was identified as the best choice with an AI-3 value of 1. The order of suitability of the interpolation techniques according to AI-3 was EBKRP > UK-k > EBK > IDW > OK-k > OK-e > RBF > OK-g > OK-s > OK-c > OK-j > UK-h > UK-e > SK-k > UK-j > SK-e > OK-h > SK-s > UK-c > UK-s > UK-g > LPI > SK-c > S K-g > SK-h > SK-j > GPI. This indicates that if only the ME, RMSE, and R2 parameters of the cross-validation are considered, the IDW would perform best among all the deterministic interpolation techniques with an AI-3 value of 0.94 and an overall rank of 4. However, it would not perform as well compared to the geo-statistical and stochastic interpolation techniques.
Given that IDW, GPI, and RBF were not suitable interpolation techniques for the rainfall data in Pakistan, the AI was re-computed based on all available parameters of the cross-validations (six, i.e., ME, RMSE, R2, MSE, RMSSE, and ASE), termed as AI-6 in this study. AI-6 also indicated the EBKRP (AI-6 = 0.9425) as the most appropriate interpolation technique followed by EBK (AI = 0.9379). The order of suitability of the interpolation techniques according to AI-6 was EBKRP > EBK > OK-k > OK-j > OK-g > OK-c > OK-s > OK-e > SK-e > SK-k > U K-j > SK-s > UK-k > UK-e > LPI > UK-g > SK-g > SK-c > UK-s > SK-h > UK-c > OK-h > SK-j > UK-h. Results from both AI-3 and AI-6 show that among all kriging approaches, the ordinary kriging-based techniques performed comparatively well. As EBK and EBKRP were the top two interpolation methods for interpolation based on AI-6, we further compared them on the basis of CRPS using confidence intervals of 90% and 95%. The CRPS values for EBK and EBKRP in the 90% confidence interval were 91.46 and 89.02, respectively. Similarly, the CRPS values for EBK and EBKRP in the 95% confidence interval were 93.90 and 92.68, respectively. Because the CRPS is negatively oriented, smaller values correspond to better methods. Hence, CRPS also showed that EBKRP is the most optimal technique for interpolating rainfall in Pakistan, as the values of CRPS for EBKRP under both confidence intervals (i.e., 90% and 95%) were smaller than those of EBK.
The scatter-plots for measured vs. predicted rainfall show inconsistencies among the different interpolation methods used to interpolate the rainfall in the study area (Fig. 2). This inconsistency also existed for a single interpolation method when different semi-variogram models (i.e., spherical, exponential, circular, Gaussian, Table 1. Details on multiple cross-validation parameters used to compute the appropriateness index (AI).

Cross-validation parameter Description Formula
Mean Error (ME) The Mean Error (ME) represents the arithmetic average of all the estimated errors in the interpolation. It also tells the direction (average) of the estimated errors. The positive bias represents an overestimation of the variable whereas the under-estimation is represented by negative bias. This parameter is used as an accuracy indicator because positive and negative estimations counteract each other resulting the ME comparatively lower than the actual error 40

Root mean square error (RMSE)
The Root mean square error (RMSE) is the most widely and commonly used cross-validation parameter. It holds the same measuring unit as the predicted value (e.g. inches)

Root mean square standardized error (RMSSE)
It is ideal to have the Root mean square standardized error (RMSSE) close to one. If the RMSSE > 1, then it indicates a general under-estimation in the inconsistency of the estimated/ predicted variable. If the RMSSE < 1, then it indicates a general overestimation in the inconsistency of the estimated/predicted variable It measures the correlation (linear) between the measured and the predicted covariance (rescaled). The resultant values range between − 1 and 1 where the value closer to − 1 indicates a strong negative correlation and the value closer to 1 indicates a strong positive correlation. The value closer to zero represents a week correlation The Pearson R 2 is widely used to measure the goodness of fit for different interpolation methods. However, as noted by Li and Heap, it could often be misleading and hence, one should be careful in using this single parameter as an evaluation criterion www.nature.com/scientificreports/ hole-effect, k-Bessel, and j-Bessel) were employed, as shown in Fig. 2. Hence, care should be taken in selecting not only the interpolation method but also the semi-variogram model to determine the optimal interpolation method. Notably, most points where rainfall prediction was under-estimated were located in the northern regions of Pakistan. This underestimation could possibly be a result of the complex physiography of that region (i.e., mountainous terrain), making the estimation more difficult in comparison to the plain regions such as the southern Punjab and Sindh provinces. Using the scatter-plot for the EBKRP interpolation method, it was estimated that the largest under-estimation in predicted rainfall was for Hunza station (− 55%) in northern Pakistan ( Supplementary Fig. S1). The optimal interpolation technique for rainfall in Pakistan identified in our study is different from that of 41 , who concluded that the universal kriging with hole effect model (UK-h) is best for the Indian Himalayas 16 . One possible reason for this different result could be the used comparison approach along with the scale of their study, which was substantially smaller than that of the present study. Overall, the identified optimal interpolation method from this study is important for Pakistan and could be adopted by institutions, researchers, and professionals responsible for national-level rainfall data archiving and inventory management (i.e., Pakistan Meteorological Department) to support the consistency of rainfall data. This empirically validated interpolation technique can result in more reliable rainfall information as compared with typical in-practice interpolation methods such as IWD or simple kriging, which can ultimately be utilized in further research.

Spatial distribution of rainfall in Pakistan.
The predicted rainfall at all observation stations using different interpolation methods showed clear spatial heterogeneity throughout the study area. Most of the stations with comparatively larger rainfall (std. dev. > 1.5) belonged to the Khyber Pakhtunkhwa (KPK) province and Gilgit Baltistan region (Fig. 3). The final normal rainfall map of Pakistan using the EBKRP method showed clear spatial variability in the rainfall from the south to the north (Fig. 3). Almost all the interpolation methods showed a similar general distribution and pattern of the rainfall in Pakistan; indicating an increase in the rainfall from south to north (Fig. 4). However, spatial variations among different interpolation methods were evident with a clear difference between the deterministic and stochastic methods (e.g., see the spatial demarcation of brown, green, and blue shades in Fig. 4). Based on the optimal interpolation method of EBKRP, the estimated Table 2. Cross-validation statistics from all interpolation methods and the appropriateness index (AI). AI-3 represents the first step of evaluation whereas AI-6 represents the second step of evaluation of the optimal interpolation technique. The values of cross-validation parameters are normalized using the minimummaximum normalization method as discussed in section "Deterministic methods".  www.nature.com/scientificreports/ normal annual rainfall in Pakistan ranged from ~ 50 to ~ 1700 mm throughout the study area with an areal average rainfall of 455 mm (Fig. 5). From a cross-regional perspective, the lowest rainfall (ranging from 48-48 to 162.85 mm) in the study area was experienced by the south-western areas in Balochistan province and the central region in Sindh province. Results indicate that the lowest rainfall was in Nok-kundi, Balochistan (~ 50 mm, within 95% confidence interval). The northwestern regions of Balochistan (i.e., Quetta) experienced comparatively larger rainfall (presented as green shades) than southern Balochistan and most of Sindh province (presented as brown shades in Fig. 5). One possible reason for the larger rainfall in the Quetta region could be its higher altitude compared to other southern regions (Fig. 1). However, further assessment in this regard is needed to analyze the correlation between altitude and rainfall occurrence. This could be a useful research question in the context of Pakistan to further improve our understanding of local spatial inconsistencies in rainfall patterns. The inner-Sindh province includes the areas of Rohri, Jacobabad, Sukkar, Larkana, Nawabshah, and Dadu, which were among the lowest rainfall areas. This region is highly water-scarce, thus the lower rainfall coupled with increasing population and decreasing percapita water capacity could result in worsening consequences without proper planning and management 1,50,51 . To address this, the normal rainfall data produced in this study could assist in this regard to effectively improve water www.nature.com/scientificreports/ resources planning and management for Pakistan, which is among the top fifteen countries facing extremely high baseline water stress 32 . According to the results, most regions in Punjab experienced moderate to high rainfall ranging from ~ 275 to 830 mm, with southern Punjab experiencing lower rainfall compared to northern Punjab (Fig. 4). The highest rainfall in Pakistan was experienced by the northern regions of KPK and Gilgit Baltistan (~ 1000-1700 mm). Among all the regions, the highest rainfall was estimated for the Malam Jabba region in KPK province (~ 1700 mm). This finding is in agreement with the results of 4 who presented a localized rainfall assessment in Pakistan's northwestern regions (a < 10 meteorological stations-based assessment).

Temporal rainfall trends at national and sub-national levels. Results from the Man-Kendall test
and Sen's slope estimator showed a decreasing monotonic trend for Pakistan overall, as well as in all provinces, as indicated by the negative coefficient values ( Table 3). The Kendall coefficient shows the highest downward trend for Sindh province (coefficient = − 2.28), which is statistically significant at p = 0.05. Punjab was the second highest downward trend in the province with a coefficient value of − 1.54; however, this downward trend was      Fig. S3). This upward inter-decadal regime shift contrasts with the downward long-term trend in rainfall for these provinces, thus illustrating the significance of long-term and short-term rainfall assessments. At the national level, Pakistan experienced an overall downward regime shift in rainfall (in 2012). Sindh province is among the most severely drought-affected provinces in Pakistan (see https:// bit. ly/ 36PuU To). Based on the present analysis, the lower rainfall determined for most parts of the province (i.e., Rohri, Sukkar, Larkana, and Jacobabad, Fig. 5), as well as the statistically significant long-term decrease (Table 3) and inter-decadal downward regime shift in the rainfall (Supplementary Fig. S3), could further stress the region in terms of drought and water availability, affecting millions of people in the near future. Similarly, the regions in southern Punjab (i.e., Bahawalpur, D.G. Khan, and Khanpur) were also found to receive very little rainfall (Fig. 5), thus a decreasing trend in the rainfall could potentially have severe impacts on agriculture as well as water resources of the province. If the management of dams and reservoirs in Pakistan could be improved, nearly 145 million-acre feet water could be conserved annually, based on rainfall observed each year during the monsoon season. The results presented here reflect the need for better water resources management in the Punjab and Sindh provinces, which have the highest populations and are the largest contributors to Pakistan's total agricultural yield as well as the national Gross Domestic Product.

Conclusions
In the context of environmental planning and management, this study bridges the existing knowledge gap regarding the spatial distribution and patterns as well as temporal trends (long-and short-term) of rainfall in Pakistan during the past half-century. For spatial assessment of rainfall, we produced the first continuous normal rainfall dataset (spatial resolution of 11 km) using the most optimal interpolation method. For this purpose, 27 interpolation techniques under two broader interpolation categories (i.e., deterministic and stochastic/geostatistical) were rigorously compared using an AI, which was computed based on multiple cross-validation evaluation criteria. It was concluded that the EBKRP approach is the most optimal method (AI-6 = 0.9425) to interpolate rainfall in Pakistan and should be adopted in the future for national-scale rainfall interpolations. Based on subsequent EBKRP analysis, a spatial variation in the rainfall was identified from south to north across Pakistan, while the highest rainfall was estimated for Malam Jaba in KPK province (~ 1700 mm), and the lowest rainfall was estimated at Nok-kundi in Balochistan province (~ 50 mm). The temporal analysis of rainfall in Pakistan shows an overall decreasing trend at the national and subnational levels. Sindh province experienced a statistically significant decreasing trend in rainfall during the study period, and a recent downward regime shift in rainfall was detected (90% confidence). Similarly, the highest decreasing rate in the rainfall was estimated for Punjab province (-3 mm/year), which is Pakistan's most populated province and is the largest contributor to the national GDP. The identified decreasing rainfall could escalate the likelihood of droughts, thereby affecting the agricultural sector in an agrarian country already prone to the effects of global warming.
The normal annual rainfall map produced in this study at the national level could be used as an input to hydrological models, integrated to assess flood risk, and research related to agriculture such as crop yield sensitivity to climate change. In Addition, the data could be used in further assessments contributing towards understanding the hydrological processes in Pakistan and aiding in water resources planning and management. Such studies could be carried out in the future. This can provide broader opportunities to reduce the baseline water stress. Additionally, the AI used in this study could also be calculated at different subnational scales to produce similar products at the provincial or district levels in order to produce higher-resolution studies in Pakistan, and beyond. Moreover, the comparison approach presented here is applicable to compare the interpolation methods in diverse scientific areas such as climatology, earth sciences, environmental sciences, and epidemiology, reflecting that the application of the proposed AI is substantially broad. Additionally, as two different wind systems (i.e., Monsoon in summer and Westerly wind system in winter season) are primarily responsible for rainfall in Pakistan, it would be interesting to see, via approach presented here, if the optimal interpolation method for these two systems vary. This could be a potential question for future research.
The authors do acknowledge the limitations of the study at this point. Firstly, the data to compute total annual rainfall for the sake of normal rainfall distribution are mean monthly observations. The use of daily data could result in more comprehensive information and may allow capturing further in-depth details (i.e., extremes) in the temporal analysis. However, this might increase the computational and time costs significantly if analyzed on such large spatial and temporal scales. Similarly, the distribution of rain gauges is not uniform across the study area resulting in sparse gauges in some areas as compared to others (i.e., Balochistan and Northern areas, respectively). No doubt that the EBKRP performs well to predict the rainfall at unknown points, the installation of more observation stations in the regions with sparse distribution of rain gauges is recommended in the context of comprehensive data collection.

Data availability
The data on rainfall observations for all stations in Pakistan are available online (http:// www. pmd. gov. pk/), while the data for the digital elevation model (DEM) used in this study are also available online (https:// earth explo rer. usgs. gov/). The final normal rainfall distribution gridded data based on the EBKRP method is available from the corresponding authors upon request.