## Background & Summary

Knowledge about the climatological probability distribution of precipitation (P) is essential for a wide range of scientific, engineering, and operational applications1,2,3. In the scientific realm, P distributional information is used, for example, to evaluate P outputs from climate models (e.g., Dai4 and Bosilovich et al.5) and to reduce systematic biases in gridded P datasets (e.g., Zhu and Luo6, Xie et al.7, and Karbalaee et al.8). In the engineering sector, information on the upper tail of the P distribution is crucial for planning, design, and management of infrastructure (e.g., Lumbroso et al.9 and Yan et al.10). In an operational context, reliable information on the P distribution can be used to improve flood and drought forecasting systems (e.g., Cloke and Pappenberger11, Hirpa et al.12, and Siegmund et al.13).

Past studies deriving (quasi-)global climatologies of P distributional characteristics typically relied on just one of the three main sources of P data (satellites, reanalyses, or gauges), each with strengths and weaknesses:

1. Ricko et al.14 used the TMPA satellite P product15, while Trenberth and Zhang16 and Li et al.17 used the CMORPH satellite P product7,18 (both available from 2000 onwards) to estimate climatological P characteristics. Satellite P estimates have several advantages, including high spatial (~0.1°) and temporal (≤3-hourly) resolution and near-global coverage (typically within 60°N/S)19,20. However, satellite P retrieval is confounded by complex terrain21,22,23 and by surface snow and ice22,24, and there are major challenges associated with the detection of snowfall25,26.

2. Courty et al.27 used P estimates between 1979 and 2018 from the ERA5 reanalysis28 to derive probability distribution parameters of annual P maxima for the entire globe. Reanalyses assimilate vast amounts of in situ and satellite observations into numerical weather prediction models29 to produce a temporally and spatially consistent, continuous record of the state of the atmosphere, ocean, and land surface. However, reanalyses are affected by deficiencies in model structure and parameterization, uncertainties in observation error statistics, and historical changes in observing systems5,30,31.

3. Sun et al.32 calculated P occurrence estimates directly from gauge observations, while Dietzsch et al.33 used the gauge-based interpolated GPCC dataset34 to estimate P characteristics for the entire land surface. Gauges provide accurate estimates of P at point scales and therefore have been widely used to evaluate satellite- and reanalysis-based P products (e.g., Hirpa et al.35, Zambrano-Bigiarini et al.36, and Beck et al.37). However, only 16% of the global land surface (excluding Antarctica) has a gauge located at <25-km distance38,39, gauge placement is topographically biased towards low elevations40,41, and there has been a significant decline in the number of active gauges over recent decades42. Additionally, interpolation of gauge observations generally leads to systematic overestimation of low intensity events and underestimation of high intensity events43,44.

Here, we introduce the Precipitation Probability DISTribution (PPDIST) dataset, a collection of high-resolution (0.1°) fully global climatologies (1979–2018) of eight P occurrence indices (using different thresholds to discriminate between P and no-P) and nine peak P intensity indices (using different return periods) at daily and 3-hourly time-scales (Figs. 1b and 2b). The climatologies were produced using neural network ensembles trained to estimate the P indices using quality-controlled P observations from an unprecedented database consisting of 93,138 daily and 11,881 hourly gauges worldwide (Fig. 1a and Supplement Fig. S2a). Ten global P-, climate-, and topography-related predictors derived from all three main P data sources — satellites, reanalyses, and gauges — were used as input to the neural networks to enhance the accuracy of the estimates (Table 1). Ten-fold cross-validation was used to obtain an ensemble of neural networks, evaluate the generalizability of the approach, and quantify the uncertainty. The performance was assessed using the coefficient of determination (R2), the mean absolute error (MAE), and the bias (B; Fig. 3). The neural networks yielded mean validation R2 values (calculated using the validation subsets of observations) of 0.78 and 0.86 for the daily and 3-hourly P occurrence indices, respectively, and 0.72 and 0.83 for the daily and 3-hourly peak P intensity indices, respectively (Fig. 3 and Supplement Fig. S4). The neural networks outperformed current state-of-the-art reanalysis (ERA528) and satellite (IMERG45,46) products for all P indices and performance metrics (Fig. 3). ERA5 consistently overestimated the P occurrence (Fig. 1c), while IMERG tended to overestimate the peak P intensity (Fig. 2d). The uncertainty in the neural network-based estimates of the P indices was quantified globally and was generally higher in sparsely gauged, mountainous, and tropical regions (Fig. 4). 
Overall, these results suggest that the neural networks provide a useful high-resolution, global-scale picture of the P probability distribution. It should be noted that although the climatologies are provided as gridded maps, the estimates represent the point scale rather than the grid-box scale, due to the use of gauge observations for the training43,47.

Since the eight PPDIST P occurrence climatologies were similar, we only present and discuss results for the >0.5 mm d−1 daily P occurrence climatology (Fig. 1b), although the others can be viewed by accessing the dataset. The P occurrence was particularly high (>0.5 mm d−1 on >50% of days) in windward coastal areas of Chile, Colombia, the British Isles, and Norway, along the Pacific coast of North America, and over parts of Southeast Asia (Fig. 1b)16,48,49,50,51. Conversely, the P occurrence was particularly low (>0.5 mm d−1 on <1% of days) in the Atacama and Sahara deserts. At a daily time-scale using a 0.5 mm d−1 threshold, P was found to occur 34.4%, 24.9%, and 38.9% of days, on average, over the global, land, and ocean domains, respectively (Fig. 1b). At a 3-hourly time-scale using a 0.1 mm 3 h−1 threshold, P was found to occur 12.2%, 7.4%, and 14.3% of the time, on average, over the global, land, and ocean domains, respectively (Supplement Fig. S2b). These estimates are in line with previous estimates derived from the satellite-based CMORPH product16 and from the satellite-, reanalysis-, and gauge-based MSWEP product52. The higher average P occurrence over ocean than over land reflects the unlimited water supply53. The PPDIST > 0.5 mm d−1 daily P occurrence climatology (Fig. 1b) exhibits reasonable visual agreement with similar climatologies from previous studies despite the use of different data sources, spatiotemporal resolutions, and thresholds to discriminate between P and no-P (Sun et al.32, their Fig. 1; Ellis et al.54, their Fig. 3; Trenberth and Zhang16, their Fig. 4; and Beck et al.52, their Fig. 8).

Since the nine PPDIST peak P intensity climatologies were similar, we only present and discuss results for the 15-year return-period daily P climatology (Fig. 2b). Peak P intensities were markedly higher at low latitudes (<30°N/S; Fig. 2b), due to the higher air temperatures which lead to more intense P in accordance with the Clausius-Clapeyron relationship55,56,57. The highest 15-year return-period P intensities (>200 mm d−1) over land were found in parts of Central America, India, and Southeast Asia, along the western equatorial coast of Africa, and over oceans in the intertropical convergence zone (Fig. 2b)17,48,58,59. Conversely, the lowest 15-year return-period P intensities (<50 mm d−1) were found in the desert regions of the world, at high latitudes where frontal P dominates, and over southeastern parts of the Atlantic and Pacific oceans where semi-permanent anticyclones prevail54,60. The PPDIST 15-year return-period daily P climatology (Fig. 2b) exhibits reasonable visual agreement with a maximum one-day P climatology derived by Dietzsch et al.33 (their Fig. 5d) from the gauge-based GPCC Full Data Daily dataset34, as well as with a 3.4-year return-period 3-hourly P climatology derived by Beck et al.52 (their Fig. 7) from the satellite-, reanalysis-, and gauge-based MSWEP dataset52.

## Methods

### Gauge observations and quality control

Our initial database of daily P gauge observations comprised 149,095 gauges worldwide and was compiled from (i) the Global Historical Climatology Network-Daily (GHCN-D) dataset (ftp.ncdc.noaa.gov/pub/data/ghcn/daily/)61 (102,954 gauges), (ii) the Global Summary Of the Day (GSOD) dataset (https://data.noaa.gov; 24,730 gauges), (iii) the Latin American Climate Assessment & Dataset (LACA&D) dataset (http://lacad.ciifen.org/; 231 gauges), (iv) the Chile Climate Data Library (www.climatedatalibrary.cl; 716 gauges), and (v) national datasets for Brazil (11,678 gauges; www.snirh.gov.br/hidroweb/apresentacao), Mexico (5398 gauges), Peru (255 gauges), and Iran (3133 gauges). Our initial database of hourly P gauge observations (resampled to 3-hourly) comprised 14,350 gauges from the Global Sub-Daily Rainfall (GSDR) dataset62 produced as part of the INTElligent use of climate models for adaptatioN to non-Stationary hydrological Extremes (INTENSE) project63.

Gauge data can have considerable measurement errors and therefore quality control is important64,65,66. To eliminate long series of erroneous zeros frequently present in daily GSOD records67,68, we calculated a central moving mean with a one-year window (assigning a value only if at least half a year of values were present), and discarded daily and 3-hourly P observations with a coincident moving mean of zero. To eliminate long series of P without zeros, indicative of biased reporting practices or quality control issues, we calculated a central moving minimum with a one-year window (assigning a value only if at least half a year of values were present), and discarded daily and 3-hourly P observations with a coincident moving minimum greater than zero. We only used observations between January 1st, 1979, and December 31st, 2018, and discarded gauges with (i) < 3 years of remaining data, (ii) the largest daily P > 2000 mm d−1 or 3-hourly P > 500 mm 3 h−1 (approximately the maximum recorded 24-h and 3-h P, respectively69), (iii) the highest P value < 2 mm d−1 or <1 mm 3 h−1, or (iv) the lowest P value > 1 mm d−1 or >0.5 mm 3 h−1. The resulting database comprised 93,138 gauges with daily data (Fig. 1a) and 11,881 gauges with 3-hourly data (Supplement Fig. S2a). See Supplement Fig. S1 for record lengths of the daily and 3-hourly gauge data.
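The two rolling-window checks can be sketched with pandas as follows. The 365-row window and the 183-value minimum follow the one-year/half-year rule described above; the two-year example series is synthetic.

```python
import numpy as np
import pandas as pd

def quality_control(series: pd.Series) -> pd.Series:
    """Drop suspect daily P observations using the two rolling checks."""
    # Central one-year moving mean, assigned only if at least half a year
    # (183 of 365 days) of observations fall inside the window.
    roll_mean = series.rolling(365, center=True, min_periods=183).mean()
    # Central one-year moving minimum with the same availability rule.
    roll_min = series.rolling(365, center=True, min_periods=183).min()
    # Discard long runs of erroneous zeros (moving mean of zero) and long
    # runs without any zeros (moving minimum greater than zero).
    return series[~((roll_mean == 0) | (roll_min > 0))]

# Synthetic record: one year of spurious zeros followed by one year of
# plausible rain (wet on ~30% of days).
rng = np.random.default_rng(0)
dates = pd.date_range("2000-01-01", periods=730, freq="D")
rain = np.where(rng.random(365) < 0.3, 5.0, 0.0)
cleaned = quality_control(pd.Series(np.concatenate([np.zeros(365), rain]),
                                    index=dates))
```

After cleaning, the spurious-zero year is removed while the plausible year, which mixes wet and dry days, is retained.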

### Precipitation occurrence and peak intensity indices

We considered eight P occurrence indices (related to the lower tail of the P distribution) and nine peak P intensity indices (related to the upper tail of the P distribution), all calculated for both the daily and 3-hourly time-scale. The eight P occurrence indices measure the percentage of time with P using thresholds of 0.1, 0.15, 0.2, 0.25, 0.35, 0.5, 0.75, and 1 mm d−1 or mm 3 h−1 to discriminate between P and no-P. The P occurrence was calculated using the percentileofscore function of the scipy Python module70. We considered multiple thresholds because (i) different researchers and meteorological agencies adopt different definitions of “wet day” and “rain day”, (ii) the most suitable threshold depends on the application, and (iii) the estimation accuracy depends on the threshold16,71.
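A minimal sketch of the occurrence calculation, implemented here directly with numpy rather than with scipy's percentileofscore; counting only values strictly above the threshold as wet is an assumption of this sketch.

```python
import numpy as np

# Wet-step thresholds used by PPDIST (mm d-1 at the daily scale,
# mm 3 h-1 at the 3-hourly scale).
THRESHOLDS = [0.1, 0.15, 0.2, 0.25, 0.35, 0.5, 0.75, 1.0]

def occurrence_indices(p, thresholds=THRESHOLDS):
    """Percentage of time steps with P exceeding each threshold."""
    p = np.asarray(p, dtype=float)
    return {t: 100.0 * np.mean(p > t) for t in thresholds}

# Toy ten-day record: four dry days and six wet days.
record = np.array([0.0, 0.0, 0.0, 0.0, 0.3, 0.6, 1.2, 2.5, 5.0, 10.0])
occ = occurrence_indices(record)
# occ[0.5] -> 50.0 (five of ten days exceed 0.5 mm d-1)
```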

The nine peak P intensity indices measure the magnitude of daily or 3-hourly P events for return periods of 3 days, 7 days, 1 month, 3 months, 1 year, 3 years, 5 years, 10 years, and 15 years. The magnitudes were calculated using the percentile function of the numpy Python module72,73. Percentiles were calculated according to p = 100 − 100/(365.25 · R) for the daily time-scale and p = 100 − 100/(365.25 · 8 · R) for the 3-hourly time-scale, where p is the percentile (%) and R is the return period (years). Gauges with <3 years of data were discarded per the preceding subsection; in addition, the three highest return periods (5, 10, and 15 years) were only calculated for gauges with record lengths longer than 5, 10, and 15 years, respectively. We did not consider return periods >15 years because (i) the number of usable observations would decrease substantially, and (ii) the IMERG dataset, which was used as a predictor, spans only 20 years.
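The percentile-to-return-period mapping above can be sketched as follows; sub-annual return periods such as 7 days enter as fractional years (R = 7/365.25).

```python
import numpy as np

def return_period_intensity(p, R_years, timescale="daily"):
    """Empirical P magnitude for a given return period R (in years),
    using p = 100 - 100 / (steps_per_year * R)."""
    steps_per_year = 365.25 if timescale == "daily" else 365.25 * 8
    pct = 100.0 - 100.0 / (steps_per_year * R_years)  # target percentile (%)
    return np.percentile(np.asarray(p, dtype=float), pct)

# Ten years of stand-in daily values: the 1-year return level sits near
# the 99.73rd percentile of the record.
record = np.arange(3653.0)
one_year_level = return_period_intensity(record, 1)
```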

### Neural network ensembles

We used feed-forward artificial neural networks based on the multilayer perceptron74 to produce global maps of the P indices. Neural networks consist of interconnected neurons and can represent complex non-linear relationships between inputs and outputs. They have been successfully applied in numerous studies to estimate different aspects of P (e.g., Coulibaly et al.75, Kim and Pachepsky76, and Nastos et al.77). We used a neural network with one hidden layer composed of 10 nodes with the logistic sigmoid activation function. The weights were trained using stochastic gradient descent. To make the training more efficient, the inputs (i.e., the predictors; Table 1) were standardized by subtracting the means and dividing by the standard deviations of the global predictor maps. Additionally, the outputs (i.e., the P indices) were square-root transformed to reduce the skewness of their distributions, and standardized using the means and standard deviations of the observed P indices.

To obtain an ensemble of neural networks, evaluate the generalizability of the approach, and quantify the uncertainty, we trained neural networks for each P index using ten-fold cross-validation. For each cross-validation iteration, the observations were partitioned into subsets of 90% for training and 10% for validation. The partitioning was random, although each observation was used only once for validation. Since the validation subsets are completely independent and were not used to train the neural networks, they allow quantification of the performance at ungauged locations.
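The transform-and-cross-validate procedure can be sketched with scikit-learn, a stand-in for whatever library the authors used; the 500 synthetic "gauges", their 10 predictors, and the target relationship are all illustrative.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(42)

# Synthetic stand-ins: 500 "gauge" samples, 10 predictors (Table 1), and
# one observed P index per gauge.
X = rng.normal(size=(500, 10))
y_raw = np.abs(X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.2, size=500))

# Square-root transform the target to reduce skewness, then standardize
# predictors and target (the paper standardizes with global-map statistics;
# sample statistics serve as a stand-in here).
y = np.sqrt(y_raw)
y_mean, y_std = y.mean(), y.std()
y = (y - y_mean) / y_std
X = (X - X.mean(axis=0)) / X.std(axis=0)

ensemble, val_scores = [], []
for train_idx, val_idx in KFold(n_splits=10, shuffle=True,
                                random_state=0).split(X):
    # One hidden layer of 10 logistic nodes, trained with SGD, as described.
    net = MLPRegressor(hidden_layer_sizes=(10,), activation="logistic",
                       solver="sgd", learning_rate_init=0.05,
                       max_iter=1000, random_state=0)
    net.fit(X[train_idx], y[train_idx])
    ensemble.append(net)
    val_scores.append(net.score(X[val_idx], y[val_idx]))  # held-out R^2
```

Each observation lands in exactly one validation fold, so the ten held-out scores together cover the full sample once.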

We subsequently applied the trained neural networks using global 0.1° maps of the predictors, yielding an ensemble of ten global maps for each P index (one for each cross-validation iteration). The values were destandardized by multiplying by the standard deviation and adding the mean of the observed P indices, and back-transformed by squaring, after which we used the mean of the ensemble as the ‘best estimate’ and the standard deviation as an indication of the uncertainty. As a last step, we sorted the eight mean P occurrence estimates for each 0.1° grid cell to ensure that they decrease monotonically as a function of the threshold, and sorted the nine mean peak P intensity estimates to ensure that they increase monotonically as a function of the percentile. The uncertainty estimates were re-ordered according to the mean estimates.
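The destandardization, back-transformation, and monotonicity enforcement can be sketched as below; the array layout and function name are assumptions of this sketch.

```python
import numpy as np

def finalize_estimates(ensemble_maps, y_mean, y_std, increasing):
    """Destandardize, back-transform, and enforce monotonicity.

    `ensemble_maps` has shape (n_members, n_indices, n_cells) and holds
    the standardized, square-root-space estimates from the ensemble.
    """
    maps = (ensemble_maps * y_std + y_mean) ** 2  # undo both transforms
    best = maps.mean(axis=0)    # ensemble mean: the 'best estimate'
    spread = maps.std(axis=0)   # ensemble spread: the uncertainty
    # Sort along the index axis: occurrence must decrease with threshold
    # (increasing=False), peak intensity must increase with return period.
    order = np.argsort(best, axis=0)
    if not increasing:
        order = order[::-1]
    best = np.take_along_axis(best, order, axis=0)
    # Re-order the uncertainty estimates to match the sorted means.
    spread = np.take_along_axis(spread, order, axis=0)
    return best, spread

# Two ensemble members, three indices, one grid cell.
maps = np.array([[[1.0], [3.0], [2.0]],
                 [[1.0], [3.0], [2.0]]])
best, spread = finalize_estimates(maps, y_mean=0.0, y_std=1.0,
                                  increasing=True)
# best[:, 0] -> [1.0, 4.0, 9.0]
```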

### Predictors

Table 1 presents the predictors used as input to the neural networks. The predictors were upscaled from their native resolution to 0.1° using average resampling. To highlight broad-scale topographic features, the elevation predictor (Elev) was smoothed using a Gaussian kernel with a 10-km radius following earlier studies78,79. The Elev predictor was square-root transformed to increase the importance of topographic features at low elevation80. The mean annual air temperature predictor (MAT) was included because higher air temperatures are generally associated with higher P intensities56,81.

Mean annual P predictors (MAP1 and MAP2) were included due to the high correlations found between mean and extreme values of P14,82. To account for the uncertainty in mean annual P estimates, we considered two state-of-the-art climatologies: CHPclim V183 and WorldClim V284. Since the use of correlated predictors can increase training time85, we did not directly use both climatologies as separate predictors. Instead, we used square-root transformed mean annual P from WorldClim as predictor MAP1, and the difference in square-root transformed mean annual P between CHPclim and WorldClim as predictor MAP2. The square-root transformation was used to increase the importance of smaller values.
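In code, the two predictors amount to the following; the three-cell arrays stand in for the gridded WorldClim and CHPclim fields.

```python
import numpy as np

# Stand-ins for the gridded mean annual P climatologies (mm yr-1).
worldclim = np.array([300.0, 900.0, 2500.0])
chpclim = np.array([280.0, 950.0, 2400.0])

# MAP1: square-root transformed WorldClim mean annual P.
MAP1 = np.sqrt(worldclim)
# MAP2: the CHPclim-minus-WorldClim difference in square-root space,
# which avoids feeding two highly correlated predictors to the network.
MAP2 = np.sqrt(chpclim) - np.sqrt(worldclim)
```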

The ERA5 and IMERG predictors represent maps of the P index being estimated, derived from ERA5 and IMERG P data, respectively. They were added because they provide valuable independent information about the P distribution with (near-)global coverage. All predictors cover the entire land surface, with the exception of IMERG, which is often unavailable at latitudes higher than approximately 60°N/S46. This issue was addressed by replacing missing IMERG estimates of the P indices with equivalent estimates from ERA5, resulting in a duplication of ERA5 at high latitudes. P distributional characteristics associated with different durations often exhibit similar spatial patterns27,86. To allow the 3-hourly estimates to take advantage of the much better availability of daily P observations (Figs. 1a and 2a), we used the daily climatology of the P index being estimated as an additional predictor when producing the 3-hourly climatology.

### Performance metrics

We assessed the performance of the PPDIST dataset, the ERA5 reanalysis, and the IMERG satellite product in estimating the P indices using the gauge observations as reference. Prior to the assessment, the P indices of the products and the reference were square-root transformed to reduce the skewness of the distributions. For the PPDIST dataset, we calculated the average of the ten scores calculated using the ten independent validation subsets of observations (one for each cross-validation iteration). For ERA5 and IMERG, we simply calculated the scores using all available observations. The following three performance metrics were used:

1. The coefficient of determination (R2), which measures the proportion of variance accounted for87, ranges from 0 to 1, and has its optimum at 1.

2. The mean absolute error (MAE), calculated as:

$${\rm{MAE}}=\frac{\mathop{\sum }\limits_{i=1}^{n}\left|{X}_{i}^{e}-{X}_{i}^{o}\right|}{n},$$
(1)

where $${X}_{i}^{e}$$ and $${X}_{i}^{o}$$ represent estimated and observed values of the transformed P indices, respectively, i = 1, …, n denotes the observations, and n is the sample size. The MAE ranges from 0 to ∞ and has its optimum at 0. We used the MAE, instead of the more common root mean square error (RMSE), because the errors are unlikely to follow a normal distribution88,89.

3. The bias (B), defined as:

$$B=\frac{\mathop{\sum }\limits_{i=1}^{n}\left({X}_{i}^{e}-{X}_{i}^{o}\right)}{\mathop{\sum }\limits_{i=1}^{n}\left({X}_{i}^{e}+{X}_{i}^{o}\right)}.$$
(2)

B ranges from −1 to 1 and has its optimum at 0. Other widely used bias formulations, which contain the sum of only the observations in the denominator rather than the sum of both the observations and estimates90,91, were not used because they yield excessively high B values when the observations tend to zero.
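The three metrics can be sketched as follows. R2 is computed here as the squared Pearson correlation, an assumption consistent with the stated 0–1 range; MAE and B follow Eqs. (1) and (2).

```python
import numpy as np

def evaluate(est, obs):
    """R2, MAE, and bias B on square-root-transformed P indices."""
    e = np.sqrt(np.asarray(est, dtype=float))
    o = np.sqrt(np.asarray(obs, dtype=float))
    r2 = np.corrcoef(e, o)[0, 1] ** 2              # squared Pearson correlation
    mae = np.mean(np.abs(e - o))                   # Eq. (1)
    b = (e.sum() - o.sum()) / (e.sum() + o.sum())  # Eq. (2)
    return r2, mae, b

# A perfect product scores (1, 0, 0); quadrupling every estimate doubles
# it in square-root space, giving B = (2S - S) / (2S + S) = 1/3.
r2, mae, b = evaluate([4.0, 16.0, 36.0], [1.0, 4.0, 9.0])
```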

## Data Records

The PPDIST dataset is available for download at figshare92 and www.gloh2o.org/ppdist. The data are provided as a zip file containing four netCDF-4 files (www.unidata.ucar.edu/software/netcdf) with gridded observations, mean estimates, and uncertainty estimates of (i) the daily P occurrence (daily_occurrence_point_scale.nc), (ii) the daily peak P intensity (daily_intensity_point_scale.nc), (iii) the 3-hourly P occurrence (3hourly_occurrence_point_scale.nc), and (iv) the 3-hourly peak P intensity (3hourly_intensity_point_scale.nc). The grids have a 0.1° spatial resolution and are referenced to the World Geodetic System 1984 (WGS 84) ellipsoid. The observational grids were produced by averaging the observations if multiple observations were present in a single grid cell. The netCDF-4 files can be viewed, edited, and analyzed using most Geographic Information Systems (GIS) software packages, including ArcGIS, QGIS, and GRASS.
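Programmatic access is straightforward with xarray. The sketch below builds a small PPDIST-like dataset in memory; a real file would instead be opened with xr.open_dataset("daily_occurrence_point_scale.nc"), and the variable names used here ("mean", "uncertainty") are assumptions — inspect ds.data_vars on the released files for the actual names.

```python
import numpy as np
import xarray as xr

# Small in-memory stand-in for a PPDIST grid (0.1-degree lat/lon subset).
lat = np.linspace(50.05, 50.95, 10)
lon = np.linspace(-0.95, -0.05, 10)
ds = xr.Dataset(
    {
        "mean": (("lat", "lon"), np.random.rand(10, 10)),
        "uncertainty": (("lat", "lon"), 0.1 * np.random.rand(10, 10)),
    },
    coords={"lat": lat, "lon": lon},
)
# Nearest-neighbour extraction of the best estimate at a point of interest.
value = float(ds["mean"].sel(lat=50.52, lon=-0.48, method="nearest"))
```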

## Technical Validation

Figure 3 presents the performance of the PPDIST dataset, the ERA5 reanalysis, and the IMERG satellite product in estimating the daily P indices (Supplement Fig. S4 presents the performance in estimating the 3-hourly P indices). The PPDIST scores represent averages calculated from the ten independent validation subsets of observations (one for each cross-validation iteration). The PPDIST R2 values ranged from 0.76 to 0.80 (mean 0.78) for the daily P occurrence indices, 0.85 to 0.88 (mean 0.86) for the 3-hourly P occurrence indices, 0.44 to 0.84 (mean 0.72) for the daily peak P intensity indices, and 0.72 to 0.88 (mean 0.83) for the 3-hourly peak P intensity indices (Fig. 3 and Supplement Fig. S4). The 3-hourly R2 values were thus higher than the daily R2 values, likely because the 3-hourly gauge observations tend to be of higher quality62. The markedly lower R2 for the 3-day return-period daily P intensity reflects the fact that the majority (58%) of the observations were 0 mm d−1. For the ≥3-year return-period daily and 3-hourly P indices, the lower performance reflects the uncertainty associated with estimating the magnitude of heavy P events93,94,95. The PPDIST scores were superior to the ERA5 and IMERG scores for every P index and performance metric by a substantial margin. The PPDIST B scores were close to 0 for all P indices, indicating that the estimates do not exhibit systematic biases, apart from those already present in the gauge data due to wind-induced under-catch64,96 and the underreporting of light P events65. Overall, these results suggest that the PPDIST dataset provides a reliable global-scale picture of the P probability distribution.

The ERA5 and IMERG products yielded mean R2 values of 0.59 and 0.58 for the daily P occurrence indices, 0.69 and 0.49 for the 3-hourly P occurrence indices, 0.65 and 0.56 for the daily P intensity indices, and 0.79 and 0.57 for the 3-hourly P intensity indices, respectively (Fig. 3 and Supplement Fig. S4). The global patterns of P occurrence and peak P intensity were thus generally better estimated by ERA5 than by IMERG. However, ERA5 consistently overestimated the P occurrence and the P intensities at the lowest return periods, in line with evaluations of other climate models30,37,97. Additionally, ERA5 generally showed lower values for the highest return periods (Fig. 3), which is at least partly due to the spatial scale mismatch between point-scale observations and grid-box averages43,47,98,99. ERA5 also strongly overestimated the P intensity over several grid cells, in eastern Africa in particular (Fig. 2c), likely due to the presence of so-called “rain bombs” caused by numerical artifacts100.

The overly high P occurrence of IMERG in the tropics (Fig. 1d) may be at least partly attributable to the inclusion of relatively noisy P estimates from the SAPHIR sensor on board the tropical-orbiting Megha-Tropiques satellite46. Similar to the TMPA 3B42 and CMORPH satellite products, IMERG overestimated the P occurrence over several large rivers (notably the Amazon, the Lena, and the Ob), likely due to misinterpretation of the emissivity of small inland water bodies as P by the passive microwave retrieval algorithms101. The underestimated P occurrence at high latitudes reflects the challenges associated with the retrieval of snowfall26,102 and light P103,104. Furthermore, IMERG tended to underestimate the P intensity over regions of complex topography (e.g., over Japan and the west coast of India; Fig. 2d), which is a known limitation of the product23,105. The overestimated P intensity in the south-central US may stem from the way P estimates from the various passive microwave sensors are intercalibrated to the radar-based CORRA product in the IMERG algorithm46.

## Usage Notes

The PPDIST dataset will be useful for numerous purposes, such as the evaluation of P from climate models (e.g., Dai4 and Bosilovich et al.5), the distributional correction of gridded P datasets (e.g., Zhu and Luo6, Xie et al.7, and Karbalaee et al.8), the design of infrastructure in poorly gauged regions (e.g., Lumbroso et al.9), and hydrological modelling where P intensity is required (e.g., Donohue et al.106 and Liu et al.107). However, some caveats should be kept in mind:

1. The uncertainty estimates included in the PPDIST dataset provide an indication of the reliability of the climatologies. The uncertainty is generally higher in sparsely gauged, mountainous, and tropical regions (Fig. 4). The uncertainty estimates may, however, be on the low end, because while we accounted for uncertainty in the mean annual P predictor, we did not account for uncertainty in the other predictors.

2. The P occurrence estimates using low thresholds to discriminate between P and no-P (≤0.25 mm d−1 or mm 3 h−1) should be interpreted with caution due to the detection limits of gauges (~0.25 mm for the US National Weather Service [NWS] 8” standard gauge108) and satellite sensors (~0.8 mm h−1 for the TRMM microwave imager [TMI]109 and possibly higher for other sensors with coarser footprints) and the underreporting of light P events65.

3. P observations from unshielded gauges (e.g., the US NWS 8” standard gauge) are subject to wind-induced gauge under-catch and therefore underestimate rainfall by 5–15% and snowfall by 20–80%64,96. This underestimation is reflected in the PPDIST estimates. The Precipitation Bias CORrection (PBCOR) dataset110 (www.gloh2o.org/pbcor) provides a means to ameliorate this bias to some degree.

4. The PPDIST climatologies were derived from gauge observations and therefore represent the point scale. Compared to point-scale estimates, grid-box averages tend to exhibit more frequent P of lower intensity16,43,47,99. Point-scale P occurrence estimates can be transformed to grid-box averages using, for example, the equations of Osborn and Hulme111, whereas the peak P intensity estimates can be transformed to grid-box averages using, for example, the areal reduction factors collated by Pietersen et al.112.

5. As mentioned before, ERA5 strongly overestimates the P intensity over several grid cells in eastern Africa in particular (Fig. 2c). Since ERA5 was used as predictor, these artifacts are also present, albeit less pronounced, in the PPDIST P intensity estimates (Fig. 2b).

6. Some of the PPDIST climatologies (e.g., the 15-year return-period daily P climatology; Fig. 2b) exhibit a longitudinal discontinuity at 60°N/S, reflecting the spatial extent of the IMERG data (Table 1). Additionally, some of the climatologies exhibit discontinuities at coastlines (e.g., along the west coast of Patagonia), due to the use of different data sources for land and ocean for the MAP1, MAP2, MAT, and SnowFrac predictors.