
# Global daily 1 km land surface precipitation based on cloud cover-informed downscaling

## Abstract

High-resolution climatic data are essential to many questions and applications in environmental research and ecology. Here we develop and implement a new semi-mechanistic downscaling approach for daily precipitation estimates that incorporates high-resolution (30 arcsec, 1 km) satellite-derived cloud frequency. The downscaling algorithm incorporates orographic predictors such as wind fields, valley exposition, and boundary layer height, with a subsequent bias correction. We apply the method to the ERA5 precipitation archive and MODIS monthly cloud cover frequency to develop a daily gridded precipitation time series at 1 km resolution for the years 2003 onward. Comparison of the predictions with existing gridded products and station data from the Global Historical Climatology Network indicates an improvement in the spatio-temporal performance of the downscaled data in predicting precipitation. Regional scrutiny of the cloud cover correction from the continental United States further indicates that CHELSA-EarthEnv performs well in comparison to other precipitation products. The CHELSA-EarthEnv daily precipitation product improves the temporal accuracy, together with a large improvement in the spatial accuracy, especially in complex terrain.

Measurement(s): hydrological precipitation process

Technology Type(s): cloud-cover informed downscaling

Factor Type(s): temporal interval • geographic location

Sample Characteristic - Environment: climate system • cloud

Sample Characteristic - Location: global

Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.16910344

## Background & Summary

High-resolution information on precipitation is essential in many scientific fields, ranging from ecology, agriculture, and forestry to global change impact studies1,2,3. Spatiotemporal precipitation data are usually derived from a range of different sources, including satellites, reanalysis, global circulation models, or precipitation gauges4,5. However, each of these sources on its own has limitations in coverage, accuracy, or detail, impeding many downstream uses, especially those addressing large spatial and temporal extents6,7.

Reanalysis data products such as ERA58,9, MERRA 210,11 or MSWEP12 overcome these constraints by combining data from a variety of sources. To date, however, they remain limited to rather coarse spatial resolutions of 0.5° to 0.25°, i.e. ca. 55–27 km near the equator. This is much coarser than the scale of many environmental and ecological processes and the associated data requirements for ecosystem management and conservation. This resolution is furthermore too coarse to capture orographic precipitation in complex terrain13,14,15. Global circulation and weather models such as WRF-ARF16 and ICON17,18 are able to run at high spatial resolutions of 1 km, but are still heavily constrained by computational limits7. Currently, global kilometer-scale models are only able to achieve a simulation throughput of 0.043 SYPD (simulated years per day)19, which amounts to a 100x shortfall compared to computationally efficient simulations defined as 1 SYPD7,20. Even with the largest supercomputers, state-of-the-art climate models, and large financial investments, this shortfall can only be reduced to approximately 20x21.

Although achieving 1 km resolution in numerical climate models is important to quantify effects such as deep convection or surface drag21, studies focusing on the impact of climate on different systems often rely on a limited set of climatic variables. In ecological studies, for example, precipitation together with minimum, mean, and maximum temperatures is often used to delineate occurrences of species22. It is common to characterize the range of a species by its climatic envelope, e.g. in species distribution models (SDMs), using a rather simple set of climatic predictors and derivations thereof23,24. This means that for applications like these, only a subset of available variables needs to be downscaled to a finer spatial resolution.

Environmental scientists or climate impact modelers in need of high-resolution precipitation data therefore often resort to data from computationally less intensive methods. One such method is the spatial interpolation of data from climatic stations. Here, precipitation gauges form the input for interpolation25 or regression models to achieve a high spatial resolution either with or without additional, often terrain-derived, predictors26,27,28. Such interpolations, however, usually suffer from a spatially uneven station density29,30,31,32 and severely underestimate snowfall6,33,34,35. While gauge undercatch can be corrected using statistical methods in combination with streamflow observations36, the spatially uneven distribution of gauges can lead to false parametrizations of precipitation lapse rates in regression-based interpolation methods37. One method to overcome the limitation imposed by uneven station density is to directly downscale the output of reanalysis data calculated at coarser cell size37,38,39. However, there are still interpolations and parametrizations involved which account for processes not resolved at the original model resolution.

This uneven distribution of gauges can be overcome by the use of satellite data2,40,41,42,43, which offer spatially more complete information on precipitation patterns. Yet satellites also detect snowfall poorly44,45, meaning that the satellite-derived amounts of precipitation have to be corrected. This is usually done by bias correction using station observations46,47,48,49,50. Although satellite precipitation products generally have a higher horizontal resolution than reanalysis data, they are still not available globally at the 1 km resolution needed for local impact studies. What is available at this very high resolution, however, is satellite-derived cloud cover, which can potentially lead to an improved spatial representation of precipitation. The established relationship between cloud occurrences and precipitation51,52 scales precipitation with cloud cover frequency such that if no clouds occur there is no precipitation, and increasing cloud frequency translates to increasing precipitation53.

Here we merge data from a downscaled reanalysis (ERA5) using the CHELSA algorithm37,39 with cloud cover information derived from MODIS from the EarthEnv layer suite (https://www.earthenv.org)54 to achieve a better representation of the fine-scale variation of global precipitation patterns. The presented CHELSA-EarthEnv daily precipitation data at ~1 km horizontal resolution offers a more reliable characterization of precipitation in topographically heterogeneous regions and supports a range of applications that require high resolution precipitation data.

## Methods

### Bias correction of ERA5 precipitation data

ERA5 shows an increased performance over its predecessor ERA-Interim in several attributes55 and especially in precipitation56. Nevertheless, for application in impact studies, there is often still a significant bias in several parameters that needs to be accounted for57. For accumulated parameters such as total daily precipitation, we used the monthly sum of the hourly ERA5 precipitation, $${p}_{era}$$, to assess this bias. ERA5 estimates of surface precipitation, like those of its predecessor ERA-Interim, are extracted from short-range forecasts, which vary considerably with forecast length58. This bias in the short-range forecasts can be a problem for monthly and climatological means as it accumulates over time58. Several methods exist to account for such biases, but most of them require gapless gridded observational data, which come with an inherent interpolation error themselves57. To correct the bias in the ERA5 precipitation estimates and account for the interpolation errors, we therefore performed a bias correction consisting of three steps.

1. One very common approach to account for reanalysis bias is to calculate the difference between baseline precipitation from the reanalysis and the observed precipitation from station data and apply this ‘change factor’ to the reanalysis data. We apply a monthly bias correction to the accumulated ERA5 precipitation of each month, $${p}_{sim}$$ (the ERA5 monthly sums, also written $${p}_{era}$$). As observations, we used the monthly accumulated precipitation $${p}_{obs}$$ of the gridded GPCC 2018 dataset59. The bias correction in earlier versions of CHELSA did not adequately interpolate across the dateline, which caused artefacts in this region. To correct for this, we reprojected both $${p}_{sim}$$ and $${p}_{obs}$$ to a North Pole Azimuthal Equidistant projection (EPSG: 102016) to allow interpolation across the dateline. We then calculate the monthly bias $${R}_{m}$$ caused by the ERA5 parametrization, that is, the excessive or insufficient precipitation of the forecast algorithm, for each month using:

$${R}_{m}^{obs}=\frac{{p}_{obs}+c}{{p}_{sim}+c}$$

with c being a constant of 0.0001 kg*m−2*s−1 to avoid division by zero. We only used grid cells with meteorological stations present for the calculation of the observed bias $${R}_{m}^{obs}$$. The forecast algorithm used to produce the precipitation amounts for ERA5 exhibits a considerable bias (too much or too little precipitation) that has a coherent spatial structure, with a larger bias over high-elevation terrain or specific landforms such as tropical rainforests. Based on this observation, we assumed that grid cells without stations share a similar bias with their neighbouring stations.

2. To achieve a gap-free bias correction grid surface, we interpolated the gaps in the $${R}_{m}^{obs}$$ grid to a 0.25° resolution using a multilevel B-spline interpolation60 with 14 error levels optimized using B-spline refinement. The multilevel B-spline approximation60 first fits a B-spline function to $${R}_{m}^{obs}$$ on the coarsest lattice $${\phi }_{0}$$ of a set of control lattices $${\phi }_{0},{\phi }_{1},\ldots ,{\phi }_{n}$$ with n = 14 that have been generated using optimized B-spline refinement61. The resulting B-spline function $${f}_{0}$$ gives the first approximation of $${R}_{m}^{obs}$$ and leaves a deviation $${\Delta }^{1}{R}_{m,c}^{obs}={R}_{m,c}^{obs}-{f}_{0}({x}_{c},{y}_{c})$$ at each location $$\left({x}_{c},{y}_{c},{R}_{m,c}^{obs}\right)$$61. The next control lattice $${\phi }_{1}$$ is then used to obtain a function $${f}_{1}$$ that approximates these deviations $${\Delta }^{1}{R}_{m,c}^{obs}$$61. The sum $${f}_{0}+{f}_{1}$$ leaves the smaller deviation $${\Delta }^{2}{R}_{m,c}^{obs}={R}_{m,c}^{obs}-{f}_{0}\left({x}_{c},{y}_{c}\right)-{f}_{1}\left({x}_{c},{y}_{c}\right)$$ at each point $$\left({x}_{c},{y}_{c},{R}_{m,c}^{obs}\right)$$. The approximation is repeated on the remaining deviations n times, and the sum of all resulting functions gives the gap-free interpolated bias surface $${R}_{m}^{int}$$61.

3. The bias correction surface $${R}_{m}^{int}$$ is then multiplied with the ERA5 precipitation $${p}_{sim}$$ to get the bias-corrected monthly precipitation sums $${p}_{m}^{cor}$$ at 0.25° resolution:

$${p}_{m}^{cor}={p}_{sim}\ast {R}_{m}^{int}$$
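The three steps can be sketched for a single row of grid cells. This is a minimal illustration under stated assumptions, not the implementation used for the dataset: block means on successively finer bins stand in for the multilevel B-spline lattices, the grid is one-dimensional, and all function names and input values are hypothetical.

```python
C = 0.0001  # constant c from the ratio equation, avoids division by zero


def bias_ratio(p_obs, p_sim, c=C):
    """Step 1: observed bias R = (p_obs + c) / (p_sim + c) for one cell."""
    return (p_obs + c) / (p_sim + c)


def fill_gaps(ratios, levels):
    """Step 2 (stand-in, not B-splines): coarse-to-fine residual fitting.

    `ratios` holds R at cells with stations and None elsewhere. Level k
    splits the row into 2**k bins and adds the mean residual of the
    stations in each bin; bins without stations add nothing.
    """
    n = len(ratios)
    surface = [0.0] * n
    for level in range(levels + 1):
        bins = 2 ** level
        sums, counts = [0.0] * bins, [0] * bins
        for i, r in enumerate(ratios):
            if r is not None:
                b = i * bins // n
                sums[b] += r - surface[i]  # deviation left by coarser levels
                counts[b] += 1
        update = [s / k if k else 0.0 for s, k in zip(sums, counts)]
        surface = [surface[i] + update[i * bins // n] for i in range(n)]
    return surface


def apply_correction(p_sim_row, surface):
    """Step 3: p_cor = p_sim * R_int, cell by cell."""
    return [p * r for p, r in zip(p_sim_row, surface)]


# Hypothetical monthly sums; None marks cells without a station.
p_sim = [2.0, 4.0, 4.0, 8.0, 1.0, 1.0, 3.0, 5.0]
p_obs = [3.0, None, None, 4.0, None, 2.0, None, None]

ratios = [bias_ratio(o, s) if o is not None else None
          for o, s in zip(p_obs, p_sim)]
r_int = fill_gaps(ratios, levels=3)
p_cor = apply_correction(p_sim, r_int)
```

At the finest level every cell is its own bin, so the surface reproduces the station ratios exactly, while cells without stations inherit the values fitted at coarser levels.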

### Orographic wind effects

Orographic effects are among the most reported drivers of precipitation62,63,64,65,66. We account for them using a variant of the CHELSA V1.2 algorithm, which uses a parametrization of orographic rainfall based on wind fields67,68,69,70. We used daily u-wind and v-wind components at the 10-m level of ERA5 as underlying wind components. As the calculation of a windward-leeward index H (hereafter: wind effect) requires a projected coordinate system, both wind components (u-wind, v-wind) were projected to a World Mercator projection and then interpolated to a 3 km grid resolution using a multilevel B-spline interpolation similar to the one used for the bias correction surface. The resolution of 3 km was chosen because resolutions of around 1 km would over-represent orographic terrain effects26. The wind effect H was then calculated by multiplying the windward $${H}_{W}$$ and leeward $${H}_{L}$$ components calculated using:

$${H}_{W}=\frac{{\sum }_{i=1}^{n}\frac{1}{{d}_{WHi}}{\tan }^{-1}\left(\frac{{d}_{WZi}}{{d}_{WHi}^{0.5}}\right)}{{\sum }_{i=1}^{n}\frac{1}{{d}_{LHi}}}+\frac{{\sum }_{i=1}^{n}\frac{1}{{d}_{LHi}}{\tan }^{-1}\left(\frac{{d}_{LZi}}{{d}_{LHi}^{0.5}}\right)}{{\sum }_{i=1}^{n}\frac{1}{{d}_{LHi}}}$$

$${H}_{L}=\frac{{\sum }_{i=1}^{n}\frac{1}{\ln \left({d}_{WHi}\right)}{\tan }^{-1}\left(\frac{{d}_{LZi}}{{d}_{WHi}^{0.5}}\right)}{{\sum }_{i=1}^{n}\frac{1}{\ln \left({d}_{LHi}\right)}}$$

where $${d}_{WHi}$$ and $${d}_{LHi}$$ refer to the horizontal distances between the focal 3 km grid cell in windward and leeward direction and $${d}_{WZi}$$ and $${d}_{LZi}$$ are the corresponding vertical distances compared with the focal 3 km cell following the wind trajectory. Distances are summed over a search distance of 75 kilometers as orographic airflows are limited to horizontal extents between 50–100 km71,72. The second summand in the equation for $${H}_{W,L}$$ where $${d}_{LHi} < 0$$ accounts for the leeward impact of previously traversed mountain chains. The horizontal distances in the equation for $${H}_{W,L}$$ where $${d}_{LHi}\ge 0$$ lead to a longer-distance impact of leeward rain shadow. The final wind-effect parameter, which is assumed to be related to the interaction of the large-scale wind field and the local-scale precipitation characteristics, is calculated as:

$$H={H}_{W,L}\to {d}_{LHi} < 0\ast {H}_{W,L}\to {d}_{LHi}\ge 0$$

and generally takes values between 0.7 for leeward and 1.3 for windward positions. Both equations were applied to each grid cell at the 3 km resolution in a World Mercator projection.

We used the boundary layer height PBL from ERA5 as an indicator of the pressure level that has the highest contribution to the wind effect. PBL and H were interpolated to a 30 arc second resolution using a B-spline interpolation. To create a boundary layer height corrected wind effect $${H}_{B}$$, the wind effect grid H was then proportionally distributed to all grid cells falling within a respective 0.25° grid cell using:

$${H}_{B}=\frac{H}{1-\left(\frac{| z-PB{L}_{z}| \,-\,{z}_{max}}{h}\right)}$$

with $${z}_{max}$$ being the maximum distance between the boundary layer height $$PB{L}_{z}$$ and the elevation z of all grid cells at a 30 arc sec resolution falling within a respective 0.25° grid cell, h being a constant of 9000 m, and z being the respective elevation from the Global Multi-resolution Terrain Elevation Data (GMTED2010)72 with:

$$PB{L}_{z}=PBL+{z}_{ERA}+f$$

where PBL is the daily mean height of the boundary layer from ERA5, $${z}_{ERA}$$ is the elevation of the ERA5 grid cell, and f is a tuning constant. The boundary layer height provided by ECMWF is based on the Richardson number73, which is usually at the lower end of the elevational spectrum compared to other methods74. We therefore tuned our model by setting f to a constant of 500 m, similar to the approach in the original CHELSA algorithm37.
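The boundary layer correction can be illustrated for the fine cells of a single 0.25° cell. This is a sketch that only evaluates the two equations above; the function names and the input values are hypothetical, and the tuning constant is taken to be the 500 m mentioned in the text.

```python
H_SCALE = 9000.0    # constant h in metres
PBL_OFFSET = 500.0  # tuning constant f in metres (500 m, as stated in the text)


def pbl_elevation(pbl, z_era, f=PBL_OFFSET):
    """PBL_z = PBL + z_ERA + f."""
    return pbl + z_era + f


def corrected_wind_effect(h_wind, z_cells, pbl_z, h=H_SCALE):
    """H_B = H / (1 - (|z - PBL_z| - z_max) / h) for each fine cell.

    z_max is the largest |z - PBL_z| among the fine cells of the
    coarse cell, so the denominator is always >= 1.
    """
    dist = [abs(z - pbl_z) for z in z_cells]
    z_max = max(dist)
    return [hw / (1.0 - (d - z_max) / h) for hw, d in zip(h_wind, dist)]


# Hypothetical values for four fine cells inside one 0.25 degree cell.
pbl_z = pbl_elevation(pbl=800.0, z_era=1200.0)  # 2500.0 m
z_cells = [500.0, 1500.0, 2500.0, 3400.0]       # elevations in metres
h_wind = [1.1, 1.2, 1.3, 0.9]                   # wind effect H per cell
h_b = corrected_wind_effect(h_wind, z_cells, pbl_z)
```

With these inputs the cell farthest from the boundary layer height keeps its wind effect unchanged, because its distance equals z_max and the denominator is exactly 1.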

Although the wind effect algorithm can distinguish between the windward and leeward sides of an orographic barrier, it cannot distinguish extremely isolated valleys in high mountain areas75. Such dry valleys are situated in areas where the wet air masses flow over an orographic barrier and are prevented from flowing into deep valleys75. These effects are, however, mainly confined to large mountain ranges and are not as prominent in intermediate mountain ranges72. To account for these effects, we used a variant of the windward-leeward equations with a linear search distance of 300 km in steps of 5° from 0° to 355° circular for each grid cell. The calculated leeward index was then scaled towards higher elevations using:

$$E={\left(\frac{{\sum }_{i=1}^{n}\frac{1}{\ln \left({d}_{WHi}\right)}{\tan }^{-1}\left(\frac{{d}_{LZi}}{{d}_{WHi}^{0.5}}\right)}{{\sum }_{i=1}^{n}\frac{1}{\ln \left({d}_{LHi}\right)}}\right)}^{\frac{z}{h}}$$

which rescales the strength of the exposition index relative to elevation z from GMTED2010, and gives valleys at high elevations larger wind isolations E than valleys located at low elevations. The correction constant h was set to 9000 m to include all possible elevations of the DEM and because values of z > h could otherwise lead to a reverse relationship between z and E.

The product

$${p}_{{I}_{c}}=E\ast {H}_{B}$$

then gives the first approximation of precipitation intensity $${p}_{{I}_{c}}$$ at each grid location ($${x}_{c},{y}_{c}$$).
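The effect of the exponent z/h can be seen in a small numeric sketch. It assumes a leeward index below 1 as the base of the power; the function names and values are hypothetical and only evaluate the two expressions above.

```python
H_SCALE = 9000.0  # correction constant h in metres


def exposition(leeward_index, z, h=H_SCALE):
    """E = leeward_index ** (z / h); at z = 0 the exponent is 0 and E = 1."""
    return leeward_index ** (z / h)


def precipitation_intensity(e, h_b):
    """First approximation of precipitation intensity: p_Ic = E * H_B."""
    return e * h_b


# Same hypothetical leeward index (0.8) at a low and a high valley.
e_low = exposition(0.8, z=500.0)
e_high = exposition(0.8, z=4500.0)   # exponent 0.5, i.e. sqrt(0.8)
p_low = precipitation_intensity(e_low, h_b=1.1)
p_high = precipitation_intensity(e_high, h_b=1.1)
```

For a base below 1, a larger exponent pulls E further away from 1, so the same index has a stronger effect on the high valley than on the low one.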

### Precipitation including orographic effects

To obtain the distribution of daily precipitation $${p}_{o}$$ given the approximated precipitation intensity $${p}_{Ic}$$ at each grid location ($${x}_{c},{y}_{c}$$), we used a linear relationship between $${p}_{m}^{cor}$$ and $${p}_{Ic}$$ using:

$${p}_{o}=\frac{{p}_{Ic}}{\frac{1}{n}{\sum }_{i=1}^{n}\,{p}_{Ici}}\ast {p}_{m}^{cor}$$

where n equals the number of 0.0083334° grid cells that fall within a 0.25° grid cell. This equation ensures that the mean precipitation of all 0.0083334° cells that overlap with a 0.25° cell exactly matches the precipitation at 0.25° resolution.
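The mean-preserving property of this rescaling can be checked with a few lines. The sketch below uses four fine cells instead of the roughly 30 × 30 cells that fall within a 0.25° cell; the function name and the numbers are hypothetical.

```python
def redistribute(intensity, p_coarse):
    """p_o = p_Ic / mean(p_Ic) * p_cor for every fine cell of one coarse cell."""
    mean_intensity = sum(intensity) / len(intensity)
    return [value / mean_intensity * p_coarse for value in intensity]


# Hypothetical intensities of four fine cells and a coarse-cell value of 10.
p_ic = [0.8, 1.0, 1.2, 1.4]
p_o = redistribute(p_ic, p_coarse=10.0)
mean_p_o = sum(p_o) / len(p_o)  # equals the coarse value up to rounding
```

Cells with above-average intensity receive more than the coarse value and cells with below-average intensity receive less, while the mean over the coarse cell stays unchanged.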

The GPCC dataset used for the bias correction does not include a correction for gauge undercatch. We therefore additionally correct for gauge undercatch using a downscaled version of the bias correction layers from Beck et al. 202036. We downscaled the bias correction surfaces to 0.0083334° by using a moving window regression with a search radius of three cells and elevation from GMTED2010 as predictor. We then multiplied this downscaled bias correction layer with $${p}_{o}$$.

### Monthly cloud frequencies

To derive monthly cloud frequencies, we used the internal cloud mask of the PGE11 program, which relies on two reflective tests and one thermal test, from the MODIS MOD09 atmospherically corrected surface reflectance product76,77. The first reflective test combines the shortwave and middle infrared data in the “middle infrared anomaly” index (MIRA = ρ20,21 − 0.82ρ7 + 0.32ρ6, where ρ indicates MODIS band number). The second test uses reflectance at 1.38 microns (1.38 mic = ρ26). The MIRA and the 1.38 mic reflectance are designed to be complementary, with MIRA efficiently detecting low or high reflective clouds77, while 1.38 mic effectively detects high (and potentially not very reflective) clouds. Additionally, a thermal test is used to identify pixels with high infrared reflectance anomalies (e.g., fires, sun-glint, and high albedo surfaces) with respect to near-surface (2 m) air temperature computed by the NCEP reanalysis model78. The MOD09 cloud algorithm was designed to minimize confusion over snow and ice by taking the surface air temperature into account. Like many cloud masks, the MOD09 detection algorithm has a binary response (cloudy/not cloudy) and does not retain an estimate of confidence in cloud state (i.e., probability that the pixel was actually cloudy given the tests). We extracted the daily cloud flags from bit 10 of the daily daytime surface reflectance product “state 1 km” Scientific Data Set (SDS) from both the Terra (MOD09GA, collected at approximately 10:30 AM local time) and Aqua (MYD09GA, approximately 1:30 PM) satellites. The time series of monthly cloud frequencies (proportion of days with a positive cloud flag) was calculated separately for the daily MOD09GA and MYD09GA data using the Google Earth Engine application programming interface (http://earthengine.google.org/).
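The flag extraction and the monthly proportion can be sketched in plain Python for a single pixel. This is not the Google Earth Engine workflow used for the dataset; the function names are hypothetical, and skipping days without a valid state value is an assumption that the text does not specify.

```python
CLOUD_BIT = 10  # bit of the "state 1 km" value that holds the cloud flag


def is_cloudy(state_value):
    """Return True if bit 10 of the integer state value is set."""
    return bool((state_value >> CLOUD_BIT) & 1)


def monthly_cloud_frequency(daily_states):
    """Proportion of days with a positive cloud flag for one pixel.

    Days with a missing state value (None) are skipped, which is an
    assumption of this sketch.
    """
    flags = [is_cloudy(s) for s in daily_states if s is not None]
    return sum(flags) / len(flags) if flags else None


# Hypothetical state values for five days of one pixel.
states = [1024, 0, 1025, 8, 3072]
cf = monthly_cloud_frequency(states)  # bit 10 is set on 3 of 5 days
```

Running the function once on Terra and once on Aqua state values would give the two separate monthly frequencies described above.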

### Cloud frequency correction of daily precipitation estimates

We include monthly cloud frequencies $$c{f}_{m}$$ in the daily precipitation estimates, assuming that the frequency of cloud occurrences is related to precipitation events and that their geographic distribution carries a spatial signal of precipitation51,52. Strictly, we assume that where no clouds occur, no precipitation occurs, and where clouds occur more frequently, more precipitation occurs53. To obtain the distribution of daily precipitation p given the approximated orographically corrected precipitation $${p}_{o}$$ at each grid location ($${x}_{c},{y}_{c}$$), we first approximate the cloud cover corrected precipitation intensity using:

$${p}_{cfc}={p}_{o}\ast c{f}_{m}$$

This however distorts the precipitation amount of each grid cell. We therefore repeat the step used to create orographic precipitation in a similar manner by estimating daily precipitation p at each grid location ($${x}_{c},{y}_{c}$$) using:

$$p=\frac{{p}_{cfc}}{\frac{1}{n}{\sum }_{i=1}^{n}\,{p}_{cfc,i}}\ast {p}_{m}^{cor}$$

where n equals the number of 30 arc sec grid cells that fall within a 0.25° grid cell.
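Both steps, the multiplication with the cloud frequency and the subsequent rescaling, can be sketched for the fine cells of one 0.25° cell. The function name and the values are hypothetical.

```python
def cloud_corrected(p_o, cloud_freq, p_coarse):
    """Weight p_o by cloud frequency, then rescale to the coarse value.

    p_cfc = p_o * cf_m, followed by p = p_cfc / mean(p_cfc) * p_cor.
    """
    p_cfc = [p * cf for p, cf in zip(p_o, cloud_freq)]
    mean_cfc = sum(p_cfc) / len(p_cfc)
    return [value / mean_cfc * p_coarse for value in p_cfc]


# Hypothetical fine cells: orographic precipitation and cloud frequency.
p_o = [2.0, 4.0, 6.0, 8.0]
cf_m = [0.0, 0.5, 0.5, 1.0]
p = cloud_corrected(p_o, cf_m, p_coarse=5.0)
mean_p = sum(p) / len(p)  # equals the coarse value up to rounding
```

The cell with a cloud frequency of zero receives no precipitation, and the rescaling restores the coarse-cell mean that the multiplication distorted. A coarse cell whose fine cells all have zero cloud frequency would lead to a division by zero here; how that case is handled is not described in the text.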

## Data Records

The dataset79 is available at EarthEnv (https://doi.org/10079/MOL/6f52b80d-0a41-40f7-84ec-873458ca6ee6). All files are provided as georeferenced TIFF files (GeoTIFF). GeoTIFF is a public domain metadata standard which allows georeferencing information to be embedded within a TIFF file. Additional information included in the files comprises map projection, coordinate systems, ellipsoids, datums, and fill values.

GeoTIFF can be viewed using standard GIS software such as:

SAGA GIS - (free) http://www.saga-gis.org/

ArcGIS - https://www.arcgis.com/

QGIS - (free) www.qgis.org

DIVA-GIS - (free) http://www.diva-gis.org/

GRASS GIS - (free) https://grass.osgeo.org/

All files contain variables that define the dimensions of longitude and latitude (Table 1). The time variable is usually encoded in the filename.

All files are in a geographic coordinate system referenced to the WGS 84 horizontal datum, with the horizontal coordinates expressed in decimal degrees. The extent (minimum and maximum latitude and longitude) is a result of the coordinate system inherited from the 1-arc-second GMTED2010 data, which itself inherited the grid extent from the 1-arc-second SRTM data.

The filename includes the respective model used, the variable short name, the respective time variables, and the version of the data:

[Model]_[short_name]_[day]_[month]_[year]_[Version].tif

There are two different models available. CHELSA which includes the results from the bias correction and orographic correction, and CHELSA_EarthEnv which includes the cloud cover correction as well.

The unit of the precipitation in CHELSA_EarthEnv is: (kg*m−2*day−1)/100.
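The naming scheme can be parsed by splitting from the right, since the model name itself may contain an underscore. The sketch below assumes that the short name and the version contain no underscores; the example filename, including its short name `pr` and version string, is hypothetical. The unit conversion follows the literal reading of the unit given above, i.e. stored values are divided by 100.

```python
from datetime import date
from typing import NamedTuple


class Record(NamedTuple):
    model: str
    short_name: str
    day: date
    version: str


def parse_filename(name):
    """Parse [Model]_[short_name]_[day]_[month]_[year]_[Version].tif.

    Splits from the right so that a model name with an underscore
    (CHELSA_EarthEnv) stays intact.
    """
    stem = name[:-len(".tif")] if name.endswith(".tif") else name
    head, day, month, year, version = stem.rsplit("_", 4)
    model, short_name = head.rsplit("_", 1)
    return Record(model, short_name,
                  date(int(year), int(month), int(day)), version)


def to_kg_per_m2_per_day(stored_value):
    """Convert a stored value in (kg m-2 day-1)/100 to kg m-2 day-1."""
    return stored_value / 100.0


# Hypothetical filename that follows the pattern above.
rec = parse_filename("CHELSA_EarthEnv_pr_15_07_2010_V1.0.tif")
```

Here `rec.model` is `CHELSA_EarthEnv` and `rec.day` is 15 July 2010; a file of the model without cloud correction would yield `CHELSA` as model.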

## Technical Validation

To validate the performance of CHELSA_EarthEnv, we focus on (a) the downscaling performance, by calculating different performance metrics for the coarse and the high resolution against observations from meteorological stations, and (b) a comparison with similar high-resolution precipitation datasets (Table 2) within the continental United States, where the density of meteorological stations is high and the stations are of good quality.

### Validating the downscaling performance

To validate whether the downscaling to 0.0083334° resolution leads to a better performance than the coarser 0.25° gridded data that were used as forcing, we compare both resolutions with precipitation measured at Global Historical Climatology Network daily weather stations (GHCN-D)80. The 0.25° resolution has been chosen as benchmark as it is the resolution of the forcing ERA5 data that is used as an input for CHELSA_EarthEnv. To put the performance changes into context, we compared them with four comparable products (Table 2), PRISM AN81d, MSWEP 2.1, CHIRPS 2.0, and WorldClim 2.1, and repeated the analysis with these datasets over the continental United States except Alaska.

### Assessing the global downscaling performance across several metrics

To validate the performance of CHELSA_EarthEnv globally, we compare it to observations at meteorological stations from the GHCN-D80 network for the period 2003–2016. We use only stations without any quality flags and compare them to the precipitation data at the coarse 0.25° and the high 0.0083334° spatial resolution.

Downscaling can affect different aspects of model performance such as bias, variability, or correlation coefficients. To test in a first step which metric is affected by the applied downscaling we calculated for each grid cell separately the Kling-Gupta efficiency (KGE) scores from daily time series from 2003 to 2016. KGE is a performance metric combining correlation, bias, and variability81,82 and is defined as follows:

$$KGE=1-\sqrt{{\left(r-1\right)}^{2}+{\left(\beta -1\right)}^{2}+{\left(\gamma -1\right)}^{2}}$$

where the correlation component r is represented by the Pearson’s correlation coefficient, the bias component β by the ratio of estimated and observed means, and the variability component $$\gamma$$ by the ratio of the estimated and observed coefficients of variation:

$$\beta =\frac{{\mu }_{s}}{{\mu }_{o}}\quad {\rm{and}}\quad \gamma =\frac{\frac{{\sigma }_{s}}{{\mu }_{s}}}{\frac{{\sigma }_{o}}{{\mu }_{o}}}$$

where μ is the mean and σ the standard deviation, and the subscripts s and o indicate simulated and observed, respectively. KGE, r, β, and γ values all have their optimum at 1. KGE values between −0.41 and 1 indicate that the model estimates precipitation better than just taking the mean of the recorded precipitation at the gauges83.
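A compact sketch of the KGE and its three components follows, with β taken as the ratio of simulated to observed means as defined in the text. The function names and the two short time series are hypothetical, and the population standard deviation is used.

```python
import math


def mean(values):
    return sum(values) / len(values)


def std(values):
    """Population standard deviation."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def kge(sim, obs):
    """Return (KGE, r, beta, gamma) for simulated and observed series."""
    mu_s, mu_o = mean(sim), mean(obs)
    sd_s, sd_o = std(sim), std(obs)
    cov = sum((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)) / len(sim)
    r = cov / (sd_s * sd_o)                  # Pearson correlation
    beta = mu_s / mu_o                       # bias ratio
    gamma = (sd_s / mu_s) / (sd_o / mu_o)    # ratio of coefficients of variation
    value = 1.0 - math.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)
    return value, r, beta, gamma


# Hypothetical daily series: the simulation doubles every observed value.
obs = [0.0, 1.0, 4.0, 2.0, 0.0, 3.0]
sim = [2.0 * o for o in obs]
score, r, beta, gamma = kge(sim, obs)
```

In this example the timing and relative variability are perfect (r = 1, γ = 1), but the doubled mean (β = 2) lowers the KGE to 0, which shows how a pure bias enters the score.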

We also calculated the percent bias (pbias) that reflects the average tendency of the modelled precipitation values $${p}_{sim}$$ to be larger or smaller than their observed values $${p}_{obs}$$ at the stations. The optimal value of pbias is 0, with low values indicating accurate model simulation. Positive values indicate an overestimation, whereas negative values indicate an underestimation. pbias is defined as follows:

$$pbias=100\ast \left(\frac{{\sum }_{i=1}^{n}\left({p}_{si{m}_{i}}-{p}_{ob{s}_{i}}\right)}{{\sum }_{i=1}^{n}{p}_{ob{s}_{i}}}\right)$$

Additionally, we also report the mean absolute error (mae) which is defined as:

$$mae=\frac{1}{n}\left(\mathop{\sum }\limits_{i=1}^{n}\left|{p}_{si{m}_{i}}-{p}_{ob{s}_{i}}\right|\right)$$

and the root mean squared error (rmse) which is defined as:

$$rmse=\sqrt{\frac{1}{n}\left(\mathop{\sum }\limits_{i=1}^{n}{\left({p}_{si{m}_{i}}-{p}_{ob{s}_{i}}\right)}^{2}\right)}$$
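The three error measures can be written as short functions over paired simulated and observed values. The names and the example values are hypothetical.

```python
import math


def pbias(sim, obs):
    """Percent bias: 100 * sum(sim - obs) / sum(obs)."""
    return 100.0 * sum(s - o for s, o in zip(sim, obs)) / sum(obs)


def mae(sim, obs):
    """Mean absolute error."""
    return sum(abs(s - o) for s, o in zip(sim, obs)) / len(obs)


def rmse(sim, obs):
    """Root mean squared error."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


# Hypothetical values: errors of +1 and -1 cancel in pbias but not in mae/rmse.
obs = [2.0, 2.0, 4.0, 4.0]
sim_balanced = [1.0, 3.0, 5.0, 3.0]
sim_wet = [3.0, 3.0, 5.0, 5.0]
```

For `sim_balanced` the percent bias is 0 although every value is off by 1, whereas `sim_wet` overestimates throughout and yields a positive percent bias. This is why the bias is reported together with mae and rmse.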

### Assessing the regional performance

To compare the results to similar precipitation datasets, we use GHCN-D and four other gridded datasets (Table 2) that provide data over the same time period: PRISM (AN81d)26, MSWEP 2.112, CHIRPS 2.043, and WorldClim 2.184. PRISM is a high-resolution precipitation dataset for the United States that, similar to CHELSA_EarthEnv, takes orographic effects into account and additionally profits from a dense quality-controlled network of weather stations. While PRISM uses a regression approach to predict long-term precipitation climatologies, daily precipitation is derived from climatologically aided interpolation (CAI)85. MSWEP 2.1 is a merged product from various sources (weather stations, reanalysis data, satellite observations) and consistently has high performance scores in comparison to other precipitation products6. CHIRPS is a high-resolution precipitation dataset that integrates remotely sensed precipitation with observations from weather stations. Additionally, we include the WorldClim 2.1 data in our comparison. Although WorldClim 2.1 does not offer daily data, it provides monthly time series that have been created using climatologically aided interpolation of the CRU-TS 4.03 data86. All these datasets have been aggregated over the period 2003–2016 to annual means to obtain a temporal extent comparable to that of CHELSA_EarthEnv. We then compare these data from the different datasets with observations from GHCN-D80 for the continental United States except Alaska. Within this spatial extent all five products overlap and the quality of the stations can be considered high. All products have additionally been aggregated to a 0.25° grid resolution by taking the mean of all grid cells overlapping with a 0.25° grid cell in WGS84 geographic projection.
We then used all stations with data available between 2003 and 2016 and without any quality flag (58,071 stations) and extracted precipitation from both the highest available spatial resolution of the different datasets (Table 2) and the coarse 0.25° resolution using a nearest neighbour approach. We then calculated the differences in absolute bias between coarse and high resolution, and compared these among products using an ANOVA with post-hoc Tukey HSD test.

### Comparison with PRISM

The validation of the temporal accuracy using the GHCN-D station data indicates how well a product reproduces precipitation directly at the locations of these stations. All products we compare to CHELSA here are, however, at least partly parameterized on a subset of the GHCN-D stations as well. This often leads to a high fit with station data at the station locations in all products that use exactly these climate stations. Predicted precipitation patterns between stations, where the data are actually interpolated or predicted, cannot be validated in this way. The ability of a model to predict the spatial patterns of precipitation correctly could, for example, be assessed by a cross-validation approach, but this is not possible without the station data or the source code of the respective model being available. As the exact station data each dataset uses are generally not available, we use the spatially explicit PRISM model as a benchmark for comparison. PRISM has a very high accuracy and captures small-scale precipitation gradients well. It uses the largest number of meteorological stations of all models compared here. It is, however, also a model and therefore has its own inherent biases. To compare models, we aggregated the daily values (monthly for WorldClim) over 2003–2016 to mean annual precipitation, and calculated the bias and correlation between products.

### Comparing precipitation lapse rates

In a case study, we compare CHELSA-EarthEnv’s annual precipitation climatology in coastal British Columbia with that of PRISM, simulation data from the Weather Research and Forecasting (WRF) convection-permitting dynamical simulation for North America87, and WorldClim 2.1. We calculated horizontal precipitation gradients for each grid cell by multiplying the precipitation lapse rate by the terrain slope. The precipitation lapse rate is calculated from a moving window regression of precipitation against elevation in the 8 cells surrounding the focal cell.
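The gradient calculation for a single focal cell can be sketched as an ordinary least-squares fit over its 8 neighbours. This is an illustration only: the function names and values are hypothetical, the slope is assumed to be given as rise over run, and returning a lapse rate of 0 for a flat neighbourhood is an assumption of this sketch.

```python
def lapse_rate(precip_neighbours, elev_neighbours):
    """OLS slope of precipitation against elevation over the neighbours."""
    n = len(elev_neighbours)
    mean_z = sum(elev_neighbours) / n
    mean_p = sum(precip_neighbours) / n
    sxx = sum((z - mean_z) ** 2 for z in elev_neighbours)
    if sxx == 0.0:
        return 0.0  # flat neighbourhood: no elevation signal (assumption)
    sxy = sum((z - mean_z) * (p - mean_p)
              for z, p in zip(elev_neighbours, precip_neighbours))
    return sxy / sxx


def horizontal_gradient(precip_neighbours, elev_neighbours, slope):
    """Horizontal precipitation gradient = lapse rate * terrain slope."""
    return lapse_rate(precip_neighbours, elev_neighbours) * slope


# Hypothetical 8 neighbours: precipitation rises by 0.5 units per metre.
elev = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0, 700.0, 800.0]
precip = [500.0 + 0.5 * z for z in elev]
gradient = horizontal_gradient(precip, elev, slope=0.2)
```

With a lapse rate of 0.5 precipitation units per metre of elevation and a slope of 0.2, the horizontal gradient is 0.1 units per metre of horizontal distance.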

### Assessing the improvement from the cloud layers

We validate the inclusion of the cloud frequencies from MODIS in two steps. First, we compare the global performance of the precipitation dataset with and without cloud refinement using GHCN-D. The refinement, however, is done at the 0.0083334° resolution, and the mesoscale patterns of the data with or without refinement are nearly identical. Second, to compare the two datasets with and without refinement at the scale where an effect of the cloud layer is actually expected, we use the island of Hawai’i as an example. Here both the station density and the quality of the stations are high, and the island has steep precipitation gradients ranging from nearly 0 to >20 kg m−2s−1. We use 105 stations from the GHCN-D dataset that recorded at least 25 days per month between 2003 and 2016 and compare their annual mean precipitation to that derived at the original 0.25° resolution, from the data without cloud refinement, and from the data with cloud refinement at 0.0083334° resolution.

### Global downscaling performance across several metrics

Kling-Gupta Efficiency, as well as Pearson’s r values were highest in Europe, Central Asia, and North America (Supplementary Fig. 1). The lowest values are found within the tropics, but also in areas with very high precipitation, such as Venezuela, Colombia, or the Congo basin, or very low precipitation, such as the Sahara, or the Arabian Peninsula. There are several possible explanations for the relatively lower performance in the tropics. We are using the GHCN-D dataset for validation, as it is one of the few available datasets for large-scale, global validation of precipitation. Gauge data such as GHCN-D is however very heterogeneous in quality30,83,85,86 and, even after cleaning using the provided quality flags, errors likely remain. The lower validation performance in these regions may therefore be partially an artefact of poor station data quality.

Differences in KGE values between coarse and high resolution are larger in areas with large spatial heterogeneity such as mountains (Fig. 1). This shows that the downscaling has a positive effect on the estimation of precipitation at high spatial resolutions (Table 3). The increase in KGE values is, however, not confined to areas with heterogeneous terrain, but also occurs in the lowlands of the United States or Europe. The high-resolution data show improvements in KGE and all of its components compared to the coarse 0.25° data. Performance gains are also given for the root mean squared error (rmse), mean absolute error (mae), and percent bias (pbias) (Fig. 1). The global performance gain is $$\Delta KGE = 0.045$$, but it shows a strong geographical pattern (Fig. 1), especially in mountainous regions such as the Andes or the Rocky Mountains, but also in large parts of Asia. Performance losses are most prominent in Western Indonesia, whereas the rest of Indonesia shows a gain in KGE.
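For readers who want to reproduce the error metrics named above, a minimal sketch follows. The rmse and mae definitions are standard; the percent bias is assumed here to be 100 times the summed error divided by the summed observations, which is a common convention but is not confirmed by the text. The sample values are invented for illustration only.

```python
import math

def rmse(sim, obs):
    """Root mean squared error between simulated and observed values."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))

def mae(sim, obs):
    """Mean absolute error between simulated and observed values."""
    return sum(abs(s - o) for s, o in zip(sim, obs)) / len(obs)

def pbias(sim, obs):
    """Percent bias (assumed convention): 100 * sum(sim - obs) / sum(obs)."""
    return 100.0 * sum(s - o for s, o in zip(sim, obs)) / sum(obs)

# Invented example: a coarse estimate that is 1 unit too wet everywhere,
# and a high-resolution estimate that is 0.5 units too wet everywhere.
obs = [0.0, 2.0, 4.0, 6.0]
coarse = [1.0, 3.0, 5.0, 7.0]
high = [0.5, 2.5, 4.5, 6.5]

# For error metrics, a positive gain means the high-resolution error is smaller.
rmse_gain = rmse(coarse, obs) - rmse(high, obs)
mae_gain = mae(coarse, obs) - mae(high, obs)
```

The sign convention for a "gain" is a choice made in this sketch: for error metrics it is the coarse error minus the high-resolution error, so that positive values indicate an improvement.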

While globally the increase in the γ component of KGE is larger than the increase in the β or r components, increases in r and β prevail in most of the regions with the highest gain in KGE. A possible explanation for this is that the inclusion of topography in the downscaling has the largest effect on the bias (Fig. 1).

The more evenly distributed differences in the γ component, which reflects the variability in precipitation, are most likely due to the inclusion of the MODIS cloud cover, which adds information on the spatio-temporal variance in precipitation to the downscaling. Although we only included monthly cloud frequency distributions in the downscaling, this shows the potential that high-resolution cloud cover frequencies have for improving high-resolution precipitation estimates globally.
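The three KGE components discussed above can be made concrete with a short sketch. It assumes the modified KGE formulation of Kling et al.82, in which r is the Pearson correlation, β the ratio of means (bias), and γ the ratio of coefficients of variation (variability); whether the validation code uses exactly this variant is an assumption, and the function names are hypothetical.

```python
import math

def kge_components(sim, obs):
    """Return (r, beta, gamma) under the assumed modified KGE formulation.

    Requires non-constant series with non-zero means, otherwise a division by zero occurs.
    """
    n = len(obs)
    mean_s = sum(sim) / n
    mean_o = sum(obs) / n
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / n
    r = cov / (sd_s * sd_o)                      # correlation component
    beta = mean_s / mean_o                       # bias component
    gamma = (sd_s / mean_s) / (sd_o / mean_o)    # variability component
    return r, beta, gamma

def kge(sim, obs):
    """KGE = 1 - Euclidean distance of (r, beta, gamma) from the ideal point (1, 1, 1)."""
    r, beta, gamma = kge_components(sim, obs)
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (beta - 1.0) ** 2 + (gamma - 1.0) ** 2)

# Invented example: a simulation that doubles every observation has perfect
# correlation and variability ratio, but a bias ratio of 2.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0 * o for o in obs]
r, beta, gamma = kge_components(sim, obs)
score = kge(sim, obs)
```

Under this formulation a gain such as ΔKGE would be the KGE of the high-resolution data minus the KGE of the coarse data at the same stations, and the same difference can be taken for each component separately.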

### Regional performance

The comparison of all five precipitation products for the continental United States shows a relatively high performance of all datasets (Fig. 2), ranging from a correlation of r ~ 0.85 (PRISM) to r ~ 0.5 (CHIRPS). CHELSA_EarthEnv performs slightly worse than MSWEP in estimating daily precipitation rates, but better than CHIRPS. PRISM performs best, with the highest correlations with GHCN-D. The performance increases for all products when monthly climatological means are used instead of daily precipitation values, with CHELSA, CHIRPS, and MSWEP performing almost identically. PRISM still slightly outperforms all other products. WorldClim performs comparably poorly during the period 2003–2016, with low correlations (r ~ 0.5) and a much higher standard deviation than all other products.

All precipitation products use part of the GHCN-D stations to parametrize their algorithms. PRISM uses the daily station data directly: it takes the anomalies from long-term climatologies at the stations and interpolates them to obtain a gap-free anomaly surface for the CAI. The achieved performance might therefore be due to the high station density in PRISM itself. CHELSA_EarthEnv uses GPCC gridded station data at 0.25° for a bias correction; the algorithm therefore cannot force the interpolation through each station location directly, which might explain the difference between PRISM and CHELSA_EarthEnv. CHIRPS uses a smaller set of stations than PRISM, so the difference in performance might partly be due to the less dense station network. MSWEP uses a wide variety of input sources, from remotely sensed data to reanalysis data to station data. MSWEP therefore averages out most of the errors of a single source, which leads to a relatively high performance in the resulting precipitation estimates12. Interestingly, WorldClim does not perform well compared to the other products, despite being parameterized on a large number of stations. This might be due to errors in the parametrization of the predictors used for the long-term climatologies, or to uncertainties from the CAI applied to the CRU-TS data.

### Downscaling performance in relation to comparable products

The bias compared to observations at stations is heterogeneous in all precipitation datasets. PRISM shows the lowest bias compared to GHCN-D data, while CHELSA_EarthEnv, MSWEP, and CHIRPS show similar biases (Fig. 3). WorldClim has the largest overall bias of all five products.

A similar pattern emerges when the different products are compared at the 0.25° and the highest resolution. Comparing the absolute bias of the coarse resolution aggregations with that of the highest available resolutions shows that all precipitation datasets have a lower absolute bias at the highest spatial resolution (Fig. 3). The amount of bias reduction, however, varies to a large degree, with PRISM and CHELSA_EarthEnv showing the largest bias reduction, CHIRPS and MSWEP a slightly lower bias reduction, and WorldClim the lowest reduction. The relatively smaller reduction of CHIRPS and MSWEP might be due to their lower native spatial resolution compared to CHELSA_EarthEnv and PRISM (Table 2). However, the monthly WorldClim time series has the same native spatial resolution as PRISM and still has a very small difference in absolute bias between the high and the coarse resolution, indicating poor downscaling performance.
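The comparison of absolute bias between a coarse aggregation and the native resolution can be sketched as follows. The sketch assumes that the coarse field is a simple block mean of the high-resolution grid and that each station is compared with the cell it falls in; the actual aggregation and sampling used for Fig. 3 are not described here, and all names and values are hypothetical.

```python
def block_mean(grid, factor):
    """Aggregate a 2D grid (list of rows) by averaging factor x factor blocks."""
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r0 in range(0, rows, factor):
        coarse_row = []
        for c0 in range(0, cols, factor):
            block = [
                grid[r][c]
                for r in range(r0, min(r0 + factor, rows))
                for c in range(c0, min(c0 + factor, cols))
            ]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse

def bias_reduction(grid, factor, stations):
    """Per station: |bias at coarse resolution| minus |bias at high resolution|.

    stations: list of (row, col, observed) tuples indexed on the high-resolution grid.
    Positive values mean the high-resolution estimate is closer to the observation.
    """
    coarse = block_mean(grid, factor)
    reductions = []
    for row, col, observed in stations:
        high_bias = abs(grid[row][col] - observed)
        coarse_bias = abs(coarse[row // factor][col // factor] - observed)
        reductions.append(coarse_bias - high_bias)
    return reductions

# Invented 2 x 2 example aggregated to a single coarse cell (block mean = 4).
grid = [[1.0, 3.0], [5.0, 7.0]]
stations = [(0, 0, 1.0), (1, 1, 4.0)]
reductions = bias_reduction(grid, 2, stations)
```

In this toy case the first station benefits from the fine-scale value (reduction of 3), while the second happens to match the block mean and is therefore better served by the coarse value (reduction of −3).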

Downscaling performance also varies geographically (Fig. 4). Generally, the bias reduction is higher in mountainous regions of the western United States and lower in the more homogeneous terrain in the east. Comparing the stations at which the bias is reduced (Fig. 5) shows that PRISM, CHELSA_EarthEnv, MSWEP and CHIRPS are able to reduce the absolute bias in mountainous terrain, but also in the convective regimes of the Midwest and Southwest of the United States. WorldClim only reduces the bias in the mountainous regions, but does not reduce the precipitation bias in convective regimes.

### Comparison with PRISM

PRISM consistently shows the highest performance metrics and is therefore a suitable benchmark for a spatially explicit comparison. Overall, all precipitation datasets show similar mesoscale patterns of precipitation (Fig. 5). Marked differences are mainly apparent in the southwestern United States, where all other products are comparably drier than PRISM. Differences are also apparent in the eastern Rocky Mountains, where CHIRPS, MSWEP, and WorldClim have a considerable dry bias, whereas CHELSA_EarthEnv shows precipitation rates more similar to those of PRISM. Overall, CHELSA_EarthEnv shows the smallest differences from and the highest correlation with PRISM (Fig. 6) (r = 0.97, mae = 0.20), followed by MSWEP (r = 0.97, mae = 0.23) and CHIRPS (r = 0.96, mae = 0.23). WorldClim shows the largest differences from PRISM and the lowest correlation among all products (r = 0.95, mae = 0.28).

### Precipitation lapse rates

The general similarity between CHELSA-EarthEnv and PRISM (at 800 m resolution) in precipitation amount and in precipitation gradients (Fig. 7d,f) is notable, given that elevation-precipitation relationships in CHELSA-EarthEnv are produced by the orographic wind effect algorithm, rather than by elevational relationships to station observations as in PRISM. The WRF simulation is independent of station observations and provides further evidence that precipitation increases with elevation in this region (Fig. 7h). Weaker gradients in WRF are due to the coarser (4 km) grid scale, which imposes more subdued gradients of both terrain and precipitation. The strong negative gradients in WorldClim2 (Fig. 7j) are due to derivation of a precipitation-elevation relationship from stations spanning the windward (low elevation stations with high precipitation) and leeward (higher elevation stations with low precipitation) sides of the mountain range. These erroneous negative gradients produce a strong underestimation of regional precipitation (Fig. 7i) as they are used to extrapolate station precipitation into higher elevations (Fig. 7k) that have very low station density. This case study illustrates the utility of CHELSA-EarthEnv for mountainous regions with sparse station observations: the dynamical ERA5 reanalysis provides a physically plausible regional distribution of precipitation while the orographic wind effects algorithm provides credible local elevational gradients, even in the absence of station observations.

### Improvement from the cloud layers

The global comparison between the predicted precipitation with and without cloud cover refinement yielded very small differences in all test metrics, indicating no significant difference at the global scale (with cloud refinement: r = 0.609, mae = 2.404; without refinement: r = 0.610, mae = 2.402). The cloud cover refinement, however, acts on a spatial scale that is not necessarily captured well by a global comparison. The local comparison for the island of Hawai’i (Fig. 8) indicates that the cloud cover refinement largely acts on the local scale, where it reduces the wet bias of the interpolation without cloud cover refinement. Without the refinement, the CHELSA algorithm distributes precipitation based on wind fields and boundary layer height alone. It does not distinguish areas that are usually above the clouds very well, leading to an overestimation of precipitation in these areas. Here the cloud cover refinement shows an effect by increasing the correlation between predicted and observed precipitation, as well as decreasing the error in the estimates (Fig. 8).

### Validation results—Conclusions

The comparison of the coarse grid resolution with the high resolution of CHELSA_EarthEnv shows that the applied downscaling is able to increase the accuracy of the precipitation predictions in several aspects and generates realistic precipitation patterns in complex terrain. The downscaling algorithm together with remotely sensed cloud cover performs as well as other high-resolution products in predicting precipitation. The CHELSA_EarthEnv algorithm produces high-resolution precipitation patterns similar to those of datasets that need to be informed by a high-quality, dense weather station network, without directly relying on stations itself. With respect to the realistic simulation of precipitation gradients in complex terrain, it also outperforms comparable high-resolution global products.

## Usage Notes

Note that because of the pixel center referencing of the input GMTED2010 data the full extent of each grid as defined by the outside edges of the pixels differs from an integer value of latitude or longitude by 0.000138888888 degree (or 1/2 arc-second). Users of products based on the legacy GTOPO30 product should note that the coordinate referencing of each grid (and GMTED2010) and GTOPO30 are not the same. In GTOPO30, the integer lines of latitude and longitude fall directly on the edges of a 30-arc-second pixel. Thus, when overlaying grids with products based on GTOPO30 a slight shift of 1/2 arc-second will be observed between the edges of corresponding 30-arc-second pixels.
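The size of this offset can be checked with a few lines of arithmetic. The sketch below only reproduces the magnitude of the shift; the conversion to metres assumes roughly 111.32 km per degree, which is an approximation introduced here and valid only near the equator for longitude.

```python
ARCSEC_DEG = 1.0 / 3600.0            # one arc-second expressed in degrees
HALF_ARCSEC_DEG = 0.5 * ARCSEC_DEG   # offset between the two referencing schemes
CELL_DEG = 30.0 * ARCSEC_DEG         # width of a 30 arc-second pixel in degrees

# Offset as a fraction of one 30 arc-second pixel.
shift_fraction = HALF_ARCSEC_DEG / CELL_DEG

# Approximate ground distance of the offset (assumed 111.32 km per degree).
METRES_PER_DEGREE = 111320.0
shift_metres = HALF_ARCSEC_DEG * METRES_PER_DEGREE

print(f"offset: {HALF_ARCSEC_DEG:.12f} degree")
print(f"fraction of a pixel: {shift_fraction:.4f}")
print(f"approximate distance: {shift_metres:.1f} m")
```

The shift corresponds to 1/60 of a pixel width, so it matters mainly when grids are overlaid cell by cell rather than for regional summaries.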

CHELSA_EarthEnv differs in several aspects from the already available climatological data (CHELSA V1-V2)37 and the long-term downscaled CMIP5 modelled data (CHELSAcmip5ts)39. The main difference is the increase in temporal resolution to daily, compared to the other two datasets. It is similar to CHELSA V1.x in that both are “observational” datasets, while CHELSAcmip5ts is a downscaled “modelled” dataset. A value of a climate variable for a specific day or month in CHELSA_EarthEnv or CHELSA V1.x can therefore be seen as an event that has actually been recorded, while a value in the CHELSAcmip5ts dataset is only modelled and, like the values in the forcing CMIP5 models, does not represent a real observation.

## Code availability

The code calculating the bias correction on the CHELSA V2.0 precipitation data is written in Python 2.7 and C++ (via the SAGA-GIS API). The code for the cloud cover refinement is available here: https://gitlabext.wsl.ch/karger/chelsa_earthenv. The code for the validation is available here: https://gitlabext.wsl.ch/karger/chelsa_earthenv_validation.

## References

1. Kucera, P. A. et al. Precipitation from Space: Advancing Earth System Science. Bull. Am. Meteorol. Soc. 94, 365–375 (2012).
2. Tapiador, F. J. et al. Global precipitation measurement: Methods, datasets and applications. Atmospheric Res. 104–105, 70–97 (2012).
3. Kirschbaum, D. B. et al. NASA’s Remotely Sensed Precipitation: A Reservoir for Applications Users. Bull. Am. Meteorol. Soc. 98, 1169–1184 (2016).
4. Sun, Q. et al. A Review of Global Precipitation Data Sets: Data Sources, Estimation, and Intercomparisons. Rev. Geophys. 56, 79–107 (2018).
5. Beck, H. E. et al. MSWEP V2 Global 3-Hourly 0.1° Precipitation: Methodology and Quantitative Assessment. Bull. Am. Meteorol. Soc. 100, 473–500 (2019).
6. Beck, H. E. et al. Daily evaluation of 26 precipitation datasets using Stage-IV gauge-radar data for the CONUS. Hydrol. Earth Syst. Sci. 23, 207–224 (2019).
7. Schär, C. et al. Kilometer-scale climate models: Prospects and challenges. Bull. Am. Meteorol. Soc. 101 (2019).
8. Service (C3S), C. C. C. ERA5: Fifth generation of ECMWF atmospheric reanalyses of the global climate, Copernicus Climate Change Service Climate Data Store (CDS). (2017).
9. Hersbach, H. et al. Operational global reanalysis: progress, future directions and synergies with NWP. (2018).
10. Gelaro, R. et al. The Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2). J. Clim. 30, 5419–5454 (2017).
11. Reichle, R. H. et al. Land surface precipitation in MERRA-2. J. Clim. 30, 1643–1664 (2017).
12. Beck, H. E. et al. MSWEP: 3-hourly 0.25° global gridded precipitation (1979–2015) by merging gauge, satellite, and reanalysis data. Hydrol Earth Syst Sci Discuss 2016, 1–38 (2016).
13. Skamarock, W. C. Evaluating mesoscale NWP models using kinetic energy spectra. Mon. Weather Rev. 132, 3019–3032 (2004).
14. Ménégoz, M., Gallée, H. & Jacobi, H. W. Precipitation and snow cover in the Himalaya: from reanalysis to regional climate simulations. Hydrol. Earth Syst. Sci. 17 (2013).
15. Liu, Z. et al. Evaluation of spatial and temporal performances of ERA-Interim precipitation and temperature in mainland China. J. Clim. 31, 4347–4365 (2018).
16. Skamarock, C. et al. A Description of the Advanced Research WRF Model Version 4. OpenSky https://doi.org/10.5065/1dfh-6p97 (2019).
17. Dipankar, A. et al. Large eddy simulation using the general circulation model ICON. J. Adv. Model. Earth Syst. 7, 963–986 (2015).
18. Heinze, R. et al. Large-eddy simulations over Germany using ICON: a comprehensive evaluation. Q. J. R. Meteorol. Soc. 143, 69–100 (2017).
19. Fuhrer, O. et al. Near-global climate simulation at 1km resolution: establishing a performance baseline on 4888 GPUs with COSMO 5.0. Geosci. Model Dev. 11, 1665–1681 (2018).
20. Schulthess, T. C. et al. Reflecting on the goal and baseline for exascale computing: a roadmap based on weather and climate simulations. Comput. Sci. Eng. 21, 30–41 (2018).
21. Neumann, P. et al. Assessing the scales in numerical weather and climate predictions: will exascale be the rescue? Philos. Trans. R. Soc. Math. Phys. Eng. Sci. 377, 20180148 (2019).
22. Woodward, F. I., Fogg, G. E., Heber, U., Laws, R. M. & Franks, F. The impact of low temperatures in controlling the geographical distribution of plants. Philos. Trans. R. Soc. Lond. B Biol. Sci. 326, 585–593 (1990).
23. Guisan, A. & Zimmermann, N. E. Predictive habitat distribution models in ecology. Ecol. Model. 135, 147–186 (2000).
24. Guisan, A. & Thuiller, W. Predicting species distribution: offering more than simple habitat models. Ecol. Lett. 8, 993–1009 (2005).
25. Tabios, G. Q. & Salas, J. D. A Comparative Analysis of Techniques for Spatial Interpolation of Precipitation. JAWRA J. Am. Water Resour. Assoc. 21, 365–380 (1985).
26. Daly, C., Taylor, G. H. & Gibson, W. P. The PRISM approach to mapping precipitation and temperature. Proc 10th AMS Conf Appl. Climatol. 20–23 (1997).
27. Thornton, P. E., Running, S. W. & White, M. A. Generating surfaces of daily meteorological variables over large regions of complex terrain. J. Hydrol. 190, 214–251 (1997).
28. Hijmans, R. J., Cameron, S. E., Parra, J. L., Jones, P. G. & Jarvis, A. Very high resolution interpolated climate surfaces for global land areas. Int. J. Climatol. 25, 1965–1978 (2005).
29. Briggs, P. R. & Cogley, J. G. Topographic bias in mesoscale precipitation networks. J. Clim. 9, 205–218 (1996).
30. Schneider, U. et al. GPCC’s new land surface precipitation climatology based on quality-controlled in situ data and its role in quantifying the global water cycle. Theor. Appl. Climatol. 115, 15–40 (2013).
31. Kidd, C. et al. So, how much of the Earth’s surface is covered by rain gauges? Bull. Am. Meteorol. Soc. 98, 69–78 (2017).
32. Berndt, C. & Haberlandt, U. Spatial interpolation of climate variables in Northern Germany—Influence of temporal resolution and network density. J. Hydrol. Reg. Stud. 15, 184–202 (2018).
33. Groisman, P. Y. & Legates, D. R. The accuracy of United States precipitation data. Bull. Am. Meteorol. Soc. 75, 215–228 (1994).
34. Sevruk, B. Regional Dependency of Precipitation-Altitude Relationship in the Swiss Alps. in Climatic Change at High Elevation Sites (eds. Diaz, H. F., Beniston, M. & Bradley, R. S.) 123–137, https://doi.org/10.1007/978-94-015-8905-5_7 (Springer Netherlands, 1997).
35. Rasmussen, R. et al. How well are we measuring snow: The NOAA/FAA/NCAR winter precipitation test bed. Bull. Am. Meteorol. Soc. 93, 811–829 (2012).
36. Beck, H. E. et al. Bias Correction of Global High-Resolution Precipitation Climatologies Using Streamflow Observations from 9372 Catchments. J. Clim. 33, 1299–1315 (2020).
37. Karger, D. N. et al. Climatologies at high resolution for the earth’s land surface areas. Sci. Data 4, 170122 (2017).
38. Muñoz-Sabater, J. et al. ERA5-Land: an improved version of the ERA5 reanalysis land component. in Joint ISWG and LSA-SAF Workshop IPMA, Lisbon 26–28 (2018).
39. Karger, D. N., Schmatz, D. R., Dettling, G. & Zimmermann, N. E. High resolution monthly precipitation and temperature timeseries for the period 2006–2100. Sci. Data (2020).
40. Huffman, G. J. et al. The TRMM Multisatellite Precipitation Analysis (TMPA): Quasi-Global, Multiyear, Combined-Sensor Precipitation Estimates at Fine Scales. J. Hydrometeorol. 8, 38–55 (2007).
41. Biasutti, M., Yuter, S. E., Burleyson, C. D. & Sobel, A. H. Very high resolution rainfall patterns measured by TRMM precipitation radar: seasonal and diurnal cycles. Clim. Dyn. 39, 239–258 (2011).
42. Goddard Space Flight Center Distributed Active Archive Center (GSFC DAAC). TRMM/TMPA 3B43 TRMM and Other Sources Monthly Rainfall Product V7. (2011).
43. Funk, C. et al. The climate hazards infrared precipitation with stations—a new environmental record for monitoring extremes. Sci. Data 2, 150066 (2015).
44. Levizzani, V., Laviola, S. & Cattani, E. Detection and measurement of snowfall from space. Remote Sens. 3, 145–166 (2011).
45. Skofronick-Jackson, G. et al. Global precipitation measurement cold season precipitation experiment (GCPEX): for measurement’s sake, let it snow. Bull. Am. Meteorol. Soc. 96, 1719–1741 (2015).
46. Vila, D. A., D Goncalves, L. G. G., Toll, D. L. & Rozante, J. R. Statistical evaluation of combined daily gauge observations and rainfall satellite estimates over continental South America. J. Hydrometeorol. 10, 533–543 (2009).
47. Xie, P., Yoo, S.-H., Joyce, R. & Yarosh, Y. Bias-corrected CMORPH: A 13-year analysis of high-resolution global precipitation. In Geophysical Research Abstracts 13, EGU2011–1809 (2011).
48. Xie, P. & Xiong, A.-Y. A conceptual model for constructing high‐resolution gauge‐satellite merged precipitation analyses. J. Geophys. Res. Atmospheres 116 (2011).
49. Vernimmen, R. R. E., Hooijer, A., Mamenun, N. K., Aldrian, E. & Van Dijk, A. Evaluation and bias correction of satellite rainfall data for drought monitoring in Indonesia. (2012).
50. Cannon, A. J., Sobie, S. R. & Murdock, T. Q. Bias Correction of GCM Precipitation by Quantile Mapping: How Well Do Methods Preserve Changes in Quantiles and Extremes? J. Clim. 28, 6938–6959 (2015).
51. Richards, F. & Arkin, P. On the relationship between satellite-observed cloud cover and precipitation. Mon. Weather Rev. 109, 1081–1093 (1981).
52. Arkin, P. A. & Meisner, B. N. The relationship between large-scale convective rainfall and cold cloud over the western hemisphere during 1982-84. Mon. Weather Rev. 115, 51–74 (1987).
53. Betts, A. K., Tawfik, A. B. & Desjardins, R. L. Revisiting Hydrometeorology Using Cloud and Climate Observations. J. Hydrometeorol. 18, 939–955 (2017).
54. Wilson, A. M. & Jetz, W. Remotely Sensed High-Resolution Global Cloud Dynamics for Predicting Ecosystem and Biodiversity Distributions. PLOS Biol 14, e1002415 (2016).
55. Hersbach, H. et al. The ERA5 global reanalysis. Q. J. R. Meteorol. Soc. 146, 1999–2049 (2020).
56. Hersbach, H. et al. Global reanalysis: goodbye ERA-Interim, hello ERA5. 17–24, https://doi.org/10.21957/vf291hehd7 (2019).
57. Cucchi, M. et al. WFDE5: bias-adjusted ERA5 reanalysis data for impact studies. Earth Syst. Sci. Data 12, 2097–2120 (2020).
58. Kållberg, P. Forecast drift in ERA-Interim. ERA Rep. Ser. 10, 9 (2011).
59. Ziese, M. et al. GPCC Full Data Daily Version.2018 at 1.0°: Daily Land-Surface Precipitation from Rain-Gauges built on GTS-based and Historic Data. DWD 10.5676/DWD_GPCC/FD_D_V2018_100.
60. Lee, S., Wolberg, G. & Shin, S. Y. Scattered data interpolation with multilevel B-splines. IEEE Trans. Vis. Comput. Graph. 3, 228–244 (1997).
61. Press, W. H., Flannery, B. P., Teukolsky, S. A. & Vetterling, W. T. Numerical recipes. vol. 3 (Cambridge University Press Cambridge, 1989).
62. Basist, A., Bell, G. D. & Meentemeyer, V. Statistical Relationships between Topography and Precipitation Patterns. J. Clim. 7, 1305–1315 (1994).
63. Weisse, A. K. & Bois, P. Topographic Effects on Statistical Characteristics of Heavy Rainfall and Mapping in the French Alps. J. Appl. Meteorol. 40, 720–740 (2001).
64. Marquínez, J., Lastra, J. & García, P. Estimation models for precipitation in mountainous regions: the use of GIS and multivariate analysis. J. Hydrol. 270, 1–11 (2003).
65. Smith, R. B. & Barstad, I. A Linear Theory of Orographic Precipitation. J. Atmospheric Sci. 61, 1377–1391 (2004).
66. Jiang, Q. Precipitation over multiscale terrain. Tellus Dyn. Meteorol. Oceanogr. 59, 321–335 (2007).
67. Böhner, J. Advancements and new approaches in climate spatial prediction and environmental modelling. Arbeitsberichte Geogr. Inst. HU Zu Berl. 109, 49–90 (2005).
68. Böhner, J. General climatic controls and topoclimatic variations in Central and High Asia. Boreas 35, 279–295 (2006).
69. Böhner, J. & Antonic, O. Land-Surface Parameters Specific to Topo-Climatology. in Geomorphometry: Concepts, Software, Applications (eds. Hengl, T. & Reuter, H. I.) 195–226 (Elsevier Science, 2009).
70. Gerlitz, L., Conrad, O. & Böhner, J. Large-scale atmospheric forcing and topographic modification of precipitation rates over High Asia – a neural-network-based approach. Earth Syst Dynam 6, 61–81 (2015).
71. Austin, G. L. & Dirks, K. N. Topographic Effects on Precipitation. in Encyclopedia of Hydrological Sciences https://doi.org/10.1002/0470848944.hsa033 (American Cancer Society, 2006).
72. Liu, M., Bárdossy, A. & Zehe, E. Interaction of valleys and circulation patterns (CPs) on small-scale spatial precipitation distribution in the complex terrain of southern Germany. Hydrol. Earth Syst. Sci. Discuss. 9 (2012).
73. Vogelezang, D. H. P. & Holtslag, A. A. M. Evaluation and model impacts of alternative boundary-layer height formulations. Bound.-Layer Meteorol. 81, 245–269 (1996).
74. von Engeln, A. & Teixeira, J. A Planetary Boundary Layer Height Climatology Derived from ECMWF Reanalysis Data. J. Clim. 26, 6575–6590 (2013).
75. Frei, C. & Schär, C. A precipitation climatology of the Alps from high-resolution rain-gauge observations. Int. J. Climatol. 18, 873–900 (1998).
76. Roger, J. C. & Vermote, E. F. A method to retrieve the reflectivity signature at 3.75 μm from AVHRR data. Remote Sens. Environ. 64, 103–114 (1998).
77. Petitcolin, F. & Vermote, E. Land surface reflectance, emissivity and temperature from MODIS middle and thermal infrared data. Remote Sens. Environ. 83, 112–134 (2002).
78. Kalnay, E. et al. The NCEP/NCAR 40-Year Reanalysis Project. Bull. Am. Meteorol. Soc. 77, 437–471 (1996).
79. Karger, D. N., Wilson, A. M., Mahony, C., Zimmermann, N. E. & Jetz, W. Global daily 1km land surface precipitation based on cloud cover-informed downscaling. EarthEnv, https://doi.org/10079/MOL/6f52b80d-0a41-40f7-84ec-873458ca6ee6 (2021).
80. Menne, M. J. et al. Global Historical Climatology Network - Daily (GHCN-Daily), Version 3. NOAA National Climatic Data Center. 10.7289/V5D21VHZ [access 3.11.2018]. (2018).
81. Gupta, H. V., Kling, H., Yilmaz, K. K. & Martinez, G. F. Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. J. Hydrol. 377, 80–91 (2009).
82. Kling, H., Fuchs, M. & Paulin, M. Runoff conditions in the upper Danube basin under an ensemble of climate change scenarios. J. Hydrol. 424–425, 264–277 (2012).
83. Knoben, W. J. M., Freer, J. E. & Woods, R. A. Technical note: Inherent benchmark or not? Comparing Nash–Sutcliffe and Kling–Gupta efficiency scores. Hydrol. Earth Syst. Sci. 23, 4323–4331 (2019).
84. Fick, S. E. & Hijmans, R. J. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. Int. J. Climatol. 37, 4302–4315 (2017).
85. Willmott, C. J. & Robeson, S. M. Climatologically aided interpolation (CAI) of terrestrial air temperature. Int. J. Climatol. 15, 221–229 (1995).
86. Harris, I., Osborn, T. J., Jones, P. & Lister, D. Version 4 of the CRU TS monthly high-resolution gridded multivariate climate dataset. Sci. Data 7, 1–18 (2020).
87. Liu, C. et al. Continental-scale convection-permitting modeling of the current and future climate of North America. Clim. Dyn. 49, 71–95 (2017).
88. Sorooshian, S., Duan, Q. & Gupta, V. K. Calibration of rainfall-runoff models: Application of global optimization to the Sacramento Soil Moisture Accounting Model. Water Resour. Res. 29, 1185–1194 (1993).

## Acknowledgements

D.N.K. & N.E.Z. acknowledge funding from: The WSL internal grant exCHELSA, the 2019–2020 BiodivERsA joint call for research proposals, under the BiodivClim ERA-Net COFUND program, with the funding organisations Swiss National Science Foundation SNF (project: FeedBaCks, 193907), Agence nationale de la recherche (ANR-20-EBI5-0001-05), the Swedish Research Council for Sustainable Development (Formas 2020–02360), the German Research Foundation (DFG BR 1698/21–1, DFG HI 1538/16–1), and the Technology Agency of the Czech Republic (SS70010002), as well as the Swiss Data Science Projects: SPEEDMIND, and COMECO. D.N.K. acknowledges funding to the ERA-Net BiodivERsA - Belmont Forum, with the national funder Swiss National Foundation (20BD21_184131), part of the 2018 Joint call BiodivERsA-Belmont Forum call (project ‘FutureWeb’), the WSL internal grant ClimEx. We thank EarthEnv project collaborators Rob Guralnick and Brian McGill for discussions preceding and intellectually benefitting the research presented here. W.J. acknowledges funding from NASA grants 80NSSC17K0282, 80NSSC20K0202, and 80NSSC18K0435.

## Author information

### Contributions

D.N.K., A.W. and W.J. developed the idea. A.W. produced the monthly MODIS cloud frequency layers, D.N.K. and N.E.Z. developed and implemented the precipitation downscaling and bias correction algorithm, C.M. and D.N.K. conducted the validation, D.N.K. wrote the first version of the manuscript and all authors contributed significantly to the revision.

### Corresponding authors

Correspondence to Dirk Nikolaus Karger or Walter Jetz.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.

Karger, D.N., Wilson, A.M., Mahony, C. et al. Global daily 1 km land surface precipitation based on cloud cover-informed downscaling. Sci Data 8, 307 (2021). https://doi.org/10.1038/s41597-021-01084-6

• DOI: https://doi.org/10.1038/s41597-021-01084-6