Climatologies at high resolution for the earth’s land surface areas

High-resolution information on climatic conditions is essential to many applications in environmental and ecological sciences. Here we present the CHELSA (Climatologies at high resolution for the earth’s land surface areas) data of downscaled model output temperature and precipitation estimates of the ERA-Interim climatic reanalysis to a high resolution of 30 arc sec. The temperature algorithm is based on statistical downscaling of atmospheric temperatures. The precipitation algorithm incorporates orographic predictors including wind fields, valley exposition, and boundary layer height, with a subsequent bias correction. The resulting data consist of a monthly temperature and precipitation climatology for the years 1979–2013. We compare the data derived from the CHELSA algorithm with other standard gridded products and station data from the Global Historical Climate Network. We compare the performance of the new climatologies in species distribution modelling and show that we can increase the accuracy of species range predictions. We further show that CHELSA climatological data has a similar accuracy as other products for temperature, but that its predictions of precipitation patterns are better.


Background & Summary
High-resolution climate data are essential to many applications in environmental and ecological sciences. Whereas many studies in these fields are conducted at a resolution of~1 km 2 , state-of-the-art global climate reanalyses often only represent climatic variation at spatial resolutions of 0.25°-1°(ca. 25-100 km at the equator). The gap between these spatial scales may be bridged using satellite data (CHIRPS 1 , TRMM 2,3 ) and statistical downscaling 4-7 for a specific region of interest and/or interpolation methods applied to meteorological station data (WorldClim 8 , CRU 9 , GPCC 10 , PRISM 11 ). Climatologies based on satellites or statistical downscaling, considered superior to interpolated data 12 for ecological applications, are currently either not available on a global scale or are still too coarse to reflect the small scale patterns needed in ecological studies. While interpolated datasets 8 often perform well in matching precipitation or temperature of the stations from which they are produced, they often fail to accurately predict patterns between stations. This is particularly problematic in highly variable terrain with low station density 13 . Whereas some interpolated datasets use elevation as a predictor (e.g., WorldClim 8 ) and observations such as the Global Historical Climate Network (GHCN) 14,15 to achieve a high-resolution prediction, it is also possible to use predictors from global circulation models (e.g., from the National Center for Atmospheric Research (NCEP) 16 , or the European Centre for Medium-Range Weather Forecast (ECMWF) climatic reanalysis interim (ERA-Interim) 17 ).
Although interpolation and statistical downscaling approaches may also integrate land-surface predictors such as elevation, slope or aspect, satisfactory results still require a more or less regular distribution of meteorological stations and a proper representation of topo-climatic settings 13 . However, the global distribution of meteorological stations is highly biased by funding and accessibility, leading to a poor representation of climatic variability in mountainous regions or areas with intact lowland rainforest, such as the Amazon or Congo basin 8,13 . On a global scale, it is also difficult to find generally valid transfer functions between predictors and the climatic variable of interests, especially for highly non-linear phenomena such as precipitation. Statistical downscaling is problematic 18 , especially on a global scale, due to temporal variation in the spatial distribution of weather stations. Although measurements for a given predictor might be available in a given month, they might be absent in another, leading to a generally high heterogeneity of the underlying climate records when time series of precipitation need to be calculated. While this does not affect static predictors such as elevation, slope, or aspect, statistical downscaling becomes especially problematic when highly dynamic predictors such as wind fields need to be integrated. The heterogeneity in the temporal and spatial distribution of such dynamic factors can also lead to spurious correlations in specific months or in specific regions, which can severely influence regression model parameters. When specific predictors, such as windward or leeward mountain sides 19,20 change over the course of the year, the location of the climatic records does not change accordingly. Therefore, regression-based downscaling might, for example, detect a significant negative relationship between a station on the windward site of a mountain for one month, and a positive relationship for another, although atmospheric physics would always predict a positive relationship. Due to this problem, statistical downscaling and interpolation methods have often been applied to single regions 11 , while a global model is lacking.
For ecological applications, the representation of the temporal and spatial variability of temperature and precipitation is, however, extremely important to infer ecological niches, growing seasons, species migrations, or small scale species distribution. Errors in the underlying climatic dataset at this small spatial scale can easily inflate in such studies 13 , which calls for an improvement of climatic information available for such analyses.
To overcome the problem of heterogeneous spatial and temporal distribution of meteorological station data, we use a Model Output Statistics algorithm for data provided from the ERA-Interim reanalysis 17 which we correct using gauge-derived products from the GPCC 10 and the GHCN 14,15 datasets. The results are improved climatologies for precipitation and temperature at high spatial resolution for environmental and ecological studies, which might prove valuable in varied scientific applications that rely on a good representation of small scale precipitation and temperature patterns.

Calculation of monthly temperature and precipitation values
ERA-Interim (developed at the European Centre for Medium-Range Weather Forecast, ECMWF), simulates six-hourly large-scale atmospheric fields for 60 pressure levels between 1,000 and 1 hPa globally with a horizontal resolution of 0.75°lat/long (T255) 17,21,22 . Since the ERA-Interim reanalysis combines modelling results with ground and radiosonde observations as well as remote sensing data using a data assimilation system, the free-atmospheric and surface fields can be considered as the best approximation of the current large-scale atmospheric situation for every time step. Several studies show that ERA-Interim adequately captures the variability of relevant free-air meteorological parameters, even over complex terrain [23][24][25] .

Temperature
Spatial variation in temperature is to a large degree determined by the vertical state of the troposphere and thus, if not affected by inversion layers, temperature decreases with increasing altitude 26 vertical distribution of moist-or dry-adiabatic lapse rates 20 . Typical temperature lapse rates are in the order of −0.4 to −0.8 K/100 m with a characteristic seasonality. The corresponding temperature distribution pattern in the free atmosphere 28 can be assumed to be directly related to surface elevation 19 .
For our downscaling approach of mean monthly temperatures, we used the monthly means of daily mean temperature derived from six-hourly synoptic data from ERA-Interim. Temperature lapse rates were calculated from the ERA-Interim for pressure levels from 1,000 to 300 hPa, using linear regression for each ERA-Interim grid cell. We then interpolated temperature to sea level using the derived lapse rates. Sea level temperatures were then interpolated between grid cells using B-spline interpolation, and then projected back on the elevational surface of the digital elevation model using the equation: where t equals the temperature at a given elevation, Γ d equals the lapse rate, elev equals elevation at 30 arc sec. from the Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010) 29 of the United States Geological Survey (USGS) and the National Geospatial-Intelligence Agency (NGA), and t 0 equals the interpolated temperature at sea level.

Maximum and minimum temperatures
Maximum (t max ) and minimum (t min ) temperatures were calculated using climatological aided interpolation 30 . For that we used the mean monthly temperature values (t) and added, or subtracted the maximum or minimum daily temperature derived from the three-hourly data of minimum or maximum temperature since previous post processing data fields in ERA-Interim: where Δt era_max and Δ t era_min are the respective differences between maximum and minimum temperatures interpolated to 30 arc sec resolution using B-spline interpolation from the mean monthly temperatures (t).

Precipitation
Elevation is one of the main topo-climatic drivers of vertical precipitation gradients, but the relation between elevation and precipitation can be idiosyncratic 19,[31][32][33][34][35][36] . In the convective regimes of the tropics, precipitation amounts commonly increase up to the condensation level at about 1,000-1,500 m above the ground surface, and the exponentially decreasing air moisture content in the mid-to upper troposphere results in a corresponding drying above the condensation level of tropical convection cluster systems (non-linear precipitation lapse rates) 37 . Likewise, negative lapse rates typically occur in the extremely dry polar climates. At mid-latitudes and in the subtropics, the frequent or even prevalent advection of moisture bearing air to high altitudes leads to increasing precipitation with increasing elevation. Consequently, the summits of high mountain ranges such as the Alps 38 may have high rainfall, and this leads to linear precipitation lapse rates 39 . The reduced precipitation at lower elevations is due, firstly, to the evaporation of rain drops when falling through non-saturated, lower-air levels. Secondly, the vertical precipitation gradient in high mountain ranges is often increased due to the diurnal formation of autochthonous upslope breezes. This upward flow of air intensifies cloud and precipitation formation in upper slope positions whilst the subsiding branch of these autochthonous local circulation systems along the valley axis leads to cloud dissolution and a corresponding reduction of precipitation rates in the valley bottoms. We approximated such orographic precipitation effects and used them as a parameter for the CHELSA precipitation downscaling algorithm ( Fig. 1) as explained below.

Wind effect correction
Orographic precipitation patterns 40 caused by the uplift of moist air currents at the windward side of a mountain range and the intimately related rain shadow effect on leeward sides induced by the blockage of moisture-bearing air are most common effects influencing small-scale precipitation patterns 38,[40][41][42][43] .
Based on the assumption that the windward impact on the precipitation intensity depends on the prevailing wind direction at any given elevation of an orographic barrier, we used a wind index 19,20 to account for the expected higher precipitation at the windward sites of an orographic barrier. We used u-wind and v-wind components at the 10-m level of ERA-Interim as underlying wind components. These two wind components were interpolated to the CHELSA grid resolution using a B-spline interpolation. As the calculation of a windward leeward index (hereafter: wind effect) requires a projected coordinate system, both wind components were projected to a world Mercator projection and then combined to a directional grid.   component H L were then calculated using: where d WHi and d LHi refer to the horizontal distances in windward and leeward direction and d WZi and d LZi are the corresponding vertical distances compared with the considered raster cell. The second summand in equation (4) accounts for the leeward impact of previously traversed mountain chains. The horizontal distances in equation (5) lead to a longer-distance impact of leeward rain shadow. The final wind-effect parameter, which is assumed to be related to the interaction of the large-scale wind field and the local-scale precipitation characteristics, is calculated as H = H L × H W and generally takes values between 0.7 for leeward and 1.3 for windward positions 19 . Equation (3) and equation (4) were applied to each grid cell at the CHELSA resolution in a world mercator projection.

Valley exposition correction
Although the wind effect algorithm can distinguish between the windward and leeward sites of an orographic barrier, it cannot distinguish extremely isolated valleys in high mountain areas. Such dry valleys are situated in areas where the wet air masses flow over an orographic barrier and are prevented from flowing into deep valleys. To account for these effects, we used a variant of equation (4) and equation (5) with a linear search distance of 300 km in steps of 5°from 0°to 355°c ircular for each grid cell. The calculated leeward index was then scaled towards higher elevations using: which rescales the strength of the exposition index relative to elevation (elev) from GMTED2010, and gives valleys at high elevations larger wind isolations (E) than valleys located at low elevations. The correction constant c was set to 9,000 m to include all possible elevations of the DEM. The constant has been set to 9,000 m as values of elev >c could lead to a reverse relationship between elev and H L . Additionally, a prior sensitivity analysis indicated that downscaled precipitation with c = 9,000 m has a better fit with precipitation measured at the stations (GHCN stations) than values of c >9,000 m. We therefore choose to set c conservatively to 9,000 m.

Boundary layer correction
Orographic precipitation effects are less pronounced just above the surface, as well as in the free atmosphere above the planetary boundary layer 11,44,45 . The highest impact of orography is considered just at the boundary layer height where the airflow interacts with the terrain. While former studies used single ERA pressure levels, known to represent the main wind field patterns in a specific area 20 , the pressure level representing the prevailing wind directions at the boundary layer is usually not known a priory on a global basis. We therefore used the boundary layer height B from ERA-Interim as indicator of the pressure level that has the highest contribution to the wind effect. The boundary layer height has been interpolated to the CHELSA resolution using a B-spline interpolation. The wind effect grid H containing the windward (H W ) and leeward (H L ) index values was then proportionally distributed to all grid cells falling within a respective 0.75°grid cell using: with: With d being the distance between a grid cell and the boundary layer height B, d max being the maximum distance between the boundary layer height B and all grid cells at the CHELSA resolution falling within a respective 0.75°grid cell, c being a constant of 9,000 m, and elev being the respective elevation from GMTED2010. with: B being the height of the monthly means of daily mean boundary layer from ERA-Interim, elev ERA being the elevation of the ERA-Interim grid cell, and f being a constant of 500 m which takes into account that the level of highest precipitation is not necessarily at the lower bound of the boundary layer, but slightly higher 44,45 . Similar to the c value in equations (6-8) we used a prior sensitivity analysis that varied f in steps of 50 m to determine the impact of f on the modelled precipitation values. Values of 500 >f o500 showed to a lower fit between modelled precipitation and precipitation measured at stations.

Precipitation data from ERA-Interim
For accumulated parameters (total monthly precipitation), we used the monthly means of daily forecast accumulations of total precipitation initialized at the synoptic hours 0:00 and 12:00. To calculate monthly precipitation sums, we added the synoptic monthly means at time 0:00, step 12 and time 12:00, step 12 and multiplied it by the number of days in the respective month.
Bias correction of ERA-Interim data using GPCC and GHCN data Model-generated estimates of the surface precipitation are extracted from short range forecasts, which vary with forecast length. This drift in the short-range forecasts can be a problem for monthly and climatic means 46 . One very common approach is to calculate the difference between baseline precipitation from the GCM and the observed precipitation and apply this 'factor of change' to historically observed time series to generate a synthetic time series [47][48][49] . We therefore performed three steps of bias correction.

Monthly bias correction
We applied the monthly bias correction before the downscaling of the precipitation data on the ERA-interim precipitation values p ERA directly 49 . To this end, we used the monthly values p GPCC of the gridded GPCC dataset 10 to calculate the monthly bias R m caused by the ERA-Interim parametrization, and the excessive or insufficient precipitation of the forecast algorithm 46 for each month from Jan. 1979-Dec. 2013 using: We only used grid cells with meterological stations present for R m . The forecast algorithm used to produce the precipitation amounts for ERA-Interim exhibits a considerable spin up-spin down effect (too much or too less precipitation), that has a coherent spatial structure, with a larger bias over high elevation terrain, or specific land forms such as tropical rainforests 46 . Based on this observation, we assumed that grid cells without stations share a similar bias as their neighbouring stations. To achieve a gap-free grid surface, we interpolated the gaps in the R m grid using a multilevel B-spline interpolation with 14 error levels to a 0.75°resolution. The gap-free bias correction surface R m was then multiplied with the ERA-Interim precipitation p ERA to get the bias corrected monthly precipitation sums p m at 0.75°r esolution: Monthly precipitation including orographic effects To achieve the distribution of monthly precipitation sums p including orographic effects, we used a linear relationship between the monthly bias corrected precipitation grids at the ERA resolution p m and the boundary layer corrected wind effect surface H: where H is the mean wind effect at ERA resolution. By using a linear relationship we archive that the data are to scale, e.g., the precipitation at 0.75°resolution exactly matches the mean precipitation at all 30°sec cells within the range of a 0.75°cell.

Station bias correction
We used precipitation from a set of meteorological stations from GHCN, MeteoSwiss, and DWD to correct the remaining error between p and p Station . We calculated the bias ratio between p and p Station and interpolated the bias ratio using a multilevel B-spline interpolation with 14 error levels to a 0.1°grid which matches the spatial accuracy of many GHCN stations. The resulting bias surface was then multiplied with p to achieve the final monthly precipitation estimates.

Climatologies
We calculated the climatologies as the mean monthly sum of precipitation in the years 1979-2013 for each month. As slight errors in the precipitation sums can, however, accumulate over time, we applied an additional bias correction step using the GPCC Climatology Version 2015 50 . We used the cells at which stations are present, calculated the bias between the annual accumulations and the GPCC climatology, and used a multilevel B-spline interpolation of the biases to a 0.25 grid to create the bias surface. This bias surface was then multiplied with the mean annual precipitation sums to create the final climatologies.

Bioclimatic parameters
From the monthly temperature and precipitation values, we additionally calculated a set of derived parameters often used in ecological applications. These bioclimatic variables are derived variables from the monthly mean, min, max, mean temperature, and mean precipitation values. These variables are specifically developed for species distribution modelling and related ecological applications. They represent annual averages (e.g., mean annual temperature, annual precipitation), seasonality (e.g., annual range in temperature and precipitation), and extreme or limiting environmental factors (e.g., temperature of the coldest and warmest month, and precipitation of the wet and dry quarters). A quarter is defined as the period of three months (1/4 of the year). The procedure strictly followed that of WorldClim 8 and ANUCLIM 51 . The equations used to calculate the bioclimatic variables (where applicable) are: bio1 : bio2 : bio4 : ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi 1 11 bio12 : bio15 : Not listed here are the variables which are based on quarters (3 consecutive months) or specific (wettest, driest, warmest, coldest) months.

Code availability
The codes used to calculate CHELSA climatologies are written in C++ and are included in SAGA Version 2.2.7, freely available at www.saga-gis.org under the GNU public license including the necessary source codes. Calculations were done in SAGA Version 2.2.7 on the 'Science Cloud' cloud computing facility of the University of Zurich www.s3it.uzh.ch/infrastructure/sciencecloud/.

Data Records
The CHELSA data contains records for monthly mean temperature in°C and precipitation values in mm/month, and derived bioclimatic variables for the reference period 1979-2013 in form of GeoTIFF files. The climatologies available for download have been derived from monthly values of precipitation and temperature. The files are freely available at www.chelsa-climate.org as well as Dryad (Data Citation 1).
The file format is GeoTIFF.

Technical Validation
To validate the results of the CHELSA algorithm and the different bias correction steps applied, we use a statistical cross-validation, compared the results with several comparable products that are available at comparable spatial and temporal resolution, and independent meteorological station data. The climatologies are validated in two steps. First, as the effects of orographic winds are a non-stationary phenomenon, we show a validation of the time series (hereafter: reanalysis) from which the climatologies are created. Second, we show a validation of the final climatological products which are available for download.

Cross-validation of the bias correction method using monthly stations
To validate the results of the bias correction method using meteorological station data, we employed a cross-validation approach based on repeated split-resampling. We randomly omitted 20% of the stations for validation and used the remaining 80% for the bias correction by multilevel B-spline interpolation. We

Validation of the orographic precipitation patterns
The CHELSA algorithm distributes the mean precipitation measured in a grid cell onto the expected precipitation pattern which in turn is calculated based on the wind effect and valley exposition indices. To test whether the inclusion of wind effect and valley exposition patterns produce a higher accuracy, we compared the fit between station data and precipitation in cells of 0.75°spatial resolution (correction step 1) and compared it with the fit between station data and corrected precipitation at 30 arc sec. We performed this comparison in five topographically complex regions ( Table 1). The inclusion of the small-scale orographic effects generally leads to a better fit to the station data in all complex areas in CHELSA compared to WorldClim 8  between measured station data and orographically corrected precipitation generally increases during this step of the algorithm. The remaining variance between stations and orographic predictors is finally removed using the subsequent bias correction steps.

Small-scale fit between stations and final climatology
To compare the fit of GHCN stations 14 with the final climatologies, we calculated a linear regression model for each station separately. For each station, the surrounding stations in 2°distance where included (with a minimum of 16 stations). We then regressed the mean annual precipitation measured at the station with the mean annual precipitation derived from six different models that either use GHCN data in their algorithms (CHELSA, CRU 9 , WorldClim 8 , GPCC 50 ) or do not use GHCN directly (ERA-Interim 17 ). From this linear regression approach two inferences can be drawn. First, the comparison between ERA-Interim, GPCC, and CHELSA shows the increase in fit with the specific corrections which are applied in the CHELSA algorithm, as ERA-Interim and GPCC contribute data to CHELSA. Second, the results show how well the stations correspond to the respective models at small spatial scale.
Among the models which use GHCN in their algorithm, CHELSA shows the highest fit between stations and predicted precipitation, with WorldClim, GPCC and CRU showing smaller, but still high fits with the station data (Fig. 2).

Validation using independent precipitation station data
A statistical comparison with different datasets is complicated by the fact that most gridded temperature and precipitation datasets are parameterized using similar observational data, leading to generally high correlations between climatic reanalyses. To validate the results of the CHELSA algorithm, we identified several independent datasets of various size and temporal extents. None of these have been used in the final bias correction within the algorithm, and we have additionally screened them for duplicates in the GHCN 15 , MeteoSwiss, and DWD datasets. As the station data is of different spatial extent, it allows us to validate the accuracy of CHELSA on the global scale, as well as on the very small target scale of 30 arc°. We split the validation in two parts. One part examines the temporal performance of the reanalysis dataset (Table 2) and the other the climatological performance of the dataset (Table 3). For comparison with other reanalysis products, we also calculated a similar validation for the CRU 9 and ERA-Interim 17 datasets. For comparison with other climatologies, we included the CHPclim 52 , CRU, ERA-Interim, and WorldClim 8 datasets.

FAO data validation results
The FAO data obtained from the Agromet Group of the Food and Agriculture Organization of the United Nations (FAO) is a collection of 2,316 stations with a good representation in many typically data-sparse regions, but many stations only have a short measuring period. The data is global and duplicates in the GHCN data report have been removed. CHELSA shows high correspondence of R 2 = 0.83 throughout the years with the FAO dataset, while CRU and ERA-interim show considerably lower values (R 2 = 0.73 & R 2 = 0.51, respectively) ( Table 2). The root mean square error (RMSE) and mean absolute error between CHELSA and the FAO data is not significantly different from those of the other validation datasets (with the exception of the SAEON dataset) ( Table 2).
For the climatological validation, CHELSA performs similar to CHPclim, and WorldClim (Table 3). All three climatologies, however, already include FAO data in some way, which explains the close fit among the data. CHPclim and WorldClim use them in their station interpolation, and in the case of CHELSA, FAO data are only included in the GPCC data that have been used for the bias correction at the  large scale, but not in the GHCN data that have been used in the monthly bias interpolation step. Both CRU and ERA-Interim perform considerably mediocre when compared to FAO data (Table 3). However, this comparison is only partly valid and only shows the increase of fit when station data is included into a precipitation downscaling or interpolation algorithm.

Mexico data validation results
The Mexico data consist of 2,950 stations with a dense spatial distribution, but with only a short measuring period for many stations. None of the reanalysis datasets are able to capture the temporal variation in the station dataset well ( Table 2). The average R 2 of CHELSA only reaches 0.39, which is still slightly better than the performance of CRU and ERA-Interim. The RMSE and MAE values are also lower in their mean, as well as in their standard deviation. The poor performance of all products might be due to the fact that many meteorological stations in this dataset have missing values.
The climatological performance of all models with the Mexico dataset is slightly better than that of the reanalysis dataset (Table 3). WorldClim shows the highest fit with stations, which is not surprising, as WorldClim already includes most of the stations for its original calibration and the Mexico data set is therefore not independent from the WorldClim climatologies. CHELSA shows the second highest fit with the Mexico data and is slightly better than CHPclim in all three metrics (R 2 , RMSE, MAE). Era-Interim and CRU do not capture the climatological precipitation in this area well, in comparison to the other three datasets.

Austria Ehyd data validation results
The Ehyd data from the Federal Ministry of Agriculture, Forestry, Environment and Water Management of Austria comprises of a dense net of 877 precipitation stations in Austria.
The overall performance of all precipitation products is low when compared to the Ehyd stations ( Table 2). From all models however, CHELSA performs best, with the highest R 2 , and lowest errors. For the climatologies CHELSA performs second best after CHPclim, but with all climatologies having a comparably low fit with the station data (Table 3). WorldClim shows the lowest fit with station data. In general the overall performance of the climatologies is comparable to that of the reanalysis.

Skandinavia-Nordklim data
The Nordklim data 1.0 includes observations of twelve climate variables from more than 119 stations in the Nordic region including precipitation and air temperature, in a time span of over 100 years. The data are provided by NORDKLIM/NORDMET on behalf of the National meteorological services in Denmark (DMI), Finland (FMI), Iceland (VI), Norway (DNMI) and Sweden (SMHI). We screened these stations for duplicates in the GHCN dataset and remained with a set of 11 independent stations which we used for the validation.
All reanalysis products track the temporal signal in the data reasonably well, with CRU slightly outperforming CHELSA, and ERA-Interim performing the worst ( Table 2). All models, however, show relatively small errors in this region, and are only slightly different in their temporal signal.
CHELSA, WorldClim, and CRU climatologies fit the Nordklim data well, with CHELSA performing better than the other models, despite the fact that Nordklim data are included in WorldClim but not in CHELSA (Table 3). ERA-Interim and CHPclim do not perform well in this region, which is probably due to the larger errors of remote sensing data in arctic regions, on which both models depend.

China-CMA data
The precipitation data from China comes from the Chinese Meteorological Administration and consists of 241 stations with daily records that are not included in the GHCN dataset. CHELSA and CRU are able to track the temporal signal in precipitation rather well when compared to the CMA data, with ERA-Interim performing less well (Table 2).
For the climatological means, CRU slightly outperforms CHELSA, WorldClim shows a slightly worse performance compared to the former two, and CHPclim and ERA-Interim show the lowest performance in this region, with R 2 values slightly below 0.5 (Table 3).

South Africa-SAEON data validation results
One of the main purposes of CHELSA is the better representation of precipitation gradients at small spatial scales. The SAEON precipitation stations are located in the Jonkershoek valley in South Africa with a strong elevational gradient, from the entrance to the valley to the watershed at the top of the valley. They additionally have a very high interannual variability and have not been included in any global precipitation product. Although the timespan of the dataset is too low to validate the climatological performance, the dataset can be used to track the performance of models in a very complex terrain and a strong seasonality.
For all stations of the SAEON network, CHELSA predicts the temporal variation and the actual precipitation values best compared to the ERA-Interim, CRU and CHIRPS 1 reanalysis products (Fig. 3). All reanalysis products predict the temporal variation in precipitation well, but they differ in the respective errors. All of them underestimate the extremes of precipitation in the region covered by the SAEON data.

Large-scale spatial comparison of precipitation patterns
To compare our precipitation data with those of other products, we first compared the spatial patterns of precipitation with those of the Tropical Rainfall Measuring Mission (TRMM) 2,3 combined multisatellite product TRMM/TMPA (3B43) 53 , CRU 9 , WorldClim 8 , CHPclim 52 , GPCC 50 , and ERA-Interim 17 . Figure 4 shows the bias of all mentioned products with the CRU dataset. We used the CRU dataset as a comparison, as it is not included in the other datasets, and therefore the most independent. TRMM/TMPA (3B43), WorldClim, GPCC are all using similar stations, and ERA-Interim is known to exhibit large biases in precipitation. All products, with the exception of ERA-Interim show similar amounts and patterns of biases when compared to CRU data. The bias of CHELSA is lower than that of GPCC and ERA-Interim, the two datasets which have been used in the correction algorithm. The large scale comparison however, only serves as a guideline for the deviation in the above mentioned products in general and cannot be seen as independent validation. For regions in which all models exhibit large biases, we would urge caution in the use of a single precipitation product and would suggest the use of multiple models from various sources.

Small-scale comparison of precipitation patterns
To highlight small scale performance of CHELSA, we compared precipitation patterns of three different models in the topographically and climatically highly complex terrain of Bhutan (Fig. 5). A comparison of the annual precipitation totals between TRMM/TMPA (3B43) 53 , WorldClim 8 , CHELSA, and the statistical downscaling approach of Böhner 31 shows similar patterns between all models at the mesoscale. The differences at the microscale are, however, severe between CHELSA and Böhner 31 compared to WorldClim 8 . There are only few climate stations in the region of Bhutan, which creates spurious correlations between elevation and precipitation in the ANUSPLIN algorithm of WorldClim. CHELSA and Böhner show a more consistent relation between the terrain features and the resulting precipitation patterns. A comparison with the patterns of cloud formations in this region 54 shows similarities in the patterns where clouds form and where higher precipitation amounts are predicted by CHELSA and Böhner (Fig. 5). Although the formation of clouds does not necessarily coincide with rainfall, there is generally a high correlation between the formation of clouds and the patterns of rainfall especially in topographically complex terrain 55 . We therefore assume that our model is able to capture the topographic heterogeneity of precipitation at the small spatial scale well.

CHELSA -CRU GPCC -CRU
WorldClim -CRU TRMM -CRU CHPclim -CRU ERA-Interim -CRU    54 . In this region, most precipitation falls during the SW-monsoon in the northern summer, when wet air masses from the SW are lifted at the south face of the Himalayas and dry until reaching the Tibetan high plateau. While the mesoscale patterns are in congruence between models, there are clear differences at the microscale. WorldClim 8 predicts wet valleys and dry mountain faces, whereas CHELSA and Böhner 31 predict dry valleys and wet windward exposed mountain faces due to the inclusion of orographic predictors. CHELSA and Böhner 31 are also in closer congruence with the observed distribution of cloud in the area, which shows lower cloud cover in the isolated mountain valleys compared to the wind exposed mountain faces in the south.

Validation of temperature using independent meteorological stations
We compared CHELSA temperature data to that of MODIS (MOD11C3) 56 and several independent station datasets. Other high resolution products for temperature such as WorldClim do not have the same validation period as CHELSA. A comparison is therefore problematic due to the increase of global temperatures in the last decades 57 . PRISM 11 is geographically restricted to the United States and therefore also not available for global comparisons. As climate station data not directly used by the CHELSA algorithm for temperature, a comparison with station data is possible. We used a set of four different station networks with different temporal and spatial extent for the validation.
Temperature validation data: 1. FAO-data: 400 stations 2. Mexico-data: 2,915 stations 3. GHCN-data: 6,093 stations 4. Scandinavia-Nordklim data: 32 stations The downscaled CHELSA data tracks the temperature data well in the GHCN, and FAO datasets, but larger deviations in the Mexico and Nordklim datasets with regard to the R 2 values (Table 4). However, the RMSE and MAE of the Nordklim dataset are comparable to those of the two global datasets GHCN and FAO. Only the comparison with the Mexico dataset shows a high RMSE of 1.95, and MAE of 1.43 ( Table 4).
The climatologies show a lower RMSE and MAE than the time series when compared to all station data (Table 5). This indicates that although an error exists in the monthly CHELSA reanalysis temperatures, the error does not inflate when these values are averaged. R 2 values are also higher for the climatologies, than for the reanalysis values.

CHELSA-MODIS comparison
Coefficients of determination between MODIS (MOD11C3) 56 and CHELSA temperatures range from 0.95 to 0.99 globally, between GHCN Version 3 and CHELSA temperatures range 0.96 to 0.99 globally, and between MODIS (MOD11C3) and GHCN Version 2 range from 0.83-0.97 (Fig. 6)  sensing data observed as well in the MODIS (MOD11C3) and the ERA-Interim data 17,58 . As MODIS (MOD11C3) and ERA-Interim data are showing a similar bias, we can assume that the deviations either stem from the GHCN dataset or the remote sensing input to ERA-Interim and not the downscaling algorithm we use. The high spatial correlation between CHELSA and MODIS (MOD11C3) 56 shows that CHELSA is able to predict spatial patterns of temperature distributions well, and additionally accurately predicts the observed values of temperature on a small scale.
Application example: Performance for species distribution modelling As we are generally interested in the use of CHELSA climatologies for ecological studies we compare the performance of CHELSA in a species distribution modelling (SDM) approach 59,60 to the most commonly used climate dataset for this purpose: WorldClim 8 . We calculated SDMs for 67 species from Switzerland. We used species from Switzerland as this allows a comparison of performances in areas where climate station density is high in CHELSA and WorldClim, and tests whether the performance improvement of CHELSA is also found in areas with detailed climate data. We modelled species using a generalized linear model with mean annual precipitation and mean annual temperature as predictors. We randomly sampled six times as many pseudo-absence points as presence points and used an inverse weighting approach on the resulting presences and absences. We evaluated the models in a 10-fold cross-validation using the area under the receiver operating characteristic curve (AUC), Kappa statistics 61 and true skill  statistic 62 (TSS). AUC and Kappa are traditional test measures between predicted and observed data and usually ranges from 0.5 (AUC) or 0 (Kappa), indicating random fit, to 1, indicating perfect fit. TSS assesses model specificity and sensitivity and ranges from zero (both the specificity and the sensitivity of the model are zero) to 1 (both specificity and sensitivity are 1). Additionally, we calculated the adjusted D 2 value which represents the percentage deviance explained by (goodness-of-fit of) the model. Model performance for all 67 species was then compared using a paired t-test of all species distribution models. The result shows an improvement of the models when using CHELSA over WorldClim data (Fig. 7). All measures show a higher performance of the CHELSA data, although the difference in mean is not significant for the TSS. This shows that even in areas with comparably high station density and good climatic information the CHELSA algorithm improves the spatial prediction of climatic variables and subsequently the modelled distribution of species (Fig. 8).

Validation results-Conclusions
The validation results in general show that including orographic effects can improve existing climatologies and reanalysis to a degree that the derived analysis (here the SDMs) show increasing accuracies. While CHELSA is an improvement over existing very high-resolution climatologies, it still exhibits errors which we quantified in several ways. The validation of main correction step in the algorithm that includes the orographic wind effects and boundary layer shows that the precipitation at the stations is better captured after the correction than before the downscaling to 30 arc sec resolution. The improvement varies by region and month with the majority of months showing an improvement. Most importantly, the better prediction with regard to SDMs in which precipitation and temperature data are used already indicates that CHELSA might be a substantial improvement over existing products which are currently being employed for such purposes.

Usage Notes
All CHELSA products are in a geographic coordinate system referenced to the WGS 84 horizontal datum, with the horizontal coordinates expressed in decimal degrees. The CHELSA layer extents (minimum and maximum latitude and longitude) are a result of the coordinate system inherited from the 1-arc-second GMTED2010 data which itself inherited the grid extent from the 1-arc-second SRTM data.
Note that because of the pixel center referencing of the input GMTED2010 data the full extent of each CHELSA grid as defined by the outside edges of the pixels differs from an integer value of latitude or longitude by 0.000138888888 degree (or 1/2 arc-second). Users of products based on the legacy GTOPO30 product should note that the coordinate referencing of CHELSA (and GMTED2010) and GTOPO30 are not the same. In GTOPO30, the integer lines of latitude and longitude fall directly on the edges of a 30-arc-second pixel. Thus, when overlaying CHELSA with products based on GTOPO30 a slight shift of 1/2 arc-second will be observed between the edges of corresponding 30-arc-second pixels.