CA-discharge: Geo-Located Discharge Time Series for Mountainous Rivers in Central Asia

We present a collection of 295 gauge locations in mountainous Central Asia with norm discharge as well as time series of river discharge from 135 of these locations collected from hydrological yearbooks in Central Asia. Time series have monthly, 10-day and daily temporal resolution and are available for different durations. A collection of third-party data allows basin characterization for all gauges. The time series data is validated using standard quality checks. Norm discharge is validated against literature values and by using a water balance approach. The novelty of the data lies in the combination of discharge time series and gauge locations for mountainous rivers in Central Asia, which is not available anywhere else. The geo-located discharge time series can be used for water balance modelling and training of forecast models for river runoff in mountainous Central Asia.


Background & Summary
While highly vulnerable with regard to water availability and its impacts 1,2, Central Asia remains for all practical purposes a hydrologically data-scarce region 3,4. Knowledge about river discharge is the basis of understanding a hydrological system and of performing adequate water management. Forecast models for river runoff rely heavily on observations of past discharge for model setup and calibration 5 as well as, for example, for the validation of gridded discharge products 6. The presented data set provides the, to date, most comprehensive publicly available collection of in-situ measured discharge for Central Asia, including Afghanistan (see Fig. 1). It further provides a full basin characterisation as well as monthly time series of average temperature and snow cover and of monthly precipitation sums for each basin. The data is available as a geopackage and can be downloaded from Zenodo 7. Codes used to process the data are available from the same location.

Methods
This section describes how the third-party data has been processed (in sequence) to obtain the resulting data layers (see Table 1). For reproducibility, the steps and R scripts to arrive at the data set presented here are made available in a Zenodo repository 7. The repository further includes more detailed step-by-step instructions for anyone wishing to reproduce the data set (see also the Usage notes section).

Hydrological data.
In former Soviet Central Asia, water levels in rivers were typically monitored twice a day, generally by a nearby resident observer. Few stations were equipped with water level tape recorders. During low-flow periods in winter, one measurement per day may have been taken (for example at gauge Mullala on the Pskem River in the Chirchiq River basin). During high-flow periods, more than 2 measurements per day may have been taken. The observation frequency at each gauge is documented in the hydrological yearbooks but may change from one year to the next, which has an impact on statistics derived from the observations. To this date, the data is collected and communicated to the regional hydrometeorological office on a daily basis, where it is copied to ledgers and sent to the national hydrometeorological office. We will refer to the national hydrometeorological services as Hydromets in this article. From these ledgers, data is copied and processed in relation to the various information products the national Hydromets provide (e.g. hydrological yearbooks, long-term norm discharge, used for discharge forecasting). The organizational charts and tasks of the individual Hydromet services vary slightly from country to country. For the data collection in this paper, data were available from Kazakh 8, Kyrgyz 9, Tajik 10, and Uzbek Hydromets 11 but not from Turkmen Hydromet. Nowadays, data collection and processing are being digitized, but the integration of digital data into well-established workflows is slow.
Discharge data can be accessed in daily resolution from the hydrological yearbooks archived in the Hydromets. The data is publicly available but needs to be digitized manually. The Hydromets offer daily river discharge data in digital format for sale. The present discharge time series dataset relies mostly on decadal (10-day) or monthly data gathered and processed by the Hydromets and published in the hydrological yearbooks. Time series data is available for a subset of gauges in the present data set. For each gauge in the data set, the norm discharge has been obtained from the hydrological yearbooks. Each gauge is attributed a reference called SOURCE (see Table 2) indicating the source of the data.
Gauge levels in Afghanistan are recorded 3 times a day on a form or at 15-minute intervals at newer automatic stations. Through monthly flow meter measurements, the rating curve is updated and discharge is calculated. During the flood season, flow meter measurements are collected every 10 days if possible. The data is copied in the regional office to a spreadsheet and subsequently sent to the Ministry of Energy and Water in Kabul. Historical monthly data is available until approximately 1980 (included in the presented data set). Newer data that has been collected in Afghanistan after 2008 is not available here.

Cleaning of time series data.
The raw data consisted of many datasets in different formats and types. Irrelevant texts and duplicates were removed. Then, all discharge data were converted to the same type; dates were synchronized and changed to the same date format. All data sets were formatted into a long (narrow) format and combined. To make sure the data were not erroneous, visual and manual checks of all individual time series were done at all steps of cleaning. All data preparation was done in R (r-project.org/). Gauge locations for which time series data is available have the flag has_ts set to TRUE in the attribute table. The temporal resolution of the time series was determined (daily, 10-day, or monthly) (gauges attribute: res). The start and end of the time series were extracted and stored in the gauge attributes ts_start and ts_end. The period between ts_start and ts_end is henceforward called the observation period. The total number of observations in the observation period is given in the attribute n_complete. This number indicates the number of time steps at a given temporal resolution in an observation period. Included in n_complete is the number of missing observations in the observation period (attribute n_miss). n_propmiss gives the proportion of missing observations in the entire available time series for each station and n_largestgap shows the length of the largest observation gap in years.

Fig. 1 Overview of gauge locations and their respective catchment areas (slim grey lines) in Central Asia where discharge norms are available (light grey points) and where time series data is available (dark grey points). The background map is a hillshade layer derived from the DEM 14. The clear blue polygons denote lakes 67 and the dark blue lines indicate rivers 68. Administrative boundaries (GADM v.4.1, gadm.org) are shown as thick grey lines. Mountain ranges (grey labels) have been digitized manually.
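As an illustration of how the gauge attributes ts_start, ts_end, n_complete, n_miss, n_propmiss and n_largestgap relate to each other, the following Python sketch computes them for one regularly spaced series. It is not the R code used for the data set. The function name summarise_series and the use of None for missing observations are assumptions made for this example, and the largest gap is counted here in time steps rather than in years.

```python
from itertools import groupby


def summarise_series(dates, values):
    """Summarise a regularly spaced series; None marks a missing observation.

    dates and values are parallel lists with one entry per time step.
    """
    observed = [i for i, v in enumerate(values) if v is not None]
    if not observed:
        return {"has_ts": False}
    first, last = observed[0], observed[-1]
    window = values[first:last + 1]  # observation period, gaps included
    n_complete = len(window)
    n_miss = sum(v is None for v in window)
    # Lengths of all runs of consecutive missing time steps.
    gap_lengths = [
        len(list(run))
        for is_gap, run in groupby(window, key=lambda v: v is None)
        if is_gap
    ]
    return {
        "has_ts": True,
        "ts_start": dates[first],
        "ts_end": dates[last],
        "n_complete": n_complete,
        "n_miss": n_miss,
        "n_propmiss": n_miss / n_complete,
        "n_largestgap": max(gap_lengths, default=0),
    }
```

Leading and trailing missing values are trimmed in this sketch so that the observation period starts and ends on an actual observation.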
Gap filling of time series data.
Individual data gaps of length 1 were filled using linear interpolation in R by applying the function na.seadec of the imputeTS package 12 with option linear. This led to the imputation of 1 monthly value in time series of monthly resolution, 1 10-day average value in time series of 10-day resolution and 1 daily value in time series of daily resolution. Longer gaps were left as they are. Gap-filling resulted in a median change of the long-term norm discharge and mean monthly discharge of 0% and a mean change of 0.2%, which was deemed acceptable.
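The rule of filling only isolated gaps can be sketched in a few lines of Python. This is a simplified stand-in and not the imputeTS routine used for the data set: it applies plain linear interpolation on a regular time grid, where the interpolated value of a single missing step is the mean of its two neighbours. The function name fill_single_gaps and the use of None for missing values are assumptions of this example.

```python
def fill_single_gaps(values):
    """Fill gaps of length 1 by linear interpolation; leave longer gaps as None."""
    filled = list(values)
    for i in range(1, len(values) - 1):
        previous_value, next_value = values[i - 1], values[i + 1]
        # A gap has length 1 only if both direct neighbours were observed.
        if values[i] is None and previous_value is not None and next_value is not None:
            filled[i] = (previous_value + next_value) / 2
    return filled
```

Because the neighbours are read from the original list, a value that was just imputed is never used to fill an adjacent gap, so gaps of length 2 or more stay untouched.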

Gauge locations.
As no consistent geolocated records of gauging stations are available, all gauging stations were manually located in a Geographic Information System (GIS). In the former Soviet region, gauging station names often include the name of the village where they are (or were) located. For this reason, we developed a workflow to manually match station location names with village names found on the relevant topographic maps (1:200'000) of the corresponding region and along the corresponding rivers. The topographic maps of the entire region were downloaded from https://maps.vlasenko.net and subsequently manually georeferenced in QGIS (qgis.org). Gauge locations were then inferred by visually inspecting high-resolution optical remote sensing imagery and locating obvious measurement locations, such as bridges or installations that would allow for cross-section measurements of water depth and velocity. The gauge locations are provided in the gauges layer.
COUNTRY UZB GADM Country the gauge is located in.

Catchment boundaries.
The delineation of the catchment boundaries was done using WhiteboxTools v2.0.0 13, which allows the concurrent delineation of watersheds from multiple gauge points. The workflow requires the gauge locations and a digital elevation model (DEM) as input. Here we use the SRTM DEM Global 1 arc second product 14.
The catchment delineation can be reproduced using the script WatershedDelineation_MultiplePourPoints_V2.Rmd in the Zenodo repository 7 (see also usage notes). Some catchment boundaries were not correctly delineated by the automated process and had to be edited manually in QGIS and then merged with the rest of the basins, namely the basin of the gauge with ID 60029 (Kassansay River/Кассансай, gauge Uryukty/Урюкты). The R package sf 15 was used to derive the area of each basin (basins attribute: area_km2), which was compared to the literature 16-18 (see technical validation). The outlines of the catchment boundaries are provided in the GIS layer basins.gpkg.

Extraction of basin characteristics.
Third-party data has been downloaded and extracted to each basin using the R package exactextractr (https://github.com/isciences/exactextractr) for fast, efficient extraction of raster data on polygons (basin boundaries in the present case). All third-party data is publicly available for free. For storage space reasons, we do not provide third-party data in this data repository but only the extracted results, i.e. average values for each basin. We do, however, provide the sources of each data type and detailed instructions on how to download and process the third-party data. The corresponding R script extract_and_compile_catchment_data.Rmd is available on Zenodo 7 (see usage notes). A list of third-party data available for each basin is given in Table 3. The paragraphs below describe how the third-party data in the basins layer is derived.
Statistics derived from DEM.
The 30-arc-second global SRTM DEM 14 was used as the basis to derive average characteristic parameters for each basin. The average basin slope (basin attribute: slope) and aspect (basin attribute: aspect) have been computed using the terrain function of the R package raster (http://CRAN.R-project.org/package=raster) with a method suitable for rough surfaces 19. The Terrain Ruggedness Index (TRI, basin attribute: tri) is the basin average over the mean of the absolute differences between the value of a cell and the value of its 8 surrounding cells. The Topographic Position Index (TPI, basin attribute: tpi) is the basin average over the difference between the value of a cell and the mean value of its 8 surrounding cells. Roughness (basin attribute: roughness) is the basin average over the difference between the maximum and the minimum value of a cell and its 8 surrounding cells. tri, tpi and roughness are calculated using the raster package 20. The attribute flowdir contains the basin average of the direction of the largest difference between a cell and its neighbors. It corresponds to the average flow direction in the basin.
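The per-cell definitions of TRI, TPI and roughness given above can be written out for a single interior cell of a small elevation grid. The Python sketch below follows the wording of the definitions and is not the raster package implementation; the function name terrain_indices and the nested-list representation of the DEM are assumptions of this example. The basin attributes would then be the averages of these per-cell values over all cells in a basin.

```python
def terrain_indices(dem, row, col):
    """Return (tri, tpi, roughness) for one interior cell of a 2D elevation grid."""
    centre = dem[row][col]
    neighbours = [
        dem[r][c]
        for r in (row - 1, row, row + 1)
        for c in (col - 1, col, col + 1)
        if (r, c) != (row, col)
    ]
    # TRI: mean absolute difference between the cell and its 8 neighbours.
    tri = sum(abs(centre - n) for n in neighbours) / len(neighbours)
    # TPI: cell value minus the mean of its 8 neighbours.
    tpi = centre - sum(neighbours) / len(neighbours)
    # Roughness: range of values over the cell and its 8 neighbours.
    window = neighbours + [centre]
    roughness = max(window) - min(window)
    return tri, tpi, roughness
```

Edge cells are not handled here; a full implementation needs a rule for cells with fewer than 8 neighbours.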
Climate data.
CHELSA v2.1 is a high-resolution (30 arc seconds or approx. 1 km grid resolution) climate data set. CHELSA v2.1 daily precipitation and temperature 21-23 between 1981 and 2010 have been cut to the Central Asian domain. CHELSA temperature fields are produced using statistical downscaling from ERA5 temperatures, and CHELSA precipitation fields incorporate orographic predictors and bias correction with station data from GPCC 21. Daily average precipitation and temperature time series have been extracted for each basin and aggregated to monthly values. CHELSA v2.1 climatologies 21,22 have been downloaded and average climatologies have been extracted for each basin.
Monthly time series of basin precipitation sums and mean basin temperature are calculated from daily values and included in the basin layer. The attribute names start with pr (precipitation) or tas (temperature), followed by mon and ending with the year and the month of the year as numbers, all separated by underscores (pr_mon_X_Y and tas_mon_X_Y for year X and month Y).
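The aggregation from daily basin values to the monthly attributes can be sketched as follows. This Python example is illustrative only and is not the R code of the repository: the function name monthly_attributes and the dictionary inputs keyed by date are assumptions, and so is the choice not to zero-pad the month number in the attribute name.

```python
from collections import defaultdict


def monthly_attributes(daily_pr, daily_tas):
    """Aggregate daily basin values (dict: date -> value) to monthly attributes.

    Precipitation is summed per month, temperature is averaged per month.
    """
    pr_sum = defaultdict(float)
    tas_values = defaultdict(list)
    for day, pr in daily_pr.items():
        pr_sum[(day.year, day.month)] += pr
    for day, tas in daily_tas.items():
        tas_values[(day.year, day.month)].append(tas)

    attributes = {}
    for (year, month), total in pr_sum.items():
        # Assumed naming: month written without a leading zero.
        attributes[f"pr_mon_{year}_{month}"] = total
    for (year, month), temps in tas_values.items():
        attributes[f"tas_mon_{year}_{month}"] = sum(temps) / len(temps)
    return attributes
```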
Table 3. Attributes of the basins layer. This table only gives an overview; please refer to the basin_attribute_description layer in the geopackage for more details.
The basin layer includes growth season statistics, for example, the first and last days of the growing season based on the daily temperature and precipitation fields in CHELSA that can be used for the estimation of tree line position (TREELIM) 24. Further, the bioclimatic indicators include an estimate of net primary production made with Lieth's Miami model 25. The parameters are described in detail in the CHELSA v2.1 technical manual 26.
A brief description of the variables is given in the basin description table included in the data set presented here (layer basin_attribute_description in the geopackage). The data set further includes classifications of climatologies according to Köppen-Geiger 27 (basin attributes: kg0 and kg1), modified Köppen-Geiger 28 (kg2), climatologies according to Wissmann 29 (kg3), Thornthwaite 30 (kg4), and Troll and Paffen 31 (kg5).
For comparison with the CHELSA climatologies, average basin WorldClim climatologies between 1970 and 2000 32 are provided in the basin layer. The attributes include average annual norm precipitation as well as average annual cold and warm season precipitation (attributes: pr_worldclim_ann, pr_worldclim_cs and pr_worldclim_ws respectively). Further global data sets have been included for comparison: the CHELSA W5E5 precipitation product 33 (basin attribute: pr_ann_w5e5); the PBCorr precipitation product 23, which includes bias corrections for the following data sets: CHPclim V1 34 (basin attribute: pr_ann_chp) and WorldClim v21 32 (basin attribute: pr_ann_worldclim); and the average annual evaporation and aridity index between 1970 and 2000 (CGIAR) 35,36.

Snow cover.
Based on the daily pixel values of CHELSA v2.1 temperature, daily snow cover fractions are estimated 37, aggregated to annual data and averaged over the period between 1981 and 2010. The average annual snow cover fraction per basin is given in the basin attribute fs_ann_chelsa. Monthly mean snow covered fraction of basin area between January 2000 and December 2021 was extracted from the daily snow cover products (MOD10A1 and MYD10A1) of Moderate-Resolution Imaging Spectroradiometer (MODIS) satellite imagery 38. We use the MODIS snow cover mapping based on the Normalized Difference Snow Index (NDSI), which reveals the magnitude of the difference between reflectance in visible bands and in the shortwave infrared, respectively. A high difference is typical for snow. We use the NDSI_Snow_Cover band, which represents the Snow Cover Fraction (SCF) at the subpixel level within 500 m grid cells 39. For the combination of the two MODIS products and for cloud-gap-filling we use the method as detailed in Tang et al. 40. Maps and time series of SCF from every study basin can be viewed and downloaded through an Earth Engine application (https://hydrosolutions.users.earthengine.app/view/snowcovermapper-ca).
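For readers unfamiliar with the index, the NDSI is the normalized difference of a visible (green) and a shortwave-infrared reflectance. The short Python sketch below shows the formula only; it does not reproduce the MODIS snow cover products or the cloud-gap-filling method. The function name ndsi and the convention of returning None when both reflectances are zero are assumptions of this example.

```python
def ndsi(green, swir):
    """Normalized Difference Snow Index from green and shortwave-infrared reflectance."""
    total = green + swir
    if total == 0:
        return None  # index is undefined when both reflectances are zero
    return (green - swir) / total
```

Snow reflects strongly in the visible range and weakly in the shortwave infrared, so snow-covered pixels yield values close to 1.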
Land cover.
Land cover data from the Copernicus 100 m 2019 land cover data set 41 was downloaded and extracted for each basin. The basin attributes are named using the prefix lc with the id number of the Copernicus land cover class, for example, lc_20 for the basin area classified as shrubs in square kilometers. The class ids of the Copernicus land cover classification are taken from Buchhorn and colleagues 42.

Glacier storage.
Glacier areas are extracted for each basin from the Randolph Glacier Inventory Version 6.0 43 (basin attribute: gl_A_km2). The glacier volume (gl_V_km3) is estimated on a per-glacier basis using the empirical area-volume scaling function by Erasov 44 and aggregated for each basin. The fraction of glaciated area per basin (basin attribute: gl_fr) is computed as gl_A_km2/area_km2. Average glacier thinning rates as water equivalents per year and average annual glacier mass loss between 2000 and 2010 from Hugonnet and colleagues 45 were extracted and aggregated by basin (basin attributes: gl_dmdtda_mma and gl_dmdt_km3a).

Mapping of gauge network.
For each gauge within a river system, the code of the downstream gauge was manually added as attribute to the spatial basins layer (attribute name: dnstr_gauge) in QGIS 3. Further, the sub-basins were manually assigned an attribute basin_order indicating if the catchment is a headwater basin (basin_order = 0) or a downstream basin (basin_order > 0). The higher the basin_order, the more upstream basins drain through a downstream gauge. The mapping of the gauge network was validated in R using network functionalities from the GGally package (https://github.com/ggobi/ggally).
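A basic consistency check of such a manually entered network can be sketched in Python. The example below is not the GGally-based validation used for the data set: it only follows the dnstr_gauge links, detects references to unknown gauges and loops, and counts how many gauges lie upstream of each gauge. The function name upstream_counts and the gauge codes in the usage example are hypothetical, and the count is a proxy that need not equal the basin_order attribute.

```python
def upstream_counts(dnstr_gauge):
    """Count upstream gauges per gauge.

    dnstr_gauge maps each gauge code to the code of its downstream gauge,
    or to None if no gauge lies downstream.
    """
    counts = {code: 0 for code in dnstr_gauge}
    for code in dnstr_gauge:
        visited = {code}
        downstream = dnstr_gauge[code]
        while downstream is not None:
            if downstream not in dnstr_gauge:
                raise KeyError(f"unknown downstream gauge: {downstream}")
            if downstream in visited:
                raise ValueError(f"loop in gauge network at: {downstream}")
            visited.add(downstream)
            counts[downstream] += 1
            downstream = dnstr_gauge[downstream]
    return counts
```

For a hypothetical network `{"A": "C", "B": "C", "C": "D", "D": None}`, gauges A and B are headwater gauges with a count of 0, and the counts increase towards the outlet D.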

Data Records
The GIS layers and discharge time series presented here are available as a geopackage through Zenodo 7. Table 1 gives an overview of the available layers in the geopackage. Each layer is described in more detail in the following sub-sections. The presented data set includes a gauges layer with point features for each gauge location (gauges), a basins layer with the outlines of the catchment areas for each gauge location (basins), an attribute table with average basin characteristics (basin_attributes), an attribute table description with detailed explanations for each basin attribute (basin_attribute_description), and a table with discharge time series data (discharge_time_series). The gauges layer shows 297 features (or gauges) while the basins and basin_attributes layers only show 295 features (or basins). This is because for one gauge (station 17050 on the Gunt River) decadal as well as monthly time series are available, and for another gauge (station 16070 on the Small Naryn River) daily and decadal data are available, which overlap only partially. These time series are marked in the gauges layer as well as in the discharge_time_series layer with suffixed codes, for example 17050d and 17050m for the higher and the lower resolution, respectively.

The gauges layer.
Table 2 gives an overview of the gauge attributes included in the data set. All gauges are uniquely identified throughout the data set by their station codes (attribute CODE), which consist either of a 5-digit code or of a combination of digits and characters as detailed hereafter. The 5-digit code starting with 15 refers to gauges in the basins of the Chu River and Talas River as well as of lake Issyk Kul. The number starts with 16 for gauges in the Syr Darya basin, and with 17 for gauges in the Amu Darya basin 8-11. For some gauges, it was not possible to identify the official 5-digit code of a station. This is the case, for example, for gauges which stopped measuring after independence, more than 30 years ago. For these gauges an arbitrary 5-digit code was assigned, starting with the basin identifier (5 for Chu, Talas and Issyk Kul, 6 for Syr Darya and 7 for Amu Darya) and ending with a digit identifying the gauge (e.g. 60003). The codes of Afghan stations consist of a first 1-2 digit number identifying the river, followed by a combination of 4 and 2 digits and characters, separated by a hyphen 46. Gauge locations are available as geometry attributes in a WGS84 projection (EPSG:4326) (geom) as well as attributes in EPSG:4326 and EPSG:32642 (UTM 42 N) projections (EASTING, NORTHING, LON, LAT). English names for gauges are available as transliterations of Russian gauge names (where available) 8-11,16-18. Russian gauge names are written in Cyrillic letters, which require UTF-8 compatibility for display. The name of the river at which the gauge location is located is given in the attribute name. The attribute COUNTRY contains the country within whose boundaries the gauge location falls. The BASIN attribute shows which regional river basin the river drains into. The long-term average or norm discharge is given in attribute q_m3s. The attribute SOURCE indicates the source of the norm discharge value. The data sources include the following: USGS Data Report 529 46, Kazakhhydromet 8, Kyrgyzhydromet 9, Tajikhydromet 10, Uzbekhydromet 11, Surface Water Resources, Vol. 14, Issue 1 16, Surface Water Resources, Vol. 14, Issue 3 17, and Yuri Ivanov 18. The norm discharge is not calculated from time series data but taken from the aforementioned sources. The attribute has_ts is a TRUE or FALSE flag indicating if time series data is available for a gauge location. If has_ts is TRUE, the remaining attributes contain information about the temporal resolution of the time series data (res), the first and the last date of the time series (ts_start and ts_end, also called the observation period), the total number of time steps at a given resolution in the observation period (n_complete), the number of missing observations in the observation period (n_miss), the proportion of missing values in the observation period (n_propmiss) and the size of the largest data gap in a count of observations (n_largestgap). If has_ts is FALSE, the remaining attributes are empty.

Attribute Description
h_mean Mean elevation of the basin in metres above mean sea level 14.
h_min Minimum elevation of the basin in metres above mean sea level 14.
h_max Maximum elevation of the basin in metres above mean sea level 14.
slope Mean slope in the basin 14,19.
aspect Mean aspect in the basin 14,19.
tpi Mean Topographic Position Index in the basin 14,20.
tri Mean Terrain Ruggedness Index in the basin 14,20.
roughness Mean topographical roughness 14,20.
flowdir Mean flow direction in a basin 14.
lc_X Copernicus land cover in km2 with X ranging from 20 to 126 according to the Copernicus land cover classes 41,42.
gl_A_km2 Glaciated area from the Randolph Glacier Inventory v6.0 43.
gl_V_km3 Glacier volume estimated using Erasov, 1968 44.
gl_fr Fraction of glaciated area in total basin area.
gl_dmdtda_mma Basin average of per-glacier thinning rates in water equivalent by Hugonnet and colleagues 45.
gl_dmdt_km3a Basin average of per-glacier mass loss rates in water equivalent by Hugonnet and colleagues 45.
fs_ann_chelsa Snow cover fraction estimated from daily CHELSA v2.1 21,22.
ai_ann_bio_chelsa Aridity index calculated from daily CHELSA v2.1 21,22.
bio1 till wi Bioclimatic parameters included in CHELSA v2.1 26.
pr_mon_X_Y Monthly precipitation sum in mm/month for year X and month Y.
tas_mon_X_Y Average monthly temperature in deg C for year X and month Y.
pr_ann_w5e5 Average annual precipitation from CHELSA W5E5 33.
pr_ann_chpclim Average annual bias-corrected CHPclim precipitation from 23,34.
pr_ann_worldclim Annual average bias-corrected WorldClim precipitation from 23,32.
X_ann_cgiar Average annual X [pet: evaporation, ai: aridity index] 35,36.
scf_X_Y Snow covered fraction.
dnstr_gauge Gauge code of the next downstream gauge along the river network.
basin_order Number between 0 and 9 indicating where along a gauge network the gauge lies: 0 for headwater basins and higher numbers for downstream gauges.
The basins layer.
Each basin in the basins layer is linked to the gauges in the gauges layer through the attribute CODE. The basins layer only includes a few basin attributes (see Table 3) to keep it at a reasonable size for visualizations, namely CODE, BASIN, SOURCE, REGION, area_km2, q_m3s, q_m3a, and q_mm. A more detailed description of these basin attributes is given in the layer basin_attribute_description, which is discussed below. The attributes CODE, BASIN, and SOURCE are the same as in the gauges layer. Further, the attribute q_m3s in the basins and basin_attributes layers corresponds to the attribute q_m3s in the gauges layer.
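The relation between the three discharge attributes can be illustrated with a unit conversion. The Python sketch below rests on assumptions that are not stated in the text: q_m3a is taken to be the annual runoff volume in cubic metres per year, q_mm the corresponding runoff depth in millimetres per year over the basin area, and a year is taken to have 365.25 days. The function name runoff_attributes is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed average year length


def runoff_attributes(q_m3s, area_km2):
    """Convert norm discharge (m3/s) to annual volume (m3/a) and depth (mm/a)."""
    q_m3a = q_m3s * SECONDS_PER_YEAR
    area_m2 = area_km2 * 1e6
    q_mm = q_m3a / area_m2 * 1000  # metres of water per year -> millimetres
    return q_m3a, q_mm
```

For example, a norm discharge of 10 m3/s from a basin of 1000 km2 corresponds to roughly 316 mm of runoff per year under these assumptions.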
The basin attributes layer.
The basin_attributes layer is linked to both gauges and basins layers through the CODE attribute. Table 4 gives an overview of the attributes included in the basin_attributes layer. For space reasons, this table remains descriptive. A detailed description, including data units and examples, is given in the layer basin_attribute_description of the geopackage. The first basin attributes from CODE to SOURCE are identical to the gauge attributes. The attribute REGION gives the name of the drainage basin of regional importance. The long-term average runoff in m3/s produced from the basin (q_m3s) corresponds to the gauge attribute QNORM_M3S, from which attributes q_m3a and q_mm are derived. Attributes derived from the DEM range from the basin area (area_km2), over the basin averages of elevation (h_mean), slope (slope), aspect (aspect), Topographic Position Index (tpi), Terrain Ruggedness Index (tri), and topographical roughness (roughness), to the average flow direction in the basin (flowdir). Further, the basin minimum and maximum elevations are given in the attributes h_min and h_max respectively. Elevations are given in meters above mean sea level (masl). The areas of different land cover classes 41 are also included. The bioclimatic attributes (see Table 4) include the aridity index (ai_ann_bio_chelsa), the annual mean temperature (bio1), mean diurnal range (bio2), isothermality (bio3), temperature seasonality (bio4), the maximum temperature of the warmest month (bio5), minimum temperature of the coldest month (bio6), temperature annual range (bio7), mean temperature of wettest quarter (bio8), mean temperature of driest quarter (bio9), mean temperature of warmest quarter (bio10), mean temperature of coldest quarter (bio11), annual precipitation (bio12), precipitation of wettest month (bio13), precipitation of driest month (bio14), precipitation seasonality (bio15), precipitation of wettest quarter (bio16), precipitation of driest quarter (bio17), precipitation of warmest quarter (bio18), and precipitation of coldest quarter (bio19). Here, a quarter corresponds to one-fourth of a year, i.e. 3 months. Further attributes include the frost change frequency (fcf), the first day of the growing season (fgd), the growing degree days heat sum above 0, 5 and 10 degrees Celsius (gdd0, gdd5, gdd10), the first and the last growing degree day above 0, 5, or 10 degrees Celsius (gdgfgd0, gdgfgd5, gdgfgd10, gdglfgd0, gdglfgd5, gdglfgd10), the growing season length (gsl), the mean temperature of the growing season (gst), the accumulated precipitation in the growing season (gsp), the maximum, mean, minimum, and range of relative humidity (hurs_max, hurs_mean, hurs_min, hurs_range), the climate classifications (kg0, kg1, kg2, kg3, kg4, kg5), the last growth day of the season (lgd), the number of days on which the daily average temperature is above 0, 5 or 10 degrees Celsius (ngd0, ngd5, ngd10), the net primary production (npp), and the maximum, mean, minimum and range of monthly potential evaporation, wind speed, total cloud cover, and vapor pressure deficit.

The quality flags layer.
Discharge time series, norm discharge and basin geometry are validated. For each quality check, a flag is written that records whether the test passed or failed. The reader is referred to the technical validation section for a detailed description of each quality flag.

The basin attribute description layer.
The detailed description of the basin attributes is given in the basin_attribute_description layer. For each attribute (Attribute name), the layer gives an example value (Example value), the unit of the value (Unit), the data source (Source), and one to two sentences of description (Description).
The discharge time series layer.
The discharge_time_series layer includes dates (date) and discharge values in cubic meters per second (value) for each gauge location listed in the layer (CODE). Figures 3 and 4 give an overview of the start and end of each available discharge time series as well as the number of data points and data gaps in each time series. The gauges are grouped per river basin in these figures: Fig. 3 shows gauges in the basins of the rivers Chu, Talas, Harirud, and Murghab as well as of lake Issyk Kul. Figure 4 shows gauges in the basins of the Syr Darya and Amu Darya.
The time series from gauges in Afghanistan are only available in monthly time steps and end before 1979, i.e. before the Soviet-Afghan war. Data gaps are present at several gauge locations in the early 1990s when, after the demise of the USSR, hydrological monitoring was interrupted (see for example Fig. 5). Most of the time series presented here go up to the year 2012. More recent data are available in the hydrological yearbooks.
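Since a GeoPackage is an SQLite database, the discharge table can be queried without GIS software. The following Python sketch uses only the standard library and the table and column names given above (discharge_time_series, CODE, date, value). It assumes that the table is stored under exactly that name and that dates are stored as sortable text; both assumptions should be checked against the downloaded file. The function name read_discharge is hypothetical.

```python
import sqlite3


def read_discharge(connection, code):
    """Return (date, value) rows for one gauge code, ordered by date."""
    query = (
        'SELECT "date", "value" FROM discharge_time_series '
        'WHERE "CODE" = ? ORDER BY "date"'
    )
    return connection.execute(query, (code,)).fetchall()
```

A connection would be opened with `sqlite3.connect(path)`, where `path` points to the geopackage downloaded from Zenodo. Layers with geometries (gauges, basins) are better read with a GIS library, because the geometry column is stored in a binary format.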

Technical Validation
Monitoring data from Central Asia is extremely rare and thus almost impossible to validate in the proper sense of the word. We attempt to estimate the quality of the discharge data, the gauge locations and the derived watersheds for each gauge from a number of different angles and with third-party data, to help potential users in assessing the data quality. We use flags for each quality test performed (see Table 5 for an overview of the quality flags and their meaning).

Discharge time series.
No alternative in-situ measurements are available to validate the discharge time series data from the hydrological yearbooks. We therefore choose the following methods to quasi-validate the time series data: (a) classification of discharge time series in runoff regimes, and (b) comparison of long-term average river runoff.

River runoff regimes.
We use the river regime classification suggested by Viktor Shults 47 to assign discharge regimes to the river runoff time series. If the discharge classification is not consistent with the expected classification according to basin elevation, glaciation and precipitation patterns, the flag qual_regime is set to FALSE; otherwise, it is set to TRUE. The classification criteria were applied in decreasing priority, beginning with the Shults coefficient δ and the July to September runoff in percentage of the annual runoff.

Comparison of average runoff.
The average annual river runoff was calculated as follows: Average monthly discharge was first calculated by averaging daily data if at least 26 daily observations were available in a given month or by averaging decadal data if at least 2 out of 3 observations were available in a given month. In the second step, the annual average discharge was calculated by averaging the monthly discharge if at least 11 monthly values were available in a given year. Finally, the long-term average annual discharge was calculated only if at least 8 years of data were available. As the time period over which the Hydromets computed the norm discharge is not known, we used the entire available time series data to calculate the average annual discharge.
Differences of up to 20% between the long-term discharge provided by the Hydromets (basin attribute q_m3s) and the average annual runoff computed from the time series data were assumed acceptable. For seven gauges, the average annual discharge deviates by more than 20% from the norm discharge reported by the Hydromets, namely gauges 8-0.000-1M (Hari Rud River at Tir Pul), 16093 (Torkent/Торкент-То), 16137 (Kurshab River/Куршаб, gauge Kochkor-Ata/Кочкор-Ата), 16151 (Maylisu/Майлису, Kayragach/Кайрагач), 17050 (Gunt River/Гунд, Khorog/Хорог, both decadal and monthly resolution), 17223 (Sherabad River/Шерабад, gauge 0.4 km above the confluence with the Maidan River/в 0.4 км выше устья р. Майдан), and 17462 (Kysylsuu West/Кызылсу Западная, Daraut-Korgan/Дараут-Курган). These gauges are assigned quality flag qual_normq FALSE.
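The completeness thresholds and the 20% tolerance can be combined in a short Python sketch. It starts from monthly mean discharge and is not the R code used for the validation. The function names, the dictionary input keyed by (year, month), and the choice to return None when the long-term mean cannot be computed are assumptions of this example.

```python
from collections import defaultdict


def annual_means(monthly, min_months=11):
    """Average monthly discharge per year; keep years with enough monthly values."""
    by_year = defaultdict(list)
    for (year, _month), discharge in monthly.items():
        if discharge is not None:
            by_year[year].append(discharge)
    return {
        year: sum(values) / len(values)
        for year, values in by_year.items()
        if len(values) >= min_months
    }


def long_term_mean(monthly, min_months=11, min_years=8):
    """Mean of the annual means, or None if fewer than min_years are usable."""
    annual = annual_means(monthly, min_months)
    if len(annual) < min_years:
        return None
    return sum(annual.values()) / len(annual)


def qual_normq(monthly, norm_discharge, tolerance=0.20):
    """True if the computed mean deviates from the norm by at most the tolerance."""
    computed = long_term_mean(monthly)
    if computed is None:
        return None  # assumed behaviour when no comparison is possible
    return abs(computed - norm_discharge) / norm_discharge <= tolerance
```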

Time series with different resolution.
For two gauges, 17050 (Gunt River) and 16070 (Small Naryn River), we have time series at two different resolutions available (decadal and monthly in the case of the Gunt River and daily and decadal in the case of the Small Naryn River). Both higher and lower frequency data for the Gunt River are from the Tajik Hydromet and both higher and lower frequency data for the Small Naryn River are from the Kyrgyz Hydromet. For comparison, the daily and decadal data are aggregated to monthly data.
Figure 7 shows the discharge time series at gauge 16070 in the Small Naryn River during the time period where both time series overlap. The daily and decadal time series are fairly consistent with a mean difference of 0.2%. One inconsistency occurs in the 2006 winter season where the decadal data shows an increase in discharge where the daily time series shows a decrease in discharge. The discrepancy between the two data sets is up to 80%. It is impossible to say which time series is correct.
Figure 8 shows the same visualization for the Gunt River station. Also in this case, the two data sets are fairly consistent.
To test if the average monthly and annual discharge is consistent over time, we calculate mean monthly and mean annual discharge as well as standard deviations of monthly and annual discharge in up to 7 30-year periods, depending on the length of the time series: from 1940 to 1970, 1950 to 1980, 1960 to 1990, 1970 to 2000, 1980 to 2010, and from 1990 to 2020. We only calculate the statistics if more than 26 daily or more than 2 decadal values are available per month, if at least 11 monthly values are available per year and if at least 8 years of data are available within a 30-year period. If the mean plus/minus one standard deviation ranges of the monthly or annual discharge values across the 7 averaging periods do not overlap, we detect a strong change in mean monthly and annual discharge respectively. The flag qual_consistency is assigned −1 when an overall decrease of average monthly and annual discharge is detected, 0 when no change is detected and +1 when an overall increase in discharge is detected. Gauges with no discharge time series information or gaps in the time series are assigned qual_consistency NA. Two time series display strong decreasing discharge trends, namely 15102 (Chu/Чу, Kochkorka/Кочкорка) and 15215 (Kara-Balta-Sosnovka/Кара-Балта-Сосновка). Both time series show a distinct increase in flows in the winter half-year.
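The period comparison can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: it works on annual means only, treats each period as 30 calendar years starting at the listed year, and compares only the earliest and the latest period with sufficient data. All names are hypothetical.

```python
from statistics import mean, stdev

# Start years of the 30-year averaging periods listed in the text.
PERIOD_STARTS = [1940, 1950, 1960, 1970, 1980, 1990]


def period_ranges(annual, min_years=8):
    """Return (mean - sd, mean + sd) for each period with enough data.

    annual maps a year to the mean annual discharge of that year.
    """
    ranges = []
    for start in PERIOD_STARTS:
        values = [q for year, q in annual.items() if start <= year < start + 30]
        if len(values) >= min_years:
            m, s = mean(values), stdev(values)
            ranges.append((m - s, m + s))
    return ranges


def consistency_flag(annual):
    """+1 for an increase, -1 for a decrease, 0 for no change, None if untestable.

    Assumption: only the first and last available periods are compared.
    """
    ranges = period_ranges(annual)
    if len(ranges) < 2:
        return None
    (lo_first, hi_first), (lo_last, hi_last) = ranges[0], ranges[-1]
    if lo_last > hi_first:
        return 1
    if hi_last < lo_first:
        return -1
    return 0
```

Other readings of the overlap criterion (for example requiring a common overlap across all periods) are equally compatible with the description and would change the flags for some gauges.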
Gauge locations and basin outlines. The gauge locations and basin outlines have been validated visually in QGIS by zooming in on selected gauges and basins and following the basin outline on a topographical map (DEM 14 and hillshades derived from the DEM, with a glacier area overlay 43). It was thus visually ensured that the basin boundaries follow the watershed boundaries. It was further verified that the gauge attributes NORTHING, EASTING, LON, and LAT lie within the bounding box of the basin polygons. The basin areas calculated on the catchment outlines (basin attribute area_km2) have been compared to 138 basin areas found in the literature 16–18 (see Fig. 9). The basin areas in the far east of the Pamirs (CODEs 17057 to 17065, orange points in Fig. 9) may not be correct in the literature as the Soviet topographical maps only extended up to the Chinese border and the catchment delineation may not have been correct then. These basins are thus excluded from the comparison with catchment areas from the literature. Figure 9 shows catchments of gauges with a deviation of at least 20% from the area reported in the literature in red with the gauge CODEs as labels. The placement of these gauges was re-checked with the available literature but the catchments of the gauges could not be improved based on the available information and the source of this deviation in the area is unknown. The authors did not wish to remove these gauge points because some of them may be of public interest (e.g. downstream of a large mining pit). The 11 gauge catchments in question (CODE 60003 (Baydula/Байдула), 60011 (Kekemeren/Кекемерен, 1.8 km Djumgol), 60015 (Kumtor/Кумтор, Tyan-Shan/Тян), 60018 (Nichkesay/Ничкесай), 60019 (Orto/Орто, Kugandy/Куган), 60020 (Ottuk/Оттук), 16164 (Abshirsay/Абширсай, Uch-Terek/Уч-Терек), 16198 (Sokh/Сох, Sarykanda/Сарыканда), 16210 (Khodjabakirgan/Ходжабакирган, Andarkhan/Андархан), 17165 (Siama/Сиама, Igizak Mouth/Устье Игизак), and 17169 (Kurortnaya/Курортная, Kusheri/Хушъери)) are, however, excluded from the discussion of the basin attributes. Discharge time series are available only for gauge 16198. These 11 basins are assigned flag flag_area FALSE. Possible explanations for area discrepancies may be incorrect placing of gauge location, copying error in the present data set or in the literature, or incorrect delineation of the basin area because of problems in the underlying DEM.

Fig. 9 Area derived from the catchment delineation (area_km2) vs. the basin areas from literature (Reference area) 16–18. Orange points indicate basins extending to the far east parts of the Pamir mountains where the basin areas from the literature are deemed inexact. Red points with gauge labels indicate a deviation of more than 20% between literature areas and derived areas.

Norm discharge. While many Hydromets have very stringent monitoring quality requirements (errors of less than 5% in discharge), the actual monitoring process by the gauge operators is often less than optimal as anecdotal evidence suggests. Errors in monitoring have for example occurred by the observer moving the gauge location closer to his hut or by observers inventing water level data. The cross-section at the gauge location may not be maintained properly and the accumulation (or erosion) of sediments in the cross-section can lead to water level readings which are not consistent with the water-level discharge relationship and thus lead to an over- (or under-) estimation of the river runoff. We, therefore, assume an error in the discharge of 20% which is a widely accepted rule-of-thumb value for errors in river discharge among hydrologists. In-situ measured validation data for the norm discharge does not exist. Therefore, we use different methods using third-party data to assess the quality of the norm discharge data.
Test discharge-elevation relationship. Figure 10 shows manually digitized observations of specific discharge and mean basin elevation 16 for 17 oro-hydrographic regions in the Syr Darya basin in black and data from the present data set in red. The correspondence of the norm discharge with the historical data is satisfying for most gauges. Outliers are visible in oro-hydrographic zone 3 which is described as the middle part of the Naryn basin (i.e. from the confluence of the Small and Big Naryn rivers to the confluence of the rivers Naryn and Kekemeren). Gauges which do not satisfy the relationships are flagged qual_module FALSE. Reasons for discrepancies between literature data and the present data can be reporting errors, errors in gauge location or catchment area delineation, but also natural variability of discharge or changes in discharge since the collection of the literature data more than 50 years ago. For example, the tributary basins in the large middle Naryn oro-hydrographic region are highly heterogeneous and variable in their discharge production. Outliers in the discharge-elevation relationship in Fig. 10 in the middle Naryn basin are therefore not necessarily wrong but may reflect natural variability within the region. As the exact gauge locations which were used to derive the literature relationship are not documented, it is further possible that gauges flagged here were not used to derive the relationships. The flagged gauges are 16081 (Kekirim-Kara-Tabylga/Кекирим-Ка), 16298 (Nauvalisoy/Наувалысай, Sidjak/Сиджак), 60013 (Kekemeren/Кекемерен, Sarykamysh), 60018 (Nichkesay/Ничкесай), 60038 (Oygaing/Ойгаинг, above mouth Koksu, выше устья р. Коксу), and 60039 (Koksu - mouth/Коксу - устье).
Water balance. The long-term water balance over a basin, neglecting sub-surface flows, is given as $Q = P - E$ where $Q$ is the long-term norm discharge at the outlet of the basin, $P$ is the long-term norm precipitation over the basin area, and $E$ is the long-term norm actual evaporation over the basin area. Since global warming is accelerating glacier mass loss which contributes to a de-storage of water in glaciated basins, the water balance can be extended to include glacier mass loss $dS$ as $Q = P - E + dS$ (neglecting other potential long-term changes of storage in the basin such as permafrost).
As mentioned above, the time period used to calculate the norm discharge reported by the Hydromets (attribute q_m3s) is not available. It is therefore not possible to choose weather data from the exact same time period as the discharge measurement period. Also, publicly available weather station data does not cover the remote mountainous catchments. We thus have to rely on publicly available gridded data products. We calculate average annual precipitation for each basin as mean over 7 gridded precipitation products ($P_{mean}$): CHELSA v2.1 ($P_{CHELSA}$), WorldClim v2.1 32, CHPclim 34, CRU 49, GPM IMERG 50, CHIRPS 51, and APHRODITE 52 which all have shown acceptable performance in Central Asia 53–57. Depending on the basin, the difference in precipitation can be up to 1'000 mm/a from one precipitation product to another. The standard error over the 7 precipitation products ($s_P$) was calculated for each basin. This standard error was also assumed to hold for the mean over all precipitation products as well as for individual precipitation products. All precipitation products are extracted between 1981 and 2010 except for GPM IMERG which is only available after 2000 and was extracted between 2000 and 2010.
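As an illustration of the ensemble statistics, the following Python sketch computes the mean and the standard error of the mean over a set of product values for one basin. It assumes the usual definition of the standard error (sample standard deviation divided by the square root of N); the exact formula used for the data set is not reproduced in this section, and the function name and the numbers in the usage note are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) of the product values.

    Assumption: standard error = sample standard deviation / sqrt(N).
    """
    n = len(values)
    return mean(values), stdev(values) / sqrt(n)
```

For seven hypothetical annual precipitation sums of 400 to 1000 mm in steps of 100 mm, this gives a mean of 700 mm and a standard error of about 82 mm.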
Actual evaporation was estimated using two different methods: (a) based on the aridity index and precipitation from CGIAR/WorldClim 32,35 and CHELSA respectively using the Budyko framework 58 with Fu's equation as done by Beck and colleagues 23, whereby the mean actual evaporation using these methods is denoted as $E_{Budyko}$, and (b) using the gridded data products SSEBop 59 and PML 60 which both show good performance on global scale 61 (the mean actual evaporation from SSEBop and PML is denoted as $E_{grid}$). Where the mean over all 4 data products and methods has been used we write $E_{mean}$. Actual evaporation estimated using the Budyko framework was consistently higher than the average between SSEBop and PML (by 170 mm on average). SSEBop data is available between 2003 and 2020, PML data is available between 2001 and 2020. We assume that the standard error is a suitable error statistic to represent the uncertainty of the average of actual evaporation from either the Budyko method or from the gridded data sets, thus resulting in $s_{E,Budyko}$ and $s_{E,grid}$, calculated analogously to $s_P$ with $N = 2$, and in $s_E$, calculated with $N = 4$. Glacier mass loss from Hugonnet and colleagues (denoted as $dS$) and their error estimates (denoted as $s_{dS}$) were used 45. We evaluated several water balance equations combining these precipitation, evaporation and glacier mass loss terms, whereby $P_{CHELSA}$ refers to bio12, $P_{mean}$ is the mean over 7 precipitation products as described above, $E_{Budyko}$ is the average actual evaporation estimated using the Budyko framework, and $E_{grid}$ is average actual evaporation from SSEBop and PML.
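For readers unfamiliar with the Budyko framework, a minimal Python sketch of Fu's equation in its commonly cited form, $E/P = 1 + \phi - (1 + \phi^{\omega})^{1/\omega}$ with $\phi = PET/P$, is given here. It assumes that the aridity index is defined as precipitation divided by potential evaporation, so that $\phi$ is its reciprocal, and it uses $\omega = 2.6$ purely as an illustrative default; the parameter value used for the data set is not stated in this section. The function name is hypothetical.

```python
def fu_actual_evaporation(precip, aridity_index, omega=2.6):
    """Estimate long-term actual evaporation with Fu's equation.

    precip        -- long-term mean annual precipitation (e.g. mm/a)
    aridity_index -- assumed to be P / PET (dimensionless)
    omega         -- Fu's shape parameter; 2.6 is an illustrative assumption
    """
    phi = 1.0 / aridity_index  # dryness index PET / P
    evaporative_fraction = 1.0 + phi - (1.0 + phi ** omega) ** (1.0 / omega)
    return precip * evaporative_fraction
```

The result is bounded by both the water limit (E cannot exceed P) and the energy limit (E cannot exceed PET), which is the defining property of Budyko-type curves.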
The errors were propagated accordingly by summing up the standard errors of the respective variables in each equation. For the first equation this yields $s_P + s_{E,Budyko} + s_{dS}$, as an example. We compared typical model performance statistics used in hydrology (mean error, mean absolute error, root mean squared error (RMSE), percent bias, Nash-Sutcliffe efficiency, index of agreement, Kling-Gupta efficiency, and volumetric efficiency among others) using the R package hydroGOF 62. The water balance model $P_{CHELSA} - E_{grid}$ thereby showed the best performance with an RMSE of 91 m³/s and volumetric efficiency of 0.5. The associated uncertainty of the runoff estimated with the water balance is then the sum of the standard errors over the 7 precipitation products and the 2 gridded evaporation products ($s_P + s_{E,grid}$). Figure 11 shows the norm discharge (q_mm) against the discharge calculated using the water balance approach described in the paragraph above with associated uncertainties. Gauges for which there is no overlap between the discharge calculated from the water balance plus/minus the combined uncertainty of the calculated discharge and the norm discharge plus/minus the uncertainty of the norm discharge (assumed to be 20%) are assigned qual_wb FALSE (this is the case for 46% of the gauges). The water balance is accepted where these two ranges overlap.

Gauge network. To validate the mapping of the gauge network, the basin order was visualised against norm discharge for each gauge that passed the area validation (Fig. 12). Unless there is an abstraction of river water, downstream gauges should show higher runoff than upstream gauges. This is given in the Naryn basin for example. In the Harirud basin, river runoff decreases in the higher-order downstream gauges. An inspection of an optical satellite image shows irrigated agriculture along the lower reaches of the Harirud river, accounting for the decrease in river discharge despite increasing river order. We flag downstream gauges which have lower discharge than the sum of upstream gauges at the same basin order with qual_order −1. Gauges with discharge larger or equal to the sum of the discharge of upstream gauges at the same basin order are assigned qual_order 1. Gauges for which this test does not apply, i.e. headwater gauges, are assigned qual_order 0.
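The gauge network flag can be expressed compactly. The Python sketch below assumes that the norm discharges of the relevant upstream gauges have already been collected for each gauge; how upstream gauges are selected from the basin order is not shown, and the function name is hypothetical.

```python
def order_flag(q_gauge, q_upstream):
    """Return the network flag for one gauge.

    q_gauge    -- norm discharge at the gauge
    q_upstream -- norm discharges of its upstream gauges (empty for headwaters)
    """
    if not q_upstream:
        return 0  # headwater gauge: the test does not apply
    return 1 if q_gauge >= sum(q_upstream) else -1
```

A value of −1 does not by itself indicate a data error; as in the Harirud example, it may reflect water abstraction between the gauges.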
Time series of basin attributes. From the time series of basin average precipitation, temperature, and snow-covered fraction the average seasonal pattern over all basins was derived (see Fig. 13). The basins show highly variable precipitation patterns. The summer months are dry in all basins with an average of below 50 mm/month of precipitation. Precipitation falls mostly in spring and winter with averages of 70 to 120 mm/month. The temperature pattern is more homogeneous with maximum temperatures in the summer of 12 degrees Celsius (deg. C) on average and minimum temperatures in the winter months of −2 deg. C on average. The snow-covered fraction is highly variable as well but shows the expected pattern of a high snow-covered fraction in winter months and a low snow-covered fraction in summer months.
We calculate Sen's slopes 63 on annual aggregates of the monthly time series of average precipitation per basin, average temperature per basin (both CHELSA v2.1) and average snow cover fraction per basin (extracted from MODIS) to check for trends in the data sets. For Fig. 14 we chose a threshold of 0.2 for the p-value, meaning that we accept a 1 in 5 chance of displaying a spurious trend. About 40% of the Sen's slopes calculated on annual precipitation sums have a p-value above 0.2 and are thus not visualized in Fig. 14. Precipitation between 1981 and 2010 shows strong trends in the southern Alai mountains of −40 to −60 mm/a or −1200 to −1800 mm in 30 years (for example basin of gauge 17107). This basin and the surrounding ones show a step decline of annual precipitation of around 1'000 mm in the mid-1990s (see Fig. 15). This step-change in the CHELSA precipitation originates from the GPCC data set which was used for bias correction 21. The station density of the GPCC data in Central Asia reduced considerably following the demise of the Soviet Union 64. A cursory comparison suggests that other gridded precipitation products in Central Asia that are corrected with station data suffer from the same problem of changes in station density. As long as such problems persist in precipitation products, we suggest applying a weighted ensemble approach, using several precipitation products for hydrological modelling in Central Asia.
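A minimal Python sketch of the trend screening is given here. Sen's slope is computed as the median of all pairwise slopes. The p-value is taken from a Mann-Kendall test with normal approximation and without tie correction, which is an assumption: this section does not state which test produced the p-values. The function names are hypothetical.

```python
from itertools import combinations
from math import erfc, sqrt
from statistics import median


def sens_slope(years, values):
    """Median of the slopes between all pairs of observations."""
    pairs = combinations(zip(years, values), 2)
    return median((v2 - v1) / (t2 - t1) for (t1, v1), (t2, v2) in pairs if t2 != t1)


def mann_kendall_p(values):
    """Two-sided p-value of the Mann-Kendall test (no tie correction)."""
    n = len(values)
    s = sum((b > a) - (b < a) for a, b in combinations(values, 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / sqrt(var_s)
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    return erfc(abs(z) / sqrt(2.0))


def slope_if_significant(years, values, p_threshold=0.2):
    """Return Sen's slope if the p-value is at or below the threshold, else None."""
    return sens_slope(years, values) if mann_kendall_p(values) <= p_threshold else None
```

With 30 annual values, a step change such as the one described for gauge 17107 would still yield a significant monotonic trend, which is why a visual check of the time series remains useful.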
CHELSA temperature trends in the basins look reasonable at first glance, albeit high (for example up to 0.1 K/a or 3 K over 30 years in the Arys basin in the north of the study area). While spatial temperature distributions are less variable than precipitation distributions, so that the problem of station density reduction may be less prominent, it is advisable to evaluate several temperature products prior to hydrological modelling. Trends in snow cover fraction (right tile in Fig. 14) are less pronounced than trends in temperatures or precipitation. They do however show a marked decrease in snow cover in the eastern Tien Shan mountains (basin of the Naryn) and the Zarafshan River basin.
Comparison with discharge from the Global Runoff Data Centre. The Global Runoff Data Centre (GRDC) (https://www.bafg.de/GRDC/EN/01_GRDC/grdc_node.html) provides daily or monthly discharge time series of stations all over the globe. To compare the CA-discharge data set with the GRDC data set (hereafter abbreviated with GRDC), we manually mapped the gauge CODE to each gauge location in QGIS. 84 gauges from GRDC were identified to overlap with gauges in the CA-discharge data set, whereby 12 locations in the GRDC data set were found to be wrongly or imprecisely geolocated. In the mountainous area of Central Asia, the CA-discharge data set has time series from 52 more gauges available. The CA-discharge data set further features longer time series. However, for the Afghan stations, GRDC has daily values where CA-discharge has monthly values only.

Concluding remarks on data validity.
Without in-situ measurements of river runoff, the possibilities to validate the time series data are limited. The same goes for gauge locations. Future data users need to be aware of these limitations.

Usage Notes
The data set is named CA-discharge and stored as a geopackage with point features, polygon features, and attribute tables. The geopackage can be downloaded from Zenodo 7. The geopackage can be opened in a GIS software like QGIS or in a scripting language like R or Python.

Instructions for R. Open R and navigate to the location where you have stored the geopackage CA-discharge.gpkg. To view the content of the geopackage type sf::st_layers("CA-discharge.gpkg"). This will print a table similar to Table 1 and provide you with the names of the layers in the geopackage. To read the content of individual layers, adapt the following command to read in the gauges layer: gauges <- sf::st_read("CA-discharge.gpkg", layer = "gauges"). In your workspace, you will now have an object gauges of class sf containing 299 features (rows) and 20 fields (columns) as described in Table 2.
The R scripts that were used to generate the data set are available from Zenodo 7. Users wishing to reproduce the workflow using the provided scripts require a working knowledge of R. A README guides through the process.
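For Python users, the layer listing can be approximated with the standard library alone, because a geopackage is an SQLite database whose gpkg_contents table registers the layers. The sketch below is illustrative only; the function name list_layers is hypothetical and the output will depend on the downloaded file.

```python
import sqlite3


def list_layers(gpkg_path):
    """Return (table_name, data_type) pairs registered in a geopackage."""
    con = sqlite3.connect(gpkg_path)
    try:
        rows = con.execute(
            "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
        ).fetchall()
    finally:
        con.close()
    return rows


if __name__ == "__main__":
    # Assumes CA-discharge.gpkg is stored in the working directory.
    for name, kind in list_layers("CA-discharge.gpkg"):
        print(name, kind)
```

To read geometries into a data frame, a library such as geopandas can be used, for example geopandas.read_file("CA-discharge.gpkg", layer="gauges"), which plays a similar role to sf::st_read in R.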
Instructions for QGIS. Store the geopackage in a convenient location. In your QGIS browser window, navigate to the file CA-discharge.gpkg. With a double-click on the geopackage, the list of layers is opened. Load a layer to your map with a double-click.

Fig. 2 Table schema illustrating the linkages between the layers in the geopackage. The attribute CODE links gauges, basins, basin attributes as well as time series. Each gauge in the gauges layer is linked to at most one discharge time series. For each attribute in the basin_attributes layer, a detailed description is available in the basin_attribute_description layer.

Fig. 3 Time series data availability and gaps in the sub-basins of (a) Chu, Talas and Lake Issyk Kul and (b) Murghab and Harirud. The observation period is indicated by the available time series' first and last dates. Daily data is indicated by light grey squares, monthly data is indicated by dark grey triangles, and decadal (10-day) data is indicated by medium grey circles in the observation period plot. The percentage of missing data has been calculated based on the observation period of each gauge and added as a label to the missing data plot.

Fig. 4 Time series data availability and gaps in the sub-basins of the (a) Syr Darya and (b) Amu Darya rivers. The observation period is indicated by the available time series' first and last dates. Daily data is indicated by light grey squares, monthly data is indicated by dark grey triangles, and decadal (10-day) data is indicated by medium grey circles in the observation period plot. The percentage of missing data has been calculated based on the observation period of each gauge and added as a label to the missing data plot.
are given in attributes with the prefix lc_. A detailed description of each land cover class is given in the layer basin_attribute_description. The glacier related parameters are glacier area (gl_A_km2), glacier volume (gl_V_km3), fraction of glaciated area (gl_fr), glacier thinning rates (gl_dmdtda_mma and gl_dmdt_km3a) 45. The average snow cover fraction of the basin is given in the attribute fs_ann_chelsa. Next the entire range of bioclimatic indicators available through CHELSA v2.1 is given as basin averages (indicated as bio1 til wi in Table

Fig. 5 Example of data availability at 26 gauges from the Syr Darya basin with decadal observations. A change in the monitoring frequency or even the abandonment of gauges is visible in the 1990s in many stations.

Regimes are classified using the Shults coefficient δ, the July to September runoff in percentage of the annual runoff ($W_{VII-IX}$), and the month of peak discharge, where the Shults coefficient δ is calculated from the July to September runoff in percentage of the annual runoff ($W_{VII-IX}$) and the March to June runoff as percentage of the annual runoff ($W_{III-VI}$) as expressed in Eq. 1.
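The three classification inputs can be derived from a mean monthly runoff series as sketched below in Python. Eq. 1 is not reproduced in this section; the sketch assumes the usual form of the Shults coefficient, the ratio of the July to September share to the March to June share of annual runoff. Class thresholds are deliberately not included, and all names are hypothetical.

```python
def regime_indicators(monthly_runoff):
    """Compute classification inputs from 12 mean monthly runoff values.

    monthly_runoff -- values for January to December (any consistent unit)
    Returns (delta, w_vii_ix, peak_month), with peak_month in 1..12.
    """
    if len(monthly_runoff) != 12:
        raise ValueError("expected 12 monthly values")
    annual = sum(monthly_runoff)
    w_vii_ix = 100.0 * sum(monthly_runoff[6:9]) / annual  # July to September, %
    w_iii_vi = 100.0 * sum(monthly_runoff[2:6]) / annual  # March to June, %
    delta = w_vii_ix / w_iii_vi  # assumed form of the Shults coefficient
    peak_month = monthly_runoff.index(max(monthly_runoff)) + 1
    return delta, w_vii_ix, peak_month
```

Larger values of δ indicate a later, more glacier-influenced runoff season, while smaller values point to a regime dominated by spring snowmelt.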

Figure 6 shows the seasonal development of discharge for 4 classes of discharge regimes according to the classification by Shults 47: (1) Glacio-nival regime, (2) Nivo-glacial regime, (3) Nival regime, and (4) Nivo-pluvial regime. All classified regimes are consistent with the expected regime distribution according to basin elevation, glaciation and precipitation patterns. qual_regime flags are set to the ID of the regime (1 for glacio-nival and so on). Gauges which do not have a time series or whose time series have too many missing data points to calculate discharge statistics are set qual_regime equal to NA (175 gauges). In total the data set includes 46 time series with the glacio-nival regime, 58 time series with the nivo-glacial regime, 12 time series with nival regime and 4 time series with nivo-pluvial regime.

Fig. 6 Seasonal development of specific river discharge (discharge_times_series attribute value in millimetres per day) in all gauges where time series data is available. Runoff regime classification following Shults into glacio-nival regime, nivo-glacial regime, nival regime and nivo-pluvial regimes.

Fig. 7 Comparison of daily and decadal discharge time series at station 16070 in the Small Naryn River. The lines represent the original time series obtained from Kyrgyz Hydromet at daily and decadal resolution (red and black respectively) and the points indicate monthly averages calculated based on the daily and decadal time series data (red and black respectively). The bottom panel shows the relative difference between the monthly aggregates of the time series with higher and lower resolutions.

Fig. 8 Comparison of decadal and monthly discharge time series at station 17050 in the Gunt River. The lines represent the original time series obtained from Tajik Hydromet at decadal and monthly resolution (red and black respectively) and the points indicate monthly averages calculated based on the decadal and monthly time series data (red and black respectively). The bottom panel shows the relative difference between the monthly aggregates of the time series with higher and lower resolutions.

Fig. 10 Specific discharge vs. mean basin elevation as digitized from Surface Water Resources, Vol. 14, Issue 1 16 (black) and calculated from the present data set (red). The oro-hydrographic regions are: 1 - Big Naryn basin, 2 - Small Naryn basin, 3 - Middle part of Naryn basin, 4A - Upper part and right side of the middle part of Kekemeren basin, 4B - Left side of Kekemeren basin and Western Karakol basin, 5 - Low part of Naryn basin, 6 - Karasu Right basin, 7 - Northern part of the south-western slope of Ferghana Range, including right side tributaries of Yassy River, 8A - Yassy and Karakuldzha basins, 8B - Tar basin, 9A - Kurshab and Akbura basins, 9B - Abshirsay and Isfayramsay basins, 9C - Shakhimardan basin, 10 - Aravan basin, 11 - Western part of the northern slope of Alay Range and eastern part of the northern slope of Turkestan Range, 12 - Western part of the northern slope of Turkestan Range, 13A - Padsha-Ata basin, 13B - Kassansay and Gavasay basins, 13C - Western part of the south-eastern slope of Kuramin Range, 14 - Akhangaran basin, 15A - Upper and middle part of Chatkal basin, 15B - Right side of middle part of Chirchik basin, 15C - Left side of middle part of Chirchik basin, 16 - Upper part of Arys basin and Pskem basin, 17 - Rivers of south-western slopes of Karatau and Boralday-Tau Ranges.

Fig. 11 Scatter plot of norm discharge (attribute q_mm) vs. $Q_{estimated} = P_{CHELSA} - E_{grid}$. The crosses indicate the uncertainties in norm discharge in the horizontal dimension and in $P_{CHELSA} - E_{grid}$ in the vertical dimension. Orange crosses indicate basins for which the water balance is deemed not fulfilled within given uncertainties (quality flag qual_wb FALSE).

Fig. 12 Basin discharge generally increases with increasing basin order within a basin. Exceptions are for example the Harirud River basin (top right tile) where downstream discharge decreases (probably due to abstractions). flag_order 0 indicates headwater basins, flag_order 1 indicates downstream gauges with discharge larger or equal to the sum of the discharge of upstream gauges and flag_order -1 indicates downstream gauges with discharge smaller than the sum of the discharge of upstream gauges.
It should be noted that the choice of the error statistic used for estimating the uncertainty of a flux heavily influences the result, as does the choice of the precipitation and evaporation products.
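To make the sensitivity to the error statistic concrete, the acceptance test for the water balance can be sketched in Python as an interval overlap check. This is a reading of the description of the qual_wb flag, not the authors' formula: the estimated discharge is taken as $P_{CHELSA} - E_{grid}$ with uncertainty $s_P + s_{E,grid}$, the norm discharge carries a relative error of 20%, and touching intervals are assumed to count as overlapping. All names and the numbers in the usage note are hypothetical.

```python
def water_balance_ok(q_norm, precip, evap, s_precip, s_evap, rel_err_norm=0.20):
    """True if the uncertainty ranges of estimated and norm discharge overlap.

    All fluxes are in the same unit (e.g. mm/a).
    """
    q_est = precip - evap          # water balance estimate
    s_est = s_precip + s_evap      # summed standard errors
    s_norm = rel_err_norm * q_norm
    # Two symmetric intervals overlap if their centres are no further apart
    # than the sum of their half-widths.
    return abs(q_est - q_norm) <= s_est + s_norm
```

With a norm discharge of 500 mm/a, precipitation of 900 mm/a and evaporation of 250 mm/a, the estimate of 650 mm/a is accepted for a combined standard error of 60 mm/a but rejected for 40 mm/a, which shows how strongly the flag depends on the chosen uncertainty measure.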

Fig. 14 Sen's slopes over 30 years of monthly precipitation (left) and temperature (center) and snow cover fraction (right) time series. We chose an arbitrary cutoff value of 0.2 for the p-value for visualizing Sen's slopes. Please note that precipitation and temperature time series are extracted for the period between 1981 and 2010 from CHELSA v2.1 while snow cover fraction is extracted from MODIS for the period between 2001 and 2021.

Fig. 15 Average annual basin precipitation of gauge 17107 (mean elevation of 4011 masl), extracted from CHELSA v2.1 22, ERA5 69 and GPCC 64. The significant decrease of CHELSA precipitation in the high-altitude basin can be explained by the decrease of precipitation in the GPCC precipitation product. The ERA5 precipitation does not show a step-change in precipitation in this basin.

Table 1. Layer information for the geopackage CA-discharge.gpkg. The features and fields columns indicate the dimensions of the layers. CRS stands for Coordinate Reference System. No CRS is available for data layers without geometry. Data layers are linked to geometry layers through unique identifier attributes called CODE.

Table 2. Attributes of the gauges layer.

Table 4. Full list of attributes available in the basin_attributes data table layer. This table only gives a short description of the basin attributes. Please refer to the basin_attribute_description layer in the geopackage for a complete description in table form.

Table 5. Quality flags that result from the technical validation of the data set.