Background and Summary

Soil erosion by water and sediment delivery to river systems are gaining political importance and scientific attention for their integral role in issues spanning the domains of soil health1, food security2, environmental pollution3,4,5,6, greenhouse gas offsetting7,8,9,10, reservoir longevity11, and a range of other ecosystem services12,13,14,15,16,17,18. The scientific community has responded to these priorities with a continually increasing number of model-based assessments, ranging across the full spectrum of spatial scales relevant to the end-user19,20. While model applications have dominated the scientific output, the production and sharing of empirical observations have not necessarily kept pace21. Available summarised compilations of long-term annual average rates from monitored areas have unravelled large-scale spatial trends in soil loss by water erosion and fluvial sediment yield22,23,24,25, but often do so with a long-term annual average temporal focus that misses the high temporal variability between soil loss events26,27,28. Quantifications of net soil loss at dynamic timescales arguably form the basis of contemporary research priorities, which include, but are not limited to: (1) understanding the variable frequency-magnitude relationships of gross and net soil loss through space and time in a changing climate, (2) understanding the influences of management practices on the dynamics and magnitude of soil loss, (3) up/down-scaling soil loss by water erosion predictions to integrate soil loss by water erosion processes into Earth system models, and (4) quantifying uncertainty on model predictions and observational data.

Given the intimate coupling between empirical observations and modelling opportunities (e.g. model development, calibration and validation), the open sharing of high resolution time series data from monitoring networks is vital to confront modern research questions29,30,31,32. For example, while not without criticism33,34, typical validation routines for spatially distributed catchment models involve the routing of overland fluxes into stream channel outlets in which an integrated comparison can be made35,36,37,38,39,40. The value of small monitored catchments arises because soil erosion and sediment delivery models require an idealised ‘goldilocks’ spatial scale for such confrontations: suitably large to incorporate catchment-scale processes, but without transitioning to scales after which fluvial processes mask and confound the signal from hillslope sediment delivery32,41. Among the spectrum of catchment drainage areas monitored in Europe, catchments potentially matching these criteria have the lowest relative abundance25.

The limited open availability of suitable catchment measurements is perhaps a key underlying reason for broad critiques of model validation efforts42. The cascading value of available centralised monitored catchment networks (e.g. USDA-ARS) is evidenced through numerous scientific and technological advancements in soil erosion research43,44,45,46. In Europe, despite a relative data-richness as a continent, the absence of a multi-national network instead requires community collaborations to systematise data in a way that can unite researchers with monitoring program operators30. This priority is compounded by the tendency of legacy research data to become increasingly unavailable through time47, emphasising the general need for European data conservation efforts.

Here we present the EUropean SEDiments collaboration (EUSEDcollab) database, a multi-source platform containing over 1600 catchment years of water discharge and sediment yield time series measurements suitable for soil erosion, sediment delivery and runoff studies. The dataset originates from collaborative efforts between a network of researchers and practitioners across the community with the goal of increasing data accessibility and usability. The data collection and harmonisation campaign was undertaken in multiple phases: (1) a call of interest for participation was made to the research community, issued by the Joint Research Centre (JRC) as part of the erosion working group within the EU Soil Observatory (EUSO), (2) interested collaborators were given (meta-)data templates to compile and share time series data to a centralised data repository, and (3) following data acquisition, a harmonisation and quality checking effort was undertaken to create a standardised database from the multiple data contributors. Following this process, we provide the first data release (EUSEDcollab.v1) of a continuing collaboration and data collation campaign through the EUSO, with the broad objective of converging scientific knowledge, people and data for research and policy-related objectives in Europe48.

Methods

Data collection: scope

The initial scope of EUSEDcollab on conception was to identify and unite high value research data in predominantly agricultural landscapes across Europe. Binary conditions were not set during the data collation phase; rather, holistic criteria were defined for the compiled database to reflect, such as: (1) a significant contribution of rill and inter-rill erosion to the total sediment yield among the other relevant erosion processes (i.e. landslides, gullying and river bank erosion), and (2) a small to medium spatial scale (<1000 km2) in which the signal of hillslope sediment delivery is reflected in the sediment yield dynamics. Following this, an inclusionary approach is taken to maximise the number of catchment datasets in the repository, allowing a user to later subset the data repository based on their needs.

Data collection: time series and metadata structure

The monitoring of suspended sediment loads (SSL) at gauging stations requires quantifications of water discharge (Q) and suspended sediment concentration (SSC) through time. These spatial and temporal extrapolation exercises inevitably associate appreciable uncertainty with the final estimated quantity49. This uncertainty depends on: (1) the proficiency of Q and SSC measurement methods in capturing lateral and vertical gradients of sediment transport rate within the stream profile, (2) the timing and frequency of these measurements, and (3) the strategy used to extrapolate discrete measurements into (nearly) continuous time series. Such extrapolation is commonly undertaken using water depth-Q and Q-SSC rating curves to continuously approximate Q and SSC respectively50,51. In the case of SSC, surrogate approximators such as water turbidity and acoustic signals are also used to proxy changes in SSC at fine temporal resolutions based on calibrated relationships52. Minimising uncertainty is context-dependent based on the system dynamics53,54,55, requiring a strategic SSC sampling technique using random, calendar-based, or flow-proportional sampling schemes. Particularly at small spatial scales, a high number of SSC samples over time and the use of flow-proportional sampling regimes typically associate lower uncertainties with time-integrated sediment load approximations49.
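The Q-SSC rating curve approach described above can be illustrated with a minimal sketch. It assumes a power-law form SSC = a·Q^b fitted by least squares in log space, which is one common choice and not necessarily the method used for any given EUSEDcollab record. The function names, units (Q in m3 s−1, SSC in kg m−3) and sample values are illustrative assumptions, and the bias correction usually applied when back-transforming from log space is omitted.

```python
import math


def fit_rating_curve(q, ssc):
    """Least-squares fit of log(SSC) = log(a) + b*log(Q); returns (a, b).

    Hypothetical helper: assumes a power-law rating curve SSC = a * Q**b.
    Non-positive samples are skipped because their logarithm is undefined.
    """
    pairs = [(math.log(x), math.log(y)) for x, y in zip(q, ssc) if x > 0 and y > 0]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two positive (Q, SSC) pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("Q samples must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def suspended_sediment_load(q_series, a, b):
    """SSL (kg/s) = Q (m3/s) * SSC (kg/m3), with SSC predicted as a * Q**b."""
    return [q * a * q ** b if q > 0 else 0.0 for q in q_series]


# Synthetic samples generated from a known curve (a = 0.2, b = 1.5).
q_samples = [0.1, 0.5, 1.0, 2.0, 4.0]
ssc_samples = [0.2 * q ** 1.5 for q in q_samples]

a, b = fit_rating_curve(q_samples, ssc_samples)
ssl = suspended_sediment_load([0.0, 1.0, 2.0], a, b)
```

In practice the scatter around such a curve is often large, which is one reason the sampling strategy discussed above matters for the final load estimate.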

Given the method dependency of SSL quantifications, we invited data contributors to add descriptive metadata properties of the water discharge and SSC measurement methods to provide users with background context for each timeseries (Table 1). Additionally, for the popular case in which a sediment rating curve was used for the extrapolation of SSC, we invited the contributing scientists to include the original data in order for a user to reproduce the time series of SSL.

Table 1 The standardised metadata template issued to the collaborating data producers of EUSEDcollab in the data collection campaign.

Each data entry has a standardised format with a column for the datetime, water discharge (Q: volume time−1), suspended sediment concentration (SSC: mass volume−1) and the derived suspended sediment load (SSL: mass time−1) accompanied by the relevant units. A metadata file accompanies each catchment entry to allow data contextualisation using open or categorical properties (Table 1). Input fields predominantly define descriptive properties of the catchment (e.g. monitoring station location, catchment drainage area and land cover), the data record (e.g. temporal extent) and the methods used to measure and quantify the water discharge and sediment yield. Land cover information is included as a metadata field since it gives the opportunity for data contributors to add and qualify primary descriptive catchment properties with more localised detail than is possible with auxiliary large-scale landcover datasets.
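As an illustration of how a record in this standardised layout might be read, the sketch below parses a small in-memory CSV and checks that the reported SSL agrees with Q × SSC. The column names (`datetime`, `Q_m3_s`, `SSC_kg_m3`, `SSL_kg_s`), the datetime format and the sample rows are assumptions made for the example; the actual headers and units should be taken from each file and its metadata.

```python
import csv
import io
from datetime import datetime

# Invented sample in the assumed layout; an empty cell marks a missing value.
SAMPLE = """datetime,Q_m3_s,SSC_kg_m3,SSL_kg_s
2015-06-01 00:00,0.20,0.05,0.010
2015-06-01 00:10,0.80,0.40,0.320
2015-06-01 00:20,0.50,,
"""


def to_float(text):
    """Convert a CSV cell to float, returning None for an empty cell."""
    text = text.strip()
    return float(text) if text else None


def read_record(handle):
    """Parse rows into dicts with a datetime and Q, SSC, SSL values."""
    rows = []
    for raw in csv.DictReader(handle):
        rows.append({
            "datetime": datetime.strptime(raw["datetime"], "%Y-%m-%d %H:%M"),
            "Q": to_float(raw["Q_m3_s"]),
            "SSC": to_float(raw["SSC_kg_m3"]),
            "SSL": to_float(raw["SSL_kg_s"]),
        })
    return rows


def inconsistent_rows(rows, rel_tol=0.01):
    """Return rows whose reported SSL differs from Q * SSC by more than rel_tol."""
    bad = []
    for row in rows:
        if None in (row["Q"], row["SSC"], row["SSL"]):
            continue  # cannot check rows with missing values
        expected = row["Q"] * row["SSC"]
        if abs(row["SSL"] - expected) > rel_tol * max(abs(expected), 1e-12):
            bad.append(row)
    return bad


rows = read_record(io.StringIO(SAMPLE))
flagged = inconsistent_rows(rows)
```

A check of this kind only applies where SSC is provided alongside SSL and the units are mutually consistent; a unit conversion factor would otherwise be needed.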

At minimum, each catchment entry contains a Q and SSL timeseries with a metadata file providing the geographic coordinates of the monitoring station location. However, for the majority of catchment entries the population of each metadata field within EUSEDcollab is relatively high (Table 1). Where possible, we also include: (1) precipitation time series data and rain gauge location information, (2) accompanying literature references from relevant publications for each dataset, and (3) a readme file to give expert-based contextual information to the end-user and qualify any necessary considerations within the time series data. For catchments without an associated English language publication, the submission of this file is emphasised in order to supplement the metadata with sufficient background information.

Data Records

The EUSEDcollab repository contains 245 catchments with time series of Q and SSL (Tables 2–4). We include a further seven catchment records with full Q time series and intermittent SSC measurements for a user to define their own extrapolation method, since no prior extrapolation was completed in these cases. These records are not considered in the subsequent summary but are included in the data release with accompanying metadata files. The combined dataset covers over 1600 catchment years of water discharge and suspended sediment load records. Based on time-structure, this repository is divided into 22 daily data records, 212 monthly records, 1 event record with a fixed timestep, 2 event records with variable timesteps, and 8 event records with event aggregations (Fig. 1). A large addition of data was made available from monitored Danish catchments56, which have a comparatively lower temporal resolution (monthly) than other individual or small collections of monitored catchments (Tables 2–4).

Table 2 An overview of database entries with individual event measurements and their respective assigned IDs and classified temporal structure.
Table 3 An overview of database entries with a daily timestep and their respective assigned IDs.
Table 4 An overview of database entries with monthly data or only daily discharge and sediment rating curve data.
Fig. 1

A statistical overview of the EUSEDcollab database. Catchment records are categorised into ‘Monthly’ data, with quantifications of sediment yield per month, and ‘Daily/event’ data, including all other data time structures with daily timesteps or time-distributed and time aggregated event data. The plotted overviews include: (a) the number of datasets belonging to each classified time-structure type, (b) the distribution of measurement record lengths within the database, (c) the number of datasets with coverage in each year, and (d) boxplot distributions of catchment drainage areas within the dataset for monthly and daily/event time series records.

The distribution of catchment drainage areas (median = 43 km2, min = 0.04 km2, max = 817 km2) included in EUSEDcollab reflects the overall focus on small to medium monitored catchment areas relevant for soil erosion and hydrological research (Fig. 1). These catchments are distributed across a range of elevation settings and climatic regions but are dominated overall by agricultural land uses (Fig. 2). Excluding catchment entries with monthly resolution data, the median drainage area reduces to 3.6 km2 (min = 0.04 km2, max = 566 km2). The mean record length is 6.7 years across all records and 9.7 years for the high temporal resolution records only (excluding monthly data). These years of data coverage are predominantly concentrated from the year 1995 onwards (Fig. 1).

Fig. 2

Histogram charts of the elevation (a) and mean annual precipitation in mm (b) of the monitoring stations included in EUSEDcollab. The distribution of the % cover of each land use type within the database is given for catchments with metadata inputs (c). Elevation is extracted from the SRTM global digital elevation layer and total annual average precipitation from Worldclim103.

Of the total repository, 32 catchment entries contain additional time series measurements of precipitation depth at varying temporal resolutions for their respective location depending on the method employed. This precipitation file gives additional information on the rain gauge type and spatial coordinates. A total of 228 catchments have catchment boundary polygons added as additional information by the data provider (Fig. 3). Some monitored catchments, such as Kinderveld and Ganspoel35, contain additional geospatial information on land use as well as erosion surveys. In these cases we include the data in the original format and structure in which it was made available by the data producers. A full overview of all catchment locations is given in Fig. 4.

Fig. 3

Google Earth satellite image examples of monitored catchments in EUSEDcollab with included catchment boundary polygons: (a) Kinderveld, BE (including parcel boundary information), and (b) Nučice, CZ. The point markers represent the registered monitoring locations in EUSEDcollab.

Fig. 4

Top: A geographical overview of EUSEDcollab.v1 data entries per climate (EnZ) region in Europe104 (a). Bottom: summary-level empirical relationships found within the database entries, showing (a) the relationship between catchment area (km2) and specific sediment yield (t km−2 yr−1), and (b) the relationship between mean annual discharge (m3 yr−1) and the mean annual sediment yield (t yr−1) for all high temporal resolution datasets (excluding monthly data). The error bars show the variation of the annual sediment yield values around the mean annual average.

Technical Validation

Technical validation of each original record is done in a decentralised manner by the data producer. The multi-source nature of EUSEDcollab means that measurements of Q and SSL were acquired with varying apparatus set-ups, temporal structures and post-processing methods (Tables 2–4). Acknowledging varying degrees of data heterogeneity requires end-users to make a judgement on the inter-comparability of catchment records for a particular use-case, based on differing measuring extents, sampling resolutions and uncertainty sources. As a data integration and harmonisation exercise, we aimed to facilitate this user-side assessment by providing necessary metadata properties, namely: (1) water discharge method descriptors, (2) sediment flux measurement and quantification methods, (3) quality control properties describing the frequency of monitoring station checks, (4) literature references, and (5) dataset contact information (Table 1).

Data evaluation: quality and completeness assessment

To give a centralised assessment of the completeness and consistency of each submitted time series record, a ready-to-use evaluation was made of missing data inputs (Fig. 5). For example, missing inputs could be due to temporary technical issues, incomplete measurements or periodic discontinuation. Depending on the use-case, missing data may limit the applicability of a catchment dataset to a certain task, so it is useful for a user to know about such gaps a priori.

Fig. 5

An overview of the data quality control procedure to include an evaluation of missing data entries within each time series record. A modified evaluation is made according to the time series structure of each data record. The output of the quality control procedure provides an accompanying JSON file for each data entry within EUSEDcollab.

The compiled time series entries in EUSEDcollab contain continuous measurements (e.g. with a daily or monthly timestep) in perennial streams or episodic measurements (e.g. time-aggregated or time-distributed events) in discontinuous streams. Based on these structural data characteristics, adapted evaluation routines were used to summarise data presence/absence through time (Fig. 5). Each time series entry is initially classified into one of five structures: (1) daily data series with a fixed timestep, (2) monthly data series with a fixed timestep, (3) event data with a fixed timestep within each event, (4) event data with a variable timestep within each event, or (5) event data that is temporally-aggregated per event. Thereafter, evaluations of each time series are made to give the total % completeness of the instances for both Q and SSL. For data containing fine-resolution measurements during episodic events, within-event evaluations are additionally generated to quantify the completeness of each individual event making up the entire time series (Fig. 5). A full description of each evaluation parameter is given in S.(1) for each classified time series structure.
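To make the idea of a % completeness evaluation concrete, the sketch below handles only the simplest of the five structures, a daily series with a fixed timestep, and writes the result as JSON. It is not the routine used to produce the EUSEDcollab quality control files: the function name, the JSON keys and the example values are hypothetical.

```python
import json
from datetime import date, timedelta


def completeness_daily(records, start, end):
    """Percentage of expected daily timesteps with a Q and an SSL value.

    records maps a date to a (Q, SSL) tuple, with None for a missing value.
    Days absent from the mapping count as missing for both variables.
    """
    n_days = (end - start).days + 1
    if n_days <= 0:
        raise ValueError("end must not be earlier than start")
    present = {"Q": 0, "SSL": 0}
    for offset in range(n_days):
        q, ssl = records.get(start + timedelta(days=offset), (None, None))
        present["Q"] += q is not None
        present["SSL"] += ssl is not None
    return {
        "structure": "daily_fixed_timestep",  # hypothetical label
        "n_expected": n_days,
        "Q_completeness_pct": round(100.0 * present["Q"] / n_days, 1),
        "SSL_completeness_pct": round(100.0 * present["SSL"] / n_days, 1),
    }


# Invented four-day record: SSL is missing on day 2, day 3 is absent entirely.
records = {
    date(2020, 1, 1): (0.12, 3.4),
    date(2020, 1, 2): (0.10, None),
    date(2020, 1, 4): (0.30, 9.1),
}

summary = completeness_daily(records, date(2020, 1, 1), date(2020, 1, 4))
report = json.dumps(summary, indent=2)
```

Event-based structures would need a different definition of the expected timesteps, for example counting instances within each event separately, which is why the evaluation is adapted per structure.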

Usage Notes

Data opportunities

EUSEDcollab is the first database of its kind in Europe, intended as a resource for a non-exhaustive range of applications relating to runoff, soil loss by water erosion and sediment delivery research at singular or multiple sites. These opportunities can include a range of research domains seeking to understand the system dynamics of catchment-scale runoff, erosion and sediment fluxes (Figs. 4, 6). These may include modelled and analytical developments in frequency-intensity relationships26,27,57,58, spatial and temporal scale-effects25,59,60,61, or internal (e.g. topography, geology, soil characteristics), external (e.g. meteorological conditions) and anthropogenic (e.g. land use and land cover) drivers of sediment variability62.

Fig. 6

Example syntheses of time series data from the Kinderveld catchment, BE (250 ha) and the Nučice catchment, CZ (53 ha) in the EUSEDcollab repository. Note that the data is not area-normalised and the data from the Kinderveld catchment (a) is presented in tonnes per aggregated event, while the Nučice catchment (b) is made available and presented in tonnes per day. Additionally, it is important to consider the following contextual factors: (i) The Nučice measurements include periods with baseflow carrying sediments, whereas in the Kinderveld, only runoff events are included. This difference in sediment sources (rill and interrill, bank erosion and gullying) between the two catchments, explained in the related literature (Tables 2, 3), may contribute to variations in the observed values. (ii) In Nučice, the low number of days in the data record for specific years (e.g., 2015, 2017, 2018, 2021) is due to exceptionally dry years when the discharge was zero or very low, limiting the availability of sediment data.
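Because the values in Fig. 6 are not area-normalised, a user comparing catchments may first want to convert loads to a specific yield. The sketch below shows that conversion using the drainage areas quoted in the caption (250 ha and 53 ha). The function name and the example load of 10 t are invented for illustration and are not values from the database.

```python
def specific_yield(load_t, area_ha):
    """Convert a sediment load in tonnes to t km-2 for a drainage area in ha."""
    if area_ha <= 0:
        raise ValueError("drainage area must be positive")
    return load_t / (area_ha / 100.0)  # 100 ha = 1 km2


# Drainage areas from the figure caption ("Nucice" written without diacritics).
AREAS_HA = {"Kinderveld": 250.0, "Nucice": 53.0}

example_load_t = 10.0  # invented load, applied to both catchments
normalised = {
    name: specific_yield(example_load_t, area) for name, area in AREAS_HA.items()
}
```

Area normalisation does not remove the other differences noted in the caption: one series is aggregated per event and the other per day, so the normalised values are still not directly comparable without a common time basis.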

By uniting data from across a European scientific network, we aim to: (1) release an open-access data resource hosted on the European Soil Data Centre (ESDAC) with the goal of continued database growth in a standardised manner, (2) mitigate data loss from discontinued research projects, (3) build a repository upon which a broad range of analytical and modelling methods can be built to advance scientific knowledge, and (4) allow cross-domain intercomparisons to assess the generalisation of empirical relationships and model prediction systems.

Data limitations

Data users are advised to consider the applicability of each utilised dataset for their application. These considerations range from the spatial scale (drainage area) of the catchment in its context-dependent environmental setting, to the temporal detail and measurement-richness underlying the dataset. The data quality evaluation gives additional relevant information on the time series completeness in order for initial evaluations to be made (Fig. 5).

The EUSEDcollab.v1 repository has a significant spatial bias in its coverage due to a large number of data additions from small to medium sized catchments from a national monitoring campaign in Denmark56. These data have evidenced usage in erosion modelling36 but may not meet the requirements of certain high temporal resolution research applications due to infrequent underlying suspended sediment sampling. We envisage that continued catchment data inputs from national monitoring campaigns fitting the motivations of EUSEDcollab will improve the overall spatial coverage and reduce this spatial bias.

Data platform and continued community contributions

The EUSEDcollab repository is openly accessible via the European Soil Data Centre63 (ESDAC) platform (https://esdac.jrc.ec.europa.eu/content/EUSEDcollab) and Figshare64. All files are provided in .csv format in their relevant folders and are identifiable based on the assigned ID listed in the overview file (Catchment_ID_assignment.csv). In the case of database-wide applications, users are requested to cite this article as the reference for the entire repository. In cases of individual catchment applications, users should refer to the reference studies for each catchment provided in the metadata and summarised in Tables 2–4.

EUSEDcollab.v1 is intended as the first version of a continued effort to gather and platform data through collaborative efforts from across the community. Future data collection efforts will seek to extend the size and scope of the repository through including a wider diversity of catchment types (e.g. pristine forests, badlands etc.) across a wider range of elevation settings.

Further contributions can be made to the database by downloading and completing the data and metadata template files available in the ESDAC data portal (https://esdac.jrc.ec.europa.eu/content/EUSEDcollab). Data submissions can be included in future data releases by contacting the data manager through the contact details listed in the ESDAC data portal.