EUSEDcollab: a network of data from European catchments to monitor net soil erosion by water

As a network of researchers we release an open-access database (EUSEDcollab) of water discharge and suspended sediment yield time series records collected in small to medium sized catchments in Europe. EUSEDcollab is compiled to overcome the scarcity of open-access data at relevant spatial scales for studies on runoff, soil loss by water erosion and sediment delivery. Multi-source measurement data from numerous researchers and institutions were harmonised into a common time series and metadata structure. Data reuse is facilitated through accompanying metadata descriptors providing background technical information for each monitoring station setup. Across ten European countries, EUSEDcollab covers over 1600 catchment years of data from 245 catchments at event (11 catchments), daily (22 catchments) and monthly (212 catchments) temporal resolution, and is unique in its focus on small to medium catchment drainage areas (median = 43 km², min = 0.04 km², max = 817 km²) with applicability for soil erosion research. We release this database with the aim of uniting people, knowledge and data through the European Union Soil Observatory (EUSO).


Background and Summary
Soil erosion by water and sediment delivery to river systems are gaining political importance and scientific attention for their integral role in issues spanning the domains of soil health 1, food security 2, environmental pollution [3][4][5][6], greenhouse gas offsetting [7][8][9][10], reservoir longevity 11, and a range of other ecosystem services [12][13][14][15][16][17][18]. The scientific community has responded to these priorities with a continually increasing number of model-based assessments, ranging across the full spectrum of spatial scales relevant to the end-user 19,20. While model applications have dominated the scientific output, the production and sharing of empirical observations have not necessarily kept pace 21. Available summarised compilations of long-term annual average rates from monitored areas have unravelled large-scale spatial trends in soil loss by water erosion and fluvial sediment yield [22][23][24][25], but often do so with a long-term annual average temporal focus that misses the high temporal variability between soil loss events [26][27][28]. Quantifications of net soil loss at dynamic timescales arguably form the basis of contemporary research priorities, which include, but are not limited to: (1) understanding the variable frequency-magnitude relationships of gross and net soil loss through space and time in a changing climate, (2) understanding the influences of management practices on the dynamics and magnitude of soil loss, (3) up/down-scaling predictions of soil loss by water erosion to integrate these processes into Earth system models, and (4) quantifying uncertainty on model predictions and observational data.
Given the intimate coupling between empirical observations and modelling opportunities (e.g. model development, calibration and validation), the open sharing of high resolution time series data from monitoring networks is vital to confront modern research questions [29][30][31][32]. For example, while not without criticism 33,34, typical validation routines for spatially distributed catchment models involve the routing of overland fluxes into stream channel outlets at which an integrated comparison can be made [35][36][37][38][39][40]. The value of small monitored catchments arises because soil erosion and sediment delivery models require an idealised 'goldilocks' spatial scale for such confrontations: suitably large to incorporate catchment-scale processes, but without transitioning to scales after which fluvial processes mask and confound the signal from hillslope sediment delivery 32,41. Among the spectrum of catchment drainage areas monitored in Europe, catchments potentially matching these criteria have the lowest relative abundance 25.
The limited open availability of suitable catchment measurements is perhaps a key underlying reason for broad critiques of model validation efforts 42. The cascading value of available centralised monitored catchment networks (e.g. USDA-ARS) is evidenced through numerous scientific and technological advancements in soil erosion research [43][44][45][46]. In Europe, despite a relative data-richness as a continent, the absence of a multi-national network instead requires community collaborations to systematise data in a way that can unite researchers with monitoring program operators 30. This priority is compounded by the tendency of legacy research data to become increasingly unavailable through time 47, emphasising the general need for European data conservation efforts.
Here we present the EUropean SEDiments collaboration (EUSEDcollab) database, a multi-source platform containing over 1600 catchment years of water discharge and sediment yield time series measurements suitable for soil erosion, sediment delivery and runoff studies. The dataset originates from collaborative efforts between a network of researchers and practitioners across the community with the goal of increasing data accessibility and usability. The data collection and harmonisation campaign was undertaken in multiple phases: (1) a call of interest for participation was made to the research community, issued by the Joint Research Centre (JRC) as part of the erosion working group within the EU Soil Observatory (EUSO), (2) interested collaborators were given (meta-)data templates to compile and share time series data to a centralised data repository, and (3) following data acquisition, a harmonisation and quality checking effort was undertaken to create a standardised database from the multiple data contributors. Following this process, we provide the first data release (EUSEDcollab.v1) of a continuing collaboration and data collation campaign through the EUSO, with the broad objective of converging scientific knowledge, people and data for research and policy-related objectives in Europe 48.

Table 1. The standardised metadata template issued to the collaborating data producers of EUSEDcollab in the data collection campaign. Each time series of water discharge and sediment yield has an accompanying metadata file to allow filtering based on method or catchment attributes and to provide the user with relevant contextual information (e.g. method descriptors and published work). Metadata identifiers were open or categorical for the data producer, or otherwise assigned during the database harmonisation process. The '% populated' column refers to the % completeness of each metadata field for the entire collected database. For Boolean variables, the % populated column gives the database % with an accompanying count of the cases with a true value (i.e. containing precipitation or sediment rating curve data).

Methods
Data collection: scope. The initial scope of EUSEDcollab on conception was to identify and unite high value research data in predominantly agricultural landscapes across Europe. Binary conditions were not set during the data collation phase; rather, holistic criteria were defined for the compiled database to reflect, such as: (1) a significant contribution of rill and inter-rill erosion to the total sediment yield among the other relevant erosion processes (i.e. landslides, gullying and river bank erosion), and (2) a small to medium spatial scale (<1000 km²) in which the signal of hillslope sediment delivery is reflected in the sediment yield dynamics. Following this, an inclusionary approach is taken to maximise the number of catchment datasets in the repository, allowing a user to later subset the data repository based on their needs.
Data collection: time series and metadata structure. The monitoring of suspended sediment loads (SSL) at gauging stations requires quantifications of water discharge (Q) and suspended sediment concentration (SSC) through time. These spatial and temporal extrapolation exercises inevitably associate appreciable uncertainty with the final estimated quantity 49. Uncertainty depends on: (1) the proficiency of Q and SSC measurement methods in capturing lateral and vertical gradients of sediment transport rate within the stream profile, (2) the timing and frequency of these measurements, and (3) the strategy used to extrapolate discrete measurements into (nearly) continuous time series. Such extrapolation is commonly undertaken using water depth-Q and Q-SSC rating curves to continuously approximate Q and SSC respectively 50,51. In the case of SSC, surrogate approximators such as water turbidity and acoustic signals are also used to proxy changes in SSC at fine temporal resolutions based on calibrated relationships 52. Minimising uncertainty is context-dependent based on the system dynamics [53][54][55], requiring a strategic SSC sampling technique using random, calendar-based, or flow-proportional sampling schemes. Particularly at small spatial scales, a high number of SSC samples over time and flow-proportional sampling regimes are typically associated with lower uncertainties in time-integrated sediment load approximations 49. Given the method dependency of SSL quantifications, we invited data contributors to add descriptive metadata properties of the water discharge and SSC measurement methods to provide users with background context for each time series (Table 1). Additionally, for the common case in which a sediment rating curve was used for the extrapolation of SSC, we invited the contributing scientists to include the original data in order for a user to reproduce the time series of SSL.
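As an illustration of the Q-SSC rating curve extrapolation described above, the following minimal sketch fits a power law SSC = a·Q^b by least squares in log-log space and applies it to a continuous Q record. The power-law form, the function names and the absence of any back-transformation bias correction are assumptions made for illustration; individual data contributors may have used different curve forms or fitting procedures, as documented in their metadata.

```python
import math


def fit_rating_curve(q_samples, ssc_samples):
    """Fit SSC = a * Q**b by ordinary least squares on log-transformed pairs.

    Assumed (hypothetical) inputs: paired, positive Q and SSC samples.
    Non-positive pairs are skipped because their logarithm is undefined.
    """
    pairs = [(math.log(q), math.log(c))
             for q, c in zip(q_samples, ssc_samples) if q > 0 and c > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two positive Q-SSC pairs")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("Q samples must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    b = sxy / sxx                       # slope in log-log space
    a = math.exp(mean_y - b * mean_x)   # intercept back-transformed
    return a, b


def predict_ssc(q_series, a, b):
    """Apply the fitted curve to a continuous Q record (zero flow gives zero SSC)."""
    return [a * q ** b if q > 0 else 0.0 for q in q_series]
```

A user reproducing an SSL time series from the supplied Q-SSC pairs would fit the curve once and then multiply the predicted SSC by Q at each timestep, keeping units consistent with those stated in the data record.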
Each data entry has a standardised format with a column for the datetime, water discharge (Q: volume time⁻¹), suspended sediment concentration (SSC: mass volume⁻¹) and the derived suspended sediment load (SSL: mass time⁻¹) accompanied by the relevant units. A metadata file accompanies each catchment entry to allow data contextualisation using open or categorical properties (Table 1). Input fields predominantly define descriptive properties of the catchment (e.g. monitoring station location, catchment drainage area and land cover), the data record (e.g. temporal extent) and the methods used to measure and quantify the water discharge and sediment yield. Land cover information is included as a metadata field since it gives the opportunity for data contributors to add and qualify primary descriptive catchment properties with more localised detail than is possible with auxiliary large-scale land cover datasets.
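The relationship between the three quantities can be sketched as follows: SSL is the product of Q and SSC, integrated over the timestep to give a sediment mass. The units used here (Q in m³ s⁻¹, SSC in g L⁻¹) and the function names are assumptions for illustration only; the units of each EUSEDcollab record are stated alongside the data and should be checked before applying any conversion.

```python
def ssl_kg_per_s(q_m3_s, ssc_g_l):
    """Instantaneous suspended sediment load in kg/s.

    Assumes Q in m3/s and SSC in g/L; since 1 g/L equals 1 kg/m3,
    the product needs no further conversion factor.
    """
    return q_m3_s * ssc_g_l


def total_load_tonnes(q_series, ssc_series, timestep_s):
    """Integrate SSL over a fixed-timestep record, skipping missing (None) values."""
    total_kg = 0.0
    for q, ssc in zip(q_series, ssc_series):
        if q is None or ssc is None:
            continue  # missing measurement: contributes nothing to the sum
        total_kg += ssl_kg_per_s(q, ssc) * timestep_s
    return total_kg / 1000.0  # kg to tonnes
```

Skipping missing values, as done here, underestimates the total load when gaps coincide with runoff events, so the completeness of a record should be considered alongside any such sum.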
At minimum, each catchment entry contains a Q and SSL time series with a metadata file providing the geographic coordinates of the monitoring station location. However, for the majority of catchment entries the population of each metadata field within EUSEDcollab is relatively high (Table 1). Where possible, we also include: (1) precipitation time series data and rain gauge location information, (2) accompanying literature references from relevant publications for each dataset, and (3) a readme file to give expert-based contextual information to the end-user and qualify any necessary considerations within the time series data. For catchments without an associated English language publication, the submission of this file is emphasised in order to supplement the metadata with sufficient background information.

Data Records
The EUSEDcollab repository contains 245 catchments with time series of Q and SSL (Tables 2-4). We include a further seven catchment records with full Q time series and intermittent SSC measurements for a user to define their own extrapolation method, since no prior extrapolation was completed in these cases. These records are not considered in the subsequent summary but are included in the data release with accompanying metadata files. The combined dataset covers over 1600 catchment years of water discharge and suspended sediment load records. Based on time-structure, this repository is divided into 22 daily data records, 212 monthly records, 1 event record with a fixed timestep, 2 event records with variable timesteps, and 8 event records with event aggregations (Fig. 1). A large addition of data was made available from monitored Danish catchments 56, which have a comparatively lower temporal resolution (monthly) than other individual or small collections of monitored catchments (Tables 2-4).
The distribution of catchment drainage areas (median = 43 km², min = 0.04 km², max = 817 km²) included in EUSEDcollab reflects the overall focus on small to medium monitored catchment areas relevant for soil erosion and hydrological research (Fig. 1). These catchments are distributed across a range of elevation settings and climatic regions but contain an overall dominance of agricultural land uses (Fig. 2). Excluding catchment entries with monthly resolution data, the median drainage area reduces to 3.6 km² (min = 0.04 km², max = 566 km²). The mean record length is 6.7 years across all records and 9.7 years for high temporal resolution records only (excluding monthly data). These years of data coverage are predominantly concentrated from the year 1995 onwards (Fig. 1).
Of the total repository, 32 catchment entries contain additional time series measurements of precipitation depth at varying temporal resolutions for their respective location depending on the method employed. This precipitation file gives additional information on the rain gauge type and spatial coordinates. A total of 228 catchments have catchment boundary polygons added as additional information by the data provider (Fig. 3). Some monitored catchments, such as Kinderveld and Ganspoel 35, contain additional geospatial information on land use as well as erosion surveys. In these cases we include the data in the original format and structure in which it was made available by the data producers. A full overview of all catchment locations is given in Fig. 4.

Technical Validation
Technical validation of each original record is done in a decentralised manner by the data producer. The multi-source nature of EUSEDcollab means that measurements of Q and SSL were acquired with varying apparatus set-ups, temporal structures and post-processing methods (Tables 2-4). Acknowledging varying degrees of data heterogeneity requires end-users to make a judgement on the inter-comparability of catchment records for a particular use-case, based on differing measuring extents, sampling resolutions and uncertainty sources. As a data integration and harmonisation exercise, we aimed to facilitate this user-side assessment by providing necessary metadata properties, namely: (1) water discharge method descriptors, (2) sediment flux measurement and quantification methods, (3) quality control properties describing the frequency of monitoring station checks, (4) literature references, and (5) dataset contact information (Table 1).

Data evaluation: quality and completeness assessment.
To give a centralised assessment of the completeness and consistency of each submitted time series record, a ready-to-use evaluation was made of missing data inputs (Fig. 5). For example, missing inputs could be due to temporary technical issues, incomplete measurements or periodic discontinuation. Depending on the use-case, missing data may limit the applicability of a catchment dataset to a certain task and may therefore be useful for a user to know a priori.
The compiled time series entries in EUSEDcollab contain continuous measurements (e.g. with a daily or monthly timestep) in perennial streams or episodic measurements (e.g. time-aggregated or time-distributed events) in discontinuous streams. Based on these structural data characteristics, adapted evaluation routines were used to summarise data presence/absence through time (Fig. 5). Each time series entry is initially classified into one of five structures: (1) daily data series with a fixed timestep, (2) monthly data series with a fixed timestep, (3) event data with a fixed timestep within each event, (4) event data with a variable timestep within each event, or (5) event data that is temporally-aggregated per event. Thereafter, evaluations of each time series are made to give the total % completeness of the instances for both Q and SSL. For data containing fine-resolution measurements during episodic events, within-event evaluations are additionally generated to quantify the completeness of each individual event making up the entire time series (Fig. 5). A full description of each evaluation parameter is given in S. (1) for each classified time series structure.
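To make the idea of a % completeness evaluation concrete, the sketch below computes it for the simplest structure, a daily series with a fixed timestep. It is an illustrative assumption of how such a check could be written, not the evaluation routine used to produce the EUSEDcollab quality files; the record layout, function name and output keys are hypothetical.

```python
from datetime import date


def completeness_daily(records):
    """Percentage of days with a value for Q and for SSL.

    `records` is a list of (date, q, ssl) tuples; None marks a missing value.
    The expected number of days spans the first to the last date inclusive,
    so days absent from the record count as missing.
    """
    if not records:
        raise ValueError("empty record")
    days = [d for d, _, _ in records]
    expected = (max(days) - min(days)).days + 1
    q_present = {d for d, q, _ in records if q is not None}
    ssl_present = {d for d, _, s in records if s is not None}
    return {
        "expected_days": expected,
        "q_pct": 100.0 * len(q_present) / expected,
        "ssl_pct": 100.0 * len(ssl_present) / expected,
    }


# Example: four expected days, one day absent, one SSL value missing
example = [
    (date(2020, 1, 1), 1.0, 0.2),
    (date(2020, 1, 2), 1.1, None),
    (date(2020, 1, 4), 0.9, 0.1),
]
summary = completeness_daily(example)
```

Event-based structures would need a different denominator (for instance the expected timesteps within each event), which is why the evaluation is adapted per time series structure.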

Usage Notes
Data opportunities. EUSEDcollab is the first database of its kind in Europe, intended as a resource for a non-exhaustive range of applications relating to runoff, soil loss by water erosion and sediment delivery research at singular or multiple sites. These opportunities can include a range of research domains seeking to understand the system dynamics of catchment-scale runoff, erosion and sediment fluxes (Figs. 4, 6). These may include modelled and analytical developments in frequency-intensity relationships 26,27,57,58, spatial and temporal scale-effects 25,[59][60][61], or internal (e.g. topography, geology, soil characteristics), external (e.g. meteorological conditions) and anthropogenic (e.g. land use and land cover) drivers of sediment variability 62.
By uniting data from across a European scientific network, we aim to: (1) release an open-access data resource hosted on the European Soil Data Centre (ESDAC) with the goal of continued database growth in a standardised manner, (2) mitigate data loss from discontinued research projects, (3) build a repository upon which a broad range of analytical and modelling methods can be built to advance scientific knowledge, and (4) allow cross-domain intercomparisons to assess the generalisation of empirical relationships and model prediction systems.
Data limitations. Data users are advised to consider the applicability of each utilised dataset for their application. These considerations range from the spatial scale (drainage area) of the catchment in its context-dependent environmental setting, to the temporal detail and measurement-richness underlying the dataset. The data quality evaluation gives additional relevant information on the time series completeness in order for initial evaluations to be made (Fig. 5).
The EUSEDcollab.v1 repository has a significant spatial bias in its coverage due to a large number of data additions from small to medium sized catchments from a national monitoring campaign in Denmark 56. These data have evidenced usage in erosion modelling 36 but may not meet the requirements of certain high temporal resolution research applications due to infrequent underlying suspended sediment sampling. We envisage that continued catchment data inputs from national monitoring campaigns fitting the motivations of EUSEDcollab will improve the overall spatial coverage and reduce this spatial bias.
Data platform and continued community contributions. The EUSEDcollab repository is openly accessible via the European Soil Data Centre 63 (ESDAC) platform (https://esdac.jrc.ec.europa.eu/content/EUSEDcollab) and Figshare 64. All files are provided in .csv format in their relevant folders and are identifiable based on the assigned ID listed in the overview file (Catchment_ID_assignment.csv). In the case of database-wide applications, users are requested to cite this article as the reference for the entire repository. In cases of individual catchment applications, users should refer to the reference studies for each catchment provided in the metadata and summarised in Tables 2-4.
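Since files are identified through the assigned IDs in the overview file, a user might subset the repository along the lines of the sketch below. The column names ("Catchment ID", "Country") and the function name are hypothetical placeholders chosen for illustration; the actual header names in Catchment_ID_assignment.csv should be checked in the released file before use.

```python
import csv


def select_catchment_ids(overview_path, column, value, id_column="Catchment ID"):
    """Return the IDs of rows in the overview file where `column` equals `value`.

    `column` and `id_column` are assumed header names, not confirmed ones.
    """
    with open(overview_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return [row[id_column] for row in reader if row.get(column) == value]
```

The returned IDs can then be used to locate the matching time series and metadata files in their respective folders.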
EUSEDcollab.v1 is intended as the first version of a continued effort to gather and platform data through collaborative efforts from across the community. Future data collection efforts will seek to extend the size and scope of the repository through including a wider diversity of catchment types (e.g. pristine forests, badlands etc.) across a wider range of elevation settings.
Further contributions can be made to the database by downloading and completing the data and meta-data template files available in the ESDAC data portal (https://esdac.jrc.ec.europa.eu/content/EUSEDcollab). Data submissions can be included in future data releases by contacting the listed data manager through the contact details listed in the ESDAC data portal.

Fig. 5 An overview of the data quality control procedure to include an evaluation of missing data entries within each time series record. A modified evaluation is made according to the time series structure of each data record. The output of the quality control procedure provides an accompanying JSON file for each data entry within EUSEDcollab.

Fig. 6 Example syntheses of time series data from the Kinderveld catchment, BE (250 ha) and the Nučice catchment, CZ (53 ha) in the EUSEDcollab repository. Note that the data is not area-normalised and the data from the Kinderveld catchment (a) is presented in tonnes per aggregated event, while the Nučice catchment (b) is made available and presented in tonnes per day. Additionally, it is important to consider the following contextual factors: (i) The Nučice measurements include periods with baseflow carrying sediments, whereas in the Kinderveld, only runoff events are included. This difference in sediment sources (rill and interrill, bank erosion and gullying) between the two catchments, explained in the related literature (Tables 2, 3), may contribute to variations in the observed values. (ii) In Nučice, the low number of days in the data record for specific years (e.g., 2015, 2017, 2018, 2021) is due to exceptionally dry years when the discharge was zero or very low, limiting the availability of sediment data.
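Because loads in the repository are not area-normalised, comparing catchments of different size typically requires converting to a specific sediment yield. The following sketch sums loads per calendar year and divides by the drainage area; the function names and the input layout are assumptions for illustration, and the drainage area in hectares would be taken from the catchment metadata.

```python
from collections import defaultdict


def annual_loads(records):
    """Sum sediment loads (tonnes) per calendar year from (date, load_t) pairs.

    Missing loads (None) are skipped, so incomplete years are underestimated.
    """
    totals = defaultdict(float)
    for day, load_t in records:
        if load_t is not None:
            totals[day.year] += load_t
    return dict(totals)


def specific_sediment_yield(annual_load_t, area_ha):
    """Convert an annual load (t per year) to t per km2 per year (100 ha = 1 km2)."""
    if area_ha <= 0:
        raise ValueError("drainage area must be positive")
    return annual_load_t / (area_ha / 100.0)
```

For example, a hypothetical annual load of 25 t from a 250 ha catchment corresponds to 10 t km⁻² yr⁻¹. Years with few recorded days or events should be interpreted with the same caution noted for the raw loads.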

Fig. 1 A statistical overview of the EUSEDcollab database. Catchment records are categorised into 'Monthly' data, with quantifications of sediment yield per month, and 'Daily/event' data, including all other data time structures with daily timesteps or time-distributed and time-aggregated event data. The plotted overviews include: (a) the number of datasets belonging to each classified time-structure type, (b) the distribution of measurement record lengths within the database, (c) the number of datasets with coverage in each year, and (d) boxplot distributions of catchment drainage areas within the dataset for monthly and daily/event time series records.

Fig. 2 Histogram charts of the elevation (a) and mean annual precipitation in mm (b) of the monitoring stations included in EUSEDcollab. The distribution of the % cover of each land use type within the database is given for catchments with metadata inputs (c). Elevation is extracted from the SRTM global digital elevation layer and total annual average precipitation from Worldclim 103.

Fig. 3 Google Earth satellite image examples of monitored catchments in EUSEDcollab with included catchment boundary polygons: (a) Kinderveld, BE (including parcel boundary information), and (b) Nučice, CZ. The point markers represent the registered monitoring locations in EUSEDcollab.

Fig. 4 Top: A geographical overview of EUSEDcollab.v1 data entries per climate (EnZ) region in Europe 104 (a). Bottom: summary-level empirical relationships found within the database entries, showing (a) the relationship between catchment area (km²) and specific sediment yield (t km⁻² yr⁻¹), and (b) the relationship between mean annual discharge (m³ yr⁻¹) and the mean annual sediment yield (t yr⁻¹) for all high temporal resolution datasets (excluding monthly data). The error bars show the variation of the annual sediment yield values around the mean annual average.

Table 2. An overview of database entries with individual event measurements and their respective assigned IDs and classified temporal structure. The associated time series data contains either a variable or fixed sub-event timestep, or the data is aggregated per event. The 'Literature references' column gives the corresponding studies on the catchment undertaken before the data submission phase.

Table 3. An overview of database entries with a daily timestep and their respective assigned IDs. The 'Literature references' column gives the corresponding studies on the catchment undertaken before the data submission phase.

Table 4. An overview of database entries with monthly data or only daily discharge and sediment rating curve data. 'Q and rating curve data only' signifies that the dataset contains continuous water discharge records and matching Q-SSC pairs, but no extrapolation has been performed. The 'Literature references' column gives the corresponding studies on the catchment undertaken before the data submission phase.

Table column headers: Catchment ID, Catchment name, Country, Start date, End date, Drainage area (ha), Data type, Literature references.