Close co-variation between soil moisture and runoff emerging from multi-catchment data across Europe

Soil moisture is an important variable for land-climate and hydrological interactions. To investigate emergent large-scale, long-term interactions between soil moisture and other key hydro-climatic variables (precipitation, actual evapotranspiration, runoff, temperature), we analyze monthly values and anomalies of these variables in 1378 hydrological catchments across Europe over the period 1980–2010. The study distinguishes results for the main European climate regions, and tests how sensitive or robust they are to the use of three alternative observational and re-analysis datasets. Robustly across the European climates and datasets, monthly soil moisture anomalies correlate well with runoff anomalies, and extreme soil moisture and runoff values also largely co-occur. For precipitation, evapotranspiration, and temperature, anomaly correlation and extreme value co-occurrence with soil moisture are overall lower than for runoff. The runoff results indicate a possible new approach to assessing variability and change of large-scale soil moisture conditions by use of long-term time series of monitored catchment-integrating stream discharges.

arise and are changing (problem 9); (vi) how innovative technologies (e.g., satellite and re-analysis products) can be used to quantify hydrological states and fluxes at a range of spatial and temporal scales (problem 16). From a practical management perspective, it is also essential to know if soil moisture responses to precipitation changes feed mostly into ET (through vegetation) or runoff (through groundwater) changes. Dominant propagation to ET changes may imply criticality for vegetation and crop security 16 , while dominant propagation to runoff changes may imply criticality for freshwater 17 and energy 18 security, flood risk 8,19,20 , irrigation timing during droughts 5 , and tradeoffs between future water storage needs 21 and their hydro-climatic feedbacks 22,23 .
Advancing our quantitative understanding of soil moisture interaction strengths with different hydrological fluxes, towards answering the above open questions, requires comparison and benchmarking of recent/current relationships between soil moisture and flux anomalies over whole catchments (Fig. 1b) and not just their local or upscaled surface conditions (Fig. 1a). Furthermore, the recent hydrological community synthesis of such key questions 15 has emphasized data-driven multi-catchment comparisons 5,23,24 as a main method for building up hydrological knowledge that goes beyond and can bridge a fragmented understanding of local inter-compartment fluxes.
This study uses such a multi-catchment comparative approach to quantify possible dominant interactions emerging at large regional scales from available long-term catchment data. A main hypothesis addressed and tested in the study is that dominant soil moisture co-variation with runoff may emerge as a clear large-scale statistical signal 5,10,19 , above local differences in hydro-climate and human activities prevailing in catchments of different scales and in different parts of the world. The motivation for this hypothesis is twofold. First, strong soil moisture correlation with runoff has been indicated by recent results of multi-catchment data analyses for droughts 5 and floods 19 , and multi-catchment model-data comparisons showing good agreement of modelled catchment-average soil moisture based on runoff data with remotely sensed large-scale soil moisture 10 . Second, if this hypothesis were supported by more multi-catchment data in different world regions, it would open a new avenue of using long-term time series of monitored catchment-integrating stream discharges to assess related variations of large-scale soil moisture conditions 25 , complementing other estimates and interpretations of the latter from upscaled locally measured data and remotely sensed data. A main aim of this study is to further investigate this complementary possibility by evaluating and comparing data-given statistical co-variation patterns between catchment-average soil moisture, runoff and other hydro-climatic variables in and across multiple study catchments.
Figure 1. Schematic conceptualization of (a) the near-surface soil-plant water system feeding actual evapotranspiration (ET), and (b) the full surface and subsurface hydrological catchment system feeding both ET and runoff/stream discharge. The conceptualization perspective (a) includes the local near-surface hydrological pathways through soil moisture to ET, as commonly considered and, by certain assumptions, scaled up in large-scale land surface schemes of Earth System Models (ESMs) and surface-focused hydrological modelling. The conceptualization perspective (b) includes both the near-surface interaction pathways between catchment-average precipitation, soil moisture and ET, and the mainly subsurface soil-ground-stream water pathways to catchment-average runoff that feeds the total stream discharge from each catchment of any spatial scale. Panel (b) represents the conceptualization and interactions considered and quantified in this study; it is redrawn from Supplementary Figure 8
To achieve this aim, we compile and use three alternative datasets from global observation and re-analysis products covering at least the 30-year period 1980-2010 (Table 1; further data descriptions and references in Methods) for 1378 catchments across Europe (Fig. 2). The study focus on this time period and Europe is due to the required good long-term multi-variable data availability for these catchments across this continent over this study period. Europe also spans steep climate gradients, such that we can study the targeted possible emergent large-scale patterns of soil moisture co-variation with other hydro-climatic variables in and across the three main European climate regions distinguished and proposed by the Intergovernmental Panel on Climate Change (IPCC) as appropriate for managing risks of extreme events and advancing climate change adaptation 2 : Northern Europe (NEU), Central Europe (CEU), and Southern Europe (SEU).
Recent large-scale multi-catchment studies of current 5 and past 26 hydro-climate across Europe have also shown distinctly different large-scale characteristics of these three climate zones with regard to soil moisture anomaly responses to atmospheric driver variations. To also assess data uncertainty effects, the analysis combines the three alternative data products into a fully independent set for a basic study, as well as an internally consistent and an intermediate dataset for comparative study (Table 1).
The long-term average monthly values of each variable $X$ are normalized as:

$$\mathrm{Nor}\bar{X}_m = \frac{\bar{X}_m - \bar{X}}{\sigma_X} \qquad (1)$$

where $\bar{X}$ and $\sigma_X$ are the month-independent long-term average value and the standard deviation, respectively, of each variable $X$ over all months in the study period 1980-2010. Equation (1) thus normalizes the long-term average value ($\bar{X}_m$) of each variable for each month of the average year to multi-variable comparable values of $\mathrm{Nor}\bar{X}_m$. Figure 3 shows the normalized long-term average monthly variations of each variable $X$ (solid lines) calculated by applying Eq. (1) on the aggregated time series of $X$ over each region. The shadows in Fig. 3 show the inter-quartile range (IQR) of $X$ values for each month of the year over the whole time period of analysis (30 values for each month from 1980 to 2010), as a measure of $X$ variability around its average value for each month of the year. By plotting $\mathrm{Nor}\bar{X}_m$ for the different variables $X$ in Fig. 3, we can see the average monthly variation of each $X$ over the average year, along with the corresponding monthly IQR of each $X$ for each month among the different years in the study period. This allows direct comparison and identification of possible time lags in the variations of the different monthly averages, and their fluctuations around those averages among different years. Comparison of the resulting $\mathrm{Nor}\bar{X}_m$ for the different variables (Fig. 3) shows that the soil moisture variations (yellow) are most consistent with the runoff variations (light blue) across all three European climate regions.
In contrast, the ET variations (green) are most consistent with the temperature variations (red), and the variations of both these variables differ considerably from the soil moisture and runoff variations. The monthly precipitation variations (dark blue) are overall smaller than and differ from those of the other hydro-climatic variables while its IQR is wider than for all other variables. Temperature and ET have overall the smallest IQR in all three climate regions, but the variation range of ET is higher in the drier SEU region than in the wetter CEU and NEU regions. The IQRs of soil moisture and runoff are intermediate and are smallest during the driest months (Aug and Sep) of the relatively dry SEU region.
Figure 3. Shadows show interquartile ranges around the normalized long-term average monthly values of each variable.
The peaks in average monthly ET and temperature, which both occur in summer, exhibit a time lag (with ET peaking and starting to decline before temperature) that is greater in SEU than in NEU and CEU (Fig. 3). This lag depends on the corresponding average variations in monthly soil moisture and precipitation, which together provide the water availability for ET in each month. Across all climate regions, soil moisture declines considerably from around May and remains relatively low during summer. In SEU, precipitation also declines and remains relatively low during summer, while it remains relatively high throughout summer in NEU and CEU. A relatively high water supply from precipitation is thus available for ET (even though soil moisture declines) through the summer months in NEU and CEU. Nevertheless, ET in NEU and CEU declines while precipitation is still increasing (in NEU) or remains relatively high (in CEU) after the decrease in temperature during Jul, Aug, and Sep. This ET decline, while precipitation remains high, reflects that average monthly ET is here energy limited; as such, ET in NEU and CEU can only efficiently use the relatively high water availability in the warm months from start to end of summer, but not thereafter, when temperature and thus energy supply decline while water availability is still relatively high. In SEU, however, both soil moisture and precipitation are relatively low during summer, reflecting water limitation of ET in this region; ET therefore starts to decline here already around June, as the water provided by precipitation then becomes insufficient to sustain a high ET level, even though the temperature and associated energy supply remain high until later in the average year.
These results for $\mathrm{Nor}\bar{X}_m$ are robust for the fully independent, intermediate, and internally consistent datasets (left, middle, and right, respectively, Fig. 3).
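As an illustration of the normalization described for Eq. (1), the following minimal Python sketch computes normalized long-term average monthly values for one variable in one region. It is not the authors' code: the function name, the input layout (a list of year, month, value tuples), the toy series, and the use of the population standard deviation are all assumptions made for this example.

```python
from statistics import fmean, pstdev

def normalized_monthly_climatology(series):
    """series: list of (year, month, value) tuples for one variable X.

    Returns {month: (mean of X in that month - overall mean) / overall std},
    following the definitions given in the text for Eq. (1).
    """
    values = [v for _, _, v in series]
    overall_mean = fmean(values)   # month-independent long-term average
    overall_std = pstdev(values)   # assumption: population standard deviation
    by_month = {}
    for _, month, v in series:
        by_month.setdefault(month, []).append(v)
    return {m: (fmean(vs) - overall_mean) / overall_std
            for m, vs in sorted(by_month.items())}

# Hypothetical toy series: two years with a simple summer peak.
toy = [(y, m, 10.0 + (5.0 if m in (6, 7, 8) else 0.0) + 0.1 * (y - 1980))
       for y in (1980, 1981) for m in range(1, 13)]
nor = normalized_monthly_climatology(toy)
```

Because every variable is scaled by its own long-term mean and standard deviation, the resulting monthly curves are dimensionless and can be plotted on a common axis, which is what allows the seasonal cycles of different variables to be compared directly.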
The whole time series of variable $X$ values in each month $m$ of each year $y$ over the whole study period 1980-2010 is further compared for all data products in terms of absolute monthly values ($X_{m,y}$; Supporting Figures S1-S4) and corresponding normalized anomalies ($\mathrm{Nor}X_{m,y}$; Supporting Figures S5-S8) calculated as:

$$\mathrm{Nor}X_{m,y} = \frac{X_{m,y} - \bar{X}_m}{\sigma_{X_m}} \qquad (2)$$

where $\bar{X}_m$ and $\sigma_{X_m}$ are the long-term average value and standard deviation of monthly $X$, respectively, for each month $m$ over all years in the study period 1980-2010. The comparisons of monthly absolute and anomaly values of $X$ show a high level of agreement between the variables, regardless of which data product they belong to, with an exception for the absolute $X_{m,y}$ values of soil moisture; for this variable, the ERA-Interim/Land reanalysis product (used in the fully independent and internally consistent datasets) has overall higher mean values and smaller variations than the GLEAM-3.2a product (used in the intermediate dataset) (Figure S3). The differences in absolute soil moisture are greatest for NEU, smaller for CEU, and smallest for SEU, while corresponding differences all disappear when comparing the normalized anomalies $\mathrm{Nor}X_{m,y}$ for soil moisture (Figure S7).
Figure 4 shows scatter plots and regression lines for the normalized anomalies $\mathrm{Nor}X_{m,y}$ of precipitation (left panels), ET (middle panels) and runoff (right panels) versus those of soil moisture based on the fully independent dataset. Supporting Figure S9 shows corresponding results from this dataset also for temperature versus soil moisture anomalies (exhibiting no correlation), while Supporting Figures S10-S11 show $\mathrm{Nor}X_{m,y}$ results for all variables based on the intermediate and internally consistent datasets, respectively. The normalized soil moisture and runoff anomalies exhibit by far the greatest correlations across all three climate regions and based on all three datasets (Fig. 4 and Supporting Figures S10-S11).
Furthermore, when comparing the points for the driest and wettest soil moisture conditions within the dataset, the same data points also largely represent the driest and wettest runoff conditions, respectively (right panels, Fig. 4). This indicates largely concurrent occurrences of extreme soil moisture and runoff conditions. In contrast, precipitation (left panels) and ET (middle panels) vary over a large range, from negative to positive values, during both the driest (less than −1.5 normalized monthly values) and the wettest (greater than +1.5 normalized monthly values) data points for soil moisture. This indicates largely uncorrelated extreme occurrences between soil moisture and precipitation or ET. These correlation and extreme co-occurrence results are robust also when using the intermediate and internally consistent datasets (Supporting Figures S10-S11).
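The co-occurrence check described above is made visually from the scatter plots. One possible way to quantify it is sketched below in Python; this is not a metric defined in the study. The ±1.5 thresholds come from the text, while the function name, the summary statistics (range and same-sign share), and the toy anomaly values are assumptions for illustration.

```python
def extreme_cooccurrence(sm_anom, other_anom, threshold=1.5):
    """Summarize another variable's anomalies during soil moisture extremes.

    sm_anom, other_anom: equally long lists of normalized monthly anomalies.
    Hypothetical metric: for the driest (< -threshold) and wettest
    (> +threshold) soil moisture months, report the range of the other
    variable and the share of months in which it has the same sign.
    """
    def summarize(vals, sign):
        if not vals:
            return None
        same = sum(1 for v in vals if v * sign > 0)
        return {"n": len(vals), "min": min(vals), "max": max(vals),
                "same_sign_share": same / len(vals)}

    dry = [o for s, o in zip(sm_anom, other_anom) if s < -threshold]
    wet = [o for s, o in zip(sm_anom, other_anom) if s > threshold]
    return {"dry": summarize(dry, -1), "wet": summarize(wet, +1)}

# Hypothetical toy anomalies (not data from the study).
soil_moisture = [-2.0, -1.6, 0.1, 1.7, 2.2, -0.3]
runoff = [-1.8, -1.2, 0.0, 1.5, 2.0, 0.2]
precipitation = [0.9, -1.1, 0.3, -0.4, 1.0, 0.0]

runoff_summary = extreme_cooccurrence(soil_moisture, runoff)
precip_summary = extreme_cooccurrence(soil_moisture, precipitation)
```

In this toy case runoff has the same sign as soil moisture in all extreme months, while precipitation spans negative to positive values, mirroring the qualitative contrast described for Fig. 4.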
Note that Fig. 3 shows the intra-annual variation of long-term average monthly values ($\mathrm{Nor}\bar{X}_m$, Eq. (1)), while Fig. 4 shows the $X$ anomalies for each month of each year from these average conditions in each month of the average year ($\mathrm{Nor}X_{m,y}$, Eq. (2)). As such, the correlation patterns are not and should not be expected to be the same for these different quantities, $\mathrm{Nor}\bar{X}_m$ and $\mathrm{Nor}X_{m,y}$. For example, the average seasonal variability within each year quantified by $\mathrm{Nor}\bar{X}_m$ shows negative correlation between ET and soil moisture because ET is on average largest in the warm season, when conditions are also driest and soil moisture is on average the smallest (Fig. 3). In contrast, the monthly anomalies from this average seasonal variation exhibit positive correlation between ET and soil moisture because the individual values of these variables in a given month of the time series are both similarly affected if this month is anomalously wet or dry relative to the long-term average conditions for that specific month of the average year (Fig. 4).
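The distinction between the two quantities can be illustrated with a small synthetic example. The Python sketch below is an assumption-laden illustration, not the study's analysis: it builds hypothetical soil moisture and ET series with opposite seasonal cycles but a shared wet/dry anomaly, then computes the correlation of the monthly averages and of the monthly normalized anomalies as described for Eq. (2). All names, the toy data, and the use of the population standard deviation are choices made for this example.

```python
import math
from statistics import fmean, pstdev

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def monthly_means(series):
    """series: {(year, month): value} -> list of means for months 1..12."""
    return [fmean([v for (_, m), v in series.items() if m == month])
            for month in range(1, 13)]

def normalized_anomalies(series):
    """Anomaly of each value from its calendar-month mean, scaled by that month's std."""
    out = {}
    for month in range(1, 13):
        vals = [v for (_, m), v in series.items() if m == month]
        mean_m, std_m = fmean(vals), pstdev(vals)  # assumption: population std
        for (y, m), v in series.items():
            if m == month:
                out[(y, m)] = (v - mean_m) / std_m
    return out

# Hypothetical toy data: opposite seasonal cycles, shared wet/dry signal.
keys = [(y, m) for y in range(1980, 1990) for m in range(1, 13)]
wetness = {k: math.sin(1.7 * k[0] + 0.9 * k[1]) for k in keys}
summer = {k: 1.0 if k[1] in (6, 7, 8) else 0.0 for k in keys}
soil_moisture = {k: 30.0 - 10.0 * summer[k] + wetness[k] for k in keys}
et = {k: 20.0 + 60.0 * summer[k] + 0.5 * wetness[k] for k in keys}

r_seasonal = pearson_r(monthly_means(soil_moisture), monthly_means(et))
sm_anom = normalized_anomalies(soil_moisture)
et_anom = normalized_anomalies(et)
r_anomaly = pearson_r([sm_anom[k] for k in keys], [et_anom[k] for k in keys])
```

With these synthetic series the seasonal correlation is negative while the anomaly correlation is positive, which is the same sign reversal the text explains for ET and soil moisture.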
(2020) 10:4817 | https://doi.org/10.1038/s41598-020-61621-y | www.nature.com/scientificreports
For further distinction of seasonal co-variations, Fig. 5 summarizes all monthly coefficients of determination ($r^2$) between the normalized monthly anomalies $\mathrm{Nor}X_{m,y}$ of soil moisture and those of the other hydro-climatic variables for the NEU, CEU, and SEU regions and for all of the fully independent, intermediate, and internally consistent datasets. Supporting Figures S12-S15 (fully independent dataset), S16-S19 (intermediate dataset), and S20-S23 (internally consistent dataset) further show the underlying monthly scatter plots and regression lines for the anomalies $\mathrm{Nor}X_{m,y}$ of precipitation (Figures S12, S16, and S20), ET (Figures S13, S17, and S21), runoff (Figures S14, S18, and S22) and temperature (Figures S15, S19, and S23) versus those of soil moisture. For runoff and soil moisture, the monthly anomaly correlations are overall relatively high (Fig. 5) and their extreme (driest/wettest) monthly values tend to largely co-occur (Supporting Figures S14, S18, and S22); as for their total correlation (Fig. 4), these monthly results are robust across the months and climate regions and for all three datasets. For ET and precipitation versus soil moisture, some regional monthly correlations approach those for runoff, including also relatively consistent co-occurrences of dry/wet events; such region- and month-specific results are more evident in the intermediate than in the fully independent or internally consistent datasets. For ET, there is relatively high correlation with soil moisture in the SEU region for the summer to early autumn months of July-October (Fig. 5, Supporting Figures S13, S17, and S21). For precipitation, some relatively high correlations appear for some winter/spring months during January-April in the NEU and CEU regions (Fig. 5, Supporting Figures S12, S16, and S20).
For temperature versus soil moisture, monthly correlations are generally low (Fig. 5), or inconsistently varying between negative and positive among different months (Supporting Figures S15, S19, and S23).
The results shown in Figs. 4-5 (and corresponding Supporting Figures in the Supplementary Information) are derived without consideration of possible time lags in the co-variation of the studied variables. To resolve the influence of such time lags, we have also considered temporal shifts in additional analysis of the variable time series (Fig. 6, Supporting Figures S24-S25). Specifically, variable correlations are calculated and compared with the corresponding no-shift results for a backward 1-month shift and a forward 1-month shift of precipitation, ET, runoff, or temperature versus soil moisture. Although some of the studied temporal lag scenarios may not reflect reasonable physical relationships (e.g., the forward temporal shift of precipitation or temperature in relation to soil moisture), the analysis is systematic and does not presume to know a priori what different temporally shifted co-variation patterns may emerge at large scales. As such, and for illustration consistency and clarity, we quantify and present results for all time lag scenarios. Figure 6 summarizes these results in terms of their respective $r^2$ values, showing that a backward 1-month shift of precipitation versus soil moisture yields a higher $r^2$ value between these variables than the corresponding no-shift results for all cases except for the NEU and CEU regions based on the fully independent dataset (for the intermediate and internally consistent datasets, see Supporting Figure S24). A marginal increase of $r^2$ values relative to the no-shift results can also be seen for a forward 1-month shift of ET versus soil moisture for the CEU region (for all datasets) as well as for a backward 1-month shift of temperature versus soil moisture in most cases. However, the no-shift scenario consistently yields the highest $r^2$ value for the correlation between runoff and soil moisture.
Importantly, the correlation between runoff and soil moisture also remains the overall highest one among all time lag scenarios (backward, forward, and no shift), across all regions and for all datasets. The obtained co-variation patterns across time lag scenarios are thus consistent with the basic no-shift results in this respect, which strengthens the main finding of a dominant co-variation pattern between soil moisture and runoff (Fig. 6 and Supporting Figure S24; see also Supporting Figure S25, showing the highest monthly $r^2$ values from the results of all considered time shifts).
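The time-shift comparison can be sketched as follows in Python. The text does not spell out the exact alignment used for a "backward" or "forward" shift, so the convention here is an assumption: a shift of -1 pairs the other variable in month t-1 with soil moisture in month t, and +1 pairs month t+1 with soil moisture in month t. The function names and the toy series are likewise hypothetical.

```python
from statistics import fmean

def r_squared(xs, ys):
    """Coefficient of determination of a simple linear regression (squared Pearson r)."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

def lagged_r2(sm, other, shift):
    """r^2 between soil moisture and another variable shifted by `shift` months.

    Assumed convention: shift=-1 pairs other[t-1] with sm[t] (backward shift),
    shift=+1 pairs other[t+1] with sm[t] (forward shift), shift=0 is no shift.
    """
    if shift < 0:
        a, b = sm[-shift:], other[:shift]
    elif shift > 0:
        a, b = sm[:-shift], other[shift:]
    else:
        a, b = sm, other
    return r_squared(a, b)

# Hypothetical toy anomalies: soil moisture follows precipitation by one month.
precip = [0.3, -1.2, 0.8, 1.5, -0.7, 0.1, -0.4, 1.1]
soil_moisture = [0.0] + precip[:-1]

r2_by_shift = {s: lagged_r2(soil_moisture, precip, s) for s in (-1, 0, 1)}
```

Comparing the three values in `r2_by_shift` for each variable pair corresponds to the scenario comparison summarized in Fig. 6; in this toy case the backward shift gives the highest value by construction.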

Discussion
The finding that soil moisture co-varies considerably more closely, and has higher anomaly correlation, with runoff than with precipitation, evapotranspiration, or temperature may be considered surprising since, e.g., the land surface schemes of ESMs as well as much hydrological modelling have focused primarily on soil moisture interactions with precipitation and ET 7 (Fig. 1a). However, indications of close relationships between variations 10
Clearly, precipitation, ET, runoff and water storage changes (including soil moisture changes as well as changes in groundwater and surface water levels) are all mechanistically linked through fundamental water balance in the landscape. The present data-given co-variation and correlation results identify some dominant, first-order relationships emerging on large (multi)catchment/regional scales from all the complex local process links and interactions taking place at underlying smaller scales. This emergence indicates a key role for soil moisture in regulating large-scale runoff variations, consistently across European climate regions and datasets. Mechanistically, this role is explainable by soil moisture changes implying corresponding changes in groundwater table depth 8 and thereby also in the hydraulic gradients that govern groundwater flows towards and into the nearest streams (Fig. 1b); these hydraulic-hydrological links apply and emerge as dominant across all studied hydro-climatic conditions. In contrast, ET variations are regulated by temperature variations as long as there is sufficient water supply from precipitation and/or soil moisture to efficiently use the associated energy supply in vegetation transpiration and evaporation. In SEU, where both soil moisture and precipitation are on average small and declining in summer (Fig. 3), water limitation leads to declining ET and thereby to relatively high region- and summer-specific correlation between ET and soil moisture (Fig. 5).
The emergent large-scale relationships found in this multi-catchment study may not be evident in each individual catchment, as various underlying physical processes affect local hydrological behavior in each catchment (e.g. various geological and climatological conditions, anthropogenic influences, different hydrological regime types, and other specific catchment features). However, a clear statistical large-scale signal of dominant correlation between soil moisture and runoff emerges consistently across different datasets from different data sources, for numerous catchments with different local features (e.g., stronger or weaker anthropogenic influences), for annual, seasonal and monthly variation conditions within each climate region, and even with or without consideration of time lags between variables. This consistency indicates that the main study conclusions are robust and reliable for the European continent and its main climate regions.

Conclusion
The results of this study show that monthly soil moisture anomalies correlate well with runoff anomalies across multiple study catchments of various scales in the different climate regions of Europe. Average monthly soil moisture and runoff further co-vary, while extreme soil moisture and runoff values also co-occur across Europe. For the other studied hydro-climatic variables (precipitation, ET, temperature), the degrees of co-variation, anomaly correlation and extreme value co-occurrence with soil moisture are predominantly low in the European climates. A summer correlation exception appears in SEU, between the region-specific water-limited and declining ET and the generally declining soil moisture during summer.
As outlined in the Introduction, the results and findings of this study can contribute to efforts for eventually answering a number of community-formulated key open questions in hydrology 15 . Overall, this data-driven multi-catchment and multi-climate study shows robust and mechanistically explainable emergence of dominant, first-order co-variation between soil moisture and runoff on large scales, across the European climates and used datasets. These robust results should be further considered in large-scale soil moisture modeling, for example in the land surface schemes of ESMs and hydrological soil moisture models. The runoff results indicate a possible new, complementary approach to assessing variability and change of large-scale soil moisture conditions by use of long-term time series of monitored catchment-integrating stream discharges. They benchmark a data-given large-scale co-variation pattern that can be used for model testing, but also call for further research on large-scale pattern emergence in other parts of the world. Moreover, the soil moisture relationships found here should also be investigated at finer scales, with explicit consideration of various local factors, such as different catchment features, hydrological regimes, anthropogenic influences, and their influence on the large-scale co-variation patterns.

Methods
Table 1 lists the data products, and their temporal extents and spatial resolutions, used to analyze the large-scale variation patterns of soil moisture and other main hydro-climatic variables (precipitation, actual ET, runoff and temperature) in the 1378 study catchments across Europe during 1980-2010. Both station-based observation data (the GSIM, GHCN_CAMS and GPCC-V7 products in Table 1) and model-based re-analysis data (the ERA-Interim and GLEAM-3.2a products in Table 1) are used for the different study variables.
There is no doubt that the best way to study co-variation patterns would be to use a purely observation-based dataset. However, no such database is available for soil moisture and ET with homogeneous and consistent temporal and spatial resolution, so alternative datasets for these two variables are needed in this study. For this purpose, we use model-based re-analysis data, which many researchers regard as the available datasets closest to direct observations 27 . The basic dataset for this study, called the fully independent dataset, includes all available observation-based products for precipitation, temperature, and runoff from the GPCC-V7 28,29 , GHCN_CAMS 30 , and GSIM 31,32 databases, while obtaining soil moisture and ET from two different re-analysis products, ERA-Interim/Land re-analysis 33,34 and GLEAM-3.2a 35,36 , respectively. Conversely, for the variables besides runoff, the internally consistent dataset retrieves all variables from the same re-analysis product (ERA-Interim re-analysis data 27 and the Land re-analysis data 33,34 ), while the intermediate dataset takes soil moisture (which is a key variable in this study) from a different product (GLEAM-3.2a) than the other datasets (ERA-Interim/Land re-analysis data), keeping the sources of the other variables the same as in the fully independent dataset.
It is worth mentioning that if all variable data are from the same product, there is a risk of discovering relationships built into that re-analysis model. However, by using an independent dataset, there is another risk of discovering unrealistic or disturbed relationships due to possible inconsistencies between the modeling and data assimilation methods used in different re-analysis datasets. In other words, there is always some risk of arriving at misleading correlations based on either an internally consistent or a fully independent dataset, and only comparison with corresponding results from a fully observation-based dataset for all variables (which is not currently available for soil moisture and ET) can reveal which type of model-based dataset provides the most realistic results. In the absence of a fully observation-based dataset, we have repeated the analysis for different datasets in order to reveal and evaluate how results depend on the choice of different types of (independent, dependent or intermediate) model-based reanalysis data. The dependence or independence of different datasets or model outputs, however, is not a main or critical investigation issue in this study. We here use and compare results from the multiple datasets to see if the findings of large-scale soil moisture correlations are highly sensitive to (and thereby uncertain), or remain robust irrespective of, which specific dataset is used for obtaining these large-scale results.
Among the data for the different study variables (Table 1), runoff data availability is the limiting factor, even though the GSIM dataset has greatly expanded the previously available stream discharge data. While data for the other variables are consistently available as global gridded time series with few missing values, the runoff data in GSIM (as in other runoff datasets) represent water flow through the whole catchment of each hydrometric station, with its specific data availability and measurement time series. As such, runoff data (discharge divided by contributing catchment area) are available for 5235 European catchments in the GSIM dataset. However, only 1378 of these catchments have at least 300 non-missing monthly runoff values (corresponding to 25 years) within the study period 1980-2010. This study is therefore based on the relatively complete runoff time series for these 1378 catchments across the three European climate regions of NEU, CEU and SEU (Fig. 2). Many of these catchments may be influenced by human activities and other local conditions affecting their hydrological regimes 23 . However, this study does not aim to study pristine catchments. It aims to test whether a possible large-scale dominance of soil moisture correlation with runoff, or alternatively with ET, or any other of the studied hydro-climatic variables, may emerge as a clear statistical large-scale signal above the noise of all local peculiarities in each of the multiple study catchments.
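The catchment selection rule and the runoff definition stated above can be sketched in Python as follows. This is a hedged illustration rather than the study's processing code: the function names, the use of `None` for missing months, the units (discharge in m³/s, area in km², runoff in mm per month), and the toy catchments are all assumptions.

```python
def runoff_mm_per_month(discharge_m3s, area_km2, days_in_month):
    """Runoff depth as discharge divided by contributing catchment area.

    Assumed units: mean monthly discharge in m^3/s, area in km^2,
    result in mm per month.
    """
    volume_m3 = discharge_m3s * 86400.0 * days_in_month
    return volume_m3 / (area_km2 * 1.0e6) * 1000.0

def select_catchments(runoff_by_catchment, min_valid=300):
    """Keep catchments with at least `min_valid` non-missing monthly values.

    runoff_by_catchment: {catchment_id: list of monthly values, None if missing}.
    """
    return sorted(cid for cid, values in runoff_by_catchment.items()
                  if sum(v is not None for v in values) >= min_valid)

# Hypothetical toy records covering 372 months (1980-2010).
n_months = 372
records = {
    "A": [1.0] * n_months,                       # complete record
    "B": [1.0] * 299 + [None] * (n_months - 299),  # one month short
    "C": [1.0] * 300 + [None] * (n_months - 300),  # exactly at the threshold
}
selected = select_catchments(records)
example_runoff = runoff_mm_per_month(10.0, 1000.0, 30)
```

The threshold counts valid months anywhere in the study period, so a catchment may qualify even if its gaps are scattered rather than contiguous; whether the study imposed any additional continuity requirement is not stated in the text.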
To get relevant regional statistics for identifying such a possible large-scale signal, all data used in this study are spatially aggregated to catchment scales and then to climate regions. Given the catchment association of runoff data, the datasets for the other hydro-climatic variables also need to be catchment-wise aggregated for consistency and compatibility. Depending on catchment area and grid sizes (spatial resolution) of each dataset, each catchment may include one or more data grid cell(s). The spatial aggregation of the gridded data for each catchment is based on area-weighted catchment averaging over the grid cells covered by the catchment, providing a catchment-specific data time series for each variable. For regional aggregation, the catchment-specific data time series are used for further area-weighted averaging over the study catchments included in each climate region, providing a region-specific data time series for each variable. As for the regional categorization into the three main European climate regions NEU, CEU, and SEU, the study follows the IPCC-proposed delineation of climate regions appropriate for risk management of regional extremes and advancement of change adaptation in this continent 2 . These distinct European climate regions also represent a continental-scale climate gradient from the wet and energy-limited conditions in NEU, through the intermediate conditions in CEU, to the relatively dry and water-limited conditions in SEU.
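The two-step aggregation described above (grid cells to catchment, catchments to region) can be sketched for a single month in Python. The function name and the toy numbers are hypothetical, and the sketch assumes that the overlap area between each grid cell and the catchment is already known; how those overlaps were computed in the study is not described here.

```python
def area_weighted_mean(values, areas):
    """Area-weighted average of `values` using the matching `areas` as weights."""
    total_area = sum(areas)
    return sum(v * a for v, a in zip(values, areas)) / total_area

# Step 1 (hypothetical numbers): one catchment overlapping two grid cells.
cell_values = [10.0, 20.0]      # gridded variable in each cell for one month
cell_overlap_km2 = [3.0, 1.0]   # part of each cell lying inside the catchment
catchment_value = area_weighted_mean(cell_values, cell_overlap_km2)

# Step 2: regional value from catchment values weighted by catchment area.
catchment_values = [catchment_value, 30.0]
catchment_areas_km2 = [4.0, 12.0]
region_value = area_weighted_mean(catchment_values, catchment_areas_km2)
```

Applying the same two steps to every month yields the catchment-specific and region-specific time series on which the normalization and correlation analyses are based.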

Data availability
The ECMWF re-analysis for ET and soil moisture from ERA-Interim/Land reanalysis can be found at https://apps.ecmwf.int/datasets/data/interim-land/type=fc/, and the ERA-Interim reanalysis for precipitation and temperature at https://apps.ecmwf.int/datasets/data/interim-full-moda/levtype=sfc/. Soil moisture and ET data from the GLEAM-3.2a model are available at https://www.gleam.eu/. GPCC-V7 precipitation data are