Background & Summary

Freshwater ecosystems are hotspots of biodiversity that provide vital resources to humanity. These systems are however increasingly threatened by a multitude of anthropogenic pressures, including land cover change, pollution and hydraulic engineering schemes1,2,3. On top of these, climate change will pose a progressively larger threat in the future2,4. Ongoing and expected increases in air temperature and changing precipitation patterns alter water temperatures and flow regimes, with water scarcity issues and ecological impacts becoming apparent already at relatively moderate global mean warming levels of 1.5 to 2 °C4,5,6,7.

Developing effective strategies to alleviate the pressures on freshwater ecosystems worldwide requires globally consistent datasets that can be used to diagnose the threats1,8. During recent years, there have been various efforts to create compilations of relevant characteristics of freshwater systems and their surrounding watersheds8,9,10. These efforts have significantly advanced the availability of consistent high-resolution data on for example river network topology, watershed boundaries, streamflow and a variety of catchment characteristics including current climate, geology and soil, and landcover. However, a consistent global dataset that can be used to assess threats to freshwater systems imposed by climate change is yet lacking. Existing global projections of climate-related changes in hydrology typically focus on water quantity only4,5,11, without considering water temperature. However, water temperature plays a crucial role in many physical, chemical, and biological processes, including the solubility of oxygen and the performance of aquatic plants and animals7,12,13. Thus, consistent data on potential streamflow and water temperature would be an important asset for a plethora of socio-economic as well as ecological analyses, including assessments of potential changes in freshwater biodiversity7,14 and the availability of global water resources, in terms of both quantity and quality15,16,17,18.

Recent developments in models and computational efficiency have enabled us to simulate both streamflow and water temperature consistently, across the globe, and under a range of potential future climate conditions at high spatial and temporal resolution. Our FutureStreams dataset contains weekly streamflow and water temperature at a 5 arcminute resolution (approximately 10 km at the equator) and global extent for multiple climate scenarios up to the year 2099 (see Table 1). For comparison, we also provide streamflow and water temperature data for the recent past (1976–2005). We furthermore include a set of derived streamflow and water temperature metrics that are expected to be particularly relevant from an ecological perspective (see Tables 2 and 3), designed based on indicators of hydrologic alteration19 and bioclimatic variables computed in the widely used WorldClim dataset20 as well as the CMCC-BioClimInd dataset21. Datasets of derived bioclimatic indicators have been proven essential for ecological applications in the terrestrial realm, notably for projecting potential climate change impacts on biodiversity22,23, but an equivalent for freshwater environments was lacking.

Table 1 Overview of output variables discharge (streamflow) and water temperature, available for each scenario and GCM (see Fig. 1, GCMs used are gfdl, hadgem, ipsl, miroc and noresm).
Table 2 Ecologically relevant derived variables (bioclimatic indicators) for streamflow Q.
Table 3 Ecologically relevant derived variables for water temperature (WT).

We produced the dataset using the state-of-the-art global hydrological model PCR-GLOBWB, validated by Sutanudjaja et al.24, coupled to the Dynamical Water temperature model DynWat, validated by Wanders et al.13. These are the only models currently capable of computing streamflow and water temperature globally at a native resolution of 5 arcminute (approximately 10 km at the equator). We forced the models with meteorological time series from five general circulation models (GCMs) selected by ISI-MIP25 (the Inter-Sectoral Impact Model Intercomparison Project) for the historical period as well as four climate scenarios (RCPs, Representative Concentration Pathways), thus covering a range of future climate scenarios. A historical simulation forced with reanalysis data is available as well. Streamflow and water temperature are available for the historical period as well as each RCP and each GCM in netCDF format26.

Methods

Workflow

We created the FutureStreams dataset using a combination of General Circulation Model (GCM) output, reanalysis data and state-of-the-art hydrological and water temperature models (Fig. 1). We obtained the GCM output from the Inter-Sectoral Impact Model Intercomparison Project (ISI-MIP) ensemble. The ISI-MIP ensemble consists of output from five CMIP5 GCMs, for four Representative Concentration Pathways (RCPs, thus 20 scenarios in total)25, which is downscaled to 0.5° and bias-corrected against the Watch Forcing Dataset27 (WFD). FutureStreams also includes a historical simulation forced with bias-corrected E2O (Earth2Observe) reanalysis data28. We used historical meteorological time series as well as future projections from the GCMs under the four RCP scenarios as input to the global hydrology and water resources model PCR-GLOBWB24. The hydrological model produces runoff that is used with the high-resolution water temperature model DynWat to simulate water temperature and streamflow time series. Both PCR-GLOBWB and DynWat run at a native resolution of 5 arcminute (approximately 10 km at the equator).

Fig. 1
figure 1

Schematic overview of the study design. The top left figure shows 30-year running mean global air temperature difference relative to 1976–2005 for each ISI-MIP GCM-RCP combination25. Temporally and spatially varying meteorological inputs are provided to PCR-GLOBWB and DynWat (right panel, from Sutanudjaja et al.24). The thin red lines indicate surface water withdrawal, the thin blue lines groundwater abstraction, the thin dashed lines return flows from water use. For DynWat see Wanders et al.13. The bottom-left panel shows the model outputs, which are weekly gridded discharge and water temperature per GCM, for the historical period and each RCP, at 5 arcminute resolution, as well as ecologically relevant derived variables.

Hydrological and water temperature models

The global hydrology and water resources model PCR-GLOBWB uses daily meteorological inputs to simulate the hydrological response in terms of local runoff. If the meteorological input data, such as the GCM input used here, has a lower spatial resolution than the 5 arcminute resolution of PCR-GLOBWB, the model statistically downscales the data to 5 arcminute based on spatial patterns in historical meteorological data24. From this, PCR-GLOBWB computes the water balance between two soil layers and a groundwater layer, and has up to four land cover types per grid cell24. Local runoff at the surface as well as in the soil and groundwater layers is routed along the river network using the kinematic wave approximation and includes floodplain inundation (Wanders et al. 2019). PCR-GLOBWB includes historical development of over 6,000 man-made reservoirs as well as water use for irrigation, industry, livestock and domestic purposes. Streamflow and meteorological inputs, such as air temperature and radiation, are then used to force DynWat to derive dynamic physically based water temperature estimates13. DynWat includes temperature advection, radiation and sensible heating, ice formation and breakup, thermal mixing and stratification in larger water bodies, as well as effects of water abstractions and reservoir operations at a spatial resolution of 5 arcminute with a daily global coverage.

Historical and climate scenario forcing from ISI-MIP GCMs

We forced PCR-GLOBWB with output from the five ISI-MIP GCMs (GFDL-ESM2M, HadGEM2-ES, IPSL-CM5A-LR, MIROC-ESM-CHEM, NorESM1-M) for four RCPs (2.6, 4.5, 6.0 and 8.5 W/m2 in 2100). These GCMs were selected by ISI-MIP from the full CMIP5 ensemble to cover the full envelope of potential climate changes from wet to dry and warm to cold (Warszawski et al. 2013).

We used the GCM output to produce a baseline scenario for the historical period 1976–2005 and projections for the period 2006–2099. The GCM inputs are bias-corrected by ISI-MIP to correct the climate model data for systematic deviations of the simulated historical data from observations27, and are provided at a spatial resolution of 0.5° and a daily temporal resolution. The GCMs will therefore reproduce the correct statistical properties of floods and droughts (e.g. mean, severity, duration), but not necessarily the exact timing as observed in the historic record.

We obtained daily mean surface air temperature, precipitation, reference potential evapotranspiration and downward surface solar radiation from the ISI-MIP ensemble and used these to force the hydrological and water temperature models. As simulations of cloud cover and relative humidity are not bias-corrected via the ISI-MIP ensemble, we used CRU TS3.21 monthly climatology29.

Historical reanalysis-forced simulation

We also provide streamflow and water temperature from a historical simulation forced by reanalysis data (1979–2005). Reanalysis data consists of observations assimilated into weather models, to create consistent and globally complete time series. A simulation forced with reanalysis data therefore enables validation of the model not only with respect to climatology and variability, but also with respect to timing of actual events such as droughts or floods. Output from a reanalysis-forced simulation can furthermore be used in comparison to the GCM-forced historical simulations to assess how well the GCM-forced simulations are capable of capturing the climatology and variability. Here we used the Earth2Observe (E2O) reanalysis data for a historical simulation from 1979 to 2005. E2O uses WFDEI data (WATCH forcing data methodology applied to ERA-Interim28).

Simulations and output

We forced the PCR-GLOBWB model with the meteorological data from ISI-MIP and E2O as described above. We started the simulations in 1951 with initial conditions from an earlier E2O-forced simulation. Subsequent years (1951–1975) are considered spin-up. The E2O reanalysis-forced simulation starts in 1976 directly from the initial conditions.

We performed the simulations at SurfSara Cartesius, the national e-infrastructure for Dutch universities and institutes, parallelized the calculations along watershed boundaries24, and aggregated the output using Python. From the model output, we extracted weekly streamflow and temperature values for each year from 1976 through 2099. The output is grouped in 10-year chunks, separately for each GCM and RCP (Table 126). In addition, we calculated a set of derived streamflow and temperature indicators that are expected to be particularly relevant from an ecological perspective (Tables 2 and 3). We calculated these derived variables as long-term aggregates/averages for six periods aligned with those commonly used in the worldclim dataset20 as well as the CMCC-BioClimInd dataset21: 1976–2005 (1979–2005 for E2O); 2021–2040; 2041–2060; 2061–2080; 2081–2099. These derived variables are also provided through26.

Data Records

Available data files at Yoda26:

  • weekly streamflow (m3/s, called discharge in filename) and water temperature (K) globally from 1976 through 2005 for the historical period, or 1979–2005 for the E2O simulation, and 2006 through 2099 for each RCP (2.6, 4.5, 6.0 and 8.5), stored in chunks of 10 years per GCM (GFDL - HadGEM - IPSL - MIROC - NorESM), see Table 1.

  • set of ecologically-relevant (indicator) variables derived from the weekly values: see Tables 2 and 3, as well as Usage notes below and scripts (Code Availability) for more details.

  • masks (see Usage notes below)

The netCDF4 files have regular latitude - longitude grids with a cell size of 5 arcminute (~10 km) and a global extent, including all continents except for Antarctica (90°N to 90°S latitude and 180°W to 180°E longitude).

Technical Validation

Water temperature records

Wanders et al.13 have validated water temperatures from DynWat, using the same model set-up and the ERA-40 and ERA-Interim reanalysis data as forcing. They showed that the modeled temperature matches observed temperatures well (R2 = 0.861, using observations at 358 locations), and that DynWat is capable of capturing spatial patterns and trends in water temperature, thereby providing confidence in the quality of the dataset.

Streamflow records

Sutanudjaja et al.24 showed that seasonality, inter-annual anomalies, and the general discharge characteristics in PCR-GLOBWB compare well to observations, especially at the 5 arcminute resolution (modeled discharge compared to time series at 5,363 locations gives an R2 mode between 0.7 and 0.8). They furthermore showed that PCR-GLOBWB is able to reproduce trends and seasonality in total water storage, as observed by satellite measurements.

Usage Notes

Variable names, units and timestamps

Streamflow is runoff routed along a drainage network, in m3/s, also known as discharge, which is the variable name used in the files. Water temperature is given in units of Kelvin. Filenames include the variable name, GCM, scenario (hist for historical, or one of the RCPs) and the time period (years). The timestamps in the files reflect the last date of the period over which the output was averaged, so the first timestamp of the weekly averages is January 7th 1976.

Ecologically-relevant variables

The ecologically-relevant streamflow and water temperature variables derived from the weekly values are established based on a combination of classification frameworks, i.e., indicators of hydrologic alteration19, terrestrial bioclimatic variables in the worldclim dataset20 as well as the CMCC-BioClimInd dataset21, aggregated accordingly: 1976–2005 (1979–2005 for E2O); 2021–2040; 2041–2060; 2061–2080; 2081–2099. The scripts used to compute these derived variables can be found under Code Availability.

For files containing information on timing (see Tables 23), note that the counting is 0-indexed. So week numbers run from 0 through 51, months from 0 to 11. For timing of quarters, 0 is DJF, 1 is MAM, 2 is JJA, 3 is SON. The week number (for WT-wmin, WT-wmax, Q-wmin, Q-wmax) is determined as the mode, i.e. the most frequent week number within a period. For each period (20, 25 or 30 years) we looked for the week number in which the minimum or maximum water temperature or discharge occurs. If that happens most often in week X, that week number is stored. It can however occur that a certain minimum/maximum temperature or discharge occurs equally often in multiple weeks - then we assign a missing value.

The variables Q-bfi and Q-vi are calculated according to Pastor et al.30. The baseflow index is an indicator of the importance of stored sources; a high index indicates that flow is mostly sustained by stored sources such as groundwater.

Scripts used to create the derived variables are available through the FutureStreams GitHub repository (see Code Availability below).

Multi-model set-up

We provide future scenarios for four RCPs (representative concentration pathways; 2.6, 4.5, 6.0 and 8.5 W/m2 in 2100) for the five ISI-MIP GCMs. Projections differ across RCPs due to differences in greenhouse gas forcing, and across GCMs due to differences in e.g model parameterization and resolution. Generally the spread across GCMs is larger than that across RCPs7,31. When interested in the general effect of climate change, users are advised to use the mean or median across the GCMs, rather than selecting a specific GCM. When interested in the spread across GCMs, users can explore or represent that in various ways, such as color intensity indicating agreement amongst models5, bar or violin plots7 etc.

Warming levels

To facilitate assessments and comparisons of streamflow and water temperature at a certain air temperature rise rather than specific years5,7, we provide a table with the years in which each GCM/RCP reaches the global mean temperature rises 1.5°, 2.0°, 3.2°, 4.5° compared to pre-industrial temperatures (as used by Barbarossa et al.7) with our scripts (see Code Availability). These years represent the central value of a 30-year running mean, so users should evaluate the 30-year mean (or other statistic) of discharge or water temperature centered around the year that a certain warming level is reached, which is specific to each RCP and GCM combination. For instance, if 1.5° warming is reached in 2040, the 30-year period 2025–2054 should be considered.

GCMs, bias-correction and reanalysis data

The majority of our simulations are forced with meteorological time series from GCMs. Those are bias-corrected27 before being applied to impact models such as PCR-GLOBWB, which corrects for systematic deviations of the simulated historical data from observations. For instance, for temperature the offset in average temperature in the historical GCM simulation with respect to observations is subtracted from temperatures in all scenarios of that GCM. The bias-corrected GCM forcing should thus well represent climatology, but not necessarily timing of actual events such as floods and droughts. Reanalysis data is created by assimilating observations into weather models, to obtain consistent and globally complete time series. The output of the simulation forced with meteorological time series from the (E2O) reanalysis data should therefore reflect not only the average streamflow and water temperatures, but also timing of actual events such as droughts.

If users want to check for themselves how the GCM-forced historical simulations discussed here deviate from reanalysis-forced simulations, they can use the output from the E2O-forced simulation provided here, the monthly output linked to Wanders et al.13 (see also Code Availability) or the daily output of those simulations which are available from Niko Wanders upon request. The latter are forced with ERA-40/ERA-Interim reanalysis data.

Notes of caution

Beware of temperature in grid cells where streamflow is low, which can cause temperatures to become unrealistically high due to strong fluctuations in the water level. The computational timesteps currently implemented in DynWat are not sufficiently small to provide stable solutions for these conditions. For some lakes and reservoirs we observe a similar problem when lakes expand or shrink as a result of water levels changes. These locations can be masked and we can assume that water temperature follows the air temperature for these very shallow water layers. A file with locations of lakes and reservoirs is provided in the data repository (under indicators/mask) so users can mask these if desired.

Furthermore, we provide masks for each GCM-RCP-period which users can apply to the derived variables if desired. These masks are based on Q-mean and WT-mean and thresholds of 10 m3/s and 350 K, respectively. They can be found in the data repository (i.e. indicators/waterTemperature/WT-mask). The scripts used to create these masks are provided through the FutureStreams GitHub repository (see Code Availability below), which can be used to create masks with different thresholds. These scripts are called mask_unrealistic_values.py and maskFunctions.py.

We also provide scripts to mask out unrealistic values directly in the weekly Q and WT files, these scripts are mask_unrealistic_values_weekly.py and maskFunctions_weekly.py. In all these scripts the threshold for discharge is set to 10 m3/s and for water temperature to 350 K, but users can change those to their preferred values. The threshold value will be included in the resulting output file name.

Furthermore, we encountered spin-up issues in some pixels for the future RCP simulations. Instead of following the temperatures from the end of the historical simulation, temperatures drop at the beginning of the future simulation, so the first few weeks of 2006 temperatures can be unrealistically low. In Fig. 2, output of the year 2007 is used for the year 2006 .

Fig. 2
figure 2

Water temperature [°C] anomaly. The maps show the difference between the mean water temperature over the period 2070–2099 (RCP8p5) and the historical period 1975–2005. The map shows values only for rivers with streamflow greater than 50 m3/s and the width of the rivers is scaled based on the streamflow values for clarity of representation. Insets below the map show the original gridded resolution at 5 arcminute for cells with streamflow values greater than 10 m3/s. The bottom insets show water temperature time series sampled at specific grid-cell locations (white crosses in the insets) for the Amazon (−57.2083° longitude, −2.625° latitude), Danube (20.125° lon, 45.2083° lat) and Ganges (88.375° lon, 24.375° lat). Time series are represented for each GCM and RCP available within FutureStreams; thin lines represent weekly values, thick lines represent 10 year rolling means.

Fig. 3
figure 3

Streamflow [m3/s] anomaly. The maps show the difference between the log10 transformed mean streamflow over the period 2070–2099 (RCP8p5) and the log10 transformed mean streamflow over historical period 1975–2005. The map shows values only for rivers with streamflow values greater than 50 m3/s and the width of the rivers is scaled based on the streamflow values for clarity of representation. Insets below the map show the original gridded resolution at 5 arcminute for cells with streamflow values greater than 10 m3/s. The bottom insets show water temperature time series sampled at specific grid-cell locations (white crosses in the insets) for the Amazon (−57.2083° longitude, −2.625° latitude), Danube (20.125° lon, 45.2083° lat) and Ganges (88.375° lon, 24.375° lat). Time series are represented for each GCM and RCP available within FutureStreams; thin lines represent weekly values and thick lines represent 10 year rolling means.

Fig. 4
figure 4

Anomalies for selected ecologically relevant derived variables (bioclimatic indicators) for the same areas in the Amazone (left), Danube (middle) and Ganges (right) basins as used in Figs. 2 and 3. Differences are shown between RCP8.5 2080–2099 and 1976–2005. WT-cq is the water temperature of the coldest quarter, WT-range is temperature range, Q-max is maximum streamflow, Q-dm is streamflow of the driest month (see also Tables 2 and 3 below). For streamflow we show the difference between log10-transformed flow.