Global monthly sectoral water use for 2010–2100 at 0.5° resolution across alternative futures

Water usage is closely linked with societal goals that are both local and global in scale, such as sustainable development and economic growth. It is therefore of value, particularly for long-term planning, to understand how future sectoral water usage could evolve on a global scale at fine resolution. Additionally, future water usage could be strongly shaped by global forces, such as socioeconomic and climate change, and the multi-sector dynamic interactions those forces create. We generate a novel global gridded monthly sectoral water withdrawal and consumption dataset at 0.5° resolution for 2010–2100 for a diverse range of 75 scenarios. The scenarios are harmonized with the five Shared Socioeconomic Pathways (SSPs) and four Representative Concentration Pathways (RCPs) scenarios to support its usage in studies evaluating the implications of uncertain human and earth system change for future global and regional dynamics. To generate the data, we couple the Global Change Analysis Model (GCAM) with a land use spatial downscaling model (Demeter), a global hydrologic framework (Xanthos), and a water withdrawal downscaling model (Tethys).


Background & Summary
This paper documents a global monthly gridded (0.5° resolution) sectoral water withdrawal and consumption dataset that contains conditional projections of water usage (from 2010 to 2100) across a range of future socio-economic and climate scenarios. This dataset is important because it quantifies the sources of demand-side pressures on scarce water resources globally under diverse future scenarios. Mekonnen & Hoekstra 2016 1 (also cited in the UN World Water Development Report 2022 2 ) estimated that roughly 71% (4.1 billion people) of the world's population was exposed to water scarcity at least one month in the year over the period from 1996 to 2005. In their more recent study, Van Vliet et al. 2021 3 estimate global water scarcity over the period from 2000 to 2010 to range from 30% (without water quality considered) to 40% (when also including water quality). Global water scarcity is expected to increase across the globe with critical implications for sustainable development [4][5][6][7][8] . Recent studies highlight that future water scarcity is primarily driven by human water demands rather than climate impacts on water availability 4,9 . Additionally, irrigation water demands have been shown to have the largest relative impact on water scarcity 5,6,10 . Furthermore, water access, availability and demands are highly localized, with large energy and economic costs associated with water transfers, and thus a regional understanding of water use is essential 11,12 . This paper accounts for all of these key factors by providing a transparent and open-source dataset and accompanying methodology that captures the key drivers of future water scarcity (water use for human activities) at a fine spatio-temporal scale (0.5° resolution and monthly) and with added detail on irrigation water use by crop types.
Past studies [13][14][15] that have evaluated global gridded water use at monthly resolution have been limited to historical analyses. Other studies, such as World Resources Institute (WRI) 2019 16 , look at future water withdrawals but only at an annual time resolution and up to 2040 with sectoral detail divided into domestic, industry, agriculture and livestock sectors. In this paper we offer a finer spatiotemporal resolution for future projections compared to previous studies applied to a broader suite of socioeconomic and climate forcing scenarios. Additionally, we provide more detail in the irrigation sector which includes 13 different crop types by coupling our water demand model with a land allocation model. Table 1 compares the key features in this study to a representative set of previous studies that have analysed global water use. Table 1 highlights that, compared to previous studies, our study captures additional sectoral detail (especially by irrigated crop types) and a more diverse set of future scenarios.
This study thus addresses the critical need for future projections of distributed water demand at a fine resolution so that scientists and water managers can start to explore and plan for future water needs. The dataset could also directly support the growing MultiSector dynamics research literature, particularly scenario-based studies of the future interactions between water and other sectors (e.g., energy and land) across scales in a global context [17][18][19] . The diverse set of 75 scenarios we produce supports scenario-based water demand uncertainty analysis by varying key elements of human and earth system change. The entire dataset can be downloaded from a dataverse online repository 20 (https://doi.org/10.7910/DVN/VIQEAB) and is accompanied by a meta-repository (https://jgcri.github.io/khan-etal_2022_tethysSSPRCP/) that provides detailed figures and workflows for interested readers.
We generated this dataset by linking together multiple models and datasets designed to explore the dynamic interactions among energy, water, and land systems at global scale and gridded resolution. Central to our modeling workflow is the Global Change Analysis Model (GCAM 4 ), an integrated tool for exploring the coarse regional dynamics of the coupled human-Earth system and the response of this system to global change, including human system and climate system changes into the future. Tethys 21 then spatially and temporally downscales outputs from GCAM to grid resolution. We enhance Tethys' projections of irrigation water usage by coupling it with Demeter 22 , a high-resolution downscaling model that uses GCAM outputs to calculate global gridded land-use change. With the combination of GCAM and Demeter, Tethys is able to project water withdrawal and consumption demands for 6 sectors (domestic, electricity generation, irrigation, livestock, industry and mining). The irrigation sector is further divided into 13 different crop types (biomass, corn, fiber crop, miscellaneous crops, oil crop, other grain, palm fruit, rice, root tuber, sugar crop, wheat, fodder herb, and fodder grass). Withdrawal refers to the total volume of water that is extracted by a user from a water source. While some of this withdrawn water may be returned to its original source (e.g., a river), the remaining portion (referred to as consumption) is not returned to the system (e.g., evaporated water). To capture a range of futures reflecting diverse global change across the human and Earth systems, we used 75 scenarios comprising combinations of 4 Representative Concentration Pathways (RCPs) 23 , 5 Shared Socioeconomic Pathways (SSPs) 24 , and 5 Global Climate Models (GCMs) from the Inter-sectoral Impact Model Intercomparison Project (ISIMIP) 25 protocol 2b. 15 viable combinations of the SSPs and RCPs were combined with each of the 5 GCMs to arrive at the final 75 scenarios. Graham et al. 2020 4 provide the details on these original GCAM runs for the 75 scenarios, which included a characterization of demand-side narratives corresponding to the SSPs for the water sector 26 . The GCAM outputs were then passed on to the Demeter model to produce the downscaled irrigated crop land area for 13 different crops in the study by Chen et al. 2020 27 . The combined outputs from the GCAM study and the Demeter study were used in this study to calculate the final downscaled water demand results. The entire workflow of data from the original scenarios through GCAM and Demeter to Tethys is shown in Fig. 1.

Methods
GCAM produces water withdrawal and consumption outputs for 32 regions for the domestic, mining, power generation, industry, and livestock sectors and for 434 region-basin intersections for the irrigation sector as shown in Fig. 2. (These spatial boundaries 28 are determined by Moirai 29 , the land data system used by GCAM). Tethys v1.3.1 30 was used to downscale the water withdrawals and consumption outputs from GCAM onto a 0.5° by 0.5° grid as shown in Fig. 3. Of the 259,200 possible grid cells at this resolution (360 × 720), only the 67,420 cells categorized as land are considered. The Tethys outputs focus only on demand-side dynamics, so they make no distinctions regarding the water supply sources used to meet the demands (i.e., surface water, groundwater, desalinated water), though GCAM does make this distinction. While many adjacent regions differ greatly in total water demand, most of this demand is directly related to total population or land area, and is often concentrated in a few cells, such as those containing cities. As a result, spatial distributions at the border are smoother than they appear on the region scale map, without additional consideration of the boundaries by Tethys.
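The grid arithmetic above (360 × 720 = 259,200 cells at 0.5°) can be illustrated with a minimal sketch. The row and column ordering used here (row 0 at the northern edge, column 0 starting at 180° W) and the helper names `cell_index` and `cell_center` are assumptions made for illustration; they are not necessarily the ordering or names used in the Tethys files.

```python
# Illustrative 0.5-degree global grid helpers.
# ASSUMPTION: row 0 is the northernmost band and column 0 starts at 180 W;
# the actual cell ordering in the Tethys outputs may differ.
RES = 0.5
N_ROWS = int(180 / RES)  # 360 latitude bands
N_COLS = int(360 / RES)  # 720 longitude bands


def cell_index(lat, lon):
    """Return (row, col) of the grid cell containing the point (lat, lon)."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinates out of range")
    # min() keeps points on the southern/eastern edge inside the last cell.
    row = min(int((90.0 - lat) / RES), N_ROWS - 1)
    col = min(int((lon + 180.0) / RES), N_COLS - 1)
    return row, col


def cell_center(row, col):
    """Return the (lat, lon) of the centre of cell (row, col)."""
    lat = 90.0 - (row + 0.5) * RES
    lon = -180.0 + (col + 0.5) * RES
    return lat, lon
```

Only the subset of these cells classified as land (67,420 in this dataset) carries data, so a lookup like this would normally be combined with a land mask.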
Spatial downscaling - non-agriculture. For the non-agricultural sectors (domestic, electricity, manufacturing, and mining), water withdrawals and consumption in each grid cell are assumed to be proportional to that cell's share of the population of the larger GCAM region within which the grid cell is located. The population data set used for this paper is from "Gridded Population of the World" (SEDAC, 2016) 31 . Tethys uses the nearest available year, which for this paper was 2010 in 2010, and 2015 in all other years. Each region's population is determined by taking the sum of population over all cells belonging to that region. For each of these sectors, Tethys calculates the water withdrawals and consumption for a given cell (Eq. 1, 2) by scaling the regional value by the ratio of the cell's population to the region's population.

Spatial downscaling - livestock. Spatial downscaling of livestock water use is calculated using gridded global maps from the FAO gridded livestock of the world (Wint and Robinson, 2007) 32 dataset for six types of livestock (cattle, buffalo, sheep, goats, pigs, and poultry). GCAM outputs are organized into five types (beef, dairy, pork, poultry, and "sheepgoat") and these are first reorganized to match the six types from Wint and Robinson, 2007 32 using ratios for each region estimated from the dataset. The ratios are stored in two files that are used as inputs to Tethys: bfracFAO2005.csv ("buffalo fraction") and gfracFAO2005.csv ("goat fraction"). These ratios are used to map the water withdrawals and consumption values for the five GCAM livestock types to the six livestock types from Wint and Robinson, 2007 32 for each region. No adjustment is required for pork (pigs) or poultry.
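The proxy-based allocation described above can be sketched in a few lines. The function names and data layout are hypothetical, and the exact form of the livestock mapping is an assumption: the text only states that regional "buffalo fraction" and "goat fraction" ratios are used, so the split below is one plausible reading, not a confirmed reproduction of the Tethys formulas.

```python
def downscale_by_proxy(region_total, proxy_by_cell):
    """Distribute a regional total over cells in proportion to a proxy
    (human population for non-agricultural sectors, head counts for livestock)."""
    proxy_sum = sum(proxy_by_cell.values())
    if proxy_sum <= 0:
        raise ValueError("proxy sums to zero; no basis for allocation")
    return {cell: region_total * value / proxy_sum
            for cell, value in proxy_by_cell.items()}


def split_livestock(gcam, buffalo_frac, goat_frac):
    """Map the five GCAM livestock types to six gridded types.

    ASSUMPTION: beef + dairy is split into cattle and buffalo using the
    buffalo fraction, and "sheepgoat" is split into sheep and goats using
    the goat fraction. Pork and poultry pass through unchanged.
    """
    bovine = gcam["beef"] + gcam["dairy"]
    return {
        "cattle": bovine * (1.0 - buffalo_frac),
        "buffalo": bovine * buffalo_frac,
        "sheep": gcam["sheepgoat"] * (1.0 - goat_frac),
        "goats": gcam["sheepgoat"] * goat_frac,
        "pigs": gcam["pork"],
        "poultry": gcam["poultry"],
    }
```

Both helpers conserve the regional total: the cell values returned by `downscale_by_proxy` sum to `region_total`, and the six values returned by `split_livestock` sum to the five GCAM inputs.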
After this, downscaling for each livestock type is very similar to downscaling the non-agricultural sectors, with the exception that the respective livestock population (heads) is used as the proxy instead of human population. The results for each of the six types are then added together to get the total livestock withdrawal and consumption for each cell.

In cases where the GCAM outputs for a region-basin have nonzero irrigation of a crop type, but Demeter shows no corresponding cells (due to the harmonization with the base map), the distribution is assumed to be proportional to land area. Note that in the current version of Tethys (v1.3.1) used in this paper, biomass is also downscaled uniformly within a region-basin intersection (with respect to land area).

Where:
$\rho_b$ = Proportion of electricity used for buildings
$\rho_{it}$ = Proportion of electricity used for industry and transportation
$\rho_b + \rho_{it} = 1$
$\rho_h$ = Proportion of electricity used for buildings heating
$\rho_c$ = Proportion of electricity used for buildings cooling
$\rho_u$ = Proportion of electricity used for buildings other
$\rho_h + \rho_c + \rho_u = 1$
HDD = Heating Degree Days
CDD = Cooling Degree Days

Heating degree days (HDD) and cooling degree days (CDD) are indicators for the amount of electricity used to heat and cool buildings, and are calculated from mean daily outdoor air temperature. HDD for a month is the sum of $(18\,^{\circ}\mathrm{C} - \mathrm{temperature}_{day})$ across all days where temperature is less than 18 °C. CDD is the sum of $(\mathrm{temperature}_{day} - 18\,^{\circ}\mathrm{C})$ across all days where temperature is greater than 18 °C. Annual HDD and CDD are the sum of their respective monthly values.
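The degree-day definitions above translate directly into code. This is a minimal sketch that follows the stated 18 °C base temperature; the function name `degree_days` and the list-of-floats input are illustrative choices, not part of Tethys.

```python
BASE_TEMP_C = 18.0  # base temperature stated in the text


def degree_days(daily_mean_temps_c):
    """Return (HDD, CDD) for one month of mean daily outdoor temperatures (deg C)."""
    # Days colder than the base contribute to heating degree days.
    hdd = sum(BASE_TEMP_C - t for t in daily_mean_temps_c if t < BASE_TEMP_C)
    # Days warmer than the base contribute to cooling degree days.
    cdd = sum(t - BASE_TEMP_C for t in daily_mean_temps_c if t > BASE_TEMP_C)
    return hdd, cdd
```

For example, `degree_days([10.0, 18.0, 25.0, 20.0])` returns `(8.0, 9.0)`: one cold day contributes 8 heating degree days, and the two warm days contribute 7 + 2 cooling degree days. Annual values are obtained by summing the twelve monthly results.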
Tethys uses HDD, CDD, and ρ values for each cell from the nearest available year in the input files listed at the end of this subsection, which is 2010 for this data set.
The formula is modified for cells with low annual HDD or CDD as described in Huang et al., 2018 13 , since these may not have heating or cooling services despite nonzero values of $\rho_h$ or $\rho_c$.
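When $\mathrm{HDD}_{year} < 650$, the HDD term is removed (leaving only CDD) and $\rho_h$ is reallocated to the cooling proportion.

When annual HDD and CDD are both below their respective thresholds (<650 for HDD and <450 for CDD), all sources of monthly variation vanish and the formula reduces to an equal share of the annual value in every month:

$$W_{month} = W_{year} \times \frac{1}{12}$$

where $W_{year}$ is the annual value and $W_{month}$ is the value assigned to each month.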
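The threshold logic for electricity can be sketched as follows. The general weighting used here, a monthly share of $\rho_b(\rho_h \mathrm{HDD}_{month}/\mathrm{HDD}_{year} + \rho_c \mathrm{CDD}_{month}/\mathrm{CDD}_{year} + \rho_u/12) + \rho_{it}/12$, is an assumption chosen to be consistent with the symbol definitions and special cases described in this subsection. The handling of the low-CDD case (cooling share reallocated to heating) is likewise assumed by symmetry with the low-HDD case. The function name and argument order are hypothetical.

```python
HDD_MIN = 650.0  # annual HDD threshold from the text
CDD_MIN = 450.0  # annual CDD threshold from the text


def monthly_shares(hdd, cdd, p_b, p_it, p_h, p_c, p_u):
    """Return 12 monthly fractions of annual electricity water use.

    hdd, cdd: 12 monthly degree-day values for one cell.
    ASSUMPTION: the weighting formula and the low-CDD branch are inferred,
    not copied from the Tethys source.
    """
    hdd_year, cdd_year = sum(hdd), sum(cdd)
    low_h, low_c = hdd_year < HDD_MIN, cdd_year < CDD_MIN
    if low_h and low_c:
        return [1 / 12] * 12           # no monthly variation remains
    if low_h:
        p_h, p_c = 0.0, p_h + p_c      # heating share moves to cooling
    elif low_c:
        p_h, p_c = p_h + p_c, 0.0      # assumed symmetric case
    shares = []
    for h, c in zip(hdd, cdd):
        heat = p_h * h / hdd_year if p_h else 0.0
        cool = p_c * c / cdd_year if p_c else 0.0
        shares.append(p_b * (heat + cool + p_u / 12) + p_it / 12)
    return shares
```

Because $\rho_b + \rho_{it} = 1$ and $\rho_h + \rho_c + \rho_u = 1$, the twelve shares sum to one in every branch, so multiplying them by the annual value preserves the annual total.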
In the event that the model has no monthly data for a basin with nonzero irrigation, the profile of the nearest available basin is used.

Data Records
Data outputs from this experiment have been minted and are available in the repository indicated in Table 2. A meta-repository with detailed information on the workflows to produce the data is also available and shown in Table 2.
The dataset contains separate files whose names start with a combination of the SSP, RCP, GCM and water usage type. The dataset files are then divided into subsets to manage their size. For each SSP, RCP, GCM combination, the files with "_crops_" in their names include data for individual crops, while the files with "_sectors_" in their names include data for the other aggregated sectors. In the names of the individual files inside the zipped files (for example, for the ssp1_rcp26_gfdl case), "cd" stands for "consumption downscaled" and "tcd" stands for "temporal consumption downscaled".
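When working with many scenario files, it can help to recover the scenario from the file name prefix. The sketch below handles only the `ssp1_rcp26_gfdl`-style prefix mentioned above; the function name `parse_scenario` is hypothetical, and the pattern assumes a single-digit SSP number and a GCM label made of lowercase letters, digits or hyphens. Anything after the prefix is ignored because the full naming scheme is not reproduced here.

```python
import re

# ASSUMPTION: names begin with "<ssp>_<rcp>_<gcm>", as in "ssp1_rcp26_gfdl".
_PREFIX = re.compile(r"^(ssp\d)_(rcp\d+)_([a-z0-9-]+)")


def parse_scenario(name):
    """Extract the SSP, RCP and GCM labels from the start of a file name."""
    match = _PREFIX.match(name.lower())
    if match is None:
        raise ValueError(f"unrecognised scenario prefix: {name!r}")
    ssp, rcp, gcm = match.groups()
    return {"ssp": ssp, "rcp": rcp, "gcm": gcm}
```

For instance, `parse_scenario("ssp1_rcp26_gfdl")` returns `{"ssp": "ssp1", "rcp": "rcp26", "gcm": "gfdl"}`.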

Technical Validation
GCAM outputs are calibrated at a regional scale to match observed data for base year values as described in Graham et al. [...] whereas the current version of Tethys uses crop landcover maps from Demeter. Consumption and withdrawals generally showed similar spatial patterns, with differences in assumptions regarding each region's and sector's consumption-to-withdrawal ratios accounting for some differences. There are also some differences in accounting. For example, in this study hydropower is included in the consumption for electricity generation category, which by itself is several times greater than the entire water consumption for electricity generation in Huang et al. 2018 13 .
The second data set we compared with is from Mekonnen, M.M. and Hoekstra, A.Y. 2011 15 . It contains monthly total blue water consumption values representing an average of years 1996-2005, which we compare to the base year values from 2010 from this study. The sectoral breakdown is different between the two datasets, but the datasets are at the same spatial-temporal resolution, so we compare monthly totals for each grid cell. Comparing datasets cell by cell is highly sensitive to local differences, and since our spatial downscaling is based on proxy quantities we do not expect every detail to be recreated exactly.
Nonetheless, there is general agreement in the sub-regional patterns across the data sets as seen in Fig. 5. Figure 6 also shows similar sub-annual patterns across the dataset with some differences in total values being attributed to underlying data and year of the study.

Usage Notes
Users are encouraged to explore the accompanying meta-repository (https://jgcri.github.io/khan-etal_2022_tethysSSPRCP/index.html), which provides detailed visualization across the various scenarios, sectors and time periods. Users can then download specific datasets for water withdrawal or consumption for relevant sectors, crops and desired SSP, RCP or GCM from the accompanying dataset repository 20 (https://doi.org/10.7910/DVN/VIQEAB) to analyze the raw data. Some example figures from the meta-repository are presented in this section.
Figure 7a shows the total annual water withdrawals by sector for each of the 75 SSP-RCP-GCM combinations from 2010 to 2100. Similar figures are available for consumption as well as by crop. Figure 7b shows the sub-annual temporal distribution across the same set of scenarios for 2010 and for 2100. Patterns such as an increase in summer water withdrawals can be seen in such figures.
The meta-repository also includes details on three selected basins: the Indus, Nile and Upper Colorado River Basin (U.S.). These are used to show how the data can be used to explore trends and patterns at this finer resolution. Figure 8a [...] water user in the Indus basin over time for the SSP1-RCP2.6-GFDL scenario. Figure 8c,d show the accompanying distribution of total water withdrawals both spatially and temporally. Similar figures are provided in the meta-repository for water consumption as well as for other sectors, crops and scenarios.
We highlight that several developments are planned for the next release of Tethys to improve the methodologies used to downscale water use for the dataset in this paper. Some of the key planned developments include:
1. Improving the spatial distribution of powerplant water use based on actual and projected powerplant location instead of population.
2. Updating the output resolution to 1/8th degree from the existing 1/2 degree resolution.
3. Including future population projections to improve on the current methodology, which uses a static base year population map even for future years.
4. Improving the downscaling of biomass water use, which is currently distributed equally within each region.
5. Making Tethys compatible with GCAM-USA 37 , which will allow use of more accurate state-level water use data instead of using national data as inputs to Tethys.
6. Comparing gridded outputs against observational data for individual sectors and regions where data are available.
Table 3 provides links to all models, data, versions and DOIs used to generate this dataset.