A socio-environmental geodatabase for integrative research in the transboundary Rio Grande/Río Bravo basin

Integrative research on water resources requires a wide range of socio-environmental datasets to better understand human-water interactions and inform decision-making. However, in transboundary watersheds, integrating cross-disciplinary and multinational datasets is a daunting task due to the disparity of data sources and the inconsistencies in data format, content, resolution, and language. This paper introduces a socio-environmental geodatabase that transcends political and disciplinary boundaries in the Rio Grande/Río Bravo basin (RGB). The geodatabase aggregates 145 GIS data layers on five main themes: (i) Water & Land Governance, (ii) Hydrology, (iii) Water Use & Hydraulic Infrastructures, (iv) Socio-Economics, and (v) Biophysical Environment. Datasets were primarily collected from public open-access data sources, processed with ArcGIS, and documented through the FGCD metadata standard. By synthesizing a broad array of datasets and mapping public and private water governance, we expect to advance interdisciplinary research in the RGB, provide a replicable approach to dataset compilation for transboundary watersheds, and ultimately foster transboundary collaboration for sustainable resource management.


Background & Summary
Finding solutions to manage scarce water resources to meet human water needs while sustaining ecosystems has become a priority for research [1][2][3][4] and policy makers. In 2015, the United Nations (UN) adopted 17 Sustainable Development Goals (SDG), including "Ensuring access to water and sanitation for all" (SDG6) 5 . However, increasing pressure on water resources, due to population growth and climate change, may be an obstacle to reaching this goal and may lead to increased competition and political tensions. This especially applies to transboundary river systems 6 , which represent 60% of freshwater supply basins and almost 50% of the total land area on Earth 7 .
In transboundary basins, data-and information-sharing provide a mechanism to foster cooperation among countries and, ultimately, achieve more sustainable water management and maintain peace and security 7 . Synthesizing a broad range of environmental and social datasets is further critical for in-depth understanding of watershed dynamics 8 and for assessing the sustainability of water allocation 9 . In large areas, spatial datasets are also valuable to capture the spatial heterogeneity of local conditions and for better understanding of different development trends across a region 10 . However, integrating scattered cross-disciplinary and multinational datasets is challenged by the disparity of data sources, transboundary discontinuity 11 and the inconsistencies in format, content, spatial and temporal resolution, languages 12 , and institutional norms. This can limit the study of transboundary socio-environmental systems, and calls for the development of a harmonized, transboundary, open-access database.
The Rio Grande/Río Bravo basin (RGB) epitomizes the challenges of managing water resources under conditions of scarcity 13 and transboundary cooperation issues 14 . Covering 552,382 km 2 , the basin is shared by two countries (United States and Mexico) and eight states, three in the U.S. (Colorado, New Mexico, and Texas) and five in Mexico (Chihuahua, Coahuila, Durango, Nuevo León, and Tamaulipas). Distribution of water resources is governed by distinct national and state water rights regimes, as well as bi-national and inter-state agreements 14 . The river is highly engineered via extensive damming and channelization 15 , which significantly disturb the natural flow regime 16 . Classified as one of the most endangered in the world 17 , the river hosts several endangered and threatened species, such as the Rio Grande Silvery Minnow, and the Southwestern Willow Flycatcher. Despite chronically severe water scarcity, human water demand has grown along with an increase in population (current population is estimated at 10.5 million people 18 ). Furthermore, irrigated agriculture makes up approximately 83.5% of the total surface and groundwater withdrawals 19,20 while cultivated areas cover only 3.5% of the basin 21 . Projected climate change and population growth are likely to lead to a growing imbalance between water supply and demand, potentially impacting the sustainability of the basin 22 and resulting in a cascade of negative impacts (e.g., increased risk of wildfires) 23 .
To overcome the challenges of data-and information sharing in transboundary basins, several initiatives were launched in recent years. The Transboundary Freshwater Dispute Database gathers global and regional information for the world's 310 international river basins, including the RGB 24 . Examples of topics include population, land-cover, irrigation, rainfall, and dams (http://gis.nacse.org/tfdd/index.php). In the RGB, a basin level database synthesizing hydrologic, administrative, land-use/cover, and water management datasets, was also developed at a finer resolution 25,26 . However, we are not aware of a similar approach to capture the social heterogeneity and complexity across the basin 27 . Hence, we compiled a comprehensive socio-environmental geodatabase for the RGB, encompassing geospatial data sets related to Water & Land Governance, Hydrology, Water Use & Hydraulic Infrastructures, Socio-Economics, and Biophysical Environment. An innovative contribution of our geodatabase is to provide open-access to a thorough collection of reliable and well documented, basin-wide geospatial data sets, as well as, to our knowledge, the most extensive spatial picture of multi-scale water governance in the RGB. The geodatabase expands the availability of information detailing the social and environmental characteristics of the basin and enables an integrated and cross-disciplinary approach to the watershed. Many outputs can be derived from the geodatabase, including maps, spatially explicit integrated models, and statistical analyses (see Usage Notes for additional details). The data synthesis approach presented here is transferable to other transboundary basins, and the geodatabase will help the research, management, and policy-making communities to foster transboundary collaboration.

Methods
Dataset preparation. The development of the socio-environmental geodatabase followed four steps: (1) data identification; (2) collection of raw data; (3) assembling, harmonizing, and geoprocessing of datasets; and (4) organization of the final geospatial data sets into the geodatabase (Fig. 1).
First, we drew on an interdisciplinary collaboration among social and environmental scientists to identify critical data requirements to support socio-environmental research in the RGB. The environmental anthropologists shared primary data analysis from ethnographic research and developed a qualitative typology of actors that we used to identify water and land governance datasets. We integrated land management as a key component for the study of socio-environmental dynamics in the watershed because land management decisions affect water run-off, stream flow, aquifer recharge, and soil erosion. The actor typology distinguishes the actors according to their roles in water and land management (water supply, consumption, infrastructure management, environmental protection, recreation) and their spatial scale of action (from local to bi-national level; Supplementary Information 1).
Second, we gathered 210 raw datasets from 49 distinct sources totalling around 53 GB. Where possible, and if the spatial resolution was high enough, we prioritized global datasets over national datasets and national datasets over regional datasets to limit data inconsistency and heterogeneity. Most of the raw datasets were collected from national agencies (67%), followed by state agencies (13%), global agencies (2%) and binational agencies (7%). We also collected datasets from joint research projects (5%) and private institutions (6%). Most of the input datasets were available open-access online. We also contacted public or private institutions to access unpublished information and were granted permission to publish the synthesized data. The original formatting of the raw datasets varied greatly. 62% were vector data (51.5% shapefiles, 3% KML/KMZ files, 7.5% feature class), 19% were raster files, and 19% were only available in tabulated format (comma-delimited text or Microsoft Excel files). In terms of language, 73% of the raw datasets were in English and 27% in Spanish. The detailed list of raw datasets used to produce the geospatial output datasets is available in Supplementary Information 2. For each raw dataset, we included a short description, the spatial and temporal dimensions, the format, language, data source, and date of access.
Third, we assembled, harmonized, and geoprocessed the datasets using ESRI's ArcGIS 10.6 and, where appropriate, the Python programming language (version 2.7) and the arcpy package for automating the geoprocessing workflow. Geoprocessing operations varied according to the type of raw dataset, and a detailed description of these operations is available in the metadata for each geographic layer. In general, we tried to synthesize scattered state and national raw data into one data layer. Where necessary, we harmonized the information in the datasets (variables, values, units of measurement) and translated Spanish information to English. Geospatial operations included the projection of the final datasets to the most appropriate projected coordinate system that minimized areal distortion for the whole RGB (North America Albers Equal Area Conic), and clipping datasets to the RGB boundary. Instead of using the hydrological basin boundary, we used the spatial boundaries of the Rio Grande/ Río Bravo socio-environmental system as delineated by Koch et al. 27 . For all political jurisdictional boundary information (national, state, or county), given the spatial mismatch with the catchment boundary, we prioritized the use of administrative boundaries over catchment boundaries in locations where they overlap.
www.nature.com/scientificdata www.nature.com/scientificdata/ Fourth, we stored the resulting 145 geographic layers, including 125 vector data and 20 raster data (GRID), in a geodatabase with a total size of 1. 40  Metadata. For each individual data layer, we produced a standardized metadata record, following the Federal Geographic Data Committee Content Standard for Digital Geospatial Metadata (FGDC CSDGM). The standard includes seven types of information: Identification Information (basic information about the dataset such as the citation, description, spatial domain, access constraints, etc.), Data Quality Information, Spatial Data Organization Information, Spatial Reference Information (Geographic Coordinate System Definition), Entity and Attribute Information, Distribution Information, and the Metadata Reference Information. The metadata www.nature.com/scientificdata www.nature.com/scientificdata/ records have been created through the ArcMap graphical user interface and are automatically included with the hosted layer item. The user can export the metadata in XML format through the ArcMap graphical user interface if needed.

Data Records
The socio-environmental geodatabase is freely and publicly available on the OSF repository 28 . The collection of datasets can be downloaded as a single file (.gdb) or as individual geospatial output layers (shapefiles or raster files). Here, we describe the geospatial datasets available for the five themes Water & Land Governance, Hydrology, Water Use & Hydraulic Infrastructures, Socio-Economics, and Biophysical Environment. Online-only Table 1 and Tables 1-4 provide a systematic overview of the individual layers, i.e., the name, description, format and data sources for each theme. A comprehensive list of acronyms is included in Supplementary Information 3.

Water & Land Governance (GOV). The Water & Land Governance category is divided into 12
sub-categories, representing political jurisdiction boundaries, surface water management agencies (binational, federal, state), state and interstate multi-stakeholder platforms, groundwater-focused institutions, irrigation and conservation organizations, land management, and border control. A description of each entity with their missions is provided in Supplementary Information 1.
Political jurisdiction boundaries. The geodatabase provides the boundaries of four political jurisdiction divisions: nation, state, county/municipio, and places (cities, towns and villages). The geodatabase also provides a point dataset of the most populated places in both countries, which includes the main U.S.-MX sister cities along the border.
Binational surface water management agencies. The geodatabase gathers 13 vector datasets of binational infrastructure that spatializes the joint operations of the U.S. International Boundary and Water Commission (IBWC) and its Mexican counterpart the Comisión Internacional de Límites y Aguas (CILA), regarding national allocation of water, dam and reservoir operations, river course management (the banks, the berms, the vegetation), water quality, sanitation, and flood control in the border region. State and intrastate surface water management agencies. This sub-category depicts the boundaries of the Offices of the State Engineer (OSE) and their regional offices (divisions and districts) in charge of the administration of water rights and interstate compacts in the U.S. For Mexico, we generated spatial layers representing state-level offices of the federal agency CONAGUA. We also included entities that engage in planning, coordination, and oversight of domestic water supply and wastewater treatment state-wide: in Chihuahua, the Junta Central de Agua y Saneamiento (JCAS); in Coahuila, the Comisión Estatal de Aguas y Saneamiento (CEAS); in Durango, the Comisión de Agua del Estado de Durango (CAED); in Nuevo León, the Servicios de Agua y Drenaje de Monterrey (SADM); and in Tamaulipas, the Comisión Estatal del Agua en Tamaulipas (CEAT).
State and inter-state multi-stakeholder platforms. The geodatabase includes five platforms: • The Rio Grande Compact Commission -U.S. inter-state platform -monitors water flows and administers water-sharing among the states of Colorado, New Mexico, and Texas. We mapped its domain through three spatial layers: one representing the spatial scope of the Interstate Compacts (including the Rio Grande Compact), one identifying the gauges used to monitor and account for states' water delivery obligations to each other, and one identifying the "post-Compact" reservoirs used to manage states' water delivery obligations.  Table 1. List of geospatial layers for the Hydrology category including their description, format and input data sources.
• The San Juan Chama Project (SJCP) -U.S. inter-state platform -moves water through a trans-mountain diversion from the Colorado River basin in the state of Colorado (San Juan River) to municipal and irrigation users in the upper and middle Rio Grande valley of New Mexico. We mapped this project with three vector datasets: one for the trans-mountain diversion, one for the reservoirs where imported water is stored and transferred, and one displaying the users of that water (counties, municipalities, irrigation districts, tribe). • The Rio Grande Project -binational and inter-state platform -supplies water for irrigation from the Elephant Butte Reservoir to several U.S. and Mexican irrigation districts. Its domain is represented through two vector datasets: one representing the dams used for storage (Elephant Butte, Caballo) plus the diversion dams for distributing water in the U.S. and Mexico (Percha, Leasburg, Mesilla, American, Riverside -which is inactive, International); and one representing the irrigation districts supplied by the Rio Grande Project (EBID, EPWID#1 and DR 009 Valle de Juarez). • The Consejo de Cuenca del Río Bravo (Rio Bravo Watershed Council) -Mexican inter-state platform -is a multi-state entity that develops and implements a basin management plan for the CONAGUA Hydrological-Administrative region VI in Mexico. We mapped its domain using its administrative boundary. • The Rio Grande Basin Roundtable in southern Colorado -intra-state platform -regroups various stakeholders such as counties, municipalities, and conservancy districts to identify consensual strategies that meet competing needs in the region, and to provide support and funding for basin projects 29 . The geodatabase maps the boundary of its domain of action.
Groundwater-focused institutions. This dataset maps the local entities and related infrastructures associated with the administration, management, monitoring and conservation of groundwater in the U.S. states of Colorado, New Mexico, and Texas.
• In Colorado, we mapped three levels of groundwater management: the Rio Grande Water Conservation District (RGWCD) that manages groundwater at the regional level (San Luis Valley); the Groundwater Management Subdistricts that manage groundwater at the local level (subdistrict); and the high capacity wells with augmentation/replacement plans. For the regional level, we provided the spatial domain of the RGWCD, the location of unconfined and confined wells that the RGWCD monitors, and the boundaries of the Closed Basin, an unconfined groundwater salvage project maintained by RGWCD that transfers water from the aquifer to the surface water basin.
• The state of New Mexico is divided into 39 declared groundwater basins where the OSE assumes jurisdiction over the appropriation and use of groundwater. We mapped the 18 basins overlapping the RGB. • In Texas, we mapped the domains of the Groundwater Management Areas (GMA), the Groundwater Conservation Districts (GCD) in charge of the implementation of groundwater management plans within the GMA, and the Priority Groundwater Management Areas (PGMA) located in areas with critical groundwater problems.
In Mexico, except for the groundwater permits (Títulos y permisos de aguas nacionales) available in the section Water Use & Hydraulic Infrastructures > Water Rights/Permits, we did not find any comparable administrative or governance mechanism explicitly set up on a spatial basis to manage groundwater per se.  www.nature.com/scientificdata www.nature.com/scientificdata/ At the transboundary level, the U.S. and Mexico also established a binational agreement, the Transboundary Aquifer Assessment Program (TAAP), to strengthen collaborations among Mexican and U.S. institutions to jointly assess priority shared aquifers along the U.S.-Mexico border 30 . Two of the four transboundary priority aquifers underlay the RGB: Mesilla/Conejos-Médanos and Hueco Bolson.
Irrigation distribution organizations. This sub-category maps the boundaries of the organizations that divert and coordinate surface water delivery to irrigators, and ensure maintenance of the irrigation conveyance systems, in both the U.S. and Mexico. Our dataset provides the name and the area managed by the organizations, in addition to other specific information for each state and country. We sought to gather the spatial boundaries of three organizations: the irrigation districts (Distritos de Riego in Mexico), ditch companies, and community ditch associations, an umbrella term under which we include the acequia systems of Colorado and New Mexico, and Unidades de Riego (Irrigation Units) across the states of Mexico. However, spatial datasets of ditch companies and community ditch associations were unavailable for New Mexico and Mexico. Border control. This sub-category maps three key components of border control along the international reach of the Rio Grande/Río Bravo: the U.S. Southwest Border Patrol Sectors associated with the migration apprehension statistics; the border crossing ports of entry; and vehicle and pedestrian fences/barriers. U.S. border control is very active and has important implications for the management of riverbank and riparian areas. However, this research to-date has not covered policies or practices by Mexican border-related agencies that affect water management or flow.

Soil and water conservation districts (included only for the U.S.).
The vector dataset provided in this sub-category map the outlines of the Soil and Water Conservation Districts (SCWDs) in Colorado, New Mexico, and Texas. SCWDs foster voluntary conservation practices among private and public landowners, helping to manage and protect soil, water, forests, and wildlife at the local level. The SWCDs are a specific kind of district, with a mandate that is particular to the legal and institutional history of land and water management in the U.S. Our research to-date has not found anything equivalent to this formation in Mexico, and even less so, any spatial datasets for it.
Collaborative conservation projects. This sub-category includes conservation projects where members of multiple sectors and/or institutions actively work in collaboration to identify, protect, manage, monitor, and/or develop policy for the conservation of specific sites. Our research to-date has not extended to documentation of all such projects in either the U.S. or Mexico, and did not uncover spatial data sets that document their location. Our datasets map three collaborative conservation projects. Two of them, the Landscape Conservation Cooperatives (LCCs) and the North America Bird Conservation Joint Ventures (BJVs), are transboundary. For both, we mapped the boundaries of the four LCCs and the five BJVs that span the RGB. While the U.S. Fish and  www.nature.com/scientificdata www.nature.com/scientificdata/ Wildlife Service recently discontinued its funding for the LCCs network, we decided to include these cooperatives in our dataset since the LCCs generated highly relevant information for the RGB in the past. The third project refers to the Instream Flow Program in Colorado. For this project, we mapped the streams and lakes affected by the Instream Flow Program.
Hydrology (HYD). The Hydrology category includes five surface water and groundwater related sub-categories. River networks. This dataset of vectorized river reaches, derived from the WWF HydroSHEDS products, was also accessed from the FAO geoportal.
Aquifer boundaries. We mapped the aquifer boundaries by assembling two national geospatial datasets compiled by the National System of Water Information of CONAGUA and the USGS. Water quality stations. This dataset provides the location of water quality stations monitored by IBWC and TCEQ for the U.S. portion of the RGB.

Water Use & Hydraulic Infrastructures (USE). Water Use & Hydraulic Infrastructures datasets provide information related to consumption and diversion of water.
Withdrawals. This dataset provides estimates in million cubic meters per year of water used at the U.S. county and MX municipio level. For both countries, estimates are available by type of source (surface or groundwater) and by sector of use (e.g., public supply, domestic, industrial, irrigation, thermoelectricity). For the U.S., the data set also breaks down estimates by fresh water and saline water.
Irrigation. We provide two datasets (one per country), which reports on a county/municipio basis the irrigated area estimated in hectares and the spatial coverage of different systems of irrigation (surface, sprinkler, micro-irrigation and other), estimated in hectares for U.S. and in number of farms for Mexico. For the U.S., we used the original classes (surface, sprinkler, micro-irrigation) and for Mexico, we clustered as follows: lined and earthen canals in surface; sprinkler and micro-sprinkler in sprinkler; and drip in micro-irrigation.  Table 4. List of provided geospatial layers for the Biophysical Environment category including their description, format and input data sources.
www.nature.com/scientificdata www.nature.com/scientificdata/ Water rights/permits. This sub-category displays the allocation of surface water rights and groundwater rights at the state level for U.S. and Mexico. A water right is a property right that conveys the right to use a certain amount of water. These are administered by the Offices of the State Engineer in the U.S. and by CONAGUA in Mexico. The datasets report the geographic coordinates of the water right holder, and when available, the amount of water right authorized (in acre feet in the U.S. and in cubic meters in Mexico), the name of the water right holder, the adjudication date, the rank of priority (i.e., the water right's seniority in a water drainage), or the type of use.
Dams. We provide two vector datasets. The first one, derived from the National Inventory of Dams, maps all dams in the U.S. exceeding a certain height and capacity of storage. The second one maps all dams recorded by the CENAPRED in Mexico.
Water diversion structures. The geodatabase provides three vector datasets of water diversion structures (ditch/ canal, pipeline, underground conduit and connector). The first one covers the five MX states and is derived from a Mexican national data source. The second one covers the three U.S. states and is derived from a U.S. national data source. The third one covers Colorado with more details and is derived from a state data source.
Wells (included only for the U.S.). We compiled four datasets of well locations. One is derived from the National Hydrography Dataset (NHD) for the whole U.S. portion of the RGB; the three other ones were compiled by the OSE for each U.S. state and provides more details. Our research to-date has not uncovered data for wells in Mexico.  www.nature.com/scientificdata www.nature.com/scientificdata/ Land use. Due to a lack of consistent transboundary datasets, we gathered national datasets to map land-use in the RGB. For the U.S., we used the yearly USDA Cropland Data Layer, a high-resolution crop-specific land-use image, available from 2008 to 2018 at a 30-meter resolution. For Mexico, we derived our datasets from two governmental datasets. The first one is a spatial dataset of land uses for 2011 and 2014 at a 30-meter resolution, with a coarse classification of crops (rain-fed vs irrigated crops). The second one, non-spatial, is derived from the 2007 Census and provides on a municipio basis planted area estimates (in hectares) of annual spring and winter crops and perennial crops.

Socio-Economics (SE). This category includes
Data gaps. Although we developed an extensive database, it is not comprehensive for either the U.S. or Mexico. Our approach led us to identify several spatial data gaps for socio-environmental research in the RGB.
Yearly crop-specific datasets for Mexico. Supplying yearly crop-specific time-series datasets at a 30-m resolution will be helpful for comparing cropland dynamics in the U.S. and Mexico.

Boundaries of community ditch and acequia associations (in NM) and Unidades de Riego (MX).
To the authors' knowledge, these spatial boundaries are not publicly available. Mapping them is important, because of their water management implications. The NMOSE is currently working on updating and assembling this information for New Mexico 31 .
Land ownership. For Mexico, we did not find any spatial datasets delineating public lands, except the Certified Ejidos and Communal Lands. Generating a dataset that differentiates, in more detail, public from private lands in Mexico would be valuable for land and water management research.
Groundwater wells (MX). We were not able to access a spatial dataset of wells for Mexico. Mapping well locations is of primary interest for surface and groundwater management in the basin.
Binational infrastructures. Some information on binational infrastructure is missing, such as the Morillo drain (described in Supplementary Information 1).
Climate records are also critical for understanding hydrological processes in the basin. While not included in our geodatabase, Livneh and others 11 published a transboundary historic dataset that derives observed daily precipitation, minimum and maximum temperature gridded to a 1/16° (~6 km) resolution for the time period 1950-2013. www.nature.com/scientificdata www.nature.com/scientificdata/

technical Validation
To ensure the reliability of the spatial database, we draw on a critical assessment of three factors: the quality of the source of information, transboundary discontinuities, and geoprocessing.
Regarding the sources of information, we sought to download most of the raw datasets from official sources such as intergovernmental organizations and governmental agencies (e.g., FAO, USIBWC, USGS, USDA,  CONAGUA, INEGI, CONABIO, CONANP, CODWR, NMOSE, TCEQ) or from joint projects overseen by reputable universities, research institutes, and non-governmental organizations (e.g., from the NASA Socioeconomic Data and Applications Center, the North American Environmental Atlas). We compiled a few datasets (around 6%) from private entities (associations, districts) as they were the only source of the desired information. For example, this is the case for the border fences. We downloaded the spatial data from Reveal from The Center for Investigative Reporting and OpenStreetMap contributors (https://github.com/cirlabs/border_fence_map) who traced the fences along the border by using open-source mapping tools and by digitizing detailed PDF maps obtained through a Freedom of Information Act request 32 . The creators of the data set acknowledge slight differences with the official government agency numbers. However, this is the most detailed border fence map publicly available. We also gathered four datasets -that were not downloadable online -by email requests from three organizations. The RGWCD provided the boundaries of the RGWCD and the Groundwater Management Subdistricts #1, #2, #3, #4, #5, #6 in Colorado. The CSCB (Colorado State Conservation Board) and the TSSWCB (Texas State Soil and Water Conservation Board) shared the boundaries of the SWCDs of Colorado and Texas, respectively. Although these datasets were not gathered from governmental sources, we believe that these datasets are the best available and we rely on the accuracy of data collection of the three organizations.
Whenever possible, we used global or bi-national input datasets to avoid inconsistencies related to differences in resolution, type of information, or geoprocessing methods among counties and states. Around 11.5% of our raw datasets have a global or bi-national spatial extent, and therefore cross international and state boundaries. Nevertheless, a significant amount of the raw datasets we gathered were nation-wide (70%) or state-wide (18.5%), which could lead to some of the inconsistencies previously mentioned. All information gathered from Mexican sources was translated from Spanish to English and verified by a bilingual speaker.
Regarding the geoprocessing operations, for all datasets, we used several tools and functions available on ArcGIS, such as Project, Clip, Dissolve, Slope, Join, Merge and we rely on the accuracy of this software to create the final outputs. A series of quality checks has been carried out to catch as many potential coding errors as possible. Attributes were checked by using visual inspection. For quality control, transparency, and reproducibility, processing steps are thoroughly detailed in the metadata of each dataset.

Usage Notes
This spatial database offers a wide range of applications for transboundary and interdisciplinary research. We provide three examples drawing from our own research.
1. A broad range of mapping products can be created for the whole RGB or sub-sections, such as a map of the main irrigation districts and dams in the RGB (Fig. 2), a map of land managers and protected areas (Fig. 3), and a comparative map of water use and population on a county/municipio basis (Fig. 4). 2. The geodatabase can support spatially explicit and integrated modelling for the RGB 27 . We are currently developing a simulation model for the RGB that draws on water and land governance datasets (political jurisdiction boundaries, land management and protected areas, irrigation district outlines), as well as hydrologic and biophysical environment datasets (river network, elevation, water bodies, gauges, land-use), socio-economic (population), and water use and hydraulic infrastructures (dams). 3. The geodatabase can be used to conduct spatiotemporal and geostatistical analyses. Examples of interdisciplinary topics include: (i) evaluating whether water use correlates with environmental variables (ecoregions, land use), socio-economic factors (population), or governance (e.g., related to land ownership, irrigation district, or state); and (ii) assessing land-use change at multiple scales (local, state, national). For example, we are currently analysing the spatial patterns of land fallowing in the basin.
Users of the compiled datasets should cite this data paper following the recommended citation format of the journal. Furthermore, we encourage users to refer to the sections "Credits" and "Use limitations" in the metadata of each layer file, as additional data constraints and required credits of the original data sources may apply.
In conclusion, to achieve more sustainable transboundary water management, it is necessary to approach problems from a boundary-spanning and interdisciplinary perspective. By creating a basin-wide, socio-environmental geodatabase for the RGB, we address this issue and support further interdisciplinary research in the basin. The main novelty of our approach was to include geospatial datasets mapping public and private water governance, which contributes to better understanding of scale interactions and mismatches between ecological and social (decision-making) boundaries 33 . Moreover, we were able to identify data gaps, especially for the Mexican side of the RGB, and therefore the need for further research to reduce data imbalance between the two sides of this transboundary basin. Even though our study is limited to the socio-environmental boundaries of the RGB system and the typology of actors is specific to the shared basins between U.S. and Mexico, we consider that our approach to assembling a wide range of data is transferable to other shared river basins to serve transboundary and cross-disciplinary collaboration and research.