The changing face of floodplains in the Mississippi River Basin detected by a 60-year land use change dataset

Floodplains provide essential ecosystem functions, yet >80% of European and North American floodplains are substantially modified. Despite extensive floodplain change over the past century, comprehensive long-term land use change data for the floodplains of large river basins are limited. Such data can be used to quantify floodplain functions and provide spatially explicit information for management, restoration, and flood-risk mitigation. We present a comprehensive dataset quantifying floodplain land use change across the 3.3 million km² Mississippi River Basin (MRB), covering 60 years (1941–2000) at 250-m resolution. As part of this work, we also developed four products: (i) a Google Earth Engine interactive map visualization interface, (ii) Python code that runs in any internet browser, (iii) an online tutorial with visualizations that facilitates classroom use of the code, and (iv) an instructional video demonstrating how to apply the code and reproduce the dataset. Our data show that the MRB's natural floodplain ecosystems have been substantially converted to agricultural and developed land uses. These products will support MRB resilience and sustainability goals by advancing data-driven decision making on floodplain restoration, buyout, and conservation scenarios.

Measurement(s): land use process; Technology Type(s): Geographic Information System; Factor Type(s): floodplain land use change; Sample Characteristic - Environment: flood plain; Sample Characteristic - Location: Mississippi River Basin.

Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.14804514


Background & Summary
Riverine floodplains are vital and productive ecosystems that provide essential biological, geomorphic, and hydrologic functions [1,2]. Services provided by floodplains, including regulation of disturbances (e.g., flood attenuation), water supply, and waste treatment, are valued at approximately US$1.5 × 10¹² yr⁻¹ globally (in 2007 US$) [3]. Yet floodplains are continually threatened by human development and encroachment, including loss of floodplain-river connectivity due to channelization and levee construction [4], which exacerbates habitat loss [5] and hydrologic alteration [6].
Human modifications to floodplains include land use changes driven by urbanization, agriculture, industry, and mining [7]. For instance, approximately 80–90% of floodplains across Europe have been intensively cultivated [8], and 90% of floodplains in North America are non-functional due to cultivation [9]. New development in floodplains exposes a growing share of the United States population to flooding [10], and even floods with a 1% annual chance of occurrence can cause losses exceeding $78 billion per year in the US [11].
Flood-risk management over the previous century focused on minimizing flood impacts on humans through large, expensive infrastructure projects [12], at the expense of floodplain ecosystem health and resilience. However, programs such as floodplain buyouts and conservation can produce co-benefits for both economies and floodplain ecosystems [13]. Such programs require a comprehensive understanding of the history of floodplain change along the full river continuum to ensure sustainable and effective floodplain and flood-risk management [14,15].
Despite the human-induced changes in floodplains over the past century, comprehensive data on long-term land use change within the floodplains of large river basins are limited [16]. A recent large-scale study assessed floodplain conditions across England from 1900 to 2015 using land use data [17]. Others have focused on smaller geographic extents and shorter time scales, such as floodplain losses in Dhaka, Bangladesh [18] and Kumasi, Ghana [19]. To our knowledge, no studies have integrated long-term (>30-year) data to examine changes in floodplain land use across a large river basin. Long-term, large-scale floodplain land use data are required (1) to effectively quantify floodplain functions and development trajectories, and (2) to provide a holistic perspective on the future of floodplain management and restoration and, concomitantly, flood-risk mitigation.
Here, we present what is, to our knowledge, the first dataset that quantifies land use change across the floodplains of the Mississippi River Basin (MRB), covering 60 years (1941–2000) at 250-m resolution. The MRB is the fourth-largest river basin in the world (3,288,000 km²), comprising 41% of the United States and draining into the Gulf of Mexico, where a hypoxic zone expands and contracts annually as a result of basin-wide nutrient over-enrichment [20]. The basin is one of the most engineered river systems in the world, with a complex web of dams, levees, floodplains, and dikes. This new dataset reveals the spatially heterogeneous extent of land use transitions in MRB floodplains. The floodplains have transitioned from natural ecosystems to predominantly agricultural land use (e.g., more than 10,000 km² of wetland loss due to agricultural expansion; Fig. 1). Developed land use within the floodplains has also increased steadily (Fig. 2). These irreversible transitions in floodplain composition reduce the storage [21] and conveyance [22] of natural flow, amplify flood risks posed by climate change [23,24], and harm both ecosystems and human well-being [25].
To maximize the reuse of this dataset, we also provide four products: (i) a Google Earth Engine interactive interface mapping MRB floodplain land use change over 60 years, (ii) Python code, hosted on a Google platform, that runs in any internet browser, (iii) an online tutorial with visualizations facilitating classroom application of the code, and (iv) an instructional video showing how to run the code and partially reproduce the dataset. We share all data through HydroShare: https://doi.org/10.4211/hs.41a3a9a9d8e54cc68f131b9a9c6c8c54. The 60 years of spatially explicit floodplain land use change data produced here can support flood-risk and nutrient management and research across the 31 US states that drain into the MRB. A recent strategic plan of the Upper Mississippi River Restoration partnership, representing 0.5 million km² of the MRB, envisions "a healthier and more resilient Upper Mississippi River ecosystem that sustains the river's multiple use" [26]. These data will help achieve such goals by providing foundational information for data-driven decision making on floodplain restoration, buyouts, and conservation. Importantly, the data and associated materials can serve as a template for developing similar datasets for other river basins across the globe.

Methods
Input data sources. We derived the 60-year MRB floodplain land use change dataset from two input data sources: (i) the high-resolution global floodplain extent dataset GFPLAIN250m developed by Nardi et al. [27], and (ii) the annual continental United States land use data developed by USGS [28,29]. GFPLAIN250m is based on a geomorphic analysis of a Digital Terrain Model (DTM) that identifies riparian areas lying below maximum flood levels. The GFPLAIN algorithm estimates distributed flood energy gradients at the river basin scale with a simplified hydrologic model that assigns each channel cell a maximum flood depth, using drainage area as the scaling variable [30,31]. Conceptually, the algorithm separates floodplains from the surrounding hillslopes by identifying the low-lying landscape features that have been naturally shaped by the accumulated geomorphic effects of past flood events. The GFPLAIN250m dataset is therefore built on the principle that the flood-prone area is implicitly contained in the DTM, which decouples floodplain extent zoning from the need to first define a design flood event. The outcome is a DTM-based morphometric index of floodplain domains rather than a simulation associated with a specific return period (e.g., 100-year) [32]. The dataset is publicly available in gridded GeoTIFF format at 250-m spatial resolution, with coordinates referenced to the World Geodetic System 1984 (WGS84).
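The drainage-area scaling step can be illustrated on a single cross-section. The sketch below is a simplified, one-dimensional illustration and not the GFPLAIN implementation: it assumes a power-law depth relation h = a·A^b, and the function names, the coefficients a and b, and the transect values are hypothetical choices for demonstration. A cell is labeled floodplain if it is connected to the channel and lies no higher than the channel elevation plus h.

```python
def max_flood_depth(drainage_area_km2: float, a: float = 0.1, b: float = 0.4) -> float:
    """Hypothetical power-law scaling h = a * A**b (h in m, A in km^2)."""
    return a * drainage_area_km2 ** b


def floodplain_mask(elevations: list[float], channel_index: int,
                    drainage_area_km2: float, a: float = 0.1, b: float = 0.4) -> list[bool]:
    """Flag transect cells connected to the channel and below the flood stage."""
    stage = elevations[channel_index] + max_flood_depth(drainage_area_km2, a, b)
    mask = [False] * len(elevations)
    mask[channel_index] = True
    # Walk outward from the channel in both directions and stop at the first
    # cell that rises above the flood stage (i.e., a hillslope or ridge).
    for step in (-1, 1):
        i = channel_index + step
        while 0 <= i < len(elevations) and elevations[i] <= stage:
            mask[i] = True
            i += step
    return mask


# Elevations (m) along a hypothetical cross-section; the channel is at index 4.
transect = [130.0, 112.0, 105.0, 101.0, 100.0, 103.0, 108.0, 115.0, 104.0, 140.0]
print(floodplain_mask(transect, channel_index=4, drainage_area_km2=100_000))
# → [False, False, True, True, True, True, True, False, False, False]
```

With A = 100,000 km², the assumed coefficients give h = 10 m, so the stage is 110 m. The low cell at index 8 (104 m) is excluded because the 115-m ridge separates it from the channel, which mirrors the idea of separating connected low-lying land from the surrounding hillslopes.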
The USGS land use data are based on a spatially explicit modeling framework that reconstructed temporally continuous land use from widely acknowledged baselines, including National Land Cover Database (NLCD) products [33–35], more than 100 years of agricultural census information [36], and three decades of representative satellite imagery [37]. The dataset is publicly available in gridded GeoTIFF format at 250-m spatial resolution, with coordinates in the USA Contiguous Albers Equal Area Conic (USGS version) projected coordinate system. This land use dataset is divided into two parts: a 14-class historical land use map for each year from 1938 to 1992 [28] …
Procedure. Our methodology followed six consecutive steps (Fig. 3): (i) reprojection of floodplain and land use data to a consistent coordinate system, (ii) land use reclassification, (iii) extraction of floodplain land use, (iv) land use change detection, (v) formation of an inter-class land transition matrix, and (vi) technical validation. These steps are discussed in detail below. All associated tasks were performed in the ArcGIS 10.5 and ENVI 5.1 geospatial analysis platforms.
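Steps (iii) to (v) amount to masking both land use maps with the floodplain extent and cross-tabulating the class pairs. The sketch below is an illustrative stand-in for the ArcGIS/ENVI workflow, not the authors' code: it assumes the two land use grids are already reprojected, aligned cell-for-cell with the floodplain grid, and reclassified (steps i and ii), and it uses plain nested lists with hypothetical class labels so that it runs without GIS libraries. Each 250-m cell covers 0.25 km × 0.25 km = 0.0625 km².

```python
from collections import Counter

CELL_AREA_KM2 = 0.25 * 0.25  # one 250-m x 250-m cell = 0.0625 km^2


def transition_matrix(lu_t1, lu_t2, floodplain, classes):
    """Return {from_class: {to_class: area_km2}} for floodplain cells only."""
    counts = Counter()
    for row1, row2, row_fp in zip(lu_t1, lu_t2, floodplain):
        for c1, c2, inside in zip(row1, row2, row_fp):
            if inside:  # step (iii): keep cells inside the floodplain extent
                counts[(c1, c2)] += 1
    # Step (v): cross-tabulate; diagonal entries are persistence and
    # off-diagonal entries are the changes detected in step (iv).
    return {c1: {c2: counts[(c1, c2)] * CELL_AREA_KM2 for c2 in classes}
            for c1 in classes}


def net_change(matrix, cls):
    """Area gained minus area lost by one class between T1 and T2."""
    gained = sum(row[cls] for src, row in matrix.items() if src != cls)
    lost = sum(area for dst, area in matrix[cls].items() if dst != cls)
    return gained - lost


classes = ["agriculture", "developed", "forest", "wetland"]
t1 = [["wetland", "wetland"], ["forest", "agriculture"]]
t2 = [["agriculture", "wetland"], ["developed", "agriculture"]]
fp = [[True, True], [True, False]]  # the bottom-right cell lies outside the floodplain
m = transition_matrix(t1, t2, fp, classes)
print(m["wetland"]["agriculture"], net_change(m, "wetland"))  # → 0.0625 -0.0625
```

For real rasters the same logic would typically be written with array operations, but the bookkeeping is identical: rows of the matrix are the T1 classes, columns the T2 classes, and each entry is an area.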
(i) Reprojection of coordinate systems: The two primary inputs used in our approach, i.e., the global floodplain and annual USGS land use data, were originally developed in two different coordinate systems. Because non-identical coordinate systems across datasets induce positional errors in floodplain geospatial analysis, especially across large river basins [38,39], we reprojected the global floodplain data to the coordinate system of the USGS land use data so that the two datasets are interoperable.
(ii) Land use reclassification: Commonly used land use datasets follow classification schemes (i.e., categorizations of the intended purpose of a landscape parcel [40]) with multiple levels of nested hierarchy [41]. While a detailed land use classification offers critical insights for environmental monitoring and restoration research [42,43], the large semantic variability of land use classifications across disciplines often complicates their practical …
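Step (i) can be illustrated with the forward equations of the Albers equal-area conic projection. The sketch below uses the spherical form of the equations (as given by Snyder) with the standard parallels (29.5°N, 45.5°N), latitude of origin (23°N), and central meridian (96°W) commonly published for the USA Contiguous Albers Equal Area Conic (USGS version) system. It ignores the ellipsoid and datum, so its coordinates can differ noticeably from an ellipsoidal transformation; it only shows the geometry, and production work should rely on a full projection library such as PROJ. Reprojecting a raster additionally requires resampling, and categorical grids are typically resampled with nearest-neighbour assignment so that class codes are not blended.

```python
import math

EARTH_RADIUS_M = 6_371_007.0  # authalic sphere radius; stands in for the GRS80 ellipsoid


def albers_conus_spherical(lon_deg, lat_deg, lat1=29.5, lat2=45.5,
                           lat0=23.0, lon0=-96.0, radius=EARTH_RADIUS_M):
    """Forward spherical Albers equal-area conic projection; returns (x, y) in metres."""
    phi1, phi2, phi0 = (math.radians(v) for v in (lat1, lat2, lat0))
    phi = math.radians(lat_deg)
    delta_lon = math.radians(lon_deg - lon0)
    n = (math.sin(phi1) + math.sin(phi2)) / 2  # cone constant
    c = math.cos(phi1) ** 2 + 2 * n * math.sin(phi1)
    rho = radius * math.sqrt(c - 2 * n * math.sin(phi)) / n
    rho0 = radius * math.sqrt(c - 2 * n * math.sin(phi0)) / n
    theta = n * delta_lon
    return rho * math.sin(theta), rho0 - rho * math.cos(theta)


# A longitude/latitude pair for a hypothetical point near St. Louis, Missouri.
x, y = albers_conus_spherical(-90.2, 38.6)
print(f"x = {x:,.0f} m, y = {y:,.0f} m")
```

By construction, the projection origin (96°W, 23°N) maps to (0, 0), and points east of the central meridian receive positive x.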

Data Records
The MRB floodplain land use change dataset is available through the open-access geospatial data sharing platform HydroShare. Our archive also includes all corresponding input data, intermediate calculations, and supporting information. Tables 2 and 3 provide an overview of the file contents. The entire archive can be downloaded as a single zip file from https://doi.org/10.4211/hs.41a3a9a9d8e54cc68f131b9a9c6c8c54 [54].

Technical Validation
To ensure the technical quality and reliability of our MRB floodplain land use change dataset, we validated both GFPLAIN250m floodplain and USGS land use datasets with respect to the best available references.
Although the GFPLAIN250m floodplain [27] has been previously validated across different scales [55], we further compared it to the Global Flood Maps (GFM) [56] across the MRB Hydrologic Unit Codes (HUCs). We used the critical success index (CSI) and true positive rate (TP rate) metrics to confirm the spatial consistency between the GFPLAIN250m and GFM floodplain extents (Fig. 5). Compared with GFM, GFPLAIN tends to produce larger floodplain delineations in the headwater regions of the MRB (Fig. 5a). This was also evident in Fig. 5b, which … [57]. TP rates (Fig. 5c) were greater than 0.8 for most HUCs, indicating high overlap between the GFPLAIN250m and GFM datasets. These assessments demonstrate that the GFPLAIN250m data are suitable for identifying continuous floodplains.
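Both agreement metrics can be computed from a cell-wise contingency table. The sketch below uses the standard definitions, CSI = TP / (TP + FP + FN) and TP rate = TP / (TP + FN). It assumes that GFPLAIN250m is treated as the tested map and GFM as the reference, and the binary grids are hypothetical flattened examples. Because CSI penalizes false positives and the TP rate does not, a delineation that is systematically wider than the reference, as GFPLAIN is in the headwaters, lowers CSI while leaving the TP rate high.

```python
def contingency(tested, reference):
    """Count true positives, false positives, and false negatives cell by cell."""
    tp = sum(1 for t, r in zip(tested, reference) if t and r)
    fp = sum(1 for t, r in zip(tested, reference) if t and not r)
    fn = sum(1 for t, r in zip(tested, reference) if r and not t)
    return tp, fp, fn


def critical_success_index(tested, reference):
    tp, fp, fn = contingency(tested, reference)
    return tp / (tp + fp + fn)


def true_positive_rate(tested, reference):
    tp, _, fn = contingency(tested, reference)
    return tp / (tp + fn)


# 1 = floodplain, 0 = not floodplain (hypothetical flattened grids for one HUC).
gfplain = [1, 1, 1, 1, 0, 0, 1, 0]
gfm = [1, 1, 1, 0, 0, 0, 0, 1]
print(critical_success_index(gfplain, gfm), true_positive_rate(gfplain, gfm))  # → 0.5 0.75
```

Here the two extra GFPLAIN cells (false positives) pull CSI down to 0.5, while the TP rate stays at 0.75 because it only counts how much of the reference floodplain is captured.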
We validated the USGS land use data using the European Space Agency (ESA) Climate Change Initiative (CCI) data. The CCI data include a time series of consistent global land use maps at 300-m spatial resolution on an annual basis from 1992 to 2019 [58,59] (hereafter, we refer to the CCI land use as the remotely sensed land use for simplicity). In contrast, the 250-m USGS land use maps were based on hindcast modeling derived from the NLCD, Landsat satellite imagery, and county-level agricultural census data. We chose the CCI (remotely sensed) dataset as the reference because it was developed by a different agency from different data sources, which ensured an independent validation of our land use change estimates (see Fig. 6).
Table 3. Output dataset file descriptions. Note: All GIS raster and shapefile datasets are in the Albers Equal Area Conic projected coordinate system. *The raster datasets are in GeoTIFF format at 250-m spatial resolution (except Table 2 item #4).
www.nature.com/scientificdata/
The USGS land use contains 17 classes (Supplementary Table 2), whereas the remotely sensed land use contains 37 classes (Supplementary Table 3). To make these two datasets comparable, we reclassified them into seven generic classes: open water, developed area, barren land, forest, grassland, agriculture, and wetland (see Supplementary Tables 2, 3). In addition, we reprojected the CCI data from the World Geodetic System 1984 (WGS84) to the USA Contiguous Albers Equal Area Conic projected system to enable a uniform comparison with the USGS data. After the reclassification and reprojection steps, we selected 12 sites to validate our MRB floodplain land use change dataset. These validation sites were chosen objectively to represent different geophysical settings across the MRB as well as different stream orders, including both major rivers and lower-order tributaries. Within the common period of data availability of the USGS (input) and remotely sensed (reference) datasets, we conducted validations for the 12 selected sites from 1992 to 2000, with a different year randomly assigned to each validation site. Note that the comparison was conducted at an aggregated level for each site. We did not evaluate cell-by-cell correlations for two reasons. First, the two datasets were developed at different spatial resolutions (250 m and 300 m). Second, the respective definitions of land classes are not identical across the two datasets, although we reclassified them to bring some degree of consistency. Three validation sites are shown in Fig. 6, and the other nine are shown in Supplementary Figs. 2–4.
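The reclassification and site-level aggregation described above can be sketched as a crosswalk lookup followed by class shares. The crosswalk entries below are hypothetical placeholders (the actual mappings are given in Supplementary Tables 2 and 3), except that hay/pasture maps to grassland, as stated in the text; the function and variable names are also illustrative. Working with per-site class shares rather than individual cells is what allows maps on 250-m and 300-m grids to be compared.

```python
from collections import Counter

GENERIC_CLASSES = ("water", "developed", "barren", "forest",
                   "grassland", "agriculture", "wetland")

# Hypothetical crosswalk from source labels to the seven generic classes.
USGS_CROSSWALK = {
    "Water": "water",
    "Developed": "developed",
    "Barren": "barren",
    "Mining": "barren",
    "Deciduous Forest": "forest",
    "Grassland": "grassland",
    "Hay/Pasture": "grassland",  # the one mapping stated in the text
    "Cropland": "agriculture",
    "Woody Wetland": "wetland",
}


def class_shares(cell_labels, crosswalk):
    """Reclassify every cell, then return each generic class's share of the site."""
    # An unmapped label raises KeyError, which exposes gaps in the crosswalk.
    counts = Counter(crosswalk[label] for label in cell_labels)
    total = sum(counts.values())
    return {cls: counts[cls] / total for cls in GENERIC_CLASSES}


site = ["Cropland", "Cropland", "Hay/Pasture", "Water"]
print(class_shares(site, USGS_CROSSWALK))
```

The same function would be applied to the remotely sensed labels with a second crosswalk, yielding two seven-element share vectors per site that can then be compared directly.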
2. Semi-automatic coding framework (https://colab.research.google.com/drive/1vmIaUCkL66CoTv4rNRIWpJXYXp4TlAKd?usp=sharing): A ready-to-use Python code that operates entirely in Google Colaboratory, Google's web-based high-performance programming platform.
• Allows users to reproduce the MRB floodplain land use change dataset (up to item #5 listed in Table 3); users can run the code in a web browser without writing any new code or setting up a programming environment on their local computers.
• Not limited to change detection; an end-to-end workflow that performs all data discovery, download, and pre-processing tasks in a semi-automatic manner.
• Scalable to the floodplains of any of the world's major river basins. In addition to the specific land use input [28,29] used in our work to generate the MRB floodplain land use change dataset, the current version of our code can also assimilate land use input from two other sources: 30-m National Land Cover Database (United States) [33] …
The validation results for the 12 selected sites show high correlations between the USGS and remotely sensed data across all land use classes, with R² ranging from 0.90 to 0.99. Among the seven land use classes, agricultural land and grassland show the largest discrepancies between the USGS and remotely sensed data, largely because of potential inconsistencies between the class definition schemes of the two datasets. The original USGS hay/pasture class was designated as grassland, whereas the remotely sensed dataset has no hay/pasture class but instead a cropland with herbaceous cover class, which was designated as agriculture in our simplified classifications (Supplementary Tables 2, 3). The other five land use classes are highly consistent across all validation sites. Overall, these validation results indicate that our input data, and accordingly our output datasets, are sufficiently reliable.
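The R² values reported above summarize how closely the two datasets agree on per-class amounts at a site. The sketch below assumes that R² is the squared Pearson correlation between the USGS and remotely sensed class shares, with one point per generic class; this section does not spell out the exact regression setup, and the share values are made up for illustration.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical class shares at one site, ordered as
# water, developed, barren, forest, grassland, agriculture, wetland.
usgs = [0.05, 0.04, 0.00, 0.20, 0.15, 0.50, 0.06]
cci = [0.06, 0.03, 0.01, 0.22, 0.08, 0.55, 0.05]
print(round(r_squared(usgs, cci), 3))
```

Note that a high R² across classes can coexist with a sizeable mismatch in one pair of classes (here grassland versus agriculture), which is the pattern the validation describes for hay/pasture.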

Usage Notes
To ensure that the MRB floodplain land use change dataset, relevant inputs, and underlying methodology are Findable, Accessible, Interoperable, and Reusable (FAIR) [60], we developed four software solutions and educational products (Table 4). Besides helping other researchers reuse our dataset, these products can also foster new research on floodplain resilience by enabling efficient analysis of floodplain land use change in any of the world's major river basins.

Code availability
The MRB floodplain land use change dataset was derived entirely using the ArcGIS 10.5 and ENVI 5.1 geospatial analysis platforms (see Methods section for details). However, to promote reproducibility and widespread application of the dataset, we developed additional open-access code and visualization interfaces. The Python code is accessible at https://colab.research.google.com/drive/1vmIaUCkL66CoTv4rNRIWpJXYXp4TlAKd?usp=sharing. The visualization interface is available online at https://gishub.org/mrb-floodplain. See the Usage Notes section for details.

… [58], while a2, b2, and c2 show the land use data obtained from the USGS land modeling framework [28,29]. The remotely sensed land use (a1–c1) was our reference to validate the spatial consistency of the USGS land use (a2–c2; the input land use data in our methodology). Subplots a3–c3 show the correlation between the remotely sensed and USGS datasets across different land use classes within the given spatial domains. The generic land use classes are water, developed, barren, forest, grassland, agriculture, and wetland (abbreviated as Wat, Dev, Barr, For, Grass, Ag, and Wet, respectively).