## Background & Summary

Riverine floodplains are vital and productive ecosystems that provide essential biological, geomorphic, and hydrologic functions1,2. Services provided by floodplains – including regulation of disturbances (e.g., flood attenuation), water supply, and waste treatment – are valued at approximately US$1.5 × 10¹² yr⁻¹ globally (in 2007 US$)3. Yet floodplains are continually threatened by human development and encroachment, including loss of floodplain-river connectivity due to channelization and levee construction4, which exacerbates habitat loss5 and hydrologic alteration6.

Human modifications to floodplains include changes in land use from activities such as urbanization, agriculture, industry, and mining7. For instance, approximately 80–90% of floodplains across Europe have been intensively cultivated8, and 90% of floodplains in North America are non-functional due to cultivation9. New developments in floodplains expose an increased population in the United States to flooding10, and even a 1% chance of flooding can cause losses exceeding \$78 billion per year in the US11.

Flood-risk management efforts of the previous century have focused on minimizing flood impacts on humans through large and expensive infrastructure projects12 at the expense of floodplain ecosystem health and resilience. However, programs such as floodplain buyouts and conservation can produce co-benefits for economies and floodplain ecosystems13. Yet they require a comprehensive understanding of the history of floodplain changes along the full river continuum to ensure sustainable and effective floodplain and flood-risk management14,15.

Despite the human-induced changes in floodplains over the past century, comprehensive data on long-term land use change within floodplains of large river basins are limited16. A recent large-scale study assessed floodplain conditions across England from 1900 to 2015 using land use data17. Others have focused on changes across smaller geographic expanses and shorter time scales, such as floodplain losses in Dhaka, Bangladesh18 and Kumasi, Ghana19. No studies, to our knowledge, have integrated long-term (>30-year) data to examine changes in floodplain land use across a large river basin. Data on long-term and large-scale floodplain land use are required (1) to effectively quantify floodplain functions and development trajectories, and (2) for a holistic perspective on the future of floodplain management and restoration and, concomitantly, flood-risk mitigation.

Here, we present what is, to our knowledge, the first available dataset that quantifies land use change along the floodplains of the Mississippi River Basin (MRB) covering 60 years (1941–2000) at 250-m resolution. The MRB is the fourth largest river basin in the world (3,288,000 km²) comprising 41% of the United States and draining into the Gulf of Mexico, an area with an annually expanding and contracting hypoxic zone resulting from basin-wide over-enrichment of nutrients20. The basin represents one of the most engineered systems in the world, and includes a complex web of dams, levees, floodplains, and dikes. This new dataset reveals the heterogeneous spatial extent of land use transitions in MRB floodplains. The floodplains have transitioned from natural ecosystems to predominantly agricultural land use (e.g., more than 10,000 km² of wetland loss due to agricultural expansion; Fig. 1). Developed land use within floodplains has also steadily increased (Fig. 2). These irreversible transitions in floodplain composition reduce storage21 and conveyance22 of natural flow, amplify flood risks posed by climate change23,24, and hinder both ecosystems and human well-being25.

To maximize the reuse of this dataset, we also include four unique products: (i) a Google Earth Engine interactive interface mapping MRB floodplain land use change over 60 years, (ii) a Google-based Python code that runs in any internet browser, (iii) an online tutorial with visualizations facilitating classroom application of the code, and (iv) an instructional video showing how to run the code and partially reproduce the dataset. We share all data through HydroShare: https://doi.org/10.4211/hs.41a3a9a9d8e54cc68f131b9a9c6c8c54.

The 60 years of spatially explicit floodplain land use change data produced herein are usable for flood-risk and nutrient management and research across the 31 US states that drain the MRB. A recent strategic plan of the Upper Mississippi River Restoration partnership, representing 0.5 million km² of the MRB, envisions “a healthier and more resilient Upper Mississippi River ecosystem that sustains the river’s multiple use”26. These data will help achieve these goals by providing foundational information for data-driven decision making on floodplain restoration, buyouts, and conservation. Importantly, the data and associated materials can serve as a template for developing similar datasets for other river basins across the globe.

## Methods

### Input Data Sources

We derived the 60-year MRB floodplain land use change dataset from two input data sources: (i) the high-resolution global floodplain extent dataset GFPLAIN250m developed by Nardi et al.27, and (ii) the annual continental United States land use data developed by USGS28,29. GFPLAIN250m is based on a geomorphic analysis of the Digital Terrain Model (DTM) identifying riparian areas underlying maximum flood levels. The GFPLAIN algorithm estimates distributed flood energy gradients, at the river basin scale, with a simplified hydrologic model that assigns every channel cell a maximum flood depth using the drainage area as a scaling variable30,31. Conceptually, this algorithm dissects floodplains from surrounding hillslopes as those low-lying landscape features that have been naturally shaped by accumulated geomorphic effects of past flood events. Therefore, the GFPLAIN250m dataset is built on the principle that a flood-prone area is implicitly contained in the DTM, decoupling the floodplain extent zoning from the need to preliminarily define a design flood event. The outcome is a DTM-based morphometric indexing of floodplain domains rather than a simulation associated with a specific return period, e.g., 100-year32. The dataset is publicly available at 250-m spatial resolution in gridded GeoTIFF format, with coordinates set by World Geodetic System 1984 (WGS84).
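The geomorphic criterion can be illustrated on a single valley cross-section. The following is a toy sketch, not the GFPLAIN implementation: it assumes a power-law scaling h = a·A^b between drainage area and maximum flood depth, and the function names, the coefficients `a` and `b`, and the elevations are hypothetical values chosen only for illustration.

```python
def max_flood_depth(drainage_area_km2, a=0.5, b=0.3):
    """Assumed power-law scaling of maximum flood depth (m) with drainage area (km2).

    The coefficients a and b are hypothetical, not the values used by GFPLAIN.
    """
    return a * drainage_area_km2 ** b


def floodplain_cells(elevations_m, channel_index, drainage_area_km2, a=0.5, b=0.3):
    """Flag cross-section cells at or below the channel cell's maximum flood level."""
    flood_level = elevations_m[channel_index] + max_flood_depth(drainage_area_km2, a, b)
    return [z <= flood_level for z in elevations_m]


# Hypothetical cross-section: hillslope, terrace, channel, terrace, hillslope.
cross_section_m = [105.0, 102.0, 100.0, 101.5, 108.0]
flags = floodplain_cells(cross_section_m, channel_index=2, drainage_area_km2=1000.0)
print(flags)
```

No design flood or return period enters the calculation; the only inputs are terrain elevations and drainage area. A simple elevation threshold like this one does not check hydraulic connectivity to the channel, which a full terrain analysis would need to handle.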

The USGS land use data are based on a spatially explicit modeling framework which reconstructed a temporally continuous land use from widely acknowledged baselines including the National Land Cover Databases (NLCD)33,34,35, more than 100 years of agricultural census information36, and three decades of representative satellite imagery37. The dataset is publicly available at 250-m spatial resolution in gridded GeoTIFF format, with coordinates set by USA Contiguous Albers Equal Area Conic USGS version projected system. This land use dataset is divided into two parts: a 14-class historical land use for each year from 1938 to 1992 (ref. 28) and a 17-class recent land use for each year from 1992 to 2005 (ref. 29). We included the land use from 1941 to 2000 in our approach to develop the 60-year MRB floodplain land use change dataset.

### Procedure

Our methodology followed six consecutive steps (Fig. 3): (i) reprojection of floodplain and land use data to a consistent coordinate system, (ii) land use reclassification, (iii) extraction of floodplain land use, (iv) land use change detection, (v) formation of inter-class land transition matrix, and (vi) technical validation. These steps are discussed in detail below. All the associated tasks were performed in ArcGIS 10.5 and ENVI 5.1 geospatial analysis platforms.

(i) Reprojection of coordinate systems: The two primary inputs used in our approach, i.e., the global floodplain and annual USGS land use, were originally developed in two different coordinate systems. Because non-identical coordinate systems across corresponding datasets induce positional error in floodplain geospatial analysis especially across large river basins38,39, we reprojected the coordinate system of the global floodplain to that of the USGS land use such that the two datasets become interoperable.

(ii) Land use reclassification: Commonly used land use datasets follow classification schemes (i.e., categorizing the intended purpose of a landscape parcel40) with multiple levels of nested hierarchy41. While a detailed land use classification offers critical insights to environmental monitoring and restoration research42,43, the large semantic variability of land use classifications across disciplines often complicates their practical applications44. To allow easy integration of our land use change dataset with cross-disciplinary research and decision-making tools, we simplified the original classification scheme of the USGS land use data to produce seven generic classes. The new land use classes included: 1) open water, 2) developed area, 3) barren land, 4) forest, 5) grassland, 6) agriculture, and 7) wetland.

(iii) Extraction of 60-year floodplain land use: We used the MRB boundary polygon as a mask and clipped the MRB portion of the global floodplain (hereafter, the MRB floodplain). Following the same approach in a subsequent step, we used the MRB floodplain as a mask on the USGS land use and extracted a floodplain land use map for each year from 1941 to 2000. We then calculated the areal extent of each land use class by multiplying its total number of grid-cells by the area of a single grid-cell (250 m × 250 m = 0.0625 km²), thus creating a 60-year time-series of floodplain land use in the MRB (e.g., Fig. 2 and Supplementary Fig. 1). Since the USGS land use dataset was developed only for the continental United States, the small upstream portion of the MRB that drains two Canadian provinces (~1% of the basin’s drainage area) was excluded from our analysis.

(iv) Land use change detection between two end-years (1941 and 2000): We applied a cell-by-cell statistical approach to detect differences in land use between the two end-years of comparison (i.e., 1941 and 2000). Specifically, we calculated the number of unique grid-cell values across the two land use maps for every cell. The outcome was a new map with only two possible grid-cell values. A value of “1” indicated that the target grid-cell held a single unique value across the two input land use maps, meaning “no change” of land use between the two points in time. Conversely, a value of “2” indicated that the target grid-cell held two distinct values and hence a “change” of land use between the two points in time.

(v) Formation of inter-class land transition matrix: To demonstrate the “nature of change”45 in the MRB floodplains, we quantified how the land use therein transitioned from one class to the other(s) between two end-years (1941 and 2000). We conducted this task using a widely acknowledged approach called Transition Matrix Analysis33,46,47,48,49,50,51,52,53. Table 1 schematically shows the resultant transition matrix across the seven land use classes of the MRB floodplains. Here, $T_1$ and $T_2$ respectively indicate the two end-years of comparison, while $A_{ij}$ is the areal extent $[L^2]$ that transitioned from class $i$ at the initial year to class $j$ at the final year. The last row in the transition matrix represents the net gain or loss of areal extent between $T_1$ and $T_2$ in every land use class. The inter-class land use transitions in the MRB floodplains between 1941 and 2000 are graphically presented in Fig. 4, while the corresponding calculations are provided in Supplementary Table 1.
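Steps (ii) to (v) can be sketched on a toy grid as follows. This is a minimal illustration in plain Python, not the ArcGIS and ENVI workflow used to produce the dataset. It assumes the floodplain mask and both land use maps already share one grid (step i); the class codes in `RECLASS`, the function names, and the 2 × 3 example grids are hypothetical.

```python
from collections import Counter

CELL_AREA_KM2 = 0.25 * 0.25  # one 250 m x 250 m grid cell = 0.0625 km2

CLASSES = ["open water", "developed area", "barren land", "forest",
           "grassland", "agriculture", "wetland"]

# Hypothetical lookup from original class codes to the seven generic classes;
# the actual mapping is the one documented in Supplementary Table 2.
RECLASS = {1: "open water", 2: "developed area", 8: "forest", 9: "forest",
           11: "grassland", 13: "agriculture", 15: "wetland", 16: "wetland"}


def reclassify(grid):
    """Step (ii): map original class codes to the seven generic classes."""
    return [[RECLASS[code] for code in row] for row in grid]


def extract(land_use, mask):
    """Step (iii): keep cells inside the floodplain mask; others become None."""
    return [[lu if inside else None for lu, inside in zip(lu_row, m_row)]
            for lu_row, m_row in zip(land_use, mask)]


def class_areas(grid):
    """Step (iii): area per class = number of cells x area of one cell."""
    counts = Counter(c for row in grid for c in row if c is not None)
    return {c: counts[c] * CELL_AREA_KM2 for c in CLASSES}


def change_map(grid_t1, grid_t2):
    """Step (iv): unique values per cell; 1 = no change, 2 = change."""
    return [[None if a is None or b is None else len({a, b})
             for a, b in zip(r1, r2)]
            for r1, r2 in zip(grid_t1, grid_t2)]


def transition_matrix(grid_t1, grid_t2):
    """Step (v): A[i][j] = area (km2) moving from class i at T1 to class j at T2."""
    area = {i: {j: 0.0 for j in CLASSES} for i in CLASSES}
    for r1, r2 in zip(grid_t1, grid_t2):
        for a, b in zip(r1, r2):
            if a is not None and b is not None:
                area[a][b] += CELL_AREA_KM2
    return area


def net_change(matrix):
    """Net gain (+) or loss (-) per class: total at T2 minus total at T1."""
    return {c: sum(matrix[i][c] for i in CLASSES) - sum(matrix[c].values())
            for c in CLASSES}


# Toy inputs: a floodplain mask and two end-year maps of original class codes.
mask = [[True, True, False],
        [True, True, True]]
raw_1941 = [[15, 16, 8], [8, 11, 1]]
raw_2000 = [[13, 13, 8], [8, 2, 1]]

lu_1941 = extract(reclassify(raw_1941), mask)
lu_2000 = extract(reclassify(raw_2000), mask)
changes = change_map(lu_1941, lu_2000)
matrix = transition_matrix(lu_1941, lu_2000)
net = net_change(matrix)
```

On this toy example, the two wetland cells that became agriculture appear as a 0.125 km² entry in `matrix["wetland"]["agriculture"]` and as a matching net loss for wetland and net gain for agriculture. For rasters of basin size, an array library would replace the nested lists, but the logic stays the same.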

## Data Records

The MRB floodplain land use change dataset is made available through HydroShare, an open-access geospatial data sharing platform. Our archive also includes all corresponding input data, intermediate calculations, and supporting information. Tables 2 and 3 below provide an overview of the file contents. The entire archive can be downloaded as a single zip file from this web address: https://doi.org/10.4211/hs.41a3a9a9d8e54cc68f131b9a9c6c8c54 (ref. 54).

## Technical Validation

To ensure the technical quality and reliability of our MRB floodplain land use change dataset, we validated both GFPLAIN250m floodplain and USGS land use datasets with respect to the best available references.

Although the GFPLAIN250m floodplain27 has been previously validated across different scales55, we further compared it to the Global Flood Maps (GFM)56 across the MRB Hydrologic Unit Codes (HUCs). We used critical success index (CSI) and true positive rate (TP rate) metrics to confirm the spatial consistency between GFPLAIN250m and GFM floodplain extents (Fig. 5). When compared to GFM, GFPLAIN tends to produce larger floodplain delineations in the headwater regions of the MRB (Fig. 5a). This was also evident in Fig. 5b, which showed lower CSI values in the headwater HUCs. However, values of CSI were within the acceptable ranges demonstrated by previous studies27,57. TP rates (Fig. 5c) were greater than 0.8 for most of the HUCs, indicating a high overlap between GFPLAIN250m and GFM datasets. These assessments demonstrate the applicability of using the GFPLAIN250m data to identify continuous floodplains.
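The two agreement metrics can be sketched as follows. The example assumes the standard definitions, CSI = TP / (TP + FP + FN) and TP rate = TP / (TP + FN), with GFM treated as the reference; the function names and the small binary grids (1 = floodplain, 0 = not floodplain) are hypothetical.

```python
def confusion_counts(candidate, reference):
    """Count true positives, false positives and false negatives cell by cell."""
    tp = fp = fn = 0
    for c_row, r_row in zip(candidate, reference):
        for c, r in zip(c_row, r_row):
            if c and r:
                tp += 1   # floodplain in both maps
            elif c and not r:
                fp += 1   # floodplain only in the candidate map
            elif r and not c:
                fn += 1   # floodplain only in the reference map
    return tp, fp, fn


def critical_success_index(tp, fp, fn):
    """CSI = TP / (TP + FP + FN); undefined if neither map has floodplain cells."""
    return tp / (tp + fp + fn)


def true_positive_rate(tp, fn):
    """TP rate = TP / (TP + FN); share of reference floodplain that is captured."""
    return tp / (tp + fn)


# Hypothetical floodplain masks for one HUC.
gfplain = [[1, 1, 1, 0],
           [1, 1, 0, 0]]
gfm = [[0, 1, 1, 0],
       [1, 1, 1, 0]]

tp, fp, fn = confusion_counts(gfplain, gfm)
csi = critical_success_index(tp, fp, fn)
tpr = true_positive_rate(tp, fn)
```

CSI penalizes both over- and under-prediction, whereas the TP rate ignores false positives. This is why headwater HUCs, where GFPLAIN delineates larger floodplains than GFM, can show a lower CSI while retaining a high TP rate.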

We validated the USGS land use using the European Space Agency’s (ESA) Climate Change Initiative (CCI) data. The CCI data include a time-series of consistent global land use maps at 300-m spatial resolution on an annual basis from 1992 to 2019 (ref. 58). These global land use maps were derived from multiple satellite data sources, including Envisat Medium Resolution Imaging Spectrometer (MERIS) (2003–2012), Advanced Very High Resolution Radiometer (AVHRR) (1992–1999), SPOT-VGT (1999–2013), and PROBA-V (2013–2015)59. Hereafter, we refer to the CCI land use as the remotely sensed land use for simplicity. In contrast, the 250-m spatial resolution USGS land use maps were based on hindcast modeling (1938–2005), derived from the NLCD, Landsat satellite, and county-level agricultural census. We chose the CCI/remotely sensed dataset as the reference because it was developed by a different agency with different data sources, which ensured an independent validation of our land use change estimates (see Fig. 6).

The USGS land use contains 17 classes (Supplementary Table 2), whereas the remotely sensed land use contains 37 classes (Supplementary Table 3). To make these two datasets comparable, we reclassified them into seven generic classes, including open water, developed area, barren land, forest, grassland, agriculture, and wetland (see Supplementary Tables 2, 3). In addition, we reprojected the CCI coordinates from World Geodetic System 84 (WGS84) to USA Contiguous Albers Equal Area Conic projected system to enable a uniform comparison with the USGS data. After the reclassification and reprojection steps, we selected 12 sites to validate our MRB floodplain land use change dataset. These validation sites were chosen objectively to represent different geophysical settings across the MRB as well as different stream orders, including both major rivers and lower order tributaries. We conducted validations for the 12 selected sites within the common period of data availability between the USGS (input) and remotely sensed (reference) datasets, i.e., from 1992 to 2000, with a different year randomly assigned to each validation site. It should be noted that the comparison was conducted at an aggregated level for each site. We did not evaluate the cell-by-cell correlations for two reasons. First, the two datasets were developed at two different spatial resolutions (250-m and 300-m). Second, the respective definitions of land classes are not identical across the two datasets, although we reclassified them to bring some degree of consistency. Three validation sites are shown in Fig. 6, and the other nine validation sites are shown in Supplementary Figs. 2–4.

The validation results for the 12 selected sites show high correlations between the USGS and remotely sensed data across all land use classes, with $R^2$ ranging from 0.90 to 0.99. Among the seven land use classes, agricultural land and grassland show the largest discrepancy between the USGS and remotely sensed data, which is likely due to inconsistency between the land class definition schemes of the two datasets. The original USGS hay/pasture class was designated as grassland, whereas the remotely sensed dataset has no hay/pasture class but instead a cropland with herbaceous cover class, which was designated as agriculture in our simplified classifications (Supplementary Tables 2, 3). The other five land use classes are highly consistent across all validation sites. Overall, these validation results indicate that our input and accordingly our output datasets are sufficiently reliable.
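An aggregated, site-level comparison of this kind can be sketched as below. The example assumes that R² is the squared Pearson correlation between paired class areas from the two datasets, which is one common reading and may differ from the exact procedure used here; the function name and the area values are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical areas (km2) of the seven generic classes at one validation site,
# ordered as: open water, developed area, barren land, forest, grassland,
# agriculture, wetland.
usgs_km2 = [12.0, 3.5, 0.4, 20.1, 8.2, 45.0, 10.8]
cci_km2 = [11.5, 4.1, 0.6, 19.0, 5.9, 48.3, 9.9]

r2 = r_squared(usgs_km2, cci_km2)
print(round(r2, 3))
```

Because the comparison uses aggregated areas rather than individual cells, it is insensitive to the 250-m versus 300-m grid mismatch, but it also cannot detect errors in where each class is located within a site.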

## Usage Notes

To ensure that the MRB floodplain land use change dataset, relevant inputs, and underlying methodology are Findable, Accessible, Interoperable, and Reusable (FAIR)60, we developed four software solutions and educational products (Table 4). These products, besides assisting other researchers with the reuse of our dataset, will also foster new research on floodplain resilience by allowing efficient analysis of floodplain land use change in any of the world’s major river basins.