Metabolism Regimes in Regulated Rivers of the Illinois River Basin, USA

Harvey, Judson W.; Choi, Jay; Quion, Katherine

doi:10.1038/s41597-024-03037-1


Data Descriptor
Open access
Published: 15 February 2024


Scientific Data volume 11, Article number: 211 (2024)


Abstract

Metabolism estimates quantify organic carbon accumulation by primary productivity and its removal by respiration. In rivers, metabolism is relevant to assessing trophic status, threats to river health such as hypoxia, and greenhouse gas fluxes. We estimated metabolism in 17 rivers of the Illinois River basin (IRB) for a total of 15,176 days, an average of 2.5 years per site. Daily estimates of gross primary productivity (GPP), ecosystem respiration (ER), net ecosystem productivity (NEP), and the air-water gas exchange rate constant (K₆₀₀) are reported, along with ancillary data such as river temperature, saturated dissolved oxygen concentration, barometric pressure, and river depth and discharge. Workflows for metabolism estimation and quality assurance are described, including a new method for estimating river depth. IRB rivers are predominantly heterotrophic; however, autotrophy was common at river locations coinciding with reported harmful algal bloom (HAB) events. Metabolism estimates for these regulated Midwestern U.S. rivers can help assess the causes and consequences of excessive algal blooms in rivers and their role in river ecological health.


Background & Summary

Aquatic metabolism measures the balance between organic carbon accumulation by primary productivity of algae and other autotrophs and the rate of carbon removal by respiration of autotrophs and heterotrophs such as bacteria. River metabolism is relevant to assessing the causes and consequences of eutrophication such as hypoxia; it can serve as an early warning indicator of changing river function and health and can indicate shifts in greenhouse gas emissions^1,2. Here we focused on metabolism of regulated rivers in the Illinois River basin (IRB), where river algal blooms and associated toxins have been reported^3,4,5,6,7. To quantify metabolism, the rates of oxygen production and consumption in the aquatic system are measured over time to estimate gross primary productivity (GPP) and ecosystem respiration (ER). GPP is a positive quantity that estimates the daily growth rate of autotrophs, and ER is a negative quantity that estimates the daily rate of organic carbon loss by organism respiration, including respiration of autotrophs and respiration associated with microbial decomposition of detrital organic matter. The sum of GPP and ER is the net ecosystem productivity (NEP), which estimates the daily balance between organic carbon build-up by primary productivity and depletion by respiration. To use the oxygen balance method to estimate metabolism, it is also necessary to quantify the rate of dissolved oxygen exchange with the atmosphere, which depends on water temperature and atmospheric pressure as well as water mixing and turbulence. As methods for measuring metabolism have improved, the number of studies has increased substantially. However, most long-term estimates in flowing waters are confined to small streams and wadeable rivers².

For the present study we estimated aquatic metabolism at 17 river sites in the Illinois River basin (IRB)⁸ that encompassed extensive agricultural areas and a major metropolitan area in northeastern Illinois as well as agricultural and suburban areas in northwestern Indiana and in southern Wisconsin that drain to the Illinois River (Fig. 1, Table 1).

Table 1 Site name, U.S. Geological Survey National Water Information System (NWIS) site number, geographic coordinates, presence of lock and dam regulation, and period of data availability for metabolism modeling at the 17 IRB river study sites.


The selected IRB sites represent a variety of river sizes and characteristics, including mainstem sites on the Illinois River as well as several large tributaries and a few smaller streams. The Illinois River is substantially regulated by a series of locks and dams that maintain minimum water levels for navigation through the upper Illinois River, the Des Plaines River tributary, and the headwaters of the Chicago Area Waterway System (CAWS). Not surprisingly, water quality and ecological conditions are substantially impaired in IRB rivers, including high nutrient and suspended sediment concentrations^3,4. Large tributaries of the Illinois River include the Kankakee River, which drains large areas of corn and soybean agriculture, has been dredged and straightened to increase its conveyance, and now has significant problems with high turbidity and sedimentation³. The Fox River flows through agricultural areas in southern Wisconsin and then traverses the western edge of the Chicago urban corridor before joining the Illinois River⁵. Dam storage in the Illinois and Fox Rivers maintains significant water depths and lengthens water residence times while also increasing water clarity⁴. Recently, excessive plankton blooms and associated algal toxins have been observed in the Illinois and Fox Rivers^5,6,7.

The type of autotrophs in water bodies (e.g., benthic vs. planktonic algae vs. submerged aquatic vegetation) depends on light availability, which is affected by tree and bank shading and water-column light attenuation, as well as on disturbance frequency and severity and other factors^1,2. Benthic algae are usually thought to dominate GPP in streams and small rivers where the river bed is illuminated^1,2. Many benthic algal species are adapted to shading by forest canopies, as well as to the high-flow events that scour stream beds and disrupt GPP². Planktonic algae are usually thought to dominate in lakes, reservoirs, and estuaries, but the expectation for large rivers is less clear⁹. Nevertheless, unshaded rivers with low or moderate turbidity have the potential for high water-column GPP from phytoplankton growth^8,9.

Phytoplankton and harmful algal blooms (HABs) have increasingly been observed in large rivers and reservoirs of the Midwest and Great Plains areas of the United States such as the Kansas, Ohio, and Mississippi Rivers^10,11,12,13, as well as in the Illinois River^5,6,7 and elsewhere^14,15. Flow extremes are moderated in regulated rivers such as the Ohio, Mississippi, and Illinois Rivers where locks and dams lengthen the water residence time and increase the water clarity in the quiescent river pools between the dams^16,17. Regulated rivers also often have abundant nutrient supply^3,4,5,6 which can support phytoplankton blooms during low flow periods, when water residence time is prolonged, when water is warmer than average, and when turbidity from suspended sediments is often at its lowest^16,17.

Chlorophyll-a (chl-a) is often used as a measure of phytoplankton; however, riverine chl-a can reflect a wide variety of algal types and is not specifically diagnostic of phytoplankton¹⁸. Also, the relationship between chl-a and autotrophic biomass may vary greatly depending on light, nutrients, temperature, and other factors¹⁹. Use of metabolism metrics in rivers can improve understanding of the drivers of river algal blooms²⁰ and can help anticipate future changes in river health^21,22,23. For example, changes in the sign of NEP and in the temporal correlation of GPP and ER can signal changes in the relative importance of phytoplankton versus submerged aquatic vegetation as dominant primary producers in rivers²¹.

Most previous metabolism estimation in rivers was focused on streams and small rivers². To motivate further use of the IRB metabolism data⁸, we plotted long-term average metabolism for 17 IRB river sites (Fig. 2). Like many heterotrophic streams and rivers that process substantial inputs of allochthonous organic matter^1,2,9,23, the metabolism of IRB rivers was generally heterotrophic (Fig. 2).

The overall productivity of IRB rivers (mean GPP = 2.77 g O₂ m⁻² d⁻¹) was representative of the relatively high productivity of a subgroup of 18 high productivity “unshaded and stable flow” rivers evaluated as part of a study of 220 rivers and streams² (Fig. 2). Productivity was generally higher in unshaded and stable flow rivers than in most other streams and rivers because of greater light availability and because smaller variations in river discharge disturb autotrophs less frequently². Only one of our IRB study rivers (Fox R., with an average GPP of 7.13 g O₂ m⁻² d⁻¹) stood out in productivity compared with the unshaded and stable flow subgroup. However, nearly all IRB rivers had substantially higher ER magnitudes (more negative ER; mean ER = −6.05 g O₂ m⁻² d⁻¹) than the unshaded and stable flow subgroup from the broader analysis² (Fig. 2).

Our dataset indicates that IRB river metabolism is heterotrophic overall (mean IRB river NEP = −3.28 g O₂ m⁻² d⁻¹); however, IRB rivers were intermittently autotrophic, on between 1% and 56% of measured days (Table 6 and Fig. 2). At one extreme, the Kankakee and Des Plaines Rivers were usually strongly heterotrophic and were autotrophic on only 1% and 5% of days, respectively. At the other extreme, the Illinois and Fox Rivers were autotrophic on 33% and 43% of days, respectively. Tributaries were intermediate, with autotrophy on between 12% and 23% of days (Table 6 and Fig. 2).
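The sign convention behind these percentages can be made concrete with a short sketch: NEP is the daily sum of GPP (≥ 0) and ER (≤ 0), and a day counts as autotrophic when NEP > 0, the definition used in Table 6. The reported means are consistent with this definition, since 2.77 + (−6.05) = −3.28 g O₂ m⁻² d⁻¹. The daily values below are made up for illustration, and `daily_nep` and `percent_autotrophic` are hypothetical helper names, not functions from the data release.

```python
from typing import List, Sequence


def daily_nep(gpp: Sequence[float], er: Sequence[float]) -> List[float]:
    """NEP = GPP + ER for each day; ER is negative by convention."""
    if len(gpp) != len(er):
        raise ValueError("GPP and ER series must have the same length")
    return [g + r for g, r in zip(gpp, er)]


def percent_autotrophic(nep: Sequence[float]) -> float:
    """Percentage of days with NEP > 0 (autotrophic metabolism)."""
    if not nep:
        raise ValueError("empty NEP series")
    return 100.0 * sum(1 for x in nep if x > 0) / len(nep)


# Hypothetical five-day record for one site [g O2 m-2 d-1].
gpp = [2.0, 8.5, 6.0, 1.0, 3.0]
er = [-5.0, -6.0, -7.0, -4.0, -2.0]
nep = daily_nep(gpp, er)          # [-3.0, 2.5, -1.0, -3.0, 1.0]
print(percent_autotrophic(nep))   # 40.0
```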

Frequent autotrophy in rivers is an indicator of, but does not in itself prove, phytoplankton production²¹. However, the correspondingly high chlorophyll-a (chl-a) measurements in the Illinois and Fox Rivers⁶, combined with visual reports and analytical determinations of planktonic algae^5,7, indicate that phytoplankton blooms are common in the IRB. We encourage further analysis of our IRB river metabolism data set⁸ in the context of water quality^24,25 and river conditions^26,27,28 to better understand the triggers and consequences of riverine planktonic algal blooms, in the IRB and elsewhere.

Methods

Initial site selection for metabolism estimation in IRB rivers was based on the availability of dissolved oxygen data from the U.S. Geological Survey National Water Information System²⁵ (USGS NWIS). We used the USGS National Water Dashboard to help identify NWIS site numbers with the needed input data, and we consulted USGS scalable maps of water-quality data collection sites to determine which data were available at each site. Potential river sites were identified by searching all “stream type” sites, including “streams”, “canals”, and “ditches”, with at least a year of continuous dissolved oxygen data (generally at 15-minute intervals). Sites that were obviously not lotic in character (e.g., wetlands, ponds, gravel pits) were excluded, leaving seventeen IRB river sites that were appropriate for modeling long-term metabolism. Selected sites were linked to the National Hydrography Dataset (NHDPlus)²⁶ to take advantage of documented river and catchment attributes.

We used USGS data retrieval software (dataRetrieval)²⁹ to download between one and nine years of data from the 17 selected IRB river sites (Table 1), including all continuous (sub-daily) measurements of dissolved oxygen concentration, water temperature, and specific conductivity and continuous water discharge and gage height (Table 2), as well as infrequently collected channel field measurements (Table 3). Barometric pressure was obtained separately through a request to NOAA³⁰, using site latitude and longitude to select the nearest measurement location for each river site. All dissolved oxygen (DO) data used in this study were quality assured and approved by the USGS. The DO data are expected to be of high quality because they were collected after 2010, by which time the use of optical DO sensors had become standard practice. Although it did not apply to our IRB data, recently collected USGS data available for download are sometimes provisional and not yet quality assured.

Table 2 List of data sources for metabolism modeling including USGS data obtained using USGS data retrieval software²⁹ and NOAA National Centers for Environmental Information, U.S. Local Climatological Data (LCD)³⁰.


Table 3 Parameters calculated from source data for metabolism modeling.


To model metabolism we took advantage of recent advances in state-space models that simultaneously estimate three unknown metabolism variables: GPP, ER, and K₆₀₀^31,32,33. Generally, models converge better and produce physically realistic estimates when GPP exceeds the rate of air-water oxygen exchange, a condition that accentuates diel variation in dissolved oxygen concentration and increases the signal-to-noise ratio, which helps the model separate the competing influences of GPP, ER, and K₆₀₀. Nevertheless, metabolism estimation remains challenging because three correlated parameters must be estimated from a single oxygen time series.

To model metabolism in IRB rivers we used the streamMetabolizer R package (https://github.com/USGS-R/streamMetabolizer), a widely tested and well-documented state-space metabolism model³³. This model uses the one-station approach, which assumes that sensor data collected at a single point in a river are representative of a well-mixed water column. In addition to well-mixed conditions, the one-station approach assumes homogeneous upstream conditions affecting metabolism over a distance that is proportional to v/K, where v is stream velocity and K is the gas exchange coefficient. The accuracy of DO measurements is also important, although measurement accuracy has improved substantially since high-quality optical DO sensors came into routine use (approximately 2005). Furthermore, the model does not quantify anaerobic respiration, which is sometimes significant in low-oxygen rivers.

The governing mass balance equation equates the instantaneous rate of change in the DO concentration, [O₂], in the river with the sum of the rates of DO inputs and outputs by metabolism and gas exchange³². Expressed as volumetric rates, the mass balance for DO is:

$$\frac{d[{O}_{2}]}{dt}={P}_{t}+{R}_{t}+{D}_{t}$$

(1)

where d[O₂]/dt is the rate of change in water column O₂ [mg O₂ L⁻¹ d⁻¹]; P_t is the instantaneous volumetric rate of oxygen addition by gross primary production [mg O₂ L⁻¹ d⁻¹]; R_t is the instantaneous volumetric rate of oxygen removal by respiration [mg O₂ L⁻¹ d⁻¹]; and D_t is the instantaneous volumetric rate of air-water oxygen exchange [mg O₂ L⁻¹ d⁻¹]. By definition, P_t is greater than or equal to zero, R_t is less than or equal to zero, and gas exchange, D_t, can take either sign. The streamMetabolizer model³³ restructures the oxygen balance expressions so that long-term oxygen time series can be used to estimate daily metabolism variables by solving the following equations:

$${P}_{t}={\boldsymbol{GPP}}\times \frac{1}{h}\times \frac{\left({t}_{1}-{t}_{0}\right)\times PPF{D}_{t}}{{\int }_{u={t}_{0}}^{{t}_{1}}PPF{D}_{u}\,du}$$

(2)

$${R}_{t}={\boldsymbol{ER}}\times \frac{1}{h}$$

(3)

$${D}_{t}={K}_{2,t}\times \left({O}_{sat,t}-{O}_{mod,t}\right)$$

(4)

$${K}_{2,t}={{\boldsymbol{K}}}_{{\bf{600}}}\times {\left(\frac{{S}_{A}+{S}_{B}{T}_{t}+{S}_{C}{T}_{t}^{2}+{S}_{D}{T}_{t}^{3}}{600}\right)}^{-0.5}$$

(5)

where GPP is the daily areal average rate of primary production [g O₂ m⁻² d⁻¹], ER is the daily areal average rate of respiration [g O₂ m⁻² d⁻¹], and K₆₀₀ is the daily average gas exchange rate constant normalized for molecular properties and temperature to a Schmidt number of 600 [day⁻¹]. Variables with subscript t are instantaneous values that are typically estimated from 15-minute interval measurements. The rate of gas exchange, D_t, is the product of the rate constant and the deficit between saturated and actual concentrations of dissolved O₂. Rather than fitting the actual gas exchange rate constant, K_2,t, the model fits K₆₀₀, so that only one standardized gas-exchange parameter needs to be reported per day while still capturing the within-day variation in gas exchange rates caused by diel variation in temperature. Additional variables are h, mean river depth averaged over the width and upstream length of the reach affecting the oxygen balance [m]; PPFD, photosynthetic photon flux density [μmol photons m⁻² d⁻¹]; t₀ and t₁, the start and end of the daily period over which PPFD is integrated; O_sat,t, saturated O₂ concentration [mg O₂ L⁻¹]; O_mod,t, model-estimated O₂ concentration [mg O₂ L⁻¹]; K_2,t, O₂-specific and temperature-specific gas exchange coefficient [day⁻¹]; T_t, water temperature [°C]; and S, Schmidt number coefficients: S_A = 1568, S_B = −86.04, S_C = 2.142, and S_D = −0.0216. The solution approach is described in detail in Appling et al.³³.
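To show how Eqs. (1) to (5) fit together, the deterministic part of the model can be sketched as a one-day forward simulation of DO. This is a minimal illustration under stated assumptions, not streamMetabolizer's implementation: it uses a plain forward-Euler step, omits the process and observation error terms that the state-space model estimates, and approximates the light integral in Eq. (2) by a discrete mean over equally spaced samples. The names `k2_from_k600` and `simulate_do` and all input values are hypothetical.

```python
import math

# Schmidt-number polynomial coefficients for O2 (S_A..S_D from the text).
S_A, S_B, S_C, S_D = 1568.0, -86.04, 2.142, -0.0216


def k2_from_k600(k600: float, temp_c: float) -> float:
    """Eq. (5): O2 gas exchange rate constant [1/d] at temperature temp_c."""
    schmidt = S_A + S_B * temp_c + S_C * temp_c**2 + S_D * temp_c**3
    return k600 * (schmidt / 600.0) ** -0.5


def simulate_do(gpp, er, k600, depth_m, light, temp_c, do_sat, do0):
    """Simulate one day of DO [mg/L] with forward-Euler steps of Eq. (1).

    gpp, er: daily areal rates [g O2 m-2 d-1]; k600 [1/d]; depth_m [m].
    light, temp_c, do_sat: equally spaced series covering one day.
    Dividing g m-2 d-1 by depth in m gives g m-3 d-1 = mg L-1 d-1.
    """
    n = len(light)
    if not (len(temp_c) == len(do_sat) == n):
        raise ValueError("light, temp_c and do_sat must have equal length")
    mean_light = sum(light) / n
    if mean_light <= 0:
        raise ValueError("light series must contain daylight")
    dt = 1.0 / n  # time step [d]
    do = [do0]
    for i in range(n - 1):
        # Eq. (2): with (t1 - t0) = 1 d, the light weighting reduces to
        # light[i] / mean_light for equally spaced samples.
        p = gpp / depth_m * light[i] / mean_light
        r = er / depth_m  # Eq. (3): ER <= 0, so r removes oxygen
        d = k2_from_k600(k600, temp_c[i]) * (do_sat[i] - do[-1])  # Eq. (4)
        do.append(do[-1] + (p + r + d) * dt)  # Eq. (1)
    return do


# Hypothetical inputs: 15-minute steps, daylight between 06:00 and 18:00.
n = 96
light = [max(0.0, math.sin(2 * math.pi * (i / n - 0.25))) for i in range(n)]
do = simulate_do(gpp=3.0, er=-6.0, k600=4.0, depth_m=2.0, light=light,
                 temp_c=[20.0] * n, do_sat=[9.0] * n, do0=8.0)
print(round(min(do), 2), round(max(do), 2))
```

Because the light weights average to one over the day, integrating P_t over a full day returns GPP/h, which is the property Eq. (2) is built to preserve.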

River depth estimation

River depth is necessary for metabolism estimation, and errors in estimated depth propagate in direct proportion to the estimated GPP and ER. Channel field measurements made by the U.S. Geological Survey have been underutilized for depth estimation in multi-river metabolism studies. We used a linear rating curve approach for estimating river depth based on USGS field measurements of channel width, channel area, gage height, channel discharge, and cross-section average velocity. We obtained those field measurements from USGS NWIS²⁵ using the dataRetrieval²⁹ function “readNWISmeas()”, which takes a USGS NWIS site number and start and end dates and often returned tens of field measurements for each site during the period of interest.

To apply the linear rating curve approach, the cross-section average depth was determined for each day with a field measurement by dividing the measured flow cross-sectional area by the wetted channel width:

$${h}_{fm}={A}_{fm}/{w}_{fm}$$

(6)

where h_fm is the field measured river depth, A_fm is the field measured channel cross-sectional area, and w_fm is the field measured wetted width of the river.

River depth for all model days was estimated from a linear estimation equation:

$$h=m\cdot GH+b$$

(7)

where h and GH are river depth and measured gage height, respectively, and model coefficients m and b for this equation were determined from a linear regression of the field measured river depth against measured gage height on the days of the field measurements.
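Equations (6) and (7) amount to computing depth from each field measurement and fitting an ordinary least-squares line against gage height. A minimal sketch follows; the function names and measurement values are hypothetical (not IRB data), and the script in the data release is written in R, so this Python version only illustrates the arithmetic. The same `fit_line` helper would serve for the cross-sectional area rating used later for velocity.

```python
def field_depth(area_m2: float, width_m: float) -> float:
    """Eq. (6): cross-section average depth h_fm = A_fm / w_fm [m]."""
    if width_m <= 0:
        raise ValueError("wetted width must be positive")
    return area_m2 / width_m


def fit_line(x, y):
    """Ordinary least-squares fit y = m*x + b; returns (m, b)."""
    n = len(x)
    if n < 2 or len(y) != n:
        raise ValueError("need at least two paired points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("gage height must vary across field measurements")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    m = sxy / sxx
    return m, my - m * mx


def fit_depth_rating(gage_heights, areas, widths):
    """Eq. (7) coefficients (m, b) from paired field measurements."""
    depths = [field_depth(a, w) for a, w in zip(areas, widths)]
    return fit_line(gage_heights, depths)


def predict_depth(m: float, b: float, gage_height: float) -> float:
    """Eq. (7): river depth for a modeled day from its gage height [m]."""
    return m * gage_height + b


# Hypothetical field measurements: GH [m], area [m2], wetted width [m].
m, b = fit_depth_rating([1.0, 2.0, 3.0], [75.0, 100.0, 125.0], [50.0, 50.0, 50.0])
print(m, b, predict_depth(m, b, 4.0))  # 0.5 1.0 3.0
```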

Usually, we excluded USGS field measurements rated as “poor” from the regression of field-measured river depth on gage height. At some sites, however, most of the field measurements, and sometimes all of them, were rated as poor. Nevertheless, if the gaging cross section was representative of upstream conditions, we usually judged that using field measurements to estimate river depth was superior to hydraulic geometry estimation, regardless of the quality rating of the field measurements. The preferred water depth estimation method for each site is noted in Table 7.

We used the linear rating curve approach to estimate river depth at thirteen of the seventeen IRB river sites, where the river width at the sensor location was representative of upstream conditions (see the next section). However, four of the seventeen river sites were located at relatively narrow control sections, where river depth estimates at the sensor location were not representative of upstream conditions. For those sites we used a hydraulic geometry approach³⁴ to estimate the cross-section average river depth as:

$${h}_{hgc}=c\cdot {Q}^{f}$$

(8)

where c and f are hydraulic geometry coefficients³⁵ for each of the river reach codes (comID²⁶) associated with our IRB river sites, and Q is continuous discharge at the IRB river site.
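The per-site choice between the two depth methods can be sketched as below. This is an illustration under assumptions: `hg_depth` and `site_depth` are hypothetical names, the "representative width" decision is passed in as a boolean (in the study it came from inspecting imagery), and the coefficients are made-up values rather than entries from the release's hydraulic_coeffs.txt.

```python
def hg_depth(discharge_m3s: float, c: float, f: float) -> float:
    """Eq. (8): at-a-station hydraulic geometry depth h = c * Q**f [m]."""
    if discharge_m3s <= 0:
        raise ValueError("discharge must be positive")
    return c * discharge_m3s ** f


def site_depth(width_representative: bool, gage_height: float,
               discharge_m3s: float, rating: tuple, hg: tuple) -> float:
    """Depth for one modeled day using the method appropriate to the site.

    rating = (m, b) from Eq. (7); hg = (c, f) from Eq. (8). Sites whose
    gage width represents upstream conditions use the rating curve;
    narrow control sections fall back to hydraulic geometry.
    """
    if width_representative:
        m, b = rating
        return m * gage_height + b
    c, f = hg
    return hg_depth(discharge_m3s, c, f)


# Hypothetical coefficients and one day with GH = 2 m and Q = 100 m3/s.
print(site_depth(True, 2.0, 100.0, rating=(0.5, 1.0), hg=(0.25, 0.5)))   # 2.0
print(site_depth(False, 2.0, 100.0, rating=(0.5, 1.0), hg=(0.25, 0.5)))  # 2.5
```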

Assessing site representativeness of river conditions

The one-station method for estimating metabolism depends on the measurement site representing both the local and upstream conditions that affect metabolism estimates. The method assumes a water column that is well mixed both vertically and laterally, with longitudinal consistency in river physical and biological conditions³⁴. Those assumptions have been examined theoretically³⁶ but are rarely tested at field sites. For the present study we compared river width at the oxygen sensor site with river width upstream to evaluate whether the locally measured river depth was representative of upstream conditions.

It is not unusual for USGS gaging and sensor measurement cross sections to be located at “control sections” that are narrower than average for the river reach, in which case the field measurements from the cross section may differ from the reach average. Both the average river depth and the average velocity could be overestimated in a narrower-than-average measurement cross section. We consulted the USGS “water-year summary” for each site²⁵ and visually examined the gaging cross section and upstream conditions using publicly available aerial imagery (https://www.google.com/maps). The sensor location and the gaging cross section where depth was measured by USGS field crews were determined from the description provided in the water-year summary²⁵. Using the imagery, we examined the consistency of river width for approximately 10 kilometers upstream of the oxygen measurement site. Because the regulated rivers of the IRB were relatively consistent in width, we could estimate river depth at most sites using the linear rating curve approach described in the previous section.

To accurately estimate river metabolism, we also had to consider how close each site was to upstream flow regulation structures (e.g., locks and dams) or lakes. If close enough, those features affect dissolved oxygen concentrations in ways that disrupt the river metabolic signals being modeled at the sensor site. Proximity is usually judged by estimating the “metabolism reach length”, i.e., the distance required for substantial turnover of the dissolved oxygen in the water column by gas exchange with the atmosphere. We estimated metabolism reach length as the river distance required for 80% turnover of river dissolved oxygen by gas exchange³⁴, i.e., the distance over which upstream river conditions are likely to influence metabolism calculations. For each day in each river, we estimated the metabolism reach length as:

$${\rm{metabolism}}\;{\rm{reach}}\;{\rm{length}}=-\ln\left(1-0.8\right)\frac{v}{{K}_{{O}_{2}}}$$

(9)

where v is the cross-section averaged river velocity in m d⁻¹, and ${K}_{{O}_{2}}$ is the air-water exchange coefficient for oxygen, calculated from K₆₀₀ using the measured water temperature and published equations and coefficients³³. Cross-section averaged river velocity was estimated by dividing daily average discharge by the estimated cross-sectional channel area for that day:

$$v=Q/A$$

(10)

where Q is daily average discharge (expressed in m³ d⁻¹ so that v is in m d⁻¹) and A is the estimated cross-sectional channel area for that day. A for each modeled day was estimated using a linear estimation equation:

$$A=m\cdot GH+b$$

(11)

where GH is gage height, and m and b are model coefficients (distinct from those in Eq. 7) determined from a linear regression of field-measured cross-sectional channel area against measured gage height on the days of the field measurements.

To compare the estimated metabolism reach length with field conditions, we measured the distance from each metabolism sensor site to the nearest upstream flow regulation structure (e.g., a lock and dam) or lake, using the measurement tool of publicly available aerial imagery (https://www.google.com/maps).
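Equations (9) to (11) and the distance comparison can be combined in a short sketch. It assumes discharge in m³ s⁻¹ (converted to m³ d⁻¹ so that velocity comes out in m d⁻¹) and uses the Eq. (5) Schmidt-number conversion to obtain the oxygen exchange coefficient from K₆₀₀. The function names, the area rating coefficients, and the "share of days too close" summary are hypothetical; the study's actual proximity criterion is given in Table 5.

```python
import math

S_A, S_B, S_C, S_D = 1568.0, -86.04, 2.142, -0.0216  # O2 Schmidt coefficients


def k_o2_from_k600(k600: float, temp_c: float) -> float:
    """Convert K600 [1/d] to the O2 exchange coefficient at temp_c (Eq. 5)."""
    sc = S_A + S_B * temp_c + S_C * temp_c**2 + S_D * temp_c**3
    return k600 * (sc / 600.0) ** -0.5


def area_from_gage(gage_height: float, m: float, b: float) -> float:
    """Eq. (11): daily cross-sectional area [m2] from gage height."""
    return m * gage_height + b


def reach_length_m(q_m3s: float, area_m2: float, k600: float,
                   temp_c: float, turnover: float = 0.8) -> float:
    """Eqs. (9)-(10): distance [m] for the given fraction of O2 turnover."""
    v = q_m3s * 86400.0 / area_m2          # Eq. (10), velocity in m/d
    k_o2 = k_o2_from_k600(k600, temp_c)    # [1/d]
    return -math.log(1.0 - turnover) * v / k_o2


def share_of_days_too_close(lengths_m, distance_to_structure_m: float) -> float:
    """Fraction of days whose reach length exceeds the upstream distance."""
    if not lengths_m:
        raise ValueError("no daily reach lengths")
    return sum(1 for x in lengths_m if x > distance_to_structure_m) / len(lengths_m)


# Hypothetical day: Q = 10 m3/s, GH = 2 m, area rating A = 40*GH + 20.
area = area_from_gage(2.0, 40.0, 20.0)  # 100 m2
length = reach_length_m(10.0, area, k600=1.0, temp_c=20.0)
print(round(length / 1000, 1))  # 13.1 (km)
```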

Workflow for modeling IRB river metabolism

We used R Statistical Software³⁷ to process existing data to create model inputs, verify model inputs, run the streamMetabolizer model, and post-process and quality assure the results (Fig. 3).

The broad outlines of the workflow are documented in Fig. 3 and Table 4 and briefly summarized here. Running script 1 time-matched the downloaded data, converted units, and filled time gaps shorter than 3 hours by linear interpolation. Running script 2 calculated model input variables such as solar time, saturated dissolved oxygen concentration, and river depth, estimated a proxy for light intensity at the river surface, and produced an output file compatible with the requirements of streamMetabolizer. The script 2 calculations were based on published functions³⁴, except for the new method of estimating river depth discussed in the “River depth estimation” section.
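The gap-filling step of script 1 can be illustrated with a short Python sketch (the actual script is R). It assumes the record has already been time-matched onto a regular 15-minute grid with missing samples stored as `None`, and it treats a gap's duration as the number of missing steps times the step length; both are assumptions, as is the function name `fill_short_gaps`. Gaps at the start or end of a record have no bracketing values and are left unfilled.

```python
from typing import List, Optional, Sequence


def fill_short_gaps(values: Sequence[Optional[float]], step_minutes: int = 15,
                    max_gap_minutes: int = 180) -> List[Optional[float]]:
    """Linearly interpolate interior runs of None shorter than max_gap_minutes.

    Longer gaps and gaps at either end of the record stay None.
    """
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < n and out[i] is None:
            i += 1
        end = i  # index of the first valid value after the gap, or n
        run = end - start
        interior = start > 0 and end < n
        if interior and run * step_minutes < max_gap_minutes:
            left, right = out[start - 1], out[end]
            for k in range(1, run + 1):
                out[start - 1 + k] = left + (right - left) * k / (run + 1)
    return out


print(fill_short_gaps([1.0, None, None, 4.0]))  # [1.0, 2.0, 3.0, 4.0]
```

With 15-minute data, a run of 11 missing values (2 h 45 min) is filled, while a run of 12 (exactly 3 h) is left as a gap.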

Table 4 Summary documentation of scripts.


Script 3 provides a consistency check on the script 1 outputs before script 4 runs the streamMetabolizer model. Script 5 post-processes the model outputs to produce results and model diagnostics, flagging daily metabolism results based on established criteria³⁴. Script 5 also provides plots for visual evaluation of the results, as well as censored versions of the metabolism output files from which all flagged days have been removed. Details are provided in the “Quality assurance” section. Table 4 summarizes script operation in data acquisition, preparation of inputs, running the model, and post-processing outputs to evaluate and quality assure the model results.

Running the metabolism model

We ran streamMetabolizer version 0.12.0 on a laptop using R version 4.1.1³⁷. Computational times varied between 1 and 12 hours per site; the records for the two IRB sites with more than 5 years of data (Kankakee River at Davis and Illinois River at Florence) had to be split into approximately 3-year segments for the runs to complete. We used the streamMetabolizer option for Bayesian partial pooling, which conditions estimates of K₆₀₀ on the expectation that K₆₀₀ varies as a function of discharge. Appling et al.³³ showed that partial pooling improves model performance: although it does not impose a strict relationship between K₆₀₀ and discharge, it establishes an across-day, piecewise linear relationship between ln(K₆₀₀) and ln(Q) that improves the estimation of GPP, ER, and K₆₀₀. Models were run with the recommended setup of four Markov chain Monte Carlo (MCMC) chains and 1000 warmup steps. The streamMetabolizer model calculates values of the Gelman-Rubin statistic for observation error, ${\widehat{{\rm{R}}}}_{{{\rm{\sigma }}}_{obs}}$, process error, ${\widehat{{\rm{R}}}}_{{{\rm{\sigma }}}_{proc}}$, and K₆₀₀ estimation error, ${\widehat{{\rm{R}}}}_{{{\rm{\sigma }}}_{K600}}$, with values ≤ 1.1 used as an initial screening criterion to indicate that the model converged adequately^38,39. Many of the IRB models converged on the first run; if a model did not, we ran it again after increasing the number of warmup (burn-in) steps to 1500. After the model runs were completed, we compiled the results and used the final diagnostic values reported by streamMetabolizer in our quality assurance steps. At several river sites we also tested the sensitivity of the results to the default initial values for GPP, ER, and K₆₀₀ provided in streamMetabolizer by varying the initial values by approximately a factor of two, and found that model outcomes were robust.
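The ≤ 1.1 screening rule can be illustrated with the classic Gelman-Rubin potential scale reduction factor, which compares between-chain and within-chain variance of post-warmup draws. This sketch is not streamMetabolizer's code; the MCMC sampler it relies on may report a split-chain variant of R̂, so values from this simplified form need not match the package's diagnostics exactly. The function names and the example chains are hypothetical.

```python
import math
from statistics import mean, variance


def gelman_rubin(chains):
    """Classic potential scale reduction factor (R-hat) for one parameter.

    chains: m >= 2 equal-length lists of post-warmup draws.
    """
    m, n = len(chains), len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need >= 2 chains of equal length >= 2")
    within = mean([variance(c) for c in chains])        # W
    between = n * variance([mean(c) for c in chains])   # B
    if within == 0:
        raise ValueError("chains have zero within-chain variance")
    var_plus = (n - 1) / n * within + between / n
    return math.sqrt(var_plus / within)


def converged(rhats, threshold=1.1):
    """Screening rule from the text: every R-hat must be <= 1.1."""
    return all(r <= threshold for r in rhats)


mixed = [[1.0, 2.0, 3.0, 4.0]] * 4                          # chains agree
stuck = [[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]]    # chains disagree
print(converged([gelman_rubin(mixed)]), converged([gelman_rubin(stuck)]))  # True False
```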

Quality assurance

Daily model outputs were flagged based on indicators of a weak signal-to-noise ratio in the modeled time series and of biologically or physically unrealistic outcomes for GPP, ER, and K₆₀₀. For Flag 1, we compared each day’s coefficient of determination of modeled oxygen, R²_det, against a threshold to assess signal-to-noise strength. For Flags 2 and 3, we assessed biologically unrealistic values of GPP and ER, respectively, following a previous example³⁴ that allowed slightly negative GPP and slightly positive ER outcomes to reflect error variation. Lastly, for Flag 4 we assessed physically unrealistic values of K₆₀₀ (Table 5).
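The four daily flags and the censored output described for script 5 can be sketched as threshold checks. The threshold values are deliberately left as parameters: the study's actual cut-offs are listed in Table 5, and the numbers in the usage example are placeholders chosen only to exercise each flag. `FlagThresholds`, `daily_flags`, and `censor` are hypothetical names, and representing a day as a dict with keys `r2`, `gpp`, `er`, and `k600` is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlagThresholds:
    """Placeholder cut-offs; the study's actual values are in Table 5."""
    r2_min: float    # Flag 1: minimum daily R2 of modeled oxygen
    gpp_min: float   # Flag 2: lowest acceptable GPP (slightly negative allowed)
    er_max: float    # Flag 3: highest acceptable ER (slightly positive allowed)
    k600_min: float  # Flag 4: lower bound of a plausible K600 [1/d]
    k600_max: float  # Flag 4: upper bound of a plausible K600 [1/d]


def daily_flags(day: dict, th: FlagThresholds) -> list:
    """Return the flag numbers (1-4) raised for one day's estimates."""
    flags = []
    if day["r2"] < th.r2_min:
        flags.append(1)
    if day["gpp"] < th.gpp_min:
        flags.append(2)
    if day["er"] > th.er_max:
        flags.append(3)
    if not th.k600_min <= day["k600"] <= th.k600_max:
        flags.append(4)
    return flags


def censor(days: list, th: FlagThresholds) -> list:
    """Censored output: keep only days with no flags raised."""
    return [d for d in days if not daily_flags(d, th)]


# Hypothetical thresholds and daily estimates, for illustration only.
th = FlagThresholds(r2_min=0.5, gpp_min=-0.5, er_max=0.5, k600_min=0.0, k600_max=50.0)
days = [
    {"r2": 0.9, "gpp": 3.0, "er": -6.0, "k600": 2.0},
    {"r2": 0.2, "gpp": -1.0, "er": 0.2, "k600": 80.0},
]
print([daily_flags(d, th) for d in days])  # [[], [1, 2, 4]]
```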

Table 5 Flagging of daily estimates of GPP, ER, and K₆₀₀ and confidence criteria for overall metabolism outcomes at IRB river sites.


Our overall confidence assessments of metabolism outcomes followed Appling et al.³⁴ (Table 5). We assessed the percentage of days on which estimated GPP, ER, and K₆₀₀ fell outside biologically or physically realistic thresholds, as well as model convergence statistics ($\widehat{{\rm{R}}}$) that could indicate inadequate convergence of parameter estimates. Lastly, we assessed potential interference in metabolism estimation from the proximity of the nearest upstream dam or lake (Table 5).

To evaluate overall confidence in metabolism results for IRB rivers, we ranked each river site by combining its individual ratings for the five criteria (Table 5). A site’s individual ratings had to be high for all five criteria for its overall metabolism output to rank as “High” in confidence. A single low rating for any criterion resulted in a “Low” overall confidence assessment. All other combinations of individual ratings resulted in a “Medium” overall confidence assessment for a river site’s estimated metabolism (Table 5).
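The combination rule is simple enough to state as code. In this sketch the criterion names are hypothetical labels for the five criteria in Table 5, and the rating strings "High", "Medium", and "Low" are assumed encodings; the logic follows the text directly: any low rating decides the outcome, all high ratings give High, and everything else is Medium.

```python
VALID = {"High", "Medium", "Low"}


def overall_confidence(ratings: dict) -> str:
    """Combine five per-criterion ratings into an overall confidence class."""
    if len(ratings) != 5 or not set(ratings.values()) <= VALID:
        raise ValueError("expected five ratings of High, Medium or Low")
    values = ratings.values()
    if any(r == "Low" for r in values):
        return "Low"       # a single low rating is decisive
    if all(r == "High" for r in values):
        return "High"      # every criterion must be rated high
    return "Medium"        # any other combination


# Hypothetical criterion names and ratings for one site.
site = {"gpp_days_flagged": "High", "er_days_flagged": "High",
        "k600_days_flagged": "Medium", "convergence": "High",
        "dam_or_lake_proximity": "High"}
print(overall_confidence(site))  # Medium
```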

Data Records

Our U.S. Geological Survey data release⁸ (https://doi.org/10.5066/P9TEBOUR) presents long-term aquatic metabolism estimates at 17 river sites in the IRB. The principal outcomes are 15,176 daily estimates of GPP, ER, and K₆₀₀, accompanied by sub-daily input time series of dissolved oxygen, temperature, barometric pressure, and river depth and discharge, as well as the diagnostic metrics and statistics we used to assess the quality of model outcomes. Our source data for the IRB (Table 1) overlapped only minimally with a previous multi-river modeling study⁴⁰, sharing a partial record for one site, DES PLAINES RIVER AT JOLIET, IL.

Metabolism estimates for the Illinois and Fox Rivers indicate that autotrophic conditions occurred on between 14% and 56% of days, compared with the Kankakee and Des Plaines Rivers, which were autotrophic on only a few percent of days (Table 6). Metabolism in the regulated rivers of the IRB can inform hydrologic, biogeochemical, and ecosystem health issues in larger rivers managed for navigation. We particularly encourage use of the IRB river metabolism data⁸ in combination with other IRB data sets²⁴ to identify and isolate drivers and to develop early warning indicators of planktonic algal blooms in rivers.

Table 6 Time-averaged IRB river discharge, metabolism, and percent of days at each site with autotrophic metabolism, i.e., NEP > 0.


Data release file structure

Our data release⁸ provides files documenting metabolism estimation for 17 IRB rivers and the associated workflow. The main landing page of the USGS data release includes the metadata, readme file, and scripts (R code), and links to two child items, “Input data” and “Output data”, each with additional metadata and downloadable files. The data release can be accessed at https://doi.org/10.5066/P9TEBOUR. The structure of the data release and the locations of downloadable files are summarized below:

MAIN PAGE: Metadata File, Readme File, and Scripts

RiverMET_workflow_and_scripts_metadata.xml: Metadata file describing overview of workflow and scripts
RiverMET_readMe.txt: Readme file providing overview of file contents and guidance for running the scripts
RiverMET_Scripts.zip: R scripts 1 through 5, downloadable as a single zip file. For convenience, we list the script names below. After each script we note which of its input and output files can be downloaded under Child Item 1 (Inputs) and Child Item 2 (Outputs), described further below:
- 1_Process-Data.R (note: Script-1 input files are not included, but Script-1 output is provided in the form of the Script-2 input files)
- 2_Prepare-Model-InputFiles.R (note: Script-2 input files are included, see Child Item 1; Script-2 output files are also included and are equivalent to the Script-3 and Script-4 input files, see Child Item 2)
- 3_Verify-Model-InputFiles.R (note: Script-3 output files are not included because this is an optional step for cross-checking files)
- 4_Run-streamMetabolizer.R (note: Script-4 output files are not included because they are not useful until Script-5 has processed them)
- 5_PostProcess-ModelOutputs.R (note: Script-5 output files are included, see Child Item 2)

CHILD ITEM 1: Input Files

RiverMET_Input_Files_metadata.xml: Metadata file describing all input data including column headers and data units.
RiverMET_Inputs.zip: Downloadable Script 2 input files with filenames and contents summarized below.

barop.csv – barometric pressure in millibar (mb); 15-minute time series
disch_gage.csv – discharge in m³ s⁻¹, gage height in m; 15-minute time series
do.csv – dissolved oxygen in mg/L; 15-minute time series
sal.csv – salinity in Practical Salinity Units (PSU); 15-minute time series
temp.csv – water temperature in degrees Celsius (°C); 15-minute time series
hydraulic_coeffs.txt – hydraulic geometry coefficients a, b, c, and f used in the estimation equations for river width, B = aQ^b, and river depth, h = cQ^f, where Q is river discharge, B is river width, and h is river depth.
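The hydraulic geometry relations stored in hydraulic_coeffs.txt can be evaluated as shown below. This is a minimal sketch. The coefficient values are invented for illustration and are not taken from the data release. The sketch assumes Q is in m³ s⁻¹ and that B and h come out in whatever units the coefficients were fitted in.

```python
from dataclasses import dataclass


@dataclass
class HydraulicGeometry:
    """Power-law coefficients for width B = a*Q**b and depth h = c*Q**f."""
    a: float
    b: float
    c: float
    f: float

    def width(self, q: float) -> float:
        # river width B from discharge Q
        return self.a * q ** self.b

    def depth(self, q: float) -> float:
        # river depth h from discharge Q
        return self.c * q ** self.f


# Hypothetical coefficients, not values from hydraulic_coeffs.txt
hg = HydraulicGeometry(a=20.0, b=0.5, c=0.25, f=0.4)
q = 100.0  # discharge, assumed m^3 s^-1
print(hg.width(q), hg.depth(q))
```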

CHILD ITEM 2: Output Files

RiverMET_Output_Files_metadata.xml: Metadata file describing all output data including column headers and data units.
RiverMET_Outputs.zip: Downloadable output files in two folders, “outputs_from_script-2” and “outputs_from_script-5”. Script-2 output files are ready for modeling using streamMetabolizer. Script-5 output files are the final metabolism outputs from our study. Output file details are described below:
- RiverMET_Outputs.zip/outputs/outputs_from_script-2/ (note: 34 csv files, 17 using hydraulic geometry estimation of river depth and 17 using gage height estimation of river depth; example filename: bayesInput_[date]_depth-hgc_[site_no].csv)
- RiverMET_Outputs.zip/outputs/outputs_from_script-5/ (note: “outputs_from_script-5” has two folders, “outputs-A” and “outputs-B”. Each folder has 21 files: one file for each of 15 sites plus 3 files for each of 2 long-record sites. The “outputs-A” filenames follow this example: flagged_GPP_ER_K600_[date]_depth-hgc_[site_no].csv. The “outputs-B” filenames follow this example: censored_GPP_ER_K600_[date]_depth-hgc_[site_no].csv.)
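Users working with many Script-5 output files may want to index them by version and site. The sketch below does this with a regular expression inferred from the example filenames above. It is a hypothetical helper, not part of the data release. It assumes that [date] and the depth-method tag contain no underscores and that [site_no] is a numeric USGS site number. Files that do not follow this assumed pattern are skipped.

```python
import re
from pathlib import Path

# Pattern inferred from the example filenames; an assumption, not a documented spec.
OUTPUT_NAME = re.compile(
    r"^(?P<version>flagged|censored)_GPP_ER_K600_"
    r"(?P<date>[^_]+)_depth-(?P<depth_method>[^_]+)_(?P<site_no>\d+)\.csv$"
)


def index_outputs(folder):
    """Map (version, site_no) -> list of matching Script-5 output files."""
    index = {}
    for path in sorted(Path(folder).glob("*.csv")):
        m = OUTPUT_NAME.match(path.name)
        if m is None:
            continue  # skip files that do not follow the assumed naming pattern
        index.setdefault((m["version"], m["site_no"]), []).append(path)
    return index


# Hypothetical filename used only to demonstrate the parsing
print(OUTPUT_NAME.match("flagged_GPP_ER_K600_20220101_depth-hgc_01234567.csv").groupdict())
```

A list is kept for each key because, per the description above, long-record sites have three files each.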

Technical Validation

There is no universally accepted way to quality assure modeling results. In the IRB we assessed daily metabolism results by flagging values that exceeded thresholds based on biologically or physically unrealistic values or on daily model-fit diagnostics from the streamMetabolizer model (Table 5). Overall confidence in each river site’s model outcomes was assessed using aggregated metrics and statistical diagnostics, e.g., percentages of daily values that were flagged and model convergence statistics (Table 5).

Across the IRB sites, an average of 29% of modeled days had one or more flags. As described in the section “Data release file structure”, we produced two output versions to serve different needs. The censored version (“outputs-B”) provides only the GPP, ER, and K₆₀₀ model estimates of the highest apparent quality, with all flagged days removed. However, some useful data may have been removed in the censoring process. The flagged version (“outputs-A”) provides complete results, including days with flags. This version allows users to judge each day’s data and to perform custom assessments of model quality that meet specific needs.
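The relationship between the flagged and censored versions can be illustrated with a small sketch. The thresholds below are hypothetical placeholders, not the Table 5 criteria. A day is flagged if GPP is negative, ER is positive (both unrealistic when ER is reported as ≤ 0), or K₆₀₀ falls outside an assumed plausible range. The flagged version keeps every day with its flags, and the censored version drops any day with at least one flag.

```python
import pandas as pd

# Hypothetical bounds for illustration only; the actual criteria are in Table 5.
K600_RANGE = (0.0, 100.0)  # d^-1


def flag_days(daily: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the daily results with boolean flag columns added."""
    out = daily.copy()
    out["flag_gpp"] = out["GPP"] < 0  # negative productivity
    out["flag_er"] = out["ER"] > 0    # positive respiration
    lo, hi = K600_RANGE
    out["flag_k600"] = ~out["K600"].between(lo, hi)  # outside assumed range
    out["n_flags"] = out[["flag_gpp", "flag_er", "flag_k600"]].sum(axis=1)
    return out


def censor(flagged: pd.DataFrame) -> pd.DataFrame:
    """Keep only days with no flags (the censored version)."""
    return flagged[flagged["n_flags"] == 0]


# Hypothetical daily results (GPP, ER in g O2 m^-2 d^-1; K600 in d^-1)
daily = pd.DataFrame({
    "GPP": [4.0, -0.8, 6.0, 3.0],
    "ER": [-5.0, -4.0, 0.6, -2.5],
    "K600": [12.0, 15.0, 9.0, 140.0],
})
flagged = flag_days(daily)
pct_flagged = 100.0 * (flagged["n_flags"] > 0).mean()
print(pct_flagged, len(censor(flagged)))  # 75.0 1
```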

In terms of overall confidence in model outcomes, thirteen of the seventeen IRB river metabolism time series earned a high or medium confidence ranking (Table 7). The most frequent cause of a low confidence ranking was exceedance of the 1.2 threshold for the $\hat{R}_{\sigma_{K600}}$ statistic, which indicates problems with model convergence. The four river sites earning a low confidence ranking were FOX RIVER NEAR MCHENRY, IL; ILLINOIS RIVER AT FLORENCE, IL; SUGAR CREEK NEAR CHATHAM, IL; and LICK CREEK NEAR WOODSIDE, IL.
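The convergence criterion can be illustrated with the classic Gelman-Rubin potential scale reduction factor, R̂ (refs ³⁸,³⁹). This statistic compares the variance between MCMC chains with the variance within them. The sketch below is a simplified illustration. streamMetabolizer fits its Bayesian models with Stan, which may report a split-chain variant of R̂, so values from this formula need not match the model output exactly. The draws here are synthetic.

```python
import numpy as np


def gelman_rubin_rhat(chains) -> float:
    """Classic potential scale reduction factor for one parameter.

    chains: array of shape (m, n) with n post-warmup draws from each of m chains.
    """
    x = np.asarray(chains, dtype=float)
    m, n = x.shape
    if m < 2 or n < 2:
        raise ValueError("need at least 2 chains with at least 2 draws each")
    within = x.var(axis=1, ddof=1).mean()         # W: mean within-chain variance
    between = n * x.mean(axis=1).var(ddof=1)      # B: between-chain variance
    pooled = (n - 1) / n * within + between / n   # pooled posterior variance estimate
    return float(np.sqrt(pooled / within))


rng = np.random.default_rng(42)
mixed = rng.normal(size=(4, 1000))                          # chains agree
stuck = mixed + np.array([[0.0], [5.0], [10.0], [15.0]])    # chains disagree
RHAT_THRESHOLD = 1.2  # threshold used in the IRB confidence assessment
for name, draws in [("mixed", mixed), ("stuck", stuck)]:
    r = gelman_rubin_rhat(draws)
    print(name, round(r, 3), "converged" if r <= RHAT_THRESHOLD else "not converged")
```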

Table 7 Summary of metabolism model confidence assessment for the 17 river sites in IRB.


Thirteen of the 17 IRB river sites (76%) earned a high or medium confidence ranking. This is only slightly lower than for a similarly assessed set of rivers modeled by Appling et al.³⁴, of which 84% ranked high or medium confidence. The IRB river metabolism results⁸ are therefore quality assured using the best available diagnostic metrics and statistical criteria for models of this type. Nonetheless, model confidence assessments are only guidance. They do not supersede future evaluations of model quality that may be more detailed or that judge whether results are “fit for purpose” for a particular use.

Usage Notes

Our data release⁸ provides metabolism outcomes and documents our workflow for modeling metabolism at 17 IRB river sites. Here we summarize descriptive information about the dataset and guidance for its use, including:
- geographic coordinates and period of data availability for each site (Table 1)
- a summary of the USGS parameter codes used for downloading (Table 2)
- information about calculating parameters needed as model inputs (Table 3)
- an overview of script workflows (Table 4)
- quality assurance criteria (Table 5)
- metabolism outcomes (Table 6), including a model performance assessment (Table 7)

In addition, the file RiverMET_readMe.txt in our data release⁸ provides guidance for reusing the code. This includes changes that may be needed to run the code on a different system, re-run the IRB sites with different options, or adapt the scripts to model metabolism in other rivers. Users who wish to adapt parts of our workflow will need to acquire publicly available data from USGS and NOAA. They can use existing software (dataRetrieval²⁹) to download the needed USGS data for their sites of interest from NWIS. These data include dissolved oxygen, water temperature, specific conductance, discharge, gage height, and field measurements of channel parameters. Barometric pressure data can be obtained from NOAA. After downloading their own data, users can adapt parts of 1_Process-Data.R to perform time matching, gap filling, and unit conversion (Table 4). Their code may produce files that match the format of the 2_Prepare-Model-InputFiles.R input files provided in our data release. If so, only minor adaptations should be needed to run scripts 2, 3, 4, and 5 (Table 4), which prepare final model inputs, run streamMetabolizer, and organize and quality assure the metabolism modeling results.
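The time matching and gap filling step can be approximated in Python as shown below. This is a hypothetical sketch, not a translation of the authors' R script. It averages each raw series onto a shared 15-minute grid and then linearly interpolates only short interior gaps. The four-step (one-hour) gap limit is an assumed value, not a setting from the data release.

```python
import numpy as np
import pandas as pd


def fill_short_gaps(s: pd.Series, max_steps: int) -> pd.Series:
    """Time-interpolate interior gaps of at most max_steps missing values.

    Longer gaps stay missing; pandas' `limit` argument alone would
    partially fill them instead.
    """
    missing = s.isna()
    run_id = (~missing).cumsum()  # a NaN run shares the id of the value before it
    run_len = missing.groupby(run_id).transform("sum")  # length of each NaN run
    filled = s.interpolate(method="time", limit_area="inside")
    return filled.mask(missing & (run_len > max_steps))


def align_to_grid(raw: dict, freq: str = "15min", max_steps: int = 4) -> pd.DataFrame:
    """Average each raw series onto a shared time grid, then fill short gaps."""
    gridded = {name: s.resample(freq).mean() for name, s in raw.items()}
    frame = pd.DataFrame(gridded)  # outer join on grid timestamps
    return frame.apply(fill_short_gaps, max_steps=max_steps)


# Hypothetical DO record (mg/L) with a 1-step gap and a 6-step gap
grid = pd.date_range("2020-06-01", periods=12, freq="15min")
do = pd.Series([8.0, np.nan, 9.0, 9.0] + [np.nan] * 6 + [7.0, 7.0], index=grid)
filled_do = fill_short_gaps(do, max_steps=4)

# Hypothetical off-grid water temperature readings (deg C)
temp = pd.Series(
    [20.0, 21.0, 23.0],
    index=pd.to_datetime(["2020-06-01 00:07", "2020-06-01 00:22", "2020-06-01 00:52"]),
)
aligned = align_to_grid({"temp": temp})
print(filled_do.round(2).tolist())
print(aligned)
```

Long gaps are left missing rather than bridged. Interpolating across several hours would smooth away the diel oxygen signal that metabolism models depend on.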

Our data release⁸ also suggests approaches that can help expand the capacity for modeling river metabolism. For example, several of the IRB sites might have been included in an earlier study⁴⁰ but were passed over because not all the needed input data were available. To facilitate modeling at those sites, where appropriate, we acquired the missing measurements from nearby “replacement” sites (Table 7). One common case was dissolved oxygen measured at several sites without the accompanying river discharge. Discharge is needed for Bayesian partial pooling, which estimates K₆₀₀ using a prior expectation that K₆₀₀ varies as a function of discharge. In such cases we “replaced” the missing discharge with data from a nearby site, which allowed metabolism estimation at sites previously overlooked because of missing data⁸. The rivers where replacement discharges were used are large, with discharge often exceeding 350 m³ s⁻¹, and the replacement sites were usually within 10 km. For these reasons we did not scale replacement discharges by basin size.
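For context, scaling by basin size is commonly done with a drainage-area ratio, Q_target ≈ Q_replacement × (A_target / A_replacement). The sketch below illustrates that method only. It was not part of the authors' workflow, and the drainage areas are hypothetical. It shows why the correction matters little when two nearby large-river sites drain nearly the same area.

```python
def drainage_area_ratio(q_ref: float, area_ref_km2: float, area_target_km2: float,
                        exponent: float = 1.0) -> float:
    """Estimate discharge at a target site from a nearby reference gage.

    Q_target = Q_ref * (A_target / A_ref) ** exponent; exponent = 1 gives the
    simple linear drainage-area ratio.
    """
    if area_ref_km2 <= 0 or area_target_km2 <= 0:
        raise ValueError("drainage areas must be positive")
    return q_ref * (area_target_km2 / area_ref_km2) ** exponent


# Hypothetical case: the replacement gage drains 1% less area than the target site
q_ref = 400.0  # m^3 s^-1
q_scaled = drainage_area_ratio(q_ref, area_ref_km2=25_000.0, area_target_km2=25_250.0)
print(round(q_scaled, 1), f"{100 * (q_scaled / q_ref - 1):.1f}% change")
```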

Code availability

Our workflow includes scripts that were written and tested using R version 4.1.1. The scripts can be accessed from the data release⁸, which is distributed under a CC0 1.0 Universal license permitting reuse without restrictions.

References

Battin, T. J. et al. River ecosystem metabolism and carbon biogeochemistry in a changing world. Nature 613, 449–459, https://doi.org/10.1038/s41586-022-05500-8 (2023).
Bernhardt, E. S. et al. Light and flow regimes regulate the metabolism of rivers. Proceedings of the National Academy of Sciences 119(8), e2121976119, https://doi.org/10.1073/pnas.2121976119 (2022).
McIsaac, G. F., Hodson, T. O., Markus, M., Bhattarai, R. & Kim, D. C. Spatial and Temporal Variations in Phosphorus Loads in the Illinois River Basin, Illinois USA. J Am Water Resour Assoc. https://doi.org/10.1111/1752-1688.13054 (2023).
Houser, J.N., ed. Ecological status and trends of the Upper Mississippi and Illinois Rivers (ver. 1.1, July 2022): U.S. Geological Survey Open-File Report 2022–1039, 199 p. https://doi.org/10.3133/ofr20221039 (2022).
Illinois Environmental Protection Agency, Illinois Officials Confirm Algal Bloom on Portions of the Illinois River, Residents should continue to use caution when recreating and be aware of blue-green algae. News Release June 25, 2018, Illinois Department of Public Health (2018).
Getahun, E., Keefer, L., Chandrasekaran, S. & Zavelle, A. Water Quality Trend Analysis for the Fox River Watershed: Stratton Dam to the Illinois River. Illinois State Water Survey Prairie Research Institute, University of Illinois at Urbana-Champaign prepared for the Fox River Study Group (2019).
Fox River Study Group. Fox River Implementation Plan, A plan to improve dissolved oxygen and reduce nuisance algae in the Fox River, https://www.foxriverstudygroup.org/ (2015).
Choi, J., Quion, K. M., Reed, A. P. & Harvey, J. W. RiverMET: Workflow and scripts for river metabolism estimation including Illinois River Basin application, 2005 - 2020. U.S. Geological Survey data release https://doi.org/10.5066/P9TEBOUR (2022).
Hoellein, T. J., Bruesewitz, D. A. & Richardson, D. C. Revisiting Odum (1956): A synthesis of aquatic ecosystem metabolism. Limnology and Oceanography 58(6), 2089, https://doi.org/10.4319/lo.2013.58.6.2089 (2013).
Manier, J. T., Haro, R. J., Houser, J. N. & Strauss, E. A. Spatial and temporal dynamics of phytoplankton assemblages in the upper Mississippi River. River Research and Applications 37(10), 1451–1462, https://doi.org/10.1002/rra.3852 (2021).
Giblin, S. M. & Gerrish, G. A. Environmental factors controlling phytoplankton dynamics in a large floodplain river with emphasis on cyanobacteria. River Res Applic. 36, 1137–1150, https://doi.org/10.1002/rra.3658 (2020).
Graham, J.L., Ziegler, A.C., Loving, B.L., & Loftin, K.A. Fate and transport of cyanobacteria and associated toxins and taste-and-odor compounds from upstream reservoir releases in the Kansas River, Kansas, September and October 2011: U.S. Geological Survey Scientific Investigations Report 2012–5129, 65 p. (Revised November 2012), https://doi.org/10.3133/sir20125129 (2012).
Graham, J. L. et al. Cyanotoxin occurrence in large rivers of the United States. Inland Waters 10(1), 109–117, https://doi.org/10.1080/20442041.2019.1700749 (2020).
Rousso, B. Z., Bertone, E., Stewart, R. & Hamilton, D. P. A systematic literature review of forecasting and predictive models for cyanobacteria blooms in freshwater lakes. Water Research 182, 115959, https://doi.org/10.1016/j.watres.2020.115959 (2020).
Beaver, J. R., Tausz, C. E., Scotese, K. C., Pollard, A. I. & Mitchell, R. M. Environmental factors influencing the quantitative distribution of microcystin and common potentially toxigenic cyanobacteria in U.S. lakes and reservoirs. Harmful Algae 78, 118–128, https://doi.org/10.1016/j.hal.2018.08.004 (2018).
Nietch, C. T. et al. Development of a Risk Characterization Tool for Harmful Cyanobacteria Blooms on the Ohio River. Water 14, 644, https://doi.org/10.3390/w14040644 (2022).
Houser, J. N., Bartsch, L. A., Richardson, W. B., Rogala, J. T. & Sullivan, J. F. Ecosystem metabolism and nutrient dynamics in the main channel and backwaters of the Upper Mississippi River. Freshw Biol 60, 1863–1879, https://doi.org/10.1111/fwb.12617 (2015).
Peipoch, M. & Ensign, S. H. Deciphering the origin of riverine phytoplankton using in situ chlorophyll sensors. Limnol. Oceanogr. Lett. 7, 159–166, https://doi.org/10.1002/lol2.10240 (2022).
Cloern, J. E., Grenz, C. & Vidergar-Lucas, L. An empirical model of the phytoplankton chlorophyll:carbon ratio, the conversion factor between productivity and growth rate. Limnol. Oceanogr. 40 (1995).
Reisinger, A. J. et al. Water column contributions to the metabolism and nutrient dynamics of mid-sized rivers. Biogeochemistry 153, 67–84, https://doi.org/10.1007/s10533-021-00768-w (2021).
Diamond, J. S. et al. Metabolic regime shifts and ecosystem state changes are decoupled in a large river. Limnol Oceanogr 67, S54–S70, https://doi.org/10.1002/lno.11789 (2022).
Batt, R. D., Carpenter, S. R., Cole, J. J., Pace, M. L. & Johnson, R. A. Changes in ecosystem resilience detected in automated measures of ecosystem metabolism during a whole-lake manipulation. Proc. Natl. Acad. Sci. 110, 17398–17403 (2013).
Hall, R. O. Jr. Metabolism of streams and rivers: Estimation, controls, and application. In Jones, J. B. & Stanley, E. H. (eds) Stream Ecosystems in a Changing Environment, Chapter 4, 151–173 (Academic Press, 2016).
Platt, L. R. C. et al. Harmonized discrete and continuous water quality data in support of modeling harmful algal blooms in the Illinois River Basin, 2005 - 2020. U.S. Geological Survey data release https://doi.org/10.5066/P9RISQGE (2022).
U.S. Geological Survey. National Water Information System (USGS Water Data for the Nation), accessed November 4, 2021, at http://waterdata.usgs.gov/nwis/ (2021).
Blodgett, D., & Johnson, M. nhdplusTools: Accessing and Working with the NHDPlus (Version 0.5.7). Reston, VA: U.S. Geological Survey https://doi.org/10.5066/P97AS8JD (2022).
Schwarz, G. E. E2NHDPlusV2_us: Database of Ancillary Hydrologic Attributes and Modified Routing for NHDPlus Version 2.1 Flowlines. U.S. Geological Survey data release https://doi.org/10.5066/P986KZEM (2019).
Wieczorek, M. E., Jackson, S. E. & Schwarz, G. E. Select Attributes for NHDPlus Version 2.1 Reach Catchments and Modified Network Routed Upstream Watersheds for the Conterminous United States [Data set]. U.S. Geological Survey. https://doi.org/10.5066/F7765D7V (2018).
De Cicco, L.A., Hirsch, R.M., Lorenz, D., Watkins, W.D., Johnson, M. dataRetrieval: R packages for discovering and retrieving water data available from Federal hydrologic web services, v.2.7.13, https://doi.org/10.5066/P9X4L3GE (2023).
National Oceanic and Atmospheric Administration National Centers for Environmental Information. U.S. Local Climatological Data (LCD), accessed September 7, 2021, at https://www.ncei.noaa.gov/maps/lcd/ (2021).
Holtgrieve, G. W., Schindler, D. E., Branch, T. A. & A’mar, Z. T. Simultaneous quantification of aquatic ecosystem metabolism and reaeration using a Bayesian statistical model of oxygen dynamics. Limnology and Oceanography 55(3), 1047–1062, https://doi.org/10.4319/lo.2010.55.3.1047 (2010).
Grace, M. R. et al. Fast processing of diel oxygen curves: Estimating stream metabolism with BASE (BAyesian Single-station Estimation). Limnology and Oceanography: Methods 13(3), 103–114, https://doi.org/10.1002/lom3.10011 (2015).
Appling, A. P., Hall, R. O. J., Yackulic, C. B. & Arroita, M. Overcoming equifinality: Leveraging long time series for stream metabolism estimation. Journal of Geophysical Research: Biogeosciences 123, 624–645, https://doi.org/10.1002/2017JG004140 (2018).
Appling, A. P. et al. The metabolic regimes of 356 rivers in the United States. Sci. Data. 5, 180292, https://doi.org/10.1038/sdata.2018.292 (2018).
Gomez-Velez, J. D., Harvey, J. W., Cardenas, M. B. & Kiel, B. Denitrification in the Mississippi River network controlled by flow through river bedforms. Nature Geoscience 8, 941–945, https://doi.org/10.1038/ngeo2567 (2015).
Chapra, S. & Runkel, R. Modeling impact of storage zones on stream dissolved oxygen. Journal of Environmental Engineering 125(5), https://doi.org/10.1061/(ASCE)0733-9372(1999)125:5(415) (1999).
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing https://www.R-project.org (2022).
Brooks, S. P. & Gelman, A. General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics 7(4), 434–455, https://doi.org/10.1080/10618600.1998.10474787 (1998).
Gelman, A. & Rubin, D. B. Inference from iterative simulation using multiple sequences. Statistical Science 7(4), 457–472, https://doi.org/10.1214/ss/1177011136 (1992).
Appling, A. P. et al. Metabolism estimates for 356 U.S. rivers (2007–2017). U.S. Geological Survey data release https://doi.org/10.5066/F70864KX (2018).


Acknowledgements

This work was completed as part of the USGS Proxies Project, an effort supported by the USGS Water Mission Area (WMA) Water Quality Processes program to develop estimation methods for harmful algal blooms (HABs), per- and polyfluoroalkyl substances (PFAS), and metals, at multiple spatial and temporal scales. Sincere thanks are due to many who supported this effort. We extend our appreciation to Ariel Reed for her analysis contributions in the early phases of the study, Lindsay Platt for her assistance with data acquisition, Mike Stouder for preparing the metadata, Elizabeth Nystrom for reviewing the data and metadata, and Katie Summers and Jacob Zwart for their helpful reviews of an earlier draft of this manuscript. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

Author information

Authors and Affiliations

U.S. Geological Survey, Earth System Processes Division, Reston, VA, USA
Judson W. Harvey, Jay Choi & Katherine Quion


Contributions

J.W.H. conceived the project and led study design, site selection, quality assurance, and writing of the manuscript. J.C. led the computations, assembled the final data release, and contributed to writing the manuscript. K.Q. made computations and contributed to writing the manuscript.

Corresponding author

Correspondence to Judson W. Harvey.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Harvey, J.W., Choi, J. & Quion, K. Metabolism Regimes in Regulated Rivers of the Illinois River Basin, USA. Sci Data 11, 211 (2024). https://doi.org/10.1038/s41597-024-03037-1


Received: 18 August 2023
Accepted: 01 February 2024
Published: 15 February 2024
DOI: https://doi.org/10.1038/s41597-024-03037-1