Background & Summary

Aquatic metabolism measures the balance between the rate of organic carbon accumulation by primary productivity of algae and other autotrophs and the rate of carbon removal by respiration of autotrophs and heterotrophs such as bacteria. River metabolism is relevant to assessing causes and consequences of eutrophication such as hypoxia, serving as an early warning indicator of changing river functions and health as well as indicating shifts in greenhouse gas emissions1,2. Here we focused on metabolism of regulated rivers in the Illinois River basin (IRB) where river algal blooms and associated toxins have been reported3,4,5,6,7. To quantify metabolism, the rate of oxygen production and consumption in the aquatic system is measured over time to estimate gross primary productivity (GPP) and ecosystem respiration (ER). GPP is a positive quantity that estimates the daily growth rate of autotrophs and ER is a negative quantity that estimates the daily rate of organic carbon loss by organism respiration, including respiration of autotrophs and respiration associated with microbial decomposition of detrital organic matter. The sum of GPP and ER is the net ecosystem productivity (NEP), which estimates the daily balance between organic carbon buildup and depletion in the system by primary productivity and respiration. To use the oxygen balance method to estimate metabolism, it is necessary to also quantify the rate of dissolved oxygen exchange with the atmosphere, which depends on water temperature and atmospheric pressure as well as water mixing and turbulence. As methods for measuring metabolism have improved, the number of studies has increased substantially. However, most long-term estimates in flowing waters are confined to small streams and wadable rivers2.

For the present study we estimated aquatic metabolism at 17 river sites in the Illinois River basin (IRB)8 that encompassed extensive agricultural areas and a major metropolitan area in northeastern Illinois as well as agricultural and suburban areas in northwestern Indiana and in southern Wisconsin that drain to the Illinois River (Fig. 1, Table 1).

Fig. 1

Seventeen river sites in the Illinois River Basin (IRB) selected for metabolism modeling. Site names and numbers reference data sourced from the U.S. Geological Survey National Water Information System (USGS | National Water Dashboard).

Table 1 Site name, U.S. Geological Survey National Water Information System (NWIS) site number, geographic coordinates, presence of lock and dam regulation, and period of data availability for metabolism modeling at the 17 IRB study river sites.

The selected IRB sites represent a variety of river sizes and characteristics, including mainstem sites on the Illinois River as well as several large tributaries and a few smaller streams. The Illinois River is substantially regulated by a series of locks and dams to maintain minimum water levels for navigation through the upper Illinois River as it enters the Des Plaines River tributary and headwaters of the Chicago Area Waterway System (CAWS). Not surprisingly, water quality and ecological conditions are substantially impaired in IRB rivers, including high nutrients and suspended sediments3,4. Large tributaries of the Illinois River include the Kankakee River which drains large areas of corn and soybean agriculture and has been dredged and straightened to increase its conveyance, and now has significant problems with high turbidity and sedimentation3. The Fox River flows through agricultural areas in southern Wisconsin and then traverses the western edge of the Chicago urban corridor before joining the Illinois River5. Dam storage in the Illinois and Fox Rivers maintains significant water depths and lengthens water residence times while also increasing water clarity4. Recently, excessive plankton blooms and associated algal toxins have been observed in the Illinois and Fox Rivers5,6,7.

The type of autotrophs in water bodies (e.g., benthic vs. planktonic algae vs. submerged aquatic vegetation) depends on light availability, which is affected by tree and bank shading and water-column light attenuation, as well as on disturbance frequency and severity and other factors1,2. Benthic algae are usually thought to dominate GPP in streams and small rivers where the river bed is illuminated1,2. Many benthic algal species are adapted to shading by forest canopies, as well as the high-flow events that scour stream beds and disrupt GPP2. Planktonic algae are usually thought to dominate in lakes, reservoirs, and estuaries, whereas the expectation for large rivers is less clear9. Nevertheless, unshaded rivers with low or moderate turbidity have the potential for high water-column GPP from phytoplankton growth8,9.

Phytoplankton and harmful algal blooms (HABs) have increasingly been observed in large rivers and reservoirs of the Midwest and Great Plains areas of the United States such as the Kansas, Ohio, and Mississippi Rivers10,11,12,13, as well as in the Illinois River5,6,7 and elsewhere14,15. Flow extremes are moderated in regulated rivers such as the Ohio, Mississippi, and Illinois Rivers where locks and dams lengthen the water residence time and increase the water clarity in the quiescent river pools between the dams16,17. Regulated rivers also often have abundant nutrient supply3,4,5,6 which can support phytoplankton blooms during low flow periods, when water residence time is prolonged, when water is warmer than average, and when turbidity from suspended sediments is often at its lowest16,17.

Chlorophyll-a (chl-a) is often used as a measure of phytoplankton; however, riverine chl-a can reflect a myriad of algal types and is not distinctly diagnostic of phytoplankton18. Also, the relationship between chl-a and autotrophic biomass may vary greatly depending on light, nutrients, temperature, and other factors19. Use of metabolism metrics in rivers can improve understanding of the drivers of river algal blooms20 and can help anticipate future changes in river health21,22,23. For example, changes in the sign of NEP and in the temporal correlation of GPP and ER can signal changes in the relative importance of phytoplankton versus submerged aquatic vegetation as dominant primary producers in rivers21.

Most previous metabolism estimation in rivers was focused on streams and small rivers2. To motivate further use of the IRB metabolism data8, we plotted long-term average metabolism for 17 IRB river sites (Fig. 2). Like many heterotrophic streams and rivers that process substantial inputs of allochthonous organic matter1,2,9,23, the metabolism of IRB rivers was generally heterotrophic (Fig. 2).

Fig. 2

Average gross primary productivity (GPP) versus ecosystem respiration (ER) in regulated rivers and various tributaries of the Illinois River Basin (IRB), USA. IRB study rivers are distinguished by symbol color with symbol size scaled by mean river discharge. Dashed line denotes where net ecosystem productivity (NEP) equals zero and separates heterotrophic from autotrophic conditions. The orange cross shows the approximate inter-quartile range of average GPP and ER for 18 “unshaded and stable flow” rivers in the United States2.

The overall productivity of IRB rivers (mean GPP = 2.77 g O2 m−2 d−1) was representative of the relatively high productivity of a subgroup of 18 high productivity “unshaded and stable flow” rivers evaluated as part of a study of 220 rivers and streams2 (Fig. 2). Productivity was generally higher in unshaded and stable flow rivers compared to most other streams and rivers because of greater light availability and because smaller variations of river discharge disturb autotrophs less frequently2. Only one of our IRB study rivers (Fox R. with an average GPP of 7.13 g O2 m−2 d−1) was a standout in productivity compared to the unshaded and stable flow subgroup. However, nearly all IRB rivers were substantially higher (more negative) in ER (mean ER = −6.05 g O2 m−2 d−1) compared with the unshaded and stable flow subgroup from the broader analysis2 (Fig. 2).

Our dataset indicates that IRB river metabolism is heterotrophic overall (mean IRB river NEP = −3.28 g O2 m−2 d−1); however, IRB rivers were intermittently autotrophic, with autotrophy accounting for between 1 and 56% of the measured days (Table 6 and Fig. 2). At one extreme, the Kankakee and Des Plaines Rivers were usually strongly heterotrophic and were only autotrophic on 1% and 5% of days, respectively. At the other extreme, the Illinois and Fox Rivers were autotrophic on 33% and 43% of days, respectively. Tributaries were intermediate in their autotrophy, ranging between 12% and 23% of days (Table 6 and Fig. 2).

Frequent autotrophy in rivers is an indicator of phytoplankton production but does not in itself imply it21. However, the correspondingly high chlorophyll-a (chl-a) measurements in the Illinois and Fox Rivers6, combined with visual reporting and analytical determinations of planktonic algae5,7, indicate that phytoplankton blooms are common in the IRB. We encourage further analysis of our IRB river metabolism data set8 in the context of water quality24,25 and river conditions26,27,28 to better understand the triggers and consequences of riverine planktonic algal blooms, in the IRB and elsewhere.

Methods

Initial site selection for metabolism estimation in IRB rivers was based on the availability of dissolved oxygen data accessed from the U.S. Geological Survey National Water Information System25 (USGS NWIS). We used the USGS National Water Dashboard, including its scalable maps of water-quality data collection sites, to help identify NWIS site numbers with the needed input data. Potential river sites were identified by searching all “stream type” sites, including “streams”, “canals”, and “ditches”, with at least a year of continuous dissolved oxygen data (i.e., generally collected at 15-minute intervals). Sites that were obviously not lotic in character (e.g., wetlands, ponds, gravel pits) were excluded, which left seventeen IRB river sites that were appropriate for modeling long-term metabolism. Selected sites were linked to the National Hydrography Dataset (NHDPlus)26 to take advantage of documented river and catchment attributes.

We used USGS data retrieval software (dataRetrieval)29 to download between one and nine years of data from 17 selected IRB river sites (Table 1), including all continuous (sub-daily) measurements of dissolved oxygen concentration, water temperature, and specific conductivity, continuous daily water discharge and gage height (Table 2), and infrequently collected channel field measurements (Table 3). Barometric pressure was obtained separately through a request to NOAA30 using site latitude and longitude to select the closest measurement location for each river site. All of the dissolved oxygen (DO) data used in this study were quality assured and approved by the USGS. The DO data are expected to be of high quality because they were collected after 2010, by which time the use of optical DO sensors had become standard practice. Although it did not apply to our IRB data, recently collected USGS data that are available for download are sometimes provisional and not yet quality assured.

Table 2 List of data sources for metabolism modeling including USGS data obtained using USGS data retrieval software29 and NOAA National Centers for Environmental Information, U.S. Local Climatological Data (LCD)30.
Table 3 Parameters calculated from source data for metabolism modeling.

To model metabolism we took advantage of recent advancements with state-space models31,32,33 that simultaneously estimate three unknown metabolism variables: GPP, ER, and K600. Generally, models converge better and produce physically realistic estimates when GPP exceeds the rate of air-water oxygen exchange, a condition that accentuates diel variation in dissolved oxygen concentration and increases the signal-to-noise ratio, which aids model identification of the competing influences of GPP, ER, and K600. Nevertheless, metabolism estimation remains a challenge because of the potential difficulties in estimating three correlated parameters from a single oxygen time series.

To model metabolism in IRB rivers we used the streamMetabolizer R package (https://github.com/USGS-R/streamMetabolizer), a widely tested and well documented state-space metabolism model33. This model uses the one-station modeling approach, which assumes that sensor data collected at a single point in a river are representative of a well-mixed water column. The accuracy of DO measurements is also important; however, the measurement accuracy has improved substantially since high-quality optical dissolved oxygen sensors came into routine use (approximately 2005). Furthermore, the model does not quantify anaerobic respiration, which is sometimes significant in low-oxygen rivers. In addition to assuming well-mixed conditions, the one-station modeling approach assumes homogeneous upstream conditions affecting metabolism for a distance that is assumed to be proportional to v/K, where v is stream velocity and K is the gas exchange coefficient.

The governing mass balance equations equate the instantaneous rate of change in DO [O2] in the river with the sum of the rates of DO inputs and outputs by metabolism and gas exchange32. Expressed as volumetric rates, the mass balance for DO is:

$$\frac{d[{O}_{2}]}{dt}={P}_{t}+{R}_{t}+{D}_{t}$$
(1)

where d[O2]/dt is the rate of change in water column O2 [mg O2 L−1 d−1]; Pt is the instantaneous volumetric rate of oxygen addition by gross primary production [mg O2 L−1 d−1]; Rt is the instantaneous volumetric rate of oxygen removal by respiration [mg O2 L−1 d−1]; and Dt is the instantaneous volumetric rate of air-water oxygen exchange [mg O2 L−1 d−1]. By definition, Pt should be greater than or equal to zero, Rt should be less than or equal to zero, and gas exchange, Dt, can take either sign. The streamMetabolizer model33 restructures the oxygen balance expressions so that long-term oxygen time series can be used to estimate daily metabolism variables through the solution of the following equations:

$${P}_{t}={\boldsymbol{GPP}}\times \frac{1}{h}\times \frac{\left({t}_{1}-{t}_{0}\right)\times PPF{D}_{t}}{{\int }_{u={t}_{0}}^{{t}_{1}}PPF{D}_{u}\,du}$$
(2)
$${R}_{t}={\boldsymbol{ER}}\times \frac{1}{h}$$
(3)
$${D}_{t}={K}_{2,t}\times \left({O}_{sat,t}-{O}_{mod,t}\right)$$
(4)
$${K}_{2,t}={{\boldsymbol{K}}}_{{\bf{600}}}\times {\left(\frac{{S}_{A}+{S}_{B}{T}_{t}+{S}_{C}{T}_{t}^{2}+{S}_{D}{T}_{t}^{3}}{600}\right)}^{-0.5}$$
(5)

where GPP is the daily areal average rate of primary production [g O2 m−2 d−1], ER is the daily areal average rate of respiration [g O2 m−2 d−1], and K600 is the daily average gas exchange rate constant normalized for molecular properties and temperature to a Schmidt number of 600 [day−1]. Variables with subscript t are instantaneous values that are typically estimated from 15-minute interval measurements. The rate of gas exchange, Dt, is the product of the rate constant and the deficit between saturated and modeled concentrations of dissolved O2. Rather than fitting the actual gas exchange coefficient, i.e., the K2,t value, the model fits K600, so that only one standardized gas-exchange-related parameter per day need be reported while still reflecting the within-day variation in gas exchange rates caused by diel variation in temperature. Additional variables are h, mean river depth averaged over the width and upstream length of the reach affecting the oxygen balance [m]; PPFD, photosynthetic photon flux density [μmol photons m−2 d−1]; Osat,t, saturated O2 concentration [mg O2 L−1]; Omod,t, model estimated O2 concentration [mg O2 L−1]; K2,t, O2-specific and temperature-specific gas exchange coefficient [day−1]; Tt, water temperature [°C]; and S, Schmidt number coefficients: SA = 1568, SB = −86.04, SC = 2.142, and SD = −0.0216. The solution approach is described in detail in Appling et al.33.
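To illustrate how Eqs. 1 to 5 fit together, the following Python sketch evaluates the instantaneous oxygen balance for a single time step. It is not the streamMetabolizer implementation, which is written in R and fits GPP, ER, and K600 by Bayesian inference; the function names and the input values in the usage lines are hypothetical. The light term relies on the identity that (t1 − t0) × PPFDt divided by the daily integral of PPFD equals PPFDt divided by the daily mean PPFD.

```python
# Schmidt number coefficients for O2, as listed in the text above
S_A, S_B, S_C, S_D = 1568.0, -86.04, 2.142, -0.0216


def schmidt_o2(temp_c):
    """Schmidt number for O2 at water temperature temp_c [deg C] (polynomial in Eq. 5)."""
    return S_A + S_B * temp_c + S_C * temp_c**2 + S_D * temp_c**3


def k2_from_k600(k600, temp_c):
    """Eq. 5: temperature-specific O2 exchange coefficient [1/d] from K600 [1/d]."""
    return k600 * (schmidt_o2(temp_c) / 600.0) ** -0.5


def do_rate(gpp, er, k600, depth_m, ppfd_t, ppfd_daily_mean, temp_c, o_sat, o_mod):
    """Eqs. 1-4: instantaneous dO2/dt [mg O2 / L / d].

    gpp and er are daily areal rates [g O2 / m2 / d]; dividing by depth [m]
    gives volumetric rates in g / m3 / d, which equals mg / L / d.
    """
    p_t = gpp / depth_m * (ppfd_t / ppfd_daily_mean)        # Eq. 2
    r_t = er / depth_m                                      # Eq. 3
    d_t = k2_from_k600(k600, temp_c) * (o_sat - o_mod)      # Eq. 4
    return p_t + r_t + d_t                                  # Eq. 1


# Hypothetical night-time step (no light) with water exactly at saturation:
# only respiration contributes, so the rate equals er / depth_m.
night_rate = do_rate(gpp=3.0, er=-6.0, k600=2.0, depth_m=2.0,
                     ppfd_t=0.0, ppfd_daily_mean=500.0,
                     temp_c=20.0, o_sat=9.0, o_mod=9.0)
print(night_rate)
```

Because the Schmidt number at 20 °C is below 600, the temperature-specific coefficient returned by `k2_from_k600` at that temperature is larger than K600 itself.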

River depth estimation

River depth is necessary for metabolism estimation and the accuracy of depth estimation has a directly proportional effect on the estimation accuracy of GPP and ER. An approach previously underutilized for depth estimation in multi-river metabolism studies is using channel field measurements by the U.S. Geological Survey. We used a linear rating curve approach for estimating river depth that was based on USGS field measurements of channel width, channel area, gage height, channel discharge and channel cross-section average velocity. We obtained those field measurements from USGS NWIS25 using the dataRetrieval29 function “readNWISmeas()” that referenced USGS NWIS site number and start and end date, which often returned tens of field measurements for each site during the period of interest.

To use the linear rating curve approach to estimate river depth, the cross-section averaged depth was determined for days with field measurements by dividing the measured flow cross section by the wetted channel width:

$${h}_{fm}={A}_{fm}/{w}_{fm}$$
(6)

where hfm is the field measured river depth, Afm is the field measured channel cross-sectional area, and wfm is the field measured wetted width of the river.

River depth for all model days was estimated from a linear estimation equation:

$$h=m\cdot GH+b$$
(7)

where h and GH are river depth and measured gage height, respectively, and model coefficients m and b for this equation were determined from a linear regression of the field measured river depth against measured gage height on the days of the field measurements.
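The linear rating curve of Eqs. 6 and 7 can be sketched in a few lines of Python. This is an illustration only, assuming ordinary least squares; the study's own scripts are in R, and the function names and field-measurement values below are hypothetical.

```python
def field_depth(area_m2, width_m):
    """Eq. 6: cross-section averaged depth from field-measured area and wetted width."""
    return area_m2 / width_m


def fit_line(x, y):
    """Ordinary least-squares fit of y = m * x + b; returns (m, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx
    return m, mean_y - m * mean_x


# Hypothetical field measurements (not values from the data release)
gage_height_m = [1.0, 2.0, 3.0]
area_m2 = [60.0, 110.0, 160.0]
width_m = [50.0, 50.0, 50.0]

depths_m = [field_depth(a, w) for a, w in zip(area_m2, width_m)]
m, b = fit_line(gage_height_m, depths_m)


def depth_from_gage_height(gh_m):
    """Eq. 7: estimated river depth for any modeled day with a gage height record."""
    return m * gh_m + b


print(depth_from_gage_height(2.5))
```

With these made-up measurements the fitted slope is close to 1 and the intercept close to 0.2 m, so the sketch simply recovers the linear relation used to generate them.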

Usually, we excluded USGS field measurements rated as “poor” from the regression of field measured river depth on gage height. At some sites, however, most of the field measurements, and sometimes all of them, were rated as poor. Nevertheless, if the gaging cross section was representative of upstream conditions, we usually judged that using field measurements to estimate river depth was superior to hydraulic geometry estimation of river depth no matter what the quality rating of the field measurements. The preferred water depth estimation method for each site is noted in Table 7.

We used the linear rating curve estimation approach for estimating river depth at thirteen of the seventeen IRB river sites where the river width at the sensor location was representative of upstream conditions (see details in next section). However, four of the seventeen river sites were located at relatively narrow control sections for which river depth estimates at the sensor location were not representative of upstream conditions. For those sites we used a hydraulic geometry approach34 to estimate cross-section average river depth, hhgc, as:

$${h}_{hgc}=c\cdot {Q}^{f}$$
(8)

where c and f are hydraulic geometry coefficients35 for each of the river reach codes (comID26) associated with our IRB river sites, and Q is continuous discharge at the IRB river site.
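Equation 8 is a simple power law, sketched below in Python. The coefficient values are hypothetical placeholders chosen for easy arithmetic; the actual reach-specific coefficients are those distributed with the data release in hydraulic_coeffs.txt.

```python
def hydraulic_depth(q_m3s, c, f):
    """Eq. 8: cross-section average depth [m] from discharge Q [m3/s]."""
    if q_m3s < 0:
        raise ValueError("discharge must be non-negative")
    return c * q_m3s ** f


# Hypothetical coefficients for illustration only
c_example, f_example = 0.25, 0.5
print(hydraulic_depth(100.0, c_example, f_example))
```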

Assessing site representativeness of river conditions

The one station method for estimating metabolism depends on the measurement site representing both local and upstream conditions that affect metabolism estimates. A well-mixed water column, both vertically and laterally, is assumed with longitudinal consistency in river physical and biological conditions34. Those assumptions have been examined theoretically36 but are not often tested at field sites. For the present study we assessed the consistency of river width at the oxygen sensor site with river width upstream to evaluate whether the local measured river depth was representative of upstream conditions.

It is not unusual for USGS gaging and sensor measurement cross sections to be located at “control sections” that are narrower than average for the river reach, in which case the field measurements from the cross section may differ from the reach average. Both the average river depth and average velocity could be overestimated in a narrower than average measurement cross section. We consulted the USGS “water-year summary” for each site25 and we visually examined the gaging cross section and upstream conditions using publicly available aerial imagery (https://www.google.com/maps). The sensor location and gaging cross section where depth was measured by USGS field crews was determined from the description provided in the water-year summary25. Using the imagery, we examined the consistency of river width at the measurement site for approximately 10 kilometers upstream of the oxygen measurement site. Because the regulated rivers of the IRB were relatively consistent in width, we could estimate the river depth at most sites using the linear rating curve approach as described in the previous section.

To accurately estimate river metabolism, we also had to be concerned how close the site was to upstream flow regulation structures, e.g., locks and dams, or lakes. If close enough, those features affect dissolved oxygen concentrations in ways that disrupt the river metabolic signals being modeled at the sensor site. Proximity is usually judged by estimating the “metabolism reach length”, i.e., the distance required for substantial turnover of the dissolved oxygen in the water column by gas exchange with the atmosphere. Metabolism reach length was estimated as the river distance required for 80% turnover in river dissolved oxygen by gas exchange34, i.e., the distance where upstream river conditions are likely to influence metabolism calculations. For each day in each river, we estimated the metabolism reach length as:

$${\rm{metabolism}}\;{\rm{reach}}\;{\rm{length}}=-\ln \left(1-0.8\right)\frac{v}{{K}_{{O}_{2}}}$$
(9)

where v is the cross-section averaged river velocity in m d−1, and \({K}_{{O}_{2}}\) is the air-water exchange coefficient for oxygen that was calculated from the K600 using the measured water temperature and published analysis equations and coefficients33. Cross-section averaged river velocity was estimated by dividing daily average discharge by the estimated cross-sectional channel area for that day:

$$v=Q/{A}_{fm}$$
(10)

where Afm is the field measured channel cross-sectional area. The cross-sectional area, A, for each modeled day was estimated using a linear estimation equation:

$$A=m\cdot GH+b$$
(11)

where GH is gage height and m and b for this equation are model coefficients determined from a linear regression of the field measured cross-sectional channel area against measured gage height for the days of the field measurements.
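A minimal Python sketch of Eqs. 9 to 11 follows. It assumes discharge in m3 s−1 (converted to m d−1 of velocity with 86,400 s per day) and an oxygen exchange coefficient in d−1; the function names and all numeric inputs are hypothetical.

```python
import math

SECONDS_PER_DAY = 86400.0


def area_from_gage_height(gh_m, m, b):
    """Eq. 11: cross-sectional area [m2] from gage height via a fitted line."""
    return m * gh_m + b


def velocity_m_per_day(q_m3s, area_m2):
    """Eq. 10: cross-section averaged velocity, converted from m/s to m/d."""
    return q_m3s / area_m2 * SECONDS_PER_DAY


def metabolism_reach_length_m(v_m_per_day, k_o2_per_day, turnover=0.8):
    """Eq. 9: distance [m] for the given fractional O2 turnover by gas exchange."""
    return -math.log(1.0 - turnover) * v_m_per_day / k_o2_per_day


# Hypothetical day: rating-curve coefficients, gage height, discharge, and K_O2
area = area_from_gage_height(gh_m=2.0, m=50.0, b=10.0)   # 110 m2
v = velocity_m_per_day(q_m3s=55.0, area_m2=area)          # 0.5 m/s -> 43,200 m/d
length_km = metabolism_reach_length_m(v, k_o2_per_day=2.0) / 1000.0
print(round(length_km, 1))
```

The factor −ln(1 − 0.8) is about 1.61, so the reach length is roughly 1.6 times the distance the water travels during one gas-exchange time scale (1/KO2).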

To compare the estimated metabolism reach length with field conditions, we measured the distance from the metabolism sensor site to the nearest upstream flow regulation structures, e.g., lock and dam, or lake, by visual inspection of publicly available aerial imagery (https://www.google.com/maps) where we used that product’s measurement tool to estimate the distance from the metabolism sensor site.

Workflow for modeling IRB river metabolism

We used R Statistical Software37 to process existing data to create model inputs, verify model inputs, run the streamMetabolizer model, and post-process and quality assure the results (Fig. 3).

Fig. 3

Workflow overview showing data processing and preparation of input files, model execution, post processing and quality assurance of model results.

The broad outlines of the workflow are documented in Fig. 3 and Table 4 and briefly summarized here. Running the first script time-matched the downloaded data, converted units, and filled time gaps shorter than 3 hours by linear interpolation. Running script 2 calculated model input variables such as solar time, saturated dissolved oxygen concentration, and river depth, estimated a proxy for light intensity at the river surface, and produced an output file compatible with the requirements of streamMetabolizer. The script 2 calculations were based on published functions34, except for the new method of estimating river depth discussed in the “River depth estimation” section.
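The gap-filling step can be pictured with the following Python sketch. It is not the released R script; it assumes that a gap's length is measured as the time between the valid observations on either side of it, and the function name and sample values are hypothetical.

```python
def fill_short_gaps(times_min, values, max_gap_min=180.0):
    """Linearly interpolate runs of None bounded by valid values.

    A run is filled only if the time between its bounding observations is
    shorter than max_gap_min (3 hours by default). Leading, trailing, and
    longer gaps are left as None.
    """
    filled = list(values)
    n = len(values)
    i = 0
    while i < n:
        if values[i] is not None:
            i += 1
            continue
        start = i - 1                      # last valid index before the gap
        j = i
        while j < n and values[j] is None:
            j += 1                         # j ends at first valid index after the gap
        if start >= 0 and j < n:
            span = times_min[j] - times_min[start]
            if span < max_gap_min:
                for k in range(i, j):
                    frac = (times_min[k] - times_min[start]) / span
                    filled[k] = values[start] + frac * (values[j] - values[start])
        i = j
    return filled


# Hypothetical 15-minute dissolved oxygen series [mg/L] with a 45-minute gap
short = fill_short_gaps([0, 15, 30, 45], [8.0, None, None, 9.5])
# Hypothetical hourly series with a 4-hour gap, which is left unfilled
long_gap = fill_short_gaps([0, 60, 120, 180, 240], [8.0, None, None, None, 9.0])
print(short, long_gap)
```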

Table 4 Summary documentation of scripts.

Running script 3 provided a consistency check with script 1 outputs before running script 4 to run the streamMetabolizer model. Script 5 post processes the model outputs to produce results and model diagnostics where daily metabolism results are flagged based on established criteria34. Also provided are plots for visual evaluation of the results as well as censored versions of metabolism output files that remove results for all days that were flagged. Details are provided in the “Quality assurance” section. Table 4 summarizes script operation in data acquisition, preparation of inputs, running the model, and post-processing outputs to evaluate and quality assure the model results.

Running the metabolism model

We ran streamMetabolizer version 0.12.0 on a laptop using R version 4.1.1 (ref. 37). Computational times varied between 1 and 12 hours per site, with the two IRB sites with more than 5 years of record (Kankakee River at Davis and Illinois River at Florence) needing to be split into approximately 3-year segments to facilitate run completion. We used the streamMetabolizer option for Bayesian partial pooling in our models, which conditions estimates of K600 based on the expectation that K600 varies as a function of discharge. Appling et al.33 showed that partial pooling helps improve model performance because, although partial pooling does not impose a strict relationship between K600 and discharge, it establishes an across-day, piecewise linear relationship between ln(K600) and ln(Q) that helps improve the estimation of GPP, ER, and K600. Models were run with the recommended setup using four Markov chain Monte Carlo (MCMC) chains and 1000 warmup steps. The streamMetabolizer model calculates values of the Gelman-Rubin statistic for observational error, \({\widehat{{\rm{R}}}}_{{{\rm{\sigma }}}_{obs}}\), process error, \({\widehat{{\rm{R}}}}_{{{\rm{\sigma }}}_{proc}}\), and K600 estimation error, \({\widehat{{\rm{R}}}}_{{{\rm{\sigma }}}_{K600}}\), with values ≤ 1.1 used as an initial screening criterion to indicate that the model converged adequately38,39. Many of the IRB models converged on the first run; for those that did not, we ran the models again after increasing the number of warmup (burn-in) steps to 1500. After the model runs were completed, we compiled the results and used the final diagnostic values reported by streamMetabolizer in our quality assurance steps. Also, at several river sites we tested the sensitivity of the results to the default initial values for GPP, ER, and K600 provided in streamMetabolizer by varying the initial values by approximately a factor of two, and found that model outcomes were robust.
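For readers unfamiliar with the convergence screen, the Python sketch below computes the classic Gelman-Rubin statistic from several equal-length chains and applies the ≤ 1.1 criterion. It is a textbook form given for illustration, not the exact calculation performed inside streamMetabolizer or its Stan backend (which may use a split-chain variant); the chain values are made up.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])      # W: mean within-chain variance
    between = n * variance(chain_means)               # B: between-chain variance
    var_hat = (n - 1) / n * within + between / n      # pooled variance estimate
    return (var_hat / within) ** 0.5


def converged(rhat_values, threshold=1.1):
    """Initial screening: every reported R-hat must be at or below the threshold."""
    return all(r <= threshold for r in rhat_values)


# Hypothetical posterior draws: two chains that agree, and two that do not
agree = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
disagree = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]
print(gelman_rubin(agree), gelman_rubin(disagree))
```

Chains sampling the same distribution give a value near or below 1, whereas chains stuck in different regions give a value well above the 1.1 screen.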

Quality assurance

Daily model outputs were flagged based on indicators of poor signal-to-noise strength of the modeled time series, and indicators of biologically and physically unrealistic outcomes for GPP, ER, and K600. For Flag 1, we compared each day’s coefficient of determination of modeled oxygen, R2det, against a threshold to assess signal-to-noise strength. For Flags 2 and 3, we assessed biologically unrealistic values of GPP and ER, respectively, following a previous example34 that allowed for slightly negative GPP and slightly positive ER outcomes to reflect error variation. Lastly, for Flag 4 we assessed physically unrealistic values of K600 (Table 5).
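The four daily flags can be expressed as a small Python function. This is a sketch of the logic only: the thresholds are deliberately left as required arguments because the actual values are those given in Table 5, and the numbers in the usage lines are hypothetical placeholders, not the study's criteria. It also assumes that Flag 1 is raised when the coefficient of determination falls below its threshold and that Flag 4 uses a lower and an upper bound.

```python
def flag_day(r2, gpp, er, k600, *, r2_min, gpp_min, er_max, k600_min, k600_max):
    """Return the list of flag numbers (1-4) raised for one modeled day."""
    flags = []
    if r2 < r2_min:                          # Flag 1: weak signal-to-noise
        flags.append(1)
    if gpp < gpp_min:                        # Flag 2: GPP too negative
        flags.append(2)
    if er > er_max:                          # Flag 3: ER too positive
        flags.append(3)
    if not (k600_min <= k600 <= k600_max):   # Flag 4: unrealistic K600
        flags.append(4)
    return flags


# Hypothetical placeholder thresholds (see Table 5 for the actual criteria)
limits = dict(r2_min=0.5, gpp_min=-0.5, er_max=0.5, k600_min=0.0, k600_max=50.0)

clean_day = flag_day(0.9, 3.0, -6.0, 2.0, **limits)
bad_day = flag_day(0.2, -1.0, 1.0, 80.0, **limits)
print(clean_day, bad_day)
```

A censored output file in the sense used here would keep only days for which the returned list is empty.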

Table 5 Flagging of daily estimates of GPP, ER, and K600 and confidence criteria for overall metabolism outcomes at IRB river sites.

Our overall confidence assessments in metabolism outcomes followed Appling et al.34 (Table 5). We assessed the percentage of days that estimated GPP, ER, and K600 fell outside biologically or physically realistic thresholds as well as assessing model convergence statistics (\(\widehat{{\rm{R}}}\)) that could indicate inadequate convergence of parameter estimates. Lastly, we assessed potential interference in metabolism estimation depending on proximity of nearest upstream dam or lake (Table 5).

To evaluate overall confidence in metabolism results for IRB rivers, we ranked each river by combining the individual ratings for the five criteria (Table 5). A river site’s individual ratings needed to be high for all five criteria for that site’s overall metabolism output to rank as “High” in confidence. A single low rating for any criterion earned a “Low” overall confidence assessment. All other combinations of individual ratings earned a “Medium” overall confidence assessment for a river site’s estimated metabolism (Table 5).
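The combination rule described above can be written compactly. The Python sketch below assumes each of the five criteria has already been rated “High”, “Medium”, or “Low”; the function name and the example ratings are hypothetical.

```python
def overall_confidence(ratings):
    """Combine five per-criterion ratings into one overall confidence level."""
    allowed = {"High", "Medium", "Low"}
    if len(ratings) != 5 or any(r not in allowed for r in ratings):
        raise ValueError("expected five ratings of 'High', 'Medium', or 'Low'")
    if "Low" in ratings:                      # any single low rating
        return "Low"
    if all(r == "High" for r in ratings):     # high on every criterion
        return "High"
    return "Medium"                           # all other combinations


print(overall_confidence(["High", "High", "Medium", "High", "High"]))
```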

Data Records

Our U.S. Geological Survey data release8 (https://doi.org/10.5066/P9TEBOUR) presents long-term aquatic metabolism estimation at 17 river sites in the IRB. The principal outcomes are 15,176 daily estimates of GPP, ER, and K600 accompanied by sub-daily input timeseries of dissolved oxygen, temperature, barometric pressure, and river depth and discharge, as well as diagnostic metrics and statistics which we used to assess the quality of model outcomes. Our source data for the IRB (Table 1) had only minimal overlap encompassing a partial record for one site, DES PLAINES RIVER AT JOLIET, IL, with a previous multi-river modeling study40.

Metabolism estimates for the Illinois River and Fox River indicate that autotrophic conditions occur between 14 and 56% of days compared to the Kankakee and Des Plaines Rivers, which experienced autotrophy on just a few percent of days (Table 6). Metabolism in the regulated rivers of the IRB can be informative about hydrologic, biogeochemical, and ecosystem health issues in larger rivers managed for navigation. We particularly encourage use of the IRB river metabolism data8 by joining with other IRB data sets24 to identify and isolate drivers and develop early warning indicators of planktonic algal blooms in rivers.
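As a starting point for reuse of the daily estimates, the percentage of autotrophic days can be derived from GPP and ER as sketched below in Python. The function name and the four-day series are hypothetical; only the basin-wide means in the final check are taken from the text.

```python
def percent_autotrophic_days(gpp, er):
    """Percent of days with NEP = GPP + ER > 0 (ER is negative by convention)."""
    nep = [g + e for g, e in zip(gpp, er)]
    return 100.0 * sum(1 for x in nep if x > 0) / len(nep)


# Hypothetical four-day record [g O2 / m2 / d]
gpp_daily = [2.0, 8.0, 3.0, 7.0]
er_daily = [-6.0, -5.0, -4.0, -9.0]
print(percent_autotrophic_days(gpp_daily, er_daily))

# Consistency check using the basin-wide means reported in the text
mean_nep = 2.77 + (-6.05)
print(round(mean_nep, 2))
```

Only the second hypothetical day has GPP exceeding the magnitude of ER, so one day in four is autotrophic.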

Table 6 Time-averaged IRB river discharge, metabolism, and percent of days at each site with autotrophic metabolism, i.e. NEP > 0.

Data release file structure

Our data release8 provides files documenting metabolism estimation for 17 IRB rivers and the associated workflow. The main landing page of the USGS data release includes the metadata, readme file, and scripts (R code). From there, two child items lead to “Input data” and “Output data” pages, each with additional metadata and downloadable files. The data release can be accessed at https://doi.org/10.5066/P9TEBOUR. The structure of the data release and locations of downloadable files are summarized below:

MAIN PAGE: Metadata File, Readme File, and Scripts

  • RiverMET_workflow_and_scripts_metadata.xml: Metadata file describing overview of workflow and scripts

  • RiverMET_readMe.txt: Readme file providing overview of file contents and guidance for running the scripts

  • RiverMET_Scripts.zip: R code scripts 1 through 5 are provided and can be downloaded with this zip file. For convenience, we list the Script names and note behind each Script the input and output files that are downloadable under Child Item 1 (Inputs) and Child Item 2 (Outputs) as described further below:

    • 1_Process-Data.R (note: Script-1 input files not included but output from Script-1 is provided in the form of Script-2 input files)

    • 2_Prepare-Model-InputFiles.R (note: Script 2 input files included, see Child Item 1; Script-2 output files also included and are equivalent to Script-3 and Script-4 input files, see Child Item 2)

    • 3_Verify-Model-InputFiles.R (note: Script-3 output files not included because this is an optional step for cross checking files)

    • 4_Run-streamMetabolizer.R (note: Script-4 output files are not included because they are not useful without first being processed by Script-5)

    • 5_PostProcess-ModelOutputs.R (note: Script-5 output files are included, see Child Item 2)

CHILD ITEM 1: Input Files

  • RiverMET_Input_Files_metadata.xml: Metadata file describing all input data including column headers and data units.

  • RiverMET_Inputs.zip: Downloadable Script 2 input files with filenames and contents summarized below.

    • barop.csv – barometric pressure in millibar (mb); 15-minute time series

    • disch_gage.csv – discharge in m³ s⁻¹, gage height in m; 15-minute time series

    • do.csv – dissolved oxygen in mg/L; 15-minute time series

    • sal.csv – salinity in Practical Salinity Units (PSU); 15-minute time series

    • temp.csv – water temperature in degrees Celsius (°C); 15-minute time series

    • hydraulic_coeffs.txt – hydraulic geometry coefficients a, b, c, and f as used in the estimation equations for river width, \(B = aQ^{b}\), and river depth, \(h = cQ^{f}\), where Q is river discharge, B is river width, and h is river depth.
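The width and depth relations listed for hydraulic_coeffs.txt can be illustrated with a minimal Python sketch. It is not taken from the data release scripts (which are written in R); the function name and the coefficient values are hypothetical and serve only to show how width and depth follow from discharge.

```python
def hydraulic_geometry(q, a, b, c, f):
    """Return (width, depth) from discharge using B = a*Q^b and h = c*Q^f.

    q is river discharge; a, b, c, and f are site coefficients of the kind
    stored in hydraulic_coeffs.txt. The units of width and depth follow from
    the units used when the coefficients were fitted.
    """
    if q < 0:
        raise ValueError("discharge must be non-negative")
    width = a * q ** b
    depth = c * q ** f
    return width, depth


# Hypothetical coefficients, for illustration only (not values from the release).
width, depth = hydraulic_geometry(q=100.0, a=10.0, b=0.5, c=0.5, f=0.4)
print(f"width = {width:.1f}, depth = {depth:.2f}")
```

Applied to each 15-minute discharge value, a function of this kind would yield the depth time series that the “depth-hgc” model inputs rely on.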

CHILD ITEM 2: Output Files

  • RiverMET_Output_Files_metadata.xml: Metadata file describing all output data including column headers and data units.

  • RiverMET_Outputs.zip: Downloadable output files in two folders, “outputs_from_script-2” and “outputs_from_script-5”. Script-2 output files are ready for modeling using streamMetabolizer. Script-5 output files are the final metabolism outputs from our study. Output files details are described below:

    • RiverMET_Outputs.zip/outputs/outputs_from_script-2/: (note: 34 csv files, with 17 using hydraulic geometry estimation of river depth and 17 using gage height estimation of river depth; example filename: bayesInput_[date]_depth-hgc_[site_no].csv)

    • RiverMET_Outputs.zip/outputs/outputs_from_script-5/: (note: “outputs_from_script-5” has two folders, “outputs-A” and “outputs-B”. Each folder has 21 files, including 15 site files plus 3 files each for 2 long-record sites. The “outputs-A” filenames follow this example: flagged_GPP_ER_K600_[date]_depth-hgc_[site_no].csv. The “outputs-B” filenames follow this example: censored_GPP_ER_K600_[date]_depth-hgc_[site_no].csv)
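For users who want to organize the downloaded files programmatically, the example filenames above can be split into their parts. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes that the [date] and [site_no] tokens contain no underscores or dots, which the data release readme should be consulted to confirm. Filenames that do not follow the example patterns (possibly including the files for the long-record sites) simply return no match.

```python
import re

# Assumption: the [date], depth-method, and [site_no] tokens contain no
# underscores or dots. The patterns mirror the example filenames only.
FILENAME_PATTERN = re.compile(
    r"^(?P<kind>bayesInput|flagged_GPP_ER_K600|censored_GPP_ER_K600)"
    r"_(?P<date>[^_.]+)"
    r"_depth-(?P<depth_method>[^_.]+)"
    r"_(?P<site_no>[^_.]+)\.csv$"
)


def parse_release_filename(name):
    """Return a dict of filename parts, or None if the name does not match."""
    match = FILENAME_PATTERN.match(name)
    return match.groupdict() if match else None


# Placeholder date token and site number, for illustration only.
parts = parse_release_filename("censored_GPP_ER_K600_DATE_depth-hgc_00000000.csv")
print(parts)
```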

Technical Validation

There is no universally accepted way to quality assure modeling results. In the IRB we assessed daily metabolism results by flagging values that exceeded thresholds based on biologically or physically unrealistic values or on daily model-fit diagnostics from the streamMetabolizer model (Table 5). Overall confidence in each river site’s model outcomes was assessed using aggregated metrics and statistical diagnostics, e.g., percentages of daily values that were flagged and model convergence statistics (Table 5).

In the IRB an average of 29% of the modeled days had one or more flags. As described in the section on “Data release file structure”, two output versions were produced to serve different needs. The censored version (“outputs-B”) provides only the GPP, ER, and K600 model estimates of the highest apparent quality, after removing all days with flags. However, some useful data may have been removed in the censoring process. The flagged version (“outputs-A”) provides complete results, including results for days with flags, which allows users to judge each day’s data and to perform custom assessments of model quality to meet specific needs.
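The relationship between the flagged and censored output versions can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the logic of 5_PostProcess-ModelOutputs.R: the field names, flag labels, and daily values below are hypothetical, and the actual flagging criteria are those listed in Table 5.

```python
def censor_flagged_days(rows):
    """Keep only the days that carry no quality flags."""
    return [row for row in rows if not row["flags"]]


def percent_flagged(rows):
    """Percentage of modeled days with one or more flags."""
    if not rows:
        return 0.0
    flagged = sum(1 for row in rows if row["flags"])
    return 100.0 * flagged / len(rows)


# Hypothetical daily results; field names, flag labels, and values are
# illustrative only and do not come from the data release.
daily = [
    {"date": "2020-06-01", "GPP": 4.2, "ER": -5.1, "K600": 2.0, "flags": []},
    {"date": "2020-06-02", "GPP": -0.3, "ER": -4.8, "K600": 1.9, "flags": ["negative_GPP"]},
    {"date": "2020-06-03", "GPP": 3.9, "ER": 0.4, "K600": 2.1, "flags": ["positive_ER"]},
    {"date": "2020-06-04", "GPP": 4.0, "ER": -4.9, "K600": 2.2, "flags": []},
]

censored = censor_flagged_days(daily)
print(len(censored), "of", len(daily), "days retained;",
      f"{percent_flagged(daily):.0f}% flagged")
```

Starting from the flagged version, a user could substitute a different filter, for example retaining days that carry only selected flags, to build a custom censored data set.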

In terms of overall confidence in model outcomes, thirteen of the seventeen IRB river metabolism time series earned an overall high or medium confidence ranking (Table 7). The most frequent criterion causing a low confidence ranking was exceedance of the \(\widehat{\mathrm{R}}_{\sigma_{K600}}\) statistic threshold of 1.2, which indicates problems with model convergence. The four river sites earning a low confidence ranking were FOX RIVER NEAR MCHENRY, IL; ILLINOIS RIVER AT FLORENCE, IL; SUGAR CREEK NEAR CHATHAM, IL; and LICK CREEK NEAR WOODSIDE, IL.

Table 7 Summary of metabolism model confidence assessment for the 17 river sites in IRB.

Approximately three quarters of the IRB river sites (76%) earned a high or medium confidence ranking, only slightly lower than the 84% reported for a similarly assessed set of rivers modeled by Appling et al.34. The IRB river metabolism results8 are therefore quality assured based on application of the best available diagnostic metrics and statistical criteria for models of this type. Nonetheless, the model confidence assessments are guidance only and do not override future investigations of model quality that may be more detailed or judged “fit for purpose”.

Usage Notes

Our data release8 provides metabolism outcomes and documents our workflow for modeling metabolism at 17 IRB river sites. Here we summarize descriptive information about the dataset and guidance for its use, including geographic coordinates and period of data availability for each site (Table 1), a summary of USGS parameter codes used for downloading (Table 2), information about calculating parameters needed as model inputs (Table 3), an overview of script workflows (Table 4), quality assurance criteria (Table 5), and metabolism outcomes (Table 6) including a model performance assessment (Table 7). In addition, our data release8 provides guidance for potential reuse of the code in the file RiverMET_readMe.txt, including suggestions for changes that may be needed to run on a different system, re-run IRB sites with different options, or adapt the scripts to model metabolism in other rivers.

Users who wish to adapt parts of our workflow will need to acquire publicly available data from USGS and NOAA. They can use existing software (dataRetrieval29) to download the needed USGS data for their sites of interest from USGS NWIS, including dissolved oxygen, water temperature, specific conductance, discharge, gage height, and field measurements of channel parameters, and they can obtain barometric pressure data from NOAA. After downloading their own data, users can adapt parts of 1_Process-Data.R to perform the data time matching, gap filling, and unit conversion (Table 4). As long as their code produces output files that match the input files for 2_Prepare-Model-InputFiles.R that we provide in our data release, they can likely make minor adaptations to run scripts 2, 3, 4, and 5 (as described in Table 4) to prepare final model inputs, run streamMetabolizer, and organize and quality assure their metabolism modeling results.
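As a rough illustration of the unit conversion and gap filling steps mentioned above, the following Python sketch converts discharge to m³ s⁻¹ and fills short gaps in a regularly spaced series. It is not a translation of 1_Process-Data.R. It assumes that the downloaded discharge is in cubic feet per second and that missing 15-minute values are marked as None; the linear interpolation rule and the max_gap limit are assumptions made here for illustration, and the function names are hypothetical.

```python
CFS_TO_CMS = 0.3048 ** 3  # cubic feet per second to cubic metres per second


def cfs_to_cms(values):
    """Convert discharge values from ft^3/s to m^3/s, keeping gaps as None."""
    return [None if v is None else v * CFS_TO_CMS for v in values]


def fill_short_gaps(values, max_gap=4):
    """Linearly interpolate interior runs of None no longer than max_gap steps.

    Leading gaps, trailing gaps, and runs longer than max_gap stay as None.
    The max_gap default is an arbitrary illustrative choice.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # index of the first valid value after the gap, or n
        gap = end - start
        if start == 0 or end == n or gap > max_gap:
            continue
        left, right = filled[start - 1], filled[end]
        for k in range(gap):
            frac = (k + 1) / (gap + 1)
            filled[start + k] = left + frac * (right - left)
    return filled


# Hypothetical 15-minute discharge record in ft^3/s with a two-step gap.
raw_cfs = [1000.0, None, None, 1300.0]
discharge_cms = cfs_to_cms(fill_short_gaps(raw_cfs))
print([round(q, 2) for q in discharge_cms])
```

Whatever gap filling rule is chosen, the resulting files should match the column headers and units described in RiverMET_Input_Files_metadata.xml so that script 2 can read them.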

Our data release8 also suggests approaches that can help expand the capacity for modeling river metabolism. For example, several of the IRB sites could perhaps have been included in an earlier study40; however, not all the needed input data were available at certain sites, and those sites were passed over. To facilitate modeling at those sites, where appropriate, we acquired the missing measurements from nearby “replacement” sites (Table 7). For example, at several sites dissolved oxygen was measured but river discharge was not. Discharge is needed for the Bayesian partial pooling that estimates K600 based on a prior expectation that K600 varies as a function of discharge. In such cases we “replaced” the missing discharge with data from a nearby site, which allowed metabolism estimation at sites previously overlooked because of missing data8. Because of the large river size where replacement discharges were used (often over 350 m³ s⁻¹) and the proximity of the replacement site (usually within 10 km), we did not scale by basin size when applying a replacement discharge.