Uncertainty analysis of model inputs in riverine water temperature simulations

Simulation models are often affected by uncertainties that influence the modeling results. One important type of uncertainty is associated with the model input data. The main objective of this study is to investigate the uncertainties of the inputs of the Heat-Flux (HFLUX) model. To do so, the Shuffled Complex Evolution Metropolis Uncertainty Algorithm (SCEM-UA), a Markov chain Monte Carlo (MCMC) based method, is employed for the first time to assess the uncertainties of model inputs in riverine water temperature simulations, and the performance of the algorithm is evaluated. In the application, histograms of the selected HFLUX inputs, including the stream width, stream depth, percentage of shade, and streamflow, were created and their uncertainties were analyzed. Comparison of the observed data and the simulations demonstrated the capability of the SCEM-UA algorithm to assess the uncertainties associated with the model input data (the maximum relative error was 15%).


Scientific Reports | (2021) 11:19908 | https://doi.org/10.1038/s41598-021-99371-0 | www.nature.com/scientificreports/

The SCEM-UA algorithm combines the strengths of the Metropolis algorithm, controlled random search, competitive evolution, and complex shuffling to infer the posterior target distribution by a continuous updating process 20. Due to these advantages, the SCEM-UA algorithm was selected for the current study. Lin et al. 9 used the GLUE algorithm and the SCEM-UA to address the issue of parameter uncertainty in conceptual hydrological modeling. Their results showed that when the threshold value was set at the interior sites, the runoff series simulated by the Xinanjiang model with the behavioral parameter sets fitted the observed runoff series better. Liu et al. 10 used the SCEM-UA to calibrate a hydrological model and to represent the posterior distribution functions of the hydrologic model output. The model, with 16 different parameters, was calibrated using two procedures, the SCEM-UA and the genetic algorithm (GA). Both methods performed very well, but the SCEM-UA performed better. Ajami et al. 3 proposed the Integrated Bayesian Uncertainty Estimator (IBUNE) to analyze the uncertainty of model parameters. They used the SCEM algorithm to analyze the parameter uncertainty of a rainfall-runoff model. By utilizing Bayesian model averaging (BMA) and the developed SCEM algorithm, they examined the total uncertainty of the model. Tang et al. 15 used the SCEM-UA algorithm to analyze the uncertainties of nonlinear structural systems and demonstrated that the algorithm effectively estimated the parameters with uncertainties. Liu et al. 11 evaluated the uncertainty of urban flood modeling. Due to the lack of reliable discharge data, they combined experimental data and modeling to characterize the floods and map the inundated areas on Xiamen Island, China.
Quantification of the uncertainties of heat-flux-based river temperature models is considered in evaluating the effectiveness of ecological restoration alternatives. In ecological restoration studies such as those on aquatic green infrastructure 4 and fish habitat 2, small changes in the driving parameters could lead to a faulty interpretation of the simulated water temperature, a fundamental factor affecting water quality and ecosystems (e.g., 12,18).
Thus, it is important to select an efficient method to assess the uncertainties associated with the input data and model parameters, as most methods require long running times and are computationally intensive for specific models. Neglecting uncertainty analysis can lead to a series of problems such as overdesign, higher costs, reduced reliability, and failure to achieve optimal benefits. Examining the uncertainties of model inputs provides comprehensive insight into their influences on the model outputs, which can improve the modeling efficiency and lower the cost of measuring those inputs.
The objective of this study is to investigate the uncertainties of the inputs of the HFLUX model using the SCEM-UA algorithm. As a new effort, the SCEM-UA algorithm is used for assessing the HFLUX model inputs, and its performance is evaluated. HFLUX 8 is an efficient and useful river water quality model for one-dimensional (1D) simulation of the spatio-temporal distribution of stream water temperatures. In simulating the temperature at each discretized node and each time step, HFLUX considers the heat fluxes from the environment and the lateral inflows of water to the node. It is also flexible in the choice of solution methods. Thus, the HFLUX model is selected for simulating water temperatures in this study. The simulation results are compared with the observed data to evaluate the performance of the SCEM-UA algorithm in the analysis of the uncertainties of the model inputs.

Materials and methods
In this study, the HFLUX model was coupled with the SCEM-UA algorithm for analyzing the uncertainties of the model inputs. The procedure started with selecting the inputs of the HFLUX model. With the linked HFLUX and SCEM-UA model and the implementation of an iteration scheme, the uncertainty of each of the selected inputs was obtained based on the ranges (minimum and maximum values) of the input data/parameters and Latin hypercube sampling. The simulations were then compared against the observed data to evaluate the performance of the SCEM-UA algorithm. These steps are depicted in Fig. 1.

River water temperatures simulated by the HFLUX model. River water temperature affects water quality and ecosystem health, and hence control of river water temperature is important for mitigating its adverse effects 1. The HFLUX model was used to simulate the streamflow temperatures at different locations and times. The model is highly flexible in terms of choosing the solution methods for solving the governing equations and selecting the energy budget terms such as shortwave solar radiation, latent heat flux, and sensible heat flux. The model input data include the initial spatial and temporal temperature conditions, stream geometry data, discharge data, and meteorological data 8. The water balance and energy balance equations are respectively given by 8:

$$\frac{\partial A}{\partial t} + \frac{\partial Q}{\partial x} = q_L \tag{1}$$

$$\frac{\partial (A T_w)}{\partial t} + \frac{\partial (Q T_w)}{\partial x} = q_L T_L + R \tag{2}$$

$$R = \frac{B\,\phi_{total}}{\rho_w C_w} \tag{3}$$

where $A$ is the cross-sectional area of the stream (m²), $x$ is the distance along the stream (m), $t$ is the time (s), $Q$ is the discharge of the stream (m³/s), $q_L$ is the lateral inflow per unit stream length (m²/s), $T_w$ is the stream temperature (°C), $T_L$ is the temperature of the lateral inflow (°C), $R$ is the energy flux (source or sink) per unit stream length (°C m²/s), $B$ is the width of the stream (m), $\phi_{total}$ is the total energy flux to the stream per surface area (W/m²), $\rho_w$ is the density of water (kg/m³), and $C_w$ is the specific heat of water (J/kg °C).
Equation (3) is based on a thermal datum of 0 °C, which affects the absolute value of the advective heat flux term. In Eq. (2), if $q_L$ is negative, the first term on the right-hand side of the equation becomes a loss of $q_L T_w$. Also, the dispersive heat transport, which is omitted in Eq. (2), is negligible when the longitudinal change in water temperature is small in comparison to the temporal changes 8.
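As a worked illustration of the source term, the short Python sketch below converts a total surface energy flux into R. It assumes the form R = B φ_total / (ρ_w C_w), which is what the units listed for R, B, φ_total, ρ_w, and C_w imply. The function name, the default water properties, and the sample numbers are hypothetical and are not taken from the Meadowbrook Creek data.

```python
def energy_source_term(width_m, phi_total_w_m2, rho_w=1000.0, c_w=4182.0):
    """Return R (degC m^2/s), the energy source or sink per unit stream length.

    Assumed form: R = B * phi_total / (rho_w * c_w), with
    B in m, phi_total in W/m^2, rho_w in kg/m^3, and c_w in J/(kg degC).
    """
    return width_m * phi_total_w_m2 / (rho_w * c_w)


# Hypothetical values, chosen only so that the arithmetic is easy to follow:
# 4.0 m * 418.2 W/m^2 / (1000 kg/m^3 * 4182 J/(kg degC)) = 4.0e-4 degC m^2/s
R = energy_source_term(width_m=4.0, phi_total_w_m2=418.2)
print(R)
```

Dividing R by the cross-sectional area A gives a heating rate in °C/s, which is how a surface flux ends up changing the water temperature in the energy balance.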
SCEM-UA algorithm. The SCEM-UA algorithm provides posterior distribution functions for the model parameters and input data by generating an initial sample from the parameter space. First, the indicators $n$, $q$, and $s$, which are respectively the dimension (the number of investigated inputs), the number of complexes (into which the population is divided), and the population size (the number of sample points), are determined for the algorithm. Then, the algorithm searches the sampling points in the feasible space and sorts the points according to their density. The algorithm determines the sequences and complexes based on those points. The sequences are the first $q$ points of the population, and the complexes are collections of $m$ points from the population, where $m = s/q$. In the next step, the points of each complex are sorted based on the density and a candidate point is generated, which can be mathematically expressed as 20:

$$\theta^{t+1} = \begin{cases} N\left(\theta^{t},\, c_n^2 \Sigma^k\right), & \alpha^k \le T \\ N\left(\mu^k,\, c_n^2 \Sigma^k\right), & \alpha^k > T \end{cases} \tag{4}$$

where $k = 1, 2, \ldots, q$, $\alpha^k$ is the ratio of the mean posterior density of the $m$ points of the complex to the mean posterior density of the last $m$ generated points of the sequence, $\theta$ denotes the points of the complexes, $c_n = 2.4/\sqrt{n}$, $T = 10^6$, $\mu^k$ is the mean, and $\Sigma^k$ denotes the covariance. To investigate the new points created by the algorithm, the points of the complexes are replaced by 20:

$$C^k:\ \theta^{t+1} \leftarrow \begin{cases} \theta^{t+1}, & Z \le \Omega \\ \theta^{t}, & Z > \Omega \end{cases} \tag{5}$$

where $C^k$ is the $k$th complex, $Z$ is drawn from the uniform distribution in the range of 0-1, and $\Omega$ is calculated by 20:

$$\Omega = \frac{P\left(\theta^{t+1} \mid y\right)}{P\left(\theta^{t} \mid y\right)} \tag{6}$$

where $P(\theta^{t+1} \mid y)$ and $P(\theta^{t} \mid y)$ are the posterior probability distributions for $\theta^{t+1}$ and $\theta^{t}$, respectively. Then, the algorithm examines the following condition for each complex. If the candidate point is rejected, the algorithm replaces the worst member of $C^k$ (the point with the lowest density) with $\theta^{t+1}$ when the condition holds 20:

$$\Gamma^k > T \tag{7}$$

where $\Gamma^k$ is the ratio of the posterior density of the best member (the point with the highest density) to the posterior density of the worst member of $C^k$. The last step is to examine $\beta$ and $L$, where $\beta$ is initialized to 1 and $L = m/10$. If $\beta < L$, $\beta$ is incremented by 1 and the algorithm returns to sorting the complex points. Otherwise, the algorithm examines the Gelman and Rubin convergence criterion 6 and eventually provides the posterior distribution functions 20. The value of the Gelman and Rubin convergence statistic should be less than 1.2. It is computed by:

$$\sqrt{SR} = \sqrt{\frac{g-1}{g} + \frac{q+1}{q\,g}\,\frac{B}{W}} \tag{8}$$

where $g$ is the number of iterations within each sequence, $B$ is the variance between the $q$ sequence means, and $W$ is the average of the $q$ within-sequence variances for the parameter under consideration 20.
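Two pieces of the procedure above are easy to sketch in a few lines of Python: the Metropolis acceptance test on Z and Ω, and the Gelman and Rubin convergence statistic. The sketch is not the SCEM-UA implementation used in the study, and the function names are hypothetical. It makes two assumptions: densities are compared on a log scale to avoid overflow, and B follows the classic Gelman and Rubin convention of g times the variance of the q sequence means.

```python
import math
import random
from statistics import mean, variance


def metropolis_accept(log_post_new, log_post_old, rng=random):
    """Accept the candidate when Z <= Omega, Omega = P(new | y) / P(old | y).

    Z is uniform on [0, 1); the test is done on log densities.
    """
    z = rng.random()
    if z == 0.0:  # log(0) is undefined; Z = 0 always accepts
        return True
    return math.log(z) <= log_post_new - log_post_old


def gelman_rubin(sequences):
    """Return sqrt(SR) for one parameter.

    `sequences` holds q parallel sequences, each with g samples.
    Assumption: B = g * variance of the q sequence means.
    """
    q = len(sequences)
    g = len(sequences[0])
    seq_means = [mean(s) for s in sequences]
    b = g * variance(seq_means)                # between-sequence term
    w = mean([variance(s) for s in sequences])  # mean within-sequence variance
    return math.sqrt((g - 1) / g + (q + 1) / (q * g) * b / w)


# Two hypothetical sequences that have not mixed give a value far above 1.2.
print(gelman_rubin([[0, 1, 2], [10, 11, 12]]))
```

Sequences that explore the same region produce a value close to 1, which is why the threshold of 1.2 is used as the stopping criterion.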

Study area. Meadowbrook Creek was selected to test the methods proposed in this study 8. The creek flows through the City of Syracuse in New York. Thus, the catchment has a high proportion of residential and industrial land cover, which contributes runoff to the main channel. The creek is about 4 km long. A portion of this creek (475 m long) was selected for modeling over the period of June 13-19, 2012. The upstream boundary condition in the HFLUX model was set based on the water temperature of the creek observed at the upstream station 8. The uncertainty of the model inputs was examined at three selected points, as shown in Fig. 2. Note that the input values at these three points had greater relative changes than those at other locations, which made it possible to better evaluate the algorithm performance. In addition, these three locations had the same sampling of the selected input data. During the simulation period, the streamflow velocity varied within a range of 0.06-0.63 m/s. The daily temperature changed between 8.9 and 28.2 °C. The relative humidity, used to calculate the total energy flux to the stream per surface area, changed from 36 to 93%. The creek bed mainly consisted of clay, cobbles, sand, and gravel. The basic statistics of the data/variables used in the HFLUX model are presented in Table 1. Figure 2 shows the study area, the creek, and the three points selected for analysis.

Ethical approval. All authors accept all ethical approvals.
Consent to participate. All authors consent to participate.

Results and discussion
In the uncertainty analysis, the inputs at the three selected points on the main channel include: the depth and width of the creek, the percentage of shade, and the streamflow. Note that the shade value at a location ranges from 0 to 1 with 0 being no shade and 1 being total shade. The values of these inputs estimated by the SCEM-UA algorithm at different locations along the creek are shown in Fig. 3.
To implement the SCEM-UA, the posterior distribution functions for the selected inputs of the model were first developed. The initial ranges of the four selected inputs, taken from the related literature and field observations, were used to form the first generation of the SCEM-UA population by Latin hypercube sampling. Table 2 presents the maximum and minimum values of these inputs. The SCEM-UA simulations were performed for a spatial interval of 10 m and a time step of 30 min. The number of SCEM-UA iterations used in this study was 30,000, which was determined by the Gelman and Rubin convergence criterion 6, a statistical indicator used for examining the convergence of the chains. Its value should be less than 1.2 20. Figure 4 shows the changes in the Gelman and Rubin convergence statistic, indicating that the values for all inputs are less than 1.2.
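The way Latin hypercube sampling turns minimum and maximum values into an initial population can be sketched as follows. This is a minimal standard-library illustration, not the sampler used in the study. The function name is hypothetical and the bounds are placeholder numbers, not the values in Table 2.

```python
import random


def latin_hypercube(bounds, n_samples, seed=None):
    """Draw n_samples points from the box defined by `bounds`.

    Each input range is split into n_samples equal strata, and every
    stratum is used exactly once per input.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each stratum, mapped to [low, high].
        column = [low + (high - low) * (i + rng.random()) / n_samples
                  for i in range(n_samples)]
        rng.shuffle(column)  # decouple the strata across inputs
        columns.append(column)
    # Transpose so that each tuple is one sample point (one value per input).
    return list(zip(*columns))


# Placeholder (min, max) ranges for width, depth, shade, and streamflow.
bounds = [(2.0, 6.0), (0.05, 0.5), (0.0, 1.0), (0.05, 0.08)]
population = latin_hypercube(bounds, n_samples=10, seed=1)
print(population[0])
```

Compared with plain uniform sampling, this stratification guarantees that the first generation covers the full range of every input even when the population is small.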
The posterior distribution functions for the selected points of the HFLUX model were obtained from the SCEM-UA algorithm in the form of histograms of the inputs. The width (range) of a histogram indicates the uncertainty of the input, and the value with the highest frequency is the most likely value predicted by the SCEM-UA algorithm for that input. Figures 5, 6, 7 and 8 show the histograms of the posterior distribution functions of the selected inputs.

As shown in Fig. 5, the uncertainty for the creek width at the first point/location ranges from 5 to 6 and the most likely value predicted by the SCEM-UA is 5.6 with a probability of 50%. The range of the uncertainty for this input at the second point/location is from 3.6 to 4.05 and the most likely value is 3.7 with a probability of approximately 60%. For this input at the third point/location, the uncertainty ranges from 2.5 to 5.2 and the most likely value is 3.1 with a probability of 50%. According to Fig. 6, the uncertainty ranges for the creek depth at the first, second, and third points/locations are 0.08-0.125, 0.07-0.25, and 0.26-0.44, respectively. The most likely values for this input at the first, second, and third points/locations are 0.085 with a probability of nearly 70%, 0.13 with a probability of almost 70%, and 0.36 with a probability of almost 55%, respectively.

Figure 7 shows the uncertainty ranges and the most likely values for the percentage of shade at the first, second, and third points/locations. The ranges for the three points are respectively 0.2-0.47, 0.17-0.44, and 0.15-0.42, while the most likely values are respectively 0.26 with a probability of almost 60%, 0.23 with a probability of almost 40%, and 0.18 with a probability of almost 50%. Similarly, Fig. 8 shows the uncertainty ranges and the most likely values for the streamflow at the first, second, and third points/locations. The ranges for the three points are respectively 0.06018-0.06027, 0.0716-0.0725, and 0.073-0.0739, while the most likely values are respectively 0.06023 with a probability of almost 30%, 0.0719 with a probability of almost 60%, and 0.0735 with a probability of almost 55%.
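The two quantities read from each histogram, the uncertainty range and the most likely value with its probability, can be extracted from a set of posterior samples as in the sketch below. It assumes equal-width bins and takes the centre of the most populated bin as the most likely value. The function name, the bin count, and the sample values are hypothetical.

```python
def summarize_histogram(samples, n_bins=10):
    """Return (range_min, range_max, most_likely_value, relative_frequency)."""
    lo, hi = min(samples), max(samples)
    if hi == lo:  # all samples identical: a single spike
        return lo, hi, lo, 1.0
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in samples:
        # The maximum sample is placed in the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    best = max(range(n_bins), key=counts.__getitem__)
    center = lo + (best + 0.5) * width
    return lo, hi, center, counts[best] / len(samples)


# Hypothetical posterior samples of a single input.
samples = [0.0, 0.1, 0.1, 0.15, 1.0]
print(summarize_histogram(samples))
```

The relative frequency of the modal bin depends on the chosen number of bins, so probabilities read this way are only comparable between histograms that use the same binning.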
Based on the statistical results related to the inputs, their sensitivity can be identified. The smaller the coefficient of variation (CV) of an input, the more sensitive the input. Figure 9 indicates the order of sensitivity of the inputs, where W, D, Sh, and Q denote the creek width, creek depth, percentage of shade, and streamflow, and the subscript denotes the point/location. Accordingly, inputs $Q_2$, $Q_3$, and $Q_1$, with CV values of 1.94%, 2.57%, and 3.18%, respectively, have the highest sensitivity, implying very small changes in the histograms of the streamflow at the three points. Inputs $D_1$, $Sh_3$, and $Sh_2$, with CV values of 78.14%, 65.08%, and 60.48%, respectively, have the lowest sensitivity. Table 3 shows the CV values of the inputs (for example, $W_1$ is the creek width at the beginning of the study reach (0 m) and $W_2$ is the creek width at 375 m from the beginning of the study reach).

Figure 10 shows the comparison of the observed data and the most likely values estimated by the SCEM-UA algorithm. According to Fig. 10, the SCEM-UA algorithm overestimated the inputs $W_1$, $W_2$, $W_3$, $D_3$, $Sh_1$, $Sh_2$, $Q_2$, and $Q_3$ and underestimated the inputs $D_1$, $D_2$, $Sh_3$, and $Q_1$. The most likely values of the selected inputs simulated by the SCEM-UA algorithm were compared with the observed data, and their relative errors are shown in Table 4. The maximum relative error (15%) is associated with the percentage of shade at the second point/location. Thus, the SCEM-UA algorithm is suitable for evaluating the uncertainties of the HFLUX inputs.
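The two summary measures used in this comparison are simple to compute, as sketched below. The sketch assumes that the CV is the sample standard deviation divided by the mean and that the relative error is the absolute difference divided by the observed value, both in percent; the text does not state which variants were used. The numbers are illustrative and are not the values in Tables 3 and 4.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * stdev(values) / mean(values)


def relative_error(estimated, observed):
    """Absolute relative error in percent."""
    return 100.0 * abs(estimated - observed) / abs(observed)


# Illustrative numbers only.
print(coefficient_of_variation([1.0, 3.0]))   # about 70.7
print(relative_error(0.23, 0.20))             # about 15
```

Because the CV is normalized by the mean, it allows inputs with very different magnitudes, such as width and streamflow, to be ranked on one scale.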

Concluding remarks
This study focused on investigating the uncertainties of the HFLUX model inputs by using the SCEM-UA algorithm. Meadowbrook Creek in the City of Syracuse, New York was selected as the test site for the proposed methods. The histograms of the selected model inputs were obtained, based on which the uncertainty of the inputs and their most likely values were determined. Specifically, the width of each histogram indicated the uncertainty of the corresponding input. It was found that the creek depth at the beginning of the study reach, with a CV of 78.14%, was the most uncertain and thus the least sensitive input. The streamflow at 375 m from the beginning of the study reach, with a CV of 1.94%, was the least uncertain and thus the most sensitive input. The mean of each histogram indicated the most likely value for the corresponding input. The performance of the SCEM-UA algorithm was evaluated by comparing the observed data with the most likely values from the SCEM-UA algorithm. Based on the comparisons, the streamflow at 375 m from the beginning of the study reach, with the smallest relative error (0.03%), was the most accurately estimated input, while the percent shade coefficient at 375 m from the beginning of the study reach, with the largest relative error (15%), was the least accurately estimated input. These results demonstrated that the SCEM-UA algorithm was suitable for analyzing the uncertainties associated with the inputs of the HFLUX model.

Data availability
All of the required data have been presented in our article.