Introduction

Flood routing is a process of simulating the movement of water in a river or stream system during a flood event using mathematical models. The goal of flood routing is to predict the behavior of the water as it moves through the system, including the peak flow, the timing of the peak flow, and the overall duration of the flood. There are two basic approaches to routing flood waves in natural channels: hydrologic (lumped) and hydraulic routing. Hydrologic (lumped) routing is a simplified approach that treats the entire river or stream system as a single unit. This approach relies on the storage continuity equation, which states that the change in storage in a system is equal to the difference between the inflow and outflow. Hydraulic routing is a more complex approach that considers the physical characteristics of the river or stream system, such as the channel geometry, the roughness of the bed, and the presence of any structures. Hydrologic routing is typically used for flood forecasting and planning, while hydraulic routing is more commonly used for the design and operation of flood control structures1, 2. The Muskingum model is a widely accepted hydrologic routing model due to its adequate levels of accuracy and the reliable relationships between its parameters and channel properties.

Muskingum model

The Muskingum method is founded on the principle of mass conservation (continuity), which states that the rate of change of storage within a river reach equals the difference between the inflow and the outflow3. The model represents the river reach with a linear storage relationship characterized by two parameters, namely, the wave travel time (K) and the reach weighting factor (x).

The travel time (K) is the time required for the flood wave to travel through the reach, which depends on the channel geometry, roughness, and other hydraulic characteristics. The reach weighting factor (x), also known as the weighting coefficient, expresses the relative influence of the inflow and the outflow on the storage in the reach. These parameters can be determined using various techniques, including trial and error, optimization algorithms, and regression analysis. The Muskingum model can be represented mathematically as follows4:

$$S=K\left[xI+\left(1-x\right)O\right]$$
(1)

where O is the discharge at the downstream end of the reach (m³/s), I is the discharge at the upstream end of the reach (m³/s), x is the weighting factor for the reach (ranging between 0 and 0.5 for reservoir storage and between 0 and 0.3 for stream channels5), K is the travel time for the reach (s), and S is the storage volume of the reach (m³). Combining Eq. (1) with the continuity equation yields an explicit equation for the outflow at the next time step:

$${O}_{2}={C}_{0}{I}_{2}+{C}_{1}{I}_{1}+{C}_{2}{O}_{1}$$
(2)

The subscripts 1 and 2 on I and O represent the values at time t1 and t2 respectively. C0, C1, and C2 are the coefficients.
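As an illustration of Eq. (2), the sketch below uses the standard textbook expressions for the coefficients obtained by discretizing the continuity equation over a time step Δt, namely C0 = (Δt − 2Kx)/D, C1 = (Δt + 2Kx)/D, and C2 = (2K(1 − x) − Δt)/D with D = 2K(1 − x) + Δt. The function names and the choice of starting the outflow at the first inflow value are assumptions made for this example and are not taken from the study.

```python
def muskingum_coefficients(K, x, dt):
    """Return (C0, C1, C2) for the linear Muskingum scheme.

    K and dt must be expressed in the same time unit.
    """
    denom = 2.0 * K * (1.0 - x) + dt
    c0 = (dt - 2.0 * K * x) / denom
    c1 = (dt + 2.0 * K * x) / denom
    c2 = (2.0 * K * (1.0 - x) - dt) / denom
    return c0, c1, c2


def route_linear(inflow, K, x, dt):
    """Route an inflow hydrograph with Eq. (2).

    Assumption for this sketch: the initial outflow equals the initial inflow.
    """
    c0, c1, c2 = muskingum_coefficients(K, x, dt)
    outflow = [inflow[0]]
    for i_prev, i_next in zip(inflow, inflow[1:]):
        outflow.append(c0 * i_next + c1 * i_prev + c2 * outflow[-1])
    return outflow
```

The three coefficients sum to one, so a steady inflow is passed through unchanged, which is a quick sanity check on any implementation.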

The traditional linear Muskingum model seeks a method of parameter estimation to determine the values of K and x. However, the linear Muskingum model leads to considerable inaccuracy in the forecast of flood behavior throughout its propagation along a river because natural channel reaches often have a nonlinear storage-discharge connection. To address this limitation, models such as the Muskingum model have been modified to account for the nonlinearity of flow movement processes. Gill6 introduced a nonlinear storage equation using the exponent of the Muskingum storage equation as the third parameter, and later models such as the Nonlinear Muskingum model (NLMM) have been developed to include lateral inflows and better simulate the nonlinear processes of flood movements in rivers. As stated by Perumal et al.7 there exists no “truly physically based” flood routing model which does not require any calibration. Although the roughness factor is a property of the natural conditions, it is considered as a model tuning parameter in the routing process of Muskingum–Cunge models8, 9.

Furthermore, as pointed out by Koussis, nonlinear routing models like the nonlinear Muskingum model have the advantage of accurately replicating the rapid surge of a flood wave, a task that linear models often struggle with10, 11. It should be noted that the Muskingum–Cunge model is not strictly a “linear model”; it is a “time-variant linear model”, which means that it is “locally linear” in time while its overall behavior is nonlinear. Every flood routing model requires specific input parameters and data. In some river segments, all the required inputs are readily available, and the choice of a model can be based on personal expertise and computational capacity. When there are no constraints on these factors, a dynamic wave model is the most suitable option. For the Muskingum–Cunge model, essential inputs include the initial conditions, upstream boundary conditions, Manning's roughness coefficient, length of the routing reach, river cross-sections, and the bed slope12, while the nonlinear Muskingum model requires the initial condition, the upstream boundary condition, and the hydrologic parameters. One of the important motivations of the authors is to propose an alternative hydrological flood routing model for use in software that models the hydrologic processes of watershed systems, such as HEC-HMS (hydrologic modeling system) and SWAT (soil and water assessment tool). The proposed and validated distributed hydrological Muskingum model can serve as a valuable addition to hydrological software and help reduce uncertainties in flood modeling. This study primarily emphasizes the nonlinear Muskingum routing models.

Nonlinear Muskingum model

Previous research has advocated a nonlinear Muskingum model, presented in Eq. (3), that better represents the nonlinear relationship between the inflow at the upstream end and the outflow at the downstream end of the river channel5, 6, 13,14,15:

$$S={K\left[x{I}_{t}+\left(1-x\right){O}_{t}\right]}^{m}$$
(3)

where m is an exponent that accounts for the nonlinearity of the storage relationship in the absence of lateral inflow. This extra parameter may be estimated using various parameter estimation approaches. Unlike in the linear model, K in nonlinear models has the dimension \({L}^{3(1-m)}{T}^{m}\) and therefore does not describe the travel time of the flood wave. In addition, x does not have to be the same as in the linear model. Equation (4) shows a modified continuity equation that considers lateral inflow16:

$$\frac{dS}{dt}=\frac{\Delta S}{\Delta t}=\left(1+\beta \right){I}_{t}-{O}_{t}$$
(4)

where β is the parameter accounting for the lateral inflow. The storage at time t + 1 is shown in equation below.

$${S}_{t+1}={S}_{t}+\Delta S$$
(5)

With lateral inflow considered in the nonlinear relationship between the inflow at the upstream end and the outflow at the downstream end, the storage at time t is represented by Eq. (6):

$${S}_{t}=K{\left[\left(1+\beta \right)x{I}_{t}+\left(1-x\right){O}_{t}\right]}^{m}$$
(6)

The NLMM with lateral inflow (NLMM-L) has been suggested as an accurate formulation of the nonlinear Muskingum model17,18,19,20,21,22,23,24,25,26. The model can be solved using various numerical methods. Here, the fourth-order Runge–Kutta method is adopted as an accurate explicit solution method that is simpler than the Runge–Kutta–Fehlberg method27, 28.

It's important to note that Cunge12 established a vital connection between the flood routing parameters within the Muskingum approach and the channel properties, as well as flow characteristics. This connection was achieved by utilizing an approximation error derived from a Taylor series expansion of grid specifications and employing a diffusion analogy. Consequently, Cunge introduced a model known as the Muskingum–Cunge model, which has served as a cornerstone for further research and refinement7, 29,30,31,32,33,34,35.

This particular class of flood routing models necessitates a set of crucial input parameters, including the initial condition, upstream boundary condition, Manning's roughness coefficient, length of the routing reach, cross-sections along the river reach, and the bed slope. Within these inputs, the Manning's roughness coefficient stands out as a notable source of uncertainty for Muskingum–Cunge models. This coefficient is closely linked to surface roughness, vegetation, channel irregularities, channel alignment, silting, scouring, obstruction, channel size, and shape, as well as the magnitudes of stages and discharges36. Most of these factors exhibit variations from one flood event to another within a given river reach9. As a result, the roughness variations within a river reach are inherently three-dimensional, making them challenging to model. Therefore, there's a need to strategically select a single parameter as the roughness coefficient through a calibration process to align the flood routing results, particularly when only a single set of inflow and corresponding outflow hydrographs are available for the considered river reach7, 8.

In contrast, the traditional linear Muskingum model primarily relies on the initial condition, upstream boundary condition, and various hydrologic parameters10, 11. One of the principal motivations of the authors is to propose an alternative hydrological flood routing model suitable for integration into modeling software for hydrologic processes within watershed systems like HEC-HMS (hydrologic modeling system) and SWAT (soil and water assessment tool). Consequently, the primary focus of this study revolves around enhancing nonlinear Muskingum routing models.

To further improve the accuracy and convergence of parameter estimation, optimization algorithms like the Salp Swarm algorithm (SSA) have emerged as effective tools. SSA is a population-based optimization algorithm inspired by the swarming behavior of salps. The algorithm starts with a population of salps, each of which has a random position in the search space. In each iteration, the leader salp moves around the food source, which is the best solution found so far, and each follower salp moves toward the salp ahead of it in the chain37. This process continues until a stopping criterion is met38, 39. SSA has been reported to outperform several other optimization algorithms in terms of both accuracy and convergence speed37 and has the potential to be used to solve a wide variety of problems40, 41.

Through the integration of the Salp Swarm algorithm and the NLMM-L Muskingum method, the research seeks to address the challenges associated with snowmelt-induced flooding and provide more precise flood routing solutions for the study area. The objective of this research is to develop a nonlinear Muskingum model for the Red River between two USGS stream gauging stations in the US, Grand Forks and Drayton, using flood hydrographs caused by snowmelt in spring. This area was selected due to the recurrent flooding observed in the central part of the Red River and its adjacent floodplain regions, stretching between Grand Forks, ND, and Emerson, MB. These flood events are notably characterized by the substantial seasonal water area that consistently forms during wet spring periods42. The flat terrain and downstream ice jams in the Red River and Lake Winnipeg contribute to the frequent flooding during spring seasons in wet years, including 1997, 2009, 2011, and 2013. The repetitive nature of flood events in this section of the Red River underscores the importance of comprehending and effectively managing flood risks in the region.

The primary objective of this study is to develop a Muskingum model that can accurately estimate river discharge, considering lateral inflow conditions. The research methodology involves estimating the parameters (K, x, m, and β) of the nonlinear Muskingum models using a distributed flood routing model, utilizing the Salp Swarm algorithm. This approach divides the river reach into multiple intervals, allowing individual Muskingum model calculations for each interval, thereby improving the overall accuracy of the estimation process. The findings of this research are expected to contribute to the enhancement of flood forecasting and warning systems in the Red River basin, enabling better preparedness and response to flood events.

Study area

The Red River Basin, spanning both the United States and Canada, encompasses an area of 117,185 km², with most of its expanse situated in North Dakota, South Dakota, and Minnesota43. Figure 1a shows the location of the basin. Characterized by a semi-arid climate, the region experiences cold winters and hot, dry summers, with the primary streamflow occurring during spring and early summer due to snowmelt or heavy rainfall44. The Red River itself is a prominent watercourse within the basin, flowing northward along the border between Minnesota and North Dakota and then into the Canadian province of Manitoba. Notably, the river is prone to frequent flooding, owing to its gentle flow and flat topography, and its broad and shallow floodplain exacerbates the vulnerability to heavy rain or spring snowmelt, resulting in historically devastating floods45,46,47,48,49.

Figure 1

(a) Red River basin (b) USGS stations on Red River in Drayton and Grand Forks51. The map was created using ArcGIS Pro 2.8.0 (https://www.esri.com/en-us/arcgis/products/arcgis-pro/overview).

A recent study by Atashi et al.50 investigated various forecasting methods for water levels in flood warning systems. The study found that the Long Short-Term Memory (LSTM) method demonstrated superior accuracy and precision compared to classical statistical and machine learning approaches, making it a reliable choice for flood prediction, particularly for downstream stations lacking discharge information50.

Methodology

Grouping of dataset

For conducting flood routing analysis, it is essential to have observed flow hydrographs at specific upstream and downstream cross-section pairs.

For this study, we selected the existing USGS streamflow gauging stations at Drayton (Station No. 05092000) and Grand Forks (Station No. 05082500). These stations were chosen because they offer vital streamflow data necessary for hydrograph analysis in the region extending from Grand Forks to the US-Canada border. As mentioned, this area has experienced recurrent flooding, particularly in the central part of the Red River and the nearby floodplain regions, which extend from Grand Forks, ND, to Emerson, MB. The locations of the gauging stations are shown in Fig. 1b.

A total of fourteen flood events occurring between 1990 and 2022 were utilized to calibrate and validate the proposed model, with twelve events designated for calibration and two events for validation. These validation events specifically corresponded to the flood events in 2020 and 2022.

To form the groups, specific criteria were considered based on the distinct routing characteristics observed in different flood types. For instance, the 1997 flood primarily resulted from snowmelt with minimal rain-on-snow impact, while the 2022 flood predominantly comprised rain-on-snow conditions. The selection of these criteria was crucial to ensure accurate calibration results across all events. The summarized data for the fourteen flood occurrences between 1990 and 2022 is presented in Table 1, with the events categorized into Group A, Group B, and Group C. The criteria used for forming the groups include:

  1. Snowmelt dominant events: flood events where the primary driver was snowmelt with minimal rain-on-snow impact were categorized into Group A. These events typically exhibit specific routing characteristics associated with snowmelt-dominated hydrological processes.

  2. Rain-on-snow dominant events: flood events predominantly characterized by rain-on-snow conditions were categorized into Group B. These events show distinct routing behavior resulting from the combined effects of rain and snowmelt on the hydrological system.

  3. Mixed events: flood events that had a combination of snowmelt and rain-on-snow conditions were categorized into Group C. These events exhibit routing characteristics influenced by both snowmelt and rainfall contributions.

Table 1 Characteristics of the spring snowmelt flood events at two Red River stream gauging stations.

By categorizing the flood events into these distinct groups, it was possible to calibrate the model separately for each flood type, considering the specific routing behaviors associated with each category. Group A and Group B exhibit distinct variations in terms of precipitation amounts. Specifically, Group A demonstrates a lower range of monthly precipitation, ranging from 57.2 to 141.0 mm during the selected years. In contrast, Group B showcases a higher range of monthly precipitation, ranging from 187.3 to 276.4 mm, observed specifically in the years 1999, 2004, 2013, and 2022.

Distributed nonlinear Muskingum model incorporating lateral inflows

Nonlinear Muskingum models consist of a series of nonlinear Muskingum reaches, which are further subdivided into equal nonlinear Muskingum sub-reaches. The distributed nonlinear Muskingum model, as depicted in Fig. 2, provides an illustrative representation of this arrangement. Only one set of hydrological model parameters (K, x, and m) needs to be calibrated and used in the nonlinear routing calculations. The flood hydrograph is routed from the main inflow hydrograph at the upstream section to the downstream section of the first sub-reach. The outcome is treated as the inflow for the second sub-reach and is routed subsequently to the downstream section of the second sub-reach52. To get the flood hydrograph at the downstream section of the final sub-reach, this process is repeated sequentially. The number of sub-reaches (NR) can be determined by trying different options and selecting the one that gives the best results. An objective function value and other performance evaluation criteria can be used to compare the different NR options. The continuity and storage equations used in the distributed nonlinear Muskingum model that includes lateral inflows are presented as follows:

$$\frac{d{S}_{t}^{j}}{dt}=\left(1+\beta \right){Q}_{t}^{j-1}-{Q}_{t}^{j}$$
(7)
$${S}_{t}^{j}={K\left[\left(1+\beta \right)x{Q}_{t}^{j-1}+\left(1-x\right){Q}_{t}^{j}\right] }^{m}$$
(8)

where the lateral inflow is assumed to vary linearly along the river reach and is expressed as a ratio of the inflow rate through the β parameter. β allows lateral inflow to, or outflow from, the main channel to be considered during flood events; it represents the ratio of that lateral flow to the main channel flow within the reach. The model assumes that β is constant in time, which means that the lateral inflow hydrograph is proportional to the upstream inflow hydrograph. t is the time index, running from zero to the end of the flood event, and j is the spatial index, running from 2 to NR + 1. The routing procedure for the distributed nonlinear Muskingum model using the fourth-order Runge–Kutta method consists of the following steps:

Figure 2

Models for distributed nonlinear Muskingum model: (a) single reach with no sub-reaches, (b) two sub-reaches within a reach, (c) three sub-reaches within a reach, and (d) multi-interval sub-reach within a reach.

  1. Choose one, two, three, or more sub-reaches as NR and assume random values for the hydrological model parameters K, x, and m, as well as \(\beta\).

  2. Use Eq. (8) to estimate the initial storage. The initial flow rate at the downstream section of each sub-reach is taken to be equal to the initial flow rate at its upstream section.

  3. Calculate the next storage as the present storage plus the product of the time interval, Δt, and an estimated slope. Using the fourth-order Runge–Kutta method, the slope is a weighted average of the following four slopes:

    $${{L}_{1}}_{t}^{j}=-\left(\frac{1}{1-x}\right){\left(\frac{{S}_{t}^{j}}{K}\right)}^{1/m}+\left(\frac{1+\beta }{1-x}\right){Q}_{t}^{j-1}$$
    (9)
    $${{L}_{2}}_{t}^{j}=-\left(\frac{1}{1-x}\right){\left(\frac{{S}_{t}^{j}+0.5{{L}_{1}}_{t}^{j}\Delta t}{K}\right)}^{1/m}+\left(\frac{1+\beta }{1-x}\right)\left(\frac{{Q}_{t}^{j-1}+{Q}_{t+1}^{j-1}}{2}\right)$$
    (10)
    $${{L}_{3}}_{t}^{j}=-\left(\frac{1}{1-x}\right){\left(\frac{{S}_{t}^{j}+0.5{{L}_{2}}_{t}^{j}\Delta t}{K}\right)}^{1/m}+\left(\frac{1+\beta }{1-x}\right)\left(\frac{{Q}_{t}^{j-1}+{Q}_{t+1}^{j-1}}{2}\right)$$
    (11)
    $${{L}_{4}}_{t}^{j}=-\left(\frac{1}{1-x}\right){\left(\frac{{S}_{t}^{j}+{{L}_{3}}_{t}^{j}\Delta t}{K}\right)}^{1/m}+\left(\frac{1+\beta }{1-x}\right){Q}_{t+1}^{j-1}$$
    (12)

    The weighted average of these four slopes gives the next storage:

    $${S}_{t+1}^{j}={S}_{t}^{j}+\frac{\Delta t}{6}\left({{L}_{1}}_{t}^{j}+2{{L}_{2}}_{t}^{j}+2{{L}_{3}}_{t}^{j}+{{L}_{4}}_{t}^{j}\right)$$
    (13)

  4. Calculate the next outflow by using the following equation:

    $${Q}_{t+1}^{j}=\left(\frac{1}{1-x}\right){\left(\frac{{S}_{t+1}^{j}}{K}\right)}^{1/m}-\left(\frac{x}{1-x}\right)\left(1+\beta \right){Q}_{t+1}^{j-1}$$
    (14)

  5. Repeat Steps 3 and 4 for the following time intervals.

  6. Repeat Steps 2 to 5 for subsequent sub-reaches.
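The routing steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names are hypothetical, the same parameter set (K, x, m, β) is applied to every sub-reach, and storage is clipped at zero before taking the fractional power, which is a safeguard added here rather than part of Eqs. (9) to (14).

```python
def storage(K, x, m, beta, q_in, q_out):
    """Eq. (8): storage of one sub-reach."""
    return K * ((1.0 + beta) * x * q_in + (1.0 - x) * q_out) ** m


def slope(S, q_in, K, x, m, beta):
    """Common form of the slopes in Eqs. (9)-(12)."""
    root = (max(S, 0.0) / K) ** (1.0 / m)  # clipping at zero is an added safeguard
    return ((1.0 + beta) * q_in - root) / (1.0 - x)


def route_subreach(q_in, K, x, m, beta, dt):
    """Route one sub-reach with fourth-order Runge-Kutta (Steps 2 to 5)."""
    # Step 2: the initial outflow is set equal to the initial inflow.
    S = storage(K, x, m, beta, q_in[0], q_in[0])
    q_out = [q_in[0]]
    for t in range(len(q_in) - 1):
        q_mid = 0.5 * (q_in[t] + q_in[t + 1])
        # Step 3: four slopes, Eqs. (9)-(12), then the update of Eq. (13).
        l1 = slope(S, q_in[t], K, x, m, beta)
        l2 = slope(S + 0.5 * l1 * dt, q_mid, K, x, m, beta)
        l3 = slope(S + 0.5 * l2 * dt, q_mid, K, x, m, beta)
        l4 = slope(S + l3 * dt, q_in[t + 1], K, x, m, beta)
        S += dt / 6.0 * (l1 + 2.0 * l2 + 2.0 * l3 + l4)
        # Step 4: outflow from the updated storage, Eq. (14).
        root = (max(S, 0.0) / K) ** (1.0 / m)
        q_out.append((root - x * (1.0 + beta) * q_in[t + 1]) / (1.0 - x))
    return q_out


def route_distributed(inflow, K, x, m, beta, dt, n_reaches):
    """Step 6: the outflow of each sub-reach becomes the inflow of the next."""
    q = list(inflow)
    for _ in range(n_reaches):
        q = route_subreach(q, K, x, m, beta, dt)
    return q
```

Under a steady inflow, each sub-reach in this sketch settles at (1 + β) times its own inflow, so with NR sub-reaches the lateral contribution compounds. This behavior is worth keeping in mind when comparing calibrated β values across different NR.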

Salp swarm algorithm (SSA)

The Salp swarm algorithm (SSA) is a population-based swarm intelligence algorithm developed in 2017 by Mirjalili et al.37. The food source, which is the objective of the swarm, is represented by F. The leader of the swarm updates its position using a specific equation below:

$${x}_{j}^{1}=\left\{\begin{array}{ll}{F}_{j}+{c}_{1}\left(\left(u{b}_{j}-l{b}_{j}\right){c}_{2}+l{b}_{j}\right) & {c}_{3}\ge 0\\ {F}_{j}-{c}_{1}\left(\left(u{b}_{j}-l{b}_{j}\right){c}_{2}+l{b}_{j}\right) & {c}_{3}<0\end{array}\right.$$
(15)

where \({x}_{j}^{1}\) is the position of the leader in the jth dimension, \(u{b}_{j}\) and \(l{b}_{j}\) are the upper and lower boundaries of the jth dimension, respectively, and \({F}_{j}\) is the position of the food source in the jth dimension. The coefficient \({c}_{1}\) plays an important role in SSA by balancing exploration and exploitation. During optimization, exploration refers to searching the search space thoroughly to find better solutions, while exploitation refers to using the information available in the local region to improve the current solution. The parameter \({c}_{1}\) is gradually decreased over the iterations and can be calculated using the following formula.

$${c}_{1}=2{e}^{-{\left(\frac{4l}{L}\right)}^{2}}$$
(16)

where l is the current iteration and L is the maximum number of iterations. The parameters \({c}_{2}\) and \({c}_{3}\) are random numbers generated within the interval [0,1]. \({c}_{3}\) indicates whether the next position of the current leader salp should be toward +∞ or −∞. The other members of the salp swarm update their positions based on Newton's law of motion, which is expressed using the following equation:

$${x}_{j}^{i}=\frac{1}{2}a{t}^{2}+{v}_{0}t$$
(17)

where \(i\ge 2\), \({x}_{j}^{i}\) is the position of the ith follower in the jth dimension, t is the time, \({v}_{0}\) is the initial speed, and \(a= \frac{{v}_{final}}{{v}_{0}}\) where \(v=\left(x-{x}_{0}\right)/t\).

Since time in the optimization is measured in iterations and \({v}_{0}=0\), Eq. (17) can be reformulated as the equation below:

$${x}_{j}^{i}=\frac{1}{2}({x}_{j}^{i}+{x}_{j}^{i-1})$$
(18)

where \(i\ge 2\), \({x}_{j}^{i}\) is the position of the ith follower in the jth dimension.

The main steps of the SSA can be summarized as follows (see Fig. 3):

  • Parameter initialization: the algorithm starts by initializing the parameters such as the population size N, number of the iterations t and the maximum number of iterations \({max}_{itr}\).

  • Initial population: the initial population \({x}_{i}\), \(i=\left\{1, ..., N\right\}\), is generated randomly in the range [l, u], where l and u are the lower and upper boundaries, respectively.

  • Individual evaluations: every individual (solution) within the population is assessed by determining its value using the objective function, and the best overall solution is designated as F.

  • Exploration and exploitation: to balance between the exploration and the exploitation of the algorithm, the value of the parameter \({c}_{1}\) is updated as shown in Eq. (16).

  • Update the position of the solutions: the position of the leader solution and the other follower solutions are updated as shown in Eqs. (15) and (18), respectively.

  • Boundary violations: boundary violations occur when a solution goes beyond the allowable range of the search space while updating, and it is then adjusted to fall within the problem's range.

  • Termination criteria: the iteration counter t is incremented until it reaches the maximum number of iterations \({max}_{itr}\). The algorithm then terminates the search process and returns the overall best solution found.
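The main steps listed above can be sketched as a short Python routine. This is a hedged illustration rather than the code used in the study: the function name, default population size, and iteration count are hypothetical, and the leader's direction is chosen by comparing \({c}_{3}\) with 0.5. That threshold is an assumption of this sketch and departs from Eq. (15) as printed, because with \({c}_{3}\) drawn from [0,1] the condition \({c}_{3}<0\) would never occur.

```python
import math
import random


def salp_swarm(objective, lower, upper, n_salps=30, max_iter=200, seed=None):
    """Minimize `objective` over the box [lower, upper] with a basic SSA."""
    rng = random.Random(seed)
    dim = len(lower)
    # Initial population: random positions inside the bounds.
    salps = [[rng.uniform(lower[j], upper[j]) for j in range(dim)]
             for _ in range(n_salps)]
    fitness = [objective(s) for s in salps]
    best = min(range(n_salps), key=fitness.__getitem__)
    food, food_fit = salps[best][:], fitness[best]  # best solution so far (F)

    for it in range(1, max_iter + 1):
        c1 = 2.0 * math.exp(-((4.0 * it / max_iter) ** 2))  # Eq. (16)
        for i in range(n_salps):
            for j in range(dim):
                if i == 0:
                    # Leader update, Eq. (15); the 0.5 threshold is an assumption.
                    c2, c3 = rng.random(), rng.random()
                    step = c1 * ((upper[j] - lower[j]) * c2 + lower[j])
                    pos = food[j] + step if c3 >= 0.5 else food[j] - step
                else:
                    # Follower update, Eq. (18).
                    pos = 0.5 * (salps[i][j] + salps[i - 1][j])
                # Boundary violations: clip back into the search range.
                salps[i][j] = min(max(pos, lower[j]), upper[j])
            fit = objective(salps[i])
            if fit < food_fit:
                food, food_fit = salps[i][:], fit
    return food, food_fit
```

For flood routing, `objective` would typically wrap a routing model and return an error measure such as the SSE between observed and routed outflows for a candidate (K, x, m, β); that pairing is shown here only as one plausible usage.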

Figure 3

Optimization algorithm flowchart for salp swarm algorithm (SSA).

Statistical performance evaluation criteria

Statistical performance evaluation criteria are metrics used to assess the accuracy and reliability of mathematical models, such as hydrological or hydraulic models. Some common statistical performance evaluation criteria used in the papers referenced5, 14, 53,54,55,56,57,58,59 are applied to assess the performance of the SSA-based routing results. These criteria are described below.

Sum of squared errors (SSE): the SSE measures the sum of the squared differences between predicted and observed values. It measures the model’s overall error and indicates how well the model fits the observed data.

$$SSE={\sum }_{t=1}^{N}{\left\{{O}_{t}-{\widehat{O}}_{t}\right\}}^{2}$$
(19)

where \({O}_{t}\) and \({\widehat{O}}_{t}\) are the observed and calculated outflow rates at the tth time, respectively, and N is the number of data points.

The sum of absolute differences (SAD): the SAD measures the sum of the absolute differences between predicted and observed values. It is a measure of the overall deviation of the model from the observed data and is useful for evaluating the model's performance under conditions where large errors may have a significant impact.

$$SAD={\sum }_{t=1}^{N}\left|{O}_{t}-{\widehat{O}}_{t}\right|$$
(20)

Difference of peak observed (DPO): the DPO measures the absolute difference between the routed and observed peak discharges. It is a measure of the model's ability to accurately predict extreme events, such as floods, that may have a significant impact on the environment or society53.

$$DPO= \left|{Peak}_{routed}-{Peak}_{Observed}\right|$$
(21)

The deviation of peak time of routed and actual outflows (DPOT).

$$DPOT=\frac{\left|{T}_{pest}-{T}_{pobs}\right|}{\Delta t}$$
(22)

\({T}_{pobs}\) and \({T}_{pest}\) denote the observed and estimated times to peak discharge, respectively. All the criteria presented are measurements of the accuracy of a routing model, with the optimum value at 0.
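The four criteria in Eqs. (19) to (22) can be computed directly from paired hydrographs. The sketch below is illustrative only: the function names are hypothetical, both series are assumed to be sampled at the same equally spaced times, and the time to peak is taken as the first occurrence of the maximum value.

```python
def sse(observed, routed):
    """Eq. (19): sum of squared errors."""
    return sum((o - r) ** 2 for o, r in zip(observed, routed))


def sad(observed, routed):
    """Eq. (20): sum of absolute differences."""
    return sum(abs(o - r) for o, r in zip(observed, routed))


def dpo(observed, routed):
    """Eq. (21): absolute difference between routed and observed peaks."""
    return abs(max(routed) - max(observed))


def dpot(observed, routed, dt=1.0):
    """Eq. (22): peak-time deviation expressed in number of time steps."""
    observed, routed = list(observed), list(routed)
    t_obs = observed.index(max(observed)) * dt  # first occurrence of the peak
    t_est = routed.index(max(routed)) * dt
    return abs(t_est - t_obs) / dt


observed = [10.0, 20.0, 30.0, 20.0]
routed = [11.0, 19.0, 25.0, 27.0]
print(sse(observed, routed), sad(observed, routed),
      dpo(observed, routed), dpot(observed, routed))
# prints: 76.0 14.0 3.0 1.0
```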

Results

Table 2 presents estimates of hydrologic parameters and performance evaluation criteria (PEC) values for Group A, considering different numbers of sub-reaches. The table encompasses sub-reach configurations ranging from 1 to 4, each characterized by Muskingum parameters (K, x, m, and β). The performance evaluation criteria include SSE, SAD, DPO, and DPOT.

Table 2 Hydrologic parameters estimates and PEC values for different numbers of sub-reaches applied for Group A.

The table clearly demonstrates that the number of sub-reaches employed has a substantial influence on both the estimates of hydrologic parameters and the corresponding PEC values. Analysis of the performance criteria indicates that the optimal performance is achieved when the Red River at Drayton station is considered as a single sub-reach. This is primarily due to the single sub-reach model exhibiting the lowest SSE values. Using the current study algorithm for Group A flood, the optimized parameters were determined to be K = 0.54, x = 0.24, m = 1.478, and β = 0.19 for NR = 1.

The hydrologic parameters estimates and performance evaluation criteria (PEC) values for Group B were investigated using various numbers of sub-reaches, ranging from 1 to 4. Each sub-reach was characterized by Muskingum parameters, namely K, x, m, and β. These findings are presented in Table 3, along with performance metrics including SSE, SAD, DPO, and DPOT. The analysis of performance criteria revealed that the optimal performance was achieved when employing two sub-reaches (NR = 2). This was primarily attributed to the fact that NR = 2 yielded the lowest SSE values. It is noteworthy that the Muskingum parameters for NR = 2 were determined as follows: K = 0.06, x = 0.06, m = 1.46, and β = 0.16.

Table 3 Hydrologic parameters estimates and PEC values for different numbers of sub-reaches applied for Group B.

The calculation of performance evaluation criteria (PEC) values requires at least one year of observed flood data for the corresponding validation period. Unfortunately, for Group C, no year of observed data was available, rendering the calculation of PEC values impossible. However, the hydrologic parameter estimates provided in Table 4 can still be employed to evaluate the performance of the model. These estimates allow for comparisons between the model's predictions and those of other models. The Muskingum parameters determined using the SSA technique are presented in Table 4.

Table 4 Hydrologic parameters estimates and PEC values for different numbers of sub-reaches applied for Group C.

To evaluate the developed model under real field conditions, a validation step was carried out after calibration. Tables 5 and 6 present examples of the calibration simulations for Groups A and B, respectively. Table 5 presents the measured inflow data from the Grand Forks USGS station, measured outflow data from the Drayton USGS station, and the corresponding routed outflow values for Group A, specifically when using a single sub-reach (NR = 1). The calibration years considered for Group A include 1997, 2001, 2005, 2006, 2009, 2010, and 2018. Similarly, Table 6 displays the measured inflow data from the Grand Forks USGS station, measured outflow data from the Drayton USGS station, and the associated routed outflow values for Group B. For Group B, the calibration years selected are 1999, 2004, and 2013, and these results correspond to the scenario where two sub-reaches are utilized (NR = 2).

Table 5 Calibration and calculations for single-reach Muskingum flood routing applied to Data of Group A.
Table 6 Calibration and calculations for single-reach Muskingum flood routing applied to Data of Group B.

These tables provide essential data for the calibration and validation of the respective models and facilitate the comparison between the simulated and observed flow values during the specified calibration years for each group.

Table 7 displays the validated inflow data from the Grand Forks USGS station, measured outflow data from the Drayton USGS station, and the corresponding routed outflow values for Group A in the year 2020 in Drayton, specifically when utilizing a single sub-reach (NR = 1).

Table 7 Validation and calculations for single-reach Muskingum flood routing applied to Data of Group A.

Likewise, Table 8 presents the inflow data from the Grand Forks USGS station used for validation, the measured outflow data from the Drayton USGS station, and the associated routed outflow values at Drayton for the Group B flood of 2022. For Group B, the model configuration involved two sub-reaches (NR = 2).

Table 8 Validation and calculations for multi-reach Muskingum flood routing applied to Data of Group B.

Furthermore, during the evaluation of the model's performance, it was observed that the model accurately predicted the maximum outflow discharge for both Group A in 2020 (NR = 1) and Group B in 2022 (NR = 2). Specifically, for Group A, the model's prediction of the maximum outflow discharge on April 14th exhibited only a 3.7% difference compared to the measured value. Similarly, for the 2022 flood classified as Group B with NR = 2, the difference was 2.84%. These minimal differences indicate a close agreement between the model's predictions and the actual measured values, suggesting a high level of accuracy in the model's ability to forecast future floods.
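The peak comparison described above can be sketched in a few lines. The example below is a hedged illustration: the function name `peak_metrics`, the sign convention (simulated minus observed, relative to observed), and the hydrograph values are assumptions and do not reproduce the data in Tables 7 and 8.

```python
def peak_metrics(observed, simulated):
    """Return (percent peak difference, peak time shift in time steps).

    Assumed convention: differences are simulated minus observed, and
    the percent difference is expressed relative to the observed peak.
    """
    obs_peak = max(observed)
    sim_peak = max(simulated)
    pct_diff = 100.0 * (sim_peak - obs_peak) / obs_peak
    # list.index returns the first time step at which the peak occurs.
    time_shift = simulated.index(sim_peak) - observed.index(obs_peak)
    return pct_diff, time_shift


# Hypothetical hydrographs (not data from this study).
observed = [100.0, 200.0, 400.0, 300.0]
simulated = [100.0, 180.0, 350.0, 380.0]
pct_diff, time_shift = peak_metrics(observed, simulated)
print(f"peak difference: {pct_diff:.2f}%, peak time shift: {time_shift} step(s)")
```

A negative percentage under this convention means the routed peak underestimates the measured peak, and a positive time shift means the routed peak arrives later than the measured one.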

Figures 4 and 5 complement the information found in Tables 7 and 8 by presenting a visual depiction of the validation results for the flood data of Group A and B. The figures provide a graphical representation that enhances the understanding and analysis of the validation outcomes for the respective datasets.

Figure 4 Validation for Data of Group A, 2020 flood data.

Figure 5 Validation for Data of Group B, 2022 flood data.

Both figures show the observed and simulated flood hydrographs, including the peak flow values and the timing of the peak flows. They also show how closely the simulated hydrographs match the observed data, which indicates the accuracy of the routing model.

Upon careful examination of Tables 2, 3 and 4, it is evident that during the validation phase the maximum deviation in peak time between the routed and actual outflows was only 2 units. This observation is consistent with our previous experience, in which we encountered similar discrepancies when employing a dynamic wave model that considers all the necessary inputs for flood modeling1, 34. It is worth noting that such errors tend to be more pronounced when soft computing models are used.

It is essential to note that the absence of attenuation in the flood peak on the downstream side could be due to various factors. These include the absence of significant floodplain storage, a sufficiently large channel capacity, the lack of hydraulic controls such as dams or levees, and specific hydrological conditions such as continuous heavy rainfall or rapid snowmelt.

Discussion

To assess the performance of the proposed model on an existing case study, we used the inflow-outflow hydrograph data from Wilson60, which represents a smooth single-peak hydrograph. The distributed Muskingum method, implemented with the WOA algorithm, was employed for the routing process. Table 9 provides an overview of the routing parameters and performance evaluation criteria (PEC) obtained for the nonlinear Muskingum model, as described in the material and methods section of the paper. The optimal parameter values for the Wilson flood data determined using our technique were K = 0.865, x = 0.043, m = 1.478, and β = −0.008 for NR = 3.

Table 9 Comparison of the outflow hydrographs calculated for the Wilson flood data.

Furthermore, we analyzed the impact of varying the number of sub-reaches (NR) on the model's performance. The maximum outflow discharge, occurring at the 60th hour of the flood data, was examined for NR values ranging from 1 to 5. The differences in peak discharge were −1.67%, 1.01%, 0.13%, 1.18%, and 0.96% for NR = 1 to NR = 5, respectively. Notably, NR = 3 exhibited the smallest absolute difference in peak discharge compared to the Wilson flood data.
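The selection of NR from these peak differences can be expressed compactly. The snippet below is a small sketch that uses the five percentages reported above and ranks NR by the absolute value of the difference. The variable names are illustrative, and ranking on peak difference alone is a simplification, since the study also considers other PEC values.

```python
# Peak discharge differences (%) reported in the text for the Wilson flood data.
peak_diff_pct = {1: -1.67, 2: 1.01, 3: 0.13, 4: 1.18, 5: 0.96}

# Rank the sub-reach counts by absolute peak difference, smallest first.
ranking = sorted(peak_diff_pct, key=lambda nr: abs(peak_diff_pct[nr]))
best_nr = ranking[0]

print(f"best NR by peak difference: {best_nr} ({peak_diff_pct[best_nr]:+.2f}%)")
```

Ranking on the absolute value matters here because NR = 1 underestimates the peak while the other configurations overestimate it.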

It is clear from comparing the findings of this case study with prior studies that the proposed model demonstrates consistent accuracy in predicting peak discharge across different datasets. Both the Group A and Group B case studies, along with the Wilson flood data, indicate close agreement between the model's predictions and the observed values. This reaffirms the model's reliability and its potential for effectively predicting future floods.

Compared with previous studies, our research offers distinctive findings. While Ayvaz and Gurarslan61 introduced a novel partitioning approach for flood routing models, our study focuses on analyzing the impact of varying the number of sub-reaches on the performance of the nonlinear Muskingum model. In contrast to the primary emphasis of Hirpurkar and Ghare62 on parameter estimation, our research evaluates the accuracy of peak discharge predictions using the proposed model. Furthermore, while Barbetta et al.63 address river discharge estimation and rating curve development, our study concentrates on peak discharge prediction and the model's reliability across diverse datasets. Importantly, our study uses a specific case study with real flood data. By analyzing actual data, our findings offer practical insights and demonstrate the effectiveness of the proposed model in real-world scenarios, which enhances the applicability and relevance of our research.

Conclusion

This study developed and evaluated a nonlinear Muskingum model for river modeling, with a specific focus on enhancing accuracy through the incorporation of lateral inflow. The Grand Forks and Drayton USGS stations served as case studies for parameter estimation using the distributed Muskingum method. The results demonstrate that the developed nonlinear Muskingum model effectively routes floods through these stations. Our selection of this area is rooted in a prior spatial analysis, which highlighted the vulnerability of a particular region. This area, situated between the city of Grafton, Grand Forks, and Emerson, showed a high susceptibility to severe floods when we examined the permanent water area (PWA) and seasonal water area (SWA). Building upon these previous findings, the current research homes in on this specific locale to examine its flood dynamics in more depth and to understand the underlying factors that make it exceptionally vulnerable42.

Through an analysis of hydrologic parameters and model performance, the study underscores the significant impact of the number of sub-reaches on parameter estimation and modeling precision. Optimal model performance varies between case studies, underscoring the importance of selecting the appropriate number of sub-reaches for precise peak discharge predictions.

For Group A, where snowmelt is the primary driver with minimal rain-on-snow impact, the findings indicate that a single sub-reach provides the best performance in accurately predicting peak discharge. This suggests that a simplified representation of the river system adequately captures the routing characteristics of this flood type. In contrast, for Group B, representing flood events primarily characterized by rain-on-snow conditions, the study reveals that employing two sub-reaches optimizes model performance. This implies that considering additional sub-reaches is essential for accurately capturing the routing dynamics and behavior of this specific flood type.

The findings hold practical significance for flood prediction and management, offering valuable insights to decision-makers and stakeholders involved in flood mitigation. The proposed model holds potential for broader application beyond the specific case studies, contributing to enhanced river system modeling and flood management practices. In future studies, the nonlinear Muskingum model and the Muskingum-Cunge approach should be compared in real field case studies to better understand their capabilities relative to each other. Moreover, the β parameter of the proposed model could be treated as variable to examine the effect of its variation on the model's prediction performance.