Introduction

Flooding of fluvial systems is one of the most serious climate-related threats to people’s livelihoods, impacting socio-economic development1. Although this threat is already considerable, climate change and growing urbanization in flood-prone areas are anticipated to exacerbate it2,3,4. The latest report from the Intergovernmental Panel on Climate Change stresses the need to address the worsening effects of climate change and to ensure that the most vulnerable people can adapt and stay safe5. In recent decades, both the population and the economic value of material assets in flood-prone areas have increased2. According to McDermott6, 1.81 billion people (almost 23% of the world’s current population) are exposed to flooding, which poses a major risk to lives and livelihoods. Indeed, flooding already has a huge impact on economies and communities: the worldwide annual cost of urban flood damage is USD 120 billion7. Moreover, long-term trends point to an overall rise in the population and assets exposed to flooding over the coming decades8.

In addition to flood mitigation, it is vital to reduce vulnerability in flood-prone areas via well-designed land-use planning and flood-adapted urban development in order to avoid a future increase in flood risk9. Land-use planning is regarded as one of the most important measures for mitigating flood risk, and it is acknowledged that an integrated strategy embedded in spatial planning procedures plays a crucial role in risk management10. Land-use planning employs formal instruments such as flood hazard mapping to restrict settlement development in hazard areas and to ensure flood-adapted land uses11. Since flood hazard mapping enables spatial planners to limit development to the most suitable sites, accurate and relevant information on the flood threat to an area is crucial for land-use planning12,13.

The National Flood Insurance Program in the US and the European Floods Directive (2007/60/EC) are two examples of legislation promoting flood risk management approaches that encourage land-use planning to prevent new development in flood-prone areas14. In both practice and research, deterministic flood hazard (DFH) assessment is the approach primarily used for this purpose. This approach relies on fixed model input data and boundary conditions, establishing rigid limits within which all assets and individuals are deemed equally susceptible to flooding and outside of which they are deemed safe15. Consequently, DFH maps may have detrimental social and economic repercussions, since they are used to certify flood risk and to support decisions about how to plan land uses in flood-prone areas15. This concern is reinforced by the fact that flood mapping is subject to substantial uncertainties arising from the general procedures included in the models, how they are set up, and the input data16. Given all these uncertainties, the outcomes of a deterministic approach, which considers only a single system configuration, may be spuriously precise17.

Using probabilistic flood hazard (PFH) maps appears reasonable in the context of uncertainty. PFH includes the assessment of different types of uncertainty related to the natural variability of hydrological processes (aleatory uncertainties) and to the lack of knowledge or model simplifications (epistemic uncertainties)18. Di Baldassarre et al.19 argued that probabilistic flood mapping should be employed instead of deterministic flood mapping because: (i) uncertainty is always present in hydrological and hydraulic analysis and cannot be ignored; (ii) uncertainty can only be represented reasonably when it is quantified and displayed, which is only possible in a probabilistic framework; and (iii) experts should provide decision-makers with understandable probabilistic flood maps to support and guide their decisions.

The benefits of assessing flood hazards probabilistically have been demonstrated in recent decades16,19. However, studies that take a probabilistic approach typically tackle the problem from a fragmented perspective, accounting for only specific uncertainty sources, e.g., deciphering the effects of uncertain boundary conditions15; uncertainty analysis linked to rainfall-frequency analysis20,21; uncertainty assessment of flood hazard due to levee breaches22,23; depicting uncertainty associated with streamflow, land use, or geomorphic adjustment14,24,25; or considering some other combination of uncertainties, e.g., discharge, topography, and roughness26,27.

Also, the influence of the input data and boundary conditions on the outputs of flood hazard analysis is seldom evaluated using sensitivity analysis28, despite its necessity for reliable flood risk management29. With few exceptions, when these effects are analyzed, they are not characterized spatially30,31. Finally, convergence analysis32 is commonly used to evaluate the consistency and reliability of PFH analysis qualitatively rather than quantitatively20, and only for the entire flooded area; in the vast majority of cases it is not verified that the hydrodynamic model converges in every cell of the studied domain30 for both water depth and flow velocity outputs.

Here, we characterize the most important uncertainty sources affecting the implementation of the European Floods Directive’s third cycle (2022–2027). To do this, we provide a fully integrated floodplain modeling information system to probabilistically quantify errors in flood hazard maps. We analyzed the 500-year flood since it represents the flooded area for extraordinary events, encompassing all possible restrictions on land uses based on the categories of low, medium, and high flood hazard. We focused on the Duero River (Spain), which crosses the city of Zamora, where large flood events have occurred historically33 (Supplementary Fig. 1), disrupting the population and causing extensive damage and fatalities34. We used Monte Carlo (MC) simulations to derive PFH maps aimed at (i) conducting an uncertainty analysis to identify inaccuracies in the flooding depth, flow velocity (both related to flood hazard), and flooding area model outputs arising from the identified uncertainty sources; (ii) performing local and global sensitivity analyses to identify the model inputs that have a major impact on hydraulic outputs; and (iii) performing a convergence analysis to demonstrate the consistency and reliability of the model outputs. To achieve this, the central limit theorem (CLT) was applied to the total study area and to each cell separately. The PFH maps created with the approach deployed here were compared to a DFH map built using the same input data and boundary conditions. The degree of agreement between the two approaches was examined using global and category-level indices.

The procedure deployed here revealed that model sensitivity varies spatially, with the most sensitive input data being those with the greatest level of uncertainty (e.g., flood quantiles, channel roughness). Our findings demonstrate conclusively that the substantial uncertainty of the upstream boundary condition of the hydrodynamic model results in a highly variable flooding-area output in the PFH approach. Meanwhile, there is a significant disparity between deterministic and stochastic flood zones: even where they overlap, their flood hazard categories may differ. Our results also suggest that, although the stochastic model converged on a broad scale, there were specific areas of the studied domain where convergence was not achieved. In particular, flow velocities did not converge on slope breaks between riverbanks and the floodplain, and water depths did not converge along floodplain boundaries where water depths were smaller than the digital surface model (DSM) error, nor in complex surface-model areas with considerable associated errors. These findings have major implications for the applicability of the European Floods Directive, as they reveal that DFH maps are not entirely trustworthy for flood risk management.

Results and discussion

Convergence of the stochastic flood hazard model

The quantification of the convergence of PFH is crucial to better assess and communicate the reliability of flood hazard maps35 and to validate their construction as a stochastic product36. However, it is either ignored or only superficially accounted for in so-called MC-based procedures37. Our analysis centered on the required quantitative convergence assessment37 of the PFH’s three most relevant outputs: flooded area (m2), water depth (m), and flow velocity (m s−1).

Quantitative convergence of the flooded area was only achieved after 288 hydraulic simulations, when the mean flooded area remained within confidence bounds for at least 60 simulations (Fig. 1a). However, convergence of the water depth and flow velocity outputs was reached later at the cell scale (Fig. 1b). This local convergence stability was attained after 421 simulations, when the number of cells with water depth and flow velocity convergence remained visibly stable (Fig. 1b). In terms of computation time, this implied an increase of almost 146.2%, given the characteristics of the workstation used (see the Methods section). Moreover, cell-level convergence of the water depth and flow velocity outputs was not achieved in all evaluated cells: 26.7% of the analyzed cells lacked cell-level convergence. Flow velocity failed to converge in at least 13.7% of the cells (611.5 m2), water depth failed to converge in 9.3% of the cells (417.3 m2), and 3.7% of cells (163.9 m2) exhibited no convergence in either water depth or flow velocity (Fig. 1c).

Fig. 1: Convergence analysis of the PFH model for the 500-year flood.
figure 1

a Depicts the convergence of flooded area outcomes (in ha) achieved through the PFH approach across the entire study area. To determine convergence, the CLT method was employed. It produced a confidence bound (CB) with a width corresponding to the maximum acceptable variance for the flooded area outcomes and a length representing the minimum number of simulations needed (at least 60) to ensure an almost negligible probability of the flooded area results falling outside this bound. Accordingly, the orange CB shows simulations where the model was unstable, the pale-gray vertical line marks the simulation assessing model stability, and the green CB represents simulations in which the model remained stable. b Displays the percentage of cells in the study domain that converge in each simulation for water depth and flow velocity outcomes. The pale-gray vertical line represents the simulation that achieves the highest percentage of convergence for both outcomes, with no major deviations from this simulation. c Shows the stochastic 500-year flood map control aligned with the 0.50 confidence limit (CL). The flooded area is highlighted in green for cells that converged on water depth and flow velocity (yellow for those that did not), red for cells that did not converge on flow velocity, and blue for cells that did not converge on water depth. The orthoimage was obtained from the Spanish National Center for Geographic Information.

These findings have far-reaching implications for the entire flood hazard assessment. Although general convergence is achieved for the flooded area output20,35,38,39, there are local effects that may prevent convergence at the cell level or, at the very least, require a substantially greater number of simulations. Our findings indicate that local convergence in areas with complex geometry, such as urban areas, riverbanks, islands, and bars, is challenging because the hydraulic model has greater difficulty representing reality there. This implies that flood hazard assessment in complex floodplains comprising urban areas is subject to uncertainties, with important implications for flood hazard zonation.

Sensitivity of the stochastic flood hazard model

To evaluate the relevance of input data uncertainties in the final PFH model, we performed a local sensitivity analysis on hydraulic outputs associated with the general scale (i.e., total flooded area) as well as a global sensitivity analysis focused on the local scale (i.e., the importance of input data on water depth and flow velocity outputs is displayed at the cell scale). As seen in Fig. 2, the flood quantile is the primary source of uncertainty on the general scale, with the capacity to alter the flooded area by −45.0% to +62.1%. These results support the notion that historical and multi-archive paleoflood data must be appended to systematic data series of instantaneous peak flows to improve at-site flood frequency analysis40,41.

Fig. 2: Local sensitivity analysis of the stochastic flood hazard model (500-year flood) with a specific focus on the flooded area output across the entire study domain.
figure 2

This figure shows a tornado plot, which indicates the relative relevance of input parameters in influencing the flooded area output. The plot displays a horizontal bar for each input parameter examined. The bars are arranged in decreasing order of their influence on the flooded area output, with the most significant parameter at the top. The direction in which each bar extends from the centerline indicates whether increasing or reducing that parameter value results in an increase or reduction in the flooded area output. The bigger the spread between opposing ends of a bar, the greater its effect on the output.

The flooded area varied between −49.1% and +54.4% due to land use/land cover uncertainties. Interestingly, another important uncertainty source is river channel roughness (from −48.7% to +49.7%), indicating that inconsistencies in the characterization of land use/land cover along the river channel might result in considerable changes to the flooded area. Moreover, as Abily et al.30 have shown, bathymetry and topography are important sources of uncertainty (ranging from −48.6% to +4.3% and from −16.0% to +16.5%, respectively), albeit predominantly in poorly defined topographic locations. Bathymetry, in particular, may generate a major negative bias in the flooded area, emphasizing the critical need for its accurate characterization.

On the local scale, the Sobol Index (SI), which represents the relative influence of an input on an output, indicated that inadequately defined inputs lead to greater individual sensitivity. Looking at the water depth outcomes (Fig. 3a), we found that the flood quantile and channel roughness ranked first in 84.6% and 14.2% of cells, respectively. Specifically, the flood quantile showed high SI values in 9.8% of cells (SI range: 0.4 to 0.6), moderate values in 70.4% of cells (SI range: 0.2 to 0.4), and low values in 4.4% of cells (SI < 0.2).

Fig. 3: Global sensitivity analysis of the stochastic flood hazard model (500-year flood) for the water depth output at the local scale.
figure 3

a Portrays the influence of input factors on this output, considering their spatial weight throughout the entire domain under study. A hierarchical ranking system was established to assess the influence levels of input factors (first, second, third, and fourth ranking positions) by evaluating the SI for each factor. The bar chart illustrates the percentage of cells assigned to each input factor for each ranking position, representing its influence on the water depth output across the entire study domain. b Depicts a map illustrating the spatial representativeness of the first-ranked input factors in relation to the output water depth. The legend below the sensitivity map shows how the SI fluctuates over the whole examined domain for each input factor, with lighter hues of the same color corresponding to lower SI values and darker shades corresponding to higher SI values. The orthoimage was obtained from the Spanish National Center for Geographic Information. The sensitivity maps for water depth and flow velocity outputs for the four ranking positions are displayed in Supplementary Fig. 2.

Roughness has a similar relative influence in the model, with SI between 0.4 and 0.6 in 7.2% of cells and between 0.6 and 0.8 in 7.0% of cells. Interestingly, the spatial representation of the first-rank SI (Fig. 3b) indicated that the sensitivity related to the flood quantile is mostly located in cells outside the river channel, whereas the sensitivity associated with roughness is restricted to the river channel. This confirms that the characterization of roughness within the river channel is crucial for the reliability of the PFH model24,26,28,35, being even comparable with the flood quantile sensitivity42. This may also be connected to the significance of roughness in urbanized areas39. As a result, it is preferable to use more robust methods to characterize roughness, particularly in river channels, such as developing flow-dependent schemes that relate the grain roughness coefficient to either grain size or channel slope43, or evaluating the roughness linked to riparian zones, whose analysis provides robust results when equations considering tree spacing, trunk diameter, wood area index, and leaf area index are used44.

In addition, our findings imply that local inaccuracies in bathymetry may have a major effect on the water depth and, therefore, on the entire flooded area. As illustrated in Fig. 3a, despite bathymetry ranking first with an SI greater than 0.4 in only 0.3% of cells, it has a considerable influence on the water depth outcomes, as indicated by the flooded area (Fig. 2). The other analyzed parameters (i.e., land use/land cover, energy slope, and topography) ranked between the second and fourth positions but had lower SI values (0 to 0.2). These findings imply that there is still considerable room for improvement in minimizing the influence of epistemic uncertainties on the outcomes of PFH, particularly through a better characterization of flood quantiles, channel-bed roughness, bathymetry, and the floodplain DSM. This reinforces the idea that defining the PFH for the 500-year flood in the absence of reliable historical or paleohydrological data is challenging41. Therefore, the sensitivity analysis demonstrated that a thorough characterization of input data, particularly in determining the flood quantile, is essential for effective management of the 500-year flood in terms of risk mitigation.

Differences between deterministic and stochastic flood hazards models

Our findings demonstrated considerable disparities between DFH and PFH maps, which have important implications for flood hazard assessment. As shown in Fig. 4a, the DFH model displayed a flooded area of 262.8 ha, while the PFH model showed an expected flooded area of 313.6 ha, ranging from 121.1 ha (0.95 CL) to 442.7 ha (0.05 CL). Consequently, the DFH approach on average underestimated the flooded area by 16.2% relative to the PFH approach. The flooded-area values of the PFH model followed a bimodal distribution, which reflects the arrangement of the floodplain and the channel. A sizeable number of simulations (25.6%) occupied a small portion of the floodplain (between 135 and 190 ha), whereas a remarkable 50.6% of simulations occupied the entire floodplain (>320 ha). This implies that the floodplain along the partly confined river reach under study is quickly occupied and that subsequent variability occurs mostly in water depth rather than in total flooded area, an output commonly used for flood risk management purposes. As a result, there is a greater probability of experiencing a flood closer to the 0.05 CL (0.0034) or the 0.95 CL (0.0041) than of receiving the expected flood (0.0026). Moreover, the probability of the deterministic flood is even smaller (0.0018). It is therefore quite plausible that the extent of any particular flood will be greater or lesser than the expected flood. Accordingly, in 77.8% of cases, the F-statistic estimated between both approaches was less than 0.77, while the F-statistic for the expected flood was 0.84. Figure 4b shows a spatial depiction of both methods, facilitating the visualization of these differences.

Fig. 4: Comparison between stochastic and deterministic approaches for the 500-year flood.
figure 4

a Depicts the bimodal PDF that follows the set of flooded area values obtained by the PFH approach utilized here. This bimodal PDF is cut vertically by lines depicting flooded areas and their associated probabilities, as determined by the deterministic approach (green dashed line) and the stochastically determined floods (pale-pink line, dark-pink line, and dark-red line) when considering the 0.95 CL, 0.5 CL, and 0.05 CL, respectively. The F-statistic is also indicated to illustrate the level of disagreement between the deterministic flooding and the stochastic flooding for the 0.5 CL. The 500-year flood area produced by using the 0.5 CL is displayed in b. In addition, the limits or contours of the deterministic flood area and stochastically produced flooding are displayed, taking into consideration both the 0.95 CL and the 0.05 CL. The orthoimage was obtained from the Spanish National Center for Geographic Information. For the stochastic velocity map, see Supplementary Fig. 3.

Considering differences in flow velocity (Supplementary Fig. 3) and water depth as the primary hydrodynamic outputs used to establish flood hazard categories, the findings revealed greater disparities between the two approaches (Fig. 5). We found that only 45.1% of the cells displayed the same flood hazard category under both methods. Along the channel, there was a very high degree of agreement in the flood hazard categories supplied by both methodologies; in the floodplain, however, there was substantial disagreement in the computed flood hazard categories. These differences are further reflected in the overall accuracy and Kappa values obtained by comparing the flood hazard categories of the PFH and DFH models (45.1% and 0.3, respectively). Here, we found that the PFH’s definition of the medium-flood hazard category is vague owing to the associated uncertainties. As shown in Fig. 5, the DFH matches only 32.9% of this category, whereas the high and low flood hazard categories match 89.9% and 100.0%, respectively. Moreover, a sizeable share of cells (35.5%) did not match any flood hazard category since they were flooded exclusively under the stochastic approach. Therefore, our findings revealed that the deterministic approach may be inaccurate for the spatial representation of flood hazard categories and that the medium-flood hazard category could be of limited use when the stochastic approach delivers mapping outputs with high uncertainty (e.g., the 500-year flood).

Fig. 5: Spatial representation and quantitative comparison of deterministic flood hazard categories with stochastic flood hazard categories.
figure 5

The level of agreement between the deterministic and stochastic methods in obtaining 500-year flood hazard maps (with the latter considering the 0.5 CL) is shown. The black contour delimits the stochastic outputs. A confusion matrix was employed to assess the level of agreement by comparing cells throughout the entire area of interest. Consequently, global accuracy and the kappa coefficient were computed and visually presented in bar graph style at the bottom left corner of this figure. Additionally, the level of agreement for each flood hazard category is displayed cell-by-cell as a percentage, considering the entire flooded area (refer to the top horizontal bar chart). The percentage of agreement for each hazard category is also depicted in horizontal bar charts positioned directly below the preceding bar chart. Red, orange, and green depict cells in the simulated area where the deterministic and stochastic methods concur for high, medium, and low flood hazard categories, respectively. Light pink and pale-orange colors indicate cells where the deterministic approach underestimates the high and medium-flood hazard categories, as determined by the stochastic method. Lastly, gray color represents pixels that display flooding only when the stochastic approach is employed. The orthoimage was obtained from the Spanish National Center for Geographic Information.

Implications for flood hazard and risk management

Here, we integrate all prior stochastic outcomes to enhance flood hazard characterization in urban areas and reduce inequities in flood risk management, thereby identifying stochastic flood risk management zones (Fig. 6). The sensitivity analysis of extreme flood scenarios demonstrated that poorly defined input data may have a major effect on the outcomes. First, the acceptance of these findings necessitates rigorous assessment of the uncertainties, particularly the flood quantile definition as well as roughness and bathymetry. Second, based on the analysis of the 500-year flood, our results suggest that stochastically based risk management could require one of the following decisions: (i) focus the assessment on the area defined by the expected flood (0.50 CL); (ii) manage the risk in the area defined by the most frequent flood scenario (between the 0.50 CL and the 0.05 CL); or (iii) manage the risk in two zones defined by two differentiated flooding scenarios according to their probability.

Fig. 6: Stochastic flood risk management zones resulting from stochastic outcomes integration.
figure 6

Accordingly, flood hazard categories of high, moderate, and low are depicted. All of these flood hazard categories are divided into two types of zones based on: (1) their convergence (plain colors) or lack of convergence (hatched colors); and (2) their high probability (bright colors) or lower probability of occurrence (pale colors). The table-format legend shows all possible combinations. The deterministic limit contour is drawn to illustrate the various stochastic flood hazard categories that exist both inside and outside of it. The orthoimage was obtained from the Spanish National Center for Geographic Information.

Our findings support the conclusion that the third option would be the best alternative in complex urban areas. The first zone would be defined by the flood extent associated with the most likely case of the first mode of the bimodal distribution (close to the 0.95 CL), and the second zone by the flood extent associated with the most likely scenario of the second mode (close to the 0.05 CL), which has a lower probability of occurrence than the first zone. Different flood hazard categories might then be considered, and appropriate restrictions imposed for each. However, as previously mentioned, the medium-flood hazard category is poorly defined under a stochastic approach, so it is advisable to use just two hazard categories (high and low) with two types of risk reduction measures (more restrictive and less restrictive) according to the previously specified zones.

Lastly, the convergence map provides crucial information on areas where local convergence is not achieved, i.e., where the outputs are not sufficiently credible. In these zones, even with a large number of simulations under a stochastic approach, the water depth and flow velocity data remain less certain. Thus, more restrictive measures to reduce flood risk might be adopted there. Alternative options include enhancing the model with sensitivity analysis information (i.e., through a better-defined high-sensitivity input) or increasing the number of simulations to achieve better local convergence.

Methods

Model boundaries

The study site is focused on the Duero River where it crosses the city of Zamora (Supplementary Fig. 1), a town of sixty thousand inhabitants in northwestern Spain (Castilla y Leon region). Zamora is located in the middle part of the Duero Basin, where the river is characterized by a meandering morphology and vegetated bars. The floodplain at Zamora is asymmetric, with a 300-meter-wide alluvial plain on the southern margin and a channel carved into silicified sandstones and conglomerates on the northern side. The average annual discharge in the studied reach is 96.3 m3 s−1 (for the period from 2002 to 2018). The general hydrological regime features maximum discharges from December to May, which can be 30 times greater than the mean discharge, and minimum discharges from July to September. Most of the largest floods are caused by prolonged winter rains (lasting several weeks) related to successive Atlantic frontal depressions, occasionally coupled with snowmelt from neighboring mountain ranges. These floods have gentle hydrographs because the drainage basin upstream of Zamora covers 46,225 km2. In addition, man-made structures around the urban site (i.e., weirs, bridges, walls, and buildings) form a complex environment for flood modeling. Zamora has suffered historical flooding with economic and social damages34,45 and is designated a relevant potential flood hazard area by the Spanish government, whereby the Environmental Ministry applies activity restrictions based on deterministic flood hazard maps.

The model boundary was established along 6.6 km of the city-crossing river (Supplementary Fig. 1), encompassing an area of 835.4 hectares. We utilized the freely accessible 2D hydraulic model HEC-RAS 6.1, using the diffusion wave equations and a finite difference approach46 and taking advantage of its multithreading and automation (i.e., via Python) capabilities38. We developed a robust, geometrically consistent, and detailed hydraulic model20,24. The input data were: (i) a 1-m spatial resolution DSM enhanced with building, wall, street, and bridge pier information; (ii) a 1-m spatial resolution digital bathymetry model (DBM); (iii) Manning’s n coefficients retrieved from land use/land cover data; (iv) upstream (input flow) and downstream (energy slope) boundary conditions; and (v) hydraulic structures. In addition, we built an optimized flexible computational mesh with a minimum cell size of 1 m and restricted the model time step to between 1 and 5 min, complemented with a warm-up period of 12 h, ensuring correct water depth and flow velocity calculations.

Framework

Using the hydraulic model outlined above, we devised a framework to compare the traditional DFH approach with the emerging PFH approach in order to improve the creation of stochastic maps that enhance flood risk management (Fig. 7). We spatially compared water depth, flow velocity, and flood hazard at both the general scale (within the model’s limits) and the local scale (in each 1-meter model cell). We relied on the 500-year flood (Supplementary Note 1) since it is adequately defined by historical data34 and exemplifies the study’s objective.

Fig. 7: Stochastic integrated framework for flood risk management.
figure 7

The red arrows and boxes with red outlines show the start and end points of the Python script process; the blue boxes represent the basic steps of the process; and the orange boxes represent the mapping outcomes that were generated at each step.

The deterministic hydraulic model for the 500-year flood was developed utilizing a highly accurate topography that incorporated the DSM and the DBM. The flood quantiles were computed by fitting a log-Pearson Type III distribution (Supplementary Note 2) to the flow gauge data, which span 105 years and contain 79 annual maximum instantaneous flow values (Supplementary Fig. 4). The mean value of the range specified for each input was used to describe the downstream boundary condition, Manning’s n values, and weir inputs (i.e., the weir coefficient and the elevation of the top of each weir). In the deterministic approach, input data uncertainties were not considered.
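For illustration, a minimal sketch of such a fit in Python, assuming the annual maxima are held in a numpy array; the method-of-moments, frequency-factor form shown here is the textbook procedure, and the synthetic series is only a stand-in for the real 79-value record (the study’s actual fit and confidence limits are detailed in Supplementary Note 2):

```python
import numpy as np
from scipy import stats

def lp3_quantile(ams, return_period=500):
    """Fit a log-Pearson Type III to annual maximum flows (m3/s) by the
    method of moments on log10 flows and return the quantile for the
    given return period."""
    logq = np.log10(np.asarray(ams, dtype=float))
    mean, std = logq.mean(), logq.std(ddof=1)
    skew = stats.skew(logq, bias=False)
    p = 1.0 - 1.0 / return_period           # non-exceedance probability
    k = stats.pearson3.ppf(p, skew)         # standardized frequency factor
    return 10 ** (mean + k * std)

# Illustrative use with a synthetic series (the real record holds 79
# annual maxima spanning 105 years):
rng = np.random.default_rng(7)
ams = 10 ** rng.normal(2.8, 0.25, size=79)  # hypothetical flows, m3/s
print(f"Q500 ~ {lp3_quantile(ams):.0f} m3/s")
```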

The stochastic flood approach was based on an MC procedure and consisted of performing: (i) an uncertainty analysis to quantify the errors in the flow depth, flow velocity, and flooded area given the uncertainties of the input data; (ii) a sensitivity analysis to determine the sources of errors and their significance; and (iii) a convergence analysis to establish the reliability of the model outcomes.

We tailored these three analyses throughout an updated floodplain modeling information system framework16 (Fig. 7). Within this framework, five actions were taken: (1) define the model inputs subject to uncertainty; (2) create a pseudorandom sample with the Latin Hypercube Sampling method47 for running MC models in the uncertainty analysis and computing probabilistic maps (Supplementary Note 3; see the sketch below); (3) perform the sensitivity analysis with the Random Balance Designs Fourier Amplitude Sensitivity Test (RBD-FAST) method48 to estimate the first-order sensitivity index and compute sensitivity analysis maps (Supplementary Note 3); (4) perform the convergence analysis with the CLT as convergence criterion36 and compute convergence analysis maps (Supplementary Note 3); and (5) analyze the results of the stochastic uncertainty, sensitivity, and convergence analysis maps to find possible hydraulic model improvements or to increase the number of MC simulations (if necessary). To implement the floodplain modeling information system framework, we created a script in Python 3.10 and ran it on a workstation with an Intel(R) Core(TM) i7-8750H CPU (2.20 GHz, 2208 MHz, 6 physical cores, and 12 logical cores) and 128 GB of RAM.
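As a sketch of step (2), the snippet below draws a Latin Hypercube sample over a few uncertain inputs with scipy; the input names and ranges are illustrative placeholders, not the study’s values, and in practice each input would be rescaled according to its own probability distribution function:

```python
import numpy as np
from scipy.stats import qmc

# Placeholder inputs and ranges (illustrative only; each input in the
# study follows its own PDF, e.g., triangular for elevations).
names = ["n_channel", "n_floodplain", "energy_slope", "weir_coef"]
lower = [0.025, 0.040, 0.0005, 1.4]
upper = [0.045, 0.120, 0.0015, 2.2]

sampler = qmc.LatinHypercube(d=len(names), seed=1)
sample = qmc.scale(sampler.random(n=500), lower, upper)  # 500 realizations

# Each row is one parameter set for an automated hydraulic run
runs = [dict(zip(names, row)) for row in sample]
print(runs[0])
```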

After acquiring the deterministic and stochastic maps, we compared the findings at the general and local scales using the expected stochastic values49,50 (see Figs. 4 and 5). At the general scale, we evaluated differences in the flooded area (ha) using the F shape statistic (Eq. 1)25,26. At the local scale, we compared water depth (m) and flow velocity (m s−1) on 1 × 1 m cells. In particular, we compared the flood hazard models by examining the deterministic and stochastic classifications of flood hazard, using the expected stochastic values as a benchmark. To do this, we used a confusion matrix, which provided the overall accuracy (Eq. 2) and the Kappa coefficient (Eq. 3)51.

$$F\left(\%\right)=\frac{A_{\mathrm{ref}}\cap A_{\mathrm{mod}}^{n}}{A_{\mathrm{ref}}\cup A_{\mathrm{mod}}^{n}}\times 100$$
(1)

where \(A_{\mathrm{ref}}\) is the area flooded by the deterministic model, and \(A_{\mathrm{mod}}^{n}\) is the area flooded by stochastic model n.

$$\mathrm{Overall\;Accuracy}\left(\%\right)=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{FN}+\mathrm{TN}+\mathrm{FP}}$$
(2)

where TP are true positives (i.e., the deterministic and probabilistic outputs assign the same flood hazard category to a particular cell), TN are true negatives (i.e., both outputs indicate no flooding for a given cell), FP are false positives (i.e., the deterministic output for a particular cell indicates a different flood hazard category from the probabilistic output), and FN are false negatives (i.e., the deterministic output indicates the absence of flooding for a particular cell, but the probabilistic output reveals flooding).

$$K=\frac{{p}_{0}-{p}_{e}}{1-{p}_{e}}$$
(3)

where \(p_{0}\) is the empirical probability of agreement on the hazard classes (observed agreement ratio) and \(p_{e}\) is the agreement expected by chance.
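All three agreement measures can be computed directly on rasterized outputs. A minimal numpy sketch, assuming integer-coded hazard rasters (0 = dry, 1 = low, 2 = medium, 3 = high) and boolean wet masks; the function names are illustrative, not those of the study’s script:

```python
import numpy as np

def f_statistic(ref_wet, mod_wet):
    """Eq. (1): overlap between two boolean flooded-area masks (%)."""
    inter = np.logical_and(ref_wet, mod_wet).sum()
    union = np.logical_or(ref_wet, mod_wet).sum()
    return 100.0 * inter / union

def overall_accuracy(det, sto):
    """Eq. (2): share of cells with identical hazard category."""
    return 100.0 * np.mean(det == sto)

def kappa(det, sto):
    """Eq. (3): Cohen's kappa over the hazard categories."""
    p0 = np.mean(det == sto)
    pe = sum((det == c).mean() * (sto == c).mean()
             for c in np.union1d(det, sto))
    return (p0 - pe) / (1.0 - pe)

# Toy example on five cells:
det = np.array([0, 1, 2, 3, 3])
sto = np.array([0, 1, 3, 3, 2])
print(overall_accuracy(det, sto), round(kappa(det, sto), 2))
```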

Uncertainty analysis

We created one sample for every MC model using the Latin Hypercube Sampling method. This technique correctly and efficiently represents the probability distribution functions assigned to each input and the multidimensional space resulting from their combination24,37. As described below, we applied Latin Hypercube Sampling to all inputs, concentrating on the selection of the probability distribution functions and on the robustness and quality of each input dataset (Supplementary Note 4).

Uncertainty was analyzed for the high-resolution digital model (HRDM) on a cell-by-cell basis. We coupled the DBM (Supplementary Note 5) with the DSM (Supplementary Note 6), yielding the HRDM (Supplementary Fig. 5), whose elevation values in each cell served as the reference central values (c). We obtained the DSM and DBM from high-quality data sources, then refined and filtered them to appropriately represent the surface and bathymetry (Supplementary Notes 5 and 6 and Supplementary Fig. 5). As proposed previously by other authors, such as Milan et al.52, the approach adopted here focused on assessing the spatially distributed error (cell-by-cell) throughout the HRDM in order to spatially identify the sources of uncertainty within it. For this estimate, we conducted topographical and bathymetric surveys using a differential global positioning system (DGPS, with centimeter accuracy) and a single-beam echosounder embedded in an aquatic drone (with sub-centimeter accuracy) to cover the model area with a set of independent control points at a higher resolution than the HRDM (Supplementary Notes 5 and 6). To evaluate errors, we estimated the absolute differences between the independent control points and the HRDM values. Subsequently, a geostatistical analysis (Supplementary Notes 5 and 6) was performed to determine the spatial error at the pixel level. To determine the error in each cell, the central (c) and error (ε) values were retrieved. Then, we defined a triangular probability distribution function in each cell by setting its minimum (\(a=c-\varepsilon\)), maximum (\(b=c+\varepsilon\)), and central (\(c\)) values (Eq. 4).

$$f\left(x;a,c,b\right)=\left\{\begin{array}{ll}0 & x < a\\ \dfrac{2\left(x-a\right)}{\left(b-a\right)\left(c-a\right)} & a\le x < c\\ \dfrac{2}{b-a} & x=c\\ \dfrac{2\left(b-x\right)}{\left(b-a\right)\left(b-c\right)} & c < x\le b\\ 0 & b < x\end{array}\right.$$
(4)
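A minimal sketch of how one elevation realization per cell could be drawn from Eq. (4) with numpy, assuming the HRDM elevations and the geostatistical error surface are available as 2-D arrays (the helper name is illustrative):

```python
import numpy as np

def sample_hrdm(c, eps, rng=None):
    """Draw one elevation realization per cell from the triangular PDF of
    Eq. (4): minimum c - eps, mode c, maximum c + eps.

    c   : 2-D array of HRDM elevations (m)
    eps : 2-D array of spatially distributed errors (m)
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = np.maximum(eps, 1e-6)   # numpy requires a strictly positive width
    return rng.triangular(left=c - eps, mode=c, right=c + eps)

# Toy example: a 3 x 3 elevation patch with a 0.15 m error surface
c = np.full((3, 3), 620.0)
print(sample_hrdm(c, np.full((3, 3), 0.15), np.random.default_rng(0)))
```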

To derive the Manning coefficients, we first used the CORINE Land Cover database53 to establish the most prevalent types of land uses/land cover at the study site54. Then, each land use/land cover unit was assigned a possible range of Manning’s n values, following the approach developed by Chow55 and using Manning’s n values from the Spanish methodological guidance for the development of the national flood zone mapping system56. We adopted a uniform distribution (Eq. 5) for Manning’s n to account for the uncertainty associated with the roughness of the channel-floodplain system. This approach is widely used20,26,27,35, given the lack of knowledge about the probability distribution that this coefficient follows57:

$$f\left(x;a,b\right)=\frac{1}{b-a}\quad \mathrm{for}\;a\le x\le b$$
(5)

With regard to the upstream boundary condition, the flow gauge nearest to the study reach (flow gauge code 2121; UTM-X ETRS89 H30N 272809; UTM-Y ETRS89 H30N 4599163; operational since 2002) does not possess a sufficiently long annual time series of maximum instantaneous peak flows to carry out a meaningful flood frequency analysis. Instead, we propagated a time series collected from a flow gauge located 40 km upstream that has been operational since 1912 (flow gauge code 2062; UTM-X ETRS89 H30N 298658; UTM-Y ETRS89 H30N 4598753). In doing so, we characterized the alluvial aquifer-river interaction and the water storage on the floodplain surface resulting from landforms such as abandoned meanders and from orthogonally arranged man-made structures (i.e., mainly rural roads and irrigation channels). We took three steps to do this: (i) process the time series to obtain as much data as possible and to check its statistical validity; (ii) perform a log-Pearson Type III-based flood frequency analysis (Supplementary Note 2) to obtain the fitted 5-percent and 95-percent confidence limit values; and (iii) propagate these values downstream to create a triangular probability distribution function (Eq. 4) (Supplementary Note 1).

Uncertainties arising from flow simulation at hydraulic controls such as weirs were examined by taking into account the weir coefficient and the elevation of the top of the weir. The range of weir coefficient values, assumed to follow a uniform distribution (Eq. 5), was established using the standard weir formula of Poleni18,58. Accordingly, the existing inline structures in the reach under study were classified as trapezoidal crested weirs. Regarding the elevation of the top of the weir, a representative value was taken as the average of the elevations extracted from the Light Detection and Ranging (LiDAR) data. For the uncertainty analysis, we used a uniform distribution whose range was established by extracting the maximum elevation error at the weir crest from the geostatistical analysis performed on the DSM.
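For reference, the Poleni weir formula in its common form reads

$$Q=\frac{2}{3}\,\mu \,b\,\sqrt{2g}\,{h}^{3/2}$$

where \(Q\) is the discharge over the weir, \(\mu\) the dimensionless weir coefficient (the quantity sampled here from its uniform range), \(b\) the crest width, \(h\) the head above the crest, and \(g\) the gravitational acceleration.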

The energy slope was employed to establish the downstream boundary condition, assuming that the slope of the reach equals the energy slope. Consequently, the energy slope was estimated by calculating the reach slope between the last modeled weir and the subsequent downstream weir (1.49 km between weirs). A depth value was then computed using Manning’s equation55. To address the uncertainty analysis of the energy slope, the minimum and maximum slopes of the reach arising from the estimated bathymetric error were determined. We then fit this range to a uniform distribution (Eq. 5).
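As an illustration of this step, a sketch solving Manning’s equation for depth with scipy, assuming a simplified wide rectangular section (the study’s actual geometry comes from the hydraulic model) and hypothetical values for discharge, roughness, slope, and width:

```python
import numpy as np
from scipy.optimize import brentq

def normal_depth(q, n, slope, width):
    """Solve Manning's equation for flow depth in a rectangular section:
    Q = (1/n) * A * R^(2/3) * S^(1/2), with A = w*h and R = A / (w + 2h)."""
    def residual(h):
        area = width * h
        radius = area / (width + 2.0 * h)
        return area * radius ** (2.0 / 3.0) * np.sqrt(slope) / n - q
    return brentq(residual, 1e-3, 50.0)   # depth bracketed in (0.001, 50) m

# Hypothetical inputs: 500-year discharge, channel n, reach slope, width
print(f"{normal_depth(q=2800.0, n=0.035, slope=0.0008, width=120.0):.2f} m")
```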

Sensitivity analysis

We conducted a sensitivity analysis to identify the sources of uncertainty in the model’s outputs (flooded area, water depth, and flow velocity)59. First, we conducted a local sensitivity analysis to independently determine the relevance of each source of uncertainty for the flooded area output at the general scale (the whole area within the model boundaries) and create a tornado plot (Fig. 2). Second, we performed a global sensitivity analysis, in which all uncertainty sources affecting water depth and flow velocity were examined collectively at each cell (local scale) (Fig. 3a).

To conduct the local sensitivity analysis on the flooded area, we used the deterministic simulation result as the reference flooded area. Then, for each input under study, we conducted two simulations at the extremes of its range (minimum and maximum values), keeping the remaining input data unchanged. This procedure was repeated to examine the possible uncertainties associated with: (1) the DSM; (2) the DBM; (3) land uses; (4) the upstream boundary condition; (5) the downstream boundary condition; (6) weir coefficients; and (7) weir top elevation. Using a tornado diagram, we assessed each input’s contribution to the global uncertainty.
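A sketch of this one-at-a-time procedure; run_model is a stand-in surrogate (not the study’s HEC-RAS automation) so that the example executes, and all parameter names and ranges are illustrative:

```python
# One-at-a-time local sensitivity behind a tornado plot. In the real
# workflow run_model would write the parameters into the hydraulic model
# and return the simulated flooded area (ha).
def run_model(p):
    # Stand-in surrogate response, for illustration only
    return 262.8 * (1.0 + 0.40 * (p["q500"] / 2800.0 - 1.0)
                    + 0.30 * (p["n_channel"] / 0.035 - 1.0)
                    - 0.20 * p["dsm_shift"])

reference = {"q500": 2800.0, "n_channel": 0.035, "dsm_shift": 0.0}
bounds = {"q500": (1540.0, 4540.0), "n_channel": (0.025, 0.045),
          "dsm_shift": (-0.5, 0.5)}      # illustrative ranges only

area_ref = run_model(reference)
tornado = {}
for name, (lo, hi) in bounds.items():
    low = run_model({**reference, name: lo})
    high = run_model({**reference, name: hi})
    tornado[name] = (100.0 * (low - area_ref) / area_ref,
                     100.0 * (high - area_ref) / area_ref)

# Largest spread first, as in a tornado plot
for name, (lo_pct, hi_pct) in sorted(
        tornado.items(), key=lambda kv: -abs(kv[1][1] - kv[1][0])):
    print(f"{name:>10}: {lo_pct:+.1f}% to {hi_pct:+.1f}%")
```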

The global sensitivity analysis was carried out using the RBD-FAST method60, integrated into a Python script49. This method is an adapted variant of the FAST method that allows efficient computation of the first-order SI, which evaluates the individual contribution of each input factor to the overall output variance. It relies on variance-based sensitivity analysis, which decomposes the model output’s variance into components that may be assigned to specific input factors or combinations of them37,60. This decomposition provides valuable insight into which factors have the greatest influence on the model’s output and how they interact.

First-order SI was calculated using the RBD-FAST technique through Latin Hypercube Sampling, delivering a highly cost-efficient approach in terms of computing time without affecting the reliability of the sensitivity index sought60. The RBD-FAST method was used to perform the global sensitivity analysis at the cell level, taking into account the following uncertainties: (1) estimated vertical error for DSM, DBM, or weir elevation; (2) accepted ranges in the literature for Manning’s n and weir coefficient; (3) confidence intervals for quantiles to be used as upstream boundary conditions; and (4) energy slope as downstream boundary conditions. All global sensitivity analysis findings were processed with all these sources of uncertainty in mind, yielding tabular and cartographic representations of the ranking (i.e., ranking of the input data based on their relative effect on model outputs) and SI values (Fig. 3).
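A minimal sketch of such a cell-level RBD-FAST computation using the SALib library, with placeholder input names and bounds and a synthetic response standing in for one cell’s simulated water depths across the MC runs:

```python
import numpy as np
from SALib.sample import latin
from SALib.analyze import rbd_fast

# Illustrative problem definition; names and bounds are placeholders.
problem = {
    "num_vars": 3,
    "names": ["q500", "n_channel", "energy_slope"],
    "bounds": [[1540.0, 4540.0], [0.025, 0.045], [0.0005, 0.0015]],
}

X = latin.sample(problem, 500)            # Latin Hypercube sample
# Synthetic response for one cell's water depth; in the study, Y comes
# from the hydraulic model outputs of the MC simulations.
rng = np.random.default_rng(0)
Y = 0.002 * X[:, 0] + 30.0 * X[:, 1] + rng.normal(0.0, 0.05, 500)

Si = rbd_fast.analyze(problem, X, Y)      # first-order sensitivity indices
print(dict(zip(problem["names"], np.round(Si["S1"], 2))))
```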

Convergence analysis

We analyzed the convergence of the MC-based ensemble’s first moment for the outputs flooded area, water depth, and flow velocity, and we devised a criterion to consistently end the MC process. The proposed criterion is based on the CLT36, which seeks a band of a given width (provided by the maximum acceptable variance for the output of interest) and length (the minimum number of simulations) such that the probability of the MC samples falling outside of this band is virtually null.

Convergence analysis provides evidence of the process’s consistency and reliability, which are demonstrated when the model’s stability, and thus its convergence, can be confirmed. A stochastic model has converged when all possible outcomes are well represented (i.e., when the outcomes’ underlying distribution function has been properly described), indicating that the mean of the outputs is stable35,37. Unlike previous studies that mostly relied on visual inspection30,35, we quantified convergence in the MC trials by employing the CLT36 for each output of interest (i.e., flooded area, water depth, and flow velocity). The CLT approach evaluates the evolution of the output mean in order to quantitatively verify that all possible stochastic values are properly represented in the findings, and hence in the probability distribution function. The CLT produces two bounds for this evolution based on the output variance, resulting in a fluctuating bandwidth (Eq. 6). Each time a new value is added to the existing set of data, the output mean must remain within the established bounds. Convergence is assured (e.g., Fig. 1a, after 288 simulations) if the new mean values stay within the bounds for 60 consecutive simulations, because the likelihood of new unknown values appearing is then minimal. If a new mean value exceeds the bounds, however, new variance-based bounds are computed. By meeting this condition, we ensure that the output has converged.

$$\mathrm{CB}={z}_{\alpha /2}\sqrt{\frac{v}{n}},\qquad \mathrm{BW}={M}_{n}\pm \mathrm{CB}$$
(6)

where \(\alpha\) is the confidence level (set at 0.95), \({z}_{\alpha /2}\) the corresponding standard normal quantile, v the variance of the set of values obtained up to simulation n, n the current simulation number, and \({M}_{n}\) the running mean of the output.

The CLT method was first used to examine the convergence of the flooded area over the whole study area (Fig. 1a). We then employed the CLT approach at the local scale to evaluate the convergence of water depth and flow velocity in each cell (Fig. 1b). We repeated the procedure described above in each cell across all simulations, identifying each cell as stable or unstable depending on whether it met the quantitative convergence condition. Finally, to visually depict these cell states, we displayed the final configuration on a map, yielding the convergence map (Fig. 1c).
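One plausible reading of this stopping rule, sketched in Python (the exact bookkeeping of the bounds in the original script may differ):

```python
import numpy as np
from scipy.stats import norm

def converged_at(values, conf=0.95, hold=60):
    """CLT-based stopping rule sketched from Eq. (6): the running mean
    must stay within M_n +/- z * sqrt(v / n) for `hold` consecutive
    simulations; any excursion resets the count and the bounds."""
    values = np.asarray(values, dtype=float)
    z = norm.ppf(0.5 + conf / 2.0)          # standard normal quantile
    stable = 0
    for n in range(2, len(values) + 1):
        m_prev = values[: n - 1].mean()     # mean before the new value
        cb = z * np.sqrt(values[:n].var(ddof=1) / n)
        if abs(values[:n].mean() - m_prev) <= cb:
            stable += 1
            if stable >= hold:
                return n                    # simulation where it stabilizes
        else:
            stable = 0                      # bound exceeded: reset count
    return None                             # not converged yet

# Toy example on a synthetic flooded-area series (ha):
rng = np.random.default_rng(3)
series = 313.6 + rng.normal(0.0, 25.0, 600) / np.sqrt(np.arange(1, 601))
print(converged_at(series))
```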