Main

River floods are primary natural disasters, steadily accounting for several billion dollars in losses every year and for most of the population affected by natural disasters1,2. Assessment of the flood hazard is complicated by runoff generation processes which might be more variable than observed records suggest, let alone ongoing global change3,4. A reliable evaluation of the propensity of rivers to undergo extreme floods with magnitudes not previously experienced (here quantified by the values of the river discharge) is therefore crucial for urban planning, designing engineering structures, pricing insurance, and laying out mitigation and adaptation strategies.

Flood hazard assessment is particularly difficult when the magnitude of the rarer floods strongly increases5,6. Whenever flood magnitudes rise gradually with diminishing chance of their occurrence (Fig. 1a), they are indeed also characterized by high predictability7. In some cases (Fig. 1b), however, a clear growth of the magnitude of the rarer floods points to the possible occurrence of very large events that arise unexpectedly8, often causing catastrophic socio-economic outcomes, as in the recent case of the July 2021 floods in Germany.

Fig. 1: Observed flood divides and extreme floods.

a,b, Normalized seasonal maxima versus chance of occurrence (Methods) for case studies lacking (ID 11402001, autumn; a) and exhibiting (ID 11946000, summer; b) a flood divide (red dot). c, Relative deviation from the flood divide of observed seasonal floods (grey dots linked by solid lines) sorted by their chance of occurrence. Inset shows ratios between highest observed and mean maximum seasonal flood for case studies (black dots) with (n = 27) and without (n = 7) flood divides. Centre line: median; box limits: 25th and 75th percentiles; whiskers: minimum and maximum values that are not considered outliers, that is, 1.5 × interquartile range.

Several studies signalled the pervasiveness of the latter phenomenon9,10. A few works tried to link this behaviour to the catchment water balance11,12 and suggested, on the basis of extensive field surveys in two small basins and a synthetic experiment, that it may occur when the catchment storage capacity is exceeded13,14. Nonetheless, data constraints7,15 and sparse knowledge of its causes limit our ability to diagnose the possible occurrence of extreme events based on precursory signals, as done for other natural and societal phenomena16.

Here, we combine long hydroclimatic records and a mechanistic–stochastic approach to flood hazard assessment to reveal that the spatial organization of stream networks and the river flow regime jointly control the emergence of pronounced increases of the magnitude of the rarer floods. We further demonstrate that the identified controls enable predicting the propensity of rivers to generate extreme floods in an additional set of several thousand case studies.

Sharp rise of flood magnitudes

We identified rivers that exhibit a strong increase in the magnitude of rarer floods from a set of 101 case studies from mid-sized (drainage area: 43–9,052 km², median: 865 km²) unregulated basins in the United States and Germany, denoted as the study dataset (Extended Data Fig. 1, Extended Data Table 1 and Methods). We further pinpointed the case-specific flood magnitude whose exceedance marks the rise of progressively larger floods14 (Fig. 1b), which we term ‘flood divide’ as it discriminates between common and increasingly extreme floods.

To identify flood divides we applied a protocol (Methods) to the characteristic relation between river flood magnitudes and their chance of occurrence, which was both empirically derived from available observations and inferred from a well-established mathematical description of precipitation, soil moisture and runoff dynamics in river basins (Methods), referred to as the physically based extreme value (PHEV) distribution of river flows12,17.

In all instances where the empirical and theoretical methods consistently identified a flood divide (27 out of 101 case studies), this feature neatly partitions contrasting characters in terms of the increase of flood magnitudes (Fig. 1c). On the left-hand side of the flood divide, magnitudes gently rise within a narrow range of common values, whereas they substantially increase with a remarkable nonlinear growth on the right-hand side of it. The flood divide is thus an effective attribute to distinguish common from increasingly extreme floods that may occur in river basins.

Moreover, the existence of a flood divide indicates whether much larger floods than those observed on average shall be expected in a river18. The ratios between the highest observed flood and the mean maximum seasonal flood indeed significantly differ (Kolmogorov–Smirnov test, P < 0.01; Methods) for basins with or without flood divides (inset of Fig. 1c). When a flood divide exists, the highest observed floods are much larger, with deviations from their mean values that are on average twice as large as for basins with no flood divide. In particular, floods can reach exceptionally high magnitudes of up to ten times the mean maximum seasonal flood in these cases, a prospect also evoked by research on historical and palaeofloods18,19,20. The existence and magnitude of the flood divide therefore represent pivotal features to characterize the propensity of rivers to extreme hydrological events and raise awareness of the intrinsic peril of floods in these contexts.

The magnitude of the flood divide varies between 2.5 and 35 (median: 8.4) times the long-term mean discharge of rivers in the study dataset. Comparison of the empirical (that is, derived from the available data records) and theoretical (that is, inferred through PHEV) positions of the flood divide (Extended Data Fig. 2) indicates good agreement (distance and Spearman correlation coefficients of 0.64 and 0.44, respectively, P < 0.01; Methods) in this varied set of case studies. The adopted mechanistic–stochastic description of hydrological dynamics is thus suitable to guide investigation of the physical controls on the emergence of extreme floods in river basins.

Watershed features promoting extreme floods

In contrast with the ubiquitous attribution of extreme flood instances to intense rainfall and anomalous antecedent conditions21,22,23, we show here that intrinsic attributes of river basins explain the penchant of rivers for generating extreme floods.

We applied a dimensional analysis tool (Methods) to the description of hydrologic dynamics provided by PHEV to set research hypotheses on the key factors promoting the occurrence of flood divides in river basins. We then validated the hypotheses with observations and tested whether the identified controls accurately predict the occurrence of extreme floods. The analysis indicates that two specific watershed properties, namely the hydrograph recession exponent24 and the coefficient of variation of daily flows25, control the emergence and magnitude of the flood divide.

The hydrograph recession exponent is a compelling descriptor of the geomorphological structure of the contributing river basin24,26, which determines how watersheds funnel runoff towards their outlets27. Specifically, it stems from the spatial organization of the stream network, which defines how the geometry of saturated areas26 and drainage of the riparian unconfined aquifer24 vary in time and contribute to discharge. The coefficient of variation of daily flows instead arises from distinctive interactions among precipitation inputs, evapotranspiration rates, soil moisture dynamics and response times of river basins25. It summarizes in a single index how watersheds filter the incoming climate signal28 and thus recaps the chance of precipitation falling on dry or saturated basins, which in turn controls the mix of small and large runoff events29. Although typically estimated from streamflow observations, both these properties and their descriptors can be likewise evaluated from commonly available hydroclimatic data series30 and geomorphological data only24,26.

A distance correlation coefficient (Methods) of the multivariate relation between observed flood divides and the two physioclimatic controls equal to 0.47 (P < 0.05) confirms significant dependence of the magnitude of the flood divide on the hydrograph recession exponent and the coefficient of variation of daily flows. We inferred the form of the bivariate relations as well as their uncertainties through PHEV (Methods) and validated the theoretical patterns by overlaying the available observations, which mostly fall within the anticipated ranges (Fig. 2). The distance and Spearman correlation coefficients of the bivariate relations are respectively 0.45 (P < 0.05) and −0.3 (P = 0.12) for the hydrograph recession exponent (Fig. 2a), and 0.44 (P < 0.05) and 0.40 (P < 0.05) for the coefficient of variation of daily flows (Fig. 2b).

Fig. 2: Magnitude of the flood divide as a function of its physioclimatic controls.

a,b, Normalized magnitude (that is, divided by the long-term mean river discharge \(\bar q\)) of the flood divide in the study dataset as a function of the hydrograph recession exponent (a) and the coefficient of variation of daily flows (b). Shaded areas span the 95% variability range of theoretical predictions and provide an estimate of their uncertainties. Grey markers display the median (squares), minimum and maximum values of the binning (horizontal bars), and 5th and 95th percentile range (vertical bars) of the observations (dots; equal number of n = 4 case studies for each bin), here used for validation.

When the hydrologic response is highly nonlinear, the flood divide appears for relatively small magnitudes (Fig. 2a). Heterogeneous drainage density typically enhances the nonlinearity of the hydrologic response26. In these cases, a given increase of the overall length of the stream network actively draining runoff during events determines a superlinear growth of the connected riparian aquifers24 and saturated areas26, causing sharp increments of streamflow and the emergence of flood divides. The areas contributing runoff instead add up gradually with more linear hydrologic responses26, preventing the appearance of flood divides, which shift to increasingly larger magnitudes (Fig. 2a). This evidence corroborates findings of theoretical31 and modelling studies14,32 that suggest a role of nonlinear hydrological responses in the occurrence of extreme runoff events.

The magnitude of the flood divide also increases with the streamflow variability (Fig. 2b). The coefficient of variation of daily flows stems from the ratio between interarrivals of runoff-producing precipitation events and response times of river basins12,25 (Methods). When the interarrival between events is larger than the time required for draining them (because of sporadic precipitation, intense evapotranspiration or fast hydrologic response), watersheds can dry substantially before new precipitation occurs. Events are likely to be filtered by the available basin storage, decreasing the chance of marked growths of the flow magnitudes and shifting the flood divide to larger values. Conversely, when streamflow weakly varies, watersheds experience sustained wet conditions that are likely to cause marked streamflow increments and the emergence of flood divides for relatively small magnitudes.

The coefficient of variation of daily flows hence recapitulates in a single metric the characteristic water storage dynamics of river basins25. Its identification as a key control of the emergence of flood divides and extreme floods agrees with studies pointing at a relation between the predisposition of rivers to flooding and the long-term wetness conditions of their basins33,34, which largely affect streamflow variability35. Here, we confirm with data the key importance of typical water storage dynamics for the emergence of increasingly extreme floods, and provide general explanations of the underlying mechanisms by means of the PHEV framework.

Foreseeing the chance of extreme floods

A question that naturally arises is whether we can label river basins as hazardous (that is, they may exhibit flood divides and extreme floods) by leveraging the hydrograph recession exponent and the coefficient of variation of daily flows as indicators. We show here that we can indeed provide accurate predictions by means of binary logistic regression (Methods), using the two properties as explanatory variables of the likely occurrence of extreme floods in river basins. We first evaluated reliability and robustness of the predictions over the study dataset in a cross-validation fashion (Methods). The large majority of results are true cases (Extended Data Fig. 3), which indicate good ability to identify either the emergence of flood divides (true positives) or their absence (true negatives) from the two physioclimatic properties. Median balanced accuracy and the Matthews correlation coefficient (MCC; Methods) of 0.87 (interquartile range: 0.80–0.94) and 0.63 (0.44–0.77), respectively, denote overall high prediction accuracy. In particular, hydrograph recession exponent and streamflow variability successfully categorize river basins as either having flood divides or not in 83% of the cases on average (interquartile range: 75%–92%), and outclass a random classifier (Methods) in 97% of the cases.

We further performed a stress test (Methods) to evaluate the skill of our indicators in foretelling the possibility of extreme floods in a broader set of 2,519 case studies from mid-sized (drainage area: 36–23,843 km², median: 966 km²) unregulated basins, denoted as the test dataset (Extended Data Fig. 1, Extended Data Table 1 and Methods). This is an especially severe trial as, contrary to common practice, the training set is here more than 70 times smaller than the validation set. Median balanced accuracy and MCC are in this case equal to 0.60 (interquartile range: 0.54–0.65) and 0.18 (0.07–0.28), respectively. The onset of several false positives (that is, cases where we predict a flood divide that is not confirmed by observations; Extended Data Fig. 4) mainly causes the decrease of accuracy. These false positive instances may be partly owing to the inclusion in the test dataset of case studies undergoing hydrological processes (for example, snowmelt, strongly variable recession properties across events) that are not explicitly characterized by the adopted theoretical framework, and for which the identified physioclimatic controls might hence be only partially telling. However, past studies also show that marked growths of the magnitude of the rarer floods are systematically more often detected with longer data records7, and argue that extreme floods that would allow us to confirm these predictions may not be included in the available observations because of their limited lengths36. This is probably the case here, as the fraction of false positive cases in the test dataset consistently decreases with increasing data length (Extended Data Fig. 5a). Moreover, previous studies also highlight that marked rises of the magnitude of the rarer floods are less clearly identified from observations in humid regions7 characterized by reduced streamflow variability25, as for our set of false positives (Extended Data Fig. 5b). The lower likelihood of observing extreme events in these contexts29 hence suggests caution in considering these basins as being at low risk. Here we simply note that, notwithstanding the false positive labels, also in the stress test the descriptors of stream network organization and river flow variability outdo a random classifier in 87% of the cases.

Most importantly, the predicted existence of a flood divide based on these two physioclimatic features of watersheds provides indications on whether much larger floods than those observed on average shall be expected in a river basin (Fig. 3), analogously to that previously shown for observed flood divides (inset of Fig. 1c). In fact, our predictions successfully distinguish river basins in the test dataset where extreme deviations of the highest observed flood from the mean maximum seasonal flood occur. The ratios between the latter variables are indeed significantly larger (Kolmogorov–Smirnov test, P < 0.01) for case studies where we predicted the existence of flood divides, regardless of whether we benchmark our predictions against the available observations (Fig. 3) or not (Extended Data Fig. 4b).

Fig. 3: Prediction of flood divides and extreme floods from their identified physioclimatic controls.

Ratios between the highest observed flood and the mean maximum seasonal flood for case studies in the test dataset for which we predict the presence (true positives, n = 531) and the absence (true negatives, n = 728) of a flood divide. Centre line: median; box limits: 25th and 75th percentiles; whiskers: minimum and maximum values that are not considered outliers, that is, 1.5 × interquartile range; dots: outliers.

Although only tested in mid-sized unregulated river basins, the knowledge gained on the intrinsic attributes of watersheds that control the emergence of flood divides offers a chance to raise awareness of the propensity of certain rivers to generate extreme floods18. The foreseen existence of a flood divide may provide guidance on the choice of alternative statistical tools (for example, light- versus heavy-tailed distributions)37 widely employed in the practice of flood hazard assessment. Estimates of its expected position empower evaluations of the reliability of discharge records for unveiling the peril of extreme events exceeding the flood divide in river basins subject to varied geomorphological and hydroclimatic settings38. Furthermore, the attested feasibility of inferring flood divides from measurable metrics of ordinary discharge dynamics (that is, the hydrograph recession exponent and the coefficient of variation of daily flows), rather than records of streamflow maxima, enables the inception of hazard mapping tools that do not merely rely on past flood records39, but actively identify hazardous regions that are susceptible to the occurrence of flood divides and extreme floods, thus informing concerned communities of possibly overlooked hazards40.

Methods

PHEV distribution of river flows

PHEV12,17 is a mechanistic–stochastic characterization of the magnitude and probability of streamflow maxima occurring in a given reference period (for example, a season). It results from a well-established mathematical description of catchment-scale daily precipitation, soil moisture and runoff dynamics41,42,43,44,45, which has proved suitable for a wide array of physioclimatic conditions46,47,48,49,50,51,52,53,54. This framework describes precipitation as a marked Poisson process with frequency \(\lambda_P\) (1/T) and exponentially distributed depth with average α (L), where T stands for time and L for length. Soil moisture increases due to precipitation infiltration and decreases as a result of evapotranspiration, which is a linear function of soil moisture between the wilting point and a critical upper threshold. Exceedance of this threshold triggers runoff pulses with frequency \(\lambda < \lambda_P\) (1/T) and exponentially distributed magnitude with average α (L). These pulses recharge a single catchment storage, which finally drains into the stream network. A nonlinear storage–discharge relation mimics the hydrological response and the related hydrograph recessions, which are described through the coefficient K (\(L^{1-a}/T^{2-a}\)) and exponent a of a power law function55. The summarized mechanistic–stochastic description of runoff generation processes enables expressing the probability distributions of daily flows45, peak flows12 (that is, local flow peaks occurring as a result of runoff-producing rainfall events) and flow maxima12 (that is, maximum values in a specified timespan) as a function of a few physically meaningful parameters (α, λ, a, K).

We directly computed three parameters of PHEV (α, λ, a) from daily rainfall and streamflow series: α is the mean precipitation during rainy days; λ is the ratio between the mean specific river discharge \(\bar q\) (L/T) and α; and a is the median value of the exponents of power law functions fitted to observed dq/dt–q pairs of single hydrograph recessions24, where dq/dt are the first derivatives in time (t) of the river discharge q. K is instead obtained via maximum likelihood estimation on the observed seasonal maxima12.
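The three directly computed parameters can be illustrated with a minimal sketch. It is not the authors' code: the function names, the wet-day threshold, the daily time step and the finite-difference approximation of dq/dt are all assumptions made for illustration, and the extraction of individual recessions from the streamflow series is left to the caller.

```python
import math
from statistics import mean, median


def mean_wet_day_depth(precip, wet_threshold=0.0):
    """alpha: mean precipitation depth over rainy days (same unit as precip).

    wet_threshold is a hypothetical cutoff defining a rainy day.
    """
    return mean(p for p in precip if p > wet_threshold)


def runoff_frequency(specific_discharge, alpha):
    """lambda: mean specific discharge (L/T) divided by alpha (L), giving 1/T."""
    return mean(specific_discharge) / alpha


def _loglog_slope(x, y):
    """Ordinary least-squares slope of log(y) against log(x)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = mean(lx), mean(ly)
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    return sxy / sxx


def recession_exponent(recessions):
    """a: median exponent of power laws -dq/dt = K q^a fitted per recession.

    Each recession is a strictly decreasing list of daily discharges with at
    least three values. dq/dt is approximated by daily differences and q by
    the mean of consecutive values (an assumed discretization).
    """
    exponents = []
    for q in recessions:
        rate = [q[i] - q[i + 1] for i in range(len(q) - 1)]        # -dq/dt
        level = [(q[i] + q[i + 1]) / 2 for i in range(len(q) - 1)]  # q
        exponents.append(_loglog_slope(level, rate))
    return median(exponents)
```

For an exponentially decaying recession the fitted exponent is 1, which corresponds to a linear storage–discharge relation.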

Data

Two datasets are used in this work with distinct objectives. Both of them were analysed on a seasonal basis (spring: March to May; summer: June to August; autumn: September to November; winter: December to February). A case study represents a given catchment during one season. The first set of data, named study dataset (Extended Data Fig. 1 and Extended Data Table 1), includes 101 case studies56 across the United States (from the MOPEX dataset57,58) and Germany59. These case studies were selected as they are characterized by observational records at least 30 years long, limited anthropogenic streamflow disturbance caused by reservoirs and human water uses58,60, and modest snowfall (that is, the average daily temperature is above 0 °C for the majority of instances in each season and for most years) precluding intense snow accumulation and melting processes50,53. They also exhibit hydrograph recession coefficients (that is, the coefficients of power law functions with exponent set equal to a fitted to observed dq/dt–q pairs of single hydrograph recessions) that do not consistently decrease with increasing flow magnitudes17. These case studies comply with key hypotheses of the adopted theoretical framework12, thus enabling a rigorous investigation of physical controls on the emergence of flood divides. The second set of data, termed test dataset (2,519 additional case studies; Extended Data Fig. 1 and Extended Data Table 1), consists instead of watersheds from the MOPEX dataset and Germany that do not necessarily fulfil the above requirements. The only two criteria used for selecting them are the limited anthropogenic disturbance on streamflow58,60 and a minimum length of the observational series equal to 10 years. The test dataset constitutes a separate set of case studies to stress test the capability of the physical variables identified as explanatory of the magnitude of flood divides to predict the emergence of these features in the test catchments.

Identification of flood divides

We applied a robust methodology56 to detect flood divides in the study dataset, the steps of which are summarized in the following. We identified the point of maximum curvature14 of the semi-logarithmic relation between the inverse of the exceedance cumulative probability of flow maxima and its normalized magnitude (that is, magnitude divided by the long-term mean river discharge \(\bar q\)). We estimated this relation, which is commonly known as the flood magnitude–frequency curve, both empirically via Weibull plotting position61,62 of the observations and by means of PHEV. In the former case, the curvature fluctuates, as it is computed on a discrete set of unevenly distributed points. We thus applied a heuristic rule to remove noise and further consider as potential flood divides only observations on the right-hand side of the last point whose second derivative exceeds the range of twice the standard deviation of the curvature itself63. We then used the Mann–Whitney U test64 to evaluate statistical difference (at the 0.05 significance level) of the distributions of first derivatives before and after each potential flood divide. Additionally, we assessed whether this difference is substantial by computing an effect size by means of Cohen’s d65,66 and the relative increase of the slope of PHEV within the observational range. Increments of the flood magnitude beyond the flood divide are finally considered relevant if Cohen’s d for the point with minimum P value of the Mann–Whitney U test is higher than 0.4, a value that indicates a moderate effect size67,68, and the slope increment exceeds 1%. The red circle in Fig. 1b provides an example of flood divide identified through this procedure.
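Two ingredients of this procedure, the empirical magnitude–frequency curve and the effect size of the slope change, can be sketched as follows. This is a simplified illustration and not the published protocol: it omits the curvature-based noise filter and the Mann–Whitney U test, and the function names, the use of the natural logarithm for the semi-logarithmic axis and the pooled-standard-deviation form of Cohen's d are assumptions.

```python
import math
from statistics import mean, variance


def weibull_return_periods(maxima):
    """Empirical magnitude-frequency curve via the Weibull plotting position.

    The i-th smallest of n maxima has exceedance probability 1 - i/(n + 1),
    so its return period is T = (n + 1) / (n + 1 - i).
    Returns a list of (T, magnitude) pairs sorted by increasing magnitude.
    """
    ordered = sorted(maxima)
    n = len(ordered)
    return [((n + 1) / (n + 1 - i), q) for i, q in enumerate(ordered, start=1)]


def semilog_slopes(curve):
    """First derivatives d(magnitude)/d(ln T) between consecutive points."""
    return [
        (q2 - q1) / (math.log(t2) - math.log(t1))
        for (t1, q1), (t2, q2) in zip(curve, curve[1:])
    ]


def cohens_d(before, after):
    """Cohen's d of slopes after versus before a candidate flood divide,
    using the pooled sample standard deviation (each group needs >= 2 values).
    """
    n1, n2 = len(before), len(after)
    pooled = math.sqrt(
        ((n1 - 1) * variance(before) + (n2 - 1) * variance(after)) / (n1 + n2 - 2)
    )
    return (mean(after) - mean(before)) / pooled
```

A candidate point would then be retained when the effect size of the slopes on its two sides exceeds the 0.4 threshold quoted above, subject to the other criteria of the protocol.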

If we identified a flood divide from both empirical and PHEV estimates, we labelled it a true positive (TP) case. Conversely, if both observations and PHEV suggested the absence of a flood divide, we labelled the case as a true negative (TN). When we detected a flood divide from PHEV but not from the observations, we termed it a false positive (FP). If instead PHEV did not signal the existence of a flood divide, which we however identified from the observations, we labelled it a false negative (FN) case. Application of this whole procedure to the study dataset yields true flood divides for 27 case studies, which are displayed in Extended Data Fig. 2.

Dimensional analysis

Starting from a physically meaningful law, the Pi theorem69 enables reducing the variables of a problem by arranging them into dimensionless groups that help reveal the actual physical controls of the problem70. In particular, the Pi theorem states that if we can hypothesize a mechanistic relation involving n physical variables with k independent fundamental dimensions, we can rewrite it in terms of p = n − k dimensionless groups. We postulated that PHEV outlines the pivotal relations among physioclimatic variables that control the hydrological response of river basins and the occurrence of floods. We then leveraged the Pi theorem to unveil physical controls on the shape of the flood magnitude–frequency curve and hence on the magnitude of the flood divide, which are embodied by the dimensionless groups identified through the Pi theorem. The latter groups may differ depending on the hypothesized mechanistic relation and the variables considered in the analysis. Therefore, we validated the above hypothesis and the relevance of the resulting dimensionless groups against observations. We finally tested the predictive power of the identified physical controls in a large set of case studies.

The variables in this case are the normalized magnitude of the flood divide \(q/\bar q\) (that is, flow magnitude q divided by the long-term mean river discharge \(\bar q\)), the effective rainfall frequency λ (1/T), the average rainfall magnitude α (L), and the hydrograph recession exponent a and coefficient K (\(L^{1-a}/T^{2-a}\)). The overall number of variables is n = 5. The number of fundamental dimensions is instead k = 2 (that is, (L) and (T)). We rearranged the five variables into p = 3 dimensionless groups, which are the two dimensionless variables themselves (\(q/\bar q\) and a), and a combination of the variables that encompasses all the remaining ones and suitably yields a dimensionless group (that is, \(K\lambda^{a-2}\alpha^{a-1}\)). We thus identified an expression for the normalized magnitude of the flood divide that reads as: \(q/\bar q = f(a, K\lambda^{a-2}\alpha^{a-1})\). The second dimensionless group on the right-hand side of the equation is the squared coefficient of variation of daily flows12,25. To stress its physical origin, we also express it as the ratio between the mean interarrival of effective rainfall events, 1/λ (T), and the characteristic response time of the basin, \(1/[K(\alpha\lambda)^{a-1}]\) (T) (refs. 52,71). The Pi theorem thus indicates the hydrograph recession exponent and the coefficient of variation of daily flows as the physical controls of the magnitude of the flood divide.
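The equivalence between the combined dimensionless group and the ratio of time scales can be checked numerically. The sketch below is illustrative only; the function names and the example parameter values are hypothetical, and units are assumed to be consistent (for instance mm and days).

```python
def interarrival_time(lam):
    """Mean interarrival of effective rainfall events, 1/lambda (T)."""
    return 1.0 / lam


def response_time(K, alpha, lam, a):
    """Characteristic response time of the basin, 1 / [K (alpha*lambda)^(a-1)] (T).

    K has units L^(1-a)/T^(2-a) and (alpha*lambda)^(a-1) has units (L/T)^(a-1),
    so their product has units 1/T.
    """
    return 1.0 / (K * (alpha * lam) ** (a - 1))


def dimensionless_group(K, alpha, lam, a):
    """Combined group K * lambda^(a-2) * alpha^(a-1), which is dimensionless."""
    return K * lam ** (a - 2) * alpha ** (a - 1)


# Example with made-up values: the two expressions coincide.
K, alpha, lam, a = 0.3, 12.0, 0.25, 1.8
ratio = interarrival_time(lam) / response_time(K, alpha, lam, a)
group = dimensionless_group(K, alpha, lam, a)
```

For a linear response (a = 1) the group reduces to K/λ, so it depends only on how fast the basin drains relative to how often runoff events arrive.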

Relations between magnitude of the flood divide and its physioclimatic controls

We proceeded as follows to determine the theoretical relations between the normalized magnitude of the flood divide and its geomorphological and hydroclimatic controls (blue envelopes in Fig. 2). We fitted an exponential function in the form \(y = \alpha_i \exp(\beta_i x_i) + \gamma_i\) to results from PHEV, where the dependent variable y is the normalized magnitude of the flood divide \(q/\bar q\) estimated by means of PHEV, \(x_i\) is either the hydrograph recession exponent (Fig. 2a) or the observed coefficient of variation of daily flows (Fig. 2b) and i labels either of these two cases. We thus obtained the optimal parameters (that is, those for which the sum of the residuals is minimized) and their standard deviations. We finally determined the theoretical relations and their uncertainties (blue envelopes in Fig. 2) by plotting the exponential functions with the sets of parameters that encompass the 95% variability of theoretical predictions for the set of case studies.
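A fit of this functional form can be sketched with SciPy's `curve_fit`. The fitting routine actually used is not stated in the text, so the choice of nonlinear least squares, the starting values `p0` and the function names are assumptions; parameter standard deviations are taken from the diagonal of the estimated covariance matrix.

```python
import numpy as np
from scipy.optimize import curve_fit


def exp_model(x, alpha_i, beta_i, gamma_i):
    """y = alpha_i * exp(beta_i * x) + gamma_i."""
    return alpha_i * np.exp(beta_i * x) + gamma_i


def fit_exponential(x, y, p0=(1.0, 0.3, 0.0)):
    """Least-squares fit of exp_model; returns (parameters, standard deviations).

    p0 is a hypothetical starting guess; exponential fits can be sensitive
    to it, so it should be adapted to the data at hand.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    popt, pcov = curve_fit(exp_model, x, y, p0=p0, maxfev=10000)
    return popt, np.sqrt(np.diag(pcov))
```

An uncertainty envelope in the spirit of Fig. 2 could then be drawn by evaluating `exp_model` over parameter sets sampled around the optimum, although the exact construction used for the published envelopes is not detailed here.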

Statistics

We used the non-parametric two-sample and two-sided Kolmogorov–Smirnov test72 to determine whether the ratios between the highest observed flood and the mean maximum seasonal flood for basins with or without flood divides are drawn from the same probability distribution. We applied the test to compare both cases in the study dataset for which empirical and PHEV estimates of the flood divide provided consistent results (inset of Fig. 1c), and cases in the test dataset for which we predicted either the presence or the absence of a flood divide by means of binary logistic regression of its two identified physioclimatic controls (Fig. 3 and Extended Data Fig. 4b).
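The two-sample statistic underlying this test is the largest gap between the two empirical distribution functions. The sketch below computes it with the standard library and attaches an asymptotic two-sided P value; the small-sample correction factor is the widely used approximation given in Numerical Recipes and is an assumption here, since the exact P value computation used in the analysis is not stated. Library routines such as `scipy.stats.ks_2samp` are the usual choice in practice.

```python
import math
from bisect import bisect_right


def ks_statistic(x, y):
    """Maximum absolute difference between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d


def ks_asymptotic_pvalue(d, n1, n2):
    """Approximate two-sided P value from the Kolmogorov distribution.

    Uses the effective sample size n1*n2/(n1+n2) with an assumed
    small-sample correction; returns 1.0 where the series is not reliable.
    """
    if d <= 0.0:
        return 1.0
    root_ne = math.sqrt(n1 * n2 / (n1 + n2))
    lam = (root_ne + 0.12 + 0.11 / root_ne) * d
    if lam < 0.2:
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101)
    )
    return min(1.0, max(0.0, 2.0 * total))
```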

We used distance correlation73 to quantify the strength of the observed relations between normalized magnitude of the flood divide and its physioclimatic controls, as well as between observed and theoretical magnitudes of the flood divide (Extended Data Fig. 2). Distance correlation is a measure of multivariate dependence between random vectors, which is defined, analogously to the Pearson correlation coefficient, as the ratio between their distance covariance and the product of their distance standard deviations. It varies between 1 and 0, with the latter value indicating that the variables are independent.
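The sample distance correlation can be written compactly from its definition. The following standard-library sketch is for illustration and is not the implementation used in the analysis; observations are passed as tuples so that multivariate inputs (for example pairs of recession exponent and coefficient of variation) are handled, and returning 0 when a distance variance vanishes is a convention adopted here.

```python
import math


def _centered_distances(points):
    """Pairwise Euclidean distance matrix, double-centred by row, column and grand means."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row = [sum(r) / n for r in d]          # equals column means (d is symmetric)
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation between two equally long lists of tuples."""
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar2_x = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    dvar2_y = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    denom = math.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        return 0.0  # assumed convention when one variable is constant
    return math.sqrt(max(dcov2, 0.0) / denom)
```

An exactly linear relation between two univariate samples yields a value of 1, whereas the statistic also responds to nonlinear dependence that a Pearson coefficient would miss.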

Binary logistic regression

We used logistic regression to predict whether a flood divide may arise or not in a river basin, by considering the hydrograph recession exponent and the coefficient of variation of daily flows as explanatory variables. Logistic regression is a statistical tool that uses a logistic function to model a binary outcome74,75, which in this study is the occurrence/non-occurrence of a flood divide. In mathematical terms, let us consider a linear relationship (with coefficients β0, β1, β2) between two predictors (x1, x2) and the log-odds l (logit) of the event Y = 1, with Y being a Bernoulli distributed variable: l = log(P/(1 − P)) = β0 + β1x1 + β2x2. The probability that Y = 1 is thus: \(P = \frac{1}{1 + \exp\left[-\left(\beta_0 + \beta_1 x_1 + \beta_2 x_2\right)\right]} = S_e\left(\beta_0 + \beta_1 x_1 + \beta_2 x_2\right)\), where Se is the sigmoid function with base e. We set the cutoff threshold for assigning values of P to either class 0 (no flood divide expected) or 1 (flood divide expected) at 0.75, meaning that if we predict a probability lower than 0.75 the case study is allocated to class 0 and vice versa. We determined this value by means of the Youden’s statistic computed for the study dataset76. We also controlled for collinearity of the explanatory variables by computing the variance inflation factor77, which is equal to 1.62. Given that removal of correlated variables is typically recommended77,78 for a variance inflation factor >5–10, we retained both the hydrograph recession exponent and the coefficient of variation of daily flows as explanatory variables.
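The prediction step and the choice of cutoff can be made explicit with a short sketch. The analysis reportedly relied on scikit-learn; the version below instead uses only the standard library to expose the arithmetic, and the function names, the coefficient values passed in and the list of candidate cutoffs are hypothetical.

```python
import math


def flood_divide_probability(betas, x1, x2):
    """Sigmoid of the linear predictor beta0 + beta1*x1 + beta2*x2."""
    b0, b1, b2 = betas
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2)))


def classify(probability, cutoff=0.75):
    """Class 1 (flood divide expected) if P >= cutoff, else class 0."""
    return 1 if probability >= cutoff else 0


def youden_cutoff(probabilities, labels, candidates):
    """Candidate cutoff maximizing Youden's J = sensitivity + specificity - 1.

    labels must contain both classes; ties keep the first best candidate.
    """
    best_cutoff, best_j = None, -math.inf
    for c in candidates:
        preds = [classify(p, c) for p in probabilities]
        tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
        fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
        tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
        fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:
            best_cutoff, best_j = c, j
    return best_cutoff
```

A cutoff above 0.5 makes the classifier more conservative in assigning the flood-divide label, which is consistent with a training set in which positives outnumber negatives.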

We trained the binary logistic regression by using the true cases in the study dataset, that is, those for which both PHEV and the observations indicate either the presence (true positives, 27 cases) or the absence (true negatives, 7 cases) of a flood divide56. We then evaluated the predictions in a twofold fashion. We first applied a cross-validation procedure, randomly extracting two-thirds of the 34 true cases (true positives plus true negatives) 100 times for fitting the parameters of the logistic model and using the remaining one-third of cases to evaluate the accuracy of the predictions (Extended Data Fig. 3). We later adopted a separate extended dataset (the test dataset; Extended Data Fig. 1) to evaluate the prediction performance under broader conditions. In this case we fitted the parameters of the logistic model on the whole set of 34 true cases identified in the study dataset, randomly extracted 34 case studies from the test dataset 1,000 times to match the number of case studies used for training the binary logistic regression, and evaluated the performance each time (Extended Data Fig. 4a).
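The repeated random splitting used in the cross-validation can be sketched as a small generator. The function name, the fixed seed and the rounding of the training size (two-thirds of 34 is not an integer) are assumptions, and whether the splits were stratified by class is not stated in the text, so plain random sampling is used here.

```python
import random


def repeated_holdout_splits(cases, n_repeats=100, train_fraction=2 / 3, seed=0):
    """Yield (train, test) partitions of cases, drawn independently n_repeats times.

    The training size is round(len(cases) * train_fraction), which is an
    assumed convention when the fraction does not divide the sample evenly.
    """
    rng = random.Random(seed)
    n_train = round(len(cases) * train_fraction)
    for _ in range(n_repeats):
        shuffled = rng.sample(cases, len(cases))
        yield shuffled[:n_train], shuffled[n_train:]


# Usage sketch: fit a classifier on each training part and score it on the
# held-out part, then summarize the scores by median and interquartile range.
# for train, test in repeated_holdout_splits(true_cases):
#     ...
```

With only seven negative cases, some held-out parts may contain very few negatives, which is one reason to report class-aware metrics for each repetition.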

We employed two performance metrics, namely the balanced accuracy and the MCC, to evaluate the accuracy of our predictions. The balanced accuracy79 is a class-wise weighted accuracy rate computed as the arithmetic mean of sensitivity (true positive rate) and specificity (true negative rate). It is recommended when one class (true negatives in this work) is underrepresented in the dataset (that is, imbalanced dataset). The balanced accuracy ranges between 0 and 1, with values lower than 0.5 indicating a worse performance than a random classifier. The MCC80,81 is a metric unaffected by biases when considering imbalanced datasets82. It is defined as \({\mathrm{MCC}} = \frac{{\mathrm{TP}} \times {\mathrm{TN}} - {\mathrm{FP}} \times {\mathrm{FN}}}{\sqrt {\left({\mathrm{TP}} + {\mathrm{FP}} \right) \times \left( {\mathrm{TP}} + {\mathrm{FN}}\right) \times \left( {\mathrm{TN}} + {\mathrm{FP}} \right) \times \left( {\mathrm{TN}} + {\mathrm{FN}} \right)}}\). The MCC ranges between −1 (complete disagreement between predictions and observations) and +1 (perfect prediction), with 0 indicating that the model performs as well as a random classifier. MCC is equivalent to the Pearson correlation coefficient in the special case of two binary variables (that is, predictions and observations). Analyses have been performed with the Python scikit-learn package, version 0.24.2 (ref. 83).
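Both metrics follow directly from the four confusion-matrix counts, as the short sketch below shows. The function names are illustrative, the balanced accuracy assumes that both classes are present, and returning 0 when the MCC denominator vanishes is a common convention adopted here as an assumption.

```python
import math


def balanced_accuracy(tp, tn, fp, fn):
    """Arithmetic mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0.0:
        return 0.0  # assumed convention for an undefined coefficient
    return (tp * tn - fp * fn) / denom
```

For example, 6 true positives, 3 true negatives, 1 false positive and 2 false negatives give a balanced accuracy of 0.75 and an MCC of about 0.48.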