Nonstationary flood coincidence risk analysis using time-varying copula functions

The coincidence of flood flows in a mainstream and its tributaries may lead to catastrophic floods. In this paper, we investigated the flood coincidence risk under nonstationary conditions arising from climate changes. The coincidence probabilities considering flood occurrence dates and flood magnitudes were calculated using nonstationary multivariate models and compared with those from stationary models. In addition, the “most likely” design based on copula theory was used to provide the most likely flood coincidence scenarios. The Huai River and Hong River were selected as case studies. The results show that the highest probabilities of flood coincidence occur in mid-July. The marginal distributions for the flood magnitudes of the two rivers are nonstationary, and time-varying copulas provide a better fit than stationary copulas for the dependence structure of the flood magnitudes. Considering the annual coincidence probabilities for given flood magnitudes and the “most likely” design, the stationary model may underestimate the risk of flood coincidence in wet years or overestimate this risk in dry years. Therefore, it is necessary to use nonstationary models in climate change scenarios.

The term flood coincidence is used to denote the simultaneous occurrence of floods in two (or more) rivers. The coincidence of flood flows in a mainstream and its tributaries may lead to catastrophic floods 1 . Therefore, assessing the risk of flood coincidence for the main river and its tributaries is critical for flood control and water project operations. As flood events are characterized by flood occurrence dates and flood magnitudes, both of these factors should be taken into account when analyzing the flood coincidence risk 2 . In addition, the analysis of flood coincidence involves at least two rivers. For these reasons, a multivariate hydrological analysis is needed that considers the dependence among flood variables [3][4][5] .
Traditionally, multivariate probability distributions are derived using various assumptions, e.g., the same type of marginal distribution or independence of the variables is assumed 6 . In addition, considering multivariate models from this traditional perspective, mathematical formulations are often complicated when more than two variables are involved 7 . For these reasons, a new method of determining the multivariate probability distribution based on copula functions was proposed by Sklar 8 . The copulas describe and model the dependence structure among random variables, independently of the margins involved. Due to their flexibility of construction, copula functions have been widely used in multivariate hydrological frequency analyses in recent years [9][10][11][12][13][14][15][16][17] , especially in flood coincidence risk analyses 1,18,19 .
The risk of flood coincidence has mainly been analyzed under the stationarity assumption. In other words, both the marginal distribution and the copula function are modeled with fixed moments and parameters [18][19][20] . However, climate change and anthropogenic activities have changed the statistical characteristics of hydrological series and the dependence structure of the variables 21,22 . As a result, increasing attention is being paid to the development of nonstationary multivariate models with copula functions [23][24][25] . In the univariate case, nonstationary models have been widely applied [26][27][28] . Liang, et al. 29 grouped nonstationary flood frequency methods into two types: indirect and direct methods. Direct methods have been widely used because they do not require the restoration of hydrological series. In the multivariate case, copulas have been used to describe the dependence structure of different series [30][31][32] . Chebana, et al. 33 discussed the use of copula functions with time-varying parameters in the case of a changing dependence structure for the investigated variables. Sarhadi,et al. 34 defined the copula parameter as a deterministic function of time. Jiang, et al. 32 compared time-varying copula models with time or a reservoir index as the covariate. However, to the best of our knowledge, few studies in the hydrological

Study Area and Data
The Huai River Basin (N30°55′-36°36′, E111°55′-121°25′), which is composed of many tributaries, is the sixth largest river basin in China. The basin is located in the transitional zone between semiarid and semihumid climates (Fig. 2). Previous studies have shown that extreme rainfall has increased at most stations in the Huai River Basin [39][40][41] , resulting in more turbulent flows in the main stream and its tributaries during the flood season. Therefore, it is necessary to analyze the risk of flood coincidence in the Huai River Basin.
The study area refers to the upper reaches of the Huai River Basin above the Wangjiaba station, with an area of 2.82 × 10 4 km 2 (Fig. 2). The proposed method was applied for flood coincidence analysis for two rivers within the region: one is the upper reach of the main stream of the Huai River, with a catchment area of 1.58 × 10 4 km 2 , and the other is the Hong River, which has a catchment area of 1.15 × 10 4 km 2 . The length of the channel from Huaibin to Wangjiaba is 27 km, which drains an area of 900 km 2 . There are seven rainfall stations in the Huangchuan (HC). The reason for choosing these two rivers is that coincident flooding on the Huai River and Hong River may generate flood peaks, which can threaten the flood control at Wangjiaba station. The flood control of the Wangjiaba section mainly depends on the Mengwa flood detention basin. When a flood occurs, the diversion gate of the Wangjiaba sluice is opened to discharge the flood into Mengwa, and the outflow of Mengwa is controlled by the Caotaizi sluice (Fig. 2a).
In this study, the annual maximum daily flow (AMDF) of each river (Q h for Huai River and Q b for Hong River) and the occurrence dates (T h for Huai River and T b for Hong River) were obtained using the annual maximum method (AM method). Rainfall data were derived from daily data collected at seven rainfall stations in the study area from 1959 to 2015 (Table 1). Then, the average rainfall in the catchment areas of the two rivers was obtained using Thiessen polygons 42,43 , and the annual maximum rainfalls P h and P b were sampled from the average rainfall by the AM method and used as covariates. All data were obtained from hydrological yearbook.
The flood propagation time of each section in the study area was obtained from the Huai River Water Resources Commission (Fig. 2a).

Result and Discussion
Before selection of the model for frequency analysis, nonstationarity evaluations (including of the flood magnitude series and dependence structure among series) should be performed. The nonstationarity can be assessed by change point analysis [44][45][46] and trend analysis. In this study, the change point was detected by the distribution-free cumulative summation test (CUSUM) 47 . The trend analysis was performed based on the nonparametric Mann-Kendall (MK) test 33 .

Hydrometric station
Rainfall station  According to the MK test, the series of Q h presents a significant downward trend at the 0.05 significance level. For Q h , there is a significant upward trend (Fig. 3).

Record of length
The above analyses demonstrate that the flood magnitudes in the two rivers and the dependence structure are nonstationary. In additions, the flood occurrence dates in the two rivers and the dependence structure between them are both stationary.

Estimation of marginal distribution.
Marginal distribution of flood occurrence dates. In this study, the mixed von Mises distribution with constant parameters was selected as the marginal distribution of the flood occurrence dates. The parameters of the mixed von Mises distribution were estimated by the maximum likelihood method. Table 2 summarizes the values of parameters and the goodness-of-fit results for the mixed von Mises distribution of flood occurrence dates for the two rivers: The p-values of the KS test were both larger than 5% (Table 2), which supports the validity of the assumed models.
The frequency histograms of flood occurrence dates fitted by the mixed von Mises distribution are presented in Fig. 4(a,b). The marginal cumulative distribution function (CDF) curves of the flood occurrence dates are shown in Fig. 4(c,d), in which the lines (theoretical distribution) intersect with the observed empirical frequencies. Figure 4(a-d) indicate that the theoretical distribution fits the observed data well. In additions, the highest relative frequencies for the two rivers both occur in July, indicating the floods in these two rivers are more likely to occur during this period.
Marginal distribution of flood magnitudes. In this section, both stationary models and nonstationary models with a rainfall covariate were applied to build the marginal distributions of flood magnitudes in the two rivers. Five probability distributions, including the gamma, Weibull, lognormal, Gumbel, and general extreme value (GEV) distributions, were selected as the candidate marginal distributions (Table S1).
Under the stationary assumption, the results of the five distributions in fitting the two series are presented in Table S2. According to the Akaike information criterion (AIC) minimization method, the gamma distribution with constant parameters is the optimal distribution for Q h , and the Weibull distribution with constant parameters is the best distribution for Q b .
Under the nonstationary assumption, a series of statistical analyses (Table S3) indicate that the Weibull distribution with a rainfall covariate is the best-fitted distribution for Q h and that the gamma distribution with a rainfall covariate is the best choice for Q b . Table 3 summarizes the performance of the four optimal distributions in fitting the two series. According to AIC minimization method, nonstationary models provide a better fit than stationary models for both Q h and Q b . Consequently, the nonstationary models with a rainfall covariate are selected as the marginal distributions of flood magnitudes. The worm plots for the selected models [ Fig. 5(a,b)] show that all the points fall in the 95% confidence interval (i.e., the upper and lower gray dotted lines). In the quantile-quantile (QQ) plots [ Fig. 5(c,d)], all the points are basically distributed along a straight line at a 45 degree angle. Figure 5 indicates that the actual residuals of the selected models are in good agreement with the theoretical residuals.

Gauging station
Parameters www.nature.com/scientificreports www.nature.com/scientificreports/ Estimating the copula parameter. Before the selection of copula functions, the dependence structures among the flood magnitudes and occurrence dates for the two rivers were explored. The Kendall coefficient and Spearman coefficient were used to perform the independence test 48 , as shown in Table 4. The results indicate that Q h and Q b are significantly correlated at the 5% significance level. However, the coefficient between the flood magnitude and occurrence date in the same river is close to zero, which suggests that these two variables are independent.
Three candidate copulas with constant parameters were used for modeling the dependence structure of flood occurrence dates (T h and T b ), including the Gumbel, Frank, and Clayton copulas. Table 5 summarizes the values of parameters and the goodness-of-fit results for these copulas. According to the goodness-of-fit test results (based on Cramér-von Mises statistics), the Clayton copula is not adequate for modeling the bivariate features of the flood occurrence dates (p-value is less than 0.05). In terms of the minimum AIC, the Gumbel copula with θ = .
2 077 c is the best choice. Considering the nonstationarity of the dependence of flood magnitudes, the time-varying copulas were used as candidates as Eq. (1). The parameters of time-varying copulas were described and estimated as Eqs. (8)(9). Table 6 summarizes the values of parameters and the goodness-of-fit results for the optimal copulas under stationary and nonstationary assumptions for modeling the dependence structure of the flood magnitudes (Q h and Q b ). As Table 6 shows, the P-KS values of Z 1 and Z 2 and the P-Kendall values are larger than 0.05, which supports the validity of the assumed models from a statistical perspective. The time-varying Frank copula was selected to model the dependence structure of flood magnitudes by comparing the AIC values. Figure 6 shows the worm plots for the goodness-of-fit test of the Frank copula with time-varying parameters; notably, all points fall within the 95% confidence interval, indicating satisfactory fitting performance for the selected copula model.
The risk of flood coincidence. On the basis of marginal distributions and copula functions, the risk of flood coincidence was analyzed from the three following three perspectives.
Coincidence probabilities for flood occurrence dates. With the mixed von Mises distribution and copula function, the joint distribution of flood occurrence dates was determined in order to calculate the daily probability of flood coincidence for flood occurrence dates, as described in Eq. (10). The annual coincidence probability of the flood   www.nature.com/scientificreports www.nature.com/scientificreports/ occurrence dates = .
P 0 152 T was obtained by Eq. (11). According to the observed data, nine floods occurred simultaneously in the two rivers during the 57-year study period. The statistical coincidence probability of the measured data is 0.175, which is close to the model calculation result. Coefficients   www.nature.com/scientificreports www.nature.com/scientificreports/ The daily probability of flood coincidence for flood occurrence dates was plotted as shown in Fig. 4(e). The highest probabilities of flood coincidence occur in mid-July, which suggests a high combination risk. Hence, the Wangjiaba flood diversion sluice needs to discharge the floods to Mengwa for flood control during this period. The coincidence probabilities are close to zero from May to mid-June, which indicates that the flood coincidence is extremely low in the two rivers during this period. Therefore, it is possible to open the Caotaizi escape sluices to discharge the flow in the Mengwa flood detention basin during this period.
Annual coincidence probabilities for given flood magnitudes. In this section, the annual coincidence probability for given flood magnitudes is calculated. According to the investigation of the pairwise dependence structures, the occurrence dates and magnitudes of floods are independent of each other. Hence, we employ Eq. (12) to describe the joint distribution of the flood magnitudes. Moreover, the annual flood coincidence probability for given flood magnitudes was calculated by Eq. (13).
To compare the difference in flood coincidence under stationary and nonstationary conditions, the coincidence probability P T in Eq. (13) was calculated under the stationary and nonstationary hypotheses. The values of q h and q b of the two rivers were the design flows for the t-year univariate return period under the stationary assumption. Here, we used = T (10,20,50) years as examples, and the annual coincidence probability for a given flood magnitude and the corresponding variations are presented in Fig. 7. For ease of visualization, the vertical coordinate was set to logarithm of the coincidence probabilities.
Our results indicate that the coincidence probability is constant under the stationary condition. However, under the nonstationary condition, the coincidence probabilities for the flood magnitude fluctuate each year over a wide range. The coincidence probabilities for the flood magnitude display variational processes that are consistent with those of rainfall, reflecting the positive correlation between these factors (Fig. 7). In addition, a series of statistical analyses shows that the nonstationary multivariate model performs better than the stationary multivariate model. The probabilities under the stationary assumption may underestimate the risk of flood coincidence in wet years and overestimate this risk in dry years.
The coincidence probabilities in the case of = T 50 year are close to zero under both the stationary and nonstationary assumptions, which indicates that the coincident flooding is less likely to occur for large flood events than for other events.
Flood coincidence risk in the "most likely" design. The "most likely" design in this section provides the most likely scenario for a specific coincidence probability. According to Eq. (13), the joint exceedance probability-isolines (PILs) for each year can be drawn assuming that the coincidence probability P t is a definite value. Here, taking = . P 0 01 t as an example, the PILs derived from the stationary model and nonstationary model are shown in Fig. 8(a). The "most likely" design for Q hmax and Q bmax in each year considering the definite probability can also be obtained according to Eqs. (14)(15), and the corresponding results are denoted by color points in Fig. 8(a).
Under nonstationary conditions, the isoline is constantly oscillating over time while it is a fixed curve under stationary conditions. The "most likely" design of Q hmax ranges from 4050 m 3 /s to 5035 m 3 /s and that of Q bmax ranges from 738 m 3 /s to 3625 m 3 /s. This difference indicates that Q b is more susceptible to environmental change than Q h . The combined flows of Q hmax and Q bmax in different years are shown in Fig. 8(b) and rang from 4781 m 3 /s to 8099 m 3 /s, which may lead to flood peaks downstream. Hence, the combined flows calculated based on the "most likely" design have a certain reference significance for flood predictions downstream in Wangjiaba. Therefore, it is necessary to consider the flood coincidence between the Hong River and Huai River and take necessary measures to alleviate the pressure on downstream flood control projects.

Conclusions
Flood coincidence analysis plays an important role in flood risk analysis. This study proposed a nonstationary multivariate model to solve the flood coincidence problem in a changing environment. The proposed model was developed using the mixed von Mises distribution, the GAMLSS model and a copula function. The main conclusions are presented as follows.
First, the probabilities of flood coincidence on two rivers show that coincident flooding is more likely to occur in mid-July than in other periods (Fig. 4e). Hence, it is possible to discharge these floods to Mengwa in this period to alleviate the downstream flood control pressure.
Second, both the flood magnitudes and their dependent structure are verified to be nonstationary, and the nonstationary multivariate model of flood magnitude performs better than the stationary model. The coincidence probability for large floods (i.e., under the situation of = T 50 year) is nearly equal to zero, which indicates that coincident flood events are more likely to occur for medium-scale or small-scale floods.
Finally, the combined flow under stationary and nonstationary conditions can be obtained from the corresponding "most likely" design (Q hmax and Q bmax ), which provides a basis for downstream flood safety. The range of Q bmax is larger than Q hmax , which indicates that the nonstationarity of the Hong River is larger than that of the Huai River. Therefore, more attention should be paid to flood control planning in the Hong River Basin.
In conclusion, this study provides a reasonable approach for assessing the risk of flood coincidence in the Huai River Basin under nonstationary conditions. The trends and risks of flood coincidence can be further studied to improve flood management.  www.nature.com/scientificreports www.nature.com/scientificreports/ Methods Framework of the copula function. The copula function, which was first proposed by Sklar 8 , describes the correlation between variables. It is actually a class of function that connects joint distributions to their respective marginal distributions. Nelsen 49 and Joe 50 provided a theoretical introduction to copulas. Studies on practical approaches have also been developed 6,51,52 . Specifically, basic guidelines for using copulas in hydrological applications were illustrated by Favre 51 , Salvadori and De Michele 16 , and Salvadori et al. 53,54 Useful and free software routines were provided and illustrated by Hofert et al. 55 Environmental changes can influence both the statistical characteristics of hydrological series and the dependence structure of hydrological variables. Considering these changes, a time-varying copula should be considered in such analyses. Hence, the joint distribution of the hydrological variable pair of Y Y ( , ) t t 1 2 at time t can be produced as follows: where H(·) represents the joint cumulative distribution function (CDF) of Y t 1 and Y t 2 , F 1 is the cumulative marginal distribution of Y t 1 , F 2 is the cumulative marginal distribution of Y t 2 , C(·) is the copula function, θ t 1 and θ t 2 are the time-varying marginal distribution parameters, θ c t is the time-varying copula parameter, and the marginal probabilities u t and v t should obey a uniform distribution in the range of [0,1].
Under the stationary assumption, all the parameters above can set as constants: According to Eq. (1), the implementation of the time-varying copula consists of two steps. The first step is to analyze the univariate nonstationarity (including change point and trend tests) and select an appropriate marginal distribution. The second step is to analyze the nonstationarity of the dependence structure and select an appropriate copula function.

Marginal distribution. Marginal distribution of flood occurrence dates.
Flood occurrence dates can be regarded as a vector with periodic changes. The mixed von Mises distribution 56 is a distribution commonly used to describe periodic or seasonal variables and has been proven to effectively fit flood dates 12 . The directional variable of flood occurrence dates x i can be obtained from the following relation: where L denotes the length of the flood season and D i is the flood occurrence date. The probability density function for a mixture of N von Mises distribution can be produced in the following form: where u i and κ i are location and scale parameters, respectively; p i is the weight of the probability density; = N 2 in this study; and I 0 is the Bessel function. In this study, parameters were estimated by the maximum likelihood estimation method, and the approximate Monte Carlo goodness-of-fit test p-values (based on Kolmogorov-Smirnov statistics) were calculated to assess the validity of the assumption that the flood occurrence dates followed the selected distribution.
Marginal distribution of flood magnitudes. In this study, five widely used distributions, including the gamma (GA), Weibull (WEI), lognormal (LOGNO), Gumbel (GU), and generalized extreme value (GEV) distributions, were selected to describe the flood magnitudes (Table S1). Then, based on the GAMLSS model proposed by Rigby and Stasinopoulos 36 , the time-varying marginal distribution was determined; such distributions are widely used in frequency analyses of nonstationary hydrological series.
Taking a three-parameter model with one explanatory variable as an example, if the response variable y t follows the distribution function 1, 2, ) , then each parameter can be described as a linear function of the explanatory variable x t via a monotonic link function g k (·) as follows:  www.nature.com/scientificreports www.nature.com/scientificreports/ The optimal distribution can be selected from the candidates by comparing the value of the Akaike information criterion (AIC). All computational processes were implemented in the R package "gamlss".
Copula with time-varying parameters. In multivariate hydrological frequency analysis, the Archimedean copula is widely used due to its flexibility in structural form 49 . Here, three simple single-parameter copulas, including the Gumbel copula (GH), Clayton copula (CL), and Frank copula (FR), were used as candidates to model the time-varying dependence structure (Table S4).
The time-varying copula function assumes that the copula parameters obey a time-varying process and have time-varying characteristics. In this study, a process similar to ARMA (1, q) 35 was used to describe the time-varying nature of the parameters: where Λ() is a transformation function that considers the correlation with ρ t in the definition interval (0, 1) in this study, Λ , and τ c t57 can be converted to θ c t58 . In this paper, the ARMA model with a lag of 10 orders was adopted (q = 10). Two-step pseudo maximum likelihood estimation was adopted for parameter estimation. First, the marginal distribution parameters were calculated, and the empirical distribution was then used instead of the marginal distribution to determine the logarithmic likelihood function of the following equation. A two-step method was used to estimate the maximum likelihood.
where c(·) indicates the density of the copula function and  F x ( ) represent the empirical marginal distributions. The parameters can be obtained by combining Eqs. (1) and (9). Because the test method of calculating the "distance" between the fitted copula and the empirical copula is not suitable for time-varying copulas 59 , the Rosenblatt probability integral transformation was used to test the goodness-of-fit of the time-varying copulas 60 . The optimum copula was selected according to the minimum AIC.
Analysis of flood coincidence risk. The term coincidence refers to the simultaneous occurrence of floods in two (or more) rivers, which can be measured by the exceedance probabilities of flood events. As flood events are characterized by flood occurrence dates and magnitudes, both of these factors should be considered.
First, the flood occurrence dates were defined as the occurrence dates of the AMDF in the Huai River or Hong River. Then, the daily coincidence probabilities for the flood occurrence dates were described by the following mathematical equation: T i n i 1 where i indicates the i th day of the flood season and ∆t indicates the time interval between the response in the two rivers. Because the maximum daily flow in the two rivers rarely occurs on the same day, we defined ∆ = t 1 as a flood coincidence event in this study, and τ hb indicates the difference in propagation time between the two stations (the difference in propagation time between the Huaibin and Bantai stations is 16 hours). Additionally, n indicates the length of the flood season, and P T is the annual coincidence probability of flood occurrence dates.
Second, the series of AMDFs was selected to fit the joint distribution of flood magnitudes, assuming that the random variables of AMDF in Huaibin Q h obey the F H distribution and the random variables of AMDF in Bantai Q h obey the F B distribution. Then, the exceedance probability of flood coincidence for a given flood magnitude in the t th year can be defined as: where q h and q b are the designed flows, C(·) represents the copula function with time-varying parameters, P Q t is the exceedance probabilities for coincident flood magnitudes, and Q h and Q b are the flood magnitudes, namely, the AMDFs sampled by the AM method.
Then, based on the hypothesis that the flood occurrence dates and flood magnitudes are independent of each other (i.e., the Kendall coefficient is less than 0.05), the annual flood coincidence probabilities P t for given flood magnitudes in the t th year can be stated as AND-joint exceedance probabilities: This equation describes the probability that Q h is greater than q h and Q b is greater than q b when the occurrence dates of these two flows differ by one day.
Finally, the impacts of flood coincidence on the lower reaches of the basin were analyzed using the "most likely" design 38 . When P t is fixed as a constant in the range of (0,1), there are many possible combinations of (Q h , Q b ), where P Q t is also a constant because of the constant nature of P T . Among these combinations, the "most likely" design reflects the combination with the highest probability, as defined in Eqs. (14)(15). where − F H 1 (·) and − F B 1 (·) are the inverse functions of the marginal distributions of Q h and Q b with time-varying parameters, respectively; f c is the joint probability density with time-varying parameter θ c t ; and µ and ν are independent of each other and subject to a uniform distribution in the range of [0, 1]. Then, the "most likely" design in the t th year can be calculated by using the inverse of the CDFs of the marginal distributions: