Datasets for characterizing extreme events relevant to hydrologic design over the conterminous United States

Despite the close linkage between extreme floods and snowmelt, particularly through rain-on-snow (ROS), hydrologic infrastructure is mostly designed based on standard precipitation Intensity-Duration-Frequency curves (PREC-IDF) that neglect snow processes in runoff generation. For snow-dominated regions, such simplification could result in substantial errors in estimating extreme events and infrastructure design risk. To address this long-standing problem, we applied the Next Generation IDF (NG-IDF) technique to estimate design basis extreme events for different durations and return periods in the conterminous United States (CONUS) to distinctly represent the contribution of rain, snowmelt, and ROS events to the amount of water reaching the land surface. A suite of datasets were developed to characterize the magnitude, trend, seasonality, and dominant mechanism of extreme events for over 200,000 locations. Infrastructure design risk associated with the use of PREC-IDF was estimated. Accuracy of the model simulations used in the analyses was confirmed by long-term snow data at over 200 Snowpack Telemetry stations. The presented spatially continuous datasets are readily usable and instrumental for supporting site-specific infrastructure design.


Background & Summary
Although it is well understood in the scientific community that extreme hydrometeorological events in cold climates are often related to snow processes, i.e., snowmelt and rain-on-snow (ROS) 1-6 , they have largely been ignored or under-represented in hydrologic design that relies largely on traditional precipitation-based intensity-duration-frequency curves (PREC-IDF) to estimate design basis extreme events (e.g., 100-year 24-hour event). PREC-IDF such as the National Oceanic and Atmospheric Administration (NOAA) Atlas 14 7 assumes that precipitation is in the form of rainfall that is immediately available for rainfall-runoff processes. This assumption has obvious shortcomings, especially in snow-dominated regions where winter precipitation is primarily snowfall. At locations where runoff is released slowly from accumulated snowpack, PREC-IDF can lead to infrastructure overdesign and incur unnecessary costs. Conversely, underdesign will occur where the snow-driven runoff rate is higher than the rate of precipitation. This was confirmed by previous research 8,9 , which demonstrated that PREC-IDF underestimates the 100-year, 24-hour extreme events in 45% of the Snow Telemetry (SNOTEL) stations examined, and the resulting peak design flood could be underestimated by up to 324%.
The Oroville Dam failure in February 2017 that required $1.5 billion to repair is a notable example of costly infrastructure damages resulting from floods driven by ROS events 10 . The exclusion of snow processes in the PREC-IDF technique is likely to cause greater errors in estimating extremes in a warming climate and present higher infrastructure design risk. For example, increased frequency and intensity of atmospheric rivers in the future are anticipated to cause more extreme orographic precipitation and subsequently increase flood risk along the U.S. West Coast [11][12][13][14] .
Given the limitations in PREC-IDF and implications for design risk, the Next-Generation IDF (NG-IDF) technique 8 was developed to estimate extreme events based on the amount of water reaching the land surface (W) during rain, snowmelt, and ROS events. By including snow processes, NG-IDF provides a systematic and consistent technique for all environments from rain-dominated, transitional, to snow-dominated locations. Despite the marked advantages of NG-IDF over PREC-IDF, its wide adoption by engineers and planners is hindered by rather limited snow observations for estimating W, especially relative to widely available precipitation products [15][16][17] . Using physics-based models to produce reasonable simulations of W, while feasible, is rather challenging, given the significant requirements of expert knowledge in model calibration and considerable cost of computational resources.
To support the broad adoption of the NG-IDF approach, we developed a suite of datasets that characterize extreme events relevant to hydrologic design over 1951-2013 at a 1/16th-degree (~6 km) resolution across the conterminous United States (CONUS). For over 200,000 locations in the CONUS, the datasets provide: the magnitude and dominant driving mechanism (i.e., rain, melt, and ROS) of extreme events for different durations (24-72 hours) and return periods (2-500 years) derived from the NG-IDF curves; the magnitude of design extreme events associated with different hydrometeorological drivers; trend and seasonality of annual maximum W events (AMW) over 1951-2013; and infrastructure design risk associated with the PREC-IDF technique. These datasets were developed based on sub-daily simulations of W by a well-validated physics-based hydrological model (Distributed Hydrology Soil Vegetation Model, DHSVM 18 ). To examine the accuracy of the simulations, the simulated snow water equivalent (SWE) was evaluated against long-term observations of SWE from 246 SNOTEL stations distributed across the Western U.S. These datasets intend to offer spatially distributed, quantitative measures of extreme events that are readily usable by the science and engineering communities to understand the climatology and driving mechanism(s) of extreme events, identify potential infrastructure design risk associated with the standard PREC-IDF method, and improve estimation of extremes.

Methods
Overview of NG-IDF datasets development. The method to develop the NG-IDF datasets is illustrated in Fig. 1. In contrast to PREC-IDF that estimates extreme events based on the total amount of precipitation (implicitly assumed to be in the form of rain), NG-IDF curves are developed based on the amount of water available for runoff (W) for the bare ground condition with no canopy cover, which can be represented mathematically by: where P is precipitation, ΔSWE is the change in ground snowpack water content, and S indicates condensation (positive) or evaporation/sublimation (negative) of snowpack. In this development, as described in more detail in the following sections, P was taken from the gridded, gauge-based meteorological dataset, ΔSWE and S were simulated by the DHSVM snow model. The resulting W data is a 3-hourly time series for 1950-2013 at 207,173 grid cells covering the land surface of the CONUS at a 1/16th-degree resolution. As observations suggested that snowfall events largely last up to 72 hours 19 , NG-IDF curves were developed for W events with 24-, 48-and 72-hour durations, respectively. We first developed series of annual maximum W (AMW) for years 1951-2013 based on subdaily estimations of W aggregated over different durations. For example, AMW with a 72-hour duration is the maximum of the 72-hour moving sum of W over a given year. Given variations in precipitation seasonality across the CONUS, we defined AMW events for both water year (October 1st through September 30th) and calendar year (January 1st through December 31st). AMW based on calendar year is more appropriate for locations with extreme precipitation occurring during the summer season such as the Southwest U.S., and AMW based on water year is better for locations with extreme winter precipitation such as the Western U.S. Given the focus on snow-driven extreme events, we used AMW defined by water years for IDF curves.
We applied the nonparametric Mann-Kendall test 20,21 on the AMW series. Where trends were significant at the 5% confidence level, we detrended the time series using Sen's slope 22 while maintaining the long-term average of the time series. As suggested by NOAA Atlas 14, the generalized extreme value (GEV) distribution was fitted to AMW across all locations using the L-moments statistics 7,8 . Here we used the same GEV distribution for analysis of extreme events with different durations across the CONUS so that direct comparisons can be made in frequency estimates across durations or between locations. To quantify the implications of using PREC-IDF for infrastructure design risk, we also developed PREC-IDF curves following the same approach based on annual maximum precipitation (P). The design risk was quantified by comparing the NG-IDF and PREC-IDF values of extreme events (e.g., 100-year 24-hour event) over the CONUS.
A Monte Carlo (MC) simulation procedure 23 and NOAA Atlas 14 7 was used to consider sample data uncertainty in frequency analysis. For each location, we generated 1,000 MC synthetic annual maximum series. We then fitted the GEV distribution to each MC series using the L-moments statistics and estimated the associated NG-IDF values. The uncertainties in NG-IDF curves for each location were quantified using the 5% and 95% quantiles (i.e., the 90% confidence interval) of ensemble members.
Gauge-based gridded meteorological data. For simulations of snow processes, the model requires subdaily input of P, air temperature, relative humidity, downward shortwave and longwave radiation, and wind speed. The meteorological input used here was derived from disaggregating the gridded (~6 km), daily land surface meteorological dataset 24 over the CONUS using the Mountain Microclimate Simulation Model (MTCLIM) algorithm 25 . Distinct from reanalysis products, this dataset is one of the few gauge-based gridded datasets with long-term continuous records. It includes daily records of P, maximum and minimum air temperature, and wind speed spanning the period 1950-2013. The disaggregation algorithms 25 estimate subdaily air temperatures with a 3rd-order polynomial fit based on daily temperatures range. Radiation and relative humidity are estimated based on daily temperature range, P, and solar geometry 25 . P is assumed to occur at a uniform rate throughout the day.
Observational snow data. Daily SWE measurements were retrieved from SNOTEL stations for snow model parameterization and evaluation. Among the 785 SNOTEL stations, we selected 246 stations that shared the longest common period (2007-2013) of bias-corrected and quality-controlled (BCQC) daily SWE records. Briefly, standard quality control procedures 8,26 were applied to remove stations with missing data, outliers, and problematic SWE values (e.g., peak SWE > accumulated winter precipitation). The BCQC procedures and the resulting BCQC datasets 27 are available to the public at https://www.pnnl.gov/data-products.

CONUS-scale snow modeling and parameter development.
The physics-based, snow submodel of DHSVM 18 was implemented at the point scale to simulate snowpack dynamics under a bare ground and flat terrain condition. The model was run at the 3-hourly time step from 1950-2013 for 207,173 point locations at a 1/16th-degree grid spacing that coincides with the center of the meteorological grids. Model output includes 3-hourly times series of SWE and S, which are used together with observed P for calculating W in Eq. 1.
DHSVM simulates ground snowpack accumulation and melt using a two-layer mass and energy balance ground snowpack module. The mass balance components consist of P, S, changes in SWE, and melt from the snowpack. The partition of P into rain and snow is based on air temperature thresholds: where T a is air temperature, T S and T R is the temperature threshold for P to be completely snowfall and rain, respectively. If T a ≥ T S , 100% of the P is rain (R); if T a < T R , 100% of the P is snow; if T a falls between the two thresholds, rain and snow are proportionally allocated to represent mixed rain and snow events. Energy balance at the snow surface is driven by net radiation, sensible and latent heat, and advected heat by rain. Energy and mass exchange between the thin surface layer and deep snowpack layer occurs via the exchange of meltwater. When liquid water in the deep snowpack exceeds its holding capacity, excess water is released to the underlying soil column. Detailed descriptions of the DHSVM snow model physics and governing algorithms can be found in a large body of literature 18,[28][29][30] . Prior calibration of snow models is performed typically at relatively local scales; thus, there are no calibrated, spatial snow parameter sets that can be readily applied for the CONUS-domain snow modeling. In support of this work as well as future large-domain snow modeling, here we developed spatially distributed snow parameters for the CONUS domain. Based on previous research 29 that documented the robustness of regionally www.nature.com/scientificdata www.nature.com/scientificdata/ coherent snow parameters in modeling snowpack dynamics, we developed snow parameters for five spatial clusters covering the CONUS (Fig. 2). Given strong correlations between winter climate and key aspects of snowpack dynamics 31-33 , we determined the clusters using the k-means clustering machine learning technique 34 based on the grid-level climatological mean of P, maximum and minimum air temperature, and wind speed from November through March during 1950-2013. Different numbers of clusters were tested and we selected the optimal five clusters based on the inertia elbow method and our previous work 29 .
Parameter development focused on four snow parameters that were identified by previous work 29 to be crucial for capturing daily SWE dynamics 29 : (1) T S (defined in Eq. 2), (2) fresh snow albedo (a max ), (3) albedo decay coefficient during snow accumulation (λ A ) and (4) snowmelt (λ M ). The last three parameters are applied in the snow albedo decay curve of Laramie & Schaake 35 for estimating snow surface albedo evolution, given by: www.nature.com/scientificdata www.nature.com/scientificdata/ consisting of 10,000 sets of the four snow parameters drawn uniformly from their physically plausible ranges, using the Latin Hypercube Sampling algorithm; (2) snow simulations were conducted at each SNOTEL location for every prior parameter set; (3) the posterior ensemble parameters were resampled from the prior ensemble if they met the threshold values of objective functions with observations. Here, we used three metrics (Eqs. 4-6): Nash-Sutcliffe Efficiency (NSE) of daily SWE ≥ 0.6, the bias in the mean annual peak SWE (PEAK.ERR) within ± 25%, and the bias in the timing of peak SWE (PDATE.ERR) ≤ 14 days.
where O i and Y i are the observed and predicted daily SWE at day i, respectively; t is the total number of days for which model simulations were performed; O is the observed mean daily SWE over the simulation period.
where O k P and Y k P are the observed and predicted peak SWE for the k th water year, respectively; ny is the total number of years of simulation.
where Y k D and O k D are Julian dates of observed and predicted peak SWE for the k th water year, respectively. After step (3), 58 stations with no qualified posterior parameter values were removed from subsequent analyses. Most of these stations are located in high-latitude areas of Montana and Wyoming, where model skill is challenged by the lack of representation of wind effects on snow redistribution, or the maritime Pacific Northwest where snowmelt tends to be sensitive to errors in modeled energy balance and precipitation partitioning when temperature is near freezing 29 ; (4) for each cluster, final parameter values were calculated as the ensemble mean over the posterior parameter space of all stations within the cluster. For the Southern cluster (C3), no SNOTEL observations are available for parameter development. Because snowfall is very limited for most of the Southern cluster where snow parameters have negligible effects on extreme events, we used the parameter values of the maritime cluster for the Southern cluster given their commonality of warm winter. The cluster parameter values are presented in Table 1.
Driving mechanism of extreme events. For each location, the driving mechanism was determined for extreme W events with different durations and return periods. Table 2 presents the classification approach used for determining the mechanism based on subdaily information of P and SWE, which include: 1. Rainfall only (R): precipitation on snow-free ground; 2. Snowmelt only (M): decreasing SWE with no concurrent precipitation; 3. Rain-on-snow (ROS): decreasing SWE with concurrent precipitation. Given the interest in flood potential, a ROS event is further refined as one with at least 10 mm rainfall per day falling on a snowpack with at least 10 mm SWE over the selected duration, and the sum of rain and snowmelt contains at least 20% of snowmelt [36][37][38] .
Annual maximum series resulting distinctively from each driving mechanism were determined, based on which IDF curves were developed following the same approach as described for NG-IDF. The mechanism producing the largest IDF value was identified as the dominant mechanism of extreme events.   www.nature.com/scientificdata www.nature.com/scientificdata/ Seasonality of extreme events. The seasonality of AMW at the 24-, 48-and 72-hour durations was represented by the seasonality index (SI) and the mean date (MD) relative to October 1st due to the use of water year. They were calculated using the circular statistics 1,39,40 , given by: For a given water year denoted by i, D is the day of the AMF occurrence relative to October 1st (i.e., D = 1 if the event occurred on October 1st); n is the total number of water years used in the analysis. SI, ranging from 0 to 1, measures the temporal variability of the occurrence of events. A smaller SI suggests weaker seasonality, and the associated MD is therefore less reflective of the actual timing of the extreme events.

Data Records
The NG-IDF datasets 41 are available to the public through an unrestricted repository at https://doi.org/10.5281/ zenodo.5827028 in comma-separated value (.csv) format. Table 3 provides a summary of the folder structures, description of data files, output variables in each file and the format.

technical Validation
The accuracy of estimated W and all related datasets depends primarily on the accuracy of daily SWE simulations given that P was observational (see Eq. 1). As there exists no data for direct evaluation of NG-IDF curves, we validated SWE simulations against daily SWE observations from 246 SNOTEL stations. Three model performance metrics that compare the simulated and observed SWE were applied: (1) NSE, (2) PEAK.ERR, and (3) PDATE.ERR, which measured the overall goodness-of-fit, peak SWE, and the timing of peak SWE, respectively. Model evaluations (Fig. 2) showed that NSE of daily SWE was greater than 0.6 at 75% of all stations, PEAK.ERR was within ± 25% at 67% of the stations, and PDATE.ERR was within two weeks at 67% of the stations. Overall, the simulations were able to reproduce the observed SWE dynamics at most stations using the cluster-based snow parameters.

Usage Notes
The NG-IDF datasets listed in Table 3, with no additional data analysis, can be used for a wide variety of applications over spatial scales ranging from local, regional to the CONUS scales. Overall, the presented estimates of extreme events and their characteristics based on long-term observational and simulation records are crucial for understanding flood potential, particularly for cold regions where infrastructure design risk exists from using PREC-IDF curves or NOAA Atlas 14.
For hydrologic engineering designs and analyses, one can obtain for any location(s) of interest the magnitude of extreme W events (Fig. 3a) and their dominant mechanism (Fig. 3c). Through comparing the NG-IDF and PREC-IDF values of extreme events for the same locations, one can determine the magnitude of bias or design risk related to the use of PREC-IDF (Fig. 3b). Information related to the flood seasonality is also key for understanding the generating mechanisms of floods and supporting future flood management. For instance, one can obtain the seasonality index (SI) and the mean timing of AMW events for any location of interest. For locations with a stronger seasonality (i.e., a higher SI), there is less inter-annual variability in the timing of AMW occurrence, and thus the mean date represents better the occurrence dates of AMW from year to year. At broader spatial scales, the datasets can be used for prioritizing locations for flood management and adaptation. As shown in Fig. 3, there is substantial spatial variability in the magnitude of 100-year 24-hour W events, which are typically higher in the ROS-dominated Pacific Northwest mountain ranges, and rain-dominated Gulf coastal plains. The dominant mechanism exhibits greater heterogeneity in topographically complex mountainous regions, and there is a shift in the dominant mechanism from ROS to rain or melt for events with a longer duration (72-hour versus 24-hour) (Fig. 3c,d). The presented datasets can also be used for estimating catchment-scale flood responses by coupling the NG-IDF curves with a rainfall-runoff model as demonstrated in prior work 16 .
Lastly, it should be noted that the datasets are subject to a few limitations: (1) Diurnal variability is not represented in the precipitation data used here to force the snow model and construct IDF curves. As a result, our estimates of extreme events are limited to daily or longer durations. While the daily temporal resolution is mostly sufficient to capture snow-related extreme events, short-duration IDF curves based on high-resolution rainfall data are more appropriate for capturing short-duration extremes (e.g., flash floods) or floods in small catchments with fast response times (<24 hours).
www.nature.com/scientificdata www.nature.com/scientificdata/ (2) Although generally smaller compared to sample data uncertainty 8 Fig. 1). In this particular case, the analysis suggests that the stationary assumption may lead to the underestimation of extreme events into the future. With the provided AMW series, one can develop nonstationary IDF curves using the approach of choice.   Table 3. Description of the NG-IDF datasets. *Note: In "Data File Description", C = column, R = Row. C[i] indicates the ith column of a data file.