Seasonality of climatic drivers of flood variability in the conterminous United States

Flood variability due to changes in climate is a major economic and social concern. Climate drivers can affect the amount and distribution of flood-generating precipitation through seasonal shifts in storm tracks. An understanding of how the drivers may change in the future is critical for identifying the regions where the magnitude of floods may change. Here we show the regions in the conterminous U.S. where seasonal changes in global-scale climate oscillations have driven a large part of the variability of flood magnitude. The regions are cohesive across multiple watershed boundaries suggesting that variability in floods is driven by regional climate influences. Correlations with climate indices indicate that floods in the western and southern U.S. are most affected by global-scale climate. The regions provide a useful approach for characterizing flood variability and for attributing climatic drivers on flood variability and magnitude.

strongly resembles the variability at other gauges in the cluster, whereas gauges with smaller circles have less similarity of flow variability to other gauges within its cluster. Table 1 provides statistical summaries for the clusters and the p value for the Mann-Kendall non-parametric trend test for a monotonic trend in the cluster-mean flow for each cluster.  www.nature.com/scientificreports www.nature.com/scientificreports/ We found that the spatial patterns of the clusters correspond more to hydroclimatic regions related to atmospheric moisture delivery rather than regional drainage-basin boundaries, which is similar to previous findings for mean annual streamflow 24,32 . A distinct pattern in the northwestern U.S. (cluster 1) for annual peaks and all seasonal maximum streamflow corresponds to a relatively consistent pathway of moisture delivery from the North Pacific Ocean [33][34][35] . That is, variations in moisture delivery may drive part of the variability of floods in this region. Cluster patterns for the central eastern coast of the U.S. and Appalachian Mountain regions are consistent with findings that this region receives moisture mainly from the Atlantic Ocean and Gulf region 33 .
The spatial patterns and number of clusters for both annual and seasonal maximum streamflow vary for the southwest, southeast, and central U.S., which suggests that seasonal shifts in atmospheric moisture delivery 35 are important for peaks and seasonal maximum flows in those regions than other regions indicated by the clusters. The seasonal variability of cluster patterns in the central U.S. are likely related to seasonal shifts in the pathways of flood-generating moisture from the Pacific and Atlantic Oceans and Gulf of Mexico 20 . For JFM a distinct cluster includes much of the southeastern U.S. and may be related to the variability of moisture from the Gulf of Mexico and Atlantic Ocean that is limited to the Gulf Coast region in winter. For AMJ the distinct southeast cluster found for the JFM data shifts to the southwest, and the AMJ maximum flows resemble the annual peak flows across the U.S. For JAS, the spatial pattern of clusters shifts toward large regions for the central and southeast U.S. that may be related to the variability of widespread moisture from the Gulf of Mexico and Atlantic Ocean that intrudes into the southern U.S. 20 . The spatial patterns of clusters in the southwestern U.S. likely vary because of seasonal differences in moisture delivery (frontal systems, monsoon-driven, and tropical storms) from the Pacific. The seasonal differences in the spatial patterns of the clusters demonstrate that seasonal variability of moisture sources produces different regions of peak flow variability

Flood Variability through Time
We found that the magnitude of peaks and seasonal maximum flows were highly variable year-to-year from 1966 to 2015 (Fig. 3) but the variability had some temporal persistence. The annual time series of the cluster-mean flows appear nearly random, so we used a three-year moving average to retain interannual oscillations. For the annual peaks, the smoothed time series indicate periods of lower-than-average peaks (yellow and red) in almost all clusters before 1970, in the mid-to-late 1980's, and generally between 2000 to 2015, especially during several years around 2000. Periods of higher annual peaks (blue shades) occurred nationwide during the early to mid-1970's, early to mid-1980's and the mid-to-late 1990's.  Table 1. The mean, maximum (max), minimum (min) correlation coefficient r between the mean time series for the peaks and seasonal-maximum flow clusters and the same flows at gauges within the clusters, the count of gauges in each cluster, and the p value for the Mann-Kendall non-parametric trend test for a monotonic trend in the mean-cluster time series (p < 0.05 in bold). The '-' indicates no correlation test.
Many of the temporal patterns in the seasonal maximum flow metrics resemble the variations in annual peak flows. With few exceptions, below-average maximum flows prior to 1970 (1966-1969) occurred nationwide in OND and JFM-in AMJ and JAS below-average flows were only in clusters 4 and 5 (southeast and eastern U.S.). Above-average maximum flows were highest during OND and AMJ in the early-to-mid 1980's. Other above-average maximum flows occurred in JAS during that same period in clusters 1 and 2 (western U.S.) while the Midwest and eastern U.S. experienced below-average maximum flows. Above-average flows in the mid-to late 1990's mostly occurred during OND and JFM. Since 2000, the tendency of below-average maximum flows is more prevalent during AMJ and JAS until about 2012 when OND and especially JFM also experienced below-average maximum flows. The below-average flows in the early 2000s generally occurred in all seasons.
We used trend tests and frequency analyses to examine temporal persistence in the cluster-mean flows ( Table 1). A Mann-Kendall 36 trend test on the unsmoothed cluster-means found significant trends at p < 0.05 only for region 3 (southwestern U.S.) in the OND and AMJ seasons, and in region 1 for the JAS season (northwestern U.S.). A lack of significant trends in large flows is consistent with previous findings 1,7,[37][38][39] . To identify periodic variability, we used a Discrete Fourier Transform (DFT) with Hann windowing 40

climatic Drivers of Annual and Seasonal floods
The regional clusters have some correspondence with the spatial relations of precipitation with climate indices, particularly in the western U.S. 27,45,46 . So, we examined the relations between climate indices and the cluster-mean flows using Pearson correlation. Because the individual gauge records are averaged for each cluster, we further examined the relations of the seasonal means of the climate indices with annual peaks (Fig. 4) and with seasonal maximum streamflow ( Supplementary Fig. S1) at each gauge. We also correlated the cluster-mean flows with mean water-year sea surface temperature (SST) (Supplementary Figs S12-S16). The climate indices were the MEI 47 , PDO 26 , PNA 27 , AMO 28 , NAO 29 , and AO 30 . Table 2 shows significant correlations (p < 0.05) between annual peaks and seasonal means of the climate indices in bold and in outlined cells and greater significance (p < 0.01) as bold text in outlined cells. The clusters with field significance (p < 0.05) are indicated by a percentage in bold (see methods for details). Figure 4 shows correlations of the seasonal climate indices and peak flows at individual gauges that were selected for having many significant correlations over coherent regions.
Peak flows in the northwestern U.S. (cluster 1) and southern and central U.S. (cluster 3) were significantly correlated with the most climate indices we examined (Table 1)  www.nature.com/scientificreports www.nature.com/scientificreports/ AMJ (p < 6E-3), and with PNA in AMJ (p < 3E-3). These patterns are consistent with the dipole pattern of correlations of precipitation 45,48 and streamflow with ENSO 49 , PDO, and PNA 27 , in which precipitation is less than normal in the northwestern U.S. and greater than normal in the southwest and southern U.S. during positive phases of MEI, PDO, and PNA. These correlations with peaks and MEI, PDO, and PNA are stronger (more negative in the northwest, and more positive in the central and southern U.S.) than those identified for mean annual streamflow 24 . The southern and central U.S. also had greater significance (negative) with AO in AMJ (p < 1E-3), which may be related to the relations between the Northern Annular Mode (indicated by NAO and AO) and storm tracks across the western U.S. and northern central plains 50 . Cluster 2 (western U.S. and central plains) also had significant correlation (negative) with AO in AMJ for the same reasons and in OND months possibly because of changes in the strength of the westerlies 51 . The southern and central U.S. (cluster 3) had significant negative correlations with the AMO in OND, which may be related to negative correlations of AMO and precipitation in similar areas 52 . Somewhat surprising, cluster 4, which includes most of the northeastern U.S., was not significantly correlated to any of the climate indices we investigated.
Like the peak flows, the seasonal cluster-mean maximum flows in the western and southern U.S. were significantly correlated with the most climate indices (Supplementary Table S1). Supplementary Fig. S1 shows correlations between all seasonal climate indices investigated here and the seasonal maximum flows at individual gauges. The cluster-mean correlations using the seasonal maximum flows were sometimes greater than those for the peak flows, likely because the smaller sizes of the seasonal clusters could better represent local teleconnections. The correlations with seasonal indices may also represent teleconnections during specific seasons. Correlations of peaks with the seasonal mean of climate indices may not be as clear because the peaks can occur anytime in a water year, whereas the climate index was determined for a fixed season. Additional correlations revealed through the seasonal analysis (rather than with the peaks) include the eastern U.S.

Implications for flood attribution
Here we identified regions in the conterminous U.S. where variations in flood magnitude can be largely attributed to seasonal global-scale climate drivers. These drivers can affect the amount and distribution of flood-generating precipitation through shifts in storm tracks. An understanding of how the drivers may change in a future climate 53-55 may be critical for identifying the regions where the magnitude of floods may change. For example, an enhanced hydrologic cycle may increase the importance of moisture deliveries to North American that are related to ENSO 56 . Peak flows in clusters 1 and 3 (Fig. 2, northwest and southern U.S.) and seasonal maximum flows at www.nature.com/scientificreports www.nature.com/scientificreports/ many gauges in the northwest, southern, and eastern regions of the U.S. were significantly correlated with MEI. Intensification of moisture related to ENSO may result in changes in flood variability in those regions. It is possible that shifts in climatic variability may result in a different distribution, timing, and amount of precipitation, but these shifts themselves and their effects on the hydrologic cycle are uncertain 53,57 . Using the period of record from 1966-2015, seasonal climate drivers (as indicated by MEI, PNA, and AO) explain much of the flood variability in the western and southern U.S. Changes in the peak flows and the seasons with the largest flows may be more important for flood studies than those seasons having a lower maximum flow (for example, lower seasonal maximum flows in summer in the northeastern U.S.). Here, we examined standardized flows, which do not indicate flood magnitudes. This analysis provides climate attributions for these regions which may help to disentangle the compounding effects of climate variability on flood attribution studies.

Streamflow data. Streamflow data were obtained from the U.S. Geological Survey (USGS) National Water
Inventory System (NWIS, waterdata.usgs.gov) database for a 50-year period for water years (October through September) 1966 to 2015. We selected sites from the HCDN-2009, which is a subset of USGS streamflow gauges that have minimal anthropogenic interference (e.g. minimal effects of dams, diversions, water withdrawals etc.). We selected a subset of 415 gauges from the HCDN-2009 using criteria for completeness of the peak flow record at each stream gauge 7 (Fig. 1). We also determined a maximum seasonal streamflow for the same 415 gauges for the months October through December (OND), January through March (JFM), April through June (AMJ), and July through September (JAS). For this study, the maximum seasonal streamflow is the largest daily streamflow during the season. The peak flow at a gauge can occur any season, and the mean season of the peak can shift through time 23 . We used circular statistics 58 to identify the mean day and season of the peak. The mean season of the peak flow is OND in portions of the northwestern and southwestern U.S. as well as part of the Appalachian region, JFM in most of California and the eastern U.S., and AMJ in much of the western interior U.S. The peak occurs in JAS at a few of the analyzed gauges in the southern U.S., mainly parts of Florida and Texas. climate data. Many studies have identified relations between precipitation (and streamflow) and global-scale climatic drivers 45,48,59,60 . To evaluate relations between the peaks, seasonal maximum streamflow and climate, we obtained monthly sea surface temperature (SST) data, monthly atmospheric pressure data for the 500 hecto-Pascal pressure surface, and climate indices for the water years 1966 to 2015. We computed the mean of the climate indices to represent seasonal values for the periods of NDJF (November through February) and for OND, JFM, AMJ, and JAS.  4-28-2017). The PDO is a multi-decadal ENSO-like pattern of SST variability in the North Pacific defined as the first principal component of monthly SST anomalies over the North Pacific Ocean 43 . The PNA index was downloaded from the NOAA Climate Prediction Center (CPC) (http://www.cpc.ncep.noaa.gov/data/teledoc/telecontents.shtml, accessed 9-20-2018). The PNA 27 is a measure of atmospheric pressure anomalies over the Pacific Ocean and North America which affect the movement and strength of the East Asian jet stream and is strongly influenced by ENSO oscillations. The AMO was obtained from the NOAA ESRL PSD (https://www.esrl.noaa.gov/psd/data/timeseries/AMO/, accessed 4-19-2017). The AMO 28 is a measure of SST variability in the North Atlantic Ocean and is related to air temperature and precipitation anomalies in North America. The station-based NAO index was downloaded from the National Center for Atmospheric Research Climate Analysis Section (https://climatedataguide.ucar.edu/climate-data/ hurrell-north-atlantic-oscillation-nao-index-station-based, accessed December 11, 2017). The NAO 29 is a measure of the differences of sea-level pressure between the Subtropical (Azores) High and the Subpolar Low 29 . The AO index 30 was downloaded from the NOAA CPC (https://www.cpc.ncep.noaa.gov/products/precip/CWlink/ daily_ao_index/ao.shtml, accessed 11-9-18). The AO is an index of the variation in the strength of the polar vortex that is casually related to weather patterns in North America.
Standardization. The streamflow data were standardized prior to analysis by the procedure for the Standardized Precipitation Index 61,62 . SPI was developed as a means for identifying periods of dryness or wetness in precipitation as a standard normal variable and has become a common approach for quantifying the intensity of drought 63 . SPI is useful for precipitation because it accounts for time series that follow a skewed distribution, and for different amounts of skew between regions with different climates. These properties make SPI a useful approach for quantifying periods of high and low streamflow because annual and seasonal peak flow records are often skewed (usually positively with the tail toward larger flows) and are generated by diverse processes. The SPI procedure provides a means to make comparisons of flood variability across different regions and climates. Standardized streamflows computed using the SPI procedure have a mean of 0 and a variance of 1, which are the same properties obtained from a z-score transformation.
Regionalizing streamflow. Clustering based on a correlation approach 24 was used to group the gauges into regions with common temporal variability. The clusters were created using a hierarchical cluster analysis of the time series of annual peak and seasonal maximum streamflow for water years 1966 through 2015. The clustering procedure involved a correlation-based approach to identify groups of sites with similar temporal variability of annual peak and seasonal maximum flows (thus five separate clustering procedures were performed: annual peak flows, OND maximum flows, JFM maximum flows, AMJ maximum flows, and JAS maximum flows). The correlation-based clustering process involved the following steps. First, the standardized time series of flows for each site were correlated (using Pearson correlation) with time series of standardized flows for all other sites. The site that was correlated with the most other sites above an arbitrary specified threshold (r > 0.5) was removed from the original set of sites along with all the sites that are correlated with the selected site above the specified threshold; these sites initiated the first cluster. To find the second cluster, using the remaining sites, the site that was correlated with the most other sites above the same threshold (r > 0.5) was removed along with all the sites that were correlated with the selected sites above the threshold. The process is repeated for the remaining sites to obtain the third and subsequent clusters until there are no remaining sites that have correlations of standardized flows with the other sites above the specified threshold. Sites that were not correlated with another site above the threshold were not assigned to a cluster. A second screening was done to ensure that all sites were assigned to the most representative cluster. The time series of standardized flows for each site in each cluster were averaged to produce a cluster average of standardized flows. Subsequently, the time series of standardized flows for each site then were correlated with each cluster average time series and each site was ultimately assigned to the cluster with which it had the highest correlation. We obtained a representative time series for each cluster by computing the mean of flows at gauges in each cluster for each year. This "cluster-mean flow" represents the dominant temporal patterns and cycles of peak and seasonal maximum flow variability within each region.
Wavelet transform of cluster-mean flows. We used the R package biwavelet 41  www.nature.com/scientificreports www.nature.com/scientificreports/ climate teleconnections. To evaluate regional patterns of peak and seasonal maximum flows and global-scale climatic drivers, we correlated the cluster-mean flows with each climate index to identify teleconnections related to SST using Pearson correlation. Significance was defined at the 95% confidence level (p < 0.05). We correlated the maximum seasonal streamflow with the concurrent seasonal mean of the climate indices. The standardized annual peaks were correlated with the mean of each climate index over the months OND, JFM, AMJ, and JAS of the same water year. The seasonal maximum flows were correlated with the mean of the time series for the concurrent months. To evaluate the strength of the correlations of the cluster-mean flow, we also correlated the individual time series of standardized annual peak and seasonal maximum streamflows at each gauge with the climate indices, and the cluster-mean flow with mean water-year SST.
Evaluating the relations of clusters and climate indices. To assess the relations between floods and climate drivers, we correlated the cluster-mean flows with climate indices. The correlation patterns of the individual gauges (Fig. S1) with the climate indices matched the regions of the clusters for several, but not all teleconnections, and the percentage of the gauges with significant correlations in a cluster varies (Tables 1 and S1). For example, the correlation coefficient between the JFM cluster 2 mean and MEI is 0.50 (p = 2E-4, Table S1) and 76% of the gauges have significant correlation, whereas the correlation between cluster 3 for annual peaks mean and JFM MEI is 0.61 (p = 3E-6, Table 2) and fewer gauges (36%) in the cluster have significant correlations. Thus, the question remains about how well the clusters represent the patterns of teleconnections.
To evaluate the clusters in terms of how well they represent a climatic influence over a region, we used field significance 64 to assess the collective significance of the spatial pattern of correlations at gauges within each cluster. That is, we assessed whether the number of significant correlations within a cluster is significant, which provides some measure of whether the correlation pattern as a cohesive group is significant and not substantially affected by serial correlation. To estimate the field significance of the relations between each cluster and climate index, we calculated a large set of correlations between the streamflow time series, reordered through time using permutation, with the climate indices (not reordered). Then we calculated the percentage of gauges with significant correlation (p < 0.05). We reordered the time series 10,000 times to obtain a null distribution of the percentage of gauges that passed the significance test. Field significance for each cluster and climate relation was satisfied if the actual percentage of significant gauges in the data was matched in less than 5% of the randomized tests. The clusters that passed the field significance test are indicated in Tables 1 and S1 by the percentage of significant gauges in bold text. Field significance was generally satisfied for cases of significant correlation between the cluster means and climate indices, meaning the clusters are robust for representing the patterns of teleconnections. In cases where the correlation of the cluster-mean time series and time series was significant, but field significance was not achieved (for example, peaks cluster 2 and AO), the cluster-mean time series was likely dominated by a few gauges of higher correlation with the climate index. The opposite case of field significance but no significant correlation by the cluster mean suggests that the cluster is robust in that it represents a region with a weak climate teleconnection.

Data availability
Peak flow data for the gauges in this study are available for download from the U.S. Geological Survey National Water Information System (NWIS) at waterdata.usgs.gov and in U.S. Geological Survey data releases 65,66 .