Long-term trends in storm surge climate derived from an ensemble of global surge reconstructions

We address the challenge, due to sparse observational records, of investigating long-term changes in the storm surge climate globally. We use two centennial and three satellite-era daily storm surge time series from the Global Storm Surge Reconstructions (GSSR) database and assess trends in the magnitude and frequency of extreme storm surge events at 320 tide gauges across the globe from 1930, 1950, and 1980 to present. Before calculating trends, we perform change point analysis to identify and remove data where inhomogeneities in atmospheric reanalysis products could lead to spurious trends in the storm surge data. Even after removing unreliable data, the database still extends existing storm surge records by several decades for most of the tide gauges. Storm surges derived from the centennial 20CR and ERA-20C atmospheric reanalyses show consistently significant positive trends along the southern North Sea and the Kattegat Bay regions during the periods from 1930 and 1950 onwards and negative trends since 1980 period. When comparing all five storm surge reconstructions and observations for the overlapping 1980–2010 period we find overall good agreement, but distinct differences along some coastlines, such as the Bay of Biscay and Australia. We also assess changes in the frequency of extreme surges and find that the number of annual exceedances above the 95th percentile has increased since 1930 and 1950 in several regions such as Western Europe, Kattegat Bay, and the US East Coast.

www.nature.com/scientificreports/ more details). We show here four different cut-off probabilities (probability values above which a given year is considered to be a change point) to identify likely change point years: 15%, 20%, 25%, and 30%. In the case of Astoria and for G-20CR, all cutoff probabilities indicate that the year 1948 is the most recent change point in the time series. This is also apparent from the time series shown in Fig. 2a-d. There is a rapid decrease in the annual variability of the predictors before 1948. On the other hand, for G-E20C, three probable change points (1947, 1972, and 1994) are identified for the 30%, 20%, and 15% cutoff probabilities, respectively. Visual inspection of the changes in the variability of the reanalysis predictors and reconstructed storm surge time series (Fig. 2a-d) shows a decrease in the variability of all four variables before 1947. Hence, we choose 1947 as the change point year and assume that data for the time periods 1949 to 2015 (G-20CR) and 1948 to 2010 (G-E20C) are reliable in Astoria. The same change point detection procedure has been applied for all selected G-20CR and G-E20C reconstructions (see http:// gssr. info/ chang epoint for detailed results). After removing suspicious data from G-20CR surge reconstructions for tide gauges in southern Australia, New Zealand, Japan, and the northwest coast of the US vary in length from 50 to 75 years, and along the US Gulf coast, US East coast, and across Europe between 125 and 150 years (Fig. 3, Table 1). For several tide gauges (16 in total) along the US Gulf and East coast, Spain, Portugal, and France, G-20CR provides 150-180 years of surge reconstructions after removing suspicious data. Some tide gauges (red triangles in Fig. 3), mainly in Antarctica, southern Africa, and parts of Australia were discarded after the change point analysis due to significant (and recent) changes in the annual variabilities of predictors (see "Discussion"). For G-E20C, the lengths of the reconstructed surge time series, after removing suspicious data, for tide gauges along the US northwest coast, most of New Zealand, and Japan is 50-75 years. However, data lengths for tide gauges in southern Australia are between 100 and 110 years, which is in some cases twice as long compared to G-20CR in the same locations,  www.nature.com/scientificreports/ pointing to distinct differences in the quality of the reanalysis data. G-E20C provides 50-75 years of data along the US Gulf and East coast, which is shorter than G-20CR. In Europe, most of the tide gauges have 100-110 years of reconstructed surge data. Similar to G-20CR, there are several tide gauges discarded in the southern polar region due to quality issues. On average, and after removing suspicious data, GSSR 31 has extended the average storm surge data lengths at the 310 (G-20CR) and 320 (G-E20C) sites from 30 to 111 years (G-20CR) and 16 to 69 years (G-E20C), with significant spatial variability. We find that G-20CR provides at least 100 years of additional surge data (on top of available observed surge information) for 40% of the tide gauges and at least 50 additional years for 68% of the tide gauges; G-E20C provides at least 100 additional years of surge data for 4% of the tide gauges and at least 50 additional years for 46% of the tide gauges (Fig. 4). According to the aggregated results in Table 1, G-20CR leads to the shortest extension of existing data along the US West coast, adding on average 30 years of data. In Europe, on the other hand, an average of 111 additional years of surge data is made available. For instance, at Delfzjil (The Netherlands), G-20CR provides a total of 149 years of reconstructed surge data which is 104 more years in addition to the 45 years of existing observational data (available in the GESLA-2 1 database).
G-E20C also provides the shortest extension for tide gauges along the US West coast, with an average of 16 additional years of data, and a maximum extension in Europe, with 69 additional years on average (Table 1). There are 9 tide gauges along the US East and Gulf coast, where the observational data is longer than the reconstruction when using G-E20C (Fig. 4b). These are tide gauges with particularly long observational records such as Galveston (102 years) and Atlantic City (94 years), where change points are detected in the reconstructions leading to shorter records compared to observations. Trend analysis. Long-term trends in storm surge magnitude. After removing suspicious data based on the change point detection, we calculate and compare trends of the observed and reconstructed surges (see "Methods" for details) to assess their similarities. We use annual values of high percentiles and chose the 95th and 99th percentiles here as those have been used in many previous studies and are often considered as thresholds when performing extreme value analysis. For this comparison, we select 122 tide gauges with at least 30 years of overlapping data between observations, G-20CR, and G-E20C and a minimum of 75% completeness in the observations. For the majority of the 122 tide gauges, no statistically significant differences (5% level) exist between observed trends and reconstruction trends (Fig. 5). Differences between observed surge and G-20CR are found at 25% (95th percentile surges) and 19% (99th percentile surges) of the tide gauges. When comparing observations and G-E20C, significant differences are found at 30% (95th percentile surges) and 18% (99th percentile surges) of the tide gauges. These differences with observations mainly exist along the Salish Sea (US West coast), New England (northeast US coast), and the Atlantic coast of France. For 64% (95th percentile surges) and 78% (99th percentile surges) of the tide gauges, both reconstructions agree with the observed trends, in particular along the US southeast coast, Japan, and the German Bight.
Next, we investigate G-20CR and G-E20C trends for the 1950-2010 (2015) and 1930-2010 (2015) periods. Those were chosen as a tradeoff between covering relatively long time periods while still having reasonable spatial coverage. Figure 6 shows the long-term trends of the annual 99th percentile surges from G-20CR (a-e) and G-E20C (f-j) for the 1950-2015 and 1950-2010 periods respectively (Supplementary Figs. S1 and S2 show trends for the annual 95th percentile surges). Trends are shown for regions with at least 10 tide gauges. Note that the number of tide gauges can be different in the same region for the two reconstructions, because the change point analysis may have identified suspicious data post-1950 in one reconstruction but not the other.
For G-20CR, significant trends at the 5% significance level are found at 26% of the tide gauges (which were considered originally for change point analysis), notably in the northern UK, Kattegat Bay, southeast Australia, and New Zealand. The largest statistically significant positive trends are derived for the northern UK and New Zealand with magnitudes of 1.9 mm/year and 1.6 mm/year, respectively ( Supplementary Fig. S7). Although statistically insignificant, Cuxhaven (Germany) and Esbjerg (Denmark) have the largest trends with magnitudes of 2.48 mm/year and 1.89 mm/year respectively. Small but significant negative trends with an average magnitude of − 0.6 mm/year are derived at 13 tide gauges and those are mostly located along the Atlantic coasts of France and Spain and in the Adriatic Sea.
Similarly, for G-E20C significant trends for the 1950-2010 period are found at 26% of the tide gauges. Positive trends are derived for the US Gulf coast, UK, Kattegat Bay, and the German Bight. The largest statistically significant positive trend of 2.9 mm/year is derived in the southeastern North Sea (for both Cuxhaven in Germany and Delfzjil in the Netherlands), followed by 2.5 mm/year at Nome (Alaska), and 2.1 mm/year at Millport   www.nature.com/scientificreports/ Trend sensitivity analysis. As discussed in the Introduction, observed surges are usually short and not as continuous as G-20CR and G-E20C. There exist, however, tide gauges with relatively long surge records that can be used to compare against the reconstructed surges. Here we compare 99th percentile observed and reconstructed surges by computing their corresponding trends for various overlapping time windows. We start with a window length of 30 years which is moved by 1 year each time step and repeat the same analysis for longer time windows (adding 1 year each step) (Fig. 8). This allows us to not only compare the reconstructed and observed trends for  www.nature.com/scientificreports/ many more time periods than were used in the previous section, but also shows how multidecadal variability affects observed and reconstructed trends. In Cuxhaven (Fig. 8b), for example, negative trends are found in observations early in the record when using shorter window lengths; and while G-20CR also shows some negative trends early in the record and for short window lengths, the overall patterns in both reconstructions are different, with more persistent positive trends compared to observations. At Port Pire (Fig. 8c), also both G-20CR and G-E20C show positive trends for most time periods and window lengths, while observed trends are mostly negative. In Boston (Fig. 8a), G-E20C agrees well with observations in terms of the sign of the trends, while G-20CR shows very different patterns. More examples are provided in Supplementary Fig. S5 with similar conclusions, i.e. agreement between reconstructions but not with observations in Astoria, relatively good agreement between G-E20C and observations in Brest, and general agreement between all three in Fremantle for most time periods and window lengths. Overall, there is more agreement when trends are derived for longer time windows. Check Supplementary Tables 1-4 for detailed results of the trend analysis.
Comparison of trends for the satellite era from all GSSR reconstructions. Finally, we compare trends in storm surge magnitudes of all five reconstructions available in GSSR with each other and with observations for the overlapping period from 1980 to 2010, for which many more tide gauges provide (near-)continuous records. We also include an ensemble mean (G-EnsMean) based on all GSSR reconstructions. Satellite data is assimilated into all reanalysis products over that time period, and spurious long-term trends due to incosnistancies in the assimilated data are less likely to occur. However, over a 30-year period decadal variability can have significant effects on trends and those long-term variations may be represented differently in the reanalysis products and  Trend analysis for the satelite era shows generally good agreement for Europe in terms of the spatial distribution of observed trends and GSSR trends as well as amongst the different GSSR trends themselves (Fig. 9). All seven datasets (including G-EnsMean) show strong negative trends along the southeastern North Sea and the Kattegat Bay, the largest negative trend being − 6.9 mm/year at Cuxhaven (G-20CR). The actual magnitude of GSSR trends, however, is smaller than that of observed trends (Fig. 10) for most of the tide gauges in Europe. Tide gauges along the Atlantic Coast of France, Spain and North Adriatic Sea have larger negative observed trends which is not reflected in most GSSR reconstructions. Moreover, tide gauges along the Bay of Brest (Brest, Le Conquet) and Loire Estuary (Saint Gildas) show stark differences between observed and GSSR trends. Observed trends at all three tide gauges are negative (− 2.58 mm/year at Brest, statistically significant), whereas GSSR trends are mostly positive, except for G-EInt and G-Merra. Along the US East coast, in the New England area all seven datasets indicate a positive trend for the majority of tide gauges. In the Chesapeake Bay region, there are differences between GSSR trends and observed trends. All GSSR reconstructions (except G-E20C) show negative trends in this region, whereas observed trends are all positive. The largest observed trend has a magnitude of 3.4 mm/year (statistically significant) at Chesapeake Bay Bridge Tunnel. Refer to Supplementary Table 5 for detailed results of this analysis.  www.nature.com/scientificreports/ Along the US Gulf coast there is a positive trend in most datasets (except G-20CR and G-EInt). The observed trend at Pensacola is 5.2 mm/year (statistically significant), which is the largest positive trend observed in all tide gauges considered in this study during the 1980-2010 period. In general, observed trends are positive and larger in magnitude than GSSR trends in this region which leads to the largest differences between GSSR and observed trends as shown in Fig. 10.
On the US west coast, differences exist in trends between observed surges and GSSR reconstructions mostly for tide gauges on the Columbia River and Salish Sea. While observed trends are negative at Astoria, (− 4.1 mm/ year, statistically insignificant) some GSSR trends are positive (G-20CR, G-E20C, and G-EInt) and others negative but very small in magnitude (G-Merra, G-E5, and G-EnsMean). In the southwest (Alameda, Monterey, and San Francisco), there is a general agreement (statistically insignificant but negative trends) among most datasets (except G-E20C).
In Australia, G-20CR and G-E20C generally show positive trends (see also Fig. 10) which is not the case for the satellite era reconstrutions (G-EInt, G-Merra, and G-E5). Differences are most pronounced at Portpire where observed surge, G-20CR, and G-E20C show statistically insignificant but positive trends and the other three GSSR reconstructions show negative trends. The ensemble mean reconstruction (G-EnsMean) gives the smallest difference compared to observed trends (Fig. 10).
Trends in storm surge frequency. To study the spatial patterns in frequency trends of extreme surges, we cluster tide gauges into eight regions: US east coast, US west coast, US Gulf coast, east Asia (tide gauges from Japan and China), Oceania (tide gauges from Australia and New Zealand), Mediterranean, western Europe, and the Kattegat Bay (tide gauges from Sweden and Norway). Here we investigate the trends in storm surge frequency for both centennial reconstructions (G-20CR and G-E20C) for the 1930-2010 (2015) and 1950-2010 (2015) periods, after suspicious data identified from the change point analysis is removed. To quantify storm surge frequency at individual locations, the 95th percentile of the entire reconstructed surge time series is considered as a threshold. The number of annual storm surge events exceeding this threshold is derived at each tide gauge and the resulting time series are averaged per region and a linear trend is estimated for the regional average annual storm surge frequency.
Differences between trends in annual exceedances (after declustering, see "Methods") above the 95th percentile surges for observed surges and reconstructed surges (G-20CR and G-E20C) are computed for 133 tide gauges that have storm surge data available during the 1930-2010 (2015) period (detailed results not shown). Results show that the trends for the number of annual exceedances above the 95th percentile of the observed and reconstructed surges for the overlapping periods are not statistically different (at 5% significance level) for 73% and 81% at the tide gauges for G-20CR and G-E20C respectively. Figure 11 shows the trends for six regions, as the other two regions (Mediterranean and US west coast), do not show significant trends at the 5% significance level for either of the reconstructions. East Asia does not show significant trends in G-E20C, whereas no significant trends exist for the US Gulf and east coasts in G-20R (and hence panels are not shown in Fig. 12). Similar to the trend analysis for the surge magnitudes, the frequency trends are computed for two time periods

Discussion
Change point analysis. We apply a Bayesian change point analysis on G-20CR and G-E20C storm surge reconstructions as well as the predictors that were used to derive them. The goal is to identify and remove suspicious data, related to inconsistencies in the reanalysis products, from the surge reconstructions. Figure 12 shows the annual variability time series for 20CR predictors and the associated reconstruction G-20CR for five tide gauges in the Arctic, Antarctica, Australia, New Zealand, and the US northwest coast. In all cases, sharp decreases in the variability of the reanalysis predictors and reconstructed surges exist when going back in time. G-E20C (not shown here) also shows such a decrease in annual variability in several but not all of these tide gauges. Some of the tide gauges like Base Prat and Kerguelen show a very rapid decline in the variability leading to a change point year in the 2000s. Therefore, surge reconstructions for these tide gauges and others with similar suspicious behavior are not considered for trend analysis. They are marked as red triangles in Figs. 3 and 4. For most of the other tide gauges in these regions, our change point analysis shows that G-20CR should be considered only from the mid-twentieth century onward since change points are detected in the 1940s and 1950s (see Supplementary Figs. S3 and S4). This aligns with the findings from Brönniman et al. 32 who showed the strong downward trend in 20CR wind speeds in the Arctic, northeastern Canada, and the northern North Pacific before 1940. This is due to the scarcity of observations in these regions used in the data assimilation for the 20CR reanalysis. ERA-20C predictors and G-20C, on the other hand, do not show such rapid decline in variability (see Supplementary Figs. S3 and S4 for examples). A possible explanation for this might be the assimilation of surface marine wind observations into ERA-20C which is not the case for 20CR 18,26,27 .
The comparison presented in this section is not indicative of the superiority of one surge reconstruction (or reanalysis) over the other and should not be interpreted as such. For the majority of the tide gauges used in this study, the record lengths of the observed surges are too short to robustly compare trends in the annual variability with that of the reconstructed surges. However, the two centennial reconstructions, together with the other GSSR    37 found no significant trends in storm surge magnitude in the European Atlantic coast. For the same period, we also find insignificant trends in G-20CR and G-E20C (except for a few tide gauges in the northern UK).
Next, we showed that during the common period 1980-2010, where all GSSR reconstructions overlap and many more tide gauge provide (near-)complete data, spatial distribution of trends is similar across all datasets (including an ensemble mean of the GSSR reconstructions) in many regions. This is particularly pertinent to tide gauges in northern Europe and northeast coast of the US. There are, however, regions where trends differ (in magnitude and sometimes also in sign), particularly in estuaries and bays. For example, at tide gauges along the Chesapeake Bay, Columbia River, Salish Sea, Bay of Brest, and Loire Estuary, observed and GSSR trends (for the majority of reconstructions) have opposite signs. This could be due to the modulating effect of river discharge on water levels in bays and estuaries 38,39 not captured by GSSR.
We also show that GSSR centennial reconstructions exhibit statistically significant positive trends in storm surge frequency during the 1930-2010 (2015) and 1950-2010 (2015) periods. The tide gauges with the largest positive trends in surge magnitude (95th and 99th percentile) also often have the highest positive trends in the storm frequency (e.g., northwestern Europe and the Kattegat Bay). This aligns with previous studies that report an increase in the storm frequency for the high-latitude North Atlantic and northern Europe 26 , including the North Sea 25 . On the other hand, Krueger et al. 27 argued that the long-term positive trend of the storm index, estimated from the 20CR reanalysis in northern Europe and northeast Atlantic, is implausible as the same storm index for the upper percentiles of the observed geostrophic wind speeds does not show a similar long-term trend. However, the storm indices from the 20CR reanalysis and the observed geostrophic winds behave similarly in the second half of the twentieth century.
As noted above, a limiting factor in our analysis is the potential impact of reanalysis inconsistencies on the reconstructed surges that might introduce spurious long-term trends in some regions. The Bayesian change point detection method successfully identified suspicious changes in the variability of surges and predictors at tide gauges along the northwestern coast of the US, northern Australia and some high-latitude regions. These changes in the variability, if not accounted for, would lead to significant and implausible trends in high-percentile surge time series (such as the annual 95th and 99th percentiles used here). While the change point analysis identified instances where that was the case, the methodology might still miss small and subtle trends that can be attributed to inconsistencies arising from the atmospheric reanalyses.
The case of Astoria (Fig. 2) for instance, shows some of the challenges related to the change point analysis and comparison to in-situ observations. The year 1948 (47) is identified as a change point for G-20CR (G-E20C), based on the change point probabilities as well as the visually obvious shift in the four variables during the 1940s. This could be associated with the sparse amount of observations assimilated into the reanalysis products during and shortly after World War II 40 . On the other hand, the specific years (1947 and 1948) where the change points are detected, may also be associated with a shift from the warm to the cold phase of the Pacific Decadal Oscillation 41 . In the Pacific Northwest, the cold phase of the PDO is associated with cooler water temperatures www.nature.com/scientificreports/ and changes in streamflow patterns (due to the change in temperature differences between cold and warm PDO phases) 42 , both of which can influence water levels 43,44 . Moreover, the year 1948 was particularly stormy, leading to a large snowpack and the second largest flood on the Columbia River since records began 45 . Hence, the particular attribution of a change point to the year 1947-1948 may be related in part to natural variability. We note, however, that the shift to warm PDO phases in ~ 1925 and ~ 1977 are not picked up by the change point analysis. Hence, we conclude that inconsistencies in the reanalysis lead to a drop in the variability in the 1940s, with the exact year(s) possibly conflated by background atmospheric/oceanic variability. To conclude, our trend analysis on GSSR storm surge reconstructions demonstrates that in northwestern Europe, both G-20CR and G-E20C show consistent positive trends for the periods starting in 1930 and 1950 in magnitude as well as frequency, particularly in the southeastern German Bight. From the satellite-era trend analysis (1980 to 2010) we find consistently negative trends for major parts of northern Europe, which also agree with the observational data. Results are also relatively consistent across reanalysis products in the northeastern US, whereas in other regions we find pronounced differences in trends when comparing between reanalysis products, for example southern Australia and in bays and estuaries. The ensemble mean often resembles the spatial patterns of trends in the observed data best and is hence the preferred approach, but it limits the analysis to a relatively short period where all reanalysis products provide data. Trends derived from centennial reanalysis products over longer time periods provide useful insights as to where changes in storm surges may have taken place in the past (and could continue in the future), while at the same time being cautious in the interpretation due to the inhomogeneities in reanalysis products. In addition to performing trend analyses presented here, the underlying GSSR data is also useful for other applications, such as studying intra-annual to multi-decadal variability. In the future we plan to apply bias correction to the GSSR reconstructions and use those for extreme value analysis and to study spatial storm surge footprints 46 , among others.

Methods
Data. We use daily maximum surge reconstructions obtained from the Global Storm Surge Reconstructions database (GSSR, http:// gssr. info) developed in Tadesse and Wahl 31 . GSSR comprises two centennial and three satellite-era storm surge reconstructions, all of which have been obtained with data-driven models from Tadesse et al. 47 using wind speed and mean sea-level pressure forcing from five different atmospheric reanalysis products. GSSR reconstructions are available for 882 globally distributed tide gauges, and they have been validated against in-situ daily maximum surge observations from tide gauges 47 . Observed storm surges are extracted from sea-level measurements from the GESLA-2 database 1 as the difference between the measured water level and the tidal prediction, after removing the annual mean sea level. Hourly water level records for Astoria (1855-1876) shown on Fig. 2d were obtained from Talke et al. 44 . We only select GSSR reconstructions corresponding to tide gauges that show correlations with observed daily maximum surges of 0.7 or greater. This results in 310 and 320 tide gauges with G-20CR and G-E20C reconstructions, respectively.
Change point analysis. Reanalysis products are sensitive to the assimilated meteorological and/or oceanic observations (changing over time), which may result in spurious trends in key outputs variables 27,48,49 . Furthermore, due to sparsity in assimilated observations, atmospheric events, particularly small-scale events (hurricanes, atmospheric rivers), may be poorly represented, which may result in an underestimation of modelled variables such as peak wind speeds or minimum pressure 50,51 . Systematic underestimations would therefore become visible in the variability of output variables from the atmospheric reanalysis products and therefore also translate into underestimated variability in the GSSR reconstructions (see Fig. 1). We therefore hypothesize that time-periods with a persistent decrease in the variance (or standard deviation) of surges in GSSR and/or its forcing variables likely indicate systematic model drifts rather than real trends.
In order to identify suspicious data in GSSR we apply a Bayesian change point analysis to annual standard deviation time series of GSSR surges and the atmospheric forcing datasets from 20CR and ERA-20C. The Bayesian change point analysis is carried out using the R package "bcp"  52 . It is based on the product partition model 53,54 that separates a time series into several partitions based on different parameters (for instance, the mean and variability of the time series). The product partition model considers the number of change points and their positions as random variables and assumes that there exists an unknown partition ρ of the set {1, 2,…, n} that divides the time series into b contiguous blocks (random variable ranging from 1 to n, where n is the length of the time series). We used 500 MCMC iterations for our analysis. At the end of each iteration, the posterior distribution for the random partition, the number of change points, and change point probability of a given year are updated. For each year, we average change point probabilities corresponding to the four variables (the reconstructed surge and the three predictors). This is done to find the years in the time series where all (or most) of the variables show unusual changes in the variability. Sometimes, one or more variables show a deviation from the "typical" values, but this could be an artifact and may not be reflected in other variables. From the estimated average change point probabilities, we identify years in the time series where change point probabilities are equal or greater than a set of cutoff probabilities. In our analysis, cutoff probabilities of 15%, 20%, 25%, and 30%, are used to find change point years. Usually there are multiple years in the annual variability time series where a given cutoff probability is exceeded. In that case, we select the most recent year as the change point for the given cutoff probability. Only surge data from change point years onward are considered to quantify the trends in magnitude and frequency of daily maximum surges.
As mentioned in the introduction, using the RMSE time series between observed and reconstructed surges would be the preferred approach to identify spurious trends in reconstructions. Although this is not feasible  Fig. S6) with long records for which the annual RMSE between daily maximum observed and reconstructed surges time series was used to implement change point analysis (in addition to the annual standard deviation time series). There are noticeable differences in change point analysis results when using annual RMSE vs annual standard deviation time series of the reconstruction alone ( Supplementary Fig. S6). This could be due to unrealistic surge values in observations (e.g., due to time shifts in the tidal analysis or other data issues) or in the GSSR reconstructions that can lead to very high RMSE values for individual years which in turn would be wrongly flagged as change points. For instance, in Brest (France) there is a change point identified in 1975 with high probability (97%) when the annual RMSE time series is used. However, there is no persistent deviation of the annual RMSE time series before or after this period. The change point analysis does not detect any change point for the same period when the annual standard deviation time series is used. Similar issues are found in Seattle ( Supplementary Fig. S6d) when using the annual RMSE time series for change point detection (particularly 1960 onward).
As an alternative to the annual standard deviation time series, we tested using the annual interquartile range. This is a measure of variability that is more suitable for skewed distributions and is robust against outliers. The interquartile range is computed by taking the difference of the 75th percentile and 25th percentile values of the system variable (daily maximum observed and reconstructed surges) for a given year. Using the annual interquartile range time series for change point analysis showed very similar results to that of annual standard deviation. Hence, in this study we focus on the annual standard deviation time series of predictors and surge reconstructions to detect change points.
In addition to identifying the change point years corresponding to the different cutoff probabilities, a visual inspection of the individual annual variability time series is carried out. This is done to avoid instances where extreme events (such as surges caused by hurricanes) not adequately represented in either the observations or the reconstructions are identified as change points, or when subtle but consistent changes in the variability time series are not picked up by the change point algorithm (i.e., change point probabilities are below the cutoff values we considered). Furthermore, we assess if similar shifts occur in the different predictors and the surge reconstruction. In other words, if a change point year indicates only a change in one variable but no significant change is reflected in other variables, this change point year is disregarded. Hence, while the automated change point analysis provides initial indication of when change points occurred, the results are manually corrected in some instances for the various reasons outlined here.
Trend analysis. First, trends in extreme storm surges are calculated for the two centennial GSSR reconstructions, after suspicious data was removed, and we focus on the periods 1930 to 2010 (2015) and 1950 to 2010 (2015). Trends are computed by fitting a linear regression model to the annual 95th and 99th percentile surges from G-20CR, G-E20C, and observations where available. The standard errors of the linear regression coefficients representing the trends are adjusted for heteroscedasticity and autocorrelation using the Newey-West estimator 55 . Before fitting trends to extreme surges from the reconstructions, we compare the trends in extreme surges from observations to trends in extreme surges from G-20CR and G-E20C. We limit our analysis to tide gauges with > 30 years of data and > 75% completeness between the years 1930 and 2010. Trends are computed using the common period between observations and reconstructions at each tide gauge. We check if the trends from observations are significantly different from the reconstruction trends at the 5% significance level. Our null hypothesis is that there is no significant difference between the trends obtained from observations and reconstructions for their period of overlap. For the annual 95th and 99th surge time series, a categorical variable is added to differentiate the time series as observation, G-20CR, or G-E20C. An interaction term (product of the categorical variable and the years) is then added as an additional predictor to fit linear trends to the annual 95th and 99th surges. We calculate the p-values for the coefficient of the interaction term and determine its significance at the 5% significance level. If the p-value for the coefficient of the interaction term is higher than 0.05 then the null hypothesis cannot be rejected. In other words, there is no significant difference between the trends in observed surges and reconstructed surges.
Following this test, we estimate trends in G-20CR and G-E20C (above 95th and 99th percentiles) for the 1950-2010 (G-E20C)/1950-2015 (G-20CR) and 1930-2010 (G-E20C)/1930-2015 (G-20CR) periods. As start years for the reconstructions vary among tide gauges due to the change point analysis, we constrain G-20CR and G-E20C strictly to the chosen time period before computing trends. For instance, when computing trends for the 1950-2015 period, we select only tide gauges that have data covering the entirety of this period.
To investigate the sensitivity of trends to start dates and periods covered and to compare and contrast trends from observation, G-20CR, and G-E20C (where long enough observations exist), a trend sensitivity analysis is carried out (Fig. 8 and Supplementary Fig. S5). A window of 30 years is selected as the starting window length where trends are computed, and the window is shifted one year each time step. Trends are then computed for each (moving) window length (by increasing the window length by one year until record length is reached). Availability of 75% of the data is required for each window. For windows where this is not met, trends are not computed (see for example Supplementary Fig. S5b). When gaps exist in observations they are also introduced to G-20CR and G-E20C, so that the trend comparison considers exactly the same period. We also compare observed trends with trends from all five GSSR reconstructions, including an ensemble mean (G-20CR, G-E20C, G-EInt, G-Merra, and G-E5, and G-EnsMean) for the 1980-2010 period where all datasets overlap (Figs. 9, 10) and (near-)complete observations are available for many tide gauges. Results are aggregated for five regions (Europe, US east Coast, US west Coast, Japan, and Australia) (Fig. 11).
Finally, we compute trends in annual storm surge frequency for G-20CR and G- www.nature.com/scientificreports/ window to decluster daily maximum surges that are above the 95th percentile threshold. We group tide gauges into eight different regions across the globe and derive the regional time series of annual number of extreme surges (> 95th percentile) by averaging them over the tide gauges within each region before fitting a linear regression model and adjusting for heteroscedasticity and autocorrelation. We also compare frequency trends from reconstructions and observations and test whether they are significantly (5% level) different from each other using the same approach as outlined above for comparing observed and reconstructed trend magnitudes.

Data availability
The Global Storm Surge Reconstructions (GSSR) 31 (http:// gssr. info) is a publicly available database that contains five daily maximum storm surge reconstruction datasets derived by forcing five climate reanalyses into a datadriven storm surge model 47 . The surge reconstructions obtained after incorporating the change point analysis as well as all change point analysis plots can be accessed at (http:// gssr. info/ chang epoint). Trend analysis results for all tide gauges are available at (http:// gssr. info/ trends).