Introduction

Extreme wave conditions cause large disruption in coastal areas1 and their impacts are likely to increase due to climate change2,3, including sea level rise4, while also being enhanced by expanding coastal populations5,6. Therefore, characterising the nearshore wave climate is fundamental for present and future coastal management, as extreme wave events can potentially affect a wide range of socio-economic activities. Storms or extreme wave events enhance sediment transport7 and modify coastal geomorphology8,9, cause erosion and flooding10 and must be considered when designing and implementing coastal structures or marine energy devices11,12. Knowledge of the extreme wave climate is also necessary to characterize coastal hazards, quantify the risks associated with different return periods, and propose management actions as a function of the expected consequences13. If the extreme wave parameters used in such analyses are biased (i.e. underestimated or overestimated), the error is propagated to the estimation of the related coastal hazards and risks. This can lead to incorrect coastal risk assessments, affecting and compromising the planning, the mitigation efforts, and ultimately the adoption of risk reduction measures.

At a global level, reliable in-situ measurements of waves in coastal areas (e.g. buoys) are still scarce, comprise records with relatively short duration (often only a few years or decades), have limited geographical distribution14 and are subject to errors15,16 that require appropriate quality control. Satellite altimetry data are also irregular in time and space, scarcely available in coastal regions and high latitudes, and require complex post-processing techniques to retrieve accurate information in coastal areas17, which limits their use for a comprehensive characterisation of extreme wave conditions. Wave hindcast models are a powerful alternative for wave climate analysis: they compensate for the lack of data by providing consistent global datasets of past wave conditions, including coastal areas, and by augmenting wave observations temporally and spatially.

Several global and regional wave reanalyses, derived from the assimilation of atmospheric and oceanographic data from models, satellites and in-situ observations, have been developed and improved in the last decades, and their reliability and use for coastal hazard analysis increased substantially18,19,20,21. Until recently, the most widely used wave reanalyses on a global scale were the 1.5° spatial resolution ERA4022 and 1° ERA-Interim23 produced by the European Centre for Medium-Range Weather Forecasts (ECMWF), the CFSR/NCAR (0.5°/0.07°) produced by NCEP/NOAA24 and GOW225 (0.5°/0.25°) from IH Cantabria. In the past few years, new global wave reanalyses have been produced, including the 0.5° ERA526 by ECMWF and 0.2° WAVERYS27 by the Copernicus Marine Environment Monitoring Service (CMEMS). These last generation and up-to-date wave reanalyses are becoming the most commonly used reanalyses globally, and several studies have demonstrated the accuracy and improvements they have achieved28,29,30,31,32. Compared to wave buoys, reanalyses have the advantage of being a physically consistent reconstruction of past wave climate, with no gaps in space and time. However, they have coarse resolutions33 that cannot fully resolve the complexity of coastlines, accurately incorporate regional coastal winds, simulate depth-limited wave propagation processes across shallow parts of the continental shelf, or entirely incorporate local processes such as bottom friction, wave breaking and sub-grid island shadowing34,35. Despite being quite reliable in the open ocean, wave reanalyses can still exhibit systematic errors and thus consistently over or under predict wave parameters in coastal areas with respect to in-situ observations36,37,38,39. Consequently, modelled outputs from wave reanalysis might not be directly applicable to coastal areas without previous validation and calibration40,41,42.

To address this limitation, several bias correction methodologies36,43,44,45 and improved calibration methods12,46,47,48,49,50 have been developed to produce wave time series that better fit observations. Some authors focused on finding linear relationships between buoy and reanalysis data for the n-year return period12,47,51, others proposed spatial calibration that considers multiple observations49, while more recently a calibration that depends on mean wave direction was proposed48,50. At regional and local scales, downscaling has also been used to better resolve the physical processes and improve reanalysis accuracy52,53. However, this method does not correct systematic errors and instead propagates the errors of the global reanalyses54, which are normally used as boundary conditions to force the downscaled local wave propagation models. Despite these limitations, wave data from global reanalyses have been routinely and increasingly applied in coastal studies without validation and/or calibration20,55,56,57,58,59,60,61,62 because they are easy to use and because measured data are often not available. The systematic errors of global reanalyses are often amplified for extreme values39,63, resulting in greater under- or overprediction38,64 and potentially causing large errors and uncertainties in the determination of the return periods of extreme events12,65. This can lead to incorrectly estimated consequences, misrepresented coastal risks and, ultimately, misinformed coastal management decisions.

Aiming to improve the characterisation of extreme wave conditions and associated return periods for coastal areas based on validated and calibrated global wave reanalysis data, this work evaluates ERA5 and WAVERYS reanalysis against a database of 326 coastal buoys. Results indicate a globally consistent underestimation of extreme wave conditions and associated return periods, which can be considerably reduced using a range of calibration approaches based on in-situ observations. To overcome data limitations in many coastal areas worldwide, a global calibration is proposed and shown to improve accuracy and reduce uncertainty in estimates of extreme wave conditions in coastal areas. The findings of this work provide an improved understanding of the limitations of global wave reanalyses and enhance their applicability around the world’s coasts.

Results

Overview

The evaluation of the ERA5 and WAVERYS reanalysis time series for significant wave height (Hs), mean wave period (Tm), peak wave period (Tp) and mean wave direction (Dirm) was performed against data from 326 globally distributed coastal buoys using standard error metrics (see Materials and Methods). Independent calibration of the wave reanalysis data for each individual buoy location was performed using four transfer functions, improving estimates of wave parameters in areas where in-situ data are available. Considering all the locations where a negative relative bias in extreme wave conditions was identified in the reanalyses as a single time series, the same transfer functions were applied to determine and validate a global calibration for extreme wave conditions (see Materials and Methods).

Evaluation of global wave reanalysis

When compared against the 326 coastal buoys, both reanalyses show a very strong correlation for Hs, with a mean Pearson correlation coefficient (R, Eq. (5)) of 0.91 for WAVERYS and 0.90 for ERA5 (all R values reported are statistically significant for p-value <0.01). Overall, WAVERYS performed better than ERA5, presenting higher or identical R coefficients for Hs, Tm, Tp and Dirm (Supplementary Fig. 1a, b). On average, the lower scatter index (SI, Eq. (6)) for Hs, 22% for WAVERYS and 25% for ERA5, and smaller root mean square error (RMSE, Eq. (7)), 0.33 m and 0.37 m respectively (Supplementary Figs. 2a, b and 3a, b), confirm the improved performance of WAVERYS, reinforcing the findings of Law-Chune et al.27. Although the reanalyses display both negative and positive relative biases (Eq. (8)) for Hs (Supplementary Fig. 4a, b), there is a consistent underestimation of Hs in the upper quantiles, above the 95th percentile for over 95% of the buoy locations (Fig. 1a, g for Belmullet in Ireland and Supplementary Fig. 5a, b) and above the 99th percentile for 98% of the 326 locations considered (Supplementary Fig. 6a, b). Positive biases were found around islands (e.g., in the Azores, Canary Islands, Caribbean Islands and Taiwan) and channels (e.g., Strait of Gibraltar and English Channel), which evidences the limitations of global reanalyses to resolve the sheltering effects of small islands and complex coastal configurations. The relative bias computed using only data from the higher percentiles (>95th percentile) is negative for 94% (78%) of the buoy locations in WAVERYS (ERA5), with results indicating an average underprediction of −16% (−9%) in the two reanalyses (Fig. 2a, b). Given the underestimation of the extreme wave heights, the resulting Hs 50-year return periods are underestimated on average by 1.11 m for WAVERYS and 0.80 m for ERA5 (Supplementary Fig. 7a, b). 
Return periods were only determined for locations with more than 12 years of observations, which corresponds to 30% of the 326 buoys used for calculation of the error metrics. Underestimation is highest in enclosed basins or sheltered locations: specifically the Sea of Japan with deviations of 3.08 m for the Hs 50-year return period determined using WAVERYS; and coastal areas in South Korea and the Balearic Sea with 2.95 m deviation for ERA5 (Supplementary Fig. 7a, b). Overall, the Hs value for the 50-year return period is underestimated in 89% (80%) of the locations considered for extreme value analysis in WAVERYS (ERA5).
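
The error metrics used above can be illustrated with a minimal sketch. Eqs. (5) to (8) are not reproduced in this section, so the definitions below are commonly used forms and are assumptions: SI is taken as RMSE divided by the mean observation, relative bias as the summed model-minus-buoy difference divided by the summed buoy values (negative meaning underestimation), and the upper tail is selected from the buoy percentiles. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def error_metrics(obs, mod):
    """R, SI, RMSE and relative bias for paired buoy (obs) and reanalysis (mod) values.

    Assumed definitions (the paper's Eqs. (5)-(8) may differ in detail).
    """
    obs = np.asarray(obs, dtype=float)
    mod = np.asarray(mod, dtype=float)
    r = np.corrcoef(obs, mod)[0, 1]                # Pearson correlation
    rmse = np.sqrt(np.mean((mod - obs) ** 2))      # root mean square error
    si = rmse / np.mean(obs)                       # scatter index (assumed form)
    rel_bias = np.sum(mod - obs) / np.sum(obs)     # < 0 means underestimation
    return {"R": r, "SI": si, "RMSE": rmse, "rel_bias": rel_bias}

def extreme_relative_bias(obs, mod, q=95):
    """Relative bias using only samples where the buoy value exceeds its q-th percentile."""
    obs = np.asarray(obs, dtype=float)
    mod = np.asarray(mod, dtype=float)
    mask = obs > np.percentile(obs, q)
    return np.sum(mod[mask] - obs[mask]) / np.sum(obs[mask])

# Synthetic illustration only: a "reanalysis" that underestimates Hs by about 10%.
rng = np.random.default_rng(0)
hs_buoy = rng.weibull(1.5, 5000) * 2.0
hs_model = 0.9 * hs_buoy + rng.normal(0.0, 0.1, hs_buoy.size)

metrics = error_metrics(hs_buoy, hs_model)
tail_bias = extreme_relative_bias(hs_buoy, hs_model)
```

In this synthetic case both the overall and the upper-tail relative bias come out close to −0.10, by construction of the data rather than as a property of either reanalysis.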

Fig. 1: Calibration of wave reanalysis parameters for Belmullet.

Example of original (a, c, e, g, i, k) wave parameters (Hs, Tm and Tp) and calibrated for individual buoy locations (b, d, f, h, j, l). Results shown for the Belmullet buoy location, in western Ireland, for WAVERYS (a–f) and ERA5 (g–l). Results for mean wave direction (in °) are shown for original (light red) and calibrated for individual buoy locations (light blue) for WAVERYS (m) and ERA5 (n) and compared to the buoy data (grey). The black dots indicate the percentiles (from 1st to 99th).

Fig. 2: Relative bias for extreme Hs values.

Global distribution of bias (in %) for extreme (above the 95th percentile) values of Hs between wave buoys and original reanalysis data (a, b), individual calibration (c, d) and percentage improvement obtained with the calibration (e, f) for WAVERYS (a, c, e) and ERA5 (b, d, f). Red (blue) values represent an underestimation (overestimation) from the reanalysis (in a–d).

Regarding Tm and Tp, results indicate strong correlation for both reanalyses, with an average R of 0.80 and 0.61 for WAVERYS and 0.79 and 0.59 for ERA5, respectively (Supplementary Fig. 1c–f). Underestimation was generally observed for Tm, with higher values identified in ERA5 around islands and channels (Supplementary Fig. 4c, d). The average relative bias for Tm is −9% (−13%) for WAVERYS (ERA5), increasing to −12% (−18%) when calculated for the data above the 95th percentile (Supplementary Fig. 8a, b). The underestimation of Tp in both reanalyses is observed mostly in the higher percentiles (Supplementary Fig. 8c, d). In fact, while the average relative bias for Tp is 1% (−1%) in WAVERYS (ERA5), it increases to −15% (−20%) when considering only the values above the 95th percentile, with both reanalyses showing underestimation for 95% of the buoy locations (Supplementary Fig. 8c, d). This is reflected in the mean difference between buoy and reanalysis data for the extreme Tm and Tp values (Supplementary Figs. 5c–f and 6c–f). A total of 183 buoys incorporated in this analysis included directional information, allowing a comparison of Dirm between buoy and reanalysis data. From the Dirm histograms, the dominant wave regimes are well reproduced by both reanalyses, although with some deviation in directional peaks (Fig. 1m, n). The average absolute difference between the circular mean direction of the buoys and the reanalyses is 17° (20°) for WAVERYS (ERA5), with the largest differences identified in the Azores islands, Strait of Gibraltar and areas of the Baltic sea (Supplementary Fig. 9a, b). Overall, both reanalyses evidence a strong correlation of Dirm with buoy data, with an average R of 0.64 (0.62) for WAVERYS (ERA5) (Supplementary Fig. 1g, h). 
The highest R values (>0.90) were obtained along the West coast of the USA and the North coast of the Netherlands, while the lowest were in sheltered locations in the Caribbean, Azores and Strait of Gibraltar (R < 0.2 for both reanalyses). The average RMSE of Dirm was 40° (44°) for WAVERYS (ERA5), with the lowest performance in the same sheltered locations, where RMSE exceeds 80° (Supplementary Fig. 2g, h). The relative bias for Dirm is positive for 63% (61%) of buoy locations for WAVERYS (ERA5), indicating that reanalysis waves are dominantly biased in a counter-clockwise direction, with an average circular bias of less than 2° in both reanalyses (Supplementary Fig. 4g, f). The deviations in Dirm between buoy and reanalysis data are consistent in both reanalyses for 85% of the cases. These results highlight a similar performance between both reanalyses in terms of Dirm with differences in buoy and reanalysis likely due to the complex interaction between wind-sea and swell components39 and coastal configuration.
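
Because wave direction is a circular quantity, averages and differences cannot be computed arithmetically (350° and 10° average to 0°, not 180°). The sketch below shows one standard way to compute a circular mean and a wrapped angular difference; it is an illustration under the assumption that directions are given in degrees, and the function names are hypothetical rather than taken from the paper.

```python
import numpy as np

def circular_mean_deg(directions_deg):
    """Circular mean of directions in degrees, returned in [0, 360)."""
    rad = np.deg2rad(np.asarray(directions_deg, dtype=float))
    # Average the unit vectors, then convert the resultant back to an angle.
    mean_angle = np.arctan2(np.mean(np.sin(rad)), np.mean(np.cos(rad)))
    return np.rad2deg(mean_angle) % 360.0

def angular_difference_deg(a_deg, b_deg):
    """Signed smallest difference a - b in degrees, wrapped to [-180, 180)."""
    diff = np.asarray(a_deg, dtype=float) - np.asarray(b_deg, dtype=float)
    return (diff + 180.0) % 360.0 - 180.0

# Hypothetical example: absolute difference between buoy and reanalysis mean directions.
buoy_dir = [350.0, 355.0, 5.0, 10.0]
model_dir = [10.0, 15.0, 25.0, 30.0]
mean_offset = abs(angular_difference_deg(circular_mean_deg(model_dir),
                                         circular_mean_deg(buoy_dir)))
```

Here every model direction is rotated by 20° relative to the buoy, so the absolute difference between the two circular means is 20°, even though the buoy directions straddle north.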

Calibration of reanalyses for individual buoy locations

Depending on the method applied to calibrate the reanalysis data, different improvements in R and relative bias are achieved. When the first-degree polynomial (Eqs. (9) and (10)) is used, R is unmodified, and the bias is corrected. When rotation around the mean is applied instead (Eq. (12)), the time series rotates and the bias remains the same, but R is improved. When a power function is used (Eq. (11)), R remains identical while the bias is improved. Additionally, in a limited number of cases (1–15% depending on the model/variable considered), the original reanalysis data were already accurate, and no calibration was applied. In 71% of the cases, either a power function (Eq. (11)) or rotation (Eq. (12)) was applied to correct Hs in the reanalyses, with R improving by 34% (26%) for WAVERYS (ERA5) (Supplementary Fig. 10a, b). For the locations that required bias correction (i.e., the reanalyses were corrected with a first-degree polynomial (Eqs. (9) and (10)) or a power function (Eq. (11))), changes in the relative bias resulted in an average improvement of 99% (97%) for WAVERYS (ERA5). SI and RMSE improved with all the calibration methods applied, with SI reduced on average by 18% for both reanalyses, while the RMSE for Hs improved by 18% (20%) for WAVERYS (ERA5) (Supplementary Fig. 10a, b). The reanalysis time series for Tm were calibrated with a first-degree polynomial (Eq. (10)) in 49% of the cases. The calibration of Tm resulted in a relative bias improvement of 86% (69%) for WAVERYS (ERA5). Calibration of Tp was performed using the rotation around the mean (Eq. (12)) for 59% of the cases, with 15% not requiring calibration. Overall, Tp calibration contributed to an improvement of 36% (24%) in relative bias for WAVERYS (ERA5). It is important to note that in some cases the bias value in Tp is related to the presence of secondary swells with longer wave period, which are not correctly captured in either reanalysis.
In terms of Dirm, the difference in the circular mean computed between the buoys and the reanalyses was reduced from an average of 17° to 10° for WAVERYS and from 20° to 13° for ERA5. The relative bias decreased by 21% (16%) on average for WAVERYS (ERA5), but no significant improvements were obtained for R, SI and RMSE. While no calibration of the directions was deemed necessary for 50% of the buoy locations, results obtained for the calibrated sites show considerable improvements (e.g., Belmullet, Fig. 1m, n).
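
As an illustration of how transfer functions of this kind can be fitted, the sketch below estimates a first-degree polynomial (with and without intercept) and a power function from paired reanalysis and buoy values. Eqs. (9) to (12) are not reproduced in this section, so the functional forms, the least-squares fitting and the function names are assumptions; the rotation around the mean is omitted because its exact definition is not given here.

```python
import numpy as np

def fit_linear_origin(mod, obs):
    """Least-squares slope a for obs ~ a * mod (no intercept)."""
    mod = np.asarray(mod, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return np.sum(mod * obs) / np.sum(mod ** 2)

def fit_linear(mod, obs):
    """Least-squares slope a and intercept b for obs ~ a * mod + b."""
    a, b = np.polyfit(np.asarray(mod, dtype=float), np.asarray(obs, dtype=float), 1)
    return a, b

def fit_power(mod, obs):
    """Coefficients (a, b) for obs ~ a * mod**b.

    Fitted by linear regression in log space, which requires strictly positive
    values and minimises log-residuals rather than absolute residuals.
    """
    b, log_a = np.polyfit(np.log(np.asarray(mod, dtype=float)),
                          np.log(np.asarray(obs, dtype=float)), 1)
    return np.exp(log_a), b

# Hypothetical noise-free data, used only to show that the fits recover known coefficients.
hs_mod = np.linspace(0.5, 5.0, 50)
slope = fit_linear_origin(hs_mod, 1.1 * hs_mod)
a_lin, b_lin = fit_linear(hs_mod, 0.9 * hs_mod + 1.0)
a_pow, b_pow = fit_power(hs_mod, 1.2 * hs_mod ** 1.1)
```

A calibrated series is then obtained by evaluating the fitted function on the reanalysis values, for example `slope * hs_mod`. Whether the fit is made on time-paired samples or on matched quantiles is a design choice that this sketch does not settle.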

For extreme wave conditions, i.e. Hs above the 95th percentile, the calibration of Hs reduced the average underestimation by 0.30 m (0.15 m) for WAVERYS (ERA5) (Supplementary Fig. 10b). When considering Hs above the 99th percentile, the calibration improves results further, with a reduction in the average underestimation of 0.45 m (0.22 m) for WAVERYS (ERA5) (Supplementary Fig. 10). The relative bias for Hs values above the 95th percentile (Fig. 2c, d) was reduced to −7% (−8%), which corresponds to an average improvement of 56% (36%) for WAVERYS (ERA5) (Fig. 2e, f and Supplementary Fig. 11a, b). The calibration for individual buoy locations allowed for relevant improvements in the Hs return period values, with differences in the 50-year return period estimated with buoy observations and with reanalysis data reduced by ~0.68 m (0.37 m) for WAVERYS (ERA5), corresponding to a 61% (46%) average improvement (Fig. 3b and Fig. 4a, b). This included areas like the southeast coast of Spain, where the underestimation of the 50-year return period in WAVERYS was corrected by 2.5 m with the calibration for individual buoy locations. The underestimation of the relative bias for the extreme (above the 95th percentile) Tm (Tp) was reduced to −11 % (−12%) for both reanalyses, while the average underestimation of the extreme Tm (Tp) values was reduced to 0.8 s (1.5 s).

Fig. 3: Effect of individual and global calibration on Hs 50-year return periods.

a Return period values for original WAVERYS reanalysis (red) and buoy data (black), and with the individual (light blue) and global (dark blue) calibration for the location on the coast of South Carolina, USA. b Box plot of differences in Hs 50-year return period between buoy data (only buoys with negative relative bias for extreme Hs) and reanalysis data, estimated with the original reanalysis data, individual calibration, and global calibration for WAVERYS and ERA5. The bottom and top of each box plot are the 25th and 75th percentile, the line in the middle the median, the whiskers correspond to 99.3% of the data and the outliers are reported as circles.

Fig. 4: Relative improvement of the Hs 50-year return period.

Global distribution of the improvement (in %) for the 50-year return period Hs calculated for the wave buoys and individual calibration (a, b) and global calibration (c, d) for WAVERYS (a, c) and ERA5 (b, d).

Global reanalysis calibration

Based on the common underestimation of Hs and Tm in both reanalyses for most coastal areas, a global calibration for each reanalysis dataset is proposed. The equations for the global calibration of Hs and Tm were determined by identifying the best-fit parameters considering the entire dataset as a single time series, but excluding the data from buoys located in areas where extreme wave heights are overestimated by the reanalyses. These are areas where the extreme relative bias is positive (Fig. 2a, b), and can be found in sheltered coasts and areas with complex coastal configurations, which represent 6% (22%) of buoy locations for WAVERYS (ERA5). For both reanalyses, a first-degree polynomial without intercept performed best for the global calibration of Hs, indicated by Eq. (1) for WAVERYS and Eq. (2) for ERA5, while a first-degree polynomial with intercept provided the best correction for Tm, shown in Eq. (3) for WAVERYS and Eq. (4) for ERA5. No calibration method provided a significant improvement for Tp in either reanalysis dataset.

Based on the data from a group of randomly selected buoys (30% of the dataset) excluded from the global reanalysis calibration, a validation of the global calibration equations using the error metrics (detailed in Materials and Methods) was performed (Table 1). The mean difference between buoy observations and the globally calibrated reanalysis data for Hs values above the 95th percentile improved on average by 0.14 m (0.10 m) for WAVERYS (ERA5), while the relative bias for Hs above the 95th percentile was reduced to −12% (−11%) (Table 1). The averaged relative bias for extreme Tm was reduced by 4% (12%) for WAVERYS (ERA5), indicating that the underestimation was partially corrected. After applying the global calibration to all the buoy locations where a negative relative bias for extreme wave conditions was identified, the 50-year return period Hs values were corrected on average by 0.30 m (0.16 m) for WAVERYS (ERA5) (Fig. 3b) and improvements in the overall return period estimates were observed in 90% (70%) of the locations for WAVERYS (ERA5) (Fig. 4c, d).
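
The hold-out validation described above can be sketched as follows: buoys (not individual samples) are split at random into a calibration group and a 30% validation group, and a single slope is fitted to the pooled calibration data treated as one series. This is a minimal illustration; the random seed, the data layout (a dictionary mapping a buoy identifier to paired reanalysis and buoy arrays) and the function names are hypothetical.

```python
import numpy as np

def split_buoys(buoy_ids, holdout_fraction=0.3, seed=0):
    """Randomly split buoy identifiers into (calibration, validation) lists."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.array(sorted(buoy_ids)))
    n_holdout = int(round(holdout_fraction * shuffled.size))
    return shuffled[n_holdout:].tolist(), shuffled[:n_holdout].tolist()

def pooled_slope(pairs_by_buoy, buoy_ids):
    """Least-squares slope for obs ~ a * mod, pooling all selected buoys as one series."""
    mod = np.concatenate([np.asarray(pairs_by_buoy[b][0], dtype=float) for b in buoy_ids])
    obs = np.concatenate([np.asarray(pairs_by_buoy[b][1], dtype=float) for b in buoy_ids])
    return np.sum(mod * obs) / np.sum(mod ** 2)

# Hypothetical data: ten buoys, each with reanalysis values underestimating the buoy by 8%.
ids = ["buoy%02d" % i for i in range(10)]
data = {b: (np.array([1.0, 2.0, 3.0]), 1.08 * np.array([1.0, 2.0, 3.0])) for b in ids}

calibration_ids, validation_ids = split_buoys(ids)
slope = pooled_slope(data, calibration_ids)
```

Splitting by buoy rather than by sample keeps the validation sites fully independent of the fitted coefficients, which matches the description of buoys "excluded from the global reanalysis calibration".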

$$H_{s,\mathrm{CWYS}} = 1.077 \, H_{s,\mathrm{WYS}}$$
(1)
$$H_{s,\mathrm{CERA5}} = 1.045 \, H_{s,\mathrm{ERA5}}$$
(2)
$$T_{m,\mathrm{CWYS}} = 0.870 \, T_{m,\mathrm{WYS}} + 1.124$$
(3)
$$T_{m,\mathrm{CERA5}} = 0.928 \, T_{m,\mathrm{ERA5}} + 1.156$$
(4)
Table 1 Error metrics for original and calibrated WAVERYS and ERA5 global wave reanalyses.
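
Applying Eqs. (1) to (4) to a reanalysis time series is a one-line operation per parameter. The sketch below encodes the published coefficients in a lookup table; the table and function names are hypothetical, and Hs is assumed to be in metres and Tm in seconds.

```python
import numpy as np

# (slope, intercept) pairs taken from Eqs. (1)-(4).
GLOBAL_COEFFS = {
    ("WAVERYS", "Hs"): (1.077, 0.0),    # Eq. (1)
    ("ERA5", "Hs"): (1.045, 0.0),       # Eq. (2)
    ("WAVERYS", "Tm"): (0.870, 1.124),  # Eq. (3)
    ("ERA5", "Tm"): (0.928, 1.156),     # Eq. (4)
}

def apply_global_calibration(values, reanalysis, parameter):
    """Return slope * values + intercept for the chosen reanalysis and parameter."""
    slope, intercept = GLOBAL_COEFFS[(reanalysis, parameter)]
    return slope * np.asarray(values, dtype=float) + intercept

hs_calibrated = apply_global_calibration([2.0, 6.0], "WAVERYS", "Hs")
tm_calibrated = apply_global_calibration([8.0], "ERA5", "Tm")
```

From the coefficients alone, Eqs. (3) and (4) raise Tm only below a crossover period of about 8.6 s for WAVERYS (1.124/0.130) and about 16.1 s for ERA5 (1.156/0.072); above those values the corrected Tm is lower than the original.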

Discussion

In the past decade, the performance of global reanalyses to estimate (hindcast) past wave conditions has improved significantly, and reanalysis data are now widely applied in different fields. They are highly reliable in the open ocean27,29,66, while their inaccuracies have been increasingly reported in coastal areas35,40,42. Here, the evaluation of global reanalyses against data from 326 buoys distributed around the world’s coasts showed that the higher-resolution WAVERYS reanalysis performed better than the more widely used ERA5 reanalysis. These results are consistent across most error metrics (R, RMSE, and SI) and for the most used wave parameters (Hs, Tm, Tp and Dirm) in coastal areas. Law-Chune et al.27 also compared these reanalyses with buoy observations from the Copernicus Marine Service observations database, and although a different set of coastal buoys was considered (up to 200 m water depth), WAVERYS performed better than ERA5, regardless of the coastal setting and geographical location of the buoys27. Spatially, lower SI values (Supplementary Fig. 3a, b) are found on the north-eastern Pacific coast and larger SI, up to 30%, is observed in semi-enclosed regions such as the Mediterranean Sea27. In terms of wave direction, the mean differences for Dirm are consistent with previous research27, with higher values found in enclosed basins and around small islands. So far, comparisons of the global performance of WAVERYS and ERA5 against coastal buoy observations are limited, but a recent regional assessment by Crespo et al.67 for the south-western Atlantic Ocean, an area underrepresented in terms of wave buoy data, also identified improved performance by WAVERYS for mean and extreme Hs.

The main contribution of this work is the detailed analysis of the errors in the estimation of extreme wave conditions in global wave reanalyses and the impact this has on the determination of return period values, which are fundamental for coastal risk analysis. A main finding is the overall underestimation of coastal Hs in global reanalyses, especially for extreme wave heights (Hs > 95th percentile), as highlighted by average negative biases of −16% (−9%) in WAVERYS (ERA5). Underestimation of extreme Hs in ERA5 based on comparison with buoy data was also found in Chinese waters40, around Japan42, and in the Mediterranean sea53,68. Cases of extreme Hs overestimation were identified for 6% (22%) of the buoy locations when compared to WAVERYS (ERA5). The overestimation in WAVERYS occurred mainly in coastal areas where the reanalysis resolution is insufficient to resolve nearshore wave changes across complex shelf bathymetries (in the presence of reefs, headlands, sheltering land), while ERA5 was found to overestimate extreme Hs for buoy locations that are closer to the coastline due to its coarser resolution. Previous work evidenced the overestimation of wave heights in ERA5 for locations in the Arabian Sea69, as well as its precursor ERA-Interim in several coastal locations around India41,70,71. However, no regional patterns were identified in the present study for overestimated values of Hs. Instead, such cases corresponded to very specific locations, often surrounded by cases of underestimation in extreme waves and most likely related to the limitations of coarse resolution global models to resolve local wave propagation and transformation.

Underestimation of Tm was also found in most coastal areas analysed, with an average relative bias of −9% (−13%) for WAVERYS (ERA5). Tm was overestimated in less than 10% of the buoy locations considered in both reanalyses, mostly coincident with the sites where Hs was also overestimated, and for very long wave periods (>95th percentile). Overestimation of Tm in ERA5 is not uncommon and has been reported along the Chinese coast40, Arabian Sea, Bay of Bengal72 and in the Mediterranean Sea53. Tp was also consistently underestimated by global reanalyses for 90% of the buoy locations and for conditions above the 95th percentile, with average relative biases of −15% (−20%) for WAVERYS (ERA5), confirming previous results for the North Indian Ocean72 and in the Greek seas68.

Assessments of the performance of global wave reanalysis rarely consider wave direction, mainly because in coastal areas the wave direction is highly dependent on the model resolution and the ability to simulate small scale processes53. However, accurately determining wave direction is extremely important for coastal studies related to sediment transport73 and coastal erosion74. Although the calibration of wave direction cannot be applied at a global scale and instead requires local buoy data, applying a shift to the distribution of wave directions, as demonstrated in this work, improves the representation of the peaks in the directional distribution and yields a better average direction in the reanalysis datasets (see Fig. 1m, n as an example). Furthermore, WAVERYS and ERA5 present similar wave direction distributions, and this is likely due to the use of the same 2-minute ETOPO2 bathymetric grid and the fact that WAVERYS is forced with the ERA5 10-m wind conditions.

The systematic underestimation of wave parameters in the ERA reanalysis products developed by ECMWF is not new. It was reported for the previous reanalysis, ERA-Interim, when compared to buoy data from the National Data Buoy Centre in the USA38, on the Atlantic coast of Spain and on the Italian coast75. It is widely known that global wave models, such as WAM used by both WAVERYS and ERA5 in different versions, tend to underestimate extremes in Hs, and hence Tp63. This is mostly due to uncertainties in wind forcing39, but also because of limitations in the computation of various wave processes (bottom friction, wave breaking, island shadowing, and fetch length)34,63. However, more than the numeric errors in reanalysis datasets, the potential risks associated with using model outputs for coastal applications that systematically under or overpredict wave conditions lie in the likely under or overestimation of the outcome. This includes assessments of wave energy and impact, erosional potential, flooding severity, amongst others. For example, deviations in the modelled Hs are propagated quadratically in terms of wave energy calculation (or linearly in the case of Tp). This is particularly relevant for extreme Hs conditions, as the higher percentiles are poorly estimated by wave models63.
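
The quadratic propagation mentioned above follows from the standard relations for wave energy density, E = ρg·Hs²/16, and deep-water wave power, which scales with Hs² multiplied by a wave period. The short sketch below turns a relative bias in Hs (and optionally in the period) into the implied relative error in energy or power. Combining the average extreme biases for Hs and Tp reported in this work in a single calculation is an illustrative assumption, since those averages were computed separately, and the function name is hypothetical.

```python
def relative_energy_error(hs_rel_bias, period_rel_bias=0.0):
    """Relative error in a quantity proportional to Hs**2 * T.

    hs_rel_bias and period_rel_bias are fractions, e.g. -0.16 for a -16% bias.
    With period_rel_bias = 0 this is the error in energy density (Hs**2 only).
    """
    return (1.0 + hs_rel_bias) ** 2 * (1.0 + period_rel_bias) - 1.0

# Energy density error implied by the average extreme Hs biases reported in the text.
energy_error_waverys = relative_energy_error(-0.16)   # about -0.294
energy_error_era5 = relative_energy_error(-0.09)      # about -0.172

# Illustrative power error if the extreme Tp bias is assumed to coincide with the Hs bias.
power_error_waverys = relative_energy_error(-0.16, -0.15)  # about -0.400
```

Under these assumptions, a −16% bias in extreme Hs corresponds to roughly a 29% underestimation of wave energy, which grows to about 40% for wave power if a −15% period bias occurs at the same time.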

Coastal hazard and risk assessments are often based on the definition of extreme values for different return periods13. The widespread global underestimation of the wave parameters associated with a given return period based on original reanalysis data, highlighted in the results presented here for most coastal locations, will result in a potential minimisation of the hazard or risk level. Despite this limitation, global wave reanalyses have been and continue to be widely used for coastal studies without calibration or without quantifying their uncertainties. The applications of uncalibrated reanalyses include analysis of wave climate71,76 and extreme events57,58, coastal wave energy assessments70,77,78, estimation of overtopping20, shoreline retreat62 and coastal flooding56. As the recognition of the limitations of global reanalyses increases, it becomes evident that it is fundamental to improve return period estimates from global reanalyses65 to ensure the accurate definition of coastal hazards and risk analysis at local to global scales.

To minimise the underestimation that is common in global wave models, particularly for extreme conditions, different calibration techniques have been developed to correct wave reanalysis36,48,50 or explicitly the extreme wave heights79 and their return periods12,47,51. Although directional calibration techniques exist48,50 for Hs, the calibration of the direction itself, and of the periods, has not been addressed. Underestimation of Hs extremes occurs randomly in wave models and not all extreme Hs values need correction46. Thus, the application of simple parametric corrections can also lead to errors and uncertainties, since many complex factors influence the prediction of extreme events63. However, the results obtained by applying four different transfer functions to calibrate time series of wave parameters demonstrated that it is possible to improve the estimation of extreme wave conditions by correcting the overall underestimation of Hs, Tm and Tp in reanalysis datasets. Substantial improvements were observed (Fig. 3a and Fig. 4a, b) in the estimation of the Hs values associated with long return periods (50 years), which are often used for hazard and risk assessments in coastal areas13, demonstrating the importance of calibrating wave reanalyses before their use in coastal applications.

In ideal conditions, calibration should be performed against wave buoy data from a location close to the reanalysis grid node. This not only improves the reanalysis but allows extending, with higher accuracy, the often limited temporal coverage of wave buoy time series. Once a reanalysis is calibrated for a specific location, the results can be extended to adjacent areas with similar morphology and wave climate. However, when in-situ observations are not available, this work proposes a global calibration equation for Hs and Tm, to be applied in coastal areas where ERA5 and WAVERYS reanalyses are known to underestimate wave heights and mean periods. This excludes locations with complex coastal configurations, sheltered by islands, headlands or reefs, as well as nearshore areas where shallow water processes cannot be properly captured by global coarse resolution reanalyses. In those cases, calibrated reanalysis could be used after downscaling with a regional wave model that better resolves the impacts of complex coastal configurations on wave propagation. Based on the validation performed with the buoys that were not included in the global calibration equations, the estimation of Hs for different return periods using the global calibration achieves improvements for 90% (70%) of the buoys compared to WAVERYS (ERA5), although the calibration for individual buoy locations allows a greater improvement. The main potential of the global equations proposed is their broader applicability when compared to the individual calibration, which requires in-situ data to be applied. The underestimation of the widely used Hs 50-year return period was reduced on average by 25% (15%) for WAVERYS (ERA5). This allows more reliable results in any coastal engineering application that requires the use of return periods, whether for design purposes, coastal hazard and protection analysis or evaluation of wave energy devices.
Although global calibration equations have the merit of being a simple method to correct wave reanalysis data, their use should consider the specific characteristics of the wave climate in the study areas. In areas where the overall Hs time series is commonly overestimated by global reanalyses, such as in extremely shallow or sheltered areas, the global equation would further increase the error (i.e. the overestimation). Future work can potentially explore the development of basin scale calibrations to be applied for areas where reanalyses have clear and distinct regional patterns. This would allow a regionally consistent calibration for those areas where a global calibration may not be suitable.

In conclusion, by comparing two of the most recent and widely used global wave reanalyses, WAVERYS and ERA5, with observations from 326 coastal buoys in different areas of the world, this work demonstrates that global reanalyses have important limitations in the determination of extreme wave parameters in coastal areas, including widely used return period estimates. This highlights the need to perform site-specific validation and calibration before using wave reanalyses in coastal areas in order to reduce their uncertainties. Such limitations can be problematic, particularly when assessing coastal hazards and associated risks. To address this, this work proposes efficient and reproducible calibration approaches to improve wave reanalyses in coastal areas, either using site-specific observations where available, or globally derived approximations.

Materials and methods

Two of the most recent and advanced wave reanalyses were selected for evaluation: ERA529, the latest reanalysis released by the European Centre for Medium-Range Weather Forecasts (ECMWF) with 0.5° resolution, and WAVERYS27, a 0.2° resolution reanalysis by the Copernicus Marine Environment Monitoring Service (CMEMS). The reanalyses have different resolutions, dissipation terms, and whitecapping terms. WAVERYS includes wave-current interactions but, unlike ERA5, is not coupled with an atmospheric model. Both reanalyses have been extensively validated against buoy and satellite data (e.g. refs. 27,40,42,66). In this work, the calibration of Hs, Tm, Tp and Dirm was performed for 326 buoys located in distinct coastal areas around the world (Supplementary Table 1).

Wave reanalysis data

ERA529 is produced by the ECMWF, covers the period from 1950 to the present at hourly intervals, and it has improved spatial and temporal resolution and performance in comparison to its predecessors (ERA-40 and ERA-Interim)28,29. ERA5 provides records of the atmosphere, land surface, and ocean waves. The atmospheric model is coupled to WAM, a third-generation spectral WAve Model developed by the WAMDI Group80. ERA5 also assimilates Hs measured from satellites (ERS1, ERS2, SARAL, CryoSat2, Jason1, Envisat) into the predicted wave spectra29. The ERA5 wave data are available from the Copernicus Climate Data Store (https://cds.climate.copernicus.eu/). For the present study, synthetic parameters, derived from the wave spectra, were used. The parameters considered are the Hs of combined wind waves and swell, Tp, Tm, and Dirm.

WAVERYS is the first global wave reanalysis produced by CMEMS27 and was released in December 2019. It has a spatial resolution of 0.2° and covers the period from 1993 to 2019 with a temporal resolution of 3 h. WAVERYS is not coupled with an atmospheric model but includes the 3 h surface currents from the GLORYS12V181 global physical reanalysis. The altimeter wave data from the ERS1 to Sentinel-3A missions27 were assimilated into the model. WAVERYS is forced with the ERA5 10-m wind fields and the wave model used is MF-WAM version 4 implemented by Météo-France, which differs from the version used in ERA5 in the input and dissipation terms39.

For both reanalyses, the data from the cell closest to the mooring location of each buoy were considered, unless the location was beyond a radius of 0.099° (0.249°) for WAVERYS (ERA5). In those cases, the data from adjacent cells were interpolated using a weighted approach based on the distance between the buoy location and the centres of the adjacent grid cells. Regardless of the use of weighted averages or the values for individual grid cells, it must be recognised that reanalysis data are estimates of the average conditions for entire grid cells82. This has implications for the consistency between buoy and reanalysis data, not only because of the spatial scale but also because the grid cell depth will not represent exactly the buoy depth. This may contribute to differences in wave parameters between wave reanalysis and coastal buoys, particularly as depth influences wave transformation. However, the improved resolution of both reanalysis datasets, the developments in model source terms and quality of bathymetric grids, as well as improved assimilation of a larger range of observations27,29, provide increased confidence that buoy datasets are spatially consistent with the reanalysis datasets. In addition, the buoy data were linearly interpolated to a 1-hour temporal resolution (as described below), and such averaging over time is considered to bring the temporal and spatial scales of buoy and reanalysis data closer together83.
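The distance-weighted interpolation between adjacent grid cells can be illustrated with a minimal sketch. The exact weighting used in this work is not specified in the text, so the inverse-distance weights, the function name `idw_interpolate` and the use of distances in degrees (ignoring the convergence of meridians) are assumptions made only for illustration.

```python
import math


def idw_interpolate(buoy_lon, buoy_lat, cells, power=1.0):
    """Distance-weighted average of reanalysis values at a buoy position.

    cells: list of (lon, lat, value) tuples for adjacent grid-cell centres.
    Weights are 1 / distance**power, an assumed weighting scheme.
    Distances are computed in degrees, which is a simplification.
    """
    weights = []
    values = []
    for lon, lat, value in cells:
        dist = math.hypot(lon - buoy_lon, lat - buoy_lat)
        if dist == 0.0:
            # The buoy sits exactly on a cell centre: use that cell directly.
            return value
        weights.append(1.0 / dist ** power)
        values.append(value)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical Hs values (m) at four cell centres surrounding a buoy.
cells = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (0.0, 1.0, 3.0), (1.0, 1.0, 4.0)]
hs_at_buoy = idw_interpolate(0.5, 0.5, cells)
```

With a buoy equidistant from all four centres, the weights are equal and the result reduces to the arithmetic mean of the cell values.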

Wave buoy data

Buoys are available for limited locations around the world, with the majority in the Northern Hemisphere, particularly in the North Pacific and North Atlantic, and large data gaps in the Southern Hemisphere14. In addition, buoy data are often discontinuous or cover only short periods, which makes the analysis of the return periods unfeasible in many locations12. This work used buoy measurements to evaluate and calibrate the two wave reanalyses. The in-situ wave observations were extracted from the global network of buoys provided by the CMEMS, with product name INSITU_GLO_WAV_REP_OBSERVATIONS_013_045 (ref. 84), supplemented by additional buoys sourced from hydrographic agencies for underrepresented areas. The CMEMS database includes 1489 buoys with information about latitude, longitude and time series of Hs and in some cases Tm, Tp and Dirm, together with quality flags (from 0 to 9, more details in the product user manual85) for each parameter. In cases where only the 1D wave spectra were available, the synthetic parameters were computed from the spectral moments. The criteria for selecting coastal buoys from the database were: i) water depth, only buoys in water depths of less than 100 m were considered (water depth was estimated using the global GEBCO2022 bathymetry); and ii) record length, only buoys with more than one complete year of data and including the winter months were considered. Buoys in lagoonal areas, small enclosed basins, protected by headlands or around small islands (less than ~22 km in length, which is the spatial resolution of WAVERYS) were removed, as their setting is not properly resolved by the relatively coarse resolution of global reanalyses. The buoys situated in coastal locations where both reanalyses had empty cells were also automatically excluded. All buoy records were inspected to remove spikes, data flagged by quality control, flat lines, wrongly assigned variables and values outside the acceptance range86.
The criteria and filters used reduced the set of buoys considered in this study to 326 (Supplementary Table 1). The time series from the buoys were interpolated at 1-hour intervals to ensure consistency with model outputs and homogenise variable temporal resolutions (e.g., some buoys increase the sampling frequency to 30 min during storm events) or fill in missing records for data gaps of less than 6 h. Gaps longer than 6 h were excluded from the analysis. The records from the buoys ranged from 1 year to 42 years, with 83 buoys including more than 12 years of data. These 83 buoys were used for the estimation of return periods, as described below. The number of buoys used for comparison with WAVERYS and ERA5 differs depending on the presence of a reanalysis cell at the buoy location and on the temporal range of the reanalysis, as WAVERYS only covers the period from 1993 to 2019. Moreover, while 326 buoys have records for Hs, only 210 also recorded Tm, 270 Tp and 183 Dirm.
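The hourly homogenisation with a 6 h gap limit can be sketched as follows. This is an illustrative reading of the description above, not the code used in this work: the function name `to_hourly`, the representation of time as hours since an arbitrary origin, and the treatment of a gap of exactly 6 h as too long are all assumptions.

```python
import bisect
import math


def to_hourly(times_h, values, max_gap_h=6.0):
    """Linearly interpolate (time, value) samples onto whole hours.

    times_h must be sorted and expressed in hours. Target hours that fall
    inside a gap of max_gap_h or more between consecutive samples are
    returned with value None so they can be excluded from the analysis.
    """
    start = math.ceil(times_h[0])
    stop = math.floor(times_h[-1])
    hourly = []
    for t in range(start, stop + 1):
        j = bisect.bisect_left(times_h, t)
        if times_h[j] == t:
            # A sample already exists exactly on the hour.
            hourly.append((t, values[j]))
            continue
        t0, t1 = times_h[j - 1], times_h[j]
        if t1 - t0 >= max_gap_h:
            # Gap too long to fill: leave the hour empty.
            hourly.append((t, None))
            continue
        frac = (t - t0) / (t1 - t0)
        hourly.append((t, values[j - 1] + frac * (values[j] - values[j - 1])))
    return hourly


# Hypothetical record: 30 min sampling, then a 2 h gap, then a 7 h gap.
series = to_hourly([0.0, 0.5, 1.0, 3.0, 10.0], [1.0, 1.5, 2.0, 4.0, 11.0])
```

In this example the hour inside the 2 h gap is filled by interpolation, while the hours inside the 7 h gap remain empty.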

Evaluation

The performance of wave reanalysis was evaluated against wave data (for Hs, Tm, Tp and Dirm) from the in-situ buoy records for each of the selected coastal locations (Supplementary Table 2). Standard error metrics were used to evaluate the hourly time series, namely the Pearson’s correlation coefficient (R), scatter index (SI), root mean square error (RMSE) and relative bias (bias):

$$R=\frac{{\sum }_{i=1}^{N}({M}_{i}-\bar{M})({O}_{i}-\bar{O})}{\sqrt{{\sum }_{i=1}^{N}{\left({M}_{i}-\bar{M}\right)}^{2}{\sum }_{i=1}^{N}{\left({O}_{i}-\bar{O}\right)}^{2}}}$$
(5)
$${SI}=\frac{\sqrt{\frac{1}{N}{\sum }_{i=1}^{N}{[\left({O}_{i}-\bar{O}\right)-({M}_{i}-\bar{M})]}^{2}}}{\frac{1}{N}{\sum }_{i=1}^{N}{O}_{i}}$$
(6)
$${RMSE}=\sqrt{\frac{1}{N}{\sum }_{i=1}^{N}{\left({M}_{i}-{O}_{i}\right)}^{2}}$$
(7)
$${bias}=\frac{{\sum }_{i=1}^{N}({M}_{i}-{O}_{i})}{{\sum }_{i=1}^{N}{O}_{i}}\times 100$$
(8)

where \({O}_{i}\) represents the observations, \({M}_{i}\) the modelled data from the wave reanalysis and N is the number of measurements. The overbar refers to the mean. A negative (positive) bias represents an underestimation (overestimation) by the wave reanalysis. For the wave directions, circular R, SI, RMSE and bias were estimated, taking into account that when two angular variables are compared in the sector from −90° to 90°, values ranging from 270° to 360° need to be converted to the range from −90° to 0°.
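The four metrics in Eqs. (5) to (8) can be written compactly in code. The sketch below follows the equations term by term for non-directional parameters. The function name `error_metrics` and the example values are illustrative, and the circular treatment of wave directions is not included.

```python
import math


def error_metrics(model, obs):
    """Return R, SI, RMSE and relative bias (%) following Eqs. (5)-(8)."""
    n = len(obs)
    m_mean = sum(model) / n
    o_mean = sum(obs) / n
    dm = [m - m_mean for m in model]  # model anomalies
    do = [o - o_mean for o in obs]    # observation anomalies

    r = sum(a * b for a, b in zip(dm, do)) / math.sqrt(
        sum(a * a for a in dm) * sum(b * b for b in do)
    )
    si = math.sqrt(sum((b - a) ** 2 for a, b in zip(dm, do)) / n) / o_mean
    rmse = math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / n)
    bias = 100.0 * sum(m - o for m, o in zip(model, obs)) / sum(obs)
    return {"R": r, "SI": si, "RMSE": rmse, "bias": bias}


# Hypothetical example: a model that underestimates every observation by 10%.
obs = [1.0, 2.0, 3.0, 4.0]
model = [0.9 * o for o in obs]
metrics = error_metrics(model, obs)
```

For this example the correlation is close to 1 while the relative bias is close to −10%, showing that a high R alone does not imply an unbiased reanalysis.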

Calibration

Calibration for individual buoy locations

To calibrate the wave reanalysis (y) against the buoy observations (x) four transfer functions were tested to better represent the behaviour of wave data from buoys with different locations around the world. The functions were applied to Hs, Tm and Tp of each buoy. The selection of the best fit function was performed automatically by finding the fit that optimizes both the standard error metrics for the entire time series and for the extremes (i.e. the values above the 95th percentile). This was done by estimating the improvement obtained with each fit (in terms of R, RMSE, SI, mean absolute difference, and bias) and finding the fit that would optimise the most parameters (Supplementary Table 3). The transfer functions used were a first-degree polynomial without and with intercept (Eqs. (9) and (10)), a power function (Eq. (11)), and a rotation around the mean (Eq. (12)), from which Eq. (13) is derived:

$${y}^{{\prime} }={ay},$$
(9)
$${y}^{{\prime} }={ay}+b,$$
(10)
$${y}^{{\prime} }=a{y}^{b},$$
(11)
$$\left[\begin{array}{c}{x}^{{\prime} }\\ {y}^{{\prime} }\end{array}\right]=\left[\begin{array}{cc}\cos (1-a) & -\sin (1-a)\\ \sin (1-a) & \cos (1-a)\end{array}\right]\left[\begin{array}{c}x-{x}_{c}\\ y-{y}_{c}\end{array}\right],$$
(12)
$${y}^{{\prime} }=\sin \beta \left(x-{x}_{c}\right)+\cos \beta \left(y-{y}_{c}\right),{with}\,\beta =\left(1-a\right)$$
(13)

Linear regression methods, such as Eqs. (9) and (10), have been widely used to correct modelled wave data against observations, and have been applied to the full Hs time series36, to the extreme Hs79 or to correct the Hs associated with specific return periods12,47,51. Here, the equations are applied to the entire original reanalysis wave time series from WAVERYS and ERA5 to obtain a calibrated time series (y’) and, as a result, the bias is corrected. The power function has also been used in previous studies36,48,50, with the correction parameters \(a\) and \(b\) varying with the direction of wave propagation. In this study, the parameters were instead held constant for the entire time series. The rotation around the mean (xc, yc) was, to the authors’ knowledge, applied here for the first time in wave data calibration. For this purpose, a rotation matrix was used (Eq. (12)), which depends on the angular coefficient found through the first-degree polynomial fit with intercept. This method was deemed appropriate to correct both the underestimation of the higher Hs values from the reanalysis and the overestimation of the lower Hs values. The bias is not modified. The rotation matrix in Eq. (12) refers to a counterclockwise rotation, which corresponds to cases where the model underestimates the Hs compared to the observations. In case of overestimation, a clockwise rotation matrix is required, which can be obtained by changing the sign of both sines in Eq. (12).
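As an illustration of Eqs. (9) to (11) and (13), the sketch below defines each transfer function and an ordinary least-squares fit for the coefficients of Eq. (10). The fitting procedure is an assumption, since the text does not state how \(a\) and \(b\) were estimated, and all function names are hypothetical. The rotation is implemented literally as Eq. (13), so its output is expressed relative to the rotation centre.

```python
import math


def fit_linear(x_obs, y_model):
    """Least-squares a, b such that a * y_model + b approximates x_obs."""
    n = len(x_obs)
    x_mean = sum(x_obs) / n
    y_mean = sum(y_model) / n
    s_xy = sum((y - y_mean) * (x - x_mean) for x, y in zip(x_obs, y_model))
    s_yy = sum((y - y_mean) ** 2 for y in y_model)
    a = s_xy / s_yy
    b = x_mean - a * y_mean
    return a, b


def calibrate_scale(y, a):
    """Eq. (9): first-degree polynomial without intercept."""
    return a * y


def calibrate_linear(y, a, b):
    """Eq. (10): first-degree polynomial with intercept."""
    return a * y + b


def calibrate_power(y, a, b):
    """Eq. (11): power function."""
    return a * y ** b


def calibrate_rotation(x, y, a, x_c, y_c):
    """Eq. (13): counterclockwise rotation by beta = 1 - a about (x_c, y_c)."""
    beta = 1.0 - a
    return math.sin(beta) * (x - x_c) + math.cos(beta) * (y - y_c)


# Hypothetical reanalysis Hs (y) and buoy Hs (x) values in metres.
y_model = [1.0, 2.0, 3.0, 4.0]
x_obs = [1.3, 2.5, 3.7, 4.9]
a, b = fit_linear(x_obs, y_model)
y_calibrated = [calibrate_linear(y, a, b) for y in y_model]
```

In this synthetic example the buoy values are an exact linear function of the reanalysis values, so the fitted line reproduces the observations.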

For calibrating the wave directions, a different method was implemented. First, the wave roses from the buoys and from the reanalysis were compared visually. If differences between the observed and modelled wave direction distributions were observed, a calibration was performed. This was done by dividing wave direction data into 2, 3 or 4 sectors depending on the number of peaks present in the distribution (e.g., for the Belmullet buoy in Fig. 1m, n two sectors were identified: from 0° to 180° and from 180° to 360°). For each sector, the difference between the model and buoy averaged directions was determined and subtracted/added to the reanalysis data, according to the need to rotate each sector in a counterclockwise/clockwise direction.
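The sector-based correction of wave directions can be sketched as below. Several details are assumptions made for illustration: directions are taken to lie in [0°, 360°), reanalysis values are assigned to sectors by their own direction, the sector averages are arithmetic means, and the function name `calibrate_directions` is hypothetical.

```python
def calibrate_directions(model_dir, buoy_dir, edges=(0.0, 180.0, 360.0)):
    """Shift reanalysis directions sector by sector towards the buoy mean.

    edges defines the sector limits in degrees, here two sectors
    (0-180 and 180-360). For each sector, the difference between the mean
    model and mean buoy direction is subtracted from the model values.
    """
    calibrated = list(model_dir)
    for lo, hi in zip(edges[:-1], edges[1:]):
        model_idx = [i for i, d in enumerate(model_dir) if lo <= d < hi]
        buoy_vals = [d for d in buoy_dir if lo <= d < hi]
        if not model_idx or not buoy_vals:
            continue  # nothing to compare in this sector
        model_mean = sum(model_dir[i] for i in model_idx) / len(model_idx)
        buoy_mean = sum(buoy_vals) / len(buoy_vals)
        offset = model_mean - buoy_mean
        for i in model_idx:
            # Keep the corrected direction within [0, 360).
            calibrated[i] = (model_dir[i] - offset) % 360.0
    return calibrated


# Hypothetical directions (degrees) with a different offset in each sector.
model_dir = [100.0, 110.0, 280.0, 290.0]
buoy_dir = [90.0, 100.0, 300.0, 310.0]
corrected = calibrate_directions(model_dir, buoy_dir)
```

In this example the first sector is rotated by 10° in one sense and the second by 20° in the other, so each sector mean matches the corresponding buoy mean.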

Global calibration

A global calibration was implemented to improve the wave reanalysis time series where these are known to underestimate Hs and Tm, particularly above the 95th percentile of the wave height and period distributions. Underestimation of the extremes (i.e. a negative extreme bias) in the reanalysis datasets (Fig. 2a, b) occurs in 94% (78%) of buoy locations for WAVERYS (ERA5), while overestimation was observed in 6% (22%) of buoy locations for WAVERYS (ERA5). This smaller subset of buoys where the reanalyses overestimate the extremes was excluded from the global calibration, as these locations correspond to areas of complex coastal configuration, poorly resolved in global wave reanalysis models. Such locations are found mainly in areas sheltered by islands (e.g., Hawaii, Azores, Sardinia), headlands (e.g., Cape Cod) or reefs (e.g., Great Barrier Reef), as well as in sections of enclosed basins (e.g., parts of the Baltic and North Sea), as global reanalysis models have known limitations in resolving sheltering effects35.

From the buoy datasets identified for the global calibration, 70% were randomly selected and used to derive the equations for the global calibration, while the remaining 30% were used to independently validate the approach. A sensitivity analysis (Supplementary Figs. 12 and 13) was performed by selecting different percentages of buoys for the derivation of the equations and for the validation, as well as different random groups, which did not result in observable differences in the error statistics. After combining the data from the randomly selected buoys and treating the ERA5 and WAVERYS reanalysis data for those locations as a single time series, Eqs. (9)–(11) were applied to find the best-fit calibration equations to be used globally to correct the underestimation of the Hs and Tm values above the 95th percentile. The equations found were then validated against the data from the remaining 30% of buoys.
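The random 70/30 partition of buoys can be reproduced with a seeded shuffle, as in the minimal sketch below. The function name `split_buoys`, the seed and the placeholder buoy identifiers are hypothetical. Changing the seed or the fraction mimics the kind of sensitivity analysis described above.

```python
import random


def split_buoys(buoy_ids, train_fraction=0.7, seed=0):
    """Randomly split buoy identifiers into derivation and validation sets."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    shuffled = list(buoy_ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical buoy identifiers.
buoy_ids = [f"buoy_{k:03d}" for k in range(100)]
derivation, validation = split_buoys(buoy_ids)
```

Splitting by buoy rather than by individual record keeps the validation locations fully independent of the data used to derive the equations.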

Calculation of return periods

The calculation of the return periods of Hs was based on the Peak-Over-Threshold method87 and a de-clustering algorithm to characterize extreme Hs events88, following the approach proposed by Oikonomou et al. (2020)89. Independently and identically distributed extreme value datasets were obtained by applying a threshold based on the 95th percentile value90 of the Hs distribution and an independence criterion for separating consecutive events based on the extremal index calculation. Each extreme value dataset was then fitted to the Generalized Pareto Distribution to determine the return periods of an extreme occurrence in Hs. This was performed for each location using the wave buoy data (for buoys with more than 12 years of complete data) and the original, individually calibrated and globally calibrated reanalysis data.
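The overall workflow (threshold, de-clustering, Generalized Pareto fit, return level) can be sketched with simplified stand-ins for two steps. The sketch below is not the procedure of refs. 87 to 90: it separates events with a fixed minimum time gap instead of the extremal index, and it fits the Generalized Pareto Distribution by the method of moments instead of a dedicated fitting routine. All function names and the 48 h separation are assumptions.

```python
import math


def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def decluster(hs, threshold, min_sep_h=48):
    """Keep one peak per event in an hourly series.

    Exceedances separated by fewer than min_sep_h hours are merged into
    the same event (an assumed, fixed-gap independence criterion).
    """
    peaks = []
    last_idx = None
    for i, h in enumerate(hs):
        if h <= threshold:
            continue
        if last_idx is not None and i - last_idx < min_sep_h:
            peaks[-1] = max(peaks[-1], h)  # same event: keep the maximum
        else:
            peaks.append(h)                # new independent event
        last_idx = i
    return peaks


def gpd_return_level(peaks, threshold, n_years, return_period):
    """Return level from a method-of-moments Generalized Pareto fit."""
    excess = [p - threshold for p in peaks]
    n = len(excess)
    mean = sum(excess) / n
    var = sum((e - mean) ** 2 for e in excess) / (n - 1)
    ratio = mean * mean / var
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * mean * (ratio + 1.0)
    m = (n / n_years) * return_period  # expected number of peaks in T years
    if abs(shape) < 1e-9:
        return threshold + scale * math.log(m)
    return threshold + scale / shape * (m ** shape - 1.0)


# Hypothetical storm peaks (m) over 5 years above a 4.5 m threshold.
peaks = [5.0, 5.5, 6.0, 7.0, 9.0]
hs_50 = gpd_return_level(peaks, threshold=4.5, n_years=5, return_period=50)
```

In practice the threshold would come from `percentile(hs, 95)` and the peaks from `decluster(hs, threshold)` applied to an hourly Hs series of sufficient length.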