Background & Summary

Wind-generated waves are recognized as a key element of the climate system1, having considerable environmental2,3, geophysical3,4 and socioeconomic5 impacts globally. They are considered paramount to navigation planning, offshore and coastal engineering activities, and energy generation (from fossil to renewable energy)6 with structural design strongly dependent on wind-wave characteristics.

Furthermore, ocean waves are considered dominant drivers of coastal dynamics and stability7,8, and are key contributors to coastal sea-level extremes at multiple time-scales9,10. Hence, integrating non-stationary multivariate wave conditions into broad-scale comprehensive assessments of future coastal hazards and vulnerability is critical10,11 to avoid potentially costly maladaptation12. These assessments must consider not only wave run-up and swash contributions9,10,13 but also changes in littoral sediment supply whose effects on the open coasts can be as considerable as effects of projected future sea-level rise13,14,15. Impacts of a changing wave climate might also affect surfing tourism worldwide, a growing market with economic relevance16.

However, projected wave climate data is not available among the standard suite of climate variables used to characterize the climate system1,17, since coupled atmosphere-ocean general circulation models (GCMs) under the Coupled Model Intercomparison Project 5 (CMIP5)18 do not usually include wind-wave-dependent parameterizations. As a result, the availability of projected wind-wave climate data is limited relative to other climatological parameters such as temperature, precipitation and/or sea level. Using atmospheric forcing derived from CMIP5 GCMs to force dynamical or statistical wave models, multiple international climate research groups19,20,21,22,23,24,25,26,27,28 have developed ensembles of global wave climate projections. However, these standalone studies cover different subsets of the uncertainty space (e.g., number of climate models, or emission scenarios), use different wave downscaling approaches, consider different historical and future simulation periods and provide different wave characteristics, within a range of data formats.

Hence, to date, there is no consistent global multivariate dataset of global wave climate projections capable of sufficiently sampling the uncertainty associated with projected future ocean wave climate available29 for widespread use by stakeholders, government, and the research community. Here, we describe the first community-driven dataset (COWCLIP2.0) of 21st century global wind-wave climate projections comprising different dynamical and statistical downscaled data. This collection assembles ten individual global datasets and was created under a pre-designed sampling framework established by the Coordinated Ocean Wave Climate Project (COWCLIP)30,31,32.

The COWCLIP2.0 dataset aims to meet current needs from many different perspectives, through the provision of an open access spatial data collection which provides consistent data (in terms of format, resolution and quality) across the global ocean. This dataset, archived in Network Common Data Form (NetCDF) with CF (Climate & Forecasts) compliant metadata, contains a large ensemble of 148 global ocean wave climate projections gridded at 1° spatial resolution (i.e., a common grid is imposed on the various resolutions of the different datasets - section 2.3.2). The dataset provides a variety of standard wave statistics for present-day and future global multivariate wave fields (HS, Tm and θm) at monthly, seasonal and annual time scales (Table 1). The COWCLIP2.0 data also includes a new set of extreme HS indices designed by the Expert Team on Climate Change Detection and Indices (hereafter ETCCDI)33 (https://www.wcrp-climate.org/data-etccdi). These represent an additional set of ocean wave statistics (Table 2) relevant to climate change detection for a range of scientific applications.

Table 1 Summary of the wave contributions to the COWCLIP2.0 intercomparison data set.
Table 2 Summary of the variables and standard wave statistics included in the COWCLIP2.0 data set.

The COWCLIP2.0 dataset overcomes many previous limitations29, including lack of standardisation amongst existing CMIP5-driven global wave field simulations (e.g. wave variables and their statistics, spatial coverage and resolution and time-slices used for simulation) and limited sampling of dominant sources of uncertainty (e.g., model forcing and wave-downscaling uncertainties). This extensive wave information can now be widely used by different research communities (e.g. those focusing on natural hazards, coastal management, renewable energy, and ship navigation). The intention is for this dataset to expand as further projections of future global wave climate become available. It is envisaged that open and easy access to such a dataset might provide a new stimulus for, and facilitate, broad-scale coastal hazard and vulnerability assessments. It is also a robust basis for a range of inter-comparison analyses (e.g., quantification of sources of uncertainty)29, given the size and diverse nature of this dataset. For instance, the annual and seasonal sets of wave statistics from the COWCLIP2.0 ensemble were recently used to quantify the robustness and uncertainties in multivariate global wave projections34.

The development of the COWCLIP2.0 dataset helps wave researchers and data users to address the previous limited sampling of dominant uncertainties (e.g., model forcing and wave-downscaling) and significantly enhances interoperability. Before this dataset was created, researchers could access only a limited range of simulations, meaning that assessment across projection scenarios and intra- and/or inter-model ensembles was challenging31,35, with little possibility of sampling the uncertainty among wave downscaling methodologies. The inconsistencies in output wave parameters and data structures made intercomparison analysis between wave data produced by different modelling groups difficult.

Methods

In this data descriptor, we explain the methods and techniques used to generate the original data; the data acquisition process; the standardized framework applied; the methodology used to derive the vast range of wave parameters/statistics for historical and future periods; and the computational processing used to create this consistent global dataset.

The dataset presented has been compiled from ten standalone CMIP5-based global wave projection datasets, which have been extensively described elsewhere. Those wave projection data sets draw on thirty-three different CMIP5 climate models to force the dynamical and statistical wave models, listed in Table 1. In this section, we provide a concise description of the original data created by each wave climate modelling group, with the details of each contribution provided in Table 1.

CMIP5 GCM-forced dynamical global simulations

CSIRO: Multiple-model multiple-scenario ensemble

Hemer and Trenham19 (hereafter CSIRO) developed a global wind-wave climate projection dataset derived using a dynamical wave approach. Surface wind fields (10 m) at 3-hourly temporal resolution and sea-ice fields at monthly frequency, taken from eight CMIP5 GCMs, were used to drive a global WAVEWATCH III (WW3)36 wave model at 1° spatial grid resolution. The WW3 model was set up using the ST3 (BAJ) source-term physics. The simulations were conducted under RCP4.5 and RCP8.5 emission scenarios for three time-slices: 1979–2005, 2026–2045 and 2080–2100.

JRC: Multiple-model, multiple-scenario ensemble

Mentaschi et al.20 (hereafter JRC) developed a global wave climate projection dataset using 3-hourly surface wind forcing from six CMIP5 models to drive a global WW3 model at 1.5° grid resolution. The WW3 model was set up using the ST4 source-term physics with no sea-ice forcing fields. The simulations were conducted between 1970–2100 under emission scenarios RCP4.5 and RCP8.5.

USGS: Multiple-model, multiple-scenario ensemble

Li et al.21 (hereafter USGS) used 3-hourly surface winds (no sea-ice concentration) simulated by four CMIP5 GCMs to generate an ensemble of wave conditions for a recent historical time-period (1976–2005) and projections for the middle and end of the 21st century for two forcing scenarios (RCP4.5 and RCP8.5). The wave fields were simulated by the wave model WW3, applied globally at 1 × 1.25° grid resolution.

NOC: Single-model, multi-scenario ensemble

Bricheno and Wolf22 (hereafter NOC) developed a global wave climate projection for RCP4.5 and RCP8.5 scenarios, using surface wind forcing fields from EC-EARTH and daily sea-ice concentration to drive a global WW3 wave model (using the ST4 source-term physics). The global simulation was conducted at ~0.7 × 0.5° between 1970–2100.

ECCC (d): Multiple-model, single-scenario ensemble

Casas-Prat et al.23 (hereafter ECCC(d)) developed a global wave climate projection dataset at 1° grid resolution (refined to 0.5° nearshore). The simulations were conducted with the WW3 model using the ST4 source-term physics, forced by 3-hourly surface winds and daily sea-ice fields taken from the RCP8.5 emissions scenario simulations by five CMIP5 climate models. Simulations were conducted for two time-slices: 1979–2005 and 2081–2100.

IHE-DELFT: Single-model, single-scenario multiple-run ensemble

Semedo et al.24 (hereafter IHE-DELFT) developed a dataset of global wave climate projections using the WAM4.5 model at a 1° spatial resolution forced by surface wind fields and sea-ice concentration from seven different EC-EARTH realizations under the RCP8.5 emissions scenario. The WAM model was set up with default ST3 source-term physics and the simulation period spanned from 1979–2100 continuously.

LBNL: Single-model, single-scenario ensemble

Timmermans et al.25 (hereafter LBNL) developed a high-resolution global wave climate projection using monthly sea-ice fields and 3-hourly surface winds taken from the Community Atmospheric Model (or ‘CAM5’), the atmospheric model of the NCAR Community Earth System Model at 0.25° horizontal resolution. These surface wind fields were used to drive a global WW3 model (using ST4 source-term physics) between 1995–2005. Four simulations were performed using the high-resolution wind fields each initialized with a different microscopically perturbed atmospheric state. Future wave conditions were generated using the high-resolution 0.25° CAM5 wind forcing for RCP8.5 between 2081–2100 using observed SST + 2 °C.

KU: Single-model, multiple-scenario ensemble

Shimura et al.26 (hereafter KU) developed an ensemble of global wave climate projections using the WW3 model forced by 6-hourly surface winds (and monthly sea-ice forcing) at 0.5625° horizontal resolution from the high-resolution atmospheric model MRI-AGCM3.2H. The WW3 model was set up using ST4 source-term physics. MRI-AGCM was forced with four future SST conditions derived from CMIP5 GCMs under the RCP8.5 emissions scenario. Simulations were conducted for two time-slices: 1979–2005 and 2079–2100.

CMIP5 GCM-forced statistical global simulations

IHC: Multiple-model, multiple-scenario ensemble

Camus et al.27 (hereafter IHC) developed a global wave projection dataset at 1° grid resolution on the basis of a weather-type statistical downscaling method. They used daily SLP fields from thirty CMIP5 climate models as the predictor and a reference wave hindcast ‘Global Ocean Wave’ (GOW2.0) as predictand observations. A regression-guided clustering method based on linear regression and k-means clustering was performed at each wave grid site of GOW2.0, from which estimates of average HS and Tm were obtained for each weather type (WT). The wave climate projections were estimated from the future probability of WTs and the mean value of the variables associated with each WT at each wave grid node. The CFSR (Climate Forecast System Reanalysis) and GOW2.0 data from 1970–2015 were used in the training of the statistical relationship by comparing estimations of monthly wave parameters obtained using the statistical approach and from the time series of GOW2.0. To diminish GCM biases, the SLP data were adjusted such that they have the same climatological average and standard deviation as the CFSR SLP dataset, used as proxy for observations over 1975–2005. The simulations were performed for two time-slices: 1975–2005 and 2010–2100 (under emissions scenarios RCP4.5 and RCP8.5).
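The last step of the weather-type approach described above, in which a projected wave statistic is obtained from the future probability of each WT and the mean value associated with it, can be illustrated with a minimal sketch. This is not the IHC implementation: the function name, the three weather types and all numerical values are hypothetical and serve only to show the probability-weighted average at a single grid node.

```python
def project_from_weather_types(wt_probability, wt_mean_value):
    """Probability-weighted mean of a wave variable over weather types (WTs)."""
    if len(wt_probability) != len(wt_mean_value):
        raise ValueError("one probability and one mean value per weather type")
    total = sum(wt_probability)
    if total <= 0:
        raise ValueError("weather-type probabilities must sum to a positive value")
    # Normalise so that rounding errors in the probabilities do not bias the result.
    weighted = sum(p * v for p, v in zip(wt_probability, wt_mean_value))
    return weighted / total


# Hypothetical values for three weather types at one grid node.
hs_per_wt = [1.0, 2.0, 4.0]      # mean Hs (m) associated with each WT
p_historical = [0.5, 0.3, 0.2]   # historical WT occurrence probability
p_future = [0.4, 0.3, 0.3]       # projected WT occurrence probability

hs_hist = project_from_weather_types(p_historical, hs_per_wt)
hs_future = project_from_weather_types(p_future, hs_per_wt)
change = hs_future - hs_hist     # projected change in mean Hs at this node
```

In this sketch the per-WT mean values are held fixed and only the occurrence probabilities change between periods, which is the assumption stated in the description above.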

ECCC (s): Multiple-model, multiple-scenario ensemble

Wang et al.28 (hereafter ECCC(s)) developed a global dataset of statistical wave projections using a multivariate regression model with lagged dependent variable to represent a SLP-HS (mean sea level pressure and significant wave height) relationship. ECMWF’s ERA-Interim data was used to calibrate the statistical relationship between predictand HS and its SLP-based predictors. To reduce biases, the CMIP5 simulated SLP data fields were adjusted such that they have the same climatological mean and standard deviation as the ERA-Interim SLP data (used as proxy for observations for 1981–2000). The time series of 6-hourly SLP-based predictors obtained from the RCP4.5 and RCP8.5 scenarios simulations by twenty CMIP5 climate models were input to the calibrated statistical model to make projections of 6-hourly HS over a 150-year period from 1950–2100 under both scenarios.
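Both statistical contributions adjust the GCM SLP fields so that their climatological mean and standard deviation match those of a reanalysis over a baseline period. A minimal sketch of such a mean and variance adjustment for a single time series is given below. It is an illustration under stated assumptions rather than the code used by either group: the function name and sample values are hypothetical, and whether the original adjustment was applied per grid point, per calendar month or otherwise is not specified here.

```python
from statistics import mean, pstdev


def adjust_to_reference(model_series, model_baseline, reference_baseline):
    """Rescale model values so the baseline mean and std match the reference."""
    m_mean, m_std = mean(model_baseline), pstdev(model_baseline)
    r_mean, r_std = mean(reference_baseline), pstdev(reference_baseline)
    if m_std == 0:
        raise ValueError("model baseline has zero variance; cannot rescale")
    scale = r_std / m_std
    # Remove the model climatology, rescale the anomalies, add the reference climatology.
    return [r_mean + (x - m_mean) * scale for x in model_series]


# Hypothetical SLP values (hPa) for the baseline period.
gcm_baseline = [1000.0, 1004.0, 1008.0, 1012.0]
reanalysis_baseline = [1009.0, 1011.0, 1013.0, 1015.0]

adjusted_baseline = adjust_to_reference(gcm_baseline, gcm_baseline, reanalysis_baseline)
# The same transformation is then applied to values outside the baseline period.
adjusted_future = adjust_to_reference([1016.0], gcm_baseline, reanalysis_baseline)
```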

Data processing framework

The COWCLIP experimental protocol was defined to provide a systematic, community framework and infrastructure to support validation, intercomparison, documentation and access for global (and eventually regional) wave climate projections forced from CMIP atmospheric datasets. Inconsistency between data (due to different historical and future time-slices, emission scenarios and variables) has been a key factor precluding our ability to move forward.

Based on this framework, we removed wind-wave parameter uncertainty by adopting a set of wave variables - significant wave height (HS), mean wave period (Tm) and mean wave direction (θm) - from which a standard set of wave statistics was obtained (across annual, seasonal and monthly time-frame resolutions) in a consistent manner (Table 2)31,32. This is explained below in Data Generation Method. The resulting data over three frequencies and three variables, capturing seven statistical measures for HS and Tm, two for θm, and seven extremes statistics measures for annual HS, represents the entire dataset available for CMIP5-forced wave climate projection data. We note, however, that the USGS ensemble was not available to process with the COWCLIP code (section 2.3.1) - only annual and seasonal means and the 99th percentile of HS were accessible.

The flowchart of the experimental framework employed, and described below, is shown in Fig. 1.

Fig. 1 Flowchart of the COWCLIP2.0 experimental framework.

Data generation method

As part of the COWCLIP community framework, code was developed in Fortran90 to ensure consistent and precise computational data processing. The code comprises three functions (getStat.f, getStatDir.f and getHsEx.f) to calculate two standard sets of statistics, using sub-daily raw data from each standalone dataset19,20,21,22,23,24,25,26,27,28. During processing, the data was written to netCDF4 format. For information on access to (and guidelines for setup and usage of) the COWCLIP Fortran code, consult the Code Availability section.

Standard statistics - getStat.f and getStatDir.f

The getStat.f code was designed to estimate statistics valid for scalar variables (HS, Tm). The code was applied to each individual dataset separately19,20,21,22,23,24,25,26,27,28, enabling the calculation of seven wave statistics (mean, 10th, 50th, 90th, 95th, 99th percentiles, and maximum) for HS and Tm calculated for monthly, seasonal and annual time-frame resolutions. The seasonal statistics were computed on default seasons defined as DJF, MAM, JJA and SON. The output netCDF files derived from each individual dataset retained all the relevant metadata of the input file and the coordinate variables/statistics. The names of the output files contained the time-frames of the statistics processed and the temporal resolution of the input data.
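For readers who wish to reproduce the seven scalar statistics on their own data, the calculation can be sketched as follows. This is not the getStat.f code: the function names are hypothetical, and the percentile definition (linear interpolation between order statistics) is an assumption, since the interpolation rule used by getStat.f is not stated here.

```python
import math

# Default seasons as described above, keyed by calendar month.
SEASON_OF_MONTH = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
                   6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}


def percentile(sorted_values, q):
    """Percentile q (0-100) of an ascending list, by linear interpolation (assumed rule)."""
    if not sorted_values:
        raise ValueError("no data")
    pos = (len(sorted_values) - 1) * q / 100.0
    lower, upper = math.floor(pos), math.ceil(pos)
    frac = pos - lower
    return sorted_values[lower] * (1.0 - frac) + sorted_values[upper] * frac


def scalar_stats(values):
    """Mean, 10th/50th/90th/95th/99th percentiles and maximum of a scalar variable."""
    ordered = sorted(values)
    stats = {"mean": sum(ordered) / len(ordered), "max": ordered[-1]}
    for q in (10, 50, 90, 95, 99):
        stats[f"p{q}"] = percentile(ordered, q)
    return stats


# Hypothetical sub-daily Hs sample (m) for one grid point and one time frame.
hs_sample = [float(i) for i in range(101)]
hs_stats = scalar_stats(hs_sample)
```

The same function would be applied to the values falling in each month, season or year in turn; SEASON_OF_MONTH shows one way to assign time steps to the default seasons.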

The getStatDir.f code is analogous to the getStat.f, but it was designed to calculate circular statistics meaningful for directional variables such as θm. The code was applied to each standalone dataset19,20,21,22,23,24,25,26,27,28 (with available θm) providing 2 circular statistics (mean and standard deviation) over the time-frames described above (Table 2).
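Directional variables need circular rather than arithmetic statistics, because directions of 350° and 10° should average to a direction near 0° and not to 180°. A minimal sketch of a circular mean and standard deviation is given below. The function name is hypothetical, and the definition of the circular standard deviation used here, sqrt(-2 ln R) with R the mean resultant length, is a common convention that we assume for illustration; it is not necessarily the definition implemented in getStatDir.f.

```python
import math


def circular_mean_std(directions_deg):
    """Circular mean and standard deviation (both in degrees) of directions."""
    n = len(directions_deg)
    if n == 0:
        raise ValueError("no data")
    # Average the unit vectors pointing in each direction.
    s = sum(math.sin(math.radians(d)) for d in directions_deg) / n
    c = sum(math.cos(math.radians(d)) for d in directions_deg) / n
    mean_deg = math.degrees(math.atan2(s, c)) % 360.0
    # Mean resultant length R: 1 for identical directions, near 0 for scattered ones.
    r = min(1.0, math.hypot(s, c))
    std_deg = math.degrees(math.sqrt(-2.0 * math.log(r))) if r > 0 else float("inf")
    return mean_deg, std_deg


# Hypothetical mean wave directions straddling north.
mean_dir, std_dir = circular_mean_std([350.0, 10.0])
```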

Extremes statistics - getHsEx.f

The getHsEx.f code was designed to calculate an ETCCDI set of extreme annual Hs indices from the sub-daily Hs input data19,20,21,22,23,24,25,26,27,28 (Table 1). The code was applied to each standalone dataset separately after concatenating the COWCLIP standard historical and future time-slices in a time sequence. A baseline period of 1986–2005 was adopted for the relative statistics. The output netCDF files contained seven extreme statistics calculated annually (Table 3).

Table 3 Summary of the ETCCDI set of extreme significant wave height statistics included in the COWCLIP2.0 data set.

Data assembly method

The netCDF files generated from each standalone dataset using the code described above were used as a basis to build the collection of global wave climate projections following the standardization framework (see Fig. 1)31,32. In addition to removing parameter uncertainty, we also removed time-slice uncertainty between the processed datasets by using standardized historical (1979–2004) and future projection (2081–2100) time-slices. In terms of future emission scenarios, we processed data for two representative concentration pathways (RCPs)37: RCP4.5 and RCP8.5, defining a medium stabilization scenario (+4.5 W/m² forcing by the end of the 21st century) and a very high-emission scenario (+8.5 W/m² forcing by the end of the 21st century), respectively.

Before assembling, each independent netCDF file underwent a quality-control analysis. The relevant statistics were extracted from each file (i.e. derived from each standalone dataset). The data compliant with the COWCLIP standard time-slices for simulation (for each frequency resolution) was extracted, and then converted to a global grid at 1° spatial resolution. For consistency, a mask was applied to exclude areas that are not captured by the full ensemble set of simulations (e.g. some simulations did not consider particular enclosed/semi-enclosed areas and others did not archive model outputs across regions with latitudes >60°N or S). After the regridding process, a shoreline dataset was imposed on the full set of wave simulations to ensure consistency between all the gridded data at the shoreline. The resultant data is therefore temporally and spatially consistent, without ‘undesirable’ uncertainties that previously hampered intercomparison analysis. Users seeking particular simulations (i.e., original simulated data developed by a specific climate modelling group) can obtain them from the individual modelling groups or through a request via the COWCLIP portal (data accessibility).
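The common-coverage mask described above can be sketched as follows: once all members are on the same grid, a cell is retained only if every member provides a value there. This is an illustrative sketch, not the assembly code used for COWCLIP2.0. The function names are hypothetical, grids are represented as plain nested lists with None marking missing cells, and the regridding and shoreline steps are not shown.

```python
def common_mask(members):
    """True where every ensemble member has data; members are equally shaped 2-D grids."""
    n_rows, n_cols = len(members[0]), len(members[0][0])
    for grid in members:
        if len(grid) != n_rows or any(len(row) != n_cols for row in grid):
            raise ValueError("all members must share the same grid")
    return [[all(grid[i][j] is not None for grid in members) for j in range(n_cols)]
            for i in range(n_rows)]


def apply_mask(grid, mask):
    """Set cells outside the common coverage to missing (None)."""
    return [[value if keep else None for value, keep in zip(row, mask_row)]
            for row, mask_row in zip(grid, mask)]


# Hypothetical 2 x 2 grids of mean Hs (m) from two ensemble members.
member_a = [[1.2, 1.5], [None, 2.0]]   # no output in one enclosed sea cell
member_b = [[1.1, None], [1.8, 2.1]]   # no output in one high-latitude cell

mask = common_mask([member_a, member_b])
masked_a = apply_mask(member_a, mask)
masked_b = apply_mask(member_b, mask)
```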

Data Records

The full archived dataset38 comprising the different statistics described (consult the Data Generation Method) can be accessed through a Scientific Data recommended data repository: Australian Ocean Data Network (AODN) at DOI: 10.26198/5d91a9d00d60d.

The data set in total comprises 1372 files, with a total volume of 144 GB. The data is structured to mimic the Data Reference Syntax (DRS) used for CMIP (and related data sets) and was specifically based on the DRS of the Coordinated Regional Downscaling Experiment (CORDEX)39 (as described in the CORDEX archive design: https://www.cordex.org/publications/report-and-document-archives/). This means a consistent directory structure and file naming convention is employed. Some wave modelling groups performed analysis across ensemble members within a GCM defined differently to the ‘r1i1p1’ definitions used within CMIP. Where this has occurred, the value for ‘ensemble’ in the DRS takes values relevant to that climate modelling group rather than standard CMIP5 values. The DRS adopted for the global COWCLIP2.0 dataset is as follows:

Directories

global/<modelling_centre>/<GCM>/<experiment>/<ensemble>/<region>/<version>/<frequency>/<variable>

Filenames

<variable>_<region>_<modelling_centre>_<GCM>_<experiment>_<ensemble>_<frequency>_<start_date>-<end_date>.nc

Here, <region> takes the value “glob” and <version> is given in the form “vYYYYMM” (year/month). The Earth System Grid Federation convention is that files contain only one variable. However, as we have produced three standard wave variables with two or seven statistical measures for each, as well as extremes statistics for annual Hs, the files use <variable> values Hs, Tm, Dm, and HsEx, and each file contains multiple variables describing the statistics for that wave variable.
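As an illustration of the directory and filename templates above, the following sketch assembles both from their components. The function names are hypothetical and so are all example values (modelling centre, GCM, experiment, version, frequency label and dates); the actual tokens used in the archive should be taken from the repository catalogue. Only the template order, the “glob” region, the “vYYYYMM” version form and the four <variable> values come from the description above.

```python
import re
from pathlib import PurePosixPath

VALID_VARIABLES = {"Hs", "Tm", "Dm", "HsEx"}


def drs_directory(modelling_centre, gcm, experiment, ensemble, version,
                  frequency, variable, region="glob"):
    """Directory path following the COWCLIP2.0 DRS template."""
    if variable not in VALID_VARIABLES:
        raise ValueError(f"unknown variable: {variable}")
    if not re.fullmatch(r"v\d{6}", version):
        raise ValueError("version must have the form vYYYYMM")
    return PurePosixPath("global", modelling_centre, gcm, experiment, ensemble,
                         region, version, frequency, variable)


def drs_filename(variable, modelling_centre, gcm, experiment, ensemble,
                 frequency, start_date, end_date, region="glob"):
    """File name following the COWCLIP2.0 DRS template."""
    if variable not in VALID_VARIABLES:
        raise ValueError(f"unknown variable: {variable}")
    return (f"{variable}_{region}_{modelling_centre}_{gcm}_{experiment}_"
            f"{ensemble}_{frequency}_{start_date}-{end_date}.nc")


# All values below are hypothetical placeholders.
directory = drs_directory("CENTRE", "GCM-A", "rcp85", "r1i1p1", "v201910", "mon", "Hs")
filename = drs_filename("Hs", "CENTRE", "GCM-A", "rcp85", "r1i1p1", "mon",
                        "208101", "210012")
full_path = directory / filename
```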

The data were made CF compliant by ensuring the ‘standard_name’ field was not erroneously used, variable ‘long_name’ was defined consistent with the Fortran90 code and units were applied. No value for ‘_FillValue’ was provided and thus this has been omitted. Recommended global attributes are defined and included, drawing from the COWCLIP metadata table (Table 1) - which enable some additional compliance with the ACDD metadata standard.

Note that although every effort was made to ensure data adhered to both the CF and ACDD metadata conventions, the files are not strictly CF-compliant in time dimension - which uses units “years since” and “months since” the reference date. This is not advised by the CF convention since these values are ambiguous and depend on the calendar used. As the input data comes from CMIP5 models which use a variety of calendars and this information is not captured in the data generated by the getStat scripts, retrospectively applying calendar definitions was deemed to be less appropriate than using the more generic time definition, which is in line with the data produced by getStat.

Technical Validation

All contributing datasets have undergone previous validation, with each individual study providing a model-skill assessment of developed GCM-forced global wave simulations against waverider buoy observations, and/or wave hindcasts/reanalysis, as reference19,20,21,22,23,24,25,26,27,28. A comparison of model skill between all simulations relative to two well-validated historical datasets has also been conducted34, allowing an intercomparison of all simulated wave data under a common reference dataset.

The data produced for publication was verified to be numerically unchanged between the submitted netCDF, intermediate Matlab matrix, and final netCDF files. The GCM-forced global wave simulations were also compared against satellite altimetry data40 (1991–2017). Note that climate models are not constrained to reproduce the timing of natural climate variability in the ‘observational record’, and consequently, our climate model-driven wave simulations are not in phase with observations. Hence, we can test the performance of the climatology (distribution) of model vs altimeter wave heights only; Figs. 2 and 3 are examples of skill analyses that have previously been done with respect to satellite measurements.

Fig. 2 Taylor diagram for the annual mean of Hs (a) and Hs99 (b) over the global ocean relative to the satellite data over the period 1991–2017. The metrics shown are the spatial correlation (SC), normalized standard deviation (NSD) (given by σsim/σobs derived from a specific simulation and the satellite dataset40) and the centred-root-mean-square difference (CRMSD). The SC is shown by the azimuthal angle, the NSD by the radial distance from the origin, and the CRMSD by the distance from the reference point representing the satellite data (the yellow lines). Each colour denotes a specific model forcing and each symbol a specific modelling group. The symbols with a black outline denote the ensemble mean of each study group when suitable and the asterisk denotes the full multi-member ensemble mean.

Fig. 3 Taylor diagram for the annual mean of Hs in three sub-regions of the global ocean (North Pacific Ocean, Tropical Pacific and South Indian Ocean) relative to the satellite data over the period 1991–2017. The metrics shown are the spatial correlation (SC), normalized standard deviation (NSD) (given by σsim/σobs derived from a given simulation and the satellite dataset40) and the centred-root-mean-square difference (CRMSD). The SC is shown by the azimuthal angle, the NSD by the radial distance from the origin, and the CRMSD by the distance from the reference point representing the satellite data (the yellow lines). Legend as in Fig. 2.
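The three metrics shown in the Taylor diagrams can be computed from a simulated and an observed field as in the sketch below. This is a minimal illustration and not the analysis code behind Figs. 2 and 3: the function name and sample values are hypothetical, the fields are flattened to plain lists, and no area weighting of grid cells is applied, which a global analysis on a latitude-longitude grid would normally require. The three quantities are linked by the law of cosines, CRMSD² = σsim² + σobs² − 2 σsim σobs SC, which is what allows them to be displayed on a single diagram.

```python
import math


def taylor_metrics(sim, obs):
    """Spatial correlation, normalized std and centred RMS difference of two fields."""
    if len(sim) != len(obs) or not sim:
        raise ValueError("sim and obs must be non-empty and of equal length")
    n = len(sim)
    mean_sim, mean_obs = sum(sim) / n, sum(obs) / n
    anom_sim = [x - mean_sim for x in sim]
    anom_obs = [y - mean_obs for y in obs]
    std_sim = math.sqrt(sum(a * a for a in anom_sim) / n)
    std_obs = math.sqrt(sum(b * b for b in anom_obs) / n)
    sc = sum(a * b for a, b in zip(anom_sim, anom_obs)) / (n * std_sim * std_obs)
    # Centred: the mean of each field is removed before differencing.
    crmsd = math.sqrt(sum((a - b) ** 2 for a, b in zip(anom_sim, anom_obs)) / n)
    return {"SC": sc, "NSD": std_sim / std_obs, "CRMSD": crmsd,
            "std_sim": std_sim, "std_obs": std_obs}


# Hypothetical annual-mean Hs values (m) at four grid cells.
simulated = [1.0, 2.0, 3.0, 4.0]
observed = [1.5, 1.5, 3.5, 4.5]
metrics = taylor_metrics(simulated, observed)
```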

Usage notes

The data is published via the Australian Ocean Data Network (AODN). The metadata record is available via GeoNetwork at ‘DOI’: 10.26198/5d91a9d00d60d. The dataset is accessible via the AODN THREDDS server (netCDF files) and can be accessed remotely using the OPeNDAP protocol at: http://thredds.aodn.org.au/thredds/catalog/CSIRO/Climatology/COWCLIP2/catalog.html. OPeNDAP is a protocol that allows netCDF files to be accessed from a remote server as though they were local on the file system. It is an effective mechanism to remotely subset files to extract only an area or time period of interest. This reduces the need for data replication and download. OPeNDAP file access is supported through most tools which permit analysis of the netCDF data files, including MATLAB, R, Python, ArcGIS and many others.

Due to the ambiguous nature of the time dimension defined without a calendar attribute, these files may display unexpected timestamps when read with some tools. We advise data users that using this data with Python’s Iris library, or other libraries which depend on CF-compliance of the time dimension, may be problematic.
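One way to work around this is to read the time coordinate as raw numbers (for example, xarray's open_dataset accepts decode_times=False) and to convert the offsets explicitly. The sketch below converts “months since” offsets to (year, month) pairs. It is an illustration under stated assumptions: the function name and the reference date in the example are hypothetical, and the offsets are assumed to be whole calendar months counted from the reference month. Users should check the units attribute of the file they are reading.

```python
import re


def decode_month_offsets(units, offsets):
    """Convert 'months since YYYY-MM-...' offsets to (year, month) pairs."""
    match = re.match(r"\s*months since (\d{4})-(\d{1,2})", units)
    if match is None:
        raise ValueError(f"unsupported time units: {units!r}")
    ref_year, ref_month = int(match.group(1)), int(match.group(2))
    decoded = []
    for offset in offsets:
        if offset != int(offset):
            raise ValueError("fractional month offsets are not handled in this sketch")
        # Count months from year 0 so that year and month roll over together.
        total = ref_year * 12 + (ref_month - 1) + int(offset)
        decoded.append((total // 12, total % 12 + 1))
    return decoded


# Hypothetical units string and offsets.
dates = decode_month_offsets("months since 1979-01-01", [0, 11, 12])
```

Treating each time step as a (year, month) pair sidesteps the calendar ambiguity, since no day count is ever needed.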