A North Atlantic synthetic tropical cyclone track, intensity, and rainfall dataset

Xu, Wenwei; Balaguru, Karthik; Judi, David R.; Rice, Julian; Leung, L. Ruby; Lipari, Serena

doi:10.1038/s41597-024-02952-7

Download PDF

Data Descriptor
Open access
Published: 25 January 2024

A North Atlantic synthetic tropical cyclone track, intensity, and rainfall dataset

Scientific Data volume 11, Article number: 130 (2024) Cite this article

1320 Accesses
8 Altmetric
Metrics details

Subjects

A Publisher Correction to this article was published on 06 February 2024

This article has been updated

Abstract

Tropical Cyclones (TCs) cause significant socio-economic damages to the US and Caribbean coastal regions annually, making it important to understand TC risk at the local-to-regional scales. However, the short length of the observed record and the substantial computational expense associated with high-resolution climate models make it difficult to assess TC risk using either approach. To overcome these challenges, we developed a database of synthetic TCs using the Risk Analysis Framework for Tropical Cyclones (RAFT). The database includes 40,000 synthetic TC tracks, along-track intensities and storm-induced precipitation. TC tracks generated in RAFT are in reasonable agreement with the observed spatial distribution of TC tracks and basin-scale TC statistics. Specifically along the coast, spatial variations in TC crossing probability and extreme winds upon landfall are well-reproduced by RAFT with R-squared values of 0.81 and 0.73, respectively. In summary, the synthetic TC database constructed with RAFT provides a reasonable pathway for the robust assessment of North Atlantic TC wind and rainfall risks.

Estimation of global tropical cyclone wind speed probabilities using the STORM dataset

Article Open access 10 November 2020

Generation of a global synthetic tropical cyclone hazard dataset using STORM

Article Open access 06 February 2020

The Imperial College Storm Model (IRIS) Dataset

Article Open access 24 April 2024

Background & Summary

Tropical cyclones (TCs) are among the most destructive natural hazards for the North and Central American regions, and are responsible for 71% of fatalities and 78% of economic losses due to weather extremes over the period 1970–2019¹. Within the United States (US), they have caused over 6,500 fatalities and inflicted economic damages of about $1.1 trillion over the last 40 years². This makes it critical to understand the risk associated with Atlantic TCs at the regional-to-local scales, where damages manifest most profoundly³. Using observations to quantify TC risk is difficult, considering the short length of the reliable TC record and the fact that annually only 1–2 storms make US landfall on average⁴. While high-resolution climate models that can explicitly resolve TCs may provide a viable alternative, simulating a large number of storms using such models typically incurs a high computational cost.

To address these issues of data-scarcity and computational expense, a suggested approach has been to generate a synthetic TC record to supplement the historical record, providing a more complete picture of TC characteristics at regional-to-local scales. This approach was likely first introduced by Vickery et al.⁵, who generated a set of synthetic storm tracks in the Atlantic basin using a fully statistical approach. This method was improved upon in 2006 by Emanuel et al.⁶ by coupling two track generation models with a deterministic intensity model, ensuring that storm intensity conforms to the underlying physics. A similar statistical-dynamical approach was adopted by Lee et al.⁷ to simulate synthetic global tracks with an environmental index-based genesis model and an auto-regressive intensity model. More recently, Bloemendaal et al.⁸ developed a purely statistical algorithm to generate synthetic TC tracks at the global scale by means of synthetic resampling. Another recent study⁹ coupled a statistical Markov renewal TC track model with a physics-based rainfall model to simulate TC rainfall over South Korea. Further, several of these studies have made their data publicly available^7,8. However, the data provided only includes synthetic TC tracks and along-track intensities, and does not provide estimates of TC-induced rainfall over the North Atlantic region.

While the high winds associated with TCs directly cause substantial damage to buildings and structures, storm-induced flooding and the generation of coastal storm surge additionally have devastating impacts on life and infrastructure. For instance, the torrential rain from TC Harvey (2017) resulted in catastrophic flooding in the Houston metropolitan area, estimated to have been responsible for economic damages of about $97 billion¹. Later, in September 2018, TC Florence made landfall near Wilmington, North Carolina and became the ninth most intense hurricane to affect the US, largely due to extreme rainfall and flooding exacerbated by its slow progression¹⁰. This highlights the need for a TC database that includes precipitation in addition to winds to more holistically represent TC risk. To address this, we have developed a database of synthetic TCs using the Risk Analysis Framework for Tropical Cyclones (RAFT), a unified framework capable of simulating tens of thousands of TCs and their impacts based on the large-scale environment^11,12. RAFT consists of several different components coupled into one cohesive system, as illustrated in Fig. 1. RAFT uses established methods for storm genesis⁶, track translation⁶, and precipitation^13,14, while the intensity model is a novel deep neural network-based approach developed for this framework¹². We believe the data generated by RAFT is unique and valuable to both the scientific community and the general public by nature of the unique combination of RAFT’s salient features, as outlined below:

A variety of methods have been evaluated for each component of RAFT – many derived from established methods in the field of TC modeling – and we have selected those with the best balance between performance and computational efficiency. The resulting system utilizes a variety of techniques, from physical equations to linear models to deep neural networks.
We include storm-induced rainfall in our dataset, which despite being one of the primary drivers of TC risk² is generally not provided in other synthetic TC datasets. Since wind and rainfall parameters are generated together in a coupled approach, our method allows for a more consistent simulation and understanding of secondary risks such as compound flooding and power outages.
RAFT is computationally efficient, allowing many thousands of storms to be generated in a reasonable amount of time. Specifically, it takes roughly 3 hours to generate 40,000 synthetic TCs using the track model and 5 hours to calculate intensity labels when run on a desktop with four Intel Xeon CPUs @ 2.80 GHz each and 32 GB of memory. This allows RAFT to be used for TC risk assessment at a fine resolution over large spatial and temporal scales.
Importantly, RAFT can be run using simulated environmental conditions, such as those generated by climate models. It follows that RAFT can generate TCs representative of not just the historical or present climate, but potential future climates as well.

Using RAFT, we generated 40,000 synthetic TC tracks for the North Atlantic based on the current climate conditions obtained from ERA5¹⁵ reanalysis for the period 1979–2018. These climatic conditions – variables including large scale wind, sea-surface temperature, air temperature and humidity – are used to determine the various parameters of our synthetic storms. For each TC, we model its track, intensity, and rainfall pattern (0.08 resolution) at 6-hourly time steps. We make this data publicly available to facilitate a deeper understanding and further analysis of North Atlantic TC risk. While the RAFT framework has previously been applied to investigate the evolution of TC risk in the US¹¹, here we provide a more thorough description of each of the framework components, perform an in-depth validation of the output, and discuss the constituent models’ limitations and potential for generalizability.

Methods

Track

After TCs are generated as a random draw from a Gaussian kernel-based probability distribution centered around historically observed TC genesis locations⁶, TC tracks are subsequently modeled using a beta-advection method following Marks et al.¹⁶ and Emanuel et al.⁶. Essentially, this method assumes that a TC track is governed primarily by the background steering flow, which is estimated as a weighted mean of the large scale wind at 850 hPa and 200 hPa levels. While Emanuel et al.⁶ uses 850 and 250 hPa winds, we chose to use 850 and 200 hPa winds as implemented in Marks et al.¹⁶ to be consistent with how wind shear is calculated within the intensity model. For generation of the synthetic zonal and meridional wind values at these levels, three components are used: the mean monthly winds in the zonal and meridional directions at 850 hPa and 200 hPa, the covariance component between these four wind values, and four Fourier series per track which model periodic fluctuations. The resultant synthetic wind values fluctuate around the monthly mean with both coupled and individual stochastic contributions. For more details on the track model method, readers should refer to Emanuel et al.⁶. This flow-driven movement is then modified by the addition of a beta-drift correction term V_β, used to generate the movement of the cyclone each hour as:

$${{\rm{V}}}_{{\rm{track}}}=\alpha {{\rm{V}}}_{850{\rm{mPa}}}+\left(1-\alpha \right){{\rm{V}}}_{200{\rm{mPa}}}+{{\rm{V}}}_{\beta }$$

(1)

where α is a constant weight set to 0.8.

Rather than using a constant beta-drift correction as in Emanuel et al., the beta-drift here varies spatially, and is determined by a linear regression fitted on the large scale wind and latitude as suggested in Wu & Wang¹⁷ and Zhao et al.¹⁸ and implemented in Kelly et al.¹⁹. The tracks are simulated until one of the following conditions is met: the storm has lasted for 30 days; the track travels outside of the study domain (4°N – 50°N, 100°W – 5°W); or the intensity falls below a predefined threshold (see next section).

Intensity

Every 6 hours, the TC intensity (10 m maximum sustained 1-minute wind speed) at the current track location is estimated using the deep learning model as developed in Xu et al.¹². This model, trained on global Statistical Hurricane Intensity Prediction Scheme (SHIPS) predictors²⁰ from 1982 to 2021, forecasts 6-hourly intensity changes in the North Atlantic using a multilayer perceptron with multiple feed-forward, fully connected layers. Each hidden layer computes a linear combination of the outputs from the preceding layer and applies a nonlinear activation function, enabling the model to effectively capture complex, nonlinear relationships among a multitude of predictors. The initial maximum wind speed is set to be 33 knots (17 m/s), the beginning of Tropical Storm designation according to the Saffir-Simpson scale. The original intensity model used the following 9 environmental variables to estimate the intensity change: the current intensity, the last 6-hour intensity change, vertical wind shear, 200 hPa zonal wind, maximum potential intensity, latitude, longitude, 1000 hPa equivalent potential temperature, and distance to major landmass. Since then, we have made further improvements to the model. First, we added two additional variables inspired by DeMaria et al.²⁰ to account for the dry Saharan air layer and the westerly wind: low-level (850–700 hPa) relative humidity and the zonal component of the storm motion. Next, following Lee et al.⁷, the distance to major landmass predictor is replaced with the percentage of landmass within a 500 km distance from the storm center. The use of landmass percentage instead of distance to land helped correct some issues that were noted in the earlier version (e.g. storm intensity dropped too quickly when passing through the islands of Cuba and Hispaniola). While steering flow, vertical shear, storm longitude and latitude come from the track model, other variables like low-level relative humidity, maximum potential intensity and equivalent potential temperature come from the environmental conditions in ERA5 reanalysis.

Further, we also adopted the stochastic error component from the autoregressive intensity model described in Lee et al.²¹ and applied it to our deep learning model. The stochastic component works by randomly drawing errors from the training data, conditioned on the initial intensity. Since IBTrACS intensities are rounded to the nearest 5 kt interval and RAFT-predicted intensities are continuous, we obtain randomly drawn errors corresponding to the nearest two training intensity values. The probability of picking from either set of errors is based on the distance between the storm’s current intensity and the nearest two training intensities. For example, if we want to predict the intensity change for a model storm that is currently at 54 kt, the relevant training intensities would be 50 kt and 55 kt. In this situation, the model will draw an error randomly from the 55 kt bin $\frac{4}{5}$ of the time, and the 50 kt bin $\frac{1}{5}$ of the time. The final predicted 6-hour intensity change is then computed as the ML model’s predicted value plus this randomly drawn error correction.

While the track model uses monthly data to construct Fourier series as a way to generate random fluctuations, for intensity we use daily ERA5 reanalysis data¹⁵ to directly leverage daily variations of environmental conditions. To expose each synthetic TC to a variety of environmental conditions, a year between 1979 and 2018 is randomly assigned, and the ERA5 daily environmental conditions from that year are used as inputs. The intensity model continues along the track until the intensity drops below 25 kt. TCs lasting less than one day are considered transient and therefore removed from the dataset.

TC intensity is closely related to pressure differentials – specifically the low pressure at the storm center – and the radius of maximum winds (R_max). Generally, as the central pressure decreases, TC intensity increases and R_max tends to contract, or decrease, thereby generating stronger winds and more severe weather conditions. To quantify the low pressure at the storm center, we fitted a linear regression between minimum sea level pressure and the maximum wind speed (v_max) in knots. We trained the model using 2001–2016 North Atlantic data, and used 2017–2020 for testing. The model is able to explain 88% of the variance for the testing years, with a RMSE of 7.11 hPa. We compute pressure (hPa) given v_max as:

$$Pressure=1030.222-0.730\cdot {v}_{max}$$

(2)

Analytical wind generation also requires R_max as input, and here the R_max is calculated from storm intensity and latitude using a linear regression with logarithmic transformations fitted with 2001–2016 North Atlantic data, following Willoughby & Rahn²²:

$${R}_{max}=48.7\cdot \exp \left(0.0163637\cdot {\rm{latitude}}-0.01450866\cdot {v}_{max}\right)$$

(3)

where v_max is in kt, and the resulting R_max is in nautical miles.

Rainfall

TC-induced rainfall is simulated using the Tropical Cyclone Rainfall (TCR) model, a 4-component physics-based approach^13,14,23. The theory behind the model is that the TC precipitation rate P_rate is proportional to the upward vapor flux, which can be obtained by multiplying the vertical velocity (w) above the boundary layer with the the 925–1000 hPa specific humidity q_s. This P_rate~wq_s relationship is theoretically and empirically validated in Lu et al.¹⁴, where a thorough derivation and justification of each step of the calculation can be found. The final precipitation (in mm/hr) is computed in the TCR model by multiplying this upward vapor flux wq_s by a constant precipitation efficiency ε_p and the ratio of the densities of air ρ_air to liquid water ρ_liquid:

$${P}_{rate}={\varepsilon }_{p}\frac{{\rho }_{air}}{{\rho }_{liquid}}w{q}_{s}$$

(4)

A value of 0.0012 is used for the ratio of densities of air to water, and ε_p is set at 0.9. The vertical velocity (w) is estimated as the sum of contributions of four physical components (surface friction (w_f), topography (w_h), baroclinicity due to wind shear (w_s), and vortex stretching (w_t)):

$$w={w}_{f}+{w}_{h}+{w}_{s}+{w}_{t}$$

(5)

These components are estimated at each grid point over a 2-dimensional spatial domain as follows:

$${w}_{f}=-\frac{1}{r}\frac{\partial }{\partial r}\left({r}^{2}\frac{{\tau }_{\theta s}}{\partial M/\partial r}\right)$$

(6)

$${w}_{h}={\bf{V}}\cdot \Delta h$$

(7)

$${w}_{t}={H}_{b}\frac{1}{r}\frac{\partial }{\partial r}\left(r\frac{\partial M/\partial t}{\partial M/\partial r}\right)$$

(8)

$${w}_{s}=0.5fV\left({{\bf{S}}}_{200hPa-850hPa}\cdot {\bf{j}}\right)$$

(9)

The physical components are calculated using the following set of parameters:

r, distance to the storm center
τ_θs, azimuthal surface stress (modeled as a function of surface drag coefficient and wind speed)
M, angular momentum of wind
V, wind velocity vector
h, topographic height
H_b, representative depth scale of the lower troposphere
f, Coriolis parameter
∂M/∂t, rate of change of the angular momentum with respect to time
S_{200hPa–850hPa}, wind shear computed as the difference between geostrophic wind and the 200 hPa and 850 hPa levels (V_{g, 200hPa}–V_{g, 850hPa})
j, unit vector pointing radially away from the center of the storm

The rainfall generation model relies on the synthetic surface wind field, which is generated analytically according to Holland et al.²⁴ and as a function of R_max. Analytical wind and rainfall are generated at hourly time steps, and hourly TC locations and intensities are linearly interpolated from the 6-hourly track and intensity simulations.

Input data sources

Most atmospheric parameters are from ECMWF’s ERA5¹⁵ reanalysis, including large scale wind, relative humidity, wind shear, maximum potential intensity, equivalent potential temperature, and surface roughness (used in the frictional component of the rainfall model). US topographic elevation is derived from the USGS GMTED2010 project²⁵.

Data Records

The synthetic TC dataset²⁶ generated by RAFT is publicly accessible at https://zenodo.org/doi/10.5281/zenodo.10392723, which includes 40,000 simulated storm events with each 6-hourly track location, along-track intensity (maximum wind speed and minimum pressure), and radius of maximum winds. The NetCDF4 file named “RAFT.NA.v20231016.nc” contains the complete set of variables for the synthetic TCs, as described in Table 1.

Table 1 Variables used in the RAFT dataset NetCDF file.

Full size table

Additionally, for each synthetic event, we provide detailed data on storm-lifetime accumulated precipitation at each 0.08 by 0.08 degree grid point. This data is available in individual NetCDF4 files, named according to the convention “modeled_rainfall_ERA5_syn_i.h5”, where the i is the synthetic storm’s ID number. “ERA5” denotes the source of the reanalysis inputs, while “syn” clarifies that this file is for a synthetic track. Each of these files comprises five variables: total accumulated rainfall (p_accum) and its four constitutive vertical wind-induced rainfall components, frictional (p_accum_f), topographic (p_accum_h), shear (p_accum_s), and vortex stretching (p_accum_t), all measured in the unit of total millimeters of precipitation. During the calculation, each component of vertical wind is allowed to negatively contribute to the total precipitation, however in the final output of total precipitation and decomposition of accumulated precipitation, we convert any negative precipitation values to zeros to avoid misinterpretation. The rainfall data is presented on a regular spatial grid defined in “RAFT_rainfall_latlon_grid.h5”, which describes the grid in two variables, lat and lon.

Technical Validation

Spatial distribution of TCs

To assess the realism with which synthetic TCs are generated using RAFT, we begin by examining their spatial distribution. Figure 2a,b show 200 randomly selected TCs from observations and RAFT, respectively. Overall, the RAFT-simulated tracks match the observations from IBTrACS remarkably well, adequately capturing TC translation behavior in the Atlantic basin. There is a noticeable difference in the spatial distribution of sampled cyclogenesis points, where RAFT includes tracks initiated south of 10°N unlike what appears in the observations. This could partly be attributed to the methodology of RAFT cyclogenesis, which uses a 3-dimensional Gaussian kernel to expand the regions from where cyclones can originate beyond the observed subset of genesis points. RAFT tracks broadly tend to be clustered further offshore in the North Atlantic compared to observations, which is illustrated in the climatological track density or Tropical Cyclone Frequency (TCF) maps shown in Fig. 2c,d. TCF is defined as the number of 6-hourly TC track locations where the storm intensity exceeds 25 kt per square 2.5 degrees per TC event. Specifically along the coast, RAFT underestimates TCF when compared with observations, and landfalling TCs in RAFT are seen to dissipate sooner than observed TCs. This underestimation of track density near the coast is likely due to a systematic bias in the intensity model near land, which we plan to address in the future by developing a dedicated near-coastal intensity model. Despite these points, the spatial distribution of TCs is reasonably reproduced in RAFT, with a pixel-wise correlation coefficient of 0.87 (p < 0.001) when compared with observations.

Salient features of TCs

To further understand the simulation of TCs using RAFT, we now explore the representation of TC properties particularly relevant for impacts. RAFT captures the duration of TCs remarkably well, as illustrated in the lifespan distribution in Fig. 3a. Distributions of TC translation speeds (Fig. 3d) indicate that RAFT TCs move at comparable speeds to observed TCs, as indicated by the mean 6-hourly displacements for RAFT and observations which are 1.33° and 1.27°, respectively (Table 2). The third important aspect of TCs that we consider here is the intensification rate (Fig. 3c), defined as the change in TC intensity over a 24-hr period. The TC intensification rate distributions are remarkably consistent between the RAFT simulations and observations. Importantly, RAFT well represents the occurrence of TC rapid intensification (RI), defined as an instance where a storm increases in intensity by 30 kt or higher in 24 hours. RI occurs roughly 3.4% of the time in RAFT, closely aligning with the 3.8% occurrence rate in observational data. This is a major advantage of RAFT, considering the difficulty associated with simulating RI even for high-resolution dynamical models²⁷.

Table 2 Mean and standard deviation values for TC characteristics in the IBTrACS and RAFT datasets over the North Atlantic basin.

Full size table

Next, we compare distributions of TC lifetime maximum intensity (LMI), defined as the peak one-minute sustained wind speed reached during the lifetime of a TC. The PDF of LMI for RAFT simulations similarly resembles the LMI of observed TCs (Fig. 3b), including for storms that reach Category 3 and higher strength. This strong agreement between RAFT simulations and observations is further illustrated in Table 2. The mean LMI for RAFT simulations is 64.88 kt, similar to the observed mean LMI of 68.09 kt. Similarly, the mean maximum wind speed simulated in RAFT is 54.88 kt, closely matching the observed mean of 53.24 kt. However, there is a noticeable disparity in the mean TC radius of maximum winds (R_max), with RAFT’s average of 36.86 nautical miles reflecting a 19% underestimation when compared with the corresponding observed mean R_max of 45.28 nautical miles, as detailed in Table 2. Lastly, the mean lifetime minimum central pressure is 982.84 hPa, remarkably similar to the observed value of 979.44 hPa.

Coastal TC characteristics

So far, we have examined some basin-scale aspects of TCs simulated in RAFT. However, one main goal of producing the RAFT-based synthetic TC dataset is its potential use for risk assessment. Therefore, we will now focus our discussion on TC characteristics in coastal regions. Figure 4a depicts the observed TC crossing probability for 51 major U.S. cities. Historically, more TC crossings have occurred over the US Southeast Coast, followed by the Gulf Coast and the Northeast Coast. The spatial distribution of TC crossing is reasonably reproduced in RAFT (Fig. 4b) and is in broad agreement with observations despite some differences. For instance, RAFT tends to underestimate landfall over the Southeast Coast and the Gulf Coast. RAFT is however able to capture 81% of the observed variance in TC crossing probability for major U.S. cities, thereby modeling the spatial distribution of TC landfall and recurving fairly well (Fig. 4c). Next, we investigate the ability of RAFT to accurately represent TC winds in US coastal regions. Since extreme TC winds have the strongest impacts, here we consider the 99^th percentile TC winds upon landfall. The observed distribution of extreme TC winds reveals that most regions along the northern Gulf Coast, Florida peninsula and the Southeast US Coast have historically experienced the most devastating TC winds exceeding 90 kt, which is close to Category 3 strength (Fig. 5a). When considering the corresponding distribution produced by RAFT, we see that RAFT underestimates the magnitude of extreme TC winds by about 8 kt (Fig. 5b), particularly over the Gulf and Southeast Coasts. Despite this general underestimation, RAFT adeptly simulates several instances of intense TC landfalls. For example, extreme TC winds impacting New Orleans are close to Category 3 strength, and exceed Category 4 strength for Miami in both the observations and RAFT simulations. Overall, the spatial distribution of extreme TC winds (Fig. 5c) is well represented in RAFT with an R-squared value of 0.73 when compared with observations.

Finally, we evaluate TC-induced rainfall in coastal regions based on RAFT using daily rainfall data from the Global Historical Climate Network Daily Database (GHCN)²⁸ rain gauges. We select GHCN rain gauges within 2° of the US Atlantic and Gulf Coasts that have complete recordings of daily rainfall since 1979. Since rain gauge measurements are prone to underestimation due to the high winds associated with TC events, rain gauge measurements are bias-corrected using the wind-correction function by the World Meteorological Organization²⁹. Observed TC tracks from 1979 to 2018 are then used to estimate which precipitation is TC-induced, where a daily rainfall recording is considered to be TC-induced if an observed TC track passes within 500 km of the gauge during that day. We compare these TC rainfall observations with those simulated by RAFT. Despite potential limitations (e.g., observed precipitation within 500 km of a TC may have been generated by other storm systems, and larger TCs can induce precipitation outside of a 500 km radius), this method provides an indication of where RAFT may systematically over- or underestimate TC-induced rainfall.

Figure 6a shows the observed climatology of mean expected rainfall per event, a metric that represents the combined influence of TC track spatial distribution, storm intensity, and rainfall efficiency. TCs tend to produce more rainfall over the northern Gulf Coast, Florida peninsula and the Southeast Coast, and less rainfall over the Northeast Coast and the western Texas Coast. RAFT is able to broadly capture this spatial pattern (Fig. 6c) albeit with a weaker magnitude. In other words, RAFT broadly underestimates TC rainfall everywhere except for over the Florida peninsula (Fig. 6e). The mean error is −1.7 mm, with an undercatch of about 35%. This underestimation is likely due to the underestimation of coastal TCF and landfall intensity, as well as bias from the rainfall model. Despite this, RAFT-predicted mean expected rainfall has a statistically significant pixel-wise correlation coefficient of 0.86 (p < 0.001) when compared with observations, indicating that RAFT effectively captures mean expected rainfall spatial patterns within 2° from the coast. We also evaluate extreme rainfall from TCs using 99^th percentile daily rainfall. The spatial pattern of extreme TC rainfall (Fig. 6d) simulated by RAFT is broadly similar to the RAFT-simulated mean expected rainfall (Fig. 6c), with a high correlation coefficient of 0.95. However, the observed pattern of extreme TC rainfall (Fig. 6b) is patchy with larger values more widely distributed, having a correlation of only 0.60 with observed mean expected rainfall (Fig. 6a), indicating that extreme rainfall patterns may be more complex than what TCR is capable of modeling. Observed extreme TC rainfall appears more patchy than what is simulated by RAFT likely due to localized extreme events in the IBTrACS dataset. For example, the observed maxima occurs near the Texas Coast, and is likely associated with the torrential downpour from TC Harvey (2017). The difference map of extreme TC rainfall (Fig. 6f) further indicates that RAFT underestimates the magnitude over almost the entire coast except for the Florida peninsula. The mean bias in expected 99^th percentile daily rainfall simulated by RAFT is −39 mm (underestimation by 27% of the observed mean) and the pixel-wise correlation between RAFT-simulated 99^th percentile daily rainfall and that of observations is relatively lower (r = 0.39, p < 0.001). This broad underestimation and spatial bias is likely attributed to TC rainfall simulation challenges in addition to RAFT’s underestimation of TC frequency and winds upon landfall, noted previously (Figs. 4b, 5b). Nevertheless, RAFT is able to simulate some extreme rainfall events near major coastal cities, as illustrated in Fig. 7 with their respective along-track intensity and accumulated precipitation.

Usage Notes

RAFT is a computationally efficient framework that can be used to simulate numerous synthetic TCs along with their properties. Here, we utilize RAFT to generate 40,000 synthetic TCs reflecting the current North Atlantic TC climatology using climate conditions of the last 40 years²⁶. TC tracks, along-track intensity (maximum wind speed and minimum sea level pressure), radius of maximum winds and TC-induced rainfall are provided for comprehensive TC hazard assessment. The RAFT-generated dataset presented here enables the study of several downstream risks pertinent to the TC risk modeling community, including storm surge, freshwater flooding, wind hazard, and electrical power outages. The rainfall component of RAFT, which is not included in many similar approaches^6,7,8, makes our dataset unique and valuable for assessing TC flood risk.

In the pursuit of generating synthetic TCs, it is imperative to critically examine the inherent limitations, assumptions, and potential drawbacks associated with various model components that collectively constitute the RAFT model. The genesis and track models used here follow Emanuel et al.⁶ by generating storms as a random draw from the probability distribution of historical cyclone genesis points, then moving storms according to a weighted average of the ambient flow plus beta-drift. Notably, the track method employed here cannot capture the effects of nonlinear interactions between tropical and extratropical systems, therefore caution should be exercised in these cases. Since the beta-drift term is obtained from a specific pattern of large-scale environmental flows, adaptations may be necessary when applying this relationship to future climate scenarios. The intensity model used here has been trained on global data and thus may be extended globally, while at this time it has only been tested in the Atlantic basin. Additionally, as with any deep learning model, performance is expected to be better in regions with more frequent TC events than those with sparse events, which in part might explain the intensity model’s land-ocean bias. Lastly, the rainfall model implemented in RAFT is the physics-based TCR model^13,14,30 which we find to broadly underestimate TC-induced rainfall along the Atlantic and Gulf Coasts, and overestimate rainfall over the Florida peninsula. It’s important to highlight that this model assumes a fixed precipitation efficiency and is highly sensitive to the drag coefficient C_d, performing less satisfactorily when representing regions with large gradients in surface roughness²³. Xi et al.³⁰ found that TCR overestimates rainfall at the core and underestimates rainfall outside of the core, and other studies have found that TCR may not handle extratropical transition very well^6,14,23, thus RAFT track and precipitation accuracy is expected to be best in areas where TCs have not yet transitioned to extratropical storms. Overall, the ability of these models to produce a multitude of synthetic TCs makes RAFT a valuable tool for advancing our understanding of TCs and improving preparedness for severe weather events.

RAFT excels in simulating spatial variability in TC characteristics relevant for risk, such as TCF and the distribution of landfall locations. It also effectively captures basin-scale TC characteristics like translation speed, intensity, and rapid intensification. However, RAFT tends to underestimate extreme winds at landfall (Fig. 5b) and TC-induced precipitation over coastal regions (Fig. 6). Given these limitations, caution is advised when interpreting extreme wind and rainfall results at local-to-regional scales. Statistical bias-correcting techniques, such as quantile mapping³¹, can be employed to overcome these limitations and improve the potential of this data for risk assessment. While the track and intensity components of RAFT are interconnected, the framework offers flexibility for users, allowing the implementation of an alternative rainfall model suiting their requirements. To further address the challenges of simulating extreme rainfall near the coast, we are currently developing a deep-learning rainfall model specifically tailored for coastal risk assessment. Additionally, ongoing research efforts are focused on improving the spatial representation of storms and evaluating their impacts on risks such as urban flooding, power outages and damage to infrastructure. Our approach is also adaptable for future climate scenarios since RAFT can be driven by climate model simulations, and our upcoming work includes releasing RAFT simulations driven by CMIP6 future climate scenarios. Moreover, as RAFT is currently confined to the North Atlantic, plans are underway to expand its domain and increase the capability of RAFT to simulate synthetic TCs on a global scale.

Code availability

Code to read and process the RAFT synthetic TC dataset can be found at https://doi.org/10.5281/zenodo.7976242.

Change history

06 February 2024
A Correction to this paper has been published: https://doi.org/10.1038/s41597-024-03040-6

References

World Meteorological Organization. WMO Atlas of Mortality and Economic Losses from Weather, Climate and Water Extremes (1970–2019), vol. 1267 of WMO (World Meteorological Organization, Geneva, 2021).
NOAA Office For Coastal Management. Hurricane costs. https://coast.noaa.gov/states/fast-facts/hurricane-costs.html. Accessed 8-2-2021.
Sobel, A. H. et al. Tropical cyclone frequency. Earth’s Future 9, e2021EF002275, https://doi.org/10.1029/2021EF002275 (2021).
Article ADS Google Scholar
Klotzbach, P. J., Bowen, S. G., Pielke, R. & Bell, M. Continental US hurricane landfall frequency and associated damage: Observations and future risks. Bulletin of the American Meteorological Society 99, 1359–1376, https://doi.org/10.1175/BAMS-D-17-0184.1 (2018).
Article ADS Google Scholar
Vickery, P. J., Skerlj, P. F. & Twisdale, L. A. Simulation of hurricane risk in the U.S. using empirical track model. Journal of Structural Engineering 126, 1222–1237, https://doi.org/10.1061/(ASCE)0733-9445(2000)126:10(1222) (2000).
Article Google Scholar
Emanuel, K., Ravela, S., Vivant, E. & Risi, C. A statistical deterministic approach to hurricane risk assessment. Bulletin of the American Meteorological Society 87, 299–314, https://doi.org/10.1175/BAMS-87-3-299 (2006).
Article ADS Google Scholar
Lee, C.-Y., Tippett, M. K., Sobel, A. H. & Camargo, S. J. An environmentally forced tropical cyclone hazard model. Journal of Advances in Modeling Earth Systems 10, 223–241, https://doi.org/10.1002/2017MS001186 (2018).
Article Google Scholar
Bloemendaal, N. et al. Generation of a global synthetic tropical cyclone hazard dataset using STORM. Scientific Data 7, 40, https://doi.org/10.1038/s41597-020-0381-2 (2020).
Article PubMed PubMed Central Google Scholar
Alcantara, A. L. & Ahn, K.-H. Estimation of tropical cyclone (tc) rainfall risk in south korea using the integrated tc track and semi-physical tc rainfall models. International Journal of Climatology 43, 2776–2793 (2023).
Article ADS Google Scholar
Callaghan, J. Extreme rainfall and flooding from Hurricane Florence. Tropical Cyclone Research and Review 9, 172–177, https://doi.org/10.1016/j.tcrr.2020.07.002 (2020).
Article Google Scholar
Balaguru, K. et al. Increased US coastal hurricane risk under climate change. Science advances 9, eadf0259, https://doi.org/10.1126/sciadv.adf0259 (2023).
Article PubMed PubMed Central Google Scholar
Xu, W. et al. Deep learning experiments for tropical cyclone intensity forecasts. Weather and Forecasting 36, 1453–1470, https://doi.org/10.1175/WAF-D-20-0104.1 (2021).
Article ADS Google Scholar
Zhu, L., Quiring, S. M. & Emanuel, K. A. Estimating tropical cyclone precipitation risk in Texas. Geophysical Research Letters 40, 6225–6230, https://doi.org/10.1002/2013GL058284 (2013).
Article ADS Google Scholar
Lu, P., Lin, N., Emanuel, K., Chavas, D. & Smith, J. Assessing hurricane rainfall mechanisms using a physics-based model: Hurricanes Isabel (2003) and Irene (2011). Journal of the Atmospheric Sciences 75, 2337–2358, https://doi.org/10.1175/JAS-D-17-0264.1 (2018).
Hersbach, H. et al. The ERA5 global reanalysis. Quarterly Journal of the Royal Meteorological Society 146, 1999–2049, https://doi.org/10.1002/qj.3803 (2020).
Article ADS Google Scholar
Marks, D. G. The beta and advection model for hurricane track forecasting. NOAA technical memorandum NWS NMC (1992).
Wu, L. & Wang, B. Assessing impacts of global warming on tropical cyclone tracks. Journal of Climate 17, 1686–1698, 10.1175/1520-0442(2004)017<1686:AIOGWO>2.0.CO;2 (2004).
Zhao, H., Wu, L. & Zhou, W. Observational relationship of climatologic beta drift with large-scale environmental flows. Geophysical Research Letters 36, https://doi.org/10.1029/2009GL040126 (2009).
Kelly, P. et al. Shape of Atlantic tropical cyclone tracks and the Indian monsoon. Geophysical Research Letters 45, 10,746–10,755, https://doi.org/10.1029/2018GL080098 (2018).
Article Google Scholar
DeMaria, M., Mainelli, M., Shay, L. K., Knaff, J. A. & Kaplan, J. Further improvements to the Statistical Hurricane Intensity Prediction Scheme (SHIPS). Weather and Forecasting 20, 531–543, https://doi.org/10.1175/WAF862.1 (2005).
Article ADS Google Scholar
Lee, C.-Y., Tippett, M. K., Sobel, A. H. & Camargo, S. J. Autoregressive modeling for tropical cyclone intensity climatology. Journal of Climate 29, 7815–7830 (2016).
Article ADS Google Scholar
Willoughby, H. E. & Rahn, M. E. Parametric representation of the primary hurricane vortex. Part I: Observations and evaluation of the Holland (1980) model. Monthly Weather Review 132, 3033–3048, https://doi.org/10.1175/MWR2831.1 (2004).
Article ADS Google Scholar
Feldmann, M., Emanuel, K., Zhu, L. & Lohmann, U. Estimation of Atlantic tropical cyclone rainfall frequency in the United States. Journal of Applied Meteorology and Climatology 58, 1853–1866, https://doi.org/10.1175/JAMC-D-19-0011.1 (2019).
Article ADS Google Scholar
Holland, G. J., Belanger, J. I. & Fritz, A. A revised model for radial profiles of hurricane winds. Monthly Weather Review 138, 4393–4401, https://doi.org/10.1175/2010MWR3317.1 (2010).
Article ADS Google Scholar
Danielson, J. J. & Gesch, D. B. Global multi-resolution terrain elevation data 2010 (GMTED2010). USGS Numbered Series 2011–1073, U.S. Geological Survey. https://doi.org/10.3133/ofr20111073 (2011).
Xu, W. et al. North Atlantic synthetic tropical cyclone track, intensity, and rainfall dataset from RAFT, Zenodo, https://doi.org/10.5281/zenodo.10392723 (2023).
Patricola, C. M. & Wehner, M. F. Anthropogenic influences on major tropical cyclone events. Nature 563, 339–346, https://doi.org/10.1038/s41586-018-0673-2 (2018).
Article ADS CAS PubMed Google Scholar
Menne, M. J., Durre, I., Vose, R. S., Gleason, B. E. & Houston, T. G. An overview of the Global Historical Climatology Network-Daily database. Journal of Atmospheric and Oceanic Technology 29, 897–910, https://doi.org/10.1175/JTECH-D-11-00103.1 (2012).
Article ADS Google Scholar
Sevruk, B. & Hamon, W. International comparison of national precipitation gauges with a reference pit gauge, instruments and observing methods. Rep 17 (1984).
Xi, D., Lin, N. & Smith, J. Evaluation of a physics-based tropical cyclone rainfall model for risk assessment. Journal of Hydrometeorology 21, 2197–2218 (2020).
Article ADS Google Scholar
Cannon, A. J., Sobie, S. R. & Murdock, T. Q. Bias correction of GCM precipitation by quantile mapping: How well do methods preserve changes in quantiles and extremes? Journal of Climate 28, 6938–6959, https://doi.org/10.1175/JCLI-D-14-00754.1 (2015).
Article ADS Google Scholar

Download references

Acknowledgements

This work was supported by the Multisector Dynamics and Regional and Global Model Analysis program areas of the U.S. Department of Energy (DOE), Office of Science, Office of Biological and Environmental Research as part of the multi-program, collaborative Integrated Coastal Modeling (ICoM) project. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility located at Lawrence Berkeley National Laboratory, operated under Contract No. DE-AC02-05CH11231 using NERSC award BER-ERCAP0024320. The Pacific Northwest National Laboratory is operated for DOE by Battelle Memorial Institute under contract DE-AC05-76RL01830. We thank the Chief Editor Dr. Guy Jones, Editorial Board Member Dr. Soroosh Sorooshian, and the two anonymous reviewers for their insightful comments that helped improve this manuscript significantly. We also acknowledge Dr. Effy Baby John and Nicholas Lalo for help reproducing and testing the parametric hurricane rainfall model with moisture (PHRaMM) as an alternative rainfall model during the paper revision process.

Author information

Authors and Affiliations

Pacific Northwest National Laboratory, Richland, 99354, USA
Wenwei Xu, Karthik Balaguru, David R. Judi, Julian Rice, L. Ruby Leung & Serena Lipari

Authors

Wenwei Xu
View author publications
You can also search for this author in PubMed Google Scholar
Karthik Balaguru
View author publications
You can also search for this author in PubMed Google Scholar
David R. Judi
View author publications
You can also search for this author in PubMed Google Scholar
Julian Rice
View author publications
You can also search for this author in PubMed Google Scholar
L. Ruby Leung
View author publications
You can also search for this author in PubMed Google Scholar
Serena Lipari
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

W.X. and K.B. primarily designed and developed the RAFT model and the resulting database. W.X., K.B. and J.R. wrote the paper. D.J. and L.R.L. were actively involved in the interpretation of the model output and the writing process. S.L. aided in revising the manuscript and contributed to the Usage Notes section.

Corresponding author

Correspondence to Wenwei Xu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Xu, W., Balaguru, K., Judi, D.R. et al. A North Atlantic synthetic tropical cyclone track, intensity, and rainfall dataset. Sci Data 11, 130 (2024). https://doi.org/10.1038/s41597-024-02952-7

Download citation

Received: 12 June 2023
Accepted: 11 January 2024
Published: 25 January 2024
DOI: https://doi.org/10.1038/s41597-024-02952-7