Introduction

Among all tropical cyclone (TC) basins worldwide, the Western North Pacific (WNP) (Fig. 1a) has the most active TC activity in terms of frequency and intensity1. Each year, about 25 named TCs form in the WNP, where seasonal TC frequency rises to its peak in June to August (JJA) and declines in September to November (SON) (Fig. 1b). Thus, the active TC season in the WNP (86% of total annual TCs) runs from June to November (JJASON). About 16 TCs eventually intensify into Tropical Storm category and nine TCs continue to become major storms1, which makes TCs one of the most destructive natural hazards in the region. However, TCs can still develop and cause considerable damage during the less active (i.e., December to February) and quiescent (i.e., March to May) TC seasons2,3.

Fig. 1: Characteristics of tropical cyclones (TC) in the Western North Pacific (WNP).

a Location of TC genesis in Domain 1 (blue) and Domain 2 (red). The large dots indicate the mean TC genesis location in Domain 1 (blue) and Domain 2 (red). The crosses represent the mean TC genesis location in June-August (JJA; blue) and September-November (SON; red). b Monthly relative TC frequency in the WNP (black), Domain 1 (blue), and Domain 2 (red). The yellow box indicates the active TC season during June-November (JJASON). c, d Correlation heatmap of mean seasonal TC genesis in Domains 1 and 2, respectively. e Correlation heatmap of mean seasonal TC genesis between Domains 1 and 2. f–h Time series of seasonal TC genesis frequency in JJASON, JJA, and SON, respectively, in Domain 1 (blue) and Domain 2 (red). The significance of correlation is tested using a two-tailed Student’s t test.

Previous studies have reported increasing economic losses and numbers of people affected by TCs, along with recent changes in the characteristics of TCs in the WNP such as increasing activity and peak intensity4,5, poleward migration of lifetime maximum intensity6, and interdecadal changes in TC translation speed7. These changes underscore the need to provide reliable, timely, and more granular seasonal TC prediction in the WNP. Seasonal TC prediction generally involves two key components: long-range climate forecasting, which predicts the future climate state and/or its evolution across time and space, and TC forecasting, which predicts the response of TCs to that future climate state8,9,10.

The pioneering works in seasonal TC prediction were mainly based on the relationship between TCs and El Niño Southern Oscillation (ENSO)11,12,13,14. At present, several forecasting agencies worldwide continue to use predictors that are closely related to the predicted evolution of ENSO and other related sea surface temperature (SST)-based indices8,15. One challenge in ENSO-based seasonal prediction is the springtime predictability barrier, in which the prediction skill of ENSO becomes relatively low during the boreal spring16; this clouds the ENSO-based TC outlook right before the onset of the active TC season in the WNP. Furthermore, the SST anomalies associated with ENSO events exhibit asymmetrical patterns between the western and eastern parts of the WNP. During warm (cold) ENSO events, more (fewer) TCs occur in the eastern half-court of the WNP, and a contrasting pattern typically occurs in the western half of the WNP1. Such asymmetry in the ENSO-related large-scale environment favorable for TC development in the WNP is expected to prompt a similarly asymmetric response of TC activity1,2,3,17.

Meanwhile, the temporal variability of TC genesis is also expected to change with the seasonality of the Northern Hemisphere and the WNP itself. For example, the more poleward location of mean TC genesis in JJA (Fig. 1a) is attributed to the northward shift of the Intertropical Convergence Zone during the boreal summer18, while genesis locations shift southward during the later seasons2. The more westward mean TC genesis in JJA than in SON can likewise be attributed to the location of the WNP monsoon trough during the boreal summer, where conditions are generally more conducive to TC formation17. The spatial and temporal contrasts in seasonal TC genesis frequency and the associated large-scale environment are discussed in the succeeding analysis.

Another challenge in seasonal TC prediction is providing an outlook with sufficient lead time. A seasonal outlook is preferably released during the pre-season, or one to two months ahead of the start of the active TC season (i.e., JJASON). With this earlier release, end-users can properly utilize the seasonal TC outlook for strategic planning and resource mobilization, particularly in disaster risk reduction. There is also a growing demand for more granular seasonal TC outlooks for smaller sub-regions (or even at country level) of the WNP (e.g., Southeast Asia, East Asia). Treating the WNP as a homogeneous spatial entity might overlook smaller sources of variability in its sub-regions, which raises questions about the spatial and temporal homogeneity of TC characteristics in the WNP. For example, one report highlights increasing TC activity in East Asia during the boreal autumn, whereas the same cannot be said for Southeast Asia or for TCs that remain over the open ocean5.

These challenges led us to explore an alternative approach to the seasonal prediction of TC genesis frequency in the WNP that uses the response of TCs to the asymmetric seasonality and spatial contrast of the large-scale environments favorable (or unfavorable) for TC development. Different long-range prediction models are developed for the western (Domain 1) and eastern (Domain 2) half-courts of the WNP, while another set of long-range forecasts is implemented for different seasons. Our primary aim is to show that taking advantage of the spatial and seasonal asymmetries in the characteristics of TC genesis in the WNP improves long-range TC forecasting relative to the conventional approach of clustering and predicting all TCs in the WNP together. The proposed approach is expected to provide more granular information with sufficient lead time, and to be simple (e.g., straightforward and easy to implement), open-source (e.g., accessible environment), and interoperable (e.g., accessible in different computing languages), so that it can offer operational guidance for forecasting agencies in the WNP in creating their respective seasonal TC outlooks. As a way forward, we expect that the proposed approach can serve as a platform for seamless seasonal TC prediction, including landfalling TCs and corresponding metrics of TC activity.

Results

Asymmetric seasonality of tropical cyclones in the Western North Pacific

The TC season in the WNP runs across two distinct meteorological seasons: boreal summer (JJA) and autumn (SON). The derivative seasons in between them (i.e., JAS and ASO) are often called transitional seasons. It was previously reported that the relationship between the WNP summer monsoon and convective activities around the Philippines dramatically reverses from the boreal summer to the boreal winter19, which can be attributed to the weakening and subsequent termination of the summer monsoon flow in the WNP. A stronger summer monsoon flow leads to the eastward extension of the monsoon trough17, which is generally considered to provide a favorable environment (i.e., increased background vorticity, a moist environment, and less vertical wind shear) for increased TC genesis in the WNP20.

We support these previous reports by presenting more evidence on the asymmetric seasonality of TCs from 1984 to 2020 (n = 37 years) in both Domains 1 and 2 (Fig. 1c, d). Unless otherwise stated, we use this period in the succeeding analysis. In Domain 1, TC genesis in JJA is not significantly correlated with that in SON. While the correlation of TC genesis between JJA and SON in Domain 2 is marginally significant (r = 0.36, p < 0.05), this correlation, as in Domain 1, decreases considerably across the intermediary seasons. Interestingly, the cross-correlation of seasonal TC genesis between Domains 1 and 2 shows an increasingly negative correlation that becomes more significant during the transition seasons (Fig. 1e), which indicates two things: weaker spatial asymmetry in the large-scale environment favorable for TC genesis in JJA than in SON, and stronger asymmetric seasonality of TC genesis in JJA and SON. Moreover, we note that the stronger negative correlation of TC genesis in SON (r = –0.54, p < 0.01) than in JJA (r = –0.26) might have contributed more to the significant negative correlation in JJASON (r = –0.50, p < 0.01) (Fig. 1f–h). We further corroborate these findings by showing that the correlation of outgoing longwave radiation (OLR) with TC genesis has a weaker asymmetric pattern in JJA than in SON (Supplementary Fig. 1). Such seasonal asymmetry in convective activities from JJA to SON is possibly due to the gradual weakening and/or termination of the WNP summer monsoon during the intermediary seasons17,18.
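
The significance levels quoted above can be checked with a short calculation. The following is a minimal sketch, assuming the 37 seasonal values are treated as independent samples (no adjustment for serial autocorrelation) and that the test is the usual two-tailed Student's t test on a Pearson correlation with n − 2 degrees of freedom. The function names are hypothetical, and the p-value is obtained by numerically integrating the t density rather than by calling a statistics library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def correlation_p_value(r, n, steps=2000):
    """Two-tailed p-value for a Pearson correlation r from n paired samples.

    Assumes |r| < 1 and an even number of Simpson steps.
    """
    df = n - 2
    t = abs(r) * math.sqrt(df / (1.0 - r * r))
    # Simpson's rule for the probability mass between 0 and |t|.
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = total * h / 3.0
    return max(0.0, 1.0 - 2.0 * central_mass)

# Correlations quoted in the text, all with n = 37 years.
for label, r in [("Domain 2, JJA vs SON", 0.36),
                 ("Domain 1 vs Domain 2, JJA", -0.26),
                 ("Domain 1 vs Domain 2, SON", -0.54),
                 ("Domain 1 vs Domain 2, JJASON", -0.50)]:
    print(f"{label}: r = {r:+.2f}, p = {correlation_p_value(r, 37):.3f}")
```

Under these assumptions, r = 0.36 falls below the 0.05 level, r = –0.54 and r = –0.50 fall below the 0.01 level, and r = –0.26 does not reach the 0.05 level, in agreement with the values reported above.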

The composite differences between JJA and SON show significant and contrasting patterns in TC tracks and in large-scale environmental conditions such as SST, OLR, and vertical wind shear (VWS) in Domains 1 and 2 (Fig. 2a–d). During JJA, warmer SSTs, lower OLR, and reduced VWS are evident in Domain 1, which are more favorable conditions for TC development. Warmer SSTs thermodynamically fuel TC development, while increased cloudiness (for which lower OLR is a proxy) and lower VWS promote positive feedback mechanisms that further support TC development. These patterns are reversed in SON, which could additionally explain the asymmetric seasonality and significant negative correlation in the time series of TC genesis in the WNP (Fig. 1c–h).

Fig. 2: Asymmetric seasonality of tropical cyclones (TC) and large-scale environmental variables in the Western North Pacific (WNP).

a Composite difference map of TC tracks between June-August (JJA) and September-November (SON) in the WNP. The mean positions of the monsoon trough during JJA (blue dashed line) and SON (red dashed line) are also displayed. b–d Composite difference maps of sea surface temperature, vertical wind shear, and outgoing longwave radiation between JJA and SON, respectively. The dots denote significant correlation at p < 0.05 tested using a two-tailed Student’s t test.

Many studies on seasonal TC characteristics consider the entire WNP as a homogeneous entity in terms of its spatial characteristics21. However, some studies have reported otherwise. In particular, an increase in TC genesis is usually observed in the southeastern quadrant of the WNP during an El Niño event1,2,17,18,22, which is attributed to a more favorable environment for TC formation such as warmer SSTs, increased background vorticity, reduced VWS, and an extended monsoon trough20. Conversely, during La Niña the active TC development region shifts towards the western part of the WNP1, where the associated large-scale environment tends to be generally unfavorable for TC genesis. Meanwhile, the Pacific Meridional Mode (PMM), another SST-based, non-ENSO climate mode that influences interannual TC variability in the WNP, is reported to be more significantly correlated with TC genesis frequency in the eastern half than in the western half of the WNP23. Unlike ENSO, the PMM reaches its maximum seasonal amplitude during the boreal spring24, which makes it useful as a potential predictor of seasonal TC variability in the WNP.

We further support the asymmetric variability of TCs in the WNP between Domain 1 and Domain 2 by showing the correlation of seasonal TC genesis frequency with OLR (Supplementary Fig. 1a–i) and SST (Fig. 3a–i). The asymmetric pattern becomes more prominent in SON, approximately along 140°E, for both Domains 1 and 2 (Supplementary Fig. 1f, i). The negative correlation between SST and TC genesis frequency is stronger in Domain 2 than in Domain 1 across all seasons (Fig. 3d–i), particularly in SON. The spatial correlation pattern in Domain 1 is less asymmetric during JJA than during SON. These findings further confirm that the spatial correlations of TC genesis frequency with OLR and SST show less contrast in JJA than in SON.

Fig. 3: Relationship of tropical cyclone (TC) genesis frequency and sea surface temperature (SST).

a–c Spatial correlation of indicated seasonal TC genesis frequency in the entire WNP (black box) and SST. d–f Spatial correlation of indicated seasonal TC genesis frequency in Domain 1 and SST. g–i Same as (d–f) but in Domain 2. The black boxes indicate the location of the indicated domains. The dots denote significant correlation at p < 0.05 tested using a two-tailed Student’s t test.

Furthermore, we note that the contrast in spatial correlation between TC genesis frequency and OLR (Supplementary Fig. 1a–c) and SST (Fig. 3a–c) is less significant in JJASON. Our findings show that the asymmetric seasonality is more prominent in SON than in JJA, particularly in Domain 2. Based on these findings on the spatial and seasonal asymmetry of TCs in the WNP, we split the time series of seasonal TC genesis frequency into two half-courts along 140°E (i.e., Domain 1 and Domain 2) and used them in the development of seasonal TC prediction models.

Forward predictor selection

Given our secondary aim to develop an interoperable (i.e., accessible in different computing languages) prediction system, we selected several climate indices as potential predictors in our analysis that are regularly updated (i.e., sufficient historical data), open-source (i.e., accessible environment) and freely downloadable from reputable institutions such as the National Oceanic and Atmospheric Administration25,26 and Japan Meteorological Agency27 (Supplementary Table 1).

We also developed a forward predictor selection technique based on collinearity reduction in the pool of identified climate indices. To ensure the timely release of the seasonal TC outlook, Lead 1 uses the pre-season months (i.e., January to April) because the updated climate indices for April are typically released in the second half of May. Lead 2 uses in-season predictors (i.e., January to May) because the updated climate indices for May are released in the second half of June, by which time the TC season has already started (hence the term in-season). Accordingly, Lead 3 (i.e., February to June) and Lead 4 (i.e., March to July) also use in-season predictors. Therefore, the TC genesis frequency in JJA can only be predicted with Lead 1 predictors, while the TC genesis frequency in SON can use predictors from Lead 1 to Lead 4. Based on this predictor selection, the WNP Trade Wind (TW) index is identified as the primary predictor of TC genesis frequency in Domain 1 for Lead 1 and Lead 2, whereas the PMM becomes the primary predictor for Lead 3 and Lead 4 (Supplementary Table 2). Meanwhile, the PMM is the primary predictor of TC genesis frequency in Domain 2 at all leads (Supplementary Table 3). Further details of the forward predictor selection are provided in the Methods under predictor selection and regression modeling, while the mechanisms associated with these primary predictors are discussed in the succeeding analysis.
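
The lead windows and the idea of selecting predictors while reducing collinearity can be illustrated as follows. This is only a sketch under stated assumptions: the month windows and season restrictions come from the paragraph above, but the greedy ranking rule, the collinearity threshold of 0.7, the cap of three predictors, and all function and index names are hypothetical stand-ins. The authors' actual procedure is the one described in their Methods.

```python
import math

# Predictor windows from the text, as inclusive (first month, last month).
LEAD_MONTHS = {1: (1, 4), 2: (1, 5), 3: (2, 6), 4: (3, 7)}

def leads_available(season):
    """JJA can only use Lead 1; SON can use Lead 1 to Lead 4."""
    return [1] if season == "JJA" else [1, 2, 3, 4]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def forward_select(candidates, target, max_collinearity=0.7, max_predictors=3):
    """Greedy sketch: rank candidates by |r| with the target, then keep a
    candidate only if it is not collinear with any predictor already kept.
    The threshold and the cap are illustrative assumptions."""
    ranked = sorted(candidates,
                    key=lambda name: abs(pearson(candidates[name], target)),
                    reverse=True)
    chosen = []
    for name in ranked:
        if len(chosen) == max_predictors:
            break
        if all(abs(pearson(candidates[name], candidates[kept])) < max_collinearity
               for kept in chosen):
            chosen.append(name)
    return chosen

# Synthetic series for illustration only (not real climate indices).
target = [3, 5, 4, 6, 8, 7, 9, 10]
candidates = {
    "index_a": [1, 2, 2, 3, 4, 4, 5, 5],
    "index_b": [1.1, 2.0, 2.1, 3.0, 4.1, 4.0, 5.1, 5.0],  # near-copy of index_a
    "index_c": [1, -1, 1, -1, 1, -1, 1, -1],
}
print(leads_available("SON"), forward_select(candidates, target))
```

In this toy pool, the two nearly identical indices are never selected together, which is the behavior a collinearity screen is meant to guarantee.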

Long-range tropical cyclone prediction

We employed Ordinary Least Squares (OLS) regression in our statistical prediction of seasonal TC genesis frequency. Compared with other linear regression techniques, OLS is one of the most straightforward statistical prediction methods: it is relatively simple to implement and can produce accurate forecasts when applied to suitable datasets and conditions. Furthermore, while dynamical and hybrid models have made significant strides in TC prediction, statistical methods remain valuable because they leverage historical relationships between TC activity and various climate drivers and environmental factors, which makes them easy to implement and update8. Owing to advancements in numerical climate models and the accessibility of high-performance supercomputing facilities, dynamical models have become an appealing approach for seasonal TC forecasts, but they are expensive and resource-intensive to implement8,15,21. Moreover, uncertainties due to inherent or systematic biases and errors remain a challenge, as model performance depends strongly on both the data and model predictability. Forecast agencies opt for model ensembles to minimize uncertainties; however, this has proven to be resource-intensive in terms of computing power and time.

On the other hand, statistical approaches are cost-effective, as they are often less computationally intensive than dynamical models. Statistical models often rely on open-source tools and are designed with interoperability in mind, which makes them user-friendly and adaptable to various data sources and platforms8,15,21. While we do not suggest that one approach is generally superior to the other, dynamical forecasts require statistical pre- and post-processing to be competitive with or superior to statistical models27. Hence, statistical models are comparable with dynamical schemes, and arguably more practical and efficient owing to their shorter development cycles and flexibility, which allow economical and rapid implementation.

Initially, we implemented the forward predictor selection to predict the TC genesis frequency in the homogeneous WNP using a no-split approach. A no-split approach means that the entire time series is used as training data for prediction; hence, there is no split in the time series. The predicted TC genesis frequency in the WNP is significantly correlated with the actual TC genesis frequency during JJA (r = 0.65, p < 0.01) and SON (r = 0.65, p < 0.01), giving a combined correlation in JJASON of r = 0.71 (p < 0.01) (Supplementary Fig. 2a–c). Note that the predicted TC genesis frequency in JJASON is the sum of the predictions in JJA and SON. Although these scores are already good, we ask whether they can be improved further. By exploiting the asymmetric seasonality of TC variability in the WNP during JJA and SON, we propose an alternative approach to long-range TC prediction that separately predicts the seasonal TC genesis frequency in Domains 1 and 2 during JJA and SON (Fig. 4a–i). The predictions in Domain 1 and Domain 2 in JJA are added to obtain the predicted TC genesis frequency in the WNP, and the same operation is applied in SON. Lastly, we define the predicted TC genesis frequency in the WNP during JJASON as the sum of the predictions in Domains 1 and 2 in JJA and SON.
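
The fit-then-add procedure described above can be sketched in a few lines. This is a minimal illustration, assuming a plain OLS fit with an intercept solved through the normal equations. The predictor values and TC counts are synthetic, and the function names are hypothetical; they are not the authors' code or data.

```python
def fit_ols(X, y):
    """Return [intercept, b1, ...] from the normal equations (A^T A) b = A^T y.

    Assumes the predictors are not perfectly collinear; a singular system
    would raise ZeroDivisionError in this simple solver.
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * t for r, t in zip(rows, y))]
           for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

def predict(coef, X):
    """Apply fitted coefficients to rows of predictor values."""
    return [coef[0] + sum(b * x for b, x in zip(coef[1:], row)) for row in X]

def add_series(*series):
    """Element-wise sum, e.g. Domain 1 + Domain 2, or JJA + SON."""
    return [sum(values) for values in zip(*series)]

# Synthetic illustration: one predictor, six years, two domains.
index_values = [[-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5]]
domain1_counts = [6, 7, 7, 9, 10, 10]
domain2_counts = [9, 8, 8, 6, 6, 4]

# No-split approach: fit and predict on the same (entire) series.
pred_d1 = predict(fit_ols(index_values, domain1_counts), index_values)
pred_d2 = predict(fit_ols(index_values, domain2_counts), index_values)
wnp_prediction = add_series(pred_d1, pred_d2)
print([round(v, 2) for v in wnp_prediction])
```

The same `add_series` step would be applied once more to combine the JJA and SON predictions into a JJASON prediction.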

Fig. 4: Predictability of tropical cyclone (TC) genesis frequency in the Western North Pacific (WNP) with Lead 1 predictors and a no-split approach.

a–c Time series of predicted (red) and actual (black) TC genesis frequency during June to November (JJASON), June to August (JJA), and September to November (SON), respectively, in Domain 1 from 1984 to 2020. d–i Same as (a–c) but for Domain 2 and the WNP, respectively. Inset statistics indicate the model output statistics measured using bivariate correlation (r), Normalized Root Mean Squared Error (NRMSE), and Ratio of Variance (RV). The significance of correlation is tested using a two-tailed Student’s t test.

Following the same implementation of forward predictor selection with a no-split approach, the predicted TC genesis frequency in Domain 1 is significantly correlated with the actual TC genesis frequency during JJASON (r = 0.83, p < 0.01). In Domain 2, the predicted TC genesis frequency is also significantly correlated with the actual count during JJASON, although slightly less so than in Domain 1 (r = 0.81, p < 0.01). Adding the predictions in Domain 1 and Domain 2, the predicted TC genesis frequency in the WNP is significantly correlated with the observed value in JJASON (r = 0.77, p < 0.01). Thus, the performance of our alternative approach (r = 0.77, p < 0.01; NRMSE = 0.64; RV = 0.49) is better than that of the conventional approach of predicting the clustered TC genesis frequency in the entire WNP (r = 0.71, p < 0.01; NRMSE = 0.71; RV = 0.40) (Fig. 4g, Fig. 5g, Supplementary Fig. 2a). Refer to Methods for further description of the model output statistics used in the analysis.
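
The three model output statistics can be computed as in the sketch below. The definitions used here are assumptions: NRMSE is taken as the root mean squared error divided by the standard deviation of the observations, and RV as the variance of the predictions divided by the variance of the observations. The authoritative definitions are those in the authors' Methods, and the function name is hypothetical.

```python
import math

def skill_scores(observed, predicted):
    """Return bivariate correlation (r), NRMSE, and ratio of variance (RV).

    Assumed definitions: NRMSE = RMSE / std(observed),
    RV = var(predicted) / var(observed), both with population variance.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted)) / n
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return {
        "r": cov / math.sqrt(var_o * var_p),
        "NRMSE": rmse / math.sqrt(var_o),
        "RV": var_p / var_o,
    }

# Synthetic example: predictions track the observations but are damped.
scores = skill_scores([1, 2, 3, 4, 5], [2.0, 2.5, 3.0, 3.5, 4.0])
print({name: round(value, 2) for name, value in scores.items()})
# prints {'r': 1.0, 'NRMSE': 0.5, 'RV': 0.25}
```

With these definitions, an in-sample OLS fit with an intercept satisfies RV = r² and NRMSE = √(1 − r²). Scores quoted in the text for single-domain, single-season fits (for example r = 0.75, NRMSE = 0.66, RV = 0.56) agree with these identities to within rounding, which is consistent with the assumed definitions but does not confirm them.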

Fig. 5: Summary of prediction model scores at various lead times for Domain 1 (100–140°E) and Domain 2 (140–180°E) using a no-split approach.

a–c Summary of model output statistics measured using bivariate correlation, Normalized Root Mean Squared Error (NRMSE), and Ratio of Variance (RV), respectively. d–i same as (a–c) but for Domain 2 and in the WNP (Domain 1 + Domain 2), respectively. The blue, orange, green, and red columns correspond to the different lead times in predictor selection from Lead 1 to Lead 4, respectively.

Comparing the performance of the models across lead times (Fig. 5a–i), Lead 1 and Lead 2 perform best in predicting SON TC genesis in Domain 1 (r = 0.78, p < 0.01; NRMSE = 0.63; RV = 0.60), while Lead 2 performs best for Domain 2 (r = 0.75, p < 0.01; NRMSE = 0.66; RV = 0.56). Combining the predicted TC genesis frequency in JJA using Lead 1 with that in SON using Lead 2 in Domain 1 and Domain 2, the WNP TC genesis in JJASON is best predicted using Lead 2 (r = 0.80, p < 0.01; NRMSE = 0.60; RV = 0.56) (Fig. 5g). These metrics indicate that, with our proposed alternative approach, the TC genesis frequency in the WNP during the active TC season (i.e., JJASON) can be predicted with sufficient lead time (e.g., one to two months ahead) and with satisfactory skill scores. While the results show that updating the prediction models using in-season predictors (i.e., Lead 2) can provide better performance, the pre-season prediction models (i.e., Lead 1) also achieve comparable results, which supports the efficiency and flexibility of our proposed statistical approach.

Meanwhile, an 80-20 split approach between training (n = 30 years) and test (n = 7 years) datasets in JJASON was also employed to assess the robustness of the prediction models (Supplementary Fig. 3a–f). Unlike the no-split approach, the 80-20 split approach divides the JJASON time series into training and test datasets, which correspond to 80% and 20% of the entire time series, respectively. Overall, the predicted TC genesis frequency in the WNP during JJASON using the 80-20 split approach retains good predictability scores in both the training (r = 0.79, p < 0.01; NRMSE = 0.62; RV = 0.51) and test (r = 0.54; NRMSE = 0.94; RV = 0.18) periods.
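
The 80-20 split of the 37-year series can be reproduced as follows. This sketch assumes the split is chronological (earliest years for training, most recent years for testing), which the text does not state explicitly; the function name is hypothetical.

```python
def chronological_split(years, train_fraction=0.8):
    """Split an ordered sequence into leading training and trailing test parts.

    The chronological ordering is an assumption made for illustration.
    """
    n_train = round(len(years) * train_fraction)
    return years[:n_train], years[n_train:]

years = list(range(1984, 2021))  # 1984 to 2020 inclusive, 37 years
train_years, test_years = chronological_split(years)
print(len(train_years), len(test_years))  # prints 30 7
print(train_years[0], train_years[-1], test_years[0], test_years[-1])
```

Rounding 0.8 × 37 = 29.6 to 30 gives the 30 training and 7 test years quoted above. The model would then be fitted on the training years only and scored separately on each part.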

Influence of the primary predictors

Given that the identified primary predictors are the WNP TW index and the PMM, we proceed to explain the possible mechanisms of the asymmetric seasonal TC genesis predictability during JJASON using these indices. While the secondary and tertiary predictors also contribute to the regression model, our primary aim is to highlight that accounting for the regional and temporal asymmetry in the WNP can lead to better seasonal TC predictability, irrespective of the performance of the prediction model itself. We encourage future studies to identify more and/or better predictors based on the asymmetry of TC characteristics in the WNP.

Meanwhile, a positive WNP TW index translates to strengthened trade winds or anomalous easterlies (Fig. 6a) that flow towards Domain 1, where they converge with weaker westerlies and give rise to increased convective activities in Domain 1 (Fig. 6b). The air that rises over Domain 1 diverges aloft, is transported zonally, and descends as dry air, which exacerbates the anomalous dry environment in Domain 2. This mechanism produces positive feedback in which further strengthened trade winds are drawn towards the warm SST anomalies in Domain 1 (Fig. 6c). The dry environment in Domain 2 prompts a strengthened North Pacific High, which weakens the subtropical jet and ultimately confines and amplifies the warm SST anomalies in Domain 1. The warm SST anomalies, increased moisture, and pre-existing anomalous convective activities consequently render a favorable environment for increased TC development and convective activities in Domain 1 (Fig. 6b). However, a strengthened TW index that is favorable for TC genesis in Domain 1 does not necessarily suppress TC genesis in Domain 2.

Fig. 6: Influence of the Western North Pacific (WNP) Trade Wind (TW) Index on tropical cyclones (TC).

a Schematic of the influence of a positive WNP TW index on TC genesis in Domain 1 (blue box) and Domain 2 (red box). The strengthened trade winds are drawn towards Domain 1, where they converge with the weaker westerlies, leading to the formation of anomalous convective activities in Domain 1. The warm sea surface temperature anomalies and anomalous convective activities associated with a positive WNP TW index lead to increased TC genesis in Domain 1. The zonally transported rising air from Domain 1 to Domain 2 exacerbates the anomalous anticyclonic circulation, rendering less TC genesis in Domain 2. The response of the midlatitude environment to the WNP TW index amplifies the asymmetric spatial TC genesis distribution in the WNP. b, c Spatial correlation of the WNP TW index with outgoing longwave radiation and sea surface temperature, respectively, during June to November. The dots denote significant correlation at p < 0.05 tested using a two-tailed Student’s t test.

Another possible mechanism that encourages the dipole TC genesis pattern is the influence of the PMM (Fig. 7a). A positive PMM is associated with anomalous westerlies from the summer monsoon flow drawn towards the warmer SST region in Domain 2 (Fig. 7b, c). The anomalous westerlies also weaken and relax the trade winds, which leads to anomalous low-level convergence and increased convective activities in Domain 2. A positive PMM is characterized by a meridional pattern of warm SST anomalies in the central Pacific, which is associated with reduced zonal wind shear in the eastern section of the WNP and is one of the reasons for the increased TC development in Domain 2 during a positive PMM phase23. The air that rises from the convective activities in Domain 2 is transported zonally and descends as dry air, which exacerbates the anomalous anticyclonic circulation in Domain 1. Meanwhile, the warm and moist air transported meridionally from Domain 2 weakens the North Pacific High, resulting in increased equatorward cold air intrusion that cools the SSTs in the North Pacific (Fig. 7c). At the same time, the anomalous convective activities in Domain 2 strengthen the subtropical jet, which confines the cold SSTs off the coast of Japan. In totality, the mechanism associated with a positive PMM renders more TC development in Domain 2 and less TC development in Domain 1. Furthermore, we note that the PMM typically reaches its maximum amplitude during the boreal spring, when it becomes meridionally closer to the WNP, which could be a possible reason why the PMM is a good predictor of seasonal TC genesis, particularly in Domain 2.

Fig. 7: Influence of the Pacific Meridional Mode (PMM) on tropical cyclones (TC) in the Western North Pacific (WNP).

a Schematic of the influence of a positive PMM index on TC genesis in Domain 1 (blue box) and Domain 2 (red box). The warm sea surface temperature anomalies and anomalous convective activities associated with a positive PMM lead to increased TC genesis in Domain 2, while the zonally transported rising air from Domain 2 to Domain 1 exacerbates the anomalous anticyclonic circulation, rendering less TC genesis in Domain 1. The response of the midlatitude environment to the PMM amplifies the asymmetric spatial TC genesis distribution in the WNP. b, c Spatial correlation of the PMM index with outgoing longwave radiation and sea surface temperature, respectively, during June to November. The dots denote significant correlation at p < 0.05 tested using a two-tailed Student’s t test.

Using only the WNP TW index and PMM as the primary predictors of TC genesis frequency in Domain 1 (Supplementary Table 2) and Domain 2 (Supplementary Table 3), respectively, the prediction models still show satisfactory performance in JJASON (Supplementary Fig. 4a–f). In the no-split approach, the predicted TC genesis frequency in the WNP is significantly correlated with the observed value in JJASON (r = 0.53, p < 0.01; NRMSE = 0.85; RV = 0.21). In the 80-20 split approach, the predicted TC genesis frequency in JJASON retains good predictability scores in both the training (r = 0.51, p < 0.01; NRMSE = 0.86; RV = 0.31) and test (r = 0.69, p < 0.01; NRMSE = 0.81; RV = 0.13) periods. These scores confirm the satisfactory performance of the prediction models even when using only the dominant predictors (i.e., WNP TW index, PMM) in Domain 1, Domain 2, and the WNP during JJA, SON, and JJASON. Ultimately, the results of our proposed seasonal TC genesis predictions are comparable to those of recent literature on the statistical prediction of seasonal TC genesis in the WNP and in the Philippines28,29.

We further support the robustness of our results with a sensitivity test on the partition longitude. In Lead 2 (the lead time with the best prediction in Fig. 5g), the predicted TC genesis frequency in the WNP during JJASON using a 130°E partition (r = 0.71, p < 0.01; NRMSE = 0.71; RV = 0.45) (Supplementary Fig. 5g–i) is generally inferior to that using a 140°E partition (r = 0.80, p < 0.01; NRMSE = 0.60; RV = 0.56) (Fig. 5g–i). We also found lower model performance during JJASON in the WNP using a 150°E partition in Lead 2 (r = 0.56, p < 0.01; NRMSE = 0.85; RV = 0.54) (Supplementary Fig. 6g–i). These consistent results corroborate that the seasonal TC genesis frequency in the WNP is best predicted with two domains equally partitioned along 140°E.

Discussion

In the age of emerging technologies, it is not surprising that end-users look for more granular information with finer spatial or temporal resolution to support their various requirements for seasonal climate prediction. There is also a demand among end-users for accessible and open-source information that is simple, inexpensive, and easy to implement. In the case of seasonal TC prediction, a timely release of information is generally preferred to serve as guidance for operational forecasting, resource mobilization in disaster risk reduction, and strategic planning for industries and sectors such as hydropower generation, water resources, and agriculture.

In line with these needs, we propose an alternative approach to predicting the seasonal TC genesis frequency in the WNP during its active season (i.e., JJASON) by separately predicting the TC genesis in two domains (i.e., Domains 1 and 2) and two seasons (i.e., JJA and SON). The separation is based on two distinct asymmetries: the reversal in the seasonality of TCs between JJA and SON, and the spatial inhomogeneity in the large-scale environments favorable (or unfavorable) for TC development between the western (Domain 1) and eastern (Domain 2) half-courts of the WNP.

Using a number of open-source and freely downloadable climate indices, we identified possible sources of predictability of seasonal TC genesis frequency in JJA, SON, and JJASON. Among these predictors, we show that the PMM and the WNP TW index can largely explain two possible mechanisms behind the asymmetric seasonality of TC genesis frequency in the WNP. In fact, these two climate indices alone produce satisfactory model output statistics, such that they can be considered smart predictors of seasonal TC genesis frequency in the WNP. For a timely release, Lead 1 also yields considerable performance scores, which means that a seasonal TC outlook can be released one to two months ahead of the active TC season. Because the OLS regression-based prediction model accounts for both the spatial and temporal inhomogeneity of seasonal TC genesis frequency, our results can provide more granular information in the form of seasonal TC outlooks for three domains: the WNP itself, and the western (Domain 1) and eastern (Domain 2) half-courts of the WNP.

Finally, our findings are comparable with the existing literature on seasonal TC prediction in the WNP, including studies based on dynamical models. While our statistical models may not be as sophisticated as most dynamical models, their predictions remain comparable and may add value to operational long-range TC forecasting because they are inexpensive, practical, and statistically efficient, which allows their rapid implementation. We expect that our findings can provide additional insights as well as operational guidance for seasonal TC outlooks in the WNP.

Methods

Tropical cyclone best track

The TC best track data in the WNP from 1984 to 2020 were obtained from the International Best Track Archive for Climate Stewardship version 430, which covers the reported reliable period in the WNP beginning in 198431. Consequently, all time series, plots, and statistical tests in our analysis used the period 1984-2020. The WNP domain, defined as 0-35°N and 100-180°E, is further divided into two half-courts spanning the same latitudes: Domain 1 (100-140°E) and Domain 2 (140-180°E) (Fig. 1a). The TCs considered in the analysis include all named storms that reached at least Tropical Storm category (maximum sustained wind speed ≥ 35 knots). The location of TC genesis is defined as the first reported coordinates of a TC, while the TC genesis frequency is the count of the considered TCs in the indicated domains.
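As an illustration of the counting rule described above, the following minimal Python sketch assigns each storm to a domain and season from its first reported position. The record layout (keys such as `wind_kt`), the function names, the sample tracks, and the assignment of a point exactly on 140°E to Domain 2 are all assumptions made for this example; they are not taken from the archive or from our processing scripts. The "named storm" criterion is not modeled here.

```python
from collections import Counter

TS_THRESHOLD_KT = 35  # Tropical Storm threshold given in the text (knots)


def classify_domain(lat, lon):
    """Return the domain of a genesis point, or None if outside the WNP box."""
    if not (0 <= lat <= 35 and 100 <= lon <= 180):
        return None
    # Assumption: a genesis point exactly on 140°E is assigned to Domain 2.
    return "Domain 1" if lon < 140 else "Domain 2"


def classify_season(month):
    """Map a calendar month to JJA or SON; other months return None."""
    if month in (6, 7, 8):
        return "JJA"
    if month in (9, 10, 11):
        return "SON"
    return None


def genesis_counts(tracks):
    """Count TC genesis events per (year, season, domain).

    Each track is a time-ordered list of dicts with the hypothetical keys
    'year', 'month', 'lat', 'lon', and 'wind_kt'.
    """
    counts = Counter()
    for track in tracks:
        # Keep only storms that reached at least Tropical Storm intensity.
        if max(point["wind_kt"] for point in track) < TS_THRESHOLD_KT:
            continue
        first = track[0]  # genesis = first reported coordinates
        domain = classify_domain(first["lat"], first["lon"])
        season = classify_season(first["month"])
        if domain is not None and season is not None:
            counts[(first["year"], season, domain)] += 1
    return counts


# Hypothetical tracks, for illustration only (not taken from the archive).
tracks = [
    [{"year": 2000, "month": 7, "lat": 15.0, "lon": 130.0, "wind_kt": 20},
     {"year": 2000, "month": 7, "lat": 17.0, "lon": 128.0, "wind_kt": 50}],
    [{"year": 2000, "month": 10, "lat": 12.0, "lon": 150.0, "wind_kt": 25},
     {"year": 2000, "month": 10, "lat": 14.0, "lon": 147.0, "wind_kt": 40}],
    [{"year": 2000, "month": 8, "lat": 18.0, "lon": 135.0, "wind_kt": 30}],
]
counts = genesis_counts(tracks)
```

In this toy input, the third track never reaches 35 knots and is therefore excluded from the count.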

Climate indices and reanalysis data

A total of 21 climate indices (Supplementary Table 1) from the National Oceanic and Atmospheric Administration Physical Sciences Laboratory (NOAA-PSL)25, Climate Prediction Center26, and Japan Meteorological Agency Tokyo Climate Center were utilized as potential predictors in our analysis27. The ERA5 reanalysis data products were used in the analysis of zonal wind, meridional wind, and SST32, while the OLR was taken from the NOAA-PSL33. The location of the monsoon trough is estimated along the overturning zonal wind flow at approximately the −0.5 m s⁻¹ contour. The VWS was calculated from the zonal and meridional winds at 200 hPa and 850 hPa.
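One common way to compute the VWS from these fields is as the magnitude of the vector wind difference between the two levels. The short Python sketch below assumes that definition, which is not spelled out above; the function name and the sample values are hypothetical.

```python
import math


def vertical_wind_shear(u200, v200, u850, v850):
    """Magnitude of the 200-850 hPa vector wind difference (m/s).

    Assumption: VWS = sqrt((u200 - u850)^2 + (v200 - v850)^2).
    """
    return math.hypot(u200 - u850, v200 - v850)


# Hypothetical winds (m/s) at a single grid point, for illustration only.
shear = vertical_wind_shear(u200=10.0, v200=0.0, u850=4.0, v850=-8.0)
```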

Statistical and significance tests

We used Pearson’s correlation to measure the strength of the linear relationship between the indicated variables, with significance tested using a two-tailed Student’s t test. A value of ±1 is considered perfect correlation. The Normalized Root Mean Squared Error (NRMSE) is the root mean squared error between the predicted and actual values, normalized by the standard deviation of the actual values to account for differences in data spread. An NRMSE value of 0 corresponds to a perfect score. The Ratio of Variance (RV) quantifies the dissimilarity between the variances of the predicted and actual values. A perfect RV score of 1 denotes that the predicted and observed values have the same variance.
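The three scores can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions: RV is taken as the variance of the predictions divided by the variance of the observations, population (not sample) variance is used throughout, and the function names are hypothetical. The t statistic shown is the standard one for testing a Pearson correlation with n − 2 degrees of freedom; obtaining the p-value additionally requires the t distribution, which is omitted here.

```python
import math


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance (assumption: divisor n rather than n - 1)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for a correlation r from n pairs (n - 2 degrees of freedom).

    Undefined for |r| = 1.
    """
    return r * math.sqrt((n - 2) / (1 - r ** 2))


def nrmse(predicted, observed):
    """RMSE normalized by the standard deviation of the observations."""
    rmse = math.sqrt(mean([(p - o) ** 2 for p, o in zip(predicted, observed)]))
    return rmse / math.sqrt(variance(observed))


def ratio_of_variance(predicted, observed):
    """Assumed definition: var(predicted) / var(observed)."""
    return variance(predicted) / variance(observed)
```

Under these definitions, a prediction identical to the observations gives r = 1, NRMSE = 0, and RV = 1, matching the perfect scores described above.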

Predictor selection and regression modeling

We used data with a lead of at least two months ahead of the start of each season to define four lead times (Lead 1 - January to April; Lead 2 - January to May; Lead 3 - February to June; Lead 4 - March to July). The pre-season climate indices (i.e., those available prior to the start of the active TC season in June) are released in May, which makes Lead 1 the only pre-season prediction. The updated climate indices used in Lead 2 are released in June, those in Lead 3 in July, and those in Lead 4 in August, which makes these lead times in-season predictions. In turn, JJA can only be predicted with Lead 1 while SON can be predicted up to Lead 4; thus, JJASON has four lead times.
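The lead-time bookkeeping above can be written down compactly. The Python sketch below encodes the month windows and release months stated in the text; the rule that a lead is usable for a season only if its indices are released before the first month of that season is our reading of the description, and the names used are hypothetical.

```python
# Months (1-12) spanned by each lead time and the month in which the
# corresponding climate indices are released, as described in the text.
LEAD_TIMES = {
    "Lead 1": {"months": (1, 4), "release_month": 5},
    "Lead 2": {"months": (1, 5), "release_month": 6},
    "Lead 3": {"months": (2, 6), "release_month": 7},
    "Lead 4": {"months": (3, 7), "release_month": 8},
}

# First calendar month of each target season.
SEASON_START_MONTH = {"JJA": 6, "SON": 9}


def available_leads(season):
    """Leads whose indices are released before the season starts.

    Assumption: 'released before the first month of the season' is the
    criterion that determines which leads can be used for a season.
    """
    start = SEASON_START_MONTH[season]
    return [name for name, info in LEAD_TIMES.items()
            if info["release_month"] < start]


jja_leads = available_leads("JJA")
son_leads = available_leads("SON")
```

With this rule, only Lead 1 is available for JJA, whereas all four leads are available for SON.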

Among the 21 climate indices, we first selected the climate index with the highest bivariate correlation with the seasonal TC genesis frequency. To isolate the influence of this first climate index, we removed its linear contribution from the other climate indices by subtracting the line of best fit between them. After this collinearity reduction, we repeated the selection and removal steps on the remaining indices until no index with a significant correlation was left. This sequential process was implemented to eliminate any possible multicollinearity among the predictors. The resulting set of predictors was used to predict the number of TC genesis events at each lead time in JJA and SON using OLS. The predicted values of both seasons were then added to predict the entire JJASON. We implemented both an 80-20 train-test split and a no-split approach to support the robustness of the developed prediction models. In the no-split approach, the entire time series is used as training data for prediction, while in the 80-20 split approach, the training and test datasets correspond to 80% and 20% of the number of years in the entire time series, respectively.
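The sequential selection can be sketched as follows. This is a simplified illustration rather than our production code: the function names, the toy series, and the correlation threshold `r_crit` (standing in for the significance test) are assumptions. For a 37-year series, a two-tailed 5% level corresponds to roughly |r| ≥ 0.325; the value of 0.5 used below is chosen only to make the toy example readable.

```python
import math


def _center(series):
    m = sum(series) / len(series)
    return [v - m for v in series]


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    xc, yc = _center(x), _center(y)
    sxx = sum(a * a for a in xc)
    syy = sum(b * b for b in yc)
    if sxx == 0 or syy == 0:
        return 0.0
    return sum(a * b for a, b in zip(xc, yc)) / math.sqrt(sxx * syy)


def residualize(z, x):
    """Remove from z its line of best fit on x (simple linear regression)."""
    zc, xc = _center(z), _center(x)
    slope = sum(a * b for a, b in zip(xc, zc)) / sum(a * a for a in xc)
    return [b - slope * a for a, b in zip(xc, zc)]


def select_predictors(indices, target, r_crit):
    """Sequentially pick indices by |r| with the target.

    After each pick, the linear contribution of the picked index is removed
    from every remaining index. Stops when no remaining index reaches r_crit.
    Returns a list of (name, series) pairs in selection order.
    """
    remaining = dict(indices)
    selected = []
    while remaining:
        best = max(remaining, key=lambda k: abs(pearson_r(remaining[k], target)))
        if abs(pearson_r(remaining[best], target)) < r_crit:
            break
        chosen = remaining.pop(best)
        selected.append((best, chosen))
        remaining = {k: residualize(v, chosen) for k, v in remaining.items()}
    return selected


# Hypothetical eight-year toy data, for illustration only.
tc_counts = [23, 21, 15, 13, 11, 9, 3, 1]
indices = {
    "A": [1, 1, 1, 1, -1, -1, -1, -1],
    "B": [2, 0, 2, 0, 0, -2, 0, -2],   # partly collinear with A
    "C": [1, 1, -1, -1, 1, 1, -1, -1],
}
selected_names = [name for name, _ in select_predictors(indices, tc_counts, r_crit=0.5)]
```

In this toy case, index B correlates with the target at about 0.68 on its own, but drops to about 0.14 once the contribution of A is removed, so only A and C are retained. The retained series would then be passed to an OLS fit for each season and lead time.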