A new approach for location-specific seasonal outlooks of typhoon and super typhoon frequency across the Western North Pacific region

With an average of 26 tropical cyclones (TCs) per year, the western North Pacific (WNP) is the most active TC basin in the world. Considerable exposure lies in the coastal regions of the WNP, which extends from Japan in the north to the Philippines in the south, amplifying TC related impacts, including loss of life and damage to property, infrastructure and environment. This study presents a new location-specific typhoon (TY) and super typhoon (STY) outlook for the WNP basin and subregions, including China, Hong Kong, Japan, Korea, Philippines, Thailand, and Vietnam. Using multivariate Poisson regression and considering up to five modes of ocean-atmospheric variability and teleconnection patterns that influence WNP TC behaviour, thousands of possible predictor model combinations are compared using an automated variable selection procedure. For each location, skillful TY and STY outlooks are generated up to 6 months before the start of the typhoon season, with rolling monthly updates enabling refinement of predicted TY and STY frequency. This unparalleled lead time allows end-users to make more informed decisions before and during the typhoon season.


Results
Defining the TY season. The monthly climatology of TSTD, TY and STY events (1987-2020) is calculated for the WNP and seven individual countries/regions, including China, Hong Kong, Japan, Korea, Philippines, Thailand and Vietnam (Fig. 1). The subjective nature used to define the TY season across the WNP and the geographical spread of countries considered in this analysis means that TY seasons can differ between agencies. To account for this objectively, the most active 6-month period is used to define the TY season for each region (Fig. 2). Doing so revealed two different TY seasons: June-November (WNP, China, Hong Kong, Japan, Philippines, Thailand and Vietnam) and May-October (Korea). While TCs can occur in any month of the year, the defined TY seasons using the described methodology account for between 78% (Philippines) and 99% (Korea) of TSTD events, between 88% (Philippines) and 100% (Korea) of TY events and between 86% (WNP) and 100% (Korea) of STY events.
Comparing model skill for TY and STY outlooks. In this study, we evaluated the performance of 10 unique predictor models in producing skillful TC outlooks, each of which pairs a unique ENSO index with other ocean/atmospheric climate influences that drive changes to WNP TC behaviour (see Table S1) [19][20][21][22][23] . These indices include the Dipole Mode Index (DMI) 40 , including the Indian Ocean Dipole East Box (IOD E) and West Box (IOD W) 40 , the Pacific/North American Pattern (PNA) 41 , Quasi-biennial Oscillation (QBO) 31 and the Pacific Meridional Mode (PMM) 42 (see "Data and model development" and Table 1). An automated variable selection procedure is applied to select the optimum combination of predictors for each of the ten predictor models (Table S1) using a generalised linear model with a Poisson distribution and log link function to predict the mean number of TCs per season. After generating a training time series and a Leave-One Out Cross-Validation (LOOCV) time series, the model that generates the highest skill score (SS) for the LOOCV series is selected for further analysis. Values from the LOOCV analysis are presented throughout the manuscript and values from the training analysis are presented in the "Supplementary Material".
The skill of TY (maximum sustained winds ≥ 64 kt and < 114 kt) and STY (maximum sustained winds ≥ 114 kt) outlooks for a range of model initialisation periods (six pre-season and three in-season outlooks) are evaluated  For TYs, skillful predictions can be made up to 6 months (December) before the start of the TY season (June) (Fig. 3, left panels). LOOCV model statistics suggest that at lead − 6 (the preceding December), preseason outlooks are most skillful for the WNP, with a skill score (SS) of 48%, correlation between the LOOCV predicted and the observed TY time series of r = 0.72 (significant at p < 0.01), an exact strike rate (SR-E; where the LOOCV prediction, rounded to the nearest count, exactly matches the observation) of 12% (4 seasons of 34 TY seasons) and an SR ± 1 (where the LOOCV outlook matches the observation ± 1 TY) of 53% (18 seasons of 34 TY seasons). Lead − 6 pre-season outlooks are also most skillful for Japan (SS = 69% and SR-E = 41%) and Hong Kong, (SS = 31% and SR-E = 26%). For the other locations, lead − 3 is the most skillful model initialisation period for China (SS = 52% and SR-E = 47%) and lead − 4 is most skillful for the Philippines (SS = 54% and SR-E = 41%). Interestingly, for the WNP, Japan and the Philippines, the SS typically decreases with reduced lead time, indicating that these models are able to detect important pre-conditioning of parameters most suited to TC formation and movement.
Compared to TYs, the most skillful pre-season STY outlooks are typically generated closer to the start of the season, perhaps due to fewer STYs than TY counts ( Fig. 3; right panels). Outlooks generated at lead − 2 (April) are most skillful for the WNP (SS = 59% and SR-E = 32%) and the Philippines (SS = 51% and SR-E = 32%) and outlooks generated at lead − 3 (March) are most skillful for China (SS = 59% and SR-E = 38%) and Hong Kong (SS = 49% and SR-E = 24%). However, for Japan, outlooks generated at lead − 6 (the preceding December) are more skillful than any other lead month (SS = 57% and SR-E = 82%). Regardless for other locations, useful information about STY frequency can still be obtained by generating the outlook up to lead − 6 where the SS ranges from between 13% (China) up to 57% (Japan) and SR-E ranges from between 24% (WNP) up to 47% (Hong Kong).
While pre-season outlooks provide information about the fixed 6-month typhoon season across six lead times, in-season outlooks provide valuable information about TY and STY frequency for the remaining season e.g. models initialised between lead months + 1 to + 3 provide guidance for the remaining 5-3 months of the season, respectively. This enables greater temporal granularity of when TYs and STYs may occur, but at the detriment of lead time. This is particularly useful for locations where the latter half of the season is more active than the first half (e.g. Philippines). However, given the length of the season (and thus TY and STY frequency) varies with increased in-season lead time, skill should not be compared. Generally, in-season TY and STY outlooks demonstrate good performance, demonstrating skill in both SS and SR-E metrics. For example, China (Philippines) TY outlooks have SS ranging between 10 and 54% (17-53%) and WNP (Philippines) STY outlooks have SS ranging from between 34 and 58% (43-46%).
Comparison of observed and LOOCV predicted TYs ( Fig. 4; left panels) and STYs ( Fig. 4; right panels) demonstrates the LOOCV models ability at lead − 1 and lead − 6 to capture the season-to-season variability of TY/ STY counts and in terms of SR-E/SR ± 1, the superior model performance at lead − 6. However, there are some cases where predictions generated at lead − 6 tend to overestimate, e.g. the 1995 Hong Kong TY season, where lead − 6 (− 1) LOOCV predicted 11 (4) TYs compared to 6 TYs observed. Similar overestimation occurs for lead − 6 outlooks during the 1991 Japan TY season (11 TYs predicted compared to 8 TYs observed). Regardless, no seasonal TC outlook is skillful 100% of the time due to the multiple and competing influences that drive variability in TC behaviour. Importantly, the LOOCV time series presented is representative of model performance in 'validation mode' and the same overestimations are not observed in the training time series (Fig. S3). The LOOCV models are also able to capture the observed linear trends in TY and STYs. For TYs, significant downward trends (≥ 90% confidence level; Mann-Kendall Trend Test) are observed for the WNP (− 1.24 TYs/decade), China (− 0.59 TYs/decade), Japan (− 0.70 TYs/decade) and the Philippines (− 0.58 TYs/decade). For STYs, no significant trends were observed, but an increasing trend in STY frequency was observed for China (0.22 STYs/ decade), Hong Kong (0.59 STYs/decade) and Japan (0.40 STYs/decade). Negligible non-significant trends in STY frequency were observed for the WNP (− 0.04 STYs/decade) and the Philippines (− 0.18 STYs/decade).

Comparing model skill for all TYs. As individual TY and STY outlooks for Korea, Thailand and Vietnam
resulted in negligible skill, TY + STY models are derived (where maximum sustained winds ≥ 64 kt) for the WNP and seven other locations considered in this analysis. Consistent with previous findings (Fig. 3), TY + STY outlooks generated at lead − 1 are typically not the most skillful (Fig. 5). Instead, outlooks generated between lead − 2 and lead − 6 are the most skillful. For the WNP, China, Hong Kong, Japan and the Philippines, TY + STY model skill is comparable to individual TY and STY outlooks (Fig. 3), so the remaining discussion will focus on Korea, Thailand and Vietnam. For Korea, pre-season outlooks generated at lead − 6 are most skillful (SS = 41%, SR-E = 32%) with a fairly consistent SS between lead − 5 and lead − 3. For Thailand, lead − 3 (March) is the most skillful model initialisation period (SS = 48%, SR-E = 29%), however a substantial decline in model skill for lead − 2 and lead − 1 is evident, and no skillful in-season outlooks are available. Lastly, for Vietnam, lead − 2 (April) is most skillful (SS = 53%, SR-E = 35%), with models initiated in lead − 6 (December) a close second (SS = 52%, SR-E = 26%). In-season model skill is also impressive; however, not all lead times produced skillful models (e.g. Thailand), likely due to a diminishing sample size.
Of the eight regions considered, seven have a decreasing trend in TY + STY counts, three of which are statistically significant (≥ 90% confidence level), including the WNP (− 1.27 TY + STY/decade), the Philippines (− 0.76 TY + STY/decade) and Thailand (− 0.56 TY + STY/decade) (Fig. 6). Korea is the only location to see an increasing trend in the number of TY + STY events, however, this trend is not statistically significant (+ 0.28 www.nature.com/scientificreports/ Kong (e, f), Japan (g, h) and the Philippines (i, j) between 1987 and 2020. The LOOCV prediction is compared for two pre-season periods: lead − 1 (1 month before the start of the typhoon season; blue line) and lead − 6 (6 months before the start of the typhoon season; red line). On-panel percentage values indicate LOOCV SR-E (SR ± 1 in parentheses) for models lead − 1 and lead − 6. Dashed line represents observed linear trend with on-panel trend (/decade) summarised in grey italics with statistical significance (Mann-Kendall test) denoted by an asterisk (*significant at 90% confidence level; **significant at 95% confidence level; ***significant at 99% confidence level). www.nature.com/scientificreports/ TY + STY/decade). Both lead − 1 and lead − 6 LOOCV time series' capture the trends in observed TC counts well. Some overestimation is observed (e.g. 1998 season in Thailand) and is most typical in lead − 1 outlooks, unlike individual TY and STY outlooks where lead − 6 outlooks were more likely to overestimate (Fig. 4).

Scientific Reports
The stepAIC function selected, on average, nine covariates for TY, STY and TY + STY forecasts and eight covariates for STY forecasts; however models ranges from including as few as two covariates to as many as 19. For each model run, up to 42 covariates in total were available for selection. While parsimony varies between models, only models with AICc differences (between the intercept only model and the fitted model) ≥ 10 were selected. Figure 7 summarises the proportionality of selected model covariates for each location. No one individual ENSO index was consistently or commonly selected, highlighting that the oceanic, atmospheric and coupled oceanatmospheric interactions that the ENSO indices quantify and their associated interaction and impact on TC activity varies according to sub-region. This demonstrates the benefits of including multiple indices of ENSO that reflect the diversity of the phenomena. Note that the proportionality of individual ENSO indices (light grey bars) cannot be compared with non-ENSO indices (red bars) as only one ENSO index was included in each of the ten independent predictor models (Table S1). However, analysing the proportionality of all ENSO indices (dark grey bars; which can be directly compared with non-ENSO indices) indicates that for five of the eight locations (WNP (28%), China (25%), Hong Kong (25%), Korea (25%) and Vietnam (26%)), ENSO indices are most commonly selected by the automated variable selection procedure. For the Philippines and Thailand, the PMM was the most selected index, accounting for 23% and 20% of indices, respectively. For Japan, the IOD E was the most frequently selected index (18%). For all locations, the DMI was the least commonly selected index, accounting for between < 1% (Hong Kong, Korea and Thailand) and 5% (Japan) of chosen covariates, unlike the individual poles (IOD E and IOD W), which were frequently selected by the automated variable selection procedure.

Discussion
Seasonal TC behaviour is influenced by multiple climate influences that modulate the conditions suited for cyclogenesis and intensification. Using indices representing these climate influences with an automated covariate selection algorithm, multivariate Poisson regression has been applied to train and test model performance for the WNP basin and seven sub-regions. We tested model performance across nine lead times and found that in many cases, model skill is sufficient to enable skillful predictions up to 6 months before the start of the TY season. This approach has highlighted that for each country, the best performing model varies with lead time and underlines the benefits of applying this flexible and adaptive modelling framework for individual locations within a larger TC basin.
The adaptive modelling framework applied in this analysis facilitates the selection of indices that best captures the variability in TY, STY and TY + STY frequency for the WNP and seven sub-regions. The oceanic and atmospheric response to forcing from climate influences (quantified through climate indices) is not spatially or temporally homogeneous across the entire WNP region. The associated formation and intensification of TCs is sensitive to changes in oceanic and atmospheric conditions 53 , and as such, the selected combination of climate indices varies according to location.
For the WNP, China, Hong Kong, Japan and the Philippines, individual TY and STY outlooks were initialised, trained and validated, both of which demonstrated impressive skill. For Korea, Thailand and Vietnam, a small STY event set meant that individual TY and STY outlooks could not be produced (due to negligible/no model skill), but models for all events (TY + STY; where maximum sustained winds ≥ 64 kt) were initialised, trained and validated for all eight locations considered in this analysis.
Although other statistical 5-11 , dynamical 12-14 and hybrid statistical-dynamical outlooks [15][16][17] have been derived and tested for the WNP region, this study is unique in the following ways.
1. Automated covariate selection determines the optimum combination of input variables (seven indices each with 6-monthly leads-42 covariates in total) for each of the ten predictor models. This is a novel and important component of the modelling. The optimum number and combination of predictors changes between model initialisation periods, locations and for individual TY, STY and TY + STY outlooks, enabling the training and validation of the best possible model. 2. For each location and model initialisation period, ten predictor model outputs are available for analysis.
Although this study selected the model with the highest LOOCV SS for further analysis, in theory, all ten model outputs could be used to determine the confidence of the prediction. If this model was used in an operational sense to provide seasonal TY guidance, evaluation of confidence is invaluable. 3. The skill of deriving monthly TY outlooks are tested and show that skillful pre-season outlooks for TYs and STYs (WNP, China, Hong Kong, Japan and Philippines) and all TYs (TY + STY; for all locations considered in this study) can be generated up to 6 months before the start of the season, helping bridge the gap between current sub-seasonal and seasonal climate guidance for the WNP. The skill of in-season outlooks are also tested and are found to provide useful insights into TY, STY and TY + STY frequency for the latter months of the 6-month TY season. Deriving rolling monthly outlooks enables the continual refinement of operational outlooks before and during the season, important given TC behaviour is modulated by changes in ocean and atmosphere predictors. 4. If applied in an ongoing operational sense, the adaptive modelling framework accounts for the most recent changes in TY, STY and TY + STY behaviour, including potential climate change influences and the statistically significant trends observed in the historical record between 1987 and 2020 ( Figs. 4 and 6). Our approach enables consideration of the most recent season (i.e. the model is trained considering the most recent TY season), so does not assume stationarity, nor does it use static, historical assumptions about the best combination of predictors to provide a prediction for the next season. www.nature.com/scientificreports/ 5. The modelling framework used here is computationally inexpensive and can be applied to any user-specified region within the WNP and beyond 28,29 . This approach can also be trained and validated for other sub-regional locations within the WNP (e.g. multiple locations in China), which could form the basis of a future study.
While the modelling framework could technically include any relevant predictor, such as multidecadal, or extra-tropical climate variability, the covariates considered in this analysis were included for practicality, including selecting covariates that are regularly updated 1 month before the month of model initialisation. This is required to generate a prediction. We acknowledge that other indices not included in this analysis may add additional skill to the models; however, these could be considered in future updates to the model. In addition, some of the country-specific domains considered in this analysis are large (e.g. China), and the processes that drive changes in TY and STY events may differ considerably according to location. Future work could apply the modelling framework used here and test it for smaller, regional locations within the WNP region and could also consider the incidence of landfalling TYs and STYs. Also, the objectively defined TY seasons (i.e. the most active 6-month period; Fig. 2), may be longer than they need to be, but as we have shown, using the 6 months captures the majority of TSTD, TY and STY events for each location. Regardless, our models can be applied to user-specified seasons. With time, the addition of more seasons to the event set (currently 34 seasons between 1987 and 2020) will increase the sample size and may improve model skill.
The benefits of skillfully predicting TYs and STYs for specific WNP locations up to 6 months before the start of the TY season are wide-reaching. This modelling has the potential to greatly assist governments, decision-makers, aid agencies, insurers/reinsurers and many others that have to make important decisions under uncertainty. Advance warning up to 6 months before the start of the TY season is particularly useful where above-average TY/STY activity is predicted and may enact a series of decisions that could potentially minimise the loss of life and reduce economic losses. The lead-time provided by the models presented in this study could also be used to underpin weather index insurance and/or parametric insurance solutions for the locations considered.

Data and model development
Tropical cyclone data and Outlook regions. This study uses TC best-track data from the International Best-Track Archive for Climate Stewardship (IBTrACSv4) 54 for the Western North Pacific (WNP; Equator-60° N, 100° E-180°). The following three intensity categories 55 are applied to IBTrACS: Tropical storm/depression (TSTD) with maximum sustained winds < 64 kt, Typhoon (TY) with maximum sustained winds ≥ 64 kt and < 114 kt (Category 1-3 on the Saffir-Simpson Hurricane Wind Scale) and Super Typhoon (STY) with maximum sustained winds ≥ 114 kt (Category 4 and 5 on the Saffir-Simpson Hurricane Wind Scale). The study period extends between 1987 and 2020. TC data pre-1987 was not considered due to a lack of in situ aircraft reconnaissance before this time 56 , resulting in inconsistent intensity estimates. This analysis does not account for interbasin differences in defining the intensity of TY and STY events and uses the definitions are outlined above.
Regional Areas of Responsibility (AORs) as per the World Meteorological Organizations regional warning areas 57  The number of TCs to pass within an AOR is calculated every month, and its maximum intensity within that AOR is recorded and assigned TSTD, TY or STY intensity. The most active 6-month period is used to define the typhoon season. Using this approach, June-November is typhoon season for the WNP, China, Hong Kong, Japan, Philippines, Thailand and Vietnam and for Korea, May-October is the most active 6 month period.
Predictor variables. Five large-scale modes of ocean-atmosphere variability that are known to drive changes in WNP TC activity are included in this analysis. In total, 16 indices (Table 1) quantify the following modes of variability: ENSO (index 1-10), IOD (index 11-13), PNA (index 14), QBO (index 15) and the PMM (index 16) (see Fig. S1 for a diagrammatic summary). As there is no consensus on which index best represents the ENSO phenomenon 58 , ten unique ENSO indices are included in this analysis (following 28,29 ). However, only one ENSO index at a time is paired with the remaining indices, negating the possibility of multicollinearity 59,60 , resulting in ten individual predictor models (see Table S1).
Each predictor model (seven indices in total) contains six consecutive 1-month values and are dependent on the model initialisation month. For example, for a June-November typhoon season, a lead − 1 outlook is generated in May, 1 month before the start of the season. The predictor model for this lead − 1 outlook contains six consecutive 1-month values starting from April (month 1) extending back for 6 months to November (month 6; the previous year). In total, nine model initialisation months are considered, including six "pre-season" models (lead − 1 to lead − 6) and three "in-season" (lead + 1 to lead + 3) models. Depending on the defined typhoon season, a lead + 1 (lead + 3) outlook is generated in the first (third) month of the season, providing predictions for the remaining five month (3 months) season. For each of the ten predictor models, there are 42 variables (seven indices with 6 monthly values). www.nature.com/scientificreports/  www.nature.com/scientificreports/ Poisson modelling framework. Following other studies 28,29,[61][62][63][64] , Poisson regression has been successfully applied to predict historical TC counts, y, using one or more geophysical parameters (predictors). As such, we use multivariate Poisson regression: where: where μ i is the expected number of TC counts with covariate values x ij for the j predictors on the i th observation. β j refers to the regression coefficient for each covariate and β o , the intercept. The training period for this analysis is 1987-2020 period (34 TY seasons in total). Due to the number of variables (42 in total) in each of the ten predictor models, automated variable selection is necessary to select the most skillful combination of predictors while maintaining some degree of parsimony. The stepAIC (Akaike Information Criterion) R function (MASS package 65 ) applies a backward and forward stepwise search to select the most appropriate combination of predictors, using the AIC 66 as a selection criterion for determining when the variable elimination procedure should stop. Poisson regression is then applied using the selected covariates to generate a predicted TC time series. This methodology is applied independently for each of the eight regions, ten predictor models and for all model lead times (nine in total).
The above methodology is initially applied in training mode, where the generalised linear model is fitted to the entire time series . The resultant prediction is referred to as the 'training' prediction. The leaveone-out cross-validation (LOOCV 67 ) is also applied using the selected predictors from the variable selection methodology outlined above. Using this approach, the model is trained using n − 1 seasons to produce a hindcast number of events and is iteratively applied in a jackknife fashion to hindcast every historical season in the record 68,69 (e.g. 51 seasons when applied between 1970 and 2020). This is referred to as the 'LOOCV' prediction. For both training and LOOCV prediction time series, model skill is tested using a number of criteria including: where the climatological prediction is defined by A SS of 1.0 (100%) indicates a perfect prediction and negative values indicate predictions that are less accurate than climatology 70 . 4. Strike rate: exact (SR-E), the percentage of seasons where the prediction (training/LOOCV) exactly matches the observation; 5. Strike rate ± 1 (SR ± 1): the percentage of seasons for which the training/LOOCV prediction is ± 1 from the observation.
For each outlook region, model initialisation period and TC category, the skill of each of the ten predictor models are evaluated for both the training and LOOCV predictions. The model with the highest LOOCV SS is selected as the best performing model and skill statistics are reported. The difference in finite-sample corrected AIC (AICc) 66 between the intercept only and the best performing model is checked to ensure models are not overfitted (AICc values ≥ 10 indicate models are not overfitted).