Introduction

The Kuroshio Extension (KE) is the offshoot of the Kuroshio – the western boundary current of the North Pacific (NP) subtropical gyre – that flows eastward as an inertial jet after separating from the east coast of Japan around 35°N. KE transports warm tropical and subtropical waters at a rate of about 100 Sv1 (1 Sv ≡ 10⁶ m³ s⁻¹) and provides heat and moisture to the mid-latitude atmosphere through intense air–sea interactions2,3,4. KE exhibits multi-scale variability in both time and space, from intraseasonal-to-seasonal, small-scale variability associated with mesoscale eddies and meanders5,6,7 to decadal, large-scale variability associated with its meridional shifts or changes in strength6,8. This multi-scale variability can substantially affect climate in both local and remote regions, e.g., the west coast of the United States2,3,4,9,10,11,12,13, as well as local marine ecosystems14,15,16. Therefore, skillfully predicting KE variability is a scientifically and societally important challenge.

Both observational and modeling studies show distinct decadal variability in KE. The sea surface height (SSH) field derived from satellite altimetry reveals bimodal – stable and unstable – regimes of the KE6. These regimes reflect fluctuations between strengthened and weakened KE states associated with northward and southward meridional shifts of KE, respectively6. This decadal KE variability is evident in the satellite-derived SSH record, which now spans about three decades6,7,17, and has been reasonably reproduced in forced ocean simulations constrained at the surface by atmospheric reanalysis products18,19.

Numerous previous studies have established that the decadal KE variability arises from the westward propagation of first baroclinic mode Rossby waves, generated in the central NP through Ekman pumping by wind stress curl (WSC) forcing8,18,19,20,21. Predictions based on idealized models suggest that KE can be predicted several years in advance owing to well-known properties of Rossby wave propagation17,20. Initialized ocean-only retrospective forecasts under climatological surface forcing also suggest potential multi-year predictability of KE for the same reason22. However, until very recently it had not been examined whether the decadal KE variability can be predicted on multi-year timescales in more comprehensive, fully coupled prediction systems. A study based on a newly developed decadal prediction system (GFDL-SPEAR23) shows skillful multi-year prediction of KE24, in agreement with previous studies based on simpler models, and again attributes the skill to Rossby wave propagation. Yet this system uses a low-resolution ocean model (~1° horizontal resolution), which cannot resolve the mesoscale eddies and fronts associated with KE that may be essential for KE dynamics6,21,25,26 and thus for KE prediction.
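The Ekman pumping that generates these Rossby waves can be estimated from gridded wind stress. The following is a minimal Python sketch, not the method of the studies cited above: it assumes a regular latitude–longitude grid, a reference density ρ₀ = 1025 kg m⁻³, and the common expression w_E = ∇ × (τ/f)/ρ₀ evaluated in a local Cartesian approximation (the spherical metric term is neglected). The function name `ekman_pumping` is hypothetical.

```python
import numpy as np

# Illustrative sketch; the function name is hypothetical, not from the paper.
RHO0 = 1025.0           # assumed reference seawater density [kg m^-3]
OMEGA = 7.2921e-5       # Earth's rotation rate [s^-1]
EARTH_RADIUS = 6.371e6  # mean Earth radius [m]


def ekman_pumping(taux, tauy, lat, lon):
    """Ekman pumping velocity w_E = curl(tau / f) / rho0 in m s^-1.

    taux, tauy : wind stress components [N m^-2], shape (nlat, nlon)
    lat, lon   : 1-D grid coordinates in degrees (regular grid assumed)
    Positive w_E is upward (suction); negative is downward (pumping).
    Local Cartesian approximation: the tan(lat) metric term is dropped.
    """
    phi = np.deg2rad(lat)[:, None]      # column vector for broadcasting
    f = 2.0 * OMEGA * np.sin(phi)       # Coriolis parameter (> 0 in the NH)
    y = EARTH_RADIUS * np.deg2rad(lat)  # meridional distance [m]
    # d(tauy/f)/dx: derivative in longitude [rad], converted with dx = R cos(lat) dlon
    dtauy_dx = np.gradient(tauy / f, np.deg2rad(lon), axis=1) / (EARTH_RADIUS * np.cos(phi))
    # d(taux/f)/dy: derivative along meridional distance
    dtaux_dy = np.gradient(taux / f, y, axis=0)
    return (dtauy_dx - dtaux_dy) / RHO0
```

For example, westerlies that strengthen northward (∂τx/∂y > 0) give negative WSC and hence w_E < 0 in the Northern Hemisphere: downward Ekman pumping that deepens the thermocline and raises SSH.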

Studies based on a dynamical framework, the so-called “thin-jet” theory21,25, suggest that the narrow meridional structure of KE is essential to the dynamics of the decadal KE variability, as Rossby waves are trapped and guided by the sharp KE front. The narrow KE front can only be realistically simulated in ocean models that resolve eddies and frontal scales27. Using such high-resolution ocean models does not, however, necessarily guarantee better prediction skill of KE, because they also invigorate intrinsic ocean variability associated with mesoscale eddies. Indeed, eddy-resolving ocean simulations show intrinsic interannual to decadal KE variability under climatological forcing18,28. This intrinsic variability appears to complicate prediction of KE variability on interannual and shorter timescales29. However, recent studies suggest that decadal-scale eddy activity around the KE axis is paced, together with large-scale KE variability, by remote wind forcing18,30,31. Thus, skillful decadal prediction of KE might be achievable in high-resolution models even without accurate initialization of ocean eddies.

In this study, we investigate the predictability of KE, and the source of that predictability, using a 10-member ensemble forecast set from the Community Earth System Model version 1 (CESM1) High-Resolution Decadal Prediction (HRDP) system32 (see Methods for details), which can resolve mesoscale eddies at latitudes around the KE axis33. The predictability in HRDP is compared with that in a companion 40-member, low-resolution ensemble forecast set from the CESM1 Decadal Prediction Large Ensemble (DPLE) system34 (Methods). As demonstrated below, HRDP shows exceptional multi-year prediction skill for KE, substantially better than that of DPLE.

Results

Predictability of KE

Both HRDP and DPLE are initialized with ocean states from a pair of forced ocean–sea-ice simulations (FOSIs) that use the same ocean and sea-ice models as the coupled prediction simulations (referred to as FOSI-L and FOSI-H for low and high resolution, respectively) and are constrained only at the surface by atmospheric reanalysis products (“Methods” section). To assess the fidelity of the FOSIs, we first examine whether they reasonably reproduce the observed decadal KE variability. The observed variability is often inferred from satellite-derived SSH averaged over a KE region near Japan6. However, because the latitudinal position of KE is biased in the models, most notably in the low-resolution models, it is difficult to define a common KE domain suitable for both observations and model simulations. To overcome this difficulty, we define the observed Kuroshio Extension index (KEI) from altimeter SSH data by averaging over a domain (32.5°–36°N, 142°–154°E; Fig. 1a) identified from the sum of the first two empirical orthogonal functions (EOFs), both of which show pronounced decadal variability in their principal components. This KEI domain also largely encompasses the region where the total interannual variance is largest (Supplementary Fig. 1). The observed KEI shows a distinct decadal oscillation, with peaks during the early 2000s and the early-to-mid 2010s and troughs during the mid-to-late 1990s and 2000s (Fig. 1b), consistent with previous studies7,17.
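The two steps used to define the observed KEI, an EOF analysis followed by a box average, can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the authors' code: the EOFs are computed by singular value decomposition of √cos(latitude)-weighted annual anomalies, land points are marked with NaN, and the box average is weighted by cos(latitude). The function names `leading_eofs` and `box_mean` are hypothetical. Because EOF signs are arbitrary, each EOF would need a consistent sign before the first two are summed.

```python
import numpy as np

# Illustrative sketch; function names are hypothetical, not from the paper.


def leading_eofs(ssh, lat, n_modes=2):
    """Leading EOFs of annual SSH anomalies via SVD.

    ssh : (ntime, nlat, nlon) array, NaN over land
    Returns EOF maps (n_modes, nlat, nlon), principal components
    (ntime, n_modes) and the fraction of variance explained by each mode.
    """
    nt, ny, nx = ssh.shape
    anom = ssh - np.nanmean(ssh, axis=0)
    # sqrt(cos(lat)) weights make the SVD respect grid-cell area
    w = np.sqrt(np.cos(np.deg2rad(lat)))[:, None] * np.ones((1, nx))
    X = (anom * w).reshape(nt, ny * nx)
    ocean = np.all(np.isfinite(X), axis=0)
    u, s, vt = np.linalg.svd(X[:, ocean], full_matrices=False)
    frac = s**2 / np.sum(s**2)
    eofs = np.full((n_modes, ny * nx), np.nan)
    eofs[:, ocean] = vt[:n_modes]
    eofs = eofs.reshape(n_modes, ny, nx) / w  # undo the area weighting
    pcs = u[:, :n_modes] * s[:n_modes]
    return eofs, pcs, frac[:n_modes]


def box_mean(field, lat, lon, lat_bounds, lon_bounds):
    """cos(lat)-weighted mean of field (..., nlat, nlon) over a lat/lon box, ignoring NaNs."""
    in_lat = (lat >= lat_bounds[0]) & (lat <= lat_bounds[1])
    in_lon = (lon >= lon_bounds[0]) & (lon <= lon_bounds[1])
    sub = field[..., in_lat, :][..., in_lon]
    w = np.broadcast_to(np.cos(np.deg2rad(lat[in_lat]))[:, None], sub.shape)
    valid = np.isfinite(sub)
    num = np.sum(np.where(valid, sub * w, 0.0), axis=(-2, -1))
    return num / np.sum(np.where(valid, w, 0.0), axis=(-2, -1))
```

With annual altimeter SSH on such a grid, the observed index would then be `box_mean(ssh, lat, lon, (32.5, 36.0), (142.0, 154.0))`, using the domain quoted above.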

Fig. 1: Definitions of the KEI.
figure 1

a Sum of the first two EOFs of annual-mean SSH from satellite altimetry, which together explain 36% of the total variance. b Time series of the KEIs from satellite altimetry (black), FOSI-H (red), and FOSI-L (blue), averaged over the boxed regions in a, c, and d, respectively. c Correlation map of SSH from FOSI-H against the observed KEI. d Same as in c, but for FOSI-L. The dark gray contours in a, c, and d are the climatological SSH in each dataset, with contour intervals of 15 cm. Note that the global average is removed from satellite altimetry SSH for consistency with the definition of SSH in the models.

Simulated KEIs are defined from the FOSIs by averaging over the respective regions where the correlations with the observed KEI are highest (Fig. 1c, d). Both FOSIs show the highest correlations along the climatological KE, that is, where the climatological SSH gradient is largest. However, because KE is meridionally broader in FOSI-L than in FOSI-H (only the latter has a meridional extent comparable to that observed), the high correlations also span a broader region in FOSI-L. Although the KEI domains are located differently in FOSI-H (33°–36°N, 140°–156°E) and FOSI-L (35.5°–38.5°N, 142°–158°E), they have identical latitudinal and longitudinal extents (3° × 16°). As expected, the KEIs from both FOSIs are highly correlated (r ≈ 0.8) with the observed KEI (Fig. 1b). Although the phase of the KEI variability in FOSI-L reasonably matches the observed KEI, its amplitude (σ = 3.8 cm) is much weaker than the observed amplitude (σ = 10.3 cm), whereas the amplitude in FOSI-H (σ = 9.8 cm) is comparable to that observed.
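Selecting each model's KEI domain as the region best correlated with the observed KEI could be automated along the following lines. This is a hypothetical sketch: the fixed 3° × 16° box size matches the FOSI domains quoted above, but the exhaustive sliding-box search, the use of the mean gridpoint correlation as the selection criterion, the assumption of ascending coordinates, and the function names `correlation_map` and `best_box` are assumptions, not the procedure documented in the paper.

```python
import numpy as np

# Hypothetical sketch; function names and search strategy are assumptions.


def correlation_map(field, index):
    """Pearson correlation of each grid point of field (ntime, nlat, nlon) with index (ntime,)."""
    fa = field - field.mean(axis=0)
    ia = index - index.mean()
    cov = np.tensordot(ia, fa, axes=(0, 0)) / len(ia)
    return cov / (fa.std(axis=0) * ia.std())


def best_box(corr, lat, lon, dlat, dlon):
    """(south, north, west, east) of the dlat x dlon box with the highest mean correlation.

    lat and lon must be ascending; boxes extending beyond the grid are skipped.
    """
    best, best_val = None, -np.inf
    for south in lat:
        if south + dlat > lat[-1]:
            continue
        in_lat = (lat >= south) & (lat <= south + dlat)
        for west in lon:
            if west + dlon > lon[-1]:
                continue
            in_lon = (lon >= west) & (lon <= west + dlon)
            val = np.nanmean(corr[np.ix_(in_lat, in_lon)])
            if val > best_val:
                best, best_val = (south, south + dlat, west, west + dlon), val
    return best
```

The amplitudes quoted above (σ) would then simply be the standard deviations of the box-averaged time series.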

The KEI is computed from each ensemble member of HRDP and DPLE by averaging over the same domain as in FOSI-H and FOSI-L, respectively. Figure 2 shows anomaly correlation coefficients (ACCs) of the ensemble-mean KEI from DPLE and HRDP (Supplementary Fig. 2) as a function of lead year (LY), computed against the KEI from the respective FOSIs for 1987–2017 and against the observed KEI for 1993–2017. HRDP shows high ACCs against both FOSI-H and observations, with significant (95% confidence level) scores up to LY 4 and substantially higher scores than the persistence forecast through LY 5. Even at LY 5, the HRDP ACC remains quite high (0.4–0.5) against both FOSI-H and observations. This exceptional skill is also readily apparent in the KEI time series (Supplementary Fig. 2a–d). In contrast, DPLE shows relatively poor skill, always lower than that of HRDP, in predicting both FOSI-L and observed KEIs. The ACC against FOSI-L is significant through LY 2 but drops rapidly from LY 2 to 4 (Fig. 2a). Significant skill of DPLE against the observations is found only for the first year. We account for the difference in ensemble size between HRDP and DPLE by considering the distribution of ACC from randomly subsampled 10-member DPLE ensembles (blue shading in Fig. 2; “Methods” section). The HRDP ACC is always above the upper limit of the subsampled DPLE range, except at LY 1 against the observations, suggesting that the higher skill of HRDP is unlikely to arise by chance. While the skill of HRDP is higher than that recently reported using GFDL-SPEAR24, the skill of DPLE (with resolution comparable to that of GFDL-SPEAR) is noticeably lower. The HRDP skill is also higher than that obtained using a linear reduced-gravity model17 in which the only forcing is wind-driven Ekman pumping (so that the predictable signal is less contaminated by other forcings and processes).
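The ACC-by-lead-year calculation and the ensemble-size control described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's Methods: hindcasts are stored in a hypothetical array of shape (members, start dates, lead years), the verifying KEI has already been aligned to each start date and lead year, anomalies are taken about the mean over start dates at each lead (a simple lead-dependent drift correction), and 1000 random draws are used by default. The function names are hypothetical.

```python
import numpy as np

# Illustrative sketch; function names and array layout are hypothetical.


def acc_by_lead(hindcast, verif):
    """Anomaly correlation coefficient of the ensemble mean at each lead.

    hindcast : (n_members, n_starts, n_leads) forecast KEI
    verif    : (n_starts, n_leads) verifying KEI aligned to each start and lead
    """
    ens_mean = hindcast.mean(axis=0)
    # Removing the mean over start dates at each lead removes lead-dependent drift
    f = ens_mean - ens_mean.mean(axis=0)
    o = verif - verif.mean(axis=0)
    return (f * o).sum(axis=0) / np.sqrt((f**2).sum(axis=0) * (o**2).sum(axis=0))


def subsampled_acc_range(hindcast, verif, n_sub=10, n_draws=1000, seed=0):
    """2.5th and 97.5th percentiles of ACC over random n_sub-member subsets."""
    rng = np.random.default_rng(seed)
    n_members, _, n_leads = hindcast.shape
    accs = np.empty((n_draws, n_leads))
    for k in range(n_draws):
        members = rng.choice(n_members, size=n_sub, replace=False)
        accs[k] = acc_by_lead(hindcast[members], verif)
    return np.percentile(accs, [2.5, 97.5], axis=0)
```

Applied to a 40-member set, the returned upper bound plays the role of the top of the blue shading in Fig. 2; a 10-member system whose ACC exceeds it at a given lead outperforms every plausible 10-member subset of the larger ensemble.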

Fig. 2: Prediction skill for KEI.
figure 2

a ACC against the respective FOSI KEI as a function of LY for the HRDP (solid red) and DPLE (solid blue) ensemble means for 1987–2017. b Same as in a, but against the KEI from satellite observations for 1993–2017. Crosses indicate ACCs that are significant at the 95% confidence level, determined using a bootstrapping method (Methods). The light blue shading represents the spread (2.5th–97.5th percentiles) of ACC scores obtained from subsampled 10-member DPLE ensembles (Methods). Also shown are damped persistence forecasts of FOSI KEI in a (dashed red and blue for FOSI-H and FOSI-L, respectively) and of KEI from the satellite altimeter in b (dashed black).
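The damped persistence reference forecast and a bootstrap significance test could be implemented as in the following sketch. Both are assumptions about details given only in the Methods, which are not reproduced here: the persistence forecast damps the initial anomaly by the lag-1 autocorrelation raised to the lead (an AR(1) model), and an ACC is deemed significant when the 2.5th percentile of ACCs recomputed on start dates resampled with replacement is above zero. The function names are hypothetical. Note that for a positive lag-1 autocorrelation, scaling the initial anomaly does not change its correlation with the verification, so the ACC of damped persistence equals that of simple persistence; the damping matters only for error-based scores.

```python
import numpy as np

# Illustrative sketch; function names and the AR(1) damping are assumptions.


def damped_persistence(x, max_lead):
    """AR(1) damped persistence forecasts of an annual index x.

    Returns forecasts (n, max_lead) issued from the first n = len(x) - max_lead
    years and the ACC of those forecasts against the verifying values at each lead.
    """
    xa = x - x.mean()
    r1 = np.corrcoef(xa[:-1], xa[1:])[0, 1]  # lag-1 autocorrelation
    n = len(xa) - max_lead
    leads = np.arange(1, max_lead + 1)
    fc = xa[:n, None] * r1 ** leads[None, :]  # initial anomaly damped by r1**lead
    obs = np.stack([xa[lead:lead + n] for lead in leads], axis=1)
    acc = np.array([np.corrcoef(fc[:, j], obs[:, j])[0, 1] for j in range(max_lead)])
    return fc, acc


def bootstrap_significant(forecast, obs, n_boot=1000, seed=0):
    """True if the 2.5th percentile of bootstrapped ACCs exceeds zero."""
    rng = np.random.default_rng(seed)
    n = len(obs)
    accs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample start dates with replacement
        accs[b] = np.corrcoef(forecast[idx], obs[idx])[0, 1]
    return bool(np.nanpercentile(accs, 2.5) > 0)
```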

Source of skill

Previous studies emphasize westward Rossby wave propagation induced by WSC forcing in the central NP as the primary source of KE predictability17,18,20,22,24. In FOSI-H, lead-lag correlation maps of annual-mean SSH with its KEI (Fig. 3a) suggest westward propagation of SSH anomalies from the central NP (lags −3 and −2) to near Japan (lags 0 and 1) over roughly 3 years. The initial SSH anomalies have a broad meridional extent, roughly between 30° and 40°N, in the central NP (lags −3 and −2) and then converge into the KE front once they reach west of 160°E (lags −1 through 1). This spatial pattern and the propagation timescale agree well with those from observations (Supplementary Fig. 3). In FOSI-L (Fig. 4a), by contrast, there is only an indication of westward propagation: an anomaly already exists near Japan at lag −2, and an anomaly in the eastern NP persists, or even propagates eastward, at later lags. Nor do the SSH anomalies converge meridionally toward the west, as might be expected given the coarse resolution of the ocean model.
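Lead-lag correlation maps of this kind can be computed by shifting the SSH field relative to the index before correlating. A minimal sketch, assuming annual-mean arrays with time as the first axis and the sign convention of Fig. 3 (negative lag: SSH leads KEI); the function name `lead_lag_correlation_maps` is hypothetical.

```python
import numpy as np

# Illustrative sketch; the function name is hypothetical, not from the paper.


def lead_lag_correlation_maps(field, index, lags):
    """Correlate field at year t + lag with index at year t.

    field : (ntime, ...) annual-mean array; index : (ntime,) annual KEI
    Negative lag: the field leads the index. Returns {lag: correlation map}.
    """
    n = len(index)
    maps = {}
    for lag in lags:
        if lag < 0:
            f, i = field[:n + lag], index[-lag:]  # field earlier than index
        elif lag > 0:
            f, i = field[lag:], index[:n - lag]   # field later than index
        else:
            f, i = field, index
        fa = f - f.mean(axis=0)
        ia = i - i.mean()
        cov = np.tensordot(ia, fa, axes=(0, 0)) / len(ia)
        maps[lag] = cov / (fa.std(axis=0) * ia.std())
    return maps
```

Calling it with `lags=range(-3, 2)` would yield the five panels per dataset shown in Figs. 3a and 4a.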

Fig. 3: Correlation maps of SSH against KEI in FOSI-H and HRDP.
figure 3

a Lead-lag correlations of SSH against KEI from FOSI-H. b Same as in a, but for ensemble-mean SSH from HRDP, with lead-lag correlations computed across LYs and lag 0 corresponding to LY 3. Note that the predictor in b is KEI from FOSI-H. The SSH fields lead (lag) KEI at negative (positive) lags for FOSI-H and at LY 1–2 (LY 4) for HRDP. The black contours indicate statistically significant correlations at the 95% confidence level. The boxed region (blue) indicates the KEI domain used in both FOSI-H and HRDP.

Fig. 4: Correlation maps of SSH against KEI in FOSI-L and DPLE.
figure 4

a Lead-lag correlations of SSH against KEI from FOSI-L. b Same as in a, but for ensemble-mean SSH from DPLE, with lead-lag correlations computed across LYs and lag 0 at LY 3. Note that the predictor in b is KEI from FOSI-L. The SSH fields lead (lag) KEI at negative (positive) lags for FOSI-L and at LY 1–2 (LY 4) for DPLE. The black contours indicate statistically significant correlations at the 95% confidence level. The boxed region (blue) indicates the KEI domain used in both FOSI-L and DPLE.

The westward propagation in FOSI-H and observations is clearer in time–longitude (Hovmöller) diagrams of lead-lag correlations between monthly-mean SSH, averaged over the respective KEI latitudes, and the monthly-mean KEI (Fig. 5a, b). The SSH anomalies in both FOSI-H and observations take about 3 years to travel from around 160°W to the western boundary. In contrast, the westward propagation of SSH anomalies in FOSI-L appears to take only ~1 year, which is too fast to be explained by long Rossby wave propagation (Fig. 5c). Figure 5 also shows WSC correlations for the FOSIs, averaged over the same latitudes as SSH. In FOSI-H, negative WSC anomalies lie over the positive SSH anomalies in the central NP (negative lags), consistent with Ekman pumping that can generate westward propagating Rossby waves. The WSC anomalies also appear to march westward in tandem with the positive SSH anomalies as far west as 160°E. This suggests a possible coupled ocean–atmosphere feedback during the westward propagation of SSH anomalies that can maintain or even enhance the SSH signal on its way to the western boundary. In contrast, WSC anomalies in FOSI-L are not well aligned with the SSH anomalies and precede them by about a year. Together with the too-fast propagation, this suggests that Rossby wave propagation may not be the dominant mechanism giving rise to KE variability in FOSI-L.
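A back-of-the-envelope check of the transit times quoted above converts them into zonal propagation speeds and compares these with the linear long Rossby wave speed c = βR_d². In this sketch the western boundary is taken to be near 141°E, the latitude is 35°N, and the first baroclinic Rossby radius R_d = 40 km is an assumed, order-of-magnitude value; none of these numbers comes from the paper, and the function names are hypothetical.

```python
import numpy as np

# Back-of-the-envelope sketch; boundary longitude, latitude and Rd are assumptions.
EARTH_RADIUS = 6.371e6             # m
OMEGA = 7.2921e-5                  # s^-1
SECONDS_PER_YEAR = 365.25 * 86400


def zonal_distance_km(lon_east, lon_west, lat):
    """Zonal arc length between two longitudes (degrees east) along a circle of latitude."""
    return EARTH_RADIUS * np.deg2rad(lon_east - lon_west) * np.cos(np.deg2rad(lat)) / 1e3


def implied_speed_cm_s(distance_km, years):
    """Mean propagation speed for a transit of distance_km lasting `years`."""
    return distance_km * 1e5 / (years * SECONDS_PER_YEAR)


def long_rossby_speed_cm_s(lat, rossby_radius_km):
    """Linear first-baroclinic long Rossby wave speed (westward), c = beta * Rd**2."""
    beta = 2.0 * OMEGA * np.cos(np.deg2rad(lat)) / EARTH_RADIUS
    return beta * (rossby_radius_km * 1e3) ** 2 * 100.0


d = zonal_distance_km(200.0, 141.0, 35.0)             # 160°W (= 200°E) to ~141°E: ~5370 km
print(round(implied_speed_cm_s(d, 3.0), 1))           # 3-year transit: 5.7 cm/s
print(round(implied_speed_cm_s(d, 1.0), 1))           # 1-year transit: 17.0 cm/s
print(round(long_rossby_speed_cm_s(35.0, 40.0), 1))   # linear theory, Rd = 40 km: 3.0 cm/s
```

The 3-year transit implies a speed within a factor of about two of the linear estimate (observed mid-latitude long Rossby waves are known to be somewhat faster than this simple theory), whereas the ~1-year transit in FOSI-L implies a speed several times larger, in line with the statement that it is too fast for long Rossby waves.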

Fig. 5: Hovmöller diagrams of SSH and WSC correlations on KEI.
figure 5

a Correlations of the meridionally averaged, monthly SSH over the KE latitudes against monthly KEI from satellite observations plotted as a function of longitude and lag. b Same as in a, but from FOSI-H. Also shown in contours are the same correlations of WSC with solid (dashed) lines for positive (negative) correlations with contour intervals of 0.2 (zero contours are omitted). c Same as in b, but from FOSI-L. All time series are smoothed with a 12-month running mean before the computation of the correlations. SSH and WSC lead (lag) KEI for negative (positive) lags.

We explore whether Rossby wave propagation is also the relevant predictability mechanism in the retrospective forecasts by computing lead-lag correlations as for the FOSIs, but across lead years (Figs. 3b, 4b). Because the purpose of this analysis is to find the source of the predicted KE variability, we use the KEIs from the FOSIs, rather than those from the retrospective forecasts, as the independent variable. The ACC analysis suggests that both retrospective forecasts have some skill in predicting the FOSI KEIs at LY 3 (Fig. 2), although the skill is not statistically significant in DPLE; we therefore take the simultaneous correlation at LY 3 as lag 0 and compute the lead-lag correlations across the other lead years. Specifically, SSH from the retrospective forecasts leads the FOSI KEI by 2 and 1 years at LY 1 and LY 2, respectively, and lags it by 1 year at LY 4. Choosing either LY 2 or LY 4 for the lag-0 correlations gives very similar results. The correlation maps from HRDP suggest that the westward propagating SSH signal from the initialized SSH anomaly in the central NP (LY 1 in Fig. 3b), which should correspond to an anomalous state in FOSI-H between lags −3 and −2, leads to significant skill in predicting KEI at LY 3–4. The spatiotemporal evolution of the SSH anomalies is strikingly similar between FOSI-H and HRDP, including the convergence of anomalies into the KE axis (LY 3–4). The correlation maps from DPLE, however, do not clearly show a propagating SSH signal (Fig. 4b) and do not match those of FOSI-L well (Fig. 4a). Although there is an indication that some of the DPLE skill in predicting KEI at LY 3 is associated with an earlier SSH anomaly in the central NP (LY 1–2), it is not clear whether the signal propagates westward. In particular, an SSH anomaly exists just south of KE at LY 1, so some of the KEI prediction skill at LY 3 may simply arise from this local source.
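The lead-year alignment described above can be made explicit in code. In this hypothetical sketch, LY k of a forecast started late in year s is assumed to verify in calendar year s + k (consistent with the 1 November 1998 start verifying in 1999–2003 at LY 1–5), and the ensemble-mean SSH map at every LY is correlated across start dates with the FOSI KEI of the year verified at LY 3, so that lag = LY − 3. The function name `lead_year_lag_maps` is hypothetical.

```python
import numpy as np

# Illustrative sketch; the function name and year convention are assumptions.


def lead_year_lag_maps(ssh_fc, kei_fosi, fosi_years, start_years, ref_ly=3):
    """Correlation maps of forecast SSH at each LY with the FOSI KEI verified at ref_ly.

    ssh_fc      : ensemble-mean SSH, shape (n_starts, n_leads, nlat, nlon), LY 1..n_leads
    kei_fosi    : annual FOSI KEI values for fosi_years
    start_years : calendar year of each start (LY k assumed to verify in year + k)
    Returns {lag: map} with lag = LY - ref_ly (negative: SSH leads KEI).
    """
    lookup = {int(y): v for y, v in zip(fosi_years, kei_fosi)}
    target = np.array([lookup[int(s) + ref_ly] for s in start_years])
    ta = target - target.mean()
    maps = {}
    for k in range(ssh_fc.shape[1]):
        fa = ssh_fc[:, k] - ssh_fc[:, k].mean(axis=0)
        cov = np.tensordot(ta, fa, axes=(0, 0)) / len(ta)
        maps[(k + 1) - ref_ly] = cov / (fa.std(axis=0) * ta.std())
    return maps
```

Because the predictor is the same FOSI KEI for every panel, differences between panels reflect only how the forecast SSH evolves with lead time.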

The source of predictability is further explored by focusing on individual KE events in the retrospective forecasts. The KEIs from the observations and both FOSIs show a well-defined positive peak in the early 2000s (Fig. 1). We trace the monthly SSH anomalies averaged over the KEI latitudes in the retrospective forecasts initialized on 1 November 1998 as a function of lead time (Fig. 6b, d) and compare them with the same SSH anomalies from the respective FOSIs (Fig. 6a, c). In FOSI-H, a positive SSH anomaly is initially located in the central NP, then propagates westward and appears to be related to the positive KEI peak in 2002–2003 (Fig. 6a). This progression of the SSH anomaly agrees well with observations (Supplementary Fig. 4a). The HRDP retrospective forecast ensemble initialized from the FOSI-H state on 1 November 1998 closely matches FOSI-H (Fig. 6b). Although the amplitude of the anomaly is weaker (likely due to ensemble averaging), HRDP shows a westward propagating SSH signal through lead time that likely contributes to the predicted KEI peak at LY 4–5 (corresponding to 2002–2003). This HRDP result thus supports westward Rossby wave propagation as the mechanism behind the high KEI skill score.
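The time–longitude diagrams in Fig. 6 rest on three simple operations: removing the seasonal cycle, averaging over the KEI latitudes, and applying a 12-month running mean. A sketch under assumptions: monthly data with time as the first axis and no gaps, cos(latitude) weighting for the meridional mean, and a "valid" running mean that returns only the fully covered windows (n − 11 values) instead of padding the ends. The function names are hypothetical.

```python
import numpy as np

# Illustrative sketch; function names and weighting choices are assumptions.


def remove_seasonal_cycle(x):
    """Subtract the mean of each calendar-month position from a gap-free monthly series (ntime, ...)."""
    clim = np.stack([x[m::12].mean(axis=0) for m in range(12)])
    return x - clim[np.arange(len(x)) % 12]


def hovmoller(ssh, lat, lat_bounds, window=12):
    """Time-longitude array of cos(lat)-weighted meridional means, smoothed by a running mean.

    ssh : monthly (ntime, nlat, nlon); returns (ntime - window + 1, nlon)
    """
    sel = (lat >= lat_bounds[0]) & (lat <= lat_bounds[1])
    w = np.cos(np.deg2rad(lat[sel]))
    band = np.tensordot(ssh[:, sel, :], w / w.sum(), axes=([1], [0]))  # (ntime, nlon)
    kernel = np.ones(window) / window
    return np.apply_along_axis(lambda s: np.convolve(s, kernel, mode="valid"), 0, band)
```

For FOSI-H, for example, `lat_bounds` would be the KEI latitudes (33.0, 36.0) quoted earlier.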

Fig. 6: SSH anomalies as a function of time and longitude.
figure 6

a Monthly SSH anomalies, meridionally averaged over the KEI latitudes, from FOSI-H during 1999–2003. b Same as in a, but from HRDP (ensemble average) from LY1 through LY5 (corresponding to 1999–2003) of the 1 November 1998 start. c Same as in a, but from FOSI-L. d Same as in b, but from DPLE. The time series are smoothed with a 12-month running mean. The dashed lines indicate the eastern edge of each KEI domain.

While a similar initial SSH anomaly exists in the central NP in both FOSI-L and DPLE, it does not propagate westward and lead to the KEI peak in 2003–2004 present in both the observations and the high-resolution simulations (Fig. 6c, d). Instead, the KEI peak in 2001 appears to be independent of the anomaly in the central NP, which largely remains at the same longitude. We have also traced the negative KEI peak in the late 2000s and found indications of westward propagating signals in the coarse-resolution simulations (Supplementary Fig. 5c, d), similar to those in the observations (Supplementary Fig. 4b) and the high-resolution simulations (Supplementary Fig. 5a, b), although the signal is much weaker. Interestingly, FOSI-L shows positive anomalies near the western boundary in 2007–2009 that are absent in both the observations and FOSI-H and that appear to overpower the signal propagating from the east. Because these positive anomalies are not predicted in DPLE, the westward propagating signal may contribute to a skillful prediction of the observed negative peak with the correct timing.

Several previous studies have proposed air–sea coupled feedbacks as a mechanism involved in generating the quasi-oscillatory, decadal variability of KE13,17,35,36. A possible coupling is also suggested by our analysis of FOSI-H (Fig. 5b). Given the high horizontal resolution of HRDP, which can invigorate air–sea coupling compared with the low-resolution DPLE32, the higher predictive skill in HRDP may be associated with predicted feedbacks from the atmosphere. Figure 7a shows lead-lag regression maps from FOSI-H of winter (January to March) sea surface temperature (SST; shading) and sea level pressure (SLP; contours) onto KEI. At lag −3, the SLP regressions show a meridional dipole resembling the North Pacific Oscillation (NPO)19, the second most dominant mode of atmospheric variability in the NP sector37. The positive SLP anomaly in the mid-latitudes corresponds to negative WSC anomalies in the central to eastern NP that can generate Rossby waves (Fig. 5b). The positive SLP anomaly extends westward through lag −1 along with the SSH anomaly, consistent with the WSC anomaly (Fig. 5b). At the same time, the center of the SLP anomaly moves northeastward to the west coast of Canada (lags −3 to −1) and then to the Gulf of Alaska (lags 0 to 2). By this time, a negative SLP anomaly emerges in the central NP, suggesting a phase reversal of the NPO. This counterclockwise progression is consistent with previous studies13,35 that interpret it as evidence of a coupled feedback between KE and the atmosphere that maintains the quasi-decadal variability in the NP. The phase reversal of the NPO is also associated with a tropical SST anomaly that resembles the central Pacific El Niño–Southern Oscillation (CP-ENSO) and is of opposite sign to the initial tropical Pacific SST anomalies (cf. lags −3 and +1). Therefore, the decadal variability of KE may be paced by atmospheric teleconnections triggered by CP-ENSO anomalies, as suggested by other studies36,38.
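Winter-mean fields and regression maps of the kind shown in Fig. 7 could be produced as follows. The sketch assumes gap-free monthly input with time as the first axis and expresses regressions per one standard deviation of KEI; whether the paper standardizes KEI is not stated here, so that choice, like the function names `jfm_mean` and `regression_map`, is an assumption. Lead-lag maps would follow by shifting the winter series relative to KEI before regressing.

```python
import numpy as np

# Illustrative sketch; function names and the standardization are assumptions.


def jfm_mean(monthly, first_year, first_month=1):
    """January-March mean of each year with all three months present.

    monthly : (ntime, ...) array starting in first_month of first_year
    Returns (years, jfm) with jfm of shape (nyears, ...).
    """
    offset = first_month - 1 + np.arange(monthly.shape[0])
    months = offset % 12 + 1
    years = first_year + offset // 12
    out_years, out = [], []
    for y in np.unique(years):
        sel = (years == y) & (months <= 3)
        if sel.sum() == 3:  # skip winters that are only partly covered
            out_years.append(int(y))
            out.append(monthly[sel].mean(axis=0))
    return np.array(out_years), np.stack(out)


def regression_map(field, index):
    """Field anomaly (ntime, ...) per one standard deviation of the index (ntime,)."""
    z = (index - index.mean()) / index.std()
    fa = field - field.mean(axis=0)
    return np.tensordot(z, fa, axes=(0, 0)) / len(z)
```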

Fig. 7: Regressions of winter SST and SLP onto KEI.
figure 7

a Lead-lag regressions of January to March (JFM) SST (shading) and SLP (black contours) onto KEI from FOSI-H. b Same as in a, but for ensemble-mean JFM SST and SLP from HRDP, with lead-lag regressions computed across LYs and lag 0 at LY 3. Note that the independent variable in b is KEI from FOSI-H. The SST and SLP lead (lag) KEI at negative (positive) lags for FOSI-H and at LY 1–2 (LY 4) for HRDP. The gray contours indicate statistically significant SST regressions at the 95% confidence level. Contour intervals for SLP are 0.4 hPa, and zero contours are omitted.

Whether the atmospheric forcing reflects a coupled feedback within the mid-latitudes or a teleconnection from the tropical Pacific, HRDP does not predict the atmospheric conditions associated with the KE variability (Fig. 7b). Within a few months of initialization (LY 1), the central tropical Pacific SST anomaly associated with the KE variability in FOSI-H is greatly amplified relative to the initialized state, which should be close to the FOSI-H anomalies at lag −2. In addition, an SST anomaly develops in the eastern tropical Pacific that is absent in FOSI-H. Likely because of these SST anomalies, the predicted SLP anomalies project strongly onto the Aleutian Low (AL) mode, the most dominant mode of atmospheric variability in the NP sector and the typical ENSO teleconnection pattern, rather than onto the NPO. Because the (negative) WSC anomalies associated with this AL-like SLP anomaly are centered north of 40°N (Supplementary Fig. 6a), they do not appear able to reinforce the initialized SSH signal. After the first winter, the predicted SLP anomalies are very weak (implying a lack of consistency across ensemble members), while the mid-latitude SST anomalies in FOSI-H are reasonably predicted in HRDP. Based on these results, it is reasonable to conclude that the highly predictable KEI in HRDP results primarily from the initialization of the anomalous ocean state in the central NP and that air–sea coupling does not appear to play a role, although it might provide additional skill if it were predicted.
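Whether an SLP anomaly "projects onto" the AL or the NPO can be quantified with an area-weighted, uncentered pattern correlation against each reference pattern (for example, the first two EOFs of North Pacific SLP). This is a hypothetical sketch: the paper does not specify its projection metric here, and the function name `pattern_projection` is illustrative.

```python
import numpy as np

# Hypothetical sketch; the metric and function name are assumptions.


def pattern_projection(anomaly, patterns, lat):
    """Area-weighted uncentered pattern correlation of an anomaly map with reference patterns.

    anomaly  : (nlat, nlon) map, NaN allowed over missing points
    patterns : dict of name -> (nlat, nlon) reference map, e.g. {"AL": eof1, "NPO": eof2}
    Returns dict of name -> value in [-1, 1]; larger magnitude means a stronger projection.
    """
    w = np.cos(np.deg2rad(lat))[:, None]
    out = {}
    for name, p in patterns.items():
        ok = np.isfinite(anomaly) & np.isfinite(p)  # use only points valid in both maps
        a, q = np.where(ok, anomaly, 0.0), np.where(ok, p, 0.0)
        num = np.sum(w * a * q)
        den = np.sqrt(np.sum(w * a**2) * np.sum(w * q**2))
        out[name] = float(num / den)
    return out
```

An AL-like predicted anomaly would then show a much larger magnitude for the AL pattern than for the NPO pattern.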

Predictability of subsurface temperature

In this subsection, we examine the potential predictability of subsurface temperature associated with the decadal KE variability. In both FOSIs, KEI is strongly correlated with subsurface temperature variability around the KE axis in the respective zonal KEI domains (Fig. 8a, d). Although correlations >0.8 are seen throughout the upper ocean in both FOSIs, the temperature regressions show that the center of action is located at ~400 m in FOSI-H (Fig. 9b) but near the surface in FOSI-L (Fig. 9c), roughly coinciding with where the temperature variance is largest in each FOSI (dark gray contours). The subsurface-centered variability in FOSI-H is consistent with observations, which also show a center of action at about 400 m and negligible anomalies near the surface (Fig. 9a). The strong subsurface variability in FOSI-H and observations further supports the notion that KE variations are associated with Rossby wave propagation and the accompanying fluctuations in thermocline depth39, and indicates that FOSI-H represents the subsurface temperature variability associated with KE variations more faithfully than FOSI-L. We also note that FOSI-H shows a negative anomaly north of the KE axis, which is also hinted at in observations.

Fig. 8: Correlations of the temperature profile in the KEI region.
figure 8

a, d Correlation of the vertical potential temperature profile from FOSI-H (a) and FOSI-L (d), zonally averaged over the respective KEI longitudes, against respective KEI. b, c Same as in a, but from the ensemble average potential temperature profile of HRDP at LY1 (b) and LY4 (c). e, f Same as in b, c, but from DPLE. Note that the predictors in b, c and e, f are KEIs from respective FOSIs and all correlations are simultaneous. The black contours in b, c and e, f indicate statistically significant correlations at the 95% confidence level. The green (gray) contours in a and d are the climatological potential temperature (zonal velocity) profile from respective FOSIs.

Fig. 9: Regressions of the temperature profile in the KEI region.
figure 9

a–c Regression of the vertical potential temperature profile from observations (a), FOSI-H (b), and FOSI-L (c), zonally averaged over the respective KEI longitudes, against the respective KEI. The dark gray contours are the variance of the potential temperature from the respective datasets, with contour intervals of 0.5 °C².

ACCs of the temperature profile against KEI from the respective FOSIs show high skill in the subsurface, exceeding 0.8 (0.7) around the KE axis at LY 1 in HRDP (DPLE) (Fig. 8b, e). The spatial patterns of ACC also closely resemble those of the FOSIs (Fig. 8a, d). However, the high subsurface ACCs in DPLE are associated with minimal variability, as indicated by the regression map from FOSI-L (Fig. 9c). The spatial patterns are generally maintained, but ACCs wane with lead time; in HRDP they nevertheless remain statistically significant below 200 m even at LY 4 (Fig. 8c), consistent with the significant skill in predicting KEI at this lead time (Fig. 2). The subsurface temperature of DPLE is no longer significantly correlated with FOSI-L KEI at LY 4 (Fig. 8f), also consistent with the KEI skill of DPLE (Fig. 2), although significant correlations are found below 700 m, where variability is minimal.

Why is the Rossby wave propagation signal weak in the low-resolution models?

The very weak signature of westward wave propagation in the low-resolution simulations is rather unexpected, given the large-scale nature of long Rossby waves. A clue to this weak signal can be found in the vertical structure of the anomalous ocean temperature associated with KEI, explored in the previous subsection. In FOSI-L, the center of action is located near the surface around 42°N (Fig. 9c), which is not supported by observations (Fig. 9a). Lead-lag regressions of temperature at 100-m depth onto KEI from FOSI-L show that a temperature anomaly develops locally at this latitude off the east coast of Japan and extends eastward as KEI reaches its peak (lags −3 to 0 in Supplementary Fig. 7). In contrast, the anomaly associated with KEI at this latitude is negligible at all depths in observations and FOSI-H (Fig. 9a, b).

To further investigate the dynamics of KE variability in FOSI-L, we use forced ocean–sea-ice simulations that are similar to FOSI-L but forced with interannually varying momentum forcing (FOSI-L-M) or buoyancy forcing (FOSI-L-B) alone (“Methods” section). Lead-lag SSH correlations for FOSI-L-M reveal a clear westward propagation of SSH anomalies, taking about 3 years to reach the western boundary, comparable to FOSI-H and observations (Fig. 10a; compare with Fig. 5). Interestingly, FOSI-L-B shows an eastward propagation of SSH anomalies emanating from the western boundary, consistent with the eastward extension of the subsurface temperature anomaly in FOSI-L (Supplementary Fig. 7). These experiments therefore suggest that wind forcing in the central NP can generate westward propagating signals in FOSI-L, but buoyancy forcing generates a signal propagating in the opposite direction along KE, which interferes with the westward propagating signal and results in a mixed signal that appears to propagate westward faster than baroclinic Rossby waves (Fig. 5c).

Fig. 10: Hovmöller diagrams of SSH correlations with KEI in FOSI-L-M and FOSI-L-B.
figure 10

a Correlations of the meridionally averaged, monthly SSH over the KE latitudes in FOSI-L-M against the corresponding monthly KEI, plotted as a function of longitude and lag. b Same as in a, but for FOSI-L-B. All time series are smoothed with a 12-month running mean before the correlations are computed. SSH leads (lags) KEI at negative (positive) lags.

The incoming signals at the western boundary also appear to be weaker in FOSI-L than in FOSI-H. The amplitude of the initial SSH anomalies in the central NP is very similar between the two FOSIs and close to that of the observed SSH (Supplementary Fig. 8a). However, the signal in FOSI-L is quickly damped as it propagates westward (Supplementary Fig. 8c), whereas its amplitude is relatively well preserved in FOSI-H (Supplementary Fig. 8b). This suggests that the Rossby waves arriving in the western boundary region (west of 160°E) have a weaker amplitude in FOSI-L than in FOSI-H, likely because of the absence of convergence into a sharp front and the more diffusive nature of the low-resolution model40. It therefore seems reasonable to hypothesize that both westward and eastward propagating signals exist in DPLE when it is initialized with ocean states from FOSI-L and that, because the westward propagating signal is weak, it is overpowered by the eastward propagating signal, so that predictability is governed more by the latter. Because this signal propagates eastward, away from the KEI domain, predictability arising from this source is likely shorter-lived than that achieved via westward Rossby wave propagation from the central NP (which takes 3–4 years to reach the western boundary).

Discussion

In this study, we have shown exceptional skill in predicting KE variability up to 4 years ahead in an eddy-resolving decadal prediction system (HRDP), significantly higher than the skill of a low-resolution system using the same model framework (DPLE). The source of the exceptional skill in HRDP is an initialized ocean mechanism: SSH anomalies induced in the central NP by Ekman pumping propagate westward as baroclinic Rossby waves, taking about 3 years to arrive at the western boundary. Local persistence of the signal17 and atmospheric feedback from mesoscale air–sea interaction26 could potentially contribute some skill in addition to that from Rossby wave propagation, resulting in skillful prediction of KE variability up to 4 years ahead. The westward propagating signal appears to be trapped and guided by the sharp KE front in the high-resolution simulations21,25, causing initially broad SSH anomalies in the central NP to converge meridionally as they approach the western boundary. The representation of this process appears to be key to the exceptional skill in HRDP, because such convergence could concentrate the signal's energy within a narrow meridional extent. We do not find evidence of skill in predicting the large-scale air–sea coupled feedbacks that could further augment the skill for KE variability in decadal prediction systems17,35,36. However, we cannot rule out the possibility that coupled air–sea interaction associated with ocean mesoscale eddies26 plays some role in HRDP. While HRDP does represent this process, it does not appear to feed back significantly onto large-scale SLP anomalies (Fig. 7), and its role in KE variability and predictability requires further investigation.

Although westward Rossby wave propagation appears to exist in the low-resolution models, the amplitude of the signal reaching the western boundary region (west of 160°E) is weaker than in the high-resolution models because the convergence of the waves into a sharp KE front is not represented at this resolution. Furthermore, the low-resolution system suffers from unrealistically strong upper-ocean variability around 42°N near Japan in the state reconstruction used for initialization. This spurious upper-ocean variability propagates eastward and appears to interfere with the incoming Rossby wave signals from the central NP. Thus, the potential skill associated with Rossby wave propagation from the central NP at later lead times (LY 3–4) appears degraded in DPLE. The eastward propagating signal in the upper ocean along the subarctic frontal zone (SAFZ; ~41°N) and its excessive amplitude in a low-resolution model have been discussed in previous studies41,42. An ocean model intercomparison study also shows an overestimated SST variance along the SAFZ in most models43. The spurious upper-ocean variability along the SAFZ therefore appears to be a common symptom of low-resolution models. At this resolution (~1°), KE is much too broad and extends too far north (gray contours in Fig. 8d), so there is no clear distinction between KE and the SAFZ. It is therefore possible that, while these two systems have more independent dynamics in reality and in high-resolution models44, they are highly interdependent in low-resolution models, generating spurious variability. This implies that the key dynamics governing KE variability are potentially underrepresented in low-resolution models, whereas variability associated with the SAFZ is overemphasized.
Considering the possible role of KE variability in decadal variability in the NP (e.g., the Pacific Decadal Oscillation)3,36, the implications of these underrepresented KE dynamics in low-resolution coupled simulations need to be investigated in future studies.

Although GFDL-SPEAR uses a low-resolution ocean model similar to that of DPLE, it appears to outperform DPLE in predicting KE variability24. A possible explanation for the higher skill in GFDL-SPEAR is the SST restoring towards observations in its initialization run23, which is absent in the FOSIs. SST restoring could suppress the spurious variability along the SAFZ, so that, once the forecasts are initialized, Rossby waves may propagate to the western boundary without much interference from SAFZ variability. Part of the lower skill in DPLE relative to GFDL-SPEAR can also be attributed to the less frequent start dates in our analysis (every other year in the subsampled DPLE vs every year in GFDL-SPEAR) and to the shorter analysis period, both chosen for consistency with HRDP. If every start date and a longer period are considered, the ACC against FOSI-L increases at longer lead times (LY 3–5) and all ACCs for LY 1 through 5 become significant (Supplementary Fig. 9), comparable to or marginally lower than the ACC obtained from GFDL-SPEAR. We note, however, that DPLE has an advantage over SPEAR in ensemble size (40 members for DPLE vs. 20 for SPEAR).

Given the large computational resources required to run coupled high-resolution decadal prediction systems, it is useful to know how many ensemble members are needed to achieve skill comparable to that obtained with 10 members. To answer this question, we resample the KEI from HRDP using a bootstrap method to generate a distribution of ACCs as a function of ensemble size (Methods). The answer depends on lead time. For short lead times (e.g., LY 2; Supplementary Fig. 10), the mean ACC is reasonably high even with a single ensemble member (~0.65) compared with ~0.75 with 10 members, although the uncertainty range is large (0.45–0.82). Both the mean ACC and its uncertainty range quickly level off as ensemble size increases. For long lead times (e.g., LY 4; Supplementary Fig. 10), however, the mean ACC for small ensembles (1–2 members) is substantially lower (~0.4) than for large ensembles (9–10 members; ~0.6), and the uncertainty range for small ensembles is very large. For example, the ACC with three ensemble members at LY 4 can lie anywhere between 0.25 and 0.7. Therefore, a standard decadal prediction ensemble size (~10) appears to be important for long-lead predictions (3 or more years in advance), whereas a smaller ensemble (~3) may be sufficient for short-lead predictions (up to 2 years in advance).

Observations suggest that surface biomass, as represented by chlorophyll, in the upstream KE region varies in tandem with decadal KE variability14,15. Using a high-resolution regional model coupled with a biogeochemistry (BGC) model, Lin et al.16 show that nutrient anomalies propagate westward along with thermocline (and thus SSH) anomalies to the upstream KE region, where they appear to modulate upper-ocean chlorophyll through vertical mixing. Given the realistic representation of the westward propagating signal and its high predictability in HRDP, it seems reasonable to expect skillful prediction of nutrients, and thus of upper-ocean chlorophyll, in the upstream KE in high-resolution systems. Unfortunately, the BGC component was not activated in HRDP due to resource limitations, but the results of this study clearly support the use of a BGC component in future high-resolution prediction studies. If KE BGC fields prove to be predictable, multi-year to decadal predictions with high-resolution models could be very helpful for managing marine resources and fisheries for communities that rely heavily on fisheries for food.

Methods

Prediction systems

The decadal prediction systems used in this study are identical to those described by Yeager et al.32, to which readers are referred for further details and a general comparison of the two systems. We summarize only a few key aspects of the systems below.

The CESM High Resolution Decadal Prediction (HRDP) system32 uses the CESM1 model configured at high horizontal resolution (~0.1° for the ocean and sea ice; 0.25° for the atmosphere and land)45. The ocean and sea-ice components are initialized from a forced ocean–sea-ice (FOSI) simulation at the same resolution, constrained only at the surface by atmospheric states derived from the Japanese 55-year Reanalysis (JRA55) following the protocol of the Ocean Model Intercomparison Project version 2 (OMIP2)46. The atmospheric initial conditions are regridded JRA55 analysis fields, and the land initial conditions are taken from a high-resolution atmosphere–land simulation forced with observed sea surface temperatures47. All components use full-field initialization. HRDP comprises 10-member ensembles initialized every other year on November 1 between 1982 and 2016 and integrated for 62 months. The ensemble spread is generated by applying round-off-level perturbations to the initial atmospheric temperature.

The CESM Decadal Prediction Large Ensemble (DPLE) system34 uses a slightly different version of CESM148. All components of DPLE have a nominal horizontal resolution of 1°. The atmosphere component uses a finite-volume dynamical core instead of the spectral-element core used in HRDP. The ocean and sea-ice components are initialized as in HRDP, but from a coarse-resolution FOSI performed following the protocol of the Ocean Model Intercomparison Project version 1 (OMIP1)49, in which the base atmospheric state variables are largely taken from the NCEP reanalysis. The initial conditions for the atmosphere and land components come from a single member of the CESM1 Large Ensemble48. In contrast to HRDP, DPLE is initialized (full field) every year on November 1 between 1954 and 2017 and integrated for 122 months. Ensemble spread (40 members) is generated in the same way as in HRDP. Although DPLE includes more start dates (64) and longer lead times (122 months) than HRDP, we subsample DPLE to match HRDP in start dates (18) and lead times (62 months).
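The subsampling of DPLE to match HRDP amounts to intersecting the two lists of start years and truncating the lead-month dimension. The minimal sketch below reproduces the counts quoted above from plain year lists; the helper name and the convention that lead-month index 0 is November of the start year are illustrative assumptions, not part of the CESM tooling.

```python
# Start years as stated in the text (all runs start on November 1).
DPLE_START_YEARS = list(range(1954, 2018))     # every year, 1954-2017
HRDP_START_YEARS = list(range(1982, 2017, 2))  # every other year, 1982-2016

DPLE_LEAD_MONTHS = 122
HRDP_LEAD_MONTHS = 62


def match_dple_to_hrdp(dple_starts, hrdp_starts, dple_months, hrdp_months):
    """Return the DPLE start years and lead-month indices shared with HRDP.

    Hypothetical helper: lead-month index 0 is assumed to be November of the
    start year, so keeping the first hrdp_months indices gives DPLE the same
    forecast window as HRDP.
    """
    hrdp_set = set(hrdp_starts)
    starts = [year for year in dple_starts if year in hrdp_set]
    leads = list(range(min(dple_months, hrdp_months)))
    return starts, leads


starts, leads = match_dple_to_hrdp(
    DPLE_START_YEARS, HRDP_START_YEARS, DPLE_LEAD_MONTHS, HRDP_LEAD_MONTHS
)
print(len(DPLE_START_YEARS), len(starts), starts[0], starts[-1], len(leads))
# prints: 64 18 1982 2016 62
```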

The high- and coarse-resolution FOSIs used to initialize the respective prediction systems are also used in our analyses and are referred to as FOSI-H and FOSI-L, respectively. In addition, we use forced ocean–sea-ice simulations identical to FOSI-L but forced with interannually varying momentum forcing (FOSI-L-M) or buoyancy forcing (FOSI-L-B), with seasonally varying climatological forcing applied for the other component50. This climatological forcing is applied by repeating the Normal Year Forcing (NYF)51, which is designed to retain synoptic atmospheric variability.

Statistical methods

Statistical analyses are based on ensemble-mean forecast anomalies from HRDP and DPLE. The forecast anomalies are computed relative to the model climatology for 1987–2017 for odd lead years (i.e., lead years 1, 3, and 5) and for 1986–2016 for even lead years (i.e., lead years 2 and 4), because HRDP is initialized only in even years, so each lead year verifies only in every other calendar year. Anomalies of the FOSIs and observations are defined in the same way.
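The two climatology windows follow from the start dates once a lead-year convention is fixed. The sketch below assumes that lead year 1 is the calendar year after the November start (consistent with a 62-month run covering November–December plus five full years) and uses a hypothetical cap, `last_year=2017`, standing in for the end of the verification record; under these two assumptions it recovers the 1987–2017 and 1986–2016 windows. Function names are illustrative.

```python
import numpy as np

HRDP_START_YEARS = np.arange(1982, 2017, 2)  # November 1 starts, 1982-2016


def verification_years(start_years, lead_year):
    # Assumption: lead year 1 is the calendar year after the November start.
    return start_years + lead_year


def common_window(start_years, lead_years, last_year=2017):
    """First and last calendar years verified by every lead year in the group.

    last_year is a hypothetical cap for the end of the verification data.
    """
    common = None
    for lead in lead_years:
        years = {int(y) for y in verification_years(start_years, lead) if y <= last_year}
        common = years if common is None else common & years
    return min(common), max(common)


def lead_year_anomalies(index, start_years, lead_year, window):
    """Anomalies of one lead year's index relative to its mean over window."""
    years = verification_years(start_years, lead_year)
    in_window = (years >= window[0]) & (years <= window[1])
    return index - index[in_window].mean()


odd = common_window(HRDP_START_YEARS, [1, 3, 5])
even = common_window(HRDP_START_YEARS, [2, 4])
print(odd, even)  # (1987, 2017) (1986, 2016)
```

Using one window per parity group keeps every lead year in that group anchored to the same set of verifying years, so their anomalies are directly comparable.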

The significance of the ACC scores (Fig. 2) is tested using a bootstrap method34. We first randomly resample the Kuroshio Extension index (KEI) from the forecast ensembles with replacement along both the time and member dimensions and compute the ACC against the KEI from the respective FOSI or from observations. To account for temporal autocorrelation, the resampling in time selects blocks of 2 consecutive values (equivalent to 4 years, because HRDP is available only for every other start year). This process is repeated 5000 times to generate a distribution of ACC scores. The ensemble-mean ACC is deemed significant if it lies above the 97.5th percentile of the resampled distribution.
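The text leaves some implementation details open, so the sketch below is one reading of the procedure, not the authors' code: forecast start dates are drawn in blocks of 2 consecutive values and members are drawn with replacement, independently of the verifying series, which is kept in its original order; the actual ensemble-mean ACC is then compared with the 97.5th percentile of the resampled ACCs. Array shapes and function names are assumptions.

```python
import numpy as np


def acc(forecast, verification):
    """Anomaly correlation coefficient of two 1-D series (centred here)."""
    f = forecast - forecast.mean()
    v = verification - verification.mean()
    return float(np.sum(f * v) / np.sqrt(np.sum(f**2) * np.sum(v**2)))


def block_bootstrap_acc(ens, verif, n_boot=5000, block=2, seed=0):
    """Resampled distribution of ensemble-mean ACC.

    ens:   (n_members, n_times) forecast KEI for one lead year
    verif: (n_times,) verifying KEI (FOSI or observations), kept in order
    Forecast times are drawn in blocks of `block` consecutive start dates and
    members are drawn with replacement, both independently of verif.
    """
    rng = np.random.default_rng(seed)
    n_mem, n_t = ens.shape
    n_blocks = -(-n_t // block)  # ceiling division
    scores = np.empty(n_boot)
    for b in range(n_boot):
        block_starts = rng.integers(0, n_t - block + 1, size=n_blocks)
        t_idx = np.concatenate([np.arange(s, s + block) for s in block_starts])[:n_t]
        m_idx = rng.integers(0, n_mem, size=n_mem)
        resampled = ens[np.ix_(m_idx, t_idx)]
        scores[b] = acc(resampled.mean(axis=0), verif)
    return scores


def acc_is_significant(ens, verif, **kwargs):
    """Ensemble-mean ACC and whether it exceeds the 97.5th percentile."""
    observed = acc(ens.mean(axis=0), verif)
    threshold = np.percentile(block_bootstrap_acc(ens, verif, **kwargs), 97.5)
    return observed, bool(observed > threshold)
```

With 18 start dates and `block=2`, each draw concatenates nine 2-value blocks, mirroring the 4-year blocks described above.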

To account for the different ensemble sizes of HRDP (10) and DPLE (40), we randomly resample (5000 times, with replacement) 10-member ensembles from DPLE and compute the ACC against the KEI from both FOSI-L and observations. The resulting distribution (2.5th to 97.5th percentile) of ACC scores is displayed in Fig. 2. A similar resampling method is used to generate the distribution of ACCs against FOSI-H in HRDP to assess skill stability as a function of ensemble size from 1 to 10 (Supplementary Fig. 10). Unlike the DPLE case above, in which each resampled time series comes entirely from a single ensemble member, here each time point of a resampled time series can come from a different ensemble member. This is possible because the time points within an ensemble member are independent of each other: each belongs to a separate prediction run integrated along the lead-time dimension from its own initial conditions.
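Both resampling variants can be sketched with one function. In the minimal sketch below (names and array layout are assumptions), `per_time=False` draws whole member time series, as for the 10-member DPLE subsamples, while `per_time=True` draws members independently at every start date, as for the HRDP ensemble-size curves; `skill_vs_size` then summarizes the mean and the 2.5th–97.5th percentile range for each ensemble size.

```python
import numpy as np


def acc(forecast, verification):
    """Anomaly correlation coefficient of two 1-D series (centred here)."""
    f = forecast - forecast.mean()
    v = verification - verification.mean()
    return float(np.sum(f * v) / np.sqrt(np.sum(f**2) * np.sum(v**2)))


def subsample_acc(ens, verif, ens_size, n_boot=5000, per_time=False, seed=0):
    """ACC distribution for ensemble means built from ens_size members.

    ens: (n_members, n_times); verif: (n_times,).
    per_time=False: each draw takes ens_size whole member time series.
    per_time=True:  members are drawn independently at every start date,
                    since each start date is a separate hindcast.
    """
    rng = np.random.default_rng(seed)
    n_mem, n_t = ens.shape
    scores = np.empty(n_boot)
    for b in range(n_boot):
        if per_time:
            m_idx = rng.integers(0, n_mem, size=(ens_size, n_t))
            # draws[i, t] = ens[m_idx[i, t], t]
            draws = np.take_along_axis(ens, m_idx, axis=0)
        else:
            m_idx = rng.integers(0, n_mem, size=ens_size)
            draws = ens[m_idx]
        scores[b] = acc(draws.mean(axis=0), verif)
    return scores


def skill_vs_size(ens, verif, sizes, **kwargs):
    """(size, mean ACC, 2.5th percentile, 97.5th percentile) for each size."""
    rows = []
    for k in sizes:
        d = subsample_acc(ens, verif, k, per_time=True, **kwargs)
        lo, hi = np.percentile(d, [2.5, 97.5])
        rows.append((k, float(d.mean()), float(lo), float(hi)))
    return rows
```

For the DPLE comparison one would call `subsample_acc(dple_ens, verif, 10)` on a 40-member array, and for the HRDP curves `skill_vs_size(hrdp_ens, verif, range(1, 11))`.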

The statistical significance of correlations and regressions in Figs. 3–4 and 7–9 is assessed at the 95% confidence level using a two-sided Student's t test with effective degrees of freedom that account for lag-1 autocorrelation52.
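As an illustration of a correlation test with reduced degrees of freedom, the sketch below assumes one widely used form of the effective sample size, n_eff = n(1 − r1·r2)/(1 + r1·r2), where r1 and r2 are the lag-1 autocorrelations of the two series; the exact form used in ref. 52 may differ. The function names are hypothetical, and `scipy.stats.t.sf` supplies the two-sided p-value.

```python
import numpy as np
from scipy import stats


def lag1_autocorr(x):
    """Lag-1 autocorrelation of a 1-D series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.sum(x[:-1] * x[1:]) / np.sum(x * x))


def corr_ttest_neff(x, y, alpha=0.05):
    """Two-sided t test of a Pearson correlation with reduced degrees of freedom.

    Assumed effective sample size: n_eff = n (1 - r1 r2) / (1 + r1 r2),
    capped at n, with r1, r2 the lag-1 autocorrelations of x and y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    r = float(np.corrcoef(x, y)[0, 1])
    r1, r2 = lag1_autocorr(x), lag1_autocorr(y)
    n_eff = min(n, n * (1 - r1 * r2) / (1 + r1 * r2))
    dof = max(n_eff - 2.0, 1.0)  # keep at least one degree of freedom
    t_stat = r * np.sqrt(dof / (1.0 - r**2))
    p_value = float(2.0 * stats.t.sf(abs(t_stat), dof))
    return r, n_eff, p_value, p_value < alpha
```

Strongly autocorrelated series, such as decadal KE indices, yield an n_eff well below the sample length, which widens the confidence bounds accordingly.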

Observational datasets

The observational datasets used in the analyses are: the monthly sea surface height dataset from SSALTO/DUACS altimetric mean dynamic topography distributed by the Copernicus Marine Environment Monitoring Service (CMEMS); monthly NOAA Optimum Interpolation Sea Surface Temperature version 2 (OISSTv2)53; monthly gridded ocean temperature profiles from the Met Office EN4.2.154; and monthly sea level pressure data from the NCEP55 and JRA5556 reanalysis products.