## Introduction

The past decades have seen significant progress in understanding the physics and predictability of the El Niño–Southern Oscillation (ENSO)1,2,3,4,5 (e.g., a recent review by Clarke6). In particular, a fairly robust phase-locking of interannual variability to the seasonal cycle has been revealed7. This is demonstrated by applying an Empirical Orthogonal Function (EOF) analysis to the Niño 3.4 index (Fig. 1). The annual amplitude function defines an El Niño, La Niña or neutral year, and the evolution of an ENSO event follows the calendar-year structure function. The persistence of ENSO from July to February of the following year has a high level of predictability. One major remaining challenge is the so-called spring predictability barrier (SPB)6. That is, the skill of prediction made from April/May to July (i.e., the transition through spring) is lower than in other months for all dynamical and statistical forecasting models2.

Statistical models based on previously identified precursors, including the equatorial Pacific Warm Water Volume (WWV)8,9,10 and Indo-Pacific equatorial wind11, 12, show reduced prediction skill for ENSO after 2002 (ref. 6). This has been attributed to the weaker ENSO amplitude, more frequent shifts between El Niño and La Niña, and a tendency toward more central Pacific than eastern Pacific El Niños2, 7, 13,14,15. Because of its role in the zonal displacement of the Warm Pool, zonal surface current anomaly (U) in the equatorial Pacific has been identified as a potential additional precursor. An index based on U averaged over the Niño 3.4 region (5°N–5°S, 170–120°W) shows a precursor property similar to the WWV, but with a slightly lower precursor correlation6.

In this study, we show that the ENSO prediction skill across the SPB using surface currents can be significantly improved by identifying “hotspots” of correlation between the Niño 3.4 index and surface currents (from satellite remote sensing) over the whole tropical Pacific region. This is achieved by undertaking a multivariate regression analysis16 (Methods). The analysis reveals regions where surface current variability influences ENSO, and the directions of these currents. Figure 2 presents the results of four regression “trials”. (1) A 2-month-lead “prediction” of the July Niño 3.4 index (denoted as $${N}_{7}$$) is based on surface current anomalies in May. It shows two hotspots: one straddling the equator between 170°E and 140°W, and the other centered at 6°S between 160°E and the dateline. An increase of the Niño 3.4 index is associated with strengthening of the eastward velocity in the northern hotspot and of the westward velocity in the southern hotspot. (2) Two similar hotspots are found for a 3-month-lead “prediction” from April, based on surface currents averaged from February to April. In this case, the area of the northern hotspot shrinks, while the southern one moves eastward. (3) A 4-month-lead “prediction” from March is based on surface currents averaged from January to March. In this case, the northern hotspot nearly vanishes, while the southern one moves further eastward. (4) A 5-month-lead “prediction” from February is based on surface currents averaged from January to February. In this case, the northern hotspot completely vanishes, while the southern hotspot expands in the meridional direction. The eastern boundary of the southern hotspot is located at the dateline, 170°W, 160°W and 150°W for prediction lead times of 2, 3, 4 and 5 months, respectively.

The locations of hotspots identified by the regression analysis hint at the role played by currents in the displacement of warm water. Figures 3 and 4 show the hotspot positions relative to two major branches of surface currents, and the distributions of sea surface temperature (SST) in the tropical Pacific. The northern hotspot lies in the interleaving area of the eastward-flowing North Equatorial Countercurrent (NECC) and westward-flowing South Equatorial Current (SEC). During El Niño years, the enhanced NECC and the weakened SEC facilitate the eastward migration of the Warm Pool, as revealed previously17,18,19. The southern hotspot is located at the southern edge of the SEC (Fig. 3), and between two cores of maximum SST (Fig. 4). During El Niño years, the westward current in the southern hotspot intensifies, resulting in displacement of warm water from east to west. The opposite situation happens in La Niña years. Previous studies have revealed that the accumulation of warm water in the western Pacific is a necessary precondition for the onset of El Niño20, 21. Thus, the currents in the southern hotspot may be particularly useful for ENSO prediction across the SPB.

The zonal current anomaly in the southern hotspot can be related to the anomalous curl of wind stress. We take averages from February to April (right panels of Fig. 3) as an example. The mean wind stress curl is positive over the southern hotspot and to its north, and negative to the south (Fig. 3d), corresponding to downwelling and upwelling, respectively, in the southern hemisphere. During El Niño years, the curl anomaly is generally negative near the southern hotspot, which weakens the downwelling to the north and enhances the upwelling to the south. This favours the shaping of the SST distribution with the maximum SST being located just to the south of the hotspot (Fig. 4c). This leads to an increase in the westward current in the southern hotspot (Fig. 3b). Note that to both the north and south of this hotspot the anomalous zonal currents are eastward, so enhancement of the westward displacement of warm water occurs only in this very localized southern hotspot during the transition stage of the ENSO cycle. The situation is reversed during La Niña years.

The prediction of $${N}_{7}$$ can be formulated through a multivariate regression to various precursors in the spring transition stage, including surface currents in the southern hotspot, and the total and western WWV (Methods). The prediction skill is illustrated in Fig. 5 and summarized in Table 1. If only surface currents are used, the correlations between the observed and predicted $${N}_{7}$$ with lead times of 2, 3, 4 and 5 months are 0.95, 0.87, 0.68 and 0.64, respectively. If only the total WWV is used, the corresponding correlations are 0.76, 0.72, 0.67 and 0.66. Combining both the surface currents in the southern hotspot and the western WWV, the corresponding correlations are increased to 0.95, 0.90, 0.77 and 0.78, respectively.

The surface current in the hotspot identified from the prediction of $${N}_{7}$$ also shows high forecasting skill in the subsequent months after July. Figure 6 shows the skills of retrospective predictions during 1993–2005 as a function of lead months, based on surface currents, the WWV and persistence (Methods). The skill is quantified by the correlation coefficient between the predicted Niño 3.4 index and the corresponding observations, and by the root mean square of their difference (rms error). A prediction with a correlation larger than 0.6 is referred to as skillful22. Predictions are initialized in May, April, March and February, respectively. Predictions based on surface current outperform persistence at all lead months. The SPB is evident, as indicated by the significant reduction of skills at lead months of 11, 12, 13 and 14, respectively, for predictions started in May, April, March and February. Before approaching the “barrier” of the next year (in April), predictions based on surface currents are all skillful, with correlations being 28%, 21%, 13% and 21% higher, and the rms error being 26%, 16%, 8% and 8% lower, than the predictions based on the WWV on average, for initialization in May, April, March and February, respectively. If the surface current and the western WWV are both used, the prediction skill can be improved, especially for initialization made in March and February. This further indicates that prediction based on surface current has a better ability to overcome the SPB than that based on the WWV alone.

Next, we perform cross validations by examining whether the prediction skill based on surface current varies with the training and application periods. Figure 7 compares the skills of four regression trials. The reference trial sets the training and application spanning the same period of 1993–2015. The next trial sets the training period as 1993–2007 and the application period as 1993–2015. It achieves skill very similar to that of the reference trial. The other two trials set non-overlapping training and application periods, one period being 1993–2004 and the other being 2005–2015. For these two trials, their differences with the reference trial in terms of correlation and rms error are generally within 0.1 before approaching the SPB of the next year. An exception is the trial with a training period of 1993–2004 and an application period of 2005–2015: when initialized in April, its correlation drops below 0.6 at lead times beyond 8 months. Overall, for the latter three trials with training periods different from application periods, their prediction skills are not seriously degraded compared to the reference trial that has an overlapping period for training and application. These cross-validation tests suggest that, after being trained with existing data, the surface currents-based model can indeed be skillful for prediction.
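The split-period trials described above can be sketched in Python as follows. This is a minimal illustration on synthetic data, not the analysis code used in this study: the function name `train_apply`, the two-column predictor matrix standing in for hotspot current anomalies, and the random numbers are all assumptions made for the example.

```python
import numpy as np

def train_apply(X, y, years, train_range, apply_range):
    """Fit regression coefficients on one period and score them on another.

    X     : (n_years, n_predictors) precursor anomalies (hypothetical layout)
    y     : (n_years,) predictand, e.g. the change in the Nino 3.4 index
    years : (n_years,) calendar year of each sample
    Returns (correlation, rms error) over the application period.
    """
    train_mask = (years >= train_range[0]) & (years <= train_range[1])
    apply_mask = (years >= apply_range[0]) & (years <= apply_range[1])
    # Least-squares fit without intercept, since all inputs are anomalies.
    coef, *_ = np.linalg.lstsq(X[train_mask], y[train_mask], rcond=None)
    predicted = X[apply_mask] @ coef
    observed = y[apply_mask]
    r = np.corrcoef(observed, predicted)[0, 1]
    rms = np.sqrt(np.mean((observed - predicted) ** 2))
    return r, rms

# Synthetic stand-in data (assumption): 23 years, two predictors, weak noise.
rng = np.random.default_rng(0)
years = np.arange(1993, 2016)
X = rng.normal(size=(years.size, 2))
y = X @ np.array([0.8, -0.5]) + 0.1 * rng.normal(size=years.size)

# The four training/application combinations listed in the text.
trials = {
    "reference": ((1993, 2015), (1993, 2015)),
    "train 1993-2007": ((1993, 2007), (1993, 2015)),
    "train early, apply late": ((1993, 2004), (2005, 2015)),
    "train late, apply early": ((2005, 2015), (1993, 2004)),
}
results = {name: train_apply(X, y, years, tr, ap) for name, (tr, ap) in trials.items()}
for name, (r, rms) in results.items():
    print(f"{name}: r = {r:.2f}, rms = {rms:.2f}")
```

Only the two trials with disjoint periods are out-of-sample tests in the strict sense; the other two reuse some or all of the training years when scoring.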

In summary, the skill of ENSO prediction across the SPB can be improved using surface currents in the southern hotspot. The importance of surface currents in this very localized region, in terms of the overall westward accumulation of warm water, may be related to its significant vertical extension. The satellite ocean current product used in the analysis represents currents averaged from surface to 30 m depth23. Furthermore, observations have shown that the SEC here can extend to about 300 m depth24,25,26. Quantification of the anomalous zonal heat transport in this area requires detailed knowledge of the time-space variations in currents and ocean temperature. In this study, the statistical relationship and prediction model are derived from analysis of 23 years of observations from 1993 to 2015. It remains to be verified whether this prediction model can be applied over longer durations. Also, it is desirable to apply this model to real-time ENSO predictions27. However, it is worth noting that the prediction model based on the surface current precursor shows skill both before and after 2002, while predictions based on other precursors, including the WWV and winds6, show reduced skill after 2002.

## Methods

### Prediction Based on Regression Relationship

High-influence regions of surface current are identified from the prediction of $${N}_{7}$$ through a linear multiple regression analysis. The regression model is formulated as $${N}_{7}-{N}_{t}=\alpha {U}_{i}^{t}+\beta {V}_{i}^{t}+\varepsilon $$, where $${N}_{7}$$ and $${N}_{t}$$ are the Niño 3.4 index in month numbers 7 (July) and t; ($${U}_{i}^{t}$$, $${V}_{i}^{t}$$) denote surface current anomalies averaged from month i to t; α and β are regression coefficients; and ε is the residual. Thus, 7 − t defines the lead time (in months) of the prediction, and t − i + 1 defines the number of months over which the surface currents are averaged as the prediction precursor. The values of i are selected among t, t − 1 and t − 2, corresponding to averaging lengths of 1, 2 and 3 months. For each choice of t and i, the location of the “hotspot” where $${N}_{7}-{N}_{t}$$ and $$\alpha {U}_{i}^{t}+\beta {V}_{i}^{t}$$ have the maximum correlation is identified. Table 1 lists the locations of the identified hotspot, and the correlation (r) between the observed and predicted $${N}_{7}$$ using different precursors with different lead times.
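As a rough illustration of the hotspot search described above, the following Python sketch fits the two-coefficient regression at every grid cell and locates the cell with the highest correlation. The function name `hotspot_search`, the array layout (years × latitude × longitude) and the synthetic data are assumptions made for this example only; they are not taken from the analysis code used in this study.

```python
import numpy as np

def hotspot_search(target, u, v):
    """Regress target on (u, v) at each grid cell and map the correlation.

    target : (n_years,) array, here standing in for N7 - Nt
    u, v   : (n_years, n_lat, n_lon) zonal and meridional current anomalies,
             already averaged over months i..t
    Returns the correlation map and the (lat, lon) indices of its maximum.
    """
    n_years, n_lat, n_lon = u.shape
    corr = np.full((n_lat, n_lon), np.nan)
    for j in range(n_lat):
        for k in range(n_lon):
            X = np.column_stack([u[:, j, k], v[:, j, k]])
            # Least squares for alpha and beta (no intercept: inputs are anomalies).
            coef, *_ = np.linalg.lstsq(X, target, rcond=None)
            fitted = X @ coef
            if np.std(fitted) > 0:
                corr[j, k] = np.corrcoef(target, fitted)[0, 1]
    j_best, k_best = np.unravel_index(np.nanargmax(corr), corr.shape)
    return corr, (int(j_best), int(k_best))

# Synthetic test field (assumption): 23 years on a 4 x 5 grid, with the
# "true" influence planted at grid cell (2, 3).
rng = np.random.default_rng(0)
u = rng.normal(size=(23, 4, 5))
v = rng.normal(size=(23, 4, 5))
target = 0.8 * u[:, 2, 3] - 0.5 * v[:, 2, 3] + 0.1 * rng.normal(size=23)

corr_map, best = hotspot_search(target, u, v)
print("hotspot cell:", best, "correlation:", round(float(corr_map[best]), 2))
```

The explicit double loop keeps the correspondence with the regression equation obvious; for a full tropical Pacific grid a vectorized formulation would be faster.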

The prediction for $${N}_{t+\mathrm{lead}}$$ is formulated as $${N}_{t+\mathrm{lead}}={N}_{t}+\alpha {U}_{i}^{t}+\beta {V}_{i}^{t}+\gamma {W}_{W}^{t}+\delta {U}_{i}^{t}{W}_{W}^{t}+\zeta {V}_{i}^{t}{W}_{W}^{t}+\varepsilon $$, where ($${U}_{i}^{t}$$, $${V}_{i}^{t}$$) now denote surface currents in the southern hotspot identified from the prediction of $${N}_{7}$$; $${W}_{W}^{t}$$ is the western WWV; and lead denotes the lead time in months. Values of α, β, γ, δ and ζ are obtained through regression, and some of them can be pre-set to zero to exclude the related precursors. The combination of surface current with the western WWV achieves a higher prediction skill than using the total WWV (W) alone. Except for the initial condition ($${N}_{t}$$), the other terms may be interpreted as various “precursors”: the displacement of the mean WWV by the anomalous surface currents ($$\alpha {U}_{i}^{t}+\beta {V}_{i}^{t}$$); the displacement of the anomalous western WWV by the mean surface currents ($$\gamma {W}_{W}^{t}$$); and the displacement of the anomalous western WWV by the anomalous surface currents ($$\delta {U}_{i}^{t}{W}_{W}^{t}+\zeta {V}_{i}^{t}{W}_{W}^{t}$$). The prediction skill is measured by the correlation between the observed and predicted $${N}_{t+\mathrm{lead}}$$ and the root mean square (rms) of their difference. The prediction model based on the WWV is formulated as $${N}_{t+\mathrm{lead}}=\gamma {W}^{t}+\varepsilon $$. The prediction skill is measured by the correlation and rms error between $${N}_{t+\mathrm{lead}}$$ and $$\gamma {W}^{t}$$. Without involving the surface current, a higher prediction skill is achieved using the total WWV instead of the western or eastern WWV.
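The combined model above, with current, western WWV and cross terms, can be sketched as an ordinary least-squares fit. The code below is an illustrative Python example under stated assumptions: the function names `fit_predict` and `skill` are hypothetical, and the inputs are noise-free synthetic series chosen so that the fitted coefficients can be checked against known values.

```python
import numpy as np

def fit_predict(n_t, u, v, w_west, n_target):
    """Fit the change n_target - n_t on currents, western WWV and cross terms.

    All arguments are (n_years,) anomaly series. Column order of the design
    matrix matches the coefficients alpha, beta, gamma, delta, zeta.
    """
    X = np.column_stack([u, v, w_west, u * w_west, v * w_west])
    coef, *_ = np.linalg.lstsq(X, n_target - n_t, rcond=None)
    predicted = n_t + X @ coef
    return coef, predicted

def skill(observed, predicted):
    """Correlation and rms error between observed and predicted series."""
    r = np.corrcoef(observed, predicted)[0, 1]
    rms = np.sqrt(np.mean((observed - predicted) ** 2))
    return r, rms

# Synthetic inputs (assumption): 23 years, known coefficients, no noise.
rng = np.random.default_rng(1)
n_t, u, v, w_west = rng.normal(size=(4, 23))
true_coef = np.array([0.6, -0.4, 0.3, 0.2, -0.1])
design = np.column_stack([u, v, w_west, u * w_west, v * w_west])
n_target = n_t + design @ true_coef

coef, predicted = fit_predict(n_t, u, v, w_west, n_target)
r, rms = skill(n_target, predicted)
print("coefficients:", np.round(coef, 3), "r =", round(float(r), 3))
```

Pre-setting a coefficient to zero, as described in the text, corresponds to dropping the matching column from the design matrix before the fit.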

The persistence prediction assumes that $${N}_{t+\mathrm{lead}}={N}_{t}$$. The prediction skill is measured by the correlation between $${N}_{t}$$ and the observed $${N}_{t+\mathrm{lead}}$$, and the rms of their difference.
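A persistence baseline of this kind takes only a few lines to evaluate. The Python sketch below is illustrative: the function name `persistence_skill` and the (years × 12 months) storage layout are assumptions, and the toy index is constructed so that the expected skill values are obvious.

```python
import numpy as np

def persistence_skill(nino, start_month, lead):
    """Skill of the forecast N(t + lead) = N(t).

    nino        : (n_years, 12) monthly index anomalies, column 0 = January,
                  rows are consecutive years
    start_month : initialization month, 1..12
    lead        : lead time in months (may run into the following year)
    Returns (correlation, rms error) over all years with a valid target.
    """
    flat = nino.ravel()  # chronological order for consecutive years
    start_idx = np.arange(nino.shape[0]) * 12 + (start_month - 1)
    target_idx = start_idx + lead
    keep = target_idx < flat.size  # drop starts whose target is past the record
    predicted = flat[start_idx[keep]]
    observed = flat[target_idx[keep]]
    r = np.corrcoef(observed, predicted)[0, 1]
    rms = np.sqrt(np.mean((observed - predicted) ** 2))
    return r, rms

# Toy index (assumption): constant within each year, rising by 1 per year.
nino = np.repeat(np.arange(5.0)[:, None], 12, axis=1)

r_same_year, rms_same_year = persistence_skill(nino, start_month=5, lead=2)
r_next_year, rms_next_year = persistence_skill(nino, start_month=5, lead=9)
print(r_same_year, rms_same_year, r_next_year, rms_next_year)
```

With this toy index a May-to-July forecast is exact, while a May-to-February forecast is perfectly correlated but offset by one unit, which shows why the correlation and the rms error are both reported.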

### Data Sources

Surface currents are obtained from the satellite altimeter Ocean Surface Current Analysis Real-Time (OSCAR) estimate (http://www.oscar.noaa.gov/). The Niño 3.4 index, representing the SST anomaly averaged over 5°N–5°S and 170–120°W, is downloaded from http://www.cpc.ncep.noaa.gov/data/indices/. The WWV is the heat content of the upper ocean within 5°N–5°S and 120°E–80°W with water temperature greater than 20 °C. The WWV is also split into the western ($${W}_{W}$$) and eastern ($${W}_{E}$$) parts, defined for regions within 120°E–155°W and 155°W–80°W, respectively. The WWV data are obtained from http://www.pmel.noaa.gov/elnino/. Wind stress data are obtained from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim product (http://www.ecmwf.int/en/research/climate-reanalysis/era-interim). SST is obtained from the Hadley Centre Sea Ice and Sea Temperature Dataset (http://www.metoffice.gov.uk/hadobs/hadisst/). Seasonal cycles have been removed from the Niño 3.4 index, and the total, western and eastern WWV. Monthly anomalies of surface current and wind stress are obtained by removing their mean seasonal cycles for the period 1993–2015.
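Removing the mean seasonal cycle, as described above for the current and wind stress fields, amounts to subtracting each calendar month's multi-year mean. The Python sketch below is a minimal illustration; the function name `monthly_anomalies`, the (years × 12 months) layout and the synthetic series are assumptions made for the example.

```python
import numpy as np

def monthly_anomalies(series):
    """Subtract the mean seasonal cycle from monthly data.

    series : (n_years, 12, ...) array of monthly means; any trailing
             dimensions (e.g. latitude, longitude) are carried along.
    Returns anomalies of the same shape.
    """
    climatology = series.mean(axis=0, keepdims=True)  # one mean per calendar month
    return series - climatology

# Synthetic series (assumption): 23 years of a fixed seasonal cycle plus a
# year-to-year offset whose mean over all years is zero.
months = np.arange(12)
seasonal_cycle = 2.0 * np.sin(2.0 * np.pi * months / 12.0)
offsets = np.linspace(-1.0, 1.0, 23)
series = seasonal_cycle[None, :] + offsets[:, None]

anomalies = monthly_anomalies(series)
print(np.round(anomalies[:, 0], 2))  # January anomalies recover the offsets
```

The same function applies to gridded fields stored as (years, 12, latitude, longitude), since the mean is taken only over the first axis.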