Extended-range statistical ENSO prediction through operator-theoretic techniques for nonlinear dynamics

Forecasting the El Niño-Southern Oscillation (ENSO) has been a subject of vigorous research due to the important role of the phenomenon in climate dynamics and its worldwide socioeconomic impacts. Over the past decades, numerous models for ENSO prediction have been developed, among which statistical models approximating ENSO evolution by linear dynamics have received significant attention owing to their simplicity and comparable forecast skill to first-principles models at short lead times. Yet, due to highly nonlinear and chaotic dynamics (particularly during ENSO initiation), such models have limited skill for longer-term forecasts beyond half a year. To resolve this limitation, here we employ a new nonparametric statistical approach based on analog forecasting, called kernel analog forecasting (KAF), which avoids assumptions on the underlying dynamics through the use of nonlinear kernel methods for machine learning and dimension reduction of high-dimensional datasets. Through a rigorous connection with Koopman operator theory for dynamical systems, KAF yields statistically optimal predictions of future ENSO states as conditional expectations, given noisy and potentially incomplete data at forecast initialization. Here, using industrial-era Indo-Pacific sea surface temperature (SST) as training data, the method is shown to successfully predict the Niño 3.4 index in a 1998–2017 verification period out to a 10-month lead, which corresponds to an increase of 3–8 months (depending on the decade) over a benchmark linear inverse model (LIM), while significantly improving upon the ENSO predictability “spring barrier”. In particular, KAF successfully predicts the historic 2015/16 El Niño at initialization times as early as June 2015, which is comparable to the skill of current dynamical models. An analysis of a 1300-yr control integration of a comprehensive climate model (CCSM4) further demonstrates that the enhanced predictability afforded by KAF holds over potentially much longer leads, extending to 24 months versus 18 months in the benchmark LIM. Probabilistic forecasts for the occurrence of El Niño/La Niña events are also performed and assessed via information-theoretic metrics, showing an improvement of skill over LIM approaches, thus opening an avenue for environmental risk assessment relevant in a variety of contexts.


Results
We present prediction results obtained via KAF and LIMs in a suite of prediction experiments for (i) the Niño 3.4 index in industrial-era HadISST data 27 ; (ii) the Niño 3.4 index in a 1300-yr control integration of CCSM4 28 , a CGCM known to exhibit fairly realistic ENSO dynamics and Pacific decadal variability 29 ; and (iii) the probability of occurrence of El Niño/La Niña events in both HadISST and CCSM4. All experiments use SST fields from the respective datasets in a fixed Indo-Pacific domain as training data. The Niño 3.4 prediction skill is assessed using root-mean-square error (RMSE) and pattern correlation (PC) scores. We employ = . PC 0 6 and 0.5 as representative thresholds separating useful from non-useful predictions. The probabilistic El Niño/La Niña experiments are assessed using the Brier skill score (BSS) 30,31 , as well as information-theoretic relative entropy metrics [32][33][34] . For consistency with an operational forecasting environment, all experiments employ disjoint, temporally consecutive portions of the available data for training and verification, respectively. In the CCSM4 experiments, Niño 3.4 indices are computed relative to monthly climatology of the training data; in HadISST we use anomalies relative to the 1981-2010 monthly climatology as per current convention in many modeling centers 35 . Figure 1 displays KAF and LIM prediction results for the Niño 3.4 index in the HadISST dataset over a 1998-2017 verification period, using 1870-1997 for training. In Fig. 1(a,b) it is clearly evident that even though there is www.nature.com/scientificreports www.nature.com/scientificreports/ significant decadal variability of skill, KAF consistently outperforms the LIM in terms of both RMSE and PC scores, with an apparent increase of 4-7 months of useful forecast horizon. In particular, KAF attains the longest useful skill in the first decade of the verification period, 1998-2007, despite an initially faster decrease of skill over 0-6 months. The KAF PC score for 1998-2007 is seen to cross the 0.6 threshold at approximately 12 months, and remains above 0.5 at least out to 14 months, compared to 5 and 6 months for LIM, respectively. In 2008-2017, the PC skill of KAF exhibits a more uniform rate of decay, crossing the = . PC 0 5 threshold at a lead time of 9 months, compared to 6-7 months for LIM. When measured over the full, 20-year verification period, KAF provides 3 to 4 months of additional useful skill, as well as a markedly slower rate of skill reduction beyond 4 months. As a comparison with state-of-art dynamical forecasts, the skill scores in Fig. 1 are generally comparable, and in some cases exceed, recently reported scores from the North American Multimodel Ensemble 36 (though it should be kept in mind that the verification periods of that study and ours are different).
In separate calculations, we have verified that incorporating delays in the kernel is a significant contributor to the skill of KAF, particularly at lead times greater than 2 months. This is consistent with previous work, which has shown that delay embedding increases the capability of kernel algorithms to extract dynamically intrinsic coherent patterns in the spectrum of the Koopman operator 19,20 , including ENSO and its linkages with seasonal and decadal variability of the climate system [21][22][23] . Given that our main focus here is on the extended-range regime, Fig. 1 depicts KAF results for a single embedding window of 12 months. Operationally, however, one would employ different embedding windows at different leads to optimize performance.
Next, in order to assess the potential skill of KAF and LIM approaches in an environment with more data available for training and verification (contributing to more robust model construction and skill assessment, respectively) than HadISST, we examine Niño 3.4 prediction results for the CCSM4 dataset, using the first 1100 years for training and the last 200 years for verification. Figure 2 demonstrates that while both KAF and LIM exhibit significantly higher skill compared to the HadISST results in Fig. 1(a,b), the performance of KAF is particularly strong, with PC scores computed over the 200-year verification period crossing the 0.6 and 0.5 levels at 17-and 24-month leads, respectively. The latter values correspond to an increase of predictability over HadISST by 9 and 14 months, measured with respect to the same PC thresholds. In comparison, the = . PC 0 6 and 0.5 predictability horizons for LIM are 11 and 18 months, respectively, corresponding to an increase of 6 and 11 www.nature.com/scientificreports www.nature.com/scientificreports/ months over HadISST. As illustrated in Fig. 2, the RMSE and PC scores from individual decades in the CCSM4 verification period exhibit significant variability, and similarly to HadISST, KAF consistently outperforms LIM on individual verification decades. In general, KAF appears to benefit from the larger CCSM4 dataset more than LIM, which is consistent with the former method better leveraging the information content of large data sets due to its nonparametric nature and theoretical ability to consistently approximate an optimal prediction function (the conditional expectation). In contrast, LIM's performance is limited to some extent by structural model error due to linear dynamics, and this type of error may not be possible to eliminate even with arbitrarily large training datasets.
Another important consideration in ENSO prediction is the seasonal dependence of skill, exhibiting the so-called "spring barrier" 37 . Figure 3 shows the month-to-month distribution of the KAF and LIM Niño 3.4 PC scores, computed for 6-and 9-month leads in HadISST and 0-24-month leads in CCSM4 using the same training and verification datasets as in Figs. 1 and 2, respectively. These plots feature characteristic spring barrier curves, with the highest predictability occurring in late winter to early spring and a clear drop of skill in summer 3,4 . The diminished summer predictability is thought to be caused by the low amplitude of SST anomalies developing then, making ENSO dynamics more sensitive to high-frequency atmospheric noise 3 . As we have shown in previous work 22 , the class of kernels employed here for forecasting is adept at capturing the effects of atmospheric noise and its nonlinear impact on ENSO statistics, including positive SST anomaly skewness underlying El Niño/La Niña asymmetry. The method has also been found highly effective in capturing the phase locking of ENSO to the seasonal cycle through a hierarchy of combination modes 21,23,38 . Correspondingly, while both methods exhibit a reduction of skill during summer, KAF fares significantly better than the benchmark LIMs in both of the HadISST and CCSM4 datasets. More specifically, in the HadISST results in Fig. 3(a,b), the LIM's PC score drops rapidly to small, 0-0.3 values between June and September, while KAF maintains scores of about 0.4 or greater over the same period. In particular, the June-September PC scores for KAF at 6-month leads hover between 0.5 and 0.6, which are values that one could consider as "beating" the spring barrier (albeit marginally). Similarly, in CCSM4 ( Fig. 3(d)), the = . PC 0 6 contour for KAF decreases less appreciably and abruptly over May-August than in the case of the LIM, indicating that KAF is better at maintaining skill during the transition from spring to summer. Since strong El Niño events are often triggered by westerly wind bursts in spring, advecting warm surface water to the east 3 , the better performance of KAF in summer suggests that it is more capable of capturing the triggering mechanism 22 , aiding its ability in predicting the onset of ENSO. Indeed, this is consistent with the running forecast time series for HadISST in Fig. 1(c,d), in which KAF often yields more accurate forecasts at the beginning of El Niño events, e.g., during 2009/10 and 2015/16.
As a final deterministic hindcast experiment, we consider prediction of the 2015/16 El Niño event with KAF. This event has received considerable attention by researchers and stakeholders across the world 35 , in part due to it being the first major El Niño since the 1997/98 event, providing an important benchmark to assess the advances in modeling and observing systems in the nearly two intervening decades. Figure 4 shows KAF forecast trajectories of 3-month running means of the Niño 3.4 index in HadISST for the period May 2015 to May 2017, initialized in May, June, and July of 2015 (i.e., 7, 6, and 5 months prior to the December 2015 peak of the of the event).
In this experiment, we use Indo-Pacific SST data from HadISST over the period January 1870 to December 2007 for training. Moreover, we perform prediction of 3-month-averaged Niño 3.4 indices to highlight seasonal evolution 35 . Prediction results for the instantaneous (monthly) Niño 3.4 index are broadly consistent with the results in Fig. 4.
We find that the KAF hindcast initialized in July 2015 accurately predicts the 3-month averaged Niño 3.4 evolution through February 2016, including the 2015/16 El Niño peak, yielding a moderate over-prediction over the ensuing months which nevertheless correlates well with the true signal. The trajectory initialized in June 2015 also predicts a strong (≥2 °C) El Niño event peaking in December 2015, but underestimates the true 2.5 °C www.nature.com/scientificreports www.nature.com/scientificreports/ anomaly peak by approximately 0.4 °C. Meanwhile, the trajectory initialized in May 2015 predicts a 1.5 °C positive anomaly in the ensuing December, which falls short of the true event by 1 °C but still qualifies as an El Niño event by conventional definitions. Overall, this level of performance compares favorably with many statistical model forecasts of the 2015/16 El Niño (which typically did not yield a possibility of a ≥2 °C event until August 2015 35 ).   www.nature.com/scientificreports www.nature.com/scientificreports/ In fact, the forecast trajectories in Fig. 4 have comparable, and sometimes higher, skill than forecasts from many operational weather prediction systems, despite KAF utilizing far-lower numerical/observational resources and temporal resolution.
We now turn to the skill of probabilistic El Niño/La Niña forecasts. A distinguishing aspect of probabilistic prediction over point forecasts (e.g., the results in Figs. [1][2][3] is that it provides more direct information about uncertainty, and as a result more actionable information to decision makers. Probabilistic prediction with either first-principles or statistical models, or ensembles of models, is typically conducted by binning collections of point forecasts, e.g., realized by random draws from distributions of initial conditions and/or model parameters. Here, we employ an alternative approach, which to our knowledge has not been previously pursued, whereby KAF and LIM approaches are used to predict the probability of occurrence of El Niño or La Niña events directly, without generating ensembles of trajectories. Our approach is based on the fact that predicting conditional probability is equivalent to predicting the conditional expectation of a characteristic function indexing the event of interest 13 . As a result, this task can be carried out using the same KAF and LIM approaches described above, replacing the Niño 3.4 index by the appropriate characteristic function as the response variable. Here, we construct characteristic functions for El Niño and La Niña events in the HadISST and CCSM4 data using the standard criterion requiring that at least five overlapping seasonal (3-month running averaged) Niño 3.4 indices to be greater (smaller) than 0.5 (−0.5) °C, respectively 35 . In this context, natural skill metrics are provided by the BSS 30,31 , as well as relative entropy from information theory 32,33 (see Methods). For binary outcomes, such as occurrence vs. non-occurrence of El Niño or La Niña events, the BSS is equivalent to a climatology-normalized RMSE score for the corresponding characteristic function. As information-theoretic metrics, we employ two relative entropy functionals 34 , denoted  and , which measure respectively the precision relative to climatology and ignorance (lack of accuracy) relative to the truth in a probabilistic forecast. For a binary outcome,  attains a maximal value *  , corresponding to a maximally precise binary forecast relative to climatology. On the other hand,  is unbounded, but has a natural scale  * corresponding to the ignorance of a probabilistic forecast based on random draws from the climatology; this makes = *   a natural threshold indicating loss of skill. Overall, a skillful binary probabilistic forecast should simultaneously have Note that we only report values of these scores in the CCSM4 experiments, as we found that the HadISST verification is not sufficiently long for statistically robust computation of relative-entropy scores.
The results of probabilistic El Niño/La Niña prediction for CCSM4 and HadISST are shown in Figs. 5 and 6, respectively. As is evident in Fig. 5(a,b), in the context of CCSM4, KAF performs markedly better than LIM in terms of the BSS and  metric for all examined lead times, and for both El Niño and La Niña events. The  scores in Fig. 5(c) start from 0.3 at forecast initialization (zero lead) for both methods, and while the KAF scores exhibit a monotonic decrease with lead time, in the case of LIM they exhibit an oscillatory behavior, hovering around 0.25 values. The latter behavior, in conjunction with a steady decrease of BSS and  seen in Fig. 5(a,b), is a manifestation of the fact that, as the lead time grows, LIM produces biases, likely due to dynamical model error. Note, in particular, that a forecast of simultaneously high ignorance (i.e., large error) and high precision, as the LIM forecast at late times, must necessarily be statistically biased as it underestimates uncertainty. In contrast, the KAF-derived results exhibit a simultaneous increase of ignorance and decrease of precision, as expected for an unbiased forecast under complex dynamics with intrinsic predictability limits. Turning to the HadISST results, Fig. 6 again demonstrates that KAF performs noticeably better than LIM for both El Niño and La Niña prediction. Note that the apparent "false positives" in the KAF-based La Niña results around 2017 are not necessarily unphysical. In particular, there are weak La Niña events documented during 2016-2017 and 2017-2018, which are excluded from the true characteristic function for La Niñas, but may exhibit residuals in the approximate characteristic function output by KAF.
In conclusion, this work has demonstrated the efficacy of KAF in statistical ENSO prediction, with a robust improvement in useful prediction horizon in observational data by 3-7 months over LIM approaches, and three appealing characteristics-namely, more skillful forecasts at long lead times with slower reduction of skill, reduced spring predictability barrier, and improved prediction of event onset. Moreover, in the setting of model data with larger sample sizes, the enhanced performance of KAF becomes more pronounced, with skill extending out to 24 month leads. Aside from the higher skill in predicting ENSO indices, the method is also considerably more skillful in probabilistic El Niño/La Niña prediction. We attribute these improvements to the nonparametric nature of KAF, which can consistently approximate optimal prediction functions via conditional expectation in the presence of nonlinear dynamics through a rigorous connection with Koopman operator theory. A combination of this approach with delay-coordinate maps further aids its capability to extract dynamically coherent predictor variables (through kernel eigenfunctions), including seasonal variability associated with ENSO combination modes, representations of higher-order ENSO statistics due to atmospheric noise, and Pacific decadal variability. Even though nonparametric ENSO prediction methods are not particularly common, this study shows that methods such KAF have high potential for skillful ENSO forecasts at long lead times, and can be naturally expected to be advantageous in forecasting other geophysical phenomena and their impacts, particularly in situations where the underlying dynamics is unknown or partially understood.

Methods
Datasets. The observational data used in this study consists of monthly averaged SST fields from the Hadley Centre Sea Ice and Sea Surface Temperature (HadISST) dataset 27,39 , sampled on a 1° × 1° latitude-longitude grid, and spanning the industrial era, 1870-2017. The modeled SST data are from a 1300-year, pre-industrial control integration of the Community Climate System Model version 4 (CCSM4) 28,40 , monthly averaged, and sampled on the model's native ocean grid of approximately 1° nominal resolution. All experiments use SST on the Indo-Pacific longitude-latitude box 28°E-70°W, 30°S-20°N as input predictor data, and the corresponding Niño 3.4 www.nature.com/scientificreports www.nature.com/scientificreports/ indices as target (predictand) variables. The latter are defined as SST anomalies relative to a monthly climatology (to be specified below) computed from each dataset, spatially averaged over the region 5°N-5°S, 170°-120°W. The number p of spatial gridpoints in the HadISST and CCSM4 Indo-Pacific domains is 11,315 and 31,984, respectively.
Each dataset is divided into disjoint, temporally consecutive, training and verification periods. In the HadISST results in Figs  , where t and τ are the forecast initialization and lead times, respectively, given a vector  ∈ x t ( ) m of predictor variables observed at forecast initialization, under the assumption that τ + y t ( ) and x t ( ) are generated by an underlying dynamical system. To make a prediction of τ + y t ( ) at lead time τ = Δ q t, where q is a non-negative integer and Δ > t 0 a fixed interval, it is assumed that a time-ordered dataset consisting of pairs + x y ( , ) n n q , with For every such lead time τ, KAF constructs a forecast function To that end, it treats x t ( ) and y t ( ) as the values of observables (functions of the state) along a trajectory of an abstract dynamical system (here, the Earth's climate system), operating on a hidden state space Ω. On Ω, the dynamics is characterized by a flow map Φ Ω → Ω : t , such that ω Φ ( ) t corresponds to the state reached after dynamical evolution by time t, starting from a state ω ∈ Ω. Moreover, there exist functions  Ω → X: m and  Ω → Y: such that, along a dynamical trajectory initialized at an arbitrary state ω ∈ Ω, we have That is, in this picture, x t ( ) and y t ( ) are realizations of random variables, X and Y, referred to as predictor and response variables, respectively, defined on the state space Ω.
It is a remarkable fact, first realized in the seminal work of Koopman 41 and von Neumann 42 in the 1930s, that the action of a general nonlinear dynamical system on observables such as X and Y can be described in terms of intrinsically linear operators, called Koopman operators. In particular, taking   = Ω → f { : } to be the vector space of all real-valued observables on Ω (note that Y lies in  ), the Koopman operator τ U at time τ is defined as the linear operator mapping ( ( )). From this perspective, the problem of forecasting τ + y t ( ) at lead time τ becomes a problem of approximating the observable where ω ∈ Ω is the state initializing the observed dynamical trajectory. It should be kept in mind that while  is by construction a linear space, on which τ U acts linearly without approximation, the elements of  may not (and in general, will not) be linear functions satisfying a relationship such as ω ω ω ω . In fact, in many applications Ω is not a linear set (e.g., it could be a nonlinear manifold), in which case no elements of  are linear functions.
In the setting of forecasting with initial data determined through X, a practically useful approximation of τ U Y must invariably be through a function  ∈ τ F that can be evaluated given values of X alone. That is, we must have is a real-valued function on data space, referred to above as the forecast function, and In real-world applications, including the ENSO forecasting problem studied here, the observed data x t ( ) are generally not sufficiently rich to uniquely infer the underlying dynamical state on Ω www.nature.com/scientificreports www.nature.com/scientificreports/ (i.e., X is a non-invertible map). In that case, any forecast function τ f will generally exhibit a form of irreducible error. The goal then becomes to construct τ F optimally given the training data + x y ( , ) n n q so as to incur minimal error with respect to a suitable metric. Effectively, this is a learning problem for a function in an infinite-dimensional function space, which KAF addresses using kernel methods from statistical learning theory 14 .
Following the basic tenets of statistical learning theory, and in particular kernel principal component regression, KAF searches for an optimal τ f in a finite-dimensional hypothesis space L  , of dimension L, which is a subspace of a reproducing kernel Hilbert space (RKHS), , of real-valued functions on data space  m . For our purposes, the key property that the RKHS structure of  allows is that a set of orthonormal basis functions can be evaluated at any point ∈ x X, not necessarily lying in the set of training points x n . In particular, ψ x t ( ( )) l can be evaluated for the data x t ( ) observed at forecast initialization. As with every RKHS,  is uniquely characterized by a symmetric, positive-definite kernel function    × → k: m m , which can be intuitively thought of as a measure of similarity, or correlation, between data points. As a simple example, EOF analysis is based on a covariance kernel, , also known as a linear kernel, but note that the kernel k used in our experiments is a Markov-normalized, nonlinear Gaussian kernel (whose construction will be described below).
Associated with any kernel function k and the training data  A key aspect of the KAF forecast function is its asymptotic behavior as the number of training data N and the dimension of the hypothesis space L grow. In particular, it can be shown 13 that if the dynamics on Ω has an ergodic invariant probability measure (i.e., there is a well-defined notion of climatology, which can be sampled from long dynamical trajectories), and the eigenvalues of the kernel matrix K are all strictly positive for every n, then under mild regularity assumptions on X, Y , and the kernel k (related to continuity and finite variance), τ F converges in a joint limit of → ∞ N followed by → ∞ L to the conditional expectation  | τ U Y X ( ) of the response observable τ U Y evolved under the Koopman operator for the desired lead time, conditioned on the data available at forecast initialization through X. In particular, it is a consequence of standard results from probability theory and statistics that  | τ U Y X ( ) is optimal in the sense of minimal RMSE among all finite-variance approximations of τ U Y that depend on the values of X alone. Note that no linearity assumption on the dynamics was made in order to obtain this result. It should also be noted that while, as with many statistical forecasting techniques, stating analogous asymptotic convergence or optimality results in the absence of measure-preserving, ergodic dynamics is a challenging problem, the KAF formulation described above remains well-posed in quite general settings, including the long-term climate change trends present in the observational SST data studied in this work. choice of predictors and kernel. Our choice of predictor function X and kernel k is guided by two main criteria: (i) X should contain relevant information to ENSO evolution beyond the information present in individual SST snapshots. (ii) k should induce rich hypothesis spaces L  ; in particular, the number of positive eigenvalues λ l (which controls the maximal dimension of  L ) should grow without bound as the size N of the training dataset grows. First, note that the covariance kernel employed in EOF analysis is not suitable from the point of view of the latter criterion, since in that case the number of positive eigenvalues is bounded above by the dimension of the data space m (which also bounds the number of linearly independent, linear functions of the input data). This means that one cannot increase the hypothesis space dimension L to beyond m, even if plentiful data www.nature.com/scientificreports www.nature.com/scientificreports/ is available, i.e.,  N m. In response, following earlier work 17,18,43 , to construct k we start from a kernel of the form is a nonlinear shape function, set here to a Gaussian, a distance-like function on data space. We set d to an anisotropic modification of the Euclidean distance, shown to improve skill in capturing slow dynamical timescales 43 . Specifically, choosing the dimension m of the predictor space to be an even number, and (for reasons that will become clear below) partitioning the predictor vectors into two m ( /2)-dimensional components, i.e., Here, ζ is a parameter lying in the interval [0, 1), ||·|| is the standard (Euclidean) 2-norm, and Δx, ξ, ξ′ are vectors in  m/2 given by Moreover, θ and θ′ are the angles between Δx and ξ and ξ′, respectively, so that The kernel  k is then Markov-normalized to obtain k using the normalization procedure introduced in the diffusion maps algorithm 25 . This involves the following steps: www.nature.com/scientificreports www.nature.com/scientificreports/ Linear inverse models. Under the classical LIM ansatz 1 , the dynamics of ENSO can be well modeled as linear system driven by stochastic external forcing represented as temporally Gaussian white noise. The method used here to conduct LIM experiments follows closely the procedure described by Penland and Sardeshmukh 2 . Specifically, the dynamics is governed by a stochastic differential equation where  ψ ∈ t ( ) L is the LIM state vector at time t, B is an × L L matrix representing a stable dynamical operator, and W(t) is a Gaussian white noise process.
Let s(t) be an SST vector from Eq. (3), observed at forecast initialization time t, and ′ = − s s s t t t ( ) ( ) ( ) be the corresponding anomaly vector relative to the monthly climatology s t ( ) at time t (computed as described above for the HadISST and CCSM4 datasets). Then, the corresponding LIM state vector ψ t ( ) is given by projection onto the leading L EOFs  ∈ e l p , computed for the × p p spatial covariance matrix C based on the anomalies ′ s t ( ) n in the training data, i.e., where S′ is the × p N data matrix whose n-th column is equal to ′ − s t ( ) n 1 . The RMSE-optimal estimate for ψ τ + t ( ) at an arbitrary lead time τ under the assumed dynamics in Eq. (5) is then given by the solution of the deterministic part with initial data ψ t ( ), viz.    form which we deduce that predicting using Eq. (7) is equivalent to Eq. (9). For completeness, we note that had one implemented KAF using the covariance kernel, the eigenvectors φ l of the × N N kernel matrix  = K S S corresponding to nonzero eigenvalues would be principal components, given by linear projections of the data onto the corresponding EOF from Eq. (6); that is, www.nature.com/scientificreports www.nature.com/scientificreports/  φ λ = . e S/ (10)  We emphasize that Eq. (10) is special to eigenvectors associated with covariance kernels, and in particular does not hold for the nonlinear Gaussian kernels employed in this work.
Returning to the LIM implementation, in practice, τ G( ) 0 is first estimated at some time τ = Δ q t 0 0 ,  ∈ q 0 , from the training SST data s t ( ) n sampled at the times t n from (1), All LIM-based forecasts reported in this paper were obtained via Eq. (7) with τ G( ) from Eq. (12). Note that under analogous ergodicity assumptions to those employed in KAF, the empirical time averages in Eq. (11) converge to climatological ensemble averages. It should also be noted that, among other approximations, the model structure in Eq. (5) assumes that the dynamics is seasonally independent.
Validation. The main tunable parameters of the KAF method employed here are the number of delays Q, the Gaussian kernel bandwidth , and the hypothesis space dimension L. Here, we use throughout the values = Q 12,  = 1, ζ = .
0 99, and = L 400. As stated above, these values were determined by hold-out validation, i.e., by varying the parameters seeking optimal prediction skill in validation datasets. Note that this search was not particularly exhaustive, as we found fairly mild dependence of forecast skill under modest parameter changes around our nominal values.
The LIMs in this work have two tunable parameters, namely τ 0 in Eq. (12) and the number L of principal components employed. We set τ = 2 0 months and = L 20, using the same hold-out validation procedure as in KAF.
probabilistic forecasting. Our approach for probabilistic El Niño/La Niña forecasting is based on the standard result from probability theory that the conditional probability of a certain event to occur is equal to the conditional expectation of its associated characteristic function. Specifically, let, as above,  Ω → Y: be the real-valued function on state space Ω such that ω Y( ) is equal to the Niño 3.4 index corresponding to climate state ω ∈ Ω. Let also  Ω → Y : be the 3-month running-averaged Niño 3.4 index, i.e., Here, we follow a standard definition for El Niño events 35 , which declares ω to be an El Niño state if ω > .°Y( ) 0 5 C for a period of five consecutive months about ω. This leads to the characteristic function  χ Ω → + : , such that χ ω = + ( ) 1 if there exists a set J of five consecutive integers in the range − [ 4,4], such that for all ∈ j J, and χ ω = + ( ) 0 otherwise. Similarly, we define a characteristic function χ − for La Niña events, requiring that ω < − .°Y( ) 0 5 C for a five-month period about ω. With these definitions, the conditional probabilities τ ± P x t ( ( )) , for El Ninõ/La Niña, respectively, to occur at lead time τ, given the predictor vector x t ( ) at forecast initialization time t, is equal to the conditional expectation  χ | = τ ± U X x t ( ()) of χ ± acted upon by the time-τ Koopman operator τ U . In particular, because the values of χ ± are available to us over the training period, we can estimate τ ± P x t ( ( )) , using the KAF and LIM methodologies analogously to the Niño 3.4 predictions described above. All probabilistic forecast results reported in this paper were obtained in this manner.

Forecast skill quantification. Let
1}, be the climate states in Ω over a verification period consisting of ∼ N samples, and n q n n q be the corresponding values of the predictor and response (Niño 3.4) functions at lead time τ = Δ q t. We assess the forecast skill of the prediction function τ f for τ U Y using the root-mean-square error (RMSE) and Pearson correlation (PC; also known as pattern correlation) scores, defined as~~~~^∑ respectively. Here, µ τ (µ τ ) and σ τ (σ τ ) are the empirical means and standard deviations of + y n q ( τ  f x ( ) n ), respectively.
In the case of the probabilistic El Niño/La Niña forecasts, we additionally employ the BSS and relative-entropy scores, which are known to provide natural metrics for assessing the skill of statistical forecasts [31][32][33][34] . In what follows, we outline the construction of these scores in the special case of binary response functions, such as the characteristic functions χ ± for El Niño/La Niña events, taking values in the set {0, 1}. First, note that every probability measure on {0, 1} can be characterized by a single real number π ∈ [0, 1] such that the probability to www.nature.com/scientificreports www.nature.com/scientificreports/ obtain 0 and 1 is given by π and π − 1 , respectively. Given two such measures characterized by π ρ ∈ , [0, 1], we define the quadratic distance function Moreover, under the condition that ρ = 0 only if π = 0, we define the relative entropy (Kullback-Leibler divergence) where, by convention, we set π π ρ log ( / ) equal to zero whenever ρ or ρ − 1 is equal to zero, respectively. This quantity has the properties of being non-negative, and vanishing if and only if π ρ = . Thus, D KL can be thought of as a distance-like function on probability measures, though note that it is non-symmetric (i.e., in general, , and does not obey the triangle inequality. Intuitively, π ρ  D ( ) KL can be interpreted as measuring the precision (additional information content) of the probability measure characterized by π relative to that characterized by ρ, or, equivalently, the ignorance (lack of information) of ρ relative to π.
In order to assess the skill of probabilistic El Niño forecasts, for each state ω  n in the verification dataset, we consider three probability measures on {0, 1}, characterized by (i) the predicted probability τ +  P x ( ) n , determined by KAF or LIM; (ii) the climatological probability + P , which is equal to the fraction of states ω  n in the verification dataset corresponding to El Niño events (i.e., those states for which χ ω = +  ( ) 1 n ); and (iii) the true probability equal to the value χ ω +  ( ) n of the characteristic function indexing the event. Using these probability measures, for each lead time τ = Δ q t,  ∈ q 0 , we define the instantaneous scores ) n q is equal to 1 (i.e., if whenever the forecast model predicts an El Niño event, the event actually occurs). Moreover, based on the interpretation of relative entropy stated above, τ  x ( ; ) n  measures the additional precision in the forecast distribution relative to climatology, and  τ ω  ( ; ) n the ignorance of the forecast distribution relative to the truth. In our assessment of a forecasting framework such as KAF and LIM we consider time-averaged, aggregate scores over the verification dataset, viz. Note that τ ( )  is equal to the mean square difference between the characteristic function χ + and the forecast function + P evaluated on the verification dataset. Similarly, τ ( )  is equal to the mean square difference between χ + and the constant function equal to the climatological probability + P . As is customary, we normalize τ ( )  by the climatological error  τ ( ), and subtract the result from 1, leading to the Brier skill score 31 Note that in the expression above we have tacitly assumed that the climatological forecast has nonzero error, i.e., τ ≠ ( ) 0  , which holds true apart from trivial cases. With these definitions, a skillful model for probabilistic El Niño prediction should have small values of τ BSS( ), large values of  τ ( ), and small values of  τ ( ). Let now +⁎ P be defined such that it is equal to 1 if ≤ . + P 0 5, and 0 if > . + P 0 5. Note that the binary probability distribution represented by ⁎ P places all probability mass to the outcome in {0, 1} that is least probable with respect to the climatological distribution, which in the present context corresponds to an El Niño event. One can verify that the precision score τ ( )  is bounded above by the relative entropy  = + + ⁎ ⁎ D P P ( ) KL . The latter quantity provides a natural scale for τ ( )  corresponding to the precision score of a probabilistic forecast that is maximally precise relative to climatology, in the sense of predicting with probability 1 the climatologically least likely outcome. To define a natural scale for  τ ( ), we consider the time-averaged relative entropy , i.e., the average ignorance of the climatological distribution relative to the truth. This quantity sets a natural threshold for useful probabilistic El Niño forecasts, in the sense that such forecasts should have τ < ⁎ ( )   . The BSS and relative entropy scores and thresholds for probabilistic La Niña prediction are derived analogously to their El Niño counterparts using the characteristic function χ − . www.nature.com/scientificreports www.nature.com/scientificreports/