Abstract
The Earth’s climate system is a classical example of a multiscale, multiphysics dynamical system with an extremely large number of active degrees of freedom, exhibiting variability on scales ranging from micrometers and seconds in cloud microphysics, to thousands of kilometers and centuries in ocean dynamics. Yet, despite this dynamical complexity, climate dynamics is known to exhibit coherent modes of variability. A primary example is the El Niño Southern Oscillation (ENSO), the dominant mode of interannual (3–5 yr) variability in the climate system. The objective and robust characterization of this and other important phenomena presents a longstanding challenge in Earth system science, the resolution of which would lead to improved scientific understanding and prediction of climate dynamics, as well as assessment of their impacts on human and natural systems. Here, we show that the spectral theory of dynamical systems, combined with techniques from data science, provides an effective means for extracting coherent modes of climate variability from high-dimensional model and observational data, requiring no frequency prefiltering, but recovering multiple timescales and their interactions. Lifecycle composites of ENSO are shown to improve upon results from conventional indices in terms of dynamical consistency and physical interpretability. In addition, the role of combination modes between ENSO and the annual cycle in ENSO diversity is elucidated.
Introduction
Ever since the discovery of phenomena such as ENSO^{1} and the Madden-Julian Oscillation (MJO)^{2}, the objective identification and characterization of coherent modes of climate variability have been vigorously studied across the disciplines of Earth system science. In the face of dynamical complexity and event-to-event diversity, the state of large-scale patterns of climate dynamics is typically described through a reduced representation provided by climatic indices, constructed using physical understanding and/or statistical approaches. For example, ENSO is an oscillation with a broadband periodicity of 2–7 years, commonly monitored using so-called Niño indices^{3}. The latter are defined as spatial and temporal averages of sea surface temperature (SST) anomalies over the equatorial Pacific source region of ENSO. Such indices are employed for a multitude of diagnostic and prognostic purposes, including lifecycle composites^{4} and prediction skill assessment^{5}.
Clearly, the success of these efforts depends strongly on the properties of the indices employed to characterize the phenomenon of interest. In general, it is desirable that a climatic index be as objective as possible, i.e., reveal an intrinsic pattern of climate dynamics independent of subjective choices such as data prefiltering, or details of the observation modality. For oscillatory patterns such as ENSO and MJO, it is important that the indices reveal the full cycle as a sequence of observables, e.g., SST fields in the case of ENSO. Yet, despite their widespread use, conventional approaches for defining climatic indices have inherent limitations, obfuscating the properties of the phenomenon under study, and sometimes yielding inconsistent results^{6}. Empirical Orthogonal Function (EOF) analysis^{7}, for example, is perhaps the most commonly used statistical technique for identification of climatic indices, yet it is well known to exhibit timescale mixing and poor physical interpretability due to EOF invariance under temporal permutations of the data, even in idealized settings^{8}. In the context of ENSO, scalar Niño indices do not provide full information about the state of the cycle because the index could be increasing or decreasing.
In contrast to EOF analysis and related approaches, which identify patterns based on eigendecomposition of covariance operators, spectral analysis techniques for dynamical systems employ composition operators, such as Koopman and transfer operators^{9,10,11}. A key advantage of this operator-theoretic formalism is that it transforms the nonlinear dynamics on phase space to linear dynamics on vector spaces of functions or distributions, enabling a wide variety of spectral techniques to be employed for coherent pattern extraction and forecasting. Indeed, starting from early spectral approximation techniques for Koopman^{12,13} and transfer^{14,15,16} operators in the 1990s, there has been vigorous research on operator-theoretic approaches applicable to broad classes of autonomous^{17,18,19,20,21,22} and nonautonomous systems^{23,24,25,26,27,28}. In addition, recently developed methods^{29,30,31,32,33,34,35,36,37} combine Koopman and transfer operator theory with kernel methods for machine learning^{38,39,40} to yield data-driven algorithms adept at approximating evolution operators and their spectra.
In this paper, we show that the operator-theoretic framework provides an effective route for identifying slowly decaying (equivalently, slowly decorrelating) observables of the climate system as dominant eigenfunctions of transfer/Koopman operators and their generator. These eigenfunctions directly describe coherent climate phenomena such as ENSO, with higher dynamical consistency and physical interpretability than indices derived through conventional approaches. The principal distinguishing aspects of this analysis, illustrated in Figs. 1, 2, 3, and 4, can be summarized as:

1. Identification of cycles from spatiotemporal information: Our spectral approach is based on dynamical systems techniques, providing a superior basis for extracting persistent cyclic behavior. We transform the underlying full nonlinear dynamics to a larger linear space, yielding a complete linear picture for our spectral analysis. This transformation is built directly from physical spatiotemporal fields such as SST snapshots. Complex pairs of eigenvalues and their eigenvectors directly reveal persistent cycles (see outer panels of Fig. 1) and their periods.

2. Dynamical rectification: The underlying oscillations in ENSO are clearly revealed in a “rectified” two-dimensional (2D) phase space provided by a complex eigenvector. Temporal evolution of the oscillation is well described by a harmonic oscillator, represented by motion at a fixed speed around a circle in 2D phase space, with the oscillation frequency α determined by the complex eigenvalue. Importantly, this property holds true even if the dynamics of the full system is chaotic. See Figs. 2, 3, and 4, and accompanying animations in Supplementary Movies 1 and 2 for illustrations. Our thorough treatment of rectification clearly shows the asymmetry of ENSO, and enables an estimate of the “local speed” of the ENSO cycle.

3. Phase equivariance: If the 2D phase space is partitioned into S “wedges”, each corresponding to a lifecycle phase, then the dynamical evolution of the samples starting in any given phase over a time interval of 2π/(Sα) maps them consistently to the next phase. See Fig. 1 (center) and Fig. 5 for examples of this behavior with S = 8. An important consequence of equivariance combined with slow decay is that it endows the identified phases with higher predictability, while enabling the discovery of new mechanistic relationships between physical fields because of a more accurate lifecycle. Our improved phasing suggests that ENSO has a more significant cyclical component than previously thought.
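The phase-equivariance property in the third item can be illustrated on an idealized eigenfunction time series. The following is a minimal sketch, assuming a pure harmonic oscillator g(t) = e^{iαt} with a hypothetical 4-year period; the function name `lifecycle_phase` and all parameter values are illustrative and are not taken from the analysis itself.

```python
import numpy as np

def lifecycle_phase(g, S=8):
    """Assign each complex eigenfunction value to one of S equal angular wedges."""
    angle = np.mod(np.angle(g), 2.0 * np.pi)        # polar angle in [0, 2*pi)
    return np.floor(S * angle / (2.0 * np.pi)).astype(int) % S

# Idealized eigenfunction time series: a pure harmonic oscillator (assumption).
alpha = 2.0 * np.pi / 4.0                 # hypothetical angular frequency (4-yr period)
S = 8                                     # number of lifecycle phases
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 40.0, size=200)      # arbitrary sample times in years
dt_phase = 2.0 * np.pi / (S * alpha)      # time needed to cross one wedge

phase_now = lifecycle_phase(np.exp(1j * alpha * t), S)
phase_next = lifecycle_phase(np.exp(1j * alpha * (t + dt_phase)), S)

# Equivariance: every sample has advanced by exactly one phase (mod S).
equivariant = bool(np.all(phase_next == (phase_now + 1) % S))
```

For eigenfunctions computed from data, the relation would be expected to hold only approximately, since the rotation rate is only approximately constant and the eigenfunction decays slowly.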
Results
The perspective adopted here is to view a climatic time series \({x}_{0},{x}_{1},\ldots ,{x}_{N-1}\in {{\mathbb{R}}}^{d}\) as an observable of an abstract dynamical system representing the evolution of the Earth’s climate. That is, we envision that there is an (unobserved) state space Ω and a function \(X:{{\Omega }}\to {{\mathbb{R}}}^{d}\) such that x_{n} = X(ω_{n}), where ω_{n} ∈ Ω is the climate state underlying snapshot x_{n}. Moreover, we consider that there is an (unknown) dynamical evolution law Φ^{t} : Ω → Ω, such that Φ^{t}(ω_{0}) is the climate state reached at time t starting from an initial state ω_{0}. In particular, the climate states underlying the observed data are given by ω_{n} = Φ^{n Δt}(ω_{0}), where Δt is a fixed sampling interval. In the analyses that follow, X will correspond to monthly averaged SST, sampled at d Indo-Pacific gridpoints at a monthly sampling interval Δt.
Given the data x_{n}, our goal is to identify a collection of observables (eigenfunctions) \({g}_{j}:{{\Omega }}\to {\mathbb{C}}\) with two main features: cyclicity and slow correlation decay. First, the observables are cyclic in the sense that there is an associated period over which they approximately return to their original values. Second, the observables are slowly decaying (or “persistent” or “coherent”) in the sense that their norm decreases slowly under forward evolution of the dynamics. In the context of this work, “slowly decaying” and “slowly decorrelating” observables are synonymous notions.
From a machine-learning perspective, this task corresponds to an unsupervised learning problem aiming to identify slowly decaying cyclic observables. Note that cyclicity is a significantly different objective than variance maximization performed in the Proper Orthogonal Decomposition (POD), EOF analysis, and related techniques^{7,41,42}. Complex EOF analysis^{43}, Principal Oscillation Pattern (POP) analysis^{44} and spectral analysis of autoregressive models^{45} seek to identify oscillatory modes from time series, though generally through the restrictive lens of linear state space dynamics. Operator-theoretic approaches are able to consistently extract cyclicity and coherence from nonlinear systems^{30,46,47,48}, without invoking a specific modeling ansatz such as linear dynamics.
In the present work, to cope with the high-dimensional data spaces resulting from climatic variables (e.g., SST fields), these operators will be learned using geometrical kernel methods combined with delay-embedding methodologies^{49,50,51}. Delay-coordinate maps are also leveraged for analysis of climatic time series by extended EOF (EEOF) analysis^{52}, Singular Spectrum Analysis (SSA)^{53,54,55}, and related approaches, which extract temporal principal components (PCs) and associated spatiotemporal patterns (EEOFs) through singular value decomposition of a trajectory data matrix in delay-coordinate space. The success of these methods at recovering oscillatory patterns, including ENSO, has been interpreted from both state space^{55} and operator-theoretic perspectives^{33,37}. Ultimately, however, the extracted PCs from EEOF analysis/SSA are constrained to be linear functions of the delay-embedded data, and do not provide direct spectral information about evolution operators acting on observables.
Operator-theoretic formalism
Similar to classical methods such as EOF analysis, our approach assumes that the dynamics Φ^{t} on Ω is a stationary, ergodic process. We note that while the climate system is not strictly stationary, our methods perform well in extracting the dominant cycles on interannual or shorter timescales. Mathematically, the stationarity is governed by a probability measure μ on Ω, which is preserved by the dynamics; formally μ(Φ^{−t}(A)) = μ(A) for any measurable set A ⊆ Ω. The ergodicity assumption is an indecomposability hypothesis: there are no nontrivial Φ^{t}-invariant sets, meaning that Ω cannot be decomposed into separate subsystems.
Operator-theoretic approaches shift attention from studying the properties of the (generally, nonlinear) flow Φ^{t} on state space to studying its induced action on linear spaces of (generally, nonlinear) observables. We denote by \({\mathscr{F}}\) the space of complex-valued functions on Ω. The space \({\mathscr{F}}\) has the structure of an infinite-dimensional linear (vector) space equipped with the standard operations of function addition and scalar multiplication, but the elements of \({\mathscr{F}}\) need not be linear functions. We will consider the subspace of observables \(H=\{f\in {\mathscr{F}}: \int_{{{\Omega }}} |f|^{2}\, d\mu\, < \, \infty \}\). Intuitively, thinking of μ as the climatological distribution of the system, the space H consists of all observables with finite climatological mean and variance.
The dynamics acts naturally by composition on each element f_{0} ∈ H. For invertible Φ^{t}, the composition operator, f_{t} ≔ P^{t}f_{0} = f_{0}∘Φ^{−t}, known as the transfer operator, evolves f_{0} forward t units of time to the function f_{t}. Dual to (and here, the inverse of) the transfer operator P^{t}, is the Koopman operator defined by U^{t}f_{0} ≔ f_{0}∘Φ^{t}. Traveling forward in time along a trajectory \({\{{{{\Phi }}}^{t}({\omega }_{0})\}}_{t\ge 0}\), the observations recorded by f_{0} along this trajectory are f_{0}(Φ^{t}(ω_{0})) = (U^{t}f_{0})(ω_{0}).
Ergodicity may be equivalently characterized by the constant function 1 being the unique (normalized) fixed point of U^{t}. Ergodicity implies (via Birkhoff’s Ergodic Theorem or the strong law of large numbers) that sufficiently long trajectories in Ω will well sample μ. This will be important in this paper because we are using a single trajectory as our input data. We note that many operator-theoretic algorithms may also use information from multiple trajectories and are not restricted to using a single time series. Similar operator constructions can be carried out in other functional settings; notably, there is a well-developed spectral theory for infinite compositions of different transfer operators arising from nonautonomous dynamical systems^{24,56,57,58}.
We now describe how the spectral properties of P^{t} and U^{t} provide natural notions of persistent almost-cyclic functions and observations. We distinguish between methods applicable for discrete- and continuous-time dynamics. Discrete-time approaches are based on approximations of the time-1 transfer/Koopman operators, whereas continuous-time approaches target the infinitesimal generators of the transfer/Koopman evolution semigroups. In the present setting of observables in the Hilbert space H associated with the invariant measure, the Koopman and transfer operators are unitary, and are duals to one another under operator adjoints, i.e., P^{t*} = U^{t}. Thus, working with P^{t} vs. U^{t} is merely a matter of convention.
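The duality P^{t*} = U^{t} and the unitarity of U^{t} can be checked numerically in the simplest setting. The sketch below assumes a rigid rotation of the circle by a hypothetical angle `a`, with μ the normalized Lebesgue measure and inner product ⟨f, g⟩ = ∫ f ḡ dμ; the test functions `f` and `g` are arbitrary trigonometric polynomials chosen for illustration.

```python
import numpy as np

n = 256
theta = 2.0 * np.pi * np.arange(n) / n      # uniform quadrature grid on the circle
a = 0.7                                     # hypothetical rotation angle over time t

def inner(f, g):
    """<f, g> = integral of f * conj(g) d(mu), with mu the normalized Lebesgue measure."""
    return np.mean(f(theta) * np.conj(g(theta)))

def transfer(f):
    """Transfer operator: P^t f = f o Phi^{-t}."""
    return lambda th: f(th - a)

def koopman(f):
    """Koopman operator: U^t f = f o Phi^{t}."""
    return lambda th: f(th + a)

f = lambda th: np.exp(2j * th) + 0.5 * np.cos(th)
g = lambda th: np.exp(1j * th) - 0.3j * np.exp(2j * th)

# Duality: <P^t f, g> = <f, U^t g>.  Unitarity: <U^t g, U^t g> = <g, g>.
duality_gap = abs(inner(transfer(f), g) - inner(f, koopman(g)))
norm_change = abs(inner(koopman(g), koopman(g)) - inner(g, g))
```

Both `duality_gap` and `norm_change` should vanish up to rounding error here, because the uniform-grid average integrates low-degree trigonometric polynomials exactly.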
Persistent cycles from the spectrum: discrete time
Let P = P^{1} be the time-1 transfer operator on H. If Pg = Λg, with g ≢ 0, we call \({{\Lambda }}\in {\mathbb{C}}\) an eigenvalue and g an eigenfunction. One has^{11} that ∣Λ∣ = 1, ∣g∣ is constant, and the collection of all eigenvalues of P, denoted σ_{e}(P), is a subgroup of the unit circle (if \({{\Lambda }},\hat{{{\Lambda }}}\in {\sigma }_{e}(P)\) then \({{\Lambda }}\hat{{{\Lambda }}}\) and \({{\Lambda }}/\hat{{{\Lambda }}}\) are both in σ_{e}(P)). As a simple example, if our phase space is S^{1} (a circle of circumference 2π) and Φ = Φ^{1} rotates the circle by an angle α, then P has eigenvalues Λ_{k} = e^{ikα} with corresponding eigenfunctions e^{ikθ} for \(k\in {\mathbb{Z}}\) and θ ∈ S^{1}. Analogous results hold for the Koopman operator U = U^{1}.
Numerical estimation of P or U inevitably introduces perturbations or “noise” to the operators, and leads to finite-dimensional representations which cannot exactly comply with the above theory. In particular, numerical representations of P are often not unitary. Nevertheless, numerical schemes such as projected restrictions of P or U onto subspaces of H spanned by locally supported or globally supported basis functions have been highly successful^{17,23,29,59} and in certain settings, convergence results for the spectrum and eigenfunctions have been proven^{14,15,22,60,61,62}. In these schemes, the spectrum of the approximate P is contained in the unit disk \(\{z\in {\mathbb{C}}: |z| \le 1\}\), rather than lying on the unit circle \(\{z\in {\mathbb{C}}: |z| =1\}\). This addition of noise, which may also be done theoretically, for example by convolution with a stochastic kernel^{15,62,63}, is frequently harnessed to easily select the most important eigenvalues from the typically infinite collection σ_{e}(P), namely those eigenvalues with large magnitude (close to 1).
Let P_{ϵ} denote this perturbed operator and consider an eigenfunction g^{(ϵ)} corresponding to an eigenvalue Λ^{(ϵ)} of large magnitude. Because \({({P}_{\epsilon })}^{t}{g}^{(\epsilon )}={({{{\Lambda }}}^{(\epsilon )})}^{t}{g}^{(\epsilon )}\), these eigenfunctions g^{(ϵ)} decay slowly under iteration of P_{ϵ} relative to the decay rates of eigenfunctions corresponding to eigenvalues of smaller magnitude. It is these “leading” or “dominant” eigenfunctions that will persist over long timescales and will accurately describe the evolution of the dynamics over similarly long timescales.
Returning to our example Φ rotating the circle by an angle α, the eigenfunctions of P_{ϵ} with least decay will be approximations of (and for carefully chosen approximations, equal to) e^{±iθ} (k = ±1) because they are the most regular, and persist longest under continued perturbation. The corresponding eigenvalues are \({{{\Lambda }}}_{\pm 1}^{(\epsilon )}={R}_{\epsilon }{e}^{\pm i{\alpha }_{\epsilon }}\approx {R}_{\epsilon }{e}^{\pm i\alpha }\), for 0 < R_{ϵ} ⪅ 1, which correspond to rotation by ±α with small decay rate of R_{ϵ} per unit time. Thus, the eigenvalues of P_{ϵ} of greatest magnitude (excluding the eigenvalue 1) automatically identify the rotation angle α. See Methods for a description of our numerical approach for approximating P.
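The circle-rotation example can be reproduced numerically. The following is a minimal sketch, not the numerical scheme described in Methods: it assumes a grid discretization of the circle and models the perturbation as wrapped Gaussian noise of standard deviation `sigma` added after each rotation. The values of `n`, `alpha`, and `sigma` are illustrative.

```python
import numpy as np

n = 200                      # number of grid points on the circle
alpha = 0.5                  # assumed rotation angle per unit time (radians)
sigma = 0.2                  # assumed noise standard deviation (plays the role of epsilon)
theta = 2.0 * np.pi * np.arange(n) / n

# Entry (i, j): weight for moving from theta_j to theta_i,
# i.e. rotate by alpha, then add wrapped Gaussian noise.
diff = theta[:, None] - theta[None, :] - alpha
P = np.zeros((n, n))
for m in range(-3, 4):       # wrap the Gaussian around the circle
    P += np.exp(-(diff + 2.0 * np.pi * m) ** 2 / (2.0 * sigma ** 2))
P /= P.sum(axis=0, keepdims=True)     # columns sum to 1 (probability is conserved)

eigvals = np.linalg.eigvals(P)
eigvals = eigvals[np.argsort(-np.abs(eigvals))]   # sort by decreasing magnitude

leading = eigvals[1]                  # first eigenvalue after the trivial eigenvalue 1
alpha_est = abs(np.angle(leading))    # recovered rotation angle
R_est = abs(leading)                  # decay factor per unit time
```

In this setup the perturbed eigenvalues are, to numerical precision, e^{−σ²k²/2} e^{∓ikα}, so the pair of largest magnitude below 1 corresponds to k = ±1 and its argument recovers α, as described above.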
Persistent cycles in continuous time
In continuous time one can consider generators for the transfer and Koopman operators. These generators are time-derivatives of P^{t} and U^{t}, and are given by \(Gf=\mathop{\lim }\nolimits_{t\to 0}\frac{1}{t}({P}^{t}f-f)\) and \(Vf=\mathop{\lim }\nolimits_{t\to 0}\frac{1}{t}({U}^{t}f-f)\), respectively. The operators G and V are defined on a dense subspace of H, and are skew-symmetric duals to one another, i.e., G = V^{*} = −V. One has that σ_{e}(G) = σ_{e}(V) are additive subgroups of \(i{\mathbb{R}}\) (the eigenspectrum lies on the imaginary axis in \({\mathbb{C}}\)); that is, if \(\lambda ,\hat{\lambda }\in {\sigma }_{e}(G)={\sigma }_{e}(V)\) then \(\lambda +\hat{\lambda }\) and \(\lambda -\hat{\lambda }\) are both in σ_{e}(G). Eigenvalues of G and V are interpreted as rates of rotation per unit time. If our phase space is S^{1}, and Φ^{t} rotates the circle at a rate α, then G and V have eigenvalues λ_{k} = ±ikα and corresponding eigenfunctions e^{ikθ} for \(k\in {\mathbb{Z}}\) and θ ∈ S^{1}.
The operators G and V “generate” the semigroup of operators P^{t} and U^{t} by P^{t} = e^{tG} and U^{t} = e^{tV}, and the spectral mapping theorem connects their spectra: \({\sigma }_{e}({P}^{t})={e}^{t{\sigma }_{e}(G)}\) and \({\sigma }_{e}({U}^{t})={e}^{t{\sigma }_{e}(V)}\) (if Φ^{t} is not invertible, the spectral value 0 = e^{−∞} is treated separately). For example, the relationship Λ = e^{λ} links the eigenvalues Λ of the discrete-time operators with the eigenvalues λ of their continuous-time counterparts.
As in the discrete-time setting, one may perturb the generators by addition of a diffusion process or through a numerical scheme. In the former case, if Φ^{t} is governed by a vector field then natural “diffused” versions G_{ϵ} and V_{ϵ} of G and V are provided by normalized forward and backward Kolmogorov equations, respectively. In the latter case, one may apply various numerical schemes^{26,30,34,36,64}. The scheme^{34} is outlined in the Methods section. The eigenvalues of G_{ϵ} and V_{ϵ} are in general complex numbers with zero or negative real part. For the same reasons as in the discrete-time setting, one seeks the eigenvalues closest to the imaginary axis (i.e., with real part closest to zero), which have the slowest decay rates. In our example of a circle rotation with rotation rate α, the eigenfunctions of least decay rate are e^{±iθ} (k = ±1) with corresponding eigenvalues \({\lambda }_{\pm 1}^{(\epsilon )}={r}_{\epsilon }\pm i{\alpha }_{\epsilon }\approx {r}_{\epsilon }\pm i\alpha\), for r_{ϵ} ⪅ 0 (r_{ϵ} is analogous to \(\log {R}_{\epsilon }\) from the discrete-time setting).
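The correspondence between a generator eigenvalue λ = r + iα and its discrete-time counterpart Λ = e^{λΔt} can be made concrete with a small helper. This is an illustrative sketch only: the function name and the example eigenvalue (a 4-year cycle with a 20-year e-folding time) are hypothetical and are not values reported in this work.

```python
import numpy as np

def describe_generator_eigenvalue(lam, dt=1.0):
    """Return (period, e-folding time, time-dt operator eigenvalue) for lam = r + i*omega."""
    period = 2.0 * np.pi / abs(lam.imag) if lam.imag != 0 else np.inf
    efolding = -1.0 / lam.real if lam.real < 0 else np.inf
    Lam = np.exp(lam * dt)            # spectral mapping: Lambda = exp(lambda * dt)
    return period, efolding, Lam

# Hypothetical eigenvalue, in units of 1/yr: 4-yr cycle, 20-yr e-folding time.
lam = complex(-0.05, 2.0 * np.pi * 0.25)
period, efolding, Lam = describe_generator_eigenvalue(lam, dt=1.0 / 12.0)

R = abs(Lam)                          # decay factor per month, R = exp(r * dt)
rotation = np.angle(Lam)              # rotation angle per month, alpha * dt
```

Here log R equals r Δt, which is the sense in which r_{ϵ} is the continuous-time analogue of log R_{ϵ}.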
Eigenvalue frequency analysis of monthly-averaged Indo-Pacific SST
We analyze model and observational SST data over the Indo-Pacific domain 28^{∘}E–70^{∘}W, 60^{∘}S–20^{∘}N. This domain was selected as a representative region of activity for several large-scale modes of climate variability on seasonal to decadal timescales, including ENSO, ENSO combination modes^{65}, and the Interdecadal Pacific Oscillation (IPO)^{66}. The model data comprise 1300 yr of monthly-averaged SST fields from a preindustrial control integration of the Community Climate System Model Version 4 (CCSM4)^{67}, sampled at the model’s native ocean grid of approximately 1^{∘} resolution. As observational data, we use monthly averaged SST fields at 2^{∘} resolution from the Extended Reconstructed Sea Surface Temperature Version 4 (ERSSTv4) reanalysis product^{68} over the period January 1970 to February 2020. The resulting SST data vectors x_{n} have dimension d = 44,771 and 4868 for CCSM4 and ERSSTv4, respectively.
Our numerical approach builds an approximation of the generator in a data-driven basis consisting of eigenvectors of a kernel matrix. The kernel matrix, K, has size \(\tilde{N}\times \tilde{N}\), where \(\tilde{N}=N-Q+1\), and is constructed from delay-embedded SST fields over a window of Q − 1 = 48 lags of length Δt = 1 month, corresponding to an interannual time interval of Q Δt = 4 years. Its eigenvectors represent temporal patterns that can be thought of as nonlinear generalizations of the PCs obtained via EEOF analysis. Previously^{69,70,71}, such kernel eigenvectors were shown to successfully recover physically meaningful modes from monthly averaged SST data in both Indo-Pacific and Antarctic domains. For our purposes, however, the eigenvectors and eigenvalues of K are employed to construct a data-driven version of the regularized generator V_{ϵ}, and extract dominant modes by solution of an associated eigenvalue problem (see Methods). The approximation basis formed by the eigenvectors of K is (i) learned from the high-dimensional SST data at a feasible computational cost; (ii) is refinable, in the sense of having a well-defined asymptotic limit as the amount of data N increases; and (iii) as the delay window Q Δt increases, it is provably well-adapted to representing eigenfunctions of the generator^{33,37}. The results we obtain are not particularly sensitive to the precise choice of kernels and lags, nor to the use of the generator or transfer operator. For example, similar results can be obtained with the transfer operator P^{Δt} constructed using a single lag (Q = 2) of length ℓ = 12 months (see Methods). A summary of the dataset attributes and numerical parameters employed in our computations is displayed in Supplementary Tables 1 and 2.
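The first stage of this construction, delay embedding followed by a kernel eigendecomposition, can be sketched as follows. This is a simplified illustration under stated assumptions: it uses synthetic data in place of SST fields and a plain, unnormalized Gaussian kernel with an ad hoc bandwidth, whereas the kernel actually used is specified in Methods. The helper names `delay_embed` and `gaussian_kernel_matrix` are hypothetical.

```python
import numpy as np

def delay_embed(x, Q):
    """Stack Q consecutive snapshots; x has shape (N, d), result (N - Q + 1, Q * d)."""
    N = x.shape[0]
    return np.hstack([x[q:N - Q + 1 + q] for q in range(Q)])

def gaussian_kernel_matrix(y, bandwidth):
    """K_ij = exp(-||y_i - y_j||^2 / bandwidth): a plain Gaussian kernel (assumption)."""
    sq = np.sum(y ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (y @ y.T), 0.0)
    return np.exp(-d2 / bandwidth)

# Synthetic stand-in for SST data: N monthly snapshots at d grid points.
rng = np.random.default_rng(1)
N, d, Q = 240, 5, 48
t = np.arange(N) / 12.0                                  # time in years
pattern = rng.standard_normal(d)[None, :]                # fixed spatial pattern
x = (np.cos(2.0 * np.pi * 0.25 * t)[:, None] * pattern   # 4-yr oscillation
     + 0.1 * rng.standard_normal((N, d)))                # plus noise

y = delay_embed(x, Q)                                    # shape (N - Q + 1, Q * d)
bandwidth = 2.0 * y.shape[1] * x.var()                   # ad hoc scale (assumption)
K = gaussian_kernel_matrix(y, bandwidth)

evals, evecs = np.linalg.eigh(K)                         # K is symmetric
order = np.argsort(evals)[::-1]                          # leading eigenvectors first
evals, evecs = evals[order], evecs[:, order]
```

The columns of `evecs` are temporal patterns of length N − Q + 1, which is the role played by the data-driven basis described above.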
Figure 6 and Supplementary Table 3 show eigenvalues λ_{0}, λ_{1}, … of the generator V_{ϵ} computed from the CCSM4 and ERSSTv4 datasets, arranged in order of decreasing real part (i.e., increasing decay rate). The leading eigenvalues form distinct branches corresponding to (i) the annual cycle and its harmonics; (ii) ENSO and its combination modes with the annual cycle; and (iii) low-frequency (decadal) modes with vanishing oscillatory frequency. In the case of the observational data, the spectrum also contains a trend-like mode representing climate change, as well as combination modes representing the modulation of the annual cycle by the trend (see Supplementary Fig. 1).
In interpreting the results in Fig. 6 and Supplementary Table 3, it should be kept in mind that, modulo a small amount of numerical drift, the CCSM4 data are generated by autonomous dynamics associated with fixed (preindustrial) concentrations of greenhouse gases and perfectly periodic radiative forcing representing the seasonal cycle. In particular, the phase of the seasonal cycle is implicitly represented in the delay-embedded SST data. The autonomous techniques employed in this paper are therefore rigorously applicable in this dataset; see Supplementary Note 1 for further details. In contrast, the ERSSTv4 data are subject to different natural and anthropogenic external forcings (e.g., volcanoes and greenhouse gas emissions, respectively), so strictly speaking our autonomous methodology does not formally apply here. Nevertheless, our spectral decomposition separates the trend (corresponding to a real eigenvalue) from the periodic and approximately periodic cycles (corresponding to complex eigenvalues), which are by definition trendless. In fact, we posit that an advantage of our approach is that it is capable of extracting trendless cyclical modes in ERSSTv4 without ad hoc detrending of the data, which is oftentimes performed in the context of EOF analysis and related approaches.
In both CCSM4 and ERSSTv4, the seasonal-cycle modes occur first in our ordering, which is consistent with the fact that these are purely periodic modes remaining correlated for arbitrarily long times. Two pairs of eigenfrequencies \({\nu }_{j}:= {\rm{Im}}\,{\lambda }_{j}/(2\pi )\) in this family are accurately identified by the data-driven eigenvalue problem, namely the annual (1 yr^{−1}) and semiannual (2 yr^{−1}) eigenfrequencies, where the numerical results agree with the true values to within 1% and 4%, respectively (see Supplementary Table 3). The third (triannual) harmonic is not identified as accurately, being assigned an eigenfrequency of ≃2.5 yr^{−1} as opposed to the expected 3 yr^{−1}. This discrepancy is at least partly due to finite-difference errors in our numerical approximation of the generator; this is discussed in more detail in the Methods section. Other contributing factors to approximation errors for the eigenfrequencies include the Nyquist limit (which imposes a limit of 1/(2 Δt) = 6 cycles/yr on the maximum frequency that can be resolved with a monthly sampling interval) and the addition of diffusion (which in general perturbs the eigenvalues along both the real and imaginary axes) in the construction of the regularized generator V_{ϵ}.
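The effect of finite-difference errors on the eigenfrequencies can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that the time derivative is approximated by a fourth-order central difference at the monthly sampling interval; the text above states only that a finite-difference approximation is used and defers the details to Methods, so the specific stencil is an assumption. Applied to e^{2πiνt}, such a stencil returns the derivative of an oscillation with a reduced effective frequency.

```python
import numpy as np

def fd4_effective_frequency(nu, dt):
    """Frequency (cycles per unit time) seen by a 4th-order central difference.

    The stencil [-f(t+2dt) + 8f(t+dt) - 8f(t-dt) + f(t-2dt)] / (12 dt), applied to
    exp(2*pi*i*nu*t), multiplies it by i * (8 sin(w) - sin(2w)) / (6 dt), w = 2*pi*nu*dt.
    """
    w = 2.0 * np.pi * nu * dt                       # phase advance per sample
    return (8.0 * np.sin(w) - np.sin(2.0 * w)) / (6.0 * dt) / (2.0 * np.pi)

dt = 1.0 / 12.0                                     # monthly sampling, in years
nyquist = 1.0 / (2.0 * dt)                          # 6 cycles per year
effective = {nu: fd4_effective_frequency(nu, dt) for nu in (1.0, 2.0, 3.0)}
```

Under this assumption the annual, semiannual, and triannual harmonics map to approximately 0.998, 1.93, and 2.55 yr^{−1}, respectively, so the underestimation grows quickly as the frequency approaches the Nyquist limit. This is qualitatively in line with the errors quoted above, although it does not by itself establish which scheme was used.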
Beyond the seasonal cycle branch, the CCSM4 spectrum exhibits a branch of eigenvalues consisting of a pair of fundamental modes with an interannual frequency ν_{7} ≃ 0.25 yr^{−1} =: ν_{ENSO}, as well as combination frequencies ν_{j}, j = 9, 11, 13, 15, approximately equal to ν_{ENSO} + mν_{annual}, where ν_{annual} = 1 yr^{−1} is the annual-cycle frequency, and m is an integer taking values in the set { −2, −1, 1, 2}. Note that the spacing of 2 in the index j is due to restricting to positive frequencies in this discussion; see Supplementary Table 3.
We will shortly interpret the eigenfunction corresponding to the eigenvalue ν_{ENSO} as representing the fundamental ENSO cycle. We note that this choice is unambiguous as the eigenvalue with largest real part and frequency close to 0.25 yr^{−1}. Similarly, the frequencies ν_{ENSO} + mν_{annual} are naturally interpretable as combination modes, consistent with the group structure of the generator spectrum described above. Further notable aspects of these results are that (i) distinct generator eigenvalues correspond to distinct combination frequencies (as opposed to EOF analysis, which mixes the combination and fundamental frequencies^{65}); and (ii) two harmonics are identified corresponding to the annual and semiannual cycles. We have verified that the ENSO eigenfrequencies extracted from the CCSM4 data remain unchanged to two significant digits for embedding windows ranging from 1 year (Q = 12) to 16 years (Q = 192); see Supplementary Table 4.
ENSO and ENSO combination eigenvalues are also identified in the ERSSTv4 spectrum, but these eigenvalues occur after an eigenvalue with vanishing imaginary part that we interpret as a representation of climate change trend. As shown in Supplementary Fig. 1(a), the eigenfunction time series corresponding to this eigenvalue has a manifestly nonstationary character, which is broadly consistent with accepted climate change signals such as persistent warming from the 1980s to early 2000s, “hiatus” during the mid to late 2000s, and accelerated warming during the early to mid 2010s^{72}. In addition, the trend eigenfunction time series is found to correlate with area-averaged anomalies of Indo-Pacific SST and global surface air temperature, with 0.83 and 0.78 correlation coefficients, respectively. As with ENSO, this trend eigenfunction comes with its own “combination frequencies” close to 1 yr^{−1} (since the trend frequency is zero), capturing the modulation of the annual cycle by the trend (see Supplementary Fig. 1(b)). Aside from this trend family, both the CCSM4 and ERSSTv4 spectra contain additional modes with zero corresponding eigenfrequency, representing internal decadal variability of the Indo-Pacific^{69,70}. The spectra also contain interannual modes with higher frequencies than ν_{ENSO}, notably a mode with an approximately 3-year eigenperiod (see Supplementary Table 3). In what follows, we will focus on the fundamental ENSO eigenfunctions and the corresponding lifecycle analysis. These correspond to eigenfunctions g_{7} and g_{6} in the CCSM4 and ERSSTv4 ordering, respectively. Since the observational data are sparser and noisier than the model data, we expect larger (numerical) decay rates for the observational data as stronger diffusion is needed to regularize the generator (see Methods). This is borne out in Fig. 6, where the real parts of the generator eigenvalues for ENSO and the ENSO combination modes are more negative for the ERSSTv4 data than for CCSM4.
In summary, we have extracted ENSO eigenfunctions and eigenfrequencies from two datasets (CCSM4 and ERSSTv4), using two computational techniques (generator and transfer operator) and a range of numerical parameters (lag and embedding window length). Moreover, in each spectral analysis experiment, there is no ambiguity in associating particular eigenfunctions with ENSO, as discussed above.
Rectified cycles from eigenfunctions
As described above, when nonzero eigenfrequencies exist, the dominant eigenfunctions correspond to observables with approximately cyclic evolution, even if the underlying flow Φ^{t} is aperiodic. Below, we will use this idea to extract a rectified ENSO lifecycle from the spatiotemporal SST data. We first describe the mathematical construction, using idealized dynamical systems as examples.
Let \({({U}_{\epsilon })}^{t}{g}^{(\epsilon )}={({{{\Lambda }}}^{(\epsilon )})}^{t}{g}^{(\epsilon )}\) as before. We follow an orbit in state space Ω starting at some ω_{0}. Evaluating both sides of this relation at ω_{0}, we obtain \(({({U}_{\epsilon })}^{t}{g}^{(\epsilon )})({\omega }_{0})\approx {g}^{(\epsilon )}({{{\Phi }}}^{t}({\omega }_{0}))={({{{\Lambda }}}^{(\epsilon )})}^{t}{g}^{(\epsilon )}({\omega }_{0})\), where we have inserted the definition of the unperturbed Koopman operator \({U}^{t}\) as the middle term, recalling that U_{ϵ} ≈ U in some sense. Defining the multiplicative action of a complex number Λ on another complex number as the map \({M}_{{{\Lambda }}}:{\mathbb{C}}\to {\mathbb{C}}\), M_{Λ}z = Λz, we have that \({g}^{(\epsilon )}({{{\Phi }}}^{t}({\omega }_{0}))\approx {M}_{{{{\Lambda }}}^{(\epsilon )}}^{t}({g}^{(\epsilon )}({\omega }_{0}))\). Thus, we may think of the eigenfunction g^{(ϵ)} as an approximate projection (or factor map) from Ω to \({\mathbb{C}}\); this is summarized in the following (approximate) commutative diagram:
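The diagram is not reproduced in this text version. A reconstruction from the relation \({g}^{(\epsilon )}\circ {{{\Phi }}}^{t}\approx {M}_{{{{\Lambda }}}^{(\epsilon )}}^{t}\circ {g}^{(\epsilon )}\) stated above, and not a copy of the published figure, is:

\[
\begin{array}{ccc}
{{\Omega }} & \xrightarrow{\ \ {{{\Phi }}}^{t}\ \ } & {{\Omega }}\\
{\scriptstyle {g}^{(\epsilon )}}\Big\downarrow & & \Big\downarrow{\scriptstyle {g}^{(\epsilon )}}\\
{\mathbb{C}} & \xrightarrow{\ \ {M}_{{{{\Lambda }}}^{(\epsilon )}}^{t}\ \ } & {\mathbb{C}}
\end{array}
\]

Following the top and right arrows (evolve, then observe) gives approximately the same result as following the left and bottom arrows (observe, then multiply by \({({{{\Lambda }}}^{(\epsilon )})}^{t}\)).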
Evolution under Φ^{t} on Ω is projected down (by g^{(ϵ)}) to approximately a fixed multiplicative action on \({\mathbb{C}}\) by Λ^{(ϵ)}. Further, for ∣Λ^{(ϵ)}∣ ≈ 1, we may consider the multiplicative action of \({M}_{{{{\Lambda }}}^{(\epsilon )}}\) as an approximate action on \({S}^{1}:= \{z\in {\mathbb{C}}: |z| =1\}\). Recalling that \({{{\Lambda }}}_{\pm 1}^{(\epsilon )}\approx {R}_{\epsilon }{e}^{\pm i\alpha }\) with 0 < R_{ϵ} ⪅ 1, the multiplicative action of \({M}_{{{{\Lambda }}}_{\pm 1}^{(\epsilon )}}\) corresponds to an approximate rotation on S^{1} by an angle of ±α. Thus, for \(|{{{\Lambda }}}_{\pm 1}^{(\epsilon )}| \approx 1\), evolution under Φ^{t} on Ω is projected down (by g^{(ϵ)}) to approximately a fixed rotation on S^{1} by α. The above statement is illustrated numerically for the Lorenz equations in Fig. 2(e–g), where g^{(ϵ)}(Φ^{t}(ω_{0})) is plotted for t ∈ [0, 160]. The evolution lies approximately on \({S}^{1}\subset {\mathbb{C}}\) (Fig. 2(g)), rotating at an approximately fixed rate (Fig. 2(h)). See Supplementary Movie 1 for a more direct visualization of these results. Projections of this type for real eigenfunctions of the transfer operator have been used to project out fast dynamics in multiple time scale systems^{73}.
The fact that the rotation on S^{1} occurs at close to a fixed rate is a key aspect of our ENSO analysis, so we emphasize this property with a simple example that is strongly illustrative for climate cycles such as ENSO. We imagine a crude model of the ENSO cycle with one-dimensional phase space Ω = S^{1}. The dynamics of this idealized model is given by a flow Φ^{t}: S^{1} → S^{1}, generated by a nonconstant velocity on S^{1}. We choose a sawtooth-like velocity field to model the observation that the La Niña to El Niño transition is slower than the transition in the other direction^{74}; see Fig. 4(c) for the corresponding evolution of a “normalized Niño 3.4” index vs. time and Supplementary Movie 2 for the corresponding animation. In this situation there is no need to “extract” a cycle in the dynamics, because the dynamics is a cycle, albeit one with nonconstant speed.
Similar to above, let \({M}_{{{\Lambda }}}^{t}\) denote the flow that advances the angle on S^{1} by \(\arg ({{\Lambda }})t\), where \(\arg ({{\Lambda }})=(2\pi )/T\) and T is the period of the cycle. The flow \({M}_{{{\Lambda }}}^{t}\) has a constant velocity around S^{1}, namely 2π/T; see Fig. 4(f) for the corresponding cosinelike evolution. Because Φ^{t} and \({M}_{{{\Lambda }}}^{t}\) are both cycles of the same period, there exists a homeomorphism h: S^{1} → S^{1} conjugating Φ^{t} and \({M}_{{{\Lambda }}}^{t}\); that is, \(h\circ {{{\Phi }}}^{t}={M}_{{{\Lambda }}}^{t}\circ h\), summarized in the commutative diagram below:
We denote by θ the angle in the “original” cycle (the upper part of the commutative diagram) and by \(\theta ^{\prime}\) the angle in the rectified cycle (the lower part of the commutative diagram); by definition \(\theta ^{\prime} =h(\theta )\). We set θ = 0 to represent the peak El Niño state in our crude cyclic model of ENSO, and without loss of generality we fix h(0) = 0 so that peak El Niño occurs at the same angle \(\theta =\theta ^{\prime} =0\) in both the original and rectified cycles. We define θ = π as peak La Niña, according to the original cycle, directly opposing El Niño; this is represented by the green dot in Fig. 4(a).
The eigenfunction of the Koopman operator corresponding to \({M}_{{{\Lambda }}}^{t}\) with eigenvalue Λ is \({g}_{{{{{{{{\rm{rect}}}}}}}}}(\theta ^{\prime} ):= {e}^{i\theta ^{\prime} }\); this eigenfunction is illustrated in Fig. 4(e), where the real part of \({e}^{i\theta ^{\prime} }\) is shown in color. By the above conjugacy, the function (g_{rect}∘h)(θ) = e^{ih(θ)} is an eigenfunction of U^{t} with eigenvalue Λ; see Fig. 4(a). Because La Niña is reached more quickly from El Niño than vice versa in the original flow, so too (by conjugacy) must this occur in the rectified, constant-speed flow. Thus, La Niña in fact appears earlier than halfway through the rectified cycle; see the green dot in Fig. 4(e), which lies at the angle h(π). Finally, let f_{orig}(θ) ≔ e^{iθ} represent the complex-valued function corresponding to our crude cyclic model of ENSO, where θ = 0 is El Niño and θ = π is La Niña. We can map f_{orig} to the rectified space by \({f}_{{{{{{{{\rm{orig}}}}}}}}}\circ {h}^{-1}(\theta ^{\prime} )={e}^{i{h}^{-1}(\theta ^{\prime} )}\); the real part of this latter function is shown in Fig. 4(b). We will use versions of the functions f_{orig} and f_{orig}∘h^{−1} as the main demonstration of our rectification process in Fig. 5. These results are an example of the automatic rectification performed by Koopman eigenfunctions (Theorem 17.11^{11}) for systems with discrete spectra. Operator-theoretic approaches to different kinds of rectification have also been explored^{75,76}.
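The conjugacy described above can be checked numerically. The following is a minimal sketch, assuming a hypothetical smooth velocity field v(θ) = 1 + 0.6 sin θ in place of the sawtooth-like field used for Fig. 4; the function names and parameters are illustrative only. It builds h from the travel time along the cycle, h(θ) = (2π/T)∫_0^θ dφ/v(φ), and verifies that e^{ih(θ)} picks up the factor e^{2πit/T} under the flow.

```python
import numpy as np

# Hypothetical positive, non-constant angular velocity on S^1 (an assumption;
# it is faster on (0, pi), so "El Nino -> La Nina" is the quick half-cycle).
def velocity(theta):
    return 1.0 + 0.6 * np.sin(theta)

# Travel time tau(theta) = int_0^theta dphi / v(phi), by cumulative trapezoid rule.
n = 20000
grid = np.linspace(0.0, 2.0 * np.pi, n + 1)
inv_v = 1.0 / velocity(grid)
travel_time = np.concatenate(
    ([0.0], np.cumsum(0.5 * (inv_v[1:] + inv_v[:-1]) * np.diff(grid)))
)
period = travel_time[-1]  # T, the time needed for one full cycle

def h(theta):
    """Rectifying map: angle reached by the constant-speed flow in the same time."""
    theta = np.mod(theta, 2.0 * np.pi)
    return 2.0 * np.pi * np.interp(theta, grid, travel_time) / period

def flow(theta0, t, steps=2000):
    """Advance theta0 by time t under d(theta)/dt = velocity(theta) using RK4."""
    dt = t / steps
    theta = theta0
    for _ in range(steps):
        k1 = velocity(theta)
        k2 = velocity(theta + 0.5 * dt * k1)
        k3 = velocity(theta + 0.5 * dt * k2)
        k4 = velocity(theta + dt * k3)
        theta += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return theta

# Eigenfunction check: exp(i h(Phi^t(theta0))) = exp(2 pi i t / T) exp(i h(theta0)).
theta0, t = 0.3, 2.5
lhs = np.exp(1j * h(flow(theta0, t)))
rhs = np.exp(2j * np.pi * t / period) * np.exp(1j * h(theta0))
conjugacy_error = abs(lhs - rhs)

# Position of "La Nina" (theta = pi) in the rectified cycle.
h_pi = h(np.pi)
```

For this choice of velocity, `h_pi` is smaller than π, mirroring the statement that La Niña appears earlier than halfway through the rectified cycle.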
A rectified ENSO lifecycle from eigenfunctions
We now apply the ideas from the previous two subsections to the CCSM4 and ERSSTv4 data, where the role of g_{rect} in those subsections is played by the generator eigenfunctions g_{7} and g_{6}, arising from these two datasets, respectively. In the following we will refer to g_{6} and g_{7} collectively as simply g_{j}. Figure 5 compares several aspects of the g_{j} to new, lagged ENSO indices f_{nino} derived from the Niño 3.4 index output from the CCSM4 and ERSSTv4 data as follows. At each time instance, f_{nino} is a 2D vector consisting of the current Niño 3.4 value and its value ℓ months in the past; that is, (Niño 3.4(t), Niño 3.4(t − ℓ months)). We choose ℓ to be the lag that gives the most cycle-like behavior for f_{nino}. If the Niño 3.4 index evolved as a perfect cycle with a period of T = 4ℓ months, the two components of f_{nino} would be in quadrature (90^{∘} phase difference), resulting in a purely angular motion in the associated 2D phase space. This situation would be analogous to the evolution of the f_{orig} observable depicted in Fig. 4(a), which is periodic but not of fixed frequency. Yet, in Fig. 5(a, i), it is evident that the evolution of f_{nino} exhibits significant departures from an ~4-year cycle, featuring both retrograde and radial motion, particularly in the case of the ERSSTv4 data (Fig. 5(i)). In Fig. 5(d, l), we show the evolution of the phase angle obtained by treating the components of f_{nino} as the real and imaginary parts of a complex number, analogous to the L63 example in Fig. 2(b, f) (note that the latter representations are in the full phase space). Here, an approximately cyclical evolution of f_{nino} would induce an approximately monotonic phase evolution (modulo 2π), which would additionally be linear for a constant-frequency cycle. While such behavior is discernible in Fig. 5(d, l), the phase evolution of f_{nino} is clearly corrupted by high-frequency noise due to retrograde/radial motion.
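The construction of the lagged index and its phase angle can be sketched as follows. This is an illustration only: the helper names are hypothetical, the input is a synthetic cosine rather than Niño 3.4 data, and the sign and orientation conventions for the phase are assumptions rather than those used for Fig. 5.

```python
import numpy as np

def lagged_index(series, lag):
    """Rows are (x(t), x(t - lag)) for every t with a full history available."""
    series = np.asarray(series, dtype=float)
    return np.column_stack((series[lag:], series[:-lag]))

def phase_angle(pairs):
    """Phase of the 2D index, treating its components as real and imaginary parts."""
    return np.angle(pairs[:, 0] + 1j * pairs[:, 1])

# Synthetic "perfect cycle" with period T = 4 * lag months (an assumption for
# illustration; real Nino 3.4 data would replace `toy_index`).
lag = 12
months = np.arange(480)
toy_index = np.cos(2.0 * np.pi * months / (4 * lag))

pairs = lagged_index(toy_index, lag)
radius = np.hypot(pairs[:, 0], pairs[:, 1])   # constant for a perfect cycle
phase = np.unwrap(phase_angle(pairs))         # linear in time for a perfect cycle
```

For the synthetic cosine the two components are in quadrature, so the radius stays constant and the unwrapped phase grows linearly; departures from either property in real data correspond to the radial and retrograde motion discussed above.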
Consider now the generator eigenfunctions g_{j}. The time series plots (Fig. 5(c, g, k, o)) demonstrate that the real part of g_{j} is positively correlated with the Niño 3.4 index (the first component of f_{nino}): large positive values of \({{{{{{{\rm{Re}}}}}}}}\,{g}_{j}\) tend to coincide with large positive values of the first component of \(f_{{{{{{{{\rm{nino}}}}}}}}}\), including a number of significant events in the recent observational record such as the 1997/98 and 2015/16 El Niños. Recall that despite the presence of a climate-change signal in the ERSSTv4 data, the extracted ENSO eigenfunctions are trendless. Figure 5(f, n) displays scatterplots of the 2D phase spaces associated with the real and imaginary parts of g_{j}, colored by the Niño 3.4 index. These plots are analogous to the scatterplots of \({{{{{{{\rm{Re}}}}}}}}\,({f}_{{{{{\rm{orig}}}}}}\circ {h}^{-1})\) in Fig. 4(d), and illustrate that the very negative Niño 3.4 index values (deep blue) occur not directly opposite the very positive Niño 3.4 index values (deep red), but instead appear earlier in the rectified cycle. These facts, together with the fact that the corresponding eigenfrequencies ν_{j} are interannual and closely approximate ν_{ENSO}, provide evidence that the g_{j} represent the ENSO lifecycle, a conclusion that will be corroborated further below using phase composites. Before doing that, however, we note two important aspects of the results in Fig. 5.
First, the generator eigenfunctions provide a significantly more cyclic representation of the ENSO lifecycle than conventional Niño indices. In Fig. 5(e, m), the 2D phase space trajectories associated with the real and imaginary parts of g_{j} are seen to undergo a predominantly polar evolution, with little to no retrograde motion when g_{j} is located sufficiently away from the origin (∣g_{j}∣ ≳ 1). As noted above, this is in contrast to the retrograde and radial motion seen in the Niño 3.4-based f_{nino} index. Moreover, in separate calculations we have verified that the generator eigenfunctions g_{j} are also more cyclical than the two-dimensional f_{nino} indices constructed from the Niño 4, 3, and 1+2 indices. Two-dimensional phase space representations of the ENSO state with approximately cyclical behavior can also be constructed through multivariate indices, such as SST and thermocline depth anomalies^{4}, that reveal recharge–discharge processes^{77}, but these representations are also generally less coherent than those provided by the generator eigenfunctions.
Second, the generator eigenfunctions “rectify” the ENSO cycle in a manner analogous to the oscillator example in Fig. 4. In Fig. 5(h), the phase angle associated with the CCSM4derived g_{j} undergoes a nearlinear evolution, with some excursions from this behavior occurring. We observe that these deviations from linear behavior occur when the Niño 3.4 (scalar) index is close to zero (white color in Fig. 5(h, p)). Mathematically, deviations from cyclic behavior are more likely when ∣g_{j}∣ is small, which implies \({{{{{{{\rm{Re}}}}}}}}\,{g}_{j}\) is also small, and is in turn consistent with weak ENSO amplitude. Visually, the rectification induced by g_{j} can be seen in the time series plots in Fig. 5(c, g), where a comparatively uniform El Niño–La Niña cycling of \({{{{{{{\rm{Re}}}}}}}}\,{g}_{j}\) (Fig. 5(g)) is contrasted with slow La Niña to El Niño ramp ups followed by rapid El Niño to La Niña decays in the f_{nino} representation (Fig. 5(c)). In Fig. 6(c), we examine the relationship between the phase angles associated with f_{nino} and g_{j} through a curve fit of \(\theta ^{\prime} := \arg {g}_{j}\) as a function of \(\theta := \arg {f}_{{{{{{{{\rm{nino}}}}}}}}}\) (shown in a solid yellow line). The fitted curve provides an estimate of the homeomorphism function h discussed above in the context of the oscillator example. When θ = π (i.e., during La Niñas according to the Niño 3.4 index), the fitted \(\theta ^{\prime}\) is less than π, which shows that La Niña events occur earlier than halfway through the g_{j} cycle, as in Fig. 4.
A similar general behavior of the phase angle is observed for the ERSSTv4 data (Figs. 5(l, p) and 6(f)), though as one might expect the results are noisier than for CCSM4. Still, the phase angle progression associated with g_{j} (Fig. 5(p)) exhibits a significantly more rectified behavior than its f_{nino} counterpart (Fig. 5(l)), particularly during significant El Niño/La Niña events (highlighted with green star markers). Interestingly, the generator angle \(\arg {g}_{j}\) corresponding to La Niña events following strong El Niños (e.g., the 1973/74 and 1999/00 La Niñas in Fig. 5(n)) is close to 90^{∘}. This is consistent with the fact that strong consecutive El Niño and La Niña events in the observational record have a tendency to occur one year apart, corresponding to a quarter of the 4year ENSO cycle.
In summary, our spectral analysis extracts a canonical ENSO cycle, and provides rectified coordinates representing the cycle as an approximately fixed-speed oscillation. In rectified space it is clear that the representation of the ENSO cycle in terms of Niño indices (SST anomalies) is asymmetric because La Niña appears earlier (in phase/angle space) around the one-dimensional cycle (see Fig. 5(f, n)). Without the rectified representation, it would be difficult to assign a characteristic speed/frequency around the cycle. This notion of characteristic frequency will be useful below for constructing phase composites, and should also be useful for constructing reduced models. More broadly, we suggest that rectification is an important conceptual construction, which should be useful in a wide range of climate dynamics applications.
ENSO phases and their associated composites
We construct reduced representations of the ENSO lifecycle by partitioning the 2D phase spaces associated with the generator eigenfunctions and lagged Niño 3.4 index into angular phases, and then study the properties of associated phase composites of relevant oceanic and atmospheric fields. Figure 6(a, b, d, e) depicts the phase space partitions over eight such phases for CCSM4 and ERSSTv4, respectively. Each phase is constructed from samples at times for which ∣g_{j}∣ lies in the top m values in the corresponding 45^{∘} radial sector, where m = 200 and 20 for CCSM4 and ERSSTv4, respectively. Larger magnitude values of the eigenfunction g_{j} occur at times belonging to stronger ENSO cycles, and because we seek a strong canonical ENSO cycle, we subsample at these times. Mathematically, the phase composites constructed in this manner can be interpreted as conditional expectations of observables (e.g., SST anomaly fields) with respect to a discrete variable π_{j}: Ω → {0, 1, …, 8} indexing the eight phases associated with eigenfunction g_{j}. The “zero” phase is included to account for states that are not ENSO-active, consistent with earlier work^{24,78} that prioritizes larger values of real eigenfunctions and equivariant functions; see Methods for further details.
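The phase assignment and compositing described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, Phase 1 is taken to be centered at angle 0 with phases numbered in the direction of increasing angle, and the inputs are synthetic random data rather than eigenfunction values.

```python
import numpy as np

def assign_phases(g, n_phases=8, m=20):
    """Label each sample with a phase in {0, 1, ..., n_phases}.

    Phase k >= 1 holds the m samples of largest |g| in the k-th angular sector
    (Phase 1 centered at angle 0, an assumed convention); all other samples
    receive the "zero" phase.
    """
    g = np.asarray(g, dtype=complex)
    width = 2.0 * np.pi / n_phases
    sector = np.floor(np.mod(np.angle(g) + width / 2.0, 2.0 * np.pi) / width).astype(int) + 1
    phases = np.zeros(g.shape, dtype=int)
    for k in range(1, n_phases + 1):
        idx = np.flatnonzero(sector == k)
        strongest = idx[np.argsort(np.abs(g[idx]))[::-1][:m]]
        phases[strongest] = k
    return phases

def phase_composites(field, phases, n_phases=8):
    """Conditional mean of `field` (samples along axis 0) over each nonzero phase."""
    field = np.asarray(field, dtype=float)
    return np.stack([field[phases == k].mean(axis=0) for k in range(1, n_phases + 1)])

# Synthetic stand-ins (assumptions): complex "eigenfunction" samples and a
# three-point "anomaly field" proportional to Re g.
rng = np.random.default_rng(0)
g = rng.normal(size=4000) + 1j * rng.normal(size=4000)
field = np.real(g)[:, None] * np.ones((1, 3))

phases = assign_phases(g, n_phases=8, m=20)
composites = phase_composites(field, phases)   # shape (8, 3)
```

With this convention Phase 5 is centered at angle π, and the composite row for Phase 1 is positive while that for Phase 5 is negative in the synthetic example.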
It should be noted that in the eigenfunction-based representation, partitioning the phase space into phases of uniform angular extent is a natural choice since the evolution is rectified and takes place at an approximately constant angular frequency. In other words, in the eigenfunction picture in Fig. 6(b, e), phases of uniform angular extent correspond to phases of uniform temporal duration, in this case approximately 4/8 = 1/2 years. In the case of the Niño 3.4-based representation in Fig. 6(a, d), achieving a well-balanced partitioning is more challenging due to variable/retrograde angular speed and significant radial motion. Here, we have opted to employ a uniform partitioning scheme, which is common practice with many cyclical climatic indices, including indices for the MJO and other intraseasonal oscillations^{79}. We note that this is already an improvement over a characterization of ENSO phases based on scalar indices, since such a representation cannot distinguish the time tendency (increasing or decreasing) of the oscillation.
In both the Niño 3.4 and eigenfunction-based representations, the phases are numbered such that Phase 1 corresponds to El Niño, and periodic cycling of the phases from 1 to 8 represents an El Niño to La Niña to El Niño evolution. Turning back to the Niño 3.4 representation in Fig. 5(b), Phase 5 is a La Niña phase centered at angle π. On the other hand, in the generator representation in Fig. 5(f), La Niña (deep blue, corresponding to lowest Niño 3.4 values) occurs at Phase 4, centered at 3π/4, due to the rectification. This means that the rectified generator representation allocates more phases (Phases 5–8) in the La Niña to El Niño portion of the ENSO lifecycle, thus yielding a more granular description of ENSO initiation processes.
In Fig. 7, we examine phase composites of monthly averaged SST and surface wind anomalies, constructed using the Niño 3.4 and generator phases from CCSM4 and ERSSTv4 depicted in Fig. 6. In the CCSM4 analysis we use surface wind data from the atmospheric component of the model (CAM4). In the ERSSTv4 analysis, the surface wind data are from the NCEP/NCAR Reanalysis 1 product^{80}. First, on a coarse level, both the Niño and generator-based composites recover the salient features of the ENSO lifecycle. These include (i) the characteristic El Niño “tongue” of positive SST anomalies in the eastern equatorial Pacific, together with its associated anomalous surface westerlies, in Phase 1; (ii) meridional discharge in the ensuing intermediate phases; and (iii) formation of negative SST anomalies and easterly surface winds during the La Niña phases (Phases 5 and 4 for the Niño and eigenfunction-based representations, respectively).
The Niño 3.4 and generator-based composites in Fig. 7 also exhibit important differences, particularly in the La Niña to El Niño transition phases. In both CCSM4 and ERSSTv4, Phases 6–8 of the generator capture a reorganization of the large-scale surface winds from a convergent configuration over the Maritime Continent in Phase 6 to a divergent configuration initiating in Phase 7 with a buildup of anomalous westerlies in the Western Pacific, developing further in Phase 8. In particular, the anomalous westerlies in Phase 7 are consistent with the aggregate effect of higher-frequency, stochastic atmospheric variability such as westerly wind bursts^{81} that trigger the development of El Niño events.
To examine this behavior in more detail, in Fig. 8 we show phase-composited zonal wind profiles at the dateline for the latitude range 40^{∘}S–40^{∘}N. These composites recover a number of important atmospheric features of the ENSO lifecycle, including (i) the mature El Niño state in Phase 1 characterized by strong westerlies in the tropics maintaining positive SST anomalies in the eastern part of the Pacific basin; (ii) El Niño decay in Phase 2 with decreasing westerly intensity and a southward shift^{82} of the anomalous equatorial westerlies; (iii) La Niña initiation in Phase 3; (iv) La Niña growth, saturation, and decay in Phases 4–6; (v) El Niño initiation in Phase 7, featuring a clear signal of anomalous westerlies; and (vi) El Niño growth in Phase 8, cycling back to the mature El Niño state in Phase 1. These features are resolved in both the CCSM4 and ERSSTv4 datasets, though the observational composites tend to display a higher degree of asymmetry between the Northern and Southern hemispheres.
In contrast to the generator composites, the Niño3.4based composites exhibit significantly more abrupt El Niño–La Niña and La Niña–El Niño transitions, failing to recover a number of the processes outlined above. In particular, Niño Phase 2 (which represents El Niño decay in the generator picture), closely resembles the mature El Niño phase in Phase 1. In Phase 3, the Phase 2 configuration is abruptly replaced by nearneutral conditions, failing to capture the southward shift of the anomalous equatorial westerlies associated with El Niño termination. The Niñobased composites are characterized by a similarly abrupt La Niña to El Niño transition in Phases 7 and 8, with weak negative SST anomalies in the eastern equatorial Pacific being replaced by welldeveloped El Niño conditions. Importantly, there is no representation of anomalous westerlies during these phases. The more physically informative reconstruction of the ENSO lifecycle provided by the generator is likely due to the dynamical rectification property discussed above, which enables phase partitioning in the “intrinsic” phase of the oscillation. Beyond ENSO, we expect this rectification property to be beneficial in diagnostic and mechanistic studies of different climate phenomena.
Phase equivariance
Besides the diagnostic aspects described above, an important requirement of an index representing a coherent oscillatory phenomenon such as ENSO is that phase progression is consistent with the temporal evolution of the samples constituting each phase; this is the concept of phase equivariance stated in the Introduction. In the particular setting of the eight-phase ENSO reconstruction studied here, phase equivariance means that the forward evolution of the samples that constitute phase i by six months (the nominal duration of each phase) should map these samples into the samples making up phase i + 1, modulo 8. Theoretically, this correspondence should be exact for a purely periodic process such as the variable-speed oscillator in Fig. 4, but for a chaotic oscillator such as ENSO we expect it to hold only approximately. We will demonstrate below that the indices based on the generator eigenfunctions g_{7} (CCSM4) and g_{6} (ERSSTv4) exhibit greater equivariance than the lagged Niño 3.4 index f_{nino}.
To test for equivariance in the Niño 3.4 and generator-based representations of ENSO, in Fig. 9 we show this forward evolution in the corresponding 2D phase spaces in six-month increments starting from Phase 7 (i.e., the phase most closely related to El Niño initiation). There, it is evident that the generator lifecycle exhibits phase equivariance on significantly longer intervals than the Niño 3.4 lifecycle, in both CCSM4 and ERSSTv4. In the case of the generator, the centroid of the cloud of points making up the forward evolution of Phase 7 has a phase angle consistent with equivariant phase evolution over the examined 2-year interval. While there is visible dispersion occurring by ≃12 months, this dispersion occurs predominantly in the radial direction and has a limited effect on the phase classification. In contrast, the point clouds corresponding to forward evolution of the Niño-based Phase 7 exhibit strong dispersion in both radial and angular directions, decorrelating with the target phase expected from equivariance on intervals as short as 6–12 months. The difference in equivariance between the Niño 3.4 and generator lifecycles is most striking in the ERSSTv4 data, where after a 1-year interval the forward-evolved Phase 7 from Niño 3.4 has zero overlap with the expected Phase 1, whereas in the case of the generator that overlap is close to 100%. These results open the possibility that methods of characterizing ENSO based on area-averaged anomalies (such as the lagged Niño 3.4 index f_{nino}) may conflate unrelated parts of the cycle. This could contribute to difficulties with ENSO prediction, as illustrated for f_{nino} in Fig. 9(a, b) (top), where poorly chosen groupings mix together unrelated ENSO phases, leading to rapid divergence of these “false” groupings. Our results suggest that ENSO may have a more significant cyclic component than previously realized.
As a more quantitative assessment of phase equivariance, in Supplementary Fig. 2 we show the fractional sample overlap between the forward-evolved ENSO phases in CCSM4 in six-month increments with the expected target phases from equivariance. It is worth noting that the highest predictability of the generator phases occurs for start phases near the El Niño/La Niña peaks (Phases 1, 2, and 6), where the fractional overlap remains above 0.5 for at least a year. The evolution initialized at intermediate phases such as 3–5, 7, and 8 is somewhat less equivariant, with the fractional overlap dropping below 0.5 after a year. This behavior may be a manifestation of the ENSO spring predictability barrier^{83}.
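One possible way to compute a fractional overlap of this kind is sketched below. The precise definition used for Supplementary Fig. 2 is not reproduced here; this sketch assumes the overlap is the fraction of start samples whose state, k phase-durations later, falls in the angular sector of the target phase (start phase + k, modulo 8). All names and the synthetic rotation are illustrative assumptions.

```python
import numpy as np

def sector_of(g, n_phases=8):
    """Angular sector (1..n_phases) of each complex sample; sector 1 is centered at angle 0."""
    width = 2.0 * np.pi / n_phases
    return np.floor(np.mod(np.angle(g) + width / 2.0, 2.0 * np.pi) / width).astype(int) + 1

def equivariance_overlap(g, start_idx, steps_per_phase, n_increments, n_phases=8):
    """Fraction of start samples that land in the expected target sector after
    k = 1, ..., n_increments phase-durations (assumed definition of overlap)."""
    g = np.asarray(g, dtype=complex)
    start_idx = np.asarray(start_idx)
    start_phase = sector_of(g[start_idx], n_phases)
    overlaps = []
    for k in range(1, n_increments + 1):
        ahead = start_idx + k * steps_per_phase
        valid = ahead < len(g)                       # drop samples that run off the record
        target = (start_phase[valid] - 1 + k) % n_phases + 1
        reached = sector_of(g[ahead[valid]], n_phases)
        overlaps.append(np.mean(reached == target))
    return np.array(overlaps)

# Synthetic monthly samples of an exactly periodic 48-month rotation; the small
# angular offset keeps samples off sector boundaries.
months = np.arange(480)
g = np.exp(1j * (2.0 * np.pi * months / 48 + 0.01))
start_idx = np.flatnonzero(sector_of(g) == 7)        # samples in "Phase 7"
overlaps = equivariance_overlap(g, start_idx, steps_per_phase=6, n_increments=4)
```

For the exactly periodic rotation the overlap equals one at every increment; for a chaotic oscillator one would expect it to decay with lead time, as described above.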
ENSO diversity
ENSO diversity, i.e., the tendency of El Niño/La Niña events to differ from each other in terms of their spatial and temporal characteristics, has been a topic of considerable interest in the literature^{4,84,85,86,87}. It is common to spatially classify El Niño events as being of Eastern Pacific (EP) or Central Pacific (CP) type, depending on the longitudinal location of the highest SST anomalies^{4}. Some studies have interpreted these patterns as being the outcome of distinct temporal processes, with CP events dominated by quasi-biennial (QB; 1.5–3 yr) components, and strong EP events exhibiting both QB and low-frequency (LF) components in the 3–7 yr band^{87}. Other studies have classified ENSO events as cyclic, episodic, or multi-year, depending on whether they are preceded by the opposite, neutral, or same phase, respectively^{86}. In this section, we show how the generator eigenfunctions extracted from ERSSTv4 can account for some inter-event differences in the period 1975–2020. That period saw the occurrence of three strong EP El Niños (1982/83, 1997/98, and 2015/16), one moderate EP El Niño (1986/87), two CP El Niños (1994/95 and 2009/10), and two events which were of mixed character (1991/92, 2002/03)^{4}.
Recall from Fig. 3(b) that the top part of the generator spectrum exhibits the fundamental ENSO eigenfunctions (with a 4 yr eigenperiod), the associated ENSO combination modes (with various eigenperiods in the interannual to seasonal band), and also a pair of eigenfunctions, g_{19} and g_{20}, with a ≃ 3 yr eigenperiod (not shown in Fig. 3(b); see Supplementary Table 3). To assess the contribution of these eigenfunctions to the variability of the Niño 3.4 index, we compute associated time series reconstructions, or “modes”, using the standard approach employed in SSA, EEOF analysis, and other comparable techniques utilizing delay embedding^{55,69}. Given a complex-conjugate pair of generator eigenfunctions, {g_{j}, g_{j+1}}, this procedure produces a (real) time series that represents the component of the Niño 3.4 index reconstructed by the pair {g_{j}, g_{j+1}}. Moreover, the time series from several such pairs can be added together to produce reconstructions of Niño 3.4 based on groups of generator eigenfunctions. (See Methods for details of the reconstruction procedure.) In Fig. 10(a), we present reconstructed Niño 3.4 time series based on the fundamental 4-year ENSO mode (red line), the 3-year ENSO mode (blue line), and the sum of the leading two ENSO combination modes (green line). The sum of these ENSO-related modes is also shown (orange line), and captures greater variability than the fundamental ENSO mode alone. Shaded time intervals indicate periods where the 2-month running correlation coefficient between the latter reconstruction and the Niño 3.4 index is greater than 0.9.
First, it is readily apparent that certain El Niños are well captured by a small number of leading modes, i.e., modes that reflect greater dynamical persistence and cyclicity. In particular, consider the very intense 1982/83 and 1997/98 El Niños: for these two events, the peaks of all three ENSO-related modes are effectively coincident in Fig. 10(a). The next most intense event, 2015/16, is characterized by a 4-year mode amplitude comparable to 1982/83, with overall correlation among the 4-year, 3-year, and combination modes. However, the 3-year ENSO mode for 2015/16 is less intense than the corresponding mode for 1982/83.
Consider now the 1986/87 event, which is shown in detail in Fig. 10(b). In this case, we see a distinct behavior, as the combination modes have two peaks, one occurring before and one after the peak of the 4year mode, and a trough occurring during the peak of the 4year mode (see green and red lines in Fig. 10(a)). Superposing the combination modes with the fundamental ENSO mode (green line in Fig. 10(b)) results in consecutive peaks in the reconstructed Niño 3.4 index around the peak of the 4year mode. If we additionally include the 3year mode (orange line) the relative amplitude of the two peaks changes. In contrast, if we superpose only the 3year and 4year modes, the two consecutive peaks do not occur (see blue line in Fig. 10(b)).
Previously, ENSO combination modes have received significant attention due to their role in El Niño termination in boreal spring^{65,82,88}. The results in Fig. 10 show that ENSO combination modes can also play an important role in reconstructing events with multiple peaks. We note that while we have directly computed the combination eigenfunctions, in theory (as discussed in subsection “Eigenvalue frequency analysis of monthlyaveraged IndoPacific SST”) they may be determined from the state of the annual and the 4year ENSO eigenfunction. Thus, our results show that certain doublypeaked events, such as the 1986/87 El Niño, can be reconstructed from the same small set of modes as those used to reconstruct strong EP events.
It is noteworthy that the 1982/83, 1986/87, and 1997/98 El Niños, as well as the 2009/10 event (which is also well captured by the reconstructions in Fig. 10(a)), are all followed by La Niñas in the subsequent year. In the transitionbased classification of ENSO^{86}, these La Niñas are thus all classified as cyclic (cyclic El Niños are defined in a symmetric way). From the perspective of our spectral analysis approach, the occurrence of these cyclic La Niñas can be explained from the fact that once a generator eigenfunction g_{j} becomes “active”, i.e., ∣g_{j}(ω)∣ is large for a given climate state ω, it will, with high likelihood, remain active for at least a significant fraction of the cycle that it represents (since g_{j}(ω) precesses in the complex plane with fixed frequency and a weaker radial motion; see, e.g., Fig. 6(e)). In particular, significant El Niño events in the generatorbased representation have high likelihood of leading to La Niñas in the following year. On the other hand, the generator eigenfunctions have only moderate magnitude in the La Niña phase (see Fig. 5(m, n)). This suggests fewer cyclic El Niños, which is consistent with the different triggering mechanisms of the two phenomena^{86}.
In contrast to all of the events mentioned above, other events such as the 1991/92 El Niño, are not readily accounted for by the leading modes. It is possible that this event is tied to the June 1991 eruption of Mt. Pinatubo^{89,90,91}; external forcing of this event may explain why the eigenmodes fail to capture it. We note that the 1982/83 El Niño followed the eruption of El Chichón, but is nonetheless a strong EP event captured by the leading ENSO modes. The 1994/95 CP El Niño represents an example of another “missed” event in the context of the modes illustrated in Fig. 10. The fact that the leading generator eigenfunctions, which favor cyclicity and dynamical persistence, do not capture these events is consistent with the broadly accepted observation that CP events exhibit less canonical behavior than their EP counterparts^{4}. Intriguingly, the reconstructions in Fig. 10 indicate the existence of a 3year interannual mode, associated with generator eigenfunctions g_{19} and g_{20}, which plays a significant role in strong EP events but does not significantly contribute to CP events. This differs somewhat from previous mode decompositions of the Niño 3.4 index^{87}, which have identified a single QB mode contributing to both CP and strong EP events.
In summary, the results in Fig. 10 show that our spectral approach can differentiate certain ENSO events in terms of amplitude and phasing of an underlying set of dominant modes. What we would argue here, at least from the viewpoint afforded by a single regional index like Niño3.4, is that a rich diversity of El Niño behavior can be “constructed” from a small number of eigenfunctions of a dynamical operator. That the ENSO combination modes also contribute in a discernible way, either by adding to the fundamental and 3year ENSO modes as in 1982/83 or 1997/98 or creating consecutive peaks as in 1986/87, is also of note, as these modes may not be separately identified in the variance basis of EOFs, although they have been noted in SSA.
Discussion
Operator-theoretic approaches for dynamical systems, realized through kernel methods for machine learning, provide an effective framework for identification of persistent cyclic modes of variability in climate dynamics. Central to this framework is modeling the evolution of observables of the climate system with transfer and Koopman operators. The dominant eigenfunctions of these operators yield succinct and physically interpretable representations of fundamental modes of climate variability, with the corresponding eigenvalues reflecting the intrinsic timescale of variability of the mode. We have shown by means of theoretical arguments and numerical analyses of (i) idealized dynamical systems, (ii) comprehensive climate models, and (iii) reanalysis data, that these eigenfunctions reveal approximate cycles embedded in complicated systems with several advantageous characteristics over conventional approaches. Composites in the original observation space can be readily constructed; see Fig. 1. A further distinguishing aspect of our eigenfunctions is that they provide rectified coordinates for the state of the oscillation (Figs. 5 and 6), making them better suited for indexing the fundamental oscillations of the climate. Moreover, our extracted cycles display a high level of self-consistency under forward evolution (Fig. 9), a desirable property for characterizing a canonical strong ENSO and promising for prediction.
A major focus of this work has been the El Niño Southern Oscillation, extracted from monthlyaveraged IndoPacific SST data from a millennial control integration of a comprehensive climate model (CCSM4) and reanalysis data (ERSSTv4). In both of these datasets, the generator spectrum (Fig. 3) contains a pair of slowly decaying eigenfunctions with an interannual eigenfrequency, providing a rectified representation of the canonical ENSO lifecycle. In addition to the fundamental ENSO modes, the spectrum of the generator is found to exhibit a hierarchy of combination modes between ENSO and the annual cycle with the theoretically expected frequencies. These combination modes appear to play a role in capturing “double El Niño” events in the recent observational record, such as the 1986/87 series of events. Meanwhile, other events, such as the 1991/92 El Niño following the Mt. Pinatubo eruption and the 1994/95 central Pacific El Niño are not captured by the leading eigenfunctions, suggesting a different dynamical origin. Going beyond cyclic behavior, in the case of the reanalysis data, the spectrum of the generator was found to contain nonstationary modes associated with climate change, as well as combination modes representing the modulation of the annual cycle by the climatechange trend. Our analysis motivates further application of the spectral theory of dynamical systems to diagnosing and predicting the fundamental dynamical patterns of the climate.
Methods
As described in the Results, we have a time-ordered dataset \(x_{1},\ldots ,x_{N}\in \mathbb{R}^{d}\), arising as a series of observations x_{n} = X(ω_{n}) from a trajectory of an abstract dynamical system Φ^{t}: Ω → Ω, where ω_{n} = Φ^{n Δt}(ω_{0}). In our experiments, the SST field sampled at d ≫ 1 Indo-Pacific gridpoints at time index i yields a vector \(x_{i}\in \mathbb{R}^{d}\) (see Supplementary Table 1 for further details on the datasets employed in this study). We also consider low-dimensional examples with d = 3 (L63 system; Fig. 2, Supplementary Table 2) and d = 2 (variable-frequency oscillator; Fig. 4), where X is the identity map on the respective state space Ω. Recall that Δt > 0 is the sampling interval, and μ is an assumed physically meaningful invariant probability measure for Φ^{t}, μ = μ∘Φ^{−t}. In what follows, we describe data-driven techniques for approximation of (i) the transfer operator P^{Δt}, or the Koopman operator U^{Δt}; and (ii) the generator V of the transfer/Koopman operator semigroups. In the measure-preserving setting, the transfer and Koopman operators on H = L^{2}(Ω, μ) form dual pairs related by adjoints, \((P^{t})^{*}=U^{t}\) for every \(t\in \mathbb{R}\). Thus, for conciseness of exposition, in what follows we focus on approximation of the transfer operator P = P^{Δt}, corresponding to Φ = Φ^{Δt}. In addition, we describe our procedure for computing spatiotemporal mode reconstructions from eigenfunctions.
Delay embedding
We delay-embed the data to form vectors \(\tilde{x}_{i}=(x_{i-(Q-1)\ell },\ldots ,x_{i-\ell },x_{i})\in \mathbb{R}^{Qd}\) for some positive integers Q and ℓ, in order to obtain an improved estimate of the underlying state ω_{i} ∈ Ω, as in standard Takens embedding^{50,51}. Let ν be the measure induced on \(\mathbb{R}^{Qd}\) by the invariant measure μ on Ω. We seek to approximate projected versions of the operators P^{Δt}, U^{Δt}, and V that act on functions in a space of projected observables \(L^{2}(\mathbb{R}^{Qd},\nu )\). In practice, integrals with respect to ν are approximated by integrals with respect to the sampling probability measure \(\nu_{N}:= \sum_{i = (Q-1)\ell }^{N-1}\delta_{\tilde{x}_{i}}/(N-(Q-1)\ell )\), where \(\delta_{\tilde{x}_{i}}\) is the Dirac δ-measure centered at \(\tilde{x}_{i}\). This will shortly reduce to summing over the embedded data points \(\tilde{x}_{i}\).
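As an illustration of the delay-embedding step, the following minimal sketch stacks Q snapshots spaced ℓ samples apart into each embedded vector. The function name `delay_embed` and the toy array are hypothetical and serve only to show the index bookkeeping; they are not taken from the authors' code.

```python
import numpy as np

def delay_embed(x, Q, ell=1):
    """Delay-embed a time series x of shape (N, d).

    Row r of the result corresponds to time index i = (Q - 1) * ell + r and
    holds the concatenation (x[i - (Q-1)*ell], ..., x[i - ell], x[i]).
    The output has N - (Q - 1) * ell rows and Q * d columns.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    N = x.shape[0]
    if (Q - 1) * ell >= N:
        raise ValueError("time series too short for this choice of Q and ell")
    # Block q holds the snapshots lagged by (Q - 1 - q) * ell samples.
    blocks = [x[q * ell : N - (Q - 1 - q) * ell] for q in range(Q)]
    return np.hstack(blocks)

# Toy example (hypothetical data): N = 5 snapshots of dimension d = 2.
x = np.arange(10.0).reshape(5, 2)
xt = delay_embed(x, Q=2, ell=2)
```

With Q = 1 the function returns the data unchanged, which corresponds to the case where the full state is observed and no embedding is needed.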
Approximation of transfer and Koopman operators
We define a novel, data-driven Markov chain approximation of P, where each embedded data point \(\tilde{x}_{i}\in \mathbb{R}^{Qd}\), i = (Q − 1)ℓ, …, N − 2, N − 1 is identified with a Markov state. Our approximation retains important structural properties of P, namely it is a positive operator (nonnegative functions are mapped to nonnegative functions) and preserves integrals with respect to the data-based measure ν_{N}. The addition of noise mentioned in the main text to form P_{ϵ} is done via Gaussian kernels
centered on each point \({\tilde{x}}_{i}\), where ϵ is a positive bandwidth parameter, the choice of which is discussed at the end of this subsection.
We discretely approximate the Markov operator \({P}_{\epsilon }:{L}^{2}({{\mathbb{R}}}^{Qd},\nu )\to {L}^{2}({{\mathbb{R}}}^{Qd},\nu )\) defined by
as
Evaluating P_{ϵ}f at an embedded data point \({\tilde{x}}_{i}\), we have
where
The matrix P = [P_{ij}] is column stochastic, and we may think of P_{ij} as the conditional probability that the state j (or data point \(\tilde{x}_{j}\)) transitions to the state i (or data point \(\tilde{x}_{i}\)) in one time step, according to the kernel k_{ϵ} and the measure ν_{N}. A function \(f:\mathbb{R}^{Qd}\to \mathbb{C}\) taking values \(\boldsymbol{f}_{i}:= f(\tilde{x}_{i})\), i = (Q − 1)ℓ, …, N − 2, is evolved forward in time by matrix-vector multiplication Pf; this approximates the action of P_{ϵ}f. By construction, the mass conservation property \(P_{\epsilon }^{*}\mathbf{1}=\mathbf{1}\) is inherited by P, namely P^{⊤}1 = 1.
In practice, we compute the numerical spectrum of P, Pg_{j} = Λ_{j}g_{j}, and extract the most persistent cyclic behavior from the eigenvector g_{1} corresponding to the eigenvalue Λ_{1} = re^{iα} with largest magnitude inside the unit circle (largest ∣r∣ < 1) and α > 0. As described in the Results, α represents the angle of rotation around the extracted cycle per unit time. The corresponding eigenvector g_{1} approximates the eigenfunction \({g}^{(\epsilon )}({\tilde{x}}_{i})\) of P_{ϵ} that approximately projects the system from \({{\mathbb{R}}}^{Qd}\) to the most persistent cycle (approximately lying on S^{1}) as illustrated, e.g., in Fig. 2(f–h) in the context of the L63 system.
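The construction above can be sketched numerically under explicit assumptions. The snippet below assumes a Gaussian kernel of the form exp(−‖x − y‖²/ϵ²) and assumes that column j of the matrix spreads the one-step image of state j over all states with kernel weights; this is one plausible reading, not necessarily the exact formula used for P_{ij} in the paper. The names `markov_matrix` and `leading_cycle`, and the toy rotation data, are hypothetical.

```python
import numpy as np

def markov_matrix(xt, eps):
    """Column-stochastic matrix on the states xt[0], ..., xt[M-2] (assumed form).

    Entry (i, j) weights the transition from state j to state i by the
    Gaussian kernel between xt[i] and the image xt[j + 1] of state j.
    """
    states, images = xt[:-1], xt[1:]
    d2 = ((states[:, None, :] - images[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-d2 / eps**2)
    return K / K.sum(axis=0, keepdims=True)  # each column sums to one

def leading_cycle(P):
    """Eigenpair of largest modulus among eigenvalues with positive angle."""
    lam, vec = np.linalg.eig(P)
    candidates = np.where(np.angle(lam) > 1e-8)[0]
    k = candidates[np.argmax(np.abs(lam[candidates]))]
    return lam[k], vec[:, k]

# Toy data (hypothetical): a rotation on the unit circle sampled 400 times.
theta = 0.3 * np.arange(400)
xt = np.column_stack([np.cos(theta), np.sin(theta)])
P = markov_matrix(xt, eps=0.1)
lam1, g1 = leading_cycle(P)
r, alpha = np.abs(lam1), np.angle(lam1)  # modulus and rotation angle per step
```

The dense pairwise-distance array limits this sketch to a few thousand states; for larger datasets a sparse nearest-neighbor kernel would be a natural substitute.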
In the experiment in Fig. 2(f–h), we used N = 16,000 samples taken at a sampling interval of Δt = 0.01 time units; see Supplementary Table 2. Moreover, the dimension is d = 3, and we do not need to embed the data as we have access to the original state; thus we set Q = 1. In calculations using the ERSSTv4 (N = 600) and CCSM (N = 15,600) Indo-Pacific SST datasets with Δt = 1 month, we find that a single lag, Q = 2 with ℓ = 12 months (approximately one quarter of the cycle period), is sufficient to extract an accurate ENSO frequency and ENSO eigenfunctions; see Supplementary Table 1.
Regarding the choice of ϵ, generally one wishes to select an ϵ as small as possible, while maintaining an eigenvalue 1 of P with unit multiplicity. A ballpark estimate of a suitable ϵ is the mean nearest-neighbor distance (averaged over all embedded data points \(\tilde{x}_{i}\)), divided by \(\sqrt{2}\); this scales the Gaussian in (1) to have greatest slope (and therefore “distinguishing ability”) when \(\|\tilde{x}_{i}-y\|\) is the mean nearest-neighbor distance. The values of ϵ used in the Lorenz, ERSSTv4, and CCSM calculations are shown in Supplementary Tables 1 and 2, and are modified by less than a factor of four from the above ballpark estimate.
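The ballpark estimate described above is simple to compute. The following sketch evaluates the mean nearest-neighbor distance of the embedded points and divides it by √2; the function name `ballpark_bandwidth` and the toy grid are hypothetical, and the dense distance matrix is an assumption suited only to small datasets.

```python
import numpy as np

def ballpark_bandwidth(xt):
    """Mean nearest-neighbor distance of the rows of xt, divided by sqrt(2)."""
    xt = np.asarray(xt, dtype=float)
    diff = xt[:, None, :] - xt[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude each point's distance to itself
    return dist.min(axis=1).mean() / np.sqrt(2.0)

# Toy example (hypothetical): points on a unit-spaced one-dimensional grid,
# for which every nearest-neighbor distance equals 1.
grid = np.arange(5.0)[:, None]
eps0 = ballpark_bandwidth(grid)
```

In practice the returned value would serve as a starting point, to be adjusted while monitoring that the eigenvalue 1 of P remains simple.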
Approximation of the generator
Our method outputs a collection of \(\tilde{N}\)-dimensional complex vectors \(\boldsymbol{g}_{0},\ldots ,\boldsymbol{g}_{L-1}\in \mathbb{C}^{\tilde{N}}\), with \(\tilde{N}=N-Q+1\), \(\boldsymbol{g}_{j}=(g_{(Q-1)j},\ldots ,g_{(N-1)j})^{\top }\), and complex numbers \(\hat{\lambda }_{0},\ldots ,\hat{\lambda }_{L-1}\), such that g_{ij} approximates the value of an eigenfunction \(g_{j}^{(\epsilon )}\) of the regularized generator V_{ϵ} at state ω_{i} ∈ Ω, and \(\hat{\lambda }_{j}\) approximates the corresponding eigenvalue, λ_{j}. That is, we have g_{ij} ≈ g_{j}(ω_{i}) and \(\hat{\lambda }_{j}\approx \lambda_{j}\), where V_{ϵ}g_{j} = λ_{j}g_{j}. The numerical procedure to compute the eigenpairs \((\hat{\lambda }_{j},\boldsymbol{g}_{j})\) consists of two parts:

1. Computation of basis vectors \(\boldsymbol{\phi}_{0},\ldots ,\boldsymbol{\phi}_{L-1}\in \mathbb{R}^{\tilde{N}}\) as eigenvectors of an \(\tilde{N}\times \tilde{N}\) kernel matrix \(\tilde{\boldsymbol{K}}\) constructed from the data.
2. Formation of an L × L matrix W approximating the operator V_{ϵ} and solution of the associated eigenvalue problem.
In what follows, we outline these steps, referring the reader to our previous work^{34} for additional details and pseudocode.
Kernel matrix and basis functions
Using the delay-embedded data \(\tilde{x}_{i}\), we compute an \(\tilde{N}\times \tilde{N}\) matrix K, whose entries are given by the values \(K_{ij}=k_{\gamma }(\tilde{x}_{i},\tilde{x}_{j})\) of a pairwise kernel function \(k_{\gamma }:\mathbb{R}^{dQ}\times \mathbb{R}^{dQ}\to \mathbb{R}\). We use variable-bandwidth kernels
centered on each point \(\tilde{x}_{i}\), where γ is a positive bandwidth parameter, and \(\sigma (\tilde{x}_{i},y)\) is a positive bandwidth function. Intuitively, the role of σ is to control the rate of decay (locality) of the kernel in a data-dependent manner, such that in regions of high sampling density σ is small, leading to a tighter kernel k_{γ} and allowing resolution of finer-scale features. Conversely, in low-density regions σ is large, and hence we obtain a broader kernel k_{γ}, enhancing robustness to statistical sampling errors. Note that the radial Gaussian kernel in (1) is a special case of (2) with the constant bandwidth function \(\sigma (\tilde{x}_{i},y)=1\) and bandwidth parameter γ = ϵ. Here, we use the symbol γ for the kernel bandwidth parameter to distinguish it from ϵ employed for transfer/Koopman operator approximation in the previous section. The choice of γ and bandwidth function σ will be discussed in a subsequent section. It should be noted that in addition to improving state estimation, delay embedding also improves the efficiency of basis vectors derived from the data \(\tilde{x}_{j}\) in approximating transfer/Koopman operator eigenfunctions^{33} (as noted in the main text).
Having constructed the kernel matrix K, we next normalize it to obtain a bistochastic kernel matrix, i.e., a symmetric \(\tilde{N}\times \tilde{N}\) matrix \(\tilde{\boldsymbol{K}}\) with positive entries \(\tilde{K}_{ij}\), satisfying \(\sum_{j = Q-1}^{N-1}\tilde{K}_{ij}=1\) for all i ∈ {Q − 1, …, N − 1}. The normalization procedure^{92} employs the steps
where D and S are diagonal matrices with diagonal entries \(D_{ii}=\sum_{j = Q-1}^{N-1}K_{ij}\) and \(S_{ii}=\sum_{j = Q-1}^{N-1}K_{ij}/D_{jj}\), respectively. The basis vectors ϕ_{j} are then obtained by solving the matrix eigenvalue problem
By convention, we order the eigenvalues η_{j} in decreasing order, η_{0} ≥ η_{1} ≥ ⋯ , and normalize the corresponding eigenvectors such that \(\boldsymbol{\phi}_{i}^{\top }\boldsymbol{\phi}_{j}=\tilde{N}\delta_{ij}\). By Markovianity of \(\tilde{\boldsymbol{K}}\) and strict positivity of k_{γ} (which implies that the elements of \(\tilde{\boldsymbol{K}}\) are strictly positive), the leading eigenvalue η_{0} is equal to 1, and is strictly greater than η_{1}. Moreover, the corresponding eigenvector ϕ_{0} has constant elements, which can be set to 1 by our choice of normalization. As a result, viewed as temporal patterns t_{i} ↦ ϕ_{ij}, the eigenvectors ϕ_{j} with j ≥ 1 have zero mean (since they are orthogonal to ϕ_{0}) and unit variance (since \(\|\boldsymbol{\phi}_{j}\|^{2}/\tilde{N}=1\)).
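To make the normalization and basis computation concrete, the sketch below assumes that the bistochastic matrix is formed as \(\tilde{\boldsymbol{K}}=\hat{\boldsymbol{K}}\hat{\boldsymbol{K}}^{\top }\) with \(\hat{\boldsymbol{K}}=D^{-1}KS^{-1/2}\). This is one form consistent with the definitions of D and S given above (for a symmetric K it yields a symmetric matrix with unit row sums), but it is an assumption rather than a transcription of the paper's equations. The function name `bistochastic_basis` and the random test data are hypothetical.

```python
import numpy as np

def bistochastic_basis(K, L):
    """Bistochastic normalization of a symmetric positive kernel matrix K
    and the leading L eigenvectors, scaled so that phi.T @ phi = n * I."""
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    D = K.sum(axis=1)                       # D_ii = sum_j K_ij
    S = (K / D[None, :]).sum(axis=1)        # S_ii = sum_j K_ij / D_jj
    K_hat = K / D[:, None] / np.sqrt(S)[None, :]
    K_tilde = K_hat @ K_hat.T               # symmetric, rows sum to one
    eta, phi = np.linalg.eigh(K_tilde)
    order = np.argsort(eta)[::-1][:L]       # decreasing eigenvalue order
    eta, phi = eta[order], phi[:, order] * np.sqrt(n)
    if phi[:, 0].sum() < 0:                 # fix the sign of the constant vector
        phi[:, 0] *= -1.0
    return K_tilde, eta, phi

# Toy data (hypothetical): Gaussian kernel on 60 random points in the plane.
rng = np.random.default_rng(0)
pts = rng.normal(size=(60, 2))
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
K_tilde, eta, phi = bistochastic_basis(np.exp(-d2), L=5)
```

Because \(\tilde{\boldsymbol{K}}\) is a product of \(\hat{\boldsymbol{K}}\) with its transpose, the same eigenvectors could alternatively be obtained from a singular value decomposition of \(\hat{\boldsymbol{K}}\), which avoids forming the product explicitly.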
Note that because k_{γ} from (2) is a nonlinear kernel, the entries of ϕ_{j} are not necessarily linear projections of the data \(\tilde{x}_{i}\) onto a corresponding extended EOF (EEOF); that is, in general, ϕ_{ij} is not equal to \(u_{j}^{\top }\tilde{x}_{i}\) for an EEOF \(u_{j}\in \mathbb{R}^{dQ}\). The ϕ_{j} can therefore be viewed as nonlinear principal components, which are able to span richer spaces of observables than conventional EEOF techniques utilizing linear (covariance) kernels. This property is particularly important for our purposes, since in what follows we will use the ϕ_{j} to build Galerkin approximation spaces for the generator that can act on nonlinear functions.
In what follows, our approach is to fix L ≪ N and employ the leading eigenvectors ϕ_{0}, …, ϕ_{L−1} as basis vectors for approximating the generator. We choose an L-dimensional approximation space with high regularity, which reduces the sensitivity of our generator approximations to sampling errors.
Spectral analysis of the generator
Viewing vectors \(\boldsymbol{f}=(f_{Q-1},\ldots ,f_{N-1})^{\top }\in \mathbb{C}^{\tilde{N}}\) as complex-valued temporal patterns t_{i} ↦ f_{i} sampled discretely in time at the sampling interval Δt, we approximate the generator V by a finite-difference operator \(\mathbb{V}:\mathbb{C}^{\tilde{N}}\to \mathbb{C}^{\tilde{N}}\). As a concrete example, used in all generator calculations in this paper, the following is a fourth-order central scheme:
Using (3), we approximate the generator V by the L × L antisymmetric matrix \(\boldsymbol{V}\) with elements
It can be shown that \(\boldsymbol{V}\) provides a data-driven Galerkin approximation matrix for V, which converges in a suitable large-data limit^{33,34}. Similarly, we construct an L × L matrix W approximating the diffusion-regularized generator V_{ε} = V − εΔ, by defining
Here, Δ is a diffusion operator on the Hilbert space of observables H, and \(\boldsymbol{\Delta}\) a positive-semidefinite, self-adjoint matrix approximating Δ. We set \(\boldsymbol{\Delta}\) to the diagonal matrix with entries
With these definitions in place, and a choice of regularization parameter ε > 0, we solve the L × L matrix eigenvalue problem
The eigenvalues \(\hat{\lambda }_{j}\) provide approximations to the eigenvalues λ_{j} of V_{ε}. Moreover, the eigenvectors \(\boldsymbol{u}_{j}=(u_{0j},\ldots ,u_{(L-1)j})^{\top }\in \mathbb{C}^{L}\) contain the expansion coefficients of the approximate generator eigenfunction g_{j} in the ϕ_{i} basis; that is,
In analogy to the discrete-time case, we order the eigenpairs \((\hat{\lambda }_{j},\boldsymbol{g}_{j})\) in decreasing order of \(\mathrm{Re}\,\hat{\lambda }_{j}\). We normalize the g_{j} such that \(\boldsymbol{g}_{j}^{\dagger }\boldsymbol{g}_{j}=\tilde{N}\), where ^{†} denotes the complex-conjugate transpose. Note that, in general, the eigenvectors g_{j} are not orthogonal (though they are approximately orthogonal for sufficiently small ε).
The imaginary parts of the eigenvalues, \(\mathrm{Im}\,\hat{\lambda }_{j}\), represent the angular frequencies (radians per unit time) corresponding to the eigenfunctions g_{j}. In the main text (e.g., Fig. 3) we show the frequencies \(\nu_{j}=\mathrm{Im}\,\hat{\lambda }_{j}/(2\pi )\) measuring cycles per unit time. Meanwhile, the real part \(\mathrm{Re}\,\hat{\lambda }_{j}\) measures the (negative) decay rate of g_{j} under the evolution semigroup generated by W. By construction, W has a constant eigenvector g_{0} = 1 corresponding to the eigenvalue \(\hat{\lambda }_{0}=0\) (i.e., zero decay rate and oscillatory frequency). All other eigenvalues have strictly negative real part, and we order them in order of decreasing \(\mathrm{Re}\,\hat{\lambda }_{j}\) (i.e., in order of increasing decay rate) by convention.
In separate calculations with synthetic periodic data, we have verified that the ≃17% approximation error of the triennial eigenfrequency in Fig. 3 can be reduced to ≃5% by using an eighth-order finite-difference approximation scheme at a fixed monthly sampling interval Δt. Since our focus in this work is on lower frequencies (e.g., the interannual ENSO frequency), we have elected to work with the fourth-order scheme in (3), which provides adequate accuracy for the eigenfrequencies of interest while being less sensitive to numerical perturbations than higher-order schemes.
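The generator approximation can be illustrated with a small synthetic example. The sketch below uses the standard fourth-order central difference at interior samples (dropping two samples at each end, which is an assumed boundary treatment), forms an antisymmetrized Galerkin matrix from the inner products of basis vectors with their finite-difference derivatives, and subtracts a diagonal regularization. The Galerkin formula, the diagonal entries passed as `delta_diag`, and the names `fd4` and `generator_eigs` are assumptions made for illustration and may differ from the paper's equations.

```python
import numpy as np

def fd4(f, dt):
    """Fourth-order central difference along axis 0 at interior samples:
    (f[i-2] - 8 f[i-1] + 8 f[i+1] - f[i+2]) / (12 dt)."""
    f = np.asarray(f, dtype=float)
    return (f[:-4] - 8.0 * f[1:-3] + 8.0 * f[3:-1] - f[4:]) / (12.0 * dt)

def generator_eigs(phi, dt, eps, delta_diag):
    """Eigenpairs of W = V - eps * diag(delta_diag), sorted by decreasing
    real part, where V_ij approximates <phi_i, d(phi_j)/dt> (assumed form)."""
    dphi = fd4(phi, dt)
    inner = phi[2:-2]                       # samples where the stencil fits
    V = inner.T @ dphi / inner.shape[0]
    V = 0.5 * (V - V.T)                     # enforce antisymmetry
    W = V - eps * np.diag(delta_diag)
    lam, u = np.linalg.eig(W)
    order = np.argsort(-lam.real)
    return lam[order], u[:, order]

# Synthetic monthly data (hypothetical): a pure 4-year cycle over 40 years.
dt = 1.0 / 12.0                             # sampling interval in years
t = dt * np.arange(480)
omega = 2.0 * np.pi / 4.0
phi = np.column_stack([np.ones_like(t),
                       np.sqrt(2.0) * np.cos(omega * t),
                       np.sqrt(2.0) * np.sin(omega * t)])
lam, u = generator_eigs(phi, dt, eps=1e-3, delta_diag=[0.0, 1.0, 1.0])
freq = lam.imag / (2.0 * np.pi)             # cycles per year
```

For this slowly varying test signal the recovered frequency is close to 0.25 cycles per year; the finite-difference error grows with the product of angular frequency and Δt, consistent with the larger errors reported above for higher-frequency modes.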
Bandwidth function and parameter tuning
In the CCSM4 and ERSSTv4 analyses, we employ a nonseparable bandwidth function that promotes connectivity between datapoints whose relative displacement vector is aligned with the local dynamical flow^{93}, viz.
where 0 ≤ ζ < 1, and \(v_{i}=\tilde{x}_{i}-\tilde{x}_{i-1}\), \(v_{j}=\tilde{x}_{j}-\tilde{x}_{j-1}\) are (trajectory tangent) vectors representing the local time tendency of the data. Following Refs. ^{69,70,71}, we set ζ to a value close to 1, namely ζ = 0.995 (see Supplementary Table 1). This has the effect of promoting slow timescales in the extracted basis functions ϕ_{j}, which reduces the error of the finite-difference approximation of the generator.
Using the dataset and parameters in Supplementary Table 2, we have computed approximate generator eigenfunctions for the L63 system, which exhibit persistent cyclicity analogously to the transfer operator experiment in Fig. 2. In these experiments, we employ a separable bandwidth function^{40},
where \(\rho (\tilde{x}_{i})\) is an estimate of the sampling density of the data at \(\tilde{x}_{i}\), and m > 0 a parameter approximating the dimension of the data manifold in \(\mathbb{R}^{dQ}\). The density estimator is formally given by \(\rho (y)=\int \exp (-\|z-y\|^{2}/\tilde{\gamma }^{2})\ d\nu (z)\), where \(\tilde{\gamma }>0\) is a bandwidth parameter (in general, different from γ in (2)). The dimension parameter m is determined numerically using the same procedure^{40} (outlined below) as for tuning the bandwidth parameters γ and \(\tilde{\gamma }\). In the L63 experiments, we obtain a value m ≈ 2.06 approximating the fractal dimension of the Lorenz attractor. Even though the L63 snapshot data \(x_{i}\in \mathbb{R}^{3}\) contain full state information, we have used a long embedding window of Q = 800 samples at a Δt = 0.01 sampling interval. This has the effect of “biasing” the basis vectors ϕ_{j} towards approximate Koopman/transfer eigenvectors^{33,37}, thus improving the efficiency of the basis in approximating solutions to the generator eigenvalue problem.
In all generator calculations reported in this paper, we tune the bandwidth parameters γ and \(\tilde{\gamma }\) automatically using a numerical procedure^{40}. This involves computing the kernel sum \(S(\gamma_{l}):= \sum_{i,j = Q-1}^{N-2}k_{\gamma_{l}}(\tilde{x}_{i},\tilde{x}_{j})\) on a logarithmic grid γ_{l} of trial bandwidth parameters, and choosing γ as the bandwidth parameter γ_{l} that maximizes the derivative \(d\log S(\gamma_{l})/d\log \gamma_{l}\) (estimated numerically by finite differences). See Algorithm 1 in Ref. ^{34} for pseudocode. The maximum value \(\hat{m}\) of \(d\log S(\gamma_{l})/d\log \gamma_{l}\) can be shown to be approximately equal to the dimension of the data manifold in \(\mathbb{R}^{dQ}\) divided by 2. Based on that, in our generator calculations utilizing the bandwidth function in (6) we set the dimension parameter \(m=2\hat{m}\).
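The tuning criterion can be sketched as follows. The snippet assumes a fixed-bandwidth Gaussian kernel of the form exp(−‖x − y‖²/γ²) (that is, σ ≡ 1), evaluates the kernel sum on a logarithmic grid, and returns the bandwidth at which the finite-difference estimate of d log S / d log γ peaks. The function name `tune_bandwidth` and the toy data are hypothetical; note that the numerical value of the peak slope depends on how the bandwidth enters the kernel, so the sketch returns it without interpreting it as a dimension.

```python
import numpy as np

def tune_bandwidth(d2, gammas):
    """Select a bandwidth from an increasing, log-spaced grid `gammas`.

    d2 is the matrix of squared pairwise distances. Returns the geometric
    midpoint of the grid interval with the largest log-log slope of the
    kernel sum, together with that slope.
    """
    log_s = np.array([np.log(np.exp(-d2 / g**2).sum()) for g in gammas])
    slope = np.diff(log_s) / np.diff(np.log(gammas))
    k = int(np.argmax(slope))
    return np.sqrt(gammas[k] * gammas[k + 1]), slope[k]

# Toy data (hypothetical): 200 equally spaced points on the unit circle.
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
pts = np.column_stack([np.cos(theta), np.sin(theta)])
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
gammas = np.logspace(-3, 1, 41)
gamma_star, peak_slope = tune_bandwidth(d2, gammas)
```

The slope vanishes for very small bandwidths (only the diagonal contributes) and for very large ones (the kernel sum saturates), so the maximum lies at an intermediate scale where the kernel resolves the geometry of the data.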
Phase composites
Here, we describe the procedure for constructing phase composites of the observables employed in Figs. 7 and 8. Let \(Y:\Omega \to \mathbb{R}^{d^{\prime}}\) be a target observable for compositing. For instance, in Fig. 7, Y is either the global SST, zonal surface wind, or meridional surface wind anomaly field sampled at \(d^{\prime}\) gridpoints. Let \(g:\Omega \to \mathbb{C}\) be a complex-valued index representing the phenomenon of interest for which composites are created. In Figs. 7 and 8, g is equal to either the generator eigenfunctions g_{j}, or the lagged Niño 3.4 index f_{nino}. We further let \(\hat{\Omega }=\{\omega_{Q-1},\ldots ,\omega_{N-1}\}\) denote the set of states sampled along our dynamical trajectory (taking delay embedding with Q delays into account), \(S\in \mathbb{N}\) the number of phases, and m an integer less than \(\tilde{N}/S\) representing the number of samples in each phase (recall that \(\tilde{N}=N-Q+1\)). We define S “wedges” \(W_{1},\ldots ,W_{S}\subset \mathbb{C}\) in the complex plane by
where Θ_{j} = [2π(j − 1)/S, 2πj/S), and a_{j} is the mth largest modulus of the complex numbers in the set \(\{g(\omega ):\omega \in \hat{\Omega }\ \text{and}\ \arg g(\omega )\in \Theta_{j}\}\).
The sets W_{1}, …, W_{S} represent S “phases” of an oscillatory process represented by g. In addition, we define a phase \(W_{0}=\mathbb{C}\setminus \bigcup_{j = 1}^{S}W_{j}\) associated with the states in Ω for which the process represented by g is considered inactive. For each phase W_{j} we define the associated phase composite as the vector \(\mathbb{Y}_{j}\in \mathbb{R}^{d^{\prime}}\) given by the average \(\mathbb{Y}_{j}=\sum_{\omega_{i}\in W_{j}}Y(\omega_{i})/|W_{j}|\), where ∣W_{j}∣ denotes the number of elements of W_{j}. Note that ∣W_{1}∣ = … = ∣W_{S}∣ = m and \(|W_{0}|=\tilde{N}-mS\).
We can interpret the phase composites \(\mathbb{Y}_{j}\) as values of the conditional expectation of the observable Y with respect to the partition {W_{0}, …, W_{S}} of \(\mathbb{C}\) induced by the complex-valued index g. For that, note that the partition induces a discrete variable π: Ω → {0, 1, …, S}, where π(ω) = j if and only if g(ω) lies in W_{j}. We define \(\mathbb{Y}:\Omega \to \mathbb{R}^{d^{\prime}}\) as a discrete observable representing the empirical conditional expectation of Y given π, i.e.,
Note that \({\mathbb{Y}}\) is a discrete observable satisfying \({\mathbb{Y}}(\omega )={{\mathbb{Y}}}_{j}\) whenever the eigenfunction value g(ω) lies in W_{j} for the state ω ∈ Ω.
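The compositing procedure can be summarized in a short sketch. For each of the S angular sectors, the code below selects the m samples of the complex index with the largest modulus and averages the target observable over them; all remaining samples are assigned to the inactive phase 0. The function name `phase_composites` and the synthetic index are hypothetical and are used only to illustrate the bookkeeping.

```python
import numpy as np

def phase_composites(g, Y, S, m):
    """Phase composites of Y (shape (n, d')) conditioned on a complex index g.

    Returns an (S, d') array of composites and integer labels in {0, ..., S},
    where label 0 marks the inactive phase and label j marks wedge W_j.
    """
    g = np.asarray(g)
    Y = np.asarray(Y, dtype=float)
    phase = np.mod(np.angle(g), 2.0 * np.pi)
    sector = np.minimum((phase / (2.0 * np.pi / S)).astype(int), S - 1)
    labels = np.zeros(g.shape[0], dtype=int)
    composites = np.empty((S, Y.shape[1]))
    for j in range(S):
        idx = np.where(sector == j)[0]
        if idx.size < m:
            raise ValueError(f"sector {j + 1} has fewer than m samples")
        # Keep the m samples of largest modulus within this sector.
        top = idx[np.argsort(np.abs(g[idx]))[::-1][:m]]
        labels[top] = j + 1
        composites[j] = Y[top].mean(axis=0)
    return composites, labels

# Synthetic index (hypothetical): an amplitude-modulated rotation.
k = np.arange(800)
g = (1.0 + 0.5 * np.sin(0.017 * k)) * np.exp(1j * 0.3 * k)
Y = np.column_stack([g.real, g.imag])
composites, labels = phase_composites(g, Y, S=8, m=20)
```

The returned labels play the role of the discrete variable π, so that `composites[labels[i] - 1]` gives the empirical conditional expectation at every active sample i.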
Mode reconstruction
Our approach for computing spatiotemporal mode reconstructions (e.g., as shown in Fig. 10) is closely related to the reconstruction procedure in SSA^{55}, with appropriate modifications to take into account the facts that eigenfunctions of evolution operators may be (i) complex-valued; and (ii) non-orthogonal. Let \(Y:\Omega \to \mathbb{R}^{d^{\prime}}\) be a target observable for reconstruction as in the previous section. For instance, in Fig. 10, the target observable Y is the Niño 3.4 region-averaged SST anomaly, which is a scalar with \(d^{\prime}=1\), but Y can also be vector-valued, for example when one reconstructs the original input data and sets Y = X.
Let 〈 ⋅ , ⋅ 〉 denote the inner product of H, \(\langle f_{1},f_{2}\rangle =\int_{\Omega }\bar{f}_{1}f_{2}\ d\mu\), and let \(g_{j}^{\prime}\) denote an element of the biorthonormal basis of the {g_{i}}; that is, \(\langle g_{j}^{\prime},g_{i}\rangle =\delta_{ji}\). A procedure for constructing the biorthonormal set \(\{\boldsymbol{g}_{0}^{\prime},\ldots ,\boldsymbol{g}_{L-1}^{\prime}\}\) to {g_{0}, …, g_{L−1}}, satisfying \(\boldsymbol{g}_{i}^{\prime \dagger }\boldsymbol{g}_{j}=\delta_{ij}\), is to (i) form the L × L Gram matrix G with \(G_{ij}=\boldsymbol{u}_{i}^{\dagger }\boldsymbol{u}_{j}\); (ii) compute \(\boldsymbol{u}_{i}^{\prime}=\boldsymbol{G}^{-1}\boldsymbol{u}_{i}\); and (iii) form the linear combination \(\boldsymbol{g}_{i}^{\prime}=\sum_{k = 0}^{L-1}u_{ki}^{\prime}\boldsymbol{\phi}_{k}\).
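A numerical sketch of the dual-basis construction is given below. It reads step (ii) as the linear combination \(\boldsymbol{u}_{i}^{\prime}=\sum_{j}(\boldsymbol{G}^{-1})_{ji}\boldsymbol{u}_{j}\), which is the form that yields biorthogonality of the coefficient vectors, and it measures biorthonormality of the resulting vectors with the empirical inner product \(\boldsymbol{g}^{\prime \dagger }\boldsymbol{g}/\tilde{N}\); both choices are assumptions about conventions not spelled out in the text. The function name `dual_eigenvectors` and the random inputs are hypothetical.

```python
import numpy as np

def dual_eigenvectors(U, Phi):
    """Eigenvectors g_j and their biorthonormal duals g'_j.

    U   : (L, L) complex array whose columns are the coefficient vectors u_j.
    Phi : (n, L) real basis with Phi.T @ Phi = n * I.
    """
    G = U.conj().T @ U                         # Gram matrix G_ij = u_i^dagger u_j
    # Columns of U @ inv(G); G is Hermitian, so this equals solve(G, U^dagger)^dagger.
    U_dual = np.linalg.solve(G, U.conj().T).conj().T
    return Phi @ U, Phi @ U_dual

# Hypothetical inputs: an orthogonal basis from a QR factorization and
# random complex coefficients standing in for generator eigenvectors.
rng = np.random.default_rng(1)
n, L = 200, 4
Q_mat, _ = np.linalg.qr(rng.normal(size=(n, L)))
Phi = Q_mat * np.sqrt(n)
U = rng.normal(size=(L, L)) + 1j * rng.normal(size=(L, L))
g, g_dual = dual_eigenvectors(U, Phi)
overlap = g_dual.conj().T @ g / n              # expected to be close to identity
```

Using a linear solve instead of an explicit inverse is a numerical convenience; when the eigenvectors are nearly orthogonal, G is close to a diagonal matrix and the duals are close to rescaled copies of the g_{j}.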
For each eigenfunction g_{j} and lag \(q\in \mathbb{Z}\), we define the complex-valued spatial pattern \(A_{j}^{(q)}\in \mathbb{C}^{d^{\prime}}\) given by projection of \(P^{q\Delta t}Y=(U^{q\Delta t})^{*}Y\) onto generator eigenfunction g_{j}; formally,
Numerically, we approximate \(A_{j}^{(q)}\) by projecting the samples y_{0}, …, y_{N−1}, y_{i} = Y(ω_{i}), of the target observable, lagged by q steps, onto the dual eigenvector \(\boldsymbol{g}_{j}^{\prime}\), viz.
It is worthwhile noting that for q = 0 the spatial patterns \(\hat{A}_{j}^{(0)}\) are analogous to Koopman modes employed in data-driven Koopman operator techniques^{12,13,17}. The patterns \(\hat{A}_{j}^{(q)}\) can thus be thought of as time-shifted Koopman modes.
Next, we define an approximate projection of the target observable Y onto the eigenfunction g_{j}, namely \(\tilde{Y}_{j}:\Omega \to \mathbb{C}^{d^{\prime}}\), by multiplication of \(A_{j}^{(q)}\) with U^{q Δt}g_{j}, followed by averaging over the delay-embedding window^{55},
Numerically, \(\tilde{Y}_{j}\) is approximated by the spatiotemporal pattern \(\hat{\boldsymbol{Y}}_{j}=(\hat{y}_{j(Q-1)},\ldots ,\hat{y}_{j(N-1)})\in \mathbb{C}^{d^{\prime}\times \tilde{N}}\), where
approximates \({\tilde{Y}}_{j}({\omega }_{i})\). Note that \({\tilde{Y}}_{j}\) can be equivalently expressed as
from which we can interpret \(\tilde{Y}_{j}\) as a projection of the observable Y onto an order-Q Krylov subspace generated by eigenfunction g_{j}. Adopting standard terminology from climate science, in the main text we refer to the reconstructed patterns \(\hat{\boldsymbol{Y}}_{j}\) as modes, though it should be kept in mind that these patterns are different from Koopman modes in that they have both spatial and temporal character.
The individual modes \(\tilde{Y}_{j}\) can be combined into sum modes by choosing an index set J = (j_{1}, …, j_{l}) and defining \(\tilde{Y}_{J}=\sum_{k = 1}^{l}\tilde{Y}_{j_{k}}\). Similarly, in the empirical setting, we define \(\hat{\boldsymbol{Y}}_{J}=\sum_{k = 1}^{l}\hat{\boldsymbol{Y}}_{j_{k}}\). Note that \(\tilde{Y}_{J}\) (resp. \(\hat{\boldsymbol{Y}}_{J}\)) is real whenever J consists of indices of pairs of complex-conjugate eigenvalues λ_{j} (resp. \(\hat{\lambda }_{j}\)). In Fig. 10, we show reconstructions using index sets J representing various complex-conjugate pairs of ENSO and ENSO combination modes associated with generator eigenfunctions.
Data availability
The CCSM4 data analyzed in this study are available at the Earth System Grid repository under accession code https://www.earthsystemgrid.org/dataset/ucar.cgd.ccsm4.joc.b40.1850.track1.1deg.006.html. The ERSSTv4 and NCEP reanalysis data are available at the National Centers for Environmental Information repositories, under accession codes https://www.ncdc.noaa.gov/dataaccess/marineoceandata/extendedreconstructedseasurfacetemperatureersstv4 and https://psl.noaa.gov/data/gridded/data.ncep.reanalysis.html, respectively. The processed data are available from the corresponding author on reasonable request. The processed data can also be generated by running the MATLAB code in the repository https://doi.org/10.5281/zenodo.5508376 or https://doi.org/10.5281/zenodo.5511734.
Code availability
MATLAB code implementing the numerical techniques and reproducing the generator results described in the paper is available at https://doi.org/10.5281/zenodo.5508376. See the file /pubs/FroylandEtAl21_NatComms/README in the code repository for additional information. Code for the transfer operator computations in the Lorenz and ERSSTv4 examples is available at https://doi.org/10.5281/zenodo.5511734.
References
Bjerknes, J. Atmospheric teleconnections from the Equatorial Pacific. Mon. Wea. Rev. 97, 163–172 (1969).
Madden, R. A. & Julian, P. R. Detection of a 40–50 day oscillation in the zonal wind in the tropical Pacific. J. Atmos. Sci. 28, 702–708 (1971).
Wang, C., Deser, C., Yu, J.-Y., DiNezio, P. & Clement, A. El Niño and Southern Oscillation (ENSO): A review. In Glynn, P. W., Manzello, D. P. & Enoch, I. C. (eds) Coral Reefs of the Eastern Tropical Pacific: Persistence and Loss in a Dynamic Environment, vol. 8 of Coral Reefs of the World, 85–106, https://doi.org/10.1007/9789401774994_4 (Springer Netherlands, Dordrecht, 2017).
Timmermann, A. et al. El Niño–Southern Oscillation complexity. Nature 559, 535–545 (2018).
L’Heureux, M. L. et al. Observing and predicting the 2015/16 El Niño. Bull. Am. Meteorol. Soc. 98, 1363–1382 (2017).
Kiladis, G. N. et al. A comparison of OLR and circulation-based indices for tracking the MJO. Mon. Weather Rev. 142, 1697–1715 (2014).
von Storch, H. & Zwiers, F. W. Statistical Analysis in Climate Research (Cambridge University Press, Cambridge, 2002).
Aubry, N., Lian, W.-Y. & Titi, E. S. Preserving symmetries in the proper orthogonal decomposition. SIAM J. Sci. Comput. 14, 483–505 (1993).
Koopman, B. O. Hamiltonian systems and transformation in Hilbert space. Proc. Natl Acad. Sci. 17, 315–318 (1931).
Baladi, V. Positive Transfer Operators and Decay of Correlations, vol. 16 of Advanced Series in Nonlinear Dynamics (World Scientific, Singapore, 2000).
Eisner, T., Farkas, B., Haase, M. & Nagel, R. Operator Theoretic Aspects of Ergodic Theory, vol. 272 of Graduate Texts in Mathematics (Springer, 2015).
Mezić, I. & Banaszuk, A. Comparison of systems with complex behavior. Physica D. 197, 101–133 (2004).
Mezić, I. Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dyn. 41, 309–325 (2005).
Froyland, G. Computer-assisted bounds for the rate of decay of correlations. Commun. Math. Phys. 189, 237–257 (1997).
Dellnitz, M. & Junge, O. On the approximation of complicated dynamical behavior. SIAM J. Numer. Anal. 36, 491 (1999).
Schütte, C., Huisinga, W. & Deuflhard, P. Transfer operator approach to conformational dynamics in biomolecular systems. In Fiedler, B. (ed.) Ergodic Theory, Analysis, and Efficient Simulation of Dynamical Systems, 191–223, https://doi.org/10.1007/9783642565892_9 (Springer-Verlag, Berlin, 2001).
Rowley, C. W., Mezić, I., Bagheri, S., Schlatter, P. & Henningson, D. S. Spectral analysis of nonlinear flows. J. Fluid Mech. 641, 115–127 (2009).
Schmid, P. J. Dynamic mode decomposition of numerical and experimental data. J. Fluid Mech. 656, 5–28 (2010).
Williams, M. O., Kevrekidis, I. G. & Rowley, C. W. A data-driven approximation of the Koopman operator: Extending dynamic mode decomposition. J. Nonlinear Sci. 25, 1307–1346 (2015).
Brunton, S. L., Brunton, B. W., Proctor, J. L., Kaiser, E. & Kutz, J. N. Chaos as an intermittently forced linear system. Nat. Commun. 8, https://doi.org/10.1038/s41467017000308 (2017).
Klus, S. et al. Data-driven model reduction and transfer operator approximation. J. Nonlinear Sci. 28, 985–1010 (2018).
Korda, M., Putinar, M. & Mezić, I. Data-driven spectral analysis of the Koopman operator. Appl. Comput. Harmon. Anal. 48, 599–629 (2020).
Froyland, G., Santitissadeekorn, N. & Monahan, A. Transport in time-dependent dynamical systems: finite-time coherent sets. Chaos 20, 043116 (2010).
Froyland, G., Lloyd, S. & Santitissadeekorn, N. Coherent sets for nonautonomous dynamical systems. Physica D. 239, 1527–1541 (2010).
Froyland, G. An analytic framework for identifying finite-time coherent sets in time-dependent dynamical systems. Physica D. 250, 1–19 (2013).
Froyland, G., Junge, O. & Koltai, P. Estimating long-term behavior of flows without trajectory integration: the infinitesimal generator approach. SIAM J. Numer. Anal. 51, 223–247 (2013).
Froyland, G. Dynamic isoperimetry and the geometry of Lagrangian coherent structures. Nonlinearity 28, 3587–3622 (2015).
Froyland, G., Koltai, P. & Plonka, M. Computation and optimal perturbation of finite-time coherent sets for aperiodic flows without trajectory integration. SIAM J. Appl. Dyn. Sys. 19, 1659–1700 (2020).
Berry, T., Giannakis, D. & Harlim, J. Nonparametric forecasting of low-dimensional dynamical systems. Phys. Rev. E. 91, 032915 (2015).
Giannakis, D., Slawinska, J. & Zhao, Z. Spatiotemporal feature extraction with data-driven Koopman operators. J. Mach. Learn. Res. Proc. 44, 103–115 (2015).
Kawahara, Y. Dynamic mode decomposition with reproducing kernels for Koopman spectral analysis. In Advances in Neural Information Processing Systems, 911–919 (Curran Associates, 2016).
Banisch, R. & Koltai, P. Understanding the geometry of transport: diffusion maps for Lagrangian trajectory data unravel coherent sets. Chaos 27, 035804 (2017).
Das, S. & Giannakis, D. Delay-coordinate maps and the spectra of Koopman operators. J. Stat. Phys. 175, 1107–1145 (2019).
Giannakis, D. Data-driven spectral decomposition and forecasting of ergodic dynamical systems. Appl. Comput. Harmon. Anal. 62, 338–396 (2019).
Klus, S., Schuster, I. & Muandet, K. Eigendecomposition of transfer operators in reproducing kernel Hilbert spaces. J. Nonlinear Sci. 30, 283–315 (2019).
Das, S., Giannakis, D. & Slawinska, J. Reproducing kernel Hilbert space quantification of unitary evolution groups. Appl. Comput. Harmon. Anal. 54, 75–136 (2021).
Giannakis, D. Delay-coordinate maps, coherence, and approximate spectra of evolution operators. Res. Math. Sci. 8, 8 (2021).
Belkin, M. & Niyogi, P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15, 1373–1396 (2003).
Coifman, R. R. & Lafon, S. Diffusion maps. Appl. Comput. Harmon. Anal. 21, 5–30 (2006).
Berry, T. & Harlim, J. Variable bandwidth diffusion kernels. Appl. Comput. Harmon. Anal. 40, 68–96 (2016).
Kosambi, D. D. Statistics in function space. J. Ind. Math. Soc. 7, 76–88 (1943).
Kim, K.-Y. & Wu, Q. A comparison study of EOF techniques: analysis of nonstationary data with periodic statistics. J. Clim. 12, 185–199 (1999).
Horel, J. D. Complex principal component analysis: theory and examples. J. Clim. Appl. Meteorol. 23, 1660–1673 (1984).
von Storch, H., Bürger, G., Schnur, R. & von Storch, J.-S. Principal oscillation patterns: a review. J. Clim. 8, 377–400 (1995).
Neumaier, A. & Schneider, T. Estimation of parameters and eigenmodes of multivariate autoregressive models. ACM Trans. Math. Softw. 27, 27–57 (2001).
Giannakis, D., Kolchinskaya, A., Krasnov, D. & Schumacher, J. Koopman analysis of the long-term evolution in a turbulent convection cell. J. Fluid Mech. 847, 735–767 (2018).
Miron, P. et al. Lagrangian geography of the deep Gulf of Mexico. J. Phys. Oceanogr. 49, 269–290 (2019).
Koltai, P. & Weiss, S. Diffusion maps embedding and transition matrix analysis of the large-scale flow structure in turbulent Rayleigh–Bénard convection. Nonlinearity 33, 1723–1756 (2020).
Packard, N. H. et al. Geometry from a time series. Phys. Rev. Lett. 45, 712–716 (1980).
Takens, F. Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence, vol. 898 of Lecture Notes in Mathematics, 366–381, https://doi.org/10.1007/bfb0091924 (Springer, Berlin, 1981).
Sauer, T., Yorke, J. A. & Casdagli, M. Embedology. J. Stat. Phys. 65, 579–616 (1991).
Weare, B. C. & Nasstrom, J. N. Examples of extended empirical orthogonal function analyses. Mon. Weather Rev. 110, 784–812 (1982).
Broomhead, D. S. & King, G. P. Extracting qualitative dynamics from experimental data. Physica D. 20, 217–236 (1986).
Vautard, R. & Ghil, M. Singular spectrum analysis in nonlinear dynamics, with applications to paleoclimatic time series. Physica D. 35, 395–424 (1989).
Ghil, M. et al. Advanced spectral methods for climatic time series. Rev. Geophys. 40, 3-1–3-41 (2002).
Froyland, G., Lloyd, S. & Quas, A. Coherent structures and isolated spectrum for Perron–Frobenius cocycles. Ergod. Theory Dyn. Syst. 30, 729–756 (2010).
Froyland, G., Lloyd, S. & Quas, A. A semi-invertible Oseledets theorem with application to transfer operator cocycles. Discret. Cont. Dyn. Syst. 33, 3835–3860 (2013).
González-Tokman, C. & Quas, A. A semi-invertible operator Oseledets theorem. Ergod. Theory Dyn. Syst. 34, 1230–1272 (2014).
Froyland, G., Padberg, K., England, M. H. & Treguier, A. M. Detection of coherent oceanic structures via transfer operators. Phys. Rev. Lett. 98, 224503 (2007).
Keller, G. & Liverani, C. Stability of the spectrum for transfer operators. Ann. della Sc. Norm. Super. di Pisa-Cl. di Sci. 28, 141–152 (1999).
Froyland, G. On Ulam approximation of the isolated spectrum and eigenfunctions of hyperbolic maps. Discret. Cont. Dyn. Syst. 17, 671–689 (2007).
Crimmins, H. & Froyland, G. Fourier approximation of the statistical properties of Anosov maps on tori. Nonlinearity 33, 6244 (2020).
Lasota, A. & Mackey, M. C. Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics, vol. 97 of Applied Mathematical Sciences (Springer-Verlag, New York, 1997).
Denner, A., Junge, O. & Matthes, D. Computing coherent sets using the Fokker–Planck equation. J. Comput. Dyn. 3, 163 (2016).
Stuecker, M. F., Jin, F.-F. & Timmermann, A. El Niño–Southern Oscillation frequency cascade. Proc. Natl Acad. Sci. 112, 13490–13495 (2015).
Power, S., Casey, C., Folland, C., Colman, A. & Mehta, V. Interdecadal modulation of the impact of ENSO on Australia. Clim. Dyn. 15, 319–324 (1999).
Gent, P. R. et al. The Community Climate System Model version 4. J. Clim. 24, 4973–4991 (2011).
Huang, B. et al. Extended Reconstructed Sea Surface Temperature version 4 (ERSST.v4): Part I. Upgrades and intercomparisons. J. Clim. 28, 911–930 (2014).
Slawinska, J. & Giannakis, D. Indo-Pacific variability on seasonal to multidecadal time scales. Part I: intrinsic SST modes in models and observations. J. Clim. 30, 5265–5294 (2017).
Giannakis, D. & Slawinska, J. Indo-Pacific variability on seasonal to multidecadal time scales. Part II: multiscale atmosphere-ocean linkages. J. Clim. 31, 693–725 (2018).
Wang, X., Giannakis, D. & Slawinska, J. The Antarctic circumpolar wave and its seasonality: intrinsic travelling modes and El Niño–Southern Oscillation teleconnections. Int. J. Climatol. 39, 1026–1040 (2019).
Lenssen, N. J. L. et al. Improvements in the GISTEMP uncertainty model. J. Geophys. Res. Atmos. 124, 6307–6326 (2019).
Froyland, G., Gottwald, G. A. & Hammerlindl, A. A computational method to extract macroscopic variables and their dynamics in multiscale systems. SIAM J. Appl. Dyn. Syst. 13, 1816–1846 (2014).
An, S.-I. & Kim, J.-W. ENSO transition asymmetry: internal and external causes and intermodel diversity. Geophys. Res. Lett. 45, 5095–5104 (2018).
Mauroy, A., Mezić, I. & Moehlis, J. Isostables, isochrons, and Koopman spectrum for the action–angle representation of stable fixed point dynamics. Physica D. 261, 19–30 (2013).
Bollt, E. M., Li, Q., Dietrich, F. & Kevrekidis, I. On matching, and even rectifying, dynamical systems through Koopman operator eigenfunctions. SIAM J. Appl. Dyn. Syst. 17, 1925–1960 (2018).
Jin, F.-F. An equatorial ocean recharge paradigm for ENSO. Part I: conceptual model. J. Atmos. Sci. 54, 811–829 (1997).
Froyland, G., Rock, C. P. & Sakellariou, K. Sparse eigenbasis approximation: multiple feature extraction across spatiotemporal scales with application to coherent set identification. Commun. Nonlinear Sci. Numer. Simul. 77, 81–107 (2019).
Lau, W. K. M. & Waliser, D. E. Intraseasonal Variability in the Atmosphere–Ocean Climate System (Springer-Verlag, Berlin, 2011).
Kalnay, E. et al. The NCEP/NCAR 40-year reanalysis project. Bull. Am. Meteorol. Soc. 77, 437–472 (1996).
Fedorov, A. V. The response of the coupled tropical ocean–atmosphere to westerly wind bursts. Q. J. R. Meteorol. Soc. 128, 1–23 (2002).
McGregor, S., Timmermann, A., Schneider, N., Stuecker, M. F. & England, M. H. The effect of the South Pacific Convergence Zone on the termination of El Niño events and the meridional asymmetry of ENSO. J. Clim. 25, 5566–5586 (2012).
Barnston, A. G. & Ropelewski, C. F. Prediction of ENSO episodes using canonical correlation analysis. J. Clim. 5, 1316–1345 (1992).
Jiang, N., Neelin, J. D. & Ghil, M. Quasi-quadrennial and quasi-biennial variability in the equatorial Pacific. Clim. Dyn. 12, 101–112 (1995).
Hu, S. & Fedorov, A. V. Cross-equatorial winds control El Niño diversity and change. Nat. Clim. Change 8, 798–802 (2018).
Fang, S.-W. & Yu, J.-Y. Contrasting transition complexity between El Niño and La Niña: observations and CMIP5/6 models. Geophys. Res. Lett. 47, e2020GL088926 (2020).
Wang, R. & Ren, H.-L. Understanding key roles of two ENSO modes in spatiotemporal diversity of ENSO. J. Clim. 33, 6453–6469 (2020).
Stuecker, M. F. et al. Revisiting ENSO/Indian Ocean Dipole phase relationships. Geophys. Res. Lett. 44, 2481–2492 (2017).
Stevenson, S., Otto-Bliesner, B., Fasullo, J. & Brady, E. “El Niño like” hydroclimate responses to last millennium volcanic eruptions. J. Clim. 29, 2907–2921 (2016).
Khodri, M. et al. Tropical explosive volcanic eruptions can trigger El Niño by cooling tropical Africa. Nat. Commun. 8, 778 (2017).
Predybaylo, E., Stenchikov, G. L., Wittenberg, A. T. & Zeng, F. Impacts of a Pinatubo-size volcanic eruption on ENSO. J. Geophys. Res. Atmos. 122, 925–947 (2017).
Coifman, R. & Hirn, M. Bistochastic kernels via asymmetric affinity functions. Appl. Comput. Harmon. Anal. 35, 177–180 (2013).
Giannakis, D. Dynamics-adapted cone kernels. SIAM J. Appl. Dyn. Syst. 14, 556–608 (2015).
Acknowledgements
This research was initiated during a 2-week visit of D.G. to UNSW in 2018, supported by G.F.’s Future Fellowship, and further developed during a 5-week visit by G.F. to NYU in 2019, supported by the UNSW Faculty of Science’s and School of Mathematics and Statistics’ Special Studies Program, and an ARC Discovery Project. G.F. also thanks the Courant Institute for hospitality during this visit. D.G. received support from NSF grants 1842538 and DMS 1854383 and ONR YIP grant N00014-16-1-2649. B.R.L. and M.P. received support from NSF grant 1842543. J.S. received support from NSF grant 1842538. J.S. also acknowledges support from the core funding of the Helsinki Institute for Information Technology (HIIT) and the Institute for Basic Sciences (IBS), Republic of Korea, under IBS-R028-D1.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by all authors. The first draft of the manuscript was written by G.F., D.G., and B.L., and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Additional information
Peer review information: Nature Communications thanks Valerio Lucarini and the other anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Froyland, G., Giannakis, D., Lintner, B.R. et al. Spectral analysis of climate dynamics with operator-theoretic approaches. Nat Commun 12, 6570 (2021). https://doi.org/10.1038/s41467-021-26357-x