Introduction

Dynamical systems evolve according to the laws of physics, which can usually be described using differential equations. By solving these differential equations, it is possible to predict the future states of a dynamical system. Identifying accurate and efficient dynamical models based on observed trajectories is thus critical for the analysis, simulation and control of dynamical systems. We consider here the problem of learning dynamics: given a dataset of trajectories followed by a dynamical system, we wish to infer the dynamical law responsible for these trajectories and then possibly use that law to predict the evolution of similar systems in different initial states. We are particularly interested in the surrogate modeling problem: the underlying dynamical system is known, but traditional simulations are either too slow or too expensive for some optimization task. This problem can be addressed by learning a surrogate for the simulations that is less expensive, though less accurate.

Models obtained from first principles are extensively used across science and engineering. Unfortunately, due to incomplete knowledge, these models based on physical laws tend to over-simplify or incorrectly describe the underlying structure of the dynamical systems, and usually lead to high bias and modeling errors that cannot be corrected by optimizing over the few parameters in the models.

Deep learning architectures can provide very expressive models for function approximation, and have proven very effective in numerous contexts1,2,3. Unfortunately, standard non-structure-preserving neural networks struggle to learn the symmetries and conservation laws underlying dynamical systems: they tend to favor representations of the dynamics in which these symmetries and conservation laws are not exactly enforced. As a result, they generalize poorly and are often incapable of producing physically plausible results when applied to new, unseen states. Deep learning models capable of learning and generalizing dynamics effectively are typically over-parameterized, and as a consequence tend to have high variance and can be very difficult to interpret4. Training these models also usually requires large datasets and long computational times, which makes them prohibitively expensive for many applications.

A recent research direction is to consider a hybrid approach which combines knowledge of physical laws and deep learning architectures2,3,5,6. The idea is to encode physical laws and the conservation of geometric properties of the underlying systems in the design of the neural networks or in the learning process. Available prior physical knowledge can be used to construct physics-constrained neural networks with improved design and efficiency and better generalization capacity, which take advantage of the function approximation power of neural networks to deal with incomplete knowledge.

In this paper, we will consider the problem of learning dynamics for highly-oscillatory Hamiltonian systems. Examples include the Klein–Gordon equation in the weakly-relativistic regime, charged particles moving through a strong magnetic field, and the rotating inviscid Euler equations in quasi-geostrophic scaling7. More generally, any Hamiltonian system may be embedded as a normally-stable elliptic slow manifold in a nearly-periodic Hamiltonian system8. Highly-oscillatory Hamiltonian systems exhibit two basic structural properties whose interactions play a crucial role in their long-term dynamics. The first is preservation of the symplectic form, as for all Hamiltonian systems. The second is timescale separation, corresponding to the relatively short timescale of oscillations compared with slower secular drifts. The coexistence of these two structural properties implies the existence of an adiabatic invariant8,9,10,11. Adiabatic invariants differ from true constants of motion, such as energy, which do not change at all over arbitrary time intervals; instead, adiabatic invariants are conserved with limited precision over very long time intervals. There are no learning frameworks available today that exactly preserve the two structural properties whose interplay gives rise to adiabatic invariants. This work addresses this challenge by exploiting a recently-developed theory of nearly-periodic symplectic maps11, which can be thought of as discrete-time analogues of highly-oscillatory Hamiltonian systems9.

A symplectic mapping enjoys a number of special properties. In particular, symplectic mappings are closely related to Hamiltonian systems: the flow of any Hamiltonian system is symplectic12, and any symplectic flow corresponds locally to an appropriate Hamiltonian system13. It is well-known that preserving the symplecticity of a Hamiltonian system when constructing a discrete approximation of its flow map ensures the preservation of many qualitative features of the dynamics, such as near-conservation of energy, and leads to physically well-behaved discrete solutions over exponentially-long time intervals13,14,15,16,17. It is thus important to have structure-preserving neural network architectures which can learn symplectic maps and ensure that the learned surrogate map preserves symplecticity. Many physics-informed and structure-preserving machine learning approaches have recently been proposed to learn Hamiltonian dynamics and symplectic maps2,3,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35. In particular, Hénon Neural Networks (HénonNets)2 can approximate any symplectic map arbitrarily well via compositions of simple yet expressive elementary symplectic mappings called Hénon-like mappings. In the numerical experiments conducted in this paper, HénonNets2 will be our preferred symplectic map approximator and the building block of our framework for approximating nearly-periodic symplectic maps, although some of the other approaches listed above for approximating symplectic mappings can be used within our framework as well.

As shown by Kruskal9, every nearly-periodic system, Hamiltonian or not, admits an approximate U(1)-symmetry, determined to leading order by the unperturbed periodic dynamics. It is well-known that a Hamiltonian system which admits a continuous family of symmetries also admits a corresponding conserved quantity. It is thus not surprising that a nearly-periodic Hamiltonian system, which admits an approximate symmetry, must also have an approximate conservation law11, and the approximately conserved quantity is referred to as an adiabatic invariant.

Nearly-periodic maps, first introduced by Burby et al.11, are natural discrete-time analogues of nearly-periodic systems, and have important applications to the numerical integration of nearly-periodic systems. Nearly-periodic maps may also be used as tools for structure-preserving simulation of non-canonical Hamiltonian systems on exact symplectic manifolds11, which have numerous applications across the physical sciences. Non-canonical Hamiltonian systems play an especially important role in modeling weakly-dissipative plasma systems36,37,38,39,40,41,42. As in the continuous-time case, nearly-periodic maps with a Hamiltonian structure (that is, symplectic nearly-periodic maps) admit an approximate symmetry and, as a result, also possess an adiabatic invariant11. The adiabatic invariants that our networks target only arise in purely Hamiltonian systems: just as dissipation breaks the link between symmetries and conservation laws in Hamiltonian systems, it also breaks the link between approximate symmetries and approximate conservation laws. We are not considering systems whose symmetries are broken by dissipation or some other mechanism, but rather systems which possess approximate symmetries. This should be contrasted with other frameworks43,44,45, which develop machine learning techniques for systems that explicitly include dissipation.

We note that neural network architectures designed for multi-scale dynamics and long-time dependencies are available46, and that many authors have introduced numerical algorithms specifically designed to efficiently step over high-frequency oscillations47,48,49. However, the problem of developing surrogate models for dynamical systems that avoid resolving short oscillations remains open. Such surrogates would accelerate optimization algorithms that require querying the dynamics of an oscillatory system during the optimizer’s “inner loop”. The network architecture presented in this article represents a first important step toward a general solution of this problem. Among its advantages, it aims to learn a fast surrogate model that can resolve long-time dynamics using very short time data, and it is guaranteed to enjoy symplectic universal approximation within the class of nearly-periodic maps. As developed in this paper, our method applies to dynamical systems that exhibit a single fast mode of oscillation. Moreover, when initial conditions for the surrogate model are selected on the zero level set of the learned adiabatic invariant, the network automatically integrates along the slow manifold50,51,52,53,54. While our network architecture generalizes in a straightforward manner to handle multiple non-resonant modes, it cannot be applied to dynamical systems that exhibit resonant surfaces.

Note that many of the approaches listed earlier for physics-based or structure-preserving learning of Hamiltonian dynamics focus on learning the vector field associated to the continuous-time Hamiltonian system, while others learn a discrete-time symplectic approximation to the flow map of the Hamiltonian system. In many contexts, we do not need to infer the continuous-time dynamics, and only need a surrogate model which can rapidly generate accurate predictions which remain physically consistent for a long time. Learning a discrete-time approximation to the evolution or flow map, instead of learning the continuous-time vector field, allows for fast prediction and simulation without the need to integrate differential equations or use neural ODEs and adjoint techniques (which can be very expensive and can introduce additional errors due to discretization). In this paper, we will learn nearly-periodic symplectic approximations to the flow maps of nearly-periodic Hamiltonian systems, with the intention of obtaining algorithms which can generate accurate and physically-consistent simulations much faster than traditional integrators.

Outline. We first briefly review some background notions from differential geometry in Sect. “Differential geometry background”. Then, we discuss how symplectic maps can be approximated using HénonNets in Sect. “Approximation of symplectic maps via Hénon neural networks”, before defining nearly-periodic systems and maps and reviewing their important properties in Sect. “Nearly-periodic systems and nearly-periodic maps”. In Sect. “Novel structure-preserving neural network architectures”, we introduce novel neural network architectures, gyroceptrons and symplectic gyroceptrons, to approximate nearly-periodic maps and symplectic nearly-periodic maps, respectively. We then show in Sect. “Numerical confirmation of the existence of adiabatic invariants” that symplectic gyroceptrons admit adiabatic invariants regardless of the values of their weights. Finally, in Sect. “Numerical examples of learning surrogate maps”, we demonstrate how the proposed architecture can be used to learn surrogate maps for the nearly-periodic symplectic flow maps associated with two different systems: a nearly-periodic Hamiltonian system composed of two nonlinearly coupled oscillators (in Sect. “Nonlinearly coupled oscillators”), and the nearly-periodic Hamiltonian system describing the evolution of a charged particle interacting with its self-generated electromagnetic field (in Sect. “Charged particle interacting with its self-generated electromagnetic field”).

Preliminaries

Differential geometry background

In this paper, we reserve the symbol M for a smooth manifold equipped with a smooth auxiliary Riemannian metric g, and \({\mathcal {E}}\) will always denote a vector space for the parameter \(\varepsilon\). We will now briefly introduce some standard concepts from differential geometry that will be used throughout this paper (more details can be found in introductory differential geometry books55,56,57).

A smooth map \(h:M_1\rightarrow M_2\) between smooth manifolds \(M_1,M_2\) is a diffeomorphism if it is bijective with a smooth inverse. We say that \(f_\varepsilon :M_1\rightarrow M_2\), \(\varepsilon \in {\mathcal {E}}\), is a smooth \(\varepsilon\)-dependent mapping when the mapping \(M_1\times {\mathcal {E}}\rightarrow M_2:(m,\varepsilon )\mapsto f_\varepsilon (m)\) is smooth.

A vector field on a manifold M is a map \(X:M \rightarrow TM\) such that \(X(m) \in T_mM\) for all \(m\in M\), where \(T_mM\) denotes the tangent space to M at m and \(TM = \{ (m,v) \, | \, m\in M, v \in T_mM \}\) is the tangent bundle of M. The vector space dual to \(T_m M\) is the cotangent space \(T_m^* M\), and the cotangent bundle of M is \(T^* M = \{ (m,p) \, | \, m\in M, p \in T^*_mM \}\). The integral curve at m of a vector field X is the smooth curve c on M such that \(c(0)=m\) and \(c'(t) = X(c(t))\). The flow of a vector field X is the collection of maps \(\varphi _t:M \rightarrow M\) such that, for each \(m\in M\), the curve \(t\mapsto \varphi _t(m)\) is the integral curve of X at m.

A \({\varvec{k}}\)-form on a manifold M is a map which assigns to every point \(m\in M\) a skew-symmetric k-multilinear map on \(T_mM\). Let \(\alpha\) be a k-form and \(\beta\) be an s-form on a manifold M. Their tensor product \(\alpha \otimes \beta\) at \(m\in M\) is defined via

$$\begin{aligned} (\alpha \otimes \beta )_m (v_1, \ldots , v_{k+s} ) = \alpha _m(v_1, \ldots , v_k) \beta _m (v_{k+1}, \ldots , v_{k+s}). \end{aligned}$$

The alternating operator \(\text {Alt}\) acts on a k-multilinear map \(\alpha\) that is not necessarily skew-symmetric (such as the tensor product above) via

$$\begin{aligned} \text {Alt}(\alpha )(v_1, \ldots , v_k) = \frac{1}{k!} \sum _{\pi \in S_k}{\text {sgn}(\pi ) \alpha (v_{\pi (1)}, \ldots , v_{\pi (k)})}, \end{aligned}$$

where \(S_k\) is the group of all the permutations of \(\{ 1, \ldots , k\}\) and \(\text {sgn}(\pi )\) is the sign of the permutation. The wedge product \(\alpha \wedge \beta\) is then defined via

$$\begin{aligned} \alpha \wedge \beta = \frac{(k+s)!}{k!s!} \text {Alt}(\alpha \otimes \beta ). \end{aligned}$$
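To make these normalizations concrete, here is a minimal Python sketch in which multilinear maps are represented as plain Python functions of vectors. This encoding and the helper names are illustrative only and are not part of the paper. With the convention above, the wedge product of two 1-forms reduces to \(\alpha \wedge \beta (v,w) = \alpha (v)\beta (w) - \alpha (w)\beta (v)\).

```python
import math
from itertools import permutations


def perm_sign(p):
    """Sign of a permutation p (a tuple of indices), computed from its inversion count."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1


def tensor(alpha, k, beta, s):
    """Tensor product of a k-linear map alpha and an s-linear map beta."""
    return lambda *vs: alpha(*vs[:k]) * beta(*vs[k:k + s])


def alt(t, k):
    """Alternating operator Alt applied to a k-linear map t."""
    def alternated(*vs):
        total = sum(perm_sign(p) * t(*(vs[i] for i in p)) for p in permutations(range(k)))
        return total / math.factorial(k)
    return alternated


def wedge(alpha, k, beta, s):
    """Wedge product alpha ^ beta = (k+s)!/(k! s!) Alt(alpha (x) beta)."""
    coeff = math.factorial(k + s) / (math.factorial(k) * math.factorial(s))
    alternated = alt(tensor(alpha, k, beta, s), k + s)
    return lambda *vs: coeff * alternated(*vs)


# Coordinate 1-forms on R^3, acting on vectors given as tuples.
dx = lambda v: v[0]
dy = lambda v: v[1]
dz = lambda v: v[2]

dx_dy = wedge(dx, 1, dy, 1)
print(dx_dy((1, 2, 0), (3, 4, 0)))  # 1*4 - 3*2 = -2.0
dx_dy_dz = wedge(dx_dy, 2, dz, 1)
print(dx_dy_dz((1, 0, 0), (0, 1, 0), (0, 0, 1)))  # volume form evaluated on the standard basis
```

With the factor \((k+s)!/(k!s!)\), the wedge of coordinate forms takes the value 1 on the corresponding standard basis vectors, with no stray factors of \(1/k!\).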

The exterior derivative of a smooth function \(f: M \rightarrow {\mathbb {R}}\) is its differential \({\textbf{d}}f\), and the exterior derivative \({\textbf{d}}\alpha\) of a k-form \(\alpha\) with \(k>0\) is the \((k+1)\)-form defined in local coordinates \((x^1,\ldots ,x^n)\) by

$$\begin{aligned} {\textbf{d}} \left( \sum _{i_1, \ldots , i_k}{ \alpha _{i_1 \ldots i_k} {\textbf{d}}x^{i_1} \wedge \ldots \wedge {\textbf{d}}x^{i_k} }\right) = \sum _{j}{ \sum _{i_1, \ldots , i_k}{ \partial _j \alpha _{i_1 \ldots i_k} {\textbf{d}} x^j \wedge {\textbf{d}}x^{i_1} \wedge \ldots \wedge {\textbf{d}}x^{i_k} } }. \end{aligned}$$

The interior product \(\iota _X \alpha\), where X is a vector field on M and \(\alpha\) is a k-form, is the \((k-1)\)-form defined via

$$\begin{aligned} (\iota _X \alpha )_m (v_2, \ldots , v_k) = \alpha _m(X(m), v_2, \ldots , v_k). \end{aligned}$$

The pull-back \(\psi ^* \alpha\) of a k-form \(\alpha\) on N by a smooth map \(\psi :M \rightarrow N\) is the k-form on M defined by

$$\begin{aligned} (\psi ^* \alpha )_m(v_1,\ldots ,v_k) = \alpha _{\psi (m) } ({\textbf{d}}\psi \cdot v_1, \ldots , {\textbf{d}}\psi \cdot v_k ). \end{aligned}$$

The Lie derivative \({\mathcal {L}}_{X} \alpha\) of the k-form \(\alpha\) along a vector field X with flow \(\varphi _t\) is \({\mathcal {L}}_{X} \alpha = \frac{d}{dt} \Big |_{t=0} \varphi _t^* \alpha\), and for a smooth function \(f : M \rightarrow {\mathbb {R}}\), \({\mathcal {L}}_{X} f\) is the directional derivative \({\mathcal {L}}_{X} f = {\textbf{d}}f \cdot X\).

The circle group U(1), also known as the first unitary group, is the one-dimensional Lie group of complex numbers of unit modulus under multiplication. It can be parametrized via \(e^{i\theta }\) for \(\theta \in [0,2\pi )\), and is isomorphic to the special orthogonal group \(\text {SO}(2)\) of rotations in the plane. A circle action on a manifold M is a one-parameter family of smooth diffeomorphisms \(\Phi _\theta : M \rightarrow M\) that satisfies the following three properties for any \(\theta ,\theta _1,\theta _2 \in U(1) \cong {\mathbb {R}} \text { mod } 2\pi\):

$$\begin{aligned} \Phi _{\theta + 2\pi } = \Phi _\theta \quad \text {(periodicity),} \qquad \Phi _{0} = \text {Id}_M \quad \text {(identity),} \qquad \Phi _{\theta _1 + \theta _2} = \Phi _{\theta _1} \ \circ \ \Phi _{\theta _2} \quad \text {(additivity).} \end{aligned}$$

The infinitesimal generator of a circle action \(\Phi _\theta\) on M is the vector field on M defined by \(m \mapsto \frac{d}{d\theta } \Big |_{\theta =0} \Phi _{\theta } (m)\).
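A concrete instance of these definitions is the standard action of \(U(1) \cong \text {SO}(2)\) on \({\mathbb {R}}^2\) by counterclockwise rotations. The sketch below uses illustrative function names. It checks the three circle-action properties numerically and approximates the infinitesimal generator by central differences. For this action, the generator is the vector field \((x,y)\mapsto (-y,x)\), which vanishes at the fixed point at the origin.

```python
import math


def rotate(theta, m):
    """Circle action Phi_theta on R^2: counterclockwise rotation of the point m by theta."""
    x, y = m
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)


def generator(m, h=1e-6):
    """Infinitesimal generator d/dtheta|_{theta=0} Phi_theta(m), via central differences."""
    plus, minus = rotate(h, m), rotate(-h, m)
    return tuple((a - b) / (2 * h) for a, b in zip(plus, minus))


def close(p, q, tol=1e-9):
    return all(abs(a - b) < tol for a, b in zip(p, q))


m, t1, t2 = (1.5, -0.5), 0.7, 2.1
assert close(rotate(t1 + 2 * math.pi, m), rotate(t1, m))    # periodicity
assert close(rotate(0.0, m), m)                              # identity
assert close(rotate(t1 + t2, m), rotate(t1, rotate(t2, m)))  # additivity
print(generator(m))  # close to (-y, x) = (0.5, 1.5)
```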

Approximation of symplectic maps via Hénon neural networks

Let \(U\subset {\mathbb {R}}^{n}\times {\mathbb {R}}^n={\mathbb {R}}^{2n}\) be an open set in an even-dimensional Euclidean space. Denote points in \({\mathbb {R}}^n\times {\mathbb {R}}^n\) using the notation (x, y), with \(x,y\in {\mathbb {R}}^n\). A smooth mapping \(\Phi :U\rightarrow {\mathbb {R}}^{2n}\) with components \(\Phi (x,y) = ({\bar{x}}(x,y),{\bar{y}}(x,y))\) is symplectic if

$$\begin{aligned} \sum _{i=1}^n {\textbf{d}}x^i\wedge {\textbf{d}}y^i = \sum _{i=1}^n {\textbf{d}}{\bar{x}}^i\wedge {\textbf{d}}{\bar{y}}^i. \end{aligned}$$
(2.1)

The symplectic condition (2.1) implies that the mapping \(\Phi\) has a number of special properties. In particular, there is a close relation between Hamiltonian systems and symplecticity of flows: Poincaré’s Theorem12 states that the flow of any Hamiltonian system is symplectic, and it can also be shown that any symplectic flow corresponds locally to an appropriate Hamiltonian system. Preserving the symplecticity of a Hamiltonian system when constructing a discrete approximation of its flow map ensures the preservation of many qualitative features of the dynamics, such as near-conservation of energy, and leads to physically well-behaved discrete solutions13,14,15,16,17. It is thus important to have structure-preserving network architectures which can learn symplectic maps.
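Condition (2.1) is equivalent to \(J^\top \Omega J = \Omega\), where J is the Jacobian of \(\Phi\) and \(\Omega = \begin{pmatrix} 0 & I\\ -I & 0\end{pmatrix}\) in (x, y) coordinates. The sketch below checks this condition numerically using a finite-difference Jacobian. As an illustration that does not come from the paper, it compares two classical one-step discretizations of the pendulum with position x and momentum y. The symplectic Euler method satisfies (2.1) exactly. The explicit Euler method does not, because its Jacobian determinant is \(1 + h^2\cos x\).

```python
import math


def jacobian(f, z, h=1e-6):
    """Central-difference Jacobian J[i][j] = d f_i / d z_j of f: R^m -> R^m at z."""
    m = len(z)
    J = [[0.0] * m for _ in range(m)]
    for j in range(m):
        zp, zm = list(z), list(z)
        zp[j] += h
        zm[j] -= h
        fp, fm = f(zp), f(zm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J


def symplectic_defect(f, z):
    """max |J^T Omega J - Omega| with Omega = [[0, I], [-I, 0]]; zero iff (2.1) holds at z."""
    m, n = len(z), len(z) // 2
    J = jacobian(f, z)
    omega = [[1.0 if j == i + n else -1.0 if i == j + n else 0.0 for j in range(m)] for i in range(m)]
    omega_J = [[sum(omega[i][k] * J[k][j] for k in range(m)) for j in range(m)] for i in range(m)]
    pulled = [[sum(J[k][i] * omega_J[k][j] for k in range(m)) for j in range(m)] for i in range(m)]
    return max(abs(pulled[i][j] - omega[i][j]) for i in range(m) for j in range(m))


# Pendulum H(x, y) = y^2/2 - cos(x), with an illustrative step size.
step = 0.1


def explicit_euler(z):
    x, y = z
    return [x + step * y, y - step * math.sin(x)]


def symplectic_euler(z):
    x, y = z
    y_new = y - step * math.sin(x)
    return [x + step * y_new, y_new]


z0 = [0.3, 0.8]
print(symplectic_defect(symplectic_euler, z0))  # tiny: only finite-difference error remains
print(symplectic_defect(explicit_euler, z0))    # about step**2 * cos(0.3), roughly 0.0096
```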

The space of all symplectic maps is infinite-dimensional58, so the problem of approximating an arbitrary symplectic map using compositions of simpler symplectic mappings is inherently interesting. Turaev59 showed that every symplectic map may be approximated arbitrarily well by compositions of Hénon-like maps, which are special elementary symplectic maps.

Definition 2.1

Let \(V:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) be a smooth function on \({\mathbb {R}}^n\) and let \(\eta \in {\mathbb {R}}^n\) be a constant. We define the Hénon-like map \(H[V,\eta ]:{\mathbb {R}}^n\times {\mathbb {R}}^n\rightarrow {\mathbb {R}}^n\times {\mathbb {R}}^n\) with potential V and shift \(\eta\) via

$$\begin{aligned} H[V,\eta ]\begin{pmatrix} x\\ y \end{pmatrix} = \begin{pmatrix} y + \eta \\ -x +\nabla V(y) \end{pmatrix}. \end{aligned}$$
(2.2)

Theorem 2.1

(Turaev59) Let \(\Phi :U\rightarrow {\mathbb {R}}^n\times {\mathbb {R}}^n\) be a \(C^{r+1}\) symplectic mapping. For each compact set \(C\subset U\) and \(\delta >0\) there is a smooth function \(V:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\), a constant \(\eta\), and a positive integer N such that \(H[V,\eta ]^{4N}\) approximates the mapping \(\Phi\) within \(\delta\) on C in the \(C^r\) topology.

Remark 2.1

The significance of the number 4 in this theorem follows from the fact that the fourth iterate of the Hénon-like map with trivial potential \(V=0\) is the identity map: \(H[0,\eta ]^4 = \text {Id}_{{\mathbb {R}}^{n} \times {\mathbb {R}}^n}\).
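The identity \(H[0,\eta ]^4 = \text {Id}\) can be verified by hand, since the successive iterates are \((x,y)\mapsto (y+\eta ,-x)\mapsto (-x+\eta ,-y-\eta )\mapsto (-y,x-\eta )\mapsto (x,y)\). The following minimal Python sketch, whose names are illustrative, implements (2.2) for n = 2. It confirms numerically that the fourth iterate is the identity for V = 0 but not for a generic potential. Here the potential is the arbitrarily chosen \(V(y) = \sum _i \cos y_i\).

```python
import math


def henon(grad_V, eta):
    """Hénon-like map H[V, eta](x, y) = (y + eta, -x + grad V(y)), as in (2.2)."""
    def H(x, y):
        g = grad_V(y)
        return [yi + ei for yi, ei in zip(y, eta)], [-xi + gi for xi, gi in zip(x, g)]
    return H


def iterate(H, k, x, y):
    for _ in range(k):
        x, y = H(x, y)
    return x, y


def distance(x, y, x0, y0):
    return max(abs(a - b) for a, b in zip(x + y, x0 + y0))


eta = [0.3, -1.2]
x0, y0 = [0.5, 2.0], [-1.0, 0.25]

trivial = henon(lambda y: [0.0] * len(y), eta)                # V = 0
nontrivial = henon(lambda y: [-math.sin(v) for v in y], eta)  # V(y) = sum_i cos(y_i)

print(distance(*iterate(trivial, 4, x0, y0), x0, y0))     # rounding error only
print(distance(*iterate(nontrivial, 4, x0, y0), x0, y0))  # O(1): not the identity
```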

Turaev’s result suggests a specific neural network architecture to approximate symplectic mappings using Hénon-like maps2. We review the construction of HénonNets2, starting with the notion of a Hénon layer.

Definition 2.2

Let \(\eta \in {\mathbb {R}}^n\) be a constant vector, and let V be a scalar feed-forward neural network on \({\mathbb {R}}^n\), that is, a smooth mapping \(V:{\mathcal {W}}\times {\mathbb {R}}^n\rightarrow {\mathbb {R}}\), where \({\mathcal {W}}\) is a space of neural network weights. The Hénon layer with potential V, shift \(\eta\), and weight W is the iterated Hénon-like map

$$\begin{aligned} L[V[W],\eta ] = H[V[W],\eta ]^4, \end{aligned}$$
(2.3)

where we use the notation V[W] to denote the mapping \(V[W](y) = V(W,y),\) for any \(y\in {\mathbb {R}}^n, \text { } W\in {\mathcal {W}}.\)

There are various network architectures for the potential V[W] that are capable of approximating any smooth function \(V:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) with any desired level of accuracy. For example, a fully-connected neural network with a single hidden layer of sufficient width and a smooth, non-polynomial activation function (such as tanh) can approximate any smooth function, together with its derivatives, uniformly on compact sets. Therefore a corollary of Theorem 2.1 is that any symplectic map may be approximated arbitrarily well by the composition of sufficiently many Hénon layers with various potentials and shifts. This leads to the notion of a Hénon Neural Network.

Definition 2.3

Let N be a positive integer and

  • \(\varvec{V} = \{V_k\}_{k\in \{1,\dots , N\}}\) be a family of scalar feed-forward neural networks on \({\mathbb {R}}^n\)

  • \(\varvec{W} = \{W_k\}_{k\in \{1,\dots ,N\}}\) be a family of network weights for \(\varvec{V}\)

  • \(\varvec{\eta } = \{\eta _k\}_{k\in \{1,\dots ,N\}}\) be a family of constants in \({\mathbb {R}}^n\)

The Hénon neural network (HénonNet) with layer potentials \(\varvec{V}\), layer weights \(\varvec{W}\), and layer shifts \(\varvec{\eta }\) is the mapping

$$\begin{aligned} {\mathcal {H}}[\varvec{V}[\varvec{W}],\varvec{\eta }]&= L[V_N[W_N],\eta _N] \ \ \circ \ \ \dots \ \ \circ \ \ L[V_2[W_2],\eta _2] \ \ \circ \ \ L[V_1[W_1],\eta _1] && \text {(2.4)}\\ &= H[V_N[W_N],\eta _N]^4 \ \ \circ \ \ \dots \ \ \circ \ \ H[V_2[W_2],\eta _2]^4 \ \ \circ \ \ H[V_1[W_1],\eta _1]^4. && \text {(2.5)} \end{aligned}$$

A composition of symplectic mappings is also symplectic, so every HénonNet is a symplectic mapping, regardless of the architectures for the networks \(V_k\) and of the weights \(W_k\). Furthermore, Turaev’s Theorem 2.1 implies that the family of HénonNets is sufficiently expressive to approximate any symplectic mapping:
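Here is a minimal plain-Python sketch of Definitions 2.2 and 2.3. The potential architecture is an assumption made only for illustration: one hidden layer of tanh units with randomly drawn, untrained weights. The widths, the seed, and all helper names are also illustrative. The sketch shows only that the resulting map passes the finite-difference test of (2.1) for whatever weights are drawn.

```python
import math
import random


def make_grad_potential(n, width, rng):
    """Gradient of a hypothetical one-hidden-layer potential V(y) = sum_k a_k tanh(w_k . y + b_k)."""
    w = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(width)]
    b = [rng.gauss(0.0, 1.0) for _ in range(width)]
    a = [rng.uniform(-0.1, 0.1) for _ in range(width)]

    def grad_V(y):
        g = [0.0] * n
        for wk, bk, ak in zip(w, b, a):
            sech2 = 1.0 - math.tanh(sum(wi * yi for wi, yi in zip(wk, y)) + bk) ** 2
            for i in range(n):
                g[i] += ak * sech2 * wk[i]
        return g

    return grad_V


def henon_layer(grad_V, eta):
    """Hénon layer L[V, eta] = H[V, eta]^4, with H(x, y) = (y + eta, -x + grad V(y))."""
    def layer(x, y):
        for _ in range(4):
            x, y = [yi + ei for yi, ei in zip(y, eta)], [-xi + gi for xi, gi in zip(x, grad_V(y))]
        return x, y

    return layer


def henon_net(layers):
    """HénonNet: apply layers[0] first and layers[-1] last to z = (x, y)."""
    def net(z):
        n = len(z) // 2
        x, y = list(z[:n]), list(z[n:])
        for layer in layers:
            x, y = layer(x, y)
        return x + y

    return net


def jacobian(f, z, h=1e-6):
    """Central-difference Jacobian J[i][j] = d f_i / d z_j."""
    m = len(z)
    J = [[0.0] * m for _ in range(m)]
    for j in range(m):
        zp, zm = list(z), list(z)
        zp[j] += h
        zm[j] -= h
        fp, fm = f(zp), f(zm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J


def symplectic_defect(f, z):
    """max |J^T Omega J - Omega| with Omega = [[0, I], [-I, 0]]."""
    m, n = len(z), len(z) // 2
    J = jacobian(f, z)
    omega = [[1.0 if j == i + n else -1.0 if i == j + n else 0.0 for j in range(m)] for i in range(m)]
    omega_J = [[sum(omega[i][k] * J[k][j] for k in range(m)) for j in range(m)] for i in range(m)]
    pulled = [[sum(J[k][i] * omega_J[k][j] for k in range(m)) for j in range(m)] for i in range(m)]
    return max(abs(pulled[i][j] - omega[i][j]) for i in range(m) for j in range(m))


rng = random.Random(0)
n = 2
layers = [henon_layer(make_grad_potential(n, 8, rng), [rng.gauss(0.0, 1.0) for _ in range(n)])
          for _ in range(3)]
net = henon_net(layers)
z0 = [0.4, -0.3, 1.1, 0.2]
print(symplectic_defect(net, z0))  # small: limited only by finite-difference error
```

Training would adjust the weights of each potential. Symplecticity holds for every choice of weights, so it never has to be enforced through the loss function.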

Lemma 2.1

Let \(\Phi :U\rightarrow {\mathbb {R}}^n\times {\mathbb {R}}^n\) be a \(C^{r+1}\) symplectic mapping. For each compact set \(C\subset U\) and \(\delta >0\) there is a HénonNet \({\mathcal {H}}\) that approximates \(\Phi\) within \(\delta\) on C in the \(C^r\) topology.

Remark 2.2

Note that Hénon-like maps are easily invertible,

$$\begin{aligned} H[V,\eta ]\begin{pmatrix} x\\ y \end{pmatrix} = \begin{pmatrix} y + \eta \\ -x +\nabla V(y) \end{pmatrix} \quad \Rightarrow \quad H^{-1}[V,\eta ]\begin{pmatrix} x\\ y \end{pmatrix} = \begin{pmatrix} \nabla V(x-\eta ) - y \\ x - \eta \end{pmatrix} , \end{aligned}$$
(2.6)

so we can also easily invert Hénon networks by composing inverses of Hénon-like maps.
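Formula (2.6) follows by solving \({\bar{x}} = y + \eta\), \({\bar{y}} = -x + \nabla V(y)\) for (x, y). A HénonNet is a composition \(L_N\circ \dots \circ L_1\) with \(L_k = H_k^4\), so its inverse applies \((H_k^{-1})^4\) in reverse layer order. The sketch below checks the round trip numerically. The two potentials and the helper names are illustrative choices.

```python
import math


def henon(grad_V, eta):
    """Hénon-like map (2.2)."""
    def H(x, y):
        return [yi + ei for yi, ei in zip(y, eta)], [-xi + gi for xi, gi in zip(x, grad_V(y))]
    return H


def henon_inverse(grad_V, eta):
    """Inverse map (2.6): (x, y) -> (grad V(x - eta) - y, x - eta)."""
    def H_inv(x, y):
        s = [xi - ei for xi, ei in zip(x, eta)]
        return [gi - yi for gi, yi in zip(grad_V(s), y)], s
    return H_inv


def apply_net(maps, x, y):
    """Apply each map four times, in list order (one Hénon layer per map)."""
    for H in maps:
        for _ in range(4):
            x, y = H(x, y)
    return x, y


# Illustrative layers: V_1(y) = sum log cosh(y_i) and V_2(y) = sum sin(y_i)^2 / 2.
params = [
    (lambda y: [math.tanh(v) for v in y], [0.2, -0.5]),
    (lambda y: [0.5 * math.sin(2 * v) for v in y], [1.0, 0.3]),
]
forward = [henon(g, e) for g, e in params]
backward = [henon_inverse(g, e) for g, e in reversed(params)]

x0, y0 = [0.7, -1.4], [0.1, 2.2]
x1, y1 = apply_net(forward, x0, y0)
x2, y2 = apply_net(backward, x1, y1)
print(max(abs(a - b) for a, b in zip(x2 + y2, x0 + y0)))  # round-trip error at rounding level
```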

We also introduce here modified versions of Hénon-like maps and HénonNets to approximate symplectic maps possessing a near-identity property:

Definition 2.4

Let \(V:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) be a smooth function and let \(\eta \in {\mathbb {R}}^n\) be a constant. We define the near-identity Hénon-like map \(H_\varepsilon [V,\eta ]:{\mathbb {R}}^n\times {\mathbb {R}}^n\rightarrow {\mathbb {R}}^n\times {\mathbb {R}}^n\) with potential V and shift \(\eta\) via

$$\begin{aligned} H_\varepsilon [V,\eta ]\begin{pmatrix} x\\ y \end{pmatrix} = \begin{pmatrix} y + \eta \\ -x + \varepsilon \nabla V(y) \end{pmatrix}. \end{aligned}$$
(2.7)

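Near-identity Hénon-like maps satisfy the near-identity property \(H_0[V,\eta ]^4 = \text {Id}_{{\mathbb {R}}^n\times {\mathbb {R}}^n}\), since \(H_0[V,\eta ] = H[0,\eta ]\) and the fourth iterate of \(H[0,\eta ]\) is the identity by Remark 2.1.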

Definition 2.5

Let N be a positive integer and

  • \(\varvec{V} = \{V_k\}_{k\in \{1,\dots , N\}}\) be a family of scalar feed-forward neural networks on \({\mathbb {R}}^n\)

  • \(\varvec{W} = \{W_k\}_{k\in \{1,\dots ,N\}}\) be a family of network weights for \(\varvec{V}\)

  • \(\varvec{\eta } = \{\eta _k\}_{k\in \{1,\dots ,N\}}\) be a family of constants in \({\mathbb {R}}^n\)

The near-identity Hénon network with layer potentials \(\varvec{V}\), layer weights \(\varvec{W}\), and layer shifts \(\varvec{\eta }\) is the mapping defined via

$$\begin{aligned} {\mathcal {H}}_\varepsilon [\varvec{V}[\varvec{W}],\varvec{\eta }] \ = \ H_\varepsilon [V_N[W_N],\eta _N]^4 \ \circ \ \ldots \ \circ \ H_\varepsilon [V_2[W_2],\eta _2]^4 \ \circ \ H_\varepsilon [V_1[W_1],\eta _1]^4, \end{aligned}$$
(2.8)

and it satisfies the near-identity property \({\mathcal {H}}_0[\varvec{V}[\varvec{W}],\varvec{\eta }] = \text {Id}_{{\mathbb {R}}^n\times {\mathbb {R}}^n}\).
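The near-identity property of the network follows layer by layer: at \(\varepsilon = 0\) each layer \(H_0[V_k,\eta _k]^4\) is the identity, so their composition is too. The sketch below checks this and also shows numerically that, at a fixed input, the deviation from the identity grows linearly in \(\varepsilon\) for small \(\varepsilon\). The potentials, shifts, and helper names are illustrative.

```python
import math


def near_identity_henon(grad_V, eta, eps):
    """Near-identity Hénon-like map (2.7): (x, y) -> (y + eta, -x + eps * grad V(y))."""
    def H(x, y):
        return [yi + ei for yi, ei in zip(y, eta)], [-xi + eps * gi for xi, gi in zip(x, grad_V(y))]
    return H


def near_identity_net(params, eps, x, y):
    """Near-identity Hénon network (2.8): each layer is H_eps[V_k, eta_k]^4, layer 1 applied first."""
    for grad_V, eta in params:
        H = near_identity_henon(grad_V, eta, eps)
        for _ in range(4):
            x, y = H(x, y)
    return x, y


# Illustrative potentials V_1(y) = -sum cos(y_i) and V_2(y) = sum y_i^4 / 4, with arbitrary shifts.
params = [
    (lambda y: [math.sin(v) for v in y], [0.3, -0.8]),
    (lambda y: [v ** 3 for v in y], [-0.4, 0.6]),
]
x0, y0 = [0.5, 1.2], [-1.0, 0.4]


def deviation(eps):
    x, y = near_identity_net(params, eps, x0, y0)
    return max(abs(a - b) for a, b in zip(x + y, x0 + y0))


print(deviation(0.0))                     # rounding error only: the eps = 0 network is the identity
print(deviation(2e-3) / deviation(1e-3))  # close to 2: the deviation is first order in eps
```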

Nearly-periodic systems and nearly-periodic maps

Nearly-periodic systems

Intuitively, a continuous-time dynamical system with parameter \(\varepsilon\) is nearly-periodic if all of its trajectories are periodic with nowhere-vanishing angular frequency in the limit \(\varepsilon \rightarrow 0\). Such a system characteristically displays limiting short-timescale dynamics that ergodically cover circles in phase space. More precisely, a nearly-periodic system can be defined as follows:

Definition 2.6

[Burby et al.11] A nearly-periodic system on a manifold M is a smooth \(\varepsilon\)-dependent vector field \(X_\varepsilon\) on M such that \(X_0 = \omega _0 R_0\), where

  • \(R_0\) is the infinitesimal generator for a circle action \(\Phi _\theta :M\rightarrow M\), \(\theta \in U(1)\).

  • \(\omega _0:M\rightarrow {\mathbb {R}}\) is strictly positive and its Lie derivative satisfies \({\mathcal {L}}_{R_0}\omega _0 = 0\).

The vector field \(R_0\) is called the limiting roto-rate, and \(\omega _0\) is the limiting angular frequency.

Examples from physics include charged particle dynamics in a strong magnetic field, the weakly-relativistic Dirac equation, and any mechanical system subject to a high-frequency, time-periodic force. In the broader context of multi-scale dynamical systems, nearly-periodic systems play a special role because they display perhaps the simplest possible non-dissipative short-timescale dynamics. They therefore provide a useful proving ground for analytical and numerical methods aimed at more complex multi-scale models.

Remark 2.3

In a paper9 on basic properties of continuous-time nearly-periodic systems, Kruskal assumed that \(R_0\) is nowhere vanishing, in addition to requiring that \(\omega _0\) is sign-definite. The former assumption is usually not essential; it is enough to require that \(\omega _0\) vanish nowhere. Lifting this restriction is important, since many interesting circle actions have fixed points.

It can be shown that every nearly-periodic system admits an approximate U(1)-symmetry9, determined to leading order by the unperturbed periodic dynamics, whose infinitesimal generator is known as the roto-rate:

Definition 2.7

A roto-rate for a nearly-periodic system \(X_\varepsilon\) on a manifold M is a formal power series \(R_\varepsilon = R_0 + \varepsilon \,R_1 + \varepsilon ^2\,R_2 + \dots\) with vector field coefficients such that \(R_0\) is equal to the limiting roto-rate and the following equalities hold in the sense of formal series:

$$\begin{aligned} \exp (2\pi {\mathcal {L}}_{R_\varepsilon }) = 1 \qquad \text {and} \qquad [X_\varepsilon ,R_\varepsilon ] = 0. \end{aligned}$$

Proposition 2.1

(Kruskal9) Every nearly-periodic system admits a unique roto-rate \(R_\varepsilon\).

A subtle argument allows one to upgrade leading-order U(1)-invariance to all-orders U(1)-invariance for integral invariants:

Proposition 2.2

(Burby et al.11) Let \(\alpha _\varepsilon\) be a smooth \(\varepsilon\)-dependent differential form on a manifold M. Suppose \(\alpha _\varepsilon\) is an absolute integral invariant for a smooth nearly-periodic system \(X_\varepsilon\) on M. If \({\mathcal {L}}_{R_0}\alpha _0 = 0\) then \({\mathcal {L}}_{R_\varepsilon }\alpha _\varepsilon = 0\), where \(R_\varepsilon\) is the roto-rate for \(X_\varepsilon\).

Nearly-periodic maps

Nearly-periodic maps, first introduced by Burby et al.11, are natural discrete-time analogues of nearly-periodic systems. The following provides a precise definition.

Definition 2.8

A nearly-periodic map on a manifold M with parameter vector space \({\mathcal {E}}\) is a smooth mapping \(F:M\times {\mathcal {E}}\rightarrow M\) such that \(F_\varepsilon :M\rightarrow M:m\mapsto F(m,\varepsilon )\) has the following properties:

  • \(F_\varepsilon\) is a diffeomorphism for each \(\varepsilon \in {\mathcal {E}}\).

  • There exists a U(1)-action \(\Phi _\theta :M\rightarrow M\) and a constant \(\theta _0\in U(1)\) such that \(F_0 = \Phi _{\theta _0}\).

We say F is resonant if \(\theta _0\) is a rational multiple of \(2\pi\); otherwise F is non-resonant. The infinitesimal generator \(R_0\) of \(\Phi _\theta\) is called the limiting roto-rate.

Example 2.1

Let \(X_\varepsilon\) be a nearly-periodic system on a manifold M with limiting roto-rate \(R_0\) and limiting angular frequency \(\omega _0\). Assume that \(\omega _0\) is constant. For each \(\varepsilon \in {\mathbb {R}}\) let \({\mathcal {F}}_t^\varepsilon\) denote the time-t flow for \(X_\varepsilon\). The mapping \(F(m,\varepsilon ) = {\mathcal {F}}_{t_0}^\varepsilon (m)\) is nearly-periodic for each \(t_0\). To see why, first note that the flow of the limiting vector field \(X_0 = \omega _0 R_0\) is given by \({\mathcal {F}}_t^0(m) = \Phi _{\omega _0 t}(m)\), where \(\Phi _\theta\) denotes the U(1)-action generated by \(R_0\). It follows that \(F(m,0) = \Phi _{\omega _0 t_0}(m) = \Phi _{\theta _0}(m)\), where \(\theta _0 = \omega _0 t_0 \bmod 2\pi\). This example is more general than it first appears, since any nearly-periodic system can be rescaled to have a constant limiting angular frequency. Indeed, if the nearly-periodic system \(X_\varepsilon\) has non-constant limiting angular frequency \(\omega _0\), then \(X^\prime _\varepsilon = X_\varepsilon \, / \,\omega _0\) is a nearly-periodic system with limiting angular frequency 1. The integral curves of \(X^\prime _\varepsilon\) are merely time reparameterizations of integral curves of \(X_\varepsilon\).
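Example 2.1 can be checked numerically on a simple case. Purely for illustration, the sketch below assumes the nearly-periodic system \(X_\varepsilon (x,y) = \omega _0(-y,x) + \varepsilon (0,-x^3)\) on \({\mathbb {R}}^2\). In this system, \(R_0 = (-y,x)\) generates counterclockwise rotations and \(\omega _0\) is constant. The time-\(t_0\) flow is approximated with the classical fourth-order Runge-Kutta method. At \(\varepsilon = 0\), the result should match the rotation \(\Phi _{\theta _0}\) with \(\theta _0 = \omega _0 t_0 \bmod 2\pi\).

```python
import math


def vector_field(eps, omega0):
    """Illustrative nearly-periodic system X_eps = omega0 * R_0 + eps * (0, -x^3), R_0 = (-y, x)."""
    def X(z):
        x, y = z
        return (-omega0 * y, omega0 * x - eps * x ** 3)
    return X


def rk4_flow(X, z, t, steps=2000):
    """Approximate the time-t flow of X from z with the classical RK4 scheme."""
    h = t / steps
    for _ in range(steps):
        k1 = X(z)
        k2 = X(tuple(zi + 0.5 * h * ki for zi, ki in zip(z, k1)))
        k3 = X(tuple(zi + 0.5 * h * ki for zi, ki in zip(z, k2)))
        k4 = X(tuple(zi + h * ki for zi, ki in zip(z, k3)))
        z = tuple(zi + h / 6 * (a + 2 * b + 2 * c + d) for zi, a, b, c, d in zip(z, k1, k2, k3, k4))
    return z


def rotate(theta, z):
    """U(1)-action Phi_theta generated by R_0: counterclockwise rotation by theta."""
    x, y = z
    return (math.cos(theta) * x - math.sin(theta) * y, math.sin(theta) * x + math.cos(theta) * y)


def gap(p, q):
    return max(abs(a - b) for a, b in zip(p, q))


omega0, t0, z0 = 3.0, 1.7, (0.8, -0.2)
theta0 = (omega0 * t0) % (2 * math.pi)  # theta_0 = omega_0 t_0 mod 2 pi

F0 = rk4_flow(vector_field(0.0, omega0), z0, t0)
F_eps = rk4_flow(vector_field(1e-3, omega0), z0, t0)
print(gap(F0, rotate(theta0, z0)))  # tiny: F_0 = Phi_{theta_0} up to integration error
print(gap(F_eps, F0))               # small: F_eps is an O(eps) perturbation of F_0
```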

Let X be a vector field on a manifold M with time-t flow map \({\mathcal {F}}_t\). A U(1)-action \(\Phi _\theta\) is a U(1)-symmetry for X if \({\mathcal {F}}_t\ \circ \ \Phi _\theta = \Phi _\theta \ \circ \ {\mathcal {F}}_t\), for each \(t\in {\mathbb {R}}\) and \(\theta \in U(1)\). This condition is equivalent to \({\mathcal {F}}_t^*R = R\), where R denotes the infinitesimal generator for the U(1)-action. Differentiating the commutation relation with respect to \(\theta\) at \(\theta = 0\) gives one implication, and the converse holds because \(\Phi _\theta\) is the flow of R. Since we would like to think of nearly-periodic maps as playing the part of a nearly-periodic system’s flow map, the latter characterization of symmetry allows us to naturally extend Kruskal’s notion of roto-rate to our discrete-time setting.
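To illustrate the commutation condition, the sketch below compares two Hamiltonians on \({\mathbb {R}}^2\). These are illustrative choices not taken from the paper, and both use the canonical equations \(\dot x = \partial H/\partial y\), \(\dot y = -\partial H/\partial x\). The first is the rotation-invariant \(H = r^2/2 + r^4/4\) with \(r^2 = x^2+y^2\), whose flow commutes with rotations. The second is the Duffing Hamiltonian \(H = r^2/2 + x^4/4\), whose flow does not. The flows are approximated with RK4, so the comparison holds up to integration error.

```python
import math


def rk4_flow(X, z, t, steps=2000):
    """Approximate the time-t flow of the vector field X starting from z (classical RK4)."""
    h = t / steps
    for _ in range(steps):
        k1 = X(z)
        k2 = X(tuple(zi + 0.5 * h * ki for zi, ki in zip(z, k1)))
        k3 = X(tuple(zi + 0.5 * h * ki for zi, ki in zip(z, k2)))
        k4 = X(tuple(zi + h * ki for zi, ki in zip(z, k3)))
        z = tuple(zi + h / 6 * (a + 2 * b + 2 * c + d) for zi, a, b, c, d in zip(z, k1, k2, k3, k4))
    return z


def rotate(theta, z):
    """U(1)-action Phi_theta: counterclockwise rotation by theta."""
    x, y = z
    return (math.cos(theta) * x - math.sin(theta) * y, math.sin(theta) * x + math.cos(theta) * y)


def symmetric_field(z):
    """Hamiltonian vector field of H = r^2/2 + r^4/4 (rotation invariant)."""
    x, y = z
    r2 = x * x + y * y
    return ((1 + r2) * y, -(1 + r2) * x)


def duffing_field(z):
    """Hamiltonian vector field of H = r^2/2 + x^4/4 (not rotation invariant)."""
    x, y = z
    return (y, -x - x ** 3)


def commutator_gap(X, m, theta, t):
    """Distance between F_t(Phi_theta(m)) and Phi_theta(F_t(m))."""
    a = rk4_flow(X, rotate(theta, m), t)
    b = rotate(theta, rk4_flow(X, m, t))
    return max(abs(u - v) for u, v in zip(a, b))


m, theta, t = (1.0, 0.0), math.pi / 2, 1.3
print(commutator_gap(symmetric_field, m, theta, t))  # tiny: Phi_theta is a U(1)-symmetry
print(commutator_gap(duffing_field, m, theta, t))    # clearly nonzero: the flows do not commute
```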

Definition 2.9

A roto-rate for a nearly-periodic map \(F:M\times {\mathcal {E}}\rightarrow M\) is a formal power series \(R_\varepsilon = R_0 + R_1\varepsilon + R_2\varepsilon ^2+\dots\) whose coefficients are vector fields on M such that \(R_0\) is the limiting roto-rate and the following equalities hold in the sense of formal power series: \(F_\varepsilon ^*R_\varepsilon = R_\varepsilon\) and \(\exp (2\pi {\mathcal {L}}_{R_\varepsilon })=1\).

A first fundamental result concerning nearly-periodic maps establishes the existence and uniqueness of the roto-rate in the non-resonant case. Like the corresponding result in continuous time, this result holds to all orders in perturbation theory.

Theorem 2.2

(Burby et al.11) Each non-resonant nearly-periodic map admits a unique roto-rate.

Thus, non-resonant nearly-periodic maps formally reduce to mappings on the space of U(1)-orbits, corresponding to the elimination of a single dimension in phase space.

Nearly-periodic systems and maps with a Hamiltonian structure

Definition 2.10

An \(\varepsilon\)-dependent presymplectic manifold is a manifold M equipped with a smooth \(\varepsilon\)-dependent 2-form \(\Omega _\varepsilon\) such that \({\textbf{d}}\Omega _\varepsilon = 0\) for each \(\varepsilon \in {\mathcal {E}}\). We say \((M,\Omega _\varepsilon )\) is exact when there is a smooth \(\varepsilon\)-dependent 1-form \(\vartheta _\varepsilon\) such that \(\Omega _\varepsilon = -{\textbf{d}}\vartheta _\varepsilon\).

Definition 2.11

A nearly-periodic Hamiltonian system on an exact presymplectic manifold \((M,\Omega _\varepsilon )\) is a nearly-periodic system \(X_\varepsilon\) on M such that \(\iota _{{X}_\varepsilon }\Omega _\varepsilon = {\textbf{d}}H_\varepsilon\), for some smooth \(\varepsilon\)-dependent function \(H_\varepsilon :M\rightarrow {\mathbb {R}}\).

We already know from Proposition 2.1 that every nearly-periodic system admits a unique roto-rate \(R_\varepsilon\). In the Hamiltonian setting, it can be shown that both the dynamics and the Hamiltonian structure are U(1)-invariant to all orders in \(\varepsilon\).

Proposition 2.3

(Kruskal9, Burby et al.11) The roto-rate \(R_\varepsilon\) for a nearly-periodic Hamiltonian system \(X_\varepsilon\) on an exact presymplectic manifold \((M,\Omega _\varepsilon )\) with Hamiltonian \(H_\varepsilon\) satisfies \({\mathcal {L}}_{R_\varepsilon }H_\varepsilon = 0\), and \({\mathcal {L}}_{R_\varepsilon }\Omega _\varepsilon = 0\) in the sense of formal power series.

According to Noether’s celebrated theorem, a Hamiltonian system that admits a continuous family of symmetries also admits a corresponding conserved quantity57,60,61. Therefore one might expect that a Hamiltonian system with an approximate symmetry must also have an approximate conservation law. This is indeed the case for nearly-periodic Hamiltonian systems:

Proposition 2.4

(Burby et al.11) Let \(X_\varepsilon\) be a nearly-periodic Hamiltonian system on the exact presymplectic manifold \((M,\Omega _\varepsilon )\). Let \(R_\varepsilon\) be the associated roto-rate. There is a formal power series \(\theta _\varepsilon = \theta _0 + \varepsilon \,\theta _1 + \dots\) with coefficients in \(\Omega ^1(M)\) such that \(\Omega _\varepsilon = -{\textbf{d}}\theta _\varepsilon\) and \({\mathcal {L}}_{R_\varepsilon }\theta _\varepsilon = 0\). Moreover, the formal power series \(\mu _\varepsilon = \iota _{R_\varepsilon }\theta _\varepsilon\) is a constant of motion for \(X_\varepsilon\) to all orders in perturbation theory. In other words, \({\mathcal {L}}_{X_\varepsilon }\mu _\varepsilon = 0,\) in the sense of formal power series. The formal constant of motion \(\mu _\varepsilon\) is the adiabatic invariant associated with the nearly-periodic Hamiltonian system.

Note that general expressions for the adiabatic invariant \(\mu _\varepsilon\) can be obtained62. It can also be shown that the (formal) set of fixed points for the roto-rate is an elliptic almost invariant slow manifold whose normal stability is mediated by the adiabatic invariant associated with the nearly-periodic Hamiltonian system8.

A similar theory can be established for nearly-periodic maps with a Hamiltonian structure.

Definition 2.12

A presymplectic nearly-periodic map on an \(\varepsilon\)-dependent presymplectic manifold \((M,\Omega _\varepsilon )\) is a nearly-periodic map F such that \(F_\varepsilon ^*\Omega _\varepsilon = \Omega _\varepsilon\) for each \(\varepsilon \in {\mathcal {E}}\).

Theorem 2.3

(Burby et al.11) If F is a non-resonant presymplectic nearly-periodic map on an \(\varepsilon\)-dependent presymplectic manifold \((M,\Omega _\varepsilon )\) with roto-rate \(R_\varepsilon\), then \({\mathcal {L}}_{R_\varepsilon }\Omega _\varepsilon = 0\).

Definition 2.13

A Hamiltonian nearly-periodic map on an \(\varepsilon\)-dependent presymplectic manifold \((M,\Omega _\varepsilon )\) is a nearly-periodic map F for which there exists a smooth \((t,\varepsilon )\)-dependent vector field \(Y_{t,\varepsilon }\), with \(t\in {\mathbb {R}}\), such that the following properties hold:

  • \(\iota _{Y_{t,\varepsilon }}\Omega _\varepsilon = {\textbf{d}}H_{t,\varepsilon }\), for some smooth \((t,\varepsilon )\)-dependent function \(H_{t,\varepsilon }\).

  • For each \(\varepsilon \in {\mathcal {E}}\), \(F_\varepsilon\) is the \(t=1\) flow of \(Y_{t,\varepsilon }\).

Lemma 2.2

Each Hamiltonian nearly-periodic map is a presymplectic nearly-periodic map.

Since the roto-rate of a non-resonant presymplectic nearly-periodic map is itself presymplectic (Theorem 2.3), Noether’s theorem can be applied to establish the existence of adiabatic invariants for many interesting presymplectic nearly-periodic maps.

Theorem 2.4

(Burby et al.11) Let F be a non-resonant presymplectic nearly-periodic map on the exact \(\varepsilon\)-dependent presymplectic manifold \((M,\Omega _\varepsilon )\) with roto-rate \(R_\varepsilon\). Assume that F is Hamiltonian or that the manifold M is connected and the limiting roto-rate \(R_0\) has at least one zero. Then there exists a smooth \(\varepsilon\)-dependent 1-form \(\theta _\varepsilon\) such that \({\mathcal {L}}_{R_\varepsilon }\theta _\varepsilon = 0\) and \(-{\textbf{d}}\theta _\varepsilon =\Omega _\varepsilon\) in the sense of formal power series. Moreover, the quantity \(\mu _\varepsilon = \iota _{R_\varepsilon }\theta _\varepsilon\) satisfies \(F_\varepsilon ^*\mu _\varepsilon = \mu _\varepsilon\) in the sense of formal power series, that is, \(\mu _\varepsilon\) is an adiabatic invariant for F.

When an adiabatic invariant exists, the phase-space dimension is formally reduced by two. On the slow manifold \(\mu _\varepsilon = 0\) the reduction in dimensionality may be even more dramatic. For example, the slow manifold for the symplectic Lorentz system8 has half the dimension of the full system.

Novel structure-preserving neural network architectures

Approximating nearly-periodic maps via gyroceptrons

We first consider the problem of approximating an arbitrary nearly-periodic map \(P:M\times {\mathcal {E}}\rightarrow M\) on a manifold M. From Definition 2.8, there must be a corresponding circle action \(\Phi _{\theta }: M \rightarrow M\) and \(\theta _0 \in U(1)\) such that \(P_0 = \Phi _{\theta _0}\). Consider the map \(I_\varepsilon : M \rightarrow M\) given by

$$\begin{aligned} I_{\varepsilon } \ = \ P_\varepsilon \ \circ \ \Phi _{\theta _0}^{-1} \qquad \forall \varepsilon \in {\mathcal {E}}. \end{aligned}$$
(3.1)

Since \(P_0 = \Phi _{\theta _0}\), this defines a near-identity map on M satisfying \(I_0 = \text {Id}_M\). By composing both sides of Eq. (3.1) on the right with \(\Phi _{\theta _0}\), we obtain a representation of any nearly-periodic map P as the composition of a near-identity map and a circle action,

$$\begin{aligned} P_\varepsilon \ = \ I_\varepsilon \ \circ \ \Phi _{\theta _0} \qquad \forall \varepsilon \in {\mathcal {E}}. \end{aligned}$$
(3.2)

As a consequence, if we can approximate any near-identity map and any circle action, then by the above representation we can approximate any nearly-periodic map.
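As an illustration of the representation (3.2), the following minimal Python/NumPy sketch takes \(P_\varepsilon\) to be the exact time-\(t_0\) flow of a hypothetical nearly-periodic system, the linear oscillator \(\dot q = p,\ \dot p = -(1+\varepsilon )\,q\). Its limiting flow is the clockwise rotation of the \((q,p)\) plane with \(\omega _0 = 1\), so \(\theta _0 = t_0 \bmod 2\pi\). The sketch recovers \(I_\varepsilon = P_\varepsilon \circ \Phi _{\theta _0}^{-1}\) and checks that this map is exactly the identity at \(\varepsilon = 0\) and departs from it at first order in \(\varepsilon\). The example system and all function names are illustrative assumptions.

```python
import numpy as np


def rotation(theta, q, p):
    """Clockwise rotation of the (q, p) plane: the limiting circle action."""
    return np.cos(theta) * q + np.sin(theta) * p, -np.sin(theta) * q + np.cos(theta) * p


def flow_map(q, p, eps, t0):
    """Exact time-t0 flow of the hypothetical system q' = p, p' = -(1 + eps) q."""
    w = np.sqrt(1.0 + eps)
    return (q * np.cos(w * t0) + p / w * np.sin(w * t0),
            -q * w * np.sin(w * t0) + p * np.cos(w * t0))


def near_identity_part(q, p, eps, t0):
    """I_eps = P_eps o Phi_{theta0}^{-1} as in Eq. (3.1), with theta0 = t0 (omega_0 = 1)."""
    q, p = rotation(-t0, q, p)  # Phi_{theta0}^{-1} = Phi_{-theta0}
    return flow_map(q, p, eps, t0)


def identity_defect(eps, t0=1.3, q=0.7, p=-0.4):
    """Distance between I_eps(q, p) and (q, p)."""
    q_new, p_new = near_identity_part(q, p, eps, t0)
    return np.hypot(q_new - q, p_new - p)


for eps in (0.0, 1e-4, 1e-3):
    print(eps, identity_defect(eps))
```

Reducing \(\varepsilon\) by a factor of 10 reduces the defect by roughly the same factor, as expected of a near-identity map.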

Different circle actions can act on manifolds in topologically different ways, so it would be very challenging, if not impossible, to construct a single strategy that approximates any circle action to arbitrary accuracy. Here, we consider the simpler case where the topological type of the circle action for the nearly-periodic system is known a priori, and we work within conjugacy classes. Conjugating a circle action \(\Phi _{\theta } : M \rightarrow M\) with a diffeomorphism \(\psi\) results in the map \(\psi \ \circ \ \Phi _{\theta } \ \circ \ \psi ^{-1}\), and two circle actions belong to the same conjugacy class if one can be obtained from the other by conjugation with a diffeomorphism. Note that although compositions of nearly-periodic maps are not necessarily nearly-periodic, the map obtained by conjugating a nearly-periodic map with a diffeomorphism is nearly-periodic:

Lemma 3.1

Let \(P:M\times {\mathcal {E}}\rightarrow M\) be a nearly-periodic map on a manifold M, and let \(\psi : M \rightarrow M\) be a diffeomorphism on M. Then the map \({\tilde{P}}: M\times {\mathcal {E}}\rightarrow M\) defined for any \(\varepsilon \in {\mathcal {E}}\) via

$$\begin{aligned} {\tilde{P}}_\varepsilon \equiv \ \psi \ \circ \ P_\varepsilon \ \circ \ \psi ^{-1} \end{aligned}$$
(3.3)

is a nearly-periodic map.

Proof

Since \(\psi\) and \(P_\varepsilon\) are diffeomorphisms for every \(\varepsilon \in {\mathcal {E}}\), so is \({\tilde{P}}_\varepsilon\). Now, from Definition 2.8, there is a circle action \(\Phi _{\theta } : M \rightarrow M\) and \(\theta _0 \in U(1)\) such that \(P_0 = \Phi _{\theta _0}\). Define \({\tilde{\Phi }}_\theta : M \rightarrow M\) via \({\tilde{\Phi }}_\theta \equiv \ \psi \ \circ \ \Phi _{\theta } \ \circ \ \psi ^{-1}\) for any \(\theta \in U(1)\). Then, for any \(\theta ,\theta _1,\theta _2 \in U(1)\),

  • \({\tilde{\Phi }}_{\theta +2\pi } \ = \ \psi \ \circ \ \Phi _{\theta +2\pi } \ \circ \ \psi ^{-1} \ = \ \psi \ \circ \ \Phi _{\theta } \ \circ \ \psi ^{-1} \ = \ {\tilde{\Phi }}_\theta\)

  • \({\tilde{\Phi }}_{0} \ = \ \psi \ \circ \ \Phi _{0} \ \circ \ \psi ^{-1} \ = \ \psi \ \circ \ \text {Id}_M \ \circ \ \psi ^{-1} \ = \ \text {Id}_M\)

  • \({\tilde{\Phi }}_{\theta _1} \ \circ \ {\tilde{\Phi }}_{\theta _2} \ = \ \psi \ \circ \ \Phi _{\theta _1} \ \circ \ \psi ^{-1} \ \circ \ \psi \ \circ \ \Phi _{\theta _2} \ \circ \ \psi ^{-1} \ = \ \psi \ \circ \ \Phi _{\theta _1} \ \circ \ \Phi _{\theta _2} \ \circ \ \psi ^{-1} \ = \ \psi \ \circ \ \Phi _{\theta _1+\theta _2} \ \circ \ \psi ^{-1} \ = \ {\tilde{\Phi }}_{\theta _1+\theta _2}\)

Therefore, \({\tilde{\Phi }}_\theta\) is a circle action, and \(\theta _0 \in U(1)\) is such that \(\ {\tilde{\Phi }}_{\theta _0} \ = \ \psi \ \circ \ \Phi _{\theta _0} \ \circ \ \psi ^{-1} \ = \ \psi \ \circ \ P_0 \ \circ \ \psi ^{-1} \ = \ {\tilde{P}}_0\).

As a consequence, \({\tilde{P}}\) is a nearly-periodic map. \(\square\)
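The three properties verified in the proof can also be checked numerically. The NumPy sketch below works on \(M = {\mathbb {R}}^2\) and conjugates the clockwise rotation of the \((q,p)\) plane with a hypothetical diffeomorphism \(\psi (q,p) = (q,\, p + q^3)\), chosen only because its inverse is explicit. It then tests periodicity, the identity at \(\theta = 0\), and the group law on random points.

```python
import numpy as np


def rotation(theta, z):
    """Clockwise rotation Phi_theta acting on points z[..., :] = (q, p)."""
    q, p = z[..., 0], z[..., 1]
    return np.stack([np.cos(theta) * q + np.sin(theta) * p,
                     -np.sin(theta) * q + np.cos(theta) * p], axis=-1)


def psi(z):
    """Hypothetical diffeomorphism of R^2: the nonlinear shear (q, p) -> (q, p + q^3)."""
    return np.stack([z[..., 0], z[..., 1] + z[..., 0] ** 3], axis=-1)


def psi_inv(z):
    """Exact inverse of the shear: (q, p) -> (q, p - q^3)."""
    return np.stack([z[..., 0], z[..., 1] - z[..., 0] ** 3], axis=-1)


def conjugated_action(theta, z):
    """Phi~_theta = psi o Phi_theta o psi^{-1}, as in the proof of Lemma 3.1."""
    return psi(rotation(theta, psi_inv(z)))


rng = np.random.default_rng(0)
z = rng.uniform(-1.0, 1.0, size=(100, 2))
t1, t2 = 0.7, 2.1
periodic = np.allclose(conjugated_action(t1 + 2 * np.pi, z), conjugated_action(t1, z))
identity = np.allclose(conjugated_action(0.0, z), z)
group = np.allclose(conjugated_action(t1, conjugated_action(t2, z)),
                    conjugated_action(t1 + t2, z))
print(periodic, identity, group)
```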

We also have the following useful factorization result for nearly-periodic maps with limiting rotation within a given conjugacy class:

Lemma 3.2

Let \(\Phi _\theta :M\rightarrow M\) be a circle action on a manifold M. Every nearly-periodic map \(P_\varepsilon :M\rightarrow M\) whose limiting rotation \(\Phi ^\prime _{\theta _0} = P_0\) is conjugate to \(\Phi _{\theta _0}\) admits the decomposition

$$\begin{aligned} P_\varepsilon \ = \ I_\varepsilon \ \circ \ \psi \ \circ \ \Phi _{\theta _0}\ \circ \ \psi ^{-1}, \end{aligned}$$
(3.4)

where \(\psi :M\rightarrow M\) is a diffeomorphism and \(I_\varepsilon :M\rightarrow M\) is a near-identity diffeomorphism.

We will thus assume that we know in advance the topological type of the circle action \(\Phi _{\theta }\) for the dynamics of interest, and then propose to learn the nearly-periodic map \(P_\varepsilon\) by learning each component map in the composition

$$\begin{aligned} P_\varepsilon \ = \ I_\varepsilon \ \circ \ \psi \ \circ \ \Phi _{\theta } \ \circ \ \psi ^{-1}. \end{aligned}$$
(3.5)

This formula may be interpreted intuitively as follows. The map \(\psi\) learns the mode structure of an oscillatory system’s short timescale dynamics. The circle action \(\Phi _\theta\) provides an aliased phase advance for the learnt mode. Finally, \(I_\varepsilon\) captures the averaged dynamics that occurs on timescales much larger than the limiting oscillation period.

\(I_\varepsilon\) and \(\psi\) can be learnt using any standard neural network architecture, as long as the near-identity property is enforced in the representation of \(I_\varepsilon\). It is, however, important to invert \(\psi\) exactly, which strongly motivates using explicitly invertible neural network architectures for \(\psi\). Coupling-based invertible neural networks, in particular, have been shown to be universal diffeomorphism approximators63. The parameter \(\theta\) in the circle action \(\Phi _\theta\) can also be treated as a trainable parameter. We will refer to the resulting architecture as a gyroceptron, a name that combines the gyration of phase with the perceptron.

Definition 3.1

A gyroceptron is a feed-forward neural network

$$\begin{aligned} P_\varepsilon [W] \ = \ I_\varepsilon [W_I] \ \circ \ \psi [W_\psi ] \ \circ \ \Phi _{\theta } \ \circ \ \psi [W_\psi ]^{-1} \end{aligned}$$
(3.6)

with weights \(W=(W_I,W_\psi )\) and rotation parameter \(\theta \in U(1)\), where

  • \(I_\varepsilon [W_I]:M\rightarrow M\) is a diffeomorphism for each \((\varepsilon ,W_I)\) such that \(I_0[W_I] = \text {Id}_M\) for each \(W_I\)

  • \(\psi [W_\psi ]:M\rightarrow M\) is a diffeomorphism for each \(W_\psi\)

  • \(\Phi _\theta :M\rightarrow M\) is a circle action on M
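Definition 3.1 can be sketched in a few lines of NumPy on \(M = {\mathbb {R}}^2\). This is an illustration under stated assumptions, not the architecture used in this paper. \(\psi\) is built from two additive coupling layers (in the style of NICE), so that \(\psi ^{-1}\) is exact. \(I_\varepsilon\) is built from two coupling layers whose shifts are multiplied by \(\varepsilon\), so that \(I_0 = \text {Id}_M\) exactly. \(\Phi _\theta\) is the clockwise rotation of the plane. The weights are random and untrained, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)


def mlp_params(hidden=8):
    """Random weights for a hypothetical scalar MLP x -> sum_i a_i tanh(w_i x + b_i)."""
    return dict(zip(("w", "b", "a"), rng.normal(size=(3, hidden))))


def mlp(params, x):
    return np.tanh(np.multiply.outer(x, params["w"]) + params["b"]) @ params["a"]


class Gyroceptron:
    """Sketch of P_eps = I_eps o psi o Phi_theta o psi^{-1} on M = R^2 (Definition 3.1)."""

    def __init__(self, theta, hidden=8):
        self.theta = theta  # rotation parameter (trainable in practice)
        self.psi_nets = [mlp_params(hidden), mlp_params(hidden)]
        self.id_nets = [mlp_params(hidden), mlp_params(hidden)]

    def psi(self, q, p):
        """Two additive coupling layers: an exactly invertible diffeomorphism."""
        p = p + mlp(self.psi_nets[0], q)
        q = q + mlp(self.psi_nets[1], p)
        return q, p

    def psi_inv(self, q, p):
        """Undo the coupling layers in reverse order."""
        q = q - mlp(self.psi_nets[1], p)
        p = p - mlp(self.psi_nets[0], q)
        return q, p

    def near_identity(self, q, p, eps):
        """Coupling layers with eps-scaled shifts, so that I_0 is the identity."""
        p = p + eps * mlp(self.id_nets[0], q)
        q = q + eps * mlp(self.id_nets[1], p)
        return q, p

    def rotate(self, q, p):
        """Clockwise rotation of the plane by the angle theta."""
        c, s = np.cos(self.theta), np.sin(self.theta)
        return c * q + s * p, -s * q + c * p

    def __call__(self, q, p, eps):
        q, p = self.psi_inv(q, p)
        q, p = self.rotate(q, p)
        q, p = self.psi(q, p)
        return self.near_identity(q, p, eps)


g = Gyroceptron(theta=2 * np.pi / 5)
q0, p0 = np.linspace(-1.0, 1.0, 7), np.linspace(0.5, -0.5, 7)
q, p = q0, p0
for _ in range(5):
    q, p = g(q, p, 0.0)
print(np.allclose(q, q0) and np.allclose(p, p0))
```

With \(\theta = 2\pi /5\) and \(\varepsilon = 0\), five applications of the map return every point to where it started, since \(P_0^5 = \psi \circ \Phi _{2\pi } \circ \psi ^{-1} = \text {Id}_M\). For \(\varepsilon \ne 0\) this exact periodicity is generically lost.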

Gyroceptrons enjoy the following universal approximation property.

Theorem 3.1

Fix a circle action \(\Phi _\theta :M\rightarrow M\) and a compact set \(C\subset M\). Let \(P_\varepsilon :M\rightarrow M\) be a nearly-periodic map whose limiting rotation is conjugate to \(\Phi _\theta\). Let \(\psi [W_\psi ]:M\rightarrow M\) be a feed-forward network architecture that provides a universal approximation within the class of diffeomorphisms, and let \(I_\varepsilon [W_I]\) be a feed-forward network architecture that provides a universal approximation within the class of \(\varepsilon\)-dependent diffeomorphisms with \(I_0[W_I] = \text {Id}_M\). For each \(\delta >0\), there exist weights \(W_\psi ^*\) and \(W_I^*\) such that the gyroceptron \(P_\varepsilon [W^*] \ = \ I_\varepsilon [W_I^*]\ \circ \ \psi [W_\psi ^*]\ \circ \ \Phi _\theta \ \circ \ \psi [W_\psi ^*]^{-1}\) approximates \(P_\varepsilon\) within \(\delta\) on C.

Approximating nearly-periodic symplectic maps via symplectic gyroceptrons

We now focus on approximating an arbitrary nearly-periodic symplectic map \(P:M\times {\mathcal {E}}\rightarrow M\) on a manifold M. We will restrict our attention to symplectic manifolds with \(\varepsilon\)-independent symplectic forms (the \(\varepsilon\)-dependent case is more subtle and will not be pursued in the current study). From Definition 2.8, there must be a corresponding symplectic circle action \(\Phi _{\theta } : M \rightarrow M\) and \(\theta _0 \in U(1)\) such that \(P_0 = \Phi _{\theta _0}\). As before, consider the map \(I_\varepsilon : M \rightarrow M\) given by

$$\begin{aligned} I_{\varepsilon } \ = \ P_\varepsilon \ \circ \ \Phi _{\theta _0}^{-1}, \qquad \forall \varepsilon \in {\mathcal {E}}. \end{aligned}$$
(3.7)

Now, the inverse of a symplectic map is symplectic and any composition of symplectic maps is also symplectic. Thus, the map \(\Phi _{\theta _0}^{-1} = P_0^{-1}\) is symplectic, and as a result, \(I_{\varepsilon }\) is symplectic on M for any \(\varepsilon \in {\mathcal {E}}\) and it satisfies the near-identity property \(I_0 = \text {Id}_M\). By composing both sides of Eq. (3.7) on the right with \(\Phi _{\theta _0}\), we obtain a representation of any nearly-periodic symplectic map P as the composition of a near-identity symplectic map and a symplectic circle action:

$$\begin{aligned} P_\varepsilon \ = \ I_\varepsilon \ \circ \ \Phi _{\theta _0}, \qquad \forall \varepsilon \in {\mathcal {E}}. \end{aligned}$$
(3.8)

Lemma 3.3

Let \(\Phi _\theta :M\rightarrow M\) be a symplectic circle action on a symplectic manifold \((M,\omega )\). Every nearly-periodic symplectic map \(P_\varepsilon :M\rightarrow M\) whose limiting rotation \(\Phi ^\prime _{\theta _0} = P_0\) is conjugate to \(\Phi _{\theta _0}\) admits the decomposition

$$\begin{aligned} P_\varepsilon \ = \ I_\varepsilon \ \circ \ \psi \ \circ \ \Phi _{\theta _0}\ \circ \ \psi ^{-1}, \end{aligned}$$
(3.9)

where \(\psi :M\rightarrow M\) is a symplectic diffeomorphism and \(I_\varepsilon :M\rightarrow M\) is a near-identity symplectic diffeomorphism.

If we can approximate any near-identity symplectic map and any symplectic circle action, then by the above representation we can approximate any nearly-periodic symplectic map. As before, we will assume that we know a priori the topological type of the circle action \(\Phi _{\theta }\) for the nearly-periodic symplectic system of interest, and work within conjugacy classes. By Lemma 3.1, the map \(\psi \ \circ \ P \ \circ \ \psi ^{-1}\) obtained by conjugating a nearly-periodic symplectic map P with a symplectomorphism \(\psi\) (i.e. a symplectic diffeomorphism) is nearly-periodic, and since compositions of symplectic maps are symplectic, it is also symplectic. We will then learn the nearly-periodic symplectic map by learning each component map in the composition

$$\begin{aligned} P_\varepsilon \ = \ I_\varepsilon \ \circ \ \psi \ \circ \ \Phi _{\theta } \ \circ \ \psi ^{-1}, \end{aligned}$$
(3.10)

where \(I_\varepsilon\) is a near-identity symplectic map and \(\psi\) is symplectic.

The symplectic map \(\psi\) can be learnt using any neural network architecture that strongly enforces symplecticity. It is, however, preferable to choose an architecture that can easily be inverted, so that the computations involving \(\psi ^{-1}\) can be carried out efficiently. The near-identity symplectic map \(I_\varepsilon\) can be learnt using any neural network architecture that strongly enforces symplecticity and additionally reduces to the identity as \(\varepsilon\) goes to 0. The parameter \(\theta\) in the circle action \(\Phi _\theta\) can also be treated as a trainable parameter. We will refer to any such resulting composition of neural network architectures as a symplectic gyroceptron.

Definition 3.2

A symplectic gyroceptron is a feed-forward neural network

$$\begin{aligned} P_\varepsilon [W] \ = \ I_\varepsilon [W_I] \ \circ \ \psi [W_\psi ] \ \circ \ \Phi _{\theta } \ \circ \ \psi [W_\psi ]^{-1} \end{aligned}$$
(3.11)

with weights \(W=(W_I,W_\psi )\) and rotation parameter \(\theta \in U(1)\), where

  • \(I_\varepsilon [W_I]:M\rightarrow M\) is a symplectic diffeomorphism for each \((\varepsilon ,W_I)\) such that \(I_0[W_I] = \text {Id}_M\) for each \(W_I\)

  • \(\psi [W_\psi ]:M\rightarrow M\) is a symplectic diffeomorphism for each \(W_\psi\)

  • \(\Phi _\theta :M\rightarrow M\) is a symplectic circle action on M

Symplectic gyroceptrons enjoy a universal approximation property comparable to the non-symplectic case.

Theorem 3.2

Fix a symplectic circle action \(\Phi _\theta :M\rightarrow M\) on the symplectic manifold \((M,\omega )\) and a compact set \(C\subset M\). Let \(P_\varepsilon :M\rightarrow M\) be a nearly-periodic symplectic map whose limiting rotation is conjugate to \(\Phi _\theta\). Let \(\psi [W_\psi ]:M\rightarrow M\) be a feed-forward network architecture that provides a universal approximation within the class of symplectic diffeomorphisms, and let \(I_\varepsilon [W_I]\) be a feed-forward network architecture that provides a universal approximation within the class of \(\varepsilon\)-dependent symplectic diffeomorphisms with \(I_0[W_I] = \text {Id}_M\). For each \(\delta >0\), there exist weights \(W_\psi ^*\) and \(W_I^*\) such that the symplectic gyroceptron \(P_\varepsilon [W^*] \ = \ I_\varepsilon [W_I^*]\ \circ \ \psi [W_\psi ^*]\ \circ \ \Phi _\theta \ \circ \ \psi [W_\psi ^*]^{-1}\) approximates \(P_\varepsilon\) within \(\delta\) on C.

In this paper, we will use HénonNets2 as the main building blocks of our symplectic gyroceptrons. The symplectic map \(\psi\) will be learnt using a standard HénonNet (see Definition 2.3), its inverse \(\psi ^{-1}\) can be obtained easily by composing inverses of Hénon-like maps (see Remark 2.2), and the near-identity symplectic map \(I_\varepsilon\) will be learnt using a near-identity HénonNet (see Definition 2.5). The neural network architectures considered in this paper are summarized in Figure 1.
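The following NumPy sketch shows why HénonNets are convenient building blocks, on the phase space \({\mathbb {R}}^2\). It assumes the Hénon-like map \(H[V,\eta ](x,y) = (y+\eta ,\ -x+\nabla V(y))\) of the original HénonNet paper2, with a Hénon layer taken as its fourth iterate. Definitions 2.3 and 2.5 and Remark 2.2 are not reproduced here, so this is our reading rather than a copy. The inverse \(H[V,\eta ]^{-1}(X,Y) = (\nabla V(X-\eta ) - Y,\ X-\eta )\) requires no solver. The near-identity variant scales both \(V\) and \(\eta\) by \(\varepsilon\), so that the \(\varepsilon = 0\) layer is \((x,y)\mapsto (y,-x)\) applied four times, i.e. the identity. This variant is a hypothetical construction that may differ from Definition 2.5. In two dimensions a map is symplectic exactly when its Jacobian determinant equals 1, which the finite-difference check tests.

```python
import numpy as np


def make_potential_grad(rng, hidden=8, scale=0.5):
    """Gradient of a hypothetical potential V(y) = sum_i a_i tanh(w_i y + b_i)."""
    w, b, a = scale * rng.normal(size=(3, hidden))

    def grad_V(y):
        t = np.tanh(np.multiply.outer(y, w) + b)
        return (1.0 - t**2) @ (a * w)  # dV/dy = sum_i a_i w_i sech^2(w_i y + b_i)

    return grad_V


def henon_map(grad_V, eta, x, y):
    """Hénon-like map H[V, eta](x, y) = (y + eta, -x + V'(y))."""
    return y + eta, -x + grad_V(y)


def henon_map_inv(grad_V, eta, X, Y):
    """Explicit inverse of H[V, eta]: x = V'(X - eta) - Y, y = X - eta."""
    return grad_V(X - eta) - Y, X - eta


def henon_layer(grad_V, eta, x, y):
    """Hénon layer: fourth iterate of the Hénon-like map."""
    for _ in range(4):
        x, y = henon_map(grad_V, eta, x, y)
    return x, y


def henon_layer_inv(grad_V, eta, X, Y):
    """Inverse Hénon layer: four applications of the explicit inverse."""
    for _ in range(4):
        X, Y = henon_map_inv(grad_V, eta, X, Y)
    return X, Y


def near_identity_henon_layer(grad_V, eta, x, y, eps):
    """Hypothetical near-identity layer: V and eta scaled by eps (identity at eps = 0)."""
    return henon_layer(lambda s: eps * grad_V(s), eps * eta, x, y)


def jacobian_det(f, x, y, h=1e-5):
    """Central finite-difference Jacobian determinant of a planar map f(x, y)."""
    fxp, fxm = f(x + h, y), f(x - h, y)
    fyp, fym = f(x, y + h), f(x, y - h)
    dX_dx, dY_dx = (fxp[0] - fxm[0]) / (2 * h), (fxp[1] - fxm[1]) / (2 * h)
    dX_dy, dY_dy = (fyp[0] - fym[0]) / (2 * h), (fyp[1] - fym[1]) / (2 * h)
    return dX_dx * dY_dy - dX_dy * dY_dx


rng = np.random.default_rng(0)
grad_V, eta = make_potential_grad(rng), 0.3
x0, y0 = 0.4, -0.2
X, Y = henon_layer(grad_V, eta, x0, y0)
print(henon_layer_inv(grad_V, eta, X, Y))  # should recover (x0, y0) up to round-off
print(jacobian_det(lambda x, y: henon_layer(grad_V, eta, x, y), x0, y0))  # close to 1
```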

We would like to emphasize that symplectic building blocks other than HénonNets could have been used as the basis for our symplectic gyroceptrons. For instance, one could have used SympNets3, since they also strongly enforce symplecticity and enjoy a universal approximation property for symplectic maps. However, numerical experiments conducted in the original HénonNet paper2 suggested that HénonNets have a higher per-layer expressive power than SympNets; as a result, SympNets are typically much deeper than HénonNets and slower for prediction. This is consistent with the observations we will make later in Sect. “Nonlinearly coupled oscillators”, where a SympNet takes 127 seconds to generate trajectories that a HénonNet of similar size generates in 3 seconds. Together with the fact that SympNets are not as easily invertible as HénonNets, this computational advantage makes HénonNets more desirable building blocks than SympNets.

Figure 1

Network diagrams. Left: Symplectic Gyroceptron. Right: Hénon Network.

Numerical confirmation of the existence of adiabatic invariants

In this section, we confirm numerically that, for a randomly drawn set of weights and biases, the dynamical system generated by the symplectic gyroceptron

$$\begin{aligned} I_\varepsilon \ \circ \ \psi \ \circ \ \Phi _{\theta } \ \circ \ \psi ^{-1}, \end{aligned}$$
(4.1)

introduced in Sect. “Approximating nearly-periodic symplectic maps via symplectic gyroceptrons”, admits an adiabatic invariant.

In our numerical experiments, we take the circle action given by the clockwise rotation

$$\begin{aligned} {\mathcal {R}}_{\theta }=\begin{pmatrix} \cos \theta &{} \sin \theta \\ -\sin \theta &{} \cos \theta \end{pmatrix}. \end{aligned}$$
(4.2)

The quantity \({\mathfrak {I}}_0(q,p) = \frac{1}{2} q^2 + \frac{1}{2} p^2\) is an invariant of the dynamics associated to the circle action (4.2), and as a result

$$\begin{aligned} \mu \ = \ {\mathfrak {I}}_0 \ \circ \ \psi ^{-1} \end{aligned}$$
(4.3)

is an invariant of the dynamics associated to the composition \(\psi \ \circ \ \Phi _{\theta } \ \circ \ \psi ^{-1} ,\) and an adiabatic invariant of the dynamics associated to the symplectic gyroceptron (4.1).
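The claim that (4.3) is exactly conserved by \(\psi \circ \Phi _\theta \circ \psi ^{-1}\), and only approximately by the full gyroceptron, can be reproduced in miniature. The sketch below is not the experimental setup of this section. The HénonNets are replaced by hypothetical symplectic shear maps of the plane with random weights, each of the form \((q,p)\mapsto (q, p+f(q))\) or \((q,p)\mapsto (q+g(p), p)\); such maps are area-preserving and hence symplectic in two dimensions. \(I_\varepsilon\) uses \(\varepsilon\)-scaled shears so that \(I_0 = \text {Id}\).

```python
import numpy as np

rng = np.random.default_rng(2)
W, B, A = 0.5 * rng.normal(size=(3, 8))  # hypothetical random weights


def dV(s):
    """Derivative of the hypothetical potential V(s) = sum_i A_i tanh(W_i s + B_i)."""
    return (1.0 - np.tanh(np.multiply.outer(s, W) + B) ** 2) @ (A * W)


def psi(q, p):
    """Stand-in symplectic map: a kick in p followed by a drift in q."""
    p = p + dV(q)
    q = q + dV(p)
    return q, p


def psi_inv(q, p):
    """Exact inverse of psi (undo the drift, then the kick)."""
    q = q - dV(p)
    p = p - dV(q)
    return q, p


def near_identity(q, p, eps):
    """Stand-in near-identity symplectic map: eps-scaled shears, the identity at eps = 0."""
    p = p - eps * dV(q + 1.0)
    q = q + eps * dV(p - 0.5)
    return q, p


def rotate(q, p, theta):
    """Clockwise rotation R_theta of Eq. (4.2)."""
    c, s = np.cos(theta), np.sin(theta)
    return c * q + s * p, -s * q + c * p


def mu(q, p):
    """Candidate adiabatic invariant mu = I_0 o psi^{-1} of Eq. (4.3)."""
    qq, pp = psi_inv(q, p)
    return 0.5 * qq**2 + 0.5 * pp**2


def max_mu_deviation(eps, theta=1.0, n_steps=2000, q0=0.3, p0=0.1):
    """Iterate I_eps o psi o R_theta o psi^{-1} and return max_k |mu_k - mu_0|."""
    q, p = q0, p0
    mu0 = mu(q, p)
    dev = 0.0
    for _ in range(n_steps):
        q, p = psi_inv(q, p)
        q, p = rotate(q, p, theta)
        q, p = psi(q, p)
        q, p = near_identity(q, p, eps)
        dev = max(dev, abs(mu(q, p) - mu0))
    return dev


for eps in (0.0, 1e-4, 1e-2):
    print(f"eps = {eps:g}: max |mu_k - mu_0| = {max_mu_deviation(eps):.2e}")
```

At \(\varepsilon = 0\) the deviation stays at the level of round-off, while for \(\varepsilon > 0\) it becomes nonzero. The actual numbers depend on the random weights, so none are quoted here.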

Figure 2

Conservation of the adiabatic invariant (4.3) over 10000 iterations for the map generated by the symplectic gyroceptron (4.1) as \(\varepsilon\) is increased.

Figure 2 displays the evolution of the adiabatic invariant (4.3) over 10000 iterations of the dynamical system generated by the symplectic gyroceptron (4.1), for different values of \(\varepsilon\). Here, \(\psi\) is a HénonNet and \(I_\varepsilon\) a near-identity HénonNet, both with 3 Hénon layers, each of which has 8 neurons in the single hidden layer of its fully-connected neural network layer potential. The conservation of the adiabatic invariant clearly improves as \(\varepsilon\) gets closer to 0, going from chaotic oscillations of large amplitude when \(\varepsilon = 0.1\) to very regular oscillations of minute amplitude when \(\varepsilon = 10^{-8}\).

We investigated this further by computing the number of iterations needed for the adiabatic invariant \(\mu\) to deviate significantly from its original value \(\mu _0\) as \(\varepsilon\) is varied. More precisely, given a value of \(\varepsilon\), we search for the smallest integer \(N(\varepsilon )\) such that

$$\begin{aligned} |\mu _{N(\varepsilon )} - \mu _0 | \ > \ \rho \max _{k= 0,..., K(\varepsilon ) }{ |\mu _{k} - \mu _0 | }, \quad \text { where } \ K(\varepsilon ) = \lfloor 10 + \varepsilon ^{-1/4} \rfloor . \end{aligned}$$
(4.4)

In other words, we record the first iteration where the value of the adiabatic invariant \(\mu\) deviates from its original value \(\mu _0\) by more than some constant factor \(\rho >1\) of the maximum deviations experienced in the first few \(K(\varepsilon )\) iterations. The results are plotted in Figure 3 for \(\rho = 1.1\).
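The stopping criterion (4.4) can be implemented directly. The following minimal NumPy sketch assumes that the values \(\mu _0, \mu _1, \dots\) of the invariant along one trajectory have already been recorded in an array. The function name `deviation_time` is hypothetical.

```python
import math

import numpy as np


def deviation_time(mu, eps, rho=1.1):
    """Return the smallest N with |mu_N - mu_0| > rho * max_{0<=k<=K} |mu_k - mu_0|,
    where K = floor(10 + eps**(-1/4)) as in Eq. (4.4); None if no such N is recorded."""
    mu = np.asarray(mu, dtype=float)
    dev = np.abs(mu - mu[0])
    K = math.floor(10 + eps ** (-0.25))
    threshold = rho * dev[: K + 1].max()
    hits = np.flatnonzero(dev > threshold)
    return int(hits[0]) if hits.size else None


# Synthetic example: a small early wiggle, then a larger excursion at k = 20 (K = 11 here).
mu = np.ones(30)
mu[5] = 1.1
mu[20] = 0.8
print(deviation_time(mu, eps=1.0))  # 20
```

Because \(\rho > 1\), the returned index always exceeds \(K(\varepsilon )\), which is why \(N(\varepsilon )\) measures long-time behaviour rather than the initial transient.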

Figure 3

\(N(\varepsilon )\) as a function of \(\varepsilon\) for \(\rho = 1.1\) and a random set of weights.

We can clearly see from Figure 3 that \(N(\varepsilon )\), the number of iterations needed for the adiabatic invariant \(\mu\) to deviate from its original value \(\mu _0\) by more than \(\rho = 1.1\) times the maximum deviation experienced in the first few iterations, increases sharply as \(\varepsilon\) gets closer to 0. This is consistent with theoretical expectations. Higher values of \(\rho\) and smaller values of \(\varepsilon\) would probably yield more meaningful results. Unfortunately, this is not computationally feasible, since \(N(\varepsilon )\) becomes very large when \(\rho\) is increased beyond 1.2: even for the larger values of \(\varepsilon\), computing a single point would then take several days.

Numerical examples of learning surrogate maps

Nonlinearly coupled oscillators

In this section, we use the symplectic gyroceptron architecture introduced in Sect. “Approximating nearly-periodic symplectic maps via symplectic gyroceptrons” to learn a surrogate map for the nearly-periodic symplectic flow map associated to a nearly-periodic Hamiltonian system composed of two nonlinearly coupled oscillators, one of which oscillates significantly faster than the other:

$$\begin{aligned} {\left\{ \begin{array}{ll} {\dot{q}}_1 = p_1 \qquad &{} {\dot{p}}_1 = -q_1 - \varepsilon \partial _{q_1} U(q_1,q_2) \\ {\dot{q}}_2 = \varepsilon p_2 \qquad &{} {\dot{p}}_2 = -\varepsilon q_2 - \varepsilon \partial _{q_2} U(q_1,q_2) \end{array}\right. } \end{aligned}$$
(5.1)

These equations of motion are Hamilton’s equations associated with the Hamiltonian

$$\begin{aligned} H_\varepsilon (q_1,q_2,p_1,p_2) = \frac{1}{2} (q_1^2 + p_1^2 ) + \frac{1}{2} \varepsilon (q_2^2 + p_2^2 ) + \varepsilon U(q_1,q_2). \end{aligned}$$
(5.2)

The \(\varepsilon = 0\) dynamics are decoupled, where the first oscillator, initialized at \(\left(q_1(0),p_1(0)\right) = (\mathcalligra{q},\mathcalligra{p})\), follows a trajectory characterized by periodic clockwise circular rotation in phase space, while the second oscillator remains immobile:

$$q_1(t) = \mathcalligra{q} \cos{t} + \mathcalligra{p} \sin{t}, \qquad \qquad p_1(t) = \mathcalligra{p} \cos{t} - \mathcalligra{q} \sin{t} .$$
(5.3)

Thus, this is a nearly-periodic Hamiltonian system on \({\mathbb {R}}^4\) with associated \(\varepsilon = 0\) circle action given by the clockwise rotation

$$\begin{aligned} {\mathcal {R}}_{\theta }=\begin{pmatrix} \cos \theta &{} 0 &{} \sin \theta &{} 0\\ 0 &{} 1 &{} 0 &{} 0\\ -\sin \theta &{} 0 &{} \cos \theta &{} 0 \\ 0 &{} 0 &{} 0 &{} 1 \end{pmatrix}. \end{aligned}$$
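The following short check, a NumPy sketch with hypothetical function names, confirms three properties of the \(4\times 4\) matrix above for the ordering \((q_1,q_2,p_1,p_2)\). The matrix is symplectic, it is \(2\pi\)-periodic, and applying it to an initial state reproduces the \(\varepsilon = 0\) solution (5.3) with the second oscillator at rest.

```python
import numpy as np


def circle_action(theta):
    """Clockwise rotation of the (q1, p1) plane, acting on z = (q1, q2, p1, p2)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s, 0.0],
                     [0.0, 1.0, 0.0, 0.0],
                     [-s, 0.0, c, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])


def eps0_solution(t, q, p, q2, p2):
    """Eq. (5.3) for the first oscillator; the second oscillator stays at (q2, p2)."""
    return np.array([q * np.cos(t) + p * np.sin(t), q2, p * np.cos(t) - q * np.sin(t), p2])


# Canonical symplectic matrix for the ordering (q1, q2, p1, p2).
J = np.block([[np.zeros((2, 2)), np.eye(2)], [-np.eye(2), np.zeros((2, 2))]])

R = circle_action(0.9)
print(np.allclose(R.T @ J @ R, J))  # True: symplectic
print(np.allclose(circle_action(2 * np.pi), np.eye(4)))  # True: 2*pi-periodic
print(np.allclose(circle_action(1.7) @ [0.3, -1.0, 0.8, 0.2],
                  eps0_solution(1.7, 0.3, 0.8, -1.0, 0.2)))  # True: matches (5.3)
```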

We will use the nonlinear coupling potential \(U(q_1,q_2) = q_1 q_2 \sin {(2q_1 + 2q_2)}\) in our numerical experiments since the resulting nearly-periodic Hamiltonian system displays complicated dynamics as the value of \(\varepsilon\) is increased from 0. We have plotted in Figure 4 a few trajectories of this dynamical system corresponding to different values of \(\varepsilon\).

To learn a surrogate map for the nearly-periodic symplectic flow map associated to this nearly-periodic Hamiltonian system, we use the symplectic gyroceptron \(I_\varepsilon \ \circ \ \psi \ \circ \ \Phi _{\theta } \ \circ \ \psi ^{-1}\) introduced in Sect. “Approximating nearly-periodic symplectic maps via symplectic gyroceptrons”. In our first numerical experiments, \(\varepsilon = 0.01\), \(\theta\) is a trainable parameter, \(\psi\) is a HénonNet with 10 Hénon layers, each of which has 8 neurons in the single hidden layer of its fully-connected neural network layer potential, and \(I_\varepsilon\) is a near-identity HénonNet with 8 Hénon layers, each of which has 6 neurons in the single hidden layer of its fully-connected neural network layer potential.

The resulting symplectic gyroceptron of 549 trainable parameters was trained for a few thousand epochs on a dataset of 20,000 updates \((q_1,q_2,p_1,p_2) \mapsto ({\tilde{q}}_1,{\tilde{q}}_2,{\tilde{p}}_1,{\tilde{p}}_2)\) of the time-0.05 flow map associated to the nearly-periodic Hamiltonian system (5.1). The training data was generated using the classical Runge–Kutta 4 integrator with very small time-steps, and the mean squared error was used as the training loss. Figure 5 shows the dynamics predicted by the symplectic gyroceptron for seven different initial conditions with the same initial values of \((q_1,p_1)\) against the reference trajectories generated by the classical Runge–Kutta 4 integrator with very small time-steps. We only display the trajectories of the second oscillator since the motion of the first oscillator follows a simple nearly-circular curve.
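The training data described above can be generated along the following lines. This is a minimal NumPy sketch, not the authors' data pipeline. The sampling box \([-1,1]^4\), the number of RK4 substeps, and all function names are assumptions, since the text only states that the classical Runge–Kutta 4 integrator with very small time-steps was used.

```python
import numpy as np


def grad_U(q1, q2):
    """Partial derivatives of U(q1, q2) = q1 q2 sin(2 q1 + 2 q2)."""
    s, c = np.sin(2 * q1 + 2 * q2), np.cos(2 * q1 + 2 * q2)
    return q2 * s + 2 * q1 * q2 * c, q1 * s + 2 * q1 * q2 * c


def vector_field(z, eps):
    """Right-hand side of Eq. (5.1) for z = (q1, q2, p1, p2)."""
    q1, q2, p1, p2 = z
    dU1, dU2 = grad_U(q1, q2)
    return np.array([p1, eps * p2, -q1 - eps * dU1, -eps * q2 - eps * dU2])


def hamiltonian(z, eps):
    """H_eps of Eq. (5.2)."""
    q1, q2, p1, p2 = z
    return (0.5 * (q1**2 + p1**2) + 0.5 * eps * (q2**2 + p2**2)
            + eps * q1 * q2 * np.sin(2 * q1 + 2 * q2))


def rk4_flow(z, eps, T=0.05, n_sub=50):
    """Approximate time-T flow map using n_sub classical RK4 substeps."""
    h = T / n_sub
    z = np.asarray(z, dtype=float)
    for _ in range(n_sub):
        k1 = vector_field(z, eps)
        k2 = vector_field(z + 0.5 * h * k1, eps)
        k3 = vector_field(z + 0.5 * h * k2, eps)
        k4 = vector_field(z + h * k3, eps)
        z = z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return z


def make_dataset(n_pairs, eps=0.01, T=0.05, seed=0):
    """Input/output pairs (z, F_T(z)); uniform sampling on [-1, 1]^4 is a hypothetical choice."""
    rng = np.random.default_rng(seed)
    inputs = rng.uniform(-1.0, 1.0, size=(n_pairs, 4))
    outputs = np.array([rk4_flow(z, eps, T) for z in inputs])
    return inputs, outputs


X, Y = make_dataset(200)
print(X.shape, Y.shape)
```

At the scale of 20,000 pairs used in the text, vectorizing the integrator over the batch would be preferable to the per-sample loop shown here.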

We can see that the dynamics learnt by the symplectic gyroceptron match almost perfectly the reference trajectories and follow the level sets of the averaged Hamiltonian \({\bar{H}} = \frac{1}{2\pi }\int _0^{2\pi }\Phi _\theta ^*H\,d\theta\), which is given by

$$\begin{aligned} {\bar{H}}(q_2,p_2)&= \frac{1}{2}(q_2^2 + p_2^2) + \frac{1}{2\pi } \int _0^{2\pi }{U(q_1(t) , q_2) dt} \end{aligned}$$
(5.4)
$$= \frac{1}{2}(q_2^2 + p_2^2) + \frac{q_2}{2\pi}\int_0^{2\pi}{ \left( \mathcalligra{q} \cos{t} +\mathcalligra{p} \sin{t} \right) \sin{\left(2\left[ \mathcalligra{q} \cos{t} + \mathcalligra{p} \sin{t}\right]+ 2q_2 \right)} dt}$$
(5.5)
$$= \frac{1}{2}(q_2^2 + p_2^2) + q_2 \cos{(2 q_2)} \sqrt{ \mathcalligra{q}^2 + \mathcalligra{p}^2} \ \mathcal{J}_1\left(2 \sqrt{ \mathcalligra{q}^2 + \mathcalligra{p}^2} \right)$$
(5.6)

where \({\mathcal {J}}_1(x)\) is the first-order Bessel function of the first kind, and the averaged Hamiltonian is defined up to an unimportant constant. Using Kruskal’s theory of nearly-periodic systems, it is straightforward to show that this averaged Hamiltonian is the leading-order approximation of the Hamiltonian for the formal U(1)-reduction of the two-oscillator system.
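The Bessel-function form (5.6) can be checked against direct quadrature of the average in (5.4). The NumPy sketch below uses the periodic trapezoidal rule, which is spectrally accurate for smooth periodic integrands. It evaluates \({\mathcal {J}}_1\) through Bessel's integral \({\mathcal {J}}_1(x) = \frac{1}{2\pi }\int _0^{2\pi }\cos (\tau - x\sin \tau )\,d\tau\) to avoid extra dependencies (`scipy.special.j1` would serve equally well). Function names are hypothetical, and `averaged_hamiltonian` could be used to draw level sets of \({\bar{H}}\) in the \((q_2,p_2)\) plane.

```python
import numpy as np


def bessel_j1(x, n=2048):
    """J_1(x) via Bessel's integral (1/2pi) * int_0^{2pi} cos(tau - x sin tau) dtau."""
    tau = 2 * np.pi * np.arange(n) / n
    return np.mean(np.cos(tau - x * np.sin(tau)))


def averaged_coupling_quadrature(q2, q, p, n=2048):
    """(1/2pi) int_0^{2pi} U(q1(t), q2) dt, with q1(t) from Eq. (5.3), U = q1 q2 sin(2 q1 + 2 q2)."""
    t = 2 * np.pi * np.arange(n) / n  # periodic trapezoidal rule
    q1 = q * np.cos(t) + p * np.sin(t)
    return np.mean(q1 * q2 * np.sin(2 * q1 + 2 * q2))


def averaged_coupling_closed_form(q2, q, p):
    """Coupling term of Eq. (5.6): q2 cos(2 q2) A J_1(2A), with A = sqrt(q^2 + p^2)."""
    amp = np.hypot(q, p)
    return q2 * np.cos(2 * q2) * amp * bessel_j1(2 * amp)


def averaged_hamiltonian(q2, p2, q, p):
    """Averaged Hamiltonian (5.6) for the second oscillator (up to a constant)."""
    return 0.5 * (q2**2 + p2**2) + averaged_coupling_closed_form(q2, q, p)


print(averaged_coupling_quadrature(0.3, 0.8, -0.5), averaged_coupling_closed_form(0.3, 0.8, -0.5))
```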

We also learned a surrogate map for the nearly-periodic symplectic time-5 flow map associated to the dynamical system (5.1), using a symplectic gyroceptron where \(\varepsilon = 0.01\), \(\theta\) is a trainable parameter, and \(\psi\) and \(I_\varepsilon\) both have 10 Hénon layers, each of which has 8 neurons in the single hidden layer of its fully-connected neural network layer potential. This symplectic gyroceptron of 681 trainable parameters was trained for a few thousand epochs on a dataset of 60,000 updates \((q_1,q_2,p_1,p_2) \mapsto ({\tilde{q}}_1,{\tilde{q}}_2,{\tilde{p}}_1,{\tilde{p}}_2)\). For comparison, we also trained a HénonNet2 and a SympNet3 of similar sizes and ran simulations from the same seven initial conditions. The HénonNet used has 16 layers, each of which has 10 neurons in the single hidden layer of its fully-connected neural network layer potential, for a total of 672 trainable parameters. The SympNet3 used has 652 trainable parameters in a network structure of the form \({\mathcal {L}}_n^{(k+1)}\ \circ \ ({\mathcal {N}}_{\text {up/low}}\ \circ \ {\mathcal {L}}_n^{(k)})\ \circ \ \dots \ \circ \ ({\mathcal {N}}_{\text {up/low}}\ \circ \ {\mathcal {L}}_n^{(1)})\), where each \({\mathcal {L}}_n^{(k)}\) is the composition of n trainable linear symplectic layers, and \({\mathcal {N}}_{\text {up/low}}\) is a non-trainable symplectic activation map.
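The parameter counts quoted in this section are consistent with a simple accounting, reconstructed here under an assumption that the text does not state. Each Hénon layer acting on \({\mathbb {R}}^{2n}\) (here n = 2) is assumed to have a potential \(V(y) = \sum _i a_i\,\sigma (w_i\cdot y + b_i)\) with \(hn\) input weights, \(h\) biases and \(h\) output weights for h hidden neurons (an output bias would not affect \(\nabla V\)), plus a shift \(\eta \in {\mathbb {R}}^n\). Near-identity layers are assumed to have the same count, and the rotation angle \(\theta\) adds one parameter.

```python
def henon_layer_params(n, hidden):
    """Assumed trainable parameters of one Hénon layer on R^{2n}: hidden*n input weights,
    hidden biases and hidden output weights for the potential V, plus the shift eta in R^n."""
    return hidden * n + hidden + hidden + n


def henonnet_params(layers, n, hidden):
    """Total for a stack of identical Hénon layers."""
    return layers * henon_layer_params(n, hidden)


n = 2  # q = (q1, q2) and p = (p1, p2)
gyro_small = henonnet_params(10, n, 8) + henonnet_params(8, n, 6) + 1   # psi + I_eps + theta
gyro_large = henonnet_params(10, n, 8) + henonnet_params(10, n, 8) + 1  # psi + I_eps + theta
henonnet = henonnet_params(16, n, 10)
print(gyro_small, gyro_large, henonnet)  # 549 681 672
```

Under this accounting, the totals match the 549, 681 and 672 parameters reported for the two symplectic gyroceptrons and the HénonNet.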

Figure 4

Sample trajectories in \((q_1,p_1)\) and \((q_2,p_2)\) phase spaces (left column: first oscillator, right column: second oscillator) for the nearly-periodic Hamiltonian system (5.1) as the value of the parameter \(\varepsilon\) is increased.

Figure 5

Level sets of the averaged Hamiltonian (5.6), and the symplectic gyroceptron predictions against the reference trajectories for the second oscillator in the nearly-periodic Hamiltonian system (5.1) with \(\varepsilon = 0.01\) and a time-step of 0.05.

Figure 6

Predictions from a symplectic gyroceptron, a SympNet, and a HénonNet against the reference trajectories for the second oscillator in the nearly-periodic Hamiltonian system (5.1) with \(\varepsilon = 0.01\) and the larger time-step of 5.

Figure 6 shows the dynamics predicted by the symplectic gyroceptron, the HénonNet, and the SympNet for seven different initial conditions sharing the same initial values of \((q_1,p_1)\), against reference trajectories generated by the fourth-order Runge–Kutta integrator (RK4) with small time-steps. As before, we only display the trajectories of the second oscillator. The dynamics predicted by the symplectic gyroceptron match the reference trajectories very well, although the predicted oscillations around the level sets of the averaged Hamiltonian are, unsurprisingly, larger than when learning the time-0.05 flow map. The crucial advantage that the symplectic gyroceptron offers over the other architectures considered, which only enforce the symplectic constraint, is the provable existence of an adiabatic invariant. After training, the other architectures may empirically display preservation of an adiabatic invariant, but this cannot be proved rigorously from first principles. In contrast, the symplectic gyroceptron provably admits an adiabatic invariant before, during, and after training.

Note that the symplectic gyroceptron generated the seven trajectories in 5 seconds, roughly three orders of magnitude faster (about 1,200 times) than RK4 with small time-steps, which took 6,055 seconds. The HénonNet simulated the dynamics slightly faster, in 3 seconds, while the SympNet was much slower, with a running time of 127 seconds. This is consistent with the observations made in the original HénonNet paper2, which motivated choosing HénonNets over SympNets in the symplectic gyroceptrons.

Charged particle interacting with its self-generated electromagnetic field

Problem formulation

Next, we test the ability of symplectic gyroceptrons to function as surrogates for higher-dimensional nearly-periodic systems, and for systems where the limiting circle action is not precisely known.

To formulate the ground-truth model, first fix a positive integer K and a sequence of single-variable functions \(V_k:{\mathbb {R}}\rightarrow {\mathbb {R}}\), \(k=1,\dots ,K\). Consider the canonical Hamiltonian system on \({\mathbb {R}}^2\times ({\mathbb {R}}^2)^K\) with coordinates \((q,p,Q_1,P_1,\dots ,Q_K,P_K)\), defined by the Hamiltonian

$$\begin{aligned} H_\epsilon = \frac{1}{2} \epsilon \left( p-\sum _{k=1}^K \sin (kq)\,Q_k\right) ^2 + \frac{1}{2}\sum _{k=1}^Kk\left( [P_k-V_k(Q_k)]^2+ Q_k^2\right) . \end{aligned}$$
(5.7)

The equations of motion are

$$\begin{aligned} {\dot{q}}&= \partial _{p}H_\epsilon = \epsilon \left( p-\sum _{\ell =1}^K \sin (\ell q)\,Q_\ell \right) , \\ {\dot{p}} & = -\partial _{q}H_\epsilon =\epsilon \left( p-\sum _{\ell =1}^K \sin (\ell q)\,Q_\ell \right) \sum _{m=1}^K m \,\cos (m q)\,Q_m , \\ {\dot{Q}}_k&= \partial _{P_k}H_\epsilon = k\,(P_k - V_k(Q_k)) , \\{\dot{P}}_k & = -\partial _{Q_k}H_\epsilon =- k\,Q_k + k\,(P_k - V_k(Q_k))\,V_k^\prime (Q_k) + \epsilon \left( p-\sum _{\ell =1}^K \sin (\ell q)\,Q_\ell \right) \sin (kq). \end{aligned}$$
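
To generate reference trajectories for (5.7), the equations of motion can be coded directly. The sketch below assumes the \(K=2\) potentials \(V_1(Q_1) = \frac{1}{2}\sin(2Q_1)\) and \(V_2(Q_2) = \frac{1}{2}\exp(-5Q_2^2)\) used in the numerical experiments below, orders the state as \((q,p,Q_1,P_1,Q_2,P_2)\), and advances it with classical RK4. The step size, the test state and the function names are illustrative assumptions. Comparing the vector field against a finite-difference gradient of \(H_\epsilon\) is a cheap check that the signs in the equations of motion have been transcribed correctly.

```python
import numpy as np

ks = np.array([1, 2])  # k = 1, ..., K with K = 2

def V(Q):
    """V_k(Q_k) for k = 1, 2 (the choices used in the numerical experiments)."""
    return np.array([0.5 * np.sin(2 * Q[0]), 0.5 * np.exp(-5 * Q[1] ** 2)])

def dV(Q):
    """Derivatives V_k'(Q_k)."""
    return np.array([np.cos(2 * Q[0]), -5 * Q[1] * np.exp(-5 * Q[1] ** 2)])

def hamiltonian(z, eps):
    """H_eps from (5.7); z = (q, p, Q_1, P_1, Q_2, P_2)."""
    q, p, Q, P = z[0], z[1], z[2::2], z[3::2]
    m = p - np.sum(np.sin(ks * q) * Q)
    return 0.5 * eps * m**2 + 0.5 * np.sum(ks * ((P - V(Q)) ** 2 + Q**2))

def vector_field(z, eps):
    """Right-hand side of the equations of motion."""
    q, p, Q, P = z[0], z[1], z[2::2], z[3::2]
    m = p - np.sum(np.sin(ks * q) * Q)   # p - sum_l sin(l q) Q_l
    Pi = P - V(Q)                        # P_k - V_k(Q_k)
    dz = np.empty_like(z)
    dz[0] = eps * m
    dz[1] = eps * m * np.sum(ks * np.cos(ks * q) * Q)
    dz[2::2] = ks * Pi
    dz[3::2] = -ks * Q + ks * Pi * dV(Q) + eps * m * np.sin(ks * q)
    return dz

def rk4_step(z, h, eps):
    """One classical fourth-order Runge-Kutta step."""
    k1 = vector_field(z, eps)
    k2 = vector_field(z + 0.5 * h * k1, eps)
    k3 = vector_field(z + 0.5 * h * k2, eps)
    k4 = vector_field(z + h * k3, eps)
    return z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def reference_flow(z, t, h, eps):
    """Approximate the time-t flow map with many small RK4 steps."""
    for _ in range(int(round(t / h))):
        z = rk4_step(z, h, eps)
    return z

eps = 0.01
z0 = np.array([0.3, 1.0, 0.5, 0.2, -0.4, 0.1])   # arbitrary test state
zT = reference_flow(z0, t=5.0, h=1e-3, eps=eps)
print("energy drift:", abs(hamiltonian(zT, eps) - hamiltonian(z0, eps)))
```

Plain RK4 is not symplectic, so it only keeps \(H_\epsilon\) approximately constant; small time-steps keep this error negligible over a reference horizon.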

These equations may be regarded as a simplified model of a charged particle \((q,p)\) interacting with its self-generated electromagnetic field \((Q_1,P_1,\dots ,Q_K,P_K)\). We will describe the application of symplectic gyroceptrons to the development of a dynamical surrogate for this system when \(\epsilon \ll 1\).

First, we verify that this Hamiltonian system is nearly-periodic, since this is the type of dynamical system that symplectic gyroceptrons are designed to handle. Consider the limiting dynamics when \(\epsilon = 0\). The equations of motion reduce to

$$\dot{q} = 0,\;\; \dot{p} = 0,\;\; \dot{Q}_{k} = \partial _{P_{k}} H_{0} = k\,(P_{k} - V_{k} (Q_{k} )),\;\; \dot{P}_{k} = - \partial _{Q_{k}} H_{0} = - k\,Q_{k} + k\,(P_{k} - V_{k} (Q_{k} ))\,V_{k}^{\prime } (Q_{k} ).$$
(5.8)

While these equations of motion may appear impenetrable at first glance, the symplectic transformation of variables given by \(\Lambda _0^{-1}:(q,p,Q_1,P_1,\dots ,Q_K,P_K)\mapsto (q,p,Q_1,\Pi _1,\dots ,Q_K,\Pi _K)\), where \(\Pi _k = P_k - V_k(Q_k)\) (a shear of each \((Q_k,P_k)\) plane, which is why the map is symplectic), simplifies them dramatically into

$$\begin{aligned} {\dot{q}} =0, \qquad {\dot{p}} =0, \qquad {\dot{Q}}_k = k\,\Pi _k, \qquad {\dot{\Pi }}_k = - k\,Q_k, \end{aligned}$$
(5.9)

which correspond to a family (indexed by k) of harmonic oscillators with angular frequencies k. The solution map in these variables is therefore \(\Phi _t^0(q,p,Q_1,\Pi _1,\dots ,Q_K,\Pi _K) =(q,p,Q_1(t),\Pi _1(t),\dots ,Q_K(t),\Pi _K(t))\), where

$$\begin{aligned} Q_k(t) = \cos (kt)\,Q_k + \sin (kt)\,\Pi _k , \qquad \Pi _k(t) = -\sin (kt)\,Q_k + \cos (kt)\,\Pi _k. \end{aligned}$$
(5.10)

Note that \(\Phi ^0_t\) is periodic with minimal period \(2\pi\), since all the angular frequencies k are integers and the \(k=1\) oscillator has period exactly \(2\pi\). The solution map in terms of the original variables \((Q_k,P_k)\) is therefore \(\Phi _t = \Lambda _0 \ \circ \ \Phi ^0_t \ \circ \ \Lambda _0^{-1}\). Since \(\Phi _t\) is periodic in t with minimal period \(2\pi\), the ground-truth equations are Hamiltonian and nearly-periodic. The leading-order adiabatic invariant is

$$\begin{aligned} \mu _0 \ = \ \frac{1}{2} \sum _{k=1}^K k\,(\Pi _k^2 + Q_k^2) \ = \ \frac{1}{2} \sum _{k=1}^K k\,([P_k - V_k(Q_k)]^2 + Q_k^2). \end{aligned}$$
(5.11)

Symplectic gyroceptrons are therefore well-suited to surrogate modeling for this system.
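
The change of variables \(\Lambda_0\), the limiting flow (5.10) and the leading-order adiabatic invariant (5.11) can be checked numerically. The sketch below reuses the \(K=2\) potentials of the numerical experiments below and an arbitrary test state (both illustrative assumptions, as are the function names), evaluates \(\Phi_t = \Lambda_0 \circ \Phi^0_t \circ \Lambda_0^{-1}\), and checks that it returns to the initial state at \(t = 2\pi\) but not at \(t = \pi\), while \(\mu_0\) stays constant along it.

```python
import numpy as np

ks = np.array([1, 2])  # K = 2

def V(Q):
    """Assumed V_1, V_2 (the choices used in the numerical experiments)."""
    return np.array([0.5 * np.sin(2 * Q[0]), 0.5 * np.exp(-5 * Q[1] ** 2)])

def lambda0_inv(z):
    """(q, p, Q_k, P_k) -> (q, p, Q_k, Pi_k) with Pi_k = P_k - V_k(Q_k)."""
    w = z.copy()
    w[3::2] = z[3::2] - V(z[2::2])
    return w

def lambda0(w):
    """Inverse shear: (q, p, Q_k, Pi_k) -> (q, p, Q_k, P_k = Pi_k + V_k(Q_k))."""
    z = w.copy()
    z[3::2] = w[3::2] + V(w[2::2])
    return z

def phi0(w, t):
    """Exact epsilon = 0 flow (5.10): rotate each (Q_k, Pi_k) by the angle k t."""
    c, s = np.cos(ks * t), np.sin(ks * t)
    Q, Pi = w[2::2], w[3::2]
    out = w.copy()
    out[2::2] = c * Q + s * Pi
    out[3::2] = -s * Q + c * Pi
    return out

def phi(z, t):
    """Limiting flow in the original variables: Lambda_0 o Phi^0_t o Lambda_0^{-1}."""
    return lambda0(phi0(lambda0_inv(z), t))

def mu0(z):
    """Leading-order adiabatic invariant (5.11)."""
    Q, P = z[2::2], z[3::2]
    return 0.5 * np.sum(ks * ((P - V(Q)) ** 2 + Q**2))

z0 = np.array([0.3, 1.0, 0.5, 0.2, -0.4, 0.1])
print(np.allclose(phi(z0, 2 * np.pi), z0))   # one full period
print(np.allclose(phi(z0, np.pi), z0))       # half a period
print(max(abs(mu0(phi(z0, t)) - mu0(z0)) for t in (0.5, 1.0, 3.0)))
```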

Numerical experiments

Here, we learn the nearly-periodic Hamiltonian system (5.7) in the 6-dimensional case (i.e., \(K=2\)) with \(V_1(Q_1) = \frac{1}{2} \sin (2Q_1)\) and \(V_2(Q_2) = \frac{1}{2} \exp { (-5 Q_2^2)}\). In our symplectic gyroceptron \(I_\varepsilon \ \circ \ \psi \ \circ \ \Phi _{\theta } \ \circ \ \psi ^{-1}\), the circle action \(\Phi _{\theta }\) is taken to be the rotation in Eq. (5.10) with \(\theta\) treated as a trainable parameter, and the HénonNets \(\psi\) and \(I_\varepsilon\) each consist of 12 Hénon layers, each of which has 8 neurons in the single hidden layer of its fully-connected neural network layer potential. The resulting architecture, with 1,033 trainable parameters, was trained for a few thousand epochs on a dataset of 60,000 updates \((q,p,Q_1,P_1,Q_2,P_2) \mapsto ({\tilde{q}},{\tilde{p}},{\tilde{Q}}_1,{\tilde{P}}_1,{\tilde{Q}}_2,{\tilde{P}}_2)\).
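
The parameter counts quoted for both experiments are consistent with a simple accounting, sketched below under an assumed parameterization of each Hénon layer on \({\mathbb R}^{2n}\): a potential \(V(y) = \sum_i a_i\,\sigma(w_i\cdot y + b_i)\) with hidden weights, hidden biases and output weights, plus the shift \(\eta \in {\mathbb R}^n\). An output bias is left out because a constant added to \(V\) does not change \(\nabla V\). With \(n = 2\) for the two-oscillator system and \(n = 3\) here, this accounting gives 681, 672 and 1,033, where the extra 1 in each symplectic gyroceptron is \(\theta\). The function names are hypothetical.

```python
def henon_layer_params(n: int, width: int) -> int:
    """Trainable parameters of one Henon layer on R^{2n} (assumed parameterization)."""
    hidden_weights = width * n   # w_i in R^n for each hidden neuron
    hidden_biases = width        # b_i
    output_weights = width       # a_i (no output bias: a constant does not change grad V)
    shift = n                    # eta in R^n
    return hidden_weights + hidden_biases + output_weights + shift

def henonnet_params(n: int, width: int, layers: int) -> int:
    """A HenonNet is a stack of independently parameterized Henon layers."""
    return layers * henon_layer_params(n, width)

def symplectic_gyroceptron_params(n: int, width: int, layers_psi: int, layers_I: int) -> int:
    """HenonNets psi and I_eps plus the single trainable angle theta."""
    return henonnet_params(n, width, layers_psi) + henonnet_params(n, width, layers_I) + 1

print(symplectic_gyroceptron_params(n=2, width=8, layers_psi=10, layers_I=10))  # 681
print(henonnet_params(n=2, width=10, layers=16))                                 # 672
print(symplectic_gyroceptron_params(n=3, width=8, layers_psi=12, layers_I=12))  # 1033
```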

To verify visually that we have learnt the dynamics successfully, we select initial conditions on the zero level set of the adiabatic invariant \(\mu _0\). There, the dynamics should remain on the slow manifold, which is lower-dimensional and thus easier to portray. For the Hamiltonian system (5.7), the slow manifold is the zero level set \(\mu _0 = 0\). Since \(\mu _0\) in Eq. (5.11) is a sum of non-negative terms, it vanishes exactly at the points \((q,p,Q_1,P_1,Q_2,P_2)\) such that \(Q_1 = Q_2 = 0\) and \(P_1 = V_1(Q_1) = V_1(0), \ P_2 = V_2(Q_2) = V_2(0)\). For the choices of \(V_1\) and \(V_2\) above, \(V_1(0) = 0\) and \(V_2(0) = \frac{1}{2}\).

On that slow manifold, the dynamics reduce to

$$\dot{q} = \varepsilon p,\;\;\;\dot{p} = 0,\;\;\;\dot{Q}_{1} = 0,\;\;\;\dot{Q}_{2} = 0,\;\;\;\dot{P}_{1} = \varepsilon p\sin (q),\;\;\;\dot{P}_{2} = \varepsilon p\sin (2q),$$
(5.12)

where in particular the \((q,p)\) dynamics are now independent of \((Q_1, Q_2, P_1, P_2)\) and can easily be solved for explicitly, given some initial conditions \(\left(q(0),p(0)\right) = (q_0,p_0)\):

$$q(t) = q_0 + \epsilon p_0 t, \qquad p(t) = p_0.$$
(5.13)

Figures 7a,b show that the trained symplectic gyroceptron generates predictions for the evolution of q and p which remain very close to the true trajectories on the slow manifold when the initial conditions are selected on the zero level set of \(\mu _0\).

We also generate dynamics outside the zero level set of \(\mu _0\) and verify that the quantity \({\mathfrak {I}}_0 \ \circ \ \psi ^{-1}\) matches the learnt adiabatic invariant \(\mu ^{learnt}_0\) along the trajectories generated by the symplectic gyroceptron \(I_\varepsilon \ \circ \ \psi \ \circ \ \Phi _{\theta } \ \circ \ \psi ^{-1}\), where

$$\begin{aligned} & {\mathfrak {I}}_0 (q,p,Q_1, \Pi _1, Q_2, \Pi _2) = \frac{1}{2} \sum _{k=1}^{K=2} k\,(\Pi _k^2 + Q_k^2), \\ \text {and} \quad &\mu ^{learnt}_0 (q,p,Q_1,P_1, Q_2,P_2) = \frac{1}{2} \sum _{k=1}^{K=2}k\,([P_k - V_k(Q_k)]^2 + Q_k^2). \end{aligned}$$
(5.14)

More precisely, we check whether \({\mathfrak {I}}_0 \ \circ \ \psi ^{-1} = \mu ^{learnt}_0\) with both quantities being approximately constant along trajectories generated by the symplectic gyroceptron, where

$$\begin{aligned} {\mathfrak {I}}_0 \ \circ \ \psi ^{-1}(q,p,Q_1,P_1, Q_2,P_2) = \frac{1}{2} \sum _{k=1}^{K=2} k\,({\tilde{\Pi }}_k^2 + {\tilde{Q}}_k^2), \ \ \text { with } \ ({\tilde{q}}, {\tilde{p}},{\tilde{Q}}_1, {\tilde{\Pi }}_1,{\tilde{Q}}_2, {\tilde{\Pi }}_2 )= \psi ^{-1}(q,p,Q_1,P_1, Q_2,P_2). \end{aligned}$$
(5.15)

From Figure 7c, we see that along trajectories that do not start on the zero level set of \(\mu _0\), the value of \({\mathfrak {I}}_0 \ \circ \ \psi ^{-1}\) remains close to the approximately constant quantity \(\mu ^{learnt}_0\), although \({\mathfrak {I}}_0 \ \circ \ \psi ^{-1}\) displays small oscillations. Since \({\mathfrak {I}}_0 \ \circ \ \psi ^{-1}\) is an adiabatic invariant for the network, these oscillations remain bounded in amplitude over very long time intervals. The amplitude could in principle be reduced by finding a better set of network weights, but it can never be reduced to zero, since the true adiabatic invariant is not exactly conserved (oscillations in \(\mu _0\) are not visible at the scales displayed in the plot).
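
Why \({\mathfrak {I}}_0 \ \circ \ \psi ^{-1}\) is the natural quantity to monitor can be seen from the structure of the network. Assuming, as the construction suggests, that \(I_\varepsilon\) reduces to the identity at \(\varepsilon = 0\), the limiting map \(\psi \circ \Phi_\theta \circ \psi^{-1}\) is conjugate to the rotation \(\Phi_\theta\), so \({\mathfrak {I}}_0 \ \circ \ \psi ^{-1}\) is conserved exactly by construction. The sketch below, with random untrained weights, is only an illustration under stated assumptions: the Hénon layers use a tanh potential, the positions \((q,Q_1,Q_2)\) play the role of \(x\) and the momenta \((p,\Pi_1,\Pi_2)\) that of \(y\), and \(I_\varepsilon\) is modeled hypothetically as a HénonNet whose shifts and potential gradients are scaled by \(\varepsilon\). It is not the trained network used for Figure 7.

```python
import numpy as np

ks = np.array([1, 2])  # K = 2

def init_layer(n, width, rng, scale=0.2):
    """Random (untrained) Henon layer: V(y) = a . tanh(W y + b) and shift eta."""
    return {"W": scale * rng.normal(size=(width, n)), "b": scale * rng.normal(size=width),
            "a": scale * rng.normal(size=width), "eta": scale * rng.normal(size=n)}

def grad_V(y, L, s):
    t = np.tanh(L["W"] @ y + L["b"])
    return s * (L["W"].T @ (L["a"] * (1.0 - t**2)))

def henon_layer(z, L, s=1.0):
    """Four Henon maps (x, y) -> (y + s*eta, -x + s*grad V(y)) on interleaved z."""
    x, y = z[0::2], z[1::2]          # x = (q, Q_1, Q_2), y = (p, Pi_1, Pi_2)
    for _ in range(4):
        x, y = y + s * L["eta"], -x + grad_V(y, L, s)
    out = np.empty_like(z)
    out[0::2], out[1::2] = x, y
    return out

def henon_layer_inv(z, L, s=1.0):
    """Explicit inverse of henon_layer."""
    X, Y = z[0::2], z[1::2]
    for _ in range(4):
        y = X - s * L["eta"]
        X, Y = grad_V(y, L, s) - Y, y
    out = np.empty_like(z)
    out[0::2], out[1::2] = X, Y
    return out

def net(z, layers, s=1.0):
    for L in layers:
        z = henon_layer(z, L, s)
    return z

def net_inv(z, layers, s=1.0):
    for L in reversed(layers):
        z = henon_layer_inv(z, L, s)
    return z

def circle_action(w, theta):
    """Rotation (5.10) of each (Q_k, Pi_k) pair by the angle k*theta."""
    c, s = np.cos(ks * theta), np.sin(ks * theta)
    Q, Pi = w[2::2], w[3::2]
    out = w.copy()
    out[2::2], out[3::2] = c * Q + s * Pi, -s * Q + c * Pi
    return out

def gyroceptron(z, psi, I_layers, theta, eps):
    """I_eps o psi o Phi_theta o psi^{-1}; at eps = 0 the hypothetical I_eps is the identity."""
    return net(net(circle_action(net_inv(z, psi), theta), psi), I_layers, eps)

def frak_I0(w):
    """I_0 from (5.14), evaluated at w = psi^{-1}(z)."""
    return 0.5 * np.sum(ks * (w[3::2] ** 2 + w[2::2] ** 2))

rng = np.random.default_rng(1)
psi = [init_layer(3, 8, rng) for _ in range(12)]
I_layers = [init_layer(3, 8, rng) for _ in range(12)]
theta = 0.7   # trainable in the experiments; fixed here

def spread(eps, steps=20):
    """Max minus min of I_0 o psi^{-1} along an orbit of the gyroceptron."""
    z = np.array([0.3, 1.0, 0.5, 0.2, -0.4, 0.1])
    vals = [frak_I0(net_inv(z, psi))]
    for _ in range(steps):
        z = gyroceptron(z, psi, I_layers, theta, eps)
        vals.append(frak_I0(net_inv(z, psi)))
    return max(vals) - min(vals)

print("spread at eps = 0:   ", spread(0.0))
print("spread at eps = 1e-3:", spread(1e-3))
```

At \(\varepsilon = 0\) the spread is at round-off level; for \(\varepsilon > 0\) it should be small but nonzero, since the near-identity factor \(I_\varepsilon\) breaks exact invariance.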

Figure 7

(a, b) Symplectic gyroceptron predictions (colors) against the true trajectories (dashed) for 4 different choices of initial conditions on the zero level set of the adiabatic invariant for the nearly-periodic Hamiltonian system (5.7) with \(\varepsilon = 0.01\). (c) Evolution of \({\mathfrak {I}}_0 \ \circ \ \psi ^{-1}\) (colors) and \(\mu ^{learnt}_0\) (dashed lines) along trajectories generated by the symplectic gyroceptron for 3 different choices of initial conditions for the nearly-periodic Hamiltonian system (5.7) with \(\varepsilon = 0.01\).

Discussion

In this paper, we have successfully constructed novel structure-preserving neural network architectures, gyroceptrons and symplectic gyroceptrons, to learn nearly-periodic maps and nearly-periodic symplectic maps, respectively. By construction, these proposed architectures define nearly-periodic maps, and symplectic gyroceptrons also preserve symplecticity. Furthermore, it was confirmed experimentally that in the symplectic case, the maps generated by the proposed symplectic gyroceptrons admit discrete-time adiabatic invariants, regardless of the values of their parameters and weights.

We also demonstrated that the proposed architectures can be used effectively in practice, by learning highly accurate surrogate maps for the nearly-periodic symplectic flow maps associated with two different nearly-periodic Hamiltonian systems. Note that the hyperparameters in our architectures have not been optimized to maximize the quality of our training outcomes, and future applications of this architecture may benefit from further hyperparameter tuning.

Symplectic gyroceptrons provide a promising class of architectures for surrogate modeling of non-dissipative dynamical systems: they automatically step over short timescales without introducing spurious instabilities, and could find future applications to the Klein–Gordon equation in the weakly-relativistic regime, to charged particles moving through a strong magnetic field, and to the rotating inviscid Euler equations in quasi-geostrophic scaling7. Symplectic gyroceptrons could also be used for structure-preserving simulation of non-canonical Hamiltonian systems on exact symplectic manifolds11, which have numerous applications across the physical sciences, for instance in modeling weakly-dissipative plasma systems36,37,38,39,40,41,42.

The approach to symplectic gyroceptrons presented here targets surrogate modeling problems, where the dynamical system of interest is known but slow or expensive to simulate. In principle, symplectic gyroceptrons could also be used to discover dynamical models from observational data without detailed knowledge of the underlying dynamical system. However, in order to apply symplectic gyroceptrons effectively in this context, data-mining methods must be developed for learning the topological conjugacy class of the limiting circle action. Given a topological classification of circle actions on the relevant state space (e.g., see64 for the case of a 3-dimensional state space), a straightforward approach would be to test an ensemble of topologically distinct circle actions and keep the one giving the best results. A more nuanced approach would use the observed dynamics to estimate values for the classifying topological invariants of a circle action. This topological learning problem warrants further investigation.