Approximation of nearly-periodic symplectic maps via structure-preserving neural networks

A continuous-time dynamical system with parameter ε is nearly-periodic if all its trajectories are periodic with nowhere-vanishing angular frequency as ε approaches 0. Nearly-periodic maps are discrete-time analogues of nearly-periodic systems, defined as parameter-dependent diffeomorphisms that limit to rotations along a circle action, and they admit formal U(1) symmetries to all orders when the limiting rotation is non-resonant. For Hamiltonian nearly-periodic maps on exact presymplectic manifolds, the formal U(1) symmetry gives rise to a discrete-time adiabatic invariant. In this paper, we construct a novel structure-preserving neural network to approximate nearly-periodic symplectic maps. This neural network architecture, which we call the symplectic gyroceptron, ensures that the resulting surrogate map is nearly-periodic and symplectic, and that it gives rise to a discrete-time adiabatic invariant and long-time stability. This new structure-preserving neural network provides a promising architecture for surrogate modeling of non-dissipative dynamical systems that automatically steps over short timescales without introducing spurious instabilities.


Introduction
Dynamical systems evolve according to the laws of physics, which can usually be described using differential equations. By solving these differential equations, it is possible to predict the future states of the dynamical system. Identifying accurate and efficient dynamic models based on observed trajectories is thus critical for the analysis, simulation and control of dynamical systems. We consider here the problem of learning dynamics: given a dataset of trajectories followed by a dynamical system, we wish to infer the dynamical law responsible for these trajectories and then possibly use that law to predict the evolution of similar systems in different initial states. We are particularly interested in the surrogate modeling problem: the underlying dynamical system is known, but traditional simulations are either too slow or too expensive for some optimization task. This problem can be addressed by learning a less expensive, but less accurate, surrogate for the simulations.
Models obtained from first principles are extensively used across science and engineering. Unfortunately, due to incomplete knowledge, these models based on physical laws tend to over-simplify or incorrectly describe the underlying structure of the dynamical systems, and usually lead to high bias and modeling errors that cannot be corrected by optimizing over the few parameters in the models.
Deep learning architectures can provide very expressive models for function approximation, and have proven very effective in numerous contexts [1][2][3] . Unfortunately, standard non-structure-preserving neural networks struggle to learn the symmetries and conservation laws underlying dynamical systems: they tend to prefer representations of the dynamics in which these symmetries and conservation laws are not exactly enforced, and as a result they are often incapable of producing physically plausible results when applied to new unseen states, and do not generalize well. Deep learning models capable of learning and generalizing dynamics effectively are typically over-parameterized, and as a consequence tend to have high variance and can be very difficult to interpret 4 . Also, training these models usually requires large datasets and a long computational time, which makes them prohibitively expensive for many applications.
A recent research direction is to consider a hybrid approach which combines knowledge of physics laws and deep learning architectures 2,3,5,6 . The idea is to encode physics laws and the conservation of geometric properties of the underlying systems in the design of the neural networks or in the learning process. Available physics prior knowledge can be used to construct physics-constrained neural networks with improved design and efficiency and a better generalization capacity, which take advantage of the function approximation power of neural networks to deal with incomplete knowledge.
In this paper, we will consider the problem of learning dynamics for highly-oscillatory Hamiltonian systems. Examples include the Klein-Gordon equation in the weakly-relativistic regime, charged particles moving through a strong magnetic field, and the rotating inviscid Euler equations in quasi-geostrophic scaling 7 . More generally, any Hamiltonian system may be embedded as a normally-stable elliptic slow manifold in a nearly-periodic Hamiltonian system 8 . Highly-oscillatory Hamiltonian systems exhibit two basic structural properties whose interactions play a crucial role in their long-term dynamics. First is preservation of the symplectic form, as for all Hamiltonian systems. Second is timescale separation, corresponding to the relatively short timescale of oscillations compared with slower secular drifts. Coexistence of these two structural properties implies the existence of an adiabatic invariant [8][9][10][11] . Adiabatic invariants differ from true constants of motion, in particular energy invariants, which do not change at all over arbitrary time intervals. Instead, adiabatic invariants are conserved with limited precision over very large time intervals. There are no learning frameworks available today that exactly preserve the two structural properties whose interplay gives rise to adiabatic invariants. This work addresses this challenge by exploiting a recently-developed theory of nearly-periodic symplectic maps 11 , which can be thought of as discrete-time analogues of highly-oscillatory Hamiltonian systems 9 .
As a result of being symplectic, a mapping enjoys a number of special properties. In particular, symplectic mappings are closely related to Hamiltonian systems: any solution to a Hamiltonian system is a symplectic flow 12 , and any symplectic flow corresponds locally to an appropriate Hamiltonian system 13 . It is well-known that preserving the symplecticity of a Hamiltonian system when constructing a discrete approximation of its flow map ensures the preservation of many aspects of the dynamical system, such as energy conservation, and leads to physically well-behaved discrete solutions over exponentially-long time intervals [13][14][15][16][17] . It is thus important to have structure-preserving neural network architectures which can learn symplectic maps and ensure that the learnt surrogate map preserves symplecticity. Many physics-informed and structure-preserving machine learning approaches have recently been proposed to learn Hamiltonian dynamics and symplectic maps 2,3,[18][19][20][21][22][23][24][25][26][27][28][29][30][31][32][33][34][35] . In particular, Hénon Neural Networks (HénonNets) 2 can approximate arbitrarily well any symplectic map via compositions of simple yet expressive elementary symplectic mappings called Hénon-like mappings. In the numerical experiments conducted in this paper, HénonNets 2 will be our preferred choice of symplectic map approximator to use as a building block in our framework for approximation of nearly-periodic symplectic maps, although some of the other approaches listed above for approximating symplectic mappings could be used within our framework as well.
As shown by Kruskal 9 , every nearly-periodic system, Hamiltonian or not, admits an approximate U(1)-symmetry, determined to leading order by the unperturbed periodic dynamics. It is well-known that a Hamiltonian system which admits a continuous family of symmetries also admits a corresponding conserved quantity. It is thus not surprising that a nearly-periodic Hamiltonian system, which admits an approximate symmetry, must also have an approximate conservation law 11 ; the approximately conserved quantity is referred to as an adiabatic invariant.
Nearly-periodic maps, first introduced by Burby et al. 11 , are natural discrete-time analogues of nearly-periodic systems, and have important applications to the numerical integration of nearly-periodic systems. Nearly-periodic maps may also be used as tools for structure-preserving simulation of non-canonical Hamiltonian systems on exact symplectic manifolds 11 , which have numerous applications across the physical sciences. Non-canonical Hamiltonian systems play an especially important role in modeling weakly-dissipative plasma systems [36][37][38][39][40][41][42] . Similarly to the continuous-time case, nearly-periodic maps with a Hamiltonian structure (that is, symplecticity) admit an approximate symmetry and as a result also possess an adiabatic invariant 11 . The adiabatic invariants that our networks target only arise in purely Hamiltonian systems: just as dissipation breaks the link between exact symmetries and conservation laws in Hamiltonian systems, it also breaks the link between approximate symmetries and approximate conservation laws. We are not considering systems with symmetries that are broken by dissipation or some other mechanism, but rather systems which possess approximate symmetries. This should be contrasted with other frameworks [43][44][45] which develop machine learning techniques for systems that explicitly include dissipation.
We note that neural network architectures designed for multi-scale dynamics and long-time dependencies are available 46 , and that many authors have introduced numerical algorithms specifically designed to efficiently step over high-frequency oscillations [47][48][49] . However, the problem of developing surrogate models for dynamical systems that avoid resolving short oscillations remains open. Such surrogates would accelerate optimization algorithms that require querying the dynamics of an oscillatory system during the optimizer's "inner loop". The network architecture presented in this article represents a first important step toward a general solution of this problem. Among its advantages, it aims to learn a fast surrogate model that can resolve long-time dynamics using very short time data, and it is guaranteed to enjoy symplectic universal approximation within the class of nearly-periodic maps. As developed in this paper, our method applies to dynamical systems that exhibit a single fast mode of oscillation. In particular, when initial conditions for the surrogate model are selected on the zero level set of the learned adiabatic invariant, the network automatically integrates along the slow manifold [50][51][52][53][54] . While our network architecture generalizes in a straightforward manner to handle multiple non-resonant modes, it cannot be applied to dynamical systems that exhibit resonant surfaces.
Note that many of the approaches listed earlier for physics-based or structure-preserving learning of Hamiltonian dynamics focus on learning the vector field associated to the continuous-time Hamiltonian system, while others learn a discrete-time symplectic approximation to the flow map of the Hamiltonian system. In many contexts, we do not need to infer the continuous-time dynamics, and only need a surrogate model which can rapidly generate accurate predictions which remain physically consistent for a long time. Learning a discrete-time approximation to the evolution or flow map, instead of learning the continuous-time vector field, allows for fast prediction and simulation without the need to integrate differential equations or use neural ODEs and adjoint techniques (which can be very expensive and can introduce additional errors due to discretization). In this paper, we will learn nearly-periodic symplectic approximations to the flow maps of nearly-periodic Hamiltonian systems, with the intention of obtaining algorithms which can generate accurate and physically-consistent simulations much faster than traditional integrators.
Outline. We first briefly review some background notions from differential geometry in Section 2.1. Then, we discuss how symplectic maps can be approximated using HénonNets in Section 2.2, before defining nearly-periodic systems and maps and reviewing their important properties in Section 2.3. In Section 3, we introduce novel neural network architectures, gyroceptrons and symplectic gyroceptrons, to approximate symplectic and non-symplectic nearly-periodic maps. We then show in Section 4 that symplectic gyroceptrons admit adiabatic invariants regardless of the values of their weights. Finally, in Section 5, we demonstrate how the proposed architecture can be used to learn surrogate maps for the nearly-periodic symplectic flow maps associated to two different systems: a nearly-periodic Hamiltonian system composed of two nonlinearly coupled oscillators (Section 5.1), and the nearly-periodic Hamiltonian system describing the evolution of a charged particle interacting with its self-generated electromagnetic field (Section 5.2).

Differential Geometry Background
In this paper, we reserve the symbol M for a smooth manifold equipped with a smooth auxiliary Riemannian metric g, and E will always denote a vector space for the parameter ε. We will now briefly introduce some standard concepts from differential geometry that will be used throughout this paper (more details can be found in introductory differential geometry books [55][56][57]).
A smooth map h ∶ M_1 → M_2 between smooth manifolds M_1, M_2 is a diffeomorphism if it is bijective with a smooth inverse. A vector field on a manifold M is a map X ∶ M → TM such that X(m) ∈ T_m M for all m ∈ M, where T_m M denotes the tangent space to M at m and TM = {(m, v) ∶ m ∈ M, v ∈ T_m M} is the tangent bundle of M. The vector space dual to T_m M is the cotangent space T*_m M, and the cotangent bundle of M is T*M = {(m, p) ∶ m ∈ M, p ∈ T*_m M}. The integral curve at m of a vector field X is the smooth curve c on M such that c(0) = m and c′(t) = X(c(t)). The flow of a vector field X is the collection of maps ϕ_t ∶ M → M such that t ↦ ϕ_t(m) is the integral curve of X with initial condition m ∈ M.
A k-form on a manifold M is a map which assigns to every point m ∈ M a skew-symmetric k-multilinear map on T_m M. Let α be a k-form and β be an s-form on a manifold M. Their tensor product α ⊗ β at m ∈ M is defined via

(α ⊗ β)_m (v_1, ..., v_{k+s}) = α_m (v_1, ..., v_k) β_m (v_{k+1}, ..., v_{k+s}).

The alternating operator Alt acts on a k-form α via

Alt(α)(v_1, ..., v_k) = (1/k!) Σ_{π ∈ S_k} sgn(π) α(v_{π(1)}, ..., v_{π(k)}),

where S_k is the group of all the permutations of {1,...,k} and sgn(π) is the sign of the permutation. The wedge product α ∧ β is then defined via

α ∧ β = ((k+s)! / (k! s!)) Alt(α ⊗ β).

The exterior derivative of a smooth function f ∶ M → R is its differential df, and the exterior derivative dα of a k-form α with k > 0 is the (k+1)-form which in local coordinates is given by

d(a dx_{i_1} ∧ ... ∧ dx_{i_k}) = da ∧ dx_{i_1} ∧ ... ∧ dx_{i_k}.

The interior product ι_X α, where X is a vector field on M and α is a k-form, is the (k − 1)-form defined via

(ι_X α)(v_1, ..., v_{k−1}) = α(X, v_1, ..., v_{k−1}).

The pull-back ψ*α of α by a smooth map ψ ∶ M → N is the k-form defined by

(ψ*α)_m (v_1, ..., v_k) = α_{ψ(m)} (T_m ψ(v_1), ..., T_m ψ(v_k)).

The Lie derivative L_X α of the k-form α along a vector field X with flow ϕ_t is L_X α = (d/dt)|_{t=0} ϕ_t* α, and for a smooth function f, L_X f = df(X). The circle group U(1), also known as the first unitary group, is the one-dimensional Lie group of complex numbers of unit modulus with the standard multiplication operation. It can be parametrized via e^{iθ} for θ ∈ [0, 2π), and is isomorphic to the special orthogonal group SO(2) of rotations in the plane. A circle action on a manifold M is a one-parameter family of smooth diffeomorphisms Φ_θ ∶ M → M that satisfies the following three properties for any θ, θ_1, θ_2 ∈ U(1) ≅ R mod 2π:

Φ_0 = Id_M,   Φ_{θ_1} ○ Φ_{θ_2} = Φ_{θ_1 + θ_2},   Φ_{θ + 2π} = Φ_θ.

The infinitesimal generator of a circle action Φ_θ on M is the vector field R on M defined by R(m) = (d/dθ)|_{θ=0} Φ_θ(m).
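As a concrete illustration of these abstract definitions (an illustrative sketch of ours, not code from the paper), the prototypical circle action, rotation of the plane, can be checked numerically against the three defining properties and the definition of the infinitesimal generator:

```python
# The planar rotation group as a circle action on M = R^2, checked
# numerically: identity at theta = 0, the group law, 2*pi-periodicity,
# and the infinitesimal generator d/dtheta|_0 Phi_theta(m) = (-y, x).
import numpy as np

def Phi(theta, m):
    """Rotate the point m in R^2 by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([c * m[0] - s * m[1], s * m[0] + c * m[1]])

def generator(m):
    """Infinitesimal generator of the rotation action at m."""
    return np.array([-m[1], m[0]])

m = np.array([0.7, -1.2])
t1, t2 = 0.4, 1.1
assert np.allclose(Phi(0.0, m), m)                          # Phi_0 = Id
assert np.allclose(Phi(t1, Phi(t2, m)), Phi(t1 + t2, m))    # group law
assert np.allclose(Phi(2 * np.pi, m), m)                    # periodicity
# finite-difference check of the generator
h = 1e-6
assert np.allclose((Phi(h, m) - m) / h, generator(m), atol=1e-5)
```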

Approximation of Symplectic Maps via Hénon Neural Networks
A smooth map Φ ∶ R^{2n} → R^{2n} is symplectic if its Jacobian DΦ satisfies the symplectic condition

(DΦ)^T J (DΦ) = J,   where J is the standard symplectic matrix with blocks J = ( 0  I_n ; −I_n  0 ).   (2.1)

The symplectic condition (2.1) implies that the mapping Φ has a number of special properties. In particular, there is a close relation between Hamiltonian systems and symplecticity of flows: Poincaré's Theorem 12 states that any solution to a Hamiltonian system is a symplectic flow, and it can also be shown that any symplectic flow corresponds locally to an appropriate Hamiltonian system. Preserving the symplecticity of a Hamiltonian system when constructing a discrete approximation of its flow map ensures the preservation of many aspects of the dynamical system, such as energy conservation, and leads to physically well-behaved discrete solutions [13][14][15][16][17] . It is thus important to have structure-preserving network architectures which can learn symplectic maps.
The space of all symplectic maps is infinite-dimensional 58 , so the problem of approximating an arbitrary symplectic map using compositions of simpler symplectic mappings is inherently interesting. Turaev 59 showed that every symplectic map may be approximated arbitrarily well by compositions of Hénon-like maps, which are special elementary symplectic maps.

Definition 2.1 Let V ∶ R^n → R be a smooth function on R^n and let η ∈ R^n be a constant. We define the Hénon-like map H[V, η] ∶ R^n × R^n → R^n × R^n via

H[V, η](x, y) = (y + η, −x + ∇V(y)).

Theorem 2.1 (Turaev 59 ) Let Φ ∶ U → R^{2n} be a C^r symplectic mapping on an open set U ⊂ R^{2n}. For each compact set C ⊂ U and each δ > 0, there exist a smooth function V ∶ R^n → R, a constant η ∈ R^n, and a positive integer N such that H[V, η]^{4N} approximates the mapping Φ within δ in the C^r topology.
Remark 2.1 The significance of the number 4 in this theorem follows from the fact that the fourth iterate of the Hénon-like map with trivial potential V = 0 is the identity map: H[0, η]^4 = Id.

Turaev's result suggests a specific neural network architecture to approximate symplectic mappings using Hénon-like maps 2 . We review the construction of HénonNets 2 , starting with the notion of a Hénon layer.

Definition 2.2 Let η ∈ R^n be a constant vector, and let V be a scalar feed-forward neural network on R^n, that is, a smooth mapping V ∶ W × R^n → R, where W is a space of neural network weights. The Hénon layer with potential V, shift η, and weight W ∈ W is the iterated Hénon-like map

L[V, η, W] = H[V[W], η]^4,

where we use the notation V[W] to denote the mapping V[W](y) = V(W, y), for any y ∈ R^n, W ∈ W.
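The two facts underlying this construction, the identity property of the trivial fourth iterate and the symplecticity of each Hénon-like map, can be verified directly. The sketch below is ours (names are illustrative), assuming the Hénon-like map takes the form H[V, η](x, y) = (y + η, −x + ∇V(y)) from the HénonNet construction:

```python
# Check numerically that (i) the fourth iterate of the Henon-like map with
# V = 0 is the identity, and (ii) the Jacobian DH = [[0, I], [-I, Hess V]]
# satisfies the symplectic condition DH^T J DH = J.
import numpy as np

def henon_like(grad_V, eta):
    """Return the Henon-like map (x, y) -> (y + eta, -x + grad_V(y))."""
    def H(x, y):
        return y + eta, -x + grad_V(y)
    return H

n = 2
rng = np.random.default_rng(0)
x, y = rng.standard_normal(n), rng.standard_normal(n)
eta = rng.standard_normal(n)

# (i) trivial potential: fourth iterate is the identity for any shift eta
H0 = henon_like(lambda y: np.zeros_like(y), eta)
x4, y4 = x.copy(), y.copy()
for _ in range(4):
    x4, y4 = H0(x4, y4)
assert np.allclose(x4, x) and np.allclose(y4, y)

# (ii) symplecticity for a nontrivial potential V(y) = sum(y^4) / 4
Hess = np.diag(3 * y**2)            # Hessian of V at y
I, Z = np.eye(n), np.zeros((n, n))
DH = np.block([[Z, I], [-I, Hess]])
J = np.block([[Z, I], [-I, Z]])
assert np.allclose(DH.T @ J @ DH, J)
```

The symplecticity check only uses the symmetry of the Hessian, which is why any smooth potential V yields a symplectic Hénon-like map.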

There are various network architectures for the potential V[W] that are capable of approximating any smooth function V ∶ R^n → R with any desired level of accuracy. For example, a fully-connected neural network with a single hidden layer of sufficient width can approximate any smooth function. Therefore a corollary of Theorem 2.1 is that any symplectic map may be approximated arbitrarily well by the composition of sufficiently many Hénon layers with various potentials and shifts. This leads to the notion of a Hénon Neural Network.
Definition 2.3 Let N be a positive integer, and let
• V = {V_k}_{k∈{1,...,N}} be a family of scalar feed-forward neural networks on R^n,
• W = {W_k}_{k∈{1,...,N}} be a family of network weights for V,
• η = {η_k}_{k∈{1,...,N}} be a family of constants in R^n.
The Hénon neural network (HénonNet) with layer potentials V, layer weights W, and layer shifts η is the mapping given by the composition of the N Hénon layers,

H[V_N[W_N], η_N]^4 ○ ... ○ H[V_1[W_1], η_1]^4.

A composition of symplectic mappings is also symplectic, so every HénonNet is a symplectic mapping, regardless of the architectures of the networks V_k and of the weights W_k. Furthermore, Turaev's Theorem 2.1 implies that the family of HénonNets is sufficiently expressive to approximate any symplectic mapping:

Theorem 2.2 Let Φ ∶ U → R^{2n} be a C^{r+1} symplectic mapping. For each compact set C ⊂ U and δ > 0, there is a HénonNet H that approximates Φ within δ in the C^r topology.
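A minimal sketch of a HénonNet forward pass, with our own illustrative widths and randomly chosen (untrained) weights, makes the symplecticity-by-construction concrete: each layer is the fourth iterate of a Hénon-like map whose potential is a one-hidden-layer network, and the composition passes a numerical check of the symplectic condition.

```python
# Hedged sketch of a HenonNet: a composition of Henon layers, each the
# fourth iterate of a Henon-like map with a one-hidden-layer potential
# V(y) = w2 . tanh(W1 y + b). Weights are random placeholders.
import numpy as np

rng = np.random.default_rng(1)
n, width, n_layers = 2, 8, 3

def make_layer():
    W1 = rng.standard_normal((width, n)) * 0.3
    b = rng.standard_normal(width) * 0.3
    w2 = rng.standard_normal(width) * 0.3
    eta = rng.standard_normal(n) * 0.3
    def grad_V(y):                  # gradient of V(y) = w2 . tanh(W1 y + b)
        return W1.T @ (w2 * (1 - np.tanh(W1 @ y + b) ** 2))
    def layer(x, y):
        for _ in range(4):          # Henon layer = fourth iterate
            x, y = y + eta, -x + grad_V(y)
        return x, y
    return layer

layers = [make_layer() for _ in range(n_layers)]

def henon_net(z):
    x, y = z[:n], z[n:]
    for layer in layers:
        x, y = layer(x, y)
    return np.concatenate([x, y])

# Finite-difference Jacobian check of the symplectic condition DF^T J DF = J,
# which holds for any weight values, trained or not.
z0 = rng.standard_normal(2 * n)
h = 1e-6
DF = np.column_stack([(henon_net(z0 + h * e) - henon_net(z0 - h * e)) / (2 * h)
                      for e in np.eye(2 * n)])
J = np.block([[np.zeros((n, n)), np.eye(n)], [-np.eye(n), np.zeros((n, n))]])
assert np.allclose(DF.T @ J @ DF, J, atol=1e-5)
```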
Remark 2.2 Note that Hénon-like maps are easily invertible, so we can also easily invert Hénon networks by composing inverses of Hénon-like maps.
We also introduce here modified versions of Hénon-like maps and HénonNets to approximate symplectic maps possessing a near-identity property.

Definition 2.4 Let V ∶ R^n → R be a smooth function and let η ∈ R^n be a constant. We define the near-identity Hénon-like map

H_ε[V, η] = H[εV, η]^4.

Near-identity Hénon-like maps satisfy the near-identity property H_0[V, η] = Id.

Definition 2.5 Let N be a positive integer, and let
• V = {V_k}_{k∈{1,...,N}} be a family of scalar feed-forward neural networks on R^n,
• W = {W_k}_{k∈{1,...,N}} be a family of network weights for V,
• η = {η_k}_{k∈{1,...,N}} be a family of constants in R^n.
The near-identity HénonNet with layer potentials V, layer weights W, and layer shifts η is the mapping defined via

H_ε[V_N[W_N], η_N] ○ ... ○ H_ε[V_1[W_1], η_1],

and it satisfies the near-identity property, reducing to the identity map at ε = 0.

Nearly-Periodic Systems
Intuitively, a continuous-time dynamical system with parameter ε is nearly-periodic if all of its trajectories are periodic with nowhere-vanishing angular frequency in the limit ε → 0. Such a system characteristically displays limiting short-timescale dynamics that ergodically cover circles in phase space. More precisely, a nearly-periodic system can be defined as follows:

Definition 2.6 (Burby et al. 11 ) A nearly-periodic system on a manifold M is a smooth ε-dependent vector field X_ε on M such that X_0 = ω_0 R_0, where
• R_0 is the infinitesimal generator of a circle action Φ_θ ∶ M → M,
• ω_0 ∶ M → R is strictly positive and its Lie derivative satisfies L_{R_0} ω_0 = 0.
The vector field R 0 is called the limiting roto-rate, and ω 0 is the limiting angular frequency.
Examples from physics include charged particle dynamics in a strong magnetic field, the weakly-relativistic Dirac equation, and any mechanical system subject to a high-frequency, time-periodic force.In the broader context of multi-scale dynamical systems, nearly-periodic systems play a special role because they display perhaps the simplest possible non-dissipative shorttimescale dynamics.They therefore provide a useful proving ground for analytical and numerical methods aimed at more complex multi-scale models.
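A standard toy example (ours, not from the paper) is a Duffing-type oscillator q′ = p, p′ = −q − ε q³ on M = R². At ε = 0 it is the harmonic oscillator X_0 = ω_0 R_0 with ω_0 = 1 and R_0 the generator of rotation, so every orbit is 2π-periodic; for small ε the flow is a small perturbation of that rotation:

```python
# Verify near-periodicity numerically: at eps = 0 the time-2*pi flow is
# the identity, and for small eps it is close to the identity.
import numpy as np

def X(z, eps):
    q, p = z
    return np.array([p, -q - eps * q**3])

def rk4_flow(z, eps, T, steps=4000):
    """Classical RK4 approximation of the time-T flow of X(. , eps)."""
    dt = T / steps
    for _ in range(steps):
        k1 = X(z, eps)
        k2 = X(z + 0.5 * dt * k1, eps)
        k3 = X(z + 0.5 * dt * k2, eps)
        k4 = X(z + dt * k3, eps)
        z = z + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return z

z0 = np.array([1.0, 0.3])
# eps = 0: every trajectory is exactly 2*pi-periodic
assert np.allclose(rk4_flow(z0, 0.0, 2 * np.pi), z0, atol=1e-8)
# small eps: the time-2*pi flow is a small perturbation of the identity
assert np.linalg.norm(rk4_flow(z0, 0.01, 2 * np.pi) - z0) < 0.1
```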
Remark 2.3 In his paper 9 on basic properties of continuous-time nearly-periodic systems, Kruskal assumed that R_0 is nowhere vanishing, in addition to requiring that ω_0 is sign-definite. The assumption on R_0 is not essential: it is enough to require that ω_0 vanishes nowhere. This is an important restriction to lift since many interesting circle actions have fixed points.
It can be shown that every nearly-periodic system admits an approximate U(1)-symmetry 9 , known as the roto-rate, that is determined to leading order by the unperturbed periodic dynamics:

Definition 2.7 A roto-rate for a nearly-periodic system X_ε on a manifold M is a formal power series R_ε = R_0 + ε R_1 + ε² R_2 + ... with vector field coefficients such that R_0 is equal to the limiting roto-rate and the following equalities hold in the sense of formal power series:

[X_ε, R_ε] = 0   and   Φ^{R_ε}_{2π} = Id_M,

where Φ^{R_ε}_θ denotes the formal flow of R_ε, so that the second condition says this flow is 2π-periodic.

Proposition 2.1 (Kruskal 9 ) Every nearly-periodic system admits a unique roto-rate R_ε.
A subtle argument allows one to upgrade leading-order U(1)-invariance to all-orders U(1)-invariance for integral invariants:

Proposition 2.2 (Burby et al. 11 ) Let α_ε be a smooth ε-dependent differential form on a manifold M. Suppose α_ε is an absolute integral invariant for a smooth nearly-periodic system X_ε on M. If L_{R_0} α_0 = 0, then L_{R_ε} α_ε = 0, where R_ε is the roto-rate for X_ε.

Nearly-Periodic Maps
Nearly-periodic maps are natural discrete-time analogues of nearly-periodic systems, which were first introduced in 11 . The following provides a precise definition.

Definition 2.8 A nearly-periodic map on a manifold M with parameter vector space E is a smooth mapping F ∶ M × E → M such that F_0 = Φ_{θ_0}, where F_ε = F(·, ε), Φ_θ ∶ M → M is a circle action on M, and θ_0 ∈ U(1). We say F is resonant if θ_0 is a rational multiple of 2π; otherwise F is non-resonant. The infinitesimal generator R_0 of Φ_θ is the limiting roto-rate.
Example 2.1 Let X_ε be a nearly-periodic system on a manifold M with limiting roto-rate R_0 and limiting angular frequency ω_0. Assume that ω_0 is constant. For each ε ∈ R, let F^ε_t denote the time-t flow for X_ε. The mapping F(m, ε) = F^ε_{t_0}(m) is nearly-periodic for each t_0. To see why, first note that the flow of the limiting vector field X_0 = ω_0 R_0 is given by F^0_t(m) = Φ_{ω_0 t}(m), where Φ_θ denotes the U(1)-action generated by R_0. It follows that F(m, 0) = Φ_{ω_0 t_0}(m) = Φ_{θ_0}(m), where θ_0 = ω_0 t_0 mod 2π. This example is more general than it first appears, since any nearly-periodic system can be rescaled to have a constant limiting angular frequency. Indeed, if the nearly-periodic system X_ε has non-constant limiting angular frequency ω_0, then X′_ε = X_ε / ω_0 is a nearly-periodic system with limiting angular frequency 1. The integral curves of X′_ε are merely time reparameterizations of integral curves of X_ε.
Let X be a vector field on a manifold M with time-t flow map F_t. A U(1)-action Φ_θ is a U(1)-symmetry for X if F_t ○ Φ_θ = Φ_θ ○ F_t, for each t ∈ R and θ ∈ U(1). Differentiating this condition with respect to θ at the identity implies, and is implied by, F*_t R = R, where R denotes the infinitesimal generator for the U(1)-action. Since we would like to think of nearly-periodic maps as playing the part of a nearly-periodic system's flow map, the latter characterization of symmetry allows us to naturally extend Kruskal's notion of roto-rate to our discrete-time setting.

Definition 2.9 A roto-rate for a nearly-periodic map F ∶ M × E → M is a formal power series R_ε = R_0 + R_1 ε + R_2 ε² + ... whose coefficients are vector fields on M such that R_0 is the limiting roto-rate and the following equalities hold in the sense of formal power series:

F*_ε R_ε = R_ε   and   Φ^{R_ε}_{2π} = Id_M,

where Φ^{R_ε}_θ denotes the formal flow of R_ε.

A first fundamental result concerning nearly-periodic maps establishes the existence and uniqueness of the roto-rate in the non-resonant case. Like the corresponding result in continuous time, this result holds to all orders in perturbation theory.
Thus, non-resonant nearly-periodic maps formally reduce to mappings on the space of U(1)-orbits, corresponding to the elimination of a single dimension in phase space.

Nearly-Periodic Systems and Maps with a Hamiltonian Structure
Definition 2.10 An ε-dependent presymplectic manifold is a manifold M equipped with a smooth ε-dependent 2-form Ω_ε such that dΩ_ε = 0 for each ε ∈ E. We say (M, Ω_ε) is exact when there is a smooth ε-dependent 1-form ϑ_ε such that Ω_ε = −dϑ_ε.

Definition 2.11 A nearly-periodic Hamiltonian system on an exact presymplectic manifold (M, Ω_ε) is a nearly-periodic system X_ε on M such that ι_{X_ε} Ω_ε = dH_ε for some smooth ε-dependent function H_ε ∶ M → R.

We already know from Proposition 2.1 that every nearly-periodic system admits a unique roto-rate R_ε. In the Hamiltonian setting, it can be shown that both the dynamics and the Hamiltonian structure are U(1)-invariant to all orders in ε.
Proposition 2.3 (Kruskal 9 , Burby et al. 11 ) The roto-rate R ε for a nearly-periodic Hamiltonian system X ε on an exact presymplectic manifold (M,Ω ε ) with Hamiltonian H ε satisfies L R ε H ε = 0, and L R ε Ω ε = 0 in the sense of formal power series.
According to Noether's celebrated theorem, a Hamiltonian system that admits a continuous family of symmetries also admits a corresponding conserved quantity 57,60,61 . Therefore one might expect that a Hamiltonian system with an approximate symmetry must also have an approximate conservation law. This is indeed the case for nearly-periodic Hamiltonian systems:

Proposition 2.4 (Burby et al. 11 ) Let X_ε be a nearly-periodic Hamiltonian system on the exact presymplectic manifold (M, Ω_ε). Let R_ε be the associated roto-rate. There is a formal power series θ_ε = θ_0 + ε θ_1 + ... with coefficients in Ω¹(M) such that Ω_ε = −dθ_ε and L_{R_ε} θ_ε = 0. Moreover, the formal power series µ_ε = ι_{R_ε} θ_ε is a constant of motion for X_ε to all orders in perturbation theory. In other words, L_{X_ε} µ_ε = 0 in the sense of formal power series. The formal constant of motion µ_ε is the adiabatic invariant associated with the nearly-periodic Hamiltonian system.
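A standard textbook illustration (ours, not taken from this paper) is the harmonic oscillator with slowly modulated frequency, H_ε(q, p, t) = (p² + ω(εt)² q²)/2, whose adiabatic invariant is, to leading order, the classical action:

```latex
% Leading order of the adiabatic invariant for the slowly modulated
% harmonic oscillator H_eps(q, p, t) = (p^2 + omega(eps t)^2 q^2)/2:
\mu_\varepsilon
  \;=\; \frac{H_\varepsilon}{\omega(\varepsilon t)} + O(\varepsilon)
  \;=\; \frac{p^2 + \omega(\varepsilon t)^2\, q^2}{2\,\omega(\varepsilon t)}
        + O(\varepsilon).
```

The action µ_0 = H/ω drifts only by O(ε) over time intervals of length O(1/ε), even though the energy H_ε itself changes by O(1) on those timescales as ω varies.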
Note that general expressions for the adiabatic invariant µ ε can be obtained 62 .It can also be shown that the (formal) set of fixed points for the roto-rate is an elliptic almost invariant slow manifold whose normal stability is mediated by the adiabatic invariant associated with the nearly-periodic Hamiltonian system 8 .
A similar theory can be established for nearly-periodic maps with a Hamiltonian structure.

Definition 2.12 A presymplectic nearly-periodic map on an ε-dependent presymplectic manifold (M, Ω_ε) is a nearly-periodic map F ∶ M × E → M such that F*_ε Ω_ε = Ω_ε for each ε ∈ E.

Definition 2.13 A Hamiltonian nearly-periodic map on an ε-dependent presymplectic manifold (M, Ω_ε) is a nearly-periodic map F such that there is a smooth (t, ε)-dependent vector field Y_{t,ε}, with t ∈ R, such that the following properties hold true:
• ι_{Y_{t,ε}} Ω_ε = dH_{t,ε}, for some smooth (t, ε)-dependent function H_{t,ε}.
• For each ε ∈ E, F ε is the t = 1 flow of Y t,ε .
Using presymplecticity of the roto-rate, Noether's theorem can be used to establish existence of adiabatic invariants for many interesting presymplectic nearly-periodic maps.
Theorem 2.4 (Burby et al. 11 ) Let F be a non-resonant presymplectic nearly-periodic map on the exact ε-dependent presymplectic manifold (M, Ω_ε) with roto-rate R_ε. Assume that F is Hamiltonian, or that the manifold M is connected and the limiting roto-rate R_0 has at least one zero. Then there exists a smooth ε-dependent 1-form θ_ε such that L_{R_ε} θ_ε = 0 and −dθ_ε = Ω_ε in the sense of formal power series. Moreover, the quantity µ_ε = ι_{R_ε} θ_ε satisfies F*_ε µ_ε = µ_ε in the sense of formal power series; that is, µ_ε is an adiabatic invariant for F.
When an adiabatic invariant exists, the phase-space dimension is formally reduced by two.On the slow manifold µ ε = 0 the reduction in dimensionality may be even more dramatic.For example, the slow manifold for the symplectic Lorentz system 8 has half the dimension of the full system.

Approximating Nearly-Periodic Maps via Gyroceptrons
We first consider the problem of approximating an arbitrary nearly-periodic map P ∶ M × E → M on a manifold M. From Definition 2.8, there must be a corresponding circle action Φ_θ ∶ M → M and θ_0 ∈ U(1) such that P_0 = Φ_{θ_0}. Consider the map I_ε ∶ M → M given by

I_ε = P_ε ○ Φ^{−1}_{θ_0}.   (3.1)

This defines a near-identity map on M satisfying I_0 = Id_M. By composing both sides of equation (3.1) on the right by Φ_{θ_0}, we obtain a representation of any nearly-periodic map P as the composition of a near-identity map and a circle action,

P_ε = I_ε ○ Φ_{θ_0}.

As a consequence, if we can approximate any near-identity map and any circle action, then by the above representation we can approximate any nearly-periodic map.
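This decomposition is easy to verify numerically. In the hedged sketch below (a toy of ours), the nearly-periodic map is a rotation by θ_0 followed by an O(ε) perturbation, and the map I_ε = P_ε ○ Φ_{θ_0}^{−1} is checked to be near-identity:

```python
# Numerical check of the decomposition P_eps = I_eps o Phi_{theta0} for a
# toy nearly-periodic map on R^2.
import numpy as np

theta0 = 0.7

def Phi(theta, z):
    """Circle action: rotation of R^2 by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([c * z[0] - s * z[1], s * z[0] + c * z[1]])

def P(z, eps):
    """Toy nearly-periodic map: limiting rotation plus an O(eps) shear."""
    w = Phi(theta0, z)
    return w + eps * np.array([w[1] ** 2, 0.0])

def I(z, eps):
    """I_eps = P_eps o Phi_{theta0}^{-1}."""
    return P(Phi(-theta0, z), eps)

z = np.array([0.5, -0.8])
assert np.allclose(I(z, 0.0), z)                          # near-identity: I_0 = Id
assert np.allclose(P(z, 0.02), I(Phi(theta0, z), 0.02))   # P_eps = I_eps o Phi_{theta0}
```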
Different circle actions can act on manifolds in topologically different ways, so it would be very challenging, if not impossible, to construct a single strategy which makes it possible to approximate any circle action to arbitrary accuracy. Here, we will consider the simpler case where we assume that we know a priori the topological type of the action for the nearly-periodic system, and work within conjugation classes. Conjugation of a circle action Φ_θ ∶ M → M with a diffeomorphism ψ results in the map ψ ○ Φ_θ ○ ψ^{−1}, and two circle actions belong to the same conjugation class if one can be written as the conjugation of the other with a diffeomorphism. Note that although compositions of nearly-periodic maps are not necessarily nearly-periodic, the map obtained by conjugation of a nearly-periodic map with a diffeomorphism is nearly-periodic:

Lemma 3.1 Let P ∶ M × E → M be a nearly-periodic map on a manifold M, and let ψ ∶ M → M be a diffeomorphism on M. Then the map P̄ ∶ M × E → M defined for any ε ∈ E via P̄_ε = ψ ○ P_ε ○ ψ^{−1} is a nearly-periodic map.

We also have the following useful factorization result for nearly-periodic maps with limiting rotation within a given conjugacy class:

Lemma 3.2 Let Φ_θ ∶ M → M be a circle action on a manifold M. Every nearly-periodic map P_ε ∶ M → M whose limiting rotation Φ′_{θ_0} = P_0 is conjugate to Φ_{θ_0} admits the decomposition

P_ε = ψ ○ I_ε ○ Φ_{θ_0} ○ ψ^{−1},

where ψ ∶ M → M is a diffeomorphism and I_ε ∶ M → M is a near-identity diffeomorphism.
We will thus assume that we know in advance the topological type of the circle action Φ_θ for the dynamics of interest, and then propose to learn the nearly-periodic map P_ε by learning each component map in the composition

P_ε = ψ ○ I_ε ○ Φ_θ ○ ψ^{−1}.

This formula may be interpreted intuitively as follows. The map ψ learns the mode structure of an oscillatory system's short-timescale dynamics. The circle action Φ_θ provides an aliased phase advance for the learnt mode. Finally, I_ε captures the averaged dynamics that occur on timescales much larger than the limiting oscillation period. I_ε and ψ can be learnt using any standard neural network architecture, as long as the near-identity property is enforced in the representation for I_ε. It is however important to invert ψ exactly, and this strongly motivates using explicitly invertible neural network architectures for ψ; coupling-based invertible neural networks have been shown to be universal diffeomorphism approximators 63 . The parameter θ in the circle action Φ_θ can also be considered as a trainable parameter. We will refer to the resulting architecture as a gyroceptron, a name combining gyrations of phase with perceptron.
Definition 3.1 A gyroceptron is a feed-forward neural network of the form P ε [W, θ] = I ε [W I ] ○ ψ[W ψ ] ○ Φ θ ○ ψ[W ψ ] −1 , with weights W = (W I , W ψ ) and rotation parameter θ ∈ U(1). Gyroceptrons enjoy the following universal approximation property.
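To make the composition concrete, here is a minimal numerical sketch of a gyroceptron on R 2 . A toy invertible shear stands in for a trained ψ and a generic smooth perturbation stands in for I ε ; all maps are illustrative assumptions, not the architectures used in the experiments.

```python
import numpy as np

def phi(theta, x):
    """Clockwise rotation of (q, p) by angle theta: the limiting circle action."""
    q, p = x
    return np.array([q * np.cos(theta) + p * np.sin(theta),
                     -q * np.sin(theta) + p * np.cos(theta)])

def psi(x):
    """Toy 'mode structure' map: an additive shear, trivially invertible."""
    q, p = x
    return np.array([q, p + np.tanh(q)])

def psi_inv(x):
    """Exact closed-form inverse of the shear."""
    q, p = x
    return np.array([q, p - np.tanh(q)])

def I(eps, x):
    """Near-identity map: exactly the identity at eps = 0 by construction."""
    return x + eps * np.sin(x)

def gyroceptron(eps, theta, x):
    """P_eps = I_eps ∘ psi ∘ Phi_theta ∘ psi^{-1}."""
    return I(eps, psi(phi(theta, psi_inv(x))))

x = np.array([1.0, 0.5])
# At eps = 0 the map reduces exactly to the conjugated rotation:
assert np.allclose(gyroceptron(0.0, 0.3, x), psi(phi(0.3, psi_inv(x))))
```

In a trained gyroceptron, ψ would be an invertible network and I ε a near-identity network, but the composition pattern is exactly the one shown.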

Approximating Nearly-Periodic Symplectic Maps via Symplectic Gyroceptrons
We now focus on approximating an arbitrary nearly-periodic symplectic map P ∶ M × E → M on a manifold M. We will restrict our attention to symplectic manifolds with ε-independent symplectic forms (the ε-dependent case is more subtle and will not be pursued in the current study). From Definition 2.8, there must be a corresponding symplectic circle action Φ θ ∶ M → M and θ 0 ∈ U(1) such that P 0 = Φ θ 0 . As before, consider the map I ε ∶ M → M given by I ε = P ε ○ Φ −1 θ 0 . (3.7) Now, the inverse of a symplectic map is symplectic, and any composition of symplectic maps is also symplectic. Thus, the map Φ −1 θ 0 = P −1 0 is symplectic; as a result, I ε is symplectic on M for any ε ∈ E, and it satisfies the near-identity property I 0 = Id M . By composing both sides of equation (3.7) on the right by Φ θ 0 , we obtain a representation of any nearly-periodic symplectic map P as the composition of a near-identity symplectic map and a symplectic circle action:

Lemma 3.3 Let Φ θ ∶ M → M be a symplectic circle action on a symplectic manifold (M, ω). Every nearly-periodic symplectic map P ε ∶ M → M whose limiting rotation Φ ′ θ 0 = P 0 is conjugate to Φ θ 0 admits the decomposition P ε = I ε ○ ψ ○ Φ θ 0 ○ ψ −1 , where ψ ∶ M → M is a symplectic diffeomorphism and I ε ∶ M → M is a near-identity symplectic diffeomorphism.
If we can approximate any near-identity symplectic map and any symplectic circle action, then by the above representation we can approximate any nearly-periodic symplectic map. As before, we will assume that we know a priori the topological type of the circle action Φ θ for the nearly-periodic symplectic system of interest, and work within conjugation classes. Since compositions of symplectic maps are symplectic, Lemma 3.1 implies that the map ψ ○ P ○ ψ −1 , obtained by conjugating a nearly-periodic symplectic map P with a symplectomorphism ψ (i.e. a symplectic diffeomorphism), is also a nearly-periodic symplectic map. We will then learn the nearly-periodic symplectic map by learning each component map in the composition P ε = I ε ○ ψ ○ Φ θ ○ ψ −1 , where I ε is a near-identity symplectic map and ψ is symplectic.
The symplectic map ψ can be learnt using any neural network architecture which strongly enforces symplecticity. It is preferable, however, to choose an architecture which can easily be inverted, so that the computations involving ψ −1 can be conducted efficiently. The near-identity symplectic map I ε can be learnt using any neural network architecture strongly enforcing symplecticity with the additional property that it limits to the identity as ε goes to 0. The parameter θ in the circle action Φ θ can also be considered as a trainable parameter. We will refer to any such resulting composition of neural network architectures as a symplectic gyroceptron.

Definition 3.2 A symplectic gyroceptron is a feed-forward neural network of the form P ε [W, θ] = I ε [W I ] ○ ψ[W ψ ] ○ Φ θ ○ ψ[W ψ ] −1 , where ψ[W ψ ] is symplectic and I ε [W I ] is a near-identity symplectic map, with weights W = (W I , W ψ ) and rotation parameter θ ∈ U(1). Symplectic gyroceptrons enjoy a universal approximation property comparable to the non-symplectic case.
Theorem 3.2 Fix a symplectic circle action Φ θ ∶ M → M on the symplectic manifold (M, ω) and a compact set C ⊂ M. Let P ε ∶ M → M be a nearly-periodic symplectic map whose limiting rotation is conjugate to Φ θ . Let ψ[W ψ ] ∶ M → M be a feed-forward network architecture that provides a universal approximation within the class of symplectic diffeomorphisms, and let I ε [W I ] be a feed-forward network architecture that provides a universal approximation within the class of ε-dependent symplectic diffeomorphisms with I 0 [W I ] = Id M . For each δ > 0, there exist weights W * ψ and W * I such that the symplectic gyroceptron P ε [W * , θ] approximates P ε to within δ uniformly on C.

In this paper, we will use HénonNets 2 as the main building blocks of our symplectic gyroceptrons. The symplectic map ψ will be learnt using a standard HénonNet (see Definition 2.3), its inverse ψ −1 can be obtained easily by composing inverses of Hénon-like maps (see Remark 2.2), and the near-identity symplectic map I ε will be learnt using a near-identity HénonNet (see Definition 2.5). The neural network architectures considered in this paper are summarized in Figure 1.
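To illustrate why Hénon-like maps are attractive building blocks, the following sketch implements a simplified variant of a single Hénon-like map, (q, p) ↦ (p, −q + V(p)), with a randomly initialized single-hidden-layer network as the layer potential V. Actual HénonNets compose many such maps with the exact layer form given in the HénonNet paper; this stripped-down version only demonstrates the two key properties, a unit Jacobian determinant (symplecticity on R 2 ) and a closed-form inverse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer potential V: a single-hidden-layer fully-connected
# network with 8 hidden neurons (sizes are assumptions for this sketch).
W1, b1 = rng.normal(size=(8, 1)), rng.normal(size=(8, 1))
W2 = rng.normal(size=(1, 8))

def V(p):
    """Scalar layer potential evaluated at scalar p."""
    return (W2 @ np.tanh(W1 * p + b1)).item()

def henon(q, p):
    """Simplified Hénon-like map (q, p) -> (p, -q + V(p)).
    Its Jacobian [[0, 1], [-1, V'(p)]] has determinant 1 for any V."""
    return p, -q + V(p)

def henon_inv(Q, P):
    """Exact inverse, recovered in closed form: p = Q, q = V(Q) - P."""
    return V(Q) - P, Q

q, p = 0.7, -0.2
assert np.allclose(henon_inv(*henon(q, p)), (q, p))

# Check the unit Jacobian determinant by central finite differences:
h = 1e-6
dq = (np.array(henon(q + h, p)) - np.array(henon(q - h, p))) / (2 * h)
dp = (np.array(henon(q, p + h)) - np.array(henon(q, p - h))) / (2 * h)
det = dq[0] * dp[1] - dq[1] * dp[0]
assert abs(det - 1.0) < 1e-6
```

The closed-form inverse is what makes the conjugation ψ −1 in a symplectic gyroceptron cheap to evaluate exactly, rather than requiring an iterative solve.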
We would like to emphasize that symplectic building blocks other than HénonNets could have been used as the basis for our symplectic gyroceptrons. For instance, a possible option would have been to use SympNets 3 , since they also strongly ensure symplecticity and enjoy a universal approximation property for symplectic maps. However, numerical experiments conducted in the original HénonNet paper 2 suggested that HénonNets have a higher per-layer expressive power than SympNets; as a result, SympNets are typically much deeper than HénonNets and slower for prediction. This is consistent with the observations we will make later in Section 5.1, where a SympNet took 127 seconds to generate trajectories that a HénonNet of similar size generated in 3 seconds. Together with the fact that SympNets are not as easily invertible as HénonNets, this computational advantage makes HénonNets the more desirable building blocks.

Numerical Confirmation of the Existence of Adiabatic Invariants
In this section, we confirm numerically that for any random set of weights and biases, the dynamical system generated by the symplectic gyroceptron introduced in Section 3.2 admits an adiabatic invariant.
In our numerical experiments, we will take the circle action given by the clockwise rotation Φ θ (q, p) = (q cos θ + p sin θ, −q sin θ + p cos θ). (4.2) The quantity I 0 (q, p) = (q 2 + p 2 )/2 is an invariant of the dynamics associated to the circle action (4.2); as a result, it is an invariant of the dynamics associated to the composition ψ ○ Φ θ ○ ψ −1 , and an adiabatic invariant of the dynamics associated to the symplectic gyroceptron (4.1). Figure 2 displays the evolution of the adiabatic invariant (4.3) over 10000 iterations of the dynamical system generated by the symplectic gyroceptron (4.1), for different values of ε. Here, ψ is a HénonNet and I ε a near-identity HénonNet, both with 3 Hénon layers, each with a layer potential given by a single-hidden-layer fully-connected neural network with 8 neurons. We can clearly see that the conservation of the adiabatic invariant improves significantly as ε approaches 0, going from chaotic oscillations of large amplitude when ε = 0.1 to very regular oscillations of minute amplitude when ε = 10 −8 .
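This behaviour can be reproduced qualitatively with toy stand-ins for the trained networks: iterate x ↦ I ε (ψ(Φ θ (ψ −1 (x)))) and record µ = I 0 ○ ψ −1 along the orbit. The shear ψ and perturbation I ε below are illustrative choices, not HénonNets, and the step counts are kept small.

```python
import numpy as np

def phi(theta, x):                        # clockwise rotation (4.2)
    q, p = x
    return np.array([q * np.cos(theta) + p * np.sin(theta),
                     -q * np.sin(theta) + p * np.cos(theta)])

def psi(x):                               # toy invertible map
    q, p = x
    return np.array([q, p + 0.3 * np.sin(q)])

def psi_inv(x):                           # its exact inverse
    q, p = x
    return np.array([q, p - 0.3 * np.sin(q)])

def I_eps(eps, x):                        # near-identity perturbation
    return x + eps * np.array([np.sin(x[1]), np.cos(x[0])])

def mu(x):
    """Candidate adiabatic invariant mu = I_0 ∘ psi^{-1}."""
    q, p = psi_inv(x)
    return 0.5 * (q**2 + p**2)

def max_drift(eps, theta=1.0, n=1000):
    """Largest deviation |mu - mu_0| over n gyroceptron iterations."""
    x = np.array([1.0, 0.0])
    m0, worst = mu(x), 0.0
    for _ in range(n):
        x = I_eps(eps, psi(phi(theta, psi_inv(x))))
        worst = max(worst, abs(mu(x) - m0))
    return worst

assert max_drift(0.0) < 1e-9              # exactly conserved at eps = 0
assert max_drift(1e-8) < max_drift(1e-2)  # drift shrinks with eps
```

At ε = 0 the invariance is exact because the rotation preserves q 2 + p 2 in the ψ −1 variables; for ε > 0 the deviations are O(ε), mirroring Figure 2.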
We investigated further by obtaining the number of iterations needed for the adiabatic invariant µ to deviate significantly from its original value µ 0 as ε is varied. More precisely, given a value of ε, we search for the smallest integer N(ε) such that |µ N(ε) − µ 0 | > ρ max 1≤k≤K(ε) |µ k − µ 0 |. In other words, we record the first iteration at which the value of the adiabatic invariant µ deviates from its original value µ 0 by more than some constant factor ρ > 1 of the maximum deviation experienced in the first K(ε) iterations. The results are plotted in Figure 3 for ρ = 1.1. We can clearly see from Figure 3 that N(ε), the number of iterations needed for the adiabatic invariant µ to deviate from its original value µ 0 by more than ρ = 1.1 times the maximum deviation experienced in the first few iterations, increases sharply as ε gets closer to 0. This is consistent with theoretical expectations. Note that using higher values of ρ and smaller values of ε would probably generate more interesting and meaningful results. Unfortunately, this is not computationally realizable, since N(ε) becomes very large when ρ is increased beyond 1.2: even for larger values of ε, computing a single point would take several days.
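The deviation count N(ε) described above can be computed from a recorded series of µ values in a few lines. The series below is synthetic, purely to exercise the definition: small oscillations for a while, then a slow secular drift.

```python
def first_significant_deviation(mu_series, K, rho=1.1):
    """Smallest index n > K with |mu_n - mu_0| exceeding rho times the
    largest deviation seen over iterations 1..K (None if it never does)."""
    mu0 = mu_series[0]
    threshold = rho * max(abs(m - mu0) for m in mu_series[1:K + 1])
    for n in range(K + 1, len(mu_series)):
        if abs(mu_series[n] - mu0) > threshold:
            return n
    return None

# Synthetic record: bounded oscillations, then a slow drift sets in.
series = [1.0 + 0.01 * (-1) ** n for n in range(50)] + \
         [1.0 + 0.005 * n for n in range(50)]
n_star = first_significant_deviation(series, K=20, rho=1.1)
```

In the experiments of this section, `mu_series` would be the values of the adiabatic invariant along the gyroceptron orbit rather than a synthetic list.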

Nonlinearly Coupled Oscillators
In this section, we use the symplectic gyroceptron architecture introduced in Section 3.2 to learn a surrogate map for the nearly-periodic symplectic flow map associated to a nearly-periodic Hamiltonian system composed of two nonlinearly coupled oscillators, one of which oscillates significantly faster than the other: q̇ 1 = p 1 , ṗ 1 = −q 1 − ε ∂U/∂q 1 , q̇ 2 = ε p 2 , ṗ 2 = −ε q 2 − ε ∂U/∂q 2 . (5.1) These equations of motion are Hamilton's equations associated to the Hamiltonian H ε (q 1 , q 2 , p 1 , p 2 ) = (q 1 2 + p 1 2 )/2 + ε(q 2 2 + p 2 2 )/2 + εU(q 1 , q 2 ). (5.2) The ε = 0 dynamics are decoupled: the first oscillator, initialized at (q 1 (0), p 1 (0)) = (q̄, p̄), follows a trajectory characterized by periodic clockwise circular rotation in phase space, q 1 (t) = q̄ cos t + p̄ sin t, p 1 (t) = −q̄ sin t + p̄ cos t, while the second oscillator remains immobile. Thus, this is a nearly-periodic Hamiltonian system on R 4 with associated ε = 0 circle action given by the clockwise rotation Φ θ (q 1 , q 2 , p 1 , p 2 ) = (q 1 cos θ + p 1 sin θ, q 2 , −q 1 sin θ + p 1 cos θ, p 2 ).
We will use the nonlinear coupling potential U(q 1 ,q 2 ) = q 1 q 2 sin(2q 1 + 2q 2 ) in our numerical experiments since the resulting nearly-periodic Hamiltonian system displays complicated dynamics as the value of ε is increased from 0. We have plotted in Figure 4 a few trajectories of this dynamical system corresponding to different values of ε.
To learn a surrogate map for the nearly-periodic symplectic flow map associated to this nearly-periodic Hamiltonian system, we use the symplectic gyroceptron I ε ○ ψ ○ Φ θ ○ ψ −1 introduced in Section 3.2. In our numerical experiments, ε = 0.01, θ is a trainable parameter, ψ is a HénonNet with 10 Hénon layers, each with a layer potential given by a single-hidden-layer fully-connected neural network with 8 neurons, and I ε is a near-identity HénonNet with 8 Hénon layers, each with a layer potential given by a single-hidden-layer fully-connected neural network with 6 neurons.
The resulting symplectic gyroceptron, with 549 trainable parameters, was trained for a few thousand epochs on a dataset of 20,000 updates (q 1 , q 2 , p 1 , p 2 ) ↦ (q̄ 1 , q̄ 2 , p̄ 1 , p̄ 2 ) of the time-0.05 flow map associated to the nearly-periodic Hamiltonian system (5.1). The training data was generated using the classical Runge-Kutta 4 integrator with very small time-steps, and the mean squared error was used as the loss function for training. Figure 5 shows the dynamics predicted by the symplectic gyroceptron for seven different initial conditions with the same initial values of (q 1 , p 1 ), against the reference trajectories generated by the classical Runge-Kutta 4 integrator with very small time-steps. We only display the trajectories of the second oscillator, since the motion of the first oscillator follows a simple nearly-circular curve.
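Training pairs of the kind described above can be generated along the following lines. This sketch assumes Hamilton's equations take the weakly-coupled form H = (q 1 2 + p 1 2 )/2 + ε(q 2 2 + p 2 2 )/2 + εU(q 1 , q 2 ) consistent with the ε = 0 behaviour described earlier, with the stated coupling U(q 1 , q 2 ) = q 1 q 2 sin(2q 1 + 2q 2 ); sample counts and substeps are illustrative, much smaller than the 20,000-sample dataset used here.

```python
import numpy as np

eps = 0.01

def dU(q1, q2):
    """Partial derivatives of U(q1, q2) = q1*q2*sin(2*q1 + 2*q2)."""
    s, c = np.sin(2*q1 + 2*q2), np.cos(2*q1 + 2*q2)
    return q2*s + 2*q1*q2*c, q1*s + 2*q1*q2*c

def f(z):
    """Hamiltonian vector field for the assumed weakly-coupled H."""
    q1, q2, p1, p2 = z
    dU1, dU2 = dU(q1, q2)
    return np.array([p1, eps*p2, -q1 - eps*dU1, -eps*q2 - eps*dU2])

def rk4_step(z, h):
    k1 = f(z); k2 = f(z + h/2*k1); k3 = f(z + h/2*k2); k4 = f(z + h*k3)
    return z + h/6*(k1 + 2*k2 + 2*k3 + k4)

def flow(z, t, substeps=100):
    """Approximate time-t flow map via many small RK4 steps."""
    h = t / substeps
    for _ in range(substeps):
        z = rk4_step(z, h)
    return z

rng = np.random.default_rng(0)
inputs = rng.uniform(-1, 1, size=(100, 4))           # (q1, q2, p1, p2)
targets = np.array([flow(z, 0.05) for z in inputs])  # time-0.05 flow map
```

Each (input, target) row is one update pair of the time-0.05 flow map; a supervised loss such as the mean squared error between network outputs and `targets` completes the training setup.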
We can see that the dynamics learnt by the symplectic gyroceptron match the reference trajectories almost perfectly and follow the level sets of the averaged Hamiltonian H̄ = (1/2π) ∫ 0 2π Φ θ * H dθ, which, up to an unimportant constant, is given by H̄ = ε(q 2 2 + p 2 2 )/2 + ε q 2 cos(2q 2 ) √(q 1 2 + p 1 2 ) J 1 (2√(q 1 2 + p 1 2 )), (5.6) where J 1 (x) is the first-order Bessel function of the first kind. Using Kruskal's theory of nearly-periodic systems, it is straightforward to show that this averaged Hamiltonian is the leading-order approximation of the Hamiltonian for the formal U(1)-reduction of the two-oscillator system.
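The phase average defining H̄ can also be evaluated numerically, without the Bessel-function closed form, by discretizing the θ-integral with a rectangle rule (spectrally accurate for smooth periodic integrands). The Hamiltonian below assumes the same weakly-coupled form discussed above and is only a sketch.

```python
import numpy as np

eps = 0.01

def H(z):
    """Assumed weakly-coupled two-oscillator Hamiltonian."""
    q1, q2, p1, p2 = z
    return (0.5*(q1**2 + p1**2) + 0.5*eps*(q2**2 + p2**2)
            + eps*q1*q2*np.sin(2*q1 + 2*q2))

def rotate(z, theta):
    """Circle action: clockwise rotation of (q1, p1), fixing (q2, p2)."""
    q1, q2, p1, p2 = z
    return np.array([q1*np.cos(theta) + p1*np.sin(theta), q2,
                     -q1*np.sin(theta) + p1*np.cos(theta), p2])

def phase_average(z, n=256):
    """Rectangle-rule approximation of (1/2π) ∫ H(Φ_θ z) dθ."""
    thetas = 2*np.pi*np.arange(n)/n
    return np.mean([H(rotate(z, t)) for t in thetas])

z = np.array([0.8, 0.4, -0.3, 1.1])
# The averaged Hamiltonian is invariant under the circle action itself:
assert abs(phase_average(z) - phase_average(rotate(z, 1.0))) < 1e-12
```

Because averaging removes all θ-dependence, H̄ is constant along the fast rotation, which is exactly why its level sets in (q 2 , p 2 ) organize the slow dynamics seen in Figure 5.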
We also learned a surrogate map for the nearly-periodic symplectic time-5 flow map associated to the dynamical system (5.1), using a symplectic gyroceptron where ε = 0.01, θ is a trainable parameter, and ψ and I ε both have 10 Hénon layers, each with a layer potential given by a single-hidden-layer fully-connected neural network with 8 neurons. This symplectic gyroceptron, with 681 trainable parameters, was trained for a few thousand epochs on a dataset of 60,000 updates (q 1 , q 2 , p 1 , p 2 ) ↦ (q̄ 1 , q̄ 2 , p̄ 1 , p̄ 2 ). For comparison, we also trained a HénonNet 2 and a SympNet 3 of similar sizes and ran simulations from the same seven initial conditions. The HénonNet used has 16 layers, each with a layer potential given by a single-hidden-layer fully-connected neural network with 10 neurons, for a total of 672 trainable parameters. The SympNet 3 used has 652 trainable parameters, in a network structure alternating compositions L of n trainable linear symplectic layers with non-trainable symplectic activation maps N up/low .
Figure 6 shows the dynamics predicted by the symplectic gyroceptron, the HénonNet, and the SympNet for seven different initial conditions with the same initial values of (q 1 , p 1 ), against the reference trajectories generated by the Runge-Kutta 4 integrator (RK4) with small time-steps. As before, we only display the trajectories of the second oscillator. We can see that the dynamics predicted by the symplectic gyroceptron match the reference trajectories very well, although the predicted oscillations around the level sets of the averaged Hamiltonian are unsurprisingly larger than when learning the time-0.05 flow map. The crucial advantage that the symplectic gyroceptron offers over the other architectures considered, which only enforce the symplectic constraint, is provable existence of an adiabatic invariant. After training, the other architectures may empirically display preservation of an adiabatic invariant, but this cannot be proved rigorously from first principles. In contrast, the symplectic gyroceptron enjoys provable existence of an adiabatic invariant before, during, and after training.

Figure 6. Predictions from a symplectic gyroceptron, a SympNet, and a HénonNet, against the reference trajectories for the second oscillator in the nearly-periodic Hamiltonian system (5.1) with ε = 0.01 and the larger time-step of 5.
Note that the symplectic gyroceptron generated the seven trajectories in 5 seconds, several orders of magnitude faster than RK4 with small time-steps, which took 6,055 seconds. The HénonNet simulated the dynamics slightly faster, in 3 seconds, while the SympNet was much slower, with a running time of 127 seconds, consistent with the observations made in the original HénonNet paper 2 which motivated choosing HénonNets over SympNets in the symplectic gyroceptrons.
Charged Particle Interacting with its Self-Generated Electromagnetic Field

Problem Formulation
Next, we test the ability of symplectic gyroceptrons to function as surrogates for higher-dimensional nearly-periodic systems, and for systems where the limiting circle action is not precisely known.
To formulate the ground-truth model, first fix a positive integer K and a sequence of single-variable functions V k ∶ R → R, k = 1,...,K. Consider the canonical Hamiltonian system on R 2 × (R 2 ) K with coordinates (q, p, Q 1 , P 1 ,...,Q K , P K ), defined by the Hamiltonian (5.7). The resulting equations of motion may be regarded as a simplified model of a charged particle (q, p) interacting with its self-generated electromagnetic field (Q 1 , P 1 ,...,Q K , P K ). We will describe the application of symplectic gyroceptrons to the development of a dynamical surrogate for this system when ε ≪ 1. First, we verify that this Hamiltonian system is nearly-periodic, since this is the type of dynamical system that symplectic gyroceptrons are designed to handle. Consider the limiting dynamics when ε = 0. While the resulting equations of motion may appear impenetrable at first glance, the symplectic transformation of variables given by Λ 0 −1 ∶ (q, p, Q 1 , P 1 ,...,Q K , P K ) ↦ (q, p, Q 1 , Π 1 ,...,Q K , Π K ), where Π k = P k − V k (Q k ), simplifies them dramatically into q̇ = 0, ṗ = 0, together with a family (indexed by k) of harmonic oscillators with angular frequencies k. The solution map in these nice variables is therefore Φ 0 t (q, p, Q 1 , Π 1 ,...,Q K , Π K ) = (q, p, Q 1 (t), Π 1 (t),...,Q K (t), Π K (t)), where each pair (Q k (t), Π k (t)) rotates at angular frequency k, as in (5.10). Note that Φ 0 t is periodic with minimal period 2π. The solution map in terms of the original variables (Q k , P k ) is therefore Φ t = Λ 0 ○ Φ 0 t ○ Λ 0 −1 . Since Φ t is periodic in t with minimal period 2π, the ground-truth equations are Hamiltonian nearly-periodic, and the leading-order adiabatic invariant µ 0 is given by (5.11). Symplectic gyroceptrons are therefore well-suited to surrogate modeling for this system.
To verify visually that we have learnt the dynamics successfully, we select initial conditions on the zero level set of the adiabatic invariant µ 0 . There, the dynamics should remain on the corresponding slow manifold, which is lower-dimensional and thus more easily portrayed. For the Hamiltonian system (5.7), the slow manifold is the zero level set of µ 0 , which, as we can see from equation (5.11), is the set of points (q, p, Q 1 , P 1 , Q 2 , P 2 ) such that Q 1 = Q 2 = 0 and Π 1 = Π 2 = 0. On that slow manifold, the dynamics reduce to q̇ = ε p, ṗ = 0, Q̇ 1 = 0, Q̇ 2 = 0, Ṗ 1 = ε p sin(q), Ṗ 2 = ε p sin(2q), (5.12) where in particular the (q, p) dynamics are now independent of (Q 1 , Q 2 , P 1 , P 2 ) and can easily be solved for explicitly, given initial conditions (q(0), p(0)) = (q̄, p̄): q(t) = q̄ + ε p̄t, p(t) = p̄.
(5.13) Figures 7a) and 7b) show that the trained symplectic gyroceptron generates predictions for the evolution of q and p which remain very close to the true trajectories on the slow manifold when the initial conditions are selected on the zero level set of µ 0 . We also generate dynamics outside the zero level set of µ 0 and verify that the quantity I 0 ○ ψ −1 matches the learnt adiabatic invariant µ 0 learnt along the trajectories generated by the symplectic gyroceptron I ε ○ ψ ○ Φ θ ○ ψ −1 . More precisely, we check whether I 0 ○ ψ −1 = µ 0 learnt , with both quantities approximately constant along trajectories generated by the symplectic gyroceptron, where µ 0 learnt is given by (5.15) with (q̄, p̄, Q̄ 1 , Π̄ 1 , Q̄ 2 , Π̄ 2 ) = ψ −1 (q, p, Q 1 , P 1 , Q 2 , P 2 ). From Figure 7c), we see that along trajectories which are not started on the zero level set of µ 0 , the value of I 0 ○ ψ −1 remains close to the approximately constant quantity µ 0 learnt , although I 0 ○ ψ −1 displays small oscillations. Since I 0 ○ ψ −1 is an adiabatic invariant for the network, these oscillations remain bounded in amplitude for very large time intervals. The amplitude can in principle be reduced by finding a more optimal set of weights for the network, but it can never be reduced to zero, since the true adiabatic invariant is not exactly conserved (oscillations in µ 0 are not visible at the scales displayed in the plot).

Theorem 3.1
Fix a circle action Φ θ ∶ M → M and a compact set C ⊂ M. Let P ε ∶ M → M be a nearly-periodic map whose limiting rotation is conjugate to Φ θ . Let ψ[W ψ ] ∶ M → M be a feed-forward network architecture that provides a universal approximation within the class of diffeomorphisms, and let I ε [W I ] be a feed-forward network architecture that provides a universal approximation within the class of ε-dependent diffeomorphisms with I 0 [W I ] = Id M . For each δ > 0, there exist weights W * ψ and W * I such that the gyroceptron P ε [W * , θ] approximates P ε to within δ uniformly on C.

Figure 2. Conservation of the adiabatic invariant (4.3) over 10000 iterations for the map generated by the symplectic gyroceptron (4.1) as ε is increased.

Figure 3. N(ε) as a function of ε for ρ = 1.1 and a random set of weights.

Figure 5. Level sets of the averaged Hamiltonian (5.6), and the symplectic gyroceptron predictions against the reference trajectories for the second oscillator in the nearly-periodic Hamiltonian system (5.1) with ε = 0.01 and a time-step of 0.05.

Figure 7. a) b) Symplectic gyroceptron predictions (colors) against the true trajectories (dashed) with 4 different choices of initial conditions on the zero level set of the adiabatic invariant for the nearly-periodic Hamiltonian system (5.7) with ε = 0.01. c) Evolution of I 0 ○ ψ −1 (colors) and µ 0 learnt (dashed lines) along trajectories generated by the symplectic gyroceptron with 3 different choices of initial conditions for the nearly-periodic Hamiltonian system (5.7) with ε = 0.01.