Coarse-graining Hamiltonian systems using WSINDy

Weak-form equation learning and surrogate modeling has proven to be computationally efficient and robust to measurement noise in a wide range of applications including ODE, PDE, and SDE discovery, as well as in coarse-graining applications such as homogenization and mean-field descriptions of interacting particle systems. In this work we extend this coarse-graining capability to the setting of Hamiltonian dynamics which possess approximate symmetries associated with timescale separation. A smooth ε-dependent Hamiltonian vector field X_ε possesses an approximate symmetry if the limiting vector field X_0 = lim_{ε→0} X_ε possesses an exact symmetry. Such approximate symmetries often lead to the existence of a Hamiltonian system of reduced dimension that may be used to efficiently capture the dynamics of the symmetry-invariant dependent variables. Deriving such reduced systems, or approximating them numerically, is an ongoing challenge.
We demonstrate that WSINDy can successfully identify this reduced Hamiltonian system in the presence of large perturbations imparted in the ε > 0 regime, while remaining robust to extrinsic noise. This is significant in part due to the nontrivial means by which such systems are derived analytically. WSINDy naturally preserves the Hamiltonian structure by restricting to a trial basis of Hamiltonian vector fields. The methodology is computationally efficient, often requiring only a single trajectory to learn the global reduced Hamiltonian, and avoiding forward solves in the learning process. In this way, we argue that weak-form equation learning is particularly well-suited for Hamiltonian coarse-graining. Using nearly-periodic Hamiltonian systems as a prototypical class of systems with approximate symmetries, we show that WSINDy robustly identifies the correct leading-order system, with dimension reduced by at least two, upon observation of the relevant degrees of freedom. While our main contribution is computational, we also provide a contribution to the literature on averaging theory by proving that first-order averaging at the level of vector fields preserves Hamiltonian structure in nearly-periodic Hamiltonian systems. This provides theoretical justification for our approach as WSINDy's computations occur at the level of Hamiltonian vector fields. We illustrate the efficacy of our proposed method using physically relevant examples, including coupled oscillator dynamics, the Hénon–Heiles system for stellar motion within a galaxy, and the dynamics of charged particles.


Introduction
Hamiltonian mechanics is a formulation of classical mechanics that is used to describe non-dissipative systems [1]. Hamiltonian descriptions of physical systems allow for geometric interpretations which are not immediately present in the Newtonian and Lagrangian formulations of classical mechanics. Expression of the dynamics in terms of a conserved quantity (the Hamiltonian) is also essential in the formulation of quantum mechanics. In fact, the Schrödinger equation comprises an infinite-dimensional Hamiltonian system [2]. The geometry of phase space allows one to systematically explore conserved quantities, also known as constants of motion, which indicate the presence of symmetry. A symmetry is a map on phase space which commutes with the flow map of the dynamics, and may be used to reduce the size of phase space. Continuous families of symmetries lead to phase-space dimension reduction. While many physical systems do possess quantities which are strictly conserved, often a system possesses approximately conserved quantities which lead to approximate symmetries. Such approximate symmetries can still be used to derive a reduced-order system that approximates well the important degrees of freedom in the original system, but such derivations remain an ongoing challenge.
An important example of a system with an approximate symmetry is a charged particle moving in a strong magnetic field. The particle exhibits fast oscillations around the magnetic field lines, which are largely unimportant to measure. Isolating the slow drift motion along and across the field lines is essential in order to efficiently simulate such systems. The concept of adiabatic invariance provides the necessary approximately-conserved quantities which allow one to analytically derive reduced Hamiltonian systems for the slow dynamics of charged particle motion. Passage to this reduced system involves averaging over a continuous family of approximate symmetries resulting from the adiabatic invariant (discussed in Section 2.2).
In practice, full descriptions of Hamiltonian dynamics in the form of governing equations are challenging to identify from experimental data because the fast-scale oscillations are often underresolved. Moreover, given the full set of governing equations, analytically deriving the reduced-dimension system in the presence of symmetries (or approximate symmetries) becomes infeasible for complex systems. In this article we explore the ability of recent weak-form methods to identify such reduced systems directly from data. Combining sparse equation learning with other reduced-order modeling paradigms is possible, as exhibited by previous works in other contexts (e.g., POD-based methods [33,34] and neural networks [35]), and we leave these synergies to future work.

Paper organization
We include preliminary concepts relevant to the study of Hamiltonian coarse-graining in Section 2, namely, an overview of Hamiltonian systems (2.1) with specific attention paid to nearly-periodic Hamiltonian systems (2.2). The latter is the prototypical class of systems with approximate symmetries which we use in the current manuscript to investigate weak-form coarse-graining. Section 3 contains theoretical results proving that first-order averaging of nearly-periodic Hamiltonian systems at the level of vector fields preserves Hamiltonian structure in the resulting reduced phase space, justifying a search for Hamiltonian coarse-grained models in the presence of approximate symmetries. In Section 4 we describe how the WSINDy algorithm may be applied to learn a general Hamiltonian system of the form (2.1), and we demonstrate in 4.1 that for systems with approximate symmetries, WSINDy can be employed to learn multiple relevant models to describe the system in different regimes. The bulk of our findings is presented in Section 5, where we quantify the performance of WSINDy applied to four physically-relevant nearly-periodic Hamiltonian systems of varying dimension.

Preliminaries
In this section we review Hamiltonian dynamical systems, with specific attention paid to nearly-periodic systems in Section 2.2.

Hamiltonian systems
Classically, Hamilton's equations describe the evolution of a point z = (q, p) in phase space M = R^{2N} along a level set of a function H : M → R referred to as the Hamiltonian. Hamilton's original equations are
$$\dot{q} = \frac{\partial H}{\partial p}, \qquad \dot{p} = -\frac{\partial H}{\partial q}, \tag{1}$$
and (q, p) were originally associated with the position and momentum of a particle (the dot notation denoting the derivative with respect to time). It can readily be seen from equation (1) that along any trajectory t ↦ (q(t), p(t)), H is conserved: (d/dt) H(q(t), p(t)) = 0. By defining the matrix
$$J = \begin{pmatrix} 0 & \mathrm{Id}_{\mathbb{R}^N} \\ -\mathrm{Id}_{\mathbb{R}^N} & 0 \end{pmatrix}, \tag{2}$$
where Id_{R^N} is the identity in R^N, we can equivalently write (1) as ż = X_H(z), where the Hamiltonian vector field is defined by X_H = J∇H. The matrix J is nonsingular, anti-symmetric, and of even dimension; in other words, it is symplectic. It is this symplectic structure that allows for a significantly more general formulation of Hamiltonian dynamics on arbitrary smooth manifolds using the language of differential forms. For a comprehensive review of the subject, see the textbooks [1,36] and the exposition [37] on differential forms in plasma physics. For the purposes of introducing a widely-applicable weak formulation, we will briefly describe general Hamiltonian systems starting with the following definition.
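As a concrete illustration of the matrix formulation above, the following minimal Python sketch integrates ż = J∇H for the harmonic oscillator H(q, p) = (q² + p²)/2 (a toy choice for illustration, not one of the paper's test systems) and checks that H is conserved along the trajectory:

```python
import numpy as np

N = 1  # one degree of freedom; z = (q, p) in R^{2N}
J = np.block([[np.zeros((N, N)), np.eye(N)],
              [-np.eye(N), np.zeros((N, N))]])

def grad_H(z):
    # harmonic oscillator H(q, p) = (q^2 + p^2)/2, so grad H(z) = z
    return z

def X_H(z):
    # Hamiltonian vector field X_H = J grad H
    return J @ grad_H(z)

def rk4_step(z, dt):
    k1 = X_H(z); k2 = X_H(z + 0.5*dt*k1)
    k3 = X_H(z + 0.5*dt*k2); k4 = X_H(z + dt*k3)
    return z + dt/6.0*(k1 + 2*k2 + 2*k3 + k4)

z = np.array([1.0, 0.0])
H0 = 0.5*np.dot(z, z)
for _ in range(1000):          # integrate to t = 10
    z = rk4_step(z, 0.01)
H1 = 0.5*np.dot(z, z)
print(abs(H1 - H0))  # small: H is conserved along the exact flow
```

Note that a non-symplectic integrator like RK4 conserves H only up to its truncation error; structure-preserving (symplectic) integrators are the standard choice for long-time Hamiltonian simulation.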
Definition 2.1 (Symplectic manifold).The pair (M,Ω) is a symplectic manifold if M is a smooth manifold and Ω is a closed, nondegenerate differential 2-form on M.
That is, dΩ = 0 (Ω is closed; see [1,36,37] for more details on the exterior derivative d) and, for every z ∈ M, Ω_z is a bilinear map on the product tangent space at z, T_zM × T_zM, that is anti-symmetric (Ω_z(X,V) = −Ω_z(V,X) for all X, V ∈ T_zM) and non-degenerate (Ω_z(X,V) = 0 for all V ∈ T_zM implies X = 0). Non-degeneracy and anti-symmetry imply that the dimension of M must be even. If Ω is allowed to degenerate (i.e., admitting a null space) then (M,Ω) is referred to as a presymplectic manifold, which need not have even dimension. However, throughout we assume that M has dimension 2N for some N ≥ 1. We then classify a Hamiltonian system as follows.
Definition 2.2 (Hamiltonian system). Let (M,Ω) be a (2N)-dimensional presymplectic manifold and H : M → R a smooth function. The Hamiltonian vector field X_H associated to (M,Ω,H) is defined by
$$\iota_{X_H}\Omega = dH. \tag{3}$$
The tuple (M,Ω,H,X_H) is referred to as a Hamiltonian system.
In this way, X_H is defined implicitly at each point z ∈ M through the action of the elements Ω_z(⋅, V) in (T_zM)*, the dual of the tangent space T_zM. In the Euclidean setting (M = R^{2N}), if Ω is symplectic, we can associate Ω with a quadratic form and use the Euclidean inner product (dot product) to write
$$\Omega_z(U, V) = U \cdot J_z^{-1} V,$$
where J_z is a symplectic matrix for each z ∈ M, and further, suppressing the z-dependence,
$$\dot{z} = X_H(z) = J \nabla H(z). \tag{4}$$
This will play a role in the weak formulation below.

Nearly-periodic Hamiltonian systems
A nearly-periodic system is a dynamical system that depends smoothly on a small parameter ε and is periodic in the limit ε → 0. A nearly-periodic Hamiltonian system is nearly periodic and is Hamiltonian for all ε ≥ 0. In practice, the symplectic form may depend on ε and degenerate at ε = 0, which is why recent developments in nearly-periodic system theory [13,15,16] work with presymplectic, rather than symplectic, Hamiltonian systems (see Examples 3 and 4 below). Nearly-periodic Hamiltonian systems are a prototypical class of dynamics with which to explore how approximate symmetries may be used to reduce the dimensionality of physical systems. An important example relevant to plasma physics is the dynamics of a charged particle in a strong magnetic field, which is emulated below in Examples 3 and 4. First, we introduce several definitions. The limiting symmetry in a nearly-periodic system is defined using a circle action.
Change with respect to an underlying vector field on M is defined using the Lie derivative.
Definition 2.4 (Lie derivative). Let X be a vector field on M with associated flow map Φ^t such that (d/dt)Φ^t(z) = X(Φ^t(z)) for all z ∈ M. The Lie derivative of a tensor field τ on M with respect to X is defined by
$$L_X \tau = \frac{d}{dt}\bigg|_{t=0} (\Phi^t)^* \tau,$$
where (⋅)* is the pullback operation (see e.g. [1, Def. 1.7.16]).
For instance, if f is a function on M, then L_X f(z) is the directional derivative of f at z in the direction X(z). If V is a vector field on M, then L_X V is the Lie bracket of X and V, given by ∂_X V − ∂_V X, with ∂ denoting the directional derivative. If τ is a rank-k differential form, then the Lie derivative is given by Cartan's formula,
$$L_X \tau = \iota_X d\tau + d(\iota_X \tau). \tag{5}$$
We will occasionally denote vector fields in differential operator notation; in other words, with local coordinates z = (z^1, ..., z^{2N}) for M, the vector field X is written
$$X = \sum_{i=1}^{2N} X^i \frac{\partial}{\partial z^i}.$$
As well as providing a compact notation, this emphasizes the action of vector fields on functions through the directional derivative, Xf = L_X f. A nearly-periodic Hamiltonian system is then defined as follows.
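The coordinate formulas above can be checked symbolically. The sketch below (using SymPy, with example fields chosen purely for illustration) computes L_X f as a directional derivative and the Lie bracket ∂_X V − ∂_V X componentwise; the rotation field conserves the radius function, and rotation and radial scaling commute:

```python
import sympy as sp

q, p = sp.symbols('q p')
z = [q, p]

# Two example vector fields on R^2, components in coordinates (q, p)
X = [p, -q]   # rotation field (harmonic-oscillator flow)
V = [q, p]    # radial (scaling) field

def lie_derivative_fn(X, f):
    # L_X f: directional derivative of a function f along X
    return sum(Xi * sp.diff(f, zi) for Xi, zi in zip(X, z))

def lie_bracket(X, V):
    # [X, V]^i = sum_j X^j dV^i/dz^j - V^j dX^i/dz^j
    return [sp.simplify(lie_derivative_fn(X, Vi) - lie_derivative_fn(V, Xi))
            for Xi, Vi in zip(X, V)]

f = q**2 + p**2
print(sp.simplify(lie_derivative_fn(X, f)))  # 0: f is conserved by rotation
print(lie_bracket(X, V))                     # [0, 0]: the fields commute
```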
For a nearly-periodic Hamiltonian system (M, Ω_ε, H_ε, X_ε), smoothness in ε implies that for sufficiently small ε, Ω_ε and H_ε have formal asymptotic expansions
$$\Omega_\varepsilon = \Omega_0 + \varepsilon \Omega_1 + \varepsilon^2 \Omega_2 + \cdots, \qquad H_\varepsilon = H_0 + \varepsilon H_1 + \varepsilon^2 H_2 + \cdots.$$
From Hamilton's equations (3), we find that the Hamiltonian vector field X_ε has a formal power series X_ε = X_0 + εX_1 + ε²X_2 + ⋯ with coefficient vector fields given by the infinite family of equations
$$\iota_{X_0}\Omega_0 = dH_0, \tag{6}$$
$$\iota_{X_1}\Omega_0 + \iota_{X_0}\Omega_1 = dH_1, \tag{7}$$
$$\iota_{X_2}\Omega_0 + \iota_{X_1}\Omega_1 + \iota_{X_0}\Omega_2 = dH_2, \tag{8}$$
and so on. In 1962, M. Kruskal [3] proved that every nearly-periodic system possesses an approximate U(1) symmetry given by a unique vector field R_ε, referred to as the roto-rate and defined as follows.
Definition 2.6 (Roto-rate). The roto-rate of a nearly-periodic vector field X_ε is a formal power series R_ε = R_0 + εR_1 + ε²R_2 + ⋯ whose flow is 2π-periodic and which commutes with X_ε to all orders in ε.
Kruskal also showed that, in the Hamiltonian case, R_ε is itself Hamiltonian, with Hamiltonian µ_ε known as the adiabatic invariant. The adiabatic invariant is defined as a formal power series µ_ε = µ_0 + εµ_1 + ε²µ_2 + ⋯. Explicit formulas for the first few terms in the series were first found by Burby and Squire [16]. Since R_ε commutes with X_ε to all orders in ε (L_{R_ε}X_ε = 0), µ_ε is formally conserved by X_ε, hence the notion of R_ε as an approximate symmetry of X_ε.

Symmetry reduction in nearly-periodic Hamiltonian systems
In the presence of an exact continuous symmetry t ↦ Φ^t of a Hamiltonian system (M, Ω, H, X_H), such that L_R X_H = 0 and ι_R Ω = dµ_*, where R is the infinitesimal generator of Φ^t, the phase space (M, Ω) and dynamics (H, X_H) may be reduced using the Marsden-Weinstein-Meyer construction (see [1, Chapter 4]). At a high level, this procedure involves (a) restricting to a level set Λ_µ = µ_*^{-1}(µ) of the conserved quantity µ_* associated with Φ^t, (b) identifying points on orbits of Φ^t to form the reduced phase space M_{µ0}, and (c) identifying a suitable symplectic form Ω_{µ0} on M_{µ0}. Existence of Ω_{µ0} on the reduced phase space M_{µ0} is at the heart of this result. On the other hand, the reduced Hamiltonian H_{µ0} on M_{µ0} is simply defined as the common value of H along each orbit, since H is, by assumption, constant along R-orbits. We note that other analytical methods of reducing Hamiltonian systems do exist, including those related to normal-form theory [38], which involves successive near-identity coordinate transformations.
In the setting of an approximate symmetry, the procedure to derive a reduced Hamiltonian H_{µ0} is similar to Marsden-Weinstein-Meyer, but with substantially more effort for higher orders in ε. For nearly-periodic systems one is guaranteed a reduction in dimension of at least 2, as (a) we restrict to a level set of the adiabatic invariant µ_ε, and (b) we condense phase space by forming an equivalence class of points that lie on the same integral curve of the roto-rate R_ε. However, the reduction in dimension can be greater, as demonstrated below in Section 5, Example 4.
To connect theory and practice, it is useful to draw parallels between geometric reduction of Hamiltonian systems, such as the Marsden-Weinstein-Meyer construction, and dynamic reduction involving averages and restrictions at the level of vector fields. In the next section we place the latter (dynamic) reduction technique on firmer ground by providing new theoretical results that ensure first-order averaging at the level of X_ε preserves Hamiltonian structure. These theoretical results complement existing results on nearly-periodic systems in [12,16], and provide motivation for weak-form equation learning from trajectories of nearly-periodic systems. Following is a discussion drawing parallels to classical time-averaging in fast-slow systems.

Hamiltonian structure of first-order averaging theory
Let X_ε = ω_0 R_0 + εX_1 + ⋯ be a nearly-periodic Hamiltonian system on the exact presymplectic manifold (M, Ω_ε) with Hamiltonian H_ε and limiting roto-rate R_0. Recall that presymplectic means Ω_ε is antisymmetric and closed, dΩ_ε = 0, but possibly degenerate. Exact means there is a 1-form ϑ_ε such that Ω_ε = −dϑ_ε. Hamiltonian means ι_{X_ε}Ω_ε = dH_ε, which implies the sequence of equations (6)-(8). The flow map for R_0 will be denoted Φ^0_θ, where θ ∈ R mod 2π is the time parameter. Given a tensor τ on M, its R_0-average will be denoted by
$$\langle \tau \rangle = \frac{1}{2\pi}\int_0^{2\pi} (\Phi^0_\theta)^* \tau \, d\theta,$$
that is, the time-average of the pullback of τ by Φ^0_θ. Generic (e.g., possibly non-Hamiltonian) first-order averaging theory approximates the vector field X_ε with the first-order averaged vector field
$$X^*_\varepsilon = \langle X_0 + \varepsilon X_1 \rangle.$$
The first-order average is automatically R_0-invariant, L_{R_0}X^*_ε = 0, meaning the fast phase corresponding to motion along X_0 = ω_0R_0 is ignorable after the replacement. We will show that the reduced dynamics defined by X^*_ε and obtained by ignoring the fast phase is presymplectic Hamiltonian under the assumptions in the previous paragraph. Although the all-orders averaging theory of Kruskal [3] is known to be Hamiltonian, the Hamiltonian structure underlying first-order averaging has never been identified in full generality. Since first-order averaging is sufficient for analyzing various important examples, including those considered in this work, a self-contained description here is warranted.
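The R_0-average of a vector field can be approximated numerically by pushing a point along the circle action, evaluating the field, and pulling the result back. The toy field below is our own choice for illustration (not one of the paper's examples): a fast rotation in (Q, P) plus a coupling of the fast variable Q into the slow momentum; equispaced averaging over the circle removes the coupling term:

```python
import numpy as np

def rot(theta):
    # flow of the limiting roto-rate R0 = P d/dQ - Q d/dP on the fast plane
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [-s, c]])

def X(z, eps=0.5):
    # toy nearly-periodic field on R^4, z = (Q, P, q, p): fast rotation
    # plus an O(eps) coupling of Q into the slow momentum equation
    Q, P, q, p = z
    return np.array([P, -Q, p, -q + eps*Q])

def averaged(z, M=200):
    # first-order average <X>(z) = (1/2π) ∫ DΦ_{-θ} X(Φ_θ z) dθ,
    # approximated by an equispaced Riemann sum over the circle
    acc = np.zeros(4)
    for theta in np.linspace(0.0, 2*np.pi, M, endpoint=False):
        w = z.copy()
        w[:2] = rot(theta) @ z[:2]        # push the point along the action
        Xw = X(w)
        Xw[:2] = rot(-theta) @ Xw[:2]     # pull the field back
        acc += Xw
    return acc/M

z = np.array([1.0, 0.5, -0.3, 0.7])
print(averaged(z))  # coupling term averages away, leaving (P, -Q, p, -q)
```

The fast rotation survives averaging (it is R_0-equivariant), while the θ-dependent coupling integrates to zero, mirroring how X*_ε retains only the R_0-invariant part of X_0 + εX_1.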
First we recall some useful results related to the adiabatic invariant µ_ε from the general theory of nearly-periodic Hamiltonian systems on presymplectic manifolds [16]. Adiabatic invariance means L_{X_ε}µ_ε = 0 in the sense of formal power series. There are general formulas for the first few terms in the series µ_ε, the simplest being
$$\mu_0 = \langle \iota_{R_0}\vartheta_0 \rangle, \tag{10}$$
where ϑ_0 denotes the first term in the formal power series expansion ϑ_ε = ϑ_0 + εϑ_1 + ⋯. Observe that the exterior derivative of Eq. (10) justifies thinking of the leading-order adiabatic invariant as "energy over frequency," where we have used Eq. (6) to infer ⟨H_0⟩ = H_0. The R_0-average of µ_1 also admits a simple formula. In general, some number of the first terms in the series µ_ε vanish. Let k denote the smallest integer such that µ_k is not identically 0, and set µ_* = µ_k. The condition L_{X_ε}µ_ε = 0 is then equivalent to the sequence of equations
$$L_{X_0}\mu_* = 0, \tag{13}$$
$$L_{X_0}\mu_{k+1} + L_{X_1}\mu_* = 0, \tag{14}$$
and so on. Next we prove a simple technical lemma that establishes structural properties of the first-order averaged system related to its presymplectic Hamiltonian formulation.
Lemma 1. The first non-zero coefficient µ_* in the adiabatic invariant series is constant along solutions of the first-order averaged system X^*_ε,
$$L_{X^*_\varepsilon}\mu_* = 0. \tag{15}$$
In addition, the first-order averaged system satisfies a Hamilton-like equation, Eq. (16).
Proof. The conservation law (15) follows immediately from (13) and the R_0-average of (14). (Notice that ⟨L_{X_0}µ_1⟩ = ω_0⟨L_{R_0}µ_1⟩ = 0 and ⟨µ_*⟩ = µ_*.) As for (16), it follows directly from the R_0-average of (7) and an identity, Eq. (17), which represents an application of Cartan's formula (5) and (12).
We are finally in position to establish our main claim: after "forgetting" the fast phase in the first-order averaged system, the resulting slow dynamics is presymplectic Hamiltonian. The result follows from Lemma 1 and a variant of the Marsden-Weinstein symplectic quotient technique.
Theorem 1. Let Λ_µ = µ_*^{-1}(µ) be a regular level set of µ_*, let R_µ and X^*_µ denote the restrictions of R_0 and X^*_ε to Λ_µ, and let π : Λ_µ → M_{µ0} denote the projection onto the space of R_µ-orbits. Then there is a unique vector field X_{µ0}, a unique closed 2-form Ω_{µ0}, and a unique function H_{µ0} on M_{µ0} such that
$$\pi^*\Omega_{\mu_0} = \Omega^*_\mu, \qquad \pi^*H_{\mu_0} = H^*_\mu, \tag{18}$$
with X_{µ0} π-related to X^*_µ. Moreover, X_{µ0} is Hamiltonian,
$$\iota_{X_{\mu_0}}\Omega_{\mu_0} = dH_{\mu_0}. \tag{19}$$
Proof. By construction, the first-order averaged system commutes with R_0, L_{R_0}X^*_ε = 0. By Eqs. (13) and (15), both R_0 and X^*_ε are tangent to Λ_µ. Therefore L_{R_µ}X^*_µ = 0, which says that the X^*_µ-flow maps R_µ-orbits into R_µ-orbits. It follows that there is a unique vector field X_{µ0} on the orbit space M_{µ0} that is π-related to X^*_µ. In general, the interior product of R_µ with Ω^*_µ is given by Eq. (17). It is straightforward to show that dΩ_{µ0} = 0.

Since L_{R_µ}H^*_µ = 0, there is a unique function H_{µ0} on M_{µ0} such that π*H_{µ0} = H^*_µ. Overall, we have just established existence of the desired objects on M_{µ0} that satisfy Eq. (18). Establishing the Hamilton equation (19) is now simple.
From equations (16), (17), and (19), we can identify two methods of obtaining the leading-order reduced Hamiltonian system (M_{µ0}, Ω_{µ0}, H_{µ0}, X_{µ0}). We refer to this as the "leading-order reduced" system, since it is obtained by utilizing the roto-rate R_ε and adiabatic invariant µ_ε each to leading order (while X^*_ε is the "first-order averaged" vector field, since it is obtained by averaging X_ε to first order, X^*_ε = ⟨X_0 + εX_1⟩). Higher-order reduced systems can be obtained in a similar manner when given access to higher-order terms in R_ε and µ_ε; however, the order-by-order Hamiltonian structure is as of yet undetermined, and we reserve it for a future work. We differentiate the two methods of obtaining the reduced system as geometric reduction and dynamic reduction: in geometric reduction, one performs the quotient construction above, keeping terms to first order in accordance with Hamilton's equations (6)-(8); in dynamic reduction, the first-order averaged field observed on a level set of µ_* equals X_{µ0} and is Hamiltonian according to (19). Geometric reduction is more robust and allows for systematic exploration of higher-order reductions in ε. On the other hand, dynamic reduction merely involves observing the system on the reduced submanifold M_{µ0} after averaging around the flow of the limiting roto-rate R_0. In physical systems, it can be assumed that the system is already close to this manifold, hence some method of averaging observations is all that is needed to observe the Hamiltonian structure of the system to first order. We exploit this property in the current work, whereby the weak form emulates the averaging necessary to reveal Hamiltonian structure in the dynamics.

Connection with classical time-averaging
One case of interest is when M = R^{2N} and the dynamics can be partitioned into slow and fast modes, z = (z_s, z_f), with an ε-dependent Hamiltonian H_ε. As in Examples 1 and 2 below, the first-order averaged Hamiltonian then takes the form H^*_ε = H^*_ε(z_s, µ_*) for some function of the slow variables and the adiabatic invariant, where µ_* = µ_0 is the leading-order adiabatic invariant. When the symplectic form Ω_ε satisfies Ω_ε = Ω_0, the first-order averaged dynamics X^*_ε are already Hamiltonian, given by ι_{X^*_ε}Ω_0 = dH^*_ε. If we further have ι_{V_s}ι_{V_f}Ω_0 = 0 for any pair of vector fields V_s, V_f tangent to the slow and fast submanifolds, respectively, then the dynamics ż = X^*_ε(z) take the form
$$\dot{z}_s = J_s \nabla_{z_s}H^*_\varepsilon(z_s, \mu_0), \qquad \dot{z}_f = \partial_2 H^*_\varepsilon(z_s, \mu_0)\, J_f \nabla_{z_f}\mu_0(z_f), \tag{20}$$
with ∂_2 denoting differentiation with respect to the second argument and symplectic matrices J_s, J_f derived from Ω_0 for the slow and fast subsystems. While we have not yet restricted to a level set of µ_0, for any trajectory z = (z_s, z_f) of (20), µ_0 is conserved: its time derivative splits into a term that is zero because µ_0 is conserved by the flow of X_0 and a term that is zero because J_f is symplectic. From the dependence of H^*_ε on µ_0, conservation of µ_0 along this reduced flow implies that the slow system z_s decouples from the fast system z_f. Reducing phase space is then just restriction to the slow variables z_s, and the reduced Hamiltonian is given, up to a constant depending on µ, by H_{µ0}(z_s) = H^*_ε(z_s, µ). We can now make some connections with classical time-averaging. If we instead represent the dynamics of the original slow variables z_s as the non-autonomous system
$$\dot{z}_s = \varepsilon F(z_s, t), \tag{21}$$
suppressing explicit dependence on z_f through the second argument of F, then the previous procedure of first-order averaging by the limiting roto-rate R_0 is similar to classical averaging techniques [39, Ch. 2]. Specifically, for the flow map Φ^0_t of R_0, it holds that z_f(t) = (Φ^0_t z(0))_f + O(ε), and we see that F is O(ε)-close to being 2π-periodic in its second variable. The time-averaged autonomous system
$$\dot{\bar{z}}_s = \varepsilon \bar{F}(\bar{z}_s), \qquad \bar{F}(\cdot) := \frac{1}{2\pi}\int_0^{2\pi} F(\cdot, t)\, dt,$$
then stays O(ε)-close to the slow portion of the averaged system (20) (and hence to the original system) on timescales of O(ε^{-1}). While classical time-averaging is very useful for systems (21) with F exactly 2π-periodic in its second argument, this is the extent of its utility, and its application to nearly-periodic systems is severely limited. The fast mode z_f is only 2π-periodic when ε = 0, such that for any ε > 0, the slow and fast systems are coupled, with the dynamics for z_s not falling into the class (21) with F periodic in t. Deriving the correct analytical formulas for reductions of nearly-periodic systems is a more subtle and labor-intensive task (if not intractable), especially if a Hamiltonian structure is to be preserved (as outlined in the previous section), but significant progress has been made in this direction [12,15,16].
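The O(ε)-closeness of classical time-averaging on O(ε⁻¹) timescales can be seen in a scalar toy problem of our own choosing (not from the text) where both the full and averaged systems have closed-form solutions:

```python
import numpy as np

eps = 0.01                     # timescale-separation parameter
x0, T = 1.0, 1.0/eps           # observe over the slow timescale O(1/eps)
t = np.linspace(0.0, T, 2001)

# full non-autonomous system x' = eps*x*sin^2(t); exact solution:
x_full = x0*np.exp(eps*(t/2 - np.sin(2*t)/4))
# averaged system x' = (eps/2)*x, since <sin^2> = 1/2; exact solution:
x_avg = x0*np.exp(eps*t/2)

err = np.max(np.abs(x_full - x_avg))
print(err)  # remains O(eps) over the whole interval [0, 1/eps]
```

Here the averaged flow tracks the full flow with error bounded by a multiple of ε uniformly on [0, 1/ε], the guarantee that fails for nearly-periodic systems once the fast phase is only approximately periodic.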
The aim of the current work is to complement these analytical techniques with computational methods that learn the correct reduced Hamiltonian systems using weak-form equation learning. Given the intricacies of reducing a nearly-periodic system by means of the roto-rate R_ε, it is surprising that the weak form recovers the same reduced dynamics from discrete-time trajectory data. In the next section we describe how WSINDy may be employed in this context.

WSINDy for Hamiltonian systems
In this section we describe how WSINDy may be used to identify the Hamiltonian H defining a Hamiltonian system using discrete-time observations of the flow of X_H. For the remainder of this article we will work in M = R^{2N}, although many of the concepts that follow have direct extensions to general manifolds. We will also assume that the symplectic form Ω is known, and leave identification of Ω and extension to general manifolds to future work.
We can define a weaker version of (4) (i.e., requiring less regularity of the resulting integral curves of X_H) by considering a trajectory t ↦ z(t) satisfying J^{-1}(z)ż = ∇H(z), where J^{-1}(z) is the symplectic matrix associated with Ω_z. Integrating in time against a smooth time-dependent vector field V(z,t) satisfying V(z(0),0) = V(z(T),T) = 0 gives
$$\int_0^T V(z,t)\cdot J^{-1}(z)\dot{z}\, dt = \int_0^T V(z,t)\cdot \nabla H(z)\, dt. \tag{22}$$
If V has no explicit z-dependence, integration by parts allows the left-hand side to be evaluated without differentiating z, leading (for constant J) to
$$-\int_0^T \dot{V}(t)\cdot J^{-1}z\, dt = \int_0^T V(t)\cdot \nabla H(z)\, dt. \tag{23}$$
On the other hand, if V depends on z, we can use the dynamics ż = X_H(z) and rearrange terms to eliminate the time derivative of z. In either case, the phase space variables z are not differentiated. This is crucial to accurate identification of H from time-series data that is corrupted by measurement noise. We will show here that this weak formulation also serves to filter out intrinsic dynamics that occur on a faster timescale. We restrict ourselves to the simpler case (23) in this work and leave full exploration of (22) with general test vector fields V = V(z,t) to future research.
To identify H from data, we consider noisy evaluations Z = z(t) + η, where t ↦ z(t) is a trajectory of some Hamiltonian system (R^{2N}, Ω, H, X_H), t = (t_0, ..., t_{m+1}) are the timepoints, and η represents i.i.d. mean-zero measurement noise with variance σ² < ∞. We approximate H by expanding in terms of a chosen library H = (H_1, ..., H_J) of J trial Hamiltonian functions, H ≈ Σ_{j=1}^J ŵ_j H_j, and we make the assumption that ŵ is sparse. To solve for ŵ, we first discretize (23) using a discrete inner product ⟨⋅,⋅⟩_t on R^{2N}-valued functions of time with t as quadrature nodes. For simplicity and previously demonstrated benefits [22,23], we use the trapezoidal rule throughout, so that
$$\langle f, g\rangle_{\mathbf{t}} = \sum_{i=0}^{m} \frac{\Delta t_i}{2}\left(f(t_i)\cdot g(t_i) + f(t_{i+1})\cdot g(t_{i+1})\right),$$
where ∆t_i := t_{i+1} − t_i; for test functions compactly supported in time, this sum reduces to the timepoints within the support. In this work, we use the convolutional approach as in [22]; that is, we fix a reference test function φ ∈ C_c^∞(R) supported on [−T_φ/2, T_φ/2] for some radius T_φ, and we set the test vector fields to translates of φ along the time grid. This yields a linear system (G, b), with b collecting discretizations of the left-hand side of (23) and G collecting those of the right-hand side over the library H. We then solve the sparse regression problem
$$\hat{w} = \operatorname{argmin}_{w}\, \|Gw - b\|_2^2 \quad \text{subject to sparsity of } w, \tag{24}$$
where in this work we use the MSTLS algorithm (modified Sequential Thresholding Least Squares [22]) to solve (24), which uses sequential thresholding on the combined term ∥G_j w_j∥_2/∥b∥_2 and coefficient magnitudes |w_j|. In addition, MSTLS performs a grid search for a suitable value of the threshold λ (see [22] for more details and [17] for the original STLS algorithm). More information on MSTLS is provided in Appendix B. Data from multiple trajectories can easily be incorporated to improve the recovery process. When L trajectories Z = (Z^{(1)}, ..., Z^{(L)}) are available, with samples Z^{(ℓ)} = z^{(ℓ)}(t^{(ℓ)}) + η, the linear system (G, b) is formed by vertically concatenating the linear systems from each trajectory, G = [(G^{(1)})^T ⋯ (G^{(L)})^T]^T, b = [(b^{(1)})^T ⋯ (b^{(L)})^T]^T. Note that the test vector fields in this case need not be the same for each trajectory. In the current work, however, we focus on demonstrating recovery of a suitable reduced Hamiltonian H_{µ0} using only a single trajectory.
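A minimal sketch of the weak-form identification pipeline follows, under simplifying assumptions of our own: exact harmonic-oscillator data, a small hypothetical library of four candidate Hamiltonians, a polynomial bump test function placed convolutionally, trapezoidal quadrature, and plain sequential thresholded least squares standing in for MSTLS:

```python
import numpy as np

# synthetic data: exact trajectory of H(q, p) = (q^2 + p^2)/2
dt = 0.01
t = np.arange(0.0, 20.0, dt)
q, p = np.cos(t), -np.sin(t)

# hypothetical library: gradients (dH/dq, dH/dp) of candidate Hamiltonians
grads = [lambda q, p: (q, 0*p),      # H1 = q^2/2
         lambda q, p: (0*q, p),      # H2 = p^2/2
         lambda q, p: (q**3, 0*p),   # H3 = q^4/4
         lambda q, p: (p, q)]        # H4 = q*p

# compactly supported bump test function on [-T_phi/2, T_phi/2]
T_phi = 2.0
m = int(T_phi/(2*dt))
s = np.arange(-m, m + 1)*dt
phi = (1.0 - (s/(m*dt))**2)**8       # vanishes with its derivative at the edges
dphi = np.gradient(phi, dt)

def trap(y):
    # trapezoidal quadrature with uniform spacing dt
    return dt*(y.sum() - 0.5*(y[0] + y[-1]))

# weak-form linear system: for J = [[0,1],[-1,0]], J^{-1}z = (-p, q), and the
# weak equation reads  -∫ phi' (J^{-1}z)_i dt = ∫ phi (grad H)_i dt,  i = q, p
G, b = [], []
for k in range(m, len(t) - m, m):    # convolution-style placement of phi
    sl = slice(k - m, k + m + 1)
    b.append(trap(-dphi*(-p[sl])))   # q-component row
    b.append(trap(-dphi*(q[sl])))    # p-component row
    G.append([trap(phi*g(q[sl], p[sl])[0]) for g in grads])
    G.append([trap(phi*g(q[sl], p[sl])[1]) for g in grads])
G, b = np.array(G), np.array(b)

# sequential thresholded least squares (simplified stand-in for MSTLS)
w = np.linalg.lstsq(G, b, rcond=None)[0]
for _ in range(5):
    small = np.abs(w) < 0.1
    w[small] = 0.0
    w[~small] = np.linalg.lstsq(G[:, ~small], b, rcond=None)[0]
print(np.round(w, 3))  # recovers H = q^2/2 + p^2/2, zeroing the spurious terms
```

Note that only φ is differentiated; the data (q, p) enter undifferentiated, which is the source of the method's noise robustness.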

WSINDy for Hamiltonian Systems with Approximate Symmetries
An intriguing aspect of Hamiltonian systems that exhibit approximate symmetries, from a computational standpoint, is that they typically admit a hierarchy of models which can be used to efficiently express the dynamics in different regimes. These consist of the dynamics of the limiting roto-rate R_0, the full Hamiltonian vector field X_ε, and reduced dynamics for each order in ε obtained from averaging and restricting X_ε according to the methodology in Section 3. Theorem 1 indicates that the leading-order reduced vector field X_{µ0} is Hamiltonian, and we conjecture that similar conditions exist for higher-order-averaged systems to remain Hamiltonian, justifying a search for Hamiltonian structure in coarse-grained models of X_ε. While the focus of this article is to demonstrate that weak-form methods can be used to identify a coarse-grained Hamiltonian system that agrees with averaging theory, in this subsection we demonstrate a more general property: when given access to all state variables, WSINDy allows access to multiple models from the hierarchy associated with nearly-periodic Hamiltonian systems, simply by varying the test function radius T_φ. That is, this subsection provides justification for a complete analysis of this multi-model inference capability, which will be provided in a future work.
The advantages of weak-form coarse-graining can be clearly observed by examining the performance of WSINDy as a function of the weak-form scale selector
$$\sigma^f_\phi := \frac{T_\phi}{T_f},$$
where T_φ is the support length (in time) of the test function φ and T_f is the dominant period of the fast scale. In words, σ^f_φ is the number of fast periods that occur in one integration of the dynamics against φ. Figures 1 and 3 depict the performance of WSINDy as σ^f_φ is varied from 1 to 16 (i.e., T_φ is varied from T_f to 16T_f) when applied to the full 4-dimensional nearly-periodic Hamiltonian system (28) in Example 1. Here the library H contains only the 7 terms necessary to represent all three relevant models, that is, the leading-order dynamics given by H_0 = ½(Q² + P²), the full dynamics given by H_ε (eq. (26)), and the reduced Hamiltonian H_{µ0} (eq. (33)). Figure 3 demonstrates that simply by varying the support of the test function, we gain access to all three models. The added 1% noise affects the transition between models (Figure 3, left; with lower noise requiring larger σ^f_φ to obtain the reduced model) as well as the recovered coefficient accuracy (Figure 3, right), which is less than 1% for each model in its range of validity (i.e., when TPR = 1, see eq. (52)). Figures 1 and 2 provide visualizations of the learned dynamics Ẑ in phase space and as time series. It is clear from the plots over (P, q, p) and (Q, P, q) space (Figure 1) that the dynamics can be approximately described as two commuting flows given by the coarse-grained (red) and roto-rate (cyan) dynamics. Meanwhile, the time series plots of the slow variables (Figure 2, left) indicate that the coarse-grained dynamics (red) very accurately match the full model (yellow). Hence, from a single dataset, the weak form allows access to three models that are fundamental to understanding a nearly-periodic Hamiltonian system.
For the remainder of the article, we focus solely on the ability of WSINDy to identify a suitable coarse-grained Hamiltonian H_{µ0}.

Numerical Experiments
To exhibit the efficacy of weak-form coarse-graining in a variety of contexts, we examine the following nearly-periodic Hamiltonian systems. In each case we apply the geometric reduction procedure outlined in Section 3 to derive the reduced Hamiltonian system to be discovered. Note that in Examples 1 and 2 the coefficients of the reduced system are given in terms of special functions and integrals which must be numerically evaluated.

Example 1: Nonlinearly Coupled Oscillators
For our first system we examine two coupled oscillators with different timescales and a nonlinear coupling potential. The ε-dependent Hamiltonian is H_ε (eq. (26)), defined on R⁴ with a nonlinear coupling potential V; note that the symplectic form satisfies Ω_ε = Ω_0 = Ω. The variables (Q, P) denote the position and momentum of a fast oscillator with dynamics on O(1) timescales, while (q, p) denotes the slow variables evolving on a timescale of O(ε). The equations of motion for this two-oscillator system are given in (28). As ε increases, the fast variables (Q, P) impart large perturbations to the oscillating motion of the slow variables (q, p) by means of V (see Figure 4, right). At ε = 0 the two oscillators decouple and the dynamics are governed by the limiting roto-rate, which we identify as the Hamiltonian flow on (R⁴, Ω) of the leading-order adiabatic invariant µ_0.

Figure 5. Left: (Q(0), P(0)) = (π/2, 0), with pendulum dynamics close to linear. Right: (Q(0), P(0)) = (31π/32, 0), with the pendulum initialized close to the unstable fixed point, removing the strict scale separation (see Figure 6). For each, (q_1, p_1) and (q_2, p_2) are plotted as separate trajectories overlaying Ĥ_µ0(0, 0, q_2, p_2) (coloring conventions are the same as in Figure 4).
In this case the limiting frequency function ω_0, defined so that X_0 = ω_0 R_0, is unity. Hence, to leading order, (Q, P) exhibits periodic clockwise circular rotation while (q, p) remains fixed, and the flow map Φ_t^0 of R_0 is a rigid rotation in the (Q, P)-plane. Each level set of µ_0 is a single orbit of R_0 crossed with the slow variables (q, p) ∈ R², hence the quotient procedure M → Λ_µ → M_µ0 simply eliminates the fast variables. Thus, the leading-order reduced Hamiltonian H_µ0(q, p) is obtained by averaging H_ε along Φ_t^0 starting from an arbitrary point (Q, P, q, p) satisfying µ = µ_0(Q, P, q, p); the result (eq. (33)) involves J_1(z), the Bessel function of the first kind. The leading-order reduced equations of motion for the slow variables (q, p) then follow from Hamilton's equations for H_µ0.
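The averaging computation can be sanity-checked numerically. As a hedged illustration (the paper's coupling potential V is not reproduced here; the generic coupling term cos(q − Q) below is an assumption), averaging a trigonometric coupling along the circular fast orbit Q_θ = R cos θ produces a Bessel function of the first kind via the identity (1/2π)∫₀^{2π} cos(q − R cos θ) dθ = J_0(R) cos q, which is precisely the mechanism by which Bessel functions enter the reduced Hamiltonian:

```python
import numpy as np

def j0_series(x, terms=30):
    # Bessel function of the first kind J_0 via its power series
    total, term = 0.0, 1.0
    for k in range(terms):
        if k > 0:
            term *= -((x / 2) ** 2) / k**2
        total += term
    return total

def averaged_coupling(q, R, n=2000):
    # average cos(q - Q_theta) around the circular fast orbit Q_theta = R*cos(theta)
    theta = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
    return np.mean(np.cos(q - R * np.cos(theta)))

q, R = 0.7, 1.3
print(averaged_coupling(q, R), j0_series(R) * np.cos(q))  # should agree
```

Because the integrand is smooth and periodic, the uniform-grid average converges spectrally, so a modest number of quadrature points already matches the series to machine precision.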
Example 2: Hénon–Heiles system

On the subset of phase space S_0 = {z : ½P² < α²(1 + cos Q)}, equations (36)-(41) define a fast-slow system. The limiting angular frequency function (see Definition 2.5) is expressed in terms of K, the complete elliptic integral of the first kind (defined in, e.g., [40]; note that within S_0 the argument of K lies in the interval [0, 1]), and determines the limiting roto-rate R_0. Let Φ_θ denote the time-θ flow map of R_0. The leading-order adiabatic invariant is given by the loop action µ_0(z) = (1/2π) ∮_{γ(z)} P dQ,
where γ(z) is the parameterized curve γ(z)(θ) = Φ_θ(z). This integral can also be expressed in terms of elliptic integrals, with E denoting the complete elliptic integral of the second kind. As in Example 1, µ_0 is a function of the fast variables (Q, P) only. For Q mod 2π, level sets of µ_0 contain a single R_0 orbit, so again the quotient procedure M → Λ_µ → M_µ0 simply eliminates the fast variables (Q, P). Let (Q_θ, P_θ) be defined by Φ_θ(z) = (Q_θ, q_1, q_2, P_θ, p_1, p_2). The first-order averaged Hamiltonian H*_ε and the reduced Hamiltonian H_µ0 are then given by averaging along any trajectory (Q_θ, P_θ) of R_0 with µ_0(Q_0, P_0) = µ.
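For a concrete check of the action computation, consider the standard pendulum normalization H(Q, P) = ½P² − cos Q (setting α = 1 is our assumption, not the paper's parametrization). The loop action computed by direct quadrature should match the classical elliptic-integral formula µ_0 = (8/π)[E(k) − (1 − k²)K(k)] with k² = (E + 1)/2:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import ellipe, ellipk

# pendulum H(Q, P) = P^2/2 - cos(Q); libration (inside the separatrix) for -1 < E < 1
E_val = -0.2
Qmax = np.arccos(-E_val)  # turning point, where P = 0

# loop action mu_0 = (1/2pi) * loop integral of P dQ; by symmetry this is
# 4x the quarter-orbit integral; substitute Q = Qmax*sin(u) to smooth the endpoint
integrand = lambda u: np.sqrt(2.0 * (E_val + np.cos(Qmax * np.sin(u)))) * Qmax * np.cos(u)
mu_quad = (2.0 / np.pi) * quad(integrand, 0.0, np.pi / 2)[0]

# closed form via complete elliptic integrals (scipy uses the parameter m = k^2)
m = (E_val + 1.0) / 2.0
mu_ell = (8.0 / np.pi) * (ellipe(m) - (1.0 - m) * ellipk(m))

print(mu_quad, mu_ell)  # the two computations should agree
```

In the small-oscillation limit (k → 0) the formula reduces to the harmonic action (E + 1)/ω, which is a quick consistency check on the elliptic-integral expression.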
We choose parametrizations which lead to the reduced dynamics reproducing the famous Hénon–Heiles system [41], proposed as a simple model for the motion of a star within a galaxy.
Unlike the previous example, here the nonlinear pendulum dynamics lead to oscillation frequencies that depend on the initial conditions. Correspondingly, the identified reduced system depends strongly on which integral curve of the roto-rate the initial conditions lie on. For (Q(0), P(0)) close to the unstable fixed point Q(0) = π, the fast dynamics become increasingly nonlinear and exhibit a longer period. This leads to a blending of the fast and slow scales, as visualized in Figure 6 in Fourier space. The right plots show the power spectra of the trajectories at Q(0) = π/2 (top) and Q(0) = 31π/32 (bottom), where the latter exhibits much weaker scale separation. This example demonstrates that WSINDy is still able to identify the leading-order reduced dynamics from systems with a complex microstructure.

Example 3: Charged particle motion
Consider the motion of a charged particle with position (x, y, z) and velocity (v_x, v_y, v_z) in a time-independent electromagnetic field of the form B(x, y, z) = ∇ × A(x, y, z), E(x, y, z) = −∇ϕ(x, y, z). If the field potentials are independent of z, that is, A = (A_x(x, y), A_y(x, y), 0) and ϕ = ϕ(x, y), then the z-component of the particle momentum is a constant of motion. Therefore the Lorentz force equation describing the particle's motion may be written, for B(x, y) = ∂_x A_y − ∂_y A_x, as a planar system in (x, y, v_x, v_y).
This is an ε-dependent Hamiltonian system on R⁴ ∋ (x, y, v_x, v_y) for a suitable Hamiltonian and symplectic form depending on ϕ(x, y) and B(x, y). Letting q = (x, y) and v = (v_x, v_y), the flow map of the roto-rate at ε = 0 is a rotation of v, with R the rotation matrix from Example 1. Using the formulas in [16], the adiabatic invariant is degenerate to first order, with µ* = µ_2 (see (15)). The first-order averaged system X*_ε is automatically presymplectic Hamiltonian according to Lemma 2, with Hamilton's equations (16) for H*_ε = εϕ(x, y) and Ω_0 = −B(x, y) dx ∧ dy producing the first-order averaged dynamics.
Restriction to a level set of µ_2 merely restricts v-space, which is already absent from the dynamics, and the flow of the roto-rate fixes the q-variables. A simple test case for this problem is a constant magnetic field. This corresponds physically to zero current density, as it reduces Ampère's law to ∇ × B = 0, under which the assumption B = B(x, y)ẑ leads to B = const. We set B = 1 throughout, which for example corresponds to the vector potential A = (−½y, ½x, 0). We prescribe a sinusoidal ambient electric field potential which, given Gauss's law ∆ϕ = −ε_0^{-1}ρ relating the electric potential ϕ to the charge density ρ, loosely corresponds to a lattice of bound charges.
As the charged particle approximately travels along contours of ϕ, the velocity variables (v_x, v_y) exhibit nearly-periodic fast-scale oscillations. For larger ε, the motion of (v_x, v_y) becomes modulated by the steepness of ∇ϕ, which leads to irregular perturbations to the position variables (x, y). This can be seen in Figure 8 (right) as the oscillations in the black curve become elongated.
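A minimal simulation illustrates this behavior. The nondimensionalized system below is our own scaling, with an assumed potential ϕ(x, y) = sin x sin y standing in for the paper's sinusoidal potential; it is a sketch, not the paper's equations. The dynamics conserve H = ½|v|² + εϕ, which the energy-drift check verifies:

```python
import numpy as np

eps = 0.03
phi  = lambda x, y: np.sin(x) * np.sin(y)                                  # assumed potential
dphi = lambda x, y: np.array([np.cos(x) * np.sin(y), np.sin(x) * np.cos(y)])

def rhs(z):
    x, y, vx, vy = z
    Ex, Ey = -dphi(x, y)  # E = -grad(phi)
    # v' = v x B + eps*E with B = z-hat: fast gyration plus a slow drift along contours of phi
    return np.array([vx, vy, vy + eps * Ex, -vx + eps * Ey])

def rk4(z, dt, n):
    # classical fourth-order Runge-Kutta integration
    for _ in range(n):
        k1 = rhs(z); k2 = rhs(z + 0.5 * dt * k1)
        k3 = rhs(z + 0.5 * dt * k2); k4 = rhs(z + dt * k3)
        z = z + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return z

H = lambda z: 0.5 * (z[2]**2 + z[3]**2) + eps * phi(z[0], z[1])

z0 = np.array([0.3, 0.2, 1.0, 0.0])
zT = rk4(z0.copy(), 1e-3, 20_000)  # integrate to t = 20
print(abs(H(zT) - H(z0)))  # energy drift of the conservative system (small)
```

A short hand computation confirms conservation: dH/dt = v·v̇ + ε∇ϕ·v = ε(v·E) + ε∇ϕ·v = 0, since v × B does no work.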

Example 4: Coupled charged particle motion
The previous example outlines a method of reducing the dimensionality of certain systems by a factor of two. We extend it in our last example to a system of two particles {(x_1, y_1, v_{x1}, v_{y1}), (x_2, y_2, v_{x2}, v_{y2})} ∈ R⁴ × R⁴ ≅ R⁸ interacting under a pairwise potential, with a symplectic form extending that of Example 3 to both particles. With interaction kernel K_c, the system may be used to classically describe a pair of charged particles (Q corresponding to the product of the two charges). To regularize the system we use a crude approximation to K_c given by its first-order Taylor expansion about unit separation, which provides a reasonable approximation for interparticle distances near unity. We use a repulsive version of this interaction potential with Q = −(2π)^{-1}, under which the dynamics follow Hamilton's equations for the combined Hamiltonian,
where ϕ is given by (46). This leads to the leading-order roto-rate. Carrying out the same procedures as before (note that presymplectic Hamiltonian dynamics also appear), we arrive at the reduced Hamiltonian H_µ0.

Dynamics and sampling regimes
Since we are interested in identifying coarse-grained models, we focus on observing only the slow variables and attempt to recover H_µ0. It is clear from Figure 3 that this depends on σ_φf being sufficiently large. In Figure 9 we will see that this is ensured by choosing T_φ according to the method described in Appendix A, a modification of the procedure in [22].
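The dominant fast and slow periods that enter this choice can be estimated directly from the power spectrum of the data. A minimal sketch (our own construction, simpler than the Appendix A cornerpoint method) identifies T_s and T_f as the periods of the largest-magnitude Fourier modes:

```python
import numpy as np

def dominant_periods(z, dt, k=2):
    # return the periods of the k largest-magnitude Fourier modes (DC excluded);
    # a simple stand-in for detecting the slow and fast scales T_s and T_f
    zh = np.fft.rfft(z - np.mean(z))
    freqs = np.fft.rfftfreq(len(z), d=dt)
    idx = np.argsort(np.abs(zh))[::-1][:k]
    return sorted(1.0 / freqs[idx], reverse=True)

# two-scale test signal: T_s = 100, T_f = 5, sampled at dt = T_f/10
dt = 0.5
t = np.arange(0.0, 400.0, dt)
z = 2.0 * np.sin(2 * np.pi * t / 100.0) + 0.5 * np.sin(2 * np.pi * t / 5.0)
Ts, Tf = dominant_periods(z, dt)
print(Ts, Tf)  # ≈ 100 and 5
```

Here the window contains an integer number of both periods, so the peaks fall exactly on Fourier bins; for general data, spectral leakage makes the robust cornerpoint procedure of Appendix A preferable.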
To exemplify a range of conditions under which WSINDy identifies the correct coarse-grained dynamics, we quantify the accuracy between the learned (Ĥ_µ0) and true (H_µ0) reduced Hamiltonians under the following settings: (i) Regions of phase space: We sample trajectories with initial conditions approaching an elliptic fixed point of H_µ0, near which the ε = 0 dynamics dominate and H_µ0 fails to be recovered, to probe the near-periodicity of the system. (For Example 4, one particle is placed near an elliptic fixed point and the other a distance one away with zero velocity.)
(ii) Perturbative regime: We examine two perturbative regimes, labeled the mild and extreme regimes. For Examples 1, 3, and 4 these are defined by two different values of ε, with ε ∈ {0.01, 0.05} for Example 1 and ε ∈ {0.01, 0.03} for Examples 3 and 4. For Example 2 we fix ε at the reasonably large value ε = 0.05 and define the mild and extreme regimes by two different sets of initial conditions for the nonlinear pendulum, (Q(0), P(0)) ∈ {(π/2, 0), (31π/32, 0)}, the latter driving the system very close to the separatrix at Q = π, whereby the scale separation disappears (see Figure 6).
(iii) Data sampling regime: We fix the timestep ∆t to give 10 points per fast cycle and the total time window T to be 4 cycles of the slow system; that is, with T_s and T_f denoting the dominant slow and fast periods, ∆t = T_f/10 and T = 4T_s. From a practical perspective this is realistic, because one cannot expect to sample the fast system at high resolution; however, this level of resolution is still adversarial, because the sampling rate is high enough for WSINDy to resolve the fast system (as evidenced by Figures 1-3). With coarser sampling one might expect to more easily recover the reduced dynamics via aliasing, a property leveraged in [32], although in general larger ε leads to mixing of the slow and fast scales over longer time windows. This is why the test function support in Example 2 cannot be taken too large (see Figure 9, row 2); hence we conjecture that in general the aliasing approach will lead to inaccurate reduced dynamics. A total time window of 4 cycles (T = 4T_s) is used to provide enough data to identify a dominant slow scale (i.e. by sampling for longer than a single slow cycle). We also found that T = 4T_s was necessary to identify the dynamics in Example 2 at Q(0) = 31π/32, as in this case the slow-fast period ratio is T_sf := T_s/T_f ≈ 7, leading to only 278 timepoints at the resolution ∆t = T_f/10, which did not provide adequate data for robust recovery in the high perturbative regime from time windows T ≤ 3T_s.
(iv) Extrinsic noise: Realistic measurement settings require consideration of extrinsic (measurement) noise. We define the noise level of the data by σ_NR = ‖Z − Z⋆‖ / ‖Z⋆‖, where Z⋆ represents the exact (noise-free) data and Z the observed (possibly noisy) data. We consider additive mean-zero i.i.d. Gaussian noise and noise levels up to σ_NR = 0.1.

Table 1. The components of H are defined in (49)-(51). By #{H} we denote the cardinality of H.
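Noise at a prescribed level σ_NR can be generated by scaling the Gaussian standard deviation to the data. A sketch, assuming the common convention σ_NR = ‖Z − Z⋆‖_F / ‖Z⋆‖_F (the exact normalization used in the paper is not reproduced here):

```python
import numpy as np

def add_noise(Z_star, sigma_NR, rng):
    # additive i.i.d. Gaussian noise scaled so that, in expectation,
    # ||Z - Z_star||_F / ||Z_star||_F equals the noise ratio sigma_NR
    scale = sigma_NR * np.linalg.norm(Z_star) / np.sqrt(Z_star.size)
    return Z_star + rng.normal(0.0, scale, Z_star.shape)

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 5000)
Z_star = np.column_stack([np.cos(t), np.sin(t)])
Z = add_noise(Z_star, 0.1, rng)
achieved = np.linalg.norm(Z - Z_star) / np.linalg.norm(Z_star)
print(achieved)  # ≈ 0.1
```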

Hyperparameters
The WSINDy hyperparameters consist mainly of the library H and the test functions V. The main purpose of this article is not to show robustness to sheer library size, but instead to show that under the weak-form transformations the dynamics agree well with H_µ0. For this reason, we restrict the library H to 40-70 possible terms which include a representation of H_µ0. In Examples 1-4, the leading-order reduced Hamiltonian H_µ0 can be represented with trigonometric functions, polynomials, and products thereof. We define a monomial library of degree n in 2N variables, a partial cosine library with base frequency f_0 up to maximum frequency n f_0 (discarding redundancies), and a product linear-trig library with trigonometric terms of frequencies f_0 to n f_0. The libraries used in each example are combinations of these, given in Table 1. For the set of test vector fields V, we take the simple convolutional approach as in [22]. That is, we fix a reference test function φ supported on an interval of length T_φ (1_S denotes the indicator function on the set S), and set the test vector fields to translates of φ centered at a fixed set of query timepoints Q := {t_k}_{k=1}^K. The free parameters are Q and the support width T_φ. We choose T_φ by first finding a cornerpoint k* in the Fourier spectrum of the data, according to the method in Appendix A, and then assigning k* to be 2 standard deviations into the tail of the power spectrum |F[φ(t)]|, where F is the discrete Fourier transform and |F[φ(t)]| is interpreted as a probability distribution over Fourier modes. We let Q be equally spaced and cover the given time grid such that T_φ/(t_{k+1} − t_k) = 12 for 1 ≤ k < K.
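The monomial portion of such a library can be sketched by enumerating exponent multi-indices; the cosine and product libraries are built analogously. The count C(n + 2N, 2N) below (including the constant term) is a standard combinatorial fact, not a quantity taken from the paper:

```python
from itertools import combinations_with_replacement
from math import comb

def monomial_library(n, d):
    # exponent multi-indices for all monomials of total degree <= n in d variables
    terms = []
    for deg in range(n + 1):
        for combo in combinations_with_replacement(range(d), deg):
            alpha = [0] * d
            for i in combo:
                alpha[i] += 1
            terms.append(tuple(alpha))
    return terms

# degree-2 monomials in 2N = 4 variables: C(2 + 4, 4) = 15 terms (incl. constant)
lib = monomial_library(2, 4)
print(len(lib), comb(6, 4))
```

Each multi-index α maps to the candidate term z^α; evaluating all terms on the trajectory data columns produces the weak-form regression matrix G.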

Performance metrics
We measure the accuracy of the recovered Hamiltonian Ĥ_µ0 with respect to H_µ0 using three metrics. While all should be considered together to assess performance, each is independently useful and may suffice for specific applications.

We measure the model selection accuracy using the true positivity ratio TPR(ŵ), defined as TPR(ŵ) = TP/(TP + FP + FN), where TP is the number of nonzero entries in ŵ that appear in the correct weight vector w⋆, FP is the number of nonzero entries in ŵ that are zero in w⋆, and FN is the number of terms that are zero in ŵ but nonzero in w⋆. In this way, recovering S terms correctly and no false terms leads to TPR = S/S⋆, where S⋆ = ‖w⋆‖_0 is the number of true correct terms.
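A direct implementation of this metric, with hypothetical weight vectors for illustration:

```python
import numpy as np

def tpr(w_hat, w_star, tol=0.0):
    # true positivity ratio TPR = TP / (TP + FP + FN) over the supports of
    # the learned (w_hat) and true (w_star) weight vectors
    s_hat = np.abs(w_hat) > tol
    s_star = np.abs(w_star) > tol
    TP = np.sum(s_hat & s_star)
    FP = np.sum(s_hat & ~s_star)
    FN = np.sum(~s_hat & s_star)
    return TP / (TP + FP + FN)

w_star = np.array([1.0, 0.0, -2.0, 0.5, 0.0])
w_hat  = np.array([0.9, 0.0, -2.1, 0.0, 0.0])  # one true term missed
print(tpr(w_hat, w_star))  # 2/3: TP = 2, FP = 0, FN = 1
```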
To assess pointwise agreement with H_µ0, we measure the relative error ∆H between the learned quantity of interest Q̂ and a true quantity of interest Q(z) over D(Z), an equally-spaced computational grid covering the smallest rectangle in phase space containing the observed data Z. The quantities Q used for each example are listed in Table 1, with Q = H_µ0 for Examples 1 and 3. For Example 2 we use the zero-momentum section, Q(z) = H_µ0(q_1, q_2, 0, 0), which captures agreement with the potential field V. In Example 4 we use agreement with the (x_1, y_1) = (x_2, y_2) section, which measures agreement with the background electric field ϕ(x, y). Lastly, we measure agreement with forward simulations of the Hamiltonian dynamics given by H_µ0 and Ĥ_µ0. This is a more subtle performance metric because the initial conditions for the reduced system are not simply the restriction of the initial conditions to the slow variables. In addition, the full dynamics may be chaotic, which may also lead to chaos in the reduced system if the reduced system has two or more degrees of freedom (e.g. Examples 2 and 4). To find accurate initial conditions we use the data-driven procedure described in Appendix C. To deal with chaos, we only measure agreement up to the first full period of the observed variables, T_s, defined as the period of the dominant Fourier mode in the dynamics. Letting Z, Z_µ, and Ẑ_µ denote the observed data, the forward simulation from the true reduced system H_µ0, and the forward simulation from Ĥ_µ0, respectively, we define the forward simulation errors ∆Z, ∆Z_µ, and ∆Z⋆. The value ∆Z is not expected to be small for larger ε, but assesses whether the trajectory exhibits a phase error from the full system, whereas ∆Z_µ assesses agreement with the analytical reduced system. The value ∆Z⋆ is a reference measure and serves as an approximate lower bound for ∆Z.
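A sketch of the grid-based error ∆H follows. The exact normalization in the paper's definition is not reproduced; subtracting the mean before comparing, since a Hamiltonian is only defined up to an additive constant, is our assumption:

```python
import numpy as np

def delta_H(H_learned, H_true, Z, n=64):
    # relative L2 error between learned and true Hamiltonians over an
    # equally-spaced grid covering the bounding box of the observed data Z;
    # the mean is subtracted since a Hamiltonian is defined up to a constant
    lo, hi = Z.min(axis=0), Z.max(axis=0)
    axes = [np.linspace(a, b, n) for a, b in zip(lo, hi)]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, Z.shape[1])
    e = H_learned(grid) - H_true(grid)
    r = H_true(grid) - np.mean(H_true(grid))
    return np.linalg.norm(e - np.mean(e)) / np.linalg.norm(r)

H_true_fn = lambda z: 0.5 * (z[:, 0]**2 + z[:, 1]**2)
H_hat_fn  = lambda z: 0.5 * (z[:, 0]**2 + z[:, 1]**2) * 1.01 + 3.0  # scaled and shifted
Z = np.random.default_rng(1).normal(size=(200, 2))
print(delta_H(H_hat_fn, H_true_fn, Z))  # ≈ 0.01: the constant shift is ignored
```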

Dependence on test function radius
As mentioned previously, the ratio between the test function support width T_φ and the fast scale T_f, σ_φf = T_φ/T_f, plays a role in the accuracy of Ĥ_µ0 with respect to H_µ0. This relationship is graphically represented in Figure 9, where we present the TPR, ∆H, and ∆Z_µ values (equations (52)-(56)) for each example over a range of σ_φf, as well as the σ_φf values resulting from T_φ as computed using the method in this paper (black curves); see Appendix A. In Figure 9 we display only the results for the extreme perturbative regime (ε = 0.05 for Examples 1 & 2 and ε = 0.03 for Examples 3 & 4), as the method performs very well for a wide range of σ_φf in the milder perturbative regime.
Several general trends can be observed when σ_φf is varied. From the perspective of the TPR score, it can be seen that the optimal T_φ is usually associated with some σ_φf ∈ [5, 30], and the exact optimal σ_φf is highly dependent on the trajectory. Comparing the first and second columns, we see that TPR = 1 always correlates with lower ∆H, yet there are instances where the Hamiltonian is captured accurately (low ∆H) without correct identification of H_µ0 (e.g. Example 3, z(0) index 10). On the other hand, an accurate forward solve does not always correlate with TPR, so from the perspective of ∆Z_µ we get a different optimal σ_φf. Often larger σ_φf will yield lower forward simulation errors ∆Z_µ despite not capturing the correct form of H_µ0 (i.e. TPR < 1). This is most readily observed from Examples 3 and 4 (right column), as well as Example 1 for z(0) indices 6-10. This serves to highlight the importance of examining multiple performance metrics, as the downstream task (forward simulations, scientific inference, etc.) should determine which metric is weighted most highly.
The black curve indicates values of σ_φf (hence T_φ) resulting from default settings of the method as presented here. In all cases except z(0) indices {8, 9, 10, 12} of Example 1 and z(0) index 10 in Example 3, the identified T_φ yields the correct model terms, which automatically grants excellent agreement with H_µ0 (column 2). In these five cases with TPR < 1, the Hamiltonian is still very accurate, with ∆H ≈ 0.02. For Example 2, the black curve successfully lies in the region of admissible σ_φf values, which is narrow due to mixing of slow and fast scales that degrades ∆H and ∆Z_µ at larger T_φ. Results are similar for Example 3. In Example 4 the correct model is found for nearly all σ_φf and z(0), although larger T_φ may increase accuracy.

Table 3. Range of forward simulation errors across initial conditions for each example.
Results: zero extrinsic noise (σ_NR = 0)

In this section we demonstrate that the reduced Hamiltonian H_µ0 is sufficiently recovered over a wide range of initial conditions and perturbative regimes using a single trajectory. This implies that often only a small sample in phase space is needed to recover the entire reduced Hamiltonian, as opposed to neural-network based approaches, which are often trained on O(10⁴) input-output pairs with substantially longer computation times.
In the top three rows of Figures 10-14 we plot the learned trajectories on a black-to-red scale overlaying the training data in cyan. The color of each learned trajectory indicates the value of the given statistic (TPR, ∆H, ∆Z_µ, ∆Z). As a reference, the bottom rows of Figures 11-14 plot simulations from H_µ0, colored according to ∆Z⋆, for comparison with ∆Z.

Model identification (TPR)
In Figure 10 we report the TPR score (defined in (52)) for each trajectory, indicating that in the vast majority of cases WSINDy recovers the correct terms in H_µ0 even under significant perturbations imparted by the fast scale. In the left column we see that the mild perturbative regime yields a TPR of 1 (i.e. perfect model recovery) for all cases except the inner-most trajectories of Examples 1 and 3, which are highly degenerate, nearly-circular orbits. Moreover, trajectories from these misspecified models show qualitative agreement with the true dynamics. It is most surprising that in Example 1 there exist trajectories that only enclose one of the relevant fixed points and still enable identification of H_µ0.

Hamiltonian accuracy (∆H)

Comparing the top three rows with the bottom rows of Figures 11-14, we observe that for each example, initial condition, and perturbative regime, the learned trajectory provides excellent qualitative agreement with the H_µ0 dynamics. That is, the level curves of Ĥ_µ0 are very close to those of H_µ0. More quantitatively, the range of ∆H is given in Table 2. For the mild perturbative regime, the method produces a maximum ∆H of 0.012 (with the exception of one outlier trajectory with ∆H = 0.087), and for the extreme regime the maximum is ∆H = 0.053. To summarize, given a single trajectory from H_ε, the learned Hamiltonian Ĥ_µ0 using WSINDy agrees with H_µ0 over a large region of phase space, and with very mild dependence on the perturbative regime: ∆H is upper-bounded by ε in almost all cases.
For Example 1, the larger forward simulation errors are due to minor phase differences which lead to rapid accumulation of errors (despite close qualitative agreement), while Example 4 suffers again because of its chaotic nature.
Ultimately, our findings indicate that coarse-grained models provided by WSINDy have the potential to be very useful surrogate models in forward simulations. However, Example 1 indicates that capturing the phase of a slow oscillator for arbitrarily many periods may require additional constraints on the model. On the other hand, all examples indicate that level sets of the reduced Hamiltonian are readily captured by the present method. As well as providing insight into the energy landscape, forward simulations may incorporate knowledge about these level sets. We will pursue these lines of research in future work.

Results: nonzero extrinsic noise (σ NR > 0)
We now survey the performance of WSINDy in recovering H_µ0 from data with extrinsic (measurement) noise, with quantitative results presented in Figure 15. For each example we take a fixed trajectory from the extreme perturbative regime (defined above) for which WSINDy recovers the correct model without noise (TPR = 1), and we add various levels of Gaussian white noise. We apply WSINDy to the noisy data Z and average results over 100 independent trials. Example trajectories with 10% noise are included in Figures 16-19, with the noisy data Z in red, the clean data Z⋆ in black, the true reduced data Z_µ in blue, and the learned reduced data Ẑ_µ in green. It should be noted that combined extrinsic noise and extreme intrinsic perturbations presents a significant challenge, and the results in Figure 15 improve greatly under less severe corruption levels.
From Figure 15 we observe similar trends for each example in the low-to-medium noise regime. For each example the method is robust up until at least σ_NR = 10^{−1.5} ≈ 0.032, or ≈ 3% noise, providing on average TPR > 0.95 and ∆H well below 10%. For larger noise levels, results vary by performance metric and example. The remainder of this section discusses these variations for the 10% noise case, examples of which are given in Figures 16-19.
At 10% noise (σ_NR = 0.1), WSINDy is still able to recover the model structure of Examples 1 and 3 very well, with TPR ∈ [0.9, 1], yet for Example 3 this coincides with large errors ∆H and ∆Z_µ. We can see from the dynamics in Figure 18 that although the model is identified correctly, with reasonable qualitative agreement between the green and blue curves, the large value of ∆Z_µ is explained by the warped contour followed by the green curve. This is an artifact of errors in the coefficients ŵ compared to w⋆ due to noise. On the other hand, Example 1 performs well in all three categories TPR, ∆H, and ∆Z_µ, but has significantly larger errors compared to Z⋆, as indicated by ∆Z. This is due to accumulation of errors from a slight phase difference between Z_µ and Ẑ_µ (see Figure 16, right). Correcting for this effect will be essential for utilizing WSINDy in related forward simulations. Overall, the fact that WSINDy still identifies the correct model, given combined extrinsic noise and intrinsic fast-scale dynamics, suggests that the method is well-suited for scientific discovery with these mixed effects.
At 10% noise Example 2 exhibits the lowest TPR, but this coincides with excellent agreement with H_µ0 as measured by ∆H and ∆Z_µ, both with average values less than 4%. Hence, in this case large noise leads to identification of a model with slightly different terms, but still with acceptable accuracy compared to H_µ0. Figure 15 (bottom right, orange curve) shows that on average the WSINDy model performs nearly as well as the analytical reduced dynamics given by H_µ0. Since H_µ0 is only a leading-order approximation, a natural next line of inquiry would be to examine the connection between "misspecified" models and next-order corrections.
For Example 4, 10% noise is clearly outside of the feasible recovery regime, with misspecified models (TPR < 1) leading to large values of ∆H and significant forward simulation errors ∆Z_µ combined with chaotic effects. This can be overcome by considering multiple trajectories (not shown here), which serves to recover TPR = 1, yet accuracy issues with ∆H remain. We conjecture that some form of variance reduction is needed, and we aim to investigate the applicability of WENDy (Weak-form Estimation of Nonlinear Dynamics) [18], which has been demonstrated to reduce regression errors due to extrinsic noise.

Conclusions
In this article we have described a weak-form equation learning approach to coarse-graining Hamiltonian systems using the WSINDy algorithm. We have provided substantial evidence that appropriate coarse-grained models with Hamiltonian structure can be identified from noisy data simply by choosing a proper weak formulation. This is significant due to the nontrivial analytical procedures involved in deriving reduced Hamiltonian systems using nearly-periodic reduction techniques. The dictionary learning approach naturally enforces structure preservation, and the method is highly efficient, requiring no forward solves of candidate models. The output is a human-readable equation for the reduced-order Hamiltonian which can then be used in downstream tasks.
The fact that a suitable weak formulation enables one to identify the reduced-order Hamiltonian over all of phase space using only a single (noisy) trajectory is surprising, and to the best of the authors' knowledge not found in the literature. An obvious next direction is to leverage this in combination with black-box methods (neural networks, Gaussian processes, etc.), and to adapt the test functions to target different models in a hierarchical fashion, as demonstrated in Figures 1 and 3. We aim also in a future work to quantify the sufficiency of information needed to identify the reduced Hamiltonian, as results from degenerate cases (trajectories that only encircle one of the relevant fixed points; see Example 1 in Figure 10) imply that the reduced Hamiltonian is accessible from regions of phase space with seemingly low levels of information. At the other extreme, combining information from multiple trajectories (as typically done in black-box learning approaches) is an obvious means of increasing performance. Answering questions of data and information sufficiency will be crucial to developing methods of identifying higher-order (in ε) coarse-grained models. Finally, combined with findings in [31], a major implication of our results is that the weak form opens the door to a new generation of coarse-graining algorithms. We aim to explore the use of these algorithms in surrogate modeling to reduce the number of full simulations of dynamics involving disparate scales, which are computationally prohibitive.

A Cornerpoint detection

Let the cumulative sum of the power spectrum of the data (taken over the negative wavenumbers) be denoted by H_n. For convenience we will assume m is even, as the odd case is nearly identical. With high probability, H_n will lie below the line connecting its endpoints, and the cornerpoint k*_n is selected accordingly, where [·]_2 indicates the second component of the vector. This process is visualized in Figure 20, and in this work we let the final cornerpoint be the average k* = (1/2N) Σ_{n=1}^{2N} k*_n, used for all observed variables. Figure 20 displays that the previous method is more prone to labeling irrelevant modes as signal-dominated (2nd row), but overall the performance of both methods is comparable.

B MSTLS algorithm
To solve for a sparse coefficient vector ŵ such that Gŵ ≈ b, we use the MSTLS algorithm proposed in [22], which augments the original STLS algorithm from [17] with variable thresholding and then performs a line search over the sparsity threshold λ. By variable thresholding we mean that instead of employing the typical hard thresholding operator, which treats all columns of G equally, we define a variable hard-thresholding operator for specified upper and lower bound vectors U, L ∈ R^J (for a J-term library H). This is particularly useful for enforcing not only that the coefficients w_j stay within a certain range, but also that the term magnitudes ‖G_j w_j‖ make a reasonable contribution to the dynamics given by b.
In this work, we define L and U to enforce that ŵ_j and ŵ_j G_j are comparable to the best 1-term solution (denoting the projection operator by P).

Figure 20. The top and bottom rows display coordinates 1 and 2, the configuration and momentum variables for the first oscillator. The data and DFTs are plotted in the left column, along with markers for the detected cornerpoints, while the right column shows the cumulative sums H_n together with the line Y and the piecewise linear approximation L used in [22] (inset plots show the rotated H which is minimized at k*). The value k* is the cornerpoint detected by the current method, while k is the value detected by the method in [22]. In the top row, the two methods nearly agree, while the more perturbed dynamics in the bottom row cause k to be significantly larger, leading to more irrelevant modes entering the recovered model.

We denote by MSTLS(G, b, λ) := ŵ the output weight vector in (63). The collection of thresholds λ can be chosen by the user. Following the strategy seen to be successful in [22], we let λ range over 100 equally log-spaced values from 10^{−4} to 1. This process is visualized in Figure 21.
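A simplified sketch of the overall loop follows. The variable bounds L, U are replaced here by a plain magnitude threshold, and the λ line search uses a residual-plus-sparsity criterion modeled on [22]; both simplifications are our assumptions, not the paper's exact operator:

```python
import numpy as np

def stls(G, b, lam, iters=10):
    # sequentially-thresholded least squares with a single hard threshold lam
    # (simplification: the paper's variable bounds L, U are not reproduced)
    w = np.linalg.lstsq(G, b, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(w) < lam
        w[small] = 0.0
        big = ~small
        if big.any():
            w[big] = np.linalg.lstsq(G[:, big], b, rcond=None)[0]
    return w

def mstls(G, b, lams):
    # line search over lam, balancing relative residual against sparsity
    w_ls = np.linalg.lstsq(G, b, rcond=None)[0]
    best, best_loss = None, np.inf
    for lam in lams:
        w = stls(G, b, lam)
        loss = (np.linalg.norm(G @ (w - w_ls)) / np.linalg.norm(G @ w_ls)
                + np.count_nonzero(w) / G.shape[1])
        if loss < best_loss:
            best, best_loss = w, loss
    return best

rng = np.random.default_rng(0)
G = rng.normal(size=(200, 10))
w_star = np.zeros(10); w_star[[1, 4, 7]] = [1.0, -2.0, 0.5]
b = G @ w_star + 1e-6 * rng.normal(size=200)
w_hat = mstls(G, b, np.logspace(-4, 0, 100))  # 100 log-spaced thresholds, as in [22]
print(np.flatnonzero(w_hat))  # expected support {1, 4, 7}
```

The combined loss is minimized when the support is exactly correct: keeping spurious terms pays the sparsity penalty, while thresholding away a true term inflates the residual.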

C Initial conditions for forward simulations
In order to measure agreement between forward simulations of H_µ0 and Ĥ_µ0, we first use a data-driven nonlinear least squares approach to find an adequate set of initial conditions. For all time intervals I_k = [t_0, t_k], k = 2, ..., 100, we fit a polynomial of degree 2 through the data Z(I_k) and let z(0) be the value of this polynomial at t_0. We then simulate the system of interest (defined by H_µ0 or Ĥ_µ0) starting from the selected z(0) for 1/8 of the total time of the available data. The initial condition z(0) that minimizes the error between these partial forward simulations and the data Z over the 99 time intervals I_k is chosen for forward simulations.
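The window-fitting step can be sketched as follows; the forward-simulation selection among the candidates is omitted here, so this illustrates only the polynomial denoising of z(0):

```python
import numpy as np

def ic_candidates(t, Z, kmax=100):
    # for each window [t_0, t_k], fit a degree-2 polynomial to the observed
    # (noisy) data and evaluate it at t_0; each evaluation is a candidate z(0)
    # (the paper selects among candidates by short forward-simulation error,
    # a step omitted in this sketch)
    return np.array([
        np.polyval(np.polyfit(t[: k + 1], Z[: k + 1], 2), t[0])
        for k in range(2, min(kmax, len(t) - 1) + 1)
    ])

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
Z = np.cos(t) + 0.2 * rng.normal(size=t.size)  # noisy scalar observable
cands = ic_candidates(t, Z)
# candidates from moderately sized windows average out the noise at t = 0
print(Z[0], cands[-1])  # raw noisy sample vs. fitted z(0) near cos(0) = 1
```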

Dynamic reduction: (a) obtain X*_ε by averaging the first-order vector field X_0 + εX_1 around the flow of R_0; (b) for regular values of µ, obtain the reduced Hamiltonian vector field X_µ.

Figure 2.
Figure 2. Time series plots of the data and learned systems from Figure 1. The coarse-grained model (red) accurately captures the slow dynamics (left, yellow).

Figure 4.
Figure 4. Dynamics of Example 1 in reduced phase space, with true dynamics X_ε in black, leading-order reduced dynamics X_µ0 in yellow, and learned reduced dynamics X̂_µ0 in red (initial and final conditions are marked with dots and x's), overlaying contours of the learned reduced Hamiltonian Ĥ_µ0. Left: ε = 0.01 dynamics; right: ε = 0.05. At ε = 0.05 the nonlinear coupling between the fast and slow oscillators results in irregular perturbations to the slow variables.
are each fixed smooth functions. The corresponding equations of motion are

Figure 6.
Figure 6. Time series of the slow variables for Example 2 with ε = 0.05 (left) along with power spectra (right). The top plots show Q(0) = π/2 and an observable separation of scales (top right). The separation of scales breaks down at Q(0) = 31π/32 (bottom row) as the embedded pendulum dynamics approach the separatrix, as observed in the power spectrum.

Figure 7.
Figure 7. Dynamics of Example 3 in reduced phase space (x, y) ∈ R². Left: ε = 0.01 dynamics; right: ε = 0.03. On the right one can observe irregular perturbations as the unreduced dynamics (black curve) jump between contours of the electric potential energy ϕ shown in gray. (Coloring conventions are the same as in Figure 4.)

Figure 9. Values of the TPR, $\Delta H$, and $\Delta Z^\mu$ statistics (see eqs. (52)-(56)) in the extreme perturbative regime as a function of the ratio $\sigma_{\phi f} = T_\phi / T_f$ (see eq. (25)) and the learning trajectory (indicated on the y-axes as the $z(0)$ index).

In the extreme perturbative regime (right column of Figure 10), we recover TPR = 1 in all cases except indices {8, 9, 10, 12} of Example 1 and index 10 of Example 3, yet the learned trajectories are still accurate. (Note that the initial conditions ($z(0)$ indices) referred to in Figure 9 correspond to the green dots in Figures 11-14, with increasing $z(0)$ index corresponding to increasing magnitude $|z(0)|$ in Examples 1 and 2 and to $z(0)$ moving diagonally downwards in Examples 3 and 4.)

5.9.3 Forward simulation errors ($\Delta Z^\mu$, $\Delta Z$, $\Delta Z^\star$)

From Figures 11-14 and Table 3 we observe that in the mild regime, Examples 1-3 yield $O(10^{-2})$ errors for both $\Delta Z^\mu$ (agreement with $H_0^\mu$) and $\Delta Z$ (agreement with $H_\varepsilon$). Example 4 exhibits much larger forward simulation errors due to the chaotic nature of the dynamics (yet leads to $O(10^{-2})$ values for $\Delta H$ and TPR = 1 in all cases). The right columns of Figures 11-14 display $\Delta Z^\mu$ and $\Delta Z$ for the extreme perturbative regime, where larger values of $\Delta Z^\mu$ and $\Delta Z$ can be observed for Examples 1 and 4 (Figures 11 and 14), while Examples 2 and 3 accurately capture both the reduced dynamics of $H_0^\mu$ and the full dynamics of $H_\varepsilon$. The method performs exceptionally well on Example 2, offering better agreement with $H_\varepsilon$ than that provided by $H_0^\mu$, which is especially surprising because of the mixing of slow and fast scales (see Figure 6).

Figure 10. TPR values for each example (defined in (52)). Each trajectory in red is simulated from the learned Hamiltonian system $\hat{H}_0^\mu$ and is colored according to its TPR. Green dots indicate initial conditions and the training data is plotted in cyan. The correct model is recovered in the vast majority of cases (TPR = 1), with exceptions still corresponding to accurate learned dynamics. Note also that the right column corresponds to the black curves in Figure 9.

Figure 11. Error statistics for Example 1. Each trajectory in red is simulated from the learned Hamiltonian system $\hat{H}_0^\mu$ and is colored according to the error metric on the left (defined in (52)-(56)). The training data is plotted in cyan. Green dots indicate initial conditions. Left: results for $\varepsilon = 0.01$; right: results for $\varepsilon = 0.05$.

Figure 16. Visualization of noisy data (in red) and recovered model (in green) for Example 1, as well as the clean data $Z^\star$ and the simulation from the true reduced system $H_0^\mu$.

Figure 18. Visualization of noisy data (in red) and recovered model (in green) for Example 3, as well as the clean data $Z^\star$ and the simulation from the true reduced system $H_0^\mu$.

Figure 20. Visualization of cornerpoint detection and comparison to the method used in [22]. The data $Z$ is taken from Example 2 with $\varepsilon = 0.05$ and $Q(0) = 31\pi/32$. The top and bottom rows display coordinates 1 and 2, the configuration and momentum variables for the first oscillator. The data and DFTs are plotted in the left column, along with markers for the detected cornerpoints, while the right column shows the cumulative sums $H_n$ together with the line $Y$ and the piecewise linear approximation $L$ used in [22] (inset plots show the rotated $H$, which is minimized at $k^*$). The value $k^*$ is the cornerpoint detected by the current method, while $k$ is the value detected by the method in [22]. In the top row the two methods nearly agree, while the more perturbed dynamics in the bottom row cause $k$ to be significantly larger, leading to more irrelevant modes entering the recovered model.

Figure 21. Example profile of the loss function $\mathcal{L}(\lambda)$ employed in MSTLS (63) for the data in Figure 16.

Table 1. Algorithmic specifications for the examples, including the Hamiltonian libraries H employed in sparse regression, the total number of terms in H, and the number of terms required to represent $H_1$. (Measurements are corrupted with 10% noise, which together with the intrinsic perturbations imparted by the fast scale represents a significant level of corruption.)

Table 2. Range of $\Delta H$ values across initial conditions for each example.

where $k_{\max} = \operatorname{argmax}_{k \in \{-m/2,\dots,0\}} |\mathcal{F}[Z_n](k)|$ is the wavenumber of the largest Fourier coefficient. A tie is broken by the maximizing $k$ with the largest magnitude. Moreover, $H_n$ will be approximately convex for $k \in \{-m/2,\dots,k_{\max}\}$. For instance, convexity and $H_n \le Y$ over this region both hold if $\mathcal{F}[Z_n]$ exhibits any decay of the form $|\mathcal{F}[Z_n](k)| \sim |k|^{-s}$ for $s > 0$. We define the cornerpoint of $H_n$ as the point $(k_n^*, H_n(k_n^*))$ that maximizes the distance between $H_n(k)$ and $Y_n(k)$ over $k \in \{-m/2,\dots,k_{\max}\}$ in the total least squares sense. A practical computation is the following. Let $R(\theta)$ be the rotation matrix in the $(k, H_n)$ plane such that $R(\theta)$ rotates $Y_n$ parallel to the $k$-axis. Then
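The rotation-based cornerpoint computation can be sketched as follows. This is a generic illustration under our own conventions, not the authors' code: the input `H` is the cumulative-sum curve restricted to the relevant wavenumber range, indexed locally from 0, and the returned index is relative to that range.

```python
import numpy as np

def cornerpoint(H):
    """Locate the corner of a cumulative-sum curve H(k) as the point
    farthest, in the perpendicular (total least squares) sense, from the
    secant line Y joining the curve's endpoints.  Rotating the plane so
    that Y becomes parallel to the k-axis reduces this to an argmin of
    the rotated curve, since H lies below Y on the convex region."""
    k = np.arange(len(H), dtype=float)
    # slope of the secant line Y through (k[0], H[0]) and (k[-1], H[-1])
    slope = (H[-1] - H[0]) / (k[-1] - k[0])
    theta = -np.arctan(slope)          # angle that rotates Y to horizontal
    c, s = np.cos(theta), np.sin(theta)
    # second coordinate after rotation by R(theta); along Y it is constant,
    # so its minimum over k marks the point of maximal distance below Y
    H_rot = s * k + c * H
    return int(np.argmin(H_rot))       # local index of the cornerpoint k*
```

As a sanity check, for a piecewise-linear curve that is flat up to index 50 and then rises, the secant runs diagonally across and the perpendicular distance is maximized exactly at the kink, which the rotation construction recovers.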