Abstract
Dynamical systems that evolve continuously over time are ubiquitous throughout science and engineering. Machine learning (ML) provides data-driven approaches to model and predict the dynamics of such systems. A core issue with this approach is that ML models are typically trained on discrete data, using ML methodologies that are not aware of underlying continuity properties. This results in models that often do not capture any underlying continuous dynamics, either of the system of interest or indeed of any related system. To address this challenge, we develop a convergence test based on numerical analysis theory. Our test verifies whether a model has learned a function that accurately approximates an underlying continuous dynamics. Models that fail this test fail to capture relevant dynamics, rendering them of limited utility for many scientific prediction tasks; while models that pass this test enable both better interpolation and better extrapolation in multiple ways. Our results illustrate how principled numerical analysis methods can be coupled with existing ML training/testing methodologies to validate models for science and engineering applications.
Introduction
Dynamical systems (systems whose state varies over time) describe many chemical, physical, and biological processes. Thus, understanding and describing these dynamical systems is important for many scientific and engineering applications. Dynamical systems can often be described by differential equations which evolve continuously in time, meaning that the domain of the solution spans a continuum^{1}. In such systems, the gap between any two timesteps can be subdivided into an infinite number of infinitely smaller timesteps. In practice, these systems are often identified via a finite set of discrete observational data, and there is a long history within scientific computing of dealing with this discrete-to-continuous gap: experimentally measuring scientific data at sufficiently fine timescales to resolve approximately continuous dynamics of interest; formulating theory within function spaces of sufficient smoothness to guarantee certain continuity requirements; and developing numerical algorithms that come with appropriate stability and convergence guarantees.
Machine learning (ML) techniques have recently been shown to provide a powerful approach to model and learn from discrete data, and many scientific fields make extensive use of data-driven methods for describing^{2,3,4}, discovering^{5,6,7}, identifying^{8, 9}, predicting^{10,11,12,13,14,15,16,17}, and controlling^{2, 18,19,20,21} dynamics. These approaches (see ref. ^{22} for a survey) include purely data-driven methods that learn from observational data points^{23}, ML methods with added constraints that aim to respect the relevant physics^{24}, and/or hybrid methods combining classical numerical solvers with (say) deep learning^{25, 26}.
In many scientific and engineering applications, we observe measurements that yield a series of discrete data points \(\{x_{0},\,x_{1},\,x_{2},\,\ldots,\,x_{N}\}\), where each point is spaced apart by some timestep size Δt. There are many techniques from ML and statistical data analysis to learn data-driven input-output mappings (G: x_{n} → x_{n+1}) that can provide an approximation for the next discrete timestep. One popular class of data-driven input-output mappings is given by neural networks (NNs). An NN, denoted as \(\mathcal{N}\), can be trained to predict x_{n+1} from x_{n} by learning model parameters θ:
\[x_{n+1} = \mathcal{N}(x_{n};\theta). \tag{1}\]
However, when considering continuous dynamical systems, there are challenges with this approach. Most obviously, this approach does not learn a continuous function^{27,28,29,30}; it simply learns a function that predicts subsequent discrete time steps. This is to be expected, as this model is optimized to make (discrete) point estimates, i.e., to predict solutions at specific (discrete) points. For this reason, predicting future states of a dynamical system with this approach can result in compounding errors of the dynamics over time^{19, 31}.
A related approach is to assume that the discrete data points can be modeled and described by a continuous differential equation of the form,
\[\frac{\mathrm{d}x(t)}{\mathrm{d}t} = F(x(t)), \tag{2}\]
where F is a function that describes the vector field. In some cases, there is an underlying true F, while in other cases it is simply a modeling assumption. A challenge is that we cannot derive F from first principles in many situations. Instead, we can use a data-driven approach for modeling F. For instance, an arbitrary NN architecture \(\mathcal{N}\) can be used to model the vector field F,
\[\frac{\mathrm{d}x(t)}{\mathrm{d}t} = \mathcal{N}(x(t);\theta). \tag{3}\]
This approach, so-called Neural Ordinary Differential Equations (ODENets), has been proposed to model temporal systems^{32,33,34,35,36,37,38}.
It is often assumed or simply taken for granted that ODENets and other ML methods for ODEs automatically capture some continuous dynamics, either of the system that generated the data or of some related system^{39,40,41,42,43}.
However, due to how ODENets are trained, i.e., to predict solutions at specific (discrete) points, these models can easily fail to learn even the simplest continuous dynamical systems^{44, 45}, even when they accurately fit the temporal discretization (i.e., the discrete training points and testing points). An ODENet that incorrectly learns a continuous model will simply provide high-quality discrete-time predictions; i.e., it is not a ContinuousNet but is simply a very good DiscreteNet^{44}.
Such a model will fail to extrapolate to new data points outside the temporal discretization, and it will fail to interpolate the solution at timesteps in between the discrete training data. It can also fail to correctly identify qualitative long-term behavior such as bifurcations^{46}. As we demonstrate later, this Discrete-versus-Continuous distinction affects non-NN ML methods as well, even when they accurately fit the temporal discretization of the data.
Figure 1 illustrates the difference between a model that has learned to predict discrete data points and a model that has learned an underlying continuous dynamics. After training a model at a given discretization Δt, the trained model can be used to predict trajectories at arbitrary timestep sizes h during inference. For validation, an error metric Error(h) can be defined over a holdout trajectory that allows for evaluation with discretizations that are different than the data spacing. Learning a discrete-only model (a DiscreteNet) means that only the discrete training points, and potentially testing points at the same discretization (i.e., when h = Δt), are learned. When evaluating using Error(h = Δt), the model will appear to perform well. The model may perform well on testing points with a similar discretization, but it will perform poorly for points sampled with other discretizations; that is, Error(h ≠ Δt) can be much worse than a discrete-only testing methodology would determine. This will even occur when the discretization h → 0, counter to expectations. In contrast, learning a meaningfully continuous model means that the model can converge to a smooth solution as the discretization h → 0, or at least that its error will decrease gradually and level off as h → 0. (This will be true regardless of whether the learned continuous model corresponds to the true underlying model, even assuming that such a true model exists and/or is well-conditioned.) In this case, the model will perform well for a broader range of temporal discretizations and thus have a better approximation of the continuous dynamical system.
In this work, we adapt methods from numerical analysis theory to develop a methodology to verify whether an ML model has learned a meaningfully continuous function that describes a dynamical system of interest. Specifically, we introduce a modified convergence test to verify and validate whether a model has learned continuous dynamics for a physical system. Our method allows us to verify that a model approximates a continuous differential operator, rather than only learning discrete points at a given temporal discretization, in the same sense that discrete algorithms from numerical analysis can be said to approximate continuous functions. We also introduce the notion of a ContinuousNet to refer to an ODENet model that exhibits the convergence properties that are expected for a continuous time system:
Definition 1
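(ContinuousNet). An ODENet,
\[\frac{\mathrm{d}x}{\mathrm{d}t} = \mathcal{N}(x;\theta),\]
trained with a numerical integration scheme is a ContinuousNet if it is convergent to a similar error as the error obtained by using the original training time step,
\[\lim_{h \to 0} \mathrm{Error}(h) \approx \mathrm{Error}(\Delta t). \tag{4}\]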
This convergence criterion is very similar to that of numerical analysis, whereby convergence is judged as h → 0, and can be evaluated with a similar methodology adapted for the ML setting. The criterion of Eq. (4) represents a heuristic that takes into account the error from the learning process that can be observed in the error on the discrete-only validation task, Error(Δt). (Note that we are not claiming that this method will guarantee that we have learned the true solution, which would require additional assumptions that are well known in scientific computing; we claim simply that we have learned some underlying continuous model of the data.)
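The criterion can be read as a procedure: roll the learned vector field out at a sequence of step sizes h ≤ Δt and compare the fine-step error with Error(Δt). The following is a minimal Python sketch under stated assumptions, not the authors' implementation. The helper names (`rollout_error`, `convergence_test`), the mean-error metric, and the tolerances `rtol` and `atol` that decide what counts as a "similar error" are hypothetical choices. The toy system dx/dt = −x and the hand-built vector field that is exact only under forward Euler at h = Δt are purely illustrative.

```python
import numpy as np

def euler_step(f, x, h):
    # One forward Euler step.
    return x + h * f(x)

def rk4_step(f, x, h):
    # One classical fourth-order Runge-Kutta step.
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def rollout_error(f, step, x0, t_final, h, reference):
    # Mean error of a rollout with step size h against reference(t).
    n_steps = int(round(t_final / h))
    x = np.asarray(x0, dtype=float)
    errs = []
    for n in range(1, n_steps + 1):
        x = step(f, x, h)
        errs.append(np.linalg.norm(x - reference(n * h)))
    return float(np.mean(errs))

def convergence_test(f, step, x0, t_final, dt_train, reference,
                     refinements=(1, 2, 4, 8, 16), rtol=2.0, atol=1e-3):
    # Evaluate Error(h) for h = dt_train / r; r = 1 is the training spacing.
    errors = {r: rollout_error(f, step, x0, t_final, dt_train / r, reference)
              for r in refinements}
    err_dt, err_fine = errors[1], errors[max(refinements)]
    # Hypothetical pass rule: the fine-step error stays comparable to Error(dt).
    passed = err_fine <= rtol * err_dt + atol
    return errors, passed

# Illustrative system (an assumption): dx/dt = -x with x(0) = 1.
x0 = np.array([1.0])
dt_train, t_final = 0.5, 2.0
reference = lambda t: x0 * np.exp(-t)

true_field = lambda x: -x
# A field that reproduces the data exactly under forward Euler at h = dt_train only.
c = (np.exp(-dt_train) - 1.0) / dt_train
overfit_field = lambda x: c * x

errors_true, passed_true = convergence_test(
    true_field, rk4_step, x0, t_final, dt_train, reference)
errors_overfit, passed_overfit = convergence_test(
    overfit_field, euler_step, x0, t_final, dt_train, reference)
print("true field + RK4 passes:", passed_true)
print("overfit field + Euler passes:", passed_overfit)
```

The additive `atol` is there because Error(Δt) can be close to round-off for a model that fits the training spacing exactly, in which case a purely relative comparison would be meaningless.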
To illustrate the utility of our approach, we demonstrate how meaningfully continuous models that pass our convergence test enable both better interpolation and better extrapolation in multiple ways. We show that such models can resolve fine-scale features of the solution, despite being trained only on coarse data, including data that are irregularly spaced with non-uniform time intervals; can learn higher resolution solutions through learning continuous temporal dynamics from flow field snapshots; and can correctly predict trajectories starting at different initial conditions on which the model was not trained. We also demonstrate that our convergence test method is generally applicable to ML models. In addition, we derive theoretical error bounds for simple linear ODEs. Our results show promise in bridging between ML methodologies and scientific computing methodologies, by respecting both the fundamentals of ML and the fundamentals of science.
Results and discussion
In this section, we use the convergence test to demonstrate and identify discrete overfitting of dynamics models.
We start by showing an example of our convergence test on a simple harmonic oscillator system. We then illustrate our convergence test on a variety of different scientific systems, demonstrating that our method can validate whether a trained dynamics model has learned (some) meaningfully continuous dynamics. Next, we show that models that pass this test can predict fine-scale solutions from coarsely spaced data. This includes: predicting continuous temporal dynamics from flow fields; predicting trajectories starting at initial conditions on which the model was not trained; and predicting fine-scale solutions from coarse, irregularly spaced data. We then show that overfitting to the temporal discretization affects ML methods more generally than just with ODENets.
Here, we only consider systems which are non-chaotic, non-divergent, and that are not extremely stiff, such that they can be handled by simple explicit Runge-Kutta integrators. More generally, learning the underlying “true” dynamics would require a test that involves a more sophisticated coupling of numerical and ML methodologies. This is not necessarily needed for many scientific ML tasks, including any of the improvements for predicting fine-scale solutions from coarsely spaced data that we discuss.
Example convergence method
We demonstrate our convergence test on a toy example. We sample discrete training data points from the linear differential equations describing the harmonic oscillator:
We show the results in Fig. 2. Two ODENets are trained on this data. Here, Fig. 2a–c use the forward Euler integration scheme, while Fig. 2d–f use the RK4 integration scheme. Both the EulerNet (Fig. 2a) and RK4Net (Fig. 2d) use the same linear network architecture \(\mathcal{N}(x;\theta)=\theta x\) to approximate the ODE. In theory, a linear model can exactly represent the linear ODE; however, we will demonstrate that this does not happen. For the example in Fig. 2, the training data was generated from the analytical solution, spaced apart by the training timestep Δt = 0.1. To measure the performance as a continuous model at inference time, we integrate the model using a range of inference timesteps h. Figure 2b, e plot the results of the convergence test with the EulerNet and the RK4Net.
For the EulerNet, the error when h = Δt (the step size is equal to the temporal spacing in the training data) is very low, but it increases when h decreases. This is in contrast to the classical Euler numerical integration scheme, where the error decreases as h decreases. Thus, these results for EulerNet do not pass the convergence test. In contrast, for the RK4Net, the error decreases as h → 0, and eventually it approaches and levels off at a fixed value. Notably, the error does not increase dramatically as it does with the EulerNet. In this case, the RK4Net has learned the right inductive biases to approximate an underlying continuous dynamics for the system.
We illustrate this further by showing an example trajectory at a specific evaluated h. In this case, both trained ODENets are evaluated at h = 0.01 (a 10× increase in resolution in comparison to the training data) up to a final timestep. In Fig. 2c, the EulerNet falls off of the true numerical Euler solution. It has clearly not learned the underlying continuous dynamics. In contrast, in Fig. 2f, the RK4Net shows good correspondence with the true numerical RK4 solution.
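The harmonic-oscillator experiment can be imitated in a few lines of Python. This is a hedged sketch, not the authors' training code: it assumes a unit-frequency oscillator (dx/dt = v, dv/dt = −x), and it replaces gradient-descent training by solving the one-step fitting problem exactly for the linear model \(\mathcal{N}(x;\theta)=\theta x\). The one-step map M is found by least squares; the EulerNet weights then follow in closed form, and the RK4Net weights are obtained with a simple fixed-point iteration that inverts the RK4 polynomial. All names (`theta_euler`, `theta_rk4`, `mean_error`) are hypothetical.

```python
import numpy as np

DT, T_FINAL = 0.1, 10.0           # training spacing from the text; horizon is an assumption
X0 = np.array([1.0, 0.0])         # assumed initial condition
I2 = np.eye(2)

def exact(t):
    # Analytical solution of dx/dt = v, dv/dt = -x (unit frequency is an assumption).
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, s], [-s, c]]) @ X0

def euler_map(theta, h):
    # One forward Euler step of dx/dt = theta x, written as a matrix.
    return I2 + h * theta

def rk4_map(theta, h):
    # One RK4 step of dx/dt = theta x: the degree-4 Taylor polynomial of exp(h theta).
    z = h * theta
    z2 = z @ z
    return I2 + z + z2 / 2.0 + z2 @ z / 6.0 + z2 @ z2 / 24.0

# Training data sampled from the analytical solution at spacing DT.
times = DT * np.arange(int(round(T_FINAL / DT)) + 1)
X = np.stack([exact(t) for t in times])

# Best one-step linear map M with x_{n+1} = M x_n (least squares over all pairs).
M = np.linalg.lstsq(X[:-1], X[1:], rcond=None)[0].T

# EulerNet: I + DT * theta = M has a closed-form solution.
theta_euler = (M - I2) / DT

# RK4Net: solve rk4_map(theta, DT) = M by fixed-point iteration on Z = DT * theta.
Z = M - I2
for _ in range(100):
    Z2 = Z @ Z
    Z = M - I2 - Z2 / 2.0 - Z2 @ Z / 6.0 - Z2 @ Z2 / 24.0
theta_rk4 = Z / DT

def mean_error(theta, one_step_map, h):
    # Mean trajectory error against the analytical solution at step size h.
    P = one_step_map(theta, h)
    x, errs = X0.copy(), []
    for n in range(1, int(round(T_FINAL / h)) + 1):
        x = P @ x
        errs.append(np.linalg.norm(x - exact(n * h)))
    return float(np.mean(errs))

refinements = (1, 2, 4, 8, 16)    # h = DT / r
euler_err = {r: mean_error(theta_euler, euler_map, DT / r) for r in refinements}
rk4_err = {r: mean_error(theta_rk4, rk4_map, DT / r) for r in refinements}
for r in refinements:
    print(f"h = dt/{r:<2d}  EulerNet {euler_err[r]:.2e}  RK4Net {rk4_err[r]:.2e}")
```

Because the fit here is exact rather than obtained by stochastic training, both models reproduce the data to round-off at h = Δt. The contrast appears for h < Δt: the Euler-fitted weights contain a spurious damping term of order Δt/2, so the rollout decays visibly, whereas the RK4-fitted weights differ from the true operator only at order Δt⁴ and the error stays small at every refinement.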
Four prototypical dynamical systems
We consider four canonical scientific dynamical systems: the nonlinear pendulum, the Lotka-Volterra equations, the Cartesian pendulum, and the double gyre fluid flow. The first two systems are nonlinear dynamical systems; the Cartesian pendulum is a stiff dynamical system (which is difficult to solve with numerical methods without taking very small timesteps); and the double gyre fluid flow consists of vorticity fields describing a stream function. We provide more details about these dynamical systems in Supplementary Note 1: Details about considered dynamical systems. For each system, we sample data points from either the analytical solution or the numerical solution. The temporal spacing between the discrete data points is denoted as Δt, while the step size used to evaluate a trained ODENet is denoted as h.
Training setup
We train an ODENet with a numerical integration scheme (Euler or RK4) for each system. We use simple feed-forward networks with tanh activation functions. See Supplementary Note 2: Model architecture details for details on the architecture used. In every example, the exact same network architecture is used for both the EulerNet and the RK4Net. We also include additional results for ODENets trained on training data spaced apart by different Δt, as well as an ODENet trained with the Midpoint numerical integration scheme, in Supplementary Note 3: Additional convergence test examples using ODENets. For the double gyre fluid flow, we use a dynamic autoencoder architecture^{27, 29} to embed the high-dimensional input of flow field snapshots in some latent space. Specifically, we replace the linear discrete map in the architecture proposed in ref. ^{27} with a linear ODE-block. This means that the model learns to predict the next timestep by integrating forward in latent space (using an Euler or RK4 numerical integration scheme) with step size h = Δt. Finally, the decoder translates the latent space vectors back to the flow field.
Results
The results of our method are shown in Fig. 3. In each case, the EulerNet has low error when h = Δt (i.e., evaluated at the same time spacing as the training data), but it has high error when evaluated at all other h, in particular smaller values of h. Thus, it does not pass the convergence test, and it has not learned a meaningfully continuous dynamics. It is a good discrete model, appropriate for data drawn from the same temporal discretization, but it has overfit to the temporal discretization. In contrast, the error during inference time of the RK4Net steadily decreases when it is evaluated at lower h, eventually converging to a fixed basal level determined by the model and the noise properties of the data. It has passed the convergence test, and it can be said to have learned a meaningfully continuous model. We include additional convergence test results in Supplementary Figs. S1, S2, S3, S4.
Interpolation: predicting finescale solutions from coarse training data
Observational, discrete training data are limited in that they are measured at specific timesteps. To obtain a solution for the system in between these timesteps, one must retake the measurements at finer timesteps. However, selecting a model that has learned meaningfully continuous dynamics should guarantee accurate evaluation at smaller timesteps, despite only training on coarse and/or irregularly spaced temporal data (i.e., measurements taken with large timesteps). By learning continuous dynamics, the trained ODENet model can be evaluated at any point in temporal space, and still yield a low error solution. In this case, one would not need to recollect training data with smaller Δt between data points; the learned ODENet can be used instead. Here, we demonstrate that fine-scale evaluation is possible by learning continuous temporal dynamics from flow fields for the double gyre flow example.
Results
We consider two models: the EulerNet which did not pass the convergence test, and the RK4Net which did pass the convergence test (see Fig. 3). In Fig. 4, we show the flow field snapshots that result from both models being evaluated at different timesteps. The EulerNet is only able to approximate the true solution at the training data timestep (in this case, h = Δt = 0.5). It cannot match the true solution at the other timesteps, and it gives a poor approximation that does not capture the flow behavior. In contrast, the RK4Net has good correspondence to the true solution even when it is evaluated at timesteps that were not in the training data. Thus, our convergence test method has allowed us to choose a model that can recover fine-scale solutions of the system, while only having access to coarse-scale measurements during training.
Extrapolation: predicting trajectories for new initial conditions
For a given system, temporal trajectories start at some initial condition. Measurements are taken for one trajectory at one initial condition, and then must be taken separately for other trajectories with different initial conditions. Selecting a model that has learned a meaningfully continuous dynamics circumvents this: after training a model on data points sampled from one (or more) trajectories, the model should be able to extrapolate and predict accurate solutions for new initial conditions.
Training setup
We look at the nonlinear pendulum (described in Eq. 27 in Supplementary Note 1: Details about considered dynamical systems). Here, θ is the initial condition representing the position of the pendulum in time. The phase portrait of this system (representing the true solution trajectories), showing \(\frac{\mathrm{d}\theta}{\mathrm{d}t}\) against θ, is shown in Fig. 5a. An EulerNet and an RK4Net are trained on trajectories, spaced apart by Δt = 0.1, starting at certain initial conditions (shown by the black lines in Fig. 5b). We then pick a test set of a number of different initial conditions that were not in the training data. The EulerNet and RK4Net start at these test initial conditions and are both evaluated at a finer h = 0.001, representing a 100× increase in resolution. Note that we saw in Fig. 3 that the EulerNet did not pass the convergence test (i.e., it had high error when evaluated at h ≪ Δt), while the RK4Net did pass the test.
Results
The results of predicting trajectories starting at different test initial conditions are shown in Fig. 5. The EulerNet is unable to predict these trajectories and quickly falls off of the phase plot lines corresponding to the true solution. In contrast, the RK4Net is able to predict the trajectories, starting at different test initial conditions with good correspondence to the true solution. Thus, we see that it is critical to find a model that passes the convergence test and is able to learn a continuous dynamics to succeed at this extrapolation task.
Irregularly sampled training data
It is typically the case that scientific data collection includes measurements that are taken with some amount of imprecision. For example, the measurement of interest is not always taken at the exact same Δt every time, due to issues such as jitter in the measurement device. Measurements may also be skipped: for example, a measurement is only available at t = Δt and t = 3Δt because the measurement at t = 2Δt was lost or skipped. Thus, reconstructing the correct trajectory when the measurements are nonuniformly spaced is important in numerous science and engineering problems. Here, we look at an example of using the convergence test to correctly select a meaningfully continuous model in the case of the nonlinear pendulum with irregularly spaced temporal data with nonuniform temporal intervals.
Training setup
An example distribution of irregularly sampled training data is shown in Fig. 6a. The baseline Δt is 0.05, subject to jitter and frame-skipping errors. An EulerNet and an RK4Net are both trained on these temporal data points, where the values of Δt are input into the integration schemes. (That is, at every given measurement, the timestep jitter was also recorded to use in training.) Both ODENets are then evaluated at a very low h (approximately 100× lower than the general distribution of the training data points) to generate a time series plot. Note that we also run the convergence test on both ODENets.
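The way recorded time gaps enter the integrator during training can be sketched as follows. This Python example is illustrative only and rests on several assumptions: the pendulum is taken in the form θ'' = −sin θ (the exact parameters of the supplementary equation are not reproduced here), the jitter amplitude (±20%) and frame-drop probability (10%) are invented, and `one_step_loss` is a hypothetical name. For brevity the loss is evaluated with the known vector field standing in for a trained network.

```python
import numpy as np

def pendulum(x):
    # Assumed pendulum form: state x = (theta, dtheta/dt), theta'' = -sin(theta).
    return np.array([x[1], -np.sin(x[0])])

def euler_step(f, x, h):
    return x + h * f(x)

def rk4_step(f, x, h):
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

# Irregular timestamps: baseline spacing 0.05 with jitter, then random frame drops.
rng = np.random.default_rng(0)
gaps = 0.05 * (1.0 + 0.2 * rng.uniform(-1.0, 1.0, size=200))
times = np.concatenate([[0.0], np.cumsum(gaps)])
keep = rng.random(times.size) > 0.1
keep[0] = True                      # always keep the initial measurement
times = times[keep]

# Synthetic "measurements": integrate finely between the recorded timestamps.
states = [np.array([1.0, 0.0])]
for dt_n in np.diff(times):
    x = states[-1]
    for _ in range(20):
        x = rk4_step(pendulum, x, dt_n / 20.0)
    states.append(x)
states = np.stack(states)

def one_step_loss(f, step, states, times):
    # Each training pair (x_n, x_{n+1}) is integrated over its own recorded gap.
    dts = np.diff(times)
    preds = np.stack([step(f, x, dt_n) for x, dt_n in zip(states[:-1], dts)])
    return float(np.mean(np.sum((preds - states[1:]) ** 2, axis=1)))

loss_euler = one_step_loss(pendulum, euler_step, states, times)
loss_rk4 = one_step_loss(pendulum, rk4_step, states, times)
print(f"one-step loss with true field: Euler {loss_euler:.2e}, RK4 {loss_rk4:.2e}")
```

With the true vector field, the RK4 loss is essentially zero while the Euler loss is not. A network trained to minimize the Euler loss therefore has an incentive to move away from the true field in order to absorb the integrator's discretization error, which is one way to read the overfitting described in the text.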
Results
The EulerNet quickly falls off of the continuous solution (Fig. 6b). Conversely, the RK4Net follows the continuous solution with good accuracy, including at timesteps not in the training data (Fig. 6c). Thus, it is clear that the RK4Net has learned a meaningfully continuous dynamics while the EulerNet has not. This is confirmed by RK4Net passing the convergence test, but EulerNet not passing it (Fig. 6d). The dip for EulerNet appears at the average Δt in the training data, which is slightly larger than 0.05 due to the measurement noise.
Sparse identification of nonlinear dynamical systems
Here, we demonstrate that overfitting to the temporal discretization affects ML methods (in the context of dynamical systems) more generally. To illustrate this, we consider the SINDy learning approach, which is a class of methods for system identification^{5}.
The SINDy method uses the following model structure to represent the dynamics,
\[\frac{\mathrm{d}x}{\mathrm{d}t} = \Xi\,\phi(x), \tag{6}\]
where Ξ is a matrix of learnable parameters and ϕ(x) is a set of nonlinear basis functions which correspond to potential terms in the underlying system. A linear optimizer is used to fit the parameters Ξ to the data, with a sparsity constraint. The sparsity constraint identifies the subset of relevant basis elements in ϕ(x) to reveal an interpretable dynamics model.
In most real applications, the time derivatives, dx_{n}/dt, cannot be measured directly and instead need to be approximated from the observations x_{n}. The common approach in SINDy is to use finite differences to approximate dx_{n}/dt from the data^{47, 48}. (This treatment of time derivatives, where SINDy differentiates the data, is in contrast to the ODENet method, which integrates the model.) The finite difference operator FD(x_{n…}) is applied over the dataset as a preprocessing step. This yields the following set of N discrete equations, which is optimized for Ξ over all observations n:
\[\mathrm{FD}(x_{n\ldots}) = \Xi\,\phi(x_{n}), \qquad n = 1, \ldots, N, \tag{7}\]
where FD(x_{n…}) is a finite difference approximation using the region of points around time index n. Using the series of N equations, Ξ is learned using specialized algorithms designed to seek sparsity, such as LASSO regularization or sequential threshold least squares^{5}. (This is again in contrast with the ODENet method, which uses gradient descent nonlinear optimization.) The discretization order of the finite difference preprocessing is a hyperparameter of SINDy. The first-order accurate finite difference (here referred to as FD1) results in a pointwise approximation of,
\[\frac{x_{n+1} - x_{n}}{\Delta t} = \Xi\,\phi(x_{n}). \tag{8}\]
Note that when rearranged, this is equivalent to the forward Euler integrator:
\[x_{n+1} = x_{n} + \Delta t\,\Xi\,\phi(x_{n}). \tag{9}\]
Higher-order finite difference stencils can also be used to increase the accuracy of the time derivative approximation. The second-order stencil (FD2) (analogous, but not equivalent, to Midpoint) can be expressed as
\[\frac{x_{n+1} - x_{n-1}}{2\Delta t} = \Xi\,\phi(x_{n}), \tag{10}\]
and the fourth-order stencil (FD4) (analogous, but not equivalent, to RK4) can be expressed as,
\[\frac{-x_{n+2} + 8x_{n+1} - 8x_{n-1} + x_{n-2}}{12\Delta t} = \Xi\,\phi(x_{n}). \tag{11}\]
Training setup
We look at the harmonic oscillator described in Eq. (5), and the nonlinear pendulum (described in Eq. 27 in Supplementary Note 1: Details about considered dynamical systems). We use the PySINDy implementation^{47, 48} to train models on these trajectories. We train three different SINDy models on the trajectories, altering the finite difference (FD) approximation order of accuracy: FD1 is a first-order two-point stencil (analogous to EulerNet), FD2 is a second-order three-point stencil (analogous to MidpointNet), and FD4 is a fourth-order five-point stencil (analogous to RK4Net). The SINDy model is plugged into the convergence test as F, such that the same Runge-Kutta integrators are used for trajectory prediction. For the harmonic oscillator, the training data is spaced apart by Δt = 0.1, while for the nonlinear pendulum, we look at examples where the training data is spaced apart by Δt = 0.05 and Δt = 0.1.
Results
We run our convergence test on the different SINDy models. The results of our method are shown in Fig. 7. In each case, the FD1 model has low error when h = Δt (i.e., evaluated at the same time spacing as the training data), but it has high error when evaluated at all other h, and especially smaller values of h. Thus, it does not pass the convergence test, and it has not learned a meaningfully continuous dynamics. The FD1 model shows a sharp dip because it is overfit to forward Euler. In this case, the stencil and integrator correspond to the exact same algebraic structure. Similar to when the models were trained via ODENets, we see that FD1 has overfit to the temporal discretization. In contrast, the error during the inference time of the FD4 model steadily decreases when it is evaluated at lower h, eventually converging to a fixed basal level. It has passed the convergence test, and it has learned a meaningfully continuous model.
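The FD1 versus FD4 behavior can be reproduced in a stripped-down setting. The sketch below deliberately does not call PySINDy: it uses plain NumPy least squares with a linear library ϕ(x) = x and no sparsity thresholding, on an assumed unit-frequency harmonic oscillator sampled at Δt = 0.1. Pairing FD1 with forward Euler and FD4 with RK4 at inference time is also an assumption made for illustration, and all function names are hypothetical.

```python
import numpy as np

DT, T_FINAL = 0.1, 10.0
X0 = np.array([1.0, 0.0])

def exact(t):
    # Assumed unit-frequency oscillator: dx/dt = v, dv/dt = -x.
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, s], [-s, c]]) @ X0

times = DT * np.arange(int(round(T_FINAL / DT)) + 1)
X = np.stack([exact(t) for t in times])

def fit_xi(states, derivatives):
    # Least-squares fit of derivatives = Xi @ phi(x) with phi(x) = x (no sparsity step).
    return np.linalg.lstsq(states, derivatives, rcond=None)[0].T

# FD1: two-point forward difference, evaluated at x_n.
xi_fd1 = fit_xi(X[:-1], (X[1:] - X[:-1]) / DT)

# FD4: five-point central difference, evaluated at interior points x_n.
dX_fd4 = (-X[4:] + 8.0 * X[3:-1] - 8.0 * X[1:-3] + X[:-4]) / (12.0 * DT)
xi_fd4 = fit_xi(X[2:-2], dX_fd4)

def euler_step(xi, x, h):
    return x + h * (xi @ x)

def rk4_step(xi, x, h):
    f = lambda y: xi @ y
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def mean_error(xi, step, h):
    # Mean trajectory error against the analytical solution at step size h.
    x, errs = X0.copy(), []
    for n in range(1, int(round(T_FINAL / h)) + 1):
        x = step(xi, x, h)
        errs.append(np.linalg.norm(x - exact(n * h)))
    return float(np.mean(errs))

refinements = (1, 2, 4, 8, 16)     # h = DT / r
fd1_err = {r: mean_error(xi_fd1, euler_step, DT / r) for r in refinements}
fd4_err = {r: mean_error(xi_fd4, rk4_step, DT / r) for r in refinements}
for r in refinements:
    print(f"h = dt/{r:<2d}  FD1 {fd1_err[r]:.2e}  FD4 {fd4_err[r]:.2e}")
```

In this setting the FD1 fit is algebraically identical to fitting a forward Euler step, so its error is at round-off for h = Δt and grows once h is refined, which mirrors the sharp dip described above. The FD4 coefficients differ from the true operator only at order Δt⁴, so the error stays small across all refinements.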
Conclusion
One of the great challenges in scientific ML is to learn continuous dynamics for physical systems (either “the” underlying continuous dynamics, or “a” continuous dynamics that leads to good predictive results for the spatial/temporal regime of interest for the ML model) such that the learned ML model can be trusted to give accurate and reliable results. ML models are trained on discrete points, and typical ML training/testing methodologies are not aware of the continuity properties of the underlying problem from which the data are generated. Here, we have developed a methodology, and we showed that convergence (an important criterion used in numerical analysis) can be used for selecting models that have a strong inductive bias towards learning meaningfully continuous dynamics. Standard ODENet approaches, as well as common SINDy methods, both popular in recent years within the ML community, often do not pass this convergence test. In contrast, models that pass this convergence test have favorable properties. For instance, models that learned underlying continuous dynamics can be evaluated at lower or higher resolutions. Our results suggest that principled numerical analysis methods can be coupled with existing ML training/testing methodologies to deliver upon the promise of scientific ML more generally.
Many more concrete directions are of course raised by our methodology. One direction has to do with developing analogous tests to be used for less well-posed dynamical systems. Such systems are of interest in scientific ML, and such tests will be of greatest interest when one needs to obtain “the” correct underlying continuous solution (e.g., to correctly identify qualitative long-term behavior^{46}), rather than “a” continuous solution, which is often sufficient for ML prediction tasks. Another direction has to do with whether we can develop analogous tests appropriate for adaptive time-stepping methods, symplectic integrators, and other commonly used numerical simulation methods such as those for optimal control problems using NNs and associated Hamilton-Jacobi partial differential equations^{49, 50}. Work subsequent to the posting of the initial technical report version of this paper has addressed the continuous-discrete equivalence question for learning operators^{51, 52}, and likely our methodology provides a way to operationalize that in practice. A final direction has to do with whether one can obtain strong theoretical results, e.g., ML-style generalization bounds, to guide the use of methods such as these. Recent theoretical and empirical results suggest that this will be challenging^{53,54,55,56,57}, at least when using traditional approaches to ML-style generalization bounds. Our success in combining principled numerical analysis methods with existing ML methodologies also leads one to wonder whether we can use a posteriori error bound analysis methods to develop practically useful a posteriori generalization bounds for problems such as those we have considered.
Methods
The basic problem of numerical analysis is to solve problems from continuous mathematics using a discrete computer. The area has a rich history for describing the consistency and convergence behavior of numerical methods for approximating continuous functions^{58, 59}. Here, we expand on the methods we used in Results and Discussion.
Criteria of Classical Numerical Analysis
Given an initial value x(0) = x_{0}, we can discretize Eq. (2) along the node points t_{n} = nΔt for n = 0, 1, …, N by evaluating the following integral equation:
\[x_{n+1} = x_{n} + \int_{t_{n}}^{t_{n}+\Delta t} F(x(s))\,\mathrm{d}s, \tag{12}\]
where x_{n} = x(t_{n}), and Δt is the discrete timestep. Typically, we are not able to compute an analytic solution for the integral, and thus we rely on numerical schemes to approximate \(\int_{t_{n}}^{t_{n}+\Delta t}F(x(s))\,\mathrm{d}s\).
There are many different types of numerical integration schemes to approximate the integral in Eq. (12). These have different trade-offs between computational efficiency and accuracy. One such scheme, the forward Euler discretization, can be written as:
\[x_{n+1} = x_{n} + \Delta t\,F(x_{n}). \tag{13}\]
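This is a first-order one-step method, where the global error (the error over all of the timesteps) is proportional to the step size, i.e., \(\mathcal{O}(\Delta t)\), meaning that the error gets smaller as Δt decreases. There are also higher-order integration schemes. One popular higher-order scheme is the Runge-Kutta 4 (RK4) discretization, which takes the following form:
\[\begin{aligned} k_{1} &= F(x_{n}), \\ k_{2} &= F\!\left(x_{n} + \tfrac{\Delta t}{2}\,k_{1}\right), \\ k_{3} &= F\!\left(x_{n} + \tfrac{\Delta t}{2}\,k_{2}\right), \\ k_{4} &= F\!\left(x_{n} + \Delta t\,k_{3}\right), \\ x_{n+1} &= x_{n} + \tfrac{\Delta t}{6}\left(k_{1} + 2k_{2} + 2k_{3} + k_{4}\right). \end{aligned} \tag{14}\]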
Here, the global error is proportional to the step size to the fourth power, i.e., O(Δt^{4}); and thus as Δt gets smaller, the error gets smaller much more quickly than with the forward Euler scheme. In general, the global error can be written as O(Δt^{p}), where p denotes the order of accuracy.
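The stated orders of accuracy can be checked empirically by halving the step size and measuring how much the final-time error shrinks. The following Python sketch does this for the scalar test problem dx/dt = −x with x(0) = 1 on [0, 1]; the test problem, the step sizes, and the names `integrate`, `p_euler`, and `p_rk4` are illustrative assumptions. Since the error behaves like C·Δt^p, the observed order is p ≈ log₂(Error(Δt) / Error(Δt/2)).

```python
import numpy as np

def F(x):
    # Known vector field of the test problem dx/dt = -x (an illustrative choice).
    return -x

def euler_step(f, x, h):
    return x + h * f(x)

def rk4_step(f, x, h):
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def integrate(step, h, t_final=1.0, x0=1.0):
    # Advance from t = 0 to t = t_final with a fixed step size h.
    x = x0
    for _ in range(int(round(t_final / h))):
        x = step(F, x, h)
    return x

exact_final = np.exp(-1.0)
step_sizes = [0.1, 0.05, 0.025]
err_euler = [abs(integrate(euler_step, h) - exact_final) for h in step_sizes]
err_rk4 = [abs(integrate(rk4_step, h) - exact_final) for h in step_sizes]

# Observed order from the two finest step sizes.
p_euler = np.log2(err_euler[-2] / err_euler[-1])
p_rk4 = np.log2(err_rk4[-2] / err_rk4[-1])
print(f"observed order: Euler {p_euler:.2f}, RK4 {p_rk4:.2f}")
```

The observed orders come out close to 1 for forward Euler and close to 4 for RK4, in line with the O(Δt) and O(Δt⁴) global error estimates.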
Classical numerical integration typically starts by assuming that there exists a true underlying continuous-time system, which is then replaced by a discrete-time problem whose solution approximates that of the continuous problem. However, discretizing the problem introduces an error, and concepts such as stability, convergence, and consistency can be used to quantify the error of the discrete solution^{60,61,62}.
In the following, we describe the error bounds in the traditional scientific computing context where the system dynamics are known exactly, in which case the only approximation error comes from numerical integration in time. We specifically focus on numerical convergence because this will give us a mechanism to analyze ML models. However, note that stability and consistency are also of interest^{59}. Let x(t_{n}) denote the true solution of a dynamical system of interest; and let \({\bar{x}}_{n}^{\Delta t}\) denote a numerical solution after n steps with step size Δt. We use N to denote the maximum number of time steps such that T = t_{N} = NΔt is the final time. Decreasing Δt requires increasing N (the number of steps taken to arrive at T), and vice versa. Then, x(T) = x(t_{N}) is the true solution at the final time, while \({\bar{x}}_{N}^{\Delta t}\) is the numerical solution at the final time. Convergence quantifies the global error (the cumulative error of all iterations) of a numerical algorithm.
Definition 2
(Convergent Numerical Approximation). A numerical one-step method for solving \(\frac{\mathrm{d}x(t)}{\mathrm{d}t}=F(x(t))\), with initial condition x(0) = x_{0}, is said to be convergent if and only if the error tends to zero as Δt goes to zero:
\[\lim_{\Delta t\to 0}\left\Vert \bar{x}_{N}^{\Delta t} - x(T)\right\Vert = 0.\]
Of course, in numerical practice, the error does not converge to zero. Instead, it levels off at some base level, typically determined by the numerical precision used to represent the data, as observed in Fig. 2e.
The specific metric for quantifying the approximation error across the sequence is somewhat arbitrary. Moreover, there is the problem that the numerical method can potentially converge to the wrong solution^{63}. Thus, to ensure that a numerical method is not only convergent but also consistent, one can use the mean error,
\[\frac{1}{N}\sum_{n=1}^{N}\left\Vert \bar{x}_{n}^{\Delta t} - x(t_{n})\right\Vert,\]
or the maximum error across all N points in time,
\[\max_{1\le n\le N}\left\Vert \bar{x}_{n}^{\Delta t} - x(t_{n})\right\Vert.\]
If, as the step size Δt decreases, the largest absolute error between the numerical solution \({\bar{x}}_{n}^{\Delta t}\) and the exact solution x(t_{n}) also decreases, then the numerical approximation converges towards the solution of the continuous system. In the limit of Δt → 0, the numerical solution converges to the exact solution and the error converges to zero, or to some base level determined by machine precision and numerical roundoff noise.
This convergence criterion is also a test for continuity in the solution: as Δt → 0, the time interval between adjacent numerical solutions (e.g., x_{n} at t_{n} and x_{n+1} at t_{n+1}) also decreases towards zero. Thus, the numerical solution collapses onto a continuous solution as Δt → 0.
Validation of a new integration method involves multiple stages. Consistency, convergence, and stability can be theoretically proven only for a rather small class of ODEs (typically only linear ODEs). Thus, the method must also be evaluated empirically with a real implementation on a problem of interest. In practice, a convergence test is used, where the numerical integration scheme is used to predict trajectories for a range of Δt, and the predictions are compared to an analytical solution or to an over-refined solution. The errors are verified to approach zero at the correct rate, at least until they flatten out at some base level. The theoretical proof of consistency, stability, and convergence on simple systems such as linear ODEs, combined with the empirical demonstration of convergence on ODEs of interest, is typically viewed as sufficient to vet the method. Empirical convergence tests are a standard integration test method for scientific programs, e.g., they are regularly run to automatically catch bugs.
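An empirical convergence test of this kind can be sketched as follows. This is an illustrative example under stated assumptions rather than a reference implementation: the integrator is the explicit midpoint rule (a second-order scheme chosen here for illustration), the test problem is dx/dt = λx with λ = −2, and the observed order is taken as the least-squares slope of log(error) against log(Δt). The function names are hypothetical.

```python
import math

def midpoint_step(F, x, dt):
    """Explicit midpoint rule, a second-order Runge-Kutta scheme."""
    return x + dt * F(x + 0.5 * dt * F(x))

def mean_and_max_error(step, dt, lam=-2.0, x0=1.0, T=1.0):
    """Mean and maximum absolute error over all N = T / dt time points."""
    n_steps = round(T / dt)
    x, errs = x0, []
    for n in range(1, n_steps + 1):
        x = step(lambda y: lam * y, x, dt)
        errs.append(abs(x - x0 * math.exp(lam * n * dt)))
    return sum(errs) / n_steps, max(errs)

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) versus log(x): the observed order."""
    lx = [math.log(v) for v in xs]
    ly = [math.log(v) for v in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

dts = [0.1 / 2 ** i for i in range(5)]  # 0.1, 0.05, ..., 0.00625
results = [mean_and_max_error(midpoint_step, dt) for dt in dts]
order_from_mean = loglog_slope(dts, [mean for mean, _ in results])
order_from_max = loglog_slope(dts, [worst for _, worst in results])
print(f"observed order: mean {order_from_mean:.2f}, max {order_from_max:.2f}")
```

A test suite would typically assert that the observed order stays within a tolerance of the nominal order, so that an implementation bug that degrades the rate is caught automatically.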
A convergence test for ODENets
We now describe a convergence test, based on the discussed convergence criteria, to validate properties of an ODENet solution. The fact that ODENets are embedded in a numerical integration scheme enables us to use convergence analysis methods that are well known from classical numerical analysis. To start, we know that the numerical integrator itself will be convergent if it is given the true f, but we do not know if the ODENet will be convergent when an approximate ML model \(\mathcal{N}\) is used to approximate f. The convergence test is used to determine whether an ODENet has learned a meaningfully continuous model for the underlying problem of interest.
Suppose that we are given an ODENet \(\mathcal{N}\) that is trained with a numerical integration scheme (such as Euler or RK4) from t_{n} to t_{n+1} with step size Δt:
Following Definition 2, we compute the global error of the ODENet \(\mathcal{N}\) and examine the fixed value b that it approaches as the time step h goes to zero:
\[\lim_{h\to 0}\mathrm{Error}(h) = b.\]
Here too, in the ML setting, the error does not necessarily converge to zero. This is analogous to the classical numerical analysis setting, where the numerical analysis test typically converges to a nonzero value determined by the numerical roundoff error (e.g., see the floor in Fig. 2e). Unlike in the classical numerical analysis setting, even in the absence of numerical errors the b of an ML model will be greater than zero. For an ODENet, the numerical value of b depends on the model architecture, integration method, the optimizer, and the noise properties of the data^{64}. The value of b will elucidate the convergence properties of the trained ODENet.
Computing an error metric in the ML setting requires additional consideration because we do not necessarily have access to the underlying exact solution at arbitrary points in time. Instead, we are restricted to the information that is provided by a given validation set of discrete data points \(\mathcal{T}=\left\{x_{0},\,x_{1},\,x_{2},\,\ldots,\,x_{N}\right\}\), where each point is spaced apart by the Δt between observations. A naive metric to compute the global error in this setting is simply to consider the 2-norm between the end point of the validation trajectory and the predicted value:
However, this metric is susceptible to noise and edge cases. Computing the error over all points in \(\mathcal{T}\) is difficult using Eq. (19) because the inferred trajectory has a different number of points than the validation trajectory. To mitigate this issue, we suggest computing the global error on a subset of points \(\mathcal{S}\) from the validation/test trajectories, which is a set of indexes into the original dataset \(\mathcal{T}\); e.g., \(\mathcal{S}=\left\{0, 9, 18, \ldots\right\}\) for every ninth point, spaced by 9Δt. This allows for inferring the trajectory with step size h and computing the error over the subset as follows:
where k is the index into the inferred trajectory corresponding to the index n into the validation trajectory, such that \(\bar{x}_{k}^{h}\) is the point that lines up at the same time as x_{n}. Note that we can only use certain timesteps h during inference because the solution points must align exactly with those in the subset trajectory.
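The index alignment between the inferred trajectory and the validation subset can be sketched as follows. This is a minimal illustration under several assumptions that are not taken from the text: the state is scalar, inference uses forward Euler, the error is averaged as a mean absolute error over the subset, and the "learned" model is a stand-in for a trained ODENet. The names `euler_rollout` and `subset_error` are hypothetical.

```python
import math

def euler_rollout(f, x0, h, n_steps):
    """Roll out dx/dt = f(x) with forward Euler; returns n_steps + 1 states."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] + h * f(xs[-1]))
    return xs

def subset_error(f, validation, dt, stride, h):
    """Mean absolute error on the subset S = {0, stride, 2 * stride, ...}.

    The inference step h must evenly divide stride * dt so that inferred
    points line up exactly with the subset of validation points.
    """
    ratio = stride * dt / h
    steps_per_point = round(ratio)
    if steps_per_point < 1 or abs(ratio - steps_per_point) > 1e-9:
        raise ValueError("h must evenly divide stride * dt")
    subset = list(range(0, len(validation), stride))
    pred = euler_rollout(f, validation[0], h, steps_per_point * (len(subset) - 1))
    # k = (n // stride) * steps_per_point is the inferred index aligned with n.
    errs = [abs(validation[n] - pred[(n // stride) * steps_per_point])
            for n in subset]
    return sum(errs) / len(errs)

lam, dt = -1.0, 0.1
validation = [math.exp(lam * n * dt) for n in range(19)]  # x_0 ... x_18
model = lambda x: lam * x  # stand-in for a learned vector field
err_coarse = subset_error(model, validation, dt, stride=9, h=0.1)
err_fine = subset_error(model, validation, dt, stride=9, h=0.01)
print(err_coarse, err_fine)
```

With a stride of 9 and Δt = 0.1, admissible inference steps are those that divide 0.9 evenly (0.9, 0.45, 0.3, 0.1, 0.01, and so on); a value such as h = 0.4 is rejected because no inferred point would land on the subset times.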
Given this setup, we use the term ContinuousNet to refer to an ODENet model that also exhibits these convergence properties, as per Definition 1. To evaluate this property, we can apply the same convergence test procedure used in traditional numerical analysis and scientific computing, but with the modifications necessary for it to work on training data. Further, it is necessary to apply a weaker heuristic to judge convergence because there will be residual optimization error at Error(Δt), as per Eq. (4). The procedure is as follows.
Given the learned ODENet, we first infer a validation/test trajectory on the original step size Δt and evaluate the global error, Error(Δt). This is the standard procedure for evaluating a discrete model, and this error value informs its accuracy at inferring discrete points in a sequence. Then, to further evaluate whether the model is convergent and continuous, we consider a range of h values which are both smaller and larger than the step size Δt used during training. For example, if the ODENet was trained on Δt = 0.1, we evaluate the ODENet on the validation/test trajectory for h ∈ [10^{−3}, …, 10^{1}]. Specifically, for a given set of inference timesteps {h_{1}, h_{2}, …, h_{p}}, our proposed convergence test iterates over the elements h_{i}, and executes the following two steps:

1. Evaluate the pretrained ODENet using Eq. (16) on the time interval [t_{0}, t_{T}], using step size h_{i}.

2. Calculate the error between the ODENet solution \(\bar{x}_{n}^{h_{i}}\) and points in the test data, Error(h_{i}).
Algorithms 1 and 2 in Supplementary Note 4: Algorithm for the Convergence Test summarize this procedure. As with many other numerical convergence tests, our proposed algorithm is subject to a heuristic threshold. In practice, we observe that the difference between a model that passes our test condition and one that does not is pronounced. A non-continuous model results in an error that is orders of magnitude larger than that of a continuous model when h is taken smaller than the Δt of the training data. In other words, our convergence test performs a form of model selection by selecting for models that learn inductive biases towards meaningfully continuous dynamics. In the following, we will demonstrate why convergence is an important consideration for ML model selection. We also discuss the two other criteria of classical numerical analysis, consistency and stability, in Supplementary Note 4: Algorithm for the Convergence Test.
In practice, we can also evaluate our convergence test with different starting points x_{0} and the respective final time step, x_{N}, and then average the error across the different runs. We highly recommend this to ensure that the same behavior occurs irrespective of start and end point.
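The two-step loop, including the averaging over starting points, can be sketched as follows. This is a simplified illustration and not Algorithms 1 and 2 from the Supplementary Notes: it assumes a scalar state, forward Euler at inference time, the end-point error as Error(h), and a hypothetical "learned" vector field −1.001x standing in for a trained ODENet; the reference end points come from the exact solution of dx/dt = λx rather than from measured data.

```python
import math

def euler_final_state(f, x0, h, T):
    """Step 1: evaluate the model on [0, T] with inference step size h."""
    x = x0
    for _ in range(round(T / h)):
        x = x + h * f(x)
    return x

def convergence_test(f, starts, reference_final, hs, T):
    """Return {h: Error(h)}, with the error averaged over all starting points."""
    errors = {}
    for h in hs:
        errs = [abs(euler_final_state(f, x0, h, T) - reference_final(x0))
                for x0 in starts]  # Step 2: error against the reference data
        errors[h] = sum(errs) / len(errs)
    return errors

lam, T = -1.0, 1.0
model = lambda x: -1.001 * x  # hypothetical learned vector field
hs = [1.0, 0.5, 0.1, 0.01, 0.001]
errors = convergence_test(
    model,
    starts=[0.5, 1.0, 2.0],
    reference_final=lambda x0: x0 * math.exp(lam * T),
    hs=hs,
    T=T,
)
for h in hs:
    print(f"h = {h:<6} Error(h) = {errors[h]:.3e}")
```

For this stand-in model the error decreases with h and then levels off near a small nonzero value set by the mismatch between the model and the true dynamics; a model that has not learned continuous dynamics would instead show the error growing as h is taken below the training step.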
Error analysis in an idealized learning setting
We provide a theoretical framework for the convergence test, analyzing the discretization error of one-step numerical integration schemes in an idealized setting. For additional details, see Supplementary Note 5: Theoretical Derivations. Specifically, we consider the problem of learning the simple scalar linear ODE,
\[\frac{\mathrm{d}x(t)}{\mathrm{d}t} = \lambda x(t),\]
where \(\lambda \in \mathbb{R}\) denotes a scalar parameter, and \(x\in \mathbb{R}\) denotes the state at a given point in time. It is well known that the function x(t) = e^{λt}x_{0} is a solution of this system, given the initial condition \(x(0)=x_{0}\in \mathbb{R}\)^{65}. In the following, we assume a similar setting as before: we are given a set of discrete data points \(\mathcal{D}=\left\{x_{0},\,x_{1},\,x_{2},\,\ldots,\,x_{N}\right\}\) produced by the linear system and spaced by Δt. Our aim is to learn a scalar ODENet model,
\[\frac{\mathrm{d}x(t)}{\mathrm{d}t} = w\,x(t),\]
parameterized by the learnable weight parameter w. Following the ODENet process, we discretize the model with a numerical integration scheme and then optimize the squared error loss,
We assume that the data are noise-free and can therefore be represented by their analytical solution, x_{n+1} = e^{λΔt}x_{n}. When the loss is optimized, the time-discretization step introduces its own unique source of error into the learning process, one that is independent of noise, numerical error, or optimization error. The error stems from the fact that any one-step consistent numerical integration scheme, when applied to a linear ODE, will result in a truncated Taylor series expansion with p terms, where p is the accuracy order of the scheme^{59}. Thus, the ML model cannot recover the exact parameters of the underlying ODE. The following lemma makes this issue explicit.
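As a concrete illustration, consider forward Euler and classical RK4 applied to a linear model with weight w (the specific schemes are assumptions chosen for this sketch). One step of each scheme multiplies the state by a truncated Taylor series of e^{wΔt}, whereas the noise-free data advance by the full exponential:

```latex
\begin{aligned}
\text{forward Euler } (p=1):\quad & x_{n+1} = \left(1 + w\Delta t\right)x_{n}, \\
\text{RK4 } (p=4):\quad & x_{n+1} = \left(1 + w\Delta t + \tfrac{(w\Delta t)^{2}}{2} + \tfrac{(w\Delta t)^{3}}{6} + \tfrac{(w\Delta t)^{4}}{24}\right)x_{n}, \\
\text{data}:\quad & x_{n+1} = e^{\lambda\Delta t}x_{n} = \Big(\sum_{k=0}^{\infty}\tfrac{(\lambda\Delta t)^{k}}{k!}\Big)x_{n}.
\end{aligned}
```

Matching a finite polynomial in wΔt to the full exponential in λΔt generally forces w away from λ for any finite Δt.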
Lemma 1
In the absence of any other optimization errors, a scalar ODENet can at best obtain a weight parameter w by minimizing Eq. (22) that, for certain timesteps and integrators, satisfies the following polynomial equation,
\[\sum_{k=0}^{p}\frac{(w\Delta t)^{k}}{k!} = e^{\lambda \Delta t}.\]
The proof is given in Supplementary Note 6: Optimization of w does not learn λ. For finite Δt and finite p, any solution of this equation satisfies w ≠ λ. There are situations where this equation can be solved for values of w that will set the loss in Eq. (22) to zero for all possible data points x_{n}. In the limit as Δt → 0, there is always a solution at w = λ for any p. Moreover, for practical settings there is at least one root when p is odd for any Δt; when p = 2 (RK2) and \(\Delta t\le \log (2)/|\lambda|\); or when p = 4 (RK4) and Δt < 1.307/∣λ∣. This equation can be used to find analytical expressions for w for simple integrators; see Supplementary Note 7: Example evaluations of the analytical equations.
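The gap between w and λ can be computed numerically. The sketch below assumes that the polynomial condition takes the form Σ_{k=0}^{p} (wΔt)^k / k! = e^{λΔt}, which is what a p-stage Runge-Kutta scheme of order p ≤ 4 yields on a linear ODE; the Newton solver, the function names, and the values λ = −1 and Δt = 0.1 are illustrative choices, not taken from the Supplementary Notes.

```python
import math

def truncated_exp(z, p):
    """Taylor series of exp(z) truncated after the z^p term."""
    return sum(z ** k / math.factorial(k) for k in range(p + 1))

def learned_weight(lam, dt, p, tol=1e-14):
    """Solve truncated_exp(w * dt, p) = exp(lam * dt) for the root near lam.

    Newton's method on z = w * dt, started at z = lam * dt. The derivative
    of the degree-p truncated series is the degree-(p - 1) truncated series.
    """
    target = math.exp(lam * dt)
    z = lam * dt
    for _ in range(100):
        step = (truncated_exp(z, p) - target) / truncated_exp(z, p - 1)
        z -= step
        if abs(step) < tol:
            break
    return z / dt

lam, dt = -1.0, 0.1
w_euler = learned_weight(lam, dt, p=1)  # equals (exp(lam * dt) - 1) / dt
gap = {p: abs(learned_weight(lam, dt, p) - lam) for p in (1, 2, 4)}
for p, g in gap.items():
    print(f"p = {p}: |w - lambda| = {g:.3e}")
```

For these values the gap shrinks from a few percent for forward Euler to below 10^{-5} for RK4, and halving Δt reduces it by roughly a factor of 2^{p}.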
From this result, we can characterize the difference between the ML model and the target ODE. In addition, in practice, it is (almost always) only possible to learn a perturbed version, \(\tilde{w}\), of w due to noise, limited numerical precision, and optimization errors. Let ε denote an additive perturbation away from the optimum of the minimization problem for an observed model due to these sources of errors, \(\tilde{w}=w+\varepsilon\). The following theorem bounds the error between the ML and ODE model, due to the overfitting of Eq. (23) in the presence of additive error sources.
Theorem 1
If an optimal weight parameter can be found by Eq. (23), the approximation error introduced by a scalar ODENet is bounded by ∣w − λ∣ ≤ c Δt^{p}, where c is a constant proportional to the Lipschitz continuity constants of λx and wx. In the presence of additive numerical error ε, the bound is,
\[|\tilde{w} - \lambda| \le c\,\Delta t^{p} + |\varepsilon|.\]
The proof is given in Supplementary Note 8: Approximation errors introduced by scalar ODENets. This bound shows that the user needs to increase the order of accuracy of the training scheme p in order to reduce the error between the learned parameter and the true ODE parameter.
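The step from the noise-free bound to the bound with additive error can be sketched with the triangle inequality. The derivation below uses only the definition \(\tilde{w}=w+\varepsilon\) and the bound ∣w − λ∣ ≤ cΔt^{p} from the theorem statement; the exact constants in the full proof may differ.

```latex
\begin{aligned}
|\tilde{w} - \lambda| &= |(w - \lambda) + \varepsilon| \\
&\le |w - \lambda| + |\varepsilon| \\
&\le c\,\Delta t^{p} + |\varepsilon|.
\end{aligned}
```

The first term shrinks with a smaller training step or a higher-order scheme, while the second term is unaffected by either.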
In practice, we can only measure Error(h) using a set of data points. Using the above results, we can analyze the expected behavior of the convergence test by bounding the global error using Eq. (24). Figure 8 plots the global error bound given concrete values of λ, Δt, and ε. As can be seen, the theoretically derived global error for the scalar ODE exhibits the same behavior as the empirical applications of the convergence test. We can use the global error to further derive expected bounds on the key points of the convergence test.
Corollary 1
When a scalar ODENet is evaluated with the timestep that was used for training, the leading term of the global error is proportional to the optimization error ε, with a constant k ∝ T∣x_{0}∣:
\[\mathrm{Error}(\Delta t) \approx k\,|\varepsilon|.\]
The proof is given in Supplementary Note 9: Observable quantities and the convergence test. The cΔt^{p} error term in Eq. (24) between w and λ is cancelled out, and the observed value is smaller than \(|\tilde{w}-\lambda|\). Therefore, the global error only observes the difference between w and \(\tilde{w}\), resulting from the model optimization error. Note that this value can become very small as the optimization error decreases. By applying the convergence test, we are able to extract an estimate of \(|\tilde{w}-\lambda|\), as given in the following.
Corollary 2
In the limit of decreasing the timestep size during inference, the global error approaches a constant factor (b) based on the bound in Eq. (24). It approaches at a rate of h^{q}, where q is the accuracy order of the ODE integration scheme used at inference time:
The proof is given in Supplementary Note 9: Observable quantities and the convergence test. Given these bounds, we can see how using Eq. (4) as a threshold yields (b − Error(Δt)) ∝ cΔt^{p}. Therefore, comparing b and Error(Δt) gives a quantitative estimate of the magnitude of the term cΔt^{p}, which describes the error that is induced by the numerical discretization scheme used for training.
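Both corollaries can be reproduced numerically for the scalar problem. The sketch below makes several assumptions that are not taken from the Supplementary Notes: forward Euler is used for both training and inference (so p = q = 1), the optimal weight is the Euler solution w = (e^{λΔt} − 1)/Δt, the optimization error ε = 10^{-4} is a hypothetical value, and Error(h) is the end-point error at T. For this toy case the constant works out to k = T∣x_{0}∣e^{λ(T−Δt)}, which is consistent with k ∝ T∣x_{0}∣.

```python
import math

lam, x0, T = -1.0, 1.0, 1.0
dt_train = 0.1
eps = 1e-4  # hypothetical optimization error
w = (math.exp(lam * dt_train) - 1.0) / dt_train  # Euler-optimal weight
w_tilde = w + eps                                # weight actually learned

def error(h):
    """End-point error of the learned model, inferred with step size h."""
    x = x0
    for _ in range(round(T / h)):
        x += h * w_tilde * x
    return abs(x - x0 * math.exp(lam * T))

# Corollary 1: at the training step, the error is proportional to eps.
err_train = error(dt_train)
k = T * abs(x0) * math.exp(lam * (T - dt_train))

# Corollary 2: as h -> 0, the error approaches a plateau b at rate h^q.
b = abs(math.exp(w_tilde * T) - math.exp(lam * T)) * abs(x0)
hs = [dt_train / 2 ** i for i in range(1, 12)]
errs = [error(h) for h in hs]
gaps = [abs(e - b) for e in errs]  # distance to the plateau

print(f"Error(dt) = {err_train:.3e}, k * eps = {k * eps:.3e}")
print(f"plateau b = {b:.3e}, Error(h_min) = {errs[-1]:.3e}")
```

Here Error(Δt) is of order 10^{-5} while the plateau b is of order 10^{-2}, so the difference b − Error(Δt) exposes the discretization term that the error at the training step alone hides.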
In summary, our analysis shows that there are two types of errors in the process of learning the dynamics of a linear scalar ODE using ODENets. These errors can be measured using the data by evaluating Error(Δt) and \(\lim_{h\to 0}\mathrm{Error}(h)\). This illustrates the power of our proposed convergence criterion, Eq. (4). Moreover, even if ε is small, it is necessary to increase the order of accuracy of the training scheme in order to further decrease the error between \(\tilde{w}\) and λ.
Data availability
The source code for the data generation will be made available at https://github.com/a1k12/learningcontinuousphysics.
Code availability
The code will be made available at https://github.com/a1k12/learningcontinuousphysics.
References
Robinson, R. C. An introduction to dynamical systems: continuous and discrete, vol. 19 (American Mathematical Soc., 2012).
Brunton, S. L. & Kutz, J. N. Data-driven science and engineering: Machine learning, dynamical systems, and control (Cambridge University Press, 2019).
Calinon, S., Li, Z., Alizadeh, T., Tsagarakis, N. G. & Caldwell, D. G. Statistical dynamical systems for skills acquisition in humanoids. In 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012), 323–329 (IEEE, 2012).
Peters, J. R. Machine learning of motor skills for robotics (University of Southern California, 2007).
Brunton, S. L., Proctor, J. L. & Kutz, J. N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl Acad. Sci. 113, 3932–3937 (2016).
Raissi, M., Perdikaris, P. & Karniadakis, G. E. Multistep neural networks for data-driven discovery of nonlinear dynamical systems. arXiv preprint arXiv:1801.01236 (2018).
Keller, R. T. & Du, Q. Discovery of dynamics using linear multistep methods. SIAM J. Numer. Anal. 59, 429–455 (2021).
Rudy, S., Alla, A., Brunton, S. L. & Kutz, J. N. Data-driven identification of parametric partial differential equations. SIAM J. Appl. Dynamical Syst. 18, 643–660 (2019).
Jin, P., Zhang, Z., Zhu, A., Tang, Y. & Karniadakis, G. E. SympNets: Intrinsic structure-preserving symplectic networks for identifying Hamiltonian systems. Neural Netw. 132, 166–179 (2020).
Lutter, M., Ritter, C. & Peters, J. Deep Lagrangian Networks: Using physics as model prior for deep learning. International Conference on Learning Representations (2019).
Chen, Z., Zhang, J., Arjovsky, M. & Bottou, L. Symplectic recurrent neural networks. International Conference on Learning Representations (2019).
Erichson, N. B., Azencot, O., Queiruga, A., Hodgkinson, L. & Mahoney, M. W. Lipschitz recurrent neural networks. International Conference on Learning Representations (2020).
Rusch, T. K., Mishra, S., Erichson, N. B. & Mahoney, M. W. Long expressive memory for sequence modeling. arXiv preprint arXiv:2110.04744 (2021).
Wang, R., Maddix, D., Faloutsos, C., Wang, Y. & Yu, R. Bridging physics-based and data-driven modeling for learning dynamical systems. In Learning for Dynamics and Control, 385–398 (PMLR, 2021).
Lim, S. H., Erichson, N. B., Hodgkinson, L. & Mahoney, M. W. Noisy recurrent neural networks. Adv. Neural Inform. Processing Sys. 34, 5124–5137 (2021).
Jiahao, T. Z., Hsieh, M. A. & Forgoston, E. Knowledge-based learning of nonlinear dynamics and chaos. Chaos: Interdiscip. J. Nonlinear Sci. 31, 111101 (2021).
Négiar, G., Mahoney, M. W. & Krishnapriyan, A. Learning differentiable solvers for systems with hard constraints. In The Eleventh International Conference on Learning Representations https://openreview.net/forum?id=vdv6CmGksr0 (2023).
Morton, J., Witherden, F. D. & Kochenderfer, M. J. Deep variational Koopman models: Inferring Koopman observations for uncertainty-aware dynamics modeling and control. arXiv preprint arXiv:1902.09742 (2019).
Lambert, N., Amos, B., Yadan, O. & Calandra, R. Objective mismatch in model-based reinforcement learning. In Proceedings of the 2nd Conference on Learning for Dynamics and Control, vol. 120 of Proc. Machine Learn. Res. 761–770 (PMLR, 2020).
Li, Y., He, H., Wu, J., Katabi, D. & Torralba, A. Learning compositional Koopman operators for model-based control. In International Conference on Learning Representations. https://openreview.net/forum?id=H1ldzA4tPr (2020).
Bachnas, A., Tóth, R., Ludlage, J. & Mesbah, A. A review on data-driven linear parameter-varying modeling approaches: A high-purity distillation column case study. J. Process Control 24, 272–285 (2014).
Karniadakis, G. E. et al. Physics-informed machine learning. Nat. Rev. Phys. 3, 422–440 (2021).
Manojlović, I. et al. Applications of Koopman mode analysis to neural networks. arXiv preprint arXiv:2006.11765 (2020).
Krishnapriyan, A., Gholami, A., Zhe, S., Kirby, R. & Mahoney, M. W. Characterizing possible failure modes in physics-informed neural networks. Adv. Neural Inform. Process. Sys. 34 (2021).
Bar-Sinai, Y., Hoyer, S., Hickey, J. & Brenner, M. P. Learning data-driven discretizations for partial differential equations. Proc. Natl Acad. Sci. 116, 15344–15349 (2019).
Pestourie, R., Mroueh, Y., Rackauckas, C., Das, P. & Johnson, S. G. Physics-enhanced deep surrogates for PDEs. arXiv preprint arXiv:2111.05841 (2021).
Erichson, N. B., Muehlebach, M. & Mahoney, M. W. Physics-informed autoencoders for Lyapunov-stable fluid flow prediction. arXiv preprint arXiv:1905.10866 (2019).
Otto, S. E. & Rowley, C. W. Linearly recurrent autoencoder networks for learning dynamics. SIAM J. Appl. Dynamical Syst. 18, 558–593 (2019).
Azencot, O., Erichson, N. B., Lin, V. & Mahoney, M. W. Forecasting sequential data using consistent Koopman autoencoders. International Conference on Machine Learning 475–485 (2020).
Dubois, P., Gomez, T., Planckaert, L. & Perret, L. Data-driven predictions of the Lorenz system. Phys. D: Nonlinear Phenom. 408, 132495 (2020).
Asadi, K., Misra, D., Kim, S. & Littman, M. L. Combating the compounding-error problem with a multi-step model. arXiv preprint arXiv:1905.13320 (2019).
Chen, R. T. Q., Rubanova, Y., Bettencourt, J. & Duvenaud, D. K. Neural ordinary differential equations. Adv. Neural Inf. Process. Syst. 31 (2018).
Ruthotto, L. & Haber, E. Deep neural networks motivated by partial differential equations. J. Math. Imaging Vis. 62, 352–364 (2020).
Queiruga, A., Erichson, N. B., Hodgkinson, L. & Mahoney, M. W. Stateful ODENets using basis function expansions. Adv. Neural Inf. Process. Syst. 34, 21770–21781 (2021).
Massaroli, S., Poli, M., Park, J., Yamashita, A. & Asama, H. Dissecting neural ODEs. Adv. Neural Inf. Process. Syst. 33, 3952–3963 (2020).
Zhang, T. et al. ANODEV2: A coupled neural ODE framework. Adv. Neural Inf. Process. Syst. 32, 5151–5161 (2019).
Weinan, E. A proposal on machine learning via dynamical systems. Commun. Math. Stat. 5, 1–11 (2017).
Rubanova, Y., Chen, R. T. & Duvenaud, D. K. Latent ordinary differential equations for irregularly-sampled time series. Adv. Neural Inf. Process. Syst. 32, 5320–5330 (2019).
Greydanus, S. J., Dzumba, M. & Yosinski, J. Hamiltonian neural networks. Adv. Neural Inf. Process. Syst. 32 (2019).
Du, J., Futoma, J. & Doshi-Velez, F. Model-based reinforcement learning for semi-Markov decision processes with neural ODEs. Adv. Neural Inf. Process. Syst. 33, 19805–19816 (2020).
Greydanus, S., Lee, S. & Fern, A. Piecewise-constant neural ODEs. arXiv preprint arXiv:2106.06621 (2021).
Chen, R. T., Amos, B. & Nickel, M. Learning neural event functions for ordinary differential equations. International Conference on Learning Representations (2021).
Jia, J. & Benson, A. R. Neural jump stochastic differential equations. Adv. Neural Inf. Process. Syst. 32, 9847–9858 (2019).
Queiruga, A. F., Erichson, N. B., Taylor, D. & Mahoney, M. W. Continuous-in-depth neural networks. arXiv preprint arXiv:2008.02389 (2020).
Ott, K., Katiyar, P., Hennig, P. & Tiemann, M. ResNet after all: Neural ODEs and their numerical solution. International Conference on Learning Representations (2021).
Rico-Martinez, R., Krischer, K., Kevrekidis, I. G., Kube, M. C. & Hudson, J. L. Discrete- vs. continuous-time nonlinear signal processing of Cu electrodissolution data. Chem. Eng. Commun. 118, 25–48 (1992).
de Silva, B. et al. Pysindy: A python package for the sparse identification of nonlinear dynamical systems from data. J. Open Source Softw. 5, 2104 (2020).
Kaptanoglu, A. A. et al. Pysindy: A comprehensive python package for robust sparse system identification. J. Open Source Softw. 7, 3994 (2022).
Nakamura-Zimmerer, T., Gong, Q. & Kang, W. QRnet: Optimal regulator design with LQR-augmented neural networks. IEEE Control Syst. Lett. 5, 1303–1308 (2021).
Darbon, J., Langlois, G. P. & Meng, T. Overcoming the curse of dimensionality for some Hamilton-Jacobi partial differential equations via neural network architectures. Res. Math. Sci. 7, 1–50 (2020).
Bartolucci, F. et al. Are neural operators really neural operators? frame theory meets operator learning. Tech. Rep. Preprint: arXiv:2305.19913 (2023).
Raonic, B. et al. Convolutional neural operators for robust and accurate learning of PDEs. Tech. Rep. Preprint: arXiv:2302.01178 (2023).
Martin, C. H. & Mahoney, M. W. Traditional and heavy-tailed self regularization in neural network models. In Proceedings of the 36th International Conference on Machine Learning, 4284–4293 (2019).
Martin, C. H. & Mahoney, M. W. Heavy-tailed Universality predicts trends in test accuracies for very large pre-trained deep neural networks. In Proceedings of the 20th SIAM International Conference on Data Mining (2020).
Martin, C. H., Peng, T. S. & Mahoney, M. W. Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data. Nat. Commun. 12, 1–13 (2021).
Martin, C. H. & Mahoney, M. W. Post-mortem on a deep learning contest: a Simpson’s paradox and the complementary roles of scale metrics versus shape metrics. arXiv preprint arXiv:2106.00734 (2021).
Hodgkinson, L., Simsekli, U., Khanna, R. & Mahoney, M. W. Generalization bounds using lower tail exponents in stochastic optimizers. International Conference on Machine Learning (2022).
Moin, P. Fundamentals of engineering numerical analysis (Cambridge University Press, 2010).
LeVeque, R. J. Numerical methods for conservation laws, vol. 132 (Springer, 1992).
Dahlquist, G. Convergence and stability in the numerical integration of ordinary differential equations. Mathematica Scandinavica 33–53 (1956).
Arnold, D. N. Stability, consistency, and convergence of numerical discretizations. Encyclopedia of Applied and Computational Mathematics 1358–1364 (2015).
Kirby, R. M. & Silva, C. T. The need for verifiable visualization. IEEE Computer Graph. Appl. 28, 78–83 (2008).
Thompson, D. B. Numerical methods 101-convergence of numerical models. USGS Staff–Published Research 115 (1992).
Bottou, L. & Bousquet, O. The tradeoffs of large scale learning. Adv. Neural Inf. Process. Syst. 20 (2008).
Hirsch, M. W., Smale, S. & Devaney, R. L. Differential equations, dynamical systems, and an introduction to chaos (Academic press, 2012).
Chaitin-Chatelin, F. & Frayssé, V. Lectures on finite precision computations (SIAM, 1996).
Acknowledgements
We would like to thank Annan Yu and Krishna Harsha Reddy Kothapalli for valuable discussions and feedback. Moreover, we would like to thank all the reviewers for their helpful and constructive feedback. A.S.K. was supported by Laboratory Directed Research and Development (LDRD) funding under Contract Number DEAC0205CH11231 at LBNL and the Alvarez Fellowship in the Computational Research Division at LBNL. M.W.M. would like to acknowledge the DOE, NSF, and ONR for providing partial support of this work. N.B.E. would like to acknowledge support from NSF (DMS2319621), DOE (AC0205CH11231), and NERSC (DEAC0205CH11231). Our conclusions do not necessarily reflect the position or the policy of our sponsors, and no official endorsement should be inferred.
Contributions
A.S.K. and A.F.Q. contributed equally. A.S.K., A.F.Q., N.B.E., and M.W.M. all contributed to the manuscript discussion.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Physics thanks Ljupco Todorovski and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Krishnapriyan, A.S., Queiruga, A.F., Erichson, N.B. et al. Learning continuous models for continuous physics. Commun Phys 6, 319 (2023). https://doi.org/10.1038/s42005023014334
DOI: https://doi.org/10.1038/s42005023014334