Abstract
Building reduced-order models (ROMs) is essential for efficient forecasting and control of complex dynamical systems. Recently, autoencoder-based methods for building such models have gained significant traction, but their demand for data limits their use when data is scarce and expensive. We propose aiding a model’s training with knowledge of physics using a collocation-based physics-informed loss term. Our innovation builds on ideas from classical collocation methods of numerical analysis to embed knowledge from a known equation into the latent-space dynamics of a ROM. We show that the addition of our physics-informed loss allows for exceptional data supply strategies that improve the performance of ROMs in data-scarce settings, where training high-quality data-driven models is impossible. Namely, for a problem of modeling a high-dimensional nonlinear PDE, our experiments show 5\(\times\) performance gains, measured by prediction error, in a low-data regime, 10\(\times\) performance gains in tasks of high-noise learning, 100\(\times\) gains in the efficiency of utilizing the latent-space dimension, and 200\(\times\) gains in tasks of far-out out-of-distribution forecasting relative to purely data-driven models. These improvements pave the way for broader adoption of network-based physics-informed ROMs in compressive sensing and control applications.
Introduction
Forecasting the behavior of a large-scale real-world system directly from first principles often requires solving highly nonlinear governing equations such as high-dimensional ordinary differential equations (ODEs) or partial differential equations (PDEs). High-fidelity simulations of such dynamical systems can become intractable, especially if an online control algorithm requires multiple forecasts per second using a low-powered embedded device^{1,2,3}. A situation like this arises, for example, when a smart heating, ventilation, and air conditioning (HVAC) system attempts to optimize the temperature distribution of the air in a room using only partial measurements^{4,5}. At the time of writing this paper, such systems are incapable of real-time complex simulations, but they can already run low-dimensional pretrained models, which invites the development of high-quality reduced-order models (ROMs)^{6}. Therefore, ROMs are essential for enabling design optimization, uncertainty propagation, predictive modeling, and control for such dynamical systems^{1,7,8,9}.
In order to enable control of high-dimensional dynamical systems, a ROM training method needs to identify a low-dimensional manifold along with dynamics on the manifold that together yield high-accuracy predictions and long-term stability^{10,11}. Most traditional ROMs are projection-based, e.g. dynamic mode decomposition (DMD)^{8,12} and proper orthogonal decomposition (POD)^{13}, which transform the trajectories of a high-dimensional dynamical system into a suitable, and in some sense optimal, low-dimensional subspace. This projection leads to truncation of higher-order modes and parametric uncertainties, which result in large prediction errors over time due to the deterioration of the basis functions (spatial modes)^{3}. One challenge for POD methods is their intrusive nature, i.e. requiring access to the solver codes. To overcome this, operator inference approaches^{14,15} utilize SVD-based model reduction and exploit lifting to fit the latent-space dynamics data into polynomial, typically quadratic, models. These models, however, (i) are limited in representation power (up to quadratic, e.g. for the lift-and-learn approach) and (ii) require a custom-tailored SVD-based optimization technique.
In an effort to overcome these challenges, significant work has gone into developing autoencoder-based reduced-order models, a popular nonlinear ROM technique, which can yield both accurate and stable ROMs^{16,17,18,19}. In practice, however, autoencoder-based ROMs require datasets that densely cover a hypothetical infinite-dimensional phase portrait of the dynamical system. Moreover, the large demand for training data significantly limits the use of such models in physics applications where the data can be expensive to obtain.
Another severe challenge of utilizing ROMs comes from their poor out-of-distribution performance^{17,20,21}, especially when it is fundamentally impossible for a practitioner to obtain data that covers the entire distribution of possible inputs. For example, in HVAC applications, one may collect data from a room with two windows but not from rooms with every possible number of windows. In atmospheric LiDAR applications, we may conduct experiments on a certain terrain, but we can never conduct experiments on all sorts of terrains^{22}. In such situations, embedding the knowledge of physics into a model becomes necessary to improve the extrapolation performance, and several approaches have recently been proposed to that end. For instance, the seminal works^{23,24} determined the underlying structure of a nonlinear dynamical system from data using symbolic regression. Recently, Cranmer et al.^{21} employed symbolic regression in conjunction with a graph neural network (GNN), while encouraging sparse latent representations, to extract explicit physical relations. They showed that the symbolic expressions extracted from the GNN generalized to out-of-distribution data better than the GNN itself. However, symbolic regression suffers from excessive computational costs and may be prone to overfitting.
Another example of incorporating physics in ROMs is the use of parametric models in the latent space, e.g. via sparse identification of nonlinear dynamics (SINDy)^{18,25}. For instance, Refs.^{20,26} used a chain-rule-based loss that ties latent-space derivatives to observable-space derivatives for simultaneous training of the autoencoder and the latent dynamics. However, such a loss is highly sensitive to noise in the data, especially when evaluating time derivatives with finite differences is required^{27}. Collocation-based enforcement of the physics, i.e. projection of the candidate functions in the governing equations to enforce the chain rule instead of finite differences, could address such numerical difficulties. Recently, Liu et al.^{28} used an autoencoder architecture and Koopman theory to demonstrate that combining autoencoders with enforcing linear dynamics in the latent space may result in an interpretable ROM. However, linearity may not be expressive enough for complex dynamics with multiple basins of attraction^{29}. Finally, recent works on Neural ODEs (NODEs)^{30,31} show a way to fit an arbitrary nonlinear model (e.g. a network) as a latent-space dynamics model, significantly extending the set of latent-dynamics models that one can train efficiently.
In this paper, we employ autoencoders to perform nonlinear model reduction along with a NODE in the latent space to model complex and nonlinear dynamics. We choose Neural ODEs for the latent-space dynamics representation because of their ability to model highly nonlinear dynamics, which is especially important when applications limit the size of the latent dimension. Our goal is to reduce the demand for training data and improve the overall forecasting stability under challenging training conditions. To that end, we build on ideas from classical collocation methods of numerical analysis to embed knowledge from a known governing equation into the latent-space dynamics of a ROM, as described in “Methods” section. In “Experiments” section, we show that the addition of our physics-informed loss allows for exceptional data supply strategies that improve the performance of ROMs in data-scarce settings, where training high-quality data-driven models is impossible. We demonstrate that such an approach not only reduces the need for large training datasets and produces highly accurate and long-term stable models, but also leads to the discovery of more compact latent spaces, which is especially important for applications in compressed sensing and control.
Methods
Reduced-order model with nonlinear latent dynamics
We consider an autonomous dynamical system on a finite space \(\mathscr {X}\subseteq {\mathbb {R}}^n\) given by

\(\frac{d\varvec{x}(t)}{dt} = \varvec{f}(\varvec{x}(t)), \quad \varvec{x}(0) = \varvec{x}_0. \qquad (1)\)
In real-world applications, it is often expensive to solve Eq. (1) directly because x(t) can be very high-dimensional. However, a variety of works provided both theoretical^{13} and empirical^{11,32} evidence that many physical systems evolve on a manifold \(\mathscr {Z}\subseteq {\mathbb {R}}^m\) of a lower dimension \(m \ll n\). In that space, the dynamics evolve according to a (generally unknown) function \(\varvec{h}(\varvec{z})\):

\(\frac{d\varvec{z}(t)}{dt} = \varvec{h}(\varvec{z}(t)). \qquad (2)\)
We call the space \(\mathscr {X}\) an observable space, and \(\mathscr {Z}\) a latent space. When an invertible mapping \(\psi :\ \mathscr {Z}\rightarrow \mathscr {X}\) between the latent and the observable spaces is known, one can predict the dynamics of the system \(\varvec{x}\) at a future time T by projecting the initial condition \(\varvec{x}(0)\) into the latent space, integrating the dynamics in the latent space, and mapping the resulting trajectory back to the observable space:

\(\hat{\varvec{x}}(T) = \psi \left( \psi ^{-1}(\varvec{x}(0)) + \int _0^T \varvec{h}(\varvec{z}(t))\, dt \right). \qquad (3)\)
When \(m \ll n\) we refer to the triplet \((\psi , \psi ^{-1}, \varvec{h})\) as a Reduced-Order Model (ROM) of \(\varvec{f}\). It is often the case that for a given system \(\varvec{f}\), there exists no ROM \((\psi , \psi ^{-1}, \varvec{h})\) such that the relation (3) holds exactly. In this case, we seek an approximate ROM \((\psi _{\theta ^*}, \phi _{\theta ^*}, h_{\theta ^*})\) that minimizes the difference between the data x(t) and the prediction \({\hat{x}}(t)\) over a chosen class of models \((\psi _\theta , \phi _\theta , h_\theta )\) parameterized by \(\theta\).
Multiple real-world applications necessitate using ROMs instead of integrating the relation (1) directly. For example, integrating (1) may be computationally intractable, especially on platforms with limited computing capability such as embedded and autonomous devices. For instance, in an HVAC system, solving (1) means solving a Navier–Stokes equation on a fine grid in real time, which exceeds the computing capabilities of current-generation appliances. On the other hand, integrating (3) may be cheap when \(m \ll n\). Finally, even when solving (1) is possible in real time (e.g. by utilizing a remote cluster), executing control over the resulting model, which is an end goal for an HVAC system, may still be intractable. Indeed, executing control requires multiple evaluations of (1) for each iteration of control, even for the most efficient algorithms known to date^{33}.
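As an illustration of the project–integrate–decode pipeline in relation (3), the following sketch builds a toy ROM by hand: a one-dimensional linear latent system lifted to \({\mathbb {R}}^4\) through an assumed random linear map. The maps `psi`, `phi`, and `h` below are illustrative stand-ins, not the trained networks of later sections.

```python
import numpy as np

# Toy version of relation (3): encode, integrate latent dynamics, decode.
# Latent dynamics dz/dt = -z, lifted to R^4 through a random linear map A.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 1))

def psi(z):               # decoder: latent -> observable
    return A @ z

def phi(x):               # encoder: observable -> latent (pseudo-inverse of A)
    return np.linalg.pinv(A) @ x

def h(z):                 # latent dynamics
    return -z

def rk4(f, z0, T, dt=1e-3):
    """Integrate dz/dt = f(z) from t=0 to t=T with the classic RK4 scheme."""
    z, t = z0, 0.0
    while t < T - 1e-12:
        k1 = f(z)
        k2 = f(z + 0.5 * dt * k1)
        k3 = f(z + 0.5 * dt * k2)
        k4 = f(z + dt * k3)
        z = z + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return z

x0 = psi(np.array([[2.0]]))          # an initial condition on the manifold
x_hat = psi(rk4(h, phi(x0), T=1.0))  # project, integrate, map back
x_true = x0 * np.exp(-1.0)           # closed-form solution of the toy system
assert np.allclose(x_hat, x_true, atol=1e-8)
```

Because the latent dynamics are linear here, the decoded forecast can be checked against the closed-form solution; in the general nonlinear case only the numerical integration is available.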
Architecture
In this work we model \(\psi\), \(\psi ^{-1}\), and \(\varvec{h}\) with fully connected neural networks \(\psi _\theta\), \(\phi _\theta\), and \(h_\theta\), respectively. Specifically, the pair (\(\psi\), \(\psi ^{-1}\)) is modelled with an autoencoder \((\psi _\theta , \phi _\theta )\), and \(\varvec{h}\) is modelled with a fully connected network \(h_\theta\). Figure 1 visualizes the architecture of the model.
Data-driven loss
Similar to prior works^{17,34,35}, we define a data-driven loss \(\mathscr {L}_{data}\) as a sum of reconstruction and prediction losses. The former ensures that \(\phi _\theta\) and \(\psi _\theta\) are inverse mappings of each other, whereas the latter matches the model’s predictions to the available data, as illustrated in Fig. 1.
Formally, for a given set of trajectories \(\varvec{x}_i\), \(i \in [1 \dots k]\), where each trajectory \(\varvec{x}_i \in {\mathbb {R}}^{n \times p}\) is a set of p snapshots that correspond to the recorded states of the system for p timesteps \(t_j\), \(j \in [1, \dots , p]\), the loss function \(\mathscr {L}^{data}_\theta\) is defined as:

\(\mathscr {L}^{data}_\theta = \frac{1}{\sigma ^2 kp}\sum _{i=1}^{k}\sum _{j=1}^{p}\left[ \left\Vert \psi _\theta (\phi _\theta (\varvec{x}_i(t_j))) - \varvec{x}_i(t_j)\right\Vert ^2 + \left\Vert \psi _\theta \left( \phi _\theta (\varvec{x}_i(t_1)) + \int _{t_1}^{t_j} h_\theta (\varvec{z}(t))\, dt\right) - \varvec{x}_i(t_j)\right\Vert ^2 \right] \qquad (4)\)
where \(\sigma\) is the standard deviation of the observation noise. We note that each trajectory \(\varvec{x}_i\) may be captured over its own timeframe and may use a distinct, possibly non-uniform, step size, in which case the loss function should be modified accordingly [the implementation is affected only in evaluating the integral in (4); this part is handled by the torchdiffeq^{36} library, which supports non-uniform timeframes within a batch]. To simplify the notation, without loss of generality, in the rest of the paper we assume that all trajectories are recorded over the same timeframe with the same uniform step size. To forecast the behavior of the system in the latent space, we apply the technique of Neural Ordinary Differential Equations (Neural ODEs or NODEs)^{30}, which utilizes the adjoint sensitivity method to backpropagate the gradients through the integral in (4). Neural ODEs have demonstrated a better ability to model highly nonlinear dynamics compared to linear models when the dimensionality of the dynamics variable is limited. This is especially useful in applications where the latent-space dimension needs to be small^{16,17,18,19}.
Physics-informed loss
In their recent work, Liu et al.^{28} proposed a method for utilizing knowledge of the governing equations \(d\varvec{x}/dt = \varvec{f(x)}\) as a finite-dimensional approximation of Koopman eigenfunctions for linear latent dynamics. To extend this approach to the nonlinear regime, we note that for a true mapping \(\phi\) the following holds:

\(\frac{d\varvec{z}}{dt} = \frac{d\phi (\varvec{x})}{dt} = \nabla _{\varvec{x}}\phi (\varvec{x})\, \frac{d\varvec{x}}{dt} = \nabla _{\varvec{x}}\phi (\varvec{x})\, \varvec{f}(\varvec{x}). \qquad (6)\)
On the other hand, by the definition of \(\psi\) and \(\varvec{h}\) we have that

\(\frac{d\varvec{z}}{dt} = \varvec{h}(\varvec{z}) = \varvec{h}(\phi (\varvec{x})). \qquad (7)\)
Combining Eqs. (6) and (7) we get that

\(\varvec{h}(\phi (\varvec{x})) = \nabla _{\varvec{x}}\phi (\varvec{x})\, \varvec{f}(\varvec{x}). \qquad (8)\)
Equation (8) links the dynamics \(\varvec{h}(\varvec{z})\) and the encoder \(\phi (\varvec{x})\) with the known equation \(\varvec{f}(\varvec{x})\) and is true for all \(z \in \mathscr {Z}\) and \(x \in \mathscr {X}\). Hence, as shown in Fig. 2, knowledge of \(\varvec{f}\) can be assimilated into the model by evaluating Eq. (8) on a set of N carefully sampled points \(\bar{\varvec{x}}_i \in \mathscr {X}\), \(i \in [1, \dots , N]\):

\(\mathscr {L}^{physics}_\theta = \frac{1}{N}\sum _{i=1}^{N} \left\Vert h_\theta (\phi _\theta (\bar{\varvec{x}}_i)) - \nabla _{\varvec{x}}\phi _\theta (\bar{\varvec{x}}_i)\, \varvec{f}(\bar{\varvec{x}}_i) \right\Vert ^2 \qquad (9)\)
We refer to the points \(\bar{\varvec{x}}_i\) as collocation points.
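For a linear encoder \(\phi (x) = Wx\), the Jacobian in Eq. (8) is simply W, so the collocation residual behind the physics-informed loss can be evaluated exactly. The sketch below uses toy dynamics \(f(x) = -x\) and a hand-picked latent model \(h\) constructed to be consistent with it; all names are illustrative, not the paper's trained networks.

```python
import numpy as np

# Collocation residual of Eq. (8) for a linear toy encoder phi(x) = W x,
# whose Jacobian is exactly W.
rng = np.random.default_rng(1)
n, m, N = 6, 3, 100
W = rng.standard_normal((m, n))

def f(x):                  # known governing equation (toy choice)
    return -x

def phi(x):                # encoder
    return W @ x

def h(z):                  # latent dynamics consistent with f under phi
    return -z

x_bar = rng.standard_normal((n, N))   # N collocation points as columns
# The physics loss averages the squared norm of h(phi(x)) - Jac(phi)(x) f(x):
residual = h(phi(x_bar)) - W @ f(x_bar)
loss_physics = np.mean(np.sum(residual ** 2, axis=0))
assert loss_physics < 1e-20  # h matches f exactly here, so the loss vanishes
```

For a nonlinear network encoder, the Jacobian-vector product \(\nabla _{\varvec{x}}\phi _\theta (\bar{\varvec{x}})\, \varvec{f}(\bar{\varvec{x}})\) would instead be computed with automatic differentiation.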
Collocation points
We define a collocation point as a pair \((\varvec{\bar{x}},\, \varvec{f}(\varvec{\bar{x}}))\). Collocation points are samples from the space \(\mathscr {X}\times Im_{f}(\mathscr {X})\), and they should satisfy three conditions, ordered by importance:

1. Simplicity: \(\varvec{f}(\bar{\varvec{x}}_j)\) should be computationally cheap to evaluate. This is especially important for PDE systems, where \(\varvec{f}\) may involve high-order derivatives.

2. Representativeness: \(\bar{\varvec{x}}_j\) should cover the space of states where one aims to improve the model’s performance or stability. Collocation points that a model might encounter and that are not represented by data snapshots are the best candidates.

3. Feasibility: \(\bar{\varvec{x}}_j \in \mathscr {X}\). In other words, \(\bar{\varvec{x}}_j\) should be an attainable state of the system. Collocation points outside of \(\mathscr {X}\) may degrade the performance of the autoencoder by forcing it to be an invertible function on a domain outside of \(\mathscr {X}\).
Thus, an optimal sampling procedure for collocation points \(\varvec{\bar{x}}_j\) is domain-specific and should be designed for a particular system \(\varvec{f}\) and the available data \(\varvec{x}_i\). We show examples of how these conditions can be implemented for real systems in the following sections.
The above definition of collocation points is not to be confused with the classic notion of collocation points for finding numerical solutions of differential equations^{37,38}. The classic notion refers to a set of points in time \([t_0, t_0 + c_1h, t_0 + c_2h, \dots , t_0 + h]\), \(0< c_1< c_2< \dots < 1\), which are chosen to obtain an optimal local interpolant of a solution of a differential equation for the time period between \(t_0\) and \(t_0 + h\). For example, the s collocation points of Gauss–Legendre Runge–Kutta methods are defined to provide an optimal Gauss–Legendre interpolant of order s; the coefficients \(c_1, \dots , c_s\) come from the respective Butcher tableau. In contrast, we define collocation points as pairs \((\varvec{\bar{x}},\, \varvec{f}(\varvec{\bar{x}}))\), which are examples of the mapping \(x \rightarrow f(x)\). Our definition is built around solving an inverse problem of approximating \(\dot{x} = f(x)\) with \(f_\theta (x)\) and follows a recent work^{28}, which develops upon a definition from Ref.^{39}, with the difference being the sample space: instead of sampling from the spatiotemporal domain, we sample from an appropriate function space.
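For concreteness, the classic nodes of the two-stage Gauss–Legendre collocation (Runge–Kutta) method can be verified to be the roots of the shifted Legendre polynomial \(6c^2 - 6c + 1\) on [0, 1]. This is a standard result, shown here only to contrast with our state-space definition:

```python
import math

# Nodes of the classic two-stage Gauss–Legendre collocation method:
# the roots of the shifted Legendre polynomial 6c^2 - 6c + 1 on [0, 1].
c1 = 0.5 - math.sqrt(3.0) / 6.0
c2 = 0.5 + math.sqrt(3.0) / 6.0

for c in (c1, c2):
    assert abs(6 * c * c - 6 * c + 1) < 1e-12  # both are roots
assert 0 < c1 < c2 < 1  # ordered nodes inside the step, as in the text
```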
Combined loss function
We train the model by optimizing a sum of the physics-informed loss (9) and the data-driven loss (4):
When \(\omega _1 = \omega _2 = 0\) we have \(\mathscr {L}^{data}_\theta = 0\), so we say that the model is (purely) Physics-Informed. Similarly, when \(\omega _3 = \omega _4 = 0\) we have \(\mathscr {L}^{physics}_\theta = 0\), and we say that the model is (purely) Data-Driven. When \(\omega _i \ne 0, \, \forall i\), we say that the model is Hybrid.
The coefficients \(\omega _i\) are hyperparameters which need to be tuned using a validation dataset. However, in all experiments of this paper we set \(\omega _i\) to be either 0 or 1, and we balance \(\mathscr {L}^{physics}_\theta\) and \(\mathscr {L}^{data}_\theta\) through the choice of samples in a batch of training data. Specifically, we set the number of collocation points per batch \(N_{batch}\) to be equal to the number of trajectories per batch \(k_{batch}\) times the number of timesteps T: \(N_{batch} = Tk_{batch}\). In this way, both \(\mathscr {L}^{physics}_\theta\) and \(\mathscr {L}^{data}_\theta\) represent the loss for \(Tk_{batch}\) snapshots of the system, providing on average a similar contribution of information to the overall loss function. More laborious approaches to hyperparameter tuning did not yield a sufficient systematic advantage to justify the labour compared to this simple strategy.
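The balancing rule amounts to a single line of arithmetic; the batch sizes below are illustrative values, not the ones used in our experiments:

```python
# Batch-balancing rule from the text: one collocation point per data snapshot,
# so N_batch = T * k_batch. The batch sizes below are illustrative only.
k_batch = 32            # trajectories per batch (assumed value)
T = 10                  # timesteps per trajectory (assumed value)
N_batch = T * k_batch   # collocation points per batch
assert N_batch == 320   # each loss term then covers 320 snapshots
```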
We use a PyTorch^{40} implementation of the Adam algorithm^{41} for optimization. To evaluate \(\nabla _\theta \mathscr {L}^{physics}_\theta\) and \(\nabla _\theta \mathscr {L}^{data}_\theta\) we use torchdiffeq^{36}, a PyTorch-compatible implementation of the Neural ODE framework.
To the best of our knowledge, this is the first framework that combines nonlinear latent dynamics (Neural ODE), autoencoders, and a physics-informed loss term (9). Thus, we call our framework Physics-Informed Neural ODE, or PINODE.
Experiments
The experiments section is organized as follows. First, to illustrate the ideas behind the framework, we study its performance on a high-dimensional ODE—a lifted Duffing oscillator. We show how nonlinear latent dynamics \(\varvec{h}(\varvec{z})\) overcome the limitations of DMD and of the Koopman networks from Ref.^{28} by handling multiple basins of attraction within one model. We also show that using the physics-informed loss is sufficient for reconstructing the behaviour in basins of attraction that are not represented by the data. Finally, we demonstrate that a purely data-driven model may be highly accurate in the short term and highly unstable in the long term, even when data is abundant, and show that the physics-informed approach improves the long-term stability of such models by multiple orders of magnitude.
Next, we study the framework’s performance on Burgers’ equation. We show that (i) the nonlinear latent dynamics model yields more compact latent-space representations than its linear counterpart for the same accuracy; (ii) the compact latent-space representations allow for more stable long-term predictions; (iii) in the presence of significant noise in the data, the use of collocation points improves stability by providing an extra source of information that is noise-free; and (iv) in certain scenarios, training only on collocation points yields better models than training on data, even when a vast amount of data is available. The last observation shows that the contribution of the physics-informed loss (9) may surpass that of the data-based loss (4), especially when the data is severely limited or noisy.
Lifted Duffing oscillator
A Duffing oscillator is a dynamical system \(d\varvec{z}/dt = \varvec{h}(\varvec{z})\) such that

\(\frac{dz_1}{dt} = z_2, \quad \frac{dz_2}{dt} = z_1 - z_1^3. \qquad (11)\)
A phase portrait of 300 randomly sampled trajectories from this system is visualized in Fig. 3, left frame. Depending on the total energy, each trajectory always stays in one of three regions: the left lobe, the right lobe, or the outer area, visualized in red, green, and blue, respectively. To create a synthetic high-dimensional system that retains this property, we lift the Duffing trajectories into a higher-dimensional space by applying an invertible transformation \(\mathscr {A}(\varvec{z})\):

\(\varvec{x} = \mathscr {A}(\varvec{z}) = A\varvec{z}, \quad A \in {\mathbb {R}}^{128 \times 2}. \qquad (12)\)
Hence, for this system \(z \in \mathscr {Z}= {\mathbb {R}}^2\) and \(\varvec{x} \in \mathscr {X}= \text {span}\{A_{:,1}, A_{:,2}\} \subseteq {\mathbb {R}}^{128}\). We treat \(\mathscr {X}\) as an observable space, in which the dynamical system (11) obeys the following:

\(\frac{d\varvec{x}}{dt} = \varvec{f}(\varvec{x}) = A\,\varvec{h}(A^{+}\varvec{x}), \qquad (13)\)

where \(A^{+}\) denotes the pseudo-inverse of A.
Thus, we have created a high-dimensional dynamical system with multiple basins of attraction for which the dynamics \(\varvec{f}\) are known.
For the experiment, we generate 6144 trajectories \(\varvec{x}_i\), \(t=[0, 1]\), \(\Delta t = 0.1\), all taken from the left-lobe region (in red). We also sample 50,000 collocation points \(\bar{\varvec{x}}_j\) each from the right (green) and the outer (blue) regions by sampling \(\bar{\varvec{z}}_j \in U\left( [-3/2,\, 3/2] \times [-1,\, 1]\right)\) and then applying the transformation (12). For this example, the conditions for collocation points discussed in “Methods” section are trivially satisfied.
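A minimal sketch of this sampling procedure, assuming the undamped double-well Duffing form \(\dot{z}_1 = z_2\), \(\dot{z}_2 = z_1 - z_1^3\) and a linear lifting matrix \(A \in {\mathbb {R}}^{128\times 2}\) (both assumptions for illustration), pairs each lifted sample with its lifted time derivative:

```python
import numpy as np

# Sketch of collocation sampling for the lifted oscillator. The Duffing form
# and the linear lifting x = A z are assumptions made for illustration.
rng = np.random.default_rng(2)
A = rng.standard_normal((128, 2))   # lifting matrix (stand-in for the paper's)
A_pinv = np.linalg.pinv(A)

def h(z):
    """Duffing dynamics in the 2-D latent space."""
    z1, z2 = z
    return np.array([z2, z1 - z1 ** 3])

def sample_collocation(N):
    """Return N pairs (x_bar, f(x_bar)) lifted to the 128-D observable space."""
    pairs = []
    for _ in range(N):
        z = rng.uniform([-1.5, -1.0], [1.5, 1.0])  # z ~ U([-3/2,3/2] x [-1,1])
        x = A @ z                                   # lift the state
        pairs.append((x, A @ h(A_pinv @ x)))        # lift the time derivative
    return pairs

x_bar, fx_bar = sample_collocation(4)[0]
assert x_bar.shape == (128,) and fx_bar.shape == (128,)
# Sanity check: projecting f(x_bar) back recovers h evaluated in latent space.
assert np.allclose(A_pinv @ fx_bar, h(A_pinv @ x_bar))
```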
We train two PINODE models: a Data-Driven model that only uses the trajectories, and a Hybrid model that uses both trajectories and collocation points. The models share the same architecture and training parameters, which are detailed in Supplementary Appendix A.1. After training, we invert the mapping (12) to project the models’ high-dimensional predictions for unseen initial conditions onto the true low-dimensional manifold; these are visualized in Fig. 3.
We make two observations from the results displayed in Fig. 3. First, a purely datadriven model is unable to extrapolate outside its training region using only the data from that region. This observation is consistent with the conclusions from related works^{17} that neural networks interpolate well but struggle with extrapolation tasks. Second, we see that collocation points provided enough extra information for the model to predict nearly perfectly in regions from which no trajectories were provided. This observation suggests that one can use collocation points to “cover the gaps” in data and improve the extrapolation accuracy of the model.
The ability of the Neural ODE to model nonlinear dynamics in the latent space is demonstrated in Fig. 4. The figure shows a comparison between the Hybrid PINODE model, the Hybrid PIKN model^{28}, and DMD, all of which were trained on the same dataset. PIKN differs from PINODE in that it uses linear latent dynamics \(\frac{dz}{dt} = Lz\), where L is a finite-dimensional approximation of the Koopman operator, instead of a general nonlinear dynamics operator \(\frac{dz}{dt} = h_\theta (z)\). For PIKN, we set \(z \in {\mathbb {R}}^{16}\), an eightfold expansion of the dimension of the true manifold. We observe in Fig. 4 that PIKN is unable to correctly extrapolate the dynamics to unseen areas using the collocation points: eventually, all trajectories “collapse” onto the same attractor. DMD shows even worse performance, which could be attributed to its linear model reduction.
In the next experiment, we show that collocation points stabilize long-term predictions of the model even when data from all parts of the space are available. To illustrate, we generate a dataset of 6144 trajectories (2048 trajectories per red, green, and blue region) and 50,000 collocation points uniformly distributed among all three regions. We train three models: Data-Driven, Physics-Informed, and Hybrid versions of PINODE. The relative performance of the three models is evaluated in Fig. 5, where the x-axis represents the test time horizon as multiples of the training trajectory length T. The y-axis shows box plots of the prediction mean squared error (MSE) over 300 unseen trajectories within the specific period. For example, \(x = 2T\) represents the time period [2T, 3T), and the y-axis shows the distribution of the prediction errors within the period [2T, 3T). Figure 5 shows that the performance of the Data-Driven model degrades quickly as the forecasting period increases, despite its excellent performance when forecasting within its training time period. The Physics-Informed model starts with modest performance over the training time horizon but maintains stable performance when forecasting far ahead. The Hybrid model, in its turn, combines near-term accuracy with long-term stability, yielding the best results over each time period.
Burgers’ equation
We now study the performance of our framework on Burgers’ equation with \([-\pi , \pi ]\)-periodic boundary conditions:

\(u_t = -u u_x + \nu u_{xx}, \qquad (14)\)
where \(u_t\), \(u_x\), and \(u_{xx}\) denote the partial derivative in time and the first and second spatial derivatives, respectively. Burgers’ equation is a PDE occurring in applications in acoustics, gas and fluid dynamics, and traffic flow^{42}. When \(\nu\) is significantly smaller than one, the system exhibits strong nonlinear behaviour and is called “advection-dominated”; otherwise, when \(\nu\) is large, the system is called “diffusion-dominated”. In the former case, linear projection methods such as POD become inaccurate, as the true solution space has a slowly decaying Kolmogorov n-width, manifesting itself in slowly decaying singular values^{43}. Therefore, in this section we focus on the advection-dominated Burgers’ equation, for which we set \(\nu = 0.01\).
To generate trajectories, we discretize the spatial domain \([-\pi ,\,\pi ]\) into 128 grid points and solve Eq. (14) for \(t \in [0, 2]\) with \(\Delta t = 0.1\) using a spectral solver^{44}. To generate a diverse set of initial conditions, we sum the first 10 harmonic terms with random coefficients:
To generate collocation points we use the same family of functions as we used for the initial conditions in Eq. (15), and additionally randomize the presence of individual frequencies in the sum:
We choose this family of collocation points to meet the three conditions from “Methods” section. First, this family is representative of the state space \(\mathscr {X}\times Im_f(\mathscr {X})\) in the region of interest (moving wavefronts). Second, (16) is a smooth family of functions that does not contain unattainable states. Finally, and most importantly, the values \(u_x\) and \(u_{xx}\), and consequently \(u_t\), can be computed analytically, which makes it especially cheap to sample large numbers of collocation points.
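A sketch of such a sampler follows; the coefficient distributions below are assumptions, and the point is only that differentiating a finite harmonic sum term by term gives \(u_x\) and \(u_{xx}\), and hence \(u_t = -u u_x + \nu u_{xx}\), in closed form:

```python
import numpy as np

# Sketch of the harmonic collocation family for Burgers' equation. The exact
# coefficient distributions are assumptions; the point is that u_x and u_xx
# follow analytically, so u_t = -u*u_x + nu*u_xx is cheap to evaluate.
rng = np.random.default_rng(3)
nu = 0.01
xs = np.linspace(-np.pi, np.pi, 128, endpoint=False)  # 128 grid points

def sample_collocation():
    """Return one pair (u_bar, f(u_bar)) evaluated on the grid."""
    k = np.arange(1, 11)                     # first 10 harmonics
    present = rng.integers(0, 2, size=10)    # random presence of frequencies
    a = rng.normal(size=10) * present
    b = rng.normal(size=10) * present
    S, C = np.sin(np.outer(xs, k)), np.cos(np.outer(xs, k))
    u = S @ a + C @ b
    u_x = C @ (k * a) - S @ (k * b)          # term-by-term differentiation
    u_xx = -S @ (k ** 2 * a) - C @ (k ** 2 * b)
    return u, -u * u_x + nu * u_xx           # (u_bar, f(u_bar)) from Eq. (14)

u_bar, f_bar = sample_collocation()
assert u_bar.shape == f_bar.shape == (128,)
assert np.all(np.isfinite(f_bar))
```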
Compressibility of the latent space
In “Lifted Duffing oscillator” section, we showed that a nonlinear finite-dimensional latent dynamics model can be necessary for building a compact ROM of the high-dimensional lifted Duffing system. That is not necessarily the case for Burgers’ equation, since the Cole–Hopf transformation linearizes its dynamics. However, latent-space nonlinearity can, in principle, be utilized for finding a more compact latent-space representation, or for increasing the forecast accuracy for a fixed latent-space dimension. In this section, we demonstrate how PINODE can achieve both goals.
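For reference, the Cole–Hopf substitution mentioned above is the standard one:

```latex
u(x,t) \;=\; -2\nu \,\frac{\partial}{\partial x}\ln \varphi(x,t)
       \;=\; -2\nu \,\frac{\varphi_x}{\varphi},
```

which maps Burgers’ equation \(u_t + u u_x = \nu u_{xx}\) to the linear heat equation \(\varphi _t = \nu \varphi _{xx}\); a linear latent model can thus in principle represent Burgers’ dynamics exactly, at the cost of the transformation itself being nonlinear.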
For this experiment, we generate 16,384 trajectories as described in (15). We also generate 100,000 collocation points as described in (16). The purpose of using such a large amount of data is to allow the trained models to achieve the best performance for the specified latent-space dimension. We evaluate the performance of the models on test data with two different timeframes: (1) the same as that of the training data (interpolation), and (2) two times longer than that of the training data (extrapolation). More details on the experimental setup are provided in Supplementary Appendix A.4.
In Fig. 6, we compare the performance of three models: DMD, PIKN Hybrid, and PINODE Hybrid. First, we notice that DMD does not perform well on the test data, despite achieving a low training loss (\(\sim 10^{-3}\)). This observation is consistent with earlier works^{8,45} and illustrates that a combination of a linear encoder and a linear latent dynamics operator may not be sufficient for modelling highly nonlinear phenomena. Second, we notice that PINODE achieves better performance for a given latent-space dimension compared to PIKN. For instance, for \(m = 16\) (Fig. 6, left pane), PINODE achieves a \(\sim 5\) times lower mean squared error than PIKN, which achieves the same performance only when \(m = 512\). More importantly, PINODE maintains a low prediction error over a longer-term horizon (extrapolation in time), which is not the case for PIKN (Fig. 6, center pane). This is a consequence of the latent-dynamics matrix (\(h(z) = Lz\)) of PIKN having eigenvalues with positive real parts, which implies long-term instability (Fig. 6, right pane). Although there has been progress in the literature^{46}, further research is needed to understand (i) how to enforce stability constraints for PIKN, and (ii) why one does not need the same enforcement for PINODE to exhibit stable behaviour.
Training in a low-data regime with collocation points
In the next experiment, we study the relative efficiency of using collocation points versus using data in a low-data regime. It is frequently the case that only a small number of simulations (or measurements) can be obtained for a physical system of interest due to computational, time, or budget constraints. We would like to compensate for the lack of sufficient data by providing collocation points, which are considerably cheaper to generate. In this section, we show that, when chosen appropriately, collocation points can be effectively used for training a model in the low-data regime, and their contribution to a model’s accuracy may even surpass the contribution of the data.
To illustrate the trade-off between data and collocations, we train models using varying combinations of the numbers of trajectories and collocation points in their training datasets. To gauge the extrapolation power of our models, we use trajectories with three types of initial conditions: “harmonic”, “bell-curve”, and “bumps” (see Fig. 7 for illustrations). We generate 1024 trajectories with “bumps” initial conditions for the training data, and use the harmonic family of initial conditions described in (16) for generating the training collocations. We use two test datasets: (1) 100 trajectories with “bumps” ICs to assess within-distribution performance (left frame), and (2) a mix of trajectories with “bumps”, “bell-curve”, and “harmonic” initial conditions, 100 trajectories each, to assess out-of-distribution performance. All test trajectories are two times longer than the training trajectories. More details on the experimental setup are provided in Supplementary Appendix A.5. Figure 8 presents the reconstruction MSE on the test datasets obtained from PINODE models trained on varying combinations of trajectories and collocation points, as a percentage of the MSE achievable by a PINODE model trained on the full 1024 trajectories alone (no collocations). All PINODE models use a latent-space dimension \(m=16\).
Figure 8 demonstrates that adding collocation points consistently improves the model performance in our experiments. Moreover, when a sufficient number of collocation points is added in training, a model with fewer training trajectories was always able to outperform the model trained on all the available trajectories and no collocations. On average, a collocation-aided model was 5 times better at both within-distribution and out-of-distribution reconstruction relative to a purely data-driven version of the model. In addition, we noticed that a model that used only collocation points can perform better than a data-rich model, especially when predicting the dynamics of unseen initial conditions (Fig. 8, right pane, top-right vs bottom-left corner).
We also notice that the Hybrid models yield more stable and accurate predictions, relative to their purely data-driven counterparts, when forecasting far beyond the training time period. In Fig. 9 we visualize the predictions for a test IC for two models: the Data-Driven model from the bottom-left corner of Fig. 8, and a Hybrid model from the bottom-right corner of Fig. 8. The red line separates the time period of training from the time period of forecasting. The Hybrid model's errors stay below \(10^{-2}\) even when forecasting 10 times farther than what it was trained on. In contrast, the Data-Driven model shows low errors within its training time region, but its errors grow quickly when forecasting beyond that.
Finally, we observe that using collocation points can benefit other models, such as DMD and PIKN. To illustrate, we replicate the experiments from Fig. 8 with 256 trajectories with bump ICs for PINODE, PIKN, and DMD. Figure 10 shows the root mean squared error (RMSE) of the test data predictions as a function of the number of collocation points used in training. The figure illustrates the prediction error for increasing prediction horizons, going from left to right, and demonstrates that in all cases PINODE benefits from the available collocation points. The leftmost panel shows that every model improves its one-step-ahead predictions, with DMD quickly achieving near-optimal performance. However, once the forecast horizon is increased to 20 timesteps ahead (the length of the training trajectories) and above, DMD fails to correctly forecast the long-term trajectories and was removed from those figures to improve legibility. The PIKN models improved their one-step-ahead (1st pane) and interpolation (2nd pane) performance by a factor of 4. They also improved the extrapolation performance for 40-step predictions (3rd pane) but failed to extrapolate for 80 steps (4th pane, removed for legibility). We attribute this behavior of PIKN to the possibility that its latent dynamics operator contains positive eigenvalues despite the use of collocation points.
Robustness to noise in the low-data regime
In this section we show that the use of collocation points improves the ROMs' robustness to noise in the data by providing an alternative, noise-free source of information.
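Schematically, the combined loss (Eq. 10) weighs a reconstruction term on measured snapshots against a physics residual at collocation points. The NumPy sketch below uses a linear encoder/decoder and a linear latent operator as stand-ins for the paper's autoencoder and NODE; the weighting \(\beta\) and all shapes are illustrative assumptions.

```python
import numpy as np

def hybrid_loss(E, D, g, u_traj, u_colloc, f_colloc, beta=1.0):
    """Schematic version of the combined loss in Eq. (10).
    E, D     : linear encoder / decoder matrices (autoencoder stand-ins)
    g        : latent dynamics, here a matrix acting on z = E @ u
    u_traj   : measured snapshots from training trajectories (possibly noisy)
    u_colloc : collocation states; f_colloc = f(u_colloc) from the known PDE
    """
    # data term: autoencoder reconstruction error on measured snapshots
    recon = u_traj @ E.T @ D.T
    l_data = np.mean((recon - u_traj) ** 2)
    # physics term: the chain rule demands  E f(u) = g(E u)  at collocation points
    residual = f_colloc @ E.T - (u_colloc @ E.T) @ g.T
    l_physics = np.mean(residual ** 2)
    return l_data + beta * l_physics
```

Because the physics term is evaluated on noise-free collocation pairs, it keeps supplying clean gradient information even when the snapshots in `u_traj` are corrupted.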
For this experiment, we use the Burgers' equation dataset containing 1024 trajectories with “bump” initial conditions, and 65,536 “harmonic” collocation points as defined in Eq. (16). We then add i.i.d. Gaussian noise to the trajectories, with noise levels ranging from \(\sigma = 10^{-4}\) to \(\sigma = 10\). For reference, most of the data values lie between 0 and 1, so a noise level with \(\sigma > 1\) dominates the data. We train four models: PINODE Hybrid, PINODE Data-Driven, PINODE Physics-Informed, and DMD. To measure the models' out-of-distribution prediction errors, we use the test dataset with “bump”, “bell-curve”, and “harmonic” initial conditions, as described in the previous subsection. The prediction errors are displayed in the left pane of Fig. 11. The prediction error of the purely Physics-Informed model (in red) is flat because the collocation points are noise-free.
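The corruption step itself can be reproduced in a few lines; in this sketch the array shape and seed are illustrative, while the \(\sigma\) sweep matches the range reported above.

```python
import numpy as np

def add_measurement_noise(trajectories, sigma, seed=0):
    """Corrupt clean trajectories with i.i.d. zero-mean Gaussian noise.
    With data values roughly in [0, 1], sigma > 1 makes the noise dominate."""
    rng = np.random.default_rng(seed)
    return trajectories + rng.normal(0.0, sigma, size=trajectories.shape)

# sweep the noise level from 1e-4 to 10, as in the experiment
sigmas = np.logspace(-4, 1, 6)
```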
Figure 11 shows that in the high-noise setting, the error of the purely data-driven models (DMD and PINODE Data-Driven) grows unbounded, whereas the performance of the Hybrid model converges to that of the Physics-Informed model as the noise level increases. We hypothesise that this behavior arises because the second part (\(\mathscr {L}_{\theta }^{data}\)) of the combined loss (Eq. 10) turns into noise, and so its gradient turns into noise as well.
Thus, one can think of optimizing a hybrid model (10) as training a Physics-Informed model (9) using noisy gradient descent with fixed-variance noise. From the optimization literature^{47,48,49} we know that, under certain conditions, such SGD converges to a neighbourhood of a local minimum of its loss (in this case \(\mathscr {L}_{\theta }^{physics}\)) with high probability. So instead of diverging, a hybrid model turns into a Physics-Informed model; the latter works as a performance safeguard in the high-noise regime. On the right-hand side of Fig. 11, we show an example of the prediction performance of each of the models described above. The data-driven and hybrid models yield visually similar solutions when \(\sigma = 10^{-3}\). However, the former performs inadequately when the data is dominated by noise, whereas the hybrid model in this regime produces a solution that is visually similar to the one produced by the Physics-Informed model. A more rigorous analysis of this phenomenon seems possible but lies outside the scope of this paper.
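This safeguard intuition can be written compactly. Splitting the gradient of the hybrid loss (Eq. 10) into its two terms gives (a heuristic decomposition, not a formal result):

```latex
\nabla_{\theta}\,\mathscr{L}_{\theta}
  \;=\; \nabla_{\theta}\,\mathscr{L}_{\theta}^{physics}
  \;+\; \underbrace{\nabla_{\theta}\,\mathscr{L}_{\theta}^{data}}_{\varepsilon}
```

In the high-noise regime, \(\varepsilon\) behaves as a stochastic perturbation whose variance is set by the (fixed) noise level \(\sigma\), so descending on \(\mathscr{L}_{\theta}\) resembles running SGD on \(\mathscr{L}_{\theta}^{physics}\) alone.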
Discussion and conclusions
In this work, we demonstrated how a collocation-point-based technique can improve the performance of an emerging class of continuous-time, physics-informed, neural-network-based reduced-order models. First, we demonstrated that the incorporation of collocation points in training data can “cover the gaps” in training trajectories and inform the model about underrepresented basins of attraction. Such an approach alleviates the demand for large volumes of data that is common in network-based models, which is crucial in applications where data is scarce and expensive. Second, the physics-informed loss may work as a safeguard, providing a noise-free source of information about the underlying dynamics. Third, collocation points can stabilize the model's long-term predictions, allowing for accurate forecasting far beyond the training time horizon. Finally, together with using NODE-based nonlinear latent dynamics, adding the physics-informed loss leads to the discovery of more compact latent space representations that also yield more accurate models. Simultaneous stability and compactness is especially important if one aims to use such models together with compressive sensing and control algorithms. With respect to computational complexity, we note that adding \(Tk\) collocation points to the training imposes less of a computational burden than adding \(k\) data trajectories of length \(T\), because collocation points do not require integrating forward in time as data trajectories do.
One clear limitation of the current work is that the choice of an efficient collocation family is a design decision that a practitioner makes. The authors believe that such decisions can be automated by adopting existing approaches from classic works on numerical approximations of PDEs, which we leave for future research. Another direction for future research is deriving efficient ways of sampling collocation points, possibly by applying modern adaptive learning techniques^{50}. Finally, although the “Robustness to noise in the low-data regime” section provides some rationale for why one may expect robustness of Hybrid models under noise, the authors believe that a more rigorous analysis is possible, particularly one that provides conditions under which such robustness is guaranteed.
Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
References
Rowley, C. W. & Dawson, S. T. Model reduction for flow analysis and control. Annu. Rev. Fluid Mech. 49, 387–417 (2017).
Lucia, D. J., Beran, P. S. & Silva, W. A. Reduced-order modeling: New approaches for computational physics. Prog. Aerosp. Sci. 40, 51–117 (2004).
Benner, P., Gugercin, S. & Willcox, K. A survey of projection-based model reduction methods for parametric dynamical systems. SIAM Rev. 57, 483–531 (2015).
Farahmand, A.-M., Nabi, S., Grover, P. & Nikovski, D. N. Learning to control partial differential equations: Regularized fitted Q-iteration approach. In 2016 IEEE 55th Conference on Decision and Control (CDC) 4578–4585 (IEEE, 2016).
Nabi, S., Grover, P. & Caulfield, C. Robust preconditioned one-shot methods and direct-adjoint-looping for optimizing Reynolds-averaged turbulent flows. Comput. Fluids 238, 105390 (2022).
Otterness, N. et al. An evaluation of the NVIDIA TX1 for supporting real-time computer-vision workloads. In 2017 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS) 353–364 (IEEE, 2017).
Brunton, S. L. & Kutz, J. N. Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control (Cambridge University Press, 2022).
Kutz, J. N., Brunton, S. L., Brunton, B. W. & Proctor, J. L. Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems (SIAM, 2016).
Jones, D., Snider, C., Nassehi, A., Yon, J. & Hicks, B. Characterising the digital twin: A systematic literature review. CIRP J. Manuf. Sci. Technol. 29, 36–52 (2020).
Ahmed, S. E. et al. On closures for reduced order models—A spectrum of first-principle to machine-learned avenues. Phys. Fluids 33, 091301 (2021).
Noack, B. R., Morzynski, M. & Tadmor, G. Reduced-Order Modelling for Flow Control Vol. 528 (Springer, 2011).
Tu, J. H., Rowley, C. W., Luchtenburg, D. M., Brunton, S. L. & Kutz, J. N. On dynamic mode decomposition: Theory and applications. Preprint at http://arxiv.org/abs/1312.0041 (2013).
Holmes, P., Lumley, J. L., Berkooz, G. & Rowley, C. W. Turbulence, Coherent Structures, Dynamical Systems and Symmetry (Cambridge University Press, 2012).
Qian, E., Kramer, B., Peherstorfer, B. & Willcox, K. Lift & learn: Physics-informed machine learning for large-scale nonlinear dynamical systems. Physica D 406, 132401 (2020).
Peherstorfer, B. & Willcox, K. Data-driven operator inference for non-intrusive projection-based model reduction. Comput. Methods Appl. Mech. Eng. 306, 196–215 (2016).
Lee, K. & Carlberg, K. T. Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders. J. Comput. Phys. 404, 108973 (2020).
Gin, C., Lusch, B., Brunton, S. L. & Kutz, J. N. Deep learning models for global coordinate transformations that linearise PDEs. Eur. J. Appl. Math. 32, 515–539 (2021).
Champion, K., Lusch, B., Kutz, J. N. & Brunton, S. L. Data-driven discovery of coordinates and governing equations. Proc. Natl. Acad. Sci. 116, 22445–22451 (2019).
Kim, B. et al. Deep fluids: A generative network for parameterized fluid simulations. In Computer Graphics Forum Vol. 38 (eds Hauser, H. & Alliez, P.) 59–70 (Wiley, 2019).
Fries, W. D., He, X. & Choi, Y. LaSDI: Parametric latent space dynamics identification. Comput. Methods Appl. Mech. Eng. 399, 115436 (2022).
Cranmer, M. et al. Discovering symbolic models from deep learning with inductive biases. Adv. Neural. Inf. Process. Syst. 33, 17429–17442 (2020).
Nabi, S. et al. Improving lidar performance on complex terrain using CFD-based correction and direct-adjoint-loop optimization. J. Phys. Conf. Ser. 1452, 012082 (2020).
Bongard, J. & Lipson, H. Automated reverse engineering of nonlinear dynamical systems. Proc. Natl. Acad. Sci. 104, 9943–9948 (2007).
Schmidt, M. & Lipson, H. Distilling free-form natural laws from experimental data. Science 324, 81–85 (2009).
Brunton, S. L., Proctor, J. L. & Kutz, J. N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl. Acad. Sci. 113, 3932–3937 (2016).
He, X., Choi, Y., Fries, W. D., Belof, J. & Chen, J.-S. gLaSDI: Parametric physics-informed greedy latent space dynamics identification. Preprint at http://arxiv.org/abs/2204.12005 (2022).
Delahunt, C. B. & Kutz, J. N. A toolkit for data-driven discovery of governing equations in high-noise regimes. IEEE Access 10, 31210–31234 (2022).
Liu, Y., Sholokhov, A., Mansour, H. & Nabi, S. Physics-informed Koopman network. Preprint at http://arxiv.org/abs/2211.09419 (2022).
Page, J. & Kerswell, R. R. Koopman mode expansions between simple invariant solutions. J. Fluid Mech. 879, 1–27 (2019).
Chen, T. Q., Rubanova, Y., Bettencourt, J. & Duvenaud, D. K. Neural ordinary differential equations. In Advances in Neural Information Processing Systems 6571–6583 (2018).
Rackauckas, C. et al. Universal differential equations for scientific machine learning. Preprint at http://arxiv.org/abs/2001.04385 (2020).
Chen, B. et al. Discovering state variables hidden in experimental data. Preprint at http://arxiv.org/abs/2112.10755 (2021).
Duriez, T., Brunton, S. L. & Noack, B. R. Machine Learning Control: Taming Nonlinear Dynamics and Turbulence Vol. 116 (Springer, 2017).
Takeishi, N., Kawahara, Y. & Yairi, T. Learning Koopman invariant subspaces for dynamic mode decomposition. Adv. Neural Inf. Process. Syst. 30, 1 (2017).
Morton, J., Witherden, F. D. & Kochenderfer, M. J. Deep variational Koopman models: Inferring Koopman observations for uncertainty-aware dynamics modeling and control. Preprint at http://arxiv.org/abs/1902.09742 (2019).
Chen, R. T., Rubanova, Y., Bettencourt, J. & Duvenaud, D. K. Neural ordinary differential equations. Adv. Neural Inf. Process. Syst. 31, 1 (2018).
Fornberg, B. A Practical Guide to Pseudospectral Methods 1 (Cambridge University Press, 1998).
Trefethen, L. N. & Bau, D. Numerical Linear Algebra Vol. 181 (SIAM, 2022).
Raissi, M. & Karniadakis, G. E. Hidden physics models: Machine learning of nonlinear partial differential equations. J. Comput. Phys. 357, 125–141 (2018).
Paszke, A. et al. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, 8024–8035 (Curran Associates, Inc., 2019).
Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. Preprint at http://arxiv.org/abs/1412.6980 (2014).
Burgers, J. M. A mathematical model illustrating the theory of turbulence. Adv. Appl. Mech. 1, 171–199 (1948).
Peherstorfer, B. Breaking the Kolmogorov barrier with nonlinear model reduction. Not. Am. Math. Soc. 69, 725 (2022).
Trefethen, L. N. Spectral Methods in MATLAB (SIAM, 2000).
Kalur, A., Nabi, S. & Benosman, M. Robust adaptive dynamic mode decomposition for reduced-order modelling of partial differential equations. In 2021 American Control Conference (ACC) 4497–4502 (IEEE, 2021).
Kojima, R. & Okamoto, Y. Learning deep input-output stable dynamics. In Advances in Neural Information Processing Systems (2022).
Friedlander, M. P. & Schmidt, M. Hybrid deterministic-stochastic methods for data fitting. SIAM J. Sci. Comput. 34, A1380–A1405 (2012).
Patel, V., Tian, B. & Zhang, S. Global convergence and stability of stochastic gradient descent. Preprint at http://arxiv.org/abs/2110.01663 (2021).
Shapiro, A., Dentcheva, D. & Ruszczynski, A. Lectures on Stochastic Programming: Modeling and Theory (SIAM, 2021).
Subramanian, S., Kirby, R. M., Mahoney, M. W. & Gholami, A. Adaptive self-supervision algorithms for physics-informed neural networks. Preprint at http://arxiv.org/abs/2207.04084 (2022).
Acknowledgements
The authors greatly benefited from the discussions with Dr. J. Nathan Kutz and Dr. Steven Brunton, who provided their expertise and assistance.
Author information
Contributions
All authors reviewed the manuscript and contributed to the conceptualization. A.S. drafted the original manuscript and conducted the experiments. A.S. and S.N. conceived the experiments and analyzed the results.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Sholokhov, A., Liu, Y., Mansour, H. et al. Physicsinformed neural ODE (PINODE): embedding physics into models using collocation points. Sci Rep 13, 10166 (2023). https://doi.org/10.1038/s41598023367996