Closed-form Continuous-time Neural Networks


One Sentence Summary: We find an approximate closed-form solution for the interaction of neurons and synapses and build a strong artificial neural network model out of it.

Main Text:
Continuous neural network architectures built by ordinary differential equations (ODEs) (2) opened a new paradigm for obtaining expressive and performant neural models.
These models transform the depth dimension of static neural networks and the time dimension of recurrent neural networks into a continuous vector field, enabling parameter sharing, adaptive computations, and function approximation for non-uniformly sampled data. These continuous-depth (time) models have shown promise in density estimation applications (3-6), as well as in modeling sequential and irregularly sampled data (1, 7-9).
While ODE-based neural networks with careful memory and gradient propagation design (9) perform competitively with advanced discretized recurrent models on relatively small benchmarks, their training and inference are slow due to the use of advanced numerical DE solvers (10). This becomes even more troublesome as the complexity of the data, task, and state space increases (i.e., requiring more precision) (11), for instance, in open-world problems such as medical data processing, self-driving cars, financial time series, and physics simulations.
The research community has developed solutions for resolving this computational overhead and for facilitating the training of neural ODEs, for instance, by relaxing the stiffness of a flow via state augmentation techniques (4, 12), reformulating the forward pass as a root-finding problem (13), using regularization schemes (14-16), or improving the inference time of the network (17).
In this paper, we take a step back and propose a fundamental solution: we derive a closed-form continuous-depth model that has the rich modeling capabilities of ODE-based models and does not require any solver to model data (see Figure 1).
The proposed continuous neural networks yield significantly faster training and inference speeds while being as expressive as their ODE-based counterparts. We provide a derivation of the approximate closed-form solution for a class of continuous neural networks that explicitly models time. We demonstrate how this transformation can be formulated into a novel neural model and scaled to create flexible, highly performant, and fast neural architectures on challenging sequential datasets.
Deriving an Approximate Closed-form Solution for Neural Interactions. Two neurons interact with each other through synapses, as shown in Figure 1. There are three principal mechanisms for information propagation in natural brains that are abstracted away in the current building blocks of deep learning systems: 1) neural dynamics are typically continuous processes described by differential equations (cf. the dynamics of x(t) in Figure 1); 2) synaptic release is much more than scalar weights; it involves a nonlinear transmission of neurotransmitters, the probability of activation of receptors, and the concentration of available neurotransmitters, among other nonlinearities (cf. S(t) in Figure 1); and 3) the propagation of information between neurons is induced by feedback and memory apparatuses (cf. I(t) stimulates x(t) through a nonlinear synapse S(t), which also has a multiplicative difference of potential to the postsynaptic neuron, accounting for a negative feedback mechanism). Liquid time-constant (LTC) networks (1), which are expressive continuous-depth models obtained by a bilinear approximation (18) of the neural ODE formulation (2), are designed based on these mechanisms. Correspondingly, we take their ODE semantics and approximate a closed-form solution for the scalar case of a postsynaptic neuron receiving an input stimulus from a presynaptic source through a nonlinear synapse.
To this end, we apply the theory of linear ODEs (19) to analytically solve the dynamics of the LTC differential equation shown in Figure 1. We then simplify the solution to the point where a single integral is left to solve. This integral compartment, ∫_0^t f(I(s)) ds, in which f is a positive, continuous, monotonically increasing, and bounded nonlinearity, is challenging to solve in closed form, since it depends on an input signal I(s) that is arbitrarily defined (such as real-world sensory readouts). To approach this problem, we discretize I(s) into piecewise constant segments and obtain a discrete approximation of the integral as a sum of piecewise constant compartments over intervals. This piecewise constant approximation inspired us to introduce an approximate closed-form solution for the integral ∫_0^t f(I(s)) ds that is provably tight when the integral appears as the exponent of an exponential decay, which is the case for LTCs. We theoretically justify how this closed-form solution represents LTCs' ODE semantics and is as expressive (see Figure 1).

Explicit Time Dependency. We then dissect the properties of the obtained closed-form solution and design a new class of neural network models we call Closed-form Continuous-depth networks (CfCs). CfCs have an explicit time dependency in their formulation and do not require an ODE solver to obtain their temporal rollouts. Thus, they sidestep the trade-off between accuracy and efficiency imposed by numerical solvers (see Table 1).
CfCs train and run inference at least one order of magnitude faster than their ODE-based counterparts, without loss of accuracy.

Deriving a Closed-form Solution
In this section, we derive an approximate closed-form solution for liquid time-constant (LTC) networks, an expressive subclass of time-continuous models. We discuss how the scalar closed-form expression derived from a small LTC system can inspire the design of CfC models.
The hidden state of an LTC network is determined by the solution of the initial-value problem (IVP) given below (1):

    dx(t)/dt = -[w_τ + f(x(t), I(t); θ)] ⊙ x(t) + A ⊙ f(x(t), I(t); θ),    (1)

where x(t) defines the hidden states, I(t) is the input to the system, w_τ is a time-constant parameter vector, A is a bias vector, and f is a neural network parametrized by θ.
Theorem 1. Given an LTC system determined by the IVP (1), constructed by one cell, receiving a one-dimensional time-series input I with no self-connections, the following expression is an approximation of its closed-form solution:

    x(t) ≈ (x_0 - A) e^{-[w_τ + f(I(t))] t} f(-I(t)) + A.    (2)

Proof. In the one-dimensional case, the IVP (1) becomes linear in x. Therefore, we can use the theory of linear ODEs to obtain an integral closed-form solution (19, Section 1.10) consisting of two nested integrals. The inner integral can be eliminated by means of integration by substitution (21). With this, the remaining integral expression can be solved in the case of piecewise constant inputs and approximated in the case of general inputs. The three steps of the proof are outlined below.
Integral closed-form solution of LTC. We consider the ODE semantics of a single neuron that receives an arbitrary continuous input signal I and has a positive, bounded, continuous, and monotonically increasing nonlinearity f:

    dx/dt = -[w_τ + f(I(t))] x(t) + [w_τ + f(I(t))] A.    (3)

Assumption. We assumed a second occurrence of the constant w_τ in the above representation of a single LTC neuron. This is done to introduce symmetry in the structure of the ODE, making it possible to apply the theory of linear ODEs to solve the equation analytically.
By applying linear ODE systems theory (19, Section 1.10), we obtain

    x(t) = e^{-∫_0^t [w_τ + f(I(s))] ds} x(0) + ∫_0^t e^{-∫_s^t [w_τ + f(I(r))] dr} [w_τ + f(I(s))] A ds.    (4)

To resolve the double integral in the equation above, we define u(s) = -∫_0^s [w_τ + f(I(r))] dr and observe that (d/ds) u(s) = -(w_τ + f(I(s))). Hence, integration by substitution allows us to rewrite (4) as

    x(t) = (x(0) - A) e^{-[w_τ t + ∫_0^t f(I(s)) ds]} + A.    (5)

Analytical LTC solution for piecewise constant inputs. The derivation of a useful closed-form expression for x requires us to solve the integral expression ∫_0^t f(I(s)) ds for any t ≥ 0. Fortunately, this integral enjoys a simple closed-form expression for piecewise constant inputs I. Specifically, assume that we are given a sequence of time points 0 = τ_0 < τ_1 < · · · < τ_n and that I(s) = γ_k for τ_k ≤ s < τ_{k+1}. Then

    ∫_0^t f(I(s)) ds = ∑_{i=0}^{k-1} (τ_{i+1} - τ_i) f(γ_i) + (t - τ_k) f(γ_k),    (6)

when τ_k ≤ t < τ_{k+1} for some 0 ≤ k ≤ n - 1 (as usual, one defines ∑_{i=0}^{-1} := 0). With this, we have

    x(t) = (x(0) - A) e^{-[w_τ t + ∑_{i=0}^{k-1} (τ_{i+1} - τ_i) f(γ_i) + (t - τ_k) f(γ_k)]} + A,    (7)

when τ_k ≤ t < τ_{k+1} for some 0 ≤ k ≤ n - 1. While any continuous input can be approximated arbitrarily well by a piecewise constant input (21), a tight approximation may require a large number of discretization points τ_1, . . ., τ_n. We address this next.
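As a numerical sanity check of the piecewise constant expression above, the integral can be evaluated directly as a finite sum; the helper below is an illustrative sketch (the function names and the sigmoid choice for f are ours, not the paper's):

```python
import math

def f(x):
    # A positive, bounded, continuous, monotonically increasing
    # nonlinearity, as required by the derivation (here: sigmoid).
    return 1.0 / (1.0 + math.exp(-x))

def integral_piecewise(taus, gammas, t):
    """Closed-form value of ∫_0^t f(I(s)) ds for a piecewise
    constant input I(s) = gamma_k on [tau_k, tau_{k+1})."""
    total = 0.0
    for k in range(len(gammas)):
        lo = taus[k]
        hi = min(taus[k + 1], t) if k + 1 < len(taus) else t
        if t <= lo:
            break
        total += (hi - lo) * f(gammas[k])
    return total

# Example: I(s) = 2 on [0, 1), I(s) = -1 on [1, 3)
taus, gammas = [0.0, 1.0, 3.0], [2.0, -1.0]
value = integral_piecewise(taus, gammas, 2.5)
# Exact value for this step function: 1.0 * f(2) + 1.5 * f(-1)
```

With a finer partition, the same sum approximates the integral for any continuous input, which is exactly the discretization argument used in the proof.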
Analytical LTC approximation for general inputs. Inspired by Eq. 6, the next result provides an analytical approximation of x(t).
Lemma 1. For any Lipschitz continuous, positive, monotonically increasing, and bounded f and any continuous input signal I(t), we can approximate x(t) in (5) as

    x(t) ≈ (x(0) - A) e^{-[w_τ + f(I(t))] t} f(-I(t)) + A.    (8)

Additionally, we obtain the following sharpness results: 1. For any t ≥ 0, we have a sharp upper bound on the approximation error; 2. For any t ≥ 0, we have a corresponding lower bound (the exact bound expressions are given in Methods). Above, the supremum and infimum are taken across all continuous input signals.
These statements settle the question about the worst-case errors of the approximation. The first statement implies, in particular, that our bound is sharp.
The full proof is given in Methods. Lemma 1 demonstrates that the integral solution shown in Equation 5 is tightly close to the approximate closed-form solution we proposed in Equation 8. Note that, as w_τ is positive, the derived bound between Equations 5 and 8 ensures an exponentially decaying error as time goes by.
Therefore, we have the statement of the theorem.
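To make the theorem's statement concrete, one can integrate the scalar LTC ODE numerically (forward Euler) and compare it against the approximate closed form; all parameter values, the sigmoid choice of f, and the sinusoidal input below are our own illustrative assumptions:

```python
import math

def f(x):
    return 1.0 / (1.0 + math.exp(-x))  # positive, bounded, monotone

w_tau, A, x0 = 0.5, 1.0, 0.0
I = lambda t: math.sin(t)              # arbitrary continuous input

# Euler integration of  dx/dt = -(w_tau + f(I(t))) * (x - A)
dt, T = 0.001, 5.0
steps = int(T / dt)
x, ode_traj = x0, []
for i in range(steps):
    t = i * dt
    x += dt * (-(w_tau + f(I(t))) * (x - A))
    ode_traj.append(x)

# Approximate closed form:
# x(t) ≈ (x0 - A) e^{-(w_tau + f(I(t))) t} f(-I(t)) + A
cf_traj = [
    (x0 - A) * math.exp(-(w_tau + f(I(i * dt))) * i * dt) * f(-I(i * dt)) + A
    for i in range(1, steps + 1)
]

mse = sum((a - b) ** 2 for a, b in zip(ode_traj, cf_traj)) / steps
```

Both trajectories relax toward the bias A, and the discrepancy decays exponentially in t, consistent with the error bound of Lemma 1.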

An Instantiation of LTCs and their approximate closed-form expressions.
Figure 2 shows a liquid network with two neurons and five synaptic connections. The network receives an input signal I(t). Figure 2 further derives the differential equation expression of the network along with its closed-form approximate solution.
In general, it is possible to compile a trained LTC network into its closed-form version. This compilation speeds up the inference time of ODE-based networks, as the closed-form variant does not require complex ODE solvers to compute outputs.
Algorithm 1 provides the instructions for transferring a trained LTC network into its closed-form variant.
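A rough sketch of what such a compilation could look like (the shapes, parameter names, per-synapse sigmoid, and single-presynaptic-input simplification are our assumptions, not the paper's exact Algorithm 1):

```python
import numpy as np

def f(x, mu, sigma):
    # Per-synapse sigmoid nonlinearity with learned offset/scale
    return 1.0 / (1.0 + np.exp(-sigma * (x - mu)))

def compile_ltc_closed_form(w_tau, A, mu, sigma, x0):
    """Return a solver-free evaluator x(t) for each neuron, built from
    trained per-synapse parameters (one presynaptic input per neuron
    here, for simplicity)."""
    def x_of_t(I_t, t):
        act = f(I_t, mu, sigma)              # synaptic activation
        decay = np.exp(-(w_tau + act) * t)   # explicit time dependency
        return (x0 - A) * decay * f(-I_t, mu, sigma) + A
    return x_of_t

H = 4                                        # number of neurons (toy size)
rng = np.random.default_rng(0)
xt = compile_ltc_closed_form(w_tau=np.full(H, 0.5), A=np.ones(H),
                             mu=rng.normal(size=H), sigma=np.ones(H),
                             x0=np.zeros(H))
out = xt(I_t=np.full(H, 0.3), t=10.0)        # approaches A for large t
```

No numerical solver is invoked anywhere: each hidden state is a direct function of the input and the elapsed time t.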

Tightness of the Closed-form Solution in Practice
Figure 3 shows an LTC-based network trained for autonomous driving (22). The figure further illustrates how closely the proposed solution fits the actual dynamics exhibited by a single-neuron ODE given the same parametrization.

[Algorithm 1 fragment: inputs are the LTC network synaptic parameters {σ^(N×H), µ^(N×H), A^(N×H)}; the output is the closed-form approximation x^(N×T)(t) of the hidden-state neurons, computed by looping over all presynaptic signals for each of the H neurons.]

We took a trained Neural Circuit Policy (NCP) (22), which consists of a perception module and a liquid time-constant (LTC) based network (1) that possesses 19 neurons and 253 synapses. The network was trained to autonomously steer a self-driving vehicle. We used recorded real-world test runs of the vehicle for a lane-keeping task governed by this network. The records included the inputs and outputs as well as all LTC neurons' activities and parameters. To sanity-check whether our proposed closed-form solution for LTC neurons holds up in practice as well as in theory, we plugged the parameters of individual neurons and synapses of the differential equations into the closed-form solution (similar to the representations shown in Figures 2b and 2c) and emulated the structure of the ODE-based LTC networks. We then visualized the dynamics of the output neuron of the ODE (in blue) and of the closed-form solution (in red). As illustrated in Figure 3, we observed that the behavior of the ODE is captured with a mean squared error of 0.006 by the closed-form solution. This experiment is empirical evidence for the tightness results presented in our theory.
Hence, the closed-form solution retains the main properties of liquid networks in approximating dynamics. We next show how to design a novel neural network instance inspired by this closed-form solution that has well-behaved gradient properties and approximation capabilities.

Designing a Closed-form Continuous-depth Model Inspired by the Solution
Leveraging the scalar closed-form solution expressed by Eq. 2, we can now construct a neural network model that can be integrated into larger representation learning systems. Doing so requires careful attention to potential gradient and expressivity issues that can arise during optimization, which we outline in this section. Formally, the hidden states x(t)^(D×1), with D hidden units at each time step t, can be explicitly obtained by

    x(t) = B ⊙ e^{-[w_τ + f(x, I; θ)] t} ⊙ f(-x, -I; θ) + A,    (9)

where B^(D) collapses (x_0 - A) of Eq. 2 into a parameter vector, A^(D) and w_τ^(D) are system parameter vectors as well, I(t)^(m×1) is an m-dimensional input at each time step t, f is a neural network parametrized by θ = {W_x}, and ⊙ is the Hadamard (element-wise) product. While the neural network presented in Eq. 9 can be proven to be a universal approximator, as it is an approximation of an ODE system (1, 2), in its current form it has trainability issues, which we point out and resolve shortly.

Resolving the gradient issues. The exponential term in Eq. 9 drives the system's first part (exponentially fast) to 0 and the entire hidden state to A. This issue becomes more apparent when there are recurrent connections, and it causes vanishing gradient factors when trained by gradient descent (23). To reduce this effect, we replace the exponential decay term with a reversed sigmoidal nonlinearity σ(·). This nonlinearity is approximately 1 at t = 0 and approaches 0 in the limit t → ∞. However, unlike the exponential decay, its transition happens much more smoothly, yielding a better-conditioned loss surface.
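A minimal vectorized sketch of the closed-form network of Eq. 9 (the single dense layer for f, the shapes, and the initializations are our illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
D, m = 8, 3                              # hidden units, input features

# Trainable parameters of the closed-form network (Eq. 9)
Wx = rng.normal(scale=0.1, size=(D, D + m))
B = rng.normal(size=D)
A = rng.normal(size=D)
w_tau = np.abs(rng.normal(size=D))       # time constants, kept positive

def f(x, I):
    # Positive, bounded head network: sigmoid of a dense layer
    return 1.0 / (1.0 + np.exp(-Wx @ np.concatenate([x, I])))

def closed_form_state(x, I, t):
    # x(t) = B ⊙ exp(-(w_tau + f(x, I)) t) ⊙ f(-x, -I) + A
    decay = np.exp(-(w_tau + f(x, I)) * t)
    return B * decay * f(-x, -I) + A

x = np.zeros(D)
near_zero = closed_form_state(x, np.ones(m), t=0.0)   # decay term = 1
late = closed_form_state(x, np.ones(m), t=50.0)       # collapses to A
```

Evaluating the state at a large t makes the trainability issue discussed above tangible: the exponential term vanishes and the hidden state collapses to the bias A, which motivates the sigmoidal gating replacement.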
Replacing biases by learnable instances. Next, we consider the bias parameter B to be part of the trainable parameters of the neural network f(-x, -I; θ) and choose to use a new network instance instead of f (presented in the exponential decay factor). We also replace A with another neural network instance, h(·), to enhance the flexibility of the model. To obtain a more general network architecture, we allow the nonlinearity f(-x, -I; θ) present in Eq. 9 to have both shared (backbone) and independent (g(·)) network compartments.
Gating balance. The time-decaying sigmoidal term can play a gating role if we additionally multiply h(·) by (1 - σ(·)). This way, the time-decaying sigmoid function serves as a gating mechanism that interpolates between the two limits t → -∞ and t → ∞ of the ODE trajectory.
Backbone. Instead of learning all three neural network instances f, g, and h separately, we have them share the first few layers in the form of a backbone that branches out into these three functions. As a result, the backbone allows our model to learn shared representations, thereby speeding up and stabilizing the learning process. More importantly, this architectural prior enables two simultaneous benefits: 1) through the shared backbone, a coupling between the time constant of the system and its state nonlinearity is established, exploiting the causal representation learning evident in a liquid neural network (1, 24); 2) through separate head network layers, the system has the ability to explore temporal and structural dependencies independently of each other.
These modifications result in the closed-form continuous-depth (CfC) neural network model:

    x(t) = σ(-f(x, I; θ_f) t) ⊙ g(x, I; θ_g) + [1 - σ(-f(x, I; θ_f) t)] ⊙ h(x, I; θ_h).    (10)

The CfC architecture is illustrated in Figure 4. The neural network instances can be selected arbitrarily. The time complexity of the algorithm is equivalent to that of discretized recurrent networks (25), which is at least one order of magnitude faster than ODE-based networks.
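The resulting CfC update can be sketched as follows (a toy dense backbone with three linear heads; the dimensions, initializations, and tanh nonlinearities for g and h are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
D, m, Hb = 8, 3, 16                     # hidden, input, backbone width

Wb = rng.normal(scale=0.3, size=(Hb, D + m))   # shared backbone
Wf = rng.normal(scale=0.3, size=(D, Hb))       # head: liquid time constant f
Wg = rng.normal(scale=0.3, size=(D, Hb))       # head: g
Wh = rng.normal(scale=0.3, size=(D, Hb))       # head: h

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cfc_cell(x, I, t):
    # x(t) = σ(-f t) ⊙ g + (1 - σ(-f t)) ⊙ h   (Eq. 10)
    z = np.tanh(Wb @ np.concatenate([x, I]))   # shared features
    ft = Wf @ z                                # time-constant head
    g = np.tanh(Wg @ z)
    h = np.tanh(Wh @ z)
    gate = sigmoid(-ft * t)                    # time-decaying gate
    return gate * g + (1.0 - gate) * h

x, I = np.zeros(8), np.ones(3)
out = cfc_cell(x, I, t=0.0)   # at t = 0 the gate is exactly 0.5
```

Note that the gate and its complement always sum to one, so the state is a convex combination of the two heads at every t, and no ODE solver appears anywhere in the update.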

How do you deal with time, t?
CfCs are continuous-depth models that can set their temporal behavior based on the task at hand. For time-variant datasets (e.g., irregularly sampled time series, event-based data, and sparse data), t for each incoming sample is set based on its timestamp or order. For sequential applications where the time of occurrence of a sample does not matter, t is sampled batch-length times at equidistant intervals between two hyperparameters a and b.
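This time-handling policy can be sketched as follows (the function and variable names are ours):

```python
import numpy as np

def sample_timestamps(timestamps, seq_len, a=0.0, b=1.0):
    """Return the t values fed to a CfC for one sequence.
    If per-sample timestamps exist (irregular data), use them directly;
    otherwise fall back to equidistant values in [a, b]."""
    if timestamps is not None:
        return np.asarray(timestamps, dtype=float)
    return np.linspace(a, b, seq_len)

t_irregular = sample_timestamps([0.0, 0.4, 1.7], seq_len=3)
t_uniform = sample_timestamps(None, seq_len=5, a=0.0, b=1.0)
```

Here a and b play the role of the two hyperparameters mentioned above; in the irregular case the recorded timestamps drive the explicit time dependency directly.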

Experiments with CfCs
We now assess the performance of CfCs in a series of sequential data processing tasks compared to advanced recurrent models. We first evaluate how CfCs compare to LTC-based neural circuit policies (NCPs) (22) on real-world autonomous lane-keeping tasks. We then turn to conventional sequential data modeling tasks (e.g., bit-stream prediction, sentiment analysis on text data, medical time-series prediction, and robot kinematics modeling) and compare CfC variants to an extensive set of advanced recurrent neural network baselines.
CfC Network Variants. To evaluate the modifications we applied to the closed-form solution network described by Eq. 9, we test four variants of the CfC architecture: 1) the closed-form solution network (Cf-S) obtained by Eq. 9; 2) CfC without the second gating mechanism (CfC-noGate), which lacks the 1 - σ instance shown in Figure 4; 3) the closed-form continuous-depth model (CfC) expressed by Eq. 10; and 4) CfC wrapped inside a mixed-memory architecture (i.e., CfC defines the memory state of an RNN, for instance an LSTM), which we call CfC-mmRNN.
Each of these four variants leverages our proposed solution and is thus at least one order of magnitude faster than continuous-time ODE models.

How well do CfCs perform in autonomous driving compared to NCPs and other models?
In this experiment, our objective is to evaluate how robustly CfCs learn to perform autonomous navigation compared to their ODE-based counterparts, LTC networks. The task is to map incoming pixel observations to steering curvature commands. We start by training neural network architectures that possess a convolutional head stacked with a choice of RNN. The RNN compartment of the networks is instantiated as LSTM networks, NCPs (22), Cf-S, CfC-NoGate, or CfC-mmRNN. We also trained a fully convolutional neural network for the sake of a proper comparison.
Our training pipeline followed an imitation learning approach with paired pixel-control data from a 30 Hz BlackFly PGE-23S3C RGB camera, collected by a human expert driver across a variety of rural driving environments, including different times of day, weather conditions, and seasons of the year. The original 3-hour dataset was further augmented to include off-orientation recovery data using a privileged controller (26) and a data-driven view synthesizer (27). The privileged controller enabled training all networks using guided policy learning (28). After training, all networks were transferred on-board our full-scale autonomous vehicle (a Lexus RX450H retrofitted with drive-by-wire capability). The vehicle was consistently started at the center of the lane, initialized with each trained model, and run to completion at the end of the road. If the model exited the bounds of the lane, a human safety driver intervened and restarted the model from the center of the road at the intervention location. All models were tested with and without noise added to the sensory inputs to evaluate robustness.
The testing environment consisted of 1 km of private test road with unlabeled lane markers, and we observed that all trained networks were able to successfully complete the lane-keeping task at a constant velocity of 30 km/h. Fig. 5 provides insight into how these networks arrive at driving decisions. To this end, we computed the attention of each network while driving using the visual-backprop algorithm (29).
We observe that CfCs, similar to NCPs, demonstrate a consistent attention pattern in each subtask, while maintaining their attention profile under heavy noise (Fig. 5c). In the following, we design sequence data processing pipelines in which we extensively test CfCs' effectiveness in learning spatiotemporal dynamics, compared to a large range of advanced recurrent models.

Regularly and Irregularly-Sampled Bit-Stream XOR
The bit-stream XOR dataset (9) considers classifying bit-streams implementing an XOR function in time, i.e., each item in the sequence contributes equally to the correct output. The bit-streams are provided in a densely sampled and an event-based sampled format. The densely sampled version simply represents each incoming bit as an input event.
The event-sampled version transmits only bit changes to the network, i.e., multiple equal bits are packed into a single input event. Consequently, the densely sampled variant is a regular sequence classification problem, whereas the event-based encoding variant represents an irregularly sampled sequence classification problem.
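The event-based encoding described above amounts to run-length compressing the bit-stream; the sketch below illustrates the idea (the (bit, duration) tuple format is our own choice):

```python
def encode_events(bits):
    """Pack runs of equal bits into (bit, duration) events, so the
    network only receives an input when the bit changes."""
    events = []
    for b in bits:
        if events and events[-1][0] == b:
            # extend the current run instead of emitting a new event
            events[-1] = (b, events[-1][1] + 1)
        else:
            events.append((b, 1))   # new event on a bit change
    return events

dense = [1, 1, 1, 0, 0, 1]
events = encode_events(dense)   # -> [(1, 3), (0, 2), (1, 1)]
```

The durations become the irregular time gaps between inputs, which is what makes this variant an irregularly sampled classification problem.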
Table 4 compares the performance of many RNN baselines. Many architectures, such as Augmented LSTM, CT-GRU, GRU-D, ODE-LSTM, coRNN, and Lipschitz RNN, as well as all variants of CfC, can successfully solve the task with 100% accuracy when the bit-stream samples are equidistant from each other. However, when the bit-stream samples arrive at non-uniform distances, only architectures that are immune to vanishing gradients on irregularly sampled data can solve the task. These include GRU-D, ODE-LSTM, CfCs, and CfC-mmRNNs. ODE-based RNNs cannot solve the event-based encoding task regardless of their choice of solver, as they have vanishing/exploding gradient issues (9). The hyperparameter details of this experiment are provided in Table S1.

PhysioNet Challenge
The PhysioNet Challenge 2012 dataset considers the prediction of the mortality of 8000 patients admitted to the intensive care unit (ICU).The features represent time series

Note:
The performance of the models marked by † is reported from (9).
of medical measurements taken during the first 48 hours after admission. The data is irregularly sampled in time and over features, i.e., only a subset of the 37 possible features is given at each time point. We perform the same test-train split and preprocessing as (7), and report the area under the curve (AUC) on the test set as the metric in Table 5. We observe that CfCs perform competitively with other baselines while training 160 times faster than ODE-RNNs and 220 times faster than continuous latent models. CfCs are also, on average, three times faster than advanced discretized gated recurrent models. The hyperparameter details of this experiment are provided in Table S2. The performance of the models marked by † is reported from (7) and that of the ones marked by * from (44).

Sentiment Analysis -IMDB
The IMDB sentiment analysis dataset (47) consists of 25,000 training and 25,000 test sentences. Each sentence corresponds to either a positive or a negative sentiment. We tokenize the sentences word by word, with a vocabulary consisting of the 20,000 most frequently occurring words in the dataset. We map each token to a vector using a trainable word embedding, which is initialized randomly. No pretraining of the network or the word embedding is performed. Table 6 shows how CfCs equipped with mixed-memory instances outperform advanced RNN benchmarks. The hyperparameter details of this experiment are provided in Table S3. [Table 6 fragment: 85.9 ± 0.9; CfC-mmRNN (ours): 88.3 ± 0.1.]

Note:
The performance of the models marked by † is reported from (40), and that of those marked by * from (42). The n/a standard deviation denotes that the original report of these experiments did not provide statistics for their analysis.

Physical Dynamics Modeling
The Walker2D dataset consists of kinematic simulations of the MuJoCo physics engine (50) on the Walker2d-v2 OpenAI gym (51) environment, using four different stochastic policies. The objective is to predict the physics state of the next time step.
The training and testing sequences are provided at irregularly sampled intervals. We report the squared error on the test set as the metric. As shown in Table 7, CfCs outperform the other baselines by a large margin.

Note:
The performance of the models marked by † is reported from (9).
On this task, CfCs even outperform Transformers by a considerable 18% margin.
The hyperparameter details of this experiment are provided in Table S4.

Scope, Discussions and Conclusions
Continuous-depth neural models have shown promise in density estimation as well as in performant modeling of sequential and irregularly sampled data (1, 7-9, 43). In this paper, we showed how to relax the need for an ODE solver in order to realize an expressive continuous-time neural network model for challenging time-series problems.
Improving Neural ODEs. ODE-based neural networks are only as good as their ODE solvers. As the complexity or the dimensionality of the modeling task increases, ODE-based networks demand a more advanced solver, which significantly impacts their efficiency (17), stability (13, 15, 60-62), and performance (1). A large body of research has gone into reducing the computational overhead of these solvers, for example, by designing hypersolvers (17), deploying augmentation methods (4, 12), pruning (6), and regularizing the continuous flows (14-16). To enhance the performance of ODE-based models, especially in time-series modeling tasks (63), solutions have been provided for stabilizing their gradient propagation (9, 43, 64). In this work, we showed that CfCs improve the scalability, efficiency, and performance of continuous-depth neural models.
Now that we have a closed-form system, where does it still make sense to use ODE-based networks? For large-scale time-series prediction tasks, and where closed-loop performance matters (24), CfCs should be the method of choice. This is because they capture the flexible, continuous-time nature of ODE-based networks while presenting large gains in performance and scalability. On the other hand, implicit ODE-based models can still be significantly beneficial in solving continuously defined physics problems and control tasks. Moreover, for generative modeling, continuous normalizing flows built by ODEs are the suitable choice of model, as they ensure invertibility, unlike CfCs (2). This is because differential equations guarantee invertibility (i.e., under uniqueness conditions (6), one can run them backwards in time). CfCs only approximate ODEs and therefore no longer necessarily form a bijection (65).
What are the limitations of CfCs? CfCs might exhibit vanishing gradient problems.
To avoid this, for tasks that require long-term dependencies, it is better to use them together with mixed-memory networks (9) (see CfC-mmRNN). Moreover, we speculate that inferring causality from ODE-based networks might be more straightforward than from a closed-form solution (24). It would also be beneficial to assess whether verifying a continuous neural flow (66) is more tractable with an ODE representation of the system or with its closed form.
In what application scenarios shall we use CfCs? For problems such as language modeling, where a significant amount of sequential data and substantial compute resources are available, Transformers (46) are the right choice. In contrast, we use CfCs when: 1) data has limitations and irregularities (e.g., medical data, financial time series, robotics (67) and closed-loop control, and multi-agent autonomous systems in supervised and reinforcement learning schemes (68)); 2) the training and inference efficiency of a model is important (e.g., embedded applications (69-71)); and 3) interpretability matters (72).
where L is the Lipschitz constant of f and the last identity is due to the dominated convergence theorem (21). To see 2), we first note the negation of the signal -I [proof fragment; continued in Methods].

In summary, we derived an approximation of an integral appearing in LTCs' dynamics that has had no known closed-form solution so far. This closed-form solution substantially impacts the design of continuous-time and continuous-depth neural models; for instance, since time appears explicitly in closed form, the formulation relaxes the need for complex numerical solvers. Consequently, we obtain models that are between one and five orders of magnitude faster in training and inference than their differential equation-based counterparts. More importantly, in contrast to ODE-based continuous networks, closed-form networks scale remarkably well compared to other deep learning instances. Lastly, as these models are derived from liquid networks, they show remarkable performance in time-series modeling compared to advanced recurrent models.

Fig. 2 :
Fig. 2: Instantiation of LTCs in ODE and closed-form representations. a) A sample LTC network with two nodes and five synapses. b) The ODE representation of this two-neuron system. c) The approximate closed-form representation of the network.

Fig. 3 :
Fig. 3: Tightness of the closed-form solution in practice. We approximate a closed-form solution for LTCs (1) while largely preserving the trajectories of their equivalent ODE systems. We develop our solution into closed-form continuous-depth (CfC) models that are at least 100x faster than neural ODEs at both training and inference on complex time-series prediction tasks.

Fig. 4 :
Fig. 4: Closed-form continuous-depth neural architecture. A backbone neural network layer delivers the input signals to three head networks g, f, and h. f acts as a liquid time constant for the sigmoidal time gates of the network. g and h construct the nonlinearities of the overall CfC network.

Fig. 5c .
Fig. 5c. Similar to NCPs, CfCs are very parameter efficient. They performed the end-to-end autonomous lane-keeping task with around 4k trainable parameters in their recurrent neural network component.

Fig. 5 :
Fig. 5: Attention profile of networks. Trained networks receive unseen inputs (first column in each tab) and generate acceleration and steering commands. We use the Visual-Backprop algorithm (29) to compute the saliency maps of the convolutional part of each network. a) Results for networks tested on data collected in summer. b) Results for networks tested on data collected in winter. c) Results for inputs corrupted by zero-mean Gaussian noise with variance σ² = 0.35.

Table 1: Time complexity of the process to compute K solver steps.
The step-size and maximum step-size symbols refer to the solver of (17). For closed-form continuous-depth models (CfCs), the number of time steps is equivalent to K. Table reproduced from (17).

Table 2: Sequence and time-step prediction complexity. n is the sequence length, k the number of hidden units, and p the order of the ODE solver.

Algorithm 1
Translate a trained LTC network into its closed-form variant

Table 3 : Lane-keeping models' parameter count.
CfC and NCP networks perform lane-keeping in unseen scenarios with a compact representation.

Table 4 :
Bit-stream XOR sequence classification (9). The performance values for all baseline models are reproduced from (9). Numbers are mean ± standard deviation, n = 5.

Table 6 :
Sentiment analysis on the IMDB dataset. The experiment is performed without any pretraining or pretrained word embeddings. Thus, we excluded advanced attention-based models (44, 45), such as Transformers (46), and RNN structures that use pretraining. Numbers are mean ± standard deviation.

Table 7:
CfCs outperform the other baselines by a large margin, underscoring their strong capability to model irregularly sampled physical dynamics with missing phases.

Table 7 :
Per-time-step regression. Modeling the physical dynamics of a Walker agent in simulation. Numbers are mean ± standard deviation, n = 5.