Deep kernel learning of dynamical models from high-dimensional noisy data

This work proposes a stochastic variational deep kernel learning (SVDKL) method for the data-driven discovery of low-dimensional dynamical models from high-dimensional noisy data. The framework is composed of an encoder that compresses high-dimensional measurements into low-dimensional state variables, and a latent dynamical model for the state variables that predicts the system evolution over time. The proposed model is trained in an unsupervised manner, i.e., without relying on labeled data. Our learning method is evaluated on the motion of a pendulum, a well-studied baseline for nonlinear model identification and control with continuous states and control inputs, measured via high-dimensional noisy RGB images. Results show that the method can effectively denoise measurements, learn compact state representations and latent dynamical models, and identify and quantify modeling uncertainties.


A.2 SVDKL Dynamical Model Architecture
Given a sample z_t from the latent state distribution p(z_t | x_t), we predict the evolution of the dynamical system forward in time under a control input u_t using the SVDKL dynamical model F. The SVDKL dynamical model is composed of 3 fully-connected layers of size 512, 512, and 20, respectively, with ELU activations, except for the final layer, which has a linear activation. Analogously to the SVDKL encoder, the output features of the neural network are fed to 20 independent GPs to produce a 20-dimensional next-state distribution p(z_{t+1} | z_t, u_t). Again, we sample the next latent state z_{t+1} using the reparametrization trick.
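As a minimal sketch of this forward pass, the three-layer ELU network and the reparametrization trick can be written with NumPy stand-ins. The 20 independent GP heads are replaced here by fixed-variance Gaussians over the output features, and all class/function names, weight shapes, and initializations are hypothetical, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def elu(x, alpha=1.0):
    # ELU activation used in the two hidden layers
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

class LatentDynamicsSketch:
    """Toy stand-in for the SVDKL dynamical model F: (z_t, u_t) -> p(z_{t+1})."""

    def __init__(self, z_dim=20, u_dim=1, hidden=512):
        d_in = z_dim + u_dim
        self.W1 = rng.standard_normal((d_in, hidden)) * 0.01
        self.W2 = rng.standard_normal((hidden, hidden)) * 0.01
        self.W3 = rng.standard_normal((hidden, z_dim)) * 0.01  # linear output layer

    def features(self, z, u):
        h = elu(np.concatenate([z, u]) @ self.W1)
        h = elu(h @ self.W2)
        return h @ self.W3  # 20-d features (fed to the 20 GPs in the real model)

    def next_state_distribution(self, z, u):
        # Placeholder for the GP heads: each feature parameterizes the mean of
        # an independent Gaussian with a fixed standard deviation.
        mean = self.features(z, u)
        std = np.full_like(mean, 0.1)
        return mean, std

def reparameterized_sample(mean, std):
    # Reparametrization trick: z = mean + std * eps, with eps ~ N(0, I)
    eps = rng.standard_normal(mean.shape)
    return mean + std * eps

model = LatentDynamicsSketch()
z_t = rng.standard_normal(20)
u_t = np.array([0.5])
mean, std = model.next_state_distribution(z_t, u_t)
z_next = reparameterized_sample(mean, std)
```

Sampling through the deterministic `mean + std * eps` transform keeps the sample differentiable with respect to the distribution parameters, which is what makes end-to-end training possible.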

A.3 Pendulum Environment
The pendulum environment used for collecting the data tuples is Pendulum-v1 from OpenAI Gym [6].

A.4 KL Balancing
Similar to Dreamer-v2 [5], we employ KL balancing. The method allows for balancing how much the prior is pulled towards the posterior and vice versa, and can be easily implemented as follows:

KL_balanced = α KL[ stop_grad(p(z_{t+1} | x_{t+1})) ‖ p(z_{t+1} | z_t, u_t) ] + (1 − α) KL[ p(z_{t+1} | x_{t+1}) ‖ stop_grad(p(z_{t+1} | z_t, u_t)) ],

where α is a hyperparameter balancing the contribution of the two terms of the KL divergence, and stop_grad is the function stopping the propagation of the gradients during the update step of the SVDKL parameters.
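A sketch of this balanced divergence between a diagonal-Gaussian posterior and prior is shown below. Plain NumPy has no autodiff, so the stop_grad calls are indicated only in comments (the two KL values coincide numerically and only the α-weighting is exercised); all function names are hypothetical:

```python
import numpy as np

def kl_diag_gauss(mu_q, std_q, mu_p, std_p):
    # KL( N(mu_q, diag(std_q^2)) || N(mu_p, diag(std_p^2)) ), summed over dims
    return np.sum(
        np.log(std_p / std_q)
        + (std_q**2 + (mu_q - mu_p)**2) / (2.0 * std_p**2)
        - 0.5
    )

def balanced_kl(mu_post, std_post, mu_prior, std_prior, alpha=0.8):
    # In an autodiff framework, the first term would detach (stop_grad) the
    # posterior, so gradients pull the prior towards it, and the second term
    # would detach the prior; alpha weights the two directions.
    kl_train_prior = kl_diag_gauss(mu_post, std_post, mu_prior, std_prior)  # stop_grad(posterior)
    kl_train_post = kl_diag_gauss(mu_post, std_post, mu_prior, std_prior)   # stop_grad(prior)
    return alpha * kl_train_prior + (1.0 - alpha) * kl_train_post

mu_q, std_q = np.zeros(20), np.ones(20)
mu_p, std_p = 0.5 * np.ones(20), np.ones(20)
kl_value = balanced_kl(mu_q, std_q, mu_p, std_p)
```

With α close to 1, the prior (the latent dynamical model) does most of the adapting towards the encoder's posterior, which prevents the representation from being dragged towards an untrained prior early in training.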

A.5 Hyperparameter Summary
The hyperparameters are chosen via grid search among the values reported in Table 1. The final values used in the experiments are indicated in bold. Other parameters used in the experiments are listed in Table 2.

B Comparison with Variational Autoencoder
We compare the proposed SVDKL-based scheme with a VAE-based counterpart [4] in learning low-dimensional state representations and latent forward models (for the pendulum) from high-dimensional noisy measurements.

B.1 Architecture
For a fair comparison, the VAE- and SVDKL-based schemes have very similar model architectures. We use the same encoding architecture (see Section A.1), and the two models differ only in the last layer of the encoder, i.e., the outputs of the VAE encoder are the means and standard deviations of the Gaussian distributions over the latent states. Similarly, the NN-based latent forward models are formulated identically (see Section A.2), except that the VAE-based scheme directly outputs the means and standard deviations of the next latent states.
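The distinguishing head can be sketched as follows (weights and names are hypothetical); in the SVDKL scheme, the same 20 features would instead be passed to 20 independent GPs:

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.standard_normal(20)  # output of the shared encoder trunk

# Hypothetical last-layer weights; the log-std parameterization keeps the
# predicted standard deviation strictly positive after exponentiation.
W_mu = rng.standard_normal((20, 20)) * 0.1
W_logstd = rng.standard_normal((20, 20)) * 0.1

def vae_head(f):
    # VAE variant: the last layer directly outputs the mean and standard
    # deviation of the Gaussian distribution over the latent state.
    return f @ W_mu, np.exp(f @ W_logstd)

mu, std = vae_head(features)
```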

B.2 Loss Function
To train the VAE-based models, we employ a VAE loss L_E(θ_E, θ_D), a dynamical model loss L_F(θ_F), and the overall loss L_REP(θ_E, θ_F, θ_D) defined as their combination, all of which take the same form as their respective counterparts in the SVDKL-based scheme, except that there are no longer kernel hyperparameters to be determined. Because the VAE directly learns the mean and standard deviation of the Gaussian distribution, we do not need to perform variational inference as in the case of the SVDKL-based models.
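As a minimal NumPy sketch of this loss structure, under the common assumptions of a squared-error reconstruction term, a standard-normal latent prior, and a Gaussian likelihood for the forward model (all function names and the exact forms are ours, not restated from the paper):

```python
import numpy as np

def kl_to_std_normal(mu, std):
    # KL( N(mu, diag(std^2)) || N(0, I) ), summed over latent dimensions
    return 0.5 * np.sum(std**2 + mu**2 - 1.0 - 2.0 * np.log(std))

def vae_loss(x, x_recon, mu, std):
    # L_E: squared-error reconstruction plus the KL regularizer
    return np.sum((x - x_recon)**2) + kl_to_std_normal(mu, std)

def dynamics_loss(z_next, mu_pred, std_pred):
    # L_F: Gaussian negative log-likelihood of the encoded next state under
    # the forward model's predicted distribution
    return np.sum(
        0.5 * ((z_next - mu_pred) / std_pred)**2
        + np.log(std_pred)
        + 0.5 * np.log(2.0 * np.pi)
    )

def representation_loss(x, x_recon, mu, std, z_next, mu_pred, std_pred):
    # L_REP: the combined objective, optimized jointly over encoder,
    # decoder, and forward-model parameters
    return vae_loss(x, x_recon, mu, std) + dynamics_loss(z_next, mu_pred, std_pred)
```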

B.3 Hyperparameters
The hyperparameters for the VAE-based model training are set to the same values as in the SVDKL-based scheme, as reported in Table 3.

B.4 Results
To empirically demonstrate the advantages of using the SVDKL-based scheme over the VAE-based models for state estimation and denoising, we show the reconstructed images under different noise levels in Figure 1. The SVDKL-based models provide sharper reconstructions, especially in the case of high measurement noise (e.g., σ_u = 0.7 or σ_x = 1.0).