Bayesian Optimization of Bose-Einstein Condensates

Machine Learning methods are emerging as faster and efficient alternatives to numerical simulation techniques. The field of Scientific Computing has started adopting these data-driven approaches to faithfully model physical phenomena using scattered, noisy observations from coarse-grained grid-based simulations. In this paper, we investigate data-driven modelling of Bose-Einstein Condensates (BECs). In particular, we use Gaussian Processes (GPs) to model the ground state wave function of BECs as a function of scattering parameters from the dimensionless Gross Pitaveskii Equation (GPE). Experimental results illustrate the ability of GPs to accurately reproduce ground state wave functions using a limited number of data points from simulations. Consistent performance across different configurations of BECs, namely Scalar and Vectorial BECs generated under different potentials, including harmonic, double well and optical lattice potentials pronounces the versatility of our method. Comparison with existing data-driven models indicates that our model achieves similar accuracy with only a small fraction (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\frac{1}{50}$$\end{document}150th) of data points used by existing methods, in addition to modelling uncertainty from data. When used as a simulator post-training, our model generates ground state wave functions \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$36 \times $$\end{document}36× faster than Trotter Suzuki, a numerical approximation technique that uses Imaginary time evolution. Our method is quite general; with minor changes it can be applied to similar quantum many-body problems.


Introduction
At ultra-cold temperatures, when a gas is dilute enough such that the interactions are weak, the atoms crash into the ground state, leading to an exotic state of matter called Bose-Einstein Condensates (BECs) 1 .BECs are in general governed by the mathematical model described by Gross-Pitaevskii equation (GPE), a variant of Nonlinear Schrödinger equation (NLSE).In GPE, there are two variants, namely nonlinear interaction parameter (coupling strength) and the trapping potential and both could be spatio-temporal in general.GPE in its most general form cannot be solved analytically.It can be solved analytically 2 only for specific choices of coupling strengths and trapping potentials.In other cases, one has to resort to numerical techniques 3 like Crank-Nicholson Scheme, Trotter-Suzukii Approximation, Finite Element Analysis, etc. Numerical techniques rely on numerical differentiation which is the finite difference approximation of derivatives using values of the original function evaluated at some sample points.Numerical approximations of derivatives are inherently ill-conditioned and unstable due to introduction of truncation and round-off errors 4 .On the other hand, Automatic Differentiation (AD) also called algorithmic differentiation is a family of techniques for efficiently and accurately evaluating derivatives of numeric functions expressed as computer programs 5 .AD exploits the fact that every computer program executes a sequence of elementary arithmetic operations and elementary functions.By applying the chain rule repeatedly to these operations, derivatives of arbitrary order can be computed automatically and accurately to working precision.AD is both efficient and numerically stable compared to Numerical Differentiation.AD has become the beating heart of Modern Machine Learning, leading to a new paradigm of programming called Differentiable Programming.
In recent years, Scientific Computing community has started adopting data-driven approaches from Machine Learning, as faster and efficient alternatives to numerical simulation techniques.This new regime called Data-driven Scientific Computing has shown great promise already by faithfully modelling physical phenomena using scattered noisy observations from coarse-grained grid-based numerical simulations.The learned models are capable of predicting the dynamics of the system under study in a fixed number of CPU cycles irrespective of the dimensions of the grid.Unlike numerical techniques that rely on unstable numerical gradients, Machine Learning models depend on precise and stable automatic differentiation.
Advances in Machine Learning has led to incredible breakthroughs in areas such as Computer Vision 6 , Natural Language Processing 7 , Computational Biology 8 , etc, demonstrating super-human abilities [9][10][11][12] in certain tasks.These dramatic improvements are reflected in computational sciences that make use of Machine Learning.Artificial Neural Networks (ANNs), the focal point of Modern Machine Learning or Deep Learning, are powerful and versatile models that can automatically learn representations from data.ANNs were primarily used in Physical Sciences, to learn suitable order parameters and detect the phase transition, without any prior domain knowledge [13][14][15][16] .Greitemann et al. 17 used a kernel-based learning method to learn the phases in magnetic materials to identify complex order parameters.In the context of nonlinear dynamical systems, Jaegar et al. 18 employed ANNs to predict the trajectories of a chaotic time-series improving the accuracy by a factor of 2400 over previous techniques.ANNs were used as a variational representation of quantum states in quantum-physical many-body problems to cope up with the exponential complexity of the many-body wave function 19 .Nomura et al. 20 employed ANNs to develop a machine learning method for constructing accurate ground-state wave functions of strongly interacting and entangled quantum spin systems as well as fermionic models on lattices.A restricted Boltzmann machine algorithm in the form of an ANN, combined with a conventional variational Monte Carlo method turned out to be a highly accurate quantum many-body solver capable of accurately predicting ground-state wave functions of strongly interacting and entangled quantum spin systems as well as fermionic models on lattices 21 .ANNs have been proven to be capable of representing large-scale quantum states, including the ground states of many-body Hamiltonians and states generated by quantum dynamics [22][23][24][25] .
Despite being robust and flexible, ANNs suffer from a number of shortcomings.ANNs are data-hungry and hence they do not play well with small data.They are over-confident in predictions and are susceptible to adversarial inputs.Learning in ANNs is based on Maximum Likelihood Estimation which leads to point estimates of learnable parameters instead of probability distributions.Gaussian Processes (GPs) on the other hand, are flexible probabilistic models that can learn probability distributions over functional mapping of inputs to outputs.They are quite efficient in learning hidden dynamics from small data.Being a Bayesian method, GP inherently models uncertainty from data.Learning results in a posterior distribution from which we could draw new samples.
Even though Machine Learning approach has been employed in modelling BECs earlier by Liang etal 26 , they exploited Convolutional Neural Network architecture(CNN) to model the ground state wave functions of BECs in different platforms.Liang et al. 26 showed that it is possible to approximate the wave function using a neural network but do not guarantee any kind of improvement over conventional numerical techniques 27 .Our investigation bridges this gap by using a Gaussian Process as a cheaper, powerful and efficient alternative to neural networks in the context of modelling ground state wave functions in BECs.
In this paper, our goal is to examine whether Gaussian Processes could be a viable surrogate model for datadriven emulation of one-dimensional Scalar and Vectorial BECs.Our validation of GP is centred around the following aspects: To answer these questions, we studied the performance of Gaussian Processes on different settings of BECs.We employ Trotter-suzuki approximation, a numerical technique for simulating the ground state wave functions, ψ.We run simulations by setting up a one-dimensional grid and vary the coupling strength g, the parameter that controls the interaction between the atoms besides varying the trapping potential.Similarly for two-component BECs, we run simulations by varying interaction parameters {g 11 , g 12 , g 22 } which results in two wave functions {ψ 1 , ψ 2 }.We model the simulated wave functions using a GP with Radial Basis Function (RBF) kernel 28 .The GP models the wave function as a function of space x and coupling strength g.The results reveal the versatility of GP in modelling different kinds of wave functions, efficiently and accurately, with uncertainty estimates.In addition to modelling uncertainty from data, our method performs better than Liang et al. 26 in terms of efficiency and model complexity.Our model being simpler in terms of model complexity, uses just a small fraction ( 1 50 th) of the data points used by Liang et al. to achieve similar accuracy.Furthermore, comparing the efficiency of our method in predicting wave functions, with Trotter-suzuki approximation, we find that our method performs 36× faster.The paper is organised as follows: Section.2 lays the foundation for the rest of the paper.It starts with a brief introduction to Bose-Einstein Condensates in Section.2.1, Trotter-Suzuki Approximation in Section.2.2, and a detailed description of Gaussian Processes in 2.3.Section.3 provides the reader with the tools necessary to replicate our experiments.Section.4 includes a report on all our experiments using Gaussian Processes as a surrogate model for different settings of BECs.Section.5 discusses the implications of our experimental results and summarizes possible avenues for further research.

Bose-Einstein Condensates
At temperatures close to absolute zero, majority of the bosons in a gas crash into the ground state of the system creating an exotic state of matter known as Bose-Einstein Condensates (BECs).The dynamics of BECs is described by Gross-Pitaevskii 29 equation 1 which belongs to the family of Variable Coefficient Nonlinear Schrödinger equations, given by where ψ denotes the wave function(or order parameter) of BECs, m is mass the atoms of the condensate, g is the inter-atomic interaction, V ext is the trapping potential and |ψ| 2 is the atomic density.
A two component BEC known as Vector BEC is endowed with inter-atomic and intra atomic interaction in addition to the trap and hence its density can be manipulated with more freedom and is governed by Coupled Gross-Pitaveskii Equation 3 of the form equation 2 and 3,

Trotter-Suzuki Approximation
The behaviour of any physical system can be studied by solving the partial differential equations (PDEs) which represent the dynamics of that physical phenomenon.In practice, most PDEs with any real application are nonlinear in nature and are hard to solve analytically.This is particularly true for complex dynamical systems which are quite difficult to solve and necessitate high computational resources to arrive at highly accurate solutions.Trotter-suzuki decomposition implemented by Wittek and Cucchietti 30 , exploits optimized kernels to solve Gross-Pitaevskii equation of a free particle.The exponential operators in PDEs are notoriously hard to approximate.Trotter Suzuki decomposes the Hamiltonian into sum of diagonal matrices which eases the task of computing the exponential.The evolution operator is calculated using the Trotter-Suzuki approximation.Given an Hamiltonian as a sum of hermitian operators, for instance

Gaussian Processes
Gaussian Processes 31 are probabilistic machine learning models that define a distribution over functions.They are infinite dimensional realizations of a multi-variate Gaussian Distribution.Given a set of points in the input space {x 1 , x 2 , ..x n } and the function f evaluated at those points { f 1 e , f 2 e , ..., f n e }, we can formally define a GP as: ) is a Gaussian Process if for any subset {x 1 , x 2 , ...x n } ⊂ X , the marginal distribution over the subset p( f e ) has a multivariate Gaussian distribution.Consider a Multivariate Gaussian Distribution given by, where µ ∈ R k is a k-dimensional zero vector and Σ ∈ R k×k is the covariance matrix that captures the correlation between dimensions {X 1 , X 2 , ..X k }.Let us sample 4 points {z 1 , z 2 , z 3 , z 4 } from the distribution and plot it sequentially as shown in Figure .1a.From Figure .1a,we can make two inferences.(1) Points closer to each other on the x-axis behave similarly on the y-axis (2) Points farther apart from each other behave differently on the y-axis.That is, closer points are highly correlated with each other while father points are not.This intuition leads to the understanding that by connecting these points sequentially, we can realize smooth functions like the ones shown in Figure .1b.The functions we realized by connecting the points sequentially can be considered as samples from a Gaussian Process by a zero-mean vector µ and the hitherto unknown covariance matrix Σ.
Yet another inference that can be made from figure .1a is that the nature of the functions realized are dependent on the covariance matrix Σ.By manipulating the covariance matrix Σ, one can manipulate the kind of functions realized from k-dimensional samples {z i }.The mechanism to manipulate the covariance matrix is through an algebraic function named Covariance function or kernel.The kernel k(x, x ) → R takes two points from the input space X as inputs and returns a scalar value c ∈ [0, 1].This value represents the similarity between the two input points.The covariance matrix Σ is constructed by calculating the similarity between k equally spaced points in the input space {x 1 , ...x k }.By controlling the kernel function, we control the covariance matrix that governs the GP, thereby controlling the nature of functions generated.The kernel is usually parameterized by one or more tunable variables which act as knobs for controlling the shape of the functions that GP generates.
Let us assume for didactic purposes that the x and y are scalars and the function f is given by f : X → R. We consider Radial Basis Function (RBF) parameterized by length scale k l as the kernel.
Length scale k l controls the order of the functions generated as evident from Figure .1b.

Inference
Inference in GP is a two-step process : Kernel Parameter Search and Posterior Estimation.In kernel parameter search, we learn the kernel parameter values that fit the data well.The optimal value of kernel parameters are estimated using Maximum Likelihood Estimation (MLE) by minimizing marginal negative log likelihood 32 given by

4/11
where N is the number of data points, (X, y) represent the data points.Then, we estimate GP posterior conditioned on data i.e. mean vector and covariance matrix conditioned on data.The definition of GP states that the joint distribution of observed data y and predictions f * has a multivariate Gaussian distribution.Given data X and new input X * , we can write the joint distribution as, Using the Multivariate Gaussian Theorem 33 , we arrive at the parameters of the posterior,

Motivation
In our experiments, we consider the ground state wave function of one-dimensional BECs 27 .The simulation takes as input multiple parameters including grid parameters like radius, length, potential function (V ext ), coupling strength (g),etc., and generates a one-dimensional wave function using Imaginary Time Evolution method.The time period to complete a simulation depends on the dimensions and size of the grid.The promise of Data-driven Scientific computing is the ability to model any process without closed-form solutions using experimental data and building a predictive model that can provide predictions at any point in the grid with absolute guarantee.As discussed in the Introduction, a data-driven model trained on scattered data points from simulations is capable of predicting the wave function at any point within the grid in a few fixed number of CPU cycles irrespective of grid dimensions or size.Gaussian Processes model the wave function as a probability distribution over functions supported by a kernel with optimal parameters.Being a Bayesian model, GP is data-efficient and makes uncertainty-aware predictions.The ability to model uncertainty is especially important in modelling physical phenomenon from a few scattered noisy observations.Bayesian methods rely on Bayesian Inference which estimates posterior over unknowns in contrast to non-probabilistic models which result in point estimates of unknowns.As a consequence, it is possible to generate entire wave functions from the GP that are subject to constraints on the input parameters like coupling strength, omega(Rabi Coupling), etc.

Methods
We use trotter-suzuki-mpi 34 , a massively parallel implementation of the Trotter-Suzuki approximation to simulate the evolution of Bose-Einstein Condensates.The simulation setup consists of a one-dimensional grid which represents the physical system.This discretized space is defined by a lattice of 512 nodes within a physical length of 24 units.The physics of BECs is described by the Hamiltonian which represents the GPE equation.The Hamiltonian requires a grid and a trapping potential.The trapping potential is defined as a function which takes x and y as arguments and returns a scalar value as output.In this case, we consider a harmonic trapping potential given by x 2 2 .The state of the system is initialized in gaussian form.Imaginary time evolution evolves the system state using the dynamics defined by the Hamiltonian, in 10 3 iterations with time step being ∆ t = 10 −4 for each iteration.In order to collect data, we run M simulations by varying the value of the interaction parameter g.We use sklearn's Gaussian Process API 35 with Radial Basis Function (RBF) kernel as a surrogate to model the wave function ψ as a simpler but continuous function of space x and the interaction parameter g.
The prior on the groundstate wave function of BEC is given by N data points of the form (x, g, ψ) are sampled randomly from M simulation outcomes.Throughout our experiments, we use the RBF kernel given in equation.5.The hyper-parameters θ of the kernel are tuned using Maximum Likelihood Estimation which essentially amounts to finding the parameters that minimize the expression in equation.6,given a list of data points.
The RBF kernel consists of the following parameters: length scale k l and variance k σ .ψ posterior is estimated by conditioning ψ prior on the sampled data points.The expression for ψ posterior reduces to a mathematically tractable form given by the mean vector and covariance matrix presented in equation 8. Inference consists of calculating the marginal mean µ 2|1 and covariance Σ 2|1 by substituting the training points X and new test points X * into the analytical forms presented in equation 8.The marginal mean µ 2|1 constitutes the predicted wave function.The diagonal elements of the covariance matrix form the variance σ on each prediction.Together, they make the GP posterior parameterized by [µ 2|1 , Σ 2|1 ].The fidelity of predicted wave function is measured by calculating Mean Squared Error (MSE) metric against the ground truth data.Figure .2portrays the results of an illustrative experiment on one-dimensional BECs with a harmonic trapping potential.
The experimental setup described above can be extended to different settings of simulation and data-driven approximation, to include two-component BECs and different kinds of trapping potentials.A detailed report of the experiments we conducted is presented in the next section.

Experiments
All the experiments to be covered in this section will follow along similar lines as the illustrative experiment discussed in the previous section: Simulation, Data-driven Approximation and Evaluation.In the previous experiment, we ran 300 simulations shown in Figure .2afor equally spaced values of g ranging from 1 to 100.We randomly sampled 500 samples of the form (x, g, ψ) from the simulation results, which we used to fit the GP.Evaluation leads to a MSE score of 1.23 × 10 −7 which indicates that the trained GP can accurately predict ground state wave functions for 100 different values of the interaction parameter g within the range (1, 100).
To follow up, we ask the question, How many data points are necessary to build a reliable predictive model of ground state wave function?In order to answer this question, we vary the number of training samples and observe the effect it has on MSE score of GP trained on those samples.We reuse the experimental setup from the previous experiment and vary the number of data points N sampled from M = 300 simulations.The results are presented in Figure .3. Figure .3depicts the relationship between variance and number of samples N and it shows the rate of decrease of MSE w.r.t N. The MSE curve shows a trend of saturation close to 0 as the number of samples are increased as shown in Table .1.Decrease in variance and MSE is evidence of confidence and accuracy in prediction respectively.
The next experiment tests the versatility of GP by modelling the ground states of BECs for different choices of trapping potentials.We consider Harmonic , Double Well and Optical Lattice potentials in Figure .4 .We restrict the range of g to (0, 2) in order to ensure the stability of generated wave function for different potentials.We run 100  simulations and collect 500 samples for each potential.We fit a GP for each potential using collected data points.
Evaluating the models results in MSE scores of 1.23 × 10 −7 , 1.57 × 10 −6 and 4.30 × 10 −5 , for harmonic, double well and optical lattice potentials 3 respectively.The predictive wave functions for a value of g = 1 is plotted against the ground truth in Figure .4(a,b,c).The ground state wave functions of two-component BECs are given by ψ 1 and ψ 2 .The interaction strength is defined by parameters g 11 , g 12 and g 22 .We model the system using a GP which learns the relationship between (ψ 1 , ψ 2 ) and the interaction parameters.We sample 500 data points from 200 simulations with values of interaction parameters lying within the range (−1, 1).The choice of this range is dictated by the zone of stability of the system.Predictions and ground truth wave functions for different potentials conditioned on values of interaction parameters given by {g 11 = 0.1, g 12 = −0.1,g 22 = 0.1}, are presented in Figure .4(a,b,c).Evaluation on a test set within the range g ∈ (−1, 1) results in MSE in the range of 10 −4 .
We reproduce an experiment conducted by Liang et al. 26 using GP -Emulation of rabi-coupled two-component BECs.Figure.4(d)shows the predicted ground state wave function of rabi-coupled two-component BECs equation.

8/11
9 & 10, conditioned on a cosine potential given by x 2 2 + 24cos 2 x with interaction parameters {g 11 = 103, g 12 = 100, g 22 = 97}.Finally, we use trained GPs as simulators to generate wave functions and compare their performance against Trotter-suzuki.We experiment with one-component and two-component BECs in 512 and 1024 grid spaces.The results tabulated in Table .2indicate an average of 36× speed up in wave generation with reasonable accuracy.

Discussion
CNN-based model employed by liang et al. 26 learns a mapping from coupling strength g to wave function values at fixed points in space, a scalar to vector mapping.This results in a highly limited predictive model of wave functions.In contrast, our method models the wave function as a continuous function of space and coupling strength, resulting in a predictive model which is well-defined in all the points within the chosen interval.The mean-squared error (MSE) presented by Liang et al. 26 , on an average ranges from 10 −4 to 10 −5 .Our experimental results tabulated in Table .1 show a similar MSE score ranging from 10 −4 to 10 −7 .We use a maximum of 1000 data points from simulations to achieve these results while Liang et al. 26 uses data from 50000 simulations to train their CNN.Our method, only using a small fraction ( 1 50 ) of data points used by Liang et al. 26 , has demonstrated comparable accuracy in most cases and enhanced accuracy in others.Our experiments with various trapping potentials show that it is possible to model different kinds of ground states of BECs using a vanilla GP with an RBF kernel.
We compare the predictive power of our model with Trotter-suzuki Approximation -a finite difference based numerical technique.Trotter-suzuki generates ground state wave functions by setting up a grid and running time evolution given a set of initial conditions (configuration).This scales linearly with respect to the number of simulations.The advantage in using a trained GP as a predictive model is that multiple simulation runs can be batched together by converting different configurations {(x, g 1 ), (x, g 2 ), ...(x, g M )} into an input tensor of form [[x, x, ...x], [g 1 , g 2 , ...g M ]].This trick makes it possible to pose multiple simulations as one prediction step.This is the reason for the significant speed up in generation of wave function using GPs compared to Trotter-suzuki.The empirical results discussed thus far indicate that Gaussian Processes are not just viable but desirable surrogate models for data-driven emulation of Bose-Einstein Condensates.
There are a number of avenues to explore which are beyond the scope of this work.We briefly mention these in this section.Vanilla Gaussian Processes do not scale well to high-dimensional data.modelling higher-dimensional BECs requires a scalable variant of GP.GP inherently models uncertainty in data.It is possible to exploit this in order to train GPs even more efficiently using Active Learning.We used different instances of GP to model wave functions conditioned on different trapping potentials.Using a common Multi-task GP (MTGP) 36 would increase the data-efficiency further as MTGP learns the common traits between wave functions generated in different settings.

Conclusion
Data-driven methods are replacing conventional numerical techniques in modelling physical phenomena in several fields of science.We explore data-driven emulation of BEC ground-state wave function using Gaussian Process as a surrogate model.Our empirical results show that Gaussian Processes in addition to modelling uncertainty from data, are able to surpass Artificial Neural Networks in terms of efficiency and complexity.Gaussian Processes when used as simulators post-training offer significant gain in speed when compared to Trotter-suzuki, a numerical approximation technique.Our experimental results indicate that GP is a desirable candidate for data-driven modelling of Bose-Einstein Condensates.

1 . 3 . 4 .
Correctness: Can GP accurately model the ground state of BECs ? 2. Versatility: Can GP adapt to different settings of BECs?Data Efficiency: How many data points are necessary to model a wave function?Compute Efficiency: How do they fare against numerical techniques when used as a simulator, after training?

Figure 1 .
Figure 1.(a) Four samples from a multi-variate normal distribution N (0, Σ) plotted sequentially.(b) Samples from GP prior conditioned on different values of length scale l k .

Figure 2 .
Figure 2. (a) Simulated and predicted ground state wave functions for different values of the interaction parameter g = 10, 50, 90 (b) Prediction of wave functions conditioned on different trapping potentials -Harmonic, Optical Lattice, Double Well potentials.