Data-driven inference for the repulsive exponent of the Lennard-Jones potential in molecular dynamics simulations

The Lennard-Jones (LJ) potential is a cornerstone of Molecular Dynamics (MD) simulations and among the most widely used computational kernels in science. The LJ potential models atomistic attraction and repulsion with century-old prescribed exponents (q = 6 and p = 12, respectively), originally related by a factor of two for simplicity of calculations. We propose to infer the repulsion exponent through Hierarchical Bayesian uncertainty quantification. We use experimental data of the radial distribution function and dimer interaction energies from quantum mechanics simulations. We find that a repulsion exponent p ≈ 6.5 provides an excellent fit for the experimental data of liquid argon, over a range of thermodynamic conditions, as well as for saturated argon vapour. Calibration using the quantum simulation data did not provide a good fit in these cases. However, the values p ≈ 12.7 obtained from dimer quantum simulations are preferred for the argon gas, while lower values are promoted by the experimental data. These results show that the proposed LJ 6-p potential applies to a wider range of thermodynamic conditions than the classical LJ 6-12 potential. We suggest that calibration of the repulsive exponent in the LJ potential widens the range of applicability and accuracy of MD simulations.


S1 Uncertainty quantification
This section provides a detailed description of the uncertainty quantification (UQ) theory used in the present work.

S1.1 Sampling the posterior distribution
For the posterior distribution p(ϑ | d, M), which is known up to the normalizing constant p(d | M), Markov Chain Monte Carlo (MCMC) methods can be used to efficiently generate samples that quantify the uncertainty in ϑ [1,2,3,4,5]. In this work we use the Transitional Markov Chain Monte Carlo (TMCMC) algorithm [6] with a slight modification of the MCMC proposal covariance, introduced to overcome the low acceptance rates we observed in several TMCMC runs. Instead of setting the covariance matrix to the scaled sample covariance of the previous generation, we use the landscape around each chain leader to construct a local covariance matrix. The sampling algorithm starts from a neighborhood radius of 10% of the domain size and automatically increases it until the local covariance matrix is positive definite.
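The construction of the local proposal covariance can be sketched as follows. This is a minimal illustration of the idea described above, not the production implementation; the function name, the growth factor `grow`, and the scaling constant `scale` are assumptions made for the sketch.

```python
import numpy as np

def local_proposal_covariance(leader, samples, domain_lo, domain_hi,
                              init_frac=0.1, grow=1.5, scale=0.04):
    """Proposal covariance around one TMCMC chain leader.

    leader   : (d,) parameters of the chain leader
    samples  : (n, d) samples of the previous TMCMC generation
    domain_lo, domain_hi : (d,) bounds of the prior box
    """
    d = len(leader)
    radius = init_frac * (domain_hi - domain_lo)   # start at 10% of the domain size
    while True:
        mask = np.all(np.abs(samples - leader) <= radius, axis=1)
        neighbors = samples[mask]
        if len(neighbors) > d:                     # need enough points for a d x d covariance
            cov = scale * np.cov(neighbors, rowvar=False)
            try:
                np.linalg.cholesky(cov)            # succeeds iff cov is positive definite
                return cov
            except np.linalg.LinAlgError:
                pass
        if mask.all():
            # whole domain reached without success: fall back to a regularized covariance
            return scale * np.cov(samples, rowvar=False) + 1e-10 * np.eye(d)
        radius *= grow                             # enlarge the neighborhood and retry
```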

S1.2 Robust posterior prediction
The uncertainty in the model parameters can be further propagated to the uncertainty in the QoI y produced by f(x; ϑ). Under the assumptions made for the likelihood function, the probability of the model prediction conditioned on the parameters ϑ is given by p(y | ϑ, M) = N(y | f(x; ϑ), Σ). The probability of the model prediction conditioned on the observations d is known as the robust posterior prediction and is given by [7,8]:

p(y | d, M) = ∫ p(y | ϑ, M) p(ϑ | d, M) dϑ ≈ (1/N_s) Σ_{k=1}^{N_s} N(y | f(x; ϑ^(k)), Σ),

where ϑ^(k) ∼ p(ϑ | d, M) and N_s is sufficiently large. When a new QoI z produced by g(x; ϑ) is considered, Σ is unknown and only the parametric uncertainty is propagated into g(x; ϑ). Namely, one estimates the density of the samples {g(x; ϑ^(k))}_{k=1}^{N_s}, where ϑ^(k) ∼ p(ϑ | d, M) and N_s is sufficiently large.
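Both sampling estimates above reduce to pushing the posterior samples through the model. A minimal sketch, assuming `f`, `g`, `Sigma` and the variable names are placeholders for the user's model and data:

```python
import numpy as np

def robust_prediction(f, x, posterior_samples, Sigma, rng=np.random.default_rng(0)):
    """Draw y(k) ~ N(f(x; ϑ(k)), Σ); these are samples of p(y | d, M)."""
    return np.array([rng.multivariate_normal(f(x, theta), Sigma)
                     for theta in posterior_samples])

def propagate_new_qoi(g, x, posterior_samples):
    """For a new QoI z = g(x; ϑ) with unknown Σ, propagate only the parametric
    uncertainty; the density of these samples approximates p(z | d, M)."""
    return np.array([g(x, theta) for theta in posterior_samples])
```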

S1.3 Model selection
The Bayesian framework allows one to select the probabilistic model which best fits the data. The criterion for the model selection comes from Bayes' theorem, which computes the probability of a probabilistic model M_i as

p(M_i | d) = p(d | M_i) p(M_i) / Σ_j p(d | M_j) p(M_j),

where p(M_i) is the prior PDF of the model M_i, and the evidence p(d | M_i) of model M_i is computed as a by-product of TMCMC.
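Since TMCMC returns log-evidences, the posterior model probabilities are best computed in log space. A minimal sketch, assuming uniform model priors unless others are supplied:

```python
import numpy as np

def model_probabilities(log_evidences, log_priors=None):
    """Posterior probabilities p(M_i | d) from log p(d | M_i) and log p(M_i)."""
    log_ev = np.asarray(log_evidences, dtype=float)
    if log_priors is None:                    # uniform prior over the candidate models
        log_priors = -np.log(len(log_ev)) * np.ones_like(log_ev)
    log_post = log_ev + log_priors
    log_post -= log_post.max()                # log-sum-exp shift for numerical stability
    p = np.exp(log_post)
    return p / p.sum()

# e.g. two models whose log-evidences differ by ln(10) are "an order of magnitude"
# apart: model_probabilities([0.0, np.log(10)]) -> [0.0909..., 0.9090...]
```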

S1.4 Hierarchical Bayesian models
In our work we follow the methodology developed in [9]. We assume that the data come split into N different datasets, d⃗ = {d_1, ..., d_N}. We assume that the probability of ϑ_i depends on a hyper-parameter ψ ∈ R^{N_ψ} and is given by a PDF p(ϑ_i | ψ, M), where M corresponds to the graph describing the relations between ψ, ϑ_i and d_i, see Figure S1. Our goal is to obtain samples from the posterior distribution

p(ϑ_i | d⃗, M) = ∫ p(ϑ_i | ψ, d⃗, M) p(ψ | d⃗, M) dψ.   (S4)

The dependency assumptions from Fig. S1 allow us to simplify p(ϑ_i | ψ, d⃗, M) = p(ϑ_i | ψ, d_i, M), and equation (S4) can be rewritten using Bayes' theorem:

p(ϑ_i | ψ, d_i, M) = p(d_i | ϑ_i, M) p(ϑ_i | ψ, M) / p(d_i | ψ, M).

Finally, the posterior distribution (S4) can be approximated as

p(ϑ_i | d⃗, M) ≈ (1/N_s) Σ_{k=1}^{N_s} p(ϑ_i | ψ^(k), d_i, M),

where ψ^(k) ∼ p(ψ | d⃗, M) and N_s is sufficiently large. Thus, in order to obtain ϑ_i samples, we first have to sample the probability distribution p(ψ | d⃗, M), which, according to Bayes' theorem, is equal to

p(ψ | d⃗, M) = p(d⃗ | ψ, M) p(ψ | M) / p(d⃗ | M),

where p(ψ | M) is the prior PDF on ψ and p(d⃗ | M) is the normalizing constant. Exploiting the dependency assumptions of Fig. S1 we see that

p(d⃗ | ψ, M) = ∏_{i=1}^{N} p(d_i | ψ, M),

and the likelihood of the i-th dataset can be expressed according to the total probability theorem as

p(d_i | ψ, M) = ∫ p(d_i | ϑ_i, M) p(ϑ_i | ψ, M) dϑ_i.   (S10)

Here we introduce the model M_i described in Fig. S1. The posterior distribution of this model will be used as the instrumental density for importance sampling. Under the modeling assumption p(d_i | ϑ_i, M) = p(d_i | ϑ_i, M_i) (see [9]) and the use of Bayes' theorem, equation (S10) is written as

p(d_i | ψ, M) = ∫ [p(ϑ_i | ψ, M) / p(ϑ_i | M_i)] p(d_i | ϑ_i, M_i) p(ϑ_i | M_i) dϑ_i,

or, equivalently, as

p(d_i | ψ, M) = p(d_i | M_i) ∫ [p(ϑ_i | ψ, M) / p(ϑ_i | M_i)] p(ϑ_i | d_i, M_i) dϑ_i.

Finally, equation (S10) can be approximated as

p(d_i | ψ, M) ≈ p(d_i | M_i) (1/N_s) Σ_{k=1}^{N_s} p(ϑ_i^(k) | ψ, M) / p(ϑ_i^(k) | M_i),

where ϑ_i^(k) ∼ p(ϑ_i | d_i, M_i) and N_s is sufficiently large. Note that in general N_s can be different for each dataset d_i. The advantage of this approach is that the likelihoods p(d_i | ϑ_i, M_i), i = 1, ..., N, which are the most expensive part of the computations, are not re-evaluated for each ψ.
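The importance-sampling estimator above can be sketched as follows: the posterior samples ϑ_i^(k) and the evidences p(d_i | M_i) are produced once (one TMCMC run per dataset), and afterwards p(d_i | ψ, M) is estimated for any ψ from stored quantities only. This is a minimal sketch under that assumption; the function and argument names are illustrative.

```python
import numpy as np

def log_dataset_likelihood(psi, log_hyper_pdf, theta_samples, log_prior, log_evidence):
    """log p(d_i | ψ, M) ≈ log p(d_i | M_i)
       + log mean_k [ p(ϑ_i^(k) | ψ, M) / p(ϑ_i^(k) | M_i) ],
    with ϑ_i^(k) ~ p(ϑ_i | d_i, M_i) drawn once in a preprocessing step."""
    log_w = np.array([log_hyper_pdf(t, psi) - log_prior(t) for t in theta_samples])
    m = log_w.max()
    return log_evidence + m + np.log(np.mean(np.exp(log_w - m)))  # stable log-mean-exp

def log_hyper_likelihood(psi, log_hyper_pdf, per_dataset):
    """log p(d⃗ | ψ, M): the product over independent datasets becomes a sum.
    per_dataset holds one (theta_samples_i, log_prior_i, log_evidence_i) per d_i."""
    return sum(log_dataset_likelihood(psi, log_hyper_pdf, *ds) for ds in per_dataset)
```

No expensive likelihood p(d_i | ϑ_i, M_i) appears in this function: it enters only through the stored samples and evidences, which is exactly why the hierarchical sampling over ψ stays cheap.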

S2 Information about hyper-parameter models for hierarchical inference
Inference for LJ 6-12. We assume that the parameters ϑ_j are distributed according to p(ϑ_j | ψ, M), and we consider the following two models: 1) uniform: p(ϑ_j | ψ, M) = U(ϑ_j | ψ_{2j−1}, ψ_{2j}), where U(· | a, b) is the uniform distribution with parameters a, b, and M is set to U; 2) log-normal: p(ϑ_j | ψ, M) = L(ϑ_j | ψ_{2j−1}, ψ_{2j}), where L(· | a, b) is the log-normal distribution with parameters a, b, and M is set to L.
The prior distribution on the hyper-parameters is modeled as independent uniform,

p(ψ_j | M) = U(ψ_j | a_j^M, b_j^M),   (S15)

where M ∈ {U, L} and the constants a_j^M, b_j^M are given in Table S1, along with the values of the log-evidences for the two models. According to the Bayesian model selection criterion, the model U is an order of magnitude more plausible and is therefore used for the further inference.

Table S1: HB_{12,R} inference: lower bound and width for each of the hyper-parameters defined in equation (S15). Log-evidences for each hyper-prior model are given in the last row.

Inference for LJ 6-p. We assume that the parameters ϑ_j are distributed according to p(ϑ_j | ψ, M), and we consider the following three models: 1) uniform: p(ϑ_j | ψ, M) = U(ϑ_j | ψ_{2j−1}, ψ_{2j}), where U(· | a, b) is the uniform distribution with parameters a, b, and M is set to U; 2) log-normal: p(ϑ_j | ψ, M) = L(ϑ_j | ψ_{2j−1}, ψ_{2j}), where L(· | a, b) is the log-normal distribution with parameters a, b, and M is set to L; 3) truncated normal: p(ϑ_j | ψ, M) = T(ϑ_j | ψ_{2j−1}, ψ_{2j}), where T(· | a, b) is the truncated normal distribution with parameters a, b, and M is set to T.
The prior distribution on the hyper-parameters is modeled as independent uniform,

p(ψ_j | M) = U(ψ_j | a_j^M, b_j^M),   (S17)

where M ∈ {U, L, T} and the constants a_j^M, b_j^M are given in Table S2, along with the values of the log-evidences for the three models. As can be seen, the uniform model is the most plausible one.

Table S2: HB_{p,R} inference: lower bound and width for each of the hyper-parameters defined in equation (S17). Log-evidences for each hyper-prior model are given in the last row.
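The three candidate densities p(ϑ_j | ψ, M) can be sketched as follows; the mapping of each (a, b) = (ψ_{2j−1}, ψ_{2j}) pair onto the scipy parameterizations (lower bound/width for U, log-scale/shape for L, mean/std truncated at zero for T) is an illustrative assumption, not a statement of the paper's exact conventions.

```python
import numpy as np
from scipy import stats

def conditional_pdf(theta_j, a, b, model):
    """p(ϑ_j | ψ, M) for one component ϑ_j, given (a, b) = (ψ_{2j-1}, ψ_{2j})."""
    if model == "U":   # uniform with lower bound a and width b (as in Tables S1-S2)
        return stats.uniform.pdf(theta_j, loc=a, scale=b)
    if model == "L":   # log-normal: assumed log-scale a and shape b
        return stats.lognorm.pdf(theta_j, s=b, scale=np.exp(a))
    if model == "T":   # normal with mean a, std b, assumed truncated to [0, inf)
        lo, hi = (0.0 - a) / b, np.inf
        return stats.truncnorm.pdf(theta_j, lo, hi, loc=a, scale=b)
    raise ValueError(f"unknown hyper-parameter model {model!r}")
```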

S3 Posterior parameter distribution for LJ 6-12 and LJ 6-p
This section presents the posterior distributions for the LJ 6-12 and LJ 6-p potentials obtained in HB TMCMC runs for each thermodynamic condition, as well as the distribution for the quantum dimer-based Bayesian inference. Each plot contains all the TMCMC samples of the last stage and consists of the following parts: histograms of the marginal distributions of the parameters are shown on the diagonal; projections of the samples onto all pairwise 2-d subspaces of the parameter space, colored by the log-likelihood values, are given above the diagonal; and the corresponding densities, constructed via a bivariate kernel estimate, are depicted below the diagonal. A green star shows the parameters from Ref. [10], a green square indicates the parameters from Ref. [11], and a green circle marks the parameters from Ref. [12,13].

Figure S6: LJ parameter distributions obtained in B_{p,R} for L3 with the prior for p restricted to [12, 14].
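The layout of these plots can be reproduced with a short matplotlib routine. This is a minimal sketch of the described arrangement, assuming `samples` is an (n, d) array from the last TMCMC stage, `loglik` holds the corresponding log-likelihood values, and `names` labels the parameters; it is not the plotting code used for the figures.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

def corner_plot(samples, loglik, names):
    d = samples.shape[1]
    fig, axes = plt.subplots(d, d, figsize=(2.2 * d, 2.2 * d))
    for i in range(d):
        for j in range(d):
            ax = axes[i, j]
            if i == j:        # diagonal: marginal histograms
                ax.hist(samples[:, i], bins=40, density=True)
            elif i < j:       # above: sample projections colored by log-likelihood
                ax.scatter(samples[:, j], samples[:, i], c=loglik, s=2)
            else:             # below: bivariate kernel density estimate
                kde = gaussian_kde(samples[:, [j, i]].T)
                xg, yg = np.meshgrid(
                    np.linspace(*np.percentile(samples[:, j], [1, 99]), 60),
                    np.linspace(*np.percentile(samples[:, i], [1, 99]), 60))
                zg = kde(np.vstack([xg.ravel(), yg.ravel()])).reshape(xg.shape)
                ax.contourf(xg, yg, zg)
            if i == d - 1:
                ax.set_xlabel(names[j])
            if j == 0:
                ax.set_ylabel(names[i])
    return fig
```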