Abstract
Molecular dynamics simulations provide theoretical insight into the microscopic behavior of condensed-phase materials and, as a predictive tool, enable computational design of new compounds. However, because of the large spatial and temporal scales of thermodynamic and kinetic phenomena in materials, atomistic simulations are often computationally infeasible. Coarse-graining methods allow larger systems to be simulated by reducing their dimensionality, propagating longer timesteps, and averaging out fast motions. Coarse-graining involves two coupled learning problems: defining the mapping from an all-atom representation to a reduced representation, and parameterizing a Hamiltonian over coarse-grained coordinates. We propose a generative modeling framework based on variational autoencoders to unify the tasks of learning discrete coarse-grained variables, decoding back to atomistic detail, and parameterizing coarse-grained force fields. The framework is tested on a number of model systems including single molecules and bulk-phase periodic simulations.
Introduction
Coarse-grained (CG) molecular modeling has been used extensively to simulate complex molecular processes with lower computational cost than all-atom simulations.^{1,2} By compressing the full atomistic model into a reduced number of pseudoatoms, CG methods focus on slow collective atomic motions while averaging out fast local motions. Current approaches generally focus on parameterizing coarse-grained potentials from atomistic simulations^{3} (bottom-up) or experimental statistics (top-down).^{4,5} The use of structure-based coarse-grained strategies has enabled important theoretical insights into polymer dynamics^{6,7,8,9} and lipid membranes^{10} at length scales that are otherwise inaccessible. Beyond efforts to parameterize CG potentials given a predefined all-atom to CG mapping, the selection of an appropriate mapping plays an important role in recovering consistent CG dynamics, structural correlations, and thermodynamics.^{11,12} A poor choice can lead to information loss in the description of slow collective interactions that are important for glass formation and transport. Systematic approaches to creating low-resolution protein models based on essential dynamics have been proposed,^{13} but a systematic bottom-up approach is missing for organic molecules of various sizes, resolutions, and functionalities. In general, the criteria for selecting CG mappings are usually based on a priori considerations and chemical intuition. Moreover, although there have been efforts to develop backmapping algorithms,^{14,15,16,17,18} the statistical connections needed to reversibly bridge resolutions across scales are missing. We aim to address such multiscale gaps in molecular dynamics using machine learning.
Recently, machine learning tools have facilitated the development of CG force fields^{19,20,21,22,23} and graph-based CG representations.^{24,25} Here we propose to use machine learning to optimize CG representations and deep neural networks to fit coarse-grained potentials from atomistic simulations. One of the central themes in learning theory is finding optimal hidden representations that capture complex statistical distributions to the highest possible fidelity using the fewest variables. We propose that finding coarse-grained variables can be formulated as a problem of learning latent variables of atomistic distributions. Recent work in unsupervised learning has shown great potential in uncovering the hidden structure of complex data.^{26,27,28,29} As a powerful unsupervised learning technique, variational autoencoders (VAEs) compress data through an information bottleneck^{30} that continuously maps an otherwise complex data set into a low-dimensional space and can probabilistically infer the real data distribution via a generating process. VAEs have been applied successfully to a variety of tasks, from image denoising^{31} to learning compressed representations for text,^{32} celebrity faces,^{33} arbitrary grammars,^{29,34} and molecular structures.^{35,36} Recent studies have used VAE-like structures to learn collective molecular motions by reconstructing time-lagged configurations^{37} and Markov state models.^{38} In the examples mentioned, compression to a continuous latent space is usually parameterized using neural networks. However, coarse-grained coordinates are latent variables in 3D space and need a specially designed parameterization to maintain the Hamiltonian structure for discrete particle dynamics.
Motivated by statistical learning theory and advances in discrete optimization, we propose an autoencoder-based generative modeling framework that (1) learns discrete coarse-grained variables in 3D space and decodes back to atomistic detail via geometric backmapping; (2) uses a reconstruction loss to help capture salient collective features from all-atom data; (3) regularizes the coarse-grained space with a semi-supervised mean instantaneous force minimization to obtain a smooth coarse-grained free-energy landscape; and (4) variationally finds the highly complex coarse-grained potential that matches the instantaneous mean force acting on the all-atom training data.
Results
Figure 1 shows the general schematics of the proposed framework, which is based on learning a discrete latent encoding by assigning atoms to coarse-grained particles. In Fig. 1b, we illustrate the computational graph of Gumbel-softmax reparameterization,^{39,40} which continuously relaxes categorical distributions for learning discrete variables. We first apply the coarse-grained autoencoders to trajectories of individual gas-phase molecules. By variationally optimizing encoder and decoder networks to minimize the reconstruction loss as in Eq. (1), the autoencoder picks up salient coarse-grained variables that minimize the fluctuation of encoded atomistic motions conditioned on a linear backmapping function. We adopt an instantaneous-force regularizer (described in the Methods section) to minimize the force fluctuations of the encoded space. This facilitates the learning of a coarse-grained mapping that corresponds to a smoother coarse-grained free-energy landscape. For the unsupervised learning task, we minimize the following loss function.
The first term on the right-hand side of Eq. (1) represents the atom-wise reconstruction loss, and the second term represents the average instantaneous mean force regularization. The relative weight \(\rho\) is a hyperparameter describing the relative importance of the force regularization term. The force regularization loss is discussed in the Methods section, and training details are given in the Supplementary Information.
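Although Eq. (1) itself is not reproduced here, the description above suggests the following structure (a sketch in our own notation, with \(E\), \(D\), and \(F_{\mathrm{inst}}\) denoting the encoder, decoder, and instantaneous coarse-grained force defined in the Methods; prefactors are those of the original Eq. (1)):

```latex
L \;=\; \underbrace{\mathbb{E}_{x}\!\left[\, \big\| x - D(E(x)) \big\|^{2} \,\right]}_{\text{reconstruction}}
\;+\; \rho\,\underbrace{\mathbb{E}_{x}\!\left[\, \big\| F_{\mathrm{inst}}(E(x)) \big\|^{2} \,\right]}_{\text{force regularization}}
```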
We show the unsupervised autoencoding process for gas-phase ortho-terphenyl (OTP) and aniline (\({{\mathrm{C}}}_{6}{{\mathrm{H}}}_{7}{\mathrm{N}}\)) in Fig. 2. The results show that the optimized reconstruction loss decreases with increasing coarse-grained resolution and that a small number of coarse-grained atoms can capture the overall collective motions of the underlying atomistic process. The reconstruction loss represents the information lost when coarse-grained particles represent collective atomistic motions conditioned on a deterministic backmapping. In the case of OTP, an intuitive 3-bead mapping is learned that partitions each of the phenyl rings. However, such an encoding loses the configurational information describing the relative rotation of the two side rings, resulting in decoded structures with higher error. When the number of coarse-grained degrees of freedom increases to 4, the additional beads encode more configurational information than three-bead models and can therefore decode back into atomistic coordinates with high accuracy. We further apply the autoencoding framework to a small peptide molecule to examine, as a function of CG resolution, the capacity of the coarse-grained representation to capture the critical collective variables of the underlying atomistic states. Although the model is not able to recover the arrangement of hydrogen atoms, the coarse-grained latent variables of 8 CG atoms can faithfully recover heavy-atom positions and represent different collective states in the Ramachandran map as the coarse-grained resolution is increased (Fig. 3).
The regularization term (second term in Eq. (1)) addresses the instantaneous mean forces that arise from transforming the all-atom forces. Inspired by gradient-domain regularization in deep learning^{41,42,43} and the role of fluctuations in the generalized Langevin framework,^{44} we minimize the average instantaneous force as a regularization term to facilitate the learning of a smooth coarse-grained free-energy surface and to average out fast dynamics. The factor \(\rho\) is a hyperparameter that controls the interplay between reconstruction loss and force regularization and is typically set to the highest value for which the CG encoding still uses all allotted dimensions.
In Figs 4, 5, 6, and 7, we demonstrate the applicability of the proposed framework to bulk simulations of liquids for small (\({{\rm{C}}}_{2}{{\rm{H}}}_{6}\), \({{\rm{C}}}_{3}{{\rm{H}}}_{8}\)) and long-chain (\({{\rm{C}}}_{24}{{\rm{H}}}_{50}\)) alkanes. Coarse-grained resolutions of 2 and 3 are used for ethane and propane, respectively, while two coarse-grained resolutions of 8 and 12 are used for the \({{\rm{C}}}_{24}{{\rm{H}}}_{50}\) alkane melt. We first train an autoencoder to obtain the latent coarse-grained variables for ethane, propane, and \({{\rm{C}}}_{24}{{\rm{H}}}_{50}\), and subsequently train a neural-network-based coarse-grained force field with additional excluded-volume interactions using force matching to minimize Eq. (15) (in the case of \({{\rm{C}}}_{24}{{\rm{H}}}_{50}\), only the backbone carbon atoms are represented). Coarse-grained simulations are then carried out at the same density and temperature as the atomistic simulation. We include the training details and model hyperparameters in the Supplementary Information. Coarse-grained forces are evaluated using PyTorch^{45} and an MD integrator based on ASE (Atomic Simulation Environment).^{46}
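To illustrate how such simulations can be driven, the sketch below runs velocity-Verlet steps with forces obtained from a potential by automatic differentiation, in the spirit of the PyTorch/ASE setup described above. The harmonic stand-in for \(V_{CG}\), the bead count, and the timestep are arbitrary assumptions for illustration, not the paper's trained model.

```python
import torch

def cg_force(z):
    """Force via autograd from a toy stand-in CG potential (the paper's
    V_CG is a neural network; a harmonic well is used here)."""
    z = z.detach().requires_grad_(True)
    V = 0.5 * (z ** 2).sum()
    return -torch.autograd.grad(V, z)[0]

def verlet_step(z, v, M, dt):
    """One velocity-Verlet step, as an ASE-style integrator would perform."""
    v = v + 0.5 * dt * cg_force(z) / M
    z = z + dt * v
    v = v + 0.5 * dt * cg_force(z) / M
    return z, v

torch.manual_seed(0)
M = 2.0                                # toy CG bead mass
z = torch.randn(4, 3)                  # 4 CG beads in 3D
v = torch.zeros(4, 3)
e0 = 0.5 * (z ** 2).sum()              # initial total energy (KE = 0)
for _ in range(100):
    z, v = verlet_step(z, v, M, dt=0.05)
e1 = 0.5 * (z ** 2).sum() + 0.5 * M * (v ** 2).sum()
```

With a symplectic integrator and a small timestep, the total energy `e1` stays close to `e0` over the trajectory.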
By minimizing the instantaneous force-matching loss term according to Eq. (15) in the Methods section, the neural network shows sufficient flexibility to reproduce reasonably accurate structural correlation functions. In the case of \({{\rm{C}}}_{24}{{\rm{H}}}_{50}\) (Figs 6 and 7), the neural network captures the bimodal bond-length distribution of the coarse-grained \({{\rm{C}}}_{24}{{\rm{H}}}_{50}\) chains and accurately reproduces the end-to-end distance distribution and the mapped monomer pair distribution function. The mean squared displacement plots for all systems demonstrate faster dynamics than the atomistic ground truth because of the loss of atomistic friction in the coarse-grained space. For \({{\rm{C}}}_{24}{{\rm{H}}}_{50}\), we also investigate the decoded C–C structural correlations shown in Fig. 8. The inter-chain structural correlation shows good agreement with the underlying atomistic ground truth, while the C–C bond distances are predicted to be shorter because the coarse-grained superatoms can only infer average carbon positions through the deterministic inference framework with a linear backmapping. The prospect of stochastic decoding functions to capture statistical upscaling is discussed below.
Discussion
Within the current framework, there are several promising directions for future research regarding both the supervised and unsupervised parts.
Here, we have presented a choice of deterministic encoder and decoder. However, such a deterministic CG mapping results, by construction, in an irreversible loss of information. This is reflected in the reconstruction of average all-atom structures instead of the reference instantaneous configurations. To infer the underlying atomistic distributions, past methods have used random structure generation followed by equilibration.^{14,15,16,17} By combining this with predictive inference for atomistic backmapping,^{18} a probabilistic autoencoder can learn a reconstruction probability distribution that reflects the thermodynamics of the degrees of freedom that were averaged out by the coarse-graining. Using this framework as a bridge between different scales of simulation, generative models can help build a better hierarchical understanding of multiscale simulations.
Furthermore, neural network potentials provide a powerful fitting framework to capture many-body correlations. The force-matching approach does not guarantee the recovery of individual pair correlation functions derived from full atomistic trajectories^{12,47} because the cross-correlations among coarse-grained degrees of freedom are not explicitly incorporated. More advanced fitting methods can be incorporated into the current neural network framework to address the learning of structural cross-correlations, including iterative force matching^{47} and the relative entropy method.^{48}
Methods based on force matching, like other bottom-up approaches such as the relative entropy method, attempt to reproduce structural correlation functions at one point in thermodynamic space. As such, they are not guaranteed to capture non-equilibrium transport properties^{12,49} and are not necessarily transferable among different thermodynamic conditions.^{12,50,51,52,53} The data-driven approach we propose enables learning over different thermodynamic conditions. In addition, this framework opens new routes to understanding how the coarse-grained representation influences transport properties by training on time-series data. A related example in the literature is the use of a time-lagged autoencoder^{37} to learn a latent representation that best captures molecular kinetics.
In summary, we propose to treat the coarse-grained coordinates as latent variables which can be sampled with coarse-grained molecular dynamics. By regularizing the latent space with force regularization, we train the encoding mapping, a deterministic decoding, and a coarse-grained potential that can be used to simulate larger systems for longer times and thus accelerate molecular dynamics simulations. Our work also enables the use of statistical learning as a basis to bridge across multiscale coarse-grained simulations.
Methods
Here we introduce the autoencoding framework from the generative modeling point of view. The essential idea is to treat coarse-grained coordinates as a set of latent variables that are the most predictive of the atomistic distribution while having a smooth underlying free-energy landscape. We show that this is achieved by minimizing the reconstruction loss and the instantaneous force regularization term. Moreover, under the variational autoencoding framework, we can understand force matching as the minimization of the relative entropy between coarse-grained and atomistic distributions in the gradient domain.
Coarse-graining autoencoding
The essential idea in generative modeling is to maximize the likelihood of the data under the generative process:
where z are the latent variables that carry the essential information of the distributions and x represents the samples observed in the data. Variational autoencoders maximize the likelihood of the observed samples by maximizing the evidence lower bound (ELBO):
where \({Q}_{\phi }(z|x)\) encodes the data into latent variables, \({P}_{D}(x|z)\) is the generative process parameterized by D, and \(P(z)\) is the prior distribution (usually a multivariate Gaussian with a diagonal covariance matrix) which imposes a statistical structure over the latent variables. Maximizing the ELBO by propagating gradients through the probability distributions provides a parameterizable way of inferring complicated distributions of molecular dynamics.
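For reference, the ELBO referred to here takes the standard textbook form (symbols matched to the text):

```latex
\log P(x) \;\geq\; \mathbb{E}_{Q_{\phi}(z|x)}\!\left[ \log P_{D}(x\,|\,z) \right]
\;-\; D_{\mathrm{KL}}\!\left( Q_{\phi}(z\,|\,x) \,\big\|\, P(z) \right)
```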
Similar to variational autoencoders with constraints on the latent space, a coarse-grained latent space should preserve the structure of the molecular mechanics phase space. Noid et al.^{3} have studied the general requirements for a physically rigorous encoding function. In order to address those requirements, the autoencoder is trained to optimize the reconstruction of atomistic configurations by propagating them through a low-dimensional bottleneck in Cartesian coordinates. Unlike most instances of VAEs, the dimensions of the CG latent space have physical meaning. Since the CG space needs to represent the system in position and momentum space, latent dimensions need to correspond to real-space Cartesian coordinates and maintain the essential structural information of molecules.
We make our encoding function a linear projection in Cartesian space \(E(x):{{\mathbb{R}}}^{3n}\to {{\mathbb{R}}}^{3N}\) where n is the number of atoms and N is the desired number of coarsegrained particles.
Let x be the atomistic coordinates and z be the coarsegrained coordinates. The encoding function should satisfy the following requirements:^{3,54}
 1.
\({z}_{ik}={\sum }_{j=1}^{n}{E}_{ij}{x}_{jk}\), with \({z}_{i}\in {{\mathbb{R}}}^{3}\), \(i=1\ldots N\), \(k=1,2,3\),
 2.
\({\sum }_{j}{E}_{ij}=1\:{\mathrm{and}}\;{E}_{ij}\ge 0\)
 3.
Each atom contributes to at most one coarsegrained variable z
where \({E}_{ij}\) defines the assignment matrix to coarse-grained variables, j is the atomic index, i is the coarse-grained atom index, and k represents the Cartesian coordinate index. Requirement (2) defines the coarse-grained variables to be a weighted geometric average of the Cartesian coordinates of the contributing atoms. In order to maintain consistency in momentum space after the coarse-grained mapping, the coarse-grained masses are redefined as \({M}_{i}={({\sum }_{j}\frac{{E}_{ij}^{2}}{{m}_{j}})}^{-1}\)^{3,54} (\({m}_{j}\) is the mass of atom j). This definition of mass is a corollary of requirement (3).
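The requirements above can be sketched numerically. The 4-atom molecule, two-bead mapping, and uniform per-bead weights below are hypothetical illustrations:

```python
import numpy as np

def cg_encode(x, E):
    """Map atomistic coordinates x (n, 3) to CG coordinates z = E x (N, 3).

    E is the (N, n) assignment matrix: rows sum to 1, entries >= 0, and each
    column has at most one non-zero entry (each atom maps to one CG bead).
    """
    return E @ x

def cg_masses(E, m):
    """CG masses M_i = (sum_j E_ij^2 / m_j)^-1 for momentum-space consistency."""
    return 1.0 / ((E ** 2) / m).sum(axis=1)

# Hypothetical 4-atom molecule mapped to 2 beads with uniform weights.
E = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5]])
m = np.array([12.0, 1.0, 12.0, 1.0])   # e.g., C and H masses
x = np.random.randn(4, 3)              # toy atomistic configuration
z = cg_encode(x, E)                    # (2, 3) CG coordinates
M = cg_masses(E, m)                    # (2,) CG masses
```

With uniform weights, each bead sits at the geometric average of its atoms, and the mass formula reduces to \((0.25/12 + 0.25/1)^{-1}\) per bead here.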
The encoder function parameters are initialized randomly as atom-wise vectors \(\vec{\phi_{j}}\), with elements \(\phi_{ij}\) representing the parameter for assigning the individual atom j to coarse-grained atom i; \(\phi_{ij}\) is further reparameterized to obtain coarse-graining encoding weights satisfying the requirements shown above. The goal of the parameterizable coarse-graining encoding function is to learn a one-hot assignment from each atom to a coarse-grained variable. Its weights are obtained by normalizing over the total number of contributing atoms per coarse-grained atom (an alternative is to normalize based on atomic masses or charges), thus satisfying requirement (2):
\(\vec{{C}_{j}}\) is the coarse-graining one-hot assignment vector for atom j obtained with Gumbel-softmax reparameterization, with each vector element \(C_{ij}\) representing the assignment of atom j to coarse-grained atom i, so that requirement (3) is automatically satisfied. Gumbel-softmax reparameterization is a continuous relaxation of Gumbel-max reparameterization that enables differentiable approximation via the softmax function.^{40} Similar parameterization techniques include concrete distributions,^{39} REBAR,^{55} and RELAX.^{56} Gumbel-softmax reparameterization has been applied in various machine learning scenarios involving learning discrete structures,^{57} propagating discrete policy gradients in reinforcement learning,^{58} and generating context-free grammars.^{34} The continuously relaxed version of Eq. (5) is:
where \({g}_{ij}\) is sampled from the Gumbel distribution via the inverse transformation \({g}_{ij}=-{\mathrm{log}}(-{\mathrm{log}}({u}_{ij}))\), where \({u}_{ij}\) is sampled from a uniform distribution between 0 and 1. During training, \(\tau\) is gradually decreased with the training epoch, and the one-hot categorical encoding is achieved in the limit of small \(\tau\). Therefore, the encoding distribution \(Q(z|x)\) is a linear projection operator parameterized by discrete atom-wise categorical variables.
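A minimal sketch of the Gumbel-softmax relaxation described above; the toy logits and sizes are assumptions (in the actual model, \(\phi\) is learned by gradient descent and \(\tau\) is annealed over epochs):

```python
import numpy as np

def gumbel_softmax(phi, tau, rng):
    """Continuously relaxed one-hot assignment over CG beads.

    phi: (n, N) assignment logits for n atoms and N CG beads; as the
    temperature tau -> 0 each row approaches a one-hot vector.
    """
    u = rng.uniform(1e-12, 1.0, size=phi.shape)
    g = -np.log(-np.log(u))                      # Gumbel(0, 1) noise
    logits = (phi + g) / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
phi = rng.normal(size=(6, 3))                    # 6 atoms, 3 beads (toy sizes)
C_soft = gumbel_softmax(phi, tau=2.0, rng=rng)   # smooth, far from one-hot
# Strongly separated logits at low temperature give (nearly) one-hot rows:
C_hard = gumbel_softmax(10 * np.eye(3)[[0, 0, 1, 1, 2, 2]], tau=0.01, rng=rng)
```

Each row of the output is a valid categorical distribution (non-negative, summing to 1), so the soft assignment can be normalized into encoding weights that satisfy requirement (2).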
For the generation of atomistic coordinates conditioned on coarse-grained coordinates, we opt for a simple decoding approach via geometric projection using a matrix \({\bf{D}}\) of dimension n by N that maps coarse-grained variables back to the original space, so that \({\hat{x}}_{jk}={\sum }_{i=1}^{N}{{\bf{D}}}_{ji}{z}_{ik}\), where \(\hat{x}\) are the reconstructed atomistic coordinates. Hence, both the encoding and decoding mappings are deterministic. However, deterministic reconstruction via a low-dimensional space leads to irreversible information loss that is analogous to the mapping entropy introduced by Shell et al.^{48} In our experiments, by assuming \({P}_{D}(x|z)\) is Gaussian, the reconstruction loss yields the term-by-term mean-squared error and is understood as a Gaussian approximation to the mapping entropy (scaled by the variance) defined by Shell et al.:
where \(\Omega (E(x))\) is the configuration-space volume that maps onto the same coarse-grained coordinates. The latent variable framework provides a clear parameterizable objective whose optimization minimizes the information loss due to coarse-graining, using the following objective as the reconstruction loss.
Hence, we present an analogous interpretation of the reconstruction loss in Eq. (3), but in the Cartesian space of coarse-grained pseudoatoms in molecular dynamics. This loss can be optimized with Algorithm 1. A regularized version is introduced in section C.
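Algorithm 1 is not reproduced here, but a minimal sketch of the reconstruction-loss optimization might look as follows. A plain softmax relaxation stands in for the annealed Gumbel-softmax encoder, and all sizes, learning rates, and the random "trajectory" are illustrative assumptions:

```python
import torch

torch.manual_seed(0)
n, N = 10, 3                                  # atoms and CG beads (toy sizes)
phi = torch.randn(n, N, requires_grad=True)   # atom-to-bead assignment logits
D = torch.randn(n, N, requires_grad=True)     # linear decoder matrix
opt = torch.optim.Adam([phi, D], lr=5e-2)

x = torch.randn(200, n, 3)                    # toy trajectory of configurations
losses = []
for epoch in range(200):
    C = torch.softmax(phi / 0.5, dim=1)       # relaxed one-hot assignments
    E = C.t() / C.t().sum(dim=1, keepdim=True)  # normalize: sum_j E_ij = 1
    z = torch.einsum('ij,bjk->bik', E, x)     # encode to CG coordinates
    xhat = torch.einsum('ji,bik->bjk', D, z)  # decode back to atomistic
    loss = ((x - xhat) ** 2).mean()           # atom-wise reconstruction loss
    opt.zero_grad(); loss.backward(); opt.step()
    losses.append(loss.item())
```

Gradient descent jointly adapts the assignment logits and the decoder, driving the reconstruction loss down from its random initialization.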
Variational force matching
The physical meaning of the regularization term has a natural analogy to the minimization of the Kullback–Leibler divergence (KL divergence for short, also called relative entropy) in coarse-grained modeling to reduce the discrepancy between mapped atomistic distributions and coarse-grained distributions conditioned on a Boltzmann prior. The distribution function of coarse-grained variables \(p(z)\) and the corresponding many-body potential of mean force \(A(z)\) are:
where \(V(x)\) is the atomistic potential energy function and \(E(x)\) is the encoding function defined by requirement (2). Unlike the VAE, which assumes a prior Gaussian structure in the latent space, the coarse-grained latent prior (1) is variationally determined by fitting the coarse-grained energy function, and (2) has no closed-form expression for the KL loss. Recovering the true \({P}_{CG}(z)\) requires constrained sampling to obtain the coarse-grained free energy. To bypass such difficulties, we parameterize the latent distributions by matching the instantaneous mean forces. In order to learn the coarse-grained potential energy \({V}_{CG}\) as a function of the also-learned coarse-grained coordinates, we propose an instantaneous force-matching functional that is conditioned on the encoder. Unlike the KL regularization loss in the context of training a VAE, which is straightforward to evaluate, the underlying coarse-grained distributions are intractable. However, matching the gradient of the log-likelihood of mapped coarse-grained distributions (the mean force) is computationally more feasible. Training potentials from forces has a series of advantages: (i) the explicit contribution on every atom is available, rather than just pooled contributions to the energy; (ii) it is easier to learn smooth and energy-conserving potential energy surfaces;^{59} and (iii) instantaneous dynamics, which represent a trade-off in coarse-graining, can be better captured. Forces are always available if the training data come from molecular dynamics simulations, and for common electronic structure methods based on density functional theory, forces can be calculated at nearly the same cost as self-consistent energies.
The force-matching approach builds on the idea that the average force generated by the coarse-grained potential \({V}_{CG}\) should reproduce the coarse-grained atomistic forces from thermodynamic ensembles.^{19,60,61}
Given an atomistic potential energy function \(V(x)\) with partition function Z, the probability distribution of atomistic configurations is:
The distribution function of the coarse-grained variables \({P}_{CG}(z)\) and the corresponding many-body potential of mean force \(A(z)\) are:
The mean force of the coarse-grained variables is the average of the instantaneous forces conditioned on \(E(x)=z\),^{54,62} assuming the coarse-grained mapping is linear:
where \(F(z)\) is the mean force and \({\bf{b}}\) represents a family of possible vectors such that \({{\bf{b}}}^{\top }\nabla E(x)\ne 0\). We further define \({F}_{{\mathrm{inst}}}(z)=-{\bf{b}}\nabla V(x)\) to be the instantaneous force; its conditional expectation is equal to the mean force \(F(z)\). It is important to note that \({F}_{{\mathrm{inst}}}(z)\) is not unique and depends on the specific choice of \({\bf{b}}\),^{61,62,63} but the conditional averages return the same mean force. Among the possible choices, we take \({\bf{b}}\) to be a function of \(\nabla E(x)\), which is a well-studied choice,^{61,63} so that:
where \({\bf{b}}\) is a function of \(\nabla E(x)\). In the case of coarse-graining encodings, \({\bf{b}}={\bf{C}}\), where \({\bf{C}}\) is the encoding matrix formed by concatenating atom-wise one-hot vectors as defined in Eq. (6). We adopt the force-matching scheme introduced by Izvekov et al.,^{60,64} in which the mean-squared error is used to match the mean force, and the "coarse-grained force" is the negative gradient of the coarse-grained potential. The optimizing functional, developed based on Izvekov et al., is
where \(\theta\) are the parameters of \({V}_{CG}\), and \(\nabla {V}_{CG}\) represents the "coarse-grained forces", which can be obtained from automatic differentiation as implemented in open-source packages like PyTorch.^{45} However, computing the mean force F would require constrained dynamics^{61} to obtain the average of the fluctuating microscopic forces. Following Zhang et al.,^{19} the force-matching functional can alternatively be formulated by treating the instantaneous mean force as an instantaneous observable whose well-defined average is the mean force \(F(z)\):
on the condition that \({{\mathbb{E}}}_{z}[{F}_{{\mathrm{inst}}}]=F(z)\). The original variational functional becomes instantaneous in nature and can be reformulated as the following minimization target:
Instead of matching mean forces that need to be obtained from constrained dynamics, our model minimizes \({L}_{{\mathrm{inst}}}\) with respect to \({V}_{CG}(z)\) and \(E(x)\). With some algebra, \({L}_{{\mathrm{inst}}}\) can be shown to be related to L: \({L}_{{\mathrm{inst}}}=L+{\mathbb{E}}[\epsilon {(E(x))}^{2}]\).^{19} This functional provides a variational way to find a CG mapping and its associated force-field functions.
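The instantaneous force-matching idea can be sketched as follows. The harmonic stand-in for the neural network \(V_{CG}\) and the batch shapes are assumptions, chosen so that the exactly matching potential gives zero loss:

```python
import torch

def fm_loss(V_cg, z, f_inst):
    """Instantaneous force-matching loss: match -dV_CG/dz against the
    mapped instantaneous atomistic forces (mean-squared error)."""
    z = z.detach().requires_grad_(True)
    energy = V_cg(z).sum()
    cg_force = -torch.autograd.grad(energy, z, create_graph=True)[0]
    return ((cg_force - f_inst) ** 2).mean()

torch.manual_seed(0)
# Toy harmonic CG potential standing in for the neural network V_CG.
V_cg = lambda z: 0.5 * (z ** 2).sum(dim=-1)
z = torch.randn(10, 2, 3)        # batch of 10 frames, 2 CG beads
f_inst = -z.clone()              # for V = z^2/2 the exact force is -z
loss = fm_loss(V_cg, z, f_inst)  # zero when the potential matches the forces
```

In training, `loss` would be backpropagated into the parameters of `V_cg` (and, with `create_graph=True`, jointly into the encoder).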
Instantaneous mean force regularization
Here we introduce the gradient regularization term that is designed to minimize the fluctuation of the mean forces. Similar gradient regularization methods have been applied in supervised computer vision tasks to smooth the loss landscape for improved model generalization.^{41,42,43} In coarse-grained modeling, minimizing the forces is important for learning the slow degrees of freedom and a smoother free-energy surface.
Based on the generalized Langevin equation, the difference between the true mean force and instantaneous mean force \(\epsilon (E(x))\) can be approximated as:^{44,65}
where \(\gamma\) is the friction coefficient, \(\beta (\tau )\) is the memory kernel, \(\widetilde{\eta }(t)\) is the colored Gaussian noise, and \({\sum }_{j}{C}_{ij}{\eta }_{j}\) is the mapped atomistic white noise. To avoid the need for special dynamics when running ensemble calculations, it is desirable to minimize the memory and fluctuation terms. A related example is the work by Guttenberg et al.,^{44} who compare memory heuristics among coarse-grained mapping functions. The objective we propose can be optimized by gradient descent to continuously explore the space of coarse-grained mappings without iterating over combinatorial possibilities. We perform this regularization by minimizing the mean-squared instantaneous forces over minibatches of atomistic trajectories to optimize the CG mappings.
In practice, this regularization loss is combined with \({L}_{ae}\) with weight \(\rho\), added onto the reconstruction loss, to obtain the coarse-grained mapping. We discuss the practical effect of including the regularization term in the Supplementary Information.
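A sketch of the regularizer under the simple \({\bf{b}}={\bf{C}}\) choice from the variational force-matching section; the shapes and the toy cancellation example are illustrative assumptions:

```python
import numpy as np

def force_reg(C, atom_forces):
    """Mean-squared instantaneous CG force over a batch of frames.

    C: (N, n) mapping built from the one-hot assignments (the simple
    b = C choice); atom_forces: (batch, n, 3) atomistic forces.
    """
    f_cg = np.einsum('ij,bjk->bik', C, atom_forces)
    return float((f_cg ** 2).mean())

# Two atoms mapped to one bead: opposite atomistic forces cancel, so the
# instantaneous CG force (and hence the regularizer) vanishes.
C = np.array([[1.0, 1.0]])
f = np.array([[[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]])
reg = force_reg(C, f)
```

Mappings that group atoms whose fast force fluctuations cancel score lower, which is exactly the behavior the regularizer rewards when optimizing the CG assignment.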
Data availability
Data for training the model is available upon request. An implementation of the algorithm described in the paper is available at https://github.com/learningmatter-mit/Coarse-Graining-Auto-encoders.
References
 1.
Agostino, M. D., Risselada, H. J., Lürick, A., Ungermann, C. & Mayer, A. A tethering complex drives the terminal stage of SNAREdependent membrane fusion. Nature 551, 634–638 (2017).
 2.
Huang, D. M. et al. Coarse-grained computer simulations of polymer/fullerene bulk heterojunctions for organic photovoltaic applications. J. Chem. Theory Comput. 6, 1–11 (2010).
 3.
Noid, W. G. et al. The multiscale coarse-graining method. I. A rigorous bridge between atomistic and coarse-grained models. J. Chem. Phys. 128, 243116 (2008).
 4.
Marrink, S. J., Risselada, H. J., Yefimov, S., Tieleman, D. P. & De Vries, A. H. The MARTINI force field: coarse grained model for biomolecular simulations. J. Phys. Chem. B 111, 7812–7824 (2007).
 5.
Periole, X., Cavalli, M., Marrink, S.J. & Ceruso, M. A. Combining an elastic network with a coarsegrained molecular force field: structure, dynamics, and intermolecular recognition. J. Chem. Theory Comput. 5, 2531–2543 (2009).
 6.
Wijesinghe, S., Perahia, D. & Grest, G. S. Polymer topology effects on dynamics of comb polymer melts. Macromolecules 51, 7621–7628 (2018).
 7.
Salerno, K. M., Agrawal, A., Peters, B. L., Perahia, D. & Grest, G. S. Dynamics in entangled polyethylene melts. Eur. Phys. J. Spec. Topics 225, 1707–1722 (2016).
 8.
Salerno, K. M., Agrawal, A., Perahia, D. & Grest, G. S. Resolving dynamic properties of polymers through coarse-grained computational studies. Phys. Rev. Lett. 116, 058302 (2016).
 9.
Xia, W. et al. Energy renormalization for coarse-graining polymers having different segmental structures. Sci. Adv. 5, eaav4683 (2019).
 10.
Vögele, M., Köfinger, J. & Hummer, G. Hydrodynamics of diffusion in lipid membrane simulations. Phys. Rev. Lett. 120, 268104 (2018).
 11.
Rudzinski, J. F. & Noid, W. G. Investigation of coarse-grained mappings via an iterative generalized Yvon–Born–Green method. J. Phys. Chem. B 118, 8295–8312 (2014).
 12.
Noid, W. G. Perspective: coarse-grained models for biomolecular systems. J. Chem. Phys. 139, 090901 (2013).
 13.
Zhang, Z. et al. A systematic methodology for defining coarse-grained sites in large biomolecules. Biophys. J. 95, 5073–5083 (2008).
 14.
Peng, J., Yuan, C., Ma, R. & Zhang, Z. Backmapping from multi-resolution coarse-grained models to atomic structures of large biomolecules by restrained molecular dynamics simulations using Bayesian inference. J. Chem. Theory Comput. 15, 3344–3353 (2019).
 15.
Chen, L. J., Qian, H. J., Lu, Z. Y., Li, Z. S. & Sun, C. C. An automatic coarse-graining and fine-graining simulation method: application on polyethylene. J. Phys. Chem. B 110, 24093–24100 (2006).
 16.
Lombardi, L. E., Martí, M. A. & Capece, L. CG2AA: backmapping protein coarse-grained structures. Bioinformatics 32, 1235–1237 (2016).
 17.
Machado, M. R. & Pantano, S. SIRAH tools: mapping, backmapping and visualization of coarse-grained models. Bioinformatics 32, 1568–1570 (2016).
 18.
Schöberl, M., Zabaras, N. & Koutsourelakis, P.-S. Predictive coarse-graining. J. Comput. Phys. 333, 49–77 (2017).
 19.
Zhang, L., Han, J., Wang, H., Car, R. & E, W. DeePCG: constructing coarse-grained models via deep neural networks. J. Chem. Phys. 149, 034101 (2018).
 20.
Bejagam, K. K., Singh, S., An, Y. & Deshmukh, S. A. Machine-learned coarse-grained models. J. Phys. Chem. Lett. 9, 4667–4672 (2018).
 21.
Lemke, T. & Peter, C. Neural network based prediction of conformational free energies – a new route toward coarse-grained simulation models. J. Chem. Theory Comput. 13, 6213–6221 (2017).
 22.
Wang, J. et al. Machine learning of coarsegrained molecular dynamics force fields. ACS Cent. Sci. 5, 755–767 (2019).
 23.
Boninsegna, L., Gobbo, G., Noé, F. & Clementi, C. Investigating molecular kinetics by variationally optimized diffusion maps. J. Chem. Theory Comput. 11, 5947–5960 (2015).
 24.
Webb, M. A., Delannoy, J.Y. & de Pablo, J. J. Graphbased approach to systematic molecular coarsegraining. J. Chem. Theory Comput. 15, 1199–1208 (2018).
 25.
Chakraborty, M., Xu, C. & White, A. D. Encoding and selecting coarsegrain mapping operators with hierarchical graphs. J. Chem. Phys. 149, 134106 (2018).
 26.
Tolstikhin, I., Bousquet, O., Gelly, S., Schölkopf, B. & Schoelkopf, B. Wasserstein AutoEncoders. In Proc. International Conference on Learning Representations (2018).
 27.
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In Proc. International Conference on Learning Representations (2015).
 28.
Goodfellow, I. J. et al. Generative adversarial networks. In Proc. Advances in Neural Information Processing Systems (2014).
 29.
Kusner, M. J., Paige, B. & HernándezLobato, J. M. Grammar Variational Autoencoder. In Proc. International Conference on Machine Learning (2017).
 30.
Tishby, N. & Zaslavsky, N. Deep Learning and the Information Bottleneck Principle. https://arxiv.org/abs/1503.02406 (2015).
 31.
Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y. & Manzagol, P.A. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010).
 32.
Bowman, S. R. et al. Generating sentences from a continuous space. In Proc. SIGNLL Conference on Computational Natural Language Learning (CONLL) (2016).
 33.
Liu, Z., Luo, P., Wang, X. & Tang, X. Deep learning face attributes in the wild. In Proc. International Conference on Computer Vision (ICCV) (2015).
 34.
Kusner, M. J. & HernándezLobato, J. M. GANS for sequences of discrete elements with the gumbelsoftmax distribution. Preprint at https://arxiv.org/abs/1611.04051 (2016).
 35.
GómezBombarelli, R. et al. Automatic chemical design using a datadriven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).
 36.
Jin, W., Barzilay, R. & Jaakkola, T. Junction tree variational autoencoder for molecular graph generation. In Proc. International Conference on Machine Learning (2018).
 37.
Wehmeyer, C. & Noé, F. Timelagged autoencoders: deep learning of slow collective variables for molecular kinetics. J. Chem. Phys. 148, 241703 (2018).
 38.
Mardt, A., Pasquali, L., Wu, H. & Noé, F. VAMPnets for deep learning of molecular kinetics. Nat. Commun. 9, 5 (2018).
 39.
Maddison, C. J., Mnih, A. & Teh, Y. W. The concrete distribution: a continuous relaxation of discrete random variables. In Proc. International Conference on Learning Representations (2016).
 40.
Jang, E., Gu, S. & Poole, B. Categorical reparameterization with gumbelsoftmax. In Proc. International Conference on Learning Representations (2017).
 41.
Drucker, H. & LeCun, Y. Improving generalization performance using double backpropagation. IEEE Trans. Neural Netw. 3, 991–997 (1992).
 42.
Varga, D., Csiszárik, A. & Zombori, Z. Gradient regularization improves accuracy of discriminative models. arXiv https://arxiv.org/abs/1712.09936 (2017).
 43.
Hoffman, J., Roberts, D. A. & Yaida, S. Robust learning with jacobian regularization. arXiv https://arxiv.org/abs/1908.02729 (2019).
 44.
Guttenberg, N. et al. Minimizing memory as an objective for coarsegraining. J. Chem. Phys. 138, 094111 (2013).
 45.
Paszke, A. et al. Automatic differentiation in pytorch. In NIPSWorkshop (2017).
 46.
HjorthLarsen, A. et al. The atomic simulation environment  A Python library for working with atoms. Matter 29, 273002 (2017).
 47.
Lu, L., Dama, J. F. & Voth, G. A. Fitting coarsegrained distribution functions through an iterative forcematching method. J. Chem. Phys. 139, 121906 (2013).
 48.
Shell, M. S. CoarseGraining With The Relative Entropy. In Advances in Chemical Physics, Vol. 161, p. 395–441 (WileyBlackwell, 2016).
 49.
Davtyan, A., Dama, J. F., Voth, G. A. & Andersen, H. C. Dynamic force matching: a method for constructing dynamical coarsegrained models with realistic time dependence. J. Chem. Phys. 142, 154104 (2015).
 50.
Carbone, P., Varzaneh, H. A. K., Chen, X. & MüllerPlathe, F. Transferability of coarsegrained force fields: the polymer case. J. Chem. Phys. 128, 64904 (2008).
 51.
Krishna, V., Noid, W. G. & Voth, G. A. The multiscale coarsegraining method. IV. Transferring coarsegrained potentials between temperatures. J. Chem. Phys. 131, 24103 (2009).
 52.
Xia, W. et al. Energy renormalization for coarsegraining the dynamics of a model glassforming liquid. J. Phys. Chem. B 122, 2040–2045 (2018).
 53.
Xia, W. et al. Energyrenormalization for achieving temperature transferable coarsegraining of polymer dynamics. Macromolecules 50, 8787–8796 (2017).
 54.
Darve, E. Numerical methods for calculating the potential of mean force. In New Algorithms for Macromolecular Simulation, p. 213–249 (SpringerVerlag, Berlin/Heidelberg, 2006).
 55.
Tucker, G., Mnih, A., Maddison, C. J., Lawson, D. & SohlDickstein, J. REBAR lowvariance, unbiased gradient estimates for discrete latent variable models. In Advances in Neural Information Processing Systems. Vol. 2017, p. 2628–2637 (2017).
 56.
Grathwohl, W., Choi, D., Wu, Y., Roeder, G. & Duvenaud, D. Backpropagation through the Void: optimizing control variates for blackbox gradient estimation. In Proc. International Conference on Learning Representations (2017).
 57.
Van Den Oord, A., Vinyals, O. & Kavukcuoglu, K. Neural discrete representation learning. In. Advances in Neural Information Processing Systems. Vol. 2017, p. 6307–6316 (2017).
 58.
Wu, Y., Wu, Y., Gkioxari, G. & Tian, Y. Building generalizable agents with a realistic and rich 3D environment. https://arxiv.org/abs/1801.02209 (2018).
 59.
Chmiela, S. et al. Machine learning of accurate energyconserving molecular force fields. Sci. Adv. 3, e1603015 (2017).
 60.
Izvekov, S. & Voth, G. A. A multiscale coarsegraining method for biomolecular systems. J. Phys. Chem. B 109, 2469–2473 (2005).
 61.
Ciccotti, G., Kapral, R. & VandenEijnden, E. Blue moon sampling, vectorial reaction coordinates, and unbiased constrained dynamics. ChemPhysChem 6, 1809–1814 (2005).
 62.
Kalligiannaki, E., Harmandaris, V., Katsoulakis, M. A. & Plecháč, P. The geometry of generalized force matching and related information metrics in coarsegraining of molecular systems. J. Chem. Phys. 143, 84105 (2015).
 63.
DenOtter, W. K. Thermodynamic integration of the free energy along a reaction coordinate in Cartesian coordinates. J. Chem. Phys. 112, 7283–7292 (2000).
 64.
Izvekov, S. & Voth, G. A. Multiscale coarsegraining of mixed phospholipid/cholesterol bilayers. J. Chem. Theory Comput. 2, 637–648 (2006).
 65.
Lange, O. F. & Grubmüller, H. Collective Langevin dynamics of conformational motions in proteins. J. Chem. Phys. 124, 214903 (2006).
Acknowledgements
W.W. thanks Toyota Research Institute for financial support. R.G.B. thanks MIT DMSE and Toyota Faculty Chair for support. W.W. and R.G.B. thank Prof. Adam P. Willard (Massachusetts Institute of Technology) and Prof. Salvador Leon Cabanillas (Universidad Politécnica de Madrid) for helpful discussions. W.W. thanks Mr. William H. Harris for proofreading the manuscript and helpful discussions.
Author information
Contributions
R.G.B. conceived the project; W.W. wrote the computer software and carried out the simulations with contributions from R.G.B.; both authors wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wang, W. & Gómez-Bombarelli, R. Coarse-graining autoencoders for molecular dynamics. npj Comput. Mater. 5, 125 (2019). https://doi.org/10.1038/s41524-019-0261-5
Further reading
Is preservation of symmetry necessary for coarse-graining? Physical Chemistry Chemical Physics (2020)
Autonomous Discovery in the Chemical Sciences, Part I: Progress. Angewandte Chemie (2020)
Ensemble learning of coarse-grained molecular dynamics force fields with a kernel approach. The Journal of Chemical Physics (2020)
Coarse-Grained Models of RNA Nanotubes for Large Time Scale Studies in Biomedical Applications. Biomedicines (2020)
Backmapping coarse-grained macromolecules: An efficient and versatile machine learning approach. The Journal of Chemical Physics (2020)