Differentiable sampling of molecular geometries with uncertainty-based adversarial attacks

Neural network (NN) interatomic potentials provide fast prediction of potential energy surfaces, closely matching the accuracy of the electronic structure methods used to produce the training data. However, NN predictions are only reliable within well-learned training domains, and show volatile behavior when extrapolating. Uncertainty quantification methods can flag atomic configurations for which prediction confidence is low, but arriving at such uncertain regions requires expensive sampling of the NN phase space, often using atomistic simulations. Here, we exploit automatic differentiation to drive atomistic systems towards high-likelihood, high-uncertainty configurations without the need for molecular dynamics simulations. By performing adversarial attacks on an uncertainty metric, informative geometries that expand the training domain of NNs are sampled. When combined with an active learning loop, this approach bootstraps and improves NN potentials while decreasing the number of calls to the ground truth method. This efficiency is demonstrated on sampling of kinetic barriers, collective variables in molecules, and supramolecular chemistry in zeolite-molecule interactions, and can be extended to any NN potential architecture and materials system.

Although higher errors are correlated with higher variances, a high variance does not necessarily imply a high error. This is often the case in undersampled regions, where the interpolating power of the NN potential still yields accurate predictions. Moreover, errors are more pronounced for force variances above the 80th-percentile threshold, illustrating the classification power of this metric for epistemic error.
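As a concrete illustration of this criterion, the sketch below flags configurations whose committee force variance exceeds the 80th percentile of the training-set variances. The variance values and configuration names are made up for illustration only.

```python
# Sketch of the 80th-percentile criterion: configurations whose committee
# force variance exceeds the 80th percentile of the training-set variances
# are flagged as likely high-error (epistemically uncertain).
from statistics import quantiles

# Hypothetical force variances observed on the training set:
train_variances = [0.02, 0.05, 0.01, 0.08, 0.03, 0.11, 0.04, 0.06, 0.02, 0.07]
threshold = quantiles(train_variances, n=10)[7]   # 8th of 9 cut points = 80th percentile

# Hypothetical new configurations to be screened:
new_variances = {"config_a": 0.03, "config_b": 0.15}
flagged = [name for name, v in new_variances.items() if v > threshold]
```

Configurations above the threshold would then be candidates for labeling with the ground-truth method.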
Using the force variance σ²_F as the uncertainty metric, we construct the adversarial loss L_adv using Eq. (11) from the main paper, as shown in Fig. 2. In particular, Fig. 2d illustrates that, with a reasonable choice of temperature, transition states and points beyond the training set are favorably sampled by the adversarial attack.
To demonstrate the ability of adversarial attacks and active learning loops to efficiently explore the phase space, we repeat Fig. 2a of the main text for the 1D well (Fig. 3), imposing an offset between the two wells: E(x) = 5x⁴ − 10x² + 1.5x. (2) After successive adversarial attacks, the well centered at x = 1 is discovered and sampled until the uncertainty in that region falls below the 80th percentile of the force variance. The choice of temperature generally prevents the adversarial loss from diverging as x → ±∞, unless the predicted energy becomes negative in those directions (generations 2 and 3 of Fig. 3). Sampling points at x → ±∞ is further avoided by using a fixed number of steps for optimizing δ, which limits how far an attack can travel.
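A minimal, self-contained sketch of such an attack on this 1D well follows. The "committee" here is a toy: copies of E(x) perturbed by ε·x³, so that the force disagreement (and hence σ²_F) grows away from x = 0. The signed-gradient ascent, temperature, and step settings are illustrative assumptions, not the settings used in this work.

```python
# Toy 1D adversarial attack on an uncertainty metric for the offset
# double well E(x) = 5x^4 - 10x^2 + 1.5x.
import math

def energy(x, eps=0.0):
    # Surrogate potential; eps mimics model-to-model disagreement.
    return 5 * x**4 - 10 * x**2 + 1.5 * x + eps * x**3

def force(x, eps=0.0):
    # F = -dE/dx for the surrogate above.
    return -(20 * x**3 - 20 * x + 1.5 + 3 * eps * x**2)

EPSILONS = [-0.3, -0.1, 0.1, 0.3]   # toy 4-model committee
KT = 3.0                            # normalized temperature (toy units)

def force_variance(x):
    f = [force(x, e) for e in EPSILONS]
    mean = sum(f) / len(f)
    return sum((fi - mean) ** 2 for fi in f) / len(f)

def adv_loss(x):
    # Boltzmann likelihood times epistemic uncertainty (unnormalized).
    return math.exp(-energy(x) / KT) * force_variance(x)

def attack(x_seed=0.0, delta0=0.3, steps=120, step=0.02, h=1e-5):
    x = x_seed + delta0             # seed geometry plus initial perturbation
    for _ in range(steps):
        grad = (adv_loss(x + h) - adv_loss(x - h)) / (2 * h)
        x += step if grad >= 0 else -step   # signed-gradient ascent on L_adv
    return x

x_adv = attack()
```

The ascent climbs toward the high-uncertainty, thermodynamically accessible region near the x = 1 well, while the Boltzmann factor keeps it from escaping to high-energy x → ±∞.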
If the energy uncertainty σ²_E is employed to construct the adversarial loss L_adv instead of the force uncertainty, the phase space is explored less efficiently. In addition to performing adversarial attacks on predefined collective variables (CVs), we can also sample new geometries by applying small distortions δ to the positions of all atoms, thus creating new geometries not seen in the training data. After training each generation of the NN committee on the original molecular dynamics data (see Methods for NN training details), 700 configurations are randomly chosen from the training data as seed geometries for the adversarial attacks. To ensure that physically meaningful configurations are obtained from the sampling, the normalized temperature kT of the adversarial loss is set to 3 kcal/mol. The attacks were performed for 80 epochs using the Adam optimizer with a learning rate of 5 × 10⁻³. This all-atom (AA) adversarial attack strategy was repeated for 3 generations. Although the AA attack strategy allows the NN committee to sample various high-energy configurations, the phase space is not well explored: the CVs of attacked geometries remain similar to those of the original seeds. In fact, the robustness of the NN potential does not increase even when the size of the training data increases by around 20% (Fig. 14).
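The all-atom procedure can be sketched with a toy system. Everything below is an assumption made for a self-contained example: a 3-atom harmonic chain stands in for the potential, ‖δ‖² stands in for the committee force variance (uncertainty growing away from the training geometry), a hand-rolled Adam update replaces the library optimizer, and central finite differences replace backpropagation.

```python
# Toy all-atom (AA) attack: a displacement delta on every atomic coordinate
# is optimized by Adam ascent on L_adv = exp(-E/kT) * u(delta).
import math

X0 = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0]   # flattened 3-atom chain
K_BOND, R0, KT = 10.0, 1.0, 3.0

def energy(x):
    e = 0.0
    for a, b in [(0, 1), (1, 2)]:                     # harmonic chain bonds
        r = math.dist(x[3*a:3*a+3], x[3*b:3*b+3])
        e += K_BOND * (r - R0) ** 2
    return e

def adv_loss(delta):
    x = [xi + di for xi, di in zip(X0, delta)]
    u = sum(d * d for d in delta)                     # toy uncertainty metric
    return math.exp(-energy(x) / KT) * u

def adam_attack(steps=80, lr=5e-3, b1=0.9, b2=0.999, eps=1e-8, h=1e-6):
    n = len(X0)
    delta, m, v = [0.01] * n, [0.0] * n, [0.0] * n
    for t in range(1, steps + 1):
        grad = []
        for i in range(n):                            # finite-difference grad
            dp, dm = list(delta), list(delta)
            dp[i] += h
            dm[i] -= h
            grad.append((adv_loss(dp) - adv_loss(dm)) / (2 * h))
        for i in range(n):                            # Adam ascent step
            m[i] = b1 * m[i] + (1 - b1) * grad[i]
            v[i] = b2 * v[i] + (1 - b2) * grad[i] ** 2
            m_hat = m[i] / (1 - b1 ** t)
            v_hat = v[i] / (1 - b2 ** t)
            delta[i] += lr * m_hat / (math.sqrt(v_hat) + eps)
    return delta

delta = adam_attack()
```

In this toy, the ascent favors near-rigid translations of the chain, which raise the stand-in uncertainty without paying an energy penalty, mirroring how the Boltzmann factor in L_adv biases attacks toward low-energy distortions.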
To explore the configuration space more thoroughly, we coupled 3 generations of AA attacks to 7 generations of CV attacks (CV + AA; see Fig. 15). The CV adversarial attacks follow the same procedure employed in the main paper (see Methods). After the 7 generations of CV attacks, 3 generations of AA attacks were performed using the procedure described above.
For each generation, 500 adversarial attacks were sampled. Half of the seed geometries were randomly selected from the CV adversarial attacks, while the other half were taken from the molecular dynamics data. While AA attacks alone fail to improve the robustness of NN potentials, CV + AA attacks yield much longer stable MD trajectories (Fig. 14). This suggests that CV attacks are necessary to capture collective motions, such as bond rotations, which are not easily explored via translation-based adversarial attacks. CV attacks allow the sampled configurations to access geometries outside the low-energy bounds.
Translation-based adversarial attacks, on the other hand, complement CV attacks by sampling the diverse vibrational space, which grows with the number of atoms in the system.
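The CV + AA schedule described above can be summarized as a short active-learning loop. Every function below (train_committee, cv_attack, aa_attack, ground_truth) is an illustrative stub, not the actual implementation; only the generation counts, the 500 attacks per generation, and the half-and-half seed selection come from the text.

```python
# Sketch of the CV + AA active-learning schedule: 7 generations of
# collective-variable (CV) attacks followed by 3 generations of all-atom
# (AA) attacks, 500 attacks per generation.
import random

N_CV_GENERATIONS, N_AA_GENERATIONS, ATTACKS_PER_GEN = 7, 3, 500

def train_committee(dataset):
    return {"trained_on": len(dataset)}        # stub: retrain the NN committee

def cv_attack(seed):
    return ("cv_geom", seed)                   # stub: attack along CVs

def aa_attack(seed):
    return ("aa_geom", seed)                   # stub: attack all positions

def ground_truth(geoms):
    return list(geoms)                         # stub: label with the reference method

md_data = [f"md_{i}" for i in range(2000)]     # stub MD training pool
dataset = list(md_data)
cv_geometries = []

for gen in range(N_CV_GENERATIONS):
    train_committee(dataset)
    seeds = random.sample(dataset, ATTACKS_PER_GEN)
    new = [cv_attack(s) for s in seeds]
    cv_geometries.extend(new)
    dataset.extend(ground_truth(new))

for gen in range(N_AA_GENERATIONS):
    train_committee(dataset)
    # half of the seeds from previous CV attacks, half from the MD data
    seeds = (random.sample(cv_geometries, ATTACKS_PER_GEN // 2)
             + random.sample(md_data, ATTACKS_PER_GEN // 2))
    dataset.extend(ground_truth([aa_attack(s) for s in seeds]))
```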

Supplementary Note 3. Adversarial attacks using ANI-1x models
To exemplify the use of adversarial sampling with different architectures and datasets, we performed adversarial attacks using ANI-1x models on three molecules from the ANI-1x dataset: methane (CH₄), ammonia (NH₃), and water (H₂O), in addition to the alanine dipeptide system studied in this work. 42 methane, 36 ammonia and 22 water molecules were randomly selected from the ANI-1x dataset, and 100 alanine dipeptide configurations were randomly selected from the dataset of Section III.C of the main paper. The ANI-1x dataset was obtained from the TorchANI v2.2 GitHub repository [1]. The normalized temperature kT was set to 0.7 kcal/mol, and adversarial attacks were performed for 70 steps for methane, ammonia and water, and 100 steps for alanine dipeptide, using the Adam optimizer with a learning rate of 5 × 10⁻⁵. The NN ensemble consists of 8 ANI-1x models pretrained on the ANI-1x dataset. The evolution of the maximum standard deviation of atomic forces across models (max force std) as a function of attack steps is shown in Fig. 17a. Even though the max force std does not increase monotonically when all seeds are analyzed at once, the adversarial attacks continuously push molecules towards configurations of higher uncertainty. Within 100 steps, many configurations with high RMSD are obtained (Fig. 17b). Since the thermodynamic likelihoods of the molecules are taken into account, the total energies of the attacked molecules rarely exceed 100 kcal/mol above the ground-state energy, even at relatively high RMSDs (Fig. 18c). In some cases, the attacked geometries have relatively low RMSD (< 0.05) compared to the training set (see Table 1). Intuitively, attacked geometries with higher RMSD could improve the robustness of neural network potentials, although there may be a trade-off between accuracy and transferability towards high-energy regions of the configuration space (see Section III.C of the main paper). In many cases, however, NN potentials fail even within the training domain of the data.
Hence, the adversarial sampling strategy targets the configurations most likely to confuse the models. Such configurations would otherwise be difficult to sample without a differentiable uncertainty metric: because of the non-linearity of NN predictions, the models can fail even on structures that are geometrically similar to the training domain.
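The RMSD comparisons above can be illustrated with a minimal sketch. Note that, for brevity, this version only removes translation (centroid alignment) and omits the rotational alignment (e.g. the Kabsch algorithm) that a production comparison would include; the water geometry is illustrative.

```python
# Minimal RMSD between an attacked geometry and its seed (translation
# removed; rotational alignment omitted for brevity).
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two conformers (lists of xyz tuples)."""
    n = len(coords_a)
    assert n == len(coords_b) and n > 0

    def centered(coords):
        # Center the structure on its centroid to remove rigid translation.
        cx = sum(p[0] for p in coords) / len(coords)
        cy = sum(p[1] for p in coords) / len(coords)
        cz = sum(p[2] for p in coords) / len(coords)
        return [(p[0] - cx, p[1] - cy, p[2] - cz) for p in coords]

    a, b = centered(coords_a), centered(coords_b)
    sq = sum((pa[0] - pb[0]) ** 2 + (pa[1] - pb[1]) ** 2 + (pa[2] - pb[2]) ** 2
             for pa, pb in zip(a, b))
    return math.sqrt(sq / n)

# A rigidly translated copy has (numerically) zero RMSD after centering:
water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
shifted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in water]
```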
To compare the force uncertainties across different molecules, the distributions of the relative force std and the max force std as a function of energy per atom are shown in Figs. 18a,b. The relative force std is calculated as σ_F,i / |F̄_i|, with i = arg max_j σ_F,j, i.e. the force std of the most uncertain atom normalized by the magnitude of its mean predicted force.
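A sketch of these two metrics for a toy committee of M models and N atoms follows. The component-wise variance convention and the normalization by the mean force norm on the most uncertain atom are conventions assumed for this illustration; the force values are made up.

```python
# Toy committee force predictions: forces[m][i] = (fx, fy, fz) for model m,
# atom i. Per-atom force std = sqrt of the component-wise variance across
# models, averaged over x, y, z (assumed convention).
import math

forces = [
    [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)],   # model 0
    [(1.0, 0.0, 0.0), (0.6, 0.0, 0.0)],   # model 1
]

def per_atom_force_std(forces):
    n_models, n_atoms = len(forces), len(forces[0])
    stds = []
    for i in range(n_atoms):
        var = 0.0
        for c in range(3):                          # x, y, z components
            vals = [forces[m][i][c] for m in range(n_models)]
            mean = sum(vals) / n_models
            var += sum((v - mean) ** 2 for v in vals) / n_models
        stds.append(math.sqrt(var / 3))
    return stds

stds = per_atom_force_std(forces)
i_max = max(range(len(stds)), key=stds.__getitem__)  # i = argmax_j sigma_F,j
max_force_std = stds[i_max]
mean_force = [sum(forces[m][i_max][c] for m in range(len(forces))) / len(forces)
              for c in range(3)]
relative_force_std = max_force_std / math.hypot(*mean_force)
```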
Since methane, ammonia and water are much smaller than alanine dipeptide, a smaller number of adversarial steps is usually required to push their geometries outside the training domains of the neural network models. Nevertheless, we also attempted adversarial attacks on the small molecules with a larger number of steps, to verify whether higher-uncertainty configurations could be obtained without breaking the molecules (see Figs. 19 and 20). Interestingly, at a high normalized temperature and with just 30 extra steps, more distorted geometries can be obtained from the seed configurations, all of which have high force uncertainty. However, the energies of these new configurations are much higher than those of the training set, with some exceeding 10 kcal/mol/atom. This indicates that obtaining geometries outside of the ANI-1x training set requires pushing towards much higher-energy configurations.

Table 2: k-point meshes for each of the zeolites studied in this work. All meshes were constructed using a uniform k-point density of 64 k-points/Å⁻³.

II. SUPPLEMENTARY TABLES