Automated discovery of a robust interatomic potential for aluminum

Machine learning, trained on quantum mechanics (QM) calculations, is a powerful tool for modeling potential energy surfaces. A critical factor is the quality and diversity of the training dataset. Here we present a highly automated approach to dataset construction and demonstrate the method by building a potential for elemental aluminum (ANI-Al). In our active learning scheme, the ML potential under development is used to drive non-equilibrium molecular dynamics simulations with time-varying applied temperatures. Whenever a configuration is reached for which the ML uncertainty is large, new QM data are collected. The ML model is periodically retrained on all available QM data. The final ANI-Al potential makes very accurate predictions of the radial distribution function in the melt, the liquid-solid coexistence curve, and crystal properties such as defect energies and barriers. We perform a 1.3-million-atom shock simulation and show that ANI-Al force predictions are in excellent agreement with new reference DFT calculations.


Supplementary Note 2 Active Learning Details
To bootstrap the active learning procedure, we generate an initial data set of 401 DFT calculations, performed on randomized atomic configurations. Each such configuration contains between 55 and 249 Al atoms, randomly placed in an orthorhombic supercell subject to a non-overlapping constraint, as described in the main text. Active learning proceeds in iterations. At each iteration, ANI neural networks are trained on all available DFT data and used to carry out molecular dynamics (MD) simulations for the purpose of sampling. We use an MD step size of 1 fs, and MD initial conditions are randomized. To diversify the sampling, each MD run is driven according to a randomized heating and cooling schedule as described in the main text. An uncertainty quantification technique is used to determine whether the ANI-Al model (under construction) is confident in its predictions for the current configuration of atoms. To estimate model uncertainty, we employ the ensemble disagreement technique used in previous work [4]. Specifically, we look at the sample variance Var[Ê] of the energy predictions Ê over the eight neural nets that comprise a single (ensembled) ANI-Al model. We also look at the ensemble variance of the force predictions, Var[f̂], averaged over all N atoms in the unit cell. If, during active learning, the ensemble variance of the energy or force prediction exceeds its threshold (E_thresh = 0.001·√N eV or f_thresh = 0.002 eV/Å, respectively), we halt the MD simulation and queue a DFT calculation to acquire ground-truth training data for the current configuration, helping the ML model to better understand it. If an MD simulation reaches 8 hours of wall-clock time (about 200 ps to 500 ps of simulated time) without exceeding an uncertainty threshold, it is respawned with new initial conditions. Batches of MD simulations are carried out with multiple ANI-Al models simultaneously.
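As a concrete illustration, the ensemble-disagreement trigger described above can be sketched as follows. This is a minimal sketch: the function name and array shapes are ours, and comparing the sample variance directly against the stated thresholds follows the wording of the text.

```python
import numpy as np

def exceeds_uncertainty_threshold(energies, forces, n_atoms,
                                  e_thresh=0.001, f_thresh=0.002):
    """Ensemble-disagreement trigger: returns True if the configuration
    should be sent for a new DFT calculation.

    energies: shape (8,), total-energy predictions (eV) of the 8 networks.
    forces:   shape (8, N, 3), force predictions (eV/Å) of the 8 networks.
    """
    # Sample variance of the energy predictions; per the text, the energy
    # threshold grows with system size as sqrt(N).
    e_var = np.var(energies, ddof=1)
    # Ensemble variance of the forces, averaged over all atoms/components.
    f_var = np.var(forces, axis=0, ddof=1).mean()
    return bool(e_var > e_thresh * np.sqrt(n_atoms) or f_var > f_thresh)
```

During active learning, a `True` return would halt the MD run and queue the current configuration for DFT.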
DFT calculations are continuously performed on the queued configurations. If 2 hours have passed since the last ANI-Al model was trained and more than 25 new data points have been generated, a new ANI-Al model is trained. In this way, full cycles of ensemble training, MD sampling simulations, and DFT data generation are carried out simultaneously to build the training data set. Our final training data set contains 6,352 DFT calculations. Each DFT calculation provides considerable data: a supercell contains between 55 and 249 Al atoms, and DFT calculates forces for each atom. Figure 7 below illustrates the workflow for the MD simulations used during active learning.
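The retraining trigger can be summarized in a few lines (a hypothetical helper, not NeuroChem code; the names and signature are ours):

```python
import time

def should_retrain(last_train_time, n_new_points,
                   min_interval_s=2 * 3600, min_new_points=25):
    """Retraining trigger from the text: retrain once at least 2 hours have
    passed since the last model was trained AND more than 25 new DFT data
    points have accumulated."""
    return (time.time() - last_train_time >= min_interval_s
            and n_new_points > min_new_points)
```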

Supplementary Note 3 ANI-Al Model Hyper-parameters
The ANI neural networks used in this work were implemented in the NeuroChem C++/CUDA software package. A batch size of 128 was used while training the ANI-Al model. A weight of 1.0 was used on the energy term of the loss function, while a weight of 0.01 was used on the force term. Learning rate annealing was used during training, starting at a learning rate of 0.001 and converging at a learning rate of 0.00001. The learning rate annealing algorithm is described in previous work [5]. The ADAM [6] update algorithm is used during training. The network architecture is provided in Table 1.
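For illustration, a weighted energy-plus-force loss with these weights might look as follows. This is a sketch only; the exact per-atom normalization used in NeuroChem is an assumption here.

```python
import numpy as np

def ani_al_loss(e_pred, e_ref, f_pred, f_ref, w_energy=1.0, w_force=0.01):
    """Weighted energy + force loss for a single system.

    e_pred, e_ref: predicted and reference total energies (eV).
    f_pred, f_ref: predicted and reference forces, shape (N, 3) (eV/Å).
    The 1/(3N) force normalization is an assumption for illustration.
    """
    n_atoms = f_pred.shape[0]
    energy_term = (e_pred - e_ref) ** 2
    force_term = np.sum((f_pred - f_ref) ** 2) / (3 * n_atoms)
    return w_energy * energy_term + w_force * force_term
```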

Supplementary Note 4 Nearsightedness principle and the interaction cutoff distance
The ANI-Al predicted total system energy Ê = Σ_i Ê_i is a sum over local contributions centered on each atom i. For the hyperparameters listed above, each local energy Ê_i incorporates information about a neighboring atom j if the pairwise distance r_ij = |r_i − r_j| is less than 7 Å. Furthermore, Ê_i can incorporate angular information r_ij · r_ik involving neighboring atoms j and k if both are within a smaller cutoff distance: r_ij < 5 Å and r_ik < 5 Å. Using i as an intermediary, the total ANI-Al energy can therefore strongly couple atoms j and k separated by a distance of up to 10 Å, even though the "cutoff distance" is just 7 Å. The ANI-Al predicted force on atom j is f_j = −dÊ/dr_j. Note that f_j involves −dÊ_i/dr_j if r_ij < 7 Å. In turn, Ê_i can depend on atom k if r_ik < 7 Å. Considering a linear geometry in which atoms j, i, and k are separated by 7 Å intervals, one observes that f_j may in principle be affected by an atom k at a pairwise distance of up to r_jk < 14 Å.
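The force-range argument can be checked numerically with a toy local-energy model. This is not the ANI-Al network: the squared-feature readout below merely mimics the network's nonlinearity, which is what couples different neighbors of atom i to one another.

```python
import numpy as np

R_C = 7.0  # radial cutoff distance (Å), as for ANI-Al

def cutoff(r):
    """Smooth cosine envelope: 1 at r = 0, 0 for r >= R_C (a stand-in for
    the ANI radial symmetry-function cutoff)."""
    return np.where(r < R_C, 0.5 * (np.cos(np.pi * r / R_C) + 1.0), 0.0)

def local_energy(i, pos):
    """Toy local energy E_i: a nonlinear readout of summed pair features.
    The nonlinearity (a simple square) mimics the neural network; a purely
    pairwise-additive E_i would show no j-k coupling."""
    r = np.delete(np.linalg.norm(pos - pos[i], axis=1), i)
    features = np.sum(np.exp(-r) * cutoff(r))
    return features ** 2

def total_energy(pos):
    return sum(local_energy(i, pos) for i in range(len(pos)))

def force_x_on_atom0(pos, eps=1e-5):
    """Central-difference x-force on atom j = 0."""
    p = pos.copy(); p[0, 0] += eps; e_hi = total_energy(p)
    p = pos.copy(); p[0, 0] -= eps; e_lo = total_energy(p)
    return -(e_hi - e_lo) / (2.0 * eps)

# Linear geometry j -- i -- k with 5 Å spacing: r_jk = 10 Å exceeds R_C.
pos = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
f_before = force_x_on_atom0(pos)

pos_moved = pos.copy()
pos_moved[2, 0] += 0.05   # perturb atom k, 10 Å away from atom j
f_after = force_x_on_atom0(pos_moved)
# f_after differs from f_before: atom k influences the force on atom j
# through E_i, even though r_jk > R_C.
```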

Supplementary Note 5 Training to forces
We employed a loss function of the form L = Σ_m [ w_E (Ê_m − E_m)² + (w_F/3N_m) Σ_j |f̂_m,j − f_m,j|² ], summed over systems m in the dataset, where Ê_m and f̂_m,j denote model predictions, E_m and f_m,j the reference DFT values, N_m the number of atoms in system m, and w_E = 1.0 and w_F = 0.01 the loss weights of Supplementary Note 3. Stochastic gradient descent training requires calculation of all components ∂L/∂W_i of the loss gradient. Typically, there are of order 10^5 model parameters W_i. Frameworks such as TensorFlow or PyTorch support iterated backpropagation, thereby enabling efficient calculation of the full gradient ∂L/∂W_i at a cost comparable to calculating L itself [7,8,9].
For the present work, we used the C++ NeuroChem implementation of ANI [10], for which iterated backpropagation would be challenging to implement. Instead, we use a recently developed scheme to train to force data [11]. The force part of the loss gradient, ∂L_F/∂W_i, can be cast as a directional derivative of ∂Ê[r]/∂W_i, in which each atomic position r_j is varied in the direction of the observed force prediction error. Using central differences, this part of the loss gradient may be approximated as ∂L_F/∂W_i ≈ −(w_F/3Nη) ( ∂Ê[r⁺]/∂W_i − ∂Ê[r⁻]/∂W_i ), where w_F is the force-loss weight, N the number of atoms, and r±_j = r_j ± η (f̂_j − f_j) represents a carefully selected perturbation to the position of atom j (for all j simultaneously). Crucially, r±_j is treated as fixed with respect to variations in the model parameters W_i. The central difference approximation becomes exact in the limit η → 0. Guidance on selecting η at fixed floating point precision is provided in Ref. [11]. This scheme is highly efficient because, for a given atomic configuration, the full energy gradient ∂Ê/∂W_i, over all model parameters W_i, can be efficiently calculated using straightforward backpropagation (i.e., using the same techniques needed to implement energy-only training). In practice, we find that the cost to calculate all ∂L/∂W_i is of the same order as the cost to calculate all forces.
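The central-difference trick can be verified on a toy energy model where every derivative is available by hand. The model E(r; W) = W · Σ_j |r_j|² is ours, chosen only so the directional derivative is analytic; for a quadratic model the central difference is exact up to rounding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy differentiable "model": E(r; W) = W * sum_j |r_j|^2, so that every
# derivative is analytic.  Predicted forces: f_pred_j = -dE/dr_j = -2 W r_j.
W = 0.7
r = rng.normal(size=(5, 3))          # positions of 5 atoms
f_obs = rng.normal(size=(5, 3))      # "observed" reference forces

def dE_dW(positions):
    """Gradient of the toy energy with respect to the parameter W.  In
    NeuroChem this is what one energy-style backpropagation pass provides."""
    return np.sum(positions ** 2)

f_pred = -2.0 * W * r
d = f_pred - f_obs                   # force prediction error: the direction
eta = 1e-4                           # finite-difference step

# Central-difference estimate of sum_j d_j . (d/dr_j) dE/dW, obtained with
# two extra gradient passes at the perturbed positions r +/- eta*d.
approx = (dE_dW(r + eta * d) - dE_dW(r - eta * d)) / (2.0 * eta)

# Analytic value of the directional derivative for this toy model:
# sum_j d_j . 2 r_j
exact = np.sum(d * 2.0 * r)
```

The key point mirrored here is that only `dE_dW` (an ordinary energy-gradient pass) is evaluated, never a mixed second derivative.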

Supplementary Note 6 t-SNE Embeddings
For each atom in the active-learning-generated data set we obtain a chemical environment descriptor, namely the activations of the first hidden layer of an ANI-Al model. These chemical environment descriptors are embedded into a 2D space using the openTSNE Python package [12]. The t-SNE algorithm is initialized with principal component analysis (PCA), and the cosine distance metric is used with a perplexity of 100.
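A minimal sketch of this embedding step using the openTSNE API follows; the descriptor matrix is random placeholder data, and the 500 × 96 shape is illustrative, not the real layer size or data-set size.

```python
import numpy as np

# Placeholder descriptor matrix: one row per atom, one column per unit of
# the first hidden layer (shape is illustrative only).
descriptors = np.random.default_rng(0).normal(size=(500, 96))

try:
    from openTSNE import TSNE  # pip package: opentsne
    embedding = TSNE(
        n_components=2,        # embed into 2D, as in the figures
        perplexity=100,        # perplexity stated in the text
        metric="cosine",       # cosine distance between descriptors
        initialization="pca",  # PCA initialization, as in the text
        random_state=0,
    ).fit(descriptors)
except ImportError:            # openTSNE not installed: skip gracefully
    embedding = None
```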

Supplementary Note 7 MD Simulation Details
Melting temperatures via solid-liquid coexistence. The melt curve was determined using solid-liquid coexistence simulations [13,14,15] performed in the NPT ensemble. Independent NPT simulations were first performed to determine the equilibrium densities of solid and liquid Al at the relevant (P, T) conditions. The starting configurations for the solid-liquid coexistence simulations, containing 16000 atoms, were created using these equilibrium densities. The simulation cells were equilibrated in the NVT ensemble for 100 ps at a temperature close to the expected melting point. At the end of the equilibration, the simulation cell consisted of liquid Al in contact with the (100) crystallographic face of solid Al, with roughly equal numbers of atoms in the solid and liquid phases. Following this, independent NPT simulations of solid-liquid coexistence were performed at a set of temperatures spanning the melting temperature at a given pressure. The time evolution of the simulation-cell volume was used to determine the upper and lower bounds of the melting temperature: an increase/decrease in the simulation-cell volume with time implied that the system was above/below the melting temperature. All the melting points presented in Figure 6 of the main article are the upper bounds determined in this manner. Lower bounds are at most 25 K lower than the reported upper bounds.
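The bracketing logic can be expressed compactly (a hypothetical helper; the drift-slope classification is our paraphrase of the volume-evolution criterion above):

```python
import numpy as np

def bracket_melting_point(temps, volume_traces):
    """Bracket the melting temperature from NPT coexistence runs.

    temps: simulation temperatures (K), sorted in ascending order.
    volume_traces: one 1-D array per run, cell volume vs. time.
    A positive volume drift (melting) marks T above Tm; a negative drift
    (freezing) marks T below Tm.  Returns (lower_bound, upper_bound).
    """
    lower, upper = None, None
    for T, v in zip(temps, volume_traces):
        drift = np.polyfit(np.arange(len(v)), v, 1)[0]  # linear slope
        if drift < 0:
            lower = T              # cell shrinks: freezing, below Tm
        elif upper is None:
            upper = T              # first run that melts: upper bound
    return lower, upper
```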
We note that an fcc-bcc phase transition was predicted for the Mendelev-EAM potential [16] at P > 20 GPa. Therefore, we calculated the melt curve for this potential only up to 15 GPa.
Phase Transition Dynamics. For this simulation, we first slowly heat the system from 300 K to 1500 K, passing through the melting point of 933 K at atmospheric pressure. Next we slowly cool the system back to 300 K. The entire heating-cooling process runs for 750 ps. As with our previous MD simulations, we use a Langevin thermostat with a friction coefficient of 0.02 fs⁻¹, and we apply an MD timestep of 0.5 fs. The simulation is performed in a periodic box containing 108 atoms initialized in an FCC lattice (that is, 3 × 3 × 3 cubic unit cells). The density of the simulation is fixed by selecting a volume of (12.138 Å)³. At this density, the simulated pressure should be about 1 bar at the coexistence temperature.
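The heating-cooling schedule can be sketched as a piecewise-linear ramp. The symmetric 375 ps / 375 ps split is an assumption; the text specifies only a 750 ps total for the heating-cooling process.

```python
def target_temperature(t_ps, t_total=750.0, T_low=300.0, T_high=1500.0):
    """Piecewise-linear heating-cooling schedule: ramp from 300 K up to
    1500 K over the first half of the run, then back down to 300 K.
    The 50/50 split between heating and cooling is an assumption."""
    half = t_total / 2.0
    if t_ps <= half:
        return T_low + (T_high - T_low) * t_ps / half
    return T_high - (T_high - T_low) * (t_ps - half) / half
```

A thermostat driving the MD run would be retargeted to `target_temperature(t)` at each step.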
Variable temperature and density liquid phase simulations. We iterated through five temperatures: 1000 K, 1200 K, 1400 K, 1600 K, and 1800 K. At each temperature, we used ANI-Al to drive MD simulations of 108 atoms at five volumes, 0.850 V0, 0.925 V0, 1.00 V0, 1.075 V0, and 1.15 V0, where V0 is the mean volume at zero pressure. Each simulation was carried out for 60 ps. We sampled 10 snapshots from each of the 5 × 5 MD trajectories. Figure S5 shows the accuracy of the ANI-Al model on each trajectory.

Supplementary Figure 9: Extended cold curves for the ANI-Al ensemble. At zero temperature, ANI-Al predicts an FCC-to-HCP transition at 154.8 GPa and an HCP-to-BCC transition at 392.7 GPa. The BCC crystal becomes lower in energy than FCC above 261.5 GPa.