Machine-learned interatomic potentials by active learning: amorphous and liquid hafnium dioxide

We propose an active learning scheme for automatically sampling a minimum number of uncorrelated configurations for fitting the Gaussian Approximation Potential (GAP). Our active learning scheme consists of an unsupervised machine learning (ML) scheme coupled with a Bayesian optimization technique that evaluates the GAP model. We apply this scheme to a Hafnium dioxide (HfO2) dataset generated from a “melt-quench” ab initio molecular dynamics (AIMD) protocol. Our results show that the active learning scheme, with no prior knowledge of the dataset, is able to extract a configuration that reaches the required energy fit tolerance. Further, molecular dynamics (MD) simulations performed using this active learned GAP model on 6144 atom systems of amorphous and liquid state elucidate the structural properties of HfO2 with near ab initio precision and quench rates (i.e., 1.0 K/ps) not accessible via AIMD. The melt and amorphous X-ray structural factors generated from our simulation are in good agreement with experiment. In addition, the calculated diffusion constants are in good agreement with previous ab initio studies.


INTRODUCTION
Ab initio molecular dynamics (AIMD) simulations based on Density Functional Theory (DFT) 1,2 can provide atomistic structural descriptions of materials with quantum mechanical accuracy 3 . But such calculations are severely limited by the finite system size (10-100's of atoms) and short timescales (~10's of ps). Classical molecular dynamics (MD) simulations based on interatomic potentials derived from empirical and physical approximations, on the other hand, can provide access to larger system sizes (millions of atoms) with longer timescales (~100-1000's of ns) by sacrificing the quantum mechanical accuracy. Inverse modeling techniques such as Reverse Monte Carlo (RMC) 4 , along with advanced experimental techniques such as synchrotron based high-energy X-ray diffraction, have certainly aided in improved understanding of the atomic structure of materials. But such techniques can only provide a statistical description of the local atomic environment 5 . More recently, an improved version of RMC has been used to develop classical interatomic potentials where molecules are modeled using bonds, angles and dihedral potential terms with added nonbonded interaction parameters, thus making it more suitable to model larger molecules which are intractable using the traditional approach. In particular, recent research also showed the application of RMC in the development of a quantum mechanical-accurate model for amorphous silicon 6,7 . In the age of "big data"-driven materials informatics 8 , there emerged a new generation of machine learning (ML) interatomic potentials [9][10][11][12][13][14][15][16] . Unlike classical interatomic potentials, these potentials employ ML techniques such as neural networks and kernel based methods to map the direct functional relationship between atomic configuration and energy from reference quantum mechanical calculated datasets. Much like the atomic configurations in Cartesian coordinates, the ML interatomic potentials must satisfy translation, rotation, and permutation invariances. This is typically enforced by transforming the atomic coordinates into descriptors that capture the local atomic environment and satisfy the invariances. The ML interatomic potentials are regression models of the descriptors. Subsequently, many recent applications of ML interatomic potentials have achieved simulation lengths and timescales accessible to classical interatomic potentials, with near quantum mechanical accuracy 17,18 . Despite the progress, training the ML interatomic potentials remains a challenging task. The challenge is finding the right hyperparameters for the chosen method of fitting and sampling the correct training data that would lead to meaningful interpolation for the property of interest 19 . If there are large samples of datasets, then educatedly handpicking the training configurations becomes a cumbersome task.
Active learning (AL) is an ML strategy where a learning algorithm iteratively queries a very large pool of unlabeled data to extract a minimum number of training data that would lead to a supervised ML model with superior accuracy compared to a training model with educated handpicking 20 . Within the context of this article, our goal is to devise an active learner that can automatically select a minimum number of training configurations that would result in a near DFT accuracy ML interatomic potential. In addition, reducing the number of training samples lowers the computational resources required to train and evaluate the ML interatomic potential. Inspired by the original work of Dasgupta et al. 21 , we propose an active learner which aims to exploit the cluster structure embedded in a given unlabeled dataset so as to arrive at a minimum number of training configurations. The term "unlabeled dataset" implies that the proposed AL query strategy based on clustering 22 would only rely on input atomic configurations. We apply our AL scheme to fit the Gaussian Approximation Potential (GAP) framework 23 . The full details of the AL scheme are discussed in the "Methods" section. We also refer the reader to the recent success in the applications of AL [24][25][26] .
To showcase the overall capability of the AL scheme to fit the GAP model, we have chosen the specific application example of a binary amorphous oxide, namely Hafnium dioxide (HfO 2 ) or hafnia. Hafnia is a relevant material in semiconductor process technology such as high-k gate dielectrics 27,28 , as a potential replacement for silicon dioxide (SiO 2 ). In particular, high-k gate dielectrics applications require thin films of amorphous HfO 2 (a-HfO 2 ) 29,30 . The density of the a-HfO 2 is shown to significantly influence the atomistic structure and oxygen diffusion 31,32 . Furthermore, hafnia is used as a high-temperature refractory material 33 and also has applications in nuclear technologies 34 . We have chosen this specific application example given that the hightemperature refractory nature of this material requires exploration across a wide regime of thermodynamic configuration space. Our aim is to construct AL-driven fitting of GAP model for a-HfO 2 and liquid hafnia (l-HfO 2 ). For this purpose, we generated a reference HfO 2 dataset from an NVT "melt-quench" AIMD simulations, details of which are discussed in the "Methods" section. We demonstrate that the AL scheme reaches the high accuracy in energy and force fit for the GAP model trained with very few configurations sampled from this reference datasets. The active learned GAP model is used to perform melt-quench MD simulations for two different system sizes. The effect of quench rate is investigated via the order parameters extracted from the MD simulation of a medium size system (768 atom). To showcase the scalability and stability of the active learned GAP model, additionally we have performed a melt-quench MD on a large system size (6144 atoms) to generate a-HfO 2 . Overall, we demonstrate that the active learned GAP model accurately reproduces the AIMD computed results. Further, the results are validated against X-ray diffraction measurements. We stress the fact that all the AL-driven GAP models are trained only on the ab initio data, and experimental entities are used for benchmarks purposes to improve the training dataset. Finally, we demonstrate that the active learned GAP potential can be used to perform NPT quench on a 6144 atom system to estimate the density of the a-HfO 2 .

Active learning
We begin by discussing the results of applying the AL workflow to HfO 2 datasets generated from NVT "melt-quench" AIMD simulations. The AIMD datasets are summarized in the inset of the Fig. 1. The details of the AL workflow and the AIMD simulations are described in the "Methods" section. The optimal learning configuration for building up the potentials are chosen with the AL workflow in order to achieve standard error convergence pertaining to the range of properties measured 7 . The validation plot for the active learned a-HfO 2 potential is shown in Fig. 1. We start by discussing the inset table of Fig. 1, where the details of the active learned training configurations have been summarized. The energy tolerance value, E tol , was set to 5 meV/atom for quenching dataset, 2 meV/atom for the liquid and amorphous, respectively. In the case of amorphous and liquid phases, the AL workflow ended up with optimal training configurations with very few data iterations. It can be observed that the nonequilibrium nature of the quenching procedure over a large temperature range leads to significant challenges in picking the right training configuration. Consequently, it took the AL workflow 11 data iterations to reach the requested accuracy. But N train = 260 is a meager 0.8% of the entire AIMD quench dataset. This would be a significant human effort if done by handpicking configurations from the ab initio dataset. The human choice of training dataset is based on previous experience and literature reviews, combined with trial-and-error principle to achieve the desired error convergence. However, for the system relevant to this study, the AL workflow gives an automated path to achieve the desired accuracy without human intervention. Interested readers are advised to refer to the Supplementary Discussion on manual configuration selection and its benchmark with respect to the AL scheme presented here.
The individual active learned configurations was combined to train the final a-HfO 2 potential. The details of the hyperparameters are available in the Supplementary Table 1. We turn our attention now to the quality of this active learned a-HfO 2 potential, by validating on a test dataset independent from the data used for training. It can be seen that the active learned potential gives close-to-linear fit in predicted GAP energies vs DFT energies. The overall mean absolute error (MAE) for GAP-predicted energy is Fig. 1 The GAP-predicted vs DFT energy validation plot for the active learned a-HfO 2 potential. The validation was performed on a test dataset independent from the training data. The scatter color indicates the AIMD dataset source from which the test data point was chosen. (Inset table) Summary of the AIMD datasets, and active learning settings. N train is the number of active learned training configurations. E tol is the user-specified energy tolerance value in meV/atom. N iter is the number of data iterations required to converge the active learning workflow. (Inset plot) Force validation plot.
2.6 meV/atom. In the inset, we also show force convergence with an overall MAE of 0.28 eV/Å. This supports the argument that GAP-based atomistic models predict the local properties of the systems with good accuracy of MAE < 5 meV/atom for liquid and amorphous states 35,36 .
Active learned GAP MD. With the active learned GAP potentials, MD simulations are performed using the LAMMPS package 37 . The simulation setup is shown in Fig. 2. We considered medium (768 atom) and large (6144) system size for the melt-quench simulation to generate the amorphous structure. The details of the meltquench scheme are discussed in the "Methods" section.
We begin by a discussion of the results of the NVT melt-quench. As noted in the "Methods" section, we have fixed the density of l-HfO 2 and a-HfO 2 to values of 8.16 and 7.69 g cm −3 , respectively, as reported in an experiment study 31 . The X-ray structure factor of molten hafnia at 3173.15 K (2900°C) is measured to a Q-value of 22.5 Å −1 . The simulated atom-atom partial X-ray structure factors are obtained via inverse Fourier transforms of pair distribution functions (PDFs), weighted by the appropriate (Q-space) X-ray form and concentration factors, summed and compared directly with the experimental data. Figures 2 and 3 represent the structure factor of l-HfO 2 and a-HfO 2 , respectively. We can see a very good agreement of our GAP model structure factor with that of the experimental X-ray diffraction experiments for l-HfO 2 .
Furthermore, our GAP model shows good agreement with the long-range and short-range ordering for the a-HfO 2 , whereas the middle-range ordering from 5 < Q(Å −1 ) < 8 shows deviations from the experimental structure factor. Our GAP model is capable of capturing the salient structural features upon changing from an equilibrium liquid structure to a nonequilibrium amorphous state. To highlight the detailed structural rearrangements between the liquid and amorphous structures, we plot Q[S(Q) − 1] to emphasize the strong oscillations in amorphous signal in the range Q~5-15 Å −1 , which are heavily damped in the liquid signal. As expected, the oscillations decay for Q > 5 Å −1 in the liquid structure factor due to the increased local disorder at higher temperatures.
The objective of using our GAP model is to attain ab initio accuracy with scaling which cannot be accessed with DFT. As described in the "Methods" section, the ab initio reference data have a 96 atom system and our GAP-MD simulation consists of a system with 6144 atoms in total with box size of 4.4 × 4.4 × 4.4 nm 3 . From Fig. 2, it can be seen that with an increase in scale, the accuracy of structure factor of 6144 atoms l-HfO 2 is comparable to that of the experiments. This supports the argument that our GAP-based atomistic model can retain DFT accuracy at large scales with increased simulation times with linear scaling 7 .
To characterize the atomic structure in finer detail, we show the partial PDFs from GAP MD simulations for both l-HfO 2 and a-HfO 2 in Fig. 4. The calculated partial PDFs illustrate the growth of intermediate range ordering in a-HfO 2 compared to liquid in real space (see Fig. 4). The first peak at~2 Å corresponds to the average bond length between hafnium and oxygen. For l-HfO 2 , there are single broad peaks associated with the Hf-O and Hf-Hf correlations, but for a-HfO 2 , the Hf-O peak becomes narrower and increases in intensity. Moreover, the broad Hf-Hf peak in the liquid splits into two peaks in the amorphous form, corresponding to well-defined edge-sharing polyhedra at 3.4 Å and corner-sharing polyhedra 3.9 Å. The ratio of the edge/corner-sharing ratio is known to be density dependent 32 and leads to the formation of a disordered network at distances r > 8 Å in the amorphous phase, which have also been observed in previous ab initio studies and experiments 31 . The light blue dotted lines in Fig. 4 represent the partial PDFs obtained from AIMD simulations, and our GAP-MD model for 6144 atoms accurately reproduces the Hf-Hf peak split corresponding to the edge-sharing and corner-sharing polyhedra seen in the baseline DFT. This shows that the active learned GAP  Previous studies 31,32 have shown that the structure of a-HfO 2 is strongly density dependent. Here, we have performed the GAP MD for the a-HfO 2 with a fixed density of 7.69 g cm −3 . Further, with the analysis of the partial PDFs from Fig. 4, we showed that the active learned GAP model accurately matches with AIMD results. From Fig. 3, we can see deviations for the middle range ordering of a-HfO 2 for this density from our trained ab initio dataset. Now from Figs. 4 and 3, it can be seen that Hf-Hf interactions dominate the middle range ordering (5 < Q(Å −1 ) < 8) of a-HfO 2 . To elucidate the density dependence, we run NPT simulations with our GAP model. The details of NPT simulations are explained in "Methods" section. These simulations are performed to let the volume change in the system and the resulting density of a-HfO 2 is found to be 9.25 g cm −3 . The simulated structure factor at this density is shown in Fig. 3 (shifted green curve). Here, we can clearly see an improved middle range ordering (5 < Q(Å −1 ) < 8) compared to the low-density a-HfO 2 . Thus the natural increase in density due to NPT simulations from our GAP model is already able to improve the polyhedral connectivity observed in a-HfO 2 . Further we stress the fact that the accuracy of structure factor can also be increased by benchmarking the training configurations in relevance to the experiments which could be performed by RMC modeling 7 . Table 1 gives the coordination numbers and bond lengths of a-HfO 2 and l-HfO 2 in comparison to the experiments. GAP MD gives a very close agreement with the Hf-O coordination number and bond lengths with experiments. From Table 1, it can also be seen that hafnium gives a sevenfold coordination with oxygen, which is similar to the monoclinic phase asymmetric arrangements of Hf-O bond distances at 2.03-2.25 Å, as reported in previous study 38 . To articulate the argument from the above, we calculated the bond angle distribution of Hf-O-Hf of a-HfO 2 , the results of which are discussed below.
The bond angle distributions derived from the GAP-MD are shown in Fig. 5. For a-HfO 2 , it can be seen that there are two peaks at Hf-O-Hf angles of 107°and 142°. Previous studies have attributed these peaks to the existence of edge and cornersharing polyhedra, respectively 32 . In the GAP models, the two Hf-O-Hf peaks suggests a structural morphology in a-HfO 2 resembling that of the monoclinic phase, rather than a single broad peak around 120°observed in the tetragonal and cubic phases. However, a major difference between the amorphous Hf-O-Hf bond angle distribution and that of the crystalline monoclinc form is the width and intensity of the 142°peak. The broad nature of this peak in a-HfO 2 is indicative of a wide distribution of packing arrangements of corner-shared HfO n polyhedra compared to m-HfO 2 . The similarity of the 107°H f-O-Hf bond angle peak between a-HfO 2 and m-HfO 2 can be understood due to the strict geometric requirements of edgeshared units. Previous ab initio studies have predicted two different types of amorphous structure formation using the melt-quench scheme to investigate the amorphous-to-crystalline phase transition. They related these structures to tetragonal type and monoclinic type with respect to their long-range ordering and volume energy curve 31,32,38 . The l-HfO 2 , on the other hand, has a single asymmetric Hf-O-Hf bond angle distribution peak located at~115°, similar to that of the cubic and tetragonal phases of hafnia, which have bond angle distribution peaks in the interval of 117°-120°.
In summary, we have validated the active learned GAP models using structural properties. Furthermore, we also note that the diffusive behavior of amorphous and l-HfO 2 has been studied previously for their dielectric properties in RRAM 31,32 . To calculate   diffusion constants, the l-HfO 2 trajectory was sampled for more than 1 ns at 3100 K. The obtained hafnium and oxygen self diffusion constants were Hf: 3.3796 (±0.1) × 10 −5 cm 2 /s and O: 6.2971 (±0.1) × 10 −5 cm 2 /s, respectively. To account for correlated diffusion, we also calculated the distinct diffusion constants [39][40][41][42][43] and the calculated distinct diffusion constants of Hf: 3.6 (±0.009) × 10 −5 cm 2 /s, O: 6.425 (±0.002) × 10 −5 cm 2 /s. The impact of correlation is obtained from the ratio of DA Dσ and is found to be >0.9. This is in good agreement with the previous ab inito study performed by Hong et al. 44 reported at 3100 K Hf: 2.4 (± 0.1) × 10 −5 cm 2 /s, O: 5.2 (±0.2) × 10 −5 cm 2 /s. As reported in the Table I of Hong et al., the computational cost associated with that ab initio calculation to achieve 56 ps of trajectory for a system size of 270 atom is 21,500 CPU hours. Our GAP model with 6144 atom unit cell on the other hand achieved 110 ps with a computational cost of 1080 CPU hours, while keeping almost ab inito accuracy.

DISCUSSION
We have shown that the AL workflow can automatically sample a minimum number of training and test configuration for the GAP model that result in near DFT accuracy for the fitted energy and forces. Further, we show that this AL technique can improve the quality of interatomic potentials used for MD simulations up to an accuracy of ab inito training data. We understand that there will be scenarios where human logic and interventions are required to guide the training process. However, for the system relevant to this study, the AL workflow gives an automated path to achieve the desired accuracy without human intervention. We showed here the proof of concept that the AL could be a possible replacement to educated handpicking configuration method. The AL schemes further benefit the automation and selection speed of training and test configurations from the ab initio dataset, which are required to construct the machine learned atomistic potentials, as shown in Supplementary Fig. 2. Here, we used GAP-based atomistic models to model a-HfO 2 so as to showcase the AL workflow. Our machine learned atomistic model showed very good agreement with experimental liquid and amorphous Xray structure factors. Our model is able to predict the diffusion constants at same accuracy as previous study 44 , but at a reduced computational effort, due to the linear scaling of GAP models. We also exemplify the fact that the accuracy of the atomistic potential purely depends on the quality of quantum mechanical calculations used for training. This method can further be used to enhance the speed of modeling amorphous and liquid systems of interest. Our AL scheme uses an automated query strategy that relies on unsupervised clustering. In addition, Bayesian optimization (BO) is used on the fly to find the optimal hyperparameters of the ML interatomic potential. Due to the generality of the query strategy and BO framework, we stress that this approach can be easily extended to other ML based interatomic potentials.

AIMD dataset
The input for amorphous structures was generated using Packmol 45 by packing 96 random particles (32 Hf, 64 O) in a cubic box of density 8.16 g cm −346 . The density of this cubic box is taken to be experimental density reported by Gallington et al. 31 . The initial configuration is heated to 3600 K (500 K higher than melting point) and sampled for 12 ps. The final snapshot from the liquid configuration at 3600 K is quenched to 300 K at a rate of 100 K/ps. The final configuration of quench is rescaled to experimental density of 7.69 g cm −3 and 12 ps of amorphous configurations are generated. From the liquid and amorphous trajectories, the final 6 ps of snapshots are retained in the dataset. All of the 33 ps of quenching trajectories were retained in the dataset. These same density values are used through this study.
The atomic configurations, energy, force, and virial stress are calculated in NVT ensemble with a Nosé-Hoover thermostat 47

Active learning
Our aim is to employ AL to automatically sample uncorrelated learning configurations from the reference datasets, by exploiting the underlying cluster structure embedded in the dataset 21 and partition them into "N" uncorrelated clusters 53 . The learning configurations are sequentially sampled from the uncorrelated cluster and trained using the GAP model till the desired accuracy has been achieved. The AL workflow has been show in Fig. 6.
For the clustering, we use HDBSCAN 54-56 , a density-based hierarchical clustering method. This algorithm has been successfully applied to partition and analyze MD trajectories 57 . The individual AIMD trajectory serves as the input to the AL workflow. For the distance metric, we compute the pairwise root mean square deviations (RMSD) of atomic positions 58 . The pairwise RMSD matrix computed from the trajectory is input to HDBSCAN. HDBSCAN partitions the input trajectories into uncorrelated clusters. Once the information on the clusters is extracted, a series of trials is run to sample data from the uncorrelated clusters. In each trial, samples are drawn from each cluster at intervals separated by K iter , where K iter goes from K max to K min . K min and K max are the sample sizes of the smallest and the largest clusters, respectively. For the sake of consistency, an equal number of unique training (N train ) and test configurations are sampled from each cluster per trial. The sampled training configurations are used for training the GAP model and the test configurations, which are samples drawn independent of the training configurations used for validating the trained models. In the initial trial (i.e., first iteration), exactly one unique training and test configuration are drawn from each cluster (i.e., sampling width is same as maximum sample size, K max ). This would mean that at the end of the first iteration, the number of training and test configuration samples would be equal to the total number of clusters. In the subsequent trials more configurations are drawn from the cluster as the sampling width decreases. Now we turn our attention to GAP model training and validation. For the AL workflow, we only fit and evaluate the GAP model with respect to reference dataset energy computed from DFT, as our goal is to arrive at the optimal training configurations. We use MAE in the GAP-predicted energy on the test dataset as the error metric. The GAP model, in turn, has a number of Fig. 6. Active learning workflow. The AIMD trajectory is converted in to distance matrix and passed to the Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) clustering algorithm. Training and test data samples are sequentially drawn from the clusters to fit the GAP model till the required accuracy is achieved. For every data sampling iteration, Bayesian Optimization perform on-the-fly hyperparamter tuning of the GAP model.
hyperparameters that need to be tuned to arrive at the best model for a sampled dataset. BO is a powerful technique for the automated hyperparameter tuning of costly ML models 59 . Excellent review on this subject is available elsewhere 60 . The BO consists of a surrogate model for the objective function and an acquisition function to sample the next hyperparameters. Initially the BO runs a number of explorations over hyperparameter space to build the surrogate model over the error metric. Then a series of exploitation's are performed based on the knowledge gained from the initial exploration so as to move towards improving the surrogate model and getting better samples of hyperparameters that might lead to the minimization of the error metric. If the best GAP model generated from the BO does not achieve the required accuracy as gauged by an arbitrary user-specified threshold tolerance value (E tol ), then the next trial is invoked to add more training and test data from the clusters. The workflow stops the data iterations (number of data iteration, N iter ) as soon as the MAE in energy prediction for the best GAP model is on or below the user-specified threshold energy tolerance value (E tol ). The optimal configuration active learned from each of the individual AIMD datasets is sequentially combined to train the a-HfO 2 potential. The cluster parameters such as N, K min , and K max are outcomes of the unsupervised learning; consequently, the sampling regime is expected to generalize easily to new datasets and requires no hand tuning. Since we perform BO of the ML interatomic potential as a part of the AL, the optimal hyperparameters are readily accessible along with the optimal training configurations. The details specific to the Python implementation of the workflow has been discussed in the usage notes section. The technical details of clustering algorithm and BO are available in the Supplementary Discussion.

GAP-MD
The individual active learned datasets are combined to train the final a-HfO 2 GAP potential. In order to prevent unphysical clustering of oxygen atoms at high-temperature simulations, a nonparametric two-body term was added to the Smooth Overlap of Atomic Position (SOAP) descriptor 61,62 . The details of the training hyperparameters are provided in the Supplementary Table 1. The a-HfO 2 GAP potential is used to perform meltquench MD simulations using the LAMMPS MD package 37 . Two different system sizes are considered for the simulations. The melt-quench MD (up to 1 K/ps) is performed in an NVT ensemble with a Nosé-Hoover thermostat 47,48 to sample the configurations with a time step of 1 fs. The liquid and amorphous configurations are sampled for 100-200 ps. For scaling, we have performed MD simulations of liquid configurations up to 1.2 ns, and diffusion coefficients are estimated from this trajectory. For the GAP-based NVT quench, we use the same procedure as used in the AIMD dataset generation step by fixing the liquid and amorphous simulation setup density to values previously reported in experiment 31 . Different system sizes are also investigated, details of which are discussed in the results section. To estimate the density of a-HfO 2 , starting from melt, we performed a quench in NPT ensemble with a Nosé-Hoover thermostat 47,48 and a barostat [63][64][65] to allow for the volume to change. To avoid unnatural pressure fluctuations, the l-HfO 2 at 3600 K is equilibrated till 2500 K using an NVT ensemble. We performed a zero pressure NPT quench from 2500 to 500 K. A quench to temperature below 500 K is not observed to result in significant structural changes 7 . The GAP model is used to perform conjugate-gradient minimization to relax cell and atomic position of the NPT quenched configuration at 500 K into local minima.

Experiment
High-energy X-ray diffraction experiments on liquid and a-HfO 2 were performed on beamline 6-ID-D at the Advanced Photon Source. The details have previously been reported elsewhere 31 , so only the salient information is provided here. The liquid state diffraction measurements were performed by laser heating in an aerodynamic levitator using 100.27 keV X-rays, in combination with a large a-Si area detector. The experiments on a-HfO 2 were performed in a similar manner but using an incident energy of 131.74 keV to attain data out to higher Q-values and improve the realspace resolution. The data were analyzed using standard correction procedures, including corrections for background, Compton scattering, fluorescence and oblique incidence to yield the Faber-Ziman total X-ray structure factors. A sine Fourier transform was used to obtain the corresponding X-ray PDFs.

Usage notes
The AL workflow is implemented through the "BayesOpt_SOAP.py" workflow. This workflow internally calls the "activesample" Python class, which performs the data processing, clustering and sampling, at once. This class takes an extended XYZ file format as input, and it uses the MDTraj library 66 by calling the md.rmsd() to construct the distance matrix. MDTraj's md.rmsd() module computes the pairwise RMSD of atomic positions after rotating and centering trajectory configurations. There are two hyperparameters, namely the "minimum number of clusters" and "number of samples," which controls the performance of the clustering. These two hyperparameters need to be carefully tuned in such a way as to minimize the noise points 67 . In addition, setting the minimum number of clusters might prevent a scenario in which all data points collapse into a single cluster. If there are no user supplied inputs for these hyperparameters then a default value of 10 is set to both, which was found to be a reasonable choice after carefully evaluation across multiple trajectories. Since we are enforcing the minimum number of clusters, the K min will have same value as this hyperparameter. If there are scenarios in which more complex molecule dynamics trajectories are encountered, then a custom distance matrix can be supplied by the user. In the above code, this could be done using the class attribute "data.distance." Finally, we have used the BO as implemented in the GPyOpt Python library 68 to optimize the hyperparameters of the SOAP descriptor 61 and the GAP model 23 . These hyperparameters are radial cutoff, number of radial basis functions ("n_max"), spherical harmonics basis band limit ("l_max"), number of sparse points to use in the sparsification of the Gaussian process ("n_sparse"), and the standard deviation of the Gaussian process (delta). The active learned final training and test configurations are written to "opt_train.extxyz" and "opt_test.extxyz" files respectively. The detailed summary of the workflow including information of each data sampling iterations as well as the optimized hyperparameters are written to a JavaScript Object Notation (JSON) formatted output ("activelearned_quipconfigs.json"). More detailed discussions on workflow parameters and example usage are available in the GitHub repository. The hyperparameters and results of an example usage are discussed in "Active Learning Workflow Example Results" subsection in the Supplementary Discussion.
A code example for the usage of the "activesample" class.

DATA AVAILABILITY
The "XML" formatted force field file and active learning benchmark dataset are available at https://github.com/argonne-lcf/active-learning-md.

CODE AVAILABILITY
An implementation of the active learning workflow described in the paper is available at https://github.com/argonne-lcf/active-learning-md.