## Abstract

Efficiently creating a concise but comprehensive data set for training machine-learned interatomic potentials (MLIPs) is an under-explored problem. Active learning, which uses biased or unbiased molecular dynamics (MD) to generate candidate pools, aims to address this objective. Existing biased and unbiased MD-simulation methods, however, are prone to miss either rare events or extrapolative regions—areas of the configurational space where unreliable predictions are made. This work demonstrates that MD, when biased by the MLIP’s energy uncertainty, simultaneously captures extrapolative regions and rare events, which is crucial for developing uniformly accurate MLIPs. Furthermore, exploiting automatic differentiation, we enhance bias-forces-driven MD with the concept of bias stress. We employ calibrated gradient-based uncertainties to yield MLIPs with similar or, sometimes, better accuracy than ensemble-based methods at a lower computational cost. Finally, we apply uncertainty-biased MD to alanine dipeptide and MIL-53(Al), generating MLIPs that represent both configurational spaces more accurately than models trained with conventional MD.


## Introduction

Computational techniques are invaluable for exploring complex configurational and compositional spaces of molecular and material systems. The accuracy and efficiency, however, depend on the chosen computational methods. Ab initio molecular dynamics (MD) simulations using density-functional theory (DFT) provide accurate results but are computationally demanding. Atomistic simulations with classical force fields offer a faster alternative but often lack accuracy. Thus, developing accurate and computationally efficient interatomic potentials is a key challenge successfully addressed by machine-learned interatomic potentials (MLIPs)^{1,2,3,4,5}. An essential component of any MLIP is the accurate encoding of the atomic system by a local representation, which depends on configurational (atomic positions) and compositional (atomic types) degrees of freedom^{6}. Recently, a wide range of MLIPs have been introduced, comprising linear and kernel-based models^{7,8,9,10}, Gaussian approximation^{11,12}, and neural network (NN) interatomic potentials^{13,14,15,16,17}, including graph NNs^{18,19,20,21,22,23,24}, all demonstrating remarkable success in atomistic simulations.

The effectiveness of MLIPs, however, crucially relies on training data sufficiently covering configurational and compositional spaces^{25,26}. Without such training data, MLIPs cannot faithfully reproduce the underlying physics. An open challenge, therefore, is the generation of comprehensive training data sets for MLIPs, covering relevant configurational and compositional spaces and ensuring that resulting MLIPs are uniformly accurate across these spaces. This objective must be realized while reducing the number of expensive DFT evaluations, which provide reference energies, atomic forces, and stresses. This challenge is further complicated by the limited knowledge of physical conditions, such as temperature and pressure, at which configurational changes occur. Setting temperatures and pressures excessively high can result in atomic system degradation before exploring the relevant phase space.

To address this challenge, iterative active learning (AL) algorithms are used to improve the accuracy of MLIPs by providing an augmented data set^{27,28,29,30,31,32,33,34}; see Fig. 1(a). They select the data most informative to the model, i.e., atomic configurations with high energy and force uncertainties, as estimated by the model. This data is drawn from configurational and compositional spaces explored during, e.g., MD simulations. Reference DFT energies, atomic forces, and stresses are evaluated for the selected configurations. Furthermore, energy and force uncertainties indicate the onset of extrapolative regions—regions where unreliable predictions are made—prompting the termination of MD simulations and the evaluation of reference DFT values. In this AL setting, covering the configurational space and exploring extrapolative configurations might require running longer MD simulations and defining physical conditions for observing slow configurational changes (rare events).

Alternatively, enhanced sampling methods can significantly speed up the exploration of the configurational space by using adaptive biasing strategies such as metadynamics^{35,36,37,38,39,40,41}; see Fig. 1(b). However, metadynamics requires manually selecting a few collective variables (CVs) that are assumed to describe the system. The limited number of CVs restricts exploration, as they might miss relevant transitions and parts of the configurational space. In contrast, MD simulations biased toward regions of high uncertainty can enhance the discovery of extrapolative configurations^{42,43}. A related work utilizes uncertainty gradients for adversarial training of MLIPs^{44,45}. To obtain MLIPs that are uniformly accurate across the relevant configurational space, however, simultaneous exploration of rare events and extrapolative configurations is necessary. The extent to which uncertainty-biased MD can achieve this objective remains an unexplored research area.

This work demonstrates the capability of uncertainty-biased MD to explore the configurational space, including fast exploration of rare events and extrapolative regions; see Fig. 1(c). We achieve this by exploring the CVs of alanine dipeptide—a widely used model for protein backbone structure. To assess the coverage of the CV space, we introduce a measure using a tree-based weighted recursive space partitioning. Furthermore, we extend existing uncertainty-biased MD simulations by automatic differentiation (AD) and propose a biasing technique that utilizes bias stresses obtained by differentiating the model’s uncertainty with respect to infinitesimal strain deformations. We assess the efficiency of the proposed technique by running MD simulations in the isothermal-isobaric (*NpT*) statistical ensemble and exploring the cell parameters of MIL-53(Al)—a flexible metal-organic framework (MOF) featuring closed- and large-pore stable states. Both benchmark systems are often used in studies assessing enhanced sampling and data generation methods^{36,38,41,44}.
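The biasing idea can be sketched in a few lines: the bias force is the gradient of the model's uncertainty with respect to atomic positions, scaled by a biasing strength *τ* and added to the physical forces. The sketch below is illustrative, with a made-up quadratic uncertainty and central finite differences standing in for the automatic differentiation used in practice.

```python
import numpy as np

def toy_uncertainty(positions):
    # Hypothetical stand-in for an MLIP's energy uncertainty u(R):
    # a quadratic bowl whose minimum lies away from the current state.
    return 0.5 * np.sum((positions - 1.0) ** 2)

def uncertainty_gradient(positions, eps=1e-6):
    # Central finite differences; in practice this gradient comes from
    # automatic differentiation of the model's uncertainty.
    grad = np.zeros_like(positions).ravel()
    flat = positions.ravel()
    for i in range(flat.size):
        shift = np.zeros_like(flat)
        shift[i] = eps
        grad[i] = (toy_uncertainty((flat + shift).reshape(positions.shape))
                   - toy_uncertainty((flat - shift).reshape(positions.shape))) / (2 * eps)
    return grad.reshape(positions.shape)

def biased_forces(physical_forces, positions, tau=0.5):
    # Adding tau * grad(u) drives the dynamics *toward* high-uncertainty
    # (extrapolative) regions; tau is the biasing strength.
    return physical_forces + tau * uncertainty_gradient(positions)
```

The sign convention is the essential point: the uncertainty gradient is added (not subtracted), so the dynamics climb toward extrapolative regions instead of being repelled from them.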

A key ingredient of AL algorithms with dynamically generated candidate pools is a sensitive metric for detecting the onset of extrapolative regions. These regions are typically associated with large errors in MLIP predictions. However, MLIP uncertainties often underestimate actual errors^{46,47}, resulting in the exploration of unphysical regions, negatively affecting MLIP training. Thus, calibrated uncertainties are crucial for generating high-quality MLIPs with AL, which involves configurations explored during MLIP-based MD^{47,48,49}, but might be unnecessary in AL tasks that rely on relative uncertainties^{50,51,52}. In our setting, we demonstrate that conformal prediction (CP) helps align the largest force error with its corresponding uncertainty value. This approach effectively makes MLIPs not underestimate force errors, which is important for preventing MD from exploring unphysical configurations. Thus, CP-based uncertainty calibration helps set reasonable uncertainty thresholds without limiting the exploration of the configurational space. In contrast, conventional approaches drive MD away from high-uncertainty regions, which can hinder exploration^{53}.

Contrary to existing work^{42,43}, which relies on ensembles of MLIPs for uncertainty quantification, we propose using ensemble-free uncertainties of NN-based MLIPs derived from gradient features^{50,51,52}. These features can be interpreted as the sensitivity of a model’s output to parameter changes. Recent studies demonstrate that gradient-based uncertainties perform comparably to ensemble-based counterparts in AL^{51,52,54}. Furthermore, they yield the exact posterior in the case of linear models^{9,10}. We demonstrate that gradient features can define uncertainties of total and atom-based properties, such as energy and atomic forces. To make gradient-based uncertainties computationally efficient, we employ the sketching technique^{55} and reduce the dimensionality of gradient features. For many NN-based MLIPs, gradient-based approaches can significantly reduce the computational cost of uncertainty quantification and accelerate the time-consuming MD simulations compared to ensemble-based methods. However, the latter can be made computationally efficient, e.g., through parallelization or employing specific settings with non-trainable descriptors and gradient-free force uncertainties^{45}.
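As an illustration of the posterior-based construction with sketched gradient features (the dimensions, regularization *λ*, and Gaussian random projection below are our assumptions, not the exact setup of this work):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical gradient features g(x) = d f_theta(x) / d theta for a
# training set and a few query configurations (dimensions made up).
d, k, n_train = 256, 32, 100
G_train = rng.standard_normal((n_train, d))
g_query = rng.standard_normal((5, d))

# Sketching: a random projection reduces the feature dimension d -> k,
# making the posterior covariance cheap to form and invert.
S = rng.standard_normal((d, k)) / np.sqrt(k)
G_s, g_s = G_train @ S, g_query @ S

# Posterior-based uncertainty (up to a calibration re-scaling factor):
# u(x)^2 = g(x)^T (G^T G + lam * I)^{-1} g(x).
lam = 1e-3
cov = np.linalg.inv(G_s.T @ G_s + lam * np.eye(k))
u = np.sqrt(np.einsum('ij,jk,ik->i', g_s, cov, g_s))
```

For a linear model, this expression recovers the exact Gaussian posterior variance; for NN-based MLIPs, it serves as a computationally cheap surrogate requiring a single forward-plus-backward pass instead of an ensemble.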

We further enhance configurational space exploration and improve the computational efficiency of AL by employing batch selection algorithms^{51,52}. These algorithms simultaneously select multiple atomic configurations from trajectories generated during parallel MD simulations. Batch selection algorithms enforce the informativeness and diversity of the selected atomic structures. Thus, they ensure the construction of maximally diverse training data sets.
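A minimal sketch of one such informativeness-diversity trade-off, assuming a greedy farthest-point rule weighted by uncertainty (the actual batch-selection algorithms of refs. 51,52 differ in detail):

```python
import numpy as np

def select_batch(features, uncertainties, batch_size):
    # Greedy selection: start from the most uncertain candidate, then
    # repeatedly add the candidate maximizing (uncertainty x distance to
    # the already-selected set). Already-selected points have distance
    # zero and are never re-picked.
    selected = [int(np.argmax(uncertainties))]
    while len(selected) < batch_size:
        dists = np.min(
            np.linalg.norm(features[:, None, :] - features[None, selected, :], axis=-1),
            axis=1,
        )
        selected.append(int(np.argmax(uncertainties * dists)))
    return selected
```

Uncertainty alone would pick near-duplicate high-error frames from the same trajectory segment; the distance factor spreads the batch across the explored space.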

## Results

### Overview

In the following, we first use the example of MIL-53(Al) to demonstrate that uncertainty calibration is necessary to constrain MD to physically reasonable regions of the configurational space. Then, we present two complementary analyses demonstrating the improved data efficiency of MLIPs obtained by our AL approach, developing MLIPs for alanine dipeptide and MIL-53(Al). Furthermore, we investigate how uncertainty-biased MD enhances the exploration of the configurational space, utilizing bias forces and stress. To benchmark our results, we draw a comparison with MD run at elevated temperatures and pressures as well as metadynamics simulations. Details on the ensemble-free uncertainties (distance- and posterior-based ones) and uncertainty-biased MD can be found in Methods.

### Calibrating uncertainties with conformal prediction

Total and atom-based uncertainties are typically poorly calibrated^{47}, meaning that they often underestimate actual errors. The underestimation of atomic force errors is particularly dangerous when dynamically generating candidate pools, as it may result in exploring unphysical configurations with extremely large errors in predicted forces. These unphysical configurations often cause convergence issues in reference DFT calculations. Additionally, poor calibration complicates defining an appropriate uncertainty threshold for prompting the termination of MD simulations and the evaluation of reference DFT energies, atomic forces, and stresses. To address this issue, we utilize inductive CP, which computes a re-scaling factor based on predicted uncertainties and prediction errors on a calibration set. The confidence level 1 − *α* in CP is defined such that the probability of underestimating the error is at most *α* on data drawn from the same distribution as the calibration set. For more details, see Methods.
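A minimal sketch of the inductive CP re-scaling described above, assuming nonconformity scores equal to the error-to-uncertainty ratio on the calibration set:

```python
import numpy as np

def cp_rescaling_factor(errors, uncertainties, alpha=0.05):
    # Inductive conformal prediction: the factor is the
    # ceil((n + 1) * (1 - alpha))-th smallest error-to-uncertainty
    # ratio on the calibration set (clipped to the largest score).
    scores = np.sort(np.abs(errors) / uncertainties)
    n = len(scores)
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    return scores[k - 1]
```

Multiplying all predicted uncertainties by this factor guarantees that, on data exchangeable with the calibration set, the calibrated uncertainty underestimates the error with probability at most *α*; a smaller *α* yields a larger (more conservative) factor.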

Figure 2 demonstrates the correlation of maximal atom-based uncertainties, \(\max_i u_i\), with maximal atomic force RMSEs, \(\max_i \sqrt{\frac{1}{3}\sum_{k=1}^{3}(\Delta F_{i,k})^{2}}\), for the MIL-53(Al) test data set from ref. ^{41}, based on numerous first-principles MD trajectories at 600 K. We chose maximal atomic force RMSE as our target metric to identify extrapolative regions due to its high sensitivity to unphysical local atomic environments. In MLIP-based atomistic simulations, we model it using maximal atom-based uncertainty. Employing quantiles or averages of atomic force RMSE could extend simulation time by reducing sensitivity to extreme values; however, exploring these alternatives is left for future work.

In Fig. 2, transparent hexbins represent uncertainties calibrated with a lower confidence (*α* = 0.5; see Methods), while opaque ones depict those calibrated with a higher confidence (*α* = 0.05). The presented uncertainties are derived from gradient features or an ensemble of three MLIPs and calibrated using CP with atomic force RMSEs^{49}. For posterior- and distance-based uncertainties, which are unitless, the re-scaling with CP ensures that the resulting uncertainties are provided in correct units, i.e., eV Å^{−1}. Ensemble-based uncertainty quantification already provides correct units, which CP preserves. Equivalent results for alanine dipeptide, including the correlation between average uncertainties and average force RMSEs, can be found in the Supplementary Information.

Figure 2 (top) demonstrates results for MLIPs trained on 45 MIL-53(Al) configurations, while five samples were used for early stopping and uncertainty calibration. Figure 2 (bottom) shows the results for MLIPs trained and validated on 450 and 50 MIL-53(Al) configurations, respectively. In both experiments, the training and validation samples were selected from the data sets provided by ref. ^{41}. The first 50 samples correspond to randomly perturbed structures, while the remaining 450 are generated using metadynamics combined with incremental learning^{41}. The latter is an iterative algorithm that improves MLIPs by training on configurations generated sequentially over time, using the last frame of atomistic simulations.

We observe that uncertainties calibrated with a lower confidence level often underestimate actual errors. In this case, MD can explore unphysical regions before reaching the uncertainty threshold, especially in cases with a weak correlation between uncertainties and actual errors. By employing CP with higher confidence, we help align the largest prediction error with the corresponding uncertainty, thereby improving its ability to identify the onset of extrapolative regions. This alignment becomes apparent in Fig. 2, where CP shifts the hexbin points to be on or below the diagonal.

In Fig. 2 (top), we find that even training and calibrating models with a few randomly perturbed atomic configurations is sufficient for robust identification of unreliable predictions. This result is crucial as we rely on such data sets to initialize our AL experiments, eliminating the need for predefined data sets^{42,43}. Furthermore, we observe that, for MIL-53(Al), calibrated uncertainties from model ensembles tend to overestimate the actual error to a greater extent than gradient-based approaches. While this may not be critical when exploring unphysical configurations, it can prematurely terminate MD simulations. This trend is consistent across all training and calibration data sizes. Lastly, the results provided here and in the Supplementary Information demonstrate that all uncertainty methods perform comparably regarding Pearson and Spearman correlation coefficients.

### Performance of bias-forces-driven active learning

Exploring the configurational space of complex molecular systems, particularly those with multiple stable states, is essential for developing accurate and robust MLIPs. We apply bias-forces-driven MD combined with AL to develop MLIPs for alanine dipeptide in vacuum. This dipeptide exhibits two stable conformers characterized by the backbone dihedral angles *ϕ* and *ψ* (see Fig. 3): the C_{7eq} state with *ϕ* ≈ − 1.5 rad and *ψ* ≈ 1.19 rad and the C_{ax} state with *ϕ* ≈ 0.9 rad and *ψ* ≈ − 0.9 rad^{56}. We use unbiased MD as the baseline for generating candidate pools in two scenarios: AL with candidates selected from unbiased MD trajectories based on their uncertainty (and diversity) and candidates sampled from them at random. The performance of MLIPs is assessed employing the test data obtained from a long MD trajectory at 1200 K; see Methods. We employ the AMBER ff19SB force field for reference energy and force calculations^{57}, as implemented in the TorchMD package using PyTorch^{58,59}.

Figure 3 demonstrates the performance of MLIPs obtained for alanine dipeptide depending on the number of acquired configurations. Table 1 presents error metrics evaluated for MLIPs at the end of each experiment. Here, we provide results for the posterior-based uncertainty and uncertainty-biased MD at 300 K. The Supplementary Information presents equivalent results for other uncertainty methods and temperatures. Figure 3a presents the coverage of the CV space defined by *ϕ* and *ψ*, calculated using all MD trajectories up to the current AL step. We measure the coverage of the respective space by a tree-based weighted recursive space partitioning; see Methods. AL experiments combined with unbiased MD at 1200 K serve as the upper-performance limit for MLIPs in the case of alanine dipeptide, achieving the highest coverage of 0.97 after acquiring 512 configurations. Increasing temperature even further while using interatomic potentials, which allow for bond breaking and formation, may lead to the degradation of the molecule. Uncertainty-biased MD simulations at 300 K result in slightly lower coverage values, surpassing the coverages achieved by unbiased MD at 300 K and 600 K.
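As an illustration of what such a coverage measure can look like, the sketch below recursively bisects the CV box and averages the occupancy of the two halves at every level; the exact weighting used in Methods may differ.

```python
import numpy as np

def coverage(points, lo, hi, depth, max_depth):
    # Recursively bisect the box [lo, hi] along its longest edge and
    # average the occupancy of the two halves; an occupied leaf counts
    # fully, an empty subtree counts zero.
    if len(points) == 0:
        return 0.0
    if depth == max_depth:
        return 1.0
    axis = int(np.argmax(hi - lo))
    mid = 0.5 * (lo[axis] + hi[axis])
    left, right = points[points[:, axis] < mid], points[points[:, axis] >= mid]
    hi_l, lo_r = hi.copy(), lo.copy()
    hi_l[axis], lo_r[axis] = mid, mid
    return 0.5 * (coverage(left, lo, hi_l, depth + 1, max_depth)
                  + coverage(right, lo_r, hi, depth + 1, max_depth))
```

A sample covering all subcells scores 1, while a single isolated point scores 2^(-max_depth), so the measure rewards spread rather than the raw number of visited configurations.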

Furthermore, biased MD at 300 K outperforms unbiased dynamics at 1200 K, efficiently covering the CV space before acquiring ~ 200 configurations. This observation is attributed to the gradual increase in driving forces induced by the uncertainty bias, resulting in a more gradual distortion of the atomic structure. In contrast, high-temperature unbiased simulations perturb the system more strongly and rapidly enter extrapolative regions without exploring relevant configurational changes. Thus, high-temperature simulations may also cause the degradation of the investigated atomic systems, unlike uncertainty-biased dynamics applied at mild physical conditions.

Figure 3b, c present energy and force RMSEs evaluated on the alanine dipeptide test data set; see Methods. Consistent with the findings in Fig. 3a, AL approaches combined with biased MD at 300 K outperform their unbiased counterparts at 300 K and 600 K once they acquire ~ 100 configurations. Biased AL experiments achieve an energy RMSE of 1.97 meV atom^{−1}, close to that observed in high-temperature MD simulations, surpassing the unbiased counterparts at 300 K and 600 K by a factor of more than 13. A similar trend is observed for the force RMSE: biased AL experiments achieve an RMSE of 0.071 eV Å^{−1}, outperforming their counterparts at 300 K and 600 K by factors of 2.1 and 1.6, respectively.

These results demonstrate the efficiency of uncertainty-biased dynamics in exploring the configurational space and developing accurate and robust MLIPs. Moreover, generating training data that sufficiently covers the configurational space by combining AL with biased MD does not significantly increase the computational demand compared to conventional AL with unbiased MD; see the Supplementary Information. Lastly, MLIPs trained with candidates selected based on their uncertainty (and diversity) from biased and unbiased MD trajectories systematically outperform MLIPs trained with candidates selected at random; see Table 1.

Biased AL experiments achieve exceptional performance without prior knowledge of the temperatures that accelerate transitions between stable states; see Fig. 3d. Identifying these temperatures would require running MD simulations at different conditions to explore the configurational space without degrading the atomic system. In contrast, at mild physical conditions such as 300 K and 600 K, biased MD simulations outperform their unbiased counterparts and achieve performance comparable to experiments at 1200 K for *τ* ≲ 0.5 and 0.2 ≲ *τ* ≲ 0.4, respectively. The available range of biasing strength values may be more restricted at more extreme conditions: adding uncertainty bias to MD at 1200 K perturbs the system even more strongly than unbiased MD does, without yielding any improvement. For additional details, see the Supplementary Information.

Our results offer evidence of rare event exploration (the exploration of both stable states of alanine dipeptide) through uncertainty-biased dynamics. The following section will present a detailed analysis of the exploration rates. Additionally, we have identified how to further improve our biased MD simulations by making biasing strengths species dependent; see the Supplementary Information. The results presented in this section, achieved with a biasing strength of zero for hydrogen atoms, outperform settings where all atoms are biased equally, with improvements by a factor of 1.08 in coverage and 1.15 in force RMSE; see Table 1. Thus, a more sophisticated data-driven redistribution of biasing strengths can further enhance the performance of bias-forces-driven MD simulations. However, learning species-dependent biasing strengths necessitates defining a suitable loss function that promotes the fast exploration of phase space^{60}, which falls beyond the scope of this work.

### Exploration rates for collective variables of alanine dipeptide

We have observed that uncertainty-biased MD simulations effectively explore the configurational space of alanine dipeptide, defined by its CVs. Figure 4 evaluates the extent to which the introduced bias forces in MD simulations accelerate their exploration. In Fig. 4a, we present the coverage of the CV space as a function of simulation time, i.e., of the effective number of MD steps. The figure demonstrates that uncertainty-biased AL experiments at 300 K outperform unbiased experiments at 300 K and 600 K. They achieve the same coverage in considerably shorter simulation times, thereby enhancing exploration rates by a factor larger than two. At the same time, biased MD simulations yield results comparable to those obtained from unbiased MD simulations at 1200 K. Thus, uncertainty-biased MD explores configurational space at a similar rate to unbiased MD at 1200 K.

The exploration rates estimated from Fig. 4a provide an approximate measure of how uncertainty-biased dynamics accelerate the exploration of configurational space. To offer a more thorough assessment, we examine auto-correlation functions (ACFs) computed for both position and uncertainty spaces in Fig. 4b, c. Here, a faster decay corresponds to a faster exploration of the respective space. We compute ACFs using MD trajectories from all AL iterations. Additionally, we calculate the auto-correlation time (ACT) for each experiment. For the definition of ACF and ACT, see Methods. Table 1 presents ACTs for all AL experiments. Smaller ACTs correspond to a faster decay of ACFs, indicating a faster exploration of the respective spaces.
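For concreteness, a common estimator of the normalized ACF and the integrated ACT is sketched below; the exact definitions used in this work are given in Methods and may differ.

```python
import numpy as np

def acf(x, max_lag):
    # Normalized auto-correlation function of a 1-D time series
    # (biased estimator: every lag is divided by n * var).
    x = x - x.mean()
    n, var = len(x), np.dot(x, x) / len(x)
    return np.array([np.dot(x[:n - lag], x[lag:]) / (n * var)
                     for lag in range(max_lag)])

def act(acf_values, dt=1.0):
    # Integrated auto-correlation time, ACT = dt * (1 + 2 * sum_k ACF(k)),
    # with the sum truncated at the available lags.
    return dt * (1.0 + 2.0 * acf_values[1:].sum())
```

A rapidly decaying ACF gives an ACT close to the time step, meaning consecutive frames are nearly independent; slowly decaying correlations inflate the ACT and signal slow exploration.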

ACTs demonstrate that uncertainty-biased MD at 300 K explores position and uncertainty spaces two to six times faster than unbiased MD at 300 K and 600 K. Compared to unbiased MD at 1200 K, it achieves comparable exploration rates in the position space and rates lower by a factor of two for the uncertainty space. Biasing hydrogen atoms reduces the uncertainty ACT compared to experiments with zero hydrogen biasing strength but increases the position ACT by a factor of three. Thus, stronger atomic bond distortions, resulting in fast exploration of extrapolative regions, can explain a shorter uncertainty ACT of unbiased MD at 1200 K. While this effect can be unfavorable for promoting the exploration of rare events in biased MD, incorporating small, non-zero biasing strengths for hydrogen atoms may be necessary to ensure the robustness of MD simulations at elevated temperatures. Interestingly, we observe that uncertainty-biased MD explores both stable states in alanine dipeptide, even though 27 degrees of freedom (C, N, and O atoms) were effectively biased, demonstrating its remarkable efficiency.

To gain insight into the exploration of the CV space during AL, we refer to Fig. 4d, e, which illustrate the time evolution of the maximal atom-based uncertainty and the CV space coverage for selected AL iterations. Biased MD systematically explores configurations with higher uncertainty values than unbiased MD at 300 K and 600 K. Furthermore, bias forces drive the exploration of both stable states of alanine dipeptide and promote transitions between them, similar to higher temperatures in unbiased MD. Later AL iterations in Fig. 4d, e demonstrate that MD driven by bias forces reduces the uncertainty level uniformly across the configurational space. Thus, given the correlation between uncertainties and actual errors, uncertainty-biased MD generates MLIPs uniformly accurate across the configurational space.

### Performance of bias-stress-driven active learning

Generating training data for bulk material systems with large unit cells and multiple stable states poses a significant challenge in developing MLIPs. Therefore, we assess the performance of the bias-stress-driven AL applied to MIL-53(Al), a flexible MOF that undergoes reversible, large-amplitude volume changes under external stimuli, such as temperature and pressure (see Fig. 5). MIL-53(Al) features two stable phases: the closed-pore state with a unit cell volume of *V* ~ 830 Å^{3} and the large-pore state with *V* ~ 1419 Å^{3}. For reference energy, force, and stress calculations, we use the CP2K simulation package (version 2023.1)^{61} and DFT at the PBE-D3(BJ) level^{62,63}. Our baseline for generating candidate pools for AL involves unbiased MD and training data selected based on their uncertainty (and diversity) or at random. We also employ metadynamics^{41}, which uses an adaptive biasing strategy for cell parameters of MIL-53(Al), as a baseline. We assess the performance of MLIPs for MIL-53(Al) using the test data set presented by ref. ^{41}.
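The bias stress can be sketched as the derivative of the model's uncertainty with respect to an infinitesimal strain of the cell (and positions), normalized by the cell volume. The toy, volume-dependent uncertainty below and the central finite differences (standing in for automatic differentiation) are illustrative assumptions.

```python
import numpy as np

def toy_uncertainty(positions, cell):
    # Hypothetical uncertainty growing with deviation from a "trained"
    # cell volume of 1000 A^3; a real model also depends on positions.
    volume = abs(np.linalg.det(cell))
    return 1e-4 * (volume - 1000.0) ** 2

def bias_stress(positions, cell, eps=1e-6):
    # d u / d strain_ab by central finite differences (automatic
    # differentiation in practice), normalized by the cell volume.
    stress = np.zeros((3, 3))
    for a in range(3):
        for b in range(3):
            strain = np.zeros((3, 3))
            strain[a, b] = eps

            def deformed(s):
                F = np.eye(3) + s  # infinitesimal strain deformation
                return positions @ F.T, cell @ F.T

            up = toy_uncertainty(*deformed(+strain))
            dn = toy_uncertainty(*deformed(-strain))
            stress[a, b] = (up - dn) / (2 * eps)
    return stress / abs(np.linalg.det(cell))
```

Adding this term to the virial stress in an *NpT* barostat biases the cell dynamics toward volumes where the model is uncertain, analogously to the bias forces acting on atomic positions.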

Figure 5a–c demonstrate the performance of MLIPs developed for MIL-53(Al) depending on the number of acquired configurations. Table 2 presents error metrics evaluated for MLIPs at the end of each experiment. Here, we present results for the posterior-based uncertainty. The Supplementary Information presents equivalent results for other uncertainty methods and pressures. We observe that MLIPs trained with configurations generated using metadynamics outperform the others for data set sizes below ~ 200 samples. This difference in performance can be attributed to how perturbed configurations are generated and the differing experimental settings between incremental learning and AL applied here. Bias-stress-driven AL outperforms metadynamics-based experiments after acquiring ~ 200 configurations regarding force and stress RMSEs.

Metadynamics-based experiments achieve performance on par with unbiased AL experiments conducted at 0 MPa after they reach a data set size of ~ 200 configurations. For uncertainty-biased MD, the force RMSE improves by a factor of 1.14, and the stress RMSE improves by a factor of two compared to zero-pressure unbiased MD. Furthermore, AL experiments with biased MD simulations outperform unbiased MD simulations at 250 MPa regarding stress RMSE. Thus, bias-stress-driven MD generates a data set that represents the relevant configurational space of flexible MOFs better than data generated with conventional MD and metadynamics simulations. This improvement is achieved without significantly increasing the computational cost of data generation; see the Supplementary Information. Lastly, similar to the results obtained for alanine dipeptide, AL with a more advanced selection strategy outperforms experiments where training data is picked at random; see Table 2.

Figure 5d, e show the main advantage of biased MD simulations over unbiased and metadynamics-based approaches. While exploring the large-pore state less frequently than metadynamics-based counterparts, bias-stress-driven MD spans a broader range of volumes and uniformly reduces energy, force, and stress RMSEs across the entire volume space. Compared to zero-pressure unbiased MD simulations, it promotes the exploration of the large-pore state. However, this state can be modeled using atomic environments from the closed-pore one; thus, the bias stress does not excessively favor its exploration. Instead, it drives the dynamics more toward smaller volumes, for which all other approaches tend to predict energy, force, and stress values with larger errors. Note that, in Fig. 5e, we reduce the temperature to 300 K and initiate AL experiments with 256 configurations, each having a unit cell volume below 1200 Å^{3} (drawn from the training data in ref. ^{41}). Using a lower temperature and learning the configurational space around the closed-pore state is required to decrease the probability of MD simulations exploring the large-pore stable state of MIL-53(Al). In contrast, we found that using randomly perturbed atomic configurations can lead to underestimated energy barriers by MLIPs, thus facilitating the transition between both stable phases in initial AL iterations.

These results show that uncertainty-biased MD simulations aim to uniformly reduce errors across the relevant configurational space and promote the simultaneous exploration of extrapolative regions and transitions between stable states. Also, under selected physical conditions (*T* = 600 K and *p* = 0 MPa), the performance of our uncertainty-biased MD exhibits low sensitivity to stress biasing strength values for *τ* ≥ 0.5; see the Supplementary Information. Metadynamics, in contrast, may require longer simulation times to generate equivalent candidate pools as it focuses on generating configurations uniformly distributed in the CV space, which is unnecessary for developing MLIPs.

### Exploration rates for cell parameters of MIL-53(Al)

Figure 6 assesses the extent to which uncertainty-biased (bias stress) MD simulations enhance the exploration of the extensive volume space of MIL-53(Al). In Fig. 6a, we observe a higher frequency of transitions between stable phases for biased MD simulations than for zero-pressure counterparts. Additionally, uncertainty-biased simulations favor the exploration of smaller MIL-53(Al) volumes, in line with the results shown in Fig. 5. Figure 6b, c present ACFs for position and uncertainty spaces, with estimated ACTs provided in Table 2. Here, a faster decay of ACFs corresponds to shorter ACTs and indicates a faster exploration of the respective space. These results indicate that bias-stress-driven MD is at least as efficient as high-pressure MD simulations in exploring both spaces. Figure 6d demonstrates the time evolution of energy, force, and stress RMSEs. It reveals that local atomic environments in the large-pore state are well represented by those in the closed-pore state, explaining the stronger preference for smaller volumes by biased MD; see Figs. 6a and 5d, e. This effect is evident from the low force and stress RMSEs in the early AL iterations for the large-pore state, even though this state has not been explored yet. Furthermore, uncertainty-biased MD simulations surpass the performance of their counterparts already in the early stages by aiming to reduce errors across the test volume space uniformly.

From these results and the findings in Fig. 5d, we conclude that bias-stress-driven MD significantly enhances the exploration of the relevant configurational space, including rare events (i.e., transitions between stable phases). However, in Table 2, we obtained longer ACTs for biased MD at 300 K compared to its unbiased counterparts, which contradicts our previous arguments. When examining the ACF shown in Fig. 7, it becomes evident that a stronger correlation in the position space results from the volume fluctuations induced in MIL-53(Al) by the bias stress. These fluctuations can be represented by a sine wave with additive random noise and a period twice the simulation’s length; see Methods. This observation implies that bias stress induces correlated motions in the MIL-53(Al) system, causing it to expand and contract alternately for half of the simulation time. This phenomenon results in periodic exploration of small and large volumes within the configurational space.

In contrast to the conventional approaches, including the bias-forces-driven MD simulations, which aim for uncorrelated random-walk-like behavior of predetermined CVs to capture configurational changes, our method introduces correlated motion that explores the entire configurational space. Increasing the amplitude of random noise in the sine wave reduces the amplitude of these fluctuations in the ACF, similar to raising the temperature in an atomic system. This decrease in the amplitude explains why this effect is not observed in Fig. 6b.

## Discussion

This work investigates an uncertainty-driven AL approach for data set generation, facilitating the development of high-quality MLIPs for chemically complex atomic systems. We employ uncertainty-biased MD simulations to generate candidate pools for AL algorithms. Our results show that applying uncertainty bias facilitates simultaneous exploration of extrapolative regions and rare events. Efficient exploration of both is crucial in constructing comprehensive training data sets, enabling the development of uniformly accurate MLIPs. In contrast, classical enhanced sampling techniques (e.g., metadynamics) or unbiased MD simulations at elevated temperatures and pressures often cannot simultaneously explore extrapolative regions and rare events. Enhanced sampling techniques were designed to ensure the reconstruction of the underlying Boltzmann distribution. However, this property is unnecessary for data set generation and may limit their effectiveness in this context.

The performance of enhanced sampling techniques depends on the manual definition of hyper-parameters, e.g., CVs for metadynamics. Setting them requires expert knowledge because a wrong choice can limit the range of explored configurations. Uncertainty-biased MD requires only an uncertainty threshold and a biasing strength. Both parameters influence the exploration rate of the configurational space without constraining the space that can be explored. Under milder conditions, uncertainty-biased MD simulations outperform their unbiased counterparts for a broad range of biasing strength values, making the choice of biasing strength less critical. Yet, the dependence of the performance on the biasing strength becomes more noticeable under extreme conditions, sometimes with no improvement gained by adding uncertainty bias to MD. A similar behavior can also be expected for metadynamics simulations^{64}. Additionally, employing species-dependent biasing strengths can restrict biasing in sensitive configurational regions, e.g., by damping the biasing of hydrogen atoms.

Imposing extreme conditions such as high temperatures and pressures can also accelerate phase space exploration in unbiased MD. However, a wrong choice of temperature and pressure may result in unphysical force predictions and degradation of the atomic system. In contrast, uncertainty-biased MD, conducted under milder conditions, explores relevant phase space at rates comparable to those obtained under extreme conditions and reduces the risk of degrading the atomic system. As mentioned, uncertainty-biased MD simulations outperform their unbiased counterparts for a broad range of biasing strength values in our setting. Furthermore, while evaluating uncertainty gradients increases the inference times by a factor of 1.4 to 1.7 compared to unbiased MD, applying uncertainty bias leads to, on average, shorter MD simulations. Thus, the difference in the computational cost between biased and unbiased MD is typically insignificant.

We compare uncertainty quantification methods, including the variance of an ensemble of MLIPs, and ensemble-free methods derived from sketched gradient features, focusing on configurational space exploration rates and generating uniformly accurate potentials; see the Supplementary Information. Overall, gradient-based approaches yield MLIPs with similar performance to those created using ensemble-based uncertainty while significantly reducing the computational cost of uncertainty quantification. For MIL-53(Al), we find that ensemble-based uncertainties overestimate the force error more strongly than gradient-based approaches, resulting in earlier termination of MD simulations and potentially worse configurational space exploration. For alanine dipeptide, using an ensemble of MLIPs improves their robustness during MD simulations, facilitating CV space exploration. Therefore, improving the robustness of a single MLIP during an MD simulation, particularly in combination with the proposed ensemble-free techniques, is a promising research direction^{65}.

While this study thoroughly investigates AL with uncertainty-biased MD for generating candidate pools, further research is still necessary. For example, one should analyze how well uncertainty-biased MD explores a configurational space with multiple stable states and how it identifies the respective slow modes using solely uncertainty bias. Also, assessing the uniform accuracy of resulting MLIPs and the enhanced exploration in higher-dimensional CV spaces remains challenging. Furthermore, the applicability of the proposed data generation approach to more complex molecular and material systems, such as biological polymers^{66} and multicomponent alloys^{5}, is yet to be explored. Unlike MD, Monte Carlo simulations generally allow significant configurational changes, eliminating the need to explore intermediate transition paths. Combined with uncertainty bias, they might avoid exploring intermediate, low-uncertainty transition regions, improving the efficiency of uncertainty-driven data generation. Lastly, the extent to which MLIPs based on graph NNs can enhance the efficiency of the proposed data generation approach remains to be seen.

## Methods

### Machine-learned interatomic potentials

We define an atomic configuration, \(S={\{{{{{\bf{r}}}}}_{i},{Z}_{i}\}}_{i = 1}^{{N}_{{{{\rm{at}}}}}}\), where \({{{{\bf{r}}}}}_{i}\in {{\mathbb{R}}}^{3}\) are Cartesian coordinates and \({Z}_{i}\in {\mathbb{N}}\) is the atomic number of atom *i*, with a total of *N*_{at} atoms. Our focus lies on interatomic NN potentials, which map an atomic configuration to a scalar energy *E*. The mapping is denoted as \({f}_{{{{\boldsymbol{\theta }}}}}:S\,\mapsto\, E\in {\mathbb{R}}\), where **θ** denotes the trainable parameters. By assuming the locality of interatomic interactions, we decompose the total energy of the system into individual atomic contributions^{13}

$$E\left(S,{{{\boldsymbol{\theta }}}}\right)=\mathop{\sum }\limits_{i=1}^{{N}_{{{{\rm{at}}}}}}{E}_{i}\left({S}_{i},{{{\boldsymbol{\theta }}}}\right),$$
(1)

where *S*_{i} is the local environment of atom *i*, defined by the cutoff radius *r*_{c}. The trainable parameters **θ** are learned from atomic data sets containing atomic configurations and their energies, atomic forces, and stress tensors.

### Gradient-based uncertainties

We quantify the uncertainty of a trained MLIP by expanding its energy per atom *E*_{at} = *E*/*N*_{at} around the locally optimal parameters *θ*^{*}^{50,51,52}

$${E}_{{{{\rm{at}}}}}\left(S,{{{\boldsymbol{\theta }}}}\right)\approx {E}_{{{{\rm{at}}}}}\left(S,{{{{\boldsymbol{\theta }}}}}^{*}\right)+\phi {\left(S\right)}^{\top }\left({{{\boldsymbol{\theta }}}}-{{{{\boldsymbol{\theta }}}}}^{*}\right),\quad \phi \left(S\right)={\nabla }_{{{{\boldsymbol{\theta }}}}}{E}_{{{{\rm{at}}}}}\left(S,{{{{\boldsymbol{\theta }}}}}^{*}\right),$$
(2)

where *S* denotes an atomic configuration as defined in the previous section. Gradient features \(\phi \left(S\right)\in {{\mathbb{R}}}^{{N}_{{{{\rm{feat}}}}}}\) can be interpreted as the sensitivity of the energy to small parameter perturbations. Here, *N*_{feat} is the number of trainable parameters of the MLIP. We employ the energy per atom *E*_{at} in Eq. (2), as it accounts for the extensive nature of the energy, whose value depends on the system size. This choice ensures that uncertainties defined using gradient features do not favor the selection of larger structures. Gradient features can also be expressed as the mean of their atomic contributions: \(\phi =\mathop{\sum }\nolimits_{i = 1}^{{N}_{{{{\rm{at}}}}}}{\phi }_{i}/{N}_{{{{\rm{at}}}}}\). For atomic gradient features *ϕ*_{i}, using the energy per atom in Eq. (2) is unnecessary. Here, we use \(\phi =\phi \left(S\right)\) and \({\phi }_{i}={\phi }_{i}\left({S}_{i}\right)\), with *S*_{i} denoting the local environment of an atom *i*, to simplify the notation. Thus, gradient features can be used to quantify uncertainties in total and atom-based properties of an atomic system, such as energy and atomic forces, respectively.

In particular, we define the atom-based model’s uncertainty (atomic forces) by employing squared distances between atomic gradient features

$$u\left({S}_{i}\right)=\mathop{\min }\limits_{{S}_{j}^{{\prime} }\in {{{{\mathcal{X}}}}}_{{{{\rm{train}}}}}}{\left\Vert {\phi }_{i}\left({S}_{i}\right)-{\phi }_{j}\left({S}_{j}^{{\prime} }\right)\right\Vert }_{2}^{2}.$$
(3)

Alternatively, we consider Bayesian linear regression in Eq. (2) and compute the posterior uncertainty as

$${u}^{2}\left({S}_{i}\right)={\phi }_{i}{\left({S}_{i}\right)}^{\top }{\left({\Phi }_{{{{\rm{train}}}}}^{\top }{\Phi }_{{{{\rm{train}}}}}+\lambda {{{\bf{I}}}}\right)}^{-1}{\phi }_{i}\left({S}_{i}\right),$$
(4)

where *λ* is the regularization strength. Here, we define \({\Phi }_{{{{\rm{train}}}}}={\phi }_{j}\left({{{{\mathscr{X}}}}}_{{{{\rm{train}}}}}\right)\in {{\mathbb{R}}}^{\left({N}_{{{{\rm{at}}}}}\cdot {N}_{{{{\rm{train}}}}}\right)\times {N}_{{{{\rm{feat}}}}}}\) with \({{{{\mathscr{X}}}}}_{{{{\rm{train}}}}}\) denoting the local atomic environments of configurations in the training set of size *N*_{train}. In this work, we refer to our uncertainties as distance- and posterior-based uncertainties. Equivalent results can be obtained for total uncertainties (energy), employing gradient features \(\phi =\mathop{\sum }\nolimits_{i = 1}^{{N}_{{{{\rm{at}}}}}}{\phi }_{i}/{N}_{{{{\rm{at}}}}}\) with \({\Phi }_{{{{\rm{train}}}}}=\phi \left({{{{\mathcal{X}}}}}_{{{{\rm{train}}}}}\right)\in {{\mathbb{R}}}^{{N}_{{{{\rm{train}}}}}\times {N}_{{{{\rm{feat}}}}}}\).
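As an illustration, the posterior-based uncertainty can be computed with plain NumPy; the ridge form of the posterior covariance written below is an assumption consistent with the role of the regularization strength *λ*, and all dimensions are placeholders:

```python
import numpy as np

def posterior_uncertainty(phi, Phi_train, lam):
    """Posterior-based uncertainty of a (sketched) gradient-feature vector.

    Assumes a Bayesian linear regression view of the linearized model:
    u^2(S) = phi(S)^T (Phi_train^T Phi_train + lam * I)^{-1} phi(S).
    """
    n_feat = Phi_train.shape[1]
    A = Phi_train.T @ Phi_train + lam * np.eye(n_feat)
    return float(np.sqrt(phi @ np.linalg.solve(A, phi)))

rng = np.random.default_rng(0)
Phi_train = rng.normal(size=(50, 8))   # 50 training feature vectors
phi = rng.normal(size=8)               # features of a new configuration

u_small = posterior_uncertainty(phi, Phi_train, lam=1e-2)
u_large = posterior_uncertainty(phi, Phi_train, lam=1e2)
# Stronger regularization shrinks the posterior covariance and thus the uncertainty.
```

Note that a single Cholesky factorization of the regularized Gram matrix can be reused for many query configurations.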

Calculating uncertainties using gradient features is computationally challenging, especially for the posterior-based approach, for which a single uncertainty evaluation scales as \({{{\mathscr{O}}}}\left({N}_{{{{\rm{feat}}}}}^{2}\right)\). Therefore, we employ the sketching technique^{55} to reduce the dimensionality of gradient features, i.e., \({\phi }_{i}^{{{{\rm{rp}}}}}={{{\bf{U}}}}{\phi }_{i}\in {{\mathbb{R}}}^{{N}_{{{{\rm{rp}}}}}}\) with *N*_{rp} and \({{{\bf{U}}}}\in {{\mathbb{R}}}^{{N}_{{{{\rm{rp}}}}}\times {N}_{{{{\rm{feat}}}}}}\) denoting the number of random projections and a random matrix, respectively^{51,52}. In previous work^{51}, we have observed that uncertainties derived from sketched gradient features demonstrate a better correlation with RMSEs of related properties than those based on last-layer features^{50,67,68}. More details on sketched gradient features can be found in the following sections. Atom-based uncertainties, defined by the distances between gradient features, scale linearly with both the system size and the number of training structures, i.e., as \({{{\mathcal{O}}}}\left({N}_{{{{\rm{at}}}}}{N}_{{{{\rm{train}}}}}\right)\). Consequently, they require an additional approximation to ensure computational efficiency. To address this, we employed the batch selection algorithm that maximizes distances within the training set, allowing us to identify the most representative subset of atomic gradient features; see the following sections.
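The effect of sketching can be illustrated with a plain Gaussian random projection (the tensor sketch used in practice is more runtime and memory efficient; dimensions below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n_feat, n_rp = 4096, 256          # number of features and random projections

# Scaled Gaussian sketch matrix U; rows are independent random directions.
U = rng.normal(size=(n_rp, n_feat)) / np.sqrt(n_rp)

phi_a = rng.normal(size=n_feat)   # two atomic gradient-feature vectors
phi_b = rng.normal(size=n_feat)

d_full = np.linalg.norm(phi_a - phi_b)
d_sketch = np.linalg.norm(U @ phi_a - U @ phi_b)
rel_err = abs(d_sketch - d_full) / d_full
# Johnson-Lindenstrauss-type embeddings keep rel_err small with high probability.
```

Because distances are approximately preserved, the distance-based uncertainty in Eq. (3) can be evaluated in the low-dimensional sketched space.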

### Uncertainty-biased molecular dynamics

Following previous work^{42,43}, we define the biased energy as

$$\tilde{E}\left(S,{{{\boldsymbol{\theta }}}}\right)=E\left(S,{{{\boldsymbol{\theta }}}}\right)-\tau \cdot u\left(S,{{{\boldsymbol{\theta }}}}\right),$$
(5)

where *τ* denotes the biasing strength. The negative sign ensures that negative uncertainty gradients with respect to atomic positions (bias forces) drive the system toward high uncertainty regions; see Fig. 1c. In this work, we use AD to compute bias forces acting on atom *i*, denoted as \(-{\nabla }_{{{{{\bf{r}}}}}_{i}}u\left(S,{{{\boldsymbol{\theta }}}}\right)\) with atomic positions **r**_{i}. The total biased force on atom *i* reads

$${\tilde{{{{\bf{F}}}}}}_{i}\left(S,{{{\boldsymbol{\theta }}}}\right)=-{\nabla }_{{{{{\bf{r}}}}}_{i}}\tilde{E}\left(S,{{{\boldsymbol{\theta }}}}\right)={{{{\bf{F}}}}}_{i}\left(S,{{{\boldsymbol{\theta }}}}\right)+\tau \cdot {\nabla }_{{{{{\bf{r}}}}}_{i}}u\left(S,{{{\boldsymbol{\theta }}}}\right).$$
(6)

These biased forces can be used for MD simulations in, e.g., the canonical (*N**V**T*) statistical ensemble to bias the exploration of the configurational space.
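As a toy illustration of how a bias toward high uncertainty changes the dynamics, the following self-contained sketch relaxes a particle on a biased energy surface \(\tilde{E}=E-\tau \cdot u\); the harmonic energy, the Gaussian uncertainty model, and all parameters are invented for this example and are not from the paper:

```python
import math

def grad_biased(x, tau, k=1.0, x0=1.0, sigma=0.5):
    """Gradient of the toy biased energy E~(x) = 0.5*k*x**2 - tau*u(x),
    with a Gaussian stand-in uncertainty u(x) = exp(-(x - x0)**2 / (2*sigma**2))."""
    u = math.exp(-(x - x0) ** 2 / (2 * sigma ** 2))
    du = -(x - x0) / sigma ** 2 * u      # du/dx
    return k * x - tau * du

def relax(tau, x=2.0, eta=0.02, steps=4000):
    """Steepest-descent relaxation on the biased energy surface."""
    for _ in range(steps):
        x -= eta * grad_biased(x, tau)
    return x

x_unbiased = relax(tau=0.0)   # relaxes to the harmonic minimum at x = 0
x_biased = relax(tau=2.0)     # settles near the high-uncertainty region at x0 = 1
```

The unbiased dynamics never visits the high-uncertainty region, while the biased dynamics is attracted to it, which is the mechanism exploited for candidate generation.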

In the case of bulk atomic systems, the configurational space often includes variations in the cell parameters, which define the shape and size of the unit cell and thus also require enhanced exploration. For this purpose, we propose the concept of bias stress, defined by

$${{{{\boldsymbol{\sigma }}}}}^{{{{\rm{b}}}}}\left(S,{{{\boldsymbol{\theta }}}}\right)=-\frac{1}{V}{\left.\frac{\partial u\left(S,{{{\boldsymbol{\theta }}}}\right)}{\partial {{{\boldsymbol{\epsilon }}}}}\right\vert }_{{{{\boldsymbol{\epsilon }}}}={{{\bf{0}}}}},$$
(7)

with *V* denoting the volume of the periodic cell. This expression is motivated by the definition of the stress tensor^{69}. Here, \(u\left(S,{{{\boldsymbol{\theta }}}}\right)\) denotes the uncertainty after a strain deformation of the bulk atomic system with the symmetric tensor \({{{\boldsymbol{\epsilon }}}}\in {{\mathbb{R}}}^{3\times 3}\), i.e., \(\tilde{{{{\bf{r}}}}}=\left({{{\bf{1}}}}+{{{\boldsymbol{\epsilon }}}}\right)\cdot {{{\bf{r}}}}\). The calculation of the bias stress is straightforward with AD. The total biased stress reads

$$\tilde{{{{\boldsymbol{\sigma }}}}}\left(S,{{{\boldsymbol{\theta }}}}\right)={{{\boldsymbol{\sigma }}}}\left(S,{{{\boldsymbol{\theta }}}}\right)+\tau \cdot {{{{\boldsymbol{\sigma }}}}}^{{{{\rm{b}}}}}\left(S,{{{\boldsymbol{\theta }}}}\right).$$
(8)

The bias stress tensor in Eq. (7) effectively reduces the internal pressure in the bulk atomic system. We propose combining the bias stress tensor with MD simulations conducted in the isothermal-isobaric (*N**p**T*) statistical ensemble to enhance the data-driven exploration of cell parameters and pressure-induced transitions in bulk materials.

Uncertainty gradients exhibit different magnitudes compared to energy gradients. Thus, re-scaling uncertainty gradients is necessary to ensure consistent driving toward uncertain regions. Building upon the approach introduced in ref. ^{43}, we implement a re-scaling technique that monitors the magnitudes of both actual and bias forces (alternatively, actual and bias stresses) over *N* steps and then computes the ratio between them. To re-scale bias forces, we use the following expression

An equivalent expression is applied for bias stresses.

The re-scaling of uncertainty gradients is reminiscent of the AdaGrad algorithm^{70}, which dynamically adjusts the learning rate (analogous to the biasing strength) based on historical gradients from previous iterations. While incorporating momentum through exponential moving averages can improve the AdaGrad approach, treating all past gradients with equal weight is essential within the context of this study. Our attempts to damp learning along directions with high curvature (high-frequency oscillations), similar to the Adam optimizer^{71}, did not yield improved performance. We further find that employing species-dependent biasing strengths for bias forces, \(\tau \to {\tau }_{{Z}_{i}}\), with a particular emphasis on damping biasing of hydrogen atoms, improves the efficiency of biased MD simulations.
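A stripped-down version of this re-scaling might look as follows; the running-average ratio is an assumption of this sketch, not the authors' exact expression:

```python
import numpy as np

class BiasForceRescaler:
    """Tracks magnitudes of actual and bias forces over the past steps and
    re-scales the bias forces so that tau directly controls their relative size.

    All past magnitudes are kept with equal weight, as emphasized in the text."""

    def __init__(self, tau):
        self.tau = tau
        self.force_mags = []
        self.bias_mags = []

    def __call__(self, forces, bias_forces):
        self.force_mags.append(np.linalg.norm(forces, axis=1).mean())
        self.bias_mags.append(np.linalg.norm(bias_forces, axis=1).mean())
        scale = np.mean(self.force_mags) / (np.mean(self.bias_mags) + 1e-12)
        return self.tau * scale * bias_forces

rescaler = BiasForceRescaler(tau=1.0)
forces = np.array([[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
bias_forces = np.array([[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
scaled = rescaler(forces, bias_forces)
# With tau = 1, the re-scaled bias forces match the mean magnitude of the actual forces.
```

The same object can be reused for bias stresses by passing stress components instead of force vectors.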

We employ biased MD simulations to generate a candidate pool for AL, as depicted in Fig. 1a. Multiple parallel MD simulations further enhance the exploration of the configurational space and improve the computational efficiency of AL. We expect biased MD simulations to have relatively short auto-correlation times (ACTs), obtained from position and uncertainty auto-correlation functions (ACFs). Short ACTs imply that the generated candidates will be less correlated than those generated with unbiased MD simulations. However, we cannot guarantee the generation of uncorrelated samples with biased MD simulations throughout AL, particularly in later AL iterations when the uncertainty level is reduced. Therefore, we propose to use batch selection algorithms (see later sections) that select *N*_{batch} > 1 samples at once. These algorithms enforce the informativeness and diversity of the selected atomic configurations and the resulting training data set.

### Gaussian moment neural network

This work uses the Gaussian moment neural network (GM-NN) approach for modeling interatomic interactions^{16,17}. GM-NN employs an artificial NN to map a local atomic environment *S*_{i} to the atomic energy \({E}_{i}\left({S}_{i},{{{\boldsymbol{\theta }}}}\right)\); see Eq. (1). It uses a fully-connected feed-forward NN with two hidden layers^{16,17}

with \({{{{\bf{W}}}}}^{(l+1)}\in {{\mathbb{R}}}^{{d}_{l+1}\times {d}_{l}}\) and \({{{{\bf{b}}}}}^{(l+1)}\in {{\mathbb{R}}}^{{d}_{l+1}}\) representing the weights and biases of layer *l* + 1. In this work, we employ a NN with *d*_{0} = 910 input neurons (corresponding to the dimension of the input feature vector \({{{{\bf{G}}}}}_{i}={{{{\bf{G}}}}}_{i}\left({S}_{i}\right)\)), *d*_{1} = *d*_{2} = 512 hidden neurons, and a single output neuron, *d*_{3} = 1. The network’s weights **W**^{(l+1)} are initialized by selecting entries from a normal distribution with zero mean and unit variance. The trainable bias vectors **b**^{(l+1)} are initialized to zero. To improve the accuracy and convergence of the GM-NN model, we implement a neural tangent parameterization (factors of 0.1 and \(1/\sqrt{{d}_{l}}\))^{72}. For the activation function *ϕ*, we use the Swish/SiLU function^{73,74}.
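A rough NumPy sketch of the described architecture; the exact placement of the neural-tangent factors (0.1 and \(1/\sqrt{{d}_{l}}\)) is an assumption of this sketch:

```python
import numpy as np

def swish(x):
    return x / (1.0 + np.exp(-x))  # Swish/SiLU activation

def init_layer(rng, d_in, d_out):
    # Weights from a standard normal distribution; biases start at zero.
    return rng.normal(size=(d_out, d_in)), np.zeros(d_out)

def atomic_energy(G, params):
    """Two-hidden-layer feed-forward NN mapping features G_i to a scalar.

    Neural-tangent-style scaling: pre-activations are divided by sqrt(d_l),
    biases are multiplied by 0.1 (exact factor placement is an assumption)."""
    x = G
    for i, (W, b) in enumerate(params):
        z = (W @ x) / np.sqrt(x.shape[0]) + 0.1 * b
        x = swish(z) if i < len(params) - 1 else z  # no activation on the output
    return float(x[0])

rng = np.random.default_rng(0)
d0, d1, d2 = 910, 512, 512  # layer widths from the text
params = [init_layer(rng, d0, d1), init_layer(rng, d1, d2), init_layer(rng, d2, 1)]
E_i = atomic_energy(rng.normal(size=d0), params)
```

With this parameterization, the scale of pre-activations stays of order one independently of layer width, which is the point of the neural tangent factors.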

To aid the training process, we scale and shift the output of the NN

where the trainable shift parameters \({\mu }_{{Z}_{i}}\) are initialized by solving a linear regression problem, and the trainable scale parameters \({\rho }_{{Z}_{i}}\) are initialized to one. The per-atom RMSE of the regression solution determines the constant *c*^{17}.

GM-NN models employ the Gaussian moment (GM) representation to encode the invariance of total energy with respect to translations, rotations, and permutations of the same species^{16}. By computing pairwise distance vectors **r**_{ij} = **r**_{i} − **r**_{j} and then splitting them into radial and angular components, denoted as *r*_{ij} = ∥**r**_{ij}∥_{2} and \({\hat{{{{\bf{r}}}}}}_{ij}={{{{\bf{r}}}}}_{ij}/{r}_{ij}\), respectively, we obtain GMs as follows

$${\Psi }_{i,L,s}=\mathop{\sum}\limits_{j\ne i}{R}_{{Z}_{i},{Z}_{j},s}\left({r}_{ij},{{{\boldsymbol{\beta }}}}\right)\,{\hat{{{{\bf{r}}}}}}_{ij}^{\otimes L},$$
(11)

where \({\hat{{{{\bf{r}}}}}}_{ij}^{\otimes L}={\hat{{{{\bf{r}}}}}}_{ij}\otimes \cdots \otimes {\hat{{{{\bf{r}}}}}}_{ij}\) is the *L*-fold outer product. The nonlinear radial functions \({R}_{{Z}_{i},{Z}_{j},s}({r}_{ij},{{{\boldsymbol{\beta }}}})\) are defined as a sum of Gaussian functions \({\Phi }_{{s}^{{\prime} }}({r}_{ij})\) (*N*_{Gauss} = 9 for this work)^{17}

$${R}_{{Z}_{i},{Z}_{j},s}\left({r}_{ij},{{{\boldsymbol{\beta }}}}\right)=\frac{1}{\sqrt{{N}_{{{{\rm{Gauss}}}}}}}\mathop{\sum }\limits_{{s}^{{\prime} }=1}^{{N}_{{{{\rm{Gauss}}}}}}{\beta }_{{Z}_{i},{Z}_{j},s,{s}^{{\prime} }}{\Phi }_{{s}^{{\prime} }}\left({r}_{ij}\right).$$
(12)

The factor \(1/\sqrt{{N}_{{{{\rm{Gauss}}}}}}\) impacts the effective learning rate inspired by neural tangent parameterization^{72}. The radial functions are centered at equidistantly spaced grid points ranging from \({r}_{\min }=0.5\) Å to *r*_{c}, set to 5.0 Å and 6.0 Å for alanine dipeptide and MIL-53(Al), respectively. The radial functions are re-scaled by a cosine cutoff function^{13} to ensure a smooth dependence on the number of atoms within the cutoff sphere. Chemical information is embedded in the GM representation through trainable parameters \({\beta }_{{Z}_{i},{Z}_{j},s,{s}^{{\prime} }}\), with the index *s* iterating over the number of independent radial basis functions (*N*_{basis} = 7 for this work).
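The radial basis can be sketched as follows, with *N*_{Gauss} = 9 equidistant Gaussians between *r*_{min} = 0.5 Å and *r*_{c} and a Behler-style cosine cutoff; the Gaussian width (set here to the grid spacing) is an assumption beyond what the text states:

```python
import numpy as np

def cosine_cutoff(r, r_c):
    """Smoothly damps contributions to zero at the cutoff radius."""
    return np.where(r < r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)

def gaussian_basis(r, r_min=0.5, r_c=5.0, n_gauss=9):
    """Equidistant Gaussian radial functions, damped by the cosine cutoff
    and scaled by 1/sqrt(n_gauss) as in the neural tangent parameterization."""
    centers = np.linspace(r_min, r_c, n_gauss)
    width = centers[1] - centers[0]           # assumption: width = grid spacing
    gauss = np.exp(-((r - centers) ** 2) / (2.0 * width ** 2))
    return gauss * cosine_cutoff(r, r_c) / np.sqrt(n_gauss)

basis_inside = gaussian_basis(np.float64(2.3))   # a distance inside the cutoff
basis_outside = gaussian_basis(np.float64(5.0))  # at the cutoff radius
# All basis functions vanish at (and beyond) the cutoff radius.
```

The cutoff guarantees that atoms entering or leaving the cutoff sphere change the representation continuously, which keeps forces well defined.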

Features invariant to rotations, **G**_{i}, are obtained by computing full tensor contractions of tensors defined in Eq. (11), e.g.^{16,17},

$${G}_{i,{s}_{1},{s}_{2}}={\left({\Psi }_{i,2,{s}_{1}}\right)}_{a,b}{\left({\Psi }_{i,2,{s}_{2}}\right)}_{a,b},$$
(13)

where we use Einstein notation, i.e., the right-hand side is summed over *a*, *b* ∈ {1, 2, 3}. Specific full tensor contractions are defined by using generating graphs^{75}. In a practical implementation, we compute all GMs at once and reduce the number of invariant features based on the permutational symmetries of the respective graphs.

All parameters \({{{\boldsymbol{\theta }}}}=\{{{{\bf{W}}}},{{{\bf{b}}}},{{{\boldsymbol{\beta }}}},{{{\boldsymbol{\mu }}}},{{{\boldsymbol{\rho }}}}\}\) of the NN are optimized by minimizing the combined squared loss on training data \({{{{\mathscr{D}}}}}_{{{{\rm{train}}}}}=\left({{{{\mathscr{X}}}}}_{{{{\rm{train}}}}},{{{{\mathscr{Y}}}}}_{{{{\rm{train}}}}}\right)\), with \({{{{\mathcal{X}}}}}_{{{{\rm{train}}}}}={\{{S}^{(k)}\}}_{k = 1}^{{N}_{{{{\rm{train}}}}}}\) and \({{{{\mathcal{Y}}}}}_{{{{\rm{train}}}}}={\{{E}_{k}^{{{{\rm{ref}}}}},{\{{{{{\bf{F}}}}}_{i,k}^{{{{\rm{ref}}}}}\}}_{i = 1}^{{N}_{{{{\rm{at}}}}}},{{{{\boldsymbol{\sigma }}}}}_{k}^{{{{\rm{ref}}}}}\}}_{k = 1}^{{N}_{{{{\rm{train}}}}}}\)

$${{{\mathscr{L}}}}\left({{{\boldsymbol{\theta }}}}\right)=\mathop{\sum }\limits_{k=1}^{{N}_{{{{\rm{train}}}}}}\left[{C}_{{{{\rm{e}}}}}{\left({E}_{k}-{E}_{k}^{{{{\rm{ref}}}}}\right)}^{2}+\frac{{C}_{{{{\rm{f}}}}}}{3{N}_{{{{\rm{at}}}}}}\mathop{\sum }\limits_{i=1}^{{N}_{{{{\rm{at}}}}}}{\left\Vert {{{{\bf{F}}}}}_{i,k}-{{{{\bf{F}}}}}_{i,k}^{{{{\rm{ref}}}}}\right\Vert }_{2}^{2}+{C}_{{{{\rm{s}}}}}{\left\Vert {{{{\boldsymbol{\sigma }}}}}_{k}-{{{{\boldsymbol{\sigma }}}}}_{k}^{{{{\rm{ref}}}}}\right\Vert }_{2}^{2}\right].$$
(14)

We have chosen *C*_{e} = 1.0, *C*_{f} = 4.0 Å^{2}, and *C*_{s} = 0.01 to balance the relative contributions of energies, forces, and stresses, respectively.
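With these weights, the loss evaluation for a single structure might look as follows; the per-term normalization (e.g., per atom for the forces) is an assumption of this sketch:

```python
import numpy as np

def combined_loss(E, E_ref, F, F_ref, S, S_ref,
                  C_e=1.0, C_f=4.0, C_s=0.01):
    """Weighted squared loss over energy, forces, and stress for one structure.

    How each term is normalized (per atom for forces, etc.) is an assumption."""
    n_at = F.shape[0]
    loss_e = C_e * (E - E_ref) ** 2
    loss_f = C_f / (3 * n_at) * np.sum((F - F_ref) ** 2)
    loss_s = C_s * np.sum((S - S_ref) ** 2)
    return loss_e + loss_f + loss_s

F = np.zeros((4, 3))            # forces of a 4-atom toy structure
S = np.zeros((3, 3))            # stress tensor
perfect = combined_loss(1.0, 1.0, F, F, S, S)        # exact predictions -> 0
off = combined_loss(1.5, 1.0, F + 0.1, F, S, S)      # any error -> positive loss
```

In practice the loss is summed over a mini-batch of structures before the optimizer step.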

Using AD, we compute atomic forces as negative gradients of total energy with respect to atomic coordinates

$${{{{\bf{F}}}}}_{i,k}=-{\nabla }_{{{{{\bf{r}}}}}_{i}}E\left({S}^{(k)},{{{\boldsymbol{\theta }}}}\right).$$
(15)

Furthermore, we use AD to compute the stress tensor, defined by^{69}

$${{{{\boldsymbol{\sigma }}}}}_{k}=\frac{1}{{V}_{k}}{\left.\frac{\partial E\left({S}^{(k)},{{{\boldsymbol{\theta }}}}\right)}{\partial {{{\boldsymbol{\epsilon }}}}}\right\vert }_{{{{\boldsymbol{\epsilon }}}}={{{\bf{0}}}}},$$
(16)

where \(E\left({S}^{(k)},{{{\boldsymbol{\theta }}}}\right)\) is total energy after a strain deformation with symmetric tensor \({{{\boldsymbol{\epsilon }}}}\in {{\mathbb{R}}}^{3\times 3}\), i.e., \(\tilde{{{{\bf{r}}}}}=\left({{{\bf{1}}}}+{{{\boldsymbol{\epsilon }}}}\right)\cdot {{{\bf{r}}}}\). As the stress tensor is symmetric, we use only its upper triangular part in the loss function. Here, *V*_{k} is the volume of the periodic cell.

We employ the Adam optimizer^{71} to minimize the loss function. The respective parameters of the optimizer are *β*_{1} = 0.9, *β*_{2} = 0.999, and *ϵ* = 10^{−7}. Usually, we work with a mini-batch of 32 molecules. However, smaller mini-batches were used in the initial AL iterations because the training data sizes were less than 32. The layer-wise learning rates are decayed linearly. The initial values are set to 0.03 for the parameters of the fully connected layers, 0.02 for the trainable representation, as well as 0.05 and 0.001 for the scale and shift parameters of atomic energies, respectively. The training is performed for 1000 training epochs. To prevent overfitting during training, we employ the early stopping technique^{76}. All models are trained using PyTorch^{59}.

### Sketched gradient features

We obtain atomic gradient features by computing gradients of Eq. (1) with respect to the parameters of the fully connected layers in Eq. (9). Particularly, we make use of the product structure of atomic gradient features. To obtain the latter, we re-write the network in Eq. (9) as follows

where **z**^{(l)} and **x**^{(l)} denote the pre- and post-activation vectors of layer *l*. Thus, atomic gradient features read

To make the calculation of gradient features computationally tractable, we employ the random projections (sketching) technique^{55}, as proposed in refs. ^{51,52}. For atomic gradient features \({\phi }_{i}\left({S}_{i}\right)\in {{\mathbb{R}}}^{{N}_{{{{\rm{feat}}}}}}\) and a random matrix \({{{\bf{U}}}}\in {{\mathbb{R}}}^{{N}_{{{{\rm{rp}}}}}\times {N}_{{{{\rm{feat}}}}}}\)—with *N*_{feat} and *N*_{rp} denoting the number of atomic features and random projections, respectively—we can define randomly projected atomic gradient features as

$${\phi }_{i}^{{{{\rm{rp}}}}}\left({S}_{i}\right)={{{\bf{U}}}}{\phi }_{i}\left({S}_{i}\right).$$
(19)

While a Gaussian sketch could be employed, where the elements of **U** are drawn from standard normal distributions, we use a tensor sketching approach that is more runtime and memory efficient^{52}. Specifically, denoting the element-wise or Hadamard product as ⊙, we compute

$${\phi }_{i}^{{{{\rm{rp}}}}}\left({S}_{i}\right)=\mathop{\sum}\limits_{l}\left({{{{\bf{U}}}}}_{{{{\rm{out}}}}}^{(l)}{\phi }_{i,{{{\rm{out}}}}}^{(l)}\left({S}_{i}\right)\right)\odot \left({{{{\bf{U}}}}}_{{{{\rm{in}}}}}^{(l)}{\phi }_{i,{{{\rm{in}}}}}^{(l)}\left({S}_{i}\right)\right),$$
(20)

with \({\phi }_{i,{{{\rm{out}}}}}^{(l)}({S}_{i})=\partial {{{{\bf{z}}}}}_{i}^{(L)}/\partial {{{{\bf{z}}}}}_{i}^{(l)}\) and \({\phi }_{i,{{{\rm{in}}}}}^{(l)}({S}_{i})={\tilde{{{{\bf{x}}}}}}_{i}^{(l)}\). All entries of \({{{{\bf{U}}}}}_{{{{\rm{in}}}}}^{(l)}\) and \({{{{\bf{U}}}}}_{{{{\rm{out}}}}}^{(l)}\) are sampled independently from a standard normal distribution.
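The correctness of the Hadamard-product trick can be checked numerically: each row of the fast sketch coincides with a dense sketch of the corresponding outer-product features (dimensions below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n_rp, d_out, d_in = 16, 5, 7

U_out = rng.normal(size=(n_rp, d_out))
U_in = rng.normal(size=(n_rp, d_in))
phi_out = rng.normal(size=d_out)
phi_in = rng.normal(size=d_in)

# Hadamard-product (tensor) sketch: O(n_rp * (d_out + d_in)) time and memory.
sketch_fast = (U_out @ phi_out) * (U_in @ phi_in)

# Equivalent dense sketch of the outer-product features:
# row r of the dense matrix is kron(U_out[r], U_in[r]).
U_dense = np.stack([np.kron(U_out[r], U_in[r]) for r in range(n_rp)])
sketch_dense = U_dense @ np.kron(phi_out, phi_in)
# Both agree exactly, but the fast version never materializes d_out * d_in features.
```

This is why the gradient features never need to be formed explicitly: only the small projected vectors are kept.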

For atom-based uncertainties, we can directly use the sketched atomic gradient features. For (total) uncertainties per atom, we need to work with a mean \(\phi (S)=\mathop{\sum }\nolimits_{i = 1}^{{N}_{{{{\rm{at}}}}}}{\phi }_{i}({S}_{i})/{N}_{{{{\rm{at}}}}}\). Thus, we use that the individual projections (rows of Eq. (20)) are linear in the features and obtain for the (total) gradient features^{51}

given that all of the individual random projections use the same random matrices.

### Ensemble-based uncertainty quantification

The variance of the predictions of individual models in an ensemble of MLIPs can be used to quantify their uncertainty. Thus, we define the variance of predicted energy as

where *M* is the number of models in the ensemble. The variance of atomic forces reads

Here, \(\bar{E}\) and \({\bar{{{{\bf{F}}}}}}_{i}\) denote the arithmetic mean of the predictions from individual models. Our experiments demonstrated that *M* = 3 is sufficient to obtain good performance. Using larger ensembles would make ensemble-based uncertainty quantification even less computationally efficient relative to gradient-based alternatives.
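A minimal sketch of the ensemble estimate (whether the biased 1/*M* or the unbiased 1/(*M* − 1) variance is used is an assumption; NumPy's default is 1/*M*):

```python
import numpy as np

def ensemble_uncertainties(energies, forces):
    """Variance across M ensemble members for energy and atomic forces.

    energies: shape (M,); forces: shape (M, N_at, 3)."""
    var_e = np.var(energies, axis=0)                 # scalar energy variance
    var_f = np.var(forces, axis=0).sum(axis=-1)      # per-atom force variance
    return var_e, var_f

energies = np.array([-10.00, -10.02, -9.98])         # M = 3 model predictions
forces = np.stack([np.full((4, 3), f) for f in (0.1, 0.1, 0.1)])
var_e, var_f = ensemble_uncertainties(energies, forces)
# Identical force predictions across the ensemble give zero force variance.
```

Note that each uncertainty evaluation requires *M* forward passes, which is the cost the gradient-based alternatives avoid.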

### Batch selection methods

The simplest batch selection method is based on querying points only by their uncertainty values. Specifically, given the already selected structures \({{{{\mathcal{X}}}}}_{{{{\rm{batch}}}}}\) from an unlabeled pool \({{{{\mathcal{X}}}}}_{{{{\rm{pool}}}}}\), we select the next point by

$${S}^{*}=\mathop{{{{\rm{argmax}}}}}\limits_{S\in {{{{\mathcal{X}}}}}_{{{{\rm{pool}}}}}\setminus {{{{\mathcal{X}}}}}_{{{{\rm{batch}}}}}}u\left(S\right)$$
(24)

until *N*_{batch} > 1 structures are selected. In this work, we use this selection method combined with ensemble-based uncertainties.
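In code, this greedy rule reduces to repeatedly picking the remaining structure with the largest uncertainty (values below are placeholders):

```python
import numpy as np

def select_by_uncertainty(uncertainties, n_batch):
    """Greedily pick the n_batch structures with the highest uncertainty."""
    selected = []
    remaining = set(range(len(uncertainties)))
    while len(selected) < n_batch:
        best = max(remaining, key=lambda i: uncertainties[i])
        selected.append(best)
        remaining.remove(best)
    return selected

u_pool = np.array([0.2, 1.5, 0.7, 1.1, 0.3])
batch = select_by_uncertainty(u_pool, n_batch=3)   # -> indices [1, 3, 2]
```

Because the uncertainty values do not change during selection, this variant cannot enforce diversity, which motivates the covariance- and distance-based criteria below.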

For the posterior-based uncertainty, we can constrain the diversity of the selected batch by using the posterior covariance between structures

with \({\Phi }_{{{{\rm{train}}}}}=\phi \left({{{{\mathcal{X}}}}}_{{{{\rm{train}}}}}\right)\). The corresponding method greedily selects structures, i.e., one structure per iteration, such that the determinant of the covariance matrix is maximized^{51,52,77}

For the distance-based uncertainty, we ensure the diversity of the acquired batch by greedily selecting structures with a maximum distance to all previously selected and training data points. The respective selection method reads^{51,52,78}

We also applied this batch selection method to define the most representative subset of atomic gradient features when calculating atom-based uncertainty using feature space distances.
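The max-distance strategy can be sketched as greedy farthest-point sampling in feature space, using squared Euclidean distances to match the distance-based uncertainty above (the 2D feature vectors are placeholders):

```python
import numpy as np

def select_max_distance(pool, train, n_batch):
    """Greedily pick structures whose features maximize the minimum squared
    distance to the training features and to already selected points."""
    reference = list(train)
    selected = []
    for _ in range(n_batch):
        dists = [min(np.sum((p - r) ** 2) for r in reference)
                 for p in pool]
        best = int(np.argmax(dists))
        selected.append(best)
        reference.append(pool[best])   # selected points repel later picks
    return selected

train = [np.array([0.0, 0.0])]
pool = [np.array([0.1, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 2.0])]
picks = select_max_distance(pool, train, n_batch=2)   # -> [1, 2]
```

Each pick maximizes novelty with respect to everything already known, which enforces both informativeness and diversity of the batch.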

Lastly, to compare the performance of uncertainty-based data generation approaches with conventional random sampling from an ab initio MD trajectory, we employ a random selection strategy combined with posterior-based uncertainty to terminate MD simulations. We define random selection as

$${S}^{*}\sim {{{\mathcal{U}}}}\left({{{{\mathcal{X}}}}}_{{{{\rm{pool}}}}}\right),$$
(28)

where \({{{\mathcal{U}}}}\) is the uniform distribution over \({{{{\mathcal{X}}}}}_{{{{\rm{pool}}}}}\).

### Conformal prediction

Conformal prediction (CP) methods offer distribution-free uncertainty quantification with guaranteed finite sample coverage^{49,79,80,81,82}, thus ensuring calibration. Finite sample coverage can be defined as

$${\mathbb{P}}\left[{y}_{{{{\rm{test}}}}}\in C\left({x}_{{{{\rm{test}}}}}\right)\right]\ge 1-\alpha .$$
(29)

Here, \(\left({x}_{{{{\rm{test}}}}},{y}_{{{{\rm{test}}}}}\right)\) are the newly observed data, while *C* defines the prediction set based on previous observations \({\{\left({x}_{k},{y}_{k}\right)\}}_{k = 1}^{{N}_{{{{\rm{calibr}}}}}}\). The user-defined hyper-parameter *α* determines the desired confidence level. CP methods guarantee that the prediction set contains the true label with a probability of at least 1 − *α*.

We employ inductive CP, which comprises the following steps^{49,79}: (1) A subset of calibration data, sized *N*_{calibr}, is selected, and the corresponding errors are computed on this subset. For atomic forces, we employ RMSEs \(\Delta {{{{\bf{F}}}}}_{i}^{2}=\frac{1}{3}\left\Vert {{{{\bf{F}}}}}_{i}-{{{{\bf{F}}}}}_{i}^{{{{\rm{ref}}}}}\right\Vert_{2}^{2}\), while for total energies the respective energy absolute errors per atom, Δ*e* = ∣*E* − *E*^{ref}∣/*N*_{at}, are used. (2) The uncertainty \(u\left(S\right)\) is calculated for this subset of data. (3) The ratio \(\Delta e/u\left(S\right)\) or \(\Delta {{{{\bf{F}}}}}_{i}/u\left({S}_{i}\right)\) is computed. (4) Utilizing quantile regression, the \(\left(1-\alpha \right)\left({N}_{{{{\rm{calibr}}}}}+1\right)/{N}_{{{{\rm{calibr}}}}}\)-th quantile, denoted as *s*, is determined. (5) This *s* value is applied to new observations, resulting in the re-scaled and calibrated uncertainty, \(\tilde{u}=s\cdot u\).
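The five calibration steps translate directly into code; the synthetic errors and uncertainties below stand in for real model outputs:

```python
import numpy as np

def conformal_scale(errors, uncertainties, alpha=0.1):
    """Inductive conformal prediction: quantile of error/uncertainty ratios."""
    n = len(errors)
    ratios = errors / uncertainties
    q = min(1.0, (1.0 - alpha) * (n + 1) / n)   # finite-sample corrected level
    return np.quantile(ratios, q)

rng = np.random.default_rng(3)
u_calibr = rng.uniform(0.1, 1.0, size=200)           # uncalibrated uncertainties
err_calibr = np.abs(rng.normal(0.0, 1.0, size=200)) * u_calibr  # synthetic errors

s = conformal_scale(err_calibr, u_calibr, alpha=0.1)
u_tilde = s * u_calibr                               # calibrated uncertainties
coverage = np.mean(err_calibr <= u_tilde)
# By construction, at least ~90% of calibration errors fall below s * u.
```

The same scalar *s* is then applied to uncertainties of new observations, so calibration costs a single pass over the calibration set.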

### Coverage of collective variable space

To measure how well different methods explore the (bounded) space of interest, we implement a tree-based weighted recursive partitioning of a *d*-dimensional Euclidean space, which is reminiscent of quadtrees^{83} and matrix-based octrees^{84} but allows one to choose how many times *n* to split each dimension. Thus, the arity of the tree is *k* = *n*^{d}. Each node of this complete k-ary tree encodes a generalized hypercube of *d* dimensions, where each side length depends on the boundaries of the original space. The root node represents the full bounded space. A tree of height *L* has a total number of partitions equal to (*k*^{L+1} − 1)/(*k* − 1), and each level *ℓ* has *k*^{ℓ} nodes. The hyper-parameters we choose in this paper are *n* = 2, *d* = 2 (for the CVs *ϕ* and *ψ* of alanine dipeptide), and *L* = 5, for a total of 1365 partitions of the space of interest.

Our proposed surface coverage metric uses this data structure as a proxy to capture how many space partitions a method can explore in the fewest iterations. At the same time, we need to penalize methods that get stuck in a region of the space, exploring partitions of smaller volumes, that is, those represented by nodes at deeper levels in the tree. For this reason, each node at level *ℓ* is associated with a reward (or weight) of 1/*k*^{ℓ}, so each level of the tree has a cumulative reward of 1. The optimal strategy would be to perform a breadth-first search of the nodes of this tree, which translates into observing the largest partitions of unobserved space first. In addition, partitions that are revisited by the methods give no additional reward, so there is no gain in getting stuck in a certain partition. We visually represent the idea of the algorithm in the Supplementary Information for the simple case of *d* = 2.
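A compact sketch of this weighted partition counting for points in [0, 1)^d, with *n* = 2 and *d* = 2 as in the text (the cell-indexing convention is an assumption):

```python
def coverage_reward(points, n=2, d=2, levels=5):
    """Weighted coverage of [0, 1)^d: each newly visited cell at level l
    contributes 1 / k**l, so every level carries a cumulative reward of 1.
    Revisited cells give no additional reward."""
    k = n ** d
    visited = set()
    reward = 0.0
    for p in points:
        for level in range(levels + 1):
            cells = n ** level                     # splits per dimension
            cell = tuple(min(int(x * cells), cells - 1) for x in p)
            if (level, cell) not in visited:
                visited.add((level, cell))
                reward += 1.0 / k ** level
    return reward

# A single point covers one cell per level: 1 + 1/4 + 1/16 for levels = 2.
single = coverage_reward([(0.3, 0.7)], levels=2)
```

A trajectory that keeps revisiting one region quickly saturates its cells and stops accumulating reward, which is exactly the intended penalty.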

### Auto-correlation analysis

We evaluate the performance of uncertainty-biased MD simulations by investigating the auto-correlation between subsequent time frames of the MD trajectory. The auto-correlation function (ACF) is defined as^{85}

$${{{\rm{ACF}}}}\left(k\right)=\frac{\left\langle \left({{{{\mathcal{O}}}}}_{t}-\left\langle {{{\mathcal{O}}}}\right\rangle \right)\left({{{{\mathcal{O}}}}}_{t+k}-\left\langle {{{\mathcal{O}}}}\right\rangle \right)\right\rangle }{\left\langle {\left({{{\mathcal{O}}}}-\left\langle {{{\mathcal{O}}}}\right\rangle \right)}^{2}\right\rangle },$$
(30)

where 〈 ⋯ 〉 denotes the thermodynamic expectation value, *k* is the lag time, and \({{{\mathcal{O}}}}\) is an observable, e.g., atomic positions or atom-based uncertainties. From the ACF, we can calculate the auto-correlation time (ACT) for an MD trajectory of length *N*

$${{{\rm{ACT}}}}=1+2\mathop{\sum }\limits_{k=1}^{N-1}{{{\rm{ACF}}}}\left(k\right).$$
(31)

ACT is related to the effective sample size (ESS) by

$${{{\rm{ESS}}}}=N/{{{\rm{ACT}}}}.$$
(32)

In this work, we calculate ESS as implemented in TensorFlow^{86} and use it to estimate the ACT.
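A self-contained estimate of the ACF and ACT; the truncation of the sum and the relation ESS = N/ACT follow the common convention used, e.g., by TensorFlow Probability, and are assumptions of this sketch:

```python
import numpy as np

def acf(x, max_lag):
    """Normalized empirical auto-correlation function up to max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = np.mean(x ** 2)
    return np.array([1.0] + [np.mean(x[:-k] * x[k:]) / var
                             for k in range(1, max_lag + 1)])

def act(x, max_lag):
    """Auto-correlation time ACT = 1 + 2 * sum_k ACF(k)."""
    rho = acf(x, max_lag)
    return 1.0 + 2.0 * np.sum(rho[1:])

x = np.arange(100.0)          # strongly correlated, trending observable
rho = acf(x, max_lag=5)
tau = act(x, max_lag=5)
ess = len(x) / tau            # far fewer effective samples than frames
```

A rapidly decaying ACF yields an ACT near one and an ESS near the trajectory length, matching the interpretation used in Table 2.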

### Test data set for alanine dipeptide

The test data set for alanine dipeptide comprises 2000 configurations randomly selected from an MD trajectory at 1200 K. This trajectory was generated with the ASE simulation package^{87} by running an MD simulation in the canonical (*N**V**T*) statistical ensemble using the Langevin thermostat, with a time step of 0.5 fs and a total simulation time of 1 ns. Forces were provided by the AMBER ff19SB force field^{57}, as implemented in the TorchMD package using PyTorch^{58,59}. The data set effectively covers the relevant configurational space of alanine dipeptide, representing an upper boundary in exploring its collective variables (CVs).

### MLIP learning details for alanine dipeptide

Each AL experiment starts with training an MLIP with eight alanine dipeptide configurations randomly perturbed from its initial configuration in the C_{7eq} state. Trained MLIPs are then used to run eight parallel MD simulations, initialized from the initial configuration or configurations selected in later iterations. Each MD simulation runs until reaching an empirically defined uncertainty threshold of 1.5 eV Å^{−1}. A lower threshold value may result in slower CV space exploration, while a larger one would lead to the exploration of unphysical configurations. The maximum data set size, comprising training and validation data, is limited to 512 configurations. The Supplementary Information presents the scaling of the presented AL experiments to larger data set sizes, acquiring data sets of 1024 samples. Biased (bias-forces-driven) and unbiased MD simulations are performed using the canonical (*N**V**T*) statistical ensemble within the ASE simulation package^{87}. Unbiased MD simulations are run with the Langevin thermostat at temperatures of 300 K, 600 K, and 1200 K, whereas biased simulations are performed at a constant temperature of 300 K. We have chosen an integration time step of 0.5 fs and set a maximum of 20,000 steps for an MD simulation. A biasing strength of *τ* = 0.25 was also chosen for biased AL experiments. In reference calculations, we employ a force threshold of 20 eV Å^{−1} to exclude unphysical structures, potentially expected at high biasing strengths (equivalently, a smaller integration time step could be used). All AL experiments have been repeated five times.

### Reference DFT calculations for MIL-53(Al)

DFT calculations for MIL-53(Al) were performed using the CP2K simulation package (version 2023.1)^{61}. To ensure consistency with incremental learning experiments^{41}, we employed the PBE functional^{62} with Grimme D3 dispersion correction^{63}. A hybrid basis set, combining TZVP Gaussian basis functions and plane waves, was employed^{88}. GTH pseudopotentials were used to smoothen the electron density near the nuclei^{89}. To ensure the convergence of force and stress calculations, a plane wave cutoff energy of 1000 Ry was selected.

### MLIP learning details for MIL-53(Al)

In each AL experiment, we start with 32 MIL-53(Al) configurations randomly perturbed around its closed-pore state, with 90% reserved for training. Trained MLIPs are then used to perform 32 parallel MD simulations, each running until it reaches an uncertainty threshold of 1.0 eV Å^{−1}. The maximum data set size, comprising training and validation data, is limited to 512 configurations. The Supplementary Information demonstrates the scaling of these AL experiments to larger data sets of 1024 samples. Both biased (bias-stress-driven) and unbiased MD simulations use the isothermal-isobaric form of Nosé–Hoover dynamics^{90,91}. Unbiased MD simulations are carried out at 600 K and 0 MPa, as well as at ±250 MPa (half of the simulations each), while biased simulations are performed at 600 K and 0 MPa. The characteristic time scales of the thermostat and barostat are set to 0.1 ps and 1 ps, respectively. We choose an integration time step of 0.5 fs and a maximum of 20,000 steps per MD simulation. A stress-biasing strength of *τ* = 0.5 is used in biased AL experiments. In reference calculations, we employ a force threshold of 20 eV Å^{−1} to exclude strongly distorted structures. As a metadynamics-generated baseline, we use the data set from ref. ^{41} and select the first 500 sequentially generated configurations. All AL experiments are repeated three times, except for metadynamics, which was run once^{41}; for metadynamics, we instead train three MLIPs initialized with different random seeds.

### Random perturbation of atomic configurations

We obtain randomly perturbed atomic configurations by adding atomic shifts, denoted as *δ*_{i}, to the original atomic positions **r**_{i}

$${{{\bf{r}}}}_{i}^{{\prime} }={{{\bf{r}}}}_{i}+{{{\boldsymbol{\delta }}}}_{i}.$$

The components of *δ*_{i} are sampled independently from a uniform distribution: for alanine dipeptide, the range is between −0.02 Å and 0.02 Å, and for MIL-53(Al), it is between −0.08 Å and 0.08 Å. Additionally, for MIL-53(Al), we introduce random perturbations to its periodic cell **B** using a strain deformation \({{{\boldsymbol{\epsilon }}}}=\left({{{\bf{A}}}}+{{{{\bf{A}}}}}^{\top }\right)/2\), where the components of **A** are sampled independently from a uniform distribution between −0.02 and 0.02. This transformation can be expressed as

$$\tilde{{{\bf{B}}}}=\left({{{\bf{I}}}}+{{{\boldsymbol{\epsilon }}}}\right){{{\bf{B}}}}.$$

The shifted atomic positions are re-scaled according to

$${\tilde{{{\bf{r}}}}}_{i}=\left({{{\bf{I}}}}+{{{\boldsymbol{\epsilon }}}}\right)\left({{{\bf{r}}}}_{i}+{{{\boldsymbol{\delta }}}}_{i}\right).$$
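The perturbation scheme can be sketched with NumPy as follows; the function name and the row-vector cell convention are our assumptions (since \({{{\boldsymbol{\epsilon }}}}\) is symmetric, applying the deformation from the left or the right is equivalent):

```python
import numpy as np

def perturb(positions, cell, dr=0.08, da=0.02, rng=None):
    """Randomly perturb atomic positions (N, 3) and the periodic cell (3, 3).

    dr: positional shift range in Angstrom (0.02 for alanine dipeptide,
        0.08 for MIL-53(Al)); da: range of the strain components.
    """
    rng = np.random.default_rng() if rng is None else rng
    delta = rng.uniform(-dr, dr, size=positions.shape)   # atomic shifts
    A = rng.uniform(-da, da, size=(3, 3))
    eps = 0.5 * (A + A.T)                                # symmetric strain
    F = np.eye(3) + eps                                  # deformation I + eps
    new_cell = cell @ F                                  # strained cell
    new_positions = (positions + delta) @ F              # re-scaled shifted positions
    return new_positions, new_cell
```

Setting `da=0` recovers the purely positional perturbation used for alanine dipeptide.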

### Sine wave with additive random noise

We model large-amplitude volume fluctuations in MIL-53(Al) induced by the bias stress using a sine wave with period *T*_{0} and additive random noise \(N\left(t\right)\)

$$f\left(t\right)=A\,\sin \left(\frac{2\pi t}{{T}_{0}}\right)+B\,N\left(t\right),$$

where *A* and *B* denote the amplitudes of the sine wave and of the random noise, respectively. In this work, \(N\left(t\right) \sim {{{\mathcal{N}}}}\left(0,1\right)\) is random noise drawn from a normal distribution with zero mean and unit variance. We chose *A* = 1.0 and *B* = 0.5 for the blue line in Fig. 7; for the red line, we increase the noise amplitude to *B* = 2.0. Representing the volume fluctuations induced in MIL-53(Al) (see Fig. 7) requires a sine wave with a period twice the length of the MD simulation, i.e., *T*_{0} = 3.2 ns.
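A minimal NumPy sketch of this model (the function name is ours; *t* and *T*_{0} are assumed to be in ns):

```python
import numpy as np

def noisy_sine(t, A=1.0, B=0.5, T0=3.2, rng=None):
    """f(t) = A*sin(2*pi*t/T0) + B*N(t), with N(t) ~ Normal(0, 1).

    Defaults reproduce the blue line in Fig. 7 (A = 1.0, B = 0.5);
    B = 2.0 gives the red line. T0 = 3.2 ns is twice the MD length.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.asarray(t, dtype=float)
    return A * np.sin(2.0 * np.pi * t / T0) + B * rng.standard_normal(t.shape)
```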

## Data availability

The data sets generated during this study are available in the Zenodo repository: https://doi.org/10.5281/zenodo.10776838. The MIL-53(Al) test data set is available at https://doi.org/10.5281/zenodo.6359970 (ref. ^{41}).

## Code availability

The source code for this study is available on GitHub: https://github.com/nec-research/alebrew.

## References

1. Butler, K. T., Davies, D. W., Cartwright, H., Isayev, O. & Walsh, A. Machine learning for molecular and materials science. *Nature* **559**, 547–555 (2018).
2. Smith, J. S. et al. The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules. *Sci. Data* **7**, 134 (2020).
3. Chanussot, L. et al. Open Catalyst 2020 (OC20) Dataset and Community Challenges. *ACS Catal.* **11**, 6059–6072 (2021).
4. Xie, Y. et al. Uncertainty-aware molecular dynamics from Bayesian active learning for phase transformations and thermal transport in SiC. *npj Comput. Mater.* **9**, 36 (2023).
5. Gubaev, K. et al. Performance of two complementary machine-learned potentials in modelling chemically complex systems. *npj Comput. Mater.* **9**, 129 (2023).
6. Langer, M. F., Goeßmann, A. & Rupp, M. Representations of molecules and materials for interpolation of quantum-mechanical simulations via machine learning. *npj Comput. Mater.* **8**, 41 (2022).
7. Rupp, M., Tkatchenko, A., Müller, K.-R. & von Lilienfeld, O. A. Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning. *Phys. Rev. Lett.* **108**, 058301 (2012).
8. Faber, F. A., Christensen, A. S., Huang, B. & von Lilienfeld, O. A. Alchemical and structural distribution based representation for universal quantum machine learning. *J. Chem. Phys.* **148**, 241717 (2018).
9. Shapeev, A. V. Moment tensor potentials: A class of systematically improvable interatomic potentials. *Multiscale Model. Simul.* **14**, 1153–1173 (2016).
10. Drautz, R. Atomic cluster expansion for accurate and transferable interatomic potentials. *Phys. Rev. B* **99**, 014104 (2019).
11. Bartók, A. P., Payne, M. C., Kondor, R. & Csányi, G. Gaussian approximation potentials: The accuracy of quantum mechanics, without the electrons. *Phys. Rev. Lett.* **104**, 136403 (2010).
12. Bartók, A. P., Kondor, R. & Csányi, G. On representing chemical environments. *Phys. Rev. B* **87**, 184115 (2013).
13. Behler, J. & Parrinello, M. Generalized Neural-Network Representation of High-Dimensional Potential-Energy Surfaces. *Phys. Rev. Lett.* **98**, 146401 (2007).
14. Artrith, N. & Urban, A. An implementation of artificial neural-network potentials for atomistic materials simulations: Performance for TiO2. *Comput. Mater. Sci.* **114**, 135–150 (2016).
15. Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. *Chem. Sci.* **8**, 3192–3203 (2017).
16. Zaverkin, V. & Kästner, J. Gaussian Moments as Physically Inspired Molecular Descriptors for Accurate and Scalable Machine Learning Potentials. *J. Chem. Theory Comput.* **16**, 5410–5421 (2020).
17. Zaverkin, V., Holzmüller, D., Steinwart, I. & Kästner, J. Fast and Sample-Efficient Interatomic Neural Network Potentials for Molecules and Materials Based on Gaussian Moments. *J. Chem. Theory Comput.* **17**, 6658–6670 (2021).
18. Schütt, K. et al. SchNet: A continuous-filter convolutional neural network for modeling quantum interactions. *Adv. Neural Inf. Process. Syst.* **30**, 992–1002 (2017).
19. Schütt, K. T., Unke, O. T. & Gastegger, M. Equivariant message passing for the prediction of tensorial properties and molecular spectra. *Int. Conf. Mach. Learn.* **139**, 9377–9388 (2021).
20. Batzner, S. et al. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. *Nat. Commun.* **13**, 2453 (2022).
21. Batatia, I., Kovacs, D. P., Simm, G. N. C., Ortner, C. & Csanyi, G. MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields. *Adv. Neural Inf. Process. Syst.* **35**, 11423–11436 (2022).
22. Gasteiger, J., Becker, F. & Günnemann, S. GemNet: Universal Directional Graph Neural Networks for Molecules. *Adv. Neural Inf. Process. Syst.* **34**, 6790–6802 (2021).
23. Liao, Y.-L., Wood, B., Das, A. & Smidt, T. EquiformerV2: Improved Equivariant Transformer for Scaling to Higher-Degree Representations. *Int. Conf. Learn. Represent.* https://arxiv.org/abs/2306.12059 (2024).
24. Passaro, S. & Zitnick, C. L. Reducing SO(3) Convolutions to SO(2) for Efficient Equivariant GNNs. *Int. Conf. Mach. Learn.* **202**, 27420–27438 (2023).
25. Friederich, P., Häse, F., Proppe, J. & Aspuru-Guzik, A. Machine-learned potentials for next-generation matter simulations. *Nat. Mater.* **20**, 750–761 (2021).
26. Unke, O. T. et al. Machine Learning Force Fields. *Chem. Rev.* **121**, 10142–10186 (2021).
27. Li, Z., Kermode, J. R. & De Vita, A. Molecular dynamics with on-the-fly machine learning of quantum-mechanical forces. *Phys. Rev. Lett.* **114**, 096405 (2015).
28. Podryabinkin, E. V. & Shapeev, A. V. Active learning of linearly parametrized interatomic potentials. *Comput. Mater. Sci.* **140**, 171–180 (2017).
29. Gastegger, M., Behler, J. & Marquetand, P. Machine learning molecular dynamics for the simulation of infrared spectra. *Chem. Sci.* **8**, 6924–6935 (2017).
30. Zhang, L., Lin, D.-Y., Wang, H., Car, R. & E, W. Active learning of uniformly accurate interatomic potentials for materials simulation. *Phys. Rev. Mater.* **3**, 023804 (2019).
31. Vandermause, J. et al. On-the-fly active learning of interpretable Bayesian force fields for atomistic rare events. *npj Comput. Mater.* **6**, 20 (2020).
32. Shuaibi, M., Sivakumar, S., Chen, R. Q. & Ulissi, Z. W. Enabling robust offline active learning for machine learning potentials using simple physics-based priors. *Mach. Learn.: Sci. Technol.* **2**, 025007 (2021).
33. Briganti, V. & Lunghi, A. Efficient generation of stable linear machine-learning force fields with uncertainty-aware active learning. *Mach. Learn.: Sci. Technol.* **4**, 035005 (2023).
34. Wang, X. et al. Generalization of Graph-Based Active Learning Relaxation Strategies Across Materials. *Mach. Learn.: Sci. Technol.* https://doi.org/10.1088/2632-2153/ad37f0 (2024).
35. Huber, T., Torda, A. E. & van Gunsteren, W. F. Local elevation: A method for improving the searching properties of molecular dynamics simulation. *J. Comput. Aid. Mol. Des.* **8**, 695–708 (1994).
36. Laio, A. & Parrinello, M. Escaping free-energy minima. *Proc. Natl. Acad. Sci. USA* **99**, 12562–12566 (2002).
37. Barducci, A., Bussi, G. & Parrinello, M. Well-tempered metadynamics: A smoothly converging and tunable free-energy method. *Phys. Rev. Lett.* **100**, 020603 (2008).
38. Demuynck, R. et al. Efficient Construction of Free Energy Profiles of Breathing Metal-Organic Frameworks Using Advanced Molecular Dynamics Simulations. *J. Chem. Theory Comput.* **13**, 5861–5873 (2017).
39. Yoo, D., Jung, J., Jeong, W. & Han, S. Metadynamics sampling in atomic environment space for collecting training data for machine learning potentials. *npj Comput. Mater.* **7**, 131 (2021).
40. Yang, M., Bonati, L., Polino, D. & Parrinello, M. Using metadynamics to build neural network potentials for reactive events: the case of urea decomposition in water. *Catal. Today* **387**, 143–149 (2022).
41. Vandenhaute, S., Cools-Ceuppens, M., DeKeyser, S., Verstraelen, T. & Van Speybroeck, V. Machine learning potentials for metal-organic frameworks using an incremental learning approach. *npj Comput. Mater.* **9**, 19 (2023).
42. Kulichenko, M. et al. Uncertainty-driven dynamics for active learning of interatomic potentials. *Nat. Comput. Sci.* **3**, 230–239 (2023).
43. van der Oord, C., Sachs, M., Kovács, D. P., Ortner, C. & Csányi, G. Hyperactive learning for data-driven interatomic potentials. *npj Comput. Mater.* **9**, 168 (2023).
44. Schwalbe-Koda, D., Tan, A. R. & Gómez-Bombarelli, R. Differentiable sampling of molecular geometries with uncertainty-based adversarial attacks. *Nat. Commun.* **12**, 5104 (2021).
45. Carrete, J., Montes-Campos, H., Wanzenböck, R., Heid, E. & Madsen, G. K. H. Deep ensembles vs committees for uncertainty estimation in neural-network force fields: Comparison and application to active learning. *J. Chem. Phys.* **158**, 204801 (2023).
46. Kuleshov, V., Fenner, N. & Ermon, S. Accurate Uncertainties for Deep Learning Using Calibrated Regression. *Int. Conf. Mach. Learn.* **80**, 2796–2804 (2018).
47. Pernot, P. The long road to calibrated prediction uncertainty in computational chemistry. *J. Chem. Phys.* **156**, 114109 (2022).
48. Tran, K. et al. Methods for comparing uncertainty quantifications for material property predictions. *Mach. Learn.: Sci. Technol.* **1**, 025006 (2020).
49. Hu, Y., Musielewicz, J., Ulissi, Z. W. & Medford, A. J. Robust and scalable uncertainty estimation with conformal prediction for machine-learned interatomic potentials. *Mach. Learn.: Sci. Technol.* **3**, 045028 (2022).
50. Zaverkin, V. & Kästner, J. Exploration of transferable and uniformly accurate neural network interatomic potentials using optimal experimental design. *Mach. Learn.: Sci. Technol.* **2**, 035009 (2021).
51. Zaverkin, V., Holzmüller, D., Steinwart, I. & Kästner, J. Exploring chemical and conformational spaces by batch mode deep active learning. *Digital Discovery* **1**, 605–620 (2022).
52. Holzmüller, D., Zaverkin, V., Kästner, J. & Steinwart, I. A framework and benchmark for deep batch active learning for regression. *J. Mach. Learn. Res.* **24**, 1–81 (2023).
53. Schran, C., Brezina, K. & Marsalek, O. Committee neural network potentials control generalization errors and enable active learning. *J. Chem. Phys.* **153**, 104105 (2020).
54. Kirsch, A. Black-Box Batch Active Learning for Regression. *Transact. Mach. Learn. Res.* https://arxiv.org/abs/2302.08981 (2023).
55. Woodruff, D. P. Sketching as a tool for numerical linear algebra. *Found. Trends Theor. Comput. Sci.* **10**, 1–157 (2014).
56. Bolhuis, P. G., Dellago, C. & Chandler, D. Reaction coordinates of biomolecular isomerization. *Proc. Natl. Acad. Sci. USA* **97**, 5877–5882 (2000).
57. Tian, C. et al. ff19SB: Amino-acid-specific protein backbone parameters trained against quantum mechanics energy surfaces in solution. *J. Chem. Theory Comput.* **16**, 528–552 (2020).
58. Doerr, S. et al. TorchMD: A deep learning framework for molecular simulations. *J. Chem. Theory Comput.* **17**, 2355–2363 (2021).
59. Paszke, A. et al. PyTorch: An imperative style, high-performance deep learning library. *Adv. Neural Inf. Process. Syst.* **32**, 8024–8035 (2019).
60. Christiansen, H., Errica, F. & Alesiani, F. Self-tuning Hamiltonian Monte Carlo for accelerated sampling. *J. Chem. Phys.* **159**, 234109 (2023).
61. Kühne, T. D. et al. CP2K: An electronic structure and molecular dynamics software package - Quickstep: Efficient and accurate electronic structure calculations. *J. Chem. Phys.* **152**, 194103 (2020).
62. Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. *Phys. Rev. Lett.* **77**, 3865–3868 (1996).
63. Grimme, S., Antony, J., Ehrlich, S. & Krieg, H. A consistent and accurate ab initio parametrization of density functional dispersion correction (DFT-D) for the 94 elements H–Pu. *J. Chem. Phys.* **132**, 154104 (2010).
64. Nagyfalusi, B., Udvardi, L. & Szunyogh, L. First principles and metadynamics study of the spin-reorientation transition in Fe/Au(001) films. *J. Phys. Conf. Ser.* **903**, 012016 (2017).
65. Ibayashi, H. et al. Allegro-Legato: Scalable, fast, and robust neural-network quantum molecular dynamics via sharpness-aware minimization. *High Perform. Comput.* https://doi.org/10.1007/978-3-031-32041-5_12 (2023).
66. Zhao, J., Kennedy, S. D. & Turner, D. H. Nuclear Magnetic Resonance Spectra and AMBER OL3 and ROC-RNA Simulations of UCUCGU Reveal Force Field Strengths and Weaknesses for Single-Stranded RNA. *J. Chem. Theory Comput.* **18**, 1241–1254 (2022).
67. Janet, J. P., Duan, C., Yang, T., Nandy, A. & Kulik, H. J. A quantitative uncertainty metric controls error in neural network-driven chemical discovery. *Chem. Sci.* **10**, 7913–7922 (2019).
68. Zhu, A., Batzner, S., Musaelian, A. & Kozinsky, B. Fast uncertainty estimates in deep learning interatomic potentials. *J. Chem. Phys.* **158**, 164111 (2023).
69. Knuth, F., Carbogno, C., Atalla, V., Blum, V. & Scheffler, M. All-electron formalism for total energy strain derivatives and stress tensor components for numeric atom-centered orbitals. *Comput. Phys. Comm.* **190**, 33–50 (2015).
70. Duchi, J., Hazan, E. & Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. *J. Mach. Learn. Res.* **12**, 2121–2159 (2011).
71. Kingma, D. P. & Ba, J. Adam: A Method for Stochastic Optimization. *Int. Conf. Learn. Represent.* https://arxiv.org/abs/1412.6980 (2015).
72. Jacot, A., Gabriel, F. & Hongler, C. Neural Tangent Kernel: Convergence and Generalization in Neural Networks. *Adv. Neural Inf. Process. Syst.* **31**, 8580–8589 (2018).
73. Elfwing, S., Uchibe, E. & Doya, K. Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning. *Neural Netw.* **107**, 3–11 (2018).
74. Ramachandran, P., Zoph, B. & Le, Q. V. Searching for activation functions. *Int. Conf. Learn. Represent.* https://arxiv.org/abs/1710.05941 (2018).
75. Suk, T. & Flusser, J. Tensor method for constructing 3D moment invariants. In *Computer Analysis of Images and Patterns* (eds Real, P. et al.), 212–219 (Springer, 2011).
76. Prechelt, L. Early stopping—but when? In *Neural Networks: Tricks of the Trade* (eds Montavon, G. et al.), 53–67 (Springer, 2012).
77. Kirsch, A., Van Amersfoort, J. & Gal, Y. BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning. *Adv. Neural Inf. Process. Syst.* **32**, 7026–7037 (2019).
78. Sener, O. & Savarese, S. Active learning for convolutional neural networks: A core-set approach. *Int. Conf. Learn. Represent.* https://arxiv.org/abs/1708.00489 (2018).
79. Vovk, V., Gammerman, A. & Shafer, G. Algorithmic Learning in a Random World (Springer, 2005).
80. Lei, J., G'Sell, M., Rinaldo, A., Tibshirani, R. J. & Wasserman, L. Distribution-free predictive inference for regression. *J. Am. Stat. Assoc.* **113**, 1094–1111 (2018).
81. Romano, Y., Patterson, E. & Candès, E. J. Conformalized quantile regression. *Adv. Neural Inf. Process. Syst.* **32**, 3543–3553 (2019).
82. Angelopoulos, A. N. & Bates, S. Conformal Prediction: A Gentle Introduction. *Found. Trends Mach. Learn.* **16**, 494–591 (2023).
83. Finkel, R. A. & Bentley, J. L. Quad trees a data structure for retrieval on composite keys. *Acta Inform.* **4**, 1–9 (1974).
84. Meagher, D. J. Octree Encoding: A New Technique for the Representation, Manipulation and Display of Arbitrary 3-D Objects by Computer (Electrical and Systems Engineering Department, Rensselaer Polytechnic, 1980).
85. Janke, W. Monte Carlo Simulations in Statistical Physics – From Basic Principles to Advanced Applications. In *Order, Disorder and Criticality*, 93–166 (World Scientific, 2013).
86. Dillon, J. V. et al. TensorFlow Distributions. Preprint at https://arxiv.org/abs/1711.10604 (2017).
87. Larsen, A. H. et al. The atomic simulation environment—a Python library for working with atoms. *J. Phys. Condens. Matter* **29**, 273002 (2017).
88. Lippert, G., Hutter, J. & Parrinello, M. A hybrid Gaussian and plane wave density functional scheme. *Mol. Phys.* **92**, 477–487 (1997).
89. Goedecker, S., Teter, M. & Hutter, J. Separable dual-space Gaussian pseudopotentials. *Phys. Rev. B* **54**, 1703–1710 (1996).
90. Melchionna, S., Ciccotti, G. & Holian, B. L. Hoover NPT dynamics for systems varying in shape and size. *Mol. Phys.* **78**, 533–544 (1993).
91. Melchionna, S. Constrained systems and statistical distribution. *Phys. Rev. E* **61**, 6165–6170 (2000).

## Acknowledgements

Funded by Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy - EXC 2075 – 390740016. We acknowledge the support by the Stuttgart Center for Simulation Science (SimTech). The authors thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting David Holzmüller.

## Author information

### Authors and Affiliations

### Contributions

All authors designed the project, discussed the results, and wrote the manuscript. Viktor Zaverkin performed the calculations.

### Corresponding author

## Ethics declarations

### Competing interests

The authors declare no competing interests.

## Additional information

**Publisher’s note** Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Supplementary information

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Zaverkin, V., Holzmüller, D., Christiansen, H. *et al.* Uncertainty-biased molecular dynamics for learning uniformly accurate interatomic potentials.
*npj Comput Mater* **10**, 83 (2024). https://doi.org/10.1038/s41524-024-01254-1
