Ultra-fast interpretable machine-learning potentials

All-atom dynamics simulations are an indispensable quantitative tool in physics, chemistry, and materials science, but large systems and long simulation times remain challenging due to the trade-off between computational efficiency and predictive accuracy. To address this challenge, we combine effective two- and three-body potentials in a cubic B-spline basis with regularized linear regression to obtain machine-learning potentials that are physically interpretable, sufficiently accurate for applications, as fast as the fastest traditional empirical potentials, and two to four orders of magnitude faster than state-of-the-art machine-learning potentials. For data from empirical potentials, we demonstrate exact retrieval of the potential. For data from density functional theory, the predicted energies, forces, and derived properties, including phonon spectra, elastic constants, and melting points, closely match those of the reference method. The introduced potentials might contribute towards accurate all-atom dynamics simulations of large atomistic systems over long time scales.


I. INTRODUCTION
All-atom dynamics simulations enable the quantitative study of atomistic systems and their interactions in physics, chemistry, materials science, pharmaceutical sciences, and related areas.The simulations' capabilities and limits depend on the potential used to calculate the forces acting on the atoms, with an inherent correlation between the accuracy of the underlying physical model and the required computational effort: On the one hand, electronic structure methods tend to be accurate, slow, applicable to many systems, and require little human parametrization effort.On the other hand, traditional empirical potentials are fast but limited in accuracy and applicability, with often high parametrization effort.
Machine-learning potentials (MLPs) [1][2][3][4] are flexible functions fitted to reference energy and force data from, e.g., electronic structure methods.Their computational advantage does not primarily arise from simplified physical models but from avoiding redundant calculations through interpolation.Hence, they can stay close to the accuracy of the reference method while being orders of magnitude faster (see Fig. 1), with little human parametrization effort, but are often hard to interpret.However, current accurate MLPs are still orders of magnitude slower than traditional empirical potentials, limiting their use for dynamics simulations of large atomistic systems over long time scales.
In this work, we develop an interpretable linear MLP based on effective two-and three-body potentials using a flexible cubic B-spline basis.Figure 1 demonstrates how this ultra-fast potential (UF) is close in error to state-of-the-art MLPs while being as fast as the fastest traditional empirical potentials, such as the Morse and Lennard-Jones potentials.Trade-off between prediction error and computational cost of evaluating machine-learning potentials.Prediction errors are relative to the underlying electronic-structure reference method (disk).Ultra-fast potentials (this work, stars) with two-body (UF2) and three-body (UF2,3) terms are as fast as traditional empirical potentials (squares) but close in error to state-of-the-art machine-learning potentials (triangles, diamond).All potentials except EAM4 were refitted to the same tungsten data set.Computational costs were benchmarked with a 128-atom bcc-tungsten supercell.Trade-offs between accuracy and cost also arise from the choices of hyperparameters for each potential.See Sections II and V for abbreviations and details.

II. BACKGROUND
The many-body expansion [5] of an atomistic system's potential energy is a sum of N -body potentials V N that depend on atom positions R i and element species σ i , where the prefactors account for double-counting.Assuming transferability of the V N across configurations provides the basis for empirical and machine-learning potentials.
In this study, we omit the species dependency as well as the reference energies E 0 and 1 ) and subsume the factorial prefactors into the potentials V for simplicity.The extension to multi-component systems is straightforward and implemented in the accompanying program code.
Traditional empirical potentials often have rigid functional forms with a small number of tunable parameters.These are optimized to reproduce experimental quantities, such as lattice parameters and elastic coefficients, as well as calculated quantities from first principles, such as energies of crystal structures, defects, and surfaces [6][7][8].Typically this requires global optimization, e.g., via simulated annealing and substantial human effort.
Pair potentials truncate Eq. (1) after two-body terms, where r ij is the distance between atoms i and j.These potentials are limited to systems where higher-order terms such as angular and dihedral interactions are negligible.The functional forms of the Lennard-Jones (LJ) [9] potential and the Morse potential [10] were originally developed for their numerical efficiency, where ϵ, σ, and D 0 , a, r c are model parameters.
Many-body potentials extend the pair formalism by including additional many-body interactions, either in the form of many-body functions as in Eq. (1), or via environment-dependent functionals, such as in the embedded atom method (EAM) [11], Here, the embedding energy F is a non-linear function of the electron density ρ, which is approximated by a pairwise sum.While traditional empirical potentials have seen success in applications across decades of research, their rigid functional forms limit their accuracy.More recently, MLPs with flexible functional forms and built-in physics domain knowledge in the form of engineered features or deep neural network architectures have emerged as an alternative [2,[12][13][14][15][16] State-of-the-art MLPs can simulate the dynamics of large ("high-dimensional") atomistic systems with an accuracy close to the underlying electronic-structure reference method but orders of magnitude faster.[17,18] However, they are still orders of magnitude slower than fast traditional empirical potentials, limiting their application in system size and simulation length.
A complementary approach to improve speed is to use basis functions that are fast to evaluate.In the context of MLPs, non-linear kernel-based MLPs have been trained and subsequently projected onto a spline basis, yielding a linear model [28].Similar to this work, the general twoand three-body potential (GTTP) approach employs a quadratic spline basis set, exceeding MTPs in speed when trained on the same data [29].Polynomial symmetry functions (PSF) improve over Behler-Parrinello symmetry functions [30] in speed and accuracy by introducing compact support [31].Recently, new methods for fitting spline-based modified EAM potentials were benchmarked against MLPs, demonstrating comparable accuracy despite the lower complexity of their functional forms [32].
Motivated by these observations, we developed an ultra-fast (UF) MLP that combines the speed of the fastest traditional empirical potentials with an accuracy close to state-of-the-art MLPs by employing regularized linear regression with spline basis functions with compact support to learn effective two-and three-body interactions.Figure 1 showcases the relation between prediction errors and computational costs for three traditional empirical potentials (LJ, Morse, EAM) and several MLPs benchmarked on a dataset of elemental tungsten [33].While the traditional empirical potentials are fast but limited by accuracy, the MLPs are accurate but limited by speed.UF MLPs improve on the Pareto frontier of predictive accuracy and computational costs.They are available as an open-source Python implemen- B-spline basis set for pair potentials.A) Ten weighted B-splines after fitting (B * n = cnBn(r), solid curves) and their ranges of support.Their sum is the fitted pair potential (V (r), blue curve).In this example, knots tn are selected with uniform spacing and illustrated as vertical lines.B) Using cubic B-spline basis functions, the pair potential V (r) has a smooth and continuous first derivative V ′ (r) (dotted line), which is essential for reproducing accurate forces.Its second derivative V ′′ (r) (dashed line) is continuous, which is essential for reproducing stresses and phonon frequencies.
C) The optimized UF2 potential exhibits more inflection points than the optimized LJ (dotted line) and Morse (dashed line) potentials, highlighting its increased flexibility.tation (UF 3 , Ultra-Fast Force Fields) [34] with interfaces to the VASP [35] electronic-structure code and the LAMMPS [36,37] molecular dynamics code.

III. RESULTS AND DISCUSSION
The central idea of UF MLPs is to learn an effective low-order many-body expansion of the potential energy surface, using basis functions that are efficient to evaluate.For this, we truncate the many-body expansion of Eq. (1) at two-or three-body terms and express each term as a function of pairwise distances (one distance for the two-body and three distances for the three-body term).This approach is general and can be extended to higher-order terms.To minimize the computational cost of predictions, we represent N -body terms in a set of basis functions with compact support and sufficiently many derivatives to describe energies, forces, and vibrational modes, i.e., cubic splines.

A. Expansion in B-splines
Splines are piecewise polynomial functions with locally simple forms, joined together at knot positions [38].They are globally flexible and smooth but do not suffer from some of the oscillatory problems of polynomial interpolators (e.g., Runge's phenomenon [39]).
Spline interpolation is well-established in empirical potential development [40,41].The LAMMPS package [36], a leading framework for molecular dynamics simulations, implements cubic spline interpolation for pair potentials and selected many-body potentials, including EAM.The primary motivation for splines is the desire for computational efficiency and the opportunity to improve accuracy systematically [42].Their compact support and simple form make spline-based potentials the fastest choice to evaluate, and adding more knots increases their resolution.
B-splines constitute a basis for splines of arbitrary order.They are recursively defined as [38] B n,1 (r) = 1, t n ≤ r < t n+1 0, otherwise where t n is the n-th knot position, and d is the degree of the polynomial.The position and number of knots, a non-decreasing sequence of support points that uniquely determine the basis set, may be fixed or treated as free parameters.B-Splines are well suited for interpolation due to their intrinsic smoothness and differentiability.Their derivatives are also defined recursively: The UF potential describes the energy E of an atomistic system via two-and three-body interactions: For finite systems such as molecules or clusters, indices i, j, k run over all atoms.For infinite systems modeled via periodic boundary conditions, i runs over the atoms in the simulation cell, and j, k run over all neighboring   Here, θ jik is substituted for r jk using the law of cosines for ease of visualization.c) Volume slices of V3(rij, r ik , θ jik ) reveal favorable (blue) and unfavorable (red) three-body interactions.
atoms, including those in adjacent copies of the simulation cell.While these are infinitely many, the sums are truncated in practice by assuming locality, that is, finite support of V 2 and V 3 .
Modeling N -body interactions requires n 2dimensional tensor product splines.The UF potential therefore expresses V 2 and V 3 as linear combinations of cubic B-splines, B n = B n,3+1 , and tensor product splines: where K, K l , K m , and K n denote the number of basis functions per spline or tensor spline dimension, and c n and c lmn are corresponding coefficients.The B-spline basis set spans a finite domain and is bounded by the end knots [t 0 , t K ].At the upper limit t K , the potential smoothly goes to zero, and near the lower limit t 0 , it monotonically increases with shorter distances to prevent atoms from getting unphysically close.Figure 2A illustrates the compact support of the cubic Bspline basis functions.By construction, each basis function is nonzero across four adjacent intervals.Therefore, evaluating the two-and three-body potentials involves evaluating at most 4 and 4 3 = 64 basis functions for any pair or triplet of distances, respectively, giving rise to the aforementioned computational efficiency.
The force F a acting on atom a is given as the negative gradient −∇ Ra E of the energy E with respect to the atom's Cartesian coordinate R a and obtained analytically from the derivatives of the two-and three-body potentials, with where l is the Cartesian coordinate.Figure 2B illustrates, for the two-body potential, that the choice of the cubic B-spline basis results in a smooth, continuous first derivative comprised of quadratic Bsplines and a continuous second derivative of linear Bsplines.

B. Regularized least-squares optimization with energies and forces
During the fitting procedure, we optimize all spline coefficients c simultaneously with the regularized linear least-squares method.Given atomic configurations S, energies E, and forces F, we fit the potential energy func-tion E of Eq. ( 8) by minimizing the loss function where the second sum is taken over force components.
Here, κ ∈ [0, 1] is a weighting parameter that controls the relative contributions between energy and forcecomponent residuals, |E| and |F| are the number of energy and force observations in the training set, and σ E and σ F are the sample standard deviations of energies and force components across the training set.This normalization yields dimensionless residuals, allowing κ to balance the relative contributions from energies and forces, independently of the size and variance of the training energies and forces.The minimization of L with respect to the spline coefficients c is a linear least-squares problem with Tikhonov regularization and solution where I is the identity matrix, y contains energies and forces, and each element of X is the sum of B-spline values taken over all relevant pair distances in each configuration (rows) for each basis function (columns).Subsets of columns correspond to different body orders.Similarly, for multi-component systems, subsets of columns correspond to different chemical interactions.When forces are included in y, the corresponding rows of X are generated according to Eqs. (10) and (11).This optimization problem is strongly convex, allowing for an efficient and deterministic solution with LU decomposition.
The used Tikhonov regularization controls the magnitude of spline coefficients c through the parameter λ 1 via the ridge penalty and the curvature and local smoothness across adjacent spline coefficients through λ 2 via the difference penalty [43,44].For the two-body case, D 2 is given by For higher-order potential terms, the difference penalty affects the spline coefficients that are adjacent in each dimension of a tensor product spline.This difference penalty is related to a penalty on the integral of the squared second derivative of the potential [45,46], used in the aPIP potentials [26].However, the difference penalty is less complex because the dimensionality of the corresponding regularization problem is simply the number of basis functions K [44].

SNAP qSNAP
Normalized Error: Learning curves for UF and SNAP potentials fit to bcc tungsten.Shown are out-of-sample normalized prediction errors (dots) for training sets of increasing size, with five repetitions per size.Fitted curves (lines) are soft-plus functions that capture both the initial linear slope in log-log space and the observed saturation.Normalized error includes energy and force error contributions weighted by the respective standard deviations of energies and forces in the training set, respectively.Simpler potentials saturate earlier, but with higher error than more complex potentials (UF2 vs. SNAP; UF2 vs. UF2,3; SNAP vs. QSNAP), outperforming them when training data is limited.
Figure 2C compares optimized UF 2 , LJ, and Morse potentials for tungsten.Although the three curves have similar minima and behavior for greater pair distances r, the UF 2 potential exhibits additional inflection points.We attribute the ability of the UF 2 potential to reproduce the properties of a bcc metal, a traditionally difficult task for pair potentials, to its flexible functional form.
The fitting of a UF potential maps energy and force data onto effective two-and three-body terms, as shown in Figure 3 for the tungsten dataset.Both terms can be visualized directly, providing interpretability by disentangling contributions to the interatomic interactions.The inspection of minima, repulsive and attractive contributions, and inflection points enables insight into the chemical bonding characteristics of the material.This straightforward and visual analysis makes the UF potentials more directly interpretable than most MLPs.

C. Convergence of error in energy and force predictions
UF potentials should exactly reproduce any two-and three-body reference potential by construction, given sufficiently many basis functions and training data.To establish baseline functionality, we fit the UF 2 potential to energies and forces from the LJ potential for elemental tungsten, and the UF 2,3 potential to energies and forces from the Stillinger-Weber potential on elemental silicon, which they both reproduce with negligible error (see the Supplemental Information for learning curves and details).
To assess the accuracy of UF potentials, we measure their ability to predict DFT energies and forces in tungsten as a function of the amount of training data.Figure 4 compares learning curves for the two-body UF 2 , two-and three-body UF 2,3 , SNAP, and qSNAP potentials.To quantify prediction performance, we use the root-mean-squared-error (RMSE) on randomly sampled hold-out test sets.In this we ensured that each test set contained all available configuration types (see Section V A).Each learning curve is fit with a softplus function ln(1 + e n ), where n is the number of training data.This function captures both the initial linear slope in loglog space and the observed saturation due to the models' finite complexity.
Of the four models, the UF 2 potential, using 28 basis functions, converges earliest.The SNAP potential, using 56 basis functions based on hyperspherical harmonics, converges slightly later with lower errors.The qSNAP potential, using 496 basis functions including quadratic terms, converges last, with modest improvements in error over SNAP.Finally, the UF 2,3 potential, using 915 basis functions, is comparable to SNAP in convergence speed with lower errors in energies and similar errors in forces.Separate energy and force learning curves are included in Fig. S5. in the supplemental information.The convergence speed is related to both the number of basis functions and the complexity of the many-body interactions.This indicates that the simpler UF 2 potential may be more suitable than other MLPs when data is scarce.

D. Validation with derived quantities
To benchmark performance for applications, we computed several derived quantities that were not included in the fit, such as the phonon spectrum and melting temperature, using each potential.Figure 5   Despite its low computational cost, the UF 2 pair potential exhibits errors comparable to those of more complex potentials, such as SNAP and qSNAP.In contrast, the LJ and Morse potentials severely overpredict and underpredict most properties, respectively.One source of error in pair potentials, including the UF 2 potential, arises due to deviation from the Cauchy relations in materials.The Cauchy relations are constraints between elastic constants that hold if atoms only interact via central forces, e.g., pair potentials, and every atom is a center of inversion, such as in bcc tungsten [47].The Cauchy relation is C 12 = C 44 for fcc and bcc lattices.Nobel gas crystals nearly fulfill this condition, but significant deviations occur for most other crystals.Hence, the errors in C 12 and C 44 are larger for pair potentials, where they are constrained to be equal, than for models with many-body terms.This limitation in modeling the elastic response may hinder the prediction of mechanical properties and defect-related quantities [48].
As an example of an empirical potential used in practice, we selected the EAM4 potential [49] for all benchmarks.The EAM4 model, one of four tungsten models developed by Marinica et al., accurately reproduces the Peierls energy barrier and dislocation core energy [49].The EAM4 potential was fitted to materials' properties such as those in Figure 5 and, hence, exhibits reasonably low error.
All potentials in Figs. 1 and 5, except EAM4, were fit using the same training set of 1 939 configurations.While this training set is realistic in both size and diversity, based on their complexity the SNAP, qSNAP, MTP, and GAP models may achieve even lower errors with a more extensive training set and larger basis set.
In this work, the cut-off radius for inter-atomic interactions was set to 5.5 Å for all potentials except for EAM4.Additional hyperparameters for the UF 2 , UF 2,3 , SNAP, qSNAP, and GAP basis sets are tabulated in the Sup- The UF2 potential outperforms empirical potentials (Morse, EAM4) in reproducing the reference phonon frequencies (pink squares).The LJ curve, omitted, exhibits large oscillations and negative phonon frequencies.B) SNAP and qSNAP have similar phonon frequency errors to the UF2 and EAM4 potentials.C) The addition of three-body interactions allows the UF2,3 potential to approach the performance of the MTP and GAP potentials.
plemental Information.For UF 2,3 , a separate smaller cut-off radius of 4.25 Å was used for three-body interactions.This choice was motivated in part by precedence in other two-and three-body potentials such as the modified embedded-atom potentials [42] and in part by speed: The smaller cut-off radius results in ten times fewer threebody interactions and corresponding speed-up.
For the bcc tungsten system, the UF 2,3 potential approaches the accuracy of the MTP and GAP potentials.Adding the three-body interactions eliminates the error associated with the Cauchy discrepancy, improving the elastic constant predictions.The UF 2,3 potential also achieves lower errors for the surface and vacancy formation energies than the UF 2 pair potential, underscoring the value of including three-body interactions.
Figure 6 compares the calculated phonon spectra of the various MLPs with the DFT reference values [33].Perhaps surprisingly, the UF 2 pair potential is comparable in error to the potentials with many-body terms.In contrast, the Morse pair potential is rather inaccurate  I. Melting temperature predictions alongside energy, force, and phonon frequency benchmarks for bcc tungsten.See Section III D and Section V D for details.Compared to LJ, Morse, and EAM4, the UF2 potential achieves a low error in the melting temperature for a similar cost.The UF2,3 potential prediction is even closer to the DFT reference at the cost of one order of magnitude in speed.MTP, SNAP, qSNAP, and GAP require one to three orders of magnitude more computational resources for the same large-scale simulation.
and the LJ pair potential, not shown, yields a phonon spectrum with imaginary frequency and large frequency oscillations.Table I summarizes the RMSE of the phonon frequencies computed across the 26 DFT reference values.As in other calculated properties, the addition of threebody interactions in the UF 2,3 potential significantly improves the agreement with the DFT reference compared to the UF 2 pair potential and even surpasses the other computationally more expensive MLPs.
As an example for a practical, large-scale calculation, we predict the melting temperature of tungsten (see Section V D for computational details).Since the training set of the MLPs does not include liquid configurations, the melting point predictions measure the models' extrapolative capacity.Table I compares the predicted melting temperatures of the MLPs to the ab-initio reference value of 3465 K [50].We observe that the other pair potentials are limited in their predictive capabilities while the UF 2 potential is comparable in accuracy to the more complex potentials, which yield reasonable predictions.The qSNAP melting calculation failed due to numerical instability at higher temperatures, which is known to occur in high-dimensional potentials.[51] Bonds break at higher temperatures, leading to many local atomic configurations that are underrepresented in the training set.We expect that training the qSNAP potential on a suitable, more extensive dataset would remove this instability.The extrapolative capacity of the UF 2 and UF 2,3 potentials in melting-temperature simulations illustrates that the UF potentials can provide an accurate description with only a moderately sized training dataset, indicating their usefulness for materials simulations with limited reference data.

IV. SUMMARY
We developed and implemented a machine-learning potential that is fast to train and evaluate, provides an interpretable form, is extendable to higher-order interactions, and accurately describes materials even for comparably sparse training sets.The approach is based on an effective many-body expansion and utilizes a flexible B-spline basis.The resulting regularized linear leastsquares optimization problem is strongly convex, significantly reducing the computational requirements.
For the example of elemental tungsten, the UF 2 pair potential produces energy, force, and property predictions rivaling those of SNAP and qSNAP while matching the cost of the Morse potential, corresponding to a reduction in computational cost by two orders of magnitude.Despite the intrinsic limitations of the pair potential in capturing physics, we find that the UF 2 pair potential yields reasonable predictions in property benchmarks, such as for elastic constants, phonons, surface energies, and melting temperature.The UF 2,3 potential, which accounts for three-body interactions, approaches the accuracy of MTP and GAP while reducing computational cost by one to three orders of magnitude.
The rapidly increasing number of MLPs, from ultrafast linear models to graph neural network potentials, highlights the trade-offs between computational efficiency, robustness, and model capacity.Complex, highdimensional MLPs are expected to yield higher accuracy for complicated systems at the price of greatly increased computational cost and data requirements.The UF approach yields potentials that are fast and robust, at the price of reduced flexibility and possibly greater errors for complex systems.Future work on UF potentials will explore the use of active learning for increased robustness and data efficiency as well as the addition of the fourbody term, which is necessary for modeling dihedral angles that are critical to describing organic molecules and protein structures.The software for fitting UF potentials and exporting LAMMPS-compatible tables is freely available in our Github repository [34].

A. Data
To compare the UF potential against existing potentials we use a dataset by Szlachta et al. [33] which has been used before to benchmark the GAP [33], SNAP [52], and aPIP [26] potentials.This dataset of 9 693 tungsten configurations includes body-centered cubic (bcc) primitive cells, bulk snapshots from molecular dynamics, surfaces, vacancies, gamma surfaces, gamma surface vacancies, and dislocation quadrupoles.Energies, forces, and stresses in the dataset were computed using density functional theory (DFT) with the Perdew-Burke-Ernzerhof (PBE) [53] functional.

B. B-spline basis
The UF potential uses natural cubic B-splines: Their first derivative is continuous and smooth, while their second derivative is continuous.They are natural, rather than clamped, in that their second derivative is zero at the boundary conditions.These properties, which are critical for accurately reproducing forces and stresses, motivated our choice of basis set.Other B-spline schemes have been explored for interpolation in empirical potential development.Wen et al. [41] discuss clamped and Hermite splines as well as quartic and quintic splines.The natural cubic spline is sufficient except when computing properties that rely on the third and fourth derivative of the potential, such as thermal expansion and finite-temperature elastic constants.
Uniform spacing of knots is a reasonable choice in many cases.However, the user may adjust the density of knots to control resolution in regions of interest.Due to compact support in this basis, each pair-distance energy requires the evaluation of exactly four B-splines.Hence, the potential's speed scales with neither the number of knots nor the number of basis functions.
On the other hand, the minimum distance between knots limits the maximum curvature of the function.The optimum density of knots thus depends on the quality of the training set available.Underfitting or overfitting may arise from insufficient or excessive knot density, respectively.Based on convergence tests (supplemental information), we fit UF potentials in this work using 25 uniformly spaced knots.
By construction, each spline coefficient influences the overall function across five adjacent knots.As a result, the user can tune the shape of the potential further according to additional For instance, soft-core and smooth-cutoff requirements may be satisfied by adjusting spline coefficients at the ends.In this work, we ensure that the potential and its first derivative follow a smooth cutoff at r ij = t K by setting the last three coefficients to 0.

C. Models
In this work, we partitioned the data using a random split of 20% training and 80% testing data.The testing set was used to evaluate the root-mean-square error (RMSE) in energies and forces, as reported in Table I and Fig. 1.The LJ and Morse potentials were optimized using the BFGS algorithm as implemented in the SciPy library [54].The SNAP and qSNAP potentials were retrained using the MAterials Machine Learning (MAML) package [55].The GAP potential was retrained using the QUantum mechanics and Interatomic Potentials (QUIP) package [56,57].We used the EAM4 potential, obtained from the NIST Interatomic Potential Repository [58,59], without modification.The MTP potential was fit using the MLIP package [60].
The size of the training set was selected to represent common, data-scarce scenarios.With a larger training set, the MTP, SNAP, qSNAP, and GAP models would likely produce better predictions.We refer the reader to the original works and existing benchmarks [18,32] for details regarding convergence with training examples and model complexity.

D. Calculations
All potentials in Fig. 1 were benchmarked using one thread on an AMD EPYC 7702 Rome (2.0 GHz) CPU.
Computational cost measurements for each potential are reported as the average over ten simulations.
We use two methods to estimate the computational cost of the UF 2,3 potential and present both values in Fig. 1 and Table I.The lower estimate, 0.76 ms/step, is computed by multiplying the computational cost of UF 2 by the ratio of floating point operations used by UF 2,3 and UF 2 .The higher estimate, 2.03 ms/step, is computed using the ratio of speeds, in the Python implementation, multiplied by the reported cost of UF 2 .We show performance bounds instead of the current UF 2,3 implementation's computational cost because it has not been fully optimized yet.
Elastic constants were evaluated using the Elastic python package [61].Phonon spectra were evaluated using the Phonopy python package [62].Melting temperatures were calculated in LAMMPS using the twophase method and a timestep of 1 fs.The initial system, a 16 × 8 × 8 bcc supercell (2 048 atoms), was equilibrated at a selected temperature for 40 000 timesteps.Next, the solid-phase atoms were fixed while the liquidphase atoms were heated to 5 000 K and cooled back to the initial temperature over 80 000 timesteps.Finally, the system was equilibrated over 200 000 steps using the isoenthalpic-isobaric (NPH) ensemble.If the final configuration did not contain both phases, the procedure was repeated with a different initial temperature.Reported melting temperatures were computed by taking the average over the final 100 000 steps.

CODE AVAILABILITY
Code for fitting UF 2 and UF 3 potentials, as well as exporting LAMMPS-compatible tables is freely available in our open-source "Ultra-Fast Force Fields" GitHub repository [34].The UF 2 potential is natively supported by LAMMPS, GROMACS, and other molecular dynamics suites that can construct potentials from interpolation tables.In LAMMPS, the pair style "table" is available for execution on both CPU and GPUs, enabling the UF 2 potential to benefit from various computer architectures.A LAMMPS package for using UF 2,3 potentials is also available in the repository.
FIG. 1.Trade-off between prediction error and computational cost of evaluating machine-learning potentials.Prediction errors are relative to the underlying electronic-structure reference method (disk).Ultra-fast potentials (this work, stars) with two-body (UF2) and three-body (UF2,3) terms are as fast as traditional empirical potentials (squares) but close in error to state-of-the-art machine-learning potentials (triangles, diamond).All potentials except EAM4 were refitted to the same tungsten data set.Computational costs were benchmarked with a 128-atom bcc-tungsten supercell.Trade-offs between accuracy and cost also arise from the choices of hyperparameters for each potential.See Sections II and V for abbreviations and details.

FIG. 3 .
FIG. 3.Visualization of two-body and three-body terms of a tungsten UF potential.a) Distribution of pair interactions in tungsten training data and the learned two-body component of the fitted potential.b) The learned three-body component V3(rij, r ik , r jk ) corresponds to the contribution to the total energy of a central atom i interacting with two neighbors j and k.It is fit simultaneously with the two-body potential.Here, θ jik is substituted for r jk using the law of cosines for ease of visualization.c) Volume slices of V3(rij, r ik , θ jik ) reveal favorable (blue) and unfavorable (red) three-body interactions.

FIG. 5 .
FIG.5.Performance for derived quantities of seven potentials relative to the DFT reference for bcc tungsten.The solid black line in each spider plot indicates zero error.Energy, force, and phonon spectra error are percent RMSE normalized by the sample standard deviation of the reference values.Other errors are percentage errors.The UF2 potential achieves an accuracy approaching that of SNAP and qSNAP, while the UF2,3 potential achieves an accuracy comparable to MTP and GAP.

FIG. 6 .
FIG. 6. Phonon dispersion curves for bcc tungsten.A)The UF2 potential outperforms empirical potentials (Morse, EAM4) in reproducing the reference phonon frequencies (pink squares).The LJ curve, omitted, exhibits large oscillations and negative phonon frequencies.B) SNAP and qSNAP have similar phonon frequency errors to the UF2 and EAM4 potentials.C) The addition of three-body interactions allows the UF2,3 potential to approach the performance of the MTP and GAP potentials.
shows the relative errors in 12 quantities: energy, forces, phonon frequencies, lattice constant a 0 , elastic constants C 11 , C 12 , and C 14 , bulk modulus B, surface energies E 100 , E 110 , E 111 , and vacancy formation energy E V .Energy, force, and phonon predictions are displayed as percent RMSE, normalized by the sample standard deviation of the reference values.The remaining scalar quantities are displayed as