Introduction

Alloys consisting of multiple principal components have attracted considerable attention over the last decade1,2,3,4,5,6,7,8. Apart from conventional subsystems like binary and ternary alloys, one can also access the interior regions of the highly complex compositional space of such alloys, revealing a multitude of untapped compositions. High entropy alloys (HEAs) are one such class of alloys, consisting of at least four elements in near-equal proportions. They are known for remarkable properties such as high tensile and yield strength, high ductility, and fracture toughness9,10.

A high mixing entropy stabilises the disordered phase of a HEA, wherein the elements randomly occupy the sites of a simple lattice, i.e., bcc11,12, fcc7,13, or hcp14. However, this entropic stabilization is the predominant effect only at high temperatures, where it is accompanied by large vibrations in the system. At lower temperatures, there is a possibility of segregation of individual elements, short-range ordering, and phase decomposition15,16,17,18,19. Therefore, to fully analyse the thermodynamics of the alloy across the entire temperature range, both vibrational and configurational entropy contributions are critical, and one needs to study both the high-temperature disordered phase and the low-temperature segregated sub-phases. In addition, the properties of a HEA cannot be deduced from a simple linear combination of the properties of the individual elements and subsystems, as the interactions between the elements lead to unusual behaviour.

Computational techniques can assist in a rapid exploration of complex compositional spaces. However, the accuracy of the results relies critically on the computational method. Running ab initio molecular dynamics (AIMD) simulations based on density-functional theory (DFT) to obtain accurate results is computationally demanding. In particular, studying configurations with short-range order requires much larger length scales, making DFT prohibitively expensive. Atomistic simulations employing interatomic potentials are a possible alternative. Most potentials assume locality of interatomic interactions, facilitating much faster computation and better scaling with system size than DFT.

The key challenge lies in developing accurate and robust interatomic potentials. In particular, simultaneously accounting for both the vibrational and configurational degrees of freedom is extremely difficult. Conventional potentials, e.g., those based on the embedded-atom method (EAM)20 or the modified embedded-atom method (MEAM)21, are restricted to simple, fixed analytical forms for the energy with a small number of parameters. Such a restricted parameter space does not provide enough flexibility to model complex systems.

In this regard, state-of-the-art machine learning (ML)-based models have a more systematic approach toward the parameterization of interatomic interactions. The atomic environment around a central atom is encoded by a local representation, which depends on vibrational (coordinates) and configurational (atomic types) degrees of freedom. Typically, local representations are composed of a radial part (depending only on interatomic distances) and an angular part (depending on the angles). The representations are invariant with respect to atomic permutations, translations and rotations of physical space and form a set of functions that describe the local atomic arrangement. This set of functions provides suitable input to an ML model. The mapping from this input to the energy is determined by the values of model parameters, which are obtained by training on DFT energies, atomic forces, and stresses.
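As a minimal illustration of these invariances, consider a toy descriptor built from sorted pairwise distances; it is far simpler than the representations used by actual ML potentials but obeys the same symmetries:

```python
import numpy as np

def toy_descriptor(positions):
    """Sorted pairwise distances: a toy representation invariant to
    permutations, rotations, and translations. Real MTP/GM-NN descriptors
    are far richer but share exactly these symmetries."""
    diff = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)
    i, j = np.triu_indices(len(positions), k=1)
    return np.sort(dists[i, j])

rng = np.random.default_rng(0)
pos = rng.random((5, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix

# Rotating + translating, or relabelling the atoms, leaves the descriptor unchanged
assert np.allclose(toy_descriptor(pos), toy_descriptor(pos @ Q.T + 0.3))
assert np.allclose(toy_descriptor(pos), toy_descriptor(pos[rng.permutation(5)]))
```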

ML-based models can also robustly estimate the uncertainty in their predicted energies and forces22,23,24,25,26,27,28. Such estimates are used to develop active learning algorithms that select the data most informative to the model. The data are obtained from the configurational and vibrational spaces sampled during, e.g., a molecular dynamics (MD) simulation using the existing model, either beforehand or in an on-the-fly fashion. The model is then retrained with the updated training data set, which includes the DFT values of these sampled configurations. Hence, in addition to their ability to accurately describe a complex multidimensional phase space with a limited number of model parameters, the learning-on-the-fly feature allows ML-based models to be a more robust and reliable approximation of the DFT potential energy surface.

Several ML-based models have been proposed in the literature, each with its own representation of the atomic environment29,30,31,32. In the present work, we analyse two complementary ML-based approaches, the moment tensor potential (MTP)23,33 and the Gaussian moment neural network (GM-NN)34,35. MTPs have been developed for thermodynamic property prediction up to the melting point of a single disordered phase of a TaVCrW HEA36. MTPs have also been built to study partial compositional spaces of a TiZrHfTax HEA37. GM-NNs, on the other hand, have so far been developed for a different class of chemical structures. For instance, they were used to study the dynamics of N and H2 on water and CO ice surfaces, respectively38,39,40, and the dynamic behaviour of the magnetic anisotropy tensors of \([\mathrm{Co}(\mathrm{N}_2\mathrm{S}_2\mathrm{O}_4\mathrm{C}_8\mathrm{H}_{10})_2]^{2-}\), \([\mathrm{Fe}(\mathrm{tpa})^{\mathrm{Ph}}]^{-}\), and \([\mathrm{Ni}(\mathrm{HIM}_2\text{-}\mathrm{py})_2\mathrm{NO}_3]^{+}\) molecular crystals41. However, they have not been tested for metallic systems, let alone complex HEAs.

Here, we develop and compare an MTP and a GM-NN potential for the Ta-V-Cr-W HEA family and assess their performance in modelling chemically complex multicomponent systems. We include both the vibrational and configurational degrees of freedom of the system in the potential development. We do so by predicting energies and forces in Ta-V-Cr-W subsystems under two scenarios: (i) the 0 K energies and forces in binaries, ternaries, and quaternaries; (ii) near-melting-temperature energies and forces in 3- and 4-component disordered alloys. The subsystems are categorized into two groups, the in-distribution and the out-of-distribution subsystems, where the former includes the subsystems used for training, and the latter includes non-equiatomic ternaries that are not part of the training. In addition to the energy and force estimates, we assess the quality of the potentials by also computing thermodynamic properties.

The MTP and the GM-NN models are equally accurate in describing the Ta-V-Cr-W HEA family. The 0 K energies and forces of the in-distribution subsystems are predicted extremely accurately, while the RMS errors are slightly larger for the out-of-distribution subsystems. The high-temperature configurational space of the disordered quaternary (part of the in-distribution subsystems) is accurately captured, to within 0.18 eV/Å RMS error in the forces. Likewise, even the non-equiatomic ternaries (part of the out-of-distribution subsystems) are accurately predicted, with RMS errors in the forces of less than 0.16 eV/Å. While the absolute energies of the out-of-distribution subsystems are not predicted well, the relative energies, which are the ones relevant for thermodynamic properties, are accurate to within 5.5 meV/atom. Classical EAM/MEAM potentials are unsuitable for such a complex phase space and have errors one to two orders of magnitude higher. Finally, we investigate the role of active learning strategies applied to HEAs and find that human-based inspection is crucial in this specific case.

Results

Features of the two ML models

Figure 1 is a schematic illustration of the similarities and differences between the MTP and the GM-NN approaches. Both MTPs and GM-NNs assume locality of interatomic interactions. The total energy of an atomic system is split into the contributions of individual atoms. For every atom, only the interactions within a cutoff Rcut are explicitly considered, as shown in Fig. 1a. Since the total energies (written as a sum of site energies over all atoms) are trained on DFT reference data, which include long-range effects, the interactions beyond Rcut are implicitly included during the fitting.
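A minimal sketch of this energy decomposition (Eq. (1) in the 'Methods' section), with an arbitrary user-supplied site-energy function standing in for the fitted model and periodic images ignored for brevity:

```python
import numpy as np

def total_energy(positions, species, site_energy, r_cut=5.0):
    """Total energy as a sum of site energies over explicit neighbourhoods.

    `site_energy` is a stand-in for an MTP or GM-NN site-energy model; it
    receives only the neighbours within r_cut of each atom.
    """
    E = 0.0
    for i, r_i in enumerate(positions):
        d = np.linalg.norm(positions - r_i, axis=1)
        mask = (d > 0.0) & (d < r_cut)                # neighbours within R_cut
        E += site_energy(positions[mask] - r_i,       # relative vectors r_ij
                         species[mask], species[i])
    return E
```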

Fig. 1: Schematic representation of the moment tensor potential (MTP) and the Gaussian moment neural network (GM-NN) approaches.

a Both methods rely on the assumption of locality of interatomic interactions but implicitly include interactions beyond the cutoff radius Rcut. b As an intermediate step in constructing invariant features, both methods obtain a common representation equivariant to rotations. c Invariant features are obtained by computing tensor contractions and eliminating linear dependencies specific to the considered approach. d The mapping from a structure to its energy is modelled by linear regression or a neural network that takes the invariant features as input.

The building blocks of the MTPs and GM-NNs are the atomic environment descriptors, shown in Fig. 1b. MTPs and GM-NNs encode the structural and chemical composition of a local atomic environment by utilizing symmetry-preserving representations motivated by invariant polynomials and Cartesian Gaussian-type orbitals (GTOs)42, respectively. Interestingly, both approaches lead to a common representation that is invariant to translations and permutations but equivariant to rotations.

Descriptors invariant to rotations are obtained by computing tensor contractions and eliminating possible linear dependencies; see Fig. 1c. For MTP, the selected contractions are restricted to those of a particular level \(\mathrm{lev}_{\max}\)33, imposing a restriction on both the radial and angular parts and thus defining the basis in which the atomic energy is expanded. Larger values of \(\mathrm{lev}_{\max}\) allow higher-order tensors to participate in the contractions, thereby producing more descriptors and increasing the fitting capabilities of the potential. In fact, the EAM and MEAM potentials can be viewed as reduced MTPs with \(\mathrm{lev}_{\max}=0\) and \(\mathrm{lev}_{\max}=2\), respectively. For GM-NN, linear dependencies are eliminated by identifying unique generating graphs corresponding to the tensor contractions43 and taking into account the permutational symmetry of the respective tensors or, equivalently, of the nodes on the graph34,35. The intermediate equivariant representation relates MTP and GM-NN to the recently proposed equivariant message-passing neural network (NN) architectures44,45.

The invariant features of the atomic environments computed within both approaches are then used as an input to a linear regression or a neural network-based model for MTP and GM-NN, respectively, as illustrated in Fig. 1d. The GM-NN has ~500 times more parameters, as it utilizes artificial NNs to map the structure to the respective properties. Finally, the models are trained on DFT data (energies, atomic forces, and stresses) by minimizing a specified loss function. The reader is referred to the ‘Methods’ section for more details about the construction of the ML-based models and the relation of the MTP and GM-NN models to other ML potentials.

From configurations generated by the trained model, new samples are iteratively selected for active learning based on uncertainty quantification. In the MTP and GM-NN approaches, the uncertainty measures are based on gradient feature maps, i.e., \(\nabla_{\boldsymbol{\theta}}E\) with θ being the model’s parameters. In the simplest case, \(\nabla_{\boldsymbol{\theta}}E\) corresponds to the input features of the linear regression in Fig. 1d. Specifically, for MTP, the uncertainty measure is based on the maximum volume (MaxVol) principle applied to the matrix of basis-function derivatives with respect to the fitting coefficients46. The actual measure of the uncertainty is the determinant of the corresponding MaxVol submatrix. For GM-NN, the uncertainty is computed by employing the gradient features of a trained NN (with θ being the weights and biases)26,27,28, corresponding to the finite-width neural tangent kernel47. In this work, the largest cluster maximum distance (LCMD) method28 is used to select new structures, with last-layer gradient features (FEAT(LL)) used to compute the similarity between structures and, from it, the model’s uncertainty. Other active learning approaches are also possible27,28.
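The sketch below illustrates such a gradient-feature-based measure for the linear case, in the form of a MaxVol-style extrapolation grade; the active (m × m) submatrix is assumed to have been preselected by the MaxVol algorithm, and all data are synthetic:

```python
import numpy as np

def extrapolation_grade(A_active, b_new):
    """Grade of a candidate with gradient features b_new, given the (m, m)
    gradient-feature matrix A_active of the active set: max |b_new A^{-1}|.
    Values <= 1 indicate interpolation; larger values flag extrapolation."""
    return np.max(np.abs(np.linalg.solve(A_active.T, b_new)))

rng = np.random.default_rng(1)
A = rng.normal(size=(10, 10))                  # active-set gradient features
print(extrapolation_grade(A, 0.5 * A[3]))      # 0.5: inside the active set
print(extrapolation_grade(A, 10.0 * rng.normal(size=10)))  # typically >> 1
```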

Evaluation strategy

The objective of the ML potentials is to capture both the vibrational and configurational degrees of freedom at near-DFT accuracy. The TaVCrW HEA stabilizes as a fully disordered structure at high temperature. As the temperature decreases, short-range ordering (SRO) develops in the alloy, followed by segregation of individual elements or phases18. Hence, a robust ML model that accurately predicts the complete HEA behaviour should be able to describe both the high-temperature disordered phase and the low-temperature ordered phases.

We therefore choose structures that represent both of these temperature regimes for training and evaluation of the ML potentials. The training data include the 0 K energies and forces of ordered phases (binaries, ternaries, and quaternaries). In addition, we also include configurations of strained binaries and of the disordered TaVCrW alloy at 2500 K. The subsystems that form the training set are classified as in-distribution subsystems.

To further investigate the generalization capabilities of the ML potentials, we also evaluate their performance on subsystems from parts of the compositional space of the Ta-V-Cr-W family that do not form a part of the training set. For this, we choose a set of non-equiatomic ternaries and evaluate both the 0 K and the high-temperature energy and force predictions of the ML potentials. These subsystems are classified as out-of-distribution subsystems. The structures in the training and validation data sets of the in-distribution and out-of-distribution subsystems are detailed in the ‘Methods’ section.

Accuracy for the in-distribution subsystems

Figure 2 shows the root-mean-square (RMS) errors in the predicted energies and forces for all in-distribution subsystems using the MTP. The RMS error-tetrahedra for the GM-NN potential and the EAM/MEAM potentials are compiled in the Supplementary Information. In addition to the 0 K subsystems (binaries, ternaries, quaternary), Fig. 2 also shows errors for the 0 K strained binaries and the 2500 K disordered HEA, all of which form the in-distribution subsystems. The ‘Overall errors’ correspond to the RMS errors on all configurations put together. The results are additionally compiled in Table 1. To guarantee reliable statistics, all values refer to averages over ten separately fitted MTPs and GM-NNs.

Fig. 2: RMS errors in predicted energies and forces for the MTP model evaluated on the test data for different in-distribution subsystems of Ta-V-Cr-W.

Energy RMS errors in meV/atom are shown in bold, while the atomic force RMS errors in eV/Å are in grey. The numbers with arrows indicate the RMS errors for the energy per atom for strained binary structures.

Table 1 RMS errors in the predicted energies/forces for the individual Ta-V-Cr-W in-distribution subsystems.

From Fig. 2 and Table 1 (and Fig. S1 in the Supplementary Information), we see that the two ML-based models, MTP and GM-NN, are almost equally accurate and reproduce the DFT values extremely well for the in-distribution subsystems. For the 0 K subsystems, the RMS errors in the energies and forces range from 1 to 4 meV/atom and from 0.02 to 0.06 eV/Å, respectively, and for the 2500 K disordered structure, the RMS errors are around 2.5 meV/atom and 0.16 eV/Å, respectively. The strained binary structures are also predicted to within 4.5 meV/atom error in the energy, with some binaries (CrW, TaCr, and VW) being predicted more accurately than the others, irrespective of the model chosen. The MTP is marginally better than the GM-NN in predicting the 2500 K structure (i.e., the vibrational degrees of freedom), while the GM-NN predicts the 0 K structures (i.e., the configurational degrees of freedom) marginally better.

Figure 3 shows the correlation between the predicted and DFT forces for the 0 K and the 2500 K TaVCrW structures. The 0 K data in the training and validation data set include a combination of relaxed and unrelaxed structures, and they span a broad range of force values. As seen in Fig. 3, the forces range from −2 to 2 eV/Å for the 0 K TaVCrW structures and from −10.5 to 10.5 eV/Å for the 2500 K MD structure. Despite this broad range, the forces predicted by the ML-based models agree excellently with DFT for all structures of the in-distribution subsystems.

Fig. 3: Correlation between the predicted forces (by the MTP, the GM-NN and the EAM potential) and the DFT forces in eV/Å for the 0 K and 2500 K structures of the TaVCrW alloy.

The colours indicate the relative density as estimated by a kernel density estimator.

The overall accuracy achieved by the MTP and the GM-NN with respect to the DFT values is remarkable, given the complexity of the configurational phase space (which includes both 0 K subsystems and a 2500 K disordered quaternary). To put it into perspective, Table 1 also shows the RMS errors of an EAM that was fitted to a combination of the 2500 K 4-component structures and the 0 K subsystems. Both the energy and force errors are one to two orders of magnitude higher, rendering the EAM almost unusable for any accurate property prediction. The last column in Fig. 3 shows the correlation between the EAM-predicted forces and the DFT forces for both the 0 K and 2500 K TaVCrW structures. The 0 K forces predicted by the EAM are highly uncorrelated with the DFT values, and the 2500 K forces have a much broader distribution (larger RMS error) compared to the two ML-based models.

Heat capacity prediction for the HEA

Having shown the remarkable accuracy in energies and forces achievable with the ML-based models for the in-distribution subsystems, we now test their reliability and robustness in predicting thermodynamic properties of the disordered TaVCrW HEA. In particular, we analyse the isobaric heat capacity, Cp(T), up to 2500 K (close to the melting point). The heat capacity is calculated numerically using thermodynamic integration (TI), the details of which are given in the ‘Methods’ section.

Figure 4 shows the heat capacity curves obtained using the MTP and the GM-NN. The curves are compared to the DFT results from the literature36. We have removed the electronic part from the full DFT curve to enable a direct comparison with the results predicted by the current potentials. Both the MTP and the GM-NN predict almost the same behaviour of the heat capacity with temperature. The curves are also very close to the full DFT curve up to around 2200 K, after which the ML-based curves diverge. The heat capacity, calculated as the second derivative of the Gibbs energy with respect to temperature, is extremely sensitive to small changes in the free energy. Hence, given that only 25% of the training set consisted of disordered quaternary structures (0 K and 2500 K), the accuracy achieved by both the MTP and the GM-NN for the heat capacity is outstanding. The curves are also compared to the heat capacity calculated using the EAM in Fig. 4. The EAM fitted to the entire training set, whose RMS errors are compared in Table 1, was not stable enough to run TI across the entire temperature and volume range. Hence, we fitted an EAM only to the 2500 K disordered TaVCrW structures to calculate the heat capacity. Even then, the heat capacity calculated by the EAM diverges from the DFT curve above 750 K and is much less accurate.
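Schematically, the numerical evaluation proceeds as in the sketch below, where a synthetic Gibbs energy G(T) stands in for the one obtained from TI:

```python
import numpy as np

T = np.linspace(300.0, 2500.0, 221)        # temperature grid in K
G = -0.5e-6 * T**2 - 1.2e-9 * T**3         # toy Gibbs energy in eV/atom

# C_p = -T * d^2G/dT^2, evaluated by finite differences on the grid
C_p = -T * np.gradient(np.gradient(G, T), T)
print(C_p[:3] * 1.0e3)                     # meV/(atom K)
```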

Fig. 4: Isobaric heat capacity as a function of temperature computed by employing the MTP, the GM-NN, and the EAM interatomic potentials.

The reference DFT values (without the electronic contribution) are taken from literature36.

Accuracy for the out-of-distribution subsystems

Figure 5 shows the errors in the predicted energies and forces (finite-temperature results at 2000 K and 0 K results) for the out-of-distribution subsystems (non-equiatomic ternaries on the x-axis) using both the MTP and the GM-NN potential. The first row of plots contains the errors in the absolute energies of the various subsystems. The prediction of the absolute energies by the ML-based potentials is off by as much as 20 meV/atom in some cases. The out-of-distribution subsystems are not a part of the training set, and the ML potentials, although trained on absolute energies of configurations of the in-distribution subsystems, cannot accurately predict the large absolute chemical potentials. Hence, we observe much higher errors in the absolute energies of the out-of-distribution subsystems compared to those of the in-distribution subsystems.

Fig. 5: Errors in predicted energies and atomic forces for finite-temperature (2000 K) and 0 K out-of-distribution non-equiatomic ternaries.

For the 2000 K data, twelve structures have been sampled for each system. Thus, ΔE and ΔFij denote the respective RMS errors. For the relative energy, we used the first structure in each subsystem as a reference. Therefore, RMS errors ΔErel are evaluated for the remaining eleven structures. For the 0 K test, we employed a single configuration such that ΔE denotes the mean-absolute error, while ΔFij denotes the RMS error.

However, the temperature dependence of thermodynamic properties, such as the heat capacity discussed in the previous section, is a numerical function of energy differences rather than of absolute energies. Thus, the absolute energy predictions do not serve as a rigorous assessment of the accuracy of the ML-based potentials in predicting thermodynamic properties. In the second row of Fig. 5, we show the respective RMS errors in the relative energy (with respect to the corresponding relaxed structure) of the out-of-distribution subsystems and find that the errors are less than 5.5 meV/atom.

In the third row of Fig. 5, we also show the errors in the atomic forces at both 2000 K (left) and 0 K (right). The errors observed in the 2000 K forces are similar to those for the in-distribution systems, with values not exceeding 0.16 eV/Å. Accurate prediction of MD trajectories at elevated temperatures can thus be expected for various alloy compositions in the Ta-V-Cr-W family. The errors in the 0 K forces are even smaller and comparable to those of the 0 K subsystems in the in-distribution set. An exception is the Cr17W17V66 alloy, where the higher error comes from the distinctly higher absolute value of the force in the corresponding configuration (marked at the top of the plot). Hence, the relative force errors are comparable among the different out-of-distribution subsystems.

In clear contrast to the ML models, the performance of the EAM on the out-of-distribution subsystems is much worse. The absolute energies are off by 120–200 meV/atom, and even the relative energies are off by 60–80 meV/atom. The RMS error in the forces of the ternaries is similar to that observed for the TaVCrW quaternary, with values ranging from 0.5 to 0.7 eV/Å. Such errors make the EAM even more unsuitable for thermodynamic property predictions of the out-of-distribution subsystems.

Overall, the performance of the ML-based potentials remains remarkably good even for the out-of-distribution subsystems. The observed larger errors in the absolute energies could, for example, be corrected in an additional step by single static DFT calculations at the respective compositions.

Active learning applied to the TaVCrW HEA

Figure 6 compares the learning curves obtained by training the MTP and the GM-NN models on randomly and actively selected data. Figure 6a, b shows the RMS and MAX errors in the energies. It can be seen that the MTP (orange lines) needs significantly fewer data to reach its final accuracy in the energies. The GM-NN model (blue lines), on the other hand, shows a continuous (nearly linear) decrease in the error on the energies as the size of the training set increases. The final accuracy of the two models is similar. The faster convergence of the MTP can be attributed to the difference in the inherent nature of the two ML-based models: the relatively simpler formalism of the MTP, with ~500 times fewer trainable parameters, favours quicker convergence, whereas the enhanced non-linear NN formalism of the GM-NN model makes it continuously more accurate with increasing training data.

Fig. 6: Learning curves for MTP and GM-NN models.

Shaded areas denote the standard error of the mean evaluated over 10 or 100 independent runs for MTP and GM-NN, respectively. An additional factor of ten appears for GM-NN, as for each of the ten data splits, another ten splits for early stopping have been performed. The root-mean-square (RMS) and maximal (MAX) errors for the energies are shown in (a) and (b), respectively, in meV/atom; the RMS and MAX errors for the atomic forces are shown in (c) and (d), respectively, in eV/Å.

Indeed, the MTP reaches convergence in the energy already with a seemingly very small training set size of 128 structures. However, one should note that each structure contains several tens to hundreds of atoms with many distinct local chemical environments. The total number of atoms in each training set is tabulated in Table 2. For example, the MTP training set with 128 structures comprises around 11,000 atoms. Hence, even with 128 structures, the MTP is trained on a sufficiently large number of atomic neighbourhoods, allowing it to make accurate predictions across the configurational space.

Table 2 The average number of atoms in the differently sized training sets during active learning (Ntrain = number of training structures).

Comparison of the dashed and solid lines in Fig. 6a, b reveals that there is no improvement in the RMS errors in the energies using the active learning approach for either of the two ML models. Since the random data selection was made across the entire diverse configurational space (with data chosen from all 0 K subsystems and MD), it is already sufficient to reach a good energy accuracy for the Ta-V-Cr-W alloy family. An advanced active learning approach is superfluous for obtaining good energies.

Figure 6c, d shows the errors in the forces with increasing training set size. Once again, the MTP converges faster to its minimum RMS error in the forces in comparison to the GM-NN, which shows a more gradual decrease. The reason for this is the same as above, i.e., the simpler formalism of the MTP. However, unlike for the energies, here we notice a difference in the convergence of the RMS error between the randomly trained and actively learnt MTP (dashed and solid orange lines, respectively). The MTP trained on actively learnt training sets converges faster than the MTP trained on randomly selected training data. There is no such clear difference in the GM-NN curves (dashed and solid blue lines). To explain the improvement of active learning over a random selection for the MTP and the absence of such an improvement for the GM-NN, we track the number of 2500 K configurations in the actively learnt training set.

Table 2 shows the fraction of 2500 K configurations, given in brackets, at different stages of the MTP and GM-NN active learning process. The active learning scheme of the MTP selects a higher fraction of 2500 K configurations in comparison to the GM-NN model, especially as the training set size increases. For example, for the MTP trained on 256 structures, 34% of the training set consists of 2500 K configurations. This is reflected in the considerably better performance of the actively learnt MTP in predicting atomic forces at small training set sizes. In contrast, the active learning approach employed by GM-NN selects roughly the same amount of 2500 K data across all active learning steps, but a much smaller fraction than the corresponding MTP scheme. The GM-NN method constrains the representativity of the selected batch, i.e., it takes the data distribution into account, and almost 80% of the total training set consists of 0 K data. Thus, there is only a very small decrease in the RMS error in the forces of the GM-NN model coming from the actively learnt training set. In conclusion, active learning might be beneficial as long as multiple structures are provided before labelling, i.e., computing DFT energies and forces. One can only take advantage of active learning in combination with a conventional human-based training set generation.

Training and inference times

The time taken to train an ML-based model and to perform MD calculations with it depends on the number of model parameters. In the case of the GM-NN potential, this number is dictated by the neural network representation, whereas for an MTP the contractions are decisive (see the ‘Methods’ section for details). To elucidate the training and inference times, we compare five GM-NNs and five MTPs built with different numbers of model parameters. The GM-NNs are designated as 593-512-512-1, 593-128-128-1, 593-32-32-1, 360-512-512-1, and 910-512-512-1. Here, the first number represents the number of input features, the following two numbers represent the numbers of hidden neurons, and the 1 at the end stands for the single output neuron (which corresponds to the energy of the system). For the MTP, we compare potentials with levels 16, 20, 24, 26, and 28 (see Equation (6)), resulting in 92, 288, 864, 1464, and 2445 descriptors, respectively.
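For reference, the weights and biases of the dense layers in the default 593-512-512-1 GM-NN can be counted directly (the trainable radial parameters and the per-species scale and shift parameters add a comparatively small number on top):

```python
# Dense-layer parameters of the default GM-NN architecture
layers = [593, 512, 512, 1]                  # input, two hidden, output
n_params = sum(d_in * d_out + d_out          # weights plus biases per layer
               for d_in, d_out in zip(layers, layers[1:]))
print(n_params)  # 567297, of the order of the '~500 times more' quoted above
```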

Table 3 lists the training times for the different MTP and GM-NN models fitted to the in-distribution subsystems. Naturally, the time taken to train the MTP increases with the level as the number of parameters increases. Similarly, the time taken to fit a GM-NN potential increases with the number of input features and with the number of neurons. The lev24 MTP, which is eventually used for the thermodynamic properties, takes eight hours to reach convergence in 400 iterations of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm48. The corresponding GM-NN model takes four hours for 1000 epochs. In addition, in every epoch, the GM-NN evaluates 500 additional configurations to avoid overfitting.

Table 3 Training times for both ML-based models evaluated for different numbers of model parameters on two Intel Xeon E6252 Gold (Cascade Lake) CPUs.

Figure 7 shows the inference time of the ML models in comparison to the classical potentials and in relation to the achievable accuracy. In general, the MTPs (orange circles) are faster than the GM-NN potentials (blue triangles). However, the GM-NNs have been designed to run on graphics processing units (GPUs), which makes them faster (blue stars) than the higher-level MTPs. More specifically, the MTP used in this work for the HEA modelling, the lev24 MTP, is faster by a factor of 2.5 than the corresponding GM-NN (593-512-512-1) on the same CPU. However, on an NVIDIA GeForce RTX 3090 Ti 12GB GPU, the GM-NN is 2.5 times faster than the lev24 MTP.

Fig. 7: The trade-off between computational cost and normalized error (Equation (13)) evaluated for MTP, GM-NN, EAM, and MEAM.

For the normalized error, the overall errors from Table 1 have been chosen. The darkened colours represent the default MTP and GM-NN models used throughout this work. Blurred colours show models with varying numbers of hidden neurons and/or input features. For MTP, the respective level defines the number of employed input features; see Equation (6). For GM-NN, we use a convention where the first number indicates the number of features, while the following numbers give the numbers of neurons in the respective layers (two hidden and one output layer).

It can also be observed that the inference time of an MTP gradually increases with the level of the MTP, although the accuracy (normalized error) saturates to a minimum after lev24. On the other hand, all GM-NN models are comparably computationally efficient on a GPU, and only minor differences in inference times have been observed on CPU nodes. The accuracy of the GM-NN deteriorates when decreasing the number of hidden neurons or input features. For example, decreasing the number of neurons by a factor of 16 leads to similar results as decreasing the number of features by a factor of 1.6. Thus, the employed representation affects the final result more than the network size.

Obviously, the ML-based models are slower than conventional potentials such as the EAM and the MEAM model. For example, the lev24 MTP is slower by a factor of 35 in comparison to the EAM. However, the speed of the classical potentials is accompanied by an extreme inaccuracy compared to the ML potentials, with a normalized error at least an order of magnitude larger.

Discussion

We have developed two ML-based interatomic potentials, an MTP and a GM-NN, that can accurately capture the atomic interactions of the Ta-V-Cr-W alloy family. This is the first instance of exploring the limits of applicability of ML-based models to an extent where both the configurational and the vibrational degrees of freedom are included in the training and testing data sets for a chemically complex system. Specifically, the MTP and the GM-NN reproduce, with extreme accuracy, the energies and forces of both the 0 K subsystems and the high-temperature (2500 K) disordered equiatomic TaVCrW HEA, which make up the in-distribution sets. They are also able to accurately predict the energies of the 0 K strained binaries. The accuracy achieved in the present work is very similar to the MTP errors obtained in ref. 36 for the same disordered alloy, although that MTP was trained only on the equiatomic 2500 K structures. Adding the configurational degrees of freedom by introducing many 0 K structures into the training data set does not affect the predictive capabilities of the MTP (nor of the GM-NN) for the HEA behaviour, suggesting that there is scope for further expanding the training set.

The models have also been tested on out-of-distribution subsystems (not a part of the training set) comprising non-equiatomic ternaries. Both models perform similarly well on the out-of-distribution and in-distribution subsystems in the prediction of high-temperature MD forces. The observed RMS errors in the absolute energies are larger than for the in-distribution cases, arising from the inability to predict accurate absolute chemical potentials. Nevertheless, the relative energies relevant for calculating temperature-dependent thermodynamic properties are still predicted with a low RMS error. These results indicate a good generalization of the MTP and GM-NN models even to out-of-distribution data.

In an earlier work18, lattice-based Monte Carlo and cluster expansion techniques were used to predict chemical SRO and phase separation below 1300 K in the TaVCrW alloy. With the ML-based potentials developed here, we can now model off-lattice behaviour, which enables us to capture strong lattice relaxations. In phase-separated structures, lattice mismatch at the interface can lead to strained subsystems, which are likewise well predicted by the MTP and GM-NN. Consequently, the low-temperature phase-separated structures in combination with strained interfaces, the intermediate SRO structures, as well as the high-temperature fully disordered structure can now be simultaneously and more accurately modelled with a single potential. The ML models developed here also offer scope to combine MD with Monte Carlo methods to further enhance the sampling of the phase spaces of chemically complex alloys. The vibrational degrees of freedom that the ML-based models entail enable, for example, a more accurate prediction of the order-disorder transition temperature. A well-determined transition temperature is crucial in deciding the operational temperature of the HEA, since ordering and phase segregation can lead to changes in mechanical behaviour15.

The robustness of the ML-based models is also noticeable in the heat capacity curves shown in Fig. 4. Even small differences in the free energies are known to cause large deviations in the computed heat capacity49. Nevertheless, the MTPs and GM-NNs are sufficiently accurate (to within a few meV/atom) in the free energy prediction that they are able to predict the heat capacities to DFT accuracy (without electronic contributions) up to around 2200 K.

Even though the ML-based models are remarkable in their free energy and eventual thermodynamic property prediction, one needs to keep in mind that they have been fitted to DFT data that only contain the vibrational and configurational degrees of freedom. To reach ‘full DFT’ accuracy, one also needs to include the electronic degrees of freedom (the electronic free energy and the impact of vibrations on it) and the magnetic degrees of freedom. The state-of-the-art method to include the electronic contribution uses the direct upsampling technique49, wherein DFT calculations (including electronic contributions) are performed on snapshots generated by the ML-based models. The number of snapshots needed to reach a certain accuracy in the free energy can be roughly estimated based on the RMS errors of our models. In the current case, for an RMS error of 3 meV/atom, we can reach an accuracy of ±1 meV/atom using approximately 35 snapshots49 for a given volume and temperature. By doing so, we can include the temperature-dependent (and vibration-coupled) electronic degrees of freedom in the thermodynamics of the alloy. To account for the magnetic free energy, propositions such as the magnetic MTP50 have been put forth in the literature and can be used for magnetic systems. In consequence, on top of the predictions of the currently fitted ML-based models, further expanding the configurational space of the training set (e.g., also including the temperature-dependent behaviour of binaries and ternaries) and adding the electronic and magnetic degrees of freedom would lead to a more complete and accurate picture of the behaviour of chemically complex multicomponent alloys.

For a complex system such as the Ta-V-Cr-W family, active learning is only partially beneficial. The RMS errors in the predicted energies converge similarly irrespective of an active or random selection of configurations. This is true for both ML models. The random selection done uniformly across the diverse set of subsystems is already sufficient to reach a good accuracy in the energies. The advantage of active learning is more evident in the convergence of the atomic forces, arising likely from the data set size and the complexity of the atomic-force landscape compared to the energies. The active learning algorithm employed by the MTP preferentially selects more high-temperature configurations, which have large force values. In contrast, the algorithm employed by the GM-NN has been designed to select new data considering the data distribution. Thus, the GM-NN scheme selects almost equal portions of MD data in every iteration.

The convergence of the current models with respect to the training set size is also a result of the number of trainable parameters. The MTP has ~500 times fewer fitting parameters than the GM-NN model. Hence, it reaches its final accuracy faster while showing no improvement when increasing the training set size above a certain amount. On the other hand, the GM-NN model shows a gradual yet continuous improvement in accuracy with the training set size, owing to its larger number of trainable parameters. This suggests that the GM-NN would outperform the MTP on larger data sets. However, the accuracy of the MTP on larger data sets can also be improved by increasing the level of the selected MTP model, i.e., by increasing the effective number of trainable parameters.

Ultimately, in complex systems such as the TaVCrW HEA, which require large simulation cells to allow for phenomena such as SRO to occur, the choice of the interatomic model is an interplay between the accuracy and the inference (simulation) time. Although the ML-based models are one to two orders of magnitude slower than classical potentials (yet still negligible in comparison to ab initio simulation times), they are at least an order of magnitude more accurate. Classical empirical potentials such as the EAM/MEAM are restricted in their functional form. Although such potentials need a smaller training set, leading to less severe phase space sampling problems, their inaccuracy far outweighs these partial benefits. The inaccuracy of classical models might not only affect their robustness and stability while running MD across the volume and temperature range but would also strongly alter, for example, the order-disorder transition temperature, making them unsuitable for such applications. Hence, for predicting challenging multicomponent system behaviour, one needs to resort to ML-based models.

In summary, despite certain prominent differences between the MTP and the GM-NN, they provide similar results, whether evaluated on a test data set or during a real-time simulation. Considering that the two models are cutting-edge representatives of the ML-potential family, these results may reflect the constantly expanding boundaries of machine-learning potentials applied to current material systems of interest.

Methods

Local interatomic potentials

An atomic structure is defined by \(S=\{\mathbf{r}_i, Z_i\}_{i=1}^{N_{\mathrm{at}}}\), where Nat is the number of atoms, \(\mathbf{r}_i\in\mathbb{R}^3\) denotes the Cartesian coordinates of atom i, and \(Z_i\in\mathbb{N}\) is the respective atomic number. We consider the mapping f from the atomic structure to a scalar electronic energy E, i.e., \(f:S\mapsto E\in\mathbb{R}\). All models employed here assume locality of interatomic interactions (Fig. 1a); thus, the total energy of the system is split into contributions from individual atoms

$$E=\mathop{\sum }\limits_{i=1}^{{N}_{{{{\rm{at}}}}}}{E}_{i}.$$
(1)

The energy contributed by atom i is defined by a parameterized function, i.e., \(E_i=E_i\left(S_i,\boldsymbol{\theta}\right)\), where θ are trainable parameters and Si is the local atomic environment of atom i. The parameters θ are learned from the training data by minimizing a predefined loss function \(L\left(\boldsymbol{\theta}\right)\). The data set typically contains total energies, atomic forces, and stresses obtained by an ab initio method. The local environment, \(S_i=\{\mathbf{r}_i, Z_i, \{\mathbf{r}_j, Z_j\}_{j\in R_{\mathrm{cut}}}\}\), is defined by the chosen cutoff radius Rcut. In this work, we choose a cutoff radius of 5.0 Å.

Moment tensor potential—MTP

The MTP site energy for each atom i is linearly expanded through a set of basis functions Bα:

$${E}_{i}=\mathop{\sum }\limits_{\alpha =1}^{{\alpha }_{\max }}{\xi }_{\alpha }{B}_{\alpha }^{(i)},$$
(2)

which are contractions (e.g., dot products of vectors and matrices, see below) of the moment tensor descriptors Mμ,ν33. The ξα are expansion coefficients and the parameter \({\alpha }_{\max }\) is determined from Equation (6) below. Each descriptor Mμ,ν is composed of a radial part (scalar function depending only on the interatomic distances) and an angular part (a tensor of a certain rank) (Fig. 1b),

$$M_{\mu,\nu}({\mathfrak{n}}_i)=\sum\limits_{j} \overbrace{f_{\mu}(|{\boldsymbol{r}}_{ij}|,Z_i,Z_j)}^{{\mathrm{radial}}} \overbrace{\underbrace{{\boldsymbol{r}}_{ij}\otimes\ldots \otimes {\boldsymbol{r}}_{ij}}_{\nu\, {\rm{times}}}}^{{\mathrm{angular}}},$$
(3)

where \({\mathfrak{n}}_i\) stands for the neighbourhood of atom i (all atoms within the radius Rcut around atom i), μ indexes the radial functions, ν is the rank of the tensor in the angular part, and rij is the vector between atoms i and j. The radial functions fμ(∣rij∣, Zi, Zj) are expanded in a basis of Chebyshev polynomials, usually up to order seven,

$${f}_{\mu }(| {{{{\boldsymbol{r}}}}}_{ij}| ,{Z}_{i},{Z}_{j})=\mathop{\sum}\limits_{k}{c}_{\mu ,{Z}_{i},{Z}_{j}}^{k}{Q}^{k}(| {{{{\boldsymbol{r}}}}}_{ij}| ),$$
(4)

where \({c}_{\mu ,{Z}_{i},{Z}_{j}}^{k}\) are expansion coefficients and

$$Q^{k}(r)=T_{k}(r)\,{(R_{\mathrm{cut}}-r)}^{2}.$$
(5)

Here, Tk(r) are the Chebyshev polynomials of order k, with r the interatomic distance, and the \((R_{\mathrm{cut}}-r)^2\) term is introduced to ensure a smooth decrease to 0 at Rcut.
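A sketch of this radial basis using NumPy's Chebyshev module; the linear mapping of r onto the Chebyshev domain [−1, 1] and the inner radius Rmin are illustrative assumptions, as the actual MTP implementation may scale the argument differently:

```python
import numpy as np
from numpy.polynomial.chebyshev import chebval

def radial_basis(r, k_max=7, r_min=1.5, r_cut=5.0):
    """Q^k(r) = T_k(x(r)) * (r_cut - r)^2 for k = 0..k_max."""
    x = 2.0 * (r - r_min) / (r_cut - r_min) - 1.0   # map r to [-1, 1] (assumed)
    smooth = (r_cut - r) ** 2                       # smooth decay to 0 at r_cut
    # chebval with a unit coefficient in slot k evaluates T_k(x)
    return np.array([chebval(x, np.eye(k_max + 1)[k]) * smooth
                     for k in range(k_max + 1)])

print(radial_basis(np.array([2.0, 4.9])).shape)     # (8, 2): 8 basis functions
```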

Although the moment tensor descriptors themselves are two-body functions (depending only on rij), many-body interactions are incorporated into the basis functions Bα by building various contractions of the Mμ,ν, under the condition that each contraction has to yield a scalar value. Note that the coefficients \(c_{\mu,Z_i,Z_j}^{k}\) enter Equation (2) non-linearly, as the basis functions Bα contain multiplications of different radial functions; see the examples below. This way, the MTP fitting parameters \(\boldsymbol{\theta}=\{\xi_{\alpha},\,c_{\mu,Z_i,Z_j}^{k}\}\) are composed of the coefficients ξα and \(c_{\mu,Z_i,Z_j}^{k}\), which enter Equation (2) linearly and non-linearly, respectively.

To choose which (out of the infinite number of) basis functions to include in the interatomic potential, we define a degree-like measure, the ‘level’: \(\mathrm{lev}\,M_{\mu,\nu}=2\mu+\nu\) for a moment tensor, and for a Bα obtained by contracting \(M_{\mu_1,\nu_1}\), \(M_{\mu_2,\nu_2},\ldots\),

$${{{\rm{lev}}}}{B}_{\alpha }=(2{\mu }_{1}+{\nu }_{1})+(2{\mu }_{2}+{\nu }_{2})+\ldots \!.$$
(6)

Thus, to define an MTP, we choose a particular \(\mathrm{lev}_{\max}\) and expand the site energy function linearly through those Bα that satisfy \(\mathrm{lev}\,B_{\alpha}\le \mathrm{lev}_{\max}\). The MTP part of Fig. 1c (left-hand side) has examples of how some of the basis functions are constructed:

  • For the 2-body case, the scalar moments M0,0, M1,0, and M2,0 already serve as basis functions, which differ by the radial functions involved.

  • For the 3-body case, the moments M0,1 and M1,1 produce basis functions M0,1M0,1 and M1,1M1,1, respectively, which differ by the radial functions involved.

  • For the 5-body case (the most complex case for MTP in this work), the basis function has the form M0,1M0,2M0,2M0,1.

Note how the condition specified in Equation (6) reduces the number of different radial functions involved in the many-body moments (Fig. 1c left). By increasing \(\mathrm{lev}_{\max}\), MTPs with more parameters emerge, which are capable of fitting more accurately or handling a larger training set, but which require more computational time for fitting and inference (simulation time). In this work, we use an MTP with \(\mathrm{lev}_{\max}=24\).
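The level constraint of Equation (6) can be made concrete by enumerating the admissible moment tensors for a small maximum level (a toy enumeration; the actual basis additionally involves the contraction patterns):

```python
# Moment tensors admissible at a given maximum level (Eq. (6)):
# lev M_{mu,nu} = 2*mu + nu; a contraction's level is the sum over its
# factors, e.g. M_{0,1} M_{0,2} M_{0,2} M_{0,1} has level 1 + 2 + 2 + 1 = 6.
lev_max = 8   # small value for illustration; the production MTP uses 24
moments = [(mu, nu)
           for mu in range(lev_max // 2 + 1)
           for nu in range(lev_max + 1)
           if 2 * mu + nu <= lev_max]
print(moments)   # all (mu, nu) with 2*mu + nu <= 8
```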

The parameters of an MTP are obtained as those minimizing the mean squared loss on the training data (Fig. 1d). We use the second-order quasi-Newton BFGS48 optimization method, which requires exact first derivatives of the loss function with respect to the model parameters and approximates the second derivatives by constructing the Hessian matrix. To balance the relative contributions of the energies, forces, and stresses of a configuration j with \(N_{\mathrm{at}}^{(j)}\) atoms, weight factors of \(C_{\mathrm{e}}=1/N_{\mathrm{at}}^{(j)}\), Cf = 0.001 Å2, and \(C_{\mathrm{s}}=0.0001/N_{\mathrm{at}}^{(j)}\) are used, respectively.
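The resulting weighted loss has the following functional form (a sketch only, not the actual fitting code):

```python
import numpy as np

def mtp_loss(configs):
    """Weighted squared loss over energies, forces, and stresses using the
    weights quoted above. Each config is a dict of predicted and reference
    quantities; this mirrors the functional form only."""
    loss = 0.0
    for c in configs:
        n_at = len(c["forces_ref"])                       # atoms in config j
        loss += (1.0 / n_at) * (c["energy"] - c["energy_ref"]) ** 2
        loss += 0.001 * np.sum((c["forces"] - c["forces_ref"]) ** 2)
        loss += (0.0001 / n_at) * np.sum((c["stress"] - c["stress_ref"]) ** 2)
    return loss
```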

Gaussian moment neural network—GM-NN

GM-NN uses an artificial NN to map an atomic structure S to the total energy \(E\left(S;{{{\boldsymbol{\theta }}}}\right)\). Specifically, GM-NN employs a fully-connected feed-forward NN consisting of two hidden layers34,35

$$y_i = 0.1\cdot \mathbf{b}^{(3)} + \frac{1}{\sqrt{d_2}}\,\mathbf{W}^{(3)}\phi\!\left(0.1\cdot \mathbf{b}^{(2)} + \frac{1}{\sqrt{d_1}}\,\mathbf{W}^{(2)}\phi\!\left(0.1\cdot \mathbf{b}^{(1)} + \frac{1}{\sqrt{d_0}}\,\mathbf{W}^{(1)}\mathbf{G}_i\right)\right),$$
(7)

with \(\mathbf{W}^{(l+1)}\in\mathbb{R}^{d_{l+1}\times d_l}\) and \(\mathbf{b}^{(l+1)}\in\mathbb{R}^{d_{l+1}}\) being the weights and biases of layer l + 1, respectively. The default network in this work consists of d0 = 593 input neurons (the dimension of the feature vector \(\mathbf{G}_i=\mathbf{G}_i\left(S_i\right)\)), d1 = d2 = 512 hidden neurons, and a single output neuron, d3 = 1. The weights of the network W(l+1) are initialized by drawing the respective entries from a normal distribution with zero mean and unit variance. In contrast, the trainable bias vectors b(l+1) are initialized to zero. To improve the accuracy and convergence of the GM-NN model, we employ a neural tangent parameterization (the factors of 0.1 and \(1/\sqrt{d_l}\))47. For the activation function ϕ, we use the Swish/SiLU function51,52.

The network’s output is scaled and shifted to aid the training process even further,

$${E}_{i}\left({{{{\bf{G}}}}}_{i},{{{\boldsymbol{\theta }}}}\right)=c\cdot ({\sigma }_{{Z}_{i}}{y}_{i}+{\mu }_{{Z}_{i}}),$$
(8)

where the trainable shift parameters \({\mu }_{{Z}_{i}}\) are initialized by solving a linear regression problem, and the trainable scale parameters \({\sigma }_{{Z}_{i}}\) are initialized to 1. The constant c is given by the RMS error per atom of the regression solution35.
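A plain-NumPy sketch of this forward pass (Equations (7) and (8)), with the weights initialized as described above; it mirrors the functional form only, not the actual TensorFlow implementation:

```python
import numpy as np

def swish(x):
    return x / (1.0 + np.exp(-x))        # Swish/SiLU activation

def gmnn_site_energy(G_i, params, sigma_Z, mu_Z, c):
    """Eqs. (7)-(8): neural-tangent parameterization with N(0,1) weights,
    1/sqrt(d_in) scaling at evaluation time and 0.1-damped biases."""
    x = G_i
    for k, (W, b) in enumerate(params):
        x = 0.1 * b + (W @ x) / np.sqrt(W.shape[1])
        if k < len(params) - 1:          # no activation on the output layer
            x = swish(x)
    return c * (sigma_Z * x[0] + mu_Z)   # scale and shift the output

rng = np.random.default_rng(0)
dims = [593, 512, 512, 1]
params = [(rng.normal(size=(dims[k + 1], dims[k])), np.zeros(dims[k + 1]))
          for k in range(3)]
print(gmnn_site_energy(rng.normal(size=593), params, 1.0, 0.0, 1.0))
```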

To encode the invariance of the total energy to translations, rotations, and permutations of atoms of the same species, we employ the Gaussian moment (GM) representation34. The construction of the GM features is based mainly on defining pairwise distance vectors rij = ri − rj and splitting them into their radial and angular components: \(r_{ij}=\Vert\mathbf{r}_{ij}\Vert_2\) and \(\hat{\mathbf{r}}_{ij}=\mathbf{r}_{ij}/r_{ij}\). Thus, the equation from Fig. 1b takes a slightly different form, as the L-fold outer product is computed for the normalized distance vectors,

$${\Psi }_{i,L,s}=\mathop{\sum}\limits_{j\ne i}{R}_{{Z}_{i},{Z}_{j},s}\left({r}_{ij},{{{\boldsymbol{\beta }}}}\right){\hat{{{{\bf{r}}}}}}_{ij}^{\otimes L},$$
(9)

where \({\hat{{{{\bf{r}}}}}}_{ij}^{\otimes L}={\hat{{{{\bf{r}}}}}}_{ij}\otimes \cdots \otimes {\hat{{{{\bf{r}}}}}}_{ij}\). The non-linear radial functions \({R}_{{Z}_{i},{Z}_{j},s}\left({r}_{ij},{{{\boldsymbol{\beta }}}}\right)\) are defined as a finite sum of Gaussians \({\Phi }_{{s}^{{\prime} }}\left({r}_{ij}\right)\) (NGauss = 8 for this work)35

$${R}_{{Z}_{i},{Z}_{j},s}\left({r}_{ij},{{{\boldsymbol{\beta }}}}\right)=\frac{1}{\sqrt{{N}_{{{{\rm{Gauss}}}}}}}\mathop{\sum }\limits_{{s}^{{\prime} }=1}^{{N}_{{{{\rm{Gauss}}}}}}{\beta }_{{Z}_{i},{Z}_{j},s,{s}^{{\prime} }}{\Phi }_{{s}^{{\prime} }}\left({r}_{ij}\right).$$
(10)

The factor \(1/\sqrt{{N}_{{{{\rm{Gauss}}}}}}\) influences the effective learning rate and is inspired by the neural tangent parameterization47. The individual radial functions are centred at an equidistant grid of points between \({R}_{\min }=1.5\) Å (specific to this work) and Rcut, and are re-scaled by a cosine cutoff function29. The chemical information is encoded in the GM representation by trainable parameters \({\beta }_{{Z}_{i},{Z}_{j},s,{s}^{{\prime} }}\). The index s runs over the number of independent radial basis functions, set to Nbasis = 6 for this work.

The features invariant to rotations, Gi (Fig. 1c right), are obtained by computing contractions of the respective tensors, as any full contraction of the Cartesian tensors \(\hat{\mathbf{r}}_{ij}^{\otimes L}\) yields a rotationally invariant scalar34. For the employed GM-NN model, contractions of up to three tensors of maximum rank three are used. Thus, the many-body expansion is restricted to a maximum of four-body terms, e.g.,

$${G}_{i,{s}_{1},{s}_{2},{s}_{3}}={\left({\mathbf{\Psi} }_{i,1,{s}_{1}}\right)}_{a}{\left({\mathbf{\Psi}}_{i,1,{s}_{2}}\right)}_{b}{\left({\mathbf{\Psi}}_{i,2,{s}_{3}}\right)}_{a,b},$$
(11)

where we use Einstein notation, i.e., the right-hand side is summed over a, b ∈ {1, 2, 3}. Particular contractions are defined by identifying unique generating graphs43. In a practical implementation, all GM features are computed at once, providing tensors \(\mathbf{G}^{(j)}\in\{\mathbb{R}^{N_{\mathrm{basis}}},\mathbb{R}^{N_{\mathrm{basis}}\times N_{\mathrm{basis}}},\ldots\}\). However, the number of invariant features used as input to the NN is reduced by considering the permutational symmetries of the respective graphs (see Fig. 1c right)34, i.e., only tensors of the same rank can be exchanged.
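As a concrete illustration, the sketch below builds two rank-1 and one rank-2 Gaussian moment for a single atom (Eq. (9), with the trainable radial functions replaced by fixed random values), contracts them into the invariant of Eq. (11), and verifies the rotational invariance numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
r_ij = rng.normal(size=(12, 3))                    # neighbour vectors of atom i
R1, R2, R3 = rng.random((3, 12))                   # stand-ins for R_{Z,Z',s}
rhat = r_ij / np.linalg.norm(r_ij, axis=1, keepdims=True)

def invariant(rh):
    """Psi moments (Eq. (9)) fully contracted as in Eq. (11)."""
    psi_1a = np.einsum('j,ja->a', R1, rh)          # L = 1, rank-1 tensor
    psi_1b = np.einsum('j,ja->a', R2, rh)
    psi_2 = np.einsum('j,ja,jb->ab', R3, rh, rh)   # L = 2, rank-2 tensor
    return np.einsum('a,b,ab->', psi_1a, psi_1b, psi_2)

Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))       # random orthogonal matrix
assert np.isclose(invariant(rhat), invariant(rhat @ Q.T))
```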

The parameters θ = {W, b, β, σ, μ} of the NN, i.e., the weights W and biases b, as well as the trainable parameters β of the local representation and the parameters that scale, σ, and shift, μ, the output of the NN, are optimized by minimizing the mean squared loss on the training data (Fig. 1d). We employ the Adam optimizer53 to minimize the loss function. The respective parameters of the optimizer are β1 = 0.9, β2 = 0.999, and ϵ = 10−7. We use a mini-batch size of 32 structures. The layer-wise learning rates are set to 0.0075 for the parameters of the fully connected layers, 0.005 for the trainable representation, as well as 0.0125 and 0.00025 for the shift and scale parameters of the atomic energies, respectively. The training is performed for 1000 epochs. To prevent overfitting during training, we employ the early stopping technique54. All models are trained within the TensorFlow framework55.

Relation of MTP and GM-NN to other ML potentials

Interestingly, the MTP and GM representations can be related to many other approaches to modelling interatomic interactions proposed in the literature by considering a general function, e.g., ref. 56

$$\mathop{\sum}\limits_{j,k,l}f\left({r}_{ij}\right)f\left({r}_{ik}\right)f\left({r}_{il}\right)\langle {\hat{{{{\bf{r}}}}}}_{ij}{\hat{{{{\bf{r}}}}}}_{ik}\rangle \langle {\hat{{{{\bf{r}}}}}}_{ik}{\hat{{{{\bf{r}}}}}}_{il}\rangle .$$
(12)

This expression describes a four-body interaction but scales cubically with the number of atoms within Rcut. It is reminiscent of, e.g., atom-centred symmetry functions57. However, by writing out the definition of the scalar products \(\langle\hat{\mathbf{r}}_{ij}\cdot\hat{\mathbf{r}}_{ik}\rangle\) explicitly and pulling out the sums over the spatial coordinates, we arrive at the expressions proposed in ref. 33 and ref. 34, which scale linearly with the number of atoms within Rcut. Specifically, for GM-NN the expression is obtained in Equation (11), while for MTP the analogue is contracting the tensor moments from Equation (3), each of which scales linearly with the number of atoms. Since a Cartesian tensor of rank L can be written as a linear combination of spherical harmonics up to angular momentum L58, both the MTP and GM-NN representations (which use Cartesian tensors) are closely related to the linear ACE59 and the message-passing NequIP and MACE models45,60, which use spherical harmonics. Note that for message-passing models, the resulting many-body order and the effective cutoff radius increase with the number of interaction blocks. While this often leads to a better performance in terms of error metrics such as RMS errors, it may lead to larger inference times and, thus, less efficient potentials.
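This refactoring can be checked numerically; the sketch below compares the naive triple sum of Equation (12) against the factorized form built from precomputed rank-1 and rank-2 moments, which scales linearly with the number of neighbours:

```python
import numpy as np

rng = np.random.default_rng(2)
rhat = rng.normal(size=(50, 3))
rhat /= np.linalg.norm(rhat, axis=1, keepdims=True)   # unit vectors rhat_ij
f = rng.random(50)                                    # radial function values

# Naive evaluation of Eq. (12): cubic in the number of neighbours
dots = rhat @ rhat.T
naive = np.einsum('j,k,l,jk,kl->', f, f, f, dots, dots)

# Factorized form: moments computed once, then contracted
v = np.einsum('j,ja->a', f, rhat)                     # rank-1 moment
M = np.einsum('k,ka,kb->ab', f, rhat, rhat)           # rank-2 moment
assert np.isclose(naive, np.einsum('a,ab,b->', v, M, v))
```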

The recently proposed extension of HIP-NN models61 with tensor sensitivity information is closely related to MTP and GM-NN potentials, as it employs irreducible Cartesian tensors to pass tensor information between atomic sites62. Another example of local representations based on spherical harmonics is SOAP or the respective GAP model63. Unlike the above-mentioned models, it employs Gaussian process regression to map the structure to the respective energy. A more detailed overview of existing interatomic ML potentials is out of the scope of this work, and the reader is referred elsewhere32,64,65.

EAM and MEAM potentials

Classical interatomic potentials based on the embedded-atom method (EAM)20 and the modified embedded-atom method (MEAM)21 are well understood in the literature and are not discussed here. We just draw attention to their relation to the ML-based models. EAM can be viewed as an MTP with one radial function per species pair and without the angular part, i.e., \(\mathrm{lev}_{\max}=0\); see Equation (6). In turn, MEAM would be close to an MTP with \(\mathrm{lev}_{\max}=2\) (see Equation (6)), allowing for different radial functions and a vectorial angular part (a rank-1 tensor).

In the main text, we compare the performance of the ML-based models to an EAM fitted to the same training data. In addition, in the Supplementary Information, we compare to a reference-free MEAM to test whether the angular three-body terms yield any improvement. The EAM and reference-free MEAM fitting is performed with MEAMfit66,67. The fitting weights for energies, atomic forces, and stresses are \(1/N_{\mathrm{at}}^{(j)}\), 0.1, and 0.001, respectively.

Data set generation - DFT calculations

All DFT calculations are performed with VASP68,69, using projector augmented wave (PAW) potentials70, within the generalized gradient approximation (GGA)71,72,73, and in the spin-unpolarized state. We use Methfessel-Paxton smearing of order 1 with σ = 0.1 eV, a 350 eV plane-wave energy cutoff, and automatic k-mesh generation with KSPACING = 0.12 Å−1. The number of valence electrons is 11 for Ta and V, and 12 for Cr and W.
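
These settings could be reproduced with, e.g., the ASE interface to VASP. The snippet below is a minimal sketch, assuming the PBE flavour of the GGA (the exact functional follows refs. 71,72,73) and the standard PAW potentials selected by ASE; it is not the authors' workflow.

```python
from ase.calculators.vasp import Vasp

# Minimal sketch of the DFT settings quoted above (not the original scripts).
calc = Vasp(
    xc='pbe',        # GGA functional; PBE is an assumption here
    encut=350,       # plane-wave cutoff in eV
    ismear=1,        # Methfessel-Paxton smearing of order 1
    sigma=0.1,       # smearing width in eV
    kspacing=0.12,   # automatic k-mesh density in Å^-1
    ispin=1,         # spin-unpolarized calculation
)
```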

Description of the data set

The training and test data sets include energies, atomic forces, and stresses from the DFT calculations. For the 0 K configurations in small supercells (item 1 below), we perform structural relaxation with DFT. For the 0 K configurations in medium to large supercells (item 2 below), we have a combination of relaxed and unrelaxed structures. All configurations sampled from MD (item 3 below) are unrelaxed. We generate 2-, 3-, and 4-component BCC configurations in the following categories:

  1. 0 K configurations in small supercells (2–8 atoms):

    These configurations are sampled with the enumlib library74,75, which enumerates symmetrically distinct structures. We first pick 2500 binary structures from enumlib, representing different atomic arrangements of each of the TaV, TaCr, TaW, VCr, VW, and CrW subsystems on the BCC lattice. To obtain the 4-component structures, we take the 8-atom binaries from above and substitute the elements, giving 248 equiatomic quaternary structures.

  2. 0 K configurations in medium to large supercells (16, 128, 432, 8192 atoms):

    These configurations are designed to mimic the possible low-temperature ordering in TaVCrW. For each of these sizes, we create certain types of binaries and quaternaries (a minimal construction sketch is given after this list). Binary structures for each of the TaV, TaCr, TaW, VCr, VW, and CrW subsystems include:

    • B2 ordering

    • B32 ordering

    • random binary solid solution

    • BCC interface: one half of the cubic supercell is one unary (pure element) and the other half is the other

    Quaternary structures represent all possible phase separations on the TaVCrW BCC lattice, i.e., half of the cubic supercell is one binary system (e.g., TaV) and the other half is another binary system (e.g., CrW), and they include:

    • B2/B2 ordering

    • B2/B32 ordering

    • B32/B32 ordering

    • B2/random binary ordering

    • B32/random binary ordering

    • random binary/random binary ordering

  3. High-temperature (2500 K) disordered structures (128- and 432-atom supercells):

    We use the MTP from ref. 36 to run NPT MD at 2500 K. From the resulting MD trajectories, we sample 983 structures with 128 atoms and 48 structures with 432 atoms in the supercell. Sampling is performed using active learning designed for MTP models76,77, with the MTPs initialized by training on the 0 K data. The selected data may thus introduce some bias towards MTP performance; a more thorough investigation of this hypothesis is left for future work. The energies, forces, and stresses of the sampled structures are calculated with DFT without relaxation.

  4. Deformed structures (432 atoms):

    Equiatomic TaVCrW can phase segregate at low temperatures into Ta+Cr/V+W (predicted by MTP in this work), Ta+W/Cr+V (predicted in ref. 18), or Ta+V/Cr+W. Each binary part is strained along the [100] direction so that the two parts share the same cross-section. Hence, we also include deformed binaries in the test data set (their RMS errors are given in Table 1) to accurately replicate such a phase-segregated B2/B2 quaternary structure. The strains applied along [100] for the different binaries are: TaV 3.7%, CrW −1.28%, TaCr 0.27%, VW −0.14%, TaW 4.06%, and VCr −4.71%.
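
As an illustration of the constructions referenced in item 2, ordered and random binaries on the BCC lattice can be generated programmatically. The following is a minimal sketch using ASE; the lattice constant and random seed are placeholders rather than values from this work.

```python
import numpy as np
from ase import Atoms

a = 3.2  # placeholder BCC lattice constant in Å

# B2 (CsCl-type) ordering of TaV: Ta at the corner, V at the body centre.
b2 = Atoms('TaV',
           scaled_positions=[(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)],
           cell=[a, a, a], pbc=True)

# Random binary solid solution: shuffle Ta/V over a 4x4x4 supercell
# (128 sites) of the same BCC lattice.
supercell = b2.repeat((4, 4, 4))
rng = np.random.default_rng(seed=0)
supercell.set_chemical_symbols(
    list(rng.permutation(['Ta'] * 64 + ['V'] * 64)))
```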

In total, 6711 configurations are computed with DFT for use as training and validation sets. More precisely, there are 5680 0 K structures (4491 binary, 595 ternary, and 594 quaternary), along with 1031 structures sampled from MD at 2500 K. The binary and ternary structures are roughly equally distributed among the subsystems; the deviations stem from the non-convergence of some DFT calculations for certain subsystems.

Testing the models

To assess the accuracy of the models, we compute the RMS errors in energies and atomic forces on every in-distribution Ta-V-Cr-W subsystem, including the TaVCrW alloy at 0 K and the equiatomic TaVCrW disordered alloy sampled at 2500 K. The RMS errors are obtained by 10-fold cross-validation: from each subsystem and the disordered alloy, 80% of the respective data set is selected for training and 20% for testing. The selected training data are gathered into a final training set of 5373 configurations, and the potentials are then tested on the corresponding test set of 1336 configurations. We repeat this with ten different (80%, 20%) splits to obtain reasonable statistical averaging (a minimal sketch is given below). The RMS errors presented in Table 1 and Fig. 2 are averaged over these ten independent splits. We also train a single potential for each of the MTP/GM-NN/EAM/MEAM models on the entire data set (100%) to calculate the heat capacity of the disordered alloy.
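
The repeated splitting can be sketched at the index level as follows; this is an illustrative reimplementation assuming a flat list of structures, whereas the actual workflow performs the split per subsystem before gathering the final training set.

```python
import numpy as np

def random_splits(n_structures, n_splits=10, train_frac=0.8, seed=0):
    """Yield n_splits independent (train, test) index pairs with an
    (80%, 20%) partition, as used for the RMS-error averaging."""
    rng = np.random.default_rng(seed)
    n_train = int(train_frac * n_structures)
    for _ in range(n_splits):
        perm = rng.permutation(n_structures)
        yield perm[:n_train], perm[n_train:]
```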

While the above strategy applies to the in-distribution data, for the out-of-distribution subsystems (Fig. 5) we generate nine random A17B17C66 ternary structures with 54 atoms, covering every possible non-equiatomic ternary composition except the Cr-rich ones, which are excluded to avoid magnetism-related issues. For the 0 K case we take one sample per composition, while for the MD case we sample 12 structures per composition using NPT MD at 2000 K.

Note that the NN-based models are over-parametrized, having a larger number of parameters than the MTPs. The possibility of overfitting is accounted for by employing a separate holdout (validation) data set together with the early stopping technique54. For the GM-NN models, we take 500 structures from each resulting training data set as the validation set, i.e., the GM-NNs are trained on fewer structures than the other potentials. We make ten independent validation splits for each training set to avoid any bias in the final models; the RMS errors for the GM-NNs are thus averaged over 100 runs.

The normalized error (NE) in Fig. 7 is calculated as

$$\mathrm{NE} = \frac{1}{2}\left(\frac{\text{F-RMSE}}{\max\,\text{F-RMSE}} + \frac{\text{E-RMSE}}{\max\,\text{E-RMSE}}\right),$$
(13)

where E-RMSE and F-RMSE are the overall energy and force RMS errors, respectively, taken from Table 1.
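
In code, the normalization reads as below; that the maxima run over the compared potentials is an assumption on our part, noted in the docstring.

```python
def normalized_error(e_rmse, f_rmse, e_rmse_max, f_rmse_max):
    """Normalized error of Equation (13); e_rmse_max and f_rmse_max are
    assumed to be the largest errors among the compared potentials."""
    return 0.5 * (f_rmse / f_rmse_max + e_rmse / e_rmse_max)
```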

Heat capacity simulations

The heat capacity of the TaVCrW HEA up to 2500 K is calculated numerically as the second derivative of the Gibbs free energy with respect to temperature. The Gibbs free energy is obtained by a Legendre transformation of the Helmholtz free energy. The Helmholtz free energy up to the level of the interatomic model is calculated by thermodynamic integration using Langevin dynamics (TILD). For a set of volume and temperature (V, T) points, we run TILD on a 128-atom special quasi-random structure (SQS) of the TaVCrW alloy. TILD is performed from a reference effective quasiharmonic model78,79 (fitted to 2500 K MD forces) to the interatomic potential to obtain the free energy difference. Adding this difference to the effective quasiharmonic reference free energy yields the full Helmholtz free energy at that (V, T) point. After repeating this for all (V, T) points, we parametrize the Helmholtz energy surface with a fourth-order polynomial in V and T, on which the Legendre transformation and the numerical derivatives are performed. Further details on the TILD methodology can be found in ref. 36.
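
The post-processing chain (polynomial fit, Legendre transformation, numerical derivative) can be sketched as follows. This sketch assumes zero pressure (so the Gibbs energy reduces to the minimum of the Helmholtz energy over volume), a total polynomial degree of four (the exact basis used in this work may differ), and placeholder (V, T) grids; the TILD free energies themselves would be supplied from the simulations.

```python
import numpy as np

# Placeholder (V, T) grid and Helmholtz energies from TILD; in practice
# one free energy per simulated (V, T) point would be inserted here.
V = np.linspace(15.0, 18.0, 7)       # volume per atom in Å^3 (assumed range)
T = np.linspace(100.0, 2500.0, 25)   # temperature in K
F_data = np.zeros((V.size, T.size))  # to be filled with TILD results

# Monomials V^m T^n of total degree <= 4 (one possible reading of a
# "fourth-order polynomial in V and T").
powers = [(m, n) for m in range(5) for n in range(5) if m + n <= 4]

VV, TT = np.meshgrid(V, T, indexing='ij')
A = np.stack([(VV**m * TT**n).ravel() for m, n in powers], axis=1)
coef, *_ = np.linalg.lstsq(A, F_data.ravel(), rcond=None)

def F_fit(v, t):
    """Fitted Helmholtz energy surface F(V, T)."""
    return sum(c * v**m * t**n for c, (m, n) in zip(coef, powers))

# At zero pressure the Legendre transform G = F + pV reduces to the
# minimum of F over volume.
v_fine = np.linspace(V[0], V[-1], 400)

def gibbs(t):
    return F_fit(v_fine, t).min()

# Heat capacity as the numerical second derivative C_p = -T d^2G/dT^2.
def heat_capacity(t, dT=1.0):
    return -t * (gibbs(t + dT) - 2.0 * gibbs(t) + gibbs(t - dT)) / dT**2
```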