High-entropy alloys (HEAs), also called compositionally complex alloys under a broader definition, have attracted a great deal of attention as a promising material solution for addressing the urgent societal goals of reduced carbon emissions and increased energy efficiency. Many HEAs have already been demonstrated to have remarkable yield strengths1,2,3,4,5,6,7,8, great corrosion9,10,11,12,13,14 and fatigue resistance15,16,17, reasonable ductility3,4,5,6,18,19,20,21,22, and/or thermal stability3,23,24. More excitingly, the multicomponent chemistry grants HEAs a nearly unlimited design space, giving this group of alloys great potential as revolutionary structural materials for both lightweighting and high-temperature applications. The outstanding performance of HEAs is recognized to originate largely from their unique non-dilute, multicomponent chemistry and atomic structures at the fundamental level25,26,27,28.

A signature characteristic of the atomic structures of HEAs is severe lattice distortion29,30. In HEAs, multiple types of atoms with different atomic sizes and electronegativities are randomly mixed on a single crystal lattice. The size mismatch and charge transfer between individual atoms result in significant variations in interatomic distances and angles. As a result, the atoms cannot reside perfectly on the ideal lattice sites, giving rise to considerable lattice distortion throughout the entire crystal. With respect to the perfect lattice, the distorted lattice of HEAs generates atomic strains that fluctuate from one atom to another, which produces intense interactions with the strain field of dislocations. Consequently, every atom in an HEA can be considered a point-pinning obstacle that retards dislocation movement, leading to an outstanding solid-solution-type strengthening effect31,32,33,34,35. For instance, extremely high yield strengths have been achieved in HEAs by elaborate manipulation of lattice distortion4,36,37,38,39. Additionally, lattice distortion is recognized to play essential roles in the phase stability40,41, sluggish diffusion42,43, and electrical44,45 and thermal conductivities45 of HEAs. Therefore, a deep understanding of lattice distortion could assist the design of high-performance HEAs without wandering in the endless compositional space.

Lattice distortion in HEAs can be accurately characterized for individual alloy compositions through various experimental methods, including neutron diffraction, synchrotron X-ray diffraction, and high-resolution transmission electron microscopy4,37,38,46. Additionally, first-principles calculations based on density functional theory (DFT) have been extensively applied to study lattice distortion owing to the accurate description of interatomic bonding. The supercell method based on the special quasi-random structure (SQS)47 is widely employed to mimic the multicomponent random mixing in HEAs33,47,48,49. The alloy’s atomic structure at the ground state can be straightforwardly obtained by performing a relaxation calculation. Based on the relaxed structure, lattice distortion can be accurately quantified via a variety of analysis approaches, such as radial or pair distribution function50,51, bond length distribution and fluctuation36,49, local atomic volume52, least-square atomic strain53, and mean square average atomic displacement (MSAD)33. Nevertheless, like the experimental approaches, most of the time the DFT calculations were only able to selectively study a small group of discrete alloy compositions due to large computational costs. Apart from the accurate experimental characterization and DFT-based computation, several mathematical models have been proposed for predicting lattice distortion simply according to the alloy composition. However, those models usually oversimplified lattice distortion as the difference in the pure-element atomic size of the constituent elements by assuming that each atom retains its original size even in a complex solid-solution environment. This assumption fails to account for the changes in atomic radii due to charge transfer among constituent atoms, which thus becomes problematic for predicting lattice distortion in the systems containing elements with similar atomic sizes but different electronegativities37,54. 
The lack of a generally applicable model for rapid and accurate prediction of lattice distortion at arbitrary alloy compositions significantly limits our ability to explore the large design space of HEAs efficiently.

In the present work, a surrogate model based on physics-informed statistical learning and training data from first-principles calculations is developed to efficiently predict the lattice distortion in body-centered-cubic (bcc) refractory HEAs (RHEAs) with compositions varying arbitrarily in a chemical space composed of ten elements. The study is centered on RHEAs because they have not been studied as comprehensively as the face-centered-cubic (fcc) 3d-transition-metal HEAs, even though lattice distortion is found to be significant in individual cases and to contribute critically to the mechanical strength4,35,36. The lattice distortion is quantified using the root mean squared atomic displacement (RMSAD), which measures the average displacement of relaxed atoms from their ideal positions in the undistorted crystal lattice33,55. On the basis of physical descriptors that characterize atomic bonds of pure metals and their binary ordered compounds, the developed surrogate model delivers high-throughput predictions of lattice distortion, quantified in a way that could previously be done only for individual compositions by costly experiments or first-principles calculations. The high-throughput prediction enables a systematic overview of lattice distortion in alloys with different levels of chemical complexity, from which the physical mechanism of the HEAs’ unique cocktail effect on lattice distortion is revealed.

Based on the predictions of the lattice distortion model, it is further found that the RMSAD parameter quantitatively correlates with the room-temperature yield strengths of the bcc refractory alloys in a general linear form. Accordingly, a strengthening model is established to effectively predict yield strengths of RHEAs according to their lattice distortion. Together with our previous surrogate models on intrinsic ductility56 and CALculation of PHAse Diagram (CALPHAD)-based phase stability prediction, a high-throughput alloy design framework is demonstrated to perform comprehensive screening of potential alloy compositions with a synergy of high strength and good ductility in a large compositional space far from fully explored.

Results and discussion

Initial data exploration

As schematically illustrated in Fig. 1, due to lattice distortion, atoms at equilibrium positions in a solid-solution alloy are locally displaced away from their ideal positions in a perfect, undistorted lattice. The mean of those squared atomic displacements (i.e., the MSAD) has been demonstrated to be an accurate quantitative measure of lattice distortion in 3d-transition-metal HEAs both experimentally and computationally33,57,58. In the present work, the square root of the mean squared atomic displacement (i.e., the RMSAD parameter) is used to quantify lattice distortion because of its strong linear correlation with yield strength33,57. Through a DFT relaxation calculation, the RMSAD parameter for a given alloy composition can be straightforwardly derived, as described in detail in the methods section. To generate the training and testing data for the development of a surrogate predictive model for lattice distortion, we computed the RMSAD parameters of 215 individual alloy compositions, as listed in Supplementary Table 1. Prior to using those DFT data for direct statistical regressions, we first utilized them to perform an initial data exploration, aiming to gain enough physical insight into lattice distortion to better develop the surrogate model. We limited the exploration process to the training dataset, which contains only binary and ternary alloys. As a result, the subsequent development of the surrogate model based on the data exploration is completely blind to the validation and testing datasets (i.e., quaternary and quinary alloys), and those datasets were thus valid to use for testing the model’s ability to make extrapolative predictions.
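The definition of RMSAD above can be illustrated with a minimal Python sketch. It assumes that the ideal and relaxed coordinates are already matched atom by atom and expressed in the same Cartesian frame; the function name `rmsad` and the toy coordinates are hypothetical, and details such as periodic-boundary wrapping or the choice of reference lattice (covered in the methods section) are not handled here.

```python
import math

def rmsad(ideal, relaxed):
    """Root-mean-squared atomic displacement between matched sites.

    Both arguments are lists of (x, y, z) Cartesian coordinates in angstroms,
    ordered so that ideal[k] and relaxed[k] refer to the same atom.
    """
    if len(ideal) != len(relaxed) or not ideal:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = [
        sum((r - i) ** 2 for r, i in zip(relaxed_site, ideal_site))
        for relaxed_site, ideal_site in zip(relaxed, ideal)
    ]
    msad = sum(squared) / len(squared)  # mean squared atomic displacement
    return math.sqrt(msad)

# Toy two-atom cell (illustrative numbers only, not DFT data):
# one atom is displaced by 0.1 A along x, the other stays on its ideal site.
ideal_sites = [(0.0, 0.0, 0.0), (1.6, 1.6, 1.6)]
relaxed_sites = [(0.1, 0.0, 0.0), (1.6, 1.6, 1.6)]
toy_rmsad = rmsad(ideal_sites, relaxed_sites)
```

In this toy case the MSAD is 0.005 Å² and the RMSAD is its square root, which shows how a few strongly displaced atoms dominate the measure.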

Fig. 1: A schematic shows the local atomic displacements due to lattice distortion.

The red dots stand for the equilibrium atomic positions in a bcc solid-solution alloy with lattice distortion, while the black dots stand for the ideal positions in an undistorted lattice. The schematic corresponds to a projection view along the \(\left[ {0\bar 11} \right]\) direction. The big solid circles in different colors represent individual types of atoms in the alloy.

The standard deviation of local atomic-bond lengths was naturally considered as a starting point for exploring RMSAD, because an ideal, undistorted bcc lattice has uniform local bond lengths for every atom, and any displacement of an atom from its ideal lattice site introduces variance into the interatomic-bond lengths between the displaced atom and its neighbors. In the present work, we calculated the length of every first-nearest neighbor (FNN) atomic bond in the relaxed SQSs using Voronoi-tessellation analysis, as implemented in the Pymatgen Python package59, and then derived the standard deviation of the bond lengths (\(\sigma _{{{{\mathrm{SQS}}}}}^L\)) for comparison with the RMSAD parameter obtained for the same structure. The scatter plot in Fig. 2a depicts a strong positive correlation between \(\sigma _{{{{\mathrm{SQS}}}}}^L\) and RMSAD for the alloy samples in the training dataset, with a correlation coefficient of r = 0.94 and a significance level of p < 0.001. The correlation indicates that the lattice distortion in solid-solution alloys is tightly connected to the length divergence of individual local atomic bonds. In addition, two apparent outliers deviate from the correlation displayed in Fig. 2a: NbTi2 and NbHf3. A common feature of the two alloys is their high content of a group IV element, giving valence electron concentrations (VECs) at the lower end of those represented in the training dataset. This trend implies that the d-band filling effect may also play a key role in the lattice distortion of bcc refractory solid-solution alloys.
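As a rough illustration of this analysis, the sketch below computes a bond-length standard deviation and a Pearson correlation coefficient from plain lists of numbers. It assumes the FNN bond lengths have already been extracted (the Voronoi-based neighbor search is not reproduced), and it uses the population standard deviation, which is an assumption since the text does not state which estimator was applied. The function names and numbers are hypothetical.

```python
import math
import statistics

def bond_length_std(bond_lengths):
    """Population standard deviation of a list of FNN bond lengths (angstroms)."""
    return statistics.pstdev(bond_lengths)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# An undistorted bcc site has eight identical FNN bonds, so the spread is zero.
undistorted = [2.77] * 8
# A distorted environment (illustrative values only) has a finite spread.
distorted = [2.70, 2.74, 2.77, 2.79, 2.80, 2.83, 2.75, 2.78]

sigma_undistorted = bond_length_std(undistorted)
sigma_distorted = bond_length_std(distorted)
```

With one such standard deviation and one RMSAD value per alloy, `pearson_r` applied across the training set would give the kind of coefficient reported for Fig. 2a.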

Fig. 2: Correlations between the root-mean-squared atomic displacement (RMSAD) parameter and fundamental properties of atomic bonds in refractory solid-solution alloys.

a RMSAD vs. standard deviation of lengths of the first-nearest neighbor (FNN) atomic bonds in the relaxed SQS’s (\(\sigma _{{{{\mathrm{SQS}}}}}^L\)); b RMSAD vs. average valence electron concentration (\(u^{{{{\mathrm{VEC}}}}}\)); c RMSAD vs. standard deviation of the number of valence electrons of the constituent elements relative to VEC (\(\sigma ^{{{{\mathrm{VEC}}}}}\)); and d RMSAD vs. standard deviation of atomic bond lengths estimated from single-element bcc and binary B2 crystals (\(\sigma _{{{{\mathrm{B}}}}2/{{{\mathrm{bcc}}}}}^L\)) based on Eq. (6).

Accordingly, a scatter plot of VEC against RMSAD is displayed in Fig. 2b for the alloys in the training dataset. RMSAD shows a typical dependence on the average filling fraction of the d-band. Specifically, the RMSAD parameter becomes lower around a VEC of 5.5 e/atom, which corresponds to a d-band that is half-filled on average, and becomes higher when the VEC is outside the range of 5.0–6.0 e/atom, where the d-band deviates from the half-filled state. In addition to this general d-band filling dependence, Fig. 2b also highlights an interesting phenomenon: RMSAD can still vary greatly among alloys with the same VEC. A typical example is the Ti-Ru binary system. As presented in Fig. 2b, although all three studied Ti-Ru alloys have VECs within the range of 5.0–6.0 e/atom, they tend to have more severe lattice distortion (i.e., a larger RMSAD) than other alloys with the same VECs. It should be noted that the bcc solid-solution phase is metastable for the studied Ti-Ru compositions60. The observed severe lattice distortion reflects the strong phase-transformation tendency from the disordered solid solution to ordered intermetallic compounds. Another obvious distinction of the studied Ti-Ru alloys from others with similar VECs is that their constituent elements, Ti and Ru, have very different numbers of valence electrons. Considering the strong localization tendency of valence d electrons, one can expect that the local distribution of the d electrons at the Ti-Ti, Ti-Ru, and Ru-Ru atomic bonds in the alloy is inhomogeneous, giving local fluctuations in the VEC. Given this insight, the standard deviation of the number of valence electrons of the constituent elements relative to VEC (\(\sigma ^{{{{\mathrm{VEC}}}}}\)) is also calculated and plotted against RMSAD in Fig. 2c. 
As expected, \(\sigma ^{{{{\mathrm{VEC}}}}}\) shows a general positive correlation with RMSAD, suggesting that in addition to the bond-length divergence, the divergence in the d-band filling fraction at local atomic bonds, which is induced by the variance of constituent element valences, could also be an important contributor to lattice distortion.

A physically interpretable surrogate model for RMSAD prediction

Although the DFT-SQS calculations provide a general approach to assess the lattice distortion of each individual alloy composition by predicting the RMSAD parameter, the relatively large size of the SQS supercells makes the calculation computationally expensive and, consequently, ill-suited for high-throughput screening. Therefore, to fully explore the complex compositional space of RHEAs, it is necessary to develop surrogate models for more efficient prediction of RMSAD.

The data exploration discussed above provides a theoretical basis to raise a set of physical descriptors and an interpretable linear model to quantitatively describe RMSAD. As shown in Fig. 2a, the RMSAD parameter displays a strong linear correlation with the standard deviation of the FNN bond lengths in relaxed SQS’s (\(\sigma _{{{{\mathrm{SQS}}}}}^L\)). However, \(\sigma _{{{{\mathrm{SQS}}}}}^L\) cannot be directly used as a descriptor for the prediction of RMSAD because its obtainment requires expensive DFT calculations to relax the SQS for each alloy composition of interest. A surrogate approach is thus needed to efficiently estimate the variance in lengths of FNN atomic bonds in a solid-solution alloy.

As discussed later in the methods section, the FNN atomic bonds in the single-element bcc and binary B2 crystals can serve as an approximation to those in the solid-solution lattice if only the atomic-pair interactions in the FNN shell are considered. We expect that this assumption is particularly reasonable for the transition metal alloy systems as the valence d electrons are strongly localized. As a validation, the lengths of the FNN atomic bonds derived from the DFT-relaxed SQS’s are compared with their respective lengths in the single-element bcc and binary B2 crystals by showing the histogram of relative deviations in Supplementary Fig. 1. By comparing across a considerable amount of FNN bonds in the SQS’s of 68 binary and ternary alloys, the mean error and root mean square error (RMSE) are only 0.0009 Å and 0.0312 Å (−0.0567% mean relative error and 2.258% root mean square relative error), respectively, which validates our assumption. Therefore, the standard deviation of atomic bond lengths in a solid-solution alloy can be effectively estimated from the lengths of the FNN bonds in the single-element bcc and binary B2 structures using the bond-counting approach described in the method section. Correspondingly, a descriptor for the bond length standard deviation, \(\sigma _{{{{\mathrm{bcc}}}}/{{{\mathrm{B}}}}2}^L\), is derived. Based on Eq. (6), \(\sigma _{{{{\mathrm{bcc/B}}}}2}^L\) can be easily calculated for any alloy composition of interest without the need to run expensive DFT-SQS calculations.
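The bond-counting idea can be sketched as follows. Because Eqs. (5) and (6) are given in the methods section and not reproduced here, the sketch rests on an explicit assumption: in a random solid solution, an FNN bond between elements i and j occurs with probability x_i·x_j, and the mean and standard deviation are taken over that discrete distribution. The function name, the data layout, and the bond lengths are hypothetical placeholders, not values from the DFT calculations.

```python
import math

def bond_length_stats(composition, bond_length):
    """Mean and standard deviation of FNN bond lengths under random mixing.

    composition: dict mapping element symbol -> atomic fraction (sums to 1).
    bond_length: dict mapping frozenset of one or two symbols -> bond length
                 in angstroms; a like pair comes from the single-element bcc
                 crystal, an unlike pair from the binary B2 crystal.
    Assumption: an i-j bond occurs with probability x_i * x_j.
    """
    pairs = []
    for a, x_a in composition.items():
        for b, x_b in composition.items():
            pairs.append((x_a * x_b, bond_length[frozenset((a, b))]))
    mean = sum(p * length for p, length in pairs)
    variance = sum(p * (length - mean) ** 2 for p, length in pairs)
    return mean, math.sqrt(variance)

# Placeholder bond lengths for a generic A-B system (not DFT results).
lengths = {
    frozenset(("A",)): 2.8,      # A-A bond from bcc A
    frozenset(("B",)): 3.0,      # B-B bond from bcc B
    frozenset(("A", "B")): 2.9,  # A-B bond from B2 AB
}
mean_ab, sigma_ab = bond_length_stats({"A": 0.5, "B": 0.5}, lengths)
mean_a, sigma_a = bond_length_stats({"A": 1.0}, lengths)
```

For the equimolar placeholder system the mean is 2.9 Å and the standard deviation is about 0.071 Å, while the pure element gives zero spread, as expected for an undistorted lattice.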

As shown in Fig. 2d, the correlation between the RMSAD and the bond-length standard deviation (\(\sigma _{{{{\mathrm{bcc}}}}/{{{\mathrm{B}}}}2}^L\)) calculated using Eq. (6) is also significant (p < 0.001). On the other hand, significant variability in RMSAD remains unaccounted for by \(\sigma _{{{{\mathrm{bcc}}}}/{{{\mathrm{B}}}}2}^L\) alone, as evidenced by an increase in the number of outliers compared to \(\sigma _{{{{\mathrm{SQS}}}}}^L\), suggesting that it is insufficient on its own. The results of data exploration in Fig. 2b, c suggest that the d-band filling effect could be another important contributor to lattice distortion in addition to the bond-length standard deviation. Therefore, it is natural to consider VEC as a possible descriptor to explain the discrepancies in the RMSAD vs. \(\sigma _{{{{\mathrm{bcc}}}}/{{{\mathrm{B}}}}2}^L\) correlation, especially given that all the outliers share VECs much lower than those of other samples. To test this hypothesis, the residuals of a linear regression of RMSAD on \(\sigma _{{{{\mathrm{bcc}}}}/{{{\mathrm{B}}}}2}^L\) are plotted against the VEC of each sample in the training set in Supplementary Fig. 2a, where the residuals show a clear parabolic dependence on VEC with a vertex between VECs of 5 and 6. Based on this observation, we modified VEC into a new descriptor, \(u_{5.7}^{{{{\mathrm{VEC}}}}}\), with the form \(({{{\mathrm{VEC}}}} - X)^2\), where X is a constant tuned to 5.7 e/atom by selecting the value that maximizes the correlation coefficient (Pearson’s r) between the transformed feature and RMSAD (Supplementary Fig. 2b). Similarly, another descriptor, \(\sigma _{5.7}^{{{{\mathrm{VEC}}}}}\), which describes the variance of VEC from 5.7 e/atom, was created to capture the fluctuation of VEC at different local atomic bonds in the solid-solution lattice. Similar to the feature standard deviation described in Eq. (6), \(\sigma _{5.7}^{{{{\mathrm{VEC}}}}}\) is calculated as a modified variance of a discrete probability distribution as follows: \(\sigma _{5.7}^{{{{\mathrm{VEC}}}}} = \mathop {\sum }\nolimits_i x_i\left( {{{{\mathrm{VEC}}}}_i - 5.7} \right)^2\), where \(x_i\) is the atomic fraction of the constituent element i, and \({{{\mathrm{VEC}}}}_i\) is its number of valence electrons. As expected, \(\sigma _{5.7}^{{{{\mathrm{VEC}}}}}\) also shows a strong correlation with RMSAD (Supplementary Fig. 2c).
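The two VEC-based descriptors can be written out in a few lines of Python. The sketch below follows the expressions given in the text, taking VEC as the composition-weighted average of the elemental valence electron counts; the function names are hypothetical, and the valence counts listed are the usual group numbers of the elements shown.

```python
# Valence electron counts (group numbers) for a few transition metals.
VALENCE_ELECTRONS = {"Ti": 4, "Nb": 5, "Mo": 6, "Ru": 8}
VEC_REFERENCE = 5.7  # e/atom, the tuned constant X quoted in the text

def average_vec(composition):
    """Composition-weighted average number of valence electrons."""
    return sum(x * VALENCE_ELECTRONS[el] for el, x in composition.items())

def u_vec(composition, reference=VEC_REFERENCE):
    """Descriptor u_5.7^VEC = (VEC - X)^2."""
    return (average_vec(composition) - reference) ** 2

def sigma_vec(composition, reference=VEC_REFERENCE):
    """Descriptor sigma_5.7^VEC = sum_i x_i * (VEC_i - X)^2."""
    return sum(
        x * (VALENCE_ELECTRONS[el] - reference) ** 2
        for el, x in composition.items()
    )

# Two equimolar binaries for comparison (compositions chosen for illustration).
nbmo = {"Nb": 0.5, "Mo": 0.5}
tiru = {"Ti": 0.5, "Ru": 0.5}
u_nbmo, s_nbmo = u_vec(nbmo), sigma_vec(nbmo)
u_tiru, s_tiru = u_vec(tiru), sigma_vec(tiru)
```

The two alloys have similar average VECs (5.5 and 6.0 e/atom), but the Ti-Ru descriptor `s_tiru` is more than ten times `s_nbmo`, which reflects the large valence difference between Ti and Ru discussed above.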

In addition to VEC, the bimodality of the electronic density of states (DOS) of the d-orbitals is also a key factor in determining the d-band filling effects61. The localized characteristics of d electrons cause their DOS to display considerable shape features that strongly depend on the lattice structure. For example, a perfect bcc-type lattice generally results in a bimodal DOS of the d-orbitals. Once the bcc lattice is distorted, for example by the presence of crystal defects, the shape of the DOS becomes less bimodal61. An extreme situation is that the bcc lattice changes to the fcc/hcp (hexagonal-close-packed) lattice through phase transformation. Correspondingly, the shape of the DOS also changes from bimodal to unimodal. More importantly, the change of bimodality can strongly influence the d-band-filling energy as well as the stability of the crystal lattice when the filling fraction varies. As shown in Fig. 3a, when the d-band is about half-filled, which corresponds to a VEC between 5 and 6, a bimodal DOS would have more occupied states far below the Fermi level (Ef) and fewer occupied states close to Ef, compared to a unimodal DOS. This trend leads to a more negative band-filling energy and correspondingly makes the bcc lattice stable. Conversely, when the filling fraction of the d-band moves toward the band edges, a lower band-filling energy is expected for the less bimodal DOS, correspondingly stabilizing the non-bcc or distorted-bcc lattice (Fig. 3b). This principle explains the effect of d-band bimodality on lattice distortion well. For instance, if two elements with fewer d electrons (e.g., Ti, Zr, Hf) are forced to form an atomic bond in an undistorted bcc lattice, the DOS of the local d-band between them will have a bimodal shape but a filling fraction near the left band edge. As a result, the filling energy of the local d-band will be high, resulting in a negative effect on the lattice stability. 
To lower the band-filling energy and stabilize the bcc lattice, a local lattice distortion is thus necessary to reduce the bimodality of the local d-band DOS. In other words, a stronger local lattice distortion is expected between two atoms if their atomic bond in the undistorted lattice has a local d-band with a more bimodal DOS and a filling fraction farther from the half-filled state.

Fig. 3: Illustration of the difference in the filling energy of the d-bands with a unimodal and bimodal electronic DOS.

a The filling fraction is close to the band center. b The filling fraction is close to the band edge. The position of the Fermi level is represented by the red dashed line.

The bimodality of a DOS can be quantitatively measured through Hartigan’s dip test61,62. A completely unimodal DOS corresponds to a test statistic (i.e., the dip value) of 0, while a more bimodal DOS has a larger dip value. In the present work, the average bimodality of the d-band DOSs of the FNN atomic bonds (\(u_{{{{\mathrm{B}}}}2/{{{\mathrm{bcc}}}}}^{{{{\mathrm{dip}}}}}\)) in a bcc solid-solution alloy in the undistorted state is estimated using Eq. (5) based on the bond-counting approach described in the methods section. Considering the positive correlation of lattice distortion with both the filling fraction and the bimodality of the local d-band DOS, we further weight the two VEC-associated descriptors, \(u_{5.7}^{{{{\mathrm{VEC}}}}}\) and \(\sigma _{5.7}^{{{{\mathrm{VEC}}}}}\), with the bimodality parameter \(u_{{{{\mathrm{B}}}}2/{{{\mathrm{bcc}}}}}^{{{{\mathrm{dip}}}}}\) for a complete description of the d-band effect on lattice distortion. By testing Pearson’s correlation with RMSAD, the square of \(u_{{{{\mathrm{B}}}}2/{{{\mathrm{bcc}}}}}^{{{{\mathrm{dip}}}}}\) (i.e., \((u_{{{{\mathrm{B}}}}2/{{{\mathrm{bcc}}}}}^{{{{\mathrm{dip}}}}})^2\)) turns out to be the optimal weighting factor, yielding the largest correlation coefficients.

With the descriptors discussed above, a physically interpretable model is developed to efficiently predict the RMSAD parameter of arbitrary bcc solid-solution alloys within the 10-element composition space studied in the present work. The model expresses RMSAD in a linear form with three terms,

$${{{\mathrm{RMSAD}}}} = a_1\sigma _{{{{{\mathrm{B}}}}2}/{{{{{\mathrm{bcc}}}}}}}^L + a_2\left( {u_{{{{{{\mathrm{B}}}}2}}/{{{{{\mathrm{bcc}}}}}}}^{{{{\mathrm{dip}}}}}} \right)^2u_{5.7}^{{{{\mathrm{VEC}}}}} + a_3\left( {u_{{{{{{\mathrm{B}}}}2}}/{{{{{\mathrm{bcc}}}}}}}^{{{{\mathrm{dip}}}}}} \right)^2\sigma _{5.7}^{{{{\mathrm{VEC}}}}} \tag{1}$$

where a1, a2, and a3 are the fitting coefficients obtained by performing ordinary least squares (OLS) regression with the available DFT data. As shown in Fig. 4a, when regressed on only the binary and ternary data, the resulting model can already accurately predict the RMSAD of the quaternary alloys in the validation dataset with an RMSE of 0.012 Å, which is comparable to its training RMSE of 0.015 Å. The regressed values of a1, a2, and a3 are summarized in Table 1. Additionally, to confirm the stability and generalization of the model, we repeated the regression on the data of binary, ternary, and quaternary alloys and tested the resulting model against the DFT-calculated RMSAD of quinary alloys. As expected, the values of the fitting coefficients (i.e., a1, a2, and a3) vary negligibly after the quaternary data are included in the regression, indicating the good stability of the proposed linear model. As shown in Fig. 4b, the testing performance of the model on the quinary data is also promising, yielding an RMSE of 0.017 Å over 52 testing compositions. The results of Fig. 4 strongly suggest that the proposed linear model captures the underlying physical mechanism of RMSAD well, which grants the model an outstanding capability to make accurate extrapolative predictions for multicomponent systems after training on only a finite amount of binary and ternary data.
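The regression step can be sketched as a three-term OLS fit without an intercept, matching the form of the model above. The example below is a standard-library illustration that solves the normal equations directly; in practice a routine such as NumPy’s least-squares solver would normally be used. All descriptor values and the "true" coefficients are synthetic placeholders chosen so that the fit can be checked, and they are not the values of Table 1.

```python
def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_rmsad_model(sigma_l, dip, u_vec, sigma_vec, rmsad):
    """OLS fit of RMSAD = a1*sigma_L + a2*dip^2*u_VEC + a3*dip^2*sigma_VEC."""
    rows = [
        [s, d * d * u, d * d * v]
        for s, d, u, v in zip(sigma_l, dip, u_vec, sigma_vec)
    ]
    # Normal equations: (X^T X) a = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, rmsad)) for i in range(3)]
    return solve_linear(xtx, xty)

# Synthetic, noise-free data generated from placeholder coefficients.
true_coeffs = (1.0, 2.0, 0.5)
sigma_l = [0.02, 0.05, 0.08, 0.03, 0.06]
dip = [0.10, 0.12, 0.08, 0.11, 0.09]
u_vec = [0.04, 0.49, 1.00, 0.09, 0.25]
sigma_vec = [0.3, 0.8, 2.0, 1.2, 0.5]
rmsad_obs = [
    true_coeffs[0] * s + true_coeffs[1] * d * d * u + true_coeffs[2] * d * d * v
    for s, d, u, v in zip(sigma_l, dip, u_vec, sigma_vec)
]
fitted = fit_rmsad_model(sigma_l, dip, u_vec, sigma_vec, rmsad_obs)
```

Refitting on an enlarged dataset and comparing the returned coefficients is one simple way to run the kind of stability check described above.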

Fig. 4: Prediction performance of the statistical surrogate model for RMSAD.

a Trained on binary/ternary data and validated on quaternary data. b Trained on binary/ternary/quaternary data and tested on quinary data.

Table 1 Comparison of values of the fitting coefficients when the RMSAD surrogate model (Eq. 1) is trained with two different datasets.

It is also worth pointing out that a1, a2, and a3 all have positive values, which is consistent with the physical motivation of the model. First, a positive a1 indicates that a more severe lattice distortion should be expected in an alloy with larger deviations in the lengths of local atomic bonds, which is consistent with the observations in Fig. 2a, d. In previous classical models of lattice distortion, the standard deviation of the atomic radii of pure elements is commonly adopted to describe the variance of atomic bond lengths by assuming that atoms are all rigid particles. This assumption fails to consider the bond-length changes due to the possible electron density overlap between heteroelements, which leads to an overestimation of the lattice distortion in some bcc refractory alloys37,54. In the present work, we tackled this problem by using a bond-counting approach (Eqs. (5) and (6) in the methods section). In particular, the bond length between two heteroelements can be approximated from their binary B2 structure, which takes into account the possible charge transfer when forming a bond. Second, a positive value of a2 is consistent with our above analysis of the d-band effects on lattice distortion; the lattice distortion in an alloy becomes stronger if the average filling fraction of the d orbitals of the alloy is farther from half-filled (i.e., a larger value of \(u_{5.7}^{{{{\mathrm{VEC}}}}}\)) or the shape of the local d-orbital DOSs is more bimodal (i.e., a larger value of \(u_{{{{\mathrm{B}}}}2/{{{\mathrm{bcc}}}}}^{{{{\mathrm{dip}}}}}\)). Third, the positive contribution of the VEC standard deviation to lattice distortion is also confirmed by a positive a3 coefficient. 
Overall, the proposed model suggests that the lattice distortion of bcc refractory solid-solution alloys is mainly attributed to two key factors: 1) the significance of the variance in the lengths of local atomic bonds; and 2) the d-band effect described by the filling fraction and bimodality of d-orbital DOS.

Prediction of yield strength based on the RMSAD parameter

One of the primary motivations behind developing in-depth understanding of lattice distortion is its physical correlation with solid-solution strengthening, which consequently provides a possible route to design high-strength HEAs. As discussed in the introduction section, ubiquitous lattice distortion over the crystal lattice could introduce significant atomic strains with random spatial fluctuations, which can elastically interact with the strain field of dislocations and provide a pinning force to retard the dislocation movement. Outstanding yield strengths have been generally achieved in the HEAs with severe lattice distortion3,4,63. The correlation between the lattice distortion and yield strength has also been discussed previously in a few individual alloy systems, including fcc HEAs composed of 3d transition metal elements33, and bcc Nb-based35,64 and fcc Ni-based solid-solution alloys34.

In spite of the great success of these previous works, the correlation has so far been quantitatively confirmed only for a few equimolar alloy compositions in rather limited compositional spaces. Our surrogate model of lattice distortion provides an opportunity to perform a more comprehensive assessment in a much broader space, because the model enables a rapid prediction of RMSAD for any given alloy composition within the studied ten refractory elements. Towards this goal, we have tried to collect an exhaustive set of room-temperature experimental yield strength and hardness data from a recent HEA database developed by Citrine Informatics and from other literature sources2,19,20,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95. In total, 83 yield strength and 28 hardness data points were collected, covering 66 individual compositions (30 equiatomic and 33 non-equiatomic cases) that range from quaternary to senary systems (Supplementary Table 2). The data collection is limited to as-cast and annealed alloys, so that the strengthening effects due to processing, such as grain refinement and strain hardening, are minimized. The details of our data-collection process are described in the methods section. Then, for each experimental data point, the surrogate model is employed to predict RMSAD according to the alloy composition in order to investigate the potential correlation between the lattice distortion and yield strength.

A modified theory of solid-solution strengthening for random alloys has been recently developed by Maresca and Curtin28. The theory expresses the athermal yield strength (τY) of a bcc solid-solution alloy as,

$$\tau _{{{\mathrm{Y}}}} = 0.051\alpha ^{ - \frac{1}{3}}\mu \left( {\frac{{1 + v}}{{1 - v}}} \right)^{\frac{4}{3}}f^{\tau}\left[ {\frac{{\mathop {\sum }\nolimits_n c_n\Delta V_n^2}}{{b^6}}} \right]^{\frac{2}{3}} \tag{2}$$

In the equation, α is the dislocation line-tension parameter, which is a constant for a given type of dislocation, and \(\left( {\frac{{1 + v}}{{1 - v}}} \right)^{\frac{4}{3}}\) is a term of elastic anisotropy calculated from Poisson’s ratio (v), which is usually insensitive to the variation of alloy compositions96. From the equation, one can consider that the variation of the athermal yield strength with alloy composition is dictated by three contributions: the isotropic shear modulus of the alloy, μ; an effective parameter describing the gradient of the dislocation stress field, fτ; and the volume misfit quantity, \(\frac{{\mathop {\sum }\nolimits_n c_n\Delta V_n^2}}{{b^6}}\), which is closely related to the δ parameter, a commonly used parameter that describes lattice distortion based on the mismatch of atomic radii. Recently, the volume-misfit quantity has also been found to be strongly correlated with the RMSAD parameter in several HEA systems55, which further evidences the correlation of the yield strength with lattice distortion.

Therefore, inspired by Eq. (2) above, here we propose a simplified linear expression based on RMSAD to effectively model and predict the room-temperature yield strength of bcc refractory HEAs, which can be written as,

$$\tau _{{{\mathrm{Y}}}} \approx a\mu \gamma _{{{{\mathrm{USF}}}}} \times {{{\mathrm{RMSAD}}}} \tag{3}$$

where μ is still the isotropic shear modulus, γUSF is the average unstable stacking fault (USF) energy of the {110} plane of the alloy, and a is a constant coefficient, which should be universal across different alloy compositions and systems. The derivation of Eq. (3) is based on the three main contributions to yield strength in Eq. (2), where we use RMSAD to parameterize the local atomic misfit strain generated by lattice distortion and use the USF energy (γUSF) as a substitute for fτ to reflect the gradient of the dislocation-stress field. This substitution is made because the derivation of fτ is usually a complex process involving extensive data fitting, while the gradient of the dislocation-stress field should be generally proportional to the width of the dislocation core, which essentially relates to the USF energy according to the Peierls-Nabarro model96. Moreover, it is noteworthy that the three parameters in Eq. (3), μ, γUSF, and RMSAD, can be predicted at a given alloy composition using, respectively, Vegard’s law97,98, a statistical learning model developed by us previously56, and the present lattice distortion model, without requiring any additional time-consuming calculations. Therefore, Eq. (3) naturally offers a high-throughput approach to predicting yield strength for efficient screening over large compositional spaces.

To validate Eq. (3), the experimental yield strength data are normalized by the isotropic shear modulus and USF energy to evaluate their linear dependence on RMSAD. As shown in Fig. 5a, a linear relationship is observed between the normalized yield strength, denoted as \(\tau _{{{\mathrm{Y}}}}/(\mu \gamma _{{{{\mathrm{USF}}}}})\), and RMSAD, giving an \(R^2\) value as high as 0.94. Additionally, the value of the constant, a, in Eq. (3) can be easily obtained via an OLS regression of \(\tau _{{{\mathrm{Y}}}}/(\mu \gamma _{{{{\mathrm{USF}}}}})\) on RMSAD, which gives 0.29 Å/eV. An RMSE of 135 MPa was obtained by comparing the predicted yield strength with the experimental values. Additionally, it is well known that the hardness of most alloys generally shows a linear dependence on yield strength. Therefore, Eq. (3) should also be capable of modeling the hardness of bcc refractory HEAs, if it truly captures the underlying physics of solid-solution strengthening. As expected, a clear linear relationship is also observed between the normalized hardness, \(HV/\left( {\mu \gamma _{{{{\mathrm{USF}}}}}} \right)\), and RMSAD, returning an RMSE of prediction as small as 29.5 HV (Fig. 5b).
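The fit of the constant a can be sketched as a least-squares regression through the origin, since Eq. (3) contains no intercept term. Whether the regression reported above included an intercept is not stated, so the zero-intercept form is an assumption here. The function names and all numbers below are illustrative placeholders, not the experimental data of Fig. 5.

```python
import math

def fit_slope_through_origin(rmsad, normalized_strength):
    """Least-squares slope a for y = a * x with no intercept (assumed form of Eq. (3))."""
    sxy = sum(x * y for x, y in zip(rmsad, normalized_strength))
    sxx = sum(x * x for x in rmsad)
    return sxy / sxx

def rmse(predicted, observed):
    """Root-mean-square error between two equally long sequences."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

# Placeholder values only: RMSAD and tau_Y / (mu * gamma_USF) in consistent units.
rmsad_values = [0.05, 0.10, 0.15, 0.20]
y_normalized = [0.010, 0.021, 0.029, 0.040]

a = fit_slope_through_origin(rmsad_values, y_normalized)
y_fit = [a * x for x in rmsad_values]
print(f"a = {a:.4f}, RMSE = {rmse(y_fit, y_normalized):.5f}")
```

Multiplying the fitted slope by μ, γUSF, and RMSAD for a new composition then gives the yield-strength estimate of Eq. (3), provided the same units are used as in the fit.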

Fig. 5: Linear dependence of mechanical properties of RHEAs on the RMSAD parameter.

a Normalized yield strength (τY) vs. RMSAD, and b normalized Vickers hardness (HV) vs. RMSAD. The normalization factor is the product of the alloy’s isotropic shear modulus (μ) and USF energy (γUSF). The yield strength and hardness data are collected from previous experimental measurements in the literature. The unit of the y axis is \(\frac{{{{{\mathrm{GPa}}}}}}{{({{{\mathrm{GPa}}}} \times \frac{{{{\mathrm{J}}}}}{{{{{\mathrm{m}}}}^2}})}} = {{{\mathrm{m}}}}^2/{{{\mathrm{J}}}}\) in (a) and (HV m²)/(GPa J) in (b).

Data mining and interpretation

Based on the developed surrogate models, a systematic data mining over massive alloy compositions is performed to uncover the physical relations of lattice distortion to phase stability, strength, and ductility of RHEAs. First, the distribution of lattice distortion in bcc refractory alloys is mapped in a compositional space composed of ten major RHEA elements. Second, with the mapping results of lattice distortion, the room-temperature yield strengths of those mapped compositions are also predicted, using Eq. (3). Third, the predicted yield strength is leveraged with a previously developed ductility model56 and the CALPHAD approach99 to validate an integrated computational framework for the data-driven design of RHEAs with high strength and good ductility.

Systematic mapping of lattice distortion

Mapping of lattice distortion is performed for various binary, ternary, quaternary, and quinary alloys in a vast compositional space consisting of ten different elements, namely Ti, Zr, Hf, V, Nb, Ta, Mo, W, Re, and Ru. The compositions considered are limited such that every individual element makes up between 10 and 60 atomic percent (at%), to maintain focus on the concentrated, high-entropy region of the compositional space. For a homogeneous sampling, the binary, ternary, and quaternary alloys are assessed at 5 at% compositional intervals for each element, while a 10 at% interval is applied to the quinary alloys. As such, a total of 76,816 compositions are investigated. Of those, 140 compositions are binary alloys; 5208 are ternary; 36,400 are quaternary; and 35,068 are quinary. At each mapped composition, we also utilize the CALPHAD approach99 to evaluate the phase stability of the single-phase bcc solid solution at 800 °C, which is a common temperature used for the homogenization processing of RHEAs. After the phase-stability evaluation, 5916 compositions are found to result in single-phase bcc solid solutions at 800 °C. Among them, 71 are binary; 1989 are ternary; 2012 are quaternary; and 1844 are quinary.
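A grid sampling of this kind can be sketched as follows for a single alloy system. This is a minimal illustration that assumes every element content lies on the stated grid between 10 and 60 at% and that the contents sum to 100 at%. The function name is hypothetical, and the sketch does not attempt to reproduce the composition counts reported above, which depend on sampling details not given in the text.

```python
from itertools import product

def grid_compositions(n_elements, step, lo=10, hi=60):
    """All tuples of n_elements integer at% values on a `step` grid,
    each within [lo, hi], that sum to 100."""
    levels = range(lo, hi + 1, step)
    found = []
    # Choose the first n-1 contents freely; the last one is fixed by the balance.
    for head in product(levels, repeat=n_elements - 1):
        last = 100 - sum(head)
        if lo <= last <= hi and (last - lo) % step == 0:
            found.append(head + (last,))
    return found

# Grid points within one binary system (5 at% step) and one quinary system (10 at% step).
print(len(grid_compositions(2, 5)))
print(len(grid_compositions(5, 10)))
```

Each tuple would then be paired with a choice of elements from the ten-element palette before being passed to the surrogate models.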

The mapping results of lattice distortion are shown in Fig. 6a, b as relative frequency histograms of the RMSAD distribution for all the screened alloy compositions and for those possessing a single bcc phase at 800 °C, respectively. The compositions are grouped into binary/ternary (low/medium entropy) and quaternary/quinary (high entropy) categories to highlight the effect of the number of component elements, i.e., the cocktail effect of HEAs. In Fig. 6a, it can be seen that the RMSAD distributions are nearly identical between the binary/ternary and the quaternary/quinary groups. The mean values of the distributions are also similar: 0.1544 Å for the low/medium-entropy alloys and 0.1573 Å for the high-entropy alloys. However, when the solid-solution phase stability is considered, an obvious difference is observed in the RMSAD distributions between the low/medium-entropy and high-entropy alloys. As presented in Fig. 6b, with the single-phase constraint, the RMSAD distribution of the low/medium-entropy group is biased towards the compositions with less lattice distortion. This feature is also evidenced by the skewness, which more than doubles (0.70 vs. 1.92), and by the reduced distribution mean (0.1544 vs. 0.0974 Å). In contrast, the distribution of RMSAD for the high-entropy group (quaternary/quinary alloys) does not show significant bias after applying the phase-stability restriction and still holds a mean RMSAD as high as 0.1502 Å and a weak skewness. Comparing the results of Fig. 6a, b suggests that while highly distorted binary and ternary bcc solid-solution alloys are predicted to be possible, a majority of them cannot remain thermodynamically stable at typical processing temperatures. On the other hand, the quaternary and quinary alloys with severe lattice distortions are likely to be stabilized by the increased configurational entropy from the inclusion of multiple principal elements. There has been an enduring debate over whether severe lattice distortion is a feature unique to HEAs or can generally exist in any kind of solid-solution alloy regardless of compositional complexity100. The present observation clearly evidences the importance of compositional complexity in preserving solid-solution alloys with severe lattice distortion from phase decomposition.

Fig. 6: Distribution of lattice distortion in bcc refractory solid-solution alloys.

Relative frequency histograms of lattice distortion in the alloys (a) before and (b) after screening for single-phase stability at 800 °C. The histogram colored in blue corresponds to the distribution of the RMSAD parameter of the binary and ternary alloys, while the histogram associated with the quaternary and quinary alloys is colored in orange. c–e Boxplots describing the distribution of physical descriptors that contribute to RMSAD in the alloys with severe lattice distortion (i.e., alloys with RMSAD > 0.15 Å).

To uncover the mechanism of the HEA’s cocktail effect on lattice distortion, the single-phase alloy compositions with an RMSAD larger than 0.15 Å are collected for further investigation, as marked by the red box in Fig. 6b. Here, 0.15 Å is chosen as an indicator for severe lattice distortion, as it represents the 84th percentile of the binary/ternary distribution and the 58th percentile of the quaternary/quinary distribution. In this collective dataset, a comparison between the binary/ternary and quaternary/quinary alloys can clearly elucidate the effects of compositional complexity on severe lattice distortion. According to Eq. (1), a higher degree of lattice distortion can be achieved in an alloy that has a larger deviation in local atomic bond lengths (\(\sigma _{{{{\mathrm{B}}}}2/{{{\mathrm{bcc}}}}}^L\) in Eq. (1)), a VEC further away from 5.7 e/atom (\(u_{5.7}^{{{{\mathrm{VEC}}}}}\) in Eq. (1)), and/or a large variability in the number of valence electrons (\(\sigma _{5.7}^{{{{\mathrm{VEC}}}}}\) in Eq. (1)) among the constituent elements. The distributions of the \(\sigma _{{{{\mathrm{B}}}}2/{{{\mathrm{bcc}}}}}^L\), \(u_{5.7}^{{{{\mathrm{VEC}}}}}\), and \(\sigma _{5.7}^{{{{\mathrm{VEC}}}}}\) parameters in the binary/ternary and quaternary/quinary alloys with RMSAD > 0.15 Å are visualized via boxplots, as exhibited in Fig. 6c, d, and e, respectively. It is interesting to note that the quaternary/quinary alloys generally have a larger \(\sigma _{{{{\mathrm{B}}}}2/{{{\mathrm{bcc}}}}}^L\) than the binary/ternary alloys, whereas the opposite holds for the \(u_{5.7}^{{{{\mathrm{VEC}}}}}\) parameter. These differences are demonstrated to be statistically significant via Welch’s t-tests101. This observation suggests that the severe lattice distortion in the quaternary/quinary alloys is attributed more to the deviations in local atomic bond lengths of the constituent elements. However, significant differences in the valences and atomic volumes of the constituent elements in a solid-solution phase also lead to a strong driving force for phase decomposition, according to the classic Hume-Rothery rules. Thanks to the multicomponent chemistry, such a driving force in the quaternary and quinary alloys could be balanced by the stabilization effect induced by the increased configurational entropy. As a result, those alloys can still remain as a single-phase solid solution while bearing a large lattice distortion. On the other hand, it is understandable that the severe lattice distortion in the binary and ternary alloys is more associated with a larger \(u_{5.7}^{{{{\mathrm{VEC}}}}}\) parameter (Fig. 6d), since too large a \(\sigma _{{{{\mathrm{B}}}}2/{{{\mathrm{bcc}}}}}^L\) would introduce phase decomposition, which cannot be sufficiently balanced by the entropy stabilization effect. In fact, the \(u_{5.7}^{{{{\mathrm{VEC}}}}}\) parameter cannot be too large either, because the bcc lattice structure becomes unstable for a transition-metal alloy if its d band is far from the half-filled state102. Therefore, a majority of the screened binary and ternary alloys with severe lattice distortion are thermodynamically unstable.
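For reference, the statistic behind a Welch’s t-test can be sketched with the standard library alone. The function name and the sample values are illustrative placeholders, not the descriptor data of Fig. 6c–e, and the conversion of the statistic to a p-value (which needs the Student t distribution) is left out.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two samples with possibly unequal variances."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means.
    va, vb = variance(sample_a) / na, variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, dof

# Placeholder descriptor values for two alloy groups.
group_low_entropy = [0.08, 0.09, 0.11, 0.10, 0.07]
group_high_entropy = [0.12, 0.15, 0.13, 0.16, 0.14, 0.15]
t_stat, dof = welch_t(group_low_entropy, group_high_entropy)
print(f"t = {t_stat:.3f}, dof = {dof:.2f}")
```

Unlike Student’s t-test, this form does not pool the variances, which suits groups of unequal size and spread such as the binary/ternary and quaternary/quinary sets.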

Lattice distortion induced solid solution strengthening

With the results of the RMSAD screening above, we can further evaluate how the yield strengths of the bcc refractory alloys are distributed across a large compositional space using the developed strengthening model (Eq. (3)). Here we restrict the yield-strength prediction to the same alloy compositions as in Fig. 6b, which are predicted to be stable as a single bcc solid-solution phase at 800 °C. The compositions with RMSAD less than 0.01 Å, however, are omitted. It is important to note that this exclusion would disproportionately affect only a select few dilute compositions in the Mo-W binary series, which tend to exhibit low lattice distortion and yet could potentially still be strong due to a high lattice friction stress, as reflected by their shear moduli (Supplementary Table 3). The predicted yield strengths are visualized as distribution histograms to reveal the difference between the binary/ternary and quaternary/quinary alloys. As shown in Fig. 7a, the quaternary/quinary alloys generally possess a higher yield strength and a narrower distribution than the binary/ternary alloys. The higher average strength of the quaternary/quinary alloys is likely a benefit of the higher levels of lattice distortion in those alloys (Fig. 6b), which contributes positively to yield strength according to Eq. (3). Additionally, in our strengthening model, the yield strength of a bcc refractory alloy is also proportional to its shear modulus and USF energy, two material features that show strong dependences on VEC. As exhibited in Supplementary Fig. 3, an alloy with a lower VEC that is away from a half-filled d band state generally has a smaller μ and γUSF. Therefore, achieving a high yield strength requires an alloy to have a large RMSAD and, at the same time, a VEC that is not too small. This combination of features is more readily achieved in the quaternary/quinary alloys due to the entropy-stabilization effect.

Fig. 7: Distribution of yield strength in the bcc refractory alloys screened to be stable as a single-phase solid solution at 800 °C.

Relative frequency histograms of (a) room-temperature yield strength and (b) room-temperature specific yield strength.

The potential of RHEAs for lightweighting applications is also evaluated through a prediction of specific strength, which is achieved by combining the yield strength data obtained above with a lattice-parameter model developed in our recent work for the prediction of theoretical density103. The predicted specific strengths are summarized as distribution histograms in Fig. 7b. Notably, the HEAs also yield higher specific strengths on average than the alloys with low/medium configurational entropies. The results of Fig. 7 suggest that more opportunities for lightweight, high-strength alloy compositions can be expected by increasing the number of components through the “high-entropy” design strategy.
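The step from a lattice parameter to a specific strength can be sketched as below. It relies only on the fact that a bcc cubic cell holds two atoms. The use of a composition-weighted mean atomic mass, the function names, and the units are assumptions made for illustration, and the lattice-parameter model itself is not reproduced.

```python
AVOGADRO = 6.02214076e23  # 1/mol

def bcc_density(lattice_parameter_angstrom, mean_atomic_mass):
    """Theoretical density (g/cm^3) of a bcc crystal with 2 atoms per cubic cell.
    mean_atomic_mass is in g/mol (assumed composition-weighted for an alloy)."""
    cell_volume_cm3 = (lattice_parameter_angstrom * 1e-8) ** 3
    return 2.0 * mean_atomic_mass / (AVOGADRO * cell_volume_cm3)

def specific_strength(yield_strength_mpa, density_g_cm3):
    """Specific yield strength in MPa cm^3/g."""
    return yield_strength_mpa / density_g_cm3

# Sanity check on pure W (a of about 3.165 Angstrom, M = 183.84 g/mol).
rho_w = bcc_density(3.165, 183.84)
print(f"density of bcc W: {rho_w:.2f} g/cm^3")
```

For pure tungsten the result lands close to the handbook density, which is a quick check that the unit conversion is right before the function is applied to alloy compositions.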

Data-driven design of high-performance RHEAs

The current screening model for an alloy’s yield strength can be seamlessly integrated with a ductility assessment model developed in our previous work for a data-driven exploration of strong and ductile bcc refractory alloys. Taking alloy composition as input, the ductility model yields a rapid prediction of a so-called D parameter, which is defined as the ratio between the surface energy of a crack fracture plane and the unstable stacking fault energy of the common dislocation-slip plane. The magnitude of the D parameter physically reflects the likelihood of an alloy being intrinsically ductile, based on Rice’s fracture mechanics104. More importantly, the D parameter of RHEAs has been demonstrated to positively correlate with the compression fracture strains measured experimentally, which makes it an effective indicator for assessing an alloy’s ductility56. A combination of the strength and ductility models with the CALPHAD-based assessment of phase stability forms a data-driven design framework that is powerful for performing massive searches in a vast design space for promising alloy compositions with enhanced ductility and strength. Figure 8a shows the flow chart of the framework. With an input alloy composition, the CALPHAD-based thermodynamic assessment is first carried out to check whether the bcc solid solution can be stable as a single phase at a given processing temperature. If yes, the strength and ductility models are sequentially applied to assess the mechanical properties.

Fig. 8: A high-throughput framework for screening of RHEAs with a good combination of ductility and strength.

a A flowchart detailing the screening process of the framework. The RMSAD and yield strength are predicted using the surrogate models developed in the present work, while intrinsic ductility is predicted from the models in our previous work56. The phase stability assessment is performed using the TCHEA4 database implemented in the Thermo-Calc Software99. b A scatter histogram of the yield strength (τY) and intrinsic ductility (D parameter) of the screened alloys that are single-phase solid solutions at 800 °C. Data points that overlap with more points within a bin area are coded by a warmer color. Experimentally studied RHEAs are also identified to show where the current experimental literature on equimolar compositions exists and where there is potential to optimize properties by exploring non-equimolar compositions. The red-dashed box marks a range of previously unexplored chemical compositions promising for achieving single-phase RHEAs with a combination of high strength (τY > 1 GPa) and reasonable ductility (D parameter > 3.3). Scatterplots showing the effect of VEC on (c) yield strength and (d) the D parameter of single-phase RHEAs. In (c) and (d), RMSAD is color mapped onto the points to show how VEC and RMSAD can be optimized in tandem for high-performance alloy design.

As a demonstration, we applied this framework to the same composition dataset used for our lattice-distortion mapping, which contains 76,816 compositions homogeneously sampling all the binary, ternary, quaternary, and quinary systems in the 10-component compositional space. The results are displayed as an Ashby plot of the strength-ductility combination in Fig. 8b. The positions of some experimentally studied RHEAs2,19,65,67,68,70,71,72,73,74,75,77,78,80,82,83,84,85,86,87,88,89,90,91,92,93,94,95,105 are also highlighted to indicate the potential area to explore. As shown in Fig. 8b, despite a weak trend of the typical strength-ductility trade-off, the widely dispersed distribution of yield strength against the D parameter reveals a range of RHEAs that can possess a good combination of strength and ductility. If we consider the alloys with a D parameter larger than 3.3 to be potentially ductile, as this value generally corresponds to a compression failure strain greater than 40%, there is still a substantial proportion of alloy compositions (marked by a dashed box) that could have a yield strength greater than 1 GPa but have not yet been fully explored experimentally. A recently discovered RHEA, HfTaTiZr, falls in this window; it was indeed measured to have a yield strength of about 1.5 GPa and to be ductile in tension with limited plasticity, in good agreement with the present prediction67. Additionally, the alloy’s ductility was found to be significantly improved via proper heat-treatment processing106. Furthermore, the scattered distribution of the experimentally reported equimolar alloys in Fig. 8b indicates that maximizing configurational entropy may not always be an optimal design strategy for HEAs, and there could be considerable room to tune alloy compositions for an improved strength and ductility combination relative to the currently known RHEAs.
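The screening logic of the framework described above can be sketched as a simple filter. The three callables are hypothetical stand-ins for the CALPHAD phase-stability check and the strength and ductility surrogate models, none of which is reproduced here. The default thresholds are the 1 GPa and D > 3.3 criteria quoted in the text.

```python
def screen(compositions, is_single_phase_bcc, predict_strength_gpa, predict_d,
           min_strength_gpa=1.0, min_d=3.3):
    """Return (composition, strength, D) tuples that pass all three filters.
    The callables are hypothetical placeholders for the real models."""
    hits = []
    for comp in compositions:
        if not is_single_phase_bcc(comp):
            continue  # skip alloys not predicted to be single-phase bcc
        strength = predict_strength_gpa(comp)
        d_param = predict_d(comp)
        if strength > min_strength_gpa and d_param > min_d:
            hits.append((comp, strength, d_param))
    return hits

# Toy usage with made-up labels and values standing in for model outputs.
toy_strength = {"A": 1.2, "B": 0.8, "C": 2.0}
toy_d = {"A": 3.5, "B": 4.0, "C": 5.0}
result = screen(["A", "B", "C"],
                is_single_phase_bcc=lambda c: c != "C",
                predict_strength_gpa=toy_strength.get,
                predict_d=toy_d.get)
print(result)
```

Running the phase-stability check first mirrors the order in the flow chart, so the property models are only evaluated for compositions that survive it.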

Several practical insights are also derived from the current screening, which may inform the future design of RHEAs. VEC is unsurprisingly found to be quite influential for both yield strength and ductility (i.e., the D parameter). The yield strength and D parameter of the screened alloys are plotted against their VEC in Fig. 8c, d, respectively, with data points color coded according to RMSAD. As shown in Fig. 8c, the yield strength seems to peak at two VEC values, 4.7 and 5.5 e/atom. According to Eq. (3), the yield strength is a function of RMSAD, μ, and γUSF. The relation between RMSAD and VEC is shown in Fig. 2b, with lower VECs corresponding to higher RMSAD. When plotted as a function of VEC (Supplementary Fig. 3), μ and γUSF show a positive correlation with VEC. Based on these observations, the peak at 4.7 e/atom is attributed to increased lattice distortion, and the peak at 5.5 e/atom is more likely to correspond to higher μ and γUSF. On the other hand, the D parameter shows a negative dependence on VEC in general, which means that the alloys with larger VEC are likely to have poor ductility due to a smaller D parameter. In particular, for the alloys near the yield strength peak with VEC around 5.5 e/atom, the D parameter is only about 2, which is a value exhibited by alloys showing brittle deformation behaviors56. Therefore, a good balance between strength and ductility may be achieved by tuning the alloy’s VEC to be around 4.7 e/atom. Furthermore, the lattice distortion also positively affects both the strength and ductility. For example, as shown in Fig. 8d, alloys with a larger RMSAD generally have a larger D parameter. Also, at a given level of VEC, increasing the alloy’s RMSAD leads to a further enhancement of the yield strength, as shown by the colormap in Fig. 8c. These observations suggest that a good blend of strength and ductility can be expected if the alloy is designed to have a VEC close to 4.7 e/atom and consists of elements that have large variances in their number of valence electrons and bcc/B2 bond lengths to maximize the lattice distortion (i.e., RMSAD).

In summary, a statistical surrogate model was developed to efficiently predict lattice distortion in bcc refractory non-dilute, high-entropy solid solution alloys with ten constituent elements: Ti, Zr, Hf, V, Nb, Ta, Mo, W, Re, and Ru. The lattice distortion quantified by the RMSAD parameter was previously only obtained for a few individual alloys by costly experiments or calculations but can now be rapidly predicted by the developed model in a high-throughput manner simply according to alloy composition. Unlike the conventional models based on atomic size mismatch of pure elements, the present model uses descriptors that characterize the length and electronic structure of interatomic bonds in solid-solution alloys for a more accurate description of lattice distortion. With physics-informed statistical learning, the model exhibits a promising capability to make reliable extrapolative predictions for the multicomponent alloy compositions outside the training data range. Moreover, a strong correlation between the lattice distortion and the room-temperature yield strength is revealed in bcc RHEAs. Accordingly, a surrogate model enlightened by the recent solid-solution strengthening theory28 is proposed for rapid prediction of yield strength of RHEAs simply according to their compositions.

The developed surrogate models are applied to screen a massive number of alloy compositions to systematically investigate the compositional dependence of lattice distortion and yield strength in RHEAs. The multicomponent nature (i.e., cocktail effect) of RHEAs is revealed to play a critical role in preventing the highly distorted solid-solution lattice from phase transformation. As a result, severe lattice distortion is found to generally be more achievable in the alloys with higher chemical complexity due to increased configurational entropy. The increased level of lattice distortion also gives RHEAs a generally higher strength than their binary and ternary counterparts. Integration of the developed models with a model of intrinsic ductility56 and a phase stability assessment based on the CALPHAD approach enables an alloy design platform for high-throughput search of high-performance RHEAs in a large compositional space that is far from being fully explored. The high-throughput search reveals a range of previously unexplored chemical compositions promising for achieving single-phase RHEAs with a combination of high strength (>1 GPa) and reasonable ductility. The present work demonstrates a promising route to combine physical principles with statistical approaches for developing efficient surrogate models to accelerate the pace of HEA design and development, especially for those with non-equimolar compositions.


Details of DFT calculations

In the present work, DFT calculations were performed using the projector augmented wave method107 with the exchange-correlation functional described by the generalized gradient approximation of Perdew, Burke, and Ernzerhof (GGA-PBE)108, as implemented in the Vienna ab-initio simulation package (VASP)109. The calculation parameters used were the same as those employed in our previous work56, with a plane-wave basis energy cutoff of 400 eV, first-order Methfessel-Paxton smearing of 0.2 eV, and k-point grids generated by VASP’s automatic meshing scheme with an Rk length of 30 Å. An energy convergence criterion of \(10^{-6}\) eV was used for electronic self-consistency cycles. The atomic structural relaxations were performed by minimizing the atomic forces to be smaller than 0.001 eV/Å. The aforementioned settings of the k-point density and smearing width were validated by convergence test calculations in our previous work56. The supercells used for DFT calculations were constructed using the SQS approach to approximate the chemical disorder in solid-solution alloys47. For binary, ternary, and quaternary alloy compositions, the generated SQSs contain 72 atoms, and 120-atom SQSs were employed for quinary compositions. Overall, as listed in Supplementary Table 1, the relaxed atomic structures of 69 binary, 36 ternary, 64 quaternary, and 52 quinary bcc refractory alloys were calculated from a compositional space comprised of 10 group IV–VIII elements, including Ti, Zr, Hf, V, Nb, Ta, Mo, W, Re, and Ru. Among the DFT results, 106 of the compositions were adopted from our previous work56.

Scrutiny of bcc-to-ω phase transformation in training data

A bcc-based SQS for refractory alloys with low VECs could be dynamically unstable and transform into a lattice closer to the crystal structure of the ω phase after relaxation40,41,110. To ensure that no ω-like structures were included in the training data, an approach developed by Ikeda et al.41 was used. The bcc-to-ω transformation is martensitic in nature and can be caused by atomic shuffling of two-thirds of the atoms in the bcc lattice along a \(\langle 111\rangle _{{{{\mathrm{bcc}}}}}\) direction. Therefore, an SQS of a multicomponent alloy can be considered as being “ω-like” if the atomic displacements caused by relaxation display a preference for a particular \(\langle 111\rangle _{{{{\mathrm{bcc}}}}}\) direction, whereas there should be no significant preference if the bcc symmetry is preserved41. The preference can be quantified through a four-step analysis. First, the atomic displacement of each atom in the supercell is projected along the four distinct \(\left[ {111} \right]_{{{{\mathrm{bcc}}}}},\left[ {1\bar 1\bar 1} \right]_{{{{\mathrm{bcc}}}}},\left[ {\bar 11\bar 1} \right]_{{{{\mathrm{bcc}}}}}\) and \(\left[ {\bar 1\bar 11} \right]_{{{{\mathrm{bcc}}}}}\) directions, respectively. Second, the projection magnitude for each \(\langle 111\rangle _{{{{\mathrm{bcc}}}}}\) direction is averaged over all the atoms in the supercell. Third, the average projection magnitudes along the different \(\langle 111\rangle _{{{{\mathrm{bcc}}}}}\) directions are sorted based on their values. Fourth, a structural descriptor is defined by taking the difference between the largest projection magnitude (p1) and the second largest projection magnitude (p2), i.e., p1–p2. A relaxed SQS is considered as being “ω-like” if p1–p2 is larger than a threshold value, 0.04cω, where cω is the lattice parameter of the c axis of a perfect ω crystal, which is calculated as a function of the bcc lattice parameter (abcc) with the relationship \(c_{\upomega} = \frac{{\sqrt 3 }}{2}a_{{{{\mathrm{bcc}}}}}\). If p1–p2 is smaller than 0.04cω, the bcc symmetry is considered to be preserved. By implementing Ikeda’s method, we found that none of the SQSs used in the present work exceeded this threshold. The calculated p1–p2 of each SQS is shown in Supplementary Fig. 4. In addition, NbTi3 was found to be an outlier relative to the other compositions and was thus removed from the subsequent training process as a precautionary measure.
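The four-step analysis can be sketched as follows. This is a minimal illustration, not the original implementation. It assumes that the displacement vectors are given in Cartesian coordinates aligned with the cubic bcc axes and that "projection magnitude" means the absolute value of the scalar projection. The function names are hypothetical.

```python
import math

# The four distinct <111> bcc directions listed in the text (unnormalized).
DIRECTIONS = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

def omega_descriptor(displacements):
    """p1 - p2 from per-atom Cartesian displacement vectors (Angstrom)."""
    norm = math.sqrt(3.0)  # length of each <111> vector above
    averages = []
    for direction in DIRECTIONS:
        # Steps 1-2: project every displacement, then average the magnitudes.
        proj = [abs(sum(u * c for u, c in zip(disp, direction))) / norm
                for disp in displacements]
        averages.append(sum(proj) / len(proj))
    # Steps 3-4: sort and take the gap between the two largest averages.
    averages.sort(reverse=True)
    return averages[0] - averages[1]

def is_omega_like(displacements, a_bcc, factor=0.04):
    """Compare p1 - p2 with the threshold factor * c_omega."""
    c_omega = math.sqrt(3.0) / 2.0 * a_bcc
    return omega_descriptor(displacements) > factor * c_omega
```

Displacements that all point along a single ⟨111⟩ direction give a large p1–p2, while displacements without a preferred ⟨111⟩ direction give a value near zero.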

RMSAD parameter

In the present work, the RMSAD parameter was derived from DFT relaxation calculations. Specifically, the RMSAD parameter was calculated by taking the root-mean-square average of the Euclidean distances between atomic positions in the relaxed SQS supercell and their corresponding ideal positions in the perfect bcc lattice, which can be expressed as,

$${{{\mathrm{RMSAD}}}} = \sqrt {\frac{{\mathop {\sum }\nolimits_i^n \left( {x_i^{{{{\mathrm{real}}}}} - x_i^{{{{\mathrm{ideal}}}}}} \right)^2}}{n}}$$

where n is the number of atoms in the supercell, \(x_i^{{{{\mathrm{ideal}}}}}\) is the ideal lattice site for atom i, and \(x_i^{{{{\mathrm{real}}}}}\) is the DFT-relaxed position of atom i. Since the volume of the supercell can also change after relaxation, the ideal position of each atom was obtained by scaling the initially unrelaxed supercell to have the same atomic volume as the relaxed one. The Euclidean distance was then calculated in Cartesian coordinates. As such, the calculated Euclidean distance should only reflect the atomic-position deviation due to the local lattice distortion rather than that caused by the change in the volume of the supercell after relaxation.
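The RMSAD calculation with the volume rescaling can be sketched as below. It is a minimal sketch under stated assumptions: positions are Cartesian and listed in the same atom order, the rescaling is isotropic, and no periodic-boundary wrapping or removal of rigid shifts is applied. The function names are hypothetical.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix of row vectors (cell volume up to sign)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def rmsad(ideal_positions, ideal_cell, relaxed_positions, relaxed_cell):
    """RMSAD in the length unit of the inputs."""
    # Isotropic factor that brings the ideal cell to the relaxed volume (assumed form).
    scale = (abs(det3(relaxed_cell)) / abs(det3(ideal_cell))) ** (1.0 / 3.0)
    total = 0.0
    for ideal, relaxed in zip(ideal_positions, relaxed_positions):
        scaled = [scale * x for x in ideal]
        total += math.dist(scaled, relaxed) ** 2  # squared Euclidean distance
    return math.sqrt(total / len(ideal_positions))

# Toy two-atom cell: 10% linear expansion plus a 0.1 Angstrom shift of one atom.
ideal_cell = [[3.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 3.0]]
relaxed_cell = [[3.3, 0.0, 0.0], [0.0, 3.3, 0.0], [0.0, 0.0, 3.3]]
ideal = [(0.0, 0.0, 0.0), (1.5, 1.5, 1.5)]
relaxed = [(0.1, 0.0, 0.0), (1.65, 1.65, 1.65)]
print(f"RMSAD = {rmsad(ideal, ideal_cell, relaxed, relaxed_cell):.4f}")
```

In the toy case the uniform expansion contributes nothing to the result, so only the local shift of the first atom is counted, which is the purpose of the rescaling step.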

Physics-informed descriptors

The selection of the physical feature to construct descriptors was driven by a data exploration described in the result section. It was found that lattice distortion was largely determined by interatomic-bonding characteristics like the first nearest neighbor (FNN) bond length and valence electron concentration (VEC) and their variations due to the local chemistry fluctuation caused by random mixing. Therefore, other than VEC, which has a simple derivation, the rest of descriptors were constructed using a bond-counting-based method developed in our previous work, by which the physical characteristics of interatomic bonds in a random alloy can be effectively integrated with its composition56.

According to the bond-counting method111,112, a bulk property of an alloy, such as the cohesive energy, can be approximated as a summation of the individual contributions from each atomic bond in the alloy. In random alloys, the probability of finding an interatomic bond between two constituent elements, i and j, should be statistically equal to the product of the concentration of i and j. Therefore, the summation of individual bond contribution can be considered as a weighted average over different i–j bonds in the alloy, and the weighting factor is the presence probability of each type of i–j bonds. Accordingly, the descriptor can be written as,

$$u_{{{{\mathrm{B}}}}2/{{{\mathrm{bcc}}}}}^p = \mathop {\sum }\limits_i x_i\mathop {\sum }\limits_j x_jp_{i,j}$$

where xi and xj are the atomic fractions of elements i and j, respectively, which are among the 10 refractory elements studied in the present work (i.e., Ti, Zr, Hf, V, Nb, Ta, Mo, W, Re, and Ru). \(u_{{{{\mathrm{B}}}}2/{{{\mathrm{bcc}}}}}^p\) is a descriptor based on an interatomic-bonding feature, p, which has a value of pi,j for the i–j type of atomic bond. Clearly, all values of pi,j can be summarized as a 10 × 10 matrix in which each component stands for a pair permutation between the ten alloying elements. Additionally, to reflect the fluctuations in the local atomic environment, the standard deviation of the bond feature about \(u_{{{{\mathrm{B}}}}2/{{{\mathrm{bcc}}}}}^p\) is also taken as a descriptor, expressed as,

$$\sigma _{{{{\mathrm{B}}}}2/{{{\mathrm{bcc}}}}}^p = \sqrt {\mathop {\sum }\limits_i x_i\mathop {\sum }\limits_j x_j\left( {p_{i,j} - u_{{{{\mathrm{B}}}}2/{{{\mathrm{bcc}}}}}^p} \right)^2}$$
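The two descriptors defined above can be evaluated directly from a composition vector and a bond-feature matrix. The sketch below uses a hypothetical function name and a made-up 2 × 2 matrix in place of the DFT-derived 10 × 10 matrix of pi,j values.

```python
import math

def bond_descriptors(x, p):
    """Mean u and standard deviation sigma of bond feature p over random i-j bonds.
    x: list of atomic fractions; p: square matrix with p[i][j] for the i-j bond."""
    n = len(x)
    # Weighted average with bond probability x_i * x_j.
    u = sum(x[i] * x[j] * p[i][j] for i in range(n) for j in range(n))
    # Weighted variance of the bond feature about u.
    var = sum(x[i] * x[j] * (p[i][j] - u) ** 2 for i in range(n) for j in range(n))
    return u, math.sqrt(var)

# Placeholder bond-feature values for an equimolar binary alloy.
u, sigma = bond_descriptors([0.5, 0.5], [[1.0, 2.0], [2.0, 3.0]])
print(u, sigma)
```

Elements absent from an alloy simply enter with a zero fraction, so the same full matrix can serve every composition in the ten-element space.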

Deriving the values of pi,j is not trivial. Assuming that the interactions between FNN atoms are much stronger than the others, we can take the physical properties of a group of single-element bcc and ordered binary B2 structures as the bond-feature parameters. For example, if we consider the FNN bond length as a bond-feature parameter, its pi,i component is the length of the FNN i–i bond in the single-element bcc structure composed of element i, while the pi,j and pj,i components are the FNN bond lengths of the binary B2 structure composed of elements i and j. This assignment is valid because the only type of FNN atomic bond in a bcc crystal purely comprised of element i is the i–i bond along the <111> direction, and the only type of FNN bond in the binary B2 structure composed of elements i and j is the i–j bond, as illustrated in Fig. 9. Since there are only two atoms in the unit cell of the single-element bcc and ordered binary B2 structures, DFT calculations can be easily applied to these structures to predict the values of pi,j without demanding extensive calculations. In the present work, we derived two types of bond-feature parameters from the DFT calculations, which are the FNN bond length (L) and an electronic structure parameter (dip) measuring the bimodality of the local DOS of the d orbitals in the bcc/B2 structures. The dip parameter was designed to account for the nuance of the d-band filling effects in bcc transition metals beyond VEC, and it was demonstrated in our previous work to capture the effects of interatomic bonding on the defect energetics of refractory alloys with great success56,61.

Fig. 9: Unit cells of the crystal structures used to model the physical features of the first nearest neighbor (FNN) atomic bonds in bcc solid-solution alloys.

a Binary B2 unit cell and b single-element bcc unit cell.

Data splitting and statistical regression

Owing to the high computational cost, the DFT-SQS calculations could generate RMSAD data for only 215 individual alloy compositions, which is a tiny fraction of the full composition space studied in the present work. Therefore, to make the best use of this small set of DFT data for reliable extrapolative predictions, we did not split them randomly into training and testing datasets. Instead, the DFT data were split by chemical complexity into three datasets for training, validation, and testing. Specifically, only the binary and ternary alloys were used as training data, the quaternary alloys served as validation data to determine the best descriptors, and the quinary alloys were reserved exclusively for a final test of the regression stability of the finalized model and its predictive ability for unseen compositions. This data-splitting strategy provides a stringent test of the capability of the surrogate models for extrapolative exploration of multicomponent alloy compositions not covered by the training set. Based on the developed descriptors, a modified OLS regression process was employed to develop a predictive model for the lattice distortion (i.e., the RMSAD parameter) in refractory solid-solution alloys. Specifically, the model was constructed incrementally, starting with the descriptors with the strongest correlation, and additional descriptors were added sequentially until no significant improvement in the prediction performance could be achieved.
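The splitting and sequential-addition procedure can be illustrated with the generic sketch below. It is not the modified OLS process used in the present work, whose details are not given in this section. It is a plain forward-selection loop that fits on the training set and adds the descriptor giving the lowest validation error. The function names, the `min_gain` threshold, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def split_by_complexity(n_components):
    """Index arrays: binary/ternary train, quaternary validate, quinary test."""
    n = np.asarray(n_components)
    return {
        "train": np.flatnonzero((n == 2) | (n == 3)),
        "validation": np.flatnonzero(n == 4),
        "test": np.flatnonzero(n == 5),
    }

def fit_ols(X, y):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def rmse(coef, X, y):
    A = np.column_stack([np.ones(len(X)), X])
    return float(np.sqrt(np.mean((A @ coef - y) ** 2)))

def forward_select(X_tr, y_tr, X_va, y_va, min_gain=1e-3):
    """Add one descriptor at a time while validation RMSE improves by min_gain."""
    selected, best = [], np.inf
    remaining = list(range(X_tr.shape[1]))
    while remaining:
        scores = []
        for k in remaining:
            cols = selected + [k]
            coef = fit_ols(X_tr[:, cols], y_tr)
            scores.append((rmse(coef, X_va[:, cols], y_va), k))
        err, k = min(scores)
        if best - err < min_gain:   # hypothetical stopping threshold
            break
        selected.append(k)
        remaining.remove(k)
        best = err
    return selected, best

# Synthetic demonstration: the target depends on descriptor 0 only.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 3))
y_demo = 1.0 + 2.0 * X_demo[:, 0]
chosen, val_rmse = forward_select(X_demo[:40], y_demo[:40],
                                  X_demo[40:], y_demo[40:])
print(chosen, val_rmse)
```

In such a scheme the quinary test set plays no role in descriptor selection and is evaluated only once, after the model has been finalized.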

Collection of the experimental strength data of RHEAs

To explore the connection between lattice distortion and solid-solution strengthening in refractory HEAs, a set of experimental yield-strength and hardness data was collected from an HEA database recently developed by Citrine Informatics and from other literature sources2,19,20,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95. Specifically, the data were filtered to retain measurements made at room temperature on single-phase bcc HEAs whose constituent elements are among those studied in the present work. The collection is limited to as-cast and annealed alloys, to minimize extrinsic strengthening effects due to processing such as strain hardening and grain refinement, and to compressive yield strengths, because of a relative lack of tensile data. Overall, 83 yield-strength and 28 hardness data points were collected, covering 66 individual alloy compositions ranging from quaternary to senary systems, as summarized in Supplementary Table 2. For each experimental data point, the corresponding RMSAD parameter was predicted by the surrogate model developed in the present work in order to explore the correlation between lattice distortion and yield strength.
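The filtering criteria described above could be expressed as in the following sketch. The `Record` fields, the label strings, and the room-temperature window are hypothetical and do not reflect the schema of the Citrine Informatics database. The example records carry no property values and are included only to exercise the filter.

```python
from dataclasses import dataclass

STUDIED = frozenset({"Ti", "Zr", "Hf", "V", "Nb", "Ta", "Mo", "W", "Re", "Ru"})

@dataclass(frozen=True)
class Record:
    elements: tuple        # constituent elements of the alloy
    phase: str             # e.g. "bcc" or "bcc+Laves" (hypothetical labels)
    temperature_c: float   # test temperature in degrees Celsius
    condition: str         # e.g. "as-cast", "annealed", "cold-rolled"
    prop: str              # "yield_strength" or "hardness"
    test_mode: str = ""    # "compression" or "tension"; empty for hardness

def keep(r, room_temp_c=(20.0, 30.0)):
    """Apply the selection criteria; the temperature window is an assumption."""
    lo, hi = room_temp_c
    if not set(r.elements) <= STUDIED:
        return False
    if r.phase != "bcc":
        return False
    if not lo <= r.temperature_c <= hi:
        return False
    if r.condition not in ("as-cast", "annealed"):
        return False
    if r.prop == "yield_strength":
        return r.test_mode == "compression"
    return r.prop == "hardness"

# Made-up records for illustration only.
records = [
    Record(("Nb", "Mo", "Ta", "W"), "bcc", 25.0, "as-cast", "yield_strength", "compression"),
    Record(("Nb", "Mo", "Ta", "W"), "bcc", 25.0, "as-cast", "yield_strength", "tension"),
    Record(("Al", "Nb", "Ti", "V"), "bcc", 25.0, "as-cast", "hardness"),
    Record(("Ti", "Zr", "Hf", "Nb"), "bcc", 25.0, "annealed", "hardness"),
]
selected = [r for r in records if keep(r)]
print(len(selected))
```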