# Local electronic descriptors for solute-defect interactions in bcc refractory metals

## Abstract

The interactions between solute atoms and crystalline defects such as vacancies, dislocations, and grain boundaries are essential in determining alloy properties. Here we present a general linear correlation between two descriptors of local electronic structures and the solute-defect interaction energies in binary alloys of body-centered-cubic (bcc) refractory metals (such as W and Ta) with transition-metal substitutional solutes. One electronic descriptor is the bimodality of the d-orbital local density of states for a matrix atom at the substitutional site, and the other is related to the hybridization strength between the valence sp- and d-bands for the same matrix atom. For a particular pair of solute-matrix elements, this linear correlation is valid independent of the types of defects and the locations of substitutional sites. These results make it possible to apply local electronic descriptors for quantitative and efficient predictions of solute-defect interactions and defect properties in alloys.

## Introduction

Solute atoms, whether they are added deliberately for specific needs, remain inevitably as impurities after synthesis, or are introduced during service, can affect various properties of alloys by changing the stability and mobility of crystalline defects1,2,3,4,5. One characteristic example is body-centered-cubic (bcc) refractory alloys based on group V (V, Nb, Ta) and VI (Mo, W) elements. These alloys are usually composed of a single bcc solid-solution phase, many properties of which are mainly managed by controlling the interactions of crystalline defects with solute elements, especially transition metal elements4,6,7,8,9,10. These interactions can be quantitatively characterized by the solute–defect binding energy, which is often correlated with the elastic strain energy variations caused by the size mismatch between solute and matrix atoms at different atomistic sites11,12,13. Beyond elastic interactions, especially in or near the core regions of defects, the variations in local electronic structures and chemical bonding caused by solute and defect geometries should also contribute to the solute–defect binding energies; this part is usually referred to as the electronic contribution in the literature14,15. Understanding and quantifying these electronic contributions is critical for both the fundamental science and the technological development of advanced alloys in the future.

Scientifically, a general physics-based model is required to explain the electronic effects on solute binding that first-principles calculations have recently found for various types of defects and alloys. The solute–defect binding in bcc refractory metals seems to depend strongly on the electronic features of the solute elements. A unique regularity, namely that the solute–defect interaction becomes more attractive when the solute element has more valence electrons, has been reported for the interactions between transition metal elements and various types of crystalline defects of different dimensions in W/Mo alloys, including vacancies16, dislocations4,6,17, and grain boundaries (GBs)18.

Technically, quantifying the electronic contributions may provide effective and robust descriptors to represent the features of materials in complex compositional and structural spaces. Both first-principles calculations and atomistic simulations using empirical potentials often struggle to provide computationally efficient and chemically accurate descriptions of various types of complex defects simultaneously, especially for alloy systems. The recent development of data-centric materials science based on machine learning methods may help resolve the problem. However, these new methods usually require descriptors derived from physical principles to improve their transferability19,20,21. Electronic structures related to defect–solute interactions are potential candidates for such descriptors, as suggested by many recent first-principles calculations. Some of these studies were related to electronic band filling effects14,22,23; others indicated alternative electronic structure features that can affect energetic properties of the transition metal alloys, including d-band bimodality24, the transition between eg and t2g orbital sets25, eg/t2g population ratio17, and upper band edge26.

Using first-principles calculations based on density functional theory (DFT), herein we show that the binding behavior between transition metal substitutional solute elements and various types of crystalline defects (zero-, one- and two-dimensional (0D, 1D, and 2D, respectively)) in non-magnetic bcc refractory metals is highly correlated to the variations in the local electronic structures of the matrix atom in the unalloyed defect. This correlation largely depends on two electronic descriptors inspired by tight-binding theory24,27,28,29,30. One descriptor is the variation in the bimodality feature of the d-orbital local density of states (LDOS) of the matrix atom before substitution; the other is the change in the bond hybridization strength between the valence sp- and d-bands of the same matrix atom. Moreover, based on these two electronic descriptors, a linear regression model is proposed to describe the solute–defect interaction energies in binary alloys of bcc refractory metals with transition metal substitutional solutes. For a particular pair of solute–matrix elements, this linear correlation is valid independent of the types of defects and the locations of substitutional sites. We also provide detailed examples to demonstrate the potential of this correlation for efficient predictions of the defect–solute interaction energies at different atomic sites in complex defect structures. The prediction accuracy can be further improved by a residual-corrected nonparametric regression model based solely on descriptors established from the local electronic structures of the matrix atom. The observed generality of the solute–defect interaction can provide quantitative physical guidance on the proper selection of solute elements to control the crystalline defects in alloys with targeted properties.

## Results

### Solute interaction and LDOS of dislocation core

Figure 1a shows the calculated interaction energy (i.e., binding energy) Eint between the $$\frac{1}{2}\left\langle {111} \right\rangle$$ screw dislocation core and five types of transition metal substitutional solutes in bcc W, namely, Ta, Re, Os, Ir, and Pt. In this paper, positive/negative values of Eint indicate attractive/repulsive interactions between solutes and defects. The dislocation structure is fully relaxed to reach its equilibrium state in pure W and subsequently used for solute substitution. The interaction energies are calculated under two conditions: relaxing and fixing atomic positions during the total energy calculations of the solute-doped dislocation structures. Therefore, the difference between the relaxed $$\left( {E_{{\mathrm{int}}}^{{\mathrm{relax}}}} \right)$$ and fixed-lattice interaction energies $$\left( {E_{{\mathrm{int}}}^{{\mathrm{fix}}}} \right)$$ gives the energy gained by the relaxation of the W lattice upon the solute substitution. As shown in Fig. 1a, both the relaxed and fixed-lattice interaction energies are negative for the solute with fewer d electrons than W and become more positive when the solute has more d electrons. In addition, the relative difference between $$E_{{\mathrm{int}}}^{{\mathrm{relax}}}$$ and $$E_{{\mathrm{int}}}^{{\mathrm{fix}}}$$ is small for all the solutes. These results indicate that the observed dependence of the interaction energies on the number of d electrons of the solute element mainly originates from the local changes in the electronic structure near the dislocation core rather than the effects of the lattice relaxation upon the solute substitution.

Owing to the localized characteristics of d orbitals, the LDOS of transition metals can display considerable shape features that are characteristic of the given crystal structure27,29. Using W as an example, Fig. 1b shows that the bcc structure results in a bimodal d-band LDOS (solid-blue line) with a pseudo-band gap in the middle of the d-band, while the LDOS of close-packed structures (i.e., face-centered cubic (fcc)/hexagonal close-packed (hcp)) has a unimodal shape (solid-orange line). Interestingly, it is found that the LDOS of the W atom surrounding the screw dislocation core (dashed-blue line) has a less bimodal shape compared to that of perfect bcc, as a consequence of the change in local atomistic structures. Similar variation in LDOS is also observed for the $$\frac{1}{2}\left\langle {111} \right\rangle$$ screw dislocation in Nb and Mo31. The bimodality distinction of LDOS was found previously to be essential for differentiating the energetic stabilities between the bulk phases with bcc and close-packed structures in transition metal systems27,28,29. When the d-band is about half-filled, the Fermi level (EF) is located close to the minimum of the pseudo-band gap in the LDOS of the bcc structure, as shown in Fig. 1b. Qualitatively speaking, the LDOS of the bcc structure has more occupied states far below EF and fewer occupied states close to EF compared to that of the fcc/hcp structure when the d-band is about half-filled29. This leads to a lower electronic band energy, which makes the bcc structure more stable than the close-packed structure29.

Interestingly, solute substitutions do not significantly change the bimodality features of LDOS for the dislocation core and the bcc bulk site, showing characteristics of the so-called canonical d-band27,29,32. Figure 1c, d show the LDOS of atoms at a dislocation core site and a bulk bcc site far away from the core when these sites are occupied by Re or Ta instead of W, respectively. The solute atom at the core site still has a less bimodal LDOS compared with its counterpart at the bulk site. However, the filling fraction of the local d-band of the solute atom is changed as it has a different number of d electrons than W. As Re has more d electrons than W, the position of EF on the LDOS of Re shifts away from the minimum of the pseudo-band gap, toward the right band edge. Moreover, it is found that EF keeps shifting closer to the right band edge for solutes with more d electrons (Supplementary Fig. 6). According to bond-order potential theory, a structure with a less bimodal DOS can usually be stabilized when the filling fraction is toward the band edges, while a more bimodal DOS is favored for a half-filled band27,28,29,30. Therefore, compared to placing W atoms at the core site, the system may benefit from a stabilization contribution from the band energy when the core site is occupied by a solute atom with more d electrons than W. Correspondingly, there is a positive/attractive interaction tendency between the dislocation core and these solute elements as shown in Fig. 1a. A similar solute-induced stabilization mechanism has also been demonstrated on the $$\left\{ {112\bar 1} \right\}$$ twin boundary (TB) of hcp Re24. On the other hand, compared to that of the W atom, EF shifts to a position even closer to the minimum of the pseudo-band gap of the LDOS of the Ta solute as shown in Fig. 1d. Since the difference in the number of occupied states close to EF between the core and bulk LDOS may be maximized at the minimum of the pseudo-band gap, the Ta atom should be less preferred by the core site than the W atom when the occupied states close to and far below EF are considered. This consequently yields a negative/repulsive interaction energy as shown in Fig. 1a.
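The band-filling argument above can be illustrated with a deliberately crude toy model, which is an assumption introduced here for illustration and is not part of the DFT analysis. A flat (rectangular) DOS stands in for a unimodal d-band and two delta peaks stand in for a strongly bimodal d-band. Both are normalized to one state and share the same second moment, and the band energy is measured relative to the band center. The function names and model shapes are hypothetical.

```python
import math


def band_energy_unimodal(filling, width=1.0):
    """Band energy per state of a flat DOS on [-width/2, +width/2].

    Integrating E/width from the band bottom up to the Fermi level for a
    filling fraction f gives -width * f * (1 - f) / 2.
    """
    return -0.5 * width * filling * (1.0 - filling)


def band_energy_bimodal(filling, width=1.0):
    """Band energy per state of two equal delta peaks at -a and +a.

    a = width / sqrt(12) matches the second moment of the flat DOS above.
    The lower peak fills first; beyond half filling the upper peak fills.
    """
    a = width / math.sqrt(12.0)
    return -a * min(filling, 1.0 - filling)


if __name__ == "__main__":
    for f in (0.5, 0.7, 0.9):
        e_uni = band_energy_unimodal(f)
        e_bi = band_energy_bimodal(f)
        favored = "bimodal" if e_bi < e_uni else "unimodal"
        print(f"filling={f:.1f}  unimodal={e_uni:+.4f}  "
              f"bimodal={e_bi:+.4f}  lower: {favored}")
```

In this toy model the bimodal shape gives the lower band energy at half filling, while the unimodal shape becomes lower once the filling moves toward the band edge, in line with the qualitative trend described above.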

### Electronic attributes of solute–defect interactions

The results of Fig. 1 reveal a qualitative correlation between the d-band bimodality and the solute–dislocation interaction in the binary alloys of bcc W and transition metal solutes. To further explore this correlation, we investigate the local electronic structures of atoms near several 0D, 1D, and 2D defects in pure W, including the mono-vacancy, ⟨100⟩-dumbbell, ⟨111⟩-dumbbell, $$\frac{1}{2}\left\langle {111} \right\rangle$$ screw dislocation, Σ3$$\left( {11\bar 2} \right)$$ TB, and Σ3(111), Σ5(310), and Σ5(210) GBs. To quantify the bimodality of the DFT-calculated LDOS, Hartigan’s dip test was performed33,34. A completely unimodal LDOS corresponds to a test statistic of 0, while a more bimodal LDOS has a larger value of the test statistic33,34. We then use a parameter, Δdip, to quantify the change in the bimodality of the LDOS of the atoms near the defect relative to a reference atom that is far away from the defect, where Δdip = dip(reference) − dip(defect). Therefore, a W atom at a site with a more positive Δdip has a less bimodal LDOS compared to the atom at the reference site. Furthermore, for the W atoms where the Δdip calculations are performed, we also calculate the corresponding fixed-lattice solute–defect interaction energies $$\left( {E_{{\mathrm{int}}}^{{\mathrm{fix}}}} \right)$$ when these W atoms are substituted by the Pt, Re, and Ta solutes, respectively. The results are summarized in Supplementary Note 2. In addition, as for the solute–dislocation interactions, it is found that the effects of solute-induced lattice relaxation on the interaction energy are also small for the other defect structures in W (details in Supplementary Note 3).

By comparing the calculated Δdip with $$E_{{\mathrm{int}}}^{{\mathrm{fix}}}$$, we find that the variations in $$E_{{\mathrm{int}}}^{{\mathrm{fix}}}$$ of the Re and Pt solutes are strongly correlated with the variations in the bimodality of the LDOS for the W atoms that are being substituted at sites with different separation distances from the defect center. Taking the $$\frac{1}{2}\left\langle {111} \right\rangle$$ screw dislocation as an example, as shown in Fig. 2a, the defect site with a higher Δdip generally has a more attractive interaction with the solutes (higher $$E_{{\mathrm{int}}}^{{\mathrm{fix}}}$$). This correlation is consistent with the analyses in Fig. 1b–d, since a more positive Δdip corresponds to a less bimodal LDOS feature for the W atom at that site. If we assume that the solute substitutions do not significantly change the bimodality features of LDOS as shown in Fig. 1c, d, a less bimodal LDOS indicates that this atomic site prefers to be occupied by solute atoms with more d electrons than W because EF will be at a position closer to the edge of their d-band. In addition, the correlation between Δdip and $$E_{{\mathrm{int}}}^{{\mathrm{fix}}}$$ is found to be also valid for the Re and Pt solutes interacting with defects in transition states, such as the generalized stacking faults (GSF) shown in Supplementary Note 4.

Moreover, if we plot all the calculated $$E_{{\mathrm{int}}}^{{\mathrm{fix}}}$$ together with respect to the corresponding Δdip parameter, an approximately linear relationship is revealed between $$E_{{\mathrm{int}}}^{{\mathrm{fix}}}$$ and Δdip for both Re- and Pt-substitutional solutes, as shown in Supplementary Fig. 11a, b, respectively. These results indicate that the filling energy of the d-band associated with the bimodality variation indeed makes a significant contribution to the solute–defect interaction energy, which can be quantitatively described by the Δdip parameter. On the other hand, compared to the W–Re and W–Pt systems, the correlation between $$E_{{\mathrm{int}}}^{{\mathrm{fix}}}$$ and Δdip in the W–Ta system is more scattered. For example, as shown in Fig. 2b, the Ta solute generally interacts in a repulsive way with the W Σ3$$\left( {11\bar 2} \right)$$ TB, which yields a negative correlation between $$E_{{\mathrm{int}}}^{{\mathrm{fix}}}$$ and Δdip (Δdip > 0 → $$E_{{\mathrm{int}}}^{{\mathrm{fix}}}$$ < 0), consistent with the analyses in Fig. 1d. However, quantitative discrepancies can be seen for several individual sites near the defects. For example, sites 4 and 5 in the Σ3$$\left( {11\bar 2} \right)$$ TB shown in Fig. 2b have nearly zero values of Δdip but notable values of $$E_{{\mathrm{int}}}^{{\mathrm{fix}}}$$. This implies that there could be other underlying mechanisms contributing to the solute–defect interaction energies, which cannot be described by the Δdip term alone.

One possible mechanism could be the energy contributions from the valence sp-band. Owing to the covalent feature of the d-band, the valence sp-band can be strongly hybridized with, and thus strongly influenced by, the valence d-band. Within a tight-binding framework35,36,37,38,39,40,41,42, the strength of the spd hybridization (Esp) of an atom in transition metal alloys can be correlated with a function of (i) the interatomic distances between the atom and its neighboring atoms (dij) and (ii) the spatial extents of the d-orbitals of the atom and its neighboring atoms ($$r_{d_i}$$ and $$r_{d_j}$$), namely $$E_{{\mathrm{sp}}} \propto \mathop {\sum }\limits_j r_{d_i}^{\frac{3}{2}}r_{d_j}^{\frac{3}{2}}/d_{ij}^5$$ (see Supplementary Note 5 for details). This suggests that the strength of the spd hybridization in a defect structure should vary with each individual atom since dij of the atom at each defect site can be different and the $$r_{d_i}$$ of the solute element can differ from that of the neighboring matrix element. Therefore, the effect of the spd hybridization should not be ignored in determining solute–defect interactions in the bcc refractory alloys.
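The distance dependence of the hybridization expression above can be sketched numerically. The following is a minimal illustration, assuming arbitrary units and made-up values for the d-orbital extents and neighbor distances; the function name is hypothetical and the proportionality prefactor is left out because it is not given here.

```python
def spd_hybridization_sum(r_d_center, neighbors):
    """Return sum_j r_di^(3/2) * r_dj^(3/2) / d_ij^5 for one atom.

    neighbors is a list of (r_d_j, d_ij) pairs. The result is only
    proportional to E_sp; the prefactor is not specified here.
    """
    total = 0.0
    for r_d_neighbor, distance in neighbors:
        total += (r_d_center ** 1.5) * (r_d_neighbor ** 1.5) / distance ** 5
    return total


if __name__ == "__main__":
    r_d = 1.0  # hypothetical d-orbital extent, arbitrary units
    # Eight equidistant neighbors (illustrative distances, not DFT values).
    bulk_like = [(r_d, 2.74)] * 8
    # Same shell with all distances shortened by 5%.
    compressed = [(r_d, 2.74 * 0.95)] * 8
    ratio = (spd_hybridization_sum(r_d, compressed)
             / spd_hybridization_sum(r_d, bulk_like))
    print(f"relative change for 5% shorter bonds: {ratio:.3f}")
```

Because of the fifth-power dependence on distance, even small changes in the local bond lengths around a defect site change the sum appreciably, which is why the local geometry enters the second descriptor.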

### General correlation between electronic descriptors and $${\boldsymbol{E}}_{{\mathbf{int}}}^{{\mathbf{fix}}}$$

Based on the discussion above, we propose a linear regression model that approximates the solute–defect interaction energy $$\left( {E_{{\mathrm{int}}}^{{\mathrm{fix}}}} \right)$$ as the sum of two parts, as shown in Eq. (1),

$$E_{{\mathrm{int}}}^{{\mathrm{fix}}} \approx \Delta E_{\mathrm{d}} + \Delta E_{{\mathrm{sp}}} \approx a_1\Delta {\mathrm{dip}} + a_2x_{{\mathrm{sp}}}$$
(1)

Here ΔEd represents the energy contribution due to the d-band filling, which may correlate linearly with the changes in the bimodality of the d-band through the Δdip term and a fitting coefficient, a1. The second part in Eq. (1), ΔEsp, represents the energy contribution related to the spd hybridization. We propose that ΔEsp can also be estimated through a fitting coefficient, a2, and a variable, xsp, that describes the local environment of the defect site related to the spd hybridization.

In the present work, xsp of a matrix atom near the defect in pure metals is proposed to be,

$$x_{{\mathrm{sp}}} = 1 - \frac{{\left( {V_{{\mathrm{vor}}}^{{\mathrm{def}}}} \right)^{ - \frac{5}{3}}/\epsilon _{{\mathrm{sp}}}^{{\mathrm{def}}}}}{{\left( {V_{{\mathrm{vor}}}^{{\mathrm{ref}}}} \right)^{ - \frac{5}{3}}/\epsilon _{{\mathrm{sp}}}^{{\mathrm{ref}}}}}$$
(2)

where $$V_{{\mathrm{vor}}}^{{\mathrm{def}}}$$ and $$V_{{\mathrm{vor}}}^{{\mathrm{ref}}}$$ are the Voronoi volumes of the atom at the defect and reference sites, respectively, and $${\it{\epsilon }}_{{\mathrm{sp}}}^{{\mathrm{def}}}$$ and $${\it{\epsilon }}_{{\mathrm{sp}}}^{{\mathrm{ref}}}$$ are the centers of the occupied sp-band projected on the atom at the defect and reference sites, respectively. The reference site is the same as the one used for the calculation of Δdip and $$E_{{\mathrm{int}}}^{{\mathrm{fix}}}$$. The $${\it{\epsilon }}_{{\mathrm{sp}}}^{{\mathrm{def}}}$$ term is calculated as

$$\epsilon _{{\mathrm{sp}}}^{{\mathrm{def}}} = {\int}_{\!\!\!- \infty }^0 {E\rho _{{\mathrm{sp}}}^{{\mathrm{def}}}} \left( E \right)dE/{\int}_{\!\!\! - \infty }^0 {\rho _{{\mathrm{sp}}}^{{\mathrm{def}}}} \left( E \right)dE$$
(3)

where $$\rho _{{\mathrm{sp}}}^{{\mathrm{def}}}\left( E \right)$$ is the projected LDOS of the sp-band on the atom at the defect site and the Fermi energy EF is set to zero. $${\it{\epsilon }}_{{\mathrm{sp}}}^{{\mathrm{ref}}}$$ is calculated in the same way for the atom at the reference site. In Eq. (2), the Voronoi volume (Vvor) is used to describe the average changes in the interatomic distances (dij) of the atoms near the defect, and $$1/{\it{\epsilon }}_{{\mathrm{sp}}}$$ is included as a scaling term for the effects of spd hybridization on solute–defect interactions (see Supplementary Note 6 for details). Like the Δdip term, the Voronoi volume and LDOS of the sp-band are also determined from the DFT calculations of relaxed atomic structures of pure matrix metals that contain defects. Herein we expect that the electronic features of the matrix atoms at defects are mainly captured by the Δdip and xsp parameters, while the fitting coefficients a1 and a2 should be fixed values for each matrix–solute element pair.
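As a rough sketch of how Eqs. (2) and (3) could be evaluated from tabulated data, the snippet below integrates a projected sp-LDOS with the trapezoidal rule and then forms xsp. It assumes an ascending energy grid measured from EF = 0 that contains the point E = 0; the function names, the grid, and the numerical inputs are illustrative assumptions, not values from the DFT calculations.

```python
def occupied_band_center(energies, dos):
    """Eq. (3): center of the occupied band, with E_F at zero.

    energies must be ascending and relative to E_F; dos holds the
    projected sp-LDOS on the same grid. Only intervals that lie
    entirely at or below E_F contribute (trapezoidal rule).
    """
    numerator = 0.0
    denominator = 0.0
    for k in range(len(energies) - 1):
        e0, e1 = energies[k], energies[k + 1]
        if e1 > 0.0:
            break  # states above the Fermi level are excluded
        step = e1 - e0
        numerator += 0.5 * (e0 * dos[k] + e1 * dos[k + 1]) * step
        denominator += 0.5 * (dos[k] + dos[k + 1]) * step
    if denominator == 0.0:
        raise ValueError("no occupied states found on the grid")
    return numerator / denominator


def x_sp(v_def, eps_def, v_ref, eps_ref):
    """Eq. (2): descriptor built from Voronoi volumes and sp-band centers."""
    defect_term = v_def ** (-5.0 / 3.0) / eps_def
    reference_term = v_ref ** (-5.0 / 3.0) / eps_ref
    return 1.0 - defect_term / reference_term


if __name__ == "__main__":
    grid = [-8.0 + 0.5 * k for k in range(21)]  # -8 eV ... +2 eV
    flat_dos = [1.0] * len(grid)                # made-up constant LDOS
    eps = occupied_band_center(grid, flat_dos)
    print("band center of the flat test LDOS:", eps)
    # Hypothetical volumes: a slightly expanded defect site.
    print("x_sp for a 17.0 vs 16.0 volume site:", x_sp(17.0, eps, 16.0, eps))
```

With equal band centers, a site with a larger Voronoi volume than the reference gives a positive xsp in this sketch, and a site identical to the reference gives xsp = 0.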

Based on Eq. (1), we perform linear regressions to model the DFT-calculated $$E_{{\mathrm{int}}}^{{\mathrm{fix}}}$$ of the crystalline defects in the W–Ta, W–Re, and W–Pt binary alloy systems. Δdip and xsp are treated as regression variables; a1 and a2 are fitting coefficients. As shown in Fig. 3, the solute–defect interaction energies $$\left( {E_{{\mathrm{int}}}^{{\mathrm{fix}}}} \right)$$ predicted by the proposed linear model show good agreement with the results of DFT calculations for the W alloys with different transition metal solutes (i.e., Ta, Re, and Pt). Good regression quality is also demonstrated by the close-to-one value of adjusted R2 as listed in Table 1.
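A two-coefficient fit of the form in Eq. (1) can be sketched with ordinary least squares. The snippet below assumes a fit without an intercept, since Eq. (1) contains none; whether the original regressions used exactly this setup is not stated here. The function name and the synthetic data (generated from made-up coefficients) are purely illustrative.

```python
def fit_two_descriptor_model(delta_dip, x_sp, e_int):
    """Least-squares fit of e_int ~ a1 * delta_dip + a2 * x_sp (no intercept).

    Solves the 2x2 normal equations with Cramer's rule.
    """
    s11 = sum(d * d for d in delta_dip)
    s12 = sum(d * x for d, x in zip(delta_dip, x_sp))
    s22 = sum(x * x for x in x_sp)
    b1 = sum(d * e for d, e in zip(delta_dip, e_int))
    b2 = sum(x * e for x, e in zip(x_sp, e_int))
    det = s11 * s22 - s12 * s12
    if abs(det) <= 1e-12 * s11 * s22:
        raise ValueError("descriptors are collinear; fit is ill-defined")
    a1 = (b1 * s22 - s12 * b2) / det
    a2 = (s11 * b2 - s12 * b1) / det
    return a1, a2


if __name__ == "__main__":
    # Synthetic descriptor values and energies, not DFT data.
    delta_dip = [0.000, 0.010, 0.020, 0.015, 0.005]
    x_sp = [0.00, 0.05, -0.02, 0.08, 0.03]
    true_a1, true_a2 = 10.0, -2.0  # made-up coefficients
    e_int = [true_a1 * d + true_a2 * x for d, x in zip(delta_dip, x_sp)]
    a1, a2 = fit_two_descriptor_model(delta_dip, x_sp, e_int)
    print(f"fitted a1={a1:.4f}, a2={a2:.4f}")
```

Once a1 and a2 are fixed for a given matrix–solute pair, predicting the interaction energy at a new site only requires the two descriptors of the matrix atom at that site.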

Considering the similarity of the crystal and electronic structures of group V and VI bcc elements, one would naturally wonder whether Eq. (1) can also be generally applied to model the solute–defect interactions in binary alloys of group V elements and transition metal solutes. To explore the possible correlation, we also perform DFT calculations to obtain the Δdip and xsp of atoms in several 0D, 1D, and 2D crystalline defects in pure Ta. As expected, it is found that Ta atoms near the defect center also generally have a less bimodal LDOS compared to those far away. For example, the d-orbital LDOS for a Ta atom exactly on the interface plane of the Σ3$$\left( {11\bar 2} \right)$$ TB is plotted in Fig. 4a, showing less bimodal characteristics compared to the LDOS of a Ta atom far away from the interface.

The fixed-lattice solute–defect interaction energies $$\left( {E_{{\mathrm{int}}}^{{\mathrm{fix}}}} \right)$$ are also calculated correspondingly when Ta atoms are substituted by the Hf and Os solutes. Linear regressions based on Eq. (1) are performed to model the DFT-calculated $$E_{{\mathrm{int}}}^{{\mathrm{fix}}}$$. Parity plots of the regression results are shown in Fig. 4b, c for the Ta–Hf and Ta–Os systems, respectively. The regression coefficients and parameters are listed in Table 1. As shown by both Fig. 4 and Table 1, the proposed linear regression model (Eq. (1)) can be generally applied to quantitatively describe the solute–defect interactions in Ta-based alloys as well.

### Improving the accuracy of the linear correlation

As shown in Figs. 3 and 4, a few outliers still appear in the predictions of the linear regression model, with apparent discrepancies from the DFT results. Interestingly, we find that these outliers usually appear repeatedly at particular defect sites in multiple alloying systems. Scrutinizing the local electronic structures of the matrix atoms at these outlier sites, we find that there are some additional local features in their LDOSs. These features could affect the solute–defect energetics but are not sufficiently described by the Δdip and xsp parameters, resulting in large prediction errors. A more detailed explanation can be found in Supplementary Note 8.

The above finding suggests that the remaining residuals of the linear regression model can be reduced if the model includes some other descriptors of the electronic bands in addition to Δdip and xsp. As indicated by recent DFT calculations, the energetic properties of the transition metal alloys could connect closely with many band features, including the transition between eg and t2g orbital sets25, eg/t2g population ratio17, band occupation fraction14,22,23, and upper band edge26. Therefore, we propose an additional regression function, which is added to Eq. (1) to further correct the remaining residuals from the linear regression. Accordingly, the solute–defect interaction energy $$\left( {E_{{\mathrm{int}}}^{{\mathrm{fix}}}} \right)$$ is now proposed to be approximated as,

$$E_{{\mathrm{int}}}^{{\mathrm{fix}}} \approx a_1\Delta {\mathrm{dip}} + a_2x_{{\mathrm{sp}}} + f_{{\mathrm{r}} - {\mathrm{c}}}\left( {D_i,D_j, \ldots } \right)$$
(4)

where the first two terms of the equation are the linear model described by Eq. (1) with the same a1/a2 from Table 1. fr–c(Di, Dj,…) is the residual-correction function established by regressing the residuals of the linear model (Δlinear ≡ $$E_{{\mathrm{int}}}^{{\mathrm{fix}}}$$ − (a1Δdip + a2xsp)) on a broader set of 23 potential electronic descriptors (Di, Dj,…). These descriptors include Δdip and xsp; they also contain the band center and root-mean-square width of the whole d-orbital, the eg and t2g orbital sets, and the sp-orbitals. In addition, these descriptors include the individual bimodalities of the eg and t2g orbital sets. All of these 23 descriptors are available from the DFT calculations of the defects relaxed in pure metals of the matrix elements. A detailed description of the descriptor construction is included in Supplementary Note 9.

In the present work, the residual-correction function, fr–c(Di, Dj,…), is developed based on a sophisticated local regression model, as implemented in the Locfit package43,44,45,46. The model performs a series of kernel-weighted local linear regressions within a moving window across the descriptor space, which gives the largest weight to observations close to the center of the window and produces a smooth curve that runs through the middle of the observations44,45,46. The local regression is performed with only 4 of the 23 potential electronic descriptors at a time to mitigate the risk of overfitting. Within a cross-validation framework, we select five sets of descriptors (each set containing four descriptors) that provide the best regression accuracy on average in all the five solute–matrix systems studied in the present work, and all of these five descriptor sets have two or three descriptors in common. We then establish the residual-correction function by averaging the corresponding local regression models of these five sets of descriptors. More details on the algorithms and calculation procedures of this statistical model can be found in Supplementary Note 9.
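The idea of a kernel-weighted local linear regression can be sketched in one dimension. The snippet below is a simplified stand-in and not the Locfit implementation: it uses a single descriptor instead of four, a fixed window half-width, and a tricube weight, all of which are assumptions made for illustration. The function name and the data are hypothetical.

```python
def local_linear_predict(x_train, y_train, x0, bandwidth):
    """Predict y at x0 from a tricube-weighted straight-line fit.

    Points farther than `bandwidth` from x0 get zero weight; points
    close to x0 get the largest weight.
    """
    sw = swx = swxx = swy = swxy = 0.0
    for x, y in zip(x_train, y_train):
        u = abs(x - x0) / bandwidth
        if u >= 1.0:
            continue
        w = (1.0 - u ** 3) ** 3  # tricube kernel
        dx = x - x0
        sw += w
        swx += w * dx
        swxx += w * dx * dx
        swy += w * y
        swxy += w * dx * y
    if sw == 0.0:
        raise ValueError("no training points inside the window")
    det = sw * swxx - swx * swx
    if det <= 1e-12 * sw * swxx:
        return swy / sw  # fall back to a weighted mean
    # Intercept of the weighted fit y = b0 + b1 * (x - x0), i.e. y(x0).
    return (swxx * swy - swx * swxy) / det


if __name__ == "__main__":
    # Hypothetical descriptor values and linear-model residuals.
    descriptor = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    residual = [0.02, -0.01, 0.03, 0.08, 0.05, -0.02, 0.00]
    correction = local_linear_predict(descriptor, residual, 0.35, 0.25)
    print(f"estimated residual correction at 0.35: {correction:+.4f}")
```

In the multi-descriptor setting described above, the same principle applies with distances measured in the selected four-descriptor space, and the final correction is the average over the chosen descriptor sets.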

The regression results of the improved model based on Eq. (4) (referred to as the linear + fr-c model in the following) are plotted against the original DFT data in Fig. 5a, b for the W–Re and Ta–Hf systems, respectively. The regression results from the linear model based solely on Δdip and xsp (Eq. (1)) are also included for comparison. As shown in both figures, the linear + fr-c model indeed yields better agreement with the original DFT results. The parity plots of the W–Ta, W–Pt, and Ta–Os systems are shown in Supplementary Fig. 17, where the improvement of the regression accuracy is also clearly observed.

### Prediction of solute segregation in complex GB structures

Since all the descriptors used in the present linear correlation model and the regression model are available from the LDOSs of atoms at or near the relaxed defect structures in pure metals, one could apply the model to efficiently predict the solute–defect interaction energy of any atomic site in the defects of interest, especially those with complex geometries. Here we show some examples in both Ta and W matrices for two complex GBs, namely the Σ13 (230) and Σ27 (552) GBs. These two GB structures both have high-index GB planes and complex geometries, which require large supercells to accommodate (Supplementary Fig. 4). In particular, the input geometry of the Σ27 (552) GB is taken from a ground-state structure in W predicted by a state-of-the-art evolutionary structure search algorithm47,48. The prediction results of the linear (Eq. (1)) and the linear + fr-c (Eq. (4)) models based on electronic descriptors from DFT calculations of the unalloyed GBs are shown as parity plots in Fig. 5c, d for the W–Re and Ta–Hf systems, respectively, in comparison with the DFT-computed $$E_{{\mathrm{int}}}^{{\mathrm{fix}}}$$. As shown by the blue symbols, the predictions from the two-descriptor linear model alone already reach fairly good agreement with the DFT results for both GBs in both systems, indicating that the major energy contributions to $$E_{{\mathrm{int}}}^{{\mathrm{fix}}}$$ are well captured by the linear model. Moreover, by adding the residual-correction function (fr-c), the linear + fr-c model (orange symbols) yields even better agreement, especially for the sites where the predictions of the linear model have large deviations. Similar validation results are also observed for the W–Ta, W–Pt, and Ta–Os systems, as shown in Supplementary Fig. 17.

With the predicted solute–defect interaction energies at each defect site, one can use the White–Coghlan site occupation model49,50 to estimate the GB solute concentration isotherms under an assumption of non-interacting solutes,

$$c_{{\mathrm{GB}}} = \frac{1}{N}\mathop {\sum}\nolimits_{i = 1}^N {\frac{1}{{1 + \frac{{1 - c_{{\mathrm{bulk}}}}}{{c_{{\mathrm{bulk}}}}}{\mathrm{exp}}\left( { - \frac{{E_{{\mathrm{int}}}^{X,i}}}{{k_{\mathrm{B}}T}}} \right)}}}$$
(5)

where $$E_{{\mathrm{int}}}^{X,i}$$ is the interaction energy of solute X when it occupies the ith of N sites at the GB, kB is the Boltzmann constant, T is temperature, and cbulk is the solute concentration in the bulk matrix (fixed at 2 at.% here). The solute concentration isotherms calculated using the $$E_{{\mathrm{int}}}^{X,i}$$ predicted by both the linear and linear + fr-c models are compared with those calculated using DFT-computed $$E_{{\mathrm{int}}}^{X,i}$$. As shown in Fig. 6a, b, for both GBs and all five studied solute–matrix systems, the interaction energies predicted by the linear + fr-c model give concentration isotherms that are very close to the DFT reference curves across a wide temperature range. The largest deviation is seen for Pt in the W (552) GB in the high-temperature range, at about 6 at.%. In fact, the curves calculated using the interaction energies predicted by the linear model alone are already in fairly good agreement with the DFT references, except for the case of Pt in the W (552) GB at low temperature.
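Equation (5) is straightforward to evaluate once per-site interaction energies are available. The following minimal sketch uses the sign convention of this paper (positive energies are attractive); the function name and the list of site energies are hypothetical and only serve to show the temperature dependence.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def gb_concentration(e_int_sites, temperature, c_bulk=0.02):
    """Eq. (5): average GB site occupancy for non-interacting solutes.

    e_int_sites holds the interaction energy (eV) of each GB site,
    temperature is in kelvin, and c_bulk is the bulk solute fraction.
    """
    total = 0.0
    for e_int in e_int_sites:
        boltzmann = math.exp(-e_int / (K_B_EV_PER_K * temperature))
        total += 1.0 / (1.0 + (1.0 - c_bulk) / c_bulk * boltzmann)
    return total / len(e_int_sites)


if __name__ == "__main__":
    # Made-up site energies in eV, not DFT or model predictions.
    site_energies = [0.8, 0.3, 0.0, -0.2]
    for temperature in (500.0, 1000.0, 2000.0):
        c_gb = gb_concentration(site_energies, temperature)
        print(f"T={temperature:6.0f} K  c_GB={100.0 * c_gb:5.1f} at.%")
```

A site with zero interaction energy is occupied at exactly the bulk concentration, and the GB concentration of attractive sites falls as the temperature rises, which is the behavior traced by the isotherms.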

These results suggest that, with the present model, one can estimate the interaction energies in complex defect structures with reasonably small uncertainty for the prediction of solute segregation isotherms. Instead of running many case-by-case calculations for substitutional solutes at different atomic sites surrounding a specific defect, only one DFT calculation for this defect in the pure matrix metal is needed to obtain the local electronic descriptors. It has to be emphasized that, although the root-mean-squared errors are 0.03–0.1 eV for defect–solute interaction energies (varying from ~−1.0 eV to ~+3.0 eV) at individual defect sites in these five matrix–solute pairs, we still obtain reasonably good accuracy in the prediction of solute segregation because the concentration values depend on the defect–solute binding energies of multiple sites at or near the defects. There could be a risk of large errors if the current linear or linear + fr-c model is applied to predict solute effects on defect properties that are sensitive to the solute interaction with one particular defect site.

## Discussion

There are two major aspects that require further investigations to understand and improve our proposed numerical model for solute–defect interactions and defect properties in more general cases. For the first aspect, fundamental and quantitative physical mechanisms are needed to interpret the most effective descriptors and corresponding coefficients. As the linear correlation model is inspired by the moment analysis of DOS based on tight-binding theory27,28,29,30, it would deepen our understanding of solute–defect interactions if we can also provide physical interpretation of the fitting coefficients.

The fitting coefficients (a1 and a2) in Table 1 indeed show a strong dependence on the number of d electrons of the solute element. In W alloys, the Δdip term yields a positive contribution (a1 > 0) to $$E_{{\mathrm{int}}}^{{\mathrm{fix}}}$$ for solutes with more d electrons than W (e.g., Re and Pt), while it yields a negative contribution (a1 < 0) for solutes with fewer d electrons (e.g., Ta), which is consistent with our analysis in Fig. 1b–d. In Ta alloys, this contribution becomes positive (negative) for solutes with fewer (more) d electrons than Ta, e.g., Hf vs. Os. This is because the relative position of EF on the LDOS of the d-band is intrinsically different between Ta and W when they serve as the matrix element. As shown in Fig. 4a, EF of the Ta matrix is located on the lower energy side of the bcc pseudo-band gap, unlike the position of EF in the W matrix shown in Fig. 1b. Therefore, when alloying Ta with solutes that have fewer (more) d electrons, such as Hf (Os), the position of EF on the local d-band of the solute atom would further shift away from (toward) the pseudo-band gap compared to that of the Ta matrix atom, leading to a positive (negative) contribution to $$E_{{\mathrm{int}}}^{{\mathrm{fix}}}$$ in terms of the Δdip parameter. Moreover, by alloying Ta with a solute element having even more d electrons (e.g., Au), EF should continuously move across the pseudo-band gap to the right edge of the d-band to generate a positive contribution to $$E_{{\mathrm{int}}}^{{\mathrm{fix}}}$$. Consequently, the energy contributions of the Δdip term in the alloys of group V elements should have an overall parabolic relationship with the number of d electrons of the solute, which may be reflected in some cases of the solute–defect interactions (e.g., Supplementary Fig. 18 and refs. 51,52).
In addition, in both Ta- and W-based alloys, the coefficient of the xsp term (a2) always has a positive sign if the solute element has fewer d electrons than the matrix element (e.g., W–Ta and Ta–Hf), while it has a negative sign if the difference in the number of d electrons is reversed. This correlation can be understood in terms of the difference in the spatial extent of the d-orbitals between the solute and matrix elements. Details are provided in Supplementary Note 10. These qualitative results provide the foundations for further quantitative investigations of the physical mechanisms of solute–defect interactions in refractory metals and beyond.

For the second aspect, although the linear model could be robust for general solute–defect interactions since it is based on physics-inspired mechanisms, the residual-correction model should be further improved for more accurate and efficient predictions. As shown in Figs. 5 and 6, our current methods are reasonably accurate in predicting defect properties that depend on the average effects of defect–solute interactions. However, improvements are still needed for predicting the individual defect–solute interaction at a specific defect site in the weak limit (|$$E_{{\mathrm{int}}}^{{\mathrm{fix}}}$$| < ~0.05 eV). Since the residual-correction functions were developed using a local regression method on a limited amount of data owing to the large computational cost (351 regression data points for 5 matrix–solute element pairs), the natural strategy to improve the accuracy and transferability of our method is to include more solute–defect interaction data and apply more advanced regression methods.

Furthermore, more representative and deterministic descriptors of electronic and atomistic structures can further improve the accuracy of our method. The discussions in Supplementary Note 8 show that Δdip has limitations in describing the characteristics of the d-band LDOS in specific situations. These problems are overcome by including other effective descriptors, such as the center of the d-band, the center of the sp-band, and Δdip of the eg orbitals, in the residual-correction model, but they may not be the final solutions. Moreover, the accuracy could be further increased if we replace Δdip with descriptors from deterministic methods, because Δdip has tiny fluctuations arising from its statistical evaluation, which relies on a random number generator. The fluctuations can cause prediction uncertainties on the level of ~0.001 eV. In addition, descriptors for atomistic structures can be included to account for the elastic contributions in the weak limit of interactions13,53,54.

In summary, our findings establish a general and quantitative correlation between electronic structure descriptors and the energetic stabilities of crystalline defects containing substitutional solute atoms in bcc refractory alloys. It is inspired by the classical theories of bulk phase stability based on electronic structures and applied to explain the energetic stabilities of local structural units at the atomistic level24. This correlation can potentially serve as a quantitative guideline for transition metal alloy design with targeted properties by controlling the effects of solute–defect interactions on defect stability and mobility. From a broader perspective, this study provides a robust example of, and a key step toward, constructing advanced theories that describe the quantitative connections between chemical bonding characteristics at the electronic level and macroscopic materials properties55,56,57. In addition, the observed electronic descriptors have the potential to be applied in data-centric materials innovation based on machine learning techniques58,59,60.

## Methods

### First-principles calculations

First-principles calculations in the present work were carried out using the projector augmented wave (PAW)61 method and the exchange-correlation functional described by the generalized gradient approximation of Perdew, Burke, and Ernzerhof62, as implemented in the Vienna ab initio simulation package (VASP)63. The energy cutoff of the plane-wave basis was 400 eV. Brillouin zone integration was performed using first-order Methfessel–Paxton smearing of 0.2 eV64. The k-point mesh in the first Brillouin zone was set according to the size and geometry of the simulation supercells (see Supplementary Method for details). The convergence criterion of the electronic self-consistent loop was set to $$10^{-7}$$ eV for the structure relaxation and $$10^{-8}$$ eV for the static calculations. The electronic configurations of the pseudopotentials used for the present first-principles calculations are summarized in Supplementary Table 1. As shown there, the semi-core 5p electrons are treated as valence electrons in the calculations for Hf, Ta, and W. However, the LDOS of the 5p-band is found to localize at very low energy states far away from the Fermi level and to have a very large energy gap with the 5d-, 6s-, and 6p-bands. We thus assume that the 5p electrons are essentially inner-core electrons that contribute very little to electronic bonding. Therefore, the LDOS of the 5p-band is not included in the band analysis based on Eq. (3).

First-principles calculations are performed in three steps to model the local electronic descriptors of the crystalline defects in bcc Ta and W and their interactions with substitutional solute atoms. In the first step, relaxation calculations are performed to obtain the optimized atomistic structures of crystalline defects in the pure metal matrix. In each relaxation calculation, the atoms and geometry of the simulation supercells are fully relaxed according to the Hellmann–Feynman forces, except for the $$\frac{1}{2}\left\langle {111} \right\rangle$$ screw dislocation and the GSF defects, which are treated differently because of their unique atomistic geometries. The relaxation of the $$\frac{1}{2}\left\langle {111} \right\rangle$$ screw dislocation is performed using the flexible boundary condition method65,66. The relaxation scheme consists of two steps: (1) the atoms near the dislocation core are relaxed by the conjugate gradient method based on DFT calculations, and (2) the atomic structures outside the core region are relaxed based on the lattice Green function4,6,65,66. The two steps are iterated until the maximum Hellmann–Feynman forces are <5 meV/Å (refs. 4,6). In the calculations of the GSF defects, the atoms are only allowed to relax along the direction perpendicular to the fault plane. In the second step, static calculations are performed based on the relaxed defect structures to obtain the projected LDOS on each atom in the supercells. Then the local electronic descriptors of each atomic site of interest are obtained from the DFT-calculated LDOSs and atomistic structures. In the third step, solute atoms are introduced to substitute individual solvent atoms at different separation distances from the defect center to investigate the solute–defect interactions. The relaxed defect structures in pure metals are used for solute substitution. 
After substitution, the interaction energies are then calculated under two different conditions: fixing and relaxing atomic positions during the total energy calculations of the solute-doped defect structures. The difference between the relaxed $$\left( {E_{{\mathrm{int}}}^{{\mathrm{relax}}}} \right)$$ and fixed-lattice interaction energies $$\left( {E_{{\mathrm{int}}}^{{\mathrm{fix}}}} \right)$$ gives the energy change due to the relaxation of the defect lattice upon the solute substitution. The fixed-lattice interaction energies are calculated for all solute–defect interactions considered in the present work, while the relaxed interaction energies are only calculated for a few defect sites in order to evaluate whether the lattice relaxation has a significant contribution to the solute–defect interaction energies. A detailed comparison between the calculated $$E_{{\mathrm{int}}}^{{\mathrm{fix}}}$$ and $$E_{{\mathrm{int}}}^{{\mathrm{relax}}}$$ is described in Supplementary Note 3.
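Written out, the lattice-relaxation contribution described above is the difference between the two interaction energies. The symbol $$\Delta E_{{\mathrm{relax}}}$$ is introduced here only for illustration and does not appear in the original text:

$$\Delta E_{{\mathrm{relax}}} = E_{{\mathrm{int}}}^{{\mathrm{relax}}} - E_{{\mathrm{int}}}^{{\mathrm{fix}}}$$

When the magnitude of this difference is small compared with $$E_{{\mathrm{int}}}^{{\mathrm{fix}}}$$ itself, the fixed-lattice value captures most of the solute–defect interaction at that site.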

### Hartigan’s dip test

Hartigan’s dip test is a statistical method proposed by Hartigan and Hartigan34, which measures the deviation of the cumulative distribution function of an empirical distribution from that of unimodal distributions. The test takes a sample from the distribution density as input and converts it into its unique corresponding cumulative distribution function, F(x). Since the distribution is empirical, the corresponding F(x) is a step function that jumps at each interval $$\left\{ {x_i} \right\}_{i = 1}^n$$, where n is the total number of intervals. The test has three major steps. First, based on all the possible intervals [xi, xj] of F(x), where 1 ≤ i ≤ j ≤ n, we generate a set of unimodal cumulative distribution functions, $$\left\{ {H_{ij}(x)} \right\}_{1 \le i \le j \le n}$$, that are all close to F(x). Each Hij(x) must satisfy the following conditions: (i) the mode of Hij(x) is located in the interval [xi, xj]; (ii) Hij(x) is a straight line connecting (xi, F(xi)) and (xj, F(xj)); (iii) Hij(x) is the greatest of all the convex functions that have smaller values than F(x) in the range (−∞, xi); and (iv) Hij(x) is the smallest of all the concave functions that have larger values than F(x) in the range (xj, +∞). Second, each Hij(x) is shifted vertically upward and downward by the same distance, dij, to form a band. The band is widened until F(x) lies within it over the whole range (−∞, +∞). This shifting distance, dij, is defined as the distance between F(x) and Hij(x). Third, the smallest dij among all the tested Hij(x) is defined as the dip test statistic, which is returned by the test. Therefore, a unimodal distribution corresponds to a statistic of 0, while a more pronounced bimodal distribution is evidenced by a larger statistic.

In the present work, to perform Hartigan’s dip test, the LDOS from first-principles calculations was normalized with respect to its total DOS and treated as an empirical distribution. The default settings in VASP were used to determine the minimum/maximum energy boundaries of the LDOS, so the energy interval varies slightly between individual LDOS calculations, ranging from 0.151 to 0.155 eV. The default setting was used for the NBANDS tag in the W-based calculations, which gave an average of about 7.2 bands per atom. For consistency, the NBANDS tag in the DFT calculations of the Ta system was set to the same value as that used in the W-based calculations. The sample for the dip test was then drawn randomly from the normalized LDOS with a size of 500 data points (each LDOS in the present work was set to have 301 energy intervals in the first-principles calculations). We drew 8000 samples for each LDOS, and the dip test statistic of each LDOS used for comparison is taken as the average of the statistics from the 8000 samples. All of the Hartigan’s dip tests of LDOS bimodality were performed using a MATLAB code by Mechler67. In addition, the sensitivities of the Δdip measurements to the LDOS-related DFT parameters (i.e., the number of bins, k-point density, cutoff energy, and smearing width) were tested, as described in Supplementary Note 1. The performance of Eq. (1) in predicting the $$E_{{\mathrm{int}}}^{{\mathrm{fix}}}$$ calculated from the four-supercell method68,69 is discussed in Supplementary Note 7.
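The sampling-and-averaging procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the MATLAB workflow used in the paper: it assumes that "drawing from the normalized LDOS" means sampling energy grid points with probability proportional to the LDOS value, and it leaves the dip statistic itself as a caller-supplied function (`statistic` is a hypothetical placeholder, as are all function names here). The demonstration uses the sample mean as a stand-in statistic purely so that the sketch runs on its own.

```python
import random


def normalize_ldos(ldos):
    """Scale non-negative LDOS values so that they sum to 1."""
    if any(v < 0.0 for v in ldos):
        raise ValueError("LDOS values must be non-negative")
    total = sum(ldos)
    if total <= 0.0:
        raise ValueError("LDOS must contain some non-zero weight")
    return [v / total for v in ldos]


def averaged_statistic(energies, ldos, statistic,
                       sample_size=500, n_samples=8000, seed=0):
    """Average a statistic over repeated random samples from an LDOS.

    energies  : energy grid points (e.g. 301 values)
    ldos      : LDOS value at each grid point
    statistic : callable taking a sorted list of sampled energies;
                in practice this would be a dip-test implementation
                (placeholder, supplied by the caller)
    """
    if len(energies) != len(ldos):
        raise ValueError("energies and ldos must have equal length")
    weights = normalize_ldos(ldos)
    rng = random.Random(seed)  # fixed seed makes the average reproducible
    total = 0.0
    for _ in range(n_samples):
        sample = sorted(rng.choices(energies, weights=weights, k=sample_size))
        total += statistic(sample)
    return total / n_samples


def sample_mean(sample):
    """Stand-in statistic used only for the demonstration below."""
    return sum(sample) / len(sample)


if __name__ == "__main__":
    grid = [0.0, 1.0, 2.0]
    flat = [1.0, 1.0, 1.0]
    print(averaged_statistic(grid, flat, sample_mean, n_samples=200))
```

Fixing the seed removes run-to-run variation for a given LDOS, but the averaged value still depends on the particular random draws, which is the source of the small fluctuations in Δdip mentioned in the Discussion.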

## Data availability

The data that support the findings of this study are available in the Supplementary Information and in two public open-access repositories with identifiers (1) Materials Cloud (https://doi.org/10.24435/materialscloud:2019.0047/v1) and (2) Materials Commons (https://doi.org/10.13011/m3-k83c-kr76). The raw DFT data are also included in the open-access repositories.

## Code availability

The codes that support the findings of this study are available from the two public open-access repositories mentioned in the section of “Data availability.”

## References

1. Leyson, G. P. M., Curtin, W. A., Hector, L. G. Jr. & Woodward, C. F. Quantitative prediction of solute strengthening in aluminium alloys. Nat. Mater. 9, 750 (2010).
2. Wu, Z., Ahmad, R., Yin, B., Sandlöbes, S. & Curtin, W. A. Mechanistic origin and prediction of enhanced ductility in magnesium alloys. Science 359, 447–452 (2018).
3. Nie, J. F., Zhu, Y. M., Liu, J. Z. & Fang, X. Y. Periodic segregation of solute atoms in fully coherent twin boundaries. Science 340, 957–960 (2013).
4. Trinkle, D. R. & Woodward, C. The chemistry of deformation: how solutes soften pure metals. Science 310, 1665–1667 (2005).
5. Wakeda, M. et al. Chemical misfit origin of solute strengthening in iron alloys. Acta Mater. 131, 445–456 (2017).
6. Hu, Y.-J. et al. Solute-induced solid-solution softening and hardening in bcc tungsten. Acta Mater. 141, 304–316 (2017).
7. Romaner, L., Ambrosch-Draxl, C. & Pippan, R. Effect of rhenium on the dislocation core structure in tungsten. Phys. Rev. Lett. 104, 195503 (2010).
8. Rodney, D., Ventelon, L., Clouet, E., Pizzagalli, L. & Willaime, F. Ab initio modeling of dislocation core properties in metals and semi-conductors. Acta Mater. 124, 633–659 (2016).
9. Chookajorn, T., Murdoch, H. A. & Schuh, C. A. Design of stable nanocrystalline alloys. Science 337, 951–954 (2012).
10. Xu, A. et al. Ion-irradiation-induced clustering in W-Re and W-Re-Os alloys: a comparative study using atom probe tomography and nanoindentation measurements. Acta Mater. 87, 121–127 (2015).
11. Argon, A. S. Strengthening Mechanisms in Crystal Plasticity (Oxford University Press, Oxford, 2008).
12. Wolverton, C. Solute–vacancy binding in aluminum. Acta Mater. 55, 5867–5872 (2007).
13. Clouet, E., Garruchet, S., Nguyen, H., Perez, M. & Becquart, C. S. Dislocation interaction with C in α-Fe: a comparison between atomic simulations and elasticity theory. Acta Mater. 56, 3450–3460 (2008).
14. Naghavi, S. S., Hegde, V. I., Saboo, A. & Wolverton, C. Energetics of cobalt alloys and compounds and solute–vacancy binding in fcc cobalt: a first-principles database. Acta Mater. 124, 1–8 (2017).
15. Ohnuma, T., Soneda, N. & Iwasawa, M. First-principles calculations of vacancy-solute element interactions in body-centered cubic iron. Acta Mater. 57, 5947–5955 (2009).
16. Kong, X.-S. et al. First-principles calculations of transition metal–solute interactions with point defects in tungsten. Acta Mater. 66, 172–183 (2014).
17. Medvedeva, N. I., Gornostyrev, Y. N. & Freeman, A. J. Electronic origin of solid solution softening in bcc molybdenum alloys. Phys. Rev. Lett. 94, 136402 (2005).
18. Wu, X. et al. First-principles determination of grain boundary strengthening in tungsten: Dependence on grain boundary structure and metallic radius of solute. Acta Mater. 120, 315–326 (2016).
19. Pun, G. P. P., Batra, R., Ramprasad, R. & Mishin, Y. Physically informed artificial neural networks for atomistic modeling of materials. Nat. Commun. 10, 2339 (2019).
20. Bartel, C. J. et al. Physical descriptor for the Gibbs energy of inorganic crystalline solids and temperature-dependent materials chemistry. Nat. Commun. 9, 4168 (2018).
21. Ramprasad, R., Batra, R., Pilania, G., Mannodi-Kanakkithodi, A. & Kim, C. Machine learning in materials informatics: recent applications and prospects. npj Computational Mater. 3, 54 (2017).
22. Al-Zoubi, N. et al. Elastic properties of 4d transition metal alloys: values and trends. Computational Mater. Sci. 159, 273–280 (2019).
23. Li, H., Draxl, C., Wurster, S., Pippan, R. & Romaner, L. Impact of d-band filling on the dislocation properties of bcc transition metals: the case of tantalum-tungsten alloys investigated by density-functional theory. Phys. Rev. B 95, 094114 (2017).
24. De Jong, M. et al. Electronic origins of anomalous twin boundary energies in hexagonal close packed transition metals. Phys. Rev. Lett. 115, 065501 (2015).
25. Zhao, S., Egami, T., Stocks, G. M. & Zhang, Y. Effect of d electrons on defect properties in equiatomic NiCoCr and NiCoFeCr concentrated solid solution alloys. Phys. Rev. Mater. 2, 013602 (2018).
26. Xin, H., Vojvodic, A., Voss, J., Nørskov, J. K. & Abild-Pedersen, F. Effects of d-band shape on the surface reactivity of transition-metal alloys. Phys. Rev. B 89, 115114 (2014).
27. Pettifor, D. G. Bonding and Structure of Molecules and Solids (Oxford University Press, 1995).
28. Drautz, R. & Pettifor, D. G. Valence-dependent analytic bond-order potential for transition metals. Phys. Rev. B 74, 174117 (2006).
29. Sutton, A. P. Electronic Structure of Materials (Clarendon Press, 1993).
30. Seiser, B., Hammerschmidt, T., Kolmogorov, A. N., Drautz, R. & Pettifor, D. G. Theory of structural trends within 4d and 5d transition metal topologically close-packed phases. Phys. Rev. B 83, 224116 (2011).
31. Dezerald, L. et al. Ab initio modeling of the two-dimensional energy landscape of screw dislocations in bcc transition metals. Phys. Rev. B 89, 024104 (2014).
32. Andersen, O. K. Linear methods in band theory. Phys. Rev. B 12, 3060 (1975).
33. Freeman, J. B. & Dale, R. Assessing bimodality to detect the presence of a dual cognitive process. Behav. Res. Methods 45, 83–97 (2013).
34. Hartigan, J. A. & Hartigan, P. M. The dip test of unimodality. Ann. Stat. 13, 70–84 (1985).
35. Hodges, L., Ehrenreich, H. & Lang, N. D. Interpolation scheme for band structure of noble and transition metals: ferromagnetism and neutron diffraction in Ni. Phys. Rev. 152, 505 (1966).
36. Mueller, F. M. Combined interpolation scheme for transition and noble metals. Phys. Rev. 153, 659 (1967).
37. Pettifor, D. G. Accurate resonance-parameter approach to transition-metal band structure. Phys. Rev. B 2, 3031 (1970).
38. Pettifor, D. G. Theory of energy bands and related properties of 4d transition metals. III. s and d contributions to the equation of state. J. Phys. F Met. Phys. 8, 219 (1978).
39. Lambert, R. M. & Pacchioni, G. Chemisorption and Reactivity on Supported Clusters and Thin Films: Towards an Understanding of Microscopic Processes in Catalysis, Vol. 331 (Springer Science & Business Media, 2013).
40. Xin, H., Holewinski, A., Schweitzer, N., Nikolla, E. & Linic, S. Electronic structure engineering in heterogeneous catalysis: identifying novel alloy catalysts based on rapid screening for materials with desired electronic properties. Top. Catal. 55, 376–390 (2012).
41. Harrison, W. A. Electronic Structure and the Properties of Solids: the Physics of the Chemical Bond (Courier Corporation, 2012).
42. Qian, X. et al. Quasiatomic orbitals for ab initio tight-binding analysis. Phys. Rev. B 78, 245112 (2008).
43. Loader, C. Local Regression and Likelihood (Springer Science & Business Media, 2006).
44. De Jong, M. et al. A statistical learning framework for materials science: application to elastic moduli of k-nary inorganic polycrystalline compounds. Sci. Rep. 6, 34256 (2016).
45. Stone, C. J. Consistent nonparametric regression. Ann. Stat. 5, 595–620 (1977).
46. Cleveland, W. S. Robust locally weighted regression and smoothing scatterplots. J. Am. Stat. Assoc. 74, 829–836 (1979).
47. Zhu, Q., Samanta, A., Li, B., Rudd, R. E. & Frolov, T. Predicting phase behavior of grain boundaries with evolutionary search and machine learning. Nat. Commun. 9, 467 (2018).
48. Frolov, T. et al. Grain boundary phases in bcc metals. Nanoscale 10, 8253–8268 (2018).
49. White, C. L. & Coghlan, W. A. The spectrum of binding energies approach to grain boundary segregation. Metall. Trans. A 8, 1403–1412 (1977).
50. Huber, L., Hadian, R., Grabowski, B. & Neugebauer, J. A machine learning approach to model solute grain boundary segregation. npj Computational Mater. 4, 64 (2018).
51. Shi, S., Zhu, L., Zhang, H., Sun, Z. & Ahuja, R. Mapping the relationship among composition, stacking fault energy and ductility in Nb alloys: a first-principles study. Acta Mater. 144, 853–861 (2018).
52. Zhang, X. et al. Effects of solute size on solid-solution hardening in vanadium alloys: a first-principles calculation. Scr. Materialia 100, 106–109 (2015).
53. Fellinger, M. R., Hector, L. G. & Trinkle, D. R. Effect of solutes on the lattice parameters and elastic stiffness coefficients of body-centered tetragonal Fe. Computational Mater. Sci. 152, 308–323 (2018).
54. Hanlumyuang, Y., Gordon, P. A., Neeraj, T. & Chrzan, D. C. Interactions between carbon solutes and dislocations in bcc iron. Acta Mater. 58, 5481–5490 (2010).
55. Hammer, B., Morikawa, Y. & Nørskov, J. K. CO chemisorption at metal surfaces and overlayers. Phys. Rev. Lett. 76, 2141 (1996).
56. Hammer, B. & Nørskov, J. K. Theoretical surface science and catalysis—calculations and concepts. Adv. Catal. 45, 71–129 (2000).
57. Hume-Rothery, W., Smallman, R. E. & Haworth, C. W. The Structure of Metals and Alloys (Metals & Metallurgy Trust, 1969).
58. Tanaka, I., Rajan, K. & Wolverton, C. Data-centric science for materials innovation. MRS Bull. 43, 659–663 (2018).
59. Gomberg, J. A., Medford, A. J. & Kalidindi, S. R. Extracting knowledge from molecular mechanics simulations of grain boundaries using machine learning. Acta Mater. 133, 100–108 (2017).
60. Mueller, T., Kusne, A. G. & Ramprasad, R. Machine learning in materials science: recent progress and emerging applications. Rev. Computational Chem. 29, 186–273 (2016).
61. Bloechl, P. E. Projector augmented-wave method. Phys. Rev. B 50, 17953 (1994).
62. Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865–3868 (1996).
63. Kresse, G. et al. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys. Rev. B 54, 11169 (1996).
64. Methfessel, M. & Paxton, A. T. High-precision sampling for Brillouin-zone integration in metals. Phys. Rev. B 40, 3616–3621 (1989).
65. Yasi, J. A. & Trinkle, D. R. Direct calculation of the lattice Green function with arbitrary interactions for general crystals. Phys. Rev. E 85, 66706 (2012).
66. Trinkle, D. R. Lattice Green function for extended defect calculations: computation and error estimation with long-range forces. Phys. Rev. B 78, 014110 (2008).
67. Mechler, F. A direct translation into MATLAB from the original FORTRAN code of Hartigan’s Subroutine DIPTEST algorithm. Retrieved from www.nicprice.net/diptest (2002).
68. Lüthi, B., Ventelon, L., Rodney, D. & Willaime, F. Attractive interaction between interstitial solutes and screw dislocations in bcc iron from first principles. Computational Mater. Sci. 148, 21–26 (2018).
69. Wang, J., Janisch, R., Madsen, G. & Drautz, R. First-principles study of carbon segregation in bcc iron symmetrical tilt grain boundaries. Acta Mater. 115, 259–268 (2016).

## Acknowledgements

Y.-J.H., C.Y., M.Z., and Q.L. acknowledge support from a startup fund from the University of Michigan and partial support from the National Science Foundation (NSF) under award DMR-1847837. B.Z. and X.Q. acknowledge the startup fund from Texas A&M University and partial support from the NSF under award number OAC-1835690. Z.-K.L. would like to acknowledge partial financial support from the NSF grant CMMI-1825538. The calculations were performed using the Extreme Science and Engineering Discovery Environment (XSEDE) Stampede2 at the TACC through allocation TG-DMR190035, the computational resources and services provided by Advanced Research Computing at the University of Michigan, Ann Arbor, the resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231, and the advanced computing resources provided by Texas A&M High Performance Research Computing. Finally, we would like to thank Professor Dallas R. Trinkle at the University of Illinois Urbana-Champaign for sharing his simulation codes for the flexible boundary condition method.

## Author information

### Contributions

Y.-J.H., X.Q. and L.Q. conceived the research and designed the modeling procedures. Y.-J.H., B.Z., C.Y. and M.Z. performed the first-principles calculations. Y.-J.H. and G.Z. performed the Hartigan’s dip tests and the modeling of the residual-correction function. Y.-J.H., Z.-K.L., X.Q. and L.Q. prepared the manuscript. L.Q. supervised the project. All authors discussed the results and contributed to the manuscript.

### Corresponding author

Correspondence to Liang Qi.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

Peer review information Nature Communications thanks Pär Olsson and the other anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Hu, Y., Zhao, G., Zhang, B. et al. Local electronic descriptors for solute-defect interactions in bcc refractory metals. Nat Commun 10, 4484 (2019). https://doi.org/10.1038/s41467-019-12452-7
