Introduction

Multi-principal element alloys (MPEAs) present unique challenges, including an astronomically large composition space (with more than \(10^{78}\) possible alloy compositions1) and a multitude of interactions arising from mixing four or more principal elements (N ≥ 4) that cannot be precisely explained or predicted with the existing thermodynamic and physical models. Ab-initio calculations can accurately predict the phase stability and physical properties2,3,4; however, they are time-intensive when extended to finite temperatures and thus are not well-suited as an exploratory tool. Similarly, experimental exploration of the composition space is expensive and time-intensive. Given these constraints, coupled with a reasonably large volume of data on these alloys generated over the past decade, machine learning (ML) based approaches have rapidly gained traction5,6,7,8,9. A classic shortcoming of ML is that models with fewer parameters tend to be physically interpretable, though less accurate, while those with a larger number of parameters tend to be more accurate but lose interpretability.

Hardness is an important quantity from an engineering perspective as it is a measure of a material’s ability to resist localized plastic deformation, scratching, or indentation. Knowledge of hardness can be critical because high hardness in an alloy is often associated with reduced ductility that may limit its use10. In the past few years, ML models have been reported and applied to MPEAs to predict mechanical properties11,12,13,14,15, thermal properties13 and phase selection16,17,18,19. Chang et al.12 implemented an artificial neural network (ANN), with a single hidden layer and three nodes, to predict the hardness of AlCoCrFeMnNi high-entropy alloys (dataset size of 91 alloys) using composition-weighted hardness, density and atomic mass as features. Wen et al.15 explored the hardness of the AlCoCrCuFeNi system (dataset size of 155 alloys) using multiple ML algorithms (linear/polynomial regression, support vector machines, decision trees and ANNs) based on elemental compositions and twenty other material features. While both these models are quite useful and display good accuracy, they were trained on datasets spanning eight and six-element composition space, respectively. This leaves out a significant number of alloy systems, especially refractory MPEAs, which cannot be explored reliably using these models. Rickman et al.11 developed a more comprehensive approach employing a canonical-correlation analysis to predict the hardness of MPEAs (dataset size of 82 alloys) using seven features built from elemental and thermodynamic parameters, and further used a genetic algorithm to search for high-hardness quinary alloys from a sixteen-element composition space. While accurate, these models do not probe deeper into the nature of the fit achieved by the ML approach, and thus do not assess the physical consistency of the learning achieved – a challenge that we seek to address here through a combination of deconstructed ML predictions coupled with ab-initio stability analysis.

Single-phase MPEAs can provide fundamental insights into the correlation between physical and mechanical properties20,21,22; however, the multi-phase alloys allow greater flexibility in tailoring the microstructure leading to improved properties23,24,25,26,27. Also, one expects discontinuous variations in physical properties as we move across phase boundaries in multi-phase systems. This necessitates the development of strategies that can narrow down the regions of interest by rapidly exploring compositional spaces to provide approximate, but representative, insights into the targeted properties. Ideally, the model should also closely mirror the changes in microstructure assemblages, as well as the ordering that drives precipitate formation indicated by the phase transformations and capture the non-linear variation in physical properties due to change in alloy chemistry. As we develop this model for MPEA hardness prediction, it is pertinent to ask: (i) What elemental and thermodynamic variables (or combination thereof) can best describe alloy hardness? (ii) Can a combination of these variables, some of which vary linearly, be combined to predict non-linear response in the system? And, (iii) Is the ML model’s decision-making process a mere statistical fit or does it capture some fundamental insights into the physical origins of hardness?

In this work, we present an ML strategy (Fig. 1) that employs an ensemble of ANNs, driven by elemental and alloying descriptors, to rapidly predict and explore the hardness of MPEAs over vast compositional spaces. ANNs, inspired by biological neural networks, are capable of learning non-linear relationships and thus excel in predictive modelling of material properties15,28,29,30 as they can learn complex unknown functions from a stream of data31. To address the high variance of ANNs, we implement a model-averaging ensemble learning technique that combines the outputs of 165 trained networks to give a final prediction. The material descriptors used for training are shortlisted from an extensive pool of 22 features based on their fundamental relevance and statistical correlation with hardness. The model is trained on a dataset32 of 218 MPEAs and is validated using a test dataset of 58 alloys compiled from recent literature (these were not included in model training), followed by experimental validation for the TiZrHfAlx system.
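As a minimal sketch of model averaging (not the implementation used in this work), the snippet below averages the outputs of 165 toy predictors and reports their spread. The linear toy members, their weights, the two-feature input and the names `make_member` and `ensemble_predict` are assumptions introduced purely for illustration; the actual ensemble members are trained ANNs.

```python
import random
import statistics


def make_member(seed):
    """Return a toy stand-in for one trained network (hypothetical).

    Each member is a slightly different linear map from two normalized
    features to a hardness value in HV, mimicking network-to-network variance.
    """
    rng = random.Random(seed)
    weights = [200.0 + rng.gauss(0, 20), -50.0 + rng.gauss(0, 20)]
    bias = 300.0 + rng.gauss(0, 30)
    return lambda features: bias + sum(w * f for w, f in zip(weights, features))


def ensemble_predict(members, features):
    """Average the member predictions; also return their sample standard deviation."""
    predictions = [member(features) for member in members]
    return statistics.fmean(predictions), statistics.stdev(predictions)


# 165 members, matching the ensemble size quoted in the text.
members = [make_member(seed) for seed in range(165)]
mean_hv, spread_hv = ensemble_predict(members, [0.4, 0.7])
print(f"ensemble mean = {mean_hv:.1f} HV, member spread = {spread_hv:.1f} HV")
```

Averaging leaves the expected prediction unchanged while reducing the variance contributed by any single member, which is the motivation for the ensemble stated above.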

Fig. 1: Overview of the methodology used for extracting origins of hardness.

a MPEA hardness database development and calculation of alloy features, b training of neural network ensemble, c exploration of hardness over wide compositional spaces, d model interrogation to extract exact feature contributions along continuous composition pathways, e DFT results to probe ordering behaviour and structure stabilities, f experimental validation over complex alloy systems, and g analysis of the results to establish physical origins of hardness.

The ML model, together with density-functional theory (DFT), is essential to narrow the gap in our understanding of the physical origin of mechanical response in MPEAs. Therefore, for arbitrary MPEAs, a selected set of compositions was analyzed using DFT33 total-energy and electronic-structure calculations and DFT-based thermodynamic linear-response theory3,34 to assess chemical short-range order (SRO) and to identify the underlying physics, including phase stability, electronic effects, and short-range ordering/clustering, and its relation to the ML-predicted hardness. ML models, neural networks in particular, are often criticized for being treated as black boxes, which severely limits the understanding of their decision-making process. To overcome this, we have developed a methodology that uses the local partial dependencies and stimulus-response characteristics of the ML model to reveal the decision-making process and to provide critical insights into how the ML model learns the physical origins of hardness through different features. Our explainability analysis identifies the origin of hardness at the feature level, while the DFT calculations help establish what these feature-level origins may represent at the atomistic level within the material.

Results

The ML model was trained on databases available in the literature32. The model was validated using three different approaches: (a) direct comparison with discrete hardness measurements of alloys across different alloy systems; (b) validation of model predictions in systems with continuously varying compositions, where non-linear increases in hardness have been reported; and (c) experimental validation of model predictions for the TiZrHfAlx MPEAs. The ML model predictions have been combined with ab-initio calculations in cases (b) and (c) to understand the physics of the process and how well it is reflected by the ML model.

Database diversity and model validation

A training dataset of 218 unique as-cast alloys, with experimentally measured hardness values, was extracted from the database compiled by Gorsse et al.32 The data consists of single- and multi-phase alloys, Fig. 2a, which span a composition space of 22 elements, Supplementary Fig. 1, including 3d-transition-metals, refractory-metals and select main-group elements. The hardness values in the dataset span from 109 to 905 HV with the distribution profile as shown in Fig. 2b. The region above 800 HV is very thinly populated. While there are some outliers lying beyond the 1.5 interquartile range (IQR) in single-phase BCC/FCC alloys, Fig. 2a, these have not been excluded from the training as we believe these alloys could be critical for capturing the underlying physics that may not be apparent in other alloys. The model was validated using a separate test dataset comprising 58 alloys (Supplementary Table 1) compiled from recent literature. The test dataset was not used for model training. It spans a composition space of 15 elements and contains alloys with hardness ranging from 123 HV to 894 HV, with a mean hardness value of 459 HV. The test set has a wide hardness distribution, as seen in Fig. 2c, similar to that of the training set, and this diversity ensures that the performance of ML model on the test set is a good representation of its predictive ability.
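The 1.5 IQR criterion mentioned above can be sketched as follows. This is an illustrative example only: the hardness values are hypothetical (not taken from the training dataset), the function name `iqr_outliers` is an assumption, and the "inclusive" quartile convention is one of several in common use.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag values lying more than k * IQR below Q1 or above Q3.

    Returns the flagged values and the (lower, upper) fences.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    flagged = [v for v in values if v < lower or v > upper]
    return flagged, (lower, upper)


# Hypothetical hardness values (HV) for a group of single-phase alloys.
hardness_hv = [150, 160, 170, 180, 190, 200, 210, 620]
flagged, fences = iqr_outliers(hardness_hv)
print("fences:", fences, "flagged:", flagged)
```

In the workflow described above, such flagged alloys are retained in the training set rather than removed.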

Fig. 2: Dataset exploration and model validation.

a Statistical distribution of hardness values for eight different types of phase combinations present in the dataset along with number of alloys, mean hardness, median hardness, 1.5 IQR and 25–75% percentile range for each structure. b Distribution of hardness values in the training dataset (218 alloys) along with mean, median and 10–90% percentile range of hardness. c Distribution of hardness values in the test dataset (58 alloys) compiled from recent literature. d Parity plot of hardness predictions obtained for the test dataset along with statistical performance metrics: root mean square error (RMSE), mean absolute error (MAE) and average percentage error. The shaded area represents an 80% accuracy region and the number at the top right corner represents the fraction of predictions with >80% accuracy.

The ML model, comprising an ensemble of 165 trained ANNs, was used to predict the hardness of each alloy in the test set. Figure 2d shows the prediction results and performance metrics obtained for the test set. An average percentage error of 18.6% and a mean absolute error of 82.8 HV were obtained on the test set, with 62% (75% if only as-cast alloys are considered) of predictions lying within the 80% accuracy region. The alloys in the test set were prepared either by vacuum arc melting (denoted by black dots) or by mechanical alloying (MA) plus spark plasma sintering (SPS) (denoted by red dots, MA + SPS). Both the as-cast and MA + SPS alloys were kept in the test set to highlight the fact that the final model, trained only on as-cast alloys, is more prone to underpredict the hardness of alloys made through the MA + SPS route. Thus, although the test set accuracy (~82%) is slightly lower than the cross-validation accuracy (~87%, Supplementary Table 2), the model captures the experimentally measured hardness with reasonable accuracy.
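The metrics quoted above can be illustrated with a short sketch. The definitions used here are assumptions: the percentage error is taken as |predicted − measured| / measured, and a prediction is counted inside the 80% accuracy region when this relative error does not exceed 20%. The function name `hardness_metrics` and the sample values are hypothetical and are not data from the test set.

```python
import math


def hardness_metrics(measured, predicted, tolerance=0.20):
    """Compute RMSE, MAE, average percentage error and the fraction of
    predictions whose relative error is within `tolerance` (assumed definition)."""
    errors = [p - m for m, p in zip(measured, predicted)]
    n = len(errors)
    relative = [abs(e) / m for e, m in zip(errors, measured)]
    return {
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mae": sum(abs(e) for e in errors) / n,
        "avg_pct_error": 100.0 * sum(relative) / n,
        "fraction_within": sum(1 for r in relative if r <= tolerance) / n,
    }


# Hypothetical measured and predicted hardness values (HV).
measured_hv = [100.0, 200.0, 400.0, 500.0]
predicted_hv = [110.0, 150.0, 400.0, 350.0]
metrics = hardness_metrics(measured_hv, predicted_hv)
print(metrics)
```

Under these definitions, an accuracy figure such as ~82% corresponds to 100% minus the average percentage error.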

Prediction of non-linear trends in hardness

Having established a good statistical performance of our model for discrete alloy compositions, we perform the next step in our validation, namely exploration of continuously varied composition space and prediction of non-linear variations in hardness. This validation can be accomplished only if the model can correctly identify both the continuous monotonic (near linear) and discontinuous (non-linear) variation of hardness due to subtle changes in alloying chemistry. The discontinuity in hardness values may arise from formation of new phases resulting in different microstructural assemblage as a result of compositional variations. The new crystal structures can have significantly different nearest neighbors (and hence bonding) as well as completely different slip systems affecting the resistance to localized plastic deformation. One also expects that significant incipient ordering, which may arise near phase boundaries as the composition is varied continuously prior to the actual phase separation/transformation, may be responsible for controlling the width of the non-linear jumps in hardness. As shown previously, hardness depends on the nature of the atomic bonding35,36. The bond strength is mainly driven by constituent elements and their properties, such as electronegativity, that control the electronic-structure behavior37. Thus, we anticipate a dependence of MPEA hardness on its overall electronic structure. Therefore, comparing electronic-structure behavior with hardness in the MPEAs should reveal contributions from electronic mechanism.

Hence, we have explored an MPEA system – AlxTiy(CrFeNi)1-x-y – that has adequate microstructural complexity along with reported experimental hardness values over a range of compositions38,39. We have also investigated the HfxCoy(CrFeNi)1-x-y system, where the Hf content is seen to affect the ordering process. Notably, Co has a room-temperature crystal structure similar to that of Ti and is expected to form a solid solution with CrFeNi, while Hf is expected to exhibit a strong clustering effect. As such, we assessed HfxCoy(CrFeNi)1-x-y and compared the model predictions with the experimental measurements reported by Ma and Shek40.

The interest in the AlxTiy(CrFeNi)1-x-y system stems from the role of Al in promoting B2 ordering in a number of systems3,34. Ti does not have as pronounced an effect on ordering as Al, but Ti-containing systems do exhibit a large number of intermetallics38,39,41. Figure 3a shows the contour plot of hardness predictions for the entire composition range of the AlxTiy(CrFeNi)1-x-y system. An inset (Fig. 3b) shows an expanded view of the region where measurements are available. Predicted and actual hardness values are compared (Fig. 3c) for five Tix(CrFeNi)1-x compositions (x = 0.0625–0.1666) studied by Gao et al.39 to investigate the effect of Ti addition. Both the measurements and the predicted values show a near-linear monotonic increase and there is excellent agreement in general trends, although absolute values are underpredicted. To understand the underlying reason for the deviation of predicted hardness, we performed a phase stability (Eform) analysis in Fig. 3d, and found that increasing Ti stabilizes the BCC structure, which mirrors the trends in hardness. This stabilization is also borne out by the experiments, where increased Ti content led to a reduction of the FCC phase fraction, as well as an increase in the BCC phase fraction. Additionally, minor amounts of intermetallic phases begin to form (ε-phase, Ni3Ti at low Ti and the R-phase, Ni2.67Ti1.33 at higher Ti). Figure 1b indicates that the training dataset included only a very small number of intermetallics (1) or multi-phase mixtures (14) – FCC + BCC + IM. Presumably, the data available was too sparse for greater accuracy. Enhanced Eform strongly correlates with charge sharing due to increased hybridization among constituent elements and suggests increased bond strength in Tix(CrFeNi)1-x. The bonding behavior is dependent on local environment; therefore, we hypothesize that it should directly impact the local electronic properties, such as short-range order.
The SRO strength of Tix(CrFeNi)1-x, calculated using thermodynamic linear-response theory34, is shown in Fig. 3e. We found that the SRO of the dominant pairs (Cr-Ni pair at x = 0; and Ti-Ni pair at x = 0.0629–0.189) increases with increasing Ti. The stronger SRO also indicates increased concentration fluctuations, which directly correlate with stronger bonding character arising from increased hybridization (see Supplementary Figs. 8 and 9). This bonding/hybridization effect has not been considered directly in the ML model due to a paucity of data on which the model could be trained. Nonetheless, the model does include as a feature the formation enthalpy estimated by Miedema’s semi-empirical model, which is expected to correlate with bonding. At this stage, we note that Miedema’s model provides reasonable agreement with experimental measurements of formation enthalpies but does not seem to capture the effect of Ti adequately. As a result, the elemental descriptors for Ti in Miedema’s model must be adjusted, and this is another potential source of error.

Fig. 3: ML predictions of hardness and corresponding formation enthalpy and SRO obtained from DFT calculations.

a Predicted hardness contours for AlxTiy(CrFeNi)1-x-y. b Inset shows the hardness contours in Al-poor and Ti-poor regions, along with composition trajectories along which hardness measurements and predictions are compared. c Experimental and ML-predicted hardness for Tix(CrFeNi)1-x with a 90% prediction-interval (PI). d Formation energy of the BCC MPEA and e Pairwise SRO in Tix(CrFeNi)1-x. f Experimental and ML-predicted hardness for AlxTiy(CrFeNi)1-x-y with a 90% prediction-interval (PI). g Formation energy of the BCC solid solution and h Pair SRO and dominant ordering pairs in AlxTiy(CrFeNi)1-x-y.

As Al is added (with a corresponding decrease in the amount of Ti) and we move from the quaternary TixCrFeNi to the quinary AlxTiy(CrFeNi)1-x-y, the accuracy of the model is seen to improve, with excellent agreement between experimental measurements and ML predictions, as seen in Fig. 3f. Significantly, the ML model is able to predict the non-linear increases in hardness as a function of composition. A comparison with experiments38 and prior phase stability calculations18 indicates that Al addition results in a structural change, with the simple FCC structure transforming to a three-phase mixture (FCC + BCC + Intermetallic). We investigated this further here. We performed DFT calculations for phase stability4 (Fig. 3g) and short-range order (SRO) of MPEAs34,42, in particular the strength of dominant pairs (Fig. 3h). Our DFT calculations (which embody quantum mechanics) provide robust Eform predictions in MPEAs4,43. Clearly, the trends in Eform and SRO pair strength match those in hardness in Fig. 3f, i.e., our model is able to capture the electronic-structure-driven features yielding a non-linear change in hardness. A possible reason for the small or negligible error in the hardness of AlxTiy(CrFeNi)1-x-y is that the SRO contribution is very weak compared to CrFeNiTix, and the relative amount of Ti is lower than in the quaternary alloy; hence, errors associated with Miedema’s calculation of formation enthalpies are minimized.

The hardness contours for the Hf-Co-(CrFeNi) system are shown in Fig. 4a, where Fig. 4b gives an expanded view of the region investigated experimentally by Ma and Shek40. The model predicts a strong dependence of hardness on the Hf content, as the predicted hardness contours in Fig. 4a are almost entirely dictated by the amount of Hf present in the system. The predictions accurately follow the experimental hardness values40 shown in Fig. 4c. The hardness variation in this system is relatively linear. The Eform in Fig. 4d clearly shows that Hf destabilizes the FCC phase. Stability predictions show good agreement with experiments, as the hypoeutectic microstructures and Laves phases increase with increasing Hf content40. With the addition of Hf, the Hfx(CoCrFeNi)1-x MPEAs transformed from a single-phase FCC structure at x = 0 to (C15 Laves + FCC phases) at x = 0.09. Hypoeutectic microstructures were obtained from x = 0.024–0.069 and a fully eutectic structure with lamellae of FCC and C15 Laves phase was found at x = 0.09 (ref. 40). This result raises the question of whether the intermetallic phase contributes significantly to the hardness in the hyper-eutectic region. The other possibility would be enhanced contributions to hardness due to ordering or clustering in the solid solution phase itself. To explore this aspect, we calculated the SRO pair strength using DFT. The SRO pair strength in Fig. 4e is even more interesting, as Hfx(CoCrFeNi)1-x at x = 0 shows weak ordering behavior with an SRO pair strength of 2.51 Laue (Cr-Ni pair), but adding Hf (x > 0) promotes clustering (clustering is often related to an unstable density of states at the Fermi level43, see Supplementary Fig. 10). The clustering strength of the dominant Hf-Cr pair (21–29 Laue) shows a monotonic increase with increasing Hf content. Clustering in Hf-Cr pairs suggests that Hf does not thermodynamically prefer to sit around Cr, i.e., Hf promotes phase separation.
In some cases, the presence of multiple phases in MPEAs improves hardness as the multiple phases with different grain sizes and grain orientation can strengthen the alloy. Thus, it is possible that the Hf-Cr clustering drives the eutectic phase formation and eventually contributes to enhanced hardness through the formation of second phase intermetallics.

Fig. 4: ML predictions of hardness and corresponding DFT-predicted formation enthalpy and SRO.

a Predicted hardness contours for HfxCoy(CrFeNi)1-x-y system, with b showing an expanded view of the compositions from experiments by Ma and Shek40. c Experimental and ML-predicted hardness for Hfx(CoCrFeNi)1-x alloys with a 90% prediction-interval (PI). d DFT formation energies showing the relative stabilities of the BCC and FCC structures. e DFT SRO and the main ordering and clustering pairs present.

Experimental validation in the Al-Ti-Zr-Hf alloy system

While the AlxTiy(CrFeNi)1-x-y and HfxCoy(CrFeNi)1-x-y systems provided a study in contrast, displaying ordering and clustering tendencies, respectively, in neither of these two systems were defects (like vacancies) noted to play a prominent role. However, we recently discovered the formation of a vacancy-stabilized phase in the Alx(TiZrHf)1-x system, where values of x > 0.125 promote the formation of a new type of gamma-brass (4-vacancy ordered) phase3. Such phases were absent from the training dataset and, therefore, the ML model is not necessarily expected to give accurate predictions. Nonetheless, this creates an opportunity for understanding whether the vacancies play a significant role in the hardness. Furthermore, it should be noted that the crystal structure of ternary TiZrHf is hcp, which is also absent from the training dataset. Hence, we chose to measure the hardness in the Alx(TiZrHf)1-x system to test the limits of the ML model.

The model was observed to underpredict the hardness by a maximum of 18% across the compositions studied. Nonetheless, the model was able to capture the trends in hardness quite accurately. Figure 5a shows the predicted hardness contours in the AlxTiy(ZrHf)1-x-y system, while Fig. 5b shows the comparison of predicted and experimentally measured hardness. The values are significantly underpredicted by the ML model, which indicates that the quantitative predictions of the model may be limited in cases where significant vacancy ordering or the hcp structure occurs. This in itself is not a surprising result, since the MPEA database used for training the model consists largely of cubic solid-solution alloys, with few multi-phase alloys containing a constituent intermetallic phase. It is, however, interesting to observe that the model still predicts the general trends in hardness.

Fig. 5: ML hardness predictions and corresponding DFT-predicted formation enthalpy and SRO.

a Predicted hardness contours for AlxTiy(ZrHf)1-x-y system. b Experimental and ML-predicted hardness for Alx(TiZrHf)1-x alloys with a 90% prediction-interval (PI). c DFT energy calculations show the relative stabilities of the BCC and HCP structures. d DFT SRO pairs show the ordering and clustering tendencies. e DFT and XRD densities compared for the HCP, BCC (vacancy-stabilized) and disordered BCC (without vacancy) phases.

In Fig. 5c, we plot the alloy phase stability; overall, a monotonic change in Eform was found, except for a jump at an Al atomic fraction of 0.0749. A similar jump was also observed in hardness in Fig. 5b, i.e., the ML model is able to capture the electronic and size effects through valence-electron count and atomic radii, respectively. The reason for the jump in hardness is clear, as the alloy undergoes a phase transformation (hcp → bcc) at Al = 0.0749; however, this still does not explain why the ML model underestimates the hardness by 15–20%. Similar to Tix(CrFeNi)1-x, the enhanced Eform in Alx(TiZrHf)1-x can be correlated with improved bond strength that can directly impact local properties, such as short-range order. In Fig. 5d, we plot the DFT-derived SRO pair strength of the dominant pairs in the BCC phase. At Al = 0, clustering in Ti-Hf pairs is the dominant mode, which suggests that Ti and Hf tend to phase separate and form a two-phase region. Moreover, adding Al stabilizes the bcc phase, as shown in Fig. 5c. For Al > 0, Alx(TiZrHf)1-x shows ordering, and Al-Hf is the dominant SRO pair. In going from the clustering mode at Al = 0 to the ordering mode for Al > 0 in Fig. 5d, a jump in SRO pair strength is seen at the same Al atomic fraction as the jump in hardness in Fig. 5b. Recently, Singh et al.3 studied Alx(TiZrHf)1-x MPEAs and found that vacancies stabilize the BCC phase at higher Al content, whereas competing (BCC/HCP) phases were found in the Al-poor region. The abrupt change in Alx(TiZrHf)1-x densities with 7.45 at.% vacancies matches the X-ray measured density in Fig. 5e (ref. 3). Clearly, the ANN model is able to capture the trends of experiments and electronic features but underpredicts the hardness. In Alx(TiZrHf)1-x, both SRO and vacancies contribute significantly to alloy properties; however, the ML model was not trained with these quantities.

Interrogating the ML decision-making process

The results presented thus far (Figs. 3–5) highlight the accuracy of the ML model along with its ability to predict non-linear hardness variations associated with phase transitions in a variety of MPEAs. But this still leaves two fundamental questions. What is the decision-making process followed by the ML model? Are these decisions purely statistical in nature or do they capture the fundamental physics that can lead to insights into the physical origins of hardness? To address these questions, we have probed the nature of the fit by performing an analysis based on the local partial dependencies and stimulus-response characteristics of the model that exposes the exact contribution of each feature towards the predicted hardness over continuous composition variations. Notably, our methodology does not just rank the features based on their perceived or indicative importance, but directly gives the exact quantitative contribution of each feature towards decision-making. Also, the integration of feature contributions with the compositional-stimulus and model-response study ensures that the causality for model understanding is not some arbitrary change in feature values but instead the alloy composition, which is the direct point of control in alloy design. We have considered a differential form of the neural network, i.e., if the hardness (\(\mathrm{HV}^x\)), at any composition x, is a function of n features (\(F_1^x,F_2^x, \ldots ,F_n^x\)), then

$$d\left( \mathrm{HV}^x \right) = \left( \frac{\partial \mathrm{HV}^x}{\partial F_1^x} \right)dF_1^x + \left( \frac{\partial \mathrm{HV}^x}{\partial F_2^x} \right)dF_2^x + \ldots + \left( \frac{\partial \mathrm{HV}^x}{\partial F_n^x} \right)dF_n^x$$
(1)

The neural network is treated as a non-linear function that maps, e.g., ten alloy features (\(F_1^x,F_2^x, \ldots ,F_{10}^x\)) to a predicted hardness value (\(\mathrm{HV}^x\)) at a given alloy composition (x). The function that represents the decision-making process of a neural network is extremely complex, as it involves thousands of scaling parameters along with a large number of non-linear activation units (both sigmoid and rectified linear units). Further, we have used an ensemble approach, wherein 165 independent neural networks contribute to the final decision-making, i.e., 165 such functions contribute to a single prediction. Thus, a direct understanding of the decision-making process is almost impossible and requires the formulation of approaches that can indirectly obtain meaningful insights through systematic stimulus-response observations. An important point to keep in mind is that any change in prediction (\(\mathrm{HV}^x\)) with respect to a feature (\(F_i^x\)) depends on the values of all the other features. Thus, whenever we want to probe the stimulus-response characteristics of our model, we have to choose a baseline composition about which the change in hardness is calculated.

The approach detailed above is exemplified here for Alx(CrFeNi)1-x, see Fig. 6a. To systematically study the contribution of each feature towards hardness prediction, we start by taking x = 0 as the baseline composition and calculate all the normalized features (\(F_1^{x = 0},F_2^{x = 0}, \ldots ,F_{10}^{x = 0}\)) at this composition. The predicted hardness (\(\mathrm{HV}^{x = 0}\)) is obtained from the ML model and acts as the baseline hardness value. Now, as the stimulus, a small amount of Al (Δx = 0.01) is added to the alloy in steps (indexed by n) and the corresponding hardness contribution of each feature, i.e., \(\Delta ({{{\mathrm{HV}}}})_{x \to x + 0.01}^{F_i}\), is calculated for each step using Eq. (1). Thus, after step N (i.e., at x = 0.01*N), the cumulative hardness contribution of any feature Fi, with respect to the baseline hardness value, can be expressed as:

$$\mathrm{HV}_N^{F_i} = \sum\limits_{n = 1}^N \Delta (\mathrm{HV})_{x:0 \to 0.01 \ast n}^{F_i}$$
(2)

The overall hardness after any step N (i.e., x = 0.01*N) can be expressed as the sum of baseline hardness (HVx = 0) and all the feature contributions (\({{{\mathrm{HV}}}}_N^{F_i};{{{\mathrm{i}}}} = 1,2, \ldots ,10\)).

$$\mathrm{Overall\ Hardness} = \mathrm{HV}^{x = 0} + \sum\limits_{i = 1}^{10} \mathrm{HV}_N^{F_i}$$
(3)

The hardness contribution of each feature and the overall hardness variation for Alx(CrFeNi)1-x, Tix(CrFeNi)1-x, Hfx(CoCrFeNi)1-x, and Alx(TiZrHf)1-x MPEAs are shown in Fig. 6. The reasonably good match between the predicted (ML) and calculated (Eq. 3) hardness establishes the accuracy of our methodology (see Supplementary Fig. 4), indicating that the calculated feature contributions are an accurate representation of the decision-making process followed by the ML model. The non-linear decision-making of the ML model is evident through the non-linear contribution of select features to hardness. Additionally, it appears that the non-linear response arises from a combination of features, some of which result in a near-linear response while the others serve to classify the structure of the system in an almost step-like manner, which introduces the non-linearity. For example, the VEC, which acts as a classifier for phase selection (FCC and/or BCC), seems to plateau over a range where further variation in VEC does not affect structural changes.
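The decomposition in Eqs. (1)–(3) can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: `toy_model` is a hypothetical two-feature stand-in for the trained ensemble, `features` is a hypothetical composition-to-feature map, partial derivatives are estimated by central finite differences, and each derivative is evaluated at the midpoint of a composition step (a discretization choice made here, not specified in the text).

```python
import math


def toy_model(f):
    """Hypothetical stand-in for the trained ensemble: features -> hardness (HV)."""
    step_like = 400.0 / (1.0 + math.exp(-12.0 * (f[0] - 0.15)))  # step-like response
    return 250.0 + step_like + 200.0 * f[0] * f[1]


def features(x):
    """Hypothetical normalized features as a function of composition x."""
    return [x, x * (1.0 - x)]


def partial(model, f, i, eps=1e-6):
    """Central finite-difference estimate of d(model)/d(f[i])."""
    up, down = list(f), list(f)
    up[i] += eps
    down[i] -= eps
    return (model(up) - model(down)) / (2.0 * eps)


def feature_contributions(model, feature_map, n_steps, dx=0.01):
    """Accumulate per-feature hardness contributions along x = 0 -> n_steps*dx (Eq. 2)."""
    f_prev = feature_map(0.0)
    contributions = [0.0] * len(f_prev)
    for n in range(1, n_steps + 1):
        f_next = feature_map(n * dx)
        f_mid = [(a + b) / 2.0 for a, b in zip(f_prev, f_next)]
        for i in range(len(contributions)):
            # Eq. (1): (dHV/dF_i) * dF_i for this composition step
            contributions[i] += partial(model, f_mid, i) * (f_next[i] - f_prev[i])
        f_prev = f_next
    return contributions


N = 30
baseline_hv = toy_model(features(0.0))
contrib = feature_contributions(toy_model, features, N)
reconstructed_hv = baseline_hv + sum(contrib)   # Eq. (3)
direct_hv = toy_model(features(N * 0.01))       # direct model prediction
print(contrib, reconstructed_hv, direct_hv)
```

Comparing `reconstructed_hv` with `direct_hv` mirrors the consistency check between calculated (Eq. 3) and predicted hardness described above; the residual shrinks as the composition step is reduced.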

Fig. 6: Visualizing the decision-making process of ML model.

Contribution of different features toward ML hardness prediction in: a Alx(CrFeNi)1-x, b Tix(CrFeNi)1-x, c Hfx(CoCrFeNi)1-x, and d Alx(TiZrHf)1-x alloy systems. At any composition (x), the cumulative hardness contribution of each feature is equal to the vertical distance between that feature contribution plot and the baseline hardness value (calculated at x = 0). At any x, the summation of baseline hardness and all feature contributions will result in overall hardness. e Feature variations with respect to composition for alloy systems shown in a–d. Normalized feature values have been plotted here. Feature notations: VEC-Valence electron concentration, δcov-asymmetry in covalent radius, ρ-average density, δE-asymmetry in Young’s modulus, δG-asymmetry in shear modulus, ΔHchem-chemical enthalpy of mixing, ΔHel-elastic enthalpy of mixing. Features that had negligible contribution to hardness prediction over these composition ranges have not been included in the plots.
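As an illustration of how a composition-dependent feature such as VEC varies along a pathway, the sketch below computes a composition-weighted average VEC. The averaging rule (a simple atomic-fraction-weighted mean) is an assumption here, since the feature definitions are not given in this section, and the helper names `alloy_composition` and `mean_vec` are hypothetical. The elemental values are the standard valence electron counts (s plus d electrons for the transition metals).

```python
# Standard valence electron counts for the elements discussed in the text.
VEC = {"Al": 3, "Ti": 4, "Zr": 4, "Hf": 4, "Cr": 6, "Fe": 8, "Co": 9, "Ni": 10}


def alloy_composition(solute, x, base):
    """Atomic fractions for solute_x(base)_{1-x} with an equiatomic base."""
    share = (1.0 - x) / len(base)
    composition = {element: share for element in base}
    composition[solute] = composition.get(solute, 0.0) + x
    return composition


def mean_vec(composition):
    """Atomic-fraction-weighted average VEC (assumed feature definition)."""
    total = sum(composition.values())
    return sum(c * VEC[element] for element, c in composition.items()) / total


# VEC along Al_x(CrFeNi)_{1-x} for a few Al fractions.
for x in (0.0, 0.1, 0.2, 0.3):
    vec = mean_vec(alloy_composition("Al", x, ["Cr", "Fe", "Ni"]))
    print(f"x = {x:.1f}: VEC = {vec:.2f}")
```

Because Al, Ti and Hf all carry far fewer valence electrons than the CrFeNi or CoCrFeNi base, VEC decreases at a similar rate in each of the systems in panels a–c under this definition.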

Discussion

The explainability analysis of the decision-making process of the ML model, in tandem with first-principles DFT calculations, conclusively shows that the model is cognizant of the underlying physics such as relative phase stability, phase transitions, SRO and solid-solution strengthening. Four key insights are obtained from the breakdown of the ML model. First, the hardness contribution of VEC is strikingly different in Alx(CrFeNi)1-x, Tix(CrFeNi)1-x and Hfx(CoCrFeNi)1-x MPEAs even though the VEC varies almost identically. The ML model gives significant importance to VEC in Alx(CrFeNi)1-x in the composition range where the FCC→BCC phase transition is expected based on experimental observations38,44. In Tix(CrFeNi)1-x, the VEC contribution is lower, in line with the lower BCC stability obtained from Ti addition as compared to Al, as seen in Fig. 3d, g. In contrast, addition of Hf in Hfx(CoCrFeNi)1-x does not induce this FCC→BCC transition40 and the VEC contribution towards predicted hardness in the ML model is also negligible. This is significant because FCC→BCC transitions in MPEAs have been linked to VEC in the past4,45,46 and the hardness of BCC structures is significantly higher. Thus, it appears that the ML model has successfully learned these nuances, which are critical for accurate hardness prediction.

Secondly, the contributions of chemical mixing enthalpy (ΔHchem) and asymmetry in covalent radius (δcov) toward hardness prediction are quite significant and follow each other closely (except for Alx(CrFeNi)1-x where the value of δcov changes only slightly with Al addition, thereby resulting in its negligible contribution to hardness). The hardness contributions of ΔHchem and δcov in ML model are strongly linked to the ordering tendencies, as they are negligible at low SRO values but kick in suddenly as SRO increases beyond ~4–5 Laue; this happens at ~10 at.% Al in Alx(CrFeNi)1-x, ~4 at.% Ti in Tix(CrFeNi)1-x and ~3 at.% Al in Alx(TiZrHf)1-x, as seen from Figs. 3e, h, 5d and 6a, b, d. Also, while both ΔHchem and δcov contributions follow ordering tendencies, δcov appears to be considerably more dominant where intermetallic formation occurs, as seen for Tix(CrFeNi)1-x and Hfx(CoCrFeNi)1-x systems in Fig. 6b, c, both of which exhibit strong intermetallic formation39,40. The contributions to hardness from ΔHchem and δcov also appear to be sensitive to phase transformations as their slopes change significantly wherever phase transitions appear. In Alx(TiZrHf)1-x, Fig. 6d, this coincides with HCP→BCC transition as Al increases from 7.7 to 14.2 at.%, as seen in Fig. 5b, and in Tix(CrFeNi)1-x, the two non-linear jumps in ΔHchem and δcov hardness contributions, as seen in Fig. 6b, coincide with the formation of ε-phase (Ni3Ti, HCP) at low Ti concentrations and a metastable R-phase (Ni2.67Ti1.33) at higher concentrations39. This insight is significant as the short-range order and the nature of metallic bonds have been linked to intermetallic formation and mechanical properties in previous studies36,47. The ML model appears to be able to capture these dependencies quite accurately through variations in ΔHchem and δcov.

The third insight comes from the hardness contributions of asymmetry in Young’s modulus (δE), which appear to be more direct: a larger increase in δE manifests as a more significant increase in hardness. This can be seen for Alx(CrFeNi)1-x, which shows the highest increase in δE among the systems studied and consequently exhibits the highest contribution of δE towards ML-predicted hardness. Note, however, that the hardness contribution of δE is not linear with respect to the feature value and appears to follow trends similar to those of ΔHchem and δcov, which are linked to ordering and phase transformations. This is along expected lines, because the Young’s modulus can be calculated in principle from the interatomic potential-energy (U) vs. separation (r) curve, where the force \(F = - \partial U/\partial r\). At constant pressure and negligible volume changes, \(\delta U \approx dH\). A larger δE would indicate the presence of a pair of atoms where one species has a higher bond strength (and hence higher stiffness or larger Young’s modulus) and the other has a lower bond strength. It has been observed empirically for minerals that the higher the localization of the electron density, the higher the bond strength. In Miedema’s model, the value of ΔHchem is a function of the difference in the Wigner-Seitz cell boundary electron density and will likely predict a higher value of ΔHchem for the atomic species pair described above.

Finally, the elastic mixing enthalpy (ΔHel) increases monotonically with respect to composition (x) in all systems studied here, but its contribution to hardness prediction shows striking differences and shifts from negative to negligible to strongly positive contribution as we move from Alx(CrFeNi)1-x to Tix(CrFeNi)1-x to Hfx(CoCrFeNi)1-x system. Addition of Hf to CoCrFeNi causes a significant increase in ΔHel and the Hf-Cr pair has a strong clustering tendency, as shown in Fig. 4e, indicating that Hf does not prefer sitting next to Cr. Recently, Roy et al.48 have demonstrated that lattice distortion can be used for estimating solid-solution hardening in high-entropy alloys, where the solute-atom dislocation interaction energy was calculated as a function of shear modulus, solute-atom–dislocation-core distance and local strain. The distance from dislocation core is influenced by atomic size (i.e., molar volume) with smaller atoms segregating easily to dislocation cores and the local strain is influenced by the radius asymmetry. ΔHel captures these nuances to some extent as it reflects both the local distortion and the bonding characteristics. Figure 6c shows that the hardness increase predicted by ML model at low Hf concentration (<3 at.%) originates almost entirely from ΔHel contribution, indicating that the ML model is able to correctly predict the hardness variations accompanying phase separation processes driven by a combination of weak ordering parameter and high elastic strain energy.

In summary, our machine-learning (ML) framework identifies in the decision-making process the essential feature sets, non-linear responses, and the underlying correlated physics, here for hardness in complex multi-principal-element alloys (MPEAs). Our ML model utilizes an ensemble of 165 independent neural networks that are driven by physical features to predict the hardness of MPEAs, wherein each network is trained on a diverse dataset using elemental and alloying descriptors. The model successfully predicts hardness variations in a wide variety of MPEAs and closely follows the ordering behaviour and phase transitions observed from first-principles calculations. The decoding of the ML model, achieved through calculation of exact hardness contributions of different features, indicates that the underlying physics is being captured through predictors of atomic interactions (such as formation enthalpy and bonding characteristics) and local-lattice distortion (such as size asymmetry, elastic enthalpy and strain energy) along with a phase classifier (VEC). Our proposed ML framework presents a promising way of efficiently exploring wide compositional spaces in MPEAs. The methodology developed for decoding the ML model can be extended to any ML model, irrespective of the algorithm used, and can thus prove immensely useful in bringing out fundamental insights from both existing and future ML models.

While the ML model is generally successful, it appears that small discrepancies with the experimental measurements stem from: (i) discrepancies between experimental and calculated enthalpies, which can become significant for systems containing elements prone to multiple oxidation states such as Ti (an artifact carried over from Miedema’s approach), and (ii) lack of explicit information on crystal structure and SRO parameters, neither of which is known without a priori experiments and/or DFT calculations. Improvements in the proposed model will, therefore, require accurate prediction of short-range order parameters and crystal structures from elemental properties, together with an improved description of thermodynamic interactions. For example, it is well known that the short-range order in disordered alloys may affect mechanical response49; it is therefore important for future models to effectively capture such effects on material properties.

Methods

Feature engineering and model architecture

We explored a set of 22 features comprising 18 elemental and 4 alloying descriptors. The features have been classified in this way to highlight two different characteristics of an alloy: (a) the elemental descriptors represent alloy properties that may be a direct extension of the properties possessed by the component elements, and (b) the alloying descriptors represent changes that occur when different elements interact with each other during alloy formation. Supplementary Fig. 2 lists these features along with the expressions used to calculate them and the Pearson’s correlation coefficient for each feature as a measure of its linear association with hardness. The elemental descriptors (S. No. 1–18) are characteristic of the elemental composition of the alloy and were calculated either as a composition-weighted average or as an asymmetry measure over the component elements. The chemical and elastic enthalpies of mixing associated with alloy formation were calculated using Miedema’s model50,51,52,53. The configurational entropy depends only on the relative amounts of the constituents while being independent of their identity, and has been shown to be the primary stabilizing factor for disordered phases in HEAs54. The YZ parameter is a thermodynamic parameter developed by Yang and Zhang55 that was shown to be a good descriptor of the phases present in MPEAs. Supplementary Fig. 3 lists the ten shortlisted features that were used for training the ANNs and also visualizes the variation of hardness with each feature along with the linear regression lines and R2 values. For a detailed discussion on feature selection, readers may refer to Supplementary Methods.
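The two kinds of elemental descriptor and the configurational entropy can be sketched as follows. The exact expressions used in this work are those listed in Supplementary Fig. 2; the asymmetry form below, the square root of the composition-weighted squared relative deviation from the mean, is a commonly used definition and is an assumption here. The function names are illustrative only, and the asymmetry function is applied to the valence-electron table purely to exercise it.

```python
import math

GAS_CONSTANT = 8.314  # J/(mol K)

# Valence electron counts for the elements of one example system.
VALENCE_ELECTRONS = {"Al": 3, "Co": 9, "Cr": 6, "Fe": 8, "Ni": 10}

def weighted_average(fractions, prop):
    """Composition-weighted average of an elemental property."""
    return sum(c * prop[el] for el, c in fractions.items())

def asymmetry(fractions, prop):
    """Asymmetry measure (assumed form): sqrt(sum_i c_i * (1 - p_i / p_mean)^2)."""
    p_mean = weighted_average(fractions, prop)
    return math.sqrt(
        sum(c * (1.0 - prop[el] / p_mean) ** 2 for el, c in fractions.items())
    )

def configurational_entropy(fractions):
    """Ideal configurational entropy, -R * sum_i c_i * ln(c_i)."""
    return -GAS_CONSTANT * sum(c * math.log(c) for c in fractions.values() if c > 0)

# Equiatomic AlCoCrFeNi as an example composition (atomic fractions).
equiatomic = {el: 0.2 for el in VALENCE_ELECTRONS}
vec = weighted_average(equiatomic, VALENCE_ELECTRONS)
delta_vec = asymmetry(equiatomic, VALENCE_ELECTRONS)
s_config = configurational_entropy(equiatomic)
```

The entropy depends only on the atomic fractions, which is why it takes no property table as input.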

Feed-forward back-propagation ANNs with twelve different architectures (Supplementary Table 3) were trained using ten feature-sets (Supplementary Table 4) with Vickers hardness as the target value. For three feature-sets, a multiple linear regression was also performed to act as a baseline measure of ANN performance. These ANNs (ANN1, ANN2, …, ANN12) were used to ascertain whether the depth of the neural network significantly affects model performance. This was of specific interest since a non-linear relationship was observed between the features and hardness, and deep neural networks have been conclusively shown to perform better at learning non-linear relationships56. The number of layers in the neural networks used in this work ranges from two in ANN1 to seven in ANN12.

Since the hardness prediction is a regression problem, the output layer in all ANN architectures employs rectified linear unit (ReLU) activation function. For the hidden layers, we have used a combination of sigmoid and ReLU activation functions. The hardness of an alloy is a strong function of its crystal structure and the same can also be seen in Fig. 2b where the mean hardness value increases as we move from FCC to BCC crystal structure and the presence of intermetallic phases also hardens the alloy. This strong dependence indicates that the neural network may make accurate hardness predictions only if it is capable of classifying the alloy crystal structure based on input features. Past research has shown that it is possible to predict the phases present in MPEAs using the thermodynamic descriptors included in our feature sets46,51,57. Thus, we believe that hidden layers with sigmoid activation functions may be able to learn such correlations without explicitly training the network for crystal structure classification. This belief was strengthened by test runs wherein ANNs with combination of ReLU and sigmoid hidden layers performed considerably better than those with only ReLU hidden layers.
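A forward pass through a network that mixes sigmoid and ReLU hidden layers and ends in a ReLU output can be sketched as below. The layer widths, the random weights and the ordering of the hidden activations are assumptions made for illustration; the actual architectures are those in Supplementary Table 3, and the models in this work were built with tensorflow and keras rather than plain numpy.

```python
import numpy as np

def sigmoid(z):
    """Smooth, saturating activation that can act as a soft classifier."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Rectified linear unit; at the output it keeps predictions non-negative."""
    return np.maximum(z, 0.0)

def init_network(layer_sizes, rng):
    """Random weights and zero biases for consecutive fully connected layers."""
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]

def forward(x, params, activations):
    """Apply each affine layer followed by its activation function."""
    a = x
    for (weights, bias), activation in zip(params, activations):
        a = activation(a @ weights + bias)
    return a

rng = np.random.default_rng(0)
layer_sizes = [10, 8, 8, 1]          # hypothetical: 10 inputs, two hidden layers
activations = [sigmoid, relu, relu]  # sigmoid hidden, ReLU hidden, ReLU output
params = init_network(layer_sizes, rng)
features = rng.random((4, 10))       # four alloys, features already in [0, 1]
hardness = forward(features, params, activations)
```

A ReLU output suits a target such as Vickers hardness because it cannot return negative values, while the sigmoid layer supplies the step-like response that a phase classifier would need.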

Machine learning model training

The standard practices followed while training all the models are described here. To give equal importance to all features during training and to aid convergence, each feature was rescaled to the [0, 1] range using min-max normalization. For every training process, five-fold cross validation was used and thus the training-set to testing-set size ratio was always 80:20. Each cross validation was independent of the others, i.e., trained weights or initialized parameters were not carried forward. All performance results are from models trained on the randomized data with the same seed to ensure uniformity of training/test sets between all models. This allows fair comparison of performance between models while ruling out any bias due to the dataset. Python 3.8.1 and associated open-source libraries were used for developing all the models reported here. We used pandas 1.0.3 and numpy 1.18.2 for data processing, scikit-learn 0.22.2 and statsmodels 0.11.1 for linear regression and statistical analysis, and tensorflow 2.2.0rc2 and keras 2.3.1 for implementation of the ANNs. All ANNs were trained using the mean absolute error (MAE) loss function and the Adam optimizer with a learning rate of 0.02. The performance of each ANN was calculated using only the cross-validation results, i.e., for each alloy, only the prediction made when that alloy was part of the validation set, and thus did not participate in the training process, was considered. For any given training process, the predictions of each validation set were recorded and statistical analysis (R2, RMSE, MAE and average percentage error) was done on the combined predictions from all validation sets. Statistical scores were not averaged over validation sets, as this would introduce bias by giving more or less importance to a particular validation set. The cross-validation performance for each trained model is detailed in Supplementary Table 2 and the effect of model architecture and feature set is visualized in Supplementary Figs. 5 and 6.
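The evaluation protocol described above (min-max scaling, a seeded five-fold split, and scores computed once on the pooled held-out predictions rather than averaged per fold) can be sketched with the standard library alone. A one-feature least-squares line stands in for the ANN, the data are synthetic, and scaling the whole feature column before splitting is an assumption; none of these are taken from the original scripts.

```python
import math
import random

def min_max_scale(values):
    """Rescale a feature column to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def k_fold_indices(n, k, seed):
    """Shuffle indices with a fixed seed and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Closed-form least-squares line (stand-in for training an ANN)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx

def pooled_cv_predictions(xs, ys, k=5, seed=0):
    """Each sample is predicted only by the model that never saw it."""
    preds = [None] * len(xs)
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = slope * xs[i] + intercept
    return preds

def pooled_scores(y_true, y_pred):
    """MAE, RMSE and R2 computed once over all pooled predictions."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / len(y_true)
    sse = sum(e * e for e in errors)
    return {
        "MAE": sum(abs(e) for e in errors) / len(errors),
        "RMSE": math.sqrt(sse / len(errors)),
        "R2": 1.0 - sse / sum((t - mean_true) ** 2 for t in y_true),
    }

raw_feature = [float(i) for i in range(50)]      # synthetic feature values
target = [100.0 + 5.0 * v for v in raw_feature]  # synthetic, exactly linear
scaled = min_max_scale(raw_feature)
predictions = pooled_cv_predictions(scaled, target, k=5, seed=0)
scores = pooled_scores(target, predictions)
```

Reusing the same `seed` for every model reproduces the same folds, which is what makes scores from different architectures directly comparable.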

Predictions using ensemble model

The final ML model consists of an ensemble of 165 trained ANN models, which were selected based on their cross-validation performance scores. A model-averaging technique is used wherein the final prediction is calculated as the average of the 165 predictions (one from each trained model in the ensemble). The ML model developed here requires only a single user input, namely the name of the alloy (e.g., AlCo2CrFe0.5Ni), for predicting the hardness, as the composition and the features required by each ANN are calculated automatically by supporting scripts.
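The two steps described above, turning an alloy name into atomic fractions and averaging the predictions of the ensemble members, can be sketched as follows. The function names and the representation of each trained model as a plain callable are assumptions for illustration and do not reproduce the supporting scripts.

```python
import re
from statistics import mean

# An element symbol followed by an optional integer or decimal amount.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")

def parse_alloy(name):
    """Convert a name such as 'AlCo2CrFe0.5Ni' into atomic fractions."""
    amounts = {}
    for element, number in _TOKEN.findall(name):
        amounts[element] = amounts.get(element, 0.0) + (float(number) if number else 1.0)
    total = sum(amounts.values())
    return {element: amount / total for element, amount in amounts.items()}

def ensemble_predict(models, features):
    """Model averaging: the mean of the individual model predictions."""
    return mean(model(features) for model in models)

fractions = parse_alloy("AlCo2CrFe0.5Ni")

# Three hypothetical stand-ins for trained networks.
toy_models = [lambda f: 100.0, lambda f: 200.0, lambda f: 300.0]
toy_prediction = ensemble_predict(toy_models, fractions)
```

An element with no explicit number is counted once, so the example name corresponds to molar amounts of 1, 2, 1, 0.5 and 1 out of a total of 5.5.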

We developed a methodology for generating hardness predictions (ternary contour plots in Figs. 3, 4 & 5) over vast compositional spaces by reducing the compositional degree of freedom in MPEAs through clubbing of elements into binary or ternary components. For example, in Fig. 3a, AlTiCrFeNi system is broken into three components – Al, Ti and an equiatomic ternary (CrFeNi). This allows representation of MPEAs composition space on a ternary plot. The first step is to create alloy compositions spaced by 1 atomic %, thereby leading to 5151 unique compositions. The ML model is used to predict the hardness at each of these compositions and the results are plotted as a predicted hardness contour on a ternary plot.
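The grid construction can be sketched as below: every point on a ternary diagram with 1 at.% spacing is enumerated, and each point is then expanded into elemental fractions by sharing the grouped component equally among its elements. The function names are illustrative, and the grouping shown mirrors the Al, Ti and equiatomic (CrFeNi) example in the text.

```python
def ternary_grid(step_percent=1):
    """All (a, b, c) integer percentages with a + b + c = 100."""
    n = 100 // step_percent
    return [
        (i * step_percent, j * step_percent, 100 - (i + j) * step_percent)
        for i in range(n + 1)
        for j in range(n + 1 - i)
    ]

def expand(point, groups):
    """Split each component's share equally among the elements it groups."""
    fractions = {}
    for percent, elements in zip(point, groups):
        for element in elements:
            fractions[element] = (
                fractions.get(element, 0.0) + percent / 100.0 / len(elements)
            )
    return fractions

GROUPS = (("Al",), ("Ti",), ("Cr", "Fe", "Ni"))
grid = ternary_grid(1)
compositions = [expand(point, GROUPS) for point in grid]
```

With a 1 at.% step there are 101 × 102 / 2 = 5151 grid points, which matches the number of unique compositions quoted above.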

Density functional theory calculations

The density-functional theory (DFT) based Korringa-Kohn-Rostoker (KKR) Green’s function method combined with the coherent potential approximation (CPA) was used to calculate the total energy of arbitrary solid-solution alloys33. The KKR-CPA performs configurational averaging simultaneously with charge self-consistency, which properly includes alloy-induced Friedel impurity-charge screening. For DFT, we used the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional for solids58. We employed a site-centered, spherical-harmonic basis that includes s, p, d, and f-orbital symmetries (i.e., lmax = 3) in all calculations. The self-consistent charge density was obtained from the Green’s function using a complex-energy contour integration and Gauss-Laguerre quadrature33 (with a 24-point semi-circular mesh enclosing the bottom to the top of the valence states). An equally spaced k-space mesh of 24 × 24 × 24 was used for Brillouin zone integrations. The core electrons were treated fully relativistically (including spin-orbit coupling), while semi-core/valence electrons were treated scalar relativistically (i.e., neglecting spin-orbit coupling).

Linear-response theory for short-range order

Thermodynamic linear-response theory was used to calculate Warren-Cowley SRO parameters34, i.e., \(\alpha_{\mu\nu}^{ij}\), where μν denote element pairs in the alloy and ij are lattice sites in the crystal. For a homogeneous solid-solution alloy with a set of compositions \(\{c_\mu^i\}\), the SRO dictates pair probabilities \(P_{\mu\nu}^{ij} = c_\mu^i c_\nu^j\left(1 - \alpha_{\mu\nu}^{ij}\right)\) that potentially can affect chemical short-range order4, as well as mechanical behavior47. For a dominant k-space wavevector \(\mathbf{k} = \mathbf{k}_o\), the SRO diverges at the spinodal temperature (Tsp) due to absolute instability in the correlated fluctuations, i.e., \(\boldsymbol{\alpha}_{\mu\nu}^{-1;s,s'}(\mathbf{k}_o; T_{\mathrm{sp}}) = 0\) (where s, s′ denote the independent sublattices in the structure), and provides an estimate for SRO and the order-disorder or miscibility temperature34. This first-principles theory of SRO is based on the electronic structure of the alloy; therefore, it directly embodies underlying electronic and alloying effects (like band-filling, hybridization, atomic-size, or Fermi-surface nesting43).
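The pair-probability relation above can be made concrete with a short numerical sketch. The compositions and α values below are hypothetical and chosen only to show the sign convention: for an unlike pair, α = 0 recovers the random alloy, α < 0 raises the pair probability (ordering), and α > 0 lowers it (clustering of like atoms).

```python
def pair_probability(c_mu, c_nu, alpha):
    """P = c_mu * c_nu * (1 - alpha) for one pair of sites and species."""
    return c_mu * c_nu * (1.0 - alpha)

# Hypothetical quaternary with equal site occupancies of 0.25.
c_mu, c_nu = 0.25, 0.25
p_random = pair_probability(c_mu, c_nu, 0.0)       # no short-range order
p_ordering = pair_probability(c_mu, c_nu, -0.2)    # unlike pairs favoured
p_clustering = pair_probability(c_mu, c_nu, 0.2)   # unlike pairs suppressed
```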

Experimental method

The Alx(TiZrHf)1-x alloys with x = 0–0.25 atomic fraction were synthesized by arc-melting on a water-cooled copper hearth in an ultra-high purity argon atmosphere using elemental chunks (Alfa Aesar, purity > 99.8%). Samples were melted, flipped and re-melted multiple times to ensure homogeneity. X-ray diffraction and subsequent scanning electron microscopy (SEM) showed the formation of a homogeneous alloy (Supplementary Fig. 7). Vickers microhardness testing was performed on a Wilson Instruments Tukon hardness tester using an indenter with a square pyramid shape. The microhardness tests employed a constant 500 g load with a hold time of 10 s. The indentation size was measured using an optical microscope, and a look-up table was used to determine the Vickers hardness value. The samples that were prepared for SEM analysis (polished with 1 µm diamond slurry) were also used for the hardness tests. Microhardness tests were performed 1 mm apart, with 3 test measurements on each sample.
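The conversion from indentation size to hardness that a look-up table encodes can be sketched with the standard Vickers relation, HV = 2F sin(136°/2)/d² ≈ 1.8544 F/d², with the load F in kgf and the mean indent diagonal d in mm. The diagonal values below are hypothetical and serve only to show how three measurements per sample would be averaged; they are not measured data from this work.

```python
import math
from statistics import mean

def vickers_hardness(load_kgf, diagonal_mm):
    """Standard Vickers relation for a 136 degree square-pyramid indenter."""
    return 2.0 * load_kgf * math.sin(math.radians(136.0 / 2.0)) / diagonal_mm ** 2

LOAD_KGF = 0.5  # the 500 g load stated in the text

# Hypothetical mean indent diagonals (micrometres) for three indents.
diagonals_um = [49.0, 50.0, 51.0]
readings = [vickers_hardness(LOAD_KGF, d / 1000.0) for d in diagonals_um]
sample_hardness = mean(readings)
```

Because hardness scales with the inverse square of the diagonal, a smaller indent gives a noticeably higher reading, which is why several indents per sample are averaged.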