Introduction

To overcome the grand engineering challenges of the twenty-first century, it is indispensable to investigate the unexplored central region of the ternary phase diagram, which is occupied by complex multi-component alloys, popularly known as high-entropy alloys (HEAs)1,2. HEAs span an ample compositional space and possess exceptional properties such as excellent mechanical performance at high temperatures, exceptional ductility, high fracture toughness at cryogenic temperatures, bio-compatibility, high conductivity, and excellent catalytic and magnetic properties, which means that one or more HEAs can potentially offer a solution to most materials-related engineering problems3,4,5.

The combination theory suggests a large compositional space (nearly \(10^{8}\) types of HEAs can be developed from about 64 elements in the periodic table) in the central region of the ternary phase diagram6,7. However, HEAs were first reported only in 2004, and accelerating the discovery of novel HEAs has been a compelling subject of study ever since. The search for new HEAs is strenuous because each element, its weight percentage, the synthesis route (vacuum arc melting, powder metallurgy, selective laser melting, additive manufacturing and others) and the processing parameters (cooling rate, processing time, temperature, vacuum/gas) can affect the phase in which a high-entropy alloy stabilises8,9. The enormity of the composition-processing-structure-performance space makes searches based on the traditional trial-and-error approach extremely difficult and time-consuming.

Traditionally, new high-entropy alloys are identified using empirical rules. For instance, a series of TixNbMoTaW (molar ratio x = 0, 0.25, 0.5, 0.75 and 1) refractory high-entropy alloys was developed to find an alloy that can surpass the elevated-temperature properties of Ni-based superalloys and thereby further improve turbine efficiency8,10.

Computational tools can predict materials quickly, and initiatives such as Materials-4.0 are using them to enable rapid advances in materials discovery and beyond11,12. Hitherto, methods such as ab-initio calculations13,14, Monte Carlo simulation15, and CALPHAD16,17 have been used for material prediction in HEAs. Molecular dynamics (MD) and density functional theory (DFT) are two other choices for studying the mechanical behaviour of materials. Like other techniques, these methods have limitations: DFT requires large computational power and is limited to a few atoms, while MD is limited by the ability of the force field or inter-atomic potential function to capture the nature of atomic bonding and the effects reported experimentally in HEAs (cocktail effect, lattice distortion, configurational entropy and sluggish diffusion)9.

Machine learning in particular has risen to prominence in the last decade, as one can simply make use of an available dataset to discover a general trend18. Machine learning (ML), also known as a data-driven approach, is a subset of artificial intelligence (a technique that enables machines to apply intelligence akin to a human brain) and relies on pattern recognition from a given set of data19,20. The accuracy of ML predictions depends on the amount of data fed to the algorithm for training21. The significant surge in the use of ML in materials research is evidence of the promise this technique offers, as explained by other researchers22. Various ML algorithms such as artificial neural networks, convolutional neural networks, random forest, support vector machines, decision trees, gradient boosting, K-nearest neighbours, XGBoost, logistic regression and Naïve Bayes have been employed over the past few years to predict the phases of HEAs.

In all these studies, the known empirical rules for forming solid solution and phase determination have been applied, which include parameters such as atomic size difference (δ), electronegativity difference (∆χ), valence electron concentration (VEC), thermodynamic rule (mixing enthalpy (∆Hmix) and mixing entropy (∆Smix)), and others (Ω-parameter, ϕ‐parameter, and γ-parameter)23. These terms describe the associated chemistry underlying the formation of HEAs and provide an insight into phase prediction, which can be mathematically stated as24:

$${\text{VEC}} = \mathop \sum \limits_{i = 1}^{n} \left( {c_{i} {\text{VEC}}_{i} } \right)$$
(1)
$$\Delta {\text{S}}_{{{\text{mix}}}} = - R\mathop \sum \limits_{i = 1}^{n} \left( {c_{i} {\text{ln}}c_{i} } \right)$$
(2)
$$\Delta {\text{H}}_{{{\text{mix}}}} = \mathop \sum \limits_{i = 1, i < j}^{n} \left( {4 H_{ij} c_{i} c_{j} } \right)$$
(3)
$$\Delta \chi = \sqrt {\mathop \sum \limits_{i = 1}^{n} c_{i} \left( {x_{i} - \overline{x}} \right)^{2} }$$
(4)
$$\delta = \sqrt {\mathop \sum \limits_{i = 1}^{n} c_{i} \left( {1 - r_{i} /\overline{r}} \right)^{2} }$$
(5)

where ci is the atomic percentage of the ith element and n is the total number of metallic elements in a high-entropy alloy. VECi is the valence electron concentration of the ith element, R is the gas constant, and Hij is the mixing enthalpy of the atomic pair formed by elements i and j. xi and x̄ are the Pauling electronegativity of the ith element and the averaged Pauling electronegativity, respectively, ri is the atomic radius of the ith element, and \(\overline{r}\) is the average atomic radius. Note that the value of δ was multiplied by a numerical factor of 100 for better clarity. Corroborating these parameters with historical data has started to gain prominence25,26, leading to the emergence of ML as a cost-effective way to identify, approximate and explain the structure–property relationships in HEAs12,27. In the literature on phase prediction of HEAs using ML algorithms, no study targets one particular synthesis route so that data can be extracted reliably from experimental studies, which would help avoid the spurious effect of alternative synthesis routes on the resulting phase. For example, Bakr et al.28 used a neural network on 775 samples of HEAs synthesised by mixed manufacturing routes (arc melting, sintering, SLM, and others) and obtained 93.4% accuracy in predicting the existence of different phases (AM, BCC, FCC, and IM). Their study did not consider the effect of the manufacturing method on the resulting phase of HEAs.
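As an illustration of how Eqs. (1)–(5) can be evaluated for a single composition, the following is a minimal Python sketch rather than the code used in this study. It assumes that the concentrations are supplied as atomic fractions summing to 1 and that x̄ and r̄ are composition-weighted averages; the function name `hea_parameters`, its argument layout and the placeholder property values are hypothetical.

```python
import math

R = 8.314  # gas constant in J/(mol K)


def hea_parameters(c, vec, chi, radius, h_pair):
    """Evaluate Eqs. (1)-(5) for one alloy (illustrative sketch).

    c      : atomic fractions of the n elements (assumed to sum to 1)
    vec    : valence electron concentration of each element
    chi    : Pauling electronegativity of each element
    radius : atomic radius of each element (any consistent unit)
    h_pair : {(i, j): H_ij in kJ/mol} for every pair with i < j
    """
    n = len(c)
    vec_mix = sum(ci * vi for ci, vi in zip(c, vec))                 # Eq. (1)
    s_mix = -R * sum(ci * math.log(ci) for ci in c if ci > 0)       # Eq. (2)
    h_mix = sum(4 * h_pair[(i, j)] * c[i] * c[j]
                for i in range(n) for j in range(i + 1, n))         # Eq. (3)
    chi_bar = sum(ci * xi for ci, xi in zip(c, chi))                 # weighted mean
    d_chi = math.sqrt(sum(ci * (xi - chi_bar) ** 2
                          for ci, xi in zip(c, chi)))                # Eq. (4)
    r_bar = sum(ci * ri for ci, ri in zip(c, radius))                # weighted mean
    delta = 100 * math.sqrt(sum(ci * (1 - ri / r_bar) ** 2
                                for ci, ri in zip(c, radius)))      # Eq. (5), in %
    return {"VEC": vec_mix, "S_mix": s_mix, "H_mix": h_mix,
            "d_chi": d_chi, "delta": delta}


# Hypothetical equiatomic five-element alloy with placeholder property values.
pairs = {(i, j): -4.0 for i in range(5) for j in range(i + 1, 5)}
params = hea_parameters([0.2] * 5, [8] * 5, [1.8] * 5, [1.25] * 5, pairs)
print(params)
```

For an equiatomic alloy of n elements, Eq. (2) reduces to R ln n, which gives a quick sanity check on the entropy term (about 13.4 J/(mol K) for n = 5).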

Furthermore, to balance the majority and minority classes of an imbalanced dataset, various studies have used over-sampling and under-sampling methods. Over-sampling adds synthetically generated data to the remaining classes until they equal the majority class, whereas under-sampling removes data from the other classes until they equal the minority class. Some studies have also used a generative adversarial network (GAN) to generate synthetic data and so avoid bias in the dataset. However, whether an alloy may be called an HEA is itself controversial. This became the primary basis for our investigation, as we believe that synthetic data is not fully comparable with experimental data and that relying on it is not prudent.

In this paper, we formulate the research objectives keeping in mind the current research gaps in the extant literature as below:

  • Consolidate the scattered data on HEA synthesis obtained specifically through melting and casting routes such as: induction melting/induction levitation melting/ vacuum induction melting, arc melting/smelting and casting, arc melting + suction casting, electric/vacuum arc melting followed by suction casting techniques and to use this data as a fresh/new dataset for machine learning predictions. As opposed to previously published studies, the dataset used here included ternary, quaternary, quinary, and other alloys with more constituent elements making the algorithm ultra-robust, while targeting a synthesis route (melting + casting) that yields consistent phases during repeat experiments. This helped us avoid the spurious effect when combining the data from different synthesis routes on the resulting phase of a HEA. The dataset we tested was carefully screened from various experimental papers concerning synthesis of 3d-transition metals HEAs, refractory metals HEAs, HEA brasses and bronze, low-density HEAs, and some precious metal HEAs.

  • To use a variety of available machine learning algorithms in their vanilla form (base models) such as K-nearest neighbours (V-KNN), support vector machine (V-SVM), decision tree classifier (V-DTC), random forest classifier (V-RFC), and XGBoost (V-XGB) to obtain phase prediction or to classify the phases of new HEAs into solid-solutions (FCC, BCC, FCC + BCC) or mixture of intermetallic phases (MIP) with the view to compare and contrast the robustness of each ML model based on various alternative evaluation metrics in case of imbalanced data, where accuracy percentage can be a misleading indicator.

  • To test whether synthetic data augmentation is reliable in predicting complex alloys such as HEAs. To this end, we compared the vanilla RFC (V-RFC) model trained on the 1200 original data points with the SMOTE-Tomek links augmented RFC (ST-RFC) model trained on 1392 data points (1200 original + 192 generated = 1392).

  • To synthesise an HEA based on our ML predictions, in order to demonstrate that computationally intensive and costly approaches such as ThermoCalc, DFT and ab-initio methods can be avoided when predicting the phase of new HEAs.

Research methodology

Depending on the chemical nature of the constituent elements, HEAs can be classified into five main subfamilies: (i) 3d transition metal high-entropy alloys (3d TM HEAs) containing Fe, Ni, Co, Mn, Ti and Cr, typically exhibiting face-centered cubic (FCC) solid solutions; (ii) refractory high-entropy alloys (RHEAs) constituted by elements of groups IVB, VB and VIB, exhibiting body-centered cubic (BCC) solid solutions; (iii) low-density high-entropy alloys constituted of light elements such as Al, Be, Li, Mg, Ti and Sc, typically presenting hexagonal close-packed (HCP) solid solutions; (iv) HEAs constituted by at least four of the lanthanide elements, also exhibiting HCP solid solutions; and (v) other HEAs, exhibiting the formation of multiple chemically disordered solid solutions (with FCC, BCC, or HCP lattice structures), ordered phases such as B2 and L21, as well as different intermetallics (such as the σ, μ, C14, C15, and C36 Laves phases, among others)29. This suggests that only a small number of HEAs have been discovered so far. It is timely to explore the compositionally concentrated solid-solution alloys at a faster pace to develop novel solutions for various engineering problems.

HEAs emerged in about 2004, and a lot of work is currently ongoing on their development. There are, however, open questions, such as what constitutes an HEA. According to Miracle and Senkov30, the term HEA refers to a single-phase solid solution prepared by controlling the configurational entropy, a definition that limits the objective of exploring the vast compositional space of the central region of the hyper-dimensional phase diagram. On the other hand, terms such as compositionally complex alloys (CCAs) or multi-principal element alloys (MPEAs) evoke the vastness of the composition space without regard to the types of phases present or the magnitude of the configurational entropy. Figure 1a shows the taxonomy of HEAs based on the extant literature, which classifies compositions according to whether they satisfy the ideal theory of HEA formation or not (the latter being called MPEAs/CCAs).

Figure 1

(a) Taxonomy of HEAs based on different definitions30. (b) Phases considered in this work to classify the data based on the existing literature.

Various phases in HEA known theoretically to date can be categorised as below:

  1. Ordered solid-solution (SS) phase: HEA residing in a singular crystalline phase such as B2 or β-ordered BCC phase

  2. Disordered solid-solution (SS) phase: BCC, FCC, HCP

  3. Mixed SS (Ordered + Disordered): FCC + BCC, BCC + B2, FCC + B2

  4. Pure Intermetallics (IM): α, β, σ, μ, L12, L21, C14, C15, and C36 Laves.

  5. IM + SS: BCC + C14 Laves, BCC1 + BCC2 + C15 Laves, BCC + β-ordered BCC, FCC + CoMo2Ni-type IM, FCC + IM and so on.

For the purpose of ML predictions, we clustered these phases as follows: (1) + (2) were considered as single-phase solid solution (SS), (3) was considered as mixed solid solutions (MSS), and (4) + (5) were considered as a mixture of intermetallic phases, referred to as ‘MIP’, as shown in Fig. 1.

Depending on the most-available phases procured from various literature, the current database used in this study contained four phases namely FCC, BCC, FCC + BCC, and MIP (mixture of intermetallic phases), as depicted in Fig. 1b. Due to scarcity of data belonging to HCP solid-solution phase, it was not considered in present study.

An open question in the literature is whether we can predict the type of phase (solid-solution, intermetallic, amorphous) for a given composition with known constituent elements, say an AlxCoyHfz… alloy, where x, y and z are the atomic weight percentages of each element. In this spirit, we demonstrate that an ML strategy can be adopted to predict the phase of an HEA using only reported experimental data, through proper training, testing and validation of ML models, as illustrated by the scheme shown in Fig. 2.

Figure 2

ML framework used in this work for phase prediction of HEAs as solid-solutions (FCC, BCC, FCC + BCC) or mixture of intermetallic phases (MIP).

Data collection

Because different researchers adopt different stoichiometric ratios, synthesis routes and processing conditions, homogeneity in data collection on HEAs cannot be ensured, which makes consolidating the data for comparison a challenging task. This study extracted a dataset of 1200 unique compositions of HEAs experimentally synthesised by melting and casting routes such as induction melting/induction levitation/vacuum induction melting + casting, arc melting/smelting + casting, arc melting + suction casting, or electric/vacuum arc melting followed by suction casting. The corresponding reference for each HEA can be found in the dataset provided and in references30,31,32. Alloys prepared via other synthesis routes (powder metallurgy, selective laser melting, additive manufacturing and others) were not considered, to avoid the effect of synthesis route4. The current dataset comprises 30 elements (Al, Co, Cr, Fe, Ni, Cu, Mn, Ti, V, Nb, Mo, Zr, Hf, Ta, W, C, Mg, Zn, Si, Re, N, Li, Sn, Be, B, Ag, Pt, Y, Pd, Au) and five physical parameters that are crucial for phase prediction of high-entropy alloys. The ranges of the compositional and physical parameters (minimum, maximum, average and standard deviation values) are tabulated in Table 1. A detailed description of the complete dataset is provided as supplementary information [Table S1 and Fig. S1 in supplementary].

Table 1 Range of composition (atomic weight %) and physical parameters used in this study.

Empirical relations observed in high-entropy alloys suggest that formation of an HEA (solid-solution phase) becomes plausible when δ < 6.6% and − 11.6 < ∆Hmix < 3.2 kJ/mol. When δ is large (δ > 6.6%) and ∆Hmix is noticeably negative (∆Hmix = − 12.2 kJ/mol)24, an amorphous phase forms instead of a crystalline phase. Intermetallic compounds tend to form in the intermediate range of δ and ∆Hmix, which overlaps largely with the ranges for solid solutions and amorphous phases. Furthermore, for identifying the crystal structure of solid-solution-forming HEAs, the effect of VEC was formulated and the threshold values were found to be:

  • BCC when VEC < 6.87,

  • FCC when VEC > 8.0 and

  • Mixed phase (BCC + FCC) when VEC is in between 6.87 and 8.0.
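The empirical rules quoted above can be written as a pair of small helper functions. This is only a sketch of the rule-of-thumb screening, not of the ML models used in this work: the function names are hypothetical, the lower enthalpy bound is taken as −11.6 kJ/mol, and the treatment of values lying exactly on a threshold is an assumption, since the text does not specify it.

```python
def solid_solution_plausible(delta, h_mix):
    """Empirical solid-solution window: delta < 6.6 % and -11.6 < dH_mix < 3.2 kJ/mol."""
    return delta < 6.6 and -11.6 < h_mix < 3.2


def structure_from_vec(vec):
    """Crystal structure suggested by the VEC thresholds for solid solutions.

    Values exactly equal to 6.87 or 8.0 are assigned to the mixed phase here
    (an assumption; the thresholds in the text leave the boundaries open).
    """
    if vec < 6.87:
        return "BCC"
    if vec > 8.0:
        return "FCC"
    return "FCC + BCC"


# Hypothetical parameter values for illustration only.
print(solid_solution_plausible(delta=4.2, h_mix=-5.0), structure_from_vec(7.4))
```

Such rules are useful as a baseline against which the ML classifiers can be compared, since they use only two or three of the five parameters at a time.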

A joint plot and a swarm plot are shown for better visualisation [Fig. S2 in supplementary]. The criterion of Zhang et al.33 was almost the same for δ (δ < 6.6%), but the range of ∆Hmix was slightly different (− 15 < ∆Hmix < 5 kJ/mol). Among all the physical parameters (atomic size difference (δ), electronegativity difference (∆χ), valence electron concentration (VEC), thermodynamic rule (mixing enthalpy (∆Hmix) and mixing entropy (∆Smix)), and others (Ω-parameter, ϕ‐parameter, and γ-parameter)) proposed for guiding the design of stable phases in HEAs, only five crucial parameters (∆Hmix, ∆Smix, δ, ∆χ, VEC) were considered in this study, as these are widely accepted and easy to compute. Moreover, because these five parameters can easily be obtained theoretically, a methodology that relies on them alone should make the development of new HEAs much less laborious in future. The ∆Hmix values for the HEAs in the dataset were calculated using Miedema’s rule34, while ∆Smix, δ, ∆χ and VEC were calculated following Guo et al.35. Other parameters, such as the geometric parameter (γ), still await support from more experimental data. Accordingly, these five most influential physical parameters, which are primarily responsible for the crystal structure of an HEA, were considered in the design of this study.

As a proof of concept for testing how crucial these five parameters are, the heatmap shown in Fig. 3 was drawn using the seaborn library of Python; it represents the Pearson correlation coefficients of the five parameters governing the formation of HEAs proposed by various researchers. This heatmap helps to visualise the correlation between features as a sanity check for redundant features. Two features that are strongly positively correlated (when two features move in tandem) or strongly negatively correlated (when two features are inversely related) lead to the problem of multicollinearity, which significantly reduces model performance and increases the standard error. It is therefore suggested to eliminate one of any pair of strongly correlated features36,37. No such strong positive or negative correlation between any two independent features was observed; thus, all five parameters were retained for further study without any elimination.
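The redundancy check described above can be sketched without any plotting library. The snippet below computes the Pearson coefficient for every pair of features and flags pairs whose absolute correlation exceeds a cut-off. The function names, the toy feature values and the cut-off of 0.8 are illustrative assumptions; the text does not state which threshold was regarded as "strong".

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def strongly_correlated_pairs(features, threshold=0.8):
    """Return (name_a, name_b, r) for every feature pair with |r| >= threshold."""
    names = list(features)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged


# Toy values (not taken from the dataset) for three of the five parameters.
toy = {
    "VEC": [5.0, 6.5, 7.2, 8.1, 8.6],
    "delta": [6.1, 2.3, 5.4, 3.0, 4.8],
    "H_mix": [-10.0, -2.0, -8.5, -3.5, -6.0],
}
print(strongly_correlated_pairs(toy))
```

An empty result corresponds to the situation reported here, in which no feature needs to be dropped.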

Figure 3

Heatmap showing the Pearson correlation coefficient among five HEA parameters.

The complete dataset was labelled in four categories: FCC, BCC, FCC + BCC, and MIP. The alloys with a single-phase ordered/ disordered FCC or multiple FCC such as (FCC1 + FCC2) were considered in ‘FCC’ category. Similarly, alloys with a single-phase ordered/ disordered BCC or multiple BCC such as (BCC1 + BCC2) were considered in ‘BCC’ category and the mixture of FCC and BCC phases was considered in ‘FCC + BCC’ category. Compositions containing pure IM compounds (such as Laves, α, β, sigma etc.) or forming a mixture of SS + IM (such as FCC + IM, FCC + BCC + α, BCC + IM, FCC + α + β, BCC + Laves etc.) were considered in ‘MIP’ category, while the amorphous phase was not included in the analysis.

The dataset of 1200 HEAs used in this work contains 441 compositions of the MIP phase, 372 compositions of the BCC phase, 220 compositions of the FCC phase and 167 compositions of the FCC + BCC phase, with no duplicated entry of any alloy. Depending on the number of instances belonging to each class, a dataset for a classification problem can be recognised as balanced (when the number of instances available from each class is equal) or imbalanced (when the number of instances available from each class is different). In an imbalanced dataset, the classes with the highest and lowest numbers of instances are known as the majority and minority classes, respectively.
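The degree of imbalance in the class counts reported above can be summarised in a few lines. The helper name `class_summary` is hypothetical; the counts are the ones given in the text.

```python
def class_summary(counts):
    """Summarise a class distribution: total, majority, minority, shares, ratio."""
    total = sum(counts.values())
    majority = max(counts, key=counts.get)
    minority = min(counts, key=counts.get)
    shares = {label: n / total for label, n in counts.items()}
    ratio = counts[majority] / counts[minority]
    return total, majority, minority, shares, ratio


counts = {"MIP": 441, "BCC": 372, "FCC": 220, "FCC + BCC": 167}
total, majority, minority, shares, ratio = class_summary(counts)
print(total, majority, minority, round(ratio, 2))
print({label: round(share, 3) for label, share in shares.items()})
```

The share of the majority class (441/1200, roughly 37%) is also the accuracy that a trivial classifier would obtain by always predicting MIP, which is one reason why accuracy alone can be misleading for such data.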

It must be noted that the present study discusses the phase prediction of HEAs as solid-solution phases (FCC, BCC, or FCC + BCC) or MIP (pure IM or mixed IM + SS) phases for an imbalanced dataset by targeting only those HEAs that were developed via the melting and casting route. The effect of the imbalanced dataset on the performance of the ML algorithms is discussed explicitly in section "Results and discussions".

Data processing

Before feeding the data into the ML algorithms, some statistical processing steps were performed to make the predictions more meaningful38,39. The text data (phases) were converted into numeric values (MIP: 0, BCC: 1, FCC: 2, and FCC + BCC: 3); outlier detection was performed to remove outliers from the dataset; and various imputation methods, namely the simple imputer with different strategies (mean, median, and constant), the KNN imputer and the MICE imputer, were employed to impute the missing values (NaN) in the dataset. Feature scaling was then performed on each set of imputed data to normalise the data into a finite range, using the robust scaler imported from the scikit-learn library. The robust scaling formula can be expressed as38:

$${X}_{\text{robust }}=\frac{{X}-{X}_{median }}{{X}_{75 }- {X}_{25}}$$
(6)

where X is an input variable, \({X}_{median}\) is the median of X, \({X}_{75}\) is the 75th percentile and \({X}_{25}\) is the 25th percentile of X. The difference between the 75th and 25th percentiles is known as the interquartile range (IQR).
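Equation (6) can be illustrated with the Python standard library alone. This sketch is not the scikit-learn implementation used in the study; it assumes linearly interpolated quartiles (the `"inclusive"` method of `statistics.quantiles`), and the function name `robust_scale` is hypothetical.

```python
import statistics


def robust_scale(values):
    """Scale a feature as (x - median) / IQR, following Eq. (6)."""
    # Quartiles with linear interpolation between data points.
    q25, median, q75 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q75 - q25
    if iqr == 0:
        raise ValueError("interquartile range is zero; feature cannot be scaled")
    return [(v - median) / iqr for v in values]


print(robust_scale([1, 2, 3, 4, 5]))
print(robust_scale([1, 2, 3, 4, 100]))  # the outlier does not change median or IQR
```

Because the median and the IQR are insensitive to extreme values, the first four points are scaled identically in both calls, which is the motivation for preferring robust scaling when outliers may remain in the data.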

Brief description of the machine learning algorithms

KNN algorithm

The KNN algorithm searches for the nearest neighbours by measuring the distance between the two points40,41 and is expressed as:

$$d\left(q,{x}_{i}\right)=\sum_{f\in F} {w}_{f}\delta \left({q}_{f},{x}_{{i}_{f}}\right)$$
(7)

To classify an unknown input (q), one needs the existing inputs (xi), the set of features F, the weight factor (wf) of each feature f and the per-feature distance δ(qf, xif). Based on this distance, the k nearest neighbours are selected, and the class of q is determined from the voting of the nearest neighbours as below:

$${\text{Vote}}\left({y}_{i}\right)=\sum_{c=1}^{k} \frac{1}{d{\left(q,{x}_{c}\right)}^{p}}1\left({y}_{i},{y}_{c}\right)$$
(8)

Here 1(yi, yc) returns 1 if the class labels match and 0 if they do not. The vote assigned to class yi by neighbour xc is therefore the inverse of its distance to q raised to the power p, so closer neighbours carry more weight.
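The distance and voting rules of Eqs. (7) and (8) can be sketched as follows. This is an illustrative implementation, not the scikit-learn classifier used in this work. It assumes that the per-feature distance is the absolute difference of numeric features and that votes are weighted by the inverse distance raised to an exponent `p`; the function names and the toy data are hypothetical.

```python
from collections import defaultdict


def weighted_distance(q, x, weights):
    """Eq. (7): weighted sum of per-feature distances (absolute differences here)."""
    return sum(w * abs(a - b) for w, a, b in zip(weights, q, x))


def knn_predict(q, X, y, k=3, p=1, weights=None):
    """Eq. (8): distance-weighted vote among the k nearest neighbours of q."""
    weights = weights or [1.0] * len(q)
    ranked = sorted((weighted_distance(q, x, weights), label)
                    for x, label in zip(X, y))
    votes = defaultdict(float)
    for dist, label in ranked[:k]:
        # Guard against division by zero when q coincides with a training point.
        votes[label] += 1.0 / max(dist, 1e-12) ** p
    return max(votes, key=votes.get)


# Toy two-feature training set (values are illustrative only).
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
y = ["BCC", "BCC", "FCC", "FCC"]
print(knn_predict([0.2, 0.1], X, y, k=3))
```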

SVM algorithm

SVM classifier searches for the hyper plane that best separates different classes by maximising the margin (the distance between the nearest data points from different class sets) to avoid the local minima and to achieve the best separation of different classes42,43. The decision function is as below:

$$f\left(\mathbf{x}\right)=\mathbf{w}\cdot \mathbf{x}+b$$
(9)
$$\underset{\mathbf{w},\xi }{min} \left\{\frac{1}{2}\parallel \mathbf{w}{\parallel }^{2}+C\sum_{i=1}^{N} {\xi }_{i}\right\}$$
(10)
$${\text{Subject to}}:{ }y_{i} \left( {{\mathbf{w}} \cdot {\mathbf{x}}_{i} + b} \right) \ge 1 - \xi_{i} ,\xi_{i} \ge 0$$
(11)
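To make the role of the slack variables concrete, the sketch below evaluates the soft-margin objective of Eq. (10) for a given hyperplane in the binary case, with labels in {−1, +1}. For fixed w and b, the smallest slack satisfying the constraints in Eq. (11) is ξi = max(0, 1 − yi(w·xi + b)). The function names and the toy points are hypothetical, and no optimisation is performed here.

```python
def dot(a, b):
    """Dot product of two equally long vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def svm_objective(w, b, X, y, C=1.0):
    """Return the value of Eq. (10) and the slacks for a fixed hyperplane (w, b).

    Labels y must be -1 or +1. Each slack is the smallest value that
    satisfies the constraints of Eq. (11) for the given w and b.
    """
    slacks = [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]
    return 0.5 * dot(w, w) + C * sum(slacks), slacks


# Toy data: the third point lies inside the margin and needs a slack of 0.5.
X = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
y = [1, -1, 1]
print(svm_objective([1.0, 0.0], 0.0, X, y, C=1.0))
```

A larger C penalises margin violations more heavily, trading a wider margin for fewer misclassified or in-margin training points.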

DTC algorithm

A decision tree classifier splits the dataset into a root node, sub-nodes and leaf nodes by calculating the information gain, i.e., the change in entropy after dividing a dataset based on an attribute, which helps to determine the order of features in the nodes of a decision tree (quality of splitting)44,45. The information gain of a split on X is H(Y) − H(Y|X), where the conditional entropy is calculated as below:

$$H\left( {Y|X} \right) = H\left( {X,Y} \right) - H\left( X \right)$$
(12)

where \(H(Y\mid X)\) is the conditional entropy, \(H\left(X\right)\) is the entropy of random variable X, and \(H\left(X,Y\right)\) is the joint entropy, calculated as follows:

$$H(X,Y)=-\sum_{i,j} p\left({x}_{i},{y}_{j}\right){\text{log}}_{2}p\left({x}_{i},{y}_{j}\right)$$
(13)
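Equations (12) and (13) can be evaluated directly from label counts. The sketch below computes the joint entropy, the conditional entropy and the resulting information gain H(Y) − H(Y|X) for a categorical split; the function names and the toy labels are hypothetical.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (base 2) of a sequence of hashable outcomes."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def conditional_entropy(x, y):
    """Eq. (12): H(Y|X) = H(X, Y) - H(X), with H(X, Y) from Eq. (13)."""
    return entropy(list(zip(x, y))) - entropy(x)


def information_gain(x, y):
    """Reduction in the entropy of y obtained by splitting on x."""
    return entropy(y) - conditional_entropy(x, y)


phases = ["BCC", "BCC", "FCC", "FCC"]
print(information_gain(["low", "low", "high", "high"], phases))  # informative split
print(information_gain(["a", "b", "a", "b"], phases))            # uninformative split
```

A split that separates the phases perfectly removes all the uncertainty in the labels, whereas a split that is independent of the phase leaves the entropy unchanged.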

RFC algorithm

In a random forest classifier, an ensemble of decision trees (base learners) \({h}_{1}(x),{h}_{2}(x),\dots ,{h}_{J}(x)\) is considered. The prediction f(x) is taken as the majority vote of the trees, such that the loss function is minimised46,47. The loss function is expressed as below:

$$L(Y,f(x))=I(Y\ne f(x))=\left\{\begin{array}{c}0, \, {\text{if}} \, Y=f(x)\\ 1, \, {\text{otherwise}}\end{array}\right.$$
(14)
$${\text{Voting is based on }}f(x)={\text{arg}}max\sum_{j=1}^{J} I\left(y={h}_{j}(x)\right)$$
(15)
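The voting rule of Eq. (15) and the zero-one loss of Eq. (14) can be sketched with plain callables standing in for trained trees. The three "trees" below are hypothetical one-rule stumps with made-up thresholds, used only to show the aggregation step; they are not the trees learned in this study.

```python
from collections import Counter


def zero_one_loss(y_true, y_pred):
    """Eq. (14): 0 if the prediction matches the label, 1 otherwise."""
    return 0 if y_true == y_pred else 1


def forest_predict(trees, x):
    """Eq. (15): return the class that receives the most votes from the trees.

    Ties are resolved in favour of the class that was voted for first.
    """
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical stumps acting on a dict of features.
trees = [
    lambda x: "BCC" if x["VEC"] < 6.87 else "FCC",
    lambda x: "BCC" if x["delta"] > 5.0 else "FCC",
    lambda x: "BCC" if x["VEC"] < 7.5 else "FCC",
]
sample = {"VEC": 6.0, "delta": 3.0}
prediction = forest_predict(trees, sample)
print(prediction, zero_one_loss("BCC", prediction))
```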

XGBoost algorithm

XGBoost combines a set of weak classifiers to create a strong classifier42,48. The objective function is expressed as:

$${\text{obj}}(\theta )=\sum_{i}^{n} l\left({\widehat{y}}_{i},{y}_{i}\right)+\sum_{k}^{K}\Omega \left({f}_{k}\right)$$
(16)

The term \(l\left({\widehat{y}}_{i},{y}_{i}\right)\) represents the loss function, which measures the difference between the predicted output and the actual output, where \({y}_{i}\) is the actual output, and \({\widehat{y}}_{i}\) is the predicted output given by \({\widehat{y}}_{i}=\sum_{k}^{K} {f}_{k}\left({x}_{i}\right),{f}_{k}\in F.\) \({x}_{i}\) is the input variable and \(\Omega \left({f}_{k}\right)\) is the regularisation term that helps to avoid overfitting by penalising the complexity of the model. XGBoost is trained additively, with one tree optimised and added at a time. The supplementary information provides a description and visualisation of these algorithms [Table S2 in supplementary].
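The structure of the objective in Eq. (16) can be illustrated with a toy additive model. The sketch below is not the XGBoost library: it assumes a squared-error loss and the commonly used regulariser Ω(f) = γT + ½λΣw², where T is the number of leaves and w the leaf weights, neither of which is specified in the text. Each "tree" is a hypothetical pair of a prediction function and its list of leaf weights.

```python
def predict(trees, x):
    """Additive prediction: sum of the outputs of all trees for input x."""
    return sum(f(x) for f, _ in trees)


def omega(leaf_weights, gamma=0.0, lam=1.0):
    """Assumed regulariser: gamma * (number of leaves) + 0.5 * lam * sum(w^2)."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)


def objective(trees, X, y, gamma=0.0, lam=1.0):
    """Eq. (16) with squared-error loss (an assumption) plus regularisation."""
    loss = sum((predict(trees, x) - yi) ** 2 for x, yi in zip(X, y))
    reg = sum(omega(weights, gamma, lam) for _, weights in trees)
    return loss + reg


# One hypothetical stump with two leaves; a second identical stump is then added.
stump = (lambda x: 0.5 if x[0] < 7.0 else -0.5, [0.5, -0.5])
X, y = [[6.0], [8.0]], [1.0, -1.0]
print(objective([stump], X, y, gamma=1.0))
print(objective([stump, stump], X, y, gamma=1.0))
```

Adding the second stump removes the training loss in this toy case but doubles the complexity penalty, which is the trade-off the regularisation term is meant to control.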

Results and discussions

For an imbalanced dataset problem such as the one tested in this work, careful treatment is essential, or else the predictions can be misleading. Accuracy is a well-accepted measure for evaluating the performance of a classifier. However, for an imbalanced dataset, the use of accuracy as an effective indicator has recently been questioned by various authors49,50,51,52,53. Therefore, alternative evaluation metrics for assessing the effectiveness of ML models on an imbalanced dataset were explored, as accuracy alone is not trustworthy. Evaluation metrics such as the ROC-AUC score, Precision, Recall, and F1-score, available in the scikit-learn module (version 1.1.1)54 for Python (version 3.9.12), are robust measures for imbalanced dataset classification55.

The receiver operating characteristic (ROC) curve is a probability curve typically plotted for binary classification tasks at different classification threshold values54,56.

This paper studies the phase prediction of HEAs as solid-solution phases such as FCC, BCC, and FCC + BCC, or MIP (which can be either pure IM or mixed IM + SS phase, as described in Fig. 1a), by targeting the multiclass classification of HEAs into four phases namely FCC, BCC, FCC + BCC and MIP using the real-world imbalanced dataset of HEAs.

The ROC curve can be extended to multiclass classification with the ‘one-vs-one’ and ‘one-vs-rest’ strategies54,57. Here, the ‘one-vs-rest’ strategy was employed: the AUC score (the area under the ROC curve) was computed for each class against the rest of the classes, and the scores were subsequently averaged. The ROC_AUC score summarises a classifier’s performance by measuring the area under the ROC curve, which is more likely to be a true representation of the model’s performance. The ROC_AUC score varies between 0 and 1, where 1 denotes a perfect classifier, 0.5 corresponds to random guessing and 0 denotes a perfectly incorrect classifier.
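The one-vs-rest averaging described above can be sketched in pure Python. The binary AUC is computed here through its rank interpretation, i.e. the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. This is an illustration rather than the scikit-learn routine used in the study; the unweighted (macro) average, the function names and the toy probabilities are assumptions.

```python
def binary_auc(is_positive, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for flag, s in zip(is_positive, scores) if flag]
    neg = [s for flag, s in zip(is_positive, scores) if not flag]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ovr_macro_auc(y_true, proba, classes):
    """One-vs-rest AUC per class and their unweighted mean.

    proba[i][k] is the predicted probability that sample i belongs to classes[k].
    """
    per_class = {}
    for k, cls in enumerate(classes):
        per_class[cls] = binary_auc([t == cls for t in y_true],
                                    [row[k] for row in proba])
    return sum(per_class.values()) / len(per_class), per_class


classes = ["MIP", "BCC", "FCC"]
y_true = ["MIP", "BCC", "FCC", "MIP"]
proba = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]
print(ovr_macro_auc(y_true, proba, classes))
```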

Precision evaluates the fraction of predicted positives that are actually true positives (TP), Recall measures the fraction of actual positives that the model predicts correctly, and the F1-score is the harmonic mean of Precision and Recall58,59. Detailed descriptions of Precision, Recall and F1-score are given in the supplementary information [Table S3 in supplementary].
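For a multiclass problem these three metrics are evaluated per class, treating the class of interest as positive and all others as negative. A minimal sketch follows; the function name and the toy labels are hypothetical, and returning 0 when a denominator is zero is an assumed convention.

```python
def precision_recall_f1(y_true, y_pred, cls):
    """Precision, Recall and F1-score for one class in a multiclass problem."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


y_true = ["BCC", "BCC", "FCC", "MIP"]
y_pred = ["BCC", "FCC", "FCC", "MIP"]
for cls in ("BCC", "FCC", "MIP"):
    print(cls, precision_recall_f1(y_true, y_pred, cls))
```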

The effectiveness of the five vanilla (base) models (V-KNN, V-SVM, V-DTC, V-RFC, and V-XGB) was compared using all the above-mentioned evaluation metrics for five different imputers (simple imputer (SI) with mean, median and constant strategies; KNN imputer; and MICE imputer), as tabulated in the supplementary information [Table S4]. No significant difference in model performance between these imputers was observed. Vanilla RFC (V-RFC) performed best among the algorithms, with an average test accuracy of 84%, a ROC-AUC score of 0.9649 and a tenfold cross-validation mean score of 0.9315, as shown in Fig. 4a.

Figure 4

(a) Performance comparison of V-KNN, V-SVM, V-DTC, V-RFC, V-XGB, HT-RFC, and ST-RFC models using average test accuracy (multiply by 100 for % value), ROC_AUC score, tenfold cross-validation score and its standard deviation (values shown in red color), (b) F1-score, Recall and Precision for four distinct phases of HEAs for five vanilla models.

Furthermore, F1-score, Recall and Precision were evaluated for all five vanilla models, and V-RFC obtained higher Precision, Recall and F1-score than the other models (see Fig. 4b). Note that for each model, five iterations were performed and their average was taken. The bar chart in Fig. 4a compares three outcomes for all ML models tested in this work, namely accuracy (peach-coloured bars), ROC-AUC score (light green bars) and tenfold cross-validation score (light purple bars). We further explored hyper-parameter tuning of the RFC model (HT-RFC) and noticed an increment of approximately 3% in average test accuracy (87.49%).

It is not surprising that many studies have reported higher accuracy from their ML predictions, but the fact that these accuracies were obtained by mixing synthetic data with the experimental data casts doubt on the reliability of these models. For instance, Risal et al.60 obtained 92.31% accuracy with higher ROC-AUC, precision, recall and f1-scores, but they used over-sampling/under-sampling methods to balance the majority and minority class data by augmenting it with synthetic data, while acknowledging that the “ML algorithms usually do not perform well for imbalanced dataset”.

In our view, augmenting or polluting real-world data with synthetically generated data is not reliable for two reasons: first, accuracy alone is not the most robust measure for assessing the performance of an ML model on imbalanced data; and second, the controversy over what may be called an HEA still exists, so it cannot be assured that the generated samples are truly high-entropy alloys.

Still, in keeping with current practice, we resampled our data using the SMOTE-Tomek links method for the V-RFC model (which outperformed the other vanilla algorithms). This method is quite different from other existing over-sampling and under-sampling methods: it generates synthetic data for the minority class using SMOTE and removes the data from the majority class that are closest to the minority class using Tomek links61. An average accuracy of 92% was observed for the augmented data (1200 + 192 = 1392) using SMOTE-Tomek links (ST-RFC), with 192 synthetic data points generated.
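The two ingredients of the SMOTE-Tomek links method can be sketched as follows. This is a simplified illustration of the underlying ideas, not the resampling library used for the ST-RFC model: a SMOTE sample is a random interpolation between a minority point and one of its minority neighbours, and a Tomek link is a pair of mutual nearest neighbours that carry different labels. The function names, the Euclidean metric and the toy data are assumptions.

```python
import math
import random


def smote_sample(x, neighbour, rng):
    """Synthetic point on the segment between x and a same-class neighbour."""
    lam = rng.random()  # interpolation factor in [0, 1)
    return [xi + lam * (ni - xi) for xi, ni in zip(x, neighbour)]


def nearest(i, X):
    """Index of the nearest other point to X[i] (Euclidean distance)."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: math.dist(X[i], X[j]))


def tomek_links(X, y):
    """Pairs (i, j), i < j, of mutual nearest neighbours with different labels."""
    links = []
    for i in range(len(X)):
        j = nearest(i, X)
        if y[i] != y[j] and nearest(j, X) == i and i < j:
            links.append((i, j))
    return links


rng = random.Random(0)
print(smote_sample([0.0, 0.0], [1.0, 2.0], rng))
# One-dimensional toy data: points 1 and 2 form a Tomek link.
print(tomek_links([[0.0], [1.0], [1.2], [3.0]], ["MIP", "MIP", "FCC + BCC", "FCC + BCC"]))
```

The interpolation step makes clear why a synthetic composition is not guaranteed to correspond to a physically realisable alloy: it is generated from feature values alone, without any experimental confirmation of its phase.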

We further evaluated the performance of V-RFC and ST-RFC with a confusion matrix to analyse their prediction quality for each phase. A confusion matrix is an easy way to visualise a classifier’s performance: an n × n matrix is created (n is the number of classes) to provide better insight into the correctly and incorrectly classified instances. Two 4 × 4 confusion matrices were created: one on test data of 240 HEA samples (93 MIP, 67 BCC, 46 FCC, and 34 FCC + BCC) for the V-RFC model from the original dataset (1200), and one on 279 HEA samples (78 MIP, 67 BCC, 40 FCC, and 94 FCC + BCC) for the ST-RFC model from the augmented dataset (1200 + 192 = 1392 samples), to investigate the performance of the RFC model in predicting phases of HEAs, as shown in Fig. 5.
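Building such a matrix requires only the true and predicted labels. In the sketch below, rows correspond to the true phase and columns to the predicted phase; this orientation, the function names and the toy labels are assumptions, since the text does not state the convention used in Fig. 5.

```python
def confusion_matrix(y_true, y_pred, classes):
    """n x n matrix whose entry [i][j] counts samples of class i predicted as class j."""
    index = {cls: k for k, cls in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def recall_per_class(matrix):
    """Diagonal entry divided by the row sum, i.e. the fraction of each true class recovered."""
    return [row[k] / sum(row) if sum(row) else 0.0 for k, row in enumerate(matrix)]


classes = ["MIP", "BCC", "FCC"]
y_true = ["MIP", "MIP", "BCC", "FCC"]
y_pred = ["MIP", "BCC", "BCC", "FCC"]
cm = confusion_matrix(y_true, y_pred, classes)
print(cm)
print(recall_per_class(cm))
```

Off-diagonal entries show which phases are confused with one another, information that a single accuracy value cannot convey.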

Figure 5

Confusion matrix comparing the performance of V-RFC, and ST-RFC for each distinct phase.

It can be noticed that the number of samples in the minority class increases for the ST-RFC model, with the stratified ratio between the classes maintained. Although the number of incorrect predictions is lower for the ST-RFC model than for the V-RFC model, the approach is still unreliable considering the uncertainty associated with synthetic data, which cannot be guaranteed to represent a high-entropy alloy. Furthermore, the ROC curves and AUC scores for all five vanilla models trained on the original data and for the SMOTE-Tomek links models trained on the augmented data were plotted, as shown in Fig. 6a,c. The ROC-AUC scores of all ST models were higher than those of the vanilla models. We selected the best models, i.e., V-RFC and ST-RFC, and evaluated the AUC score for each phase (MIP, BCC, FCC, and FCC + BCC) on the test data of the original dataset (test data = 240 HEAs) and the augmented dataset (test data = 279 HEAs), as depicted in Fig. 6b,d. Although the ROC-AUC of the ST-RFC model was approximately 3% higher than that of the vanilla model (V-RFC), both models provided approximately similar AUC scores for each phase except the FCC + BCC phase. The higher AUC score of the ST-RFC model for the FCC + BCC phase is due to the increased number of instances of this class, which was the minority class in the original test data (34 instances) and became the majority class (94 instances) after augmentation. We therefore reinforced our point by comparing the confusion matrices and ROC-AUC scores for the original and augmented datasets. As these measures provide better insight and are considered true indicators of a classification model, we claim that augmenting data to increase a model’s accuracy is not a reliable practice. This study is therefore all the more pertinent considering the aforementioned issues.

Figure 6

ROC-AUC scores for (a) the five vanilla (base) models, (b) each phase for the best vanilla model, i.e., V-RFC, (c) the five SMOTE-Tomek links augmented models, and (d) each phase for the best SMOTE-Tomek links augmented model, i.e., ST-RFC.
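A per-phase AUC of the kind shown in panels (b) and (d) can be obtained by treating each phase in turn as the positive class and all other phases as negative (one-vs-rest). The following sketch computes the AUC from its rank interpretation, i.e., the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The function names and the toy probabilities are assumptions made for illustration; they are not the implementation or the data behind Fig. 6.

```python
def binary_auc(scores, is_positive):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))

def one_vs_rest_auc(y_true, proba, labels):
    """proba[i][k] is the predicted probability that sample i belongs to labels[k]."""
    return {
        label: binary_auc([row[k] for row in proba], [y == label for y in y_true])
        for k, label in enumerate(labels)
    }

# Toy class probabilities, invented for illustration only.
labels = ["MIP", "BCC", "FCC", "FCC+BCC"]
y_true = ["MIP", "BCC", "FCC", "FCC+BCC", "FCC"]
proba = [
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.6, 0.1, 0.1],
    [0.1, 0.1, 0.6, 0.2],
    [0.1, 0.1, 0.5, 0.3],
    [0.1, 0.2, 0.4, 0.3],
]

auc = one_vs_rest_auc(y_true, proba, labels)
for label in labels:
    print(f"{label:>8}: AUC = {auc[label]:.3f}")
```

Because each per-phase AUC depends on how many positive and negative samples a phase has, changing the class balance through augmentation can shift the score for one phase without the classifier becoming better at the others.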

Model validation

Validation based on the literature

The predictive capability of all five classifiers was further tested, as a sense check, on alloys that were not used for training or testing. Five recently reported alloys (2 refractory HEAs10, one 3d-transition metal HEA62 and 2 precious metal HEAs63) were taken as examples from experimental studies in the literature; their phases are shown in Table 2. The physical parameters of these HEAs were calculated using the formulae given in earlier sections. In Table 2, phases in bold font indicate wrong predictions (the prediction does not match the experimentally characterised phase), phases in italics indicate exceptional cases (where the certainty of matching the ML prediction to the actual phase is limited), and the remaining phases (neither bold nor italic) indicate correct predictions, where the ML models agree with the experimentally reported phases.

Table 2 Validation of the performance of all vanilla (base) models on unseen compositions (not used in training or test datasets).

It can further be noted that the RFC classifier predicted the phases correctly in most cases. For the Al0.5CrCuNiV 3d-transition metal HEA62, the RFC model predicted the MIP phase (in italic font), while the actual alloy is a complex multiphase alloy containing 1FCC + 2BCC + ordered B2 phases. The ML models assume that MIP can be either a pure intermetallic compound (IM) or a mixture of intermetallic and solid-solution phases (IM + SS), as discussed in section "Research methodology". Under this assumption, the RFC model’s prediction is correct for all new compositions taken from different experimental studies, none of which were part of the training or test dataset. However, the model is limited in inferring the number and types of phases present in complex multiphase HEAs. A list of such complex multiphase HEAs (not part of the training or test set) is given in Table S5 of the supplementary information.

To further support our claim, we critically assessed the recently published literature in terms of the evaluation metrics reported, as shown in Fig. 7. Few studies were found to report alternative evaluation metrics such as ROC-AUC, precision, recall and F1-score. The RFC model proposed in the present work, for which the ROC-AUC score, tenfold cross-validation score, confusion matrix, F1-score, recall, and precision are all reported, showed satisfactory performance in predicting the phases of HEAs as solid-solution phases (BCC, FCC, FCC + BCC) or MIP. Here MIP denotes either pure intermetallic compounds (IM, such as α, β, σ, L12, C14, C15, C36 Laves, and others) or mixed IM + SS phases (such as FCC + IM, FCC + BCC + α, BCC + IM, FCC + α + β, BCC + Laves, BCC1 + BCC2 + C15 Laves etc.). This performance was obtained on a large imbalanced dataset restricted to HEAs synthesised via the melting and casting route only, without augmenting or polluting the experimental data with generated data.

Figure 7

Performance comparison with existing literature28,31,60,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79.
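The reason for reporting precision, recall and F1-score alongside accuracy can be seen in a small worked example. The sketch below derives the per-class metrics from a confusion matrix (rows for actual class, columns for predicted class) and applies them to an invented two-class case in which the model never predicts the minority class. All counts and names are hypothetical and chosen only to illustrate the point.

```python
def per_class_metrics(matrix, labels):
    """Precision, recall and F1 per class from a confusion matrix (rows = actual)."""
    n = len(labels)
    report = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        actual = sum(matrix[k])                          # row sum
        predicted = sum(matrix[r][k] for r in range(n))  # column sum
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report

def accuracy(matrix):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total

# Invented counts: 90 majority-class and 10 minority-class samples,
# with a model that never predicts the minority class.
labels = ["majority", "minority"]
matrix = [[90, 0],
          [10, 0]]

report = per_class_metrics(matrix, labels)
acc = accuracy(matrix)
print("accuracy:", acc)
print("minority recall:", report["minority"]["recall"])
```

In this invented case the accuracy is high although no minority-class sample is ever identified, which is why accuracy alone is a weak basis for comparing models trained on imbalanced data.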

Synthesis and characterisation of a new HEA (Ni25Cu18.75Fe25Co25Al6.25)

Building on these findings, a new high-entropy alloy was synthesised based on the predictions of the RFC algorithm. The alloy consists of nickel, copper, iron, cobalt and aluminium, with the composition Ni25Cu18.75Fe25Co25Al6.25.
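Two of the five descriptors used as model inputs can be evaluated for this composition with a few lines of code. The sketch below assumes the standard definitions, namely the ideal mixing entropy ∆Smix = −R Σ ci ln ci and the composition-weighted VEC = Σ ci VECi, together with the valence electron counts conventionally used in HEA studies (Ni 10, Cu 11, Fe 8, Co 9, Al 3). The function names are hypothetical, and the simple parser requires an explicit amount after every element symbol.

```python
import math
import re

R = 8.314  # gas constant, J/(mol K)

# Valence electron counts in the usual convention for HEA studies (assumed here).
VEC_TABLE = {"Ni": 10, "Cu": 11, "Fe": 8, "Co": 9, "Al": 3}

def parse_composition(formula):
    """Turn 'Ni25Cu18.75...' into atomic fractions that sum to 1."""
    pairs = re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)", formula)
    total = sum(float(amount) for _, amount in pairs)
    return {element: float(amount) / total for element, amount in pairs}

def mixing_entropy(fractions):
    """Ideal configurational entropy: -R * sum(c_i * ln c_i)."""
    return -R * sum(c * math.log(c) for c in fractions.values() if c > 0)

def valence_electron_concentration(fractions):
    """Composition-weighted average: sum(c_i * VEC_i)."""
    return sum(c * VEC_TABLE[el] for el, c in fractions.items())

fractions = parse_composition("Ni25Cu18.75Fe25Co25Al6.25")
s_mix = mixing_entropy(fractions)
vec = valence_electron_concentration(fractions)

print(f"mixing entropy = {s_mix:.2f} J/(mol K) = {s_mix / R:.3f} R")
print(f"VEC = {vec:.2f}")
```

Under these assumptions the alloy has ∆Smix of about 12.7 J/(mol K) and a VEC of 9.0. A widely cited empirical rule of thumb associates VEC values of about 8 and above with FCC solid solutions, although the RFC model in this work uses all five descriptors rather than a single threshold.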

Metal buttons of Ni, Cu, Fe, Co and Al (purities > 99.99%) were purchased from Thermofisher Scientific®. The elemental metals were melted together by vacuum arc melting under an inert gas (high-purity Ar) environment. The resulting ingot was remelted and solidified multiple times to ensure chemical homogeneity. The HEA button was then vacuum sealed in a quartz tube, homogenised at 1000 °C for 10 h, and quenched in water to stabilise the high-temperature phase. A detailed description of the newly synthesised high-entropy alloy is given in Table 3.

Table 3 Detailed description of newly synthesized high-entropy alloy.

X-ray diffraction (XRD, Bruker D8) with Cu Kα radiation (λ = 1.54056 Å) was used to identify the phase of the Ni25Cu18.75Fe25Co25Al6.25 alloy; patterns were recorded over 2θ = 20°–100° with a step size of 0.02° (see Fig. 8). The Bragg peaks (111), (200), (220), (311), and (222) belong to lattice planes of the FCC phase, and no peaks corresponding to an ordered structure were detected, indicating that this new HEA has a crystalline FCC structure.

Figure 8

XRD analysis of the newly synthesized HEA (Ni25Cu18.75Fe25Co25Al6.25) for the as-cast and heat-treated samples. Peaks (111), (200), (220), (311) and (222) correspond to the FCC structure.
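The indexing of these five peaks follows from the FCC selection rule (reflections are allowed only when h, k and l are all even or all odd) combined with Bragg’s law, λ = 2d sin θ, with d = a/√(h² + k² + l²) for a cubic lattice. The sketch below lists the FCC reflections expected in the 20°–100° scan range. The lattice parameter a = 3.60 Å is an assumed, illustrative value typical of FCC transition-metal alloys and is not a value reported in this work; the function names are likewise hypothetical.

```python
import math
from itertools import product

WAVELENGTH = 1.54056  # Cu K-alpha, in angstrom (from the text)

def fcc_allowed(h, k, l):
    """FCC selection rule: h, k, l all even or all odd."""
    return len({h % 2, k % 2, l % 2}) == 1

def two_theta(hkl, a):
    """Bragg angle 2*theta in degrees for a cubic lattice parameter a (angstrom)."""
    d = a / math.sqrt(sum(i * i for i in hkl))
    s = WAVELENGTH / (2 * d)
    return 2 * math.degrees(math.asin(s)) if s <= 1 else None

def fcc_peaks(a, lo=20.0, hi=100.0, max_index=4):
    """Unique FCC reflections (h >= k >= l) whose 2*theta falls in [lo, hi]."""
    peaks = []
    for h, k, l in product(range(max_index + 1), repeat=3):
        if (h, k, l) == (0, 0, 0) or not (h >= k >= l) or not fcc_allowed(h, k, l):
            continue
        angle = two_theta((h, k, l), a)
        if angle is not None and lo <= angle <= hi:
            peaks.append(((h, k, l), round(angle, 1)))
    return sorted(peaks, key=lambda p: p[1])

# a = 3.60 angstrom is an assumed, illustrative lattice parameter,
# not a value taken from the paper.
peaks = fcc_peaks(3.60)
for hkl, angle in peaks:
    print(hkl, angle)
```

For the assumed lattice parameter, exactly the five reflections (111), (200), (220), (311) and (222) fall inside the scan range, in that order of increasing angle, which matches the set of peaks indexed in Fig. 8.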

Figure 9 compares the HEAs of the test dataset (240 alloys) from the original data with the newly developed and synthesised HEA composition Ni25Cu18.75Fe25Co25Al6.25. The orange dots represent the reported experimental phases of the test-data HEAs, the blue triangles represent the RFC predictions, and the red asterisk represents the new composition Ni25Cu18.75Fe25Co25Al6.25. The RFC algorithm indicated that this new HEA would stabilise in the FCC phase at room temperature.

Figure 9

Phase prediction of the novel HEA composition Ni25Cu18.75Fe25Co25Al6.25 (shown as a red asterisk), along with the 240 test data points.

Good agreement between the RFC model prediction and experiment can be seen for this new HEA composition. It can be inferred that the V-RFC model is reliable in classifying novel HEA compositions as simple solid solutions (FCC, BCC, FCC + BCC) or MIP (mixture of intermetallic phases), where MIP denotes the presence of either pure IM compounds (such as α, β, σ, L12, L21, C14, C15, C36 Laves) or mixtures of IM + SS phases (FCC + IM, FCC + BCC + α, BCC + IM, FCC + α + β, BCC + Laves, BCC1 + BCC2 + C15 Laves, BCC + β-ordered BCC, FCC + CoMo2Ni-type IM etc.). The method is robust for predicting solid-solution phases, but it is limited in identifying the exact number and types of phases present in a complex multiphase HEA, which it usually predicts as MIP.

A follow-on to this study will be to test and predict the mechanical properties of HEAs, which will require atomistic studies. From a survey of the molecular dynamics simulation literature, the EAM potential currently available for 16 elements (Cu, Ag, Au, Ni, Pd, Pt, Al, Pb, Fe, Mo, Ta, W, Mg, Co, Ti, Zr) was tested by the authors and found to predict the mechanical properties of HEAs reliably80. For traceability, the alloy used for ML validation in this work (Ni25Cu18.75Fe25Co25Al6.25) is the same as the one used in that MD study.

Conclusion

This study set out to develop a novel high-entropy alloy using robust machine learning algorithms, benchmarked against prior literature. Efforts in this area will strengthen materials discovery research and guide initiatives at the forefront of Materials-4.0. The major conclusions of this study can be summarised as follows:

  1. a.

    An imbalanced dataset in which synthetic data are merged into the experimental data can lead to spurious outcomes when fed to machine learning algorithms. Although such an approach (routinely adopted in the literature) can help achieve higher model accuracy, it can compromise the quality of prediction, particularly when inferring complex phases of high-entropy alloys (HEAs). Our work revealed that reliable predictions of the phase of an HEA can be made using only five crucial parameters: valence electron concentration (VEC), electronegativity difference (∆χ), mixing entropy (∆Smix), atomic size difference (δ), and mixing enthalpy (∆Hmix). One must, however, be cautious about using selectively screened experimental input data to feed the ML algorithm.

  2. b.

    The performance of the ML models was assessed using accuracy, precision, recall, F1-score, ROC-AUC score and tenfold cross-validation scores. Among K-nearest neighbours (V-KNN), support vector machine (V-SVM), decision tree classifier (V-DTC), random forest classifier (V-RFC) and XGBoost (V-XGB), the random forest classifier (V-RFC) performed best in correctly predicting the phase of an alloy as solid solution (FCC, BCC, FCC + BCC) or MIP, with an average accuracy of 84%, a ROC-AUC score of 0.9649, and a tenfold cross-validation mean score of 0.9315. Here MIP denotes the presence of either pure IM compounds (such as α, β, σ, L12, L21, C14, C15, C36 Laves) or mixtures of IM + SS phases (such as FCC + IM, FCC + BCC + α, BCC + IM, FCC + α + β, BCC + Laves, BCC1 + BCC2 + C15 Laves, BCC + β-ordered BCC, FCC + CoMo2Ni-type IM etc.). Thus, the V-RFC model can be used for predicting the phases of new HEAs as solid solution (FCC, BCC, FCC + BCC) or MIP (mixture of intermetallic phases). This claim was reinforced by comparing the V-RFC predicted phases with experimental phases reported recently for newly developed HEAs, where V-RFC correctly predicted the solid-solution phases (BCC and FCC) for 2 refractory HEAs10 and 2 precious metal HEAs63, respectively. The phase of the 3d-transition metal HEA (Al0.5CrCuNiV)62 was also correctly predicted as MIP under the stated assumption, although the actual alloy contained 1FCC + 2BCC + ordered B2 phases. Note that our algorithm worked robustly in predicting solid-solution phases, and in predicting complex multiphase HEAs as MIP, but it was found to be limited in identifying the number and types of phases present in a complex multiphase HEA.

  3. c.

    Although a few studies report higher model accuracy using synthetic data, we showed that this can lead to inaccurate predictions. In particular, care must be taken when extracting data from mixed manufacturing routes and when handling an imbalanced dataset. This becomes clear from the fact that the ANN model used in the study of Bakr et al.28 achieved an accuracy of 93.4% but could not correctly predict the existence of the amorphous phase. Hence, even with 93.4% accuracy, their model produced erroneous predictions when treating the imbalanced data. Also, the recall, precision, and F1-score for the amorphous (AM) phase were not defined clearly. Risal et al.51 also acknowledged that “ML algorithms usually do not perform well for imbalanced dataset”, and reported 92.31% accuracy by using an oversampling method that balances the minority class by polluting it with synthetic data. Accordingly, to test our claim, we explored the use of SMOTE-Tomek links to resample our dataset with the RFC model (ST-RFC). An average accuracy of 92% was observed for the ST-RFC model on the augmented data of 1392 instances (1200 + 192). Although accuracy increased substantially, this did not yield better phase predictions.

  4. d.

    Using the robust RFC algorithm developed in this work, we report the development of a novel HEA with the composition Ni25Cu18.75Fe25Co25Al6.25. X-ray diffraction peaks revealed an FCC structure, corroborating the ML prediction.