High Entropy Alloys Mined From Binary Phase Diagrams

High entropy alloys (HEAs) are a new type of high-performance structural material. Their vast degrees of compositional freedom provide extensive opportunities to design alloys with tailored properties. However, this compositional complexity presents challenges for alloy design. Current approaches have shown limited reliability in accounting for the compositional regions of single solid solution and composite phases. For the first time, a phenomenological method that analyses binary phase diagrams to predict HEA phases is presented. The hypothesis is that HEA structural stability is encoded within the phase diagrams. Accordingly, we introduce several phase-diagram-inspired parameters and employ machine learning (ML) to classify 600+ reported HEAs based on these parameters. Compared to other large-database statistical prediction models, this model gives more detailed and accurate phase predictions. Both the overall and the single-phase HEA prediction success rates are above 80%. To validate our method, we demonstrated its capability in predicting HEA solid solution phases with or without intermetallics in 42 randomly selected complex compositions, with a success rate of 81%. The presented search approach, with its high predictive capability, can be used to interact with and complement other computation-intensive methods such as CALPHAD in providing accelerated and precise HEA design.

www.nature.com/scientificreports
assess phase stability is that they can readily provide direct and realistic information about the roles of individual elemental components on phase formation. The phenomenological method is built on the hypothesis that the constituent binary alloys encode a wealth of information about the multi-component alloy in terms of crystal structures, elemental mixing, and phase separation. Here, we demonstrate the effectiveness of the proposed method by introducing physically meaningful phenomenological parameters that can be conveniently accessed from binary phase diagrams. These parameters are used to demarcate the phase-forming regions for HEAs. The phases studied here are those with homogeneity ranges in the phase diagrams, such as body-centred cubic (BCC) single-phase, face-centred cubic (FCC) single-phase, mixed FCC + BCC phase, hexagonal close-packed (HCP) single-phase, Sigma phase, and Laves phase. Minor phases such as line compounds are not included but will be addressed in future work. An ML algorithm is employed to navigate the complex parameter space regions occupied by the currently known HEA compositions. The effectiveness of the method is evaluated, and the derived ML models are used to make predictions for experimental verification. The presented "phase diagram" approach to predicting single solid solution HEAs can complement CALPHAD and other first-principles methodologies in providing an efficient pathway to phase-field and microstructural control.

Database partitioning
The HEAs included in our model have phases classified as: disordered FCC (A1), disordered BCC (A2), disordered HCP (A3), mixed disordered FCC + BCC (A1 + A2), ordered BCC (B2), B2 mixed with disordered solid solution phases, specifically A1, A2, and A3 (B2 + SS), and either Sigma or Laves IM mixed with the other phases (IM+). The A1 + A2 set comprises HEAs in which multiple A1 phases, multiple A2 phases, or A1 and A2 phases coexist. The HEAs in the IM+ set contain at least a Sigma or a Laves phase, and may also contain other complex or solid solution phases. The database is parsed into three levels, namely Level 1, 2, and 3. Level 1 is composed of the simple disordered phases: A1, A2, A1 + A2, and A3. Level 2 is Level 1 with the addition of the B2 + SS HEAs. Level 3 is Level 2 with the addition of the IM+ HEAs. HEAs with other minor phases, such as line compounds, that do not belong to the above categories are not included in the present study. Levels 1, 2, and 3 comprise 317, 486, and 614 HEAs respectively. More details about the database can be found in the method section and the supplementary materials.
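The nested level structure can be written down compactly. The following Python sketch is only an illustration of the partition described above: the set-based layout, the function names, and the placeholder records are assumptions and are not taken from the database itself.

```python
# Phase labels follow the text; the data layout is an illustrative assumption.
LEVEL_1 = {"A1", "A2", "A1+A2", "A3"}
LEVEL_2 = LEVEL_1 | {"B2+SS"}
LEVEL_3 = LEVEL_2 | {"IM+"}
LEVELS = {1: LEVEL_1, 2: LEVEL_2, 3: LEVEL_3}


def lowest_level(phase_label):
    """Return the lowest database level containing the phase label, or None."""
    for level in (1, 2, 3):
        if phase_label in LEVELS[level]:
            return level
    return None


def select_level(records, level):
    """Keep the (alloy, phase) records whose phase belongs to the given level."""
    return [record for record in records if record[1] in LEVELS[level]]


# Placeholder records, not real database entries.
records = [("alloy-a", "A1"), ("alloy-b", "B2+SS"), ("alloy-c", "IM+")]
level_2_records = select_level(records, 2)
```

Because each level contains the previous one, the reported sizes imply that Level 2 adds 486 - 317 = 169 B2 + SS HEAs and Level 3 adds 614 - 486 = 128 IM+ HEAs.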

HEA phase formation parameters
The parameters introduced below, and elaborated on in the method section, provide the basis for quantifying HEA phase formation tendencies. In ML, these individually measured property parameters, used as input data for classification, are called features.
The HEA melting temperature (T m) is expressed as the weighted average of binary liquidus temperatures. For the as-cast HEAs, undercooling usually extends to the region around 0.8 T m [29]. Phase evolution may still occur below this temperature because of the high kinetic energies of the atoms. Here, a phase formation temperature (T pf) is defined as the temperature at which rapid phase evolution ceases. It is assumed that T pf is not lower than 0.7 T m. Below this temperature, the kinetic energy of the atoms is not high enough to transform the phase within the brief time of cooling. Incidentally, most post-annealed HEAs in the full database are homogenised above 0.7 T m. Atoms are free to exchange neighbours during undercooling (i.e. above 0.8 T m), or via fast diffusion down to T pf. The alloy mixture is essentially ergodic, and local atoms have nearly equal probabilities of sampling any binary configurations favoured by the phases present in the constituent binary phase diagrams.
Following the above discussion, information from individual binary phase diagrams is used combinatorially within the model. It is assumed that the tendency for a pair of elements to form a specific phase is directly determined by its binary phase field percentage. The binary phase field percentage of phase X for the i-j elemental pair is denoted as X i-j and is determined using T pf. Then, X i-j is used to calculate the phase field parameter (PFP X), which is related to the tendency of a HEA to form phase X.
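To make the combinatorial use of binary information concrete, the sketch below averages a per-pair quantity over all element pairs of an alloy. The exact definitions are given in the method section, so the c_i * c_j pair weighting used here, the function name, and all numerical values are assumptions for illustration only.

```python
from itertools import combinations


def pair_weighted_average(fractions, pair_values):
    """Average a per-pair quantity over all element pairs of an alloy.

    fractions:   {element: atomic fraction}
    pair_values: {frozenset({el_i, el_j}): value for that binary pair}
    The c_i * c_j weighting is an assumption of this sketch.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for a, b in combinations(sorted(fractions), 2):
        weight = fractions[a] * fractions[b]
        weighted_sum += weight * pair_values[frozenset((a, b))]
        total_weight += weight
    return weighted_sum / total_weight


# Placeholder equiatomic ternary "P-Q-R"; the numbers are invented, not read
# from real phase diagrams.
fractions = {"P": 1 / 3, "Q": 1 / 3, "R": 1 / 3}
x_percent = {  # stand-in for X i-j, expressed as a fraction (value <= 1)
    frozenset(("P", "Q")): 1.0,
    frozenset(("P", "R")): 0.5,
    frozenset(("Q", "R")): 0.0,
}
liquidus_k = {  # stand-in for binary liquidus temperatures in kelvin
    frozenset(("P", "Q")): 1000.0,
    frozenset(("P", "R")): 1200.0,
    frozenset(("Q", "R")): 1400.0,
}
pfp_x = pair_weighted_average(fractions, x_percent)
t_m = pair_weighted_average(fractions, liquidus_k)
```

The same helper serves for both the melting temperature estimate and a PFP-like quantity because, under this assumption, both are weighted averages over the constituent binaries.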
Many mixed-phase HEAs are found to form because of interatomic repulsions [30,31]. Specific element pairs, such as Cr and Cu, separate because of their large positive mixing enthalpy, causing multiphase formation in HEAs [30]. This effect is included in the model through the phase separation parameter (PSP).
The selection of T pf can influence the values of the parameters and the prediction accuracy. T pf was therefore optimised to give the most accurate ML prediction. Further details on the determination of T pf and the calculation of these parameters (value ≤ 1) are found in the method section.

Visualisation of phase regions in parameter space
The parameters defined above are calculated for all HEAs in the different database levels, and their correlations with the phases actually formed are examined.
For the Level 1 phases, the calculated parameters PFP A1, PFP A2, PFP A3, and PSP correlate with A1, A2, A3, and A1 + A2 phase formation. Figure 1a, a plot of PFP A1 versus PFP A2, shows the parameters partitioning the A1, A2, and A1 + A2 HEAs. Typically, A1 HEAs have PFP A1 > 0.4 and PFP A2 < 0.4, while A2 HEAs have PFP A1 < 0.4. Some A2 HEAs form even with small PFP A2 values because their B2 phase field can transfer into the A2 phase to prompt A2 formation. As discussed below for Level 2 and Level 3, specific phase formation is influenced by multiple parameters. A1 + A2 HEAs are distributed in a region where neither PFP A1 nor PFP A2 is dominant, and in this plot they cannot be separated from the single-phase HEAs. In general, an individually large PFP A1 or PFP A2 value promotes the formation of a single phase, while similar values of PFP A1 and PFP A2 tend to favour mixed-phase formation. Adding PSP as the third axis in Fig. 1b separates the A1 + A2 HEAs from the A1 and A2 HEAs by their relatively higher PSP values, because a large PSP indicates a strong phase separation effect, which leads to A1 + A2 phase formation. To study the effect of PFP A3 on A3 phase formation, the A1, A2, A3, and A1 + A2 HEAs are plotted against the axes PFP A1, PFP A2, and PFP A3 in Fig. 1c, where the A1, A2, and A1 + A2 HEAs are grouped as non-A3 HEAs. All the A3 HEAs have higher PFP A3 than the non-A3 HEAs and appear separate from the other phases.
For the Level 2 phases, the five parameters are PFP A1, PFP A2, PFP A3, PFP B2, and PSP. In Fig. 2a-g, to study the effects of the new parameter PFP B2, the 5D parameter space of the Level 2 data is visualised by projecting it onto 3D spaces. Figure 2a is plotted with only the parameters in Level 1; there, B2 + SS HEAs are mixed with HEAs in other phases. In Fig. 2b-d, PFP B2 is added. Figure 2e-g have the same axes as Fig. 2d but give direct comparisons between the B2 + SS phase and the A1, A2, and A1 + A2 phases. On all these plots, B2 + SS HEAs are located in a region with relatively higher PFP B2 values. This indicates that PFP B2 is strongly correlated with B2 + SS phase formation. PFP A3 and A3 HEAs are not plotted here because PFP A3 does not affect the formation of the B2 + SS phase and A3 HEAs are trivial to predict with PFP A3, as shown in Level 1.
For the Level 3 phases, two additional parameters, PFP Sigma and PFP Laves, are added. Seven parameters, PFP A1, PFP A2, PFP A3, PFP B2, PFP Sigma, PFP Laves, and PSP, are used to separate the phase regions of the A1, A2, A3, A1 + A2, B2 + SS, and IM+ HEAs. To study the correlation between the newly added IM+ phase formation and the two parameters PFP Sigma and PFP Laves, a 2D graph with axes PFP Sigma and PFP Laves is plotted in Fig. 3. All the phases from Level 2 are grouped as Non-IM phases. In general, IM+ HEAs have larger PFP Laves or PFP Sigma than most of the Non-IM HEAs. However, all seven parameters influence IM+ phase formation, and Fig. 3 is insufficient to convey all the information from the seven parameters.
In summary, Level 1 shows separation between all single-phase HEAs in the PFP A1, PFP A2, PFP A3, and PSP parameter space. The A1 + A2 phase region is seen to have some overlap with the A1 and A2 phase regions. When parameters are added in Level 2 and Level 3, additional overlaps are noted. The parameter space of the HEAs assumes an increasingly complex topological configuration as the number of parameters increases. Additionally, it is difficult to resolve the connections in 3D space. In such complicated cases, ML is superior to visualisation for determining phase formation regions.

HEA phase prediction using machine learning
ML is employed to analyse the complex parameter space of HEA phase formation. It creates links between the parameters and phase formation in the higher-dimensional parameter space. Through ML, composition-phase correlations are determined and the phases of new HEA compositions are predicted.
The effect of alloy preparation methods on phase formation is also studied. ML is first applied to only the as-cast HEAs, and its performance serves as a benchmark. ML is then applied to all HEAs in the as-cast and annealed states. Comparing the two sets shows that, on average, adding the annealed HEAs slightly lowers the ML prediction performance, as seen in Table 1.
The ML results for Level 1 HEAs are obtained using the features PFP A1, PFP A2, PFP A3, and PSP. The overall success rates with 50-90% training sets are 89-90% for the as-cast HEAs or 87-89% including the annealed HEAs. Single-phase predictions have higher success rates than the mixed-phase predictions. The high prediction success rates indicate that these parameters are sufficient for describing the disordered solid solution phase formation behaviour. PFP B2 is added as a fifth ML feature to predict the B2 + SS HEAs in Level 2. The overall and the B2 + SS phase prediction success rates are near 85% for both the as-cast HEA set and the set including the annealed HEAs. Thus, PFP B2 is useful in predicting the presence of the B2 + SS phase. Formation of the IM+ phases in the Level 3 HEAs is studied by adding PFP Sigma and PFP Laves as new features. The IM+ phase prediction success rates are 73-78% for the as-cast HEAs or 71-77% including the annealed HEAs. The overall success rate is as high as 80% for all HEAs.
With the increasing complexity of the database from Level 1 to Level 3, the ML prediction success rates decrease but still maintain high values. As the training set percentages change from 90% to 50% in each level, the success rates show little variance. High accuracy is obtained even with training set percentages as low as 50%.

Model Validation
To show that the model avoids overfitting with ML and can expand the current phase regions, 42 new HEAs were synthesised. The phases of these elemental combinations, which do not exist in the current collected database, are then predicted by the model. As shown in Fig. 4, the selection of compositions is distributed evenly in the parameter space of the collected database. The numbers of new HEAs in different predicted phases are approximately proportional to the numbers of different HEA phases in the database. Many synthesised HEAs are outside the current known phase regions in order to show the ability to expand the phase region. As shown in Table 2, our method is not limited by the use of a specific element type nor the number of elements in a HEA. Elements are chosen from different groups of the periodic table such as refractory metals, transition metals, and main group elements. The number of elements in a single HEA varies from four to seven. All the phases are measured in the as-cast state. Out of the 42 HEAs, 34 were predicted by ML correctly, yielding a success rate of 81%. Their X-Ray Diffraction (XRD) patterns are found in the supplementary materials.

Discussion
For the first time, a method predicting the phase formation of HEAs based solely on binary phase diagrams is demonstrated and validated. The information on elemental mixing and phase separation from binary phase diagrams underlies the success of the phenomenological approach presented. Considering the atomic mobility at high temperatures and the presumed pairwise additivity of atomic pair interactions, this information from binary diagrams is used combinatorially to evaluate HEA phase formation. The initial success of using PFP X and PSP, defined using binary phase diagrams, in predicting the corresponding single-phase and mixed-phase HEAs prompted us to extend this method to more phases. The inter-correlated roles of these parameters are noted, and their combined effect must be considered in designing HEAs. We have included in our study the majority of the available HEA database, excluding alloys containing line compounds and other minor phases. Visualisation reveals robust HEA phase formation regions in the parameter space. ML enables the quantification of HEA phase formation, yielding an average single-phase prediction success rate of about 90% for Level 1 and Level 2, and more than 80% for Level 3. The ML success rates obtained from the as-cast HEAs and from the as-cast and annealed HEAs differ only marginally. Thus, the model works well for both the as-cast and the high-temperature annealed HEAs. Considering that these are the most common HEA preparation methods, our model can be applied to most HEA synthesis situations. High accuracy is obtained even with small training set percentages. This implies that the phase formation parameters are well defined and efficient in prediction. Most HEA phase prediction models do not have experimental validation. The high experimental validation success rate of this method is indicative of its reliability.
Moreover, ML can predict the phases of the new HEAs to expand the current database and phase parameter regions.
Among the other large-database statistical approaches, Tancret et al. combined a Gaussian process using nine thermodynamic and atomistic parameters with CALPHAD to predict the formation of over 60 single solid solution phase HEAs [25]. The model has high precision but low recall: alloys that it predicts to be single solid solution phase HEAs have a high chance of being so, but many potential single solid solution phase HEAs are misidentified as mixed-phase HEAs. Additionally, the exact phase of a HEA, such as BCC or IM, cannot be predicted. By comparison, our method has high precision and high recall, and gives specific phase formation information.
Another model, by Kube et al., assigned values called stabilising abilities (β i) to seven specific elements, Al, Co, Cr, Cu, Fe, Mn, and Ni, representing their strength in stabilising FCC or BCC formation. The β i values are optimised by ordinal logistic regression based on a database of over 2000 sputter-deposited HEAs from a high-throughput experiment [26]. This method is efficient in separating FCC and BCC single-phase HEAs, but the mixed FCC and BCC phase cannot be separated from these single phases. Moreover, other phases such as HCP and IM were not studied. Our method has no element preference and can predict a larger number of phases.
Neural network models trained by Huang et al. [28] and Islam et al. [27] on thermodynamic and atomistic parameters predict only phase categories, such as the formation of a solid solution, IM, or their mixture. Detailed phase information is not predicted. Thus far, no published model uses these parameters to predict detailed phase formation accurately. However, with the phenomenological parameters in this article, we have demonstrated both theoretically and experimentally the ability to predict detailed phase formation with high accuracy.
To summarise, the advantages of our approach are the following:
1. Indiscriminate HEA selection feasibility: Some prediction methods such as CALPHAD are limited by the availability and depth of proprietary databases. Our method is based solely on binary phase diagrams, for which plentiful, easily accessible data exist.
2. Phase region expansion ability: New HEAs are predicted with a high success rate outside the regions where 614 HEA phases are currently known.
3. Parameter-phase relevance: Unlike the traditional thermodynamic parameters, our parameters directly determine the formation of the corresponding phases. Detailed phase formation can be predicted.
4. Ease of computing: Methods such as ab initio molecular dynamics require high computation capability.

The atomic pairs with the separation effect are identified on phase diagrams by the presence of two bounding pure solid solution phases with no additional single phase present between the two. For example, a strong phase separation effect exists on the phase diagram of Cr-Cu (Fig. 6a), where Cr and Cu never dissolve into the same phase matrix.
In certain cases, at high temperatures, the mixing entropy term is large enough to overcome the positive mixing enthalpy and results in a negative Gibbs free energy for forming the solid solution. This makes it possible for the two elements to mix marginally. Co-Cu in Fig. 6b is a typical example, where the two elements separate at low temperature and mix at high temperature. The Co-Cu phase diagram is used to calculate Separation Co-Cu by the line segment method. The HEA Al2CoCrCuNi is used again. This method gives Separation Co-Cu = 92% and Mixing Co-Cu = 8%. Separation i-j = 0% if phase separation is absent from a phase diagram.
Phase formation temperature. For an as-cast HEA, the phase transformation evolves at various temperatures above T pf as it cools from the molten state. The values of PFP X and PSP differ when calculated using different T pf, and thus result in different ML accuracies. To optimise the value of T pf, the parameter calculation and corresponding ML were conducted with T pf = 0.7, 0.75, 0.8, 0.85, and 0.9 T m. The highest ML accuracy was obtained when T pf = 0.8 T m. Of note, the optimised T pf is close to the undercooling temperature.
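The T pf optimisation amounts to a small grid search over five candidate ratios. The Python sketch below shows only that selection logic. The `score_for` callback is hypothetical: it stands in for recomputing the parameters at a given T pf and rerunning the ML, and the toy scoring function is invented so that the example runs; it does not reproduce the reported accuracies.

```python
CANDIDATE_RATIOS = (0.70, 0.75, 0.80, 0.85, 0.90)  # T pf / T m values from the text


def best_tpf_ratio(candidates, score_for):
    """Return (ratio, score) for the candidate T pf / T m ratio with the highest score."""
    scored = [(score_for(ratio), ratio) for ratio in candidates]
    best_score, best_ratio = max(scored)
    return best_ratio, best_score


def toy_score(ratio):
    # Invented stand-in for "recompute PFP X and PSP at this T pf, rerun ML,
    # return the accuracy". Its shape is arbitrary and peaks at 0.80 only to
    # mirror the outcome reported in the text.
    return 1.0 - abs(ratio - 0.80)


best_ratio, best_score = best_tpf_ratio(CANDIDATE_RATIOS, toy_score)
```

Passing the scoring step in as a callback keeps the search independent of how the features and the classifier are actually computed.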
For the high temperature annealed HEAs, the phases formed during annealing at these high temperatures are locked in during rapid quenching. Thus, T pf is the annealing temperature and the phase formation tendency is determined from the line segment percentages of the binary phase fields present.
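The line segment percentages can be illustrated with a short sketch. It assumes that the isotherm at T pf is cut into consecutive composition segments, each carrying one label, and that a label's percentage is its share of the total length. The function name and the segment boundaries are hypothetical; the boundaries were chosen only so that the totals match the 92% / 8% Co-Cu split quoted above and were not read from the Co-Cu diagram.

```python
def segment_percentages(segments):
    """Percentage of the composition axis covered by each label along one isotherm.

    segments: list of (start_at_pct, end_at_pct, label) tuples at T pf.
    """
    total_length = sum(end - start for start, end, _ in segments)
    shares = {}
    for start, end, label in segments:
        share = 100.0 * (end - start) / total_length
        shares[label] = shares.get(label, 0.0) + share
    return shares


# Invented boundaries (at.% of the second element) for illustration only.
co_cu_isotherm = [
    (0.0, 5.0, "mixing"),        # terminal solid solution on one side
    (5.0, 97.0, "separation"),   # two-phase region between the terminal phases
    (97.0, 100.0, "mixing"),     # terminal solid solution on the other side
]
shares = segment_percentages(co_cu_isotherm)
```

For an annealed alloy the same calculation would be carried out along the isotherm at the annealing temperature rather than at 0.8 T m.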
Machine learning. ML was conducted using the data mining software WEKA 3.8 [32]. We use Random Forest [33] with 300 trees to perform the classification task. The features are the parameters defined for the three levels of the database partition. Each database level is divided randomly into training and test sets. The ML algorithm establishes and optimises decision trees based on the training set. These trees are used to predict the phases of HEAs in the test set based on their features. The performance of the ML model is assessed by 2-, 3-, 4-, 5-, and 10-fold cross-validation, which, in Table 1, correspond to training set percentages of 50%, 67%, 75%, 80%, and 90%. An F1 score, which combines the precision and recall evaluation metrics as their harmonic mean, is used to denote the success rate of prediction. Each cross-validation is repeated 20 times and the average F1 score is then obtained. After the optimisation, new HEAs are predicted.
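Two pieces of bookkeeping in this procedure are easy to check by hand: the training set percentage implied by k-fold cross-validation, and an F1 score computed from predicted labels. The Python sketch below covers only these two pieces and does not reproduce the WEKA Random Forest. Averaging the per-class F1 scores by class support is an assumption of the sketch, and the label lists are placeholders.

```python
from collections import Counter


def training_fraction(k):
    """Fraction of the data used for training in each round of k-fold cross-validation."""
    return (k - 1) / k


def weighted_f1(true_labels, predicted_labels):
    """Support-weighted average of per-class F1 scores (averaging scheme assumed)."""
    support = Counter(true_labels)
    total = len(true_labels)
    score = 0.0
    for label, n_true in support.items():
        pairs = zip(true_labels, predicted_labels)
        true_pos = sum(1 for t, p in pairs if t == label and p == label)
        n_pred = sum(1 for p in predicted_labels if p == label)
        precision = true_pos / n_pred if n_pred else 0.0
        recall = true_pos / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * n_true / total
    return score


train_percentages = [round(100 * training_fraction(k)) for k in (2, 3, 4, 5, 10)]

# Placeholder labels, not taken from the database.
actual = ["A1", "A1", "A2", "A2"]
predicted = ["A1", "A2", "A2", "A2"]
example_f1 = weighted_f1(actual, predicted)
```

With k folds, one fold is held out for testing in each round, so the training share is (k - 1)/k, which gives the 50%, 67%, 75%, 80%, and 90% listed for Table 1.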
Alloy validation experiment. The 42 predicted HEAs used to validate our model were all prepared by suction casting. Master ingots were first made from elements with a minimum purity of 99.7 wt%. The elements were arc-melted in a water-cooled copper hearth in a high-purity argon atmosphere and were melted three times to ensure homogeneous mixing. The ingots were then suction-cast into a copper mould to make 3 mm diameter rods. Structural investigations were carried out by XRD using Cu Kα radiation on a PANalytical Empyrean diffractometer.
Database description. A total of 679 HEAs were collected from the literature and are listed in the supplementary material. The structural data used in our model are predominantly from XRD measurements. When transmission electron microscopy (TEM) data are available and reveal features hidden in the XRD results, the higher-resolution TEM data supersede the XRD data. The study is limited to 614 HEAs formed in the as-cast state or annealed at temperatures higher than 0.7 T m. Most of the heat-treated HEAs were annealed above 0.7 T m. The high-temperature annealed HEAs are included since the entropy of formation contributes more to the Gibbs free energy change at higher temperatures. Mechanically alloyed HEAs are not included because ball milling tends to retain metastable phases.

Data availability