Phase classification of multi-principal element alloys via interpretable machine learning

There is intense interest in uncovering design rules that govern the formation of various structural phases as a function of chemical composition in multi-principal element alloys (MPEAs). In this paper, we develop a machine learning (ML) approach built on the foundations of ensemble learning, post hoc model interpretability of black-box models, and clustering analysis to establish a quantitative relationship between the chemical composition and experimentally observed phases of MPEAs. The originality of our work stems from performing instance-level (or local) variable attribution analysis of ML predictions based on the breakdown method, and then identifying similar instances based on k-means clustering analysis of the breakdown results. We also complement the breakdown analysis with Ceteris Paribus profiles that showcase how the model response changes as a function of a single variable, when the values of all other variables are fixed. Results from local model interpretability analysis uncover key insights into variables that govern the formation of each phase. Our developed approach is generic, model-agnostic, and valuable to explain the insights learned by the black-box models. An interactive web application is developed to facilitate model sharing and accelerate the design of MPEAs with targeted properties.


I. INTRODUCTION
][4][5][6][7][8][9][10][11][12] HEAs are unique amongst MPEAs because they contain multiple (at least five) principal alloying elements of nearly equi-atomic concentration and yet have a global crystal structure with well-defined Bragg reflections indicative of long-range order.HEAs are typically solid solutions of face centered cubic (FCC), body centered cubic (BCC), or hexagonally closed packed (HCP) phases.Recently, the community has started to explore high entropic versions of intermetallic and ceramic compounds. 13,14To date, numerous elements in the periodic table have been explored to tune the properties of HEAs.However, not all compositions have resulted in the desired microstructure for application in extreme environments.6][17] In some applications, mixed phases are preferred; 18 whereas in others, a single-phase is desired. 19Nonetheless, these observations have led many groups to develop effective and efficient phase prediction models for enabling discoveries of novel HEAs for targeted applications.
Traditional high-throughput approaches based on first-principles calculations are particularly not suitable to search for novel MPEAs due to the need for large supercells and complex crystal structure space involving multiple prototypes.Although computational thermodynamics based methods have played an important role, 20,21 their limitations are also documented in the published literature. 22[24][25][26][27][28][29][30][31][32][33][34][35][36][37] One of the most explored ML implementations on MPEAs research is the phase classification problem, where the objective is to train ML models for predicting whether a given chemical composition will form in: (1) single-phase FCC, BCC, or HCP solid solutions, (2) FCC+BCC dual phase with varying phase fractions, (3) single-phase intermetallics, (4)   mixed phases (FCC+intermetalics, BCC+intermetallics, FCC+BCC+intermetallics, two different intermetallics etc.), or (5) amorphous phase.ML models with fairly high accuracy (over 75%) have been trained using small and large data set sizes and different choices of outputs.Various elemental and thermodynamic properties have been considered as input features for the phase classification problem.. 23,26,[29][30][31][32][33] A number of published studies also report descriptor importance based on cross-entropy, Gini index, or permutation methods to gain some insight into the descriptor contribution to the overall predictive power of the model.There are several drawbacks in the current approaches.None of the published papers explain the predictions of the black-box models at the granularity of each observation.There is a lack of principled approach to glean insights that shed light on the formation of each phase in the training set.It is not straight-forward to compare the predictive performance of every published ML study using the data sets generated from different research groups, because the ML models are not published along with the research paper.
In this work, we advance the application of ML methods in the MPEA phase classification problem in two significant ways.First, we apply two complementary instance-level (or local) post hoc model interpretability approaches, namely breakdown (BD) plots and Ceteris Paribus (CP) profiles, to glean insights into each observation.The BD method is based on the variable attribution principle, which decomposes the prediction of each individual observation into particular variable contributions. 38In contrast, the more traditional global variable importance method provides a high-level or generic understanding of the inner workings of a black-box model and captures the relative importance of a given variable in impacting the overall model performance on the entire data set (that includes all phases).The CP profile method, on the other hand, evaluates the prediction response of a trained ML model to changes in a particular variable under the assumption that the values of all other variables do not change.We then develop a novel algorithm that combines the variable attribution data from the BD method with k-means clustering method to infer insights about similar instances.These results provide insight into explaining the relative variable contributions in the prediction of each phase or class label as inferred by the ML models.In addition, the CP profile captures the average partial relationship between the predicted response and the input variables.In this paper, we demonstrate the power of local model interpretability methods as a key post hoc model analysis tool for materials informatics research.We apply them to explain the predictions from an ensemble of support vector machine (eSVM) models trained on a high-dimensional, multi-class MPEA phase classification problem data set.
SVMs belong to a class of black-box models that lack transparency. 39,40More details about the eSVM approach is given in the Methods section.Second, we build a novel interactive web application (https://adaptivedesign.shinyapps.io/AIRHEAD/) that allows the user to query our trained models directly and predict new MPEA or HEA compositions with the desired phase.This effort is aimed at allowing interested researchers to examine carefully the model predictions and facilitate the decision-making process.Moreover, this will also allow the MPEA community to objectively compare future models and document the progress.

A. Data sets and Model training
Our initial data set for ML was constructed by referring to several previous reports that meticulously compiled experimental data from the published literature.. 22,32,[41][42][43][44][45][46][47][48][49][50][51] The merged data set contained 3,719 compositions ranging from binary to multi-component alloys.Each composition was also augmented with the phase information as reported in the literature.The phases were then simplified into seven classes: BCC, FCC, BCC+FCC, HCP, Amorphous (AM), Intermetallics (IM), and Mixed-phases (MP).The IM label indicates that the microstructure contains at least one intermetallic phase.The MP label indicates complex mixture of multiple phase combinations.A final data set with 1,821 observations was obtained by removing all the duplicate data, missing values, and excluding the alloys showing inconsistent phase data depending on the source.Each of the 1,821 observation was represented by a total number of 125 variables. 52,53We did not track the processing history, which can have an impact on the thermodynamics and kinetics of phase formation in the MPEAs.
The number of variables were then reduced based on linear Pearson correlation coefficient (PCC) 54 and non-linear normalized mutual information (NMI) analyses. 55,56The workflow is shown in Figure 1a.We considered two different PCC threshold values (0.4 and 0.6) to down-select least linearly correlated input variables.Our choice of using a PCC criterion of 0.6 was motivated by the work of Pei et al. 32 In addition, we also imposed a more stringent PCC criterion of 0.4 for further simplification.The PCC analysis resulted in identifying 12 and 20 variable sets for the 0.4 and 0.6 criterion, respectively.The list of down-selected variables is given in Table 1.We can broadly subdivide the down-selected variables in three categories: (1) those that are chemistry-agnostic (e.g., Mixing Entropy), (2) those that depend on element pairs (e.g., DeltaHf), (3) those that depend on chemistry (everything else in Table 1).
We then examined the presence of non-linear associations using NMI analysis, where an arbitrary criterion of NMI > 0.2 was adopted to flag the presence of any non-linear associations.The choice of 0.2 was informed by our calculated NMI value of 0.35 for a simulated sinusoidal curve.While the 12 variable set showed no non-linear association, the 20 variable set contained three pairs of variables with NMI values greater than 0.2.We visualized the result using a scatterplot (Figure S1), which did not reveal any obvious non-linear trend that warranted further down-selection.Therefore, we ended up with two pre-processed data sets (one with a 12 variable set and the other with a 20 variable set) for ML model building.The pre-processed data set was randomly split into two subsets with 75% and 25% data for training and testing, respectively.We used the eSVM algorithm for training the ML models.The optimal hyperparameters were determined using a grid search.The out-of-bag error rate was used to evaluate the performance.We systematically varied the number of bootstrap samples and found the 50 and 100 bootstrap eSVM models to show the best predictive performance on the test data for the 12 and 20 variable sets, respectively.Tables S1 and S2 compare the relative performance of eSVM models on the test set in terms of accuracy, precision, recall, and F1-score.Both 12 and 20 feature sets of eSVM showed similar performance.Finally, we chose the simpler 12 feature set eSVM models for further analysis.
The next step is the post hoc analysis of the trained eSVM models.We start with the global variable importance analysis, which is also the most common method within the ML MPEA community.

B. Global Variable Importance
The objective of global variable importance analysis is to evaluate the relative importance of each variable in impacting the overall predictive performance of the trained ML models.
In this work, we used the well known permutation-method and cross-entropy loss function to assess the global variable importance. 57In Figure 2, we show the averaged global variable importance analysis from the 12 feature set eSVM model.All features appear to contribute to the prediction performance of the eSVM model.The error bar is the standard deviation from the 50 bootstrap samples.Mixing entropy, number of filled d or s valence electrons, covalent radius, and atomic weight are identified as more important to affect the prediction performance.]33 While helpful, global variable importance approach does not shed light on the following question: what variables contribute to the prediction of each phase (or class label) and how are these variables related to the predicted phase?This requires an implementation of local variable importance methods, which we discuss next.

C. Local Variable Importance
We focused on two complementary local model interpretability methods: (1) Breakdown plots and (2) Ceteris Paribus profiles.

Breakdown analysis
In the breakdown (BD) approach, we decompose the model prediction for a single observation into contributions that can be attributed to different input variables. 57,58The BD analysis can start from either a null set of indexes or a full set of relaxed features, which are referred to as step-up and step-down approaches, respectively.In the case of step-down approach (as considered in our work), each contribution of input variable is calculated by sequentially removing a single variable from a set followed by variable relaxation in a way that the distance to the prediction is minimized.For example, in Figure 3, a BD plot is shown for the NbTaTiV composition.The eSVM model predicted the composition to form in BCC with 100% probability score.Thus, we will obtain only one BD plot for this composition representing the BCC phase prediction.The BD plots resemble a bar graph.
Each variable can either contribute positively (positive weight) or negatively (negative weight) to the overall prediction.In this specific example, the mean MeltingT, mean NValence, and mean NsValence variables carry the largest weight and are recognized as important for predicting the composition as forming in the BCC phase.In a similar manner, we calculated the BD plots for all compositions in the training data.Readers can access the BD plots through our Web App.

Ceteris Paribus profile
The Ceteris Paribus (CP) profiles convey complementary insights about the relationship between a variable and the response by showing how the prediction would be affected if we changed a value of one variable while keeping all other variables unchanged. 57The method is based on the Ceteris Paribus principle; "Ceteris Paribus" is a Latin phrase meaning "other things held constant" or "all else unchanged."CP profiles are an intuitive method to gain insights in to how the black-box model works by investigating the influence of input variables separately, changing one at a time. 57In essence, a CP profile shows the dependence of the conditional expectation of the dependent (or output) variable on the values of a particular input variable.In Figure 4, we show a representative CP profile plot for the same NbTaTiV composition that was discussed in the previous BD section.Unlike the BD plot, we also observe the functional dependence of each variable on the model performance.In Figure 4, x-axes are the input variables and the y-axes are the prediction probabilities from the eSVM models.There are seven curves in each panel and each curve represents a particular phase.For example, the red curve traces the prediction for the BCC phase.The CP profile plot highlights the presence of non-linear relationship between each of the feature and the response.CP profiles for other compositions can be accessed through our Web App.

D. Extracting Variable Importance for each Phase
While the global variable importance analysis functions at the entire data set level, the breakdown and Ceteris Paribus analyses function at the granularity of each instance or composition.These two methods represent the two extremes in the spectrum of post hoc model interpretability analysis.In addition, there is a need for model interpretability at the intermediate level that will yield insights specific to each phase in our data set (based on the collective similarity or clustering of similar observations).To address this question, we combined the BD plots with the k-means clustering analysis and CP profile data.The pseudocode is summarized in Algorithm 1, which describes the implementation sequence of the BD method, k -means clustering, and CP analysis.
The algorithm starts with the BD analysis for each composition.For a given composition, the BD values are calculated from each trained SVM model in the ensemble and averaged across all 50 ensembles.The results are stored as a data frame.We then perform clustering analysis using the k -means algorithm, assigning a cluster label to each data point.We also construct CP profiles for each composition in the data set and group them according to the cluster labels.We then calculate the average CP profile for each cluster.The final outcome is two plots for each cluster: (1) averaged BD plots and (2) averaged CP profiles.Visualization of the two plots will yield phase-specific interpretation of the eSVM model.For k -means clustering, we found the optimal number of clusters by plotting the total within sum of square as a function of the number of clusters (Figure S2a).The elbow point corresponded to the choice of 10 clusters (as visualized in Figure S2b using principal component analysis).
The 10 clusters were then analyzed using histograms as shown in Figure 5, where we plot the frequency of occurrence of the number of components in the alloy composition for each cluster.Figure 5 shows that clusters 1, 5, 7 and 10 capture patterns that are representative of the binary systems.Given our interest in the design of HEAs, which normally consists of more than four components, we do not discuss the results from clusters 1, 5, 7, and 10.All other clusters can provide important clues for uncovering phase-specific variable importance analysis that pertain to the MPEAs and HEAs.Instead of explaining each cluster in detail (which is beyond the scope of this paper), we only focused on specific clusters where the ML predictions agreed closely with the experimental labels in the data set.
In Table S3 (in the Supplemental Document), we compared the ML prediction accuracy for each of the 10 clusters.are key variables for the formation of BCC phase.From Figure 6b, it can be inferred that maxdiff Electronegativity, mean NValence, and MixingEntropy are important for forming the AM phase.The relationship between mean DeltaHf and BCC phase also agrees well with the previous published results. 61e averaged BD plots from other clusters are also displayed in Figure S3, and the interpretations are summarized in Table S4.The analysis reveal similarities between BCC and IM phases, and between FCC and AM phases.The MP phase does not appear to have distinct characteristics.This may be due to the fact that the alloys of MP phase have a wider range of data distribution arising from relatively more abundant data and many different types of mixed phases compared to those with other phases that are more unique.
We next visualize the averaged CP profiles for clusters 8 and 9, which provide a more detailed account of the relationship between the input variables and the phases.The CP profiles for BCC and AM phases are shown in Figures 7a and b, respectively.Not all input variables have unique functional relationships.For example, in Figure 7a  We also made an attempt to connect the averaged BD plots (Figure 6a) with the averaged CP profiles (Figure 7a) for the BCC phase.We found that High mean MeltingT, high mean NsValence, and mean DeltaHf values between 0.3 and 0.5 favor BCC phase formation.
From the standpoint of maxdiff AtomicWeight and maxdiff NUnfilled variables, MPEAs tend to form in BCC phase when the constituent elements have moderately different atomic weights and similar number of the unfilled valence orbitals.In the case of AM phase (Figure 7b), while high mean NsValence values are preferred, low mean NValence values favor AM phase formation.Low MixingEntropy should be avoided, because it appears to favor the formation of mixed phase (blue curve in Figure 7b).There is a window of values for maxdiff AtomicWeight and maxdiff Electronegativity that favor AM phase formation.In Figure 7b, extreme values of maxdiff AtomicWeight and maxdiff Electronegativity appear to favor mixed phase.
So far, we have been comparing the averaged CP profiles within a cluster.We also observe

III. DISCUSSION
3][64][65][66] The expectation that the ML model should also explain the underlying patterns of materials phenomena in addition to the predictions has been steadily increasing.There are also papers from other disciplines, such as bioinformatics, that share similar goals. 67We have developed a novel post hoc ML model interpretability framework for the MPEA phase classification problem.The algorithms provide an indepth analysis of the complex black-box models and extracts interpretable patterns from an ensemble of trained models.In the materials informatics literature, the results from global variable importance are widely used to interpret which variables are strongly related to the ML performance.We argue that phase-specific (or class label specific) variable importance analysis based on local model interpretability offers a new way to gain much deeper insights into the global variable importance results.To illustrate this point, we also compared the global and local variable importance plots to glean additional insights (main results are distilled in Table S5).Note that the top three variables from the global variable importance analysis, namely MixingEntropy, dev NdValence, and mean CovalentRadius, are not associated with either the single-phase BCC or FCC compositions that have attracted interest for tailoring the mechanical properties of the HEAs. 68The fact that these variables are connected to the MP phase indicates that the presence of a large fraction of the MP phase in the dataset significantly affects (or biases) the global variable importance analysis.
which is a package to compute the concentration-weighted values of materials using the elemental or pairwise properties of components.To find the independent descriptors among 125 descriptors, the feature values are normalized by min-max scaling and then analyzed using pair-wise Pearson correlation and normalized mutual information coefficients 69 within the RSTUDIO environment. 70chine learning.We employed the eSVM models for multi-class classification learning tasks. 71The eSVM algorithm comprises of multiple SVM models generated by the bootstrap sampling method. 72We used the nonlinear Gaussian radial basis function kernel, as implemented in the e1071 package. 73 Web Application.Applications developed with the Shiny package 75 in the R programming language allow users to interactively engage with models defined in the server end (server.R).
The front end of the application, contained in the user-interface script (ui.R), takes a user inputted string composed of element symbols followed by the amount of the element (e.g., Al1.0V1.0Nb1.0T1.0)representing the composition of the high entropy alloy.The trained eSVM model in the backend generates the phase probability for the given composition.
Additionally, the users can obtain the set of 12 descriptors (Table 1), generated using an R script based on the Magpie package.For each new composition, the user can add the phase probability and descriptor information to a dynamic history, able to be exported as a comma separated value file at the end of the session.For each of the 1,367 points in the training set, users can see the associated BD plot and CP profiles.The web app can be accessed at https://adaptivedesign.shinyapps.io/AIRHEAD/.
The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of DARPA, the Army Research Office, or the U.S. Government.The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

FIG. 1 .
FIG. 1.The flow chart for (a) feature selection using Pearson correlation coefficients (PCC) and normalized mutual information (NMI) and (b) machine learning and local model interpretability approach.In this work, we used an ensemble of support vector machines (eSVM) for multiclass classification learning, breakdown plots and Ceteris Paribus (CP) profiles for local model interpretability, and k-means clustering.

FIG. 2 .
FIG. 2. The global variable importance for the 12 variable set eSVM model.Cross entropy loss was used as an indicator of variable importance.Error bars represent the standard deviation from 50 SVM models in the ensemble.

FIG. 3 .
FIG. 3. The BD plot for NbTaTiV composition, which is predicted to form in BCC phase by the eSVM model.Each bar represents the averaged contribution for that variable towards the overall prediction.

FIG. 4 .
FIG. 4. The CP profile for NbTaTiV composition with respect to the 12 input variables.The black dots indicate the true feature values.Line colors denote phase information: blue, MP; violet, AM; cyan, FCC; orange, BCC+FCC; lightblue, HCP; red, BCC; green, IM.

FIG. 5 .
FIG. 5.The distributions of the number of components (denoted as NComp) for the 10 clusters from k-means clustering analysis.Each cluster is also identified by phase selections via the BD-based prediction as shown in the titles of each plot.

Figure 5
indicates that clusters 8 and 9 are representative of the MPEAs.Although cluster 4 is also representative of MPEAs (six-component alloys), it contained fewer data points than clusters 8 and 9. Therefore, we focused on clusters 8 and 9 for model interpretation.The prediction accuracy data from eSVM reveals that clusters 8 and 9 are representative of the BCC and AM phases, respectively.The averaged variable attribution analyses from the BD method for clusters 8 and 9 are shown in Figures6a and b, respectively.The mean NsValence and maxdiff AtomicWeight variables are identified as important variables for both BCC and AM phases.Since the maxdiff AtomicWeight variable can be related to the atomic size mismatch, this result is in good agreement with the previous studies.59,60Figure 6a indicates that mean MeltingT, maxdiff NUnfilled, and mean DeltaHf

FIG. 6 .
FIG. 6.The averaged and sorted contribution from each variable for (a) cluster 8 (BCC phase) and (b) cluster 9 (AM phase).Each bar represents the relaxed predictions with and without a particular single explanatory variable in the corresponding row.The last row contains the sum of the overall mean prediction values.Red dots and yellow lines stand for median values and error bars, respectively.
(representative of BCC phase), similar functional relationships are observed between: (1) frac pValence, maxdiff NUnfilled and min NpUnfilled, (2) mean CovalentRadius and mean DeltaHf, (3) dev NdValence, maxdiff Electronegativity, and mean NValance, and (4) mean NsValence and mean MeltingT.The maxdiff AtomicWeight and MixingEntropy are the only two variables that do not share a similar relationship with any other variable.

FIG. 7 .
FIG. 7. The averaged CP profiles for (a) cluster 8 (BCC phase) and (b) cluster 9 (AM phase) with respect to the 12 input variables.The black dots indicate the true feature values for all the data points within that cluster.Line colors denote phase information: blue, MP; violet, AM; cyan, FCC; orange, BCC+FCC; lightblue, HCP; red, BCC; green, IM.

FIG. 8 .
FIG. 8.The constituent elements present in clusters 8 (BCC phase) and 9 (AM phase) are (a) depicted in the periodic table and (b) analyzed by pie charts, where each number shows their frequency of occurrence.The purple (dashed), red (solid), and blue (dotted) circles indicate the elements appearing in both BCC and AM phases, only BCC phase, and only AM phase, respectively.
One can generate a large number of training sets using the bootstrap sampling, where samples are randomly drawn with replacement.Every resampling produces two types of samples: (1) in-bag and (2) out-of-bag (OOB), which are used for training and testing the ML models, respectively.The optimization of eSVM hyperparameters is done by the OOB evaluation using grid search.Breakdown and Ceteris Paribus methods.To interpret the trained eSVM model, the BD and CP profile methods as implemented in the DALEX package 57 were applied to compute the contributions of features and individual profiles to ML prediction, respectively.The k-mean clustering algorithm from the factoextra package 74 was used to divide the dataset containing the BD values into clusters in an unsupervised fashion.Local feature importance is analyzed based on the averaged BD data by identifying the correlation between each cluster and the phase selections as predicted by the BD method.The global variable importance of the eSVM is obtained by averaging the outputs of global variable importance for each individual SVM part across all the bootstrap samples.

TABLE I .
List of the descriptors identified from 125 descriptors by PCC > 0.4 or 0.6 along with NMI > 0.2.
Augmenting global variable importance analysis with local feature importance has many desirable characteristics for rationally tailoring new HEAs with desired properties.
Therefore, pursuing MPEA design based solely from global variable importance analysis could potentially mislead the researchers especially from the context of a multi-class classification learning setting.