Rational design of chemically complex metallic glasses by hybrid modeling guided machine learning

The compositional design of metallic glasses (MGs) is a long-standing issue in materials science and engineering. However, traditional experimental approaches based on empirical rules are time consuming with a low efficiency. In this work, we successfully developed a hybrid machine learning (ML) model to address this fundamental issue based on a database containing ~5000 different compositions of metallic glasses (either bulk or ribbon) reported since 1960s. Unlike the prior works relying on empirical parameters for featurization of data, we designed modeling guided data descriptors in line with the recent theoretical models on amorphization in chemically complex alloys for the development of the hybrid classification-regression ML algorithms. Our hybrid ML modeling was validated both numerically and experimentally. Most importantly, it enabled the discovery of MGs (either bulk or ribbon) through the ML-aided deep search of a multitude of quaternary to scenery alloy compositions. The computational framework herein established is expected to accelerate the design of MG compositions and expand their applications by probing the complex and multi-dimensional compositional space that has never been explored before.


INTRODUCTION
Since their discovery in 1960s 1 , metallic glasses (MGs) have attracted tremendous research interest because of their promising structural and functional properties, such as superb strength 2,3 , high wear/abrasion resistance 4 , remarkable magnetic permeability 4 , excellent corrosion resistance 4 , and superior electrochemical catalytic ability 5 . However, as of today, a widespread use of MGs in various engineering applications is still rare. One of the roadblocks against the wide deployment of MGs is their limited size, resulting from the poor glass forming ability (GFA) of MGs relative to that of other glassy systems, such as oxide glasses 6 . To date, the design of MGs with a targeted GFA is still challenging.
While quantifying GFA of MGs is the enduring research effort in the MG literature [7][8][9] , the search of MGs is mostly based on the empirical rules, such as the Inoue's rules 10 , which is usually an iterative process of trial and error and hence inefficient on chemically complex alloys 11,12 . To overcome this issue, combinatorial experimental techniques, such as multi-target physical vapor deposition 13 , were adopted, which enabled the synthesis of hundreds of alloy compositions in one experimental run and hence greatly improved the efficiency in compositional screening. In addition, one emerging tool that can further accelerate the search of chemically complex MGs is the use of machine learning (ML) to guide high-throughput experiments [14][15][16] . In principle, ML is a data-driven approach and able to solve multi-variable (or multi-dimensional) complex problems by establishing a direct input-output correlation without specific programming 17,18 that requires a thorough understanding of the underlying physics. In practice, the success of a ML-based approach depends on the size of database. The general trend is that the more are the valid data the more reliable are the ML-generated predictions 18 .
Owing to the versatility of the ML-based approach, it becomes increasingly popular as a tool in the recent development of MGs 14,16,[19][20][21] . However, as discussed in ref. 18 , one of the challenges with the application of ML in materials science is the relatively small size of the dataset available for use. By tradition, people tend to report only successful results (or positive data) while discard those unsuccessful (negative data) in the development of materials. Therefore, over the past 60 years, people reported about~5000 MG compositions, out of which~1000 came with measured GFAs. As seen in Fig. 1a, these MGs are mostly based on transition metals, such as Fe, Zr, Cu, Ni, Ti, Ag, Pd, and Co, and a few based on alkaline and rare earth metals. Despite the substantial efforts dedicated to the development of MGs, however, the size of the MG dataset is still small relative to others, such as the Inorganic Crystal Structure Database that contains 240,000 structures (https://icsd.fiz-karlsruhe.de). In principle, people need both positive and negative data, as presentative and diverse as possible, in training ML models to avoid modeling bias. Thus, people built classification type ML models by including both typical MG and non-MG compositions 14,16,19,21 . As a result, the size of the dataset was extended to thousands of compositions 19 . Although the classification type ML models are useful for the fast screening of alloy compositions, they are incapable of pinpointing alloy compositions with superior GFAs.
On the other hand, regression type ML models were also developed for predicting GFA of MGs 19,22 . In fact, the reported GFA values are mostly integers as they correspond to the size of the largest mold in alloy casting, which however are usually taken as the data output of different ML models 19,22 . Consequently, the round-up error of the reported GFA values results in significant data scattering 23 (see Supplementary Fig. 1), hence compromising the predictability of the regression type ML models. In addition, the distribution of D is highly skewed towards the small value range (see Supplementary Fig. 2). Because of these difficulties, it is a challenge to develop a reliable ML model to guide the design of BMGs. According to the prior work 19 , if one considers all individual and collective attributes of constituent elements, he could come up with 186 data descriptors (a very high dimension). However, according to Zhang et al. 18 , data descriptors of a low dimension, as derived from physical modeling, are preferred for machine learning 18,24 . In this work, we propose hybrid machine learning algorithms to tackle the above problems by combining classification-type ML modeling for initial alloy screening and regression-type ML modeling for GFA prediction. As a result, we build our ML models based on the dataset containing~7000 compositions (see Supplementary Table 1 for the counts of elements covered by these compositions) available in the literature 8,22, to date for classification and regression. In addition, we develop the data descriptors based on the physical models recently developed for glass formation in chemically complex systems [49][50][51][52] . In theory, this can reduce the size of dataset needed for successful ML applications 18 .

RESULTS
Modeling-guided hybrid machine learning In principle, even some elemental metals can be vitrified (to form glass) at the cooling rate as high as 10 14 K s -1 53 . However, such a high cooling rate corresponds to a very limited sample size or a very poor GFA. According to the prior work 10 , it is known that the GFA of metallic liquids can be improved by mixing different sized elements, enhancing the local chemical affinity between atomic pairs 52 and increasing overall compositional complexity in favor of crystallization frustration 54 . In line with these three principles, we here develop three classes of descriptors (a total of 8) (see Fig. 1b and Supplementary Table 2) for our hybrid ML model, as described below: (1) Descriptors derived from atomic size. In the MG literature, the traditional models accounting for the atomic size effect on GFA were mostly developed for binary alloys 55,56 . In principle, these models were rooted in the same notion that one could easily distinguish solvent (or base) atoms from solute atoms in a MG. However, this notion does not apply easily to chemically complex or baseless MGs which apparently have no unique solvent or base elements, such as high entropy MGs (HEMGs) 57 . In this case, we generalized the geometrical model proposed by Ye et al. 50 by further considering coordination number deficiency and chemical fluctuation around a central atom, which was not accounted for in the original model. As a result, in our generalized geometrical model, we derive four additional data descriptors by minimizing the elastic energy storage caused by the atomic size misfit in chemically complex alloys. Unlike the previous works 50, 58 , we consider the presence of coordination deficiency and chemical fluctuation during energy minimization (see Supplementary Note 1). (2) Descriptors derived from local chemical affinity. According to the most recent atomistic simulations 52 , the GFA of multicomponent alloys is correlated with the standard deviation (ε À ) of the cohesive energies of the constituent elements and the averaged interaction energies ε between dislike elements. Inspired by this recent discovery 52 , we compute ε À and ε for the variety of alloy compositions and include them as the two additional data descriptors (see Supplementary Note 2). (3) Descriptors derived from entropy. According to the confusion principle 59 , one could improve the GFA of metallic liquids by promoting different crystallization pathways in their supercooled state. As a result, the competition between the different crystallization processes leads to a dynamic slowdown and thus enhances the GFA. From a thermodynamic viewpoint, this is equivalent to reducing the configurational entropy of the super-cooled metallic liquid, which is in line with the Adam-Gibbs model 60 . However, it is extremely difficult to compute the configurational entropy of a chemically complex metallic liquid without knowing its configurational energy landscape. To circumvent this difficulty, we propose to compute the correlated configurational entropy of mixing (S corr ) as an approximation to probe the configurational entropy of multicomponent alloys (See Supplementary Note 3). It should be noted that S corr was recently developed by He et al. 51 in their study of phase selection in high entropy alloys (HEAs). According to He et al. 51 , HEAs tend to form glass with a low S corr value while tend to form random solid solution with a high S corr value. This finding is consistent with the confusion principle 59 . Besides, S corr is implicitly related to chemical short-range ordering (CSRO). In principle, a higher degree of CSRO implies a lower value of S corr . However, the correlation between CRSO and atomic size misfit is not straightforward. According to the recent work of He et al. 61 , atomic size misfit could act against or in favor of CRSOs, depending on the chemistry/size of atomic pairs. This is an interesting and important topic that warrants future research.
For our classification ML modeling, we consider binary labeling of the data output. We first designate the alloy compositions that were reported to be capable of forming MG ribbons or BMGs with the label '1' as typical MG compositions, while the rest with the label '0' as non-typical MG compositions. According to the previous works 8, 9 , we take lnD rather than D-the GFA value of a MG composition-as the data output for our regression ML model. In doing so, we obtained a distribution of the data output close to a normal distribution, which reduced the data skewness ( Supplementary Fig. 2) and thus improved the performance of our regression ML models ( Supplementary Fig. 3).
In our database, we have~5000 MG compositions and~2000 non-MG compositions. To rule out a possible modeling bias towards MG compositions, we applied the synthetic minority over-sampling technique (SMOTE) 62 to oversample our non-MG compositions by 150%. In the literature, SMOTE has been widely used in data oversampling 63 , which can generate high-quality synthetic data after ruling out data outliers (noise) and data redundancy. In our work, the synthetic compositions were generated without knowing whether they belong to MGs or non-MGs. As a result, this generated additional~3000 virtual non-MG compositions to keep the size balance between the typical MG and non-MG compositions. After that, we employed three ML algorithms for the training and validation of the classification ML model, including adaptive boosting (AB), support vector machine (SVM) and k-nearest neighbors (KNN) (see Fig. 1c and Methods). According to our results, the AB model outperforms the other two for its highest testing accuracy of 87.7% with the area-under-curve (AUC) value of 0.95 ( Supplementary Fig. 4). It outperforms the model trained with the original data without oversampling, especially with respect to the prediction about non-MGs (see Supplementary Fig. 5). Next, we developed four classes of regression ML models, including artificial neural network (ANN), support vector regression (SVR), Gaussian process regression (GPR), and random forest (RF), as illustrated in Fig. 1d. Out of the four classes, we built 11 different regression ML models with the use of different algorithms, which led to 6 ANN-based models, 3 GPR-based models, 1 SVR-based model, and 1 RF-based model (see Methods). Subsequently, we tested/validated the ANN-based ML models by hold-out validation while the others via 10-fold cross-validation (see Methods). As a result, we found three best performing regression ML models, i.e., the Levenberg-Marquardt backpropagation ANN (LMANN) model, the rational quadratic kernel GPR (RQGPR) model, and the Exponential kernel GPR (ExpGPR) model. These regression ML models are characterized by low error and high coefficient of determination (~0.8) (see Supplementary Fig. 6).

Validation by experiments
To validate our ML modeling, we calculate the GFAs of three ternary (or pseudo-ternary) alloy systems, namely, the Zr-Cu-Al, Zr-Co-Al, and Zr-Cu-(Ag, Al), using the LMANN, RQGPR, and ExpGPR ML model, and compare our predictions with the available experimental data, as reported by Inoue et al. 64 , Bhatt et al. 65 , and Kim et al. 66 . It is worth noting that the data for these systems were not included in our original dataset for training and testing of our ML models. As shown in Fig. 2a-c, it is evident that the RQGPR model captures the trend of the experimental results very well, including the cases that failed to produce BMGs (Fig. 2a, b) and the marginal cases that nearly produced a BMG (Fig. 2b). The predictions of the ExpGPR model are like those of the RQGPR model ( Supplementary  Fig. 7); however, the predictions of the LMANN model are relatively poor, some of which are far off the experimental results (see Supplementary Fig. 8) even though the LMANN model appears to show a similar coefficient of determination and relative error with the other two models. Based on these findings, we adopted the RQGPR model to guide our development of MGs.
Discovery of chemically complex and high entropy metallicglasses Aside from the experimental validation, we demonstrate the predictability of our hybrid ML modeling through the discovery/ development of MG compositions that have not been reported before. For the present work, we consider 8 common metals (Fe, Zr, Cu, Ni, Ti, Al, Co, and Hf) as the possible constituent elements. These metals are non-toxic and easy to access, which have economic values and hence were frequently used in the previous design of MGs, as seen in Fig. 1a. It is worth noting that the atomic fraction of each constituent element was varied over a wide range during the deep search of the compositional space for MGs. For the present work, we discovered 12 MG compositions based on the predictions of the classification ML model. For comparison, we developed 23 alloy compositions (see Supplementary Table 3), including the 12 MG compositions as well as another 11 non-MG compositions which failed to pass our classification ML modeling. Next, the GFAs of these 12 glassforming alloys were predicted by the RQGPR model in terms of lnD, as shown in Fig. 3a. Based on these computational results, we prepared both bulk and/or ribbon samples for each of these 23 compositions by arc melting and/or melt spinning (see Methods).
As seen in Fig. 3b, the experimental results clearly show that the 12 predicted glass-forming alloys can form glass in either bulk or ribbon (or both) while those 11 predicted as bad glass-formers or non-MGs cannot (see Supplementary Fig. 9). Furthermore, we discovered that the 6 glass-forming alloys with high GFAs could form glass in bulk (see Fig. 3b, c). By comparison, 6 glass-forming alloys with predicted low GFAs could only form glassy ribbons (see Fig. 3b and Supplementary Fig. 10) but were crystallized in bulk (see Supplementary Fig. 11). Here it is worth noting that any small difference in lnD is magnified in an exponential manner when the prediction is transformed back to D. Thus, in the linear scale, the predicted trend is more meaningful than the exact value of the predicted D. Based on this trend prediction, we successfully locate the compositions of a few BMGs. We also measured the chemical composition of the glass-forming alloys with energy dispersive X-ray spectroscopy (EDX). The results show that the exact composition is consistent with the predicted one within a relative error of 9% (see Supplementary Table 4).
Before moving to the next section, it is worth noting that the traditional empirical approach 67 , which is based on the atomic size difference parameter (δ) and the heat of mixing (ΔH mix ), predicts that an alloy tends to form a metallic glass if δ > 0.065 and ΔH mix <−12 kJ mol −1 . However, as seen in Fig. 3d, this traditional approach is insufficient and unable to distinguish glass-forming alloys from others because they share a similar value of δ and/or ΔH mix . We also obtained the glass transition temperature (T g ), the crystallization onset temperature (T x ), and the liquidus temperature (T l ) of our MGs by differential scanning calorimetry (DSC) (see Methods and Supplementary Fig. 12) or following the method in refs. 68,69 (see Supplementary Note 4 and Supplementary Table 5). According to the literature 26 , the ratio of T g /T l -also termed as the 'reduced glass transition temperature, T rg '-scales positively with the GFA of MGs. As shown in Fig. 4, there is a clear trend that T g~0 .63T l , within a margin of ±0.05 T l for different classes of MGs, our high entropy MG ribbons are clearly off this trend because of their relatively high T l or low T g /T l ratio. Based on the DSC results, we also calculated T rg and γ ¼ Tx TgþTg of our MGs. As shown in Supplementary Fig. 13, the results reaffirm that the GFA of the high entropy MGs is low, which can be attributed to the high thermal stability (high T l ) of their corresponding crystals.

DISCUSSION
First, we are interested in the difference between our designed chemical complex and high entropy MGs, which may cause their Zr 55 Table 6).
In theory 50 , these two descriptors quantify the tolerance of a crystalline structure against atomic size misfit. Therefore, the larger these two descriptors are the more tolerant (or stable) the crystalline structure is against amorphization. For the discovered MGs, since their descriptor values look similar except for δc and σ δc , we can therefore attribute the disparity in their GFAs to a structural Z.Q. Zhou et al. difference that is closely linked to the excessive atomic size misfit as indicated by the descriptor radar charts.
Inspired by the above results, we examined the amorphous structure of Zr 54 Ni 16 Cu 14 Ti 10 Al 6 and Zr 25 Hf 25 Al 25 Co 25 across micro-, nano-, and atomic-scale through 3D atom probe tomography (APT) and aberration corrected transmission electron microscopy (see Methods). As shown by the APT elemental mapping (Fig. 6c, d)  show structural homogeneity at the nano scale, and structural homogeneity is observed by the TEM and HRTEM results as well (Fig. 6e, f, h, i). Interestingly, the STEM image (Fig. 6g) of the Zr 25 Hf 25 Al 25 Co 25 MG shows a clear sub-nanometer scale chemical fluctuation which cannot be resolved by APT and TEM owing to its limited spatial resolution (1-2 nm). According to the prior works, this contrast can be attributed to a density fluctuation, which can result from excessive plasticity [70][71][72] or simply signals an unusual capacity of a MG for plastic flows 72,73 . In contrast, the STEM image (Fig. 6j) of the Zr 54 Ni 16 Cu 14 Ti 10 Al 6 BMG display structural homogeneity in atomic-scale. Evidently, these findings echo well with what the data descriptor radar charts reveal about the probable structural difference between these two MGs (Fig. 6g, j): namely, the Zr 25 Hf 25 Al 25 Co 25 MG can afford a higher degree of chemical fluctuation than the Zr 54 Ni 16 Cu 14 Ti 10 Al 6 BMG. These findings are intriguing and warrant further research on the underlying physics.
In order to gage the relative importance of the data descriptors, we followed the literature 16,74 and removed each data descriptor, one at a time, from the training of the classification and regression models. We ran out all eight data descriptors and measured the accuracy loss in terms of RMSE (see Supplementary Fig. 15). According to our results, ε À is the most influential descriptor seconded by σδN N for both models. To further probe the difference between MGs and non-MGs, we obtained the parallel coordinate plots (PCPs) 75 from our dataset of~7000 compositions. In the literature, PCP is widely used for the visualization of sensitivity of multi-dimensional data 75 . For descriptors σδN N , ε 2 h i 1 2 , S corr , ε À and ε, the PCP bands for MGs are narrower and denser than those for non-MGs (see Supplementary Fig. 16). By contrast, the bands of descriptors δc , δN N , and σ δc for MGs look similar to those for non-MGs. To reveal the difference, we plotted the distributions of the three descriptors for MGs and non-MGs (see Supplementary Fig.  17). The averages of the three descriptors for MGs are generally larger than those for non-MGs, which indicates the slight difference in local packing deficiency and local chemical fluctuation between MGs and non-MGs. In principle, PCP provides us a comparative view for the distributions of multiple descriptors 76 , the effectiveness of which in machine learning can be inferred by the difference if there is any, in the band structures. Besides, we studied the difference of BMGs and glass ribbons for different classes of MGs with PCPs (e.g., Zr-based, Fe-based, Cu-based, and La-based). As shown in Fig. 6a-h, it is evident that the descriptor values are further squeezed into a much narrower band for BMGs than for glass ribbons, which suggests a comparatively limited compositional space that allows for the discovery of BMGs.
To sum up, we develop a hybrid ML model in this work for the design of MGs with a targeted GFA, which is based on the largest dataset available in the literature and has only 8 data descriptors derived in conformity with the theoretical models. Our hybrid ML model exhibits a high computational performance in classification and regression, and validated by the experiments that systematically studied the GFAs of several ternary glass-forming systems. We also demonstrate the predictability of the hybrid ML model through the development of chemically complex MGs-from quaternary to senary systems-based on the ML predictions. It is interesting to note that the values of the data descriptors also provide important clues to the hidden structural characteristics of our designed MGs, which are also validated by the experiments. These findings are important, which may pave the way towards the computational discovery of chemically complex MGs with unusual amorphous structures.

ML algorithms
Our ML algorithms are implemented by Matlab R2020a with Statistics and Machine Learning Toolbox and Deep Learning Toolbox.

Adaptive boosting (AB). AB model is trained in the Classification Learner
App from Statistics and Machine Learning Toolbox. 10-fold cross-validation is applied. Model type is set to Boosted Trees with the maximum number of splits set to 342, the number of learners to 143, and learning rate to 0.961. Other parameters are set automatically to achieve the best training results.

Support vector machine (SVM). SVM model is trained in the Classification
Learner App from Statistics and Machine Learning Toolbox. 10-fold crossvalidation is applied. Model type is set to Fine Gaussian SVM with the kernel scale set to 0.71 and the box constraint to 1. Other parameters are set automatically to achieve the best training results.

K-nearest neighbor (KNN). KNN model is trained in the Classification
Learner App from Statistics and Machine Learning Toolbox. 10-fold crossvalidation is applied. We use Euclidean distance metric with equal weight and the number of neighbors is set to 10. Other parameters are set automatically to achieve the best training results.
Artificial neural network (ANN). ANN in our work consists of one hidden layer with an activation function S a i ð Þ ¼ 1 1þe Àa i and 20 hidden neurons determined through pre-training (See Supplementary Fig. 18 To avoid overfitting, we applied a hold-out validation method to the ANNbased ML models and divided the dataset into three subsets with 70% data for training, 15% for validation, and 15% for testing. The training parameters are set as net.trainParam.Max_fail = 10; net.trainParam.goal = 0.02. The initial biases and weight matrices of input and layer are generated randomly. Gaussian processed regression (GPR). GPR models are trained in the Regression Learner App from Statistics and Machine Learning Toolbox. 10fold cross-validation is applied. We employ three different kernels to train  i ; the parameters are set as σ f = 1.86, σ l = 2.79, β = 0.53, ε = 0.28. Other parameters are set automatically to achieve the best training results. Support vector regression (SVR). SVR model is trained in the Regression Learner App from Statistics and Machine Learning Toolbox. 10-fold crossvalidation is applied. The model type is set to Fine Gaussian SVM; the kernel function is set to Gaussian k x; x 0 ð Þ¼expðÀ x À x 0 k k 2 Þ; the kernel scale is set to 0.71 while the box constraint is set to 0.93 and ε = 0.093. Other parameters are set automatically.
Random forest (RF). The RF model is trained in the Regression Learner App from Statistics and Machine Learning Toolbox. 10-fold cross-validation is applied. The model type is set to Boosted Trees; minimum leaf size is set to 8; number of learning is set to 30; the learning rate is set to 0.1. Other parameters or options are set automatically.

Arc-melting
Pure metals, including Ti, Zr, Hf, Al, Co, Ni, Fe, Cu, and Nb, with a purity level higher than 99.95% are used to prepare rod samples. We use a lab-scale arcmelting furnace to melt the pure metals with vacuum level as high as 8 × 10 −4 Pa and a melted Ti ingot to avoid possible oxidation. Then the melted samples are casted in copper molds with dimensions of Ф2 mm, Ф3 mm, and Ф5 mm.

Melt spinning
We first prepare ingot samples by casting mentioned above. Then the ingots are melted in a lab-scale induction-melting furnace with a vacuum as high as 8 × 10 −4 Pa and the ribbons are prepared by a single copper roller melt spinning with a rotating speed of 75 r/s.

Differential scanning calorimetry (DSC)
DSC experiments are performed using both DSC3/700 and TGA DSC3+HT/ 1600 (METTLER TOLEDO) with a heating rate of 20 K/min and argon flow with a rate of 50 mL/min.

Atom probe tomography (APT)
Needle-shaped APT specimens are fabricated by lift-outs and annular milled in a FEI Scios focused ion beam/scanning electron microscope (FIB/ SEM). The APT characterizations are performed in a local electrode atom probe (CAMEACA LEAP 5000 XR). The specimens are analyzed at 70 K in voltage mode, at a pulse repetition rate of 200 kHz, a pulse fraction of 20%, and an evaporation detection rate of 0.2% atom per pulse. Imago Visualization and Analysis Software (IVAS) version 3.8 is used for creating the 3D reconstructions and data analysis.

DATA AVAILABILITY
The exact datasets used in this study are freely available at https://citrination.com/ datasets/198590/. The source code used in this study could be downloaded from GitHub (https://github.com/ZHOU-Ziqing/ML_Metallicglass_GFA). Additional data related to this work is available on reasonable request.