Introduction

Metallic glasses (MGs), as a unique class of amorphous materials, exhibit a high atomic packing density with pronounced topological and chemical short-to-medium range order1,2,3,4. The complex local structures have been demonstrated to have a profound influence on the properties of MGs5. In essence, many properties of MGs can be depicted in terms of excursions in the potential energy landscape (PEL)6,7,8, which is a multidimensional configurational space with local energy minima separated by barriers. In the PEL picture, elementary excitations upon external stimuli (e.g., thermal or mechanical) are associated with the β processes, which correspond to the hopping between nearby local minima, i.e., sub-basins inside a deep PEL megabasin9. Elementary excitations have been correlated with many properties10,11,12, including local plastic deformation13,14, diffusion mediated by atomic hopping15, as well as structural relaxation (local energy minimization in the direction towards the bottom of the basin) or rejuvenation (to a higher-energy local minimum)16.

It remains a long-standing challenge to unravel the role of static structure in controlling the elementary excitations in MGs: is there a structural indicator that can be tapped to predict how resistant or compliant different local regions are to externally stimulated activation? Over the past several decades, many efforts have been devoted to this critical question. Recently, machine learning (ML), based on well-crafted representations of the atomic environment, has proven promising for establishing atomic-level structure-property relationships in liquids and glasses17,18,19,20,21,22,23,24. For example, Schoenholz et al.17 studied Lennard-Jones (L-J) model liquids and used ML to derive a structural parameter called “softness”, which was found to correlate well with a particle’s propensity for hopping, reflecting its susceptibility to β relaxation in liquids10. Below the glass transition temperature, metallic liquids freeze into glassy solids and the timescale of glass dynamics becomes very long, well beyond the reach of atomistic (e.g., molecular dynamics) simulations. We therefore have to resort to local perturbation methods that activate a local group of atoms into excited states by a stress or thermal stimulus, as a probe of the susceptibility to elementary excitations. Several recent ML studies have focused on quantitatively gauging how the local environment influences the propensity for stress-activated β processes (i.e., shear transformations) in MGs18,19,20. For example, the pioneering work of Cubuk et al.18 applied ML to disordered materials such as L-J glasses and granular systems and showed that radial and bond-angle distribution information can be used to identify atoms with a high propensity for shear transformation.
Wang et al.19 developed the interstice distribution as a new local structural representation for MGs, which has proven robust in predicting the plastic sites of several MGs and generalizes well across compositions and even chemical systems. However, the accuracy achieved in these attempts is not yet sufficiently high: the reported scoring metrics, e.g., recall or the area under the receiver operating characteristic curve (AUC-ROC), are typically below 80%. One reason is that elementary excitations upon shear transformation are complicated by the non-uniform tensorial stress field in a solid under deformation, as well as by the dependence of activation on loading conditions (e.g., loading mode and direction)25,26. If not properly handled, these factors introduce non-trivial noise into the accrued data and adversely affect the quality of the learned structure-property relations.

This problem, however, is much less severe for thermally induced elementary excitations in MGs. Here we use the activation-relaxation technique (ART)27,28 to probe the propensity for thermal activation of each atom in MGs (see the schematic in Fig. 1a, discussed later). These activated processes are not subject to internal non-uniform stresses and can be well converged by averaging over a considerable number of activation pathways, which significantly reduces data noise. Meanwhile, the Gaussian-like distribution of thermal activation energetics (Fig. 1b, discussed later) resolves atoms at both the hard and soft ends well, corresponding to locally favored and unfavored motifs, respectively. This avoids a problem with common stress-activation indicators (e.g., non-affine displacement or von Mises strain), which often exhibit a skewed “long-tail” distribution29 whose resolution at the hard end is much lower than at the soft end. Moreover, thermally activated events have comparable energetics, at least for MGs based on common (or similar) elements, even when these differ in composition or processing history; as such, multiple “datasets” can be combined to help ML identify the structural underpinning in more general terms.

Fig. 1: Thermally activated events in Cu-Zr metallic glasses.

a Schematic description of the β process in the context of the potential energy landscape (PEL). Red dashes illustrate several activation pathways from a local minimum. In practice, we initiate 50 independent events around each atom along random activation pathways using ART and extract an ensemble-averaged activation energy Eact for each atom. b Distribution of Eact in the six model glasses as well as their combined Eact spectrum. The median (50th percentile) and the 5th and 95th percentiles are marked as vertical dashed lines in the combined Eact spectrum, and the median is marked in the spectrum of each model glass.

In this work, we develop ML models to predict the propensity for thermally activated elementary excitations from the atomic environment of the static MG structure. Using ART27,28, we systematically probe the activation energies in six MGs: four Cu64Zr36 glasses prepared at different quenching rates, as well as Cu50Zr50 and Cu80Zr20. The activation energy around each atom is calculated and ensemble-averaged over 50 activation trials to indicate its susceptibility to excitation. We then combine the data from the six MGs into a wider activation energy spectrum (Fig. 1b) and use ML to identify the atoms with strong resistance or high compliance to activation. By fingerprinting the atomic site environment with a recently proposed interstice distribution representation19, we find that ML can reliably identify atoms with the highest 5% and lowest 5% activation energies, reaching an AUC-ROC of 0.942 and 0.888, respectively. These accuracies are considerably higher than those of previous ML predictions of the propensity for stress-driven shear transformations18,19. We rigorously compare our ML results with those obtained using several other feature representations and identify the descriptors that are critical to the ML decisions; interestingly, most of them turn out to be medium-range order features. Finally, we conduct quantitative “between-task” transfer tests and show that the learned model can also predict the propensity for shear transformation. This work highlights the predictive power of local static structure in quantitatively connecting with β processes in MGs.

Results

Energy barriers for thermally activated β processes

We employ molecular dynamics (MD) simulations to prepare six Cu-Zr model MG samples: (i) different compositions with the same cooling rate (Cu50Zr50, Cu64Zr36, and Cu80Zr20 quenched from the liquid at 10¹⁰ K s⁻¹), and (ii) the same composition with different cooling rates (Cu64Zr36 MGs quenched at 10⁹ to 10¹² K s⁻¹) (see Methods for simulation details). We then apply ART to probe the energy barriers of thermally activated events27,28. ART is designed to cross energy barriers in a way that mimics thermal activation; the crossing, however, is driven by the ART algorithm rather than by a real temperature (the search is carried out at 0 K).

Around each atom in those MGs, we initiate 50 independent activation events along random activation pathways (illustrated by the dashed red lines in Fig. 1a, see Methods for more details). The ensemble-averaged activation energy, Eact, can then be defined as the average energy difference between the saddle point and the initial state,

$$E_{\mathrm{act}} = \left\langle E_{\mathrm{saddle}} - E_{\mathrm{initial}} \right\rangle \tag{1}$$
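Eq. (1) amounts to a per-atom mean over the ART events. The sketch below is a minimal illustration, assuming the saddle-point and initial energies of each event have already been extracted from the ART output; the function name and input layout are hypothetical, not part of any ART package. It also reports the standard error of the mean, one simple (assumed) way to judge whether 50 events suffice for convergence.

```python
import math
import statistics


def ensemble_activation_energy(saddle_energies, initial_energies):
    """Eq. (1): mean of E_saddle - E_initial over the ART events of one atom.

    Both arguments hold one energy (eV) per event. Returns a tuple
    (E_act, standard error of the mean).
    """
    if len(saddle_energies) != len(initial_energies) or not saddle_energies:
        raise ValueError("need one initial energy per saddle energy")
    barriers = [s - i for s, i in zip(saddle_energies, initial_energies)]
    e_act = statistics.fmean(barriers)
    # Standard error of the mean: an assumed, simple convergence check.
    if len(barriers) > 1:
        sem = statistics.stdev(barriers) / math.sqrt(len(barriers))
    else:
        sem = float("nan")
    return e_act, sem


# Hypothetical energies (eV) for three events sharing one initial minimum.
e_act, sem = ensemble_activation_energy([1.5, 2.5, 3.5], [1.0, 1.0, 1.0])
print(f"E_act = {e_act:.3f} eV +/- {sem:.3f} eV")
```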

Averaging over 50 independent activations around each atom is sufficient to obtain a converged Eact, which contains key statistical information on thermal excitations in each local region (the center atom and its neighbors). We chose to average all the activation barriers into an “effective” barrier as the target variable for this ML study. Such an “effective” barrier can be regarded as a purely thermodynamic description that aims to capture relatively complete information on the local topology of the potential energy landscape. Other choices of target variable, such as the lowest energy barrier, are discussed in Supplementary Table 1.

Figure 1b shows the distribution of Eact in the six MGs. The dashed vertical line denotes the median (50th percentile) of Eact in each MG. The wide spread of Eact signifies a large degree of structural and property heterogeneity in each glass. As mentioned in the Introduction, the Gaussian-like distribution of Eact observed here is very different from that of stress-activated events, for which the activation indicator (e.g., non-affine displacement or von Mises strain) often shows a “long-tail” distribution29. The Eact spectrum clearly depends on the MG composition and quenching rate. We then merge the Eact data of the six MGs into a more comprehensive Eact spectrum (Fig. 1b). The combined spectrum markedly increases the variety of local environments surveyed, far beyond what is present in any single MG. Later, we feed this combined dataset to ML and test whether ML can map out the characteristic atoms at both the high Eact (hard) and low Eact (soft) ends of these Cu-Zr MGs.
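Merging the six spectra and flagging the two tails, as used for the combined spectrum and the classification tasks below, can be sketched as follows. This is an illustrative sketch assuming a ranking-based cut (the top and bottom `fraction` of the merged list, ties broken arbitrarily). The exact thresholding used in this work is not specified here, and the glass labels and function name are hypothetical.

```python
def label_extremes(eact_by_glass, fraction=0.05):
    """Merge per-glass E_act values and flag the highest/lowest `fraction`.

    eact_by_glass maps a (hypothetical) glass label to a list of per-atom
    E_act values in eV. Returns (glass, atom_index, E_act, is_high, is_low)
    tuples for every atom in the merged dataset.
    """
    merged = [(glass, i, e) for glass, values in eact_by_glass.items()
              for i, e in enumerate(values)]
    k = max(1, round(fraction * len(merged)))  # number of atoms in each tail
    order = sorted(range(len(merged)), key=lambda j: merged[j][2])
    low, high = set(order[:k]), set(order[-k:])
    return [(g, i, e, j in high, j in low) for j, (g, i, e) in enumerate(merged)]
```

Keeping the glass label and atom index with each entry lets the labels be mapped back onto individual samples, for example to draw per-glass maps such as Fig. 3c.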

Connecting activation barriers with local atomic environment

We use a set of interstice distribution descriptors to represent the local atomic environment19. The basic fingerprinting procedure is to extract the groups of bonds, facets, and tetrahedra from the coordination polyhedron of an atom, and then featurize the distribution of interstitial space present in these bond, facet, and tetrahedron groups. A simple way to represent each distribution is to derive typical statistics (minimum, mean, maximum, and standard deviation) of the interstitial spaces present. The characterization of bond, facet, and tetrahedron interstices captures 2-, 3-, and 4-body correlations, respectively, yielding nearest-neighbor, short-range-order (SRO) signatures. These SRO signatures are further “coarse-grained” by taking statistics among neighbors. Such coarse-grained signatures represent medium-range order (MRO), with a length scale of ~4–6 Å, the next level of structural organization beyond SRO. In total, the interstice representation contains 80 descriptors: 16 SRO and 64 MRO. The code for this representation is publicly available in amlearn19 (https://github.com/Qi-max/amlearn) and matminer30 (https://github.com/hackingmaterials/matminer). This representation has been demonstrated to be highly predictive, interpretable, and generalizable across a range of MGs19.
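The two-step featurization can be sketched in plain Python: first statistics of the interstices within the coordination polyhedron, then statistics of those SRO features among neighbors. This is not the amlearn or matminer implementation. It assumes the per-tetrahedron, per-facet, and per-bond interstice values and the neighbor lists are already available, uses the population standard deviation, and includes the central atom in the MRO shell, all of which are assumptions. For example, four interstice lists with four statistics each give 16 SRO features, and four statistics of each of those give 64 MRO features, consistent with the counts quoted above.

```python
import statistics


def describe(values):
    """Min, mean, max and population standard deviation of a list of numbers."""
    return {
        "min": min(values),
        "mean": statistics.fmean(values),
        "max": max(values),
        "std": statistics.pstdev(values),
    }


def sro_features(interstices_by_type):
    """SRO features of one atom.

    interstices_by_type maps an interstice type (e.g. "V_is", "a_is", "d_is")
    to its values for every tetrahedron, facet or bond of the atom's
    coordination polyhedron.
    """
    feats = {}
    for name, values in interstices_by_type.items():
        for stat, value in describe(values).items():
            feats[f"{stat}({name})"] = value
    return feats


def mro_features(sro_by_atom, neighbors):
    """Coarse-grain every SRO feature over each atom's neighbor shell.

    sro_by_atom: list of SRO feature dicts (one per atom, same keys).
    neighbors: list of neighbor-index lists. Including the central atom in
    the shell is an assumption made here.
    """
    out = []
    for i, neigh in enumerate(neighbors):
        shell = [i, *neigh]
        feats = {}
        for key in sro_by_atom[i]:
            for stat, value in describe([sro_by_atom[j][key] for j in shell]).items():
                feats[f"MRO_{stat} {key}"] = value
        out.append(feats)
    return out
```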

After featurizing all atoms in the six MGs, we feed the data to a scalable tree boosting ML algorithm, XGBOOST31, which implements parallel tree boosting and has proven efficient and robust in a wide range of applications. We train two sets of XGBOOST classifiers to identify the highest 5% and the lowest 5% Eact atoms, respectively, in the combined dataset of the six MGs (Fig. 1b). Varying the threshold from 3% to 10% gives similar results; in general, the smaller the fraction, the better the ML score (i.e., the easier the class is for ML to identify). Because the dataset is imbalanced, we perform random undersampling three times to create three balanced data samples, each with 3000 positive-class atoms (the highest or lowest Eact atoms) and 3000 negative-class atoms. We then perform 5-fold cross-validation on each data sample and average the predictions on the test sets (i.e., over 5 × 3 = 15 test sets). Repeating the undersampling in this way reduces the variance introduced by any single random undersample.
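The repeated balanced undersampling and 5-fold cross-validation can be organized as below. This is a minimal sketch. `fit_predict` and `score` are hypothetical hooks: in this work the model would be an XGBOOST classifier and the score AUC-ROC. The folds are not stratified, and the averaged quantity is the per-fold test score, which is one reading of averaging over the 15 test sets.

```python
import random
import statistics


def undersample(labels, n_per_class, rng):
    """Indices of a balanced random subsample: n_per_class of each class (1/0)."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return rng.sample(pos, n_per_class) + rng.sample(neg, n_per_class)


def kfold(indices, k, rng):
    """Shuffle indices and deal them into k folds of nearly equal size."""
    idx = list(indices)
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]


def repeated_undersampled_cv(X, y, fit_predict, score,
                             n_repeats=3, n_per_class=3000, k=5, seed=0):
    """Mean test score over n_repeats balanced undersamples x k folds.

    fit_predict(X_train, y_train, X_test) -> scores for X_test and
    score(y_true, y_score) -> float are hypothetical hooks (e.g. an XGBOOST
    classifier and AUC-ROC). Returns (mean score, list of per-fold scores).
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_repeats):
        folds = kfold(undersample(y, n_per_class, rng), k, rng)
        for f, test_idx in enumerate(folds):
            # Train on the other k - 1 folds of the same balanced sample.
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            y_score = fit_predict([X[i] for i in train_idx],
                                  [y[i] for i in train_idx],
                                  [X[i] for i in test_idx])
            results.append(score([y[i] for i in test_idx], y_score))
    return statistics.fmean(results), results
```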

We use the area under the receiver operating characteristic curve (AUC-ROC) as the scoring metric of the classifiers. The ROC curve characterizes the tradeoff between the true positive rate (TPR) and the false positive rate (FPR)32:

$$\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \qquad \mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}} \tag{2}$$

TPR is also known as recall or sensitivity; TP and FN denote true positives and false negatives, respectively. FPR is the false-alarm rate, where FP and TN denote false positives and true negatives, respectively. AUC-ROC, the area under the ROC curve, is a widely used metric for evaluating a classifier32. By definition, an AUC-ROC of 0.5 indicates that the classifier performs no better than random guessing, 1.0 signifies perfect classification, and the higher the AUC-ROC, the better the model distinguishes the classes. Figure 2a presents the ROC curves and their AUCs in classifying the highest and lowest 5% Eact atoms. For simplicity, these two ML problems are referred to as “H-Eact” and “L-Eact” hereafter. The XGBOOST model trained on interstice distribution features distinguishes the high Eact atoms from the rest of the glass well, reaching a very high AUC-ROC of 0.942. These high Eact atoms are particularly resistant to thermal activation and “pin” local rearrangements. Although classifying the lowest Eact atoms is more ambiguous, the AUC-ROC is still high (0.888), indicating significant structural contrast at the soft end as well. The ROC curve directly shows the TPR and FPR values at various probability thresholds for assigning the classes.
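Eq. (2) and the AUC-ROC can be computed directly by sweeping the decision threshold over the predicted scores and integrating the resulting (FPR, TPR) points with the trapezoidal rule. The sketch below is illustrative, and tied scores cross the threshold together. In practice, a library routine such as scikit-learn's `roc_auc_score` would typically be used.

```python
def roc_points(y_true, y_score):
    """(FPR, TPR) pairs from Eq. (2), lowering the threshold one distinct score at a time."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    pairs = sorted(zip(y_score, y_true), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        # FPR = FP / (FP + TN) and TPR = TP / (TP + FN).
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Trapezoidal area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

With scores [0.1, 0.4, 0.35, 0.8] for labels [0, 0, 1, 1], the curve passes through (0, 0.5) and (0.5, 1.0), and the area is 0.75.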

Fig. 2: Predicting the heterogeneity of thermally activated events.

a Receiver operating characteristic (ROC) curve and area under curve (AUC) in classifying the atoms showing the highest 5% (H-Eact problem) or lowest 5% activation energy (L-Eact problem). The dashed line marks a random case. b Near-perfect calibration of the ML-evaluated class probability estimates, that is, ph from H-Eact and pl from L-Eact.

Besides outputting a “label” (0 or 1) that predicts whether an atom belongs to a class, XGBOOST (like many other ML algorithms) can also give continuous probability estimates in the range [0, 1] that reveal the confidence of each prediction. These probabilities expose the uncertainty of a prediction, allow some flexibility in using the model, and provide a more nuanced way to assess it. However, raw class probabilities from nonlinear ML algorithms are often poorly calibrated and should be checked carefully before interpretation. Predicted probabilities are called calibrated if they match the “real” class probabilities: for instance, among data points assigned a positive-class probability of 0.70, ideally 70% should indeed be positive. This ideal calibration is illustrated by the diagonal line in Fig. 2b. In this work, we employ a post-training calibration method, isotonic regression33, to improve the calibration of our probability estimates. As seen in Fig. 2b, the calibration curves of ph and pl, i.e., the probability estimates from the models obtained in the “H-Eact” and “L-Eact” problems, respectively, both lie close to perfect calibration. The area between the calibration curve and the perfect-calibration line, a measure of miscalibration, is very small in both cases (Fig. 2b). Thus, our machine-learned probability estimates reflect the real class probabilities well and warrant further interpretation.
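Isotonic regression fits a non-decreasing map from raw classifier scores to observed class frequencies, and its standard fitting algorithm is pool-adjacent-violators (PAV). The sketch below is a minimal plain-Python version, assuming the calibrator is fit on held-out (score, label) pairs. It returns a step function, whereas scikit-learn's `IsotonicRegression` (used, e.g., by `CalibratedClassifierCV(method="isotonic")`) interpolates linearly between thresholds. The two can therefore differ slightly between fitted points.

```python
import bisect


def fit_isotonic(scores, labels):
    """Pool-adjacent-violators fit of a non-decreasing map score -> P(y = 1).

    Returns (thresholds, values): a raw score s >= thresholds[j] (and below
    thresholds[j + 1]) is mapped to the calibrated probability values[j].
    """
    blocks = []  # each block: [sum of labels, count, lowest score, highest score]
    for s, y in sorted(zip(scores, labels)):
        if blocks and blocks[-1][3] == s:
            # Tied raw scores must share one calibrated value.
            blocks[-1][0] += y
            blocks[-1][1] += 1
        else:
            blocks.append([float(y), 1, s, s])
        # Merge backwards while a block's mean exceeds the next one's (a violation).
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count, _, highest = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][3] = highest
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def calibrate(score, thresholds, values):
    """Calibrated probability of a raw score via step-function lookup."""
    j = bisect.bisect_right(thresholds, score) - 1
    return values[max(j, 0)]
```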

We now look more closely at the distributions of the ML-evaluated class probability estimates, that is, ph from H-Eact and pl from L-Eact. Figures 3a and 3b present the overall ph and pl distributions in the six MGs as well as the variation of Eact with ph and pl. The wide distributions of ph and pl reveal a large degree of heterogeneity inside the MGs. ph has a larger proportion of atoms close to 0 and 1, again indicating that ML is more confident in distinguishing the high Eact atoms. Eact depends strongly on ph and pl (Fig. 3b), correlating positively with ph and negatively with pl, which demonstrates that ph and pl can serve as indicators of the thermal activation propensity. We further visualize the distributions of Eact, ph, and pl in a model Cu64Zr36 glass to allow atomic-scale scrutiny (Fig. 3c). For simplicity, only atoms with ph or pl > 0.50 are highlighted in the ph and pl maps: ML predicts that the probability of these atoms belonging to the highest 5% or lowest 5% Eact class is greater than 0.50, so with a class threshold of 0.50 they would be classified as high or low Eact atoms, respectively. A good correspondence can be seen between the high Eact atoms and high ph atoms, as well as between the low Eact atoms and high pl atoms. Consistent with the lower prediction score in the L-Eact task, there are more false-positive atoms (high pl yet high Eact) and false-negative atoms (low pl yet low Eact) in predicting the low Eact atoms, but the prediction quality is still good. These results show that a solid relationship between local structure and thermal activation propensity can be established by combining interstice features with ML. We also perform direct regression of Eact using the interstice features; the Pearson correlation coefficients and parity plots are presented in Supplementary Table 3.

Fig. 3: ML-evaluated class probability.

a Probability density distribution, f(p), of ph and pl in the combined MG dataset. b Strong dependence of activation energy Eact on ph and pl. c Distribution of Eact, ph, and pl in a Cu64Zr36 MG quenched at 10⁹ K s⁻¹. For simplicity, only the atoms with ph or pl > 0.50 are highlighted in c. The high Eact and low Eact atoms correspond well to the high ph and high pl atoms predicted by ML.

Comparison with ML models employing other feature representations

Next, we compare our ML results based on interstice features with those obtained from several other representations. Here we consider a total of eight pure structural representations and three physical signatures for comparison (Table 1). To ensure a fair comparison, training is performed on the same data samples and the same cross-validation splits. We train XGBOOST31 and SVM34 models with various hyperparameters and report the best score for each representation. Most of the presented scores are from XGBOOST, whereas the best scores of the radial symmetry functions, bispectrum coefficients, and smooth overlap of atomic positions (SOAP) are from linear SVM, and the moment tensor potential (MTP) internally uses linear regression to build the potential model (Table 1). The detailed ROC curves can be found in Supplementary Table 2. In addition, a feature indicating whether the atom is Cu (0) or Zr (1) is added to each representation to aid the ML decisions. This is very helpful for representations whose features alone cannot distinguish the atom types well.

Table 1 Using various pure structural representations or physical signatures to classify the highest 5% (denoted as H-Eact problem) and lowest 5% Eact atoms (L-Eact problem) of the combined Eact spectrum merged from six MGs (Fig. 1b).

We start with two “baseline” models built with: (i) two one-hot-encoded (0 or 1) variables designating whether or not the nearest neighbors around an atom form a <0, 0, 12, 0, 0> (<0, 0, 12, 0> if occasional facets with >6 edges are omitted) or a <0, 0, 12, 4, 0> Voronoi polyhedron; (ii) five integer Voronoi indices (n3, n4, n5, n6, and n>6), where nx is the number of x-edged facets in the Voronoi polyhedron35. Many studies have revealed that Cu-centered <0, 0, 12, 0, 0> icosahedra and Zr-centered <0, 0, 12, 4, 0> polyhedra are among the most stable motifs in Cu-rich Cu-Zr MGs36,37. In this work, ~21.4% of the Cu atoms in the six MGs are centered in icosahedra and 6.0% of the Zr atoms are centered in <0, 0, 12, 4, 0> polyhedra. A simple baseline model then predicts atoms centered in icosahedra or <0, 0, 12, 4, 0> polyhedra as high Eact atoms and the rest as low Eact atoms. The AUC-ROC achieved by this baseline model is unsatisfactory, i.e., 0.673 and 0.557 in the H-Eact and L-Eact tasks, respectively (Table 1). As seen in Supplementary Table 2, the TPR (recall) of this baseline model in classifying the highest 5% Eact atoms is ~0.48, indicating that only ~48% of the highest Eact atoms are centered in icosahedra or <0, 0, 12, 4, 0> polyhedra. Not surprisingly, this heuristic model works worse in classifying the lowest Eact atoms: icosahedra and <0, 0, 12, 4, 0> polyhedra prototype the most stable motifs, and not forming these motifs does not necessarily mean that an atom is soft. This results in a large FPR and ultimately a low AUC-ROC of 0.557 in the L-Eact task (Supplementary Table 2). The second baseline model, trained on the Voronoi indices, performs better, with AUC-ROC of 0.750 and 0.628, respectively (Table 1; the ROC curves are presented in Supplementary Table 2).
By allowing the model to decide from the detailed Voronoi indices instead of only from several predefined motifs, the model captures more subtle structural information and makes better decisions in both tasks. Both models rest on well-established Voronoi indices and are relatively simple to set up, so they serve as the baselines for our tasks; ideally, any proposed ML model should clearly outperform them.
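Once each atom's Voronoi index is known (e.g., from a Voronoi tessellation code), the two baseline feature sets reduce to a few lines. This sketch is illustrative only, and the function name is hypothetical. Facets with >6 edges are ignored for the icosahedron test, as described above, while requiring an exact <0, 0, 12, 4, 0> match is an assumption. The Cu (0) / Zr (1) flag described earlier is appended to both sets.

```python
def voronoi_baseline_features(voronoi_index, is_zr):
    """Baseline-1 and baseline-2 feature vectors of one atom.

    voronoi_index: (n3, n4, n5, n6, n_gt6), the numbers of 3-, 4-, 5-, 6- and
    >6-edged facets of the atom's Voronoi polyhedron.
    is_zr: 0 for Cu, 1 for Zr (the element flag appended to every representation).
    """
    n3, n4, n5, n6, n_gt6 = voronoi_index
    # Icosahedron test ignores occasional >6-edged facets, i.e. <0, 0, 12, 0>.
    is_icosahedron = int((n3, n4, n5, n6) == (0, 0, 12, 0))
    # Exact <0, 0, 12, 4, 0> match (whether to ignore n_gt6 here is an assumption).
    is_0_0_12_4_0 = int(tuple(voronoi_index) == (0, 0, 12, 4, 0))
    baseline1 = [is_icosahedron, is_0_0_12_4_0, is_zr]
    baseline2 = [n3, n4, n5, n6, n_gt6, is_zr]
    return baseline1, baseline2
```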

Next, we combine a group of SRO features as the third structural representation for comparison, including the characteristic motif signatures and Voronoi indices35 used in the baseline models, the coordination number (CN) within a cutoff distance (4.0 Å) or in a Voronoi polyhedron, the Voronoi volume, and bond-orientational order parameters (ql and wl, where l = 4, 6, 8, and 10)38. This representation achieves an AUC-ROC of 0.807 in the H-Eact task and 0.634 in the L-Eact task (Table 1; see Supplementary Table 2 for ROC curves). The inclusion of the bond-orientational order features accounts for the increase in AUC-ROC compared with baseline model 2. The L-Eact task remains harder than the H-Eact task for this representation. Beyond SRO, interestingly, if we further augment the SRO features with coarse-grained MRO features (taking statistics among nearest neighbors19, as in the interstice representation), the predictive ability is greatly enhanced (Table 1; see Supplementary Table 2 for ROC curves). This suggests that it is important to bring MRO into the prediction scheme (the importance of MRO is discussed in more detail later).
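For reference, the bond-orientational order parameters are commonly defined (Steinhardt-type) as below. Here N_b(i) is the number of neighbors of atom i, Y_lm are spherical harmonics, r̂_ij is the unit vector from atom i to neighbor j, and the bracketed array is a Wigner 3j symbol. These are the textbook forms, with ŵ_l the normalized third-order invariant. The neighbor definition and normalization used in this work are not stated in this section and are assumptions here.

$$q_{lm}(i) = \frac{1}{N_b(i)} \sum_{j=1}^{N_b(i)} Y_{lm}\left(\hat{\mathbf{r}}_{ij}\right), \qquad q_l(i) = \sqrt{\frac{4\pi}{2l+1} \sum_{m=-l}^{l} \left| q_{lm}(i) \right|^{2}}$$

$$\hat{w}_l(i) = \frac{\sum_{m_1+m_2+m_3=0} \begin{pmatrix} l & l & l \\ m_1 & m_2 & m_3 \end{pmatrix} q_{lm_1}(i)\, q_{lm_2}(i)\, q_{lm_3}(i)}{\left( \sum_{m=-l}^{l} \left| q_{lm}(i) \right|^{2} \right)^{3/2}}$$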

As another important group of structural representations, we adopt four representations originally designed for fitting ML potentials: (i) radial symmetry functions17,18,20,22,23,24,39; (ii) bispectrum coefficients of density functions40,41; (iii) moment tensor potential (MTP)42,43; and (iv) smooth overlap of atomic positions (SOAP)44 (see Methods for details). The ML results are summarized in Table 1, and ROC curves are shown in Supplementary Table 2. All four representations predict the high Eact atoms well (AUC-ROC > 0.90), while their scores in predicting low Eact atoms are lower. The MTP and SOAP descriptors achieve the best scores in this group. Compared with the radial symmetry functions, which contain only radial information, the angular information included in the MTP and SOAP descriptors increases the prediction accuracy, but not very significantly. This may be because, in MGs, the absence of crystallographic constraints keeps the angular distribution close to that preferred in polytetrahedral packing, without significant variation. Comparatively, incorporating an effective representation of MRO, which has been shown to strongly affect glass properties, improves the prediction performance to a greater degree. This is demonstrated in Table 1 by the excellent accuracy of the interstice distribution representation with MRO incorporated, as well as by the remarkable increase in accuracy of the simple SRO features when augmented by MRO features. Note also that in a previous study, Schoenholz et al.17 used the radial symmetry function representation to classify atoms with a high propensity for hopping (the soft end) in L-J liquids and achieved a very high recall of ~90%.
The relatively lower accuracy in the current L-Eact task (which also corresponds to the soft end) suggests that identifying atoms susceptible to β relaxation in solid-state MGs could be harder than in their parent supercooled liquids, since the same set of features achieves a lower score in the former problem. Other possible factors are (i) intrinsic differences in predictability between Cu-Zr MGs described by an embedded-atom method (EAM) potential and supercooled liquids described by the pairwise L–J potential, and (ii) the combination of different compositions and quenching rates in a single dataset, which may increase the ambiguity for the radial symmetry functions.
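A radial symmetry function counts neighbors of a given species in soft Gaussian shells around the central atom. The sketch below uses the Behler–Parrinello G2 form with a cosine cutoff. The probe radii, width, cutoff, and species resolution are illustrative assumptions, not the settings used in this work (see Methods).

```python
import math


def cosine_cutoff(r, r_c):
    """f_c(r) = 0.5 * [cos(pi * r / r_c) + 1] for r <= r_c, and 0 beyond r_c."""
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0) if r <= r_c else 0.0


def radial_symmetry_functions(neighbors, shells, eta, r_c, species=("Cu", "Zr")):
    """Species-resolved G2-type radial symmetry functions of one atom.

    neighbors: (species, distance in Å) for the atoms around the center.
    shells: probe radii r_s in Å; eta: Gaussian width parameter in 1/Å^2.
    Returns one value per (species, r_s) pair, species-major order.
    """
    features = []
    for sp in species:
        for r_s in shells:
            features.append(sum(
                math.exp(-eta * (r - r_s) ** 2) * cosine_cutoff(r, r_c)
                for s, r in neighbors if s == sp
            ))
    return features


# Hypothetical settings: shells every 0.5 Å from 2.0 to 6.0 Å.
shells = [2.0 + 0.5 * n for n in range(9)]
g = radial_symmetry_functions([("Cu", 2.6), ("Zr", 3.1), ("Zr", 3.3)], shells, eta=4.0, r_c=6.5)
print(len(g))  # 2 species x 9 shells = 18 features
```

Because the features depend only on pair distances, they carry no bond-angle information, which is the contrast with MTP and SOAP drawn above.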

Finally, we compare the results of the pure structural representations with those of three physical signatures, namely the flexibility volume Vflex45 and the atomic and coarse-grained shear moduli G46 (see Methods). Table 1 summarizes their prediction scores, and the ROC curves are presented in Supplementary Table 2. These signatures require detailed knowledge of the interatomic potential to calculate and are thus not pure structural representations. Among the physical descriptors, Vflex fares much better than the atomic or coarse-grained G in correlating with Eact. Some pure structural representations (interstice, SRO + coarse-grained MRO, and the four ML potential representations) remain very competitive with these physical signatures (Table 1), further advocating the use of proper structural representations, aided by ML, to establish structure-property relationships in MGs. The interstice distribution features achieve the highest prediction score in both the H-Eact and L-Eact tasks. Such quantitative benchmarks are important for obtaining a clear picture of the structure-property relations proposed for MGs. We also note that, strictly speaking, the relative performance of each representation can be task-specific. For a future task of interest, we therefore recommend conducting similarly rigorous benchmarking to locate the representation that maximizes ML performance.

Impact of medium-range environment on activated events

Thus far, we have demonstrated that our ML model, employing interstice features derived from static atomic positions only, can predict the heterogeneity of thermally activated elementary excitations in Cu-Zr MGs well. We next examine how the ML models make decisions based on the input features.

ML algorithms such as XGBOOST allow quantification of feature importance, which evaluates how much each descriptor improves the performance measure (for XGBOOST, e.g., the split gain, i.e., the reduction in training loss) and is thus particularly useful for model interpretation. For ease of interpretation, we first remove highly linearly correlated features (Pearson correlation coefficient > 0.70) and then reduce the number of features to 10 by a brute-force recursive feature-elimination procedure: i) train a model with N features and evaluate its performance; ii) eliminate each of the N features in turn, retrain a model with the remaining N − 1 features, and calculate the performance loss (if any) relative to the original N-feature model; iii) eliminate the feature with the smallest performance loss. The rationale is that dropping unimportant features should not degrade performance significantly. We repeat this procedure recursively until the feature dimension is reduced to 10.
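The two-stage reduction can be sketched as a greedy Pearson filter followed by brute-force backward elimination. `evaluate` is a hypothetical hook that retrains the classifier on a feature subset and returns a cross-validated score (e.g., AUC-ROC). Applying the threshold to |r| and keeping the first-seen member of a correlated pair are assumptions about details not given in the text.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (0.0 if either input is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def drop_correlated(columns, threshold=0.70):
    """Keep a feature only if |r| <= threshold with every feature kept so far.

    columns maps feature name -> list of values. Using |r| and keeping the
    first-seen feature of a correlated pair are assumptions.
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def recursive_elimination(features, evaluate, target=10):
    """Brute-force backward elimination down to `target` features.

    evaluate(feature_names) -> score (higher is better) is a hypothetical hook
    that retrains the classifier on those features, e.g. returning AUC-ROC.
    """
    current = list(features)
    while len(current) > target:
        base = evaluate(current)
        # Performance loss caused by removing each feature in turn.
        losses = {f: base - evaluate([g for g in current if g != f]) for f in current}
        current.remove(min(losses, key=losses.get))
    return current
```

Each elimination round retrains N + 1 models, so the cost grows quadratically with the starting number of features.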

Figure 4a visualizes the final 10 features and their Pearson correlation matrix. We abbreviate the subscript “interstice” as “is”; for several distance interstice features, the subscript “dist” in dis-dist indicates that the nearest neighbors are determined by a cutoff distance rather than by the default Voronoi tessellation. The 10 features show low mutual Pearson correlation coefficients (the maximum is 0.63). Interestingly, 9 of the 10 surviving features describe the interstice distribution in the medium range (i.e., they have “MRO” in the feature name). This again suggests that MRO contributes greatly to decision making. According to the feature importance, MROmean Std(Vis) and MROmean Std(dis-dist) are the most important features in the L-Eact and H-Eact tasks, respectively (Fig. 4b). These two metrics evaluate the average variation of the tetrahedron volume interstice and the bond distance interstice in the medium range around an atom. This emphasizes the importance, for glass properties, of local structural anisotropy that persists into the medium range. For the L-Eact task, MROmean Std(Vis) stands out with a very high importance, whereas for the H-Eact task, the feature importances are more evenly distributed.

Fig. 4: Interpreting the ML models.

a Pearson correlation coefficients of the ten features that survived the feature reduction. The coefficient value is encoded in the color, while the circle radius encodes its absolute value. Vis, ais, and dis represent the volume, area, and distance interstices, respectively, and the symbol before the brackets (e.g., Std, Mean, or Min) denotes the statistic of these interstices in the nearest-neighbor (SRO) environment. MROstat in a feature name means that the SRO feature has been coarse-grained among neighbors by taking the statistic denoted by the subscript of MRO. The subscript “dist” in dis-dist indicates that the neighbors are determined by a cutoff distance rather than by the default Voronoi tessellation. b Feature importance of the ML models trained in the H-Eact and L-Eact tasks. The feature importance is averaged over the models obtained from the three rounds of data undersampling and the five-fold cross-validation on each data sample.

We then select typical hard and soft Cu (Zr) atoms and show the distributions of the tetrahedron volume interstice, Vis, and the bond distance interstice, dis-dist, in their local environments to demonstrate the inherent structural contrast between hard and soft atoms. Typical atoms with high Eact (~2.9 eV) and low Eact (~0.7 eV) are selected; the red and purple histograms show the spread of the interstices Vis and dis-dist in the coordination polyhedron (SRO) and in the neighboring clusters (MRO), respectively (Fig. 5a and b). The Vis and dis-dist distributions in the SRO of the high Eact atoms (Fig. 5a) are distinctly more concentrated than those of the low Eact atoms (Fig. 5b). Around the low Eact atoms, there are often some tetrahedra or bond segments with a very low or very high interstice content. This lowers the stability of the local environment and makes the atom more responsive to thermal excitation. Remarkably, this trend persists into the medium range (purple histograms). As quantified in Fig. 4b, the MRO interstice distribution is even more important than the SRO one. This sharp contrast in the interstice distribution underlies the success of our ML models in distinguishing the characteristic atoms.

Fig. 5: Structural contrast between the high and low Eact atoms.
figure 5

a Distribution of the tetrahedron volume interstice Vis and the bond distance interstice dis-dist around representative Cu and Zr atoms with high Eact (~2.95 eV). The red and purple histograms show the spread of interstices in the coordination cluster (SRO) and in the neighboring clusters (MRO), respectively. The SRO coordination clusters are also visualized in the inset, with larger blue spheres for Zr and smaller orange spheres for Cu. The subscript “dist” in dis-dist means that the neighbors are determined by a cutoff distance rather than by the default Voronoi tessellation. b Distribution of Vis and dis-dist around representative Cu and Zr atoms with low Eact (~0.71 eV). c Unsupervised principal component analysis (PCA) reducing the original high-dimensional interstice feature space to a two-dimensional space. The color scale indicates the density level of the contours.

Next, we use principal component analysis (PCA)47 to project the high-dimensional feature space (ℝ¹⁰, the ten features in Fig. 4) onto a low-dimensional space (ℝ²) to visualize the inherent data structure of the site environment signatures (Fig. 5c). PCA is a dimensionality reduction method that uses an orthogonal transformation to convert possibly correlated features into uncorrelated variables while preserving the key information; it is fully unsupervised, using no class labels and requiring no training43. Fig. 5c shows that the high Eact and low Eact atoms tend to reside in very different regions (the ratios of variance explained by principal components 1 and 2 are 0.303 and 0.209, respectively). The supervised ML results above likewise reveal strong structural contrast at both the hard and soft ends (Figs. 2 and 3). Both the supervised and unsupervised analyses thus suggest a highly inhomogeneous MG structure, with distinct hard (solid-like) and soft (liquid-like) atoms embedded within it.
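The PCA projection can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' pipeline: it assumes each feature is standardized before the decomposition (the text does not state the preprocessing), and `pca_2d` is a hypothetical helper name. The principal axes come from an SVD of the centered data, and the explained-variance ratios are the normalized squared singular values.

```python
import numpy as np


def pca_2d(X):
    """Project an (n_samples, n_features) array onto its first two principal components.

    Returns the 2D scores and the explained-variance ratios of PC1 and PC2.
    Standardizing each feature is an assumption of this sketch.
    """
    X = np.asarray(X, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are orthonormal principal axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = Xs @ Vt[:2].T
    return scores, explained[:2]


# Toy check: two identical columns plus one uncorrelated column.
a = np.array([1.0, -1.0, 1.0, -1.0])
b = np.array([1.0, 1.0, -1.0, -1.0])
scores, ratios = pca_2d(np.column_stack([a, a, b]))
print(scores.shape, ratios)  # ratios close to [2/3, 1/3]
```

In the toy data the duplicated column carries two thirds of the total variance, so PC1 absorbs it; with real interstice features the ratios are much flatter, as the values quoted above show.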

Transferability to identifying shear transformation propensity

As mentioned earlier, in addition to thermally activated events, another important type of elementary excitation is the local shear transformation activated by stress48,49,50,51,52,53. Units with low stress resistance are usually referred to as shear transformation zones (STZs). As discussed in the Introduction, thermally and stress-activated excitations can both be interpreted in the framework of β processes; however, the atom-specific response can differ because the stimuli differ in character (uniform vs non-uniform, protocol-independent vs protocol-dependent). This prompts us to ask: how would our ML models, trained to predict the thermal excitation propensity, perform when used to identify STZs? Can the models work well when transferred to a different task?

This “between-task” test is challenging in several ways: (i) STZs and Eact are fundamentally different properties, activated by different stimuli and thus yielding different data; (ii) the features important for predicting Eact may not be optimal for identifying STZs. Point (ii) is quite plausible: in a previous work that used interstice features to identify STZs in MGs, only ~50% of the most important features were MRO features19, much lower than the ~90% in the Eact case (Fig. 4). To address this question, we simulate athermal quasi-static (AQS) shear deformation of a typical Cu64Zr36 glass quenched at 10⁹ K s⁻¹ (Methods). We calculate the interstice features of each atom and apply the model trained for the L-Eact task (which focuses on the soft end) to obtain the probability estimate pl for each atom. Intuitively, since pl correlates positively with the tendency of an atom to be activated by thermal stimulus (Figs. 2 and 3), it may also correlate positively with the susceptibility of the atom to activation by stress.

We calculate the non-affine displacement (\({\mathrm{D}}^{2}_{\min}\)) relative to the undeformed state at 4.0% shear strain as an indicator of the plastic susceptibility of each atom. The correlation between \({\mathrm{D}}^{2}_{\min}\) and pl is presented in Fig. 6a. Given the long-tailed distribution of \({\mathrm{D}}^{2}_{\min}\), box plots are used to present the correlation. Box plots are well suited to such skewed distributions, since they clearly mark the median (the line inside the box), the 25% and 75% quantiles (lower and upper ends of the box), 1.5 times the inter-quartile range (whiskers extending outside the box), and outliers (points beyond the whiskers). The left panel of Fig. 6a shows the complete box plot; some outliers extend so far that the boxes are squeezed. The right panel of Fig. 6a therefore zooms into the squeezed section, which contains the vast majority of the data. A positive correlation between pl and \({\mathrm{D}}^{2}_{\min}\) is clearly observed, supporting our hypothesis that these two types of activation could share some structural origins. As a quantitative test, we use pl to classify STZs, defined as the atoms with the largest 5% \({\mathrm {D}}^2_{\min}\), against the rest of the glass, analogous to the setting of the L-Eact task. We vary the threshold of pl used to designate the positive/negative classes in this new STZ task, calculate the TPRs and FPRs, and derive the ROC curve in Fig. 6b. The area under the ROC curve (AUC-ROC) is 0.810, a reasonable score for such a transfer test. This quantitative test provides additional support for the feasibility of this “between-task” generalization.
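The threshold sweep that produces the ROC curve and AUC-ROC can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: positives are the atoms in the top 5% of \({\mathrm{D}}^2_{\min}\) (as in the text), tied scores are merged into a single ROC point, the area is computed with the trapezoidal rule, and the helper name and the synthetic data are hypothetical.

```python
import numpy as np


def roc_auc_top_fraction(scores, target, top_fraction=0.05):
    """ROC curve and AUC for ranking atoms by `scores` (e.g. p_l).

    Atoms whose `target` (e.g. D2min) lies in the top `top_fraction` are positives.
    The curve is built by sweeping a threshold over the distinct score values.
    """
    scores = np.asarray(scores, dtype=float)
    target = np.asarray(target, dtype=float)
    y = target >= np.quantile(target, 1.0 - top_fraction)

    order = np.argsort(-scores, kind="mergesort")  # highest score first
    s_sorted, y_sorted = scores[order], y[order]
    tps = np.cumsum(y_sorted)
    fps = np.cumsum(~y_sorted)
    # Keep only the last index of each run of tied scores (one ROC point per threshold).
    last = np.r_[np.flatnonzero(np.diff(s_sorted)), len(s_sorted) - 1]
    tpr = np.r_[0.0, tps[last] / y.sum()]
    fpr = np.r_[0.0, fps[last] / (~y).sum()]
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))  # trapezoidal rule
    return fpr, tpr, auc


# Synthetic, long-tailed "D2min" and a noisy score that correlates with it.
rng = np.random.default_rng(0)
d2min = rng.lognormal(size=1000)
p_l = np.log(d2min) + rng.normal(scale=1.0, size=1000)
_, _, auc = roc_auc_top_fraction(p_l, d2min)
print(f"AUC-ROC = {auc:.3f}")
```

Merging tied scores matters when a classifier outputs many identical probabilities; without it the curve would depend on the arbitrary order of tied atoms.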

Fig. 6: Transferring ML model to identifying STZs.
figure 6

a Correlation between the probability estimates, pl, from the L-Eact model and the non-affine displacement (\({\mathrm {D}}^2_{\min}\)) at 4.0% strain of a Cu64Zr36 glass quenched at 10⁹ K s⁻¹. In the box plots, each box spans the 25th to 75th percentiles, the black line in the box marks the median, whiskers show 1.5 times the inter-quartile range, and points beyond the whiskers are outliers. Outliers are shown in the left plot and removed in the right plot for clarity. b Receiver operating characteristic (ROC) curve and area under the curve (AUC) when using pl to identify STZs with the largest 5% \({\mathrm{D}}^2_{\min}\).

As discussed in the Introduction, the accuracy of STZ recognition (e.g., Refs. 18 and 19) is usually lower than that of identifying thermally activated atoms (Ref. 17 and this work), especially when the same feature representation is used (radial symmetry functions17,18 or interstice distribution features19). Several factors can cause this performance difference. One is the higher intrinsic noise in STZ data collected under a single loading condition. As discussed in the Introduction, stress-activated plastic heterogeneity is quite sensitive to loading conditions such as loading mode and direction25,26; thus, data from a single loading condition can carry non-trivial noise. For the thermal activation data used in this work, the absence of non-uniform stress eliminates loading-related noise, and sampling a sufficient number of elementary ART events ensures a well-converged Eact as an indicator of the susceptibility to thermal excitation. In addition, upon deformation, STZs are activated progressively, that is, not all soft atoms move in a given straining step; therefore, a relatively large strain is usually required to collect sufficient plastic events. However, a larger strain introduces more cascade activation events, which weakens the connection to the initial undeformed structure, and the long-range elastic field during deformation also increases the length scale of plastic heterogeneity, which can extend it beyond the SRO and MRO scales that the structural representation describes.

Discussion

For the ML exploration of atomic-level structure-property relationships in amorphous alloys, a common goal is to develop novel structural representations and machine learning schemes. This paper instead focuses on another important aspect: finding a suitable target property, with minimal data noise, to convincingly test the power of ML in correlating structure with property. The results presented above demonstrate that the thermally activated elementary excitation is an excellent choice in this regard. Compared with previous ML models of shear transformations in glasses, our present success on thermally activated events in MGs has several merits:

i. We achieved high accuracy in the ML prediction of elementary excitations in MGs. In this work, ML can accurately identify the atoms with the highest 5% and lowest 5% thermal activation energies in a dataset merged from six different MGs, reaching AUC-ROC values of 0.942 and 0.888, respectively. These scores are significantly higher than those achieved in predicting the propensity for shear transformation. As discussed, this is mainly because thermal activation does not suffer from the effect of non-uniform, oriented stress25,26, and data noise can be reduced through well-converged sampling of elementary excitations. The importance of noise reduction also has implications for constructing high-fidelity glass datasets in the future.

ii. Our ML model can link activation behavior to both locally favored and unfavored structural motifs, rather than only identifying the latter as in previous ML studies17,18,19,20,21. This is aided by the explicit and extensive ART perturbation tests around each atom, and by the Gaussian-like distribution of thermal activation energies, which provides sufficient resolution at both the soft and hard ends. Among the variety of pure structural representations and physical signatures benchmarked, our interstice distribution representation performs best in both ML tasks.

iii. We have demonstrated that data from multiple compositions or processing histories can be combined to connect with underlying structural signatures. This is possible because the activation barriers have comparable magnitudes and ranges across different compositions and processing histories within the same MG system. Such a treatment notably increases the variety of local environments surveyed and allows structure-property relations to be mined in more general terms.

iv. Our analysis provides a repertoire of descriptors that are essential to the ML decisions. We show how the ML models make decisions based on the interstice features and explain why these features capture the inherent structural contrast in MGs. Our data-centric results also highlight the importance of MRO in determining the activation heterogeneity, which has implications for the underlying glass physics. Very recently, Bapst et al.54 built graph neural networks that learn, from large amounts of data, to encode the atomic environment via message passing over an expanded neighborhood. These models achieved impressive scores in predicting atomic motion in supercooled liquids and shear-induced events. While such deep learning techniques offer greater versatility and representational power, ML techniques based on physics-oriented descriptors still have their benefits. Interpretability, for example, is important for gaining insight into the underlying physics. In this regard, structural representations such as the interstice distribution features used in this work are fully transparent: each feature has a clear physical meaning, so the structural insights gained can be transferred. Moreover, such structural representations are generally not material- or class-specific; they can be applied in the same way to any glass system, making it easier to judge whether the framework will work outside the training environment.

v. We have conducted a quantitative “between-task” transfer test, successfully transferring the model fitted to pinpoint low-thermal-activation-energy atoms to identifying STZs under AQS shear deformation. This success points to some common structural origins of the thermally activated and stress-activated β processes. It would be interesting to extend such quantitative transfer tests to more glass properties in the future. Although many atom-specific properties have been studied to date, many of them may be intercorrelated; thus, even when an ML model is trained and tested on one task, it may generalize to other tasks and gain a wider range of utility. Quantitative tests on a wider range of properties could also sharpen the general understanding of structure-property correlations in MGs.

Taken together, these advances underscore the structural impact on the β processes and their heterogeneity, and they shed light on the role of β processes as the basic unit event underlying a variety of MG properties10,11,12, including local plastic deformation13,14, diffusion mediated by atomic hopping15, and structural relaxation/rejuvenation16. Our findings, enabled by the well-designed site environment representation and dedicated ML models, are a step forward in establishing a concrete structure-property relationship for MGs. We have made our MG configurations and thermal activation energy data publicly available on figshare (DOI: https://doi.org/10.6084/m9.figshare.12485795); they could serve as a valuable benchmark for future ML studies in MG research.

Methods

MG samples preparation by MD simulation

Molecular dynamics (MD) simulations using LAMMPS55 were employed to prepare and analyze the Cu-Zr metallic glass models, using a set of optimized embedded-atom-method (EAM) potentials56. Cu64Zr36, Cu50Zr50, and Cu80Zr20 samples containing 10,000 or 5,000 atoms (for 5,000-atom samples, two independent samples were prepared under the same processing condition) were quenched to room temperature (300 K) from equilibrium liquids above the corresponding melting points. The quenching was performed at rates of 10⁹–10¹² K s⁻¹, as marked in Fig. 1b, using a Nosé–Hoover thermostat at zero external pressure. Periodic boundary conditions (PBC) were applied in all three directions during the MD simulations57. The timestep was 1 fs.
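A quick estimate of the simulation cost shows why the slowest quench rate is the expensive one. This sketch assumes a linear cooling ramp and a hypothetical starting temperature of 2000 K (the text only says “above the corresponding melting points”); `quench_steps` is an illustrative helper, not part of LAMMPS.

```python
def quench_steps(t_start_k, t_end_k, rate_k_per_s, dt_s=1e-15):
    """Number of MD steps needed to cool linearly from t_start_k to t_end_k."""
    duration_s = (t_start_k - t_end_k) / rate_k_per_s
    return round(duration_s / dt_s)


# The 2000 K starting temperature is hypothetical; 300 K and the 1 fs timestep follow the text.
for rate in (1e9, 1e10, 1e11, 1e12):
    print(f"{rate:.0e} K/s -> {quench_steps(2000.0, 300.0, rate):,} steps")
```

Each thousandfold decrease in cooling rate multiplies the number of timesteps by the same factor, so the 10⁹ K s⁻¹ samples need on the order of 10⁹ steps under these assumptions.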

Activation-relaxation technique (ART)

Initial perturbations in ART were introduced by applying a random displacement to a small group of atoms (an atom and its nearest neighbors)27,28. The magnitude of the displacement was fixed, while the direction was randomly chosen. Once the curvature of the PEL crossed the chosen threshold, the system was pushed towards the saddle point using the Lanczos algorithm. The saddle point was considered found when the overall force on the whole system fell below 0.01 eV Å⁻¹. The activation energy is then the difference between the saddle-point energy and the initial-state energy. The searches were performed using the ART nouveau package27,28,58. For each group of atoms, we performed ~50 successful ART searches with different random perturbation directions.
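To illustrate the Lanczos step, the sketch below estimates the lowest eigenvalue and eigenvector of a Hessian using only Hessian-vector products, which is how ART-type methods can detect and follow a negative-curvature direction without building the full Hessian. It is a generic textbook Lanczos iteration with full reorthogonalization, not the ART nouveau implementation; `lanczos_lowest` and the toy diagonal Hessian are hypothetical.

```python
import numpy as np


def lanczos_lowest(hvp, n, k=20, seed=0):
    """Estimate the lowest eigenpair of a symmetric n x n operator from products hvp(v).

    hvp: function returning H @ v (in ART this could come from finite-difference forces).
    k: maximum number of Lanczos iterations (Krylov subspace size).
    """
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    basis = [q / np.linalg.norm(q)]
    alphas, betas = [], []
    beta, q_prev = 0.0, np.zeros(n)
    m = min(k, n)
    for j in range(m):
        w = hvp(basis[j])
        alpha = float(basis[j] @ w)
        w = w - alpha * basis[j] - beta * q_prev
        for b in basis:  # full reorthogonalization for numerical stability
            w -= (b @ w) * b
        alphas.append(alpha)
        beta = float(np.linalg.norm(w))
        if j == m - 1 or beta < 1e-12:
            break
        betas.append(beta)
        q_prev = basis[j]
        basis.append(w / beta)
    # Eigen-decompose the small tridiagonal matrix T = Q^T H Q (eigh sorts ascending).
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    evals, evecs = np.linalg.eigh(T)
    v = np.column_stack(basis) @ evecs[:, 0]
    return evals[0], v / np.linalg.norm(v)


# Toy Hessian with one negative-curvature direction (the second coordinate).
H = np.diag([3.0, -1.0, 2.0, 5.0])
lam, v = lanczos_lowest(lambda x: H @ x, n=4, k=4)
print(lam, np.round(np.abs(v), 6))
```

In a real search k is much smaller than the number of degrees of freedom, so only an approximation of the lowest eigenpair is obtained; a negative estimate signals that the configuration has left the basin and can be pushed along that direction.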

Radial symmetry functions

For an atom i, the radial symmetry functions are described as17,18,20,22,23,24,39,

$$G_\alpha (i;r) = \mathop {\sum}\limits_{j \in \alpha } {e^{ - (r_{ij} - r)^2/2\sigma ^2}}$$
(3)

where α denotes an atomic species in the system (Cu or Zr), rij is the distance between atoms i and j, r is the probe distance at which the function is evaluated, and σ is set to 0.2 Å. The sum runs over all atoms j of species α within a cutoff distance Rc (6.5 Å) of atom i. This set of features can be considered as Gaussian-smoothed partial pair correlation functions evaluated at different r values. Here, we vary r from 1.0 to 8.0 Å in steps of 0.2 Å (35 bins), generating 35 features each for the i–Cu and i–Zr pairs. We then use these 70 features as input to train ML models on the same data and cross-validation splits to classify the high Eact and low Eact atoms.
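Eq. (3) translates directly into NumPy. This is a minimal sketch assuming an orthorhombic periodic box (minimum-image convention, valid only when every box length exceeds 2Rc) and 35 probe distances placed at 1.0 + 0.2k Å for k = 0…34; the exact placement of the 35 bins is an assumption, and `radial_symmetry_functions` is a hypothetical name.

```python
import numpy as np

R_GRID = 1.0 + 0.2 * np.arange(35)  # 35 probe distances in Angstrom (assumed placement)


def radial_symmetry_functions(positions, species, box, i, r_grid=R_GRID, sigma=0.2, r_cut=6.5):
    """Eq. (3) for atom i: Gaussian-smoothed partial RDFs for Cu and Zr neighbors.

    positions: (N, 3) array; species: (N,) array of "Cu"/"Zr"; box: (3,) orthorhombic lengths.
    The minimum-image convention assumes every box length exceeds 2 * r_cut.
    """
    d = positions - positions[i]
    d -= box * np.round(d / box)  # minimum-image displacement vectors
    r_ij = np.linalg.norm(d, axis=1)
    within = (r_ij > 0.0) & (r_ij < r_cut)  # exclude atom i itself
    features = []
    for alpha in ("Cu", "Zr"):
        r_a = r_ij[within & (species == alpha)]
        g = np.exp(-((r_a[None, :] - r_grid[:, None]) ** 2) / (2.0 * sigma**2)).sum(axis=1)
        features.append(g)
    return np.concatenate(features)  # 35 Cu features followed by 35 Zr features


# Toy configuration: a Cu neighbor at 2.5 A and a Zr neighbor at 3.0 A from atom 0.
pos = np.array([[0.0, 0.0, 0.0], [2.5, 0.0, 0.0], [0.0, 3.0, 0.0]])
spec = np.array(["Cu", "Cu", "Zr"])
G = radial_symmetry_functions(pos, spec, np.array([20.0, 20.0, 20.0]), i=0)
print(G.shape)
```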

Bispectrum coefficients of density functions

The bispectrum coefficients of the neighbor density mapped onto the 3-sphere are order parameters that characterize the radial and angular distribution of an atom's neighbors42. We follow the implementation of the Spectral Neighbor Analysis Potential (SNAP), which uses the bispectrum components as its basis41. The bispectrum coefficients are calculated with the “compute sna/atom” command implemented by Thompson et al. in LAMMPS55. We set twojmax to 6 and rfac0 to 0.99363. The scaling factor of the cutoff radius, rcutfac, the cutoff radii, RCu/RZr, and the neighbor weights, wCu/wZr, were optimized by grid search and set to 4.0, 0.7/0.8, and 1.0/0.9 for predicting the high Eact atoms, and to 4.0, 1.0/1.0, and 1.0/1.0 for predicting the low Eact atoms.
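The settings above map onto a single LAMMPS command of the form `compute ID group-ID sna/atom rcutfac rfac0 twojmax R_1 R_2 ... w_1 w_2 ...`, with one radius and one weight per atom type. The helper below only assembles that string and counts the unique bispectrum components for a given twojmax. It is a sketch: the compute ID, the group, and the mapping of LAMMPS type 1 to Cu and type 2 to Zr are assumptions, and the counting loop is our reconstruction of the usual (j1, j2, j) index convention rather than code taken from LAMMPS.

```python
def sna_atom_command(rcutfac, rfac0, twojmax, radii, weights, compute_id="bisp", group="all"):
    """Build a LAMMPS `compute sna/atom` line; radii and weights are listed per atom type."""
    r = " ".join(f"{x:g}" for x in radii)
    w = " ".join(f"{x:g}" for x in weights)
    return f"compute {compute_id} {group} sna/atom {rcutfac:g} {rfac0:g} {twojmax} {r} {w}"


def n_bispectrum_components(twojmax):
    """Count unique bispectrum components B_{j1,j2,j} (indices given as 2j)."""
    count = 0
    for j1 in range(twojmax + 1):
        for j2 in range(j1 + 1):
            for j in range(j1 - j2, min(twojmax, j1 + j2) + 1, 2):
                if j >= j1:  # skip symmetry-equivalent triples
                    count += 1
    return count


# High-Eact settings, assuming LAMMPS type 1 = Cu and type 2 = Zr.
print(sna_atom_command(4.0, 0.99363, 6, radii=(0.7, 0.8), weights=(1.0, 0.9)))
print(n_bispectrum_components(6))  # features per atom for twojmax = 6
```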

Moment tensor potential (MTP)

The MTP introduces the moment tensor descriptors43,44,

$$M_{\mu ,\nu }({\mathbf{n}}_i) = \mathop {\sum}\limits_j {f_\mu \left( {\left| {r_{ij}} \right|,z_i,z_j} \right)} \underbrace {r_{ij} \otimes \ldots \otimes r_{ij}}_{\nu \,{\mathrm{times}}}$$
(4)

to characterize the radial (fμ) and angular (\(r_{ij} \otimes \ldots \otimes r_{ij}\)) information of the neighborhood \({\mathbf{n}}_i\). The moments are then contracted to a set of basis functions Bα that are invariant to permutations, rotations, and reflections. In practice, all basis functions whose level of multiplication, levBα, is at most levmax are included. The site energies are then expanded as a linear combination of the basis functions. In this work, we set levmax to 20 and the size of the radial basis to 4, giving 288 basis functions. The parameters of the radial functions, the linear regression coefficients, and the weights of the species (Cu and Zr) are fitted by regression of Eact using a modified version of the MLIP package44. The predicted Eact values for the test atoms are then used to derive the ROC curve and AUC-ROC for the present classification tasks (i.e., the TPRs and FPRs are obtained by varying the Eact threshold used to designate the positive/negative classes, and the area under the curve is calculated). The same set of test atoms is used for each CV split, and all remaining atoms are used for training, without undersampling.
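A small NumPy sketch of Eq. (4) makes the construction concrete: each neighbor contributes a radial weight times a ν-fold outer product of its bond vector, and contracting the resulting tensors gives rotation-invariant scalars. Several simplifications are assumptions of this sketch: the radial function (a Chebyshev polynomial times a smooth cutoff) is a hypothetical stand-in for MLIP's radial basis, the species dependence on z_i and z_j is dropped, and the four contractions are simple examples of invariant basis functions, not MTP's full level-based set.

```python
from functools import reduce

import numpy as np
from numpy.polynomial import chebyshev


def radial_f(mu, r, r_cut=5.0):
    """Hypothetical radial function: Chebyshev T_mu of scaled distance times (r_cut - r)^2."""
    x = 2.0 * r / r_cut - 1.0
    coeffs = [0.0] * mu + [1.0]
    return chebyshev.chebval(x, coeffs) * (r_cut - r) ** 2 * (r < r_cut)


def moment_tensor(r_ij, mu, nu, r_cut=5.0):
    """Eq. (4): sum_j f_mu(|r_ij|) * (r_ij outer ... outer r_ij), nu times; a rank-nu array."""
    dist = np.linalg.norm(r_ij, axis=1)
    weights = radial_f(mu, dist, r_cut)
    M = np.zeros((3,) * nu)
    for w, v in zip(weights, r_ij):
        M = M + w * reduce(np.multiply.outer, [v] * nu, np.array(1.0))
    return M


def simple_invariants(r_ij, mu=0):
    """A few rotation-invariant contractions of moment tensors (illustrative basis functions)."""
    M0, M1, M2 = (moment_tensor(r_ij, mu, nu) for nu in (0, 1, 2))
    return np.array([float(M0), M1 @ M1, np.trace(M2), np.sum(M2 * M2)])


rng = np.random.default_rng(1)
r_ij = rng.uniform(-3.0, 3.0, size=(8, 3))  # synthetic bond vectors to 8 neighbors
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
print(simple_invariants(r_ij), simple_invariants(r_ij @ Q.T))  # identical up to round-off
```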

Smooth-overlap of atomic positions (SOAP)

In the SOAP formalism, the neighbor density is expanded in radial basis functions Rn(r) and spherical harmonics Ylm, which serve as the angular basis40:

$$\rho _i({\mathbf r}) = \mathop {\sum}\limits_j {e^{ - |{\mathbf r} - {\mathbf r}_{ij}|^2/2\sigma ^2}} f_c(r_{ij}) = \mathop {\sum}\limits_{nlm} {c_{nlm}^i} R_n(r)Y_{lm}({\hat{\mathbf r}})$$
(5)

SOAP has also achieved notable success in fitting ML potentials. In practical applications, the number of descriptors depends on nmax (the number of radial basis functions) and lmax (the maximum degree of the spherical harmonics), as noted in Table 1. Here we set nmax = 6 and lmax = 8. The cutoff radius for determining the neighbors, Rc, and the standard deviation of the Gaussian expansion, σ, are set to 4.5 Å and 0.5 Å, respectively. The SOAP descriptors are computed using DScribe59.
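How the descriptor count grows with nmax and lmax can be worked out from the symmetry of the SOAP power spectrum \(p^{ZZ'}_{nn'l} = \sum_m c^{Z}_{nlm} (c^{Z'}_{n'lm})^*\), which is unchanged when (Z, n) and (Z', n') are swapped. The sketch below counts the unique components under the assumption that all species pairs are kept; DScribe's options (and the convention used in Table 1) may give a different number, so this is a generic count rather than DScribe's output.

```python
def n_soap_power_spectrum(n_species, n_max, l_max):
    """Unique power-spectrum components p_{nn'l}^{ZZ'} when all species pairs are kept.

    p is symmetric under (Z, n) <-> (Z', n'), so for each l there are K(K+1)/2
    unique entries with K = n_species * n_max.
    """
    k = n_species * n_max
    return k * (k + 1) // 2 * (l_max + 1)


print(n_soap_power_spectrum(2, 6, 8))  # Cu and Zr with n_max = 6, l_max = 8
```

The count grows quadratically with nmax (and with the number of species) but only linearly with lmax, which is why nmax is usually the more expensive parameter to raise.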

Flexibility volume and atomic shear moduli

The flexibility volume \(V_{{\mathrm{flex}},i}\) of atom i is defined as45:

$$V_{{\mathrm{flex}},i} = \left\langle {\left( {x_i(t) - \bar x_i} \right)^2} \right\rangle \times V_i^{1/3}$$
(6)

where \(\bar x_i\) and \(x_i(t)\) are the equilibrium position and the instantaneous position at time t of atom i, respectively, and Vi is the corresponding atomic volume. The mean square displacement was evaluated on short time scales, where it is flat with time and contains the vibrational but not the diffusional contribution. For the calculation, each sample was kept at equilibrium in the microcanonical (NVE) ensemble at room temperature, and the average was taken over 100 independent runs, all starting from the same configuration but with momenta assigned randomly from the appropriate Maxwell–Boltzmann distribution.
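Given positions sampled from the independent NVE runs, Eq. (6) reduces to an average over runs and frames. In this minimal sketch, the equilibrium position is taken as the mean over all runs and frames (reasonable because every run starts from the same configuration), the full 3D displacement is used rather than a single Cartesian component x, and the atomic volumes (e.g. Voronoi volumes) are supplied by the caller; all three choices, and the function name, are assumptions.

```python
import numpy as np


def flexibility_volume(traj, atomic_volume):
    """Eq. (6) per atom.

    traj: (n_runs, n_frames, n_atoms, 3) unwrapped positions sampled on the
          short-time MSD plateau, all runs started from the same configuration.
    atomic_volume: (n_atoms,) atomic volumes V_i.
    """
    mean_pos = traj.mean(axis=(0, 1))  # equilibrium positions, averaged over runs and frames
    msd = ((traj - mean_pos) ** 2).sum(axis=-1).mean(axis=(0, 1))  # 3D mean-square deviation
    return msd * np.cbrt(atomic_volume)


# Toy example: one atom vibrating +/-0.1 A along x in 2 runs of 4 frames, V = 8 A^3.
traj = np.zeros((2, 4, 1, 3))
traj[:, :, 0, 0] = [0.1, -0.1, 0.1, -0.1]
print(flexibility_volume(traj, np.array([8.0])))
```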

Atomic shear moduli at room temperature were evaluated using the fluctuation method. For a canonical (NVT) ensemble, elastic constants can be calculated as the sum of three contributions:

$$C_{ijkl}^T = C_{ijkl}^I + C_{ijkl}^{II} + C_{ijkl}^{III}$$
(7)

where the superscripts I, II, and III represent the fluctuation term, the kinetic contribution, and the Born term, respectively (see ref. 46 for more details). To reduce the statistical error in our simulated samples, the atomic shear modulus, G, is evaluated as the average of the three shear components:

$$G = \frac{{C_{44} + C_{55} + C_{66}}}{3}$$
(8)

The local shear modulus is then computed at the coarse-grained scale by averaging the atomic shear moduli of the center atom and its nearest neighbors.
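Eq. (8) and the coarse-graining step can be sketched as follows, assuming the per-atom elastic constants are already available as 6 × 6 Voigt matrices (from the fluctuation formula of Eq. (7)) and that nearest-neighbor lists come from elsewhere, e.g. a Voronoi analysis. The helper names and the toy numbers are hypothetical.

```python
import numpy as np


def atomic_shear_modulus(c_voigt):
    """Eq. (8): G = (C44 + C55 + C66) / 3 for each atom; c_voigt has shape (n_atoms, 6, 6)."""
    return (c_voigt[:, 3, 3] + c_voigt[:, 4, 4] + c_voigt[:, 5, 5]) / 3.0  # 0-based Voigt 4, 5, 6


def coarse_grain(values, neighbor_lists):
    """Average each atom's value with those of its nearest neighbors."""
    return np.array([values[[i, *nbrs]].mean() for i, nbrs in enumerate(neighbor_lists)])


# Toy system of three atoms in a chain; atom 1 is twice as stiff as the others.
C = np.zeros((3, 6, 6))
C[:, 3, 3], C[:, 4, 4], C[:, 5, 5] = 30.0, 33.0, 36.0
C[1] *= 2.0
G = atomic_shear_modulus(C)
G_cg = coarse_grain(G, [[1], [0, 2], [1]])
print(G, G_cg)
```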

Athermal quasi-static (AQS) simulation

We employ athermal quasi-static (AQS) simulations to model the shear deformation of the glass60. In each deformation step, an affine shear strain of 10⁻⁴ is imposed along the +xy direction, followed by energy minimization using the conjugate-gradient method. The initial configuration is the inherent structure of the equilibrated glass sample. The simulations were conducted using LAMMPS55 with periodic boundary conditions (PBC) applied in all three directions. Plastic events were monitored using the non-affine displacement (\({\mathrm {D}}^2_{\min}\))49, which is obtained by tracking the local deformation around each atom and decomposing it into the best-fit affine part and a non-affine residual.
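The affine/non-affine decomposition behind \({\mathrm{D}}^2_{\min}\) can be sketched with the standard Falk–Langer least-squares construction: fit the local affine transformation J that best maps the reference bond vectors onto the current ones, then sum the squared residuals. This sketch assumes unwrapped coordinates (no periodic-image handling), a caller-supplied neighbor list, and no normalization by the number of neighbors; some codes divide by that number. The function name and the toy geometry are hypothetical.

```python
import numpy as np


def d2min(ref_pos, cur_pos, center, neighbors):
    """Falk-Langer non-affine displacement of atom `center`.

    ref_pos, cur_pos: (N, 3) unwrapped positions in the reference and current states.
    neighbors: indices of the atoms in the center atom's neighborhood (reference state).
    Returns (D2min, J), where J is the best-fit local affine transformation.
    """
    d0 = ref_pos[neighbors] - ref_pos[center]  # reference bond vectors
    d = cur_pos[neighbors] - cur_pos[center]  # current bond vectors
    X = d.T @ d0
    Y = d0.T @ d0
    J = X @ np.linalg.inv(Y)  # minimizes sum_n |d_n - J d0_n|^2
    residual = d - d0 @ J.T
    return float(np.sum(residual**2)), J


# Toy neighborhood sheared affinely to 4% (x += 0.04 y), as after 400 AQS steps of 1e-4.
ref = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
F = np.eye(3)
F[0, 1] = 0.04
cur = ref @ F.T
D2_affine, J = d2min(ref, cur, 0, [1, 2, 3, 4])
cur_na = cur.copy()
cur_na[4] += [0.1, 0.0, 0.0]  # add a non-affine displacement to one neighbor
D2_nonaffine, _ = d2min(ref, cur_na, 0, [1, 2, 3, 4])
print(D2_affine, D2_nonaffine)
```

A purely affine deformation gives \({\mathrm{D}}^2_{\min}\) of essentially zero and recovers the applied shear as J, while any local rearrangement that the best affine fit cannot absorb shows up as a positive residual.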