Predicting the propensity for thermally activated $\beta$ events in metallic glasses via interpretable machine learning

The elementary excitations in metallic glasses (MGs), i.e., $\beta$ processes that involve hopping between nearby sub-basins, underlie many unusual properties of the amorphous alloys. A high-efficacy prediction of the propensity for those activated processes from solely the atomic positions, however, has remained a daunting challenge. Recently, employing well-designed site environment descriptors and machine learning (ML), notable progress has been made in predicting the propensity for stress-activated $\beta$ processes (i.e., shear transformations) from the static structure. However, the complex tensorial stress field and direction-dependent activation would induce non-trivial noises in the data, limiting the accuracy of the structure-property mapping learned. Here, we focus on the thermally activated elementary excitations and generate high-quality data in several Cu-Zr MGs, allowing quantitative mapping of the potential energy landscape. After fingerprinting the atomic environment with short- and medium-range interstice distribution, ML can identify the atoms with strong resistance or high compliance to thermal activation, at an unprecedented accuracy over ML models for stress-driven activation events. Interestingly, a quantitative"between-task"transferring test reveals that our learnt model can also generalize to predict the propensity of shear transformation. Our dataset is potentially useful for benchmarking future ML models on structure-property relationships in MGs.


INTRODUCTION
Metallic glasses (MGs), as a unique class of amorphous materials, exhibit a high atomic packing density with pronounced topological and chemical short-to-medium range order [1][2][3][4] . The complex local structures have been demonstrated to have a profound influence on the properties of MGs 5 . In essence, many properties of MGs can be depicted in terms of excursions in the potential energy landscape (PEL) [6][7][8] , which is a multidimensional configurational space with corresponding local energy minima separated by barriers. In the PEL picture, elementary excitations upon external stimuli (e.g., thermal or mechanical) are associated with the β processes, which correspond to the hopping between nearby local minima, i.e., sub-basins inside a deep PEL megabasin 9 . Elementary excitations have been correlated with many properties [10][11][12] , including local plastic deformation 13,14 , diffusion mediated by atomic hopping 15 , as well as structural relaxation (local energy minimization in the direction towards the bottom of the basin) or rejuvenation (to a higher-energy local minimum) 16 .
It remains as a long-standing challenge to unravel the role of static structure in controlling the elementary excitations in MGs: is there a structural indicator that can be tapped into to predict how resistant or compliant different local regions are to externally stimulated activation?
Over the past several decades, many efforts have been devoted to addressing this critical question.
Recently, the emerging machine learning (ML) technique, based on well-crafted representations of atomic environment, has been proven to be promising for establishing atomic-level structureproperty relationships in liquids and glasses [17][18][19][20][21][22][23][24] . For example, Schoenholz et al. 17 studied L-J model liquids and utilized ML to derive a structural parameter called "softness", which was found to correlate well with the particle's propensity for hopping, reflecting its susceptibility to β relaxation of liquids 10 . Below the glass transition temperature, however, metallic liquids become frozen into glass solids and the timescale of the glass dynamics becomes very long, well beyond the capability of atomistic (e.g. molecular dynamics) simulations. We therefore have to resort to local perturbation methods, to activate the local group of atoms into excited states by stress or thermal stimulus, as a probe into the susceptibility to elementary excitations. Several recent ML studies have focused on quantitatively gauging how local environment influences the propensity for stress-activated β processes (i.e., shear transformations) in MGs [18][19][20] . For example, pioneering works of Cubuk et al. 18 performed ML on disordered materials such as L-J glasses and granular systems and showed that radial and bond-angle distribution information can be used to identify atoms with high propensity to shear transformation. Wang et al. 19 developed interstice distribution as a new local structural representation for MGs, which is proven to be robust in predicting plastic sites of several MGs and has advantages in generalizing between compositions even chemical systems. However, the accuracy achieved in these attempts is not yet sufficiently high, and the reported scoring metric, e.g. recall or area under receiver operating characteristic curve (AUC-ROC), is typically below 80%. One reason for this is that the elementary excitations upon shear transformations are complicated by the non-uniform tensorial stress field in the solid under deformation, as well as the dependence of activation on loading conditions (e.g., loading mode and direction) 25,26 . If not properly dealt with, these would introduce non-trivial noises in the accrued data and influence adversely the quality of the learnt structure-property relations.
This problem, however, subsides when dealing with the thermally induced elementary excitations in MGs. For instance, here we use activation-relaxation technique (ART) 27,28 to probe the propensity for thermal activation of each atom in MGs (see schematic description in Fig. 1a, will be discussed later). These activated processes are not subject to internal non-uniform stresses, and can be well converged by averaging over a considerable number of activation pathways, significantly reducing data noises. Meanwhile, the Gaussian-like distribution of thermal activation energetics (to be shown later) can well identify atoms at both the hard and soft ends, corresponding to locally favored and unfavored motifs, respectively. This avoids the problem associated with common stress activation indicator (e.g., non-affine displacement or von Mises strain), which often exhibits a skewed "long-tail" distribution 29 and the resolution at the hard end is much lower than that on the soft side. Moreover, thermally activated events are comparable in their energetics, at least for some MGs that are based on some common (or similar) elements, even when they are of different composition or processing history; as such these multiple "datasets" can be combined to facilitate the ML identification of the structural underpinning in more general terms.
In this work, we develop ML models to predict the propensity of thermally activated elementary excitation, from the atomic environment of the static MG structure. We systematically probe the activation energies in six MGs, including Cu64Zr36 prepared using different quenching rates, as well as Cu50Zr50 and Cu80Zr20, using ART 27,28 . The activation energy around each atom is calculated, and ensemble-averaged over 50 activation trials, to indicate its susceptibility to excitation. We then combine the data from the six MGs into a wider activation energy spectrum ( Fig. 1b) and use ML to identify those atoms with strong resistance or high compliance to activation. By fingerprinting the atomic site environment with a recently proposed interstice distribution representation 19 , we find that ML can reliably identify atoms with the highest 5% and lowest 5% activation energy, reaching an area under receiver operating characteristic curve (AUC-ROC) of 0.942 and 0.888, respectively. Such accuracies are considerably better than that in previous ML prediction of the propensity for stress-driven shear transformations 18,19 . We also identify descriptors that are critical to ML decision, and interestingly, most of them turn out to be medium-range order features. Finally, we conduct quantitative "between-task" transferring tests and show that our learnt model can be used to predict the propensity for shear transformation as well. This ML work highlights the predictive power of local static structure to quantitatively connect with β processes in MGs.

Energy barriers for thermally activated β processes
We employ molecular dynamics (MD) simulation to prepare six Cu-Zr model MG samples: i) different compositions yet with the same cooling rate (Cu50Zr50, Cu64Zr36, Cu80Zr20 quenched from liquid at 10 10 K/s), and ii) same composition but with different cooling rates (Cu64Zr36 MGs with the quenching rates of 10 9 to 10 12 K/s) (see Methods for simulation details). We then apply ART to probe the energy barrier for thermally-activated events 27,28 . Around each atom in those MGs, we initiate 50 independent activation events along random activation pathways (illustrated by the dashed red lines in Fig. 1a, see Methods for more details). The ensemble-averaged activation energy, Eact, can then be defined as the average energy difference between the saddle point and the initial state, The average value of 50 independent activations around each atom are sufficient to achieve a converged Eact, which contains key statistical information for thermal excitations on each local region (including center atom and its neighbors). Figure 1b shows the distribution of Eact in the six MGs. The dashed vertical line denotes the percentile 50% (median) of Eact in each MG. The wide spread of Eact signifies a large degree of structural and property heterogeneity in each glass. As mentioned in Introduction, the Gaussianlike distribution of Eact observed is very different from that for stress-activated event, where a "long-tail" distribution is often observed in the stress activation indicator (e.g., non-affine displacement or von Mises strain) 29 . The Eact spectrum clearly depends on the MG composition or quenching rate. Next, we merge the Eact data of the six MGs into a more comprehensive Eact spectrum (Fig. 1b). The combined spectrum markedly increases the variety of local environments surveyed, far beyond what is present in a single MG. Later, we will feed this combined dataset to ML and test if ML is capable of mapping out the characteristic atoms at both the high Eact (hard) and low Eact (soft) ends of these Cu-Zr MGs. (a) Schematic description of the β-process in the context of potential energy landscape (PEL). Red dashes illustrate several activated pathways from a local minimum. In practice, we initiate 50 independent events around each atom along random activation pathways using ART and extract an ensemble-averaged activation energy Eact for the atom. (b) Distribution of Eact in the six model glasses as well as their combined Eact spectrum. The median (quantile 50%) and quantiles 5% and 95% are marked as vertical dashed lines in the combined Eact spectrum, and the median (quantile 50%) is marked in the spectrum of each model glass.

Connecting activation barriers with local atomic environment
We make use of a set of interstice distribution descriptors to represent the local atomic environment 19 . The basic fingerprinting procedure is to extract groups of bonds, facets and tetrahedra from the coordination polyhedron of an atom, and then featurize the distribution of interstitial spaces present in these bond, facet and tetrahedron groups. A simple treatment of representing a distribution is to derive typical statistics (such as minimum, mean, maximum and standard-deviation) of the interstitial spaces present. The characterization of bond, facet and tetrahedron interstices can include 2-body, 3-body and 4-fold correlations, respectively, in the nearest-neighbor short-range-order (SRO) signatures. The SRO signatures will be further "coarsegrained" to derive statistics among their neighbors. Such "coarse-grained" signatures are a representation of medium-range-order (MRO), with a length scale of ~4 -6 Å, which is the nextlevel structural organization beyond the SRO. Upon implementation, the interstice representation contains 80 descriptors, 16  After featurizing all atoms in the six MGs, we feed the data to a scalable tree boosting ML algorithm, XGBOOST 31 . XGBOOST implements a parallel tree boosting algorithm that is proven to be very efficient and robust in various cases. We train two sets of XGBOOST classifiers to identify the highest 5% and the lowest 5% Eact atoms, respectively, in the combined dataset merged from six MGs (Fig. 1b). We have tested that varying the threshold from 3% to 10% gives similar results, and in general, the smaller the fraction, the better the ML score (i.e., the easier for ML to identify).
As we are dealing with an imbalanced dataset, we do random equal undersampling three times to create three data samples, each with 3,000 positive class atoms (the highest or lowest Eact atoms) and 3,000 negative class atoms. We then perform 5-fold cross-validation on each of the data samples, and average the predictions on the test sets (i.e. averaged over 5 × 3 = 15 test sets).
The repeated undersampling procedure is very useful for reducing the variance introduced by data undersampling.  We proceed to look more into the distributions of the ML-evaluated class probability estimates, that is, ph from H-Eact and pl from L-Eact. Figure 3a and 3b present the overall ph and pl distributions in the six MGs as well as the variation of Eact with ph and pl. A wide distribution of ph and pl is observed, suggesting a large degree of heterogeneity inside the MGs. ph has a larger proportion of atoms close to 0 and 1, again indicating that ML is more confident at distinguishing the high Eact atoms. A strong dependence of Eact on ph and pl is observed, that is, positively correlated with ph and negatively correlated with pl, demonstrating the feasibility of ph and pl serving as indicators of the thermal activation propensity (Fig. 3b). We further visualize the distribution of Eact, ph and pl in a model Cu64Zr36 glass to allow atomic-scale scrutinization (Fig. 3c).
For simplicity, only atoms with ph or pl > 0.50 are highlighted in the ph and pl maps: ML predicts that the probability of these atoms having highest 5% Eact or lowest 5% Eact is greater than 0.50; If setting a class threshold as 0.50, these ph or pl > 0.50 atoms would be classified as the high or low Eact class, respectively. A good correspondence can be seen between the high Eact atoms and high ph atoms, as well as between the low Eact atoms and high pl atoms. As reflected by the relatively lower prediction score in the L-Eact task, there are more false positive atoms (high pl yet high Eact) and false negative atoms (low pl yet low Eact) in predicting the low Eact atoms, but still the prediction quality is sufficiently good. These results reveal that a solid relationship between local structure and thermal activation propensity can be established by combining interstice features and ML.

Comparison with ML models employing other feature representations
Next, we compare our ML results based on interstice features with those fitted from several other representations. Here we consider a total of five pure structural representations and three physical signatures for comparison (Table 1). To guarantee a fair comparison, training is performed on the same data samples and same cross-validation splits. We train XGBOOST 31 and SVM 34 models with various hyperparameters and extract the best scores for each representation.
Most of the presented scores are from XGBOOST, while the best scores of the radial symmetry functions are from SVM (   Fig. 1b). As to the second baseline model trained from the Voronoi indices, the prediction is better, with AUC-ROC of 0.750 and 0.628, respectively ( Table 1, the ROC curves are presented in Supplementary Fig. 2). We see that by between nearest neighbors 19 , as applied in the interstice representation), the predictive ability is greatly enhanced ( Table 1, see Supplementary Fig. 4 for ROC curves). This suggests that it is quite important to include MRO for predicting the thermal activation propensity (the importance of MRO will be discussed in more detail later).
For an atom i, the radial symmetry functions are described as, where α represents an atom specie in the system (Cu or Zr). rij is the distance between atoms i and j. r is a variable constant and σ is set as 0.2 Å. The sums are taken over all atom j whose distance to i is within a cutoff R c (6.5 Å). This set of features can be considered as the Gaussian-  Table 1 and ROC curves are shown in Supplementary Fig. 5. We see that this representation can well predict the high Eact atoms (0.909), yet the score in predicting low Eact atoms is relatively lower (0.774). In previous studies, Schoenholz et al. 17 used this radial symmetry function representation to classify atoms with high propensity for hopping (soft end) in L-J liquids and achieved a very high recall of ~90%.
The relatively lower accuracy in the current L-Eact task (also corresponds to soft end) suggests that identifying atoms susceptible to β relaxation in the solid-state MGs could be harder than that for the parent supercooled liquids, as manifested by that the same set of features achieve a lower score in the former problem. Other possible factors are i) the natural prediction accuracy difference between Cu-Zr MGs described by EAM potential and supercooled liquids described by Eact. We find that the pure structural representations (interstice, SRO + coarse-grained MRO, and radial symmetry functions) are still very competitive compared with these physical signatures (Table 1), further advocating the use of proper structural representation, with the aid of ML, to establish the structure-property relationship in MGs. The interstice distribution features achieve the highest prediction score in both the H-Eact and L-Eact tasks. Such quantitative benchmarks are important for obtaining a clear picture of the structure-property relations proposed in MGs. We also note that, strictly speaking, the relative performance of each representation is specific to a task. Thus, for a future task of interest, we recommend to conduct some rigorous benchmarking tests to locate the best representation for maximal ML performance.

Impact of medium-range environment on activated events
Thus far, we demonstrate that our ML model, employing the interstice features that start This is based on that basically, dropping unimportant features should not degrade performance significantly. We recursively repeat the above procedure until the feature dimension is reduced to 10.  in the neighboring clusters (MRO), respectively ( Fig. 5a and 5b). We find that the Vis and dis-dist distributions in the SRO of the high Eact atoms (Fig. 5a) are distinctly more centered than that in the low Eact ones (Fig. 5b). For the low Eact atoms, there often exist some tetrahedra or bond segments that have very low or high content of interstice. This would lower the stability of local environment and propel the atom to respond to thermal excitation. Remarkably, this trend persists to the medium-range (purple histograms). As quantified by Fig. 4b, the MRO interstice distribution is even more important than the SRO ones. The sharp contrast in the interstice distribution illustrates the foundation of our ML success in distinguishing the characteristic atoms.
Next, we use principal component analysis (PCA) 43 to project the information in the highdimensional feature space (R 10 , ten features in Fig. 4) into a low-dimensional space (R 2 ) to visualize the inherent data structure of the site environment features (Fig. 5c). PCA is a dimensionality reduction method that uses orthogonal transformation to reduce possibly correlated features to uncorrelated variables with key information preserved, and is totally unsupervised (with no use of class labels and does not need training) 43 . From Fig. 5c, we see that the high Eact and low Eact atoms do tend to reside in very different regions. Back to the above supervised ML results, strong structural contrast in both the hard and soft ends is also revealed ( Figs. 2 and 3). Here, both supervised and unsupervised analyses suggest a highly inhomogeneous MG structure, with distinctive hard (or say solid-like) and soft (liquid-like) atoms dissolved inside.

Transferability to identifying shear transformation propensity
As mentioned earlier, in addition to thermally activated events, another important type of elementary excitation is the local shear transformation activated by stress [44][45][46][47][48][49] . The low-stressresistance units are usually referred to as shear transformation zones (STZs). As discussed in the Introduction, the thermal-and stress-activated excitations can both be interpreted in the framework of β processes, however, the atomic-specific response can vary, due to the different characteristics of stimulus source (uniform vs non-uniform, protocol-independent vs dependent).
This prompts us to ask: how would our ML models trained for predicting the thermal excitation propensity perform, when they are used to identify STZs? Is it possible for the models to work well when transferring to such a different task?
This "between-task" test is challenging in several ways: i) STZs and Eact are basically different properties, stimulated by different stimuli and thus yielding different data; ii) the features considered important for predicting Eact may not be optimal for identifying STZs. The ii) point is very likely, as in a previous work using the interstice features to identify STZs in MGs, only ~50% of the most important features were MRO features 19 , much lower than the ~90% in the Eact case ( Fig. 4). Driven by this question, we simulate athermal quasi-static (AQS) shear deformation of a typical Cu64Zr36 -10 9 K/s glass (Methods). We calculate the interstice features of each atom and apply the model trained from the L-Eact problem (which focuses on the soft end) to derive the probability estimate pl of each atom. Intuitively, as pl is in positive correlation with the tendency of an atom to be easily activated by the thermal stimulus ( Fig. 2 and Fig. 3), it may positively correlate with the susceptibility of atom to be activated by stress as well.
We calculate the non-affine displacement (D 2 min) relative to undeformed state, at 4.0% shear strain, as an indicator of the plastic susceptibility of each atom. The correlation between D 2 min and pl is presented in Fig. 6a. Given the "long-tail" distribution of D 2 min, box plots are used to present the correlation. Box plots are useful in such skewed distributions, with the median (a line in the interior of box), 25% and 75% quantile (lower and upper ends of box), 1.5 times the interquartile range (whiskers extending outside box) as well as outliers (points outside the whiskers) cleared marked. The left figure in Fig. 6a shows the complete box plot, and some outliers extend so widely that the box section is squeezed. We then highlight the squeezed section, which constitutes the vast majority of data, in the right figure of Fig. 6a. A positive correlation between pl and D 2 min is clearly observed, evidencing our assumption that these two types of activations could have some similar structural origins. As another quantitative test, we use pl to try classifying STZs with the largest 5% D 2 min from the rest of the glass, similar to the setting of the L-Eact task.
We vary the threshold of pl in designating the positive/negative classes in this STZ task, calculate the TPRs and FPRs and derive the ROC curve in Fig. 6b. The area under the ROC curve, AUC-ROC, is 0.810, which is a very reasonable score for such a transferring test. This quantitative test provides additional support to the feasibility of this "between-task" generalization. As discussed in the Introduction, the accuracy of STZ recognition (for example, Ref. 18 and 19) is usually lower than that of identifying the thermally activated atoms (Ref. 17 and this work), especially when using the same feature representation (namely radial symmetry functions 17,18 or interstice distribution features 19 ). There are several factors that can cause this performance difference. One is the increased internal data noise of the STZ data, if the data is collected from a single loading condition. As discussed in the Introduction, stress-activated plastic heterogeneity is quite sensitive to the loading conditions such as loading mode and direction 25,26 ; thus, if using data from a single loading condition, non-trivial noise would be introduced in the collected data.
For the thermal activation data, as used in this work, the absence of non-uniform stress eliminates the loading-related noises, and probing sufficient elementary ART events can guarantee a wellconverged Eact to indicate the susceptibility to thermal excitation. In addition, upon deformation, the activation of STZ proceeds in a progressive way, that is, not all soft atoms will move in a strain step; therefore, it usually requires a relatively large strain to collect sufficient plastic events.
However, this can introduce more cascade activation events to reduce the controllability of the initial undeformed structure, and the existence of long-range elastic field in the process of deformation would also increase the length scale of plastic heterogeneity, making it beyond the scope of SRO and MRO that can be described by the structural representation.

DISCUSSION
For the ML exploration of atomic-level structure-property relationships in amorphous alloys, a goal of common pursuit is developing novel structural representation and machine learning scheme. This paper, instead, focuses on another important aspect -finding of a suitable target property, with minimized data noises, to convincingly test the power of ML in correlating the structure with the property. Through what has been presented above, we have demonstrated that the thermally activated elementary excitation is an excellent choice in this regard. Compared with previous ML models on shear transformations in glasses, the merits of our present success on thermally activated events in MGs are multifold: i) We reached an unprecedentedly high accuracy for ML prediction of elementary excitation in MGs. In this work, ML can accurately identify atoms with the highest 5% and lowest 5% thermal activation energy in a dataset merged from six different MGs, reaching an AUC-ROC of 0.942 and 0.888, respectively. These scores are significantly higher than that achieved in predicting the propensity of shear transformation. As discussed, this is mainly because the thermal activation does not suffer from the effect of non-uniform, oriented stress 25,26 , and can reduce the data noises by well-converged exploration of elementary excitations. The importance of noise reduction also has implications for constructing high-fidelity glass datasets in the future.
ii) Our ML model is able to link with both local favored and unfavored structural motifs, rather than only identifying the latter as in previous ML literature [17][18][19][20][21] . This is aided by the explicit and sufficient ART perturbation tests around each atom, and the Gaussian-like distribution of thermal activation energetics that gives sufficient resolution to both the soft and hard ends.
iii) We have demonstrated that the data from multiple compositions or processing histories can be combined to connect with underlying structural signatures. This results from the comparable magnitude/range of activation barriers, for different compositions and processing histories in the same MG system. Such treatment can notably increase the variety of local environments surveyed, and allows for structure-property relation mining in more general terms. iv) Our analysis provides a repertoire of descriptors that are essential to the ML decision. We demonstrate how the ML models make decisions based on the interstice features and interpret why these features work in representing the inherent structural contrast in MGs. Our data-centric results also highlight the importance of MRO features in determining the activation heterogeneity that has implications on the underlying glass physics. v) We have conducted a quantitative "between-task" transferring test that successfully transfers the model fitted for pinpointing the low thermal activation energy atoms to identifying STZs upon AQS shear deformation. This success points to some common structural origins of the thermal-activated and stress-activated β processes.
Taken together, these advances underscore the structural impact on the β processes and their heterogeneity, and the insights shed light on the role of β processes as a basic unit event underlying a variety of properties of MGs 10-12 , including local plastic deformation 13,14 , atomic hoping mediating diffusion 15 , and structural relaxation/rejuvenation 16

MG samples preparation by MD simulation
Molecular dynamics (MD) simulations using LAMMPS 50 have been employed to prepare and analyze the Cu-Zr metallic glass models, using a set of optimized embedded-atom-method (EAM) potentials 51 . Cu64Zr36, Cu50Zr50 and Cu80Zr20 samples containing 10,000 or 5,000 atoms (if 5,000, we will prepare two different samples at the same processing condition) were quenched to room temperature (300 K) from equilibrium liquids above the corresponding melting points. The quenching was performed at a rate of 10 9 -10 12 K/s, as marked in Fig. 1b, using a Nose-Hoover thermostat with zero external pressure. Periodic boundary conditions (PBC) were applied in all three directions during MD simulation 52 . The timestep is 1 fs.

Activation-relaxation technique (ART)
Initial perturbations in ART were introduced by applying random displacement on a small group of atoms (an atom and its nearest-neighbors) 27,28 . The magnitude of the displacement was fixed, while the direction was randomly chosen. When the curvature of the PEL was found to overcome the chosen threshold, the system was pushed towards the saddle point using the Lanczos algorithm. The saddle point is considered to be found when the overall force of the total system is below 0.01 eV/Å. The corresponding activation energy is thus the difference between the saddle point energy and the initial state energy. The search is performed using ART nouveau package 27,28,53 . For each group of atoms, we employed ~50 successful ART searches with different random perturbation directions.

Flexibility volume and atomic shear moduli
The flexibility volume of atom i is defined as 41 : ( where and are the equilibrium position and instantaneous position at time t of the atom i, and Vi is the corresponding atomic volume. The calculation was obtained on short time scales when the mean square displacement is flat with time and contains the vibrational but not the diffusional contribution. Each sample was kept at equilibrium under a microcanonical ensemble (NVE) at room temperature for the calculation, which was taken over 100 independent runs, all starting from the same configuration but with momenta assigned randomly from the appropriate Maxwell-Boltzmann distribution.
Atomic shear moduli at room temperature were evaluated using the fluctuation method. For a canonical (NVT) ensemble, elastic constants can be calculated as the sum of three contributions: where the superscript I, II and III represents the fluctuation, kinetic contribution and the Born term, respectively (see Ref. 42 for more details). To reduce the statistical error in our simulated samples, the average atomic shear modulus (G) was evaluated as The local elastic moduli tensor is computed at the coarse-grained scale using the average atomic shear moduli of center atom and its nearest neighbors.

Athermal quasi-static (AQS) simulation
We employ the athermal quasi-static (AQS) mode to simulate the shear deformation of the glass 54 .
On each deformation step, an affine strain of 10 -4 is imposed along the +xy direction, followed by an energy minimization using the conjugate-gradient method. Initial configuration is the inherent structure of the equilibrated glass sample. The simulations were conducted using LAMMPS 50 and periodic boundary conditions (PBC) were applied in all three directions. The plastic events were monitored using the non-affine displacement (D 2 min) 45 . This is done by tracking the atomic strain of each atom during deformation, and dissociating the strain into the best affine fit and the nonaffine residue.   Table 1 and related content for details). (a) Classifying the highest 5% Eact atoms of the combined dataset merged from six MGs (Fig. 1b); (b) Classifying the lowest 5% Eact atoms of the combined dataset.