Introduction

The mechanical response of a crystalline metal to applied stresses can be quantitatively explained by monitoring the evolution of dislocations1,2 in the lattice. In stark contrast, amorphous metals do not have such well-defined structural defects, owing to the absence of long-range atomic packing order. As a result, in the static structure of a glass, it remains a grand challenge to identify a priori local “defects” that are vulnerable to rearrangements, even when the positions of all atoms are fully known3,4,5,6,7. Over the years, a number of physical-property-informed parameters have been used to serve as indicators8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31 to forecast fertile sites for shear transformations. Also, several macroscopic and experimental parameters28,29,30,31,32 have been shown to correlate with the properties/processes in glasses. Data-driven models33,34,35,36 have been put forward as well for the same goal. However, all these endeavors have only achieved moderate success: the correlations with the plastic susceptibility of particles/atoms, i.e., the propensity of each particle to experience plastic rearrangement, have not been sufficiently strong to establish structure–property relations in glassy solids.

A critical reason for this status quo is that the scalar or rotation-invariant quantities employed so far are inherently inadequate in capturing the anisotropic response of a local configuration. A given local environment of an atom (particle) can respond quite differently when the externally applied global force is imposed along different orientations37,38,39,40. Recently, additional indicators have been invoked to take into account anisotropy in predicting the propensity for plastic activity in two-dimensional (2D) model glasses38,39,40. But it remains unclear how well these new indicators fare in three-dimensional (3D) amorphous solids. Furthermore, these indicators are physics-informed quantities that need to be evaluated with known interparticle interactions (potentials), every time the structure changes. In other words, one cannot simply monitor the spatial atomic positions alone, to explain property changes. In terms of physical insight, it is difficult to use these parameters to directly identify the critical structure features (i.e., the atomic packing environment per se) responsible for the anisotropic local mechanical response. This has been a puzzle existing for at least more than two decades since Falk and Langer pointed it out in 199837.

Recently, deep convolutional neural networks (CNNs) have enabled revolutionary breakthroughs in the field of computer vision and pattern recognition41,42,43,44. Inspired by such successes, here we explore the potential of this deep learning approach via converting local configurations into images and constructing large-scale training datasets (up to ~20 million images) from extensive atomic simulations. More importantly, we also devise a rotationally non-invariant structure representation, to capture the anisotropic responses of a local configuration and the intrinsic sensitivity to the loading orientation. We will demonstrate that this combination achieves unprecedented accuracy when it is used to separate out particles with high plastic susceptibility, unveiling the power of local static structure for predicting orientation-dependent plastic events in both 2D and 3D glasses. In terms of new physical insight, by identifying a clear difference in the new atomic structure representation for contrasting particles, i.e., those with high versus low plastic susceptibility, we have contributed, from the structural perspective, to fully resolving the puzzle as to why the same local configuration could respond very differently to external loading applied in different orientations.

Results

Atomic structure representation and model architecture

We have developed CNN models in both a 2D binary Lennard-Jones (L-J) glass system45 and a 3D binary embedded-atom method (EAM)-based Cu–Zr metallic glass system46. To represent the local packing environment of each particle i, we use a new multidimensional structural function, termed a Gaussian-weighted spatial density map (SDM). For the 2D model, the SDM of each particle i is defined as

$${{\Xi }}_i\left( {y,x,\beta } \right) = \mathop {\sum }\nolimits {\mathrm{exp}}\left( { - \frac{{\left( {r_{ij,x} - x} \right)^2 + \left( {r_{ij,y} - y} \right)^2}}{{2\Delta ^2}}} \right)$$
(1)

where the summation is performed over all particles satisfying these conditions: \({\mathrm{species}}\left( j \right) \in \beta ;| {r_{ij,x}} | < r_c\) and \(| {r_{ij,y}} | < r_c\), \(r_{ij,x}\) and \(r_{ij,y}\) are the components along x and y dimensions, respectively, of \({\mathbf{r}}_{i,j}\), which is the vector connecting the central particle i with particle j of species \(\beta\) within a cutoff \(r_c\) (particle j may be the central particle when the species of the central particle is that under consideration in the channel). Here, \(x,y \in \left[ { - r_c + 0.5\Delta ,r_c - 0.5\Delta } \right]\) with a constant increment of \(\Delta\) and \(\beta \in \left\{ {0,1} \right\}\) where 0 represents small (S) particles, and 1 large (L) ones. We set \(r_c = 5\sigma\) and \(\Delta = 0.2\sigma\) after trials using many different values (see Supplementary Fig. 1a and Supplementary Notes 1 and 2, and \(\sigma\) is the characteristic length scale parameter in the potential, see details in “Methods”).

Similarly, for 3D glass systems, the SDM of each atom i is defined as

$${{\Xi }}_i\left( {z,y,x,\beta } \right) = \mathop {\sum }\nolimits {\mathrm{exp}}\left( { - \frac{{\left( {r_{ij,x} - x} \right)^2 + \left( {r_{ij,y} - y} \right)^2 + \left( {r_{ij,z} - z} \right)^2}}{{2\Delta ^2}}} \right)$$
(2)

here the summation is performed over all particles satisfying these conditions: \({\mathrm{species}}\left( j \right) \in \beta ;| {r_{ij,x}} | < r_c;| {r_{ij,y}} | < r_c\) and \(| {r_{ij,z}} | < r_c\), and \(\beta\) = 0 or 1, to represent Zr and Cu atoms, respectively. Using an SDM to represent a local configuration can be viewed as projecting the local configuration to 2D (or 3D) grids for different species, which results in a multidimensional numerical array, equivalent to a 2D (or 3D) image containing \(\left( {2r_c/\Delta } \right)^2\) or \(\left( {2r_c/\Delta } \right)^3\) pixels. And each pixel has channels equal to the number of components in the sample. These images can be used directly as input into the CNN model. For an adequately large \(r_c\), a sufficient number of surrounding particles would be included. And, when \(\Delta\) is small enough, any tiny variation of particle positions in the local configuration would lead to a corresponding variation in SDM. Therefore, the SDM can faithfully and accurately represent the topological structure feature of a local configuration. Obviously, when a larger \(r_c\) is used, an image would contain structural information across a larger range, and a smaller \(\Delta\) means a higher resolution of resultant images; the difference between two very similar local configurations would be more pronounced in the images. Although a larger \(r_c\) and a smaller \(\Delta\) are good for more faithful and accurate representation of the topological structure feature and elevated predictive performance, these also mean larger size of each image, requiring more computational resource and memory to generate, store, and use these images. In this work, we will choose optimized values of both \(r_c\) and \(\Delta\) for different glass systems after trying many different values. See more discussion about the optimization of both \(r_c\) and \(\Delta\) in Supplementary Notes 1 and 2. In addition, the contribution to the SDM from different species are separated into different channels, taking into account the influence arising from chemical affinity. Figure 1a shows a typical 2D model glass, with a close-up view (a local configuration) in Fig. 1b. The SDM of S and L particles corresponding to Fig. 1b are shown in Fig. 1c, d, respectively, where each image contains 50 × 50 pixels. The pixels with high intensity (darker color) correspond to the locations of particles. Obviously, the SDM will be different when the local configuration is rotated. Before computing SDMs, we will use different coordinate systems or rotate configurations for different loading orientations such that the shear direction and the normal direction of the shear plane are parallel to X and Y axes, respectively, of the chosen coordinate system (several specific examples are given in Supplementary Fig. 2 for choosing appropriate coordinate systems for different loading protocols). Therefore, the SDM is rotationally non-invariant, which is the key feature we need for predicting orientation-dependent plastic events in amorphous solids. As will be shown later, the rotation-dependent SDM is very sensitive to small changes in sample-loading alignment. The SDM can also be used to predict rotation-invariant properties of crystalline materials via data augmentation41, i.e., arbitrarily rotating images each time they are fed into the network. The SDM is a substantial extension of the radial Gaussian function pioneered by Behler and Parrinello previously47, to multidimensional space. Our extension is important in the following aspects. First, it ensures a complete structure representation. Second, it enables rotationally non-invariant prediction. Third, it results in images that can be interpreted by CNN which promises state-of-the-art learning capability.

Fig. 1: Atomic structure representation and model architecture for 2D glasses.
figure 1

a Snapshot of a 2D model glass. Orange and silver circles represent small (S) and large (L) particles, respectively. b A close-up showing a local configuration around the red particle in a. c, d Corresponding spatial density map (SDM) of the S and L particles, respectively. e Architecture of the convolutional neural network (CNN) model used for 2D glasses, which contains 25 (5 in each blue box) 2D convolutional (conv.) layers. A 2D max-pooling layer is periodically inserted in between the successive convolutional layers. The last convolutional layer is followed by the output layer which is a sigmoid neuron. The corresponding architecture of the CNN model for our 3D model glasses is shown in Supplementary Fig. 3.

We then applied the CNN method44 to each input image to predict the propensity of each particle to experience plastic rearrangement, at different global shear strains when the sample is loaded along a specific orientation. See more details on the loading protocol in “Methods.” Note that the CNN model is taking one image per atom (each image represents the local atomic environment centered on the atom of interest), rather than taking an image of the entire simulation box. Figure 1e illustrates the architecture of the CNN model for the 2D glasses. Specifically, each CNN model contained 25 convolutional layers, where filters with a small receptive field of 3 × 3 were used. Zero padding was involved in all but not in the fifth and last convolutional layers. Batch normalization48 was adopted right after each convolution and before activation with the rectification (ReLU) nonlinearity41. A max-pooling layer is periodically inserted in between the 25 successive convolutional layers. Max-pooling is performed over a 2 × 2 pixel window, with a stride of 2. The number of filters is 20 in the first five convolutional layers and then doubled after each max-pooling layer. And the last convolutional layer is directly followed by the output layer, which is a sigmoid neuron as we are performing binary classification tasks. In the classification tasks, each input image has a true positive label (yi = 1) when the non-affine squared displacements (D2min)37 of the atom i represented by the image is greater than a specific threshold after straining to a given degree under a specific loading condition, and a negative label otherwise (yi = 0). More details about the D2min threshold and the loading protocol will be presented later.

For 3D glasses, rc = 9.6 Å and Δ = 0.6 Å were found to produce optimized results (see Supplementary Fig. 1b, Supplementary Notes 1 and 2), and the resultant input images contain 32 \(\times\) 32 \(\times\) 32 pixels. The corresponding CNN model contains 20 convolutional layers and three max-pooling layers, and both convolution and max-pooling were performed in 3D space, as illustrated in Supplementary Fig. 3. The number of filters in each 3D convolutional layer was 60. To accelerate training and avoid shortage of memory, we did not double the filter number after max-pooling. This is not expected to influence significantly the training results, based on our experience with the 2D glasses for which doubling the filter number increased the validation accuracy by less than 1%. For tutorials about deep learning, the readers are referred to refs. 41,42,43,44.

Structural insight into the anisotropic local mechanical response

In the literature, free volume8,49 has been commonly used as a (macroscopic) structural parameter. However, it has been known for a long time that the correlation between local excess volume and dynamics in glasses is rather weak10,17. We will further confirm later that there is indeed no obvious correlation between the plastic susceptibility and local volume. There have also been recent observations that atoms with fewer neighboring particles in the nearest-neighbor shell, but more in the troughs in between the shells, would be more flexible33,35. However, this does not explain why the same local configuration responds differently to applied global force along different directions. Now that we have converted local configurations into images, let us first examine, from this perspective, if we could identify a critical structure feature responsible for the anisotropic mechanical response. To this end, we seek valuable insight by contrasting the SDMs of particles with extremely high plastic propensity (i.e., the tail end of the high-susceptibility side) versus those on the other end.

We first sheared 5000 2D glass samples to a shear strain of 3.0% using the athermal quasi static (AQS) method50,51, such that we can identify all of the particles with extreme (the highest or the lowest 0.5%) D2min, for each of the four different loading protocols (positive and negative simple shear and positive and negative pure shear). In positive (negative) simple shear, the applied strain increment is \(\Delta \gamma _{xy} = \Delta \gamma\) (\(\Delta \gamma _{xy} = - \Delta \gamma\)) and in positive (negative) pure shear, the applied strain increments are \(\Delta \varepsilon _{xx} = - \Delta \varepsilon _{yy} = - \Delta \gamma /2\) (\(\Delta \varepsilon _{xx} = - \Delta \varepsilon _{yy} = \Delta \gamma /2\)), see details in Supplementary Fig. 2 and “Methods”). In this work, following the pioneering work by Falk and Langer37 as well as recent machine-learning attempts33,34, we assume that an atom has a high plastic susceptibility if it shows a large D2min (the position of the atom is then a fertile site for shear transformations). The plastic susceptibility is a scalar quantity, but exhibits a different value each time the sample is globally loaded along a different orientation. We then went back to the original configurations before deformation and calculated the SDMs for these two groups of extreme particles that we want to compare. For each group an average SDM was obtained, representing the averaged environment of a representative particle. In the main text, we show and discuss the difference, in terms of the spatial distribution of surrounding S particles, between central S particles having the highest versus the lowest plastic susceptibility. Other cases, including the distribution of L particles around central S, and the distribution of both S and L particles around central L, are displayed in Supplementary Figs. 46 only, as they show features similar to that observed for the distribution of S particles around the central S particle. In 3D Cu–Zr glasses, Cu and Zr species correspond to S and L particles, respectively.

It is interesting to observe from Fig. 2a–h that for a particle of highest 0.5% D2min (see Fig. 2a–d), hereafter referred to as “fertile site,” its neighboring shells are more diffuse (wider), and the intensity in the same shell is nonuniform. For the other extreme, the lowest D2min particle (shown in Fig. 2e–h), each shell is sharper with more uniform intensity. In other words, the images do reveal structural differences between particles with high versus low plastic susceptibility. This difference in local environment can be better appreciated by subtracting the maps in Fig. 2a–d with those in Fig. 2e–h. As can be seen from the resultant difference in the SDMs, Fig. 2i–l, for “fertile sites” the surrounding shells are ellipses. The long axis is parallel to the macroscopic elongation direction \({\boldsymbol{\zeta}}\)—the white dashed line in Fig. 2i–l, which is the sum of the unit vector \(\hat {\boldsymbol{\iota}}\) (effective shear direction) and \({\hat {{{\boldsymbol{\eta}}}}}\) (normal to the effective shear plane), and the short axis is perpendicular to \({\boldsymbol{\zeta}}\). In contrast, for particles on the other end of the plastic susceptibility spectrum, their neighboring shells are circles. A schematic highlighting our observation in Fig. 2i–l is included in Supplementary Fig. 7, to better illustrate the critical structural difference.

Fig. 2: Structural insight into the anisotropic local mechanical response.
figure 2

Each map in ah shows the average SDM of the S particles in the neighborhood up to medium range for all S particles that have been identified to have large or small D2min when sheared to 3.0% strain, in 5000 2D samples, for one of the four different loading protocols (from left to right: positive simple shear, negative simple shear, positive pure shear, and negative pure shear). The SDM was calculated for these particles in the initial undeformed configurations. ad for central particles with top 0.5% of D2min, while eh for those with the lowest 0.5% of D2min. il The difference in SDMs, which is calculated using the first row (ad) minus the second row (eh). The white dashed line in (il) represents the macroscopic elongation direction, \({\boldsymbol{\zeta}}\). m Orientational partial (S-centered) pair correlation function \(g_{\boldsymbol{\zeta}} \left( {r,\theta } \right)\) for S–S correlation. Each \(g_{\boldsymbol{\zeta}} \left( {r,\theta } \right)\) curve is an average, over all samples and all loading orientations. For the 3D Cu50Zr50 glass, we show two orientational pair correlation functions for atoms that showed extreme D2min upon straining to 5%. n, o Cu-centric \(g_{\boldsymbol{\zeta}} \left( {r,\theta } \right)\) and \(g_{\boldsymbol{\zeta}} \left( {r,\varphi } \right)\) curves, respectively, for Cu–Cu correlation.

To further understand the structural difference, we also used an orientational pair correlation function, \(g_{\boldsymbol{\zeta}} \left( {r,\theta } \right)\), to characterize the local atomic packing environment. The \(g_{\boldsymbol{\zeta}} \left( {r,\theta } \right)\) for a single particle i is defined as

$$g_{{\boldsymbol{\zeta}} ,i}\left( {r,\theta } \right)_\beta \,= \frac{1}{{\kappa \rho f_\beta }}\mathop {\sum }\limits_{j = 1}^{N_\beta } \left[ {\delta \left( {r - r_{ij}} \right)\delta \left( {\theta - \theta _{{\boldsymbol{\zeta}} ,{\mathbf{{r}}}_{ij}}} \right)} \right]$$
(3)

where \(\theta _{{\boldsymbol{\zeta}} ,{\mathbf{r}}_{ij}} = {\mathrm{arccos}}\left( {| {\hat {\mathbf{{r}}}_{ij}\cdot \hat {\boldsymbol{\zeta}} } |} \right)\) is the angle between the vectors \({\boldsymbol{\zeta}}\) and \({\mathbf{r}}_{ij}\) and has its values in the range of [0, \(\pi /2\)], \({r_{ij}} = | {{\mathbf{r}}_{ij}} |\), \(\rho\) is the number density of atoms, and \(f_\beta\) is the composition fraction of species \(\beta\) in a sample. For 2D samples, \(\kappa = 4r\Delta r\Delta \theta\), and for 3D samples, \(\kappa = 4\pi r^2\Delta r\left[ {\cos \left( {\theta - 0.5\Delta \theta } \right) - \cos \left( {\theta + 0.5\Delta \theta } \right)} \right]\). The \(g_{\boldsymbol{\zeta}} \left( {r,\theta } \right)\) measures the line density of particles/atoms in the region with radial distance of \(r\) from the central atom i, oriented relative to elongation direction \({\boldsymbol{\zeta}}\) at angle \(\theta\) (this can be compared with the usual pair correlation function, which can be regarded as the average of orientational correlations over all \(\theta\)). Again, our goal is to probe the local environment of particles identified to exhibit extreme D2min in 3.0%-strained 2D samples. In order to accomplish this, we went back to the original unsheared sample to calculate the \(g_{{\boldsymbol{\zeta}} ,i}\left( {r,\theta } \right)\) for these particles of interest. This curve is then averaged over all such S (or L) particles from 5000 different samples for a given protocol (each sample was sheared four times, with the different protocols). The \(g_{\boldsymbol{\zeta}} \left( {r,\theta } \right)\) plots are shown in Supplementary Fig. 8 for S particles and Supplementary Fig. 9 for L particles. The \(g_{\boldsymbol{\zeta}} \left( {r,\theta } \right)\) curves obtained for particles identified in the various loading protocols show little difference, e.g., (a–d) in the Supplementary Fig. 8 are very similar to one another; they were hence averaged to the one shown in Fig. 2m.

As seen from Fig. 2m, for particles with low plastic susceptibility, the gray curves of \(g_{\boldsymbol{\zeta}} \left( {r,\theta } \right)\) for different \(\theta\) almost overlap on top of each other, which is consistent with the observation that around each of such particles the neighbors tend to form circularly symmetric shells. In stark contrast, for fertile-site particles, the curves of \(g_{\boldsymbol{\zeta}} \left( {r,\theta } \right)\) at different \(\theta\) are quite different. Specifically, their neighboring particles with \(\theta\) close to 0 have longer bond lengths, whereas those with \(\theta\) closer to \(\pi /2\) have shorter bond lengths, see Fig. 2m. Note that other peaks at larger radial distance behave similarly, so only the first peaks are compared in the figures. We carried out similar \(g_{\boldsymbol{\zeta}} \left( {r,\theta } \right)\) analysis for 3D Cu50Zr50 model glasses. As shown in Fig. 2n, the structural features are similar to those in 2D glasses.

The analysis based on \(g_{\boldsymbol{\zeta}} \left( {r,\theta } \right)\) unveils that a local configuration would be vulnerable to plastic rearrangement when its longest axis of the elliptical neighboring shells is parallel to the elongation direction of global loading. This identifies the softest direction of a 2D local configuration. But for a 3D local configuration under this same condition, there is still one more degree of freedom: the local configuration can rotate arbitrarily around the elongation direction. Additional conditions are therefore needed to identify the softest direction of 3D local configurations, to take into account this angle of rotation. To this end, we employ another orientational pair correlation function, \(g_{{\boldsymbol{\zeta}} ,i}\left( {r,\varphi } \right)\)

$$g_{{\boldsymbol{\zeta}} ,i}\left( {r,\varphi } \right)_\beta = \frac{1}{{\kappa \rho f_\beta }}\mathop {\sum }\limits_{j = 1}^{N_\beta } \left[ {\delta \left( {r - r_{ij}} \right)\delta \left( {\varphi - \varphi _j} \right)} \right]$$
(4)

Here, we shift the elongation vector \({\boldsymbol{\zeta}}\) such that it goes through the central atom i of a local configuration and then define a new vector \({\boldsymbol{\chi}} _j\) for each surrounding atom j of the local configuration. The \({\boldsymbol{\chi}} _j\) is perpendicular to \({\boldsymbol{\zeta}}\) and passes through the surrounding atom j of interest. Then, \(\varphi _j\) is the angle between \({\boldsymbol{\chi}} _j\) and the plane (\({\boldsymbol{\iota}} \times {\boldsymbol{\eta}}\)) defined by the shear direction vector (\({\boldsymbol{\iota}}\)) and the normal vector of shear plane (\({\boldsymbol{\eta}}\)), for each surrounding atom j. Thus, the \(g_{{\boldsymbol{\zeta}} ,i}\left( {r,\varphi } \right)_\beta\) depicts neighboring shells which pass through the vector \({\boldsymbol{\zeta}}\) but with different \(\varphi\) relative to the plane of \({\boldsymbol{\iota}} \times {\boldsymbol{\eta}}\). When \(\varphi = 0\), the shells are in the plane of \({\boldsymbol{\iota}} \times {\boldsymbol{\eta}}\), and when \(\varphi = \pi /2\), they are perpendicular to the plane of \({\boldsymbol{\iota}} \times {\boldsymbol{\eta}}\). Here, \(\kappa = 8r^2\Delta r\Delta \varphi\), different from that for \(g_{\boldsymbol{\zeta}} \left( {r,\theta } \right)\) in Eq. (3).

For the 3D glasses, Fig. 2o contrasts the partial \(g_{{\boldsymbol{\zeta}} ,i}\left( {r,\varphi } \right)\) of the Cu atoms having the lowest 0.5% D2min with those having the 0.5% highest for Cu–Cu. These extreme particles were identified in the 1500 Cu50Zr50 glasses deformed to 5% strain. For the lowest D2min, all partial \(g_{{\boldsymbol{\zeta}} ,i}\left( {r,\varphi } \right)\) at different \(\varphi\) also overlap with one another, in conjunction with the overlapping partial \(g_{{\boldsymbol{\zeta}} ,i}\left( {r,\theta } \right)\) at different \(\theta\) shown in Fig. 2n. These unequivocally confirm that the neighboring shells of particles with lowest plastic susceptibility exhibit spherical symmetry. In contrast, for the fertile sites with the highest D2min, the surrounding particles inside the plane of \({\boldsymbol{\iota}} \times {\boldsymbol{\eta}}\) tend to have shorter bond lengths, as seen in all the \(g_{{\boldsymbol{\zeta}} ,i}\left( {r,\varphi } \right)\) at small \(\varphi\). Combining this with the insight from Fig. 2n, a 3D local configuration is expected to emerge as a fertile site when (i) the degree of asymmetry of its atomic packing is sufficiently large; (ii) its longest axis is parallel to the elongation direction of the global loading; and (iii) its orientation around the elongation direction axis is in such a way that the neighboring shell inside the plane of \({\boldsymbol{\iota}} \times {\boldsymbol{\eta}}\) coincides with the ellipse having the largest aspect ratio. Putting it another way, in the plane of \({\boldsymbol{\iota}} \times {\boldsymbol{\eta}}\), the larger the ratio of the bond length parallel to the elongation direction to that parallel to the contraction direction (the elongation and contraction directions are perpendicular to each other), the higher the susceptibility to plastic rearrangement.

We emphasize here that these features above provide new insight into the difference in local structure, between particles with high versus low plastic susceptibility. To recapitulate, in both the 2D and 3D glasses, for particles with low plastic susceptibility, their neighbors tend to form circular shells with almost uniform intensity (peak height) at different \(\theta\) and \(\varphi\). In other words, the bond lengths are very similar for each shell, in which the neighbors uniformly distribute. In contrast, packing tends to be more asymmetric surrounding fertile sites, the neighboring shells are ellipses, each with non-uniform particle distribution. For a fertile site to be activated upon a particular mechanical loading, the longest and shortest axes of neighboring shells should be parallel to the elongation and contraction directions, respectively. In other words, a fertile site is the most prone to be activated when its surrounding packing along the elongation direction is the loosest and packing along the contraction direction is the densest, as illustrated by the schematic in Supplementary Fig. 7. This is the structural origin responsible for the anisotropy of local mechanical response in amorphous solids.

With these insights in mind, we expect that the mechanical response of a local configuration will not show much difference when one mirrors the local configuration around the plane perpendicular to the plane of \({\boldsymbol{\iota}} \times {\boldsymbol{\eta}}\) and containing the vector \({\boldsymbol{\zeta}}\) or the vector perpendicular to \({\boldsymbol{\zeta}}\) or the plane of \({\boldsymbol{\iota}} \times {\boldsymbol{\eta}}\) (see details on these mirror operations in Supplementary Fig. 10), as these mirrors will not change the packing density of local configurations along the elongation or contraction direction. To verify this, we loaded three such mirrored configurations of a 2D L-J sample and seven mirrored Cu50Zr50 configurations along a constant loading orientation. As expected, D2min of all particles in those mirrored samples are indeed almost identical to those in the corresponding original samples, when the strain is well below global yielding, as shown in Supplementary Fig. 11. In training our CNN model (next section), we will augment our training data with these mirror symmetries. This was found to improve accuracy by ~2%, lending further support to our insight above.

Training procedure and results

The insight from Fig. 2 discussed in the preceding section is important for our understanding of the structural origin of the anisotropy, as it provides a general picture of the main structural features/difference. But our trials found it difficult to translate the new insight into a simple hand-crafted indicator that can predict plastic susceptibility with high accuracy. This underscores the need to take into account all the subtle but nontrivial structural features, when a concrete decision is to be made regarding whether a particular particle has high (or low) susceptibility. As will be demonstrated in the work that follows, deep learning is well suited for this purpose, in terms of providing adequate predictive power. Furthermore, the consistency between the CNN prediction with our expectation based on the insight in Fig. 2 will also be confirmed at the end of this subsection. To construct big datasets for the CNN models, we prepared 2D L-J (or 3D Cu50Zr50) glasses with constant a cooling time of 106 t0 (or effective quench rate of 1010 K/s) from the corresponding well-equilibrium liquid. A total of 5000 (1500) samples were made, each containing 10,000 (32,000) particles. Each sample was then deformed under 4 (24) different loading protocols (see more details about sample preparation and deformation in “Methods”). These 2D (3D) glass models were divided into training, validation, and test datasets, each containing 4900 (1480), 50 (10), and 50 (10) samples, respectively. Since we are interested in identifying particles which will experience extreme (large) plastic rearrangement upon loading, the first step is to do a binary classification task. We will explore the relations between initial static structure and plastic responses at different degrees of strain before the yielding point. Thus, we constructed different datasets for the plastic responses at different degrees of strain. For a specific strain, a particle i is labeled yi = 1 if its D2min is higher than a specific threshold and yi = 0 otherwise. Here, we use fthres to set the D2min threshold at a given shear strain. For example, fthres = 0.5% means that we label particles in the top 0.5% D2min group as yi = 1 and the remainder yi = 0. The shear stress–strain curves and cumulative distribution function (CDF) of D2min are shown in Supplementary Figs. 12 and 13, respectively, for the 2D and 3D glasses. Our main purpose is to reveal how accurate it can be when identifying atoms with large plastic susceptibility a priori from initial static structure in glasses. We therefore choose fthres = 0.5% to show the predictive performance (to ensure sufficiently large datasets, fthres cannot be too small), after trying many different values of fthres. Actually, the advantage of the CNN over other machine-learning methods is more obvious for larger fthres, see more discussions and results in Supplementary Note 3 and Supplementary Fig. 14. Our training, validation, and test datasets were balanced, as in each dataset we included all the yi = 1 particles in each sample deformed under each specific deformation protocol, with the same number of yi = 0 particles selected at random from the rest. In contrast to previous works33,36, we will train a single model for all species in a glass system to improve the user friendliness of machine-learning models; this does not sacrifice much predictive power compared to training different models for different species, as verified in Supplementary Fig. 15 and related discussion in Supplementary Note 4. Note again that the SDM was constructed for particles in the initial undeformed configurations; it is just that these particles were selected based on their D2min response upon straining.

The main interest in the glass community is to see how effective it could be when it comes to separating out fertile sites, i.e., atoms prone to large plastic rearrangement upon deformation. Classification suffices for this purpose. In performing binary classification tasks, the most commonly used metric to measure the predictive performance is “accuracy,” defined as the fraction of the number of images (local configurations) predicted correctly divided by the total number of images in a dataset. The image is predicted correctly, when the actual label yi (1 or 0, determined by the value of D2min from simulations) of an image is identical to the label predicted by the machine-learning model. Since each of our datasets is balanced, i.e., the number of images with actual yi = 0 is equal to that of images with actual yi = 1, the lowest value of accuracy would therefore be 50%. We confirmed that, for each trained model, the validation accuracy and test accuracy are practically the same, suggesting that the trained models did not bias our validation datasets despite the hyperparameters involved, such as CNN architecture, \(r_c\) and Δ. We therefore only report the validation accuracy (or simply “accuracy”) in the following.

Figure 3a, b, for 2D and 3D glasses, respectively, shows the predictive accuracy achieved on validation datasets with our CNN method, the graph neural network (GNN)36 method, and the linear support vector machine (SVM)33 method, when differentiating particles/atoms with yi = 1 from those with yi = 0 at different strains with a constant fthres = 0.5%. As seen from Fig. 3a for the 2D case, CNN exceeds SVM by far in performance, over the entire strain range we considered; the highest accuracy achieved was as high as 96.27%. This accuracy is very close to the ceiling (see “Methods” for details about the upper bound). Several advantages of CNN make it obviously superior to SVM: (i) CNN can capture more complicated relations between static structure and plastic activity compared to linear SVM, as the latter is merely a linear machine-learning algorithm; (ii) SVM loses some subtle but critical structural information during the conversion from atomic positions to the input features; and (iii) most importantly, for a given local configuration in different loading directions, we constructed images with different coordinate systems (in which the X axis is parallel to the shear direction (\({\boldsymbol{\iota}}\)) and the Y axis is the normal direction of the shear plane (\({\boldsymbol{\eta}}\))). This enabled orientation-dependent predictions, whereas the input features to SVM are rotation-invariant, leading always to the same prediction. We observe that GNN produces slightly lower accuracy compared to the accuracy of our CNN when fthres = 0.5% for 2D glasses, as shown in Fig. 3a. This is not surprising, as GNN used atomic positions directly to construct graphs as input, and in training GNN models we rotated edge vectors for different loading directions to take into account the orientation dependency of local plastic response, in contrast to the original procedure from ref. 36 (see details in “Methods”). The advantage of CNN over GNN is more obvious for larger fthres, see Supplementary Fig. 14, and for all cases in 3D glasses (see discussion in next paragraph). In addition, the graph input to GNN cannot be used to provide the insights gained from Fig. 2 based on images. Supplementary Fig. 16 shows the particle-level prediction results of a given 2D glass under different loading protocols for the three machine-learning methods (see more discussion in Supplementary Note 5).

Fig. 3: Predictive power enabled through deep learning.
figure 3

a Accuracy achieved using three different machine-learning methods (CNN, GNN, and SVM) at different shear strains using a constant fthres of 0.5% for 2D L-J glasses. b The corresponding results for 3D Cu50Zr50 glasses. The lower edge of the region shaded gray is the upper bound, i.e., the ceiling predicted using labels based on new D2min after conducting another deformation simulation on the same samples belonging to the test dataset (see more details in “Methods”). c The cumulative rank correlation (Crank, see text for its definition) for the 2D L-J glasses between real local plastic response in a sample along a given loading orientation and the plastic susceptibility predicted via each of the six routes, including the three machine-learning models and the other three based on physical parameters, i.e., flexibility volume (vflex), soft modes (SM), and atomic volume (\({{\Omega }}_a\)). The corresponding results for the 3D Cu50Z50 glasses are shown in d. The Crank value is the average for 50 (10) different samples loaded in 4 (24) different loading orientations for the 2D (3D) glasses, with the error bar marking the standard deviation. For the 3D glasses, each sample contains 32,000 atoms, although the soft mode analysis was conducted using smaller samples each containing 10,000 atoms. For a fair comparison, CNN was also applied on these smaller samples, as shown with open pentagons (i.e., white pentagons overlapping with red full circles) in d. Here, all species are considered simultaneously, and it has been confirmed that all the trends shown here remain the same for the predictive capability reached, when predicting just based on a specific particle type. Supplementary Fig. 15 shows the comparison of the three machine-learning methods for each individual species.

When dealing with 3D Cu50Zr50 glasses, the GNN method is also superior to SVM, consistent with the observations in ref. 36. However, we found that our CNN, which has stronger learning capability, is far more powerful than GNN, as shown in Fig. 3b, over the entire strain range. This is because the relation between structure and plastic response in 3D glasses is much more complex than that in 2D glasses. The highest accuracy achieved with CNN is above 90%, in the strain range from 3 to 6%. At small strains, only some of the initial “defects” have been activated and thus D2min cannot reflect the system plastic response very well; on the other hand, at large strains close to global yielding, the role played by initial structural state becomes less dominant. As a result, the correlation between the initial structure and the plastic response peaks at an intermediate value of shear strain. Even at a strain of 10% where global yielding occurs (see Supplementary Fig. 12b), the accuracy is still >83%. This suggests our CNN model may be useful in predicting the location where shear band initiates. This point is beyond the scope of the current work and will be discussed elsewhere. We believe that the CNN method can achieve higher accuracy, especially for 3D glasses, if we use deeper networks. But it will require more computational resources, see more discussion in Supplementary Note 1. Our main purpose here is to demonstrate that the CNN is indeed a powerful method to meet the challenge we are dealing with. To benchmark the advantages of the CNN, a comparison is made against the reference—the fully connected neural networks, as shown in Supplementary Fig. 17.

One would be naturally curious as to how well our CNN models fare, when compared with the prediction based on some previously published physical indicators, such as the participation ratio in soft modes (SMs)17,18, flexibility volume (vflex)19, and atomic volume (\(\Omega _a\)). For binary classification tasks, the lowest accuracy of any machine-learning method is always 50%. But if the accuracy is used as the metric to assess the predictive performance of those physical parameters, it would drop below 50%. To avoid such an unfair comparison, we have chosen the following route. For the three machine-learning models, we use the class probability for CNN and GNN, and the distance to the separation boundary predicted by SVM, to denote the predicted plastic susceptibility of each particle in the samples constructed for the test dataset but never involved in the training dataset. Take CNN as an example, the class probability for each input image is a numerical value in the range of (0, 1) provided by the learning algorithm, to represent the probability of having a label of yi = 1 for the input object. For binary classifications, class probability greater than 0.5 predicts yi = 1 and yi = 0 otherwise. For the three physical indicators, the value of each of these indicators is used directly to denote the predicted plastic susceptibility of each particle. Similar to ref. 22, we define a cumulative rank correlation (Crank) to compare the power of these six routes in predicting the particles which will have top 0.5% D2min when a sample is sheared to different strains in a given orientation. As the indicator based on each of the six methods are expected to correlate positively with the plastic response, i.e., larger indicator means larger D2min, here we define

$$C_{{\mathrm{rank}}} = 2\overline {{\mathrm{CDF}}} \left( {\rho _i} \right) - 1$$
(5)

where \(\rho _i\) is the plastic susceptibility of particle i predicted via one of the six routes, \({\mathrm{CDF}}\left( {\rho _i} \right)\) is the cumulative distribution function value for \(\rho _i\), and the bar on top represents the averaging of \({\mathrm{CDF}}\left( {\rho _i} \right)\) over all particles with top 0.5% D2min. The highest (lowest) value of \(C_{{\mathrm{rank}}}\)is around 0.995 (−0.995), which means perfect correlation (anticorrelation), and \(C_{{\mathrm{rank}}} = 0\) means no correlation.

As seen in Fig. 3c, d, the \(C_{{\mathrm{rank}}}\) achieved with our CNN method is almost equal to its highest possible value over a wide range of strain for both the 2D and 3D glasses. The worst case is \(\Omega _a\) for which \(C_{{\mathrm{rank}}}\) is always close to zero across the board, for either the 2D or 3D glasses. For the three data-driven methods, their relative predictive power (the magnitude of \(C_{{\mathrm{rank}}}\)) follows an order similar to that observed in Fig. 3a, b. The CNN prediction stands out to be the best. Note that the predictive power generally decreases with increasing strain for physical indicators. In this regard, our CNN has a major advantage over other methods at larger strains, which is important for monitoring the correlation between structure and mechanical response of amorphous solids. As SM analysis on large 3D samples is too computationally expensive, we used instead Cu50Zr50 glasses containing 10,000 atoms. Such smaller samples were also analyzed via CNN, to allow head-to-head comparison (included in Fig. 3d). The consistency in \(C_{{\mathrm{rank}}}\) obtained with our CNN models on smaller and larger samples (red circles almost overlap with white pentagons in Fig. 3d) indicates the robustness of our CNN models with no sample-size dependence. This is not a big surprise, as the rc we used is sufficiently large, and so are our training samples.

In addition, we also used a different and extremely stringent criterion, an overlap ratio, \({\Re}\), to gauge the fraction of correctly predicted particles. \({\Re}\) is evaluated by dividing the number of particles belonging simultaneously to two groups that are intersecting (partially overlapping), with the number of particles known to have the top 0.5% D2min. Here, the intersection/overlapping particles are those that fall into not only the group known to have the top 0.5% D2min, but at the same time also the group having the top 0.5% plastic susceptibility predicted using that particular method/indicator. In the ideal case, the particles predicted to have the top 0.5% plastic susceptibility are expected to be the same particles known to have the top 0.5% D2min (i.e., all the particles in that pool). As such, the closer to 1.0 the \({\Re}\) ratio, the higher the predictive capability. The corresponding results are displayed in Supplementary Fig. 18, which shows a similar trend as in Fig. 3c, d. This further confirms the advantage of our CNN method as the most robust of the six, disregarding the metric adopted to characterize the strength of correlation. Under this extremely stringent criterion, the advantage of CNN over GNN is evident on the 2D glasses, see Supplementary Fig. 18a.

To demonstrate that the orientation dependence has been clearly captured, we show in Fig. 4 the CNN-predicted plastic susceptibility for a 2D local configuration when it is rotated gradually: the rotationally non-invariant SDM turns out to be highly sensitive to minor orientation change. In its initial orientation (\(\theta = 0^\circ\)), when its loosest and densest packing directions align with the elongation (red dashed line) and contraction (black dashed line) directions of mechanical loading, respectively, CNN predicts the highest plastic susceptibility (class probability very close to 1.0). With the counter-clockwise rotation, these packing directions gradually misalign with the loading ones, and the predicted plastic susceptibility goes down. At \(\theta \cong 90^\circ\), the loosest packing of the configuration lines up with the contraction direction whereas the densest packing with the elongation direction, class probability drops to a value very close to zero. When \(\theta\) increases to ~180°, the favorable alignment comes back, such that the plastic susceptibility is predicted to be the highest again. Likewise, the lowest reemerges when \(\theta = 270^\circ\). This sensitivity to the coupling between the anisotropic local structure and the loading direction demonstrates that, when we feed the images (configurations) into CNN, the algorithm is very good at recognizing this coupling and churning out predictions consistent with the insight in Fig. 2. The blue dotted line in the polar plot of Fig. 4 represents D2min of the local configuration at different θ (see “Methods”), which almost matches the CNN prediction. This underscores the benefit and advance enabled by our approach to innovate a new structure representation and combine it with CNN.

Fig. 4: The orientation dependence of CNN-predicted plastic susceptibility for a 2D local configuration.
figure 4

This susceptibility is characterized here by the magnitude of class probability (red solid curve), varying in the range of 0–1 (black circle in the center), depending on the rotation angle (θ) of the local configuration relative to the loading orientation (four scenarios during the counter-clockwise rotation at representative angles are displayed). The red and black dashed lines in each snapshot represent the elongation and contraction directions of the global loading, respectively. The blue dotted line in the polar plot represents D2min of the local configuration at different θ.

Generalize to different processing history or compositions

Our previous work35 demonstrated that the SVM models, trained on samples having the same processing history at a single composition, can be generalized to samples with different processing histories or at different compositions in the same alloy system. To see if such a generalization is possible with our CNN models, 50 2D samples were quenched over different time periods in the range from 100 to 106 t0. For 3D glasses, we also quenched 10 Cu50Zr50 samples with effective cooling rates in the range from 109 to 1013 K/s, and 10 CuxZr100-x glasses with different compositions (x = 20, 25, …, 80) at the same effective cooling rate of 1010 K/s. Each of these 2D (3D) samples was sheared using 4 (24) different loading protocols with the AQS method. We then constructed test datasets for each processing history and each composition, by selecting all of the atoms with D2min above a threshold and a same number of atoms randomly from the rest. The accuracy achieved on these datasets, using the CNN model previously trained at a single composition for a particular processing history (see caption and “Methods”), is presented in Fig. 5a–c. We find a very high accuracy across a wide composition range (Fig. 5c), especially for samples with slower cooling (Fig. 5a, b). For example, for 3D Cu50Zr50 quenched at cooling rate of 109 K/s, the highest accuracy achieved is 92.40%. The decrease in accuracy for faster cooling rate was observed before22,35,38,52, as the critical structural difference becomes increasingly difficult to distinguish38. The more relaxed glasses, on the other hand, have a much smaller population of fertile sites that exhibit larger shear susceptibility, making the recognition easier and prediction more robust.

Fig. 5: The ability of CNN models to generalize.
figure 5

The CNN models were trained on samples at a single composition with one processing history (the 2D glasses at the composition of NL:NS = \(\left( {1 + \sqrt 5 } \right):4\), quenched within 106 t0. The 3D Cu50Zr50 glasses were quenched at 1010 K/s). The accuracy of the models is shown in a for 2D glasses quenched with different cooling times, in b for Cu50Zr50 glasses quenched with different cooling rates, and in c for CuxZr100-x glasses of different compositions quenched at a constant effective cooling rate of 1010 K/s. In the plots, the stars are for the training condition and circles for non-training conditions.

This transferability of the model offers the desirable applicability to glasses across different compositions and/or different processing histories, to compare plastic susceptibility. An example is shown in Supplementary Fig. 19a (for 2D) and b (for 3D). Here, the glass structure is generically characterized by the fraction of predicted “yi = 1” particles out of all particles. This fraction, fpy=1 (the subscript means “predicted (labeled) y=1”), is given by the fraction of particles with class probability > 0.5, and scales almost linearly with the average class probability. These plots reveal a strong correlation between the glass structure and shear modulus (G), found to be robust at any strain as long as it is well below that corresponding to global yielding.

A possible reason for the transferability is that our training datasets, although from one composition and the same processing history, are large enough to include almost all of the possible local environments for samples with different processing histories and/or at different compositions within the same alloy system, as suggested in our previous work35. Achieving transferability across different alloy systems while keeping high predictive capability requires more exploration but it is beyond the scope of the current work.

Discussion

Our results presented in this paper bring about several advances over previous efforts to identify structural “defects” in amorphous solids, i.e., particles that are most prone to rearrangements, in particular shear transformations. First of all, a qualitative but crucial structural difference, in terms of a more non-uniform and asymmetric packing environment, was discovered for atoms with high plastic susceptibility relative to those with low plastic susceptibility. This difference is found to be the origin of the anisotropic local mechanical response in amorphous solids, contributing to the full resolution of this long-standing puzzle. Our expectation based on this insight, starting from Fig. 2, is reflected in the eventual CNN predictions, as highlighted in Fig. 4. This new insight goes beyond that obtained based on the previous concept of geometrically unfavored motifs (GUMs)18. The GUMs refer to coordination polyhedra, an elementary but incomplete structural representation, especially since the local property is known to be influenced by the surroundings beyond the first and second nearest neighbors. In addition, it is difficult to identify the softest shear direction merely based on the local GUMs. The insight presented in this work may be assessed from experiments if the orientational pair correlation functions can be mapped out in future experiments. Second, we achieved unprecedented accuracy in predicting orientation-dependent local mechanical response over a wide range of shear strain, by designing a rotationally non-invariant structure representation coupled with a powerful deep learning algorithm, as highlighted in Fig. 3. This accomplishes what is left unsolved in recent attempts invoking vectors, symmetry, and orientational order53,54,55, which only consider some aspects of the topological information in the first nearest-neighbor shell and lack the capacity to handle the coupling with the direction of the imposed shear. Here, we have used the SDM and the orientational pair correlation functions to cover the entire local environment of an atom—the full gene of the glass structure beyond the first neighboring shell. Because our representation is no longer rotation-invariant, our approach unveils the predictive power of the static structure of amorphous solids in forecasting direction-dependent local plastic response. We are aware that factors beyond static structures, including short-term dynamics, are also relevant to relaxation responses at finite temperatures. However, there has been a pressing need to gauge how much predictive power can be garnered, from the initial static structure alone. This is a question that the community wants a clear answer for, as the static structure is usually the information that becomes available first. Third, once the optimized CNN model is in hand, all that is needed to predict plastic response is atomic positions, without relying on other knowledge such as interparticle interactions required by previous approaches using parameters based on physical properties. Fourth, we have demonstrated that the CNN models trained on a single glass (with one set of a particular composition and a specific processing history) can be generalized to samples with different processing histories or at different compositions in the same alloy system. This is important for probing into the effects of processing procedure or chemical composition on properties, enabling the comparison between different glasses. These four merits open new avenues to the understanding of the structure–property relations in amorphous solids. Fifth and finally, we anticipate that our novel structure representation (SDM), in combination with the powerful CNN method, would find use in the studies of other amorphous matter, beyond metallic glasses, as well as in predicting rotation-invariant properties of crystalline materials, via data augmentation41.

Methods

The data used for training and validating deep learning models are from molecular dynamics simulations56, which are implemented in the LAMMPS package57.

Preparation of 2D model glasses

2D binary glasses are composed of equal-massed (m) small (S) and large (L) particles which interact via a standard 6–12 L-J potential. The full details of the potential are presented elsewhere45. We chose our composition such that the ratio between the species is NL:NS = \(\left( {1 + \sqrt 5 } \right):4\) to be consistent with previous studies14,22,37 of this system and each of the sample contains N = 10,000 particles. All units will be expressed in terms of m as well as \({\it{\epsilon }}\) and \(\sigma\), the parameters describing the energy and length scale, respectively, of the interparticle interaction. The characteristic time is \(t_0 = \sigma \sqrt {m/\varepsilon }\). The glass transition temperature Tg of this system is known to be 0.325\(\varepsilon /k\)14,22,37, where \(k\) is the Boltzmann constant. Periodic boundary conditions were imposed on square boxes of linear dimension 98.8045\(\sigma\). The density of the system (\(N/L^2 \approx 1.02\sigma ^{ - 2}\)) was kept constant. A total of 5000 2D glass samples were each obtained by continuously decreasing the temperature from a liquid state, well equilibrated at 1.08 Tg and quenched to a low-temperature (0.092 Tg) solid state over a period of 106 t0 using a Nose–Hoover thermostat. Then, a static relaxation using conjugate gradient algorithm was applied to bring the system to mechanical equilibrium before conducting deformation simulation22. Particle positions in this state were used to construct structure representations. Our training, validation, and test datasets contained 4900, 50, and 50 samples, respectively. To demonstrate that the CNN models are valid for the glasses with different processing histories, 50 additional samples were quenched from 1.08 Tg to 0.092 Tg over different period in the range from 105 t0 to 100 t0.

Deforming 2D model glasses

Each of the 2D glasses was deformed under four loading conditions (positive and negative simple shear and positive and negative pure shear) with an AQS method50,51. In positive (negative) simple shear, the applied strain increment is \(\Delta \gamma _{xy} = \Delta \gamma\) (\(\Delta \gamma _{xy} = - \Delta \gamma\)), and in positive (negative) pure shear, the applied strain increments are \(\Delta \varepsilon _{xx} = - \Delta \varepsilon _{yy} = - \Delta \gamma /2\) (\(\Delta \varepsilon _{xx} = - \Delta \varepsilon _{yy} = \Delta \gamma /2\)). Here, \(\Delta \gamma\) = 10-5. After each deformation increment, the system was relaxed to its mechanical equilibrium.

Mechanical response of a local 2D configuration at arbitrary orientation

To probe the mechanical response of a local 2D configuration at arbitrary orientation, i.e., the blue dotted curve in the Fig. 4, we deleted particles58 outside the largest circular region from the original square box. Next, we rotated the circular configuration by various angles counter-clockwise, and then conducted simple shear on it with the AQS method. During deformation, the particles in the outer annulus (wall) of width 5\(\sigma\) (two times of potential cutoff) only experience applied affine displacement, i.e., they are not allowed to relax such that the periodic boundary conditions are lost.

Preparation of 3D model glasses

The 3D Cu50Zr50 metallic glasses were simulated with an optimized EAM potential adopted from ref. 46. Each of the 1500 samples contained 32,000 atoms in a cubic box. After adequate equilibration at 2000 K, the liquid was first quenched to 1500 K at a cooling rate of 1013 K/s, followed by 1012 K/s to 1000 K, then at the desired rate of 1010 K/s (effective cooling rate, in the text) to 500 K, and finally at 1013 K/s to 50 K in the NPT ensemble using a Nose–Hoover thermostat with zero external pressure. The periodic boundary condition was applied in all three directions. The samples were then brought to mechanical equilibrium through a static relaxation via a conjugate gradient algorithm. Our training, validation, and test datasets for the 3D system contain 1480, 10, and 10 samples, respectively. To demonstrate that the CNN models are valid for glasses with different processing histories or at different compositions, 10 Cu50Zr50 glasses were also quenched using different effective cooling rates in the range from 109 to 1013 K/s, and at each different composition (CuxZr100-x, x = 20, 25 …, 80, in step of 5), 10 metallic glasses were quenched with an effective cooling rate of 1010 K/s.

Deforming 3D model glasses

Before extracting atomic coordinates to construct structure representation and conducting deformation simulation in each of the 24 loading orientations listed in Supplementary Table 1, we rotated each configuration such that the shear direction and the normal direction of the shear plane were parallel to X and Y axes, respectively, of the coordinate system. For some loading orientations, appropriate atom replication was needed after rotation, which is illustrated in Supplementary Fig. 20. Each of the 3D glasses was deformed in simple shear along the 24 different loading orientations via an AQS method50,51. After each applied strain increment of \(\Delta \gamma _{xy}\) = 10−4, the system was relaxed to its mechanical equilibrium. We calculated D2min per definition in ref. 37, implemented in the OVITO package59.

Upper bound of the prediction accuracy

To assess this ceiling, we make use of the particles from the test dataset, which were already labeled after deformation. Another deformation simulation was then conducted, after shuffling the sequence of all particles, as a new job run on a different computer. The resultant new configuration was used to compute D2min of all the particles, which were labeled again. The number of particles always having the same label, determined based on both the old and the new D2min, was divided by the total number of particles in the dataset, to give the upper bound of accuracy that can be reached via any method. Note that numerical error, when the atom sequence in the initial configuration file is changed although all other deformation conditions are kept the same, renders deformation under the AQS not exactly reproducible. This error also grows with increasing strain.

Training procedure of CNN models

For the CNN models, we minimized the binary cross-entropy loss between true labels and predicted labels, and used early stopping and selected the model with the highest accuracy in the validation dataset. The learning rate started from 0.001 and was then divided by \(\sqrt {10}\) once the validation accuracy plateaued. We chose RMSprop optimizer and mini-batch size of 512 and 64, for the 2D and 3D glasses, respectively. We augmented the training data by applying randomly one of the 4 (or 8) mirror symmetries to the images of 2D (or 3D) glasses every time we fed the images into the CNN. The training was implemented in the TensorFlow package60. And we conducted distributed deep learning with the Horovod package61 to accelerate the training for 3D glasses.

GNN methods

We followed the procedure in ref. 36 to construct graph with edge threshold of 2.0\(\sigma\) (and 4.0 Å) for 2D (and 3D) glasses. We used the same GNN architecture as in ref. 36, except that we activated the output layer with a sigmoid function as we were performing binary classification tasks, and chose nrec = 7 and 6 for 2D and 3D glasses, respectively, according to the optimization results shown in Supplementary Fig. 21. We also tried more neurons in all multilayer perceptors, but this did not further improve the validation accuracy.

For the 2D (3D) samples in both the test and validation datasets, we loaded each sample, and constructed corresponding graph, 4 (24) times, each graph corresponding to one of the 4 (24) loading protocols. And we updated edge vectors of those graphs using relative atomic positions in the new coordinate system, the X and Y axes of which were parallel to the shear direction and the normal of shear plane, respectively, of each specific loading protocol. Atoms were labeled based on the magnitude of D2min under the corresponding loading protocol. When evaluating the loss function and accuracy for the GNN models, only the atoms in the test and validation datasets were considered. Therefore, both test and validation datasets for the GNN are exactly the same as those for our CNN. For the 2D (3D) samples in training datasets, we constructed only one graph for each sample in order to relieve memory burden and save computation time. However, we took into account all loading directions for each of the 4900 (1480) training examples by updating at random their target labels with D2min from one of the 4 (24) loading protocols and by rotating edge vectors correspondingly every time we fed them (both targets and graphs) to the network (note that in our earlier attempts, we tried to directly construct and load multiple graphs for each training sample, but the results obtained were similar). Also, we augmented the training data by applying randomly one of the 4 (or 8) mirror symmetries to the edge vectors of graphs of 2D (or 3D) glasses every time we fed the graphs to the GNN. When optimizing GNN models and evaluating loss function and accuracy, only the atoms in the training datasets were used to enable balanced datasets and a fair comparison with our CNN.

To minimize the binary cross-entropy loss between true labels and predicted labels, similarly to ref. 36, we used early stopping and selected the model with the highest accuracy on the validation dataset after running enough epochs. We trained the GNN models with a learning rate of 10−4, gradient clipping, Adam optimizer, and a TensorFlow implementation.

SVM methods

Same as in ref. 33, we used the following radial and angular structural functions to construct input features to SVM models:

$$G_{i,\alpha }\left( r \right) = \mathop {\sum }\limits_{j \in \alpha ,r_{ij} < r_c} {\mathrm{exp}}\left( { - \frac{{\left( {r_{ij} - r} \right)^2}}{{2\Delta ^2}}} \right)$$
$$\Psi _{i,\alpha ,\beta }\left( {\xi ,\lambda ,\varsigma } \right) = \mathop {\sum }\limits_{j \in \alpha } \mathop {\sum }\limits_{k \in \beta } {\mathrm{exp}}\left( { - \frac{{r_{ij}^2 + r_{jk}^2 + r_{ik}^2}}{{\xi ^2}}} \right)\left( {1 + \lambda \cos \theta _{ijk}} \right)^\varsigma$$

where \(r_{ij}\) denotes the distance between atoms i and j, and \(\alpha\) or \(\beta\) denote the species, \(\theta _{ijk}\) is the angle between vector \({\mathbf{{r}}}_{ij}\) and \({\mathbf{{r}}}_{ik}\), and \(r \in [r_0 + 0.5\Delta ,r_c)\) with an increment of \(\Delta\). We use \(r_0\) = 0.5\(\sigma\), \(r_c\) = 5.0\(\sigma\), and \(\Delta\) = 0.025\(\sigma\), and \(r_0\) = 2.0 Å, \(r_c\) = 8.0 Å, and \(\Delta\) = 0.050 Å for 2D and 3D glasses, respectively, based on the findings in Supplementary Fig. 22. For the parameters in the angular structural functions, similar to ref. 36, we used \(\lambda \in \left\{ { - 1,1} \right\},\varsigma \in \left\{ {1,2,4,8,16} \right\}\), and 16 values of \(\xi\) equally spaced between 0.75\(\sigma\) and 5.0\(\sigma\) (between 3.0 and 8.0 Å) for 2D (3D) glasses. We tried to use 4, 8, 16, or 32 values for \(\xi\) in the given region but found only a weak dependency. We also tried to use exactly the same angular parameters as in the original work33, but the results came out slightly worse.

We conducted L2-regularized L2-loss support vector classifications with a linear kernel, which were executed in the LIBLINEAR package62 by solving the primal optimization problem. A regularization parameter of C = 100 was used, as we tried many different values over the range from 10−7 to 105 and found only a weak dependency once C was above 10. We also tuned the termination criterion, which did not further improve the accuracy on validation datasets. Note that 500 (100) samples were sufficient to construct training datasets for 2D (3D) glasses, because training the SVM models does not require large datasets, as demonstrated in Supplementary Fig. 23.

Supplementary Tables 2 and 3, for 2D and 3D glasses, respectively, present a summary of the input features and trainable parameters of the three machine-learning models.

Calculating the flexibility volume

The flexibility volume is defined as the product of vibrational mean squared displacement (MSD) and atom spacing19. We calculated MSD at 0.092 Tg for 2D glasses and at 50 K for 3D glasses, using the procedure in ref. 19. The MSD was averaged over 100 independent runs, all starting from the same configuration but different initial velocities. And the atomic volume is calculated based on Voronoi analysis.

SM analysis

The normal mode analysis of the glasses was carried out by diagonalizing the dynamical matrix of the inherent structure of glasses. The participation fraction of atom i in eigenmode \(e_\omega\) is \(p_i = \left| {\vec e_\omega ^i} \right|^2\), where \(\vec e_\omega ^i\) is the corresponding polarization vector of atom i63. Here, \(p_i\) was summed over a small fraction of 1% (same as that in ref. 18) of the lowest-frequency normal modes. \(p_i\) measures the involvement of atom i in SMs.