A generative deep learning framework for inverse design of compositionally complex bulk metallic glasses

The design of bulk metallic glasses (BMGs) via machine learning (ML) has been a topic of active research in recent years. However, prior ML models were mostly built upon supervised learning algorithms that require human inputs to navigate the high-dimensional compositional space, which becomes inefficient as the compositional complexity of BMGs increases. Here, we develop a generative deep-learning framework to directly generate compositionally complex BMGs, such as high-entropy BMGs. Our framework combines the unsupervised Generative Adversarial Network (GAN) algorithm for data generation with the supervised Boosted Trees algorithm for data evaluation. We systematically studied, both numerically and experimentally, how the choice of data descriptors and the literature data affects the effectiveness of our framework. Most importantly, we demonstrate that our generative deep-learning framework is capable of producing composition-property mappings, thereby paving the way for the inverse design of BMGs.


INTRODUCTION
Since ancient times 1, glass has been an important engineering material and still plays a pivotal role in our daily life today 2. In principle, a glass can be viewed as a frozen-in supercooled liquid without any long-range positional ordering 3, which can be developed from a variety of material systems 2,4, such as polymeric glass 4, silicon glass 5, oxide glass 6, colloidal glass 7, and metallic glass (MG) 8. Compared to conventional glasses, MGs are a relatively young member of the glass family, first discovered in the 1960s 9. Since then, MGs have attracted tremendous research interest because of their promising structural and functional properties, such as a high elastic strain limit 10, superb strength/hardness 11, excellent corrosion resistance 12, and remarkable thermoplastic formability 13. However, owing to their simple atomic structure, the glass-forming ability (GFA) of MGs is relatively low compared to other types of glasses 14, such as oxide glass. Therefore, the development of good glass-forming alloys has been an active and enduring line of research. In principle, GFA correlates positively with the maximum size into which a glass-forming alloy can be cast without crystallization, and a superior GFA therefore usually manifests as the successful casting of bulk metallic glasses (BMGs) with a typical size >1 mm 13. Nevertheless, according to the literature 15, only 10% of the reported glass-forming alloys can form BMGs despite the tremendous efforts made over the past decades. Traditionally, the discovery of a BMG composition was based on a trial-and-error iterative approach guided by empirical rules 16, such as the eutectic point rule 17, the confusion principle 18, and others 19. However, this traditional approach is time-consuming, ineffective, and costly 20.
On the other hand, data-driven approaches, such as machine learning (ML), have become increasingly popular in alloy design in recent years 21,22. Making use of the big data accumulated over the years 21,22, ML can greatly accelerate the discovery of new alloys at a speed unmatched by the traditional approach. At present, the ML approach to the design of BMGs is mainly based on supervised learning 23, which, in principle, begins with training a classifier or regressor using data descriptors formulated from alloy compositions 24, empirical rules 21,25,26, or physical models 27. Afterward, the trained supervised learning models are exploited with human inputs to explore the multi-dimensional compositional space in search of alloys with targeted properties, such as excellent GFA 26-28, superior thermal properties 25, and so on 29. Although this supervised learning approach has proven useful 25,27,30, it quickly runs into a bottleneck with the increasing compositional complexity of alloys, such as high-entropy metallic glasses 15,31 that contain more than five principal elements. In such a case, sampling the compositional space, which contains an astronomical number of candidate alloys 32, through supervised learning becomes impractical, if not impossible, because identifying only a few BMG compositions would require millions or even more human-guided inputs.
Unlike the prior works 24-29,33, we herein develop a generative deep-learning framework (GDLF) for the unsupervised search of compositionally complex BMGs based on an adapted Generative Adversarial Network (GAN) algorithm 34. As illustrated in Fig. 1a, we collected four types of BMGs, including Zr-, La-, Fe-, and Ti-based BMGs. Despite the development of BMGs over the past decades 8, the size of our BMG dataset is still small (~300) compared with others, such as the Inorganic Crystal Structure Database (https://icsd.fiz-karlsruhe.de). Thus, it is a great challenge to train the model with such a small dataset. According to the literature 27,35,36, the choice of descriptors can influence the performance of an ML model trained on a small dataset. Therefore, we first employed three types of basic data descriptors in our ML modeling, which correspond to the alloy composition, the empirical parameters commonly used in prior works 19,37,38, and the recently developed GFA models 27, respectively. It is noteworthy that we can embed physical properties of alloys, such as thermal properties, in the data descriptors, hence leading to compound data descriptors that include not only the alloy composition but also GFA or other physical properties. As illustrated in Fig. 1b, we trained the adapted GAN composed of the Generator (G) and the Discriminator (D), the outcome of which comprises not only the generated alloy compositions but also their properties or attributes, such as thermal properties (see "Methods"). Apart from the immediate evaluation of the generated results within the GAN (i.e., with the loss function), we further evaluated the generated BMG compositions using a well-trained surrogate model (i.e., the Classifier (C), Fig. 1c) (see "Methods"). Most importantly, we note that our GDLF is also an enabler for the inverse design of BMGs, i.e., compositional design based on targeted properties 14,39,40. Based on our GDLF, we successfully designed four BMGs that have never been reported before, which is rare compared to most of the prior works published in this field 41-44.

RESULTS
Collection and visualization of BMG data
As a first attempt, we exploited the literature and selected four types of BMGs, including Zr-, La-, Fe-, and Ti-based BMGs (hereafter referred to as the Zr, La, Fe, and Ti datasets, respectively, for simplicity), as the training datasets for our GDLF. These four types of BMGs contain affordable and non-toxic elements. Here, we emphasize that, despite the dedicated efforts of over 60 years, the total number of reported BMG compositions is only about 300 for these four types of BMGs (see Supplementary Table 1), indicative of the grand challenges facing the field as discussed in refs. 15,23. To gauge data diversity, we carried out analyses of the compositional data with the hierarchical clustering method 45 (see "Methods"). As seen in Fig. 2a, the Zr dataset can be grouped into 13 subtypes, each of which contains at least three different compositions. In a similar fashion, the Fe dataset can be grouped into 13 subtypes, the La dataset into eight subtypes, and the Ti dataset into three subtypes, as shown in Fig. 2b-d. Here, we note that the Ti dataset contains only 23 compositions, much smaller than the others. In principle, a relatively small dataset with diverse data poses a severe challenge 46 because the resulting large data scattering can cause the GAN to diverge or fail to generate new compositions. In such a case, the design of data descriptors that can reduce data scattering becomes important.

Generation of new BMG data
Based on the above discussions, we decided to use the compound data descriptor (CD1), comprising the composition descriptor (D1) and the theory-guided descriptor (D3), to train our GAN for the generation of new BMG compositions (see "Methods"). Meanwhile, we also trained the GAN model with the basic data descriptor D1 and with the compound data descriptor (CD2), composed of D1 and the empirical descriptor (D2), for comparison (see "Methods"). Based on the Zr dataset, our generator (G) successfully generated 100 new Zr-based BMG compositions. As shown in Supplementary Fig. 1, the generator loss shows that our GAN model performs very well with D1 and CD1, but poorly with CD2. Next, we evaluated the performance of the GAN model by assessing the generated BMG compositions with the surrogate model, or the MG Classifier discussed in our previous work 27, which was trained using over 7000 data points (see "Methods"). Here, we note that almost all Zr-based BMG compositions generated with D1 and CD1 are labeled by the surrogate model as glass-forming alloys, while only a fraction of the compositions generated with CD2 can pass the test of the MG Classifier (Supplementary Fig. 2). Similarly, we also generated 100 compositions each for the La-, Fe-, and Ti-based BMGs based on the GDLF shown in Fig. 1, and the trend is the same: most of the compositions generated with D1 and CD1, but only a few generated with CD2, are identified by the BMG Classifier as glass-forming alloys.
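To make this screening step concrete, the following is a minimal sketch of how generated compositions can be passed through the surrogate classifier; the descriptor function to_descriptors and the trained classifier clf are hypothetical placeholders for the theory-guided featurization and the Boosted Trees model described in "Methods".

```python
# Hedged sketch: screen GAN-generated compositions with a pre-trained surrogate
# classifier. `to_descriptors` and `clf` are assumed to exist elsewhere
# (see "Methods": descriptor design and training of the surrogate model).
import numpy as np

def screen_generated(compositions, to_descriptors, clf):
    """Return predicted labels and the fraction labeled as glass-forming."""
    X = np.array([to_descriptors(c) for c in compositions])
    labels = clf.predict(X)  # assumed encoding: 1 = glass-forming, 0 = non-glass-forming
    return labels, labels.mean()
```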
Next, we use principal component analysis (PCA) 47 and the Kernel smoothing function estimate (KSFE) (see "Methods") based on the alloy compositions for the 2D visualization of the literature data (Fig. 3a), the data generated with D1 (Fig. 3b), and the data generated with CD1 (Fig. 3c) for the Zr-based BMGs. Here we note that we constructed the PCA projections based on all three datasets combined but draw them separately in Fig. 3a-c for comparison. To quantify the compositional differences between the generated and literature data by PC1 and PC2, we calculated the data density difference by subtracting the distribution of the generated data from that of the literature data. Evidently, the distribution of the data generated with D1 is much closer to that of the literature data than that with CD1, as seen in Fig. 3d, e. In order to compare the difference among the different datasets, we calculated their Wasserstein distances 48, which are 0.263 between the original data and the data generated via D1 and 0.343 between the original data and the data generated via CD1. This result is important and indicates that the distribution of the data generated via D1 is closer to that of the literature data compared to the distribution of the data generated via CD1.

Fig. 1 The flowchart of the generative deep-learning framework for inverse design of bulk metallic glasses. a The available datasets and the possible descriptors. b The sketch of the generative adversarial network algorithm. c Evaluation of the generated compositions with the surrogate supervised learning models (i.e., the MG classifier) and the inverse design algorithm.
Aside from the use of the compound data descriptor (CD1), we could also generate new BMG compositions by learning from an expanded dataset that comprises more than one type of BMG. As a demonstration, we added to the Zr dataset 44 Cu-based, 21 ZrCu-based, 17 Ni-based, and 4 Ti-based BMG compositions (see Supplementary Table 2). Figure 3f shows the probability density plot of the expanded dataset, which exhibits three major data groupings (i.e., Zr-, ZrCu-, and Cu-based BMGs). Here, we note that the data of the Ni- and Ti-based BMGs are dispersed among the other types of BMGs (see Supplementary Fig. 3 for details). By training the GAN model with the expanded dataset, we generated 100 BMG compositions (see Supplementary Table 3). Interestingly, through the mixing of the different datasets, we successfully generated compositions that do not resemble Zr-based BMGs, such as Zr27.3Cu26.1Hf16.9Al9.6Ti7.2Ni6.4Ag5.9Nb0.4 (in atomic percent), as exemplified in Fig. 3g. We note that this composition entails multiple principal elements (i.e., Zr, Cu, Hf), resembling high-entropy alloys 31,32 more than conventional metallic glasses 8. For experimental validation, we prepared a 2 mm Zr27.3Cu26.1Hf16.9Al9.6Ti7.2Ni6.4Ag5.9Nb0.4 bulk rod by arc-melting and Cu-mold casting (see "Methods"). Figure 3k shows the X-ray diffraction (XRD) spectrum of the Zr27.3Cu26.1Hf16.9Al9.6Ti7.2Ni6.4Ag5.9Nb0.4 sample (Fig. 3i), indicative of its amorphous nature. We also calculated the full-width at half maximum (FWHM) Δq of the new BMG (Δq = 0.38 Å−1) following the work by Li et al. 49 (see Supplementary Fig. 4). Through energy-dispersive X-ray spectroscopy (EDX) mapping and analyses, we find that the cast alloy has a uniform elemental distribution within the EDX resolution (Fig. 3j) and that its overall composition is consistent with the generated composition (see Supplementary Table 4). These results are promising and imply that one can now discover BMGs by navigating uncharted territories of the complex compositional space.

Inverse design of BMGs
Apart from the discovery of unconventional or compositionally complex BMGs, we emphasize that our generative deep-learning framework also enables the inverse design of BMGs based on targeted properties or attributes. Figure 4a illustrates the mechanism of this inverse design approach. (1) First, we further enrich the compound data descriptor (i.e., CD1) by adding to it the properties or attributes of existing BMGs and train the GAN model with the new compound data descriptor. (2) As a result, the well-trained generator (G) can produce not only the compositions of BMGs but also their corresponding properties, giving rise to composition-property mappings. (3) Based on the composition-property mappings, we can pinpoint the BMG compositions keyed to targeted properties, as sketched below. As a proof of concept, we added three characteristic temperatures of BMGs, namely the glass transition temperature (Tg), the crystallization temperature (Tx), and the liquidus temperature (Tl), to the compound descriptor (i.e., CD1) for the inverse design (see Supplementary Table 5).
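As a sketch of step (3), the snippet below selects candidate compositions from the generator output once the property columns have been rescaled back to kelvin; the array layout, the Trg threshold, and the function name are assumptions rather than the authors' implementation.

```python
# Hypothetical sketch of candidate selection from the composition-property mapping.
# `generated` is assumed to be an (N, d) array whose last three columns are
# Tg, Tx, Tl (in kelvin) and whose first n_elements columns are atomic fractions.
import numpy as np

def select_candidates(generated, n_elements, trg_min=0.60, top_k=3):
    comps = generated[:, :n_elements]
    tg, tl = generated[:, -3], generated[:, -1]
    trg = tg / tl                               # reduced glass transition temperature Trg = Tg/Tl
    mask = trg >= trg_min                       # keep alloys with relatively high Trg (threshold assumed)
    order = np.argsort(tg[mask])[::-1][:top_k]  # then rank the survivors by Tg, highest first
    return comps[mask][order], tg[mask][order], trg[mask][order]
```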
With the expanded data descriptor, we obtained the composition-property mapping based on the Zr dataset, which maps the properties (Tg, Tx, and Tl) to the compositions of Zr-based BMGs. Here, we note that the generator loss is also low with the expanded data descriptor, and most of the generated compositions pass the test of the surrogate model (see Supplementary Figs. 1 and 2), with Tg ranging from 550 to 720 K, Tx from 560 to 800 K, and Tl from 1000 to 1250 K (Fig. 4b). Our next step is to find a candidate Zr-based composition with a maximized Tg and reduced glass transition temperature Trg (= Tg/Tl). Figure 4c shows the plot of Tg versus Tl of the generated compositions, from which we selected three compositions that possess relatively high values of Trg and Tg (i.e., Zr49.8Cu32.9Al7.4Ni3.9Ag2.4Ti1.7Nb1.4Fe0.5 (S1), Zr50.5Cu36.3Ag4.7Al4.1Ni3.4Nb0.2Ti0.2 (S2), and Zr45.2Cu37.8Al9.1Ag4.9Ni3 (S3)). To validate our selections, we cast 2 mm rod samples for all of them (see Supplementary Fig. 5). According to the XRD spectra (see Fig. 4d and Supplementary Fig. 4), all three bulk samples are amorphous, with Δq values of 0.32-0.37 Å−1, and their actual compositions are consistent with the generated compositions (see Supplementary Table 4). After that, we carried out differential scanning calorimetry (DSC) experiments to measure their Tg, Tx (Fig. 4e), and Tl (Fig. 4f) (see "Methods"). As seen in Fig. 4g, the measurements agree very well with the targeted values (see Supplementary Table 6 for details).

DISCUSSION
Now let us discuss the outcome of our current work and its implications. Unlike the prior works based on supervised learning 25-28,33, our GDLF is built upon the unsupervised GAN algorithm to generate BMG compositions that do not exist in the literature, and this approach is particularly useful for designing compositionally complex alloys containing multiple principal elements. From the perspective of our GDLF, these multi-principal-element BMG compositions can be viewed as hybrids generated by mixing the individual datasets for the different types of BMGs, such as the Zr-, La-, Fe-, and Ti-based datasets. Aside from the database, we emphasize that the utility of our GDLF also depends on the data descriptors.
According to the GAN literature 50, the distribution of the generated data can be interpreted as the probability density function of the training data. Therefore, if the data descriptors contain only the compositional information, such as the basic compositional descriptor (D1), the generated data are most likely to fall within the distribution of the training data; in other words, the generated compositions would simply look similar to those in the literature. Hence, to enhance the diversity of the generated data, we combine the basic compositional descriptor (D1) with other descriptors to form a compound descriptor (i.e., CD1). Based on the training results, one can find that the performance of our GDLF depends strongly on the data descriptors. While the empirical descriptors, such as the atomic size difference δ, the mixing enthalpy ΔHmix, and many others 25,28,33,37,38, are useful for GFA studies with previous supervised learning models, such as supervised classification 25,28,37 or supervised regression 25,51, we note that the use of these empirical descriptors is not straightforward for unsupervised generative machine learning. More often than not, these empirical descriptors lead to the collapse of our GDLF because the GAN cannot converge with them. This result is consistent with the fact that the use of the empirical descriptors leads to large data scattering, hampering the training performance of unsupervised generative models 46. This delivers a strong message that, although the empirical parameters are widely used for regression and classification models (supervised learning), one should be cautious about their direct use in generative machine learning (unsupervised learning). By comparison, the thermodynamic theory-guided descriptors (D3), as we recently proposed 27, prove to be very useful in improving the diversity of the data generated by our GDLF. Hence, by combining the compositional and theory-guided descriptors, our GDLF can generate BMG compositions that look different from the literature compositions. In addition, we emphasize that, according to ref. 36, the use of theory-guided data descriptors can improve the performance of ML models even with a dataset of relatively small size. This may explain why our GDLF also performs well on the Ti dataset, which contains only 23 compositions. By further including properties in the data descriptors, our GDLF can enable the inverse design of BMGs. In such a case, the well-trained generator produces composition-property mappings which allow one to pinpoint the candidate compositions that can be mapped to targeted properties, such as Tg, Tx, Tl, or their derivatives. Moreover, since our GDLF is capable of generating different types of BMGs, such as Zr-, La-, Fe-, and Ti-based BMGs and even hybrid BMGs, one can envision that other high-order systems may be designed based on our GDLF with the proper dataset. Based on the above discussions, we envision that the establishment of the current framework may pave the way for the design of compositionally complex BMGs at an accelerated pace unmatched by prior methods 25-28,33.

METHODS
Descriptor design
The compositional descriptor (D1) is calculated based on the atomic fractions of given elements. Specifically, we used the contents of Zr, Cu, Al, Ni, Ag, Ti, Nb, and Co as D1 for the Zr dataset; La, Al, Cu, Ni, Ag, Co, Ce, and Mg for the La dataset; Fe, B, C, Si, Mo, P, Y, Co, Cr, and Nb for the Fe dataset; Ti, Cu, Ni, Zr, Be, Sn, Si, and B for the Ti dataset; Zr, Ti, Cu, Ni, Al, Nb, Ag, and Hf for the hybrid dataset; and Zr, Ti, Cu, Al, Ni, Ag, Fe, and Nb, together with the thermal properties, for the Zr dataset used to train the GAN model for inverse design. The content of each element is normalized to the interval [0,1].

Fig. 4 The inverse design of BMGs based on the well-trained generator. a The sketch of the inverse design algorithm. b The distribution of the generated Tg, Tx, and Tl. c The plot of the generated Tg versus Tl, from which three candidate compositions are selected. d The XRD results of three ф 2 mm rods cast based on the selected compositions, indicative of an amorphous structure. e, f The DSC curves of S1, S2, and S3. g Comparison of the measured and generated values of Tg, Tx, and Tl.
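For illustration, a minimal sketch of D1 for the Zr dataset is given below; the element order follows the list above, and the simple division of atomic percentages by 100 is an assumption (the paper may instead use feature-wise min-max scaling).

```python
# Minimal sketch of the compositional descriptor D1 for the Zr dataset:
# the atomic fraction of each element in a fixed order, scaled into [0,1].
ZR_ELEMENTS = ["Zr", "Cu", "Al", "Ni", "Ag", "Ti", "Nb", "Co"]

def composition_descriptor(composition, elements=ZR_ELEMENTS):
    """composition: dict of atomic percentages, e.g. {'Zr': 49.8, 'Cu': 32.9, ...}."""
    return [composition.get(el, 0.0) / 100.0 for el in elements]
```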
The empirical descriptors (D2) are designed based on 13 empirical parameters 23,37. These include the mean atomic radius $\bar{a} = \sum_{i=1}^{n} c_i r_i$, the atomic size difference $\delta = \sqrt{\sum_{i=1}^{n} c_i (1 - r_i/\bar{a})^2}$, the average of the melting points of the constituent elements $\bar{T}_m = \sum_{i=1}^{n} c_i T_{mi}$, the standard deviation of the melting temperature $\sigma_T = \sqrt{\sum_{i=1}^{n} c_i (1 - T_{mi}/\bar{T}_m)^2}$, the average mixing enthalpy $\Delta H_{mix} = 4\sum_{i \neq j} c_i c_j H_{ij}$, the standard deviation of the mixing enthalpy $\sigma_{\Delta H} = \sqrt{\sum_{i \neq j} c_i c_j (H_{ij} - \Delta H_{mix})^2}$, the ideal mixing entropy $S_{id} = -k_B \sum_{i=1}^{n} c_i \ln c_i$, the mean electronegativity $\bar{\chi} = \sum_{i=1}^{n} c_i \chi_i$, the standard deviation of the electronegativity $\sigma_{\chi} = \sqrt{\sum_{i=1}^{n} c_i (\chi_i - \bar{\chi})^2}$, the valence electron concentration $VEC = \sum_{i=1}^{n} c_i VEC_i$, the standard deviation of the valence electron concentration $\sigma_{VEC} = \sqrt{\sum_{i=1}^{n} c_i (VEC_i - VEC)^2}$, the mean bulk modulus $\bar{K} = \sum_{i=1}^{n} c_i K_i$, and the standard deviation of the bulk modulus $\sigma_K = \sqrt{\sum_{i=1}^{n} c_i (K_i - \bar{K})^2}$. Here, n is the number of elements; $c_i$ is the atomic fraction of element i; $r_i$, $T_{mi}$, $\chi_i$, $VEC_i$, and $K_i$ are the atomic radius, melting temperature, electronegativity, valence electron concentration, and bulk modulus of element i, respectively; $H_{ij}$ is the mixing enthalpy between elements i and j; and $k_B$ is the Boltzmann constant.
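The following sketch evaluates a representative subset of these empirical parameters directly from the formulas above; the elemental property tables (radius, t_melt, chi, vec) are hypothetical inputs supplied by the caller.

```python
# Hedged sketch of a few of the 13 empirical descriptors (D2). `c` is a dict of
# atomic fractions summing to 1; the look-up tables hold elemental properties.
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K (unit choice is an assumption)

def empirical_descriptors(c, radius, t_melt, chi, vec):
    a_bar = sum(ci * radius[el] for el, ci in c.items())                 # mean atomic radius
    delta = math.sqrt(sum(ci * (1 - radius[el] / a_bar) ** 2 for el, ci in c.items()))
    tm_bar = sum(ci * t_melt[el] for el, ci in c.items())                # mean melting point
    s_id = -K_B * sum(ci * math.log(ci) for ci in c.values() if ci > 0)  # ideal mixing entropy
    chi_bar = sum(ci * chi[el] for el, ci in c.items())                  # mean electronegativity
    sigma_chi = math.sqrt(sum(ci * (chi[el] - chi_bar) ** 2 for el, ci in c.items()))
    vec_bar = sum(ci * vec[el] for el, ci in c.items())                  # mean VEC
    return {"a": a_bar, "delta": delta, "Tm": tm_bar, "S_id": s_id,
            "chi": chi_bar, "sigma_chi": sigma_chi, "VEC": vec_bar}
```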
The theory-guided descriptors (D3) are calculated based on physical models considering the effects of atomic size 27,52 (five descriptors), local chemical affinity 53 (two descriptors), and non-ideal mixing entropy 54 (one descriptor). According to the geometrical model proposed by Ye et al. 52, the stored elastic energy can be expressed as $u_e = \sum_{i=1}^{n} \frac{9}{2} \alpha_i \beta_i c_i \varepsilon_i^2$, where $\alpha_i$ and $\beta_i$ are constants and $\varepsilon_i$ is the residual strain originating from the atomic size difference (see ref. 52). Here, we first use the root-mean-square residual strain $\varepsilon_{RMS} = \sqrt{\sum_{i=1}^{n} c_i \varepsilon_i^2}$ as one descriptor and then generalize the derivation by considering the effects of the coordination number deficiency $\delta N / N$ and the chemical fluctuation $\delta c$. By minimizing the stored elastic energy $u_e(\delta N/N, \delta c)$ subject to the three constraints $\sum_{i=1}^{n} c_i \varepsilon_i = 0$, $\delta N_i / N_i = 0$, and $\sum_{i=1}^{n} \delta c_i = 0$, we obtained four scalar descriptors, namely the mean value and standard deviation of $\delta N / N$ and of $\delta c$ (see our previous work 27 for details). The two descriptors derived from the local chemical affinity 27,53 are the standard deviation of the cohesive energies of the constituent elements and the averaged interaction energy, where n is the number of constituent elements, $\varepsilon_{ii}$ is the cohesive energy of atom i, and $\varepsilon_{ij}$ is the interaction energy between atoms i and j. Based on the work of He et al. 54, the correlated configurational entropy can be expressed in terms of the average bulk modulus K, the average atomic volume V, and the average mixing enthalpy $\Delta H_{ij}$ between elements i and j (see ref. 54 for the full expression).

Wasserstein distance
The data matrix X was first normalized to [0,1] feature-wise, and the Wasserstein distance between two data matrices was then calculated via Sinkhorn iterations using the PyTorch package in Python. If the two data matrices are identical, the Wasserstein distance is 0; a larger Wasserstein distance between two distributions indicates a larger difference between them.
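A minimal sketch of such a calculation is shown below; the entropic regularization strength, iteration count, and squared Euclidean cost are assumptions, since the paper only states that PyTorch-based Sinkhorn iterations were applied to [0,1]-normalized data.

```python
# Hedged sketch: entropic-regularized Wasserstein distance between two data
# matrices via log-domain Sinkhorn iterations in PyTorch.
import torch

def sinkhorn_distance(x, y, eps=0.01, n_iters=200):
    """x: (n, d) float tensor, y: (m, d) float tensor, both scaled to [0,1] feature-wise."""
    n, m = x.shape[0], y.shape[0]
    cost = torch.cdist(x, y, p=2) ** 2      # pairwise squared Euclidean cost (assumption)
    mu = torch.full((n,), 1.0 / n)          # uniform weights on the samples of x
    nu = torch.full((m,), 1.0 / m)          # uniform weights on the samples of y
    u = torch.zeros(n)
    v = torch.zeros(m)
    for _ in range(n_iters):                # alternating dual-potential updates
        u = eps * (torch.log(mu) - torch.logsumexp((v[None, :] - cost) / eps, dim=1))
        v = eps * (torch.log(nu) - torch.logsumexp((u[:, None] - cost) / eps, dim=0))
    pi = torch.exp((u[:, None] + v[None, :] - cost) / eps)  # approximate transport plan
    return torch.sum(pi * cost)             # entropic approximation of the transport cost
```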

Generative adversarial network (GAN)
The GAN model is implemented in Matlab R2020a with the Statistics and Machine Learning Toolbox and the Deep Learning Toolbox (https://github.com/jonzhaocn/GAN-Base-on-Matlab). The generator contains three layers: an input layer with 100 neurons, a hidden layer with 1000 neurons, and an output layer with the number of neurons matching the size of the data descriptor. The discriminator also contains three layers: an input layer with the number of neurons matching the size of the data descriptor, a hidden layer with 1000 neurons, and an output layer with a single neuron. Both the generator and the discriminator are fully connected networks. In the hidden layers, we use the Leaky ReLU activation function and, in the output layers, the Sigmoid function $y = 1/(1 + e^{-x})$. The network was trained with the adaptive moment estimation (Adam) optimizer and a learning rate of 0.001.
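For readers working in Python, the architecture described above can be sketched as follows; this is not the authors' Matlab implementation, and everything beyond the stated layer widths, activations, optimizer, and learning rate is an assumption.

```python
# Hedged PyTorch sketch of the described fully connected GAN architecture.
import torch
import torch.nn as nn

def build_gan(descriptor_dim, latent_dim=100, hidden_dim=1000, lr=1e-3):
    generator = nn.Sequential(
        nn.Linear(latent_dim, hidden_dim), nn.LeakyReLU(),
        nn.Linear(hidden_dim, descriptor_dim), nn.Sigmoid(),  # outputs in [0,1], matching the scaled descriptors
    )
    discriminator = nn.Sequential(
        nn.Linear(descriptor_dim, hidden_dim), nn.LeakyReLU(),
        nn.Linear(hidden_dim, 1), nn.Sigmoid(),               # probability that a sample is real
    )
    opt_g = torch.optim.Adam(generator.parameters(), lr=lr)   # adaptive moment estimation
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=lr)
    return generator, discriminator, opt_g, opt_d
```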

Training of the surrogate model
The surrogate model, or the MG classifier, is trained on ~5000 MGs and ~2000 non-MGs reported to date (https://citrination.com/datasets/198590). To keep the balance between MGs and non-MGs, we apply the synthetic minority oversampling technique (SMOTE) to oversample the non-MGs by 150%. The 8 theory-guided descriptors are used as the data descriptor for classification. The classifier is trained with the Classification Learner app in Matlab R2020a with the Statistics and Machine Learning Toolbox. The model type is set to Boosted Trees and the other parameters are kept at their default values to achieve the best training results. 10-fold cross-validation is applied to avoid overfitting.
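A Python analogue of this training pipeline (not the authors' Matlab workflow) might look like the sketch below; the SMOTE ratio and the scikit-learn estimator stand in for the stated 150% oversampling and the Matlab Boosted Trees model.

```python
# Hedged sketch: oversample the minority class and fit a boosted-trees classifier
# on the 8 theory-guided descriptors. X: (n_samples, 8); y: 1 = MG, 0 = non-MG.
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

def train_surrogate(X, y):
    X_res, y_res = SMOTE(sampling_strategy=0.75).fit_resample(X, y)  # oversampling ratio is an assumption
    clf = GradientBoostingClassifier()
    scores = cross_val_score(clf, X_res, y_res, cv=10)               # 10-fold cross-validation
    clf.fit(X_res, y_res)
    return clf, scores.mean()
```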

Hierarchical clustering tree
We use the compositional and elemental information, including the atomic fraction, group, period, atomic size, and electronegativity of each constituent element, as the data descriptors for hierarchical clustering. Each descriptor is normalized to [0,1]. The hierarchical clustering is constructed using the "ggtree" and "ape" packages in R, and the iTOL tool (https://itol.embl.de/) is used to generate the layout of the radial hierarchical clustering tree.
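An equivalent clustering step can be sketched in Python as follows; the Ward linkage and the cluster count are assumptions, since the paper reports only the R/iTOL toolchain and the resulting subtype counts.

```python
# Hedged sketch: agglomerative clustering on [0,1]-scaled compositional/elemental descriptors.
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.preprocessing import MinMaxScaler

def cluster_compositions(X, n_clusters=13):
    """X: (n_alloys, n_descriptors) array of compositional and elemental features."""
    X_scaled = MinMaxScaler().fit_transform(X)   # normalize each descriptor to [0,1]
    Z = linkage(X_scaled, method="ward")         # hierarchical clustering tree (linkage assumed)
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```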

Principal component analysis (PCA)
The data matrix X is first normalized to [0,1] for each data descriptor (each column), and PCA is then performed with the Matlab function pca(X). The function returns the principal components used for the 2D plots via the KSFE.
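A Python counterpart of this step is sketched below, assuming scikit-learn in place of the Matlab pca(X) call.

```python
# Hedged sketch: project the [0,1]-scaled data onto the first two principal components.
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler

def pca_scores(X, n_components=2):
    X_scaled = MinMaxScaler().fit_transform(X)   # column-wise normalization to [0,1]
    return PCA(n_components=n_components).fit_transform(X_scaled)
```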

Kernel smoothing function estimate (KSFE)
The KSFE is performed using the Matlab function ksdensity(x), which returns a probability density estimate for univariate or bivariate data x.
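A Python counterpart of ksdensity for the bivariate case is sketched below; the Gaussian kernel and grid resolution are assumptions.

```python
# Hedged sketch: Gaussian kernel density estimate evaluated on a (PC1, PC2) grid.
import numpy as np
from scipy.stats import gaussian_kde

def density_on_grid(scores, grid_size=100):
    """scores: (n, 2) array of PC1/PC2 coordinates; returns grid coordinates and density."""
    kde = gaussian_kde(scores.T)
    xi = np.linspace(scores[:, 0].min(), scores[:, 0].max(), grid_size)
    yi = np.linspace(scores[:, 1].min(), scores[:, 1].max(), grid_size)
    xx, yy = np.meshgrid(xi, yi)
    zz = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)
    return xx, yy, zz
```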

Arc-melting and Cu-mold casting
We use pure metals, including Zr, Cu, Ni, Al, Ti, Hf, Ag, Nb, and Fe (purity >99.95%), to prepare the rod samples. A lab-scale arc-melting furnace is used to melt the metals into alloy ingots in an Ar atmosphere. To avoid possible oxidation, the furnace is pre-evacuated to 8 × 10−4 Pa and a Ti getter is used. The melted alloy ingots are then cast into a ф 2 mm Cu mold.