Introduction

Machine learning is a powerful tool that has become an important complement to experiment, theory, and modeling1,2,3,4,5,6. It has been widely used in materials research to mine composition–processing–property relationships: e.g., predicting compound formation energies7,8,9, superconductors' critical temperatures10,11, alloy phases12,13,14,15,16, and materials properties17,18,19,20,21,22,23,24. However, many challenges arise when applying machine learning to new materials development25,26,27. It is common that only a small dataset is available for a specific task. In materials science it is impossible to assemble big datasets like those in internet and e-commerce applications, even though the materials genome initiative28,29, high-throughput computing, and high-throughput experiments have increased the speed of data generation by dozens to hundreds of times30,31.

With a set of suitable descriptors, conventional machine learning can perform very well even with a small dataset. However, the optimal set of descriptors for a specific job in materials research is not available off the shelf. It is selected by trial and error, and adding new pertinent descriptors is considered whenever a model's performance does not meet requirements32. Building new applicable descriptors entails a deep understanding of mechanisms, which is very challenging when developing new materials. For example, Ward et al.7 first used 145 general-purpose Magpie descriptors (descriptive statistics, e.g., average, range, and variance of the constituent elements' properties) in predicting ternary amorphous ribbon alloys (AMRs). Later they used 210 descriptors (the 145 Magpie descriptors plus new descriptors derived from physical models and empirical rules developed by the amorphous alloys community, e.g., cluster packing efficiency and formation enthalpy) in optimizing Zr-based bulk metallic glasses (BMGs)33. Some descriptors derived from physical models and empirical rules are sensitive to alloying and temperature, and obtaining their precise values is difficult; calculating them with simplified models (e.g., using the ideal solution model to estimate alloy mixing enthalpy instead of the Miedema model or experimental results) might weaken the final machine learning model's performance.

How to fully exploit limited data, existing models, and domain expertise is the key to efficiently applying machine learning in materials research, and general and transferable machine learning frameworks are urgently needed. Transfer learning is a machine learning technique that enables models to achieve high performance on small datasets through knowledge sharing between models in related domains34,35,36. Deep learning is an end-to-end approach that combines automatic feature extractors and conventional machine learning models (as regressors or classifiers) into one model37. Deep learning has an advantage over conventional machine learning in exploiting transfer learning because its feature extractors can easily be reused in related tasks.

Predicting the phases of a material, e.g., solid solution phases with simple BCC/FCC/HCP structures, intermetallics with complex structures, metastable amorphous phases, and mixtures of different phases, is a basic task and a fundamental challenge of materials research. AMRs and BMGs extended metallic materials from the conventional crystalline (CR) state to the amorphous state38,39,40,41,42; high-entropy alloys (HEAs), also known as multi-principal element alloys (MPEAs) and concentrated solid solution alloys, extended metallic materials from the corner and edge regions to the center regions of multi-component phase diagrams43,44,45,46,47. Predicting these phases challenges classical theory48. Researchers have attempted to predict alloys' glass-forming ability (GFA) and HEAs' phases using empirical thermo-physical parameters49,50,51,52,53, the CALPHAD method54,55, and first-principles calculations56. Conventional machine learning has been used in these tasks as well7,13,14,15,16,33. However, developing new amorphous alloys and HEAs by design is still quite challenging, because the underlying mechanisms remain unclear and far fewer data are available than for conventional materials, e.g., steels and aluminum alloys.

In this work, we propose a general and transferable deep learning (GTDL) framework to predict phase formation in materials with small datasets and unclear transformation mechanisms. Case studies on GTDL predictions with a medium-sized dataset (10,000+ pieces of data) for GFA and a small dataset (only 355 pieces of data) for HEAs demonstrate that the GTDL framework outperforms existing models based on manual features, that periodic table knowledge embedded in the data representation helps to make predictions, and that knowledge shared between different models enables prediction with small datasets. The proposed GTDL framework can easily be applied to new materials development with small datasets by exploiting deep learning models trained on big datasets of related materials.

Results

GTDL framework

The pipeline of this work and schematics for transfer learning, etc. are shown in Fig. 1a. Because deep learning accepts unstructured data, e.g., images and audio, as input, we first mapped raw data, e.g., chemistry and processing parameters, to pseudo-images using special two-dimensional (2-D) structures. Convolutional neural networks (CNNs) were then utilized to automatically extract features through their hierarchical structure and to perform classification/regression. The well-trained feature extractors, i.e., the convolutional layers, were reused directly for new tasks with small datasets. Here, we used a whole periodic table containing 108 elements for composition mapping (periodic table representation, PTR). To bring processing parameters into the representation, we mapped them to an unused area of the periodic table (see Supplementary Fig. 1). An example of the PTR for alloy Fe73.5Cu1Nb3Si13.5B9 is given in Fig. 1a. We compared models using different mappings without the periodic table structure (see Supplementary Figs. 2 and 3), e.g., the atom table representation10, to demonstrate the advantage of the embedded periodic table structure. We also compared our models with conventional machine learning models using manual feature engineering (see the full list of features in Supplementary Table 1) to validate the convenience of automatic feature engineering. The workflow of conventional machine learning is also shown in Fig. 1a. A clear advantage of the deep learning framework over conventional machine learning is that it can automatically extract features and transfer knowledge.

Fig. 1: The workflow of this work.

a The workflow of the proposed GTDL framework (green solid arrows) and conventional machine learning (black dotted arrows), which lacks the ability to automatically extract features and transfer knowledge. Schematics for assembling the dataset, data representation, machine learning, knowledge transfer, and an example of the PTR (periodic table representation) are given. MF, SNN, RF, SVM, and CNN denote manual features, shallow neural network, random forest, support vector machine, and convolutional neural network, respectively. In the GTDL framework, raw data are first mapped to 2-D pseudo-images, features are then extracted automatically by convolutional layers, and knowledge is transferred by sharing the well-trained feature extractors with new tasks with small datasets. b The schematic of our VGG-like convolutional neural network.

Many classical CNN structures for image recognition are available. However, we needed to simplify and compress those structures to reduce the risk of overfitting the limited data in our tasks. We tested several simplified classical CNNs, e.g., AlexNet57, VGG58, GoogLeNet, and the Inception module59. The VGG-like CNN shown in Fig. 1b was used in our work for its very compact structure and strong feature extraction power. Our VGG-like CNN has 6274 trainable parameters, only 1% of the size of the atom table CNN10 (611,465 trainable parameters). Thus, it effectively reduces the risk of overfitting.

Predicting GFA using GTDL

The GFA of an alloy, i.e., the critical cooling rate below which the alloy melt undergoes nucleation and growth and forms crystals (CR), is a core problem in developing new amorphous alloys. However, it is challenging to measure the critical cooling rate experimentally. Researchers often simplify GFA into three levels: BMG, AMR, and CR, which correspond to strong, weak, and no GFA, respectively13. The GFA of an alloy can be roughly evaluated through melt spinning (cooling rate in the range of \(10^6\)–\(10^5\) K s\(^{-1}\)) and copper mold casting (cooling rate in the range of \(10^2\)–1 K s\(^{-1}\)): if an alloy forms a crystalline state under melt spinning, it is labeled CR (no GFA); if it forms an amorphous state through melt spinning but a crystalline state under copper mold casting, it is labeled AMR (weak GFA); if it forms an amorphous state under copper mold casting, it is classified as BMG (strong GFA).

In this work, we tried to assemble a GFA dataset as large as possible. Our dataset includes Sun's binary alloy GFA dataset (about 3000 entries)13, Ward's ternary alloy GFA dataset (about 6000 entries)7 and BMG dataset (about 800 entries)33, and Miracle's GFA dataset (about 300 entries)60. In those datasets, crystalline alloy data are in the minority, because AMRs and BMGs are the focus of research, and crystalline alloys are commonly discarded and left unpublished as failed experimental results. In reality, the number of amorphous alloys is smaller than that of their crystalline counterparts. To compensate for this weakness and increase the variety of crystalline data in our dataset, we added 800+ pieces of data on conventional crystalline metallic materials (including steels, superalloys, and Co, Al, Mg, Cu, Zn alloys, etc.) extracted from https://www.makeitfrom.com/. Figure 2 shows the statistics of the element distribution in our dataset (for detailed statistics see Supplementary Figs. 5 and 6). Our dataset contains 97 elements of the periodic table, and many of these elements are present simultaneously in entries of CRs, AMRs, and BMGs. Considering that some AMRs in our dataset are actually BMGs (due to incomplete records and experiments), we did not simply treat the problem as a ternary (CR/AMR/BMG) classification. Instead, a processing parameter (0 represents rapid-solidification melt spinning, and 100 represents copper mold casting at a normal cooling rate) was added to convert the ternary classification problem into a binary (AM/CR) classification problem (AM represents forming an amorphous state, and CR represents forming a crystalline state). The size of our original dataset is 10,440, and the size of the dataset after conversion is 16,250.
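To make the conversion concrete, a minimal sketch is given below; the entry format and the expansion rules for each GFA level are our reading of the description above, so treat them as assumptions rather than the original implementation.

```python
# Sketch: expand (composition, CR/AMR/BMG) entries into
# (composition, process, AM/CR) entries. Process code: 0 = melt
# spinning, 100 = copper mold casting, as defined in the text.
def ternary_to_binary(entries):
    converted = []
    for composition, gfa in entries:
        if gfa == "CR":      # no GFA: crystalline even under melt spinning
            converted.append((composition, 0, "CR"))
        elif gfa == "AMR":   # weak GFA: amorphous ribbon, crystalline casting
            converted.append((composition, 0, "AM"))
            converted.append((composition, 100, "CR"))
        elif gfa == "BMG":   # strong GFA: amorphous under both routes
            converted.append((composition, 0, "AM"))
            converted.append((composition, 100, "AM"))
    return converted
```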

Fig. 2: Statistics of our glass-forming ability dataset.

The occurrence numbers of the elements in the dataset are shown on a periodic table background. The blank squares, e.g., squares for noble gases, indicate elements not in the dataset.

Table 1 shows the average training and testing accuracies of four shallow neural networks (SNNs) and three CNNs in 10-fold cross-validation. SNN1 (using only 14 features derived from empirical rules of the BMG community) and SNN4 (using the 145 general-purpose Magpie descriptors7) show the lowest testing accuracy of about 90%. They show a marginal difference in accuracy from Ward's random forest model (89.9% vs. 90%). SNN4 and Ward's random forest model used the 145 general-purpose Magpie features, while SNN1 used only 14 features (including one processing parameter, the mixing entropy, and statistical information on atomic radius, Pauling electronegativity, bulk modulus, and work function). We found that adding features, or even using the full list of features (see Supplementary Table 1), did not improve accuracy. SNN2 used only the composition vector as input, but it showed higher accuracy than SNN1 and SNN4. SNN3 used the manual feature vector plus the composition vector as input, which improved the accuracy further. Due to our limited understanding of the physical mechanisms of GFA and the lack of precise property data as input (e.g., the ideal solution model and the Miedema model were used to calculate alloy mixing entropy and mixing enthalpy, respectively), improving the model accuracy by adding more pertinent features is impracticable. All four SNNs show lower accuracies than the three CNNs. Besides their accuracy advantage over SNNs, CNNs are also quite convenient to use, for they only need compositions and processing parameters as input and automatically extract features through their convolutional layers. CNN3, the CNN with PTR, shows the highest testing accuracy of 96.3%. The only difference among the three CNNs is that the data representations of CNN1 and CNN2 lack the periodic table structure. The advantage of CNN3 over CNN2 and CNN1 here is modest (only 1.3% higher). However, we will demonstrate that CNN3 has a more obvious advantage over the other models in predicting unseen alloys, i.e., better generalization.

Table 1 Comparison of average accuracy among different models under 10-fold cross-validation.

The Al–Ni–Zr ternary system has 296 entries in our dataset (including 186 entries from the Al–Ni–Zr ternary system and 110 entries from the Al–Ni, Al–Zr, and Ni–Zr binary systems), and the distribution of data points is relatively uniform in composition space; see Fig. 3a for the ground truth of the Al–Ni–Zr system. The Al–Ni–Zr system is therefore quite suitable for validating and comparing models. Figure 3b–d shows the GFA predictions of CNN3, SNN3, and CNN2. CNN3 successfully predicted all three amorphous composition areas, and the shapes and boundaries of these areas are satisfactory when compared with the ground truth. The other models did not predict all three areas. SNN3 did not predict the crystalline area between two amorphous composition areas, i.e., it overestimated the GFA of that area. CNN2 successfully predicted two amorphous composition areas but missed the small amorphous composition area near the Ni corner. All models correctly predicted the five BMGs in the ground truth, and the predicted BMGs cover a certain area (not just discrete points) around the ground-truth points. This is reasonable: researchers commonly report only the optimal BMGs, and BMG candidates (especially those found before the appearance of BMGs) are archived as AMR data. This sparse and nonuniform distribution of BMG data points usually causes them to be buried by the surrounding densely distributed AMR data points and dismissed as noise (see Fig. 3a). That is why we converted the ternary classification into a binary classification, i.e., ternary classification easily underestimates alloys' GFA.

Fig. 3: Comparison of experimental data and predictions.

a Experimental data points of the Al–Ni–Zr ternary system in our dataset and the predictions of b CNN3, c SNN3, and d CNN2.

To validate the models' predictive ability on unseen alloy systems, we carried out leave-one-system-out (LOSO, similar to the leave-one-cluster-out cross-validation used by Meredig et al.61) cross-validation on the 160 ternary systems that have over 40 entries in our dataset. In LOSO cross-validation for a ternary system A–B–C, the entries of the A–B, A–C, and B–C binary alloys and the A–B–C ternary alloys were held out as the testing dataset, and models were trained on the remaining data. The average testing accuracies of SNN4, CNN2, and CNN3 under LOSO cross-validation are shown in Table 2. CNN3 outperforms CNN2 and SNN4 in predicting unseen alloy systems by about 7%.
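A minimal sketch of the LOSO split is given below; the per-entry "elements" field is a hypothetical data format used only for illustration.

```python
# Sketch of a leave-one-system-out (LOSO) split: hold out every alloy whose
# elements are a subset of the target system (its binaries and the ternary).
def loso_split(entries, system):
    # system: the held-out ternary, e.g., {"Al", "Ni", "Zr"}
    train, test = [], []
    for entry in entries:
        if set(entry["elements"]) <= set(system):
            test.append(entry)    # A-B, A-C, B-C binaries and A-B-C ternaries
        else:
            train.append(entry)
    return train, test
```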

Table 2 Comparison of models’ prediction accuracy on unseen alloy systems.

Table 2 also shows the LOSO cross-validation results for the Al–Ni–Zr system. Here, we used the Al–Ni–Zr AMR results in Fig. 3b (5151 composition points in total, not the 296 Al–Ni–Zr entries in our dataset) as the ground truth for calculating prediction accuracy. The predictions of CNN2 and CNN3 are shown in Fig. 4. CNN3 outperforms the other CNNs and the SNNs in accuracy by at least 12% when no Al–Ni–Zr data are in the training dataset.

Fig. 4: Predictions for the Al–Ni–Zr ternary system by the re-trained models.

a CNN2 and b CNN3, trained on a dataset from which the data on Al–Ni, Al–Zr, and Ni–Zr binary alloys and Al–Ni–Zr-containing multi-component alloys were removed.

To further validate the generalization of the models, we collected some newly reported BMG alloys and some specially selected alloys outside our dataset, e.g., high-temperature Ir–Ni–Ta–(B) BMGs62, Mg–Cu–Yb BMGs63, sulfur-bearing BMGs64, RE-bearing alloys RE6Fe72B22 (RE: Sc, Y, La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er)65, and 18 binary alloy outliers identified by empirical criteria66. Our dataset has only one ternary AMR data point for the Ir–Ni–Ta–(B) system and no data on the Mg–Cu–Yb system or on sulfur-bearing AMRs and BMGs. Rare earth elements have closely similar physical and chemical properties; however, experimental results show that the simple substitution of one rare earth element for another changes the GFA of RE6Fe72B22 alloys. Louzguine-Luzgin reported 18 binary alloy outliers that should be good glass-formers according to empirical criteria but cannot form an amorphous state even under rapid solidification. Table 2 shows that CNN3, with PTR plus automatic feature engineering, attained the highest predictive ability, while SNN3 and SNN4, based on manual feature engineering, performed the worst. The performances of CNN1 and CNN2, which use automatic feature engineering but lack the periodic table structure in their data representations, lie between those of SNN3 and CNN3. The detailed comparisons are shown in Supplementary Tables 5 and 6. These rigorous tests strongly verify that CNN3 can be used to predict the GFA of unassessed alloy systems.

Overall, when the dataset is large enough (e.g., for the Al–Ni–Zr system), the benefit of adding the periodic table structure (domain expertise) to the representation is not obvious. When data are insufficient or unavailable, domain expertise is vital. The periodic table structure plus a CNN, as in CNN3, brings the convenience of automatic feature engineering and improves generalization by introducing background knowledge.

Transfer learning of HEAs with a small dataset using GTDL

The well-trained deep learning models for GFA can be reused in predictions for related materials, e.g., HEAs. All previous machine learning studies on HEAs used manual feature engineering plus conventional machine learning models, e.g., support vector machines14 and SNNs15. These models need sophisticated features as input and can only distinguish BCC from FCC or differentiate intermetallics from solid solutions. Tasks like predicting HEAs with HCP structure are rather difficult due to limited data. The two machine learning tasks, i.e., predicting GFA and predicting the phases of HEAs, have different output domains (amorphous/crystalline binary classification for GFA prediction and five phase labels for HEA prediction) but highly correlated (or overlapping) input domains from the viewpoint of transfer learning: Figs. 2 and 5b show that the common elements in these alloys are similar; some amorphous alloys are also HEAs; and the descriptors developed in conventional machine learning for GFA and HEAs are shared7,13,14,15,16,33 (e.g., atomic size difference, mixing enthalpy, mixing entropy, difference in Pauling electronegativities, and valence electron concentration). We therefore believe that the automatic feature extractors of the well-trained CNNs, which outperformed known manual features in GFA prediction, will work in HEA prediction too. Based on these features, we built a high-performance model from a small dataset that can discriminate five types of phases (BCC, FCC, HCP, amorphous, and mixtures of multiple phases) in HEAs in one go.

Fig. 5: Statistics of the 355 HEAs in the dataset.

a The numbers of binary to nonary HEAs and the proportions of different phases. b The occurrence numbers of the elements in the dataset are shown on a periodic table background. The blank squares, e.g., squares for noble gases, signify elements not in the dataset.

Here, we used the dataset from Gao's review on HEAs51, which collects data on 355 experimentally synthesized HEAs. Therein, 41 samples have a single BCC phase, 24 samples a single FCC phase, 14 samples a single HCP phase, and 59 samples a single amorphous phase; the remaining 217 samples contain multiple phases. The numbers of binary to nonary MPEAs and the proportions of BCC, FCC, HCP, amorphous, and multiple phases are shown in Fig. 5a. Most samples consist of five or six elements, and single-phase HEAs account for only a small fraction. There are 50 elements in the dataset, and their occurrence frequencies are shown in Fig. 5b. The elements Fe, Ni, Cr, Co, and Al occur in more than 190 samples, while Sc, Tc, Ga, Ge, and Tm occur only once. It is rather difficult to build machine learning models from such a small dataset with so many elements and an unbalanced data distribution.

In transfer learning from GFA to HEAs, the 2-D representations of the HEAs' compositions were fed into the well-trained CNN1, CNN2, and CNN3, and the intermediate results (high-dimensional features yielded by the convolutional layers) of these CNNs were extracted. These features were then used as input to a new classifier (here a random forest, for its good interpretability and its need for very little hyperparameter optimization). A stratified data-division strategy (to ensure that the training and testing datasets have similar data distributions) and the scikit-learn package were used in training. Table 3 shows the average scores of our transfer learning models on the HEA dataset under fivefold cross-validation. Without resorting to any manual feature engineering, our model is capable of distinguishing BCC, FCC, HCP, amorphous, and multiple-phase mixtures with fivefold cross-validation scores (average accuracy/recall/precision/F1 on the testing datasets) over 94%. We should bear in mind that when the label distribution is unbalanced, as in our HEA data, achieving high recall, high precision, and high accuracy at the same time is very difficult. The model transferred from CNN3 has the highest scores, which indicates that PTR is also beneficial for transfer learning. Our previous results and other research67 show that if the dataset is not big enough, domain knowledge is important for a model's performance. Though the raw data (alloy compositions) are the same for CNN1, CNN2, and CNN3, their direct inputs (data representations) and the information extracted by the corresponding feature extractors are different. Domain knowledge (the periodic table structure) was embedded in CNN3's input and embodied in the features extracted, while CNN1 and CNN2 did not have access to this knowledge. The proposed transfer learning model is an upgrade over conventional machine learning relying on manual feature engineering and could serve as an effective guide for designing new HEAs.
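The transfer step can be sketched as follows; the trained-model handle `cnn`, the layer name `"flatten"`, and the variables `X_hea`/`y_phase` are illustrative assumptions, not names from the original code.

```python
# Sketch: reuse the trained GFA CNN's convolutional stack as a fixed
# feature extractor, then score a random forest on the extracted features.
from tensorflow import keras
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

def transfer_scores(cnn, X_hea, y_phase):
    # Cut the network at the flattened output of the last pooling layer.
    extractor = keras.Model(inputs=cnn.input,
                            outputs=cnn.get_layer("flatten").output)
    features = extractor.predict(X_hea)   # X_hea: (355, 9, 18, 1) PTR images
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # stratified
    return cross_validate(clf, features, y_phase, cv=cv,
                          scoring=("accuracy", "recall_macro",
                                   "precision_macro", "f1_macro"))
```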

Table 3 The scores of transfer learning on the high-entropy alloy dataset under fivefold cross-validation with three data representations.

Discussion

To explain why PTR and transfer learning are effective, we illustrate the information that the CNNs automatically extract from the different representations. Visualizing the high-dimensional features extracted by the convolutional layers, i.e., the intermediate results of the CNNs, is a good way to explore the extracted features. However, finding visual and intuitive relationships between elements in these high-dimensional features (see Supplementary Fig. 10) is still very challenging, so dimensionality reduction is necessary. The high-dimensional features were compressed by principal component analysis (PCA), and the first two/four principal components were visualized.
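A minimal sketch of this analysis is given below; `element_features`, a (108, d) array of convolutional-layer outputs for single-element inputs, is an assumed variable name.

```python
# Sketch: compress the CNN's element features with PCA and plot the
# projection onto the first two principal axes (cf. Fig. 6).
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_element_features(element_features):
    """element_features: (108, d) convolutional-layer outputs, one row per element."""
    pca = PCA(n_components=4)
    pcs = pca.fit_transform(element_features)
    print(pca.explained_variance_ratio_)   # variance ratio per principal axis
    plt.scatter(pcs[:, 0], pcs[:, 1])
    plt.xlabel("First principal component")
    plt.ylabel("Second principal component")
    plt.show()
```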

Figure 6a illustrates the knowledge of 108 elements extracted by the CNN with PTR, and it shows apparent periodic trends: elements from the 18 groups, the lanthanides (labeled group 19), and the actinides (labeled group 20) are clustered in different regions (marked with different colors); groups 1 to 18 are distributed along a semicircle in sequence; the lanthanides and actinides are distributed along two semicircles in atomic-number sequence; and the elements within one group are distributed from the inside to the outside of the semicircle in ascending atomic number. More than half of the elements in the periodic table have limited data in our dataset, and the halogens (group 17), noble gases (group 18), etc. are absent from it, yet their trends are consistent and reasonable. This indicates that the PTR transfers periodic table knowledge into the GFA model, i.e., background knowledge was absorbed by the machine learning model. Figure 6b illustrates the knowledge extracted by the CNN from the representation without the periodic table background: the model instead learned the randomized periodic table structure embedded in that data representation.

Fig. 6: Features analysis of the GFA prediction model after PCA.

Projection of the feature vectors of 108 elements onto the plane spanned by the first and second principal axes. The percentage represents the ratio of the variance along the principal axis direction. Elements are colored according to their elemental groups. a Periodic table representation. b Randomized periodic table representation. The superscripts 1–18 on the element symbols represent the elements' group numbers; superscripts 19 and 20 represent the lanthanides and actinides, respectively.

The periodic table embodies abundant physical and chemical knowledge (see Supplementary Fig. 4). Atomic radius, Pauling electronegativity, valence electron density, and other physicochemical properties vary periodically across the periodic table. When developing new amorphous alloys, the periodic table is often used as a map: similar-atom substitution and column substitution are common strategies for improving GFA. This spatial information, i.e., the elements' relative positions, is difficult to describe fully by manual feature engineering; the solution is to keep the periodic table structure in the representation. Materials properties originate from the behavior of electrons, and the periodicity of element properties in the periodic table originates from electron configurations. The electron configuration of an element can be inferred from its position in the periodic table: the abscissa and ordinate of an element in the PTR correspond to its group number (the number of outer-shell electrons) and its period number (the number of electron shells), respectively. A CNN extracts the spatial (or coordinate) information of the pixels in a 2-D representation through its convolutional layers. Thus, the knowledge of each element's group number, period number, and electron configuration in the PTR can be transferred into the features that the CNN automatically extracts. Element properties (such as atomic radius and Pauling electronegativity) are not explicitly provided in the PTR; however, the periodic variation of element properties along rows (periods) and columns (groups) is embedded in it. The element properties that CNN3 (PTR) learned from the GFA dataset vary periodically with atomic number (see Supplementary Fig. 11). In contrast, the element properties learned by CNN2 (randomized PTR) show no periodicity (see Supplementary Fig. 12).

This explains why CNN3 performs better than CNN1 and CNN2 in predicting new data: we provided different expertise to CNN1, CNN2, and CNN3, and domain knowledge is helpful for machine learning models with small datasets67. Adding the periodic table structure to the data representation affords models the ability to infer useful information from the periodic table when direct data are insufficient.

The features of the 355 HEAs generated by the GFA model are shown in Fig. 7. Alloys of the same phases tend to cluster in the diagrams. Based on the first and second principal features, we can intuitively distinguish stable BCC, FCC, HCP, and multi-phase alloys. Most alloys of metastable amorphous phases can be discriminated visually from the alloys of stable phases by the third and fourth principal features. This indicates that the transfer learning from GFA to HEAs is successful and justifies the high scores of our model for HEAs.

Fig. 7: The high-entropy alloys’ first four principal features generated by glass-forming ability model.

Alloys are colored according to their phases. The percentage represents the ratio of the variance along the principal axis direction.

In sum, CNNs absorb the domain knowledge (e.g., periodic table knowledge) embedded in the 2-D representation through learning. Periodic table knowledge and the PTR are beneficial for machine learning models with small datasets. The fact that the feature extractor of the GFA CNN generates appropriate features for HEA prediction underlies the success of the transfer learning.

Methods

Data representations

Raw data need to be converted into a one-dimensional (1-D) vector of features by manual feature engineering for conventional machine learning. This is a process of refining information and adding expertise to the data representation, and the performance of the final models relies on the quality of the data representations. The 1-D vector of features (attributes/descriptors) used as input in this work includes (a) statistical information on the components' properties, e.g., the maximum/minimum/average atomic radius, Pauling electronegativity, elemental bulk modulus, elemental work function, melting point, etc.; (b) the composition vector; and (c) parameters derived from empirical criteria, e.g., the mixing entropy \({\Delta}S_{\mathrm {mix}}\), mixing enthalpy \({\Delta}H_{\mathrm {mix}}\), atomic size difference \({\Delta}R\), electronegativity difference \({\Delta}{\upchi}\), and valence electron concentration VEC:

$${\Delta}S_{\mathrm {mix}} = - R\mathop {\sum}\limits_{i = 1}^n {c_i{\mathrm {ln}}c_i}$$
(1)
$${\Delta}H_{\mathrm {mix}} = \mathop {\sum}\limits_{i = 1,i \ne j}^n {4{\Delta}_{{\mathrm {mix}}}^{\mathrm {AB}}c_ic_j}$$
(2)
$${\Delta}R = \sqrt {\mathop {\sum}\limits_{i = 1}^n {c_i\left({1 - \frac{{r_i}}{{\bar r}}} \right)^2}\;}, \bar r = \mathop {\sum}\limits_{i = 1}^n {c_ir_i}$$
(3)
$${\Delta}\chi = \sqrt {\mathop {\sum}\limits_{i = 1}^n {c_i\left({\chi _i - \bar \chi } \right)^2} \;}, \bar \chi = \mathop {\sum}\limits_{i = 1}^n {c_i\chi _i}$$
(4)
$${\mathrm {VEC}} = \mathop {\sum}\limits_{i = 1}^n {c_i\left( {\mathrm {VEC}} \right)_i}$$
(5)

where ci is the atomic fraction of the ith component; \({\Delta}_{\mathrm {mix}}^{\mathrm {AB}}\) is the mixing enthalpy of alloy A–B; ri is Miracle's atomic radius of the ith component; \(\chi _i\) is the electronegativity of the ith component; \(\bar r\) is the average atomic radius of the components in the alloy; \(\bar \chi\) is the average electronegativity of the components in the alloy; \((\mathrm {VEC})_i\) is the valence electron concentration of the ith component; and VEC is the average valence electron concentration of the components in the alloy. The ri were taken from Miracle's paper68; the \({\Delta}_{\mathrm {mix}}^{\mathrm {AB}}\) were taken from Takeuchi's paper69; and the Pauling electronegativities, elemental bulk moduli, elemental work functions, etc. were taken from Guo's paper52.
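As a worked illustration of Eqs. (1)–(5), the sketch below computes these parameters for a composition dictionary; the property tables (radius, chi, vec, h_mix) are placeholders for the tabulated values of refs. 68, 69, and 52, and the unordered-pair convention in the enthalpy sum is our assumption.

```python
# Sketch: empirical thermo-physical parameters of Eqs. (1)-(5).
import math

R_GAS = 8.314  # gas constant, J mol^-1 K^-1

def empirical_params(comp, radius, chi, vec, h_mix):
    """comp: {element: atomic fraction}, fractions summing to 1;
    h_mix: {frozenset({A, B}): mixing enthalpy of alloy A-B}."""
    s_mix = -R_GAS * sum(c * math.log(c) for c in comp.values() if c > 0)
    r_bar = sum(c * radius[el] for el, c in comp.items())
    chi_bar = sum(c * chi[el] for el, c in comp.items())
    delta_r = math.sqrt(sum(c * (1 - radius[el] / r_bar) ** 2
                            for el, c in comp.items()))
    delta_chi = math.sqrt(sum(c * (chi[el] - chi_bar) ** 2
                              for el, c in comp.items()))
    h = sum(4 * h_mix[frozenset((a, b))] * ca * cb
            for a, ca in comp.items()
            for b, cb in comp.items() if a < b)   # each pair counted once
    vec_bar = sum(c * vec[el] for el, c in comp.items())
    return s_mix, h, delta_r, delta_chi, vec_bar
```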

A schematic diagram of the PTR for alloy composition and preparation process used in CNN3 is shown in Supplementary Fig. 1. The PTR mimics a digital image. Alloy composition and preparation process are mapped to a 2-D pseudo-image of 9 pixels × 18 pixels (162 pixels in total), where each square represents a pixel. The 108 blue squares correspond to the 108 elements in the periodic table; e.g., the first pixel/square in the first row stores the atomic percentage of hydrogen in an alloy. The 54 gray squares are the unused area of the periodic table. The alloy composition (in atomic percent) is mapped to the corresponding blue squares, and the preparation process (0 represents melt spinning and 100 represents copper mold casting) is mapped to a gray square (we arbitrarily chose the ninth pixel/square in the first row in this work). The remaining pixels/squares are set to 0. The randomized PTR used in CNN2 is almost the same as the PTR, except that the 108 elements were randomly placed in the periodic-table area (see Supplementary Fig. 2). The atom table representation used in CNN1 is a square image of 11 × 11 pixels; elements are placed in the atom table from left to right and from top to bottom according to atomic number (see Supplementary Fig. 3). The preparation process is mapped to the last pixel of the atom table, and the remaining unused pixels are set to 0.
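A minimal sketch of the PTR mapping follows; the element-to-pixel lookup table is abbreviated here, and the process-pixel position follows the arbitrary choice stated above.

```python
# Sketch: map an alloy composition plus a process code onto a 9x18 PTR image.
import numpy as np

# (row, col) pixel of each element in the 9x18 grid; abbreviated to the
# elements of the Fig. 1a example, out of 108 in total.
ELEMENT_PIXEL = {"H": (0, 0), "B": (1, 12), "Si": (2, 13),
                 "Fe": (3, 7), "Cu": (3, 10), "Nb": (4, 4)}
PROCESS_PIXEL = (0, 8)  # ninth pixel of the first row

def to_ptr(composition, process):
    """composition: {element: atomic percent}; process: 0 or 100."""
    img = np.zeros((9, 18), dtype=np.float32)
    for element, at_pct in composition.items():
        img[ELEMENT_PIXEL[element]] = at_pct
    img[PROCESS_PIXEL] = process
    return img[..., np.newaxis]  # add a channel axis for the CNN

# Example from Fig. 1a: Fe73.5Cu1Nb3Si13.5B9 prepared by melt spinning.
x = to_ptr({"Fe": 73.5, "Cu": 1, "Nb": 3, "Si": 13.5, "B": 9}, process=0)
```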

CNN structure

A VGG-like CNN was used to automatically extract features and make classifications. The structure of our VGG-like CNN (see the schematic in Fig. 1b) is as follows: the size of the convolutional filters was 3 × 3 for all three convolutional layers, and the stride was set to 1. The channel number doubles across the convolutional layers from 8 to 16 to 32. The input of each convolutional layer was padded by adding zeros around the border, i.e., a zero-padding of one, to preserve as much information as possible. The most common type of convolution with ReLU activation was used, and the values of the filters were learned during training. The CNN consists of two parts. One part is the feature extractor, comprising three blocks of convolutional, pooling, and ReLU (Rectified Linear Unit) layers, the latter applying the nonlinear activation function f(x) = max(0, x). The other part is the classifier, with one fully connected layer and one softmax classification layer. The details of the VGG-like CNN are given in Supplementary Table 4 and Supplementary Fig. 7. Owing to the limited dataset size and the small input images, our CNNs have far fewer layers, channels, and trainable parameters (about 6000) than the well-known VGG-16 (ref. 58) (about 133 million).
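The sketch below assembles a model under the constraints stated above; the pooling sizes and the optimizer are assumptions (the exact layer settings are in Supplementary Table 4), so its trainable-parameter count only approximates the 6274 quoted earlier.

```python
# Sketch of the VGG-like CNN: three 3x3 convolutions (8 -> 16 -> 32
# channels, stride 1, zero padding, ReLU), each followed by pooling,
# then a fully connected softmax classifier.
from tensorflow import keras
from tensorflow.keras import layers

def build_vgg_like(input_shape=(9, 18, 1), n_classes=2):
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(8, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(16, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(name="flatten"),       # reused later as the feature cut
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```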

Training details

In the prediction of GFA, all models were created and tested using Keras with TensorFlow as its backend. The full list of manual features used in the SNNs is given in Supplementary Table 1. All possible combinations of manual features were tested, and the combination that achieved the best accuracy was chosen. Hyperparameters, e.g., the neuron number, were also optimized; SNNs with 20 neurons in the hidden layer were used in this work. In the training phase, the outputs of the SNNs and CNNs were fitted to the ground truth, with categorical cross-entropy as the loss function to evaluate the fit. The number of training epochs was set to 2000 (after which the loss values remained almost unchanged), and 10-fold cross-validation (the dataset was split into 10 parts; each time one part was held out as the testing dataset and the remaining parts were used for training; no validation dataset or early stopping was used; 10 models were created after cross-validation) was used to evaluate the training/testing accuracy. In the prediction of new alloys' GFA, the results of a committee consisting of the 10 models were utilized.
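A condensed sketch of this training protocol is given below; `X` (PTR pseudo-images) and `y` (one-hot AM/CR labels) are assumed to be prepared already, and `build_vgg_like` is the sketch from the previous section.

```python
# Sketch: 10-fold cross-validation and a 10-model committee for GFA.
import numpy as np
from sklearn.model_selection import KFold

def committee_cv(X, y, build_fn, n_splits=10, epochs=2000):
    models, accuracies = [], []
    for train_idx, test_idx in KFold(n_splits=n_splits, shuffle=True).split(X):
        model = build_fn()
        model.fit(X[train_idx], y[train_idx], epochs=epochs, verbose=0)
        _, acc = model.evaluate(X[test_idx], y[test_idx], verbose=0)
        accuracies.append(acc)
        models.append(model)   # keep all 10 folds' models for the committee
    return models, accuracies

# Committee prediction for new alloys: average the 10 models' outputs.
def committee_predict(models, x_new):
    return np.mean([m.predict(x_new, verbose=0) for m in models], axis=0)
```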