Element-wise representations with ECNet for material property prediction and applications in high-entropy alloys

Abstract


I. INTRODUCTION
In recent years, machine learning (ML) methods have been successfully employed for classification, regression, and clustering tasks in materials science [1]. To date, there have been a number of excellent advances of ML approaches in the field. For example, many predictive models have been developed to model specific properties case by case, e.g., the prediction of melting temperatures [2], the superconducting critical temperature [3], band gap energies [4], and mechanical properties [5]. Typically, hand-crafted high-dimensional descriptors are designed to suit the physics of the underlying property; stoichiometric attributes, elemental and structural properties, and correlated physical properties are generally considered as descriptors [5][6][7]. Combined with ML algorithms such as random forests and support vector machines, this type of method can achieve good accuracy on a specific problem. Recently, another prevalent approach has appeared, based on graph models and deep neural networks. SchNet [8], CGCNN [9], and MEGNet [10] are examples of such graph models, which make it convenient to learn any material property directly from the raw crystal input. The remarkable expressive power of deep graph models allows them to describe material properties precisely across various systems and to achieve state-of-the-art performance.
One of the most important applications of ML in materials science is the modelling of alloys, including metallic glasses, high-entropy alloys (HEAs), magnets, and superalloys [11].
HEAs have garnered substantial interest due to their outstanding properties, such as high mechanical strength, good resistance to corrosion, and attractive magnetic and electronic properties [12][13][14]. The multiple principal alloying elements (at least five) in HEAs make calculations based on density functional theory (DFT) difficult, owing to the need for large supercells and a complex crystal-structure space involving multiple prototypes. A large number of studies have demonstrated that ML methods can alleviate such problems. Based on developed ML potentials, fundamental properties such as screw dislocations, defects, and segregation in the MoNbTaW and MoNbTaVW alloy systems can be studied systematically [15,16].
In addition, data-driven ML approaches, such as phase classification [17,18] and the prediction of configurational energies [19], have been explored and have made progress in accelerating the discovery of HEAs. Although data-driven ML models can learn properties of HEAs, the available HEA data sets are small compared with DFT databases such as the Materials Project [20], the Open Quantum Materials Database [21], and AFLOWLIB [22], which limits the effectiveness and applicability of ML models.
Graph models transform crystals into a graph described by atomic attributes, bond attributes, and even ground-state attributes. However, when predicting the properties of materials, not all cases form a one-to-one correspondence between the target property and the encoded compositional and structural features. For example, purely composition-based ML models such as ElemNet [23] and Roost [24] do not rely on knowledge of crystal structures, yet are capable of achieving high performance on the average formation enthalpy and the band gap. Furthermore, for properties like the band gap or the superconducting critical temperature, the same target value may correspond to large differences in crystal structure.
For this purpose, we highlight the benefit of using more global and general representations of materials. Here, we propose the operation of elemental convolution in deep neural networks, which merges atom-wise features into element-wise features. We demonstrate that this approach achieves competitive performance in modeling properties such as the shear modulus, band gap, and refractive index compared with previous models. Furthermore, multi-task learning with multiple regression objectives can enhance performance through joint learning and makes property prediction very convenient.
In this work, we develop the elemental convolution graph neural networks (ECNet) to predict material properties. This framework is capable of learning material representations from elemental embeddings trained on a data set by itself. Under the operation of the elemental convolution, element-wise features serve as the intermediate and final descriptors, which extract the knowledge of both atomic information and crystal structures and are updated by learning the target material properties. Our approach constructs more general and global attributes than the delicate representations in crystal graphs when modeling the material-property relationship; these global attributes are demonstrated to be superior to previous models for some intrinsic property predictions. Utilizing the developed ECNet model, we focus on applications in the realm of high-entropy alloys, especially the CrFeCoNiMn (Cantor alloy) and CrFeCoNiPd systems [25,26]. We model the solid solutions using special quasirandom structures (SQSs) and calculate the formation energy, total energy, mixing energy, magnetic moment, and root mean square displacement (RMSD) from DFT-based calculations. Since the available data points for HEAs are limited, we utilize transfer learning (TL) techniques to overcome the small-data problem. Considering that it is easier to explore and calculate simple alloys, we demonstrate the feasibility of utilizing information from low- and medium-entropy alloys to obtain better performance in predicting HEAs. Specifically, the model is first trained on the less-principal-element alloy systems and then the weights are transferred to the HEA data. Furthermore, we have also tried another TL approach, in which the pre-trained ECNet is used as an encoder to generate universal element-wise features, and we demonstrate that these features can achieve excellent performance using simple multi-layer perceptron (MLP) models.

II. METHODS

The model proposed here is based on graph deep neural networks. Briefly, a crystal is represented as an undirected graph G = (E, V), where the edges E represent bond connections and the nodes V represent the atoms in the crystal. The initial embedded feature matrix is V = [v_1; ...; v_{N_a}] ∈ R^{N_a × N_f}, where N_a is the number of atoms, N_f is the size of the hidden features, and v_i ∈ V is the feature vector of the i-th atom initialized from its atom type Z_i. It is an atom-wise feature generated via an atom embedding layer (Z → R^{N_f}), which is then used as the atom (node) attribute in the structural graph. In this work, we propose the operation of elemental convolution (EC) to obtain more general and global attributes to represent the materials. The atom-wise features are averaged according to atomic type, as shown in Fig. 1, and are thereby merged into element-wise attributes X^l = [x_1^l, ..., x_T^l] with X^l ∈ R^{T × N_f}, where T is the number of element types. To be clear, the graph features are still updated using the connectivity of bonds, which is the basic characteristic of graph models. We update the i-th feature successively by passing messages from neighboring vertices. The relation between two consecutive layers is

    v_i^{l+1} = Σ_{j ∈ N(i)} v_j^l ∘ W^l(r_j − r_i),

where '∘' represents element-wise multiplication, and W^l = Dense({e_k(r_j − r_i)}) × f_c are networks that map the atomic positions to the filters (R^3 → R^F).
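As a concrete illustration, the following is a minimal PyTorch sketch of the elemental-convolution averaging step. It is not the authors' released code; the function name, tensor shapes, and the toy SeO2 example are illustrative assumptions.

    import torch

    def elemental_convolution(atom_feats, atom_types, n_types):
        # atom_feats: (N_a, N_f) atom-wise features V
        # atom_types: (N_a,) integer element index of each atom, in [0, T)
        n_f = atom_feats.shape[1]
        sums = torch.zeros(n_types, n_f).index_add_(0, atom_types, atom_feats)
        counts = torch.zeros(n_types).index_add_(
            0, atom_types, torch.ones_like(atom_types, dtype=torch.float))
        # average the features of all atoms sharing the same element type
        return sums / counts.clamp(min=1).unsqueeze(1)   # (T, N_f)

    # toy example: an SeO2 cell with 2 Se and 4 O atoms, 8 hidden features
    feats = torch.randn(6, 8)
    types = torch.tensor([0, 0, 1, 1, 1, 1])             # 0 = Se, 1 = O
    x = elemental_convolution(feats, types, n_types=2)   # shape (2, 8)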
Here, we define a cutoff function motivated by Coulomb's law, in which the interaction between two atoms is inversely proportional to the square of the distance between them. Meanwhile, the distances between atomic pairs are expanded in a basis of Gaussians,

    e_k(r_j − r_i) = exp(−γ (‖r_j − r_i‖ − µ_k)^2),

where r_c is the distance cutoff and the centers {µ_k ∈ (0, r_c)} are imposed to decorrelate the filter values. The scaling parameter γ and the number of Gaussians determine the resolution of the interatomic distances. This symmetric function of the interatomic distances fulfills constraints such as rotational and permutational invariance. In the cutoff function f_c, an infinitesimal ϵ is added to the denominator, i.e., f_c(d) ∝ 1/(d^2 + ϵ) within the cutoff r_c, to avoid a divergence at zero distance.
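For illustration, the Gaussian expansion and an inverse-square cutoff of the kind described above can be sketched as follows; the exact functional form of f_c and the parameter values (r_c, γ, the number of Gaussians) are assumptions for demonstration, not the settings used in this work.

    import torch

    def gaussian_expansion(d, r_c=5.0, n_gauss=100, gamma=10.0):
        # expand pair distances d (tensor of any shape) in Gaussians
        # with centers mu_k spread over (0, r_c)
        mu = torch.linspace(0.0, r_c, n_gauss)
        return torch.exp(-gamma * (d.unsqueeze(-1) - mu) ** 2)

    def inverse_square_cutoff(d, r_c=5.0, eps=1e-8):
        # Coulomb-like weight ~ 1/(d^2 + eps), zeroed beyond the cutoff r_c
        return torch.where(d < r_c, 1.0 / (d ** 2 + eps), torch.zeros_like(d))

    d = torch.tensor([0.8, 2.4, 6.1])
    e_k = gaussian_expansion(d)      # shape (3, 100)
    f_c = inverse_square_cutoff(d)   # last entry is zero (beyond r_c)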
In the interaction blocks, the crystal structure is distilled into part of the atom representations. Through the elemental convolution, the raw structure is transformed into element-wise representations for the given properties, and these features are greatly reduced in size after processing physical and geometrical knowledge, in contrast with other graph representations. It should be noted that we use multi-task learning in our framework to improve learning efficiency and prediction accuracy [27]. As illustrated in Fig. 1, EC-based multi-task learning (ECMTL) or single-task learning (ECSTL) can be selected according to the number of tasks. The ECMTL architecture shares a common set of layers (shared physical knowledge in the elemental dimension) across all tasks, and task-specific multi-layer perceptron networks (tower layers) are designed for each individual task. This makes it possible to learn multiple related tasks at the same time so that they mutually benefit each other's performance. Moreover, multi-task learning improves generalization by leveraging the domain-specific information contained in the training signals of related tasks [28].
In this work, the mean absolute error (MAE) is adopted as the loss function, considering the real-valued regression problem. The total loss for the network is a weighted linear combination of the individual losses from each specific task, a common setup for multi-task problems [27],

    L = Σ_i w_i L_i,

where L_i = ‖p_i − p̂_i‖ and w_i are the individual loss and weight for each task-specific layer. The final predicted property p̂_i can be an arbitrary property such as the band gap, bulk/shear modulus, or RMSD of alloys.
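A minimal sketch of such a shared-trunk/tower multi-task head with the weighted MAE loss is given below; the layer widths and class names are illustrative assumptions, not the exact ECNet architecture.

    import torch
    import torch.nn as nn

    class MultiTaskHead(nn.Module):
        # shared trunk features go in; one small tower (MLP) per task comes out
        def __init__(self, in_dim, n_tasks, weights=None):
            super().__init__()
            self.towers = nn.ModuleList(
                nn.Sequential(nn.Linear(in_dim, 128), nn.SiLU(), nn.Linear(128, 1))
                for _ in range(n_tasks))
            self.weights = weights if weights is not None else [1.0] * n_tasks

        def forward(self, shared):                       # shared: (batch, in_dim)
            return [t(shared).squeeze(-1) for t in self.towers]

        def loss(self, preds, targets):
            # total loss L = sum_i w_i * MAE_i
            return sum(w * torch.mean(torch.abs(p - t))
                       for w, p, t in zip(self.weights, preds, targets))

    head = MultiTaskHead(in_dim=128, n_tasks=2)          # e.g., K_VRH and G_VRH
    preds = head(torch.randn(32, 128))
    total = head.loss(preds, [torch.randn(32), torch.randn(32)])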

III. RESULTS AND DISCUSSIONS

A. Model Performance
To investigate the predictive performance of ECMTL, we apply the model to inorganic crystal data obtained from the Materials Project (MP) [29]. The data for formation energies and band gaps in the MP data set comprise ∼69,000 crystals, while the training set is restricted to 60,000 in order to be consistent with prior settings in other works [30,31].
For the band gap, a restricted data set that excludes metallic materials is also considered, a subset of 45,901 crystals (the non-zero band gap is labeled E_g^nz). The elastic moduli, including the bulk and shear modulus, have a smaller amount of data, with 5830 samples. In our model, we select correlated properties as prediction targets.
The first set comprises the formation energy E_f and band gap E_g, and the other set comprises the bulk modulus K_VRH and shear modulus G_VRH. In addition, the refractive index, optical gap, and direct gap of 4040 compounds, computed by density functional theory and high-throughput methods [32], are considered to estimate the performance. Except for the formation energy and band gaps, the data set of each property is randomly divided into a training set (90%) and a test set (10%). The validation set is one tenth of the training set and is used to optimize the hyperparameters. For comparison, we also consider single-task learning, called ECSTL. Table I compares the performance of the ECSTL, ECMTL, MEGNet, and MODNet models in terms of mean absolute errors. It should be noted that the developed EC models and the MEGNet model are graph-based neural networks that use only the atomic numbers and spatial distances as inputs, quite different from the MODNet model with its set of extracted features. Generally, feature-based models are preferred for small to medium data sets, while graph-based models are preferred for large data sets [30].
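The split protocol described above can be sketched as follows; the helper name and random seed are illustrative assumptions.

    import numpy as np

    def split_dataset(n_samples, seed=0):
        # random 90/10 train/test split; one tenth of the training set is
        # further held out for validation when tuning hyperparameters
        rng = np.random.default_rng(seed)
        idx = rng.permutation(n_samples)
        n_test = n_samples // 10
        test, train = idx[:n_test], idx[n_test:]
        n_val = len(train) // 10
        val, train = train[:n_val], train[n_val:]
        return train, val, test

    train_idx, val_idx, test_idx = split_dataset(5830)  # e.g., the moduli set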
The complete data set for the formation energy and the band gap used in developing the ECNet model is large, with 60,000 samples. We note that ECSTL/ECMTL (ECNet) has a higher MAE for the E_f prediction than prior models such as MODNet and MEGNet [30,31]. The reason is that the elemental convolution averages out detailed structural information, while the formation energy is sensitive to structural changes. Specifically, the MEGNet model provides an elaborate description of atomic, bond, and global state attributes, leading to state-of-the-art performance on E_f. However, many other properties, such as the band gap, moduli, and refractive index, have complicated, unknown relations to the structure, for which a one-to-one mapping between structural configurations and properties is not a good choice. The strategy of reducing the compositional and structural features may work better for such intrinsic target properties. Taking the band gap as an example, the ECNet model systematically outperforms MODNet and MEGNet both on the large data set and on the restricted data set from which zero-gap materials are excluded. For the prediction of E_g and E_g^nz, the MAEs of ECSTL are 0.164 and 0.27 eV, while those of ECMTL are 0.227 and 0.27 eV, respectively. Both the single-task and multi-task models outperform MODNet and MEGNet on the band gap predictions. In addition, the MAEs of the shear modulus and refractive index are lower than those of the MEGNet model. Among these models, SISSO [33] is found to give higher prediction errors on both large and small data sets for the different target properties. It should be noted that the ECMTL model learns the bulk and shear modulus at the same time, and we observe that its MAE for the shear modulus, 0.046 log10(GPa), is slightly lower than that of ECSTL. This observation indicates that multi-task learning is most helpful when applied to a suitable combination of tasks.

To investigate and understand the graph elemental convolution models, we visualize the learned elemental feature vectors in Fig. 2. Taking SeO2 as an example, we visualize the 128-dimensional representation of each element. Fig. 2(a) shows the outputs after the three successive interaction blocks (IBs) of the ECNet model, in which negative and positive values are shown in blue and red, respectively. These element-wise features (IB-1, IB-2, IB-3) are updated by iteratively passing messages from the structural configuration through the three IBs; consequently, domain knowledge accumulates as the layers stack. Different elements clearly show unique characteristics. The neural networks inside the IBs encode elemental features from the local atomic environments toward the particular target properties, so the specific characteristics of the different elements are the coupled effect of the physical element types and the information learned from a large set of data samples. In Fig. 2(b), we illustrate the third-IB outputs of another SeO2 compound, which is in the orthorhombic crystal system (point group mm2), while the SeO2 in Fig. 2(a) is in a tetragonal lattice (point group 4/mmm). Comparing the IB-3 outputs of the two SeO2 polymorphs, we observe that the two sets of features follow similar patterns, owing to the identical chemical composition of the two compounds. To quantify the difference, we compare the Frobenius norms of the feature matrix and the structural matrix, with the differences denoted ‖∆X‖_F and ‖∆S‖_F, respectively. The relative differences ‖∆X‖_F/‖X‖_F are similar for the two compounds, 0.66 and 0.63 for the first and third interaction blocks. The relative difference in the structural matrix, ‖∆S‖_F/‖S‖_F, is 0.91, slightly larger than the difference in the feature vectors, and the relative shift of the structural matrix norm is broadly consistent with that of the feature matrix. This observation shows that these features contain preprocessed physical and geometrical knowledge under the graph-based framework, and that the ECNet model is capable of distinguishing different materials even with the more global element-wise representations.
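The relative Frobenius-norm comparison used here reduces to a few lines; a sketch with toy inputs (the random matrices stand in for the actual feature matrices):

    import numpy as np

    def relative_frobenius_diff(a, b):
        # ||A - B||_F / ||A||_F for two feature (or structural) matrices
        return np.linalg.norm(a - b) / np.linalg.norm(a)

    x1 = np.random.rand(2, 128)   # IB-3 features of the tetragonal SeO2 (toy)
    x2 = np.random.rand(2, 128)   # IB-3 features of the orthorhombic SeO2 (toy)
    print(relative_frobenius_diff(x1, x2))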

B. Application in HEAs
To investigate applications in HEAs, we focus on the CrFeNiCoMn/Pd quinary high-entropy alloys and their binary, ternary, and quaternary subsystems. These systems are among the most interesting in the HEA field. The FCC quinary CrFeNiCoMn HEA, formed from five 3d transition-metal elements (the Cantor alloy), has remarkable strength and ductility at high temperature [34]. By substituting Mn with the 4d element Pd, the FCC quinary CrFeNiCoPd HEA was recently reported to have around 2.5 times higher strength than CrFeNiCoMn at similar grain size, comparable to advanced high-strength steels [26]. To construct a good training data set and use the ECNet model to study property relations, we apply DFT-based calculations to random phases of the CrFeNiCoMn/Pd quinaries and their binary, ternary, and quaternary subsystems, obtaining structural, electronic-structure, and energetic properties. In detail, various properties of the CrFeNiCoMn/Pd systems are included in the training data set: the total energy (E_tot), the formation energy (E_form), the mixing energy (E_mix), the root mean square displacement (RMSD), as well as the magnetic moment per atom (m_s) and per cell (m_b).
It should be noted that one critical factor for developing an effective and robust machine learning model is preparing diverse training data that encompass a good range of structural environments and compositional space. Therefore, we considered CrFeNiCoMn/Pd HEAs not only in FCC but also in BCC single-phase solid solutions (SPSS), and not only equiatomic but also various non-equiatomic compositions. In total, there are 363 data points of FCC and BCC SPSS.

To begin with, the ECNet model architectures are trained from scratch, with the model parameters initialized randomly from a uniform distribution. All the elemental features, model weights, and biases are learned from the input training data. Our initial ML models are constructed with ECMTL by learning the six properties together as a unit. We further subdivide the data set into three categories: (1) all the known data, (2) binary and ternary compounds, and (3) quaternary and quinary compounds. The unit-model performance on the test set is listed in Table II (model Scratch-unit). This unit model shows good accuracy for the energy-related properties across the different data sets. However, it performs poorly in cases like the magnetic moment (MAE: 0.700 µ_b/cell on the binary and ternary data set) and the RMSD (MAE: 0.138 Å on the quaternary and quinary data set). Within the unit framework, it is hard to further optimize the model by adjusting hyper-parameters so that all properties are fitted well. Since multi-task learning concerns optimizing a model with respect to many properties, it is crucial to understand which tasks are likely to help each other in an MTL process [28]. In the ECMTL deep networks, the different tasks share some common hidden layers, known as hard parameter sharing in deep neural networks [35]; this method learns a joint representation among the multiple tasks, or performs model interpolation to regularize certain targets [36]. Each task is learned better if it is trained in a network that is learning other related tasks at the same time, while uncorrelated tasks act as noise for the other tasks and can still improve generalization [35]. Here, we first investigate the correlations between the different properties to find related tasks. The Pearson correlation coefficients for the five properties are shown in Fig. S3 of the supplementary information.
The m_s and m_b have a correlation coefficient of 0.43, an expected result because the two magnetic moments are merely the same quantity on different scales. Generally, the RMSD is less correlated with the magnetic properties and the energy-related properties, with small correlation coefficients, e.g., -0.1 between RMSD and E_form and -0.12 between RMSD and m_s. We therefore select the significantly correlated tasks m_s and m_b to develop one ECMTL model. E_tot, E_form, and E_mix are then adopted together in another multi-task model, since the energy-related properties can be considered physically dependent in addition to being correlated. Finally, we train the RMSD within the single-task framework.
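As a sketch of this task-selection step, the Pearson matrix and a simple grouping rule can be computed as follows; the placeholder data and the 0.4 threshold are illustrative assumptions, not the criterion used in this work.

    import numpy as np

    labels = ["E_tot", "E_form", "E_mix", "m_s", "m_b", "RMSD"]
    props = np.random.rand(363, 6)            # placeholder for the DFT data table
    corr = np.corrcoef(props, rowvar=False)   # 6 x 6 Pearson correlation matrix

    # group tasks whose absolute correlation exceeds a chosen threshold
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if abs(corr[i, j]) > 0.4:
                print(labels[i], "and", labels[j], "may suit joint training")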
As shown in Fig. 3, the parity plots compare the DFT-calculated and ECNet-predicted results for the six properties of the CrFeNiCoMn/Pd alloys on both the training and test data sets. The corresponding MAE values are listed in Table II (model Scratch-group) and show that the performance of the grouped ECNet models is significantly improved over the previous unit model. Specifically, the test MAEs for E_tot, E_mix, E_form, m_s, m_b, and RMSD are 0.122, 0.022, and 0.009 eV/atom, 0.058 µ_b/atom, 0.386 µ_b/cell, and 0.01 Å, respectively. Taking the RMSD as an example, the mean absolute error is reduced by 87.3% relative to the error of the unit model, and the reductions are 79.6% on the binary and ternary (2+3) data set and 90.6% on the quaternary and quinary (4+5) data set. The similarly well-distributed data between the DFT-calculated and ECNet-predicted values also indicates that the model predicts properties well in all domains. More critically, high performance is achieved across the entire range of these properties, even where the distributions of the training and test data differ, as illustrated in Fig. 3.
DFT-based calculations are much more computationally demanding for HEAs. To circumvent this bottleneck, one approach is the transfer learning (TL) technique [37]. We consider that knowledge related to the chemical and physical similarities between elements can be transferred from less-principal-element alloys to multiple-principal-element systems. To validate this, we use the binary and ternary data as the source data set for transfer learning and the quaternary and quinary data as the target data. We introduce two types of transfer learning: in the first, we transfer the feature representations including the tower layers (TL-I); in the other, the weights of the tower layers are excluded (TL-II). Both types of source models are trained on the binary and ternary data sets. As seen in Table II, the prediction errors drop after transfer learning no matter which models are employed (unit or group). For example, the error reductions for E_tot, m_b, and RMSD are around 52%, 94%, and 54% for the grouped models using TL-II compared with training from scratch. Since the current models are already close to optimal, it is difficult to further improve performance even with transfer learning; indeed, some errors rise slightly after transfer learning. Interestingly, the performance of TL-II is better than that of TL-I. Since the difference is whether the task-specific layers are included, we conclude that the shared common layers carry more general feature representations, which helps transfer the domain knowledge from the source data. Overall, transfer learning is very effective in such cases and brings the predictions closer to the calculated properties.
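A minimal sketch of the two weight-transfer schemes is given below, assuming (hypothetically) that the task-specific parameters are stored under a "towers" prefix; this is an illustration of the idea, not the paper's implementation.

    import torch

    def transfer_weights(source_state, target_model, keep_towers):
        # TL-I: copy everything, including task-specific towers (keep_towers=True)
        # TL-II: copy only the shared layers; towers stay randomly initialized
        target_state = target_model.state_dict()
        for name, tensor in source_state.items():
            if not keep_towers and name.startswith("towers"):
                continue                     # skip the task-specific layers
            if name in target_state and target_state[name].shape == tensor.shape:
                target_state[name] = tensor.clone()
        target_model.load_state_dict(target_state)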
In the ECNet framework, the element-wise features encoded with compositional and local geometric information naturally pass through the multi-layer neural networks. Thus, a pre-trained ECNet model can be used as an encoder of a material to generate universal elemental features. The readout vectors after the atom embedding layer, the three interaction blocks, and the final graph output are denoted V_0, IB_1, IB_2, IB_3, and f. These features are all extracted from the parent model and can be regarded as compositional (V_0) or structural (IB_1, IB_2, IB_3, and f) descriptors related to the output properties.
As NN layers are stacked, each element gains more and more global information, since the internal messages on each atom propagate over longer distances. Because the information gained from the previous model training is retained, using the readout element-wise vectors can be regarded as a TL scheme, similar to the TL approach used in the AtomSets models [38]. In this TL framework, we take the readout element-wise vectors trained on the magnetic moments as descriptors and use multi-layer perceptron (MLP) models to predict properties.
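A sketch of this encoder-plus-MLP scheme, assuming a generic pre-trained encoder callable; the helper names and layer sizes are illustrative, not the paper's settings.

    import torch
    import torch.nn as nn

    def extract_features(encoder, structures):
        # run the frozen, pre-trained encoder and read out element-wise
        # vectors (e.g., IB_3) as fixed descriptors
        encoder.eval()
        with torch.no_grad():
            return torch.stack([encoder(s).flatten() for s in structures])

    def make_mlp(in_dim):
        # small MLP trained on the frozen descriptors
        return nn.Sequential(nn.Linear(in_dim, 128), nn.SiLU(),
                             nn.Linear(128, 64), nn.SiLU(),
                             nn.Linear(64, 1))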
Figure 4 shows a convergence study of the MLP models, where the data points are randomly sampled from the original data sets at percentages of 20%, 40%, 60%, 80%, and 100%.
Such compositional models cannot distinguish two alloys with the same stoichiometry but different phases. We therefore provide separate test results for the FCC and BCC solid-solution phases, for the magnetic property in Figs. 4(a) and (b) and for the formation energy in Figs. 4(c) and (d). To compare with the TL features, we encode the elements as one-hot vectors with a dimension of 100 to yield a non-TL MLP model. As the data size increases, the predicted MAE of the magnetic property drops as expected. The convergence of the MLP-f models is not rapid; the features f were generated from the final task-specific layers of the magnetic model. This result shows that the quality of the f descriptor is worse than that of the other transfer-learning features. We believe this is caused by the lower data quality of the magnetic-moment data, which most severely affects the TL features closest to the final outputs. By contrast, MTL-IB_1 and MTL-IB_3 achieve relatively high performance at all data sizes and generally converge more rapidly than the non-TL models. Furthermore, we also use the transferred features to perform exploration, where they show equivalent performance, as seen in Fig. 4. Figure 5 shows the interpolation results of the formation energy and magnetic moment for the Fe-Co-Mn ternary system in the single FCC solid-solution phase at 0 K under the MTL-IB_3 model. From the formation energies of Fe-Co-Mn in Fig. 5(a), we find that the energy spans a range from around -0.03 to 0.12 eV/atom, and the low-Mn-concentration regions are likely to be stable. The magnetic moment of the Fe-Co-Mn system in Fig. 5(b) shows a gradual increase as the Fe concentration increases and the Mn concentration decreases.

IV. CONCLUSIONS

In summary, we have developed the elemental convolution graph neural networks (ECNet) to describe composition-structure-property relationships. The models provide more general and global element-wise feature vectors to represent materials, and they show better performance for some intrinsic properties such as band gaps, elastic moduli, and the refractive index. Furthermore, we utilize the multi-task learning technique and find that it improves learning efficiency and prediction accuracy compared with training a separate model for each task. We then explore applications in high-entropy alloys, focusing on the CrFeNiCoMn/Pd systems, and develop models that accurately predict several physical properties. Transfer learning techniques are used to enhance the performance on the high-entropy alloys: since the model has learned the relevant physical and chemical similarities between structural and elemental information, it outperforms models trained from scratch. We demonstrate the feasibility of using the low-principal-element alloys as source data to enhance performance on multiple-principal-element alloys through the TL technique. Moreover, the TL element-wise feature vectors from a parent ECNet model can be used as universal descriptors, and we find that multi-layer perceptrons combined with these descriptors reach good accuracies even in the small-data limit.
Finally, we take the Fe-Co-Mn system as an example and perform interpolations within this framework to understand the underlying physics. By taking advantage of the trained ML models, we alleviate the difficulty of exploring a near-infinite compositional space.
The ECNet models in this work use a hidden-channel dimension of 128, three interaction blocks, and two-layer neural networks with 128 and 64 neurons in the final tower-layer blocks.
FIG. 1. The architecture of the ECNet model. The embedding layer encodes the initial inputs from the atomic numbers. In the interaction block, a series of neural networks transforms the crystal structure into atomic attributes. The elemental convolution operation averages the atom-wise features according to the element type.

FIG. 2. Element-wise feature vectors. (a) Outputs from the three successive interaction blocks for SeO2 (tetragonal P4_2/mbc space group). (b) Outputs from the third interaction block for SeO2 (orthorhombic Pmc2_1 space group). The Materials Project IDs for the materials in (a) and (b) are mp-726 and mp-559545, respectively.

FIG. 3. Performance of the ECNet models on E_tot, E_mix, E_form, m_s, m_b, and RMSD for the CrFeNiCoMn/Pd alloy systems. The blue and orange circles are samples from the training and test data sets, and their data distributions are displayed for reference.
FIG. 4. Model convergence tests for MTL-IB_1, MTL-IB_3, MTL-onehot, and MTL-f. (a) and (b) show the prediction performance for the magnetic property in FCC and BCC, respectively. (c) and (d) show the performance for the formation energy in FCC and BCC, respectively. The shaded areas are the standard deviations across five experiments randomly sampled from the data set.

FIG. 5. Machine learning predictions of (a) formation energies and (b) magnetic moments for FCC Fe-Co-Mn binary and ternary alloys.

TABLE I. Test accuracies (MAEs) for the formation energy (E_f), the band gap (E_g and the non-zero gap E_g^nz), the bulk modulus (K_VRH), the shear modulus (G_VRH), and the refractive index (n) from the ECSTL/ECMTL models and prior works on the Materials Project data set. The training set size is denoted N_train, and underlined values correspond to the lowest error for each task. (a) Results for single-property learning from the MODNet and SISSO models in Ref. [30]. (b) Performance listed in the original MEGNet paper, Ref. [31].

TABLE II. Statistical summary of the prediction performance (MAEs) for the six properties on the test sets. The data set is divided into three categories: binary and ternary compounds (2+3), quaternary and quinary compounds (4+5), and all compounds (Total). For comparison, we list results from one unit multi-target model (unit) and from three MTL models with the properties grouped (group).