Introduction

Permanent magnets have played a central role in the development of science and technology since their discovery1. They are used in almost every sector of modern technology2. Growing awareness of environmental sustainability and renewable energy has further boosted the use of permanent magnets in energy sectors such as hydro, wind, and wave energy, as well as in electric vehicles. With this increased demand, the rare-earth elements used to manufacture strong permanent magnets face a critical risk of depletion. Numerous experimental and theoretical studies have therefore been devoted to developing new magnetic materials3,4,5,6,7,8,9,10,11,12.

For example, iron-based chalcogenides have been studied extensively, both theoretically and experimentally, for their intriguing magnetic behavior13,14,15,16,17,18,19,20; ferromagnetism, ferrimagnetism, and antiferromagnetism have all been reported for different chalcogens14,15,18. Varying the elemental composition of transition metal chalcogenides, by changing both the metal elements and the chalcogens, may reveal new forms of magnetic material. However, investigating all possible chalcogenide compositions remains an open challenge. Experimental investigation involves the synthesis and characterization of these materials, which is prohibitively expensive and time-consuming. Predictive calculation based on first-principles density functional theory (DFT)21,22,23, which explicitly includes electron–electron interactions within an effective single-particle picture, is also numerically challenging given the multitude of compositional configurations of chalcogenides. In such a situation, a well-established data-driven approach offers a faster and computationally cheaper alternative to these expensive and time-consuming experimental or computational methods. In recent years, considerable data-driven research has addressed magnetic properties24,25,26,27,28,29, band gaps30,31, and chemical properties32,33 of materials using machine learning models trained on DFT and experimental data. The catalytic activity of complicated chemical systems has been investigated using machine learning methods32, and accurate predictions of band gaps in functionalized MXenes demonstrate the credibility of the machine learning approach30. Complex phenomena such as magnetic ordering and magnetic moments have been studied successfully in 2D materials using data-driven methods24, a DFT-aided autonomous material search system has been designed to identify magnetic alloys25, and the properties of rare-earth-lean magnets have been studied using DFT-aided machine learning34. In particular, the growing interest in studying the magnetic properties of materials with DFT-based machine learning models highlights the importance of DFT in data-driven materials science22,25,29,30,31,32.

In this work, motivated by the recent advances of artificial intelligence across science and technology35,36,37,38, we apply machine learning methods to develop a predictive tool that learns meaningful patterns from data and predicts composition-dependent magnetism in Fe-based bimetallic chalcogenides FexAyB, where A represents Ni, Co, Cr, or Mn, B represents S, Se, or Te, and x and y are the concentrations of the respective elements. To this end, we generate a dataset of structures representing 4348 compositional configurations of FexAyB using density functional theory (DFT) calculations, obtaining the magnetization of each configuration from spin-polarized DFT. This dataset is then used to train various ML algorithms: Linear Regression, Support Vector Regressor39,40, Random Forest41, Decision Trees42, K-nearest neighbors43, Extreme Gradient Boosting44,45, and Artificial Neural Network46,47. Based on tenfold cross-validation48 scores, we selected the six best-performing algorithms to build an ensemble model based on stacked generalization for predicting magnetism in bimetallic chalcogenides. Testing the final stacked model on independent DFT test data yields MSE, MAE, and R2 values of 1.655 (µB)2, 0.546 µB, and 0.922 respectively.

Materials and methods

Our approach to discovering magnetism in Fe-based bimetallic chalcogenides is based on supervised machine learning. Initially, we generated a dataset of 4348 structures representing various compositional configurations and obtained the magnetization in each unit cell from DFT21 calculations. We then performed feature engineering, employing a set of descriptors (features) suitable for describing magnetization in the chalcogenides, and divided the dataset into training and test sets. Subsequently, we trained the models using cross-validation and grid search to determine the best-performing model. Finally, we tested the performance of our proposed model on the (independent) test set. Each of these four stages is briefly discussed in the following subsections.

Dataset

To create the dataset, we employed first-principles DFT21 calculations. We started by constructing a primitive cell of hexagonal (space group P63/mmc) iron sulfide (FeS) consisting of two Fe atoms and two S atoms; it has been reported that this chalcogenide is more easily synthesized in the hexagonal form than in the tetragonal structure49. The Vienna ab initio simulation package (VASP)50,51 is used for the DFT calculations with a plane-wave basis and a cutoff energy of 720 eV. The atomic structure in the unit cell is optimized without symmetry constraints until the residual force on each atom is smaller than 0.001 eV/Å, and the convergence criterion for the total energy is set at 10−10 eV. Exchange and correlation are approximated using the gradient-corrected Perdew–Burke–Ernzerhof (PBE)52 exchange–correlation functional, and the electron–ion interactions are treated with Projector Augmented Wave (PAW)53 potentials. A Monkhorst–Pack scheme with a 3 × 3 × 3 k-point grid is used to sample the first Brillouin zone of the reciprocal lattice. Using the optimized lattice parameters, we expanded the primitive cell of hexagonal FeS to a larger unit cell consisting of 16 Fe atoms and 16 S atoms, as shown in Fig. 1.

Figure 1

Unit cell of hexagonal FeS. S1, S2, S3, and S4 represent the substitutional sites for the transition metal elements.
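For concreteness, the DFT settings described above could be scripted, for example, through ASE's VASP interface. The following is a minimal illustrative sketch, not the scripts used in this work: the starting lattice constants and the relaxation flags (IBRION, ISIF, NSW) are assumptions not stated in the text, and a local VASP installation with PAW potentials is required.

```python
import numpy as np
from ase import Atoms
from ase.calculators.vasp import Vasp

# NiAs-type hexagonal FeS primitive cell (space group P6_3/mmc).
# a and c are placeholder starting values; the geometry is relaxed anyway.
a, c = 3.45, 5.80
atoms = Atoms('Fe2S2',
              scaled_positions=[(0.0, 0.0, 0.0), (0.0, 0.0, 0.5),     # Fe
                                (1/3, 2/3, 0.25), (2/3, 1/3, 0.75)],  # S
              cell=[[a, 0, 0],
                    [-a / 2, a * np.sqrt(3) / 2, 0],
                    [0, 0, c]],
              pbc=True)

# Settings mirroring the text: PBE/PAW, 720 eV plane-wave cutoff, 1e-10 eV
# energy convergence, 0.001 eV/A force threshold, spin polarization, and a
# 3x3x3 Monkhorst-Pack k-point grid. Relaxation flags are assumptions.
atoms.calc = Vasp(xc='pbe', encut=720, ediff=1e-10, ediffg=-0.001,
                  ispin=2, ibrion=2, isif=3, nsw=100, kpts=(3, 3, 3))

energy = atoms.get_potential_energy()  # triggers the relaxation
magmom = atoms.get_magnetic_moment()   # total magnetization of the cell
```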

Subsequently, we used a substitution technique to create bimetallic chalcogenides (FexAyB) of different atomic compositions, with x and y the concentrations of the respective elements. Substitution24,54 is a common technique in materials science for creating defects as well as new materials. The Fe atoms in the structures were substituted by A (Ni, Co, Cr, or Mn) and the S atoms by B (Se or Te). To describe the local geometry of the structure, we designated four atomic sites S1, S2, S3, and S4, as shown in Fig. 1. Each of these sites was assigned a number indicating how many Fe atoms are replaced there, keeping the chalcogen concentration unchanged. For example, substituting two Fe atoms at site S1, one at S2, and none at S3 and S4 gives S1 = 2, S2 = 1, and S3 = S4 = 0, which leads to x = (16 − 3)/16 and y = 3/16. Based on the number and site of substitutions, we generated 4348 bimetallic chalcogenide structures of different compositions. Spin-polarized DFT calculations were then performed for each composition to obtain the magnetization dataset used to develop the machine learning models.
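To make the construction concrete, the configuration space can be enumerated in a few lines. This is a minimal sketch under stated assumptions: each site S1–S4 hosts up to four substituted Fe atoms and the total substitution is capped at y ≤ 0.625; the paper's own filtering rules, which yield exactly 4348 structures, may differ.

```python
from itertools import product

METALS = ('Ni', 'Co', 'Cr', 'Mn')   # substituting transition metals A
CHALCOGENS = ('S', 'Se', 'Te')      # chalcogen B
N_FE = 16                           # Fe atoms in the unit cell
MAX_PER_SITE = 4                    # assumed capacity of each site S1-S4

configs = []
for s1, s2, s3, s4 in product(range(MAX_PER_SITE + 1), repeat=4):
    n_sub = s1 + s2 + s3 + s4
    y = n_sub / N_FE                # concentration of metal A
    if n_sub == 0 or y > 0.625:     # skip unsubstituted or over-substituted cells
        continue
    for metal, chalcogen in product(METALS, CHALCOGENS):
        configs.append({'A': metal, 'B': chalcogen,
                        'x': 1.0 - y, 'y': y,
                        'S1': s1, 'S2': s2, 'S3': s3, 'S4': s4})

print(len(configs))  # enumeration count under these assumptions
```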

Once the dataset is created, the next step is to define the descriptors (features) for the problem, as this choice is one of the most important aspects of any machine learning approach. We chose 12 descriptors: eight describe the concentrations of the metal elements Fe, Ni, Co, Cr, and Mn and the chalcogen elements S, Se, and Te, and four describe the substitutional sites S1, S2, S3, and S4. For example, for Fe0.6875Ni0.3125Se the eight concentration descriptors are 0.6875, 0.3125, 0, 0, 0, 1, 0, 0, representing the concentration of each element for the possible bimetallic chalcogenide configurations (please see the data table in the provided GitHub link). The site descriptors were included because magnetic ordering in the substituted chalcogenides also depends on the substitutional sites. The magnetic moment of the unit cell calculated from DFT is taken as the target variable, and these 12 descriptors are the inputs to the supervised machine learning framework. To understand the correlations between the features, a correlation matrix was generated (see Fig. S2 in Supplementary Information); we observed a low level of correlation between the features.
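A feature vector for a single composition might then be assembled as below. The element column order is an assumption for illustration and may not match the column order of the released data table.

```python
import numpy as np

# Assumed element order for the 8 concentration descriptors; NOTE: the
# released data table may order the chalcogen columns differently.
ELEMENTS = ('Fe', 'Ni', 'Co', 'Cr', 'Mn', 'S', 'Se', 'Te')

def featurize(concentrations, sites):
    """Build the 12-dim descriptor: 8 element concentrations + 4 site counts."""
    conc = [concentrations.get(el, 0.0) for el in ELEMENTS]
    return np.array(conc + list(sites), dtype=float)

# Fe0.6875 Ni0.3125 Se with five substitutions placed as S1=2, S2=1, S3=2, S4=0.
x_vec = featurize({'Fe': 0.6875, 'Ni': 0.3125, 'Se': 1.0}, sites=(2, 1, 2, 0))
```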

The data were randomly split into a training set and a test set in the proportion 85:15. The training and test sets are then normalized separately to avoid information leakage from the test set into the training set. The sizes of the training and test datasets are given in Table 1.
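In scikit-learn, this split-and-normalize step could look like the following sketch, which normalizes the two sets separately as described above; the choice of StandardScaler is an assumption.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# X (n_samples, 12) and M (n_samples,) are the descriptor matrix and the
# DFT magnetizations from the featurization step above.
X_train, X_test, M_train, M_test = train_test_split(X, M, test_size=0.15,
                                                    random_state=42)

# Normalize the training and test sets separately so that no statistics
# computed on the test set enter the training pipeline.
X_train = StandardScaler().fit_transform(X_train)
X_test = StandardScaler().fit_transform(X_test)
```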

Table 1 Number of DFT-data points used for training and testing the machine learning models.

Algorithm training and model selection

We trained seven different machine learning algorithms: Linear Regression (LR)55, Support Vector Regressor (SVR)39,40, Random Forests (RF)41, Decision Trees (DT)42, K-Nearest Neighbours (KNN)43, Extreme Gradient Boosting (XGBoost)56, and Artificial Neural Network (ANN). We used scikit-learn57 and the TensorFlow Keras API58 to implement these models. To find the optimal hyperparameters for each model, we performed extensive tenfold cross-validation on the training set together with a grid search over different combinations of hyperparameters (see Table S1 in Supplementary Material). The algorithms and cross-validation techniques are briefly described in the following subsections.
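Each model was tuned with the same recipe, sketched below for RF. The grid shown is hypothetical; the grids actually searched are listed in Table S1.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Hypothetical grid for illustration only.
param_grid = {'n_estimators': [100, 300, 500],
              'max_depth': [None, 10, 20]}

search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid,
                      cv=10, scoring='neg_mean_squared_error')
search.fit(X_train, M_train)
print(search.best_params_, -search.best_score_)  # best grid point, mean CV MSE
```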

We started with LR55, a popular machine learning model that finds the best linear fit to the data points by minimizing the distance between the target values and the predicted values lying on the best-fit line. The basic LR model takes the form \(y ={W}^{T}X\), where y is the target, X = (1, x1, x2, …, xn) is the input feature vector, and W = (w0, w1, w2, …, wn) is the weight vector.

For nonlinear relationships between the features and the target, algorithms such as DT, RF, KNN, SVR, XGBoost, and ANN offer better performance. DT42 splits the training examples into a tree-like structure based on the most significant splitter among the input features; the splitting produces leaf nodes, each of which represents a different prediction. RF41,59, also known for its capability in solving nonlinear problems, uses an ensemble learning approach that relies on the output of multiple decision trees; it is therefore a more powerful estimator than DT and is less prone to overfitting and bias. KNN43 is a supervised algorithm that estimates the association between the features and the target based on the average output of the K nearest data points; in our experiments we set K = 5. SVM39 uses kernel functions to transform low-dimensional data into a higher-dimensional feature space in which a separating hyperplane maximizing the margin between classes can be found; for regression, SVR40 fits the best hyperplane to the training data to predict continuous values. XGBoost56 is a gradient-boosting decision tree algorithm that minimizes the loss via gradient descent and combines different models in an ensemble. XGBoost and RF have similar model representations but different training algorithms: XGBoost trains its base learners sequentially, whereas RF trains them in parallel. We also implemented an artificial neural network (ANN) with two hidden layers, a simple feed-forward architecture that learns by comparing its outputs with the provided targets and adjusting weights and biases through backpropagation. The architecture of the ANN-based model is shown in Fig. 2; after hyperparameter tuning, the best-performing network consists of two hidden layers with 256 and 64 neurons respectively. The details of the hyperparameters are given in Table S1 (Supplementary Material). We also analyzed how the features contribute to the model prediction; based on these calculations, performed with the random forest model, the concentrations of Mn, Cr, Te, and S are the most dominant features, as shown in Fig. S3 (Supplementary Information).

Figure 2

The architecture of an artificial neural network with two hidden layers of 256 and 64 neurons respectively.
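The reported architecture translates directly into a few lines of Keras. The activation, optimizer, and training schedule below are assumptions; the tuned values are listed in Table S1.

```python
import tensorflow as tf

# Feed-forward network with the reported 256- and 64-neuron hidden layers.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(12,)),            # 12 descriptors
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1),                      # predicted magnetic moment
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
model.fit(X_train, M_train, epochs=200, batch_size=32,
          validation_split=0.1, verbose=0)
```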

K-fold cross validation

To search for the best hyperparameters and to compare the performance of different models, we implemented K-fold cross-validation on the training data. K-fold cross-validation relies on a data-partitioning scheme to ensure that the model generalizes across a diverse dataset: the training data is randomly divided into K subsets, the model is trained on K − 1 of them and tested on the remaining one, and the process is repeated K times, with the results analyzed statistically to choose the best-performing model. In this work, each model is trained and fine-tuned with K = 10 through a grid search.
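The procedure corresponds to the following sketch for a single candidate model (an RF with default settings here; in practice the loop runs inside the grid search shown earlier).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

kf = KFold(n_splits=10, shuffle=True, random_state=0)
fold_mse = []
for train_idx, val_idx in kf.split(X_train):
    # Train on K-1 folds, evaluate on the held-out fold.
    model = RandomForestRegressor(random_state=0)
    model.fit(X_train[train_idx], M_train[train_idx])
    pred = model.predict(X_train[val_idx])
    fold_mse.append(mean_squared_error(M_train[val_idx], pred))

print(np.mean(fold_mse), np.std(fold_mse))  # statistics over the 10 folds
```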

Stacked generalization

After examining the individual models, we combined the best-performing ones using a stacked generalization60 algorithm to improve predictive performance. Stacked generalization is an ensemble approach that combines two or more pre-trained models (base learners) with a second-level regression model (meta learner). Here, we stacked six base learners (DT, RF, SVR, XGB, KNN, and ANN) followed by an RF meta learner, as shown in Fig. 3. It is worth noting that LR was omitted from the stack because its performance was not satisfactory.

Figure 3

Block diagram of the stacked generalization.

Despite being a powerful technique that draws on the strengths of multiple models, stacked generalization is prone to data leakage during cross-validation: if the same data is used to train both the base models and the meta-regression model, the result is overfitting, and this leakage may mislead the model selection process. Hence, in this work we implemented the stacking algorithm with cross-validation proposed by Wolpert60 to prevent data leakage and overfitting. In this technique, the training data is first randomly divided into K sets. In the first stage, the base models are trained on K − 1 sets and score-level features are extracted from them on the remaining set; the process is repeated K times, each time producing a new dataset of base-model score-level features. In the second stage, a meta-regression model is trained on the data constructed in the first stage. Finally, each base model is retrained on the entire training dataset and stacked together, then connected to the previously trained meta-regression model (RF) to form the final model. Note that each base learner and the meta learner are trained and evaluated independently using tenfold cross-validation.
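scikit-learn's StackingRegressor implements exactly this out-of-fold scheme and could serve as a compact stand-in for the procedure described. The sketch below uses default hyperparameters, and the ANN is omitted: it would need a scikit-learn wrapper (e.g., scikeras' KerasRegressor) to join the stack.

```python
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRegressor

base_learners = [('dt', DecisionTreeRegressor(random_state=0)),
                 ('rf', RandomForestRegressor(random_state=0)),
                 ('svr', SVR()),
                 ('xgb', XGBRegressor(random_state=0)),
                 ('knn', KNeighborsRegressor(n_neighbors=5))]

# cv=10 means the RF meta learner is fit on out-of-fold predictions only,
# i.e., the leakage-free Wolpert scheme described above; the base learners
# are then refit on the full training set.
stack = StackingRegressor(estimators=base_learners,
                          final_estimator=RandomForestRegressor(random_state=0),
                          cv=10)
stack.fit(X_train, M_train)
M_pred = stack.predict(X_test)
```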

Performance evaluation

To evaluate the performance of different machine learning regression models, we use three evaluation metrics: mean square error (MSE), mean absolute error (MAE), and coefficient of determination (R2), which are defined below.

$$\text{Mean Squared Error (MSE)} = \frac{1}{n}\sum_{i=1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2}$$
$$\text{Mean Absolute Error (MAE)} = \frac{1}{n}\sum_{i=1}^{n} \left| y_{i} - \hat{y}_{i} \right|$$
$$\text{Coefficient of determination } \left( R^{2} \right) = 1 - \frac{\sum_{i=1}^{n} \left( y_{i} - \hat{y}_{i} \right)^{2}}{\sum_{i=1}^{n} \left( y_{i} - \overline{y} \right)^{2}}$$

where \(y_{i}\), \(\hat{y}_{i}\), and \(\overline{y}\) are the true value, the predicted value, and the average of y, respectively.

MSE measures the average of the squared residuals, while MAE measures the average of the absolute residuals. Both are non-negative; smaller values indicate less error and better performance. MSE penalizes large errors more heavily than MAE and is hence more sensitive to outliers in the data. The R2 score, also known as the coefficient of determination, is a statistical measure of a regression model representing the proportion of the variance in the dependent variable that is predictable from the independent variable(s); for a well-fitted model its value lies between 0 and 1. Since R2 alone does not measure the accuracy of the predictions61,62, we use it in conjunction with MSE and MAE to evaluate the regression models in our study.
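With scikit-learn, the three metrics are one call each; a minimal sketch using the stacked model's test-set predictions from the earlier snippet.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

mse = mean_squared_error(M_test, M_pred)   # in (mu_B)^2
mae = mean_absolute_error(M_test, M_pred)  # in mu_B
r2 = r2_score(M_test, M_pred)
print(f'MSE={mse:.3f}  MAE={mae:.3f}  R2={r2:.3f}')
```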

Results and discussion

First, we performed exploratory data analysis on the DFT dataset. It reveals that bimetallic chalcogenides containing S and Se have higher magnetic moments than those containing Te, as shown in Fig. 4; this is supported by earlier research49 showing that FeS and FeSe exhibit stronger magnetization than FeTe. Furthermore, Fe-chalcogenides containing Cr and S have higher magnetic moments than those containing other transition metal elements (Ni, Co, or Mn) and chalcogen elements (Se or Te), as also shown in Fig. 4. An increase in the magnetic moment is also noticeable when the Cr or Mn concentration increases in chalcogenides containing S or Se (see Fig. S4 in Supplementary Information).

Figure 4

Dot plot showing the magnetic moments of Fe-based bimetallic chalcogenides FexAyB, where A represents Ni, Co, Cr, or Mn, B represents S, Se, or Te, and x and y are the concentrations of the respective atoms. The three shaded regions differentiate the magnetic moments of the chalcogenides containing S, Se, and Te respectively. Blue, orange, green, and red dots correspond to the magnetic moments of the chalcogenides containing the transition metals Ni, Co, Cr, and Mn respectively. Bimetallic chalcogenides with Cr and S exhibit the highest range of magnetic moments.

Further, the substitutional sites of transition metal elements in the chalcogenides are found to influence the target value (magnetic moment).

Tenfold cross-validation results of base learners and meta learners

Table 2 compares the performance of the machine learning models on the training dataset described in Table 1. The RF model performs best in terms of mean MSE, mean MAE, and mean R2; the detailed performance measures are provided in Tables S2, S3 and S4 (Supplementary Information). Except for LR, all models perform reasonably well, consistent with a plausible nonlinear relationship between the target variable and the features in our dataset. The well-performing models are subsequently used to develop the final stacked model.

Table 2 10-Fold cross-validation results of different machine learning models.

Next, to find the best meta-regression model for the stacked generalization approach, we used the outputs of the individual base models to train LR, RF, and XGB and recalculated the mean MSE, mean MAE, and mean R2. The results, presented in Table 3, show that the RF model is the best meta learner.

Table 3 10-Fold cross-validation results of various meta-learners.

Evaluation of the final stacked model on an independent (DFT) test dataset

Finally, we trained the models on the entire training dataset and tested them on the independent test dataset. Against the 653 independent DFT test data points (Table 1), the MSE, MAE, and R2 of the stacked model are 1.655 (µB)2, 0.546 µB, and 0.922 respectively, compared to tenfold cross-validation values of 1.29 (µB)2, 0.50 µB, and 0.94. The final model thus performs nearly as well on the test data as it did during validation, indicating the generalizability of the approach.

Figure 5 compares the true and predicted magnetization for each data point in the independent DFT test dataset, with the test points sorted in ascending order of the magnetic moment obtained from DFT (true value). The predictive performance of the model is noticeably better for M < 14 μB, as expected given that only 3% of the training dataset is available for M > 14 μB. Though we noticed over- and under-prediction of magnetic moments in some instances, our model identifies non-magnetic and magnetic chalcogenides with a high degree of certainty. A deeper analysis of the under- and over-predicted regions reveals that only a limited number of data points with similar target values were available during training, which explains the variation of the predicted target value (M) in those regions. Nevertheless, our study shows that the complex electronic interactions involved in the DFT calculations are well captured by the proposed model to predict magnetism in bimetallic chalcogenides.

Figure 5

Scatter plot showing true (green circles) versus predicted (red circles) magnetic moments (M) in bimetallic chalcogenides. Data points are from the independent test dataset, sorted in ascending order of the magnetic moment (M) obtained from DFT (true value).

Expanding the applicability of the proposed model

In this work, a unit cell of FeS having 16 Fe and 16 S atoms is used to generate the DFT dataset. As a result, y is restricted to fixed multiples of 1/16 (0.0625, 0.125, 0.1875, 0.25, 0.3125, 0.375, 0.4375, 0.5, 0.5625, 0.625) with x = 1 − y, and the descriptors S1, S2, S3, and S4 can each take only integer values from 0 to 4. To overcome this limitation and increase the flexibility of our model, we developed a generalized algorithm that accepts the concentration y (or x) as a percentage, subject to the constraint 0 < y < 0.625, and likewise allows the user to specify the concentrations of substituted atoms at the atomic sites S1, S2, S3, and S4 in percentages. The algorithm then converts these inputs into the features y, x, S1, S2, S3, and S4 in the form required by the ML model, as illustrated in the sketch below. The detailed procedure is explained in Algorithm 1, whose implementation is available in the provided GitHub link.
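One plausible reading of this conversion is sketched below. The helper name and the exact scaling conventions are assumptions for illustration; the authoritative version is the Algorithm 1 implementation in the GitHub repository.

```python
ELEMENTS = ('Fe', 'Ni', 'Co', 'Cr', 'Mn', 'S', 'Se', 'Te')  # assumed order

def build_features(metal, chalcogen, y, site_fractions):
    """Map user-supplied concentrations onto the model's 12 input features.

    y              -- concentration of metal A as a fraction, 0 < y < 0.625
    site_fractions -- share of the substituted atoms assigned to each of
                      the four sites S1-S4 (must sum to 1)
    """
    assert 0.0 < y < 0.625, 'the model is trained only for 0 < y < 0.625'
    assert abs(sum(site_fractions) - 1.0) < 1e-9

    conc = {el: 0.0 for el in ELEMENTS}
    conc['Fe'], conc[metal], conc[chalcogen] = 1.0 - y, y, 1.0

    n_sub = y * 16                               # substitutions on the 16-Fe grid
    sites = [f * n_sub for f in site_fractions]  # continuous S1..S4 values
    return [conc[el] for el in ELEMENTS] + sites

# Example: Fe0.70 Cr0.30 S with half of the Cr placed on site S1.
features = build_features('Cr', 'S', y=0.30,
                          site_fractions=(0.5, 0.25, 0.25, 0.0))
```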

Algorithm 1

Conclusions

The quest for new magnetic materials cheaper than rare-earth-based magnets has attracted significant interest in recent years, with applications ranging from data storage to automotive vehicles, biomedical devices, and the green energy sector. Experimental and computational investigation of possible alternative magnetic materials is expensive and time-consuming. In this work, we have presented a data-driven framework to accelerate the discovery of new magnetic materials. Starting from an optimized FeS structure, we employed a substitution technique to design new bimetallic chalcogenides of different compositions and used first-principles DFT to generate the training and test data. After training and evaluating several machine learning models, we developed a stacked machine learning model that combines the best-performing base models into the final predictive tool. The final model shows a high degree of accuracy on the independent DFT test data, with MSE, MAE, and R2 values of 1.655 (µB)2, 0.546 µB, and 0.922 respectively. Additionally, we developed a generalized algorithm that extends the applicability of the model to a wider range of inputs. The predicted data reveal that Fe-based bimetallic chalcogenides containing the chalcogen element S and a higher concentration of the transition metal Cr yield higher magnetic moments than the other configurations, which is consistent with the DFT data. This work thus presents a strategy for discovering new magnetic materials made from less expensive and more abundant elements that could eventually replace the costly existing magnets made from rare-earth metals.