Introduction

The discovery of novel functional materials is a major goal in materials science. Advances in electronic structure calculations and the development of digital crystal databases have led to the successful discovery of some new functional materials via high-throughput screening (HTS)1,2,3,4,5,6. HTS is typically conducted in hierarchical stages with increasing accuracy and cost, often starting with the screening of a density functional theory (DFT) database of previously synthesized materials, followed by high-level DFT refinements and experimental verification2,6. To expand the scope, databases such as Materials Project7, OQMD8, and AFLOW9 have been collecting a large number of virtual crystals, which are ground-state structures in silico that have not yet been experimentally synthesized. Some of the promising virtual crystals have indeed been synthesized10,11,12, demonstrating the validity of a virtual screening strategy to discover new materials.

Many, if not most, screened virtual materials have not been experimentally realized11; thus, assessing synthesizability has been an important subject13,14,15,16,17,18,19,20. Typically, the synthesizability of virtual materials is assessed using the energy above convex hull6,11,21,22. As is well recognized, however, this thermodynamic metric is insufficient for assessing synthesizability because synthesis kinetics and growth conditions (e.g., selection of precursors, annealing temperature and duration, and external pressure) are largely neglected in that approach11. Therefore, developing a generalized and more reliable method to predict the synthesizability of a candidate crystal can significantly accelerate the high-throughput discovery of new materials.

A binary classification (positive and negative labeling) may be used to predict stability. However, such positive and negative learning cannot be used to predict the synthesizability as there is no negative (“unsynthesizable”) crystal data, since the inability to synthesize a hypothetical crystal is difficult to know a priori. Hence, databases only have previously synthesized crystals (positive) and potentially synthesizable crystals (unlabeled). The positive-unlabeled (PU) semi-supervised classification methods aim to predict positivity for problems where the negative data are hard to obtain23,24,25. Indeed, a transductive PU-learning method26 has been used recently27,28 to predict the synthesizability score, called crystal-likeness (CL) score, of unlabeled virtual crystal structures in the MP database. The model showed a respectable out-of-sample positive data prediction accuracy of around 87%27. Since the method uses crystal graph convolutions to encode material information, it can be seen as a structure-based synthesizability prediction model, in comparison to conventional thermodynamics-based estimations.

While the previous work demonstrated the proof-of-concept for synthesizability prediction, the model accuracy for specific subdomains of chemical space such as perovskites (74%) was below the overall accuracy (87%)29,30. Since perovskites are receiving increasingly wide attention for their applications in photovoltaics31,32,33, light-emitting diodes34,35,36, magnetic materials37,38, superconductors39,40,41, and Li-ion conductors42, developing a perovskite-focused model with improved accuracy would be invaluable for more efficient materials discovery10,43.

Indeed, several previous synthesis models have focused on perovskites. The heuristic Goldschmidt tolerance factor, which is based on the ionic radii of the constituent elements, is commonly used to predict the stability of ionic perovskites44. Similarly, Bartel et al. developed a machine learning (SISSO)-determined tolerance factor to classify the structure type (perovskite vs. non-perovskite) of ionic perovskites45. In addition, gradient boosting decision trees46,47,48, support vector machines49, random forest classification47,50, and combinations of multiple models51 were used to develop similar classification models. However, previous models focused largely on metal oxide perovskites and mostly relied on the Shannon ionic radii database52, which makes it difficult to consider perovskites with more covalent bonding53 or anti-perovskites54 because of the limited scope of Shannon’s table52. Potentially, training a generalized deep learning model could address these deficiencies.

A previous study has shown that training the model with a particular domain of materials can improve the model accuracy55. Such domain-specific learning could also improve the synthesizability prediction for perovskites. Another challenge in applying the PU learning framework27 to a chosen structure type is the small data size for the prototype. Transfer learning56 is a widely used strategy to train a deep neural network with a small data set: a more general model is first developed with a large data set that contains the target domain, the knowledge of this general model is transferred to a new model, and a portion of the new model is retrained with the target domain data set57.

Here, we combine positive-unlabeled learning26, domain-specific learning55, and transfer learning56 to develop a synthesizability prediction model for perovskites with a high practical accuracy. We pre-train the graph neural network with the Materials Project database and retrain a portion of the model with the smaller perovskite dataset. For learning, we used 943 previously synthesized perovskite crystals and 11,964 virtual perovskites collected from the Materials Project (MP), OQMD, and AFLOW databases. Our model shows a high out-of-sample positive data accuracy of 95.7%, compared with around 74.0% for the original non-domain-specific model. Our model predicted 962 of the 11,964 virtual perovskites as synthesizable, and 179 of those virtual crystals have indeed been synthesized in the literature. Compared to the previous ionic perovskite-focused models, our model is capable of predicting the synthesizability of all types of perovskites in the dataset, including anti-perovskites where the anion and cation occupation is inverted. We furthermore suggest promising Li-rich anti-perovskites and metal halides as candidates for solid-state electrolyte and photoactive materials discovery, respectively.

Results and discussion

Development of PU learning model

The inorganic crystal data from the MP7 database, retrieved in October 2020, consisted of 46,546 crystals with an Inorganic Crystal Structure Database (ICSD) ID and 79,789 crystals without one. We considered the 46,546 crystals with an ICSD ID and the experimental tag as synthesizable, and the remaining 79,789 crystals without an ICSD ID as “virtual”, i.e., undetermined. These MP data are used to pre-train the model. We then retrieved the perovskite crystals from the MP7, OQMD8, and AFLOW9 databases in October 2020 (Fig. 1a). We used the StructureMatcher function in pymatgen58 and the perovskite prototype structures in the AFLOW database9 to identify and remove duplicate crystals, resulting in 943 synthesized and 11,964 virtual perovskite crystals. The perovskite data are used to train the transferred model.

Fig. 1: Overview of the model development.

a Domain-specific transfer learning workflow. The model is first trained with the Materials Project database and is then re-trained with the perovskite-only data extracted from the three databases. b Positive and unlabeled learning procedure overview. c The graph neural network architecture. Vin and Ein are the atom and edge features. Dense indicates linear multiplication followed by the softplus activation layer, and Linear indicates linear multiplication. The number next to the operation indicates the output feature dimension. Min Pool indicates minimum pooling followed by sigmoid activation. More detail is given in the “Methods” section. d The crystal representation. Atoms and edges are converted to a mathematical representation via featurization.

Both the pre-training and transfer learning are performed using inductive PU learning26. To test our model, 10% of randomly sampled synthesized crystals are set aside from both the MP data used for pre-training and the perovskite data used for transfer learning. Thus, we ensure that the test data are not observed during the pre-training stage. With the rest of the data set, we perform the PU learning procedure. Here, 10% of the synthesized crystals and the same number of virtual crystals are randomly sampled for model validation. The rest of the synthesized crystals are used for training, and the same number of virtual crystals are randomly sampled and treated as negative data for the training. This process is repeated 100 times, resulting in an ensemble of 100 models. The key point of the procedure is that, for each model, the training and validation sets of virtual crystals change, whereas those of the synthesized crystals remain fixed. The synthesizability score, which we call the CL score27, is calculated by averaging the predictions of the 100 models. Varying the virtual data set aids in forming the averaged decision boundary, as shown conceptually in Fig. 1b.
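The sampling scheme described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: the function names `pu_bagging_splits` and `cl_score` are hypothetical, each trained model is represented by a plain callable returning a value in [0, 1], and we assume that the validation and training "negatives" drawn for one model do not overlap.

```python
import random


def pu_bagging_splits(positives, unlabeled, n_models=100, val_fraction=0.1, seed=0):
    """Yield (train_pos, train_neg, val_pos, val_neg) for each ensemble member.

    The positive split is drawn once and reused for every model; the unlabeled
    samples that play the role of negatives are redrawn for every model.
    """
    rng = random.Random(seed)
    pos = list(positives)
    pool = list(unlabeled)
    rng.shuffle(pos)
    n_val = max(1, int(round(val_fraction * len(pos))))
    val_pos, train_pos = pos[:n_val], pos[n_val:]
    for _ in range(n_models):
        # Assumption: validation and training negatives are disjoint per model.
        drawn = rng.sample(pool, n_val + len(train_pos))
        val_neg, train_neg = drawn[:n_val], drawn[n_val:]
        yield train_pos, train_neg, val_pos, val_neg


def cl_score(models, crystal):
    """Average the per-model predictions (each in [0, 1]) into one CL score."""
    return sum(model(crystal) for model in models) / len(models)
```

In use, one classifier would be trained per yielded split, and `cl_score` would then be evaluated on every virtual crystal with the resulting list of models.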

For the prediction model, we constructed a graph convolutional neural network (GCNN) inspired by MEGNet59, as shown in Fig. 1c; the details, including the crystal featurization, are provided in the “Methods” section. Our model calculates a CL score between 0 and 1, where a high CL score indicates high synthesizability. For practical screening, crystal candidates can be tested in decreasing order of CL score for the best chance of success. In this work specifically, we empirically set a CL score threshold of 0.5 to calculate metrics such as the true positive rate (TPR; true positive/(true positive + false negative)) and also to consider a crystal as a synthesizable candidate. To perform transfer learning, we first pre-train our model with the Materials Project data. Then, the model weights in the encoding layer and the first graph convolution layer are fixed, and the rest of the model is re-trained using the combined perovskite data.
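As a concrete reading of the threshold and ranking described above, the following sketch computes the TPR over a held-out set of synthesized crystals and orders candidates by CL score. The function names are hypothetical, and treating a score exactly equal to the threshold as positive is an assumption.

```python
def true_positive_rate(positive_scores, threshold=0.5):
    """TPR = TP / (TP + FN), evaluated on CL scores of known positives only."""
    if not positive_scores:
        raise ValueError("need at least one held-out positive")
    # Assumption: a score equal to the threshold counts as a positive prediction.
    true_positives = sum(1 for s in positive_scores if s >= threshold)
    return true_positives / len(positive_scores)


def rank_candidates(cl_scores):
    """Return candidate names sorted by decreasing CL score."""
    return sorted(cl_scores, key=cl_scores.get, reverse=True)
```

Because only positives enter the calculation, this metric says nothing about false positives, which is why precision cannot be reported in the same way.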

Model accuracy and validation based on previous experiments

We assess the model by the true positive rate using the held-out positive test set as shown in Fig. 2a. We focus on the TPR since negative data (unsynthesizable) are unavailable. Compared to the MP-trained general synthesizability prediction model, the domain-specific transfer PU learning has significantly higher TPR for perovskites, increased from 0.740 (GCNN + PUL in Fig. 2a) to 0.957 (GCNN + PUL + DSL + TL in Fig. 2a). For comparison, we also tested the CGCNN model in our previous work27, and found that TPR is 0.595 and 0.957 for the general model and the domain-specific transfer PU learning, respectively, suggesting that the domain-specific transfer learning is more important than the model architecture. We plotted the CL score distribution for the virtual and synthesized crystals (Fig. 2b) to assess perovskite chemical space. The scores for virtual crystals are skewed towards the CL score of 0, and only 962 (1121 considering structure distortion) out of 11,964 virtual perovskites are predicted synthesizable. We find that domain-specific transfer learning can improve the accuracy for oxide-focused chemical space (from 0.837 to 0.930). We note that while TPR can be artificially increased by lowering the threshold probability or developing a naïve model that predicts all crystals synthesizable, such is not the case for our model as 84% of the perovskite crystals are predicted unsynthesizable. Figure 2b demonstrates the CL score distribution for all data and the out-of-sample test data, which also shows that virtual crystals are generally predicted unsynthesizable.

Fig. 2: Model accuracy and data distribution.

a The out-of-sample true positive rate for perovskites for the various tested models. GCNN indicates graph convolutional network, BC indicates binary classification, PUL indicates positive-unlabeled learning, DSL indicates domain-specific learning, and TL indicates transfer learning. True positive rate is assessed as the performance measure, as the positive data (synthesized crystals) are known, while the negative data (unsynthesizable crystals) are not known. As the unlabeled data (virtual crystals) are available in the database, positive unlabeled learning is implemented to assess synthesizability. b The score distribution for the synthesized and virtual crystals. Diamond and circle marks indicate the out-of-sample test data, and all data, respectively. The count is normalized by the highest peak value.

To understand the origin of the model’s success, we test a binary classification model in which the GCNN is trained on a dataset where all unlabeled data are labeled negative and the positive data are oversampled to balance the number of positive and negative data. Here, we find that the TPR decreases to 0.361 (GCNN + BC in Fig. 2a) and 0.691 (GCNN + BC + DSL + TL in Fig. 2a) for the MP-trained general model and the transfer learning model, respectively. This could be due to positive data within the unlabeled data being mislabeled as negative; thus, the data splitting method in PU learning is critical. We also trained a PU-learning model without pre-training on the MP data (i.e., without transfer learning) and found that the TPR decreases slightly to 0.947 (GCNN + PUL + DSL in Fig. 2a). Thus, the model’s success is largely attributed to the domain-specific data set, and the transfer learning scheme contributes only marginally to the TPR. For the rest of the discussion, we will use results obtained from the best model, GCNN + PUL + DSL + TL.

We further investigated the correlation between the predicted CL score and the energy above hull for all the virtual crystals obtained from each data source, as shown by the histograms and violin plots in Supplementary Fig. 4. Overall, the CL score and the energy above hull are negatively correlated (Pearson correlation coefficient of −0.3739). Thus, our model learns energetic stability (energy above hull) to some extent without being explicitly trained on this metric. Interestingly, Supplementary Fig. 4 shows that a significant number of energetically stable perovskites (energy above hull < 0.1 eV/atom) have low CL scores, indicating the difference between the machine-learned synthesizability and the conventional energetic synthesizability metrics.
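For reference, the Pearson correlation coefficient quoted above is the covariance of the two quantities divided by the product of their standard deviations. A minimal sketch follows; the function name `pearson_r` is ours, and the inputs would be the per-crystal CL scores and energies above hull.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)
```

A value of −1 would mean that the CL score falls linearly with the energy above hull, whereas a value near 0 would mean no linear relationship.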

To further validate our model in practice, we searched the literature for reported syntheses of the virtual crystals that are predicted synthesizable. We used the XRD patterns to match the synthesized and virtual crystals, as shown in Supplementary Fig. 5. We found that 179 of the 962 synthesizable virtual perovskite compounds have been synthesized before (Supplementary Fig. 5 and Supplementary Table 3). In further analysis, the percentage of found virtual crystals plotted against the CL score in Fig. 3a shows an interesting trend: the ratio of previously synthesized crystals increases with the predicted CL score. Figure 3b shows the two previously synthesized virtual perovskites with the highest synthesizability scores and their respective XRD patterns. We also searched the literature for the 1000 virtual crystals with the lowest CL scores but were not able to find any previous report of their synthesis. To further assess the model’s performance for crystals with an indecisive score, we searched the literature for the crystals with CL scores between 0.4 and 0.5. Only 20 of the 386 virtual crystals in this range had been previously synthesized, a much lower fraction than for crystals with CL scores above 0.5, indicating that the CL score remains informative in the indecisive region. While these assessments provide validation for our model, we cannot guarantee the model’s high precision (true positive/(true positive + false positive)), as it is difficult to show that our positive predictions are incorrect.
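The counts quoted above correspond to 179/962 ≈ 18.6% for CL scores above 0.5 and 20/386 ≈ 5.2% for CL scores between 0.4 and 0.5. A per-bin version of this ratio can be computed as sketched below; the function name, the bin edges, and the convention that the last bin includes its upper edge are assumptions made for illustration.

```python
def found_ratio_by_bin(scores, found, edges):
    """Return (n_found, n_total, percentage) for each CL-score bin.

    Bins are half-open [lo, hi), except the last one, which also includes
    its upper edge so that a score of exactly edges[-1] is counted.
    """
    results = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        is_last = hi == edges[-1]
        idx = [i for i, s in enumerate(scores)
               if lo <= s < hi or (is_last and s == hi)]
        n_found = sum(1 for i in idx if found[i])
        pct = 100.0 * n_found / len(idx) if idx else 0.0
        results.append((n_found, len(idx), pct))
    return results
```

Here `found[i]` is `True` when a literature report of synthesis exists for virtual crystal `i`.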

Fig. 3: Model validation.

a The percentage of virtual perovskites that are found synthesized in the literature. The ratio indicates the number of found over the number of virtual crystals in the range. b The structure of virtual crystals and XRD comparison between the experimental and virtual crystals for the top two perovskites previously reported in refs. 71,72. The full list of virtual crystals and XRD pattern comparison are shown in Supplementary Table 3 and Supplementary Fig. 5.

Comparison with tolerance factor-based models

We compare our model’s out-of-sample TPR with two empirical perovskite discovery strategies, i.e., Goldschmidt rule-based and SISSO-based screening, under the assumption that materials are considered synthesizable if they remain after applying the screening filters. Davies et al.60 used Goldschmidt tolerance factor44-based screening with the ionic radii from the Shannon table52. This screening focused on standard ionic perovskites, where the element on the C site of the ABC3 formula was limited to 7 anions. Since our data contain non-classical ionic perovskites, only 388 of the 943 synthesized perovskites were found to be within their screening scope. For those 388 perovskites that are directly relevant to the procedure of Davies et al.60 (see SI), a TPR of 0.863 is obtained using their method. Bartel et al.45 developed and used a SISSO-determined tolerance factor that uses the oxidation state and the ionic radius52. Only 310 of the 943 perovskites were within their selection of elements; by reproducing their procedure (see SI), we calculated a TPR of 0.806. Note that the reported45 TPR (0.936) is different, which may be due to the difference in the dataset. Nonetheless, our model’s out-of-sample TPR (0.957) is significantly higher than those of the previous methods (0.806–0.863) for the experimentally synthesized perovskites considered.
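For readers unfamiliar with the Goldschmidt rule, the tolerance factor of an ABX3 perovskite is t = (r_A + r_X) / (√2 (r_B + r_X)), where r_A, r_B, and r_X are ionic radii. The sketch below evaluates it for SrTiO3 using commonly quoted Shannon radii, which are supplied here as an assumption and should be checked against the table before use. The acceptance window applied by Davies et al. is not reproduced.

```python
import math


def goldschmidt_tolerance_factor(r_a, r_b, r_x):
    """t = (r_A + r_X) / (sqrt(2) * (r_B + r_X)); radii in the same unit."""
    return (r_a + r_x) / (math.sqrt(2.0) * (r_b + r_x))


# Assumed radii in angstrom: Sr2+ (12-coordinate), Ti4+ (6-coordinate), O2- (6-coordinate).
t_srtio3 = goldschmidt_tolerance_factor(1.44, 0.605, 1.40)
```

Values of t close to 1 are conventionally associated with the ideal cubic perovskite structure, and a screening filter of this kind keeps compositions whose t falls inside a chosen window.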

Also, our model selects fewer synthesizable candidates than the previous strategies. Supplementary Figure 1 compares the Goldschmidt rule-based screening results with the CL scores and demonstrates that a large portion of the virtual crystals that pass the screening have low CL scores. More precisely, our model predicts that 9.4% of the virtual crystals are synthesizable, whereas the methods of Davies et al.60 and Bartel et al.45 predict that 24.5% and 25.7% are accessible, respectively. Figure 4 compares the 2D elemental maps for ABO3 perovskite oxides based on our model and on the Goldschmidt-based screening of Davies et al.60, which also shows that fewer candidates are predicted synthesizable by our model. Here, the red and blue boxes indicate the virtual crystals that have been synthesized. This result does not necessarily indicate that our model is more selective, as the synthesizability of crystals is difficult to measure. On the other hand, our model predicts a probability; thus, the best candidates can be prioritized.

Fig. 4: Synthesizability of the ABO3 perovskite compounds for our model (lower left triangle) and Goldschmidt-rule-based screening (upper right triangle).

The green color in the lower left triangle indicates the maximum CL score of the perovskite structures from the databases with the given composition (color bar at the bottom of the figure), and the green color in the upper right triangle indicates that the combination passes the screening. The blue box indicates that the combination has been synthesized before. The red boxes indicate the virtual crystals that were found synthesized previously.

While the previous strategies focused on Shannon table52-based predictions for classical ionic perovskites, our model can predict synthesizability for various perovskite types. We expanded the screening scope of Davies et al.60 by including non-anion elements for the C site as well. In this case, the TPR of the screening method is low (0.389), which is attributed to the increased covalency of the bonds for some elemental combinations53 and the lack of relevant elemental data in the Shannon tables for anti-perovskites54. In addition to classical ionic perovskites, we found unconventional combinations of elements among the 179 virtual perovskites that were found synthesized. These include “covalent” perovskites that contain two or more anions (e.g., CsIO3, ClOLi3) with higher covalency in bonds, hydride perovskites that contain hydrogen (e.g., CaCsH3), and anti-perovskites that contain an anion in the B site instead of the C site of the ABC3 formula (e.g., SnNFe3) (see Fig. 5). Prediction for these three types of combinations is a new capability that the previous models lacked. Figure 5c, d shows the results of the SISSO-based model45 and the Goldschmidt rule-based screening60 for the 179 discovered virtual crystals, where we observe that a significant portion is out of scope. Also, Fig. 5b shows that the non-domain-specific model predicts only 101 of the 179 found virtual crystals to be stable, showing the value of domain-specific learning.

Fig. 5: Predictions of the other methods for the reported virtual crystals.

a The distribution of perovskite types for the 179 virtual perovskites found synthesized. The stability prediction of the 179 compounds using the b non-domain-specific MP-trained general model (GCNN+PUL in Fig. 2a), c SISSO-based model45, and d Goldschmidt rule-based screening60. The ABC3 perovskites were classified based on the following criteria: classical perovskites contain cation in A and B site and anion in C site (e.g., SrTiO3), anti-perovskite contains anion in B site and cation in A and C site (e.g., SnNFe3), covalent perovskite contains two or more anions (e.g., CsIO3, ClOLi3), and hydride contains hydrogen on the C site (e.g., CaCsH3).

Applications

While perovskites have been studied extensively, Fig. 4a shows that many synthesizable elemental combinations remain to be discovered. We plot the periodic table representation of the synthesizability in Supplementary Fig. 2. Here, the ratios of virtual candidates with CL scores above 0.5 are shown for the given element in the given site. Compared to the classical ionic perovskites, anti-perovskites that contain C, N, O, or P in the B site and a transition metal on the C site have high CL scores. Indeed, we found that a significant number of virtual anti-perovskites have been previously synthesized (Fig. 5), suggesting there may be more opportunities to discover anti-perovskites. Anti-perovskites have shown many interesting properties such as superconductivity39,40 and magnetism37,38,61. Our model suggests that 327 virtual anti-perovskites are synthesizable; these are listed in Supplementary Table 3.

We also selected synthesizable candidates for two technologically important applications. Metal halide perovskites, namely CsPbI3, RbPbI3, and MAPbI3 (MA = CH3NH3+), have shown many promising applications in photovoltaics and light-emitting diodes in the past decade34,35,36. However, these materials often contain toxic Pb. The semiconducting properties of these perovskites are largely due to the diffuse valence p-orbitals of the halide62; thus, we expect that more semiconducting halide perovskites can be accessed. Our model predicts that 98 virtual metal halides are synthesizable. We further screen these materials by bandgap, using a two-step DFT procedure (PBEsol relaxation followed by an HSE06 single-point calculation). We found that 43 materials have bandgaps, as listed in Supplementary Table 1. In particular, 12 candidates have a bandgap between 0.7 and 2.0 eV, which could be promising for photovoltaics; these are shown in Table 1 together with their CL scores and energies above hull. The majority of the predicted materials (8 of 12 candidates) are thermodynamically stable (energy above hull < 0.1 eV/atom). In addition, as shown in Supplementary Fig. 3b, the CL scores of all the predicted materials in Table 2 overlap with the CL score distribution of the positive data. We note that two materials (NPF3 and RbCF3) are highly unstable (energy above hull > 1.0 eV/atom). While our model has a relatively high true positive rate, it could make false positive predictions (low precision) as discussed above, resulting in this disparity. We note that many of these compositions contain non-standard chemistries (e.g., CsNaF3 or RbOF3) that would not be identified based on simple electron counting considerations.

Table 1 Synthesizable halide perovskites with a calculated bandgap between 0.7 and 2.0 eV for photovoltaic applications.

Zhao et al.41 discovered that the Li-rich anti-perovskite Li3OCl has superionic conductivity suitable for solid-state battery electrolytes. The high conductivity was achieved due to the high Li concentration and the streamlined C-site diffusion pathway; thus, the conductivity is expected to be transferable to other Li-rich anti-perovskites such as Li3OBr63. We list 8 Li-rich anti-perovskites with CL score > 0.5 in Supplementary Table 2, together with their CL scores and energies above hull. While the previously reported Li3OBr and Li3OCl are thermodynamically stable (0.012 eV/atom for Li3OBr and 0.006 eV/atom for Li3OCl), the newly predicted materials in Supplementary Table 2 show low thermodynamic stability (>0.3 eV/atom). A similar disparity is observed in the CL score distribution (see Supplementary Fig. 3a), indicating potential thermodynamic difficulties in synthesizing these materials despite their being predicted synthesizable based on the CL scores. This suggests an interesting possibility that the combined use of CL scores and thermodynamic metrics can complement the limitations of each approach and yield more reliable synthesizability predictions.

To summarize, perovskites represent a unique class of materials with desirable physical properties. We have implemented domain-specific transfer PU learning to assess the synthesizability of perovskite materials. Our model demonstrated a 0.957 out-of-sample true positive rate, significantly improving over the previous methods based on geometric factors (0.806–0.863)45,60. We searched the literature for the 962 virtual crystals that are predicted synthesizable and found that 179 virtual crystals have been synthesized, adding to the synthesized perovskite pool of 943 crystals in three open crystal databases. The same literature search for the 1000 virtual crystals with the lowest synthesizability scores yielded no synthesized cases, further validating our model. Compared to empirical models based on ionic radii that are most applicable to classical ionic perovskites, our model demonstrates a general ability to assess the synthesizability across all prototypes of perovskites, including the anti-perovskites, covalent perovskites, halides, and hydrides. To this end, we listed promising synthesizable candidates that can expand the materials portfolio for two important applications, i.e., Li-rich ion conductors and metal halide optical materials, which can be tested experimentally. We expect that the proposed domain-specific transfer PU learning would be fruitful to explore the target-specified crystal space for other crystal families and application domains.

Methods

Model architecture and training

The overall architecture of the convolutional neural network is shown in Fig. 1c. Vin and Ein are the atom and edge/interaction input features to the model. The graph structure of crystals is constructed by assigning edges to Voronoi neighbors within a 7 Å radius of each atom. The atom features are constructed by one-hot encoding of the element, and the edge features are constructed by Gaussian expansion of the distance and the Voronoi solid angles, as shown in Fig. 1d. These features are encoded with linear multiplication and a softplus activation. The graph convolutional layer contains neighbor edge and atom pooling to make new hidden features. In detail, the new edge features of edge i, Eout,i, are generated by

$$E_{\mathrm{out},i} = \sigma\left( W \cdot \phi\left( V_{\mathrm{in},j}, V_{\mathrm{in},k}, E_{\mathrm{in},i} \right) + \beta \right)$$
(1)

where σ is the softplus function, W is the linear multiplication weight, β is the bias, ϕ is the concatenation operator, and j and k are the two atoms connected by the edge. The new atom features for atom i are generated by

$$V_{\mathrm{out},i} = \sigma\left( W \cdot \phi\left( V_{\mathrm{in},i}, \sum_{j}^{n_{\mathrm{neighbor}}} \frac{E_{\mathrm{in},j}}{n_{\mathrm{neighbor}}} \right) + \beta \right)$$
(2)

where j is the index of edges that are connected to atom i. Here, the edge features are averaged and concatenated. The box with “Dense, 64” with two input arrows in Fig. 1c indicates the two convolution operators discussed above. The 64 indicates that the output feature size is 64. The “Dense, 64” with one input arrow indicates a simple activation layer for the feature, F,

$$F_{\mathrm{out}} = \sigma\left( W \cdot F_{\mathrm{in}} + \beta \right)$$
(3)
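Equations (1)–(3) can be written out directly with plain Python lists, together with one possible Gaussian expansion of an interatomic distance. This is a sketch under stated assumptions, not the code used in this work: the function names are hypothetical, the functional form, centers, and width of the Gaussian expansion are not specified in the text and are chosen for illustration, and the weights would be learned rather than fixed by hand.

```python
import math


def softplus(x):
    """Numerically stable softplus, log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dense(weights, bias, features):
    """Eq. (3): softplus(W . F_in + beta), with W given as a list of rows."""
    return [softplus(sum(w * f for w, f in zip(row, features)) + b)
            for row, b in zip(weights, bias)]


def gaussian_expand(distance, centers, width):
    """Assumed form of the expansion: exp(-((d - c) / width)^2) per center c."""
    return [math.exp(-((distance - c) / width) ** 2) for c in centers]


def update_edge(weights, bias, v_j, v_k, e_i):
    """Eq. (1): concatenate the two end-atom features with the edge feature."""
    return dense(weights, bias, v_j + v_k + e_i)


def update_atom(weights, bias, v_i, neighbor_edges):
    """Eq. (2): concatenate the atom feature with the mean of its edge features."""
    n = len(neighbor_edges)
    mean_edge = [sum(column) / n for column in zip(*neighbor_edges)]
    return dense(weights, bias, v_i + mean_edge)
```

Note that, following Eq. (2) as written, the atom update averages the input edge features rather than the updated ones.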

For the box with “Linear,1”, linear multiplication is used,

$$F_{\mathrm{out}} = W \cdot F_{\mathrm{in}} + \beta$$
(4)

resulting in a single element value. The “Min Pool” indicates the minimum pooling operation followed by the sigmoid operation. As discussed above, the intermediate atom and edge features are kept at a size of 64. We used the binary cross-entropy loss function with the Adam optimizer64 to train our model with a batch size of 512. The model is trained for 50 epochs, and the model with the lowest validation loss is selected.
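The readout and loss described above can be illustrated as follows. We assume here that the minimum is taken over the per-atom scalars produced by the “Linear, 1” layer, which the text does not state explicitly; the function names and the clipping constant `eps` are ours.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def readout(per_atom_values):
    """Minimum pooling followed by sigmoid (assumed to pool over atoms)."""
    return sigmoid(min(per_atom_values))


def binary_cross_entropy(prediction, label, eps=1e-12):
    """-[y log p + (1 - y) log(1 - p)], with p clipped away from 0 and 1."""
    p = min(max(prediction, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
```

With minimum pooling, a single atom with a low value is enough to pull the crystal-level score down, so a high score requires every atomic environment to look favorable.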

Bandgap and energy above hull calculations

For all DFT calculations, we performed spin-polarized PBEsol65,66 calculations with PAW-PBE pseudopotentials67 as implemented in the plane-wave-based ab initio package VASP68. We selected the PAW potentials as recommended in the MP database7. Atomic positions and unit cell parameters are fully relaxed using the conjugate gradient method with convergence criteria of 1.0e−5 eV for the energy and 0.05 eV/Å for the force, and with a 500 eV cut-off energy. The Brillouin zone is sampled with a k-point density of 1000 k-points per atom, generated using Pymatgen58. For the bandgap calculations on the relaxed structures, we used the HSE0669 hybrid functional implemented in VASP68 with a mixing parameter of 0.2. For computational efficiency, we used a cut-off energy of 400 eV and applied a uniform reduction factor to the q-point grid of the exact exchange potential (NKRED = 2) with gamma-centered, even-numbered k-point grids (with a k-point density of 1000 k-points per atom). For Brillouin zone integration70, we used the tetrahedron method with Blöchl corrections. To calculate the energy above hull, we extracted all relevant species in the convex hull diagram from the Materials Project and performed PBEsol calculations. The energy above hull is obtained from the calculated energies using Pymatgen58.
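To make the energy above hull concrete, the sketch below evaluates it for a binary A–B system in pure Python. It is a simplified stand-in for the Pymatgen phase-diagram analysis used in this work, which handles the multicomponent case. Here each phase is a pair (x, E), where x is the fraction of B and E is the formation energy per atom; the function names are hypothetical.

```python
def lower_hull(points):
    """Lower convex hull of (x, energy) points, returned in order of increasing x."""
    lowest = {}
    for x, energy in points:
        # Keep only the lowest-energy phase at each composition.
        lowest[x] = min(energy, lowest.get(x, energy))
    hull = []
    for p in sorted(lowest.items()):
        # Drop the last vertex while it lies on or above the line hull[-2] -> p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def energy_above_hull(x, energy, competing):
    """Energy of a phase at composition x relative to the hull of `competing`.

    Include the phase itself in `competing` to obtain the conventional
    non-negative value; otherwise a negative result means it lowers the hull.
    """
    hull = lower_hull(competing)
    for (x1, y1), (x2, y2) in zip(hull[:-1], hull[1:]):
        if x1 <= x <= x2:
            hull_energy = y1 + (y2 - y1) * (x - x1) / (x2 - x1)
            return energy - hull_energy
    raise ValueError("composition lies outside the range of the competing phases")
```

For example, with elemental references at (0, 0) and (1, 0) and a stable compound at (0.5, −1.0), a phase at (0.25, −0.2) sits 0.3 eV/atom above the hull.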