Abstract
The vast compositional and configurational spaces of multi-element metal halide perovskites (MHPs) result in significant challenges when designing MHPs with promising stability and optoelectronic properties. In this paper, we propose a framework for the design of B-site-alloyed ABX_{3} MHPs by combining density functional theory (DFT) and machine learning (ML). We performed generalized gradient approximation calculations with the Perdew–Burke–Ernzerhof functional revised for solids (PBEsol) on 3,159 B-site-alloyed perovskite structures using a compositional step of 1/4. Crystal graph convolutional neural networks (CGCNNs) were trained on the 3,159 DFT data points to predict the decomposition energy, bandgap, and bandgap type. The trained CGCNN models were used to explore the compositional and configurational spaces of 41,400 B-site-alloyed ABX_{3} MHPs with a compositional step of 1/16, by accessing all possible configurations for each composition. The electronic band structures of the selected compounds were calculated using the hybrid functional (PBE0). Then, we calculated the optical absorption spectra and spectroscopic limited maximum efficiency of the selected compounds. Based on the DFT/ML-combined screening, 10 promising compounds with optimal bandgaps were selected, and from among these 10 compounds, CsGe_{0.3125}Sn_{0.6875}I_{3} and CsGe_{0.0625}Pb_{0.3125}Sn_{0.625}Br_{3} were suggested as photon absorbers for single-junction and tandem solar cells, respectively. The design framework presented herein is a good starting point for the design of mixed MHPs for optoelectronic applications.
Introduction
Metal halide perovskites (MHPs) are a class of materials with the chemical formula ABX_{3}, where A, B, and X represent monovalent organic/inorganic cations, divalent metal cations, and monovalent halide anions, respectively^{1}. MHPs have received considerable research attention^{1} owing to their outstanding optoelectronic properties, low cost, and easy fabrication, which make them promising materials for photovoltaics^{2}, light-emitting diodes^{3}, lasers^{4}, and photodetectors^{5}. Among the various MHPs, compounds containing methylammonium (MA) or formamidinium (FA) at the A-site and Pb at the B-site have been extensively studied and have demonstrated the best optoelectronic properties^{5}. However, commercialization of MHPs is still challenging because of several obstacles, such as the toxicity of Pb and instability under adverse conditions such as high temperature and humidity, which primarily originates from the organic cations^{6}. Therefore, to facilitate their commercialization, MHPs with better stability and lower Pb content should be developed without compromising their optoelectronic properties^{6}.
To overcome these challenges, several researchers have suggested substitutional alloying (i.e., doping or mixing with different elements or chemicals at each site in an MHP)^{7} as a promising approach for tuning the stability and optoelectronic properties of MHPs^{8}, and thus, the Pb-derived toxicity of MHPs can be effectively mitigated by partially replacing Pb with other elements. In the field of metallurgy, the concept of high-entropy alloys has recently demonstrated that mixing multiple elements can improve thermodynamic stability by exploiting the large configurational entropy^{9}. This concept has also been applied in MHPs^{10}. For example, the A-site mixing entropy improved the stability of perovskite solar cells^{11,12} and light-emitting diodes^{13}. Moreover, high-entropy alloying of the B-site of MAPbBr_{3} improved the colloidal stability and optical performance while reducing the Pb content^{14}. In this study, we primarily focused on B-site alloying to achieve both improved stability and lower Pb content.
However, because the compositional space of MHPs with multiple alloying elements is virtually infinite, a purely experimental design approach is not practical for exploring the chemical space of element-mixed MHPs. Even high-throughput experiments have achieved the synthesis of only a few tens of binary and ternary MHP alloys^{15,16,17}. In this regard, several computational approaches utilizing either density functional theory (DFT) calculations^{18,19,20,21} or a combination of DFT with data-driven machine learning (ML) models^{22,23,24} have been proposed for the rapid screening of the compositional space of MHPs. For example, Wang et al.^{18} investigated the effect of entropy on the thermodynamic stability and optoelectronic properties of Cs_{2}B^{+}B^{3+}Cl_{6} double-perovskite alloys using DFT calculations. They used the special quasi-random structure (SQS)^{25,26} approach to model disordered double-perovskite alloys, showing entropy-driven stabilization. Moreover, Yang et al.^{19} presented generalized gradient approximation and hybrid-level computations on a DFT dataset of 495 ABX_{3} perovskite alloys, modeled mixed perovskites using the SQS approach, and suggested 32 compounds showing promising stability and photovoltaic efficiency from the screening.
For exploration using ML, Choubisa et al.^{23} developed a “crystal site feature embedding” (CSFE) representation, which achieves low errors when predicting DFT energies and bandgaps of mixed MHPs. They used CSFE to explore the chemical space of MHPs and discovered how a small amount of Cd doping (i.e., ~1.5%) can change the bandgap to 1.1 eV, which is desirable for the active layers of solar cells^{23}. Moreover, Mannodi-Kanakkithodi et al.^{24} proposed an ML-driven high-throughput screening framework based on the stability, bandgap, and defect tolerance of mixed MHPs, where the tabulated elemental properties of each species of the MHPs were used as inputs for ML (i.e., neural networks). Using the proposed screening framework, 392 out of 17,955 MHPs were identified as promising candidates for photon absorbers^{24}.
Despite the practical applicability of the aforementioned computational design frameworks, most previous studies do not guarantee the most stable atomic configuration for each composition. That is, previous works^{18,19,20,21,23,24} utilized random sampling or SQS methods to estimate the properties of random alloys and did not explore all possible configurations exhaustively. Understanding the properties of the ground-state atomic configuration is important because an alloy is likely to adopt its most stable atomic configuration, and the properties of alloys, such as the bandgap, can be significantly altered by configuration changes, even at the same composition^{27,28}. In this regard, Yamamoto et al.^{22} utilized the cluster expansion approach to identify the ground state of B-site mixed iodide perovskites (i.e., AB_{x}B′_{1−x}I_{3}); however, the authors only investigated the effect of B-site mixing on thermodynamic stability.
In this work, we propose a DFT/ML-combined framework (Fig. 1) for the design of B-site-alloyed MHPs with improved stability and optoelectronic properties by accessing all possible atomic configurations. To achieve this, we employed a crystal graph convolutional neural network (CGCNN)^{29} as a surrogate model for predicting the stability and the electronic band structure computed at the Perdew–Burke–Ernzerhof functional revised for solids (PBEsol) level^{30}. For the construction of the DFT-calculated database used for training the CGCNN, a 20-atom unit cell was used, in which B-site alloyed MHPs were modeled with a compositional step of 1/4. The trained CGCNN was then used to exhaustively explore the enlarged chemical space of B-site alloyed MHPs at a four-times-finer compositional resolution (i.e., a 1/16 compositional step) while keeping the A- and X-sites unary (i.e., A: Cs, K, or Rb; X: Br, Cl, or I). Through ML-assisted screening, 110 compounds were predicted to be thermodynamically stable and to exhibit promising electronic band structures, and their thermodynamic stability and electronic band structures were further validated using DFT calculations with the semilocal PBEsol and hybrid PBE0^{31} functionals, respectively. From this, we identified 10 promising candidates for solar-cell application after examining the PBE0-calculated bandgaps. The main contributions of our work can be summarized as follows.

We explored the entire atomic configurational space using the trained CGCNN model to identify the most stable atomic configuration for each B-site mixed MHP composition, which was not explicitly identified in the previous works mentioned earlier.

We expanded the explored B-site mixed MHP domain up to the quaternary phase and included the mixing entropy effect in terms of both thermodynamic stability and electronic properties, whereas previous works explored B-site mixing only up to the ternary phase and without the mixing entropy effect^{23,24}.

We proposed to use PBE0^{31} combined with the spin-orbit coupling (SOC) correction^{32} term as a more accurate approach for estimating the experimental bandgaps of MHPs, exhibiting a lower prediction error than other hybrid functionals such as HSE06^{33} and B3LYP^{34}.
Results
Design strategy
Figure 1d illustrates the overall workflow of this study. The generated PBEsol data were used to train the CGCNN models (see details in Training data generation in METHODS section). Specifically, three CGCNN models were independently trained to predict three target properties (see details in Crystal Graph Convolutional Neural Networks training in METHODS section): regression of the decomposition enthalpy (∆H_{decomp}) and bandgap (E_{gap}) and binary classification for band type (see also Thermodynamic stability and bandgap of B-site mixed metal halide perovskites in METHODS section for details on calculating each target property). For the band type classification, the non-indirect and indirect band types were labeled as positive and negative classes, respectively.
To explore the chemical space of multi-element MHPs, we used a fourfold enlarged structure compared to the training data, containing 16 B-sites (i.e., 80 atoms per unit cell). This allows a finer resolution of the compositional step (i.e., 6.25 at.%) and we considered B-site alloying up to the quaternary system. For each composition of the MHP, CGCNN-predicted ∆H_{decomp} (i.e., \(\Delta {H}_{{\rm{decomp}}}^{{\rm{CGCNN}}}\)) was used to identify the most stable atomic configurations, and the mixing entropy term \((-T\Delta {S}_{{\rm{mix}}})\) was added to \(\Delta {H}_{{\rm{decomp}}}^{{\rm{CGCNN}}}\) at a temperature of 298 K. The entropic contribution, \(T\Delta {S}_{{\rm{mix}}}\), is proportional to temperature, as can be seen in Fig. 1b. Therefore, the entropy-driven stabilization is more effective at higher temperatures. Even at 298 K, the distribution of \(\Delta {H}_{{\rm{decomp}}}-T\Delta {S}_{{\rm{mix}}}\) in our training data shifted negatively when compared to that of ∆H_{decomp}, as shown in Fig. 1c.
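The entropy term can be illustrated with a short sketch. It assumes the ideal configurational mixing entropy on the B-site sublattice, \(\Delta {S}_{{\rm{mix}}}=-{k}_{{\rm{B}}}\sum _{i}{x}_{i}\,{\rm{ln}}\,{x}_{i}\) per B-site, and divides by the five atoms of one ABX_{3} formula unit to express the result per atom. Both the ideal-mixing form and the per-atom normalization are assumptions made here for illustration, and the function name is hypothetical.

```python
import math

K_B_MEV_PER_K = 8.617333262e-2  # Boltzmann constant in meV/K


def t_delta_s_mix(fractions, temperature=298.0, atoms_per_formula_unit=5):
    """Return T*dS_mix in meV per atom for the given B-site fractions.

    Assumes ideal mixing on the B-site sublattice (an assumption of this
    sketch, not necessarily the exact expression used in the paper).
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("B-site fractions must sum to 1")
    # Terms with x = 0 contribute nothing (x ln x -> 0), so they are skipped.
    entropy_per_site = -K_B_MEV_PER_K * sum(
        x * math.log(x) for x in fractions if x > 0.0
    )
    return temperature * entropy_per_site / atoms_per_formula_unit


# The stabilization grows with the number of mixed elements and with T.
binary = t_delta_s_mix([0.5, 0.5])
quaternary = t_delta_s_mix([0.25, 0.25, 0.25, 0.25])
```

Under these assumptions, the term vanishes for a single B-site element and is largest at equiatomic ratios, which matches the qualitative trend described above.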
Next, we considered Bartel’s tolerance factor^{35}, τ, which is a new data-driven tolerance factor, to classify whether an arbitrary compound is a perovskite (see details in Tolerance factor of B-site mixed metal halide perovskites in METHODS section). τ exhibited better classification accuracy than Goldschmidt’s tolerance factor^{36}. We also compared the classification accuracy of τ with that of Filip’s geometric limits for the formability of perovskites^{37}. After a comparison of classification accuracy for eight materials that were experimentally confirmed to exist as perovskites, we concluded that the predictions of τ are closer to the experimental results (see details in Supplementary Note 2).
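As an illustration of how τ can be evaluated, the sketch below uses the commonly quoted form of Bartel’s tolerance factor, \(\tau ={r}_{{\rm{X}}}/{r}_{{\rm{B}}}-{n}_{{\rm{A}}}\left[{n}_{{\rm{A}}}-({r}_{{\rm{A}}}/{r}_{{\rm{B}}})/{\rm{ln}}({r}_{{\rm{A}}}/{r}_{{\rm{B}}})\right]\), with the perovskite criterion τ < 4.18. The fraction-weighted B-site radius for alloys is an assumption of this sketch, the function names are hypothetical, and the ionic radii must be supplied by the user (for example, from a tabulation of Shannon radii).

```python
import math


def bartel_tau(r_a, r_b, r_x, n_a=1):
    """Bartel's tolerance factor; radii in angstrom, n_a = A-site oxidation state."""
    ratio = r_a / r_b
    if ratio <= 1.0:
        raise ValueError("the expression requires r_a > r_b")
    return r_x / r_b - n_a * (n_a - ratio / math.log(ratio))


def mean_b_radius(fractions_and_radii):
    """Fraction-weighted B-site radius (an assumed treatment of B-site alloying)."""
    return sum(x * r for x, r in fractions_and_radii)


def is_predicted_perovskite(tau, threshold=4.18):
    """Apply the tau < 4.18 criterion used in the text."""
    return tau < threshold
```

A mixed B-site such as Ge_{0.5}Sn_{0.5} would be handled by passing `mean_b_radius([(0.5, r_ge), (0.5, r_sn)])` as `r_b`, where `r_ge` and `r_sn` are user-supplied radii.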
For solar-cell applications, CGCNN predicts the band type and bandgap (\({E}_{{\rm{gap}}}^{{\rm{CGCNN}}}\), hereafter) of the atomic configuration with the lowest ∆H_{decomp} at each composition. Because the CGCNN model is trained to predict the PBEsol-calculated bandgap \(({E}_{{\rm{gap}}}^{{\rm{PBEsol}}})\), we selected the compositions showing direct bandgaps with the condition \({E}_{{\rm{gap}}}^{{\rm{CGCNN}}}\) < 0.5 eV as promising compounds for photovoltaics. Considering that PBEsol underestimates the bandgap by approximately 1.0–1.5 eV when compared to PBE0 (refer to Supplementary Table 5), we expect that the candidates satisfying the latter condition (\({E}_{{\rm{gap}}}^{{\rm{CGCNN}}}\) < 0.5 eV) will show a PBE0-calculated bandgap \(({E}_{{\rm{gap}}}^{{\rm{PBE}}0})\) of approximately 1.0–2.0 eV, which is close to an ideal bandgap range of approximately 1.2–1.4 eV for single-junction photovoltaics according to the Shockley–Queisser limit^{38}. Moreover, this bandgap range (1.0 < \({E}_{{\rm{gap}}}^{{\rm{PBE}}0}\) < 2.0 eV) is close to the ideal bandgap for a top cell in a tandem solar cell (1.72 eV), with silicon as the bottom cell, which has a bandgap of 1.1 eV^{17}.
The compounds selected based on the stability and band structure criteria were further validated for their stability and band structure through PBEsol and PBE0 calculations, respectively, to examine their potential application in photovoltaics. We optimized the geometry of the selected compounds using PBEsol and then calculated the band structure using PBE0. We calculated the carrier effective mass, optical absorption spectra, and spectroscopic limited maximum efficiency (SLME). The SLME was calculated according to a previous study by Yu et al.^{39} using the open-source SL3ME code (https://github.com/ldwillia/SL3ME). In the SLME calculations, the optical absorption spectra and bandgap were used as inputs, assuming an air mass 1.5 global solar spectrum (https://www.nrel.gov/grid/solar-resource/spectra-am1.5.html).
Analysis of the training dataset
The distribution of the thermodynamic stability of the 3,159 DFT (PBEsol)-calculated training data was first analyzed, as shown in Fig. 2. As shown in Fig. 2a, we observed that the ∆H_{decomp} distribution moves toward a lower energy regime (i.e., orange color in Fig. 2a) by employing the mixing entropy term (i.e., \(T\Delta {S}_{{\rm{mix}}}\)). As expected, a greater shift to the stable region can be obtained by mixing more elements. In fact, without considering the mixing entropy, CsGeBr_{3} showed the lowest ∆H_{decomp} value among the 3,159 DFT-calculated data; however, we obtained more stable data for the composition of CsGe_{0.25}Hg_{0.25}Pb_{0.25}Sn_{0.25}Cl_{3}, corresponding to the lowest \({\Delta H}_{{\rm{decomp}}}-T\Delta {S}_{{\rm{mix}}}\) value among the entire 3,159 data. Additionally, we note that alloys with Ge lead to a ∆H_{decomp} distribution in the lower energy range when compared to that of alloys without Ge (Fig. 2b), which is consistent with the results of previous computational studies^{19,24} (refer to Supplementary Fig. 6 for the effects of including other elements at the B-site). As shown in Supplementary Fig. 6, alloys with Zn result in a ∆H_{decomp} distribution in a higher energy range when compared to alloys without Zn. Consequently, ∆H_{decomp} becomes lowest (highest) when the B-site contains only Ge (Zn). Therefore, when the B-site contains a single element, the ∆H_{decomp} distribution is widest, while the ∆H_{decomp} distribution narrows as the number of elements in the B-site increases.
We analyzed the correlation between perovskite formability as described by Bartel’s tolerance factor (τ) and thermodynamic stability (Fig. 3a). Fig. 3a shows a positive correlation between τ and \(\Delta {H}_{{\rm{decomp}}}-T\Delta {S}_{{\rm{mix}}}\), as both values indicate that compounds with lower values are more stable. However, a region showing inconsistency between the two metrics exists at τ > 4.18, which might be due to the imperfection of τ. The dataset used to derive τ does not contain B-site alloyed perovskites^{35}, thereby resulting in an inaccurate prediction for B-site alloyed perovskites. Moreover, τ ignores the mixing entropy-driven stabilization (Eq. (3)); thus, when we compare ∆H_{decomp} with τ, less data exists in the inconsistent region (τ > 4.18 and ∆H_{decomp} < 0), as shown in Supplementary Fig. 7. Specifically, the data deviating from the positive correlation (i.e., τ > 4.18 and ∆H_{decomp} − T∆S_{mix} < −85 meV atom^{−1}) are all Ge-containing compounds, namely, CsGeBr_{3}, CsGeCl_{3}, CsCd_{0.25}Ge_{0.75}Br_{3}, and CsGe_{0.75}Hg_{0.25}Br_{3}; however, experimental results have confirmed that CsGeBr_{3} and CsGeCl_{3} exist as perovskite structures^{40}. Despite this imperfection of τ, we can safely apply the condition τ < 4.18 to screen stable compounds if it is combined with the condition of maintaining the lowest possible value of \(\Delta {H}_{{\rm{decomp}}}-T\Delta {S}_{{\rm{mix}}}\), because the two metrics show a positive correlation. For electronic band-related properties, as shown in Fig. 3b, the amount of indirect-bandgap data was 4.68 times larger than that of the non-indirect-bandgap data. Moreover, of the 905 data with a bandgap of less than 0.5 eV, 654 data were indirect.
Next, we calculated the Pearson correlation coefficient between the elemental fractions and the four output properties, ∆H_{decomp}, τ, bandgap, and band type, and the results are illustrated in Fig. 3c. ∆H_{decomp} and τ tend to be lower (more stable) when the A-site is occupied by Cs, whereas I in the X-site increases ∆H_{decomp} and τ (more unstable). An increase in the fraction of Ge tends to decrease ∆H_{decomp}, whereas increasing the fraction of Zn tends to increase ∆H_{decomp}. The bandgap tends to decrease when the A- and X-sites are occupied by Cs and I, respectively. Notably, the correlation trend between ∆H_{decomp} and bandgap is opposite in the X-site: I (Cl) tends to increase (decrease) ∆H_{decomp} but decrease (increase) the bandgap. Increasing the Ge, Hg, and Sn fractions decreases the bandgap, whereas increasing the Cd, Pb, and Zn fractions increases it. In this correlation analysis of the band type, we represented the indirect band type as zero and the non-indirect band type as one. Thus, a positive correlation implies that increasing the fraction of certain elements tends to change the compounds into non-indirect-bandgap materials and vice versa. The Cs fraction showed the highest positive correlation (0.21), whereas the fractions of all other elements exhibited a relatively low correlation with the band type.
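The correlation analysis described above can be sketched in a few lines. The function below is a plain implementation of the Pearson coefficient; the variable names and the toy numbers are hypothetical and are not taken from the training data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: Ge fraction per structure and a made-up stability value.
ge_fraction = [0.0, 0.25, 0.5, 0.75, 1.0]
toy_dh_decomp = [10.0, 4.0, -3.0, -9.0, -20.0]
r_ge = pearson(ge_fraction, toy_dh_decomp)  # negative: more Ge, lower dH_decomp

# Band type is encoded as 0 (indirect) or 1 (non-indirect), as in the text.
cs_fraction = [1.0, 0.0, 1.0, 0.0]
band_type = [1, 0, 1, 1]
r_cs = pearson(cs_fraction, band_type)
```

A negative coefficient for an elemental fraction against ∆H_{decomp} corresponds to the stabilizing trend reported for Ge, and a positive one to the destabilizing trend reported for Zn.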
Validation of the trained CGCNN models
To validate the prediction performance of the trained CGCNN models, we plotted the test set prediction results, as shown in Fig. 4. When predicting the thermodynamic stability (Fig. 4a), the trained CGCNN model showed promising prediction accuracy with a mean absolute error (MAE) of 0.45 meV atom^{−1}, indicating that the model could effectively capture the quantitative relation between the crystal structure and thermodynamic stability.
The trained models also showed highly promising prediction performance for the prediction of the bandgap (Fig. 4b) and band type (Fig. 4c). Specifically, the prediction performance of the trained CGCNN model for bandgap prediction (Fig. 4b, MAE of 0.037 eV and root mean squared error of 0.061 eV) was superior to that of a previous study^{24}.
In the classification of the band type (Fig. 4c), our model showed promising classification performance with a low inclusion of misclassified data (i.e., false-negative and false-positive data). Only 3 and 10% of the negative and positive classes, respectively, were misclassified, with a high accuracy of 0.96 and recall value of 0.90. In other words, our model could effectively recover the true-positive distribution with a relatively low inclusion of false-positive data from the negative class. Here, we noted that the relatively low precision value of 0.84 was due to the class distribution imbalance of the negative data (Fig. 3b). Despite the large class imbalance between the positive and negative data, the model shows a similar amount of misclassified positive and negative data, as shown in Fig. 4c. Therefore, we expect no significant reduction in sampling efficiency due to the inclusion of false-positive data.
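The link between class imbalance and precision can be made concrete with a small sketch. It assumes that the test set has the same negative-to-positive ratio as the full dataset (4.68:1), which is an assumption, and it uses the rounded misclassification rates quoted above, so it reproduces the reported metrics only approximately. The function names are hypothetical.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return (accuracy, recall, precision) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return accuracy, recall, precision


def metrics_from_rates(neg_to_pos_ratio, false_negative_rate, false_positive_rate):
    """Build a confusion matrix normalized to one positive sample."""
    tp = 1.0 - false_negative_rate
    fn = false_negative_rate
    fp = false_positive_rate * neg_to_pos_ratio
    tn = (1.0 - false_positive_rate) * neg_to_pos_ratio
    return classification_metrics(tp, fn, fp, tn)


# Rounded rates from the text: 10% of positives and 3% of negatives misclassified.
imbalanced = metrics_from_rates(4.68, 0.10, 0.03)
balanced = metrics_from_rates(1.00, 0.10, 0.03)
```

With the same per-class error rates, the imbalanced case yields a lower precision than the balanced case, because the few misclassified negatives are drawn from a much larger pool.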
Design of B-site mixed MHPs for solar-cell application
As described previously, we applied the trained CGCNN models to explore the chemical space of the B-site-alloyed MHPs with a fourfold enlarged unit cell consisting of 80 atoms. For the A- and X-sites, we considered the chemical elements {Cs, K, Rb} and {Br, Cl, I}, respectively. For B-site alloying, we utilized the chemical elements {Ge, Sn, Pb, Zn, Cd, Hg} under binary, ternary, and quaternary alloying conditions. The compositional ratio of the alloying was varied from zero to one in 0.0625 steps. The search space thus consisted of 2,025 chemical compositions for binary alloying and 18,900 for ternary alloying. For quaternary alloying, we considered only Cs for the A-site to reduce the computational cost of the search procedure. Through the exploration of binary and ternary alloying systems, we determined that ∆H_{decomp} becomes lower when Cs is in the A-site. Thus, the search space for quaternary alloying was reduced to 20,475 compositions. For each chemical composition, the CGCNN model was applied to all possible atomic configurations to determine the lowest \(\Delta {H}_{{\rm{decomp}}}-T\Delta {S}_{{\rm{mix}}}\). In total, 41,400 compositions and approximately 5.6 × 10^{11} atomic configurations were explored for alloyed MHPs using the CGCNN model during the search process. The details of computing 5.6 × 10^{11} atomic configurations are described in Supplementary Note 3.
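The composition counts quoted above can be reproduced with a short combinatorial sketch. It assumes that every element of an n-component alloy occupies at least one of the 16 B-sites, so the number of site splits is the number of ways to write 16 as an ordered sum of n positive integers; the function name is hypothetical.

```python
from math import comb

N_B_ELEMENTS = 6   # Ge, Sn, Pb, Zn, Cd, Hg
N_B_SITES = 16     # B-sites in the 80-atom cell (compositional step of 1/16)


def n_compositions(n_mixed, n_a_choices, n_x_choices):
    """Count compositions with exactly n_mixed distinct B-site elements."""
    element_sets = comb(N_B_ELEMENTS, n_mixed)
    # Ordered splits of 16 sites into n_mixed strictly positive counts.
    site_splits = comb(N_B_SITES - 1, n_mixed - 1)
    return element_sets * site_splits * n_a_choices * n_x_choices


binary = n_compositions(2, n_a_choices=3, n_x_choices=3)
ternary = n_compositions(3, n_a_choices=3, n_x_choices=3)
quaternary = n_compositions(4, n_a_choices=1, n_x_choices=3)  # Cs only
total = binary + ternary + quaternary
```

Under this counting, the binary, ternary, and quaternary spaces contain 2,025, 18,900, and 20,475 compositions, which sum to the 41,400 compositions stated in the text.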
Figure 5 shows the distributions of \(\Delta {H}_{{\rm{decomp}}}^{{\rm{CGCNN}}}-T\Delta {S}_{{\rm{mix}}}\) of the CsGe_{x}Sn_{1−x}Br_{3} and CsGe_{x}Hg_{y}Sn_{1−x−y}Cl_{3} systems as case examples, showing the lowest \(\Delta {H}_{{\rm{decomp}}}^{{\rm{CGCNN}}}-T\Delta {S}_{{\rm{mix}}}\) in binary and ternary systems, respectively. As mentioned previously, the CGCNN covers compositions beyond the chemical space of the training data because the unit cell size is four times larger than that of the training data. We also observed large variations in \(\Delta {H}_{{\rm{decomp}}}^{{\rm{CGCNN}}}-T\Delta {S}_{{\rm{mix}}}\) for compositions with a high ∆S_{mix} (i.e., close to the equiatomic ratio). In particular, at the fraction of 0.5 in Fig. 5a, three inequivalent atomic configurations exhibit different ∆H_{decomp} in the training data, and the energy deviation lies within the prediction range of our CGCNN models. The lowest \(\Delta {H}_{{\rm{decomp}}}^{{\rm{CGCNN}}}-T\Delta {S}_{{\rm{mix}}}\) values predicted by the CGCNN model (x = 0.75 for Fig. 5a and x = 0.5625, y = 0.25 for Fig. 5b) were validated once again using DFT calculations, as shown in Fig. 5. The error between the CGCNN predictions and the DFT data (80 atoms) was below 10 meV atom^{−1}.
With the most stable atomic configurations for each composition, we applied the following four screening criteria to discover potentially promising candidates for solar-cell application: (1) non-indirect bandgap, (2) \({E}_{{\rm{gap}}}^{{\rm{CGCNN}}}\) < 0.5 eV, (3) τ value that is as low as possible (i.e., τ < 4.18), and (4) \(\Delta {H}_{{\rm{decomp}}}^{{\rm{CGCNN}}}-T\Delta {S}_{{\rm{mix}}}\) that is as low as possible (i.e., \(\Delta {H}_{{\rm{decomp}}}^{{\rm{CGCNN}}}-T\Delta {S}_{{\rm{mix}}} < 0\)). However, increasing the fraction of Ge tends to decrease ∆H_{decomp}, as shown in Fig. 2b; therefore, we selected candidates with the three lowest values of \(\Delta {H}_{{\rm{decomp}}}^{{\rm{CGCNN}}}-T\Delta {S}_{{\rm{mix}}}\) within each of five intervals of τ below 4.18. We additionally selected candidates containing at least 50% Sn or Pb, because Sn- and Pb-containing MHPs are the most promising materials for solar-cell applications^{41}. Throughout the screening procedure, 110 compounds were selected.
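The four screening criteria and the per-interval selection can be sketched as a simple filter. The `Candidate` class, the function names, and the way the τ range is split into five equal-width intervals (from the lowest observed τ up to 4.18) are assumptions made for illustration; the text does not specify how the intervals were defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    formula: str
    is_non_indirect: bool   # CGCNN band-type prediction
    e_gap_cgcnn: float      # CGCNN-predicted bandgap in eV
    tau: float              # Bartel's tolerance factor
    stability: float        # dH_decomp(CGCNN) - T*dS_mix in meV/atom


def passes_screen(c, gap_max=0.5, tau_max=4.18):
    """Criteria (1)-(4) from the text."""
    return (
        c.is_non_indirect
        and c.e_gap_cgcnn < gap_max
        and c.tau < tau_max
        and c.stability < 0.0
    )


def top_per_tau_interval(candidates, n_intervals=5, top_n=3, tau_max=4.18):
    """Keep the top_n most stable candidates in each tau interval (assumed binning)."""
    passed = [c for c in candidates if passes_screen(c, tau_max=tau_max)]
    if not passed:
        return []
    tau_min = min(c.tau for c in passed)
    width = (tau_max - tau_min) / n_intervals
    bins = {}
    for c in passed:
        index = min(int((c.tau - tau_min) / width), n_intervals - 1)
        bins.setdefault(index, []).append(c)
    selected = []
    for index in sorted(bins):
        ranked = sorted(bins[index], key=lambda c: c.stability)
        selected.extend(ranked[:top_n])
    return selected
```

Binning by τ before ranking prevents the Ge-rich compositions, which tend to have the lowest ∆H_{decomp}, from dominating the selection.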
Subsequently, the 110 selected compounds were validated using DFT calculations with the PBEsol functional. We determined that the error between \(\Delta {H}_{{\rm{decomp}}}^{{\rm{CGCNN}}}-T\Delta {S}_{{\rm{mix}}}\) and \(\Delta {H}_{{\rm{decomp}}}^{{\rm{PBEsol}}}-T\Delta {S}_{{\rm{mix}}}\) did not exceed 6 meV atom^{−1} (see Supplementary Fig. 8a for the error distribution). However, in the case of the bandgap and band type, the predictions from the CGCNN were significantly inaccurate. That is, the CGCNN predicted that all 110 compounds would have a non-indirect bandgap; however, 79 compounds exhibited an indirect bandgap. Moreover, the MAE for bandgap predictions was 0.038 eV for systems with 20 atoms, whereas it increased to 0.14 eV for systems with 80 atoms (Supplementary Fig. 8b). We assume that these results arise because the band structure-related properties do not change continuously with compositional variations, unlike the thermodynamic stability, causing high prediction uncertainty for unseen compositional space. The detailed analysis of the increased bandgap prediction error can be found in Supplementary Note 4, Supplementary Table 1, Supplementary Fig. 2, and Supplementary Fig. 3. One practically applicable strategy to enhance the prediction reliability could be to add training data randomly selected from the target domain after the DFT calculations, similar to an active learning strategy. A detailed description of the mitigation strategy can also be found in Supplementary Note 5 and Supplementary Fig. 4.
We calculated the PBE0 bandgap for only those compounds that were calculated to have a direct bandgap using PBEsol, yielding 31 candidates (refer to Supplementary Table 5 for the full list of the selected 31 candidates) with \({E}_{{\rm{gap}}}^{{\rm{PBE}}0}\) close to the Shockley–Queisser limit^{38} (1.2–1.4 eV) or the optimal bandgap of the top cell of a tandem solar cell^{17} (1.73 eV). From among these 31 compounds, 10 compounds showing an optimal bandgap were selected for each system, as summarized in Table 1. All selected compounds have direct bandgaps, as shown in Supplementary Fig. 9.
Previous experimental results have been reported for compositions similar to those listed in Table 1. We compared the experimental and PBE0-calculated bandgaps of those compounds in Table 2. The difference between experimentally reported bandgaps and DFT-calculated bandgaps for 19 reported perovskite compounds is summarized in Supplementary Table 4. Previous experimental studies have reported CsGe_{x}Sn_{(1−x)}Br_{3}^{42}, CsGe_{0.5}Sn_{0.5}I_{3}^{43}, CsPb_{x}Sn_{(1−x)}Br_{3}^{44}, and CsGe_{x}Pb_{(1−x)}I_{3}^{45} as light absorbers for perovskite solar cells. Combining Table 2 and Supplementary Table 4, the root mean squared error (RMSE) between the experimentally reported bandgaps and our PBE0-calculated bandgaps is approximately 0.30 eV (see also Supplementary Fig. 5). Specifically, in the CsGe_{x}Sn_{(1−x)}Br_{3} system, the PBE0 bandgap underestimates the experimental bandgap by approximately 0.5–0.6 eV, while for other systems, it underestimates it by approximately 0.1 eV.
The effective mass of the carriers listed in Table 1 does not exceed 1 m_{e} (mass of a free electron), except for CsGe_{0.5625}Hg_{0.3125}Sn_{0.125}Cl_{3} and CsCd_{0.125}Ge_{0.5625}Hg_{0.1875}Sn_{0.125}Cl_{3}. The carrier effective mass is known to directly affect the mobility of electrons in the electron transport layer, which in turn affects the power conversion efficiency of solar cells. The optical absorption spectra and SLME versus the sample thickness are shown in Supplementary Fig. 10. Note that the SLME in Table 1 is a convenient metric for comparing the suitability of the compounds for photon absorbers; however, it does not represent a realistic power conversion efficiency. Considering that our \({E}_{{\rm{gap}}}^{{\rm{PBE}}0}\) value underestimates \({E}_{{\rm{gap}}}^{\exp }\) by approximately 0.5–0.6 eV in the CsGe_{x}Sn_{1−x}Br_{3} system, the SLME of CsGe_{0.5625}Sn_{0.4375}Br_{3} and CsGe_{0.4375}Pb_{0.0625}Sn_{0.5}Br_{3} will be lower than that of CsCd_{0.125}Ge_{0.5625}Hg_{0.1875}Sn_{0.125}Cl_{3}. Thus, considering both the stability and SLME, we suggest CsGe_{0.3125}Sn_{0.6875}I_{3} and CsGe_{0.0625}Pb_{0.3125}Sn_{0.625}Br_{3} as promising photon absorbers for single-junction and tandem perovskite solar cells, respectively.
Discussion
Although the DFT/ML design framework presented here shows several promising aspects, substantial challenges still exist, which must be overcome for the experimental realization of this framework. A discrepancy exists between the \({E}_{{\rm{gap}}}^{{\rm{PBE}}0}\) and \({E}_{{\rm{gap}}}^{\exp }\) values. Owing to the significant discrepancies between CGCNN predictions and DFT results regarding the band type, certain compounds that were predicted as indirect-bandgap materials by the CGCNN may be direct-bandgap materials in reality. The mitigation strategy for bandgap prediction is described in Supplementary Note 5. Furthermore, our metric of thermodynamic stability, \(\Delta {H}_{{\rm{decomp}}}-T\Delta {S}_{{\rm{mix}}}\), ignores the fact that Ge and Sn readily oxidize to the +4 state during the fabrication procedure, resulting in a significant loss of open-circuit voltage^{46,47}. Further extension should be made to consider other important properties such as defect formation energy, electronic levels of such defects, charge carrier transport properties, and stability of surfaces and interfaces of perovskites for photovoltaics and other optoelectronic applications. Finally, A- and X-site alloying should also be considered in future studies.
This study explored the chemical space (compositional and configurational) of B-site-mixed all-inorganic perovskites by combining DFT and ML. A total of 3,159 PBEsol data points were generated for the thermodynamic stability, bandgap, and bandgap type. The CGCNN models were trained on the PBEsol data using the crystalline structures as inputs to predict the aforementioned properties. The trained CGCNN models screened 110 of the 41,400 compounds with desirable properties. Of the 110 compounds, 31 were selected after examining their band types and PBE0-calculated bandgaps. Furthermore, the carrier effective masses, optical absorption spectra, and SLME of 10 of the 31 compounds were calculated. Among the 10 compounds, CsGe_{0.3125}Sn_{0.6875}I_{3} and CsGe_{0.0625}Pb_{0.3125}Sn_{0.625}Br_{3} were suggested as promising photon absorbers for single-junction and tandem perovskite solar cells, respectively, considering their stability and SLME. We have also presented the limitations of this study and perspectives for future research. The DFT/ML design framework proposed herein forms the basis for designing perovskite alloys with desirable properties.
Methods
Training data generation
Training data were generated through DFT calculations using PBEsol^{30}. As shown in Fig. 1a, we employed a crystal structure consisting of four formula units of ABX_{3} as a computational unit cell (i.e., the number of atoms in the unit cell = 20), with a focus on multi-element mixing at the B-site. For the A- and X-sites in ABX_{3}, we used one of the three monovalent alkali cations (Cs, K, and Rb) and one of the three halogen anions (Br, Cl, and I), respectively. We excluded organic cations such as methylammonium (MA) and formamidinium (FA) from the A-site for two reasons: (1) instability issues of perovskite devices that primarily arise from organic components^{6} and (2) higher positional degrees of freedom than alkali elements, which cause additional energy contributions from various molecular configurations. For the B-site, we considered multi-element mixing of six metal cations (Cd, Ge, Hg, Pb, Sn, and Zn) up to the quaternary system because four available atomic positions of the B-site exist within the considered computational unit cell, giving 126 (= _{6}H_{4}, combination with repetition) compositions. From this, 1,134 (= 3 × 126 × 3) unique ABX_{3} compositions were obtained, and by considering all possible atomic configurations within each composition, 3,159 unique B-site-mixed ABX_{3} structures were obtained for the training data. The details of obtaining the 3,159 unique structures are described in Supplementary Note 1 and Supplementary Fig. 1. The aforementioned six metal elements for the B-site were chosen to minimize any significant perturbation of the electronic structure of the pristine Pb-based halide perovskite, which is well suited to solar-cell applications. To achieve this, we intentionally chose cations that exhibit electronic similarities to Pb^{2+} ions. This selection was based on two key conditions: (1) maintaining the same charge state (2+) and (2) ensuring filled d-electron orbitals.
The idea behind these conditions originated from the understanding that considering a cation with a different charge state can result in charge imbalances, potentially leading to the formation of metallic materials. Additionally, the presence of unfilled dorbitals tends to introduce complex interactions with anions, leading to a markedly different electronic structure.
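The composition counts quoted above can be reproduced with a short enumeration. The following is a minimal sketch, not the authors' data-generation code; the function and variable names are hypothetical. It only counts compositions and does not perform the symmetry-aware enumeration of atomic configurations that yields the 3,159 unique structures (Supplementary Note 1).

```python
from itertools import combinations_with_replacement

# Element choices taken from the text; all names below are illustrative only.
A_SITES = ("Cs", "K", "Rb")
B_SITES = ("Cd", "Ge", "Hg", "Pb", "Sn", "Zn")
X_SITES = ("Br", "Cl", "I")
N_B_POSITIONS = 4  # four B-site positions in the 20-atom computational cell


def b_site_compositions(elements=B_SITES, n_positions=N_B_POSITIONS):
    """Return every multiset of B-site elements that fills the cell.

    Combinations with repetition of 6 elements over 4 positions (6H4).
    """
    return list(combinations_with_replacement(elements, n_positions))


def abx3_compositions():
    """Return every (A, B-multiset, X) combination."""
    return [
        (a, b, x)
        for a in A_SITES
        for b in b_site_compositions()
        for x in X_SITES
    ]


b_comps = b_site_compositions()
all_comps = abx3_compositions()
print(len(b_comps), len(all_comps))
```

Each B-site multiset such as `("Ge", "Sn", "Sn", "Sn")` corresponds to a composition with a step of 1/4, here Ge_{0.25}Sn_{0.75}.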
Crystal graph convolutional neural network training
For all CGCNN models, an unrelaxed geometry was used as the input to predict the target properties of the DFT-relaxed structures, which allowed us to predict these properties without costly DFT structural relaxations. Here, the experimentally well-known structure of CsPbI_{3} was used as the input geometry for the CGCNN models; the crystal structure of CsPbI_{3} is provided in the CIF format in Supplementary Note 6. We used the atomic features of the original CGCNN paper^{29} and the default hyperparameters of the CGCNN code distributed on GitHub (https://github.com/txie-93/cgcnn). The hyperparameters used in this study are listed in Supplementary Table 3. The 3,159 PBEsol data points were randomly divided into training, validation, and test data in a ratio of 7:1:2. The validation data were used to determine the model parameters using an early-stopping approach.
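The 7:1:2 random split described above can be sketched as follows. This is an illustration under stated assumptions: the seed, the rounding rule, and the function name `split_indices` are hypothetical, and the CGCNN code has its own data-loading utilities that may partition the data differently.

```python
import random


def split_indices(n_samples, ratios=(0.7, 0.1, 0.2), seed=0):
    """Shuffle sample indices and split them into train/validation/test.

    Assumption: train and validation sizes are rounded down and the
    remainder goes to the test set; the paper does not state the rounding.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_train = int(ratios[0] * n_samples)
    n_val = int(ratios[1] * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(3159)
print(len(train_idx), len(val_idx), len(test_idx))
```

In an early-stopping scheme, the validation subset is monitored during training and the model parameters with the lowest validation error are kept.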
Thermodynamic stability and bandgap of B-site mixed metal halide perovskites
To estimate the thermodynamic stabilities of the compounds, we used the previously defined decomposition enthalpy, ∆H_{decomp}^{19,20,21,22,24}. The decomposition enthalpy is defined as the difference between the energy of ABX_{3} and the sum of the energies of the most stable AX and BX_{2} phases. Thus, ∆H_{decomp} of the B-site mixed MHPs (i.e., AB*X_{3}, where B* represents mixed B-site species such as Cd_{0.25}Ge_{0.75} and Hg_{0.25}Pb_{0.25}Sn_{0.5}) can be defined as follows:
\[\Delta {H}_{{\rm{decomp}}}=E\left({\rm{A}}{{\rm{B}}}^{* }{{\rm{X}}}_{3}\right)-E\left({\rm{AX}}\right)-\mathop{\sum }\limits_{i}{x}_{i}E\left({{\rm{B}}}_{i}{{\rm{X}}}_{2}\right)\qquad (1)\]
where E is the DFT-calculated total energy of the relaxed geometry and x_{i} is the fraction of element B_{i}. From Eq. (1), a more negative ∆H_{decomp} indicates that the system is more stable relative to the binary metal halide phases (i.e., AX and BX_{2}). The energetically most stable structures of the AX and BX_{2} phases were obtained from the Materials Project (MP), an open-access database of material properties^{48}. The exception is GeCl_{2}, whose structure was retrieved from the Open Quantum Materials Database (OQMD)^{49} because it is not available in the MP. The MP or OQMD IDs of the AX and BX_{2} phases and their DFT (PBEsol)-calculated total energies are listed in Supplementary Table 2.
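As a worked illustration of this definition, the sketch below evaluates the decomposition enthalpy as the energy of AB*X_{3} minus the energy of AX and the fraction-weighted energies of the B_{i}X_{2} phases. The function name is hypothetical, and the energies are placeholder numbers chosen only to exercise the arithmetic; they are not values from this study. All energies are assumed to be given per formula unit.

```python
def decomposition_enthalpy(e_abx3, e_ax, e_bx2, fractions):
    """Decomposition enthalpy per formula unit (same energy unit as inputs).

    e_abx3    : total energy of the mixed perovskite AB*X3
    e_ax      : total energy of the most stable AX phase
    e_bx2     : dict mapping each B element to the energy of its BX2 phase
    fractions : dict mapping each B element to its B-site fraction x_i
    """
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("B-site fractions must sum to 1")
    e_binaries = e_ax + sum(x * e_bx2[el] for el, x in fractions.items())
    return e_abx3 - e_binaries


# Placeholder energies in eV per formula unit (NOT results from the paper).
dh = decomposition_enthalpy(
    e_abx3=-15.40,
    e_ax=-6.00,
    e_bx2={"Ge": -8.50, "Sn": -9.50},
    fractions={"Ge": 0.25, "Sn": 0.75},
)
print(dh)
```

With these placeholder inputs the result is negative, which under the definition above would indicate stability against decomposition into the binary halides.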
The mixing entropy, ∆S_{mix}, is defined as the configurational entropy based on the ideal solid solution model (Eq. (2)):
\[\Delta {S}_{{\rm{mix}}}=-{k}_{{\rm{B}}}\mathop{\sum }\limits_{i=1}^{N}{x}_{i}\,{\mathrm{ln}}\,{x}_{i}\qquad (2)\]
where k_{B} is the Boltzmann constant, N is the number of components at the B-site, and x_{i} is the atomic fraction of component i at the B-site.
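The ideal-solution mixing entropy, minus k_{B} times the sum of x_{i} ln x_{i} over the B-site components, can be evaluated directly. The sketch below is illustrative; the function name is hypothetical, and the entropy is expressed per B-site in eV/K.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def mixing_entropy(fractions):
    """Ideal configurational entropy per B-site in eV/K.

    Components with zero fraction contribute nothing (x ln x -> 0).
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return -K_B_EV_PER_K * sum(x * math.log(x) for x in fractions if x > 0)


ds_pure = mixing_entropy([1.0])                      # single B element
ds_quaternary = mixing_entropy([0.25, 0.25, 0.25, 0.25])
t_ds_300k = 300.0 * ds_quaternary                    # entropy term at 300 K
print(ds_pure, ds_quaternary, t_ds_300k)
```

An equimolar quaternary B-site gives the largest value available within the four-position cell, k_{B} ln 4, whereas a single-element B-site gives zero.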
In addition to the thermodynamic stability, the band type and bandgap (E_{gap}) were calculated at the PBEsol level to estimate the suitability for photovoltaic applications. Because MHPs with indirect bandgaps are not usually suitable for photovoltaics, we classified the band types into two categories: indirect and non-indirect (i.e., direct, metallic, and semimetallic). Materials determined to have a metallic or semimetallic band structure using PBEsol can be identified as having a direct bandgap using PBE0; thus, we did not exclude them from the PBE0 calculations. This approach assumes that the band types calculated using PBEsol and PBE0 are consistent with each other, although the bandgap calculated using PBEsol (\({E}_{{\rm{gap}}}^{{\rm{PBEsol}}}\)) is underestimated when compared to that calculated using PBE0 (\({E}_{{\rm{gap}}}^{{\rm{PBE}}0}\)).
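The two-category labeling described above can be sketched as a small function. This is an assumption-laden illustration rather than the VASPKIT logic used in the study: the function name, the k-point tolerance, and the use of a non-positive gap to represent metallic or semimetallic cases are all hypothetical choices.

```python
def band_category(gap_ev, vbm_kpoint, cbm_kpoint, tol=1e-4):
    """Label a band structure as 'indirect' or 'non-indirect'.

    gap_ev      : bandgap in eV; a value <= 0 is treated here as
                  metallic/semimetallic (assumption of this sketch)
    vbm_kpoint  : fractional k-point of the valence band maximum
    cbm_kpoint  : fractional k-point of the conduction band minimum
    """
    if gap_ev <= 0:
        return "non-indirect"  # metallic or semimetallic
    same_kpoint = all(
        abs(kv - kc) <= tol for kv, kc in zip(vbm_kpoint, cbm_kpoint)
    )
    return "non-indirect" if same_kpoint else "indirect"


label_direct = band_category(1.2, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
label_indirect = band_category(1.2, (0.0, 0.0, 0.0), (0.5, 0.5, 0.5))
label_metal = band_category(0.0, (0.0, 0.0, 0.0), (0.5, 0.0, 0.0))
print(label_direct, label_indirect, label_metal)
```

Grouping metallic and semimetallic cases with direct-gap cases keeps them in the candidate pool for the later hybrid-functional calculation.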
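Tolerance factor of B-site mixed metal halide perovskites
According to Bartel et al.^{35}, the probability that a compound forms a perovskite increases as the tolerance factor τ falls below 4.18 and decreases as τ rises above it. τ is calculated using Eq. (3):
\[\tau =\frac{{r}_{{\rm{X}}}}{{r}_{{\rm{B}}}}-{n}_{{\rm{A}}}\left({n}_{{\rm{A}}}-\frac{{r}_{{\rm{A}}}/{r}_{{\rm{B}}}}{{\mathrm{ln}}\left({r}_{{\rm{A}}}/{r}_{{\rm{B}}}\right)}\right)\qquad (3)\]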
where n_{A} is the oxidation state of A and r_{i} is the ionic radius of ion i (i = A, B, X). Here, r_{B} is the composition-weighted average of the ionic radii of the B elements (i.e., \({r}_{{\rm{B}}}={\sum }_{i=1}^{N}{x}_{i}{r}_{{B}_{i}}\)). We selected the compositions showing \(\Delta {H}_{{\rm{decomp}}}-T\Delta {S}_{{\rm{mix}}} \,<\, 0\) and τ < 4.18 as stable compounds.
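A minimal sketch of the two-part stability filter follows. It uses the tolerance factor of Bartel et al., τ = r_X/r_B − n_A[n_A − (r_A/r_B)/ln(r_A/r_B)], with a composition-weighted r_B. The function names are hypothetical, and the Shannon radii in the example (Cs^{+} 1.88 Å, Pb^{2+} 1.19 Å, Br^{−} 1.96 Å) are illustrative inputs; the radii actually used in the study are not listed in this section.

```python
import math

TAU_THRESHOLD = 4.18  # perovskite criterion quoted in the text


def tolerance_factor(r_a, r_b_by_element, fractions, r_x, n_a=1):
    """Bartel tolerance factor with a composition-weighted B-site radius."""
    r_b = sum(fractions[el] * r_b_by_element[el] for el in fractions)
    ratio = r_a / r_b
    if ratio <= 1.0:
        raise ValueError("this sketch requires r_A > r_B")
    return r_x / r_b - n_a * (n_a - ratio / math.log(ratio))


def is_stable(dh_decomp, ds_mix, temperature, tau):
    """Apply both criteria: dH_decomp - T*dS_mix < 0 and tau < 4.18."""
    return (dh_decomp - temperature * ds_mix) < 0 and tau < TAU_THRESHOLD


# Illustrative radii in angstrom (assumed inputs, not data from the paper).
tau_cspbbr3 = tolerance_factor(
    r_a=1.88, r_b_by_element={"Pb": 1.19}, fractions={"Pb": 1.0}, r_x=1.96
)
print(tau_cspbbr3, is_stable(-0.10, 0.0, 300.0, tau_cspbbr3))
```

Because the entropy term only lowers the free energy, a compound with a slightly positive ∆H_{decomp} can still pass the first criterion when the B-site is mixed.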
Density functional theory calculations
All DFT calculations were performed using the Vienna Ab initio Simulation Package (VASP) 6.1.2^{50} and projector augmented-wave pseudopotentials^{51}. The pseudopotentials of all elements were chosen according to the official VASP recommendations. For training data generation and final DFT validation of the selected compounds, the projection operators of the nonlocal part of the pseudopotential were calculated in reciprocal and real spaces, respectively. The energy cutoff for the plane-wave representation was 500 eV. All cell parameters and atomic coordinates were relaxed using convergence criteria of 10^{−5} eV on the energy of the self-consistent electronic step and 0.01 eV/Å on the atomic forces. The Brillouin zone was sampled using the Γ-centered automatic k-point generation scheme implemented in VASP, with a length of 50 Å for determining the subdivisions. Cell relaxation was performed first; then, the energy was obtained from a subsequent fixed-cell relaxation.
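To make the "length of 50 Å" concrete, the sketch below estimates the k-point subdivisions that a length-based automatic scheme would produce. It assumes the rule N_i = int(max(1, ℓ·|b_i| + 0.5)) as documented for VASP's fully automatic mesh, with reciprocal lattice vectors defined without the factor 2π, and it is restricted to an orthorhombic cell so that |b_i| = 1/a_i. The lattice lengths in the example are hypothetical, not cell parameters from this study.

```python
def kpoint_subdivisions(lattice_lengths, length=50.0):
    """Subdivisions along each reciprocal axis for an orthorhombic cell.

    lattice_lengths : (a, b, c) in angstrom; for an orthorhombic cell the
                      reciprocal vector lengths are simply 1/a, 1/b, 1/c
    length          : the length parameter of the automatic scheme
    """
    subdivisions = []
    for a in lattice_lengths:
        b_norm = 1.0 / a  # reciprocal length without the 2*pi factor
        subdivisions.append(int(max(1.0, length * b_norm + 0.5)))
    return tuple(subdivisions)


# Hypothetical 20-atom cell dimensions in angstrom (for illustration only).
mesh = kpoint_subdivisions((8.8, 8.8, 12.5))
print(mesh)
```

A single length parameter keeps the k-point density roughly uniform across cells of different sizes, which is convenient when thousands of structures are relaxed automatically.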
The band structure was calculated for the training data and for the compounds selected through screening. For training data generation, the band structure was calculated using PBEsol at high-symmetry k-points in the line mode. For the final DFT validation of the selected compounds, the band structures were calculated using PBE0 with a Γ-centered 2 × 2 × 2 Monkhorst–Pack k-point mesh. According to PBEsol, the band edges of all the selected compounds were predicted to lie on high-symmetry k-points contained in the Γ-centered 2 × 2 × 2 Monkhorst–Pack mesh. Therefore, this mesh was used in the PBE0 calculations to obtain the bandgap while reducing the computational time. Both band structure calculations used the PBEsol-optimized structure as the input and included spin–orbit coupling (SOC) corrections^{32}. We compared the bandgaps calculated using the PBE0, HSE06^{33}, and B3LYP^{34} functionals for 19 representative perovskite materials with known experimental bandgaps, as shown in Supplementary Table 4 and Supplementary Fig. 5. This comparison showed that PBE0 agrees best with the experimental bandgaps. The VASPKIT code^{52} was used to determine the PBEsol band type and bandgap from the data calculated using VASP.
The effective masses of electrons and holes were calculated from the PBEsol-calculated band structures using the sumo^{53} software, which is a set of command-line tools for plotting and analysis of periodic ab initio calculations. The optical absorption spectra were calculated using the LOPTICS tag^{54}, with the number of grid points on which the density of states is evaluated (NEDOS tag) set to 2000 for each structure. We obtained the optical absorption spectra on Γ-centered 4 × 4 × 4 Monkhorst–Pack k-points with the k-point downsampling scheme using the PBE0 functional with SOC corrections. Sumo^{53} was also used to postprocess the optical absorption spectra.
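For readers unfamiliar with how an effective mass follows from a band structure, the sketch below applies the parabolic-band relation m* = ħ²/(d²E/dk²) with a three-point finite difference at the band edge. This is a substitute illustration, not sumo's fitting procedure, and the function name and sample band are hypothetical.

```python
HBAR2_OVER_ME = 7.619964  # hbar^2 / m_e in eV * angstrom^2


def effective_mass(e_minus, e_edge, e_plus, dk):
    """Effective mass in units of the electron mass.

    e_minus, e_edge, e_plus : band energies (eV) at k0 - dk, k0, k0 + dk
    dk                      : k-point spacing in 1/angstrom
    A negative result corresponds to a downward-curving (valence) band.
    """
    curvature = (e_plus - 2.0 * e_edge + e_minus) / dk**2  # d2E/dk2
    if curvature == 0:
        raise ValueError("flat band: effective mass is undefined")
    return HBAR2_OVER_ME / curvature


def parabolic_band(k, mass=0.2):
    """Synthetic conduction band with a known effective mass (test input)."""
    return HBAR2_OVER_ME * k**2 / (2.0 * mass)


dk = 0.01
m_eff = effective_mass(parabolic_band(-dk), parabolic_band(0.0),
                       parabolic_band(dk), dk)
print(m_eff)
```

The finite difference is exact for a perfectly parabolic band; real bands deviate from parabolicity away from the edge, so the result depends on the k-point spacing used.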
Data availability
The datasets generated and/or analyzed during the current study are available at https://github.com/KRICTDATA/Perov_CGCNN. Please consult the corresponding author with further questions or requests.
Code availability
The underlying code for this study is available at https://github.com/KRICTDATA/Perov_CGCNN. Please consult the corresponding author with further questions or requests.
References
Dey, A. et al. State of the art and prospects for halide perovskite nanocrystals. ACS Nano 15, 10775–10981 (2021).
Basumatary, P. & Agarwal, P. A short review on progress in perovskite solar cells. Mater. Res. Bull. 149, 111700 (2022).
Dong, H. et al. Metal Halide Perovskite for next-generation optoelectronics: progresses and prospects. eLight 3, 3 (2023).
Hao, J. & Xiao, X. Recent development of optoelectronic application based on metal halide perovskite nanocrystals. Front. Chem. 9, 822106 (2022).
Liao, C.-H., Mahmud, M. A. & Ho-Baillie, A. W. Y. Recent progress in layered metal halide perovskites for solar cells, photodetectors, and field-effect transistors. Nanoscale 15, 4219–4235 (2023).
Rao, M. K., Sangeetha, D. N., Selvakumar, M., Sudhakar, Y. N. & Mahesha, M. G. Review on persistent challenges of perovskite solar cells’ stability. Sol. Energy 218, 469–491 (2021).
Ono, L. K., Juarez-Perez, E. J. & Qi, Y. Progress on perovskite materials and solar cells with mixed cations and halide anions. ACS Appl. Mater. Interfaces 9, 30197–30246 (2017).
Xu, F., Zhang, T., Li, G. & Zhao, Y. Mixed cation hybrid lead halide perovskites with enhanced performance and stability. J. Mater. Chem. A 5, 11450–11461 (2017).
Wang, Y. et al. High-entropy perovskites for energy conversion and storage: design, synthesis, and potential applications. Small Methods 2201138, 1–27 (2023).
George, E. P., Raabe, D. & Ritchie, R. O. High-entropy alloys. Nat. Rev. Mater. 4, 515–534 (2019).
Yi, C. et al. Entropic stabilization of mixed A-cation ABX_{3} metal halide perovskites for high performance perovskite solar cells. Energy Environ. Sci. 9, 656–662 (2016).
Li, Z. et al. Stabilizing perovskite structures by tuning tolerance factor: formation of formamidinium and cesium lead iodide solid-state alloys. Chem. Mater. 28, 284–292 (2016).
Kumar, S. et al. Mixing entropy-induced layering polydispersity enabling efficient and stable perovskite nanocrystal light-emitting diodes. ACS Energy Lett. 4, 118–125 (2019).
Solari, S. F. et al. Stabilization of lead-reduced metal halide perovskite nanocrystals by high-entropy alloying. J. Am. Chem. Soc. 144, 5864–5870 (2022).
Sun, S. et al. Accelerated development of perovskite-inspired materials via high-throughput synthesis and machine-learning diagnosis. Joule 3, 1437–1451 (2019).
Chen, S. et al. Exploring the Stability of Novel Wide Bandgap Perovskites by a Robot Based High Throughput Approach. Adv. Energy Mater. 8, 1701543 (2018).
Jesper Jacobsson, T. et al. Exploration of the compositional space for mixed lead halogen perovskites for high efficiency solar cells. Energy Environ. Sci. 9, 1706–1724 (2016).
Yang, J., Manganaris, P. T. & Mannodi Kanakkithodi, A. K. A high-throughput computational dataset of halide perovskite alloys. Digit. Discov. 2, 856–870 (2023).
Wang, X. et al. Entropy-driven stabilization of multielement halide double-perovskite alloys. J. Phys. Chem. Lett. 13, 5017–5024 (2022).
Pu, W., Xiao, W., Wang, J., Li, X. & Wang, L. Screening of perovskite materials for solar cell applications by first-principles calculations. Mater. Des. 198, 109387 (2021).
Liu, D., Li, Q., Jing, H. & Wu, K. First-principles modeling of lead-free perovskites for photovoltaic applications. J. Phys. Chem. C. 123, 3795–3800 (2019).
Yamamoto, K., Iikubo, S., Yamasaki, J., Ogomi, Y. & Hayase, S. Structural stability of iodide perovskite: a combined cluster expansion method and first-principles study. J. Phys. Chem. C. 121, 27797–27804 (2017).
Choubisa, H. et al. Crystal site feature embedding enables exploration of large chemical spaces. Matter 3, 433–448 (2020).
Mannodi-Kanakkithodi, A. & Chan, M. K. Y. Data-driven design of novel halide perovskite alloys. Energy Environ. Sci. 15, 1930–1949 (2022).
Zunger, A., Wei, S.-H., Ferreira, L. G. & Bernard, J. E. Special quasirandom structures. Phys. Rev. Lett. 65, 353–356 (1990).
van de Walle, A. et al. Efficient stochastic generation of special quasirandom structures. Calphad 42, 13–18 (2013).
Yang, Y.-C., Liu, C., Lin, C.-Y. & Xia, Z. Core effect of local atomic configuration and design principles in AlxCoCrFeNi high-entropy alloys. Scr. Mater. 178, 181–186 (2020).
Wu, D. et al. Noble-metal high-entropy-alloy nanoparticles: atomic-level insight into the electronic structure. J. Am. Chem. Soc. 144, 3365–3369 (2022).
Xie, T. & Grossman, J. C. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Phys. Rev. Lett. 120, 145301 (2018).
Perdew, J. P. et al. Restoring the density-gradient expansion for exchange in solids and surfaces. Phys. Rev. Lett. 100, 136406 (2008).
Adamo, C. & Barone, V. Toward reliable density functional methods without adjustable parameters: The PBE0 model. J. Chem. Phys. 110, 6158–6170 (1999).
Steiner, S., Khmelevskyi, S., Marsmann, M. & Kresse, G. Calculation of the magnetic anisotropy with projected-augmented-wave methodology and the case study of disordered Fe_{1−x}Co_{x} alloys. Phys. Rev. B 93, 224425 (2016).
Krukau, A. V., Vydrov, O. A., Izmaylov, A. F. & Scuseria, G. E. Influence of the exchange screening parameter on the performance of screened hybrid functionals. J. Chem. Phys. 125, 224106 (2006).
Stephens, P. J., Devlin, F. J., Chabalowski, C. F. & Frisch, M. J. Ab initio calculation of vibrational absorption and circular dichroism spectra using density functional force fields. J. Phys. Chem. 98, 11623–11627 (1994).
Bartel, C. J. et al. New tolerance factor to predict the stability of perovskite oxides and halides. Sci. Adv. 5, eaav0693 (2019).
Goldschmidt, V. M. Die Gesetze der Krystallochemie. Naturwissenschaften 14, 477–485 (1926).
Filip, M. R. & Giustino, F. The geometric blueprint of perovskites. Proc. Natl Acad. Sci. 115, 5397–5402 (2018).
Shockley, W. & Queisser, H. J. Detailed balance limit of efficiency of p-n junction solar cells. J. Appl. Phys. 32, 510–519 (1961).
Yu, L. & Zunger, A. Identification of potential photovoltaic absorbers based on first-principles spectroscopic screening of materials. Phys. Rev. Lett. 108, 068701 (2012).
Thiele, G., Rotter, H. W. & Schmidt, K. D. Kristallstrukturen und Phasentransformationen von Caesiumtrihalogenogermanaten(II) CsGeX3 (X = Cl, Br, I). Z. Anorg. Allg. Chem. 545, 148–156 (1987).
Li, Y. et al. All-inorganic perovskite solar cells featuring mixed group IVA cations. Nanoscale 15, 7249–7260 (2023).
Kama, A., Tirosh, S., Itzhak, A., Ejgenberg, M. & Cahen, D. New Pb-free stable Sn–Ge solid solution halide perovskites fabricated by spray deposition. ACS Appl. Energy Mater. 5, 3638–3646 (2022).
Chen, M. et al. Highly stable and efficient all-inorganic lead-free perovskite solar cells with native-oxide passivation. Nat. Commun. 10, 16 (2019).
Schwartz, H. A. et al. Bandgap tuning in all-inorganic CsPb_{x}Sn_{1–x}Br_{3} perovskites. ACS Appl. Mater. Interfaces 13, 4203–4210 (2021).
Meng, F. et al. Ge incorporation to stabilize efficient inorganic CsPbI_{3} perovskite solar cells. Adv. Energy Mater. 12, 2103690 (2022).
Krishnamoorthy, T. et al. Lead-free germanium iodide perovskite materials for photovoltaic applications. J. Mater. Chem. A 3, 23829–23832 (2015).
Hao, F., Stoumpos, C. C., Cao, D. H., Chang, R. P. H. & Kanatzidis, M. G. Lead-free solid-state organic–inorganic halide perovskite solar cells. Nat. Photonics 8, 489–494 (2014).
Jain, A. et al. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).
Saal, J. E., Kirklin, S., Aykol, M., Meredig, B. & Wolverton, C. Materials design and discovery with high-throughput density functional theory: The Open Quantum Materials Database (OQMD). JOM 65, 1501–1509 (2013).
Kresse, G. & Furthmüller, J. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys. Rev. B 54, 11169–11186 (1996).
Blöchl, P. E. Projector augmented-wave method. Phys. Rev. B 50, 17953–17979 (1994).
Wang, V., Xu, N., Liu, J.-C., Tang, G. & Geng, W.-T. VASPKIT: A user-friendly interface facilitating high-throughput computing and analysis using VASP code. Comput. Phys. Commun. 267, 108033 (2021).
Ganose, A. M., Jackson, A. J. & Scanlon, D. O. sumo: Command-line tools for plotting and analysis of periodic ab initio calculations. J. Open Source Softw. 3, 717 (2018).
Gajdoš, M., Hummer, K., Kresse, G., Furthmüller, J. & Bechstedt, F. Linear optical properties in the projector-augmented wave methodology. Phys. Rev. B. 73, 045112 (2006).
Acknowledgements
This work was supported by a National Research Foundation (NRF) grant funded by the Korean government (MSIT) (RS-2023-00283597) and by the National Supercomputing Center, which provided supercomputing resources and technical support (KSC-2019-CRE-0128).
Author information
Contributions
J.-S.K. wrote the code to automate the generation of training data, trained the CGCNN models, implemented the chemical space exploration, conducted the DFT calculations, surveyed the literature, and wrote the paper. J.N. wrote the code to conduct chemical space exploration using CGCNN, analyzed the data, and edited the paper. J.I. provided the main idea of this study, oversaw all work, and edited the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kim, J.-S., Noh, J. & Im, J. Machine learning-enabled chemical space exploration of all-inorganic perovskites for photovoltaics. npj Comput Mater 10, 97 (2024). https://doi.org/10.1038/s41524-024-01270-1