Accelerated design and discovery of perovskites with high conductivity for energy applications through machine learning

We use machine learning tools for the design and discovery of ABO3-type perovskite oxides for various energy applications, using over 7000 data points from the literature. We demonstrate a robust learning framework for efficient and accurate prediction of total conductivity of perovskites and their classification based on the type of charge carrier at different conditions of temperature and environment. After evaluating a set of >100 features, we identify average ionic radius, minimum electronegativity, minimum atomic mass, minimum formation energy of oxides for all B-site, and B-site dopant ions of the perovskite as the crucial and relevant predictors for determining conductivity and the type of charge carriers. The models are validated by predicting the conductivity of compounds absent in the training set. We screen 1793 undoped and 95,832 A-site and B-site doped perovskites to report the perovskites with high conductivities, which can be used for different energy applications, depending on the type of the charge carriers.


INTRODUCTION
Advances in technology and science rely on the screening and discovery of high-performance and novel functional materials. Discovery of materials for alternative green energy technologies has become a necessity to reduce the reliance on fossil fuels and decrease the carbon footprint 1,2 . Green energy technologies such as fuel and electrolyzer cells, alkali-ion batteries, gas (e.g., carbon dioxide and oxygen) sensors, gas separation membranes, membrane reactors, and solar cells all rely on total conductivity, whether ionic, electronic, or mixed [3][4][5][6] . Solid oxide fuel/electrolyzer cells (SOFCs/SOECs) use materials of different charge carrier types as electrodes (mixed conductors) or electrolyte (ionic conductors) for conversion of chemical energy to electrical energy and vice versa [7][8][9][10][11] . Oxygen separation membranes mostly make use of mixed ionic/electronic conduction: electronic transport compensates for the charge differences caused by ionic transfer from high to low partial pressures of oxygen 4 . Dense proton-conducting membranes are used in membrane reactors for dehydrogenation of gaseous hydrocarbons to produce clean hydrogen fuel 12,13 . Most of these applications are, however, currently limited to high temperatures (800-1000°C), where the kinetics of various chemical reactions and the transport of charge carriers are relatively fast. High temperatures may also sometimes lead to high polarization resistances, as shown by Barfod et al. 14 , and reducing the polarization resistance is an active area of research. Research efforts over the years have been directed toward extending these technologies to low temperature applications for wider applicability and portability [15][16][17][18][19][20][21][22] , as seen in Fig. 1a, which shows the state-of-the-art conductivities of different solid oxides.
Over the years, the general trend for progress has been diagonally upward toward regions of higher conductivity and lower temperature. This has been made possible through extensive experiments by researchers. The use of trial-and-error experimentation for discovery and characterization of advanced materials can be very time-consuming, expensive, and inefficient. Statistical data-driven methods can accelerate this process through high-throughput screening of materials.
In recent work, high-throughput computational and experimental techniques were employed for discovery and design of novel materials for energy applications [23][24][25] . For example, high-throughput density functional theory (DFT) methods were used to screen perovskite oxides for cathode materials 26 or materials for anodes of SOFCs 27 , perovskite oxides for thermochemical splitting of water 28 , materials for electrocatalytic hydrogen evolution 29,30 and oxygen reduction 31 , and fast Li-ion conductors 32 and electrodes 33 for Li-ion batteries for energy storage. Such studies have also generated significant data on computed and experimental properties of solid oxides. An active area of current research is to develop new methods for extracting data and establishing trends from existing datasets in the scientific literature [34][35][36] . For instance, data-intensive machine learning (ML) techniques are currently widely used in materials science 37 . Data science-based methods have already been used in cheminformatics [38][39][40] , pattern recognition [41][42][43] , event forecasting [44][45][46] , decision making 47-50 , etc. The availability of data on properties of interest for candidates within a well-defined chemical space can assist statistical training and prediction approaches, leading to the design of new compounds. Some recent examples of materials' property predictions using this approach include predictions of molecular 51,52 and periodic systems' properties [53][54][55][56] , transition states 57 , structure classifications 58-60 , dielectric properties 61 , and bandgaps 62 . The material properties available in open databases, along with published experimental data, provide an opportunity for efficient design and discovery of advanced materials with enhanced functionalities.
The ability to identify solid oxide materials with high conductivities rapidly and accurately is of much interest for a range of applications that might require oxide materials with metallic, semiconducting, or resistive properties, to name a few. It is very difficult to estimate conductivities without direct experiments, using multi-probe equipment in a furnace for direct current conductivities or alternating current impedance spectroscopy. Estimating the bandgap through computational approaches might be useful. However, more advanced quantum calculations, such as the GW method 63 or those using hybrid functionals 64 , are computationally expensive and are thus inefficient for screening materials based on bandgaps. Here, we develop an ML model in which attributes of a material that are available in the open literature (also referred to as features) are used to directly predict conductivities in an efficient and accurate manner.
In this paper, we aim to build a validated statistical learning model for solid oxide perovskites using data from experimental observations of total conductivity, with the type of the majority charge carrier and material properties from open materials databases. The goal is to create a framework for fast exploration of the multidimensional chemical space of perovskites for applications based on total conductivity and carrier type. The perovskite structure is represented by the chemical formula ABO3, where the A cations are generally of larger radii and have a 12-fold ... Fig. 1c. The methodology of screening undoped and doped perovskites with high conductivity through ML regression and classifying them based on charge carriers through ML classification is illustrated in Fig. 1b, and discussed in detail in "Methods".

Fig. 1 Motivation and methodology. a The progress in solid oxide conductivities over decades moving diagonally up toward higher conductivities at lower temperatures. The data has been taken from references 69,70,116,117 . We aim to accelerate this process through machine learning (ML). b The flow chart for the methodology adopted in this work. Total conductivity and charge carrier data from literature and 111 features from open databases are used to train a regressor ML model for total conductivity, to screen perovskites for high conductivities. These are then tested for stability based on energy above hull and tolerance factor. The stable perovskites are then classified for majority charge carrier using the charge carrier data from literature and 112 features (+ total conductivity), resulting in the prediction of new perovskite chemistries for different charge carriers. c Periodic table 118 ...

RESULTS AND DISCUSSION
The most important chemical features for the ML regressor model turned out to be the minimum electronegativity and the average ionic radius of B-site ions, while for the classifier model they were the minimum atomic mass of B-site ions and the minimum formation energy of B-site oxides. The variation of the conductivity and charge carriers with these chemical features for the perovskites in the data collected from the literature is shown in Fig. 2a, b. Figure 2a shows that higher electronegativity and smaller B-site radii lead to higher conductivities. This is reasonable, as the smaller atoms in the periodic table are more electronegative, and an electronegativity of the B atoms closer to that of oxygen (3.44) leads to more metallic character, enabling higher conductivities. On the other hand, in Fig. 2b, we see that most of the perovskites with higher atomic mass of the B-site and lower formation energy of the B-site constituent oxides are proton or mixed proton conducting. We hypothesize that the lower formation energies lead to stronger B-O bonds in the perovskites, which makes oxide ion conduction difficult and proton conduction preferable. Also, heavier B-site ions have larger radii, which may shift the oxygen atoms in the BO6 octahedra of the perovskite structure farther from the central B-site ion. This creates more free-cell volume, which makes proton diffusion easier and hence leads to higher proton conductivities. The features and their importances from the XG-Boost models are listed in the Supplementary Notes of the Supplementary Information.

Screening of stable pure perovskites
The conductivity of perovskites depends on the chemistry and off-stoichiometry, as well as on the environment, such as temperature and atmosphere. In perovskites, the conductivity is the sum of electronic (negatively charged electrons and positively charged holes) and ionic conductivities, which may be due to the movement of protons interstitially attaching to O atoms 65 or oxide ions via substitutions of the O sites 66 . The total conductivity is $\sigma_{\mathrm{total}} = \sigma_{\mathrm{H^{+}}} + \sigma_{\mathrm{O^{2-}}} + \sigma_{e} + \sigma_{h}$, where the terms represent the conductivity due to protons, oxide ions, electrons, and holes, respectively. One conductivity term may prevail over the others depending on the majority charge carrier. Also, depending on the effect of temperature, the electrical behavior of materials can be either metallic, if the conductivity decreases with temperature due to the decrease in the mean free path of electrons enhanced by increased defects at high temperatures 67,68 , or semiconductor type, if the conductivity increases with temperature, where the majority charge carrier can be ions, which diffuse faster at higher temperatures. We screened type I and type II perovskites for higher conductivities at temperatures of 600 and 400°C and classified them with respect to their majority charge carriers, as shown in Figs. 3 and 4.
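The bookkeeping in the paragraph above can be illustrated with a minimal Python sketch. It assumes that partial conductivities are available per carrier, which is rarely the case in practice; the function names, the dictionary keys, and the numerical values are hypothetical. The 0.6 transference-number threshold is the one quoted in "Methods" for assigning a majority carrier, and the "mixed" fallback is a simplification of the mixed classes used in this work.

```python
def total_conductivity(partials):
    """Sum the partial conductivities (S/cm) of all charge carriers."""
    return sum(partials.values())


def transference_numbers(partials):
    """Fraction of the total conductivity carried by each species."""
    total = total_conductivity(partials)
    if total <= 0:
        raise ValueError("total conductivity must be positive")
    return {carrier: sigma / total for carrier, sigma in partials.items()}


def majority_carrier(partials, threshold=0.6):
    """Return the carrier whose transference number exceeds the threshold,
    or 'mixed' if no single carrier dominates (a simplification)."""
    numbers = transference_numbers(partials)
    carrier, value = max(numbers.items(), key=lambda item: item[1])
    return carrier if value > threshold else "mixed"


def thermal_behavior(sigma_low_t, sigma_high_t):
    """Label the temperature trend from conductivities at two temperatures."""
    if sigma_high_t > sigma_low_t:
        return "semiconductor-type"
    if sigma_high_t < sigma_low_t:
        return "metallic"
    return "temperature-independent"


# Hypothetical partial conductivities in S/cm, for illustration only.
example = {"proton": 8e-3, "oxide_ion": 1e-3, "electron": 5e-4, "hole": 5e-4}
example_majority = majority_carrier(example)
example_trend = thermal_behavior(sigma_low_t=1e-3, sigma_high_t=5e-3)
```

In this sketch the hypothetical material would be labeled a proton conductor with semiconductor-type behavior, since protons carry about 80% of the total conductivity and the conductivity rises with temperature.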
We observe both semiconductor and metallic electrical behavior among the screened perovskites from the ML model, as seen in Fig. 3a, b, which show total conductivities at 400 and 600°C for type I perovskites, and in the corresponding Fig. 3c, d for type II perovskites. As seen from the classification maps in Fig. 4, at three different atmospheres of wet (3% water vapor) H2, wet (3% water vapor) air, and dry O2, we observe an atmosphere-dependent change in charge carriers: for example, from ionic conduction (mostly protonic in wet atmospheres, owing to the hydration of oxide vacancies) toward more mixed electronic or purely electronic behavior with increasing O2 partial pressure, as O2 fills the oxide vacancies and creates holes as per the reaction $\tfrac{1}{2}\mathrm{O_2} + V_{\mathrm{O}}^{\bullet\bullet} \rightarrow \mathrm{O_O^{\times}} + 2h^{\bullet}$, leading to increased electronic conductivity. Interestingly, many lanthanide group perovskites of type II become oxide and proton conducting at high oxygen atmospheres, as seen in Fig. 4f; these have not been investigated because of the rarity of these elements. For electronic conduction, we do not distinguish between p-type and n-type conductivity, as this is not the primary aim of this work. The trends with temperature and atmosphere are predicted reasonably well by the XG-Boost models. One of the reasons we developed these ML models from literature data was to see if we could design materials for low temperature applications. From Fig. 3a, which is the conductivity map for screened type I perovskites at 400°C, we see that the conductivities of some perovskites like manganates, chromates, cobaltates, and germanates (columns 4-7 in Fig. 3a) are high. These are mostly classified as mixed oxide/electronic conductors and can be used as cathodes/anodes for SOFCs, oxygen separation membranes, or other energy applications. Also, there are other perovskites like osmates, iridates, platinates, plumbates, and bismuthates (columns 23-27 in Fig. 3a), which have high conductivities at 400°C and are classified mostly as proton or mixed protonic/electronic conducting oxides in wet H2/air conditions (corresponding Fig. 4a, b). These can be used for hydrogen separation membranes or electrodes for fuel cells, again keeping in mind the stability and electrocatalytic behavior (water splitting) for fuel cell anodes. However, many of these oxides become electronic conductors at very high oxygen partial pressures, as seen in the corresponding Fig. 4c. Figure 4a-c have the majority charge carrier types plotted for the same position of the elements as in Fig. 3a, b.
As thermally activated proton diffusion is much easier than oxide ion diffusion due to the small size of the protons, which diffuse interstitially, proton conductors are more likely to provide a solution for low temperature applications. We find many viable substitutes for BaZrO3 and BaCeO3, which are some of the best-known proton-conducting perovskite systems to date 69,70 . Niobates, stannates, hafnates, and thorates (columns 10, 16, 21, and 29 in Fig. 3a) are systems that are already known to be proton conductors; they are predicted to have higher conductivities and may perform better than zirconates (column 9 in Fig. 3a) at low temperatures. We also predict palladates (column 15 in Figs. 3a and 4a) to be proton conductors, which have not been studied much in the literature.
Among the type II perovskites, from ... (column 29), antimonates (column 20), and bismuthates (column 21) are oxide ion conductors, while others along with vanadates 73,74 (column 24) are electronic conductors, as also predicted by our classification model (see Fig. 4d, e). The conductivity of type II perovskites (Fig. 3c, d) is in general found to be lower than that of type I (Fig. 3a, b).

Fig. 3 (caption, partly recovered): The increase in conductivity with temperature (e.g., indicated by the change of color observed in column 9 for zirconates in a and b) indicates ionic conduction, while a decrease in conductivity with temperature (e.g., indicated by the change in color in the lanthanide-based perovskites in rows 11-13 and columns 6-9) mostly implies an electronic contribution to conductivity.
Stability of perovskites is of major importance when considering their feasibility for energy applications through high-throughput screening. The thermodynamic stability has been treated theoretically by many researchers in the past. The stability of perovskites under oxygen reduction reaction (ORR) environments has been reported by Jacobs et al. 26 for use as cathodes in SOFCs. Similarly, high-throughput screening of the stability of perovskites for electrocatalytic water splitting has been performed in a previous work by Emery et al. 28 . Emery et al. 28 suggest two criteria for the stability of perovskites: (a) the energy above hull calculated through DFT, which determines whether the composition is energetically at its minimum and possible, and (b) the tolerance factor of a perovskite, which also determines its crystal structure. Based on this, we have chosen the stable perovskites using an energy above hull <0.05 eV at 0 K and tolerance factors between 0.7 and 1.1 at 300 K. (The estimated error of high-throughput DFT calculations of heat of formation for OQMD is 0.096 eV atom−1 (ref. 53) compared to experiments, while the stability criterion chosen by their group is <0.025 eV atom−1 (ref. 28).) Energy above hull seems to be a more conservative determination of stability than tolerance factors. The charts shown in Supplementary Fig. 3 in the Supplementary Information give an idea of the stable perovskites based on the two factors. These stability criteria exclude the environmental effects in the operating environments of different energy devices, which might be oxidizing or reducing. A reducing atmosphere, as on the fuel side of an SOFC, might make the perovskites prone to reduction and, in extreme circumstances, even to the formation of hydrides 75 . Some perovskites might also split water electrochemically 28 . In an oxidizing atmosphere, as on the air side, the A-site cations in the perovskites may oxidize and segregate onto the surface 76,77 .
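The two-criterion stability filter described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the text does not spell out the tolerance factor formula, so the classical Goldschmidt form $t = (r_A + r_O)/[\sqrt{2}\,(r_B + r_O)]$ is assumed here; the ionic radii in the example are approximate Shannon values for Ba2+, Zr4+, and O2−, and the energy-above-hull inputs are hypothetical placeholders rather than database values.

```python
import math

R_OXIDE = 1.40  # approximate ionic radius of O2- in angstrom (assumed value)


def tolerance_factor(r_a, r_b, r_o=R_OXIDE):
    """Goldschmidt tolerance factor (assumed form) from ionic radii in angstrom."""
    return (r_a + r_o) / (math.sqrt(2) * (r_b + r_o))


def is_stable(e_hull, r_a, r_b, e_hull_max=0.05, t_min=0.7, t_max=1.1):
    """Apply both screening criteria: energy above hull and tolerance factor."""
    t = tolerance_factor(r_a, r_b)
    return e_hull < e_hull_max and t_min <= t <= t_max


# BaZrO3-like example with approximate radii (Ba2+ ~1.61, Zr4+ ~0.72 angstrom).
t_example = tolerance_factor(1.61, 0.72)
stable_example = is_stable(e_hull=0.0, r_a=1.61, r_b=0.72)  # hypothetical hull energy
```

Both thresholds are exposed as keyword arguments so that a stricter cutoff, such as the 0.025 eV atom−1 criterion mentioned above, can be tried without changing the function.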
Apart from screening the perovskites based on temperature and atmosphere, we also screened the best performing proton, oxide, mixed oxide/electronic, and mixed protonic/electronic perovskites of type I and type II based on lower activation energy for conductivity and higher conductivity at 400°C in wet air conditions, as shown in Fig. 5. Figure 5 shows the results for all the stable perovskites. While screening them for various applications, we make sure that they have a low and preferably positive activation energy, as expected for ionic conduction. To check the accuracy of our predictions, we compared the activation energies from the literature with our predicted values. The segmented linear sections of the conductivity vs 1/temperature plots were used for the calculations. The comparison of the predicted and experimental values is shown in Fig. 5a, and the root mean-squared error (RMSE) for the calculation of the activation energies was found to be 0.047 eV.
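As an illustration of how an activation energy can be read off a linear section of a conductivity vs 1/temperature plot, the following Python sketch fits a straight line to ln(σ) against 1/T. It assumes the simple Arrhenius form σ = σ0 exp(−Ea/kBT); the exact form used for the fits in this work (for example, whether σT is fitted instead of σ) is not stated in the text, and the function name and the synthetic data are hypothetical.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def activation_energy(temps_celsius, sigmas):
    """Least-squares slope of ln(sigma) vs 1/T, converted to Ea in eV.

    Assumes sigma = sigma0 * exp(-Ea / (k_B * T)) within the fitted segment.
    A negative result means conductivity decreases with temperature.
    """
    if len(temps_celsius) != len(sigmas) or len(sigmas) < 2:
        raise ValueError("need at least two (temperature, conductivity) pairs")
    xs = [1.0 / (t + 273.15) for t in temps_celsius]
    ys = [math.log(s) for s in sigmas]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("temperatures must not all be identical")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return -(sxy / sxx) * K_B


# Synthetic check: data generated with Ea = 0.5 eV should return about 0.5 eV.
temps = [400.0, 500.0, 600.0]
synthetic = [100.0 * math.exp(-0.5 / (K_B * (t + 273.15))) for t in temps]
ea_fit = activation_energy(temps, synthetic)
```

With this sign convention, a material whose conductivity falls with temperature yields a negative activation energy, which matches the way negative values are interpreted in the discussion of Fig. 5.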
Among the proton conductors shown in Fig. 5b, BaBiO3, BaTbO3, PmEuO3, CeErO3, EuPbO3, and SbTmO3 have conductivities three to four orders of magnitude above the well-known zirconates (YbZrO3 is predicted to have the highest conductivity among the zirconates). However, these have a negative activation energy, indicating the co-existence of some electronic conduction, which is also predicted by the classification in the dry O2 environment in Fig. 4c. This makes them useful for specific applications needing mixed protonic/electronic conductivities. Also, AgNbO3 is among the best predicted proton conductors and is also known as a photocatalyst for water splitting 78 . We do, however, have some less well-known candidates with good performance, like EuNbO3 and EuSnO3, with conductivities two orders of magnitude higher than the zirconates. While SrNbO3 and BaSnO3 have been investigated for proton diffusion in the past 79-81 , EuNbO3 and EuSnO3 have not yet been studied. Also, studies of lanthanide elements as B-sites in perovskites are scarce. Among these, InLaO3, GdCeO3, and EuNdO3 are predicted to be proton conducting, as seen in Fig. 5b.

Fig. 4 (caption, partly recovered): ... and pure O2 (c, f). The white region refers to proton conductors, purple to oxide ion conductors, dark green to mixed protonic/electronic, light green to mixed oxide/electronic, orange to mixed protonic/oxide, and brown to purely electronic conductors. There is a transition from protonic to electronic conductivity indicated by increase in electronic (brown) and mixed protonic and mixed oxide (green) colors due to inclusion of holes in presence of O2.
Among the oxide conductors, we have predicted a jump of two orders of magnitude in conductivity over the well-known gallates, as seen in Fig. 5c. SrTiO3, which has been extensively studied in the past 82,83 , has oxide ions as the prominent charge carrier in wet air conditions. SrTiO3 is a potential candidate for an intermediate temperature (~400°C) oxide conductor, with a predicted total conductivity an order of magnitude higher than the gallates. Acceptor doped SrTiO3 has been reported to show semiconductor behavior with a transference number >0.6 at 400°C (ref. 82). The conductivity of oxide ions depends not only on the vacancy formation energies, but also on the vacancy migration energies. SrTiO3 has a lower oxygen vacancy migration energy of 0.5 eV (ref. 84) than the well-known LaGaO3 (0.6 eV) 85 . Although SrTiO3 may have a higher conductivity, the oxygen ion transference number might not be optimum. Also, the ML model is unable to predict structural or phase changes in SrTiO3 (ref. 86) or any other perovskite. Among the gallates, ScGaO3 (σ = 0.0011 S cm−1), which has not yet been studied to our knowledge, has a total conductivity that is two orders of magnitude higher than the more commonly known LSGM (σ = 0.00002 S cm−1) 87,88 .

Fig. 5 (caption, partly recovered): ... The RMSE for the prediction is 0.047 eV; screening of stable perovskites with energy above hull <0.05 eV and 0.7 < tolerance factor < 1.1 (type I: darker symbols with black labels for highest conductivities at 400°C for a given B-site and type II: lighter with dark blue labels for highest conductivities at 400°C for a given B-site), b proton-conducting, c oxide-conducting, d protonic/electronic, and e mixed oxide/electronic perovskites based on conductivities and activation energies. The predictions are made in wet air.
Among the mixed protonic oxides, several new candidates, such as EuGeO3, CaPtO3, YbIrO3, YbOsO3, and EuMoO3, are predicted to have high conductivities, along with the well-known stannates (BaSnO3) 80 , as seen in Fig. 5d. Here, mixed conductors refer to mixed electronic-ionic conductors, not to conductors of different types of ions. MnSnO3, predicted as a potential mixed protonic/electronic perovskite, has previously been used for catalytic reduction of CO2, which involves transfer of both protons and electrons 89 . Among the mixed oxide/electronic perovskites (Fig. 5e), we find that some perovskites with lanthanide elements as B-sites, such as SbHoO3 and TbDyO3, perform as well as or better than the well-known cobaltates and manganates (SrCoO3, SrMnO3) 90,91 . A complete list of the screened perovskites with predicted conductivities and classification is provided in the Supplementary Datasets.
Several of the studied perovskites have already been reported by researchers for various energy applications. For example, BaMoO3 has been studied for the hydrogen evolution reaction 92 , and SrTiO3 has been reported as a low temperature fuel cell electrolyte 93 . Addition of the transition metal Ru to La 0.7 Sr 0.3 CoO enhances the electrocatalytic behavior 94 , indicating that ruthenates could possibly be used as oxide or mixed ionic-electronic conducting electrodes for fuel cells 95 . Stannates like BaSnO3 have already been used in SO2 gas 96 , formaldehyde 97 , and liquified petroleum gas 98 sensors.

Screening of doped perovskites
We next use the ML model to screen the A- and B-site doped chemistries that are stable, considering cases where the tolerance factors were between 0.7 and 1.1 and the energy above hull of the undoped perovskite was <0.05 eV. We considered a nominal 5% doping of the A- and B-sites to avoid clustering. Many researchers have found no phase segregation at low dopant concentrations (<5%) in perovskites and oxides in general [99][100][101][102] . Dopant oxides of type M2O3 for type I and MO for type II are used. In general, type I perovskites have higher conductivities than type II perovskites for all classes of charge carriers, as seen in Fig. 6, which is a summary of all the stable perovskites screened. While A-site doping does not increase the maximum total conductivity above that of the pure perovskites, B-site doping increases it for each charge carrier type. We find over 4000 proton conductors (2926 B-site doped type I and 1151 B-site doped type II perovskites) with conductivities above the well-known BZY. Also, there are over 300 candidates (5 undoped and 318 B-site doped) for oxide ion conduction above the well-known ceria nanocomposites for intermediate to low temperature applications. There are also several less well-known candidates for mixed protonic/oxide electronic conduction, like GdAlO3 and doped (Ni 5 Sc)CoO 3 , (Fe 5 Sr)BiO 3 , and Y(Cr 5 Ag)O 3 , for relevant applications, as seen in Fig. 6. The mixed ionic/electronic conductors, which are often used as cathodes in fuel cells, have been screened by Jacobs et al. 26 based on ORR catalytic activity and stability. We find that the ferrates, cobaltates, and aluminates (BaFe0.875Al0.125O3, SrCo0.75Fe0.25O3) predicted by Jacobs et al. 26 to have better catalytic activity might also have high total conductivities contributed by oxide and electronic charge carriers.
To test the validity of the ML model without using the conductivities for all the carrier types, we trained the XG-Boost regressor model with one class of charge carrier excluded from the training set, and then used it to predict the conductivities of the excluded class. The comparison of the testing, validation, and cross-validation performance of the model with each excluded charge carrier type is shown in Fig. 7. The corresponding mean absolute errors for the different ML models excluding a class of charge carrier type from the testing or validation set, and the mean absolute error for the predicted log(conductivities) of the excluded charge carrier class, are reported in Table 1. The predicted conductivities are at most one order of magnitude off for the ionic and ionic-electronic conductors and about two orders of magnitude off for the electronic conductors. Training data from all classes of charge carriers is therefore necessary for making valid predictions for all conductor classes of perovskites.
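The leave-one-class-out test described above can be outlined as follows. This is only a sketch of the data handling: the record layout, the field names, and the numbers are hypothetical, and a trivial mean predictor stands in for the XG-Boost regressor so that the example stays self-contained.

```python
def leave_one_class_out(records, held_out):
    """Split records into a training set without the held-out carrier class
    and an evaluation set containing only that class."""
    train = [r for r in records if r["carrier"] != held_out]
    held = [r for r in records if r["carrier"] == held_out]
    return train, held


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error between two equally long sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


# Hypothetical records: carrier class and log10 of total conductivity.
records = [
    {"carrier": "proton", "log_sigma": -3.0},
    {"carrier": "proton", "log_sigma": -2.0},
    {"carrier": "oxide", "log_sigma": -4.0},
    {"carrier": "electronic", "log_sigma": 1.0},
    {"carrier": "electronic", "log_sigma": 2.0},
]

train, held = leave_one_class_out(records, held_out="electronic")

# Stand-in model (NOT the XG-Boost regressor): predict the training mean.
train_mean = sum(r["log_sigma"] for r in train) / len(train)
predictions = [train_mean for _ in held]
mae = mean_absolute_error([r["log_sigma"] for r in held], predictions)
```

Because the error is computed on log10(conductivity), a mean absolute error of 1 corresponds to predictions that are off by roughly one order of magnitude, which is the scale used when the results in Table 1 are discussed.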
We have developed an ML model by data mining conductivities for 7230 different perovskite chemistries from published literature under different conditions of temperature and atmosphere. We chose 111 different features related to the chemical, physical, electrical, and mechanical properties of the A-, B-, and M- (A/B-site dopant) sites in perovskites and correlated them with total conductivities. We also classified the perovskites based on the type of charge carriers, which may be protonic, oxide, mixed protonic/electronic, mixed oxide/electronic, mixed protonic/oxide, or electronic. The regression and classification were performed with an XG-Boost regressor and classifier model, respectively. We identify the average ionic radius, minimum electronegativity, minimum atomic mass, and minimum formation energy of oxides for B-site ions and their dopants as relevant predictors for determining conductivity and the type of charge carriers. We screened 1793 undoped and 95,832 A- and B-site doped AO + BO2 (5% M2O3) and A2O3 + B2O3 (5% MO) perovskites for high conductivities, and classified them by majority charge carrier type. Among the proton conductors, the predicted conductivities of some potential candidates were four orders of magnitude higher than that of the well-known BaZr(Y)O3 system. After eliminating perovskites with negative activation energies, which are indicative of an electronic contribution, EuNbO3 and EuSnO3 are predicted to be good candidates for low temperature proton-conducting electrolytes. ScGaO3 has a predicted total conductivity two orders of magnitude higher than LSGM, a well-known oxide conductor. The models are validated by predicting the conductivity of compounds absent from the training set. The study shows that ML models using literature data are a computationally inexpensive and effective means of obtaining useful guidance for material design and experimentation.

Machine learning regression model for total conductivity
The crucial elements of any ML problem are the target and the features or descriptors that the target is a function of. For the problem considered here, the total conductivity is the target, and we chose 111 different features that adequately characterize the difference in electrical properties, without recourse to a specific hypothesis. These features are related to chemistry, structure, electronic, mechanical, and physical properties ... Supplementary Fig. 1a, b in Supplementary Information.
Microstructure, such as the grain size distribution [104][105][106] , plays a very important role in determining the conductivity of perovskites. Grain size information was available for only 1124 (out of 7230) perovskite candidates in the literature. The errors would likely have been lower had the grain sizes been included in the analysis. Although we found grain size to be important, we could not include it because too few journal papers reported conductivities together with grain size information. The grain size distribution with a mean and standard deviation was even more scarce, so we collected just the average grain size from the journal papers that reported it.
With these data in hand, we evaluated the accuracy of several ML methods using 85% and 10% of the data as the training and testing set, respectively, and the remaining 5% for cross-validation. For all the models, we also used five-fold (k = 5) cross-validation to optimize the hyperparameters. This procedure has a single parameter, denoted as k, which refers to the number of groups that a given data sample is split into, so that each group serves as the test set in one iteration (carried out for k iterations). The ML package scikit-learn 107 was used for this purpose. For the linear regression models, we evaluated various regularization methods (such as LASSO 108 (least absolute shrinkage and selection operator), ridge regression 109 , and elastic net 110 ), but regularization worsened the predictive performance. We also found that the support vector machines 111 model provided no improvement over the linear regression models. However, a seven-layered neural network 112 with 100 nodes in each layer and k-nearest neighbor 113 regression with five nearest neighbors each delivered improvement over these methods. Moreover, we found that the RF 114 algorithm with XG-Boosting 115 provided even better predictive performance. RF algorithms operate by randomly selecting a subset of features and constructing decision trees based on the limited data, which are then averaged for the final prediction. XG-Boosting uses a sequence of tree-based models, each learning sequentially from the previous model. By combining several of these decision tree models on the data subsets, an accurate prediction of the conductivities could be made without overfitting. A comparison of these methods is provided in the Supplementary Methods (Supplementary Fig. 2). The comparison of the XG-Boost model predictions with the experimental results is shown in Fig. 8, with the x-axis being the experimental observation.
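The data partitioning described above can be sketched without any ML library. The following Python example is a minimal illustration, not the script used in this work: it assumes a random shuffle with a fixed seed, integer rounding of the 85/10/5 percentages, and that the k folds are drawn from the training portion; the function names are hypothetical.

```python
import random


def split_dataset(n_samples, percents=(85, 10, 5), seed=0):
    """Shuffle sample indices and split them into training, testing,
    and cross-validation sets by integer percentage."""
    if sum(percents) != 100:
        raise ValueError("percentages must sum to 100")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = n_samples * percents[0] // 100
    n_test = n_samples * percents[1] // 100
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]
    return train, test, validation


def k_fold(indices, k=5):
    """Yield (training, held-out) index lists so that each of the k groups
    serves as the held-out set exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        training = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield training, held_out


train_idx, test_idx, val_idx = split_dataset(7230)
folds = list(k_fold(train_idx, k=5))
```

In a hyperparameter search, each candidate setting would be fitted on the `training` part of every fold and scored on the corresponding `held_out` part, with the average score over the five folds used to compare settings.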
The test set RMSE for the XG-Boost model is 0.24 with a coefficient of determination R 2 value of 0.987, and the RMSE value is 0.25 with R 2 of 0.986 for cross-validation set.
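For reference, the two metrics quoted above can be computed as in the following minimal Python sketch. It uses the standard definitions of RMSE and of the coefficient of determination; the assumption that the metrics are evaluated on log(conductivity) values, and the example numbers, are illustrative only.

```python
import math


def rmse(y_true, y_pred):
    """Root mean-squared error between observations and predictions."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_true) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical log10(conductivity) values, for illustration only.
observed = [-4.0, -2.0, 0.0, 2.0]
predicted = [-3.5, -2.0, 0.5, 2.0]
example_rmse = rmse(observed, predicted)
example_r2 = r_squared(observed, predicted)
```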
The coefficient of determination is given by $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$, where $y_i$ are the experimental values, $\hat{y}_i$ the predicted values, and $\bar{y}$ the mean of the experimental values.

The classification XG-Boost model used the same feature list as the conductivity model, along with total conductivity as an additional feature. Less than 30% of the data reported transference numbers for the different perovskites, making an exact demarcation of charge carrier type by transference number difficult. We have assigned the majority charge carrier as reported in the literature, or as the charge carrier whose transference number is >0.6. For the XG-Boost classification model, the error in the classification of charge carriers was 1.46% for the test set and 0.83% for the cross-validation set. The XG-Boost classification model was also tested on five and ten different random test sets obtained by splitting the data into 5/10 parts. The average error based on this method was 1.4% and 1.5% for classification based on five splits and ten splits, respectively, which was higher than the error using a random set for cross-validation. Such discrepancies have also been reported by other researchers 14 . The comparison of the tested and predicted number of cases for the cross-validation set is provided in ...

Table 1. Mean absolute error for the different ML models excluding a class of charge carrier type from training, testing, or validation set. The mean absolute error for the predicted log(conductivities) of the excluded charge carrier class is also shown. The predicted conductivities are at most one order of magnitude off for the ionic and ionic-electronic conductors and about two orders of magnitude off for the electronic conductors.

Fig. 8 Performance of the machine learning algorithm. The comparison of ground truth total conductivities (along the x-axis) with predicted total conductivities for the test set (10%) and cross-validation set (5%) with error bars. The root mean-squared error (RMSE) in prediction of the test set is 0.24, while that of the cross-validation set is 0.25.

DATA AVAILABILITY
The data of the training and test sets from the literature are available along with the manuscript. The dataset used for training, testing, and validation, and the corresponding Python script to train and validate the XG-Boost model, are also provided. The predicted conductivity and charge carrier data for the doped and undoped perovskites are provided in the tables in the dataset accompanying the manuscript. All the datasets are provided at https://figshare.com/s/10b18051e26fa4d4f18c.