Introduction

One of the important challenges in the computer-guided accelerated search for new materials is the simultaneous optimization of stability and one or more properties of interest1,2. Although the rationale for property optimization can be readily justified (e.g. need for room temperature superconductors, fast ion-conductors, sub-nm size skyrmions etc.), predicting stability is critical because it can inform whether a candidate material can be potentially synthesized in a laboratory setting or not. Furthermore, a quantitative understanding of the stability can put a constraint on the feasible solution space and our ability to predict them can significantly impact the accelerated search for new materials. Traditionally, density functional theory calculations are utilized, where the formation enthalpy and distance from the convex hull are used to determine the stability or metastability of a compound or an alloy3,4,5,6,7,8. While powerful, this approach has limitations in the accelerated search, especially if complex solid-solutions with fractional site occupancies, large supercells, and transition metal oxides with strong electron correlation are involved. The state-of-the-art computational tools9,10,11 are most suited to handle stoichiometric compounds.

In contrast, machine learning (ML) approaches that use experimental data are also increasingly utilized for accelerating the search for new materials12,13,14,15,16. In these studies, data from both successful and failed experiments are used to train ML models, which in turn can be used to predict whether a new data sample (not present in the training set) can be potentially experimentally synthesized or not. Thus, these methods do not use or strictly require thermodynamic stability data but take advantage of the past experiments to make inference about future experiments. One of the emerging areas in the nascent field of materials informatics is the active learning or adaptive design approach, where the ML models are combined with algorithms that recommend informative experiments (from a vast pool of possible experiments) such that the new data are expected to improve the performance of the ML models in the next iteration17,18,19. Recent demonstrations of these methods to experimentally discover complex organic−inorganic molecules, alloys, and functional oxides are worth mentioning12,13,14,15,16.

In this paper, we focus on the complex xBi\([ {{\mathrm{Me}}_y' {\mathrm{Me}}_{(1 - y)}''} ]\)O3–(1 − x)PbTiO3 (PT) perovskite solid solutions, where Me′ and Me″ are di-, tri-, tetra- or pentavalent cations that occupy the octahedral site of the perovskite lattice (shown in Fig. 1) and the resulting solid solution is charge neutral. These are candidate materials for high-temperature ferroelectric applications because of their large piezoelectric and electromechanical coupling coefficients20. Our survey of the periodic table identified a total of 23 cations that can occupy the Me′ and Me″ octahedral sites in the perovskite lattice. Further, these 23 cations can be distinctly combined to yield a total of 75 {Me′Me″} possible cation pairs and only 13 such pairs are experimentally explored. We also allow x to vary from 0.05 to 0.85 in steps of 0.05 and y to vary from 0.1 to 0.9 in steps of 0.01 for constraining our composition space. Overall, we identify 61,506 unique chemical compositions out of which only 167 are experimentally studied, representing only 0.28% of the search space. In the remaining unexplored chemical and composition space, it is unclear how many candidate high-temperature ferroelectric perovskites can and do exist. This is a challenging question because rules governing formation of high-temperature ferroelectric perovskite phases in a complex multinary phase space are not known a priori. Traditionally, time-consuming and expensive trial-and-error or intuition-driven experimental approaches are used to uncover the composition−structure−property relationships, which is a non-trivial task. The overarching research problem is schematically shown in Fig. 1.

Fig. 1
figure 1

Materials design challenges in the search for high-temperature ferroelectric perovskites. The phase space of xBi\([ {{\mathrm{Me}}_y' {\mathrm{Me}}_{(1 - y)}'' } ]\)O3–(1 − x)PT solid solutions contains a total of 61,500 compositions and only about 0.28% of them are experimentally investigated. A priori, the stability field of the desired high-temperature ferroelectric perovskite phase is not known. Traditionally, trial-and-error or intuition-driven experimental approaches are used to discover new high-TC compositions, which is non-trivial and can be time-consuming and costly. We demonstrate a two-step ML approach that can simultaneously optimize x, y, Me′, and Me″ such that the resulting solid solution will not only have a high-TC, but form in the perovskite crystal structure. Classification learning methods identify and isolate desired regions in the phase diagram, where the candidate perovskite phases are expected to exist. Regression methods, on the other hand, predict the ferroelectric TC for the candidate perovskites. Active learning methods recommend promising high-TC perovskite compositions for synthesis. Experiments validate the ML predictions and provide feedback for further model improvement

Here, we demonstrate a materials design approach driven by ML and active learning methods to simultaneously predict x, y, Me′, and Me″ such that the new candidate solid solutions are expected to (i) form in a perovskite structure (with at least 95% phase purity) and (ii) also have high ferroelectric Curie temperature (TC). The novelty of our ML approach lies in the integration of classification learning with regression methods to constrain the search space of possible perovskites so that only promising compositions are recommended for experimental synthesis, characterization, and feedback. As a result, we build two independent ML models, one for classification learning and the other for regression. The data for training the ML models are taken from the published experimental literature21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45. While the classification learning models allow us to screen for candidate chemical compositions that can have perovskite structure, the regression models then predict the TC for the candidate perovskite solid solutions. We also use our recently developed active learning (or adaptive design) approach based on efficient global optimization (EGO)13,17 to recommend promising compositions for experimental synthesis and characterization (details of EGO algorithm are discussed in the Regression and Active Learning section). We employ the conventional powder processing technique for synthesis, which allows us to validate one of the important questions of perovskite stability under the driving force of temperature. One can also, in principle, experimentally stabilize the perovskite phase by means of pressure, but we focus only on the temperature. We run our iterative loop for a total of five times, which resulted in investigating ten new compositions and we discovered six new perovskites, with 0.2Bi(Fe0.12Co0.88)O3–0.8PT having the highest measured TC of 898 K among them. We also identify three novel {Me′Me″} pairs, namely {FeCo}, {CoAl}, and {NiSn}, that are not explored in the literature. Since the ceramic processing route that we have employed is akin to what industry uses in scaled-up production, the novel composition spaces identified in this work can potentially impact the development of functional materials for high-temperature applications such as piezoelectric actuators46,47. The current work is also a departure from those reported in the literature12,13,15,44 in the following ways. First, in terms of the ML approach, we setup our materials design problem as two-step learning to sequentially guide experiments, where a classification learning model down-selects promising candidates in the desired crystal structure and a regression method coupled to active learning recommends promising compositions for experimental synthesis. The new predictions are expected to simultaneously satisfy two criteria that are crucial for practical applications: first, they are expected to form in the desired perovskite crystal structure and second, they must also have a promising property, i.e., high ferroelectric Curie temperature (TC). The potential of ML to constrain the search space has major implications in rationally guiding experiments towards promising materials with targeted properties. Previous studies considered only either the classification learning or regression methods for design12,13,15. Second, in terms of the materials class, the focus is on complex functional oxides that are synthesized via conventional powder processing routes. However, previous demonstrations of classification learning with feedback from experiments have been on complex organic−inorganic molecules12,15. Third, we use our strategy to sequentially guide 1−3 experiments at each iteration step, in contrast to the batch experiments of Duros et al.15, who performed ten new experiments at each iteration step for validation and feedback.

Results

The need for a two-step learning strategy, especially the classification learning for constraining the perovskite composition space, was motivated by our first prediction on the 0.5Bi(In0.36Sc0.64)O3-0.5PT composition. We predicted this composition solely on the basis of regression and active learning under the assumption that all compositions in the constrained xBi\([ {{\mathrm{Me}}_y' {\mathrm{Me}}_{(1 - y)}'' } ]\)O3–(1 − x)PT search space can be potentially synthesized in the perovskite phase. We did not use classification learning to evaluate whether this composition will form in the perovskite structure or not. This naive strategy proved to be insufficient, because the X-ray diffraction (XRD) measurements revealed the presence of secondary phases. This indicated that the composition did not form in a single-phase perovskite structure and furthermore, these secondary phases are deleterious for high-temperature applications. This failed attempt led us to reformulate our approach and we constructed a new dataset for classification learning, whose primary objective is to identify promising regions in the xBi\([ {{\mathrm{Me}}_y' {\mathrm{Me}}_{(1 - y)}''}]\)O3–(1 − x)PT search space that are expected to form in the pure perovskite phase. The reformulation resulted in the development of a two-step ML strategy, which we show in Fig. 2. The synergistic effect of employing classification learning, regression, active learning, experimental validation, and feedback in the discovery of high-TC ferroelectric perovskites are discussed below.

Fig. 2
figure 2

Two-step machine learning strategy for sequentially guiding experiments. We construct two independent datasets (D1 and D2) from surveying the experimental data in the literature, one for building classification learning and the other for regression models. The role of classification learning is to screen for compositions that can be synthesized in the perovskite phase. Compositions that pass the classification learning screen are referred to as the “candidate perovskites”. The regression models then predict the TC with associated uncertainties (σ) for the candidate perovskites. We then use efficient global optimization (EGO)61 to identify promising high-TC candidates for experiments. Outcome from both successful and failed experiments provides feedback for classification learning. On the other hand, we only use the outcome from successful experiment to update the regression ML models. We iterate our design loop for a total of five times

Classification learning

We constructed an initial dataset of 167 polycrystalline ceramic samples taken from the published literature. The compositions in the dataset were synthesized using either the conventional powder processing, solid state reaction methods, or mixed oxide processing routes, with some differences in the calcination and sintering protocols. Thus, we have a heterogeneous dataset as it is constructed from diverse sources. The phase purity of the polycrystalline ceramics in our training set was determined using the XRD measurements. Compositions that were revealed to be at least 95% phase pure in the perovskite structure were labeled as “+1” (or desired) in our dataset and we do not explicitly distinguish between tetragonal, rhombohedral, or monoclinic perovskite crystal structures. Similarly, those that had secondary phases were labeled as “−1” (or undesired). Our training set contained 107 and 60 compositions with +1 and −1 labels, respectively.

We represent each composition in our dataset using five features, namely tolerance factor, valence electron number, Martynov−Batsanov electronegativity, ideal bond lengths, and Mendeleev number48,49,50,51. Recently, Pilania et al.52 showed the relative importance of these features in classifying the formability of ABO3 compounds in perovskite crystal structure, and therefore, we use them in this work. Since every composition is a solid solution, we used the weighted-fractions of the relative proportions of Bi, Pb, Me′, Me″, and Ti in the solid solution to calculate the feature values. For instance, the tolerance factor (tf) for (say) a xBi(Me′Me″)O3 – (1 − x)PT solid solution is calculated using the formula,

$$t_{{\rm f},{\mathrm{solid}}{\kern 1pt} {\mathrm{solution}}} = \left\{ {x \times t_{{\rm f},{\mathrm{Bi}}({\mathrm {Me}}'{\mathrm {Me}}''){\mathrm{O}}_{\mathrm{3}}}} \right\} + \left\{ {(1 - x) \times t_{{\rm f},{\mathrm{PT}}}} \right\},$$
(1)

where tf is calculated as \(({{r_{\rm A} + r_{\rm O}}})/({{\sqrt 2 \left( {r_{\rm B} + r_{\rm O}} \right)}})\) and rA, rB, and rO are the weighted-average Shannon’s ionic radii53 of the A-site, B-site, and O atom, respectively, in the perovskite lattice. A similar procedure was used to determine the values for other features for each composition.

The objective of classification learning is to build ML models that map the features to the labels (+1 or −1). We use the support vector classifier with a radial basis function kernel (SVCrbf) for building the classification learning models (see Methods section for details). We first evaluated the suitability of the features and the SVCrbf ML method on our initial dataset by splitting the dataset into training and independent test sets, because very little work has been reported in the literature on predicting the perovskite phase stability of these solid solutions. We generated 100 training and test set pairs via random sampling, such that each training and test set pair contained 127 and 40 data points, respectively. We built 100 SVCrbf classifiers from the training sets and used them to predict the labels for the compositions in the corresponding test sets. We calculated the accuracy of each SVCrbf classifier by evaluating its performance on each of the test sets. The overall accuracy was then estimated by calculating the mean value from the outcome of the 100 SVCrbf classifiers. We find that our SVCrbf classifiers performed with an average accuracy of 77.5 ± 6.4%, which is reasonable and not uncommon in the literature12,15,54, if dealing with heterogeneous experimental datasets such as the one constructed in this work. This exercise provides an estimate of the predictive power of the features and the SVCrbf method in classifying +1 and −1 labels of the compositions assembled in our dataset.

For the prediction of new perovskites in the unexplored 61,506 composition space, however, we utilized all 167 data points for training the SVCrbf classifiers. We constructed a total of 100 bootstrapped samples55,56 from the original dataset and built 100 SVCrbf classifiers. Thus, any given composition will have predictions of either +1 or −1 label from 100 SVCrbf classifiers. Since the objective is to down-select promising perovskites from the 61,506 unexplored composition space, we “exploit” our SVCrbf classifiers, i.e., we only choose those compositions that are classified in the +1 (or desired) label at least 95 or more times. The hyperparameters for the SVCrbf classifiers were optimized by the tenfold cross-validation (CV) method (see Methods section for details).

Regression and active learning

We built another dataset of 117 compositions for which the TC data are known from published experiments. This dataset contains compositions that are both at and away from the Morphotropic Phase Boundary (MPB) composition, but we do not distinguish between them. In the ferroelectrics literature, the term MPB refers to structural phase transitions arising due to changes in chemical composition at a given temperature and especially in PbTiO3-based materials, MPB encompasses a region in the phase diagram where two ferroelectric phases (typically in tetragonal and rhombohedral symmetries) coexist. The TC data were determined using the dielectric measurements in an impedance analyzer. For this dataset, we represent each composition using two features, namely tf and ionic displacement. Unlike the classification learning problem, the rationale for the choice of these features in predicting the TC for Bi(Me′Me″)O3 – PT solid solutions is well-established in the literature20,42,43,44,57,58,59. For instance, Abrahams et al.59 showed that in displacive ferroelectrics, TCδ2, where δ is the relative atomic displacement of the homopolar metal atom from the center of the octahedron cage. Later, Grinberg et al.42,57,58 extended the argument to also include displacements from the A-site atoms. Similarly, Eitel et al.20 showed the existence of a correlation between tf and TC at the MPB compositions in BiMeO3-PT solid solutions, where Me is a trivalent octahedral cation. However, the uncertainties associated with tf and δ in describing the TC at compositions away from the MPB are not known. The goal of regression is to build ML models that can predict TC as a function of tf, δ, and compositions (at and away from the MPB). While the tf for each composition is calculated as shown in Eq. (1), the δ for the solid solution is calculated as follows:

$$\delta _{{\mathrm{solid}}{\kern 1pt} {\mathrm{solution}}} = \left\{ {x \times \delta _{{\mathrm{Bi}}} \times \delta _{{\mathrm {Me}}'{\mathrm {Me}}''}} \right\} + \left\{ {(1 - x) \times \delta _{{\mathrm{Pb}}} \times \delta _{{\mathrm{Ti}}}} \right\},$$
(2)

where the values for δ for each octahedral site cation (Ti, Me′, and Me″) were taken from the work of Balachandran et al.60 and that for Bi and Pb comes from Grinberg and Rappe58. All δ values used in this work are given in Supplementary Table 1. In the case of δMe′Me″, we used the weighted-average values for the individual δMe′ and δMe″ cation data.

We used the support vector regression with the radial basis function kernel (SVRrbf) for building the regression models. Similar to classification learning, we construct a total of 100 bootstrapped samples from the original dataset and build 100 SVRrbf models. The mean value and standard deviation from these 100 SVRrbf models was taken as the mean TC prediction \(\left( {\widehat {T_{\rm C}}} \right)\) and the associated uncertainty (σ), respectively. Once the SVRrbf models are built, we apply them to predict \(\widehat {T_{\rm C}}\, \pm\, \sigma\) for the candidate perovskites that were down-selected from the classification learning. The hyperparameters for the SVRrbf models were optimized by the tenfold CV method (see Methods section for details). The performance of the SVRrbf on the training set is shown in Fig. 3. The mean absolute error on the initial dataset was estimated to be 30.2 K.

Fig. 3
figure 3

Performance of SVRrbf on the training dataset. Experimental (x-axis) vs. ML predicted (y-axis) TC for the initial dataset of 117 compounds. The error bars are the standard deviation from the predictions of 100 SVRrbf ML models on the dataset. The dashed red line is the x = y line, where the predictions from SVRrbf models and experimental TC are equal

We explored two methods to rank the candidate perovskites for experimental recommendations. One is based on the EGO61, where we calculate the “expected improvement, E(I)” for each unmeasured composition using the expression, σ[ϕ(z) + zΦ(z)], where \(z = \left( {\widehat {T_{\rm C}} - \mu ^ \ast } \right){\mathrm{/}}\sigma\) and μ* is the maximum TC observed so far in the current training set, ϕ(z) and Φ(z) are the standard normal density and cumulative distribution functions, respectively61. Here, E(I) balances the tradeoff between “exploitation” and “exploration” of the SVRrbf model. At the end of each iteration, EGO returns a score for E(I) for each unmeasured composition, whose relative magnitude depends on the SVRrbf predicted \(\left( {\widehat {T_{\rm C}},\sigma } \right)\) pair for those compositions and the value of μ* in the training set. It is common to pick the composition with the maximum E(I) for validation and feedback. It is anticipated that the validation of compositions recommended by E(I) would lead to model improvement in the subsequent iterations. The second method, in contrast to the EGO method, exploits the SVRrbf models, i.e., we recommend candidate perovskite compositions from the unexplored space that were predicted to have the largest \(\widehat {T_{\rm C}}\). Unlike E(I), these recommendations are not expected to improve the SVRrbf models in the subsequent iterations. We refer to this as “Exploitation” in this paper.

Experiments

All specimens were prepared by the conventional powder processing technique. The general procedure for the processing used in this work is outlined in a previous publication by Kowalski et al.62,63,64. However, the sintering temperature was varied between 1173 and 1373 K due to the dependence of melting temperature on compositions. During sintering the weight loss was less than 3% for all compositions. A sintered pellet from each composition was then crushed for XRD. All compositions studied were, to some degree, perovskite. In general, there is a solubility limit depending on the cations in Bi(Me′Me″)O3 and this limit tends to be less than 50% except for Bi(Zn0.5Ti0.5)O3 and BiFeO3 in PT. If XRD revealed the composition to be mostly perovskite (≥95%), then a sintered pellet was prepared for dielectric measurements (see Methods for additional details).

Iterative loop

The iterative loop consists of the following steps: (a) Screen 61,506 compositions using an ensemble of 100 SVCrbf classifiers and down-select compositions as candidate perovskites, (b) Predict the \(\widehat {T_{\rm C}}\) along with the uncertainties (σ) for the down-selected candidates using SVRrbf regression models, (c) Calculate the E(I) for each candidate perovskite using the EGO algorithm and rank them in the descending order, (d) Recommend the promising compositions [i.e., the one with the largest E(I)] for experimental synthesis and characterization, (e) Synthesize and evaluate the phase stability using XRD and only those that are revealed to be perovskite are prepared for measuring the TC, and (f) Augment the training set with these compositions and repeat the cycle until our budget of ten experiments is expended. The results are summarized in Table 1, where we also provide the data for tetragonality (c/a ratio) of the experimentally confirmed perovskites. A detailed account describing how the search process evolved from one iteration to another is given in Supplementary Note 1 and we only discuss our main results here. In Figs. 4 and 5, we show the XRD and dielectric measurements data for the synthesized compositions, respectively. From XRD measurements, we identify six new compositions that were synthesized in iteration 2a, 3a, 4a, 4b, 5b, and 5c to be mostly perovskite (≥95% phase pure). The temperature-dependent dielectric measurements were then performed only on the six compositions that are confirmed as perovskites by the XRD data.

Table 1 List of ten new compositions predicted and validated by ML and experimental measurements
Fig. 4
figure 4

X-ray diffraction data for the ten compositions. Compositions synthesized at iterations 2a, 3a, 4a, 4b, 5b, and 5c are identified to be mostly perovskites (≥95% phase pure), in agreement with the predictions from SVCrbf ML method. See Table 1 for details

Fig. 5
figure 5

Dielectric data for the six confirmed perovskites. Dielectric constant (ε r ) as a function of temperature (T, in °C) data measured at 10 kHz frequency for the six experimentally confirmed perovskites (labeled A–F, whose compositions are also provided alongside the data). We put the data for E and F compositions in the inset because the samples become conductive. The dielectric constant at that phase transformation is not real. It is mostly due to charge arising from conduction

Among the six experimentally confirmed perovskite compositions, 0.2Bi(Fe0.12Co0.88)O3–0.8PT (predicted in the fourth iteration using the exploitation method) had the highest measured TC of 898 K. Both Fe3+ and Co3+ cations have large ionic displacements in the perovskite lattice60, which is hypothesized as one of the main reasons for its large TC. Our work has led to the prediction of four new promising {Me′Me″} pairs in the chemical space, namely {FeCo}, {NiSn}, {CoAl}, and {YbAl}, not explored in the literature. This is shown in Fig. 6 and the exact compositions are given in Table 1. The XRD measurements confirmed three of them, {FeCo}, {NiSn}, and {CoAl}, in the perovskite structure. In the case of the {YbAl} pair, XRD measurements revealed secondary phases and therefore, our preliminary studies identify them as unsuccessful.

Fig. 6
figure 6

{Me′Me″}-pair combinations in the chemical space. In the Bi(Me′Me″)O3-PT chemical space, where there are 23 cations (of different nominal valence) that can potentially occupy the Me′ or Me″ sites, we have a total of 75 possible {Me′Me″} pairs (thin black lines). However, only 13 such pairs (highlighted in dark blue line) are experimentally explored in the literature and are included in our training set. In this work, we predict and experimentally synthesize four new pairs (highlighted in red line) that are identified as potential high-temperature ferroelectric perovskites by ML. Among them, three pairs (highlighted in full red line)—{FeCo}, {NiSn}, and {CoAl}—are experimentally found to form in the perovskite structure and are identified as promising systems for further investigation. One of the pairs (dotted red line), {YbAl}, was predicted to be a promising high TC perovskite by ML, but XRD revealed presence of secondary phases (therefore, unsuccessful)

Discussion

One of the major hurdles in traditional materials design has been the vast size of the unexplored space and our experimental bias towards knowledge of the well-known materials systems such as BaTiO3, Pb(Zr, Ti)O3 and Pb(Mg, Nb)O3-PbTiO3. We have demonstrated here how synthesis can be guided by a two-step ML strategy that learns from all available data, including both successful and failed experiments, to accelerate the search for new high-T C perovskite oxides. Our work has led to two promising compositions, 0.2Bi(Fe0.12Co0.88)O3–0.8PT and 0.2Bi(Co0.90Al0.10)O3–0.8PT. The Bi(FeCo)O3-PT and Bi(CoAl)O3-PT chemical spaces rank third and fourth highest, respectively, in terms of their measured TC values compared to the state-of-the-art compositions [BiFeO3-PT and Bi(ZnTi)O3-PT] in our training set. The merits of these compositions are their high TC and the ease of processing in the perovskite phase, which are critical for actuator performance. This is important because the operation temperature is limited by AC conductivity, leakage, or de-poling and TC sets the intrinsic limit to piezoelectricity. In future, one needs to incorporate c/a ratio and domain mobility, which we have ignored in this work.

From Table 1, we find two intriguing observations pertaining to the BiGaO3-BiScO3-PT and BiFeO3-BiCoO3-PT predicted compositions that are worth discussing. In BiGaO3-BiScO3-PT, the first two predictions (in iterations 1 and 2, as given in Table 1) did not result in a perovskite phase as revealed by the XRD data. However, the third prediction (in iteration 3) resulted in perovskite structure. Notice that the composition predicted in iteration 3 had less BiGaO3 content (13.25%) relative to the first two predictions (that contained 49.8 and 22% BiGaO3). The PT content, on the other hand, also increased in the third prediction compared to the first two predictions (Table 1). Similarly, in BiFeO3-BiCoO3-PT the first prediction (in iteration 3, 0.7Bi(Fe0.73Co0.27)O3–0.3PT) was rich in Bi(CoFe)O3 relative to that of the PT content. Our XRD measurements revealed secondary phases and therefore, unsuccessful. In the next iteration (# 4), ML predicted 0.2Bi(Fe0.12Co0.88)O3–0.8PT, which is now rich in PT compared to the Bi(CoFe)O3 content. The XRD measurements revealed perovskite phase. We interpret these outcomes as the SVCrbf classifier iteratively “learning” the solubility limit of Bi(Me′Me″)O3 in PT perovskite from previous failed experiments.

We also shed some light on the evolution of the ML models as a result of ten experiments. For instance, the SVRrbf models (after iteration 4) showed only marginal improvement relative to that of the first set of models (before iteration 1). The mean absolute error value decreased from 30 to 28 K. On the other hand, the SVCrbf classification learning models showed no improvement in its performance on the test set. We calculated an average accuracy of 77 ± 6.5% for the final model (i.e., after iteration 4), which is almost identical to that of the initial SVCrbf models (before iteration 1), whose average accuracy was calculated as 77.5 ± 6.4%. We attribute this behavior to the exploitative recommendation strategy of the SVCrbf models. We anticipate that the average accuracies of these classifiers can, in principle, be further improved when these models are coupled to active learning methods, such as uncertainty sampling15, where a loss function can be defined with an intention of improving the average accuracies of the classification learning models. We identify them as promising directions for future investigation.

Methods

Machine learning

We use the C-support vector classification (C-SVC) and ε-support vector regression (ε-SVR) methods with non-linear Gaussian radial basis function kernel for classification learning and regression, respectively, as implemented in the e1071 package65 within the RSTUDIO environment66. The hyperparameters (cost and gamma) for both C-SVC and ε-SVR methods were optimized using the tenfold CV method and the grid search method. In the tenfold CV method, the original dataset was divided randomly (without replacement) into ten subsamples of equal sizes. Then, a single subsample (containing 10% of the data) was chosen as the test set and the remaining 90% was used for training the models. The process was repeated ten times (one for each subsample) such that each composition appear exactly once in the test set data. The set of hyperparameters that resulted in the smallest tenfold CV error (i.e., classification error in the case of C-SVC and mean squared error in the case of ε-SVR) was chosen for training our final models. Bootstrap resampling is one of the well-known methods, where a dataset is sampled repeatedly with replacement55,56.

Experimental

The starting oxide powders used and compositions studied are listed here, PbO, TiO2, Bi2O3, Fe2O3, CoO2, Ga2O3, In2O3, Al2O3, Sc2O3, SnO2, NiO, and Yb2O3. The purity of the powders used exceeds 99%.

The disc pellets were first parallel polished to maintain a 10:1 diameter to thickness ratio. Then the polished pellets were electroded with silver paint and cured at 350 °C. To then investigate the prediction of the TC, dielectric constant measurements were carried out as a function of temperature at 0.5 V with an Agilent 4294A impedance analyzer (Agilent Technologies, Santa Clara, CA) in tandem with a customized CM furnace. Data was captured through a computer interface by LABVIEW (National Instruments Corp., Austin, TX) software.

Data availability

The datasets generated and analyzed for classification learning and regression are deposited at figshare67 with identifier https://doi.org/10.6084/m9.figshare.5687551.v1.