Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

# Experimental search for high-temperature ferroelectric perovskites guided by two-step machine learning

## Abstract

Experimental search for high-temperature ferroelectric perovskites is a challenging task due to the vast chemical space and lack of predictive guidelines. Here, we demonstrate a two-step machine learning approach to guide experiments in search of xBi$$[ {{\mathrm{Me}}_y' {\mathrm{Me}}_{(1 - y)}'' } ]$$O3–(1 − x)PbTiO3-based perovskites with high ferroelectric Curie temperature. These involve classification learning to screen for compositions in the perovskite structures, and regression coupled to active learning to identify promising perovskites for synthesis and feedback. The problem is challenging because the search space is vast, spanning ~61,500 compositions and only 167 are experimentally studied. Furthermore, not every composition can be synthesized in the perovskite phase. In this work, we predict x, y, Me′, and Me″ such that the resulting compositions have both high Curie temperature and form in the perovskite structure. Outcomes from both successful and failed experiments then iteratively refine the machine learning models via an active learning loop. Our approach finds six perovskites out of ten compositions synthesized, including three previously unexplored {Me′Me″} pairs, with 0.2Bi(Fe0.12Co0.88)O3–0.8PbTiO3 showing the highest measured Curie temperature of 898 K among them.

## Introduction

One of the important challenges in the computer-guided accelerated search for new materials is the simultaneous optimization of stability and one or more properties of interest1,2. Although the rationale for property optimization can be readily justified (e.g. need for room temperature superconductors, fast ion-conductors, sub-nm size skyrmions etc.), predicting stability is critical because it can inform whether a candidate material can be potentially synthesized in a laboratory setting or not. Furthermore, a quantitative understanding of the stability can put a constraint on the feasible solution space and our ability to predict them can significantly impact the accelerated search for new materials. Traditionally, density functional theory calculations are utilized, where the formation enthalpy and distance from the convex hull are used to determine the stability or metastability of a compound or an alloy3,4,5,6,7,8. While powerful, this approach has limitations in the accelerated search, especially if complex solid-solutions with fractional site occupancies, large supercells, and transition metal oxides with strong electron correlation are involved. The state-of-the-art computational tools9,10,11 are most suited to handle stoichiometric compounds.

In contrast, machine learning (ML) approaches that use experimental data are also increasingly utilized for accelerating the search for new materials12,13,14,15,16. In these studies, data from both successful and failed experiments are used to train ML models, which in turn can be used to predict whether a new data sample (not present in the training set) can be potentially experimentally synthesized or not. Thus, these methods do not use or strictly require thermodynamic stability data but take advantage of the past experiments to make inference about future experiments. One of the emerging areas in the nascent field of materials informatics is the active learning or adaptive design approach, where the ML models are combined with algorithms that recommend informative experiments (from a vast pool of possible experiments) such that the new data are expected to improve the performance of the ML models in the next iteration17,18,19. Recent demonstrations of these methods to experimentally discover complex organic−inorganic molecules, alloys, and functional oxides are worth mentioning12,13,14,15,16.

In this paper, we focus on the complex xBi$$[ {{\mathrm{Me}}_y' {\mathrm{Me}}_{(1 - y)}''} ]$$O3–(1 − x)PbTiO3 (PT) perovskite solid solutions, where Me′ and Me″ are di-, tri-, tetra- or pentavalent cations that occupy the octahedral site of the perovskite lattice (shown in Fig. 1) and the resulting solid solution is charge neutral. These are candidate materials for high-temperature ferroelectric applications because of their large piezoelectric and electromechanical coupling coefficients20. Our survey of the periodic table identified a total of 23 cations that can occupy the Me′ and Me″ octahedral sites in the perovskite lattice. Further, these 23 cations can be distinctly combined to yield a total of 75 {Me′Me″} possible cation pairs and only 13 such pairs are experimentally explored. We also allow x to vary from 0.05 to 0.85 in steps of 0.05 and y to vary from 0.1 to 0.9 in steps of 0.01 for constraining our composition space. Overall, we identify 61,506 unique chemical compositions out of which only 167 are experimentally studied, representing only 0.28% of the search space. In the remaining unexplored chemical and composition space, it is unclear how many candidate high-temperature ferroelectric perovskites can and do exist. This is a challenging question because rules governing formation of high-temperature ferroelectric perovskite phases in a complex multinary phase space are not known a priori. Traditionally, time-consuming and expensive trial-and-error or intuition-driven experimental approaches are used to uncover the composition−structure−property relationships, which is a non-trivial task. The overarching research problem is schematically shown in Fig. 1.

Here, we demonstrate a materials design approach driven by ML and active learning methods to simultaneously predict x, y, Me′, and Me″ such that the new candidate solid solutions are expected to (i) form in a perovskite structure (with at least 95% phase purity) and (ii) also have high ferroelectric Curie temperature (TC). The novelty of our ML approach lies in the integration of classification learning with regression methods to constrain the search space of possible perovskites so that only promising compositions are recommended for experimental synthesis, characterization, and feedback. As a result, we build two independent ML models, one for classification learning and the other for regression. The data for training the ML models are taken from the published experimental literature21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45. While the classification learning models allow us to screen for candidate chemical compositions that can have perovskite structure, the regression models then predict the TC for the candidate perovskite solid solutions. We also use our recently developed active learning (or adaptive design) approach based on efficient global optimization (EGO)13,17 to recommend promising compositions for experimental synthesis and characterization (details of EGO algorithm are discussed in the Regression and Active Learning section). We employ the conventional powder processing technique for synthesis, which allows us to validate one of the important questions of perovskite stability under the driving force of temperature. One can also, in principle, experimentally stabilize the perovskite phase by means of pressure, but we focus only on the temperature. We run our iterative loop for a total of five times, which resulted in investigating ten new compositions and we discovered six new perovskites, with 0.2Bi(Fe0.12Co0.88)O3–0.8PT having the highest measured TC of 898 K among them. We also identify three novel {Me′Me″} pairs, namely {FeCo}, {CoAl}, and {NiSn}, that are not explored in the literature. Since the ceramic processing route that we have employed is akin to what industry uses in scaled-up production, the novel composition spaces identified in this work can potentially impact the development of functional materials for high-temperature applications such as piezoelectric actuators46,47. The current work is also a departure from those reported in the literature12,13,15,44 in the following ways. First, in terms of the ML approach, we setup our materials design problem as two-step learning to sequentially guide experiments, where a classification learning model down-selects promising candidates in the desired crystal structure and a regression method coupled to active learning recommends promising compositions for experimental synthesis. The new predictions are expected to simultaneously satisfy two criteria that are crucial for practical applications: first, they are expected to form in the desired perovskite crystal structure and second, they must also have a promising property, i.e., high ferroelectric Curie temperature (TC). The potential of ML to constrain the search space has major implications in rationally guiding experiments towards promising materials with targeted properties. Previous studies considered only either the classification learning or regression methods for design12,13,15. Second, in terms of the materials class, the focus is on complex functional oxides that are synthesized via conventional powder processing routes. However, previous demonstrations of classification learning with feedback from experiments have been on complex organic−inorganic molecules12,15. Third, we use our strategy to sequentially guide 1−3 experiments at each iteration step, in contrast to the batch experiments of Duros et al.15, who performed ten new experiments at each iteration step for validation and feedback.

## Results

The need for a two-step learning strategy, especially the classification learning for constraining the perovskite composition space, was motivated by our first prediction on the 0.5Bi(In0.36Sc0.64)O3-0.5PT composition. We predicted this composition solely on the basis of regression and active learning under the assumption that all compositions in the constrained xBi$$[ {{\mathrm{Me}}_y' {\mathrm{Me}}_{(1 - y)}'' } ]$$O3–(1 − x)PT search space can be potentially synthesized in the perovskite phase. We did not use classification learning to evaluate whether this composition will form in the perovskite structure or not. This naive strategy proved to be insufficient, because the X-ray diffraction (XRD) measurements revealed the presence of secondary phases. This indicated that the composition did not form in a single-phase perovskite structure and furthermore, these secondary phases are deleterious for high-temperature applications. This failed attempt led us to reformulate our approach and we constructed a new dataset for classification learning, whose primary objective is to identify promising regions in the xBi$$[ {{\mathrm{Me}}_y' {\mathrm{Me}}_{(1 - y)}''}]$$O3–(1 − x)PT search space that are expected to form in the pure perovskite phase. The reformulation resulted in the development of a two-step ML strategy, which we show in Fig. 2. The synergistic effect of employing classification learning, regression, active learning, experimental validation, and feedback in the discovery of high-TC ferroelectric perovskites are discussed below.

### Classification learning

We constructed an initial dataset of 167 polycrystalline ceramic samples taken from the published literature. The compositions in the dataset were synthesized using either the conventional powder processing, solid state reaction methods, or mixed oxide processing routes, with some differences in the calcination and sintering protocols. Thus, we have a heterogeneous dataset as it is constructed from diverse sources. The phase purity of the polycrystalline ceramics in our training set was determined using the XRD measurements. Compositions that were revealed to be at least 95% phase pure in the perovskite structure were labeled as “+1” (or desired) in our dataset and we do not explicitly distinguish between tetragonal, rhombohedral, or monoclinic perovskite crystal structures. Similarly, those that had secondary phases were labeled as “−1” (or undesired). Our training set contained 107 and 60 compositions with +1 and −1 labels, respectively.

We represent each composition in our dataset using five features, namely tolerance factor, valence electron number, Martynov−Batsanov electronegativity, ideal bond lengths, and Mendeleev number48,49,50,51. Recently, Pilania et al.52 showed the relative importance of these features in classifying the formability of ABO3 compounds in perovskite crystal structure, and therefore, we use them in this work. Since every composition is a solid solution, we used the weighted-fractions of the relative proportions of Bi, Pb, Me′, Me″, and Ti in the solid solution to calculate the feature values. For instance, the tolerance factor (tf) for (say) a xBi(Me′Me″)O3 – (1 − x)PT solid solution is calculated using the formula,

$$t_{{\rm f},{\mathrm{solid}}{\kern 1pt} {\mathrm{solution}}} = \left\{ {x \times t_{{\rm f},{\mathrm{Bi}}({\mathrm {Me}}'{\mathrm {Me}}''){\mathrm{O}}_{\mathrm{3}}}} \right\} + \left\{ {(1 - x) \times t_{{\rm f},{\mathrm{PT}}}} \right\},$$
(1)

where tf is calculated as $$({{r_{\rm A} + r_{\rm O}}})/({{\sqrt 2 \left( {r_{\rm B} + r_{\rm O}} \right)}})$$ and rA, rB, and rO are the weighted-average Shannon’s ionic radii53 of the A-site, B-site, and O atom, respectively, in the perovskite lattice. A similar procedure was used to determine the values for other features for each composition.

The objective of classification learning is to build ML models that map the features to the labels (+1 or −1). We use the support vector classifier with a radial basis function kernel (SVCrbf) for building the classification learning models (see Methods section for details). We first evaluated the suitability of the features and the SVCrbf ML method on our initial dataset by splitting the dataset into training and independent test sets, because very little work has been reported in the literature on predicting the perovskite phase stability of these solid solutions. We generated 100 training and test set pairs via random sampling, such that each training and test set pair contained 127 and 40 data points, respectively. We built 100 SVCrbf classifiers from the training sets and used them to predict the labels for the compositions in the corresponding test sets. We calculated the accuracy of each SVCrbf classifier by evaluating its performance on each of the test sets. The overall accuracy was then estimated by calculating the mean value from the outcome of the 100 SVCrbf classifiers. We find that our SVCrbf classifiers performed with an average accuracy of 77.5 ± 6.4%, which is reasonable and not uncommon in the literature12,15,54, if dealing with heterogeneous experimental datasets such as the one constructed in this work. This exercise provides an estimate of the predictive power of the features and the SVCrbf method in classifying +1 and −1 labels of the compositions assembled in our dataset.

For the prediction of new perovskites in the unexplored 61,506 composition space, however, we utilized all 167 data points for training the SVCrbf classifiers. We constructed a total of 100 bootstrapped samples55,56 from the original dataset and built 100 SVCrbf classifiers. Thus, any given composition will have predictions of either +1 or −1 label from 100 SVCrbf classifiers. Since the objective is to down-select promising perovskites from the 61,506 unexplored composition space, we “exploit” our SVCrbf classifiers, i.e., we only choose those compositions that are classified in the +1 (or desired) label at least 95 or more times. The hyperparameters for the SVCrbf classifiers were optimized by the tenfold cross-validation (CV) method (see Methods section for details).

### Regression and active learning

We built another dataset of 117 compositions for which the TC data are known from published experiments. This dataset contains compositions that are both at and away from the Morphotropic Phase Boundary (MPB) composition, but we do not distinguish between them. In the ferroelectrics literature, the term MPB refers to structural phase transitions arising due to changes in chemical composition at a given temperature and especially in PbTiO3-based materials, MPB encompasses a region in the phase diagram where two ferroelectric phases (typically in tetragonal and rhombohedral symmetries) coexist. The TC data were determined using the dielectric measurements in an impedance analyzer. For this dataset, we represent each composition using two features, namely tf and ionic displacement. Unlike the classification learning problem, the rationale for the choice of these features in predicting the TC for Bi(Me′Me″)O3 – PT solid solutions is well-established in the literature20,42,43,44,57,58,59. For instance, Abrahams et al.59 showed that in displacive ferroelectrics, TCδ2, where δ is the relative atomic displacement of the homopolar metal atom from the center of the octahedron cage. Later, Grinberg et al.42,57,58 extended the argument to also include displacements from the A-site atoms. Similarly, Eitel et al.20 showed the existence of a correlation between tf and TC at the MPB compositions in BiMeO3-PT solid solutions, where Me is a trivalent octahedral cation. However, the uncertainties associated with tf and δ in describing the TC at compositions away from the MPB are not known. The goal of regression is to build ML models that can predict TC as a function of tf, δ, and compositions (at and away from the MPB). While the tf for each composition is calculated as shown in Eq. (1), the δ for the solid solution is calculated as follows:

$$\delta _{{\mathrm{solid}}{\kern 1pt} {\mathrm{solution}}} = \left\{ {x \times \delta _{{\mathrm{Bi}}} \times \delta _{{\mathrm {Me}}'{\mathrm {Me}}''}} \right\} + \left\{ {(1 - x) \times \delta _{{\mathrm{Pb}}} \times \delta _{{\mathrm{Ti}}}} \right\},$$
(2)

where the values for δ for each octahedral site cation (Ti, Me′, and Me″) were taken from the work of Balachandran et al.60 and that for Bi and Pb comes from Grinberg and Rappe58. All δ values used in this work are given in Supplementary Table 1. In the case of δMe′Me″, we used the weighted-average values for the individual δMe′ and δMe″ cation data.

We used the support vector regression with the radial basis function kernel (SVRrbf) for building the regression models. Similar to classification learning, we construct a total of 100 bootstrapped samples from the original dataset and build 100 SVRrbf models. The mean value and standard deviation from these 100 SVRrbf models was taken as the mean TC prediction $$\left( {\widehat {T_{\rm C}}} \right)$$ and the associated uncertainty (σ), respectively. Once the SVRrbf models are built, we apply them to predict $$\widehat {T_{\rm C}}\, \pm\, \sigma$$ for the candidate perovskites that were down-selected from the classification learning. The hyperparameters for the SVRrbf models were optimized by the tenfold CV method (see Methods section for details). The performance of the SVRrbf on the training set is shown in Fig. 3. The mean absolute error on the initial dataset was estimated to be 30.2 K.

We explored two methods to rank the candidate perovskites for experimental recommendations. One is based on the EGO61, where we calculate the “expected improvement, E(I)” for each unmeasured composition using the expression, σ[ϕ(z) + zΦ(z)], where $$z = \left( {\widehat {T_{\rm C}} - \mu ^ \ast } \right){\mathrm{/}}\sigma$$ and μ* is the maximum TC observed so far in the current training set, ϕ(z) and Φ(z) are the standard normal density and cumulative distribution functions, respectively61. Here, E(I) balances the tradeoff between “exploitation” and “exploration” of the SVRrbf model. At the end of each iteration, EGO returns a score for E(I) for each unmeasured composition, whose relative magnitude depends on the SVRrbf predicted $$\left( {\widehat {T_{\rm C}},\sigma } \right)$$ pair for those compositions and the value of μ* in the training set. It is common to pick the composition with the maximum E(I) for validation and feedback. It is anticipated that the validation of compositions recommended by E(I) would lead to model improvement in the subsequent iterations. The second method, in contrast to the EGO method, exploits the SVRrbf models, i.e., we recommend candidate perovskite compositions from the unexplored space that were predicted to have the largest $$\widehat {T_{\rm C}}$$. Unlike E(I), these recommendations are not expected to improve the SVRrbf models in the subsequent iterations. We refer to this as “Exploitation” in this paper.

### Experiments

All specimens were prepared by the conventional powder processing technique. The general procedure for the processing used in this work is outlined in a previous publication by Kowalski et al.62,63,64. However, the sintering temperature was varied between 1173 and 1373 K due to the dependence of melting temperature on compositions. During sintering the weight loss was less than 3% for all compositions. A sintered pellet from each composition was then crushed for XRD. All compositions studied were, to some degree, perovskite. In general, there is a solubility limit depending on the cations in Bi(Me′Me″)O3 and this limit tends to be less than 50% except for Bi(Zn0.5Ti0.5)O3 and BiFeO3 in PT. If XRD revealed the composition to be mostly perovskite (≥95%), then a sintered pellet was prepared for dielectric measurements (see Methods for additional details).

### Iterative loop

The iterative loop consists of the following steps: (a) Screen 61,506 compositions using an ensemble of 100 SVCrbf classifiers and down-select compositions as candidate perovskites, (b) Predict the $$\widehat {T_{\rm C}}$$ along with the uncertainties (σ) for the down-selected candidates using SVRrbf regression models, (c) Calculate the E(I) for each candidate perovskite using the EGO algorithm and rank them in the descending order, (d) Recommend the promising compositions [i.e., the one with the largest E(I)] for experimental synthesis and characterization, (e) Synthesize and evaluate the phase stability using XRD and only those that are revealed to be perovskite are prepared for measuring the TC, and (f) Augment the training set with these compositions and repeat the cycle until our budget of ten experiments is expended. The results are summarized in Table 1, where we also provide the data for tetragonality (c/a ratio) of the experimentally confirmed perovskites. A detailed account describing how the search process evolved from one iteration to another is given in Supplementary Note 1 and we only discuss our main results here. In Figs. 4 and 5, we show the XRD and dielectric measurements data for the synthesized compositions, respectively. From XRD measurements, we identify six new compositions that were synthesized in iteration 2a, 3a, 4a, 4b, 5b, and 5c to be mostly perovskite (≥95% phase pure). The temperature-dependent dielectric measurements were then performed only on the six compositions that are confirmed as perovskites by the XRD data.

Among the six experimentally confirmed perovskite compositions, 0.2Bi(Fe0.12Co0.88)O3–0.8PT (predicted in the fourth iteration using the exploitation method) had the highest measured TC of 898 K. Both Fe3+ and Co3+ cations have large ionic displacements in the perovskite lattice60, which is hypothesized as one of the main reasons for its large TC. Our work has led to the prediction of four new promising {Me′Me″} pairs in the chemical space, namely {FeCo}, {NiSn}, {CoAl}, and {YbAl}, not explored in the literature. This is shown in Fig. 6 and the exact compositions are given in Table 1. The XRD measurements confirmed three of them, {FeCo}, {NiSn}, and {CoAl}, in the perovskite structure. In the case of the {YbAl} pair, XRD measurements revealed secondary phases and therefore, our preliminary studies identify them as unsuccessful.

## Discussion

One of the major hurdles in traditional materials design has been the vast size of the unexplored space and our experimental bias towards knowledge of the well-known materials systems such as BaTiO3, Pb(Zr, Ti)O3 and Pb(Mg, Nb)O3-PbTiO3. We have demonstrated here how synthesis can be guided by a two-step ML strategy that learns from all available data, including both successful and failed experiments, to accelerate the search for new high-T C perovskite oxides. Our work has led to two promising compositions, 0.2Bi(Fe0.12Co0.88)O3–0.8PT and 0.2Bi(Co0.90Al0.10)O3–0.8PT. The Bi(FeCo)O3-PT and Bi(CoAl)O3-PT chemical spaces rank third and fourth highest, respectively, in terms of their measured TC values compared to the state-of-the-art compositions [BiFeO3-PT and Bi(ZnTi)O3-PT] in our training set. The merits of these compositions are their high TC and the ease of processing in the perovskite phase, which are critical for actuator performance. This is important because the operation temperature is limited by AC conductivity, leakage, or de-poling and TC sets the intrinsic limit to piezoelectricity. In future, one needs to incorporate c/a ratio and domain mobility, which we have ignored in this work.

From Table 1, we find two intriguing observations pertaining to the BiGaO3-BiScO3-PT and BiFeO3-BiCoO3-PT predicted compositions that are worth discussing. In BiGaO3-BiScO3-PT, the first two predictions (in iterations 1 and 2, as given in Table 1) did not result in a perovskite phase as revealed by the XRD data. However, the third prediction (in iteration 3) resulted in perovskite structure. Notice that the composition predicted in iteration 3 had less BiGaO3 content (13.25%) relative to the first two predictions (that contained 49.8 and 22% BiGaO3). The PT content, on the other hand, also increased in the third prediction compared to the first two predictions (Table 1). Similarly, in BiFeO3-BiCoO3-PT the first prediction (in iteration 3, 0.7Bi(Fe0.73Co0.27)O3–0.3PT) was rich in Bi(CoFe)O3 relative to that of the PT content. Our XRD measurements revealed secondary phases and therefore, unsuccessful. In the next iteration (# 4), ML predicted 0.2Bi(Fe0.12Co0.88)O3–0.8PT, which is now rich in PT compared to the Bi(CoFe)O3 content. The XRD measurements revealed perovskite phase. We interpret these outcomes as the SVCrbf classifier iteratively “learning” the solubility limit of Bi(Me′Me″)O3 in PT perovskite from previous failed experiments.

We also shed some light on the evolution of the ML models as a result of ten experiments. For instance, the SVRrbf models (after iteration 4) showed only marginal improvement relative to that of the first set of models (before iteration 1). The mean absolute error value decreased from 30 to 28 K. On the other hand, the SVCrbf classification learning models showed no improvement in its performance on the test set. We calculated an average accuracy of 77 ± 6.5% for the final model (i.e., after iteration 4), which is almost identical to that of the initial SVCrbf models (before iteration 1), whose average accuracy was calculated as 77.5 ± 6.4%. We attribute this behavior to the exploitative recommendation strategy of the SVCrbf models. We anticipate that the average accuracies of these classifiers can, in principle, be further improved when these models are coupled to active learning methods, such as uncertainty sampling15, where a loss function can be defined with an intention of improving the average accuracies of the classification learning models. We identify them as promising directions for future investigation.

## Methods

### Machine learning

We use the C-support vector classification (C-SVC) and ε-support vector regression (ε-SVR) methods with non-linear Gaussian radial basis function kernel for classification learning and regression, respectively, as implemented in the e1071 package65 within the RSTUDIO environment66. The hyperparameters (cost and gamma) for both C-SVC and ε-SVR methods were optimized using the tenfold CV method and the grid search method. In the tenfold CV method, the original dataset was divided randomly (without replacement) into ten subsamples of equal sizes. Then, a single subsample (containing 10% of the data) was chosen as the test set and the remaining 90% was used for training the models. The process was repeated ten times (one for each subsample) such that each composition appear exactly once in the test set data. The set of hyperparameters that resulted in the smallest tenfold CV error (i.e., classification error in the case of C-SVC and mean squared error in the case of ε-SVR) was chosen for training our final models. Bootstrap resampling is one of the well-known methods, where a dataset is sampled repeatedly with replacement55,56.

### Experimental

The starting oxide powders used and compositions studied are listed here, PbO, TiO2, Bi2O3, Fe2O3, CoO2, Ga2O3, In2O3, Al2O3, Sc2O3, SnO2, NiO, and Yb2O3. The purity of the powders used exceeds 99%.

The disc pellets were first parallel polished to maintain a 10:1 diameter to thickness ratio. Then the polished pellets were electroded with silver paint and cured at 350 °C. To then investigate the prediction of the TC, dielectric constant measurements were carried out as a function of temperature at 0.5 V with an Agilent 4294A impedance analyzer (Agilent Technologies, Santa Clara, CA) in tandem with a customized CM furnace. Data was captured through a computer interface by LABVIEW (National Instruments Corp., Austin, TX) software.

### Data availability

The datasets generated and analyzed for classification learning and regression are deposited at figshare67 with identifier https://doi.org/10.6084/m9.figshare.5687551.v1.

## References

1. 1.

Zhang, X., Abdalla, L. B., Liu, Q. & Zunger, A. The enabling electronic motif for topological insulation in ABO3 perovskites. Adv. Funct. Mater. 27, 1701266 (2017).

2. 2.

Narayan, A. et al. Wagner, computational and experimental investigation for new transition metal selenides and sulfides: the importance of experimental verification for stability. Phys. Rev. B 94, 045105 (2016).

3. 3.

Gautier, R. et al. Prediction and accelerated laboratory discovery of previously unknown 18-electron ABX compounds. Nat. Chem. 7, 308–316 (2015).

4. 4.

Sun, W. et al. The thermodynamic scale of inorganic crystalline metastability. Sci. Adv. 2, e1600225 (2016).

5. 5.

Sanvito, S. et al. Accelerated discovery of new magnets in the Heusler alloy family. Sci. Adv. 3, e1602241 (2017).

6. 6.

Korbel, S., Marques, M. A. L. & Botti, S. Stability and electronic properties of new inorganic perovskites from high-throughput ab initio calculations. Mater. Chem. C. 4, 3157–3167 (2016).

7. 7.

Cooper, V. R., Morris, J. R., Takagi, S. & Singh, D. J. La-driven morphotrophic phase boundary in the Bi(Zn1/2Ti1/2)O3–La(Zn1/2Ti1/2)O3–PbTiO3 solid solution. Chem. Mater. 24, 4477–4482 (2012).

8. 8.

Emery, A. A., Saal, J. E., Kirklin, S., Hegde, V. I. & Wolverton, C. High-throughput computational screening of perovskites for thermochemical water splitting applications. Chem. Mater. 28, 5621–5634 (2016).

9. 9.

Jain, A. et al. Commentary: the materials project: a materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).

10. 10.

Saal, J. E., Kirklin, S., Aykol, M., Meredig, B. & Wolverton, C. Materials design and discovery with high-throughput density functional theory: the open quantum materials database (OQMD). JOM 65, 1501–1509 (2013).

11. 11.

Curtarolo, S. et al. AFLOWLIB.ORG: a distributed materials properties repository from high-throughput ab initio calculations. Comput. Mater. Sci. 58, 227–235 (2012).

12. 12.

Raccuglia, P. et al. Machine-learning-assisted materials discovery using failed experiments. Nature 533, 73–76 (2016).

13. 13.

Xue, D. et al. Accelerated search for materials with targeted properties by adaptive design. Nat. Commun. 7, 11241 (2016).

14. 14.

Xue, D. et al. Accelerated search for BaTiO3-based piezoelectrics with vertical morphotropic phase boundary using Bayesian learning. Proc. Natl. Acad. Sci. USA 113, 13301–13306 (2016).

15. 15.

Duros, V. et al. Human versus robots in the discovery and crystallization of gigantic polyoxometalates. Angew. Chem. Int. Ed. 56, 10815–10820 (2017).

16. 16.

Oliynyk, A. O. et al. High-throughput machine-learning-driven synthesis of full-Heusler compounds. Chem. Mater. 28, 7324–7331 (2016).

17. 17.

Lookman, T., Balachandran, P. V., Xue, D., Hogden, J. & Theiler, J. Statistical inference and adaptive design for materials discovery. Curr. Opin. Solid State Mater. Sci. 21, 121–128 (2017).

18. 18.

Sverchkov, Y. & Craven, M. A review of active learning approaches to experimental design for uncovering biological networks. PLoS Comput. Biol. 13, 1–26 (2017).

19. 19.

Ueno, T., Rhone, T. D., Hou, Z., Mizoguchi, T. & Tsuda, K. COMBO: an efficient Bayesian optimization library for materials science. Mater. Discov. 4, 18–21 (2016).

20. 20.

Eitel, R. et al. New high temperature morphotropic phase boundary piezoelectrics based on Bi(Me)O3PbTi3 ceramics. Jpn. J. Appl. Phys. 40, 5999 (2001).

21. 21.

Cheng, J., Eitel, R., Li, N. & Cross, L. E. Structural and electrical properties of (1-x)Bi(Ga1/4Sc3/4)O3xPbTiO3 piezoelectric ceramics. J. Appl. Phys. 94, 605–609 (2003).

22. 22.

Stringer, C. J., Eitel, R. E., Shrout, T. R., Randall, C. A. & Reaney, I. M. Phase transition and chemical order in the ferroelectric perovskite (1-x)Bi(Mg3/4W1/4)O3-xPbTiO3 solid solution system. J. Appl. Phys. 97, 024101 (2005).

23. 23.

Zhang, S. et al. Investigation of bismuth-based perovskite system: (1-x)Bi(Ni2/3Nb1/3)O3-xPbTiO3. J. Appl. Phys. 98, 034103 (2005).

24. 24.

Sterianou, I. et al. High-temperature (1-x)Bi(Sc1/2Fe1/2)O3-xPbTiO3 piezoelectric ceramics. Appl. Phys. Lett. 87, 242901 (2005).

25. 25.

Stein, D. M., Suchomel, M. R. & Davies, P. K. Enhanced tetragonality in (x)PbTiO3-(1-x)Bi(B′B″)O3 systems: Bi(Zn3/4W1/4)O3. Appl. Phys. Lett. 89, 132907 (2006).

26. 26.

Jiang, Y. et al. Microstructure, dielectric, and piezoelectric properties of 0.38Bi(Gax Sc1-x)O3-0.62PbTiO3 high temperature piezoelectric ceramics. Phys. Status Solid. Rapid Res. Lett. 2, 28–30 (2008).

27. 27.

Moure, A., Alguer'o, M., Pardo, L., Ringgaard, E. & Pedersen, A. F. Microstructure and temperature dependence of properties of morphotropic phase boundary Bi(Mg1/2Ti1/2)O3-PbTiO3 piezoceramics processed by conventional routes. J. Eur. Ceram. Soc. 27, 237–245 (2007).

28. 28.

Chen, J., Tan, X., Jo, W. & Rodel, J. Temperature dependence of piezoelectric properties of high-T_C Bi(Mg1/2Ti1/2)O3-PbTiO3. J. Appl. Phys. 106, 034109 (2009).

29. 29.

Feng, G. et al. Phase formation and characterization of high Curie temperature xBiYbO3-(1-x)PbTiO3 piezoelectric ceramics. J. Eur. Ceram. Soc. 29, 1687–1693 (2009).

30. 30.

Hu, Z. et al. Morphotropic phase boundary and high temperature dielectric, piezoelectric, and ferroelectric properties of (1-x)Bi(Sc3/4In1/4)O3-xPbTiO3 ceramics. J. Appl. Phys. 110, 064102 (2011).

31. 31.

Cheng, J.-R., Zhu, W., Li, N. & Cross, L. E. Fabrication and characterization of xBiGaO3-(1-x)PbTiO3: a high temperature reduced Pb-content piezoelectric ceramic. Mater. Lett. 57, 2090–2094 (2003).

32. 32.

Zhang, X. D., Kwon, D. & Kim, B. G. Structural evolution of a high T_C ferroelectric (x)Bi(Zn1/2Ti1/2)O3-(1-x)PbTiO3 solid solution. Appl. Phys. Lett. 92, 082906 (2008).

33. 33.

Suchomel, M. R. & Davies, P. K. Predicting the position of the morphotropic phase boundary in high temperature PbTiO3-Bi(B′B″)O3 based dielectric ceramics. J. Appl. Phys. 96, 4405–4410 (2004).

34. 34.

Duan, R., Speyer, R. F., Alberta, E. & Shrout, T. R. High Curie temperature perovskite BiInO3–PbTiO3 ceramics. J. Mater. Res. 19, 2185–2193 (2004).

35. 35.

Woodward, D. I. & Reaney, I. M. A structural study of ceramics in the x(BiMnO3)–(1-x)(PbTiO3) solid solution series. J. Phys. 16, 8823 (2004).

36. 36.

Ansell, T. Y., Nikkel, J., Cann, D. P. & Sehirlioglu, A. High temperature piezoelectric ceramics based on xPbTiO3-(1-x)Bi(Sc1/2Me1/4Ti1/4)O3 (Me = Zn, Mg) ternary perovskites. Jpn. J. Appl. Phys. 51, 101802 (2012).

37. 37.

Gridnev, S. A. & Ostapenko, S. P. Ferroelectric ceramics in the Bi(Zn1/2Zr1/2)O3-PbTiO3 system. In Applications of Ferroelectrics, 1996. ISAF '96., Proceedings of the Tenth IEEE International Symposium Vol.2, 895–898 (1996).

38. 38.

Nomura, S., Kaneta, K., Kuwata, J. & Uchino, K. Phase transition in the PbTiO3-A(B2/3Nb1/3)O3 (A = La, Bi; B = Zn, Mg) solid solutions. Mater. Res. Bull. 17, 1471–1475 (1982).

39. 39.

Xie, X. L. & Yao, Z. H. Morphotropic phase boundary in (1-x)Bi(Sc0.5In0.5)O3-xPbTiO3 high temperature piezoelectric system. J. Ceram. Soc. Jpn 117, 208–210 (2009).

40. 40.

Stringer, C. J., Shrout, T. R., Randall, C. A. & Reaney, I. M. Classification of transition temperature behavior in ferroelectric PbTiO3-Bi(Me′Me″)O3 solid solutions. J. Appl. Phys. 99, 024106 (2006).

41. 41.

Wei, H. Experimental Search for High Curie Temperature Piezoelectric Ceramics with Combinatorial Approaches. Ph.D. thesis, School Iowa State University (2011).

42. 42.

Grinberg, I., Suchomel, M. R., Davies, P. K. & Rappe, A. M. Predicting morphotropic phase boundary locations and transition temperatures in Pb- and Bi-based perovskite solid solutions from crystal chemical data and first-principles calculations. J. Appl. Phys. 98, 094111 (2005).

43. 43.

Qi, T., Grinberg, I. & Rappe, A. M. Correlations between tetragonality, polarization, and ionic displacement in PbTiO3derived ferroelectric perovskite solid solutions. Phys. Rev. B 82, 134113 (2010).

44. 44.

Balachandran, P. V., Broderick, S. R. & Rajan, K. Identifying the “inorganic gene” for high-temperature piezoelectric perovskites through statistical learning. Proc. R. Soc. A Math. Phys. Eng. Sci. 467, 2271–2290 (2011).

45. 45.

Choi, S. M., Stringer, C. J., Shrout, T. R. & Randall, C. A. Structure and property investigation of a Bi-based perovskite solid solution: (1-x)Bi(Ni1/2Ti1/2)O3-xPbTiO3. J. Appl. Phys. 98, 034108 (2005).

46. 46.

Ranjan, R. & Raju, K. A. Unconventional mechanism of stabilization of a tetragonal phase in the perovskite ferroelectric (1-x)PbTiO3–xBiFeO3. Phys. Rev. B 82, 054119 (2010).

47. 47.

Zhu, W.-M. & Ye, Z.-G. Improved dielectric and ferroelectric properties of high Curie temperature (1-x)BiFeO3-xPbTiO3 ceramics by aliovalent ionic substitution. Appl. Phys. Lett. 89, 232904 (2006).

48. 48.

Goldschmidt, V. M. Die gesetze der krystallochemie. J. Nat. 21, 477–485 (1926).

49. 49.

Rabe, K. M., Phillips, J. C., Villars, P. & Brown, I. D. Global multinary structural chemistry of stable quasicrystals, high-Tc ferroelectrics, and high-Tc superconductors. Phys. Rev. B 45, 7650–7676 (1992).

50. 50.

Villars, P., Cenzual, K., Daams, J., Chen, Y. & Iwata, S. Data-driven atomic environment prediction for binaries using the Mendeleev number: Part 1. Composition AB. J. Alloy. Compd. 367, 167–175 (2004).

51. 51.

Brown, I. D. Recent developments in the methods and applications of the bond valence model. Chem. Rev. 109, 6858–6919 (2009).

52. 52.

Pilania, G., Balachandran, P. V., Gubernatis, J. E. & Lookman, T. Classification of ABO3 perovskite solids: a machine learning study. Acta Crystallogr. B 71, 507–513 (2015).

53. 53.

Shannon, R. D. Revised effective ionic radii and systematic studies of interatomic distances in halides and chalcogenides. J. Acta Crystallogr. A 32, 751–767 (1976).

54. 54.

Balachandran, P. V., Young, J., Lookman, T. & Rondinelli, J. M. Learning from data to design functional materials without inversion symmetry. Nat. Commun. 8, 14282 (2017).

55. 55.

MacKinnon, D. P., Lockwood, C. M. & Williams, J. Confidence limits for the indirect effect: distribution of the product and resampling methods. J. Multivar. Behav. Res 39, 99–128 (2004).

56. 56.

Efron, B. & Gong, G. A leisurely look at the bootstrap, the jackknife, and cross-validation. Am. Stat. 37, 36–48 (1983).

57. 57.

Grinberg, I. & Rappe, A. M. Nonmonotonic TC trends in Bi-based ferroelectric perovskite solid solutions. Phys. Rev. Lett. 98, 037603 (2007).

58. 58.

Grinberg, I. & Rappe, A. M. First principles calculations, crystal chemistry and properties of ferroelectric perovskites. Phase Transit. 80, 351–368 (2007).

59. 59.

Abrahams, S. C., Kurtz, S. K. & Jamieson, P. B. Atomic displacement relationship to curie temperature and spontaneous polarization in displacive ferroelectrics. Phys. Rev. 172, 551–553 (1968).

60. 60.

Balachandran, P. V., Shearman, T., Theiler, J. & Lookman, T. Predicting displacements of octahedral cations in ferroelectric perovskites using machine learning. Acta Crystallogr. B 73, 962–967 (2017).

61. 61.

Jones, D. R., Schonlau, M. & Welch, W. J. Efficient global optimization of expensive black-box functions. J. Glob. Optim. 13, 455–492 (1998).

62. 62.

Sehirlioglu, A., Sayir, A., Dynys, F., Nittala, K. & Jones, J. Structure and piezoelectric properties near the bismuth scandium oxide–lead zirconate–lead titanate ternary morphotropic phase boundary. J. Am. Ceram. Soc. 94, 788–795 (2011).

63. 63.

Kowalski, B. A., Sehirlioglu, A., Dynys, F. W. & Sayir, A. Characterization of the high‐temperature ferroelectric (100−x−y) BiScO3–(x) Bi (Zr0. 5Zn0. 5) O3–(y) PbTiO3 perovskite ternary solid solution. J. Am. Ceram. Soc. 97, 490–497 (2014).

64. 64.

Kowalski, B., Sayir, A. & Sehirlioglu, A. Aliovalent Mn Ti and Ga Ti substitution in high-temperature piezoelectric (x) Bi (Zn 0.5 Zr 0.5) O 3—(y) BiScO 3—(100–x−y) PbTiO 3. J. Mater. Sci. 51, 6761–6769 (2016).

65. 65.

Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., & Leisch, F. e1071: misc functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. Note R package version 1.6-7. http://CRAN.R-project.org/package=e1071 (2015).

66. 66.

R Core Team. R: A Language and Environment for Statistical Computing Organization (R Foundation for Statistical Computing, Vienna, Austria, 2012). http://www.R-project.org/.

67. 67.

Balachandran, P. V., Kowalski, B., Sehirlioglu, A., & Lookman, T. Experimental search for high-temperature ferroelectric perovskites guided by two-step machine learning (datasets). https://doi.org/10.6084/m9.figshare.5687551.v1 (2018).

## Acknowledgements

P.V.B. and T.L. acknowledge funding support from the Center for Nonlinear Science (CNLS) and the LDRD project #20140013DR at Los Alamos National Laboratory (LANL). B.K. and A.S. were supported by the Air Force Office of Scientific Research (AFOSR) Grant #FA9550-0601-1-0260.

## Author information

Authors

### Contributions

The study was planned and the manuscript prepared by P.V.B., B.K., T.L., and A.S. Machine learning studies was performed by P.V.B and T.L. Experiments were performed by B.K and A.S. All authors discussed the results, wrote, and commented on the manuscript.

### Corresponding authors

Correspondence to Prasanna V. Balachandran, Alp Sehirlioglu or Turab Lookman.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

Reprints and Permissions

Balachandran, P.V., Kowalski, B., Sehirlioglu, A. et al. Experimental search for high-temperature ferroelectric perovskites guided by two-step machine learning. Nat Commun 9, 1668 (2018). https://doi.org/10.1038/s41467-018-03821-9

• Accepted:

• Published:

• DOI: https://doi.org/10.1038/s41467-018-03821-9

• ### Machine learning predictions of surface migration barriers in nucleation and non-equilibrium growth

• Thomas Martynec
• Christos Karapanagiotis
• Stefan Kowarik

Communications Materials (2021)

• ### Machine learning for perovskite materials design and discovery

• Qiuling Tao
• Pengcheng Xu
• Wencong Lu

npj Computational Materials (2021)

• ### Machine learning autonomous identification of magnetic alloys beyond the Slater-Pauling limit

• Yuma Iwasaki
• Masahiko Ishida

Communications Materials (2021)

• ### Emerging materials intelligence ecosystems propelled by machine learning

• Rohit Batra
• Le Song

Nature Reviews Materials (2021)

• ### Nanoparticle synthesis assisted by machine learning

• Huachen Tao
• Tianyi Wu
• Eugenia Kumacheva

Nature Reviews Materials (2021)