Introduction

Noncentrosymmetric (NCS) oxide ceramics that break all improper rotations and centres of symmetry are challenging to discover. Polar, piezoelectric and chiral materials, and those exhibiting circular dichroism (collectively referred to as NCS materials), are defined by the absence of inversion symmetry and are ubiquitous in nature, for example in the form of organic amino acids, sugars and other biological molecules1. Inorganic NCS materials containing oxide anions are also not uncommon2. Quartz crystals with a helical arrangement of corner-connected SiO4 tetrahedral units maintain the punctuality of our mechanical timepieces3. At inorganic crystalline surfaces, chirality plays a crucial role in corrosion processes, heterogeneous catalysis and the fidelity of enantioselective-based production or separation of industrial solvents, plastics and pharmaceutical drugs4. Pb(Zr,Ti)O3, BaTiO3 and BiFeO3 are some of the archetypal polar oxides that have impacted many critical technologies5. Often inorganic polar and chiral basic building units (BBUs) are selected and assembled together, but the acentric organization of BBUs within a unit cell is difficult to predict owing to the complex interplay of chemistry and structure.

In the context of inorganic oxides, which is the focus of this work, the design of NCS materials has relied mainly on BBUs with metal centres that have d0 electronic configurations or lone-pair cations, where the acentricity arises from an electronic origin due to the pseudo- or second-order Jahn–Teller (SOJT) effect6,7. A majority of inorganic oxides, however, strongly prefer close-packed arrangements of ions and highly symmetric cation coordination environments (for example, octahedra). This is mainly due to the dominant electrostatic effects that are optimized by favouring like–unlike interactions (that is, positive and negative dipoles align equally and oppositely), which stabilize atomic arrangements with inversion symmetry8. In fact, the presence of BBUs with d0 metal centres alone is not a sufficient condition for designing NCS materials. For example, the perovskite SrTiO3 is a quantum paraelectric or incipient ferroelectric9, whereas the isoelectronic layered Ruddlesden-Popper (RP) Sr2TiO4 is a centric dielectric10. Hence, it is the complex interplay between structure and chemistry that determines the formation of NCS inorganic oxides.

As an alternative to pseudo-JT or SOJT effects, the ‘trilinear coupling’, ‘hybrid improper’ or ‘geometric ferroelectricity’ mechanism, where two nonpolar lattice distortions (octahedral rotations or tilting) couple to a polar lattice mode, has also been shown to break the inversion symmetry with interesting technological consequences11,12. Even in this case, no a priori rules exist that guide the design of new hybrid improper ferroelectric materials, unless exhaustive calculations are carried out to map the chemical and energy landscape that subsequently informs experiments12. As a result, NCS inorganic oxides are challenging to discover.

Although high-throughput first principles-based methods have shown promise in the design of NCS half-Heusler alloys13, exhaustive calculations for more complex crystal structures with numerous polymorphs (such as the RPs) and thousands of unexplored chemical compositions have not (yet) been demonstrated. This is partly because the potential energy surface of complex oxides is difficult to navigate. Phonon instabilities at high-symmetry points away from the Γ-point in the irreducible Brillouin zones cause the primitive unit cell to multiply several fold, resulting in large system sizes and vast numbers of unique atomic arrangements. It is challenging to rigorously evaluate the energetics of all structures in a high-throughput manner. Furthermore, chemistries with partially filled d (and/or f) orbitals and the existence of energetically competing ground states complicate the structure prediction process. As a result, novel approaches are desired to guide the first principles calculations in an effective manner. Materials informatics, a growing field at the intersections of many scientific disciplines including data and information science, statistics, machine learning (ML) and optimization, has the potential to accomplish this objective14.

Here we develop a predictive data-driven computational framework that unites applied group theory, informatics techniques and ab initio electronic structure calculations for designing novel NCS materials. We apply it to the two-dimensional n=1 RP structure family (Fig. 1a), for which to date few compositions exist in NCS crystal classes15,16,17. Nonetheless, the chemical search space is vast (Fig. 1b). We use informatics-based methods to screen the chemical space and downselect 242 compositions that show greater promise for NCS ground states. The potential for discovering novel NCS n=1 RP compounds has key implications in technological applications that require a broad range of functionalities including high-temperature piezoelectricity, tunable bandgaps, improper ferroelectricity, multiferroicity and thermoelectricity. We focus in detail on the design of NaRSnO4 stannates and NaRRuO4 ruthenates (where R=La, Pr, Nd, Gd or Y) that were predicted to have NCS ground state structures from informatics and subsequently validated by density-functional theory (DFT). For the stannates, which are candidate materials for sensors and transparent conducting oxides18, we find two energetically competing NCS ground state phases: P21m (piezo-active) and P21212 (chiral and piezo-active). We calculate their electronic bandgaps in the P21m crystal symmetry using hybrid exchange-correlation functionals, finding optical transparency in the visible light regime. We also compute their piezoelectric responses, which show a dependence on R-cation size. In sharp contrast, the NCS NaRRuO4 are magnetic with metallic, half-metallic or insulating electronic structures. Their ground state is determined to be either piezo-active with P21m symmetry when R=La, Pr and Nd or polar with Pca21 symmetry when R=Gd and Y. Moreover, there is a transition from ferromagnetic metallic (R=La) or half-metallic (R=Pr, Nd) to antiferromagnetic insulating (R=Gd, Y) character as a function of R-cation size. Therefore, these bulk ruthenates are predicted to belong to the intriguing class of NCS metals19,20 and half-metals with piezo-active symmetries or antiferromagnetic insulators with polar symmetry (that is, multiferroics). Last, we also test our predictions for an additional nine new compounds with different cations occupying the B-sublattice of the RP structure (shown in Fig. 1a). Among them, seven were validated to have an NCS ground state structure: NaLaZrO4, NaLaHfO4, KBaNbO4, NaLaIrO4, NaCaTaO4, SrYGaO4 and SrLaInO4. These results establish our computational framework as a powerful tool for crystal symmetry classification, structure-based property design and control.

Figure 1: Octahedral connectivity of n=1 RP oxides and the chemical search space.

(a) The n=1 RP phase has a single layer of octahedra that are connected in two dimensions, shown within brackets, whereas there is no connectivity in the third dimension. (b) Periodic table showing the potential 30 A-site and 19 B-site elements that occupy the n=1 RP phase. In principle, there are more than 19 B-site elements when we also consider the multiple valence states of certain elements (for example, Mn, Fe, Co, Ni and so on). This defines the chemical space for our informatics approach.

Results

Approach

Our search for NCS oxides relies on a multifaceted theoretical approach, which reformulates the discovery objective into identifying structure–chemistry interrelationships (as shown in Fig. 2). The design strategy focuses on three key criteria obtained by subdividing the design process into unique objectives with specific tasks:

  • Structural: How can the atomic structure, or configuration of oxygen octahedra BBUs, be designed to support the desired interaction?

  • Chemical: Which combinations of chemistries will promote that structural configuration?

  • Stability: Is the proposed composition the global ground state?

Figure 2: Predictive materials discovery framework.

Synergistic integration of applied group theory, materials informatics and ab initio electronic structure calculations for designing novel functional materials. Applied group theory determines the geometric rules, uncovers the crystallographic symmetry restrictions and then shows how to lift them to achieve NCS structures for a given crystal structure topology. Materials informatics uses the data from experiments, features (such as orbital radii) that capture the chemical trends in the constructed data set and statistical inference tools to extract quantitative chemistry-symmetry relationships (QCSR) that guide the selection of chemical compositions. DFT calculations validate the predictions from materials informatics. We then recommend the validated chemical compositions for experimental synthesis and characterization, eventually leading to their discovery. Experimentally synthesized compositions augment the training set for a second materials informatics iteration and the process repeats until the desired materials are discovered14. In this paper, we focus on computational tasks 2 and 3 (boxed).

Following classification learning from informatics and evaluation of the energetic stability from first principles methods, the final design relies on response optimization by leveraging additional degrees of freedom to further promote the targeted behaviour. Some of the strategies include searching for microscopic mechanisms and external conditions (such as epitaxial strain) to energetically stabilize those geometries. We note that this paper is a significant advancement from the earlier work of Balachandran et al.15 where the emphasis was on enumerating symmetry guidelines.

Group theory

In an earlier work, Balachandran et al.15 formulated symmetry guidelines for exploring and designing NCS phases in the n=1 RP structures based on group theory. Therefore, we discuss only the key results here. Starting from the centrosymmetric (CS) aristotype structure (shown in Fig. 3a), various symmetry-allowed cooperative atomic displacements (also referred to as ‘shuffles’) were enumerated that transform the aristotype CS structure to an NCS structure of lower symmetry. In particular, the focus was on CS→NCS phase transitions that are second order or weakly first order, where the symmetry-lowering distortions arise from (i) non-polar octahedral distortions (tilting or rotations) due to phonon softening at the zone boundaries in the BZ of the I4/mmm space group, (ii) A/A′ cation ordering, (iii) the interplay between two or more octahedral distortions and (iv) the interplay between octahedral distortions and A/A′ cation ordering. The necessity to search for alternative routes to breaking inversion symmetry was motivated by the fact that NCS phases are seldom seen in n=1 RPs, which has been explained by the disconnected octahedral layers destroying the coherency required for cooperative off-centring displacements, and thus ferroelectricity21.

Figure 3: A/A′ cation ordering and octahedral tilting in the n=1 RPs for NCS materials design.

(a) High-symmetry aristotype structure (φ, I4/mmm). (b) One of the A/A′ cation ordering schemes (irrep: (η1); space group (s.g.): P4/nmm). (c) Out-of-phase octahedral tilting (oxygen displacements indicated using arrows) (irrep: (η1,η1); s.g.: P42/ncm) and lattice constants a and b are of equal length. (d) Out-of-phase octahedral tilting (irrep: (η1,η2); s.g.: Pccn) and lattice constants a≠b. (e) Coupled distortions (irrep: (0,η1;η2,0); s.g.: Pbca), where (0,η1) and (η2,0) represent Jahn–Teller-like out-of-plane compression and out-of-phase octahedral tilting, respectively.

Balachandran et al.15 found three important symmetry guidelines (given in the rows of Table 1) for lifting parity in the n=1 RP structures. Note that all involve A/A′ cation ordering (Fig. 3b) that transform as irreducible representation (irrep) and couple with octahedral rotations or tilting (as shown in Fig. 3c–e). The structural attributes may be satisfied by any of the following approaches:

Table 1 Irreps, OPDs, SGs and mode representation of distorted structures arising from rotational modes ( and ) and A-site cation ordering ().

Route 1: Out-of-phase octahedral tilting that transform as irrep with order parameter direction (OPD) (η1,η1), which on superposition with irrep (η1) would yield a piezoelectric (P21m) space group (Fig. 3c).

Route 2: Out-of-phase tilting that transform as irrep with OPD (η1,η2) on superposition with (η1) would yield a chiral (P21212) and piezo-active space group (Fig. 3d).

Route 3: Coupled irrep with OPD (0,η1;η2,0) when superposed with irrep (η1) would yield a polar (Pca21) space group (Fig. 3e), where the matrix elements of and irreps accommodate atomic displacements that correspond to Jahn-Teller-like distortions and out-of-phase tilting, respectively.

Note that there is another type of A/A′ cation ordering, transforming as irrep , which lifts inversion solely from the ordering (we refer to it as the trivial case). However, we do not consider this type of A/A′ cation ordering here. Therefore, the key materials design question is: what combinations of chemical elements from the vast chemical space would stabilize these NCS phases? We address this question using materials informatics.

Materials informatics

In Fig. 4, we show the frequency of occurrence of experimentally known crystal symmetries in the bulk n=1 RPs. We report only the low temperature crystal symmetries in Fig. 4 and do not explicitly consider temperature dependence of the crystal structures in our informatics analysis. Our definition of low temperature includes experimentally observed structures ≤300 K. Some RP compounds also undergo structural transformation at a much lower temperature (for example, La2NiO4 (ref. 22)). Under such circumstances, we take the lower temperature crystal structure to be our label for informatics. This simplification was necessary because 0 K DFT calculations are used to validate the informatics-based predictions. Balachandran et al.15 showed that as the temperature increases, the propensity for forming high-symmetry phases also increases. We anticipate those results to hold here.

Figure 4: Distribution of experimentally known RP oxides.

Our survey resulted in a total of 84 compounds, which we note represents only a small fraction of the overall combinations of hypothetically feasible chemistries. Except for the nine compounds indicated in space groups P21m and Imm2, there are no other experimental reports of NCS phases in n=1 RP oxides. Inset: The space groups are transformed into their corresponding irreducible representations (irreps) and A/A′ cation ordering is not explicitly considered. The symbol φ denotes no octahedral rotation or tilting. Irreps that we target for NCS materials design are indicated using the dotted rectangle in the inset.

Our literature survey shows that 45% of the compositions are undistorted (denoted as φ in Fig. 4). Similarly, there are also a significant number of compositions that undergo symmetry-lowering distortions, albeit preserving the spatial inversion symmetry. One of the key observations from Fig. 4 is that there are only nine compounds with NCS space groups that conform with our chemical search space (Fig. 1b). In the literature, the cation-ordered NaRTiO4 and LiRTiO4 families (the latter found only recently), where R=La, Nd, Dy, Gd, Sm, Ho, Eu and Y, have been experimentally shown16,17 to have the piezoelectric P21m space group [ (η1,η1;η1)]. The nominal electronic configuration of Ti4+ in these compounds is d0. The coupling between TiO6 octahedral tilting (which transforms as irrep (η1,η1), as shown in Fig. 3c) and Li/R or Na/R cation ordering (which transforms as irrep (η1), as shown in Fig. 3b) lifts the inversion symmetry, in accordance with Route 1. The only other experimentally known polar n=1 RP oxide is the A- and B-site-ordered (LaSr)(Li0.5Ru0.5)O4 compound, which is reported in the NCS Imm2 space group23. In this compound, A-site and B-site cation ordering work in concert to lift the inversion symmetry. In addition to these compounds, Pb2TiO4, Ca2IrO4, Sn2SnO4, cation-ordered LaANiO4 (A=Sr, Ca and Ba), LaSrAlO4 and LaSrMnO4 have also been theoretically predicted to have NCS structures15,24,25,26,27,28; however, these results have not been experimentally verified. Recently, the metastable Ca2IrO4 was epitaxially grown on a YAlO3 substrate in the n=1 RP phase using pulsed laser deposition29. However, the authors did not report its crystal symmetry. Therefore, we do not consider these chemistries in our informatics analysis.

In the family of n=1 RPs with relatively simple stoichiometries such as AA′BO4, where A and A′ are two chemical species (similar or dissimilar) occupying the A-site and B is a cation with 6-fold octahedral coordination, there are 3,200 potential chemical compositions that satisfy crystal chemistry and stoichiometric guidelines (for example, charge neutrality), and therefore are, in principle, amenable to experimental synthesis. However, only 3% have been experimentally synthesized, and among these, only nine have NCS phases. The objective of our informatics analysis is to utilize statistical inference and ML methods for establishing quantitative chemistry-symmetry relationships (QCSR) of known materials in Fig. 4. These QCSRs, in turn, serve as a guide to rapidly screen the vast chemical space and identify new, previously unexplored compositions that favour the distortions given in Table 1.

Data set

In our ML approach, we build a data set of experimentally known materials that includes both CS and NCS structures. Even though our computational design focuses on AA′BO4 stoichiometries, our training data set includes RP compositions that deviate from the AA′BO4 stoichiometry (see data set in the Supplementary Information). We describe each n=1 RP composition uniquely in terms of its crystal symmetry or irrep (referred to as ‘class label’ in the ML jargon) and a set of features. We use Waber–Cromer orbital radii as features for ML30. Orbital radii and distortion modes have been utilized in the past for predicting structures and formabilities of complex oxides31,32. Our ML objective is to build a classification model that predicts crystal symmetries or irrep labels from orbital radii. All 83 experimentally known RP chemical compositions (after removing (LaSr)(Li0.5Ru0.5)O4, because we do not consider the element Li in our chemical space, see Fig. 1b) were written in the simplified A2BO4 stoichiometric form, where the A- and B-sites can have two or more elements with partial site occupancies. We used a total of 12 and 10 orbital radii features to describe the A- and B-sites, respectively. If there were two or more elements occupying either the A- or B-sites, then linear combinations weighted by their relative stoichiometric proportions were used to build the features.
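As an illustration of the stoichiometry-weighted feature construction described above, the following minimal Python sketch averages per-element feature vectors over the elements sharing a site. The function name `site_features` and the numerical values are hypothetical placeholders; they are not the Waber–Cromer radii used in this work.

```python
def site_features(occupancy, radii):
    """Stoichiometry-weighted average of per-element feature vectors for one site.

    occupancy: dict mapping element -> amount on the site, e.g. {"La": 1.0, "Sr": 1.0}
    radii:     dict mapping element -> list of orbital-radius features
    """
    total = sum(occupancy.values())
    n_features = len(radii[next(iter(occupancy))])
    averaged = [0.0] * n_features
    for element, amount in occupancy.items():
        weight = amount / total  # relative stoichiometric proportion
        for i, value in enumerate(radii[element]):
            averaged[i] += weight * value
    return averaged


# Placeholder numbers for illustration only (NOT real Waber-Cromer radii).
toy_radii = {"La": [1.0, 2.0], "Sr": [3.0, 4.0]}

# A-site of a hypothetical (LaSr)BO4 composition: equal La and Sr occupancy.
lasr = site_features({"La": 1.0, "Sr": 1.0}, toy_radii)
```

In the actual data set, the same weighting would be applied separately to the 12 A-site and 10 B-site features before concatenating them into a single 22-component descriptor.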

We constructed two data sets for classification learning that use: (i) space groups as class labels (an obvious choice) and (ii) irreps corresponding to octahedral tilting, rotations, or lack thereof as class labels. Here, we focus mainly on the ML results from the latter data set (case (ii)) that uses irreps as class labels, which allows us to elegantly isolate octahedral rotations or tilting from cation ordering. As a result, we can group or combine two space groups under the same label. For example, we combine compositions with the I4/mmm and P4/nmm space groups together (under the label, φ), because in both cases there are no octahedral rotations or tilting. One of the key differences between I4/mmm and P4/nmm is that in P4/nmm the A-site Wyckoff orbit is split into two unique crystallographic sites15. Similarly, we can combine space groups P21m and P42/ncm into a single irrep, (η1,η1). Such data transformation reduces the number of unique class labels from 9 to 7 (see inset in Fig. 4) for classification learning. The main disadvantage of such grouping is that our QCSR model can no longer distinguish between ordered and disordered structures. This should not affect our NCS materials design goal because of advancements in the nonequilibrium synthesis and processing of these oxides. Recently, there have been experimental demonstrations of layer-by-layer growth of A/A′ cation-ordered n=1 RPs using molecular beam epitaxy with unprecedented control33. We also tested the predictive power of our ML models by intentionally leaving out 14 compounds during training (which reduces the size of our training set from 83 to 69 compounds). One of our informatics goals is to validate whether our classification learning can identify the labels correctly for the left-out compounds, before using the models to make new predictions.

Even after reducing the number of unique class labels from 9 to 6 (since there is only one chemical composition with irrep , which we do not consider for ML), we must still address the problem of class imbalance, where some irrep class labels are found more frequently than others. This kind of class imbalance is problematic for ML. To test the implications of class imbalance, we trained a decision tree classification model using the imbalanced data set and found that compositions with space group Pccn or (η1,η2) were 100% misclassified. As shown in Table 1 and Fig. 3, Pccn or (η1,η2) is one of the desired class labels for designing NCS materials. Therefore, the class-imbalance problem must be addressed.

A number of methods have been developed in the computer science and artificial intelligence literature to overcome the class-imbalance problem34,35. They include oversampling (that is, randomly duplicating instances of the under-represented class category), undersampling (random removal of instances of the most frequently occurring class) and interpolation schemes. In this work, we utilize an oversampling scheme referred to as the synthetic minority class oversampling technique (or SMOTE)34, in which the under-represented class labels are oversampled by creating ‘synthetic’ examples of extra or fictitious training data points from the original imbalanced data. It is based on a k-nearest-neighbour analysis and one of its main advantages (relative to other algorithms) is that the extra data points, in principle, inform the ML models to create larger and less specific decision regions. Additional details about the algorithm are described in the Methods section.
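The interpolation idea behind SMOTE can be sketched in a few lines of Python. This is a simplified illustration and not the implementation used in this work: base points are drawn at random rather than visited in turn, and the function name `smote`, the neighbour count `k=2` and the toy coordinates are all assumptions made for the example.

```python
import math
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority-class points by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # random position along the line segment from base to neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class in a two-dimensional feature space (hypothetical values).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_synthetic=6)
```

Because every synthetic point lies on a segment joining two existing minority-class points, the new data stay inside the region already spanned by that class, which is consistent with the expectation that oversampling should not distort the data manifold.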

We took the data set that contained irreps as class labels and applied SMOTE to construct synthetic data points for the two irrep labels, P4 and (η1,η2). We created a total of three and six synthetic data points for the under-represented P4 and (η1,η2) labels, respectively. Our training data set size now increased to 78 compounds (69 originally+9 from SMOTE) for classification learning. We confirmed using principal component analysis (PCA) that SMOTE did not affect our data manifold (Supplementary Fig. 1).

Data preprocessing

Our NCS materials design is initiated by exhaustively enumerating, at first, all possible AA′BO4 combinations that satisfy crystal chemistry and stoichiometric rules (for example, charge neutrality). As noted before, we use Waber–Cromer orbital radii as features. We then augment this exhaustive data set with the 78 n=1 RPs. Note that at this point, we do not include the irrep class labels in our data set. Now, we have a total of 3,253 chemical compositions and 22 orbital radii features.
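The charge-neutrality part of this enumeration can be sketched as follows. The ion lists and formal charges below are a small hypothetical subset chosen for illustration, and the additional crystal-chemistry rules applied in this work are not reproduced here; the function name `enumerate_aabo4` is likewise an assumption.

```python
from itertools import combinations_with_replacement


def enumerate_aabo4(a_site, b_site):
    """List (A, A', B) combinations whose formal charges balance four O2- anions.

    a_site, b_site: dicts mapping an ion label to its formal charge.
    """
    oxygen_charge = -2 * 4
    found = []
    # Unordered pairs, so that (A, A') and (A', A) are counted once;
    # A = A' is allowed at this stage.
    for a1, a2 in combinations_with_replacement(sorted(a_site), 2):
        for b in sorted(b_site):
            if a_site[a1] + a_site[a2] + b_site[b] + oxygen_charge == 0:
                found.append((a1, a2, b))
    return found


# Hypothetical, deliberately tiny ion lists.
a_site = {"Na": 1, "Sr": 2, "La": 3}
b_site = {"Ga": 3, "Sn": 4, "Nb": 5}
candidates = enumerate_aabo4(a_site, b_site)
```

For these toy lists the sketch returns four charge-balanced combinations, including the Na/La/Sn combination that corresponds to the NaRSnO4 stannates discussed later.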

We autoscaled the data (normalized to zero mean and unit variance) and applied PCA, which constructs linear combinations of weighted contributions of orbital radii (see Supplementary Figs 2 and 3). In a recent work, Balachandran et al.36 showed that in a data set containing orbital radii as features, PCA removes redundancy of information, reduces data dimensionality and constructs physically meaningful linear combinations of orbital radii (see Supplementary Note 1). In addition, principal components (PCs) are also independent of one another (assuming Gaussian or Normal distribution). After PCA, we reduced the dimensionality of our data set from 22 orbital radii features to 8 PCs, which together capture >90% of total variance in the data set. We then identify and isolate 78 chemical compositions for which the irrep labels are experimentally known; we refer to this data set as the training set. The remaining compositions are referred to as the ‘virtual set’ defining the vast chemical search space yet to be explored for new NCS materials design.
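Two steps of this preprocessing lend themselves to a short sketch: autoscaling each feature and choosing how many PCs to retain from the explained variance. The Python below assumes the population standard deviation for scaling and takes the PCA eigenvalues as given; the helper names and the numbers are hypothetical, and the eigen-decomposition itself is omitted.

```python
import statistics


def autoscale(columns):
    """Scale each feature column to zero mean and unit variance.

    Assumes no column is constant (a zero standard deviation would divide by zero).
    """
    scaled = []
    for col in columns:
        mean = statistics.fmean(col)
        std = statistics.pstdev(col)  # population standard deviation (an assumption)
        scaled.append([(x - mean) / std for x in col])
    return scaled


def components_needed(eigenvalues, threshold=0.90):
    """Smallest number of leading PCs whose cumulative variance exceeds the threshold."""
    total = sum(eigenvalues)
    running = 0.0
    for count, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += value
        if running / total > threshold:
            return count
    return len(eigenvalues)


# Hypothetical feature columns and PCA eigenvalues, for illustration only.
scaled = autoscale([[1.0, 2.0, 3.0], [10.0, 20.0, 60.0]])
k = components_needed([5.0, 3.0, 1.5, 0.5])
```

Applied to the 22 autoscaled orbital radii features, the same cumulative-variance criterion is what leads to the 8 retained PCs quoted above.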

Classification learning

We utilized the J48 decision tree classification learning algorithm, as implemented in WEKA, for establishing QCSR37,38. The reasons for choosing the J48 algorithm are discussed in the Methods section. We constructed five bootstrapped samples of 78 compositions each from the original training set. We then trained the decision tree algorithm using the five bootstrapped samples and constructed five decision tree models (Supplementary Figs 4–8). The classification accuracies for the five decision tree models were evaluated on the training data set and by 10-fold cross-validation. The results are given in Supplementary Table 1 and Supplementary Note 2. The average classification accuracy from the five bootstrapped decision trees using the 10-fold cross-validation is 80%. These results indicate that more accurate QCSR models could potentially be formulated either through alternative feature selection methods39 or by utilizing other (kernel-based) ML algorithms (which we do not address here). Furthermore, we also tested our decision trees to determine whether they could correctly identify the irrep labels for 14 compounds, which were intentionally held out during the training process. Results are given in Table 2. Our ensemble of decision trees correctly labelled 12 out of 14 compounds in the independent test set with ≥60% accuracy (the two exceptions being YSrCrO4 and Ca2CrO4), giving confidence in our classification learning.

Table 2 A comparison between experimental and predicted irreps to independently validate the classification models.

Using the five bootstrapped decision trees, we screened a total of 3,175 compositions in the virtual set and filtered 242 new compositions that showed potential for NCS ground state structures. At this stage, we retained only those compositions that were identified to be NCS, that is, belonging to either (η1,η1), (η1,η2) or (0,η1;η2,0), by at least three out of the five decision trees. We then created additional filters to remove data points that contained (i) toxic elements, such as Pb, Hg and Cd, (ii) compositions where both A and A′ sites were occupied by the same element and (iii) compositions with A or A′ site elements that were not part of the original training data set (for example, Cs, Rb, Tl, Ag and Mg).
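The retention rule used at this stage (at least three of the five trees assign one of the targeted NCS labels) can be written as a short Python filter. The label strings and compound names below are hypothetical stand-ins for the irrep labels and compositions; note that the votes need not agree on the same NCS label.

```python
# Stand-ins for the three targeted NCS irrep labels (hypothetical names).
NCS_LABELS = {"NCS-1", "NCS-2", "NCS-3"}


def passes_vote(tree_predictions, min_votes=3):
    """True if at least min_votes trees predict any of the targeted NCS labels."""
    votes = sum(1 for label in tree_predictions if label in NCS_LABELS)
    return votes >= min_votes


# Hypothetical predictions from five decision trees for three compositions.
virtual_set = {
    "compound-1": ["NCS-3", "NCS-2", "CS", "NCS-3", "NCS-2"],  # 4 NCS votes
    "compound-2": ["CS", "CS", "NCS-1", "CS", "NCS-1"],        # 2 NCS votes
    "compound-3": ["NCS-1", "NCS-1", "NCS-1", "CS", "CS"],     # 3 NCS votes
}
shortlist = [name for name, preds in virtual_set.items() if passes_vote(preds)]
```

A composition such as `compound-1`, where the trees agree on NCS character but disagree on the specific label, is retained; this kind of disagreement is a natural indicator of prediction uncertainty.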

We note that some disagreement is expected between our predictions and experiments (or calculations), particularly when concerned with the transition metal elements whose valence state falls within the strong electron correlations regime (for example, Ti3+, Cr3+, V3+, Mn3+ and so on), mainly because there were very few instances of chemical compositions with these transition metal cations in our training set. Our refined results, after screening through various filters and removing chemical compositions that could fall in the strongly correlated regime, included a total of 242 new chemical compositions that show promise for NCS structures.

The following octahedral B-site cations in the virtual set are predicted to have NCS structures in the n=1 RP oxides: Ga3+, In3+, Ti4+, Zr4+, Ru4+, Sn4+, Hf4+, Ir4+, Nb5+ and Ta5+. We could also exclude In3+, because of the experimental difficulties in forming n=1 RP structures using equilibrium synthesis and processing techniques40 (although we do not preclude stabilizing In-based n=1 RPs using non-equilibrium methods). The chemical compositions for all predicted NCS materials are listed in Table 3. Additional details can be found in Supplementary Table 2, Supplementary Note 3 and the data sets can be downloaded from ref. 41. To summarize, using informatics we identified 242 new n=1 RP chemical compositions with potential for NCS crystal structures, which significantly expands the chemical space of NCS n=1 RP oxides (25-fold increase).

Table 3 Full list of 242 predicted AA′BO4 RP compounds from classification learning that show propensity towards NCS structures.

Density-functional theory

On the basis of the group theory and materials informatics analysis, we first validate our predictions by assessing the energetic stability component (Task 3 in Fig. 2) for ten downselected NaRSnO4 and NaRRuO4 compounds, where R is a rare-earth element (R=La, Pr, Nd, Gd and Y) using DFT calculations. In our calculations, Na1+ and R3+ cations were ordered in accordance with the irrep label (η1), as shown in Fig. 3b. To the best of our knowledge, no previous experimental or theoretical data exist for either NaRSnO4 or NaRRuO4 compounds. In addition, stannates have implications in the design of transparent conducting oxides18 and ruthenates are potential materials for investigating metal–insulator transitions42.

We choose especially NaRSnO4 and NaRRuO4 for validation, motivated (albeit naively) by the adaptive design paradigm14, where the objective is to iteratively improve the predictions of the classification model. Typically, the improvements are made by choosing chemical compositions for experiment that show promising characteristics (such as NCS crystal classes as discussed here), yet have large uncertainties. Here, NaRSnO4 and NaRRuO4 satisfy these criteria, because the predictions from the five decision trees were (NCS), (η1,η2) (NCS), (0,η1) (CS), (NCS) and (η1,η2) (NCS), corresponding to Pca21 (polar), P21212 (chiral), Pbcm (centrosymmetric), Pca21 (polar) and P21212 (chiral) space groups, respectively. Four out of the five decision trees predict these compounds to have a chiral or polar structure, making them promising NCS candidates, yet the irrep labels or space groups are different, indicating uncertainty. Furthermore, with stannates the nominal electronic configuration of Sn4+ (4d10) is different from that of SOJT-cation Ti4+ (3d0), thereby presenting an interesting case for comparison between the two B-site octahedral cations. The Shannon ionic radii for Sn4+ and Ti4+ in the six-fold coordination are 0.69 and 0.605 Å, respectively43, making their ionic sizes within the hard-sphere model also different. Similarly, ruthenates (with Ru in nominally 4+ ionic state) have partially filled 4d electrons with four electrons occupying the t2g orbital manifold and are quite distinct from the 3d0 titanates.

Stannates

We performed full structural relaxations for NaRSnO4 (where R=La, Pr, Nd, Gd and Y) within the generalized gradient approximation (cf. Methods). The phonon dispersions are given in Supplementary Fig. 9, from which we identify a common set of six candidate crystal symmetries from ‘freezing in’ the imaginary phonon modes of the high-symmetry paraelectric reference phase (P4/nmm) for determining the ground state structure. They include Pmn21, Pc, P21m, P2m, I2m and Pnma. In addition to these six crystal symmetries, we also considered three more symmetries, namely P21212, Pbcm and Pca21, as recommended by ML, to unambiguously confirm the ground state. Therefore, in total, we considered nine distorted candidate structures. The total energy data from DFT calculations are given in Table 4, which shows that all stannates exhibit a strong energetic competition between the NCS piezoelectrically active P21m [ (η1,η1)] and chiral P21212 symmetries [ (η1,η2)]. We find that the total energy difference is <0.1 meV per f.u. (Table 4) between the two NCS phases. A closer examination of the two converged crystal structures revealed that they differ mainly in the in-plane lattice parameters (in P21m a=b, whereas in P21212 a≠b, as shown in Fig. 3c,d, respectively). Furthermore, in P21212 the in-plane lattice constant a was found to differ from b only at the fourth or fifth decimal place. Therefore, we assign the ground state structure to be the NCS P21m space group for the stannates. We conclude from our DFT calculations that the RP stannates are NCS, in good agreement with the insights from ML, and that the inversion symmetry is broken due to the coupled action of SnO6 oxygen octahedral tilting and Na/R cation ordering (Route 1).

Table 4 The total energy difference and thermodynamic stability for different known and predicted RP phases from Quantum ESPRESSO63.

We then computed the bandgaps (Eg) for each of the compounds using the HSEsol exchange-correlation functional (which often more accurately reproduces experimental results44) and found them to be in the range 4.3 to 4.5 eV (Table 5), similar to Ba2SnO4 (Eg=4.41 eV)18. The amount of exact exchange used in the calculations was tuned using the known experimental bandgap of BaSnO3 (ref. 45).

Table 5 Bandgap (Eg in eV) at the HSEsol level for each NaRSnO4 compound from VASP69,70 in the NCS P21m space group.

We next computed the piezoelectric strain coefficients (dij) for each compound in the P21m space group (Fig. 5). The dij response is marginally smaller than that reported for the titanates16, but follows the same trend: it increases with decreasing rare-earth cation radius up to R=Gd and then decreases.

Figure 5: Calculated piezoelectric coefficients.

Piezoelectric strain coefficients (y axis) for the P21m NaRSnO4 structures as a function of the rare-earth cation ionic size in Å, rRE (x axis). There are three symmetry-allowed dij components (d14, d25 and d36), two of which are equivalent (d14=d25).

Ruthenates

All DFT calculations were performed using the spin-polarized DFT+U method, where an effective Hubbard-U of 1.5 eV was used to treat the correlated Ru 4d electrons (cf. Methods). The phonon dispersions are given in Supplementary Fig. 10 and show some similarities with the stannates. We explored a total of nine distorted crystal symmetries to determine the ground state (six from phonon calculations and three from ML). The total energies from DFT+U for NaRRuO4 in different crystal symmetries and ferromagnetic spin order are given in Table 4; the ground state is determined to be NCS for NaLaRuO4, NaPrRuO4 and NaNdRuO4 with two competing structures, P21212 and P21m. Moreover, in the P21212 symmetry, a was found to be not equal to b only at the fourth decimal point (similar to the stannates). We also performed additional DFT+U calculations for the top two lowest energy structures (namely P21m and Pca21), where we now impose antiferromagnetic spin order on the in-plane Ru atoms (shown schematically in Supplementary Fig. 11). The total energy results are given in Table 6, from which we conclude that the NCS P21m space group with ferromagnetic Ru4+–O2−–Ru4+ interactions is the likely ground state for these compounds (Route 1).

Table 6 Total energy difference (ΔE in meV per atom) with respect to the lowest energy structure for NaRRuO4 in the P21m and Pca21 structures with both FM and AFM spin configurations.

In the case of NaGdRuO4 and NaYRuO4, the ground state structure is also determined to be NCS, but in the polar Pca21 crystal symmetry (see Table 4). Furthermore, in NaGdRuO4 and NaYRuO4, the Pca21 structure with in-plane antiferromagnetic Ru4+–O2−–Ru4+ interactions (Supplementary Fig. 11) was found to be 1.44 and 5.54 meV per atom lower in energy, respectively, than the corresponding ferromagnetic structure. The total energy data along with the Ru-atom magnetic moments are given in Table 6. Thus, we predict NaGdRuO4 and NaYRuO4 to have polar Pca21 ground state structures (Route 3) with antiferromagnetic spin order.

We also calculated the electronic band structures for all five NaRRuO4 compounds in their respective ground states. The results are shown in Supplementary Fig. 11. We find that NaLaRuO4 is metallic, with bands crossing the Fermi level in both the spin-up and spin-down electron channels. On the other hand, NaPrRuO4 and NaNdRuO4 are found to be half-metals, that is, bands cross the Fermi level only in the spin-down channel and a gap appears in the spin-up channel. Moreover, the size of the gap increases as the rare-earth cation size decreases. This occurs because the relative amplitude of RuO6 octahedral tilting also increases with decreasing rare-earth cation size, impacting the electronic bandwidths of the Ru-t2g orbitals. Note that this is not the first time that ferromagnetic metals or half-metals have been reported in ruthenium-based oxides46,47. However, our intriguing finding is that the NaLaRuO4, NaPrRuO4 and NaNdRuO4 RP oxides are also NCS with piezo-active symmetries. Thus, these compounds add to the growing list of NCS metals19,20 or half-metals with unusual coexisting properties (broken inversion symmetry and metallic-like conduction).

In contrast, the NCS NaGdRuO4 and NaYRuO4 are found to be insulating with a gap appearing in both spin-up and spin-down electron channels (see Supplementary Fig. 11). We note that ruthenium oxides with antiferromagnetic insulating ground states are also not uncommon. For example, RP Ca2RuO4 is a known antiferromagnetic insulator in the CS Pbca space group (Fig. 3e) at low temperatures48,49. Thus, we predict NaGdRuO4 and NaYRuO4 as potential multiferroics with polar symmetry, antiferromagnetic spin order and a bandgap. Are these stannates and ruthenates also thermodynamically stable? We address this question in the next section.

Thermodynamic stability

We use grand canonical linear programming50 to determine the thermodynamic stability of the predicted RP stannates and ruthenates. The ‘reservoir’ of stable compounds present in the Open Quantum Materials Database51 was chosen to describe the theoretical convex hull. The process involves calculating the total energy change (ΔED) for a chemical reaction involving reactants that are known to be thermodynamically stable and a product, which is the ground state structure of our predicted RP compound. Compounds with negative ΔED are identified as thermodynamically stable.

It is important to note that compounds with positive ΔED (metastable) have also been synthesized. Commonly, when ΔED is less than +25 meV per atom (that is, the compound lies within 25 meV per atom of the convex hull), it is suggested that the composition could potentially be synthesized under appropriate experimental conditions52. To evaluate this criterion for our design problem, we first calculated the ΔED for Ca2IrO4, which was recently grown epitaxially in the RP structure type using pulsed laser deposition29. It is well known in the literature that Ca2IrO4 in the RP structure type is a metastable phase29. Our main motivation is to compare the ΔED for Ca2IrO4 with that of our newly predicted compounds (especially those with positive ΔED) and glean additional insights. The results are given in Table 4. The ΔED values for RP Ca2IrO4 in the theoretical ground state and high-symmetry structures are +34 and +156 meV per atom, respectively, above the convex hull, yet it was successfully synthesized. We give the ΔED data for both the theoretical ground state and high-symmetry structures, because Souri et al.29 do not report the crystal symmetry of their thin film, and therefore the reference point is unclear.

Having benchmarked the ΔED data for Ca2IrO4, we return to our predicted NCS stannates and ruthenates. In Table 4, we provide the ΔED data for both stannates and ruthenates. The associated decomposition reactions are given in the Supplementary Note 4. Two out of 10 compounds—NaGdRuO4 and NaYRuO4—have negative ΔED, and therefore, we identify them to be thermodynamically stable and promising for synthesis. The remaining eight compounds have ΔED≤+82 meV per atom.
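The stability criteria used in this section can be summarized in a short Python sketch. It assumes the decomposition reaction is already known (the grand canonical linear programming step that finds the lowest-energy set of competing phases is not reproduced here), and the energies in the worked example are hypothetical placeholders. Only the 25 meV per atom window and the +34 meV per atom Ca2IrO4 benchmark come from the text.

```python
def decomposition_energy_mev(e_product, competing_phases):
    """Delta E_D = E(product) - sum_i x_i * E(phase_i), returned in meV per atom.

    e_product        : energy of the product in eV per atom
    competing_phases : list of (atomic fraction x_i, energy in eV per atom);
                       the fractions are assumed to sum to 1
    """
    hull_energy = sum(x * e for x, e in competing_phases)
    return (e_product - hull_energy) * 1000.0

def classify(delta_e_d_mev, window_mev=25.0):
    """Label a compound using the criteria quoted in the text."""
    if delta_e_d_mev < 0:
        return "stable"
    if delta_e_d_mev < window_mev:
        return "within metastability window"
    return "outside window"

# Hypothetical example: a product 10 meV per atom above two competing phases.
delta_hyp = decomposition_energy_mev(-6.500, [(0.5, -6.480), (0.5, -6.540)])
print(f"Hypothetical Delta E_D: {delta_hyp:.1f} meV per atom ->", classify(delta_hyp))

# Ca2IrO4 benchmark from the text: +34 meV per atom in the theoretical ground state.
print("Ca2IrO4 (ground state):", classify(34.0))
```

The Ca2IrO4 benchmark falls outside the 25 meV per atom window even though the compound has been grown, which is why the window is best read as a guideline and not as a strict cutoff.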

Additional predictions

In Table 7, we report our results for nine additional randomly chosen compounds that were predicted to have NCS ground state structures from ML. The total energy data, along with the different crystal symmetries obtained from both phonon calculations and ML, are given in the Supplementary Table 3. Seven out of nine compounds are found to have NCS ground state structures, in good agreement with our classification learning. Note that some of them (for example, KBaNbO4 and NaCaTaO4) have space groups that are not seen in any known or reported RP compounds (see Fig. 4). This is because we did not constrain our DFT calculations to only known structures or those from ML, but performed phonon calculations and full structure relaxations. The decomposition energies, ΔED, for all nine compounds are also given in Table 7. Six out of nine predicted compounds have either a negative ΔED (thermodynamically stable) or ΔED≤34 meV per atom (that is, stable relative to Ca2IrO4), indicating promise. Experimental results are necessary to confirm these predictions. In Table 3, chemistries for all 242 predicted RP oxides that show potential for NCS structures are listed. The DFT optimized ground state crystallographic information files for all 19 compounds can be downloaded from ref. 53.

Table 7 DFT aided validation for nine randomly selected RP oxides that were predicted to have an NCS ground state structure from ML.

As a general observation, we note that the NCS P21m space group that we predict for 13 out of 19 compositions from DFT is also one of the most commonly observed experimental ground states16,17 (also see Fig. 4) for the n=1 RP compounds.

Discussion

We developed a computational strategy built on the foundations of applied group theory, ML and DFT to design NCS RP compounds. In terms of the novelty of our informatics approach, we note that the use of irreps as class labels for ML is new to materials science. Normally, space groups are utilized. The role of group theory in our framework was to transform the space groups into irreps. By using irreps as class labels for ML, we were able to reduce the complexity of our classification problem from nine to six class labels. Even after reducing the complexity, we found that our data set suffered from class imbalance. To address this deficiency, we applied the SMOTE algorithm to generate synthetic data points and then constructed an ensemble of decision trees for irrep classification. Our decision trees identified 242 new compositions (from screening 3,200 compositions) that show potential for an NCS ground state. We tested our predictions for 19 compositions using DFT, among which 17 were validated to have an NCS ground state structure. We thus find good agreement between our informatics-based predictions and the DFT ground state structures. One of the major design outcomes is the identification of two new multiferroics (NaGdRuO4 and NaYRuO4), which were also determined to be thermodynamically stable.

It is also important to recognize that not all our ML predictions agreed with the DFT calculations. For example, KLaIrO4 and BaLaGaO4 were predicted to be NCS but our frozen-phonon calculations and full structural relaxations from DFT indicate disagreement (Table 7). Moreover, the inconsistencies are found to be pronounced when both A/A′ cations have relatively large ionic sizes (for example, K, Ba or La). Our DFT calculations reveal that the presence of large A/A′ cations significantly reduces the amplitude of octahedral tilting, which we ascribe to the steric effects. Our ML models appear to incorrectly classify them as NCS.

There are several ways to reduce such misclassification errors and improve our ML prediction accuracies. We list some of them here. First, one of the most promising directions is to synthesize the predicted materials and determine the crystal structure of each compound, which will allow us to augment our data set with new data points and retrain our ML models. We anticipate that our ML models will learn rapidly from these new data points and improve their prediction accuracy in subsequent iterations32. Second, our current ML models are based on five decision tree classifiers; a natural extension would be to construct more than five bootstrapped samples and generate additional decision trees (or apply a random forest algorithm with hundreds of classifiers), which could, in principle, reduce the misclassification errors. Third, kernel-based ML algorithms (such as support vector machines) and semisupervised learning schemes represent alternative informatics-based avenues to gain confidence or reduce uncertainties in our predictions.
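The second extension (more bootstrapped samples, hence more trees) is straightforward to sketch. The example below uses only the Python standard library; the data set is a hypothetical list of row indices, and the function name `bootstrap_samples` and the fixed seed are our own illustrative choices.

```python
import random

def bootstrap_samples(data, n_samples, seed=0):
    """Build n_samples data sets, each drawn from data with replacement
    and each the same size as the original data set."""
    rng = random.Random(seed)
    return [[rng.choice(data) for _ in data] for _ in range(n_samples)]

# Hypothetical data set: row indices standing in for labelled compounds.
dataset = list(range(20))

five = bootstrap_samples(dataset, n_samples=5)        # ensemble size used in this work
hundred = bootstrap_samples(dataset, n_samples=100)   # a larger ensemble

print(len(five), "samples of size", len(five[0]))
print(len(hundred), "samples of size", len(hundred[0]))
```

Each bootstrapped sample would then be used to train one decision tree, so growing the ensemble only requires changing `n_samples`.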

Furthermore, we demonstrated the use of the SMOTE algorithm for the first time in materials design problems; recently, a number of new algorithms35 have been developed for addressing similar class-imbalance problems, which could also be explored. We note that class-imbalance problems are ubiquitous in materials design and remain uncharted territory in materials informatics54. Finally, the choice of more robust features could also improve the prediction accuracies. Further computational efforts aimed at exhaustively evaluating the potential energy surface of related phases55 or, alternatively, data-driven approaches56 involving inference models could further refine the predictions by addressing issues related to compound formability and order-disorder transitions.

Notwithstanding the limitations, our approach provides a rational framework for structure-based design of novel functional materials with implications beyond the layered RP oxides. For instance, our methodology can be extended to explore NCS structures in Dion–Jacobson, Aurivillius, Brownmillerite or any other crystal family. In principle, our strategy could also guide the search for materials with intriguing functionalities such as ferroaxiality57. The key component in realizing such predictions will be the database construction process; more importantly, the nature of the available data (including features) will determine the type of questions that can be addressed. In terms of ML methods, off-the-shelf classification learning with class-imbalance algorithms (such as those demonstrated in this work) has the potential to provide the insights necessary for guiding the accelerated search for new materials with targeted crystal symmetry or functionality. Advanced learning strategies (for example, semisupervised learning, algorithms beyond SMOTE and Bayesian methods) may be necessary, but the choice of strategy and its formulation will hinge critically on the available databases and/or prior domain knowledge.

Methods

Group theory

The group theoretical analysis was performed using the ISOTROPY58 tool and electronic resources available from the Bilbao Crystallographic Server59.

Materials informatics

We used the following inference and ML methods in this paper: PCA for data-dimensionality reduction and feature extraction60; sampling techniques such as the bootstrap method, which constructs multiple data sets from our experimental data set via sampling with replacement; decision tree classification learning61 for formulating QCSR design rules; and SMOTE34 to rectify the class-imbalance problem. We chose decision tree classification learners for the following reasons62: (i) they are interpretable, making the model transparent to domain experts; (ii) the splitting criterion (for example, Shannon entropy) accomplishes feature selection without the need for any additional ML methods; (iii) they are scalable; and (iv) they have the capability to match the prediction accuracies of state-of-the-art ML methods. ML calculations were performed using RSTUDIO and WEKA. The decision tree algorithm as implemented in WEKA was used. The data set was constructed using the Waber–Cromer orbital radii as features.
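To illustrate how an entropy-based splitting criterion performs feature selection, the following Python sketch computes the information gain of a threshold split on a single feature. It is a textbook illustration and not WEKA's implementation; the feature values and class labels are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting on values <= threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - weighted

def best_threshold(values, labels):
    """Try midpoints between consecutive distinct feature values; keep the best."""
    distinct = sorted(set(values))
    candidates = [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda t: information_gain(values, labels, t))

# Hypothetical orbital-radius feature (arbitrary units) and class labels.
radii = [1.0, 1.2, 1.4, 2.0, 2.2, 2.4]
classes = ["NCS", "NCS", "NCS", "CS", "CS", "CS"]

threshold = best_threshold(radii, classes)
gain = information_gain(radii, classes, threshold)
print(f"best threshold = {threshold:.2f}, information gain = {gain:.2f} bits")
```

A feature that never yields a useful gain is never chosen for a split, which is the sense in which the criterion selects features without a separate ML step.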

The class-imbalance problem was rectified using the SMOTE algorithm. When there is class imbalance, ML models could ignore the less frequently observed class labels and group them with more frequently occurring class labels that are their nearest neighbours in the high-dimensional data space. This is not desirable for this work, because the NCS space groups are already under-represented to begin with. The input to SMOTE is our data set and three additional parameters: (i) the under-represented or minority class label that we intend to oversample, (ii) the number of nearest neighbours (k) and (iii) the number of extra synthetic data samples (in %) to be created. The SMOTE algorithm functions as follows: it takes the difference between the feature vector (that is, orbital radii) of an under-represented irrep and that of one of its k nearest neighbours, multiplies the difference by a random number between 0 and 1, and adds the result to the original feature vector to create a new feature vector. This new feature vector is then added to the original data set. As a result, each synthetic data point lies at a random position along the line segment joining two existing data points (a simplified visual representation of the process based on our data set is given in Supplementary Fig. 1). We used PCA to ensure that SMOTE did not affect the manifold of our data set. We use the SMOTE algorithm as implemented in WEKA37.
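The interpolation step described above can be sketched in a few lines of Python. This is a simplified illustration of the SMOTE idea and not the WEKA implementation: it assumes at least two minority samples, takes the oversampling percentage as a multiple of 100, and uses hypothetical two-dimensional feature vectors.

```python
import random

def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def smote(minority, k, percent, seed=0):
    """Create synthetic minority-class samples by interpolating between each
    sample and one of its k nearest minority-class neighbours.

    percent=100 creates one synthetic sample per original sample, 200 two, etc.
    """
    rng = random.Random(seed)
    per_sample = percent // 100
    synthetic = []
    for i, x in enumerate(minority):
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: squared_distance(x, p))[:k]
        for _ in range(per_sample):
            nb = rng.choice(neighbours)   # pick one of the k nearest neighbours
            gap = rng.random()            # random number in [0, 1)
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Hypothetical minority-class feature vectors (for example, two orbital radii).
minority_points = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.8)]
new_points = smote(minority_points, k=2, percent=200)
print(len(new_points), "synthetic samples created")
```

Because each synthetic point is a convex combination of two existing minority points, it stays on the line segment joining them, consistent with the description in the text.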

Electronic structure calculations

DFT calculations for all RP compounds were performed using the planewave pseudopotential code Quantum ESPRESSO (QE)63 to obtain the total energies. We used ultrasoft pseudopotentials64 with the PBEsol exchange-correlation functional65 taken from the PSlibrary66. A plane-wave cutoff of 60 Ry was used during the ionic and electronic relaxation steps. Electron correlations in the Ru-4d and Ir-5d electrons were treated using the Hubbard-U method within the Dudarev formalism67. The calculations were spin-polarized, with collinear ferromagnetic spin order imposed on the Ru and Ir atoms. An effective Hubbard-U of 1.5 eV was chosen in both cases. Frozen phonon calculations were performed using the PHONOPY code68, which uses the forces from QE as input for calculating the dynamical matrices and interatomic force constants. We employed a 2 × 2 × 2 supercell with 112 atoms for the frozen phonon calculations.

All calculations to obtain bandgaps and piezoelectric coefficients for NaRSnO4 were performed using DFT as implemented in the Vienna ab initio Simulation Package69,70. The crystal structures were taken from converged QE calculations. We used projector augmented-wave potentials71 with the PBEsol functional. The piezoelectric and elastic tensors were computed within density-functional perturbation theory72,73 with a plane-wave cutoff of 800 eV. The densities of states were computed first with PBEsol, and then with different amounts of exact exchange using HSE (Heyd–Scuseria–Ernzerhof). By comparing the experimental bandgap of BaSnO3 with our computed values, we selected the amount of exact exchange to use (here 35%).

Data availability

The data sets for the informatics study and the DFT optimized crystallographic information files are deposited at figshare (refs 41, 53).

Additional information

How to cite this article: Balachandran, P. V. et al. Learning from data to design functional materials without inversion symmetry. Nat. Commun. 8, 14282 doi: 10.1038/ncomms14282 (2017).
