Abstract
Accelerating the search for functional materials is a challenging problem. Here we develop an informatics-guided ab initio approach to accelerate the design and discovery of noncentrosymmetric materials. The workflow integrates group theory, informatics and density-functional theory to uncover design guidelines for predicting noncentrosymmetric compounds, which we apply to layered Ruddlesden–Popper oxides. Group theory identifies how configurations of oxygen octahedral rotation patterns, ordered cation arrangements and their interplay break inversion symmetry, while informatics tools learn from available data to select candidate compositions that fulfil the group-theoretical postulates. Our key outcome is the identification, from ∼3,200 screened compositions, of 242 that show potential for noncentrosymmetric structures, a 25-fold increase in the projected number of known noncentrosymmetric Ruddlesden–Popper oxides. We validate our predictions for 19 compounds using phonon calculations, among which 17 have noncentrosymmetric ground states, including two potential multiferroics. Our approach enables the rational design of materials with targeted crystal symmetries and functionalities.
Introduction
Noncentrosymmetric (NCS) oxide ceramics, which lack a centre of symmetry (and, in the chiral case, all improper rotations), are challenging to discover. Polar, piezoelectric and chiral materials, as well as those exhibiting circular dichroism (collectively referred to as NCS materials), are defined by the absence of inversion symmetry and are found everywhere, for example in organic amino acids, sugars and other biological molecules^{1}. Inorganic NCS materials containing oxide anions are also not uncommon^{2}. Quartz crystals, with a helical arrangement of corner-connected SiO_{4} tetrahedral units, keep our timepieces punctual^{3}. At inorganic crystalline surfaces, chirality plays a crucial role in corrosion processes, heterogeneous catalysis and the fidelity of enantioselective production or separation of industrial solvents, plastics and pharmaceutical drugs^{4}. Pb(Zr,Ti)O_{3}, BaTiO_{3} and BiFeO_{3} are archetypal polar oxides that have impacted many critical technologies^{5}. Inorganic polar and chiral basic building units (BBUs) are often selected and assembled together, but the acentric organization of BBUs within a unit cell is difficult to predict because of the complex interplay of chemistry and structure.
In the context of inorganic oxides, which are the focus of this work, the design of NCS materials has relied mainly on BBUs with d^{0} metal centres or lone-pair cations. In these BBUs the acentricity has an electronic origin in the pseudo- or second-order Jahn–Teller (SOJT) effect^{6,7}. Most inorganic oxides, however, strongly prefer close-packed arrangements of ions and highly symmetric cation coordination environments (for example, octahedra). This is mainly due to dominant electrostatic effects that are optimized by favouring like–unlike interactions (that is, positive and negative dipoles align equally and oppositely), which stabilize atomic arrangements with inversion symmetry^{8}. In fact, the presence of BBUs with d^{0} metal centres alone is not a sufficient condition for designing NCS materials. For example, the perovskite SrTiO_{3} is a quantum paraelectric or incipient ferroelectric^{9}, whereas the isoelectronic layered Ruddlesden–Popper (RP) Sr_{2}TiO_{4} is a centric dielectric^{10}. Hence, it is the complex interplay between structure and chemistry that determines the formation of NCS inorganic oxides.
As an alternative to pseudo-JT or SOJT effects, the ‘trilinear coupling’, ‘hybrid improper’ or ‘geometric ferroelectricity’ mechanism has also been shown to break inversion symmetry, with interesting technological consequences^{11,12}. In this mechanism, two nonpolar lattice distortions (octahedral rotations or tilting) couple to a polar lattice mode. Even here, no a priori rules exist to guide the design of new hybrid improper ferroelectric materials, unless exhaustive calculations are carried out to map the chemical and energy landscape that subsequently informs experiments^{12}. As a result, NCS inorganic oxides are challenging to discover.
Although high-throughput first-principles-based methods have shown promise in the design of NCS half-Heusler alloys^{13}, exhaustive calculations have not (yet) been demonstrated for more complex crystal structures with numerous polymorphs (such as the RPs) and thousands of unexplored chemical compositions. This is partly because the potential energy surface of complex oxides is difficult to navigate. Phonon instabilities at high-symmetry points away from the Γ-point in the irreducible Brillouin zone cause the primitive unit cell to multiply severalfold, resulting in large system sizes and vast numbers of unique atomic arrangements. It is challenging to rigorously evaluate the energetics of all of these structures in a high-throughput manner. Furthermore, chemistries with partially filled d (and/or f) orbitals and energetically competing ground states complicate structure prediction. As a result, novel approaches are needed to guide first-principles calculations effectively. Materials informatics has the potential to accomplish this objective^{14}. It is a growing field at the intersection of many scientific disciplines, including data and information science, statistics, machine learning (ML) and optimization.
Here we develop a predictive data-driven computational framework for designing novel NCS materials that unites applied group theory, informatics techniques and ab initio electronic structure calculations. We apply it to the two-dimensional n=1 RP structure family (Fig. 1a), for which few compositions in NCS crystal classes are known to date^{15,16,17}. Nonetheless, the chemical search space is vast (Fig. 1b). We use informatics-based methods to screen this space and down-select 242 compositions that show greater promise for NCS ground states. Discovering novel NCS n=1 RP compounds has key implications for technological applications that require a broad range of functionalities, including high-temperature piezoelectricity, tunable bandgaps, improper ferroelectricity, multiferroicity and thermoelectricity. We focus in detail on the design of NaRSnO_{4} stannates and NaRRuO_{4} ruthenates (where R=La, Pr, Nd, Gd or Y). These compounds were predicted by informatics to have NCS ground-state structures and subsequently validated by density-functional theory (DFT). The stannates are candidate materials for sensors and transparent conducting oxides^{18}. For them we find two energetically competing NCS ground-state phases: P\bar{4}2_{1}m (piezoactive) and P2_{1}2_{1}2 (chiral and piezoactive). We calculate their electronic bandgaps in the P\bar{4}2_{1}m crystal symmetry using hybrid exchange-correlation functionals and find them to be optically transparent in the visible regime. We also compute their piezoelectric responses, which depend on the R-cation size. In sharp contrast, the NCS NaRRuO_{4} compounds are magnetic, with metallic, half-metallic or insulating electronic structures. Their ground state is either piezoactive with P\bar{4}2_{1}m symmetry (R=La, Pr and Nd) or polar with Pca2_{1} symmetry (R=Gd and Y).
Moreover, their character changes with decreasing R-cation size, from ferromagnetic metallic (R=La) or half-metallic (R=Pr, Nd) to antiferromagnetic insulating (R=Gd, Y). These bulk ruthenates are therefore predicted to fall into two groups. The first belongs to the intriguing class of NCS metals^{19,20} and half-metals with piezoactive symmetries. The second consists of antiferromagnetic insulators with polar symmetry (that is, multiferroics). Last, we test our predictions for nine additional new compounds with different cations occupying the B-sublattice of the RP structure (shown in Fig. 1a). Seven of these were validated to have an NCS ground-state structure: NaLaZrO_{4}, NaLaHfO_{4}, KBaNbO_{4}, NaLaIrO_{4}, NaCaTaO_{4}, SrYGaO_{4} and SrLaInO_{4}. These results establish our computational framework as a powerful tool for crystal-symmetry classification and for structure-based property design and control.
Results
Approach
Our search for NCS oxides relies on a multifaceted theoretical approach that reformulates the discovery objective as the identification of structure–chemistry interrelationships (Fig. 2). The design strategy focuses on three key criteria, obtained by subdividing the design process into distinct objectives with specific tasks:

Structural: How can the atomic structure, or configuration of oxygen octahedra BBUs, be designed to support the desired interaction?

Chemical: Which combinations of chemistries will promote that structural configuration?

Stability: Is the proposed composition the global ground state?
Classification learning from informatics is followed by evaluation of energetic stability with first-principles methods. The final design then relies on response optimization, which leverages additional degrees of freedom to further promote the targeted behaviour. Such strategies include searching for microscopic mechanisms and external conditions (such as epitaxial strain) that energetically stabilize those geometries. We note that this paper significantly advances the earlier work of Balachandran et al.^{15}, where the emphasis was on enumerating symmetry guidelines.
Group theory
In earlier work, Balachandran et al.^{15} used group theory to formulate symmetry guidelines for exploring and designing NCS phases in n=1 RP structures, so we discuss only the key results here. Starting from the centrosymmetric (CS) aristotype structure (Fig. 3a), they enumerated various symmetry-allowed cooperative atomic displacements (also referred to as ‘shuffles’) that transform the CS aristotype into an NCS structure of lower symmetry. The focus was on CS→NCS phase transitions that are second order or weakly first order. In these transitions, the symmetry-lowering distortions arise from:
(i) nonpolar octahedral distortions (tilting or rotations) due to phonon softening at the zone boundaries of the Brillouin zone (BZ) of the I4/mmm space group,
(ii) A/A′ cation ordering,
(iii) the interplay between two or more octahedral distortions, and
(iv) the interplay between octahedral distortions and A/A′ cation ordering.
The need to search for alternative routes to breaking inversion symmetry was motivated by the fact that NCS phases are seldom seen in n=1 RPs. This scarcity has been attributed to the disconnected octahedral layers, which destroy the coherency required for cooperative off-centring displacements, and thus for ferroelectricity^{21}.
Balachandran et al.^{15} found three important symmetry guidelines (given in the rows of Table 1) for lifting parity in the n=1 RP structures. All three involve A/A′ cation ordering (Fig. 3b) that transforms as irrep and couples with octahedral rotations or tilting (Fig. 3c–e). The structural attributes may be satisfied by any of the following routes:
Route 1: Out-of-phase octahedral tilting that transforms as irrep with order parameter direction (OPD) (η_{1},η_{1}), which on superposition with irrep (η_{1}) yields a piezoelectric (P\bar{4}2_{1}m) space group (Fig. 3c).
Route 2: Out-of-phase tilting that transforms as irrep with OPD (η_{1},η_{2}), which on superposition with (η_{1}) yields a chiral and piezoactive (P2_{1}2_{1}2) space group (Fig. 3d).
Route 3: Coupled irrep ⊕ with OPD (0,η_{1};η_{2},0), which when superposed with irrep (η_{1}) yields a polar (Pca2_{1}) space group (Fig. 3e). Here the matrix elements of and irreps accommodate atomic displacements that correspond to Jahn–Teller-like distortions and out-of-phase tilting, respectively.
Note that there is another type of A/A′ cation ordering, transforming as irrep , which lifts inversion symmetry through the ordering alone (we refer to it as the trivial case). However, we do not consider this trivial type of A/A′ cation ordering here. The key materials design question is therefore: which combinations of chemical elements from the vast chemical space would stabilize these NCS phases? We address this question using materials informatics.
Materials informatics
Figure 4 shows how frequently each experimentally known crystal symmetry occurs in bulk n=1 RPs. We report only the low-temperature crystal symmetries and do not explicitly consider the temperature dependence of the crystal structures in our informatics analysis. Our definition of low temperature includes experimentally observed structures at ≤300 K. Some RP compounds undergo structural transformations at much lower temperatures (for example, La_{2}NiO_{4} (ref. 22)); in such cases, we take the lowest-temperature crystal structure as the label for informatics. This simplification was necessary because 0 K DFT calculations are used to validate the informatics-based predictions. Balachandran et al.^{15} showed that the propensity for forming high-symmetry phases increases with temperature, and we anticipate that those results hold here.
Our literature survey shows that ∼45% of the compositions are undistorted (denoted as φ in Fig. 4). A significant number of compositions also undergo symmetry-lowering distortions while preserving spatial inversion symmetry. A key observation from Fig. 4 is that only nine compounds with NCS space groups fall within our chemical search space (Fig. 1b). In the literature, the family of cation-ordered NaRTiO_{4} and LiRTiO_{4} (found only recently), where R=La, Nd, Dy, Gd, Sm, Ho, Eu and Y, has been experimentally shown^{16,17} to have the piezoelectric P\bar{4}2_{1}m space group [⊕ (η_{1},η_{1};η_{1})]. The nominal electronic configuration of Ti^{4+} in these compounds is d^{0}. Inversion symmetry is lifted by the coupling between TiO_{6} octahedral tilting (which transforms as irrep (η_{1},η_{1}); Fig. 3c) and Li/R or Na/R cation ordering (which transforms as irrep (η_{1}); Fig. 3b), in accordance with Route 1. The only other experimentally known NCS n=1 RP oxide is the A- and B-site-ordered (LaSr)(Li_{0.5}Ru_{0.5})O_{4}, which is reported in the polar Imm2 space group^{23}. In this compound, A-site and B-site cation ordering work in concert to lift the inversion symmetry. In addition, the following have been theoretically predicted to have NCS structures^{15,24,25,26,27,28}: Pb_{2}TiO_{4}, Ca_{2}IrO_{4}, Sn_{2}SnO_{4}, cation-ordered LaANiO_{4} (A=Sr, Ca and Ba), LaSrAlO_{4} and LaSrMnO_{4}. However, these predictions have not been experimentally verified. Recently, metastable Ca_{2}IrO_{4} was epitaxially grown in the n=1 RP phase on a YAlO_{3} substrate using pulsed laser deposition^{29}, but its crystal symmetry was not reported. We therefore do not consider these chemistries in our informatics analysis.
Consider n=1 RPs with relatively simple AA′BO_{4} stoichiometries. Here A and A′ are two chemical species (similar or dissimilar) occupying the A-site, and B is a cation with sixfold octahedral coordination. This family contains ∼3,200 potential chemical compositions that satisfy crystal-chemistry and stoichiometric guidelines (for example, charge neutrality) and are therefore, in principle, amenable to experimental synthesis. However, only 3% have been experimentally synthesized, and among these, only nine have NCS phases. Our informatics analysis uses statistical inference and ML methods to establish quantitative chemistry–symmetry relationships (QCSR) for the known materials in Fig. 4. These QCSRs, in turn, guide a rapid screening of the vast chemical space for new, previously unexplored compositions that favour the distortions given in Table 1.
Data set
In our ML approach, we build a data set of experimentally known materials that includes both CS and NCS structures. Although our computational design focuses on AA′BO_{4} stoichiometries, our training data set also includes RP compositions that deviate from this stoichiometry (see the data set in the Supplementary Information). We describe each n=1 RP composition uniquely by its crystal symmetry or irrep (the ‘class label’ in ML terminology) and a set of features. As features we use Waber–Cromer orbital radii^{30}. Orbital radii and distortion modes have previously been used to predict the structures and formabilities of complex oxides^{31,32}. Our ML objective is to build a classification model that predicts crystal symmetries or irrep labels from orbital radii. The data set comprises all 83 experimentally known RP chemical compositions, after removing (LaSr)(Li_{0.5}Ru_{0.5})O_{4} because Li is not part of our chemical space (see Fig. 1b). Each composition was written in the simplified A_{2}BO_{4} stoichiometric form, where the A- and B-sites can host two or more elements with partial site occupancies. We used 12 orbital radii features to describe the A-site and 10 to describe the B-site. When two or more elements occupied either the A- or B-site, the features were built as linear combinations weighted by the relative stoichiometric proportions.
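The stoichiometry-weighted feature construction described above can be sketched as follows. This is a minimal illustration that assumes each element is described by a fixed-length list of orbital radii. The numbers in `RADII` are hypothetical placeholders, not actual Waber–Cromer values, and the name `site_features` is ours.

```python
from typing import Dict, List

# HYPOTHETICAL placeholder radii (NOT real Waber-Cromer values);
# every element carries the same number of features (two here for brevity).
RADII: Dict[str, List[float]] = {
    "La": [1.0, 2.0],
    "Sr": [3.0, 4.0],
    "Na": [0.5, 1.5],
}


def site_features(occupancy: Dict[str, float],
                  radii: Dict[str, List[float]]) -> List[float]:
    """Occupancy-weighted average of each orbital-radius feature for one site.

    `occupancy` maps element -> stoichiometric amount on the site; e.g. the
    A-site of LaSrAlO4 written as A2BO4 is {"La": 1, "Sr": 1}.
    """
    total = sum(occupancy.values())
    n_feat = len(next(iter(radii.values())))
    feats = [0.0] * n_feat
    for element, amount in occupancy.items():
        weight = amount / total            # relative stoichiometric proportion
        for i, r in enumerate(radii[element]):
            feats[i] += weight * r
    return feats


# A-site of LaSrAlO4: La and Sr each get weight 0.5.
print(site_features({"La": 1, "Sr": 1}, RADII))  # [2.0, 3.0]
```

Normalizing by the summed occupancy means that {"La": 1, "Sr": 1} and {"La": 0.5, "Sr": 0.5} give the same features.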
We constructed two data sets for classification learning. The first uses (i) space groups as class labels (an obvious choice), and the second uses (ii) irreps corresponding to octahedral tilting, rotations, or their absence. Here, we focus mainly on the ML results from the latter (case (ii)), which cleanly separates octahedral rotations or tilting from cation ordering. As a result, two space groups can be grouped under the same label. For example, we combine compositions with the I4/mmm and P4/nmm space groups under the label φ, because neither has octahedral rotations or tilting. A key difference between I4/mmm and P4/nmm is that in P4/nmm the A-site Wyckoff orbit is split into two unique crystallographic sites^{15}. Similarly, we can combine space groups P\bar{4}2_{1}m and P4_{2}/ncm into a single irrep label, (η_{1},η_{1}). This transformation reduces the number of unique class labels for classification learning from 9 to 7 (see inset in Fig. 4). The main disadvantage of such grouping is that our QCSR model can no longer distinguish between ordered and disordered structures. This should not affect our NCS materials design goal, given advances in the nonequilibrium synthesis and processing of these oxides. For example, layer-by-layer growth of A/A′ cation-ordered n=1 RPs has recently been demonstrated using molecular beam epitaxy with unprecedented control^{33}. We also tested the predictive power of our ML models by intentionally leaving out 14 compounds during training, which reduces our training set from 83 to 69 compounds. This lets us verify that the classification models identify the labels of the left-out compounds correctly before we use them to make new predictions.
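The relabelling step can be pictured as a lookup from space group to class label. This sketch encodes only the two groupings described above. The label strings (`"phi"`, `"tilt(eta1,eta1)"`) are hypothetical stand-ins for the irrep symbols, and space groups are written in plain ASCII (`P-42_1m` for P\bar{4}2_{1}m).

```python
from collections import Counter

# HYPOTHETICAL label strings standing in for the irrep symbols in the text.
SPACE_GROUP_TO_LABEL = {
    "I4/mmm": "phi",               # no octahedral rotations or tilting
    "P4/nmm": "phi",               # same, with the A-site orbit split by ordering
    "P4_2/ncm": "tilt(eta1,eta1)", # out-of-phase tilting, disordered A/A'
    "P-42_1m": "tilt(eta1,eta1)",  # same tilting, ordered A/A'
}


def relabel(space_groups):
    """Map space groups to irrep-style class labels; unmapped ones are kept."""
    return [SPACE_GROUP_TO_LABEL.get(sg, sg) for sg in space_groups]


labels = relabel(["I4/mmm", "P4/nmm", "P4_2/ncm", "P-42_1m", "Pccn"])
print(Counter(labels))  # class frequencies after merging
```

The `Counter` output is the kind of per-label frequency that exposes the class imbalance discussed next.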
Even after reducing the number of unique class labels from 9 to 6 (only one chemical composition has irrep , and we exclude it from ML), we must still address class imbalance: some irrep class labels occur far more frequently than others. Class imbalance is problematic for ML. When we trained a decision tree classification model on the imbalanced data set, every composition with space group Pccn or (η_{1},η_{2}) was misclassified. As shown in Table 1 and Fig. 3, Pccn or (η_{1},η_{2}) is one of the desired class labels for designing NCS materials, so the class-imbalance problem must be addressed.
A number of methods have been developed in the computer science and artificial intelligence literature to overcome class imbalance^{34,35}. These include oversampling (randomly duplicating instances of the underrepresented class), undersampling (randomly removing instances of the most frequent class) and interpolation schemes. In this work, we use the synthetic minority oversampling technique (SMOTE)^{34}. SMOTE oversamples the underrepresented class labels by creating ‘synthetic’ (fictitious) training data points from the original imbalanced data. It is based on a k-nearest-neighbour analysis. One of its main advantages over other algorithms is that the extra data points, in principle, lead the ML models to create larger and less specific decision regions. Additional details about the algorithm are given in the Methods section.
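The interpolation at the heart of SMOTE can be sketched in a few lines of NumPy. Each synthetic point lies on the segment between a minority-class sample and one of its k nearest minority-class neighbours. This is a minimal illustration of the standard technique, not the implementation used in this work. Drawing seed samples at random, the default k and the function name are our assumptions.

```python
import numpy as np


def smote(X_min: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Generate n_new synthetic samples from minority-class rows X_min."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    if k < 1:
        raise ValueError("need at least two minority-class samples")
    # Pairwise Euclidean distances within the minority class only.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    synthetic = np.empty((n_new, X_min.shape[1]))
    for j in range(n_new):
        i = rng.integers(n)                  # random minority seed sample
        nn = X_min[rng.choice(neighbours[i])]  # one of its k nearest neighbours
        gap = rng.random()                   # uniform in [0, 1)
        synthetic[j] = X_min[i] + gap * (nn - X_min[i])
    return synthetic


X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(smote(X_minority, n_new=6, k=2))
```

Because new points are interpolated rather than duplicated, the minority region is filled in rather than sharpened, which is the "less specific decision regions" effect mentioned above.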
We applied SMOTE to the data set with irreps as class labels to construct synthetic data points for the two underrepresented irrep labels, P_{4} and (η_{1},η_{2}). This produced three synthetic points for P_{4} and six for (η_{1},η_{2}). Our training data set for classification learning thus increased to 78 compounds (69 original + 9 from SMOTE). We confirmed using principal component analysis (PCA) that SMOTE did not affect our data manifold (Supplementary Fig. 1).
Data preprocessing
Our NCS materials design begins by exhaustively enumerating all possible AA′BO_{4} combinations that satisfy crystal-chemistry and stoichiometric rules (for example, charge neutrality). As noted before, we use Waber–Cromer orbital radii as features. We then augment this exhaustive data set with the 78 n=1 RPs. At this point, we do not include the irrep class labels in the data set. The result is a total of 3,253 chemical compositions described by 22 orbital radii features.
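The charge-neutrality rule used in the enumeration can be sketched as follows. For AA′BO_{4}, the cation formal charges must sum to +8 to balance four O^{2−} anions. The deliberately tiny cation lists below are hypothetical; the actual enumeration covers the element set of Fig. 1b and applies further crystal-chemistry rules that are not reproduced here.

```python
from itertools import combinations_with_replacement

# HYPOTHETICAL, deliberately tiny cation lists: (element, formal charge).
A_SITE = [("Na", 1), ("K", 1), ("Ca", 2), ("Sr", 2), ("La", 3), ("Y", 3)]
B_SITE = [("Ga", 3), ("Ti", 4), ("Sn", 4), ("Nb", 5)]


def enumerate_aa_b_o4(a_site, b_site):
    """List AA'BO4 formulas whose cation charges sum to +8 (four O2- anions)."""
    hits = []
    # Unordered A/A' pairs, including A = A' (e.g. Sr2TiO4 as "SrSrTiO4").
    for (a, qa), (a2, qa2) in combinations_with_replacement(a_site, 2):
        for b, qb in b_site:
            if qa + qa2 + qb == 8:
                hits.append(f"{a}{a2}{b}O4")
    return hits


compositions = enumerate_aa_b_o4(A_SITE, B_SITE)
print(len(compositions), compositions[:4])
```

Applied to the full element set, this kind of enumeration yields the virtual set of thousands of candidates.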
We autoscaled the data (zero mean and unit variance) and applied PCA, which constructs linear combinations of weighted contributions of orbital radii (Supplementary Figs 2 and 3). Balachandran et al.^{36} recently showed that PCA performs three useful functions on a data set with orbital radii as features. It removes redundant information, reduces data dimensionality and constructs physically meaningful linear combinations of orbital radii (Supplementary Note 1). In addition, the principal components (PCs) are mutually uncorrelated, and they are independent if the data follow a Gaussian (normal) distribution. PCA reduced the dimensionality of our data set from 22 orbital radii features to 8 PCs, which together capture >90% of the total variance. We then isolated the 78 chemical compositions with experimentally known irrep labels; we refer to this data set as the training set. The remaining compositions form the ‘virtual set’, which defines the vast chemical search space yet to be explored for new NCS materials.
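Autoscaling followed by PCA with a cumulative-variance cutoff can be sketched with NumPy's SVD. The 0.90 default mirrors the ">90% of total variance" criterion above. The function name and return values are hypothetical, and the sketch assumes that no feature column is constant, since a zero standard deviation would divide by zero.

```python
import numpy as np


def autoscale_pca(X: np.ndarray, var_target: float = 0.90):
    """Autoscale columns, then keep the fewest PCs explaining >= var_target."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # zero mean, unit variance
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)                    # variance fraction per PC
    # Index of the first cumulative fraction >= target, plus one = number of PCs.
    n_pc = int(np.searchsorted(np.cumsum(explained), var_target) + 1)
    scores = Z @ Vt[:n_pc].T                           # projections onto the PCs
    return scores, Vt[:n_pc], explained[:n_pc]


# Synthetic example: 22 correlated "features" driven by 2 latent variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 22))
scores, loadings, ev = autoscale_pca(X)
print(scores.shape, ev.sum())
```

Each row of `loadings` gives the weights of the original features in one PC, which is what makes the components interpretable as combinations of orbital radii.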
Classification learning
We used the J48 decision tree classification algorithm, as implemented in WEKA, to establish QCSR^{37,38}; the reasons for choosing J48 are discussed in the Methods section. We constructed five bootstrapped samples of 78 compositions each from the original training set. Training the decision tree algorithm on each sample gave five decision tree models (Supplementary Figs 4–8). We evaluated the classification accuracies of the five models on the training data set and by 10-fold cross-validation (Supplementary Table 1 and Supplementary Note 2). The average 10-fold cross-validation accuracy of the five bootstrapped decision trees is ∼80%. This suggests that more accurate QCSR models could potentially be obtained through alternative feature selection methods^{39} or other (kernel-based) ML algorithms, which we do not address here. We also tested whether our decision trees could correctly identify the irrep labels of the 14 compounds held out during training (Table 2). Our ensemble of decision trees correctly labelled 12 of the 14 compounds in this independent test set with ≥60% accuracy; the two exceptions were YSrCrO_{4} and Ca_{2}CrO_{4}. This result gives confidence in our classification learning.
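The resampling scheme described above can be sketched with NumPy index arrays. There are five bootstrap samples of the same size as the training set, drawn with replacement, plus 10-fold cross-validation splits. This is a minimal sketch: the classifier itself (J48 in the text) is not reproduced, and the plain unstratified folds are our simplification.

```python
import numpy as np


def bootstrap_samples(n: int, n_boot: int = 5, seed: int = 0) -> np.ndarray:
    """Row b holds the indices of bootstrap sample b: n draws with replacement."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, n, size=(n_boot, n))


def kfold_splits(n: int, k: int = 10, seed: int = 0):
    """Yield (train, test) index arrays for plain (unstratified) k-fold CV."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


boots = bootstrap_samples(78)              # five samples of 78 compositions each
for b, idx in enumerate(boots):
    # A classifier would be trained on X[idx], y[idx] here (not shown).
    print(b, "distinct compositions:", len(np.unique(idx)))
```

On average, each bootstrap sample contains only about 63% of the distinct training compositions, which is what makes the five trees differ from one another.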
Using the five bootstrapped decision trees, we screened all 3,175 compositions in the virtual set and identified 242 new compositions with potential for NCS ground-state structures. We retained only compositions that at least three of the five decision trees classified as NCS, that is, as belonging to (η_{1},η_{1}), (η_{1},η_{2}) or ⊕ (0,η_{1};η_{2},0). We then applied additional filters to remove:
(i) compositions containing toxic elements, such as Pb, Hg and Cd,
(ii) compositions in which the A and A′ sites are occupied by the same element, and
(iii) compositions with A- or A′-site elements absent from the original training data set (for example, Cs, Rb, Tl, Ag and Mg).
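The three-of-five vote and the three exclusion filters can be expressed as a small screening function. The label strings are hypothetical stand-ins for the irreps of Table 1, and the `(A, A', B, predictions)` tuple layout is ours. The actual filter lists are longer than shown, and the strongly-correlated-cation filter described next is omitted.

```python
from typing import Iterable, List, Optional, Sequence, Set, Tuple

# HYPOTHETICAL string stand-ins for the three NCS-favourable irrep labels.
NCS_LABELS = {"tilt(eta1,eta1)", "tilt(eta1,eta2)", "coupled(0,eta1;eta2,0)"}
TOXIC = {"Pb", "Hg", "Cd"}

Candidate = Tuple[str, str, str, Sequence[str]]  # (A, A', B, one label per tree)


def screen(candidates: Iterable[Candidate],
           min_votes: int = 3,
           allowed_a: Optional[Set[str]] = None) -> List[str]:
    """Keep AA'BO4 candidates that at least `min_votes` trees label as NCS."""
    kept = []
    for a, a2, b, preds in candidates:
        if {a, a2, b} & TOXIC:                    # filter (i): toxic elements
            continue
        if a == a2:                               # filter (ii): identical A and A'
            continue
        if allowed_a is not None and not {a, a2} <= allowed_a:
            continue                              # filter (iii): unseen A-site elements
        votes = sum(p in NCS_LABELS for p in preds)
        if votes >= min_votes:
            kept.append(f"{a}{a2}{b}O4")
    return kept


cands = [
    ("Na", "La", "Sn", ["tilt(eta1,eta2)", "tilt(eta1,eta2)", "phi",
                        "coupled(0,eta1;eta2,0)", "tilt(eta1,eta1)"]),
    ("K", "Ba", "Nb", ["phi", "phi", "phi", "tilt(eta1,eta1)", "tilt(eta1,eta1)"]),
]
print(screen(cands))  # ['NaLaSnO4']
```

In this toy input, KBaNbO_{4} is dropped only because it receives two NCS votes instead of three; it is not a statement about the real compound.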
Some disagreement is expected between our predictions and experiments (or calculations), particularly for transition metal cations whose valence state lies in the strongly correlated regime (for example, Ti^{3+}, Cr^{3+}, V^{3+}, Mn^{3+} and so on). This is mainly because our training set contains very few compositions with these cations. We applied the various filters and removed compositions that could fall in the strongly correlated regime. The refined results comprise 242 new chemical compositions that show promise for NCS structures.
For the following octahedral B-site cations in the virtual set, our models predict NCS n=1 RP structures: Ga^{3+}, In^{3+}, Ti^{4+}, Zr^{4+}, Ru^{4+}, Sn^{4+}, Hf^{4+}, Ir^{4+}, Nb^{5+} and Ta^{5+}. In^{3+} could also be excluded because n=1 RP structures are experimentally difficult to form using equilibrium synthesis and processing techniques^{40}, although we do not rule out stabilizing In-based n=1 RPs by nonequilibrium methods. The chemical compositions of all predicted NCS materials are listed in Table 3. Additional details are given in Supplementary Table 2 and Supplementary Note 3, and the data sets can be downloaded from ref. 41. In summary, informatics identified 242 new n=1 RP chemical compositions with potential for NCS crystal structures, significantly expanding the chemical space of NCS n=1 RP oxides (∼25-fold increase).
Density-functional theory
On the basis of the group theory and materials informatics analyses, we first validate our predictions by using DFT to assess energetic stability (Task 3 in Fig. 2). We do this for ten down-selected NaRSnO_{4} and NaRRuO_{4} compounds, where R is a rare-earth element (R=La, Pr, Nd, Gd and Y). In our calculations, the Na^{1+} and R^{3+} cations were ordered according to the irrep label (η_{1}), as shown in Fig. 3b. To the best of our knowledge, no previous experimental or theoretical data exist for either NaRSnO_{4} or NaRRuO_{4} compounds. In addition, stannates are relevant to the design of transparent conducting oxides^{18}, and ruthenates are potential materials for investigating metal–insulator transitions^{42}.
We chose NaRSnO_{4} and NaRRuO_{4} for validation in particular, motivated (albeit naively) by the adaptive design paradigm^{14}, whose objective is to iteratively improve the predictions of the classification model. Improvements are typically made by choosing for experiment compositions that show promising characteristics (such as the NCS crystal classes discussed here) yet carry large uncertainties. NaRSnO_{4} and NaRRuO_{4} satisfy these criteria. The five decision trees predicted ⊕ (NCS), (η_{1},η_{2}) (NCS), (0,η_{1}) (CS), ⊕ (NCS) and (η_{1},η_{2}) (NCS), corresponding to the Pca2_{1} (polar), P2_{1}2_{1}2 (chiral), Pbcm (centrosymmetric), Pca2_{1} (polar) and P2_{1}2_{1}2 (chiral) space groups, respectively. Four of the five decision trees predict a chiral or polar structure for these compounds, making them promising NCS candidates. Yet the predicted irrep labels or space groups differ, indicating uncertainty. Furthermore, the nominal electronic configuration of Sn^{4+} (4d^{10}) in the stannates differs from that of the SOJT cation Ti^{4+} (3d^{0}), which makes for an interesting comparison between the two B-site octahedral cations. Their ionic sizes within the hard-sphere model also differ: the Shannon ionic radii of Sn^{4+} and Ti^{4+} in sixfold coordination are 0.69 and 0.605 Å, respectively^{43}. Similarly, the ruthenates (with Ru nominally 4+) have partially filled 4d shells, with four electrons occupying the t_{2g} orbital manifold, and are quite distinct from the 3d^{0} titanates.
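One simple way to make "promising yet uncertain" quantitative for a five-tree ensemble is to report the fraction of NCS votes together with the entropy of the predicted labels. The text does not specify an uncertainty metric, so this choice is our assumption. The sketch applies it to the five space-group predictions quoted above, written in plain ASCII.

```python
import math
from collections import Counter

# Space groups treated as NCS in this illustration (plain-ASCII names).
NCS_SPACE_GROUPS = {"Pca2_1", "P2_12_12", "P-42_1m"}


def ensemble_summary(predictions):
    """Return (fraction of NCS votes, Shannon entropy of labels in bits)."""
    counts = Counter(predictions)
    n = len(predictions)
    ncs_fraction = sum(c for sg, c in counts.items() if sg in NCS_SPACE_GROUPS) / n
    entropy_bits = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return ncs_fraction, entropy_bits


preds = ["Pca2_1", "P2_12_12", "Pbcm", "Pca2_1", "P2_12_12"]
print(ensemble_summary(preds))  # high NCS fraction, high label entropy
```

For these predictions the NCS fraction is 0.8 and the entropy is about 1.52 bits. A unanimous ensemble would give 0 bits, so a high vote fraction combined with high entropy flags exactly the "promising but uncertain" case.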
Stannates
We performed full structural relaxations for NaRSnO_{4} (R=La, Pr, Nd, Gd and Y) within the generalized gradient approximation (cf. Methods). The phonon dispersions are given in Supplementary Fig. 9. From them we identify six candidate crystal symmetries common to all stannates, obtained by ‘freezing in’ the imaginary phonon modes of the high-symmetry paraelectric reference phase (P4/nmm): Pmn2_{1}, Pc, P\bar{4}2_{1}m, P\bar{4}2m, I\bar{4}2m and Pnma. To confirm the ground state unambiguously, we also considered three further symmetries recommended by ML (P2_{1}2_{1}2, Pbcm and Pca2_{1}), giving nine distorted candidate structures in total. The DFT total energies in Table 4 show that all stannates exhibit a strong energetic competition between two NCS symmetries: the piezoelectrically active P\bar{4}2_{1}m [ (η_{1},η_{1})] and the chiral P2_{1}2_{1}2 [ (η_{1},η_{2})]. The total energy difference between these two phases is <0.1 meV per f.u. (Table 4). A closer examination of the two converged crystal structures revealed that they differ mainly in the in-plane lattice parameters: a=b in P\bar{4}2_{1}m, whereas a≠b in P2_{1}2_{1}2 (Fig. 3c,d, respectively). Moreover, in P2_{1}2_{1}2 the in-plane lattice constant a differed from b only in the fourth or fifth decimal place. We therefore assign the ground-state structure of the stannates to the NCS P\bar{4}2_{1}m space group. We conclude from our DFT calculations that the RP stannates are NCS, in good agreement with the ML insights. Their inversion symmetry is broken by the coupled action of SnO_{6} oxygen octahedral tilting and Na/R cation ordering (Route 1).
We then computed the bandgap (E_{g}) of each compound using the HSEsol exchange-correlation functional, which often reproduces experimental results more accurately^{44}. The bandgaps lie between 4.3 and 4.5 eV (Table 5), similar to that of Ba_{2}SnO_{4} (E_{g}=4.41 eV)^{18}. The fraction of exact exchange used in the calculations was tuned to reproduce the known experimental bandgap of BaSnO_{3} (ref. 45).
We next computed the piezoelectric strain coefficients (d_{ij}) of each compound in the P\bar{4}2_{1}m space group (Fig. 5). The d_{ij} responses are marginally smaller than those reported for the titanates^{16} but follow the same trend: they increase with decreasing R atomic radius up to R=Gd and then decrease.
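For reference, piezoelectric strain coefficients are commonly obtained from the piezoelectric stress tensor e and the elastic stiffness tensor C through the standard relation d = eC^{−1} (Voigt notation, 3×6 and 6×6 matrices). This section does not state how d_{ij} was obtained here (see Methods). The sketch below only illustrates the relation and its unit conversion, and the function name and inputs are hypothetical.

```python
import numpy as np


def piezo_strain_coefficients(e: np.ndarray, c_gpa: np.ndarray) -> np.ndarray:
    """Return d (3x6, pC/N) from e (3x6, C/m^2) and stiffness C (6x6, GPa).

    Uses d = e S with compliance S = C^-1; solving C^T X = e^T avoids forming
    the inverse explicitly. Units: 1 (C/m^2)/GPa = 1e-9 C/N = 1e3 pC/N.
    """
    return 1e3 * np.linalg.solve(c_gpa.T, e.T).T


# Toy input: diagonal stiffness of 100 GPa and a single e_15 = 1 C/m^2.
C = 100.0 * np.eye(6)
e = np.zeros((3, 6))
e[0, 4] = 1.0
print(piezo_strain_coefficients(e, C)[0, 4])  # 10.0 pC/N
```

A small e combined with a soft elastic response can still give a sizeable d, which is why d_{ij} trends need not simply follow trends in e_{ij}.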
Ruthenates
All ruthenate DFT calculations were performed with the spin-polarized DFT+U method, using an effective Hubbard U of 1.5 eV to treat the correlated Ru 4d electrons (cf. Methods). The phonon dispersions (Supplementary Fig. 10) show some similarities with those of the stannates. We explored a total of nine distorted crystal symmetries to determine the ground state: six from phonon calculations and three from ML. Table 4 gives the DFT+U total energies for NaRRuO_{4} in the different crystal symmetries with ferromagnetic spin order. The ground state of NaLaRuO_{4}, NaPrRuO_{4} and NaNdRuO_{4} is NCS, with two competing structures, P2_{1}2_{1}2 and P\bar{4}2_{1}m. As for the stannates, a differed from b in the P2_{1}2_{1}2 symmetry only at the fourth decimal place. We also performed additional DFT+U calculations for the two lowest-energy structures (P\bar{4}2_{1}m and Pca2_{1}), this time imposing antiferromagnetic spin order on the in-plane Ru atoms (shown schematically in Supplementary Fig. 11). From the total energies in Table 6, we conclude that the likely ground state of these compounds is the NCS P\bar{4}2_{1}m space group with ferromagnetic Ru^{4+}–O^{2−}–Ru^{4+} interactions (Route 1).
For NaGdRuO_{4} and NaYRuO_{4}, the ground state structure is also NCS, but with the polar Pca2_{1} symmetry (see Table 4). Furthermore, for both compounds the Pca2_{1} structure with in-plane antiferromagnetic Ru^{4+}–O^{2−}–Ru^{4+} interactions (Supplementary Fig. 11) is lower in energy than the ferromagnetic structure, by 1.44 and 5.54 meV per atom for NaGdRuO_{4} and NaYRuO_{4}, respectively. The total energies, along with the Ru magnetic moments, are given in Table 6. Thus, we predict NaGdRuO_{4} and NaYRuO_{4} to have polar Pca2_{1} ground state structures (Route 3) with antiferromagnetic spin order.
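Selecting the ground state amounts to ranking the relaxed candidates (crystal symmetry plus magnetic order) by total energy per atom, because cells of different symmetry can contain different numbers of atoms. A minimal sketch, assuming total energies per cell and atom counts are available; the labels and energies in the check are hypothetical placeholders, not the values of Tables 4 or 6.

```python
def rank_candidates(candidates):
    """Rank candidate structures by energy per atom.

    candidates: dict mapping a (space_group, spin_order) label to
                (total_energy_eV_per_cell, n_atoms_per_cell).
    Returns a list of (label, meV/atom above the lowest-energy candidate),
    sorted from lowest to highest energy.
    """
    per_atom = {label: e / n for label, (e, n) in candidates.items()}
    e_min = min(per_atom.values())
    return sorted(((label, 1000.0 * (e - e_min)) for label, e in per_atom.items()),
                  key=lambda item: item[1])
```

The first entry is the predicted ground state, and the difference between, for example, the antiferromagnetic and ferromagnetic variants of the same symmetry gives the magnetic energy difference in meV per atom.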
We also calculated the electronic band structures of all five NaRRuO_{4} compounds in their respective ground states; the results are shown in Supplementary Fig. 11. NaLaRuO_{4} is metallic, with bands crossing the Fermi level in both the spin-up and spin-down channels. In contrast, NaPrRuO_{4} and NaNdRuO_{4} are half-metals: bands cross the Fermi level only in the spin-down channel, while a gap opens in the spin-up channel. Moreover, this gap widens as the rare-earth cation size decreases, because the amplitude of RuO_{6} octahedral tilting increases with decreasing rare-earth cation size, which reduces the electronic bandwidths of the Ru t_{2g} orbitals. Ferromagnetic metals and half-metals have been reported previously in ruthenium-based oxides^{46,47}. What is notable here is that the NaLaRuO_{4}, NaPrRuO_{4} and NaNdRuO_{4} RP oxides are also NCS with piezoactive symmetries. These compounds thus add to the growing list of NCS metals^{19,20} and half-metals with unusual coexisting properties (broken inversion symmetry and metallic-like conduction).
In contrast, the NCS NaGdRuO_{4} and NaYRuO_{4} are insulating, with a gap in both the spin-up and spin-down channels (see Supplementary Fig. 11). Ruthenium oxides with antiferromagnetic insulating ground states are not uncommon; for example, RP Ca_{2}RuO_{4} is a known antiferromagnetic insulator with the CS Pbca space group (Fig. 3e) at low temperatures^{48,49}. We therefore predict NaGdRuO_{4} and NaYRuO_{4} to be potential multiferroics, combining polar symmetry, antiferromagnetic spin order and a bandgap. Are these stannates and ruthenates also thermodynamically stable? We address this question in the next section.
Thermodynamic stability
We use grand canonical linear programming^{50} to determine the thermodynamic stability of the predicted RP stannates and ruthenates. The ‘reservoir’ of stable compounds in the Open Quantum Materials Database^{51} was used to construct the theoretical convex hull. The procedure calculates the total energy change (ΔE^{D}) for a chemical reaction whose reactants are known thermodynamically stable compounds and whose product is the ground state structure of a predicted RP compound. Compounds with negative ΔE^{D} are identified as thermodynamically stable.
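Grand canonical linear programming finds the lowest-energy combination of reservoir phases at the composition x of the compound of interest: minimize Σ_{i} f_{i}E_{i} subject to Σ_{i} f_{i}c_{i} = x and f_{i} ≥ 0, where c_{i} and E_{i} are the atomic fractions and energy per atom of reservoir phase i. ΔE^{D} is then the energy per atom of the compound minus this minimum. The pure-Python sketch below solves that linear program by brute-force enumeration of basic solutions (the optimum lies at a vertex using at most as many phases as there are elements), which is adequate for a handful of phases. It is not the implementation of ref. 50 or of the OQMD, a production workflow would use a proper LP solver, and the binary A–B system in the check is hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Gaussian elimination with partial pivoting; returns None if a is singular."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def hull_energy(x, reservoir):
    """Lowest energy per atom of any mixture of reservoir phases with composition x.

    x: atomic fractions of each element (summing to 1).
    reservoir: list of (atomic_fractions, energy_eV_per_atom). It must exclude the
    compound being tested and should include the elemental phases.
    """
    k = len(x)
    best = None
    for subset in combinations(reservoir, k):
        # Columns of the constraint matrix are the compositions of the chosen phases.
        a = [[comp[el] for comp, _ in subset] for el in range(k)]
        f = solve(a, x)
        if f is None or min(f) < -1e-9:
            continue  # singular basis or a negative amount of some phase: infeasible
        energy = sum(fi * e for fi, (_, e) in zip(f, subset))
        if best is None or energy < best:
            best = energy
    if best is None:
        raise ValueError("composition cannot be formed from the reservoir")
    return best


def decomposition_energy(x, e_compound, reservoir):
    """Delta E^D in eV/atom; a negative value means the compound lies below the hull."""
    return e_compound - hull_energy(x, reservoir)


# Hypothetical A-B system: elements A and B plus a stable AB compound.
reservoir = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([0.5, 0.5], -0.20)]
print(decomposition_energy([2 / 3, 1 / 3], -0.10, reservoir))  # ~0.0333 eV/atom
```

In the example, a hypothetical A_{2}B compound at −0.10 eV per atom lies about 33 meV per atom above the hull formed by A and AB, so it would count as metastable.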
Compounds with positive ΔE^{D} (that is, metastable compounds) have nonetheless been synthesized. A common rule of thumb is that a composition lying less than +25 meV per atom above the convex hull could potentially be synthesized under appropriate experimental conditions^{52}. To evaluate this criterion for our design problem, we first calculated ΔE^{D} for Ca_{2}IrO_{4}, which was recently grown epitaxially in the RP structure type by pulsed laser deposition^{29}; RP Ca_{2}IrO_{4} is known to be a metastable phase^{29}. Our main motivation is to compare the ΔE^{D} of Ca_{2}IrO_{4} with those of our newly predicted compounds (especially those with positive ΔE^{D}) and glean additional insights. The results are given in Table 4. The ΔE^{D} values for RP Ca_{2}IrO_{4} in the theoretical ground state and high-symmetry structures are +34 and +156 meV per atom above the convex hull, respectively, yet the compound was successfully synthesized. We give ΔE^{D} for both structures because Souri et al.^{29} do not report the crystal symmetry of their thin film, so the appropriate reference point is unclear.
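The stability yardsticks discussed above (ΔE^{D}<0 for thermodynamic stability, the empirical +25 meV per atom window^{52}, and the +34 meV per atom computed for the synthesized Ca_{2}IrO_{4} ground state) can be applied as a simple screen. This helper is only an illustrative sketch; the category wording is ours, not the paper's.

```python
HULL_WINDOW_MEV = 25.0        # empirical synthesizability window (meV/atom)
CA2IRO4_BENCHMARK_MEV = 34.0  # Delta E^D of the synthesized RP Ca2IrO4 ground state (meV/atom)


def classify(delta_e_mev):
    """Place a decomposition energy (meV/atom above the hull) into a stability category."""
    if delta_e_mev < 0:
        return "stable"
    if delta_e_mev < HULL_WINDOW_MEV:
        return "metastable, within the 25 meV/atom window"
    if delta_e_mev <= CA2IRO4_BENCHMARK_MEV:
        return "metastable, within the Ca2IrO4 benchmark"
    return "metastable, above both thresholds"
```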
Having benchmarked ΔE^{D} against Ca_{2}IrO_{4}, we return to our predicted NCS stannates and ruthenates. Table 4 provides the ΔE^{D} data for both the stannates and the ruthenates, and the associated decomposition reactions are given in Supplementary Note 4. Two of the 10 compounds, NaGdRuO_{4} and NaYRuO_{4}, have negative ΔE^{D}; we therefore identify them as thermodynamically stable and promising for synthesis. The remaining eight compounds have ΔE^{D}≤+82 meV per atom.
Additional predictions
In Table 7, we report results for nine additional, randomly chosen compounds that ML predicted to have NCS ground state structures. The total energies, along with the different crystal symmetries obtained from both phonon calculations and ML, are given in Supplementary Table 3. Seven of the nine compounds are found to have NCS ground state structures, in good agreement with our classification learning. Note that some of them (for example, KBaNbO_{4} and NaCaTaO_{4}) adopt space groups not seen in any known or reported RP compound (see Fig. 4). This is because we did not constrain our DFT calculations to known structures or to those from ML, but instead performed phonon calculations and full structural relaxations. The decomposition energies, ΔE^{D}, of all nine compounds are also given in Table 7. Six of the nine compounds have either a negative ΔE^{D} (thermodynamically stable) or ΔE^{D}≤+34 meV per atom (that is, no higher above the hull than the synthesized Ca_{2}IrO_{4} ground state), making them promising candidates. Experimental results are necessary to confirm these predictions. Table 3 lists the chemistries of all 242 predicted RP oxides that show potential for NCS structures. The DFT-optimized ground state crystallographic information files for all 19 compounds can be downloaded from ref. 53.
As a general observation, the NCS P\bar{4}2_{1}m space group that DFT predicts for 13 of the 19 compositions is also one of the most commonly observed experimental ground states of n=1 RP compounds^{16,17} (see also Fig. 4).
Discussion
We developed a computational strategy built on applied group theory, ML and DFT to design NCS RP compounds. Regarding the novelty of our informatics approach, using irreps rather than space groups (which are normally used) as class labels for ML is new to materials science. The role of group theory in our framework was to map the space groups onto irreps, which reduced the complexity of our classification problem from 9 to 6 class labels. Even after this reduction, our data set suffered from class imbalance. To address this deficiency, we applied the SMOTE algorithm to generate synthetic data points and then constructed an ensemble of decision trees for irrep classification. Our decision trees identified 242 new compositions (from screening ∼3,200 compositions) that show potential for NCS ground states. We tested our predictions for 19 compositions using DFT, of which 17 were validated to have NCS ground state structures, indicating good agreement between our informatics-based predictions and the DFT ground state structures. One of the major design outcomes is the identification of two new multiferroics (NaGdRuO_{4} and NaYRuO_{4}), which were also determined to be thermodynamically stable.
Not all our ML predictions agreed with the DFT calculations. For example, KLaIrO_{4} and BaLaGaO_{4} were predicted to be NCS, but our frozen-phonon calculations and full structural relaxations indicate otherwise (Table 7). The inconsistencies are most pronounced when both A/A′ cations have relatively large ionic radii (for example, K, Ba or La). Our DFT calculations reveal that large A/A′ cations significantly reduce the amplitude of octahedral tilting, which we ascribe to steric effects; our ML models appear to misclassify such compounds as NCS.
There are several ways to reduce such misclassification errors and improve our ML prediction accuracies. First, one of the most promising directions is to synthesize the predicted materials and determine the crystal structure of each compound, which would allow us to augment our data set with new data points and retrain our ML models. We anticipate that our ML models will learn rapidly from these new data points and improve their prediction accuracy in subsequent iterations^{32}. Second, our current ML models are based on five decision tree classifiers; a natural extension would be to construct more than five bootstrapped samples and generate additional decision trees (or apply a random forest algorithm with hundreds of classifiers), which could, in principle, reduce the misclassification errors. Exploring kernel-based ML algorithms, such as support vector machines, and semi-supervised learning schemes also represents an alternative informatics-based avenue to gain confidence in, or reduce uncertainties of, our predictions.
Furthermore, we demonstrated the use of the SMOTE algorithm for the first time in materials design problems; a number of newer algorithms^{35} have recently been developed to address similar class-imbalance problems and could also be explored. Class-imbalance problems are ubiquitous in materials design and remain uncharted territory in materials informatics^{54}. Finally, the choice of more robust features could also improve the prediction accuracies. Further computational efforts aimed at exhaustively evaluating the potential energy surface of related phases^{55}, or alternatively, data-driven approaches^{56} involving inference models, could further refine the predictions by addressing issues related to compound formability and order-disorder transitions.
Notwithstanding these limitations, our approach provides a rational framework for structure-based design of novel functional materials, with implications beyond the layered RP oxides. For instance, our methodology can be extended to explore NCS structures in the Dion–Jacobson, Aurivillius and Brownmillerite families, or in any other crystal family. In principle, our strategy could also guide the search for materials with intriguing functionalities such as ferroaxiality^{57}. The key component in realizing such predictions will be the database construction process; more importantly, the nature of the available data (including features) will determine the type of questions that can be addressed. In terms of ML methods, off-the-shelf classification learning with class-imbalance algorithms (such as those demonstrated in this work) has the potential to provide the insights needed to guide the accelerated search for new materials with targeted crystal symmetry or functionality. Advanced learning strategies (for example, semi-supervised learning, algorithms beyond SMOTE and Bayesian methods) may be necessary, but their choice and formulation will hinge critically on the available databases and/or prior domain knowledge.
Methods
Group theory
The group theoretical analysis was performed using the ISOTROPY^{58} tool and electronic resources available from the Bilbao Crystallographic Server^{59}.
Materials informatics
We used the following inference and ML methods in this paper: PCA for data-dimensionality reduction and feature extraction^{60}; the bootstrap method, which constructs multiple data sets from our experimental data set by sampling with replacement; decision tree classification learning^{61} for formulating QCSR design rules; and SMOTE^{34} to rectify the class-imbalance problem. We chose the decision tree classification learner for the following reasons^{62}: (i) decision trees are interpretable, making the model transparent to domain experts; (ii) the splitting criterion (for example, Shannon entropy) accomplishes feature selection without the need for any additional ML method; (iii) they are scalable; and (iv) they can match the prediction accuracies of state-of-the-art ML methods. ML calculations were performed using RSTUDIO and WEKA, with the decision tree algorithm as implemented in WEKA. The data set was constructed using the Waber–Cromer orbital radii as features.
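The ingredients named above (bootstrap resampling, entropy-based splits and an ensemble of trees) can be illustrated compactly. The sketch below is a simplified stand-in, not WEKA's decision tree learner used in this work: each "tree" is a single entropy-chosen split (a decision stump) trained on one bootstrap sample, and the ensemble is combined by majority vote, which is our assumption. Feature vectors would be, for example, orbital radii; the data in the check are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_stump(rows, labels):
    """One-level decision tree: pick the (feature, threshold) split with the largest
    information gain. Returns (feature, threshold, left_label, right_label)."""
    base = entropy(labels)
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold midway between observed values
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            if best is None or gain > best[0]:
                best = (gain, j, t, Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # every feature is constant in this sample: predict the majority class
        majority = Counter(labels).most_common(1)[0][0]
        return (0, 0.0, majority, majority)
    return best[1:]


def predict_stump(stump, row):
    j, t, left_label, right_label = stump
    return left_label if row[j] <= t else right_label


def bagged_stumps(rows, labels, n_models=5, seed=0):
    """Train one stump per bootstrap sample (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(best_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return models


def predict_vote(models, row):
    """Majority vote over the ensemble."""
    return Counter(predict_stump(m, row) for m in models).most_common(1)[0][0]
```

Full decision trees apply the same entropy criterion recursively to each branch; the stump keeps only the first split so that the bootstrap and voting steps stay visible.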
The class-imbalance problem was rectified using the SMOTE algorithm. When classes are imbalanced, ML models can ignore the less frequently observed class labels and group them with more frequently occurring class labels that lie nearby in the high-dimensional feature space. This is undesirable for this work, because the NCS space groups are already under-represented in the data. The input to SMOTE is our data set and three additional parameters: (i) the under-represented (minority) class label that we intend to oversample, (ii) the number of nearest neighbours (k) and (iii) the number of extra synthetic data samples (in %) to be created. The SMOTE algorithm functions as follows: for a sample of an under-represented irrep, it takes the difference between its feature vector (that is, orbital radii) and that of one of its k nearest neighbours from the same class, multiplies this difference by a random number between 0 and 1, and adds the result to the original feature vector to create a new feature vector. This new feature vector is appended to the original data set. As a result, each synthetic data point lies at a random position along the line segment joining the two samples (a simplified visual representation of the process based on our data set is given in Supplementary Fig. 1). We used PCA to ensure that SMOTE did not alter the manifold of our data set. We use the SMOTE algorithm as implemented in WEKA^{37}.
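A minimal sketch of the SMOTE step, following the procedure of Chawla et al.^{34}: for every minority-class sample, one of its k nearest minority-class neighbours is picked at random and a synthetic point is placed a random fraction of the way along the segment joining them. For brevity it uses Euclidean distance and only handles oversampling amounts that are multiples of 100%; WEKA's implementation^{37} offers more options. The feature vectors and parameter values in the check are hypothetical.

```python
import math
import random


def smote(minority, k=5, percent=100, seed=0):
    """Return synthetic samples for the minority class.

    minority: feature vectors (e.g. orbital radii) of the under-represented class.
    k: number of nearest minority-class neighbours to choose from.
    percent: oversampling amount; here restricted to multiples of 100
             (100 -> one synthetic point per original sample).
    """
    if percent % 100 != 0 or len(minority) < 2:
        raise ValueError("need >= 2 minority samples and percent a multiple of 100")
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        # k nearest neighbours of x within the minority class (Euclidean distance)
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        for _ in range(percent // 100):
            nn = rng.choice(neighbours)
            gap = rng.random()  # random number in [0, 1)
            # x + gap * (nn - x): a point on the segment from x towards nn
            synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic
```

Because every synthetic point is an interpolation between two existing minority samples, the new points stay inside the region already occupied by that class, which is consistent with the PCA check described above.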
Electronic structure calculations
DFT calculations for all RP compounds were performed using the plane-wave pseudopotential code Quantum ESPRESSO (QE)^{63} to obtain the total energies. We used ultrasoft pseudopotentials^{64} with the PBEsol exchange-correlation functional^{65} taken from the PSlibrary^{66}. A plane-wave cutoff of 60 Ry was used during the ionic and electronic relaxation steps. Electron correlations in the Ru 4d and Ir 5d states were treated using the Hubbard U method within the Dudarev formalism^{67}, with an effective Hubbard U of 1.5 eV in both cases. Spin-polarized calculations were performed with collinear ferromagnetic spin order imposed on the Ru and Ir atoms. Frozen phonon calculations were performed using the PHONOPY code^{68}, which uses the forces from QE as input to compute the interatomic force constants and dynamical matrices. We employed a 2 × 2 × 2 supercell containing 112 atoms for the frozen phonon calculations.
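In the finite-displacement (frozen-phonon) approach, the interatomic force constants are obtained from the forces induced by small atomic displacements in the supercell, and the phonon frequencies from the eigenvalues of the dynamical matrix. The central-difference form below is one standard choice, shown as a sketch; the displacement set PHONOPY actually generates depends on the supercell symmetry.

```latex
\Phi_{\alpha\beta}(jl, j'l') = \frac{\partial^{2} E}{\partial u_{\alpha}(jl)\,\partial u_{\beta}(j'l')}
  \approx -\frac{F_{\beta}(j'l';\,+\Delta) - F_{\beta}(j'l';\,-\Delta)}{2\Delta},
\qquad
D_{\alpha\beta}(jj',\mathbf{q}) = \sum_{l'} \frac{\Phi_{\alpha\beta}(j0, j'l')}{\sqrt{m_{j} m_{j'}}}\,
  e^{i\mathbf{q}\cdot[\mathbf{r}(j'l') - \mathbf{r}(j0)]},
\qquad
D(\mathbf{q})\,\mathbf{e}_{\mathbf{q}\nu} = \omega_{\mathbf{q}\nu}^{2}\,\mathbf{e}_{\mathbf{q}\nu}
```

Here u_{α}(jl) is the displacement of atom j in cell l along Cartesian direction α, F_{β}(j′l′; ±Δ) is the force on atom j′l′ when atom jl is displaced by ±Δ along α, m_{j} are the atomic masses and r(jl) the equilibrium positions. A negative eigenvalue ω^{2} (an imaginary frequency) marks a lattice instability, and its eigenvector describes the symmetry-lowering distortion that is followed to lower-symmetry structures.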
All calculations of the bandgaps and piezoelectric coefficients of NaRSnO_{4} were performed using DFT as implemented in the Vienna ab initio Simulation Package^{69,70}, starting from the crystal structures obtained in the converged QE calculations. We used projector augmented-wave potentials^{71} with the PBEsol functional. The piezoelectric and elastic tensors were computed within density-functional perturbation theory^{72,73} with a plane-wave cutoff of 800 eV. The densities of states were computed first with PBEsol and then with different amounts of exact exchange using the HSE (Heyd–Scuseria–Ernzerhof) functional. By comparing our computed bandgaps of BaSnO_{3} with the experimental value, we selected the amount of exact exchange to use (here 35%).
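The exact-exchange tuning can be sketched as a one-dimensional interpolation: compute the gap at a few exact-exchange fractions α and pick the α at which the gap matches the experimental value. This assumes the gap varies roughly linearly with α between sampled points (a common approximation for hybrid functionals, not a statement of how the 35% used here was obtained); the sampled gaps in the check are hypothetical.

```python
def tune_exact_exchange(samples, target_gap):
    """Return the exact-exchange fraction alpha at which the linearly
    interpolated bandgap equals target_gap.

    samples: (alpha, gap_eV) pairs from hybrid-functional calculations, any order.
    """
    pts = sorted(samples)  # sort by alpha
    for (a0, g0), (a1, g1) in zip(pts, pts[1:]):
        # Use the first interval whose gaps bracket the target value.
        if g1 != g0 and min(g0, g1) <= target_gap <= max(g0, g1):
            return a0 + (target_gap - g0) * (a1 - a0) / (g1 - g0)
    raise ValueError("target gap is outside the sampled range")
```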
Data availability
The data sets for the informatics study and the DFT-optimized crystallographic information files are deposited at figshare (refs 41 and 53).
Additional information
How to cite this article: Balachandran, P. V. et al. Learning from data to design functional materials without inversion symmetry. Nat. Commun. 8, 14282 doi: 10.1038/ncomms14282 (2017).
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
 1
Fletcher, S. P. Building blocks of life: growing the seeds of homochirality. Nat. Chem. 1, 692–693 (2009).
 2
Halasyamani, P. S. & Poeppelmeier, K. R. Noncentrosymmetric oxides. Chem. Mater. 10, 2753–2769 (1998).
 3
Yogev-Einot, D. & Avnir, D. Quantitative symmetry and chirality of the molecular building blocks of quartz. Chem. Mater. 15, 464–472 (2003).
 4
Hazen, R. M. & Sholl, D. S. Chiral selection on inorganic crystalline surfaces. Nat. Mater. 2, 367–374 (2003).
 5
Haertling, G. H. Ferroelectric ceramics: history and technology. J. Am. Ceram. Soc. 82, 797–818 (1999).
 6
Halasyamani, P. S. Asymmetric cation coordination in oxide materials: influence of lone-pair cations on the intra-octahedral distortion in d^{0} transition metals. Chem. Mater. 16, 3586–3592 (2004).
 7
Ok, K. M. et al. Distortions in octahedrally coordinated d^{0} transition metal oxides: a continuous symmetry measures approach. Chem. Mater. 18, 3176–3183 (2006).
 8
Brock, C. P. & Dunitz, J. D. Towards a grammar of crystal packing. Chem. Mater. 6, 1118–1127 (1994).
 9
Müller, K. A. & Burkard, H. SrTiO3: an intrinsic quantum paraelectric below 4 K. Phys. Rev. B 19, 3593–3602 (1979).
 10
Ruddlesden, S. N. & Popper, P. New compounds of the K2MF4 type. Acta Crystallogr. 10, 538–539 (1957).
 11
Benedek, N. A. & Fennie, C. J. Hybrid improper ferroelectricity: a mechanism for controllable polarization-magnetization coupling. Phys. Rev. Lett. 106, 107204 (2011).
 12
Benedek, N. A., Rondinelli, J. M., Djani, H., Ghosez, P. & Lightfoot, P. Understanding ferroelectricity in layered perovskites: new ideas and insights from theory and experiments. Dalton Trans. 44, 10543–10558 (2015).
 13
Roy, A., Bennett, J. W., Rabe, K. M. & Vanderbilt, D. Half-Heusler semiconductors as piezoelectrics. Phys. Rev. Lett. 109, 037602 (2012).
 14
Xue, D. et al. Accelerated search for materials with targeted properties by adaptive design. Nat. Commun. 7, 11241 (2016).
 15
Balachandran, P. V., Puggioni, D. & Rondinelli, J. M. Crystal-chemistry guidelines for noncentrosymmetric A2BO4 Ruddlesden-Popper oxides. Inorg. Chem. 53, 336–348 (2014).
 16
Akamatsu, H. et al. Inversion symmetry breaking by oxygen octahedral rotations in the Ruddlesden-Popper NaRTiO4 family. Phys. Rev. Lett. 112, 187602 (2014).
 17
Gupta, A. S. et al. Improper inversion symmetry breaking and piezoelectricity through oxygen octahedral rotations in layered perovskite family LiRTiO4 (R=Rare Earths). Adv. Electron. Mater. 2, 1500196 (2016).
 18
Li, Y., Zhang, L., Ma, Y. & Singh, D. J. Tuning optical properties of transparent conducting barium stannate by dimensional reduction. APL Mater. 3, 011102 (2015).
 19
Benedek, N. A. & Birol, T. ‘Ferroelectric’ metals reexamined: fundamental mechanisms and design considerations for new materials. J. Mater. Chem. C 4, 4000–4015 (2016).
 20
Kim, T. H. et al. Polar metals by geometric design. Nature 533, 68–72 (2016).
 21
Birol, T., Benedek, N. A. & Fennie, C. J. Interface control of emergent ferroic order in Ruddlesden-Popper Sr_{n+1}Ti_{n}O_{3n+1}. Phys. Rev. Lett. 107, 257602 (2011).
 22
Lander, G. H., Brown, P. J., Spałek, J. & Honig, J. M. Structural and magnetization density studies of La2NiO4. Phys. Rev. B 40, 4463–4471 (1989).
 23
Rodgers, J. A., Battle, P. D., Dupré, N., Grey, C. P. & Sloan, J. Cation and spin ordering in the n=1 Ruddlesden-Popper phase La2Sr2LiRuO8. Chem. Mater. 16, 4257–4266 (2004).
 24
Fennie, C. J. & Rabe, K. M. First-principles investigation of ferroelectricity in epitaxially strained Pb2TiO4. Phys. Rev. B 71, 100102 (2005).
 25
Zhang, R.-Z. et al. Ruddleson-Popper phase SnO(SnTiO3)n: lead-free layered ferroelectric materials with large spontaneous polarization. J. Appl. Phys. 116, 174101 (2014).
 26
Balachandran, P. V., Cammarata, A., Nelson-Cheeseman, B. B., Bhattacharya, A. & Rondinelli, J. M. Inductive crystal field control in layered metal oxides with correlated electrons. APL Mater. 2, 076110 (2014).
 27
Balachandran, P. V. & Rondinelli, J. M. Massive band gap variation in layered oxides through cation ordering. Nat. Commun. 6, 6191 (2015).
 28
Cammarata, A. & Rondinelli, J. M. Ferroelectricity from coupled cooperative Jahn–Teller distortions and octahedral rotations in ordered Ruddlesden-Popper manganates. Phys. Rev. B 92, 014102 (2015).
 29
Souri, M. et al. Investigations of metastable Ca2IrO4 epitaxial thin-films: systematic comparison with Sr2IrO4 and Ba2IrO4. Scientific Rep. 6, 25967 (2016).
 30
Waber, J. T. & Cromer, D. T. Orbital radii of atoms and ions. J. Chem. Phys. 42, 4116–4123 (1965).
 31
Zhang, X. & Zunger, A. Diagrammatic separation of different crystal structures of A2BX4 compounds without energy minimization: a pseudopotential orbital radii approach. Adv. Funct. Mater. 20, 1944–1952 (2010).
 32
Balachandran, P. V., Xue, D. & Lookman, T. Structure-Curie temperature relationships in BaTiO3-based ferroelectric perovskites: anomalous behavior of (Ba,Cd)TiO3 from DFT, statistical inference, and experiments. Phys. Rev. B 93, 144111 (2016).
 33
Nelson-Cheeseman, B. B. et al. Polar cation ordering: a route to introducing >10% bond strain into layered oxide films. Adv. Funct. Mater. 24, 6884–6891 (2014).
 34
Chawla, N. V., Bowyer, K. W., Hall, L. O. & Kegelmeyer, W. P. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Int. Res. 16, 321–357 (2002).
 35
Nanni, L., Fantozzi, C. & Lazzarini, N. Coupling different methods for overcoming the class imbalance problem. Neurocomputing 158, 48–61 (2015).
 36
Balachandran, P. V., Theiler, J., Rondinelli, J. M. & Lookman, T. Materials prediction via classification learning. Scientific Rep. 5, 13285 (2015).
 37
Hall, M. et al. The WEKA Data Mining Software: an update. SIGKDD Explor. Newsl. 11, 10–18 (2009).
 38
R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2012) ISBN 3-900051-07-0.
 39
Ghiringhelli, L. M., Vybiral, J., Levchenko, S. V., Draxl, C. & Scheffler, M. Big data of materials science: Critical role of the descriptor. Phys. Rev. Lett. 114, 105503 (2015).
 40
Fujii, K. et al. New perovskite-related structure family of oxide-ion conducting materials NdBaInO4. Chem. Mater. 26, 2488–2491 (2014).
 41
Balachandran, P. V., Young, J., Lookman, T. & Rondinelli, J. M. Learning from data to design functional materials without inversion symmetry (Datasets). doi:10.6084/m9.figshare.4264190.v1 (2016).
 42
Maeno, Y., Nakatsuji, S. & Ikeda, S. Metal–insulator transitions in layered ruthenates. Mater. Sci. Eng. B 63, 70–75 (1999).
 43
Shannon, R. D. Revised effective ionic radii and systematic studies of interatomic distances in halides and chalcogenides. Acta Crystallogr. A 32, 751–767 (1976).
 44
Heyd, J., Scuseria, G. E. & Ernzerhof, M. Hybrid functionals based on a screened Coulomb potential. J. Chem. Phys. 118, 8207–8215 (2003).
 45
Kim, H. J. et al. Physical properties of transparent perovskite oxides (Ba,La)SnO3 with high electrical mobility at room temperature. Phys. Rev. B 86, 165205 (2012).
 46
Mazin, I. I. & Singh, D. J. Electronic structure and magnetism in Ru-based perovskites. Phys. Rev. B 56, 2556–2571 (1997).
 47
Rondinelli, J. M., Caffrey, N. M., Sanvito, S. & Spaldin, N. A. Electronic properties of bulk and thin film SrRuO3: search for the metal–insulator transition. Phys. Rev. B 78, 155107 (2008).
 48
Jung, J. H. et al. Change of electronic structure in Ca2RuO4 induced by orbital ordering. Phys. Rev. Lett. 91, 056403 (2003).
 49
Gorelov, E. et al. Nature of the Mott transition in Ca2RuO4 . Phys. Rev. Lett. 104, 226401 (2010).
 50
Akbarzadeh, A. R., Ozoliņš, V. & Wolverton, C. First-principles determination of multicomponent hydride phase diagrams: application to the Li-Mg-N-H system. Adv. Mater. 19, 3233–3239 (2007).
 51
Saal, J. E., Kirklin, S., Aykol, M., Meredig, B. & Wolverton, C. Materials design and discovery with high-throughput density functional theory: the Open Quantum Materials Database (OQMD). JOM 65, 1501–1509 (2013).
 52
Körbel, S., Marques, M. A. L. & Botti, S. Stability and electronic properties of new inorganic perovskites from high-throughput ab initio calculations. J. Mater. Chem. C 4, 3157–3167 (2016).
 53
Balachandran, P. V., Young, J., Lookman, T. & Rondinelli, J. M. Learning from data to design functional materials without inversion symmetry (CIF files). doi:10.6084/m9.figshare.4264214.v1 (2016).
 54
Rondinelli, J. M., Poeppelmeier, K. R. & Zunger, A. Research update: towards designed functionalities in oxide-based electronic materials. APL Mater. 3, 080702 (2015).
 55
Gautier, R. et al. Prediction and accelerated laboratory discovery of previously unknown 18-electron ABX compounds. Nat. Chem. 7, 308–316 (2015).
 56
Balachandran, P. V., Broderick, S. R. & Rajan, K. Identifying the ‘inorganic gene’ for high-temperature piezoelectric perovskites through statistical learning. Proc. R. Soc. Ser. A 467, 2271–2290 (2011).
 57
Hlinka, J., Privratska, J., Ondrejkovic, P. & Janovec, V. Symmetry guide to ferroaxial transitions. Phys. Rev. Lett. 116, 177602 (2016).
 58
Stokes, H. T., Hatch, D. M. & Campbell, B. J. ISOTROPY Software Suite. http://stokes.byu.edu/iso/isotropy.php (2007).
 59
Kroumova, E. et al. Bilbao Crystallographic Server: useful databases and tools for phase-transition studies. Phase Transit. 76, 155–170 (2003).
 60
Jolliffe, I. in Wiley StatsRef: Statistics Reference Online Wiley (2014).
 61
Quinlan, J. R. in Proceedings of the Thirteenth National Conference on Artificial Intelligence, Vol. 1 (AAAI’96) 725–730 (AAAI Press, 1996).
 62
Geurts, P., Irrthum, A. & Wehenkel, L. Supervised learning with decision tree-based methods in computational and systems biology. Mol. BioSyst. 5, 1593–1605 (2009).
 63
Giannozzi, P. et al. QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials. J. Phys.: Condens. Matter 21, 395502 (2009).
 64
Vanderbilt, D. Soft self-consistent pseudopotentials in a generalized eigenvalue formalism. Phys. Rev. B 41, 7892–7895 (1990).
 65
Perdew, J. P. et al. Restoring the density-gradient expansion for exchange in solids and surfaces. Phys. Rev. Lett. 100, 136406 (2008).
 66
Dal Corso, A. Pseudopotentials periodic table: from H to Pu. Comput. Mater. Sci. 95, 337–350 (2014).
 67
Dudarev, S. L., Peng, L.-M., Savrasov, S. Y. & Zuo, J.-M. Correlation effects in the ground-state charge density of Mott insulating NiO: a comparison of ab initio calculations and high-energy electron diffraction measurements. Phys. Rev. B 61, 2506–2512 (2000).
 68
Togo, A., Oba, F. & Tanaka, I. First-principles calculations of the ferroelastic transition between rutile-type and CaCl2-type SiO2 at high pressures. Phys. Rev. B 78, 134106 (2008).
 69
Kresse, G. & Hafner, J. Ab initio molecular dynamics for liquid metals. Phys. Rev. B 47, 558–561 (1993).
 70
Kresse, G. & Furthmüller, J. Efficiency of ab initio total energy calculations for metals and semiconductors using a plane-wave basis set. Comput. Mater. Sci. 6, 15–50 (1996).
 71
Blöchl, P. E. Projector augmented-wave method. Phys. Rev. B 50, 17953–17979 (1994).
 72
Baroni, S. & Resta, R. Ab initio calculation of the low-frequency Raman cross section in silicon. Phys. Rev. B 33, 5969–5971 (1986).
 73
Gajdos, M., Hummer, K., Kresse, G., Furthmuller, J. & Bechstedt, F. Linear optical properties in the PAW methodology. Phys. Rev. B 73, 045112 (2006).
 74
Friedt, O. et al. Structural and magnetic aspects of the metal–insulator transition in Ca2−xSrxRuO4 . Phys. Rev. B 63, 174432 (2001).
 75
Reul, J. et al. Temperature-dependent optical conductivity of layered LaSrFeO4. Phys. Rev. B 87, 205142 (2013).
 76
Sánchez-Andújar, M. & Señarís-Rodríguez, M. A. Synthesis, structure and microstructure of the layered compounds Ln1−xSr1+xCoO4 (Ln: La, Nd and Gd). Solid State Sci. 6, 21–27 (2004).
 77
Kao, T.-H. et al. Crystal structure and physical properties of Cr and Mn oxides with 3d^{3} electronic configuration and a K2NiF4-type structure. J. Mater. Chem. C 3, 3452–3459 (2015).
 78
Romero, J. et al. Phase transitions and magnetic behaviour of R1−xCa1+xCrO4 oxides (R=Y or Sm) (0≤x≤0.5). J. Alloys Compd. 225, 203–207 (1995).
 79
Nguyen-Trut-Dinh, M. M., Vlasse, M., Perrin, M. & Le Flem, G. Un oxyde magnetique bidimensionnel: CaLaFeO4. J. Solid State Chem. 32, 1–8 (1980).
 80
Cao, L. P. et al. High-pressure and high-temperature synthesis and physical properties of Ca2CrO4 solid. AIP Adv. 6, 055010 (2016).
Acknowledgements
P.V.B. and T.L. acknowledge funding support from the Los Alamos National Laboratory (LANL) LDRD no. 20140013DR on Materials Informatics and the Center for Nonlinear Studies (CNLS). J.M.R. and J.Y. were supported by NSF under grant nos. DMR-1454688 and DMR-1420620, respectively. The authors acknowledge the High-Performance Computing Modernization of the DOD and LANL Institutional Computing (IC) for computational resources that have contributed to the research results reported herein.
Author information
Contributions
The study was planned, calculations performed and the manuscript prepared by P.V.B., J.Y., T.L. and J.M.R. All authors discussed the results, wrote and commented on the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Figures, Supplementary Tables, Supplementary Notes and Supplementary References.
Rights and permissions
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
About this article
Cite this article
Balachandran, P., Young, J., Lookman, T. et al. Learning from data to design functional materials without inversion symmetry. Nat Commun 8, 14282 (2017). https://doi.org/10.1038/ncomms14282