## Background & Summary

High-throughput screening of material databases integrated with first-principles calculations has been increasingly successful in the discovery of new functional materials1,2,3,4. While many of the individual components for performing high-throughput searches exist, the infrastructure needed to connect and automate all the necessary components is still under development. The identification of ferroelectrics through symmetry arguments has been an active area of research5,6,7,8,9,10,11,12,13,14,15,16. Moreover, lists of known ferroelectrics have been previously curated17,18,19,20,21,22,23. However, the identification of new ferroelectrics has yet to be automated in a manner readily applicable to emerging materials databases24,25,26,27,28,29,30. Automated high-throughput searches for ferroelectric candidates would provide a valuable guide for in-depth computational studies and experimental efforts.

Ferroelectrics have important technological applications, such as in tunable capacitors, non-volatile random access memory devices, and electro-optical data storage. In addition, ferroelectrics are capable of displaying couplings between their electronic degrees of freedom with magnetic or lattice degrees of freedom in multiferroic materials. Ferroelectricity often arises from a structural phase transition between a high-symmetry nonpolar structural phase to a low-symmetry polar structural phase with decreasing temperature, resulting in the emergence of a spontaneous polarization31,32,33. In this scenario, the atomic geometry of the nonpolar structure can continuously distort such that the new polar structure has a subset of the symmetries of the original structure, satisfying the requirements of a second-order phase transition; in these cases, the polar space group must be an isotropy subgroup of the nonpolar space group, which is a stronger requirement than they only share a simple group-subgroup relation34,35,36.

Thus, certain ferroelectrics can be systematically screened by searching for pairs of nonpolar and polar structures related by a small symmetry-breaking distortion. In the late 1980s, Abrahams performed some of the earliest searches for ferroelectrics in crystallographic databases using symmetry criteria5,6. More recently, automated searches for new ferroelectric candidates have used symmetry arguments to identify nonpolar reference structures for existing polar materials7,8,9. Other studies have used a combination of group theoretic and first-principles calculations to propose ferroelectric candidates10,11,12. Bennett and co-workers proposed using high-throughput calculations to perform chemical substitution into structures of known classes of ferroelectrics13,14,15. Recent work used high-throughput phonon calculations to identify ferroelectrics through polar soft phonon modes of nonpolar phases16.

Previous studies have focused on a limited number of compounds or families of compounds using a relatively narrow set of symmetry conditions. Current curated lists of ferroelectrics only include known ferroelectrics that have been experimentally verified. With shrinking computing costs, high-throughput material searches using first-principles methods provide an efficient strategy to discover and catalog materials. Ferroelectric databases and systematic screening of properties such as band gaps, polarizations, volume expansion, critical temperatures, and coupling to magnetic and/or topological degrees of freedom may lead to new functional materials and potentially new physical phenomena.

In this work, we integrate density functional theory (DFT), crystal structure databases, symmetry tools, workflow software, and a custom analysis toolkit to build a workflow capable of generating libraries of known, previously-proposed and newly-proposed ferroelectrics. This workflow is general and can be used with any crystal structure dataset. We present the results from performing this workflow on the Materials Project database of inorganic crystal structures24. We screen over 67,000 material structures using symmetry relations between nonpolar and polar structure pairs and calculating the polarization from first-principles calculations. We identify 255 ferroelectric candidates, 200 being classified as high-quality candidates by a stringent verification process. Within these high-quality candidates, 74 are known or previously proposed, and 126 are new ferroelectrics. With the workflow developed here, we construct the first automatically-curated first-principles dataset of diverse, multi-class known and new ferroelectrics calculated with a standardized method that permits straightforward comparison. This dataset can be used to develop new tools and criteria for studying ferroelectricity across diverse materials systems. In addition our code for conducting this search has been contributed to the open-source python packages atomate and pymatgen so others can conduct searches of their own and build directly on this work37,38.

Our automated workflow has three stages: symmetry analysis, first-principles calculations, and post-processing. Accordingly, the rest of the work is organized as follows: the description of the workflow, based in the concept of ferroelectric nonpolar-polar symmetry pair, is described in the Methods section. Technical aspects of our workflow and database are included in the section Data Records. Finally, we validate our workflow method against experimental databases of ferroelectrics and verify our workflow is comparable to previous first-principles results in the section Technical Validation.

## Methods

### Identifying ferroelectricity from first principles

Ferroelectrics are characterized by a polarization versus electric-field hysteresis loop. Experimentally, the spontaneous polarization can be determined as half of the change in polarization at zero external field31. The spontaneous polarization is not a direct observable; one measures the change in spontaneous polarization between two stable configurations of a material39.

In three dimensions, the only space groups compatible with a polarization, meaning they leave a vector invariant under its symmetry operations, are those with polar point groups. Out of the 32 crystallographic point groups, 10 are polar; these polar point groups can keep points along specific lines (point groups 2, 3, 4, 6, mm2, 4mm, 3m, 6mm), planes (point group m), or all points in three-dimensional space (point group 1) invariant40,41. All other point groups are nonpolar. We define polar structures as crystal structures with a polar space group, which is composed of a polar point group plus translations, likewise for nonpolar structures.

Following the modern theory of polarization39,42,43,44,45,46,47, the polarization, P, of a crystal is defined as,

$${\bf{P}}={{\bf{P}}}_{0}+\sum _{i\in \{a,b,c\}}{n}_{i}\frac{e{{\bf{R}}}_{i}}{\Omega },$$
(1)

where e is the charge of the electron, $${n}_{i}$$ is an integer, Ri is a primitive lattice vector, and Ω is the unit cell volume (bold letters denote vectors). P0 includes electronic and ionic contributions. The second term on the right hand side of Eq. 1 is the quantum of polarization, $$e{{\bf{R}}}_{i}/\Omega$$, a consequence of translation symmetry. In the general case, P0 is defined up to any integer multiple of $$e{{\bf{R}}}_{i}/\Omega$$; for nonmagnetic crystals containing elements from even columns of the Periodic Table, P0 is defined up to even integer multiples. Since we are screening many systems here, we use the more general definition in this work.

Only differences of the computed polarization on the “same branch” are physically meaningful, where different branches are related by integer multiples of the quantum of polarization. Equivalently, the evolution of the polarization along an adiabatic path between two states must be smooth. Nonpolar space groups can only host formal polarizations that are zero or one-half modulo the polarization quantum. We use a nonpolar reference structure to calculate the change in polarization due to a polar distortion. In general, paths between two opposite polar configurations are sufficient to compute the change in polarization; however, we choose to use a nonpolar reference structure in this work as is standard for calculating the polarization from first principles.

To recover a smooth polarization path, we ensure the nonpolar structure must be continuously deformable into the polar structure along a path that preserves the symmetry of the polar structure and for which the system remains insulating. We then perform calculations of multiple structures along the distortion path to compute the spontaneous polarization, which can be compared to experiment. Using this approach, the spontaneous polarization can be directly predicted using first-principles methods with good accuracy48.

Hence, for the purposes of this work, we only consider ferroelectrics for which a high-symmetry nonpolar reference structure can be readily identified in the database for a lower-symmetry polar structure that can support a polarization42. We automate a search for compounds supporting two such phases and then compute the polarization difference along the structural path connecting the two structures.

If a polar ferroelectric structure corresponds to a metastable state, and is higher in energy than a nearby non-polar ground state by a small amount, the system can be considered an antiferroelectric49,50. Antiferroelectrics exhibit double hysteresis loops in polarization versus electric-field measurements; the field-induced first-order phase transition originates with an energy barrier between the nonpolar ground state and the polar metastable phase. In an antiferroelectric, the nonpolar ground state phase is related by a nonpolar distortion to a distinct nonpolar reference structure; the ground state structure is generally characterized as “anti-polar” to distinguish it from the nonpolar reference structure. Symmetry conditions for antiferroelectrics are described in ref. 51. For completeness, we note that to identify antiferroelectrics using the workflow presented here, in addition to finding a reference nonpolar phase, the polar metastable ferroelectric phase and an antipolar ground state, the material would need to display a small energy difference between the polar and antipolar structures on the order of 1–10 meV52.

### Workflow overview

We first describe the general workflow diagram comprising symmetry analysis, first-principles calculations, and post-processing. As shown in Fig. 1, the complete workflow involves the passing of data between many separate calculations. In developing our workflow, we automate the following tasks:

1. 1.

Identifying candidate materials possessing nonpolar-polar structure pairs related by a continuous distortion (an isotropy subgroup symmetry relation).

2. 2.

Performing spin-polarized DFT calculations of changes in total energy, band gap, and polarization for multiple structures along the nonpolar-polar distortion.

3. 3.

Post-processing calculation data to compute the spontaneous polarization of the polar ferroelectric phase.

4. 4.

Validating the calculation quality for each ferroelectric candidate.

5. 5.

Creating an interface for viewing the results for all candidates (see the section Graphical Interface).

We start by choosing a crystal structure database on which to perform the search (see Structure Selection Symmetry Analysis). We emphasize that any crystallographic database (e.g. any of the databases described in24,25,26,27,28,29,30) can be used to perform our workflow, as long as the atomic coordinates and lattice parameters of the structures are provided.

Within the chosen database, we perform a symmetry analysis to find candidate materials possessing nonpolar-polar structure pairs related by a continuous symmetry deformation. Any such pairs found to satisfy the symmetry deformation criteria are stored in the Distortion Database as being deformable by symmetry. This criteria includes the following conditions: (1) The polar structure belongs to a space group that is a subgroup of the space group of the nonpolar structure; and (2) There exists a transformation matrix between the high-symmetry setting of the nonpolar structure to the low-symmetry setting of the polar structure. The latter imposes that the distortion of the lattice parameters and atomic coordinates between the nonpolar and polar structures is continuous, meaning the polar structure belongs to an isotropy subgroup of the nonpolar structure.

We then carry out DFT calculations on the candidate pairs to extract the changes in the band gaps, total energies, and polarization along the nonpolar-polar distortion (see Computational Methods). These results are stored in a Workflow Database and then accessed by our Computing Resources to perform the calculations. Next, the information stored in the Distortion, Workflow, and Calculation Databases is used together to post-process quantities such as the computed spontaneous polarization and to validate ferroelectric candidates using experimental and previous first-principles results (see Post-processing and Spontaneous Polarization Values and Verification of Computational Methodology). The information needed to assess the quality and properties of the candidates is then added to the Candidate Database where it can be accessed by our web Interface for viewing the candidate materials in aggregate (see Graphical Interface). Finally, candidates are screened to ensure the polarization and energy profile across the nonpolar-polar distortion are smooth and continuous, i.e. all calculations ended correctly and provide reliable results.

In the sections below, we describe in detail the methods used for creating an automatically curated dataset of ferroelectrics from the Materials Project database24. The Materials Project database is largely based on structures from the Inorganic Crystal Structure Database (ICSD)27,28 and includes hypothetical structures created through stoichiometric substitution. We use the Materials Project database to test our workflow. Our results for the Materials Project are not intended as the most general curated list of ferroelectrics; however, as the first automatically obtained list of ferroelectrics, they uncover new candidates and provide a blueprint for further studies. More elaborately curated lists may be constructed by applying our workflow to additional databases in future studies. The results from applying our workflow are described below and summarized in Table 1. We note that our workflow is modular and open-source, so it can be readily adapted and applied by others to expand the search for ferroelectrics and related materials such as antiferroelectrics and multiferroics.

### Structure selection

As motivated earlier, the input to our workflow is a collection of candidate materials possessing nonpolar-polar structure pairs. There are several methods that can be used to create candidate nonpolar-polar structure pairs. For example, one can apply a polar distortion to an existing nonpolar structure or create a hypothetical nonpolar reference structure for an existing polar structure. In this work, to classify a material as a ferroelectric candidate, we require both nonpolar and polar structures to be present in the database. As shown below, even this direct approach provides new candidate materials, previously overlooked as ferroelectrics. Future studies may choose to relax this requirement to identify a greater number of promising materials.

To search for compatible nonpolar-polar structure pairs in the Materials Project dataset, we first select compounds possessing nonpolar and polar structures with space groups which are related by a group-subgroup relation. Note that in principle the same compound may display more than one ferroelectric structural transition, and therefore have more than one nonpolar-polar symmetry pair. For each of these pairs, we require that the number of sites in the nonpolar structure is less than or equal to the number of sites in the polar structure. We perform this initial query using pymatgen, spglib, and the Materials Project API37,53,54,55. We provide the number of nonpolar-polar structures pairs resulting from this query in the top box of Table 1.

At the time of this search, the Materials Project database had approximately 67,000 structures, approximately 15,000 of which are polar. We find approximately 17,000 nonpolar-polar structure pairs related via group-subgroup space group relations. This number is large, in part, because the same polar structure may be paired with multiple possible nonpolar reference structures, and vice versa. This number is also large because the requirement that the polar structure is in a subgroup of the nonpolar space group is a much weaker requirement than the polar structure belongs to an isotropy subgroup of the nonpolar structure–which we check later in the workflow. These roughly 17,000 pairs contain approximately 1,600 of the approximately 10,000 distinct polar compositions in the Materials Project database. The remaining polar compositions in the Materials Project do not have symmetry compatible nonpolar structures within the same composition present in the database. We note that it is possible to propose hypothetical nonpolar reference structures for polar candidates using group theoretic methods or by relaxing the symmetry tolerance between nonpolar-polar distortions7,8,9,16; this is left for future work.

#### Naming conventions

We adopt the pymatgen alphabetical_formula method for the Composition class (with spaces and 1 s removed) to output consistent formulas for our candidates. We note that this method orders elements in such a way that does not match conventions in the literature. For example, we use O3 PbTi where the standard in the literature is PbTiO3. Compositions printed by pymatgen also differ from those used in mineralogy, such as for boracite, lawsonite and many other minerals in our dataset. In our datafiles, we also provide formula name output using the pymatgen reduced_formula method for the Composition class, which sorts elements by electronegativity.

### Symmetry analysis

The automated nature of our ferroelectric search relies on strict symmetry criteria. As described in the Structure Selection section, we pre-screen our candidate nonpolar-polar structure pairs using the symmetry tools in pymatgen and spglib to ensure that these pairs satisfy preliminary group-subgroup relationships. We then use the Structure Relations symmetry tool provided by the Bilbao Crystallographic Server (BCS)56,57,58 to impose the symmetry criteria described in the Workflow Overview, namely, to obtain a transformation matrix connecting the lattice parameters and atomic coordinates of the structure pair59,60. The BCS has a freely available web interface for accessing a wide variety of symmetry tools. We create python scripts to automate interaction with and scrape returned data from the BCS to perform our symmetry checks using the python package mechanize61. The Structure Relations tool checks the following:

1.1 The group-subgroup index relations are compatible. The index of a group-subgroup relation indicates the number of ferroelectric domains (distinct polar variants) that arise from the symmetry breaking of the high-symmetry structure.

1.2 There exists a path of maximal subgroups between the high-symmetry structure and low-symmetry structure.

1.3 The Wyckoff position splitting of the high-symmetry structure is compatible with the Wyckoff positions of the low-symmetry structure.

1.4 The lattice of the high-symmetry structure in the low-symmetry setting must be within a defined tolerance of the lattice of the low-symmetry structure (see Symmetry Precision section).

1.5 Each atom in the high-symmetry structure in the low-symmetry setting can be paired to an atom in the low-symmetry structure such that atom pairs are separated by a distance no greater than a given tolerance (see Symmetry Precision section).

Structure Relations takes Crystallographic Information Files (CIFs) of high-symmetry and low-symmetry structures and tolerance threshholds as arguments. We use a lattice tolerance of 3 Å and 10° for lattice parameters and angles, respectively. These tolerances are generous for materials with average sized unit cells (i.e.with lattice parameters less than 20 Å) and permit a wide variety of distortions. For the present work, high-quality candidates are reported for a maximum pairing distance of 1.5 Å. As shown in Table 1, out of the 17,000 structure pairs that we test with Structure Relations, 413 are found to be deformable by symmetry with a maximum distortion less than or equal to 1.5 Å.

#### Symmetry precision

Symmetry precision is a tolerance factor used to assess whether an atom is equivalent to another after a symmetry operation up to a maximum distance. A symmetry precision between 10−1 and 10−5 Å is typically used. In the Materials Project database, a symmetry tolerance of 10−1 Å is used for the reported space group stored in the database. We use the same tolerance to generate CIFs sent to the BCS Structure Relations.

We evaluated how varying the symmetry tolerance changes the resolved space group for all the structures in the Materials Project. We were able to determine this efficiently by using a binary search on a log10 scale for a maximum and minimum symmetry tolerance of 10−1 Å and 10−5 Å, respectively. Out of the 67,000 structures we checked, 50,000 (75%) structures were resolved into one distinct space group for the entire symmetry precision range. For additional discussion about the sensitivity of symmetry precision on resulting space groups in the search for ferroelectrics, see refs. 16,62.

### DFT calculation details

In our workflow, we perform spin-polarized DFT calculations using the Vienna Ab initio Software Package (VASP) version 5.3.563,64,65. We use the generalized gradient approximation (GGA) functional of Perdew, Burke, and Ernzerhof (PBE)66. Our calculations use PAW pseudopotentials and an energy cutoff of 520 eV for the plane-wave basis67,68; this is 1.3 times the highest cutoff recommended for the pseudopotentials used69. Structures are initialized with ferromagnetic ordering in all cases in this work. Since the default is for parallel alignment of the spins, we expect the workflow to be reliable for nonmagnetic or ferromagnetic materials. Materials with more complicated magnetic ground states, such as antiferromagnets, would require consideration of different antiferromagnetic orderings. Therefore, some common multiferroics possessing antiferromagnetic or near antiferromagnetic ordering may not be identified by this workflow. Extensions to include antiferromagnetic spin arrangements are relegated to future work. These settings correspond to default values used to create the Materials Project database and therefore allow a direct comparison of our results with the Materials Project database.

We use the Berry phase approach from refs. 39,43,44,70,71, as implemented in VASP to calculate the electronic part of the macroscopic spontaneous polarization. We calculate the ionic part of the polarization using the point charge and position for each atom in the unit cell, see the calc_ionic function in pymatgen.analysis.ferroelectricity.polarization for details.

We use the default parameters for VASP inputs as defined in pymatgen (and used by Materials Project) and atomate69,72,73. For details on these parameters, see the documentation for pymatgen.io.vasp.sets.MPStaticSet. We use a Hubbard U correction to correct the DFT-PBE description of d states of select oxides and fluorides following the approach in ref. 74. To see the guidelines for which compounds we apply a U, see refs. 24,69,72,73. We use a reciprocal k-point density of 50 k-points per (1/Å)3 for structural relaxations and 100 k-points per (1/Å)3 for static and polarization calculations. We use total energy convergence criteria of 5 × 10−5 eV per atom for the electronic self-consistent loop and 5 × 10−4 eV/Å per atom for the ionic relaxation loop for structural optimizations. These convergence parameters were tested against higher-accuracy convergence parameters on a set of 182 chemically diverse compounds in ref. 73, yielding total energies within 15 meV/atom and lattice volumes within 7.5% of the higher-accuracy calculations for nearly 96% of the compounds.

We note that while the local density approximation (LDA) is commonly used to describe certain ferroelectric oxides, and therefore to compute their polarizations, use of a generalized gradient approximation (GGA), such as the PBE functional, tends to be standard for wider classes of materials nowadays. PBE is also the default functional used by the Materials Project for structural relaxations and calculating material properties. Thus, we use PBE for this effort. Our results are in line with the typical overestimation of PBE for the lattice parameters, and therefore, we expect a similar overestimation of the polarization. In addition, while DFT-PBE tends to underestimate electronic band gaps, the latter plays a minimal role in the determination of standard ferroelectric materials, and as shown below, will only limit the computation of the spontaneous polarization for a small number of them.

### Scientific workflow packages

We construct the scientific workflows to perform the structural relaxations and spin-polarized DFT calculations of energy, band gap, and polarization using the FireWorks and atomate python packages38,75. FireWorks is built for managing computational scientific workflows. atomate is built for constructing workflows for multiple computational material science codes, such as VASP. atomate uses FireWorks classes to develop modules for performing common DFT calculations with VASP, such as structural relaxations and self-consistent calculations of total energy. atomate also provides a framework for building custom modules, which we use to construct our structural interpolations and polarization calculations modules.

### DFT workflow

We use the DFT workflow, shown in Fig. 2, to compute the physical properties needed to identify ferroelectric candidates. We perform spin-polarized calculations and for systems with spin-polarized ground states, consider only ferromagnetic ordering. We execute the DFT workflow shown in Fig. 2 for the 413 pairs with continuous nonpolar-polar transformations, with maximum distortions that do not exceed 1.5 Å.

For each structure pair, we begin with the nonpolar structure in the low-symmetry setting (obtained from BCS Structure Relations in the Symmetry Analysis step) and the polar structure. We use the nonpolar structure transformed into the low-symmetry setting so we can perform structure interpolations and polarization calculations across similar lattices. We perform relaxations of the unit cell and atomic positions of both of these structures twice, using a level of convergence similar to what Materials Project uses for its database entries. As mentioned, all calculations are spin-polarized and use PBE, with a Hubbard U correction for systems with non-d0 open-shell cations; our workflow assumes ferromagnetic ordering for all systems. Extensions of our workflows to consider antiferromagnetic and other orderings will be the subject of future work. We then fix the relaxed nonpolar and polar structures, and perform a self-consistent DFT calculation to compute the total energy and band gap. If either the nonpolar structure or polar structure is found to be metallic at the DFT-PBE level in our spin-polarized calculations initialized with ferromagnetic orderings and a standard U–here we define metallic as having a DFT-PBE band gap of less than 10 meV–we stop the workflow for that structure pair.

If the polar and nonpolar structures are both insulating, we compute the polarization along the distortion path. As shown in Table 1, 80 of the 413 structure pairs were computed to have metallic endpoints in our spin-polarized calculations: 30 were found to have a metallic nonpolar structure but insulating polar structure, 2 were found to have a metallic polar structure but insulating nonpolar structure, and 24 were found to have both metallic polar and nonpolar structures. 24 additional structures have at least a metallic nonpolar structure, but these workflows were halted before the polar structures had their band gaps computed.

We compute the DFT total energy, band gap, and polarization of eight evenly-distributed linearly-interpolated structure, or interpolations, of the nonpolar to polar structures. We found eight interpolations to be sufficient for reconstructing a smooth polarization trend for at least 75% of our candidates. Similiar to the previous step where a metallic calculation causes the workflow to stop, metallic interpolations similarly halt the workflow since we would be unable to calculate the polarization of that structure. 59 candidates were found to have metallic interpolations. The DFT workflow is labeled as complete when all polarization calculations along the path have completed. As shown in Table 1, 255 structure pairs successfully completed the workflow, and satisfy our requirements of a ferroelectric phase transition.

### Post-processing spontaneous polarization values

As discussed earlier, only polar space groups are compatible with a polarization vector that is not integer or half integer multiples of the polarization quantum46. If a reference nonpolar structure that is continuously deformable into the polar structure can be identified, we can then calculate the polarization of several interpolated structures between the reference nonpolar and the target ferroelectric polar structure. The nonpolar structure is used as a means to calculate the spontaneous polarization; however, we note in general it is not necessary for the nonpolar structure to be experimentally observable for the polar material to be ferroelectric.

We start by calculating the formal polarization of the nonpolar structure, which is either zero or a half quantum of polarization (modulo the quantum of polarization) along the three lattice directions. Then we perform the same calculation for the first interpolated structure along the distortion, and then the next, until we arrive at the final polar structure. For a sufficient number of interpolations between the nonpolar and polar structures, we can trace out smooth, continuous polarization “paths” along the distortion; there will be infinitely many paths due to the periodicity of the polarization lattice. Subtracting the polarization values at the nonpolar and the polar endpoints of the same path or “branch” will give us the spontaneous polarization vector of the polar ferroelectric phase.

We perform the following steps to recover the same branch “proper” polarization, which is independent of choice of branch. The first step, which is crucial, is to readjust the polarization for each structure along the distortion to be in the polar polarization lattice. To do this, we modify the polarization of the intermediate structures by the ratio of the quantum of polarizations of the two lattices (the lattice parameters divided by the volume multiplied by the electron charge), i.e.,

$${P}_{\eta ,i}^{{\rm{(polar}}\,{\rm{lattice)}}}={P}_{\eta ,i}\left(\frac{{R}_{{\rm{polar}},i}{/\Omega }_{{\rm{polar}}}}{{R}_{\eta ,i}{/\Omega }_{\eta }}\right),$$
(2)

where $${R}_{\eta ,i}$$ and $${\Omega }_{\eta }$$ are the lattice parameters and volume of the $${\eta }^{th}$$ structure along the distortion, $${R}_{{\rm{polar}}}$$ and $${\Omega }_{{\rm{polar}}}$$ are the lattice parameters and volume of the polar structure, and i is a lattice direction $${ {\hat{a}} }$$, $$\widehat{b}$$ or $${\hat{c}}$$. If we do not perform this adjustment, we are calculating what is called the “improper polarization” which will depend on the choice of branch and is therefore unphysical76. See Fig. 3 for an example of the differences between proper and improper polarization for BaTiO3 (here we use conventional notation for the name).

After adjusting our initial polarizations, we then construct a periodic lattice with lengths and angles corresponding to the quantum of polarization along each polar lattice direction; this corresponds to the second term of Eq. (1). For the polarization lattice, the lengths of the lattice vectors are the cell lattice vectors divided by the volume of the unit cell and multiplied by conversion factors for electron charge and length scale.

Our algorithm for adjusting the polarizations to be on the same branch is depicted in Fig. 4. First, we take the nonpolar polarization (adjusted to be in the polar polarization lattice), and choose the “image” (or periodic value) of the polarization value in the nonpolar polarization lattice that is closest to the Cartesian origin (0, 0, 0). The value of the nonpolar polarization along $${ {\hat{a}} }$$, $$\widehat{b}$$, and $${\hat{c}}$$ can either be zero or a half-quantum. Then, we find the image of the first interpolation polarization value (again adjusted to be in the polar polarization lattice) that is closest to the Cartesian coordinates of the adjusted nonpolar polarization value. We continue this process until we get to the polarization of the polar structure.

This algorithm will find the polarization path with the smallest difference between polarizations of subsequent interpolations. An issue is that this algorithm can incorrectly find the same branch polarization in cases where the change in polarization between interpolations is larger than the quantum of polarization between branches. One example of this type of failure is the polarization resolved for CrO3 with search ID 187, see Fig. 5 (the search ID being a simplified unique identifier, defined for the purposes of our work, for pairs of structures used in our workflow search, see Online-only Table 3). In this example, the algorithm chooses a discontinuous path that has a smaller spontaneous polarization of 78.2 μC/cm2 in red. However, the correct path uses the last three interpolations in the branch shown with a dashed red line and gives a polarization of 122.9 μC/cm2. To correctly reconstruct this polarization with our existing algorithm, more structures would be needed to better interpolate between the nonpolar and polar structures. Alternatively, the curvature of the spline connecting nonpolar and polar structures could be used to identify points on the same branch; we leave this refinement for future work.

### Graphical interface

To view the DFT ferroelectric candidate data in aggregate, we create an interactive web site for viewing polarization and total energy plots, animations of the distortion, and other data. The interface consists of two main pages: (1) a page containing a sortable table of ferroelectric candidates organized by category (whether the candidate had a value of polarization successfully calculated and if so with what level of confidence) and (2) individual candidate pages that show energy and polarization plots, distortion animations, and other data specific to that candidate. This interface is available at https://blondegeek.github.io/ferroelectric_search_site/.

## Data Records

This dataset is available as two JSON files deposited in figshare77 and our GitHub repository (http://github.com/blondegeek/ferroelectric_search_site)78. The JSON files provide details of the symmetry analysis performed for each candidate and data generated by DFT calculations and post-processing from the workflow. Zipped folders of the input and output VASP files for each candidate deposited in figshare77. The title of the zipped folder includes the workflow ID to correlate the VASP files to information in the JSON files provided. We also provide an interface for viewing the dataset at http://blondegeek.github.io/ferroelectric_search_site with the code for the interface located at http://github.com/blondegeek/ferroelectric_search_site.

### File format

We contribute the following data:

1. 1.

JSON file with information on workflow status of each calculated candidate and calculation details extracted from VASP inputs and outputs. This includes total energy, band gap, polarization, post-processed information, and validation criteria for candidates with completed calculation. See Tables 2 and 3 for details.

2. 2.

JSON file with information describing all 413 nonpolar-polar structure pairs with group-subgroup relations compatible with a second-order phase transition in the Materials Project determined with BCS Structure Relations and used in this search. See Tables 4 and 5 for details.

3. 3.

Zipped folders with the VASP 5.3.5 INCAR, KPOINTS, OUTCAR, and POSCAR files.

### Reported properties

For each candidate we provide an initial nonpolar-polar pair of structures, including the nonpolar structure in both the nonpolar, high-symmetry setting and polar, low-symmetry setting. We also provide the displacements of each atom and other metrics provided by BCS Structure Relations.

For each successful calculation, we provide the structure used for calculation, the ionic and electronic polarization computed by VASP, the ionic polarization computed via the method of point charges, the total energy and energy per atom of the structure, and other commonly computed quantities such as total magnetization, magnetization per atom, forces, and stresses. We also give details as to which calculations (out of the 22 computed) for a given candidate pair were completed.

For each set of completed calculations we also provide, the recovered spontaneous polarization using the workflow described in the Methods section, as well as spline data characterizing the smoothness of the polarization and energy trends across the nonpolar-polar distortion.

### Graphical representation of results

In the top row of Fig. 6, we partition the high-quality candidates found in the Materials Project into known and newly-proposed ferroelectrics and further partition those ferroelectrics into subclasses. In Fig. 6 we see that known and new ferroelectric candidates are well mixed along the metrics of nonpolar-polar structure energy difference, distortion maximum between nonpolar and polar structures, PBE band gap of polar structure, and energy above hull (a measure of the thermodynamic driving force for decomposition79).

In the middle row of Fig. 6, the candidates with large polarizations denoted with red triangles belong to the perovskite family. We recover many known perovskite ferroelectrics, such as: LiNbO3, AlBiO3, BiInO3, KNbO3, BaO3Ti, CdO3Ti and O3PbTi. In addition, we recover well-known double perovskite ferroelectrics, such as Bi2Nb2O9Pb and Bi2O9SrTa2. We also recover well-known antiferroelectrics possessing ferroelectric metastable phases, such as NaNbO3, HfO3Pb and O3PbZr. We again note that we do not use conventional notation for these systems but rather use alphabetical order of elements to provide a consistent ordering with our workflow output.

Other classes in the middle row of Fig. 6 are the organic (NH4)2SO4 family in blue and structures already proposed by theory to be ferroelectric in purple. These data show that there are many known and proposed ferroelectrics in the literature with polarizations of 10 μC/cm2 or less.

In the bottom row of Fig. 6, we categorize new ferroelectric candidates into different trending compositions or structure types. There are several candidates containing fluorides, carbon-oxygen compounds, and hydroxyl groups. We highlight these candidates because they are very different in composition from oxide ferroelectrics most common in the literature. We also point out some hypothetical non-magnetic hexagonal manganite-like structures found in the Materials Project database that have polarizations of approximately 10 μC/cm2 and half-quantum nonpolar polarizations.

In Fig. 7, we show trends in the number of pairs with continuous deformation with given nonpolar-polar point group transitions. There are 32 crystallographic point groups; nonpolar and polar point groups are shown in blue and red, respectively. The thickness and color of the line connecting nonpolar and polar point groups indicate the number of structures in the dataset with a continuous deformation between those point groups. We find that point group transitions that correspond to orthorhombic structures such as mmmmm2, monoclinic structures such as 2/m2 or 2/mm, and hexagonal 6/mmm6mm are the most prevalent.

In Fig. 8, we show the computed polarization of the ferroelectric candidates plotted with respect to their polar point group, similar to the plot of piezoelectric tensor magnitudes in ref. 4. The majority of candidates have polarization less than 5 μC/cm2, shown on the right. The candidates in point group 4mm with large polarizations are perovskites with a reference structure in point group $$m\bar{3}m$$. The degree of shading of a radial cell is proportional to the number of candidates in that region of the plot. For example, there are many candidates with polar point groups 2 and mm2 that have polarizations within 25 μC/cm2.

## Technical Validation

### Verification of computational methodology

Several checks are needed to ensure our automated calculations have completed satisfactorily and the information automatically extracted from them is reliable. We describe these tests below.

#### Testing smoothness of energy and polarization trends with distortion

We flag any ferroelectric candidates whose calculations cannot be used to reliably assess the quality of the candidate. For example, if the trend in total energy is not continuous, we cannot be confident that we can extract a meaningful polarization trend. Similarly, if the same branch polarization is not continuous, we cannot be confident that an accurate spontaneous polarization has been determined.

To assess the smoothness of trends in polarization and energy across a distortion, we use UnivariateSplines from scipy.interpolate80. We use cubic splines for fitting polarizations and quartic splines for fitting total energies. We use the default smoothness parameter of 1.0. These splines are generated using the Polarization class in pymatgen.analysis.ferroelectricity.polarization.

We find that 31 out of the 55 materials that do not have smooth interpolations contain atoms with nonzero magnetic moments, mostly containing 3d elements (V, Cr, Mn, Fe, Co, Ni) and one containing the 5d element W. We found 26 materials to have several discontinuities in total energy (even when these calculations resulted in smooth polarizations). These materials were transition metal oxides, fluorides, carbonates, orthosilicates, and phosphates with alkali or alkaline earth metals (Li, Na, and Ba for these specific examples), many being Li-ion battery cathode candidates. The transition metals in these materials (V, Cr, Mn, Fe, Co, Ni) can take multiple oxidation states. Because these discontinuities in energy were coincident with discontinuities in the total magnetization, we believe these jumps were caused by the transition metal species changing oxidation state through the distortion, suggesting the need to asses their ground state magnetic ordering more carefully.

#### Metallic endpoints and metallic interpolations

Polarization calculations require a nonzero band gap along the distortion path. Therefore, workflows that have either polar or nonpolar structures that are calculated to be metallic are halted. In the workflow_data.json, these workflows are designated by a workflow_status of “DEFUSED”. Of the 413 pairs considered, 80 possess metallic endpoints and therefore interpolations were not performed. Occasionally, interpolated structures between two nonmetallic structure endpoints are metallic and there were 59 that had metallic interpolation structures. If any structure along the path from nonpolar to polar structure is metallic, the quality of automated analysis cannot be guaranteed. We include these candidates in our dataset, but they are noted as having a polarization_len (see Tables 2 and 3) of less than 10 or do not have a polarization_change_norm (in cases where some of the interpolated polarizations are None). If a candidate has a polarization_len equal to 10 and have a polarization_change_norm, all interpolated polarization calculations completed successfully.

We also note that the method used to determine whether a material is insulating differs for pymatgen and VASP. In our workflow, pymatgen determines the band gap by comparing the energies of the band edges. When VASP determines whether to proceed with calculating the polarization, it checks whether the material is insulating by checking the occupations of the band edges. Since the occupations can be sensitive to the choice of smearing and k-grid density, there are instances where using our default settings in the workflow leads to partial occupancies, while the band edge energies suggest the material has a finite gap; therefore VASP will not proceed with calculating the polarization whereas our workflow deems the material to be insulating. This tends to occur for materials containing 3d elements which, as discussed earlier in the context of magnetism, would require more careful calculations to reliably capture their properties, and therefore not necessarily expected to successfully complete the workflow.

#### Comparison of materials project to relaxed structures

Two structural relaxation calculations are already performed on all structures in the Materials Project. We perform additional relaxations of the unit cell and atomic positions to ensure total energy convergence. We found only less than 5% (10%) of our relaxed structures have lattice parameter differences of more than 3% (1%) from the original Materials Project structures. Because we perform relaxations of nonpolar structures transformed to the low-symmetry polar setting, we compare the relaxed nonpolar structure to the low-symmetry transformed structure output by BCS of the nonpolar structure from Materials Project.

#### Identifying high-quality candidates

In any high-throughput search, there are calculations that complete without errors and some that require further scrutiny to interpret its results. We deem as high-quality candidates those calculations where the polarization and total energy trends are smooth and continuous; we define this criteria in the following way: the maximum difference between the data and spline fitted to the same branch polarization must be less than 10−1 μC/cm2, and the maximum difference between the data and the spline fitted to the energy trend must be less than 10−2 eV.

As shown in Table 1, out of the 255 candidates, 200 pass through our stringent verification criteria and ensures the polarization and energy trends across the ferroelectric distortion are smooth and continuous. The remaining candidates are still valid candidates; we recommend checking the polarization and energy trends by hand as the algorithms used for analysis may not have reliably recovered the spontaneous polarization in these cases, as in Fig. 5. These high-quality candidates are further described in the section Determining Known and New Ferroelectrics. There are candidates included in this list with polarization values of zero, as computed within the accuracy of our calculations; we include these since they pass the above criteria and could be tuned to host nonzero polarizations such as by chemical substitution. As well, the polarization values may be very small but this is also true of some of the known ferroelectric candidates computed with our workflow, and so for the sake of completeness we also list these candidates. We also call the reader’s attention to the materials with magnetic elements. Since all materials are initialized with ferromagnetic ordering, we do not expect to reliably capture materials with other magnetic orderings (e.g. antiferromagnets); as such, materials with nonzero magnetic moments may not necessarily have been calculated using their true magnetic ground state and those results should be interpreted with caution.

### Comparison to DFT calculated polarizations from literature

We verify that our workflow reproduces the first-principles calculated polarizations for a variety of ferroelectrics in the literature. DFT calculated values for the ferroelectric polarization depends heavily on the structures and functional used in the calculation, and for magnetic systems, it will also be sensitive to magnetic ordering. For example, while in our work the endpoint structures are fully relaxed, other works constrain the relaxed polar unit cell to have the same volume as the experimental structure81; these calculated polarizations will be systematically smaller than ours due to PBE optimized structures having larger lattice parameters than experimental values.

We compare to literature where a fully optimized (unit cell, volume, and atomic positions) relaxation procedure is used. For clarity of comparison, we use the chemical formula used by these references rather than the alphabetical_formula that is convention for the rest of this work.

The ferroelectric first-principles literature is largely dominated by studies of perovskites. We compare to calculations for the perovskites BaTiO3, PbTiO3, LiNbO3, BiAlO3, CdTiO3, BiFeO3, and we include in our comparison the double perovskite SrBi2Ta2O9. These comparisons are summarized in Table 6.

We note again here that because our workflow assumes ferromagnetic ordering, it will fail for many multiferroics since they tend to have other magnetic orderings, such as antiferromagnetic or noncollinear orderings. For example, two important multiferroics are BiFeO3 and YMnO3; the former is only captured to some extent by our workflow and the latter is not reported at all by our database. The standard polar phase of BiFeO3 features G-type antiferromagnetic ordering, and that for YMnO3 features A-type antiferromagnetic ordering. As the present version of our workflow initializes all calculations with ferromagnetic ground states for all magnetic systems, we do not expect to capture most multiferroics using the workflow as implemented for this search with the Materials Project database. Additionally, we note that the standard nonpolar reference structure for BiFeO3 is in space group 167 (R$$\bar{3}$$c); however, the R$$\bar{3}$$c structure of BiFeO3 is not in the Materials Project database. That the R3m and Pm$$\bar{3}$$m structures of BiFeO3 calculated in this workflow, with ferromagnetic ordering and the default U value, are found to be insulating is fortuitous and allows it to be reported as a high quality candidate. However, the BiFeO3 experimentally verified ground state structure, 161 (R3c), did not complete our workflow due to metallic interpolations. Conversely, YMnO3 is DEFUSED and does not proceed to the interpolations because its nonpolar structure is found to be metallic in our calculations. Extending the workflows to account for antiferromagnetic and other spin-ordered ground states, as well as to relax the constraint that a nonpolar reference structure exist in the database, will enable casting a wider net for known and new multiferroics, and will be the subject of future work.

### Determining known and new ferroelectrics

We distinguish known from new ferroelectrics in our workflow depending on whether or not a material has been reported in the literature as ferroelectric. Thus, we perform a literature review by hand for every considered candidate, and leave automating such literature searches to future work. We find that out of 200 high quality candidates, 74 are known or previously proposed ferroelectrics and 126 are, to our knowledge, new ferroelectric candidates.

In Fig. 6, we plot calculated spontaneous polarization versus nonpolar-polar total energy difference, maximum distortion distance, PBE band gap, and energy difference between the polar structure and convex hull reported in the Materials Project for known, proposed and new ferroelectric candidates. We also provide tables of the known and new candidates grouped by chemical formula and polar space group in Online-only Tables 1 and 2, respectively. Details connecting these candidate to specific workflow calculations are in Online-only Table 3.

As seen in Fig. 6, known and newly-proposed ferroelectrics display similar dispersion and overlap in the range considered. The middle row of Fig. 6 demonstrates the variety of known ferroelectric candidates that we are able to recover, from perovskites to candidates in the organic (NH4)2 SO4 family to candidates proposed by theory. The bottom row of Fig. 6 shows categories of new ferroelectric candidates we find in the Materials Project, some from previously known ferroelectric classes such as hexagonal manganite-like structures and less-studied categories such as fluorides, carbon-oxygen compounds, and crystals containing hydroxyl groups.

### Comparison to hand-curated list of ferroelectrics in the Pauling Files database

The Pauling File is a materials database accessible through SpringerMaterials18. In this database, there are materials tagged as ferroelectric and antiferroelectric. We use these tagged entries to validate whether the workflow is able to successfully identify diverse ferroelectrics by examining which tagged (anti)ferroelectrics complete the workflow. In the Pauling Files, there are 955 distinct compositions tagged as (anti)ferroelectric, 306 of which are pure (not doped) compositions. Out of 306 pure compositions, 95 of those compositions are included in the Materials Project as polar materials. This does not necessarily mean that the Materials Project database contains the same ferroelectric polar structure as referenced in the Pauling Files; rather, it simply means that there exists a polar structure in the Materials Project with the same composition as a tagged ferroelectric or antiferroelectric in the Pauling Files. 57 of these compositions have nonpolar-polar structure pairs in the Materials Project, 40 of which are found to have a continuously distortion by BCS Structure Relations. 32 of the 40 are successfully identified by the workflow as high-quality candidates, meaning the energy and polarization trends are smooth. Out of the 8 candidates that did not successfully make it through the workflow as high quality candidates, 4 of them (CrO3Y, Eu2GeSe4, Cl3CoTl, and MnO3Y) had metallic endpoints, 2 candidates (Cl3CrRb and Br3MnRb) had metallic interpolation structures, and 2 candidates (Cl4CoRb2 and B7ClCr3O13) did not have smooth energy trends due to fluctuating band gaps and magnetic moments in the interpolations. We note that most of these 8 candidates are non-d0 systems, and therefore expected to exhibit magnetic order. This suggests that the primary impact to the robustness of our workflow is the level of DFT used and the magnetic orderings considered; since PBE tends to underestimate band gaps, and since ferromagnetically-ordered systems tend to be overwhelmingly metallic, several of these candidates are experimentally insulating but are metallic in our workflow.

### Comparison to experimental measurements in Landolt-Börnstein series

To validate that our workflow calculates polarizations that can be used to guide experimental efforts, we compare to tabulated experimentally measured polarizations of known ferroelectrics in Landolt-Börnstein - Group III Condensed Matter - Ferroelectrics and Related Substances19,20,21,22,23. This series classifies hundreds of ferroelectrics into a 72 class numbering scheme. We note that polarization values for ferroelectrics in the Landolt-Börnstein series volumes 36A, 36B and 36C may be superseded by more recent experimental measurements. Experimentally measured polarizations depend greatly on the quality of the sample and the method used. For many ferroelectrics, polarization measurements made across decades vary greatly depending on these factors. Any comparison between theory must be taken in this context.

Nonetheless, plots of polarizations values calculated by our workflow vs. experimental spontaneous polarizations reported in the Landolt-Börnstein series volumes 36A, 36B and 36C are shown in Fig. 9 and tabulated in Table 7. We only compare search candidates to Landolt-Börnstein entries that match in composition and polar space group. For example, NaNbO3 is LB Number 1A-1 and has a polarization of 12 μC/cm2 for the 161 space group structure. However, all the NaNbO3 found in our search are orthorhombic (space groups 26 and 29), so we do not make the comparison.

For experimental polarizations greater than 10 μC/cm2, the majority of experimental polarizations are between 25% and −50% of those that we calculate, well within an order of magnitude. The exceptions are both polar structures of Bi2O9SrTa2 (space groups 36 and 41), which are calculated to have polarizations much greater than their experimental values. Multiple entries of a given formula indicate multiple calculations for different structure pairs in our dataset for the same compound. Compounds with polarizations of less than or equal to 1 μC/cm2 are not shown on the plot given the log-log scale. For polarizations less than 5 μC/cm2, we see our calculations capture the general trends of the experimental polarizations.

We find that the PBE functional we use for our DFT calculations overestimates polarizations. This is partially due to unit cells relaxed with PBE having larger than experimental volumes and thus larger distortions.

## Usage Notes

In this work, we present 413 nonpolar-polar structure pairs in the Materials Project database that are compatible with a second-order phase transition as ferroelectric candidates and perform DFT calculations of total energy, band gap, and polarization for these structures pairs.

This dataset offers the first opportunity to compare a large number of known, previously proposed, and new ferroelectrics side by side with the same methodology. We believe by setting strict criteria for ferroelectricity and casting a wide-net using high-throughput searches, we will find candidates that challenge and advance our understanding of ferroelectric phenomena. As seen in our candidates, there may be ferroelectrics waiting to be discovered that defy our expectations. This dataset will be useful for creating new tools and criteria for analyzing diverse ferroelectrics.

The infrastructure provided by the Bilbao Crystallographic Server, FireWorks, pymatgen, and atomate is crucial to being able to perform these types of searches efficiently. Thus, we also provide our code and data for these searches with the hope they will provide access for others to perform and develop similar searches.

Our code for performing structural interpolations and polarization calculations has been incorporated into the pymatgen and atomate packages. We also provide the code that we use to create the interface that we used to view our candidates in aggregate.

The workflow we have presented can be extended to any crystal structure database, experimental and hypothetical. Several modifications can be made to this workflow to extend the scope of these searches. Notably, extending our workflow to treat different magnetic orderings will enable it to treat multiferroics. Additionally, the same DFT workflow can be used to screen any experimentally measured polar structure - even one without an existing nonpolar reference - by generating nonpolar reference structures with BCS Pseudosymmetry7. Our workflow can also be adapted to perform species substitutions and find symmetry relations between structure types, classes of structures that share space groups, Wyckoff positions, and other lattice similarities.