A hybrid organic-inorganic perovskite dataset


Hybrid organic-inorganic perovskites (HOIPs) have been attracting a great deal of attention due to their versatility of electronic properties and fabrication methods. We prepare a dataset of 1,346 HOIPs, which features 16 organic cations, 3 group-IV cations and 4 halide anions. Using a combination of an atomic structure search method and density functional theory calculations, the optimized structures, the bandgap, the dielectric constant, and the relative energies of the HOIPs are uniformly prepared and validated by comparing with relevant experimental and/or theoretical data. We make the dataset available at Dryad Digital Repository, NoMaD Repository, and Khazana Repository (http://khazana.uconn.edu/), hoping that it could be useful for future data-mining efforts that can explore possible structure-property relationships and phenomenological models. Progressive extension of the dataset is expected as new organic cations become appropriate within the HOIP framework, and as additional properties are calculated for the new compounds found.

Design Type(s) data integration objective • database creation objective
Measurement Type(s) material properties
Technology Type(s) computational modeling technique
Factor Type(s) cation


Background and Summary

Perovskites are a class of inorganic crystals with the chemical formula ABX3 that share the structure of calcium titanate, CaTiO3. In this structure, the cations A and B are coordinated by 12 and 6 anions X, respectively. By substituting an organic cation for A, the first hybrid organic-inorganic perovskites (HOIPs), namely CH3NH3PbX3 (X=Cl, Br, I), were synthesized and characterized in 1978 (ref. 1). HOIPs remained largely unnoticed until 2009, when CH3NH3PbX3 (X=Br, I) was first applied successfully as a photovoltaic absorber with a power conversion efficiency of 3.8% (ref. 2). Since then, an enormous number of experimental and computational efforts have been devoted to optimizing halide-based HOIPs, e.g., CH3NH3PbI3, HC(NH2)2PbI3, and CH3NH3SnI3, for photovoltaic applications3–6. Currently, CH3NH3PbI3 and HC(NH2)2PbI3 lead the field, combining high performance (a conversion efficiency reaching 20.1%; ref. 7) with low fabrication cost3–6.

In fact, there are many choices for the sites A, B, and X in a HOIP. At the site A, methylammonium CH3NH3 (refs 3–5,8), formamidinium HC(NH2)2 (refs 7,9), and many more6 have been realized. Cations B can be Pb or Sn, while the halogens Br, I, and Cl can be used for X1,2. Moreover, the introduction of an organic cation A into the perovskite structure can give rise to many different structural motifs6,10–12, making the class of halide-based HOIPs highly diverse. Rapidly and thoroughly screening this unexplored domain of the chemical space, for instance with the emerging data-driven approaches13–25, may reveal promising new compounds that could meet the pressing need for lead-free perovskite solar cell materials26.

This contribution takes an initial step towards the creation of a comprehensive database of HOIPs, which may be useful for this goal. The idea has recently been emerging, with some datasets of hybrid organic/inorganic perovskites already prepared at various levels of computation27,28. Our dataset, which contains 1,346 HOIPs, is prepared uniformly at the level of density functional theory (DFT)29,30 from initial structures predicted by the minima-hopping method31,32. For each material, the equilibrium structure, the relative energies ($\varepsilon_{\mathrm{rel}}^{1}$ and $\varepsilon_{\mathrm{rel}}^{2}$, computed with respect to different energy references as described in the Numerical calculations section), the atomization energy ($\varepsilon_{\mathrm{at}}$), the dielectric constant ($\epsilon$), and the direct or indirect energy bandgap ($E_g$) are reported. This dataset, which is available at Dryad Digital Repository, NoMaD Repository, and Khazana Repository, can readily be expanded in multiple ways: new properties can be calculated from the provided structures, and new HOIPs can be progressively added. We expect that this dataset can supply a playground for future machine-learning-based work in this active research area.



Workflow

Figure 1 summarizes the workflow of the dataset preparation. The procedure starts by collecting 16 organic (molecular) cations A+1, all of which have been considered in the literature1,6,7,12. Each of these 16 cations, shown in Fig. 2, is placed at the site A of the ASnI3-based perovskites. This is the starting point for various structure prediction simulations, performed with the minima-hopping method31,32. The low-energy structures predicted for ASnI3 are subjected to a preliminary filtering step, keeping 135 prototype structures that differ in DFT energy and volume (these quantities are estimated at the lower accuracy level used for the searches). Next, we expand the set of 135 structures by substituting either Ge or Pb for Sn and, similarly, by substituting either F, Cl, or Br for I. The resulting 1,620 (=135×3×4) initial structures were optimized by DFT at the desired level of accuracy (described in the Numerical calculations section), yielding the relative energies and the atomization energies. Then, the band edge positions in k space, the energy bandgap, and the dielectric constant were calculated for the optimized structures. A post-filtering step is finally performed on the whole dataset to remove redundancy (this time identified at the desired accuracy level of the DFT computations), keeping 1,346 distinct data points (summarized in Table 1). Whenever possible, our calculated results are compared with previously computed and/or measured data. Relaxed structures of all the materials are finally converted into the crystallographic information file (cif) format using the pymatgen library33.

Figure 1: Scheme for preparing the dataset of hybrid organic-inorganic perovskites.

Minima-hopping is a structure prediction method that was used for generating an initial set of 135 ASnI3 prototypical structures (where A stands for 16 organic cations), which were used as seeds for the creation of the remaining compounds.

Figure 2: Ball and stick representations of 16 organic cations considered in the HOIP dataset.

Carbon, hydrogen, oxygen and nitrogen atoms are shown in dark brown, light pink, red, and gray, respectively.

Table 1 Summary of the data subclasses in the hybrid organic-inorganic perovskites dataset.

Initial structure accumulation

As briefly described in the Workflow section, our dataset is built up from 135 prototype structures obtained by searching for low-energy structures of 16 HOIPs with chemical formulae ASnI3 (in fact, prototype structures of any material can be searched). In the minima-hopping structure prediction simulations, the DFT-level energy is used to construct the potential energy surface (PES) of the composition31,32. Starting from an initial structure, low-energy minima of the PES are then searched by alternately performing DFT-based local optimization runs (to locate the nearby minima) and molecular dynamics runs (to escape the identified minima). Thanks to the feedback mechanisms implemented, structure searches using this method are biased towards the low-energy domains of the PES. Because of the large number of minima, the searches were performed at a lower accuracy level of the DFT energy, and the minima identified in this step were then refined at the desired level. The power of the minima-hopping method has been demonstrated for several classes of crystalline solids34–36, including three SnI3-based HOIPs12.
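The alternation of local optimization and molecular dynamics escapes, together with the feedback on the kinetic energy and on the acceptance threshold, can be illustrated with a toy one-dimensional sketch. Everything specific here is an assumption made for illustration: the analytic double-well `potential` stands in for the DFT energy, and the parameter values (`e_kin`, `e_diff`, the scaling factors) are not the settings used to build the dataset.

```python
import math
import random

def potential(x):
    """Toy double-well energy; a stand-in for the DFT energy (assumption)."""
    return 0.25 * x**4 - x**2 + 0.3 * x

def gradient(x):
    return x**3 - 2.0 * x + 0.3

def local_minimize(x, step=0.01, tol=1e-6, max_iter=100_000):
    """Plain gradient descent to the nearest local minimum."""
    for _ in range(max_iter):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x

def md_escape(x, e_kin, direction, dt=0.01, max_steps=100_000):
    """Velocity-Verlet run (unit mass) that stops once the potential energy
    has passed a maximum along the trajectory and starts to decrease."""
    v = direction * math.sqrt(2.0 * e_kin)
    a = -gradient(x)
    e_old, rising = potential(x), False
    for _ in range(max_steps):
        x += v * dt + 0.5 * a * dt * dt
        a_new = -gradient(x)
        v += 0.5 * (a + a_new) * dt
        a = a_new
        e_new = potential(x)
        if e_new > e_old:
            rising = True
        elif rising:
            break
        e_old = e_new
    return x

def minima_hopping(x0, n_hops=60, seed=0, e_kin=0.1, e_diff=0.1,
                   beta_up=1.1, beta_down=1.0 / 1.1,
                   alpha_acc=1.0 / 1.1, alpha_rej=1.1, same_tol=1e-3):
    rng = random.Random(seed)
    x_cur = local_minimize(x0)
    visited, best = [x_cur], x_cur
    for _ in range(n_hops):
        x_new = local_minimize(md_escape(x_cur, e_kin, rng.choice((-1.0, 1.0))))
        if abs(x_new - x_cur) < same_tol:
            e_kin *= beta_up          # escape failed: push harder next time
            continue
        if any(abs(x_new - x) < same_tol for x in visited):
            e_kin *= beta_up          # already-known minimum: push harder
        else:
            e_kin *= beta_down        # new minimum: relax the kinetic energy
            visited.append(x_new)
        if potential(x_new) - potential(x_cur) < e_diff:
            x_cur = x_new             # accept; tighten the threshold
            e_diff *= alpha_acc
        else:
            e_diff *= alpha_rej       # reject; loosen the threshold
        if potential(x_cur) < potential(best):
            best = x_cur
    return best, potential(best)

best_x, best_e = minima_hopping(x0=1.3)
```

Starting in the shallower well near x = 1.3, the kinetic energy grows after each failed escape until the barrier is crossed, which is the feedback behaviour the text refers to.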

For each of the 16 ASnI3 HOIPs, the numerous low-energy structures identified are subjected to a filtering step, keeping only those that differ by at least 5 meV/atom in the DFT energy and by at least 0.1 Å3/atom in the structure volume. After the filtering step, 135 prototypical structures of the 16 HOIPs were selected, three of which are shown in Fig. 3. For isotropic organic cations such as tetramethylammonium, a cubic-like cage formed by the network of Sn and I ions is stabilized in a three-dimensional structure. For anisotropic or polar organic cations, the framework deforms into a two-dimensional planar or pillar motif. Further structural variety can be expected from additional structure searches using different organic cations and/or slightly nonstoichiometric compositions in the HOIP system6,37. By substituting either Ge or Pb for Sn, and substituting either Cl, F, or Br for I, 1,620 structures of 192 chemically distinct HOIPs were obtained. These are the initial structures used to build up the HOIP dataset.
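The counts quoted above follow directly from the substitution scheme, as the short sketch below shows. The tuple representation of a structure and the function name `expand_prototype` are hypothetical; only the element lists and the numbers 16 and 135 come from the text.

```python
from itertools import product

B_SITES = ("Ge", "Sn", "Pb")      # group-IV cations
X_SITES = ("F", "Cl", "Br", "I")  # halide anions

def expand_prototype(prototype_id):
    """Return one (prototype, B, X) label for every B/X substitution."""
    return [(prototype_id, b, x) for b, x in product(B_SITES, X_SITES)]

N_CATIONS = 16       # organic cations A
N_PROTOTYPES = 135   # ASnI3 prototype structures kept after filtering

initial_structures = [
    label for pid in range(N_PROTOTYPES) for label in expand_prototype(pid)
]
n_initial = len(initial_structures)                        # 135 x 3 x 4
n_compositions = N_CATIONS * len(B_SITES) * len(X_SITES)   # 16 x 3 x 4
```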

Figure 3: Lowest energy structures of tetramethylammonium, hydrazinium, and propylammonium tin iodide showing three prototypical conformations of organic-inorganic hybrid perovskites.

Carbon, hydrogen, nitrogen, tin and iodine atoms are shown in dark brown, light pink, gray, green and purple, respectively.

Numerical calculations

General scheme

Our calculations are performed within the DFT29,30 formalism, using the projector augmented-wave (PAW) method38 as implemented in the Vienna Ab initio Simulation Package (vasp)39–42. The default accuracy level of our calculations is ‘Accurate’, specified by setting PREC=Accurate in all the runs with vasp. The basis set includes plane waves with kinetic energies up to 400 eV, as recommended by the vasp manual for this level of accuracy. The PAW datasets of version 5.2, which were used to describe the ion-electron interactions, are summarized in Table 2. The van der Waals dispersion interactions are estimated with the non-local density functional vdW-DF2 (ref. 43). The generalized gradient approximation (GGA) functional associated with vdW-DF2, i.e., refitted Perdew-Wang 86 (rPW86)44, was used for the exchange-correlation (XC) energies. For all the calculations except the bandgap determination, we sample the Brillouin zones, which differ significantly in shape among the compounds, with an equispaced (spacing hk=0.20 Å−1), Γ-centered Monkhorst-Pack45 k-point mesh. A structure is considered equilibrated when the atomic forces are below 0.01 eV/Å. This numerical scheme is consistent with the one we used for preparing the polymer dataset reported in ref. 35.
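A fixed k-point spacing translates into a different number of mesh divisions for each cell. The sketch below shows one common way of doing this conversion; it is an assumption rather than the exact rule used for the dataset. In particular, the text does not state whether hk is measured with or without the factor 2π in the reciprocal lattice vectors, so the hypothetical function `kmesh_from_spacing` exposes that choice as a flag.

```python
import math

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def kmesh_from_spacing(lattice, spacing=0.20, include_two_pi=True):
    """Number of k-point divisions along each reciprocal lattice vector.

    lattice: three real-space lattice vectors in Å (rows).
    spacing: target k-point spacing in Å^-1.
    include_two_pi: whether |b_i| carries the factor 2*pi (assumption).
    """
    a1, a2, a3 = lattice
    volume = abs(_dot(a1, _cross(a2, a3)))
    factor = 2.0 * math.pi if include_two_pi else 1.0
    recip = (_cross(a2, a3), _cross(a3, a1), _cross(a1, a2))
    lengths = [factor * math.sqrt(_dot(b, b)) / volume for b in recip]
    return tuple(max(1, math.ceil(length / spacing)) for length in lengths)

# Hypothetical cells used only to exercise the function.
cubic = ((6.0, 0.0, 0.0), (0.0, 6.0, 0.0), (0.0, 0.0, 6.0))
layered = ((6.0, 0.0, 0.0), (0.0, 6.0, 0.0), (0.0, 0.0, 12.0))
mesh_cubic = kmesh_from_spacing(cubic)
mesh_layered = kmesh_from_spacing(layered)
```

Long cell axes, as in the layered motifs, receive fewer divisions, which is why a spacing rather than a fixed mesh is convenient for a structurally diverse dataset.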

Table 2 VASP PAW potentials of the elements used for calculations in this work.

Bandgap determination

The bandgap Eg is perhaps the most desired physical property of HOIPs. Within DFT, Eg is determined as the energy difference between the conduction band minimum (CBM) and the valence band maximum (VBM), identified on a given k-point mesh. For a solid with an arbitrary primitive cell, the locations of VBM and CBM are generally not known beforehand, and the k-point mesh should be very dense in order to locate the band edges accurately. With a mesh of this type, the computation of Eg using the Heyd-Scuseria-Ernzerhof (HSE06)46,47 exchange-correlation functional, the level of DFT at which the calculated bandgap is expected to be close to the real bandgap, is computationally prohibitive. Although such a computation at the GGA level of DFT is feasible, Eg is generally underestimated by 30% or more48.

The conduction bands and the valence bands computed at the GGA and HSE06 levels of DFT are essentially similar in shape. However, they are shifted as a whole with respect to each other and to the true electronic structures (see, for example, ref. 49). Therefore, our bandgap determination procedure, shown in Fig. 4, includes two steps. First, the locations of the VBM and CBM are searched at the GGA level on three different dense k-point meshes. The first two meshes (one centered at Γ=(0,0,0) and the other centered at X=(0.5, 0.5, 0.5)) are equispaced with hk=0.15 Å−1, while the third mesh contains k-points distributed along Γ-X-M-Γ-R-M-X-R, the path that has widely been used to represent the electronic band structure of HOIPs12,50. In the second step, the positions of the VBM and CBM identified in the first step are added with zero weight to a Monkhorst-Pack k-point mesh with hk=0.20 Å−1 that samples the Brillouin zone, thereby determining the energy difference between the CBM and VBM at the HSE06 level of DFT. Although this procedure requires some extra work, we expect the bandgap computed in this way to be reliable for HOIPs with an arbitrary primitive cell.
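The first step of this procedure amounts to pooling the eigenvalues from the three k-point sets and picking the extremal band edges. A minimal sketch follows, assuming a non-spin-polarized insulator with a fixed number of occupied bands; the function name `band_edges` and all eigenvalues are hypothetical and chosen only to exercise the logic.

```python
def band_edges(eigenvalues, n_occupied):
    """Locate the VBM and CBM over a pooled set of k-points.

    eigenvalues: dict mapping a k-point (tuple of fractional coordinates)
                 to its band energies in eV, sorted in ascending order.
    n_occupied:  number of occupied bands at every k-point.
    """
    vbm_k, vbm = max(
        ((k, e[n_occupied - 1]) for k, e in eigenvalues.items()),
        key=lambda item: item[1],
    )
    cbm_k, cbm = min(
        ((k, e[n_occupied]) for k, e in eigenvalues.items()),
        key=lambda item: item[1],
    )
    return {"vbm_k": vbm_k, "cbm_k": cbm_k,
            "gap": cbm - vbm, "direct": vbm_k == cbm_k}

# Made-up eigenvalues for the three k-point sets (not taken from the dataset).
set_gamma = {(0.0, 0.0, 0.0): [-1.0, 0.5, 2.9], (0.5, 0.0, 0.0): [-1.2, 0.3, 2.6]}
set_x = {(0.5, 0.5, 0.5): [-0.9, 0.8, 2.4]}
set_path = {(0.5, 0.5, 0.0): [-1.1, 0.6, 2.2]}

pooled = {**set_gamma, **set_x, **set_path}
edges = band_edges(pooled, n_occupied=2)
```

The two k-points returned here are the ones that would be appended with zero weight to the coarser mesh for the HSE06 run in the second step.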

Figure 4: Scheme for calculating the bandgap of hybrid organic-inorganic perovskites at the GGA and HSE06 levels of theory.

Data entry 0845 (MASnI3; CH3NH3SnI3, Khazana ID: 2695) is used for demonstration. Set-Γ, Set-X and Set-p correspond to the k-point sets generated with the Γ-centered mesh, the X-centered mesh, and the high-symmetry path for the P1 group, respectively.

Atomization and relative energies definitions

The atomization energy of each of these compounds is calculated as

$$\varepsilon_{\mathrm{at}} = E_{\mathrm{ABX_3}} - \sum_i n_i E_i \tag{1}$$

where $E_{\mathrm{ABX_3}}$ is the energy of the HOIP, and $n_i$ and $E_i$ are the number of atoms and the energy of an isolated atom of element $i$, respectively. We also report two kinds of relative energies, defined with respect to the atomic constituents and the solid constituents, respectively:

$$\varepsilon_{\mathrm{rel}}^{1} = E_{\mathrm{ABX_3}} - E_{\mathrm{A}} - E_{\mathrm{B}} - \tfrac{3}{2} E_{\mathrm{X_2}} - \tfrac{1}{2} E_{\mathrm{H_2}} \tag{2}$$

$$\varepsilon_{\mathrm{rel}}^{2} = E_{\mathrm{ABX_3}} - E_{\mathrm{A}} - E_{\mathrm{BX_2}} - E_{\mathrm{HX}} \tag{3}$$

where $E_{\mathrm{A}}$, $E_{\mathrm{B}}$, $E_{\mathrm{X_2}}$, and $E_{\mathrm{H_2}}$ are the energies of the isolated neutral organic molecule A, the metallic crystal B, and the isolated X2 and H2 molecules, respectively. $E_{\mathrm{BX_2}}$ and $E_{\mathrm{HX}}$ are the energies of the metal halides (BX2) and hydrogen halides (HX), respectively. For the tetramethylammonium cation (C4H12N+), the energy of neutral trimethylamine (C3H9N) was used for $E_{\mathrm{A}}$, and the energies of the molecules C2H6 and CH3X were used instead of $E_{\mathrm{H_2}}$ and $E_{\mathrm{HX}}$ in equations (2) and (3), respectively.
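Equations (1) to (3) are simple energy differences and can be written down directly, as in the sketch below. The function names are hypothetical and all reference energies are placeholder numbers, not values from the dataset. The expressions are implemented as written; the data files report these quantities in eV/atom, and any such normalization would have to be applied separately.

```python
def atomization_energy(e_hoip, atom_counts, atom_energies):
    """Equation (1): HOIP energy minus the energies of its isolated atoms."""
    return e_hoip - sum(n * atom_energies[el] for el, n in atom_counts.items())

def relative_energy_1(e_hoip, e_a, e_b, e_x2, e_h2):
    """Equation (2): references are molecule A, metal B, X2 and H2."""
    return e_hoip - e_a - e_b - 1.5 * e_x2 - 0.5 * e_h2

def relative_energy_2(e_hoip, e_a, e_bx2, e_hx):
    """Equation (3): references are molecule A, metal halide BX2 and HX."""
    return e_hoip - e_a - e_bx2 - e_hx

# Placeholder energies in eV (hypothetical, for illustration only).
counts = {"C": 1, "H": 6, "N": 1, "Sn": 1, "I": 3}   # CH3NH3SnI3 formula unit
isolated = {"C": -1.0, "H": -1.0, "N": -3.0, "Sn": -0.5, "I": -0.2}

e_at = atomization_energy(-50.0, counts, isolated)
e_rel1 = relative_energy_1(-50.0, e_a=-30.0, e_b=-4.0, e_x2=-3.0, e_h2=-6.8)
e_rel2 = relative_energy_2(-50.0, e_a=-30.0, e_bx2=-10.0, e_hx=-5.0)
```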


The preliminary filtering step is performed only on the prototypical structures (ASnI3), based on their DFT energy and bandgap estimated with limited accuracy during the structure prediction runs. Therefore, an additional filtering step is performed on all the relaxed structures obtained from the 1,620 initial structures to remove any possible redundancy. Within this step, all cases that have the same chemical composition but differ by less than 2% in the unit cell volume Ω, $E_g$, $\varepsilon_{\mathrm{at}}$, $\epsilon_{\mathrm{elec}}$ and $\epsilon_{\mathrm{ion}}$ are clustered. All the clustered points were inspected visually, keeping only those materials that are distinct. At the end of this step, we are left with 1,346 distinct compounds (also summarized in Table 1). These compounds constitute our final dataset.
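The clustering criterion can be sketched as follows. This is an illustration under stated assumptions: the relative difference is taken with respect to the larger magnitude of the two values, clusters are grown greedily, and the dictionary keys and sample entries are hypothetical. The sketch only flags candidates; in the procedure described above, the final decision was made by visual inspection.

```python
from collections import defaultdict

PROPERTIES = ("volume", "gap", "e_at", "eps_elec", "eps_ion")

def rel_diff(a, b):
    """Relative difference with respect to the larger magnitude (assumption)."""
    scale = max(abs(a), abs(b))
    return 0.0 if scale == 0.0 else abs(a - b) / scale

def is_similar(p, q, tol=0.02):
    """True when every compared property differs by less than tol."""
    return all(rel_diff(p[key], q[key]) < tol for key in PROPERTIES)

def cluster_candidates(entries, tol=0.02):
    """Group same-composition entries whose properties all agree within tol."""
    by_formula = defaultdict(list)
    for entry in entries:
        by_formula[entry["formula"]].append(entry)
    clusters = []
    for group in by_formula.values():
        group_clusters = []
        for entry in group:
            for cluster in group_clusters:
                if any(is_similar(entry, member, tol) for member in cluster):
                    cluster.append(entry)
                    break
            else:
                group_clusters.append([entry])
        clusters.extend(c for c in group_clusters if len(c) > 1)
    return clusters

# Hypothetical entries used only to exercise the function.
a = {"id": 1, "formula": "CH6NSnI3", "volume": 251.45, "gap": 2.63,
     "e_at": -3.91, "eps_elec": 4.86, "eps_ion": 13.07}
b = dict(a, id=2, volume=252.0, gap=2.64)
c = dict(a, id=3, volume=300.0)
d = dict(a, id=4, formula="CH6NPbI3")

flagged = cluster_candidates([a, b, c, d])
```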

Data Records

The complete dataset of HOIP materials can be downloaded as a tarball or accessed via Dryad Digital Repository (Data Citation 1) and Khazana Repository (http://khazana.uconn.edu/). The 1,346 compounds in our final dataset are recorded under Khazana IDs 1851 to 3197. All 8,076 (=1,346×6) DFT runs of the whole dataset are hosted by NoMaD Repository (Data Citation 2). For each structure there are 6 runs: relaxation, dielectric constant, GGA bandgap with the Γ-centered mesh, GGA bandgap with the X-centered mesh, GGA bandgap with k-points distributed along Γ-X-M-Γ-R-M-X-R, and HSE06 bandgap.

File format

The information reported in the dataset for a given material is stored in a file named N.cif, where N is a cardinal number that identifies the entry in the dataset. The first part of such a file contains the optimized structure in the standard cif format, which is compatible with many visualization programs. Other information, including the calculated properties, is provided as comment lines in the second part of the file, as in the listing shown below for N=845.

While most of the keywords are self-explanatory, the keyword Label provides more detailed information on the HOIP compound, namely the common names of the organic cation A, the cation B, and the anion X. The origin of the formula and structure of the organic cation is given under the keyword Organic cation source. The keywords Material class and Geometry class are set to ‘Hybrid organic-inorganic perovskite’ and ‘Bulk crystalline materials’, respectively.

# HOIP entry ID: 0845
# Khazana ID: 2695
# Organic cation source: T.D.Huan et al., Phys. Rev. B 93,094105(2016)
# Label: Methylammonium Tin Iodide
# Material class: Hybrid organic-inorganic perovskite (MC_ino)
# Geometry class: Bulk crystalline materials (GC_cry)
# Organic cation chemical formula: CH3NH3
# Number of atom types: 5
# Total number of atoms: 12
# Atom types: C H N Sn I
# Number of each atom: 1 6 1 1 3
# Bandgap, HSE06 (eV): 2.6347
# Bandgap, GGA (eV): 1.9191
# Kpoint for VBM: 0.5, 0.0556, 0.5
# Kpoint for CBM: 0.5, 0.5, 0
# Dielectric constant, electronic: 4.8562
# Dielectric constant, ionic: 13.0716
# Dielectric constant, total: 17.9278
# Refractive index: 2.2037
# Atomization energy (eV/atom): -3.9099
# Relative energy1 (eV/atom): 0.2785
# Relative energy2 (eV/atom): 0.4387
# Volume of the unit cell (A^3): 251.45
# Density (g/cm^3): 3.51
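The property block is easy to read back programmatically. The sketch below is one possible reader, assuming that each property sits on its own line beginning with `#` and that the keyword is separated from the value by the first colon; the function names `parse_properties` and `to_float` are hypothetical. It also checks two relations that hold for the values of entry 0845: the total dielectric constant is the sum of the electronic and ionic parts, and the refractive index equals the square root of the electronic part.

```python
import math

def parse_properties(text):
    """Collect 'keyword: value' pairs from the '#' comment lines of a file."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("#"):
            continue
        key, sep, value = line.lstrip("#").strip().partition(":")
        if sep:
            props[key.strip()] = value.strip()
    return props

def to_float(value):
    """Convert a value to float, tolerating a typographic minus sign."""
    return float(value.replace("\u2212", "-"))

sample = """\
# HOIP entry ID: 0845
# Label: Methylammonium Tin Iodide
# Bandgap, HSE06 (eV): 2.6347
# Dielectric constant, electronic: 4.8562
# Dielectric constant, ionic: 13.0716
# Dielectric constant, total: 17.9278
# Refractive index: 2.2037
# Atomization energy (eV/atom): -3.9099
"""

props = parse_properties(sample)
eps_elec = to_float(props["Dielectric constant, electronic"])
eps_ion = to_float(props["Dielectric constant, ionic"])
eps_total = to_float(props["Dielectric constant, total"])
n_refr = to_float(props["Refractive index"])
```

In practice the same function would be applied to the full contents of an N.cif file, since lines of the structure block that do not start with `#` are skipped.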

Graphical summary of the dataset

We visualize the calculated quantities in the property space as shown in Fig. 5. Because the relative energy, the unit cell volume, the bandgap and the dielectric constant are the primary properties reported in this dataset, six plots are shown, namely Ω vs $\varepsilon_{\mathrm{rel}}^{1}$, Ω vs $E_g^{\mathrm{HSE06}}$, $E_g^{\mathrm{GGA}}$ vs $E_g^{\mathrm{HSE06}}$, $E_g^{\mathrm{HSE06}}$ vs $\epsilon_{\mathrm{elec}}$, $E_g^{\mathrm{HSE06}}$ vs $\epsilon_{\mathrm{ion}}$, and $E_g^{\mathrm{HSE06}}$ vs $\epsilon$. Compounds containing different A cations and X anions are represented by symbols of different colors and sizes to clarify the role of the chemical composition in controlling the properties of the HOIPs.

Figure 5: A summary of the HOIP dataset based on the calculated unit cell volume Ω, relative energy $\varepsilon_{\mathrm{rel}}^{1}$, GGA-level bandgap $E_g^{\mathrm{GGA}}$, HSE-level bandgap $E_g^{\mathrm{HSE06}}$, and the dielectric constants $\epsilon_{\mathrm{elec}}$, $\epsilon_{\mathrm{ion}}$, and $\epsilon = \epsilon_{\mathrm{elec}} + \epsilon_{\mathrm{ion}}$.

The panels show (a) unit cell volume vs relative energy, (b) unit cell volume vs HSE bandgap, (c) GGA bandgap vs HSE bandgap, (d) HSE bandgap vs electronic dielectric constant, (e) HSE bandgap vs ionic dielectric constant, and (f) HSE bandgap vs total dielectric constant. In each plot, the color and size of the symbols are coded following the figure keys shown in plot (a).

The dataset is clearly clustered according to the X anions, following the sequence F, Cl, Br and I. As shown in Fig. 5a, most of the F-containing HOIP compounds are more favorable to form, as measured by the relative energy, regardless of the A cation. Bandgap and unit cell volume are strongly correlated, mainly because the electronegativity and the ionic radii of the X anions differ significantly among F, Cl, Br and I. A simple and strong linear correlation, with a scale factor of ~1.2, is found between the GGA- and HSE-level bandgaps, as shown in Fig. 5c. Small bandgap values ranging from 1.5 eV to 1.6 eV, favorable for photovoltaic applications, were found for SnI3-containing HOIP compounds including CH3NH3SnI3, NH3NH2SnI3, and C3H8NSnI3. A limit of the form $\epsilon_{\mathrm{elec}} \sim 1/E_g^{\mathrm{HSE06}}$, shown in Fig. 5d, has also been demonstrated for other classes of materials in the literature13,35,36,51–62.

Technical Validation

The relative energy computed via equation (2) is physically relevant for examining the relative stability, which is useful for future studies of new HOIPs. As the dataset contains theoretically stable structures, we used the bandgap, the dielectric constant, and the XRD pattern with Cu Kα radiation (1.54056 Å) to validate the calculations. Since available experimental studies of HOIPs seem to be limited to a small subset of the combinatorial possibilities, only a small number of experimental bandgaps could be collected from available resources. These correspond to compounds containing acetamidinium (ACM, C2H7N2), formamidinium (FA, CH5N2), guanidinium (GUA, CH6N3), isopropylammonium (IPA, C3H10N), methylammonium (MA, CH3NH3), and tetramethylammonium (TMA, C4H12N). Four computed bandgaps are also included in the comparison set. As shown in Fig. 6a, the calculated bandgap for the most stable structure of each case (marked by color-coded symbols) agrees well with the data from previous studies (gray symbols correspond to less stable polymorphs).

Figure 6: Validation of data computed for some HOIPs by comparing it with the measured data available.

Bandgaps and dielectric constants computed for the low-energy structures of these compounds are plotted in (a) and (b), respectively, against those experimentally measured. In these panels, the lowest-energy structure of each HOIP is indicated by a colored symbol, while data from the energetically competing structures are shown in gray (a) or given within an error bar (b). Experimental data for the bandgaps and dielectric constants of these HOIPs are obtained from refs 8,64–79 and refs 74–83, respectively. In (c–f), the simulated and measured XRD spectra for MAPbBr3 (ref. 65), MAPbI3 (ref. 84), IPAGeI3 (ref. 73), and MASnI3 (refs 5,85) are shown. The reported index of each reflection is given on top of each significant peak.

To further validate the HOIP dataset, experimentally measured and theoretically calculated dielectric constants in both the high-frequency and static regimes were collected and compared with the computed dielectric constants. This information is available for a limited number of HOIPs with MA and FA organic cations. Since the computation of the dielectric constant using DFPT is highly sensitive to the numerical accuracy of the vibrational frequencies, we used a rather tight convergence criterion of 10^−8 eV for the change in total energy. Figure 6b shows the excellent agreement between the previously reported and the computed dielectric constants for the selected HOIPs. Finally, we show the XRD spectra calculated for four HOIPs, namely MAPbBr3, MAPbI3, IPAGeI3 and MASnI3, in Fig. 6c–f. Each is compared with the corresponding measured XRD pattern; the reasonable agreement supports the validity of the computational scheme.
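For readers who want to relate the XRD peak positions to lattice spacings, Bragg's law with the Cu Kα wavelength quoted above gives the diffraction angle directly. The sketch below is restricted to a cubic cell for brevity, which is an assumption made for illustration: most structures in the dataset are not cubic and would need the full metric tensor for the d-spacing. The lattice parameter of 6.0 Å is hypothetical.

```python
import math

CU_K_ALPHA = 1.54056  # wavelength in Å, as quoted in the text

def two_theta_cubic(a, hkl, wavelength=CU_K_ALPHA):
    """Diffraction angle 2-theta (degrees) of reflection hkl for a cubic cell.

    Uses d = a / sqrt(h^2 + k^2 + l^2) and Bragg's law, lambda = 2 d sin(theta).
    Returns None when the reflection cannot be reached at this wavelength.
    """
    h, k, l = hkl
    d = a / math.sqrt(h * h + k * k + l * l)
    s = wavelength / (2.0 * d)
    if s > 1.0:
        return None
    return 2.0 * math.degrees(math.asin(s))

# Hypothetical cubic lattice parameter in Å.
peak_100 = two_theta_cubic(6.0, (1, 0, 0))
peak_200 = two_theta_cubic(6.0, (2, 0, 0))
unreachable = two_theta_cubic(6.0, (8, 0, 0))
```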

Usage Notes

This dataset, which includes 1,346 HOIPs, has been consistently prepared using first-principles calculations. While the HSE06 bandgap $E_g^{\mathrm{HSE06}}$ is believed to be fairly close to the true bandgap of the materials, the GGA-rPW86 bandgap is also reported for completeness and for possible further analysis. The reported atomization energy and the dielectric constants are also expected to be accurate.

Additional Information

How to cite this article: Kim, C. et al. A hybrid organic-inorganic perovskite dataset. Sci. Data 4:170057 doi: 10.1038/sdata.2017.57 (2017).

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



  1. 1

    Weber, D. CH3NH3PbX3, a Pb(II)-System with Cubic Perovskite Structure. Z. Naturforsch., B: J. Chem. Sci. 33, 1443–1445 (1978).

    Article  ADS  Google Scholar 

  2. 2

    Kojima, A., Teshima, K., Shirai, Y. & Miyasaka, T. Organometal Halide Perovskites as Visible-Light Sensitizers for Photovoltaic Cells. J. Am. Chem. Soc. 131, 6050 (2009).

    CAS  Article  PubMed  Google Scholar 

  3. 3

    Burschka, J. et al. Sequential deposition as a route to high-performance perovskite-sensitized solar cells. Nature 499, 316–319 (2013).

    CAS  Article  ADS  PubMed  Google Scholar 

  4. 4

    Liu, M., Johnston, M. B. & Snaith, H. J. Efficient planar heterojunction perovskite solar cells by vapour deposition. Nature 501, 395–398 (2013).

    CAS  Article  ADS  PubMed  Google Scholar 

  5. 5

    Hao, F., Stoumpos, C. C., Cao, D. H., Chang, R. P. H. & Kanatzidis, M. G. Lead-free solid-state organic-inorganic halide perovskite solar cells. Nature Photon 8, 489–494 (2014).

    CAS  Article  ADS  Google Scholar 

  6. 6

    Saparov, B. & Mitzi, D. B. Organic-Inorganic Perovskites: Structural Versatility for Functional Materials Design. Chem. Rev. 116, 4558–4596 (2016).

    CAS  Article  PubMed  Google Scholar 

  7. 7

    Yang, W. S. et al. High-performance photovoltaic perovskite layers fabricated through intramolecular exchange. Science 348, 1234–1237 (2015).

    CAS  Article  ADS  PubMed  Google Scholar 

  8. 8

    Baikie, T. et al. Synthesis and crystal chemistry of the hybrid perovskite (CH3NH3)PbI3 for solid-state sensitised solar cell applications. J. Mater. Chem. A 1, 5628–5641 (2013).

    CAS  Article  Google Scholar 

  9. 9

    Mitzi, D. B. & Liang, K. Synthesis, Resistivity, and Thermal Properties of the Cubic Perovskite NH2CH=NH2SnI3 and Related Systems. J. Solid State Chem. 134, 376–381 (1997).

    CAS  Article  ADS  Google Scholar 

  10. 10

    Xu, Z. & Mitzi, D. B. [CH3(CH2)11NH3]SnI3:- A Hybrid Semiconductor with MoO3-type Tin(II) Iodide Layers. Inorg. Chem. 42, 6589–6591 (2003).

    CAS  Article  PubMed  Google Scholar 

  11. 11

    Xu, Z., Mitzi, D. B. & Medeiros, D. R. [(CH3)3NCH2CH2NH3]SnI4:-A Layered Perovskite with Quaternary/Primary Ammonium Dications and Short Interlayer Iodine-Iodine Contacts. Inorg. Chem. 42, 1400–1402 (2003).

    CAS  Article  PubMed  Google Scholar 

  12. 12

    Huan, T. D., Tuoc, V. N. & Minh, N. V. Layered structures of organic/inorganic hybrid halide perovskites. Phys. Rev. B 93, 094105 (2016).

    Article  ADS  CAS  Google Scholar 

  13. 13

    Mannodi-Kanakkithodi, A. et al. Rational co-design of polymer dielectrics for energy storage. Adv. Mater. 28, 6277–6291 (2016).

    CAS  Article  PubMed  Google Scholar 

  14. 14

    Huan, T. D., Mannodi-Kanakkithodi, A. & Ramprasad, R. Accelerated materials property predictions and design using motif-based fingerprints. Phys. Rev. B 92, 014106 (2015).

    Article  ADS  CAS  Google Scholar 

  15. 15

    Mueller, T., Kusne, A. G. & Ramprasad, R. in Reviews in Computational Chemistry Vol. 29 (ed. Parrill A. L. & Lipkowitz K. B. ) Ch. 4 (John Wiley & Sons, Inc., 2016).

    Google Scholar 

  16. 16

    Mannodi-Kanakkithodi, A., Pilania, G., Ramprasad, R., Lookman, T. & Gubernatis, J. E. Multi-objective optimization techniques to design the Pareto front of organic dielectric polymers. Comput. Mater. Sci. 125, 92–99 (2016).

    CAS  Article  Google Scholar 

  17. 17

    Botu, V., Mhadeshwar, A. B., Suib, S. L. & Ramprasad, R. in Springer Series in Materials Science Vol. 225 (eds Lookman T., Alexander F. J. & Rajan K. ) Ch. 8. (Springer International Publishing, 2016).

    Google Scholar 

  18. 18

    Kim, C., Pilania, G. & Ramprasad, R. From organized high-throughput data to phenomenological theory: the example of dielectric breakdown. Chem. Mater. 28, 1304–1311 (2016).

    CAS  Article  Google Scholar 

  19. 19

    Pilania, G. et al. Machine learning bandgaps of double perovskites. Sci. Rep 6, 19375 (2016).

    CAS  Article  ADS  PubMed  PubMed Central  Google Scholar 

  20. 20

    Botu, V., Batra, R., Chapman, J. & Ramprasad, R. Machine learning force fields: construction, validation, and outlook. J. Phys. Chem. C 121, 511–522 (2017).

    CAS  Article  Google Scholar 

  21. 21

    Kim, C., Pilania, G. & Ramprasad, R. Machine learning assisted predictions of intrinsic dielectric breakdown strength of ABX3 perovskites. J. Phys. Chem. C 120, 14575–1458 (2016).

    CAS  Article  Google Scholar 

  22. 22

    Ghiringhelli, L. M., Vybiral, J., Levchenko, S. V., Draxl, C. & Scheffler, M. Big data of materials science: critical role of the descriptor. Phys. Rev. Lett. 114, 105503 (2015).

    Article  ADS  CAS  PubMed  Google Scholar 

  23. 23

    Botu, V., Chapman, J. & Ramprasad, R. A study of adatom ripening on an Al(111) surface with machine learning force fields. Comput. Mater. Sci. 129, 332–335 (2016).

    Article  CAS  Google Scholar 

  24. 24

    Botu, V. & Ramprasad, R. Learning scheme to predict atomic forces and accelerate materials simulations. Phys. Rev. B. 92, 094306 (2015).

    Article  ADS  CAS  Google Scholar 

  25. 25

    Botu, V. & Ramprasad, R. Adaptive machine learning framework to accelerate ab initio molecular dynamics. Int J Quantum Chem 115, 1074–1083 (2015).

    CAS  Article  Google Scholar 

  26. 26

    Giustino, F. & Snaith, H. J. Toward Lead-Free Perovskite Solar Cells. ACS Energy Lett 1, 1233–1240 (2016).

    CAS  Article  Google Scholar 

  27. 27

    Castelli, I. E., GarcÌa-Lastra, J. M., Thygesen, K. S. & Jacobsen, K. W. Bandgap calculations and trends of organometal halide perovskites. APL Materials 2, 081514 (2014).

    Article  ADS  CAS  Google Scholar 

  28. 28

    Becker, M., Kluner, T. & Wark, M. Formation of hybrid ABX3 perovskite compounds for solar cell application: First-principles calculations of effective ionic radii and determination of tolerance factors. Dalton Trans. 46, 3500–3509 (2017).

    CAS  Article  PubMed  Google Scholar 

  29. 29

    Hohenberg, P. & Kohn, W. Inhomogeneous electron gas. Phys. Rev. 136, B864–B871 (1964).

    Article  ADS  MathSciNet  Google Scholar 

  30. 30

    Kohn, W. & Sham, L. Self-consistent equations including exchange and correlation effects. Phys. Rev. 140, A1133–A1138 (1965).

    Article  ADS  MathSciNet  Google Scholar 

  31. 31

    Goedecker, S. Minima hopping: An efficient search method for the global minimum of the potential energy surface of complex molecular systems. J. Chem. Phys. 120, 9911–9917 (2004).

    CAS  Article  ADS  PubMed  Google Scholar 

  32. 32

    Amsler, M. & Goedecker, S. Crystal structure prediction using the minima hopping method. J. Chem. Phys. 133, 224104 (2010).

    Article  ADS  CAS  PubMed  Google Scholar 

  33. 33

    Ong, S. P. et al. Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis. Comput. Mater. Sci. 68, 314–319 (2013).

    CAS  Article  Google Scholar 

  34. 34

    Huan, T. D., Amsler, M., Tuoc, V. N., Willand, A. & Goedecker, S. Low-energy structures of zinc borohydride Zn(BH4)2 . Phys. Rev. B 86, 224110 (2012).

    Article  ADS  CAS  Google Scholar 

  35. 35

    Huan, T. D. et al. A polymer dataset for accelerated property prediction and design. Sci. Data 3, 160012 (2016).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  36. 36

    Baldwin, A. F. et al. Rational design of organotin polyesters. Macromolecules 48, 2422–2428 (2015).

    CAS  Article  ADS  Google Scholar 

  37. 37

    Albero, J., Asiri, A. M. & Garcia, H. Influence of the composition of hybrid perovskites on their performance in solar cells. J. Mater. Chem. A 4, 4353–4364 (2016).

    CAS  Article  Google Scholar 

  38. 38

    Blöchl, P. E. Projector augmented-wave method. Phys. Rev. B 50, 17953–17979 (1994).

    Article  ADS  Google Scholar 

  39. 39

    Kresse, G. & Hafner, J. Ab initio molecular dynamics for liquid metals. Phys. Rev. B 47, 558–561 (1993).

    CAS  Article  ADS  Google Scholar 

  40. 40

    Kresse, G. Ab initio Molekular Dynamik für flüssige Metalle. Ph.D. thesis Technische Universität Wien, (1993).

  41. 41

    Kresse, G. & Furthmüller, J. Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. Comput. Mater. Sci. 6, 15–50 (1996).

    CAS  Article  Google Scholar 

  42. 42

    Kresse, G. & Furthmüller, J. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys. Rev. B 54, 11169–11186 (1996).

    CAS  Article  ADS  Google Scholar 

  43. 43

    Lee, K., Murray, É. D., Kong, L., Lundqvist, B. I. & Langreth, D. C. Higher-accuracy van der Waals density functional. Phys. Rev. B 82, 081101(R) (2010).

    Article  ADS  CAS  Google Scholar 

  44. 44

    Murray, E. D., Lee, K. & Langreth, D. C. Investigation of exchange energy density functional accuracy for interacting molecules. J. Chem. Theor. Comput 5, 2754–2762 (2009).

    CAS  Article  Google Scholar 

  45. 45

    Monkhorst, H. J. & Pack, J. D. Special points for Brillouin-zone integrations. Phys. Rev. B 13, 5188–5192 (1976).

    Article  ADS  MathSciNet  Google Scholar 

  46. Heyd, J., Scuseria, G. E. & Ernzerhof, M. Hybrid functionals based on a screened Coulomb potential. J. Chem. Phys. 118, 8207–8215 (2003).

  47. Krukau, A. V., Vydrov, O. A., Izmaylov, A. F. & Scuseria, G. E. Influence of the exchange screening parameter on the performance of screened hybrid functionals. J. Chem. Phys. 125, 224106 (2006).

  48. Perdew, J. P. Density functional theory and the band gap problem. Int. J. Quant. Chem. 28, 497–523 (1985).

  49. Ramprasad, R., Glassford, K. M., Adams, J. B. & Masel, R. I. CO on Pd(110): determination of the optimal adsorption site. Surf. Sci. 360, 31–42 (1996).

  50. He, Y. & Galli, G. Perovskites for Solar Thermoelectric Applications: A First Principle Study of CH3NH3AI3 (A=Pb and Sn). Chem. Mater. 26, 5394–5400 (2014).

  51. Wang, C. et al. Computational strategies for polymer dielectrics design. Polymer 55, 979–988 (2014).

  52. Wang, C. & Ramprasad, R. Novel hybrid polymer dielectrics based on group 14 chemical motifs. Int. J. Hi. Spe. Ele. Syst. 23, 1420002 (2014).

  53. Baldwin, A. F. et al. Poly(dimethyltin glutarate) as a prospective material for high dielectric applications. Adv. Mater. 27, 346–351 (2015).

  54. Baldwin, A. F. et al. Effect of incorporating aromatic and chiral groups on the dielectric properties of poly(dimethyltin esters). Macromol. Rapid Commun. 35, 2082–2088 (2014).

  55. Ma, R. et al. Rationally designed polyimides for high-energy density capacitor applications. ACS Appl. Mater. Interfaces 6, 10445–10451 (2014).

  56. Sharma, V. et al. Rational design of all organic polymer dielectrics. Nat. Comm. 5, 4845 (2014).

  57. Mannodi-Kanakkithodi, A., Wang, C. C. & Ramprasad, R. Compounds based on Group 14 elements: building blocks for advanced insulator dielectrics design. J. Mater. Sci. 50, 801–807 (2015).

  58. Ma, R. et al. Rational design and synthesis of polythioureas as capacitor dielectrics. J. Mater. Chem. A 3, 14845–14852 (2015).

  59. Mannodi-Kanakkithodi, A., Pilania, G., Huan, T. D., Lookman, T. & Ramprasad, R. Machine learning strategy for the accelerated design of polymer dielectrics. Sci. Rep. 6, 20952 (2016).

  60. Huan, T. D. et al. Advanced polymeric dielectrics for high energy density applications. Prog. Mater. Sci. 83, 236–269 (2016).

  61. Mannodi-Kanakkithodi, A., Pilania, G. & Ramprasad, R. Critical assessment of regression-based machine learning methods for polymer dielectrics. Comput. Mater. Sci. 125, 123–135 (2016).

  62. Zhu, H., Tang, C., Fonseca, L. R. C. & Ramprasad, R. Recent progress in ab initio simulations of hafnia-based gate stacks. J. Mater. Sci. 47, 7399–7416 (2012).

  63. Towns, J. et al. XSEDE: accelerating scientific discovery. Comput. Sci. Engin. 16, 62–74 (2014).

  64. Hirasawa, M., Ishihara, T. & Goto, T. Exciton features in 0-, 2-, and 3-dimensional networks of [PbI6]4- octahedra. J. Phys. Soc. Jpn. 63, 3870–3879 (1994).

  65. Baikie, T. et al. A combined single crystal neutron/X-ray diffraction and solid-state nuclear magnetic resonance study of the hybrid perovskites CH3NH3PbX3 (X=I, Br and Cl). J. Mater. Chem. A 3, 9298–9307 (2015).

  66. Geng, W., Zhang, L., Zhang, Y.-N., Lau, W.-M. & Liu, L.-M. First-principles study of lead iodide perovskite tetragonal and orthorhombic phases for photovoltaics. J. Phys. Chem. C 118, 19565–19571 (2014).

  67. Kitazawa, N., Watanabe, Y. & Nakamura, Y. Optical properties of CH3NH3PbX3 (X=halogen) and their mixed-halide crystals. J. Mater. Sci. 37, 3585–3587 (2002).

  68. El-Mellouhi, F. et al. Hydrogen bonding and stability of hybrid organic-inorganic perovskites. ChemSusChem 9, 2648–2655 (2016).

  69. Bernal, C. & Yang, K. First-principles hybrid functional study of the organic-inorganic perovskites CH3NH3SnBr3 and CH3NH3SnI3. J. Phys. Chem. C 118, 24383–24388 (2014).

  70. Papavassiliou, G. & Koutselas, I. Structural, optical and related properties of some natural three- and lower-dimensional semiconductor systems. Synth. Met. 71, 1713–1714 (1995).

  71. Eperon, G. E. et al. Formamidinium lead trihalide: a broadly tunable perovskite for efficient planar heterojunction solar cells. Energy Environ. Sci. 7, 982–988 (2014).

  72. Ma, Z.-Q., Pan, H. & Wong, P. K. A first-principles study on the structural and electronic properties of Sn-based organic-inorganic halide perovskites. J. Electron. Mater. 45, 5956–5966 (2016).

  73. Stoumpos, C. C. et al. Hybrid germanium iodide perovskite semiconductors: active lone pairs, structural distortions, direct and indirect energy gaps, and strong nonlinear optical properties. J. Am. Chem. Soc. 137, 6804–6819 (2015).

  74. Feng, J. & Xiao, B. Effective Masses and Electronic and Optical Properties of Nontoxic MASnX3 (X=Cl, Br, and I) Perovskite Structures as Solar Cell Absorber: A Theoretical Study Using HSE06. J. Phys. Chem. C 118, 19655–19660 (2014).

  75. Ju, M.-G., Sun, G., Zhao, Y. & Liang, W. A computational view of the change in the geometric and electronic properties of perovskites caused by the partial substitution of Pb by Sn. Phys. Chem. Chem. Phys. 17, 17679–17687 (2015).

  76. Hirasawa, M., Ishihara, T., Goto, T., Uchida, K. & Miura, N. Magnetoabsorption of the lowest exciton in perovskite-type compound (CH3NH3)PbI3. Phys. B 201, 427–430 (1994).

  77. Frost, J. M., Butler, K. T. & Walsh, A. Molecular ferroelectric contributions to anomalous hysteresis in hybrid perovskite solar cells. APL Mater. 2, 081506 (2014).

  78. Onoda-Yamamuro, N., Matsuo, T. & Suga, H. Dielectric study of CH3NH3PbX3 (X=Cl, Br, I). J. Phys. Chem. Solids 53, 935–939 (1992).

  79. Dong, Q. et al. Electron-hole diffusion lengths > 175 μm in solution-grown CH3NH3PbI3 single crystals. Science 347, 967–970 (2015).

  80. Poglitsch, A. & Weber, D. Dynamic disorder in methylammoniumtrihalogenoplumbates (II) observed by millimeter-wave spectroscopy. J. Chem. Phys. 87, 6373–6378 (1987).

  81. Umari, P. & Mosconi, E. Relativistic GW calculations on CH3NH3PbI3 and CH3NH3SnI3 Perovskites for Solar Cell Applications. Sci. Rep. 4, 4467 (2014).

  82. Brivio, F., Walker, A. B. & Walsh, A. Structural and electronic properties of hybrid perovskites for high-efficiency thin-film photovoltaics from first-principles. APL Mater. 1, 042111 (2013).

  83. Bokdam, M. et al. Role of Polar Phonons in the Photo Excited State of Metal Halide Perovskites. Sci. Rep. 6, 28618 (2016).

  84. Koh, T. M. et al. Formamidinium-containing metal-halide: an alternative material for near-IR absorption perovskite solar cells. J. Phys. Chem. C 118, 16458–16462 (2014).

  85. Dang, Y. et al. Formation of hybrid perovskite tin iodide single crystals by top-seeded solution growth. Angew. Chem. Int. Ed. 55, 3447–3450 (2016).

Data Citations

  1. Kim, C., Huan, T. D., Krishnan, S., & Ramprasad, R. Dryad Digital Repository http://dx.doi.org/10.5061/dryad.gq3rg (2017)

  2. Kim, C., Huan, T. D., Krishnan, S., & Ramprasad, R. NoMaD Repository http://dx.doi.org/10.17172/NOMAD/2017.03.15-1 (2017)


Acknowledgements

Computational work was made possible through XSEDE (ref. 63) computational resource allocation number TG-DMR080058N. The authors thank Stefan Goedecker and Max Amsler for the minima-hopping code.

Author information




C.K. and T.D.H. contributed equally to the work and manuscript. R.R. designed and supervised this project. All authors discussed the results and wrote and shaped the manuscript. The DFT computations were performed by C.K., S.K., and T.D.H. The data repository (Khazana) was designed and maintained by C.K.

Corresponding author

Correspondence to Rampi Ramprasad.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0

Metadata associated with this Data Descriptor is available at http://www.nature.com/sdata/ and is released under the CC0 waiver to maximize reuse.

About this article

Cite this article

Kim, C., Huan, T., Krishnan, S. et al. A hybrid organic-inorganic perovskite dataset. Sci Data 4, 170057 (2017). https://doi.org/10.1038/sdata.2017.57
