A polymer dataset for accelerated property prediction and design


Emerging computation- and data-driven approaches are particularly useful for rationally designing materials with targeted properties. Generally, these approaches rely on identifying structure-property relationships by learning from a dataset containing a sufficiently large number of relevant materials. The learned information can then be used to predict the properties of materials not already in the dataset, thus accelerating materials design. Herein, we develop a dataset of 1,073 polymers and related materials and make it available online. This dataset is uniformly prepared using first-principles calculations, with structures obtained either from other sources or by using structure search methods. Because the immediate target of this work is to assist the design of high dielectric constant polymers, the dataset is initially designed to include the optimized structures, atomization energies, band gaps, and dielectric constants. It will be progressively expanded by accumulating new materials and including additional properties calculated for the optimized structures provided.

Design Type(s): database creation objective • Polymer Chemistry • 3D structure prediction
Measurement Type(s): material properties
Technology Type(s): computational modeling technique
Factor Type(s): chemical compound

Machine-accessible metadata file describing the reported data (ISA-Tab format)

Background & Summary

A central tenet of data-driven materials discovery is that if the volume of accumulated or available data is sufficiently large, and if it can be mined properly with suitable data-driven techniques, the process of designing a new material could be more efficient and rational (refs 1–11). This notion has led to the development of many useful materials databases (refs 12–18). The present contribution deals with polymeric materials. Given the complexity of the chemical and configurational/morphological space of polymeric materials, the creation of a database focusing on this materials class is challenging. Nevertheless, if systematic steps can be taken in this direction, consistent with the charter of the Materials Genome Initiative, we will progressively get closer to the rational design and discovery of application-specific polymers.

Within this context, it is worth noting that the recent rational development of nearly a hundred novel polymeric dielectrics for capacitive or electrostatic energy storage (refs 19–26) has benefitted from the synergy between experimental and computational efforts, in which computations at various levels, including force fields (refs 27–30) and density functional theory (DFT; refs 31,32), have provided critical guidance. Given a polymer chemical composition, the computational step mainly involves predicting the lowest-energy structures and computing the associated dielectric constant ε and band gap Eg. Those with high ε and high Eg were then identified, leading to the experimental realization of polymers with desired performance characteristics such as high energy density and low loss (refs 19–26).

This contribution describes a dataset of 1,073 polymers and related materials as the first step towards the rational design of polymers by data-driven approaches. The dataset reported herein, referred to as the “polymer dataset” for convenience, was prepared at a uniform and consistent level of first-principles DFT computations. Since our initial goal is to assist the design of high dielectric constant polymers for energy storage, the polymer dataset supplies the equilibrium (relaxed) structures of the materials together with relevant calculated properties, including the atomization energy εat, the dielectric constant ε, and the energy band gap Eg. The initial structures used for the preparation were collected either from other available sources or, quite often, from computational structure searches. This dataset, which is available online, can readily be expanded in multiple ways: new properties can be calculated from the provided equilibrium structures, and new materials with relevant calculated properties can be progressively added. Furthermore, it may also serve as a playground for data mining.

Workflow



The workflow in Fig. 1 summarizes the preparation of the polymer dataset. In the first step, crystal structures of polymers and related compounds were collected from various available sources, including the reported literature, the Crystallography Open Database (COD; ref. 15), and our structure prediction works (refs 20–26). Those obtained from structure prediction runs were subjected to a preliminary filter (described below), removing any obvious redundancy of identical structures. Then, the selected structures were optimized by DFT calculations, yielding the equilibrium structures and their atomization energies εat. The energy band gap Eg was then calculated on a dense grid of k points, while the dielectric constant ε, which is composed of an electronic part εelec and an ionic part εion, was computed within the framework of density functional perturbation theory (DFPT; ref. 33). In the next step, the computational scheme and the calculated results were validated against available measured data, including the measured band gap Eg, the dielectric constant ε, and/or infrared (IR) spectroscopy measurements. Entries which did not agree with the available experimental data were subjected to further calculations with tighter convergence criteria for the residual atomic forces (see Technical Validation for more details), and if better agreement was not reached, these points were removed from the dataset. A post-filtering step was finally performed on the whole dataset, keeping only distinct data points. Relaxed structures of all the materials were finally converted into the crystallographic information file (cif) format using the pymatgen library (ref. 34). A note is also provided together with the dataset, indicating the convergence criteria of the data points reported herein.

Figure 1: Scheme for preparing the dataset of polymers and related materials.

USPEX and minima-hopping are two structure prediction methods that were used for generating a majority of the dataset.

Structure accumulation

Our dataset includes three primary subsets, each originating from a distinct source. Subset 1 consists of common polymers which have already been synthesized, resolved, and reported elsewhere. This set contains 34 polymers, listed in Table 1. Collecting polymer structures of this class is challenging because the reported data is widely scattered, and even when the information obtained is sufficient to reconstruct a structure, the reconstruction has to be done manually and is therefore laborious. We further note that measurements of the band gap, dielectric constant, and/or infrared (IR) spectrum have been performed for only a few of them. This data was used for the validation step.

Table 1 List of the common polymers summarized in this dataset and the corresponding references.

Subset 2 includes 314 new organic polymers (284 of which have been used in ref. 11) and 472 new organometallic polymers. Their structures were generated with a computation-driven strategy (refs 19,20) which has been used to rationally design various classes of polymeric dielectrics (refs 11,20–26). The starting point of this strategy is a pool of common polymer building blocks, which are either organic, e.g., –CH2–, –NH–, –CO–, –O–, –CS–, –C6H4–, and –C4H2S–, or inorganic (metal-containing) like –COO–Sn(CH3)2–OCC–, –SnF2–, and –SnCl2–. The repeat unit of an organic polymer is then created by concatenating a given number of organic building blocks, while that of an organometallic polymer contains at least one inorganic block linked with a chain of several CH2 groups. Next, chains of the repeat units (illustrated in Fig. 2) are packed in low-energy crystal structures which are determined by Universal Structure Predictor: Evolutionary Xtallography (USPEX; refs 23,35) or minima-hopping (MH; refs 36,37), two of the most powerful structure prediction methods currently available. In brief, these methods predict the low-energy structures of a material as the local minima of the potential energy surface constructed from DFT energies. The efficiency of these methods has been successfully demonstrated for many different materials classes (refs 38–41), including a large number of organic (refs 20,23) and organometallic polymers (refs 24–26).

Figure 2: Organic polymer chains with repeat units of –NH–CH2–CH2–O– (a) and –C6H4–O–C6H4–CO– (b) and organometallic polymer chains with repeat units of –SnCl2–(CH2)2– (c) and –OOC–Sn–(CH3)2–COO–(CH2)4– (d).

Carbon, hydrogen, oxygen, nitrogen, chlorine, and tin atoms are shown in dark brown, light pink, red, light cyan, green, and dark cyan, respectively.

For each structure prediction run, the lowest-energy structure and those within 200 meV per atom above it were collected. The number of structures within this energy window is material-dependent, ranging from several to several dozen. Because many of them differ only by small perturbations in the atomic arrangement, a preliminary filtering step was used to remove this redundancy. In particular, we used a hierarchical clustering algorithm to group those which differ by less than 5 meV per atom in εat and less than 0.1 eV in Eg, keeping the representative structures. Only those with polymeric motifs, confirmed visually, were selected for the next steps. In the predicted polymer structures, especially for those of organometallic polymers, these polymeric chains are not necessarily isolated, i.e., inter-chain bonds may occur in various fashions (refs 24–26).
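The grouping criterion described above can be illustrated with a minimal sketch. It assumes a single-linkage reading of the hierarchical clustering (two structures are linked when both differences fall below the stated thresholds, and groups are the connected components of those links) and assumes that the lowest-energy member is kept as the representative. The function names and the sample values are hypothetical and are not taken from the authors' code or data.

```python
from typing import Dict, List, Sequence, Tuple


def group_similar(
    records: Sequence[Tuple[float, float]],
    d_energy: float = 0.005,  # eV per atom (5 meV per atom, from the text)
    d_gap: float = 0.1,       # eV (from the text)
) -> List[List[int]]:
    """Single-linkage grouping of (atomization energy, band gap) pairs.

    Two structures are linked when BOTH differences are below their
    thresholds; the groups are the connected components of those links.
    """
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Union-find lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            close_energy = abs(records[i][0] - records[j][0]) < d_energy
            close_gap = abs(records[i][1] - records[j][1]) < d_gap
            if close_energy and close_gap:
                parent[find(j)] = find(i)

    groups: Dict[int, List[int]] = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def representatives(records: Sequence[Tuple[float, float]]) -> List[int]:
    """Index of the lowest-energy member of each group (an assumed choice)."""
    return [min(g, key=lambda i: records[i][0]) for g in group_similar(records)]


# Hypothetical (energy in eV/atom, gap in eV) values for four candidate structures.
candidates = [(-6.464, 2.05), (-6.462, 2.08), (-6.400, 2.05), (-6.463, 3.00)]
kept = representatives(candidates)
```

In this toy input only the first two candidates satisfy both thresholds at once, so they collapse into one group while the other two remain distinct.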

The material structures used to prepare subset 3 were collected from COD. Generally, materials provided by COD are not polymers, but a number of them are included in this dataset because they are closely related to the examined polymers. Although collecting materials structures from this database is straightforward, we limited ourselves to those whose cell volumes are not too large, i.e., roughly 1,500 Å3 and below. This subset contains 253 molecular organic and organometallic crystals, 178 of which have recently been used in ref. 10 by some of us.

Table 2 summarizes the contents of the polymer dataset, which contains both polymers (subsets 1 and 2) and non-polymers (subset 3). In terms of chemistry, the included materials can be classified as either organic or organometallic, the latter incorporating different metals in their backbones. The complete list of chemical elements that appear in this dataset is given in Table 3.

Table 2 Summary of the data subclasses in the polymer dataset.
Table 3 VASP PAW potentials of the elements used for calculations in this work.

Numerical calculations

The computed data reported in our dataset was prepared with density functional theory (DFT; refs 31,32) calculations, using the projector augmented-wave (PAW) formalism (ref. 42) as implemented in the Vienna Ab initio Simulation Package (vasp; refs 43–46). The default accuracy level of our calculations is “Accurate”, specified by setting PREC=Accurate in all the runs with vasp. The basis set includes all the plane waves with kinetic energies up to 400 eV, as recommended by the vasp manual for this level of accuracy. PAW datasets of version 5.2, which were used to describe the ion-electron interactions, are also summarized in Table 3. The van der Waals dispersion interactions, known (ref. 47) to be important in stabilizing soft materials dominated by non-bonding interactions, such as polymers (ref. 48), were estimated with the non-local density functional vdW-DF2 (ref. 49). The generalized gradient approximation (GGA) functional associated with vdW-DF2, i.e., refitted Perdew-Wang 86 (rPW86; ref. 50), was used for the exchange-correlation (XC) energies.
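As a rough illustration, the settings named in this paragraph might be written as INCAR-style tags along the following lines. Only PREC and ENCUT are stated in the text. The vdW-DF2 tags below are an assumption based on the conventions of older vasp releases and should be checked against the documentation of the release in use. The helper `render_incar` is hypothetical, and none of this reproduces the authors' actual input files.

```python
from typing import Dict, Union

TagValue = Union[str, int, float]

# Settings stated explicitly in the text.
SETTINGS_FROM_TEXT: Dict[str, TagValue] = {
    "PREC": "Accurate",  # accuracy level used in all runs
    "ENCUT": 400,        # plane-wave kinetic energy cutoff in eV
}

# ASSUMPTION: tags commonly used to select vdW-DF2 with rPW86 exchange
# in older vasp releases; not given in the text.
VDW_DF2_ASSUMED: Dict[str, TagValue] = {
    "GGA": "ML",
    "LUSE_VDW": ".TRUE.",
    "ZAB_VDW": -1.8867,
    "AGGAC": 0.0,
}


def render_incar(*blocks: Dict[str, TagValue]) -> str:
    """Render dictionaries of tags as 'TAG = value' lines (hypothetical helper)."""
    lines = []
    for block in blocks:
        for tag, value in block.items():
            lines.append(f"{tag} = {value}")
    return "\n".join(lines)


incar_text = render_incar(SETTINGS_FROM_TEXT, VDW_DF2_ASSUMED)
```

Keeping the stated and the assumed tags in separate dictionaries makes it easy to see which values come from the paper and which would need to be verified.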

Because the examined material structures differ significantly in cell shape, the sampling of their Brillouin zones must be handled appropriately. For each structure, a Monkhorst-Pack k-point mesh (ref. 51) with a given spacing parameter hk in reciprocal space was used. For the geometry optimization and dielectric constant calculations, hk=0.25 Å−1, while the band gap calculations were performed on a finer Γ-centered mesh with hk=0.20 Å−1. We further set a lower limit for the Monkhorst-Pack mesh dimensions, that is, the number of grid points along any reciprocal axis is no less than 3, regardless of how short the reciprocal lattice vector along this axis is.
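The mesh rule in this paragraph can be sketched as follows. The text gives the spacing hk and the lower limit of 3 points per axis; the rounding rule (ceiling) and the convention that reciprocal vectors include the factor 2π are assumptions made here for illustration, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]


def _cross(a: Vector, b: Vector) -> Tuple[float, float, float]:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def reciprocal_lengths(a1: Vector, a2: Vector, a3: Vector) -> Tuple[float, ...]:
    """Lengths |b_i| of the reciprocal lattice vectors (2*pi convention assumed)."""
    volume = abs(_dot(a1, _cross(a2, a3)))
    lengths = []
    for u, v in ((a2, a3), (a3, a1), (a1, a2)):
        b = _cross(u, v)
        lengths.append(2.0 * math.pi * math.sqrt(_dot(b, b)) / volume)
    return tuple(lengths)


def kpoint_mesh(lattice: Sequence[Vector], spacing: float, n_min: int = 3) -> Tuple[int, ...]:
    """Grid points per reciprocal axis: ceil(|b_i| / spacing), but never below n_min."""
    return tuple(
        max(n_min, math.ceil(b / spacing)) for b in reciprocal_lengths(*lattice)
    )


# Hypothetical orthorhombic cell with edges of 5, 10 and 40 angstrom.
cell = [(5.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 40.0)]
relax_mesh = kpoint_mesh(cell, 0.25)  # spacing used for relaxation and DFPT
gap_mesh = kpoint_mesh(cell, 0.20)    # finer spacing used for band gaps
```

For the long 40 Å axis the spacing alone would give a single k point, so the lower limit of 3 is what determines the mesh along that direction.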

During the relaxation step, we optimized both the cell and the atomic degrees of freedom of the materials structures until the atomic forces were smaller than 0.01 eV Å−1. Calculations of the band gap Eg were then carried out on the equilibrium structures. Because Eg is typically underestimated with a GGA XC functional like rPW86 (ref. 52), this important physical property was also calculated with the hybrid Heyd-Scuseria-Ernzerhof (HSE06) XC functional (refs 53,54), with the expectation that the calculated result would be much closer to the true band gap of the material. Both EgGGA and EgHSE06, the band gaps calculated at the GGA-rPW86 and HSE06 levels of theory, are provided in all the entries of the dataset (see File format for more details). Finally, the dielectric constant ε of these structures was calculated within the DFPT formalism as implemented in the vasp package. Calculations of this type involve the determination of the lattice vibrational spectra at Γ, the center of the Brillouin zone. This information is also used to compute the IR spectra of some structures for the purpose of validation.


Given that the sources of the polymer dataset reported herein are diversified, any clear duplicate and/or redundancy should be identified and removed. Because the preliminary filtering step was performed only on subset 2 based on their DFT energy and band gap estimated during the structure prediction runs with a limited accuracy, an additional filtering step was performed on the whole dataset. Within this step, all cases with the same chemical composition but different by less than 0.1 eV in Eg, less than 5 meV per atom in εat, and less than 0.1 in both εelec and εion, are clustered. At this point, the number of clustered points is not large, and all of them were inspected visually, keeping only distinct materials.

Data Records

The complete dataset of 1,073 polymers and related materials can be downloaded as a tarball from the Dryad Repository (Data Citation 1) or accessed online (all the records with IDs from 0001 to 1073). All 4,292 DFT runs of the entire dataset (four runs for each structure: relaxation, dielectric constant, GGA band gap, and HSE06 band gap) are hosted by the NoMaD Repository (Data Citation 2).

File format

All the information reported in the dataset for a given material is stored in a single file with a name such as 0001.cif, where a cardinal number (0001 in this example) identifies the entry in the dataset. The first part of a file of this type is devoted to the optimized structure in the standard cif format, which is compatible with the majority of visualization software. Other information, including the calculated properties, is provided as comment lines in the second part of the file, as follows:

# Source: VSharma_etal:NatCommun.5.4845(2014)
# Class: organic_polymer_crystal
# Label: Polyimide
# Structure prediction method used: USPEX
# Number of atoms: 32
# Number of atom types: 4
# Atom types: C H O N
# Dielectric constant, electronic: 3.71475E+00
# Dielectric constant, ionic: 1.54812E+00
# Dielectric constant, total: 5.26287E+00
# Band gap at the GGA level (eV): 2.05350E+00
# Band gap at the HSE06 level (eV): 3.30140E+00
# Atomization energy (eV/atom): -6.46371E+00
# Volume of the unit cell (A^3): 2.79303E+02

While most of the keywords are clear, we used Source to provide the origin of the material structure and Class to refer to the class of materials, which can be “organic polymer crystal”, “organometallic polymer crystal”, “organic molecular crystal”, or “organometallic molecular crystal”. The keyword Label was used to provide more detailed information on the material, which can be the common name of the material if it is available, the ID of the record obtained from COD, or the repeat unit of the polymer structure predicted.
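The property comments shown above are simple to read programmatically. The following is a minimal sketch, assuming that every property line has the form `# Keyword: value` exactly as in the example entry; the function name `parse_property_comments` and the choice of which keywords stay as text are hypothetical. Other comment lines that happen to contain a colon would also be picked up, so a real reader may need stricter matching.

```python
from typing import Dict, Union

# Keywords whose values are kept as text (a Label may be a COD record ID,
# so it is deliberately not converted to a number).
TEXT_KEYS = {
    "Source",
    "Class",
    "Label",
    "Structure prediction method used",
    "Atom types",
}


def parse_property_comments(text: str) -> Dict[str, Union[str, int, float]]:
    """Collect '# Keyword: value' comment lines into a dictionary."""
    props: Dict[str, Union[str, int, float]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("#") or ":" not in line:
            continue
        # Split at the FIRST colon only; the Source value itself contains one.
        key, _, raw = line[1:].partition(":")
        key, raw = key.strip(), raw.strip()
        if key in TEXT_KEYS:
            props[key] = raw
            continue
        try:
            props[key] = int(raw)
        except ValueError:
            try:
                props[key] = float(raw)
            except ValueError:
                props[key] = raw
    return props


SAMPLE = """\
# Source: VSharma_etal:NatCommun.5.4845(2014)
# Class: organic_polymer_crystal
# Label: Polyimide
# Number of atoms: 32
# Dielectric constant, electronic: 3.71475E+00
# Dielectric constant, ionic: 1.54812E+00
# Dielectric constant, total: 5.26287E+00
# Band gap at the HSE06 level (eV): 3.30140E+00
"""

entry = parse_property_comments(SAMPLE)
```

A quick consistency check on a parsed entry is that the electronic and ionic parts add up to the reported total dielectric constant.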

Graphical summary of the dataset

To graphically summarize the polymer dataset, we visualize it in the property space. Because the band gap and the dielectric constant are the primary properties reported by this dataset, three plots, namely EgHSE06 versus εelec, EgHSE06 versus εion, and EgHSE06 versus ε, were compiled and are shown in Fig. 3. Materials from different classes are shown in different colors to clarify the role of the polymer chemical composition in controlling Eg and ε. Within the recent effort of developing polymers for high-energy-density applications (refs 19–26), such plots are useful for identifying promising candidates, i.e., those which have a high dielectric constant while maintaining a sufficient band gap (Eg≥3 eV).
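The screening criterion mentioned here can be expressed as a short filter. This is only a sketch: the 3 eV threshold comes from the text, while ranking by the total dielectric constant, the field names, and the two placeholder records are assumptions made for illustration. Only the polyimide values are taken from the example entry in the File format section.

```python
from typing import Dict, List, Union

Record = Dict[str, Union[str, float]]


def promising(records: List[Record], min_gap: float = 3.0) -> List[Record]:
    """Keep entries with an HSE06 gap of at least min_gap, highest dielectric constant first."""
    kept = [r for r in records if r["gap_hse06"] >= min_gap]
    return sorted(kept, key=lambda r: r["eps_total"], reverse=True)


records: List[Record] = [
    # Values from the example entry shown in the File format section.
    {"label": "Polyimide", "gap_hse06": 3.30140, "eps_total": 5.26287},
    # HYPOTHETICAL placeholders, invented only to exercise the filter.
    {"label": "placeholder_A", "gap_hse06": 2.1, "eps_total": 9.0},
    {"label": "placeholder_B", "gap_hse06": 4.5, "eps_total": 3.2},
]

shortlist = [r["label"] for r in promising(records)]
```

The placeholder with the highest dielectric constant is dropped because its band gap falls below the threshold, which is the trade-off that the property-space plots are meant to expose.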

Figure 3: A summary of the polymer dataset based on the calculated band gap EgHSE06 and the dielectric constants εelec (a), εion (b), and ε=εelec+εion (c).

In the figure keys, “CM”, “P”, “NP”, and “O” refer to “Common”, “Polymer”, “Non-Polymer”, and “Organic”, respectively. For organometallic polymers, the identity of the metal element included is used. The polymers developed by the structure prediction based pathway in refs 19–26 are labeled as “Dev-P”.

Figure 3a clearly indicates a limit of the form εelec ∼ 1/Eg relating εelec and Eg, which is applicable to both the organic and organometallic classes of materials. We note that this behavior has also been reported elsewhere (refs 10,19). Figure 3c, on the other hand, demonstrates that the classes of organic and organometallic polymers and molecular crystals occupy different regions in the property space. At a given value of the band gap, the organometallic polymers generally have much higher dielectric constants than the organic polymers. While a fairly large number of organometallic polymers have already been developed (refs 24–26), this observation suggests that there remains significant room for manipulating the dielectric constant of the organometallic polymers.

Technical Validation

Among the materials properties reported in the present dataset, the atomization energy Eat is physically relevant and has routinely been used as a standard measure for examining the thermodynamic stability of various classes of materials, including inorganic crystals (refs 38–41) and polymers (refs 19–26). While the band gap EgGGA calculated at the GGA level of DFT cannot be compared directly with measured data because of the aforementioned well-known underestimation (ref. 52), EgHSE06 (the band gap calculated with the HSE06 XC functional) is expected to be rather close to the true band gap. We show in Fig. 4a EgHSE06 of 11 polymers for which the band gap has been measured experimentally. The calculated band gaps agree well with the measured data, with a numerical discrepancy of about 20% or less.

Figure 4: Calculated and measured dielectric constants of (a) several inorganic compounds, and (b) the polymers reported in refs 20–22 (new organic polymers) and refs 24–26 (poly(tin ester)).

The error bars originate from different (energetically competing) structures predicted for a given polymer. For organometallic polymers, the error bars are significant due to the diversity of structural motifs involving the aforementioned inter-chain bonds, which are not present in organic polymers. In (c), (d), (e), and (f), the simulated and measured infrared spectra of orthorhombic polyethylene, orthorhombic polyoxymethylene, poly(dimethyltin glutarate), and polythiourea are shown. The experimental data of these four polymers was taken from refs 60, 55, 24, and 20, respectively. Shaded areas indicate the agreement between simulated and measured transmittance peaks.

We now consider the calculations of the dielectric constants, namely εelec and εion. Overall, the theoretical foundations and the implementations for calculating εelec and εion are well developed and tested, leading to rather accurate results. Within the DFT-based perturbative approach, εelec is computed via the response to external field perturbations, while εion is evaluated through the phonon frequencies at the Γ point of the Brillouin zone. To be precise, the dielectric response of a crystalline insulator to an external electric field E is given in terms of a frequency-dependent tensor εαβ(ω). To linear order, the electronic contribution to the dielectric tensor is given by

(1) $$\varepsilon^{\mathrm{elec}}_{\alpha\beta}(\omega) = 1 + 4\pi \frac{\partial P_\alpha}{\partial E_\beta},$$

where Pα is the component along the α direction of the induced polarization P. On the other hand, the ionic part of the dielectric tensor is determined as

(2) $$\varepsilon^{\mathrm{ion}}_{\alpha\beta}(\omega) = \frac{4\pi}{\Omega} \sum_{m} \frac{S_{m,\alpha\beta}}{\omega_{m,\mathbf{q}=0}^{2} - \omega^{2}}.$$

In this expression, Ω is the volume of the simulation cell, appearing as a normalization factor. The sum is taken over the index m of the phonon normal modes, each of which has the frequency ωm,q=0 at the Brillouin zone center (q=0), while the mode oscillator strength Sm,αβ is determined through the Born effective charges Z*s,αβ of the atoms s. For an isotropic material, the dielectric constant of practical interest is taken to be the average value of the diagonal elements at the static limit, i.e., $\varepsilon = \frac{1}{3}\sum_{\alpha} \varepsilon_{\alpha\alpha}(\omega \to 0)$.
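To make Equation 2 and the isotropic average concrete, the following sketch evaluates the ionic tensor from a list of mode oscillator strengths and frequencies. It assumes that all quantities are supplied in one consistent unit system in which the prefactor 4π/Ω applies as written, and that zero-frequency (acoustic) modes have already been excluded. The function names and the single-mode test values are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def ionic_dielectric(
    strengths: Sequence[Matrix],  # one 3x3 oscillator-strength tensor S_m per mode
    frequencies: Sequence[float],  # zone-center frequencies omega_m (nonzero)
    volume: float,                 # cell volume Omega
    omega: float = 0.0,            # probing frequency; 0.0 is the static limit
) -> List[List[float]]:
    """Sum (4*pi/Omega) * S_m / (omega_m**2 - omega**2) over the modes m."""
    eps = [[0.0] * 3 for _ in range(3)]
    prefactor = 4.0 * math.pi / volume
    for s_m, w_m in zip(strengths, frequencies):
        denominator = w_m ** 2 - omega ** 2
        for a in range(3):
            for b in range(3):
                eps[a][b] += prefactor * s_m[a][b] / denominator
    return eps


def isotropic_average(tensor: Matrix) -> float:
    """One third of the trace of a 3x3 tensor."""
    return sum(tensor[a][a] for a in range(3)) / 3.0


# Hypothetical single mode, chosen so that the prefactor equals one.
S = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
eps_ion = ionic_dielectric([S], [2.0], volume=4.0 * math.pi)
eps_ion_iso = isotropic_average(eps_ion)
```

The total static dielectric constant reported in the dataset is the sum of the electronic and ionic parts, so the same averaging would be applied to each contribution before adding them.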

Equation 2 implies that in the limit ω→0, εion,αβ(ω) is rather sensitive to the numerical accuracy of ωm,q=0, which, in turn, requires highly equilibrated materials structures for the DFPT calculations. As mentioned in the Workflow Section, if the calculated dielectric constant ε of a polymer differs from its measured value (this information is available for just a limited number of polymers in subsets 1 and 2) by more than 20%, the structures are further optimized until the residual atomic forces are smaller than 0.001 eV Å−1. Only those with a calculated dielectric constant within 20% of the experimental data [shown in Fig. 4b] are kept.
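The sensitivity noted here can be made explicit with a short worked estimate. It is a sketch that assumes the static limit ω→0 and considers the contribution of a single mode m in Equation 2, with its oscillator strength held fixed:

```latex
\varepsilon^{\mathrm{ion}}_{m} \propto \frac{S_m}{\omega_m^{2}}
\quad\Longrightarrow\quad
\frac{\delta \varepsilon^{\mathrm{ion}}_{m}}{\varepsilon^{\mathrm{ion}}_{m}}
\approx -2\,\frac{\delta \omega_m}{\omega_m}.
```

Under these assumptions, a 5% error in a mode frequency changes the contribution of that mode by roughly 10%, and for a given oscillator strength the low-frequency modes carry the largest weights 1/ωm².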

For some materials in our dataset, a measured IR spectrum is available. From the computational side, this material characteristic can also be calculated rather accurately from the byproducts of the dielectric constant calculations with DFPT. In particular, the intensities of the infrared-active modes are given by (ref. 56)

(3) $$I_m \propto \sum_{\alpha} \left| \sum_{s,\beta} Z^{*}_{s,\alpha\beta}\, e_{m,s\beta} \right|^{2},$$

where em,sβ is the β component of the normalized vibrational eigenvector of the mode m at the atom s. All of the quantities needed to calculate Im according to Equation 3 can be obtained within the DFPT-based computational scheme for ε, thus requiring essentially no computational overhead. This approach has been widely used in characterizing various classes of materials (refs 57,58). We show in Figure 4c–f the IR spectra calculated for four polymers, namely orthorhombic polyethylene, orthorhombic polyoxymethylene, poly(dimethyltin glutarate) (ref. 24), and polythiourea (ref. 20), each of which is compared with the corresponding measured IR spectrum. The excellent agreement between the calculated and the measured IR spectra can be regarded as a supportive validation of the computational scheme based on DFT calculations used for this polymer dataset.
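Equation 3 translates directly into a few nested sums. The sketch below assumes real-valued eigenvectors at the zone center and returns intensities only up to the overall constant implied by the proportionality sign. The function name, the array layout, and the two-atom toy system are hypothetical.

```python
import math
from typing import List, Sequence

Tensor = Sequence[Sequence[float]]


def ir_intensities(
    born_charges: Sequence[Tensor],                 # born_charges[s][a][b] = Z*_{s,ab}
    eigenvectors: Sequence[Sequence[Sequence[float]]],  # eigenvectors[m][s][b] = e_{m,sb}
) -> List[float]:
    """Relative IR intensity of each mode: sum over a of |sum over s,b of Z* e|^2."""
    intensities = []
    for e_m in eigenvectors:
        total = 0.0
        for a in range(3):
            amplitude = sum(
                born_charges[s][a][b] * e_m[s][b]
                for s in range(len(born_charges))
                for b in range(3)
            )
            total += amplitude * amplitude
        intensities.append(total)
    return intensities


# Toy system: two atoms with diagonal Born charges of +1 and -1.
plus = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
minus = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
r = 1.0 / math.sqrt(2.0)
opposed = [[r, 0.0, 0.0], [-r, 0.0, 0.0]]   # atoms move against each other along x
in_phase = [[r, 0.0, 0.0], [r, 0.0, 0.0]]   # atoms move together along x

toy_intensities = ir_intensities([plus, minus], [opposed, in_phase])
```

In the toy system the opposed motion of the two oppositely charged atoms creates a net dipole and is IR active, while the in-phase motion produces no dipole and has zero intensity.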

Usage Notes

This dataset, which includes a variety of known and new organic and organometallic polymers and related materials, has been consistently prepared using first-principles calculations. While the HSE06 band gap EgHSE06 is believed to be fairly close to the true band gap of the materials, the GGA-rPW86 band gap is also reported for completeness and for further possible analysis. The reported atomization energy and the dielectric constants are also expected to be accurate.

The polymer dataset is one among many recently developed datasets which can be used for designing materials by various data-driven approaches. To be specific, this dataset is expected to be useful in the development of polymers for energy storage and electronics applications. Moving forward, this dataset will be continuously validated and updated, and the most recent version can be accessed at the online repository.



  1. Curtarolo, S. et al. The high-throughput highway to computational materials design. Nat. Matter 12, 191 (2013).

    ADS  CAS  Article  Google Scholar 

  2. Hautier, G., Jain, A. & Ong, S. From the computer to the laboratory: materials discovery and design using first-principles calculations. J. Mater. Sci. 47, 7317 (2012).

    ADS  CAS  Article  Google Scholar 

  3. Rajan, K. Materials informatics. Mater. Today 8, 38 (2005).

    CAS  Article  Google Scholar 

  4. Schön, J. C. How can databases assist with the prediction of chemical compounds? Z. Anorg. Allg. Chem. 640, 2717 (2014).

    PubMed  PubMed Central  Article  Google Scholar 

  5. Curtarolo, S., Morgan, D., Persson, K., Rodgers, J. & Ceder, G. Predicting crystal structures with data mining of quantum calculations. Phys. Rev. Lett. 91, 135503 (2003).

    ADS  PubMed  Article  Google Scholar 

  6. Hansen, K. et al. Assessment and validation of machine learning methods for predicting molecular atomization energies. J. Chem. Theory Comput. 9, 3404 (2013).

    CAS  PubMed  Article  Google Scholar 

  7. Meredig, B. et al. Combinatorial screening for new materials in unconstrained composition space with machine learning. Phys. Rev. B 89, 094104 (2014).

    ADS  Article  Google Scholar 

  8. Bhat, T. N., Bartolo, L. M., Kattner, U. R., Campbell, C. E. & Elliott, J. T. Strategy for extensible, evolving terminology for the Materials Genome Initiative efforts. JOM 67, 1866 (2015).

    Article  Google Scholar 

  9. Pilania, G., Wang, C., Jiang, X., Rajasekaran, S. & Ramprasad, R. Accelerating materials property predictions using machine learning. Sci. Rep 3, 2810 (2013).

    ADS  PubMed  PubMed Central  Article  Google Scholar 

  10. Huan, T. D., Mannodi-Kanakkithodi, A. & Ramprasad, R. Accelerated materials property predictions and design using motif-based fingerprints. Phys. Rev. B 92, 014106 (2015).

    ADS  Article  Google Scholar 

  11. Mannodi-Kanakkithodi, A., Pilania, G., Huan, T. D., Lookman, T. & Ramprasad, R. Machine learning strategy for the accelerated design of polymer dielectrics. Sci. Rep. 10.1038/srep20952 (2016).

  12. Jain, A. et al. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Materials 1, 011002 (2013).

    ADS  Article  Google Scholar 

  13. Saal, J. E., Kirklin, S., Aykol, M., Meredig, B. & Wolverton, C. Materials design and discovery with high-throughput density functional theory: the open quantum materials database (OQMD). JOM 65, 1501 (2013).

    CAS  Article  Google Scholar 

  14. Taylor, R. H. et al. A RESTful API for exchanging materials data in the consortium. Comput. Mater. Sci. 93, 178 (2014).

    Article  Google Scholar 

  15. Gražulis, S. et al. Crystallography Open Database (COD): an open-access collection of crystal structures and platform for world-wide collaboration. Nucleic Acids Res. 40, D420 (2012).

    PubMed  Article  Google Scholar 

  16. Otsuka, S., Kuwajima, I., Hosoya, J., Xu, Y. & Yamazaki, M. in International Conference on Emerging Intelligent Data and Web Technologies (EIDWT) pp 22–29 (IEEE, Tirana, 2011).

  17. Reymond, J.-L. The chemical space project. Acc. Chem. Res. 48, 722 (2015).

    CAS  PubMed  Article  Google Scholar 

  18. Ramakrishnan, R., Dral, P. O., Rupp, M. & von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Sci. Dat 1, 140022 (2014).

    CAS  Article  Google Scholar 

  19. Wang, C. C. et al. Computational strategies for polymer dielectric design. Polymer 55, 979 (2014).

    CAS  Article  Google Scholar 

  20. Sharma, V. et al. Rational design of all organic polymer dielectrics. Nat. Commun 5, 4845 (2014).

    ADS  CAS  PubMed  Article  Google Scholar 

  21. Lorenzini, R., Kline, W., Wang, C., Ramprasad, R. & Sotzing, G. The rational design of polyurea & polyurethane dielectric materials. Polymer 54, 3529 (2013).

    CAS  Article  Google Scholar 

  22. Ma, R. et al. Rational design and synthesis of polythioureas as capacitor dielectrics. J. Mater. Chem. A 3, 14845 (2015).

    CAS  Article  Google Scholar 

  23. Zhu, Q., Sharma, V., Oganov, A. R. & Ramprasad, R. Predicting polymeric crystal structures by evolutionary algorithms. J. Chem. Phys. 141, 154102 (2014).

    ADS  PubMed  Article  Google Scholar 

  24. Baldwin, A. F. et al. Poly(dimethyltin glutarate) as a prospective material for high dielectric applications. Adv. Matter 27, 346 (2015).

    CAS  Article  Google Scholar 

  25. Baldwin, A. F. et al. Rational design of organotin polyesters. Macromolecules 48, 2422 (2015).

    ADS  CAS  Article  Google Scholar 

  26. Baldwin, A. F. et al. Effect of incorporating aromatic and chiral groups on the dielectric properties of poly(dimethyltin esters). Macromol. Rapid Commun. 35, 2082 (2014).

    CAS  PubMed  Article  Google Scholar 

  27. Banks, J. L. et al. Integrated modeling program, applied chemical theory (IMPACT). J. Comput. Chem. 26, 1752 (2005).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  28. Jorgensen, W. L., Ulmschneider, J. P. & Tirado-Rives, J. Free Energies of Hydration from a Generalized Born Model and an All-Atom Force Field. J. Phys. Chem. B 108, 16264 (2004).

    CAS  Article  Google Scholar 

  29. Vanommeslaeghe, K. & MacKerell, Jr., A. D. Automation of the CHARMM General Force Field (CGenFF) I: Bond Perception and Atom Typing. J. Chem. Infor. Model. 52, 3144 (2012).

    CAS  Article  Google Scholar 

  30. Vanommeslaeghe, K., Raman, E. P. & MacKerell, Jr., A. D. Automation of the CHARMM General Force Field (CGenFF) II: Assignment of Bonded Parameters and Partial Atomic Charges. J. Chem. Infor. Model. 52, 3155 (2012).

    CAS  Article  Google Scholar 

  31. Hohenberg, P. & Kohn, W. Inhomogeneous electron gas. Phys. Rev. 136, B864 (1964).

    ADS  MathSciNet  Article  Google Scholar 

  32. Kohn, W. & Sham, L. Self-consistent equations including exchange and correlation effects. Phys. Rev. 140, A1133 (1965).

    ADS  MathSciNet  Article  Google Scholar 

  33. Baroni, S., de Gironcoli, S. & Dal Corso, A. Phonons and related crystal properties from density-functional perturbation theory. Rev. Mod. Phys. 73, 515 (2001).

    ADS  CAS  Article  Google Scholar 

  34. Ong, S. P. et al. Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis. Comput. Mater. Sci. 68, 314 (2013).

    CAS  Article  Google Scholar 

  35. Glass, C. W., Oganov, A. R. & Hansen, N. USPEX-Evolutionary crystal structure prediction. Comput. Phys. Commun. 175, 713 (2006).

    ADS  CAS  MATH  Article  Google Scholar 

  36. Goedecker, S. Minima hopping: An efficient search method for the global minimum of the potential energy surface of complex molecular systems. J. Chem. Phys. 120, 9911 (2004).

    ADS  CAS  PubMed  Article  Google Scholar 

  37. Amsler, M. & Goedecker, S. Crystal structure prediction using the minima hopping method. J. Chem. Phys. 133, 224104 (2010).

    ADS  PubMed  Article  Google Scholar 

  38. Huan, T. D., Amsler, M., Tuoc, V. N., Willand, A. & Goedecker, S. Low-energy structures of zinc borohydride Zn(BH4)2 . Phys. Rev. B 86, 224110 (2012).

    ADS  Article  Google Scholar 

  39. Huan, T. D. et al. Thermodynamic stability of alkali metal/zinc double-cation borohydrides at low temperatures. Phys. Rev. B 88, 024108 (2013).

  40. Huan, T. D., Sharma, V., Rossetti, G. A. & Ramprasad, R. Pathways towards ferroelectricity in hafnia. Phys. Rev. B 90, 064111 (2014).

  41. Sharma, H., Sharma, V. & Huan, T. D. Exploring PtSO4 and PdSO4 phases: an evolutionary algorithm based investigation. Phys. Chem. Chem. Phys. 17, 18146 (2015).

  42. Blöchl, P. E. Projector augmented-wave method. Phys. Rev. B 50, 17953 (1994).

  43. Kresse, G. & Hafner, J. Ab initio molecular dynamics for liquid metals. Phys. Rev. B 47, 558 (1993).

  44. Kresse, G. Ab initio Molekular Dynamik für flüssige Metalle. Ph.D. thesis, Technische Universität Wien (1993).

  45. Kresse, G. & Furthmüller, J. Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. Comput. Mater. Sci. 6, 15 (1996).

  46. Kresse, G. & Furthmüller, J. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys. Rev. B 54, 11169 (1996).

  47. Woods, L. M. et al. Preprint, arXiv:1509.03338.

  48. Liu, C.-S., Pilania, G., Wang, C. & Ramprasad, R. How Critical Are the van der Waals Interactions in Polymer Crystals? J. Phys. Chem. A 116, 9347 (2012).

  49. Lee, K., Murray, É. D., Kong, L., Lundqvist, B. I. & Langreth, D. C. Higher-accuracy van der Waals density functional. Phys. Rev. B 82, 081101(R) (2010).

  50. Murray, É. D., Lee, K. & Langreth, D. C. Investigation of exchange energy density functional accuracy for interacting molecules. J. Chem. Theory Comput. 5, 2754 (2009).

  51. Monkhorst, H. J. & Pack, J. D. Special points for Brillouin-zone integrations. Phys. Rev. B 13, 5188 (1976).

  52. Perdew, J. P. Density functional theory and the band gap problem. Int. J. Quant. Chem. 28, 497 (1985).

  53. Heyd, J., Scuseria, G. E. & Ernzerhof, M. Hybrid functionals based on a screened Coulomb potential. J. Chem. Phys. 118, 8207 (2003).

  54. Krukau, A. V., Vydrov, O. A., Izmaylov, A. F. & Scuseria, G. E. Influence of the exchange screening parameter on the performance of screened hybrid functionals. J. Chem. Phys. 125, 224106 (2006).

  55. Carazzolo, G. & Mammi, M. Crystal structure of a new form of polyoxymethylene. J. Polym. Sci. A 1, 965 (1963).

  56. Brüesch, P. Phonons: Theory and Experiments II, Springer Series in Solid-State Sciences Vol. 65, Chap. 2, pp. 8–64 (Springer: Berlin, 1986).

  57. Giannozzi, P. & Baroni, S. Vibrational and dielectric properties of C60 from density-functional perturbation theory. J. Chem. Phys. 100, 8537 (1994).

  58. Wang, C. C., Pilania, G. & Ramprasad, R. Dielectric properties of carbon-, silicon-, and germanium-based polymers: A first-principles study. Phys. Rev. B 87, 035103 (2013).

  59. Towns, J. et al. XSEDE: accelerating scientific discovery. Comput. Sci. Eng. 16, 62 (2014).

  60. Wigman, L. S., Hart, E. E. & Gombatz, C. IR spectroscopy using disposable polyethylene cards: a replacement for KBr pellets and mulls. J. Chem. Educ. 73, 677 (1996).

  61. Peacock, A. Handbook of Polyethylene: Structures, Properties, and Applications. 1st ed. (CRC Press: New York, US, 2000).

  62. Hikosaka, M. & Seto, T. The order of the molecular chains in isotactic polypropylene crystals. Polym. J. 5, 111 (1973).

  63. Kolda, R. R. & Lando, J. B. The effect of hydrogen-fluorine defects on the conformational energy of polytrifluoroethylene chains. J. Macromol. Sci. Phys. 11, 21 (1975).

  64. De Rosa, C., Guerra, G., Petraccone, V. & Pirozzi, B. Crystal structure of the emptied clathrate form (δe form) of syndiotactic polystyrene. Macromolecules 30, 4147 (1997).

  65. Kobayashi, M. et al. Synthesis and properties of chemically coupled poly(thiophene). Synth. Met. 9, 77 (1984).

  66. Dorset, D. L. Direct determination of polymer crystal structures from fibre and powder X-ray data. Polymer 38, 247 (1997).

  67. Fratini, A. V., Cross, E. M., O'Brien, J. F. & Adams, W. W. The structure of poly-2,5-benzoxazole (ABPBO) and poly-2,6-benzothiazole (ABPBT) fibers by X-ray diffraction. J. Macromol. Sci. Phys. 24, 159 (1985).

  68. Kumpanenko, I. V., Kazaskii, K. S., Ptitsyna, N. V. & Kushnerev, M. Y. Structural study of polymeric 3,3,3-trifluoro-1,2-epoxypropane. Polym. Sci. USSR 12, 930 (1970).

  69. Matsubayashi, H., Chatani, Y., Tadokoro, H., Tabata, Y. & Ito, W. Molecular and crystal structure of hexafluoroacetone-ethylene alternating copolymer. Polym. J. 9, 145 (1977).

  70. Turner-Jones, A. & Bunn, C. W. The crystal structure of polyethylene adipate and polyethylene suberate. Acta Cryst. 15, 105 (1962).

  71. Tabor, B. J., Magré, E. P. & Boon, J. The crystal structure of poly-p-phenylene sulphide. Eur. Polym. J. 7, 1127 (1971).

  72. Sakakihara, H., Takahashi, Y., Tadokoro, H., Sigwalt, P. & Spassky, N. Structural studies of the optically active and racemic poly(propylene sulfides). Macromolecules 2, 515 (1969).

  73. Tanigami, T. et al. Structural studies on ethylene-tetrafluoroethylene copolymer 1. Crystal structure. Polymer 27, 999 (1986).

  74. Mencik, Z. The crystal structure of poly(tetramethylene terephthalate). J. Polym. Sci.: Polym. Phys. Ed. 13, 2173 (1975).

  75. Jourdan, N., Deguire, S. & Brisse, F. Structural study of linear polyesters. 1. crystal structure of poly(trimethylene sebacate), established from X-ray and electron diffraction data. Macromolecules 28, 8086 (1995).

  76. Lando, J. B. & Hanes, M. D. X-ray Analysis of Poly(vinyl fluoride). Macromolecules 28, 1142 (1995).

  77. Daubeny, R. de P. & Bunn, C. W. The crystal structure of polyethylene terephthalate. Proc. R. Soc. A 226, 531 (1954).

  78. Hasegawa, R., Kobayashi, M. & Tadokoro, H. Molecular conformation and packing of poly(vinylidene fluoride). Stability of three crystalline forms and the effect of high pressure. Polym. J. 591 (1972).

  79. De Rosa, C. & Corradini, P. Crystal structure of syndiotactic polypropylene. Macromolecules 26, 5711 (1993).

  80. Puterman, M., Kolpak, F. J., Blackwell, J. & Lando, J. B. X-ray structure determination of isotactic poly(2-vinylpyridine). J. Polym. Sci.: Polym. Phys. Ed. 15, 805 (1977).

  81. Hobson, R. J. & Windle, A. H. Crystalline structure of atactic polyacrylonitrile. Macromolecules 26, 6903 (1993).

  82. Lotz, B. Crystal structure of polyglycine I. J. Mol. Biol. 87, 169 (1974).

  83. Kakida, H., Chatani, Y. & Tadokoro, H. Crystal structure of poly(m-phenylene isophthalamide). J. Polym. Sci.: Polym. Phys. Ed. 14, 427 (1976).

  84. Kobayashi, N. et al. Chain Distortion of m-Linked Aromatic Polymers: Poly(m-phenylene) and Poly(m-pyridine). Macromolecules 37, 7986 (2004).

  85. Tashiro, K. et al. Confirmation of the crystal structure of poly(p-phenylene benzobisoxazole) by the X-ray structure analysis of model compounds and the energy calculation. J. Polym. Sci. Part B: Polym. Phys. 39, 1296 (2001).

Data Citations

  1. Huan, T. D. Dryad Digital Repository (2015)

  2. Huan, T. D. NoMaD Repository (2016)


Acknowledgements

The present work was supported by a Multidisciplinary University Research Initiative (MURI) grant from the Office of Naval Research under Award No. N00014-10-1-0944. G.P. acknowledges the support provided by the U.S. Department of Energy through the LANL/LDRD Program's Director's postdoctoral fellowship. A.M.K. would like to thank Turab Lookman at Los Alamos National Laboratory for providing access to computational resources. Computational work was made possible through XSEDE computational resource allocation number TG-DMR080058N (ref. 59).

Author information


Contributions

T.D.H. wrote the paper with input and critique from all authors. Data accumulation and first-principles calculations were performed by T.D.H., A.M.K., C.K., V.S., and G.P. The dataset was refined, validated, and finalized by T.D.H. The data repository (Khazana) was designed and maintained by C.K. The project was initiated, designed, and supervised by R.R. T.D.H. and A.M.K. contributed equally.

Corresponding author

Correspondence to Rampi Ramprasad.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit Metadata associated with this Data Descriptor is available at and is released under the CC0 waiver to maximize reuse.

About this article

Cite this article

Huan, T., Mannodi-Kanakkithodi, A., Kim, C. et al. A polymer dataset for accelerated property prediction and design. Sci Data 3, 160012 (2016).
