Data Descriptor

# An ab initio electronic transport database for inorganic materials

• Scientific Data 4, Article number: 170085 (2017)
• doi:10.1038/sdata.2017.85

## Abstract

Electronic transport in materials is governed by a series of tensorial properties such as conductivity, Seebeck coefficient, and effective mass. These quantities are paramount to the understanding of materials in many fields, from thermoelectrics to electronics and photovoltaics. Transport properties can be calculated from a material’s band structure within the framework of Boltzmann transport theory. We present here the largest computational database of electronic transport properties, covering a large set of about 48,000 materials originating from the Materials Project database. Our results were obtained through the interpolation approach developed in the BoltzTraP software, assuming a constant relaxation time. We describe the workflow used to generate the data, the data validation procedure, and the database structure. Our aim is to serve the large community of scientists developing materials selection strategies and performing studies involving transport properties.

Design Type(s) database creation objective electronic transport properties computational modeling technique inorganic compound or molecule

## Background & Summary

Many devices, such as solar cells, transistors, and thermoelectric generators, rely on materials with specific transport properties such as conductivity, mobility, and Seebeck coefficient. The semi-classical approach provided by Boltzmann theory1,2 gives access, albeit with some limitations, to the electronic transport properties of materials. Boltzmann transport theory links the transport tensors to the fundamental electronic structure and atomistic properties of any material: given the band structure (e.g., from ab initio computations within density functional theory (DFT)) and the relaxation times due to the different scattering processes, solving the Boltzmann transport equation yields the electronic transport tensors3,4.

High-throughput (HT) ab initio computing makes it possible to compute materials properties on an unprecedented scale5,6,7,8. This new paradigm is transforming the way materials are discovered, offering the possibility to select materials with specific properties before undertaking the lengthy process of synthesis, characterization, and device fabrication9,10. Large datasets are also useful for data mining studies that reveal trends and correlations between materials properties11,12,13,14. Freely accessible high-throughput databases are being built to provide data to a large community of scientists and to facilitate screening and data mining15: the Materials Project16,17, the Open Quantum Materials Database18, AFLOW19, the NOMAD repository20, and the Harvard Clean Energy Project21.

The BoltzTraP code4 solves the Boltzmann equation by interpolating a band structure computed within DFT and performing all the required integrations. Since its development, BoltzTraP has been used in several high-throughput studies in fields ranging from thermoelectrics to transparent conducting oxides22,23,24,25,26,27, and also to obtain new general descriptors of the band structure28. In this paper, we report on a large dataset of electronic transport properties obtained by combining high-throughput ab initio band structures with Boltzmann transport theory within the constant relaxation time approximation. In total, we currently provide computed electronic transport data for about 48,000 compounds. This is the largest public database of electronic transport data obtained with BoltzTraP and DFT.

This dataset adds to the growing database of materials properties of the Materials Project (MP)16,17. It will be accessible via the MP web interface, like the thermodynamic, battery, elastic29, and piezoelectric30 data. In the remainder of the paper, we summarize the Boltzmann equation within the relaxation time approximation, the calculated properties, and the workflow followed. Finally, we present a graphical overview of the data and compare it with published computational and experimental values to assess the precision and accuracy of our approach.

## Methods

### Methods definitions

To evaluate transport phenomena occurring at the electronic level, a microscopic model of the transport process is needed to assess the transport coefficients of materials. In the presence of an electric field $\mathbf{E}$, a magnetic field $\mathbf{B}$, and a temperature gradient $\nabla T$, the basic transport equation for the current density is

$$j_i = \sigma_{ij} E_j + \sigma_{ijk} E_j B_k + \nu_{ij} \nabla_j T + \dots$$

In this work, we limit the expansion to first order in the magnetic field $\mathbf{B}$ and focus only on the conductivity tensors $\sigma_{ij}$, $\sigma_{ijk}$, and $\nu_{ij}$.

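A semi-classical approach based on solving Boltzmann’s equation within the relaxation time approximation is commonly used to describe the conductivity tensors. This model evaluates the electrical conductivity by introducing a lifetime, τ, for an electron, which encapsulates all the different scattering mechanisms it can undergo1,2,3. Following the notation of ref. 4, which describes the BoltzTraP code, the conductivity tensors can be written as

$$\sigma_{\alpha\beta}(i,\mathbf{k}) = e^2 \tau_{i,\mathbf{k}}\, v_\alpha(i,\mathbf{k})\, v_\beta(i,\mathbf{k}) \tag{1}$$

and, using the Levi-Civita tensor31 $\epsilon_{ijk}$,

$$\sigma_{\alpha\beta\gamma}(i,\mathbf{k}) = e^3 \tau_{i,\mathbf{k}}^2\, \epsilon_{\gamma u v}\, v_\alpha(i,\mathbf{k})\, v_v(i,\mathbf{k})\, M^{-1}_{\beta u}(i,\mathbf{k}) \tag{2}$$

in terms of the group velocity and the inverse mass tensor:

$$v_\alpha(i,\mathbf{k}) = \frac{1}{\hbar}\frac{\partial \varepsilon_{i,\mathbf{k}}}{\partial k_\alpha}, \qquad M^{-1}_{\beta u}(i,\mathbf{k}) = \frac{1}{\hbar^2}\frac{\partial^2 \varepsilon_{i,\mathbf{k}}}{\partial k_\beta\, \partial k_u} \tag{3}$$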

Apart from the band structure $\varepsilon_{i,\mathbf{k}}$, the relaxation time $\tau_{i,\mathbf{k}}$ needs to be defined. It describes all the scattering processes involved in electronic transport and, in the most general description, depends on both the band index i and the wave vector $\mathbf{k}$. In the Limitations section, we describe in more detail common models used to compute the relaxation time (one of which consists of approximating it by a constant) and how we treat it in our HT approach.

Summing over all bands and all k-points in the full Brillouin zone, we obtain an energy-dependent differential conductivity tensor

$$\sigma_{\alpha\beta}(\varepsilon) = \frac{1}{N}\sum_{i,\mathbf{k}} \sigma_{\alpha\beta}(i,\mathbf{k})\, \delta(\varepsilon - \varepsilon_{i,\mathbf{k}}),$$

where i is the band index and N is the number of k-points. The main transport tensors, which depend on the temperature T and on the Fermi level (or chemical potential) μ of the electrons, then follow4:

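1. the conductivity related to the electric field:
   $$\sigma_{\alpha\beta}(T;\mu) = \frac{1}{\Omega}\int \sigma_{\alpha\beta}(\varepsilon)\left[-\frac{\partial f_\mu(T;\varepsilon)}{\partial \varepsilon}\right] d\varepsilon \tag{4}$$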

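2. the conductivity related to the electric and magnetic fields:
   $$\sigma_{\alpha\beta\gamma}(T;\mu) = \frac{1}{\Omega}\int \sigma_{\alpha\beta\gamma}(\varepsilon)\left[-\frac{\partial f_\mu(T;\varepsilon)}{\partial \varepsilon}\right] d\varepsilon \tag{5}$$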

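3. the conductivity related to the thermal gradient:
   $$\nu_{\alpha\beta}(T;\mu) = \frac{1}{eT\Omega}\int \sigma_{\alpha\beta}(\varepsilon)(\varepsilon-\mu)\left[-\frac{\partial f_\mu(T;\varepsilon)}{\partial \varepsilon}\right] d\varepsilon$$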

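4. the electronic contribution to the thermal conductivity:
   $$\kappa^0_{\alpha\beta}(T;\mu) = \frac{1}{e^2T\Omega}\int \sigma_{\alpha\beta}(\varepsilon)(\varepsilon-\mu)^2\left[-\frac{\partial f_\mu(T;\varepsilon)}{\partial \varepsilon}\right] d\varepsilon, \tag{6}$$

where $f_\mu$ is the Fermi distribution, Ω is the volume of the unit cell, and e is the electron charge. From these tensorial quantities, it is straightforward to determine the electronic thermal conductivity at zero electric current, the Seebeck coefficient, the Hall coefficient, and the doping carrier concentration, respectively:

$$\kappa^{el}_{ij} = \kappa^0_{ij} - T\,\nu_{i\alpha}(\sigma^{-1})_{\alpha\beta}\,\nu_{\beta j} \tag{7}$$

$$S_{ij} = E_i(\nabla_j T)^{-1} = (\sigma^{-1})_{\alpha i}\,\nu_{\alpha j} \tag{8}$$

$$R_{ijk} = \frac{E^{ind}_j}{j^{appl}_i B^{appl}_k} = (\sigma^{-1})_{\alpha j}\,\sigma_{\alpha\beta k}\,(\sigma^{-1})_{i\beta} \tag{9}$$

$$n(T;\mu) = n_v - \frac{1}{\Omega}\int g(\varepsilon)\, f_\mu(T;\varepsilon)\, d\varepsilon \tag{10}$$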
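As a minimal numerical illustration of equations (4) and (8), the sketch below evaluates the Fermi-window integrals on a uniform energy grid for a scalar (isotropic) toy transport distribution σ(ε) and returns S = ν/σ. The toy σ(ε), the function names, and the sign convention e = −|e| (chosen so that electron-like transport gives a negative Seebeck coefficient) are our assumptions; this is not BoltzTraP's implementation.

```python
import numpy as np

KB_EV = 8.617333262e-5  # Boltzmann constant (eV/K)


def minus_df_de(eps, mu, temperature):
    """Fermi window -df/d(eps) in 1/eV, written as 1/(4 kT cosh^2) for stability."""
    kt = KB_EV * temperature
    return 1.0 / (4.0 * kt * np.cosh((eps - mu) / (2.0 * kt)) ** 2)


def seebeck_scalar(eps, sigma_eps, mu, temperature):
    """Scalar Seebeck coefficient S = nu / sigma in V/K.

    eps: uniform energy grid (eV); sigma_eps: toy sigma(eps) on that grid in
    arbitrary units, which cancel in the ratio together with 1/Omega.
    """
    w = minus_df_de(eps, mu, temperature)
    de = eps[1] - eps[0]
    sigma = np.sum(sigma_eps * w) * de                       # equation (4), up to 1/Omega
    energy_moment = np.sum(sigma_eps * (eps - mu) * w) * de  # e*T*nu, in eV
    # Assumed convention e = -|e|: an energy in eV divided by e is a voltage
    # with a minus sign, hence S = -(energy_moment / sigma) / T.
    return -energy_moment / (sigma * temperature)


# Toy conduction band: sigma(eps) grows linearly above a band edge at 0.2 eV.
eps = np.linspace(-1.0, 1.0, 4001)
sigma_eps = np.where(eps > 0.2, eps - 0.2, 0.0)
S = seebeck_scalar(eps, sigma_eps, mu=0.0, temperature=300.0)
print(f"S = {S * 1e6:.0f} uV/K")
```

For a non-degenerate band with σ(ε) ∝ (ε − E_c), the result should approach −(E_c − μ + 2k_BT)/(|e|T), about −839 μV/K for these toy parameters, which is a convenient sanity check.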

The Seebeck coefficient $S_{ij}$, also known as thermopower, is one of the characteristic properties of thermoelectrics. Within the constant relaxation time approximation, $1/R_{ijk}$ is proportional to the Hall carrier density, a quantity usually obtained experimentally from Hall effect measurements. $n(T;\mu)$ is the electron or hole concentration, depending on the doping type, calculated from the density of states $g(\varepsilon)$, the number of valence electrons per volume $n_v$, and the Fermi distribution $f_\mu(T;\varepsilon)$. All these quantities are part of the standard output of the BoltzTraP code.

In addition, we computed the conductivity effective mass, which is simply derived from the conductivity tensor and the doping carrier concentration:

$$\bar{M}^{-1}_{\alpha\beta} = \frac{\sigma_{\alpha\beta}}{n e^2 \tau} \tag{11}$$
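Equation (11) can be applied directly to a stored conductivity tensor. The sketch below assumes σ/τ is given in (Ω m s)⁻¹ and the doping in cm⁻³ (our assumption about units; check Tables 1 and 2), and returns the ascending eigenvalues of the effective mass tensor in units of the free-electron mass. Function and constant names are illustrative.

```python
import numpy as np

E_CHARGE = 1.602176634e-19  # elementary charge (C)
M_E = 9.1093837015e-31      # free-electron mass (kg)


def conductivity_effective_masses(sigma_over_tau, doping_cm3):
    """Ascending eigenvalues of the conductivity effective mass tensor (units of m_e).

    sigma_over_tau: 3x3 conductivity tensor divided by tau, in 1/(Ohm m s).
    doping_cm3: doping carrier concentration in cm^-3; its sign only encodes
    the doping type, so the magnitude is used.
    """
    n_m3 = abs(doping_cm3) * 1e6                                   # cm^-3 -> m^-3
    inv_mass = np.asarray(sigma_over_tau) / (n_m3 * E_CHARGE**2)   # equation (11), in 1/kg
    inv_eigs = np.linalg.eigvalsh(inv_mass)                        # symmetric tensor assumed
    return np.sort(1.0 / (inv_eigs * M_E))


# Round trip: build sigma/tau for parabolic masses 0.2, 0.5, 1.0 m_e at 1e18 cm^-3.
n = 1e18
sigma_over_tau = np.diag([n * 1e6 * E_CHARGE**2 / (m * M_E) for m in (0.2, 0.5, 1.0)])
print(conductivity_effective_masses(sigma_over_tau, n))  # approximately [0.2, 0.5, 1.0]
```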

We note that this definition works properly only for semiconductors, where the doping carrier concentration (equation (10)) is well defined. In metals and small-gap materials it fails because the doping carrier concentration deviates from the total carrier concentration, as discussed further in the Usage Notes. Effective mass tensors are typically evaluated from band structures by computing second derivatives through finite differences at a given k-point (e.g., the valence band maximum or conduction band minimum) along certain symmetry lines. Doing so poses numerical challenges32, and the choice of the k-point at which to evaluate the effective mass is not obvious for band structures with strong non-parabolicity, multiple degenerate bands, or pockets close in energy in different parts of the Brillouin zone. The conductivity effective mass can also be seen as an average over the Brillouin zone and bands of the k-dependent second derivative (equation (3)), since integration by parts leads to

$$\bar{M}^{-1}_{\alpha\beta} = -\frac{\sum_i \int M^{-1}_{\alpha\beta}(i,\mathbf{k})\, f_\mu(\varepsilon_{i,\mathbf{k}},T)\, \frac{d\mathbf{k}}{4\pi^3}}{\sum_i \int f_\mu(\varepsilon_{i,\mathbf{k}},T)\, \frac{d\mathbf{k}}{4\pi^3}} \tag{12}$$

We note that this conductivity effective mass tensor depends on temperature and doping level. This quantity has been successfully used for high-throughput screening of new low-effective-mass transparent conducting and thermoelectric materials27,28,33,34,35. Hereafter, by calculated effective mass we mean the conductivity effective mass.

The integration of Boltzmann’s equation requires an analytical description of the band structure. The BoltzTraP code provides it using an interpolation method based on a Fourier expansion of the band energies, which preserves the space group symmetry by using star functions. The basic idea of this technique is to use more star functions than band energies, while constraining the fitted energies $\tilde{\varepsilon}$ to equal the calculated band energies ε at the input k-points, and to use the additional freedom to minimize a roughness function ρ. This method was introduced by Shankland36, verified and tested by Koelling and Wood37, and modified by Pickett et al.38. The BoltzTraP code has been extensively tested over the last decade in applications ranging from superconductors39 to thermoelectric40,41,42,43,44 materials, and good agreement with experimental values has been found in several cases45,46,47. In practice, the BoltzTraP code takes as input the electronic energies at a set of k-points, previously calculated by a DFT code (or other methods), interpolates the bands, and computes the Fermi integrals for different temperatures and Fermi levels. It then returns all the transport coefficients, along with other data such as the coefficients of the interpolating function.
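The constrained-fit idea can be illustrated in one dimension. The sketch below is a toy analogue, not BoltzTraP's code: it replaces the symmetrized plane-wave stars by cos(Rk) on a 1D lattice, uses five times more star functions than data points, forces the fit through every data point, and minimizes a roughness-weighted norm of the coefficients with Lagrange multipliers. The roughness form and its constants c1 = c2 = 3/4 follow our reading of Pickett et al.38 and should be treated as assumptions.

```python
import numpy as np


def roughness(r, r_min=1.0, c1=0.75, c2=0.75):
    """Roughness rho(R) = (1 - c1 (R/Rmin)^2)^2 + c2 (R/Rmin)^6 (assumed constants)."""
    x = r / r_min
    return (1.0 - c1 * x**2) ** 2 + c2 * x**6


def fit_star_interpolation(k_data, e_data, n_stars):
    """Toy 1D fit with star functions cos(R k), R = 0 .. n_stars - 1.

    Minimizes sum(rho * c**2) subject to the fit passing through every data point.
    """
    r = np.arange(n_stars, dtype=float)
    rho = roughness(r)
    s = np.cos(np.outer(k_data, r))   # s[p, m] = star function m at data point p
    # Stationarity gives c = rho^-1 S^T lam; the constraints give (S rho^-1 S^T) lam = e.
    a = (s / rho) @ s.T
    lam = np.linalg.solve(a, e_data)
    coeffs = (s.T @ lam) / rho
    return r, coeffs


def evaluate(r, coeffs, k):
    """Evaluate the interpolated band at arbitrary k."""
    return np.cos(np.outer(k, r)) @ coeffs


def toy_band(k):
    return -np.cos(k) + 0.1 * np.cos(2 * k)


k_data = np.linspace(0.0, np.pi, 10)
r, c = fit_star_interpolation(k_data, toy_band(k_data), n_stars=50)
k_mid = 0.5 * (k_data[:-1] + k_data[1:])
print(np.max(np.abs(evaluate(r, c, k_mid) - toy_band(k_mid))))  # small interpolation error
```

Because the roughness grows quickly with R, the minimizer suppresses the high-R aliases that would otherwise also fit the data points, which is what keeps the interpolant smooth between them.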

Finally, we also mention the BoltzWann code48, a recent approach that interpolates bands using Wannier functions49. Although it provides greater accuracy for the interpolated band structures (e.g., a better treatment of band crossings), this method has not been as widely tested as BoltzTraP, and it is difficult to exploit within an HT framework, since the automated construction of Wannier functions is still in its early stages50.

### Computational parameters

The input data needed to run BoltzTraP are the crystal structure and the electronic band structure on a uniform k-point grid. Both are computed using the standard high-throughput density functional theory (HT-DFT) recipe of the MP, summarized in refs 51,52. The DFT calculations were performed with the Vienna Ab initio Simulation Package (VASP)53,54 using the Perdew-Burke-Ernzerhof (PBE)55 generalized gradient approximation (GGA) and the projector augmented-wave (PAW)56,57 approach. For transition metal oxides with localized d orbitals, the GGA+U method was employed with the MP standard Hubbard corrections58,59. Most of the structures contained in the MP database originate from the Inorganic Crystal Structure Database (ICSD)60,61. The others come from previous high-throughput projects (e.g., a Li-ion battery screening project51) and from other databases (e.g., the Open Quantum Materials Database18).

All structures were fully relaxed (cell and atomic positions) using a two-step procedure, until the energy difference was lower than 0.0005 eV/atom. All relaxations were performed with spin polarization, initializing magnetic ions in a high-spin ferromagnetic configuration. For subsequent calculations, spin polarization was retained only when the relaxation resulted in non-zero atomically projected magnetic moments. The band structures were computed for standard primitive cells according to the conventions of Setyawan and Curtarolo19. A self-consistent static calculation was first performed to converge the charge density, using a moderate k-point density to sample the Brillouin zone: 90 k-points per Å⁻³ of reciprocal lattice volume for large-gap systems (≥0.5 eV) and 450 k-points per Å⁻³ for small-gap systems (<0.5 eV). The tetrahedron method was used for the integration over k-space in most cases; whenever it failed, Gaussian smearing was used instead51,52. Then, two non-self-consistent calculations were performed to evaluate the band structures: the first along the symmetry lines defined in ref. 19, and the second on a uniform k-point grid (1,000 k-points per Å⁻³ for large band gap systems, i.e., ≥0.5 eV as estimated from the self-consistent runs, and 1,500 k-points per Å⁻³ for small band gap systems, i.e., <0.5 eV). Spin-orbit coupling was not considered in the current study, but could be included as a next step to refine the database.
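The gap-dependent k-point densities above reduce to a simple rule. The helper below is a hypothetical illustration of that rule only; it is not the MP or pymatgen code implementing the recipe of refs 51,52, and converting a density into an actual k-point mesh is deliberately left out.

```python
from typing import Tuple


def kpoint_densities(gga_gap_ev: float) -> Tuple[int, int]:
    """Return (scf_density, uniform_density) in k-points per A^-3 of reciprocal volume.

    Large-gap systems (gap >= 0.5 eV): 90 for the self-consistent run and
    1,000 for the uniform non-self-consistent run. Small-gap systems
    (gap < 0.5 eV, including metals): 450 and 1,500.
    """
    if gga_gap_ev >= 0.5:
        return 90, 1000
    return 450, 1500


print(kpoint_densities(1.1), kpoint_densities(0.0))  # (90, 1000) (450, 1500)
```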

Doping (i.e., the introduction of additional carriers, either holes or electrons) has a tremendous effect on electronic transport properties: it sets the Fermi level (μ) and thus directly influences the values of the transport properties. A first dataset provides all the transport quantities for both n-type and p-type doping at fixed doping levels ranging from 10¹⁶ to 10²⁰ cm⁻³, increasing the doping by one order of magnitude at each step. A second, finer dataset provides the electronic transport properties on a uniform grid of Fermi level energies (from −1.5 to 1.5 eV around the Fermi level, in increments of 0.005 eV) and at temperatures from 100 to 1,300 K in increments of 100 K. The transport quantities available in the two datasets are listed in Tables 1 and 2. Users interested in doping levels outside the fixed values from 10¹⁶ to 10²⁰ cm⁻³ can use the finer dataset to obtain the properties at other doping levels (see Usage Notes).
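The sampling of the two datasets described above can be reproduced as follows. The variable names are ours; the stored key names are those listed in Tables 1 and 2.

```python
import numpy as np

# Fixed doping levels of the first dataset: 1e16 ... 1e20 cm^-3, one decade per step.
doping_levels_cm3 = [float(10**p) for p in range(16, 21)]

# Fermi-level grid of the second dataset: -1.5 ... 1.5 eV around the Fermi level
# in 0.005 eV steps, built from integer indices to avoid accumulating rounding error.
mu_steps_ev = np.arange(-300, 301) * 0.005

# Temperatures of the second dataset: 100 ... 1300 K in 100 K steps.
temperatures_k = list(range(100, 1301, 100))

print(len(doping_levels_cm3), len(mu_steps_ev), len(temperatures_k))  # 5 601 13
```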

### Limitations

Here, we discuss the main approximation made in this work: the constant relaxation time. In the conductivity tensor, the relaxation time $\tau_{i,\mathbf{k}}$ is written in its general form, depending on both the energy and the direction. All scattering events that can influence electron conduction, such as impurity scattering and phonon scattering, are included in this parameter1,2,62. Treating this term as a constant thus means assuming that it is isotropic and does not vary strongly on the energy scale of $k_BT$. This is a strong approximation that is known to give results far from experimental values for several materials. Many models, both empirical39,63,64,65,66 and first-principles67,68, have been proposed and tested to account for the different scattering processes. However, such models are more complex and introduce a dependence on further materials properties, such as the electron-phonon interaction, deformation potentials, elastic constants, and dielectric constants. They are therefore more difficult to use on a high-throughput scale for thousands of materials. We stress that while more accurate approaches exist, particularly for detailed studies of single materials, the constant relaxation time approximation is extremely useful for a first screening and for identifying general trends, provided the user keeps its limitations in mind23,24,25.

Since, within our constant relaxation time framework, the conductivities (thermal and electronic) are proportional to the relaxation time, we provide these quantities per unit of relaxation time. Users can then simply multiply these values by a constant relaxation time (typically 10⁻¹⁴ to 10⁻¹⁵ s) to obtain the final transport properties. Within the constant relaxation time approximation, the Seebeck coefficient does not depend on the relaxation time. We note, however, that in this approximation the sign of the Seebeck coefficient is wrong for some metals69.
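Recovering absolute values is a single multiplication, and since the Seebeck coefficient is independent of τ, only the conductivity carries it into the power factor. The sketch assumes σ/τ in (Ω m s)⁻¹ (the unit used for Fig. 3) and S in μV/K, which we have not checked against Tables 1 and 2; the helper names and the default τ = 10⁻¹⁴ s are illustrative choices within the range quoted above.

```python
def scale_by_tau(value_over_tau, tau_s=1e-14):
    """Multiply a per-relaxation-time quantity (e.g. sigma/tau) by a constant tau in seconds."""
    return value_over_tau * tau_s


def power_factor(seebeck_uv_per_k, sigma_over_tau, tau_s=1e-14):
    """Power factor S^2 sigma in W/(m K^2).

    Assumes S in microvolt/K and sigma/tau in 1/(Ohm m s).
    """
    s_v_per_k = seebeck_uv_per_k * 1e-6            # uV/K -> V/K
    sigma = scale_by_tau(sigma_over_tau, tau_s)    # S/m
    return s_v_per_k**2 * sigma


pf = power_factor(200.0, 1e19)  # 200 uV/K and sigma/tau = 1e19 1/(Ohm m s)
print(f"{pf:.1e} W/(m K^2)")    # 4.0e-03 W/(m K^2)
```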

Another issue is related to the k-point grid, whose density is important for the precision of transport properties calculated by interpolation. A known problem of the Fourier interpolation is the incorrect determination of band derivatives near band crossings. This problem was analyzed in ref. 38, which demonstrated that if the band crossing is not too close to the Fermi level, the derivative and curvature of the bands are not much affected. A possible solution has been proposed by Uehara et al.70. Also, as mentioned in ref. 4, this problem is localized only along high-symmetry lines. A dense k-point grid often solves this issue, and since properties are averaged over k-points and bands, their accuracy is not significantly affected. When a limited number of materials is considered, a very dense k-grid is commonly used; for example, Madsen suggests 64·10⁶/V k-points in the full Brillouin zone24. Since we are dealing with thousands of materials, the k-point grid used in this project is coarser, as a compromise between computational time and accuracy. However, we use a validation method (see the Technical Validation section) that tests the quality of the band structure interpolation and assesses whether the k-point grid is dense enough to avoid large failures of the interpolation scheme.

Finally, standard density functionals such as the GGA used in this work are known to underestimate band gaps. In particular, we have found that materials for which we predict band gaps smaller than about 10 $k_BT$, but whose true gaps are larger than this value, can be subject to larger errors in the predicted properties23.
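The 10 k_BT criterion can be turned into a simple screening flag. The helper below is our illustration, not part of the dataset: it flags computed gaps below 10 k_BT at a chosen temperature, i.e., about 0.26 eV at 300 K. Because GGA underestimates gaps, a flagged material is only a candidate for larger errors, not a confirmed case.

```python
KB_EV = 8.617333262e-5  # Boltzmann constant (eV/K)


def gap_below_10kt(gga_gap_ev, temperature_k):
    """True if the computed gap is smaller than 10 k_B T at the given temperature."""
    return gga_gap_ev < 10.0 * KB_EV * temperature_k


print(round(10 * KB_EV * 300, 3))                            # 0.259 (eV at 300 K)
print(gap_below_10kt(0.2, 300), gap_below_10kt(0.5, 300))   # True False
```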

### Workflow

The sequence of steps used in the HT calculations to produce the dataset is illustrated in Fig. 1; it has been automated using the FireWorks workflow software71. The Materials Project provides the GGA/GGA+U band structure on a uniform grid for the majority of the materials. On this set of materials, we ran the BoltzTraP code using the BoltztrapRunner class of the pymatgen software72. This class, written by some of the authors of this paper, automates writing the four input files required by BoltzTraP, converting units (between eV and Ry and between Å and bohr), checking the output log file for known errors, and rerunning BoltzTraP with different parameters to solve them. This class also includes an internal loop over two main parameters to converge the conductivity effective mass: lpfac, the multiplier controlling the density of the interpolated mesh, and energy_grid, the energy increment used to compute the integrals of the transport properties. Another pymatgen class that we developed, BoltztrapAnalyzer, extracts the properties from the output and transforms them into Python dictionaries that organize the data by doping type, doping level, and temperature.
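The convergence loop on lpfac and energy_grid can be sketched generically. This is an illustrative reimplementation under our own assumptions (the parameter schedule, a scalar mass, and a 5% relative tolerance), not the code of pymatgen's BoltztrapRunner; the `run` callable is a hypothetical stand-in for one BoltzTraP execution.

```python
from typing import Callable, Iterable, Optional, Tuple


def converge_effective_mass(
    run: Callable[[int, float], float],
    settings: Iterable[Tuple[int, float]],
    rel_tol: float = 0.05,
) -> Tuple[Optional[Tuple[int, float]], Optional[float]]:
    """Rerun over successively finer (lpfac, energy_grid) settings until converged.

    Stops when two consecutive effective masses differ by at most rel_tol
    (relative). Returns the converged settings and mass, or (None, None).
    """
    previous = None
    for lpfac, energy_grid in settings:
        mass = run(lpfac, energy_grid)
        if previous is not None and abs(mass - previous) <= rel_tol * abs(previous):
            return (lpfac, energy_grid), mass
        previous = mass
    return None, None


# Hypothetical run whose result approaches 0.30 m_e as the interpolation mesh is refined.
def fake_run(lpfac, energy_grid):
    return 0.30 + 1.0 / lpfac**2


settings = [(5, 0.005), (10, 0.0025), (20, 0.00125), (40, 0.000625)]
print(converge_effective_mass(fake_run, settings))
```

Here the loop stops at lpfac = 20, where the mass changes by less than 5% relative to the previous run.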

Before storing the transport properties, we perform a validation step that compares the band structure along high-symmetry lines calculated by DFT with the one interpolated by BoltzTraP. This rough assessment of the interpolation accuracy lets us weigh the reliability of the related properties. It also identifies the cases in which the uniform grid is too sparse; when needed, we recompute the band structure with a denser grid. This validation step is discussed further in the Technical Validation section.

Finally, once all the properties are collected for each material, we store them as a JSON (JavaScript Object Notation) document in the Dryad repository (Data Citation 1: Dryad digital repository https://doi.org/10.5061/dryad.gn001). In the future, all currently available data will also be accessible via the MP website and obtainable through the MP REST API73,74.

### Code availability

The proprietary Vienna Ab Initio Simulation Package (VASP) code53,54 is used in this work for the calculation of band structures. The BoltzTraP code is open source and freely accessible. The python classes used to run the BoltzTraP code, extract its output, format it, and perform the accuracy check on bands are implemented in the pymatgen software72. Pymatgen is released under the MIT (Massachusetts Institute of Technology) License and is open source. The workflow depicted in Fig. 1 is implemented using the FireWorks software71, which is open source under a modified GPL (GNU General Public License). Although VASP is available only under commercial license, the present results can be reproduced by querying for the band structures in the MP database using the associated mp-id and then running BoltzTraP calculations.

## Data Records

### File format

The data for each calculated material are stored as a JSON document (Data Citation 1: Dryad digital repository https://doi.org/10.5061/dryad.gn001). The JSON format consists of hierarchical key-value pairs. Tables 1 and 2 report the first-level JSON keys, units, value datatypes, and a short description for both datasets. Table 4 describes the dictionary used to store the output of the band interpolation check. All these keys are inside the main root key ‘GGA’ (and/or ‘GGA+U’ when available). Table 5 describes the structure of the dictionary that collects the values of each property according to doping type, temperature, doping level, and data type. Additional keys at the root level provide metadata for each entry of both datasets, such as the crystal structure and a unique mp-id identifying the structure within the MP database.
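To illustrate how the nested dictionaries can be navigated, the sketch below walks a made-up document. Apart from the root key ‘GGA’, every key name in it (the property name, the ‘p’ doping type, the temperature and doping-level strings, ‘eigs’, ‘tensor’, and ‘mp_id’) is a hypothetical placeholder, as are the numbers; the actual keys are those listed in Tables 1, 2, 4, and 5.

```python
import json

# Made-up document mimicking the nesting described in the text:
# root "GGA" -> property -> doping type -> temperature -> doping level -> data type.
doc_text = """
{
  "mp_id": "mp-XXXX",
  "GGA": {
    "seebeck_doping": {
      "p": {"300": {"1e+20": {"eigs": [110.0, 115.0, 130.0],
                              "tensor": [[110.0, 0, 0], [0, 115.0, 0], [0, 0, 130.0]]}}}
    }
  }
}
"""


def get_property(doc, prop, doping_type, temperature, doping, data_type, functional="GGA"):
    """Walk the nested dictionaries and return one stored value."""
    return doc[functional][prop][doping_type][str(temperature)][doping][data_type]


doc = json.loads(doc_text)
print(get_property(doc, "seebeck_doping", "p", 300, "1e+20", "eigs"))  # [110.0, 115.0, 130.0]
```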

### Properties

The properties included in the two datasets are reported in Tables 1 and 2. Each property is stored in a dictionary and, except for the effective mass, has been calculated for various doping types, temperatures, doping levels, and data types. All these cases are accessible through the sub-keys reported in Table 5.

In the first dataset, the following properties are stored: the Seebeck coefficient, electronic conductivity (divided by τ), and electronic thermal conductivity (divided by τ) for different doping types, doping levels, and temperatures; the carrier and Hall carrier concentrations for different temperatures as a function of the Fermi level (energy steps contained as values of the mu_steps key); effective masses for a doping concentration of 10¹⁸ cm⁻³ at 300 K, where n- and p-type refer to electron and hole masses, respectively; and the Fermi level values.

In the second dataset (containing additional information intended for expert users) the following properties are stored: Seebeck coefficient, electronic conductivity (divided by τ), and electronic thermal conductivity (divided by τ) for different temperatures as a function of the Fermi level (energy steps contained as values of the mu_steps key); effective masses for different doping levels (n- and p-type) at 300 K; the values of the chemical potential corresponding to each doping level (n- and p-type); the Fermi level values.

In both datasets, for the Seebeck coefficient, the electronic conductivity (divided by τ), and the electronic thermal conductivity (divided by τ), both the full tensor and its eigenvalues, sorted in ascending order, are stored. For the effective mass, only the eigenvalues of the full tensor, sorted in ascending order, are stored. For the Hall carrier concentration, only the averaged trace of the full Hall tensor is stored. We provide eigenvalues because they are invariant with respect to the choice of axes, which makes them extremely useful for queries. For instance, a search for high-Seebeck materials would involve a query on the Seebeck eigenvalues. To facilitate queries, the eigenvalues are sorted in ascending order (the first eigenvalue being the smallest), so the anisotropy of a property can be assessed directly from the difference between the last and the first eigenvalue. We stress that, being sorted, the provided eigenvalues contain no information about the corresponding principal directions. To relate eigenvalues to crystallographic directions, we suggest working with the full tensor (and the crystal structure information) and applying an algorithm that computes eigenvalues and eigenvectors (see also Usage Notes). We also note that effective masses are reported only for semiconductors, namely compounds with a band gap higher than zero in GGA or GGA+U.
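Recovering principal directions from a stored full tensor takes one eigendecomposition. The sketch below assumes it is acceptable to work with the symmetric part of the tensor (so that `numpy.linalg.eigh` applies); for a strongly non-symmetric tensor a general eigensolver would be needed. The tensor values and the function name are made up for illustration, and the eigenvectors are expressed in the same Cartesian frame as the tensor, so mapping them to crystallographic axes still requires the lattice.

```python
import numpy as np


def principal_values(tensor):
    """Ascending eigenvalues, eigenvectors (as columns), and anisotropy (last - first)."""
    t = np.asarray(tensor, dtype=float)
    sym = 0.5 * (t + t.T)                    # assumption: use the symmetric part
    eigvals, eigvecs = np.linalg.eigh(sym)   # eigh returns eigenvalues in ascending order
    return eigvals, eigvecs, eigvals[-1] - eigvals[0]


seebeck_tensor = [[120.0, 0.0, 0.0], [0.0, 150.0, 0.0], [0.0, 0.0, 90.0]]  # made-up values
vals, vecs, aniso = principal_values(seebeck_tensor)
print(vals, aniso)   # smallest eigenvalue first, then the anisotropy
print(vecs[:, 0])    # principal direction of the smallest eigenvalue (z axis, up to sign)
```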

### Graphical representation of results

In Figs 2 and 3, we present some of the transport properties stored in the current database. Fig. 2 shows the Seebeck coefficient as a function of the electrical conductivity (divided by τ) for all materials with a GGA band gap higher than 0.1 eV (around 25,000 compounds). Both properties are computed at 600 K and a doping level of 10²⁰ cm⁻³. The diameter of the circles indicates the band gap, and the color represents the power factor, $S^2\sigma$ (PF). The graph shows an almost symmetric spread of points with respect to the x-axis: the two halves correspond to the two doping types, which have opposite signs of the Seebeck coefficient. The color gradient shows a reasonable increasing trend toward values of Seebeck coefficient and conductivity that maximize the PF. However, the absence of points in the region of simultaneously high Seebeck coefficient and high conductivity shows how difficult it is for materials to reach both at the same time. The distribution of point sizes suggests that small band gap materials are concentrated at Seebeck coefficient values lower than 200 μV/K. Above 200 μV/K, it is difficult to identify any trend because the data points overlap.

In Fig. 3, we plot the electronic part of the thermal conductivity as a function of the electrical conductivity (both divided by τ) for all metallic compounds (with a gap equal to zero in GGA) in the database (~18,000 compounds). For such materials, the electronic contribution to the thermal conductivity is related to the electrical conductivity and the temperature by the well-known Wiedemann-Franz law, $\kappa_{el}/\sigma = LT$, where $L = 2.4\cdot10^{-8}\ \mathrm{W\,\Omega\,K^{-2}}$ is the Lorenz number. This law is plotted as a blue line superimposed on the set of points. Our dataset follows the theoretical trend quite well, especially for common metals, with electronic conductivity (divided by τ) in the range 10²¹–10²² (Ω m s)⁻¹.
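A quick way to check a metal against the Wiedemann-Franz law is to compute the ratio below; a value close to 1 means the entry follows the law. We use the Sommerfeld value L₀ = (π²/3)(k_B/e)² ≈ 2.44·10⁻⁸ W Ω K⁻², which the text rounds to 2.4·10⁻⁸; the function name is ours, and κ/τ and σ/τ are assumed to be in consistent SI units. Because τ cancels in the ratio, the per-τ values can be used directly.

```python
import math

KB = 1.380649e-23           # Boltzmann constant (J/K)
E_CHARGE = 1.602176634e-19  # elementary charge (C)

# Sommerfeld value of the Lorenz number, L0 = (pi^2 / 3) (k_B / e)^2, in W Ohm K^-2.
LORENZ_L0 = math.pi**2 / 3 * (KB / E_CHARGE) ** 2


def lorenz_ratio(kappa_over_tau, sigma_over_tau, temperature_k):
    """kappa_el / (sigma * L0 * T); equals 1 when the Wiedemann-Franz law holds exactly."""
    return kappa_over_tau / (sigma_over_tau * LORENZ_L0 * temperature_k)


print(f"{LORENZ_L0:.3e}")  # 2.443e-08
```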

Experimental data for Seebeck, thermal and electrical conductivity stored in the MRL database of thermoelectric properties75,76 show very similar trends.

## Technical Validation

### Validation of interpolation precision

Since the initial uniform k-point grid might not be sufficient for a good interpolation of every band structure, we performed a post-processing check before storing our data. The band structures along symmetry lines obtained from the interpolation are compared with those computed explicitly on a denser set of k-points along the same lines, which are reported in the Materials Project. This comparison has been implemented in pymatgen.

The comparison is twofold. First, we assess the correlation distance (as defined in the scipy.spatial.distance.correlation function; essentially 1−ρ, where ρ is the Pearson coefficient) between the two energy bands to determine whether they behave similarly. Second, we evaluate their energy distance for each segment of the high-symmetry path as the sum of absolute differences averaged over the N k-points of the segment:

$$D_i^{k\text{-path}} = \frac{1}{N}\sum_{\mathbf{k}} \left|\varepsilon^{Bzt}_{i,\mathbf{k}} - \varepsilon^{DFT}_{i,\mathbf{k}}\right|,$$

where $\varepsilon^{Bzt}_{i,\mathbf{k}}$ and $\varepsilon^{DFT}_{i,\mathbf{k}}$ are the energies of band i at k-point $\mathbf{k}$ calculated by BoltzTraP and DFT, respectively. The output of this check is stored in a dictionary described in Table 4. It mainly contains the correlation distance and the energy distance (for each segment and for the entire band) for the last four valence bands and the first four conduction bands of non-metals, or for the four bands above and the four bands below the Fermi level of metals. For quick screening, it also contains a warning flag (see the ‘acc_err’ key in Table 4) for both the correlation distance and the energy distance (for the entire band), set to True when their average over the eight bands is higher than 0.03. According to this threshold, around 2.5% of the GGA/GGA+U band structures have a warning on the correlation and 4% have a warning on the energy distance. Data with an interpolation warning should be used with extreme caution.
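The two metrics and the warning flags can be sketched as follows. This is a simplified illustration, not the pymatgen implementation: it computes the energy distance over the whole path rather than per segment, assumes all bands are sampled on the same k-points, and uses hypothetical dictionary keys (the stored flag is the ‘acc_err’ key of Table 4). The correlation distance is undefined for a perfectly flat band.

```python
import numpy as np


def correlation_distance(u, v):
    """1 - Pearson coefficient; the same quantity as scipy.spatial.distance.correlation(u, v)."""
    u = np.asarray(u, float) - np.mean(u)
    v = np.asarray(v, float) - np.mean(v)
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))


def energy_distance(e_bzt, e_dft):
    """Mean absolute energy difference over the k-points of the path."""
    return float(np.mean(np.abs(np.asarray(e_bzt) - np.asarray(e_dft))))


def interpolation_warnings(bands_bzt, bands_dft, threshold=0.03):
    """Average both distances over the checked bands and flag values above threshold."""
    corr = float(np.mean([correlation_distance(b, d) for b, d in zip(bands_bzt, bands_dft)]))
    dist = float(np.mean([energy_distance(b, d) for b, d in zip(bands_bzt, bands_dft)]))
    return {"corr": corr, "dist": dist,
            "corr_warning": corr > threshold, "dist_warning": dist > threshold}


# Made-up bands: a rigid 0.1 eV shift keeps the correlation perfect but fails the energy check.
k = np.linspace(0.0, 1.0, 50)
dft = [np.cos(2 * np.pi * k + i) for i in range(8)]
print(interpolation_warnings([b + 0.1 for b in dft], dft))
```

The shifted example shows why both checks are needed: the correlation distance is blind to rigid shifts, while the energy distance is not.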

### Validation through comparison to experimental measurements

In this section, we evaluate the level of agreement between calculated properties and their experimental counterparts. Several sources of disagreement can be expected a priori. First, we use a series of approximations, including DFT, the neglect of temperature effects on the band structure, and the constant relaxation time assumption. Numerical effects will also be present, related to the k-point grid density or to the accuracy of derivatives close to band crossings, although we expect those to be smaller. Finally, experimental measurements are often performed on samples that may contain impurities or be polycrystalline.

Keeping this in mind, we refer to a recent paper by Chen et al.23, in which Seebeck coefficients and electrical conductivities obtained with the same approach (DFT and the constant relaxation time within BoltzTraP) are compared with experimental measurements. We summarize here only the main outcomes of the comparison and refer the reader to the original paper and its supplementary information for more details. By far the best agreement is obtained for the Seebeck coefficient. Mobilities and conductivities are more sensitive to the constant relaxation time approximation, but general trends between materials are fairly well reproduced. We stress, though, that our dataset has not been corrected for the typical DFT band gap error by a scissor operation.

Finally, we compare our computed effective masses with experimental data. We only select direct measurements of the effective mass through cyclotron resonance and the Shubnikov-de Haas (SdH) effect. All the experimental data are obtained from the Landolt-Börnstein database77. We take into account the anisotropy of the effective mass when needed and report each symmetrically distinct direction as a separate data point. Our computed effective mass is obtained from the conductivity tensor and averages all the bands contributing to transport. For comparison with cyclotron and SdH measurements of individual bands, we therefore need to average the individual band contributions. We do so with a weighted average:

$$\bar{m}_{12} = \frac{m_1^{3/2}\cdot m_1 + m_2^{3/2}\cdot m_2}{m_1^{3/2} + m_2^{3/2}}, \tag{13}$$

where the individual contributions are labeled 1 and 2. The formula assumes parabolic bands.
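Equation (13) is a short weighted average; the sketch below implements it directly (the function name is ours). The m^{3/2} weights scale like the density of states of a parabolic band, which is our reading of why the formula assumes parabolicity.

```python
def combined_mass(m1, m2):
    """Equation (13): average of two band masses weighted by m**1.5."""
    w1, w2 = m1**1.5, m2**1.5
    return (w1 * m1 + w2 * m2) / (w1 + w2)


print(combined_mass(1.0, 4.0))  # weights 1 and 8, giving 33/9, about 3.67
```

The heavier band dominates the average, as expected when it hosts most of the carriers.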

In total, we compare 33 effective masses; to our knowledge, this is the largest such comparison with experiment. Figure 4 plots the experimental versus the theoretical effective mass obtained by our approach within GGA. The agreement is fairly good, and the trends between large and small effective mass materials are well reproduced by DFT. The Pearson and Spearman coefficients are 0.93 and 0.91, respectively. This justifies the use of these DFT effective masses to screen for materials with low effective masses27,33. No difference in accuracy between electron and hole effective masses is noticeable. Most of the DFT effective masses underestimate the experimental data. This could stem either from a systematic tendency of DFT related to its underestimation of the band gap, or from large-polaron effects, which are present in experiments but not taken into account in our work.
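For reference, the two coefficients quoted above can be computed as follows. This is a generic, dependency-free sketch (scipy.stats.pearsonr and scipy.stats.spearmanr provide the same statistics), and the numbers in the usage line are illustrative, not our data. Spearman's coefficient measures monotonic agreement, which is what matters when materials are ranked by effective mass.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman coefficient: the Pearson coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))


# Illustrative values: a monotonic but nonlinear relation.
x, y = [1.0, 2.0, 3.0, 4.0], [1.0, 4.0, 9.0, 16.0]
print(pearson(x, y), spearman(x, y))  # Pearson below 1, Spearman equal to 1
```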

When comparing our results with experiments, one should keep in mind the systematic tendency of the semilocal exchange-correlation functionals used within DFT to underestimate the band gap. While the band structures of semiconductors whose band gaps are underestimated but still open can provide very useful transport properties, a complete closing of the band gap, which makes the compound metallic in the calculation, can lead to much larger deviations.

## Usage Notes

Our paper provides a dataset of transport properties for about 48,000 materials, derived from DFT (GGA/GGA+U level) band structures and Boltzmann transport calculations within the constant relaxation time approximation. This type of data has already been used to gain insight into fundamental materials properties in electronics and thermoelectrics. While users should always keep the limits of our approach in mind when using this dataset, it constitutes a powerful basis for materials search and data mining of transport properties.

The meaning of the doping level provided by BoltzTraP and used in our dataset needs to be clarified. The doping level is not the total number of carriers. Equation (10) states that the doping concentration is the difference between the number of electrons per volume present in the undoped material and the number of electrons per volume at the given Fermi level. Equivalently, the doping concentration is the number of excess holes relative to the number of free electrons at the given Fermi level. It is then clear that the doping concentration is positive for p-type doping, where there are many more holes than free electrons, and negative for n-type doping, where the opposite is true. Mobile carriers that are generated intrinsically, in equal numbers of holes and free electrons, do not count toward the doping concentration. For example, metals and small-gap materials may have a significant intrinsic carrier concentration that is separate from the doping levels reported in this work.

For such materials, the total carrier concentration can instead be obtained directly, for instance from the Hall carrier concentration. We remind the user, however, that the Hall carrier concentration is not in general equal to the doping; the two coincide exactly only for parabolic bands in a highly degenerate semiconductor78. When comparing experimental and theoretical results, one should remember that in the vast majority of cases the experimentally reported carrier concentrations are Hall carrier concentrations. Moreover, this definition of carrier concentration affects the assessment of the conductivity effective mass given by equation (11). We therefore report the effective mass only for materials with a band gap larger than zero in GGA or GGA+U, and we advise the user to be careful when using the effective mass of materials with a band gap below 0.1 eV.
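The doping definition above can be sketched numerically, assuming a density of states (DOS) tabulated on a uniform energy grid. This is a minimal illustration of the sign convention, not BoltzTraP's implementation; the function names, the toy two-band DOS and the 100 Å³ cell volume are all hypothetical.

```python
import numpy as np

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def fermi_dirac(energy, mu, temperature):
    """Fermi-Dirac occupation; energies and mu in eV, temperature in K."""
    x = (energy - mu) / (K_B_EV * temperature)
    return 0.5 * (1.0 - np.tanh(0.5 * x))  # equals 1/(1+exp(x)) without overflow


def doping_concentration(energy, dos, n_valence, mu, temperature, volume_cm3):
    """Doping in cm^-3: electrons of the undoped material minus electrons at mu.

    energy: uniform grid (eV); dos: states/eV/cell, spin included;
    n_valence: electrons per cell in the undoped material.
    Positive -> p-type (electron deficit); negative -> n-type.
    """
    de = energy[1] - energy[0]
    n_electrons = np.sum(dos * fermi_dirac(energy, mu, temperature)) * de
    return (n_valence - n_electrons) / volume_cm3


# Hypothetical toy DOS: valence band in [-2, -1] eV, conduction band in
# [1, 2] eV, 2 states/eV/cell each (band gap of 2 eV).
energy = np.linspace(-3.0, 3.0, 6001)
dos = 2.0 * (((energy >= -2.0) & (energy <= -1.0)) | ((energy >= 1.0) & (energy <= 2.0)))
# Undoped reference: filled valence band, integrated on the same grid.
n_valence = np.sum(dos[energy < 0.0]) * (energy[1] - energy[0])
volume_cm3 = 1.0e-22  # a 100 cubic-angstrom cell

for mu in (-1.5, 0.0, 1.5):
    print(mu, doping_concentration(energy, dos, n_valence, mu, 300.0, volume_cm3))
```

A Fermi level inside the valence band gives a positive (p-type) doping, one inside the conduction band a negative (n-type) doping, and a mid-gap Fermi level gives essentially zero, because thermally generated electron-hole pairs cancel in the difference.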

As mentioned, the first dataset provides all transport properties at fixed doping levels. If a property is needed at a different doping level, the user can turn to the second dataset, which provides properties as a function of the Fermi level. For a target doping, the user can find the Fermi level that yields this doping at the required temperature and then use the properties corresponding to that Fermi level and temperature.
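The procedure above can be sketched with linear interpolation, assuming the user has already extracted, for one temperature, arrays of Fermi levels, the corresponding doping levels and a scalar property from the second dataset. The function and array names are hypothetical and the toy data are linear placeholders, not values from the database.

```python
import numpy as np


def property_at_doping(mu, doping, prop, target_doping):
    """Return (Fermi level, property) interpolated to a target doping.

    mu: Fermi levels in eV (increasing); doping: doping at each Fermi level
    (cm^-3, positive = p-type); prop: a scalar property at each Fermi level,
    e.g. one Seebeck eigenvalue. All arrays refer to a single temperature.
    """
    mu, doping, prop = (np.asarray(a, dtype=float) for a in (mu, doping, prop))
    # Raising the Fermi level adds electrons, so doping cannot increase.
    if np.any(np.diff(doping) > 0):
        raise ValueError("doping must not increase with the Fermi level")
    if not doping.min() <= target_doping <= doping.max():
        raise ValueError("target doping outside the tabulated range")
    # np.interp expects increasing x values, hence the reversed arrays.
    mu_target = np.interp(target_doping, doping[::-1], mu[::-1])
    return float(mu_target), float(np.interp(mu_target, mu, prop))


# Hypothetical linear toy data, purely to show the call pattern.
mu_grid = np.linspace(-1.0, 1.0, 201)
doping_grid = -1.0e20 * mu_grid
seebeck_grid = 200.0 * mu_grid  # placeholder values, not physical

print(property_at_doping(mu_grid, doping_grid, seebeck_grid, 5.0e19))
```

Linear interpolation is only as good as the Fermi-level grid; because doping varies over many orders of magnitude, a fine grid around the target Fermi level is advisable.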

In both datasets, we store the full tensor and its eigenvalues, sorted in ascending order, for the Seebeck coefficient, the electronic conductivity (divided by τ), and the electronic thermal conductivity (divided by τ). The eigenvalues of the effective mass (also sorted in ascending order) are provided as well. If the value of a property along a specific crystal direction is needed, both the full tensor and the crystal structure are required. It is also important to note that when a derived property is needed (e.g., the power factor S²σ), it would be wrong to operate on the eigenvalues, since sorted eigenvalues of different tensors do not necessarily refer to corresponding directions. We therefore strongly suggest performing such operations on the full tensors; the eigenvalues can then be obtained by diagonalizing the resulting tensor.
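The advice to operate on full tensors can be illustrated with NumPy. This is a minimal sketch: writing the power-factor tensor as the matrix product σ·S·S is one common convention, assumed here rather than taken from the dataset documentation, and the helper names and toy tensors are hypothetical. The directional helper assumes that the lattice vectors and the tensors are expressed in the same Cartesian frame.

```python
import numpy as np


def power_factor_tensor(seebeck, sigma):
    """Power-factor tensor sigma . S . S from full 3x3 tensors.

    The product order is an assumed convention; for tensors diagonal in
    a common frame it reduces to S_i**2 * sigma_i along each axis.
    """
    return sigma @ seebeck @ seebeck


def sorted_eigenvalues(tensor):
    """Ascending eigenvalues (real parts, since a product need not be symmetric)."""
    return np.sort(np.linalg.eigvals(tensor).real)


def along_direction(tensor, lattice, uvw):
    """Longitudinal component n . T . n along crystal direction [uvw].

    lattice: rows are the lattice vectors, in the Cartesian frame of the tensor.
    """
    n = np.asarray(uvw, dtype=float) @ np.asarray(lattice, dtype=float)
    n /= np.linalg.norm(n)
    return float(n @ tensor @ n)


# Toy tensors (arbitrary units) whose principal values are ordered differently.
S = np.diag([1.0, 2.0, 3.0])
sigma = np.diag([3.0, 2.0, 1.0])

pf_right = sorted_eigenvalues(power_factor_tensor(S, sigma))       # [3, 8, 9]
pf_wrong = sorted_eigenvalues(S) ** 2 * sorted_eigenvalues(sigma)  # [1, 8, 27]

lattice = 4.0 * np.eye(3)  # hypothetical cubic cell with 4 angstrom edges
print(pf_right, pf_wrong, along_direction(sigma, lattice, [1, 1, 0]))
```

In this toy case, multiplying sorted eigenvalues pairs the largest Seebeck value with the largest conductivity even though they lie along different axes, producing a spurious maximum of 27 instead of 9.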

How to cite this article: Ricci, F. et al. An ab initio electronic transport database for inorganic materials. Sci. Data 4:170085 doi: 10.1038/sdata.2017.85 (2017).


## References

1. Principles of the Theory of Solids. 2nd edn (Cambridge University Press, 1972).
2. Electron Transport in Compound Semiconductors (Springer Verlag, 1980).
3. Transport coefficients from first-principles calculations. Phys. Rev. B 68, 125210 (2003).
4. BoltzTraP. A code for calculating band-structure dependent quantities. Computer Physics Communications 175, 67–71 (2006).
5. The high-throughput highway to computational materials design. Nat. Mater. 12, 191–201 (2013).
6. Computational high-throughput screening of electrocatalytic materials for hydrogen evolution. Nat. Mater. 5, 909–913 (2006).
7. Novel ternary sulfide thermoelectric materials from high throughput transport and defect calculations. J. Mater. Chem. A 4, 11086–11093 (2016).
8. Novel mixed polyanions lithium-ion battery cathode materials predicted by high-throughput ab initio computations. J. Mater. Chem. 21, 17147–17153 (2011).
9. Computational predictions of energy materials using density functional theory. Nature Reviews Materials 1, 15004 (2016).
10. From the computer to the laboratory: materials discovery and design using first-principles calculations. Journal of Materials Science 47, 7317–7340 (2012).
11. Finding unprecedentedly low-thermal-conductivity half-Heusler semiconductors via high-throughput materials modeling. Phys. Rev. X 4, 011019 (2014).
12. Phosphates as lithium-ion battery cathodes: An evaluation based on high-throughput ab initio calculations. Chemistry of Materials 23, 3495–3508 (2011).
13. New opportunities for materials informatics: Resources and data mining techniques for uncovering hidden relationships. Journal of Materials Research 31, 977–994 (2016).
14. Relating voltage and thermal safety in Li-ion battery cathodes: a high-throughput computational study. Phys. Chem. Chem. Phys. 17, 5942–5953 (2015).
15. Research update: The Materials Genome Initiative: Data sharing and the impact of collaborative ab initio databases. APL Mater. 4, 053102 (2016).
16. The Materials Project.
17. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).
18. Materials design and discovery with high-throughput density functional theory: The Open Quantum Materials Database (OQMD). JOM 65, 1501–1509 (2013).
19. High-throughput electronic band structure calculations: Challenges and tools. Computational Materials Science 49, 299–312 (2010).
20.
21. The Harvard Clean Energy Project: Large-scale computational screening and design of organic photovoltaics on the World Community Grid. The Journal of Physical Chemistry Letters 2, 2241–2251 (2011).
22. Automated search for new thermoelectric materials: The case of LiZnSb. Journal of the American Chemical Society 128, 12140–12146 (2006).
23. Understanding thermoelectric properties from high-throughput calculations: trends, insights, and comparisons with experiment. J. Mater. Chem. C 4, 4414–4426 (2016).
24. High-throughput exploration of alloying as design strategy for thermoelectrics. Phys. Rev. B 92, 085205 (2015).
25. Designing high-performance layered thermoelectric materials through orbital engineering. Nature Communications 7, 10892 (2016).
26. High-throughput study of the structural stability and thermoelectric properties of transition metal silicides. New Journal of Physics 15, 105010 (2013).
27. Identification and design principles of low hole effective mass p-type transparent conducting oxides. Nature Communications 4, 2292 (2013).
28. Effective mass and Fermi surface complexity factor from ab initio band structure calculations. npj Computational Materials 3, 8 (2017).
29. Charting the complete elastic properties of inorganic crystalline compounds. Scientific Data 2, 150009 (2015).
30. A database to enable discovery and design of piezoelectric materials. Scientific Data 2, 150053 (2015).
31. The Hall Effect in Metals and Alloys (Springer US, 1972).
32. Precise effective masses from density functional perturbation theory. Phys. Rev. B 93, 205147 (2016).
33. How does chemistry influence electron effective mass in oxides? A high-throughput computational analysis. Chemistry of Materials 26, 5447–5458 (2014).
34. High-mobility bismuth-based transparent p-type oxide from high-throughput material screening. Chemistry of Materials 28, 30–34 (2016).
35. High-throughput design of non-oxide p-type transparent conducting materials: Data mining, search strategy, and identification of boron phosphide. Chemistry of Materials 29, 2568–2573 (2017).
36. Computational Methods in Band Theory, 362 (Plenum, 1971).
37. On the interpolation of eigenvalues and a resultant integration scheme. Journal of Computational Physics 67, 253–262 (1986).
38. Smooth Fourier interpolation of periodic functions. Phys. Rev. B 38, 2721–2726 (1988).
39. Anisotropic normal-state transport properties predicted and analyzed for high-Tc oxide superconductors. Phys. Rev. B 37, 7482–7490 (1988).
40. Calculated thermoelectric properties of La-filled skutterudites. Phys. Rev. B 56, R1650–R1653 (1997).
41. Band structure engineering through orbital interaction for enhanced thermoelectric power factor. Applied Physics Letters 104, 082107 (2014).
42. Analysis of the thermoelectric properties of n-type ZnO. Phys. Rev. B 83, 115110 (2011).
43. Computational and experimental investigation of TmAgTe2 and XYZ2 compounds, a new group of thermoelectric materials identified by first-principles high-throughput screening. J. Mater. Chem. C 3, 10554–10565 (2015).
44. YCuTe2: a member of a new class of thermoelectric materials with CuTe4-based layered structure. J. Mater. Chem. A 4, 2461–2472 (2016).
45. Electronic structure and transport in type-I and type-VIII clathrates containing strontium, barium, and europium. Phys. Rev. B 68, 125212 (2003).
46. Transport properties of composition tuned α- and β-Eu8Ga16−xGe30+x. Phys. Rev. B 71, 165206 (2005).
47. Relationship between composition and charge carrier concentration in Eu8Ga16−xGe30+x clathrates. Phys. Rev. B 71, 165205 (2005).
48. BoltzWann: A code for the evaluation of thermoelectric and electronic transport properties with a maximally-localized Wannier functions basis. Computer Physics Communications 185, 422–429 (2014).
49. Maximally localized generalized Wannier functions for composite energy bands. Phys. Rev. B 56, 12847–12865 (1997).
50. Automated construction of maximally localized Wannier functions: Optimized projection functions method. Phys. Rev. B 92, 165134 (2015).
51. A high-throughput infrastructure for density functional theory calculations. Computational Materials Science 50, 2295–2310 (2011).
52. The Materials Project wiki page.
53. Ab initio molecular dynamics for liquid metals. Phys. Rev. B 47, 558–561 (1993).
54. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys. Rev. B 54, 11169–11186 (1996).
55. Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865–3868 (1996).
56. Projector augmented-wave method. Phys. Rev. B 50, 17953–17979 (1994).
57. From ultrasoft pseudopotentials to the projector augmented-wave method. Phys. Rev. B 59, 1758–1775 (1999).
58. Band theory and Mott insulators: Hubbard U instead of Stoner I. Phys. Rev. B 44, 943–954 (1991).
59. Electron-energy-loss spectra and the structural stability of nickel oxide: An LSDA+U study. Phys. Rev. B 57, 1505–1509 (1998).
60. New developments in the Inorganic Crystal Structure Database (ICSD): accessibility in support of materials research and design. Acta Crystallographica Section B 58, 364–369 (2002).
61. The Inorganic Crystal Structure Data Base. Journal of Chemical Information and Computer Sciences 23, 66–69 (1983).
62. Intern. Tables for Crystall. D Vol. 1, Chap. 8 220–227 (Cambridge University Press, 2006).
63. Doping-induced dimensional crossover and thermopower burst in Nb-doped SrTiO3 superlattices. Phys. Rev. B 88, 045310 (2013).
64. Thermopower in oxide heterostructures: The importance of being multiple-band conductors. Phys. Rev. B 86, 195301 (2012).
65. Prediction of a native ferroelectric metal. Nature Communications 7, 11211 (2016).
66. Nontrivial behavior of the thermoelectric power: Electron-electron versus electron-phonon scattering. Phys. Rev. B 61, 5303–5310 (2000).
67. Ab initio electronic transport model with explicit solution to the linearized Boltzmann transport equation. Phys. Rev. B 91, 235123 (2015).
68. Electron-phonon interaction using Wannier functions. Phys. Rev. B 76, 165108 (2007).
69. First principles explanation of the positive Seebeck coefficient of lithium. Phys. Rev. Lett. 112, 196603 (2014).
70. Calculations of transport properties with the linearized augmented plane-wave method. Phys. Rev. B 61, 1639–1642 (2000).
71. FireWorks: a dynamic workflow system designed for high-throughput applications. Concurrency and Computation: Practice and Experience 27, 5037–5059 (2015).
72. Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis. Computational Materials Science 68, 314–319 (2013).
73. The Materials Application Programming Interface (API): A simple, flexible and efficient API for materials data based on REpresentational State Transfer (REST) principles. Computational Materials Science 97, 209–215 (2015).
74. Architectural Styles and the Design of Network-based Software Architectures. Ph.D. thesis, University of California, Irvine (2000).
75. Data-driven review of thermoelectric materials: Performance and resource considerations. Chemistry of Materials 25, 2911–2920 (2013).
76. MRL, Material Research Laboratory.
77. Landolt-Börnstein: Group III Condensed Matter.
78. Introduction to Modeling Thermoelectric Transport at High Temperatures Vol. 1, Chap. 11 (CRC Press, 2012).


## Acknowledgements

This work was intellectually led by the U.S. Department of Energy, Office of Basic Energy Sciences, Early Career Research Program, which funded A.J.’s portion of this work. F.R. and G.H. were supported by the F.R.S.-FNRS project HTBaSE (contract no. PDR-T.1071.15). W.C., U.A., and G.J.S. acknowledge funding from the Materials Project Center, supported by the DOE Basic Energy Sciences Grant No. EDCBEE. This work made use of resources of the National Energy Research Scientific Computing Center (NERSC), supported by the Office of Basic Energy Sciences of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Additional computational resources were provided by the supercomputing facilities of the Université catholique de Louvain (CISM/UCL) and the Consortium des Equipements de Calcul Intensif en Fédération Wallonie Bruxelles (CECI), funded by the F.R.S.-FNRS. The authors thank Shyue Ping Ong for his contributions to maintaining and developing pymatgen, which was used heavily in this work. They also thank Max Wood and Jan-Hendrik Pöhls for contributing to the review process of the article.

## Affiliations

1. ### Institute of Condensed Matter and Nanosciences (IMCN), Université catholique de Louvain, Chemin des étoiles 8, bte L7.03.01, Louvain-la-Neuve, Belgium

• Francesco Ricci
• Gian-Marco Rignanese
• Geoffroy Hautier
2. ### Lawrence Berkeley National Lab, 1 Cyclotron Rd, Berkeley, California, USA

• Wei Chen
• Anubhav Jain

• Wei Chen
4. ### Department of Materials Science and Engineering, Northwestern University, 2220 Campus Drive, Evanston, Illinois 60208, USA

• Umut Aydemir
• G. Jeffrey Snyder

## Authors

### Contributions

F.R. worked on the analysis, verification and preparation of the data. F.R. implemented and performed the band interpolation check. A.J., G.H., F.R., G.-M.R., and W.C. worked on the development, implementation and running of the high-throughput band structure and BoltzTraP framework. U.A. and G.J.S. worked on the analysis and verification of the data. G.H. supervised the work. F.R., G.H., and A.J. wrote the manuscript with help from U.A., G.J.S., and G.-M.R.

### Competing interests

The authors declare no competing financial interests.

## Corresponding author

Correspondence to Geoffroy Hautier.