Database of Wannier tight-binding Hamiltonians using high-throughput density functional theory

Wannier tight-binding Hamiltonians (WTBH) provide a computationally efficient way to predict electronic properties of materials. In this work, we develop a computational workflow for high-throughput Wannierization of density functional theory (DFT) based electronic band structure calculations. We apply this workflow to 1771 materials (1406 3D and 365 2D), and we create a database with the resulting WTBHs. We evaluate the accuracy of the WTBHs by comparing the Wannier band structures to directly calculated spin-orbit coupling DFT band structures. Our testing includes k-points outside the grid used in the Wannierization, providing an out-of-sample test of accuracy. We illustrate the use of WTBHs with a few example applications. We also develop a web-app that can be used to predict electronic properties on-the-fly using WTBH from our database. The tools to generate the Hamiltonian and the database of the WTB parameters are made publicly available through the websites https://github.com/usnistgov/jarvis and https://jarvis.nist.gov/jarviswtb.


Background & summary
Wannier functions (WF) were first introduced 1 in 1937, and have proven to be a powerful tool in the investigation of solid-state phenomenon such as polarization, topology, and magnetization 2 .
Mathematically, WFs are a complete orthonormalized basis set that act as a bridge between a delocalized plane wave representation commonly used in electronic structure calculations and a localized atomic orbital basis that more naturally describes chemical bonds [3][4][5] . One of the most common ways of obtaining Wannier tight-bonding Hamiltonians (WTBH) [6][7][8] is by using the Wannier90 software package 9 to generate maximally localized Wannier functions, based on underlying density functional theory (DFT) calculations. However, obtaining high-quality Wannier functions requires several choices by code users, including which bands and energy ranges to Wannierize, as well as a choice of starting orbitals. Therefore, in order to unlock the many materials properties that can be calculated with WTBH for use in high-throughput computations, we provide tools to automate the Wannierization of DFT band structures, and we generate a database of verified WTBH for use in future applications.
The computational advantage of Wannier functions comes from their localization, which allows the WTBH to be determined once on a relatively coarse real-space grid, and then Fourier transformed to obtain the Hamiltonian and its derivatives at arbitrary k-points, allowing many expressions to be evaluated efficiently 10 . Many computationally expensive quantities such as Z2 index, Chern number, Fermi-surfaces, Weyl-chirality, Hall conductivity, spin-texture, photogalvanic effect, thermoelectric properties, thermal properties, Landau level applications, gyrotropic effects, and shift-photocurrent 9,11-16 can be efficiently computed with WTBHs. In addition, many materials properties are based on localized phenomena [17][18][19] such as impurities 20 , defects 21,22 , excitons 23 , polarons 24 , screened electron-electron interaction 25 , and electron-phonon interactions 26 , all of which can be modeled in a Wannier basis 27 . In addition, an examination of the Wannier Hamiltonian can provide an intuition to help understand bonding that is difficult to get from plane waves. They are also useful in second quantization based beyond DFT calculations such and Dynamical Mean Field Theory (DMFT) calulations 28,29 .
The goal of this paper is to: a) develop a high throughput workflow for Wannierization of DFT  40,43 , topological materials 38,39 , and computational STM images 42 . The JARVIS-DFT database consists of ≈ 40000 3D and ≈1000 2D materials. As an initial step, we deploy our computational workflow on the materials that were recently predicted to be topologically nontrivial based on the spin-orbit spillage technique, including three dimensional (3D), two dimensional (2D), magnetic, non-magnetic, insulating, and metallic systems 38,39 systems including spin-orbit interactions. After obtaining the WTBH from DFT, we perform several checks to ensure the quality of the Hamiltonians. Although here we present results for mainly high-spillage materials, we will be extending this workflow to the entire JARVIS-DFT database. Currently, we have calculated Wannier Hamiltonians including spin-orbit coupling for 1406 3D and 365 2D materials, which can be used to efficiently calculate materials properties using either our software tools or other external software such as Wannier-tools 11 , Z2Pack 54 , WOPTIC 55 , EPW 56 . We believe that releasing this database and toolset for use by the materials community should enable accelerated materials prediction and analysis.

Methods
The methodology supporting the current project consists of several steps that are given in Fig. 1. DFT calculations were carried out using the Vienna Ab-initio simulation package (VASP) 57,58 software using the workflow given on our JARVIS-Tools github page (https://github.com/usnistgov/jarvis). Please note commercial software is identified to specify procedures. Such identification does not imply recommendation by National Institute of Standards and Technology (NIST). We use the OptB88vdW functional 59 , which gives accurate lattice parameters for both vdW and non-vdW (3D-bulk) solids 60 . We optimize the crystal-structures of the bulk and monolayer phases using VASP with OptB88vdW. Because spin-orbit coupling (SOC) is not currently implemented for OptB88vdW in VASP, we carry out spin-orbit PBE calculations.
Such an approach has been validated by Refs. 38,61 . The crystal structure was optimized until the forces on the ions were less than 0.01 eV/Å and energy less than 10 -6 eV. We use Wannier90 9 to construct Maximally-Localized Wannier Functions (MLWF) based TB-Hamiltonians.
The basic formalism of Wannierization is well-established. We briefly review some aspects here, interested readers can see longer discussions in 5,16 . For a set of Bloch eigenvectors | , ⟩, a general set of WFs | ⟩ (n=1...N) can be written as: where labels the unit cell of the WF, V is the volume of the unit cell, and ( ) is an arbitrary unitary matrix. To construct maximally-localized WFs, ( ) is chosen to minimize the following spread functional: where ̅ = ⟨0 | |0 ⟩ and 〈 2 〉 = ⟨0 | 2 |0 ⟩. The minimization proceeds iteratively, based on an initial guess of localized orbitals.
For the case of interest in this work, where we wish to describe both the valence and conduction bands near the Fermi level, it is necessary to first select a set of bands to Wannierize, which includes separating the conduction bands from the free-electron-like bands that generally overlap with them in energy 62 . The procedure to determine this localized subspace of Bloch wavefunctions proceeds similarly to minimization described above, where after an initial guess, the subspace is iteratively updated in order to minimize the spread function in Eq. 2. After this initial disentanglement step, the Wannierization of the selected subspace proceeds as described above.
Due to the iterative non-linear minimization employed during both the disentanglement and Wannierization steps, the localization and utility of the final Wannier functions depend in practice on the initial choice of orbitals that are used to begin the disentanglement procedure, and which are then used as the initial guess for the Wannierization. Our initial guesses consist of a set of atomic orbitals we have chosen to describe all the chemically relevant orbitals for each element in typical elemental systems and compounds. We provide the list of the orbitals we select for each element in Table S1. For many specific materials, it may be possible to select a smaller set of orbitals while still maintaining high-quality WFs that describe the bands of interest; however, our fairly inclusive set of orbitals is able Wannierize nearly all compounds in a high-throughput manner without human intervention. Because most applications of WFs are computationally inexpensive compared to the DFT calculations used to construct the WFs, in practice, our larger Wannier basis has only minimal computational cost. However, it is necessary to have enough empty bands in the underlying DFT calculation such that any empty orbitals chosen are included in the Bloch basis. We do not include any semicore orbitals in our Wannier basis, as they are generally well-separated in energy from the valence orbitals and are not necessary to describe bands near the Fermi level, and we exclude semicore bands from the disentanglement.
During the disentanglement step, it is possible to choose an energy range that is included exactly ("the frozen window"), with the result that the Wannier band structure will exactly match the DFT band structure in this energy range and at the grid of k-points used in the Wannierization. We use a frozen window of ± 2 eV around the Fermi-energy. This window ensures that bands near the Fermi level are well described by the WTBH. For cases where the original WFs were unsatisfactory (see below), we found that lowering the lower bound of this window to include all of the valence bands often improves that WTBH, which we use as a second Wannierization setting.
In order to validate our WTBH, we calculate the maximum absolute difference ( ) between the Wannier and DFT eigenvalues within an energy range of ± 2eV around the Fermi level: As discussed above, at the grid of k-points used in the construction of the WFs and within the frozen window, the eigenvalues should match exactly by construction. Therefore, to test the WTBH, we evaluate Eq. 3 on the dense lines of k-points used to generate our band structure plots, which tests the WFs out of sample. A weakness of this evaluation method is that highly dispersive energy bands (high ) can result in high values even if the WTBH is of good quality because any slight shift in the k-direction of a dispersive band will result in a large energy error. We consider that systems with less than 0.1 eV to useful for most applications, and we provide data for the user to evaluate individual WTBH for their own applications.

Data records
After the calculations, the TB Hamiltonians, Wannier90 input and outputs files are stored as tar files which will be distributed through the JARVIS-DFT

Technical validation
To validate the WTBHs generated in this work, we compare the Wannier electronic bands with directly calculated DFT bands and measure the differences using Eq. 3 on two different k-point grids. As an example, in Fig. 1 As expected, the agreement within the frozen window and on the dense k-point grid is almost exact, but quickly increases up to 0.25 eV when leaving the window. We find a larger but still small energy difference on the high symmetry grid Fig. 1c-d, with a maximal error in the frozen window of 9 meV. This test shows that this WTBH can be used to interpolate the band structure accurately, and at the greatly reduced computational cost compared to DFT.

Fig. 2 Comparison of DFT and WTB bandstructures for Bi2Se3. a-b) on dense k-grid, c-d) highsymmetry Brillouin zone points.
We show a few more examples of 3D WTBH in Fig. 3 for Si, PbTe, Sb2Te3 and Na3Bi, this time focusing only on the difference for the high-symmetry k-point grids. Similar to the Bi2Se3 case discussed above, they show the minimal difference, and the WTBH are able to reproduce features such as the Dirac point band crossing of Na3Bi between Γ and A.

Fig. 3 Examples of Wannier and DFT bandstructure and their energy difference plot for example 3D materials. a) Si, b) PbTe, c) Sb2Te3, and d) Na3Bi.
Bi2Se3, shown in Fig. 1, is a classic example of a 3D topological insulator. We show similar examples of 2D topological materials for graphene, ZrFeCl6, Ti2Te2P, and VAg(PSe3)2 in Fig. 4.
A detailed topological analysis of these materials can be found in our previous works 38 . Similar to the Bi2Se3 case, we observe that the DFT and WTBH bands overlap within the ± 2eV window and start to separate for outside these ranges. We again find excellent agreement between the DFT and the Wannier bands. Similar figures will be available for all the WTBH produced in this work on our website, so that the user can evaluate the WTBH for their own applications.

Fig. 4 Examples of Wannier and DFT bandstructure and their energy difference plot for example 2D materials. a) for graphene, b) for ZrFeCl6, c) for Ti2Te2P, and d) for VAg(PSe3)2.
As is clear from the above examples, it is important to evaluate the energy difference between the DFT and WTBH bands to ensure a high-quality Wannierization. We use the maximum value of these differences (MaxDiff) for each k-point and in the disentanglement window range (±2 eV) as the measure of the quality of WTBHs (see Eq. 6). We calculate these differences for both the kpoint grid and high-symmetry BZ points. Choosing a tolerance of 0.1 eV as the maximum energy difference, we find that 93.0 % of materials have a dense k-mesh MaxDiff less than the tolerance, while only 64 % of materials have high-symmetry BZ MaxDiff less than the tolerance as shown in Fig. 5a and 5b respectively. These larger discrepancies mainly occur for metallic systems such as Al, which have very dispersive electronic bands that naturally result in larger errors as disused earlier (see also Fig. (S1)).

Fig. 5 DFT-TB maximum difference (μ) distribution for all the Wannier Tight-binding Hamiltonians (WTBHs). A) on a regular k-point grid, b) on high-symmetry k-points.
Next, we show a few example applications to demonstrate the usefulness of the WTB Hamiltonians. In Fig. 6a 38 , and the resulting spinpolarized conducing edge channel can be visualized in Fig.6c. Finally, in Fig. 7 we show a screenshot of a web-app we are developing to allow users to calculate materials properties using WTBH directly from our database, without downloading the Hamiltonians themselves. Currently, we support the calculation of Wannier-projected band structures for arbitrary k-points, as well as projected DOS. In addition, we provide plots to evaluate the accuracy of the WTBH. We plan to add other WTBH related functionalities in the app soon. Fig. 7 Snapshot of the web-app.

Usage notes
The database presented here represents the largest collection of consistently calculated Wannier tight binding Hamiltonians of materials using density functional theory assembled to date. We anticipate that this dataset, and the methods provided for access will provide a useful tool in fundamental and application-related studies of materials. Our actual DFT verification provides insight into understanding the applicability and limitation of our the WTBH data. The WTBH can be used to obtain important electronic properties such as band-structures, density of states, and topological invariants in a computationally efficient way. Data-analytics tools can also be applied on the generated dataset.

Contributions
Both KFG and KC contributed in developing the workflow, analyzing data and writing the manuscript.

Competing interests
The authors declare no competing interests.

Code availability
Python-language based scripts for obtaining and analyzing the dataset are available at https://github.com/usnistgov/jarvis