High performance Wannier interpolation of Berry curvature and related quantities: WannierBerri code

Wannier interpolation is a powerful tool for evaluation of Brillouin zone integrals over a dense grid of $\mathbf{k}$ points, which is essential in e.g. anomalous Hall conductivity or Boltzmann transport coefficients. However new physical problems and new materials create new numerical challenges, and the computations with the existing codes become very computationally expensive, which often prevents reaching the desired accuracy. In this article I present a series of methods which allow to boost the speed of Wannier interpolation by several orders of magnitude compared to then the popular code Wannier90. The suggested methods include a combination of fast and slow Fourier transform, explicit use of symmetries, recursive adaptive grid refinement and some other techniques. The suggested methodology is implemented in the new python code WannierBerri, which also aims to serve as a convenient platform for development of further interpolation schemes for novel phenomena

Wannier functions [1,2] (WFs) have proved extremely efficient in evaluation of multiple properties of solids, including the modern theory of polarization [3][4][5] and orbital magnetization [6][7][8] and topological properties [9,10].Moreover, WFs provide a way to describe a group of energy bands in a crystal by a compact Hamiltonian, thus allowing a fast evaluation of the bandstructure at any point of the Brillouin zone (BZ) without an extra call to the ab initio code.This procedure called Wannier interpolation (WI) is similar in spirit to tight-binding calculations [11], but allows a systematic precise description of the bands without truncation error.WI is particularly useful in search of Weyl points [10] and evaluation of momentum-space integrals of rapidly varying functions, which appear, e.g. in calculations of anomalous Hall effect [12], orbital magnetization [13], Boltzmann transport coefficients [14], and optical properties [15].WI schemes were also developed for , e.g., electron-phonon coupling [16,17], gyrotropic effects [18], shift current [19] and spin Hall and effect [20,21].
While the WFs may be constructed in multiple ways, the most popular technique is the maximal localization [2,22], and the most established code for construction of maximally localized Wannier functions (MLWFs) is Wannier90 [23,24], which is now driven by a broad community [25].Generally, the construction of a good set of WFs requires a careful manual input, including a choice of an initial guess for the minimization procedure, as well as a choice of disentanglement energy windows [26].However, recently a significant progress was achieved in automated construction of WFs, employing different techniques, such as optimized projection functions method [27], selected columns of the density matrix (SCDM) method [28][29][30], or automated choice of projections and energy windows [31,32], and a database of Wannier Hamiltonians is being constructed [33].These advances are a big step towards employing WI for highthroughput automated calculations of electronic proper-ties of solids.
Most of the mentioned WI schemes are implemented within the popular codes -Wannier90 (namely it's post-processing part postw90.x)and WannierTools [10].These codes, being well-established and accepted by the community have quite a broad functionality.However, new materials and new physical effects create new numerical challenges, and calculation with those codes become quite heavy.In particular for a system with a large number of WFs and a complicated Fermi surface it maybe hard to achieve convergence with respect to momentumspace grid in a reasonable time.When speaking about high-throughput calculations the performance becomes even more important.
In this article I present a series of methodological improvements, which allow to perform Wannier interpolation much faster then postw90.x.The introduced methodology is implemented in the new python code WannierBerri [34] (WB), which is freely available to install [35] and is open for contributors [36].The code has a broad functionality, and already a number of contributors.The present article covers only the core methodology of the code which procure its efficiency.More broad scope of the code can be found on its web page [34] and will be detailed in future publications with the co-authorship of co-developers.Interesting to note that WannierBerri may be equally used for WI and tightbinding calculations, and also offers a convenient platform for development of new features.The name of the code is derived from Wannier functions and the Basque word "berri" which means "new" and enters local toponyms (e.g.Lekunberri, Ekainberri) [37].
The high efficiency of calculations is achieved by implementing several numerical approaches.First, note that in a typical calculation using postw90.x, the bottleneck is the Fourier transform, which is implemented as a standard discrete Fourier transform.Therefore it looks appealing to use a fast Fourier transform (FFT) which is widely used in numerical calculations.However it is problematic to do it over a very dense grid of k-points which may include upto 10 8 points.This issue is overcome by a mixed scheme employing both fast and "slow" Fourier transform.Next the symmetries of the system may be used, to reduce the evaluation only to the symmetryirreducible points, and obtain the result for the rest of points by applying symmetry operations.This also helps to obtain the result more symmetric (tensor components that should be equal or vanish will be exactly equal, or exactly vanish), even if the symmetries are slightly broken due to numerical inaccuracies in wannierization.Then I introduce an adaptive refinement algorithm, that chooses points that give the largest contribution to the integral, and makes the grid more dense in the vicinity of such points.This helps to have a more accurate description near special points, where the integrand is rapidly changing -e.g.near Weyl nodes or lines.This is close in spirit to adaptive refinement used in [12,38], but is more automatic and requires less input from the user.Finally, I introduce methods which reduce the computational cost of the minimal-distance replica selection method [25] and scan of multiple Fermi levels to a negligible time.
The rest of the article is organized as follows.Sec.II describes the methods wich help to boost the speed of the calculation.Sec.III demonstrates the usage of the code on the 'textbook' example of the anomalous Hall conductivity of bcc iron, and in sec.IV based on that example the comparison the evaluation time with postw90 is made.Appendix A describes the list of implemented functionality.

A. General equations for Wannier interpolation
The goal of this section is to introduce notation necessary for further discussion.For more details please refer to review [2] and original articles cited therein.The problem of Wannier interpolation is stated in the following way.First we evaluate the energies E nq and wavefunctions ψ nq ≡ e iqr u nq from first principles on a rather coarse grid of wavevectors q.Next we want to find the energies and wavefunctions at points on a denser grid of wavevectors k.Further we will consistently use q and k to denote the ab initio and interpolation grids respectively.
For a group of entangled bands one can define a set of J Wannier functions defined as where J q ≥ J.The matrices V mn (q) are constrained by Jq m=1 V * mn (q)V mn (q) = δ nn and are chosen in such a way that the Wannier functions are localized, which yields that the Bloch wavefunctions in the Wannier gauge vary slowly with the k vector, unlike the true wavefunctions.Now let us see how Wannier functions may be used to interpolate the band energies.First, one evaluates the matrix elements of the Hamiltonian: Next, to obtain energies at an arbitrary point k one needs to construct the Wannier Hamiltonian which further may be diagonalized as In a similar way, for any operator X, for which the matrix elements are evaluated on the ab initio grid, one may obtain the real-space matrix elements and then may be interpolated to any k point in the Wannier gauge by and further be rotated to the Hamiltonian gauge For example, it was shown [12] that the total Berry curvature of the occupied manifold is interpolated via expression (9) where the ingredients of the equation are obtained using eqs.(7), (8) The anomalous Hall conductivity is evaluated as an integral: Note, that while the direct Fourier transform (6) is performed only once for the calculation, and is not repeated for the multiple k point upon interpolation, the inverse Fourier transform (7) is repeated for every interpolation k point.And in fact it presents the most time-consuming part of the calculation involving Wannier interpolation as implemented in the Wannier90 code.

B. Mixed Fourier transform
In this section we will see how the evaluation of Eq. ( 7) may be accelerated.It is easy to see that the computation time of a straightforward evaluation of discrete Fourier transform scales as t ∝ N R N k , where we are typically interested in a case N k N R (N R ≈ N q ).When the Fourier transform is done on a regular grid of k points, it is usually appealing to use the fast Fourier transform (FFT).For that one needs to place the R vectors on a regular grid of size N k , fill the missing spots with zeros and perform the standard FFT, which will scale as t ∝ N k log N k .However the drawbacks are manifold.First, FFT would give some advantage for moderate k grids, but not for really large ones (N k N R ), which are the main goal of doing Wannier interpolation at all.Moreover doing a FFT on a large grid means storing the data for all k-points in memory at the same time, which becomes a severe computational limitation.Also FFT does not allow reduce computation to only the symmetry-irreducible k-points, and is more difficult to do in parallel.
However there is a way to combine the advantages of both the FFT and the usual discrete Fourier transform, leading to the concept of mixed Fourier transform.
We want to evaluate Eq. ( 7) for a set of k points.
where 0 [39] Then the set of points ( 11) is equivalent to a set of points k = K + κ, where where 0 , which shows a 2×2 K-grid, each corresponding to 4×4 FFT grid (dots of a certain color).Now for each K-point we can define and then (7) reads as The principle idea of mixed Fourier transform consists in performing the fourier transform (14) as FFT, while ( 13) is performed as a usual discrete Fourier transform.The advantages of this approach are the following.First, the computational time scales as t 1 ∝ N K N R for (13) and we have t 1 t 2 ∝ N k log N FFT , which scales better then both the Fast and 'slow' Fourier transform.Next, we can perform Eqs. ( 13) and ( 14) independently for different K-points.This saves us memory, and also offers a simple parallelization scheme.Also we can further restrict evaluation only to symmetry irreducible K-points (Sec.II C) and also perform adaptive refinement over Kpoints (Sec.II D).
Moreover, the evaluation time of a mixed Fourier transform only logarithmically depends on the size of the ab initio grid (recall that N FFT ∼ N R ∼ N q ), while for the slow Fourier transform, the dependence is linear.However, in practice we will see (sec.IV) that the Fourier transform in the present implementation consumes only a small portion of computational time, and therefore the overall computational time is practically independent of the size of the ab initio grid.

C. Symmetries
When we integrate some quantity over the BZ, at every K-point (after summing over κ points) we obtain the result as a rank-m tensor X i1,...,im (K).Then the BZ integral is expressed as a sum and initially set {K} as a regular grid of K points, and w k = 1/N K .Suppose G is the point group of the system.We define the set of symmetryirreducible K-points as a a set of points that ∀K, K ∈ irr, ∀g ∈ G holds gK = K , unless g = E (identity).Then we can split the sum (15) as Where in the second line we choose g K such that g −1 K K ∈ irr.(this choice may be not unique).Thus, only the irreducible K points need to be evaluated.Next, to make sure that the result respects the symmetries, despite possible numerical inaccuracies, we symmetrize the result as: Note, that X = X if the model respects the symmetry precisely (e.g. when symmetry-adapted Wannier functions [40] are used).Combining ( 16) and ( 17) and using  where G • K denotes the orbit of K under action of group G.The latter equation reflects the implementation in the WB code.Starting from a regular grid of K points we search for pairs symmetry-equivalent points.Whenever such a pair is found, one of the points is excluded and it's weight is transferred to the other point.Compare fig 1(a) and (b): the red points are removed and their weight is moved to green points.Thus we end with a set of irreducible K-point with weights G•K K w K .Next we evaluate X(K) (employing the corresponding interpolation scheme) only at symmetry-irreducible K-points.Note, that although some k-points corresponding to the same K-point (same color in fig. 1) are equivalent, we have to evaluate them all to be able to use the FFT.Finally, after summation, we symmetrize the result.The described procedure achieves two goals: (i) reduce the computational costs and (ii) make the result precisely symmetric, even if the Wannier functions are not perfectly symmetric.

D. Recursive adaptive refinement
It is well known that in calculations of quantities involving Berry curvature, orbital moments, etc , one performs integration over k-space of a function that strongly depends on k.As a result, small areas of k-space give the major contribution to the integral.Such areas often appear in the vicinity of Weyl points, nodal lines, as well as avoided crossings.To accelerate convergence with respect to the number of k points, we utilize adaptive mesh refinement similar to Refs.[12,38].The authors of [12,38] assumed a pre-defined threshold, and the kpoints yielding Berry curvature above the threshold were refined.This is inconvenient because one needs a good intuition to guess an optimal value for this threshold, because it depends both on the quantity one wants to calculate, and the material considered.
In WB it is implemented in a way that does not require initial guess from the user.This procedure, in combination with symmetrization described above, is illustrated in Fig. 1.After excluding symmetry-equivalent K-points (Fig. 1b) the results are evaluated for every K point and stored.Then we pick a few "most important K-points".The criteria of importance may be different -either the Maximal value for any E F , or maximal value summed over all E F , or yielding most variation over the E F .Suppose it is the magenta point.Then those points are refined -replaced with 8 points around it (in Fig. 1c 4 new blue K-points in the 2D case), with weight of the initial point distributed over the new points.Then the symmetrization is applied again (the four blue points are connected by 4-fold rotation) to exclude the equivalent points.After the new K-points are evaluated, we go to the next iteration of refinement.On each iteration any point may be refined, including both those from the initial regular grid, and those created during previous refinement iterations.The procedure stops after the pre-defined number of iterations was performed.Minimal-distance replica selection method Minimal-distance replica selection (MDRS) method was shown to be very efficient in the Wannier interpolation [25], in particular when moderate q-grids are used in the abinitio calculations.With MDRS method is the Fourier transform ( 7) is modified in the following way: Where T (j) mnR are N mnR lattice vectors that minimise the distance |r m −(r n +R+T)| for a given set m, n, R.However, the evaluation of Eq. ( 19) is quite slower then (7), because every k, m, n, R an extra loop over j is needed.Therefore calculations employing MDRS in postw90.x(which is enabled by default) takes more time.Instead it is convenient to re-define the modified real-space matrix elements as mnR (20) only once for the calculation, and then the transformation to k-space is performed via The latter expression, having essentially same form as (7), can be evaluated via mixed Fourier transform, as described in Sec.II B.
Thus the MDRS method implemented in WB via Eqs.( 20)-( 21), and has no extra computational cost, while giving notable accuracy improvement.

F. Evaluation of Fermi-sea integrals
It is often needed to study anomalous Hall conductivity (AHC) not only for the pristine Fermi level E F , but considering it as a free parameter .On the one hand it gives an estimate of the accuracy of the calculation, e.g.sharp spikes may indicate that the result is not converge.On the other hand -dependence gives access to the question of the influence of doping and temperature, and also allows calculation of anomalous Nernst effect (see eq. (A1)).As implemented in postw90.x,evaluation of multiple Fermi levels has a large computational cost.However there is a way to perform the computation of AHC for multiple Fermi levels without extra computational costs.To show this let's rewrite ( 9), (10) where k = K + κ, and the definitiogns of P n and Q ln straightly follow from (9).Now suppose we want to evaluate O( i ) for a series of Fermi levels i .For different k-points and Fermi levels the sets of occupied O(k, ) and unoccupied states U (k, ) change and repeating this summations many times may be computationally heavy.Instead we note that when going from one Fermi level i to another i+1 only a few states at a few k-points change from unoccupied to occupied.Let's denote the set of such k-points as δκ i then, the change of the total Berry curvature is where δO i (k) ≡ O(k, i+1 ) − O(k, i ).Note that if the step i+1 − i is small, then δK i and δO i (k) include only few elements, if not empty.Hence the evaluation of 23 swill be very fast.Thus, the full summation Eq. ( 22) is needed only for the first Fermi level.In a similar way this approach may be applied to orbital magnetization and other Fermi-sea properties.E.g. the orbital magnetization may be written as (24) where C mn,γ (R) ≡ 0m|r α • Ĥ • (r β − R β )|Rn and the other ingredients were explained under eq.( 9).Eq. ( 24) is written following the approach of Ref. 13, but the result has a different form, which can be straightforwardly processed by analogy with eqs.( 22), (23).

III. EXAMPLE: AHC OF BCC FE
In this section the usage of the WannierBerri code is demonstrated on a simple example -anomalous Hall conductivity of bcc iron.First we performed the ab-initio calculations using the QuantumEspresso (QE) code [41] on a grid 8×8×8 q-points, fixing the magnetization along [001] axis.Next, we construct 18 WFs.These two steps follow exactly the Tutorial#18 of Wannier90 (W90) [42], ad the reader is addressed to the documentation of Wan-nier90 and QE for details.
After that, the calculation is performed by the following short python script.First, we import the needed packages: import wannierberri as WB import numpy as np Then we read the information about the system and Wannier functions: system = WB .System_w90 ( ' Fe ' , berry = True ) from files Fe.chk, Fe.eig, Fe.mmn [43], or we can read all information from a file Fe tb.dat, which is also written by Wannier90, or maybe composed by user from any tight-binding model: system = WB .System_tb ( ' Fe_tb .dat ' , getAA = True ) Next, we define the symmetries of the system that we wish to take into account.In the ab initio calculation we have specified the magnetization along the z axis, hence the symmetries that are preserved are inversion I, 4-fold rotation around the z axis C 4z ,and a combination of timereversal T and 2-fold rotation around the x axis C 2x .Here we need only the generators of the symmetry group.The other symmetries will be automatically obtained by taking products of these generators.E.g. the mirror Next we need to set the grids of k, K and κ points.Most conveniently it can be done by setting the 'length' parameter: grid = WB .Grid ( system , length =100) This will guarantee the grid to be consistent with the symmetries, and the spacing of k-points will be ∆k ≈ 2π length .In this particular case a k-grid of 52 × 52 × 52 points (13 × 13 × 13 K-grid, 4 × 4 × 4 κ-grid) will be generated, but that depends on the size of the unit cell.
Next, we want to integrate the Berry curvature to get the AHC.This is done by the WB.integrate method.
WB .integrate ( system , grid , Efermi = np .linspace (12.,13.,1001) , smearEf =10 , # 10 K quantities =[ " ahc " ," cumdos " ] , numproc =16 , adpt_num_iter =20 , fout_name = " Fe " ) and in addition to AHC we evaluate the cumulative density of states (cDOS) (A6).We consider Fermi level as a free parameter, scanning over a set of Fermi levels from 12 to 13 eV with a step of 1 meV, and a small smearing over the Fermi level corresponding to temperature 10K (∼ 1 meV) is used.It is known, that in the BZ integration, some k points may give a large contribution to the integral.This is especially strong for Berry curvature, which blows up near band degeneracies and avoided crossings, that fall close to the Fermi level.This is reflected as huge spikes in the E F -resolved curves -see blue curve in Fig 1(g).To make the calculation more precise around such points, an adaptive recursive refinement algorithm is used, and we set the number of iterations to 20.The integration is done in parallel by means of the multiprocessing[44] module and the parameter 'numproc' specifies that a Pool of 16 worker processes is used.
From the cDOS we can find the precise position of the Fermi level E F = 12.610 eV -the energy at which the cumulative DOS reaches 8 electrons per unit cell.This is more accurate, then the result evaluated from a coarse ab initio grid.Next, it is instructive to plot the AHC after each iteration.In Fig. 1g one can see that already after 20 iterations the chaotic peaks are removed, and we can get a reasonably sooth curve, although we have started from a rather coarse grid of only 52×52×52 k-points.
This short example demonstrates that that the calculations with WB may be run with a few lines of Python script.Appendix A describes more of the implemented functionality, and more options are under development or under testing.For more detailed and updated information please refer to the online documentation at [34].x the calculations are done with (yellow) and without (purple) MDRS.For WB the calculations are done with (cyan) and without (red) use of symmetries.

IV. COMPUTATION TIME
In this section we will compare the calculation time for the calculations of anomalous Hall conductivity using postw90.xand WannierBerri.We will take the example of bcc Fe and vary different parameters.Calculations were performed on identical 32-core virtual nodes of the ScienceCloud cluster at University of Zürich.The nodes are based on AMD EPYC 7702 64-Core Processors with frequency 2GHz and 128 GB RAM per node, and one node was used per task.
The computation consists of two phases.First, some preliminary operations are done.Those include reading the input files and performing Fourier transform from ab initio grid q to real-space vectors R : eqs.( 3) and ( 6).This operation takes in WB (postw90.x)between 2 (3) seconds for the small q grid 4 × 4 × 4 and 2 (3) minutes for a large grid of 16 × 16 × 16.This time is mostly taken by reading the large formatted text file Fe.mmn, and it is done only once and does not scale with the density of the interpolation grid.In WB this is done in the constructor of the System_w90 class, and the object can be saved on disk using a pickle module, so that this operation does not repeat for further calculations.
Next comes the interpolation part itself, for which the evaluation time scales linearly with the number of kpoints used.Further the time for an interpolation grid 200×200×200 is given, which is a rather good grid to make an accurate calculation for this material.
We start with comparing time with the MDRS switched off and without use of symmetries in WB.As can be seen in Fig. 2, for a small q-grid 4×4×4 WB is just slightly faster then postw90.x.However, for dense q-grids the computational time of postw90.xgrows lin- early with the number of q points, while in WB it stays almost the same.This happens because in postw90.x the Fourier transform is major time-consuming routine.On the other hand, in WB, although cost of the mixed Fourier transform is expected to grow logarithmically with the ab initio grid, we do not see it because Fourier transform amounts only to ∼ 10% of the computational time.
Next, we switch on the MDRS method in postw90.x,and the computational time grows by a factor of 5. On the other hand the computational time does not change (not shown).
Finally let's switch on the use of symmetries in WB.Thus the computational time decreases by a factor of 8.In the ultra-dense grid limit one would expect the speedup to be approximately equal to the number of elements in the group -16 in the present example, due to exclusion of symmetry-equivalent K-points.But this does not happen, because we use an FFT grid of 25×25×25 κ-points, hence the K-grid is only 8×8×8, and a considerable part of K-points are at high-symmetry positions.Therefore they do not have symmetric partners to be excluded from the calculation.
Thus we can see that the difference in computational time with postw90.x and WB reaches 3 orders of magnitude for this example.Note that the examples above were performed only for the pristine Fermi level.Now let's see what happens upon scanning the Fermi levels (Fig. 3).In WB the computational time remains practically unchanged when we use upto N ≈ 1000 Fermi levels, and only start to grow considerably at N ∼ 10 4 .On the other hand in postw90.x the computational time significantly grows with N , which is especially remarkable for small q-grids.
In this section we did not use the adaptive refinement procedure.However when on starts from a rather dense grid of K-points, the new K-points coming from the re-finement procedure constitute only a small portion of the initial grid, and hence do not contribute much into computation time.
In this section we have shown that the methods suggested in this article help to significantly reduce the computation time from days to minutes.However, bcc iron is a simple example with only 1 atom per unit cell, and only 18 WFs are needed.More complicated systems will require more time.E.g. to obtain a converged value of ANE in PrAlGe [45] using wannier19 (early version of WB), the calculation took approximately 30 hours on the same computation node.Estimates predict that the same calculation with postw90.xcould take several months.Thus it is the case where the numerical advance not only saves time, but also brings the calculation from the unreasonably time-consuming area, where most people would avoid working, to a reasonably feasible computation time.

V. SUMMARY
In this article I have presented a number of methods that allow to boost the performance of Wannier interpolation to a new level.The methods are implemented in the new Python code WannierBerri.It is important to note that the mixed Fourier transform and the optimization of MDRS method and Fermi-level iteration, while giving a large computational advantage, do not affect the result within machine precision.Hence the new code can be easily benchmarked with the established postw90.xcode.The code not only allows to perform high-speed and high-precision calculations of AHC and a palette of other properties, it also serves as a platform for implementation new features involving Wannier interpolation.
Thus it has a potential to become a new community code.Interestingly, the code uses the same routines to perform calculations both from Wannier Functions and for tight-binding models models.Finally, in combination with recent advances in automated construction of Wannier functions, it paves a way to high-throughput calculations of properties of solids that require Wannier interpolation.

Tabulating
WB can also tabulate certain band-resolved quantities over the Brillouin zone.This feature is called by, e.g.• 'spin' : the expectation value of the Pauli operator.: • 'V' : the band gradients ∇ k E nk

Evaluation of additional matrix elements
Wannier interpolation starts from certain matrix elements defined on the ab initio (q) grid.Those matrix elements should be evaluated within the ab initio code, namely within its interface to Wannier90.However, only QuantumEspresso [41] has the most complete interface pw2wannier90.x.The other codes provide only the basic interface, which includes the eigenenergires E nq (.eig file) and overlaps M b mnq = u mq |u nq+b (A10) (file .mmn),where b vector connects neighbouring qpoints.This information allows to interpolate the band energies (and their derivatives of any order) as well as Berry connections [15] and Berry curvature [12].However, to evaluate the orbital moment of a Bloch state, one needs matrix elements of the Hamiltonian [13] (.uHu file) The evaluation of eq. ( A11) is very specific to the details of the ab initio code, and implemented only in pw2wannier90.x.To make it possible study of orbital magnetization with other ab initio codes, the following workaround may be employed.By inserting a complete stet of Bloch states at a particular q point This equation is implemented within the wannierberri.mmn2uHusubmodule, which allows to generate the .uHufile out of .mmnand .eigfiles.The equality in (A12) is exact only in the limit l max → ∞ and infinitely large basis set for the wavefunctions representation.So in practice one has to check convergence for a particular system.However, in practice the term depending on C b1,b2 mnq usually gives a minor contribution (although it is not guaranteed and depends on the choice of WFs), and the convergence of the total result may be easily achieved.
To interpolate the spin operator expectation value, the matrix S mnq = u mq |σ|u nq is needed.To facilitate study of spin-dependent properties within VASP [51] code, a submodule wannierberri.vaspspn is included, which computes S mnq based on the normalized pseudo-wavefunction read from the WAVECAR file.Note that the use of pseudo-wavefunction instead of the full PAW [52] wavefunction is an approximation, which however in practice gives a rather accurate interpolation of spin.
The mmn2uHu and vaspspn modules were initially developed and used in [18] as separate scripts, but were not published so far.Now they are included in the WannierBerri package with a hope of being useful for the community.

FIG. 1 .
FIG. 1. (a-f) Illustration of the procedure of mixed Fourier transform, adaptive refinement and use of symmetries.The size of colored circles corresponds to the weight of the K-point, gray crosses denote the points with zero weight.See the text for detailed description.(g) AHC of bcc Fe, evaluated from a grid of 52×52×52 k-points and 20 recursive adaptive refinement iterations.

Fig 1 (
g) shows how undesired artificial peaks of the the AHC curve are removed iteration by iteration, yielding a smooth curve.(See sec.III for details) E.

FIG. 2 .
FIG.2.Computational time for AHC using WB (triangles) and postw90.x(circles) for different ab initio grids.For postw90.x the calculations are done with (yellow) and without (purple) MDRS.For WB the calculations are done with (cyan) and without (red) use of symmetries.

l
|u lq u lq | we can rewrite (A11) as C b1,b2 mnq ≈ Computational time for scanning multiple Fermi levels using WB and postw90.x(pw90) for different ab initio grids.MDRS method and symmetries are disabled here.