Abstract
We analyze a data set comprising 370 GW band structures of two-dimensional (2D) materials covering 14 different crystal structures and 52 chemical elements. The band structures contain a total of 61,716 quasiparticle (QP) energies obtained from plane-wave-based one-shot G_{0}W_{0}@PBE calculations with full frequency integration. We investigate the distribution of key quantities, like the QP self-energy corrections and QP weights, and explore their dependence on chemical composition and magnetic state. The linear QP approximation is identified as a significant error source and we propose schemes for controlling and drastically reducing this error at low computational cost. We analyze the reliability of the 1/N basis set extrapolation and find that it is well-founded with a narrow distribution of coefficients of determination (r^{2}) peaked very close to 1. Finally, we explore the accuracy of the scissors operator approximation and conclude that its validity is very limited. Our work represents a step towards the development of automatized workflows for high-throughput G_{0}W_{0} band structure calculations for solids.
Introduction
In computational materials science, the high-throughput mode of operation is becoming increasingly popular^{1}. The development of automatized workflow engines capable of submitting, controlling, and receiving thousands of interlinked calculations^{2,3,4} with minimal human intervention has greatly expanded the range of materials, and properties, that can be investigated by a single researcher. Several high-throughput studies have been conducted over the past decade, mostly with the aim of identifying new prospective materials for various applications including catalysis^{5}, batteries^{6,7}, thermoelectrics^{8,9}, photocatalysts^{10}, transparent conductors^{11}, and photovoltaics^{12,13}, just to mention some. The vast amounts of data generated by such screening studies have been stored in open databases^{14,15,16,17}, making them available for further processing, testing, and comparison of methods and codes, training of machine learning algorithms, etc. With very few exceptions, the high-throughput screening studies and the generation of materials databases have been based on density functional theory (DFT) at the level of the generalized gradient approximation (GGA).
While DFT is fairly accurate for structural parameters and other properties related to the electronic ground state, it is well known that electronic band structures, in particular the size of band gaps, are not well reproduced by most xc-functionals^{18}. This holds in particular for the LDA and GGA functionals, which hugely underestimate band gaps, often by a factor of 2 or more^{19,20}. Hybrid functionals and certain meta-GGAs perform significantly better^{21}, but are not fully ab initio and miss fundamental physics such as nonlocal screening effects^{22}. Instead, the gold standard for quasiparticle band structure calculations of solids is the many-body GW method^{23,24,25,26}, which explicitly accounts for exchange and dynamical screening. In its simplest non-self-consistent form, i.e., G_{0}W_{0}, this approximation reproduces experimental band gaps to within 0.3 eV (mean absolute error) or 10% (mean relative error)^{19,20,27}. We note in passing that for partially self-consistent GW_{0}^{20} or when vertex corrections are included^{28,29}, the deviation from experiments falls below 0.2 eV, which is comparable to the uncertainty of the experimental reference data. The improved accuracy of the GW method(s) comes at the price of a significantly more involved methodology, both conceptually and numerically, as compared to DFT. While DFT calculations can be routinely performed by non-experts using codes that, despite very different numerical implementations, produce identical results^{30}, GW calculations remain an art for the expert.
The high complexity of GW calculations is due to several factors, including: (i) The basic quantities of the theory, i.e., the Green's function (G) and screened Coulomb interaction (W), are dynamical quantities that depend on time/frequency. Several possibilities for handling the frequency dependence exist, including the formally exact direct integration^{19} and contour deformation techniques^{31} as well as the controlled approximate analytic continuation methods^{32} and the rather uncontrolled but inexpensive plasmon-pole approximations^{24}. (ii) The formalism involves infinite sums over the unoccupied bands. While most implementations perform the sum explicitly up to a certain cutoff, schemes to avoid the sum over empty states have been developed^{33,34}. (iii) The basic quantities are two-point functions in real space (or reciprocal space) that couple states at different k-points. This leads to large memory requirements and makes it unfeasible to fully converge GW calculations with respect to the basis set. Consequently, strategies for extrapolation to the infinite basis set limit must be employed^{35,36}. (iv) Unless the GW equations are solved fully self-consistently, which is rarely done and does not improve accuracy^{29,37}, there is always a starting point dependence. This has been systematically explored for molecules, where it was found that LDA/GGA often constitute a poor starting point whereas hybrids perform better in the sense that they lead to better agreement with experimental ionization potentials and produce more well-defined spectral peaks with higher quasiparticle weights^{38,39}. These and other factors imply that GW calculations not only become significantly more demanding than DFT in terms of computer resources, but also involve more parameters, making it difficult to assess whether the obtained results are properly converged or perhaps even erroneous.
Successful application of the high-throughput approach to problems involving excited electronic states, e.g., light absorption/emission, calls for the development of automatized and robust algorithms for setting the parameters of many-body calculations such as GW (according to available computational resources and required accuracy level), extrapolating the basis set, and assessing the reliability of the obtained results. The first step towards this goal is to analyze and systematize the data from large-scale GW studies. With a similar goal in mind, van Setten et al. compared G_{0}W_{0}@PBE band gaps, obtained with the plasmon-pole approximation, to the experimental band gaps. They analyzed the correlations between different quantities and concluded that G_{0}W_{0} (with plasmon-pole approximation) is more accurate than using an empirical correction of the PBE gap, but that, for accurate predictive results for a broad class of materials, an improved starting point or some type of self-consistency is necessary.
In this work we perform a detailed analysis of an extensive GW data set consisting of G_{0}W_{0}@PBE band structures of 370 two-dimensional semiconductors comprising a total of 61,716 QP energies. Our focus is not on the ability of G_{0}W_{0} to reproduce experiments, i.e., its accuracy, which is well established by numerous previous studies, but rather on the numerical robustness and reliability of the method and the basis set extrapolation procedure. The calculations employ a plane-wave basis set and direct frequency integration; thus the use of projector augmented wave (PAW) potentials represents the only significant numerical approximation. We investigate the distribution of self-energy corrections and quasiparticle weights, Z, and explore their dependence on the materials composition and magnetic state. By investigating the full frequency-dependent self-energy for selected materials we analyze the error caused by the linear approximation to the QP equation and propose methods to estimate and correct this error. We assess the reliability of a plane-wave basis set extrapolation scheme, finding it to be very accurate with coefficients of determination, r^{2}, above 0.95 in more than 90% of the cases when extrapolation is performed from 200 eV. Finally, we assess the accuracy of the scissors operator approach and conclude that it should only be used when average (maximal) band energy errors of 0.2 eV (2 eV) are acceptable.
Results and discussion
The G_{0}W_{0} data set
The 370 G_{0}W_{0} calculations were performed as part of the Computational 2D Materials Database (C2DB) project^{40}. Below we briefly recapitulate the computational details behind the G_{0}W_{0} calculations and refer to Ref. ^{40} for more details. All calculations were performed with the projector augmented wave code GPAW^{41}.
The C2DB database contains about 4000 monolayers comprising both known and hypothetical 2D materials constructed by decorating experimentally known crystal prototypes with a subset of elements from the periodic table^{40}. Currently, G_{0}W_{0} calculations have been performed for 370 materials covering 14 different crystal structures and 52 different chemical elements. Figure 1a illustrates the distribution of elements. The number of materials containing a given element is shown below the element symbol. The number of magnetic materials containing the element is shown in parentheses next to the total number.
To give an overview of some of the data analyzed in this work, the distribution of the 61716 G_{0}W_{0} corrections for the six bands around the bandgap is shown in Fig. 1b. The distribution for the valence bands is shown in blue and for the conduction in orange. It is usually the case in GW studies that the DFT valence bands are shifted down and the conduction bands are shifted up. Similar behavior is found for the main part of our data, but we also observe a small subset of states for which the correction has the opposite sign. It is difficult to provide a clear physical explanation for why some occupied states are shifted up and some empty states are shifted down. We stress, however, that the GW corrections are measured relative to the PBE band energies, which is a somewhat arbitrary reference. For example, G_{0}W_{0}@LDA and G_{0}W_{0}@HSE would give different results—not so much for the resulting QP energies, which are relatively independent of the starting point—but for the size and sign of the GW corrections, which would now be measured relative to the LDA and HSE energies, respectively.
Figure 1c shows a scatter plot of the PBE energies versus the G_{0}W_{0} energies. We only show energies from −10 to 10 eV for clarity. The color of a point shows the Z value. The latter has been truncated to the region [0.5, 1.0] to show the variation of the main part of the distribution. The main observation we can make from this figure is that there is no obvious correlation between the energies and the Z values. This is also verified by the calculated correlation coefficient, C, between E_{PBE} and Z (C = 0.27), \({E}_{{\text{G}}_{0}{\text{W}}_{0}}\) and Z (C = 0.23), and between the G_{0}W_{0} correction, \({E}_{{\text{G}}_{0}{\text{W}}_{0}}-{E}_{\text{PBE}}\), and Z (C = 0.10). We conclude that there is no significant correlation between the energies and Z, meaning that low Z values (which signal a breakdown of the QP approximation) may occur in any energy range.
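As a minimal sketch of how such a correlation coefficient can be evaluated, the snippet below computes the Pearson coefficient between two lists of numbers. It is an assumption that C denotes the Pearson coefficient; the function name `pearson` and the toy inputs are purely illustrative and are not taken from the C2DB tooling.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered second moments (unnormalized; the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Toy usage: hypothetical band energies (eV) and QP weights.
energies = [-3.2, -1.1, 0.4, 2.7, 5.0]
weights = [0.71, 0.78, 0.74, 0.69, 0.80]
c = pearson(energies, weights)
```

Values of |C| well below 1, such as those quoted above, indicate that Z cannot be predicted from the energy alone.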
Quasiparticle weight Z
The quasiparticle weight, Z, gives a rough measure of the validity of the quasiparticle picture, i.e., how well the charged excitations of the interacting electron system can be described by single-particle excitations from the ground state. In the “Methods” section, we provide a physical interpretation of the quasiparticle weight.
In the following, we analyze the 61,716 calculated QP weights, Z, contained in the C2DB database. As discussed in the “Methods” section, for the QP approximation to be well-founded Z should be close to 1. We split the Z values into two classes: quasiparticle-consistent (QPc) for Z ∈ [0.5, 1.0] and quasiparticle-inconsistent (QPic) for Z ∉ [0.5, 1.0]. With this definition, QPc states will have at least half of their spectral weight in the quasiparticle peak, but there is no deeper principle behind the threshold value of 0.5. We can expect that the QP approximation is more accurate for QPc states than for QPic states.
Figure 2 shows a histogram of the Z values (all extrapolated to the infinite plane-wave limit) corresponding to the 3 highest valence bands and 3 lowest conduction bands of 370 semiconductors. The vast majority of the values are distributed around ≈0.75 with only 0.28% lying outside the physical range from 0 to 1 (0.16% are larger than one and 0.12% are negative). We find that 97.5% of the states are QPc.
It is of interest to investigate if there are specific types of materials/elements that are particularly challenging to describe by G_{0}W_{0}. Figure 3 shows a bar plot of the percentage of QPic states in materials containing a given element (note the logarithmic scale). The result of this analysis performed on the nonmagnetic (ferromagnetic) materials is shown in blue (orange). For example, a large percentage (about 65%) of the states in Co-containing materials are QPic. It is clear that magnetic materials contribute a large fraction of the QPic states. In fact, 0.36% of the nonmagnetic states are QPic while 22% of the magnetic states are QPic. It thus seems that the QP approximation is generally worse for magnetic materials.
We note that the employed PAW potentials are not strictly norm-conserving. It has previously been found that the use of norm-conserving pseudopotentials can be crucial for the quantitative accuracy of G_{0}W_{0} results for materials with localized d or f states^{35,42,43}. To investigate this potential issue, we checked the distributions of G_{0}W_{0} corrections and QP weights for materials containing at least one element with a pseudo partial wave of norm <0.5, i.e., materials where the norm-conservation could potentially be strongly violated for certain states. Out of the 370 materials, there were 279 materials in this category. The resulting distributions were not found to deviate qualitatively from those of all the materials (shown in Figs. 1b and 2, respectively), and the strongest indicator of unphysical Z values or opposite-sign G_{0}W_{0} corrections remained the magnetic state of the material. On the basis of this analysis, we conclude that the use of non-norm-conserving PAW potentials does not affect the conclusions of our study.
Based on the distribution of QP weights in Fig. 2, it appears that the QP approximation is valid for essentially all the states in the nonmagnetic materials and most of the states in the magnetic materials. However, while a QPc Z value is likely a necessary condition for predicting an accurate QP energy from the linearized QP equation [Eq. (6) in the “Methods” section], it is not sufficient. This is because the assumption behind Eq. (6), i.e., that Σ(ε) varies linearly with ε in the range between the KS energy and the QP energy, is not guaranteed for QPc states. This is illustrated in Fig. 4, which shows the full frequency-dependent self-energy for three states in the ferromagnetic FeCl_{2}. Case (a) is a typical example where the self-energy of a QPc state (Z = 0.61) varies linearly around ε_{KS} and the 1st order approximation works well. The second case (b) shows an example where the 1st order approximation breaks down for a QPic state (Z = 1.19). The final case (c) illustrates that the 1st order approximation can break down even in cases where Z is very close to 1. Unfortunately, there is no simple way to diagnose such cases from the information available in a standard G_{0}W_{0} calculation (Σ(ε_{KS}) and Z). We stress that the example in Fig. 4c is a special case and that, in general, the linear approximation is significantly more likely to hold for QPc states than for QPic states (see discussion below).
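The QPc/QPic bookkeeping used in this section can be sketched in a few lines. The thresholds [0.5, 1.0] and the physical range [0, 1] are taken from the text; the function names and the toy Z values are illustrative assumptions only.

```python
def classify_z(z, lo=0.5, hi=1.0):
    """Label a quasiparticle weight as 'QPc' (lo <= z <= hi) or 'QPic'."""
    return "QPc" if lo <= z <= hi else "QPic"


def summarize(z_values):
    """Fractions of QPc, QPic, and unphysical (z < 0 or z > 1) weights."""
    n = len(z_values)
    if n == 0:
        raise ValueError("no Z values given")
    n_qpc = sum(1 for z in z_values if classify_z(z) == "QPc")
    n_unphysical = sum(1 for z in z_values if z < 0.0 or z > 1.0)
    return {
        "QPc": n_qpc / n,
        "QPic": (n - n_qpc) / n,
        "unphysical": n_unphysical / n,
    }


# Toy usage with made-up weights (not data from the study).
stats = summarize([0.75, 0.61, 1.19, 0.30, -0.10, 0.98])
```

Note that the QPic class contains both unphysical values and physically allowed but small values (0 ≤ Z < 0.5), which is why the two fractions are reported separately.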
Beyond the linear QP approximation
Under the assumption that the KS wave functions constitute a good approximation to the QP wave functions so that off-diagonal elements can be neglected, the solution to the QP equation reduces to solving an equation of the form
where Σ(ω) = Σ_{GW}(ω) − v_{xc} is the frequency-dependent self-energy (see “Methods” section).
In this section, we investigate different root-finding schemes to estimate the size of the error introduced by the linear approximation and obtain an improved QP energy. With high-throughput computations in mind, a good algorithm provides a reasonable balance between computation time (number of Σ/Z evaluations) and accuracy. To benchmark the different schemes we computed the full frequency-dependent self-energy for 3192 states, corresponding to the 3 highest valence bands and 3 lowest conduction bands, for 12 of the 370 2D materials (including two ferromagnetic materials). The two ferromagnetic materials were chosen at random from materials that had some Z ∉ [0, 1]. The remaining 10 materials were chosen at random from materials with all Z ∈ [0, 1] and typical Z distributions. An overview of the materials is shown in Table 1. The self-energy is evaluated on a uniform frequency grid and interpolated using cubic splines. The “true” solution of the QP equation is then determined and used to evaluate the errors of the approximate schemes. In cases where there are multiple solutions, the one with the smallest correction is selected.
We first determine the errors introduced by the linear approximation. Histograms of the errors for QPc and QPic states are shown in Fig. 5. This shows that QPic states generally have larger errors and thus warrant particular attention.
We next consider the iterative Newton–Raphson (NR) method, where we limit ourselves to 1 and 2 iterations to keep the number of self-energy evaluations and thus the computational cost low. We note that 1 iteration (NR1) is equivalent to the linear approximation. The distribution of the errors is shown in Fig. 6a. Although 87% of the errors from NR1 are below 0.1 eV, the mean absolute error (MAE) is 0.11 eV due to outliers. Most of these errors are significantly reduced by performing one more iteration of Newton–Raphson (NR2), but again outliers increase the MAE. If we evaluate the MAE without the outliers (those lying outside the displayed error range), the MAE reduces to only 0.006 eV.
Motivated by the relatively narrow distribution of Z values in Fig. 2, we consider an empirical solution estimate consisting of replacing the actual Z value with the mean value of the distribution, i.e., we simply set Z = 0.75. This has the advantage of being simple, computationally cheap, and robust in the sense of avoiding outlier Z values arising from local irregularities in Σ at the KS energy (Fig. 4b). The resulting error distribution is shown in Fig. 6b. While the central part of the distribution is slightly broadened compared to the 1st order approximation, the MAE is reduced due to a reduction of outliers (enhanced robustness). As shown in Fig. 6c, the central part of the distribution can be narrowed by applying the empirical approach only for QPic states, i.e., when Z ∉ [0.5, 1]. In fact, this approach (empZ@QPic) has an MAE equal to that of NR2 but with half the computational cost (two Σ/Z evaluations compared to four).
Next, we examine polynomial fitting of the self-energy. We construct second- and fourth-order polynomials, P_{n}(ω), from the self-energy at energies in a range of ±1 eV around the KS energy. The cost of the second- and fourth-order fits is equivalent to three and five self-energy evaluations, respectively. In general, the polynomial fits have rather low correlation coefficients of C < 0.9 and are sensitive to the choice of frequency points and self-energy data used for the fit. As a consequence, the resulting errors are large (not shown) and the approach is not suitable. We attribute this to our observation that self-energies are often irregular (on the relevant scale of 1 eV) and not well described by low-order polynomials.
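A minimal sketch of the NR1 and NR2 schemes is given below, assuming the scalar QP equation ω = ε_KS + Σ(ω) and that Σ and its derivative are available as callables. The quadratic model self-energy is a made-up test function, not data from the study.

```python
import math


def newton_qp(eps_ks, sigma, dsigma, n_iter):
    """Newton-Raphson iterations for w = eps_ks + sigma(w), starting at eps_ks.

    Each iteration needs one evaluation of sigma and one of its derivative,
    so n_iter = 2 costs four evaluations in total.
    """
    w = eps_ks
    for _ in range(n_iter):
        z = 1.0 / (1.0 - dsigma(w))           # local quasiparticle weight
        w = w + z * (eps_ks + sigma(w) - w)   # Newton update
    return w


# Hypothetical model self-energy with a small curvature (eps_ks = 0 eV).
def sigma(w):
    return 0.4 - 0.2 * w + 0.05 * w * w


def dsigma(w):
    return -0.2 + 0.1 * w


nr1 = newton_qp(0.0, sigma, dsigma, 1)   # identical to the linear approximation
nr2 = newton_qp(0.0, sigma, dsigma, 2)
# Exact root of 0.05 w^2 - 1.2 w + 0.4 = 0 closest to eps_ks.
exact = (1.2 - math.sqrt(1.2 * 1.2 - 4 * 0.05 * 0.4)) / (2 * 0.05)
```

For this smooth model the second iteration reduces the error by several orders of magnitude; the outliers discussed above arise when Σ is irregular near the KS energy, a situation this toy function does not capture.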
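The empZ@QPic rule amounts to a one-line change of the linearized solution. The sketch below assumes the linearized form E = ε_KS + Z Σ(ε_KS); the default values 0.75 and [0.5, 1.0] are those quoted in the text, while the function name and the sample numbers are illustrative.

```python
def emp_z_at_qpic(eps_ks, sigma_ks, z, z_emp=0.75, lo=0.5, hi=1.0):
    """Linearized QP energy, with Z replaced by z_emp for QPic states.

    eps_ks:   Kohn-Sham energy (eV)
    sigma_ks: self-energy evaluated at eps_ks (eV)
    z:        calculated quasiparticle weight
    """
    z_used = z if lo <= z <= hi else z_emp
    return eps_ks + z_used * sigma_ks


# A QPc state keeps its own Z; a QPic state falls back to Z = 0.75.
e_qpc = emp_z_at_qpic(-1.0, 0.4, 0.61)    # uses Z = 0.61
e_qpic = emp_z_at_qpic(-1.0, 0.4, 1.19)   # uses Z = 0.75
```

Because the rule only reuses Σ(ε_KS) and Z, it requires no self-energy evaluations beyond those of a standard linearized calculation.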
Finally, we consider a scheme that we refer to as ΣdE, which estimates the error as
The motivation for this expression is the following. If the linear approximation is exact, then δ vanishes as it should. Moreover, if the self-energy has a nonzero curvature it can be shown that δ equals the true error to leading order in the curvature. In that sense, it is similar to the second-order polynomial fit, but with the important difference that whereas the polynomial fit was based on uniformly distributed points, ΣdE uses the value and slope at E^{KS} and the value at E^{QP,lin}.
In Fig. 7a, the distribution of the ratios of the estimated error and true error is shown, and the errors resulting from Eq. (2) are shown in Fig. 7b. Compared to the linear approximation, the ΣdE scheme reduces the MAE from 0.11 to 0.05 eV, at the cost of one additional self-energy evaluation. Interestingly, Eq. (2) systematically overestimates the error, as shown in Fig. 7a. A Gaussian fit to the distribution (red curve) has a mean value of α_{0} = 1.5 and a standard deviation of 0.2. Since the distribution of α is fairly narrow, it is tempting to correct for the systematic error using α = α_{0}, i.e., replacing δ → δ/α_{0}. We denote this estimate as ΣdE-corrected. To verify this procedure we randomly bisect the data into a “training” and a “test” set of equal size. α_{0} is determined from the training set and the MAE is calculated on the test set. The MAEs thus found were always 0.02–0.03 eV. We performed the same analysis using different sizes of the training set and found that an MAE of 0.03 eV is robust with a training set down to ≥5% of data points. This indicates that the approach is insensitive to the data used to determine α_{0}. In Fig. 7c, the ΣdE-corrected values are shown, where α_{0} was determined from the full distribution for simplicity. The ΣdE-corrected scheme shows excellent performance with an almost fourfold reduction of the MAE from 0.11 eV for the linear approximation to only 0.03 eV at a computational overhead of just one additional self-energy evaluation.
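To illustrate how an estimate built from these three quantities can behave, the sketch below evaluates the residual of the QP equation at E^{QP,lin}. This residual is an assumed stand-in chosen because it uses exactly the value and slope at E^{KS} and the value at E^{QP,lin}; it is not necessarily identical to Eq. (2), and the model self-energies are made up.

```python
def residual_estimate(eps_ks, sigma, dsigma_ks):
    """Residual of w = eps_ks + sigma(w) at the linearized solution.

    Uses sigma(eps_ks), the slope dsigma_ks at eps_ks, and one additional
    evaluation sigma(e_lin). Returns (e_lin, residual).
    NOTE: assumed illustrative form, not necessarily the paper's Eq. (2).
    """
    z = 1.0 / (1.0 - dsigma_ks)
    e_lin = eps_ks + z * sigma(eps_ks)
    return e_lin, eps_ks + sigma(e_lin) - e_lin


# Case 1: strictly linear self-energy, the residual vanishes.
_, r_linear = residual_estimate(0.0, lambda w: 0.4 - 0.2 * w, -0.2)

# Case 2: quadratic self-energy with curvature Sigma'' = 0.1 per eV.
curvature = 0.1
e_lin, r_quad = residual_estimate(
    0.0, lambda w: 0.4 - 0.2 * w + 0.5 * curvature * w * w, -0.2
)
# For a quadratic Sigma the residual equals (1/2) Sigma'' (Z Sigma(eps))^2.
expected = 0.5 * curvature * (e_lin - 0.0) ** 2
```

The second case shows that such a residual is proportional to the curvature of Σ, which is the property the text attributes to δ.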
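The train/test validation of α_{0} can be sketched as follows. Two simplifications are assumed here: α_{0} is taken as the sample mean of the ratios rather than the mean of a Gaussian fit, and the remaining error after correction is measured as |true − δ/α_{0}|. Function names and the synthetic inputs are illustrative only.

```python
import random
import statistics


def fit_alpha0(estimated, true):
    """Mean ratio of estimated to true error (states with zero error skipped)."""
    ratios = [e / t for e, t in zip(estimated, true) if t != 0.0]
    return statistics.fmean(ratios)


def split_validate(estimated, true, train_fraction=0.5, seed=0):
    """Fit alpha0 on a random training subset, report the MAE on the rest."""
    indices = list(range(len(true)))
    random.Random(seed).shuffle(indices)
    n_train = max(1, int(round(train_fraction * len(indices))))
    train, test = indices[:n_train], indices[n_train:]
    if not test:
        raise ValueError("training fraction leaves no test data")
    alpha0 = fit_alpha0([estimated[i] for i in train], [true[i] for i in train])
    # Error remaining after the corrected estimate delta / alpha0 is applied.
    remaining = [abs(true[i] - estimated[i] / alpha0) for i in test]
    return alpha0, statistics.fmean(remaining)


# Synthetic check: estimates that overshoot the true error by exactly 1.5.
true_err = [0.02 * (i + 1) for i in range(10)]
est_err = [1.5 * t for t in true_err]
alpha0, mae = split_validate(est_err, true_err)
```

Repeating the call with smaller `train_fraction` values mimics the robustness test with reduced training sets described above.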
The performance of the different correction schemes is summarized in Table 2.
Plane-wave extrapolation
The self-energy and the derivative of the self-energy (both evaluated at the KS energy) are calculated at three cutoff energies: 170, 185, and 200 eV. These values are then extrapolated to infinite cutoff, or an infinite number of plane waves, N_{PW} → ∞, by assuming a linear dependence on the inverse number of plane waves^{44}. An example of this fitting procedure is shown in Fig. 8a. The extrapolation procedure saves computational time while improving the accuracy of the results, provided the extrapolation is sufficiently accurate. Extrapolation can fail if convergence as a function of the plane-wave cutoff for the given quantity does not follow the expected 1/N_{PW} behavior in the considered cutoff range.
To validate this approach, we investigate the distribution of the r^{2} values for all 61,716 extrapolations in C2DB. We split them into two cases: extrapolation of the self-energy and extrapolation of the derivative of the self-energy. The distributions are shown as histograms in Fig. 8b. The distributions are clearly peaked very close to 1, and in general, it seems that the extrapolation is very good. The distribution for the derivatives is somewhat broader, and the extrapolation is generally less accurate than for the self-energies, which indicates a slower convergence with plane waves than for the self-energies. If we choose r^{2} = 0.8 as an acceptable threshold, we find that 1.7% of the r^{2} values of the self-energy extrapolation fall below this criterion while 5.0% are below for the derivative extrapolation. While these numbers might seem large, the problem is readily diagnosed (by the r^{2} value) and can be alleviated by using higher plane-wave cutoffs.
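The extrapolation and its quality measure can be sketched with an ordinary least-squares fit in 1/N_{PW}. The threshold of 0.8 is taken from the text; the plane-wave counts and self-energy values in the usage example are hypothetical, as are the function names.

```python
def extrapolate_inverse_npw(n_pw, values):
    """Least-squares line through (1/N_PW, value) pairs.

    Returns (intercept, slope, r2); the intercept is the estimate for
    N_PW -> infinity and r2 is the coefficient of determination.
    """
    x = [1.0 / n for n in n_pw]
    m = len(x)
    mean_x = sum(x) / m
    mean_y = sum(values) / m
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, values))
    ss_tot = sum((yi - mean_y) ** 2 for yi in values)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0.0 else 1.0
    return intercept, slope, r2


def is_reliable(r2, threshold=0.8):
    """Flag extrapolations whose r2 falls below the chosen threshold."""
    return r2 >= threshold


# Hypothetical plane-wave counts for three cutoffs and a perfectly linear trend.
counts = [1000, 1200, 1500]
sigma_values = [2.0 + 30.0 / n for n in counts]
sigma_inf, slope, r2 = extrapolate_inverse_npw(counts, sigma_values)
```

With only three cutoffs the fit has a single degree of freedom left, so a low r^{2} is a direct signal that the 1/N_{PW} behavior has not yet set in.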
Scissors operator approximation
Within the so-called scissors operator approximation (SOA) it is assumed that the G_{0}W_{0} correction is independent of band and k-index. Consequently, the G_{0}W_{0} correction calculated at, e.g., the Γ point is applied to all the eigenvalues, thus saving computational time as only one G_{0}W_{0} correction is required. In Fig. 9a, the idea is illustrated for a generic band. With the notation from the figure, the SOA consists of setting Δ(k) = Δ (or Δ_{nσ}(k) = Δ_{nσ} when more than one band and spin is involved).
To test the accuracy of the SOA, we evaluate the mean absolute error (\(\left\langle |\epsilon| \right\rangle\)) and maximum absolute error (\(\max (|\epsilon|)\)) of the band energies obtained with the SOA for each of the 370 materials:
and
The distribution of these errors is shown in Fig. 9b, c. From Fig. 9b, we see that the mean error exceeds 100 meV for about half of all materials—a rather large error, comparable to the target accuracy of the G_{0}W_{0} method itself. Furthermore, it follows from Fig. 9c that the maximum absolute error is often 0.5–1.0 eV. We conclude that while the average error of the SOA might be acceptable, it can produce significant errors for specific bands and should be used with care.
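A minimal sketch of the two SOA error measures for a single material is given below. It assumes that the error of a state is the difference between its own G_{0}W_{0} correction and the correction of the same band at the Γ point, and that the mean runs over all bands and k-points (including Γ); the precise normalization is an assumption, as are the names and the toy numbers.

```python
def soa_errors(corrections, gamma_index=0):
    """Mean and maximum absolute SOA error for one material.

    corrections[n][k] is the G0W0 correction (eV) of band n at k-point k;
    gamma_index marks the Gamma point, whose correction is applied to the
    whole band within the SOA.
    """
    errors = [
        abs(c - band[gamma_index])
        for band in corrections
        for c in band
    ]
    if not errors:
        raise ValueError("no corrections given")
    return sum(errors) / len(errors), max(errors)


# Toy usage: one conduction band and one valence band on three k-points.
toy = [
    [1.0, 1.2, 0.9],     # conduction band corrections (eV)
    [-0.5, -0.7, -0.5],  # valence band corrections (eV)
]
mean_err, max_err = soa_errors(toy)
```

Reporting both numbers matters because a small mean can hide a large error on an individual band, as the distributions in Fig. 9 show.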
Summary and conclusions
As high-throughput computations are gaining popularity in the electronic structure community, it becomes important to establish protocols for performing various types of calculations in an automated, robust, and error-controlled manner. In this work, we have taken steps towards the development of automated workflows for G_{0}W_{0} band structure calculations of solids. With G_{0}W_{0} representing the state of the art for predicting QP energies in condensed matter systems, such workflows are essential for continued progress in the field of computational materials design.
Based on our detailed analysis of 61,716 G_{0}W_{0} self-energy evaluations for the eigenstates of 370 two-dimensional semiconductors we were able to draw several conclusions relevant to large-scale GW studies. First of all, we found it useful to divide the states into two categories, namely quasiparticle-consistent (QPc) and quasiparticle-inconsistent (QPic) states defined by Z ∈ [0.5, 1.0] and Z ∉ [0.5, 1.0], respectively. Importantly, we found that the QP energies obtained from the standard linearized QP equation are significantly more accurate for QPc states than for QPic states. Moreover, we found the fraction of QPic states to be much larger in magnetic materials (22%) than in nonmagnetic materials (0.36%). Thus, extra care must be taken when performing G_{0}W_{0} calculations for magnetic materials; in particular, such materials might require special treatment in high-throughput workflows.
The mean absolute error (MAE) on the QP energies resulting from the linearized QP equation was found to be 0.11 eV. The MAEs evaluated separately for QPc and QPic states were 0.04 and 0.27 eV, respectively. In comparison, the accuracy of the GW approximation itself (compared to experiments) is on the order of 0.2 eV. It is therefore of interest to reduce or at least estimate the numerical error bar on the QP energies obtained from G_{0}W_{0} calculations. We found that an empirical scheme, where we set Z = 0.75 (corresponding to the mean of the Z distribution) for QPic states, reduces the MAE from 0.11 to 0.06 eV with no computational overhead. Similarly, the method dubbed the corrected ΣdE scheme reduces the MAE to 0.03 eV, at the cost of one additional self-energy evaluation. From these studies, it seems natural to accompany the QP energies obtained from G_{0}W_{0} with estimated error bars derived from one of these correction schemes. In fact, we have used the empZ@QPic method to correct all the GW band structures in the C2DB database.
Our analysis of the well-known and widely used scissors operator approximation shows that the errors introduced on the individual QP energies, when averaged over all bands (specifically the 3 highest valence and 3 lowest conduction bands), are typically on the order of 0.1 eV while the maximum error typically exceeds 1 eV. We stress that our scissors operator fits each of the six bands separately using the G_{0}W_{0} corrections at the Γ point. Thus the errors introduced by the more standard scissors approximation, which fits only the band gap, are expected to be even larger. We conclude that the scissors operator should be used with care and only in cases where errors on specific band energies of 1–3 eV are acceptable.
Finally, the plane-wave extrapolation scheme was found to be highly reliable for our PAW calculations when applied to cutoff energies in the range 180–200 eV. In fact, only 1.7% (5.0%) of the self-energy (self-energy derivative) extrapolations had an r^{2} below 0.8. However, for the purpose of high-throughput studies, it may be prudent to store and make available information on the r^{2} for the extrapolation so that the quality of the extrapolation can always be examined and improved calculations with higher cutoff can be performed if deemed necessary.
Methods
G_{0}W_{0} calculations
For the materials considered here, DFT calculations using PBE^{45} were performed using an 800 eV plane-wave cutoff. Spin–orbit coupling is included by diagonalizing the spin–orbit Hamiltonian in the k-subspace of the Bloch states found from PBE.
Those materials that have a finite gap and up to 5 atoms in the unit cell are selected for G_{0}W_{0} calculations. The QP energies in C2DB are calculated for the 8 highest occupied and the 4 lowest unoccupied bands; however, in this study we only use the 6 bands closest to the Fermi level (3 valence and 3 conduction bands). Furthermore, we only include materials with a PBE gap greater than 0.2 eV, as the accuracy of G_{0}W_{0} for materials with very small PBE gaps is questionable. Three energy cutoffs are used: 170, 185, and 200 eV. The results are then extrapolated to infinite energy, i.e., to an infinite number of plane waves. This extrapolation is done by expressing the self-energies in terms of the inverse number of plane waves, 1/N_{PW}, performing a linear fit, and determining the value of the fit at 1/N_{PW} = 0^{35,46}.
The screened Coulomb interaction entering in the self-energy is calculated using full frequency integration in real frequency space. To avoid effects from the (artificially) repeated layers, a Wigner–Seitz truncation scheme is used for the exchange part of the self-energy^{47} and a 2D truncation of the Coulomb interaction is used for the correlation part^{44,48}. A truncated Coulomb interaction leads to significantly slower k-point convergence because the dielectric function strongly depends on q around q = 0; this is remedied by handling the integral around q = 0 analytically^{49,50}. A k-point density of 5.0/Å^{−1} was used.
The statistical analyses performed here use the data from all spins, k-points, the three highest occupied bands, and the three lowest unoccupied bands. In section IV B we consider several examples of the full frequency-dependent self-energies for a randomly selected spin, k-point, and band combination, subject to some requirements on the quasiparticle weight, Z, which are described below.
Quasiparticle theory
The G_{0}W_{0} quasiparticle energies are found by solving the quasiparticle equation (QPE)^{37}:
Here ψ_{nkσ} is the Kohn–Sham wavefunction for band n, crystal momentum k, and spin σ, H_{KS} is the single-particle Kohn–Sham Hamiltonian, Σ(ω) = Σ_{GW}(ω) − v_{xc} is the self-energy, and v_{xc} is the exchange-correlation potential.
Typically, and in C2DB, the QPE is solved via one iteration of the Newton–Raphson method starting from the KS energy, ϵ_{nkσ}, which is equivalent to making a linear approximation of the self-energy. This yields the solution
Z is known as the quasiparticle weight. The G_{0}W_{0} correction is defined as the difference between the G_{0}W_{0} energy and KS energy, \({{\Delta }}{E}_{nk\sigma }={E}_{nk\sigma }^{\,\text{QP}\,}-{\epsilon }_{nk\sigma }\).
Following ref. ^{49}, we provide here a physical interpretation of Z. We denote the many-body eigenstates for the N particle system by \(\left|{\Psi }_{i}^{N}\right\rangle\), where i is the excitation index. An interesting question is how well the state \(\left|{\Psi }_{i}^{N+1}\right\rangle\) can be described as the addition of a single electron to the ground state \(\left|{\Psi }_{0}^{N}\right\rangle\). In other words, can we find a state ϕ such that \(\left|{\Psi }_{i}^{N+1}\right\rangle \approx {c}_{\phi }^{\dagger }\left|{\Psi }_{0}^{N}\right\rangle\)? The optimal ϕ is determined from maximizing the overlap, i.e.,
If the maximal overlap is close to 1, the excited many-body state is well approximated by a single-particle excitation.
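For reference, the equivalence between one Newton–Raphson step and the linearized solution can be written out explicitly. The sketch below uses a scalar shorthand that is an assumption of this illustration: state indices nkσ are suppressed, Σ(ω) stands for the diagonal matrix element of the self-energy in the KS state, and any real part is left implicit.

```latex
% QP equation in scalar form, written as a root-finding problem
f(\omega) \equiv \omega - \epsilon - \Sigma(\omega) = 0 ,
\qquad
f'(\omega) = 1 - \frac{\partial \Sigma}{\partial \omega} .

% One Newton--Raphson step starting from \omega_0 = \epsilon
\omega_1
  = \epsilon - \frac{f(\epsilon)}{f'(\epsilon)}
  = \epsilon + \frac{\Sigma(\epsilon)}
                    {1 - \left.\partial\Sigma/\partial\omega\right|_{\omega=\epsilon}}
  = \epsilon + Z\,\Sigma(\epsilon),
\qquad
Z = \left( 1 - \left.\frac{\partial \Sigma}{\partial \omega}\right|_{\omega=\epsilon} \right)^{-1} .
```

The same result follows from inserting the first-order Taylor expansion Σ(ω) ≈ Σ(ϵ) + (ω − ϵ) ∂Σ/∂ω into the QP equation and solving for ω, which is why the single Newton–Raphson step and the linear approximation coincide.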
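It turns out that the square of this maximal overlap is exactly equal to the QP weight Z defined by Eq. (6) if it is evaluated at the true QP energy and with the true QP wavefunction rather than at the KS energy and with the KS wavefunction. Furthermore, Z can be shown to be equal to the squared norm of the QP wavefunction, which is defined as
\(\psi_{i}^{\text{QP}}(\mathbf{r})^{*} = \left\langle \Psi_{i}^{N+1} \right| \hat{\Psi}^{\dagger}(\mathbf{r}) \left| \Psi_{0}^{N} \right\rangle\)
where \(\hat{\Psi}^{\dagger}(\mathbf{r})\) is the field operator that creates an electron at position r.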
For a proof of these results, we refer to ref. ^{49}. In standard G_{0}W_{0} calculations, the self-energy is evaluated at the KS energy using KS eigenstates. In this case, Z is no longer equal to the exact QP weight but only approximates it. If Z deviates significantly from 1, we can only conclude that either (1) the system is strongly correlated so that the QP approximation fails, or (2) the Kohn–Sham energy and/or wavefunction is a poor approximation to the true QP energy and/or wavefunction. In either case, we would expect that the G_{0}W_{0} calculation is problematic and requires special attention.
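In an automated workflow, this diagnostic could be turned into a simple filter on Z. The sketch below is one possible way to do so; the function names and the bounds `z_min` and `z_max` are illustrative assumptions and are not values taken from this section.

```python
def needs_attention(z, z_min=0.5, z_max=1.0):
    """Flag a QP weight outside [z_min, z_max] (bounds are assumptions)."""
    return not (z_min <= z <= z_max)


def flagged_fraction(z_values, z_min=0.5, z_max=1.0):
    """Fraction of QP weights in z_values that fall outside the window."""
    if not z_values:
        return 0.0
    flagged = sum(needs_attention(z, z_min, z_max) for z in z_values)
    return flagged / len(z_values)


# Toy list of QP weights (made up for illustration).
toy_weights = [0.78, 0.81, 0.35, 1.20]
fraction = flagged_fraction(toy_weights)
```

States flagged in this way would then be candidates for a more careful solution of the QPE than the single Newton–Raphson step.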
Data availability
Data are available as an ASE^{51} database at https://cmr.fysik.dtu.dk/htgw/htgw.html.
References
1. Curtarolo, S. et al. The high-throughput highway to computational materials design. Nat. Mater. 12, 191–201 (2013).
2. Jain, A. et al. FireWorks: a dynamic workflow system designed for high-throughput applications. Concurr. Comput. 27, 5037–5059 (2015).
3. Pizzi, G., Cepellotti, A., Sabatini, R., Marzari, N. & Kozinsky, B. AiiDA: automated interactive infrastructure and database for computational science. Comput. Mater. Sci. 111, 218–230 (2016).
4. Mortensen, J., Gjerding, M. & Thygesen, K. MyQueue: task and workflow scheduling system. J. Open Source Softw. 5, 1844 (2020).
5. Greeley, J., Jaramillo, T. F., Bonde, J., Chorkendorff, I. & Nørskov, J. K. Computational high-throughput screening of electrocatalytic materials for hydrogen evolution. Nat. Mater. 5, 909–913 (2006).
6. Kirklin, S., Meredig, B. & Wolverton, C. High-throughput computational screening of new Li-ion battery anode materials. Adv. Energy Mater. 3, 252–262 (2013).
7. Zhang, Z. et al. Computational screening of layered materials for multivalent ion batteries. ACS Omega 4, 7822–7828 (2019).
8. Chen, W. et al. Understanding thermoelectric properties from high-throughput calculations: trends, insights, and comparisons with experiment. J. Mater. Chem. C 4, 4414–4426 (2016).
9. Bhattacharya, S. & Madsen, G. K. High-throughput exploration of alloying as design strategy for thermoelectrics. Phys. Rev. B Condens. Matter 92, 085205 (2015).
10. Castelli, I. E. et al. Computational screening of perovskite metal oxides for optimal solar light capture. Energy Environ. Sci. 5, 5814–5819 (2012).
11. Hautier, G., Miglio, A., Ceder, G., Rignanese, G.-M. & Gonze, X. Identification and design principles of low hole effective mass p-type transparent conducting oxides. Nat. Commun. 4, 1–7 (2013).
12. Yu, L. & Zunger, A. Identification of potential photovoltaic absorbers based on first-principles spectroscopic screening of materials. Phys. Rev. Lett. 108, 068701 (2012).
13. Kuhar, K., Pandey, M., Thygesen, K. S. & Jacobsen, K. W. High-throughput computational assessment of previously synthesized semiconductors for photovoltaic and photoelectrochemical devices. ACS Energy Lett. 3, 436–446 (2018).
14. Thygesen, K. S. & Jacobsen, K. W. Making the most of materials computations. Science 354, 180–181 (2016).
15. Saal, J. E., Kirklin, S., Aykol, M., Meredig, B. & Wolverton, C. Materials design and discovery with high-throughput density functional theory: the open quantum materials database (OQMD). JOM 65, 1501–1509 (2013).
16. Jain, A. et al. Commentary: The materials project: a materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).
17. Curtarolo, S. et al. AFLOW: an automatic framework for high-throughput materials discovery. Comput. Mater. Sci. 58, 218–226 (2012).
18. Godby, R., Schlüter, M. & Sham, L. Accurate exchange-correlation potential for silicon and its discontinuity on addition of an electron. Phys. Rev. Lett. 56, 2415 (1986).
19. Hüser, F., Olsen, T. & Thygesen, K. S. Quasiparticle GW calculations for solids, molecules, and two-dimensional materials. Phys. Rev. B Condens. Matter 87, 235132 (2013).
20. Shishkin, M. & Kresse, G. Self-consistent GW calculations for semiconductors and insulators. Phys. Rev. B Condens. Matter 75, 235102 (2007).
21. Borlido, P. et al. Exchange-correlation functionals for band gaps of solids: benchmark, reparametrization and machine learning. npj Comput. Mater. 6, 1–17 (2020).
22. Garcia-Lastra, J. M., Rostgaard, C., Rubio, A. & Thygesen, K. S. Polarization-induced renormalization of molecular levels at metallic and semiconducting surfaces. Phys. Rev. B Condens. Matter 80, 245427 (2009).
23. Hedin, L. New method for calculating the one-particle Green’s function with application to the electron-gas problem. Phys. Rev. 139, A796 (1965).
24. Hybertsen, M. S. & Louie, S. G. Electron correlation in semiconductors and insulators: band gaps and quasiparticle energies. Phys. Rev. B Condens. Matter 34, 5390 (1986).
25. Aryasetiawan, F. & Gunnarsson, O. The GW method. Rep. Prog. Phys. 61, 237 (1998).
26. Golze, D., Dvorak, M. & Rinke, P. The GW compendium: a practical guide to theoretical photoemission spectroscopy. Front. Chem. 7, 377 (2019).
27. Nabok, D., Gulans, A. & Draxl, C. Accurate all-electron G0W0 quasiparticle energies employing the full-potential augmented plane-wave method. Phys. Rev. B Condens. Matter 94, 035118 (2016).
28. Shishkin, M., Marsman, M. & Kresse, G. Accurate quasiparticle spectra from self-consistent GW calculations with vertex corrections. Phys. Rev. Lett. 99, 246403 (2007).
29. Schmidt, P. S., Patrick, C. E. & Thygesen, K. S. Simple vertex correction improves GW band energies of bulk and two-dimensional crystals. Phys. Rev. B Condens. Matter 96, 205206 (2017).
30. Lejaeghere, K. et al. Reproducibility in density functional theory calculations of solids. Science 351, aad3000 (2016).
31. Faber, C., Attaccalite, C., Olevano, V., Runge, E. & Blase, X. First-principles GW calculations for DNA and RNA nucleobases. Phys. Rev. B Condens. Matter 83, 115123 (2011).
32. Caruso, F., Rinke, P., Ren, X., Scheffler, M. & Rubio, A. Unified description of ground and excited states of finite systems: the self-consistent GW approach. Phys. Rev. B Condens. Matter 86, 081102 (2012).
33. Umari, P., Stenuit, G. & Baroni, S. GW quasiparticle spectra from occupied states only. Phys. Rev. B Condens. Matter 81, 115104 (2010).
34. Govoni, M. & Galli, G. Large scale GW calculations. J. Chem. Theory Comput. 11, 2680–2696 (2015).
35. Klimeš, J., Kaltak, M. & Kresse, G. Predictive GW calculations using plane waves and pseudopotentials. Phys. Rev. B Condens. Matter 90, 075125 (2014).
36. Rasmussen, F. A. & Thygesen, K. S. Computational 2D materials database: electronic structure of transition-metal dichalcogenides and oxides. J. Phys. Chem. C 119, 13169–13183 (2015).
37. Shishkin, M. & Kresse, G. Implementation and performance of the frequency-dependent GW method within the PAW framework. Phys. Rev. B Condens. Matter 74, 035101 (2006).
38. Rostgaard, C., Jacobsen, K. W. & Thygesen, K. S. Fully self-consistent GW calculations for molecules. Phys. Rev. B Condens. Matter 81, 085103 (2010).
39. Bruneval, F. & Marques, M. A. Benchmarking the starting points of the GW approximation for molecules. J. Chem. Theory Comput. 9, 324–329 (2013).
40. Haastrup, S. et al. The computational 2D materials database: high-throughput modeling and discovery of atomically thin crystals. 2D Mater. 5, 042002 (2018).
41. Enkovaara, J. E. et al. Electronic structure calculations with GPAW: a real-space implementation of the projector augmented-wave method. J. Phys. Condens. Matter 22, 253202 (2010).
42. Jiang, H. & Blaha, P. GW with linearized augmented plane waves extended by high-energy local orbitals. Phys. Rev. B Condens. Matter 93, 115203 (2016).
43. Jiang, H. Revisiting the GW approach to d- and f-electron oxides. Phys. Rev. B Condens. Matter 97, 245132 (2018).
44. Rozzi, C. A., Varsano, D., Marini, A., Gross, E. K. & Rubio, A. Exact Coulomb cutoff technique for supercell calculations. Phys. Rev. B Condens. Matter 73, 205119 (2006).
45. Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865 (1996).
46. Tiago, M. L., Ismail-Beigi, S. & Louie, S. G. Effect of semicore orbitals on the electronic band gaps of Si, Ge, and GaAs within the GW approximation. Phys. Rev. B 69, 125212 (2004).
47. Sundararaman, R. & Arias, T. Regularization of the Coulomb singularity in exact exchange by Wigner-Seitz truncated interactions: towards chemical accuracy in nontrivial systems. Phys. Rev. B Condens. Matter 87, 165122 (2013).
48. Ismail-Beigi, S. Truncation of periodic image interactions for confined systems. Phys. Rev. B Condens. Matter 73, 233103 (2006).
49. Hüser, F., Olsen, T. & Thygesen, K. S. Quasiparticle GW calculations for solids, molecules, and two-dimensional materials. Phys. Rev. B Condens. Matter 87, 235132 (2013).
50. Rasmussen, F. A., Schmidt, P. S., Winther, K. T. & Thygesen, K. S. Efficient many-body calculations for two-dimensional materials using exact limits for the screened potential: band gaps of MoS2, hBN, and phosphorene. Phys. Rev. B Condens. Matter 94, 155406 (2016).
51. Larsen, A. H. et al. The atomic simulation environment—a Python library for working with atoms. J. Phys. Condens. Matter 29, 273002 (2017).
Acknowledgements
We acknowledge funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (Grant No. 773122, LIMA). The Center for Nanostructured Graphene is sponsored by the Danish National Research Foundation, Project DNRF103. This project has received funding from the European Union’s Horizon 2020 research and innovation program under Grant Agreement No. 951786 (NOMAD CoE). T.D. acknowledges financial support from the German Research Foundation (DFG Project No. DE 2749/21).
Author information
Contributions
A.R. performed the statistical analyses and the full frequency-dependent self-energy calculations. T.D. performed the G_{0}W_{0} calculations. K.S.T. conceptualized the project. All authors interpreted the analyses and wrote the article.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Rasmussen, A., Deilmann, T. & Thygesen, K. S. Towards fully automated GW band structure calculations: What we can learn from 60.000 self-energy evaluations. npj Comput. Mater. 7, 22 (2021). https://doi.org/10.1038/s41524-020-00480-7