Speeding up GW Calculations to Meet the Challenge of Large Scale Quasiparticle Predictions

Although the GW approximation is recognized as one of the most accurate theories for predicting materials excited states properties, scaling up conventional GW calculations for large systems remains a major challenge. We present a powerful and simple-to-implement method that can drastically accelerate fully converged GW calculations for large systems, enabling fast and accurate quasiparticle calculations for complex materials systems. We demonstrate the performance of this new method by presenting the results for ZnO and MgO supercells. A speed-up factor of nearly two orders of magnitude is achieved for a system containing 256 atoms (1024 valence electrons) with a negligibly small numerical error of ±0.03 eV. Finally, we discuss the application of our method to the GW calculations for 2D materials.

where ε r (ε a ) is the retarded (advanced) dielectric function. The screened exchange (SEX) 5 part of the self-energy, on the other hand, involves a summation only with occupied states. The convergence of GW results with respect to the number of conduction bands included in the above summations can sometimes be extremely slow, which has become both of a burden and a source of confusion since under-converged results reported by different groups can vary significantly. Our new method effectively allows including all conduction bands in the above two summations as explained below. We first use MgO as an example to illustrate the difficulty of scaling up conventional GW calculations for large systems. We then propose our new approach and demonstrate the performance of the newly developed method by showing results for MgO and ZnO. Finally, we discuss the application of our method to GW calculations for 2 dimensional (2D) materials. Due to the slow convergence of the GW results with respect to the k-point sampling density, interlayer spacing, and the number of conduction bands included in the calculations, reported quasiparticle band gap of monolayer MoS 2 ranges from 2.41 to 2.84 eV 14,17 , underlining the importance of affordable and efficient GW approaches for reliable predictions of the excited states properties of 2D systems.

The convergence issue in GW calculations.
To better illustrate the convergence issue in GW calculations, we show in Fig. 1 the convergence behavior of the calculated quasiparticle band gap of MgO (a two-atom primitive cell) as a function of the number of conduction bands included in the COH self energy summation and the kinetic energy cutoff for the dielectric matrices. Depending on the choice of these cutoff parameters, the calculated QP band gap varies from 7.25 to 7.90 eV. Once fully converged, we obtain a QP band gap of about 7.90 eV, in good agreement with the experimental value of 7.78 eV 18 and a previous calculation 11 . As it is shown in Fig. 1, a high kinetic cutoff (|G cut | 2 /2 ~ 40 Ry) for the dielectric matrices ε G,G′ (q, ω) and a large number of conduction bands (N c ~ 800) in the COH summation are required. If a small kinetic energy cutoff for the dielectric matrices is used, the band gap converges quickly but prematurely with respect to the number of band included in the summation.
With today's massively parallel computers, fully converged GW calculations for small systems such as 2-atom MgO can be done easily. The problem, however, quickly becomes intractable when one attempts to scale up the calculations for large systems containing hundreds of atoms because the number of conduction bands required in the COH and dielectric function calculations scales linearly with the system size (i.e., number of atoms). To put this problem in perspectives, suppose now we would like to carry out GW calculations for a 128-atom MgO supercell containing a defect (e.g., an F-center), the required number of bands would increase by 64 (i.e., 128/2) times. Not only it is extremely difficult to generate so many wave functions, storing these wave functions would also pose a serious challenge, not to mention performing the band summations in the GW calculations.
Scientific RepoRts | 6:36849 | DOI: 10.1038/srep36849 Considering the importance of predicting materials excited states properties, it is not surprising that there has been much work on improving the efficiency of the GW formalism for large systems beyond simple crystals. One direction is to introduce better (optimized) bases for representing the dielectric matrix and/or the self-energy operator beyond plane waves and KS eigen states [19][20][21] . The second direction is to reduce the number of conduction bands N c by replacing the summation over high-energy states with computationally tractable terms 7,8,11,22,23 , or completely removing the necessity of summing over unoccupied states [24][25][26] . Other efforts include applications of the Lanczos algorithm for calculating the dielectric functions 27,28 , the development of the stochastic GW approach 29 , and the separation of the dynamical and nonlocal correlations 30 in dynamical mean field theory (DMFT) calculations which may also be applied to GW calculations. These efforts underline the urgency of developing efficient GW methods to meet the challenge of fast and accurate predictions of the excited states properties of large systems such as nanostructures, defects in solids, and surfaces and interfaces. Some of these approaches, however, may require substantial modifications (or completely rewriting) existing GW codes, and the study of the convergence behavior of these methods is still in the early stage.
Replacing the band-by-band summation with an energy integration in GW calculations. Our approach to speeding up the calculation is inspired by the observation that the electron DOS of all materials can be divided into two regions: a low energy region in which the behavior of electronic states is strongly affected by the crystal potential (thus materials dependent) and a high energy region in which kinetic energy dominates and the DOS scales as − E V (0) xc as that for free electron systems, where V xc (0) is the averaged exchange-correlation potential of the material. Figure 2 compares the DOS of MgO and ZnO calculated within the LDA and that of the free-electron gas scaled with the volume of the respective unit cells. The DOS below certain energy (E 0 shown in left panel of Fig. 2) is what distinguishes one material from another. Above E 0 , the DOS can be well-approximated by that of free-electron gas.

Figure 2. Comparison between the DOS of MgO (left panel) and ZnO (right panel) calculated within the LDA (red) and that of free electron gas (black).
The DOS for the free electron gas is shifted with respect to the averaged exchange-correlation potential. The relative difference between the LDA DOS and that of free-electron gas DOS as a function of energy is also shown to gauge the quality of the approximation. This observation leads us to propose that the band summation in the GW calculations be divided into two regions: a low energy region (shaded area in Fig. 2) in which the band summation is carried out explicitly and a high-energy region in which the band summation is replaced (approximated) by an energy integration. For example, the static polarizability χ 0 is rearranged in the following form: is the approximated DOS for the system with a cell volume Ω, E 0 corresponds to the energy of band N 0 , and M vE (k, q, G) are the matrix elements between the valance band v and the conduction band at (or near) the energy E. In practical calculations, we find that the number of bands N 0 can be limited to less than 10 per atom for most systems. The auxiliary integral part can be carried out using simple numerical integration techniques by sampling the electronic states on an energy grid as illustrate in left panel of Fig. 2: where N E is the number of sampling points used in the numerical integration as shown schematically in the left panel of Fig. 2 with large blue dots. The same approach is applicable to the calculation of the COH self-energy. The integration can be carried on a uniform or nonuniform energy grid, and a typical energy step Δ E can range from 1 to 5 eV. Higher order numerical integration techniques may also be applied.

Demonstration of the accuracy and efficiency of the new method.
We now examine the accuracy of this new method. We first compare the GW band structure of MgO (2-atom primitive cell) calculated using the conventional approach and our new method as shown in left panel of Fig. 3. The difference is unnoticable at this scale. We have also tested the result for ZnO. The right panel of Fig. 3 shows the calculated GW band gap of ZnO as a function of number bands included in the COH self-energy summation (and in the dielectric matrix calculations). The difference between the results calculated using the two methods is again negligibly small (~0.02 eV). For the calculation using our new method, the number of integration points plus the number of bands within the energy window E 0 is about 350, to be compared with 3,000 bands in conventional GW calculations. We then test the stability and performance of our method using a reasonably small (but not too small) system, an MgO supercell containing 16 atoms, as an example. For this system, we choose N 0 to be 128. In other words, 8 conduction bands per atom are included in the explicit band-by-band summations in the calculation of the dielectric matrices and the COH self-energy. The auxiliary integral part is calculated on a uniform energy grid with a step Δ E ranging from 1.5 to 4.0 eV, and the integration is carried out up to an energy which is equivalent to including 8,000 (or 1,000 per 2-atom cell) conduction bands. The kinetic energy cutoff for the dielectric matrix is set at 40 Ry.
For such a small system, we can compare directly the results calculated using the new method and the conventional band-by-band summation approach, as shown in Table 1. As the energy integration step varies from 1.5 to 4 eV (while keeping N 0 fixed at 128), the number of conduction bands plus the number of the energy integration points decreases from 895 to 320, to be compared with 8,000 required in conventional calculations to achieve the same level of convergence, representing a speed-up factor of 9 to 25. It is rather encouraging that the results calculated using this new method are insensitive to the energy integration step Δ E, showing the stability of the this method. The GW band gap calculated with the conventional method is 7.86 eV, and that obtained from the new method only deviates by a negligible small amount of ± 0.02 eV.
Our method reduces the computational time of fully converged GW calculations for a 16-atom MgO supercell from about 5 hours to 15 minutes using 64 computing cores (8 Intel Xeon E5-2650 processors). We would like to mention that within this new approach, one only needs to calculate (and store) the wave functions of the N 0 low energy states plus selected high-energy conduction states at or near predefined energy grid points. The folded-spectrum method developed by Wang and Zunger 31 and an alternative approach proposed by Tackett and Ventra 32 are ideal for this purpose. Using these methods, one can efficiently calculate KS eigenfunctions at or near specified energy grid points without the need to calculate all KS states. Therefore, the speed-up factor shown in Table 1 also indicates the reduction in the required memory and disk space associated with storing the wave functions in GW calculations.
The most interesting and important aspect of our method is that the speed-up factor actually increases with increasing system size, enabling fully converged GW calculations for large systems containing hundreds of atoms. This is because with a given integration step size Δ E i , the weight of a single integration point, g(E i )Δ E i in Eq.5, which measures effectively the number of states contributing to the summation represented by a single state at E i , scales linearly with the system size. To demonstrate that our method can indeed be scaled up for large systems, we have carried out GW calculations for MgO systems containing 2 to 256 atoms. Table 2 shows the performance of our method, both in terms of accuracy and the speed-up factor. A speed up factor of over 80 times is achieved for the largest system containing 256 atoms (1,024 valence electrons), and the numerical integration error, measured by the calculated band gap, is only ± 0.03 eV for all systems. Thus the greater speed-up for larger systems does not compromise the accuracy of this method, and there is still room for improvement (for example, with improved numerical integration techniques). We would like to mention that such fully converged GW calculations for large oxide systems containing over 200 atoms would be extremely difficult using the conventional approach due to the vast amount of computational resources required (both in terms of the computational time and the memory and storage requirements).
Applications to 2D materials. GW calculations for 2D materials pose additional challenges due to the analytical behavior of the 2D electronic screening, which is different from 3 dimensional (3D) systems, and the need to remove interactions between period images [33][34][35][36] . The rapid change in the 2D dielectric function as → → q 0 requires rather dense k-point sampling even with clever integration techniques 33,36 ; the spurious interlayer interaction can be reduced with increasing interlayer distance and with the use of truncated Coulomb potential 33,34 . Unfortunately, even with these creative methods, fully converged GW calculations for 2D materials remain a serious challenge since the number of conduction bands required in the calculation scales linearly with cell volume. For example, Qiu et al. concluded that at least 6,000 bands are needed for properly converged GW results of monolayer MoS 2 .
Our method can be directly applied to 2D materials in the same way as 3D bulk materials. Using this new method, we thoroughly studied the convergence behavior of monolayer MoSe 2 . The left panel of Fig. 4 shows the convergence behavior of the calculated COH self-energy for the top of the valence band of MoSe 2 as a function  Table 2. Performance of the new method with increasing system size.
of the number of conduction bands and the kinetic energy cutoff of the dielectric function. Depending on the the choices of these parameters, the calculated Σ COH varies from less than − 11.5 eV to about − 9.5 eV. The right panel of Fig. 4 shows a similar convergence behavior of calculated quasiparticle band gap E g of MoSe 2 . The variation in band gap, although not as large as that in the COH enegy due to cancellation of errors between valence and conduction bands, is still significant. These results again demonstrate the importance of the band convergence issue in GW calculations, and often seemingly conflicting GW results can be traced to the convergence issue. We mention that GW calculations for 2D materials usually converge much slower with respect to k-point sampling than 3D systems 33,35 . if one employs truncated Coulomb potentials, which is usually required to avoid spurious influences from periodic images. As shown in Fig. 5, for monolayer MoS 2 and MoSe 2 , one has to increase the k-grid sampling to at least 18 × 18 × 1 in order to converge the quasiparticle band gap. In contrast, 6 × 6 × 6 k-grids are usually sufficient for typical bulk materials with small unit cells containing a few atoms. Our calculations make use of the truncation technique proposed by Ismail-Beigi 33 . This issue significantly increases the computational costs for GW calculations for 2D systems and makes the convergence test even more trickier and difficult due to possible cancellations of errors arising from various cut-off parameters. Our new method enables including effectively all conduction bands in the GW calculations with greatly reduced computational costs (by one to two orders of magnitude) compared with conventional methods. Using this new method, we have calculated GW band gap of free-standing monolayer MoSe 2 to be 2.28 eV, consistent with previously reported value 2.26 eV 15 , and the band gap of monolayer MoS 2 to be 2.64 eV, which is also in good agreement with a previously calculation 37 . These previous results were obtained with highly converged (but computationally expensive) conventional GW calculations.

Conclusion
In conclusion, we present an efficient integration method that can drastically speed up fully converged GW quasiparticle calculations for large systems. Our method takes advantage of the fact that the DOS of all materials resembles that of the free electron gas at high energies, thus the summation over high-energy conduction bands in both the dielectric function and self-energy calculations within the GW approximation can be well approximated  by a numerical integration on a sparse energy grid. Using this new method, we can now carry out fully converged GW calculations for large systems without the need to perform tedious convergence tests. We have demonstrated a nearly two orders of magnitude speed-up of GW calculations for a 256-atom MgO model system while achieving a numerical accuracy of ± 0.03 eV for the predicted band gap. Although we have presented results calculated within the so-called G 0 W 0 approximation and using the HL-GPP model for the dielectric function, we believe that our approach can be applied to various levels of self-consistent GW calculations and/or GW calculations without the use of plasmon-pole models. Our new method also works equally well for 2D systems. Our approach is conceptually simple and is very easy to be implemented in well-developed GW packages.

Methods
All GW results presented in this work are calculated within the G 0 W 0 approach using a modified version of the BerkeleyGW 16 package. The Hybersten-Louie generalized plasmon-pole (HL-GPP) model 5 is used to extend the static dielectric function to finite frequencies. The new method presented in this work, however, can be applied to other GW implementations including various self-consistent GW approaches. The DFT calculations within the LDA are carried out using the PARATEC 38 code. Norm-conserving Troullier-Martin pseudopotentials 39