Towards high-throughput many-body perturbation theory: efficient algorithms and automated workflows

The automation of ab initio simulations is essential in view of performing high-throughput (HT) computational screenings oriented to the discovery of novel materials with desired physical properties. In this work, we propose algorithms and implementations that are relevant to extend this approach beyond density functional theory (DFT), in order to automate many-body perturbation theory (MBPT) calculations. Notably, a novel algorithm pursuing the goal of an efficient and robust convergence procedure for GW and BSE simulations is provided, together with its implementation in a fully automated framework. This is accompanied by an automatic GW band interpolation scheme based on maximally-localized Wannier functions, aiming at a reduction of the computational burden of quasiparticle band structures while preserving high accuracy. The proposed developments are validated on a set of representative semiconductor and metallic systems.


I. INTRODUCTION
Computational HT screening is nowadays a consolidated approach to materials discovery, [1][2][3] as a complementary and accelerated tool with respect to experimental efforts.In the last decade, seminal works in this field have addressed, among many other topics, the discovery of novel 2D materials, [4][5][6][7][8][9][10] the identification of optimal new lithium-ion battery anodes, 11,12 thermoelectric, 13,14 photocatalysts 15 and photovoltaic light harvesting [16][17][18] materials.[23][24][25][26][27][28][29] Concerning the electronic structure field, most of these works and implementations are based on DFT, which allows one to compute total energies, optimized geometries, and other ground state properties of materials with predictive accuracy.1][32][33][34][35][36] In this context, MBPT and Green's function methods represent the state-of-the-art tools, where charged (electronic quasi-particle levels) and neutral excitations (optical properties, electron energy loss spectra) can be obtained by means of the GW approximation and the Bethe-Salpeter equation (BSE), respectively. 37o date, a limited number of attempts have been made toward automation 38 and HT screening 9,39 based on these MBPT approaches, mainly because of their conceptual and computational complexity.Indeed, depending on the specific physical problem, different levels of theory might be adopted, 40 further branching off depending on the chosen approximations and implementations. 41,42Furthermore, even for the simplest approximations, these calculations require the control over a much larger parameter space with respect to DFT, with parameters that are often interdependent and might change depending on the specific implementations adopted, but the choice of which is always crucial to obtain reliable results.From a computational point of view, this type of simulations are constituted by a chain of distinct steps, often relying on the usage of different software tools, each of them with their own specificity, e.g., in terms of memory and parallelization requirements.Memory requirements are much heavier than in standard DFT simulations even for moderate system size, and often require massive usage of parallel computing resources.Calculations often fail due to memory overflow and have to be restarted with careful choice of parameters.All of these problems make the application of MBPT-based approaches a complex and difficult task per se, and its automation still an open challenge.
Building on pioneering works in the field, 38,39 we here focus on the development of an improved algorithm aiming at an efficient and computationally cost-effective management of the choice of converged parameters for accurate GW-BSE calculations.The algorithm is implemented in the AiiDA framework, 28,29 a platform that is routinely used for HT studies 6,[43][44][45] and that incorpo-rates the ADES model for Automation, Data, Environment and Sharing. 24As detailed below, this implementation allows us to encode an efficient error handling, memory and parallelization management, and logic computational flows within automated python workflows.Moreover, it guarantees a seamless interoperability of different software codes that tackle the different steps usually involved in MBPT simulations, i.e., the preliminary DFT part (here using Quantum ESPRESSO 46,47 ), the GW-BSE calculations (Yambo 48,49 ), and any required post-processing.In particular, for the latter, we here introduce a scheme based on maximally localized Wannier functions 50 for the automatic interpolation of GW band structures, which interfaces the Wannier90 51 and Yambo projects.All of these developments point to a drastic reduction of both human and computational efforts, key issues for enabling HT studies.In addition, by incorporating the different domain-specific scientific and computational competences into robust and reliable workflows, we aim at making accurate GW-BSE calculations available for the materials science community at large, including non-experts in the field (e.g., via graphical user interfaces 52 ), similarly to what has already happened with DFT.
The manuscript is organised as follows.In the Results and Discussion Section, we first introduce a model for the convergence surface in the N-dimensional space of the GW (BSE) variables, and present our improved algorithm for efficiently retrieving converged values for the main (interdependent) parameters.The implementation of this algorithm within the aiida-yambo plugin is then described, followed by the presentation of the novel aiida-yambo-wannier90 plugin that encodes the GW band interpolation based on Wannierization.Both these implementations are validated for selected prototypical systems.Additional details on simulations are provided in the Methods section.In the remainder of this section, we instead introduce the main concepts and quantities related to the GW and BSE schemes, which will be useful to properly understand the subsequent Sections.

GW approximation
Accurate electronic band structures of materials can be computed within the MBPT framework by correcting the Kohn-Sham (KS) DFT eigenvalues with a selfenergy term Σ by means of the GW approximation, i.e.
Σ is approximated with the first term of the perturbation series expansion in terms of the screened Coulomb interaction W . 53 We hereafter consider the simplest and most widespread implementation of GW, i.e. the so-called G 0 W 0 approach, where G 0 is the KS independent-particle one-body Green's function and W is computed within the random-phase approximation (RPA). 54Nevertheless, the convergence algorithm presented here can be used also for more sophisticated flavours of the theory, e.g. the selfconsistent GW.Under these assumptions and by considering a plane-wave expansion, the self-energy term Σ nk for a band n at a given k-point -written as the sum of the Fock exchange (Σ x ) and the frequency-dependent correlation (Σ c ) terms -is given by: where v G (q) and W GG (q, ω ) are the bare and screened Coulomb interaction, respectively.The generalized dipole matrix elements ρ nm (k, q, G) are defined as: where KS nk and |nk are the corresponding KS eigenvalues and eigenvectors.Here, W can be written in terms of the reducible polarizability χ (W = v + vχv), which is in turn computed by solving a Dyson equation for the RPA irreducible polarizability χ 0 : where In practice, the above quantities are computed by in-troducing specific approximations that crucially impact both the accuracy and the computational and memory costs of the calculations.In particular, Eq. ( 4) as well as Eq. ( 2) involve a discrete, finite number of G vectors, set by a cutoff parameter G cut , which defines the size of the response matrix.Moreover, with the parameter N b , we introduce a truncation over the KS states summation for both response χ 0 GG (Eq.( 5)) and self-energy Σ c (Eq. ( 2)), which should in principle include all occupied and an infinite number of empty states.Finally, all the integrals in reciprocal space are computed on a discrete k-points (q-points) grid whose size, N k , defines the accuracy of the sampling of the Brillouin zone (BZ).All of these three parameters, G cut , N b , and N k , need to be increased till the desired convergence is reached.
One of the major obstacles to automate the convergence procedure lies in the interdependence of the first two parameters, G cut and N b , such that their convergence has to be performed jointly, as thoroughly discussed elsewhere. 38,42,55Indeed, Eqs. ( 4) and ( 5) contains a summation over both empty states and reciprocal lattice vectors (G), and the expression of the generalized dipole terms in Eq. ( 3) is such that matrix elements with large G are governed by high-energy KS states. 38Furthermore, the interdependence is non-trivial, given the presence of an inversion in Eq. ( 4) that further enters in the evaluation of the correlation self-energy, Eq. ( 2).Given the lack of an efficient "recipe" to carry out this non-trivial, coupled convergence, one has to resort to an iterative convergence of each of the two parameters by fixing the other, in an alternating way.This procedure, in addition of being tedious, is computationally very expensive, sometimes representing the most cumbersome part of a GW calculation.

Bethe-Salpeter Equation
Starting from the GW quasi-particles, the solution of the BSE 54,56 can give access to optical properties of materials via the macroscopic dielectric function: where V is the volume of the unit cell; N q is the number of q-points sampling the Brillouin zone (BZ); E λ (q) is the eigenvalue of the exciton λ at momentum q = k − k , and the corresponding exciton oscillator strength Φ λ (q) is defined as: Equation ( 7) contains a summation over valence and conduction bands (v, c), and k-point mesh, which are the main parameters to converge for a BSE calculation.The terms A λ vck (q) represent the weight of each electronhole transition contributing the exciton λ, as resulting from the solution of the BSE.The summation over bands mainly defines the range of energies under investigation.The k-point mesh is connected to the accuracy with which we describe the exciton composition in terms of single-particle v → c transitions over the entire BZ, and it is usually significantly larger than the one needed to converge the corresponding GW band structure.Additional convergence parameters are the number of G vectors used for expanding the KS wave-functions in the transition matrix elements, as defined in Eq. ( 3), and for Fast-Fourier-Transform (FFT) operations, as well as the plane-wave expansion of the BSE kernel (both direct and exchange terms).These parameters are usually inherited from GW convergences.

Description of the convergence surface
The above-described coupled convergence of the parameters, combined with a much worse scaling than DFT and more computationally and memory demanding calculations, call for efficient procedures to describe and explore the convergence space in GW-BSE simulations.8][59] For a general (N+1) dimensional space, a model convergence surface f (x) that represents the value of a given observable E(x) (e.g., quasiparticle energies or excitonic eigenvalues) as a function of the N parameters x = [x 1 , ..., x N ] can be defined as: where A i , b i and α i are free fitting parameters, and B = N i b i is the extrapolated converged value.Of course, the accuracy of the latter depends on the actual region of the parameter space explored for the evaluation of the convergence behaviour, i.e., how far is the extrapolated value from the exact one.As such, B might not always be a good choice for guiding the search of the convergence parameters.
In the following, we introduce conditions on the mixed partial derivatives of Eq. ( 8) as a way to address the parameter interdependence.Indeed, the adoption of an analytical form for the description of the convergence space has the clear advantage of enabling the calculation of allorder derivatives, once the fitting parameters are known.On the other hand, not taking directly into account this interdependence can result in a very tedious convergence procedure, as it would require the cyclic repetition of multiple univariate convergences, as mentioned in the In-troduction (further details are provided in Supplemental Material 60 and Ref. 49).
For the expression in Eq. ( 8), the gradient components are: while second derivatives are: The asymptotic region of the convergence surface can be determined by imposing, for each parameter x i , x j with i, j = 1, ..., N , two conditions: The first condition determines the region in which the convergence surface becomes flat (thus approaching convergence), whereas the condition on second partial derivatives ensures that the N parameters are no longer interdependent.The threshold values ∆ i and ∆ ij can be tuned according to the desired accuracy. 79Once this asymptotic region has been determined, a guess for the converged value of f (x), E guess , is made, and its accuracy is then checked and validated according to the automated algorithm described below.

Convergence algorithm
Figure 1 schematically depicts our convergence algorithm, which is purposed to obtain the desired accuracy on GW-BSE results with the least possible number of calculations.In the following, we consider N b and G cut as the two interdependent parameters to be converged simultaneously.We remark that, while the algorithm is specifically designed to handle coupled convergences, it can be successfully used to accelerate convergence tests with respect to any other parameter, such as the BZ kpoint mesh or the FFT grids.
The first step (i) consists in the construction of the N-dimensional space of parameters as a grid of equally spaced points, with spacing and ranges provided from input.It is worth noting that, to fit the functional form of Eq. ( 8), one needs to generate a grid with minimum 3N points, since f (x) contains three free fitting parameters for each of the N dimensions of the parameter space.However, a further reduction on the minimal grid size (that is, on the minimum number of calculations to perform) can be obtained by fixing the power-law dependence, α i , to a given value, as suggested in Ref. 38, thus resulting in a minimal grid size of 2N.Usually, much denser grids are generated, where (ii) M 0 ≥ 2N calculations are performed on a subset of the grid points, chosen such as to efficiently sample the parameter space.
Next (iii), the results of the calculations are fitted by using the expression Eq. ( 8).As mentioned above, the power laws are fixed to given values: α i ∈ {1,2} ∀ i=1, ...N, and the one resulting in the lowest mean squared error is chosen.The asymptotic region is then identified by computing the first and second order derivatives (Eqs.( 9)-( 11)), and imposing the conditions in Eq. (12).Given the asymptotic region, a guess converged value is selected , where (N 0 b , G 0 cut ) are the lowest values that can be chosen for the parameters such that E guess is within a desired convergence threshold ∆ with respect to the asymptotic region.
To establish the accuracy and convergence of the fitting procedure, (iv) E(N 0 b , G 0 cut ) is computed and compared to the outcome of the fit E guess , by considering the cho-sen threshold ∆ (see Eq. ( 13)a below).If the accuracy condition is satisfied, we need to check the convergence of the fitting procedure, that is, a new pair of parameters (N 1 b , G 1 cut ) is obtained from the fit by adding the (N 0 b , G 0 cut ) point to the initial M 0 grid, and compared to the previous one (see Eq. ( 13)b below).The last step is repeated until convergence is reached.If the accuracy condition is not satisfied, the grid is instead shifted toward higher values of the parameters, and the steps (ii)-(iv) are repeated until the two conditions: are simultaneously satisfied for the j-th iteration.
The aiida-yambo plugin and automated workflows The above convergence algorithm has been implemented in the new version of the aiida-yambo plugin, 61 which is meant to fully automate GW-BSE calculations by interfacing the Yambo project 48,49 and the AiiDA informatics infrastructure and workflow management system. 28,29The automation concerns input generation, scheduler submission, and output parsing phases. 80Thanks to the AiiDA infrastructure, links between single calculations are managed on the fly by adhoc, dynamic workflows (the so-called workchains in the AiiDA jargon), i.e. their execution path is not fixed, but can depend on the results of completed calculations.This allows for the implementation of complex logics, such as those characterizing the convergence algorithm and GW band interpolation that we propose in this work.Moreover, each calculation, together with inputs and outputs, is stored in the AiiDA relational database, thus ensuring data provenance and full reproducibility of results.
Currently, the aiida-yambo plugin supports quasiparticle (G 0 W 0 and COSHEX 53 level) and optical properties (IP-RPA and BSE) simulations, as well as interfaces with different codes (e.g., Quantum ESPRESSO and Wannier90).These options are implemented in the YamboCalculation and YppCalculation classes, which manage individual simulations (including data interfacing) that can be performed by using the Yambo code.On top of them, task-specific workflows are implemented, and organized in a modular way, in order to automate tasks of increasing complexity.In particular, the aiida-yambo plugin contains three main workflows, each of them targeting a precise task: • YamboRestart: automation of error handling and restart for each YamboCalculation; • YamboWorkflow: automation of the single GW or BSE flow (composed of several interlinked steps, explained in the following); • YamboConvergence: automation of the convergence (composed of multiple YamboWorkflow runs).
Their nested organization is shown in Fig. 2.
The highest level workflow is YamboConvergence, which calls multiple YamboWorkflow workchains.YamboWorkflow comprises all the steps needed to perform individual GW-BSE calculations from scratch.In case of failures, it calls the YamboRestart workchain for automatic error handling.The outputs are stored in the AiiDA database in a human readable fashion, and are easily accessible and shareable by the user.
The highest level workflow is represented by the YamboConvergence workchain, which implements the full automation of the convergence algorithm of Sec.II, thus allowing for all Yambo simulations to be organized on the fly, without any external user intervention.The user is only requested to provide a python list containing the information on the parameter space to be explored.An example of such input reads: ' s t a r t ' : [ 5 0 , 5 0 , 2 ] , ' stop ' : [ 4 0 0 , 4 0 0 , 1 0 ] , ' d e l t a ' : [ 5 0 , 5 0 , 2 ] , 'max ' : [ 1 0 0 0 , 1 0 0 0 , 3 6 ] , where the Yambo variables "BndsRnXp" and "Gbn-dRnge" govern the convergence over empty states, N b , to be carried out jointly with that on the size of the response matrix, G cut ("NGsBlkXp" variable).The edges of the grid ("start" and "stop") and its spacing ("delta"), together with an upper bound of the parameter space ("max"), limiting the search to computationally accessible calculations, are also set by the user.The key "what" indicates the quantity to be converged -in our example, the direct band gap of the material at Γ pointup to a given convergence threshold ∆ ("conv thr" key).The output summarizes the convergence history and allows the user to easily parse the converged simulation.YamboConvergence allows one to converge several manybody quantities, like quasiparticle levels, band-gaps, as well as optical excitation energies.Notably, the convergence block in Fig. 2 can be skipped if converged parameters are already known.
Each single GW (BSE) calculation is instead automated within YamboWorkflow, which is the core workchain of the plugin that takes care of performing all the steps needed in a typical Yambo simulation -from preliminary self-consistent (SCF) and non-self-consistent (NSCF) DFT calculations to the actual GW (BSE) calculations, and the related post-processing.The workflow ensures a robust interoperability between DFT and MBPT codes (Quantum ESPRESSO and Yambo, respectively), and links subsequent calculations, interfacing the data automatically.In practice, YamboWorkflow encodes the specific flowchart underlying each requested calculation, and allows for its dynamic execution according to the instructions provided in input.This implies performing all the intermediate steps needed for a specific calculations without the need of instructing them explicitly, or, on the contrary, to skip some of the intermediate steps for which parent calculations are available, fully exploiting the YamboWorkflow provenance information.
To support a restart mechanism in case of code failures, YamboWorkflow takes advantage of the YamboRestart workchain, a sub-level workflow that encodes an automatic error handler (inherited from the AiiDA BaseRestartWorkchain class) which, depending on the encountered failure, automatically instructs a restart run.For out-of-memory errors or failures connected with insufficient wall-time requests, YamboRestart automatically resubmits the calculation by appropriately changing the requested resources (e.g., the maximum wall-time and the MPI/OpenMP balance); parallelization errors are managed by overwriting the parallelism variables set in input by the user with the default parallelism decided on the fly by yambo.In all these cases, an efficient, CPUtime saving restart mechanism is implemented, which avoids to restart unfinished runs from scratch by automatically retrieving and enabling the reuse of stored data files.
As a final issue, we would like to discuss the possibility to develop protocols for MBPT calculations.Indeed, most of the DFT-based AiiDA plugins enable the use of protocols, 62 that is, the possibility of creating inputs with pre-populated default values for several parameters.Such protocols are usually code-agnostic and robust, given the high level of reproducibility of DFT with different quantum engines. 19Moreover, their reliability is guaranteed by means of large scale studies spanning systems with a wide variety of characteristics (i.e., metals, semiconductors, dimensionality and so on). 44oncerning MBPT calculations, the possibility to define protocols is still an open issue. 38First of all, codeagnostic parameters are not at all easy to be determined as it is for DFT-based codes, because the MBPT implementations and the subsequent definition of parameters can differ in very many aspects, as highlighted in the Introduction section.Secondly, the high computational cost of these simulations has limited so far the number of systems to be studied extensively, which is crucial to define a reliable statistics on convergence parameters.Last but not least, DFT-based protocols usually result in safe but overconverged parameters, an approach that might lead to unfeasible calculations when moving to the GW-BSE framework.
For all these reasons, we believe that an efficient, fully automated convergence tool, as the one presented here, is currently the most valuable solution.Nonetheless, also in view of possible future developments, the aiida-yambo plugin provides an implementation for a protocol framework for both GW and BSE simulations, which is currently pre-populated on the basis of previous experience on a limited subset of systems.Such protocols concern several parameters connected to the main yambo input variables, such as the FFT grids ("FFTGvecs"), the summation over empty states ("BndsRnXp" and "Gbn-dRnge"), the plane-wave expansion for the polarizability ("NGsBlkXp") and the BZ k-point sampling.A detailed documentation on these and other aspects concerning the aiida-yambo plugin is provided elsewhere. 63tomatic GW bands interpolation: the aiida-yambo-wannier90 plugin GW band interpolation from Wannierization is a crucial task in order to obtain the most accurate quasiparticle band structure with the lowest computational cost.This task is encoded in the aiida-yambo-wannier90 plugin. 64,65Essentially, the plugin provides a metaworkflow, called YamboWannier90WorkChain, which uti- lizes the automation and error handling of the underlying aiida-yambo and aiida-wannier90-workflows plugins for GW convergence and Wannierization, respectively.The flowchart of the workflow is summarized in Fig. 3. Starting from a given crystal structure, the workflow first launches a YamboConvergence workflow for automatic convergence.Then, it finds the minimal commensurate mesh with the Wannier90 ones that satisfies the GW convergence conditions (see below).Thirdly, it runs YamboWorkflow to compute all the quasiparticle corrections required for the Wannierization on the commensurate mesh, and a subsequent ypp calculation (by means of YppRestart) to extract the GW corrections in a Wannier90 eig file format.Fourthly, the workflow Wannierizes the KS wavefunctions, saving the unitary transformation matrices of maximal localization, and interpolates the band structure.Finally, the workflow performs the Wannierization procedure at G 0 W 0 level, which consists in incorporating the GW corrections into the DFT eigenvalues, and interpolating the band structure by using the DFT Wannierization outcomes.

Structure
A crucial step of the workflow is finding a commensurate mesh for both GW QP calculations and Wan-nierization.Indeed, the GW mesh resulting from automated convergence might not always be compatible with the mesh required by Wannier90 to ensure interpolation accuracy.Notably, considering a Monkhorst-Pack (MP) grid for the Wannierization, the corresponding GW mesh must be an integer multiple of the MP grid.We here propose a recipe to find the minimal commensurate meshes for GW and Wannier90 calculations, as depicted in Fig.  n c ) are already two good solutions.In fact, the optimal solution is often inside the triangular region determined by the input (n d , n c ), s low , and s high .The final solution is chosen according to the 1 distance to the input, such to ensure the minimal increase in computational cost.It is also possible to change the metric, e.g., pushing the solution towards increasing the Wannier mesh or GW mesh, depending on which calculation is less cumbersome.The aforementioned recipe is repeated for each of the three dimensions of the MP grid.

Validation of the workflows
The proposed convergence algorithm, as implemented in the YamboConvergence workchain, has been validated by performing convergence studies for the quasiparticle G 0 W 0 gap of a set of prototypical semiconductors: silicon, diamond, ZnO, rutile TiO 2 , monolayer MoS 2 , bulk and monolayer hBN.The convergence addresses the direct band gap at the Γ point with respect to the two coupled parameters, N b and G cut , and the k-point grid (in terms of k-points density ρ k ).Once convergence is achieved, the minimum band gap E G0W0 gap is also computed.The results are summarized in Table I, where, together with E G0W0 gap , we report the final parameters resulting from the convergence procedure (N b , G cut , and ρ k ) as well as the convergence threshold (absolute, ∆ Γ , and relative, ∆ Γ % ) adopted in each case.Our results are found in good agreement with previous findings, which are also reported in the Table I.Deviations with respect to reference results can be ascribed to different GW implementations and/or different DFT starting points used in the related works. 41Further details are contained in the Supplemental Material. 60igure 5 shows the convergence procedure for monolayer hBN, considering both the joint convergence with respect to N b and G cut (panel a), and the single-   parameter convergence with respect to the k-mesh (panel b).Starting from an input grid with N b ∈ [200,800] and G cut ∈ [4,16] Ry (Figure 5a, blue shaded area), a subset of 6 calculations is performed (black squares).Since the converged guess is above the upper bound of the parameter space (see "max" variable in the plugin description), a new shifted grid (orange shaded area) is considered, and a first guess for the converged parameters is found from the fitting procedure (blue square), now satisfying the accuracy condition (Eq.( 13)a).A new fit is performed including this additional point, which results in a new converged guess (red square).The procedure is repeated till the converged result is verified to be consistent with the prediction within the given threshold (Eq.( 13)a) and to be the true converged point (Eq.( 13)b).A similar path is followed for the k-mesh convergence (Figure 5b): a limited number of calculations is initially performed (black squares), from which the fitting is evaluated (blue curve), and the smallest grid compatible with the given threshold is finally selected (here 8×8×1).We note that, despite the results of the simulations seem to have an oscillating behaviour with respect to the fitted curve, the error bar considered here (from the 14×14×1 mesh) is ∼ 0.13% of the band gap at Γ, i.e. ∼ 10 meV, in line with the accuracy of state-of-the-art GW calculations.The dependence of the outcome of the algorithm on the input settings (e.g., the initial value of the parameters and the boundaries of the convergence space) is a key issue for evaluating the robustness of the algorithm itself.Indeed, the convergence procedure has been tested on the starting grid for monolayer MoS 2 .By using two different grids, [8,24] Ry, the convergence point obtained from the workflow remains the same, i.e., (N b ,G cut ) = (400,8).Another important issue to evaluate is the efficiency of the algorithm.We notice that, in the case of the 2D-hBN convergence shown in the left panel of Figure 5, only 14 calculations where required to reach convergence.Older implementations 49 would have required ≥ 25 simulations to achieve a final result, with over 40% reduction.
Next, the YamboWannier90WorkChain as been tested on bulk Si and Cu, in order to validate the automatic Wannier interpolation of GW band structures for both semiconductors and metals.Results are plotted in Fig. 6, where we compare the DFT bands, the Wannier inter- polated DFT, and the Wannier interpolated GW bands.For Si (Fig. 6a), the comparison between computed and interpolated DFT bands shows that the results are almost identical, indicating the accuracy of the Wannierization of the KS wavefunctions.Moreover, the typical band gap opening upon inclusion of GW corrections is found when comparing KS and QP band structures.
For Cu (Fig. 6b), we obtain a discrepancy of ∼ 10 meV around the Fermi energy (here set to zero) between computed and interpolated bands at the DFT level.Better accuracy can be achieved imposing more stringent values of the involved parameters.At QP level, the GW correction is very small around the Fermi level (∼ 37 meV), but still not negligible.Here, the GW convergence is more stringent than for Si, especially concerning the k-point mesh.Indeed, denser grids are needed to account for the contribution of intra-band transitions in the q→0 limit, which is crucial for metallic systems but not explicitly included within the plasmon pole approximation. 68,69Considering the converged parameters, (N b , G cut , ρ k ) = (400, 18 Ry, 0.2 Å−1 ), the quasiparticle evaluations required to interpolate the bands for the minimum converged Wan-nier90 k-point mesh (16×16×16) become 2900.This quite large number of QP can be easily computed using the YamboWorkflow workchain thanks to the possibility to split the QP calculation in several runs, each of them computing only a fraction of the GW corrections, and then collecting all the data in a final database well-suited for the YamboWannier90WorkChain. Since the number of QP corrections to compute can be quite high, in the Supporting Information (section S.IIIA) we suggest an effective way to reduce the number of the required calculations for cases when accurate QP corrections are needed only in a limited energy region, e.g.around the Fermi energy, and energies outside of the chosen region can be approximated e.g. through a scissor and stretching correction.

III. CONCLUSIONS
In this work, we have presented the successful design and implementation of advanced algorithms in state-ofthe-art GW (BSE) calculations, that is, convergence between interdependent parameters, error handling and automatic band interpolation by means of Wannierization.We validated the tools on selected cases among semiconductors and metallic systems.The results contained in this work clearly show the power of these newly developed workflows for the automated study of excited states properties of materials, paving the way for achieving highthroughput MBPT studies.Thanks to these developments and within the next-generation of pre-exascale and exascale supercomputers, these simulations may become extensively and routinely performed by the materialsscience community in the near future.

IV. METHODS
For all the systems studied here, we used symmetrized geometries in such a way to reduce the computational cost of simulations.We do not expect relevant differences in the results obtained with fully-relaxed structures.DFT simulations were carried out by using the Quantum ESPRESSO simulation package, which implements plane-wave basis set and pseudopotential approach.The KS-DFT exchange-correlation functional was approximated using GGA-PBE, 70 through the optimized norm-conserving Vanderbilt (ONCV) SG15 71,72 pseudopotentials.In the case of ZnO, we adopted Local Density Approximation (LDA), to compare the results with the existing literature, 41,42 and PseudoDojo pseudopotentials. 73GW and BSE results were obtained by means of the Yambo code.The frequency dependence of the screened interaction potential was approximated by using the Godby-Needs plasmon pole approximation 69 (GNPPA), and the quasiparticle energies were calculated according to the G 0 W 0 approximation, 53,54 as implemented in Yambo.The Bruneval-Gonze technique 74 was used to reduce the number of empty states N b needed for the construction of the correlation Self-Energy Σ c (Eq. ( 2)).For low-dimensional systems, spurious interactions between supercell replica were avoided using a slab truncation of the Coulomb potential 75 along the non-periodic direction; its divergences are cured by means of the Random Integration Method 76 (RIM), which also accelerates convergence with respect to the BZ sampling.For 2D systems, specifically, we adopted a recently developed accelerating tech-nique based on stochastic integration of the screened potential, 77 which allows to have GW-converged results using reduced Monkhorst-Pack k-points grids, just slightly denser than the DFT one.Finally, Wannierization and band interpolations are performed by means of the Wan-nier90 code.All the simulations are performed using the automated workflows implemented in the aiida-yambo and aiida-yambo-wannier90 plugins, developed for the AiiDA platform and presented here as part of the results achieved in this work.Input parameters are generated using the protocols procedure, as implemented in the corresponding plugins.
from the corresponding GitHub repositories, already referenced throughout the manuscript, see Refs.61,64.

I. CONVERGENCE PLOTS FOR THE SYSTEMS STUDIED IN THIS WORK
In the following, we show the convergence plot for the systems studied in this work, as done for in the main text for the monolayer hBN case.We converged the Γ − Γ band gap with respect to the two coupled parameters N b and G cut and the k-point grid as well, except for TiO 2 and ZnO.For diamond, we converged also the FFTGvecs parameter, governing the Fast-Fourier-Transform (FFT) size for the evaluation of the generalized dipole matrix elements.can be left intact.This can speed up large-scale calculations where the file size of these matrices are large, since usually the disk read/write speed is the slowest during the calculations.Moreover, reusing the U mnk means it will reach the same minimum as the Wannierization at DFT level, thus the Wannier interpolation accuracy of GW eigenenergies is expected to be similar to that at DFT level.

A. Effective band interpolation around given energy windows to reduce computational costs
The calculation of a large number of quasiparticle corrections can become easily computationally unfeasible for large systems.For example, in the case of PdCoO 2 , ? a bulk metallic system of interest for Fermi surface measurements, the minimum number of GW eigenvalues needed to obtain the interpolated band structure is N QP ∼ 3 • 10 4 .We propose here a method to effectively reduce the number of GW evaluation needed, if only a given energy region of interest requires an accurate band interpolation.Within this approximation, the quasiparticle correction to KS eigenvalues is introduced in such a way to be the exact one in a given energy range, and replaced by a scissor and stretching evaluation outside.The exact GW correction goes continuously into the scis- The smearing parameter T is tuned in such a way to reduce the bump between the two different types of corrections.The E center , in terms of KS energies, is user defined.In Fig. 13 are shown the results of the interpolation for different values of µ and T , considering E center =E KS F ermi .We observe that results are in very good agreement (≤ 10 −2 eV) for T ≤ 10 −2 eV.In particular for the metals case, we are often interested only in the region around the Fermi level, for example to compute the Fermi surface.This means that we require the exact GW correction to DFT only for a small region belonging to an energy window supposed to include the Fermi level, approximating the corrections for the other eigenvalues in such a way to still have a good Wannier interpolation but reducing the number of explicit GW eigenvalues.Anyway, we stress that since G 0 W 0 is not particle conserving, a large region centered at the DFT Fermi energy is needed in order to possibly include the new and unknown GW Fermi level, that can be calculated a posteriori.FIG.13: FD and scissor corrections for Silicon.We observe a rather good agreement, around 10 −2 eV, between the exact solution and the approximated ones for T≤ 10 −2 eV.Ecenter here is chosen to be the KS-DFT Fermi level.

FIG. 1 :
FIG.1: Flowchart of the convergence algorithm.After generating the grid for the N-dimensional parameter space, a subset of M simulations is performed.The results are then fitted to predict the converged parameters.Finally, the accuracy (Eq.(13)a) and the convergence (Eq.(13)b) of the prediction are verified, and the procedure iterated, if needed.

FIG. 3 :
FIG.3: Flowchart of the YamboWannier90WorkChain for the Wannier interpolation of GW band structures.After performing the Yambo convergence, the workflow searches for a commensurate k-point meshes for Yambo and Wannier90, and carries out the corresponding Yambo QP calculation.Given the QP corrections, the workflow proceeds with the Wannierization and the band interpolation at DFT and GW level.

4 .
Considering n d as the number of k-points chosen by the YamboConvergence workflow, and n c the number of k-points chosen by the Wannierization protocol (typically based on a k-point spacing, 0.2 Å−1 ) 45 , the target is to find a new (n d , n c ) such that the dense mesh n d = k • n c , where k ∈ N, i.e., natural number.The given input (n d , n c ) restricts the search space to a sector bounded by k low and k high (see Fig. 4), where k low = 1 (n d = n c = max(n d , n c ) is always a good solution), and k high = n d nc , where • indicates the ceiling integer.The search always succeeds since s low = (max(n d , n c ), max(n d , n c )) and s high = (k high • n c ,

FIG. 4 :
FIG.4: Recipe to find commensurate meshes for GW and Wannier90 calculations.Using input meshes(11, 5)  as an example, final commensurate meshes(12,6) are found.The k high and k low lines, that intersect in the origin, are respectively the imposed upper and lower bound for searching the commensurate meshes; the orange dots are possible solutions; the red dot is the chosen solution, which is the closest to the input in the metric of 1 norm.

FIG. 5 :
FIG. 5: Convergence algorithm applied to monolayer hBN.(a) Convergence of the direct quasiparticle band gap at the Γ point with respect to the coupled parameters N b and Gcut.The blue shaded area represents the starting grid, while the red shaded area the shifted grid obtained after the first iteration, according to the flowchart in Fig. 1.Black squares represent the actual calculations performed by the workflow; the blue square is the first guess for the converged parameters; the red square indicates the final, converged point.The colormap specifies the relative error with respect to the converged point.A maximum absolute error of ∆ = 42 meV is achieved for (N b , Gcut) = (1200,28 Ry), corresponding to a maximum relative error of ∆ % = 0.5 %.(b) Convergence of the direct quasiparticle band gap at the Γ point with respect to the k-mesh.The black points represent the actual calculations performed by the workflow, whereas the blue points are the ones obtained within the fitting procedure and used to predict the convergence.The final, converged mesh (red square) is achieved with five simulations.

FIG. 6 :
FIG. 6: Wannier interpolation of GW band structures.Band structure of Si (a) and Copper (b).Interpolated bands are plotted for both DFT (red dashed line) and GW (green solid line) eigenvalues, as compared to the DFT computed bands (black dots).

|E
sored and stretched one, and viceversa, through a Fermi-Dirac smearing, considering an energy window centered at E center and of width µ: KS −E center |−µ T ) × (A * E KS + b)

FIG. 12 :
FIG.12: Detailed flowchart of the YamboWannier90WorkChain for automated GW convergence and Wannier interpolated GW band structure.The workflow performs the Yambo searching of commensurate kpoint mesh between Yambo and Wannier90, and running the Yambo quasiparticle calculation.The QP correction is provided and the final steps of the flow comprise the Wannierizations and the band interpolations at DFT level and GW level.

TABLE I :
G0W0 convergence tests on prototypical semiconductors.
FIG.14: of provenance graph for YamboConvergence.Provided the inputs (the first column of boxes from the left), the YamboConvergence calls a series of YamboWorkflows to perform the needed runs of PwCalculation and YamboCalculation, represented by the red boxes.Outputs are then generated and analysed by the workflow.