Introduction

Density functional theory (DFT)1,2 has shown tremendous success in the calculation of electronic ground-state properties. However, it is well known that band gaps of solids and HOMO–LUMO gaps of molecules are often significantly underestimated when computed using Kohn–Sham (KS) eigenvalues3,4. In order to remedy this issue, the GW method5,6,7 is often employed in which a self-energy correction to the DFT KS energies is computed. The resulting quasiparticle energies are in agreement with experimental measurements for a wide range of materials. However, the large numerical effort required for GW calculations and the method’s unfavorable scaling with system size have traditionally restricted applications to relatively small systems8,9. The most expensive step is the computation of the interacting density–density response function (DDRF), which is closely related to the inverse dielectric matrix. In particular, the non-interacting DDRF is typically computed by carrying out a slowly converging summation over all unoccupied states8,10,11. Afterward, the non-interacting DDRF must be inverted to calculate the interacting DDRF.

To overcome these limitations of the GW approach, significant efforts have been made in recent years to develop scalable implementations12,13,14,15,16. Alternatively, model DDRFs (or model dielectric functions) have been developed to accelerate GW calculations. For example, Hybertsen and Louie constructed a model dielectric matrix based on the assumption that the local screening response of the material is similar to that of a homogeneous medium with the same local density17. A similar model was also proposed by Cappellini et al.18,19. However, it has proven difficult to generalize these model dielectric functions to highly non-uniform systems, such as isolated molecules or nano-clusters, whose screening properties differ substantially from those of uniform systems. To overcome this limitation, Rohlfing9 proposed expressing the dielectric matrix as a sum of atomic contributions, attributing to each atom a density response resulting from a Gaussian-shaped charge density. This model dielectric matrix contains a number of parameters that need to be determined, for example, by comparison to calculated RPA dielectric functions.

In recent years, machine learning (ML) techniques have been widely adopted to predict scalar properties of materials, such as the total energy. A key ingredient in ML approaches is the descriptor, which parametrizes the atomic and chemical structure of the material. Many descriptors used in computational chemistry are explicitly constructed to be invariant under rotations and translations: for example, ACE20, SOAP21, the Coulomb matrix22,23, bag-of-bonds24 and fingerprint-based descriptors have been shown to be reliable descriptors for the prediction of scalar quantities. When predicting tensors or functions, however, it is no longer sufficient to employ a rotationally invariant descriptor. To address this problem, Grisafi et al.25 developed a symmetry-adapted version of the SOAP kernel, which is equivariant under rotations and was successfully used in the prediction of polarizability tensors and first hyperpolarizabilities25,26, dipole moments27 and electronic densities28. Several other groups also explored ML approaches for the electronic density, including Brockherde et al.29, Alred et al.30, and Chandrasekaran and co-workers31. Moreover, group-equivariant neural networks (NNs), such as Clebsch–Gordan networks32,33,34, tensor-field networks35, and spherical convolutional NNs (CNNs)36,37, have seen significant developments in recent years, and their implementation has been greatly simplified by frameworks such as e3NN38 developed by Geiger et al.39, thus providing promising alternatives to the symmetry-adapted SOAP for the learning of functions.

In this work, we address the problem of predicting non-local response functions, such as the DDRF. Predicting such quantities is a formidable challenge: for example, the DDRF of a small silicon cluster can be tens of gigabytes in size when represented in a plane-wave basis, even when a modest plane-wave cutoff is used. To address this problem, we introduce a decomposition of the DDRF into atomic contributions, which can be predicted using ML techniques. To ensure that the ML model appropriately incorporates the transformation properties of the DDRF, we also develop a descriptor called the neighborhood density matrix (NDM), which transforms in the same way as the DDRF under rotations and is used in conjunction with a dense NN to predict the atomic contributions to the DDRF. We then use the ML DDRFs to carry out GW calculations of hydrogenated silicon clusters. This approach, which we refer to as the ML–GW method, produces accurate GW quasiparticle energies at a significantly reduced computational cost compared to standard implementations. We note that recently several attempts were made to use ML to directly predict quasiparticle energies in materials40,41,42. In contrast, the ML–GW approach still solves a physical model (the quasiparticle equation) but uses ML DDRFs to accelerate calculations.

Results

Theoretical results

The GW method yields accurate quasiparticle energies by applying a self-energy correction to the mean-field KS energy levels. The GW self-energy \(\Sigma ({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} },\omega )\) is calculated from the one-electron Green’s function \(G({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} },\omega )\) and the screened Coulomb interaction \(W({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} },\omega )\) according to 7,8,43

$$\Sigma ({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} },\omega )=\frac{i}{2\pi }\int\,{e}^{-i\delta {\omega }^{{\prime} }}G({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} },\omega +{\omega }^{{\prime} })W({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} },{\omega }^{{\prime} })d{\omega }^{{\prime} }$$
(1)

with δ denoting a positive infinitesimal. The screened Coulomb interaction is, in turn, computed from the bare Coulomb interaction \(v({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })\) and the inverse dielectric matrix \({\epsilon }^{-1}({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} },\omega )\) via

$$W({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} },\omega )=\int\,{\epsilon }^{-1}({{{\bf{r}}}},{{{{\bf{r}}}}}_{2},\omega )v({{{{\bf{r}}}}}_{2},{{{{\bf{r}}}}}^{{\prime} })d{{{{\bf{r}}}}}_{2},$$
(2)

which demonstrates that the dielectric matrix constitutes a key ingredient in GW calculations. It can be obtained from the interacting DDRF \(\chi ({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} },\omega )\) according to

$${\epsilon }^{-1}({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} },\omega )=\delta ({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })+\int\,v({{{\bf{r}}}},{{{{\bf{r}}}}}_{2})\chi ({{{{\bf{r}}}}}_{2},{{{{\bf{r}}}}}^{{\prime} },\omega )d{{{{\bf{r}}}}}_{2}.$$
(3)
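
On a discrete real-space grid (or in any orthonormal basis), Eqs. (2) and (3) reduce to matrix products. The following is a minimal sketch under that assumption, with integration weights absorbed into the matrices. The function names, the toy soft-Coulomb kernel, and the toy response matrix are illustrative choices and are not taken from any GW code.

```python
import numpy as np

def inverse_dielectric(v, chi):
    """Discretized Eq. (3): eps^{-1} = 1 + v chi."""
    return np.eye(v.shape[0]) + v @ chi

def screened_coulomb(v, chi):
    """Discretized Eq. (2): W = eps^{-1} v, which equals v + v chi v."""
    return inverse_dielectric(v, chi) @ v

# Toy inputs (illustrative only): soft-Coulomb kernel on a 1D grid and a
# symmetric, charge-neutral response matrix whose rows sum to zero.
x = np.arange(6, dtype=float)
v = 1.0 / np.sqrt((x[:, None] - x[None, :]) ** 2 + 1.0)
chi = -0.1 * (np.eye(6) - np.full((6, 6), 1.0 / 6.0))
W = screened_coulomb(v, chi)
```

Because W = v + v χ v in this form, a symmetric χ gives a symmetric W, and χ = 0 recovers the bare interaction.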

In the remainder of this paper, we will assume that the frequency dependence of the dielectric matrix can be approximated by the generalized plasmon-pole approximation (GPP)7,44,45. As a consequence, only the static DDRF \(\chi ({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })\equiv \chi ({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} },\omega =0)\) needs to be determined.

Within the random-phase approximation (RPA), the interacting static DDRF is given by

$$\begin{array}{l}\chi ({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })\,=\,{\chi }_{0}({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })\\ \qquad\qquad\,\,\, + \,\int\,d{{{{\bf{r}}}}}_{1}d{{{{\bf{r}}}}}_{2}{\chi }_{0}({{{\bf{r}}}},{{{{\bf{r}}}}}_{1})v({{{{\bf{r}}}}}_{1},{{{{\bf{r}}}}}_{2})\chi ({{{{\bf{r}}}}}_{2},{{{{\bf{r}}}}}^{{\prime} })\end{array}$$
(4)

with \({\chi }_{0}({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })\) denoting the static non-interacting DDRF, which is typically computed as a sum over empty and occupied states10,11 according to

$$\begin{array}{l}{\chi }_{0}({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })\,=\,\mathop{\sum}\limits_{ij}\frac{{f}_{i}(1-{f}_{j})}{{\epsilon }_{i}-{\epsilon }_{j}}\\ \qquad\qquad\quad \times \left[{\phi }_{i}^{* }({{{\bf{r}}}}){\phi }_{j}({{{\bf{r}}}}){\phi }_{j}^{* }({{{{\bf{r}}}}}^{{\prime} }){\phi }_{i}({{{{\bf{r}}}}}^{{\prime} })+\,{{\mbox{c.c.}}}\,\right].\end{array}$$
(5)

Here, ϵi, fi, and ϕi(r) denote the orbital energy, occupancy, and wavefunction of the KS state i.

Equations (4) and (5) highlight the two main challenges in computing the DDRF: (1) the calculation of the non-interacting DDRF requires a summation over all empty states, which converges slowly, and (2) the calculation of the interacting DDRF requires a matrix inversion, which scales unfavorably with system size.
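
Both steps can be illustrated on a toy model. The sketch below assumes real orbitals on a discrete grid, integer occupations, no spin degeneracy factor, and unit integration weights. The tight-binding chain and the soft-Coulomb kernel are hypothetical stand-ins for a real Kohn–Sham calculation, and the function names are not from any existing package.

```python
import numpy as np

def noninteracting_ddrf(energies, orbitals, n_occ):
    """Eq. (5) for real orbitals: sum over occupied (i) and empty (j) states."""
    n_grid, n_states = orbitals.shape
    chi0 = np.zeros((n_grid, n_grid))
    for i in range(n_occ):
        for j in range(n_occ, n_states):
            pair = orbitals[:, i] * orbitals[:, j]
            # The "+ c.c." term doubles the contribution for real orbitals.
            chi0 += 2.0 * np.outer(pair, pair) / (energies[i] - energies[j])
    return chi0

def interacting_ddrf(chi0, v):
    """Eq. (4) solved for chi: chi = (1 - chi0 v)^{-1} chi0."""
    n = chi0.shape[0]
    return np.linalg.solve(np.eye(n) - chi0 @ v, chi0)

# Toy model (illustrative only): open tight-binding chain, half filling.
n_sites = 8
h = -(np.eye(n_sites, k=1) + np.eye(n_sites, k=-1))
energies, orbitals = np.linalg.eigh(h)
x = np.arange(n_sites, dtype=float)
v = 1.0 / np.sqrt((x[:, None] - x[None, :]) ** 2 + 1.0)

chi0 = noninteracting_ddrf(energies, orbitals, n_occ=n_sites // 2)
chi = interacting_ddrf(chi0, v)
```

In this toy setting the double loop makes the cost of the sum over empty states explicit, and the linear solve plays the role of the matrix inversion. The resulting χ is symmetric and its rows and columns sum to zero, consistent with charge neutrality.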

In order to bypass the expensive computation of the DDRF and pave the way toward an ML approach, we propose to express \(\chi ({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })\) as a sum of atomic contributions \({\chi }_{i}({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })\) according to

$$\chi ({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })=\mathop{\sum }\limits_{i=1}^{N}{\chi }_{i}({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} }),$$
(6)

where i labels atoms and N is the total number of atoms.

How this partitioning is achieved is not immediately obvious. However, the atomic contributions to the DDRF should have the following properties: (1) the atomic contributions should be localized in the vicinity of the corresponding atom, (2) they should retain the global symmetry of χ, i.e., \({\chi }_{i}({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })={\chi }_{i}({{{{\bf{r}}}}}^{{\prime} },{{{\bf{r}}}})\), and (3) they should integrate to zero, i.e., \(\int\,{\chi }_{i}({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })d{{{\bf{r}}}}=\int\,{\chi }_{i}({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })d{{{{\bf{r}}}}}^{{\prime} }=0\), to ensure that the change in the charge density induced by a perturbing potential is overall charge neutral8.

We start by expressing the DDRF in a localized basis set of real orbitals \(\{{\phi }_{{\alpha }_{a}}^{a}({{{\bf{r}}}})\}\), where a labels the atom on which the basis function is centered and αa indexes the orbital on site a46. In this basis the DDRF is given by

$$\chi ({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })=\mathop{\sum}\limits_{a,{\alpha }_{a}}\mathop{\sum}\limits_{b,{\alpha }_{b}}{\chi }_{{\alpha }_{a}{\alpha }_{b}}^{ab}{\phi }_{{\alpha }_{a}}^{a}({{{\bf{r}}}}){\phi }_{{\alpha }_{b}}^{b}({{{{\bf{r}}}}}^{{\prime} }),$$
(7)

where \({\chi }_{{\alpha }_{a}{\alpha }_{b}}^{ab}\) is a symmetric matrix. This expression suggests the following decomposition of the DDRF into atomic contributions

$${\chi }_{i}({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })=\frac{1}{2}\mathop{\sum}\limits_{{\alpha }_{i}}\mathop{\sum}\limits_{b,{\alpha }_{b}}\left({\chi }_{{\alpha }_{i}{\alpha }_{b}}^{ib}{\phi }_{{\alpha }_{i}}^{i}({{{\bf{r}}}}){\phi }_{{\alpha }_{b}}^{b}({{{{\bf{r}}}}}^{{\prime} })+{\chi }_{{\alpha }_{b}{\alpha }_{i}}^{bi}{\phi }_{{\alpha }_{b}}^{b}({{{\bf{r}}}}){\phi }_{{\alpha }_{i}}^{i}({{{{\bf{r}}}}}^{{\prime} })\right).$$
(8)

We refer to the representation of the DDRF in the basis \(\{{\phi }_{{\alpha }_{a}}^{a}({{{\bf{r}}}})\}\) as 2-center DDRF (2C-DDRF) because it contains pairs of basis functions which are centered on different atoms.

Using the symmetry of \({\chi }_{{\alpha }_{i}{\alpha }_{b}}^{ib}\) and the fact that the basis functions are real, it can be easily verified that \({\chi }_{i}({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })={\chi }_{i}({{{{\bf{r}}}}}^{{\prime} },{{{\bf{r}}}})\). We can also ensure that \(\int\,{\chi }_{i}({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })d{{{\bf{r}}}}=0\) by removing all s-orbitals from the basis (see the computational methods section for details). The locality of \({\chi }_{i}({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })\) is directly inherited from the corresponding property of the full DDRF. In particular, we have found that the expansion coefficients \({\chi }_{{\alpha }_{i}{\alpha }_{b}}^{ib}\) decay rapidly as the distance between atom i and atom b increases47.

We stress that this atomic representation of the DDRF is exact, i.e., \({\sum }_{i}{\chi }_{i}({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })\) reproduces the full interacting DDRF when the local basis sets are complete. However, the atomic contributions to the DDRF contain contributions from pairs of basis functions that are centered on different atoms, see Eq. (8). These contributions are difficult to learn using atom-centered descriptors.
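
In matrix form, Eq. (8) assigns to atom i the symmetrized rows and columns of the coefficient matrix that belong to basis functions centered on that atom. The sketch below illustrates this bookkeeping, assuming the coefficient matrix of Eq. (7) is available as a dense symmetric array. The function name and the `atom_of_basis` index array are hypothetical.

```python
import numpy as np

def atomic_contributions(chi_coeff, atom_of_basis):
    """Split a symmetric coefficient matrix into per-atom parts, Eq. (8).

    atom_of_basis[k] is the index of the atom on which basis function k sits.
    Each part equals 0.5 * (P_i C + C P_i), with P_i the projector onto the
    basis functions of atom i.
    """
    atom_of_basis = np.asarray(atom_of_basis)
    parts = []
    for atom in range(atom_of_basis.max() + 1):
        on_atom = (atom_of_basis == atom).astype(float)
        rows = on_atom[:, None] * chi_coeff  # P_i C
        cols = chi_coeff * on_atom[None, :]  # C P_i
        parts.append(0.5 * (rows + cols))
    return parts

# Toy example: 5 basis functions distributed over 3 atoms.
rng = np.random.default_rng(0)
a = rng.normal(size=(5, 5))
chi_coeff = a + a.T
atom_of_basis = [0, 0, 1, 2, 2]
parts = atomic_contributions(chi_coeff, atom_of_basis)
```

Since the projectors sum to the identity, the parts add up to the full matrix, and each part is symmetric whenever the full matrix is.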

To make progress, we exploit the localization of \({\chi }_{i}({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })\) and expand it in terms of a set of basis functions \({\psi }_{nlm}^{i}({{{\bf{r}}}})={Y}_{lm}(\hat{{{{\bf{r}}}}}){R}_{n}(| {{{\bf{r}}}}| )\) (with Ylm denoting the spherical harmonics and Rn a set of radial functions), which are all centered on atom i according to

$$\begin{array}{l}{\chi }_{i}({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })\,=\,\mathop{\sum}\limits_{nlm}\mathop{\sum}\limits_{{n}^{{\prime} }{l}^{{\prime} }{m}^{{\prime} }}{\chi }_{nlm{n}^{{\prime} }{l}^{{\prime} }{m}^{{\prime} }}^{(i)} {Y}_{lm}(\hat{{{{\bf{r}}}}}){Y}_{{l}^{{\prime} }{m}^{{\prime} }}^{* }({\hat{{{{\bf{r}}}}}}^{{\prime} }){R}_{n}(| {{{\bf{r}}}}| ){R}_{{n}^{{\prime} }}^{* }(| {{{{\bf{r}}}}}^{{\prime} }| )\end{array}$$
(9)

with \({\chi }_{nlm{n}^{{\prime} }{l}^{{\prime} }{m}^{{\prime} }}^{(i)}\) denoting the expansion coefficients given by

$$\begin{array}{l}{\chi }_{nlm{n}^{{\prime} }{l}^{{\prime} }{m}^{{\prime} }}^{(i)}\,=\,\int\int\,d{{{\bf{r}}}}d{{{{\bf{r}}}}}^{{\prime} }{\chi }_{i}({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} }){R}_{n}^{* }(| {{{\bf{r}}}}| ){R}_{{n}^{{\prime} }}(| {{{{\bf{r}}}}}^{{\prime} }| )\,{Y}_{lm}^{* }(\hat{{{{\bf{r}}}}}){Y}_{{l}^{{\prime} }{m}^{{\prime} }}({\hat{{{{\bf{r}}}}}}^{{\prime} }).\end{array}$$
(10)

These coefficients can be learned using a NN based on atom-centered descriptors. We refer to the representation of the DDRF in the basis \(\{{\psi }_{nlm}^{i}({{{\bf{r}}}})\}\) as 1-center DDRF (1C-DDRF) because it only contains pairs of basis functions centered on the same atom.

As discussed in the introduction, it is not appropriate to use a scalar descriptor (such as the standard SOAP descriptor48) that is invariant under rotations to develop an ML model for the DDRF, because the behavior of the atomic DDRFs under rotations is determined by their analytical form, see Eq. (9). In particular, we show in the Supplementary Discussion that the coefficients of the atomic DDRF transform according to

$${\tilde{\chi }}_{nl{m}_{1}{n}^{{\prime} }{l}^{{\prime} }{m}_{2}}^{(i)}=\mathop{\sum}\limits_{m,{m}^{{\prime} }}{D}_{{m}_{1}m}^{l}(\hat{R}){D}_{{m}_{2}{m}^{{\prime} }}^{{l}^{{\prime} }* }(\hat{R}){\chi }_{nlm{n}^{{\prime} }{l}^{{\prime} }{m}^{{\prime} }}^{(i)},$$
(11)

where \({\tilde{\chi }}_{nlm{n}^{{\prime} }{l}^{{\prime} }{m}^{{\prime} }}^{(i)}\) denote the coefficients of the transformed DDRF, \(\hat{R}\) is a rotation and \({D}_{m{m}^{{\prime} }}^{l}(\hat{R})\) is a Wigner D-matrix49.
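
A restricted illustration of Eq. (11) is the block l = l′ = 1 in a real Cartesian basis, where the angular functions are proportional to (x, y, z) and the Wigner D-matrix reduces to the ordinary 3×3 rotation matrix. The sketch below makes these simplifying assumptions, drops the radial indices, and adopts the convention that the rotated function is f(R⁻¹r, R⁻¹r′). All names are hypothetical.

```python
import numpy as np

def rotation_matrix(axis, angle):
    """Rodrigues' formula for a rotation by `angle` about `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    k = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)

def angular_part(chi, r, r_prime):
    """Bilinear form f(r, r') = sum_ab chi_ab r_a r'_b."""
    return r @ chi @ r_prime

def rotate_coefficients(chi, rot):
    """l = l' = 1 analogue of Eq. (11): chi -> R chi R^T."""
    return rot @ chi @ rot.T

rng = np.random.default_rng(1)
chi = rng.normal(size=(3, 3))
chi = chi + chi.T
rot = rotation_matrix([1.0, 2.0, 3.0], 0.7)
chi_rot = rotate_coefficients(chi, rot)
r, r_prime = rng.normal(size=(2, 3))
```

Evaluating the rotated coefficients at (r, r′) gives the same value as evaluating the original coefficients at the back-rotated points, while the individual entries of the coefficient matrix change. This orientation dependence is exactly what a rotationally invariant descriptor cannot capture.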

Next, we construct the NDM descriptor, which transforms under rotations in the same way as the atomic DDRF. The starting point for such a descriptor is a non-local extension of the smooth neighborhood density of atom i of species η employed in the SOAP descriptor21, defined as

$${\rho }_{i}^{\eta }({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })=\mathop{\sum}\limits_{k\in \eta }\mathop{\sum}\limits_{l\in \eta }{e}^{-\alpha {({{{\bf{r}}}}-{{{{\bf{r}}}}}_{k})}^{2}}{e}^{-\alpha {({{{{\bf{r}}}}}^{{\prime} }-{{{{\bf{r}}}}}_{l})}^{2}},$$
(12)

where k and l run over atoms in the neighborhood of atom i within a cut-off radius Rcut and α is a hyperparameter that describes the size of an atom. The NDM is then expanded in a basis of spherical harmonics and radial basis functions Rn(r) according to

$$\begin{array}{l}{\rho }_{i}^{\eta }({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })\,=\,\mathop{\sum}\limits_{nlm}\mathop{\sum}\limits_{{n}^{{\prime} }{l}^{{\prime} }{m}^{{\prime} }}{\rho }_{nlm{n}^{{\prime} }{l}^{{\prime} }{m}^{{\prime} }}^{(i,\eta )}\,{Y}_{lm}(\hat{{{{\bf{r}}}}}){Y}_{{l}^{{\prime} }{m}^{{\prime} }}^{* }({\hat{{{{\bf{r}}}}}}^{{\prime} }){R}_{n}(| {{{\bf{r}}}}| ){R}_{{n}^{{\prime} }}^{* }(| {{{{\bf{r}}}}}^{{\prime} }| ),\end{array}$$
(13)

with \({\rho }_{nlm{n}^{{\prime} }{l}^{{\prime} }{m}^{{\prime} }}^{(i,\eta )}\) being expansion coefficients. The above equation shows that the NDM transforms in the same way as the atomic DDRF: see Supplementary information for additional details. Therefore, we use the expansion coefficients as a descriptor for learning the DDRF.

We note that the NDM can be written as the product of two neighborhood densities \({\rho }_{i}^{\eta }({{{\bf{r}}}})={\sum }_{k\in \eta }\exp \{-\alpha {({{{\bf{r}}}}-{{{{\bf{r}}}}}_{k})}^{2}\}\) according to

$${\rho }_{i}^{\eta }({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })={\rho }_{i}^{\eta }({{{\bf{r}}}}){\rho }_{i}^{\eta }({{{{\bf{r}}}}}^{{\prime} }).$$
(14)
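
The factorization in Eq. (14) can be checked numerically by evaluating Eq. (12) on a set of sample points. The sketch below assumes a single atomic species, random neighbor positions, and an arbitrary value of α. None of these numbers come from the calculations reported here.

```python
import numpy as np

def gaussians(points, neighbors, alpha):
    """exp(-alpha |r_p - r_k|^2) for every sample point p and neighbor k."""
    diff = points[:, None, :] - neighbors[None, :, :]
    return np.exp(-alpha * np.sum(diff ** 2, axis=-1))

def neighborhood_density(points, neighbors, alpha):
    """Smooth neighborhood density rho(r): sum of Gaussians over neighbors."""
    return gaussians(points, neighbors, alpha).sum(axis=1)

def neighborhood_density_matrix(points, neighbors, alpha):
    """Eq. (12): double sum over neighbor pairs (k, l)."""
    g = gaussians(points, neighbors, alpha)
    return np.einsum("pk,ql->pq", g, g)

rng = np.random.default_rng(2)
neighbors = rng.uniform(-2.0, 2.0, size=(4, 3))   # hypothetical neighbor positions
points = rng.uniform(-3.0, 3.0, size=(10, 3))     # sample points r and r'
rho = neighborhood_density(points, neighbors, alpha=0.5)
ndm = neighborhood_density_matrix(points, neighbors, alpha=0.5)
```

The array `ndm` agrees with the outer product of `rho` with itself, which is the discrete counterpart of Eq. (14).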

Similar to the NDM, \({\rho }_{i}^{\eta }({{{\bf{r}}}})\) can be expanded in a basis of spherical harmonics and radial basis functions Rn(r) with coefficients \({\rho }_{nlm}^{(i,\eta )}\). It follows that

$${\rho }_{nlm{n}^{{\prime} }{l}^{{\prime} }{m}^{{\prime} }}^{(i,\eta )}={\rho }_{nlm}^{(i,\eta )}{\rho }_{{n}^{{\prime} }{l}^{{\prime} }{m}^{{\prime} }}^{(i,\eta )},$$
(15)

which demonstrates that the coefficients of the neighborhood density contain the same information as the coefficients of the neighborhood density matrix. Indeed, we have found in our calculations that both types of coefficients perform equally well when used as descriptors to predict the atomic DDRFs. We further note that the coefficients of the 3-body version of the SOAP descriptor \({d}_{n{n}^{{\prime} }l}^{(\eta )}\) can be obtained from the NDM using

$${d}_{n{n}^{{\prime} }l}^{(\eta )}=\mathop{\sum}\limits_{{l}^{{\prime} }m{m}^{{\prime} }}\sqrt{\frac{8{\pi }^{2}}{2l+1}}{\rho }_{nlm}^{(i,\eta )}{\rho }_{{n}^{{\prime} }{l}^{{\prime} }{m}^{{\prime} }}^{(i,\eta )}{\delta }_{l{l}^{{\prime} }}{\delta }_{m{m}^{{\prime} }},$$
(16)

in the case where there is no coupling between different atomic species η.
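
Equations (15) and (16) can be sketched for a single angular momentum channel. The example assumes real expansion coefficients (as for real spherical harmonics; with complex harmonics one factor would be conjugated) and random placeholder values. The orthogonal matrix `q` stands in for the real rotation matrix of the channel, and all names are hypothetical.

```python
import numpy as np

def ndm_coefficients(c_l):
    """Eq. (15) for l = l': rho[n, m, n', m'] = c[n, m] * c[n', m']."""
    return np.einsum("nm,kp->nmkp", c_l, c_l)

def soap_power_spectrum(ndm_l, l):
    """Eq. (16): contract m = m' in the l = l' block of the NDM coefficients."""
    prefactor = np.sqrt(8.0 * np.pi ** 2 / (2 * l + 1))
    return prefactor * np.einsum("nmkm->nk", ndm_l)

l = 1
rng = np.random.default_rng(3)
c_l = rng.normal(size=(4, 2 * l + 1))  # placeholder coefficients c[n, m]
d = soap_power_spectrum(ndm_coefficients(c_l), l)

# Mix the m components with a random orthogonal matrix, as a rotation would.
q, _ = np.linalg.qr(rng.normal(size=(2 * l + 1, 2 * l + 1)))
d_rot = soap_power_spectrum(ndm_coefficients(c_l @ q.T), l)
```

The contracted quantity `d` is unchanged by the orthogonal mixing, whereas the uncontracted NDM coefficients are not. The contraction is therefore where the orientation information is discarded.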

Machine learning

We apply our ML approach for predicting DDRFs to hydrogenated silicon clusters and then use the DDRFs to calculate GW quasiparticle energies for these systems. We refer to this technique as the ML–GW approach. The atomic positions of the clusters were constructed as described in the methods section and then relaxed using DFT.

To establish the accuracy of this approach, we first investigate the error in the GW quasiparticle energies resulting from the expansion of the DDRF in terms of the intermediate local basis \(\{{\phi }_{{\alpha }_{a}}^{a}({{{\bf{r}}}})\}\): see Eq. (7). Figure 1 compares the HOMO–LUMO gaps obtained from mean-field DFT–PBE calculations, a standard plane-wave G0W0 calculation using a generalized plasmon-pole approximation (GPP)7,45 and a G0W0 calculation using the 2C-DDRF, where the DDRF is expanded in terms of a modified version of the admm-2 basis set50: see “Methods” section. The DFT–PBE results show that the HOMO–LUMO gap decreases with increasing cluster size from Eg ≈ 4.8 eV for the smallest cluster containing 10 Si atoms to Eg ≈ 3 eV for the biggest cluster with almost 60 Si atoms. This decrease is a consequence of quantum confinement effects, which are less pronounced for bigger clusters. The plane-wave GW HOMO–LUMO gaps show a similar trend as a function of cluster size, but the gaps are larger than the DFT–PBE gaps by several electron volts. Interestingly, the GW corrections are larger for smaller clusters than for larger clusters. As a consequence, the reduction in the GW HOMO–LUMO gaps as a function of cluster size is larger compared to the DFT–PBE result: in particular, the gap is as large as 8.6 eV for the smallest clusters and shrinks to 5.5 eV for the largest clusters corresponding to a decrease of 3.1 eV (compared to a decrease of 1.8 eV in the DFT–PBE HOMO–LUMO gap energies). Similar results were obtained by Chelikowsky et al.51, who also carried out GW calculations on hydrogenated Si clusters. In particular, they found that the HOMO–LUMO gap shrinks from ~9 eV for a 10 Si atom cluster to ~6.5 eV for a 47 Si atom cluster. The GW results obtained with the 2C-DDRF are qualitatively similar to the plane-wave GW results. However, the HOMO–LUMO gaps that are obtained with this approach are consistently ~0.3–0.4 eV smaller than the plane-wave results. This is a consequence of the incompleteness of the local basis set. Interestingly, the calculated HOMO–LUMO gaps exhibit step-like features at clusters with 16, 24, and 46 silicon atoms. Inspection of the atomic structure of these clusters reveals that they exhibit one or more SiH3 units on their surface, see Fig. 2, suggesting an interesting interplay between the chemical bonding and the HOMO–LUMO gaps in these systems.

Fig. 1: HOMO–LUMO gaps of silicon clusters.

HOMO–LUMO gaps of hydrogenated silicon clusters from DFT–PBE Kohn–Sham eigenvalues, plane-wave G0W0 and G0W0 calculations using the 2C-DDRF.

Fig. 2: Atomic structure of silicon clusters.

Clusters with (a) 15 Si atoms and (b) 16 Si atoms. Notice the presence of two SiH3 units on the surface of the cluster in (a). Hydrogen atoms are white; silicon atoms are brown.

Next, we determine the 1C-DDRF. For the basis set, we use solid harmonic Gaussians with optimized decay coefficients: see the “Methods” section. Figure 3a compares the HOMO–LUMO gaps from G0W0 calculations with the 1C-DDRF to those obtained with the 2C-DDRF and also to plane-wave G0W0 results. For small clusters, the HOMO–LUMO gaps obtained with the 1C-DDRF are smaller than those obtained with the 2C-DDRF, while the opposite behavior is observed for larger clusters. The largest difference between the two methods is obtained for clusters containing ~40 Si atoms. The root-mean-square error (RMSE) of the 1C-basis results relative to the 2C-basis results is 0.22 eV, and the RMSE relative to the plane-wave results is 0.45 eV for all clusters. Figure 3b shows the HOMO and LUMO quasiparticle energies. It can be seen that better agreement with the plane-wave result is obtained for the LUMO than for the HOMO.

Fig. 3: HOMO–LUMO gaps, HOMO and LUMO energies of silicon clusters.

(a) HOMO–LUMO gaps of hydrogenated silicon clusters from plane-wave G0W0, G0W0 calculations using the 2C-DDRF, and G0W0 calculations using the 1C-DDRF. (b) HOMO and LUMO energies of hydrogenated Si clusters.

Figure 4a shows the quasiparticle energy corrections of the ten lowest conduction orbitals and the ten highest valence orbitals from plane-wave G0W0 and G0W0 with the 1C-DDRF. The corrections obtained with the 1C-DDRF follow a similar trend as those obtained from the plane-wave calculation. For the unoccupied states, the quantitative agreement is better than for the occupied states, but the 1C-DDRF results for the unoccupied states are scattered over a larger energy range than the plane-wave results. To analyze the errors that arise from the use of the 1C-DDRF in more detail, Fig. 4b shows a two-dimensional histogram of the difference in QP corrections between plane-wave G0W0 and G0W0 with the 1C-DDRF. For the occupied states, the differences are mostly smaller than 0.4 eV, while they are somewhat smaller for the unoccupied states. The RMSE over all energy levels is 0.32 eV.

Fig. 4: QP corrections obtained from plane-wave GW and 1C-GW.

(a) Quasiparticle corrections from plane-wave G0W0 and G0W0 with the 1C-DDRF for the 10 highest valence orbitals and the 10 lowest conduction orbitals of hydrogenated silicon clusters. (b) Histogram of the difference in quasiparticle corrections from plane-wave G0W0 and G0W0 calculations with the 1C-DDRF for the 10 highest valence orbitals and the 10 lowest conduction orbitals of hydrogenated silicon clusters. The mean-field energies are referenced to the middle of the mean-field HOMO–LUMO gap.

Now that we have established the accuracy of the method used to generate the training set, we use a dense NN in conjunction with the NDM descriptor to generate the coefficients of the 1C-DDRF according to

$${\chi }_{nlm{n}^{{\prime} }{l}^{{\prime} }{m}^{{\prime} }}^{(i)}=f\left({\rho }_{nlm}^{(i,Si)},{\rho }_{nlm}^{(i,H)}\right),$$
(17)

where f is the NN function. The hydrogen and silicon environment descriptors are concatenated into a single vector before being fed into the NN. Separate networks are trained for the Si and H contributions to the DDRF. The exact architecture of the network, as well as the practical computation of the atomic decomposition and the descriptors, is described in the “Methods” section. To generate the training data for the NN, we start from the set of relaxed hydrogenated Si clusters that were studied above. From each relaxed cluster, we generate six configurations by randomly displacing the atoms, with the magnitude of the displacements being drawn from a uniform distribution with a maximum of 0.1 Å. For these clusters, we then calculate the 1C-DDRF.
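
The data flow of Eq. (17) can be sketched with a small dense network. Everything specific in the example is an assumption made for illustration: the layer sizes, the tanh activation, the number of descriptor coefficients, and the untrained random weights. The actual architecture is the one described in the “Methods” section.

```python
import numpy as np

def dense_forward(x, weights, biases):
    """Forward pass of a dense network with tanh hidden layers, linear output."""
    for w, b in zip(weights[:-1], biases[:-1]):
        x = np.tanh(w @ x + b)
    return weights[-1] @ x + biases[-1]

n_desc = 12   # hypothetical number of (n, l, m) descriptor coefficients per species
n_basis = 5   # hypothetical number of (n, l, m) basis functions of the 1C-DDRF
rng = np.random.default_rng(4)

# Placeholder descriptors of the Si and H environments of one atom.
rho_si = rng.normal(size=n_desc)
rho_h = rng.normal(size=n_desc)
features = np.concatenate([rho_si, rho_h])

# Untrained random weights: input -> one hidden layer -> flattened coefficients.
sizes = [features.size, 16, n_basis * n_basis]
weights = [rng.normal(scale=0.1, size=(n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

chi_coeff = dense_forward(features, weights, biases).reshape(n_basis, n_basis)
```

In this sketch one such network would be instantiated per species, so that Si-centered and H-centered contributions are predicted by different weights.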

Once the NN is trained on the 1C-DDRFs of the randomly displaced clusters, we use it to calculate the 1C-DDRFs of the relaxed clusters and then determine quasiparticle energies via the ML–GW approach. Figure 5 compares the HOMO–LUMO gaps from ML–GW and GW with explicitly calculated 1C-DDRFs. Except for the smallest cluster, the ML–GW method accurately reproduces the HOMO–LUMO gaps of the explicit GW calculations. The worse performance for the smallest cluster is a consequence of the training set, which contains a large number of bigger clusters containing atomic environments that differ from those found in the smallest clusters. The overall RMSE of the ML–GW method relative to the explicit GW with the 1C-basis is 0.15 eV and reduces to 0.06 eV when the smallest cluster is excluded.
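
The error measure used throughout is the root-mean-square error. The short sketch below shows how it is evaluated with and without the smallest cluster. The gap values are placeholders chosen for illustration and are not results of this work.

```python
import numpy as np

def rmse(predicted, reference):
    """Root-mean-square error between two equally long sequences."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean((predicted - reference) ** 2)))

# Placeholder gaps in eV, ordered by increasing cluster size (NOT real data).
gaps_reference = [8.0, 7.0, 6.5, 6.0]
gaps_predicted = [7.5, 7.1, 6.4, 6.0]

rmse_all = rmse(gaps_predicted, gaps_reference)
rmse_without_smallest = rmse(gaps_predicted[1:], gaps_reference[1:])
```

A single outlier, such as the first entry here, can dominate the RMSE, which is why the value is quoted both for all clusters and with the smallest cluster excluded.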

Fig. 5: 1C and ML–GW HOMO–LUMO gaps.

HOMO–LUMO gaps of hydrogenated silicon clusters obtained from G0W0 calculations using the 1C-DDRF and ML–G0W0.

Figure 6 shows the difference in QP corrections between ML–GW and GW with the 1C-DDRF for the 10 highest valence states and 10 lowest conduction states, with the energies of the smallest cluster excluded. ML–GW produces QP shifts for both valence and conduction states within 0.1 eV of the explicit G0W0 with the 1C-DDRF. The majority of valence states exhibit a positive error, while for conduction states, the error is largely negative.

Fig. 6: ML–GW QP correction error compared to 1C-GW.

Histogram of difference in quasiparticle corrections from G0W0 using the 1C-DDRF and ML–G0W0 for the 10 highest valence orbitals and the 10 lowest conduction orbitals of hydrogenated silicon clusters. The mean-field energies are referenced to the middle of the mean-field HOMO–LUMO gap. The energies of the smallest cluster were excluded.

Figure 7 compares the ML–G0W0 QP corrections to plane-wave G0W0 results, again with the energies of the smallest cluster excluded. As expected, the differences are very similar to those between plane-wave G0W0 and the explicit G0W0 with the 1C-basis. In particular, the RMSE is 0.34 eV for all clusters and reduces to 0.30 eV when the smallest cluster is excluded. This result demonstrates that the key obstacle to improving the ML–GW approach is the development of a better basis set.

Fig. 7: QP correction error of ML–GW compared to plane-wave GW.

Histogram of the difference in quasiparticle corrections from plane-wave G0W0 and ML–G0W0 for the 10 highest valence orbitals and the 10 lowest conduction orbitals of hydrogenated silicon clusters. The mean-field energies are referenced to the middle of the mean-field HOMO–LUMO gap. The energies of the smallest cluster were excluded.

Finally, we test the ability of the ML–GW approach to predict the quasiparticle energies of clusters that are larger than those included in the training data. For this, we only include clusters with up to Nmax Si atoms in the training set, with Nmax being 60, 50, and 40. Again, the training set only includes clusters with randomly displaced atoms, and the test set consists of relaxed clusters. The ML–GW predictions for the whole set of relaxed clusters are shown in Fig. 8. From this graph, it is clear that the accuracy of the prediction for the largest clusters deteriorates as Nmax is reduced: while for Nmax = 60, the gaps and QP corrections for clusters with more than 60 Si atoms are still highly accurate, larger differences are observed for Nmax = 50. For Nmax = 40, errors as large as 1 eV are obtained for the gaps of clusters with around 50 Si atoms. Figure 8f shows that the large error in the gaps is a consequence of having a negative error in the QP shifts for occupied states and a positive error in the shifts for unoccupied states. In other words, instead of a cancellation, we get an accumulation of errors when computing HOMO–LUMO gaps.

Fig. 8: Performance of ML–GW when extrapolating to larger clusters.

HOMO–LUMO gaps (left panels: a, c, e) and errors in quasiparticle shifts (right panels: b, d, f) from explicit G0W0 calculations with the 1C-DDRF and from ML–G0W0 trained on clusters containing up to Nmax = 60 Si atoms (upper panels: a, b), Nmax = 50 Si atoms (middle panels: c, d) and Nmax = 40 Si atoms (lower panels: e, f). The red vertical line indicates Nmax. The panels on the right-hand side (b, d, f) only contain results for clusters with more Si atoms than Nmax. The mean-field energies are referenced to the middle of the mean-field HOMO–LUMO gap.

Discussion

We have developed an ML approach to predict the interacting DDRF of materials. To achieve this, we introduce a decomposition of the DDRF into atomic contributions, which form the output of a NN. We also introduce the NDM descriptor, which is a generalization of the widely used SOAP descriptor21: instead of symmetrizing the descriptor using a Haar integral over a symmetry group52, we construct the tensor product of the expansion coefficients of the neighborhood density, which transforms under rotation in the same way as the atomic contributions to the DDRF. Thus, while not fully covariant, our approach is able to distinguish between different orientations of a chemical environment, which is a key requirement for predicting functions such as the DDRF.

The ML technique for DDRFs is then combined with the GW approach, and we refer to the resulting method as the ML–GW approach. We apply this method to hydrogenated silicon clusters. The ML–GW approach reproduces HOMO–LUMO gaps and quasiparticle energies of GW calculations using the explicitly calculated 1C-DDRF, i.e., the DDRF in a pair basis where the basis functions of each pair are centered on the same atom, with an accuracy of about 0.1 eV. The accuracy of the results deteriorates when the method is applied to clusters that are larger than those included in the training set.

However, the error of ML–GW is significantly larger when compared to standard plane-wave GW results: HOMO–LUMO gaps are reproduced to within 0.5 eV, but the error reduces to 0.4 eV when the smallest cluster is excluded from the test set. These errors are comparable to those obtained by Rohlfing in his GW calculations for silane using a model dielectric function9.

These findings demonstrate that the main challenge to improving the ML–GW method is the construction of better local basis sets for the DDRF. The basis used for the 2C-DDRF can be improved straightforwardly by using larger basis sets, such as aug-admm-2, admm-3, or aug-admm-350. However, it is more difficult to enlarge the basis used for the 1C-DDRF, as this leads to linear dependencies, which reduce the predictive accuracy of the NN. This was also observed by Grisafi et al.28 when predicting the expansion coefficients of the electronic density using the symmetry-adapted SOAP kernel25. In the future, we plan to explore the use of orthogonal radial basis sets, such as Laguerre polynomials, instead of solid harmonic Gaussians.

We expect that the ML–GW method can be applied to calculate quasiparticle energies in systems that have so far been out of reach for standard implementations. Examples include disordered materials, liquids, interfaces, or nanoparticles. It could also be combined with on-the-fly ML methods53 to perform GW calculations on molecular-dynamics snapshots to determine finite-temperature quasiparticle energies.

Methods

Data generation

The atomic structures of the hydrogenated silicon clusters were obtained in the same way as described by Zauchner et al.54: starting from the Si123H100 cluster of the silicon Quantum Dot data set55, we remove the silicon atom furthest from the center of the cluster, terminate the dangling bonds with hydrogen atoms and relax the resulting structure using DFT. The process is repeated until only 10 silicon atoms remain. From this set of silicon clusters, only clusters with fewer than 60 silicon atoms were used in the training set for DDRF prediction. From each cluster with fewer than 60 silicon atoms, we created six additional clusters in which random displacements were added to the atomic positions. The magnitudes of the displacements were drawn from a uniform distribution with a width of 0.1 Å. Finally, calculations were also carried out for clusters with between 60 and 70 silicon atoms. These clusters are not part of the training set but are used to test the extrapolation capacity of the ML approach. Note that all calculations were carried out for clusters in a vacuum, i.e., we did not consider the effect of a substrate or a solid matrix.
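
The perturbation step can be illustrated with a short sketch. It is not the script used for this work: the function names are hypothetical, and it assumes that "a uniform distribution with a width of 0.1 Å" means a displacement magnitude drawn uniformly from [0, 0.1] Å along a random direction for every atom.

```python
import math
import random


def random_unit_vector(rng):
    """Draw a direction uniformly on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    s = math.sqrt(1.0 - z * z)
    return (s * math.cos(phi), s * math.sin(phi), z)


def perturb_positions(positions, width=0.1, rng=None):
    """Return a displaced copy of `positions` (Cartesian coordinates in Angstrom).

    Assumption: the displacement magnitude is uniform on [0, width].
    """
    rng = rng or random.Random()
    perturbed = []
    for x, y, z in positions:
        d = rng.uniform(0.0, width)
        ux, uy, uz = random_unit_vector(rng)
        perturbed.append((x + d * ux, y + d * uy, z + d * uz))
    return perturbed


def make_perturbed_copies(positions, n_copies=6, width=0.1, seed=0):
    """Create `n_copies` independently perturbed structures from one cluster."""
    rng = random.Random(seed)
    return [perturb_positions(positions, width, rng) for _ in range(n_copies)]


# Toy three-atom geometry standing in for a relaxed cluster.
relaxed = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.3, 0.0)]
copies = make_perturbed_copies(relaxed)
```

Each relaxed cluster then contributes six displaced structures in addition to itself, and no atom moves by more than the chosen width.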

DFT and GW calculations

The DDRF and QP corrections were calculated using the BerkeleyGW software package7,56. This code uses a plane-wave basis to represent the DDRF, which makes it possible to systematically converge results by increasing the plane-wave cutoff. In contrast, it is often more difficult to achieve convergence when GW implementations based on local orbitals are used. Mean-field DFT calculations were performed using the Quantum Espresso code57,58. Norm-conserving pseudopotentials from the Quantum Espresso Pseudopotential Library were used. The parameters of the DFT calculations were the same as those used by Zauchner et al.54: a plane-wave cutoff of 65 Ry and a supercell with sufficient vacuum to avoid interactions between periodic images. For the calculation of the DDRF, a total of 1000 Kohn–Sham states were used in the summation, together with a plane-wave cutoff of 6 Ry and a truncated Coulomb interaction. The QP corrections were calculated using the GPP7, an explicit sum over 1000 Kohn–Sham states, and a static remainder correction59. To calculate the HOMO and LUMO energies, the vacuum level was determined by averaging the electrostatic potential over the faces of the supercell.

Projection onto the intermediate basis

We first use BerkeleyGW to calculate the inverse dielectric matrix \({\epsilon }_{{{{{\bf{GG}}}}}^{{\prime} }}^{-1}\) in a plane-wave basis56. From this, we determine the interacting DDRF via

$${\chi }_{{{{{\bf{GG}}}}}^{{\prime} }}=({\epsilon }^{-1}_{{{{{\bf{GG}}}}}^{{\prime} }}-{\delta }_{{{{{\bf{GG}}}}}^{{\prime} }})/{v}_{{{{\bf{G}}}}}$$
(18)

with vG being the Fourier transform of the truncated Coulomb interaction.

Next, the DDRF in real space is obtained as

$$\chi ({\bf{r}},{\bf{r}}^{\prime })=\frac{1}{V}\mathop{\sum}\limits_{{\bf{G}},{\bf{G}}^{\prime }}{e}^{i{\bf{G}}\cdot {\bf{r}}}\,{\chi }_{{\bf{G}}{\bf{G}}^{\prime }}\,{e}^{-i{\bf{G}}^{\prime }\cdot {\bf{r}}^{\prime }},$$
(19)

where V is the volume of the supercell.
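
Eqs. (18) and (19) can be illustrated on a one-dimensional toy cell. The sketch below is only meant to show the index structure: the values of the inverse dielectric matrix and of vG are hypothetical placeholders, not BerkeleyGW output, and the cell length plays the role of V.

```python
import cmath
import math


def chi_from_eps_inv(eps_inv, v):
    """Eq. (18): chi[G][G'] = (eps_inv[G][G'] - delta_GG') / v[G]."""
    n = len(v)
    return [[(eps_inv[a][b] - (1.0 if a == b else 0.0)) / v[a]
             for b in range(n)] for a in range(n)]


def chi_real_space(chi, g_vectors, volume, r, r_prime):
    """Eq. (19) in 1D: (1/V) sum_{G,G'} exp(iGr) chi[G][G'] exp(-iG'r')."""
    total = 0.0 + 0.0j
    for a, g in enumerate(g_vectors):
        for b, g_prime in enumerate(g_vectors):
            total += (cmath.exp(1j * g * r) * chi[a][b]
                      * cmath.exp(-1j * g_prime * r_prime))
    return total / volume


# Toy cell of length 10 with three reciprocal-lattice vectors (n = -1, 0, 1).
length = 10.0
g_vectors = [2.0 * math.pi * n / length for n in (-1, 0, 1)]
v = [2.0, 4.0, 2.0]              # placeholder values for the truncated Coulomb interaction
eps_inv = [[0.8, 0.0, 0.0],      # placeholder inverse dielectric matrix
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 0.8]]

chi_gg = chi_from_eps_inv(eps_inv, v)
chi_rr = chi_real_space(chi_gg, g_vectors, length, 1.0, 1.0)
```

For this diagonal example the real-space response reduces to −0.02 cos[G1(r − r′)] with G1 = 2π/10, so it equals −0.02 on the diagonal r = r′ and vanishes at a separation of a quarter of the cell length.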

Starting from a set of real atom-centered basis functions \({\phi }_{{\alpha }_{i}}^{i}({{{\bf{r}}}})\), where αi labels the basis function on atom i, we construct an orthogonal basis set \({\tilde{\phi }}_{{\alpha }_{i}}^{i}({{{\bf{r}}}})\)

$${\tilde{\phi }}_{{\alpha }_{i}}^{i}({{{\bf{r}}}})=\mathop{\sum}\limits_{k}\mathop{\sum}\limits_{{\alpha }_{k}}{A}_{ik}^{{\alpha }_{i}{\alpha }_{k}}{\phi }_{{\alpha }_{k}}^{k}({{{\bf{r}}}}),$$
(20)

where \({A}_{ik}^{{\alpha }_{i}{\alpha }_{k}}\) is the matrix of eigenvectors of the overlap matrix. The coefficients of the DDRF, when expanded on the orthogonalized basis, are

$$\begin{array}{l}{\tilde{\chi }}_{{\alpha }_{i}{\alpha }_{j}}^{ij}\,=\,\frac{1}{V}\mathop{\sum}\limits_{{\bf{G}},{\bf{G}}^{\prime }}{\chi }_{{\bf{G}}{\bf{G}}^{\prime }}\\ \qquad \qquad \times \,\int\nolimits_{-\infty }^{\infty }{\tilde{\phi }}_{{\alpha }_{i}}^{i}({\bf{r}}){e}^{i{\bf{G}}\cdot {\bf{r}}}d{\bf{r}}\int\nolimits_{-\infty }^{\infty }{e}^{-i{\bf{G}}^{\prime }\cdot {\bf{r}}^{\prime }}{\tilde{\phi }}_{{\alpha }_{j}}^{j}({\bf{r}}^{\prime })d{\bf{r}}^{\prime },\end{array}$$
(21)

where, due to the localized nature of the basis functions, we extended the integration domain from the supercell to all space. These integrals are proportional to the Fourier transforms of the basis functions (or their complex conjugates). We note that it is possible to skip this step if a GW implementation based on local orbitals is used16.

We then transform back to the non-orthogonal localized basis set using Eq. (20) to find

$$\begin{array}{l}\chi ({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })\,=\,\mathop{\sum}\limits_{{\alpha }_{i}{\alpha }_{j}}\mathop{\sum}\limits_{ij}{\tilde{\chi }}_{{\alpha }_{i}{\alpha }_{j}}^{ij}{\tilde{\phi }}_{{\alpha }_{i}}^{i}({{{\bf{r}}}}){\tilde{\phi }}_{{\alpha }_{j}}^{j}({{{{\bf{r}}}}}^{{\prime} })\\ \qquad\quad\,\,\, = \,\mathop{\sum}\limits_{{\alpha }_{k}{\alpha }_{l}}\mathop{\sum}\limits_{kl}\mathop{\sum}\limits_{{\alpha }_{i}{\alpha }_{j}}\mathop{\sum}\limits_{ij}{A}_{ik}^{{\alpha }_{i}{\alpha }_{k}}{A}_{jl}^{{\alpha }_{j}{\alpha }_{l}}{\tilde{\chi }}_{{\alpha }_{i}{\alpha }_{j}}^{ij}{\phi }_{{\alpha }_{k}}^{k}({{{\bf{r}}}}){\phi }_{{\alpha }_{l}}^{l}({{{{\bf{r}}}}}^{{\prime} })\\ \qquad\quad\,\,\, = \,\mathop{\sum}\limits_{{\alpha }_{k}{\alpha }_{l}}\mathop{\sum}\limits_{kl}{\chi }_{{\alpha }_{k}{\alpha }_{l}}^{kl}{\phi }_{{\alpha }_{k}}^{k}({{{\bf{r}}}}){\phi }_{{\alpha }_{l}}^{l}({{{{\bf{r}}}}}^{{\prime} }),\end{array}$$
(22)

where we defined

$${\chi }_{{\alpha }_{k}{\alpha }_{l}}^{kl}=\mathop{\sum}\limits_{{\alpha }_{i}{\alpha }_{j}}\mathop{\sum}\limits_{ij}{\tilde{\chi }}_{{\alpha }_{i}{\alpha }_{j}}^{ij}{A}_{ik}^{{\alpha }_{i}{\alpha }_{k}}{A}_{jl}^{{\alpha }_{j}{\alpha }_{l}}.$$
(23)
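
The round trip through the orthogonalized basis can be checked on a minimal example with two overlapping basis functions. The sketch below is illustrative only: the overlap value and the expansion coefficients are hypothetical, and it assumes that the eigenvectors of the overlap matrix are scaled by the inverse square root of their eigenvalues so that the transformed functions are orthonormal. In matrix form, the projection is χ̃ = (AS) c (AS)ᵀ and Eq. (23) reads χ = Aᵀ χ̃ A.

```python
import math


def matmul(a, b):
    """Product of two small dense matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


s = 0.4                               # hypothetical overlap between the two functions
S = [[1.0, s], [s, 1.0]]

# For this 2x2 overlap matrix the eigenpairs are known in closed form.
inv_sqrt2 = 1.0 / math.sqrt(2.0)
eigvals = [1.0 + s, 1.0 - s]
eigvecs = [[inv_sqrt2, inv_sqrt2],    # one eigenvector per row
           [inv_sqrt2, -inv_sqrt2]]

# A[i][k]: coefficient of phi_k in the orthonormal function number i, as in Eq. (20).
A = [[eigvecs[i][k] / math.sqrt(eigvals[i]) for k in range(2)] for i in range(2)]
identity_check = matmul(matmul(A, S), transpose(A))

# Hypothetical coefficients of chi in the non-orthogonal basis.
c = [[-0.5, 0.1], [0.1, -0.3]]

# Projection onto the orthonormal functions: chi_tilde = (A S) c (A S)^T.
AS = matmul(A, S)
chi_tilde = matmul(matmul(AS, c), transpose(AS))

# Eq. (23): chi_kl = sum_ij chi_tilde_ij A_ik A_jl, i.e. A^T chi_tilde A.
chi_back = matmul(matmul(transpose(A), chi_tilde), A)
```

Because AᵀA equals the inverse of the overlap matrix under this scaling, `chi_back` reproduces the original coefficients `c`.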

The basis functions we employed are the real solid harmonic Gaussians as defined in LibInt60

$${\phi }_{lm}(r,\theta ,\phi )={N}_{l}(\beta ){r}^{l}{e}^{-\beta {r}^{2}}{R}_{lm}(\theta ,\phi ),$$
(24)

where β is a decay parameter, Nl(β) is a normalization factor, and Rlm are the real spherical harmonics given by61

$$\begin{array}{l}{R}_{lm}(\theta ,\phi )\\= \left\{\begin{array}{l}\frac{i}{\sqrt{2}}\left({Y}_{l-| m| }(\theta ,\phi )-{(-1)}^{m}{Y}_{l| m| }(\theta ,\phi )\right)\,{{\mbox{if}}}\,m \,<\, 0\quad \\ {Y}_{lm}(\theta ,\phi )\,{{\mbox{if}}}\,m=0\quad \\ \frac{1}{\sqrt{2}}\left({Y}_{l-| m| }(\theta ,\phi )+{(-1)}^{m}{Y}_{l| m| }(\theta ,\phi )\right)\,{{\mbox{if}}}\,m \,>\, 0,\quad \end{array}\right.\end{array}$$
(25)

where Ylm(θ, ϕ) are the complex spherical harmonics with the Condon–Shortley phase convention. Kuang and Lin showed that the Fourier transform of the complex solid harmonic Gaussians is again a solid harmonic Gaussian62

$$\begin{array}{l}\frac{1}{{(2\pi )}^{3/2}}\int\,d{{{\bf{r}}}}{e}^{-i{{{\bf{G\cdot r}}}}}{N}_{l}(\beta ){r}^{l}{e}^{-\beta {r}^{2}}{Y}_{lm}(\hat{{{{\bf{r}}}}})\\ ={(-i)}^{l}{\tilde{N}}_{l}(\beta ){G}^{l}{e}^{-{G}^{2}/(4\beta )}{Y}_{lm}(\hat{{{{\bf{G}}}}}),\end{array}$$
(26)

with \({\tilde{N}}_{l}(\beta )={N}_{l}(\beta )/{(2\beta )}^{l+3/2}\). The Fourier transform of the real solid harmonic Gaussians then follows by inserting the linear combinations of Eq. (25).
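
Since the transform of the real functions relies on Eq. (25), it is useful to confirm the linear combinations for the simplest non-trivial case. The sketch below (hypothetical helper names, l = 1 only) builds the real harmonics from the complex ones with the Condon–Shortley phase and can be compared with the familiar Cartesian forms proportional to y/r, z/r, and x/r for m = −1, 0, and 1.

```python
import cmath
import math


def complex_y1(m, theta, phi):
    """Complex spherical harmonics Y_1m with the Condon-Shortley phase."""
    if m == 0:
        return complex(math.sqrt(3.0 / (4.0 * math.pi)) * math.cos(theta))
    sign = -1.0 if m == 1 else 1.0
    amplitude = math.sqrt(3.0 / (8.0 * math.pi)) * math.sin(theta)
    return sign * amplitude * cmath.exp(1j * m * phi)


def real_y1(m, theta, phi):
    """Real spherical harmonics R_1m assembled according to Eq. (25)."""
    if m == 0:
        return complex_y1(0, theta, phi)
    y_minus = complex_y1(-abs(m), theta, phi)
    y_plus = complex_y1(abs(m), theta, phi)
    phase = (-1) ** m
    if m < 0:
        return 1j / math.sqrt(2.0) * (y_minus - phase * y_plus)
    return 1.0 / math.sqrt(2.0) * (y_minus + phase * y_plus)


theta, phi = 0.7, 1.1                      # arbitrary test direction
norm = math.sqrt(3.0 / (4.0 * math.pi))
r_x = real_y1(1, theta, phi)               # expected: norm * sin(theta) * cos(phi)
r_y = real_y1(-1, theta, phi)              # expected: norm * sin(theta) * sin(phi)
r_z = real_y1(0, theta, phi)               # expected: norm * cos(theta)
```

The imaginary parts cancel in all three cases, so the same combinations of the transformed complex functions give the transforms of the real basis functions.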

The basis set used in this work is a modified version of the admm-2 basis set50 (see Supplementary Methods for details), in which the s-orbitals were removed and contracted Gaussians were uncontracted into individual basis functions. Removing the s-orbitals ensures that \(\int\,d{{{\bf{r}}}}\chi ({{{\bf{r}}}},{{{{\bf{r}}}}}^{{\prime} })=0\) since only the Fourier transform of s-orbitals has a G = 0 contribution.
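
The statement about the G = 0 contribution can be checked numerically. After the angular integration, the transform of r^l exp(−βr²) Ylm reduces to (−i)^l Ylm(Ĝ) times a radial integral over the spherical Bessel function jl, whose standard closed form is G^l exp(−G²/(4β))/(2β)^(l+3/2). The sketch below compares a simple midpoint quadrature with this closed form for l = 0 and l = 1. It sets Nl = 1, and the function names and test values are illustrative assumptions.

```python
import math


def spherical_bessel(l, x):
    """Closed forms of j_0 and j_1, which are sufficient for this check."""
    if abs(x) < 1e-8:
        return 1.0 if l == 0 else x / 3.0
    if l == 0:
        return math.sin(x) / x
    if l == 1:
        return math.sin(x) / x ** 2 - math.cos(x) / x
    raise ValueError("only l = 0 and l = 1 are implemented")


def radial_transform_numeric(l, beta, g, r_max=12.0, n=20000):
    """4 pi / (2 pi)^(3/2) * int_0^r_max r^(l+2) exp(-beta r^2) j_l(g r) dr (midpoint rule)."""
    h = r_max / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        total += r ** (l + 2) * math.exp(-beta * r * r) * spherical_bessel(l, g * r)
    return 4.0 * math.pi / (2.0 * math.pi) ** 1.5 * total * h


def radial_transform_analytic(l, beta, g):
    """g^l exp(-g^2 / (4 beta)) / (2 beta)^(l + 3/2)."""
    return g ** l * math.exp(-g * g / (4.0 * beta)) / (2.0 * beta) ** (l + 1.5)


beta = 0.5
p_numeric = radial_transform_numeric(1, beta, 1.3)
p_analytic = radial_transform_analytic(1, beta, 1.3)
s_at_zero = radial_transform_numeric(0, beta, 0.0)   # finite for l = 0
p_at_zero = radial_transform_numeric(1, beta, 0.0)   # vanishes for l = 1
```

The factor G^l makes every function with l > 0 vanish at G = 0, whereas the l = 0 term stays finite there, which is why dropping the s-type functions removes the G = 0 component.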

Projection onto the atomic basis

The fully atom-centered basis set also consists of solid harmonic Gaussians. The basis set was constructed following the same procedure as in the DScribe library63, where individual basis functions are given by

$${\psi }_{nlm}(r,\theta ,\phi )={N}_{l}({\beta }_{nl}){r}^{l}{e}^{-{\beta }_{nl}{r}^{2}}{R}_{lm}(\theta ,\phi ),$$
(27)

where the basis set is truncated at a maximum angular momentum lmax and a maximum principal quantum number nmax. For silicon atoms we use lmax = nmax = 4. For hydrogen atoms we use lmax = nmax = 3.

The exponents βnl are constructed such that the radial part of the corresponding basis function decays to a small threshold value T at a cutoff radius Rn, i.e., \({\beta }_{nl}=-\ln (\frac{T}{{R}_{n}^{l}})/{R}_{n}^{2}\) with \(T=1{0}^{-3}\) Å\({}^{l}\) being the threshold parameter. The cutoff radius Rn = Ri + (Ro − Ri)/n lies between an inner radius Ri and an outer radius Ro. For hydrogen atoms, we used Ri = 0.1 Å and Ro = 3.0 Å, and for silicon atoms, we used Ri = 1.0 Å and Ro = 8.0 Å. Additionally, for silicon atoms, we also included the basis functions of the modified admm-2 basis. Both Ri and Ro were optimized to minimize linear dependencies in the basis set, as such dependencies significantly deteriorate the accuracy of the NN predictions. A similar observation was made by Grisafi et al.28 when learning electron densities, although a different approach was taken to remedy this issue in their work.
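
The construction of the exponents can be written out in a few lines. The sketch below follows the two formulas given above. It assumes that n runs from 1 to nmax and that l starts at 1 because the s-type functions are omitted in the DDRF basis, and the function names are hypothetical.

```python
import math


def cutoff_radius(n, r_inner, r_outer):
    """R_n = R_i + (R_o - R_i) / n, in Angstrom."""
    return r_inner + (r_outer - r_inner) / n


def decay_exponent(n, l, r_inner, r_outer, threshold=1e-3):
    """beta_nl = -ln(T / R_n^l) / R_n^2, so that R_n^l exp(-beta_nl R_n^2) = T."""
    r_n = cutoff_radius(n, r_inner, r_outer)
    return -math.log(threshold / r_n ** l) / r_n ** 2


def exponent_table(l_max, n_max, r_inner, r_outer, l_min=1):
    """Exponents for all (n, l) pairs; l_min = 1 skips s-type functions (assumption)."""
    return {(n, l): decay_exponent(n, l, r_inner, r_outer)
            for n in range(1, n_max + 1)
            for l in range(l_min, l_max + 1)}


silicon = exponent_table(l_max=4, n_max=4, r_inner=1.0, r_outer=8.0)
hydrogen = exponent_table(l_max=3, n_max=3, r_inner=0.1, r_outer=3.0)


def value_at_cutoff(n, l, beta, r_inner, r_outer):
    """Unnormalized radial function r^l exp(-beta r^2) evaluated at r = R_n."""
    r_n = cutoff_radius(n, r_inner, r_outer)
    return r_n ** l * math.exp(-beta * r_n ** 2)
```

With these settings n = 1 corresponds to the outer radius Ro, and larger n moves the cutoff radius toward Ri, producing progressively tighter functions.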

In order to compute the coefficients of the atomic contributions to the DDRF in the fully atom-centered basis, the same procedure as in the intermediate basis was used: the basis was first orthogonalized by computing the eigenvectors of the overlap matrix. The atomic DDRFs in the intermediate basis were then projected onto the orthogonalized fully atom-centered basis, with overlaps between the different basis functions being computed using LibInt60. Finally, the atomic DDRFs were transformed back to the non-orthogonal basis, producing the desired coefficients \({\chi }_{nlm{n}^{{\prime} }{l}^{{\prime} }{m}^{{\prime} }}^{(i)}\).

Descriptors

The basis set for neighborhood densities was generated using the same procedure as for the fully atom-centered basis for the DDRF. However, s-orbitals were not removed and the basis functions of the admm-2 basis set were not included. We used Ri = 1.0 Å for both hydrogen and silicon atoms, Ro = 4.0 Å for hydrogen atoms, and Ro = 9.0 Å for silicon atoms. The exponents of the Gaussians in Eq. (12) were set such that the standard deviation of the Gaussians is 0.5 Å. LibInt60 was again used to compute the required integrals for the projection.

Neural network

A dense NN with four hidden layers with 2000, 1500, 1000, and 2000 nodes, respectively, was constructed for both silicon and hydrogen atoms. Each layer uses a leaky ReLU activation function with a leak parameter of 0.1. The output layer was further symmetrized by adding its transpose. The loss function was the mean-squared error between the predicted and true expansion coefficients \({\chi }_{nlm{n}^{{\prime} }{l}^{{\prime} }{m}^{{\prime} }}^{(i)}\). The NN was trained on the perturbed clusters for 20,000 epochs. We found that adding dropout to the layers does not significantly improve the quasiparticle energies resulting from the predictions, which is likely due to the similarity between the atomic environments in the training and test sets.
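
The structure of such a network can be sketched in plain Python. This is not the implementation used in this work: the layer widths are scaled down by a factor of 100 to keep the example small, the weights are random rather than trained, the output layer is assumed to be linear before symmetrization, and all function names are hypothetical.

```python
import random


def leaky_relu(x, leak=0.1):
    """Leaky ReLU with the leak parameter quoted in the text."""
    return x if x > 0.0 else leak * x


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer; `weights` holds one row per output node."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (illustrative initialization only)."""
    scale = 1.0 / n_in ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_network(n_in, hidden, n_basis, rng):
    """Hidden layers followed by an output layer with n_basis * n_basis nodes."""
    sizes = [n_in] + list(hidden) + [n_basis * n_basis]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def predict(network, descriptor, n_basis):
    """Forward pass, then symmetrization of the output by adding its transpose."""
    x = descriptor
    for weights, biases in network[:-1]:
        x = dense(x, weights, biases, leaky_relu)
    weights, biases = network[-1]
    flat = dense(x, weights, biases)   # linear output layer (assumption)
    m = [flat[i * n_basis:(i + 1) * n_basis] for i in range(n_basis)]
    return [[m[i][j] + m[j][i] for j in range(n_basis)] for i in range(n_basis)]


def mse(pred, target):
    """Mean-squared error over all matrix elements."""
    n = len(pred)
    return sum((pred[i][j] - target[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)


rng = random.Random(0)
network = build_network(n_in=8, hidden=(20, 15, 10, 20), n_basis=3, rng=rng)
descriptor = [rng.uniform(-1.0, 1.0) for _ in range(8)]
chi_pred = predict(network, descriptor, n_basis=3)
```

Adding the transpose guarantees that the predicted coefficient matrix is symmetric by construction, so the network does not have to learn this property from the data.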