Representing individual electronic states for machine learning GW band structures of 2D materials

Choosing optimal representation methods of atomic and electronic structures is essential when machine learning properties of materials. We address the problem of representing quantum states of electrons in a solid for the purpose of machine learning state-specific electronic properties. Specifically, we construct a fingerprint based on energy decomposed operator matrix elements (ENDOME) and radially decomposed projected density of states (RAD-PDOS), which are both obtainable from a standard density functional theory (DFT) calculation. Using such fingerprints we train a gradient boosting model on a set of 46k G0W0 quasiparticle energies. The resulting model predicts the self-energy correction of states in materials not seen by the model with a mean absolute error of 0.14 eV. By including the material's calculated dielectric constant in the fingerprint the error can be further reduced by 30%, which we find is due to an enhanced ability to learn the correlation/screening part of the self-energy. Our work paves the way for accurate estimates of quasiparticle band structures at the cost of a standard DFT calculation.


Introduction
The electronic band structure is one of the most fundamental and important characteristics of a crystalline solid. It relates the quantum mechanical energy levels of an electron in the solid to its (crystal) momentum and provides the basis for describing and understanding a range of materials properties. As a consequence, the accurate prediction of electronic band structures represents a cornerstone problem of computational condensed matter physics.
Density functional theory (DFT) [1] with semi-local exchange-correlation functionals [2] is the standard method for solving the electronic structure problem of materials from first principles. However, the DFT single-particle energies do not in general provide an accurate model for the electronic band structure [3]. Instead, the gold standard for band structure calculations is represented by the GW self-energy method [4], which provides the true quasiparticle (QP) band structure, i.e. it goes beyond a mean field description by explicitly accounting for exchange and many-body screening effects [5,6]. In Ref. [7] the mean absolute error on the calculated band gap relative to experimental references for a set of ten simple semiconductors and insulators was found to be 2.05 eV for DFT-LDA and 0.31 eV for non-selfconsistent G0W0@LDA. Very similar results have been found in other studies [8,9]. The improved accuracy of the GW method comes at the price of a significantly more involved methodology and a much higher computational cost. In practice, this means that GW calculations are limited to small-scale studies of relatively simple materials.
Recently, machine learning (ML) has attracted widespread interest as a means to predict materials properties without performing expensive quantum mechanical calculations [10,11,12,13,14,15]. In the context of band gap predictions, Zhou et al. trained a support vector machine on 3896 experimental band gaps using a representation based only on elemental properties of the constituent atoms [16]. Rajan et al. used different regression methods to predict band gaps of MXene crystals using a training set of 76 G0W0 band gaps and a representation encoding atomic and structural properties [17]. Liang et al. used a representation based on atomic ionicity descriptors to predict GW band gaps of a set of 2D semiconductors [18]. In all these previous studies, the ML model was trained to predict the size of the band gap rather than the full k-resolved band structure. Thereby, important information is missed, including the type of the band gap (direct or indirect), the curvature of the valence and conduction bands at the extremal points (effective masses), and the position and dispersion of other bands away from the band gap. Predicting the full band structure directly from the atomic structure of the material is a daunting challenge that, although possible in principle, would require highly sophisticated ML models and immense amounts of training data.
Here we take a different approach, in which the output from a DFT calculation is taken as input to a ML model to predict the full GW band structure. The philosophy behind our approach is that standard DFT calculations are computationally very cheap, in particular compared to GW, and although they do not directly produce the desired precision, they hold the gist of the material's genome and thus should provide an excellent starting point for accurate property predictions. In our scheme, the rich, but unmanageable, information contained in the DFT wave functions is encoded into low dimensional fingerprints via energy resolved orbital projections and operator matrix elements. These state-specific electronic fingerprints provide a description of the local environment of a given electronic eigenstate in the infinite dimensional Hilbert space, and are thus analogous to the well known fingerprints used to describe atoms in chemical environments [19].
Using a data set of 286 G0W0 band structures of non-magnetic 2D semiconductors comprising a total of 46,000 ($\varepsilon^{\mathrm{QP}}_{nk}$, $k$) pairs, we train a gradient boosting algorithm to predict the G0W0 correction of an eigenstate from its DFT fingerprint. The method achieves a mean absolute error (MAE) of 0.14 eV for individual band energies and 0.18 eV for the band gap. These deviations are significantly smaller than the typical size of the G0W0 corrections and also lower than the accuracy of the G0W0 method itself. The model can be further and significantly improved by adding the static electronic polarisability to the fingerprint. A SHAP feature analysis reveals that the inclusion of the polarisability allows the ML model to distinguish between materials with similar PBE band structures but different dielectric screening properties, which is directly related to the size of the GW correction.
We have used the resulting ML model to obtain G0W0 band structures for ∼700 2D semiconductors from the Computational 2D Materials Database (C2DB) [20,21]. These materials are additional to the dataset used in this study, and the band structures will be published on the C2DB web page [22].

Results
Figure 1a shows an example of a PBE (orange) and G0W0 (green) band structure for monolayer MoS2 (note that spin-orbit interactions are not included throughout this work). It is clear that there are significant differences between the two descriptions. First of all, G0W0 yields a QP band gap of 2.53 eV in good agreement with the experimental value of 2.5 eV [23], while PBE yields a significantly smaller band gap of 1.58 eV. It can also be noted that unoccupied bands are shifted up in energy while occupied bands are shifted down. This is in fact a general trend across all the materials in the data set and it leads to a double peak in the histogram of G0W0 corrections with the peak of negative (positive) corrections corresponding to occupied (empty) bands, see Figure 1b. The absolute values of the G0W0 corrections range from 0 to 3 eV with an average value of 1.17 eV, see the histogram in Figure 1c. Returning to the band diagram in panel (a) we further note that not all the bands are shifted by the same amount, even when disregarding the different sign for occupied/empty bands. Although for most materials all the occupied bands experience similar, though material specific, shifts and the same holds for the empty bands, there are several examples, like MoS2, where this is not the case. Therefore, an accurate prediction of G0W0 corrections for general bands requires a representation that not only encodes the occupation of the state, but also information about the energy and shape of the wave function and its relation to other relevant states of the crystal.

Electronic fingerprints
The ENDOME and RAD-PDOS representations, defined in the Methods Section, are attempts to generalise the notion of the local environment of an atom, which has been successfully employed to represent solids and molecules in machine learning studies, to the case of an electronic state. The ENDOME fingerprint represents the local environment of an energy eigenstate $|nk\rangle$ in terms of operator matrix elements between the state itself and other eigenstates of the crystal, $|\langle nk|\hat{A}|n'k'\rangle|^2$. These matrix elements are arranged on an energy grid as a function of the energy difference $\varepsilon_{nk} - \varepsilon_{n'k'}$, and their sign is used to encode the occupation of the final state $|n'k'\rangle$. With the ENDOME fingerprint two states are thus considered similar if they have similar matrix elements with other states of similar relative energies. In this work we include matrix elements for the position operator, momentum operator, and Laplacian operator. Since we exclusively consider 2D materials in the present work, the fingerprints are split into in-plane and out-of-plane components for the position operator (labeled $xy$ and $z$, respectively) and the momentum operator (labeled $p_{xy}$ and $p_z$). The RAD-PDOS fingerprint is a correlation function in energy and radial distance between the atomic orbital projections (onto angular momentum channels s, p, and d) of the reference eigenstate and all other eigenstates of the crystal. Figure 2 visualises the two types of fingerprints for three different electronic states of MoS2.
Any reasonable fingerprint should comply with certain general requirements [13] of which invariance and simplicity are the most fundamental. In the present context, this means that the fingerprint should be invariant with respect to the choice of unit cell (number of primitive cells, rotations and translations) and the gauge used for the Bloch wave functions, and that it should be computationally cheap to generate compared to a full G0W0 calculation. Both the ENDOME and RAD-PDOS fingerprints clearly fulfill these requirements. Besides the invariance and simplicity conditions, the fingerprints should also be unique such that two different systems (here electronic states) are not mapped to the same fingerprint, and they should be descriptive such that systems with similar properties are close in fingerprint space. The interpretation and quantitative assessment of notions such as different systems and similar properties are obviously problem dependent. This fact can make it difficult for problem independent fingerprints like the ENDOME and RAD-PDOS to meet these requirements in general. This is, however, not a principal problem, and can usually be solved by increasing the size of the training data set, at least as long as the fingerprints are complex and flexible enough to capture the variations in the considered systems that are relevant to the specific learning problem.
An impression of the descriptiveness of the fingerprints can be obtained from Figure 3, which shows 2-dimensional projections of the ENDOME-$p_{xy}$ and RAD-PDOS-dd fingerprints using t-distributed stochastic neighbor embedding (tSNE), color coded by the GW corrections. It is clear that data points that are close in $p_{xy}$-space have similar GW corrections. The dd fingerprint is also descriptive for some data points, but there is also a large blob of data points that are indistinguishable in fingerprint space but have very different GW corrections. Not unexpectedly, these points correspond to the subset of materials without valence d-electrons, which results in all-zero dd fingerprint vectors. The tSNE plots for the other components of the ENDOME and RAD-PDOS fingerprints look similar.

State energies
To predict the state-specific G0W0 corrections to the PBE eigenvalues of 2D semiconductors, we use the XGBoost package [24] to build a machine learning model based on a gradient boosting algorithm for decision tree ensembles. The G0W0 data set was described and analysed in detail in Ref. [25]. We split the data set into a training set of 228 randomly selected materials (37851 electronic states) and a test set consisting of the remaining 58 materials (8766 electronic states). As objective function we use the mean absolute error (MAE) between the predicted and actual G0W0 corrections. The electronic states are represented by the ENDOME and RAD-PDOS fingerprints supplemented by a set of extra features consisting of the occupation of the state ($f_{nk} = 0, 1$), its distance to the Fermi energy ($\varepsilon_{nk} - E_F$), the PBE band gap of the material ($E_{\mathrm{gap}}$), and the static averaged in-plane and out-of-plane polarisabilities of the material ($\tfrac{1}{2}(\alpha_x + \alpha_y)$ and $\alpha_z$). The averaged in-plane polarisability is used to ensure invariance of the feature with respect to rotations of the 2D material in the plane, which is important for materials with in-plane anisotropy. The effect of including the polarisabilities in the fingerprint has been analysed separately (see later discussion).
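The split described above is made per material, so that all states of a given material end up on the same side. A minimal sketch of such a group-wise split is given below; the function name `split_by_material`, the toy material labels, and the seed are illustrative assumptions and not taken from the original work. The resulting index lists could then be passed to any regressor, for example a gradient boosting model.

```python
import random


def split_by_material(material_ids, n_test, seed=0):
    """Split state indices so that every material is entirely in train or test.

    material_ids: one label per electronic state (hypothetical input format).
    n_test: number of materials (not states) to hold out.
    """
    materials = sorted(set(material_ids))
    rng = random.Random(seed)
    test_materials = set(rng.sample(materials, n_test))
    train_idx = [i for i, m in enumerate(material_ids) if m not in test_materials]
    test_idx = [i for i, m in enumerate(material_ids) if m in test_materials]
    return train_idx, test_idx


# Toy data: 5 hypothetical materials with 3 states each.
material_ids = [m for m in ["A", "B", "C", "D", "E"] for _ in range(3)]
train_idx, test_idx = split_by_material(material_ids, n_test=1)
```

Splitting on materials rather than on individual states avoids leaking states of a test material into the training set, which is what allows the test error to be read as the error on unseen materials.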
The results of the model, together with relevant baselines for assessing its performance, are summarised in Table 1. The first row shows the estimated accuracy of our target G0W0 data relative to experiments based on previous reports in the literature [7,8,9]. Experimental data for individual band/state QP energies are scarce and subject to significant uncertainties, and thus do not represent a meaningful reference. The remaining rows of the table show the mean absolute error (MAE) on the band gap and individual state energies for different approximate methods versus G0W0. The MAE on state energies is evaluated over all the bands for which G0W0 data is available, namely the 8 highest valence bands (VB) and 4 lowest conduction bands (CB). The second and third rows are straightforward comparisons of band energies from PBE and HSE06 with G0W0, respectively. The fourth row shows the MAE between G0W0 and PBE after the occupied and unoccupied PBE energies have been rigidly shifted (by applying a scissors operator) to match the valence band maximum (VBM) and conduction band minimum (CBM) of the G0W0 band structure. From this it follows that the lowest possible MAE on individual band energies obtainable with a model trained to predict only the VBM and CBM energies is 0.17 eV. The last two rows of the table show the MAE on the test set obtained with the XGBoost model (see below for more details). Improved performance for the band gap can be obtained by training the model only on the highest valence and lowest conduction band (last row); however, such a restriction on the training data reduces the prediction accuracy for bands further away from the band gap. The numbers marked by (*) refer to the MAE obtained when the static polarisability of the materials is included in the fingerprint (see later discussion).
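The scissors-operator baseline in the fourth row can be illustrated with a short sketch. All numbers below are hypothetical toy values in eV, and the helper names `scissors_shift` and `mae` are assumptions introduced only for illustration.

```python
def scissors_shift(pbe, occupied, vbm_target, cbm_target):
    """Rigidly shift occupied and unoccupied PBE energies so that the PBE
    VBM and CBM coincide with the target (e.g. G0W0) VBM and CBM."""
    vbm = max(e for e, f in zip(pbe, occupied) if f)
    cbm = min(e for e, f in zip(pbe, occupied) if not f)
    shift_occ = vbm_target - vbm
    shift_emp = cbm_target - cbm
    return [e + (shift_occ if f else shift_emp) for e, f in zip(pbe, occupied)]


def mae(a, b):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical toy energies (eV): two occupied and two empty states.
pbe = [-2.0, -1.0, 1.0, 2.0]
occ = [True, True, False, False]
gw = [-2.8, -1.5, 1.6, 2.9]

shifted = scissors_shift(pbe, occ, vbm_target=max(gw[:2]), cbm_target=min(gw[2:]))
baseline_error = mae(shifted, gw)
```

In this toy case the band edges are reproduced exactly by construction, while the states further from the gap keep a residual error, which is the effect the fourth row of the table quantifies.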
In the following, unless stated otherwise, results refer to the case where the model has been trained on all bands (8VB + 4CB) and with the static polarisabilities included in the fingerprint.
Figure 4a shows a parity plot of the predicted vs. true values for the train and test set. The evaluation yields MAEs of 0.05 eV and 0.11 eV for train and test set, respectively. To test for potential bias of the model, the residual distributions are plotted in Figure 4b, showing that both the train and test set have residuals distributed evenly around 0 eV. To estimate the effect of adding more data to the train set, a learning curve is shown in Figure 4c. The learning curve is calculated by continuously adding more materials to the training set while evaluating the performance on a constant test set. The test set MAE decreases significantly up to ≈50 materials, after which the learning curve flattens considerably, although still presenting a slightly decreasing MAE. This suggests that a generalizable model can be trained using a rather limited number of materials, though it should be noted that overfitting issues decrease with the number of materials in the training set. In general, it is difficult to assess whether the learning ability of the model is limited by the flexibility of the model/fingerprint or by the noise level in the data set. We do stress, however, that the numerical precision of the G0W0 corrections is not expected to be much better than 0.05 eV due to errors introduced by e.g. plane wave extrapolation and linearisation of the self-energy, see Ref. [25]. This could explain (part of) the finite prediction error of the model.
All MAEs reported in this paper were evaluated for a specific, randomly generated test set of 58 materials.We have verified that this test set is representative and fair by comparing to MAEs obtained for 100 different random test sets, see Sec. .
The data used to train and evaluate the ML model represent states/energies evaluated at discrete, uniformly distributed k-points of the Brillouin zone. However, the resulting ML model can of course be used to predict the G0W0 energy corrections of states at arbitrary k-points and thereby generate full, densely sampled band structures. Figure 5 shows examples of ML generated band structures for PtO2, SbClTe, GeS2, and CaCl2, which are all test set materials. For comparison, the PBE and the true discrete G0W0 energies are also shown. Overall, the ML bands closely interpolate the true G0W0 energies. In cases where the ML bands deviate, e.g. the conduction bands of CaCl2, they still present a better description than PBE. Interestingly, the ML model is able to deviate from a scissors operator that would ascribe the same corrections to all occupied and all unoccupied bands, respectively. This is for example clear in the PtO2 band structure where the four conduction bands are shifted by different amounts. We note that the single-point regression nature of the model, i.e. the fact that the model does not explicitly couple different k-points, can sometimes lead to weak and unphysical wiggles in the machine learned band energies. These qualitative errors may be reduced by applying a smoothing function (e.g. a Gaussian filter) as post-processing of the ML energies across bands. This has been done for the plots in Figure 5.
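The Gaussian post-processing mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the filter is applied along the k-path of one band at a time, the width `sigma` is given in units of k-point indices, and the edges are handled by repeating the end values. None of these choices are specified in the text.

```python
import numpy as np


def gaussian_smooth(energies, sigma=1.0):
    """Smooth a 1D array of band energies with a normalised Gaussian kernel.

    sigma is measured in grid points (assumed); edges are padded with the
    end values so that the output has the same length as the input.
    """
    energies = np.asarray(energies, dtype=float)
    radius = int(3 * sigma) + 1
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(energies, radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")


# Toy band: a smooth cosine dispersion with small alternating wiggles (eV).
k = np.linspace(0.0, 1.0, 21)
wiggly = np.cos(2 * np.pi * k) + 0.05 * (-1.0) ** np.arange(21)
smoothed = gaussian_smooth(wiggly, sigma=1.0)
```

A wider kernel removes more of the wiggles but also flattens genuine features of the dispersion, so the width should stay small compared to the scale on which the bands vary.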

Band gaps
The ML state energies can be translated into ML band gaps by simply calculating the vertical difference between conduction band minimum and valence band maximum. Figure 6 shows parity plots of the predicted band gaps vs. G0W0 band gaps for a ML model trained on all bands and a ML model trained only on valence and conduction bands. Due to the discreteness of the original G0W0 data, the ML band gap has been evaluated on the same states (discrete k-points) that define the G0W0 gap. The PBE and HSE06 data are also shown as baselines. Only data from the test set has been used for the comparison. The PBE and HSE06 functionals systematically underestimate the band gaps, leading to MAEs of 1.70 eV and 0.85 eV, respectively. The ML model trained on all bands achieves a MAE on the band gap of 0.18 eV, while training the ML model only on valence and conduction bands reduces the band gap MAE to 0.15 eV, but at the cost of increasing the MAE on the individual state energies across all bands from 0.11 to 0.22 eV.
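As a small illustration of how state energies translate into a band gap and its type, the sketch below takes the energies of the highest valence band and the lowest conduction band on a common set of k-points. The function name `band_gap` and the toy energies are hypothetical.

```python
def band_gap(vb, cb):
    """Return (gap, is_direct) from the highest-VB and lowest-CB energies,
    both given as lists indexed by the same k-points."""
    k_vbm = max(range(len(vb)), key=vb.__getitem__)
    k_cbm = min(range(len(cb)), key=cb.__getitem__)
    gap = cb[k_cbm] - vb[k_vbm]
    return gap, k_vbm == k_cbm


# Hypothetical toy energies (eV) at three k-points.
vb = [-0.3, 0.0, -0.2]
cb = [1.9, 2.1, 1.8]
gap, is_direct = band_gap(vb, cb)
```

Because the state-specific corrections differ between k-points, applying them can move the band extrema to different k-points, which is how the nature of the gap can change between the PBE and ML band structures.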
While our ML model and fingerprints allow for prediction of state-specific properties, such as individual band energies, it is of interest to compare its accuracy on band gap predictions to alternative schemes reported in the literature. Lee and coworkers [26] used nonlinear support vector regression with fingerprints containing the Kohn-Sham band gap obtained with both the PBE and the mBJ xc-functionals, together with a set of features describing the constituent chemical elements, to predict G0W0 band gaps of inorganic bulk semiconductors. Using a database of 270 G0W0 band gaps, they obtained a root mean square error (RMSE) of 0.24 eV. Rajan et al. used a Gaussian process to predict G0W0 band gaps of 2D MXene crystals with a fingerprint encoding atomic and structural properties of the MXenes [17]. Employing a training set of 76 G0W0 MXene band gaps, they obtained a RMSE of 0.14 eV.
We stress that both the inorganic bulk semiconductors considered in Ref. [26] and, in particular, the MXene 2D crystals of Ref. [17] represent more homogeneous sets of materials than the 2D crystals considered in the present work. Nevertheless, with a RMSE of 0.26 and 0.21 eV on the predicted G0W0 band gap for the models trained on 8VB+4CB and VB+CB, respectively, our general ML model with purely electronic fingerprints is comparable in accuracy to the more system-specific ML models.
Additionally, by applying our ML model to ∼700 semiconductors from C2DB we have found the band gap to change nature (direct/indirect) in 12% of the materials when comparing the PBE and ML band gaps. Of these materials, 72% shift from direct to indirect gaps.

Effective masses
Since the ML model can be used to calculate G0W0 energies on any k-point grid, it is possible to use the method to calculate effective masses. Effective masses at the valence and conduction band extrema can be calculated by fitting a second order polynomial to the energies on a densely sampled k-point grid centered around the band extrema [20,21]. This method is generally challenging with G0W0 due to the high computational cost of calculating the energies on sufficiently dense k-point grids, but using the ML model it is possible to achieve accurate estimates of the G0W0 effective masses. Figure 7 shows effective masses calculated using PBE and ML energies for ≈330 materials using a k-point density of 55/Å$^{-1}$ in a radius of 0.16 Å$^{-1}$. The validity of the polynomial fit is evaluated using a mean absolute relative error (MARE) metric. The MARE is defined as the absolute difference between the parabolic fit and the actual ML-G0W0 band energies averaged over an energy range of 100 meV (from the band extremum) relative to the actual band energies averaged over the same energy range. The data shown in Figure 7 includes only fits with MARE less than 10%.
Returning to Figure 7 we note that the effective masses obtained with ML-G0W0 can deviate quite significantly from the PBE values. Specifically, the mean absolute deviation is 0.31 $m_0$ and 0.19 $m_0$ for valence and conduction bands, respectively, corresponding to relative deviations of 32% and 28%. We can also deduce that the ML-G0W0 method has a general tendency to yield smaller effective masses than PBE, although deviations from this trend occur relatively often.
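The parabolic fit and the MARE check described in this section can be sketched in one dimension as follows. This is a simplified illustration and not the implementation used for Figure 7. The one-dimensional k-line, the function names, and the choice to measure the band energies in the MARE denominator from the band extremum are all assumptions. The constant $\hbar^2/(2m_0) \approx 3.80998$ eV Å$^2$ converts the quadratic coefficient into a mass in units of $m_0$.

```python
import numpy as np

HBAR2_OVER_2M0 = 3.80998  # hbar^2 / (2 m_0) in eV * Angstrom^2


def effective_mass(k, energies):
    """Fit E(k) = a k^2 + b k + c and return (m*/m_0, fitted energies).

    With E = hbar^2 k^2 / (2 m*), the mass follows from a as
    m*/m_0 = (hbar^2 / 2 m_0) / a. The sign is negative for a band maximum.
    """
    coeffs = np.polyfit(k, energies, 2)
    return HBAR2_OVER_2M0 / coeffs[0], np.polyval(coeffs, k)


def mare(fit, energies, e_extremum, window=0.1):
    """Mean absolute fit error relative to the mean band energy measured from
    the extremum, restricted to states within `window` eV of the extremum."""
    energies = np.asarray(energies, dtype=float)
    mask = np.abs(energies - e_extremum) <= window
    num = np.mean(np.abs(np.asarray(fit)[mask] - energies[mask]))
    den = np.mean(np.abs(energies[mask] - e_extremum))
    return float(num / den)


# Toy conduction band with m* = 0.5 m_0, sampled within 0.16 / Angstrom.
k = np.linspace(-0.16, 0.16, 19)
energies = HBAR2_OVER_2M0 / 0.5 * k**2
m_star, fit = effective_mass(k, energies)
fit_error = mare(fit, energies, e_extremum=0.0)
```

For a perfectly parabolic toy band the fit recovers the input mass and the MARE vanishes; for real bands a MARE threshold such as the 10% used above filters out extrema that are poorly described by a parabola.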

Feature importance
Often the evaluation of a machine learning model stops after considering the overall performance in terms of an objective function like the MAE. However, important insight may be gained by analysing how the model responds to different features in the input data. This is particularly important when devising new types of fingerprints. To extract information about the role of the different features composing the fingerprint vectors used in the present work, a feature importance analysis is performed using a feature subset hold-out method. The features are grouped at two different levels: The first level has four groups, namely the RAD-PDOS components, the ENDOME components, the extra features covering the PBE gap, occupation number, and distance to the Fermi level, and finally the in-plane and out-of-plane polarisabilities. The second level breaks the RAD-PDOS and ENDOME components further down into their individual $ll'$ angular momentum blocks and operator matrix elements, respectively. The analysis is carried out in two complementary ways where a group of features is either used exclusively or dropped from the full fingerprint when training the ML model.
Figure 8 shows the test set MAE on individual state energies for the various feature groups with the all-feature baseline indicated by the vertical black line. Focusing first on panel (a), the analysis shows that both the RAD-PDOS and ENDOME perform well by themselves, though not as well as the full fingerprint. The extra features, in particular the polarisabilities, are unable to produce an accurate ML model. The poor performance of the polarisability-only feature is unsurprising as this feature is fully material specific and not even able to distinguish between occupied and unoccupied states. Panel (b) shows the same analysis when the feature groups are broken further down. When used alone, the pp, ss and sp components of the RAD-PDOS perform best, followed by the various operator matrix elements of the ENDOME. An interesting observation is that at this level of feature grouping, almost any group of features can be dropped without increasing the MAE, except for the in-plane polarisability, $\alpha_{xy}$, whose removal results in a significant 27% increase of the MAE from 0.11 eV to 0.14 eV. This reveals a clear feature synergy since $\alpha_{xy}$ in itself does not have any predictive ability unless it is combined with other features (see below). In general, there seems to be some redundant information in the various fingerprint components since dropping any of the feature sets, at least at the second level of grouping, does not affect the test score by much. In some cases, the model might even gain performance when dropping some features (not visible on the scale of the plot). This suggests that a feature selection algorithm prior to the prediction algorithm might in general slightly improve the performance of the model. However, since gradient boosting algorithms like XGBoost already have some implicit feature selection in the training iterations, the improvement is not expected to be significant and is thus not considered here.
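The two complementary hold-out modes (a feature group used exclusively, or dropped from the full fingerprint) can be sketched as a simple loop. To keep the example self-contained, a linear least-squares fit stands in for the gradient boosting model, and the data and the group names are synthetic. None of this reproduces the actual analysis behind Figure 8.

```python
import numpy as np


def fit_predict_mae(X_train, y_train, X_test, y_test):
    """Stand-in model (assumption): linear least squares with an intercept."""
    A = np.column_stack([X_train, np.ones(len(X_train))])
    w, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    pred = np.column_stack([X_test, np.ones(len(X_test))]) @ w
    return float(np.mean(np.abs(pred - y_test)))


def holdout_analysis(X_train, y_train, X_test, y_test, groups):
    """For each feature group return (MAE using only it, MAE with it dropped)."""
    all_cols = list(range(X_train.shape[1]))
    results = {}
    for name, cols in groups.items():
        rest = [c for c in all_cols if c not in cols]
        only = fit_predict_mae(X_train[:, cols], y_train, X_test[:, cols], y_test)
        dropped = fit_predict_mae(X_train[:, rest], y_train, X_test[:, rest], y_test)
        results[name] = (only, dropped)
    return results


# Synthetic data: column 0 carries the signal, columns 1 and 2 are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + 0.01 * rng.normal(size=200)
groups = {"informative": [0], "noise": [1, 2]}
results = holdout_analysis(X[:150], y[:150], X[150:], y[150:], groups)
```

Reading the two numbers together is what exposes synergy or redundancy: a group that is useless alone but costly to drop (as reported for $\alpha_{xy}$) only contributes in combination with other features.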

SHAP analysis
The role of the $\alpha_{xy}$ feature and its synergy with other features is further investigated using the general feature importance method SHAP, which is a game theoretic approach to explain the output of any machine learning model [27]. SHAP builds an explanation model on top of a ML model which relates the output from the ML model to the importance of individual features for each predicted output. The SHAP values for a given feature can thus be interpreted as the direct effect of that feature on the model output, i.e. the difference between the model's prediction when used with and without that particular feature in the input. Figure 9a shows the SHAP values for $\alpha_{xy}$ as a function of $\alpha_{xy}$. Only states from the test set are shown in Figure 9, and the color code in panel (a) reflects the occupancy of the state. The plot shows a surprisingly clear trend: The SHAP values for occupied states increase consistently and monotonically for increasing $\alpha_{xy}$ while the opposite trend is seen for the empty states. In the following we present a physical explanation for this observation.
The G0W0 correction can be split into two terms with distinctly different physical origin, schematically $\Delta\varepsilon_{nk} = \left(\langle nk|\hat{v}_x - \hat{v}_{xc}|nk\rangle\right) + \Delta^{\mathrm{scr}}_{nk}$. The first term (in parenthesis) represents the difference between the local xc-potential (in this case the PBE potential) and the nonlocal exact exchange potential, while the last term accounts for the interaction of the electron/hole with its own polarisation cloud. The first term is typically negative for occupied states and positive for unoccupied states (Hartree-Fock typically opens the PBE gap), but its magnitude depends on the detailed shape of the wave functions of the system. In particular, this term can be quite different for different states of the same material. Moreover, one does not expect the size of this term to correlate with the material's static polarisability and thus it should not be captured by the $\alpha_{xy}$-SHAP values. The second term is always positive for occupied states (hole quasiparticles) and negative for unoccupied states (electron quasiparticles) because the Coulomb interaction of the bare particle with its oppositely charged polarisation cloud will always stabilise the quasiparticle, thus shifting occupied states up and empty states down in energy [28,29,30]. Now, the shape and size of the polarisation cloud does not depend on the detailed shape of the wave function, but is largely governed by the (microscopic) polarisability of the material. Therefore, on purely physical grounds, the static macroscopic polarisability, $\alpha_{xy}$, is expected to provide a good descriptor for $\Delta^{\mathrm{scr}}_{nk}$: A large value of $\alpha_{xy}$ signals high screening ability of the material and therefore large QP polarisation clouds, which in turn will yield a large $\Delta^{\mathrm{scr}}_{nk}$ (with opposite signs for occupied/empty states). This is exactly what is seen in Figure 9a. By subtracting the $\alpha_{xy}$-SHAP values for the states at the CBM and VBM, we obtain the $\alpha_{xy}$-SHAP values for the band gap correction, see Figure 9b. These show that the $\alpha_{xy}$ feature increases the band gap in materials with low screening and decreases the band gap in materials with high screening. Again, this is perfectly in line with the physical understanding of screening-induced renormalisation of the band gaps [28,29,30].
It can be noted that the $\alpha_{xy}$-SHAP values for the state energies and band gaps are significantly larger than the change in the MAE upon including/dropping $\alpha_{xy}$ from the feature set, see Figure 8b. For example, the $\alpha_{xy}$-SHAP values for the band gap range from -0.50 to 0.70 eV while the MAE decreases by 0.03 eV when $\alpha_{xy}$ is included. This is due to the redundant information carried by the feature set. When the model is trained without $\alpha_{xy}$ as feature, other features can, to a large extent, provide the same information. For example, the PBE band gap alone correlates fairly well with $\alpha_{xy}$. To test this hypothesis, we have carried out the same SHAP analysis for $E^{\mathrm{PBE}}_g$ on a model trained with and without $\alpha_{xy}$ in the feature set. The analysis shows that when $\alpha_{xy}$ is used to train the model, the $E^{\mathrm{PBE}}_g$-SHAP values are fairly low (below ±0.1 eV) and do not show any clear trends. In contrast, when $\alpha_{xy}$ is not included in the fingerprint, the $E^{\mathrm{PBE}}_g$-SHAP values are very similar to the $\alpha_{xy}$-SHAP values shown in Figure 9, although the values are slightly smaller and the trend less pronounced. This shows that in the absence of $\alpha_{xy}$ the model uses $E^{\mathrm{PBE}}_g$ to encode similar information. However, the model also finds that $\alpha_{xy}$ provides a better description of $\Delta^{\mathrm{scr}}_{nk}$ than does $E^{\mathrm{PBE}}_g$, which is why the SHAP values of $E^{\mathrm{PBE}}_g$ are dwarfed by those of $\alpha_{xy}$ when both features are available for learning.

Summary
In summary, we have introduced two different methods to generate fingerprints of individual electronic states based on information available from a standard DFT ground state calculation (eigenvalues and wave functions). The fingerprints were used to train a decision-tree based ML model to predict the G0W0 corrections to the PBE band structure of a 2D semiconductor. The model achieves a MAE of 0.14 eV for individual state energies, which is reduced to 0.11 eV when the static polarisability is included in the fingerprint. For the band gap, the MAE is 0.15-0.23 eV depending on whether the model is trained on all bands or only the valence/conduction bands and whether or not the static polarisability is included in the fingerprint. This level of precision is highly encouraging considering that the noise on the employed G0W0 data for individual state energies could be on the order of 0.05 eV and that the accuracy of the G0W0 method itself, when evaluated against experimental band gaps, is about 0.3 eV. Since the bottleneck of the computations is the self-consistent DFT calculation (in particular the structural relaxation if performed), the method enables GW-quality band structures at the cost of a DFT calculation. Although the current work has focused on states in periodic 2D crystals, the methods can be straightforwardly used to fingerprint states in 3D crystals as well as non-periodic structures like molecules or surfaces. While the fingerprint methods can be used for e.g. 3D crystals, the ML model trained on 2D materials will not be transferable since some of the fingerprint components are divided into in-plane and out-of-plane parts. To use the full method of fingerprints and ML model for 3D crystals would require a ML model trained on a database of GW calculations of such systems.

Methods
This section describes the definition and generation of the Energy Decomposed Operator Matrix Elements (ENDOME) and Radially Decomposed Projected Density Of States (RAD-PDOS) fingerprints. In addition, the G0W0 band structure data set is presented along with a description of the employed machine learning model.

Electronic state fingerprints
The ENDOME fingerprint is based on operator matrix elements between electronic states (here assumed to be Bloch states of a periodic crystal),
$$A_{nk,n'k'} = \langle nk | \hat{A} | n'k' \rangle,$$
where $\hat{A}$ is some operator. For a reference state $|nk\rangle$ with energy $\varepsilon_{nk}$, the ENDOME fingerprint $m^A_{nk}(E)$ is defined as a sum over the matrix elements $A_{nk,n'k'}$ to all other states, each smeared in energy by $G(x; \delta)$, a Gaussian of width $\delta$ centered at $x = 0$. This function encodes the matrix element between the reference state and all other states at an energy distance of $E$ from the reference state. In principle, any operator can be used to create fingerprints, but in this study we include the position operators ($x$, $y$, $z$), the momentum operators ($\nabla_x$, $\nabla_y$, $\nabla_z$), and the Laplace operator ($\nabla^2$). These operators are all diagonal in the $k$ index. In addition, we include the all-one matrix, $A_{nk,n'k'} = 1$, which essentially yields the density of states (DOS) translated to the energy of the reference state, $\varepsilon_{nk}$.
In practice, the function $m^A_{nk}(E)$ is represented on a uniformly spaced energy grid with 50 energy points from -10 to 10 eV around the reference state. Since we consider 2D materials, the in-plane ($x$ and $y$) components of both the position and momentum operators are collected into a single fingerprint vector (i.e. $m^{xy}_{nk} = m^x_{nk} + m^y_{nk}$, and similarly for the momentum operator), while the out-of-plane $z$ component is treated separately. For a given reference state, the ENDOME fingerprint thus consists of six 50-dimensional vectors, resulting in a total of 300 features.
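As a rough illustration of the energy decomposition described above, the following sketch builds one 50-point fingerprint vector for a single reference state at one k-point. The function names (`gaussian`, `endome`), the use of the modulus of the matrix element as the weight, the sign convention for the energy distance, and the width `delta = 0.3` eV are assumptions made for this example and are not taken from the text.

```python
import numpy as np

def gaussian(x, delta):
    """Normalised Gaussian of width delta centred at x = 0."""
    return np.exp(-0.5 * (x / delta) ** 2) / (delta * np.sqrt(2.0 * np.pi))

def endome(ref, energies, matrix, delta=0.3, n_points=50, e_max=10.0):
    """Energy-resolved fingerprint of reference band `ref` (hypothetical helper).

    energies : (n_bands,) eigenvalues in eV at one k-point
    matrix   : (n_bands, n_bands) operator matrix elements A[n, n']
    Assumptions: |A| is used as weight; E is measured as eps_n' - eps_n.
    """
    energies = np.asarray(energies, dtype=float)
    grid = np.linspace(-e_max, e_max, n_points)       # 50 points from -10 to 10 eV
    distance = energies - energies[ref]               # energy distance to each state
    weights = np.abs(np.asarray(matrix)[ref])         # |A_{n,n'}| for the reference row
    smeared = gaussian(grid[:, None] - distance[None, :], delta)
    return grid, smeared @ weights                    # shape (n_points,)
```

With the all-one matrix the result is a Gaussian-broadened DOS centred on the reference energy. Concatenating six such vectors (in-plane and out-of-plane position, in-plane and out-of-plane momentum, Laplacian, all-one) would give the 300 ENDOME features.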
The RAD-PDOS encodes the electronic structure in terms of the density of states projected onto atomic orbitals. Specifically, a correlation function in energy and radial distance, $\rho^{\nu\nu'}_{nk}(E, R)$, is defined, where $N_e$ is the number of electrons in the system, $a$ and $a'$ denote atoms in the primitive unit cell and the entire crystal, respectively, and $\nu$ and $\nu'$ denote atomic orbitals. The atomic projections are obtained by projecting the states onto the atomic orbitals. The functions $\rho^{\nu\nu'}_{nk}(E, R)$ are represented on a uniform $(E, R)$-grid of size 25 × 20 spanning the intervals from -10 to 10 eV (centered around the reference energy $\varepsilon_{nk}$) and 0 to 5 Å, respectively. For the Gaussian smearing functions we use $\delta_E = 0.3$ eV and $\delta_R = 0.25$ Å, respectively. For a given state, the RAD-PDOS fingerprint consists of six 2D grids of 500 points each, resulting in a total of 3000 features.
Figure 2 shows examples of ENDOME and RAD-PDOS fingerprints for three different states at the K-point of MoS2. Note that some of the RAD-PDOS fingerprints are qualitatively similar (e.g. sp and pp), but their scales differ by about an order of magnitude. This is because the densities of states projected onto s and p orbitals have a similar dependence on energy.

The G0W0 data set
The data set comprises quasiparticle (QP) energies from 286 G0W0 band structures of non-magnetic 2D semiconductors covering 14 different crystal structures and 52 chemical elements. The QP energies have been obtained from plane-wave-based one-shot G0W0@PBE calculations with full frequency integration and were produced as a part of the Computational 2D Materials Database (C2DB) [20,21]. The data set has been described and analysed in detail in [25].
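The QP energies of the data set have been calculated under the standard assumption that the G0W0 self-energy can be treated within first-order perturbation theory and linearised around the non-interacting reference energy, $\omega = \varepsilon_{nk}$, leading to the expression
$$E^{\mathrm{QP}}_{nk} = \varepsilon_{nk} + Z_{nk}\,\mathrm{Re}\,\langle \psi_{nk} | \Sigma(\varepsilon_{nk}) - v_{xc} | \psi_{nk} \rangle,$$
where
$$Z_{nk} = \left( 1 - \mathrm{Re}\,\langle \psi_{nk} | \left. \frac{\partial \Sigma(\omega)}{\partial \omega} \right|_{\omega = \varepsilon_{nk}} | \psi_{nk} \rangle \right)^{-1}$$
is the QP weight, $v_{xc}$ is the PBE exchange-correlation potential, and $\psi_{nk}$ is the PBE wave function with eigenvalue $\varepsilon_{nk}$. In practice, the G0W0 corrections to the PBE energies, $\Delta E^{\mathrm{QP}}_{nk} = E^{\mathrm{QP}}_{nk} - \varepsilon_{nk}$, were used as targets for the machine learning model. To ensure the highest data quality, the original data set was filtered such that only states with a QP weight between 0.7 and 1.0 were kept. As shown in Ref. [25], the MAE on the QP correction of such states due to the linearisation of the QP equation is 0.04 eV.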
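The linearisation and the filtering on the QP weight can be sketched as follows, assuming the textbook form of the linearised correction, ΔE = Z (Σ(ε) − v_xc) with Z = 1 / (1 − ∂Σ/∂ω), applied to real-valued diagonal matrix elements. The function names `qp_correction` and `keep_reliable` and the input arrays are hypothetical; this is not the code used to produce the data set.

```python
import numpy as np

def qp_correction(sigma, v_xc, dsigma_domega):
    """Linearised G0W0 correction to the DFT eigenvalue, in eV.

    sigma         : <psi|Sigma(eps)|psi> evaluated at the DFT energy
    v_xc          : <psi|v_xc|psi>
    dsigma_domega : energy derivative of the self-energy at the DFT energy
    Returns (delta_e, z), with z the quasiparticle weight.
    """
    z = 1.0 / (1.0 - np.asarray(dsigma_domega, dtype=float))
    delta_e = z * (np.asarray(sigma, dtype=float) - np.asarray(v_xc, dtype=float))
    return delta_e, z

def keep_reliable(z, z_min=0.7, z_max=1.0):
    """Boolean mask selecting states with QP weight in [z_min, z_max]."""
    z = np.asarray(z)
    return (z >= z_min) & (z <= z_max)
```

States with a weight far below 1 are those for which the linear expansion of the self-energy is least trustworthy, which is why a mask of this kind would be applied before training.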

Machine learning model
The choice of learning algorithm for a machine-learned model depends on considerations such as the amount of training data available and the nature of the learning objective (regression/classification, discrete/continuous). The fingerprints presented here are not designed for a specific learning algorithm and can thus be used to train a wide range of algorithms. For the purpose of predicting G0W0 QP energies, several types of algorithms, including tree-based ensemble methods, neural networks and Gaussian process regression, have been considered and tested. The final model is built using gradient boosting of decision-tree ensembles as implemented in XGBoost [24]. The choice of XGBoost is based on its generality and good performance across multiple machine learning applications, the possibility of extracting knowledge from single features, and its ability to train on large amounts of data. The neural network and Gaussian process regression models that were tested gave similar prediction accuracy.
Training and test sets are created using a random 80/20% split at the material level, which results in a training set of 228 materials (37851 QP energies) and a test set of 58 materials (8766 QP energies). Hyperparameters of the learning algorithm (max depth = 5, learning rate = 0.15, and number of estimators = 60) are tuned using a grid search with 5-fold cross-validation on the 80% training set. The performance of the model is evaluated by the mean absolute error (MAE) on the 20% test set.
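Splitting at the material level means that all states of a given material end up on the same side of the split, so the test error measures generalisation to unseen materials. A minimal sketch of such a split is given below; the function name `material_level_split`, the rounding rule (rounding the test size up) and the input array of per-state material labels are assumptions made for illustration.

```python
import numpy as np

def material_level_split(material_ids, test_fraction=0.2, seed=0):
    """Return boolean (train, test) masks over states, split by material.

    material_ids : one material label per state (hypothetical input)
    """
    material_ids = np.asarray(material_ids)
    materials = np.unique(material_ids)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(materials)
    n_test = int(np.ceil(test_fraction * len(materials)))  # assumed rounding rule
    test_mask = np.isin(material_ids, shuffled[:n_test])
    return ~test_mask, test_mask
```

For 286 materials this yields 228 training and 58 test materials. Repeating the call with different values of `seed` gives the kind of repeated random splits needed to estimate the test-set dependence of the MAE. Library routines such as scikit-learn's `GroupShuffleSplit` offer equivalent group-aware splitting.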
Since the test set comprises only 58 materials, the test MAE might exhibit some test set dependence. To evaluate this effect, the entire process of splitting the data into an 80/20% train/test set, training the model using 5-fold cross-validation on the training set, and evaluating the MAE on the test set has been repeated 100 times using different seeds for the random split. The distribution of the 100 test MAEs has a mean of 0.13 eV and a standard deviation of 0.02 eV. We note that the specific test set used for Table 1 yields an MAE within one standard deviation of the mean.
Since the XGBoost model is based on decision trees, some small discontinuities in band energies might be introduced by the model. When calculating effective masses using a harmonic fit on a much smaller energy scale than the full band structures, it was necessary to use a neural network (a feed-forward network with 3 hidden layers of 200 neurons and tanh activation functions) to ensure a more continuous output. This NN yielded a test MAE of 0.13 eV compared to the 0.11 eV of the XGBoost model.
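The sensitivity of effective masses to small discontinuities can be seen from how they are obtained. In a harmonic fit, the band energy near an extremum is fitted to a parabola, E(k) = a k² + b k + c, and the effective mass follows from the curvature as m*/m_e = (ħ²/2m_e)/a. The sketch below assumes k in Å⁻¹ and E in eV; the function name `effective_mass` and the sampling of k-points are illustrative and not taken from the text.

```python
import numpy as np

HBAR2_OVER_2ME = 3.80998  # hbar^2 / (2 m_e) in eV * Angstrom^2

def effective_mass(k, energy):
    """Effective mass in units of m_e from a parabolic fit of E(k).

    k      : k-points along one direction, in 1/Angstrom
    energy : band energies at those k-points, in eV
    The sign follows the curvature, so it is negative at a band maximum.
    """
    a, _, _ = np.polyfit(k, energy, 2)   # leading coefficient = curvature / 2
    return HBAR2_OVER_2ME / a
```

Because the fit only uses a narrow window around the band extremum, step-like errors of a few meV in the predicted energies can change the fitted curvature noticeably, which is consistent with the need for a smoother model in this step.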

Figure 1: G0W0 data. (a) Example of PBE and G0W0 band structures of monolayer MoS2. The prediction target is the difference in energy between the PBE and G0W0 energies. (b) Histogram of the G0W0 corrections for all states in all materials. (c) Histogram of the absolute values of the G0W0 corrections with a mean of 1.17 eV.

Figure 3: t-SNE visualizations of fingerprints. a) t-SNE components of ENDOME $p_{xy}$ and b) t-SNE components of RAD-PDOS pd fingerprints, color-coded with the GW corrections. For $p_{xy}$, states with similar GW corrections are also close in fingerprint space. In b), a large number of states with both positive and negative GW corrections have similar distances in fingerprint space, corresponding to the materials without d-electrons, for which the RAD-PDOS pd fingerprint is all zeros.

Figure 4: Machine learning results. a) Parity plot showing the ML-predicted vs. true values of the GW correction for individual states for the training and test sets. The MAEs of the training and test sets are 0.05 and 0.11 eV, respectively. b) Histograms of the prediction residuals of the training and test sets. c) Learning curve for the ML model showing the validation MAE as a function of the number of materials/states in the training set.

Figure 6: Comparison of band gaps. Parity plots of predicted band gaps vs. GW band gaps for PBE, HSE06 and two different ML models predicting GW corrections for either all bands (MAE = 0.18 eV) or only the valence and conduction bands (MAE = 0.15 eV). Both ML models significantly outperform PBE and HSE06, which have MAEs of 1.70 and 0.85 eV, respectively.

Figure 7: Effective masses. Comparison of effective masses calculated using PBE and ML-G0W0 eigenvalues for the valence and conduction bands of ∼800 materials. a) Effective masses for the valence bands. b) Effective masses for the conduction bands. There seems to be a (weak) systematic trend for the ML model to predict smaller effective masses than PBE for both valence and conduction bands.

Figure 8: Feature analysis of the ML model. Solid bars refer to an ML model using only the specific features, while the shaded bars are for an ML model without these features. a) High-level feature groups. b) Low-level feature groups.

Figure 9: SHAP analysis. a) SHAP values for $\alpha_{xy}$ for the prediction of GW correction energies, color-coded by occupancy. For materials with a low polarisability, the ML model predicts a more negative GW correction for the occupied states and a more positive correction for the unoccupied states. For materials with a high polarisability, the occupied states are predicted with a more positive correction when using the polarisability as a feature, while the unoccupied states are only weakly affected. b) SHAP values for $\alpha_{xy}$ for the prediction of band gaps. This shows that the predicted band gap increases for materials with a low $\alpha_{xy}$ and decreases for high $\alpha_{xy}$ values.

Table 1: Summary of results. The table shows the mean absolute error (MAE) on the band gap and individual state energies for G0W0 versus experiments and different approximate methods versus G0W0, respectively. The MAE on state energies is always evaluated for the 8 highest valence bands (VB) and 4 lowest conduction bands (CB). ML(X) refers to the test set MAE of the gradient boosting model after training on all bands (8VB+4CB) or only the highest valence and lowest conduction band (VB+CB), respectively. The values marked by (*) are obtained after training the model with the static polarisability of the materials included as extra features in the fingerprint.