Introduction

Over the past decades, density-functional theory (DFT)1,2 has become the workhorse theory in computational chemistry and solid-state physics. This powerful approach is an exact and elegant reformulation of the many-body quantum mechanics that governs the behavior of electrons in all kinds of systems (atom, molecule, solid, etc.). Moreover, the Kohn–Sham equations that stem from the theory2 can be solved efficiently with modern computers. These equations rely on a single approximation, namely the one for the exchange-correlation (xc) energy, which is responsible for the accuracy of the calculations3. A very large number, perhaps more than 5004,5, of such approximations have appeared in the literature over the last 50 years. However, only a handful of them, such as the PBE from Perdew, Burke, and Ernzerhof6,7 or the hybrids B3LYP8,9 and HSE06 from Heyd, Scuseria, and Ernzerhof10,11, have found widespread use.

In principle, the exact xc functional is universal, i.e., it can be used for any system of electrons interacting through the Coulomb potential and for any (ground-state) property. Some approximate functionals however, are designed to provide a particular good description of a specific quantity, such as for instance the nuclear magnetic resonance chemical shift12, ionization potential13, formation energy14, or electronic band gap15,16,17. Such approach favors the best description of the target property, at the cost of an accompanying reduction of the scope of the functional. Of course, for many specific applications, accuracy is more important than universality, so this path is widely approved.

A large amount of attention has been devoted by the computational solid-state community to the problem of predicting the fundamental band gap of materials. This is due to the relevance of the band gap in technological applications such as opto-electronics and photovoltaics18,19.

A proper evaluation of the band gap is one of the greatest challenges for Kohn–Sham DFT. The fundamental gap (EG) for a system of N electrons is defined as the difference between its ionization potential (I) and its electron affinity (A),

$${E}_{{\rm{G}}}=I-A,$$
(1)

meaning that it can be expressed as a difference of ground-state total energies, which are accessible to DFT. On the other hand, the Kohn–Sham (KS) band gap is defined as the difference between the eigenvalues of the conduction band minimum (CBM) and the valence band maximum (VBM):

$${E}^{\rm KS}_{{\rm{g}}}={\varepsilon }_{{\rm{CBM}}}-{\varepsilon }_{{\rm{VBM}}}.$$
(2)

In exact Kohn–Sham DFT these two quantities are not equal, as they differ by the so-called derivative discontinuity Δxc20,21:

$${E}_{{\rm{G}}}={E}_{{\rm{g}}}^{{\rm{KS}}}+{\Delta }_{{\rm{x}}c}.$$
(3)

The exchange-correlation functionals that depend explicitly on the electron density, such as local density approximations (LDAs) and generalized gradient approximations (GGAs), have a derivative discontinuity that is exactly zero (\({\Delta }_{{\rm{x}}c}^{{\rm{LDA}},{\rm{GGA}}}=0\)) for solids22. This is often presented as a justification for the systematic underestimation of band gaps in these approximations21,23,24. More sophisticated functionals, like hybrids and meta-GGAs, which depend on the orbitals, can possess a non-vanishing derivative discontinuity Δxc. Therefore, within the standard Kohn–Sham theory, a theoretical estimation of the band gap must include both terms, as in Eq. (3).

However, hybrid and meta-GGA functionals are often not implemented in the standard Kohn–Sham formalism, where the Kohn–Sham potential is strictly local, since this would require using the optimized effective potential method25,26,27, i.e. solving a complicated integral equation. Instead, the large majority of codes28,29,30,31,32 implements meta-GGAs and hybrids within the generalized Kohn–Sham (GKS) formalism33, which leads to a non-local potential. In this formalism, it was shown that the generalized Kohn–Sham gap \({E}_{{\rm{g}}}^{{\rm{GKS}}}\) is a direct approximation to EG and that no derivative discontinuity should be added34,35,36,37,38. This is also consistent with the interpretation of hybrid functionals as approximations to the self-energy of the many-body GW theory39. Comparing the generalized Kohn–Sham band gap of a hybrid or meta-GGA functional to the experimental photoemission gap is hence justified. Note that for explicit functionals of the electron density, the Kohn–Sham and generalized Kohn–Sham formalisms are equivalent and lead to the same (semilocal) potential. For a deeper discussion on the definition of band gaps of solids, we refer the reader to ref. 34.

Of course, most of the experimentally measured band gaps are optical band gaps, which differ from the photoemission band gap by the excitonic binding energy. The latter can, in principle, be calculated by solving the many-body Bethe–Salpeter equation40 (or by using time-dependent DFT41). However, in practice, for the majority of bulk materials the excitonic binding energy amounts to tens of meV42, a much smaller quantity than typical errors of DFT functionals.

Recently, we developed a large material data set with the objective of benchmarking approximate xc functionals for the calculation of band gaps43. This data set includes a large variety of semiconductors and insulators, with small, intermediate, and large band gaps. Furthermore, it includes materials with chemical elements that span almost the complete periodic table. We performed DFT calculations for all these materials using 12 functionals of the LDA, GGA, meta-GGA, and hybrid types43. The most accurate functional turned out to be the popular modified (m) Becke–Johnson (BJ) with LDA correlation (mBJLDA) meta-GGA potential15, followed very closely by the GGA high-local exchange (HLE16)16 and the screened hybrid HSE0610,11,44.

Here, we will take a step further: we investigate the performance of 21 additional xc functionals to obtain a complete picture of the advantages and limitations of each approximation. Building on this information, we propose then reparametrizations targeted to improve the accuracy of the most reliable functionals in the determination of band gaps.

Most of the functionals that we study in this work are modern meta-GGAs, which should potentially be more accurate than LDAs and standard GGAs for the calculation of band gaps. Actually, a few of them were developed specifically for band gaps (such as the meta-GGA HLE1717), whereas others were developed for better accuracy in general. Besides meta-GGAs, less studied LDA- and GGA-type functionals are also considered.

Our choice to benchmark DFT functionals leaves out several other approaches proposed in the literature to tackle the prediction of band gaps. These cover methods like DFT+U45,46, DFT−1/247,48,49, Koopmans-compliant functionals50, dielectric-dependent hybrids51,52, and self-interaction correction methods53. Their inclusion, which we leave for future work, would require additional computational resources that are not available.

All xc functionals depend in some way on internal parameters that are either determined from theoretical considerations or by fitting accurate ab initio results or experimental data. In order to design a new and improved approximation it is essential to understand the behavior of the chosen analytical form with respect to variations of its internal parameters. As such, we focus in this work on the error in the band gaps as a function of the internal parameters of selected functional forms. This analysis allows us also to select improved empirical parametrizations by minimizing the average error within a data set.

Finally, we show how to use this large data set to train a machine learning model for band gap prediction. In fact, band gaps calculated using different functionals can be used as “estimators”. In principle, it should be possible to decrease the error in the calculation of band gaps by combining several estimators using statistical methods. We decided therefore to experiment this strategy testing few machine learning models that are especially adequate for this task54. There has already been a number of machine learning studies concerning the prediction of band gaps. One research direction is the direct prediction of DFT band gaps to avoid ab initio calculations55,56. However, this approach provides at best an accuracy that is slightly worse than the accuracy of the functional used in generating the training data. A second approach relies on learning higher fidelity theoretical band gaps, like in ref. 57, which leads to a slightly improved accuracy in comparison to experimental data. Finally, one can directly learn experimental band gaps, as in ref. 58. We will follow the latter approach, combining it with the use of various functionals as input features for the machine learning model, to achieve higher accuracy than traditional DFT calculations.

The remainder of this paper is organized as follows. In the following, we provide a brief overview of the tested functionals, followed by a description of the large-scale benchmark and its results in “Results and discussion”. In the “Functional reparametrization” section, we study the behavior of selected functionals as a function of their internal parameters, which leads to the new parametrizations that are then tested and analyzed. In the “Machine learning” section, we investigate machine learning models that use our data to improve the estimation of band gaps. We finish with a summary and conclusions. Details on our calculations are given in the “Methods” section.

The 12 xc functionals/potentials that were already tested in our previous work43 are the LDA PZ8159, three GGAs (PBE6,7, PBEsol60, and HLE1616), three meta-GGAs (BJ61, mBJLDA15, that we call here for simplicity mBJ, and SCAN62), three traditional hybrids (HSE0610,11,44, HSE1463, and PBE064,65), and two hybrids with a system-dependent mixing parameter (HSEmix39 and PBE0mix39). A description of these functionals can be found in ref. 43. In the following, we provide a brief description of the additional functionals selected for the present work. The materials database is also presented in ref. 43.

We start at the LDA level of approximation, i.e., first rung of Jacob’s ladder66, considering the local Slater potential (Sloc)67. Its starting point is some general, yet very simple, form of LDA-type exchange potential:

$${v}_{{\rm{x}}}=-a{n}^{b},$$
(4)

where \(n=\mathop{\sum }\nolimits_{j}^{{\rm{occ.}}}{\left|{\psi }_{j}\right|}^{2}\), built with the Kohn–Sham orbitals ψj, is the electron density. This functional form, that encompasses both the LDA68 and the Slater Xα method69, was fitted to the non-local Slater potential70, calculated with the Hartree–Fock orbitals71 of a series of closed-shell atoms (Be, Ne, Mg, Ar, Ca, Zn, Kr, Sr, Pd, and Xe), to determine the fitting parameters a = 1.67 and b = 0.3. It is obvious that Sloc violates several exact conditions, most strikingly the requirement of uniform coordinate scaling of exchange, which requires b = 1/372. Furthermore, and despite the fact that this functional is technically a LDA, it does not recover the exact exchange energy of the homogeneous electron gas for uniform densities, which is obtained only if a = (3/π)1/3 ≈ 0.985 and b = 1/3. Note that the exchange energy functional corresponding to Eq. (4) is Ex = −a/(b + 1)∫nb+1d3r.

Moving to the GGA family of functionals, the second rung of Jacob’s ladder, we find the Engel–Vosko exchange functional (EV93)73. Its enhancement factor Fx, which multiplies the LDA-like part of the functional, has a Padé form:

$${F}_{{\rm{x}}}^{{\mathrm{EV93}}}=\frac{1+{a}_{1}{s}^{2}+{a}_{2}{s}^{4}+{a}_{3}{s}^{6}}{1+{b}_{1}{s}_{1}^{2}+{b}_{2}{s}^{4}+{b}_{3}{s}^{6}},$$
(5)

where \(s=\left|\nabla n\right|/(2n{k}_{{\rm{F}}})\) is the reduced gradient and \({k}_{{\rm{F}}}=({3{\pi }^{2}n})^{1/3}\) is the Fermi momentum. In order to recover the gradient expansion, the difference a1b1 = 10/81 is fixed, reducing to five the number of free-parameters. These were fitted to the local part of the virial relation72, obtained from results of the optimized potential method25,74 for a set of 20 spherical atoms (Kr, Rb, Sr, Mo, Tc, Pd, Ag, Cd, Sb, Xe, Cs, Ba, Eu, Yb, Re, Pt, Au, Hg, Bi, and Rn). These atoms were chosen because they present extensive regions for which s < 1. As EV93 is an exchange-only functional, in this work, as in previous works75,76, it is used in conjunction with the GGA correlation PW91 (EV93PW91). Tested in previous works75,76,77, EV93PW91 moderately improves over the PBE for band gaps.

The Armiento–Mattson (AM05) xc functional78 was constructed in order to reproduce both the homogeneous electron gas and the Airy gas model of the edge electron gas79. Similarly to the GGA PBEsol, AM05 was shown to be more accurate than PBE for the lattice constant of solids80.

The second-order GGA proposed in 2011 (SOGGA11)81 was designed, as the name indicates, with the intent of satisfying the second-order gradient expansion, while preserving good energetics for molecules. This GGA defines an exchange enhancement factor Fx that is the sum of two geometric series, whose arguments are in turn the exchange enhancement factors of PBE6,7 and RPBE82. SOGGA11 treats correlation in a similar fashion, although using inspiration from the GGA of Perdew-Wang (PW91)83,84,85. Due to its construction, this functional has many parameters (24 in total), most of them fitted to a series of databases. According to the results in ref. 86, SOGGA11 is slightly better than the PBE for the band gap of simple semiconductors.

The Armiento–Kümmel functional (AK13)87 is an exchange-only approximation designed to mimic the asymptotic behavior (1/r) of finite and semi-infinite (surface) systems, but also to have a reasonable behavior for small s. The authors thus obtained two expressions for Fx, adequate in the corresponding regimes, and by combining them the resulting enhancement factor takes the form

$${F}_{{\rm{x}}}^{{\rm{AK}}13}=1+{B}_{1}s\,{\mathrm{ln}}\,(1+s)+{B}_{2}s\,{\mathrm{ln}}\,(1+{\mathrm{ln}}\,(1+s)).$$
(6)

The two coefficients are determined by satisfying the gradient expansion at small s, and demanding that at large distance r from the nuclei the potential has the form 1/r. AK13 was shown to improve over the PBE for band gaps88. Note that there are a series of issues related to the asymptotic form of the AK13 functional89,90,91; the situation is similar to the one of LB94 and (m)BJ, that are described below. These problems limit considerably the possibility to use the functional for finite systems. However, they are not relevant for the present study, since we are dealing exclusively with infinite periodic systems. As AK13 is an exchange-only functional, we use it together with the LDA correlation PW9292. In the following, we denote the combination of AK13 exchange and LDA correlation as AK13LDA.

With the vast majority of GGA approximations to the xc energy the corresponding potential vxc decays too fast, i.e., typically exponentially instead of decaying as 1/r (AK13 is one exception). Van Leeuwen and Baerends93 proposed a solution to this problem by modeling directly the exchange potential so that it obeys the correct asymptotic behavior 1/r. Their potential LB94, which is inspired by a previous work of Becke94, has a rather simple form:

$${v}_{{\rm{xc}}}^{{\rm{LB94}}}={v}_{{\rm{x}}c}^{{\rm{LDA}}}-{\left(\frac{n}{2}\right)}^{1/3}\frac{\beta {x}^{2}}{1+3\beta x\,{\rm{arc}}\,{\rm{sinh}}(x)},$$
(7)

where \(x={2}^{1/3}| \nabla n| /{n}^{4/3}=2{\left(6{\pi }^{2}\right)}^{1/3}s\). As in the case of basically any direct approximation to the xc potential74,95, Eq. (7) is not a functional derivative of an energy functional, and thus violates the zero-force and zero-torque theorems, along with several other exact conditions95,96. On the other hand, its correct asymptotic behavior makes LB94 useful in situations where ionization is important97,98,99. As indicated in Eq. (7) the LDA correlation PW92 is used.

We turn now to meta-GGA functionals, which depend also on the kinetic-energy density \(\tau =\left(1/2\right)\mathop{\sum }\nolimits_{j}^{{\rm{occ.}}}\nabla {\psi }_{j}^{* }\cdot \nabla {\psi }_{j}\) and represent the third rung of Jacob’s ladder. The Perdew–Kurth–Zupan–Blaha (PKZB)100,101 functional, one of the earliest meta-GGAs, has an exchange enhancement factor similar to the PBE and recovers the fourth-order gradient expansion. In addition, its correlation component is exactly zero for one-electron densities, i.e., the functional is self-interaction-error free. PKZB, which has two fitted parameters, yields good surface and atomization energies100, but it provides a rather poor description of the geometry of molecules102 and solids103, as well as hydrogen-bonded systems104.

The non-empirical meta-GGA Tao–Perdew–Staroverov–Scuseria (TPSS)105,106 is built on top of the PKZB functional, and it is forced to satisfy additional exact mathematical constraints. In terms of performance (geometry and energetics), TPSS represents an improvement with respect to PKZB105,106.

Several other functionals were built starting from the TPSS, such as the revised TPSS (revTPSS)107,108. This functional attempts to correct the deficiencies of TPSS for solids (e.g., too large lattice constants), while maintaining the same accuracy for molecular systems. The differences between TPSS and revTPSS concern the reparametrization of several coefficients, to better describe the small gradient expansion, and the change of an exponent in the enhancement factor. This functional yields results close to those of PBEsol60, but it suffers from some analytic complications related to its asymptotic behavior109.

The meta-GGA “made very simple” (MVS)110 is an exchange-only functional, which has a relatively simple form

$${F}_{{\rm{x}}}^{{\rm{MVS}}}=\frac{1+{k}_{0}{f}_{{\rm{x}}}(\alpha )}{{\left(1+b{s}^{4}\right)}^{1/8}},$$
(8)

where

$${f}_{{\rm{x}}}(\alpha )=\frac{1-\alpha }{{\left[{\left(1+{e}_{1}{\alpha }^{2}\right)}^{2}+{c}_{1}{\alpha }^{4}\right]}^{1/4}}$$
(9)

and \(\alpha =\left(\tau -{\tau }^{{\rm{W}}}\right)/{\tau }^{{\rm{TF}}}\) is an iso-orbital indicator. In the latter expression, \({\tau }^{{\rm{W}}}={\left|\nabla n\right|}^{2}/\left(8n\right)\) and \({\tau }^{{\rm{TF}}}=\left(3/10\right){\left(3{\pi }^{2}\right)}^{2/3}{n}^{5/3}\) are the von Weizsäcker111 and Thomas-Fermi112,113 kinetic-energy densities, respectively. Equation (9) interpolates between the homogeneous electron gas (α = 1) and the lower bound114 (α = 0) regimes. The constants k0, b, e1, and c1 were chosen such that this functional obeys exact constraints, among which the second-order gradient expansion and the large-Z asymptotic expansion of the exchange energy for neutral atoms115. We use MVS exchange in conjunction with the regularized TPSS107 correlation, sometimes also called vPBE in the literature.

The recent “strongly constrained and appropriately normed” (SCAN)62 functional is currently one of the most popular meta-GGAs for the geometry optimization and energetics of solids103,116. It was constructed non-empirically to satisfy all 17 mathematical conditions that are possible to impose to a meta-GGA functional. Several modifications of SCAN have been proposed117,118. One of these is the regularized SCAN (rSCAN)118, which tries to solve some of the numerical problems of SCAN, related to the iso-orbital indicator α defined above. The derivative of α, a common ingredient in the formulation of meta-GGAs, can lead to divergences in some physical situations119, which in turn cause divergences in the xc potential. rSCAN replaces the kinetic energy of the homogeneous electron gas, the iso-orbital indicator and the switching function of the original SCAN, with modified versions that greatly minimize the divergence problems.

Tao and Mo (TM)120 proposed an exchange enhancement factor based on a weighted average of the density-matrix expansion (DME) and a fourth-order gradient correction (slowly varying correction, SC):

$${F}_{{\rm{x}}}^{{\rm{TM}}}=w{F}_{{\rm{x}}}^{{\rm{DME}}}+(1-w){F}_{{\rm{x}}}^{{\rm{SC}}}.$$
(10)

The explicit form of these terms can be found in ref. 120. As for the correlation, the TM functional is based on TPSS correlation, but with modifications in order, for instance, to improve the description of the low-density region.

Recently, a revised version of TM (revTM)121 has been proposed. This functional applies two modifications to TM inspired by the TPSS and revTPSS functionals. First, it generalizes a variable \(\tilde{q}\) (involving s and α) in the exchange by including an extra parameter b. The generalized \(\tilde{q}\) is the same as the one used in TPSS and revTPSS. Second, it modifies the correlation part by turning the constant β into a function of the Wigner-Seitz radius \({r}_{s}={\left[3/\left(4\pi n\right)\right]}^{1/3}\), similarly to what revTPSS does. This allows to recover the correct asymptotic behavior in the low- and high-density regimes.

We also consider a very recent addition to the literature, the TASK functional122. This functional was built with the goal of introducing ultra-nonlocality, typical of hybrid functionals, at the meta-GGA level. This is achieved by imposing constrains that enforce a sizable contribution to the derivative discontinuity. Although the resulting functional is not outstanding for what concerns atomization energies of molecules, it was found to give a considerable improvement for band gaps of typical semiconductors with respect to PBE and SCAN122.

Turning now to more empirical meta-GGAs, the “high local exchange” proposed in 2017 (HLE17)17 is a re-scaled version of the TPSS using the same ideas of the GGA HLE1616. Specifically, the exchange and correlation parts are scaled by 1.25 and 0.5, respectively:

$${E}_{{\rm{xc}}}^{{\rm{HLE17}}}=1.25{E}_{{\rm{x}}}^{{\rm{TPSS}}}+0.5{E}_{{\rm{c}}}^{{\rm{TPSS}}}\ .$$
(11)

This empirical scaling was chosen such that HLE17, as HLE1616, can yield good results for the band gaps of solids and excitation energies of molecules. As reported in ref. 17, HLE17 performs as well as the hybrid HSE06 for simple semiconductors, but is much worse for antiferromagnetic transition-metal oxides.

The functionals of the Minnesota family are also empirical. Although these functionals belong to several different rungs of Jacob’s ladder, they are sometimes grouped together as they originate from the same research group and follow similar philosophies. These are highly parameterized functionals (often depending on more than 15 parameters) fitted to a series of databases and they aim at a good description of general properties. Among them we will focus on the following meta-GGAs: M06-L123,124, revM06-L125, M11-L126, MN12-L127, and MN15-L127.

In contrast to all meta-GGAs presented above, the one of Räsänen, Pittalis, and Proetto (RPP)128 is only a potential, with no corresponding energy functional. Actually, it belongs to the well-known family of BJ exchange potential approximations61. Similarly to BJ, RPP is built as a correction to the Slater potential70, that is usually accurately approximated129 by the Becke–Roussel (BR) formula130, and takes the form

$${v}_{{\rm{x}}}^{{\rm{RPP}}}={v}_{{\rm{x}}}^{{\rm{Slater}}/{\rm{BR}}}+\frac{1}{\pi }\sqrt{\frac{5}{12}}\sqrt{\frac{D}{n}}\ ,$$
(12)

where

$$D=2\tau -\frac{1}{4}\frac{{\left|\nabla n\right|}^{2}}{n}.$$
(13)

The RPP potential, which differs from the original BJ potential by just the extra term \(-{\left|\nabla n\right|}^{2}/(4n)\) in Eq. (13), approaches zero asymptotically, is exact for any single-orbital system (in contrast with BJ) as well as for the homogeneous electron gas. It has been shown to be as accurate as the BJ potential for the ionization potential, electron affinity, and polarizability of finite systems131. We mention that, as originally proposed128, D also contains an extra term depending on the paramagnetic density-current j, however this term is not relevant here since it is zero in the present context (non-magnetic solids). As in the case of BJ, the RPP potential is used with the BR term in Eq. (12) and is combined with the LDA correlation potential of PW9292.

Results and discussion

As a reliable evaluation of performance cannot be reduced to a single value, we will use an assortment of statistically relevant quantities in our analysis132. Specifically, we resort to Kendall’s133 rank correlation τ and Pearson’s correlation coefficients r; the coefficients of the linear fit y = ax + b for the calculated versus experimental band gaps; the mean absolute error \({\rm{MAE}}=\mathop{\sum }\nolimits_{i = 1}^{n}| {y}_{i}-{y}_{i,\exp }| /n\); the mean error \({\rm{ME}}=\mathop{\sum }\nolimits_{i = 1}^{n}({y}_{i}-{y}_{i,\exp })/n\); the error standard deviation \(\sigma =\sqrt{\mathop{\sum }\nolimits_{i = 1}^{n}{({y}_{i}-{y}_{i,\exp }-{\rm{ME}})}^{2}/n}\); the median error MnE; the interquartile range IQR; the median of the absolute deviations from the median MADM; the mean absolute percentage error \({\rm{MAPE}}=100\times \mathop{\sum }\nolimits_{i = 1}^{n}| {y}_{i}-{y}_{i,\exp }| /(n\ {y}_{i,\exp })\); and the mean percentage error \({\rm{MPE}}=100\times \mathop{\sum }\nolimits_{i = 1}^{n}({y}_{i}-{y}_{i,\exp })/(n\ {y}_{i,\exp })\).

We emphasize that the experimental values of the band gaps have an error bar that is often very hard to determine. Moreover, there are a myriad of effects (temperature, sample quality, fitting, etc.) that can limit the accuracy of the experimental values. In the context of our data set this was already extensively discussed in ref. 43.

In order to gather more insight into the behavior of the functionals, we further separate our analysis on the subsets of sp and fd materials, where the former consists of materials composed of chemical elements with only s and p valence electrons, and the latter includes the remaining compounds. We consider also the subset, denoted TB67, obtained from intercepting the present data set and that of ref. 76. This allows us to compare the effects of data set size on the computed quantities. Apart from the size, the main difference between these two data sets is the absence of magnetic compounds in the present work (six antiferromagnetic oxides were in the data set of 76 compounds used in ref. 76).

Ideally, we would generate further subsets referring to the nature of the compounds (ionic, covalent, van der Waals, etc.). Although physical intuition might be sufficient to classify some compounds in this manner, the lack of a rigorous separation criteria means that we cannot do this on the large scale of the present data set.

In order to present the statistical data, we resort to radar plots that provide a rather intuitive overview over the most important statistical quantities that we use to quantify the errors. The exact numerical values are presented in Table SI of the Supplementary Information.

Error averages for the sets of materials containing each element of the periodic table, along with error averages for specific gaps ranges can be found in Supplementary Figs. S1 and S2. In addition, Supplementary Figs. S3S26 show a comparison of band-gap values and errors between selected pairs of functionals. Supplementary Fig. 27 compares other statistical variables for the whole dataset and Fig. S28 displays the amount of false metals predicted by each functional. Table SII lists ICSD-ids and compositions of the 85 materials used for the reparametrization of the functionals, while Table SIII reports the compositional features used for machine learning feature selection. The Supplementary Information also contains a spreadsheet with all calculated band gaps.

In Figs. 13, we depict the radar plots for all functionals studied here. For comparison, we include also the functionals studied in ref. 43, although in the following we will focus on the new results. The discussion of the results in “LDA”, “GGA”, and “meta-GGA” sections will mainly be based on the analysis of ME, MAE, MPE, and MAPE. A brief discussion on the analysis using the other statistical variables is provided in the “Other metrics” section.

Fig. 1: Radar plots displaying the statistical quantities calculated in this benchmark (MAE, ME, b, σ, and IQR in eV; MPE and MAPE in %).
figure 1

Here we show the radar plots of the LDA and GGA functionals except LB94.

LDA

We start our analysis of the errors with Sloc (Fig. 1b). Although it is a very simple functional, it presents remarkably small errors. With a MAPE of 36% and a MPE of −11%, it surpasses both LDA (51% and −47%, respectively) and PBE (46% and −41%, respectively) in this front (see also Supplementary Fig. S3). In terms of ME (−0.43 eV) and MAE (0.66 eV), the Sloc errors are about twice smaller than with LDA and PBE. With the smaller sample size of the TB67 subset, the ME and MAE change noticeably (they increase to −0.71 and 0.91 eV, respectively), but to a lesser extent for the MPE and MAPE.

Throughout the periodic table Sloc presents low MAPE, with the exception of Pd and Pb (Supplementary Fig. S1). As usual, the inclusion of spin–orbit coupling should improve the results for solids with the Pb element when Sloc gives a too large band gap.

GGA

At the GGA level, the LB94 potential stands as basically the worst performing method (Fig. 2a), being on almost equal footing with LDA, PBE, and PBEsol (Supplementary Fig. S4). For the majority of compounds, LB94 increases the band gap with respect to LDA/PBE/PBEsol, but this is moderate and by far not systematic, since in many cases the LB94 band gap is actually smaller. Furthermore, the number of predicted false metals is 50, while it was in the range 30–35 for LDA/PBE/PBEsol43.

Fig. 2: Radar plots displaying the statistical quantities calculated in this benchmark (MAE, ME, b, σ, and IQR in eV; MPE and MAPE in %).
figure 2

Here we show the radar plots of LB94 and a set of meta-GGA functionals.

The band gap values calculated with AM05 (Fig. 1f) are very similar to the ones obtained with PBE (Supplementary Fig. S5). With 37 false metals, AM05 performs similarly to LDA/PBE/PBEsol and represents a slight improvement with respect to LB94.

The SOGGA11 functional (Fig. 1e) has a better performance than LDA and all GGAs discussed above, as it yields in average smaller errors, as seen from the M(A)E of −0.81 eV (0.90 eV) and M(A)PE of −33% (42%). It predicts 28 false metals, a slightly better result than PBE. In spite of this improvement, the underestimation of the band gaps is still large.

EV93PW91 (Fig. 1h) further improves the results of SOGGA11, with a M(A)E of −0.7 eV (0.8 eV) and M(A)PE −23% (37%). This improvement is seen in both sp and fd subsets, and in general in the range of small to intermediate band gaps (<4 eV). As in the case of the previous GGAs, the mean errors in the fd subset are smaller than those in the sp one, although the situation is inverted for the percentage errors. As seen in Supplementary Fig. S7, this is due to the prevalence of smaller gaps in the fd subset. With 17 false metals predicted, it performs very close to Sloc on this metric.

We end the discussion of the results at the GGA rung with AK13LDA (Fig. 1i). With M(A)PE of 14% (38%) and M(A)E of 0.05 eV (0.58 eV), this is the most accurate GGA among the functionals tested in this work. Its very small ME is an indication of the centered nature of the error distribution. As it turns out, this result is not achieved through a generally good behavior, but this is rather due to error cancellation between the different subsets. For sp materials, AK13LDA tends to overestimate band gaps, particularly the small ones below 2 eV. For fd materials, a general underestimation is observed. As a result, the overall distribution of percentage errors is almost symmetrical (Supplementary Fig. S7). With only 10 false metals predicted, it is the most reliable GGA in identifying semiconductors. All in all, this makes AK13LDA the best GGA functional tested in this work, in particular for band gaps larger than 1 eV. Its performance is overall rather similar (albeit slightly worse for the M(A)PE) to another GGA that we tested in ref. 43, the HLE16. It should be noted that, depending on the computer code, calculations with AK13LDA can be subject to numerical difficulties, particularly in cases of large gradients, and the self-consistent field convergence may be hard to reach. This type of convergence behavior is not uncommon with functionals that exhibit bumpy or unusual potentials, like AK13LDA (see plots of the AK13LDA potential in ref. 134).

All considered GGAs, except AK13LDA, present an overall tendency to underestimate band gaps. Considering chemical compositions, GGAs exhibit their largest errors for compounds containing Ni, Pd, Pt, and Pb (see Supplementary Fig. S1). Curiously, AK13LDA seems to be the functional suffering the most from this condition, in spite of its overall better performance.

meta-GGA

Turning now to the meta-GGAs, we start with PKZB (Fig. 2e). This functional presents mean errors very similar to SOGGA11, nevertheless PKZB reduces the number of predicted false metals (from 28 to 20), particularly in the fd subset. Also in the same way as SOGGA11, PKZB performs poorly for Ni and Pt compounds (Supplementary Fig. S1).

TPSS, revTPSS, TM, and revTM (Fig. 2 f–i), as it turns out, present very similar results to PKZB. This is not only inferred from their mean errors, but also from a direct comparison of the calculated results (Supplementary Fig. S8). We just note that all of these functionals predict a larger number of false metals than PKZB (20 for PKZB, 28 for TPSS, 30 for revTPSS, 24 for TM, and 30 for revTM), and that revTPSS and revTM both predict more false metals than their respective parent functionals TPSS and TM.

MVS (Fig. 3a) stands out as one of the few functionals tested here with positive MPE (2.4%). This is due to the large overestimation of band gaps smaller than 1 eV, while beyond this range it underestimates band gaps, but performs better than the previous meta-GGAs. It also performs clearly better for the prediction of false metals, 11 in total.

Fig. 3: Radar plots displaying the statistical quantities calculated in this benchmark (MAE, ME, b, σ, and IQR in eV; MPE and MAPE in %).
figure 3

Here we show the radar plots of a set of meta-GGA and hybrid functionals.

As expected, rSCAN (Fig. 2k) performs overall similarly to SCAN43, rSCAN achieves a M(A)PE of −23% (37%) while SCAN leads to −27% (38%). A noticeable difference is the description of small band gaps. For band gaps smaller than 1 eV, the MPE of SCAN is about 25%, while it is reduced to 10% for rSCAN. For both functionals, the MAPE for these small band gaps is large (75%). From an elemental point-of-view, the largest difference is that rSCAN presents much larger errors for Pt compounds (with a MAPE of 180%) than SCAN (MAPE of 51%). rSCAN improves the number of predicted false metals, 15, from the 20 of SCAN.

Among the newly tested functionals, HLE17 is one of the best performers (Fig. 3g). In spite of being a very simple (empirical) rescaling of TPSS, it presents considerably better results than all meta-GGAs discussed so far (PKZB, TPSS, etc.) in every measure. Not only does HLE17 reduce the underestimation of the band gaps (ME of −0.44 eV), it does so consistently across the sp and fd subsets (ME is −0.42 and −0.45 eV, respectively). On the other hand, such good results from HLE17 are not really unexpected, since, as already mentioned above, this functional was empirically fitted to be accurate for band gaps. Supplementary Figure S11 shows the comparison of the calculated gaps between TPSS and HLE17. Although a linear trend is observed, we can witness a larger dispersion of the results than in the previous cases. A linear fit to the (TPSS/HLE17) data yields a = 1.1 and b = 0.3, from which we clearly see the HLE17 tendency to predict larger band gaps than TPSS. Both of these observations were expected, as HLE17 increases the exchange contribution with respect to TPSS, leading to increased band gaps. Apart from this, other measures such as the standard deviation σ (0.82 eV), the IQR (0.75 eV), the MnE (−0.30 eV), and the MAE (0.60 eV) are also improved throughout the different subsets. Most striking are the changes in the mean (absolute) percentage errors which go from −37% (43%) for TPSS to −11% (31%) for HLE17. These results come mostly from the improved description of small band gap sp materials (Supplementary Fig. S12), which also explains the tiny MPE of 0.28% in this subset. Overall, the good quality of HLE17 is also visible when comparing the TB67 subset among the functionals. With 17 predicted false metals, HLE17 performs at the same level as Sloc in this aspect.

From Fig. 2n, the TASK functional seems very close to AK13LDA in terms of performance. This is confirmed by direct comparison in Supplementary Figs. S14 and S15, which in turn makes the overall discussion of this functional very similar to that of AK13LDA. However, these functionals present a different behavior in the TB67 subset. Here, TASK performs similarly to PBE0, overestimating small gaps and underestimating large ones, but with a much better performance for mid-range gaps. This can be understood by looking at the errors in this subset: M(A)E −0.4 eV (0.8 eV) and M(A)PE 14% (41%). Apart from this, TASK and AK13LDA present a similar elemental error fingerprint (with Ni, Pt and Pb showing the largest errors) and number of false metals (11 and 10, respectively).

Considering now the meta-GGAs of the Minnesota family, M06-L (Fig. 3b) shows a fingerprint very similar to PKZB, TPSS, revTPSS, TM, and revTM, but with improved results. Actually, with M(A)E of −0.8 eV (0.8 eV) and M(A)PE of −30% (38%), M06-L shows results rather similar to SCAN and the BJ potential, which were tested in ref. 43 (the results are reproduced in Fig. 2l and b, respectively).

The functional revM06-L (Fig. 3c) performs very differently from its parent M06-L, but much more akin to MVS, as shown in Supplementary Fig. S16. However, comparing the M(A)E and M(A)PE of revM06-L and HLE17, the results are of similar quality, and actually the number of false metals is only 10 for revM06-L (17 for HLE17). Thus, as HLE17, revM06-L should be considered as a functional with improved accuracy with respect to most other meta-GGAs. The largest errors with revM06-L are for compounds with Ni and Pt atoms (Supplementary Fig. S1).

The MA(P)E of M11-L are somehow similar to M06-L, i.e., worse than HLE17 and revM06-L but better than TPSS or revTPSS, for instance. Noteworthy, M11-L presents the same overestimation as MVS and revM06-L for band gaps smaller than 1 eV, which is responsible for the disparity between MAPE (40%) and MPE (−4%).

Finally, we turn to MN12-L and MN15-L (Fig. 3e, f), which perform similarly, although the latter is slightly better than the former for the M(A)E and MP(A)E. This can also be seen by comparing directly the results of the two functionals (Supplementary Fig. S18). These two functionals are overall inferior to HLE17.

We finish with the RPP potential (Fig. 2d) that, with M(A)E of −0.6 eV (0.7 eV) and M(A)PE of −22% (36%), performs pretty much as rSCAN (and therefore also SCAN).

Summary for the MAE and MAPE

The MAE and MAPE of most functionals tested in the present and previous43 works are shown in Fig. 4. This visual summary clearly shows which functionals reproduce most accurately the experimental results. The mBJ potential and the hybrid HSE06 are the most accurate for both the MAE and MAPE. Looking at the MAPE, the five best performing functionals are mBJ, HSE06, HLE17, HLE16, and MN15-L, while considering the MAE, we can pick out mBJ, HSE06, TASK, HLE14, and AK13LDA. Considering its extremely simple mathematical form, the performance of Sloc is very interesting. For comparison, we should mention that the results from ref. 77, obtained on the same data set with the computationally fast GLLB-SC potential135, are 0.7 eV and 39% for the MAE and MAPE, respectively, which is very similar to MVS or EV93PW91, for instance. Also, among the five hybrid functionals tested in ref. 43, HSE06 is the most accurate. At that point, we also remind that the mBJ potential shows also good performance (similar to HSE06) for antiferromagnetic oxides with localized 3d-states (not considered in the present work), while all other GGAs or meta-GGAs are clearly worse for these systems16,17,76.

Fig. 4: Overview of the MAE and MAPE of all studied functionals.
figure 4

MAE (x axis) vs. MAPE (y axis) for the functionals included in this work and in ref. 43.

Other metrics

Using other metrics may provide another way of comparing the performance of the functionals or reveal trends. For instance, the standard deviation σ lies in the range from 0.75 eV (TASK) to 1.09 eV (M11-L). This indicates that TASK, but also RPP, lead more systematically to the same error than the other functionals. A similar dispersion is observed for the IQR, which varies from ~0.75 eV (RPP and HLE17) to 1.04 eV (LB94), while the MADM ranges from ~0.36 eV (RPP and HLE17) to 0.52 (LB94). The MnE presents some more interesting details as AK13LDA is the only functional (along with all hybrid functionals tested in ref. 43) with a positive (and very small) value for this quantity (0.07 eV). This is not surprising given the overestimation of band gaps smaller than 1 eV, in combination with the small errors for larger band gaps, as previously discussed. From the remaining functionals, including those tested in ref. 43, HSE06 (0.0 eV), revM06-L (−0.1 eV), and mBJ (−0.1 eV) present the smallest absolute values of MnE.

From the point of view of reproducing the experimental trends with a linear fit (y = ax + b), the functionals that lead to the smallest error in this work are Sloc, AK13LDA, and TASK. Sloc is rather well balanced, since with coefficients a = 0.82 and b = 0.04 eV, it is simultaneously near the optimal values for both quantities. AK13LDA and TASK on the other hand yield values of a of 0.96 and 0.90, respectively, whereas most other functionals give values of a in the range 0.62–0.77. Their intercepts (b = 0.15 and 0.27 eV, respectively) are however worse than the one of Sloc. Since b is typically associated with systematic errors, AK13LDA could be easier to improve than Sloc. Among the methods tested in our previous work43, the mBJ potential shows the best linear fit, with a = 0.88 and b = 0.10 eV, which is overall of the same quality as Sloc and AK13LDA.

In the sp and fd subsets of compounds, the Sloc and AK13LDA functionals have similar behaviors. For sp, they maintain similar values of a (0.84 for Sloc and 0.97 for AK13LDA), but worse values for b (0.20 for Sloc and 0.40 for AK13LDA). However, for the fd subset, both functionals perform significanlty worse.

Functional reparametrization

In the “Results and discussion” section and in ref. 43, we presented a large quantity of statistical data regarding the calculation of band gaps using different functionals. Now we instead study systematically the performance of a few selected functionals for band gaps as a function of the internal parameters of the functional. This allows not only to optimize the functionals, but also to better understand their overall behavior. For this purpose, we selected at least one functional from each rung, with the exception of the hybrids, due to their computational cost.

Due to the large number of calculations required for this task, we did not use the complete data set. Instead, we decided to construct a subset comprising the 85 materials (listed in Supplementary Table SII) with the smallest number of electrons and that do not exhibit strong spin–orbit coupling. Although essential for computational reasons, we are well aware that this choice may lead to a bias. We note nevertheless that a large number of tests and benchmarks are often conducted with similar small data sets76,86,136,137,138.

Using this reduced data set, we selected the best parameters to optimize the size of the band gap. The functionals defined with these fitted parameters will be denoted with the prefix “t-”, to distinguish them from their regular counterparts. Radar plots depicting the errors of the fitted functionals can be found in Fig. 5.

Fig. 5: Radar plots displaying the statistical quantities calculated in this benchmark (MAE, ME, b, σ, and IQR in eV; MPE and MAPE in %).
figure 5

Results for the reoptimized functionals: (a) t-Sloc, (b) t-PBE1, (c) t-PBE2, (d) t-HLE17, (e) t-rSCAN, (f) t-mBJ, (g) t-RPP.

We begin our analysis with Sloc, given its surprisingly good results and its extremely simple form. In Fig. 6, we present the MAPE as a function of the two parameters of Sloc, namely a and b (Eq. 4). In the figure, we also show the parameters that correspond to the standard LDA exchange functional and to the original Sloc. It turns out that the MAPE depends much more strongly on the prefactor a than the exponent b. As Sloc underestimates band gaps, one expects the optimal parameters to counterbalance this by increasing the weight of the exchange contribution. This is achieved by increasing a and decreasing b. Indeed, the minimum of the MAPE is located at a rather small region of the space of parameters, specifically at a = 1.775 and b = 0.260. These new parameters define t-Sloc.

Fig. 6: MAPE as a function of the parameters a and b of Sloc.
figure 6

The symbols indicate the values of the parameter for the different functionals of this family: Sloc, LDA, and t-Sloc. The color-coded scale on the right is in %.

The fitted functional (Fig. 5a and Supplementary Fig. S23) indeed reduces the underestimation of Sloc but only by 0.05 eV on average for the complete data set (individual corrections vary from −0.18 eV for AgF to 0.80 eV for Ar). With the exception of the ME and MPE, the other metrics, most notably the MAPE, are however not improved upon.

The PBE approximation is certainly one of the most successful and used functionals in the solid-state physics community. It depends on four parameters whose numerical values are fixed by theoretical considerations, specifically (i) μ (to obey the slow varying limit of exchange), (ii) κ (to enforce the local Lieb–Oxford bound139), (iii) β (to obey the slow varying limit of correlation), and (iv) γ (to cancel the logarithmic singularity of the correlation in certain limits). The flexibility offered by this parametrization has originated a series of other members of this family (e.g., RPBE82, revPBE140, APBE141, PBEsol60, PBEfe14, xPBE142, and others143), some of which fitted to give more accurate results for specific quantities.

The two parameters that are the most relevant for band gaps are the ones that define the exchange contribution, specifically μ and κ. In Fig. 7, we show the MAPE as a function of these two parameters. The values of μ and κ for a few members of the PBE family of functionals are also indicated in the picture. The MAPE surface is considerably more complex than with Sloc, a feature also observed for other quantities144. There is a clear valley at around (μ, κ) = (0.6, 3) and some craters at higher κ.

Fig. 7: MAPE as a function of the parameters μ and κ of PBE.
figure 7

The symbols indicate the values of the parameter for the different functionals of this family: PBE (), revPBE (Δ), PBEfe (), and t-PBE1 (). The color-coded scale on the right is in %.

We chose to study further two different minima present in the plot, namely the values of (μ1, κ1) = (2/3, 3 + 1/6) (denoted t-PBE1) and (μ2, κ2) = (1/3, 12 + 5/6) (denoted t-PBE2, not shown in the figure). Note that these exhibit considerably higher values of κ with respect to typical PBE reparametrizations.

When compared to PBE, both parametrizations improve upon the severe band gap underestimation and the error dispersion (Fig. 5b, c), and reduce significantly the number of false metals (31 for PBE, 10 for t-PBE1, and 12 for t-PBE2). These two fitted functionals behave very differently for different band gap ranges. For band gaps smaller than 5 eV, t-PBE1 predicts larger band gaps than t-PBE2 (Supplementary Fig. S26), being closer to experimental values in this range. For the higher range this situation seems to be inverted, with t-PBE2 providing a better description of the gaps. However, the reduced number of compounds in this region makes it difficult to ascertain whether this is an actual improvement or only specifically for these compounds.

We now turn to the meta-GGA HLE17. Being simply a rescaling of the TPSS energy functional (Eq. 11), we study its behavior by allowing arbitrary weights wx and wc for each of the components,

$${E}_{{\rm{xc}}}^{{\rm{t}}-{\rm{HLE17}}}={w}_{{\rm{x}}}{E}_{{\rm{x}}}^{{\rm{TPSS}}}+{w}_{{\rm{c}}}{E}_{{\rm{c}}}^{{\rm{TPSS}}}\ ,$$
(14)

to be optimized. In addition to HLE17, we apply the same procedure to rSCAN, i.e.,

$${E}_{{\rm{x}}c}^{{\rm{t}}-{\rm{rSCAN}}}={w}_{{\rm{x}}}{E}_{{\rm{x}}}^{{\rm{rSCAN}}}+{w}_{{\rm{c}}}{E}_{{\rm{c}}}^{{\rm{rSCAN}}}\ .$$
(15)

Figure 8 represents the MAPE calculated on the training set, as a function of the fitting parameters for both scaled HLE17 and rSCAN. For both functionals the variation of MAPE is smooth, although there are qualitative differences between them. For HLE17, the MAPE is essentially independent of the correlation term. For fixed wc, the MAPE has an asymmetric shape, increasing more rapidly for smaller wx. In the end, we selected wx = 1.35 and wc = 1.00 for the final version of t-HLE17.

Fig. 8: Color map of the MAPE as a function of functional parameters of HLE17 and rSCAN.
figure 8

MAPE as a function of the parameters ωx and ωc of HLE17 (a) and rSCAN (b). We also indicate the actual parametrization of the original () and fitted () versions of the functionals. The color-coded scale on the right is in %.

In the case of rSCAN, correlation is much more relevant than in t-HLE17. This can be seen from the well defined minimum in the MAPE surface, located approximately at ωx = 1.30 and ωc = 1.40. These values define the t-rSCAN functional.

Regarding t-HLE17 (Fig. 5d), the small variation in the weight of the exchange part leads to slightly increased band gaps with respect to HLE17 (Supplementary Fig. S20). This is also visible from the change in the ME for the whole data set (from −0.44 eV for HLE17 to −0.27 eV for t-HLE17). However, such increase is not universal, and the amount of compounds for which t-HLE17 predicts a smaller band gap than HLE17 is also not very small (63 compounds in total, with LaF3 showing the largest reduction by 1 eV). As usual, the increase in predicted band gaps leads to a overestimation of band gaps smaller than 1 eV, which in turn translates to a large increase in the MPE. This increase is clearly visible in the sp subset which goes from 0.28% in HLE17 to 10% in t-HLE17. Values of absolute errors (MAE and MAPE) are not significantly changed between the two functionals. Finally, all of these changes are not accompanied by a change in error dispersion, as t-HLE17 presents a σ of 0.83 eV and IQR of 0.80 eV, very close to the HLE17 values.

The improvements of t-rSCAN over rSCAN are much more significant than the corresponding ones for HLE17. This fact is easily seen in Supplementary Fig. S21. As expected, t-rSCAN reduces the underestimations of rSCAN, and does it in all subsets, bringing the ME to −0.13 eV (all), −0.09 eV (sp) and −0.17 eV (fd). Unlike the case of t-HLE17, the MA(P)E are also significantly improved. The only error metric where t-rSCAN worsens with respect to rSCAN is in the MAPE of the sp subset (35% vs. 33%, respectively). When looking at the distribution of errors along the band gap range, one understands that this is, once again, due to the overestimation of band gaps smaller than 1 eV. Apart from this, there are some small gains in error dispersion and a reduced number of false metals (from 17 to 9). The description of the a and b coefficients is also not particularly good for both rSCAN (a = 0.70 and b = 0.09) and t-rSCAN (a = 0.82 and b = 0.34). In addition, this functional maintains the same problems as rSCAN for compounds containing Ni, Pt, Pd, and Pb (Supplementary Fig. S1).

The mBJ potential is an exchange- and potential-only meta-GGA approximation, which describes vx as an enhancement of the BJ potential61,

$${v}_{{\rm{x}}}^{{\rm{mBJ}}}=c{v}_{{\rm{x}}}^{{\rm{BR}}}+(3c-2)\frac{1}{\pi }\sqrt{\frac{5}{12}}\sqrt{\frac{2\tau }{n}}\ ,$$
(16)

with

$$c=\alpha +\beta {\left[\frac{1}{{V}_{{\rm{cell}}}}{\int}_{{\rm{cell}}}\frac{| \nabla n({\bf{r}})| }{n({\bf{r}})}{\rm{d}}{\bf{r}}\right]}^{1/2}\ .$$
(17)

The two parameters α = −0.012 and β = 1.023 Bohr1/2 were originally obtained by fitting band gaps of all-electron calculations to experimental ones15.

The MAPE dependence on α and β is shown in Fig. 9, where we can observe a long diagonal band where the MAPE is essentially constant. The original parameters are already located at the center of this valley. The absolute minimum of the MAPE is αt−mBJ = −0.125 and βt−mBJ = 1.100 Bohr1/2, which we took to define t-mBJ. As we can see, the β parameter is very close to the original one of mBJ, while α is more negative. This, however, does not seem to have a significant impact on the predicted band gaps, as the metrics of mBJ and t-mBJ are essentially the same, as can be seen by comparing Figs. 2l and 5f. The most notable difference between mBJ and t-mBJ is found for Ne (experimental gap of 21.48 eV) for which mBJ predicts 18.85 eV, whereas t-mBJ predicts 21.35 eV. However, it is worth mentioning that for the very large band gaps above ~12 eV, mBJ results obtained with VASP are clearly smaller (up to a few eV for Ne) than the reference mBJ values obtained with an all-electron code76. Such large underestimations by VASP is specific to mBJ and is, according to our investigations, mainly due to the calculation of c [Eq. 17].

Fig. 9: Color map of the MAPE as a function of the parameters α and β of mBJ.
figure 9

The scale on the right is given in %. The symbols indicate the actual parameter values for the different functionals of this family: mBJ () and t-mBJ ().

As previously mentioned, the RPP potential was developed to solve some of the problems of the BJ potential. Therefore, it is rather tempting to modify the original form of RPP by following the ideas of mBJ in order to describe more accurately the band gaps (see also ref. 134):

$${v}_{{\rm{x}}}^{{\rm{t}}-{\rm{RPP}}}=c{v}_{{\rm{x}}}^{{\rm{BR}}}+(3c-2)\frac{1}{\pi }\sqrt{\frac{5}{12}}\sqrt{\frac{D}{n}}\ ,$$
(18)

where c is given by Eq. (17) as in mBJ. Overall, we observe in Fig. 10 that the resulting MAPE looks very similar to the one for mBJ in Fig. 9, but we remark that a different scale is used for the parameters α and β.

Fig. 10: Color map of the MAPE as a function of the parameters α and β of t-RPP.
figure 10

The scale on the right is given in %. Note the different scale with respect to the one for (t-)mBJ in Fig. 9.

The new potential t-RPP shows very promising results. Looking at Fig. 5g, we see that it presents excellent values for the a and b coefficients of the linear fit, at 0.99 and 0.01 eV, respectively. The results are slightly worse for b in the sp subset (a = 1.00, b = 0.13 eV), but more clearly for a and b in the fd subset (a = 0.81, b = 0.23 eV). The mean errors are also very small (ME of −0.03 eV and MPE of 3.1%), but the absolute mean errors (MAE of 0.63 eV and MAPE of 38%) are comparable to other functionals like AK13LDA or MVS. The reason for this comes from the dispersion of errors, and the fact that t-RPP both underestimates and overestimates band gaps, depending on the band gap range under consideration.

Summary for the fitted functionals

A summary of the MAE and MAPE for all fitted functionals is presented in Fig. 11, where the original functionals are also shown for comparison. The most impressive improvements are obtained by reoptimizing the PBE functional, so that the performances of t-PBE1 and t-PBE2 are similar to HLE16 and (t-)HLE17. Noteworthy, the fitted PBE functionals still satisfy the homogeneous electron gas limit, while HLE16, (t-)HLE17, and also t-rSCAN do not, since they have scaled exchange and correlations parts.

Fig. 11: MAE (x axis) and MAPE (y axis) on the band gap for the fitted functionals in this work.
figure 11

The errors of the original functionals are also shown for comparison.

The improvement of rSCAN is also significant. As somehow expected, for functionals which originally were already fitted specifically for band gaps (mBJ, Sloc, and HLE17), the improvement is only minor. We expect that the same conclusion may probably hold also for HLE16.

Machine learning

We witnessed in the previous section the difficulty in improving existing functionals for band gaps, and in particular the persistence of the MA(P)E in the large data set. We will now take one step further and build more complex machine-learning models that combine the results of multiple functionals to achieve higher accuracy.

We chose two different methods that provide highly interpretable models, as the small size of the data sets severely limits the complexity of the models. Specifically: (i) The sure independence screening and sparsifying operator (SISSO)145 starts with a number of features, combines them with linear and non-linear operators to create millions of higher-level features and then attempts to find the best sparse linear model using these higher-level features. (ii) The model agnostic supervised local explanations (called MAPLE)146 is a recently developed machine-learning technique that combines the usability and accuracy of decision trees with the interpretability of local linear models and self-explanations, therefore circumventing the usual accuracy-interpretability trade-off. MAPLE uses traditional decision trees, e.g. random forests147, to arrive at a local simulatable148 linear model. Furthermore, it can output the most influential training samples for a prediction as a form of self-explanation.

As starting features, we used all the results calculated with the various functionals from the benchmark data set (the fitted functionals were not included). Furthermore, we complemented this data with composition specific features created with matminer149 (see Supplementary Table SIII for a full list), with the symmetry groups in Hermann–Mauguin notation, as well as the crystal structure prototype according to the ICSD150. In the case of MAPLE, the feature importance of the underlying random forest model can be used to decrease the number of features by always removing the least important functional and refitting the model. While SISSO ideally selects its features completely independently, the feature selection task in this case is highly non-trivial as all the functionals are highly correlated. To keep the number of functionals that end up in the final features, and therefore the computational cost, limited, we used the most important functionals selected by MAPLE as a starting point for SISSO. Unsurprisingly, the mBJ functional was consistently chosen as the most important or second most important functional. Other relevant functionals varied, for example, we obtained the HLE16 and RPP/M06-L or the LDA and LDA-SOC (with spin-orbit corrections). Due to the high variability in the systems and their band gaps and the small amount of data, 10-fold cross-validation was used to evaluate the models instead of a separate test set. The training/validation split was chosen as 70%/30%. All the following results are averaged according to this validation scheme.

SISSO only returned results comparable to MAPLE once higher-dimensional models were used, therefore we will concentrate on the MAPLE models in the following. For combinations of two functionals and three functionals together with the selected experimental features, we selected the best performing pair of functionals by brute-force trying all combinations of functionals together with mBJ. In Fig. 12, we compare MAPLE models using various numbers of functionals and elemental properties as input to the best functional in the complete benchmark (present work and ref. 43). Using only band gaps calculated with the mBJ functional as input, the MAE, IQR, and ME are significantly improved while the MAPE and MPE increase. Using band gaps calculated with two or three functionals the MAE, ME, standard deviation σ, and IQR are significantly improved, while the MAPE still remains nearly unchanged. Concerning the linear fit, the coefficient b is also clearly improved, while a is increased. Examining the distribution of the MAPE reveals that this quantity is larger than 70% for the small band gaps (<1 eV) while being comparatively small at ~20% for the larger band gaps. We also attempted to learn the error of the mBJ functional in comparison to the experimental data (which is known as delta-learning, or using a crude estimator of property151). However, in our case this approach did not seem to work.

Fig. 12: Radar plot for three MAPLE models and the mBJ functional.
figure 12

All models use various elemental features: purple uses the mBJ band gap, light blue uses mBJ and M06-L band gaps, and yellow uses mBJ, HLE16 and M06-L band gaps. Results correspond to 10-fold cross-validation with 70%/30% training/test split. Note the different scale with respect to the previous radar plots. The radar plot of the MBJ functional is given in orange for comparison.

The difficulty in improving the estimators, either by reoptimizing functionals or through machine learning models, is perhaps not surprising if we consider the relatively small size of the data set (473 materials), the large range of band gaps (between 0 and 20 eV), and the large diversity of compositions (nearly all chemical elements) and crystal structures (over 150 structure prototypes, from elementary to quaternary materials). However, if we only consider the region between 0.9 and 2 eV (171 entries in total), which is the most relevant for applications like photovoltaics, we can actually achieve a significant improvement in all statistical properties. This can be clearly seen from Fig. 13, where we depict the radar plots for three models, one including compositional features together with the mBJ band gaps, one adding the RPP band gap and another one adding the HLE16 and rSCAN band gaps. The MAE and MAPE are both improved by 40% in comparison to the mBJ functional and present the most accurate estimator of band gaps between 0.9 and 2 eV that we could find.

Fig. 13: Radar plot for three MAPLE models and the mBJ functional.
figure 13

All models use various elemental features: purple uses the mBJ band gap, light blue uses mBJ and RPP band gaps, and yellow uses mBJ, HLE16 and rSCAN band gaps. The radar plot of the MBJ functional is given in orange for comparison. The results are for only the materials with band gaps from the region of interest between 0.9 and 2 eV. Results correspond to 10-fold cross-validation with 70%/30% training/test split. Note the different scale with respect to the previous radar plots.

In order to allow other researchers to improve their band gap predictions, we will publish the machine learning model on our website (https://www.tddft.org/bmg/). Besides predicting the band gap, the MAPLE model is able to highlight the materials in the training set that were most relevant for its prediction as well as the local linear fit from which the band gap is calculated. This allows researchers to make an informed decision on whether to trust the predicted band gap.

In summary, we performed calculations of the electronic band gap of 473 materials using 21 different approximations to the exchange-correlation functional of density-functional theory. These include a series of local, generalized-gradient, and meta-generalized-gradient approximations. Together with our previous results of ref. 43, this amounts to the largest benchmark of band gaps up to date, and allows us an unprecedented view on the problems and deficiencies of density-functional theory to calculate this important physical property.

Statistically, at the lowest rung of Jacob’s ladder of density functionals, the most accurate approximation is Sloc. Going up the ladder to the second and third rungs, we then encounter the GGA HLE16 and the meta-GGAs mBJ, HLE17, and TASK. Finally, the hybrid HSE06 (fourth rung) is also among the most accurate. Overall, the most reliable functionals for band gap calculations are mBJ, that yields the lowest mean absolute (percent) error, and HSE06. Other semilocal functionals that perform reasonably well are for instance AK13LDA and some meta-GGAs of the Minnesota family like revM06-L.

Even if our objective was not to perform a detailed analysis of the computational cost of the benchmarked functionals, as this would require to carefully control the running conditions for all calculations, we believe that some information concerning the runtimes can be useful to the readers. Our computational resources consisted in a shared cluster. To make the best possible use of the available resources, not all calculations could be performed with the same number of cores, type of processors and parallelization options. Furthermore, our running times refer to VASP and could be substantially different if other implementations are used. For example, the adaptively compressed exchange operator algorithm152 provides a significant boost of hybrid functional calculations, but it is not implemented in version 5.4.4 of VASP.

Keeping this in mind, we can compare the numerical efficiency of calculations using different functionals for our data set. The average meta-GGA is around 5 times slower than LDA/GGAs, probably due to the need to evaluate extra terms in the Hamiltonian. The mBJ (and related functionals) are 10–50 times slower, due to the larger number of iterations required for convergence and to the time needed to solve the non-linear equation that appears in the definition of the functional. Calculations using regular hybrids and dielectric-dependent hybrids from ref. 39 are two order of magnitudes slower than LDA/GGAs calculations.

We also analyzed a smaller subset of our 473 materials that includes many of the materials that are commonly used in the benchmarks of band gaps. It includes mostly elementary or binary zincblende and wurtzite semiconductors, with a strong presence of sp elements. The small set is mostly composed of “simple” materials and is therefore not fully representative of the set of semiconductors and insulators. Fitting functionals to this set may therefore lead to a bias.

In order to understand the behavior of the analytical form of selected functionals, we then investigated how the average error in the band gap depends on the individual parameters used in the construction of the functionals. We performed such maps for Sloc, PBE, HLE17, rSCAN, mBJ, and RPP. By using the values of the parameters that minimized the mean absolute error, we then constructed reparametrizations of these approximations. It turns out that one can construct accurate functionals for band gaps at all rungs of Jacob’s ladder if the fraction of exchange is increased. Of course, this may lead to a degradation of the results for other physical quantities (see ref. 77 for a discussion on that problem). Although one can improve considerably some of the statistical quantities describing the error in the band gaps, the mean average percent error remains high, likely due to the small size of the fitting set.

Finally, we developed machine learning models to improve the prediction of band gaps. The input for our models are band gaps calculated with approximate exchange-correlation functionals and structural and compositional data. We were unable to train consistently better models that worked in the whole range of band gaps. This can be explained by the small size of the data set (for machine learning applications) and by the extremely large diversity of the materials considered. We could, however, train a very simple model that improves considerably the band gaps in the range of 0.9–2 eV, an interval that includes the range relevant for photovoltaic applications.

The recurring problem in benchmarking band gaps is the relatively small number of reliable experimental data. These can be complemented by information on new materials, but also by revisiting “old” semiconductors with more modern experimental techniques. This can lead to a deeper understanding of material properties and to the development of better theoretical tools and approaches. For band gaps, these obtained data should also ideally be corrected for temperature effects, zero-point corrections, excitonic binding energies, etc., factors that together can explain some of the deviation to calculated results. This time-consuming and tedious process can nevertheless be accelerated by the use of active-learning techniques to propose new candidates for experimental characterization to better cover material space. A close collaboration between experimentalists and theorists would allow for the feedback necessary to obtain reliable and sufficient data for the use of sophisticated theoretical and statistical methods.

Methods

In order to benchmark the functionals presented above, we resort to the data set of ref. 43. As a result of our effort to maintain and improve the data set some modifications have been made to the original reference. One entry (BaCu2GeS4) was removed from data set due to a discrepancy in the reported experimental structures. Two entries were modified: LiIO3, which pointed to a structure with same space group but permuted ions, and TiO2, updated to a more accurate reference. Finally, two new entries have been added, namely BAs and CdO, bringing the total number of entries to 473. Obviously, due to the large size of the data set, these few modifications do not change the conclusions of our previous work.

For each material of the data set, calculations were performed at the corresponding experimental geometry, obtained from the inorganic crystal structure database153.

The calculated band gaps were obtained from self-consistent calculations as the difference of the Kohn–Sham eigenvalues at the valence band maximum and conduction band minimum. All calculations were performed within the projector augmented wave formalism154, as implemented in the Vienna ab initio simulation package (VASP; version 5.4)155. A custom version of VASP, linked to the library of xc functionals Libxc4,5 was used to access xc functionals not implemented in the default distribution. Within the set of pseudopotentials of the VASP distribution, we use the ones recommended by the materials project database156. All meta-GGA calculations were performed accounting for non-spherical contributions of the density gradient inside the augmentation spheres. We used the same k-point sets as in ref. 43, that ensured values converged within 50 meV for both PBE and HLE16 calculations. In addition, we neglect the effect of spin–orbit coupling. Although not entirely negligible, this effect is expected on average to contribute by about 0.1 eV43, which is considerably smaller than the typical average error of approximate functionals.

One final comment is in order. All our calculations were performed with VASP, that uses PAW pseudopotentials to model the electron–ion interaction. The VASP distribution provides a well-tested set of PAW potentials for two functionals (LDA and PBE). This means that most of our calculations, as most of pseudopotential calculations in literature, were performed with functional-inconsistent pseudopotentials. However, the error introduced can be safely neglected if one is only interested in statistical averages157.