Introduction

Advances in computing power and electronic structure methods, in particular density functional theory (DFT), have propelled computation and theory to the forefront of materials design and discovery. Countless efforts now employ data-driven and/or high-throughput computational approaches1,2,3,4,5,6,7,8 to isolate individual materials or groups of materials with desired properties3,4,5,6,9,10,11. For instance, the Materials Genome Initiative has fostered multiple research projects that use DFT predictions to map out the physical and chemical properties of known and hypothetical materials3,4,12,13,14,15,16,17,18. A striking example is the work of Greeley and co-workers, who screened over 700 binary alloys and identified BiPt as the best electrocatalyst for the hydrogen evolution reaction, with a surface activity comparable to that of pure Pt12. Similarly, Hautier et al. explored phosphate chemistry across thousands of compounds to design lithium-ion battery cathodes13. Emery and co-workers searched through 5,329 ABO3 perovskite compounds to find thermodynamically favorable candidates for a two-step thermochemical water splitting process17. Combining high-throughput DFT with materials informatics and machine learning techniques promises even greater search efficiency5,19,20,21,22.

Central to the success of DFT are the approximations for many-body exchange and correlation (XC) within the framework of the Hohenberg–Kohn theorems and the Kohn–Sham equations23,24. These approximations balance accuracy against speed, which makes DFT an attractive tool for computational materials design. Initial efforts to approximate XC relied on the homogeneous electron gas assumption25, in which the energy at a point depends only on the charge density \(n\) at that point. This local density approximation (LDA)23,24,26 tends to overbind, leading to errors in calculated physical properties27, including underestimations of lattice parameters28,29,30 and vibrational frequencies31,32. This overbinding is attributed to the LDA exchange, which scales as \(n^{4/3}\) rather than as \(n^{2}\), as expected from Hartree–Fock exact exchange33.
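
To make the \(n^{4/3}\) scaling concrete, the minimal Python sketch below evaluates the standard Dirac/Slater exchange energy per unit volume of the spin-unpolarized uniform electron gas, \(e_x^{\mathrm{LDA}}(n) = -\tfrac{3}{4}(3/\pi)^{1/3} n^{4/3}\) in Hartree atomic units, and recovers the exponent numerically from two densities. It is an illustration only, not part of the workflow used in this study, and the function names are hypothetical.

```python
import numpy as np

# Dirac/Slater exchange constant for the spin-unpolarized uniform electron gas (Hartree a.u.)
C_X = -0.75 * (3.0 / np.pi) ** (1.0 / 3.0)


def lda_exchange_energy_density(n):
    """LDA exchange energy per unit volume, e_x(n) = C_X * n**(4/3)."""
    n = np.asarray(n, dtype=float)
    return C_X * n ** (4.0 / 3.0)


def scaling_exponent(func, n0=0.1, factor=2.0):
    """Estimate p in func(n) ~ n**p from the ratio of two evaluations."""
    return np.log(func(factor * n0) / func(n0)) / np.log(factor)


p = scaling_exponent(lda_exchange_energy_density)
print(f"LDA exchange energy density scales as n^{p:.4f}")
```

Because the exchange energy density is a pure power law in \(n\), doubling the density multiplies it by \(2^{4/3}\) regardless of the reference density chosen.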

To improve on this approximation, functionals were developed that account for density variations in the electron gas, i.e., that include local density gradients. These so-called generalized gradient approximations (GGAs) admit a much wider range of forms and thus ushered in an explosion of functionals34. Early examples include the XC functional of Perdew, Burke and Ernzerhof (PBE), which improves quantities such as binding energies but tends to overestimate lattice constants35. A more recent variant, PBEsol, was designed to return to the gradient expansion approximation to give better agreement for solids36. Unfortunately, GGAs still underestimate band gaps, sometimes predicting semiconductors to be metals37. Hybrid GGAs, such as B3LYP38,39, PBE040 and HSE41, mix the GGA with non-local exact exchange, which improves the description of electronic band gaps as well as covalent, ionic, and hydrogen bonding, but at increased computational cost42,43. Similarly, the non-empirical strongly constrained and appropriately normed (SCAN) meta-GGA provides a significant improvement in accuracy over standard LDA and PBE44,45, but is still comparatively expensive. Finally, the class of van der Waals density functionals (vdW-DF) incorporates long-range vdW interactions, allowing material functionality to be computed for both sparsely and densely packed structures45,46,47,48,49,50.
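
As an illustration of how PBEsol modifies PBE, the sketch below evaluates the PBE-form exchange enhancement factor \(F_x(s) = 1 + \kappa - \kappa/(1 + \mu s^2/\kappa)\) with the published constants \(\kappa = 0.804\) and \(\mu = 0.21951\) for PBE, or \(\mu = 10/81\) (the second-order gradient-expansion coefficient) for PBEsol. For small \(s\), \(F_x \approx 1 + \mu s^2\), so lowering \(\mu\) is what restores the gradient expansion. The helper names are hypothetical, and only the exchange part is sketched; the correlation changes made in PBEsol are not included.

```python
import numpy as np

KAPPA = 0.804            # large-s bound parameter shared by PBE and PBEsol exchange
MU_PBE = 0.21951         # PBE exchange gradient coefficient
MU_PBESOL = 10.0 / 81.0  # PBEsol: second-order gradient-expansion coefficient


def pbe_like_fx(s, mu, kappa=KAPPA):
    """PBE-form exchange enhancement factor F_x(s) = 1 + kappa - kappa / (1 + mu s^2 / kappa)."""
    s = np.asarray(s, dtype=float)
    return 1.0 + kappa - kappa / (1.0 + mu * s**2 / kappa)


s_values = np.array([0.0, 0.5, 1.0, 2.0, 10.0])
for name, mu in [("PBE", MU_PBE), ("PBEsol", MU_PBESOL)]:
    # PBEsol rises more slowly with s, i.e. weaker gradient correction to LDA exchange
    print(name, np.round(pbe_like_fx(s_values, mu), 4))
```

Both forms tend to \(1 + \kappa\) at large \(s\); they differ mainly in how quickly they leave the LDA value \(F_x = 1\).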

It is widely accepted that the exchange–correlation functional approximation is the main source of error in DFT calculations, and that the goal is to choose the most reliable functional51,52. The broad diversity of XC functionals raises the question: for a given composition and structure, which functional is the best choice? A major challenge, however, is determining the reliability of a given functional and the source of its error53,54. The reliability of DFT calculations is mainly addressed by two approaches: Bayesian error estimation and statistical analysis. The Bayesian approach is semi-empirical: an ensemble of XC functionals is used, and the desired quantities are represented as a distribution whose spread provides an error estimate52,53,55,56,57. This method has been applied successfully to uncertainty quantification in a wide variety of DFT applications, ranging from the calculation of physical and surface properties to the refinement of phase diagrams53,55,56,58,59,60,61,62.

In statistical analysis by regression, experimental measurements are expressed as a function of DFT-calculated values. This a posteriori analysis provides material-specific predictions of calculation errors. An example of this method, developed by Lejaeghere et al., employed a linear function51. A later effort used polynomial functions to represent the relationship between calculations and measurements for cubic crystals63. This approach of mapping measurements to calculations and statistically analyzing the error has since been used successfully to predict the errors of DFT-based and semi-empirical calculations of surface energies and work functions, energetic and elastic quantities, thermal expansion coefficients, and melting temperatures51,52,54,64,65.
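
A minimal sketch of the regression idea, assuming a simple least-squares line relating measured to calculated values (the exact model of ref. 51 may differ), is shown below with synthetic lattice constants. Because a least-squares fit with an intercept forces the mean residual to zero, it removes the systematic part of the error and leaves only the scatter. The function names are hypothetical.

```python
import numpy as np


def fit_linear_calibration(calc, expt):
    """Least-squares fit of expt ~ slope * calc + intercept."""
    slope, intercept = np.polyfit(np.asarray(calc, float), np.asarray(expt, float), deg=1)
    return slope, intercept


def apply_calibration(calc, slope, intercept):
    """Map raw DFT values onto the experimental scale."""
    return slope * np.asarray(calc, float) + intercept


# Synthetic lattice constants in angstrom (illustrative only, not data from this study)
calc = np.array([3.95, 4.02, 4.10, 5.45, 5.60])
expt = np.array([3.905, 3.97, 4.05, 5.39, 5.53])

slope, intercept = fit_linear_calibration(calc, expt)
corrected = apply_calibration(calc, slope, intercept)
raw_mae = np.mean(np.abs(calc - expt))
cal_mae = np.mean(np.abs(corrected - expt))
print(f"slope = {slope:.4f}, intercept = {intercept:.4f} A")
print(f"MAE before: {raw_mae:.4f} A, after calibration: {cal_mae:.4f} A")
```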

The question of the best exchange–correlation functional is particularly relevant for high-throughput calculations, where a single functional may be used to screen a large group of materials. To examine the accuracy of such approaches, we employ high-throughput DFT calculations to quantify the accuracy of four XC functionals: LDA, PBE, PBEsol, and vdW-DF with C09 exchange46. We explore the properties of 141 binary and ternary oxides spanning 44 different space groups across all seven crystal systems. The structures contain 4–264 atoms and 31 different cations, and were chosen based on previous work by Hautier and co-workers14. Using the Nexus workflow management system66, we computed the lattice parameters, bulk moduli, and reaction enthalpies for forming the ternary oxides from the binary oxides, and compared them against available experimental data. Further details of the methods are given in the Supplementary Information (SI), and the distribution of atomic structures studied is given in Fig. S1. In general, we find that the vdW-DF-C09 and PBEsol functionals have the lowest errors.

In this study, we approach the problem of finding the best functional from the perspective of material-specific calculation errors. We use the term error instead of uncertainty because uncertainty is defined as a “non-negative parameter characterizing the dispersion of the quantity values being attributed to a measurand”67. The quantity we are interested in is not a dispersion but a material-specific deviation from experimental measurements. In addition, we do not consider ensembles of the same functional, as in the Bayesian Error Estimation Functionals53,55,56. Instead, we calculate errors and evaluate them separately for each of the four functionals employed in this study. Another difference is that we use machine learning to estimate the error in terms of material-specific parameters. Employing materials informatics methods, we quantify the main contributions to the errors and explore their physical origins. Our results emphasize the importance of both the electron density and metal–oxygen bonding/orbital hybridization. We demonstrate the strength of this approach by predicting the lattice constant errors of a distinct dataset of potential perovskites. Overall, our approach provides a means of understanding the expected errors associated with a particular XC, which can lead to greater accuracy in materials screening, and offers insights that may enable the development of improved XC functionals. In this sense, incorporating “error bars” within DFT means using advanced machine learning models to estimate the errors a particular functional would make for a given property and, ultimately, recommending the functional best suited to a particular class of materials.

Results and discussion

DFT-based properties with different XC functionals

Figure 1 shows the DFT-optimized lattice constant percent errors relative to experimental values. These errors are good indicators of the over- or underbinding caused by the XC. As expected, PBE on average overestimates lattice constants, while LDA typically underestimates them. Interestingly, vdW-DF-C09 yields lattice constant percent errors similar to those of PBEsol, with errors centered around 0%.

Figure 1

Histogram of DFT-predicted percent errors relative to the experimental lattice parameters using different XC functionals. a, b and c lattice parameters are considered separately for each structure.

The distribution of the absolute percent errors of the lattice constants is depicted in Fig. 2a, with the error of each lattice constant examined independently. For each XC functional, the y-axis indicates the percentage of lattice constant data points with an absolute percent error less than or equal to the x-axis value, so the curves asymptote to 100%. The mean absolute relative error (MARE) and standard deviation (SD) are 2.21% and 1.69%, respectively, for LDA and 1.61% and 1.70% for PBE. PBEsol and vdW-DF-C09 have significantly lower errors: PBEsol has a MARE of 0.79% and an SD of 1.35%, and vdW-DF-C09 has a MARE of 0.97% and an SD of 1.57%. We previously observed similar accuracy for PBEsol and vdW-DF-C09 for a small dataset of ferroelectric perovskites30. Counting the data points with absolute errors below 1%, nearly 80% of the PBEsol and vdW-DF-C09 lattice constants fall within this range, whereas for PBE and LDA this fraction drops dramatically, to < 30% and < 20%, respectively.
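
The statistics quoted above can be computed from paired DFT and experimental lattice constants along the lines of the sketch below. It assumes that the percent error is \(100(a_{\mathrm{DFT}} - a_{\mathrm{exp}})/a_{\mathrm{exp}}\) and that the SD is the population standard deviation of the absolute percent errors; the text does not state which convention was used. The input values and function names are hypothetical.

```python
import numpy as np


def percent_errors(a_dft, a_exp):
    """Signed percent error of DFT lattice constants relative to experiment."""
    a_dft = np.asarray(a_dft, dtype=float)
    a_exp = np.asarray(a_exp, dtype=float)
    return 100.0 * (a_dft - a_exp) / a_exp


def error_summary(pct_err, threshold=1.0):
    """MARE, SD, and fraction of points with |error| <= threshold (in percent)."""
    abs_err = np.abs(pct_err)
    return {
        "MARE": float(abs_err.mean()),
        "SD": float(abs_err.std()),  # assumption: population SD of the absolute errors
        "fraction_within": float(np.mean(abs_err <= threshold)),
    }


def cumulative_curve(pct_err):
    """x: sorted absolute errors; y: percentage of points with error <= x (cf. Fig. 2a)."""
    x = np.sort(np.abs(pct_err))
    y = 100.0 * np.arange(1, x.size + 1) / x.size
    return x, y


# Hypothetical a, b, c values pooled across structures (angstrom), each treated independently
a_exp = np.array([4.0, 4.0, 4.0, 4.0])
a_dft = np.array([4.0, 4.0625, 3.9375, 4.125])
err = percent_errors(a_dft, a_exp)
summary = error_summary(err, threshold=2.0)
x, y = cumulative_curve(err)
print(err, summary)
```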

Figure 2

Distribution of DFT-computed percent errors of (a) lattice parameters and (b) bulk moduli for different XCs. a, b and c lattice parameters are considered separately for each structure.

Intuitively, one would expect XC errors to couple to the chemistry of the compounds14. The percent error as a function of chemical element is shown in Fig. 3 for the four XC functionals. Although no distinctive pattern emerges, for PBEsol and vdW-DF-C09 the compounds with lighter chemical elements (Z < 23) have MAREs below ~1%, and the greatest errors are associated with magnetic elements, specifically Cr, Fe, Ni, and Mo. (All calculations were exclusively spin-unpolarized in order to isolate common XC errors from spin-dependent errors. The results suggest that magnetoelastic couplings may be significant in several compounds.) In fact, excluding the compounds that contain magnetic ions significantly reduces the MARE for PBEsol (MARE = 0.64%) and vdW-DF-C09 (MARE = 0.78%) (see Table S4).

Figure 3

MAREs of DFT-predicted lattice parameters relative to experimental lattice parameters as a function of chemical elements using (a) LDA, (b) PBE, (c) PBEsol, and (d) vdW-DF-C09. For ternary compounds, with a general formula of AmBnOx, the species indicated by the y- and x-axes can be on either the A- or the B-site. As a result, the figures are symmetric relative to the diagonal of the plots. Binary compounds are considered in the mean.

In addition to lattice parameters, the bulk moduli were computed using the Birch–Murnaghan equation of state given in Eq. S1. As Fig. 2b shows, the MAREs of the predicted bulk moduli are larger than those of the lattice parameters for all XC functionals, which is expected because the bulk modulus, \(B = V\,\partial^2 E/\partial V^2\), depends on the curvature of the energy–volume curve. LDA and PBE over- and underestimate the bulk moduli, respectively, owing to the inverted dependence of total energy on volume. LDA yields the largest MARE, 17.93%, with an SD of 19.82%, followed by PBE with a MARE of 15.75% and an SD of 13.10%. PBEsol and vdW-DF-C09 have lower errors: a MARE of 9.49% and an SD of 12.79% for PBEsol, and a MARE of 11.45% and an SD of 13.40% for vdW-DF-C09.
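
Eq. S1 is not reproduced here; assuming it is the standard third-order Birch–Murnaghan form, \(E(V) = E_0 + \frac{9V_0B_0}{16}\{[(V_0/V)^{2/3}-1]^3B_0' + [(V_0/V)^{2/3}-1]^2[6-4(V_0/V)^{2/3}]\}\), a fit could be sketched with SciPy's `curve_fit` as follows. The energies are synthetic, generated from known parameters so that the recovered \(B_0\) can be checked, and the initial guess comes from a parabola using \(B = V\,\partial^2E/\partial V^2\). The helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

EV_PER_A3_TO_GPA = 160.21766208  # 1 eV/A^3 expressed in GPa


def birch_murnaghan(V, E0, V0, B0, B0p):
    """Third-order Birch-Murnaghan energy E(V); B0 in energy/volume units."""
    eta = (V0 / V) ** (2.0 / 3.0)
    return E0 + 9.0 * V0 * B0 / 16.0 * (
        (eta - 1.0) ** 3 * B0p + (eta - 1.0) ** 2 * (6.0 - 4.0 * eta)
    )


def fit_bulk_modulus(volumes, energies):
    """Fit E(V) data (eV, A^3); return equilibrium volume and bulk modulus in GPa."""
    V = np.asarray(volumes, dtype=float)
    E = np.asarray(energies, dtype=float)
    # Parabolic initial guess: E ~ a V^2 + b V + c, so V0 ~ -b/(2a) and B0 ~ V0 * 2a
    a, b, _ = np.polyfit(V, E, 2)
    v0_guess = -b / (2.0 * a)
    p0 = [E.min(), v0_guess, 2.0 * a * v0_guess, 4.0]
    (E0, V0, B0, B0p), _ = curve_fit(birch_murnaghan, V, E, p0=p0)
    return {"E0": E0, "V0": V0, "B0_GPa": B0 * EV_PER_A3_TO_GPA, "B0p": B0p}


# Synthetic E(V) points generated from known parameters (not data from this study)
true = {"E0": -40.0, "V0": 60.0, "B0": 1.2, "B0p": 4.5}
volumes = np.linspace(54.0, 66.0, 11)
energies = birch_murnaghan(volumes, **true)
fit = fit_bulk_modulus(volumes, energies)
print(f"V0 = {fit['V0']:.3f} A^3, B0 = {fit['B0_GPa']:.1f} GPa, B0' = {fit['B0p']:.2f}")
```

In practice the fitted \(B_0\) depends on the chosen volume window, so the same window is normally used for every functional being compared.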

Next, we examine the computed errors in the enthalpies of the reactions that form ternary oxides from binary oxides, \(\mathrm{A}_m\mathrm{O}_n + \mathrm{B}_x\mathrm{O}_y \rightarrow \mathrm{A}_m\mathrm{B}_x\mathrm{O}_{n+y}\), as shown in Fig. S2. A full list of reaction energies for each XC, along with the corresponding experimental values, is given in Table S3. There is excellent agreement between DFT and experiment for all XCs. This suggests that the cancellation of systematic errors, a hallmark of the success of DFT, does not depend significantly on the XC. Similar observations were reported in a high-throughput DFT (GGA + U) study of these metal oxides14.
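
For reference, the reaction energy in this scheme is a difference of total energies per formula unit, \(\Delta H = E(\mathrm{A}_m\mathrm{B}_x\mathrm{O}_{n+y}) - E(\mathrm{A}_m\mathrm{O}_n) - E(\mathrm{B}_x\mathrm{O}_y)\). The sketch below assumes 0 K DFT total energies with zero-point and thermal contributions neglected, and uses made-up cell energies; the function names are hypothetical.

```python
EV_TO_KJ_PER_MOL = 96.485  # 1 eV per formula unit is about 96.485 kJ/mol


def energy_per_formula_unit(e_cell, n_formula_units):
    """Total energy of a simulation cell divided by its number of formula units."""
    return e_cell / n_formula_units


def reaction_energy(e_product, e_reactant_a, e_reactant_b, coeff_a=1.0, coeff_b=1.0):
    """Energy of coeff_a*AmOn + coeff_b*BxOy -> product, per product formula unit (eV)."""
    return e_product - coeff_a * e_reactant_a - coeff_b * e_reactant_b


# Hypothetical cell energies (eV) for AO + BO2 -> ABO3; not results from this study
e_abo3 = energy_per_formula_unit(-160.0, 4)
e_ao = energy_per_formula_unit(-56.0, 4)
e_bo2 = energy_per_formula_unit(-50.0, 2)
dH = reaction_energy(e_abo3, e_ao, e_bo2)
print(f"dH = {dH:.3f} eV/f.u. = {dH * EV_TO_KJ_PER_MOL:.1f} kJ/mol")
```

A negative value indicates that the ternary oxide is stable with respect to the two binary end members.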

Together, our observations indicate that PBEsol and vdW-DF-C09 are excellent general-purpose XCs for examining structural properties, while LDA and PBE fail to capture the structural/mechanical properties of bulk oxides. These results provide a measure of accuracy that may be applied when performing high-throughput calculations.

Materials informatics approach

Exploiting this high-throughput dataset, we employed a materials informatics approach to determine a priori the “best” XC for a given metal oxide. (Note: we exclude oxides that have a lattice parameter error greater than 5% in a single lattice parameter for more than two functionals, as well as oxides with errors greater than 5% in multiple lattice parameters. Non-ground-state structures are also excluded. This results in 123 data points. We further exclude oxides for which no ionic radius value is available for an element at its coordination number, leaving 110 samples for model selection and development. The details are explained in the SI.) Unlike some other efforts68,69 that bring DFT and data-based methods together to predict the magnitude of a physical property, e.g., the lattice parameter, our effort aims to predict the percent error attributable to an XC for a specific property. The philosophy is to provide material-specific “error bars”, i.e., deviations from experimental measurements, for each XC. In this case (as illustrated in Fig. 4), we consider the mean absolute errors (MAEs) and SDs as measures of the range of errors one would expect when studying lattice parameters with each of the functionals. Choosing a “best-fit” XC beforehand should help improve the accuracy of high-throughput datasets. Here, we focus on the percent error in the average lattice parameter, \(\delta \overline{a}\), as an archetypical physical quantity.

Figure 4

Comparison of the percentage error in average lattice parameter before (XC error) and after RF correction (RF-C XC error).

Based on available standard elemental and structure-specific properties: (i) electronegativity of A- and B-site elements (\({\rm X}_{A}\) and \({\rm X}_{B}\) ); (ii) number of valence electrons (nA and nB); (iii) atomic numbers (\({Z}_{A}\) , \({Z}_{B}\), \({Z}_{O}\)), (iv) ionic volumes (\({V}_{A}\), \({V}_{B}\), \({V}_{O}\)); (v) DFT ionization percent errors (δIEA, δIEB); (vi) number of A-site, B-site, and oxygen atoms (\({N}_{A}\), \({N}_{B}\), \({N}_{O}\)); (vii) nominal charge of A- and B-site elements (\({\sigma }_{A}\) and \({\sigma }_{B}\)); and (viii) coordination numbers (\({CN}_{A}\) and \({CN}_{B}\)), we created the following compound-specific feature set (summarized in Tables S5 and S6) to eliminate inter-feature correlations (see Fig. S4):

  (i) Fractional ionicity, f: This expresses the relative ionicity/covalency of the bonds. Individual f values are computed from the experimental electronegativities of each binary compound; for ternary compounds, f is computed from the fractional ionicities of the end-member binary compounds.

  (ii) Charge per valence electron, \({\bar{\sigma }}\)n: This represents the fraction of valence electrons contributing to the ionic bonding and, consequently, indicates the fraction of electrons contributing to metal–oxygen hybridization. The nominal charge of the A- or B-site ion (\({\sigma }_{A}\) or \({\sigma }_{B}\)) is multiplied by the fractional ionicity of the corresponding binary oxide (e.g., \({f}_{AO}\)) and divided by the number of A- or B-site valence electrons (nA or nB). For ternary oxides, we employ a weighted average of \({\bar{\sigma }}\)n based on the numbers of A- and B-site ions.

  (iii) Valence electrons per atomic number, \(\overline{n }\)Z: This captures electron screening effects and the impact of core or semi-core electrons on the valence electrons. \(\overline{n }\)Z is essentially a weighted average of the number of valence electrons (nA and nB) normalized by the corresponding atomic numbers. This quantity is mathematically transformed to create Gaussian-like distributions that are compatible with the data-analytics techniques.

  (iv) Pauling electrostatic strength of the metal–oxygen bond, \(\overline{S }\): This is the ionic bond strength. To determine the valence electrons contributing to the ionic bonds, the number of A- or B-site valence electrons (nA or nB) is multiplied by the fractional ionicity and divided by the coordination number of the A- or B-site cation (\({CN}_{A}\) or \({CN}_{B}\)). A weighted average is used for ternary oxides.

  (v) Mass density, \({\rho }^{M}\): This is the mass-to-volume ratio, which is roughly proportional to the average electron density. It is important for highly localized valence states, such as d and f electrons, as well as for semi-core states. The mass is calculated by multiplying the atomic weights by the numbers of A-site, B-site, and oxygen atoms in the stoichiometric compound (\({N}_{A}\), \({N}_{B}\), \({N}_{O}\)). The volume is calculated similarly, by multiplying the atomic volumes (\({V}_{A}\), \({V}_{B}\), \({V}_{O}\)) by the numbers of A-site, B-site, and oxygen atoms.

  (vi) Oxygen fractional occupation, \({\overline{N} }_{O}\): This is the ratio of anions to total ions, calculated by dividing the number of oxygen atoms (\({N}_{O}\)) by the total number of ions in the stoichiometric compound.

  (vii) Element-specific DFT ionization errors, \(\overline{\delta IE }\): These express the difference between the DFT-computed and experimental first ionization energies of individual elements. This is the only feature that depends on the choice of XC and pseudopotential. A weighted average is used for ternary oxides.
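
A compact sketch of how features (i)–(vi) might be assembled for an oxide is given below. Several details are assumptions not specified in the text: the fractional ionicity uses Pauling's form \(f = 1 - \exp[-(\chi_{\mathrm{O}} - \chi_{M})^2/4]\), averages are weighted by the number of A- and B-site ions, \(\overline{n}\)Z is taken as the weighted mean of \(n/Z\) without the Gaussianizing transform, and \(\overline{\delta IE}\) is omitted because it requires XC-specific ionization energies. The class and function names, and the ionic volumes in the example, are hypothetical.

```python
import math
from dataclasses import dataclass

CHI_O = 3.44      # Pauling electronegativity of oxygen
MASS_O = 15.999   # atomic weight of oxygen (g/mol)


@dataclass
class Cation:
    """Per-site inputs for one cation species (hypothetical container)."""
    chi: float     # Pauling electronegativity
    n_val: int     # number of valence electrons
    Z: int         # atomic number
    charge: float  # nominal oxidation state
    CN: int        # coordination number
    mass: float    # atomic weight (g/mol)
    volume: float  # ionic volume (any consistent unit)
    count: int     # number of these cations per formula unit


def pauling_ionicity(chi_cation, chi_anion=CHI_O):
    """Assumed Pauling form of the fractional ionicity of an M-O bond."""
    return 1.0 - math.exp(-((chi_anion - chi_cation) ** 2) / 4.0)


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def oxide_features(cations, n_oxygen, v_oxygen):
    """Features (i)-(vi) for an oxide with the given cations and oxygen count."""
    w = [c.count for c in cations]
    f = [pauling_ionicity(c.chi) for c in cations]
    total_mass = sum(c.count * c.mass for c in cations) + n_oxygen * MASS_O
    total_volume = sum(c.count * c.volume for c in cations) + n_oxygen * v_oxygen
    return {
        "f": weighted_mean(f, w),
        "sigma_n": weighted_mean([c.charge * fi / c.n_val for c, fi in zip(cations, f)], w),
        "n_Z": weighted_mean([c.n_val / c.Z for c in cations], w),
        "S": weighted_mean([c.n_val * fi / c.CN for c, fi in zip(cations, f)], w),
        "rho_M": total_mass / total_volume,
        "N_O": n_oxygen / (sum(w) + n_oxygen),
    }


# SrTiO3-like example: electronegativities, Z, and atomic weights are standard values;
# the ionic volumes are placeholders, not values used in the study.
sr = Cation(chi=0.95, n_val=2, Z=38, charge=2, CN=12, mass=87.62, volume=11.0, count=1)
ti = Cation(chi=1.54, n_val=4, Z=22, charge=4, CN=6, mass=47.867, volume=2.0, count=1)
feats = oxide_features([sr, ti], n_oxygen=3, v_oxygen=7.0)
print(feats)
```

In this example the nominal charges equal the valence-electron counts, so \({\bar{\sigma }}\)n reduces to f; the two features separate once the oxidation state differs from the number of valence electrons.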

To predict \(\delta \overline{a }\) for each XC, we trained random forest regression models on the above feature set using the NumPy, SciPy, and Scikit-learn libraries70,71. The raw data prior to preparation for the machine learning models, the feature sets used to predict the errors for all functionals, and the machine learning codes are provided in the Supporting Information. The leave-one-out (LOO) cross-validation method was employed with a linear bias correction. The error and standard deviation of the LOO cross-validation are provided in Table S7. The MAEs were 0.191, 0.109, 0.115, and 0.139 for LDA, PBE, PBEsol, and vdW-DF-C09, respectively. The predicted versus true errors are plotted for each XC in Fig. S3. These results demonstrate the good accuracy of our approach for all XCs. In addition, following an approach similar to Δ-machine learning72, we calculated the random forest corrected (RF-C) average lattice parameters and compared them with the experimental average lattice parameters, as shown in Fig. 4. Figure S4 summarizes the comparison between the RF-C DFT errors and the uncorrected DFT errors in greater detail. Although the medians of all the RF-C errors are similar, PBEsol shows the smallest variation relative to the experimental values. Pernot et al. found a similar result after applying a linear calibration: LDA, PBE, and PBEsol gave similar mean errors but differed in the widths of their error distributions, with LDA having a wider distribution than PBE and PBEsol the narrowest63. Since a linear calibration captures the systematic error, the similarity between their results and ours suggests that our ML model was able to predict the systematic error from material-specific features.
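
The LOO protocol and the RF-C correction can be sketched with scikit-learn as follows. The data are synthetic placeholders for the 110-sample, seven-feature table; the linear bias correction mentioned above is not reproduced because its form is not specified; and the correction assumes \(\delta\overline{a} = 100(\overline{a}_{\mathrm{DFT}} - \overline{a}_{\mathrm{exp}})/\overline{a}_{\mathrm{exp}}\), so that \(\overline{a}_{\mathrm{RF-C}} = \overline{a}_{\mathrm{DFT}}/(1 + \delta\overline{a}_{\mathrm{pred}}/100)\). The function names and hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneOut, cross_val_predict


def loo_predictions(X, y, n_estimators=100, seed=0):
    """Leave-one-out predictions: each sample is predicted by a forest trained on all others."""
    model = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
    return cross_val_predict(model, X, y, cv=LeaveOneOut())


def rf_corrected_lattice(a_dft, delta_a_pred):
    """Invert delta = 100 * (a_dft - a_exp) / a_exp to estimate a_exp from a predicted delta."""
    return np.asarray(a_dft, float) / (1.0 + np.asarray(delta_a_pred, float) / 100.0)


# Synthetic stand-in for the feature table (not the study's data)
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(40, 7))
y = 2.0 * X[:, 0] - 1.0 + rng.normal(0.0, 0.05, size=40)  # toy "percent error" target

y_loo = loo_predictions(X, y)
mae = np.mean(np.abs(y_loo - y))
baseline = np.mean(np.abs(y - y.mean()))
print(f"LOO MAE = {mae:.3f} (predict-the-mean baseline: {baseline:.3f})")

a_corr = rf_corrected_lattice(4.04, 1.0)  # a DFT value 1% too large maps back to 4.00
```

LOO is a natural choice for a dataset of roughly a hundred samples, since every oxide is used for validation exactly once while each model still trains on nearly all the data.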

The key features, i.e., the input parameters used in the RF predictions, were further analyzed with the tree interpreter algorithm to identify the most important variables for predicting \(\delta \overline{a }\). This algorithm calculates the contribution of each feature by fitting a linear equation to each sample. The means and standard deviations of the absolute linear coefficients are given in Table S8. Because a different equation is fitted to each sample, the standard deviations of the linear coefficients, and hence of the contributions, are large. The contributions summarized below are based on the average values; because they are averages of linear coefficients fitted to each example separately, we are confident that they would not change significantly for different datasets.

The absolute values of the linear coefficients were normalized so that they sum to one, and the normalized coefficients are shown as a radar plot in Fig. 5. If all features were equally important, every vertex would have a value of \(1/7\). Deviations from this value allow us to assess the contributions of specific features to XC errors, thus providing insight into the factors that determine the success or failure of each XC.
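
The normalization behind Fig. 5 reduces to a few array operations. The sketch below assumes a matrix of per-sample linear coefficients (one row per oxide, one column per feature), such as could be derived from the per-sample contributions returned by the `treeinterpreter` package; how those coefficients were obtained in this study is not reproduced, and the random numbers are placeholders. The function names are hypothetical.

```python
import numpy as np

FEATURES = ["f", "sigma_n", "n_Z", "S", "rho_M", "N_O", "delta_IE"]


def normalized_importance(coeffs):
    """Mean |coefficient| per feature over samples, rescaled so the values sum to one."""
    mean_abs = np.mean(np.abs(np.asarray(coeffs, float)), axis=0)
    return mean_abs / mean_abs.sum()


def deviation_from_equal(weights):
    """Positive: the feature matters more than under equal importance (1/n_features)."""
    return weights - 1.0 / weights.size


# Hypothetical per-sample coefficients (n_samples x 7), random values for illustration only
rng = np.random.default_rng(1)
scales = np.array([2.0, 2.0, 1.0, 1.0, 1.0, 0.5, 1.0])
coeffs = rng.normal(0.0, 1.0, size=(110, len(FEATURES))) * scales
w = normalized_importance(coeffs)
for name, value, dev in zip(FEATURES, w, deviation_from_equal(w)):
    print(f"{name:>8s}: {value:.3f} ({dev:+.3f} relative to 1/7)")
```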

Figure 5

Normalized mean absolute linear dependency coefficients for different XCs obtained from regression analyses. A vertex value of 1/7 is indicative of equal contributions.

The contribution of each feature can be summarized as follows:

  • f: examines the importance of the relative bond ionicity/covalency. In Fig. 5, we see that for both LDA and vdW-DF-C09, the value for f exceeds the average value of 0.14. PBE, on the other hand, is roughly 0.14, while PBEsol exhibits a reduction to ~0.10.

  • \({\bar{\sigma }}\)n: Similar to f, this feature probes the role that ionic bonding and metal–oxygen hybridization play in determining \(\delta \overline{a }\). In Fig. 5, we observe that it contributes similarly to \(\delta \overline{a }\) for all XCs, with coefficients higher than the average value of 1/7.

  • \(\overline{n }\)Z: relates to electron screening effects. For LDA and PBE, we observe an average contribution of ~0.14 (Fig. 5), while vdW-DF-C09 and PBEsol show a stronger than average dependence on this feature.

  • \(\overline{S }\): corresponds to the ionic strength per bond. Here, the predicted errors for PBEsol and vdW-DF-C09 have a roughly average dependence on this feature, while the PBE errors show enhanced correlation and the LDA errors reduced correlation.

  • \({\rho }^{M}\): is roughly proportional to the electron density and serves as a measure of the localization of valence states (i.e., d and f electrons) and semi-core states. The relative contribution of this feature to \(\delta \overline{a}\) is highest for LDA and vdW-DF-C09, but of limited significance for PBE and PBEsol.

  • \(\overline{N}_{O}\): expresses the fraction of ions that are oxygen anions. Although this feature generally has the least significance for predicting \(\delta \overline{a}\) for all XCs, we observe that removing it from the regression significantly degrades the quality of the predictions.

  • \(\overline{\delta IE}\): is the difference between the DFT-computed and experimental first ionization energies. Unlike the other variables, it depends only on the chemical identity of the elements in a structure and is thus an indicator of the quality of the pseudopotential as well as of the XC. For LDA, it is the least significant feature for \(\delta \overline{a }\) predictions. Interestingly, even though the same pseudopotential was employed for PBE, PBEsol and vdW-DF-C09, its relative significance varies: it is least important for PBE and most significant for PBEsol. This perhaps reflects the different philosophies behind XC development.

An analysis of the above data suggests that the largest source of error is the inability of the functionals to capture and represent inhomogeneities, owing to limitations in the exchange hole and the short-range expansion of the reduced gradient. It is widely known that LDA exchange holes cannot capture inhomogeneities, which results in systematic overbinding73. In our analysis, LDA's tendency to overbind is perhaps most clearly reflected in the significance of f, \({\bar{\sigma }}\)n, \(\overline{n }\)Z and \({\rho }^{M}\) in predicting \(\delta \overline{a }\). These features are all related to bonding and hybridization, possibly emphasizing the direct relationship between improvements in these quantities and the reduction of XC errors.

Interestingly, the gradient-corrected functionals all show a significantly reduced dependence on \({\rho }^{M}\), which is largely an indicator of the localization or homogeneity of the electrons. For PBE, the predicted \(\delta \overline{a }\) depends roughly equally on all features, perhaps because PBE's poorer performance requires a broader set of features to predict its large distribution of errors. On the other hand, vdW-DF-C09 and PBEsol have fewer significant features. PBEsol, for instance, has above-average dependence on \(\overline{\delta IE}\), \(\overline{\sigma }\)n, and \(\overline{n}\)Z, whereas vdW-DF-C09 depends significantly on \(\overline{\delta IE}\) and, like LDA, on f, \(\overline{\sigma }\)n, \(\overline{n}\)Z and \(\rho^{M}\).

Surprisingly, all XCs have a similarly strong dependence on \({\bar{\sigma }}\)n, which is an indicator of hybridization. To gain further insight into the relationship between \({\bar{\sigma }}\)n and hybridization, the hybridization energies for \(sp\), \(sp^{2}\), \(sp^{3}\), \(sp^{3}d\), and \(sp^{3}d^{2}\) hybrids were calculated as linear sums of orbital energies following Harrison's prescription42, using orbital energies from the literature43,44. In all cases, the calculated energies depend linearly on \(\overline{\sigma }\)n, as plotted in Fig. S5, which is typical of hybridization. The orbital and hybridization energies are correlated with the self-interaction energy contributions, which rely on the cancellation of errors. The XC functionals compared here are continuous, which makes it impossible to cancel this energy entirely. A recent study suggested that the error in the orbital energies may not be directly related to self-interaction errors, but is instead due to the overly repulsive exchange response potentials of the LDA, which are not fully corrected in the GGA formulation74. In either case, the fact that the dependence on \(\overline{\sigma }\)n is similar for all XCs is consistent with typical GGAs and LDAs being incapable of eliminating effects due to poor descriptions of orbital hybridization75.
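
In Harrison's approach the energy of an \(sp^{3}\) hybrid is the equal-weight average of the constituent orbital energies, \(\varepsilon_{sp^3} = (\varepsilon_s + 3\varepsilon_p)/4\). A minimal sketch, assuming the same equal-weight rule for the other hybrids including the d-containing ones, and using hypothetical orbital energies, is:

```python
def hybrid_energy(eps_s, eps_p, eps_d=0.0, n_p=0, n_d=0):
    """Equal-weight average of one s, n_p p, and n_d d orbital energies (eV)."""
    n_orbitals = 1 + n_p + n_d
    return (eps_s + n_p * eps_p + n_d * eps_d) / n_orbitals


# (number of p orbitals, number of d orbitals) mixed into each hybrid
HYBRIDS = {"sp": (1, 0), "sp2": (2, 0), "sp3": (3, 0), "sp3d": (3, 1), "sp3d2": (3, 2)}


def hybrid_energies(eps_s, eps_p, eps_d):
    """Hybrid energies for all hybrid types listed in HYBRIDS."""
    return {name: hybrid_energy(eps_s, eps_p, eps_d, n_p, n_d)
            for name, (n_p, n_d) in HYBRIDS.items()}


# Hypothetical orbital energies in eV (not literature values for any specific element)
energies = hybrid_energies(eps_s=-10.0, eps_p=-6.0, eps_d=-4.0)
print(energies)
```

Because each hybrid energy is linear in the orbital energies, any quantity that is itself linear in those energies will show the linear trend reported in Fig. S5.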

Finally, to demonstrate the power of this approach, the trained regression models were used to predict \(\delta \overline{a}\) for a large set of possible ABO3 perovskites for the four XCs (Fig. 6). As expected, the predicted \(\delta \overline{a}\) indicate systematic under- and overestimation for LDA and PBE, respectively. Again, PBEsol and vdW-DF-C09 have errors centered around 0, with vdW-DF-C09 having a slightly broader distribution of errors. The distributions of the calculated errors are provided in Supporting Information Fig. S6.

Figure 6

Heat maps showing predicted \(\delta \overline{a }\) for selected perovskites. Black masked regions correspond to non-existent A- and B-site combinations, and light gray regions show combinations where the absolute error is larger than 1%.

To validate these predictions, we extracted the available lattice constant errors for PBE calculations of these oxides from the Materials Project high-throughput database, relative to experimental data in the Inorganic Crystal Structure Database (ICSD) (see Fig. 7). (N.B. this database employs VASP with projector augmented-wave potentials, whereas our training set was computed with Quantum Espresso (QE) and GBRV pseudopotentials.) To gauge the difference between VASP and QE, we include data points from this study and our recent comparative paper30. In general, we find excellent agreement between our machine learning predictions and the VASP and QE calculations, particularly for non-magnetic materials. The largest deviations occur in three categories:

  1) Polar structures, e.g., ferroelectric PbTiO3, AgNbO3, and antiferroelectric PbZrO3. We note that structures with large off-center displacements are not well represented in this database. N.B., with the exception of PbTiO3, we typically find better agreement between our QE calculations and the predictions, likely reflecting differences between QE and VASP.

  2) Magnetic cations, particularly Mo and Mn. Our calculations were explicitly non-magnetic, and compounds with these cations were excluded from our training set. Nevertheless, magnetic ions such as Fe, Cr and Ni exhibit remarkably good agreement, possibly an indication of weak spin–lattice coupling within these materials (see Fig. S7).

  3) Cd-based compounds. Our training set contained only two Cd-containing compounds, neither of which was a perovskite, which suggests the need for additional training data (see Fig. S7).

Figure 7

Comparison of \(\delta \overline{a }\) computed from DFT vs. experimental values from ICSD. Solid black squares and empty green triangles are Materials Project data and our QE data, respectively, for non-magnetic materials (excluding Cd-based materials). Our data include those from this study and our previous work30. Gray regions indicate 1 standard deviation of the computed \(\delta \overline{a }\). Predictions for magnetic systems and Cd can be found in the SI.

Ultimately, these results strongly indicate that our approach can reliably predict the errors for a class of materials that is not necessarily included in the training set. In other words, it allows us to understand the bounds of predictions for a large dataset computed with a particular functional, which should allow us to construct much more accurate high-throughput databases. For example, for the exploration of ABO3 oxides, our results indicate that PBEsol and vdW-DF-C09 would be the best choices, at least for predicting lattice constants. We stress, however, that these may not be the optimal XC functionals for exploring other materials properties. For example, we have previously shown that for ABO3 ferroelectric oxides PBEsol tends to severely overestimate the soft-mode displacements that give rise to polarization; thus, although it gives excellent predictions of lattice constants, it would be a poor choice for polar materials.

Conclusion

In conclusion, the impact of XC choice on computed macroscopic properties was investigated for binary and ternary metal oxides. As expected, LDA and PBE lead to under- and overestimation of lattice parameters, respectively. The opposite trend is observed for bulk moduli, owing to the inverted dependence of total energy on volume. Surprisingly, vdW-DF-C09 performs comparably to PBEsol. One explanation for this similar behavior may lie in the philosophy behind the development of the C09 exchange, which was to reduce short-range exchange repulsion. To do this, the functional uses the gradient expansion approximation for the enhancement factor, \(F_x(s) = 1 + \mu s^2\), where \(s = |\nabla n|/(2k_F n)\) is the reduced gradient of the density \(n\), \(k_F = (3\pi^2 n)^{1/3}\) is the local Fermi wavevector, and \(\mu = 0.0864\)46; PBEsol likewise restores the gradient expansion for slowly varying densities36. Good agreement, within experimental uncertainties (see Table S3), is found between DFT-computed and experimental reaction energies regardless of the XC, most likely owing to systematic error cancellation. By employing machine learning focused on the prediction of errors, we explore the defining differences in accuracy between XCs. Most significantly, we find that the representation of electron density and hybridization may be the determining factor behind the accuracy of a functional. This suggests the need to apply more stringent criteria when designing XC functionals, emphasizing a return to the exact density as suggested by Medvedev et al.76. Ultimately, vdW-DF-C09 and PBEsol are indicated to be good general-purpose XCs for oxides, producing the highest accuracy for high-throughput screening, as illustrated by our predictions of \(\delta \overline{a }\) for ABO3 oxides.
While further work is needed to understand the deviations for other quantities, such as band gaps and elastic, polar or magnetic properties, these results present a meaningful approach, centered on predicting errors rather than the quantities themselves, that may help guide the choice of XC functional for producing high-fidelity datasets and define routes to functionals with improved performance.
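
For concreteness, the reduced gradient and the gradient-expansion enhancement factor discussed in the conclusion can be evaluated on a model density as sketched below. The exponential density is hypothetical, the value \(\mu = 0.0864\) is simply the one quoted in the text, and the form \(1 + \mu s^2\) describes only the small-\(s\) behavior relevant to this argument. The function names are hypothetical.

```python
import numpy as np


def fermi_wavevector(n):
    """Local Fermi wavevector k_F = (3 pi^2 n)^(1/3), atomic units."""
    return (3.0 * np.pi**2 * np.asarray(n, float)) ** (1.0 / 3.0)


def reduced_gradient(n, grad_n):
    """Dimensionless reduced gradient s = |grad n| / (2 k_F n)."""
    n = np.asarray(n, float)
    return np.abs(grad_n) / (2.0 * fermi_wavevector(n) * n)


def fx_gradient_expansion(s, mu=0.0864):
    """Small-s enhancement factor F_x(s) = 1 + mu s^2 (mu default taken from the text)."""
    return 1.0 + mu * np.asarray(s, float) ** 2


# Hypothetical 1D exponential density tail n(x) = n0 exp(-x / L), in bohr^-3
x = np.linspace(0.0, 5.0, 501)
n = 0.1 * np.exp(-x / 1.0)
grad_n = np.gradient(n, x)
s = reduced_gradient(n, grad_n)
fx = fx_gradient_expansion(s)
print(f"s ranges from {s.min():.3f} to {s.max():.3f}; F_x up to {fx.max():.3f}")
```

Even though \(|\nabla n|/n\) is constant for an exponential tail, \(s\) grows as the density thins because \(k_F\) shrinks, which is why low-density regions probe the large-\(s\) behavior of a functional.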