## Introduction

Over the last decades, computational materials science has evolved as a paradigm of materials science, complementing theory and experiment with computer experiments2. In particular, density-functional theory (DFT) has become the workhorse for a plenitude of computational investigations, representing a good compromise between precision and computational expense, thus allowing for the investigation of realistic systems with affordable numerical effort3. The widespread application of electronic-structure theory was especially fueled by the development and distribution of many user-friendly and computationally efficient simulation packages (termed codes in the following) based on DFT4. Essentially all these codes rely on the same fundamental physical concept and solve the Kohn-Sham (KS) equations5 of DFT self-consistently by expanding the Kohn-Sham states in a finite basis set. Beyond the choice of the basis set, however, the codes employ different approximations and various numerical techniques and algorithms. Inherently, this raises the question of how consistent, and hence how comparable, results from different codes are.

In 2016, a synergistic community effort led by K. Lejaeghere and S. Cottenier1 shed light on these issues, essentially concluding that “most recent codes and methods converge toward a single value”. This conclusion, however, concerns only the relatively robust case that was investigated, namely computing the equation of state for elemental solids1,6 using the PBE exchange-correlation (xc) functional. In this context, it has to be noted that such a close agreement across codes and methods was only achieved by using safe numerical settings that guaranteed the highest precision and that are rarely used in routine DFT calculations. In practice, such settings are often not even necessary as long as only data obtained by the same methodology, code, and settings are used, because then one benefits from error cancellation, and trends are described reliably.

Over the last decade, the increased amount of available computational power as well as the maturity of existing first-principles materials-science codes made it possible to perform computational studies in a high-throughput fashion by scanning the compositional and structural space in an almost automated manner7,8,9,10. In such studies, the numerical settings have to be decided a priori so that the trends of the properties of interest are captured. Often, this is achieved via educated guesses, sometimes via (semi-)automatic algorithms11,12. Since the properties of interest differ between investigations, the numerical settings can also vary quite significantly13,14,15. This limits the possibility of reusing data beyond its original scope and purpose. Also, comparing data from different sources, created using different methodologies and settings or focusing on different properties, is not risk-free, in spite of the fact that the data may be publicly available in databases and repositories, for instance in the NOMAD Repository16, AFLOW17,18, Materials Project19, OQMD9, Materials Cloud20, the Computational Materials Repository21, and the like. In a nutshell, using data from different sources that are based on different numerical settings implies potentially uncontrollable uncertainties. This is a pressing and severe issue, given that the sheer number of calculations existing to date prevents a human, case-by-case check of the data.

In this work, we describe a first step for overcoming this unsatisfactory situation and show how errors for data stemming from DFT computations can be estimated. We emphasize that we do not investigate errors that originate from the use of approximate physical equations, e.g., the use of a particular xc-functional. We rather focus on numerical aspects, i.e., on errors arising from the fact that the same equation is solved in different approaches by employing different numerical approximations and techniques. Note that different treatments of exchange and correlation can, however, require different numerical settings for convergence, as discussed in sec. “Discussion”. To this end, we systematically investigate the numerical errors that arise in total energies and energy differences when three different methodologies are applied, using representative DFT codes as examples. These are the linearized augmented plane-waves plus local orbitals ansatz, as implemented in the all-electron, full-potential code exciting22, the linear combination of numeric atom-centered orbitals (NAOs) method as implemented in the all-electron, full-potential code FHI-aims23,24, as well as the projector-augmented wave (PAW) formalism25, as implemented in the package GPAW26,27. All electrons are accounted for on the same footing in the self-consistency cycle in the first two methods. Conversely, core states are frozen in the PAW approach and valence states are mapped onto smooth pseudo-valence states using a linear transformation involving atom-centered partial wave expansions25. These pseudo-states are smooth and represented in a plane-wave expansion (throughout this work, we use the PAW potentials recommended by the GPAW developers). In the following, we evaluate and analyze the numerical errors arising in these different formalisms at various levels of precision (see sec. 
“Methods”) and then suggest how to estimate the errors associated with the basis-set incompleteness and, consequently, get access to the complete-basis-set limit for total energies and energy differences.

## Results

### Overview

To cover the chemical space in the benchmark calculations, a set of representative materials is chosen. This includes the 71 elemental solids that have been studied in the aforementioned work by Lejaeghere and coworkers1 and also includes binary materials (one for each element with atomic number ≤71; noble gases excluded). The atomic structures and detailed geometries were taken from the experimental Springer Materials database (https://materials.springer.com) by selecting the energetically most stable binary structure for each particular element. We use the T = 0 K experimental geometries. Zero-point vibrational effects are included in these experimental values and are not corrected for in the calculations. This is fine as we only need a consistent treatment for all calculations and materials. On top of that, 10 ternary materials were chosen from the NOMAD Repository (https://repository.nomad-coe.eu). A detailed list including space groups, stoichiometric formulae, structures, and references to the original scientific publications is given in the Supplementary Discussion section.

In this section, we focus on the convergence and related errors of two fundamental properties, i.e., the absolute total energies Etot and relative energies Erel. The latter were computed as the total-energy difference between the original unit cell and an expanded cell, with 5% larger volume and scaled internal atomic positions. While Etot includes the energetic contributions from both core and valence electrons, Erel is less sensitive to contributions from the core and semi-core electrons due to benign error cancellation. Accordingly, Erel is a good metric to quantify the typically needed numerical precision for energy differences as well as potential-energy surfaces. It also sheds light on the errors that would occur in properties derived from the total energy, like elastic constants, vibrational properties, and the like. In our evaluations, the error for one material i in a data set xi is always defined with respect to the “fully converged” reference value ci, as indicated by the notation Δxi = xi − ci, e.g., ΔEtot,i for the total energy error of material i. To statistically analyze the errors across the full set of materials with N entries, we report the mean absolute error

$$\langle {{\Delta }}x\rangle =\frac{1}{N}\mathop{\sum }\limits_{i}^{N}| {{\Delta }}{x}_{i}|$$
(1)

and the maximum error

$$\max ({{\Delta }}x)=\mathop{\max }\limits_{i}\left|{{\Delta }}{x}_{i}\right|.$$
(2)
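As a minimal sketch of Eqs. (1) and (2), the two statistics can be evaluated as follows. The function name `error_statistics` and the numerical values are purely illustrative assumptions and are not taken from the data set discussed in this work.

```python
from typing import Sequence, Tuple


def error_statistics(values: Sequence[float],
                     references: Sequence[float]) -> Tuple[float, float]:
    """Return (mean absolute error, maximum absolute error), cf. Eqs. (1) and (2).

    values     -- x_i, results obtained with the settings under test
    references -- c_i, the "fully converged" reference results
    """
    if len(values) != len(references) or len(values) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    abs_errors = [abs(x - c) for x, c in zip(values, references)]
    return sum(abs_errors) / len(abs_errors), max(abs_errors)


# Hypothetical total energies in eV/atom for three materials (illustration only).
e_tot = [-3.741, -5.402, -1.118]
e_ref = [-3.745, -5.400, -1.130]
mean_abs_error, max_error = error_statistics(e_tot, e_ref)
print(f"<dE> = {mean_abs_error:.4f} eV/atom, max(dE) = {max_error:.4f} eV/atom")
```

Taking the absolute value before averaging prevents errors of opposite sign from cancelling across materials.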

Here, we limit the discussion to data computed with the PBE xc-functional. The numerical errors occurring with a different type of generalized gradient approximation (GGA) or the local-density approximation (LDA) show the same qualitative behavior and only minor quantitative differences (see Supplementary Discussion). However, quantitative differences occur for beyond-DFT methods, as discussed in sec. “Discussion”.

In the following, we first summarize the trends observed for the elemental solids (sec. “Elemental Solids”). When discussing errors related to the basis set, we always compare to calculations that are “fully converged” with respect to k-points. Likewise, errors arising from an insufficient k-point density are discussed for “fully converged” basis sets, since the errors arising from either source can be considered independent of each other. In all cases, a simple summation approach with a Fermi-function smearing of 100 meV is used for the BZ integration. The observed trends allow us to propose a simple mathematical model to estimate the error associated with the basis set for any compound and any of the investigated codes, as exemplified in sec. “Predicting errors for binary and ternary systems” for binary and ternary materials.

### Elemental solids

First, we address the convergence with respect to the size viz. degree of completeness of the basis set. The results are shown in Fig. 1. In the case of exciting, the atom-specific settings, which are kept fixed in all calculations, correspond to a sizable number of local orbitals that ensure well-converged ground-state calculations and transferability between different compounds. The remaining (and most widely used) parameter to judge the quality of the plane-wave basis is $$R{K}_{\max}$$, which is the product of the radius of the smallest atomic sphere and the plane-wave cutoff (for details, see ref. 22). Choosing the optimal value $$R{K}_{\mathrm{max }}^{\,{{\rm{opt}}}\,}$$ such that it corresponds to a convergence of the total energy of about 0.1 meV/atom, we use the squared fraction $${(R{K}_{\mathrm{max}}/R{K}_{\mathrm{max}}^{{{\mathrm{opt}}}})}^{2}$$ to label the basis-set quality, see Supplementary Methods for details. For FHI-aims, which uses tabulated, chemical-species-specific sets of NAOs, the number of NAOs per electron is used as metric. Note that these NAOs come in tiers that group different angular momenta23. The average number of basis functions per electron present in these tiers and in the species-specific suggested settings (“light”, “tight”) provided by the FHI-aims developers are shown as black and gray vertical lines in the figures. Since the “translation” from the number of NAOs into this metric requires binning (not all elemental solids appear for all values of the x-axis), the reported errors do not decrease monotonically. It is important to note that tier 4 sets are not provided for all elements, but only for those species for which such an additional set of basis functions improved the description of the electronic structure during the basis-set construction procedure23. Accordingly, only these problematic elements determine the errors shown for 9 and more NAOs per electron. 
The more benign elements, which are already fully converged in this limit, no longer enter the shown average error, since the developer-suggested settings do not allow for more than 8 NAOs per electron for these species. In the plane-wave code GPAW, the basis set is characterized by the cutoff energy Ecut, i.e., all plane waves with a kinetic energy smaller than Ecut are included in the basis set. Note that this affects the convergence of relative energies, since, for the same value of Ecut, cells with different volumes contain different numbers of plane waves.
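The volume dependence of the plane-wave basis size can be illustrated with the standard continuum estimate, in which the number of reciprocal-lattice vectors inside a sphere of radius k_cut is roughly V k_cut³/(6π²). The sketch below is not how GPAW counts its basis functions (the actual count is discrete and k-point dependent); the function name, the cell volume, and the cutoff are illustrative assumptions.

```python
import math

# hbar^2 / (2 m_e) in eV * Angstrom^2 (free-electron kinetic-energy prefactor).
HBAR2_OVER_2M = 3.80998


def estimate_num_plane_waves(volume: float, ecut: float) -> float:
    """Continuum estimate of the number of plane waves below a cutoff.

    volume -- unit-cell volume in Angstrom^3
    ecut   -- kinetic-energy cutoff in eV
    """
    k_cut = math.sqrt(ecut / HBAR2_OVER_2M)  # cutoff wave vector in 1/Angstrom
    # Sphere volume (4/3) pi k_cut^3 divided by the volume per G vector,
    # (2 pi)^3 / V, which simplifies to V k_cut^3 / (6 pi^2).
    return volume * k_cut ** 3 / (6.0 * math.pi ** 2)


v0 = 40.0     # hypothetical cell volume in Angstrom^3
ecut = 500.0  # hypothetical cutoff in eV
n_original = estimate_num_plane_waves(v0, ecut)
n_expanded = estimate_num_plane_waves(1.05 * v0, ecut)
ratio = n_expanded / n_original
print(f"plane waves: {n_original:.0f} -> {n_expanded:.0f} (ratio {ratio:.3f})")
```

Within this estimate, the basis size scales linearly with the cell volume at fixed Ecut, so the 5% expanded cell used for the relative energies is described by about 5% more plane waves than the original cell.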

As evident from Fig. 1, the errors in the total energy exhibit a systematic convergence with increasing basis-set size for all three codes. Generally, the maximum error in the total energy can be roughly one order of magnitude larger than the average error. This is due to the fact that numerical errors are element specific, i.e., some chemical species require a large basis set to be described precisely. The difference between average and maximum error is more pronounced in the results for FHI-aims and GPAW (Fig. 1), which is a consequence of the metric chosen to quantify the basis-set completeness, i.e., the x-axis in this figure. While FHI-aims and GPAW use an absolute metric, exciting uses a relative one, i.e., fractions of species-specific values $$R{K}_{\mathrm{max}}^{\mathrm{opt}}$$. In this case, the fact that the developers provide well-balanced, species-specific values for $$R{K}_{\mathrm{max}}^{\mathrm{opt}}$$ ensures that a similar precision is achieved for all species at a specific fraction of $${(R{K}_{\mathrm{max}}/R{K}_{\mathrm{max}}^{{{\mathrm{opt}}}})}^{2}$$. In turn, this leads to a more consistent precision across material space and thus to smaller maximum errors at a given value of $${(R{K}_{\mathrm{max}}/R{K}_{\mathrm{max}}^{{{\mathrm{opt}}}})}^{2}$$. For all three codes, the average and maximum errors in total energies are roughly one to two orders of magnitude larger than the ones for relative energies. Again, this finding reflects that the main source of imprecision in the total energy is species specific and leads to a beneficial error cancellation in energy differences.

Finally, it is important to note that the numerical errors vary considerably for different species, types of bonding, and across methodologies, as detailed in the Supplementary Material. Naturally, plane waves are more suitable for quasi-free-electron systems like aluminum, whereas NAOs perform better for inert elements like rare gases or localized covalent bonds. These observations, dating back to the early days of electronic-structure theory and predating modern DFT implementations, are among the historical reasons28 that actually led to the development of the different methodologies discussed in this paper. Accordingly, the above described trends for the numerical errors, their influence on computed observables, and their numerical as well as physical origin have also been discussed29,30,31 and reviewed32 in the literature before. Most importantly, the finding that errors are largely species-specific can be rationalized by the fact that changes in the kinetic energy of core electrons, despite being orders of magnitude larger than total-energy changes, vanish to first order in charge-density differences33. For instance, this aspect is directly exploited in the VASP code34,35 for an automatic convergence correction36. Due to this automatic convergence correction, the total energy output of VASP does not necessarily decrease monotonically when Ecut is increased, as is the case in most common PAW implementations. Accordingly, an analysis of this code-specific aspect goes beyond the scope of this paper. Nonetheless, a complete, consistent VASP data set covering the materials discussed in this work is available via the NOMAD repository at https://doi.org/10.17172/NOMAD/2020.07.29-1. In sec. “Predicting errors for binary and ternary systems”, we will exploit the species-specific nature of the errors for the three codes exciting, FHI-aims, and GPAW to predict errors a priori for multicomponent systems using information from the elemental solids.

Let us now inspect the errors in total energies that arise due to the finite reciprocal-space grid. Figure 2 shows results for k-point densities of 2 and 4 Å. Data obtained with a k-point density of 8 Å serves as “fully converged” reference. The rather large observed errors result from the fact that many elemental solids are metallic with a more involved shape of the Fermi-surface, so that a substantial number of k-points is required to reach convergence. Quite consistently, all codes yield average errors of the same order of magnitude if the same k-point densities are used, despite the fact that the three codes handle the numerical details of the reciprocal-space integration differently. This is reflected in the maximum errors, which vary slightly more between codes than the average ones. Again, we observe that the maximum error is approximately one order of magnitude larger than the average error.

### Predicting errors for binary and ternary systems

Following our discussion of the errors in total and relative energies of elemental solids stemming from the basis-set incompleteness, we propose to estimate the corresponding errors for multicomponent systems by linearly combining the respective errors observed for the constituents in the elemental-solids calculations at the same settings. This follows the above discussed observation that there are chemical species that require larger basis sets to reach convergence. This is in fact independent of the employed code. For the error in the total energy we simply assume:

$${\overline{{{\Delta }}E}}_{{{\mathrm{tot}}}}=\frac{1}{N}\mathop{\sum}\limits_{I}{N}_{I}{{\Delta }}{E}_{{{\mathrm{tot}}},I}$$
(3)

where NI is the number of atoms of species I and N the total number of atoms in the unit cell, so that the estimate is again an error per atom. For $${\overline{{{\Delta }}E}}_{{{\mathrm{rel}}}}$$ we proceed analogously. Note that in the case of O, F, and N, the elemental solid is a molecular crystal that is not a good representative for the bonding in the various oxides, fluorides, and nitrides present in the binaries data set. For this reason, we determine the values of ΔE for these particular elements from the binaries MgO, NaF, and BN by inverting Eq. (3).
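The estimate of Eq. (3) and its inversion for species such as O, F, and N can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in this work: N is taken to be the total number of atoms in the cell, the function names are hypothetical, and all numerical values are invented for the example.

```python
from typing import Dict


def predict_error(composition: Dict[str, int],
                  elemental_errors: Dict[str, float]) -> float:
    """Composition-weighted average of per-atom elemental errors, cf. Eq. (3)."""
    n_atoms = sum(composition.values())  # assumption: N = total atoms in the cell
    weighted = sum(n * elemental_errors[s] for s, n in composition.items())
    return weighted / n_atoms


def invert_for_species(composition: Dict[str, int], compound_error: float,
                       known_errors: Dict[str, float], unknown: str) -> float:
    """Solve Eq. (3) for the per-atom error of one species in a compound."""
    n_atoms = sum(composition.values())
    known_part = sum(n * known_errors[s]
                     for s, n in composition.items() if s != unknown)
    return (n_atoms * compound_error - known_part) / composition[unknown]


# Hypothetical per-atom errors in meV/atom at one fixed basis-set setting.
errors = {"Mg": 2.0, "Ti": 6.0}
error_mgo = 5.0  # hypothetical error observed for MgO at the same setting

# Oxygen: derive the effective elemental error from the binary MgO.
errors["O"] = invert_for_species({"Mg": 1, "O": 1}, error_mgo, errors, "O")

# Use the result to estimate the error of another oxide, here TiO2.
predicted_tio2 = predict_error({"Ti": 1, "O": 2}, errors)
print(f"dE(O) = {errors['O']:.2f} meV/atom, "
      f"predicted dE(TiO2) = {predicted_tio2:.2f} meV/atom")
```

All errors entering the sum must refer to the same code and the same numerical settings, since the estimate combines elemental errors obtained "at the same settings".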

To validate the ansatz of Eq. (3), we have computed the total and relative-energy errors for 63 binary solids using exactly the same strategy as for the elemental solids in sec. “Elemental solids”. In Fig. 3, we then compare the actual errors observed in the calculations for binary systems, for two basis-set sizes for each of the three codes, to the estimated errors obtained via Eq. (3). As shown in these plots, we generally obtain quite reliable total energy predictions for all three codes by this means. For the total energies (top panels), we observe better predictions when an “unbiased” and smooth metric is used to characterize the basis-set completeness. For instance, GPAW, which uses the atom-independent plane-wave cutoff Ecut, yields an almost perfect correlation between predicted and actual total energy errors. Conversely, more scattering is observed for FHI-aims, which uses an atom-specific, granular metric with different NAOs for each atom. Nonetheless, we find a clear correlation between the predicted, $${\overline{{{\Delta }}E}}_{{{\mathrm{tot}}}}$$, and the actual errors, ΔEtot, for all codes. In particular, this holds for absolute energy errors larger than 10 meV/atom. This demonstrates that the relatively intuitive relation formulated in Eq. (3) can serve as a reliable estimate for the error associated with a particular total-energy calculation.

For the relative-energy errors shown in the lower half of Fig. 3, we observe more scattering and a less neat correlation between predicted and actual errors. The reason for that is twofold: First, benign error cancellation reduces numerical errors in relative energies, since total energy differences are inspected. In other words, a large portion of the species-specific errors described by Eq. (3) cancel each other out when computing relative energies as a difference. For this exact reason, relative energies are generally less affected by numerical errors (see Fig. 1 and its discussion). Second, relative errors are—in contrast to total energies—non-variational, i.e., they do not necessarily decrease monotonically with basis-set size. The reason is that the errors associated with the two total energies entering the relative energies typically do not decrease at the exact same rate. Still, the relative-energy error estimates for all codes are reliable enough in the respective energy window of interest, hence allowing us to compare relative energies obtained from different codes with different settings.

The data shown and discussed for the binary materials suggest that Eq. (3) can be used to estimate the total energy errors for any multicomponent system. As an example, we demonstrate this in Fig. 4, in which the same comparison between predicted and actual total energy errors is made for ten ternary systems, which were selected from the large pool of compounds available in the NOMAD Repository16 so as to cover material and structural space. Also in this case, the same quantitative and qualitative behavior as discussed for Fig. 3 is observed. The relatively simple approach of Eq. (3) is able to correctly predict the numerical errors also in these ternary systems. This further substantiates that the described approach is not only applicable to the relatively simple binary systems discussed in Fig. 3, but also to more complex systems, such as the ones found in electronic-structure materials databases.

## Discussion

The focus of the formalism presented in sec. “Results” lies on the analysis of total and relative energies, since those are the most fundamental quantities produced in electronic-structure-theory calculations. However, such first-principles approaches also allow computing many other material properties, ranging from structural parameters, over thermodynamic expectation values, to electronic properties. Generally, these quantities will exhibit a different convergence behavior than the total and relative energies. In particular, this is the case for non-variational properties that do not depend monotonically on the basis-set size and k-grid density.

As an example, we discuss the numerical errors associated with the evaluation of the stress tensor σ. Its components are defined as37,38

$${\sigma }_{\lambda \mu }={\left.\frac{1}{V}\frac{\partial {E}_{{{{\rm{tot}}}}}}{\partial {\varepsilon }_{\lambda \mu }}\right|}_{\varepsilon = 0}\ ,$$
(4)

i.e., as the total energy derivatives with respect to symmetric strain deformations ελμ for the Cartesian axes λ, μ, normalized by the unit-cell volume V. Despite the fact that the stress is defined as a total energy derivative, it is well known39 that it is particularly sensitive to the value of Ecut chosen in plane-wave calculations. This is further demonstrated for the GPAW code in Fig. 5 using the trace of the stress tensor $$\,{{\mbox{tr}}}\,\left[{{{\boldsymbol{\sigma }}}}\right]$$, as computed for the experimental lattice constants and structures. Qualitatively, the average and maximum errors observed for the stress resemble the behavior observed for GPAW’s total energy convergence quite closely, as a comparison of Figs. 1 and 5 reveals. This is not surprising, given that stress and total energy are directly related via Eq. (4). However, obtaining meaningful values for the stress, i.e., values accurate enough to perform reliable structure relaxations, requires roughly 50% higher cutoff energies Ecut than needed to obtain reasonably converged total energies. Let us note that the contributions to the numerical error in the stress tensor stemming from the finite k-grid density are much smaller than those arising from Ecut, as found for the total energy before (see Fig. 2). With respect to the basis-set convergence, the observed trends suggest that the strategy devised in this work for total and relative energies might also be useful for estimating errors in first-order derivatives of the total energy, i.e., for forces and stresses, which only depend on an accurate description of occupied electronic states40. More data, especially for structures far from equilibrium, are needed to further investigate this hypothesis and to develop accurate error-estimate models for such quantities.
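To make the relation between Eq. (4) and the trace of the stress tensor concrete, the following sketch evaluates the trace by central finite differences of the total energy under an isotropic strain ελμ = ε δλμ, for which dE/dε = V tr[σ] at ε = 0. This is only an illustration: codes such as GPAW evaluate the stress analytically, and the quadratic energy-volume model, the function names, and all numbers are hypothetical.

```python
from typing import Callable


def trace_stress_fd(energy_of_volume: Callable[[float], float],
                    v0: float, eps: float = 1e-4) -> float:
    """Finite-difference estimate of tr[sigma] at volume v0, cf. Eq. (4).

    An isotropic strain eps scales all lattice vectors by (1 + eps), so the
    cell volume becomes v0 * (1 + eps)**3.
    """
    e_plus = energy_of_volume(v0 * (1.0 + eps) ** 3)
    e_minus = energy_of_volume(v0 * (1.0 - eps) ** 3)
    return (e_plus - e_minus) / (2.0 * eps * v0)


def toy_energy(volume: float, v_eq: float = 20.0, bulk: float = 1.0) -> float:
    """Hypothetical harmonic energy-volume curve (arbitrary units)."""
    return 0.5 * bulk * (volume - v_eq) ** 2 / v_eq


v0 = 21.0  # hypothetical volume, slightly larger than the model equilibrium
tr_sigma = trace_stress_fd(toy_energy, v0)
# Analytic value for the toy model: 3 * dE/dV = 3 * bulk * (v0 - v_eq) / v_eq
tr_sigma_exact = 3.0 * 1.0 * (v0 - 20.0) / 20.0
print(f"tr[sigma]: finite difference {tr_sigma:.6f}, analytic {tr_sigma_exact:.6f}")
```

Because the stress is a derivative, any basis-set error that changes with the cell volume at fixed Ecut enters the result directly, which is consistent with the higher cutoff energies reported above for converged stresses.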

Not all material properties of interest solely depend on occupied electronic states, e.g., evaluating opto-electronic properties specifically requires the eigenvalues (and/or wavefunctions) of unoccupied electronic states. As an example of such properties, we show in Fig. 6 the error of the Kohn-Sham band gap, ΔEBG, as obtained for the 71 elemental solids from band-structure calculations with FHI-aims along high-symmetry paths in the Brillouin zone41. The comparison of Fig. 6 with the respective total energy convergence plot in Fig. 1 shows that the range observed for both average and maximum errors in EBG spans almost twice the orders of magnitude obtained for Etot, substantiating that larger basis sets are required to converge EBG. Furthermore, we note that the numerical errors do not decrease monotonically. In part, this is a consequence of the fact that the band gap is a difference of two values that exhibit different, non-variational convergence. In addition, we see again the effect of the employed “binning” procedure discussed for Etot above (e.g., the peak at 5.5 basis functions per electron). In the case of the band gap, the latter is particularly important, since the calculated band gaps span a wide range, starting from virtually zero, e.g., for graphite, and reaching 17 eV for the rare gas helium. For this exact reason, the relative numerical errors for EBG, shown in percent of the converged value in the inset of Fig. 6, exhibit a more regular, but still non-monotonic, behavior. As was observed for the evaluation of the stress, computing reasonably converged band gaps hence requires roughly 50% larger basis sets than needed to achieve total energy convergence.

As noted in the introduction, we have restricted our analysis to (semi-)local xc-functionals, since such calculations are the current workhorse in computational high-throughput studies and hence constitute the vast majority of data stored in existing electronic-structure theory databases16. However, it is well known that beyond-DFT methods require larger basis sets to achieve convergence in total energy42. For the generalized hybrid functional HSE0643, which incorporates a fraction of non-local, exact exchange, this is demonstrated in Fig. 7, which shows the correlation between the numerical errors observed in the total energy of the 71 elemental solids for the PBE and HSE06 functional, respectively. Especially when compared to the LDA/PBE correlation plot shown in the Supplementary Discussion, it is obvious that the numerical errors are typically larger in HSE06 calculations. Nonetheless, there is a clear qualitative correlation between PBE and HSE06 errors, suggesting that the strategies developed in this work might also be useful for beyond-(semi-)local-DFT databases.

In this study, we presented an extensive, curated data set obtained by three conceptually very different electronic-structure methods. This set contains elemental solids, binary, and ternary materials for various combinations of computational parameters. The data have been used to understand and predict the errors of calculations with respect to the basis-set quality. More specifically, we have shown that the errors for arbitrary systems can be estimated from the errors obtained from systematic calculations for related elemental solids, as exemplified for 63 binaries and 10 ternary systems covering 13 different space groups. Let us emphasize that the presented findings are not code-specific, i.e., limited to exciting, FHI-aims, and GPAW. Rather, the qualitative trends observed for the linearized augmented plane-waves plus local orbitals, the linear combination of numeric atom-centered orbitals, and the projector-augmented wave formalisms, respectively, generally hold for all implementations of these approaches, thus covering the vast majority of codes present in current material databases. Obviously, quantitative error estimates for individual codes depend on the details of the implementations and basis sets, e.g., the chosen local orbitals, the exact definition of the NAOs, or the employed PAW potentials, and thus require code-specific reference calculations for the elemental solids. That given, the developed formalism, which gives surprisingly good results for total energies despite its conceptual simplicity, can be incorporated into computational materials databases to estimate errors of stored data. This is a prerequisite for operating on data collections that originate from different computations, performed with different computer codes and/or different precision. 
Our work may serve as a starting point for more sophisticated concepts to quantify numerical errors and uncertainties, especially for more complex materials properties that do not necessarily depend monotonically on the basis-set size, e.g., band gaps, forces, vibrational frequencies, and the relative energies discussed in this work.

## Methods

### First-principles calculations

To perform the DFT calculations with these three codes in a systematic manner, the atomic simulation environment ASE44,45 is used to generate the code-specific input files and to store the results using ASE’s lightweight database module. In this paper, we focus on the two main numerical approximations that are used to discretize and represent the electron density $$n({{{\bf{r}}}})={\sum }_{l{{{\bf{k}}}}}| {\psi }_{l{{{\bf{k}}}}}({{{\bf{r}}}}){| }^{2}$$ via the Kohn-Sham wavefunctions $${\psi }_{l{{{\bf{k}}}}}({{{\bf{r}}}})$$ for the individual electronic states l. These are the density of the reciprocal-space grid (k-grid) for Brillouin-zone (BZ) integrations and the finite basis set $${\phi }_{j{{{\bf{k}}}}}({{{\bf{r}}}})$$ enumerated via j. The Kohn-Sham wavefunctions are written as

$${\psi }_{l{{{\bf{k}}}}}({{{\bf{r}}}})={u}_{l{{{\bf{k}}}}}({{{\bf{r}}}})\exp (i{{{\bf{k}}}}{{{\bf{r}}}})\quad \,{{\mbox{with}}}\,\quad {u}_{l{{{\bf{k}}}}}({{{\bf{r}}}})=\mathop{\sum}\limits_{j}{c}_{lj{{{\bf{k}}}}}{\phi }_{j{{{\bf{k}}}}}({{{\bf{r}}}}).$$
(5)

For the BZ sampling, we use a Γ-centered k-grid characterized by a uniform k-point density

$${\rho }_{{{{\bf{k}}}}}={({N}_{{{{\bf{k}}}}}/{V}_{{{\mathrm{BZ}}}})}^{\frac{1}{3}},$$
(6)

where Nk is the total number of k-points and VBZ the BZ volume.
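As an illustration of Eq. (6), the k-point density of a given grid can be evaluated as sketched below. The sketch assumes the convention VBZ = (2π)³/Vcell, which yields a density in units of Å when the lattice vectors are given in Å; the function names and the example cell are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cell_volume(cell: Sequence[Vector]) -> float:
    """Volume of the unit cell spanned by three lattice vectors (triple product)."""
    a, b, c = cell
    cross = (b[1] * c[2] - b[2] * c[1],
             b[2] * c[0] - b[0] * c[2],
             b[0] * c[1] - b[1] * c[0])
    return abs(sum(a[i] * cross[i] for i in range(3)))


def kpoint_density(grid: Sequence[int], cell: Sequence[Vector]) -> float:
    """k-point density (N_k / V_BZ)**(1/3) of Eq. (6), in Angstrom.

    Assumption: V_BZ = (2 pi)^3 / V_cell, i.e., reciprocal lattice vectors
    include the factor 2 pi.
    """
    n_k = grid[0] * grid[1] * grid[2]
    v_bz = (2.0 * math.pi) ** 3 / cell_volume(cell)
    return (n_k / v_bz) ** (1.0 / 3.0)


# Hypothetical simple-cubic cell with a = 4 Angstrom and an 8 x 8 x 8 grid.
cubic_cell = ((4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0))
rho_k = kpoint_density((8, 8, 8), cubic_cell)
print(f"k-point density: {rho_k:.3f} Angstrom")
```

For a cubic cell, the expression reduces to the number of k-points along one axis multiplied by a/(2π), so larger cells need fewer k-points per direction to reach the same density.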

To discuss and analyze numerical errors, we perform total-energy calculations for fixed geometries, i.e., without any relaxation, using a representative set of numerical settings. These are k-point densities of 2, 4, and 8 Å, respectively, and choices of basis sets that are described in detail in the Supplementary Methods section. They reflect settings typically used in production calculations and also include extremely precise numerical settings that ensure convergence in total energy of <0.001 eV/atom. The latter are termed “fully converged” reference when we evaluate the error occurring with less precise (typical) settings. To make sure that no other numerical errors cloud the ones stemming from the k-grid and the basis set, all other computational parameters—for example, the convergence thresholds for self-consistency—are chosen in an extremely conservative way, as detailed in the Supplementary Methods section.