## Abstract

Electronic-structure theory is a strong pillar of materials science. Many different computer codes that employ different approaches are used by the community to solve various scientific problems. Still, the precision of different packages has only recently been scrutinized thoroughly, focusing on a specific task, namely selecting a popular density functional and using unusually high, extremely precise numerical settings for investigating 71 monoatomic crystals^{1}. Little is known, however, about method- and code-specific uncertainties that arise under numerical settings that are commonly used in practice. We shed light on this issue by investigating the deviations in total and relative energies as a function of computational parameters. Using typical settings for basis sets and k-grids, we compare results for 71 elemental^{1} and 63 binary solids obtained by three different electronic-structure codes that employ fundamentally different strategies. On the basis of the observed trends, we propose a simple, analytical model for the estimation of the errors associated with the basis-set incompleteness. We cross-validate this model using ternary systems obtained from the Novel Materials Discovery (NOMAD) Repository and discuss how our approach enables the comparison of the heterogeneous data present in computational materials databases.

## Introduction

Over the last decades, computational materials science has evolved into a paradigm of materials science, complementing theory and experiment with *computer experiments*^{2}. In particular, density-functional theory (DFT) has become the workhorse for a plenitude of computational investigations, representing a good compromise between precision and computational expense, thus allowing for the investigation of realistic systems with affordable numerical effort^{3}. The widespread application of electronic-structure theory was especially fueled by the development and distribution of many user-friendly and computationally efficient simulation packages (termed *codes* in the following) based on DFT^{4}. Essentially all these codes rely on the same fundamental physical concept and solve the Kohn-Sham (KS) equations^{5} of DFT self-consistently by expanding the Kohn-Sham states in a finite basis set. Apart from the choice of the basis set, however, different approximations and various numerical techniques and algorithms are employed. Inherently, this raises the question of how consistent, and hence how comparable, results from different codes are.

In 2016, a synergistic community effort led by K. Lejaeghere and S. Cottenier^{1} shed light on these issues, essentially concluding that “most recent codes and methods converge toward a single value”. This concerns, however, only the relatively robust case investigated there, i.e., computing the equation of state for elemental solids^{1,6} using the PBE exchange-correlation (xc) functional. In this context, it has to be noted that such a close agreement across codes and methods was only achieved by using *safe* numerical settings that guaranteed highest precision and that are rarely used in routine DFT calculations. In practice, such settings are often not even necessary as long as only data obtained by the same methodology, code, and settings are used, because then one benefits from error cancellation, and trends are described reliably.

Over the last decade, the increased amount of available computational power as well as the maturity of existing first-principles materials-science codes made it possible to perform computational studies in a high-throughput fashion by scanning the compositional and structural space in an almost automated manner^{7,8,9,10}. In such a case, the numerical settings have to be decided a priori in such a way that the trends of the properties of interest are captured. Often, this is achieved via educated guesses, sometimes via (semi-)automatic algorithms^{11,12}. Since the properties of interest differ between investigations, the numerical settings can also vary quite significantly^{13,14,15}. This limits the possibility of reusing data beyond its original scope and purpose. Also, comparing data from different sources (created using different methodologies and settings or focusing on different properties) is not risk-free, in spite of the fact that the data may be publicly available in databases and repositories, for instance, in the NOMAD Repository^{16}, AFLOW^{17,18}, Materials Project^{19}, OQMD^{9}, Materials Cloud^{20}, the Computational Materials Repository^{21}, and the like. In a nutshell, using data from different sources that are based on different numerical settings implies potentially uncontrollable uncertainties. This is a pressing and severe issue, given that the sheer amount of calculations existing to date prevents a human, case-by-case check of the data.

In this work, we describe a first step for overcoming this unsatisfactory situation and show how errors for data stemming from DFT computations can be estimated. We emphasize that we do not investigate errors that originate from the use of approximate *physical* equations, e.g., the use of a particular xc-functional. We rather focus on *numerical* aspects, i.e., on errors arising from the fact that the same equation is solved in different approaches by employing different numerical approximations and techniques. Note that different treatments of exchange and correlation can, however, require different numerical settings for convergence, as discussed in sec. “Discussion”. To this end, we systematically investigate the numerical errors that arise in total energies and energy differences when three different methodologies are applied, using representative DFT codes as examples. These are the *linearized augmented plane-waves plus local orbitals* ansatz, as implemented in the all-electron, full-potential code exciting^{22}, the *linear combination of numeric atom-centered orbitals* (*NAOs*) method as implemented in the all-electron, full-potential code FHI-aims^{23,24}, as well as the *projector-augmented wave* (*PAW*) formalism^{25}, as implemented in the package GPAW^{26,27}. All electrons are accounted for on the same footing in the self-consistency cycle in the first two methods. Conversely, core states are frozen in the PAW approach and valence states are mapped onto smooth pseudo-valence states using a linear transformation involving atom-centered partial wave expansions^{25}. These pseudo-states are represented in a plane-wave expansion (throughout this work, we use the PAW potentials recommended by the GPAW developers). In the following, we evaluate and analyze the numerical errors arising in these different formalisms at various levels of precision (see sec. “Methods”) and then suggest how to estimate the errors associated with the basis-set incompleteness and, consequently, get access to the complete-basis-set limit for total energies and energy differences.

## Results

### Overview

To cover the chemical space in the benchmark calculations, a set of representative materials is chosen. This includes the 71 elemental solids that have been studied in the aforementioned work by Lejaeghere and coworkers^{1} and also includes binary materials (one for each element with atomic number ≤71; noble gases excluded). The atomic structures and detailed geometries were taken from the experimental Springer Materials database (https://materials.springer.com) by selecting the energetically most stable binary structure for each particular element. We use the *T* = 0 K experimental geometries. Zero-point vibrational effects are included in these experimental values and are not corrected for in the calculations. This is fine as we only need a consistent treatment for all calculations and materials. On top of that, 10 ternary materials were chosen from the NOMAD Repository (https://repository.nomad-coe.eu). A detailed list including space groups, stoichiometric formulae, structures, and references to the original scientific publications is given in the Supplementary Discussion section.

In this section, we focus on the convergence and related errors of two fundamental properties, i.e., the absolute total energies *E*_{tot} and relative energies *E*_{rel}. The latter were computed as the total-energy difference between the original unit cell and an expanded cell, with 5% larger volume and scaled internal atomic positions. While *E*_{tot} includes the energetic contributions from both core and valence electrons, *E*_{rel} is less sensitive to contributions from the core and semi-core electrons due to benign error cancellation. Accordingly, *E*_{rel} is a good metric to quantify the numerical precision typically needed for energy differences as well as potential-energy surfaces. It also sheds light on the errors that would occur in properties derived from the total energy, like elastic constants, vibrational properties, and the like. In our evaluations, the error for one material *i* in a data set *x*_{i} is always defined with respect to the “fully converged” reference value *c*_{i}, as indicated by the notation Δ*x*_{i} = *x*_{i} − *c*_{i}, e.g., Δ*E*_{tot,i} for the total energy error of material *i*. To statistically analyze the errors across the full set of materials with *N* entries, we report the mean absolute error

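\[\mathrm{MAE}(\Delta x)=\frac{1}{N}\sum_{i=1}^{N}\left|\Delta x_{i}\right| \tag{1}\]

and the maximum error

\[\max(\Delta x)=\max_{i}\left|\Delta x_{i}\right|. \tag{2}\]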

Here, we limit the discussion to data computed with the PBE xc-functional. The numerical errors occurring with a different type of generalized gradient approximation (GGA) or the local-density approximation (LDA) show the same qualitative behavior and only minor quantitative differences (see Supplementary Discussion). However, quantitative differences occur for beyond-DFT methods, as discussed in sec. “Discussion”.
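The two error statistics introduced above (mean absolute error and maximum error with respect to the “fully converged” reference) can be sketched in a few lines of Python. This is only an illustration: the function name and the energies are hypothetical and not taken from the data set of this work.

```python
def error_statistics(values, references):
    """Return (mean absolute error, maximum absolute error).

    values     -- results x_i obtained with typical settings
    references -- "fully converged" reference values c_i (same order)
    """
    if len(values) != len(references) or not values:
        raise ValueError("need two non-empty sequences of equal length")
    # Delta x_i = x_i - c_i, taken in absolute value for both statistics.
    abs_errors = [abs(x - c) for x, c in zip(values, references)]
    return sum(abs_errors) / len(abs_errors), max(abs_errors)


# Hypothetical total energies in eV/atom, for illustration only.
computed = [-3.7450, -5.4120, -1.1000]
converged = [-3.7500, -5.4100, -1.1300]
mae, max_err = error_statistics(computed, converged)
```

In this toy example a single poorly converged entry dominates the maximum error, which mirrors the gap between average and maximum errors discussed in the following sections.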

In the following, we first summarize the trends observed for the elemental solids (sec. “Elemental Solids”). When discussing errors related to the basis set, we always compare to calculations that are “fully converged” with respect to k-points. Likewise, errors arising from an insufficient k-point density are discussed for “fully converged” basis sets, since the errors arising from either source can be considered independent of each other. In all cases, a simple summation approach with a Fermi-function smearing of 100 meV is used for the BZ integration. The observed trends allow us to propose a simple mathematical model to estimate the error associated with the basis set for *any* compound and *any* of the investigated codes, as exemplified in sec. “Predicting errors for binary and ternary systems” for binary and ternary materials.
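For readers less familiar with smearing, the occupation function behind a Fermi-function smearing can be sketched as follows. The sketch assumes that the quoted 100 meV is the width parameter of the Fermi function (playing the role of k_B*T*); the function name and the eigenvalues are hypothetical and do not reproduce the implementation of any of the three codes.

```python
import math


def fermi_occupation(eigenvalue, fermi_level, width=0.1):
    """Fermi-function occupation between 0 and 1; all energies in eV.

    width is the smearing parameter in eV (0.1 eV = 100 meV), assumed
    here to play the role of k_B * T.
    """
    x = (eigenvalue - fermi_level) / width
    # Guard against overflow in exp() far above the Fermi level.
    if x > 500.0:
        return 0.0
    return 1.0 / (math.exp(x) + 1.0)


# States 1 eV below, at, and 1 eV above a Fermi level set to zero.
occupations = [fermi_occupation(e, fermi_level=0.0) for e in (-1.0, 0.0, 1.0)]
```

States well below the Fermi level are essentially fully occupied and states well above are essentially empty; only states within a few multiples of the width are partially occupied, which is what smooths the BZ summation for metals.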

### Elemental solids

First, we address the convergence with respect to the size viz. degree of completeness of the basis set. The results are shown in Fig. 1. In the case of exciting, the atom-specific settings, which are kept fixed in all calculations, correspond to a sizable number of local orbitals that ensure well-converged ground-state calculations and transferability between different compounds. The remaining (and most widely used) parameter to judge the quality of the plane-wave basis is \(R{K}_{\max}\), which is the product of the radius of the smallest atomic sphere and the plane-wave cutoff (for details, see ref. ^{22}). Choosing the *optimal* value \(R{K}_{\mathrm{max }}^{\,{{\rm{opt}}}\,}\) such that it corresponds to a convergence of the total energy of about 0.1 meV/atom, we use the squared fraction \({(R{K}_{\mathrm{max}}/R{K}_{\mathrm{max}}^{{{\mathrm{opt}}}})}^{2}\) to label the basis-set quality, see Supplementary Methods for details. For FHI-aims, which uses tabulated, chemical-species-specific sets of NAOs, the number of NAOs per electron is used as metric. Note that these NAOs come in *tiers* that group different angular momenta^{23}. The average number of basis functions per electron present in these tiers and in the species-specific suggested settings (“light”, “tight”) provided by the FHI-aims developers are shown as black and gray vertical lines in the figures. Since the “translation” from the number of NAOs into this metric requires binning (not all elemental solids appear for all values of the *x*-axis), the reported errors do not decrease monotonically. It is important to note that tier 4 sets are not provided for all elements, but only for those species for which such an additional set of basis functions improved the description of the electronic structure during the basis-set construction procedure^{23}. Accordingly, only these problematic elements determine the errors shown for 9 and more NAOs per electron. 
The more benign elements, which are already fully converged in this limit, no longer enter the shown average error, since the developer-suggested settings do not allow for more than 8 NAOs per electron for these species. In the plane-wave code GPAW, the basis set is characterized by the cutoff energy *E*_{cut}, i.e., all plane waves with a kinetic energy smaller than *E*_{cut} are included in the basis set. Note that this affects the convergence of relative energies, since, for the same value of *E*_{cut}, cells with different volumes contain different numbers of plane waves.
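The volume dependence of the plane-wave basis can be made concrete with the standard continuum estimate for the number of plane waves below a cutoff, N_pw ≈ V k_cut³/(6π²) with ħ²k_cut²/(2m) = E_cut. The sketch below uses only this estimate; the function name, the cutoff, and the cell volume are illustrative assumptions, and an actual code counts discrete reciprocal-lattice vectors instead.

```python
import math

HBAR2_OVER_2M = 3.80998  # hbar^2 / (2 m_e) in eV * Angstrom^2


def estimate_plane_waves(e_cut, volume):
    """Approximate number of plane waves with kinetic energy below e_cut.

    e_cut  -- cutoff energy in eV
    volume -- unit-cell volume in Angstrom^3
    Counts reciprocal-space points inside a sphere of radius k_cut using
    the continuum point density volume / (2*pi)^3.
    """
    k_cut = math.sqrt(e_cut / HBAR2_OVER_2M)  # in 1/Angstrom
    return volume * k_cut**3 / (6.0 * math.pi**2)


# Same cutoff, original cell versus the cell expanded by 5% in volume.
n_original = estimate_plane_waves(e_cut=500.0, volume=16.5)
n_expanded = estimate_plane_waves(e_cut=500.0, volume=16.5 * 1.05)
ratio = n_expanded / n_original
```

Within this estimate the basis-set size scales linearly with the volume at fixed *E*_{cut}, so the two total energies entering *E*_{rel} are effectively computed with basis sets of different size.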

As evident from Fig. 1, the errors in the total energy exhibit a systematic convergence with increasing basis-set size for all three codes. Generally, the maximum error in the total energy can be roughly one order of magnitude larger than the average error. This is due to the fact that numerical errors are element specific, i.e., some chemical species require a larger basis set to be described precisely. The difference between average and maximum error is more pronounced in the results for FHI-aims and GPAW (Fig. 1), which is a consequence of the metric chosen to quantify the basis-set completeness, i.e., the *x*-axis in this figure. While FHI-aims and GPAW use an absolute metric, exciting uses a relative one, i.e., fractions of species-specific values \(R{K}_{\mathrm{max}}^{\mathrm{opt}}\). In this case, the fact that the developers provide well-balanced, species-specific values for \(R{K}_{\mathrm{max}}^{\mathrm{opt}}\) ensures that a similar precision is achieved for all species at a specific fraction of \({(R{K}_{\mathrm{max}}/R{K}_{\mathrm{max}}^{\mathrm{opt}})}^{2}\). In turn, this leads to a more consistent precision across material space and thus to smaller maximum errors at a given value of \({(R{K}_{\mathrm{max}}/R{K}_{\mathrm{max}}^{\mathrm{opt}})}^{2}\). For all three codes, the average and maximum errors in total energies are roughly one to two orders of magnitude larger than the ones for relative energies. Again, this finding reflects that the main source of imprecision in the total energy is species specific, which leads to a beneficial error cancellation in energy differences.

Finally, it is important to note that the numerical errors vary considerably for different species, types of bonding, and across methodologies, as detailed in the Supplementary Material. Naturally, plane waves are more suitable for quasi-free-electron systems like aluminum, whereas NAOs perform better for inert elements like rare gases or localized covalent bonds. These observations, dating back to the early days of electronic-structure theory and predating modern DFT implementations, are among the historical reasons^{28} that actually led to the development of the different methodologies discussed in this paper. Accordingly, the above-described trends for the numerical errors, their influence on computed observables, and their numerical as well as physical origin have been discussed^{29,30,31} and reviewed^{32} in the literature before. Most importantly, the finding that errors are largely species-specific can be rationalized by the fact that changes in the kinetic energy of core electrons, despite being orders of magnitude larger than total-energy changes, vanish to first order in charge-density differences^{33}. For instance, this aspect is directly exploited in the VASP code^{34,35} for an automatic convergence correction^{36}. Due to this automatic convergence correction, the total energy output of VASP does not necessarily decrease monotonically when *E*_{cut} is increased, as is the case in most common PAW implementations. Accordingly, an analysis of this code-specific aspect goes beyond the scope of this paper. Nonetheless, a complete, consistent VASP data set covering the materials discussed in this work is available via the NOMAD Repository at https://doi.org/10.17172/NOMAD/2020.07.29-1. In sec. “Predicting errors for binary and ternary systems”, we will exploit the species-specific nature of the errors for the three codes exciting, FHI-aims, and GPAW to predict errors a priori for multicomponent systems using information from the elemental solids.

Let us now inspect the errors in total energies that arise due to the finite reciprocal-space grid. Figure 2 shows results for k-point densities of 2 and 4 Å. Data obtained with a k-point density of 8 Å serves as “fully converged” reference. The rather large observed errors result from the fact that many elemental solids are metallic with a more involved shape of the Fermi-surface, so that a substantial number of k-points is required to reach convergence. Quite consistently, all codes yield average errors of the same order of magnitude if the same k-point densities are used, despite the fact that the three codes handle the numerical details of the reciprocal-space integration differently. This is reflected in the maximum errors, which vary slightly more between codes than the average ones. Again, we observe that the maximum error is approximately one order of magnitude larger than the average error.

### Predicting errors for binary and ternary systems

Following our discussion of the errors in total and relative energies of elemental solids stemming from the basis-set incompleteness, we propose to estimate the corresponding errors for multicomponent systems by linearly combining the respective errors observed for the constituents in the elemental-solids calculations at the same settings. This follows the above discussed observation that there are chemical species that require larger basis sets to reach convergence. This is in fact independent of the employed code. For the error in the total energy we simply assume:

\[{\overline{\Delta E}}_{\mathrm{tot}}=\frac{1}{N}\sum_{I}N_{I}\,\Delta E_{\mathrm{tot}}^{I}, \tag{3}\]

*N*_{I} being the number of atoms of species *I*, *N* = ∑_{I}*N*_{I} the total number of atoms, and Δ*E*^{I}_{tot} the error per atom observed for the elemental solid of species *I*. For \({\overline{\Delta E}}_{\mathrm{rel}}\) we proceed analogously. Note that in the case of O, F, and N, the elemental solid is a molecular crystal that is not a good representative for the binding in the various oxides, fluorides, and nitrides present in the binaries data set. For this reason, we determine the values of Δ*E* for these particular elements from the binaries MgO, NaF, and BN by inverting Eq. (3).
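The linear combination of elemental errors, and its inversion for species such as O, F, and N, can be sketched as follows. The sketch assumes that all errors are given per atom and that the estimate is the stoichiometry-weighted average of the elemental errors; the function names and all numbers are hypothetical and serve only as an illustration.

```python
def predict_error(composition, elemental_errors):
    """Estimate the per-atom basis-set error of a compound.

    composition      -- dict: species -> number of atoms N_I in the cell
    elemental_errors -- dict: species -> per-atom error (e.g. meV/atom)
                        of the elemental solid at the same settings
    """
    n_atoms = sum(composition.values())
    weighted = sum(n * elemental_errors[s] for s, n in composition.items())
    return weighted / n_atoms


def invert_for_species(composition, compound_error, elemental_errors, unknown):
    """Solve the linear model for the error of a single species.

    Intended for species whose elemental solid is a poor reference,
    using the actual error of one binary compound instead.
    """
    n_atoms = sum(composition.values())
    known = sum(n * elemental_errors[s]
                for s, n in composition.items() if s != unknown)
    return (compound_error * n_atoms - known) / composition[unknown]


# Hypothetical numbers in meV/atom, for illustration only.
errors = {"Mg": 4.0}
errors["O"] = invert_for_species({"Mg": 1, "O": 1}, 10.0, errors, "O")
predicted = predict_error({"Mg": 2, "O": 2}, errors)
```

By construction, the inverted oxygen value reproduces the error of the binary it was derived from, independent of the cell size; for any other oxide it provides a genuine estimate.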

To validate the ansatz of Eq. (3), we have computed the total and relative-energy errors for 63 binary solids using exactly the same strategies as for the elemental solids in sec. “Elemental solids”. In Fig. 3, we then compare the actual errors observed in the calculations for binary systems, for two basis-set sizes for each of the three codes, to the estimated errors obtained via Eq. (3). As shown in these plots, we generally obtain quite reliable total energy predictions for all three codes by this means. For the total energies (top panels), we observe better predictions when an “unbiased” and smooth metric is used to characterize the basis-set completeness. For instance, GPAW, which uses the atom-independent plane-wave cutoff *E*_{cut}, yields an almost perfect correlation between predicted and actual total energy errors. Conversely, more scattering is observed for FHI-aims, which uses an atom-specific, granular metric with different NAOs for each atom. Nonetheless, we find a clear correlation between the predicted errors, \({\overline{\Delta E}}_{\mathrm{tot}}\), and the actual errors, Δ*E*_{tot}, for all codes. In particular, this holds for absolute energy errors larger than 10 meV/atom. This demonstrates that the relatively intuitive relation formulated in Eq. (3) can serve as a reliable estimate for the error associated with a particular total-energy calculation.

For the relative-energy errors shown in the lower half of Fig. 3, we observe more scattering and a less neat correlation between predicted and actual errors. The reason for that is twofold: First, benign error cancellation reduces numerical errors in relative energies, since total energy differences are inspected. In other words, a large portion of the species-specific errors described by Eq. (3) cancel each other out when computing relative energies as a difference. For this exact reason, relative energies are generally less affected by numerical errors (see Fig. 1 and its discussion). Second, relative energies are, in contrast to total energies, non-variational, i.e., their errors do not necessarily decrease monotonically with basis-set size. The reason is that the errors associated with the two total energies entering the relative energies typically do not decrease at the exact same rate. Still, the relative-energy error estimates for all codes are reliable enough in the respective energy window of interest, hence allowing us to compare relative energies obtained from different codes with different settings.
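The second point, that a difference of two monotonically converging total energies need not converge monotonically itself, can be illustrated with a toy model. The decay rates and prefactors below are hypothetical and chosen only to make the effect visible; they are not fitted to any of the codes.

```python
# Hypothetical basis-set errors (meV/atom) of the two total energies that
# enter a relative energy; both decay monotonically, but at different rates.
steps = range(5)
error_original = [100.0 * 0.5**n for n in steps]
error_expanded = [120.0 * 0.4**n for n in steps]

# Error of the relative energy = difference of the two total-energy errors.
error_relative = [b - a for a, b in zip(error_original, error_expanded)]
abs_relative = [abs(e) for e in error_relative]


def is_monotonically_decreasing(sequence):
    """True if no element is larger than its predecessor."""
    return all(later <= earlier
               for earlier, later in zip(sequence, sequence[1:]))
```

In this example the error of the difference changes sign after the first step and its magnitude temporarily grows again, although it stays well below both total-energy errors at every step, which reflects the error cancellation described above.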

The data shown and discussed for the binary materials suggest that Eq. (3) can be used to estimate the total energy errors for *any* multicomponent system. As an example, we demonstrate this in Fig. 4, in which the same comparison between predicted and actual total energy errors is made for ten ternary systems, which were selected from the huge pool of compounds available in the NOMAD Repository^{16} so as to cover material and structural space. Also in this case, the same quantitative and qualitative behavior as discussed for Fig. 3 is observed. The relatively simple approach of Eq. (3) is able to correctly predict the numerical errors also in these ternary systems. This further substantiates that the described approach is not only applicable to the relatively simple binary systems discussed in Fig. 3, but also to more complex systems, such as the ones found in electronic-structure materials databases.

## Discussion

The focus of the formalism presented in sec. “Results” lies on the analysis of total and relative energies, since those are the most fundamental quantities produced in electronic-structure-theory calculations. However, such first-principles approaches also allow computing many other material properties, ranging from structural parameters, over thermodynamic expectation values, to electronic properties. Generally, these quantities will exhibit a different convergence behavior than the total and relative energies. In particular, this is the case for non-variational properties that do not depend monotonically on the basis-set size and k-grid density.

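As an example, we discuss the numerical errors associated with the evaluation of the stress tensor **σ**. Its components are defined as^{37,38}

\[\sigma_{\lambda\mu}=\frac{1}{V}\frac{\partial E_{\mathrm{tot}}}{\partial \varepsilon_{\lambda\mu}}, \tag{4}\]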

i.e., as the total energy derivatives with respect to symmetric strain deformations *ε*_{λμ} for the Cartesian axes *λ*, *μ* normalized by the unit-cell volume *V*. Despite the fact that the stress is defined as a total energy derivative, it is well known^{39} that it is particularly sensitive to the value of *E*_{cut} chosen in plane-wave calculations. This is further demonstrated for the GPAW code in Fig. 5 using the trace of the stress tensor \(\mathrm{tr}\left[\boldsymbol{\sigma}\right]\), as computed for the experimental lattice constants and structures. Qualitatively, the average and maximum errors observed for the stress resemble the behavior observed for GPAW’s total energy convergence quite closely, as a comparison of Figs. 1 and 5 reveals. This is not surprising, given that stress and total energy are directly related via Eq. (4). However, obtaining meaningful values for the stress, i.e., values accurate enough to perform reliable structure relaxations, requires roughly 50% higher cutoff energies *E*_{cut} than needed to obtain reasonably converged total energies. Let us note that the contributions to the numerical error in the stress tensor stemming from the finite k-grid density are much smaller than those arising from *E*_{cut}, as found for the total energy before (see Fig. 2). With respect to the basis-set convergence, the observed trends suggest that the strategy devised in this work for total and relative energies might also be useful for estimating errors in first-order derivatives of the total energy, i.e., for forces and stresses, which only depend on an accurate description of occupied electronic states^{40}. More data, especially for structures far from equilibrium, are needed to further investigate this hypothesis and to develop accurate error-estimate models for such quantities.
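The definition of the stress as a volume-normalized strain derivative of the total energy can be illustrated with a central finite difference on a toy energy-volume curve. Everything in this sketch is an assumption made for illustration: the harmonic model, its parameters, and the function names are hypothetical, and the three codes evaluate stresses analytically rather than in this way.

```python
def energy(volume, v_eq=20.0, bulk_modulus=1.0, e_eq=-5.0):
    """Toy harmonic energy-volume curve (eV, Angstrom^3); not a DFT result."""
    return e_eq + 0.5 * bulk_modulus / v_eq * (volume - v_eq) ** 2


def stress_xx(volume, strain=1e-3):
    """Central finite difference of (1/V) dE/d(eps_xx) at the given volume.

    A uniaxial strain eps_xx rescales the cell volume to V * (1 + eps_xx),
    which is all that this volume-only toy model needs.
    """
    e_plus = energy(volume * (1.0 + strain))
    e_minus = energy(volume * (1.0 - strain))
    return (e_plus - e_minus) / (2.0 * strain * volume)


sigma = stress_xx(21.0)  # cell expanded by 5% relative to v_eq
trace = 3.0 * sigma      # isotropic model: all diagonal components are equal
```

With this sign convention the stress vanishes at the equilibrium volume and is positive for the expanded cell. Because the derivative amplifies small changes in the energy curve, any basis-set-induced distortion of *E*(*V*) enters the stress directly.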

Not all material properties of interest solely depend on occupied electronic states, e.g., evaluating opto-electronic properties specifically requires the eigenvalues (and/or wavefunctions) of unoccupied electronic states. As an example of such properties, we show in Fig. 6 the error of the Kohn-Sham band gap, Δ*E*_{BG}, as obtained for the 71 elemental solids from band-structure calculations with FHI-aims along high-symmetry paths in the Brillouin zone^{41}. The comparison of Fig. 6 with the respective total energy convergence plot in Fig. 1 shows that the range observed for both average and maximum errors in *E*_{BG} spans almost twice the orders of magnitude obtained for *E*_{tot}, substantiating that larger basis sets are required to converge *E*_{BG}. Furthermore, we note that the numerical errors do not decrease monotonically. In part, this is a consequence of the fact that the band gap is a difference of two values that exhibit different, non-variational convergence. In addition, we see again the effect of the employed “binning” procedure discussed for *E*_{tot} above (e.g., the peak at 5.5 basis functions per electron). In the case of the band gap, the latter is particularly important, since the calculated band gaps span a wide range, starting from virtually zero, e.g., for graphite, and reaching 17 eV for the rare gas helium. For this exact reason, the relative numerical errors for *E*_{BG}, shown in percent of the converged value in the inset of Fig. 6, exhibit a more regular, but still non-monotonic, behavior. As observed for the evaluation of the stress, computing reasonably converged band gaps hence requires roughly 50% larger basis sets than needed to achieve total energy convergence.

As noted in the introduction, we have restricted our analysis to (semi-)local xc-functionals, since such calculations are the current workhorse in computational *high-throughput* studies and hence constitute the vast majority of data stored in existing electronic-structure theory databases^{16}. However, it is well known that beyond-DFT methods require larger basis sets to achieve convergence in total energy^{42}. For the generalized hybrid functional HSE06^{43}, which incorporates a fraction of non-local, exact exchange, this is demonstrated in Fig. 7, which shows the correlation between the numerical errors observed in the total energy of the 71 elemental solids for the PBE and HSE06 functional, respectively. Especially when compared to the LDA/PBE correlation plot shown in the Supplementary Discussion, it is obvious that the numerical errors are typically larger in HSE06 calculations. Nonetheless, there is a clear qualitative correlation between PBE and HSE06 errors, suggesting that the strategies developed in this work might also be useful for beyond-(semi-)local-DFT databases.

In this study, we presented an extensive, curated data set obtained by three conceptually very different electronic-structure methods. This set contains elemental solids, binary, and ternary materials for various combinations of computational parameters. The data have been used to understand and predict the errors of calculations with respect to the basis-set quality. More specifically, we have shown that the errors for arbitrary systems can be estimated from the errors obtained from systematic calculations for related elemental solids, as exemplified for 63 binaries and 10 ternary systems covering 13 different space groups. Let us emphasize that the presented findings are not code-specific, i.e., limited to exciting, FHI-aims, and GPAW. Rather, the qualitative trends observed for the *linearized augmented plane-waves plus local orbitals*, the *linear combination of numeric atom-centered orbitals*, and the *projector-augmented wave* formalisms, respectively, generally hold for all implementations of these approaches, thus covering the vast majority of codes present in current material databases. Obviously, quantitative error estimates for individual codes depend on the details of the implementations and basis sets, e.g., the chosen local orbitals, the exact definition of the NAOs, or the employed PAW potentials, and thus require code-specific reference calculations for the elemental solids. That given, the developed formalism, which gives surprisingly good results for total energies despite its conceptual simplicity, can be incorporated into computational materials databases to estimate errors of stored data. This is a prerequisite for operating on data collections that originate from different computations, performed with different computer codes and/or different precision. 
Our work may serve as a starting point for more sophisticated concepts to quantify numerical errors and uncertainties, especially for more complex materials properties that do not necessarily depend monotonically on the basis-set size, e.g., band gaps, forces, vibrational frequencies, and the relative energies discussed in this work.

## Methods

### First-principles calculations

To perform the DFT calculations with these three codes in a systematic manner, the atomic simulation environment ASE^{44,45} is used to generate the code-specific input files and to store the results using ASE’s lightweight database module. In this paper, we focus on the two main numerical approximations that are used to discretize and represent the electron density *n*(**r**) = ∑_{lk}∣*ψ*_{lk}(**r**)∣^{2} via the Kohn-Sham wavefunctions *ψ*_{lk}(**r**) for the individual electronic states *l*. These are the density of the reciprocal-space grid (k-grid) for Brillouin-zone (BZ) integrations and the finite basis set *ϕ*_{jk}(**r**) enumerated via *j*. The Kohn-Sham wavefunctions are written as

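\[\psi_{l\mathbf{k}}(\mathbf{r})=\sum_{j}C_{jl\mathbf{k}}\,\phi_{j\mathbf{k}}(\mathbf{r}), \tag{5}\]

with expansion coefficients *C*_{jlk}. For the BZ sampling, we use a Γ-centered k-grid characterized by a uniform k-point density

\[\rho_{\mathbf{k}}={\left(\frac{N_{\mathbf{k}}}{V_{\mathrm{BZ}}}\right)}^{1/3}, \tag{6}\]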

where *N*_{k} is the total number of k-points and V_{BZ} the BZ volume.
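As a rough illustration of how such a k-point density translates into a concrete grid, the sketch below assumes that the density is the cube root of *N*_{k}/V_{BZ}, which is consistent with the unit Å quoted for the densities in this work. The function names and the lattice constant are hypothetical, and the grid-selection helper is only valid for a simple cubic cell.

```python
import math


def kpoint_density(n_kpoints, cell_volume):
    """Linear k-point density (N_k / V_BZ)^(1/3) in Angstrom.

    cell_volume is the real-space unit-cell volume in Angstrom^3, so that
    V_BZ = (2*pi)^3 / cell_volume.
    """
    v_bz = (2.0 * math.pi) ** 3 / cell_volume
    return (n_kpoints / v_bz) ** (1.0 / 3.0)


def cubic_grid_for_density(target_density, lattice_constant):
    """Smallest n such that an n x n x n grid reaches the target density.

    Only valid for a simple cubic cell, where the density of an
    n x n x n grid equals n * a / (2*pi).
    """
    return math.ceil(target_density * 2.0 * math.pi / lattice_constant)


a = 4.0  # hypothetical cubic lattice constant in Angstrom
n = cubic_grid_for_density(4.0, a)
density = kpoint_density(n**3, a**3)
```

Because the grid dimensions are integers, the density actually realized is in general somewhat larger than the requested one, and it differs between cells of different size at the same nominal setting.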

To discuss and analyze numerical errors, we perform total-energy calculations for fixed geometries, i.e., without any relaxation, using a representative set of numerical settings. These are k-point densities of 2, 4, and 8 Å, respectively, and choices of basis sets that are described in detail in the Supplementary Methods section. They reflect settings typically used in production calculations and also include extremely precise numerical settings that ensure convergence in total energy of <0.001 eV/atom. The latter are termed “fully converged” reference when we evaluate the error occurring with less precise (typical) settings. To make sure that no other numerical errors cloud the ones stemming from the k-grid and the basis set, all other computational parameters—for example, the convergence thresholds for self-consistency—are chosen in an extremely conservative way, as detailed in the Supplementary Methods section.

## Data availability

All presented data, i.e., input and output files for all electronic-structure theory codes, are available at the NOMAD Repository (https://repository.nomad-coe.eu) under the following DOIs: exciting: https://doi.org/10.17172/NOMAD/2020.07.15-1, FHI-aims: https://doi.org/10.17172/NOMAD/2020.07.27-1, GPAW: https://doi.org/10.17172/NOMAD/2020.08.20-1, VASP: https://doi.org/10.17172/NOMAD/2020.07.29-1

## Code availability

The results of this work are available for further analysis as a Jupyter notebook in the NOMAD Artificial-Intelligence Toolkit (https://nomad-lab.eu/AItoolkit/tutorial-error-estimates). Therein, errors for arbitrary systems can be calculated via an easy-to-use interface for various numerical settings for exciting, FHI-aims, and GPAW. The corresponding Python code can be modified and extended for custom purposes.

## References

1. Lejaeghere, K. et al. Reproducibility in density functional theory calculations of solids. *Science* **351**, aad3000 (2016).
2. Draxl, C. & Scheffler, M. In *Handbook of Materials Modeling* (eds Andreoni, W. & Yip, S.) 49–73 (Springer, Cham, 2020).
3. Jones, R. O. Density functional theory: its origins, rise to prominence, and future. *Rev. Mod. Phys.* **87**, 897–923 (2015).
4. Talirz, L., Ghiringhelli, L. & Smit, B. Trends in atomistic simulation software usage [Article v1.0]. *Living J. Comp. Mol. Sci.* **3**, 1483 (2021).
5. Kohn, W. & Sham, L. J. Self-consistent equations including exchange and correlation effects. *Phys. Rev.* **140**, A1133–A1138 (1965).
6. Lejaeghere, K., Speybroeck, V. V., Oost, G. V. & Cottenier, S. Error estimates for solid-state density-functional theory predictions: an overview by means of the ground-state elemental crystals. *Crit. Rev. Solid State Mater. Sci.* **39**, 1–24 (2014).
7. Curtarolo, S. et al. The high-throughput highway to computational materials design. *Nat. Mater.* **12**, 191–201 (2013).
8. Gaultois, M. W. et al. Data-driven review of thermoelectric materials: performance and resource considerations. *Chem. Mater.* **25**, 2911–2920 (2013).
9. Saal, J. E., Kirklin, S., Aykol, M., Meredig, B. & Wolverton, C. Materials design and discovery with high-throughput density functional theory: the open quantum materials database (OQMD). *JOM* **65**, 1501–1509 (2013).
10. Hachmann, J. et al. Lead candidates for high-performance organic photovoltaics from high-throughput quantum chemistry–the Harvard Clean Energy Project. *Energ. Environ. Sci.* **7**, 698–704 (2013).
11. Jain, A. et al. FireWorks: a dynamic workflow system designed for high-throughput applications. *Concurr. Comput. Pract. Exp.* **27**, 5037–5059 (2015).
12. Pizzi, G., Cepellotti, A., Sabatini, R., Marzari, N. & Kozinsky, B. AiiDA: automated interactive infrastructure and database for computational science. *Comput. Mater. Sci.* **111**, 218–230 (2016).
13. Fischer, C. C., Tibbetts, K. J., Morgan, D. & Ceder, G. Predicting crystal structure by merging data mining with quantum mechanics. *Nat. Mater.* **5**, 641–646 (2006).
14. Wang, S., Wang, Z., Setyawan, W., Mingo, N. & Curtarolo, S. Assessing the thermoelectric properties of sintered compounds via high-throughput ab-initio calculations. *Phys. Rev. X* **1**, 021012 (2011).
15. Castelli, I. E. et al. Computational screening of perovskite metal oxides for optimal solar light capture. *Energy Environ. Sci.* **5**, 5814–5819 (2012).
16. Draxl, C. & Scheffler, M. NOMAD: The FAIR concept for big data-driven materials science. *MRS Bull.* **43**, 676–682 (2018).
17. Calderon, C. E. et al. The AFLOW standard for high-throughput materials science calculations. *Comput. Mater. Sci.* **108**, 233–238 (2015).
18. Curtarolo, S. et al. AFLOWLIB.ORG: A distributed materials properties repository from high-throughput ab initio calculations. *Comput. Mater. Sci.* **58**, 227–235 (2012).
19. Jain, A. et al. Commentary: the materials project: a materials genome approach to accelerating materials innovation. *APL Mater.* **1**, 011002 (2013).
20. Talirz, L. et al. Materials Cloud, a platform for open computational science. *Sci. Data* **7**, 299 (2020).
21. Haastrup, S. et al. The computational 2D materials database: high-throughput modeling and discovery of atomically thin crystals. *2D Mater.* **5**, 042002 (2018).
22. Gulans, A. et al. exciting: a full-potential all-electron package implementing density-functional theory and many-body perturbation theory. *J. Phys. Condens. Matter* **26**, 363202 (2014).
23. Blum, V. et al. Ab initio molecular simulations with numeric atom-centered orbitals. *Comput. Phys. Commun.* **180**, 2175 (2009).
24. Havu, V., Blum, V., Havu, P. & Scheffler, M. Efficient O(N) integration for all-electron electronic structure calculation using numeric basis functions. *J. Comp. Phys.* **228**, 8367–8379 (2009).
25. Blöchl, P. Projector augmented-wave method. *Phys. Rev. B* **50**, 17953–17979 (1994).
26. Mortensen, J. J., Hansen, L. B. & Jacobsen, K. W. Real-space grid implementation of the projector augmented wave method. *Phys. Rev. B* **71**, 035109 (2004).
27. Enkovaara, J. et al. Electronic structure calculations with GPAW: a real-space implementation of the projector augmented-wave method. *J. Phys. Condens. Matter* **22**, 253202 (2010).
28. Koelling, D. D. Self-consistent energy band calculations. *Rep. Prog. Phys.* **44**, 139–212 (1981).
29. Zunger, A., Topiol, S. & Ratner, M. A. First-principles pseudopotential in the local-density-functional formalism. *Chem. Phys.* **39**, 75–90 (1979).
30. Weinert, M., Wimmer, E. & Freeman, A. J. Total-energy all-electron density functional method for bulk solids and surfaces. *Phys. Rev. B* **26**, 4571–4578 (1982).
31. Holzschuh, E. Convergence of momentum space, pseudopotential calculations for Si. *Phys. Rev. B* **28**, 7346–7348 (1983).
32. Devreese, J. T. & Camp, P. V. *Electronic Structure, Dynamics, and Quantum Structural Properties of Condensed Matter* (Springer, 1985).
33. Barth, U. V. & Gelatt, C. D. Validity of the frozen-core approximation and pseudopotential theory for cohesive energy calculations. *Phys. Rev. B* **21**, 2222–2228 (1980).
34. Kresse, G. & Furthmüller, J. Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. *Comput. Mater. Sci.* **6**, 15–50 (1996).
35. Kresse, G. & Furthmüller, J. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. *Phys. Rev. B* **54**, 11169–11186 (1996).
36. Rappe, A. M., Rabe, K. M., Kaxiras, E. & Joannopoulos, J. D. Optimized pseudopotentials. *Phys. Rev. B* **41**, 1227–1230 (1990).
37. Nielsen, O. & Martin, R. First-principles calculation of stress. *Phys. Rev. Lett.* **50**, 697–700 (1983).
38. Knuth, F., Carbogno, C., Atalla, V., Blum, V. & Scheffler, M. All-electron formalism for total energy strain derivatives and stress tensor components for numeric atom-centered orbitals. *Comput. Phys. Commun.* **190**, 33–50 (2015).
39. Bernasconi, M. et al. First-principle-constant pressure molecular dynamics. *J. Phys. Chem. Solids* **56**, 501–505 (1995).
40. Gonze, X. & Vigneron, J.-P. Density-functional approach to nonlinear-response coefficients of solids. *Phys. Rev. B* **39**, 13120 (1989).
41. Setyawan, W. & Curtarolo, S. High-throughput electronic band structure calculations: Challenges and tools. *Comput. Mater. Sci.* **49**, 299–312 (2010).
42. Kraus, P. Basis set extrapolations for density functional theory. *J. Chem. Theory Comput.* **16**, 5712–5722 (2020).
43. Krukau, A. V., Vydrov, O. A., Izmaylov, A. F. & Scuseria, G. E. Influence of the exchange screening parameter on the performance of screened hybrid functionals. *J. Chem. Phys.* **125**, 224106 (2006).
44. Bahn, S. R. & Jacobsen, K. W. An object-oriented scripting interface to a legacy electronic structure code. *Comput. Sci. Eng.* **4**, 56–66 (2002).
45. Larsen, A. H. et al. The atomic simulation environment–a Python library for working with atoms. *J. Phys. Condens. Matter* **29**, 273002 (2017).

## Acknowledgements

This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No. 676580 and No. 740233 (TEC1p). O.T.H. and E.W. gratefully acknowledge funding by the Austrian Science Fund, FWF, under the project P27868-N36. We gratefully acknowledge the help from Mohammad-Yasin Arif and Luigi Sbailò for producing the final version of the Jupyter notebook and publishing it on the NOMAD AI toolkit.

## Funding

Open Access funding enabled and organized by Projekt DEAL.

## Author information

### Contributions

C.C., L.M.G., and M. Scheffler designed the database-creation protocol; B.B. and M. Strange developed the ASE-based scripts for setting up and performing the calculations. S.L. and A.G. performed the exciting, B.B. and C.C. the FHI-aims, M. Strange and J.J.M. the GPAW, and E.W. and O.T.H. the VASP calculations. B.B., M. Strange, S.L., L.M.G., and C.C. wrote the notebook to evaluate and analyze the data. C.D., K.S.T., and M. Scheffler conceived the project, which was led and coordinated by C.C. All authors contributed to the discussion of the results and to the writing of the manuscript.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

## Additional information

**Publisher’s note** Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Carbogno, C., Thygesen, K.S., Bieniek, B. *et al.* Numerical quality control for DFT-based materials databases.
*npj Comput Mater* **8**, 69 (2022). https://doi.org/10.1038/s41524-022-00744-4
