High-throughput calculations of charged point defect properties with semi-local density functional theory — performance benchmarks for materials screening applications

Calculations of point defect energetics with Density Functional Theory (DFT) can provide valuable insight into several optoelectronic, thermodynamic, and kinetic properties. These calculations commonly use methods ranging from semi-local functionals with a-posteriori corrections to more computationally intensive hybrid functional approaches. For applications of DFT-based high-throughput computation for data-driven materials discovery, point defect properties are of interest, yet are currently excluded from available materials databases. This work presents a benchmark analysis of automated, semi-local point defect calculations with a-posteriori corrections, compared to 245 “ gold standard ” hybrid calculations previously published. We consider three different a-posteriori correction sets implemented in an automated work ﬂ ow, and evaluate the qualitative and quantitative differences among four different categories of defect information: thermodynamic transition levels, formation energies, Fermi levels, and dopability limits. We highlight qualitative information that can be extracted from high-throughput calculations based on semi-local DFT methods, while also demonstrating the limits of quantitative accuracy


INTRODUCTION
High-throughput first-principles calculations using Density Functional Theory (DFT) are finding wide use for screening over candidate materials for a variety of potential applications [1][2][3] . Such DFT-based, data-driven approaches offer the potential to expedite materials discovery and design; recent applications include studies on transparent conducting oxides, next-generation solar cells, and half-Heusler compounds, as well as automated screening for infrared, dielectric, and piezoelectric properties [4][5][6][7][8][9][10][11][12][13] . Point defects influence a range of optoelectronic, thermodynamic and diffusion properties of materials and can be studied using DFT [14][15][16] , yet they are often limited in their consideration within high-throughput screening studies. In part, this situation is related to the timeconsuming nature of DFT-based point-defect calculations and the relatively complex workflows that they can entail, specifically for semiconductors and insulators, which are the focus of the present work. In the past few years, progress has been made to automate the setup and analysis of such calculations, and these efforts have addressed some of the hurdles toward conducting point-defect calculations in a high-throughput manner 1,[17][18][19][20][21][22] .
DFT-based computations of point defect properties commonly make use of periodic supercells in calculations of the dilute-limit formation energy 14 . Such formation energies can be used in statistical-thermodynamic formalisms to determine properties such as defect "trap" levels, or equilibrium point defect and carrier concentrations arising from a given set of growth or annealing conditions. To capture the dilute-limit assumption that is implicit in such thermodynamic treatments, large "supercells" are used to remove the energetic contribution of periodic image interactions, leading to increased computational cost. Because these supercells are large, there is interest in the application of computationally efficient semi-local exchange approximations, like the generalized gradient approximation (GGA) 23 . However, such functionals suffer from a well-known underestimation of the band gap, an error which compounds for charged defects whose levels often lie close to the band edges. Additionally, such calculations suffer from self-interaction errors that impact their ability to qualitatively and quantitatively describe charge localization. This greatly complicates the efficacy of approaches based on semi-local DFT in accurately capturing defect energetics.
To partially circumvent the issue of band gap underestimation and charge delocalization errors, hybrid-functional approaches, which mix exact exchange with semi-local correlation at the expense of added computational cost, have become a "gold standard" for point defect computation with DFT, as they strike an appropriate balance of accuracy with computational cost for a single material system [24][25][26][27][28][29][30][31][32][33][34][35][36][37] . Despite their improvement over semilocal methods in describing the band gap and the degree of charge localization around a defect, it is worth noting that hybrid methodologies ideally are fine tuned to experimental values for a given material 38 . An even greater degree of accuracy for electronelectron interactions can be achieved with beyond-DFT approaches such as GW 39 , Quantum Monte Carlo 14,40 or many body perturbations theory 41 , but these approaches, like hybrid functional methods, have limitations in their application to defects and remain significantly more computationally expensive than those based on semi-local DFT [42][43][44] . An alternative approach for high-throughput defect calculations is to make use of semi-local functionals with the application of a-posteriori corrections to remove the energetic contributions from band gap errors [14][15][16] . Such an approach remains an attractive prospect for developing large databases of defect properties for applications such as initial property screening over wide compositional spaces. For interesting candidate compounds selected by the initial screening procedure, more in-depth quantitative follow-up calculations can be performed at a higher level of theory.
While many thorough reviews on point defect computational methodology have been written [14][15][16] , comparative studies between DFT-based properties derived from semi-local functionals and those obtained from hybrid methods have been confined to a limited set of materials [26][27][28][45][46][47] . To explore the application of high-throughput defect studies with semi-local functionals, this work benchmarks several defect properties, calculated in an automated fashion, to a dataset of 245 previously published hybrid-functional point defect calculations (23 distinct materials) [48][49][50][51][52][53][54][55][56][57] . The Results and discussion section provides an overview of the quantities and corrections to be considered in this work, as well as a quantitative and qualitative comparison for calculated thermodynamic transition levels, formation energies, Fermi levels, and dopability limits. The section then concludes with comments on the viability of high-throughput approaches for calculating point-defect properties based on semi-local DFT approaches, emphasizing the potential for such calculations to aid in the screening of defect properties in semiconductors and insulators. The Methods section provides additional computational methods not covered in earlier sections, including details on the automation procedure that was used for this work.

RESULTS AND DISCUSSION
To understand the benchmarking results considered in this work, it is important to first establish the computational formalisms that are being applied to point defects throughout. This section begins by explaining the computational methodologies implemented, before proceeding to detailed results and analysis.

Point-defect property formalism
As mentioned in the Introduction, the dilute-limit formation energy for a single charged defect is commonly calculated using the supercell approach, wherein a single defect is placed inside a supercell of the perfect lattice and internally relaxed with DFT 14,17 . The dilute limit formation energy for a defect X q , in charge state q, is given by: The first two terms on the right hand side of Eq. (1) are the DFT total energies for the charged defect supercell and bulk supercell, respectively. The third term is a summation over atomic chemical potential values (μ i ), or the energy cost of adding (n i = + 1) or removing (n i = − 1) atoms from the bulk, undefective supercell. Limiting values of the atomic chemical potentials can be calculated using the facets of a DFT phase diagram, corresponding to different growth or annealing conditions 58 . The fourth term is the defect charge multiplied by the Fermi level (ϵ F ), which serves as the electron chemical potential reservoir. The final term is a set of corrections which we focus on in this work and elaborate upon in more detail in the next subsection. Within standard supercell calculations, this term typically includes the removal of spurious periodic image interactions and properly aligns the reference potentials in charged and uncharged systems, but it can also address errors in the chemical potentials and other possible deficiencies, as discussed below. The formation energy is traditionally plotted for a particular choice of atomic chemical potentials as a function of the Fermi level, with the "zero" value representing the valence band maximum (VBM). A qualitative example is displayed in Fig. 1. For a single defect type with multiple associated charge states, it is possible to consider defect transition levels which correspond to the energetic level at which a defect captures (or emits) a free carrier. These are highlighted as purple boxes in Fig. 1 and are given by the following equation reflecting the Fermi level at which two charge states become equal in energy: where the formation energies (Eq. (1)) of the two defects are evaluated for the electron chemical potential at the VBM, producing the Fermi level position of the transition level, ϵðq=q 0 Þ, relative to the VBM. Since the same chemical composition exists for both defects in Eq. (2), the quantity ϵðq=q 0 Þ has a value that is independent of the chemical potentials and therefore is not affected by any ambiguities that may exist in their values for a specific application. If both defect energies are determined from their relaxed ground states, then Eq. (2) yields the thermodynamic transition level. However, due to the errors caused by band gap underestimation and charge delocalization mentioned earlier, charge transition levels calculated by semi-local DFT can be expected to be of limited accuracy. Another class of quantities which can be considered are those that are derived directly from the collective consideration of the ensemble of dominant (i.e., those with the highest concentration) intrinsic point defects under a fixed set of assumptions for the growth environment, which set values for the elemental chemical potentials 14,17 . This includes what can be referred to as the "dopability limit," which is the Fermi level that first produces defects with negative formation energy, indicating an instability in the structure with regard to producing defects at that Fermi level 1,59 . These quantities are shown as orange circles in Fig. 1. Pushing the Fermi level beyond the dopability limit violates the thermodynamic condition of structural stability, and is a good indicator of the dopability of a material (i.e., whether a system can be doped with a certain carrier type with additional extrinsic doping). These dopability limits can be particularly useful when screening for systems with a particular carrier type (p-type vs. ntype), provided the elemental chemical potentials are properly treated.
A further set of quantities that can be derived from consideration of the collective set of point defects are the carrier concentrations and Fermi level. These quantities are derived from the condition of charge neutrality: X fX;qg The summation is over all defects, X q , with concentration [X q ] given by: It is worth noting that thermodynamics dictates that there is a range of possible chemical potentials compatible with the thermodynamic stability of the host material. For the purpose of this work, we have set one chemical potential that is used to compare hybrid and semi-local calculations (see subsection on Benchmark Data Set for more details). For more on the determination of elemental chemical potentials and enthalpy of formation, we refer to the works by Stevanović et al. 60 and Chevrier et al. 47 .

Corrections
Several corrections have been proposed for improving the accuracy of point defect calculations in semiconductors and insulators based on periodic DFT supercell methods [14][15][16][17] , which constitute the term E corr in Eq. (1). They can broadly be categorized into corrections related to supercell periodic image interactions and corrections related to band gap errors.
The use of periodic supercell methods in DFT-based defect property calculations lead to spurious interactions with neighboring image defects. The energy contribution from this interaction leads to a deviation from the dilute limit formation energy that needs to be corrected. These finite-size effects have been discussed at length in many reviews [14][15][16][17]61,62 . For defects in semiconductors and insulators, charge can accumulate on the defect and long range Coulomb interactions with neighboring image charges become a leading energy contribution to the finitesize effects. Moreover, a charged supercell causes an infinite bulk system to have a diverging electrostatic energy. To account for this second effect, a homogeneous background charge is introduced to neutralize the supercell. This requires the use of a potential alignment method to remove the effect of the homogeneous background charge on the defect formation energetics. To account for both of these effects, a number of approaches have been proposed 63 . In this work, we employ the formalism suggested by Freysoldt et al. 64,65 , which can be seen as an extension of previous approaches based on calculations of the Madelung energy of an array of point charges in a neutralizing background charge 66 . The potential alignment correction in this scheme involves consideration of the planar-averaged electrostatic potential, which can also help to assess the localization of charge for a given defect.
While an infinitely large supercell would lead to localized defect levels that are flat (i.e., dispersonless) in the Brillouin zone of the host crystal, the finite size of the calculation supercell can result in significant dispersion of the single-particle states associated with a given defect. To address this effect, solutions such as intelligent k-point sampling to "average" out the dispersion have been developed 15,67 . In practice, the use of a sufficiently large supercell tends to make the energetic impact of dispersion from bandgap states negligible.
Several other corrections have been suggested, such as the use of band edge filling corrections to partially remove artifical delocalization of near-band-edge defect states 16 . We considered such corrections in this work but found their application to be inconsistent when applied in a high-throughput environment, as their strong dependence on potential alignment caused small uncertainties in alignment to produce large variations in the predicted defect properties. We therefore do not include this correction in the analysis that follows.
As mentioned above, semi-local DFT functionals like GGA suffer from underestimation of the band gap. This can lead to significant errors for calculated defect properties; for example, in cases where the defect has a level in the band gap, but band gap underestimation causes this level to be resonant with the host band states. A natural way to correct for this is to extend the conduction band minimum (CBM) upwards to fit a value for the band gap derived from a higher level of theory or from experiment. We will refer to this procedure as a "basic scissor operator," and a schematic representation of this approach is shown on the left side of Fig. 2.
The basic scissor operator band edge shift does not renormalize the Fermi level in Eq. (1) because the VBM is held fixed. An alternative approach is to renormalize the electron chemical potential by moving the band edges according to a reference energy that is common to both the GGA and hybrid calculations. References that have been demonstrated to work well include the electronic states of a localized defect transition level located within the band gap or using the averaged electrostatic potential (ESP) across a supercell 68,69 . For the purposes of high-throughput calculations, the averaged ESP approach is fairly straightforward to implement, and is chosen in what follows. This averaged ESPreferenced band edge shift (bes) approach generally leads to shifts in both the VBM and CBM, as illustrated in the middle panel of Fig. 2.
In addition to electron chemical potential renormalization, it has been suggested that shifting single particle defect states in a manner consistent with the shifting of the CBM and VBM, may help to improve the accuracy of calculated defect formation energetics 16 . For shallow states, a simple way to achieve this is to fully shift any "free" electrons associated with the defectoccupied Kohn-Sham eigenvalues that lie above the CBM-by an amount equal to the CBM shift (see right side of Fig. 2). By integrating the occupied conduction band states to a single number of "free" carriers (n), and assuming they shift 100% with the band edge shift, this free carrier shift correction corresponds to: nΔE CBM . An analogous correction can be applied for "free" holes, by integrating unoccupied eigenvalues below the VBM and shifting fully with the VBM shift.
In Table 1 we list descriptions and acronyms for the three different correction schemes considered in this work. They consist of a standard charge correction without a band edge shift (no_bes), an averaged ESP-referenced band edge shift (bes), and the same band edge shift with the previously described free carrier shift (bes_free). Additional correction schemes have been proposed to specify the movement of occupied defect levels lying within the gap. For example, one can project the single-particle defect wavefunction onto valence and conduction wavefunctions to produce a percentage of VB and CB character, which can then be used to shift the defect band 16 . An approach for performing this defect level projection correction was recently described and implemented in the open-source Python code Pawpyseed, as detailed by Bystrom et al. 70 . This correction, along with the basic scissor operator approach, was considered in the analysis of this benchmark study but did not substantially improve results relative to the other corrections. We therefore restrict analysis in this manuscript to results produced from the correction schemes outlined in Table 1.
It is worth noting that the delocalization of free carriers is exacerbated by the underestimation of the band gap, and the free carrier shift methodology neglects many important contributions to the formation energy. For example, the relaxation energy associated with charge localization around a defect often requires a higher level of theory to accurately capture. The localization of charge around a defect is a major reason for justifying the use of more computationally intensive methods such as hybrid functionals. Further, the ability to produce a "fully ionized defect" (a defect which has not captured any extra charge from the bulk) proves difficult to achieve with semi-local functional approaches -particularly in systems with smaller band gaps (<2 eV). As of yet, no reliable methodology exists for further correcting semi-local calculations to account for these additional errors associated with charge delocalization, and they are a source for some of the remaining errors relative to the benchmark results reported below.
Benchmark data set For this work, we benchmark defect properties calculated by GGA, with the correction schemes described earlier, against a set of 245 previously published "gold standard" defect calculations carried out with careful implementation of hybrid functionals [48][49][50][51][52][53][54][55][56][57] . The chemistry and types of defects explored in the benchmark set are displayed in Fig. 3.
For the remainder of this Section, we present and analyze the differences between hybrid and GGA-PBE-based results (referred to in what follows as "errors" associated with the GGA-based approaches) for (i) transition levels, (ii) formation energies, (iii) Fermi levels, and (iv) dopability limits. For the calculations of properties in (ii)-(iv), a choice for the chemical potentials was required. To ensure a consistent comparison, these quantities were calculated from the phase diagram produced by the same level of theory; i.e., using the same PAW potentials and exchange correlations as the defect calculations. Limiting facets of the GGA and hybrid phase diagrams were chosen to be consistent with that of previously published work [48][49][50][51][52][53][54][55][56][57] . The Supplementary Information (Supplementary Table 2) lists the facets used to compute the chemical potentials for each bulk system in this study.
For all four properties (i)-(iv) considered in this work, we analyze the errors resulting from the different correction combinations outlined in Table 1 and described in detail in the previous subsection. For each system considered, we analyze the same set of defects and charge states for both hybrid and GGA. To compute hybrid formation energies and their associated charge corrections (i.e., image corrections and potential alignment using the method by Freysoldt et al.), raw data for every hybrid calculation was inserted into the defect database structure described in the Supplementary Information. After successful integration into the database, the workflow infrastructure was used to compute formation energy corrections and other relevant defect phase diagram information.
The primary motivation of this work is to benchmark the performance of automated, semi-local point defect calculations. For automation purposes, a single hybrid functional approach was chosen-namely, the choice of 25% exact exchange as implemented by Heyd, Scuseria, and Ernzerhof (HSE06) 25 -which has been shown to balance computational efficiency with significant improvements in thermochemical results relative to semi-local DFT 71 . As a result of the different (non-HSE06) hybrid implementations that exist within the benchmark set, some differences exist between the computed band gaps of the previously published data and those generated by the automated defect workflow. Details on computed band gaps, as well as the hybrid version which was used in each bulk system in previously published data, is given in the Supplementary Information (Supplementary  Table 1). While the automation workflow has the ability to rerun defect calculations in larger supercells, in this study we chose to only run supercell sizes which contained no more than 300 atoms. In addition, no local perturbations around the defect site were introduced at the beginning of the semi-local DFT relaxation. Symmetry was also left on during the relaxation process to reduce the computational cost. The hybrid calculations were performed with more intelligent treatment of local symmetry, as detailed in the previously published literature [48][49][50][51][52][53][54][55][56][57] . As noted later in this work, these limitations make it difficult to distinguish errors which are artifacts of the calculation setup (local structural initialization or supercell size) from errors which are due to the intrinsic limitations of the GGA-PBE functional approach. Improvements in automated, semi-local GGA-PBE through the introduction of local perturbations or increased supercell sizes should be investigated in future work. In the discussion that follows, we make a point of calling attention to errors which may be improved with such approaches.
Thermodynamic transition levels Figure 4 shows the differences in calculated thermodynamic transition levels (Eq. (2)) between the GGA and hybrid calculations. The no_bes correction has the largest error of the three corrections considered, with a mean absolute error (MAE) of 0.7 eV and a mean error of -0.6 eV. The bes and bes_free corrections improve performance relative to no_bes, with mean errors within ±0.2 eV and MAEs of 0.4 eV. For these two corrections, the largest errors mostly occur for transition levels lying outside the band gap: Out of 301 transition levels lying inside the gap as hybrid calculations, 88% of the bes-corrected transition levels have absolute errors <0.5 eV, while only 23% of the 105 outside-of-gap transition levels fall within this error range. The errors are on average negative for the uncorrected transition levels, implying that GGA tends to underestimate the transition levels. The bes correction has the tendency to reduce the magnitude of the errors and increase their values such that the average error becomes slightly positive. The bes_free correction tends to shift the errors to slightly more negative values relative to bes.  While quantitative assessments are useful for comparing directly with experiment, it can be useful for the purpose of property screening to consider the more qualitative classification of the transition levels as being deep or shallow, related to whether they would be expected to be problematic or not for optoelectronic applications. To assess the ability of highthroughput semi-local approaches for screening in this qualitative manner, we graphically display the success of the three correction schemes at predicting deep and shallow levels in Fig. 5. This information is displayed in the style of a "confusion matrix", where "True" values are those determined by the hybrid calculations. For example, the upper left hand number in Fig. 5 shows the no_bes correction successfully predicted 96.6% of the shallow transition levels near the conduction band. For this analysis, a transition level in the window of 25-75% of the gap is considered a "deep" level, and anything outside of that window is considered a "shallow" level. Shallow defects can be associated with the conduction band (shown in blue) or with the valence band (shown in orange). For each correction scheme, the left side shows the number (and percentage) that are predicted True, while the right side shows False predictions relative to the hybrid calculation. Overall success rates for the no_bes, bes, and bes_free corrections are 87.3%, 79.4%, and 85.8%, respectively. For estimating both shallow and deep transition levels, the no_bes correction performs best, with a total success rate of 87.3%, while the bes correction improves at predicting deep levels, with a 89.3% success rate.
For classifying the levels as shallow or deep, the bes correction showed improved prediction success for deep levels. This is consistent with the fact that hybrid calculations often achieve larger local relaxation than semi-local calculations (resulting in deeper transition levels) and suggests that this relaxation error can be partially accounted for by opening up the band gap with a band edge shift. Upon incorporating a free carrier shift to the band edge shift correction (bes_free), the number of correctly predicted shallow levels is increased, but the number of correctly predicted deep levels is reduced. The net result is a success rate of 85.8% for the bes_free correction, which is an improvement relative to the bes correction's 79.4% total success rate, but a decrease relative to the no_bes success rate. This suggests that the free carrier shift improves the description of a shallow level defect after a band edge shift, but the shift is less helpful when applied to a deep level defect. Figure 6 displays the error distribution for formation energies with a Fermi level set equal to the VBM-a value that was chosen to maintain consistency across all systems. Note that the "true" Fermi level is fixed by charge neutrality after consideration of all defects present. The chemical potentials were set to be comparable between the hybrid and semi-local computations as described earlier. The MAE in formation energy decreases from 1.58 to 1.24 and 0.98 eV for the no_bes, bes, and bes_free corrections, respectively. The distribution of errors also skew less negative with increasing correction complexity, with the respective mean errors being -0.95, -0.84, and -0.39 eV.

Formation energies
Many of the errors in formation energies shown in Fig. 6 are very large compared to thermal energies and would thus lead to several orders of magnitude inaccuracies in calculations of the corresponding equilibrium defect concentrations. It is thus important to analyze these results further with an emphasis on understanding the reliability of predictions for low-energy defects,  as these have the highest concentrations and dominate in determining the Fermi level and carrier concentrations. For this analysis, we consider the "lower hull" of stable defects (from the hybrid functional benchmark results) which comprise the set of charged defects which are stable at any value of the Fermi level as it varies across the band gap.
When considering formation energy errors for the lower hull of stable defects, the semi-local functional correctly accounts for the stability of defects lying on the lower hull with success rates of 90%, 97.5%, and 90% for the no_bes, bes and bes_free corrections, respectively. Importantly, even in cases where the semi-local functionals do not predict the correct dominant defect, those defects that are predicted to be on the lower hull in the hybrid calculations all lie within 0.1 eV of the lower hull in the semi-local calculations with the bes correction. For the no_bes and bes_free corrections, an additional four defects lie in the range of 0.35-0.6 eV above the hull. To illustrate this analysis, Fig. 7 compares the energy above hull for each defect calculated with GGA (x-axis) and hybrid (y-axis) functionals. Each datapoint represents a single defect type in a specific charge state, with coloring according to the absolute formation energy error. While Fig. 7 only displays data for bes corrected defects, the results are qualitatively similar for all three corrections analyzed in this work. Data points lying to the right of the red dashed line have a larger distance to the lower defect hull in GGA than with hybrid. As stated earlier, the bes correction method produces above lowerhull values of <0.1 eV for all defects that lie on the lower hull in the hybrid calculation (y = 0). It is also noteworthy that larger absolute formation energy errors are produced at larger above-hull values (where the defects are less likely to influence the defect concentrations).
Overall, the results in this section suggest that while formation energies calculated from semi-local functionals can yield large errors that are only partially improved by the use of the different correction schemes, they can still yield information that is valuable for materials screening. Specifically, there is high success at predicting those defects that lie on or near the lower formation energy hull. Thus, if a higher level of accuracy is desired for a set of defects, an initial screening with GGA can be reliably followed up with calculations of lower-energy defects at a higher level of theory. For the defects in this study, if GGA were used to screen for defects which come within 0.5 eV of the ground hull, then the reduction in the number of the calculations would be 52%, 49%, and 56%, for the no_bes, bes, and bes_free corrections, respectively. By avoiding excessive screening with higher levels of theory, this speeds up the process of screening for electronic material properties of interest.

Fermi levels
Fermi levels were computed using the charge neutrality condition given by Eq. (3), at room temperature (300 K). While this work only focuses on a select set of defects for each bulk system, the true Fermi level depends on many factors, including the potential for multiple extrinsic "impurities" (i.e., defects that are not intrinsic), and the possibility of non-equilibrium growth processes leading to defect concentrations that differ from their equilibrium thermodynamic values. Regardless, the use of identical defects and identical thermodynamic conditions in the present benchmarking analysis provides the framework for an appropriate comparison between the predictions of hybrid and semi-local approaches. Specifically, the prescribed set of chemical potential values and a fixed set of possible defects suffice to define a value of the Fermi energy from the calculated formation energies, even if this value differs from that of a real material grown under non-equilibrium conditions. We choose to omit systems which have only one defect in consideration, given that a single defect is insufficient to adequately compute the Fermi level. We also chose to omit systems with negligible carrier concentrations and without a clear pinning point for two defects to charge compensate each other. Such systems result in a Fermi level that is primarily dictated by Fig. 7 Minimum distance of defect formation energy from lower hull. Comparison of the minimum distance from the formation energy of a defect to the lower hull of stable charged defects (the set of lowest formation energies for all values of the Fermi energy across the gap), as calculated by hybrid functionals (y-axis) versus GGA-PBE functionals with the bes correction scheme. Each datapoint corresponds to a single defect type in a single charge state, and the coloring corresponds to errors in formation energy with the Fermi level set at the VBM. The red dashed line corresponds to equality between the hybrid and GGA-PBE bes-corrected results. (y = x line). For all three correction schemes, the errors in the scaled Fermi energies are confined to a window of ± 21%, with mean absolute errors of 5.5%, 6.5%, and 5.8% for the no_bes, bes, and bes_free corrections, respectively. The primary outlier for the no_bes correction is TiO 2 , which has an error of −21% due to the high formation energy for the 2+ oxygen vacancy. This formation energy was subsequently lowered upon the application of the bes and bes_free correction schemes, which act to renormalize the values of the Fermi level (as discussed in Section 2.2); these corrections reduce the error to -14% and -9% for the bes and bes_free corrections, respectively.
The largest outlier observed for the results involving Fermi level renormalized corrections was BaZrO 3 , with an error of 15% for both the bes and bes_free corrections. The Fermi level pinning caused by the oxygen vacancy and the Sc on Zr substitution under the no_bes correction was not substantially modified, on an absolute scale, upon Fermi level renormalization. Therefore, the application of an asymmetric shift of the CBM by 2.5 eV more than the VBM resulted in a Fermi level pinning in the middle of the gap. This suggests a limitation in the band edge shift approach where key features of the electronic structure (e.g., hybridization within the valence band and the resulting coupling with defects) are more drastically impacted by the choice of functional.
By classifying hybrid-determined Fermi levels in the upper half of the gap as n-type and hybrid-determined Fermi levels in the lower half of the gap as p-type, a general trend emerges that the results derived from semi-local functionals have more negative errors for n-type systems and more positive errors for p-type systems (although we acknowledge that there is only one p-type material in this data set). This trend holds for all of the semi-local functional results that made use of corrections involving Fermi level renormalization (i.e., bes and bes_free corrections) and indicates that Fermi levels determined by these methods are closer to the middle of the gap than those predicted by hybrid calculations. While future studies with expanded datasets should continue to explore the systematic skewing of the Fermi level toward the middle of the gap for semi-local calculations with band edge shifts, this observed correlation suggests that Fermi levels lying closer to the band edges may be more reliable than those calculated in the middle of the gap. This has interesting implications for future high-throughput studies and should be investigated further, given the limited size of the Fermi level dataset considered in this work.
When considering calculations of the Fermi level in a highthroughput study, it is useful to consider the determination of dominant carrier type (i.e., n-type or p-type). Given this, it is noteworthy that all three correction schemes for the semi-local functional results are nearly perfect in predicting the carrier types for the 8 materials considered. The one exception is CaZrO 3 , which has a hybrid Fermi level located 55% of the way across the gap and is therefore predicted to be n-type, while the no_bes Fermi level is 46% across the gap and predicted to be p-type. This value is moved to the upper half of the gap with the application of the bes and bes_free correction schemes. Overall, the success in the classification of carrier types provides motivation for the use of corrections involving band-egde shifting within future highthroughput studies.

Dopability limits
Dopability limits, defined by the values of the Fermi level at which formation energies become negative, can provide an indication of the dopability of a material. The upper dopability limit would prevent the Fermi level from getting closer to the CBM (an electron killer), while the lower dopability limit would prevent the Fermi level from getting closer to the VBM (a hole killer). Errors for 22 (21) upper (lower) dopability limits are displayed in Fig. 9. As for the formation energies and Fermi level results, the comparison between hybrid and semi-local functional dopability is done with a specific set of chemical potentials as described earlier. For the upper dopability limit, errors generally improve with increasing correction complexity, with average error values of −1.04, −0.33, and −0.31 eV for the no_bes, bes, and bes_free corrections, respectively. For the lower dopability limit, errors for the no_bes correction are spread uniformly around zero, resulting in a relatively small average value of −0.17 eV, but a much larger value of 0.65 eV for the MAE. The application of Fermi level renormalization through the bes correction results in a positive shift in the average error to 0.55 eV and an MAE of 0.66 eV. The application of the bes_free correction results in the smallest errors, with an average error of −0.09 eV and an MAE of 0.44 eV. Overall, the bes_free correction improves error relative to no_bes, increasing the number of systems within 0.2 eV of the correct upper-limit (lower-limit) dopability from 1 (3) to 8 (9). Similar improvements were made with the bes correction, but with a wider variation in errors for the lower-limit dopability.
As in previous sections, it is worthwhile to consider the extent to which semi-local functionals can be used for qualitative highthroughput screening of dopability. The purpose of dopability consideration in a high-throughput study would likely be to screen for materials which can or cannot be doped with a certain carrier type. The lower dopability limit produces hole killers which limit the possibility of p-type doping, while the upper dopability limit produces electron killers which limit the possibility for n-type doping. Using the metric that any dopability limit within 10% of the band gap away from the band edges is a free carrier-killer, Fig. 10 displays graphical confusion matrices for deciding whether it is possible to dope a system n-type or p-type for the upper and lower limit dopabilities, respectively. Overall, the three correction schemes or semi-local functionals perform identically, with a total success rate of 90.7%, for all three corrections.
A noteworthy limitation of this dataset is an overreliance on cation-site defects, which frequently act as electron killers and have lower formation energies close to the CBM. With this in mind, it is worth noting that prediction of p-type behavior displayed in Fig. 10 is generally less ambiguous across the corrections implemented, as a result of there being fewer defects with low formation energies near the VBM. Future studies with similar goals to this work should aim to include more anion-based or "hole killer" defects, to avoid this issue. Despite this limitation, it is worth calling attention to the general success of correctly predicting defects that prevent doping (true positives for "Not Dopable" prediction in both n-type and p-type cases) for every correction method. This suggests that an effective strategy for highthroughput screening could be to use semi-local calculations to filter systems which are clearly not dopable, with hybrid functional approaches being used to further cull the list of available candidates for an application of interest. Such an approach is a subject for future work with a more comprehensive set of defects represented. Overall, the general success of all three correction methods provides motivation for future high-throughput screening studies of bulk materials with the potential for doping.
Insights for high-throughput computation of point defects High-throughput computation for materials discovery is a relatively nascent field which endeavors to speed up timeconsuming growth and characterization experiments in efforts to discover and design materials with targeted combinations of properties for a given envisioned technological application. In the context of point defect calculations in semiconductors and insulators, valuable information about the dominant type of ionic defect and electronic carrier type, as well as the extent to which their concentrations can be altered by extrinsic doping, can be gleaned from DFT-based calculations. These defect quantities are critically important for controlling optoelectronic, magnetic, ionic transport, and thermodynamic properties. Nevertheless, despite their importance for materials discovery and design, no large-scale databases of high-throughput calculated defect properties have been developed to date. This is likely due to the computational cost of hybrid functional approaches, which are frequently used in "low-throughput" for a single material system, due to their reduction of band gap and self-interaction errors relative to semi-local DFT functionals.
To assess the degree to which these more computationally efficient semi-local functionals can be used in the context of highthroughput screening of defect properties, this work presented a fully automated workflow that was used to benchmark the performance of defect calculations based on semi-local DFT with three different sets of a-posteriori corrections (no_bes, bes and bes_free). The accuracy of each approach was assessed by comparing the semi-local DFT predictions to a benchmark set of 245 previously published hybrid calculations for four different classes of defect properties: thermodynamic transition levels, formation energies, Fermi levels, and dopability limits. For several of these quantities, statistically similar errors relative to hybrid values were obtained from the semi-local functionals, independent of the correction scheme. However, strong performance and consistency across all four defect quantities was achieved with the bes_free scheme.
In Table 2, we summarize the results of bes_free-corrected semilocal DFT calculations of point defect properties compared to the benchmark hybrid functional results. These results show an average error for all quantities ranging from 0.2-0.4 eV, with a tighter uncertainty (given by the standard deviation of errors) for transition levels (±0.3 eV) and a wider spread for errors in Fig. 10 Graphical confusion matrices for potential doping as determined by the dopability limit. Possible p-type doping is determined from a lower dopability limit of <0.1 of the band gap, while possible n-type doping is determined by an upper dopability limit >0.9 of the band gap. For each correction method, the number (and percentage) of True/False predictions for dopability, relative to the hybrid prediction is shown for n-type dopability (allowing Fermi levels closer to the conduction band, in blue) and p-type dopability (allowing Fermi levels closer to the valence band, in orange). formation energies (±1.9 eV). Uncertainties for the Fermi level and dopability limits are not reported, given the limited size of the data set. For qualitative results, partial success was demonstrated for transition levels, with a 86% classification success rate for determining whether a level was shallow or deep. Near perfect qualitative success was shown for stable defects across the gap and for the classification of Fermi levels as p-type or n-type. Categorization of the potential dopability of a material also demonstrated an 91% success rate. Given these results, semi-local defect calculations with band edge shifting corrections, performed using fully automated, highthroughput workflows show promise for initial screening over defect properties. In particular, the results of this study suggest semi-local defect calculations with a-posteriori corrections are successful in qualitatively describing whether a defect is characterized by deep vs. shallow transition levels, the dominant defect and carrier (p or n) type, and dopability limits. With this in mind, it is possible to imagine an efficient screening procedure whereby semi-local functionals with a-posteriori corrections can be used to assess defects and charge states of interest within a chemical composition, and then can be followed up with more computationally intensive hybrid approaches at a higher level of accuracy. While such an approach shows promise, we emphasize that it may miss important exceptions resulting from nonequilibrium growth processes and/or from inaccuracies in the calculations, such as severe deficiencies in the underlying electronic structure of the host material.
Future work remains to be done on understanding the quantitative performance of semi-local functional calculations of defect properties -especially for reducing formation energy errors. For example, empirically-derived relaxation energy corrections for delocalized defect states may prove useful for reducing errors in formation energies and thermodynamic transition levels in the future. Given the size of the test set considered here, it is possible that additional classification or screening schemes for reducing error and identifying "delocalization" can be derived. Overall, we hope this study proves to be a useful first step in future large-scale studies looking to improve the potential for high-throughput DFT screening procedures to be performed on point defect properties.

METHODS
Several tools have been developed recently to aid in the automation of defect calculations [17][18][19] . This includes the Python Charged Defect Toolkit (PyCDT) produced by some of the authors of this manuscript 17 . Despite the work done to improve the setup and analysis stages of performing defect calculations, there is yet to be a demonstration of a fully automated workflow that is widely distributed and used by the community. To this end, we have merged the essential functionalities of PyCDT into the pymatgen 72 , atomate 73 and maggma 74 code bases to produce a fully automated defect workflow which is compatible with the Materials Project infrastructure 75 . Details on the workflow and database design are outlined in the Supplementary Information.
All DFT calculations in this study were performed in a planewave basis set using the Projector-Augmented Wave (PAW) method 76 , as implemented in the Vienna Ab-initio Simulation Package (VASP) 77,78 . For the automated semi-local defect calculations, we make use of the Perdew-Burke-Ernzerhof generalized gradient approximation (GGA-PBE) 23 , a plane-wave cutoff of 520 eV, and a gamma-centered k-point grid with a density of 100 k-points per Å −3 . This k-point density is twice that used to perform converged structural relaxations in the standard Materials Project infrastructure 75 . All calculations were performed spinpolarized. The positions of the atoms were relaxed within the supercell, the overall shape and volume of which was fixed at values dictated by the reference bulk (non-defected) crystal structure. The structural relaxations were performed with an energy tolerance of 10 −3 eV. In all calculations, the tolerance for the electronic self-consistency was 10 −4 eV. For the computational details of all hybrid calculations, the reader is referred to the previously published work from which the benchmark data is gathered [48][49][50][51][52][53][54][55][56][57] .
Point defect formation energy formalism details, as well as details regarding the chemistries and defects analyzed, were outlined in the previous section.