Machine learning the Hubbard U parameter in DFT+U using Bayesian optimization

Within density functional theory (DFT), adding a Hubbard U correction can mitigate some of the deficiencies of local and semi-local exchange-correlation functionals, while maintaining computational efficiency. However, the accuracy of DFT+U largely depends on the chosen Hubbard U values. We propose an approach to determining the optimal U parameters for a given material by machine learning. The Bayesian optimization (BO) algorithm is used with an objective function formulated to reproduce the band structures produced by more accurate hybrid functionals. This approach is demonstrated for transition metal oxides, europium chalcogenides, and narrow-gap semiconductors. The band structures obtained using the BO U values are in agreement with hybrid functional results. Additionally, comparison to the linear response (LR) approach to determining U demonstrates that the BO method is superior.


INTRODUCTION
Density functional theory (DFT) is the work horse of electronic structure simulations. In particular, semi-local exchange-correlation functionals, such as the generalized gradient approximation (GGA) of Perdew, Burke, and Ernzerhof (PBE) 1,2 are widely used for high-throughput materials discovery efforts [3][4][5] . However, due to the self-interaction error (SIE), local and semi-local functionals systematically underestimate band gaps, occasionally to the extent that semiconductors are erroneously predicted to be metallic 6,7 . One way to mitigate SIE is including a fraction of excact (Fock) exchange in a hybrid functional 8,9 , such as the Heyd-Scuseria-Ernzerhof (HSE) functional 10,11 . Hybrid functionals produce improved band gaps; However, their relatively high computational cost may be prohibitive for large systems, such as interface models containing several hundred atoms, and/or for screening a large number of materials.
The DFT+U method is an alternative approach, first introduced by Anisimov et al. 12 and further developed by Dudarev et al. 7 A Hubbard-like model is adopted to correct the self-interaction error as follows: where n is the atomic-orbital occupation number, m is the orbital momentum, σ is a spin index, U represents the on-site Coulomb repulsion, and J represents the exchange interaction. The exchange interaction may be incorporated into the Coulomb term to define the effective Hubbard U as U eff = U − J 7 . The accuracy of DFT+U calculations hinges on the choice of the system dependent parameter, U eff . Often, U eff is determined empirically by searching for values that reproduce experimental results, such as the band gap of a given material. The empirical approach will inevitably fail if no experimental data are available, which is frequently the case in materials discovery efforts. Several approaches have been proposed for determining the U parameter from first principles 13 .
One popular approach is the linear response (LR) method, based on constrained DFT (CDFT) proposed by Cococcioni and de Gironcoli 14 . In this approach, linear behavior of the total energy with respect to the occupation number is imposed to correct the unphysical curvature of local and semi-local functionals. The effective U parameter is then given by the difference between the inverse non-interacting density response, χ À1 0 , and the inverse interacting density response, χ −1 , which correspond, respectively, to the second derivatives of the non-charge-self-consistent DFT energy, E, and the charge-self-consistent DFT energy, E KS , with respect to the localized occupation of a single site, q I : To simulate the variation of occupations in an infinite crystal, a super-cell is constructed. The size of the super-cell is increased until the value of U eff converges, which may result in a significant computational cost. A modified self-consistent formulation of LR has been proposed by Kulik et al. 15 . Another method for determining the Hubbard U parameter from first principles is the unrestricted Hartree-Fock (UHF) approach proposed by Mosey et al. 16,17 . Within this approach, UHF calculations are performed for an electrostatically embedded finite-sized cluster to simulate the bulk material. Similar to the super-cell size in the LR method, the U parameters have to be converged with increasing cluster size. A third approach is the constrained random-phase approximation (cRPA) [18][19][20] , which is significantly more computationally expensive.
Here, we propose an approach for determining U eff based on Bayesian optimization (BO). To demonstrate the performance of our approach, we have chosen three classes of materials, for which semi-local functionals are known to perform poorly: transition metal monoxides, europium chalcogenides, and narrow-gap semiconductors. The results and the computational cost of GGA +U BO are also compared to the LR method for determining U eff . We show that for materials of all three classes, BO with a welldesigned objective function produces band structures of comparable quality to a hybrid functional. The band structures obtained using GGA+U BO are either similar to or better than those obtained with GGA+U LR . In all cases BO is more computationally efficient than LR.

RESULTS AND DISCUSSION
Bayesian optimization BO 21 is a machine learning algorithm that performs global optimization of a black box function by guessing the shape of the function and then iteratively improving it by sequentially sampling points that are promising and/or have high information content. A Bayesian statistical model is used to emulate the objective function. Gaussian process (see additional information in the Supplementary Discussion) is a common choice for BO algorithms 22 because it also quantifies the uncertainty associated with each prediction. Future function evaluations are decided by an acquisition function 23 . BO is superior to grid search because it utilizes acquired data to make informed sampling decisions, thus requiring fewer expensive function evaluations (e.g., DFT calculations). The successful application of Bayesian optimization requires an appropriate objective function. Here, the surrogate objective function, f ð U ! Þ, is formulated such that its maximum corresponds to the U eff values that best reproduce the band gap, E g , and the band structure obtained from HSE: ; U n is the vector of U eff values applied to different atomic species and U i ∈ [−10, 10] eV. ΔBand is defined similarly to ref. 24 as the mean squared error of the PBE+U band structure with respect to HSE: N E represents the total number of eigenvalues, ϵ, included in the comparison, N k is the number of k-points, and N b is the number of bands selected for comparison. To avoid double counting the band gap difference in the calculation of ΔBand, the valence band maximum (VBM) and conduction band minimum (CBM) are shifted to zero for both the PBE+U and HSE band structures. Hence, ΔBand captures differences in the qualitative features of the band structures produced by PBE+U vs. HSE, independently of the difference in the band gap. Hybrid functionals have been used in the past as a reference for DFT+U 25 . Here, the HSE functional has been chosen as the reference thanks to its well-established accuracy for various materials [26][27][28] . The method can be easily adapted to use any other reference band structure, including different hybrid functionals or many-body perturbation theory. The coefficients α 1 and α 2 may be used to assign different weights to the band gap vs. the band structure. We set α 1 = 0.25 and α 2 = 0.75 as default. Additional analysis of the sensitivity of the U parameters and the resulting band structures to the choice of α 1 and α 2 is provided in the Supplementary Discussion. BO is applied to maximize the objective function, as illustrated in Fig. 1. To initialize the calculation, the geometry information and input settings of VASP are required. Gaussian process with the radial basis function kernel (see additional information in the Supplementary Discussion) is used as the statistical model to fit the objective function 29 . The Gaussian process defines a mean, μ, and a standard deviation, σ, for every point, U ! , and updates those parameters from the training data in each iteration 29 . The upper confidence bound (UCB) 29 acquisition function is used to predict the value that would be generated by evaluation of the objective function at a new point and decide what value of U ! to sample in the n th iteration: The hyperparameter κ controls the trade-off between exploration and exploitation. Here, we set κ = 1. In each iteration VASP is called to evaluate f ð U n ! Þ. The posterior probability distribution and acquisition function are updated until the maximal number of iterations, N, is reached. Then, the code outputs the value of U ! that maximizes f ð U ! Þ. The reference HSE calculation is performed only once. The total computational cost of BO amounts to performing one HSE calculation and N PBE+U calculations. The computational cost of updating the BO posterior probability distribution and acquisition function in each iteration is negligible compared to the cost of DFT calculations. Because N is typically small and all calculations are performed for one unit cell (as opposed to a supercell or a large cluster), the computational cost of determining U eff by BO is often lower than that of determining U eff by the aforementioned first principles methods. Below, we demonstrate the performance of PBE+U BO for NiO, EuTe, and InAs. In the Supplementary Discussion, we provide additional examples for EuS, InP, InSb, GaSb, Ge, NiO, MnO, FeO, and CoO. The LR method of determining U eff is used for comparison. We note that all DFT+U results presented here are based on the implementation of the Dudarev formalism in VASP. Different DFT+U implementations may yield different results 30 .
Transition metal oxide: NiO Transition metal oxides are among the materials most often studied with DFT+U 15 . In particular, NiO and other transition metal monoxides have been shown to be poorly described by the PBE functional because of their strongly correlated d electrons [31][32][33][34] . Our PBE result shown in Fig. 2a is no exception. The PBE band gap of 0.73 eV is considerably underestimated compared to the HSE result of 4.26 eV shown in Fig. 2b. The latter is close to the experimental band gap of 4.0-4.3 eV 35,36 . For both PBE and HSE, the valence band maximum (VBM) is located at point T. However, there are qualitative differences in the structure of the valence band and the location of the conduction band minimum (CBM). The contribution of different states to the PBE band structure, shown in Fig. 2a, suggests that the reason could be that the Ni 3d bands are located below the Ni 4s bands leading to an inverted band ordering at the Γ point, which should have been the CBM. Thus, we start by applying the Hubbard U correction to the Ni d states. Based on the HSE band structure, we included the top 10 valence bands and the bottom 4 conduction bands in the optimization. As shown in Fig. 3, the BO algorithm converges within 13 iterations and finds the optimum at U Ni;d eff = 6.8 eV. This leads to rearrangement of the top valence bands and moves the first two conduction bands upward, as shown in Fig. 2c. These changes increase the gap to 3.36 eV and correct the position of the CBM. In comparison, the LR method produces a U Ni;d eff value of 5.4 eV, which yields a similar band structure with a somewhat smaller gap of 3.20 eV and the CBM incorrectly located between T and K, as shown in Fig. 2d.
Although applying a Hubbard U correction to the Ni d states leads to a significant improvement over pure PBE, some residual differences between the PBE+U and HSE results may be attributed to the non-negligible contribution of the oxygen 2p states, as shown in Fig. 2a 37 . Therefore, a two-dimensional (2D) BO was performed. As illustrated in Fig. 4a, it results in Hubbard U values of U Ni;d eff = 5.9 eV and U O;p eff = 9.4 eV after 55 iterations. This yields a gap of 3.70 eV and a band structure in closer agreement with HSE, as shown in Fig. 2e. Another indication for the closer agreement with HSE is an increase of the objective function from −0.37 to −0.11 eV 2 . In comparison, LR produces a Hubbard U value of U O;p eff = 8.3 eV for the O 2p states. With the two-parameter LR, the band gap increases to 3.53 eV and the CBM is in the right position, as shown in Fig. 2f. For NiO, the accuracy of the BO and LR methods for determining U is similar, however the BO method is more efficient. For one-parameter optimization on the same number of CPU cores, LR with a 3 × 3 × 3 super-cell takes about 4.5 times longer than BO, as detailed in Supplementary Table 3. When two parameters are considered, LR is more computationally expensive than BO by a factor of eight.
Europium chalcogenide: EuTe In addition to the d-block transition metals, f-block elements are considered as strongly correlated and are often treated with DFT +U 38,39 . EuTe, a ferromagnetic semiconductor with a gap of 2.0 eV (refs. 40,41 ), is a representative example. As shown in Fig. 5a, the PBE functional erroneously predicts EuTe to be metallic. Analysis of the contributions of different orbitals shows that the conduction bands are mostly formed by the d and s states of Eu. The 7 valence bands closest to the Fermi level are formed by the f states of Eu. The p states of Te are found in a separate manifold below the Eu-4f derived bands [42][43][44] . The vanishing gap can be attributed to the overlap between bands dominated by the 4f and 5d states of Eu. The HSE calculation, shown in Fig. 5b, produces an indirect band gap of 1.24 eV with the VBM located at the Γ point.
BO was performed with the Hubbard U correction applied to the 4f orbitals of Eu. 20 eigenvalues around the Fermi level (10 above and 10 below) were included in the optimization. The resulting band structure with U Eu;f eff = 7.1 eV is plotted in Fig. 5c. The Hubbard U correction pushes the Eu 4f bands away from the Fermi level and opens a band gap of 0.71 eV. The PBE+U BO calculation reproduces the qualitative features of HSE band structure. In comparison, the LR method produces a value of U Eu;f eff = 5.5 eV. As shown in Fig. 5d, this results in a somewhat smaller band gap of 0.56 eV. In this case, BO also produces similar results to LR at a fraction of the computational cost. The total amount of CPU time required for LR with a 3 × 3 × 3 super-cell is higher than BO by a factor of 9, as shown in Supplementary Table  3. Based on Fig. 5a, the d states of Eu contribute significantly to the bottom conduction bands. Therefore, it may possible to further reduce the difference between the HSE and PBE+U results by considering the d orbital of Eu (refs. 45,46 ) in 2D BO. However, applying U corrections to two orbitals of the same element is not implemented in VASP.

Narrow gap semiconductor: InAs
Although all the examples discussed so far involve strongly correlated electrons, SIE also manifests in PBE calculations of narrow-gap semiconductors [47][48][49] . The band structure of InAs, shown in Fig. 6a, is a representative example, where PBE produces no band gap 50 . HSE, shown in Fig 6b, gives a band gap of 0.37 eV, which is comparable to the experimental gap of 0.417 eV 51,52 . We find that conducting BO with a single U eff applied to the p orbitals of either In or As yields no band gap. Therefore, 2D BO is performed to optimize U In;5p eff and U As;4p eff simultaneously. The resulting band structure with U In;5p eff = −0.5 eV and U As;4p eff = −7.5 eV is shown in Fig. 6c. PBE+U BO produces a band gap of 0.31 eV and a band structure in good agreement with HSE. Negative U eff values are theoretically permissible when the exchange term, J, is larger than the on-site Coulomb repulsion, U, as suggested in refs. [53][54][55][56][57] . It has been argued that negative values of U eff are appropriate for delocalized states, such as the In s and As p states, where the exchange-correlation hole is  overestimated by GGA. In comparison, the LR method, which does not permit negative values of U eff , yields U In;5p eff = 0.7 eV and U As;4p eff = 3.3 eV. This produces a band structure with no gap, as shown in Fig. 6d. Thus, for InAs BO is not only more efficient, but also more accurate than LR. In the Supplementary Discussion, we further demonstrate the transferability of the U values found by BO for bulk InAs to a slab of InAs with 11 atomic layers.
In summary, we have developed a method of determining the optimal Hubbard U parameter in DFT+U by using the Bayesian optimization machine learning algorithm. The objective function was formulated to reproduce as closely as possible the band gap and the qualitative features of the band structure obtained with a hybrid functional. We have demonstrated robust performance for several materials, including transition metal oxides, Eu chalcogenides, and narrow-gap semiconductors. PBE+U BO consistently produces band structures comparable to HSE. Furthermore, BO is more efficient than the linear response method and performs better, particularly in cases that call for negative values of U eff . Based on this, we conclude that PBE+U BO can provide the accuracy of a hybrid functional at the computational cost of a semi-local functional. This may enable conducting simulations for larger systems, such as surfaces and interfaces, which would be unfeasible with a hybrid functional.

Computational details
All DFT calculations were performed using the Vienna Ab Initio Simulation Package (VASP) code with the projector augmented wave (PAW) method [58][59][60] . Spin-orbit coupling (SOC) was included in the calculations of transition metal oxides, Eu chalcogenides, and narrow-gap semiconductors 61 . Details of the lattice parameter and energy cutoff used for each compound are provided in Supplementary Table 1. The Brillouin zone was sampled using an 8 × 8 × 8 k-point grid for PBE and PBE+U calculations and 6 × 6 × 6 for HSE calculations. The coordinates of the high-symmetry k-points used for plotting the band structures are provided in Supplementary Table 2.

Linear response calculations
For LR calculations, a small potential was applied to the target state of a single site. Non-charge-self-consistent and charge-self-consistent calculations of the state occupation were performed as the potential was varied from −0.04 eV to 0.04 eV in increments of 0.02 eV. The derivatives of the occupation with respect to the potential give the non-interacting and interacting response matrices, used in Eq. 2 to calculate U eff (see also Yang et al. 62 ). To avoid interactions between periodic images of the perturbed atom, a 3 × 3 × 3 super-cell was constructed.

DATA AVAILABILITY
Data will be available from the corresponding author upon reasonable request.    6 Band structures of InAs obtained using different methods. a PBE with the projected contributions of In p states, In s states, and As p states colored in green, red, and yellow, respectively; b HSE; c PBE+U with U In;p eff and U As;p eff from BO; d PBE+U with U In;p eff and U As;p eff from LR. The VBM is referenced to 0 eV.
M. Yu et al.