Exploring DFT$+U$ parameter space with a Bayesian calibration assisted by Markov chain Monte Carlo sampling

Density-functional theory is widely used to predict the physical properties of materials. However, it usually fails for strongly correlated materials. A popular solution is to use the Hubbard corrections to treat strongly correlated electronic states. Unfortunately, the exact values of the Hubbard $U$ and $J$ parameters are initially unknown, and they can vary from one material to another. In this semi-empirical study, we explore the $U$ and $J$ parameter space of a group of iron-based compounds to simultaneously improve the prediction of physical properties (volume, magnetic moment, and bandgap). We used a Bayesian calibration assisted by Markov chain Monte Carlo sampling for three different exchange-correlation functionals (LDA, PBE, and PBEsol). We found that LDA requires the largest $U$ correction. PBE has the smallest standard deviation and its $U$ and $J$ parameters are the most transferable to other iron-based compounds. Lastly, PBE predicts lattice parameters reasonably well without the Hubbard correction.


Introduction
f −electron systems, and in the metal-to-insulator transition observed in many oxides [17]. The lack of accurate representation of the electronic state by commonly used XC functionals impacts the prediction of the electronic and vibrational properties, in particular, the electronic bandgap, which can be significantly underestimated [18,19].
To address the above problem, DFT+U introduces an on-site Coulombic interaction for the treatment of the electronic correlation effects [17]. An external Hubbard-like [40,41] term is added to the DFT Hamiltonian along with a double-counting term, which negates the initial DFT calculation for the terms the Hubbard Hamiltonian attempts to correct. Two parameters U and J are supplemented to the Hubbard-like term to correct the Coulomb-repulsion term and the effective exchange interaction, respectively. This method is famously used in LDA+U [38,42,43], and can be generalized to numerous DFT functionals to correct the error-prone calculations.
The main challenge facing DFT+U is obtaining the optimal U and J correction parameters. To date, there are many methods designed to obtain these values. One of the most popular methods is the semiempirical approach [44] in which the parameters' values are modified until the DFT+U predictions of some physical predefined observables are in agreement with the experimental measurements, such as electron bandgap, lattice parameters, or the atomic magnetic moment. Unfortunately, this method is limited to the materials for which experimental data is available.
Other methods are based on density-functional perturbation theory, linear response, the constrained random-phase approximation, and Hartree-Fock-based methods [45][46][47][48][49][50][51][52][53][54]. Though these theoretical methods are quite mature and have been implemented in different computational packages [55][56][57], it is unclear if the search for optimal correctional parameters will have a unique global minimum, or multiple different local minima. This is a question that can only be addressed by a careful exploration of the U and J parameters. Furthermore, the explicit dependence of the DFT+U Hamiltonian on orbital-dependence adds another dimension to the parameter space (i.e., the known metastability issue in DFT+U ) [58][59][60].
It is also unclear if a set of parameters defined for a specific material can be generalized to other materials (even within the same material family), or if the dependence of those parameters is strongly dependent on the selected XC functional within the DFT. The current understanding is that the correction parameters cannot be transferred to different materials because electronic correlations are governed by the nature of the chemical bonding and the coordination number, leading to the manifestation of different correlation effects within the same material family [61][62][63]. This further complicates the use of the DFT+U methods in high-throughput calculations.
In this investigation, we implemented an algorithm that builds a probability distribution in the parameter space of U and J for five strongly correlated iron-based compounds having different Fe oxidations states using three different XC functionals. We subsequently performed DFT+U calculations using the mean values obtained for the U and J parameters for the initial five materials and three other similar iron-based compounds. We compared our results with the experimental data to investigate how well the distribution of the correction parameters can be extended to other similar compounds. Moreover, we inspected the relationship of the U and J parameters with different XC functionals.

Bayesian Calibration and Markov Chain Monte Carlo Sampling
The main goal of this project is to determine the distribution of the U and J values that can generate accurate predictions for iron-based materials using DFT+U modeling. We use Bayesian calibration assisted by Markov chain Monte Carlo (MCMC) to sample the parameter space of U and J values on the potential energy surface. MCMC obtains the posterior distribution from the Bayes' theorem in an empirical form.
Bayes' theorem defines the relationship between posterior and prior probability distributions on the parameter space: Where P (U, J | X) is the posterior density on the parameter space given the dataset X, P (X | U, J) is the likelihood, P (U, J) is the prior density, and P (X) = P (X | U, J)P (U, J)dU dJ is the integrated probability of the data (or "evidence") given the model.

Priors
The prior density is bounded uniform, with boundaries drawn in such a way that prevents the unphysical regions of the parameter space (i.e., J > U ) from appearing in the posterior.
Likelihood The likelihood model is a "white noise" model with variance estimated in the course of the where M ij and X ij are DFT model result and corresponding experimental measurement i of type j, respectively, and N j is the total number of experimental results of type j. The variance of the experimental error σ j for property j is estimated in the calibration, with an inverse gamma prior.
Markov chain Monte Carlo The evidence P (X) may be written in terms of the likelihood and prior This integral is not analytically estimable in the present case because of the nonlinear nature of the likelihood. Therefore, a Markov chain sampling procedure is used, which is guaranteed to converge in the limit of infinite samples drawn [64]. In practice, the routine generally moves through an initial equilibration (burn-in) period before settling into its equilibrium state. Convergence is not guaranteed if insufficient samples are drawn from the parameter space, but criteria indicative of non-convergence can be tested for and ruled out, using for example a batch means test [65]. The MCMC procedure leads to a sample-based posterior distribution, from which the statistical behavior of the stochastic model can be easily inferred (for more details see Ref. 66.

Exchange-Correlation Functionals
XC functionals play a vital role in DFT. Numerous attempts have been made in the past to model the XC functional for accurate prediction of many-body quantum interactions [67,68]. In particular, the precise description of the metal-to-insulator transition in strongly correlated materials requires methods that go further than a single determinant of the N-electron wave function [38]. Even though DFT is an exact theory, the perfect XC functional is not yet known.
The local density approximation (LDA), proposed by Kohn and Sham [2], adopts the exchange and correlation energies of the homogeneous electron gas [69][70][71][72]. It follows that LDA is most successful in predicting the properties of solids whose effects of exchange and correlation are short-range [70]. Nevertheless, it is broadly used in different material classes. LDA is known to underestimate exchange energy and overestimate correlation energy [73]. LDA systematically overbinds atoms causing an underestimation of the bond lengths and lattice parameters.
Generalized-gradient approximation (GGA) XC are semi-local functionals that consider the gradient electron density to account for the anisotropic manner of the localized electron densities [10,74] of many materials. Contrary to LDA, GGA functionals tend to underbind atoms overestimating bond lengths and lattice constants. Perdew-Burke-Ernzerhof (PBE) [10,74] is the most popular GGA XC functional and has been used successfully to study many types of materials [75].
PBEsol differs from PBE only by two altered parameters that allow PBEsol to maintain many of the reliable properties from PBE [76]. PBEsol improves the equilibrium properties such as bond lengths and lattice parameters over PBE. However, it is generally poor in predicting dissociation or cohesive energies and reaction energy barriers [77][78][79][80].

DFT+U
The correction in DFT for strongly correlated materials can be introduced by including the Hubbard model [81].
where ρ σ (r) represents the charge density for spin σ and n iσ mm represents the density matrix for site i, states m and m , and spin σ. The E Hub is the Hubbard correction for the electron-electron interaction that is only applied to specified correlated states (d− and f −electrons). The E dc , known as the double counting term, contains the energy of the correlated electrons calculated within DFT [82,83]. This term must be subtracted from the total energy as the Hubbard term already contains the corrected energy of these states.
The E Hub used in this study is the rotationally invariant form introduced by Lichtenstein et al. [81]. In this form, the Hubbard Hamiltonian is written in terms of matrix elements of the Coulomb electron-electron interaction. The matrix elements can be expanded in terms of Slater integrals and spherical harmonics.
The effective Coulomb and exchange interactions, U and J are defined using the matrix elements of the Coulomb electron-electron interaction. Using atomic orbitals to extract the Slater integrals can lead to a large overestimation because the Coulomb interaction is screened. In DFT simulation packages, U and J are treated as parameters to reach an agreement with experimental results.
The DFT+U method offers a relatively simple solution to the complex problem of XC interaction calculation in strongly correlated materials. In this work, the method used to determine the double-counting correction in the Hubbard Hamiltonian was the rotationally invariant method proposed by Liechtenstein [81].
SrFeO 3 is a cubic perovskite and its HM structure propagates along <111> direction by 46 • from one layer to another [87]. Zhao and Zhou [88] suggest that at low temperatures SrFeO 3 adopts domains of FM phase causing magnetic inhomogeneity generating a metal-to-insulator transition. Given that our study is for 0 K, we use the FM ordered SrFeO 3 phase.
As for BaFeO 3 , it is well known that depending on the oxygen deficiency and temperature, it can adopt different crystal structures including triclinic, rhombohedral, tetragonal, and cubic [87,89]. These different phases correspond to different magnetic orderings ranging from the HM in the hexagonal to the FM in the cubic phase [87,90]. This material is reported to be an insulator in the cubic phase [91]. BaFeO 3 follows the <100> magnetic propagation direction and the helical structure rotates the y-z component of the spin by 22 • . Based on this smaller angle, BaFeO 3 is closer to a ferromagnetic structure than SrFeO 3 [87]. This is supported by the large magnetic field (42 T) [92] required to switch SrFeO 3 from HM to FM compared to the small magnetic field (0.3 T) [91] required to switch BaFeO 3 . Given the small HM characteristic turn angle in the BaFeO 3 , we considered this structure to be FM for this investigation.
We performed our calculations assuming that all structures had a collinear FM ordering. This assumption was made considering computational efficiency. Moreover, both of the perovskites were assumed to be insulating and in their cubic phases. Even though SrFeO 3 is not insulating, we purposefully selected a bandgap for this material (we choose a bandgap reported for a thin film [93], to both evaluate the robustness of MCMC to errors in small target values and avoid overfitting towards metallic states. Using the MCMC sampling, the space of U and J parameters was built up with the calculations made for these five compounds. The mean values of the of U and J parameters were extracted from the estimated distribution after the burn-in. Using these mean values, we performed simulations for the original five materials as well as for the new materials: FeO (Fm3m), α−Fe 2 O 3 (R3c), Al 2 FeB 2 (Cmmm), Fe 5 PB 2 (I4/mcm), and Fe 5 SiB 2 (I4/mcm).  Table 1: Univariate analysis of the parameter space distributions. U avg and J avg represent the arithmetic mean of each distribution. σ U and σ J denote the standard deviation. σ U J denotes the overall standard deviation. Lastly ρ U J represents the Pearson correlation coefficient between U and J parameters.

Results and discussion
tionals. However, these corrections can be system dependent. Therefore, if the distribution of the correction parameters applied to various materials is localized, one can conclude that the correction parameters can be used universally in that specific XC functional with similar materials with reasonably good accuracy.
After the PBE+U Markov Chain reached the stationary zone (ca. 2500 pairs of proposed U and J), the parameters varied minimally until it was terminated (ca. 8000 pairs). This leads us to believe that once the critical number of proposed pairs is reached and the algorithm locates an initial minimal variance of proposed parameters, it will not locate another in parameter space. The same behavior was observed for LDA+U and PBEsol+U . This suggests that there is only one maximum for the U and J probability density distribution.
The arithmetic means and standard deviations of the U and J parameters are displayed in Table 1. The standard deviation of the J parameter (σ J ) is smaller than that of U (σ U ) for all three XC functionals.
This is due to the higher effect the Coulomb-repulsion has on the energetics of a system compared to the exchange interaction. The mean value of the J parameter (J avg ) is larger than the J values used in other DFT+U investigations [50,82,94]. However, recent studies have shown that larger values of J are needed to reproduce the magnetic moments of some iron compounds [95,96]. These larger values of J tend to decrease the overprediction of the magnetic moment (See Supplementary Figure 2).
The mean value of the U parameter (U avg ) is substantially larger in LDA in comparison to its GGA counterparts (PBE and PBEsol). This is expected as LDA is the simplest XC functional. As mentioned earlier, LDA assumes the XC energy is that of a homogenous electron gas. Therefore, it requires a greater onsite electron-electron Coulomb-interaction correction. LDA systematically overbinds the atoms causing an underestimation in the bond lengths. Thus, it requires a larger U parameter to create the Coulomb-repulsion and expand the bonds and consequently the lattice parameters. Table 2 shows this initial underestimation in the lattice parameters and the subsequent improvement when introducing U and J in the calculations.
Regarding the GGA functionals, PBEsol required a slightly larger U parameter than PBE. One of the purposes for the introduction of PBEsol was to correct the overestimation of PBE [76] in the bond lengths for non-correlated materials. For correlated materials, however, this overestimation leads to a closer prediction 8H9 -H9 in bond length to the experimentally measured because correlated materials need an extra Coulomb-repulsion for more precise predictions.
The distribution of U and J parameters is more localized in PBE comparing to that of LDA and PBEsol.
This can be visualized in Figure 1 by noting the spread of the distribution in the parameter space in each case. Furthermore, the univariate analysis, provided in Table 1, shows that PBE has a noticeably smaller overall standard deviation (σ U J ) than LDA and PBEsol. A small overall standard deviation of U and J in the parameter space (i.e. a localized distribution) indicates that using the mean values U avg and J avg simultaneously improves the results toward a better agreement with the experimental data for all of the structures. Therefore, we expect U avg and J avg values from the distribution for PBE+U are more transferable to other materials than LDA+U and PBEsol+U .
The last column of Table 1 shows the Pearson correlation coefficient of the U and J parameter (ρ U J ). If the correlation factor is equal to zero, U and J are completely independent. As the correlation approaches one, the dependence increases. If the correlation is equal to one, U and J are completely dependent. This is reminiscent of the Dudarev approximation [98], a more simplified yet rotationally invariant form, where the functional can be obtained by only considering the zeroth-order Slater integral. The treatment of U and J values in Ref. 98 is analogous to incorporating the exchange interaction to the Coulomb interaction using an effective U , U eff = U -J [95]. Within the Dudarev approximation the two parameters of Lichtenstein form, U and J, are effectively reduced to one parameter, U eff . We find that PBE has the largest correlation between U and J. This seems to indicate that out of the three studied XC functionals, PBE has the closest result between Dudarev approximation [98] and Lichtenstein form [81].
We have recorded the experimental and predicted values of lattice parameters, volume, bandgap, and magnetic moment for the studied materials in Table 2. Even though volume, bandgap, and magnetic moment were set equally as target parameters, it can be seen that the corrections for lattice parameters have been more effective than the bandgap and magnetic moment. This is because treating the volume on the same footing as bandgap and magnetic moment increases the importance of the lattice parameters. Also, changes in lattice parameters can subsequently effect the magnetic moment and bandgap predictions.
We selected an accuracy criterion of 0.09Å and compared the experimental and predicted lattice parameters before and after the Hubbard correction. As expected, LDA usually underestimates the lattice parameters. This corroborates our previous findings that LDA needs a larger U value to correct the underestimation of the bond lengths. The introduction of the correction parameters improves the prediction for most of the structures. As mentioned before, PBE is known for overestimating lattice parameters in noncorrelated materials. For strongly correlated materials, as in the case of this study, this trend benefits PBE in predicting the lattice parameters reasonably accurately without any corrections. This was also observed by Meng et al. [121] in their study of a group of iron oxides using beyond-DFT approaches, where they observed adding the suggested U and J parameters to PBE minimally influence the lattice parameters prediction. This result also supports our previous observation that PBE requires smaller correction parameters.
On the other hand, PBEsol underestimates the lattice parameters. This was expected because PBEsol was introduced to correct the overestimation of PBE. The U and J parameters suggested in this study improve the lattice parameter prediction in PBEsol. Detailed analysis can be found in supplementary Table 2.   compound. In the future, it will be interesting to expand the parameter space to incorporate the details of the orbital occupation [58][59][60], the inter-site Hubbard V [95], and pseudopotentials [124]. Moreover, various other properties such as cohesive energy, formation energy, elastic constants, etc. can be used in the dataset X. The proposed methodology can be employed for other systems to predict their properties for a given set of parameters within the spirit of high-throughput calculations.
The valence electrons wave functions were described by the projector augmented-wave [129,130] method.
The kinetic energy expansion and optimum irreducible Brillouin zone grid (k -grid) for each structure were obtained by choosing a maximum error of 1 meV/atom for the total energy in each cell. We used Γ-centered and Monkhorst-Pack type [131]   using the U , J, and the ratio of F 4 /F 2 , as implemented in VASP [132].
The Kohn-Sham equations were solved self-consistently with a maximum total energy difference of 10 −5 eV. Furthermore, we assumed the crystal structure geometry to be optimized when the internal stress tensor components differ from the ambient pressure (assumed to be zero) by less than 0.5 kbar, and the residual forces on each atom are less than 1 meV/Å. Figures in this paper were generated using the Matplotlib [133] and PyVista [134] python packages. We used Numpy [135] and SciPy [136] Python packages for pre-and post-processing of the results.

Supplementary information
Supplementary Table 4 shows the improvement in volume and magnetic moment prediction using DFT+U .
In Supplementary Table 5  As plotted, we only show the pixels that were explored by the sampler. The U and J extracted from this sampling are shown as solid round magenta dots.
To evaluate the importance of target properties and the functionality of the method, we performed the experiment by omitting the bandgap from the target properties for LDA and PBE. This pushes the algorithm towards providing U and J pairs that improve the magnetic moment and the volume more than Supplementary    Table 7 shows the root mean square error (RMSE) and mean absolute error (MAE) from the DFT+U calculation using the U and J pair extracted from this sampling.
The observation error variance φ is estimated during the calibration. φ has an Inverse Gamma distribution (IG) prior [66]. In principle, the IG should be based on an a priori assessment of error in the experiment.
We have generated these IG by considering the mode and the mean to be twice and four times as large as the largest standard deviation of the target parameters, respectively. The standard deviations used for bandgap, magnetic moment, and volume were 0.01 eV, 0.01Å 3 , and 0.1 µ B respectively. This results in    Figure 3: U and J values extracted from MCMC sampling without bandgap in the target properties. Target properties in this sampling are volume and magnetic moment. The density was estimated using a gaussian kernel density estimation (KDE). The bandwidth was selected using the Scott approach. Each KDE is separately normalized to one. δ is the step between contour lines. Top left shows the probability density function of accepted U parameters. Bottom left shows the joint probability density function of accepted U and J. Bottom right shows the probability density function of accepted J parameters.