Downfolding from Ab Initio to Interacting Model Hamiltonians: Comprehensive Analysis and Benchmarking of the DFT+cRPA Approach

,


I. INTRODUCTION
The computational cost of solving electronic Hamiltonians increases rapidly with the size of the electronic Hilbert space, i.e., with the number of orbitals and electrons.This presents a substantial constraint on electronic structure calculations for molecules and an essential problem for the study of correlated solids.To overcome this dilemma, it is desirable to construct models with fewer electronic degrees of freedom in a systematic and controllable way.For this purpose, one needs to select a smaller target Hilbert space and determine the structure of the target Hamiltonian and its matrix elements.In practice, typically, no more than a handful of orbitals per unit cell are kept in the target space.Altogether, this procedure is known as downfolding [1] and serves as a bridge connecting ab initio methods such as density functional or GW theory with higher-level many-body treatments including (extended) dynamical mean field [2] or dual theories [3,4], ab initio DΓA [5], self-energy embedding [6] or Eliashberg theory [7], which cannot be performed on the full Hilbert space.There have been various suggestions for these downfolding schemes, ranging from constrained density functional perturbation theory for phononic degrees of freedom [8][9][10] to the constrained random phase approximation (cRPA) [11], constrained density functional theory [12], constrained GW [13] and constrained functional renormalization group [14] approaches for purely electronic Hamiltonians, next to embedding schemes, e.g., starting from coupled-cluster references [15,16] or via projection schemes [17].Our focus here is on interacting target space Hamiltonians of the form H = tαβ c † α c β + Ûαβγδ c † α c † β c γ c δ , where Û is a static Coulomb interaction in the target space, t defines the single-particle energies and the Greek indices run over the electronic target space.Generalized Hubbard model Hamiltonians of this form are the standard in "DFT++" [18] (or density functional theory based embedding) approaches to describe correlation effects in solids for which various solvers have been developed and implemented.
Although the idea of downfolding is conceptually simple, and reductions of the Hilbert space are ubiquitous in quantum physics, the question has remained on how one should do this in a systematic and practical way.By definition, the original Hilbert space is too large to do exact calculations, so one needs to use approximate methods whose errors are usually not controllable.At the same time, one needs to avoid double counting of correlation effects.This arises because of the interplay between the two-body Û terms and the single-particle term t in the Hamiltonian.The latter is frequently extracted from a density-functional theory (DFT) calculation that already contains some interaction and correlation effects.Several single-particle double counting corrections to t exist in the literature [19][20][21][22], with some physical arguments supporting their use, but there is no general consensus on how to tackle this problem.In fact, the uncertainty surrounding these single-particle double counting corrections can be a limiting factor for the accuracy of DFT++ approaches, see, e.g., Refs.[23,24] for further discussions.
In terms of the interaction in the target space, downfolding a priori generates many-fermion interactions of arbitrary order [14] and not just one-and two-particle terms.Furthermore, these interactions might be noninstantaneous (frequency-dependent) [11], requiring an action rather than a Hamiltonian formalism.Even if one accepts the restriction to a static two-particle Coulomb interaction tensor, its strength should be different from the corresponding Coulomb interaction in the full space to account for the screening by electrons outside of the target space, while screening by the other electrons inside the target space should not be included.The latter is to avoid a double counting of screening processes to the Coulomb interaction Û within the target space.A variety of constrained methods [8-11, 13, 25-27] have been developed to avoid this two-particle double counting issue, e.g., via partial (constrained) screening approaches.
Evaluating the performance of downfolding techniques has proven challenging in practice, as the matrix elements of the downfolded model are not measurable quantities themselves and as the models are often solved using approximate solvers.This multi-step procedure makes it hard to establish how accurate the downfolding by itself is since deviations from full first-principles calculations or experimental data might also originate in approximations made while solving the low-energy model.
Given the widespread use of cRPA calculations based on DFT input to derive interacting model Hamiltonians and their promise of accurately describing correlation effects at low numerical costs, it is critical to quantify their accuracy.To this end, we analyze here the sensitivity of this method at hand of the vanadocene molecule.The latter is a good proxy for correlated bulk materials or embedded correlated defects with well-defined interacting sub-spaces within gapped semiconducting backgrounds (refered to as rest spaces).Because vanadocene is a relatively small system, we are able to use state-of-the-art first-principles methods to derive reliable reference data, and the downfolded models are solved exactly, which allows us to focus on errors in the downfolding procedure.To avoid obtaining correct spectra with incorrect states, we assess both energies and the character of the states.We systematically assess how the model Hamiltonian form, the target space basis, the impact of double counting corrections, and the (constrained) screening models affect the ground state and excited state energies and wavefunctions.
In the following, we first explain our benchmarking strategy and introduce the vanadocene test system.Afterwards, we discuss the first-principles reference data before we present our step-wise benchmark, showing how each downfolding step affects the comparison to the reference system.The impact of our findings is discussed at the end.To obtain accurate reference data, we use a combination of several state-of-the-art first-principles methods which each provides systematically improvable manybody wave functions to treat the electronic correlations at an attainable cost.Specifically, we apply equation of motion coupled cluster (EOM-CC) [28,29], auxiliary field quantum Monte Carlo (AFQMC) [30], and fixednode diffusion Monte Carlo (DMC) [31], which have been shown to accurately describe the excited states for transition metal molecules [32][33][34][35].

A. Benchmarking Strategy
As the test system, we have chosen the unperturbed eclipsed vanadocene molecule VCp 2 [36], which consists of a V atom situated between two planar and parallel C 5 H 5 rings as illustrated in Fig. 1.The unperturbed molecule has D 5h symmetry with a five-fold rotation axis, five two-fold rotational axes orthogonal to the five-fold rotational axis, and a mirror plane in the xy-plane.The vanadocene molecule may also exist in a staggered ge-ometry with D 5d symmetry.Previous DFT calculations predict a difference of only 13 meV in the ground state energy between the D 5h and D 5d structures with the D 5h structure as the global ground state, consistent with experiment [37,38].On the level of DFT, there are 5 V 3d dominated Kohn-Sham states forming the highest occupied and the lowest unoccupied molecular orbitals, which host 3 electrons in total.The D 5h symmetry of the crystal field generated by the carbon rings leads to two doubly degenerate (e 1 , e 2 ) and one non-degenerate (a 1 ) singleparticle energies for these d-dominated molecular orbitals as depicted in Fig. 1.This way, VCp 2 forms an optimal test bed for cRPA-based ab initio downfolding approaches: it hosts a well-defined partially filled correlated target space spanned by the V d orbitals, which is well disentangled from the carbon ring "background" electronic structure, which has a significant single-particle gap of more than 6 eV.The latter is important to guarantee that the RPA treatment of the background screening is adequate [39].
Our goal is to benchmark the effects of electronic correlations in this molecule.For this purpose, we use the same frozen positions of the nuclei in all methods and do not consider any electron-phonon coupling.Furthermore, all methods start from the same ccECP pseudopotentials [40] and ignore spin-orbit coupling.In this sense, our benchmark should be understood as a comparison of several computational methods for the interacting electron Hamiltonian defined by this pseudopotential and these atomic positions rather than as a benchmark to experimental data.
For the downfolding, we start from conventional DFT calculations and project onto a set of localized orbitals.These are subsequently used to perform constrained random-phase approximation [1,11,41] (cRPA) calculations, which are widely used for deriving Coulomb interaction matrix elements for downfolded models [42][43][44].The resulting downfolded models are evaluated using exact diagonalization to avoid any artifacts from approximate model solutions.
Using the reference data from accurate quantum chemistry calculations, we assess the effects of changing the downfolding procedure as a sort of sensitivity analysis, in which an independent variable (the choice in downfolding procedure) is varied to determine the sensitivity of the output (the eigenstates of the embedded problem).For each step in the DFT+cRPA procedure, we consider several reasonable and common choices and analyze the sensitivity of the results.

B. First-principles Reference
Before we delve into the benchmark of the downfolded results, we first establish the first-principles references for the vanadocene molecule.In this study, we focus on the lowest four spin-flip and the lowest two crystal-field excitations, using three first-principles methods: EOM-CCSD, AFQMC, and DMC.They are among the most accurate quantum chemical methods for the investigation of excited states in correlated systems.Calculations have been systematically converged, and details of the methods can be found in Section IV.Calculations have been performed without imposing specific point group symmetry properties on many-body wave functions.
We found with all reference methods that the ground state is 4 A 2 with S = 3/2, consistent with previous experimental and theoretical studies [37,38,[45][46][47][48]. Specifically, the DMC ansatzes were chosen to be of S z = 1/2 on the S = 3/2 quartet.The configuration of the DMC ground state is determined to be (e 2 ) 2 (a 1 ) 1 (e 1 ) 0 , characterized by the one-body reduced density matrix of a natural orbital basis (see Section IV for details).The symmetry of the many-body state computed using AFQMC is inherited from the trial wave function used.For the ground state, AFQMC calculations were performed using a CASSCF trial wave function with (e 2 ) 2 (a 1 ) 1 (e 1 ) 0 3d-orbital occupancy.Fig. 2 shows a summary of the charge-neutral excitation spectra obtained using these first-principles methods.The ground state energies are aligned at zero, and the dashed (solid) lines represent spin-flip (crystal-field) excitations, identified by states' zero (finite) occupation on the high-energy e 1 manifold.All three first-principles methods are in qualitative agreement regarding the order and degeneracy of all spin-flip and crystal-field excitation energies.Quantitatively, all methods agree to within about 200 meV.

EOM
The energy difference between excitation energies of the same type (i.e.within either the set of spin-flip excitations or the set of crystal-field excitations) is remarkably similar across all methods.According to Fig. 2, the lowest three spin-flip excitation energies are found to be within windows of < 250 meV, while the fourth spin-flip excitation energy is higher than the lowest three spin-flip states by 350 meV (DMC).The crystal-field excitations are identified to be higher than the lowest four spin-flip excitations.The two crystal-field excitations are found to be separated by about 200 meV for each first-principles method.Overall, the first-principles methods obtain the same ordering and character of the many-body eigenstates of interest, but differ in energy by approximately 60 ∼ 200 meV.
In the following sections, we proceed with the discussion of the six lowest charge neutral many-body excitations with S z = 1/2, which take place within the V d shell and are thus describable within the minimal subspace of the downfolding calculations.

C. Downfolding Results
Now that we are familiar with the test system and the first-principles reference data, we turn to the results obtained using our downfolding procedure.We aim to systematically test and scrutinize all relevant steps in the DFT+cRPA procedure, especially those for which multiple strategies exist in the literature such that choices need to be made.Specifically, we will discuss the chosen form of the model Hamiltonian, the target space basis, the single-particle double counting procedure, and finally, the Coulomb screening model.The downfolding procedure is qualitatively assessed based on the predicted ground state properties, the ordering of the excitation energies, the many-body characteristics of the excited states, and the shape of their charge densities.To quantify the agreement of the (charge-neutral) excitation energies of the downfolded model with the first-principles reference data, we further calculate where E n are the charge-neutral excitation energies in the downfolded model and E ref,i n are the corresponding excitation energies according to the ith first-principles method with i ∈ {EOM-CCSD,AFQMC,DMC}. χ 2 is the log-likelihood from a Bayesian inference of the embedding model probability, based on the reference values.It naturally accounts for the fact that some of the states are in close agreement among the reference sets, and some are not.We determine σ as the average over the variances of the n excitations within the reference set (EOM-CCSD, AFQMC, DMC), which results in σ ≈ 0.1 eV.χ 2 accounts for the fact that the reference data itself has errors, and so it represents our uncertainty about the exact result.
The purpose of this quantifier is to summarize how consistent a given downfolding procedure is with the excitation spectrum of the reference data in a single number, given that there is uncertainty within the reference data set itself.The smaller χ 2 , the better the match.
Note that this procedure requires us to identify which excitations belong together (the label n, the colors of the energy levels in the figures).We achieve this based on the quantum numbers, the spatial structure of the spinresolved charge densities ρ σ n , and the many-body characteristics of the corresponding excited wave functions for each state n as explained in section IV B.
We consider several variations of the DFT+cRPA procedure.The choice of localized orbitals for the embedded space was either constructed using maximally localized Wannier functions [49] (MLWF) or first guess Wannier functions (FGWF).The single particle energies were constructed either using double counting corrections, no double counting corrections, and by explicitly modifying crystal field energies, which emulate potentially using different DFT functionals to create the one-particle energies.We then move to considering the construction of the interacting part of the downfolded Hamiltonian by considering monopole screening and frequency dependence.
FIG. 3. DMC and downfolded ground states charge densities.ρ ↑/↓ (r) from each method is normalized by the corresponding ρ ↓ (r) from the same method.

Reference DFT+cRPA procedure
To limit the number of calculations, we start with a set of common choices that yield reasonable agreement with the reference data.We use MLWF localized orbitals, single particle energies with no double counting corrections, and the full four index cRPA screened Coulomb matrix elements in the ω → 0 limit.As shown for the ground state in Fig. 3 and for all excited states in section IV B this model is in qualitative agreement with the reference data, with a loss function χ 2 of 27.3.We will consider modifications to this strategy, and examine the changes in χ 2 ; if χ 2 varies upon a change in the choice, then the DFT+cRPA results are sensitive to the choice, and if it does not, then the DFT+cRPA procedure is not sensitive to the choice.In this way, we can identify what approximations deserve further analysis.We proceed with the discussion of the chosen basis sets.For comparison to the conventional MLWF, which are constrained here to reproduce the DFT singleparticle energies, we also use a projector-like [50][51][52][53] first guess Wannier function (FGWF) basis set to calculate all model Hamiltonian matrix elements.The FGWF Wannier function basis set differs in its construction by not applying any inner (or "frozen") energy window, such that the DFT single-particle energies are not reproduced, and by using a much larger outer energy window, such that the resulting wave functions are more localized.All details can be found in section IV.In this case, both basis sets are capable of reproducing the correct ground state, so we proceed with the analysis of the charge-neutral excitation spectrum.Fig. 4 shows all spin-flip and crystalfield excitations between 0.6 and 2.6 eV.For the FGWF basis, various double and four-fold degenerate spin-flip excitations appear in the energy window of interest, with significantly different characters, such that a 1:1 mapping to the spin-flip excitations from the other methods cannot be established and χ 2 cannot be defined.Correspondingly, the FGWF excitation energies are colored gray in Fig. 4.
This comparison strikingly shows that projector-like first guess Wannier functions, which are not constrained to reproduce the DFT single-particle energies (see section IV for details), can result in quantitatively and even qualitatively wrong many-body excitation properties.
As shown in Table III, the single-particle energies in the FGWF basis are not as different from MLWF basis functions as other modifications; however, the interactions as shown in Tables I and II differ by up to 27%.Thus, at least in this case, the basis functions are important mainly because of their influence on the interactions rather than the single-particle energies.

Single-Particle Energies
Although it is convenient to extract t from the DFT energies, there are several uncontrolled approximations in doing so.First of all, the Kohn-Sham (KS) DFT energies are known to be auxiliary quantities that are not guaranteed to accurately represent the excitations of the system [54,55], even if the exact density functional were available.Secondly, the DFT calculation already incorporates some electronic interaction effects, while the (ED) solution of the downfolded Hamiltonian will add further interaction effects.Correcting for this double counting could be either achieved by starting from constrained GW calculations [13] or by using a so-called double counting correction term in the model Hamiltonian, which aims to remove any Coulomb interaction effects from the single-particle starting point.We consider two modifications to the single-particle energies: a Hartree double-counting correction, [20,56,57] which decreases the energy difference between a 1 and e 1 by more than 1 eV such that the crystal-field excitations shift significantly up in energy (see Tab. III), and rigid shifts of the energy difference between a 1 and e 1 levels by approx.±150 meV.We use the rigid shifts rather than varying the functional because varying the DFT functional also changes orbitals, which could result in conflated effects.For discussions on how to possibly vary the single-particle correction scheme for different DFT functionals, we refer the interested reader to Refs.[20,56] Fig. 5 shows how modifications to the single-particle terms affect the excitation energies in the downfolded model.The spin-flip excitations are barely affected by single-particle energy changes, as they are mainly defined by exchange Coulomb matrix elements, which are unaffected by single-particle double counting corrections.
The crystal-field excitations can, however, change significantly: Increasing the energy difference between a 1 and e 1 with H a1 CFC by about 150 meV lowers the crystal-field excitation energies, while decreasing the difference between a 1 and e 1 by about 150 meV using H e1 CFC shifts the crystal-field excitations up in energy.This trend also holds using the Hartree double counting correction, which decreases the energy difference between a 1 and e 1 by more than 1 eV such that the crystal-field excitations shift significantly up in energy (see Tab. III for further details).
In all of these cases, we can exactly map the excited many-body states to the reference ones, allowing us to calculate χ 2 .We see that H a1 CFC and H e1 CFC only mildly affect χ 2 .The Hartree single-particle double counting correction, however, significantly increases χ 2 , yielding a poor agreement with the reference data.From this, we can conclude that the Hartree double counting is not a favorable correction scheme for charge-neutral local excitations in our vanadocene test system.A similar effect from Hartree double counting corrections has already been observed for Fe impurities in AlN [20].In fact, we find the best agreement with the first-principles reference without applying a double counting correction at all.This is in line with conventional double counting corrections, as regularly applied in DFT++ schemes [58][59][60][61], which do not have an orbital dependence within the correlated space.
The effects of the single-particle energies on the accuracy of the model are relatively straightforward compared to the other terms.As long as the energies are not too much in error, the ground state and spin excitations are robust to their precise value.The crystal field excitations are shifted roughly linearly with the single-particle energies.

Static Screening Model
We now investigate how different screening models affect the charge-neutral excitation spectrum.To this end, we show in Fig. 6 the original data as obtained from using the conventional screened cRPA U ijkl tensor (referred to as "full"), together with results obtained from a cRPA tensor which is predominantly screened in the monopole channel (defined here as the leading eigenvalue of the screening, as explained in section IV), and results from using the bare (unscreened) Coulomb interaction tensor.The "monopole" cRPA model is inspired by Scott and Booth [62], and further details can be found in section IV.
The full cRPA results that have already been discussed yield a χ 2 = 27.dominately density-density Coulomb matrix elements are screened, yields a better agreement with χ 2 = 16.5.This additional constraint on the cRPA screening shifts the crystal-field excitation energies down by about 200 meV, and clusters the spin-flip excitations into 3:1 pairs, with the highest spin-flip excitations being closer to the lowest crystal-field excitations (in Fig. 6 they actually now overlap).This results in a better agreement with the reference data as quantified by the reduced χ 2 value.When screening is completely neglected, i.e., when we use the bare v ijkl Coulomb tensor, the agreement with the reference data as measured by χ 2 becomes, however, unsatisfactory compared to the (monopole) cRPA cases.
In the bare case, the crystal-field excitation energies are strongly underestimated, which mixes the overall ordering of the spin-flip and crystal-field excitations, resulting in a large error, χ 2 = 86.5.The spin-flip excitations, however, also show the 3:1 clustering similar to the monopole screening case.
From this study, we learn that the cRPA screening model is indeed better than bare Coulomb interactions.However, the screening model is not perfectly accurate; a modification of only screening the monopole channel, motivated by the underlying RPA screening by long-range charge fluctuations [62], results in better agreement of the model eigenstates with the quantum chemistry reference results, at least for vanadocene.

Finite Frequency Screening
The Coulomb tensor in the cRPA approximation is frequency-dependent [11], i.e., ÛcRPA (ω).To obtain a static Hamiltonian which is easier to solve, it is conventional to evaluate ÛcRPA (ω = 0), based on the idea that the low energy target space physics is slow compared to the screening processes.To assess this approximation, we study how the results change when the cRPA Coulomb matrix elements are evaluated at finite frequency.Here, we perform this analysis based on a simple model of the frequency dependence, the plasmon pole model (see Methods for details).Fig. 7 shows the excitation energies using a Coulomb tensor Û (ω) evaluated at various ω, which are all below the estimated background screening plasmon pole frequency of approximately 16.5 eV.Note that these are all calculated using the monopole screening model of Fig. 6, which corresponds to ω = 0 here.For ω = 4 eV and ω = 8 eV, we find an increase in the crystal-field excitation energies and a concomitant small improvement in the match with the reference data as quantified by χ 2 .The spin excitation energies barely change for these ω values.ω = 12 eV is close to the plasmon pole frequency of 16.5 eV, so the Coulomb matrix elements change more rapidly, and the excitation energies change more as a result.At this point, the agreement with the reference data deteriorates substantially.We should note that the theoretical justification of cRPA relies on the screening frequency ω being smaller than the gap in the rest space [39], which is roughly 6 eV in vanadocene (Fig. 1).I.e., for ω > 6 eV, the dynamics of particle-hole excitations start to become relevant, which we cannot capture in the static Hamiltonian formalism.
With regard to the frequency choice in the cRPA background screening, we thus find slightly better agreement with the reference data upon using ω > 0 in comparison to ω = 0.Among all benchmarked quantities, however, the background screening frequency has the smallest effect.

III. DISCUSSION
By comparing the ground state and charge-neutral excited states obtained directly from first-principles descriptions and from minimal downfolded Hamiltonians, we have provided a systematic and quantitative sensitivity analysis of the DFT+cRPA scheme for the ab initio derivation of interacting model Hamiltonians for the description of correlation effects in the vanadocene molecule.We expect that the results in this study will be most applicable to similar systems, such as defects in gapped semiconductors or correlated bulk materials with well-separated correlated sub-spaces.These systems are ideal for cRPA, as noted by some of us [39], and even so, we were able to quantify errors in every step of the DFT+cRPA process.
To summarize, we find the following about the DFT+cRPA process: Orbital shapes are crucial.The single-particle energies for FGWF are not that different from MLWF but the model is extremely inaccurate because the two-body terms highly depend on the orbital shapes.Naïve application of the orbital dependent double counting corrections results in very poor estimations of crystal-field excitations, because it fails to describe the one-particle levels.Small corrections do not change much within our uncertainties.The cRPA-screened interactions are much better than the bare ones.Monopole-only screening is clearly better than traditional.Frequency dependence, in this case, is not that big, unless one goes to very high energies.
From this data, we can make a few recommendations.One should be very careful about the basis orbitals, especially in cases where disentanglement of bands is necessary, since orbital shapes can completely change the results.Similarly, the functional used to choose localized orbitals might have significant effects on the interactions computed.Double counting corrections do not necessarily improve the accuracy of the Hamiltonian.These "corrections" can also be conflated with the tendency of DFT to incorrectly compute the one-particle energies.Even in the case of vanadocene, which is a best-case scenario for cRPA, a naïve application of cRPA screening obtains interactions that are better than bare, but appear to be limiting accuracy.In this case they can be improved by monopole-only screening in particular, although it remains to be seen if such a method is generally Looking forward, we have demonstrated that it is possible to generate reference quantum mechanical data of sufficient quality and quantity to systematically analyze heuristic (but inexpensive) methods like DFT+cRPA, at least for model systems such as vanadocene.The application of multiple first-principles methods allows us to evaluate the uncertainty in the reference data, partially mitigating the problem of overfitting.The data from this work can serve as a reference for other methods, such as GW -based approaches, and could be applied to a variety of materials and systems and has been made publicly available [63].

A. DFT Calculations
We use Quantum Espresso (QE) [64][65][66] and embed the vanadocene molecule in a 15×15×15 Å3 supercell to minimize spurious wave function overlap and undesired screening between repeated cells.The plane-wave cutoff was set to 440 Ryd, and we apply the spin-restricted (spin-unpolarized) generalized gradient approximation within a PBE functional [67].In these spin-restricted calculations we find an ordering of e 2 < a 1 < e 1 , which is different from spin-unrestricted results for both C 2h and D 5h VCp 2 where the e 2 < a 1 ordering is reversed between the spin channels [46,68].The occupations of the three lowest d orbitals are constrained to exactly 1 to avoid partial state occupations due to the conventional smearing methods in plane-wave DFT codes.This constraint does, however, not significantly affect the Kohn Sham energies compared to calculations with a large smearing, c.f. Tab.III.

B. Wannier Basis Sets
We use the three highest (partially) occupied and the two lowest unoccupied molecular orbitals with pre- dominant V d-orbital character to construct five maximally localized Wannier functions (MLWF) |ϕ i ⟩ utilizing RESPACK [69].For the construction of these MLWFs we use an inner (frozen) window to constrain the energies of the five Kohn-Sham states of interest.The resulting Wannier orbitals resemble the expected V d orbital symmetries, however, with small lobes at the carbon rings as visualized in Fig. 1, which is in good agreement with Ref. [68].These indicate a finite hybridization between the V d states and the carbon rings.
To investigate this specific choice of Wannier functions and to qualitatively compare to projector based methods, we construct a second Wannier basis without using the frozen (inner) window, utilizing a larger Wannierization (outer) window, and without performing maximal localizations.The resulting first-guess Wannier function (FGWF) single-particle energies do not perfectly reproduce the KS ones, as indicated in Fig. 8 and listed in Tab.III and cannot be generated via some uniform rotation of the MLWFs.These FGWF orbitals are nevertheless similar to the MLWFs, but do have a slightly different shape, which is reflected in the overlap matrix elements between Wannier orbitals of the same symmetry:

C. Model Hamiltonian Matrix Elements and cRPA Screening
With these Wannier basis sets, we evaluate the hopping matrix elements t ij = ⟨ϕ i |H DFT |ϕ j ⟩ and the bare v ijkl = ⟨ϕ i ϕ j |v|ϕ k ϕ l ⟩ as well as the statically cRPA screened U ijkl = ⟨ϕ i ϕ j |U (ω = 0)|ϕ k ϕ l ⟩ Coulomb matrix elements within RESPACK [69].The resulting single-particle energies are given in Tab.III, and the density-density and Hund's exchange matrix elements are listed in Tab.I and Tab.II.We see that density-density elements are screened by about 55 to 60% (up to 7.8 eV), while Hund's elements are reduced by only 5 to 10% (up to 60 meV).Analysing the leading and sub-leading screening contributions yields a screening of ε 1 = ε mono ≈ 2.44 in the predominate monopole channel and ε i>1 = ε multi ≈ 1.04 to 1.21 in the sub-leading multipole channels.These screening channels are defined by ε i = v i /U i with v i being the eigenvalues of the full v ijkl tensor evaluated in a product basis.U i are the approximated eigenvalues of U ijkl obtained using the eigenbasis of the bare v ijkl [70].The full cRPA screening is thus most dominant in the monopole channel, which mostly affects density-density interactions, while the screening becomes small in the multipole channels, which mostly affect Hund's exchange elements.
Inspired by Scott and Booth [62] we also construct a cRPA Coulomb tensor, which is screened mostly in the leading monopole channel, such that only ε 1 > 1, while all other ε i>1 = 1.The resulting matrix elements are listed in Tab.I.As expected, the approx."monopole" cRPA Coulomb matrix elements agree with the full cRPA ones in the density-density and approx.with the bare ones in the Hund's exchange channels.However, note that the monopole screened Hund's exchange elements are up to 77 meV larger than the fully screened ones, which is relevant for the spin-flip excitation energies.
For the analysis of finite background screening frequencies, we calculate U ijkl (iω n ) = ⟨ϕ i ϕ j |U (iω n )|ϕ k ϕ l ⟩ for a few Matsubara frequencies iω n .Afterwards we fit the leading eigenvalue to a single plasmon pole model of the form 6 eV and ω p ≈ 16.5eV.This allows us to adjust the real frequency of the leading eigenvalue as p +iδ for which we use δ = 2.5 eV.This way, we can continuously adjust the leading eigenvalue in the monopole screening model as done for Fig. 7.The rather large δ = 2.5 eV is used here to mimic the effects of screening channels beyond the single plasmon pole model.The dependence of the density-density monopole-screened Coulomb interaction matrix elements is depicted in Fig. 9.
Although the FGWF wave functions show deviations of just up to 6.4% in comparison to the MLWF ones (as measured by the overlaps given above), the resulting bare Coulomb matrix elements are different by up to 27% as a result of the ϕ 4 i dependence of v and as visible from Tab. II.This is a result of the enhanced atomic charac- (eV) ter of the FGWFs and their reduced hybridization with the carbon rings, yielding a smaller spread and, thus, enhanced bare Coulomb matrix elements.

D. Double counting
We choose three orbital-dependent correction schemes, which all act on the single-particle t ij hopping matrix elements according to the following terms, which we add to the model Hamiltonian: with ρ kl the single-particle density-matrix in the Wannier basis obtained from the DFT calculation.This is the conventional Hartree double counting correction as discussed in detail in Refs.[20,56,57].We also apply ad hoc chosen crystal-field corrections of the form with only ∆ (a1) i=a1 = −0.143eV or ∆ (e1) i=e1 = −0.150eV nonzero elements for the respective H CFC .These corrections thus solely shift the a 1 or e 1 states, while they preserve all symmetries.With these, we aim to investigate the effects of possibly wrong single-particle energies.

E. Exact Solutions and Benchmark Quantities
We use the exact diagonalization (ED) routines (atom diag) from TRIQS [71] to solve the downfolded many-body Hamiltonian H = tαβ c † α c β + Ûαβγδ c † α c † β c γ c δ , giving us access to both the energies and the manybody wave functions in the target space.Here, we restrict ourselves to charge-neutral excitations, i.e., those with N = 3 electrons in total, such that the relevant many-body wave functions are of the form , where the sum runs over spin and orbital indices.For three electrons, the possible values of S z are ±3/2 and ±1/2.Note that our model has SU (2) spin symmetry, so the spin S is a good quantum number, and states with the same S and different m s have the same energy.We can thus restrict ourselves to states with m s = 1/2.
Using the Wannier functions ϕ α (r), we calculate the electron density ρ i (r) in real space for a given many-body wave function i as using the many body states Ψ i as obtained from the exact diagonalization of the Hamiltonian.For the spinresolved density ρ σ i (r) we use a similar expression yielding drρ ↑ i (r) = 2 and drρ ↓ i (r) = 1 in the m s = 1/2 channel.In addition to this, we also calculate the 1-RDMs (ρ i ) αβ = ⟨Ψ i |c † α c β |Ψ i ⟩ for each wave function Ψ i .Based on these quantities, that is, the many-body state energies, wave functions, 1-RDMs, and charge densities, we can unambiguously identify all states among all applied methods.A corresponding comparison between the many-body wave functions represented as linear combinations of Slater determinants from DFT+cRPA and the multi-Slater part of the trial wave functions for DMC is depicted in Fig. 10.The corresponding comparisons for the 1-RDMs and the many-body charge densities are given in the Supplemental Notes 1 and 2.

F. Real-space Fixed-node Diffusion Monte Carlo (DMC)
In order to obtain approximate many-body eigenstates using fixed-node diffusion Monte Carlo (DMC), we constructed the trial wave functions in the following multi-Slater-Jastrow form, where r i represents the real-space coordinates of the ith electron and R encompasses the configurations of all the electrons.The multi-Slater-determinant expansion, given by k c k D ↑ k (χ i↑ (r j )) D ↓ k (χ i↓ (r j )), was generated from restricted Hartree-Fock (RHF) followed by state-averaged complete active space self-consistent field (CASSCF) [72] using the PySCF package [73] with correlation-consistent effective core potentials, and a corresponding triple-zeta basis set [40].The system consists of 63 valence electrons, and in CASSCF, we chose an active space that includes two spin-up electrons, one spin-down electron, and five molecular orbitals that are The black horizontal lines group the many-body states into ground state, spin-flip excitations, and crystal-field excitation, whereas the grey horizontal lines group the degenerate states together.We can see that states 3 and 4 (degenerate in energy and correspond to the 2nd excited state, labeled by blue dashed lines in Fig. 2) and all the crystal-field excitations show significant contributions from the double excitations.
V-3d-like determined by the atomic valence active space (AVAS) procedure [74].For all the eigenstates, the parameters in the two-body Jastrow factor J, all the determinant coefficients c k , and the molecular orbitals in the active space were fully optimized using variational Monte Carlo [31] using the method described in Refs.[75,76].The excited states were optimized with the constraint that they are orthogonal to the optimized lower-energy eigenstates [33].After the trial wave functions Ψ T (of either the ground state or the excited states) were fully optimized, the lowest-energy wave functions that have the same nodal structure were projected out using DMC [31].We then compute the energies and reduced density matrices of the fixed-node wave functions of all the eigenstates of interest.All the variational and diffusion Monte Carlo calculations, including the optimization of trial wave functions and DMC, were performed using the PyQMC package [77] interfaced with PySCF.
Using reduced density matrices in natural orbitals to characterize many-body eigenstates For a wave function |Ψ⟩, its one-body and two-body reduced density matrices (1(2)-RDMs) in a given singleparticle basis are defined as where c † iσ (c iσ ) creates (annihilates) an electron in singleparticle orbital χ iσ .Almost all the one-and two-body observables can be computed using the 1-and 2-RDMs.For example, the spin-resolved electronic density is evaluated as To directly compare the observables that characterize the fixed-node eigenstates with those of the downfolded eigenstates, it is crucial to ensure that the RDMs are calculated using the identical set of single-particle orbitals.
Given that the downfolding and DMC calculations were performed using separated packages and employed a different basis to expand the active space (MLWFs for the downfolded results and CASSCF molecular orbitals for DMC), we rotate the RDMs obtained from each method to the same basis set to align the many-body eigenstates.
The natural orbitals are defined as the principal components, that correspond to the largest five singular values, of all the 1-RDMs up to the mth downfolded excited state [78], where ρ σ [Ψ a ] is eigenstate Ψ a 's 1-RDM written in the Wannier basis {ϕ i (r)}.We then transform all the 1and 2-RDMs computed from both downfolding and DMC to these natural orbitals.This ensures that we characterize downfolded and DMC eigenstates within the same Hilbert space.We found Tr (ρ downfolded [Ψ i ]) − Tr (ρ DMC [Ψ i ]) to be 0.1 ∼ 0.2, for all the Ψ i 's of interest (up to the 2nd charge excitation).With three electrons in the active space, this suggests that when using the MLWFs derived from Wannierization of Kohn-Sham orbitals to characterize the low-energy excitations in VCp 2 , we get about 5% error in terms of their 1-RDMs.
G. Auxiliary-Field Quantum Monte Carlo (AFQMC) We computed the low-energy many-body spectrum, including spin-flip, and crystal-field excitations, of the vanadocene molecule using phaseless AFQMC [30,79,80].All AFQMC calculations were performed using the Flatiron Institute, Center for Computational Quantum Physics's production quality AFQMC code.A detailed description of the AFQMC method was recently reviewed in great detail [79].AFQMC was recently used to compute the ionization potentials of several 3d-transition metal complexes, including vanadocene, producing results which compare favorably with experiment [34].Our AFQMC procedure, especially regarding the choice of trial wave function, is guided by that benchmark.AFQMC calculations were performed using multi-Slater determinant trial wave functions which were derived from CASSCF calculations.The active space for CASSCF was chosen using the atomic valence active space (AVAS) procedure [74] and the ANO-RCC basis to define V-3d reference atomic orbitals, with an AVAS threshold of 0.1.This yields an active space consisting of 13 active electrons and 15 active orbitals.CASSCF orbitals were optimized for the ground state, and excited-state CAS-wave functions were computed in one-shot state-specific CASCI calculations using the ground-state CASSCF orbitals to define the active space.AFQMC trial wave functions are truncated CAS-wave functions where Slater determinants with weight less than 0.0014 are discarded.For states with S = 3/2, including the ground state and both crystal-field excitations, the trial wave functions consisted of only about 100 Slater determinants.For all spin-flip excitations, the trial wave functions consisted of about 150 Slater determinants.

H. Equation-of-Motion Coupled Cluster (EOMCC)
Equation-of-motion coupled-cluster calculations were performed at the single-, and double-excitation level (EOM-CCSD), as implemented in PySCF [73].The correlation consistent effective core potential (ccECP) / pseudopotential and the corresponding ccECP-cc-pVQZ basis were used for all atoms.The ROHF ground state determinant, with M S = 3/2 and S 2 = 3.75, was used as a reference for EOM-CCSD.Due to the very large dimension of the Hilbert space in the ccECP-cc-pVQZ basis, we performed EOM-CCSD in a CAS-like active space with 502 active orbitals and including all electrons in the active space.We checked that the excitation energies are converged by comparing with similar calculations performed in an active space consisting of 452 orbitals.The average absolute deviation of the excitation energies, computed over the states in Figure 2, between EOM-CCSD(502o,63e) and EOM-CCSD(452o,63e) is 28 (15) meV with a maximum absolute deviation of 53 meV from the first crystal-field excitation.

DATA AVAILABILITY
The data that support the findings of this work are available from the corresponding author upon reasonable request.All ab initio and modelling results (AFQMC, DMC, EOM-CC, DFT, cRPA, and ED) are furthermore publicly available [63].

ACKNOWLEDGEMENT
We thank Yusuke Nomura for support for using RESPACK and Timothy Berkelbach for helpful discussions.YC was supported by the U.S.  In this work, the benchmark of all the DFT+cRPA downfolded models is based on comparing their eigenstates with the many-body wave functions from highly accurate first-principles many-body calculations (DMC).On one hand, the real-space DMC wave functions contain information on the probability density of all valence electrons in vanadocene (the core electrons are ignored in the ECPs), living in a Hilbert space of 63 electrons.On the other hand, the ED eigenstates are defined only on a 3-electron Hilbert space spanned by the 5 MLWFs.Therefore, to establish the one-to-one correspondence besides the energies and S 2 values, we compare the 1-RDMs computed using the same single-particle basis, which is chosen to be MLWFs here.
In Supplementary Figure 11, we show the state-wise comparison of the diagonal terms of the one-particle reduced density matrices (1-RDMs) in the spin-up channel for all the DFT+cRPA ED eigenstates and the DMC eigenstates.The mixed estimator error in the DMC 1-RDMs have been corrected using the extrapolated estimator, ρ DMC = ρ mixed + ρ † mixed − ρ VMC .The single-particle orbitals, labeled from 0 to 4, correspond to the five MLWFs shown in Fig. 1 of the main text.States with zero/finite occupancies in the e 1 manifold are categorized as spin-flip (SF)/crystalfield (CF) excitations.Note that this information is crucial for the benchmark, especially in the cases when the states can not be matched easily based on their energies, see, for example, Fig. 5 of the main text.

SUPPLEMENTARY NOTE 2: CHARGE DENSITY COMPARISONS
In Supplementary Figures 12 to 18 we show the one-particle density matrices together with the charge densities and their differences to the ground states (∆ ↑/↓ ) for all states discussed in the main text and as obtained from our DMC calculations (after projecting to the natural basis of the downfolded results) and from downfolding using the static cRPA Coulomb interaction.To also illustrate the spin and charge excitation characters of these states, we show the difference ∆ ↑ ± ∆ ↓ .For the spin-flip excitation, we expect ∆ ↑ + ∆ ↓ to be vanishingly small, while the change in the spin polarization, i.e., ∆ ↑ − ∆ ↓ should be non-zero.For crystal-field excitations we expect finite ∆ ↑ + ∆ ↓ .
The overall good qualitative agreement between the DMC and downfolded results shows that we can indeed map the downfolded states 1:1 to the corresponding reference states from DMC, which allows for more detailed discussions of their differences.
As an example, we briefly discuss the first spin-flip and crystal-field excitations shown in Figs. 13 and 17.For both excitations, all depicted quantities obtained via DMC and downfolding are in good qualitative and quantitative agreement.For the spin-flip excitation, we do not find any significant difference between the two methods: diagonal and off-diagonal one-particle matrix elements are nearly the same, and the resulting charge densities look very much alike.In the case of the crystal-field excitation, the agreement is also qualitatively good, while there are some quantitative differences.In both DMC and the downfolded model, the higher e 1 is occupied by approx. 1 electron with approx.2/3 in the up and 1/3 in the spin down channel.The relative occupations in the lower e 2 and a 1 orbitals, however, differ between the methods.The downfolded results underestimate the a 1 occupation and overestimate the e 2 occupations compared to the DMC.This level of agreement holds throughout nearly all investigated excitations.Only the second spin-flip excitation shows different characteristics in ∆ ↑/↓ and ∆ ↑ ±∆ ↓ .From the comparison of the one-particle density matrices, we find here again that for this excitation, the a 1 (e 2 ) occupations are underestimated (overestimated) using the downfolded Hamiltonian in comparison to DMC.This might be attributed to an overestimation of the e 2 -a 1 splitting in the spinrestricted DFT calculation.In fact, upon adding a crystal-field correction (H a1 CFC ), which moves the a 1 state below the e 2 , this qualitative difference between the DMC and downfolded result can be lifted.Whether this discrepancy results here from a shortcoming in the DFT starting point, from missing double counting corrections, from the chosen model Hamiltonian, or from less accurate DMC reference data cannot be judged at the moment.FIG.18. Second crystal-field excitations within DMC (top) and as obtained from downfolding using the MLWF basis, full static cRPA Coulomb matrix elements, and no double counting (bottom).We show sketches of the many-body state, spin-channel resolved one-particle reduced density matrices (ordered as e2, a1, e1), and visualizations of the corresponding many-body charge densities.

FIG. 1 .
FIG. 1. Single Particle Properties.Left: DFT Kohn-Sham energies.Purple, green, and red colors indicate the projected orbital weight of each Kohn-Sham state, being mostly H s, C p, or V d, respectively.Red lines mark single-particle energies resulting from diagonalization of the downfolded singleparticle Hamiltonian, which perfectly overlap with the Kohn-Sham ones.Right: Maximally localized Wannier functions used within the downfolding procedure.

FIG. 4 .
FIG. 4. Basis set influence on downfolded excitation energies.Comparison between maximally localized and first guessWannier function basis sets using static cRPA Coulomb interactions.Dashed and solid lines represent spin-flip (SF) and crystal-field (CF) excitations, respectively.(Light) Grey excitation energies could not be (have not been) characterized.

FIG. 8 .
FIG. 8. Single Particle Properties.Left: DFT Kohn-Sham energies.Red lines mark single-particle energies resulting from diagonalization of the downfolded Hamiltonian using first guess Wannier functions.Right: First guess Wannier functions used within the downfolding procedure.

FIG. 9 .
FIG. 9. Approximated frequency dependence of the monopole screened cRPA Coulomb matrix elements in the densitydensity channel .

FIG. 10
FIG. 10. (a,b)The squares of configuration interaction (exact diagonalization) coefficients (|ci| 2 ) of the optimized multi-Slater-Jastrow wave functions obtained using variational Monte Carlo and eigenstates obtained from ED using the model given by the MLWFs.The four panels correspond to the 0th order configuration interaction (purple), single excitations (green), double excitations (orange), and triple excitations (red) within the CAS(5, (2,1)) space.The tick labels on x axis, (i, j) (k), represent a Slater determinant c † k↓ c † j↑ c † i↑ |Φ0⟩ (see panel (e) for the MLWFs), where |Φ0⟩ is the Slater determinant of all the valence electrons doubly occupying the orbitals up to the V d shell.Panels (c) and (d) show the cumulated contributions in each order of CIs for the VMC wave functions and the model eigenstates.The black horizontal lines group the many-body states into ground state, spin-flip excitations, and crystal-field excitation, whereas the grey horizontal lines group the degenerate states together.We can see that states 3 and 4 (degenerate in energy and correspond to the 2nd excited state, labeled by blue dashed lines in Fig.2) and all the crystal-field excitations show significant contributions from the double excitations.

FIG. 11 .
FIG.11. the state-wise comparison of the diagonal terms of the one-particle reduced density matrices (1-RDMs) in the spin-up channel for all the DFT+cRPA ED eigenstates and the DMC eigenstates.The single-particle orbitals, labeled from 0 to 4, correspond to the five MLWFs shown in Supplementary Figure1.States with zero/finite occupancies in the e1 manifold are categorized as spin-flip (SF)/crystal-field (CF) excitations.

TABLE I .
Bare and cRPA screened density-density (viijj) and Hund's exchange (vijij) Coulomb matrix elements in the Wannier basis of the MLWF in eV.
TABLE II.Bare density-density (viijj) and Hund's exchange (vijij) Coulomb matrix elements in the Wannier basis of the FGWF in eV. .
static cRPA Coulomb interactions.Dashed and solid lines represent spin-flip (SF) and crystal-field (CF) excitations, respectively.Light grey excitation energies have not been characterized.
3.Restricting the cRPA screening predominantly to the monopole channel, such that pre- FIG.6.Static screening model influence on downfolded excitation energies using the MLWF basis and no double counting correction.Dashed and solid lines represent spin-flip (SF) and crystal-field (CF) excitations, respectively.Light grey excitation energies have not been characterized.

TABLE III .
Single-particle energies (in eV) from DFT output (KS), after wannierization (MLWF and FGWF), and including double counting corrections (MLWF+H).DFT energies are given for constrained and smeared occupations.
Department of Energy, Office of Science, Office of Basic Energy Sciences, Computational Materials Sciences Program, under Award No. DE-SC0020177 for the initial diffusion Monte Carlo calculations and analysis performed at the University of Illinois.EvL acknowledges support from the Swedish Research Council (Vetenskapsrådet, VR) under grant 2022-03090.CED acknowledges support from NSF Grant No. DMR-2237674.The Flatiron Institute is a division of the Simons Foundation.TW acknowledges support from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) through the cluster of excellence "CUI: Advanced Imaging of Matter" of the Deutsche Forschungsgemeinschaft (DFG EXC 2056, Project ID 390715994) and research unit QUAST FOR 5249 (project ID: 449872909; project P5).LKW was supported by a grant from the Simons Foundation as part of the Simons Collaboration on the manyelectron problem.MR thanks the Flatiron Institute for hospitality and acknowledges financial support from the Dutch research program 'Materials for the Quantum Age' (QuMat).This program (Registration Number 024.005.006) is part of the Gravitation program financed by the Dutch Ministry of Education, Culture and Science (OCW).