Superexchange mechanism and quantum many body excitations in the archetypal di-Cu oxo-bridge

The hemocyanin protein binds and transports molecular oxygen via two copper atoms at its core. The singlet state of the Cu2O2\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\rm{Cu}}}_{2}{{\rm{O}}}_{2}$$\end{document} core is thought to be stabilised by a superexchange pathway, but detailed in situ computational analysis is complicated by the multi-reference character of the electronic ground state. Here, electronic correlation effects in the functional site of hemocyanin are investigated using a novel approach, treating the localised copper 3d electrons with cluster dynamical mean field theory. This enables us to account for dynamical and multi-reference quantum mechanics, capturing valence and spin fluctuations of the 3d electrons. Our approach explains the stabilisation of the experimentally observed di-Cu singlet for the butterflied Cu2O2\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\rm{Cu}}}_{2}{{\rm{O}}}_{2}$$\end{document} core, with localised charge and incoherent scattering processes across the oxo-bridge that prevent long-lived charge excitations. This suggests that the magnetic structure of hemocyanin is largely influenced by the many-body corrections. The electronic structure of copper-based metalloproteins is important for their function in biological processes, but studying this from a computational standpoint can be challenging. Here, the authors report a dynamical mean field theory approach which reveals the role of strong electronic correlation effects in oxygen binding in the hemocyanin protein.

C opper-based metalloproteins play a major role in biology as electron or dioxygen (O 2 ) transporters. Haemocyanin (Hc) is one of three oxygen transporting proteins found in nature, alongside the iron-based hemerythrin and haemoglobin, and is common to a number of invertebrates, such as molluscs and arthropods. Deoxy-Hc employs two half-spin copper (I) cations, each coordinated by the imidazole rings of three histidine residues, to reversibly bind O 2 . Furthermore, various type 3 copper-based systems possess catalytic properties. Hc can decompose hydrogen peroxide into water and oxygen 1 and synthetic analogues have been shown to reversibly cleave the dioxygen bond 2a mechanism that enables tyrosinase and catechol oxidases to oxidise phenols 3 .
An accurate understanding of the electronic structure (spin and charge) of the Cu 2 O 2 core is essential to clarifying the operation of dioxygen transport and would advance the design of synthetic catalysts that employ dioxygen as a terminal oxidant. There is significant interest in the biomimetic application of naturally occurring metal complexes for use in metallodrug design, with Cu (II) complexes recently employed in cancer therapeutics as artificial DNA metallonucleases 4 and tyrosinase mimics 5 .
However, the formation of oxygenated haemocyanin (oxyHc) via the binding of O 2 to deoxygenated haemocyanin (Hc) remains a challenging problem. In particular, the binding of O 2 falls into the category of a spin-forbidden transition. Molecular O 2 is in a spin triplet configuration, and the Cu ions in deoxyHc are known to be in the Cu(I) d 10 singlet configuration. The combination of triplet O 2 and singlet deoxyHc, to produce the Cu 2 O 2 antiferromagnetic singlet in oxyHc, is believed to occur via a simultaneous charge transfer of one electron from each Cu(I) ion to O 2 forming a hybrid Cu(II)-peroxy-Cu(II) configuration. A superexchange pathway is hypothesised to form across the two Cu atoms 6 . This mechanism is supported by superconducting quantum interference device (SQUID) measurements that report a large superexchange coupling 7 between the two Cu centres, and a diamagnetic ground state 8 .
Despite intensive theoretical studies on the nature of the sideon coordinated Cu 2 O 2 core, theoretical analysis has so far proved to be challenging for many electronic structure methods including ab initio quantum chemistry, density functional theory (DFT) and mixed quantum/classical (QM/MM) methods due to the complex simultaneous interplay of both the charge and spin, and to the multi-reference character of oxyHc. In particular, DFT and hybrid-DFT do not predict the correct singlet ground state due to the multi-reference nature of the ground state that is not accessible in DFT-based approaches [9][10][11] . To overcome the limitation of DFT techniques, a spin-projection method (also called spinmixing) is often applied, whereby the different spin-polarised ground states are calculated individually 12 , and the entangled singlet is reconstructed by linear combination of the respective Slater determinants (essentially a combination of the spin-broken symmetry state in the up-down, up-up and down-down configurations to extract an effective singlet state). Although this construction yields insights into the energetics, it does not allow for the study of excitations, and limits the scope of comparison with experimental data, such as the optical absorption 10 , that is a stringent test of any theory. Furthermore, the spin-contamination present in spin-polarised hybrid DFT remains an issue 6,11,[13][14][15] , and typically the broken symmetry state wrongly becomes asymmetric in the oxyHc Cu 2 O 2 core 10 . Finally, experiments have also alluded to the necessity of characterising the oxyHc ground state as a mixed valence state 16 , which cannot be accessed in the ground state DFT approach.
Meanwhile, multi-reference wave function methods have been used to extensively study the oxyHc core 13,14,17-22 , but these approaches are not feasible for systems containing more than several dozen electrons. Flock and Pierloot 18 argue that the inclusion of the imidazole ligands results in steric effects that are critical for a realistic description of oxygen containing dicopper systems, but most multi -reference wave function studies of this  core consider a simplified model with ammonia ligands  (including CC 13,14 , CASPT2 18 , MRCI 19 , RASPT2 20 , and DMRG-CASPT2 21 ), while others (such as DMRG 22 and DMRG-CT 23 ) are limited to the experimentally inaccessible bare Cu 2 O 2 2þ core. Furthermore, this system has large active space requirements given that it likely suffers from triplet instability 11 , and if the number of allowed excitations is too limited, size-extensivity errors arise 20 . Some of these methods also lack dynamical correlation contributions [24][25][26] , and others strongly over-correct correlation effects 19 .
In this work, we apply an alternative strategy: dynamical meanfield theory (DMFT) 27 . This method belongs to a broad category of embedding approaches and accounts for the limitations discussed above by treating the many-body effects and the superexchange of the di-Cu bridge explicitly (unlike DFT), while limiting this treatment to the correlated subspace of the copper 3d electrons, thereby side-stepping the prohibitive scaling of quantum chemistry methods. DMFT is a sophisticated method that includes quantum dynamical effects, and takes into account charge fluctuations, spin fluctuations, and thermal excitations. Although DMFT is routinely used to describe materials, it was also recently extended to molecular systems 28,29 , and combined with the linear-scaling DFT software ONETEP to extend the applicability of DFT + DMFT to systems of unprecedented size [30][31][32][33][34] .
In order to characterise the superexchange mechanism between the Cu 2 d-orbitals and intermediate O p-orbitals, we must also use non-local DMFT (or "cluster DMFT"). Single-site DMFT invokes a mean-field approximation across the multiple correlated sites, and thus it can only treat the multiplet structure of each Cu atom separately. This is not the case for cluster DMFT, where all correlated sites are directly treated in a multi-site Anderson impurity model (AIM)-hence our approach might be denoted as DFT + AIM. We also carry out a self-consistency cycle over the charge density, as we work at fixed total number of electrons. This produces a feedback to the solution of the model that is similar in its nature to the mean-field approximation used in DMFT.
Our DMFT + cluster DMFT approach identifies a regime where many-body effects-as characterised by the Hubbard interaction U-stabilise the singlet for the butterflied Cu 2 O 2 structure, in line with experiment. We show that a dominant spin singlet state is maximised at U ¼ 8 eV and discuss the validation of the model by optical experiments. The obtained singlet is in the Heitler-London regime (an entangled quantum superposition of two localised magnetic moments), and is associated with incoherent scattering processes that reduce the lifetime of charge excitations.

Results
The spin state of the Cu 2 O 2 core. This work presents the first DFT + DMFT simulations on a 58-atom model of the oxyHc functional complex (Fig. 1). While 58 atoms is within the reach of some less accurate quantum chemistry methods, the overhead of extending DFT + DMFT to include much more of the protein environment would come at an insignificant computational cost given our use of linear-scaling DFT.
In vivo, the Cu 2 O 2 core exists in a low-spin (singlet) state, as identified by EPR 35 . In the model of Metz and Solomon, this lowspin state is stabilised by superexchange via the O 2 ligand orbitals, which relies on the Cu 2 O 2 core being planar (Fig. 2a, b) 10 . As the peroxide molecule unbinds, the core butterflies (i.e. the dioxygen moves up out of the plane, leaving the core in a bent configuration). Here, each Cu overlaps with a different oxygen π Ã orbital on the peroxide (Fig. 2c, d). This removes the superexchange, and the triplet state becomes most favourable. If we measure planarity by R ¼ j 1 2 ðr CuA þ r CuB Þ À 1 2 ðr O1 þ r O2 Þjthat is, the distance between the mean position of the two copper atoms and the mean position of the two oxygen atoms-the singlet-to-triplet transition occurs at R = 0.6 Å using the B3LYP DFT functional 10 .
However, X-ray structures of the Cu 2 O 2 core reveal that the bound singlet state is not planar. In oxyHc R = 0.47 Å, and in oxyTy R = 0.63 Å-beyond the predicted singlet-to-triplet transition 36,37 . QM/MM studies of the entire oxyHc protein (from which our model complex derives) obtain R = 0.54 to 0.71 Å; evidently, the protein scaffolding around the binding site prevents the core ever reaching the planar structure observed in model complexes 11 .
With this in mind, we examined the reduced density matrix of our butterflied model (R = 0.68 Å). This provides a detailed picture of the electronic structure of the 3d subspace of the Cu atoms (Fig. 3). Within DMFT, the density matrix is obtained by tracing the Anderson impurity model over the bath degrees of freedom, and gives an effective representation of the quantum states of the two Cu atoms. Note that in our approach, the ground-state wave function is not a pure state with a single allowed value for the spin states (singlet, doublet, triplet, etc.), yet we can describe the fluctuating spin states of the Cu atom by analysing the spin distribution obtained from the dominant configurations.
DMFT allows us to systematically investigate how many-body and finite-temperature effects alter these results. The Hubbard interaction U is known to be paramount to capture many-body effects. Several competing and relevant many-body phenomena stem from this term, including charge localisation, exchange of electrons, charge-transfer excitations, and stabilisation of magnetic multiplets. Consequently, the electronic structure of the 3d subspace is highly sensitive to the magnitude of this parameter.
Although typical values for U can be obtained by linear response or constrained RPA 38,39 , these predictions are typically dependent on the choice of local orbitals and basis representation. In this work we instead consider a range of values for U (from 0 to 10 eV). By artificially manipulating the magnitude of the local many-body effects, we can systematically investigate their influence on the electronic structure and derived properties.
The effect of U on the character of the reduced density matrix is shown in Fig. 3. In the weakly correlated regime (U < 2 eV), we obtain a large contribution from the d 20 and d 19 configurations, indicating that the average charge transfer from the Cu to O 2 involves less than one electron per Cu, thus preventing the formation of a singlet (as the Cu 3d orbitals are nearly full).
As U increases, the total electronic occupation of the Cu dimer decreases ( Fig. 3 and Supplementary Table 1). In the range U = 6 − 8 eV, the d 18 singlet component is maximised, beyond which d 18 triplet excitations begin to contribute. The existence of this singlet is corroborated by the observed local magnetic moment and spin correlation (Supplementary Note 1 and Supplementary  Fig. 1). Interestingly, the von Neumann entropy (Supplementary Note 2 and Supplementary Fig. 2) grows as U increases, pointing to the importance of many low-lying quantum states. Noting that Fig. 1 The oxyHc functional complex. Structure showing the Cu 2 O 2 correlated subsystem, which is treated using dynamical mean field theory, and the surrounding imidazole rings representing the protein environment. Hydrogen, carbon, nitrogen, oxygen and iron atoms are shown in white, green, blue, red, and orange respectively.  10 . The Cu 2 O 2 core is depicted from side-on (a, c) and above (b, d). In the planar configuration (a, b), single-ligand orbitals bridge the two copper sites, and superexchange is possible. In a bent configuration (c, d), the copper d orbitals overlap with different π Ã orbitals. As these two sets of orbitals are orthogonal, hopping between the blue and the red subspaces is not possible and the superexchange mechanism breaks down. U % 8 À 8:5 eV in the case of both molecules 40 and solids 41 , it appears that in nature many-body quantum effects stabilise the low-spin singlet in spite of the butterflied structure of the Cu 2 O 2 core (Fig. 2c, d).
The superexchange mechanism. Having identified this singlet in the Cu 2 O 2 core at U % 8 eV, let us establish how it forms. Direct hopping between localised Cu d-orbitals is very unlikely due to the large distance by which they are separated, and therefore hopping must proceed via an intermediate oxygen p-orbital (i.e. superexchange) 42 .
The superexchange process can be pictured in the canonical hydrogen atom dimer system. This system describes a pair of up and down electrons that form a singlet state. In this picture, two different limits are possible: (i) when the H atoms form a bond at short distance, and the up and down electron form a delocalised bound singlet (BS) centred on the bond, with a high degree of double occupancy, (ii) in the dissociated case, known as the Heitler-London (HL) limit, where the H atoms are far apart and the singlet is a true quantum entangled state of the singly occupied H orbitals. In the latter case, the molecular orbital contains only one electron and double occupancy is zero.
In the HL limit, which typically occurs in the limit of dissociation, the charge is localised around the H atoms. However, this effect may also occur in systems where the local Hubbard Coulomb repulsion U acts as a Coulomb blockade: many-body effects prevent long-lived charge transfer excitations, and the Coulomb repulsion energy is reduced at the expense of the kinetic energy. A signature of the blockade is typically a large increase in the self energy at the Fermi level (pole structure), indicating charge localisation and incoherent scattering associated with a short lifetime of charge excitations.
To investigate the nature of the singlet (BS or HL), we report in Fig. 4a the computed self energy of the Cu 3d subspace, for various values of U. We obtain a qualitative difference between U = 6 eV and U = 8 eV, where at U = 8 eV the self energy develops a pole at ω ¼ 0. The formation of the pole is particular to U = 8 eV (Fig. 4b), and is associated with the regime where excitations are incoherent, which prevents long-lived charge transfer excitations from the Cu 3d orbitals to O 2 . In this limit, the many-body effect acts as a Coulomb blockade and the charge is in turn localised, with weak direct coupling (HL limit). For U = 6 eV, the singlet is in the BS limit, where charge excitations allow a direct electron transfer across the oxo-bridge. Note that the observation of the BS-HL crossover is not apparent in averaged quantities, such as in the double occupancies (Fig. 4c), which evolve smoothly with the Coulomb repulsion.
Turning back to the physical system, superexchange relies on a single-ligand orbital linking the two copper sites. (Superexchange pathways via two p orbitals are possible, but they give rise to ferromagnetic coupling that is significantly weaker than antiferromagnetic coupling from single-ligand orbital pathways 43 .) Examination of the molecular orbitals near the Fermi level (Fig. 5) reveals that, for the HOMO of the bent Cu 2 O 2 structure, electronic density of the oxygen ligand is directed into the copper plane, thus providing a pathway for the antiferromagnetic superexchange that we observe.
Interestingly, we note that these molecular orbitals differ in their energetic ordering compared to those from DFT studies of planar model complexes 8 . In particular, the HOMO involves hybridisation with oxygen π Ã rather than σ Ã orbitals (with the σ Ã oxygen orbital featuring approximately 3 eV above the Fermi level). This reordering will have substantial ramifications for the potential catalytic pathways, especially considering the importance of the σ Ã orbital to bond-breaking. This picture is confirmed by natural bond orbital (NBO) analysis, the results of which are presented in Supplementary Note 3.  Optical transitions. As a validation of the DFT + DMFT computational model, and to identify the strength of correlations in oxyHc, we extracted the optical absorption spectrum of ligated haemocyanin (Fig. 6).
As experiments are performed in the gaseous phase and not in a single crystal, we have calculated the isotropic and anisotropic components of the dielectric tensor. The former involves only pair correlators along the same spatial directions, whereas the latter also incorporates non-diagonal terms (note that both are causal quantities), which are important in oxyHc as the local coordination axes of the Cu atoms are not aligned. Remarkable agreement is obtained for U = 8 eV, with however a blue shift at high frequency (ω > 4 eV), and a concomitant transfer of optical weight to lower energies (ω < 3 eV). The position of the peak at approximately 3.5 eV is in excellent agreement. The experimental features at 3 eV (see arrow) are also visible in the theoretical calculations. The peaks at 1.8 and 2.2 eV contribute to the long infrared tail of the optical weight down to 1.5 eV. We note that the main difference between the BS (U 6 eV) and the HL singlet (U = 8 eV) is the presence of several large peaks associated with charge transfer from ligand to Cu 3d orbitals below 2.5 eV, which are significantly smaller/absent in experiment 45,46 . The suppression of the optical weight at 2 eV is due to a large increase in incoherent scattering at ω ¼ 0 eV at U = 8 eV (Fig. 4), associated with the localisation of the holes in the Cu 3d shell at U = 8 eV. The strong suppression of the optical weight in the near infrared regime and the consistent agreement with our calculations shows that the di-Cu singlet is in the HL regime. In comparison, DFT, without extensions, puts a strong emphasis on the near infrared peak in the optical absorption 10 because the aforementioned scattering processes are absent at this level of theory. (See also the very small HOMO-LUMO gap in Supplementary Fig. 4 for small values of U.) The absorption spectrum of this protein has been reported experimentally to be qualitatively dependent on its ligation state. In its oxygenated form, a weak peak at 570 nm (approximately 2.2 eV, see arrow) is attributed to ligand-to-metal charge transfer 45,46 from the O 2 π anti-bond with lobes oriented perpendicular to the metal atoms. This orbital is denoted "π Ã v "; the π anti-bond with lobes directed towards the copper atoms is denoted π Ã σ and is responsible for an experimentally observed (and much larger) peak at approximately 3.6 eV. Our calculations accurately reproduce this large peak, associating it with a transition from the weight at −1 eV in the density of states ( Supplementary Fig. 4), which is localised on Cu A /Cu B /imidazole, to the LUMO, which in our calculations is a hybridised state between the Cu and the π Ã σ O 2 , with dominant Cu 3d character (Fig. 5b, d). Meanwhile, a small peak at 2 eV exists corresponding to the HOMO-to-LUMO transition (π Ã v to Cu charge transfer). Overall, comparison with the experimental spectrum suggests U ! 8 eV which is compatible with the value of U that gives rise to the singlet. We note that time-dependent density functional theory (TDDFT) calculations reproduce the absorption peak at 3.6 eV, but not the peak at 4.5 eV. More importantly, the ground state electronic structure at this level of theory is not the antiferromagnetic singlet state observed experimentally; thus, agreement between TDDFT and experiment likely involves some cancellation of errors.

Discussion
We have presented the application of a DFT + cluster DMFT approach, designed to treat strong electronic interaction and multi-reference effects, to oxyHc, a molecule of important biological function. The reduced density matrix of the 3d subspace of the two Cu atoms revealed the presence of fluctuating spin-states, in which a Cu 2 d 18 singlet component is maximised at U = 8 eV in spite of the butterfly distortion of the Cu 2 O 2 core. The Hubbard U is necessary to capture the multi-reference character of the ground-state, placing oxyHc in the limit of a true quantum entangled singlet in the limit of the Heitler-London model, while the highest occupied molecular orbital has been shown to provide a pathway for antiferromagnetic superexchange. This work provides a starting point for studying biological activity of oxyHc and related type 3 Cu-based enzymes by (a) establishing that the singlet can survive the butterfly distortion, thereby resolving a prior inconsistency between structural data, spectroscopy, and first-principles calculations, and (b) by providing a framework for subsequent studies to account for the effect of the protein "scaffolding" in which the active site sits, as well as the effects of strong electronic correlation. Our approach reproduced the experimentally observed peaks in the absorption spectrum at around 2.2, 3.7, and 4.5 eV.
The DFT + DMFT method provides a useful middle-ground between (a) density-functional theory-based simulations of metalloproteins, which depend heavily on the choice of functional and incorrectly describe multi-reference effects, and (b) single-or multi-reference quantum chemistry approaches, whose unfavourable scaling prevents studies on models of realistic size. More generally, our particular implementation of DMFT within linearscaling DFT software 32 can account for strongly correlated  Note that because these are extracted from the Green's function via the spectral density, the phase of the orbitals is inaccessible. electronic behaviour while simultaneously including the effects of protein environments, making it ideally suited for studying biological activity in a wide range of transition metal-containing proteins 47,48 .

Methods
Geometry. The geometry of the 58-atom system was obtained from the literature 11 . This had been optimised using the B3LYP hybrid functional and closely matches the experimentally observed structure 11,36 . The coordinates of this structure are provided in Supplementary Note 4.
DFT. The DFT ground-state was obtained using ONETEP 30 , which is a linearscaling DFT code that is formally equivalent to a plane-wave method. Linear-scaling is achieved by the in situ variational optimisation of its atom-centred basis set (spatially truncated nonorthogonal generalised Wannier functions, or NGWFs) 49 . The total energy is directly minimised with respect to the NGWFs and the singleparticle density matrix. The use of a minimal, optimised Wannier function representation of the density-matrix allows for the DFT ground state to be solved with relative ease in large systems. This is particularly useful in molecules, since explicit truncation of the basis functions ensures that the addition of vacuum does not increase the computational cost 50 .
The DFT calculations of the oxyHc system were run with an energy cut-off of 897 eV, using the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional 51 . Nine NGWFs were employed on the copper atoms, four on each carbon/nitrogen/oxygen, and one on each hydrogen. Spin symmetry was imposed. NGWFs were truncated using 7 Å cut-off radii. Open boundary conditions were achieved via a padded cell and a Coulomb cut-off 52 . The Hubbard projectors were constructed from the Kohn-Sham solutions to an isolated copper pseudopotential 53 . The pseudopotentials were generated with the OPIUM pseudopotential generation project 54 . These pseudopotentials partially account for scalar relativistic effects. (Studies have demonstrated that relativistic effects can play a role in the electronic structure of the Cu 2 O 2 core 55 .) DMFT. We refined our DFT calculations using the DFT + DMFT method 27,56 in order to obtain a more accurate treatment of strong electronic correlation effects. The oxyHc model was mapped, within DMFT, to an Anderson impurity model (AIM) Hamiltonian 57 , and we used a recently developed extended Lanczos solver 58 to obtain the DMFT self energy. The convergence of the mapping is shown in Supplementary Fig. 5 and discussed in Supplementary Note 5.
To identify the best spatial representation of the local d-space in the projected Anderson impurity model, we first identified the orthogonal transformation that reduces the off-diagonal elements of the local Green's function, for respectively the Cu A and Cu B sites. Then, we implemented a minimisation procedure that finds the closest corresponding real space SOð3Þ rotation of the local Cartesian axis corresponding to the Oð5Þ orthonormal transformation in d-space. This provides a set of local axes, different on each Cu atom, which make the local Green's function nearly diagonal in frequency space. These axes are plotted in Supplementary Fig. 6 and discussed in Supplementary Note 6. As a result, we observed that the Cu A and Cu B holes are of pure orbital character within this local axis representation. The validity of this approach was vindicated by the NBO analysis, which when performed on the DMFT density revealed a hole in one 3d orbital for each Cu atom, confirming the expected Cu(II) oxidation state 3d 9 4s 0 (see Supplementary Note 3 for details).
Since only a single impurity site (3d orbital subspace) is present, the system becomes crystal momentum independent in the molecular limit, and since the Kohn-Sham Green's function is computed in full before projection onto the impurity subspace, the Anderson impurity mapping is effectively exact, and the necessity of invoking the DMFT self-consistency is not required. However, in DFT + DMFT there is also a charge self-consistency cycle, where (i) the chemical potential can be updated to ensure particle conservation and/or (ii) the DFT + DMFT density kernel can be used to generate a new Kohn-Sham Hamiltonian, which in turn provides a new input to DMFT; the procedure being repeated until convergence is achieved [59][60][61] . In this work, we updated the chemical potential but not the Hamiltonian due to computational cost.
To obtain the Kohn-Sham Green's function, we performed the matrix inversion, as well as all matrix multiplications involved in the DMFT algorithm, on graphical processing units (GPUs) using a tailor-made parallel implementation of the Cholesky decomposition written in the CUDA programming language. Electronic correlation effects are described within the localised subspace by the Slater−Kanamori form of the Anderson impurity Hamiltonian 62,63 , specifically: where m; m 0 are orbital indices, d mσ (d y mσ ) annihilates (creates) an electron with spin σ in the orbital m, and n m is the orbital occupation operator.
The first term describes the effect of intra-orbital Coulomb repulsion, parameterised by S m ¼ 1 2 d y ms σ ! ss 0 d ms 0 , and the second term describes the interorbital repulsion, proportional to U 0 ¼ U À 2J, which is renormalised by the Hund's exchange coupling parameter J in order to ensure a fully rotationally invariant Hamiltonian (for further information on this topic, we refer the reader to Imada et al. 64 ) The third term is the Hund's rule exchange coupling, described by a spin exchange coupling of amplitude J. S m denotes the spin corresponding to orbital m, so that S m ¼ 1 2 d y ms σ ! ss 0 d ms 0 , where σ ! is the vector of Pauli matrices.
Our DMFT calculations were carried out at room temperature, T = 293 K and the Hubbard U was varied in the range 0-10 eV, with a fixed Hund's coupling J = 0.8 eV. The theoretical optical absorption was obtained in DFT + DMFT within the linear-response regime (Kubo formalism), in the no-vertex-corrections approximation 65 , which is given by: Z dω 0 f ðω 0 À ωÞ À f ðω 0 Þ ω ðρ αβ ðω 0 À ωÞv βγ ρ γδ ðω 0 Þ Á v δα Þ; where the factor of two accounts for spin-degeneracy, Ω is the simulation-cell volume, e is the electron charge, _ is the reduced Planck constant, f ω ð Þ is the Fermi-Dirac distribution, and ρ αβ is the density-matrix given by the frequencyintegral of the interacting DFT + DMFT Green's function. The matrix elements of the velocity operator, v αβ , noting that we do not invoke the Peierls substitution 65 , are given by: This expression is general to the NGWF representation used in this work 66 , where the contribution to the non-interacting Hamiltonian due to the non-local part of the norm-conserving pseudopotentials 67,68 , represented byV nl , is included. Finally, the double-counting correction E dc must be introduced, since the contribution of interactions between the correlated orbitals to the total energy is already partially included in the exchange-correlation potential derived from DFT. The most commonly used form of the double-counting term is 69 where with N being the number of orbitals spanning the correlated subspace (and recall that U 0 ¼ U À 2J). This double-counting is derived by attempting to subtract the DFT contributions in an average way; U av is the average of the intra-and interorbital Coulomb parameters.

Data availability
The data underlying this publication are available on the Apollo-University of Cambridge Repository: https://doi.org/10.17863/CAM.46223.

Code availability
The ONETEP code version 5 is available from www.onetep.org.