Electronic excited states in deep variational Monte Carlo

Entwistle, M. T.; Schätzle, Z.; Erdman, P. A.; Hermann, J.; Noé, F.

doi:10.1038/s41467-022-35534-5

Open access
Published: 17 January 2023

Nature Communications volume 14, Article number: 274 (2023)

Abstract

Obtaining accurate ground and low-lying excited states of electronic systems is crucial in a multitude of important applications. One ab initio method for solving the Schrödinger equation that scales favorably for large systems is variational quantum Monte Carlo (QMC). The recently introduced deep QMC approach uses ansatzes represented by deep neural networks and generates nearly exact ground-state solutions for molecules containing up to a few dozen electrons, with the potential to scale to much larger systems where other highly accurate methods are not feasible. In this paper, we extend one such ansatz (PauliNet) to compute electronic excited states. We demonstrate our method on various small atoms and molecules and consistently achieve high accuracy for low-lying states. To highlight the method’s potential, we compute the first excited state of the much larger benzene molecule, as well as the conical intersection of ethylene, with PauliNet matching results of more expensive high-level methods.

Introduction

The fundamental challenge of quantum chemistry, solid-state physics, and many areas of computational materials science is to obtain solutions to the electronic Schrödinger equation for a given system, which in principle provides complete access to its chemical properties. The ground and low-lying excited states typically determine the behavior of a system and are therefore of the most interest in many applications. Understanding and being able to describe excited-state processes¹, including a wide variety of important spectroscopy methods such as fluorescence, photoionization, and optical absorption of molecules and solids, is key to the successful design of new materials.

Unfortunately, the Schrödinger equation cannot be solved exactly except in the simplest cases, such as one-dimensional toy systems or a single hydrogen atom. Accordingly, many approximate numerical methods have been developed which provide solutions at varying degrees of accuracy. Time-dependent density functional theory^2,3 (TDDFT) is the most popular method due to its computational efficiency but has well-known limitations^4,5,6,7,8,9. Higher-accuracy methods have a computational cost that scales rapidly with system size: the well-established full configuration interaction¹⁰ (FCI) and coupled cluster¹¹ (CC) techniques scale as $\sim\mathcal{O}(\exp(N))$ and $\sim\mathcal{O}(N^{5-10})$, respectively, where N is the number of electrons, thereby severely limiting their practical use. (FCI scales exponentially, while truncated CI scales polynomially. The scaling of CC depends on the particular method used: CC2 $\mathcal{O}(N^{5})$, CCSD $\mathcal{O}(N^{6})$, CCSD(T) $\mathcal{O}(N^{7})$, CC3 $\mathcal{O}(N^{7})$, CCSDT $\mathcal{O}(N^{8})$, CCSDT(Q) $\mathcal{O}(N^{9})$, CCSDTQ $\mathcal{O}(N^{10})$.) There is thus a huge need for ab initio methods that scale more favorably with system size, allowing the modeling of practically relevant molecules and materials.

Quantum Monte Carlo (QMC) techniques offer a route forward with their favorable scaling (${{{{{{{\mathcal{O}}}}}}}}({N}^{3-4})$) and therefore dominate high-accuracy calculations where other methods are too expensive^12,13. A state-of-the-art QMC calculation typically involves the construction of a multi-determinant baseline wavefunction through standard electronic-structure methods, which is augmented with a Jastrow factor to efficiently incorporate electron correlation, and then optimized through variational QMC (VMC) to obtain a trial wavefunction. This is then used within fixed-node diffusion QMC (DMC) to obtain a final electronic energy. The fixed-node approximation is used to avoid exponential scaling, with the drawback that the nodal surface of the trial wavefunction cannot be modified, which limits the accuracy of the DMC result¹⁴. A more expressive baseline wavefunction can improve upon this but traditional DMC often needs thousands to hundreds of thousands of determinants to reach convergence¹⁵. Additionally, DMC only provides the final energy, restricting the calculation of other electronic properties¹⁶. Both of these limitations can, in principle, be resolved at the VMC level, with the accuracy of VMC constrained only by the flexibility of the trainable wavefunction ansatz. So far, these techniques have mostly been developed for ground-state calculations, with different extensions proposed to address excited states^{12,17,18,19,20,21,22,23,24,25,26}.

Recently, the new ab initio approach of deep VMC methods has been introduced^27,28,29,30 and subsequently further extended and improved^31,32,33. In particular, PauliNet²⁷ and FermiNet²⁸ were the first methods to demonstrate that highly accurate ground-state results for molecules could be obtained using deep VMC with lower computational complexity and using orders of magnitude fewer Slater determinants than typically employed in other methods that achieve similar accuracy.

In the same spirit as Carleo and Troyer proposed for optimizing quantum states in lattice models³⁴, VMC is used to train a neural network model that represents the many-body wavefunction in an unsupervised fashion. In contrast to other quantum machine learning approaches, the only input to the method is the Hamiltonian, and training data are generated on the fly by sampling from the current wavefunction model while the variational energy is minimized. In both PauliNet and FermiNet, deep antisymmetric neural networks are used to represent the fermionic wavefunction in the real space of electron coordinates.

Recently, there has been much interest in developing deep learning methods for excited states³⁵. In this paper, we extend PauliNet towards the ab initio computation of electronic excited states (see the “Methods” section for details). The input is again only the Hamiltonian of the quantum system. By employing a simple energy minimization and numerical orthogonalization procedure, we are able to obtain the lowest excited-state wavefunctions of a given system. The excited-state optimization makes use of a penalty method that minimizes the overlap between the nth excited state and the lower-lying states in the spectrum. Optimization methods that introduce additional constraints have been used in the context of VMC before²⁶ and provide a simple way to obtain orthogonal states without explicit enforcement in the wavefunction ansatzes. Combining these techniques with the expressiveness of neural network ansatzes yields highly accurate approximations to excited states with direct access to the wavefunctions for the evaluation of electronic observables. Neural network-based methods have targeted low-lying excited states of one-dimensional lattice models²⁵, but have not been applied to first-principles systems.

We demonstrate our method on a variety of small- and medium-sized molecules, where we consistently achieve highly accurate total energies, outperforming traditional quantum chemistry methods. We also compute excitation energies, transition dipole moments, and oscillator strengths, the main ground-to-excited transition properties, with the latter two known to be more sensitive to errors in the underlying wavefunctions than energies. In all test systems, we find PauliNet closely matches high-order CC and experimental results. Next, we show that our method can be applied in a straightforward manner to much larger molecules, using the example of benzene where we match significantly more expensive high-level electronic-structure methods. Finally, we demonstrate that PauliNet can be used to compute excited-state potential energy surfaces by modeling an avoided crossing and conical intersection of ethylene, a highly multi-referential problem.

Results

Nearly exact solutions for small atoms and molecules

To demonstrate our method we start by applying it to a range of small atoms and molecules. We optimize the lowest-lying excited states and compute their vertical excitation energies for the ground-state equilibrium geometry (see Supplementary Table I), with each PauliNet wavefunction containing a maximum of 10 determinants. In all systems, we obtain highly accurate total energies and estimates of the first few excitation energies competitive with high-accuracy quantum chemistry methods.

In Fig. 1 the excitation energies of the lowest states are shown for several atoms. For all the atoms the excitation energies are obtained within 4 mHa of the theoretical best estimates (TBE)³⁶. Due to the high degree of symmetry the atoms exhibit degeneracies, that is, multiple orthogonal states can be found with the same energy. Being subject to the orthogonalization constraint, PauliNet approximates all orthogonal states of an energy level individually, which is observed by attaining multiple results at the same energy level. The multiplicity of the exact solution can be obtained theoretically by considering the electronic configurations of the atoms and is reproduced within our experiments.

**Fig. 1: Deep VMC obtains highly accurate excited states for single elements.**

We then compute a larger number of excited states for LiH, BeH and Be. In each experiment, we optimize eight ansatzes in parallel. In Fig. 2 we illustrate the training process by plotting the convergence of the total energies and excitation energies. Additionally, we plot the training estimates of the pairwise overlaps of the wavefunctions, which remain small throughout the optimization process. We confirm that the final overlaps are near-zero by exhaustively sampling the trained wavefunctions, thereby obtaining well-converged Monte Carlo estimates (see Supplementary Table VI). Based on the degeneracies we find a total of five (LiH), four (BeH), and three (Be) distinct excitation energies, respectively. The excitation energies match those from reference values, and in particular, we find that for all systems studied here we reliably obtain the first excited state, and apart from one case also the second excited state. However, especially for clusters of higher-lying excited states with similar energies, we typically do not find all members of the cluster. In these cases, which states are found depends on the initialization of our ansatzes, as well as the total number of states that are being sought. To give a transparent picture of the capabilities of our method, in this work we have refrained from optimizing the CASSCF baseline in order to find all possible excitations.

**Fig. 2: Optimizing low-lying excited states for small molecules.**

Highly accurate wavefunctions: transition dipole moments and oscillator strengths

Total energies and vertical excitation energies are the primary focus when benchmarking excited-state methods as they are readily available from many theoretical models and provide a good initial guess of a particular method’s accuracy. However, they provide only a partial characterization of the electronic states, and while a method in question may give accurate energies, other quantities of key importance may be inaccurate^37,38,39.

Transition dipole moments (TDM) and oscillator strengths are two principal ground-to-excited transition properties and are of great interest. TDMs determine how polarized electromagnetic radiation will interact with a system due to its distribution of charge, and therefore determine transition rates and probabilities of induced state changes. In the electric dipole approximation, the TDM between two states i and j is given by

$$\mathbf{d}_{ij}=\langle\psi_{i}|\hat{\boldsymbol{\mu}}|\psi_{j}\rangle,$$

(1)

where $\hat{{{{{{{{\boldsymbol{\mu }}}}}}}}}={\sum }_{k}q{\hat{{{{{{{{\bf{r}}}}}}}}}}_{k}$ is the sum over the position operator of each particle weighted by its charge, with q = −e for electronic systems. We obtain the expectation value by Monte Carlo sampling according to Eq. (15). While the TDM is important for understanding a number of processes, including optical spectra, it is generally a complex-valued vector quantity and not an experimental observable by itself. The closely related oscillator strength is what is inferred through the experiment and is given by

$${f}_{ij}=\frac{2}{3}\Delta E{d}_{ij}^{2},$$

(2)

where ΔE is the excitation energy between states i and j, and ${d}_{ij}^{2}$ is the dipole strength. It is known that, in addition to being more basis-set sensitive, d_ij and f_ij are both highly dependent on the quality of the trial wavefunctions⁴⁰ and represent a more rigorous test for ab initio methods than just energies.
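As a quick numerical illustration of Eq. (2), the following minimal Python sketch evaluates the oscillator strength from an excitation energy and a TDM vector. It assumes Hartree atomic units (ΔE in Ha, d_ij in units of e·a₀), for which Eq. (2) holds without further constants. The function name `oscillator_strength` and the input numbers are hypothetical and are not taken from the results of this work.

```python
import numpy as np

def oscillator_strength(delta_e, tdm):
    """Evaluate f_ij = (2/3) * dE * |d_ij|^2 in Hartree atomic units (assumed)."""
    tdm = np.asarray(tdm)
    # Dipole strength |d_ij|^2; np.abs keeps this valid for complex-valued TDMs.
    dipole_strength = float(np.sum(np.abs(tdm) ** 2))
    return 2.0 / 3.0 * delta_e * dipole_strength

# Hypothetical inputs for illustration only.
f = oscillator_strength(delta_e=0.3, tdm=[0.0, 0.0, 0.5])
print(f)
```

Because f_ij is linear in ΔE but quadratic in the magnitude of d_ij, a small relative error in the TDM enters the oscillator strength roughly twice as strongly as the same relative error in the excitation energy.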

Recently, transition energies and oscillator strengths for a variety of small molecules have been computed using high-order CC calculations, systematically extrapolating to the complete basis set (CBS) limit, and comparing to experimental results where possible, in order to supply a comprehensive set of theoretical benchmarks^41,42. In that spirit, we now use these results to benchmark the accuracy of oscillator strengths computed using PauliNet. Furthermore, we also compare to multi-reference CC (MR-CC) results where possible⁴³. We compute the first few electronic states for five molecules (BH, CH⁺, H₂O, NH₃, CO), such that we obtain the first non-zero oscillator strength (within the dipole approximation) for each. All calculations are performed at the same ground-state equilibrium geometries as refs. ^41,42 (see Supplementary Table I) and using the same number of determinants (≤10) as in the section “Nearly exact solutions for small atoms and molecules”. CH⁺ was not included in the CC calculations in refs. ^41,42; we instead compare to (MR-)CC results in ref. ⁴³, using the same ground-state equilibrium geometry, which was obtained in a split-valence basis augmented with diffuse and polarization functions (see refs. ^43,44 for more details).

Our results for all systems are shown in Fig. 3. First, we compute the amount of correlation energy recovered in the ground state and find PauliNet matches high-order CC methods (panel a). Second, we compute the excitation energy for each transition and find this to be close to the TBE, on par with CC and much more consistent than TDDFT where the accuracy depends on the molecule and on the exact TDDFT method used (panel b). Finally, we compare the oscillator strengths (for the 0 → 2 transition) in panel c. Even high-order methods such as CC and MR-CC can produce a spectrum of results depending on the expansion and basis set used, with this exacerbated in cheaper methods such as TDDFT (see the example of CO). In all systems, PauliNet compares well with experimental results, demonstrating the quality of deep VMC wavefunctions with just a minimal number of determinants.

**Fig. 3: Deep VMC obtains highly accurate excited-state energies and wavefunctions for small molecules.**

Application to larger molecules

The previous two sections showed that we achieve highly accurate results across a range of small systems. While this is encouraging, traditional high-accuracy methods that are better established are readily available for such small systems. In this section, to demonstrate the potential of excited PauliNet, we show that it can be applied in a straightforward manner to significantly larger molecules. For this objective, we choose the example of the benzene molecule (panel a of Fig. 4). Studies of its electronic structure and other properties are plentiful due to its importance in biochemistry and organic chemistry, and with 42 electrons, all-electron calculations will be extremely demanding or even intractable for a high-level description of its electronic states, depending on the level of theory used.

**Fig. 4: Calculating the two lowest electronic states of the benzene molecule.**

Using a PauliNet ansatz with just 10 determinants, the same as in the much smaller systems, and slightly deeper neural networks (see Supplementary Table VII) we obtain very good total energies for the ground state and first excited state (upper left of Fig. 4). We note the better accuracy than high-level CC calculations, with this signifying highly accurate wavefunctions that can be used to compute other observables, as demonstrated in the previous section. The computed excitation energy is also shown (right of Fig. 4), with PauliNet compared against several experimental and theoretical results. The lower experimental result⁴⁵ (dashed black line) quantifies an adiabatic excitation energy, i.e. the energy difference between the ground state and the excited state at the corresponding relaxed geometries. This quantity is corrected to obtain the vertical excitation energy²⁶ (solid black line), which omits nuclear relaxation and vibrational effects. As our calculations are performed at the ground-state equilibrium geometry, we are targeting the vertical excitation energy, and therefore consider this corrected experimental result to be closer to the ground truth. We find this to be slightly underestimated by high-order methods (CC, DMC), and slightly overestimated by PauliNet. In other systems (panel b of Fig. 3) we notice a similar trend when comparing to the TBE.

PauliNet formally scales as $\mathcal{O}(N^{4})$ with the number of electrons N, and in practice, we observe a scaling behavior of $\mathcal{O}(N^{3})$ for the systems investigated so far, which is related to the quadratic scaling of the neural network with an extra factor from the evaluation of the local energy. As PauliNet is currently implemented in a research code, which is not optimized for production purposes, the computational time will have a large prefactor, which makes it computationally unfavorable compared to, e.g., CC methods for small molecules. However, its very favorable scaling in N compared to the $\mathcal{O}(N^{5-10})$ of high-level electronic-structure methods dominates for larger molecules, and this is clearly visible in benzene. For instance, ref. ⁴⁶ used several state-of-the-art methods to obtain accurate benzene ground-state energies, with calculations run on several CPU types in a highly parallel manner (see Supporting Information of ref. ⁴⁶ for details). PauliNet was run on a single RTX 3090 GPU at a fraction of the number of node hours. Although PauliNet is the computationally cheapest method in this comparison, it provides a significantly better (variational) ground-state energy than all other methods (~0.48 Ha lower). As all methods compared in Fig. 4 provide similar excitation energies, these cannot be used to group the methods into more or less accurate, but overall these data indicate that PauliNet and deep VMC methods in general have a very favorable cost/accuracy trade-off for molecules of the size of benzene and beyond.

Multi-reference application: conical intersections

Molecular configurations that produce electronic states with similar energies are fundamental in photochemical applications. Such configurations can lead to several states mixing, meaning they are all necessary for an accurate description of a particular process. Conical intersections are produced when two states become degenerate and require the computation of excited-state potential energy surfaces. The modeling of energy surfaces near degeneracies is inherently multi-reference with significant electronic correlation and is thus a challenging application for electronic-structure methods.

As a final application of excited PauliNet, we compute ground- and excited-state potential energies for ethylene (H₂C=CH₂) as a function of its torsion and pyramidalization angles (see inset of Fig. 5). Twisting around the C=C bond raises the energy of the ground state while lowering that of the first-excited singlet state, giving rise to an avoided crossing at a torsion angle τ of 90°. From this twisted structure, the energy gap between the two states is further reduced through the pyramidalization of one of the CH₂ groups, leading to a conical intersection. These potential energy curves, whose modeling is often too challenging for single-reference methods^47,48,49, have been characterized using multi-reference configuration interaction (MR-CI) methods⁵⁰ which we use for comparison.

**Fig. 5: Modeling a conical intersection of ethylene.**

We choose the same ground-state (planar) geometry as ref. ⁵⁰ (optimized using a small CAS and the aug-cc-pVDZ basis set; see Supplementary Table I) and find the excitation energy between the ground state and first-excited singlet state to be within a few mHa of the MR-CI results. As we vary τ, while keeping all other geometric parameters fixed, we find the energy curves to be well reproduced by PauliNet, with an avoided crossing at τ = 90° (panel a of Fig. 5; curves symmetric about τ = 90°). Single-reference methods, such as TDDFT (see figure), often overestimate the energy of the ground state at τ = 90° (barrier) and produce an unphysical cusp.

Next, we take the same twisted structure (τ = 90°) as ref. ⁵⁰ (optimized using a small CAS and the aug-cc-pVDZ basis set; see Supplementary Table I) and vary the pyramidalization angle ϕ, while keeping all other geometric parameters fixed. While there is a small discrepancy between PauliNet and the MR-CI results (panel b of Fig. 5), the trend of the energy curves is well described, including the correct minimum of the excited-state curve (~70°) and the conical intersection (PauliNet: ϕ ~ 100°; MR-CI: ϕ ~ 96°). We note that many single-reference methods are unable to even qualitatively describe the conical intersection, instead predicting spurious features⁴⁹.

Discussion

We have introduced an approach to compute highly accurate excited-state solutions of the electronic Schrödinger equation for molecules by using deep neural networks that are trained in an unsupervised manner with variational Monte Carlo. We have employed the PauliNet architecture²⁷ to approximate the ground- and excited-state wavefunctions; however, other architectures such as FermiNet²⁸ or second quantization approaches²⁹ could also be employed, with suitable modifications. As our approach to find excited states only constrains the excited-state wavefunctions, the ability to compute highly accurate and variational absolute ground-state energies is unchanged. In addition, we demonstrate for a number of small molecules containing up to 42 electrons that excited PauliNet can reliably find the first excitation energies with an accuracy that is on par with high-level electronic-structure methods, whereas cheaper methods such as TDDFT are less consistent in approximating these energies. The accuracy of the excited-state wavefunctions is underlined by an accurate match of oscillator strengths, which depend on the transition dipole moment, a quantity that is more sensitive to the exact form of the wavefunction than the energy. For benzene (42 electrons), PauliNet already requires significantly less computational time than higher-order methods, and this advantage will only improve for larger molecules. Formally, a single PauliNet is an $\mathcal{O}(N^{4})$ method for N electrons, due to the computational cost of the Hartree-Fock or CASSCF baseline; in practice, however, we empirically observe an $\mathcal{O}(N^{3})$ dependency for the system sizes tested, as discussed above. In addition, for excited-state calculations n PauliNet replicas are used, which gives rise to a cost of $\mathcal{O}(nN^{3})+\mathcal{O}(n^{2}N^{2})$, with the latter term arising from the pairwise overlaps and having a much smaller prefactor than the former.
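The relative weight of the two terms in this cost estimate can be made concrete with a small Python sketch. The prefactors `c_energy` and `c_overlap` are hypothetical placeholders (the text only states that the overlap prefactor is much smaller), so the absolute numbers carry no meaning; only the ratio of the two terms, which behaves as (c_overlap / c_energy) · n / N, is of interest.

```python
def relative_cost(n_states, n_electrons, c_energy=1.0, c_overlap=0.01):
    """Return the two terms of the cost model n*N^3 and n^2*N^2.

    c_energy and c_overlap are assumed, illustrative prefactors.
    """
    energy_term = c_energy * n_states * n_electrons**3
    overlap_term = c_overlap * n_states**2 * n_electrons**2
    return energy_term, overlap_term

# Two states of a 42-electron system, with the assumed prefactors.
energy_term, overlap_term = relative_cost(n_states=2, n_electrons=42)
print(energy_term, overlap_term, overlap_term / energy_term)
```

Under this model the overlap term only becomes comparable to the energy term when the number of requested states n approaches N · c_energy / c_overlap, which is far from the regime of a few states considered here.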

Notably, almost identical excited PauliNet architectures are used across the systems shown in this paper, up to minor modifications such as the budget of Slater determinants and the total number of excited states requested, and a deeper network for benzene to accommodate a potentially more complex wavefunction. Whereas a skilled quantum chemist can usually tune and specialize an existing electronic-structure method to give very high-accuracy results for a given molecule, our aim is the exact opposite: to provide a method that, by leveraging machine-learning tools, is as automated as possible and will work over a wide range of Hamiltonians provided.

We have demonstrated that we can compute ground- and excited-state potential energy surfaces with the example of ethylene where we model an avoided crossing and conical intersection. Here, where single-reference methods often fail, PauliNet performs well against multi-reference CI results. By combining the present approach with recent and ongoing extensions of PauliNet³² and FermiNet³³ that variationally compute entire potential energy surfaces, both highly accurate ground- and excited-state energy surfaces are now accessible with deep VMC methods. Future work will investigate the application of PauliNet to other interesting processes where molecular dynamics interacts with excited states.

One of the limitations of the current approach is that it appears difficult to reliably find all excited states up to a given desired number, especially in cases where several excited states have similar energies. This is a complex problem that depends on the Hartree-Fock/CASSCF initialization, on the total number of states requested, on the learning algorithm, and the expressiveness of the architecture and will be studied in more detail elsewhere. However, the first excited state could be reliably found for all molecules studied here, and apart from one exception also the second excited state. This, in combination with the high numerical accuracy and the favorable computational cost, makes deep VMC a promising method to compute both ground- and excited-state properties for small- and medium-sized molecules with dozens or even low hundreds of electrons.

Methods

PauliNet ansatz

At the heart of our approach is the PauliNet ansatz, introduced in ref. ²⁷ and further refined in ref. ⁵¹, a multi-determinant Slater–Jastrow-backflow type trial wavefunction that is parametrized by highly expressive deep neural networks:

$$\psi_{\boldsymbol\theta}(\mathbf{r})=\mathrm{e}^{\gamma(\mathbf{r})+J_{\boldsymbol\theta}(\mathbf{r})}\sum_{p}c_{p}\det\left[\tilde\varphi_{\boldsymbol\theta,\mu_{p}i}^{\uparrow}(\mathbf{r})\right]\det\left[\tilde\varphi_{\boldsymbol\theta,\mu_{p}i}^{\downarrow}(\mathbf{r})\right],$$

(3)

$$\tilde{\varphi}_{\boldsymbol{\theta},\mu i}(\mathbf{r})=\varphi_{\mu}(\mathbf{r}_{i})\,f_{\boldsymbol{\theta},\mu i}^{(\mathrm{m})}(\mathbf{r})+f_{\boldsymbol{\theta},\mu i}^{(\mathrm{a})}(\mathbf{r}),$$

(4)

where r = (r₁, …, r_N) is a point in the 3N-dimensional real space of electron coordinates. The structure of our ansatz ensures that the correct physics is encoded: the wavefunction obeys the exact asymptotic behavior through the fixed electronic cusps γ, and is antisymmetric with respect to the exchange of like-spin electrons through the use of generalized Slater determinants, guaranteeing that the Pauli exclusion principle is obeyed.

The expressiveness of PauliNet is contained in the Jastrow factor J_θ and backflow f_θ, which introduce many-body correlation, and are both represented through deep neural networks (denoted by trainable parameters θ). J_θ and f_θ are constructed in ways that preserve the antisymmetry of the fermionic wavefunction with respect to exchanging like-spin electrons, as well as its cusp behavior. The Jastrow factor is an exchange-symmetric function, and captures complex correlation effects through augmenting the Slater-determinant baseline, but is incapable of modifying the nodal surface of the determinant expansion. Changes to the nodal surface are possible through the backflow, which acts on the single-electron orbitals φ_μ directly, transforming them into permutation-equivariant many-electron orbitals ${\tilde{\varphi }}_{\mu }$. f_θ is split into multiplicative (m) and additive (a) components (Eq. (4)), and is designed to be equivariant under the exchange of like-spin electrons.
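The structure of Eqs. (3) and (4) can be illustrated with a toy NumPy sketch. It is not the PauliNet architecture: the orbitals, the hand-written backflow feature, the Jastrow form, and the parameter names (`w_mult`, `w_add`, `b`) are all assumptions, chosen only to show how a symmetric Jastrow factor and a permutation-equivariant backflow keep the wavefunction antisymmetric under the exchange of like-spin electrons. The cusp term γ and the sum over determinants are omitted.

```python
import numpy as np

def orbitals(r):
    """Toy single-electron orbitals phi_mu(r_i), shape (2, N). Hypothetical baseline."""
    radius = np.linalg.norm(r, axis=-1)
    return np.stack([np.exp(-radius), r[:, 0] * np.exp(-0.5 * radius)])

def toy_wavefunction(r, n_up, w_mult=0.3, w_add=0.05, b=0.5):
    """Single-determinant Slater-Jastrow-backflow toy; first n_up electrons are spin-up."""
    n = len(r)
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    # Jastrow factor: symmetric under any electron exchange (sum over pairs).
    jastrow = -b * np.sum(1.0 / (1.0 + dist[np.triu_indices(n, k=1)]))
    # Per-electron feature that permutes together with the electrons (equivariant).
    feature = np.sum(np.exp(-dist**2) * ~np.eye(n, dtype=bool), axis=1)
    phi = orbitals(r)
    scale = np.arange(1, phi.shape[0] + 1)[:, None]  # orbital-dependent weight
    f_mult = 1.0 + w_mult * scale * feature[None, :]  # multiplicative backflow
    f_add = w_add * scale * feature[None, :]          # additive backflow
    phi_tilde = phi * f_mult + f_add                  # many-electron orbitals, cf. Eq. (4)
    n_down = n - n_up
    det_up = np.linalg.det(phi_tilde[:n_up, :n_up])
    det_down = np.linalg.det(phi_tilde[:n_down, n_up:])
    return np.exp(jastrow) * det_up * det_down

rng = np.random.default_rng(0)
r = rng.normal(size=(4, 3))            # 2 spin-up + 2 spin-down electrons
psi = toy_wavefunction(r, n_up=2)
r_swapped = r.copy()
r_swapped[[0, 1]] = r_swapped[[1, 0]]  # exchange the two spin-up electrons
psi_swapped = toy_wavefunction(r_swapped, n_up=2)
print(psi, psi_swapped)
```

Exchanging two like-spin electrons permutes two columns of the corresponding determinant while leaving the Jastrow factor and the other determinant unchanged, so the two printed values should differ only in sign.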

Ground-state optimization

Like traditional VMC methods, PauliNet is based on the variational principle, which guarantees that the energy expectation value of a trial wavefunction ψ_θ is an upper bound to the true ground-state energy:

$$E_{0}=\min_{\psi}\langle\psi|\hat{H}|\psi\rangle\le\min_{\boldsymbol{\theta}}\langle\psi_{\boldsymbol{\theta}}|\hat{H}|\psi_{\boldsymbol{\theta}}\rangle.$$

(5)

For a given system, a standard quantum chemistry method (Hartree–Fock (HF) for a single determinant; complete active space self-consistent field (CASSCF) for multiple determinants) is performed, with the solution supplemented by the analytically-known cusp conditions, thus producing a reasonable baseline wavefunction. We then optimize the PauliNet ansatz by minimizing the total electronic energy (serving directly as the loss), following the standard VMC trick of evaluating it as an expectation value of the local energy, ${E}_{{{{{{{{\rm{loc}}}}}}}}}({{{{{{{\bf{r}}}}}}}})=\hat{H}\psi ({{{{{{{\bf{r}}}}}}}})/\psi ({{{{{{{\bf{r}}}}}}}})$, over the probability distribution ∣ψ_θ∣²:

$$\mathcal{L}(\boldsymbol{\theta})=\mathbb{E}_{\mathbf{r}\sim|\psi_{\boldsymbol{\theta}}|^{2}}\left[E_{\mathrm{loc}}[\psi_{\boldsymbol{\theta}}](\mathbf{r})\right].$$

(6)

This means that, in practice, we alternate between sampling electron positions using a Langevin algorithm, with the probability density ∣ψ_θ∣² of the trial wavefunction serving as the target distribution, and optimizing the trial wavefunction parameters using stochastic gradient descent. For further details, see ref. ²⁷.
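The alternation between sampling and gradient steps can be sketched on a problem with a known answer. The example below is a toy under stated assumptions, not the PauliNet training loop: it treats the hydrogen atom with the one-parameter trial wavefunction ψ_a(r) = exp(−a|r|), for which the local energy is E_loc = −a²/2 + (a − 1)/|r| in atomic units, and it swaps the Langevin sampler for a plain random-walk Metropolis sampler. The energy gradient uses the standard VMC estimator 2 E[(E_loc − E) ∂_a ln ψ_a]. All function names and hyperparameters are hypothetical.

```python
import numpy as np

def local_energy(r, a):
    """E_loc = (H psi)/psi for psi_a(r) = exp(-a |r|), hydrogen atom, atomic units."""
    radius = np.linalg.norm(r, axis=-1)
    return -0.5 * a**2 + (a - 1.0) / radius

def metropolis_step(r, a, rng, step=0.5):
    """One random-walk Metropolis update targeting |psi_a|^2 = exp(-2 a |r|)."""
    proposal = r + step * rng.normal(size=r.shape)
    log_ratio = -2.0 * a * (np.linalg.norm(proposal, axis=-1)
                            - np.linalg.norm(r, axis=-1))
    accept = np.log(rng.uniform(size=len(r))) < log_ratio
    return np.where(accept[:, None], proposal, r)

def optimize(a=0.6, n_walkers=2000, n_iters=100, lr=0.2, seed=0):
    """Alternate between sampling walkers and gradient descent on the energy."""
    rng = np.random.default_rng(seed)
    r = rng.normal(size=(n_walkers, 3))
    for _ in range(n_iters):
        for _ in range(10):                      # sampling phase
            r = metropolis_step(r, a, rng)
        e_loc = local_energy(r, a)
        dlog_psi = -np.linalg.norm(r, axis=-1)   # d ln(psi_a) / da
        grad = 2.0 * np.mean((e_loc - e_loc.mean()) * dlog_psi)
        a -= lr * grad                           # optimization phase
    # Energy estimate of the final parameter on the last batch of walkers.
    return a, local_energy(r, a).mean()

a_opt, energy = optimize()
print(a_opt, energy)
```

At a = 1 the trial function is the exact ground state, so the local energy equals −0.5 Ha at every sample and the gradient estimate vanishes with zero variance; the optimization should therefore settle close to that point.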

Computing excited states

We now introduce the central idea of this paper: a deep VMC method to compute the ground and low-lying excited states of a given electronic system. While we employ PauliNet to represent the individual wavefunctions, the method can also employ FermiNet or other real-space wavefunction representations with suitable modifications.

In a similar spirit to the ground-state optimization process, we first obtain a reasonable baseline for each state by performing a minimal state-averaged CASSCF calculation. This optimizes the energy average for all states in question and yields a single set of orbitals to construct each multi-determinant wavefunction, which in turn are supplemented by the analytically-known cusp conditions. We fix the number of determinants in our ansatz by cutting off the CASSCF expansion based on the absolute values of the determinant coefficients. The choice of the CASSCF baseline ensures that the PauliNet ansatzes for the different excited states are close to orthogonal upon initialization. In contrast to the ground-state calculation, the optimization of excited states requires a more nuanced choice of active space. In principle, we must ensure that the solutions contain determinants with orbitals of the necessary rotational symmetries (the Jastrow factor and backflow correction are rotationally-symmetric modifications of the orbitals) and spin configurations (the choice of the number of spin-up and spin-down electrons does impose restrictions on the states that may be attained by our ansatz). For most systems studied in this paper, a generic choice of the active space was sufficient (see Supplementary Table VIII) and we have not studied the dependence on the CAS initialization in more depth. As shown in previous studies, the quality of the orbitals has only a minor effect on the training and does not change the final energy⁵¹. If, however, the initialization is not accounted for and the baseline solutions provide a qualitatively wrong spectrum of excited states, our ansatzes may be trapped in local minima and miss intermediate excited states (see Fig. 2), even though we keep the Slater-determinant coefficients c_p and linear coefficients c_μk of the single-electron orbitals φ_μ(r_i) = ∑_k c_μk ϕ_k(r_i) trainable.

Our objective is to calculate the lowest n eigenstates of a given system, that is, find the set of orthogonal states that minimizes the energy expectation value. We approach this challenge by introducing a penalty term to the energy loss function (Eq. (6)) and optimizing the joint loss for n PauliNet instances:

$$\mathcal{L}(\boldsymbol{\theta})=\underbrace{\sum_{i}\mathbb{E}_{i}\left[E_{\mathrm{loc}}[\psi_{\boldsymbol{\theta},i}](\mathbf{r})\right]}_{\text{energy minimization}}+\alpha \underbrace{\sum_{i>j}\left(\frac{1}{1-|S_{ij}|}-1\right)}_{\text{overlap penalty}},$$

(7)

where $\mathbb{E}_{i}=\mathbb{E}_{\mathbf{r}\sim|\psi_{\boldsymbol{\theta},i}|^{2}}$ and S_ij is the pairwise overlap between states i and j. The functional form of the overlap penalty is chosen to diverge when two states collapse and behave linearly when states are close to orthogonal (see the next section for details). This allows states to overlap during the optimization procedure while preventing their collapse and eventually driving them to orthogonality when they have settled in a local minimum of the energy. The hyperparameter α weights the two loss terms and can be increased throughout the training to strengthen the orthogonality condition when approaching the final wavefunctions. For a sufficiently large α, the true minimum of the loss function corresponds to the sum of the energies of the lowest-lying excited states with these states having no overlap. Thus, optimizing the penalized loss function (Eq. (7)) leads to an unbiased convergence towards the lowest-lying excited states (see below). In practice, a small α is typically sufficient, making a robust choice possible.
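As a minimal sketch of Eq. (7), the snippet below assembles the joint loss from per-state energy estimates and a matrix of pairwise overlaps. The function names `overlap_penalty` and `joint_loss` and the numerical values are illustrative assumptions; in an actual calculation the energies and overlaps would come from Monte Carlo estimates.

```python
import numpy as np

def overlap_penalty(s):
    """Penalty 1/(1-|S|) - 1: roughly |S| near orthogonality, diverges as |S| -> 1."""
    s = np.abs(np.asarray(s, dtype=float))
    return 1.0 / (1.0 - s) - 1.0

def joint_loss(energies, overlaps, alpha):
    """Sum of state energies plus alpha times the penalty over all pairs i > j."""
    energies = np.asarray(energies, dtype=float)
    overlaps = np.asarray(overlaps, dtype=float)
    i, j = np.tril_indices(len(energies), k=-1)  # strictly lower triangle: i > j
    return energies.sum() + alpha * overlap_penalty(overlaps[i, j]).sum()

# Hypothetical two-state example (energies in Hartree, pairwise overlap of 0.1)
energies = [-8.07, -7.95]
overlaps = np.array([[1.0, 0.1], [0.1, 1.0]])
loss = joint_loss(energies, overlaps, alpha=1.0)
```

For small overlaps the penalty adds approximately α|S| per pair, so in this example the loss exceeds the plain energy sum by about 0.11 Ha.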

To stabilize the training and reduce the computational cost, we detach gradients such that each state is penalized only for its overlap with the lower-lying states: the ground state is subject to unconstrained energy minimization, and the nth excited state introduces n pairwise penalty terms. We compute the overlap of the unnormalized states i and j as the geometric mean of the two Monte Carlo estimates, obtained over distributions ∣ψ_θ,i∣² and ∣ψ_θ,j∣², respectively:

$$S_{ij}=\mathrm{sgn}\left(\mathbb{E}_{i}\left[\frac{\psi_{\boldsymbol{\theta},j}(\mathbf{r})}{\psi_{\boldsymbol{\theta},i}(\mathbf{r})}\right]\right)\times \sqrt{\mathbb{E}_{i}\left[\frac{\psi_{\boldsymbol{\theta},j}(\mathbf{r})}{\psi_{\boldsymbol{\theta},i}(\mathbf{r})}\right]\mathbb{E}_{j}\left[\frac{\psi_{\boldsymbol{\theta},i}(\mathbf{r})}{\psi_{\boldsymbol{\theta},j}(\mathbf{r})}\right]}.$$

(8)

The sign of the overlap can be obtained from either of the two estimators, which match in the limit of infinite sampling. If the overlap is close to zero and the signs of the two estimates differ due to statistical noise of the sampling, we consider the states to be orthogonal. Similar to the energy loss, the gradient of the pairwise overlap can be formulated such that it depends on the first derivative of the log wavefunction with respect to the parameters only (see below for details). We employ gradient clipping to stabilize the training.
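The overlap estimator of Eq. (8) can be illustrated on a toy problem. The sketch below uses two unnormalized one-dimensional Gaussians, chosen here (as an assumption for illustration) because their densities can be sampled directly and their normalized overlap is known in closed form; the function name `overlap_estimate` is likewise hypothetical.

```python
import numpy as np

def overlap_estimate(psi_i, psi_j, samples_i, samples_j):
    """Eq. (8): signed geometric mean of the two wavefunction-ratio estimators."""
    r_ij = np.mean(psi_j(samples_i) / psi_i(samples_i))  # E_i[psi_j / psi_i]
    r_ji = np.mean(psi_i(samples_j) / psi_j(samples_j))  # E_j[psi_i / psi_j]
    if r_ij * r_ji <= 0.0:
        # Signs disagree (or an estimate vanishes): treat the states as orthogonal
        return 0.0
    return np.sign(r_ij) * np.sqrt(r_ij * r_ji)

rng = np.random.default_rng(0)
mu_a, mu_b = 0.0, 1.0
psi_a = lambda x: np.exp(-0.5 * (x - mu_a) ** 2)
psi_b = lambda x: 3.0 * np.exp(-0.5 * (x - mu_b) ** 2)  # deliberately different norm

# |psi|^2 is a Gaussian with variance 1/2, so it can be sampled without MCMC
x_a = rng.normal(mu_a, np.sqrt(0.5), size=200_000)
x_b = rng.normal(mu_b, np.sqrt(0.5), size=200_000)

s_ab = overlap_estimate(psi_a, psi_b, x_a, x_b)
exact = np.exp(-0.25 * (mu_a - mu_b) ** 2)  # analytic normalized overlap
```

The factor of 3 in `psi_b` cancels in the product of the two ratio estimates, which is why the estimator works with unnormalized states.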

Finally, we note that different states may be modeled at different levels of quality, which can lead to erroneous excitation energies. In order to improve the error cancellation of our ansatzes we employ a variance-matching technique. As the variance of the energy σ² can be considered a metric of how close a wavefunction is to a true eigenstate, variance-matching procedures can be useful tools²¹,⁵²,⁵³. Here, we utilize a simple scheme: for single-state quantities such as total energies, we evaluate all wavefunctions at the end of training. For multi-state quantities, such as excitation energies or transition dipole moments, we match states of a similar variance. That is, if final ψ_θ,i has a lower variance than final ψ_θ,j, we take ψ_θ,i at an earlier point in training. This simply involves computing σ² of the training energies and applying an exponential moving average at each iteration to monitor convergence (see below for details). We find this procedure typically improves the final results.

Loss function and overlap penalty

There are a number of choices of possible loss functions for the optimization of excited states in quantum Monte Carlo²⁰,²⁶,⁵⁴. In order to assess the feasibility of excited-state optimization with deep neural network ansatzes in variational Monte Carlo we conducted a range of experiments with different types of optimization objectives. Our empirical findings showed that employing a penalty method is the conceptually most straightforward approach and gives stable results when combining it with our implementation of PauliNet. Initially, we started with an overlap penalty term similar to Pathak et al.²⁶ However, we found that our optimization could still collapse even if we chose a sufficiently large prefactor (α) and the training could not recover. We therefore switched to an alternative penalty term (Eq. (7)) which diverges upon a collapse of the states. The effect of our penalty term can be illustrated by considering the loss for a two-state system with the exact ground state $\left|\psi_{0}\right\rangle$ and a linear combination of the ground and first excited state $\left|\psi_{1}\right\rangle$ (see Fig. 6):

$$\left|{\psi }_{\epsilon }\right\rangle=\sqrt{1-\epsilon }\left|{\psi }_{1}\right\rangle+\sqrt{\epsilon }\left|{\psi }_{0}\right\rangle .$$

(9)

The overlap and the energy can be obtained as

$$\langle {\psi }_{0}|{\psi }_{\epsilon }\rangle=\sqrt{\epsilon },\quad \langle {\psi }_{\epsilon }|H|{\psi }_{\epsilon }\rangle=(1-\epsilon ){E}_{1}+\epsilon {E}_{0}.$$

(10)

In the vicinity of the orthogonal solution, the Taylor expansion of the penalty term is

$$\frac{1}{1-|S|}-1=|S|+|S|^{2}+|S|^{3}+\dots,\quad \text{at}\quad |S|=0,$$

(11)

that is, the overlap penalty behaves linearly to first order. This gives rise to a penalty that is locally stable for any prefactor, lower bounded by the S² penalty term, and diverges if states collapse. For a large enough α parameter the global optimum of the total loss is at zero overlap, that is, the optimization method is incentivized to find exactly orthogonal states without mixing.
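As a short worked check of the local-stability statement, consider the two-state model of Eqs. (9) and (10) and write ΔE = E_1 − E_0 > 0 (a shorthand introduced here). Substituting |S| = √ϵ into Eq. (7) gives

$$\mathcal{L}(\epsilon)=E_{0}+(1-\epsilon)E_{1}+\epsilon E_{0}+\alpha\left(\frac{1}{1-\sqrt{\epsilon}}-1\right),\qquad \frac{d\mathcal{L}}{d\epsilon}=-\Delta E+\frac{\alpha}{2\sqrt{\epsilon}\,(1-\sqrt{\epsilon})^{2}}.$$

As ϵ → 0⁺ the penalty contribution to the slope grows like α/(2√ϵ) and outweighs the finite energy gain −ΔE for any α > 0, so the orthogonal solution ϵ = 0 is a local minimum. For comparison, a quadratic penalty αS² = αϵ in the same model has the constant slope α − ΔE, so ϵ = 0 would be stable only if α > ΔE.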

**Fig. 6: Sketch of the loss function.**

In practice, for the batch sizes used in our calculations, we have not observed a bias due to the non-linear nature of the penalty when applied to sampled expectation values of the overlap. However, it is expected that this is no longer the case in the limit of small batches. In order to elucidate how our loss function behaves in this regard, we compute the two lowest states of LiH using a range of different batch sizes (see Fig. 7). We find the optimization procedure to be robust for the large batch sizes that we typically employ (≥2000), with the excitation energy within 1 mHa of the exact value and the pairwise overlap remaining small throughout training (panel c). For smaller batch sizes, we observe a larger degree of statistical noise in the pairwise overlap, which leads to a less reliable approximation for the excited state and the corresponding excitation energy (panel b).

**Fig. 7: Behavior of the loss function with batch size.**

Gradient of the loss function

In order to differentiate the loss function we explicitly formulate the gradient. We consider the general case of a mixed observable:

$$O_{ij}=\frac{1}{N_{i}N_{j}}\int d^{3}\mathbf{r}\,\psi_{\boldsymbol{\theta},i}(\mathbf{r})\left[\hat{O}\psi_{\boldsymbol{\theta},j}(\mathbf{r})\right],$$

(12)

$$=\frac{N_{i}}{N_{j}}\mathbb{E}_{i}\left[\frac{\hat{O}\psi_{\boldsymbol{\theta},j}(\mathbf{r})}{\psi_{\boldsymbol{\theta},i}(\mathbf{r})}\right],$$

(13)

where N_i, N_j are the norms of the wavefunctions and $\mathbb{E}_{i}=\mathbb{E}_{\mathbf{r}\sim|\psi_{\boldsymbol{\theta},i}|^{2}}$. By the property of Hermitian matrices, O_ij = O_ji, we derive an expression that does not depend on the wavefunction norms:

$$O_{ij}=\sqrt{\frac{N_{i}}{N_{j}}\mathbb{E}_{i}\left[\frac{\hat{O}\psi_{\boldsymbol{\theta},j}(\mathbf{r})}{\psi_{\boldsymbol{\theta},i}(\mathbf{r})}\right]}\sqrt{\frac{N_{j}}{N_{i}}\mathbb{E}_{j}\left[\frac{\hat{O}\psi_{\boldsymbol{\theta},i}(\mathbf{r})}{\psi_{\boldsymbol{\theta},j}(\mathbf{r})}\right]},$$

(14)

$$=\mathrm{sgn}\left(\mathbb{E}_{i}\left[\frac{\hat{O}\psi_{\boldsymbol{\theta},j}(\mathbf{r})}{\psi_{\boldsymbol{\theta},i}(\mathbf{r})}\right]\right)\times \sqrt{\mathbb{E}_{i}\left[\frac{\hat{O}\psi_{\boldsymbol{\theta},j}(\mathbf{r})}{\psi_{\boldsymbol{\theta},i}(\mathbf{r})}\right]\mathbb{E}_{j}\left[\frac{\hat{O}\psi_{\boldsymbol{\theta},i}(\mathbf{r})}{\psi_{\boldsymbol{\theta},j}(\mathbf{r})}\right]}.$$

(15)

This expression reduces to the pairwise overlaps (Eq. (8)) upon setting $\hat{O}=\mathrm{Id}$. The derivative of this term can be expressed as

$$\partial O_{ij}=\frac{1}{O_{ij}}\Bigg\{\mathbb{E}_{i}\Bigg[\left(\frac{\hat{O}\psi_{\boldsymbol{\theta},j}(\mathbf{r})}{\psi_{\boldsymbol{\theta},i}(\mathbf{r})}-\mathbb{E}_{i}\left[\frac{\hat{O}\psi_{\boldsymbol{\theta},j}(\mathbf{r})}{\psi_{\boldsymbol{\theta},i}(\mathbf{r})}\right]\right)\partial \ln|\psi_{\boldsymbol{\theta},i}(\mathbf{r})|\Bigg]\times \mathbb{E}_{j}\left[\frac{\hat{O}\psi_{\boldsymbol{\theta},i}(\mathbf{r})}{\psi_{\boldsymbol{\theta},j}(\mathbf{r})}\right]+(i \iff j)\Bigg\},$$

(16)

where (i ⇔ j) is an additional term with the two indices interchanged.

By considering the Hamiltonian operator $\hat{H}$ and setting i = j we recover the gradient of the energy loss²⁷:

$$\partial E_{ii}=2\,\mathbb{E}_{i}\left[\left(\frac{\hat{H}\psi_{\boldsymbol{\theta},i}(\mathbf{r})}{\psi_{\boldsymbol{\theta},i}(\mathbf{r})}-\mathbb{E}_{i}\left[\frac{\hat{H}\psi_{\boldsymbol{\theta},i}(\mathbf{r})}{\psi_{\boldsymbol{\theta},i}(\mathbf{r})}\right]\right)\partial \ln|\psi_{\boldsymbol{\theta},i}(\mathbf{r})|\right].$$

(17)
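The energy-gradient estimator of Eq. (17) can be checked on a toy problem where everything is known analytically. The sketch below assumes a one-dimensional harmonic oscillator (Hamiltonian −½ d²/dx² + ½ x²) with the Gaussian trial function exp(−a x²); this model, the parameter value, and the function name `energy_gradient` are illustrative choices made here and are not part of the paper.

```python
import numpy as np

def energy_gradient(e_loc, dlog_psi):
    """Eq. (17): 2 * E[(E_loc - E[E_loc]) * d ln|psi| / d theta] from samples."""
    e_loc = np.asarray(e_loc, dtype=float)
    dlog_psi = np.asarray(dlog_psi, dtype=float)
    return 2.0 * np.mean((e_loc - e_loc.mean()) * dlog_psi)

a = 0.8  # variational parameter of psi(x) = exp(-a x^2)
rng = np.random.default_rng(1)

# |psi|^2 = exp(-2 a x^2) is a Gaussian with variance 1/(4a)
x = rng.normal(0.0, np.sqrt(1.0 / (4.0 * a)), size=400_000)

e_loc = a + x**2 * (0.5 - 2.0 * a**2)  # local energy (H psi) / psi
dlog_psi = -x**2                       # d ln|psi| / d a

grad_mc = energy_gradient(e_loc, dlog_psi)
grad_exact = 0.5 - 1.0 / (8.0 * a**2)  # derivative of E(a) = a/2 + 1/(8a)
```

Only the local energy and the first derivative of the log wavefunction enter the estimator, which is the property the text relies on for the overlap gradient as well.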

Variance matching

As far as relative energies are concerned, most computational chemistry methods rely heavily on the cancellation of error. While quantum Monte Carlo methods using neural network-based trial wavefunctions provide highly accurate total energies, the flexibility of these ansatzes is difficult to control, which can lead to varying qualities of approximations for different states. In order to account for potential imbalances we utilize the variance of the wavefunctions as a measure of the quality of the approximation (zero-variance principle) and employ a variance-matching scheme. Variance-matching techniques as well as variance extrapolation have typically been applied by optimizing a family of ansatzes and comparing variances across the optimized wavefunctions⁵³. Instead of training multiple ansatzes, we checkpoint wavefunctions during the training and compute excitation energies by rewinding the ground state to match the variance of the excited state as depicted in Fig. 8. The mean and variance of each wavefunction are computed over the batch dimension at each step in training and smoothed with an exponential moving average. For the final estimation of excitation energies, the respective wavefunctions are then sampled exhaustively as in the usual evaluation process. While the variance matching hardly impacts the excitation energies for small systems, for larger and harder-to-optimize systems, such as benzene, it becomes increasingly relevant.
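The rewinding step can be sketched as follows, under several assumptions made here for illustration: the per-step variances are available as arrays, the smoothing uses a fixed decay of 0.9, and the matched checkpoint is taken as the step where the smoothed variances are closest. The function names and the synthetic variance traces are hypothetical; the paper does not specify these details.

```python
import numpy as np

def ema(values, decay=0.9):
    """Exponential moving average of a 1D sequence (decay value is an assumption)."""
    values = np.asarray(values, dtype=float)
    out = np.empty_like(values)
    out[0] = values[0]
    for t in range(1, len(values)):
        out[t] = decay * out[t - 1] + (1.0 - decay) * values[t]
    return out

def matched_checkpoint(var_low, var_high, decay=0.9):
    """Training step at which the smoothed variance of the better-converged state
    is closest to the final smoothed variance of the other state."""
    smooth_low = ema(var_low, decay)
    target = ema(var_high, decay)[-1]
    return int(np.argmin(np.abs(smooth_low - target)))

# Synthetic variance traces: the ground state converges to a lower variance
t = np.arange(200)
var_ground = 0.5 * np.exp(-t / 40.0) + 0.01
var_excited = 0.5 * np.exp(-t / 40.0) + 0.05
step = matched_checkpoint(var_ground, var_excited)
```

The excitation energy would then be evaluated from the ground-state checkpoint at `step` and the final excited-state wavefunction, each sampled exhaustively.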

Spin treatment

PauliNet encodes only the spatial part of the wavefunction and its like-spin antisymmetry explicitly¹², while the spin part, which guarantees the opposite-spin antisymmetry, is only implicit. Every spin-assigned spatial ansatz such as PauliNet is always an eigenstate of $\mathcal{S}_{z}$ with an eigenvalue of $M=\frac{1}{2}(N_{\uparrow}-N_{\downarrow})$, but it may not be an eigenstate of $\mathcal{S}^{2}$. The spatial part of eigenstates of $\mathcal{S}^{2}$ is characterized by specific sets of permutational symmetries involving opposite-spin electrons⁵⁵. PauliNet does not enforce these symmetries but instead attempts to learn them through the variational principle because eigenstates of the Hamiltonian are also eigenstates of $\mathcal{S}^{2}$. Therefore, we do not, in general, control the spin of the eigenstates found in the optimization procedure; they are simply found in the order of increasing energy, independent of spin. The spin of a found eigenstate can be obtained in principle by Monte Carlo sampling⁵⁶. Whether a particular spin state is found in practice may be influenced by the spin of the CASSCF baseline wavefunction, which we, therefore, report in Supplementary Table VIII. In special cases, we may wish to target a specific spin state (e.g., see the section “Multi-reference application: conical intersections”), and for that, we can take advantage of the orbital-assigned backflow of PauliNet. Combined with the freezing of the determinant coefficients, this ensures that PauliNet remains in the same spin state as the CASSCF baseline wavefunction.

Data availability

The dataset generated in this study is openly available in Zenodo (https://doi.org/10.5281/zenodo.7274855). Source data are provided with this paper.

Code availability

The computer code used in this study is openly available in Zenodo (https://doi.org/10.5281/zenodo.7347937).

References

Lindh, R. & González, L. Quantum Chemistry and Dynamics of Excited States: Methods and Applications (John Wiley & Sons, 2020).
Kohn, W. & Sham, L. J. Self-consistent equations including exchange and correlation effects. Phys. Rev. 140, A1133 (1965).
Runge, E. & Gross, E. K. U. Density-functional theory for time-dependent systems. Phys. Rev. Lett. 52, 997 (1984).
Elliott, P., Fuks, J. I., Rubio, A. & Maitra, N. T. Universal dynamical steps in the exact time-dependent exchange-correlation potential. Phys. Rev. Lett. 109, 266404 (2012).
Fuks, J. I., Luo, K., Sandoval, E. D. & Maitra, N. T. Time-resolved spectroscopy in time-dependent density functional theory: an exact condition. Phys. Rev. Lett. 114, 183002 (2015).
Suzuki, Y., Lacombe, L., Watanabe, K. & Maitra, N. T. Exact time-dependent exchange-correlation potential in electron scattering processes. Phys. Rev. Lett. 119, 263401 (2017).
Singh, N., Elliott, P., Nautiyal, T., Dewhurst, J. K. & Sharma, S. Adiabatic generalized gradient approximation kernel in time-dependent density functional theory. Phys. Rev. B 99, 035151 (2019).
Maitra, N. T. Charge transfer in time-dependent density functional theory. J. Phys.: Condens. Matter 29, 423001 (2017).
Ullrich, C. A. & Tokatly, I. V. Nonadiabatic electron dynamics in time-dependent density-functional theory. Phys. Rev. B 73, 235102 (2006).
Szalay, P. G., Müller, T., Gidofalvi, G., Lischka, H. & Shepard, R. Multiconfiguration self-consistent field and multireference configuration interaction methods and applications. Chem. Rev. 112, 108 (2012).
Sneskov, K. & Christiansen, O. Excited state coupled cluster methods. WIREs Comput. Mol. Sci. 2, 566 (2011).
Foulkes, W. M. C., Mitas, L., Needs, R. J. & Rajagopal, G. Quantum Monte Carlo simulations of solids. Rev. Mod. Phys. 73, 33 (2001).
Williams, K. T. et al. Direct comparison of many-body methods for realistic electronic Hamiltonians. Phys. Rev. X 10, 011041 (2020).
Morales, M. A., McMinis, J., Clark, B. K., Kim, J. & Scuseria, G. E. Multideterminant wave functions in quantum Monte Carlo. J. Chem. Theory Comput. 8, 2181 (2012).
Benali, A. et al. Toward a systematic improvement of the fixed-node approximation in diffusion Monte Carlo for solids—a case study in diamond. J. Chem. Phys. 153, 184111 (2020).
Austin, B. M., Zubarev, D. Y. & Lester, W. A. Quantum Monte Carlo and related approaches. Chem. Rev. 112, 263 (2012).
Ceperley, D. M. & Bernu, B. The calculation of excited state properties with quantum Monte Carlo. J. Chem. Phys. 89, 6316 (1988).
Blunt, N. S., Smart, S. D., Booth, G. H. & Alavi, A. An excited-state approach within full configuration interaction quantum Monte Carlo. J. Chem. Phys. 143, 134117 (2015).
Send, R., Valsson, O. & Filippi, C. Electronic excitations of simple cyanine dyes: reconciling density functional and wave function methods. J. Chem. Theory Comput. 7, 444 (2011).
Dash, M., Feldt, J., Moroni, S., Scemama, A. & Filippi, C. Excited states with selected configuration interaction-quantum Monte Carlo: chemically accurate excitation energies and geometries. J. Chem. Theory Comput. 15, 4896 (2019).
Pineda Flores, S. D. & Neuscamman, E. Excited state specific multi-Slater Jastrow wave functions. J. Phys. Chem. A 123, 1487 (2019).
Zhao, L. & Neuscamman, E. An efficient variational principle for the direct optimization of excited states. J. Chem. Theory Comput. 12, 3436 (2016).
Shea, J. A. R. & Neuscamman, E. Size consistent excited states via algorithmic transformations between variational principles. J. Chem. Theory Comput. 13, 6078 (2017).
Blunt, N. S. & Neuscamman, E. Excited-state diffusion Monte Carlo calculations: a simple and efficient two-determinant ansatz. J. Chem. Theory Comput. 15, 178 (2019).
Choo, K., Carleo, G., Regnault, N. & Neupert, T. Symmetries and many-body excitations with neural-network quantum states. Phys. Rev. Lett. 121, 167204 (2018).
Pathak, S., Busemeyer, B., Rodrigues, J. N. B. & Wagner, L. K. Excited states in variational Monte Carlo using a penalty method. J. Chem. Phys. 154, 034101 (2021).
Hermann, J., Schätzle, Z. & Noé, F. Deep-neural-network solution of the electronic Schrödinger equation. Nat. Chem. 12, 891 (2020).
Pfau, D., Spencer, J. S., Matthews, A. G. D. G. & Foulkes, W. M. C. Ab initio solution of the many-electron Schrödinger equation with deep neural networks. Phys. Rev. Res. 2, 033429 (2020).
Choo, K., Mezzacapo, A. & Carleo, G. Fermionic neural-network states for ab-initio electronic structure. Nat. Commun. 11, 2368 (2020).
Han, J., Zhang, L. & E, W. Solving many-electron Schrödinger equation using deep neural networks. J. Comput. Phys. 399, 108929 (2019).
Spencer, J. S., Pfau, D., Botev, A. & Foulkes, W. M. C. Better, faster fermionic neural networks. Preprint at https://arxiv.org/abs/2011.07125 (2021).
Scherbela, M., Reisenhofer, R., Gerard, L., Marquetand, P. & Grohs, P. Solving the electronic Schrödinger equation for multiple nuclear geometries with weight-sharing deep neural networks. Nat. Comput. Sci. 2, 331–341 (2022).
Gao, N. & Günnemann, S. Ab-initio potential energy surfaces by pairing GNNs with neural wave functions. Preprint at https://arxiv.org/abs/2110.05064v2 (2021).
Carleo, G. & Troyer, M. Solving the quantum many-body problem with artificial neural networks. Science 355, 602 (2017).
Westermayr, J. & Marquetand, P. Machine learning for electronically excited states of molecules. Chem. Rev. 121, 9873 (2021).
Johnson III, R. D. NIST Computational Chemistry Comparison and Benchmark Database. NIST Standard Reference Database Number 101 (2021). https://doi.org/10.18434/T47C7Z.
Brémond, E., Savarese, M., Adamo, C. & Jacquemin, D. Accuracy of TD-DFT geometries: a fresh look. J. Chem. Theory Comput. 14, 3715 (2018).
Tajti, A. & Szalay, P. G. Accuracy of spin-component-scaled CC2 excitation energies and potential energy surfaces. J. Chem. Theory Comput. 15, 5523 (2019).
Tajti, A., Tulipán, L. & Szalay, P. G. Accuracy of spin-component scaled ADC(2) excitation energies and potential energy surfaces. J. Chem. Theory Comput. 16, 468 (2020).
Crossley, R. Fifteen years on—the calculation of atomic transition probabilities revisited. Phys. Scr. T8, 117 (1984).
Loos, P.-F. et al. A mountaineering strategy to excited states: highly accurate reference energies and benchmarks. J. Chem. Theory Comput. 14, 4360 (2018).
Chrayteh, A., Blondel, A., Loos, P.-F. & Jacquemin, D. Mountaineering strategy to excited states: highly accurate oscillator strengths and dipole moments of small molecules. J. Chem. Theory Comput. 17, 416 (2021).
Bhattacharya, D., Vaval, N. & Pal, S. Electronic transition dipole moments and dipole oscillator strengths within Fock-space multi-reference coupled cluster framework: an efficient and novel approach. J. Chem. Phys. 138, 094108 (2013).
Olsen, J., De Merás, A. M., Jensen, H. J. A. & Jørgensen, P. Excitation energies, transition moments and dynamic polarizabilities for CH⁺. A comparison of multiconfigurational linear response and full configuration interaction calculations. Chem. Phys. Lett. 154, 380 (1989).
Doering, J. P. Low-energy electron-impact study of the first, second, and third triplet states of benzene. J. Chem. Phys. 51, 2866 (1969).
Eriksen, J. J. et al. The ground state electronic energy of benzene. J. Phys. Chem. Lett. 11, 8922 (2020).
Krylov, A. I. Spin-flip configuration interaction: an electronic structure model that is both variational and size-consistent. Chem. Phys. Lett. 350, 522 (2001).
Mališ, M. & Luber, S. Trajectory surface hopping nonadiabatic molecular dynamics with Kohn–Sham ΔSCF for condensed-phase systems. J. Chem. Theory Comput. 16, 4071 (2020).
Barbatti, M. & Crespo-Otero, R. Surface hopping dynamics with DFT excited states. In Density-Functional Methods for Excited States, (eds Ferré, N., Filatov, M., Huix-Rotllant, M.) 415–444 (Springer International Publishing, 2014).
Barbatti, M., Paier, J. & Lischka, H. Photochemistry of ethylene: a multireference configuration interaction investigation of the excited-state energy surfaces. J. Chem. Phys. 121, 11614 (2004).
Schätzle, Z., Hermann, J. & Noé, F. Convergence to the fixed-node limit in deep variational Monte Carlo. J. Chem. Phys. 154, 124108 (2021).
Otis, L., Craig, I. & Neuscamman, E. A hybrid approach to excited-state-specific variational Monte Carlo and doubly excited states. J. Chem. Phys. 153, 234105 (2020).
Robinson, P. J., Pineda Flores, S. D. & Neuscamman, E. Excitation variance matching with limited configuration interaction expansions in variational Monte Carlo. J. Chem. Phys. 147, 164114 (2017).
Garner, S. M. & Neuscamman, E. A variational Monte Carlo approach for core excitations. J. Chem. Phys. 153, 144108 (2020).
Pauncz, R. Spin Eigenfunctions (Springer, New York, NY).
Huang, C.-J., Filippi, C. & Umrigar, C. J. Spin contamination in quantum Monte Carlo wave functions. J. Chem. Phys. 108, 8838 (1998).
Bande, A., Nakashima, H. & Nakatsuji, H. LiH potential energy curves for ground and excited states with the free complement local Schrödinger equation method. Chem. Phys. Lett. 496, 347 (2010).
Jasik, P., Sienkiewicz, J. E., Domsta, J. & Henriksen, N. E. Electronic structure and time-dependent description of rotational predissociation of LiH. Phys. Chem. Chem. Phys. 19, 19777 (2017).
Pitarch-Ruiz, J., Sánchez-Marín, J. & Velasco, A. M. Full configuration interaction calculation of the low lying valence and Rydberg states of BeH. J. Comput. Chem. 29, 523 (2008).
O’Neill, D. P. & Gill, P. M. W. Benchmark correlation energies for small molecules. Mol. Phys. 103, 763 (2005).
Giner, E., Traore, D., Pradines, B. & Toulouse, J. Self-consistent density-based basis-set correction: how much do we lower total energies and improve dipole moments? J. Chem. Phys. 155, 044109 (2021).
Larsen, H., Hald, K., Olsen, J. & Jørgensen, P. Triplet excitation energies in full configuration interaction and coupled-cluster theory. J. Chem. Phys. 115, 3015 (2001).
Kowalski, K. & Piecuch, P. Excited-state potential energy curves of CH⁺: a comparison of the EOMCCSDt and full EOMCCSDT results. Chem. Phys. Lett. 347, 237 (2001).
Kowalski, K. & Piecuch, P. The active-space equation-of-motion coupled-cluster methods for excited electronic states: full EOMCCSDt. J. Chem. Phys. 115, 643 (2001).
Cronstrand, P., Jansik, B., Jonsson, D., Luo, Y. & Ågren, H. Density functional response theory calculations of three-photon absorption. J. Chem. Phys. 121, 9239 (2004).
Sałek, P. et al. A comparison of density-functional-theory and coupled-cluster frequency-dependent polarizabilities and hyperpolarizabilities. Mol. Phys. 103, 439 (2005).
Liu, F. et al. A parallel implementation of the analytic nuclear gradient for time-dependent density functional theory within the Tamm–Dancoff approximation. Mol. Phys. 108, 2791 (2010).
Adamo, C., Scuseria, G. E. & Barone, V. Accurate excitation energies from time-dependent density functional theory: assessing the PBE0 model. J. Chem. Phys. 111, 2889 (1999).
Biglari, Z., Shayesteh, A. & Maghari, A. Ab initio potential energy curves and transition dipole moments for the low-lying states of CH⁺. Comput. Theor. Chem. 1047, 22 (2014).
Barysz, M. Fock space multi-reference coupled cluster study of transition moment and oscillator strength. Theor. Chim. Acta 90, 257 (1995).
Lane, J. R., Vaida, V. & Kjaergaard, H. G. Calculated electronic transitions of the water ammonia complex. J. Chem. Phys. 128, 034302 (2008).
Tawada, Y., Tsuneda, T., Yanagisawa, S., Yanai, T. & Hirao, K. A long-range-corrected time-dependent density functional theory. J. Chem. Phys. 120, 8425 (2004).
Douglass, C. H., Nelson, H. H. & Rice, J. K. Spectra, radiative lifetimes, and band oscillator strengths of the A¹Π−X¹Σ⁺ transition of BH. J. Chem. Phys. 90, 6940 (1989).
Mahan, B. & O’Keefe, A. Radiative lifetimes of excited electronic states in molecular ions. Astrophys. J. 248, 1209–1216 (1981).
Thorn, P. A. et al. Cross sections and oscillator strengths for electron-impact excitation of the Ã¹B₁ electronic state of water. J. Chem. Phys. 126, 064306 (2007).
Chen, T., Liu, Y. W., Du, X. J., Xu, Y. C. & Zhu, L. F. Oscillator strengths and integral cross sections of the Ã¹A₂″ ← X̃¹A₁ excitation of ammonia studied by fast electron impact. J. Chem. Phys. 150, 064311 (2019).
Kang, X. et al. Oscillator strength measurement for the A(0–6)–X(0), C(0)–X(0), and E(0)–X(0) transitions of CO by the dipole (γ, γ) method. Astrophys. J. 807, 96 (2015).
Loos, P.-F., Lipparini, F., Boggio-Pasqua, M., Scemama, A. & Jacquemin, D. A mountaineering strategy to excited states: highly accurate energies and benchmarks for medium sized molecules. J. Chem. Theory Comput. 16, 1711 (2020).
Lorentzon, J., Malmqvist, P.-Å, Fülscher, M. & Roos, B. O. A CASPT2 study of the valence and lowest Rydberg electronic states of benzene and phenol. Theor. Chim. Acta 91, 91 (1995).
Fdez. Galván, I. et al. OpenMolcas: from source code to insight. J. Chem. Theory Comput. 15, 5925 (2019).

Acknowledgements

We thank Tim Gould (Griffith U) for early discussions about variational principles for excited states. Funding is gratefully acknowledged from the Berlin mathematics center MATH+ (Projects AA1-6, AA2-8), European Commission (ERC CoG 772230), Deutsche Forschungsgemeinschaft (NO825/3-2), and the Berlin Institute for Foundations in Learning and Data (BIFOLD).

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

These authors contributed equally: M.T. Entwistle, Z. Schätzle.

Authors and Affiliations

Department of Mathematics and Computer Science, FU Berlin, Arnimallee 12, 14195, Berlin, Germany
M. T. Entwistle, Z. Schätzle, P. A. Erdman, J. Hermann & F. Noé
Microsoft Research AI4Science, Berlin, Germany
F. Noé
Department of Physics, FU Berlin, Arnimallee 14, 14195, Berlin, Germany
F. Noé
Department of Chemistry, Rice University, Houston, TX, 77005, USA
F. Noé

Authors

M. T. Entwistle
Z. Schätzle
P. A. Erdman
J. Hermann
F. Noé

Contributions

M.T.E., Z.S., J.H., and F.N. designed the research. M.T.E., Z.S., P.A.E., J.H., and F.N. developed the method. M.T.E., Z.S., and J.H. wrote the computer code. M.T.E. and Z.S. carried out the numerical calculations. M.T.E., Z.S., P.A.E., J.H., and F.N. analyzed the data. M.T.E., Z.S., J.H., and F.N. wrote the manuscript.

Corresponding authors

Correspondence to J. Hermann or F. Noé.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Xiang Li and the anonymous reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information for Electronic excited states in deep variational Monte Carlo

Source data

Source Data

Rights and permissions

Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Entwistle, M.T., Schätzle, Z., Erdman, P.A. et al. Electronic excited states in deep variational Monte Carlo. Nat Commun 14, 274 (2023). https://doi.org/10.1038/s41467-022-35534-5

Received: 04 April 2022
Accepted: 08 December 2022
Published: 17 January 2023
DOI: https://doi.org/10.1038/s41467-022-35534-5
