Artificial neural networks (ANNs) are a class of expressive mathematical models originally designed to imitate the high computing power of the human brain. Driven by their outstanding success over existing data processing methods in the field of machine intelligence1,2,3, ANNs have been used in a wide range of applications, from physical science4,5,6,7,8 and medical diagnosis to astronomical observations. Remarkable among the numerous factors underlying their performance is their ability to perform efficient feature extraction from high-dimensional data.

As universal approximators, ANNs have a rich expressive power, which can also be exemplified by encoding complicated quantum correlations9. Carleo and Troyer10 showed that ANNs, employed as a quantum many-body wave-function ansatz, can solve strongly correlated lattice systems at state-of-the-art level. Such quantum-state ansatze, often referred to as neural quantum states (NQS), capture quantum entanglement that even scales extensively11. The use of such a powerful nonlinear parametrization has been keenly investigated in the quantum physics community: both equilibrium12,13 and out-of-equilibrium14,15,16,17 properties, extension of the network structure18,19,20, and quantum tomography21,22,23,24. Meanwhile, we point out that the application of ANNs to fermionic systems is much less explored, despite their practical significance, such as the modeling of real materials and the experimental realizability in quantum simulators25,26. The proof of concept for small molecular systems was first presented by Choo et al.27, who applied ANNs to solve the many-body Schrödinger equation governed by the second-quantized Hamiltonian for molecular orbitals. Only a few further implementations have simulated electronic structures using ANNs28,29,30,31. Thus, a crucial question remains to be answered: are ANNs powerful enough to represent the electronic structures of real solid materials? This is related to one of the fundamental problems in condensed-matter physics and computational materials science; namely, establishing a predictive ab initio method for solids or surfaces. In particular, it must be demonstrated that the ANNs are capable of investigating the thermodynamic limit.

We stress that no current first-principles method can take into account both weak and strong electron correlations compactly and sufficiently. For instance, it is well known that the accuracy of the de facto standard method, density functional theory (DFT), is semi-quantitative and it is very difficult to improve significantly32,33. Many-body-wave-function-based methodologies are, in contrast, systematically improvable. Such techniques, mainly based on coupled-cluster (CC) theory (or many-body perturbation theory)34, have been successful for the electronic states of molecules. This has encouraged the application of quantum chemical methods to solid-state physics35,36. However, methods such as CC specialize in describing weak electronic correlations, and only work well for electronic states where the mean-field approximation is valid.

Methods for dealing with strongly correlated electrons, called multireference theory, also exist in quantum chemistry37, but these assume that the number of strongly correlated electrons is small. Such a condition usually holds in the case of molecules, because the strongly correlated electrons are often localized and limited in number. In contrast, there can be a large number of moderately or strongly correlated electrons in solid-state systems, owing to their high symmetry and dense structure. Based on their success in spin systems, it is natural to expect that the NQS have the potential to compactly describe a variety of electron correlations appearing in first-principles calculations of solids with a moderate computational cost (see Fig. 1 for a schematic diagram of the hierarchy of quantum chemical methods38,39,40,41,42).

Fig. 1: Schematic illustration of the relationship between the formal computational complexity and accuracy in various first-principles calculation methods for solid systems.

Our goal is to demonstrate that the variational calculation using a neural-network-based ansatz can readily describe both weakly and strongly correlated electronic structures with a moderate number of variational parameters, i.e., at moderate computational cost. We denote the full configuration interaction (FCI) method by the black square, whereas the Hartree–Fock (HF) and post-HF calculation methods are indicated by blue squares: the second-order Møller–Plesset perturbation theory (MP2), the coupled-cluster singles and doubles (CCSD), and CCSD with perturbative triple excitations (CCSD(T)). Also, the green squares indicate methods based on density functional theory (DFT): the DFT and the DFT-based random phase approximation (RPA). The number of orbitals at each k-point is denoted as N and the total number of k-points as Nk. Note that this is a qualitative (approximate) illustration, which will vary from case to case.

In this work, we demonstrate that neural-network-based many-body wave functions can readily simulate the essence of first-principles calculations for extended periodic materials: the ground-state and excited-state properties. The second-quantized fermionic Hamiltonian is transformed into a spin representation, such that the problematic sign structure of fermions, which usually imposes severe limits on the numerical accuracy, is naturally encoded. Employing the variational Monte Carlo (VMC)-based stochastic optimization, we show that the thermodynamic limit of a one-dimensional system can be simulated within chemical accuracy. For real solids in both two and three dimensions, the static electronic correlation in the minimal active space is compactly represented by the NQS. Our work’s main contribution is that multiple excited states, forming quasiparticle band spectra, are computed by constructing an effective Hamiltonian in the truncated Hilbert space. To the best of our knowledge, we offer the first demonstration that the NQS can be applied to simulate low-lying eigenstates in the identical-quantum-number sector.


Second-quantization representation of solid systems

To alleviate the notorious difficulty of simulating the many-body problem of solid systems, we employ a linear combination of the single-particle basis. Namely, we construct crystalline orbitals (COs) using the solution of the crystalline Hartree–Fock (HF) equation43,44. The second-quantization form of the many-body fermionic Hamiltonian is

$$H =\; \mathop{\sum}\limits_{pq}\mathop{\sum}\limits_{{\bf{k}}}{t}_{pq}^{{\bf{k}}}{c}_{p{\bf{k}}}^{\dagger }{c}_{q{\bf{k}}}\\ +\frac{1}{2}\mathop{\sum}\limits_{pqrs}\mathop{\sum }\limits_{{{\bf{k}}}_{p}{{\bf{k}}}_{q}{{\bf{k}}}_{r}{{\bf{k}}}_{s}}^{\prime}{v}_{pqrs}^{{{\bf{k}}}_{p}{{\bf{k}}}_{q}{{\bf{k}}}_{r}{{\bf{k}}}_{s}}{c}_{p{{\bf{k}}}_{p}}^{\dagger }{c}_{q{{\bf{k}}}_{q}}{c}_{r{{\bf{k}}}_{r}}^{\dagger }{c}_{s{{\bf{k}}}_{s}},$$

where \({c}_{p{\bf{k}}}\) (\({c}_{p{\bf{k}}}^{\dagger }\)) denotes the annihilation (creation) operator of an electron on the p-th CO with crystal momentum k. Here, the anticommutation relation \(\{{c}_{p{{\bf{k}}}_{p}},{c}_{q{{\bf{k}}}_{q}}^{\dagger }\}={\delta }_{pq}{\delta }_{{{\bf{k}}}_{p}{{\bf{k}}}_{q}}\) is imposed, and one-body (two-body) integrals are given as \({t}_{pq}^{{\bf{k}}}\) (\({v}_{pqrs}^{{{\bf{k}}}_{p}{{\bf{k}}}_{q}{{\bf{k}}}_{r}{{\bf{k}}}_{s}}\)). For simplicity, hereafter we denote the combined index as μ = (p, k). While the general framework of the crystalline HF equation is common with that for molecular systems, it must be noted that the contribution from the reciprocal lattice vector G = 0 requires extra numerical care owing to the divergence of the exchange integrals. In this work, we employ the crystalline Gaussian-based atomic functions as the single-particle basis. The Gaussian density fitting technique is applied to efficiently compute the two-body integrals45.

The summation in the first term of Eq. (1) is taken over a uniform grid, which is typically obtained by shifting the k’s obeying the Monkhorst–Pack rule46. Note that the number Nk of sampled k-points can be arbitrary. The primed summation in the second term satisfies the conservation of crystal momentum, which follows from translational invariance:

$${{\bf{k}}}_{p}+{{\bf{k}}}_{r}-{{\bf{k}}}_{q}-{{\bf{k}}}_{s}\in {\mathcal{G}},$$

where \({\mathcal{G}}\) is the set of reciprocal lattice vectors. With the number of COs at each k-point denoted as N, the total number of terms in Eq. (1) is given as \({\mathcal{O}}({N}^{4}{N}_{k}^{3})\).
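The \({N}_{k}^{3}\) factor in this count can be checked by brute force. The following is a minimal sketch, assuming a one-dimensional uniform grid whose points are labeled by integers m (so that k = 2πm/(Nk a) and reciprocal lattice vectors correspond to multiples of Nk); the function name is hypothetical and chosen only for illustration.

```python
import itertools

def count_momentum_conserving(nk):
    """Count quadruples (kp, kq, kr, ks) on a 1D uniform grid of nk points
    that satisfy kp + kr - kq - ks in G (a reciprocal lattice vector)."""
    count = 0
    for kp, kq, kr, ks in itertools.product(range(nk), repeat=4):
        # Grid points are integer labels; G corresponds to integer multiples of nk.
        if (kp + kr - kq - ks) % nk == 0:
            count += 1
    return count

# Once three momenta are fixed, the fourth is determined, so the count is nk**3.
counts = {nk: count_momentum_conserving(nk) for nk in (2, 3, 4, 6)}
```

A uniform shift of the grid cancels in the combination kp + kr − kq − ks, so the same count holds for shifted grids.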

To solve the fermionic many-body Hamiltonian (1), we must explicitly impose the antisymmetric sign structure in the quantum state. Here, we map the Hamiltonian into the spin-1/2 representation such that the sign structure is encoded in the operators rather than the quantum states, as Choo et al.27 considered in their application of the NQS to small molecules. The Jordan–Wigner (JW) transformation47 defines the relation of fermionic and spin operators as \({c}_{\mu }^{(\dagger )}={(-1)}^{\mu -1}{\prod }_{\nu \,{<}\,\mu }{\sigma }_{\nu }^{z}{\sigma }_{\mu }^{+(-)}\), where \({\sigma }_{\mu }^{+(-)}\) is the raising (lowering) operator of the μ-th spin. Such a mapping yields a nonlocal spin Hamiltonian


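$$H=\mathop{\sum}\limits_{Q}{h}_{Q}{P}_{Q},$$

where \({h}_{Q}\) is the coefficient of the Pauli string Q and \({P}_{Q}\in {\bigotimes }_{\mu }\{I,X,Y,Z\}\) is the corresponding product of Pauli matrices.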
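As a sanity check of the JW relation quoted above, one can build the spin-operator images of the fermionic operators explicitly for a few modes and verify the canonical anticommutation relations. This is a minimal NumPy sketch under the sign and ordering convention stated in the text; the helper names are hypothetical.

```python
from functools import reduce
import numpy as np

I2 = np.eye(2)
SZ = np.diag([1.0, -1.0])                 # sigma^z
SP = np.array([[0.0, 1.0], [0.0, 0.0]])   # sigma^+ (raising operator)

def jw_annihilation(mu, n_modes):
    """Matrix of c_mu (mu = 1..n_modes): (-1)^(mu-1) * prod_{nu<mu} sigma^z_nu * sigma^+_mu."""
    ops = [SZ] * (mu - 1) + [SP] + [I2] * (n_modes - mu)
    return (-1.0) ** (mu - 1) * reduce(np.kron, ops)

def anticommutator(a, b):
    return a @ b + b @ a

n_modes = 3
c = [jw_annihilation(mu, n_modes) for mu in range(1, n_modes + 1)]
identity = np.eye(2 ** n_modes)

# {c_mu, c_nu^dagger} = delta_{mu nu} and {c_mu, c_nu} = 0
relations_hold = all(
    np.allclose(anticommutator(c[m], c[n].conj().T), identity * (m == n))
    and np.allclose(anticommutator(c[m], c[n]), 0.0)
    for m in range(n_modes)
    for n in range(n_modes)
)
```

The string of σz operators is what makes the resulting spin Hamiltonian nonlocal, while the fermionic sign structure is carried entirely by the operators.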

Let us make two remarks on the application of the JW transformation. First, the use of the fermion-to-spin transformation for stochastic variational calculations was initially considered in the context of near-term quantum computers48, including the application to real solids49,50,51, while the spin-to-fermion mapping has long been applied in the condensed-matter and statistical physics communities, e.g., to solve exactly soluble quantum spin models. Second, the JW transformation merely generates the spin operator representation of the Hamiltonian (1) and does not alter the computational basis. The evaluation of physical observables in the Monte Carlo approach by the occupation-number basis of the fermionic representation is identical to that by the spin computational basis of the spin representation. This is not the case when we apply other transformations developed in quantum information, such as the Bravyi–Kitaev transformation52.

Ground states in the thermodynamic limit

In general, it is classically intractable to solve for the ground state of the many-body Hamiltonian defined in Eq. (1) or (3). Here we alternatively rely on a variational method that exemplifies the expressive power of neural networks. Namely, a neural network is used as a variational many-body wave-function ansatz. It is optimized so that the expectation value of the energy, estimated via the Monte Carlo simulation, is minimized by approximating the imaginary-time evolution. Such a technique, called variational Monte Carlo (VMC), has been successfully applied to condensed-matter systems53,54,55,56 and quantum chemistry problems57,58, leading to state-of-the-art numerical analysis on strongly correlated phenomena. The choice of the variational ansatz plays a key role for the accuracy, which, as has been pointed out by Carleo and Troyer10, can be significantly improved by using neural networks.

Let us briefly review the general protocol of VMC for simulating ground states in many-body spin systems using the quantum-state ansatz based on the restricted Boltzmann machine (RBM)59. First, we introduce the quantum many-body wave function expressed as follows10,

$$\left|{{{\Psi }}}_{\theta }^{{\rm{RBM}}}\right\rangle = \frac{1}{Z}{\sum }_{{\boldsymbol{\sigma }}}{{{\Psi }}}_{\theta }^{{\rm{RBM}}}({\boldsymbol{\sigma }})\left|{\boldsymbol{\sigma }}\right\rangle ,\\ {{{\Psi }}}_{\theta }^{{\rm{RBM}}}({\boldsymbol{\sigma }}) = \mathop{\sum}\limits_{h}\exp (\mathop{\sum}\limits_{\mu \nu }{W}_{\mu \nu }{\sigma }_{\mu }{h}_{\nu }+\mathop{\sum}\limits_{\mu }{a}_{\mu }{\sigma }_{\mu }+\mathop{\sum}\limits_{\nu }{b}_{\nu }{h}_{\nu }),$$

where \({{{\Psi }}}_{\theta }^{{\rm{RBM}}}(\sigma )\) is the unnormalized amplitude for a spin configuration \(\sigma \in {\{-1,+1\}}^{{N}_{v}}\), where Nv = NNk is the total number of spin orbitals and \(Z=\sqrt{{\sum }_{\sigma }| {{{\Psi }}}_{\theta }^{{\rm{RBM}}}(\sigma ){| }^{2}}\) is the normalization factor. We denote the set of complex variational parameters as θ = {Wμν, aμ, bν}, where the interaction Wμν denotes the virtual coupling between the spin σμ and the auxiliary degrees of freedom, or the hidden spin hν. One-body terms aμ and bν are also introduced to enhance the expressive power of the RBM state. In the present work, we find that it suffices to take the total number of hidden spins as Nh = Nv, and therefore the number of complex variational parameters is \(({N}_{v}^{2}+2{N}_{v})\) in total. The all-to-all connectivity between σ and h allows the RBM state to capture complicated quantum correlations such as topological orders13,60, spin-liquid behaviours61,62,63, and electronic structures in small molecular systems27,28.
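Because the hidden spins enter the exponent linearly, the sum over h can be carried out analytically, which is what makes the amplitude cheap to evaluate. The sketch below assumes hidden spins taking values ±1 (an assumption, since the text does not state their range) and uses small random complex parameters purely for illustration; it compares the brute-force sum with the closed form exp(Σμ aμσμ) Πν 2cosh(bν + Σμ Wμνσμ) and checks the parameter count for Nh = Nv.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n_v = n_h = 4  # N_h = N_v, as chosen in the text

def random_complex(*shape):
    # Illustrative random parameters, not values from the paper.
    return 0.1 * (rng.normal(size=shape) + 1j * rng.normal(size=shape))

W, a, b = random_complex(n_v, n_h), random_complex(n_v), random_complex(n_h)

def psi_brute_force(sigma):
    """Unnormalized RBM amplitude: explicit sum over all hidden configurations."""
    total = 0j
    for h in itertools.product((-1, 1), repeat=n_h):
        h = np.array(h)
        total += np.exp(sigma @ W @ h + a @ sigma + b @ h)
    return total

def psi_closed_form(sigma):
    """Same amplitude after tracing out the hidden spins analytically."""
    return np.exp(a @ sigma) * np.prod(2.0 * np.cosh(b + sigma @ W))

configs = [np.array(s) for s in itertools.product((-1, 1), repeat=n_v)]
amplitudes_agree = all(
    np.isclose(psi_brute_force(s), psi_closed_form(s)) for s in configs
)
n_params = W.size + a.size + b.size
```

The closed form costs O(Nv Nh) per configuration instead of O(2^Nh), so only the visible configurations need to be sampled.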

Using the RBM state (4) as the many-body variational ansatz, the ground-state problem is solved in the VMC framework. In particular, we rely on the stochastic reconfiguration technique64 to approximate the imaginary-time evolution as

$$\left|{{{\Psi }}}_{{\mathrm{GS}}}\right\rangle \propto \mathop{{{\lim}}}\limits_{\tau \to \infty }{e}^{-\tau H}\left|{{{\Psi }}}_{0}\right\rangle \sim \left|\mathop{{{\Psi }}}\nolimits_{{\theta }_{0}+{\sum }_{k}{{\Delta }}{\theta }_{k}}^{{\rm{RBM}}}\right\rangle ,$$

where the parameter update at the k-th step Δθk is given by the Monte Carlo simulation, and the initial state \(\left|{{{\Psi }}}_{0}\right\rangle\) is taken as the HF state in our simulation. Detailed information on the implementation and optimization techniques is provided in “Methods”.

As a first demonstration, we provide the potential energy curve for a one-dimensional system whose electronic correlation varies drastically as the geometry is changed. Concretely, we consider a linear hydrogen chain with homogeneous atom separation dH in a minimal basis set (STO-3G)65,66. Figure 2a presents the result of the calculation using the RBM state as well as the second-order Møller–Plesset perturbation theory (MP2)67, the coupled-cluster singles and doubles (CCSD)41,68, and CCSD with perturbative triple excitations (CCSD(T))69, which is considered as the gold-standard in modern quantum chemistry. While the weakly correlated regime at near-equilibrium is simulated quite well by all the conventional methods, we see that they start to collapse as the correlation grows at the intermediate dH regime, not to mention the Mott-insulating large dH regime. In sharp contrast, the RBM state precisely describes the electronic correlation and achieves chemical accuracy at any atom separation dH. Here, two k-points are sampled from each unit cell, which contains four hydrogen atoms so that the interactions between nearby sites are reflected explicitly on the model.

Fig. 2: Solving the ground state of the linear hydrogen chain using the minimal STO-3G basis set.

a The potential energy curve calculated by the restricted Boltzmann machine (RBM) agrees with the full configuration interaction (FCI) method within chemical accuracy (1.6 mHa) for any atom separation dH. This indicates that the RBM states are capable of describing both the weakly and strongly interacting regimes, where gold-standard techniques, such as coupled-cluster singles and doubles (CCSD) shown by the yellow line and CCSD with perturbative triple excitations (CCSD(T)) shown by the black line, break down. The results by restricted Hartree–Fock (RHF) and second-order Møller–Plesset perturbation theory (MP2) are indicated by blue and gray lines, respectively. A unit cell consists of four hydrogen atoms placed at even intervals, and two k-points are sampled from a uniform grid. b Finite-size scaling of the ground-state energy up to Nk = 18 and its deviation from the FCI (Nk ≤ 8) or CCSD(T) (Nk > 8), ΔE, at near-equilibrium dH = 2.0aB. The results show excellent agreement with conventional methods even in the thermodynamic limit Nk → ∞. Here, the unit cell consists of a single hydrogen atom, and hence the maximum number of spin orbitals considered here is 36. The error bars denote the standard deviation of the estimation by the Monte Carlo sampling.

To further illustrate the RBM state’s power and reliability, we calculate the energy in the thermodynamic limit by extrapolating Nk → ∞ in a system with a single atom per unit cell. The numerical result at near-equilibrium (dH = 2.0aB) is shown in Fig. 2b. We confirm the excellent agreement with conventional methods by comparing the result with the FCI for Nk ≤ 8 and CCSD for 10 ≤ Nk ≤ 18. Clearly, the thermodynamic limit is simulated as precisely as the finite-size system.

Next, we provide a demonstration for real solids in both 2D and 3D: graphene and the lithium hydride (LiH) crystal in the rocksalt structure. Here, we restrict the active space at each k-point to its highest occupied CO and lowest unoccupied CO. The results for graphene [Fig. 3a] and the crystalline LiH [Fig. 3b] are both in remarkable agreement with the FCI or CCSD(T). Clearly, the RBM ansatz gives a quantitatively accurate description, which may allow crystal structure determinations of weakly to moderately correlated real solid systems.

Fig. 3: Potential energy curves for 2D and 3D real solids calculated by neural networks.

The ground-state energy is computed for various lattice constants in the vicinity of equilibrium values. a Graphene on a honeycomb lattice solved using the cc-pVDZ basis set. The smallest active space is taken at each k-point of a 2 × 2 Γ-centered grid, giving 16 spin orbitals in total. b LiH with the rocksalt structure solved using the STO-3G basis. The smallest active space is taken at each k-point of a 2 × 2 × 2 Γ-centered grid, giving 32 spin orbitals in total. The result obtained for the RBM state (green triangle) shows remarkable agreement with either the full configuration interaction (FCI) method or coupled-cluster singles and doubles with perturbative triple excitations (CCSD(T)), achieving an error within chemical accuracy (1.6 mHa). The red, blue dotted, gray, yellow, and black dashed lines denote the results by the FCI method, restricted Hartree–Fock (RHF) method, second-order Møller–Plesset perturbation theory (MP2), coupled-cluster singles and doubles (CCSD), and CCSD(T), respectively.

Quasiparticle band structure from the one-particle excitation

Interest beyond the ground-state electronic structures in solids is diverse: the response to electromagnetic fields, impurity effects, phononic dispersions, and so on. Here, we focus on the band structure, which is a peculiar yet fundamental property that characterizes solid systems. We stress that variational calculations for the lowest bandgap, which can be experimentally measured from photoemissions, are already scarce, not to mention the simulation of the band spectra based on stochastic methods70. Furthermore, to the best of our knowledge, there is no NQS simulation of excited states in the identical sector of quantum numbers except the first excited state19. This motivates us to perform the first attempt to calculate multiple low-lying states and deepen our understanding of the representability of the NQS beyond the well-studied regimes.

In general, the calculation of band structures is based on the assumption that the system is weakly to moderately correlated. In other words, the mean-field approximation is qualitatively valid, so that one-particle excitations dominate the low-lying spectrum. By employing such a picture in a quantum many-body context, we can also simulate the band structure via quasiparticle excitations. We take a similar approach here and compute the band structure from the single-particle linear-response behavior of the ground state.

Let us construct an appropriately truncated Hilbert space which captures the low-lying states in a stochastic manner. It is justified from the above argument that we consider a subspace spanned by a set of non-orthonormal bases \(\{{R}_{\alpha }\left|{{{\Psi }}}_{{\rm{GS}}}\right\rangle \}\), where Rα denotes the α-th single-particle excitation operator. Here, the valence (conduction) bands are obtained from the ionization (electron attachment) operators \(\{{c}_{p{{\bf{k}}}_{p}}\}\) (\(\{{c}_{p{{\bf{k}}}_{p}}^{\dagger }\}\)), which allows us to compute the quasiparticle band with an additional computational cost of \({\mathcal{O}}({N}_{v}^{3})\). Although it is possible to include higher-order excitation operators, here we avoid them from the viewpoint of computational cost and size inconsistency. It can be shown that the diagonalization of the effective Hamiltonian given the non-orthonormal basis is done by the following generalized eigenvalue equation71,

$$\widetilde{H}C=\widetilde{S} CE$$

where \(E={\rm{diag}}({E}_{1},...,{E}_{{N}_{v}})\) denotes the diagonal matrix of eigenvalues and C is an array of eigenvectors. The matrix elements of the non-hermitian matrix \(\widetilde{H}\) and the metric \(\widetilde{S}\) are estimated via the Monte Carlo sampling as expectation values:

$${\widetilde{H}}_{\alpha \beta }=\left\langle {{{\Psi }}}_{{\rm{{\theta }}^{* }}}^{{\rm{RBM}}}| {R}_{\alpha }^{\dagger }H{R}_{\beta }| {{{\Psi }}}_{{\rm{{\theta }}^{* }}}^{{\rm{RBM}}}\right\rangle ,$$
$$\widetilde{S}_{\alpha \beta }=\left\langle {{{\Psi }}}_{{\rm{{\theta }}^{* }}}^{{\rm{RBM}}}| {R}_{\alpha }^{\dagger }{R}_{\beta }| {{{\Psi }}}_{{\rm{{\theta }}^{* }}}^{{\rm{RBM}}}\right\rangle ,$$

where the ground state is now replaced by the RBM ansatz \(\left|{{{\Psi }}}_{{\theta }^{* }}^{{\rm{RBM}}}\right\rangle\), with the optimized variational parameter θ*. In the field of quantum chemistry, this procedure is referred to as the internally contracted multireference configuration interaction72,73.
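To make the subspace diagonalization concrete, the following minimal sketch solves a generalized eigenvalue problem of the same form with NumPy only. It is not the authors' implementation: it assumes that the sampled matrices have already been symmetrized so that both are Hermitian, it builds them from a random toy Hamiltonian and a random non-orthonormal basis instead of Monte Carlo estimates, and the threshold `tol` for discarding near-linearly-dependent directions is a hypothetical choice.

```python
import numpy as np

def solve_generalized(h_eff, s_eff, tol=1e-10):
    """Solve H C = S C E for Hermitian H and positive semi-definite S
    by orthogonalizing the basis with the eigendecomposition of S."""
    s_vals, s_vecs = np.linalg.eigh(s_eff)
    keep = s_vals > tol * s_vals.max()            # drop near-null directions of S
    x = s_vecs[:, keep] / np.sqrt(s_vals[keep])   # X^dagger S X = identity
    energies, y = np.linalg.eigh(x.conj().T @ h_eff @ x)
    return energies, x @ y

rng = np.random.default_rng(2)
dim = 5
m = rng.normal(size=(dim, dim))
h = 0.5 * (m + m.T)                                   # toy Hamiltonian
basis = np.eye(dim) + 0.3 * rng.normal(size=(dim, dim))  # non-orthonormal basis vectors (columns)

h_eff = basis.T @ h @ basis   # plays the role of <R_a^dagger H R_b>
s_eff = basis.T @ basis       # plays the role of <R_a^dagger R_b>
energies, coeffs = solve_generalized(h_eff, s_eff)
```

In this toy case the basis spans the whole space, so the generalized eigenvalues coincide with the exact spectrum; with a truncated set of excitation operators they are variational approximations within the chosen subspace.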

To enhance the numerical reliability, we incorporate the effect of orbital relaxation by estimating the bandgap from the extended Koopmans’ theorem74,75,76. The energies are shifted so that the first valence and conduction bands coincide with the energy difference ΔEIP and ΔEEA as

$$\left\{\begin{array}{lll}{{\Delta }}{E}^{{\mathrm{IP}}}&=&{E}_{{\mathrm{GS}}}^{{N}_{v}}-{E}_{{\mathrm{GS}}}^{{N}_{v}-1},\\ {{\Delta }}{E}^{{\mathrm{EA}}}&=&{E}_{{\mathrm{GS}}}^{{N}_{v}+1}-{E}_{{\mathrm{GS}}}^{{N}_{v}},\end{array}\right.$$

where \({E}_{{\mathrm{GS}}}^{n}\) is the energy of the RBM optimized in the particle-number sector n (See “Methods”).

We provide a demonstration for the quasiparticle band structure of polyacetylene [Fig. 4a] using the STO-3G basis set. The result is compared with a variant of the equation-of-motion coupled-cluster theories (EOM-CC): ionization-potential (electron-attached) EOM-CC (IP-EOM-CC, EA-EOM-CC), which considers up to 2-hole and 1-particle (2-particle and 1-hole) excitations41. The agreement with EOM-CCSD(T)(a)*77 is very good for the first valence and conduction bands, while it becomes slightly worse for higher excitations. As is shown in Fig. 4b, the first conduction band is simulated almost within chemical accuracy, which is partly due to the cancellation of the optimization errors induced by Eq. (9). Meanwhile, Fig. 4c indicates that errors in the higher excitations can be an order of magnitude larger in the worst case, which cannot be explained merely from the variational simulation error. Rather, it can be understood as a systematic error originating in the insufficiency of the truncated Hilbert space; there is a trade-off between the computational cost and the accuracy. Systematic improvement can be expected from using higher-order excitation operators, e.g., two-electron excitation operators \(\{{c}_{p{{\bf{k}}}_{p}}^{\dagger }{c}_{q{{\bf{k}}}_{q}}\}\) for the lowest energy state in the particle-number sectors (Nv ± 1).

Fig. 4: Quasiparticle band spectra from multiple excited-state calculation.

a Schematic diagram of the trans-polyacetylene (C2H2)n. The cyan and gray spheres indicate the carbon and hydrogen atoms, respectively. b Three quasiparticle bands below and above the Fermi energy. Here, the yellow lines and black dashed lines indicate results obtained from the equation-of-motion coupled-cluster (EOM-CC) formalism; CCSD and CCSD(T) stand for the unperturbed EOM-CCSD and the perturbed EOM-CCSD(T)(a)* methods, respectively. The blue dotted lines denote the restricted Hartree–Fock (RHF) method. c A zoom-in of the first conduction band, which is computed from the electron attachment (EA) energy. It is clearly shown, from the energy differences against the EA-EOM-CCSD(T)(a)* method, that the results by the RBM (green triangle) are comparable to or better than those of the unperturbed EA-EOM-CCSD method. In all calculations, a single k-point is taken under the minimal basis set (STO-3G) and hence 24 spin orbitals are taken into account. The size of the unit cell is taken as 2.451 Å.


We have shown that a shallow neural network with a moderate number of variational parameters allows us to perform the essence of first-principles calculations in solid systems, i.e., the ground-state property and the quasiparticle band spectra. In the weakly to moderately correlated regions of the linear hydrogen chain, we have demonstrated that even the thermodynamic limit can be simulated using the RBM state. The representability of the RBM is also exhibited in the strongly correlated regions, where the standard approaches break down. We have furthermore shown that the electronic structures of real solids in both 2D and 3D can be described accurately. Moreover, we have successfully obtained the quasiparticle band spectra of a polymer in the linear-response regime. To the best of our knowledge, this is the first demonstration that NQS are capable of computing multiple excited states, in addition to precise ground-state simulations that reach chemical accuracy.

Numerous future directions can be envisioned. We remark on the following three points. The first is the extension towards the complete basis limit. While we have here focused on relatively simple basis sets, the quantitative prediction and comparison with experiments would necessarily require larger basis sets. Working in the continuum space is a possibility, but the calculation would be much more involved than in molecular systems. The second is the systematic improvement of the calculations for excited states. It is intriguing to investigate the quantitative performance: whether higher-order subspace expansions can be efficiently implemented, how the accuracy compares with other excited-state calculation frameworks such as the equation-of-motion and time-dependent linear-response methods78, and so on. The third is the behavior of physical observables. One may want to know the optical/magnetoelectric/thermal responses, so that experimental results can be directly compared. If the system is either quasi-static or static, those properties can be evaluated as derivatives of the energy with respect to an external perturbation (e.g., electric field)79.

The main bottleneck that prevents the simulation by the NQS in larger systems is the sampling efficiency. As mentioned by Choo et al. for the case of RBM27, and as known before in the VMC community, accurate calculations for relatively weak electronic correlations in the HF basis require an increasingly large number of Monte Carlo samples, because the amplitudes for multi-electron excitations are small. One may consider applying efficient sampling techniques, such as parallel tempering or heat-bath configuration interaction80, or even employing non-HF bases.


Stochastic imaginary-time evolution by variational Monte Carlo

Given an initial state \(\left|{{{\Psi }}}_{0}\right\rangle\) whose overlap with the true ground state is nonzero (and desirably not exponentially small), the ground state \(\left|{{{\Psi }}}_{{\rm{GS}}}\right\rangle\) can be simulated as

$$\begin{array}{r}\left|{{{\Psi }}}_{{\mathrm{GS}}}\right\rangle \propto \mathop{{{\lim}}}\limits_{N\to \infty }\mathop{{{\lim}}}\limits_{\eta \to 0}\left(\mathop{\prod }\limits_{k=1}^{N}{e}^{-\eta H}\right)\left|{{{\Psi }}}_{0}\right\rangle ,\end{array}$$

where H is the Hamiltonian of the system and η is a "learning rate" that determines the step of the imaginary-time evolution. The exact simulation of Eq. (10) for generic quantum many-body systems becomes exponentially inefficient as the system size grows. Hence, we approximate the quantum state by a variational ansatz \(\left|{{{\Psi }}}_{\theta }\right\rangle\) and consider the update rule of the parameters θ such that Eq. (10) is realized approximately.
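For a Hamiltonian small enough to be stored as a dense matrix, the projection in Eq. (10) can be carried out exactly, which illustrates what the variational update rule is meant to approximate. The sketch below uses a 4 × 4 nearest-neighbor hopping matrix as a hypothetical stand-in for a many-body Hamiltonian; the step size and number of steps are illustrative choices.

```python
import numpy as np

def chain_hamiltonian(n):
    """Toy Hamiltonian: open chain with nearest-neighbor hopping -1."""
    h = np.zeros((n, n))
    for i in range(n - 1):
        h[i, i + 1] = h[i + 1, i] = -1.0
    return h

def imaginary_time_projection(h, psi0, eta=0.1, n_steps=400):
    """Apply exp(-eta * H) repeatedly, renormalizing after every step."""
    w, v = np.linalg.eigh(h)
    step = (v * np.exp(-eta * w)) @ v.T   # exact matrix exponential of -eta * H
    psi = psi0 / np.linalg.norm(psi0)
    for _ in range(n_steps):
        psi = step @ psi
        psi /= np.linalg.norm(psi)
    return psi

h = chain_hamiltonian(4)
psi = imaginary_time_projection(h, np.ones(4))   # uniform start overlaps with the ground state
energy = psi @ h @ psi
exact_ground_energy = np.linalg.eigvalsh(h)[0]
```

Excited-state components decay as exp(−τ(En − E0)) with total imaginary time τ = Nη, which is why a nonzero initial overlap with the ground state is required.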

There are numerous variational principles that dictate the parameter updates. Here, we choose the stochastic reconfiguration method64,81, which uses the Fubini–Study metric \({\mathcal{F}}\) to measure the difference between the exact and variational imaginary-time evolution. Given a set of variational parameters θ, the update δθ is determined as

$$\delta \theta = \, \mathop{{\rm{arg}}\ {{\min}}}\limits_{{{\Delta }}}\left({\mathcal{F}}\left[{e}^{-\eta \hat{H}}\left|{{{\Psi }}}_{\theta }\right\rangle ,\left|{{{\Psi }}}_{\theta +{{\Delta }}}\right\rangle \right]\right)\\ = \, -\eta {g}^{-1}f$$

where \({\mathcal{F}}[\left|\psi \right\rangle ,\left|\phi \right\rangle ]=\arccos (\sqrt{\left\langle \psi | \phi \right\rangle \left\langle \phi | \psi \right\rangle /\left\langle \psi | \psi \right\rangle \left\langle \phi | \phi \right\rangle })\) and elements of the generic force fi and the geometric tensor gij are given as

$${f}_{i}={\partial }_{i}\frac{\left\langle {{{\Psi }}}_{\theta }| H| {{{\Psi }}}_{\theta }\right\rangle }{\left\langle {{{\Psi }}}_{\theta }| {{{\Psi }}}_{\theta }\right\rangle },$$
$${g}_{ij}=\frac{\left\langle {\partial }_{i}{{{\Psi }}}_{\theta }| {\partial }_{j}{{{\Psi }}}_{\theta }\right\rangle }{\left\langle {{{\Psi }}}_{\theta }| {{{\Psi }}}_{\theta }\right\rangle }-\frac{\left\langle {\partial }_{i}{{{\Psi }}}_{\theta }| {{{\Psi }}}_{\theta }\right\rangle }{\left\langle {{{\Psi }}}_{\theta }| {{{\Psi }}}_{\theta }\right\rangle }\frac{\left\langle {{{\Psi }}}_{\theta }| {\partial }_{j}{{{\Psi }}}_{\theta }\right\rangle }{\left\langle {{{\Psi }}}_{\theta }| {{{\Psi }}}_{\theta }\right\rangle },$$

where ∂i is the derivative with respect to the i-th element of the parameter θi. It is noteworthy that the geometric tensor g is the extension of the Fisher information to quantum states. The stochastic gradient method based on g, or the Fisher information, was independently developed in the machine learning community81, and is frequently referred to as the natural gradient method.

Note that both f and g can be estimated efficiently using Monte Carlo sampling. Indeed, any physical observable O can be estimated for a quantum state \(\left|{{\Psi }}\right\rangle\) as

$$\left\langle O\right\rangle =\frac{\left\langle {{\Psi }}| O| {{\Psi }}\right\rangle }{\left\langle {{\Psi }}| {{\Psi }}\right\rangle }=\frac{{\sum }_{\sigma }| {{\Psi }}(\sigma ){| }^{2}{O}_{{\rm{loc}}}(\sigma )}{{\sum }_{\sigma }| {{\Psi }}(\sigma ){| }^{2}}=\mathop{\sum}\limits_{\sigma }p(\sigma ){O}_{{\rm{loc}}}(\sigma ),$$

where \({O}_{{\rm{loc}}}(\sigma )={\sum }_{\sigma ^{\prime} }\frac{{{\Psi }}(\sigma ^{\prime} )}{{{\Psi }}(\sigma )}\left\langle \sigma | O| \sigma ^{\prime} \right\rangle\) is introduced to enable the simulation of the expectation value from classical sampling over the probability distribution \(p(\sigma )=| {{\Psi }}(\sigma ){| }^{2}/{\sum }_{\sigma }| {{\Psi }}(\sigma ){| }^{2}\). Using the Metropolis–Hastings algorithm with particle-number conservation, we typically sample \({\mathcal{O}}(1{0}^{5})\) to \({\mathcal{O}}(1{0}^{7})\) spin configurations to estimate p(σ). Each configuration is drawn every 10–20 Monte Carlo steps so that the autocorrelation, and hence the sampling error, is sufficiently small when the optimization converges.
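The sampling scheme can be illustrated on a system small enough to enumerate. The following is a minimal sketch, not the production setup: it assumes a toy product wave function with hypothetical random parameters, encodes configurations as occupations 0/1 rather than ±1, uses a nearest-neighbor exchange operator on an open chain as the observable, and proposes moves that swap one occupied and one empty site so that the particle number is conserved. The sample count and thinning interval are far smaller than those quoted above.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_sites, n_up = 6, 3
a = 0.3 * rng.normal(size=n_sites)   # hypothetical toy parameters

def log_psi(s):
    """Log-amplitude of the toy state Psi(s) = exp(sum_i a_i s_i)."""
    return float(a @ s)

def o_loc(s):
    """Local estimator of O = sum_i (s+_i s-_{i+1} + h.c.): sum of Psi(s')/Psi(s)
    over configurations s' connected to s by one nearest-neighbor swap."""
    total = 0.0
    for i in range(n_sites - 1):
        if s[i] != s[i + 1]:
            t = s.copy()
            t[i], t[i + 1] = t[i + 1], t[i]
            total += np.exp(log_psi(t) - log_psi(s))
    return total

def exact_expectation():
    """Exact sum over all configurations with n_up occupied sites."""
    num = den = 0.0
    for occ in itertools.combinations(range(n_sites), n_up):
        s = np.zeros(n_sites, dtype=int)
        s[list(occ)] = 1
        w = np.exp(2.0 * log_psi(s))
        num, den = num + w * o_loc(s), den + w
    return num / den

def metropolis_expectation(n_samples=20000, thin=5):
    """Metropolis sampling of p(s) with particle-number-conserving swap moves."""
    s = np.array([1] * n_up + [0] * (n_sites - n_up))
    values = []
    for step in range(n_samples * thin):
        occupied, empty = np.flatnonzero(s == 1), np.flatnonzero(s == 0)
        t = s.copy()
        t[occupied[rng.integers(len(occupied))]] = 0
        t[empty[rng.integers(len(empty))]] = 1
        if rng.random() < np.exp(2.0 * (log_psi(t) - log_psi(s))):
            s = t
        if (step + 1) % thin == 0:       # keep one configuration every `thin` steps
            values.append(o_loc(s))
    return float(np.mean(values))

exact = exact_expectation()
estimate = metropolis_expectation()
```

The proposal is symmetric, so the acceptance probability reduces to the ratio of squared amplitudes, and the normalization of p(σ) never has to be computed.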

Three technical remarks are in order. First, we take the initial state \(\left|{{{\Psi }}}_{0}\right\rangle (=\left|{{{\Psi }}}_{{\theta }_{0}}^{{\rm{RBM}}}\right\rangle )\) as the HF state such that the overlap with the ground state is nonzero. Small noise is added to avoid the gradient vanishing problem, which arises when the parameters of the RBM state are tuned to express any computational basis state exactly. Second, to stabilize the optimization, a small number ϵ is uniformly added to the diagonal elements of g as gii → gii + ϵ. While a large ϵ is beneficial in early iterations, it is necessary to decrease it, or otherwise one may end up in undesirable local minima. Therefore, ϵ is initially set as \({\mathcal{O}}(1{0}^{-2})\) and gradually decreased to \({\mathcal{O}}(1{0}^{-3})\) after several hundred steps. Third, we find that it is crucial to adopt an appropriate scheduling of η to speed up the optimization and, more importantly, avoid local minima. In the present work, we exclusively employ the RMSProp method82, which adaptively modifies η according to the magnitude of the gradient.
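The regularized update δθ = −η(g + ϵ)⁻¹f can be tried out on a toy problem in which all expectation values are computed by exact summation instead of sampling. The sketch below makes several simplifying assumptions that differ from the setup described above: the ansatz has one real log-amplitude parameter per basis state (not an RBM with complex parameters), the Hamiltonian is a 4 × 4 hopping matrix whose ground state has positive amplitudes, η and ϵ are held fixed, and no RMSProp scheduling is applied.

```python
import numpy as np

def chain_hamiltonian(n):
    """Toy Hamiltonian: open chain with nearest-neighbor hopping -1."""
    h = np.zeros((n, n))
    for i in range(n - 1):
        h[i, i + 1] = h[i + 1, i] = -1.0
    return h

def sr_step(theta, h, eta=0.05, eps=1e-3):
    """One stochastic-reconfiguration step with exact (non-sampled) averages."""
    psi = np.exp(theta)                      # ansatz: Psi(s) = exp(theta_s)
    p = psi ** 2 / np.sum(psi ** 2)          # probability distribution p(s)
    e_loc = (h @ psi) / psi                  # local energy
    energy = p @ e_loc
    o = np.eye(len(theta))                   # O_i(s) = d log Psi(s) / d theta_i
    o_mean = p @ o
    f = 2.0 * (o.T @ (p * e_loc) - o_mean * energy)           # energy gradient (real parameters)
    g = o.T @ (p[:, None] * o) - np.outer(o_mean, o_mean)     # geometric tensor
    delta = -eta * np.linalg.solve(g + eps * np.eye(len(theta)), f)
    return theta + delta, energy

h = chain_hamiltonian(4)
theta = np.zeros(4)
for _ in range(400):
    theta, energy = sr_step(theta, h)
exact_ground_energy = np.linalg.eigvalsh(h)[0]
```

Without the shift ϵ the matrix g is singular here, because a uniform change of all parameters only rescales the wave function; the diagonal shift removes this null direction while leaving the fixed point unchanged.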

Energy corrections by the extended Koopmans’ theorem

In Fig. 5, we visualize the effect of the corrections to the energy bands by the extended Koopmans’ theorem, which are defined in Eq. (9) in the main text as

$$\left\{\begin{array}{lll}{{\Delta }}{E}^{{\mathrm{IP}}}&=&{E}_{{\mathrm{GS}}}^{{N}_{v}}-{E}_{{\mathrm{GS}}}^{{N}_{v}-1},\\ {{\Delta }}{E}^{{\mathrm{EA}}}&=&{E}_{{\mathrm{GS}}}^{{N}_{v}+1}-{E}_{{\mathrm{GS}}}^{{N}_{v}},\end{array}\right.$$

where \({E}_{{\mathrm{GS}}}^{n}\) is the energy of the RBM optimized in the particle-number sector n. Here, panels (a) and (b) indicate the first conduction and valence bands, respectively. In both bands, we observe a systematic deviation, which we attribute to the lack of orbital relaxation effect caused by the removal or addition of a single electron. The order of the correction ΔE ~ 0.05 Ha is comparable to that of the electronic correlation (~0.1 Ha).

Fig. 5: The effect of energy correction to quasiparticle bands.

Here, we display a the lowest conduction band and b the highest valence band. The unfilled green triangles denote the raw values obtained by solving Eq. (6) defined in the main text, and the filled ones indicate the values corrected by Eq. (9) following the extended Koopmans' theorem. The blue dotted lines, yellow lines, and black dashed lines indicate the results by the restricted Hartree–Fock method, the coupled-cluster equation-of-motion formalism with singles and doubles (EOM-CCSD), and EOM-CCSD with perturbative triple excitations (EOM-CCSD(T)(a)*), respectively.