Abstract
Computing the ground state of interacting quantum matter is a longstanding challenge, especially for complex two-dimensional systems. Recent developments have highlighted the potential of neural quantum states to solve the quantum many-body problem by encoding the many-body wavefunction into artificial neural networks. However, this method has faced the critical limitation that existing optimization algorithms are not suitable for training modern large-scale deep network architectures. Here, we introduce a minimum-step stochastic-reconfiguration optimization algorithm, which allows us to train deep neural quantum states with up to 10^{6} parameters. We demonstrate our method for paradigmatic frustrated spin-1/2 models on square and triangular lattices, for which our trained deep networks approach machine precision and yield improved variational energies compared to existing results. Equipped with our optimization algorithm, we find numerical evidence for gapless quantum-spin-liquid phases in the considered models, an open question to date. We present a method that captures the emergent complexity in quantum many-body problems through the expressive power of large-scale artificial neural networks.
Main
It has been an ever-persisting quest in condensed-matter and quantum many-body physics to capture the essence of quantum many-body systems that is hidden behind their exponential complexity. Although many numerical methods have been developed to access the quantum many-body problem with strong interactions, it still remains an extraordinary challenge to obtain accurate ground-state solutions, especially for complex and large two-dimensional systems. The respective challenges depend on the method utilized, such as the ‘curse of dimensionality’ in exact diagonalization^{1}, the notorious sign problem^{2} in quantum Monte Carlo approaches^{3} or the growth of entanglement and matrix contraction complexity in tensor network methods^{4}. One of the paradigmatic instances of such complex two-dimensional quantum matter is the putative quantum-spin-liquid (QSL) phase in frustrated magnets^{5}. Although a large variety of different numerical methods have been applied, the nature of many of the presumed QSLs still remains debated, such as the prototypical frustrated Heisenberg J_{1}–J_{2} magnets on square^{6,7,8,9,10,11,12} or triangular lattices^{13,14,15,16,17,18,19,20,21,22}.
Recently, neural quantum states (NQSs) have been introduced as a promising alternative for solving the quantum many-body problem by means of artificial neural networks^{23}. This approach has already seen tremendous progress for QSLs^{24,25,26}. However, this method also faces an outstanding challenge critically limiting its capabilities and its potential to date. Due to the rugged quantum landscape^{27} with many saddle points, it is typically necessary to utilize stochastic reconfiguration (SR)^{28} in the optimization. SR is a quantum generalization of natural gradient descent^{29} and has a \({{{\mathcal{O}}}}({N}_\mathrm{p}^{3})\) complexity for a network with N_{p} parameters, which impedes the training of deep networks. Consequently, the current applications of NQS mainly focus on shallow networks, such as a restricted Boltzmann machine (RBM)^{23,30} or shallow convolutional neural networks (CNNs)^{25,31} with no more than ten layers and around 10^{3} parameters. Many efforts have been made to overcome the optimization difficulty in deep NQS based on either iterative solvers^{23}, approximate optimizers^{32,33,34,35,36} or large-scale supercomputers^{37,38}. However, the cost of SR still represents the key limitation in increasing the network size and, thereby, fully materializing the exceptional power of artificial neural networks for outstanding physics problems.
In this work, we introduce an alternative training algorithm for NQS, which we term the minimum-step stochastic reconfiguration (MinSR). We show that the optimization cost in MinSR is reduced massively while it remains as accurate as SR. Concretely, the training cost of MinSR is only linear in N_{p}, which represents an enormous acceleration compared to SR. This, in turn, allows us to push the NQS towards the deep era by training deep networks with up to 64 layers and 10^{6} parameters. We apply our resulting algorithm to paradigmatic two-dimensional quantum spin systems, such as the spin-1/2 Heisenberg J_{1}–J_{2} model, both to demonstrate the resulting accuracies for large system sizes beyond what is achievable with other computational methods and to address an outstanding question relating to the gaps in the model’s QSL phases.
Results
Minimum-step stochastic reconfiguration
In the NQS approach, a neural network is utilized to encode and compress the many-body wavefunction. In a system with N spin-1/2 degrees of freedom, the Hilbert space can be spanned by the S_{z} spin configuration basis \(\left\vert \sigma \right\rangle =\left\vert {\sigma }_{1},\ldots ,{\sigma }_{N}\right\rangle\) with σ_{i} = ↑ or ↓. An NQS with parameters θ maps every σ at the input to a wavefunction component ψ_{θ,σ} at the output^{23}, as shown in Fig. 1a. The full quantum state is then given by the superposition \(\left\vert {\varPsi }_{\theta }\right\rangle ={\sum }_{\sigma }{\psi }_{\theta ,\sigma }\left\vert \sigma \right\rangle\). When searching for ground states based on a variational Monte Carlo method (VMC), θ is optimized to minimize the variational energy \({E}_{\theta }=\left\langle {\varPsi }_{\theta }\right\vert {{{\mathcal{H}}}}\left\vert {\varPsi }_{\theta }\right\rangle /\left\langle {\varPsi }_{\theta }\vert {\varPsi }_{\theta }\right\rangle\).
The standard numerical approach for finding the minimal variational energy for NQS is SR. This is done by approximately implementing imaginary-time evolution. Thus, as the training progresses, the contributions from eigenstates with higher energies are systematically reduced, thereby pushing the state towards the ground state step by step. In every training step, this requires minimizing the quantum distance d between the new variational state \(\left\vert {\varPsi }_{\theta +\delta \theta }\right\rangle\) and the exact imaginary-time evolved state \(\operatorname{e}^{-{{{\mathcal{H}}}}\delta \tau }\left\vert {\varPsi }_{\theta }\right\rangle\), where δτ is the imaginary-time interval.
As proven in the Supplementary Information, the quantum distance d can be estimated for a group of samples σ with P_{σ} ∝ ∣ψ_{σ}∣^{2} as \({d}^{\;2}={\sum }_{\sigma }{\left\vert {\sum }_{k}{\overline{O}}_{\sigma k}\delta {\theta }_{k}-{\overline{\epsilon }}_{\sigma }\right\vert }^{2}\), where the sum ∑_{σ} runs over the spin configurations in the samples. We adopt the following notation: \({\overline{O}}_{\sigma k}=({O}_{\sigma k}-\left\langle {O}_{\sigma k}\right\rangle )/\sqrt{{N}_\mathrm{s}}\) with \({O}_{\sigma k}=\frac{1}{{\psi }_{\sigma }}\frac{\partial {\psi }_{\sigma }}{\partial {\theta }_{k}}\), and \({\overline{\epsilon }}_{\sigma }=-\delta \tau\left({E}_{{{{\rm{loc}}}},\sigma }-\left\langle {E}_{{{{\rm{loc}}}},\sigma }\right\rangle\right)/\sqrt{{N}_\mathrm{s}}\) with local energy \({E}_{{{{\rm{loc}}}},\sigma }={\sum }_{{\sigma }^{{\prime} }}\frac{{\psi }_{{\sigma }^{{\prime} }}}{{\psi }_{\sigma }}{H}_{\sigma {\sigma }^{{\prime} }}\), where N_{s} is the number of samples and \(\left\langle \ldots \right\rangle\) represents the mean value over the given set of samples.
Thus, the quantum distance d can be rewritten as \(d=\Vert \overline{O}\delta \theta -\overline{\epsilon }\Vert\) if we treat δθ and \(\overline{\epsilon }\) as vectors and \(\overline{O}\) as a matrix. As a key consequence, we introduce a new linear equation
\[\overline{O}\,\delta \theta =\overline{\epsilon },\tag{1}\]
whose least-squares solution minimizes the quantum distance d and leads to the SR equation. Conceptually, one can understand the left-hand side of this equation as the change of the variational state induced by an optimization step of the parameters, and the right-hand side as the change of the exact imaginary-time evolving state. The traditional SR solution minimizing their difference is
\[\delta \theta ={S}^{-1}F,\qquad S={\overline{O}}^{{\dagger} }\overline{O},\quad F={\overline{O}}^{{\dagger} }\overline{\epsilon }.\tag{2}\]
As illustrated in Fig. 1a, the matrix S in equation (2) plays an important role as the quantum metric in VMC^{29,39,40}, which links the variations in the Hilbert space and the parameter space. However, inverting the matrix S, which has N_{p} × N_{p} elements, has \({{{\mathcal{O}}}}({N}_\mathrm{p}^{3})\) complexity, and this is a major difficulty when optimizing deep NQSs with large N_{p}. To reduce the cost of SR, we focus on a specific optimization case of a deep network with a large number of parameters N_{p} but a relatively small number of batch samples N_{s}, as occurs in most deep learning research. In this case, as shown in Fig. 1b, the rank of the N_{p} × N_{p} matrix S is at most N_{s}, meaning that S contains much less information than its capacity. As a more efficient way to express the information of the quantum metric, we introduce the neural tangent kernel \(T=\overline{O}\,{{\overline{O}}^{{\dagger} }}\) (ref. ^{41}), which has the same nonzero eigenvalues as S, while the matrix size is reduced from N_{p} × N_{p} to N_{s} × N_{s}.
As derived in Methods, we propose a new method termed MinSR using T as the compressed matrix,
\[\delta \theta ={\overline{O}}^{{\dagger} }{T}^{-1}\overline{\epsilon },\tag{3}\]
which is mathematically equivalent to the traditional SR solution but only has \({{{\mathcal{O}}}}({N}_\mathrm{p}{N}_\mathrm{s}^{2}+{N}_\mathrm{s}^{3})\) complexity. For large N_{p}, it provides a tremendous acceleration with a time cost proportional to N_{p} instead of \({N}_\mathrm{p}^{3}\). Therefore, it can be viewed as a natural reformulation of traditional SR, which is particularly useful in the limit N_{p} ≫ N_{s}, as relevant in deep learning situations. For a performance comparison, Extended Data Fig. 1 shows the time cost and accuracy of different optimization methods.
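The equivalence between the two update rules can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the function names, the random test data and the pseudoinverse cutoff `rcond` are all illustrative assumptions. It builds the centred quantities from per-sample log-derivatives and local energies, then compares the SR update (an N_p × N_p pseudoinverse) with the MinSR update (an N_s × N_s pseudoinverse).

```python
import numpy as np

def centred_quantities(O, E_loc, dtau):
    """Centre and rescale log-derivatives O (N_s x N_p) and local energies (N_s)."""
    n_s = O.shape[0]
    O_bar = (O - O.mean(axis=0)) / np.sqrt(n_s)
    eps_bar = -dtau * (E_loc - E_loc.mean()) / np.sqrt(n_s)
    return O_bar, eps_bar

def sr_step(O_bar, eps_bar, rcond=1e-10):
    """SR update: pseudoinverse of the N_p x N_p matrix S = O_bar^dagger O_bar."""
    S = O_bar.conj().T @ O_bar
    F = O_bar.conj().T @ eps_bar
    return np.linalg.pinv(S, rcond=rcond, hermitian=True) @ F

def minsr_step(O_bar, eps_bar, rcond=1e-10):
    """MinSR update: pseudoinverse of the N_s x N_s matrix T = O_bar O_bar^dagger."""
    T = O_bar @ O_bar.conj().T
    return O_bar.conj().T @ (np.linalg.pinv(T, rcond=rcond, hermitian=True) @ eps_bar)

# Random stand-in data with N_s << N_p (purely illustrative, not physical).
rng = np.random.default_rng(0)
n_s, n_p = 8, 50
O = rng.normal(size=(n_s, n_p)) + 1j * rng.normal(size=(n_s, n_p))
E_loc = rng.normal(size=n_s) + 1j * rng.normal(size=n_s)
O_bar, eps_bar = centred_quantities(O, E_loc, dtau=0.01)
dtheta_sr = sr_step(O_bar, eps_bar)
dtheta_minsr = minsr_step(O_bar, eps_bar)
print(np.allclose(dtheta_sr, dtheta_minsr))
```

In this sketch the only N_p-sized operations in `minsr_step` are the two matrix products with `O_bar`, which is where the linear scaling in N_p comes from.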
Benchmark models
To demonstrate the exceptional performance of MinSR, we consider in the following the paradigmatic spin-1/2 Heisenberg J_{1}–J_{2} model on a square lattice. This choice serves two purposes. On the one hand, this model serves as a standard benchmark system in various NQS studies and provides a convenient comparison to other state-of-the-art methods. On the other hand, it represents a paradigmatic reference case of QSLs in frustrated magnets, for which an outstanding question is whether the QSL phase is gapped or gapless. The Hamiltonian of the system is given by
\[{{{\mathcal{H}}}}={J}_{1}\sum_{\left\langle i,j\right\rangle }{{{{\bf{S}}}}}_{i}\cdot {{{{\bf{S}}}}}_{j}+{J}_{2}\sum_{\left\langle \left\langle i,j\right\rangle \right\rangle }{{{{\bf{S}}}}}_{i}\cdot {{{{\bf{S}}}}}_{j},\tag{4}\]
where \({{{{\bf{S}}}}}_{i}=({S}_{i}^{x},{S}_{i}^{y},{S}_{i}^{z})\) with \({S}_{i}^{x},{S}_{i}^{y},{S}_{i}^{z}\) spin-1/2 operators at site i, \(\left\langle i,j\right\rangle\) and \(\left\langle \left\langle i,j\right\rangle \right\rangle\) indicate pairs of nearest-neighbour and next-nearest-neighbour sites, respectively, and J_{1} is chosen to be equal to 1 for simplicity in this work.
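To make the two sums in the Hamiltonian concrete, the sketch below enumerates the nearest-neighbour and next-nearest-neighbour bonds of a periodic L × L square lattice and evaluates the diagonal matrix element ⟨σ|H|σ⟩ of a basis configuration, to which only the S^z S^z terms contribute. The function names, the row-major site indexing and the use of periodic boundaries are assumptions made for illustration.

```python
def j1j2_bonds(L):
    """Nearest- and next-nearest-neighbour bonds on a periodic L x L square lattice.

    Each bond is listed once for L >= 3. Sites are indexed as x * L + y.
    """
    def site(x, y):
        return (x % L) * L + (y % L)

    nn, nnn = [], []
    for x in range(L):
        for y in range(L):
            i = site(x, y)
            nn += [(i, site(x + 1, y)), (i, site(x, y + 1))]
            nnn += [(i, site(x + 1, y + 1)), (i, site(x + 1, y - 1))]
    return nn, nnn

def diagonal_energy(spins, L, J2, J1=1.0):
    """<sigma|H|sigma> for spins given as +1/-1 values (S^z = spins / 2)."""
    nn, nnn = j1j2_bonds(L)
    e = sum(J1 * spins[i] * spins[j] for i, j in nn)
    e += sum(J2 * spins[i] * spins[j] for i, j in nnn)
    return e / 4.0

# Neel configuration on a 4 x 4 lattice at J2/J1 = 1/2.
L = 4
neel = [(-1) ** (x + y) for x in range(L) for y in range(L)]
nn, nnn = j1j2_bonds(L)
print(len(nn), len(nnn), diagonal_energy(neel, L, J2=0.5))
```

For the Néel configuration every nearest-neighbour pair is antiparallel and every next-nearest-neighbour pair is parallel, so the J_{2} term raises the energy, which is the frustration discussed in the text.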
We will specifically focus on two points in the parameter space: J_{2}/J_{1} = 0 and J_{2}/J_{1} = 1/2. At J_{2}/J_{1} = 0, the Hamiltonian reduces to the nonfrustrated Heisenberg model. At J_{2}/J_{1} = 1/2, the J_{1}–J_{2} model becomes strongly frustrated close to the maximally frustrated point where the system resides in a QSL phase^{24}, which imposes a great challenge for existing numerical methods, including NQS^{31,42}. Two different designs of residual neural networks (ResNet), whose details we describe in Methods, will be employed for variationally learning the ground states of these benchmark models. A direct comparison with exact diagonalization results for the 6 × 6 square lattice can be found in Extended Data Fig. 2, which shows that our network can even approach machine precision on modern GPU and TPU hardware.
For a nonfrustrated Heisenberg model of a 10 × 10 square lattice, a deep NQS trained by MinSR provides an unprecedentedly precise result that is better than all existing variational methods, as shown in Fig. 2a. The adopted reference ground-state energy per site is E_{GS}/N = −0.67155267(5), as given by a simulation based on a stochastic series expansion^{43} performed by ourselves, instead of the commonly used reference E/N = −0.671549(4) from ref. ^{44} because our best NQS variational energy E/N = −0.67155260(3) provides even better accuracy compared to this common reference energy. Thanks to the deep network architecture and the efficient MinSR method, the relative error of the variational energy ϵ_{rel} = (E − E_{GS})/∣E_{GS}∣ drops much faster than for the one-layer RBM as N_{p} increases and finally reaches a level of 10^{−7}, greatly outperforming existing results.
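As a worked check of the quoted accuracy, inserting the two per-site energies stated above into the definition of the relative error gives

\[{\epsilon }_{{{{\rm{rel}}}}}=\frac{E-{E}_{{{{\rm{GS}}}}}}{| {E}_{{{{\rm{GS}}}}}| }=\frac{-0.67155260-(-0.67155267)}{0.67155267}=\frac{7\times 1{0}^{-8}}{0.67155267}\approx 1.0\times 1{0}^{-7}.\]

The quoted uncertainties on the last digits, (3) and (5), are of the same order as this difference, so the value should be read as the scale of the error rather than as a precise figure.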
To attain the next level of complexity, we will now focus on the frustrated J_{1}–J_{2} model, whose accurate ground-state solution has remained a key challenge for all available computational approaches. Figure 2b shows that, for a 10 × 10 square lattice, our method based on MinSR allows us to reach ground-state energies below what is possible with any other numerical scheme so far. In this context, the Marshall sign rule (MSR) limit shows the energy one can obtain without considering any frustration. As shown in the figure, the use of deep NQS becomes absolutely crucial as the shallow CNN is not guaranteed to beat the MSR limit. Most importantly, the variational energy we obtained was reduced upon increasing the network size for both networks trained by MinSR. We finally trained unprecedentedly large networks with 64 convolutional layers in ResNet1 and more than one million parameters in ResNet2, to attain the best variational energy E/N = −0.4976921(4), which outperforms all existing numerical results. The extraordinary variational outcomes allow us to accurately estimate the ground-state energy E_{GS}/N = −0.497715(9) by zero-variance extrapolation, as described in Methods. Compared with the previous best result^{24}, ϵ_{rel} in our biggest network is around 4 times lower, suggesting that our deep NQS result is substantially more accurate. From this, we conclude that the deep NQS trained by MinSR is superior even in the frustrated case, which was argued to be challenging for NQS on a general level^{45}. The variational energies of different methods in this prototypical model are summarized in Extended Data Table 1.
Finally, we aim to provide evidence that our approach still exhibits advantageous performance compared to other computational methods upon further increasing the system size. Figure 2c presents the variational energy obtained for a 16 × 16 square lattice and compares the results with existing results in the literature. One can clearly see that our approach yields the best variational energy E/N = −0.4967163(8) for the frustrated J_{1}–J_{2} model on such a large lattice. Compared with the best existing variational result given in ref. ^{37}, ϵ_{rel} in this work is still 2.5 × 10^{−4} lower. In summary, the deep NQS trained by MinSR provides results for large frustrated models that are not only on a par with other state-of-the-art methods but can substantially outperform them.
Energy gaps of a QSL
Although so far we have focused on demonstrating the exceptional performance of the MinSR method, we now take the next step by addressing an outstanding physical question regarding the J_{1}–J_{2} Heisenberg model considered. Concretely, we utilize the combination of the deep NQS and MinSR to study the gaps for two famous QSL candidates in the J_{1}–J_{2} model on a square lattice and on a triangular lattice. In these systems, several works in the literature^{6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22} have shown the existence of QSL phases, although the energy gaps in the thermodynamic limit, especially for the triangular lattice, still remain debated. Figure 3 presents an extrapolation of the energy gaps between states with total spin S = 0 and S = 1 to the thermodynamic limit within the most frustrated regime in which QSL candidates reside. As explained in Extended Data Figs. 3 and 4, the energy is estimated by NQS trained by MinSR with a Lanczos step and zero-variance extrapolation to increase accuracy. In the Supplementary Information, we provide the spin and dimer structure factors to support the existence of a QSL phase on the triangular lattice and compare gap estimates with and without zero-variance extrapolation.
On the square lattice, the gaps are measured in the total spin S = 1 sector and momentum k = (π, π) (M point) at the most frustrated point J_{2}/J_{1} = 0.5 for different system sizes, including 6 × 6, 8 × 8, 10 × 10, 12 × 12, 16 × 16 and 20 × 20. As shown by the small fitting error in Fig. 3 with Δ = a + b/L + c/L^{2}, the vanishing gap Δ = 0.00(3) in the thermodynamic limit provides an unprecedented precision and is so far the most accurate extrapolation at this most frustrated point. In addition to the direct extrapolation of the energy gap Δ, we support our finding of a vanishing gap in the inset of Fig. 3, which displays Δ × L as a function of 1/L (ref. ^{24}). Whereas a finite gap would imply a divergent Δ × L, we observe a constant value, further corroborating our conclusion of a gapless phase in the thermodynamic limit. Combined with the large lattice sizes used, this result provides strong evidence of gapless QSLs as suggested by refs. ^{10,11,12,24}, in contrast to the conclusion of gapped QSLs in ref. ^{6}.
The triangular J_{1}–J_{2} model has even stronger frustration compared to the square one, leading to larger variational errors in different methods and more controversy regarding the nature of the QSLs. To target the QSLs in this model, we also studied the most frustrated point at J_{2}/J_{1} = 0.125. The gaps were measured for the S = 1 and k = (4π/3, 0) state on 6 × 6, 6 × 9, 12 × 12 and 18 × 18 triangular lattices. Due to the larger variational error on the triangular lattice compared to the square case, a linear fitting Δ = a + b/L was utilized instead of the quadratic one to prevent overfitting. For a lattice with unequal extents L_{x} and L_{y} in different dimensions, L is defined as \(\sqrt{{L}_{x}{L}_{y}}\). Our data match the linear relation Δ ∝ 1/L expected for Dirac spin liquids well, and the extrapolated gap in the thermodynamic limit is Δ = −0.05(6), which is consistent with zero. Furthermore, we also performed an extrapolation of Δ × L (inset of Fig. 3). We found a finite Δ × L upon increasing the system size L, indicating a vanishing gap in the thermodynamic limit. We take these results as strong numerical evidence suggesting the existence of a gapless QSL as also indicated in refs. ^{13,16,20,22} instead of a gapped QSL in refs. ^{14,15,21}. Consequently, these numerical results demonstrate the exceptional computational power of the MinSR method applied to NQS wavefunctions, especially for the challenging regime of frustrated quantum magnets in two dimensions.
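The finite-size extrapolations described above amount to a polynomial least-squares fit in 1/L. The sketch below shows one way to do this with NumPy; the function name is hypothetical and the gap values are synthetic numbers generated from a known gapless form, not data from this work.

```python
import numpy as np

def extrapolate_gap(sizes, gaps, order=2):
    """Fit gap = a + b/L (+ c/L^2 for order=2) and return (a, b, ...).

    The first returned coefficient, a, is the extrapolated gap at 1/L -> 0.
    """
    x = 1.0 / np.asarray(sizes, dtype=float)
    coeffs = np.polyfit(x, np.asarray(gaps, dtype=float), deg=order)
    return coeffs[::-1]  # np.polyfit returns the highest power first

# Synthetic, purely illustrative data: a = 0 (gapless), b = 2, c = 3.
sizes = np.array([6.0, 8.0, 10.0, 12.0, 16.0, 20.0])
synthetic_gaps = 2.0 / sizes + 3.0 / sizes**2
a, b, c = extrapolate_gap(sizes, synthetic_gaps, order=2)

# Linear variant (order=1), as used for the triangular lattice, where
# L = sqrt(Lx * Ly) for lattices with unequal extents such as 6 x 9.
tri_sizes = np.sqrt(np.array([6 * 6, 6 * 9, 12 * 12, 18 * 18], dtype=float))
a_lin, b_lin = extrapolate_gap(tri_sizes, 1.5 / tri_sizes, order=1)
print(a, b, c, a_lin, b_lin)
```

With real data one would also propagate the statistical errors of each gap into the fit, which this sketch omits.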
Discussion
To date, there have been tremendous efforts in solving quantum many-body problems in two major directions: studying the simplified Hilbert space given by specific physical backgrounds on classical computers, and traversing the full Hilbert space on quantum computers. In this work, we present another promising approach that is supported by deep NQSs. This method allows us to approximate the complexity of quantum many-body problems through the emergent expressive power of large-scale neural networks.
For the future, we envision promising research directions, for instance, studying fermionic systems including the celebrated Hubbard model^{46,47} or ab initio quantum chemistry^{48}, in which the traditional methods have limited accuracy, especially in the strongly interacting regime. Moreover, it is key to point out that the MinSR method is not at all restricted to NQS. As a general optimization method in VMC, it can also be applied to other variational wavefunctions, like tensor networks, so that a more complex ansatz can be introduced in these conventional methods to enhance the expressivity. It will also be of great importance to exploit the expressive power of largescale variational wavefunctions through a suitable design that would lower the computational cost and increase the accuracy.
We can further envision the application of MinSR beyond the scope of physics for general machine learning tasks, if a suitable space for optimization like the Hilbert space in physics can be defined for which we can construct an equation like equation (1). In reinforcement learning tasks, for instance, obtaining gradients from the action in the environment is usually the most time-consuming part of the training, so a MinSR-like natural policy gradient^{49} can provide more accurate optimization directions without substantially increased time cost and greatly improve the training efficiency, even for very deep neural networks. Recently, a method inspired by MinSR has already found applications in general machine learning tasks^{50}.
Methods
Derivation of the MinSR equation
MinSR was derived based on the observation that equation (1) is underdetermined when N_{s} < N_{p}. To obtain a unique δθ solution, we employed the least-squares minimum-norm condition, which is widely used for underdetermined linear equations. To be specific, we chose, among all solutions with minimum residual error \(\Vert \overline{O}\delta \theta -\overline{\epsilon }\Vert\), the one minimizing the norm of the variational step ∣∣δθ∣∣, which helps to reduce higher-order effects, prevent overfitting and improve stability. We called this method MinSR due to the additional minimum-step condition. In this section, we adopt two different approaches, namely the Lagrangian multiplier method and the pseudoinverse method, to derive the MinSR formula in equation (3).
Lagrangian multiplier
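The MinSR solution can be derived by minimizing the variational step ∑_{k}∣δθ_{k}∣^{2} under the constraint of minimum residual error \({\sum }_{\sigma }{\left\vert {\sum }_{k}{\overline{O}}_{\sigma k}\delta {\theta }_{k}-{\overline{\epsilon }}_{\sigma }\right\vert }^{2}\). To begin, we assume that the minimum residual error is 0, which can always be achieved by letting N_{s} < N_{p} and assuming a typical situation in VMC that \({\overline{O}}_{\sigma k}\) values obtained by different samples are linearly independent. This leads to constraints \({\sum }_{k}{\overline{O}}_{\sigma k}\delta {\theta }_{k}-{\overline{\epsilon }}_{\sigma }=0\) for each σ. The Lagrange function is then given by
\[{{{\mathcal{L}}}}(\{\delta {\theta }_{k}\},\{{\alpha }_{\sigma }\})=\sum_{k}| \delta {\theta }_{k}{| }^{2}-\left[\sum_{\sigma }{\alpha }_{\sigma }^{* }\left(\sum_{k}{\overline{O}}_{\sigma k}\delta {\theta }_{k}-{\overline{\epsilon }}_{\sigma }\right)+{{{\rm{c.c.}}}}\right],\tag{5}\]
where α_{σ} is the Lagrangian multiplier. Written in matrix form, the Lagrangian function becomes
\[{{{\mathcal{L}}}}(\delta \theta ,\alpha )=\delta {\theta }^{{\dagger} }\delta \theta -{\alpha }^{{\dagger} }(\overline{O}\delta \theta -\overline{\epsilon })-{(\overline{O}\delta \theta -\overline{\epsilon })}^{{\dagger} }\alpha .\tag{6}\]
From \(\partial {{{\mathcal{L}}}}/\partial (\delta {\theta }^{{\dagger} })=0\), one obtains
\[\delta \theta ={\overline{O}}^{{\dagger} }\alpha .\tag{7}\]
Putting equation (7) back into \(\overline{O}\delta \theta =\overline{\epsilon }\), one can solve α as
\[\alpha ={(\overline{O}\,{\overline{O}}^{{\dagger} })}^{-1}\overline{\epsilon }.\tag{8}\]
Combining equation (8) with equation (7), one obtains the final solution as
\[\delta \theta ={\overline{O}}^{{\dagger} }{(\overline{O}\,{\overline{O}}^{{\dagger} })}^{-1}\overline{\epsilon },\tag{9}\]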
which is the MinSR formula in equation (3). A similar derivation also applies when \(\overline{O},\delta \theta\) and \(\overline{\epsilon }\) are all real.
In our simulations, the residual error is nonzero, which differs from our previous assumption. This is because the inverse in equation (9) is replaced by a pseudoinverse with finite truncation to stabilize the solution in the numerical experiments.
Pseudoinverse
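To simplify the notation, we use \(A=\overline{O}\), \(x=\delta \theta\) and \(b=\overline{\epsilon }\). We will prove that for a linear equation Ax = b,
\[x={A}^{-1}b={({A}^{{\dagger} }A)}^{-1}{A}^{{\dagger} }b={A}^{{\dagger} }{(A{A}^{{\dagger} })}^{-1}b\tag{10}\]
is the least-squares minimum-norm solution, where each matrix inverse denotes a pseudoinverse.
First, we prove that x = A^{−1}b is the solution we need. The singular value decomposition of A gives
\[A=U\varSigma {V}^{\;{\dagger} },\tag{11}\]
where U and V are unitary matrices, and Σ is a diagonal matrix with σ_{i} = Σ_{ii} = 0 if and only if i > r with r the rank of A. The least-squares solution is given by minimizing
\[\Vert Ax-b{\Vert }^{2}=\Vert U\varSigma {V}^{\;{\dagger} }x-b{\Vert }^{2}=\Vert \varSigma {x}^{{\prime} }-{b}^{{\prime} }{\Vert }^{2}=\sum_{i=1}^{r}| {\sigma }_{i}{x}_{i}^{{\prime} }-{b}_{i}^{{\prime} }{| }^{2}+\sum_{i=r+1}^{{N}_\mathrm{s}}| {b}_{i}^{{\prime} }{| }^{2},\tag{12}\]
where \({x}^{{\prime} }={V}^{\;{\dagger} }x\), \({b}^{{\prime} }={U}^{\;{\dagger} }b\) and N_{s} is the dimension of b, and the second step is because applying a unitary matrix does not change the norm of a vector. Therefore, all the least-squares solutions take the form
\[{x}_{i}^{{\prime} }=\begin{cases}{b}_{i}^{{\prime} }/{\sigma }_{i}, & i\le r,\\ {{{\rm{arbitrary}}}}, & i > r.\end{cases}\tag{13}\]
Among all these possible solutions, the one that minimizes \(\Vert x\Vert =\Vert {x}^{{\prime} }\Vert\) is
\[{x}_{i}^{{\prime} }=\begin{cases}{b}_{i}^{{\prime} }/{\sigma }_{i}, & i\le r,\\ 0, & i > r.\end{cases}\tag{14}\]
With the following definition of a pseudoinverse
\[{\varSigma }_{ii}^{+}=\begin{cases}1/{\sigma }_{i}, & {\sigma }_{i}\ne 0,\\ 0, & {\sigma }_{i}=0,\end{cases}\tag{15}\]
we have \({x}^{{\prime} }={\varSigma }^{+}{b}^{{\prime} }\), so the final solution is
\[x=V{x}^{{\prime} }=V{\varSigma }^{+}{U}^{\;{\dagger} }b={A}^{-1}b.\tag{16}\]
Furthermore, we show the following equality
\[{A}^{-1}={({A}^{{\dagger} }A)}^{-1}{A}^{{\dagger} }={A}^{{\dagger} }{(A{A}^{{\dagger} })}^{-1}.\tag{17}\]
With the singular value decomposition of A in equation (11), equation (17) can be directly proved by
\[{({A}^{{\dagger} }A)}^{-1}{A}^{{\dagger} }={(V{\varSigma }^{2}{V}^{\;{\dagger} })}^{-1}V\varSigma {U}^{\;{\dagger} }=V{({\varSigma }^{+})}^{2}{V}^{\;{\dagger} }V\varSigma {U}^{\;{\dagger} }=V{\varSigma }^{+}{U}^{\;{\dagger} }={A}^{-1}\tag{18}\]
and
\[{A}^{{\dagger} }{(A{A}^{{\dagger} })}^{-1}=V\varSigma {U}^{\;{\dagger} }{(U{\varSigma }^{2}{U}^{\;{\dagger} })}^{-1}=V\varSigma {U}^{\;{\dagger} }U{({\varSigma }^{+})}^{2}{U}^{\;{\dagger} }=V{\varSigma }^{+}{U}^{\;{\dagger} }={A}^{-1}.\tag{19}\]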
In the derivation, the shapes of diagonal matrices Σ and Σ^{+} are not fixed but assumed to match their neighbour matrices to make the matrix multiplication valid.
Equation (17) shows that the SR solution in equation (2) and MinSR solution in equation (3) are both equivalent to the pseudoinverse solution \(\delta \theta ={\overline{O}}^{-1}\overline{\epsilon }\), which justifies MinSR as a natural alternative to SR when N_{s} < N_{p}.
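The two properties proved in this section can be illustrated numerically. The sketch below uses a random complex matrix as a stand-in for \(\overline{O}\) (all names and sizes are illustrative assumptions) and checks that the three pseudoinverse expressions agree and that adding any null-space component to the solution keeps the residual unchanged while increasing the norm.

```python
import numpy as np

rng = np.random.default_rng(1)
n_s, n_p = 5, 12  # underdetermined case, N_s < N_p
A = rng.normal(size=(n_s, n_p)) + 1j * rng.normal(size=(n_s, n_p))
b = rng.normal(size=n_s) + 1j * rng.normal(size=n_s)
Ah = A.conj().T

# Three routes to the least-squares minimum-norm solution.
x_pinv = np.linalg.pinv(A) @ b
x_sr = np.linalg.pinv(Ah @ A, rcond=1e-10, hermitian=True) @ (Ah @ b)
x_minsr = Ah @ (np.linalg.pinv(A @ Ah, hermitian=True) @ b)

# Another exact solution: add a vector n from the null space of A.
z = rng.normal(size=n_p) + 1j * rng.normal(size=n_p)
n = z - np.linalg.pinv(A) @ (A @ z)   # projects z onto the null space of A
x_other = x_pinv + n

print(np.allclose(x_pinv, x_sr), np.allclose(x_pinv, x_minsr))
print(np.linalg.norm(x_pinv), np.linalg.norm(x_other))
```

The explicit `rcond` passed for the N_p × N_p matrix is needed because that matrix is rank deficient by construction, whereas the N_s × N_s matrix here has full rank.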
MinSR solution
Numerical solution
In this section, we focus on how to solve the MinSR equation numerically:
\[\delta \theta ={\overline{O}}^{{\dagger} }{T}^{-1}\overline{\epsilon }.\tag{20}\]
The whole computation, starting from \(T=\overline{O}\,{\overline{O}}^{{\dagger} }\), should be executed under double-precision arithmetic to ensure that small eigenvalues are reliable.
Then a suitable pseudoinverse should be applied to obtain a stable solution. In practice, the Hermitian matrix T is first diagonalized as T = UDU^{†}, and the pseudoinverse is given by
\[{T}^{-1}=U{D}^{+}{U}^{{\dagger} },\tag{21}\]
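where D^{+} is the pseudoinverse of the diagonal matrix D, numerically given by a cutoff below which the eigenvalues are regarded as 0, that is
\[{\lambda }_{i}^{+}=\begin{cases}1/{\lambda }_{i}, & {\lambda }_{i}/{\lambda }_{\max } > {r}_{{{{\rm{pinv}}}}}\ {{{\rm{and}}}}\ {\lambda }_{i} > {a}_{{{{\rm{pinv}}}}},\\ 0, & {{{\rm{otherwise}}}},\end{cases}\tag{22}\]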
where λ_{i} and \({\lambda }_{i}^{+}\) are the diagonal elements of D and \({D}^{+}\), \({\lambda }_{\max }\) is the largest value among λ_{i}, and r_{pinv} and a_{pinv} are the relative and absolute pseudoinverse cutoffs. In most cases, we choose r_{pinv} = 10^{−12} and a_{pinv} = 0. Furthermore, we modify the aforementioned direct cutoff to a soft one^{52}:
to avoid abrupt changes when the eigenvalues cross the cutoff during optimization.
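A minimal sketch of such a truncated pseudoinverse is given below. The hard cutoff follows the description above (eigenvalues below the relative or absolute threshold are set to zero). The smooth variant is only an assumed illustrative form, a suppression factor 1/[1 + (λ_cut/λ)^6] with λ_cut = max(r_pinv λ_max, a_pinv), and should not be read as the exact expression used in this work; the function name and keyword arguments are likewise hypothetical.

```python
import numpy as np

def pinv_hermitian(T, r_pinv=1e-12, a_pinv=0.0, soft=False):
    """Pseudoinverse of a Hermitian positive semi-definite matrix T = U D U^dagger."""
    lam, U = np.linalg.eigh(T)            # eigenvalues in ascending order
    cut = max(r_pinv * lam.max(), a_pinv)
    lam_inv = np.zeros_like(lam)
    if soft:
        # ASSUMPTION: one possible smooth cutoff; small eigenvalues are damped
        # continuously instead of being dropped abruptly.
        pos = lam > 0
        lam_inv[pos] = 1.0 / (lam[pos] * (1.0 + (cut / lam[pos]) ** 6))
    else:
        # Hard cutoff: eigenvalues at or below the threshold are treated as 0.
        keep = lam > cut
        lam_inv[keep] = 1.0 / lam[keep]
    return (U * lam_inv) @ U.conj().T     # U diag(lam_inv) U^dagger

# Well-conditioned example: both variants reduce to the ordinary inverse.
rng = np.random.default_rng(3)
A = rng.normal(size=(6, 20))
T = A @ A.T
print(np.allclose(pinv_hermitian(T), np.linalg.inv(T)))
print(np.allclose(pinv_hermitian(T, soft=True), np.linalg.inv(T)))
```

Running the eigendecomposition in double precision, as recommended above, matters here because the cutoff compares eigenvalues that can differ by many orders of magnitude.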
Complex neural networks
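Our original MinSR formula equation (3) can be applied when the network is real or complex holomorphic. In our ResNet2 architecture, however, the neural network parameters are real but the network outputs can be complex, in which case equation (3) cannot be directly applied. For other nonholomorphic networks, a complex parameter can be taken as two independent real parameters but this problem still occurs. To obtain the MinSR equation in these special cases, notice that the quantum distance d between \(\left\vert {\varPsi }_{\theta +\delta \theta }\right\rangle\) and \(\operatorname{e}^{-{{{\mathcal{H}}}}\delta \tau }\left\vert {\varPsi }_{\theta }\right\rangle\) can be reformulated as
\[{d}^{\;2}=\Vert \overline{O}\delta \theta -\overline{\epsilon }{\Vert }^{2}=\Vert {{{\rm{Re}}}}\,\overline{O}\,\delta \theta -{{{\rm{Re}}}}\,\overline{\epsilon }{\Vert }^{2}+\Vert {{{\rm{Im}}}}\,\overline{O}\,\delta \theta -{{{\rm{Im}}}}\,\overline{\epsilon }{\Vert }^{2},\tag{24}\]
assuming \(\overline{O}\) and \(\overline{\epsilon }\) are complex while δθ is real. By defining
\[{\overline{O}}^{\;{\prime} }=\begin{pmatrix}{{{\rm{Re}}}}\,\overline{O}\\ {{{\rm{Im}}}}\,\overline{O}\end{pmatrix},\qquad {\overline{\epsilon }}^{\;{\prime} }=\begin{pmatrix}{{{\rm{Re}}}}\,\overline{\epsilon }\\ {{{\rm{Im}}}}\,\overline{\epsilon }\end{pmatrix},\tag{25}\]
one can rewrite the quantum distance again as \({d}^{\;2}=\Vert {\overline{O}}^{\;{\prime} }\delta \theta -{\overline{\epsilon }}^{\;{\prime} }{\Vert }^{2}\) with all entities real. The MinSR solution, in this case, is similarly given by
\[\delta \theta ={\overline{O}}^{\;{\prime} {\dagger} }{T}^{\;{\prime} -1}{\overline{\epsilon }}^{\;{\prime} }\quad {{{\rm{with}}}}\quad {T}^{\;{\prime} }={\overline{O}}^{\;{\prime} }{\overline{O}}^{\;{\prime} {\dagger} }.\tag{26}\]
Similar arguments can also provide the SR equation in the nonholomorphic case as
\[\delta \theta ={({{{\rm{Re}}}}\,S)}^{-1}\,{{{\rm{Re}}}}\,F,\tag{27}\]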
where \(S={\overline{O}}^{{\dagger} }\overline{O}\) and \(F={\overline{O}}^{{\dagger} }\overline{\epsilon }\) are the same as for the ordinary SR solution. This solution agrees with the widely used nonholomorphic SR solution^{53}.
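The real-parameter case can be sketched as follows, assuming complex \(\overline{O}\) and \(\overline{\epsilon }\) and a real update δθ. The function names, the random stand-in data and the cutoff `rcond` are illustrative assumptions. The real and imaginary parts are stacked into real arrays with 2N_s rows, and the resulting MinSR update is compared with the SR form built from the real parts of S and F.

```python
import numpy as np

def minsr_step_real(O_bar, eps_bar, rcond=1e-10):
    """MinSR update for real parameters with complex O_bar (N_s x N_p)."""
    O_stack = np.concatenate([O_bar.real, O_bar.imag], axis=0)   # (2 N_s, N_p)
    eps_stack = np.concatenate([eps_bar.real, eps_bar.imag])     # (2 N_s,)
    T = O_stack @ O_stack.T                                      # (2 N_s, 2 N_s)
    return O_stack.T @ (np.linalg.pinv(T, rcond=rcond, hermitian=True) @ eps_stack)

def sr_step_real(O_bar, eps_bar, rcond=1e-10):
    """SR update for real parameters: pseudoinverse of Re S applied to Re F."""
    S = O_bar.conj().T @ O_bar
    F = O_bar.conj().T @ eps_bar
    return np.linalg.pinv(S.real, rcond=rcond, hermitian=True) @ F.real

rng = np.random.default_rng(2)
n_s, n_p = 6, 40
O_bar = rng.normal(size=(n_s, n_p)) + 1j * rng.normal(size=(n_s, n_p))
eps_bar = rng.normal(size=n_s) + 1j * rng.normal(size=n_s)
dtheta_minsr = minsr_step_real(O_bar, eps_bar)
dtheta_sr = sr_step_real(O_bar, eps_bar)
print(np.allclose(dtheta_minsr, dtheta_sr), dtheta_minsr.dtype)
```

The price of handling complex outputs this way is that the compressed matrix doubles in size to 2N_s × 2N_s, which remains small compared with N_p × N_p in the regime of interest.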
Neural quantum states
In this work, we adopt two different designs of ResNets. Several techniques are also applied to reduce the error.
ResNet1
The first architecture, as suggested in ref. ^{54}, has two convolutional layers in each residual block, each given by a layer normalization, a ReLU activation function and a convolutional layer sequentially. All the convolutional layers are real-valued with the same number of channels and kernel size. After the forward pass through all residual blocks, a final activation function \(f(x)=\cosh x\,(x > 0),\,2-\cosh x\,(x < 0)\) is applied, which resembles the \(\cosh (x)\) activation in RBM but can also give negative outputs so that the whole network is able to express sign structures while still being real-valued. In the nonfrustrated case, ∣f(x)∣ is used as the final activation function to make all outputs positive. After the final activation function, the outputs v_{i} are used to compute the wavefunction as \({\psi }_{\sigma }^{{{{\rm{net}}}}}={\prod }_{i}({v}_{i}/t)\), where t is a rescaling factor that is updated in every training step and prevents a data overflow after the product.
ResNet2
The second design of ResNet basically follows ref. ^{26}. In this architecture, the residual blocks are the same as ResNet1 but the normalization layers are removed. In the last layer, two different kinds of activations can be applied. For real-valued wavefunctions, we chose \(f(x)=\sinh (x)+1\). For complex-valued wavefunctions, we split all channels in the last layer into two groups and employ \(f({x}_{1},{x}_{2})=\exp ({x}_{1}+\mathrm{i}{x}_{2})\). A rescaling factor t is also inserted in suitable places in f to prevent an overflow.
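Finally, a sum is performed to obtain the wavefunction. Considering the possible nonzero momentum q, the wavefunction is given by
\[{\psi }_{\sigma }^{{{{\rm{net}}}}}=\sum_{c}\sum_{i}{\operatorname{e}}^{-\mathrm{i}{{{\bf{q}}}}\cdot {{{{\bf{r}}}}}_{i}}{v}_{c,i},\tag{28}\]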
where v_{c,i} is the last-layer neuron at channel c and site i, and r_{i} is the real-space position of site i. This definition ensures that the whole NQS has a momentum q.
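The statement that the phase-weighted sum fixes the momentum can be illustrated with a small sketch. It assumes a phase convention exp(−i q·r_i), an array layout (channels, L_x, L_y) for the last-layer outputs, and that translating the input spin configuration shifts these outputs by the same lattice vector, as is the case for a translation-equivariant CNN; the function name and the random data are hypothetical.

```python
import numpy as np

def momentum_sum(v, q):
    """Sum last-layer outputs v[c, x, y] with phases exp(-1j * q . r)."""
    _, lx, ly = v.shape
    x, y = np.meshgrid(np.arange(lx), np.arange(ly), indexing="ij")
    phases = np.exp(-1j * (q[0] * x + q[1] * y))
    return np.sum(v * phases)   # phases broadcast over the channel axis

rng = np.random.default_rng(4)
L = 4
v = rng.normal(size=(3, L, L))          # stand-in for network outputs
q = np.array([np.pi, np.pi])            # M point of the square lattice

psi = momentum_sum(v, q)
# Shifting all outputs by one site along x multiplies psi by exp(-1j * q_x).
psi_shifted = momentum_sum(np.roll(v, 1, axis=1), q)
print(np.allclose(psi_shifted, np.exp(-1j * q[0]) * psi))
```

The translation therefore only changes the amplitude by a global phase set by q, which is what is meant by the state carrying momentum q. The momentum must be commensurate with the lattice, that is, q_x L_x and q_y L_y must be multiples of 2π.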
In summary, ResNet1 performs better when one applies transfer learning from a small lattice to a larger one, but ResNet2, in general, has better accuracy and stability. Moreover, ResNet2 allows one to implement nonzero momentum, which is key to finding low-lying excited states.
Sign structure
On top of the raw output from the neural network \({\psi }_{\sigma }^{{{{\rm{net}}}}}\), the MSR^{55} is applied to wavefunctions on a square lattice, which serves as the exact sign structure for the nonfrustrated Heisenberg model but is still the approximate sign structure in the frustrated region around J_{2}/J_{1} ≈ 0.5. The sign structure representing the 120° magnetic order is also applied for the triangular lattice. Although these sign structures are additional physical inputs for specific models, the generality is not reduced because it has been shown that simple sign structures such as MSR can be exactly solved by an additional sign network^{56,57}.
Symmetry
Symmetry plays an important role in improving the accuracy and finding low-lying excited states for NQS^{30,58}. In this work, we apply symmetry on top of the well-trained \({\psi }_{\sigma }^{{{{\rm{net}}}}}\) to project variational states onto suitable symmetry sectors. Assuming the system permits a symmetry group of order ∣G∣ represented by operators T_{i} with characters ω_{i}, the symmetrized wavefunction is then defined as^{30,59}
With translation symmetry already enforced by the CNN architecture, the remaining symmetries applied by equation (29) are the point group symmetry, which is C_{4v} for the square lattice and D_{6} for the triangular lattice, and the spin inversion symmetry σ → −σ (refs. ^{60,61,62,63,64}).
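Zero-variance extrapolation
The variational wavefunction provides an inexact estimate of the ground-state energy due to the variational error. Fortunately, in VMC one can compute the energy variance
\[{\sigma }^{\;2}=\left\langle {{{{\mathcal{H}}}}}^{2}\right\rangle -{\left\langle {{{\mathcal{H}}}}\right\rangle }^{2}\tag{30}\]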
as an estimate of the variational error. Hence, an extrapolation to zero energy variance gives a better estimate of the ground-state energy^{65,66}, which has been successfully applied to NQS in refs. ^{30,37}. In the following, we adopt the derivation in ref. ^{66} to show how to perform the extrapolation.
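Assuming the normalized variational state \(\left\vert \psi \right\rangle\) deviates only slightly from the exact ground state \(\left\vert {\psi }_\mathrm{g}\right\rangle\), one can express it as
\[\left\vert \psi \right\rangle =\sqrt{1-{\lambda }^{2}}\left\vert {\psi }_\mathrm{g}\right\rangle +\lambda \left\vert {\psi }_\mathrm{e}\right\rangle ,\tag{31}\]
where \(\left\vert {\psi }_\mathrm{e}\right\rangle\) represents the error in the variational state orthogonal to the ground state and λ is a small positive number indicating the error strength. Denoting \({E}_\mathrm{g}=\left\langle {\psi }_\mathrm{g}\vert {{{\mathcal{H}}}}\vert {\psi }_\mathrm{g}\right\rangle\), \({E}_\mathrm{e}=\left\langle {\psi }_\mathrm{e}\vert {{{\mathcal{H}}}}\vert {\psi }_\mathrm{e}\right\rangle\) and \({\left\langle {{{{\mathcal{H}}}}}^{2}\right\rangle }_\mathrm{e}=\left\langle {\psi }_\mathrm{e}\vert {{{{\mathcal{H}}}}}^{2}\vert {\psi }_\mathrm{e}\right\rangle\), one can express the variational energy as
\[E=\left\langle \psi \vert {{{\mathcal{H}}}}\vert \psi \right\rangle ={E}_\mathrm{g}+{\lambda }^{2}({E}_\mathrm{e}-{E}_\mathrm{g}).\tag{32}\]
Similarly, the energy variance can be written as
\[{\sigma }^{\;2}=\left\langle \psi \vert {{{{\mathcal{H}}}}}^{2}\vert \psi \right\rangle -{E}^{2}={\lambda }^{2}\left({\left\langle {{{{\mathcal{H}}}}}^{2}\right\rangle }_\mathrm{e}-2{E}_\mathrm{g}{E}_\mathrm{e}+{E}_\mathrm{g}^{2}\right)+{{{\mathcal{O}}}}({\lambda }^{4}).\tag{33}\]
If the error state \(\left\vert {\psi }_\mathrm{e}\right\rangle\) does not change substantially in different training attempts, there is a linear relation
\[E\approx {E}_\mathrm{g}+\frac{{E}_\mathrm{e}-{E}_\mathrm{g}}{{\left\langle {{{{\mathcal{H}}}}}^{2}\right\rangle }_\mathrm{e}-2{E}_\mathrm{g}{E}_\mathrm{e}+{E}_\mathrm{g}^{2}}\,{\sigma }^{\;2}\tag{34}\]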
for small λ, so a linear extrapolation to σ^{2} = 0 gives E = E_{g}.
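For reference, the chain of steps can be written out explicitly. The block below is a reconstruction from the definitions of λ, E_{g}, E_{e} and \({\left\langle {{\mathcal{H}}}^{2}\right\rangle }_\mathrm{e}\) given in the text, assuming normalized and mutually orthogonal \(\left\vert {\psi }_\mathrm{g}\right\rangle\) and \(\left\vert {\psi }_\mathrm{e}\right\rangle\); the grouping of terms may differ from the published equations.

```latex
\begin{align*}
\vert\psi\rangle &= \sqrt{1-\lambda^{2}}\,\vert\psi_\mathrm{g}\rangle
                  + \lambda\,\vert\psi_\mathrm{e}\rangle, \\
E &= \langle\psi\vert\mathcal{H}\vert\psi\rangle
   = E_\mathrm{g} + \lambda^{2}\,(E_\mathrm{e}-E_\mathrm{g}), \\
\sigma^{2} &= \langle\psi\vert\mathcal{H}^{2}\vert\psi\rangle - E^{2}
   = \lambda^{2}\left(\langle\mathcal{H}^{2}\rangle_\mathrm{e}
     - 2E_\mathrm{g}E_\mathrm{e} + E_\mathrm{g}^{2}\right)
     + O(\lambda^{4}), \\
E &\approx E_\mathrm{g}
   + \frac{E_\mathrm{e}-E_\mathrm{g}}
          {\langle\mathcal{H}^{2}\rangle_\mathrm{e}
           - 2E_\mathrm{g}E_\mathrm{e} + E_\mathrm{g}^{2}}\;\sigma^{2}.
\end{align*}
```

The cross terms vanish because \(\left\vert {\psi }_\mathrm{g}\right\rangle\) is an eigenstate of \({\mathcal{H}}\) and is orthogonal to \(\left\vert {\psi }_\mathrm{e}\right\rangle\). The prefactor of σ^{2} in the last line depends only on the error state, which is why it stays fixed when \(\left\vert {\psi }_\mathrm{e}\right\rangle\) does not change between training attempts.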
As shown in Extended Data Fig. 3, the ratio (E − E_{g})/σ^{2} also remains nearly unchanged for different lattice sizes and symmetry sectors. We use this empirical observation to estimate the ratio on the largest lattices from that on smaller ones, which reduces both the extrapolation error and the computational cost.
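The two-stage procedure described above can be sketched as follows: fit the slope on a lattice where several data points are available, then impose it on a larger lattice. The function names and all numbers are hypothetical, noise-free placeholders rather than data from this work.

```python
import numpy as np

def fit_slope(variances, energies):
    """Least-squares fit E = E_g + slope * sigma^2; returns (slope, E_g)."""
    slope, intercept = np.polyfit(variances, energies, 1)
    return slope, intercept

def extrapolate_fixed_slope(variances, energies, slope):
    """Zero-variance energy when the slope is imposed rather than fitted."""
    return float(np.mean(np.asarray(energies) - slope * np.asarray(variances)))

# Synthetic small-lattice data: three (variance, energy) points on one line.
var_small = np.array([4e-3, 2e-3, 1e-3])
E_small = -0.500 + 0.17 * var_small
slope, E_g_small = fit_slope(var_small, E_small)

# Synthetic large-lattice data: too few points for a reliable fit,
# so the slope obtained above is reused.
var_large = np.array([3e-3, 1.5e-3])
E_large = -0.497 + 0.17 * var_large
E_g_large = extrapolate_fixed_slope(var_large, E_large, slope)
```

With real data each point carries a statistical error, so a weighted fit and an average of the slopes over several smaller lattices would be the natural refinements.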
Lanczos step
The Lanczos step is a popular method in VMC for improving the variational accuracy^{67}. It is also used in NQS^{26,38}.
The key idea of a Lanczos step is to construct new states \(\left\vert {\psi }_\mathrm{p}\right\rangle\) orthogonal to the well-trained variational wavefunction \(\left\vert {\psi }_{0}\right\rangle\) and to minimize the energy of the new state formed by a linear combination of \(\left\vert {\psi }_{0}\right\rangle\) and \(\left\vert {\psi }_\mathrm{p}\right\rangle\). The new energy is then guaranteed to be lower than the initial energy.
Only one Lanczos step is applied in this work, so we have one state \(\left\vert {\psi }_{1}\right\rangle\) satisfying \(\left\langle {\psi }_{0}\vert {\psi }_{1}\right\rangle =0\) given by
where \({E}_{0}=\left\langle {\psi }_{0}\right\vert {\mathcal{H}}\left\vert {\psi }_{0}\right\rangle\) and \({\sigma }^{2}=\left\langle {\psi }_{0}\right\vert {{\mathcal{H}}}^{2}\left\vert {\psi }_{0}\right\rangle -{E}_{0}^{2}\). The linear combination of \(\left\vert {\psi }_{0}\right\rangle\) and \(\left\vert {\psi }_{1}\right\rangle\) can be written as
whose energy is
where
The minimal energy is achieved at
and the lowest energy is
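The Lanczos step can be checked numerically on a small matrix. The sketch below is an illustration, not the authors' implementation. It assumes \(\left\vert {\psi }_{1}\right\rangle =({\mathcal{H}}-{E}_{0})\left\vert {\psi }_{0}\right\rangle /\sigma\), \(\left\vert {\psi }_{\alpha }\right\rangle =(\left\vert {\psi }_{0}\right\rangle +\alpha \left\vert {\psi }_{1}\right\rangle )/\sqrt{1+{\alpha }^{2}}\) and \({\mu }_{n}=\left\langle {\psi }_{0}\right\vert {({\mathcal{H}}-{E}_{0})}^{n}\left\vert {\psi }_{0}\right\rangle /{\sigma }^{n}\), which are reconstructed from the definitions of E_{0} and σ^{2} above, and it replaces Monte Carlo estimates by exact linear algebra on a random symmetric matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
a = rng.normal(size=(dim, dim))
H = (a + a.T) / 2                      # random real symmetric "Hamiltonian"
psi0 = rng.normal(size=dim)
psi0 /= np.linalg.norm(psi0)           # normalized trial state

E0 = psi0 @ H @ psi0
sigma = np.sqrt(psi0 @ H @ H @ psi0 - E0**2)
psi1 = (H @ psi0 - E0 * psi0) / sigma  # normalized and orthogonal to psi0

# mu_3 = <psi0|(H - E0)^3|psi0> / sigma^3, written via psi1
mu3 = psi1 @ (H - E0 * np.eye(dim)) @ psi1 / sigma

def energy(alpha):
    """Energy of (psi0 + alpha * psi1) / sqrt(1 + alpha^2)."""
    psi = (psi0 + alpha * psi1) / np.sqrt(1 + alpha**2)
    return psi @ H @ psi

# Under the assumptions above, E_alpha = E0 + sigma*(2a + mu3*a^2)/(1 + a^2),
# which is minimized at the negative root of a^2 - mu3*a - 1 = 0.
alpha_star = (mu3 - np.sqrt(mu3**2 + 4)) / 2
E_star = E0 + sigma * alpha_star
```

Under these assumptions the minimum over α coincides with the lower eigenvalue of the 2 × 2 Hamiltonian restricted to the span of \(\left\vert {\psi }_{0}\right\rangle\) and \(\left\vert {\psi }_{1}\right\rangle\). Because α_{*} is always negative, the improved energy lies strictly below E_{0} whenever σ > 0.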
Initial guess of α
A direct way to compute μ_{n} is by measuring suitable quantities as expectation values of the initial state \(\left\vert {\psi }_{0}\right\rangle\). However, the measurement becomes more accurate if it is performed with a state \(\left\vert {\psi }_{{\alpha }_{0}}\right\rangle\) closer to the ground state^{67}.
In this paper, we estimate the suitable α_{0} to obtain a \(\left\vert {\psi }_{{\alpha }_{0}}\right\rangle\) closer to the true ground state compared to \(\left\vert {\psi }_{0}\right\rangle\). Then, from equation (37), one can compute μ_{3} as
where \({E}_{{\alpha }_{0}}\) can be measured by Monte Carlo sampling. The optimal α_{*} can be derived from μ_{3} by equation (39), and the lowest energy is then given by equation (40).
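A minimal sketch of this two-stage estimate is given below. It assumes the energy of the combined state has the form \({E}_{\alpha }={E}_{0}+\sigma (2\alpha +{\mu }_{3}{\alpha }^{2})/(1+{\alpha }^{2})\), which follows from \(\left\vert {\psi }_{\alpha }\right\rangle =(\left\vert {\psi }_{0}\right\rangle +\alpha \left\vert {\psi }_{1}\right\rangle )/\sqrt{1+{\alpha }^{2}}\) but is a reconstruction rather than a quotation of equation (37). The function names and numbers are hypothetical, and the measured \({E}_{{\alpha }_{0}}\) is generated synthetically instead of by Monte Carlo sampling.

```python
import numpy as np

def lanczos_energy(alpha, E0, sigma, mu3):
    """Assumed form of E_alpha for the state (psi0 + alpha*psi1)/sqrt(1+alpha^2)."""
    return E0 + sigma * (2 * alpha + mu3 * alpha**2) / (1 + alpha**2)

def mu3_from_energy(E_alpha0, alpha0, E0, sigma):
    """Invert the assumed E_alpha relation for mu_3 at the guess alpha0."""
    return ((E_alpha0 - E0) * (1 + alpha0**2) / sigma - 2 * alpha0) / alpha0**2

def optimal_alpha(mu3):
    """Negative root of alpha^2 - mu3*alpha - 1 = 0, the energy minimum."""
    return (mu3 - np.sqrt(mu3**2 + 4)) / 2

# Synthetic inputs (not data from this work).
E0, sigma, mu3_true = -0.4950, 0.020, 1.5
alpha0 = -0.4                                            # initial guess
E_alpha0 = lanczos_energy(alpha0, E0, sigma, mu3_true)   # stands in for a measurement

mu3 = mu3_from_energy(E_alpha0, alpha0, E0, sigma)
alpha_star = optimal_alpha(mu3)
E_star = E0 + sigma * alpha_star
```

In this toy example the guess α_{0} = −0.4 already lowers the energy, and the recovered μ_{3} = 1.5 gives α_{*} = −0.5, so the final energy E_{0} + σα_{*} lies below both E_{0} and \({E}_{{\alpha }_{0}}\).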
Energy variance
To compute the energy variance of \(\left\vert {\psi }_{\alpha }\right\rangle\), we start with an intermediate quantity
Like the previous case, one can measure \({v}_{{\alpha }_{0}}\) by Monte Carlo sampling and determine μ_{4} as
Then \({v}_{{\alpha }_{* }}\) can be computed given μ_{3} and μ_{4}, which gives the required energy variance as
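The variance calculation can be checked in the same spirit. The sketch assumes that the intermediate quantity is \({v}_{\alpha }=\left\langle {\psi }_{\alpha }\right\vert {({\mathcal{H}}-{E}_{0})}^{2}\left\vert {\psi }_{\alpha }\right\rangle /{\sigma }^{2}=(1+2\alpha {\mu }_{3}+{\alpha }^{2}{\mu }_{4})/(1+{\alpha }^{2})\), so that the variance of \(\left\vert {\psi }_{\alpha }\right\rangle\) is \({\sigma }^{2}{v}_{\alpha }-{({E}_{\alpha }-{E}_{0})}^{2}\). This definition is an assumption consistent with the surrounding text, not a quotation of the original equations, and exact linear algebra on a random symmetric matrix stands in for Monte Carlo sampling.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 10
a = rng.normal(size=(dim, dim))
H = (a + a.T) / 2                       # random real symmetric "Hamiltonian"
psi0 = rng.normal(size=dim)
psi0 /= np.linalg.norm(psi0)

E0 = psi0 @ H @ psi0
D = H - E0 * np.eye(dim)                # shifted Hamiltonian H - E0
sigma = np.sqrt(psi0 @ D @ D @ psi0)
psi1 = D @ psi0 / sigma

def state(alpha):
    return (psi0 + alpha * psi1) / np.sqrt(1 + alpha**2)

def v(alpha):
    """Assumed intermediate quantity <psi_alpha|(H - E0)^2|psi_alpha> / sigma^2."""
    psi = state(alpha)
    return psi @ D @ D @ psi / sigma**2

mu3 = psi0 @ D @ D @ D @ psi0 / sigma**3   # exact here; estimated from E_alpha0 in practice
alpha0 = -0.3                              # hypothetical initial guess
mu4 = (v(alpha0) * (1 + alpha0**2) - 1 - 2 * alpha0 * mu3) / alpha0**2

alpha_star = (mu3 - np.sqrt(mu3**2 + 4)) / 2
v_star = (1 + 2 * alpha_star * mu3 + alpha_star**2 * mu4) / (1 + alpha_star**2)
var_star = sigma**2 * v_star - (sigma * alpha_star)**2
```

The value `var_star` agrees with the variance computed directly from \(\left\vert {\psi }_{{\alpha }_{* }}\right\rangle\), which confirms that only one measurement at α_{0}, together with μ_{3}, is needed to obtain the variance at α_{*} under the assumed definition.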
Data availability
This research does not rely on any external datasets. The data shown in Figs. 2 and 3 and the obtained neural network weights are available via Zenodo at https://zenodo.org/doi/10.5281/zenodo.7657551 (ref. ^{68}).
Code availability
We provide the code needed to reproduce our main results via Zenodo at https://zenodo.org/doi/10.5281/zenodo.7657551 (ref.^{68}) and via GitHub at https://github.com/ChenAoPhys/MinSR.
References
Lin, H., Gubernatis, J., Gould, H. & Tobochnik, J. Exact diagonalization methods for quantum systems. Comput. Phys. 7, 400 (1993).
Troyer, M. & Wiese, U.-J. Computational complexity and fundamental limitations to fermionic quantum Monte Carlo simulations. Phys. Rev. Lett. 94, 170201 (2005).
Ceperley, D. & Alder, B. Quantum Monte Carlo. Science 231, 555 (1986).
Schollwöck, U. The density-matrix renormalization group in the age of matrix product states. Ann. Phys. 326, 96 (2011).
Balents, L. Spin liquids in frustrated magnets. Nature 464, 199 (2010).
Jiang, H.-C., Yao, H. & Balents, L. Spin liquid ground state of the spin-\(\frac{1}{2}\) square J_{1}–J_{2} Heisenberg model. Phys. Rev. B 86, 024424 (2012).
Wang, L., Poilblanc, D., Gu, Z.-C., Wen, X.-G. & Verstraete, F. Constructing a gapless spin-liquid state for the spin-1/2 J_{1}–J_{2} Heisenberg model on a square lattice. Phys. Rev. Lett. 111, 037202 (2013).
Hu, W.-J., Becca, F., Parola, A. & Sorella, S. Direct evidence for a gapless Z_{2} spin liquid by frustrating Néel antiferromagnetism. Phys. Rev. B 88, 060402 (2013).
Gong, S.-S., Zhu, W., Sheng, D. N., Motrunich, O. I. & Fisher, M. P. A. Plaquette ordered phase and quantum phase diagram in the spin-\(\frac{1}{2}\) J_{1}–J_{2} square Heisenberg model. Phys. Rev. Lett. 113, 027201 (2014).
Wang, L. & Sandvik, A. W. Critical level crossings and gapless spin liquid in the square-lattice spin-1/2 J_{1}–J_{2} Heisenberg antiferromagnet. Phys. Rev. Lett. 121, 107202 (2018).
Liu, W.-Y. et al. Gapless spin liquid ground state of the spin-\(\frac{1}{2}\) J_{1}–J_{2} Heisenberg model on square lattices. Phys. Rev. B 98, 241109 (2018).
Ferrari, F. & Becca, F. Gapless spin liquid and valence-bond solid in the J_{1}–J_{2} Heisenberg model on the square lattice: insights from singlet and triplet excitations. Phys. Rev. B 102, 014417 (2020).
Kaneko, R., Morita, S. & Imada, M. Gapless spin-liquid phase in an extended spin 1/2 triangular Heisenberg model. J. Phys. Soc. Jpn 83, 093707 (2014).
Zhu, Z. & White, S. R. Spin liquid phase of the \(s=\frac{1}{2}\) J_{1}–J_{2} Heisenberg model on the triangular lattice. Phys. Rev. B 92, 041105 (2015).
Hu, W.-J., Gong, S.-S., Zhu, W. & Sheng, D. N. Competing spin-liquid states in the spin-\(\frac{1}{2}\) Heisenberg model on the triangular lattice. Phys. Rev. B 92, 140403 (2015).
Iqbal, Y., Hu, W.-J., Thomale, R., Poilblanc, D. & Becca, F. Spin liquid nature in the Heisenberg J_{1}–J_{2} triangular antiferromagnet. Phys. Rev. B 93, 144411 (2016).
Saadatmand, S. N. & McCulloch, I. P. Symmetry fractionalization in the topological phase of the spin-\(\frac{1}{2}\) J_{1}–J_{2} triangular Heisenberg model. Phys. Rev. B 94, 121111 (2016).
Wietek, A. & Läuchli, A. M. Chiral spin liquid and quantum criticality in extended \(s=\frac{1}{2}\) Heisenberg models on the triangular lattice. Phys. Rev. B 95, 035141 (2017).
Gong, S.-S., Zhu, W., Zhu, J.-X., Sheng, D. N. & Yang, K. Global phase diagram and quantum spin liquids in a spin-\(\frac{1}{2}\) triangular antiferromagnet. Phys. Rev. B 96, 075116 (2017).
Hu, S., Zhu, W., Eggert, S. & He, Y.-C. Dirac spin liquid on the spin-1/2 triangular Heisenberg antiferromagnet. Phys. Rev. Lett. 123, 207203 (2019).
Jiang, Y.-F. & Jiang, H.-C. Nature of quantum spin liquids of the \(s=\frac{1}{2}\) Heisenberg antiferromagnet on the triangular lattice: a parallel DMRG study. Phys. Rev. B 107, L140411 (2023).
Sherman, N. E., Dupont, M. & Moore, J. E. Spectral function of the J_{1}–J_{2} Heisenberg model on the triangular lattice. Phys. Rev. B 107, 165146 (2023).
Carleo, G. & Troyer, M. Solving the quantum many-body problem with artificial neural networks. Science 355, 602 (2017).
Nomura, Y. & Imada, M. Dirac-type nodal spin liquid revealed by refined quantum many-body solver using neural-network wave function, correlation ratio, and level spectroscopy. Phys. Rev. X 11, 031034 (2021).
Astrakhantsev, N. et al. Broken-symmetry ground states of the Heisenberg model on the pyrochlore lattice. Phys. Rev. X 11, 041021 (2021).
Roth, C., Szabó, A. & MacDonald, A. H. High-accuracy variational Monte Carlo for frustrated magnets with deep neural networks. Phys. Rev. B 108, 054410 (2023).
Bukov, M., Schmitt, M. & Dupont, M. Learning the ground state of a non-stoquastic quantum Hamiltonian in a rugged neural network landscape. SciPost Phys. 10, 147 (2021).
Sorella, S. Green function Monte Carlo with stochastic reconfiguration. Phys. Rev. Lett. 80, 4558 (1998).
Stokes, J., Izaac, J., Killoran, N. & Carleo, G. Quantum natural gradient. Quantum 4, 269 (2020).
Nomura, Y. Helping restricted Boltzmann machines with quantum-state representation by restoring symmetry. J. Phys.: Condens. Matter 33, 174003 (2021).
Choo, K., Neupert, T. & Carleo, G. Two-dimensional frustrated J_{1}–J_{2} model studied with neural network quantum states. Phys. Rev. B 100, 125124 (2019).
Sharir, O., Levine, Y., Wies, N., Carleo, G. & Shashua, A. Deep autoregressive models for the efficient variational simulation of many-body quantum systems. Phys. Rev. Lett. 124, 020503 (2020).
Yang, L. et al. Deep learning-enhanced variational Monte Carlo method for quantum many-body physics. Phys. Rev. Res. 2, 012039(R) (2020).
Hibat-Allah, M., Ganahl, M., Hayward, L. E., Melko, R. G. & Carrasquilla, J. Recurrent neural network wave functions. Phys. Rev. Res. 2, 023358 (2020).
Inui, K., Kato, Y. & Motome, Y. Determinant-free fermionic wave function using feed-forward neural networks. Phys. Rev. Res. 3, 043126 (2021).
Zhang, W., Xu, X., Wu, Z., Balachandran, V. & Poletti, D. Ground state search by local and sequential updates of neural network quantum states. Phys. Rev. B 107, 165149 (2023).
Liang, X. et al. Deep learning representations for quantum many-body systems on heterogeneous hardware. Mach. Learn.: Sci. Technol. 4, 015035 (2023).
Chen, H., Hendry, D. G., Weinberg, P. E. & Feiguin, A. Systematic improvement of neural network quantum states using Lanczos. In Proc. Advances in Neural Information Processing Systems Vol. 35 (eds Oh, A. H. et al.) 7490–7503 (Curran Associates, 2022).
Mazzola, G., Zen, A. & Sorella, S. Finite-temperature electronic simulations without the Born–Oppenheimer constraint. J. Chem. Phys. https://doi.org/10.1063/1.4755992 (2012).
Park, C.-Y. & Kastoryano, M. J. Geometry of learning neural quantum states. Phys. Rev. Res. 2, 023232 (2020).
Jacot, A., Gabriel, F. & Hongler, C. Neural tangent kernel: convergence and generalization in neural networks. In Proc. Advances in Neural Information Processing Systems Vol. 31 (eds Bengio, S. et al.) 8571–8580 (Curran Associates, 2018).
Liang, X. et al. Solving frustrated quantum many-particle models with convolutional neural networks. Phys. Rev. B 98, 104426 (2018).
Sandvik, A. W. Stochastic series expansion method with operator-loop update. Phys. Rev. B 59, R14157 (1999).
Sandvik, A. W. Finite-size scaling of the ground-state parameters of the two-dimensional Heisenberg model. Phys. Rev. B 56, 11678 (1997).
Westerhout, T., Astrakhantsev, N., Tikhonov, K. S., Katsnelson, M. I. & Bagrov, A. A. Generalization properties of neural network approximations to frustrated magnet ground states. Nat. Commun. 11, 1593 (2020).
Luo, D. & Clark, B. K. Backflow transformations via neural networks for quantum many-body wave functions. Phys. Rev. Lett. 122, 226401 (2019).
Moreno, J. R., Carleo, G., Georges, A. & Stokes, J. Fermionic wave functions from neural-network constrained hidden states. Proc. Natl Acad. Sci. USA 119, e2122059119 (2022).
Hermann, J. et al. Ab initio quantum chemistry with neural-network wavefunctions. Nat. Rev. Chem. 7, 692 (2023).
Kakade, S. M. A natural policy gradient. In Proc. Advances in Neural Information Processing Systems Vol. 14 (eds Dietterich, T. et al.) 1531–1538 (MIT Press, 2001).
Chen, Y., Xie, H. & Wang, H. Efficient numerical algorithm for large-scale damped natural gradient descent. Preprint at https://arxiv.org/abs/2310.17556 (2023).
He, L. et al. Peps++: towards extreme-scale simulations of strongly correlated quantum many-particle models on Sunway Taihulight. IEEE Trans. Parallel Distrib. Syst. 29, 2838 (2018).
Schmitt, M. & Reh, M. jvmc: Versatile and performant variational Monte Carlo leveraging automated differentiation and GPU acceleration. SciPost Physics Codebase https://www.scipost.org/SciPostPhysCodeb.2 (2021).
Vicentini, F. et al. Netket 3: machine learning toolbox for many-body quantum systems. SciPost Physics Codebase https://www.scipost.org/10.21468/SciPostPhysCodeb.7 (2021).
He, K., Zhang, X., Ren, S. & Sun, J. Identity mappings in deep residual networks. In Proc. Computer Vision – ECCV 2016 (eds Leibe, B. et al.) 630–645 (Springer, 2016).
Marshall, W. & Peierls, R. E. Antiferromagnetism. Proc. R. Soc. Lond. Ser. A 232, 48 (1955).
Szabó, A. & Castelnovo, C. Neural network wave functions and the sign problem. Phys. Rev. Res. 2, 033075 (2020).
Chen, A., Choo, K., Astrakhantsev, N. & Neupert, T. Neural network evolution strategy for solving quantum sign structures. Phys. Rev. Res. 4, L022026 (2022).
Choo, K., Carleo, G., Regnault, N. & Neupert, T. Symmetries and manybody excitations with neuralnetwork quantum states. Phys. Rev. Lett. 121, 167204 (2018).
Reh, M., Schmitt, M. & Gärttner, M. Optimizing design choices for neural quantum states. Phys. Rev. B 107, 195115 (2023).
Westerhout, T. lattice-symmetries: A package for working with quantum many-body bases. J. Open Source Softw. 6, 3537 (2021).
Liang, X., Dong, S.-J. & He, L. Hybrid convolutional neural network and projected entangled pair states wave functions for quantum many-particle states. Phys. Rev. B 103, 035138 (2021).
Ferrari, F., Becca, F. & Carrasquilla, J. Neural Gutzwiller-projected variational wave functions. Phys. Rev. B 100, 125131 (2019).
Wu, D. et al. Variational benchmarks for quantum many-body problems. Preprint at https://arxiv.org/abs/2302.04919 (2023).
Rende, R., Viteritti, L. L., Bardone, L., Becca, F. & Goldt, S. A simple linear algebra identity to optimize large-scale neural network quantum states. Preprint at https://arxiv.org/abs/2310.05715 (2023).
Kwon, Y., Ceperley, D. M. & Martin, R. M. Effects of backflow correlation in the three-dimensional electron gas: quantum Monte Carlo study. Phys. Rev. B 58, 6800 (1998).
Kashima, T. & Imada, M. Path-integral renormalization group method for numerical study on ground states of strongly correlated electronic systems. J. Phys. Soc. Jpn 70, 2287 (2001).
Sorella, S. Generalized Lanczos algorithm for variational quantum Monte Carlo. Phys. Rev. B 64, 024512 (2001).
Chen, A. & Heyl, M. Empowering deep neural quantum states through efficient optimization: data and code. Zenodo https://zenodo.org/doi/10.5281/zenodo.7657551 (2023).
Acknowledgements
We gratefully acknowledge M. Schmitt for help in improving the manuscript. We also thank T. Neupert, C. Roth, M. Bukov, F. Vicentini, W.-Y. Liu and X. Liang for fruitful discussions. This project has received funding from the European Research Council under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 853443). This work was supported by the German Research Foundation (Project No. 492547816; TRR 360). We also acknowledge the Gauss Centre for Supercomputing e.V. (www.gauss-centre.eu) for funding this project by providing computing time through the John von Neumann Institute for Computing on the GCS Supercomputer JUWELS at Jülich Supercomputing Centre. We gratefully acknowledge the scientific support and high-performance computing resources provided by the Erlangen National High Performance Computing Center (NHR) of the Friedrich-Alexander-Universität Erlangen-Nürnberg (NHR Project No. nqsQuMat). NHR funding is provided by federal and Bavarian state authorities. NHR@FAU hardware is partially funded by the German Research Foundation (Grant No. 440719683).
Funding
Open access funding provided by Universität Augsburg.
Author information
Contributions
A.C. proposed the method and performed numerical simulations. M.H. provided computing resources and supervised the simulations. Both authors took part in the design of numerical experiments, the writing of the manuscript and the critical review of the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Physics thanks Guglielmo Mazzola and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Performance evaluation of various optimization methods on the 10 × 10 square Heisenberg J_{1}–J_{2} model.
a, Time cost of solving the equation \(\overline{O}\delta \theta =\overline{\epsilon }\) for the different optimization methods for different numbers of Monte Carlo samples N_{s} and variational parameters N_{p}, measured at J_{2} = 0 on the ResNet1 architecture and an A100 80G GPU. The time cost of other contributions in a training step on 4 parallel A100 GPUs is presented as the black star, showing that the time cost of SR becomes the bottleneck for training deep networks if MinSR is not employed. b, Relative residual error \(\Vert \overline{O}\delta \theta -\overline{\epsilon }\Vert /\Vert \overline{\epsilon }\Vert\) for N_{s} = 10^{4} at J_{2} = 0 on ResNet1. c, Training curves of ResNet2 at J_{2}/J_{1} = 0.5 with N_{s} = 10^{4} and N_{p} ≈ 10^{6} comparing different optimizers.
Extended Data Fig. 2 NQS wave function amplitudes for the Heisenberg and J_{1}–J_{2} models on a 6 × 6 square lattice obtained by ResNet1 with 64 layers and 146320 parameters by means of MinSR.
The ED wave function amplitudes obtained by the lattice-symmetries package [64] are shown as the black dotted lines, and the spin configurations are sorted in descending order of ED amplitude. All wave function amplitudes are shown for the Heisenberg model, while for the J_{1}–J_{2} model only one out of every 10000 successive points is plotted. The inset shows the infidelity obtained with different numerical precisions. The infidelity of a ResNet1 with 13750 parameters, which approaches the size limit of SR (pinv), is also presented for comparison. This shows that the deep NQS trained by MinSR can approach TF32 precision in the non-frustrated case and BF16 precision in the frustrated case on the 6 × 6 lattice with a Hilbert space dimension of 15804956 after applying symmetry, while the shallow NQS trained by traditional SR cannot. In our further tests, the shallow network trained by traditional SR can only approach such precision on the 4 × 6 lattice with a much smaller Hilbert space dimension of 15578.
Extended Data Fig. 3 Zero-variance extrapolation for the square J_{1}–J_{2} model at J_{2}/J_{1} = 0.5 with different lattice sizes.
The three data points in the same fitting are obtained by two different sizes of real-valued ResNet2 with 34944 and 139008 parameters and a Lanczos step on the larger one. The ground state sector is S = 0 and k = Γ = (0, 0), and the excited state sector is S = 1 and p = M = (π, π). The error bars show the standard deviations. On the largest 20 × 20 lattice, the slope cannot be estimated accurately by a direct linear fit because of the uncertainty of the data points. Consequently, we utilize the empirical assumption that the slope remains nearly unchanged for different system sizes and symmetry sectors. Excluding the smallest lattices, on which the slope estimate is inaccurate because the data points lie too close together, the slopes for L = 10, 12 and 16 are consistent within error bars: 0.17(2), 0.18(4) and 0.15(4) for S = 0, and 0.14(3), 0.16(2) and 0.16(2) for S = 1. A nearly unchanged slope is also observed in the existing literature [8, 16, 37]. We therefore employ the average of these slopes as the slope for L = 20 to mitigate the error.
Extended Data Fig. 4 Zero-variance extrapolation for the triangular J_{1}–J_{2} model at J_{2}/J_{1} = 0.125 with different lattice sizes.
The three data points in the same fitting are obtained by two sizes of complex-valued ResNet2 with 34944 and 139008 parameters and a Lanczos step on the larger one. The ground state sector is S = 0, k = Γ = (0, 0), and the excited state sector is S = 1, k = K = (4π/3, 0). The error bars show the standard deviations. Similar to the square lattice case, the slope on the 18 × 18 lattice is approximated by the slope on the 12 × 12 lattice to mitigate the error. The data on the 18 × 18 lattice is generated by networks with 139008 parameters and a Lanczos step.
Supplementary information
Supplementary Information
Supplementary Figs. 1–3 and discussion.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Chen, A., Heyl, M. Empowering deep neural quantum states through efficient optimization. Nat. Phys. (2024). https://doi.org/10.1038/s41567-024-02566-1
DOI: https://doi.org/10.1038/s41567-024-02566-1