Unitary-Coupled Restricted Boltzmann Machine Ansatz for Quantum Simulations

Neural-Network Quantum State (NQS) has attracted significant interests as a powerful wave-function ansatz to model quantum phenomena. In particular, a variant of NQS based on the restricted Boltzmann machine (RBM) has been adapted to model the ground state of spin lattices and the electronic structures of small molecules in quantum devices. Despite these progresses, significant challenges remain with the RBM-NQS based quantum simulations. In this work, we present a state-preparation protocol to generate a specific set of complex-valued RBM-NQS, that we name the unitary-coupled RBM-NQS, in quantum circuits. This is a crucial advancement as all prior works deal exclusively with real-valued RBM-NQS for quantum algorithms. With this novel scheme, we achieve (1) modeling complex-valued wave functions, (2) using as few as one ancilla qubit to simulate $M$ hidden spins in an RBM architecture, and (3) avoiding post-selections to improve scalability.

In this work, our primary focus is to investigate whether neural-network quantum states (NQS) [44][45][46] can be tailored to better fit the paradigm of hybrid quantum-classical algorithms. We focus on a particular form of NQS based on the restricted Boltzmann Machine (RBM) architecture. Within the communities of computational many-body physics, quantum information and condensed matter physics, there is a growing trend of adopting neural networks techniques, such as the RBM architecture, for various applications. Notable examples include identification of different phases of matter [47][48][49][50][51][52], scalable quantum state tomography [53,54], efficient sampler to accelerate Monte Carlo simulations [55][56][57], quantum error correction codes [58][59][60][61], and variational ansatz for many-body simulations [62][63][64][65][66][67][68][69][70][71][72]. Especially, the work [62] by Carleo and Troyer demonstrated that a complex-valued RBM model can efficiently model manybody wavefunctions with fewer variational parameters than tensor-network methods for few spin-lattice models. Subsequent investigations [73][74][75][76][77] have further clarified and affirmed the usefulness of variants of Boltzmann Machines to model many-body quantum states of complex systems. These encouraging results and rapidly accumulated knowledge about RBM-NQS motivated us to investigate whether it is suitable to apply this family of ansatz in the context of quantum simulation algorithms. In the most part of this work, we should use RBM-NQS and NQS interchangeably when there is no risk of ambiguity,.
While there are already quantum simulation algorithms [78,79] using NQS as variational ansatz with encouraging results, some fundamental obstacles limit their scope of applications. For instance, the existing approaches require the preparation of an extended wave function composed of all visible and hidden spins. As each hidden spin is explicitly modeled with an ancilla qubit, these prior methods consume too many qubits, which are expensive resources for near-term quantum devices. Furthermore, there is no scalable strategy for preparing a general NQS in quantum circuits. This is because many NQS can only be obtained via non-unitary transformations on an input state. Existing NQS based simulation algorithm [78] relies on a probabilistic post-selection to achieve the non-unitary operations. Finally, the lack of complex parameters severely limits the usefulness of the NQS for quantum simulations. For instance, (1) Complex-valued wave functions allow us to simulate fermions in time reversal symmetry breaking systems such as electrons in the presence of a magnetic field. (2) To simulate quantum dynamics, it is a necessity to account for the accumulation of dynamical phase factors.
The aforementioned limitations certainly have cast doubts on whether the NQS should be used in the cquantum circuits; despite many of their theoretical merits as wave-function ansatz and convincing demonstrations in numerical studies (i.e. classical simulations) in a broad range of physical systems cited above. To address these deficiencies, we propose a state-preparation protocol for creating complex-valued NQS in a quantum circuit. In particular, the state-preparation protocol does not use N + M qubits to model N visible spins and M hidden spins explicitly. This is because every term in a RBM Hamiltonian commutes with each other, we can explicitly arrange the order in which the unitary gates acting on the hidden spins. Hence, a single ancilla qubit (representing one hidden spin) can be recycled upon measurement and be reused to represent another hidden spin in a subsequent stage. This qubit-recyle scheme [80,81] tremendously reduces the number of physical qubits (down to just one extra ancilla at the bare minimum) needed to execute the proposed state-preparation protocol. This advantage cannot be underestimated as typical RBM-NQS might use as many as M =Poly(N ) hidden spins. To avoid the probabilistic scheme to mediate the non-unitary couplings between the visible and hidden layers, we further propose to only consider unitary couplings between visible and hidden spins under the RBM architecture in the main text.
As extensively discussed in Supplementary II and illustrated with numerics reported in the Result section, we show that arbitrary NQS can be systematically mapped onto these unitary-coupled RBM-NQS without sacrificing expressive power to model a variety of quantum systems in our empirical studies. In Supplementary V, we also provide an extension of the state-preparation protocol to handle arbitrary complex-valued coupling parameters across layers. Furthermore, we propose a novel approach to avoid post-selections on hidden-spin states. In this scheme, we decompose a single NQS into an ensemble of modified NQS, which are essentially outputted by the quantum circuit when the post-selection fails (i.e. the outcome of the projective measurement is not the desired one). Under the existing scheme, one would either discard the present result and restart after a failed post-selection or attempt to recover the input quantum state from a failed post-selection. In this ensemble scheme, one does not have to go through either of these repetitive procedures. In our case, estimates of observables on the original NQS can be extracted from the ensemble of "post-selection failures" within a Monte Carlo framework.
Finally, due to the "diagonal structure" of the NQS ansatz, the simulation algorithm based on the imaginary-time dependent variational principle [34] can be dramatically simplified as one just needs to perform measurement on one quantum circuit instead of working with O(N 2 var ) different quantum circuits where N var is the number of total variational paramters. For large-scale simulations, N var ∝ O(M N ) could be a huge number. In summary, the present work has both expanded the scope of application for NQS based hybrid quantum-classical simulations and has lowered the barrier for experimental implementations. In order to frame the significance of this work in a better perspective, we summarize and clarify the challenges we set out to address in prior works in Supplementary I. spectively. We also denote complex-valued RBM parameters as θ = [b 1 , ⋯, b N , m 1 ⋯, m M , W 11 ⋯, W N M ], and use subscripts R and I to denote the real and imaginary parts, respectively. The complex-valued NQS can be created with a two-step approach. First, entangle N + M qubits (including all visible and hidden spins of an RBM architecture) according to where +⟩ = 1 is the Hermitian part of the RBM Hamiltonian and the subscript vh denotes all visible and hidden (ancilla) qubits. Equation 1 gives a conceptually simple wave function that could be generated by first applying single-qubit transformations exp(b iv z i ) and exp(m jĥ z j ) on individual qubits followed by exp(W ijv z iĥ z j ) to couple qubits. The quantum operations are non-unitary when RBM parameters take on real parts, i.
In general, the non-unitary two-qubit operation mediating entanglement across the visible-hidden layer are difficult to implement. One could adopt the probabilistic scheme introduced in Ref. [78] to generate the inter-layer couplings with an extra ancilla qubit. However, for complex-valued wave function, this probabilistic approach in Ref. [78] is difficult to scale with the number of qubits involved.
Once the extended wave function Ψ vh (θ)⟩ is generated, all ancilla qubits (i.e. hidden spins) are postselected for +⟩ h and the desired NQS in Eq. 15 is reconstructed in the quantum circuit, whereP (h) + = + + ⋅ ⋅ ⋅ +⟩ h ⟨+ + ⋅ ⋅ ⋅ + , N v = vh ⟨+ + ⋯ + eĤ RBM (θ)P (h) + eĤ RBM (θ) + +⋯+⟩ vh and eĤ RBM (θ,h) is an operator acting on the visible spins only as we replace the Pauli operatorĥ z j with a binary value (±1) of h j ∈ h = [h 1 , ⋯, h M ]. From the second line of Eq. 2, it is clear that the hidden spins jointly mediate a specific quantum transformation, ∑ h eĤ RBM (θ,h) , on the visible-spin wave function. Since this transformation is non-unitary in general, the amplitude-amplification type of techniques [82,83] is not immediately applicable to enforce the post-selection. This inconvenient fact creates another obstacle to prepare complex-valued NQS in quantum circuits.
In the rest of this section, we describe a scalable state-preparation protocol that overcomes the two obstacles. In particular, we implement complex-valued NQS using as few as one ancilla qubit and entirely avoids the post-selection. To simplify presentations, we illustrate how to prepare a subset of NQS that we dub the unitary-coupled RBM-NQS, which only allow purely imaginary-valued inter-layer couplings W ij = iW I ij . Generation of unitary-coupled RBM-NQS bypass the inherent challenge to mediate entanglement via non-unitary transformations. Figure 1a gives a circuit diagram of preparing a unitary-coupled RBM-NQS composed of two visible spins with inter-qubit couplings mediated by two hidden spins. The Hadarmard gates prepare the +⟩ state, and the parametrized single-qubit rotations are not fixed along the z axis because of the non-unitary operations, exp b R iv z i and exp m R iĥ z i . In the circuit diagram of panel a, the single-qubit rotations R n (θ) are determined via relations of the form exp b R iv z i +⟩ c = R n (θ) +⟩ with the normalization factor c = ⟨+ exp 2b R iv z i +⟩. Figure 1b gives a schematic depicting the RBM state generated by the circuit in panel a. More importantly, as explained in the supplementary II, the unitary-coupled RBM-NQS does not necessarily suffer loss of expressive power. While we claim it is better to model quantum systems with unitary-coupled RBM-NQS for near-term applications; there is a straightforward extension of the current protocol to generate arbitrary complex-valued NQS in case it is desired. We defer the discussion of this extension to Supplementary V.
where Instead of working directly with Eq. 5, we look for an alternative approach based on the fact that we are primarily interested in the expectation value of an observableÔ, which can be formulated as The equation above can be interpreted as follows. The expectation value of an observableÔ can be turned into the average of the expression inside the square bracket if we can efficiently sample z according to the probability density ⟨z Ψ v (θ)⟩ 2 , i.e. projecting the NQS in some basis. We further analyze this probability density by exposing the details of the RBM architecture, where ∑ ⃗ s = ∑ s 1 ⋯ ∑ s M and it is implicitly assumed that ⟨Ψ v (θ) Ψ v (θ)⟩ = 1. Instead of performing the exact summation over all ⃗ s in Eq. 7 (this is essentially the same summation in Eq. 5 mentioned at the end of the last paragraph), we should simply sample ⃗ s in a Monte Carlo fashion. We note that N 2 ⃗ s is the probability of observing {s 1 , ⋯, s M } upon measuring those M hidden spins during the construction of the state Ψ ⃗ s v (θ)⟩. Hence, samples of ⃗ s are effectively drawn from the probability density N 2 ⃗ s during the construction of Ψ ⃗ s v (θ)⟩ . In short, we replace the exact summation according to where f (⃗ s) is some arbitrary function of ⃗ s and N exp is the number of sampling experiments performed.
The factor ∏ j R 2 s j (m R j ) in Eq. 7 should be calculated classically. This probability ⟨⃗ z Ψ ⃗ s v (θ)⟩ 2 is again sampled from projective measurements on visible spins. The only thing that is prohibitively expensive to estimate either classically or experimentally is the normalization constant N v . This is the reason we introduce the denominator (which is really just 1) in Eq. 7 that carries another N v to cancel the one in the numerator. By using Eqs. 6-7 together, the challenging post-selection is replaced with the Monte Carlo framework that needs to sample multiple Ψ ⃗ s v (θ)⟩ according to Eq. 7. Additional details on the ensemble state preparation method may be found in the supplementary III.
Quantum simulations with NQS-Imaginary Time Evolution (NQS-ITE). Next, we discuss the imaginary-time dependent variational principle (ITDVP) to find a ground state of a HamiltonianĤ. The idea is to propagate a trial wave function Ψ v (θ τ )⟩ in the imaginary time domain. If the trial wave function Ψ v (θ 0 )⟩ at time τ = 0 has a non-zero overlap with the ground state Ψ gs ⟩, then it should converge to an ansatz closest to Ψ gs ⟩ when τ ≫ 1. With a variational ansatz, the time-evolved Ψ v (θ τ )⟩ can be prepared in a quantum circuit if θ τ is given. In the method section, we summarize the standard ITDVP. The equations of motion for θ τ are given in Eq. 19. In this section, we adapt the standard NQS-ITE algorithm to make it compatible with the state preparation protocol introduced earlier. Details of the derivation may be found in Supplementary IV. As a result, the approach presented here implements the imaginary-time propagation (for NQS ansatz) more efficiently than the original algorithm reported in the Method section.
In short, the modified equations of motion for θ assume the following forṁ The matrix A and vector C read, where The O n operators are defined as follows, where We note that Eqs. 9-11 essentially give the stochastic reconfiguration method for the variational Monte Carlo framework. One should compare Eq. 10 and Eq. 19 to see that the standard ITDVP approach (in the Method section) requires preparing In fact, one can simultaneously estimate all O(N 2 var ) matrix elements with every given sample of z according to the definition of A matrix element in Eq. 10. This could be a tremendous advantage for large-scale simulations when N var ∝ O(M N ) could be a huge number.
Next, we point out an interesting observation. The standard gradient-based energy minimization, as done within the VQE approach, updates θ θ with the equation of motionθ n = C n where C n being the gradient vector for the energy function The comparison of Eq. 9 to the equation of motion above reveals that the NQS-ITE introduces a preconditioner A −1 to the gradient vector in order adjust the step size to account for the intrinsically curved metric for the NQS manifold. However, the evaluations of the matrix A requires no more experimental efforts for NQS-ITE than for a standard gradient-descent based approach as explained. As the imaginary-time method tends to give better result than the gradient descent; one should always adopt the imaginary-time propagation whenever NQS is used as the trial wave function.
Numerical Results. To demonstrate the effectiveness of RBM-NQS ansatz and NQS-ITE algorithm, we report numerical simulations on three different types of systems: molecules, spin chains and nanostructures (a triple quantum dots). Throughout the paper, we use a hidden and visible spin ratio of α = M N = 1, unless otherwise specified. We first present results with the standard complex-valued NQS ansatz (which requires the extended state preparation protocol described in Supplementary V), then we repeat the simulations in the last subsection to demonstrate that the unitary-coupled RBM-NQS could achieve the same level of accuracy with the same number of hidden spins, i.e. α = 1.
3a. Molecular Systems. We first test the NQS-ITE algorithm on common molecular benchmarks: the dissociation curves of H 2 and LiH molecules. The molecular Hamiltonians are first projected onto a discrete set of molecular orbitals. Here, we use the conventional STO-3G basis set, which constitutes a minimal set in that it represents the minimum number of orbitals required to represent a given atomic shell. The resulting fermionic Hamiltonians are subsequently mapped onto qubit Hamiltonians using the Bravyi-Kitaev transformation [84] . The computation of the integrals in second quantization and transformation of the Hamiltonians are done using PySCF [85] and OpenFermion [86].
In the current simulations, the hydrogen molecule requires 2 visible spins whereas lithium hydride requires 4 visible spins. The numerical results from NQS ansatz in computing the ground state energy as a function of inter-nuclear distance are plotted in Fig. 2. It can be seen that NQS is capable of reproducing nearly exact results despite using a modest number of hidden spins.
3b. Spin systems. Next, we consider the problem of finding the ground state of two prototypical spin models, i.e. the transverse-field Ising (TFI) model and the antiferromagnetic Heisenberg (AFH) model. The spin Hamiltonians can be written as, where h is the transverse field strength and we use open boundary condition. In Fig. 3 we again compute the energies using the NQS-ITE algorithm and compare the results with those from the exact diagonalization, and we observe excellent agreements as expected. 3c. Triple Quantum Dots (TQD). A lateral TQD is an artificial, fully tunable molecule constructed using metallic gates localizing electrons in a semiconductor field-effect transistor (FET). A TQD allows one to study new phenomena not present in a single or double quantum dot, e.g. topological effects [87,88]. The Hamiltonian of the TQD subject to a uniform perpendicular magnetic field, B = Bẑ, is given by whered iσ (d † iσ ) is fermionic annihilation (creation) operator with spin σ = ±1 2 on orbital i.n iσ =d † iσd iσ andˆ i =n i↓ +n i↑ are the spin and charge density on orbital level i. Each dot is represented by a single orbital with energy E iσ = E i + g * µ B Bσ, where g * is the effective Landé g-factor, µ B is the Bohr magneton. The dots are connected by magnetic field dependent hopping matrix elementst ij = t ij e 2πiφ ij . The details of the model can be found in the Method section.
The ground state energy of the TQD as a function of magnetic field is plotted in Fig. 4, we observe excellent agreement between the results from the exact diagonalization and NQS-ITE. It is worth noting that, at non-zero magnetic field, the ground state wave function of a TQD could be complex, thus a complex RBM ansatz is necessary for accurate representation of the ground state wavefunction.
3d. Results with unitary-coupled RBM-NQS. Finally in Fig. 5, we repeat the same set of model studies as above but we now apply the unitary-coupled RBM-NQS ansatz (green dahsedlines) to the NQS-ITE. The key insight revealed by Fig. 5 is that, in most cases, the unitary-coupled RBM-NQS provide comparable performance to that of the standard complex NQS. In the case of Heisenberg model, the approximated RBM ansatz sometimes fails to encode the the imaginary time operator and this leads to non-monotonic behaviour during the imaginary time evolution, though our algorithm eventually finds the true ground state wave function. This non-monotonicity can mostly be mitigated by using a smaller update time steps (  Fig. 5c inset).
Furthermore, we also present another set of results (solid red lines) in Fig. 5. These results are obtained with RBM ansatz whose parameters are initialized with a mean-field solution to the original problems. As shown, a better initial guess improves the convergence rate in comparison to randomly initialized cases. Particularly, mean-field initial wave function already provides nearly exact ground state wave function for lithium hydride molecule, as it can be seen from Fig. 5b that the ground state energy from the meanfield approximation is nearly identical to the exact ground state energy. The accuracy of the mean-field approximation for lithium hydride molecule is possibly due to the low level of electron correlations among the orbitals in minimal basis set. To test this hypothesis, we solve for the ground state of lithium hydride using Hartree-Fock method and found that it already provides a very accurate ground state solution. In the method section, we briefly comment how one can systematically make an intelligent guess on a good initial state under various situations.

DISCUSSION
In conclusion, we present a practical approach to exploit a popular machine-learning model for quantum simulations on a digital quantum computer. Before fault-tolerant quantum computation becomes readily available, the hybrid quantum-classical algorithms will prevail as a popular approach for investigating novel applications of a quantum computer. Successful experimental demonstrations of hybrid quantum- classical algorithms have certainly attracted attentions and boosted confidence in quantum computations. Nevertheless, many obstacles still prevent a clear demonstration of unambiguous quantum advantages for these hybrid quantum-classical algorithms. One possible path towards this goal is to investigate more powerful wave-function ansatz that can achieve a good tradeoff between expressive power and number of variational parameters. With fewer parameters, potentially, one may deal with a shorter-depth state preparation and deals with a simpler optimization problem.
From this perspective, the NQS certainly seems a promising option to investigate. For instance, it is known that a fully-connected RBM ansatz satisfy an entanglement volume scaling, it is very intriguing to further investigate whether one can exploit this property to minimize number of variational parameters under a realistic setting. While the long-range connectivity between qubits is not necessarily easy to realize in every kind of quantum hardwares at the moment; it is at least experimentally feasible with one of the leading hardware architecture, the ion-trap based quantum computers [89,90] having all-to-all connectivity among qubits. As the coherence time of quantum hardware continues to improve, issues of qubit connectivity could potentially be mitigated with advanced techniques such as the fermionic swap network [18]. Theoretically, one may also design quantum algorithms based on the deep Boltzmann machines [91] that further elevates the expressive power of Boltzmann-machine architectures with only short-range couplings, suitable for quantum hardware featuring local connectivity among qubits.
In this current work, we set out to improve the existing NQS state-preparation protocol in the quantum circuits. As mentioned in the introduction, prior quantum algorithms using RBM ansatz suffer from several obstacles that prevents simulations for complex systems with many degrees of freedom. Our proposed state-preparation protocols have significantly expanded the scope of applications for NQS as an ansatz for quantum simulations and lowered the experimental barriers. Without the complex-valued parameters, one cannot simulate some important quantum materials and quantum dynamics. Our numerical testings manifest encouraging signs that the NQS ansatz performs remarkably well across a variety of systems of practical and theoretical interests. Due to the qubit-recycling scheme, we reduce the number of required qubits from O(N +M ) in previous works down to O(N ) with sequential implementations of visible-hidden layer interactions. Avoiding the probabilistic preparation of inter-layer couplings (by imposing W ij = iW I ij ) also further improves the practicality of NQS-based simulations. The ensemble state preparation bypass the post-selection on hidden spins. Finally, it has been previously shown that imaginary time algorithm offers superior performance compared to VQE [34], but at the expense of more state preparations and measurements. In this work, we exploit the properties of RBM architecture and show that the number of different quantum states required in the imaginary time algorithm could be reduced from O(N 2 var ) down to 1. Hence, the experimental costs for NQS-ITE is comparable to what VQE demands. Since Boltzmann machine is a widely used machine learning model with many applications, we expect this HQC paradiagm building on the Monte Carlo framework and the NQS ansatz can be adopted for solving other important problems, such as discrete optimizations and machine learning. [62] used the Restricted Boltzmann Machine (RBM) neural-network architecture as wave-function ansatz to manybody physics and attained impressive results. Since then, several other wave function ansatz inspired by neuralnetwork architecture have been explored. Collectively, we now refer to this set of wave function ansatzs as the neural-network quantum states (NQS). Although, in this work, we only consider RBM-NQS, which is particularly convenient to model quantum systems composed of two-level systems (TLSs) such as spin lattice commonly studied in conednsed matter physics and electronic structures problems formulated in terms of qubits. For these systems, each TLS is directly identified with a visible spin in the corresponding RBM model. The entanglement between these TLSs (or visible spins) is mediated by the pairwise interactions between visible and hidden spins. In short, a manybody wave function in the NQS form reads h) . The RBM parameters are collectively denoted by θ = {b, m, W}. As shown in Eq. 15, the hidden spins are summed over in the bracket on the right-hand side of Eq. 15 to give a wave function Ψ v (θ)⟩ for the visible spins, which represent the physical system of interest. In principle, θ should possess non-vanishing imaginary components to describe complex-valued wave function.

Restricted Boltzmann Machine As Trial Wave Function. Recently, Troyer and Carleo
To prepare an NQS in a quantum circuit, we should take the energy function E θ (v, h) and promote it to an Hermitian operator by replacing the binary values of v and h with the corresponding Pauli operators. The quantum-circuit analog of Eq. 15 is decomposed into Eqs. 1-2: first entangle the visible and hidden spins then post-selects the hidden spins to mediate the desire non-unitary transformation on the visible-spin wave functions.
As τ → ∞, the stationary solution of Eq. 16 is the lowest-energy eigenstate that has a non-zero overlap with the initial state Φ(0)⟩. The imaginary-time dynamical simulation is a well-established approach to obtain the ground state of complex Hamiltonians. Through time dependent variational principle , an NQS ansatz Ψ v (θ τ )⟩ approximates a time-evolved quantum state Φ τ ⟩ via the following equation of motion, where θ(τ ) are determined via, The A and C matrices are defined as follows, Therefore, one updates the variational parameters according to For the variational parameter update in Eq. 19, we need to compute the gradients of Ψ v ⟩ with respect to θ, i.e. ∂Ψv ∂θn ⟩. As the properly normalized Ψ v (θ)⟩ carries a θ-dependent normalization factor N v in Eq. 15, the chain-rule derived formula for the gradient reads 20) where N v = ⟨Ψ vh (θ) P (h) + Ψ vh (θ)⟩. The above expression depends on the gradients of the extended RBM wave function, ∂ θ Ψ vh ⟩, which can be derived analytically where the superscript (R, I) is used to denote the real and imaginary parts of the coefficients and ⟨Ô⟩ vh = ⟨Ψ vh Ô Ψ vh ⟩. Substituting the expressions given by Eq. 21 into Eq. 20, one obtains expressions like the following, where θ ′ = [⋯, m I j + π 2 , ⋯] and θ = [⋯, m I j , ⋯] differ only by the value of m I j and the normalization factor Initial state preparation for quantum simulations. The NQS ansatz can be used in conjunction with most HQC simulation algorithms in addition to the time-dependent variational method outlined above. All these methods aim to solve highly non-trivial optimization in which the quality of solutions or the convergence rate depends crucially on the overlap of the initial state with the ground state Ψ gs ⟩. The two-stage initialization protocols described here gives a systematic approach to guide the preparation of high-quality initial states. In short, the idea is to selectively optimize a subset of parameters to obtain an approximate solution that could be used as the initial state in a subsequent simulation optimizing over all parameters.
In the simplest case, one may consider a mean-field approximation, which restricts the considerations to completely factorized product-state wave function ⟩ for all visible spins in a quantum simulation. We then subsequently use Ψ 0 v ( ⃗ b)⟩ as the starting point of another simulation in which the hidden spins are introduced along with corresponding parameters {m 1 , ⋯m M , W 11 , ⋯W N M } that collectively facilitate the formation of entangled NQS, Ψ v (θ)⟩. We note the mean-field approximation (single-body physics problem) can be easily done on a classical computer.
Nevertheless, for strongly correlated systems, the product states are not guaranteed to support a high overlap with Ψ gs ⟩. Instead of optimizing ⃗ b (the mean-field approximation) in the first run, it will be beneficial to consider an NQS with specifically designed sparse connectivity. In the second-stage calculation, the fully-connected architecture will be restored as usual, and the total number of variational parameters scale as O(N M ). An obvious question is how to decide the connectivity of this sparsely connected RBM architecture for the first-stage simulation, which needs to balance the expressive power of the variational ansatz and the complexity of the optimization tasks. For lattice systems, one may consider short-range RBMs that constitute a special class of the well-established entangled-plaquette states [72]. In this case, the total number of variational parameters scale as O(N ).
Simulation details for numerical studies. In all our simulations, we use a constant learning rate of 0.01. The variational parameters are initialized as Gaussian random numbers with mean zero and variance 0.01, except in cases where the initial conditions are obtained from mean field solutions (Fig.5 green dashed  lines).
Hamiltonians of Hydrogen and Lithium Hydride. We treat the hydrogen and lithium hydride molecules in the minimal STO-3G basis and use PySCF to compute the integrals in the second quantization The resulting fermionic Hamiltonians are subsequently mapped onto qubit Hamiltonians using Bravyi-Kitaev transformation with OpenFermion [86]. Due to the symmetry in H 2 , the final Hamiltonian consists of 2 qubits [92], whereas the lithium hydride Hamiltonian contains 4 qubits.
Hamiltonian for TQD. For TQD simulations, we use the parameters from Ref. [88], i.e. t = −0.23 meV, U i = 50 t , V ij = 10 t . g * = −0.44, E i = − t , and φ B = 1.25T −1 . We use Bravyi-Kitaev transformation to map the Hubbard Hamiltonian onto a qubit Hamiltonian using OpenFermion. In prior works, an NQS wave-function amplitude reads where Q θ (v, h) = ∑ h e E θ (v,h) N v is the joint Boltzmann distribution of visible and hidden spins as governed by the energy function E θ . When the RBM Hamiltonian is strictly real-valued, then it is obvious that the Ψ v (θ)⟩ can only model positive semi-definite wave functions. While being limited in scope of applications, one can rely on the Marshall Sign Rule to determine when the ground-state wave functions of a spin Hamiltonian should be positive semi-definite. Study on the transverse-field Ising model with positive semi-definite NQS has been reported [79] for quantum simulations on a D-wave quantum annealer. Nevertheless, there is an advantage of working with this subset of NQS. We note the visiblespin probability density ⟨v h) is simply the marginal of the joint visible-hidden distribution. Experimentally, one may simply prepare an extended wave function of all visible and hidden spins then simply "trace over" the hidden spins to estimate ⟨v Ψ v (θ)⟩ 2 . Avoiding the post-selection of + +⋯+⟩ h is certainly advantageous. However, to model realistic systems of interest, one should at least incorporate the negative signs into wave function. Reference [78] proposed an ingenious approach to generate arbitrary real-valued NQS in quantum circuit by modifying the standard RBM architecture. The authors added an extra node in the hidden layer with the sole purpose to assign a sign factor to each wave function coefficient. More precisely, the modified NQS read where with d i and c as parameters to be optimized. Given the form of s(v), it is clear that it cannot be described simply with the standard RBM Hamiltonian. Since this extra hidden node does not couple directly to other hidden spins, it does not have to be explicitly incorporated into the state preparation. Rather, when estimating physical properties of a quantum system,the sign functions can be shifted onto the observable operators. For instance, to measure energy of a quantum system, we prepare a positive semi-definite NQS Ψ v ⟩ described in the previous paragraph and estimate energy according to While this method has significantly expanded the usefulness of NQS in quantum simulations, the generation of the non-unitary transformation exp(W R ijv z iĥ z j ) cannot be done efficiently in a deterministic fashion. The challenge of scalability and the desire to model complex-valued wave functions with significantly lower amount of resources have led us to propose the state preparation scheme described in the main text.

SUPPLEMENTARY INFORMATION II: EXPRESSIVE POWER OF THE UNITARY-COUPLED BOLTZMANN MACHINES
In order to improve the scalability of the NQS preparation in a quantum circuit, we impose the crosslayer interaction W ij = iW I ij to be strictly imaginary-valued to avoid probabilistic preparation or a full tomographic characterization of a thermal-like quantum state when W ij have non-vanishing real parts. In this supplementary, we briefly outline three different schemes to convert an arbitrary complex-valued NQS into variants of Boltzmann Machine inspired wave function ansatz with only imaginary-valued couplings across visible and hidden layers. These conversions clearly show that there is no loss of expressive power with the "seemingly" restricted form of RBM-NQS we promote in this work. In the following, as illustrated in Fig. 6, we present methods to map a standard RBM architecture into two variants of Boltzmann machines as explained below.
Restricted Boltzmann Machine. First, we give the most straightforward conversion scheme. We note the standard NQS has an unnormalized amplitude, where the coefficients c n i 1 ...in are obtained from the Taylor expansion of log ∏ j cosh (m j + ∑ i W ij v i ) with v i being binary values of ±1. On the second line of Eq. 26, the exponent is a polynomial of degree N. Each term (with degree greater than 1) of this polynomial can be fully decoupled by the application of this mathematical identity [93], whereŝ i is a Pauli Z operator acting on state +⟩ s i , and the other parameters read Mod(2, N ) and C = 2 sin(2 arctan(e −ω )) .
In short, every RBM can be exactly turned into another RBM with a fixed imaginary-valued coupling value iπ 4 throughout the network. A potential price to be paid is the exponentially large number of hidden nodes in this alternative (yet fully equivalent) RBM. Certainly, when the coupling parameters c n i 1 ...in in Eq. 26 having magnitudes much smaller than that of b i and m j then one can drop many terms in that polynomial expansion. However, even in the lowest order of perturbation with a quadratic polynomial, the alternative RBM still carries O(N 2 ) hidden nodes. One may wonder whether the cost to impose strictly unitray couplings across visible and hidden layer is always an astronomical number of extra hidden nodes.
We remind that the goal of this rather formal analysis is to show that any arbitrary RBM can always be mapped onto a purely imaginary-valued RBM via a simple mapping. By no means, it is the only or the most efficient mapping. We should not be too pessimistic about this. For instance, there are examples [73] of 1D and 2D RBM-NQS (having M O(N )) exhibiting volume-law entanglements with a constant imaginaryvalued coupling coefficient (iπ 4). This implies strongly correlated quantum states can be efficiently created within this subset of RBM-NQS. Furthermore, we hypothesize that if we simply allow the imaginary-valued couplings to vary independently instead of all being held fixed at the value of iπ 4, then there should be viable mapping of arbitrary RBM-NQS onto rather compact architecture of unitary-coupled RBMs. This intuitive hypothesis is empirically confirmed in our numerical studies reported in the main text.
Unrestricted Boltzmann Machine. Given an RBM architecture (composed of N visible nodes and M hidden nodes) with complex-valued couplings W ij , one may exactly transform away the real components, W R ij via a different approach than the one described above. For instance, the coupling between the i-th visible and the j-th hidden spins may be removed with the addition of one more hidden node (labeled by s) according to the relation [93], whereŝ is the Pauli Z operator acting on the additional ancilla qubit in +⟩ s state. The normalization factor is taken to be ∆ = e W R ij 2, and other newly introduced parameters read These parameters ω v and ω h would be real-valued as expected. The newly added hidden spin breaks the restriction of no intra-layer couplings. The total number of hidden spins increase to O(M N ) after removing W R ij . Nevertheless, the modified neural-network architecture is largely compatible with RBM architecture as far as preparing the prescribed quantum states in a quantum circuit is concerned. Although the qubit recycling scheme comes with a caveat that one needs roughly O(N ) extra ancilla qubits (at each sequential stage) to explicitly model the entangled hidden state, which emerge when we transform away O(N ) realvalued couplings between visible nodes and a particular hidden node.
Deep Boltzmann Machine. Finally, we realize that the unrestricted Boltzmann machine can be further mapped onto a deep Boltzmann machine architecture without intra-layer couplings among hidden nodes. This is achieved by invoking the following mathematical identity, whereŝ i is a Pauli Z operator acting on state +⟩ s i in the deep layer, and the other parameters read .
Hence, in order to convert an unrestricted Boltzmann machine into restricted form (i.e. no intra-layer coupling), every intra-layer couplings between hidden units can be exactly replaced with couplings to extra hidden units in the deep layer. However, using the deep Boltzmann machine for quantum simulation could be difficult as the optimization of the parameters could be hard to converge. In a follow-up work, we will describe practical approaches to perform quantum simulation with deep Boltzmann Machines.

SUPPLEMENTARY INFORMATION III: ENSEMBLE STATE PREPARATION PROTOCOL
As shown in Eq. 5, the post-selected NQS Ψ v (θ)⟩ can be cast as a weighted sum of 2 M Ψ ⃗ s (θ)⟩. To facilitate the following discussion, we will work with the un-normalized version of Ψ ⃗ s (θ)⟩ in this section, where ⃗ s M −K is a reduced vector with the components {i 1 ⋯i K } removed from ⃗ s and are real-valued. From Eq. 34, it is clear that the expression when the index vectors ⃗ s ′ and ⃗ s differ by an odd number of components, i.e. K is odd. This is because the symmetrized expression above must be real-valued, but Eq. 34 demands pure imaginary numbers when K is odd. Hence, only zero is allowed. For the cases ⃗ s ′ and ⃗ s differ by an even number of components, i.e. K is even, we draw attention to a special condition. Let us first introduce another two states Ψ This is because ∏ K q=1 V iq (z), as defined in Eq. 34, carries opposite sign for ⟨Ψ ⃗ Now, consider a Hermitian operatorÔ that is diagonal in the computational basis then the real-valued expectation value is given by This expression admits a simple interpretation. N 2 v is the success probability of preparing an NQS upon post-selection of + +⋯+⟩ h . According to Eq. 39, this probability can be exactly decomposed as a linear combination of conditional probabilities, N 2 ⃗ s ∏ j R 2 s j , where N 2 ⃗ s is the probability of measuring s 1 ⋯s M ⟩ h after the hidden spins are coupled to the visible ones, and ∏ j R 2 s j is the conditional probability that the state ∏ j exp(m R jĥ z j ) s 1 ⋯s M ⟩ h can be subsequently projected back to the desired state, + +⋯+⟩ h . WhenÔ = z⟩⟨z as the projection onto a particular computational state then we obtain By substituting back the normalized factors via Ψ v ⟩ = N v Ψ v ⟩ and Ψ ⃗ s v ⟩ = N ⃗ s Ψ ⃗ s v ⟩ into Eq. 40, we recover Eq. 7 in the main text.

SUPPLEMENTARY INFORMATION IV: NQS-ITE DERIVATIONS AND IMPLEMENTATIONS
In Eq. 20, we relate ∂ θ i Ψ v (θ)⟩ to ∂ θ i Ψ vh (θ)⟩. Alternatively, we may write where Ψ vh (θ)⟩ = ∑ h eĤ RBM (θ,h) + + ⋅ ⋅ ⋅ +⟩ v is the un-normalized wave function. Hence, Eq. 20 can be cast in this equivalent form, andÑ v = ⟨Ψ vh (θ) P (h) + Ψ vh (θ)⟩. Similarly, we have to provide the derivatives of Ψ vh (θ)⟩, Substituting Eqs. 42-43 into A matrix and C vector in Eq. 19, we derive Eq. 10 in the main text. To facilitate the following discussions, we denote O j = tanh (m j + ∑ i W ijv z i ). Since O j are not Hermitian operators and not directly measurable, an experimental scheme for measuring ⟨O j ⟩ v (implied in Eqs. 10-11) should be clearly given. We propose the following strategy using the fact that [O j ,v z i ] = 0, where P v (z v ) = ⟨Ψ v (θ) z v ⟩ 2 is a probability density, z v = [z v,1 , ⋯, z v,N ] is a length-N binary string, and O j = tanh(m j + ∑ i W ij z v,i ). The second line in the equation above implies that simple projective measurements (in the computational basis) in a quantum circuit that prepares Ψ v (θ)⟩ state is an efficient approach to sample z k v binary strings from P v (z v whereĤ(z k v , y k,j v ) = ⟨z k v Ĥ y k,j v ⟩. When the Hamiltonian,Ĥ = ∑ l w lPl , is a linear combination of Pauli strings,Ĥ is a sparse matrix such that each computational state z k v ⟩ is only connected to a few other states y k,j v ⟩. On the third line of Eq. 45, the expression in the bracket should be evaluated classically. O † j (z k v )Ĥ(z k v , y k,j v ) can be done efficiently, but not so much with the individual wave function amplitude Ψ v (θ)⟩ which contains the normalization constantÑ v that is #P-hard to compute in general. However, we only need the un-normalized wave function amplitudes as the quantity appearing in the bracket is the ratio of the wave function amplitudes projected onto two computational basis states, z k v ⟩ and y k,j v ⟩, respectively.