Constructing exact representations of quantum many-body systems with deep neural networks

Carleo, Giuseppe; Nomura, Yusuke; Imada, Masatoshi

doi:10.1038/s41467-018-07520-3

Published: 14 December 2018

Nature Communications volume 9, Article number: 5322 (2018)

Abstract

Obtaining accurate properties of many-body interacting quantum matter is a long-standing challenge in theoretical physics and chemistry, rooted in the complexity of the many-body wave-function. Classical representations of many-body states constitute a key tool for both analytical and numerical approaches to interacting quantum problems. Here, we introduce a technique to construct classical representations of many-body quantum systems based on artificial neural networks. Our constructions are based on the deep Boltzmann machine architecture, in which two layers of hidden neurons mediate quantum correlations. The approach reproduces the exact imaginary-time evolution for many-body lattice Hamiltonians, is completely deterministic, and yields networks with a polynomially-scaling number of neurons. We provide examples where physical properties of spin Hamiltonians can be efficiently obtained. We also show how systematic improvements upon existing restricted Boltzmann machine ansätze can be obtained. Our method is an alternative to the standard path integral and opens new routes in representing quantum many-body states.

Introduction

A large number of successful developments in quantum physics build upon the mapping between many-body quantum systems and effective classical theories. Probably the best-known mapping is due to Feynman, who introduced an exact representation of many-body quantum systems in terms of statistical summations over classical particle trajectories¹. Effective classical representations of quantum many-body systems are, however, not unique, and other approaches rely on different guiding principles, such as perturbative expansions², or decomposition of interactions with auxiliary degrees of freedom^3,4. The classical representations of quantum states allow both for novel conceptual developments and efficient numerical simulations. On one hand, perturbative approaches based on the graphical resummation of classes of diagrams are at the heart of many-body analytical approaches in various fields of research, ranging from particle to condensed-matter physics⁵. On the other hand, several non-perturbative numerical methods for many-body quantum systems are also based on these mappings. Quantum Monte Carlo (QMC) methods are among the most successful numerical techniques, relying on continuous-space polymer representations^6,7,8,9, world-line lattice path integrals^10,11, continuous-time algorithms¹², and summation of perturbative diagrams^13,14. Effective classical representations are also the building block of variational methods based on correlated many-body wave-functions¹⁵. Several successful variational techniques make extensive use of parametric representations of quantum states, where the effective parameters are determined by means of the variational principle^16,17,18,19. In matrix-product and tensor-network states, the ground state is expressed as a classical network^20,21. In general, finding alternative, efficient classical representations of quantum states can help establish novel numerical and analytical techniques to study challenging open issues.

Recently, an efficient variational representation of many-body systems in terms of artificial neural networks, which consist of classical degrees of freedom, has been introduced²². Numerical results have shown that artificial neural networks can represent many-body states with high accuracy^{22,23,24,25,26,27,28,29,30,31}. The majority of the variational approaches adopted so far are based on shallow neural networks, called restricted Boltzmann machines (RBM), in which the physical degrees of freedom interact with an ensemble of hidden degrees of freedom (neurons). While shallow RBM states have promising features in terms of entanglement capacity^25,32,33,34, only deep networks are guaranteed to provide a complete and efficient description of the most general quantum states^35,36.

In this work, we introduce a constructive approach to explicitly generate deep network structures corresponding to exact quantum many-body ground states. We demonstrate this construction for interacting lattice spin models, including the transverse-field Ising and Heisenberg models. Our constructions are fully deterministic, in stark contrast to the shallow RBM case, in which the numerical optimization of the network parameters is inevitable. The number of neurons required in the construction scales only polynomially with the system size; the present approach thus constitutes a new family of efficient quantum-to-classical mappings with considerable representational flexibility. Given as a simple set of iterative rules, these constructions can be used either as a stand-alone tool or to systematically improve results obtained with variational shallow networks. The latter improves the efficiency of the method because the numerically optimized shallow RBM states are already good approximations for ground states. Finally, we discuss sampling strategies from the generated deep networks and show numerical results for one-dimensional spin models.

Results

General scheme of constructing deep neural states

The ground state of a generic Hamiltonian, ${\cal H}$, can be found through imaginary-time evolution, $\left| {{\mathrm{\Psi }}(\tau )} \right\rangle$ = ${\mathrm {e}}^{ - \tau {\cal H}}\left| {{\mathrm{\Psi }}_0} \right\rangle$, for a sufficiently large $\tau \gg {\mathrm{\Delta }}E^{ - 1}$. Here ΔE is the energy gap between the ground and the first excited state, $\left| {{\mathrm{\Psi }}_0} \right\rangle$ is an arbitrary initial state non-orthogonal to the exact ground state, and we work in units where ħ = 1. For a finite system, the energy gap is typically finite, and the total propagation time needed to reach the ground state within an arbitrary given accuracy is expected to grow at most polynomially with the system size (for systems becoming gapless in the thermodynamic limit).
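As a minimal illustration of the projection onto the ground state, the sketch below treats a single spin in a transverse field, $H = -\Gamma\sigma^x$, which is not one of the models studied in the paper but can be solved in closed form. Starting from the state $|\uparrow\rangle$, the unnormalized evolved state has amplitudes $(\cosh\Gamma\tau, \sinh\Gamma\tau)$, so the energy is $-\Gamma\tanh(2\Gamma\tau)$ and approaches the ground-state value $-\Gamma$ once $\tau \gg \Delta E^{-1} = 1/(2\Gamma)$. The function names and the value of `gamma` are illustrative assumptions.

```python
import math

def evolve_single_spin(gamma, tau):
    """Unnormalized amplitudes of exp(-tau*H)|up> for H = -gamma*sigma^x.

    Uses exp(tau*gamma*sigma^x) = cosh(gamma*tau)*1 + sinh(gamma*tau)*sigma^x.
    """
    return math.cosh(gamma * tau), math.sinh(gamma * tau)

def energy(gamma, psi):
    """<psi|H|psi>/<psi|psi> for H = -gamma*sigma^x and real amplitudes."""
    up, down = psi
    return -2.0 * gamma * up * down / (up * up + down * down)

gamma = 1.0  # hypothetical field strength; the gap is 2*gamma
for tau in (0.0, 0.5, 1.0, 2.0, 4.0):
    # The energy decreases monotonically from 0 towards -gamma.
    print(tau, energy(gamma, evolve_single_spin(gamma, tau)))
```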

Here, we introduce a representation of the wave-function coefficients in terms of a deep Boltzmann machine (DBM)³⁷. For the sake of concreteness, let us consider the case of N spins, described by the quantum numbers $\left| {\sigma ^z} \right\rangle$ = $\left| {\sigma _1^z \ldots \sigma _N^z} \right\rangle$. Then, we represent generic many-body amplitudes $\left\langle {\sigma _1^z \ldots \sigma _N^z{\mathrm{|\Psi }}} \right\rangle \equiv {\mathrm{\Psi }}\left( {\sigma ^z} \right)$ in the two-layer DBM form:

$$\begin{array}{c}{\mathrm{\Psi }}_{\cal W}\left( {\sigma ^z} \right) = \mathop {\sum}\limits_{\{ h,d\} } {\kern 1pt} {\mathrm{exp}}\left[ {\mathop {\sum}\limits_i {\kern 1pt} a_i\sigma _i^z + \mathop {\sum}\limits_{ij} {\kern 1pt} \sigma _i^zW_{ij}h_j} \right.\\ \left. { + \mathop {\sum}\limits_j {\kern 1pt} b_jh_j + \mathop {\sum}\limits_{jk} {\kern 1pt} h_jd_kW_{jk}^\prime + \mathop {\sum}\limits_k {\kern 1pt} b_k^\prime d_k} \right]\end{array}$$

(1)

where we have introduced M hidden units h, M′ deep units d, and a set of couplings and bias terms ${\cal W}$ ≡ (a, b, b′, W, W′). A sketch of the DBM architecture is shown in Fig. 1.

In the following, we specialize to the case of spin 1/2, thus all the units are taken to be σ^z, h, d = ±1. This representation is the natural deep-network generalization of the shallow RBM, introduced as a variational ansatz in ref. ²². As in the RBM form, direct connections between variables in the same layer are not allowed. A crucial difference, however, is that the layer of deep variables in general prevents the wave-function amplitudes from being evaluated analytically. In contrast to the RBM, the DBM form is known to be universal, as proven recently by Gao and Duan³⁵.
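To make Eq. (1) concrete, the following sketch evaluates a DBM amplitude by explicit summation over all hidden and deep units, and compares it with the closed form that is available when there are no deep units (the RBM limit, where the sum over h factorizes into a product of hyperbolic cosines). The function names and the numerical parameters are illustrative assumptions, and the exhaustive sum is only feasible for very small networks.

```python
import cmath
import itertools

def dbm_amplitude(sigma, a, b, b_deep, W, W_deep):
    """Evaluate Eq. (1) by explicit summation over all h_j, d_k = +/-1.

    sigma: visible spins (+/-1); a, b, b_deep: biases of the visible, hidden
    and deep layers; W[i][j] couples sigma_i to h_j; W_deep[j][k] couples
    h_j to d_k. The cost grows as 2**(M + M'), so this is for tiny networks.
    """
    n, m, m_deep = len(sigma), len(b), len(b_deep)
    visible = sum(a[i] * sigma[i] for i in range(n))
    total = 0.0
    for h in itertools.product((-1, 1), repeat=m):
        hidden = sum(b[j] * h[j] for j in range(m))
        hidden += sum(sigma[i] * W[i][j] * h[j]
                      for i in range(n) for j in range(m))
        for d in itertools.product((-1, 1), repeat=m_deep):
            deep = sum(b_deep[k] * d[k] for k in range(m_deep))
            deep += sum(h[j] * W_deep[j][k] * d[k]
                        for j in range(m) for k in range(m_deep))
            total += cmath.exp(visible + hidden + deep)
    return total

def rbm_amplitude(sigma, a, b, W):
    """Closed form without deep units: the sum over h factorizes."""
    value = cmath.exp(sum(a[i] * sigma[i] for i in range(len(sigma))))
    for j in range(len(b)):
        theta = b[j] + sum(sigma[i] * W[i][j] for i in range(len(sigma)))
        value *= 2.0 * cmath.cosh(theta)
    return value

# Hypothetical parameters: 2 visible spins, 2 hidden units, 1 deep unit.
a, b, b_deep = [0.1, -0.2], [0.3, 0.0], [0.05]
W = [[0.5, -0.4], [0.2, 0.7]]
W_deep = [[0.6], [-0.3]]
for sigma in itertools.product((-1, 1), repeat=2):
    print(sigma, dbm_amplitude(sigma, a, b, b_deep, W, W_deep))
```

With a deep layer present, no such factorized closed form exists in general, which is why the sampling schemes discussed later in the text are needed.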

Our key finding is that, thanks to its much more flexible representability, the DBM wave function can reproduce the Hamiltonian imaginary-time evolution exactly by changing its form dynamically, and that the parameters for the ground-state DBM network can be derived analytically. In order to find explicit expressions for the parameters ${\cal W}$ that represent $\left| {{\mathrm{\Psi }}(\tau )} \right\rangle$ for arbitrary imaginary time, we start by considering a second-order Trotter–Suzuki decomposition^10,38:

$$\left| {{\mathrm{\Psi }}(\tau )} \right\rangle = {\cal G}_1\left( {\delta _\tau {\mathrm{/}}2} \right){\cal G}_2\left( {\delta _\tau } \right) \ldots {\cal G}_1\left( {\delta _\tau } \right){\cal G}_2\left( {\delta _\tau } \right){\cal G}_1\left( {\delta _\tau {\mathrm{/}}2} \right)\left| {{\mathrm{\Psi }}_0} \right\rangle ,$$

(2)

where we have decomposed the Hamiltonian into two non-commuting parts, ${\cal H} = {\cal H}_1 + {\cal H}_2$, and introduced the short-time propagators ${\cal G}_\nu \left( {\delta _\tau } \right) = {\mathrm {e}}^{ - {\cal H}_\nu \delta _\tau }$. The problem of finding an exact representation for $\left| {{\mathrm{\Psi }}(\tau )} \right\rangle$ then reduces to finding a rule to construct the building blocks of the time evolution, namely representing the state after each of the two types of propagators by a DBM with new parameters $\bar {\cal W}$:

$${\mathrm {e}}^{ - {\cal H}_\nu \delta _\tau }\left| {\Psi _{\cal W}} \right\rangle = C\left| {{\mathrm{\Psi }}_{\bar {\cal W}}} \right\rangle .$$

(3)

In practice, this is achieved either by changing parameters ${\cal W}$, or by introducing additional parameters in ${\cal W}$, adding new neurons and creating new connections in the network.

In the following, we show concrete examples for paradigmatic spin Hamiltonians, namely the transverse-field Ising and Heisenberg models. The rest of this section provides a general overview of how the DBM constructions are derived (how Eq. (3) is satisfied) for these models. The next section (Sampling strategies) discusses how they can be used in numerical schemes. A complete, in-depth derivation of the representations and algorithms can be found both in Methods and in the Supplementary Notes, as referred to at each step in this section. Furthermore, we provide computer codes to create the DBM network for each model as Supplementary Software 1–4.

Transverse-field Ising model (TFIM)

We start by considering the TFIM on an arbitrary interaction graph. In this case, we decompose the Hamiltonian into two parts: ${\cal H}_1 = - \mathop {\sum}\nolimits_l {\kern 1pt} {\mathrm{\Gamma }}_l\sigma _l^x$, and ${\cal H}_2 = \mathop {\sum}\nolimits_{l < m} {\kern 1pt} V_{lm}\sigma _l^z\sigma _m^z$, where σ denote Pauli matrices, Γ_l (>0) are site-dependent transverse fields, and V_lm are arbitrary coupling constants.

In order to implement the mapping to a DBM, we first consider the action of the diagonal propagator ${\mathrm {e}}^{ - \delta _\tau V_{lm}\sigma _l^z\sigma _m^z}$, acting on a bond V_lm. In this case, the goal of finding an exact DBM representation can be rephrased as finding solutions to

$$\left\langle {\sigma ^z} \right|{\mathrm {e}}^{ - \delta _\tau V_{lm}\sigma _l^z\sigma _m^z}\left| {\Psi _{\cal W}} \right\rangle = C{\mathrm{\Psi }}_{\bar {\cal W}}\left( {\sigma ^z} \right),$$

(4)

i.e. finding a set of new parameters $\bar {\cal W}$ that exactly reproduces the imaginary-time evolution on the left-hand side. Here C is an arbitrary finite normalization constant. The diagonal propagator introduces an interaction between two visible, physical spins, which is not directly available in the DBM architecture. This interaction can be mediated by a new hidden unit in the first layer, h_[lm], which is only connected to the visible spins on that bond, i.e. $\bar W_{l[lm]}$ and $\bar W_{m[lm]}$ are finite, but $\bar W_{i[lm]}$ = 0, ∀i ≠ l, m and $\bar W_{j[lm]}^\prime$ = 0, ∀j (see Fig. 2a).

More concretely, the new wave function has then the form:

$$\begin{array}{c}{\mathrm{\Psi }}_{\bar {\cal W}}\left( {\sigma ^z} \right) = \mathop {\sum}\limits_{h_{[lm]}} {\kern 1pt} {\mathrm {e}}^{\sigma _l^zW_{l[lm]}h_{[lm]} + \sigma _m^zW_{m[lm]}h_{[lm]}}{\mathrm{\Psi }}_{\cal W}\left( {\sigma ^z} \right)\\ = 2{\kern 1pt} {\mathrm{cosh}}\left( {\sigma _l^zW_{l[lm]} + \sigma _m^zW_{m[lm]}} \right){\mathrm{\Psi }}_{\cal W}\left( {\sigma ^z} \right).\end{array}$$

(5)

Equation (4) is then satisfied if

$${\mathrm {e}}^{ - \delta _\tau V_{lm}\sigma _l^z\sigma _m^z} = 2C{\kern 1pt} {\mathrm{cosh}}\left( {\sigma _l^zW_{l[lm]} + \sigma _m^zW_{m[lm]}} \right)$$

(6)

for all the possible values of $\sigma _l^z$ and $\sigma _m^z$. By means of a useful identity [Eq. (21) in Methods], the new parameters W_l[lm] and W_m[lm] are given by

$$W_{l[lm]} = \frac{1}{2}{\mathrm{arcosh}}\left( {{\mathrm {e}}^{2\left| {V_{lm}} \right|\delta _\tau }} \right)$$

(7)

$$W_{m[lm]} = - {\mathrm{sgn}}\left( {V_{lm}} \right) \times W_{l[lm]}.$$

(8)

In this way the classical two-body interaction can, in general, be represented exactly by the shallow RBM.
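Equations (6)–(8) can be checked numerically by enumerating the four configurations of the two spins on the bond. The sketch below does so for a few couplings of either sign. The normalization $C = {\mathrm{e}}^{-|V_{lm}|\delta_\tau}/2$ is not quoted in the text above; it is the value obtained here by matching the aligned (or anti-aligned) configuration, and the function names are illustrative assumptions.

```python
import itertools
import math

def bond_weights(v, dtau):
    """Eqs. (7) and (8): couplings of the new hidden unit h_[lm]."""
    w_l = 0.5 * math.acosh(math.exp(2.0 * abs(v) * dtau))
    sign = (v > 0) - (v < 0)
    return w_l, -sign * w_l

def max_relative_error(v, dtau):
    """Largest relative violation of Eq. (6) over the four spin configurations."""
    w_l, w_m = bond_weights(v, dtau)
    c = 0.5 * math.exp(-abs(v) * dtau)  # normalization derived for this sketch
    worst = 0.0
    for s_l, s_m in itertools.product((-1, 1), repeat=2):
        lhs = math.exp(-dtau * v * s_l * s_m)
        rhs = 2.0 * c * math.cosh(s_l * w_l + s_m * w_m)
        worst = max(worst, abs(lhs - rhs) / lhs)
    return worst

# Hypothetical couplings: antiferromagnetic (v > 0) and ferromagnetic (v < 0).
for v in (1.0, -0.7):
    print(v, bond_weights(v, 0.05), max_relative_error(v, 0.05))
```

The sign rule in Eq. (8) decides whether the hyperbolic cosine is enhanced for aligned or for anti-aligned spins, which is how a single hidden unit reproduces couplings of either sign.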

Next, to exactly represent the off-diagonal propagator ${\mathrm {e}}^{\delta _\tau {\mathrm{\Gamma }}_l\sigma _l^x}\left| {{\mathrm{\Psi }}_{\cal W}} \right\rangle$, we must solve:

$$\begin{array}{r}{\mathrm{cosh}}\left( {{\mathrm{\Gamma }}_l\delta _\tau } \right){\mathrm{\Psi }}_{\cal W}\left( {\sigma ^z} \right) + {\mathrm{sinh}}\left( {{\mathrm{\Gamma }}_l\delta _\tau } \right){\mathrm{\Psi }}_{\cal W}\left( {\sigma _l^z \to - \sigma _l^z} \right)\\ = C{\mathrm{\Psi }}_{\bar {\cal W}}\left( {\sigma ^z} \right)\end{array}$$

(9)

for the new weights $\bar {\cal W}$, and for an appropriate finite normalization constant C. In this case, one possible solution is obtained by adding one deep neuron d_[l] and one hidden neuron h_[l]. For d_[l], we create new couplings $W_{j[l]}^\prime$ to the existing hidden neurons h_j which are connected to $\sigma _l^z$. We simultaneously allow for changes in the existing parameters. By the procedure given in Methods, after applying the off-diagonal propagator for the site l, a solution of Eq. (9) is found by matching the hidden-unit interactions on the left- and right-hand sides of Eq. (9). Overall, the solution results in a three-step process (Fig. 2b): First, the hidden units attached to $\sigma _l^z$ are connected to the newly introduced deep unit d_[l] as

$$W_{j[l]}^\prime = - W_{lj}$$

(10)

(see Eq. (35)). Second, all the hidden units previously connected to the spin $\sigma _l^z$ lose their connection, i.e., $\bar W_{lj} = 0,\forall j$. Third, the spin $\sigma _l^z$ and the deep unit d_[l] are connected to the new hidden unit, h_[l], through the interactions W_l[l] and $W_{[l][l]}^\prime$, respectively, as

$$W_{l[l]} = \frac{1}{2}{\mathrm{arcosh}}\left( {\frac{1}{{{\mathrm{tanh}}\left( {{\mathrm{\Gamma }}_l\delta _\tau } \right)}}} \right),$$

(11)

$$W_{[l][l]}^\prime = - W_{l[l]}.$$

(12)

Using the given expressions for the parameters $\bar {\cal W}$ we can then exactly implement a single step of imaginary-time evolution. The full imaginary-time evolution is achieved by applying the above procedure for ${\cal H}_1$ and ${\cal H}_2$ alternately and repeatedly. Example applications of these rules, for both the diagonal and the off-diagonal propagators are shown in Fig. 2.
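The three-step rule of Eqs. (10)–(12) can be verified by brute force on a tiny network. The sketch below applies the rule to a small RBM, then again to the resulting DBM, and checks Eq. (9) for every spin configuration. It assumes zero visible biases and real weights. The normalization $C = \sinh(\Gamma_l\delta_\tau)/2$ is the value obtained here by summing out h_[l] and d_[l], not a value quoted in the text above, and all function names and parameters are illustrative.

```python
import itertools
import math

def amplitude(sigma, W, W_deep, b, b_deep):
    """Brute-force DBM amplitude of Eq. (1) with zero visible biases."""
    n, m, m_deep = len(sigma), len(b), len(b_deep)
    total = 0.0
    for h in itertools.product((-1, 1), repeat=m):
        e_h = sum(b[j] * h[j] for j in range(m))
        e_h += sum(sigma[i] * W[i][j] * h[j] for i in range(n) for j in range(m))
        for d in itertools.product((-1, 1), repeat=m_deep):
            e_d = sum(b_deep[k] * d[k] for k in range(m_deep))
            e_d += sum(h[j] * W_deep[j][k] * d[k]
                       for j in range(m) for k in range(m_deep))
            total += math.exp(e_h + e_d)
    return total

def apply_transverse_field(l, gamma, dtau, W, W_deep, b, b_deep):
    """Network after exp(dtau*gamma*sigma^x_l), following Eqs. (10)-(12)."""
    n, m, m_deep = len(W), len(b), len(b_deep)
    w = 0.5 * math.acosh(1.0 / math.tanh(gamma * dtau))  # Eq. (11)
    # Step 1: new deep unit d_[l], coupled to each h_j with -W_lj, Eq. (10).
    new_W_deep = [W_deep[j] + [-W[l][j]] for j in range(m)]
    # Step 2: cut every existing connection of sigma_l.
    new_W = [[0.0 if i == l else W[i][j] for j in range(m)] for i in range(n)]
    # Step 3: new hidden unit h_[l], coupled to sigma_l (+w) and d_[l] (-w).
    for i in range(n):
        new_W[i].append(w if i == l else 0.0)
    new_W_deep.append([0.0] * m_deep + [-w])  # Eq. (12)
    return new_W, new_W_deep, b + [0.0], b_deep + [0.0]

def max_relative_error(l, gamma, dtau, net):
    """Largest relative violation of Eq. (9) over all spin configurations."""
    new_net = apply_transverse_field(l, gamma, dtau, *net)
    c = 0.5 * math.sinh(gamma * dtau)  # normalization derived for this sketch
    ch, sh = math.cosh(gamma * dtau), math.sinh(gamma * dtau)
    worst = 0.0
    for sigma in itertools.product((-1, 1), repeat=len(net[0])):
        flipped = list(sigma)
        flipped[l] = -flipped[l]
        lhs = ch * amplitude(sigma, *net) + sh * amplitude(flipped, *net)
        rhs = c * amplitude(sigma, *new_net)
        worst = max(worst, abs(lhs - rhs) / lhs)
    return worst

# Hypothetical starting point: a 2-spin RBM with 2 hidden units, no deep units.
rbm = ([[0.3, -0.5], [0.2, 0.4]], [[], []], [0.1, -0.2], [])
after_site0 = apply_transverse_field(0, 1.0, 0.1, *rbm)
print(max_relative_error(0, 1.0, 0.1, rbm))
print(max_relative_error(1, 1.0, 0.1, after_site0))
```

Each application adds exactly one hidden and one deep unit, so the network size grows linearly with the number of off-diagonal propagators applied.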

Approximate RBM from DBM for transverse-field Ising model

From the previous discussion, we have seen that the action of the off-diagonal propagator is responsible for the introduction of deep units in the network, thus breaking the shallow RBM structure. An interesting question is whether, in some limit, it is possible to stay within the RBM structure even for the off-diagonal propagator. The action of the off-diagonal propagator onto an RBM state can be then systematically expanded in powers of the weights:

$$\begin{array}{l}\left\langle {\sigma ^z} \right|{\mathrm {e}}^{\delta _\tau {\mathrm{\Gamma }}_l\sigma _l^x}\left| {{\mathrm{\Psi }}_{\cal W}^{{\mathrm{RBM}}}} \right\rangle \\ \propto \mathop {\sum}\limits_{\{ h\} } {\mathrm {e}}^{\mathop {\sum}\limits_{ij} {\kern 1pt} W_{ij}\sigma _i^zh_j}\left\{ {1 + {\mathrm{tanh}}\left( {{\mathrm{\Gamma }}_l\delta _\tau } \right)\left( {1 - 2\sigma _l^z\mathop {\sum}\limits_j h_jW_{lj}} \right)} \right\} + {\cal O}\left( {W_{lj}^2} \right).\end{array}$$

(13)

In the case of small weights, we can then exactly reproduce the off-diagonal propagator upon imposing a small change in the parameters W_lj → W_lj + ΔW_lj and keeping an RBM structure. If we expand the new RBM with modified weights, we get

$$\begin{array}{l}\left\langle {\sigma ^z} \right|{\mathrm {e}}^{\delta _\tau {\mathrm{\Gamma }}_l\sigma _l^x}\left| {{\mathrm{\Psi }}_{\bar {\cal W}}^{{\mathrm{RBM}}}} \right\rangle \propto \mathop {\sum}\limits_{\{ h\} } {\mathrm {e}}^{\mathop {\sum}\limits_{ij} {\kern 1pt} W_{ij}\sigma _i^zh_j}\left\{ {1 + \sigma _l^z\mathop {\sum}\limits_j {\kern 1pt} {\mathrm{\Delta }}W_{lj}h_j} \right\} \\ \quad + {\cal O}\left( {{\mathrm{\Delta }}W_{lj}^2} \right).\end{array}$$

(14)

Comparing Eqs. (13) and (14), it follows that (apart from an irrelevant global normalization) the state after the off-diagonal propagator is still an RBM, with weights equal to:

$$W_{lj} \to W_{lj} - 2{\kern 1pt} {\mathrm{tanh}}\left( {{\mathrm{\Gamma }}_l\delta _\tau } \right)W_{lj},$$

(15)

and an error proportional to the square of the weights at that time step. In general, we expect that this kind of approximate update is accurate in perturbative regimes (for example in the limit of small Γ_l) or in the limit of small imaginary-time evolution. A similar approximation scheme has been derived in ref. ³⁹. Numerical results for this approximation are discussed in a dedicated section before the Discussion.
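A small numerical experiment can illustrate the approximate update. The sketch below takes Eq. (15) as printed, applies it to a tiny bias-free RBM with small weights, and compares the result with the exactly propagated state $\Psi(\sigma^z) + \tanh(\Gamma_l\delta_\tau)\,\Psi(\sigma_l^z \to -\sigma_l^z)$ through the infidelity of the normalized states. It also reports the infidelity of the unmodified RBM as a baseline. The weights, the time step, and the function names are illustrative assumptions, and the comparison is only meant for the regime of small weights and small time steps.

```python
import itertools
import math

def rbm_amplitude(sigma, W):
    """Bias-free RBM amplitude: product over hidden units of 2*cosh."""
    value = 1.0
    for j in range(len(W[0])):
        value *= 2.0 * math.cosh(sum(sigma[i] * W[i][j] for i in range(len(sigma))))
    return value

def approximate_update(W, l, gamma, dtau):
    """Eq. (15): rescale row l of W by (1 - 2*tanh(gamma*dtau))."""
    factor = 1.0 - 2.0 * math.tanh(gamma * dtau)
    return [[factor * w for w in row] if i == l else list(row)
            for i, row in enumerate(W)]

def infidelity(psi, phi):
    """1 - <psi|phi>^2 / (<psi|psi><phi|phi>) for real amplitude lists."""
    overlap = sum(x * y for x, y in zip(psi, phi))
    return 1.0 - overlap ** 2 / (sum(x * x for x in psi) * sum(y * y for y in phi))

def compare(W, l, gamma, dtau):
    """Infidelity to the exact state for the updated and the unmodified RBM."""
    configs = list(itertools.product((-1, 1), repeat=len(W)))
    t = math.tanh(gamma * dtau)
    exact = []
    for s in configs:
        flipped = list(s)
        flipped[l] = -flipped[l]
        exact.append(rbm_amplitude(s, W) + t * rbm_amplitude(flipped, W))
    W_new = approximate_update(W, l, gamma, dtau)
    updated = [rbm_amplitude(s, W_new) for s in configs]
    unchanged = [rbm_amplitude(s, W) for s in configs]
    return infidelity(exact, updated), infidelity(exact, unchanged)

# Hypothetical small weights: 3 visible spins, 2 hidden units.
W = [[0.10, -0.05], [0.08, 0.12], [-0.07, 0.10]]
print(compare(W, 0, 1.0, 0.05))
```

In this regime the rescaled RBM is expected to stay considerably closer to the exactly propagated state than the unmodified one, while the network keeps its shallow structure.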

Heisenberg model

We now consider the anti-ferromagnetic Heisenberg model (AFHM), on bipartite lattices. In one dimension, we decompose the Hamiltonian into odd and even bonds: ${\cal H}_1 = \mathop {\sum}\nolimits_{\langle l,m\rangle }^{{\mathrm{odd}}} {\kern 1pt} {\cal H}_{lm}^{{\mathrm{bond}}}$ and ${\cal H}_2 = \mathop {\sum}\nolimits_{\langle l,m\rangle }^{{\mathrm{even}}} {\cal H}_{lm}^{{\mathrm{bond}}}$, with ${\cal H}_{lm}^{{\mathrm{bond}}}$ = $J\left( {\sigma _l^x\sigma _m^x + \sigma _l^y\sigma _m^y + \sigma _l^z\sigma _m^z} \right)$, where σ denote Pauli matrices. Because the bond Hamiltonian ${\cal H}_{lm}^{{\mathrm{bond}}}$ is a building block also in higher dimensional models, construction of an exact DBM representation of the ground states can be achieved by finding solutions for the bond-propagator $\left\langle {\sigma ^z} \right|{\mathrm {e}}^{ - \delta _\tau {\cal H}_{lm}^{{\mathrm{bond}}}}\left| {{\mathrm{\Psi }}_{\cal W}} \right\rangle$ = $C\left\langle {\sigma ^z{\mathrm{|}}\Psi _{\bar {\cal W}}} \right\rangle$, where the parameters $\bar {\cal W}$ are such that the previous equation is satisfied for all the possible $\left\langle {\sigma ^z} \right|$, and for an arbitrary finite normalization constant C. More explicitly, we need to satisfy

$$\begin{array}{l}\delta _{\sigma _l^z,\sigma _m^z}{\mathrm {e}}^{ - J\delta _\tau }{\mathrm{\Psi }}_{\cal W}\left( {\sigma ^z} \right) + \left( {1 - \delta _{\sigma _l^z,\sigma _m^z}} \right){\mathrm {e}}^{J\delta _\tau }{\mathrm{cosh}}\left( {2J\delta _\tau } \right) \\ \times \left( {{\mathrm{\Psi }}_{\cal W}\left( {\sigma ^z} \right) - {\mathrm{tanh}}\left( {2J\delta _\tau } \right){\mathrm{\Psi }}_{\cal W}\left( {\sigma _l^z \leftrightarrow \sigma _m^z} \right)} \right) = C{\mathrm{\Psi }}_{\bar {\cal W}}\left( {\sigma ^z} \right).\end{array}$$

(16)

The basic strategy of finding a solution for Eq. (16) is similar to that for Eq. (9) in the transverse Ising model. Several possibilities arise when looking for solutions of the bond-propagator equation, Eq. (16). The existence of non-equivalent solutions prominently shows the non-uniqueness of DBM structure to represent the very same state and, at the same time, provides us flexibility in designing DBM architectures. Here, we show three concrete constructions. See Methods and Supplementary Note 2 for a detailed derivation of the DBM construction for the Heisenberg model, including anisotropic and bond-disordered coupling cases.

1 deep + 3 hidden variables construction for Heisenberg model

The first construction is dubbed “1 deep, 3 hidden” (1d–3h). It amounts to adding an extra deep neuron, d_[lm], and three more hidden neurons to satisfy Eq. (16). A crucial difference with respect to the TFIM is that the introduced deep spin d_[lm] has a constraint depending on the state of the spins on the bond: $\sigma _l^z$ and $\sigma _m^z$. Specifically, when $\sigma _l^z = \sigma _m^z$ the deep spin is constrained to be $d_{[lm]} = \sigma _l^z = \sigma _m^z$, whereas when $\sigma _l^z \ne \sigma _m^z$, its value is unconstrained. From a pictorial point of view, the action of the bond propagator is a four-step process (see Fig. 3a). First, starting from a given initial network (uppermost structures in Fig. 3), d_[lm] is added and connected, through $W_{j[lm]}^\prime$ given in Eq. (43), to the existing hidden units h_j connected to $\sigma _l^z$ and $\sigma _m^z$. Second, spin $\sigma _l^z$ is disconnected from all hidden units and reconnected to those hidden units to which the spin $\sigma _m^z$ is attached [see Eq. (42)]. Third, two new hidden units are introduced. One of the hidden units, h_[lm1], mediates the interaction between $\sigma _l^z$ and d_[lm] [Eq. (47)], and the other hidden unit h_[lm2] mediates a direct spin–spin interaction between $\sigma _l^z$ and $\sigma _m^z$ [Eq. (49)]. Fourth, a further hidden unit connected to $\sigma _l^z$, $\sigma _m^z$, and d_[lm] is inserted, in such a way that the constraint previously described is satisfied. For all but the last step, the DBM weights are real-valued. In the last step, instead, the constraint is enforced by introducing imaginary-valued interactions (dotted lines in Fig. 3), referred to as the “iπ/6” trick, resulting in a sign-problem-free global term ${\mathrm{cos}}({\pi {\mathrm{/}}6({\sigma _l^z + \sigma _m^z - d_{[lm]}})})$ after the summation over ±1 for the last added hidden unit h_[lm3]: $\mathop {\sum}\nolimits_{h_{[lm3]} = \pm 1} {\kern 1pt} {\mathrm{exp}}[{i\pi {\mathrm{/}}6({\sigma _l^z + \sigma _m^z - d_{[lm]}})h_{[lm3]}}]$. The constraint mentioned above is assured by this cosine term.
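The effect of the iπ/6 coupling can be seen by enumerating all eight configurations of $\sigma_l^z$, $\sigma_m^z$, and d_[lm]. The sketch below sums over h_[lm3] = ±1 explicitly, which gives $2\cos(\pi(\sigma_l^z + \sigma_m^z - d_{[lm]})/6)$. The factor vanishes only when the two spins are equal and the deep unit differs from them, and it takes the same positive value $\sqrt 3$ otherwise. The function name is an illustrative assumption.

```python
import cmath
import itertools
import math

def constraint_factor(s_l, s_m, d):
    """Sum over h_[lm3] = +/-1 of exp[i*pi/6 * (s_l + s_m - d) * h_[lm3]]."""
    x = s_l + s_m - d
    return sum(cmath.exp(1j * math.pi / 6.0 * x * h) for h in (-1, 1))

for s_l, s_m, d in itertools.product((-1, 1), repeat=3):
    value = constraint_factor(s_l, s_m, d)
    # The imaginary parts cancel; the real part is 2*cos(pi*x/6).
    print(s_l, s_m, d, round(value.real, 12))
```

Because the surviving configurations all carry the same positive weight, the imaginary couplings enforce the constraint without introducing sign changes.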

2 deep + 6 hidden variables construction for Heisenberg model

The second construction is dubbed “2 deep, 6 hidden” (2d–6h), and is more similar to the lattice path-integral formulation. In this representation, we introduce two auxiliary deep spins per bond, d_[l] and d_[m], with the constraint $d_{[l]} + d_{[m]} = \sigma _l^z + \sigma _m^z$, and six hidden neurons. The action of the bond propagator is schematically illustrated in Fig. 3b: first, two deep units d_[l] and d_[m] are introduced, connecting, respectively, to the hidden units to which the spins $\sigma _l^z$ and $\sigma _m^z$ are attached [see Eqs. (51) and (52)]. Second, all the connections between spins $\sigma _l^z$, $\sigma _m^z$, and hidden units h_j are cut off [Eqs. (53) and (54)]. Third, four hidden units h_[lm1], …, h_[lm4] are introduced, to mediate interactions between the two deep units and the physical spins l, m [Eqs. (61) and (62)]. Finally, two hidden units h_[lm5] and h_[lm6] are introduced, connecting both to d_[l], d_[m] and $\sigma _l^z,\sigma _m^z$ with imaginary-valued weights. The last step realizes the constraint $d_{[l]} + d_{[m]} = \sigma _l^z + \sigma _m^z$, through the “iπ/4, iπ/8” trick discussed in Methods and the discussion of the 2d–6h representation in Supplementary Note 2.

In this representation, if the hidden neurons are traced out, the imaginary-time evolution becomes equivalent to that of the path-integral Monte Carlo method. More specifically, the number of deep neurons introduced at each time slice is exactly the same as the number of visible spins, and the deep neurons at each time slice can be regarded as additional classical spin degrees of freedom in the path-integral. Moreover, the constraint $d_{[l]} + d_{[m]} = \sigma _l^z + \sigma _m^z$ ensures that the total magnetization is conserved at each time slice. Finally, the W and W′ interactions reproduce the matrix element of ${\mathrm{exp}}\left( { - \delta _\tau {\cal H}_{lm}^{{\mathrm{bond}}}} \right)$ between neighboring time slices. See Supplementary Note 2 for more detail on this point.

2 deep + 4 hidden variables construction for Heisenberg model

A further possible solution to Eq. (16) is dubbed “2 deep, 4 hidden” (2d–4h) construction. In this case, we introduce two auxiliary deep variables d_[l] and d_[lm]. We also introduce four hidden units h_[l], h_[m], h_[lm1], and h_[lm2]. Before the imaginary time evolution, $e^{ - \delta _\tau {\cal H}_{lm}^{{\mathrm{bond}}}}$, the physical variables $\sigma _n^z$ (n = l or m) are already coupled to each hidden variable h_j with a coupling W_nj. After the time evolution ${\mathrm {e}}^{ - \delta _\tau {\cal H}_{lm}^{{\mathrm{bond}}}}$, as shown schematically in Fig. 3c, the coupling parameters are updated in the following way based on the old W_nj: First, the first deep unit d_[l] becomes coupled to the already existing hidden variables h_j through the coupling $W_{j[l]}^\prime$ given in Eq. (67). The second deep unit d_[lm] becomes similarly coupled to h_j through a term Z_lmj given in Eq. (67). Second, W_nj is updated to $\bar W_{nj} = W_{nj} + {\mathrm{\Delta }}W_{nj}$ [see Eq. (66)]. Third, newly introduced h_[n] (n = l or m) gets coupled to d_[l] through $W_{[n][l]}^\prime$, and also to $\sigma _n^z$ through W_n[n] [Eqs. (71) and (73)].

Finally, as clarified in Methods, we also need to satisfy the constraint $d_{[l]}d_{[lm]} = \sigma _l^z\sigma _m^z$. Such a constraint is represented in DBM form as

$$\mathop {\sum}\limits_{h_{[lm1]},h_{[lm2]}} {\kern 1pt} {\mathrm{exp}}\left[ {\frac{{i\pi }}{4}\left( {h_{[lm1]} + h_{[lm2]}} \right)\left( {\sigma _l^z + \sigma _m^z + d_{[l]} + d_{[lm]}} \right)} \right],$$

(17)

which ensures $d_{[l]}d_{[lm]} = \sigma _l^z\sigma _m^z$ after explicit summation of h_[lm1] and h_[lm2]. Finally, we remark that the three constructions presented here have different intrinsic network topologies. In particular, 2d–6h gives rise to a local topology (because of the equivalence with the path-integral construction), 1d–3h has a local structure in the first layer and non-local in the second one, and 2d–4h is purely non-local in both layers.
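The constraint term of Eq. (17) can be checked by direct enumeration. Summing over h_[lm1] and h_[lm2] gives $4\cos^2(\pi S/4)$ with $S = \sigma_l^z + \sigma_m^z + d_{[l]} + d_{[lm]}$, which equals 4 when $S \in \{0, \pm 4\}$ and 0 when $S = \pm 2$. The first case is exactly the set of configurations with $d_{[l]}d_{[lm]} = \sigma_l^z\sigma_m^z$. The function name in the sketch below is an illustrative assumption.

```python
import cmath
import itertools
import math

def constraint_weight(s_l, s_m, d_l, d_lm):
    """Eq. (17): sum over h_[lm1], h_[lm2] = +/-1 of the imaginary-coupling term."""
    total = s_l + s_m + d_l + d_lm
    return sum(cmath.exp(1j * math.pi / 4.0 * (h1 + h2) * total)
               for h1 in (-1, 1) for h2 in (-1, 1))

for config in itertools.product((-1, 1), repeat=4):
    s_l, s_m, d_l, d_lm = config
    weight = constraint_weight(*config)
    satisfied = (d_l * d_lm == s_l * s_m)
    print(config, satisfied, round(weight.real, 12))
```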

Sampling strategies

With network structures explicitly determined, we now focus on the problem of extracting meaningful physical quantities from them. To this end, it is convenient to decompose the DBM weight into two parts, such that

$${\mathrm{\Psi }}_{\cal W}\left( {\sigma ^z} \right) = \mathop {\sum}\limits_{\{ h,d\} } {\kern 1pt} P_1\left( {\sigma ^z,h} \right)P_2\left( {h,d} \right),$$

(18)

where $P_1\left( {\sigma ^z,h} \right)$ = ${\mathrm {e}}^{\sigma ^z \cdot a + \sigma ^z \cdot W \cdot h + h \cdot b}$, and $P_2(h,d)$ = ${\mathrm {e}}^{h \cdot W\prime \cdot d + d \cdot b\prime }$. The expectation value of an arbitrary (few-body) operator ${\cal O}$ can then be computed through the expression

$$\left\langle {\cal O} \right\rangle = \frac{{\mathop {\sum}\limits_{\left\{ {\sigma ^z,h,h\prime ,d,d\prime } \right\}} {\kern 1pt} {\mathrm{{\Pi}}}\left( {\sigma ^z,h,h\prime ,d,d\prime } \right)O_{{\mathrm{loc}}}\left( {\sigma ^z,h,h\prime } \right)}}{{\mathop {\sum}\limits_{\left\{ {\sigma ^z,h,h\prime ,d,d\prime } \right\}} {\mathrm{{\Pi}}}\left( {\sigma ^z,h,h\prime ,d,d\prime } \right)}},$$

(19)

where we have introduced the pseudo-probability density Π(σ^z, h, h′, d, d′) ≡ $P_1\left( {\sigma ^z,h} \right)P_2\left( {h,d} \right)P_1^ \ast \left( {\sigma ^z,h\prime } \right)P_2^ \ast \left( {h\prime ,d\prime } \right)$, and the “local” estimator

$$O_{{\mathrm{loc}}}\left( {\sigma ^z,h,h\prime } \right) = \frac{1}{2}\mathop {\sum}\limits_{\sigma ^{\prime z}} \left\langle {\sigma ^z} \right|{\cal O}\left| {\sigma ^{\prime z}} \right\rangle \left( {\frac{{P_1\left( {\sigma ^{\prime z},h} \right)}}{{P_1\left( {\sigma ^z,h} \right)}} + \frac{{P_1\left( {\sigma ^{\prime z},h^\prime } \right)^ \ast }}{{P_1\left( {\sigma ^z,h^\prime } \right)^ \ast }}} \right).$$

(20)

For the sampling over the Π distribution, a block Gibbs sampling analogous to that performed in standard DBM architectures can be used^37,40. Alternatively, it is possible to devise a set of Metropolis local updates sampling the exactly known marginals ${\tilde{\mathrm {\Pi}}}\left( {\sigma ^z,h,h\prime } \right)$ = $\mathop {\sum}\nolimits_{\{ d,d\prime \} } {\kern 1pt} {\mathrm{{\Pi}}}\left( {\sigma ^z,h,h\prime ,d,d\prime } \right)$ or ${\tilde{\mathrm {\Pi}}}\prime \left( {\sigma ^z,d,d\prime } \right)$ = $\mathop {\sum}\nolimits_{\{ h,h\prime \} } {\mathrm{{\Pi}}}\left( {\sigma ^z,h,h\prime ,d,d\prime } \right)$.
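Before any Monte Carlo sampling is set up, the estimator of Eqs. (19) and (20) can be validated by exhaustive summation on a tiny network. The sketch below does this for the operator $\sigma_l^x$, assuming real weights (so that complex conjugation is trivial), and compares the result with the expectation value computed directly from the explicitly summed amplitudes. All names and parameter values are illustrative assumptions, and the exhaustive sums stand in for the sampling that a realistic calculation would use.

```python
import itertools
import math

SPINS = (-1, 1)

def p1(sigma, h, a, b, W):
    """P_1(sigma, h) = exp(sigma.a + sigma.W.h + h.b)."""
    e = sum(a[i] * s for i, s in enumerate(sigma))
    e += sum(b[j] * x for j, x in enumerate(h))
    e += sum(s * W[i][j] * x for i, s in enumerate(sigma) for j, x in enumerate(h))
    return math.exp(e)

def p2(h, d, b_deep, W_deep):
    """P_2(h, d) = exp(h.W'.d + d.b')."""
    e = sum(b_deep[k] * y for k, y in enumerate(d))
    e += sum(x * W_deep[j][k] * y for j, x in enumerate(h) for k, y in enumerate(d))
    return math.exp(e)

def flip(sigma, l):
    return tuple(-s if i == l else s for i, s in enumerate(sigma))

def sigma_x_direct(l, a, b, b_deep, W, W_deep):
    """<Psi|sigma^x_l|Psi>/<Psi|Psi> from explicitly summed amplitudes."""
    hs = list(itertools.product(SPINS, repeat=len(b)))
    ds = list(itertools.product(SPINS, repeat=len(b_deep)))
    def psi(sigma):
        return sum(p1(sigma, h, a, b, W) * p2(h, d, b_deep, W_deep)
                   for h in hs for d in ds)
    num = den = 0.0
    for sigma in itertools.product(SPINS, repeat=len(a)):
        num += psi(sigma) * psi(flip(sigma, l))
        den += psi(sigma) ** 2
    return num / den

def sigma_x_estimator(l, a, b, b_deep, W, W_deep):
    """Eq. (19) with the local estimator of Eq. (20), summed exhaustively."""
    hs = list(itertools.product(SPINS, repeat=len(b)))
    ds = list(itertools.product(SPINS, repeat=len(b_deep)))
    num = den = 0.0
    for sigma in itertools.product(SPINS, repeat=len(a)):
        fl = flip(sigma, l)
        for h, hp in itertools.product(hs, repeat=2):
            # sigma^x_l connects sigma only to the configuration with spin l flipped.
            o_loc = 0.5 * (p1(fl, h, a, b, W) / p1(sigma, h, a, b, W)
                           + p1(fl, hp, a, b, W) / p1(sigma, hp, a, b, W))
            for d, dp in itertools.product(ds, repeat=2):
                weight = (p1(sigma, h, a, b, W) * p2(h, d, b_deep, W_deep)
                          * p1(sigma, hp, a, b, W) * p2(hp, dp, b_deep, W_deep))
                num += weight * o_loc
                den += weight
    return num / den

# Hypothetical tiny DBM: 2 visible spins, 2 hidden units, 1 deep unit.
params = ([0.1, -0.2], [0.3, 0.0], [0.05], [[0.5, -0.4], [0.2, 0.7]], [[0.6], [-0.3]])
print(sigma_x_direct(0, *params), sigma_x_estimator(0, *params))
```

The local estimator depends only on the visible and hidden units, which is what allows the deep units to be handled entirely by the sampling weight Π.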

In general, we have found that efficiently sampling the DBMs arising from the Heisenberg model constructions is typically more challenging than for the TFIM. This circumstance is a consequence of the imaginary couplings, which set constraints on the values of hidden/deep units. These constraints typically make local Metropolis updates inefficient. With the notable exception of the 2d–6h representation, for which loop updates can be readily implemented, we leave the problem of designing efficient Monte Carlo sampling for the other Heisenberg constructions open. The sampling strategies adopted in our numerics are discussed in more detail in Supplementary Note 3.

Numerical results

We have implemented numerical algorithms to sample and obtain physical properties from the DBM previously derived. In Fig. 4a we show results for the one-dimensional TFIM. Specifically, we show the expectation value of the energy following the imaginary-time evolution starting from a fully polarized (in the x direction) initial state. The initial state corresponds to an empty network, where all the DBM parameters are set to zero. The DBM results closely match the exact imaginary-time evolution, thus verifying the correctness of our construction.

In Fig. 4a we also show the corresponding imaginary-time evolution as obtained from the approximate RBM construction, Eq. (15). As expected, this approximation is very accurate for short times and breaks down at later times.

Numerical results for the one-dimensional Heisenberg model are shown in Figs. 4b and 5a. Specifically, Fig. 4b shows the numerical check of the DBM (construction 2d–6h) time evolution for the one-dimensional Heisenberg model with N = 16. As expected, the DBM results also in this case follow the exact time evolution. Figure 5a shows the dependence of the energy on the initial state, for N = 80. Specifically, by taking a pre-optimized variational RBM as an initial state, we can significantly decrease the time τ needed to reach the ground state.

Results for two-dimensional models are shown in Fig. 5b, both for the two-dimensional Heisenberg model and for the frustrated J₁ − J₂ model, on a 4 × 4 lattice with periodic boundary conditions.

In the case of the TFIM, sampling from the DBM is realized through the Gibbs scheme previously sketched, in conjunction with a parallel tempering scheme, to improve ergodicity in the sampling.

For the AFHM and for the J₁ − J₂ model with the 2d–6h representation, we adopt the loop updates⁴¹ used in the path-integral QMC method, because the imaginary-time evolution in the 2d–6h representation has a direct correspondence to the path-integral formulation, allowing for an efficient handling of the constraint $d_{[l]} + d_{[m]} = \sigma _l^z + \sigma _m^z$.

All the simulations carried out here are sign-problem free, with the notable exception of the simulations carried out on the two-dimensional J₁ − J₂ model. In this case, we start the imaginary-time evolution from a pre-optimized variational wave function, thus setting the fully evolved state as the product of a DBM and the initial state. Because of the quality of the initial guess, a moderate sign problem can be numerically afforded for short time evolutions, and in this case it is enough to converge to the exact ground state (see Fig. 5b).

Discussion

We have shown how exact ground states of interacting spin Hamiltonians can be explicitly constructed using artificial neural networks comprising only two layers of hidden variables. In contrast to approaches based on one-layer RBMs, the constructions we have derived here do not require further variational optimization of the network parameters, and the exact representation of many-body ground states can be achieved with only polynomially many neurons. In the case of the Heisenberg model, all of the explicit algorithms presented here give rise to sign-problem-free representations, if the lattice is bipartite.

The DBM representation has an intrinsic conceptual value, as an alternative quantum-to-classical mapping to the path-integral representation. In the path-integral formalism, the addition of an extra dimension (the imaginary-time direction) is needed to exactly represent the quantum many-body state. In our case, the deep hidden layer of the DBM plays a role similar to that of the additional dimension in the path integral. As argued in Methods [see Eq. (28)], a single-layer RBM is indeed sufficient to describe the state of arbitrary classical spin systems exactly and efficiently. On the other hand, a second, deep layer is necessary for the efficient and exact construction of compact networks describing quantum mechanical states.

DBM-based schemes can be further used to systematically improve upon existing RBM variational results. More generally, the initial state for the present DBM scheme can be a generic variational state or even a combination of RBMs and more conventional wave functions^24,33. We have shown that, by starting the DBM construction from a pre-optimized variational state, a fast convergence to the exact ground state is observed. As shown in Fig. 5b, this kind of scheme opens the possibility of characterizing the ground state even in the case of non-bipartite lattices with frustration effects, exploiting the transient regime in which the sign problem can still be handled efficiently numerically, as for example discussed in ref. ⁴².

Methods

Useful identities

It is useful to introduce several identities, which can be used when more complicated interactions between the visible spins σ^z, hidden variables h and deep variables d beyond the standard form Eq. (1) are needed. The first identity reads

$${\mathrm {e}}^{s_1s_2V} = C\mathop {\sum}\limits_{s_3 = \pm 1} {\kern 1pt} {\mathrm {e}}^{s_1s_3\tilde V_1 + s_2s_3\tilde V_2} = 2C\,{\mathrm{cosh}}\left( {s_1\tilde V_1 + s_2\tilde V_2} \right).$$

(21)

with

$$C = \frac{1}{2}{\mathrm {e}}^{ - |V|}$$

(22)

$$\tilde V_1 = \frac{1}{2}{\mathrm{arcosh}}\left( {{\mathrm {e}}^{2|V|}} \right)$$

(23)

$$\tilde V_2 = {\mathrm{sgn}}(V) \times \tilde V_1$$

(24)

for Ising variables s₁ and s₂, and a real interaction V. This is a gadget for decomposing two-body interactions, and can be proven by examining all the cases of s₁ and s₂.
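The case-by-case proof mentioned above can also be carried out numerically. The following Python sketch evaluates both sides of Eq. (21) for all four configurations of (s₁, s₂), using the parameters of Eqs. (22)–(24). The function names are illustrative assumptions and are not taken from the Supplementary Software.

```python
import math
from itertools import product


def two_body_gadget(V):
    """Return (C, V1, V2) of Eqs. (22)-(24) for a real coupling V."""
    C = 0.5 * math.exp(-abs(V))
    V1 = 0.5 * math.acosh(math.exp(2.0 * abs(V)))
    V2 = math.copysign(V1, V)  # sgn(V) * V1; V1 vanishes anyway when V = 0
    return C, V1, V2


def two_body_max_error(V):
    """Largest mismatch between both sides of Eq. (21) over all (s1, s2)."""
    C, V1, V2 = two_body_gadget(V)
    worst = 0.0
    for s1, s2 in product((-1, 1), repeat=2):
        lhs = math.exp(s1 * s2 * V)
        # Explicit trace over the auxiliary Ising variable s3.
        rhs = C * sum(math.exp(s1 * s3 * V1 + s2 * s3 * V2) for s3 in (-1, 1))
        worst = max(worst, abs(lhs - rhs))
    return worst


if __name__ == "__main__":
    for V in (0.7, -1.3):
        print(V, two_body_max_error(V))
```

The mismatch should be at the level of floating-point rounding for either sign of V, since the sign of V only enters through the relative sign of the two mediated couplings.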

By taking s₁ and s₂ as visible (physical) variables σ^z and s₃ as a hidden variable h, the direct classical two-body interaction between physical variables [the leftmost part in Eq. (21)] is cut and instead mediated by the hidden neuron h. A direct interaction between σ^z and d can be decomposed in the same way. In the following derivations of the DBM constructions, we sometimes introduce, for convenience, a direct interaction between σ^z and d, which is not allowed in the DBM structure. However, by taking s₁ as a visible spin σ^z, s₂ as a deep variable d, and s₃ as a hidden variable h in Eq. (21), one can eliminate the direct interaction between σ^z and d and replace it with an interaction mediated only by h, at the cost of an additional summation over the hidden variable h. With this trick, one can recover the standard DBM form in Eq. (1).

Another identity (decomposition of four-body interaction) is

$$\begin{array}{c}{\mathrm {e}}^{s_1s_2s_3s_4V} = \frac{1}{4}\mathop {\sum}\limits_{s_5,s_6,s_7} {\kern 1pt} {\mathrm{exp}}\left[ {i\frac{\pi }{4}\left( {s_5 + s_6} \right)\left( {s_1 + s_2 + s_3 + s_7} \right)} \right]\\ \times {\mathrm{exp}}\left( {s_4s_7V} \right)\\ = \mathop {\sum}\limits_{s_7} {\kern 1pt} {\mathrm{cos}}^2\left[ {\frac{\pi }{4}\left( {s_1 + s_2 + s_3 + s_7} \right)} \right]{\mathrm{exp}}\left( {s_4s_7V} \right)\end{array}$$

(25)

for Ising variables s_i with i = 1,…, 4. Although we have introduced complex couplings in the first line, each term in the summation in the second line of Eq. (25) is real and non-negative if V is real. The summand in the second line is nonzero only if s₁s₂ = s₃s₇, which proves the identity. This identity with s₁ and s₂ as physical variables, s₄, s₅, and s₆ as hidden variables, and s₃ and s₇ as deep variables, reads

$$\begin{array}{c}{\mathrm {e}}^{\sigma _1\sigma _2d_1h_1V} = \frac{1}{4}\mathop {\sum}\limits_{h_2,h_3,d_2} {\kern 1pt} {\mathrm{exp}}\left[ {i\frac{\pi }{4}\left( {h_2 + h_3} \right)\left( {\sigma _1 + \sigma _2 + d_1 + d_2} \right)} \right]\\ \times {\mathrm{exp}}\left( {h_1d_2V} \right).\end{array}$$

(26)

Note that the right-hand side fits the DBM structure.
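As a sanity check of the four-body decomposition, the first line of Eq. (25) can be summed explicitly over the three auxiliary variables and compared with the left-hand side for all 16 configurations of (s₁, …, s₄). The Python sketch below does this; the function names are illustrative assumptions.

```python
import cmath
import math
from itertools import product


def four_body_rhs(s1, s2, s3, s4, V):
    """First line of Eq. (25): trace over the auxiliary variables s5, s6, s7."""
    total = 0j
    for s5, s6, s7 in product((-1, 1), repeat=3):
        phase = cmath.exp(1j * math.pi / 4 * (s5 + s6) * (s1 + s2 + s3 + s7))
        total += phase * math.exp(s4 * s7 * V)
    return total / 4


def four_body_max_error(V):
    """Largest mismatch between both sides of Eq. (25) over all (s1, ..., s4)."""
    worst = 0.0
    for s1, s2, s3, s4 in product((-1, 1), repeat=4):
        lhs = math.exp(s1 * s2 * s3 * s4 * V)
        worst = max(worst, abs(lhs - four_body_rhs(s1, s2, s3, s4, V)))
    return worst


if __name__ == "__main__":
    print(four_body_max_error(0.4))
```

Summing over s₅ and s₆ produces the cos² factor of the second line, which vanishes unless s₇ = s₁s₂s₃, so the imaginary parts cancel up to rounding.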

General three-body and two-body interactions can also be represented by the two-body form by fixing some of s₁,…,s₄ to constants in Eq. (25). These could be used instead of Eq. (21), although we employ Eq. (21) in the formalism below for the decoupling of the two-body interaction.

Finally, we discuss the gadget for decomposing general N-body classical interactions using a complex bias term b_j in addition to the couplings W and W′, whereas the gadgets Eqs. (21) and (26) are represented only by W and W′ interactions. The gadget reads

$${\mathrm {e}}^{\sigma _1\sigma _2 \ldots \sigma _NV} = C\,{\mathrm{cos}}^2\left( {b + \frac{\pi }{4}\mathop {\sum}\limits_{i = 1}^N {\kern 1pt} \sigma _i} \right)$$

(27)

$$= \frac{C}{4}\mathop {\sum}\limits_{h_1,h_2} {\kern 1pt} {\mathrm {e}}^{ib\left( {h_1 + h_2} \right)}{\mathrm {e}}^{i\frac{\pi }{4}\left( {h_1 + h_2} \right)\left( {\sigma _1 + \sigma _2 + \ldots + \sigma _N} \right)}$$

(28)

with

$$b = {\mathrm{arctan}}\left( {{\mathrm {e}}^{ - V}} \right) - \frac{\pi }{4}{\mathrm{mod}}\left( {N,4} \right),$$

(29)

$$C = \frac{1}{{\mathrm{cos}}\left( {{\mathrm{arctan}}\left( {{\mathrm {e}}^{ - V}} \right)} \right) \times {\mathrm{sin}}\left( {{\mathrm{arctan}}\left( {{\mathrm {e}}^{ - V}} \right)} \right)}.$$

(30)

This shows that any classical partition function defined for Ising spins can be written exactly in terms of an RBM. Although the RBM has also been shown to be powerful in representing quantum states, there is no analytical way to map quantum states to the RBM, and one must rely on numerical optimization to obtain the RBM parameters. In the present study, we show analytical mappings from quantum states to the DBM, which has an additional hidden layer. In statistical mechanics, it is known that D-dimensional quantum systems can be mapped onto (D + 1)-dimensional classical systems. Therefore, having an additional hidden layer in neural-network language is equivalent to acquiring an additional dimension in statistical mechanics.

Transverse-field Ising model
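The N-body gadget of Eqs. (27)–(30) can be checked by brute force for small N. The Python sketch below compares the left-hand side with both the cos² form of Eq. (27) and the explicit sum over two hidden variables in Eq. (28). The function names are illustrative assumptions.

```python
import cmath
import math
from itertools import product


def n_body_params(V, N):
    """Bias b and prefactor C of Eqs. (29) and (30)."""
    theta = math.atan(math.exp(-V))
    b = theta - math.pi / 4 * (N % 4)
    C = 1.0 / (math.cos(theta) * math.sin(theta))
    return b, C


def n_body_max_error(V, N):
    """Largest mismatch of Eqs. (27) and (28) over all 2**N spin configurations."""
    b, C = n_body_params(V, N)
    worst = 0.0
    for sigma in product((-1, 1), repeat=N):
        S = sum(sigma)
        lhs = math.exp(math.prod(sigma) * V)
        cos_form = C * math.cos(b + math.pi / 4 * S) ** 2            # Eq. (27)
        hidden_sum = C / 4 * sum(                                    # Eq. (28)
            cmath.exp(1j * (b + math.pi / 4 * S) * (h1 + h2))
            for h1, h2 in product((-1, 1), repeat=2)
        )
        worst = max(worst, abs(lhs - cos_form), abs(lhs - hidden_sum))
    return worst


if __name__ == "__main__":
    for N in range(1, 7):
        print(N, n_body_max_error(0.6, N))
```

Flipping one spin shifts the argument of the cosine by π/2, which exchanges cos² and sin²; this is why a single pair of hidden variables suffices for any N.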

The solution of Eq. (9) is found in the following way. The left-hand side of Eq. (9) can be rewritten by using the notation Eq. (18) as

$$\begin{array}{l}\mathop {\sum}\limits_{\{ h,d\} } {\kern 1pt} P_1\left( {\sigma ^z,h} \right)P_2(h,d)\left[ {1 + {\mathrm{tanh}}\left( {{\mathrm{\Gamma }}_l\delta _\tau } \right){\mathrm {e}}^{ - 2\sigma _l^z\mathop {\sum}\limits_j {\kern 1pt} h_jW_{lj}}} \right]\\ = C{\mathrm{\Psi }}_{\bar {\cal W}}\left( {\sigma ^z} \right).\end{array}$$

(31)

We look for a solution by adding one deep neuron d_[l] and creating new couplings $W_{j[l]}^\prime$ to the existing hidden neurons h_j which are connected to $\sigma _l^z$. We also allow for changes in the existing interaction parameters. In particular, we set the updated couplings to be $\bar W_{lj} = W_{lj} + {\mathrm{\Delta }}W_{lj}$ (with ΔW_lj to be determined). Moreover, we introduce one hidden neuron h_[l] coupled to $\sigma _l^z$ and d_[l] through the interactions W_l[l] and $W_{[l][l]}^\prime$, respectively. If we trace out h_[l], the hidden neuron h_[l] mediates the interaction between $\sigma _l^z$ and d_[l] (denoted as $W_{l[l]}^{\prime\prime}$).

With this choice, we have (in the representation where h_[l] is traced out):

$$\begin{array}{*{20}{l}} {{\mathrm{\Psi }}_{\bar {\cal W}}\left( {\sigma ^z} \right)} \hfill & = \hfill & {\mathop {\sum}\limits_{\{ h,d\} } \mathop {\sum}\limits_{d_{[l]}} {\kern 1pt} P_1\left( {\sigma ^z,h} \right)P_2(h,d)} \hfill \\ {} \hfill & {} \hfill & {{\mathrm {e}}^{\sigma _l^z\mathop {\sum}\limits_j {\kern 1pt} {\mathrm{\Delta }}W_{lj}h_j + d_{[l]}\mathop {\sum}\limits_j {\kern 1pt} h_jW_{j[l]}^\prime + \sigma _l^zd_{[l]}W_{l[l]}^{\prime\prime} }.} \hfill \end{array}$$

(32)

The equations to be verified are obtained considering the two possible values of $\sigma _l^z = \pm 1$:

$${{\mathrm {e}}^{\mathop {\sum}\limits_j \,h_j\left( {{\mathrm{\Delta }}W_{lj} + W_{j[l]}^\prime } \right) + W_{l[l]}^{\prime\prime} } + {\mathrm {e}}^{\mathop {\sum}\limits_j {\kern 1pt} h_j\left( {{\mathrm{\Delta }}W_{lj} - W_{j[l]}^\prime } \right) - W_{l[l]}^{\prime\prime} } = C \times \left( {1 + {\mathrm{tanh}}\left( {{\mathrm{\Gamma }}_l\delta _\tau } \right){\mathrm {e}}^{ - 2\mathop {\sum}\limits_j {\kern 1pt} h_jW_{lj}}} \right)}$$

(33)

$${{\mathrm {e}}^{\mathop {\sum}\limits_j {\kern 1pt} h_j\left( { - {\mathrm{\Delta }}W_{lj} + W_{j[l]}^\prime } \right) - W_{l[l]}^{\prime\prime} } + {\mathrm {e}}^{\mathop {\sum}\limits_j {\kern 1pt} h_j\left( { - {\mathrm{\Delta }}W_{lj} - W_{j[l]}^\prime } \right) + W_{l[l]}^{\prime\prime} } = C \times \left( {1 + {\mathrm{tanh}}\left( {{\mathrm{\Gamma }}_l\delta _\tau } \right){\mathrm {e}}^{2\mathop {\sum}\limits_j {\kern 1pt} h_jW_{lj}}} \right).}$$

(34)

These equations are solved by requiring that the hidden-unit interactions on the left- and right-hand sides match; thus we require

$${\mathrm{\Delta }}W_{lj} + W_{j[l]}^\prime = - 2W_{lj}$$

(35)

$${\mathrm{\Delta }}W_{lj} - W_{j[l]}^\prime = 0,$$

(36)

and

$$W_{l[l]}^{\prime\prime} = \frac{{{\mathrm{log}} \, {\mathrm{tanh}}\left( {{{\Gamma }}_{l}\delta _{\tau} } \right)}}{2}.$$

(37)

Notice that when Γ_l > 0, $W_{l[l]}^{\prime\prime}$ is also real. By using Eq. (21) with the following replacement $s_1 \to \sigma _l^z$, s₂ → d_[l], s₃ → h_[l], $V \to W_{l[l]}^{\prime\prime}$, $\tilde V_1 \to W_{l[l]}$, and $\tilde V_2 \to W_{[l][l]}^\prime$, the last condition determines the real couplings W_l[l] and $W_{[l][l]}^\prime$ as Eqs. (11) and (12).
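The solution above can be verified by inserting Eqs. (35)–(37) into Eqs. (33) and (34) for every configuration of the hidden units. The Python sketch below does so for a small set of test couplings W_lj. It assumes Γ_l > 0, takes the constant C on the right-hand side of Eqs. (33) and (34) to be exp(−W″), which follows from matching the h-independent terms, and uses illustrative function names.

```python
import math
from itertools import product


def tfim_update(W_l, gamma, dtau):
    """Couplings solving Eqs. (35)-(37) for one site l (requires gamma > 0)."""
    dW = [-w for w in W_l]                           # Delta W_lj
    Wp = [-w for w in W_l]                           # W'_{j[l]}
    Wpp = 0.5 * math.log(math.tanh(gamma * dtau))    # W''_{l[l]}, Eq. (37)
    C = math.exp(-Wpp)                               # from the h-independent terms
    return dW, Wp, Wpp, C


def tfim_max_rel_error(W_l, gamma, dtau):
    """Largest relative mismatch of Eqs. (33)-(34) over all h and sigma_l."""
    dW, Wp, Wpp, C = tfim_update(W_l, gamma, dtau)
    t = math.tanh(gamma * dtau)
    worst = 0.0
    for h in product((-1, 1), repeat=len(W_l)):
        hW = sum(hj * w for hj, w in zip(h, W_l))
        hdW = sum(hj * w for hj, w in zip(h, dW))
        hWp = sum(hj * w for hj, w in zip(h, Wp))
        for sigma in (-1, 1):
            # Left-hand side: trace over the new deep neuron d_[l].
            lhs = sum(math.exp(sigma * hdW + d * hWp + sigma * d * Wpp)
                      for d in (-1, 1))
            rhs = C * (1.0 + t * math.exp(-2.0 * sigma * hW))
            worst = max(worst, abs(lhs - rhs) / rhs)
    return worst


if __name__ == "__main__":
    print(tfim_max_rel_error([0.3, -0.8, 0.5], gamma=1.0, dtau=0.05))
```

Since W″ is negative for small δ_τ, the subsequent application of Eq. (21) yields mediated couplings of opposite sign.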

Heisenberg model

Here, we show the derivation for the general form of the bond Hamiltonian, allowing for anisotropy and bond disorder: ${\cal H}_{lm}^{{\mathrm{bond}}} = J_{lm}^{xy}\left( {\sigma _l^x\sigma _m^x + \sigma _l^y\sigma _m^y} \right) + J_{lm}^z\sigma _l^z\sigma _m^z$. In the case of a bipartite lattice and antiferromagnetic exchange $J_{lm}^z,J_{lm}^{xy} > 0$, we further apply a local gauge transformation, namely a π rotation around the z-axis in spin space (σ^x → −σ^x and σ^y → −σ^y) on one of the sublattices, which gives a minus sign for the $\sigma _l^x\sigma _m^x$ and $\sigma _l^y\sigma _m^y$ interactions. This transformation is equivalent to taking

$$J_{lm}^{xy} \to - J_{lm}^{xy}.$$

(38)

The gauge transformation makes it possible to design a DBM neural network with real couplings {W, W′}, except for those that impose the “constraint” on the values of the deep neuron spins (see the following sections for more detail about the constraint). It ensures that the DBM algorithm has no negative-sign problem.

In the case of the antiferromagnetic Heisenberg model after the gauge transformation on the bipartite lattice, we must solve, for each bond,

$$\begin{array}{l}\delta _{\sigma _l^z,\sigma _m^z}{\mathrm {e}}^{ - \delta _\tau J_{lm}^z}{\mathrm{\Psi }}_{\cal W}\left( {\sigma ^z} \right) + \left( {1 - \delta _{\sigma _l^z,\sigma _m^z}} \right){\mathrm {e}}^{\delta _\tau J_{lm}^z}\\ \left( {{\mathrm{\Psi }}_{\cal W}\left( {\sigma ^z} \right){\mathrm{cosh}}\left( {2J_{lm}^{xy}\delta _\tau } \right) + {\mathrm{\Psi }}_{\cal W}\left( {\sigma _l^z \leftrightarrow \sigma _m^z} \right){\mathrm{sinh}}\left( {2J_{lm}^{xy}\delta _\tau } \right)} \right)\\ = C\left\langle {\sigma ^z{\mathrm{|\Psi }}_{\bar {\cal W}}} \right\rangle .\end{array}$$

(39)

It is also useful to explicitly write the expression for the exchange term in the second line above:

$$\begin{array}{r}{\mathrm{\Psi }}_{\cal W}\left( {\sigma ^z} \right){\mathrm{cosh}}\left( {2J_{lm}^{xy}\delta _\tau } \right) + {\mathrm{\Psi }}_{\cal W}\left( {\sigma _l^z \leftrightarrow \sigma _m^z} \right){\mathrm{sinh}}\left( {2J_{lm}^{xy}\delta _\tau } \right)\\ = \mathop {\sum}\limits_{\{ h,d\} } {\kern 1pt} P_1\left( {\sigma ^z,h} \right)P_2(h,d)\left[ {{\mathrm{cosh}}\left( {2J_{lm}^{xy}\delta _\tau } \right) } \right.\\ \left. +{{\mathrm{sinh}}\left( {2J_{lm}^{xy}\delta _\tau } \right)e^{\left( {\sigma _m^z - \sigma _l^z} \right)\mathop {\sum}\limits_j {\kern 1pt} h_j\left( {W_{lj} - W_{mj}} \right)}} \right].\end{array}$$

(40)

In the following derivations, for the antiferromagnetic Hamiltonian $\left( {J_{lm}^z,J_{lm}^{xy} \,> \,0} \right)$ after the gauge transformation, we look for a solution with zero bias terms ($a_i,b_j,b_k^\prime = 0$, ∀i, j, k). We can also derive a sign-problem-free solution for the imaginary time evolution in the absence of the explicit gauge transformation by introducing a complex bias term a_i. Indeed, in the “2 deep, 4 hidden” representation, we will explicitly show that taking a specific set of complex bias term a_i on physical spins is equivalent to the gauge transformation, making a solution free from the sign problem.

In a way similar to the TFIM, solutions of Eq. (39) can be found by specifying the structure of the DBM. Three examples follow.

1d–3h construction for Heisenberg model

We assume the structure of the updated wave function (corresponding to Eq. (32) for the TFIM) to be

$$\begin{array}{l}{\mathrm{\Psi }}_{\bar {\cal W}}\left( {\sigma ^z} \right) = \mathop {\sum}\limits_{\{ h,d\} } \mathop {\sum}\limits_{{\begin{array}{*{20}{c}} {d_{[lm]} = \pm 1} \\ {d_{[lm]} = \sigma _l^z\;{\mathrm{if}}\;\sigma _l^z = \sigma _m^z} \end{array}}} P_1\left( {\sigma ^z,h} \right)P_2(h,d)\\ {\mathrm {e}}^{\sigma _l^z\mathop {\sum}\limits_j {\kern 1pt} {\mathrm{\Delta }}W_{lj}h_j + d_{[lm]}\mathop {\sum}\limits_j h_jW_{j[lm]}^\prime + d_{[lm]}\sigma _l^zW_{l[lm]}^{\prime\prime} + V_{[lm]}\sigma _l^z\sigma _m^z}.\end{array}$$

(41)

Similarly to the case of the TFIM, a solution of Eq. (39) is given by

$${\mathrm{\Delta }}W_{lj} = - W_{lj} + W_{mj}$$

(42)

$$W_{j[lm]}^\prime = W_{lj} - W_{mj}.$$

(43)

and

$$W_{l[lm]}^{\prime\prime} = - \left( {{\mathrm{log}} \, {\mathrm{tanh}}\left( {2J_{lm}^{xy}\delta _\tau } \right)} \right){\mathrm{/}}2$$

(44)

$$V_{[lm]} = - \left( {{\mathrm{log}} \, {\mathrm{cosh}}\left( {2J_{lm}^{xy}\delta _\tau } \right)} \right){\mathrm{/}}2 - J_{lm}^z\delta _\tau$$

(45)

Notice that the first condition is equivalent to cutting all connections from spin l to the hidden units and attaching the spin l to all the hidden units connected to spin m, with an interaction W_mj.
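Eqs. (42)–(45) can be checked against Eqs. (39) and (40) configuration by configuration. In the Python sketch below, the weight generated by the ansatz of Eq. (41) is compared with the bracketed factor of Eqs. (39) and (40) for all values of the hidden units and of the two physical spins. The overall h-independent normalization, here called `norm` and equal to 1/√sinh(2J^{xy}δ_τ), is inferred from the matching and is an assumption of this sketch, as are the function names.

```python
import math
from itertools import product


def update_1d3h(W_l, W_m, Jxy, Jz, dtau):
    """Couplings of Eqs. (42)-(45) for one bond (l, m)."""
    dW = [wm - wl for wl, wm in zip(W_l, W_m)]       # Eq. (42)
    Wp = [wl - wm for wl, wm in zip(W_l, W_m)]       # Eq. (43)
    x = 2.0 * Jxy * dtau
    Wpp = -0.5 * math.log(math.tanh(x))              # Eq. (44)
    V = -0.5 * math.log(math.cosh(x)) - Jz * dtau    # Eq. (45)
    norm = 1.0 / math.sqrt(math.sinh(x))             # inferred normalization
    return dW, Wp, Wpp, V, norm


def max_rel_error_1d3h(W_l, W_m, Jxy, Jz, dtau):
    """Largest relative mismatch between Eq. (41) and Eqs. (39)-(40)."""
    dW, Wp, Wpp, V, norm = update_1d3h(W_l, W_m, Jxy, Jz, dtau)
    x = 2.0 * Jxy * dtau
    worst = 0.0
    for h in product((-1, 1), repeat=len(W_l)):
        hdW = sum(hj * w for hj, w in zip(h, dW))
        hWp = sum(hj * w for hj, w in zip(h, Wp))
        hdiff = sum(hj * (wl - wm) for hj, wl, wm in zip(h, W_l, W_m))
        for sl, sm in product((-1, 1), repeat=2):
            # Constraint of Eq. (41): d is pinned to sigma_l for parallel spins.
            allowed = (sl,) if sl == sm else (-1, 1)
            lhs = sum(math.exp(sl * hdW + d * hWp + d * sl * Wpp + V * sl * sm)
                      for d in allowed)
            if sl == sm:
                rhs = norm * math.exp(-Jz * dtau)
            else:
                rhs = norm * math.exp(Jz * dtau) * (
                    math.cosh(x) + math.sinh(x) * math.exp((sm - sl) * hdiff))
            worst = max(worst, abs(lhs - rhs) / rhs)
    return worst


if __name__ == "__main__":
    print(max_rel_error_1d3h([0.2, -0.4], [0.7, 0.1], Jxy=1.0, Jz=1.0, dtau=0.05))
```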

Although the terms proportional to $W_{l[lm]}^{\prime\prime}$ and V_[lm] do not satisfy the standard DBM form, they can be transformed to the DBM form by introducing new hidden neurons h_[lm1] and h_[lm2] [see the gadget Eq. (21)]:

$${\mathrm {e}}^{\sigma _l^zd_{[lm]}W_{l[lm]}^{\prime\prime} } = C_{[lm1]}\mathop {\sum}\limits_{h_{[lm1]}} {\kern 1pt} {\mathrm {e}}^{\sigma _l^zh_{[lm1]}W_{l[lm1]} + h_{[lm1]}d_{[lm]}W_{[lm1][lm]}^\prime },$$

(46)

with

$$W_{l[lm1]} = W_{[lm1][lm]}^\prime = \frac{1}{2}{\mathrm{arcosh}}\left( {\frac{1}{{{\mathrm{tanh}}\left( {2J_{lm}^{xy}\delta _\tau } \right)}}} \right).$$

(47)

Similarly, the coupling V_[lm] is decomposed as

$${\mathrm {e}}^{\sigma _l^z\sigma _m^zV_{[lm]}} = C_{[lm2]}\mathop {\sum}\limits_{h_{[lm2]}} {\kern 1pt} {\mathrm {e}}^{\sigma _l^zh_{[lm2]}W_{l[lm2]} + \sigma _m^zh_{[lm2]}W_{m[lm2]}},$$

(48)

with

$$W_{l[lm2]} = - W_{m[lm2]} = \frac{1}{2}{\mathrm{arcosh}}\left( {{\mathrm{cosh}}\left( {2J_{lm}^{xy}\delta _\tau } \right)e^{2J_{lm}^z\delta _\tau }} \right).$$

(49)

Finally, as discussed in the main text, the constraint $d_{[lm]} = \sigma _l^z$ when $\sigma _l^z = \sigma _m^z$ can be satisfied by adding a third hidden neuron h_[lm3] with purely imaginary iπ/6 couplings.

2d–6h construction for Heisenberg model

In this case, the form of the new wave function reads

$$\begin{array}{l}\Psi _{\bar {\cal W}}\left( {\sigma ^z} \right) = \mathop {\sum}\limits_{\{ h,d\} } \mathop {\sum}\limits_{\begin{array}{*{20}{c}} {d_{[l]},d_{[m]}} \\ {d_{[l]} + d_{[m]} = \sigma _l^z + \sigma _m^z} \end{array}} P_1\left( {\sigma ^z,h} \right)P_2(h,d)\\ {\mathrm {e}}^{\mathop {\sum}\limits_j \mathop {\sum}\limits_{n = l,m} {\kern 1pt} h_j\left( {{\mathrm{\Delta }}W_{nj}\sigma _n^z + W_{j[n]}^\prime d_{[n]}} \right) + \mathop {\sum}\limits_{n = l,m} \sigma _n^z\left( {W_{n[l]}^{\prime\prime} d_{[l]} + W_{n[m]}^{\prime\prime} d_{[m]}} \right)}.\end{array}$$

(50)

A solution of Eq. (39) is given by

$$W_{j[l]}^\prime = W_{lj},$$

(51)

$$W_{j[m]}^\prime = W_{mj},$$

(52)

$${\mathrm{\Delta }}W_{lj} = - W_{lj},$$

(53)

$${\mathrm{\Delta }}W_{mj} = - W_{mj},$$

(54)

and

$$W_{l[l]}^{\prime\prime} = W_{m[m]}^{\prime\prime} = - \frac{{J_{lm}^z\delta _\tau }}{2} - \frac{1}{4}{\mathrm{log}} \, {\mathrm{sinh}}\left( {2J_{lm}^{xy}\delta _\tau } \right),$$

(55)

$$W_{l[m]}^{\prime\prime} = W_{m[l]}^{\prime\prime} = - \frac{{J_{lm}^z\delta _\tau }}{2} - \frac{1}{4}{\mathrm{log}} \, {\mathrm{ cosh}}\left( {2J_{lm}^{xy}\delta _\tau } \right).$$

(56)

The direct interactions between $\left( {\sigma _l^z,d_{[l]}} \right)$, $\left( {\sigma _m^z,d_{[m]}} \right)$, $\left( {\sigma _l^z,d_{[m]}} \right)$, and $\left( {\sigma _m^z,d_{[l]}} \right)$, are mediated by h_[lm1], h_[lm2], h_[lm3], and h_[lm4], respectively, as follows:

$$\begin{array}{c}{\mathrm {e}}^{\sigma _l^zd_{[l]}W_{l[l]}^{\prime\prime} } = C_{[lm1]} \mathop {\sum}\limits_{h_{[lm1]}} {\kern 1pt} {\mathrm {e}}^{\sigma _l^zh_{[lm1]}W_{l[lm1]} + h_{[lm1]}d_{[l]}W_{[lm1][l]}^\prime },\end{array}$$

(57)

$$\begin{array}{c}{\mathrm {e}}^{\sigma _m^zd_{[m]}W_{m[m]}^{\prime\prime} } = C_{[lm2]} \mathop {\sum}\limits_{h_{[lm2]}} {\kern 1pt} {\mathrm {e}}^{\sigma _m^zh_{[lm2]}W_{m[lm2]} + h_{[lm2]}d_{[m]}W_{[lm2][m]}^\prime },\end{array}$$

(58)

$$\begin{array}{c}{\mathrm {e}}^{\sigma _l^zd_{[m]}W_{l[m]}^{\prime\prime} } = C_{[lm3]} \mathop {\sum}\limits_{h_{[lm3]}} {\kern 1pt} {\mathrm {e}}^{\sigma _l^zh_{[lm3]}W_{l[lm3]} + h_{[lm3]}d_{[m]}W_{[lm3][m]}^\prime },\end{array}$$

(59)

$$\begin{array}{c}{\mathrm {e}}^{\sigma _m^zd_{[l]}W_{m[l]}^{\prime\prime} } = C_{[lm4]} \mathop {\sum}\limits_{h_{[lm4]}} {\kern 1pt} {\mathrm {e}}^{\sigma _m^zh_{[lm4]}W_{m[lm4]} + h_{[lm4]}d_{[l]}W_{[lm4][l]}^\prime }.\end{array}$$

(60)

By applying the gadget Eq. (21), the new W and W′ interactions are given by, for small δ_τ (such that ${\textstyle{{{\mathrm {e}}^{ - J_{lm}^z\delta _\tau }} \over {\sqrt {{\mathrm{sinh}}\left( {2J_{lm}^{xy}\delta _\tau } \right)} }}} > 1$),

$$\begin{array}{c}W_{l[lm1]} = W_{[lm1][l]}^\prime = W_{m[lm2]} = W_{[lm2][m]}^\prime \\ = \frac{1}{2}{\mathrm{arcosh}}\left( {\frac{{{\mathrm {e}}^{ - J_{lm}^z\delta _\tau }}}{{\sqrt {{\mathrm{sinh}}\left( {2J_{lm}^{xy}\delta _\tau } \right)} }}} \right)\end{array}$$

(61)

and

$$\begin{array}{c}W_{l[lm3]} = - W_{[lm3][m]}^\prime = W_{m[lm4]} = - W_{[lm4][l]}^\prime \\ = \frac{1}{2}{\mathrm{arcosh}}\left( {\sqrt {{\mathrm{cosh}}\left( {2J_{lm}^{xy}\delta _\tau } \right)} \times {\mathrm {e}}^{J_{lm}^z\delta _\tau }} \right).\end{array}$$

(62)
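The couplings of Eqs. (55) and (56) and their gadget decomposition, Eqs. (61) and (62), can be checked with a few lines of Python. The sketch below evaluates the direct σ–d weight in the exponent of Eq. (50) for the three allowed cases (parallel spins, antiparallel spins with d = σ, antiparallel spins with d exchanged) and compares the ratios with those required by Eq. (39). Function names are illustrative assumptions, and the overall normalization is left unspecified by taking ratios.

```python
import math


def couplings_2d6h(Jxy, Jz, dtau):
    """Direct sigma-d couplings of Eq. (55) (A) and Eq. (56) (B)."""
    x = 2.0 * Jxy * dtau
    A = -0.5 * Jz * dtau - 0.25 * math.log(math.sinh(x))
    B = -0.5 * Jz * dtau - 0.25 * math.log(math.cosh(x))
    return A, B


def direct_weight(sl, sm, dl, dm, A, B):
    """Weight from the W'' terms in the exponent of Eq. (50)."""
    return math.exp(sl * (A * dl + B * dm) + sm * (B * dl + A * dm))


def check_2d6h(Jxy, Jz, dtau):
    """Relative errors of the diagonal and exchange weights against Eq. (39)."""
    A, B = couplings_2d6h(Jxy, Jz, dtau)
    x = 2.0 * Jxy * dtau
    ref = direct_weight(1, 1, 1, 1, A, B)        # parallel spins, d = sigma
    stay = direct_weight(1, -1, 1, -1, A, B)     # antiparallel, d = sigma
    swap = direct_weight(1, -1, -1, 1, A, B)     # antiparallel, d exchanged
    err_stay = abs(stay / ref / (math.exp(2 * Jz * dtau) * math.cosh(x)) - 1.0)
    err_swap = abs(swap / ref / (math.exp(2 * Jz * dtau) * math.sinh(x)) - 1.0)
    return err_stay, err_swap


def mediated_couplings(A, B):
    """Magnitudes in Eqs. (61) and (62) via Eq. (21); assumes A > 0 > B."""
    W_same = 0.5 * math.acosh(math.exp(2.0 * A))     # Eq. (61)
    W_cross = 0.5 * math.acosh(math.exp(-2.0 * B))   # Eq. (62)
    return W_same, W_cross


if __name__ == "__main__":
    print(check_2d6h(1.0, 1.0, 0.05))
    print(mediated_couplings(*couplings_2d6h(1.0, 1.0, 0.05)))
```

The condition A > 0 used in `mediated_couplings` is the small-δ_τ condition quoted before Eq. (61).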

Finally, the constraint $d_{[l]} + d_{[m]} = \sigma _l^z + \sigma _m^z$ can be imposed by introducing two additional hidden neurons h_[lm5] and h_[lm6] with complex couplings:

$$\begin{array}{r}\mathop {\sum}\limits_{h_{[lm5]},h_{[lm6]}} {\kern 1pt} {\mathrm {e}}^{i\frac{\pi }{4}\left( {\left( {\sigma _l^z + \sigma _m^z} \right)h_{[lm5]} - h_{[lm5]}\left( {d_{[l]} + d_{[m]}} \right)} \right)}\\ \times {\mathrm {e}}^{i\frac{\pi }{8}\left( {\left( {\sigma _l^z + \sigma _m^z} \right)h_{[lm6]} - h_{[lm6]}\left( {d_{[l]} + d_{[m]}} \right)} \right)}\end{array}$$

(63)

This term gives an interaction among d_[l], d_[m], $\sigma _l^z$ and $\sigma _m^z$ of the form $4\,{\mathrm{cos}}\left( {{\textstyle{\pi \over 4}}\left( {\sigma _l^z + \sigma _m^z - d_{[l]} - d_{[m]}} \right)} \right)$ ${\mathrm{cos}}\left( {{\textstyle{\pi \over 8}}\left( {\sigma _l^z + \sigma _m^z - d_{[l]} - d_{[m]}} \right)} \right)$, which realizes the constraint.
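That the two-neuron factor of Eq. (63) acts as a projector onto $d_{[l]} + d_{[m]} = \sigma _l^z + \sigma _m^z$ can be confirmed by enumerating all 16 configurations. The following Python sketch (with an illustrative function name) evaluates the sum over h_[lm5] and h_[lm6] directly.

```python
import cmath
import math
from itertools import product


def constraint_factor(sl, sm, dl, dm):
    """Eq. (63): trace over the two constraint neurons h_[lm5] and h_[lm6]."""
    x = sl + sm - dl - dm
    total = 0j
    for h5, h6 in product((-1, 1), repeat=2):
        total += (cmath.exp(1j * math.pi / 4 * h5 * x)
                  * cmath.exp(1j * math.pi / 8 * h6 * x))
    return total


if __name__ == "__main__":
    for sl, sm, dl, dm in product((-1, 1), repeat=4):
        print(sl, sm, dl, dm, round(abs(constraint_factor(sl, sm, dl, dm)), 12))
```

The π/4 neuron removes configurations in which the two sums differ by ±2, and the π/8 neuron removes those in which they differ by ±4; allowed configurations receive the constant weight 4.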

2d–4h construction for Heisenberg model

For this construction, we assume the following structure for the wave function after applying the propagator:

$$\begin{array}{c}{\mathrm{\Psi }}_{\bar {\cal W}}\left( {\sigma ^z} \right) = \mathop {\sum}\limits_{\{ h,d\} } \mathop {\sum}\limits_{d_{[l]}} {\kern 1pt} P_1\left( {\sigma ^z,h} \right)P_2(h,d){\mathrm {e}}^{\mathop {\sum}\limits_{j,n = l,m} {\kern 1pt} \sigma _n^zh_j{\mathrm{\Delta }}W_{nj}}\\ \times {\mathrm {e}}^{\mathop {\sum}\limits_j {\kern 1pt} h_jd_{[l]}W_{j[l]}^\prime + \mathop {\sum}\limits_{n = l,m} {\kern 1pt} \sigma _n^zd_{[l]}W_{n[l]}^{\prime\prime} + \mathop {\sum}\limits_j {\kern 1pt} \sigma _l^z\sigma _m^zh_jd_{[l]}Z_{lmj}}.\end{array}$$

(64)

In this case, we also look for a solution for the bond operator without the gauge transformation. This shows that the introduction of a complex bias term a_i can play the same role as the gauge transformation. Then, we need to solve

$$\begin{array}{c}\delta _{\sigma _l^z,\sigma _m^z}{\mathrm {e}}^{ - \delta _\tau J_{lm}^z}{\mathrm{\Psi }}_{\cal W}\left( {\sigma ^z} \right) + \left( {1 - \delta _{\sigma _l^z,\sigma _m^z}} \right){\mathrm {e}}^{\delta _\tau J_{lm}^z}\\ \left( {{\mathrm{\Psi }}_{\cal W}\left( {\sigma ^z} \right){\mathrm{cosh}}\left( {2J_{lm}^{xy}\delta _\tau } \right) - {\mathrm{\Psi }}_{\cal W}\left( {\sigma _l^z \leftrightarrow \sigma _m^z} \right){\mathrm{sinh}}\left( {2J_{lm}^{xy}\delta _\tau } \right)} \right)\\ = C\left\langle {\sigma ^z{\mathrm{|\Psi }}_{\bar {\cal W}}} \right\rangle .\end{array}$$

(65)

Note that the sign for ${\mathrm{\Psi }}_{\cal W}\left( {\sigma _l^z \leftrightarrow \sigma _m^z} \right){\mathrm{sinh}}\left( {2J_{lm}^{xy}\delta _\tau } \right)$ term is different from that in Eq. (39).

A solution of Eq. (65) is obtained as

$${\mathrm{\Delta }}W_{lj} = - {\mathrm{\Delta }}W_{mj} = - \frac{1}{2}\left( {W_{lj} - W_{mj}} \right),$$

(66)

where W_nj (n = l, m) is updated to $\bar W_{nj}$ with the increment ΔW_nj as $\bar W_{nj}$ = $W_{nj} + {\mathrm{\Delta }}W_{nj}$. The new couplings $W_{j[l]}^\prime$, Z_lmj and $W_{n[l]}^{\prime\prime}$ are also given by

$$W_{j[l]}^\prime = - Z_{lmj} = - \frac{1}{2}\left( {W_{lj} - W_{mj}} \right)$$

(67)

and

$$\begin{array}{c}W_{l[l]}^{\prime\prime} = \frac{1}{4}\left[ {{\mathrm{log}}\left[ { - {\mathrm {e}}^{ - 2a_{l - m}}{\mathrm{tanh}}\left( {2J_{lm}^{xy}\delta _\tau } \right)} \right]} \right.\\ \left. { + 2{\mathrm{arcosh}}\left[ {\frac{{{\mathrm {e}}^{ - 2J_{lm}^z\delta _\tau }}}{{\sqrt { - 2{\mathrm {e}}^{ - 2a_{l - m}}{\mathrm{sinh}}\left( {4J_{lm}^{xy}\delta _\tau } \right)} }}} \right]} \right]\end{array}$$

(68)

$$\begin{array}{c}W_{m[l]}^{\prime\prime} = \frac{1}{4}\left[ { - {\mathrm{log}}\left[ { - {\mathrm {e}}^{ - 2a_{l - m}}{\mathrm{tanh}}\left( {2J_{lm}^{xy}\delta _\tau } \right)} \right]} \right.\\ \left. { + 2{\mathrm{arcosh}}\left[ {\frac{{{\mathrm {e}}^{ - 2J_{lm}^z\delta _\tau }}}{{\sqrt { - 2{\mathrm {e}}^{ - 2a_{l - m}}{\mathrm{sinh}}\left( {4J_{lm}^{xy}\delta _\tau } \right)} }}} \right]} \right]\end{array}$$

(69)

with a_l−m = a_l − a_m. On a bipartite lattice, to avoid the negative sign (or complex phase) problem we need to keep $W_{l[l]}^{\prime\prime}$ and $W_{m[l]}^{\prime\prime}$ real. This can be achieved by choosing a_l = 0 for any l if J_lm < 0 (ferromagnetic case). For J_lm > 0 (antiferromagnetic case), we choose a_l = nπi with an arbitrary integer n if the site l belongs to sublattice A and a_l = (n + 1/2)πi if l belongs to sublattice B. This local gauge for J_lm > 0 is equivalent to the transformation $J_{lm}^{xy} \to - J_{lm}^{xy}$ with a_l = 0 for any site l. We further notice that $W_{m[l]}^{\prime\prime}$ can be taken positive if we take a sufficiently small δ_τ in Eq. (69), with the leading order term $- {\mathrm{log}}\left( {2J_{lm}^{xy}\delta _\tau } \right){\mathrm{/}}2$. On the other hand, in Eq. (68), the leading order term is negative (=−J_lmδ_τ).

To recover the original form of the DBM, we first use Eq. (21) with the replacements $s_1 \to \sigma _n^z$, s₂ → d_[l], s₃ → h_[n], C → D_n, $V \to W_{n[l]}^{\prime\prime}$, $\tilde V_1 \to W_{n[n]}$, and $\tilde V_2 \to W_{[n][l]}^\prime$ for n = l, m. Then a solution for D_n, W_n[n], and $W_{[n][l]}^\prime$ is given in terms of $W_{n[l]}^{\prime\prime}$ as
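The role of the sublattice-dependent bias can be illustrated numerically. The Python sketch below first evaluates the factor −exp(−2a_{l−m}) that appears inside the logarithm and the square root of Eqs. (68) and (69) for the antiferromagnetic gauge choice, and then evaluates both couplings for a small time step. Function names are illustrative assumptions; the sketch only checks that the gauge factor is real and positive, the signs of the two couplings, and the leading behavior of Eq. (69).

```python
import cmath
import math


def gauge_factor(a_l, a_m):
    """Factor -exp(-2 a_{l-m}) entering Eqs. (68) and (69)."""
    return -cmath.exp(-2.0 * (a_l - a_m))


def couplings_2d4h(Jxy, Jz, dtau, g=1.0):
    """Eqs. (68)-(69) for a real, positive gauge factor g = -exp(-2 a_{l-m})."""
    log_term = math.log(g * math.tanh(2.0 * Jxy * dtau))
    acosh_term = math.acosh(
        math.exp(-2.0 * Jz * dtau) / math.sqrt(2.0 * g * math.sinh(4.0 * Jxy * dtau)))
    W_ll = 0.25 * (log_term + 2.0 * acosh_term)     # Eq. (68)
    W_ml = 0.25 * (-log_term + 2.0 * acosh_term)    # Eq. (69)
    return W_ll, W_ml


if __name__ == "__main__":
    # Site l on sublattice A (a_l = 0), site m on sublattice B (a_m = i*pi/2).
    g = gauge_factor(0j, 0.5j * math.pi)
    print(g)
    print(couplings_2d4h(Jxy=1.0, Jz=1.0, dtau=1e-3, g=g.real))
```

With this gauge the arguments of the logarithm and of the square root are positive, so both couplings are real; the first is small and negative while the second is large and positive for small δ_τ.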

$$D_n = \frac{1}{2}{\mathrm{exp}}\left[ { - W_{n[l]}^{\prime\prime} } \right]$$

(70)

$$W_{n[n]} = W_{[n][l]}^\prime = \frac{1}{2}{\mathrm{arcosh}}\left( {{\mathrm{exp}}\left[ {2W_{n[l]}^{\prime\prime} } \right]} \right),$$

(71)

for positive $W_{n[l]}^{\prime\prime}$ and

$$D_n = \frac{1}{2}{\mathrm{exp}}\left[ {W_{n[l]}^{\prime\prime} } \right]$$

(72)

$$W_{n[n]} = - W_{[n][l]}^\prime = \frac{1}{2}{\mathrm{arcosh}}\left( {{\mathrm{exp}}\left[ { - 2W_{n[l]}^{\prime\prime} } \right]} \right),$$

(73)

for negative $W_{n[l]}^{\prime\prime}$ to give real W_n[n] and $W_{[n][l]}^\prime$.

To completely recover the original DBM form, we next use Eq. (26) by replacing σ₁ with $\sigma _l^z$, σ₂ with $\sigma _m^z$, d₁ with d_[l], d₂ with d_[lm], h₁ with h_j, h₂ with h_[lm1], h₃ with h_[lm2], and V with Z_lmj.

With these solutions, by ignoring the trivial constant factors including D_l and D_m, the evolution is described by introducing two deep and four hidden additional variables d_[l], d_[lm], h_[l], h_[m], h_[lm1], and h_[lm2] as

$$\begin{array}{c}{\mathrm{\Psi }}_{\bar {\cal W}}\left( {\sigma ^z} \right) = \mathop {\sum}\limits_{\{ \bar h,\bar d\} } {\kern 1pt} P_1\left( {\sigma ^z,h} \right)P_2(h,d){\mathrm{exp}}\left[ {\mathop {\sum}\limits_{j,n = l,m} {\kern 1pt} \sigma _n^zh_j{\mathrm{\Delta }}W_{nj}} \right.\\ + \mathop {\sum}\limits_j h_jd_{[l]}W_{j[l]}^\prime + \mathop {\sum}\limits_{n = l,m} {\kern 1pt} h_{[n]}\left( {\sigma _n^zW_{n[n]} + d_{[l]}W_{[n][l]}^\prime } \right)\\ \left. { + d_{[lm]}\mathop {\sum}\limits_j {\kern 1pt} h_jZ_{lmj} + \frac{{i\pi }}{4}\left( {h_{[lm1]} + h_{[lm2]}} \right)\left( {\sigma _l^z + \sigma _m^z + d_{[l]} + d_{[lm]}} \right)} \right],\end{array}$$

(74)

where $\left\{ {\bar h,\bar d} \right\}$ is a set consisting of the existing and new neurons.

Code availability

Computer codes to create the deep Boltzmann machine networks for each model are provided as Supplementary Software 1–4. Other code written for and used in this study is available from the corresponding author upon reasonable request.

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

Feynman, R. P. Space-time approach to non-relativistic quantum mechanics. Rev. Mod. Phys. 20, 367–387 (1948).
Article ADS MathSciNet Google Scholar
Dyson, F. J. The S matrix in quantum electrodynamics. Phys. Rev. 75, 1736–1755 (1949).
Article ADS MathSciNet Google Scholar
Hubbard, J. Calculation of partition functions. Phys. Rev. Lett. 3, 77–78 (1959).
Article ADS Google Scholar
Stratonovich, R. L. On a method of calculating quantum distribution functions. Sov. Phys. Dokl. 2, 416–419 (1957).
ADS MATH Google Scholar
Abrikosov, A. A. Methods of Quantum Field Theory in Statistical Physics. (Dover Publications, New York, 1975). revised edition.
Google Scholar
Binder, K. Applications of the Monte Carlo Method in Statistical Physics. (Springer Verlag, Berlin, 1984).
Book Google Scholar
Takahashi, M. & Imada, M. Monte carlo calculation of quantum systems. J. Phys. Soc. Jpn. 53, 963 (1984).
Article ADS CAS Google Scholar
Takahashi, M. & Imada, M. Monte carlo calculation of quantum systems. II. Higher order correction. J. Phys. Soc. Jpn. 53, 3765 (1984).
Article ADS CAS Google Scholar
Ceperley, D. Path-integrals in the theory of condensed helium. Rev. Mod. Phys. 67, 279–355 (1995).
Article ADS CAS Google Scholar
Suzuki, M. Relationship between d-dimensional quantal spin systems and (d + 1)-dimensional ising systems: equivalence, critical exponents and systematic approximants of the partition function and spin correlations. Prog. Theor. Phys. 56, 1454–1469 (1976).
Article ADS Google Scholar
Hirsch, J. E., Sugar, R., Scalapino, D. & Blankenbecler, R. Monte carlo simulations of one-dimensional fermion systems. Phys. Rev. B 26, 5033–5055 (1982).
Article ADS CAS Google Scholar
Beard, B. & Wiese, U.-J. Simulations of discrete quantum systems in continuous euclidean time. J. Phys. Rev. Lett. 77, 5130–5133 (1996).
Article ADS CAS Google Scholar
Sandvik, A. W. Stochastic series expansion method with operator-loop update. Phys. Rev. B 59, R14157–R14160 (1999).
Article ADS CAS Google Scholar
Prokof’ev, N. & Svistunov, B. Bold diagrammatic Monte Carlo technique: when the sign problem is welcome. Phys. Rev. Lett. 99, 250201 (2007).
Article ADS PubMed Google Scholar
Feynman, R. P. Atomic theory of the two-fluid model of liquid helium. Phys. Rev. 94, 262–277 (1954).
Article ADS CAS Google Scholar
Gros, C. Physics of projected wavefunctions. Ann. Phys. 189, 53–88 (1989).
Article ADS Google Scholar
Kashima, T. & Imada, M. Path-integral renormalization group method for numerical study on ground states of strongly correlated electronic systems. J. Phys. Soc. Jpn. 70, 2287–2299 (2001).
Article ADS CAS Google Scholar
Tahara, D. & Imada, M. Variational monte carlo method combined with quantum-number projection and multi-variable optimization. J. Phys. Soc. Jpn. 77, 114701 (2008).
Article ADS Google Scholar
Becca, F. & Sorella, S. Quantum Monte Carlo Approaches for Correlated Systems. (Cambridge University Press, Cambridge, UK; New York, NY, 2017).
Book Google Scholar
White, S. R. Density-matrix algorithms for quantum renormalization groups. Phys. Rev. B 48, 10345–10356 (1993).
Article ADS CAS Google Scholar
Orús, R. A practical introduction to tensor networks: Matrix product states and projected entangled pair states. Ann. Phys. 349, 117–158 (2014).
Article ADS MathSciNet Google Scholar
Carleo, G. & Troyer, M. Solving the quantum many-body problem with artificial neural networks. Science 355, 602–606 (2017).
Article ADS MathSciNet CAS PubMed Google Scholar
Torlai, G. et al. Many-body quantum state tomography with neural networks. Nat. Phys. 14, 447–450 (2018).
Article CAS Google Scholar
Nomura, Y., Darmawan, A. S., Yamaji, Y. & Imada, M. Restricted boltzmann machine learning for solving strongly correlated quantum systems. Phys. Rev. B 96, 205152 (2017).
Article ADS Google Scholar
Deng, D.-L., Li, X. & Das Sarma, S. Quantum entanglement in neural network states. Phys. Rev. X 7, 021021 (2017).
Google Scholar
Rocchetto, A., Grant, E., Strelchuk, S., Carleo, G. & Severini, S. Learning hard quantum distributions with variational autoencoders. npj Quantum Inf. 4, 28 (2018).
Glasser, I., Pancotti, N., August, M., Rodriguez, I. D. & Cirac, J. I. Neural networks quantum states, string-bond states and chiral topological states. Phys. Rev. X 8, 011006 (2018).
Kaubruegger, R., Pastori, L. & Budich, J. C. Chiral topological phases from artificial neural networks. Phys. Rev. B 97, 195136 (2018).
Cai, Z. Approximating quantum many-body wave-functions using artificial neural networks. Phys. Rev. B 97, 035116 (2018).
Saito, H. & Kato, M. Machine learning technique to find quantum many-body ground states of Bosons on a lattice. J. Phys. Soc. Jpn. 87, 014001 (2017).
Saito, H. Solving the Bose–Hubbard model with machine learning. J. Phys. Soc. Jpn. 86, 093001 (2017).
Chen, J., Cheng, S., Xie, H., Wang, L. & Xiang, T. Equivalence of restricted Boltzmann machines and tensor network states. Phys. Rev. B 97, 085104 (2018).
Clark, S. R. Unifying neural-network quantum states and correlator product states via tensor networks. J. Phys. A 51, 135301 (2018).
Deng, D.-L., Li, X. & Das Sarma, S. Machine learning topological states. Phys. Rev. B 96, 195145 (2017).
Gao, X. & Duan, L.-M. Efficient representation of quantum many-body states with deep neural networks. Nat. Commun. 8, 662 (2017).
Huang, Y. & Moore, J. E. Neural network representation of tensor network and chiral states. Preprint at http://arxiv.org/abs/1701.06246 (2017).
Salakhutdinov, R. & Hinton, G. Deep Boltzmann machines. Proc. Mach. Learn. Res. 5, 448–455 (2009).
Trotter, H. F. On the product of semi-groups of operators. Proc. Am. Math. Soc. 10, 545–551 (1959).
Freitas, N., Morigi, G. & Dunjko, V. Neural network operations and Susuki–Trotter evolution of neural network states. Preprint at http://arxiv.org/abs/1803.02118 (2018).
Salakhutdinov, R. & Hinton, G. An efficient learning procedure for deep Boltzmann machines. Neural Comput. 24, 1967–2006 (2012).
Evertz, H. G., Lana, G. & Marcu, M. Cluster algorithm for vertex models. Phys. Rev. Lett. 70, 875–879 (1993).
Ceperley, D. M. & Alder, J. Ground state of the electron gas by a stochastic method. Phys. Rev. Lett. 45, 566–569 (1980).

Acknowledgements

G.C. acknowledges useful discussions with Xun Gao and Markus Heyl. Y.N. and M.I. are grateful for useful discussions with Youhei Yamaji and Andrew S. Darmawan. Y.N. was financially supported by a Grant-in-Aid for Scientific Research (JSPS KAKENHI) (No. 17K14336). M.I. and Y.N. were financially supported by a Grant-in-Aid for Scientific Research (No. 16H06345) from the Ministry of Education, Culture, Sports, Science and Technology, Japan. Part of the calculations was done at the Supercomputer Center, Institute for Solid State Physics, University of Tokyo. This work was also supported in part by MEXT as a social and scientific priority issue (Creation of new functional devices and high-performance materials to support next-generation industries CDMSI) to be tackled by using the post-K computer. We also acknowledge the support provided by the RIKEN Advanced Institute for Computational Science through the HPCI System Research project (hp170263) supported by the Ministry of Education, Culture, Sports, Science, and Technology, Japan.

Author information

Authors and Affiliations

Center for Computational Quantum Physics, Flatiron Institute, 162 5th Avenue, New York, NY, 10010, USA
Giuseppe Carleo
Institute for Theoretical Physics, ETH Zurich, Wolfgang-Pauli-Str. 27, 8093, Zurich, Switzerland
Giuseppe Carleo
Department of Applied Physics, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan
Yusuke Nomura & Masatoshi Imada

Contributions

G.C. conceived the general idea and contributed the Ising model and the approximate RBM construction. G.C., Y.N. and M.I. each contributed one of the three Heisenberg model representations. Numerical simulations were performed by Y.N. and G.C. All authors contributed equally to the manuscript preparation and presentation of the results.

Corresponding author

Correspondence to Giuseppe Carleo.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary Information

Description of Additional Supplementary Files

Supplementary Software 1

Supplementary Software 2

Supplementary Software 3

Supplementary Software 4

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Carleo, G., Nomura, Y. & Imada, M. Constructing exact representations of quantum many-body systems with deep neural networks. Nat Commun 9, 5322 (2018). https://doi.org/10.1038/s41467-018-07520-3

Received: 26 February 2018
Accepted: 08 November 2018
Published: 14 December 2018
DOI: https://doi.org/10.1038/s41467-018-07520-3
