Letter

Neural-network quantum state tomography

Nature Physics 14, 447–450 (2018)


The experimental realization of increasingly complex synthetic quantum systems calls for the development of general theoretical methods to validate and fully exploit quantum resources. Quantum state tomography (QST) aims to reconstruct the full quantum state from simple measurements, and therefore provides a key tool to obtain reliable analytics1,2,3. However, exact brute-force approaches to QST place a high demand on computational resources, making them unfeasible for anything except small systems4,5. Here we show how machine learning techniques can be used to perform QST of highly entangled states with more than a hundred qubits, to a high degree of accuracy. We demonstrate that machine learning allows one to reconstruct traditionally challenging many-body quantities—such as the entanglement entropy—from simple, experimentally accessible measurements. This approach can benefit existing and future generations of devices ranging from quantum computers to ultracold-atom quantum simulators6,7,8.


Machine learning methods have been demonstrated to be particularly powerful at compressing high-dimensional data into low-dimensional representations9,10. Largely developed in the domain of data science, these techniques have recently been used to address fundamental questions in the domain of physical sciences. Applications to quantum many-body systems have been put forward in the last year, for example, to classify phases of matter11,12,13, and to simulate quantum systems14.

QST is itself a data-driven problem, in which we aim to obtain a complete quantum-mechanical description of a system, on the basis of a limited set of experimentally accessible measurements. While compressed sensing approaches15 reduce the experimental burden of full QST, large systems can be studied only through techniques requiring a feasible number of measurements. For example, permutationally invariant tomography16 makes efficient use of the symmetries of prototypical quantum optics states, and can be amenable to a large number of qubits. However, the general case of many-body systems is challenging for QST. In this context, matrix product states are the state-of-the-art tool for QST of low-entangled states17,18. For highly entangled quantum states resulting either from deep quantum circuits or high-dimensional physical systems, alternative representations are required for QST.

Here, we show how machine learning approaches can be used to find such representations. In particular, we argue that suitably trained artificial neural networks offer a natural and general way of performing QST driven by a limited amount of experimental data. Our approach is demonstrated on controlled artificial data sets, comprising measurements from several prototypical quantum states with a large number of degrees of freedom (qubits, spins and so on), which are thus hard for traditional QST approaches.

We consider here the goal of reconstructing a generic many-body target wavefunction \({\rm{\Psi}}({{\bf{x}}})\equiv \left\langle {{\bf{x}}}| {\rm{\Psi}}\right\rangle\), where x is some reference basis (for example, σz for spin-\(1/2\)). To act as the model, we use a representation of the many-body state in terms of artificial neural networks14:

$${\psi }_{{\rm{\lambda}},{{\rm{\mu}}}}({{\bf{x}}})=\sqrt{\frac{{p}_{\rm{\lambda}}({{\bf{x}}})}{{Z}_{\rm{\lambda}}}}{{\rm{e}}}^{i{\phi }_{\rm{\mu}}({{\bf{x}}})/2}$$

where the networks pλ(x) and ϕµ(x) represent, respectively, the amplitude and phase of the state, and Zλ is the normalization constant. The neural-network architecture we use in this work is based on the restricted Boltzmann machine (RBM). This architecture features a visible layer (describing the physical qubits) and a hidden layer of binary neurons, fully connected with weighted edges to the visible layer (see Methods). RBM states offer a compact variational representation of many-body quantum states, capable of sustaining non-trivial correlations, such as high entanglement, or topological features19,20,21,22,23,24. Specifically, we take pλ to be an RBM with parameters λ, and a separate RBM network, pµ with parameters µ to model the phase, ϕµ = log pµ(x). Our machine learning approach to QST is then carried out as follows. First, the RBM is trained on a data set consisting of a series of independent density measurements \({\left|{\rm{\Psi}}\left({{\bf{x}}}^{[b]}\right)\right|}^{2}\) realized in a collection of bases {x[b]} of the N-body quantum system. During this stage, the network parameters (λ, µ) are optimized to maximize the data-set likelihood, in a way that \({\left|{\psi }_{\rm{\lambda},{\rm{\mu}}}\left({{\mathbf{{x}}}}^{[b]}\right)\right|}^{2}\simeq {\left|{\rm{\Psi }}\left({{\mathbf{{x}}}}^{[b]}\right)\right|}^{2}\) (see Methods). Once trained, ψλ,μ(x) approximates both the wavefunction’s amplitudes and phases, thus reconstructing the target state. The accuracy of the reconstruction can be systematically improved by increasing the number of hidden neurons M in the RBM for fixed N, or equivalently the density of hidden units α = M/N (refs 14,25). One key feature of our QST approach is that it needs only raw data (that is, many experimental snapshots coming from single measurements), rather than estimates of expectation values of operators1,4,16,17,18. 
This set-up implies that we circumvent the need to achieve low levels of intrinsic Gaussian noise in the evaluations of mean values of operators.
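As a concrete illustration of the parametrization in equation (1), the following Python sketch (our illustration, not the authors' code; NumPy only, with brute-force enumeration, so it is limited to small N) assembles a normalized ψλ,μ(x) from two randomly initialized RBMs, one for the amplitude and one for the phase:

```python
import numpy as np

def rbm_log_p(v, W, b, c):
    """Unnormalized log p(v) of an RBM with the hidden layer traced out:
    log p(v) = b.v + sum_i log(1 + exp(c_i + W_i.v))."""
    return v @ b + np.sum(np.logaddexp(0.0, c + W @ v))

def rbm_wavefunction(amp_params, phase_params, n_visible):
    """Brute-force psi(x) = sqrt(p_lambda(x)/Z_lambda) * exp(i phi_mu(x)/2)."""
    configs = [np.array([(k >> j) & 1 for j in range(n_visible)], dtype=float)
               for k in range(2 ** n_visible)]
    log_p = np.array([rbm_log_p(v, *amp_params) for v in configs])
    phi = np.array([rbm_log_p(v, *phase_params) for v in configs])
    p = np.exp(log_p - log_p.max())
    p /= p.sum()                       # p_lambda(x) / Z_lambda
    return np.sqrt(p) * np.exp(1j * phi / 2)

rng = np.random.default_rng(0)
N, M = 4, 4                            # visible and hidden units (alpha = 1)
params = lambda: (rng.normal(scale=0.1, size=(M, N)), np.zeros(N), np.zeros(M))
psi = rbm_wavefunction(params(), params(), N)
print(abs(np.vdot(psi, psi) - 1.0) < 1e-9)   # normalized by construction: True
```

Exhaustive enumeration is only for checking; in practice the normalization constant is never computed, and configurations are drawn by Gibbs sampling (see Methods).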

To demonstrate this approach, we start by considering QST of the W state, a paradigmatic N-qubit multipartite entangled wavefunction defined as

$$\left|{{\rm{\Psi }}}_{W}\right\rangle =\frac{1}{\sqrt{N}}\left(\left|100\ldots \right\rangle +\ldots +\left|\ldots 001\right\rangle \right)$$

To mimic experiments, we generate several data sets with an increasing number of synthetic density measurements obtained by sampling from the W state in the σz basis. These measurements are used to train an RBM model featuring only the set of parameters λ, since the target \(\left|{{\rm{\Psi }}}_{W}\right\rangle\) is real and positive in this basis. After the training, we sample from \({\left|{\psi }_{\rm{\lambda}}\left({{\mathbf{\sigma }}}^{z}\right)\right|}^{2}\) and build a histogram of the frequency of the N components \(\left(\left|100\ldots \right\rangle, \left|010\ldots \right\rangle \ldots \right)\). In Fig. 1a we show three histograms obtained with a different number of samples in the training data set for N = 20 and α = 1. From the histograms, we see that the N components converge to equal frequency in the limit of large sample number, as expected for the W state. To better quantify the quality of this reconstruction, we compute the overlap \({O}_{W}=\left|\left\langle {{\rm{\Psi }}}_{W}| {\psi }_{{\rm{\lambda }}}\right\rangle \right|\) of the RBM wavefunction with the original W state. In Fig. 1b, \({O}_{W}\) is shown as a function of the number of samples in the training data sets for three different values of N. For a system size substantially larger than what is currently available in experiments26, an overlap \({O}_{W}\sim 1\) can be achieved with a moderate number of samples. As a comparison, for N = 8, a brute-force QST requires almost \({10}^{6}\) measurements4. Our RBM achieves similar accuracy in reconstructing the wavefunction with only about 100 N-bit measurements, a number comparable to other state-of-the-art QST approaches15,16,17. To examine a more challenging case for QST, one can augment the W state with a local phase shift \({\rm{\exp }}\left(i\theta \left({{\mathbf{\sigma }}}_{k}^{z}\right){\rm{/}}2\right)\) with a random phase \(\theta \left({{\mathbf{\sigma }}}_{k}^{z}\right)\) applied to each qubit.
QST must now be carried out using the full RBM wavefunction, equation (1), trained on 2(N − 1) additional bases. In Fig. 1 we plot a comparison between the exact phases (Fig. 1c) and the phases learned by the RBM (Fig. 1d) for N = 20 qubits, showing very good agreement (\({O}_{W}=0.997\)). We expect our approach to perform equally well for other paradigmatic quantum optics states. In the Supplementary Information we provide more details, including an examination of the effects of varying α on QST of the W state, a discussion of overfitting, and a demonstration that RBMs can compactly encode (that is, with a polynomial number of hidden units) the Greenberger–Horne–Zeilinger and Dicke states.
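The convergence behaviour in Fig. 1a,b can be mimicked with a toy estimator: below we draw synthetic σz shots from the W state and use raw component frequencies in place of a trained RBM (a deliberate simplification for illustration, not the actual training procedure) to watch the overlap approach 1 as the number of samples grows:

```python
import numpy as np

def sample_w_state(n_qubits, n_samples, rng):
    """Synthetic sigma^z measurements of |W>: each shot is a one-hot string,
    with the excited qubit uniformly distributed."""
    hot = rng.integers(0, n_qubits, size=n_samples)
    shots = np.zeros((n_samples, n_qubits), dtype=int)
    shots[np.arange(n_samples), hot] = 1
    return shots

def empirical_overlap(shots):
    """Overlap |<W|psi>| of the W state with the frequency-based wavefunction
    psi(x) = sqrt(empirical frequency of x)."""
    n = shots.shape[1]
    freqs = shots.mean(axis=0)            # frequency of each one-hot component
    amp = np.sqrt(freqs / freqs.sum())    # normalized empirical amplitudes
    return np.sum(amp / np.sqrt(n))       # W-state amplitudes are 1/sqrt(n)

rng = np.random.default_rng(1)
for n_samples in (50, 1000, 20000):       # same sample counts as Fig. 1a
    print(n_samples, round(empirical_overlap(sample_w_state(20, n_samples, rng)), 4))
```

By the Cauchy–Schwarz inequality the overlap is at most 1, reached only when all N components occur with equal frequency.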

Fig. 1: Benchmarking the neural-network tomography of the W state.
Fig. 1

a, Histogram of the occurrence of each of the superposed states in the W state for N = 20 qubits. We plot three histograms obtained by sampling an RBM trained on a data set containing 50 (red), 1,000 (blue) and 20,000 (green) independent samples. b, Overlap between the W state and the wavefunction generated by the trained RBM with α = 1 as a function of the number of samples \({N}_{S}\) in the training data set. c,d, Phases \(\theta \left({{\mathbf{\sigma }}}_{k}^{z}\right)\) for each of the N = 20 states (different colours) in the phase-augmented W state. We show the comparison between the exact phases (c) and the phases learned by an RBM (d), trained using 6,400 samples per basis (magnitudes of the phases are plotted along the radial direction). Here, RBM tomography systematically converges to the target W state as the number of experimental samples increases, both for real and for complex wavefunction coefficients.

We now turn to the case of more complex systems, and demonstrate QST for two interacting many-body problems that are directly relevant for quantum simulators based on ultracold ions and atoms. To mimic such experimental scenarios, we generate artificial data sets by sampling different quantum states of two lattice spin models: the transverse-field Ising model (TFIM), with Hamiltonian

$${\mathscr{H}}=\sum _{ij}{J}_{ij}{\sigma }_{i}^{z}{\sigma }_{j}^{z}-h\sum _{i}{\sigma }_{i}^{x}$$

and the XXZ spin-\(1/2\) model, with Hamiltonian

$${\mathscr{H}}=\sum _{ij}\left[{\rm{\Delta }}\left({\sigma }_{i}^{x}{\sigma }_{j}^{x}+{\sigma }_{i}^{y}{\sigma }_{j}^{y}\right)+{\sigma }_{i}^{z}{\sigma }_{j}^{z}\right]$$

where the \({\sigma }_{i}\) are Pauli spin operators.

First, we consider ground-state wavefunctions. Using quantum Monte Carlo (QMC) methods, we synthesize artificial data sets by sampling the exact ground states of equations (3) and (4) for different values of the coupling parameters h and Δ, and for nearest-neighbour interactions \({J}_{ij}=J\), in both one and two spatial dimensions. The quality of the learned wavefunctions is tested by computing various observables using the RBM, and comparing them with the exact values known via the QMC simulations. For the two-dimensional (2D) TFIM, Fig. 2a illustrates how the RBMs can reproduce the average values of both diagonal and off-diagonal observables to high accuracy for N = 144 spins. For the 2D XXZ model, Fig. 2b illustrates the expectation values of the diagonal \({\sigma }_{\rm{a}}^{z}{\sigma }_{\rm{b}}^{z}\) and off-diagonal \({\sigma }_{\rm{a}}^{x}{\sigma }_{\rm{b}}^{x}\) spin correlations, with a and b being neighbours along the lattice diagonal. In addition, we consider the full spin–spin \({\sigma }_{i}^{z}{\sigma }_{j}^{z}\) correlation function for the 1D TFIM, which involves non-local correlations. Figure 2d shows that the reconstructed RBM correlation function closely matches the exact result (obtained via QMC measurements in Fig. 2c). Here, deviations between the RBM and QMC are compatible with the statistical uncertainty due to the finiteness of the training set.
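The diagonal observables compared in Fig. 2 are simple averages over spin snapshots. A minimal sketch of the estimator for \(\left\langle {\sigma }_{i}^{z}{\sigma }_{j}^{z}\right\rangle\) (our illustration, assuming snapshots stored as ±1 arrays of shape (shots, N)):

```python
import numpy as np

def zz_correlations(snapshots):
    """Full matrix <sigma^z_i sigma^z_j>, estimated from +/-1 spin snapshots
    of shape (n_shots, N)."""
    return snapshots.T @ snapshots / snapshots.shape[0]

# sanity check on uncorrelated +/-1 spins: off-diagonal entries vanish
# on average, and the diagonal is exactly 1 since (sigma^z)^2 = 1
rng = np.random.default_rng(2)
shots = rng.choice([-1, 1], size=(50000, 5))
C = zz_correlations(shots)
print(np.allclose(np.diag(C), 1.0))   # True
```

The same estimator applies whether the snapshots come from an experiment, from QMC, or from sampling the trained RBM.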

Fig. 2: Tomography of ground and dynamically evolved states of many-body Hamiltonians.
Fig. 2

a–d, QST for ground states, comparing the reconstructed observables to those obtained with quantum Monte Carlo simulations. e–g, QST for the unitary evolution of a 1D chain following a quantum quench with a long-range Ising Hamiltonian with γ = 3/4. a, Diagonal and off-diagonal magnetizations as a function of the transverse field h for the ferromagnetic 2D TFIM on a square lattice with linear size L = 12 (N = 144). b, Two-point correlation function (diagonal and off-diagonal) between neighbouring spins along the diagonal of the square lattice (linear size L = 12) for the 2D XXZ model. Each data point is obtained from an RBM with α = 1/4, trained on a separate data set. RBM QST accurately reconstructs, for each model, both diagonal and off-diagonal observables of the target state. In the lower panels, we show the reconstruction of the diagonal spin correlation function \(\left\langle {\sigma }_{i}^{z}{\sigma }_{j}^{z}\right\rangle\) for the 1D TFIM with N = 100 sites at the critical point h = 1. c, Direct calculation on spin configurations from a test set much larger than the training data set. d, Reconstruction of the correlations by sampling the trained RBM with α = 1/2. e, Overlap between the system wavefunction Ψ(σ; t) and the RBM wavefunction ψλ,μ(σ) for t = 0.5, as a function of the number of samples \({N}_{S}\) per basis. In the inset we show the overlap as a function of time for different values of \({N}_{S}\). In the lower panels, we show the reconstruction of the 2N phases (rearranged as a 2D array) for N = 12 and t = 0.5. f, Exact phases \(\theta ({{\mathbf{\sigma }}}_{k})\) for each component \({\rm{\Psi }}({{\mathbf{\sigma }}}_{k};t)\). g, Phases \({\phi }_{{\rm{\mu }}}({{\mathbf{\sigma }}}_{k})\) learned by the RBM with α = 1.

To go beyond the case of ground-state wavefunctions, we also consider states originating from dynamics under unitary evolution. We focus on a case of ‘quench’ dynamics that is realizable in experiments with ultracold ions27. Specifically, we study 1D Ising spins initially prepared in the state \({{\rm{\Psi }}}_{0}=\left|\to, \to, \ldots, \to \right\rangle\) (fully polarized in the σx basis), subject to unitary dynamics enforced by the Hamiltonian in equation (3) with long-range interactions \({J}_{ij}\propto 1{\rm{/}}{\left|i-j\right|}^{\gamma }\) and magnetic field set to zero (h = 0). For a given time t, we perform QST on the state \(\left|{\rm{\Psi }}(t)\right\rangle ={\rm{\exp }}(-i{\mathscr{H}}t)\left|{{\rm{\Psi }}}_{0}\right\rangle\) by training the RBM on spin density measurements performed in 2N + 1 different bases. In Fig. 2e, we show the overlap between the RBM wavefunction ψλ,μ(σ) and the time-evolved state Ψ(σ; t) for different system sizes N, as a function of the number NS of samples per basis. In the lower plot, we show for N = 12 the exact (Fig. 2f) and the reconstructed phases (Fig. 2g).
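For this particular quench the target state can be written down in closed form: with h = 0 the long-range Ising Hamiltonian is diagonal in the σz basis, so each basis component of the x-polarized initial state merely acquires a phase e−iE(σ)t. A small Python sketch (our illustration; couplings normalized as Jij = 1/|i − j|γ, without any experiment-specific prefactor):

```python
import numpy as np

def quench_state(n, t, gamma=0.75):
    """|Psi(t)> = exp(-iHt)|->...->  for H = sum_ij J_ij sigma^z_i sigma^z_j
    with h = 0. H is diagonal in sigma^z, so evolution attaches the phase
    exp(-i E(sigma) t) to each basis state of the x-polarized initial state."""
    # all 2^n spin configurations as +/-1 values, one row per basis state
    spins = 1 - 2 * ((np.arange(2 ** n)[:, None] >> np.arange(n)) & 1)
    i, j = np.triu_indices(n, k=1)
    J = 1.0 / np.abs(i - j) ** gamma              # long-range couplings
    energies = (spins[:, i] * spins[:, j]) @ J    # diagonal energies E(sigma)
    psi0 = np.full(2 ** n, 2 ** (-n / 2))         # |-> ... ->: uniform amplitudes
    return psi0 * np.exp(-1j * energies * t)

psi = quench_state(8, 0.5)
print(abs(np.vdot(psi, psi).real - 1.0) < 1e-12)  # norm preserved: True
```

The amplitudes stay uniform at all times; only the phases evolve, which is exactly why this quench requires the phase network and measurements in additional bases.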

For both ground and dynamically evolved states, these results indicate that our neural-network QST is able to obtain high-quality results with a moderate number of measurements, important for ultracold atoms and similar systems where state preparation is costly.

Finally, we turn to the important and highly non-local quantum quantity that is perhaps the most challenging for direct experimental observation28, the entanglement entropy. Consider a bipartition of the physical system into a region A and its complement. The second Renyi entropy is defined as \({S}_{2}\left({\rho }_{{\rm{A}}}\right)=-{\rm{log}}\left({\rm{Tr}}\left({\rho }_{{\rm{A}}}^{2}\right)\right)\), with the reduced density matrix ρA describing the subsystem A. We estimate S2 by employing sampling of the ‘swap’ operator29 using the wavefunction generated by the RBM. In Fig. 3 we show the entanglement entropy for the 1D TFIM with three values of the transverse field, and for the critical (Δ = 1) 1D XXZ model. In both instances, we consider a chain with N = 20 spins and plot the entanglement entropy as a function of the subsystem size \(\ell \in [1,N{\rm{/}}2]\). From this, we see that values generated from the RBM agree with the exact entanglement entropy to within statistical errors. Using our approach, an estimate of the entanglement entropy from experimental data can then be obtained using only simple measurements of the density, currently accessible with cold atoms30.
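For reference, the quantity being reconstructed can be computed exactly from a state vector on small systems via the reduced density matrix; the 'swap' estimator of ref. 29 converges to the same value. A sketch (ours; region A taken as the first ℓ qubits):

```python
import numpy as np

def renyi2(psi, ell):
    """S2 = -log Tr(rho_A^2), with A = first ell qubits of state vector psi."""
    M = psi.reshape(2 ** ell, -1)   # bipartition: rows index A, columns the rest
    rho_A = M @ M.conj().T          # reduced density matrix of subsystem A
    return -np.log(np.real(np.trace(rho_A @ rho_A)))

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)            # maximally entangled pair
prod = np.kron([1, 0], np.array([1, 1]) / np.sqrt(2)) # product state
print(np.isclose(renyi2(bell, 1), np.log(2)))         # True: S2 = log 2
print(np.isclose(renyi2(prod, 1), 0.0, atol=1e-12))   # True: no entanglement
```

The swap-operator sampling used in the main text replaces the exact trace with a Monte Carlo average over two independent copies of the RBM wavefunction, which is what makes N = 20 (and beyond) tractable.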

Fig. 3: Reconstruction of the entanglement entropy for 1D lattice spin models.
Fig. 3

The second Renyi entropy as a function of the subsystem size \(\ell\) for N = 20 spins. We compare results obtained using the RBM wavefunctions (markers) with exact diagonalization (dashed lines) for the 1D TFIM at different values of the transverse magnetic field h and the 1D XXZ model with critical anisotropy Δ = 1.

Due to their power, flexibility and ease of use, unsupervised machine learning approaches such as those developed in this paper can readily be adapted to reconstruct complicated many-body quantum states from a limited number of experimental measurements. Our results suggest that RBM approaches will perform well on physically relevant many-body and quantum optics states, whereas poorer performance is expected for structureless, random states (as studied in the Supplementary Information). Feasible applications range from validating quantum computers and adiabatic simulators31, to reconstructing quantities that are challenging for a direct observation in experiments. In particular, we predict that the use of our machine learning approach for bosonic ultracold atom experiments will allow for the determination of the entanglement entropy on systems substantially larger than those currently accessible with quantum interference techniques28.


Experimental measurements and Kullback–Leibler divergences

We provide here a detailed description of the different steps required to perform quantum state tomography (QST) with neural networks for many-body quantum systems. We concentrate on the case of systems with two local degrees of freedom (spin-\(1/2\) particles, qubits and so on) and choose σ ≡ σz as the reference basis for the N-body wavefunction \({\rm{\Psi }}({\mathbf{{\sigma}}})\equiv \left\langle {\mathbf{{\sigma}}}| {\rm{\Psi}}\right\rangle\) we intend to reconstruct. This high-dimensional function can be approximated with an artificial neural network (NN). Given a set of input variables (for example, σ = σ1, σ2, …, σN), a NN is a highly nonlinear function whose output is determined by some internal parameters κ. The architecture of the network consists of a collection of elementary units, called neurons, connected by weighted edges. The strength of these connections, specified by the parameters κ, encodes conditional dependences among neurons, in turn leading to complex correlations among the input variables. Increasing the number of auxiliary neurons systematically improves the expressive power of the NN, which can then be used as a general-purpose approximator for the target wavefunction14. The goal of our tomography scheme is to find the best NN approximation of the many-body wavefunction, ψκ(σ), using only numerical data obtained through some outside means (such as simulation or experiment).

Our scheme proceeds as follows. First, we assume that a set of experimental measurements in a collection of bases b = 0, 1, …, \({N}_{B}-1\) is available. These measurements are distributed according to the probabilities \({P}_{b}\left({{\mathbf{{\sigma} }}}^{[b]}\right)\propto {\left|{\rm{\Psi }}\left({{\mathbf{{\sigma} }}}^{[b]}\right)\right|}^{2}\), and thus contain information about both the amplitudes and the phases of the wavefunction in the reference basis σ. The goal of the NN training is to find the optimal set of parameters κ such that ψκ(σ) mimics the data distribution in each basis as closely as possible; that is, \({\left|{\psi }_{{{{\rm{\kappa}}}}}\left({{\mathbf{{\sigma} }}}^{[b]}\right)\right|}^{2}\simeq {P}_{b}\left({{\mathbf{{\sigma} }}}^{[b]}\right)\). This is achieved by searching for the NN parameters that minimize the total statistical divergence Ξ(κ) between the target distributions and the reconstructed ones. Several choices are possible for Ξ(κ). Here, we define it as the sum of the Kullback–Leibler (KL) divergences in each basis:

$${\rm{\Xi}} ({\mathbf{\kappa }})\equiv \sum _{b=0}^{{N}_{B}-1}{{\rm{KL}}}_{{\rm{\kappa }}}^{[b]}=\sum _{b=0}^{{N}_{B}-1}\sum _{\left\{{{\mathbf{\sigma }}}^{[b]}\right\}}{P}_{b}\left({{\mathbf{\sigma }}}^{[b]}\right){\rm{log}}\frac{{P}_{b}\left({{\mathbf{\sigma }}}^{[b]}\right)}{{\left|{\psi }_{{\rm{\kappa }}}\left({{\mathbf{\sigma }}}^{[b]}\right)\right|}^{2}}$$

The total divergence Ξ(κ) is positive definite, and attains the minimum value of 0 when the reconstruction is perfect in each basis: \({\left|{\psi }_{\rm{\kappa }}\left({{\mathbf{\sigma }}}^{[b]}\right)\right|}^{2}={P}_{b}\left({{\mathbf{\sigma }}}^{[b]}\right)\). Depending on the target wavefunction, a sufficiently large set of measurement bases must be included in order to have enough information to estimate the phases in the reference basis. In practice, for most states of interest it is enough to include a number of bases that scales only polynomially with system size.
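In code, this objective is a few lines; the sketch below (our illustration) evaluates Ξ(κ) for explicitly enumerable distributions, with the usual 0 log 0 = 0 convention:

```python
import numpy as np

def total_kl(target_probs, model_probs):
    """Xi(kappa): sum over bases b of KL(P_b || |psi_kappa|^2), for lists of
    probability vectors over the measurement outcomes in each basis."""
    xi = 0.0
    for P, q in zip(target_probs, model_probs):
        mask = P > 0                 # 0 log 0 = 0 convention
        xi += np.sum(P[mask] * np.log(P[mask] / q[mask]))
    return xi

P = [np.array([0.5, 0.5]), np.array([0.9, 0.1])]
print(total_kl(P, P))                # perfect reconstruction in every basis: 0.0
print(total_kl(P, [np.array([0.6, 0.4]), np.array([0.9, 0.1])]) > 0)   # True
```

In practice the target probabilities are never available explicitly; the divergence is estimated from samples, which leads to the negative log-likelihood form of equation (9).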

Once the training is complete, the NN provides a compact representation ψκ(σ) of the target wavefunction Ψ(σ). In turn, this representation can be used to efficiently compute various observables of interest, overlaps with other known quantum states and other information not directly accessible in the experiment. In the next two subsections, we describe in detail the specific parametrization of the NN wavefunction adopted in this work and its optimization.

The RBM wavefunction

There are many possible architectures and NNs that can be employed to represent a quantum many-body state. Following ref. 14, we employ a powerful stochastic NN called a restricted Boltzmann machine (RBM). The network architecture of an RBM features two layers of stochastic binary neurons, a visible layer σ describing the physical variables, and a hidden layer h. The expressive power of the model can be characterized by the ratio α = M/N between the number of hidden neurons M and visible neurons N. An RBM is also an energy-based model, sharing many properties of physical models in statistical mechanics. In particular, it associates with the graph structure a probability distribution given by the Boltzmann distribution

$${p}_{{\rm{\kappa}}}({\mathbf{\sigma }},{\mathbf{{h}}})={{\rm{e}}}^{\mathop{\sum}\nolimits_{ij}{W}_{ij}^{\kappa }{h}_{i}{\sigma }_{j}+\mathop{\sum}\nolimits_{j}{b}_{j}^{\kappa }{\sigma }_{j}+\mathop{\sum}\nolimits_{i}{c}_{i}^{\kappa }{h}_{i}}$$

where we have omitted the normalization, and κ now consists of the weights Wκ connecting the two layers and the fields (biases) bκ and cκ coupled to each visible and hidden neuron, respectively. The distribution of interest, over the visible layer, is obtained by marginalizing over the hidden degrees of freedom

$${p}_{{\rm{\kappa }}}({\mathbf{\sigma }})=\sum _{{\mathbf{{h}}}}{p}_{{\rm{\kappa }}}({\mathbf{\sigma }},{\mathbf{{h}}})={{\rm{e}}}^{\mathop{\sum}\nolimits_{j}{b}_{j}^{{\rm{\kappa}}}{\sigma }_{j}+\mathop{\sum}\nolimits _{i}{\rm{log}}\left(1+{{\rm{e}}}^{{c}_{i}^{{\rm{\kappa}}}+\mathop{\sum}\nolimits_{j}{W}_{ij}^{{\rm{\kappa}}}{\sigma }_{j}}\right)}$$

The RBM wavefunction is then defined as

$${\psi }_{{\rm{\lambda }},{\rm{\mu }}}({\mathbf{\sigma }})=\sqrt{\frac{{p}_{{{\lambda }}}({\mathbf{\sigma }})}{{Z}_{{\rm{\lambda }}}}}{{\rm{e}}}^{i{\phi }_{{\rm{\mu }}}({\mathbf{\sigma }})/2}$$

where \({Z}_{{\rm {\lambda }}}={\sum }_{{\mathbf{\sigma }}}{p}_{{\rm {\lambda }}}({\mathbf{\sigma }})\) is the normalization constant, ϕμ(σ) = log pμ(σ), and λ and μ are the two sets of parameters. Note that sampling configurations σ from \({\left|{\psi }_{{\rm {\lambda }},{{\mu }}}({\mathbf{\sigma }})\right|}^{2}\) involves only the amplitude distribution pλ(σ)/Zλ. This can be achieved, as is usual for RBMs, by performing block Gibbs sampling with the two conditional distributions \({p}_{{\rm{\lambda }}}({\mathbf{\sigma }}| {\mathbf{{h}}})\) and \({p}_{{\rm{\lambda }}}({\mathbf{{h}}}| {{{\mathbf{\sigma}}}})\), which can be computed exactly. This procedure is very efficient: each neuron in one layer of the RBM is connected only to neurons in the other layer, so all units within a layer can be sampled simultaneously.
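A block Gibbs sweep of the kind described above can be sketched as follows (our illustration, using binary 0/1 units; a spin ±1 convention differs only by a linear change of variables):

```python
import numpy as np

def gibbs_step(v, W, b, c, rng):
    """One block Gibbs sweep: sample the whole hidden layer given v, then the
    whole visible layer given h. Valid because the RBM has no intra-layer
    couplings, so units within a layer are conditionally independent."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    h = (rng.random(W.shape[0]) < sigmoid(c + W @ v)).astype(float)
    v = (rng.random(W.shape[1]) < sigmoid(b + W.T @ h)).astype(float)
    return v

rng = np.random.default_rng(3)
N, M = 6, 6
W = rng.normal(scale=0.1, size=(M, N))
b, c = np.zeros(N), np.zeros(M)
v = rng.integers(0, 2, N).astype(float)
for _ in range(100):                 # run the chain towards p_lambda(v)
    v = gibbs_step(v, W, b, c, rng)
print(v.shape, set(v.tolist()) <= {0.0, 1.0})
```

After equilibration the visible configurations are distributed according to pλ(σ)/Zλ, which is all that is needed to estimate observables diagonal in the reference basis.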

Gradients of the total divergence

The first step in training the RBM is to build the data set of measurements. In general, different bases are needed to estimate both the amplitudes and the phases of the target state Ψ(σ). We define a series of data sets \({D}_{b}\) for each basis b = 0, 1, …, \({N}_{B}-1\), with each data set \({D}_{b}={\left\{{{\mathbf{\sigma }}}_{i}^{[b]}\right\}}_{i=1}^{\left|{D}_{b}\right|}\) consisting of \(\left|{D}_{b}\right|\) density measurements with underlying distribution \({P}_{b}\left({{\mathbf{\sigma }}}^{[b]}\right)\propto {\left|{\rm{\Psi }}\left({{\mathbf{\sigma }}}^{[b]}\right)\right|}^{2}\), where \({{\mathbf{\sigma }}}^{[b]}=\left({\sigma }_{1}^{[b]},\ldots, {\sigma }_{N}^{[b]}\right)\) and σ[0] = σ. The quantity to minimize, also called the negative log-likelihood, is then

$${\rm{\Xi}} ({\mathbf{\kappa }})=-\sum _{b=0}^{{N}_{B}-1}{\left|{D}_{b}\right|}^{-1}\sum _{{{\mathbf{\sigma }}}^{[b]}\in {D}_{b}}{\rm{log}}{\left|{\psi }_{{\rm{\lambda }},{\rm{\mu }}}\left({{\mathbf{\sigma }}}^{[b]}\right)\right|}^{2}$$

where we have omitted a constant term given by the sum of the cross-entropies of the data sets, \({\sum }_{b}{\mathbb{H}}({D}_{b})\). The NN wavefunction in the σ[b] basis is obtained as

$${\psi }_{{\rm{\lambda }},{\rm{\mu }}}\left({{\mathbf{\sigma }}}^{[b]}\right)=\sum _{\{{\mathbf{\sigma }}\}}{U}_{b}\left({\mathbf{\sigma}}, {{\mathbf{\sigma }}}^{[b]}\right){\psi }_{{\rm{\lambda}},{\rm{\mu }}}({\mathbf{\sigma }})$$

with \({U}_{b}\left({\mathbf{\sigma }},{{\mathbf{\sigma }}}^{[b]}\right)\) being the basis transformation matrix. The rotated state, ψλ,μ(σ[b]), can be computed efficiently, provided that \({U}_{b}\) acts non-trivially on a limited number of qubits.
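When \({U}_{b}\) is a product of single-qubit rotations, the transformation amounts to applying a 2×2 matrix along one tensor index of the state at a time. A sketch (ours; the state is stored as a dense vector, so this check is only viable for small N):

```python
import numpy as np

def rotate_qubit(psi, u, k, n):
    """Apply the 2x2 unitary u to qubit k of an n-qubit state vector psi,
    i.e. contract u along the k-th tensor index of psi."""
    psi = psi.reshape(2 ** k, 2, 2 ** (n - k - 1))
    return np.einsum('ab,ibj->iaj', u, psi).reshape(-1)

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # sigma^z -> sigma^x rotation
plus = np.full(4, 0.5)                         # |+>|+> for n = 2
psi_x = rotate_qubit(rotate_qubit(plus, H, 0, 2), H, 1, 2)
print(np.round(np.abs(psi_x), 6))              # [1. 0. 0. 0.]: |00> in x basis
```

Because each rotation touches only one tensor index, the cost per qubit is linear in the number of stored amplitudes, and only the rotated qubits ever need to be contracted.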

We proceed now to give the expressions for the various gradients needed in the training. By plugging equation (8) into equation (9), we obtain

$$\begin{array}{lll}{\rm{\Xi}} ({\mathbf{\lambda }},{\mathbf{\mu }}) & = & {N}_{B}{\rm{log}}{Z}_{{\rm{\lambda }}}-\sum _{b=0}^{{N}_{B}-1}{\left|{D}_{b}\right|}^{-1}\\ & & \sum _{{{\mathbf{\sigma }}}^{[b]}\in {D}_{b}}\left[{\rm{log}}\left(\sum _{\{{\mathbf{\sigma }}\}}{U}_{b}\left({\mathbf{\sigma }},{{\mathbf{\sigma }}}^{[b]}\right)\sqrt{{p}_{{\rm{\lambda }}}({\mathbf{\sigma }})}{{\rm{e}}}^{i{\phi }_{{\rm{\mu }}}({\mathbf{\sigma }})/2}\right)+{\rm{c.c.}}\right]\end{array}$$

We define now the gradients \({{\mathscr{D}}}_{{\rm{\kappa }}}({\mathbf{\sigma }})={{{\nabla }}}_{{\rm{\kappa }}}{\rm{log}}{p}_{{\rm{\kappa }}}({\mathbf{\sigma }})\) with κ = λ, μ, and the quasi-probability distribution

$${Q}_{b}\left({\mathbf{\sigma }},{{\mathbf{\sigma }}}^{[b]}\right)={U}_{b}\left({\mathbf{\sigma }},{{\mathbf{\sigma }}}^{[b]}\right)\sqrt{{p}_{\lambda }({\mathbf{\sigma }})}{{\rm{e}}}^{i{\phi }_{{\rm{\mu }}}({\mathbf{\sigma }})/2}$$

Then, the derivatives of the total divergence with respect to the parameters λ and µ are

$${{{\nabla }}}_{{\rm{\lambda }}}{\rm{\Xi}} ({\mathbf{\lambda }},{\mathbf{\mu }})={N}_{B}{\left\langle {{\mathscr{D}}}_{{\rm{\lambda }}}\right\rangle }_{{p}_{{\rm{\lambda }}}}-\sum _{b=0}^{{N}_{B}-1}\frac{1}{\left|{D}_{b}\right|}\sum _{{{\mathbf{\sigma }}}^{[b]}\in {D}_{b}}{\rm{Re}}\left\{{\left\langle {{\mathscr{D}}}_{{\rm{\lambda }}}\right\rangle\!}_{{Q}_{b}}\right\}$$


$${{{\nabla}}}_{{\rm{\mu }}}{\rm{\Xi}} ({\mathbf{\lambda }},{\mathbf{\mu }})=\sum _{b=0}^{{N}_{B}-1}\frac{1}{\left|{D}_{b}\right|}\sum _{{{\mathbf{\sigma }}}^{[b]}\in {D}_{b}}{\rm{Im}}\left\{{\left\langle {{\mathscr{D}}}_{{\rm{\mu }}}\right\rangle\!}_{{Q}_{b}}\right\}$$

In the expressions above, we have defined the pseudo-averages:

$${\left\langle {{\mathscr{D}}}_{{\rm{\lambda }}{{/}}{{\mu }}}\right\rangle\!}_{{Q}_{b}}=\frac{\sum _{\{{\mathbf{\sigma }}\}}{Q}_{b}\left({\mathbf{\sigma }},{{\mathbf{\sigma }}}^{[b]}\right){{\mathscr{D}}}_{{\rm{\lambda }}{{/}}{\rm{\mu }}}({\mathbf{\sigma }})}{\sum _{\{{\mathbf{\sigma }}\}}{Q}_{b}\left({\mathbf{\sigma }},{{\mathbf{\sigma }}}^{[b]}\right)}$$

which can be computed efficiently by summing directly over the samples in the data sets \({D}_{b}\). On the other hand, the evaluation of the average

$${\left\langle {{\mathscr{D}}}_{{\rm{\lambda }}}\right\rangle\!}_{{p}_{{\rm{\lambda }}}}=\frac{1}{{Z}_{{\rm{\lambda }}}}\sum _{\{{\mathbf{\sigma }}\}}{p}_{{\rm{\lambda }}}({\mathbf{\sigma }}){{\mathscr{D}}}_{{\rm{\lambda }}}({\mathbf{\sigma }})$$

requires the knowledge of the normalization constant Zλ, which is not directly accessible. However, as per standard RBM training32, one can approximate this average by

$${\left\langle {{\mathscr{D}}}_{{\rm{\lambda }}}\right\rangle\!}_{{p}_{{\rm{\lambda }}}}\simeq \frac{1}{n}\sum _{k=1}^{n}{{\mathscr{D}}}_{{\rm{\lambda }}}({{\mathbf{\sigma }}}_{k})$$

where the \({{\mathbf{\sigma }}}_{k}\) are samples generated using a Markov chain Monte Carlo simulation.
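To make the approximation in equation (16) concrete, the sketch below (our illustration; the weight gradients are \({{\mathscr{D}}}_{{W}_{ij}}={\langle {h}_{i}\rangle }_{{\mathbf{\sigma }}}{\sigma }_{j}\) for a small RBM) compares the brute-force model average with its block-Gibbs Monte Carlo estimate:

```python
import numpy as np

def exact_model_average(W, b, c):
    """<D_lambda>_{p_lambda} for the weight gradients sigmoid(c + W v)_i v_j,
    via brute-force summation over all 2^N configurations (small N only)."""
    n = b.size
    configs = ((np.arange(2 ** n)[:, None] >> np.arange(n)) & 1).astype(float)
    logits = configs @ b + np.logaddexp(0, c + configs @ W.T).sum(axis=1)
    p = np.exp(logits - logits.max())
    p /= p.sum()
    h = 1.0 / (1.0 + np.exp(-(c + configs @ W.T)))   # <h_i | sigma>
    return np.einsum('s,si,sj->ij', p, h, configs)

def mcmc_model_average(W, b, c, n_samples=20000, burn=500, seed=6):
    """Same average, estimated along a block Gibbs chain as in RBM training."""
    rng = np.random.default_rng(seed)
    m, n = W.shape
    sig = lambda x: 1.0 / (1.0 + np.exp(-x))
    v = rng.integers(0, 2, n).astype(float)
    acc = np.zeros((m, n))
    for t in range(burn + n_samples):
        hid = (rng.random(m) < sig(c + W @ v)).astype(float)
        v = (rng.random(n) < sig(b + W.T @ hid)).astype(float)
        if t >= burn:
            acc += np.outer(sig(c + W @ v), v)
    return acc / n_samples

rng = np.random.default_rng(7)
W = rng.normal(scale=0.2, size=(4, 4))
b = rng.normal(scale=0.1, size=4)
c = np.zeros(4)
print(np.max(np.abs(exact_model_average(W, b, c)
                    - mcmc_model_average(W, b, c))) < 0.05)   # True
```

The Monte Carlo estimate converges to the exact model average as the chain length grows, which is what makes the λ-gradient tractable when Zλ cannot be computed.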

Finally, we point out that in our work we have adopted a slightly simplified training scheme. In particular, we break down the training into two steps. First, we learn the amplitudes only by optimizing the parameters λ. In this case, it is sufficient to minimize the KL divergence over the reference basis alone (that is, σ). This part of the training is essentially a standard unsupervised learning procedure, involving the generation of samples from the RBM33. Then, we fix the parameters λ, and use the measurements in the auxiliary bases to determine the optimal values of the phase parameters μ. This other part of the training is achieved using the gradient in equation (14), and thus does not require Monte Carlo sampling from the NN.

Training the neural network

Given a set of parameters (say, μ), the simplest way to numerically minimize the total divergence, equation (9), is stochastic gradient descent33. Each parameter \({\mu }_{j}\) is updated as

$${\mu }_{j}\leftarrow {\mu }_{j}-\eta {\left\langle {g}_{j}\right\rangle }_{B}$$

where the gradient step η is called the learning rate and the gradient \({g}_{j}\) is averaged over a batch B (\(\left|B\right|\ll \left|D\right|\)) of samples drawn randomly from the full data set:

$${\left\langle {g}_{j}\right\rangle\!}_{B}=\frac{1}{\left|B\right|}\sum _{{\boldsymbol{\sigma }}\in B}{\rm{Im}}\left\{{\left\langle {{\mathscr{D}}}_{{\mu }_{j}}\right\rangle\!}_{{Q}_{b}}\right\}$$

Stochastic gradient descent is the optimization method used to learn the amplitudes of each physical system presented in the paper. To learn the phases, however, we instead implement the natural gradient descent method34, which we find more effective, although at the cost of increased computational resources. In this case, we update the parameters as

$${\mu }_{j}\leftarrow {\mu }_{j}-\eta \sum _{i}{\left\langle {S}_{ij}^{-1}\right\rangle\!}_{B}{\left\langle {g}_{i}\right\rangle\!}_{B}$$

where we have introduced the Fisher information matrix:

$${\left\langle {S}_{ij}\right\rangle\!}_{B}=\frac{1}{\left|B\right|}\sum _{{\mathbf{\sigma }}\in B}{\rm{Im}}\left\{{\left\langle {{\mathscr{D}}}_{\rm{\mu }_{i}}\right\rangle\!}_{{Q}_{b}}\right\}{\rm{Im}}\left\{{\left\langle {{\mathscr{D}}}_{\rm{\mu}_{j}}\right\rangle\!}_{{Q}_{b}}\right\}$$

The learning rate magnitude η is set to

$$\eta =\frac{{\eta }_{0}}{\sqrt{\sum _{ij}{\left\langle {S}_{ij}\right\rangle\!}_{B}\times {\left\langle {g}_{i}\right\rangle }_{B}{\left\langle {g}_{j}\right\rangle\!}_{B}}}$$

with some initial learning rate η0. The matrix \({\left\langle {S}_{ij}\right\rangle }_{B}\) takes into account the fact that, because the parametric dependence of the RBM function is nonlinear, a small change in some parameters may correspond to a very large change in the distribution. In this way, one implicitly uses an adaptive learning rate for each parameter \({\mu }_{j}\), speeding up the optimization compared with plain gradient descent. We note that a very similar technique is successfully used in quantum Monte Carlo for optimizing high-dimensional variational wavefunctions35,36. In that setting too, the gradients are noisy, arising from the Monte Carlo statistical evaluation of energy derivatives with respect to the parameters, and the matrix S is given by the covariance matrix of these forces. Since the matrix \({\left\langle {S}_{ij}\right\rangle }_{B}\) is affected by statistical noise, we regularize it by adding a small diagonal offset, thus improving the stability of the optimization.
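One regularized natural-gradient update, with the batch-estimated matrix S and the adaptive step size defined above, can be sketched as follows (our illustration; the rows of g_samples stand in for the per-sample values \({\rm{Im}}\{{\langle {{\mathscr{D}}}_{{\mu }_{j}}\rangle }_{{Q}_{b}}\}\)):

```python
import numpy as np

def natural_gradient_step(mu, g_samples, eta0=0.05, eps=1e-4):
    """One update mu <- mu - eta S^{-1} g, with S estimated over the batch
    as the second moment of the per-sample gradients and regularized by a
    small diagonal offset eps."""
    g = g_samples.mean(axis=0)                      # <g_j>_B
    S = g_samples.T @ g_samples / len(g_samples)    # <S_ij>_B
    S += eps * np.eye(len(mu))                      # diagonal regularization
    eta = eta0 / np.sqrt(g @ S @ g)                 # adaptive learning rate
    return mu - eta * np.linalg.solve(S, g)

rng = np.random.default_rng(8)
mu = np.ones(3)
g_samples = rng.normal(loc=[0.5, -0.2, 0.1], scale=0.1, size=(64, 3))
mu_new = natural_gradient_step(mu, g_samples)
print(mu_new.shape)   # (3,)
```

Solving the linear system S x = g avoids forming the explicit inverse, and the eps offset keeps the solve stable when the batch estimate of S is nearly singular.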

Data availability

The data that support the plots within this paper and other findings of this study are available from the corresponding author upon reasonable request.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


1. Vogel, K. & Risken, H. Determination of quasiprobability distributions in terms of probability distributions for the rotated quadrature phase. Phys. Rev. A 40, 2847 (1989).
2. Leonhardt, U. Quantum-state tomography and discrete Wigner function. Phys. Rev. Lett. 74, 4101–4105 (1995).
3. White, A. G., James, D. F. V., Eberhard, P. H. & Kwiat, P. G. Nonmaximally entangled states: production, characterization, and utilization. Phys. Rev. Lett. 83, 3103–3107 (1999).
4. Häffner, H. et al. Scalable multiparticle entanglement of trapped ions. Nature 438, 643–646 (2005).
5. Lu, C.-Y. et al. Experimental entanglement of six photons in graph states. Nat. Phys. 3, 91–95 (2007).
6. Bloch, I., Dalibard, J. & Zwerger, W. Many-body physics with ultracold gases. Rev. Mod. Phys. 80, 885–964 (2008).
7. Blatt, R. & Roos, C. F. Quantum simulations with trapped ions. Nat. Phys. 8, 277–284 (2012).
8. Shulman, M. D. et al. Demonstration of entanglement of electrostatically coupled singlet-triplet qubits. Science 336, 202–205 (2012).
9. Hinton, G. E. & Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science 313, 504–507 (2006).
10. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
11. Wang, L. Discovering phase transitions with unsupervised learning. Phys. Rev. B 94, 195105 (2016).
12. Carrasquilla, J. & Melko, R. G. Machine learning phases of matter. Nat. Phys. 13, 431–434 (2017).
13. van Nieuwenburg, E. P. L., Liu, Y.-H. & Huber, S. D. Learning phase transitions by confusion. Nat. Phys. 13, 435–439 (2017).
14. Carleo, G. & Troyer, M. Solving the quantum many-body problem with artificial neural networks. Science 355, 602–606 (2017).
15. Gross, D., Liu, Y.-K., Flammia, S. T., Becker, S. & Eisert, J. Quantum state tomography via compressed sensing. Phys. Rev. Lett. 105, 150401 (2010).
16. Tóth, G. et al. Permutationally invariant quantum tomography. Phys. Rev. Lett. 105, 250403 (2010).
17. Cramer, M. et al. Efficient quantum state tomography. Nat. Commun. 1, 149 (2009).
18. Lanyon, B. P. et al. Efficient tomography of a quantum many-body system. Nat. Phys. 13, 1158–1162 (2017).
19. Deng, D.-L., Li, X. & Sarma, S. D. Machine learning topological states. Phys. Rev. B 96, 195145 (2017).
20. Torlai, G. & Melko, R. G. Neural decoder for topological codes. Phys. Rev. Lett. 119, 030501 (2017).
21. Deng, D.-L., Li, X. & Das Sarma, S. Quantum entanglement in neural network states. Phys. Rev. X 7, 021021 (2017).
22. Gao, X. & Duan, L.-M. Efficient representation of quantum many-body states with deep neural networks. Nat. Commun. 8, 662 (2017).
23. Chen, J., Cheng, S., Xie, H., Wang, L. & Xiang, T. On the equivalence of restricted Boltzmann machines and tensor network states. Preprint at http://arxiv.org/abs/1701.04831 (2017).
24. Huang, Y. & Moore, J. E. Neural network representation of tensor network and chiral states. Preprint at http://arxiv.org/abs/1701.06246 (2017).
25. Torlai, G. & Melko, R. G. Learning thermodynamics with Boltzmann machines. Phys. Rev. B 94, 165134 (2016).
26. Wang, X.-L. et al. Experimental ten-photon entanglement. Phys. Rev. Lett. 117, 210502 (2016).
27. Richerme, P. et al. Non-local propagation of correlations in quantum systems with long-range interactions. Nature 511, 198–201 (2014).
28. Islam, R. et al. Measuring entanglement entropy in a quantum many-body system. Nature 528, 77–83 (2015).
29. Hastings, M. B., González, I., Kallin, A. B. & Melko, R. G. Measuring Renyi entanglement entropy in quantum Monte Carlo simulations. Phys. Rev. Lett. 104, 157201 (2010).
30. Bakr, W. S. et al. Probing the superfluid-to-Mott insulator transition at the single-atom level. Science 329, 547–550 (2010).
31. Johnson, M. W. et al. Quantum annealing with manufactured spins. Nature 473, 194–198 (2011).
32. Hinton, G. E. Training products of experts by minimizing contrastive divergence. Neural Comput. 14, 1771–1800 (2002).
33. Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, Cambridge, MA, 2016).
34. Amari, S.-I. Natural gradient works efficiently in learning. Neural Comput. 10, 251–276 (1998).
35. Sorella, S. Green function Monte Carlo with stochastic reconfiguration. Phys. Rev. Lett. 80, 4558 (1998).
36. Becca, F. & Sorella, S. Quantum Monte Carlo Approaches for Correlated Systems (Cambridge Univ. Press, Cambridge, 2017).



Acknowledgements

We thank L. Aolita, H. Carteret, G. Tóth and B. Kulchytskyy for useful discussions. G.T. thanks the Institute for Theoretical Physics, ETH Zurich, for hospitality during various stages of this work. G.T. and R.M. acknowledge support from NSERC, the Canada Research Chair programme, the Ontario Trillium Foundation and the Perimeter Institute for Theoretical Physics. Research at the Perimeter Institute is supported through Industry Canada and by the Province of Ontario through the Ministry of Research and Innovation. G.C., G.M. and M.T. acknowledge support from the European Research Council through ERC Advanced Grant SIMCOFE, and the Swiss National Science Foundation through NCCR QSIT and MARVEL. Simulations were performed on resources provided by SHARCNET, and by the Swiss National Supercomputing Centre CSCS.

Author information


  1. Department of Physics and Astronomy, University of Waterloo, Waterloo, Ontario, Canada

    • Giacomo Torlai
    •  & Roger Melko
  2. Perimeter Institute for Theoretical Physics, Waterloo, Ontario, Canada

    • Giacomo Torlai
    •  & Roger Melko
  3. Theoretische Physik, ETH Zurich, Zurich, Switzerland

    • Guglielmo Mazzola
    • , Matthias Troyer
    •  & Giuseppe Carleo
  4. Vector Institute, Toronto, Ontario, Canada

    • Juan Carrasquilla
  5. D-Wave Systems, Burnaby, British Columbia, Canada

    • Juan Carrasquilla
  6. Quantum Architectures and Computation Group, Station Q, Microsoft Research, Redmond, WA, USA

    • Matthias Troyer
  7. Center for Computational Quantum Physics, Flatiron Institute, New York, NY, USA

    • Giuseppe Carleo




Contributions

G.C. designed the research. G.T. devised the machine learning methods. G.T., G.M. and J.C. performed the machine learning numerical experiments. G.M. performed QMC simulations. All authors contributed to the analysis of the results and writing of the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Giuseppe Carleo.

Supplementary information

  1. Supplementary Information

    Supplementary Figures 1–5, Supplementary Notes, Supplementary References.
