Introduction

Finding solutions to strongly correlated quantum many-body systems, where the Hilbert space comprising all possible configurations grows exponentially with system size, relies on approximations and numerical simulation. In recent years, neural networks encoding quantum many-body states, termed neural network quantum states (NQS), have emerged as a promising tool for a variational treatment of quantum many-body problems1. Using gradient-based optimization, the ground-state or time-evolved quantum wavefunctions as solutions to the many-body Schrödinger equation can be efficiently approximated1,2,3,4,5,6,7,8,9,10, exploiting the expressive power and the universal approximation properties of state-of-the-art machine learning techniques, which also often face very high dimensional problems. These ansatz functions often require fewer parameters to express the exponential complexity of correlations and bear the prospect of improved scaling behaviour to larger and higher-dimensional systems.

Open quantum systems, where the system is connected to an environment or bath that induces dissipative processes described by a quantum master equation, have become of great interest in part due to the advent of noisy quantum devices such as quantum computers. Neural network density operators (NDO), parameterizing the mixed density operator describing such systems, have recently been shown to be a capable numerical tool to compute the dynamics of open quantum systems11,12,13,14,15. With Hilbert spaces scaling like \({2}^{2N}\) instead of \({2}^{N}\) as for quantum wavefunctions, this is a particularly difficult task. So far, mainly two approaches have been suggested. One is to analytically trace out additional bath degrees of freedom in a purified restricted Boltzmann machine (RBM)11, yielding an ansatz function that always fulfils all the properties of a physical density matrix, as used in refs. 12, 14, 16, 17. An alternative approach is to describe the system using a probabilistic formulation via positive operator-valued measurements (POVM)18,19. This method works with arbitrary network ansatz functions and was shown to improve upon results reached by purified RBM in some cases19. Reference 20 found some advantages of density operators based on RBM in comparison with the POVM formulation when looking at mixed quantum state reconstruction.

We will focus on an improvement of the first method and address the constraints which limit the choice of parametrization. For a matrix to represent a physical density operator, it must be positive semidefinite and Hermitian, which the purified ansatz using RBM fulfils by construction. If the positivity condition is not enforced, however, this opens up new possibilities to use more powerful representations. More modern deep neural networks were shown to outperform shallow RBM and fully connected networks for closed quantum systems, where the complex ground-state wavefunction is targeted instead of a mixed density matrix, e.g. refs. 21,22.

In the ongoing search for better variational functions to approximate mixed density matrices, in this work we apply prototypical deep convolutional neural networks (CNN), which are a component of most modern deep learning models, to open quantum systems by parametrizing the density matrix directly. Due to the parameter-sharing properties of CNN, translation symmetry can be enforced easily. This also leads to a system-size-independent parametrization, enabling transfer learning to larger systems. We find considerably improved results compared to NDO based on RBM, using far fewer parameters. To the best of our knowledge, this is the first time that competitive results are achieved by a non-positive neural network parametrization of the density matrix.

In this paper we first introduce a neural network architecture based on convolutional neural networks to encode the density matrix of translation-invariant open quantum systems and then present our results for different physical models. In the Methods section we describe the formulation of finding the steady-state solution to the Lindblad master equation as an optimization problem.

Results

Neural-network density operator based on purification

The neural-network density operator (NDO) based on a purified RBM is defined by analytically tracing out additional ancillary nodes a in an extended system described by a parametrized wavefunction ψ(σ, a)11. The reduced density matrix for the physical spin configurations \({{\boldsymbol{\sigma }}},{{{\boldsymbol{\sigma }}}}^{{\prime} }\) is then obtained by marginalizing over these bath degrees of freedom a

$$\rho ({{\boldsymbol{\sigma }}},{{{\boldsymbol{\sigma }}}}^{{\prime} })=\sum\limits_{{{\boldsymbol{a}}}}{\psi }^{* }({{\boldsymbol{\sigma }}},{{\boldsymbol{a}}})\psi ({{{\boldsymbol{\sigma }}}}^{{\prime} },{{\boldsymbol{a}}}).$$
(1)

This can be done as long as the dependence on a in ψ can be factored out, as is the case for the RBM wavefunction ansatz1 \(\psi ({{\boldsymbol{\sigma }}},{{\boldsymbol{a}}})=\exp [-E({{\boldsymbol{\sigma }}},{{\boldsymbol{a}}})]\) adopted in refs. 12,13,14, where E is an Ising-type interaction energy and thus a linear function of a. This particular design, however, does not represent the most general density matrix, in which the ancillary bath degrees of freedom would be spins in the visible layer that are traced out, instead of another set of hidden nodes. That would require a computationally expensive sampling of hidden layers, because the ancillary nodes can no longer be traced out analytically, as is the case when extending the full purification approach to deep networks23. It was recently shown, however, that better results can be achieved24 by keeping the RBM coupling only for the ancillary nodes and parametrizing the dependence on the visible nodes with a deep neural network.
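To make Eq. (1) concrete, the following minimal NumPy sketch (not taken from any of the cited implementations; the small random wavefunction is purely illustrative) builds a reduced density matrix from a purified state and checks that it is Hermitian and positive semidefinite by construction.

```python
import numpy as np

# Purified state psi(sigma, a) on a small system, stored as a matrix with rows
# indexed by physical configurations sigma and columns by ancillary configurations a.
n_phys, n_anc = 8, 4                       # e.g. 3 physical spins, 2 ancillary spins
rng = np.random.default_rng(0)
psi = rng.normal(size=(n_phys, n_anc)) + 1j * rng.normal(size=(n_phys, n_anc))

# Eq. (1): rho(sigma, sigma') = sum_a psi*(sigma, a) psi(sigma', a)
rho = psi.conj() @ psi.T

# By construction rho is Hermitian and positive semidefinite.
assert np.allclose(rho, rho.conj().T)
assert np.linalg.eigvalsh(rho).min() > -1e-12
```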

These purified RBM NDO, which represent a reduced density matrix in an extended system, have the advantage of being positive semidefinite and Hermitian by design. However, a variational ansatz encoding the density operator does not necessarily need to enforce these properties exactly, as was previously shown15,25. Once the optimization problem yielding the steady-state solution is solved, the resulting density matrix should have these properties within some approximation error. Letting the optimization deal with this enables us to use new classes of potentially more expressive networks, such as CNN, as variational tools for open quantum systems.

Convolutional neural networks

Several studies show that deep network architectures, which have more than one hidden layer of non-linear transformations, offer considerable advantages in expressing highly entangled quantum wavefunctions compared to shallow networks like RBM21,22. CNN in particular have been applied very successfully to closed quantum many-body systems and problems in continuous space3,10,26,27.

Convolutional neural networks are used in most modern neural network architectures, for example in image recognition28. They work by applying convolution filters, which constitute a part of the variational parameters, to the input data, followed by a non-linear activation function and possibly some pooling or averaging to decrease the feature size29. The output of the n-th convolutional layer is computed as

$${F}_{i,j,k}^{(n)}=f\left(\sum\limits_{x=1}^{X}\sum\limits_{y=1}^{Y}\sum\limits_{c=1}^{C}{F}_{i+x,j+y,c}^{(n-1)}{K}_{x,y,c,k}^{(n-1)}\right)$$
(2)

with the k-th kernel K of size (X, Y, C), a non-linear activation function f, and setting \({F}^{(0)}\) to be the input layer. Here the indices x and y run over the spatial dimensions, whereas c indexes the channels, i.e. how many kernels there were in the previous layer. In the case of image recognition, for example, the channels dimension of the first layer is used to encode the colour channels of RGB input images. In our case of spin configurations, the channels describe the left and right Hilbert spaces of the density matrix, as discussed below.

Essentially, for each layer a fixed-size matrix of parameters is scanned over the input, and the output is an array of inner products of this kernel with the input at the respective positions. Such a convolutional layer produces so-called feature maps F as its output. These indicate the locations where the convolution filter K matches well with the corresponding part of the input29. From this it is apparent that convolutional kernels extract only local information, or short-range correlations, between the input nodes of that layer. However, by successively applying multiple such layers, the field of view of the output nodes is increased. A convolutional layer can also be understood as a fully connected layer of neurons where some parameters are shared between them, so that there are many more connections than parameters.
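As an illustration of Eq. (2), the following NumPy sketch implements a single convolutional layer with multi-channel input as a plain loop over kernel positions. It assumes a 'valid' convolution without padding and is meant only to make the index structure explicit, not to be an efficient implementation.

```python
import numpy as np

def conv_layer(F_prev, K, f):
    """Sketch of Eq. (2): 'valid' convolution of the feature maps F_prev
    (shape H x W x C) with kernels K (shape X x Y x C x n_kernels),
    followed by an element-wise non-linearity f (e.g. a leaky ReLU)."""
    H, W, C = F_prev.shape
    X, Y, _, n_k = K.shape
    F = np.zeros((H - X + 1, W - Y + 1, n_k))
    for i in range(F.shape[0]):
        for j in range(F.shape[1]):
            for k in range(n_k):
                # inner product of the k-th kernel with the input patch at (i, j)
                F[i, j, k] = np.sum(F_prev[i:i + X, j:j + Y, :] * K[:, :, :, k])
    return f(F)
```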

Convolutional neural-network density operator

To represent a Hermitian density operator we start by parametrizing

$${\rho }_{{{\boldsymbol{\theta }}}}({{\boldsymbol{\sigma }}},{{{\boldsymbol{\sigma }}}}^{{\prime} })={A}_{{{\boldsymbol{\theta }}}}{({{\boldsymbol{\sigma }}},{{{\boldsymbol{\sigma }}}}^{{\prime} })}^{* }+{A}_{{{\boldsymbol{\theta }}}}({{{\boldsymbol{\sigma }}}}^{{\prime} },{{\boldsymbol{\sigma }}})$$
(3)

with the network output Aθ and a set of variational parameters θ. It makes sense to consider the locality of the spins σi and \({\sigma }_{i}^{{\prime} }\) at site i in the design of the network. This can be done by stacking σ and \({{{\boldsymbol{\sigma }}}}^{{\prime} }\) instead of concatenating them, essentially introducing a new dimension. For a one-dimensional chain of N spins, the input then becomes a 2D image of size N × 2 in which the pairs \(({\sigma }_{i},{\sigma }_{i}^{{\prime} })\) stay together. In common neural network software libraries the channels dimension of the input layer can be used for this. Lattices with more than one dimension are equally easy to implement in this manner. We then apply two or more convolutional layers with fixed-size kernels to the input nodes. The kernel sizes together with the depth of the network determine how well long-range correlations can be represented. The resulting feature maps in each layer are transformed element-wise by the leaky variant of the rectified linear unit (ReLU), defined as

$$f(x)=\max [0,(1-\alpha )x]+\alpha x$$
(4)

with α = 0.01. To obtain a complex density matrix amplitude A, the final feature maps are taken as the input to a fully connected layer with two output neurons, F0 and F1, representing the real and imaginary parts of A = F0 + iF1. In this way, all computations inside the model can be done using real-valued parameters. In terms of Eq. (2), this step can be understood as applying two kernels of the same size as the previous layer's output.

The variational parameters consist of the convolution kernels K in Eq. (2) as well as the weights of the final dense layer. The input \({F}^{(0)}\) of the network is constructed from a configuration \(({{\boldsymbol{\sigma }}},{{{\boldsymbol{\sigma }}}}^{{\prime} })\). The network architecture is depicted in Fig. 1, including a so-called pooling layer that is described below.
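A minimal sketch of the ingredients described above, assuming spin configurations stored as NumPy arrays: the stacking of σ and σ′ along a channels dimension, the leaky ReLU of Eq. (4), and the Hermitian symmetrization of Eq. (3) applied to an arbitrary network output Aθ (here just a placeholder callable).

```python
import numpy as np

def make_input(sigma, sigma_p):
    """Stack sigma and sigma' along a channels dimension: a 1D chain of N
    spins becomes an 'image' of shape (N, 1, 2), keeping the pairs
    (sigma_i, sigma'_i) together at each site."""
    N = len(sigma)
    return np.stack([sigma, sigma_p], axis=-1).reshape(N, 1, 2)

def leaky_relu(x, alpha=0.01):
    # Eq. (4): f(x) = max[0, (1 - alpha) x] + alpha x
    return np.maximum(0.0, (1.0 - alpha) * x) + alpha * x

def rho_theta(A_theta, sigma, sigma_p):
    """Eq. (3): Hermitian parametrization of a density-matrix element from an
    arbitrary complex-valued network output A_theta(sigma, sigma')."""
    return np.conj(A_theta(sigma, sigma_p)) + A_theta(sigma_p, sigma)
```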

Fig. 1: Convolutional neural density operator architecture.

Architecture of a simple CNN ansatz for translation-invariant density matrices of a one-dimensional system: two convolutional layers with 4 and 10 kernels of shape (X, Y, C) = (3, 1, 2) and (3, 1, 4) respectively, indicated in red. The columns represent the σ and \({{{\boldsymbol{\sigma }}}}^{{\prime} }\) input vectors in the first layer and the feature maps in the subsequent layers. For translation-invariant models, a pooling layer averages over the spatial dimensions. Finally, a fully connected layer with two output nodes F0 and F1 maps to the real and imaginary parts of the density matrix element. For a two-dimensional lattice, we simply introduce an additional dimension in the input and all convolutional layers.

Periodic boundary conditions (pbc) in the physical system can be accounted for by applying them to each convolutional layer's input. Even though the same parameters are used repeatedly over the length of the spin chain, the resulting feature maps for a given configuration are not translation invariant, as there is still information about where in the lattice a feature occurred. Translation invariance for systems with periodic boundary conditions can be easily imposed using a single pooling layer on the output of the last convolution, averaging all nodes along the physical dimensions

$${F}_{k}^{({{\rm{pool}}})}=\frac{1}{XY}\sum\limits_{x,y}{F}_{x,y,k}.$$
(5)

This greatly reduces the number of parameters and connections in the fully connected layer and, more importantly, the resulting network does not depend on the size of the input, hence the same parameter set can be used for any size of the physical system under consideration. This enables what is called transfer learning, which amounts to pre-training the model, for example on a smaller spin chain, and using these parameters as the initial values for a larger system3. In this way, the kernels can be trained to assume shapes that detect relevant features in the spin configurations and their correlations occurring in the steady state, which are likely to also appear in larger physical systems. This often improves the obtained results and can accelerate convergence. In contrast to applying the translation operator to all input configurations and summing over the symmetry group members, as is conventionally done30,31, with this architecture no additional effort is needed.
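A sketch of how periodic boundary conditions and the pooling of Eq. (5) can be realized in NumPy; the wrap-around padding and the size of the padding margin are implementation choices for illustration, not prescribed by the text.

```python
import numpy as np

def periodic_pad(F, X, Y):
    """Wrap-around padding of the feature maps F (shape H x W x C) so that a
    subsequent 'valid' convolution with an (X, Y) kernel respects periodic
    boundary conditions along the spatial axes."""
    return np.pad(F, ((0, X - 1), (0, Y - 1), (0, 0)), mode="wrap")

def mean_pool(F):
    """Eq. (5): average over the spatial dimensions, leaving one value per
    channel k. The output size is independent of the input size, which is
    what allows the same parameters to be reused for larger systems."""
    return F.mean(axis=(0, 1))
```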

Dissipative transverse-field Ising model on a spin chain

To evaluate the expressive power of the neural network ansatz described in the previous section, we apply it to the problem of finding the non-equilibrium steady state (NESS) of the 1D dissipative transverse-field Ising (TFI) model of N spins with periodic boundary conditions. The Hamiltonian for this system is

$$H=\sum\limits_{j}\left(\frac{V}{4}{\sigma }_{j}^{z}{\sigma }_{j+1}^{z}+\frac{g}{2}{\sigma }_{j}^{x}\right)$$
(6)

with the Pauli matrices \({\sigma }_{j}^{x,y,z}\) at site j, an energy scale V and a magnetic field strength g. The homogeneous dissipation is described by \({\gamma }_{k}=\gamma\) (in Eq. (9)) and by the jump operators \({L}_{j}={\sigma }_{j}^{-}=\frac{1}{2}({\sigma }_{j}^{x}-i{\sigma }_{j}^{y})\) on all sites j. We set V/γ = 2 to compare with the results of ref. 14.
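For small chains, exact reference values can be generated by solving the Lindblad equation directly. The text does not specify which tool was used for the exact data; the sketch below uses the QuTiP library as one illustrative choice.

```python
import qutip as qt

# Sketch: exact NESS of a small dissipative TFI chain, Eq. (6) with periodic
# boundary conditions, used only as a reference for the variational results.
N, V, g, gamma = 6, 2.0, 2.0, 1.0          # V / gamma = 2 as in the text

def op(single, j):
    # embed a single-site operator at site j of the N-spin chain
    return qt.tensor([single if k == j else qt.qeye(2) for k in range(N)])

H = sum(V / 4 * op(qt.sigmaz(), j) * op(qt.sigmaz(), (j + 1) % N)
        + g / 2 * op(qt.sigmax(), j) for j in range(N))
c_ops = [gamma ** 0.5 * op(qt.sigmam(), j) for j in range(N)]

rho_ss = qt.steadystate(H, c_ops)
# <sigma_x> at site 0; by translation invariance this equals the site average
print(qt.expect(op(qt.sigmax(), 0), rho_ss))
```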

For the spin chain we use the network architecture depicted in Fig. 1, with two convolutional layers of 6 and 20 feature maps and kernel sizes (X, Y, C) = (3, 1, 2) and (3, 1, 6) respectively, followed by a mean pooling over the spatial dimension and a fully connected final layer. In Fig. 2 we plot the observables \(\langle {\sigma }^{x}\rangle\), \(\langle {\sigma }^{y}\rangle\) and \(\langle {\sigma }^{z}\rangle\), averaged over all sites, for different magnetic field strengths g using our CNN ansatz, compared to exact values and the results obtained using RBM from ref. 14. We can see that the CNN produces results with good accuracy, also in the range of the magnetic field g/γ from 1 to 2.5, where the RBM had trouble producing the correct result even with an increased computational effort (compare ref. 14).

Fig. 2: Results for the transverse field Ising model.

One-dimensional dissipative transverse-field Ising model with N = 16 and periodic boundary conditions. Results obtained by optimizing our CNN ansatz (red dots) compared to exact results (solid black lines) and results obtained using RBM from ref. 14 (black crosses). The final expectation values use 100,000 Monte-Carlo samples, which reduces the variance and leads to negligible error bars.

This is achieved using only 438 parameters, considerably fewer than the 2752 trainable parameters of the RBM with hidden and ancillary node densities of 1 and 4 respectively, as used in ref. 14. We chose a relatively small number of feature maps in the first layer, as they tend to become redundant, whereas in consecutive layers more parameters improve the results.

We found that the initialization of the parameters of the convolutional layers has a large impact on performance. We initialize the kernels following a normal distribution with zero mean and a standard deviation \(\sqrt{2/{v}_{n}}\), with vn being the number of parameters in the n-th layer, in order to control the variance throughout the network32. We then further initialized the parameters by pre-training on a 6-site chain. During optimization, the sums in Eq. (11) were evaluated using a sample size of 1024 for 5000 to 20,000 iterations until convergence. The final observables were computed according to Eq. (14) with 100,000 samples to reduce the variance.
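A small sketch of this kernel initialization, reading vn as the total number of parameters of the layer's kernel tensor (our interpretation for illustration; ref. 32 describes the general scheme).

```python
import numpy as np

def init_kernel(shape, rng=np.random.default_rng(0)):
    """Zero-mean normal initialization with standard deviation sqrt(2 / v_n),
    where v_n is taken to be the number of parameters in the layer."""
    v_n = np.prod(shape)
    return rng.normal(0.0, np.sqrt(2.0 / v_n), size=shape)

# e.g. the first convolutional layer of the chain architecture:
# 6 kernels of shape (X, Y, C) = (3, 1, 2)
K1 = init_kernel((3, 1, 2, 6))
```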

Interestingly, with the RBM it was not possible (contrary to the CNN) to achieve an accurate result for certain parameter ranges with reasonable effort, even for such a small system of 6 spins, as can be seen in Fig. 3, where we compare 〈σx〉 and the cost function of Eq. (11) during the optimization. This is probably due to the limited expressivity of the purified RBM description. The computational effort per iteration for different chain lengths is plotted in Fig. 4, analysed on a typical notebook computer with 8 CPU cores. Both RBM and CNN show a polynomial scaling, and the RBM is about a factor of 2 faster to evaluate than our particular CNN implementation. However, a difference in convergence behaviour can outweigh this, as Fig. 3 demonstrates.

Fig. 3: Convergence compared to the restricted Boltzmann machine.

Convergence of (a) the σx observable compared to exact diagonalization (red dashed line) and (b) the cost function for a 6-spin dissipative TFI model with g/γ = 2, using an RBM (yellow line) as described in ref. 14 and the CNN architecture (blue line).

Fig. 4: Computation time.

Computation time per iteration of the CNN architecture (blue line) compared to an RBM as described in ref. 14 (yellow line) as a function of chain length N, with a polynomial scaling shown for comparison (red dashed line).

The constant number of parameters in principle makes it easier to scale to larger systems. For 30 spins we obtained expectation values similar to those depicted in Fig. 2, such as 〈σx〉 = 0.27 at g/γ = 2. Since in this setup the exact solution of these observables has no strong dependence on the chain length, this result seems plausible. For the 1D spin chain we found comparable results with network architectures without the mean pooling layer, but then the number of parameters in the final dense layer depends on the spatial dimension of the input, which leads to a higher computational effort. The advantage of such an architecture is that one can treat non-translation-invariant systems, for example without periodic boundary conditions, such as an asymmetrically driven dissipative spin chain as in ref. 17.

Transverse-field Ising model with rotated Hamiltonian

To further test our approach, we address the 1D dissipative transverse field Ising model with rotated Hamiltonian

$$H=\sum\limits_{j}\left(\frac{V}{4}{\sigma }_{j}^{x}{\sigma }_{j+1}^{x}-\frac{g}{2}{\sigma }_{j}^{z}\right)$$
(7)

and unchanged dissipative jump operators \({L}_{j}={\sigma }_{j}^{-}\). This was investigated, for example, in the case of a 1D array of coupled optical cavities33,34, and previous results suggested that neural network quantum states had difficulty obtaining correct results35. To compare with the literature, a spin chain with open boundary conditions is used, which also addresses the interesting question of whether systems without translation invariance can be solved by the CNN. We set V = −2 and γ = 1 and look at the correlation functions \(\langle {\sigma }_{j}^{x}{\sigma }_{j+l}^{x}\rangle\) for varying magnetic field g. Observables in the middle of the chain with j = N/2 are investigated to avoid edge effects34. Comparing to exact values, in Fig. 5 we show the results for N = 8 spins obtained by optimizing the same CNN model as described in the previous section. The exact diagonalization results are reproduced with good accuracy. We found good results with or without the pooling layer, suggesting that edge effects played no major role. To analyse the positivity of the density matrix obtained in approximation, all positive and negative eigenvalues are shown in the bottom panel. The largest positive eigenvalue is more than 4 orders of magnitude bigger than the largest negative eigenvalue in absolute value, and the sum of all positive eigenvalues amounts to 1.000042 whereas the trace is 1, indicating that the optimization retrieved an essentially positive matrix. For the \(\langle {\sigma }_{j}^{z}{\sigma }_{j+1}^{z}\rangle\) expectation values, for which some difficulties were recently reported even for a 6-spin chain35, we achieved relative errors of no more than 0.3% with respect to exact diagonalization results.
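For small systems, such a positivity analysis can be carried out by assembling the dense matrix from the network and inspecting its spectrum; the sketch below is our own formulation of this diagnostic, assuming the dense matrix fits in memory.

```python
import numpy as np

def positivity_check(rho_dense):
    """Eigenvalue diagnostic for a reconstructed density matrix: returns the
    largest negative eigenvalue (in absolute value) and the weight of the
    positive spectrum relative to the trace; after convergence both should
    indicate an essentially positive matrix."""
    rho_h = 0.5 * (rho_dense + rho_dense.conj().T)   # enforce Hermiticity numerically
    evals = np.linalg.eigvalsh(rho_h)
    neg = evals[evals < 0]
    pos = evals[evals > 0]
    worst_negative = np.abs(neg).max() if neg.size else 0.0
    positive_weight = pos.sum() / np.trace(rho_h).real
    return worst_negative, positive_weight
```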

Fig. 5: Rotated transverse field Ising model and positivity of the approximation.

CNN steady-state solution of the dissipative TFI model with rotated Hamiltonian Eq. (7) on N = 8 spins using V = −2 and γ = 1. Panel (a) shows expectation values of steady-state correlation functions for varied magnetic field strength g, with different markers for different distances l, compared to exact diagonalization results depicted by solid lines. Expectation values using 100,000 samples are shown to reduce the variance. Panel (b) shows the positive and negative eigenvalues λ of the CNN density matrix for g = −1 to determine the positivity of the approximation.

Scaling to larger systems, in Fig. 6 we plot the \(\langle {\sigma }_{j}^{x}{\sigma }_{j+l}^{x}\rangle\) correlation function for 40 spins at the critical points g = ±1 to investigate long-range correlations. In order to compare with the numerical matrix product operator results of ref. 34, we use a smaller dissipation γ = 1/2. The anti-ferromagnetic ordering of the x-components at positive g and the π rotation for positive and negative fields are perfectly reproduced, and our results are generally in good agreement with the reference values.

Fig. 6: Results for the rotated transverse field Ising model.

CNN steady-state solution (cross markers) of the dissipative TFI model with rotated Hamiltonian on N = 40 spins at g = 1 (red) and g = −1 (blue), using V = −2 and γ = 1/2, compared to numerical values from ref. 34 (square markers). Expectation values using 100,000 samples are shown to reduce the variance.

We noticed that the initialization of parameters can have a major impact on the speed of convergence. Initializing the model with pre-trained parameters of a smaller chain improves the convergence of larger models. This can be seen in Fig. 7, where we plot the convergence of the cost function, its variance and the nearest-neighbour correlation function with and without pre-training for 16 spins. A good approximation of the observable is already obtained after very few iterations, while the residual of the Lindblad equation still decreases further. We also show the larger 40-spin chain, which is more difficult to optimize but displays a similar convergence behaviour.

Fig. 7: Convergence for the rotated transverse field Ising model.

Convergence of (a) the cost function, (b) its variance and (c) a running average of the \(\langle {\sigma }_{N/2}^{x}{\sigma }_{N/2+1}^{x}\rangle\) observable while optimizing the CNN to solve the dissipative TFI model of N spins with rotated Hamiltonian for g = −1 and γ = 1/2. Using pre-trained kernels of a smaller system of 8 spins accelerates convergence, as seen in red for N = 16 and blue for N = 40, compared to no pre-training and N = 16 in yellow.

A stronger dissipation may also lead to faster convergence and can be used in a pre-training step. As a numerical detail, we would like to mention that we found better convergence when rotating the computational basis onto the x-axis. In this case the Hamiltonian from Eq. (6) is recovered, but with rotated dissipation operators and observables.

2D dissipative Heisenberg model

To demonstrate how the network can be extended to higher-dimensional systems, we look at the 2D dissipative Heisenberg model with periodic boundary conditions. The Hamiltonian reads

$$H=\sum\limits_{\langle j,k\rangle }\left({J}_{x}\,{\sigma }_{j}^{x}{\sigma }_{k}^{x}+{J}_{y}\,{\sigma }_{j}^{y}{\sigma }_{k}^{y}+{J}_{z}\,{\sigma }_{j}^{z}{\sigma }_{k}^{z}\right).$$
(8)

Following the setup in ref. 19, we use a uniform dissipation rate γ for the jump operators \({L}_{j}={\sigma }_{j}^{-}\) and set \({J}_{x}=0.9\gamma\) and \({J}_{z}=\gamma\). In Fig. 8 the steady-state results for the σz expectation value on a 3 × 3 lattice are plotted for different values of Jy/γ, obtained by optimizing the CNN ansatz and compared to exact values from ref. 19. Using only 350 variational parameters, the CNN achieves comparable accuracy to the variational POVM solution of ref. 19. In their work, they improved the variational results by running real-time evolution steps starting from the final optimized state. This could potentially present a method for further improving an already converged CNN result as well. We also plot the results we obtained for 4 × 4 and 6 × 6 lattices, which indicate a decrease in the absolute z expectation values with system size. These calculations were again initialized with pre-trained parameters.

Fig. 8: 2D dissipative Heisenberg model.

Steady-state expectation values of a 2D dissipative Heisenberg model with periodic boundary conditions obtained by optimizing the CNN ansatz. The results for the 3 × 3 lattice (red dots) are compared to exact values from ref. 19 (solid black line). Results for 4 × 4 (blue crosses) and 6 × 6 (green plusses) lattices are also given. The final expectation values use 100,000 Monte-Carlo samples, which reduces the variance and leads to small error bars.

Here we chose a smaller 2 × 2 kernel size in the physical dimensions and 3 convolutional layers with 6 kernels each, followed by a mean pooling and a fully connected layer, analogous to Fig. 1. Due to the symmetry of the Hamiltonian and the Lindblad operators, \(\rho ({{\boldsymbol{\sigma }}},{{{\boldsymbol{\sigma }}}}^{{\prime} })\) is nonzero only in sectors where \({\sum }_{i}({\sigma }_{i}-{\sigma }_{i}^{{\prime} })=2n\) with \(n\in {\mathbb{Z}}\). Following ref. 17, this restriction can be implemented in the Monte-Carlo sampling by proposing only allowed configurations, leading to faster convergence. We also rescaled the Monte-Carlo weights from Eq. (12) using \({\rho }^{2\beta }\) with β between 0.2 and 0.5 to better cover the configuration space, as described in ref. 17. This new probability distribution is easier to sample from, as can be seen in Fig. 9, where the rescaled exact density matrix \({\rho }^{2\beta }\) for a 2 × 2 lattice is displayed, reordered according to the total spin of the configurations to show the allowed sectors. We again used a sample size of 1024 during the optimization.
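A sketch of these two sampling details, assuming spin values encoded as ±1/2 so that the sector condition is non-trivial; both functions are illustrative placeholders rather than a prescribed implementation.

```python
import numpy as np

def in_allowed_sector(sigma, sigma_p):
    """Sector restriction: sum_i (sigma_i - sigma'_i) = 2n with integer n.
    With spins encoded as +/-1/2 the left-hand side is an integer, and the
    condition requires it to be even; Monte-Carlo proposals outside these
    sectors are rejected."""
    diff = int(round(np.sum(sigma) - np.sum(sigma_p)))
    return diff % 2 == 0

def sampling_weight(rho_value, beta=0.2):
    """Rescaled, unnormalized Monte-Carlo weight |rho(sigma, sigma')|^(2*beta),
    used in place of |rho|^2 from Eq. (12) to cover the configuration space
    more evenly (beta between 0.2 and 0.5 in the text)."""
    return np.abs(rho_value) ** (2.0 * beta)
```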

Fig. 9: Reordered exact density matrix.

Exact steady-state density matrix \({\rho }_{ij}=\rho ({{{\boldsymbol{\sigma }}}}_{i},{{{\boldsymbol{\sigma }}}}_{j}^{{\prime} })\) of a 2 × 2 dissipative Heisenberg model, scaled as \({\rho }^{2\beta }\) with β = 0.2 to obtain a probability distribution according to Eq. (12) that is easier to sample from. The row and column indices are all possible configurations σ and \({{{\boldsymbol{\sigma }}}}^{{\prime} }\) respectively, ordered according to the total spin of the configurations to show the allowed sectors.

Conclusions

We demonstrated how a deep neural network ansatz can improve upon previous variational results by parametrizing the mixed density matrix directly and not enforcing positivity. We rearranged the left and right Hilbert spaces of the spin configurations, which enabled a simple convolutional network architecture to efficiently capture the NESS of the dissipative transverse-field Ising model with considerably fewer parameters than neural density operator ansatz functions based on RBM. Furthermore, with the same architecture we successfully obtained correct solutions for a rotated Hamiltonian. We also exemplified how to extend the model to 2D systems and reported results for the dissipative Heisenberg model, applying transfer learning to accelerate computations for larger systems. These results encourage the exploration of other powerful neural network architectures to represent mixed density matrices without the explicit positivity constraints that RBM density operators were designed around.

Convolutions capture only local correlations, but by varying the kernel sizes and the depth of the network, the ansatz can be tuned to better express longer-range correlations. The simplicity of the network architecture makes it easily expandable and possibly interpretable, as the first-layer kernels, for example, should learn important spin-spin correlations that are then connected to each other in the following layers.

By introducing a pooling layer over all physical dimensions, translation invariance is enforced at no additional cost and the number of parameters is reduced at the same time. This, combined with the fact that the convolutional layers contain no size-dependent fixed connections, enables transfer learning: using the same set of parameters as initialization for different physical system sizes. While this leads to a good fit for translation-invariant models, it also encourages looking into CNN density matrices for physical models without such symmetries by leaving out the pooling step to keep the locality information in the network. Designing and applying complex-valued networks and parameters could be another interesting area for further investigation.

Methods

Optimizing for the nonequilibrium steady-state density operator

The dynamics of an open quantum system with Hamiltonian H coupled to a Markovian environment is described by the Lindblad master equation

$$\frac{d \widehat{\rho }}{dt}={{\mathcal{L}}}\widehat{\rho }=-i\left[H,\widehat{\rho }\right]+\sum\limits_{k}{\gamma }_{k}\left({L}_{k}\widehat{\rho }{L}_{k}^{{\dagger} }-\frac{1}{2}\{{L}_{k}^{{\dagger} }{L}_{k},\widehat{\rho }\}\right)$$
(9)

with the jump operators Lk leading to non-unitary dissipation. The non-equilibrium steady-state (NESS) density matrix \(\widehat{\rho }\), for which \(d\widehat{\rho }/dt=0\) and which is reached in the long-time limit, can be obtained directly via a variational scheme by minimizing a norm of the time derivative in Eq. (9)36.

Neural networks as variational ansatz for the density operator \({\widehat{\rho }}_{{{\boldsymbol{\theta }}}}={\sum }_{{{\boldsymbol{\sigma }}}{{{\boldsymbol{\sigma }}}}^{{\prime} }}{\rho }_{{{\boldsymbol{\theta }}}}({{\boldsymbol{\sigma }}},{{{\boldsymbol{\sigma }}}}^{{\prime} })\left\vert {{\boldsymbol{\sigma }}}\right\rangle \left\langle {{{\boldsymbol{\sigma }}}}^{{\prime} }\right\vert\), parametrized by the set of parameters θ, with the complete many-body basis of spin-1/2 configurations \(\left\vert {{\boldsymbol{\sigma }}}\right\rangle\), have been used to find the NESS solution of different open quantum spin systems11,12,13,14,15. We use the L2 norm as the cost function to be minimized, as described in ref. 14

$$C({{\boldsymbol{\theta }}})=\frac{\,{\mbox{Tr}}\,\left[{\widehat{\rho }}_{{{\boldsymbol{\theta }}}}^{{\dagger} }{{{\mathcal{L}}}}^{{\dagger} }{{\mathcal{L}}}{\widehat{\rho }}_{{{\boldsymbol{\theta }}}}\right]}{\,{\mbox{Tr}}\,\left[{\widehat{\rho }}_{{{\boldsymbol{\theta }}}}^{{\dagger} }{\widehat{\rho }}_{{{\boldsymbol{\theta }}}}\right]}=\frac{| | {{\mathcal{L}}}{\widehat{\rho }}_{{{\boldsymbol{\theta }}}}| {| }_{2}^{2}}{| | {\widehat{\rho }}_{{{\boldsymbol{\theta }}}}| {| }_{2}^{2}}$$
(10)
$$= \sum\limits_{{{\boldsymbol{\sigma }}}{{{\boldsymbol{\sigma }}}}^{{\prime} }}{p}_{{{\boldsymbol{\theta }}}}({{\boldsymbol{\sigma }}},{{{\boldsymbol{\sigma }}}}^{{\prime} }){\left| \sum\limits_{\tilde{{{\boldsymbol{\sigma }}}}{\tilde{{{\boldsymbol{\sigma }}}}}^{{\prime} }}{{{\mathcal{L}}}}_{{{\boldsymbol{\sigma }}}{{{\boldsymbol{\sigma }}}}^{{\prime} }\tilde{{{\boldsymbol{\sigma }}}}{\tilde{{{\boldsymbol{\sigma }}}}}^{{\prime} }}\frac{{\rho }_{\theta }(\tilde{{{\boldsymbol{\sigma }}}},{\tilde{{{\boldsymbol{\sigma }}}}}^{{\prime} })}{{\rho }_{\theta }({{\boldsymbol{\sigma }}},{{{\boldsymbol{\sigma }}}}^{{\prime} })}\right | }^{2}.$$
(11)

This function and its derivative with respect to the parameters θ can then be evaluated as the statistical expectation value over the probability distribution

$${p}_{{{\boldsymbol{\theta }}}}({{\boldsymbol{\sigma }}},{{{\boldsymbol{\sigma }}}}^{{\prime} })=\frac{| {\rho }_{\theta }({{\boldsymbol{\sigma }}},{{{\boldsymbol{\sigma }}}}^{{\prime} }){| }^{2}}{\sum\nolimits_{\bar{{{\boldsymbol{\sigma }}}}{\bar{{{\boldsymbol{\sigma }}}}}^{{\prime} }}| {\rho }_{\theta }(\bar{{{\boldsymbol{\sigma }}}},{\bar{{{\boldsymbol{\sigma }}}}}^{{\prime} }){| }^{2}}$$
(12)

using Monte-Carlo samples. This avoids the first sum over the entire Hilbert space, while the inner sum in Eq. (11), which adds up the sparse Lindblad matrix elements \({{{\mathcal{L}}}}_{{{\boldsymbol{\sigma }}}{{{\boldsymbol{\sigma }}}}^{{\prime} }\tilde{{{\boldsymbol{\sigma }}}}{\tilde{{{\boldsymbol{\sigma }}}}}^{{\prime} }}\), can usually be carried out exactly. The parameters are then iteratively updated to find the steady state as the solution to the optimization problem

$$\rho_{SS} = \mathop{{\mathrm{argmin}}}\limits_{{\rho_{{\boldsymbol{\theta}}}}} \, C({\boldsymbol{\theta}}).$$
(13)

To update the parameters, the stochastic reconfiguration (SR) method37 is often used for optimizing neural quantum states1, where a system of equations is solved in each iteration to adapt the metric to the current cost surface. However, we find improved convergence and often better results using a backtracking Nesterov accelerated gradient descent optimization scheme as described in ref. 17 (NAGD+), especially when optimizing deeper neural networks. With automatic differentiation, the gradients of Eq. (11) are evaluated using the same Monte-Carlo samples.
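The sampled evaluation of the cost function can be organized as in the schematic NumPy sketch below, where `rho`, `lindblad_row` and the sample set are hypothetical placeholders standing in for the network output, the sparse Lindbladian and the Monte-Carlo sampler.

```python
import numpy as np

def cost_estimate(samples, rho, lindblad_row):
    """Monte-Carlo estimate of Eq. (11) over samples (sigma, sigma') drawn
    from p_theta of Eq. (12):
      - rho(s, sp) returns the (unnormalized) density-matrix element,
      - lindblad_row(s, sp) returns the non-zero Lindblad elements
        L_{s sp, t tp} as a list of ((t, tp), value) pairs."""
    local_estimates = []
    for s, sp in samples:
        # inner sum of Eq. (11): carried out exactly over the sparse row
        numerator = sum(value * rho(t, tp)
                        for (t, tp), value in lindblad_row(s, sp))
        local_estimates.append(abs(numerator / rho(s, sp)) ** 2)
    return np.mean(local_estimates)
```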

Once the optimization has converged, for a given set of parameters the expectation values of physical observables \(\widehat{O}\) are computed as a statistical average

$$\langle \widehat{O}\rangle =\,{\mbox{Tr}}\,\,\{\widehat{O}\rho \}=\sum\limits_{{{\boldsymbol{\sigma }}}}p({{\boldsymbol{\sigma }}})\sum\limits_{{{{\boldsymbol{\sigma }}}}^{{\prime} }}\frac{\langle {{\boldsymbol{\sigma }}}| \widehat{O}| {{{\boldsymbol{\sigma }}}}^{{\prime} }\rangle \rho ({{\boldsymbol{\sigma }}},{{{\boldsymbol{\sigma }}}}^{{\prime} })}{\rho ({{\boldsymbol{\sigma }}},{{\boldsymbol{\sigma }}})}$$
(14)

over Monte-Carlo samples from the probability distribution p(σ) ∝ ρ(σ, σ). This way, the summands are independent of the normalization of ρ. Again, the inner sum can typically be carried out exactly. A large part of the computational effort is taken up by the Monte-Carlo sampling, which, however, can be highly parallelized and in some cases accelerated by implicitly enforcing symmetries during sampling17.
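Analogously, the observable estimator of Eq. (14) can be sketched as follows, again with `rho` and `obs_row` as hypothetical placeholders for the network output and the sparse matrix elements of the observable.

```python
import numpy as np

def observable_estimate(samples, rho, obs_row):
    """Monte-Carlo estimate of Eq. (14) over samples sigma drawn from
    p(sigma) proportional to rho(sigma, sigma):
      - rho(s, sp) returns the (unnormalized) density-matrix element,
      - obs_row(s) returns the non-zero elements <s|O|sp> as (sp, value) pairs."""
    local_estimates = [
        sum(value * rho(s, sp) for sp, value in obs_row(s)) / rho(s, s)
        for s in samples
    ]
    return np.mean(local_estimates)
```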