## Abstract

In this work we apply deep neural networks to find the non-equilibrium steady-state solution of correlated open quantum many-body systems. Motivated by the ongoing search for more powerful representations of (mixed) quantum states, we design a simple prototypical convolutional neural network and show that parametrizing the density matrix directly with more powerful models can yield better variational ansatz functions and improve upon results reached by neural density operators based on the restricted Boltzmann machine. In doing so, we give up the explicit restriction to positive semi-definite density matrices; positivity is, however, recovered to good approximation by optimizing the parameters. The great advantage of this approach is that it opens up the possibility of exploring more complex network architectures that can be tailored to specific physical properties. We show how translation invariance can be enforced effortlessly and reach better results with fewer parameters. We present results for the dissipative one-dimensional transverse-field Ising model and a two-dimensional dissipative Heisenberg model and compare them to exact values.

## Introduction

Finding solutions to strongly correlated quantum many-body systems, where the Hilbert space comprising all possible configurations grows exponentially with system size, relies on approximations and numerical simulation. In recent years neural networks encoding quantum many-body states, termed neural network quantum states (NQS), have emerged as a promising tool for a variational treatment of quantum many-body problems^{1}. Using gradient-based optimization, the ground state or time-evolved quantum wavefunctions that solve the many-body Schrödinger equation can be efficiently approximated^{1,2,3,4,5,6,7,8,9,10}, exploiting the expressive power and the universal approximation properties of state-of-the-art machine learning techniques, which also often face very high-dimensional problems. These ansatz functions often require fewer parameters to express the exponential complexity of correlations and bear the prospect of an improved scaling behaviour to larger and higher-dimensional systems.

Open quantum systems, where the system is connected to an environment or bath that induces dissipative processes described by a quantum master equation, have become of great interest, in part due to the advent of noisy quantum devices such as quantum computers. Neural network density operators (NDO), parametrizing the mixed density operator describing such systems, have recently been shown to be a capable numerical tool to compute the dynamics of open quantum systems^{11,12,13,14,15}. With Hilbert spaces scaling like 2^{2N}, instead of 2^{N} as for quantum wavefunctions, this is a particularly difficult task. So far, mainly two approaches have been suggested. One is to analytically trace out additional bath degrees of freedom in a purified restricted Boltzmann machine (RBM)^{11}, yielding an ansatz function that always fulfils all the properties of a physical density matrix, as used in refs. ^{12,14,16,17}. An alternative approach is to describe the system using a probabilistic formulation via positive operator-valued measures (POVM)^{18,19}. This method works with arbitrary network ansatz functions and was shown to improve upon results reached by purified RBM in some cases^{19}. Reference ^{20} found some advantages of density operators based on RBM in comparison with the POVM formulation for mixed quantum state reconstruction.

We focus on an improvement of the first method and address the constraints that limit the choice of parametrization. For a matrix to represent a physical density operator, it must be positive semidefinite and Hermitian, which the purified ansatz using RBM fulfils by construction. If the positivity condition is not enforced, however, more powerful representations become available. Modern deep neural networks were shown to outperform shallow RBM and fully connected networks for closed quantum systems, where the complex ground-state wavefunction is targeted instead of a mixed density matrix, e.g. refs. ^{21,22}.

In the ongoing search for better variational functions to approximate mixed density matrices, in this work we apply prototypical deep convolutional neural networks (CNN), which are part of most modern deep learning models, to open quantum systems by parametrizing the density matrix directly. Due to the parameter-sharing properties of CNN, translation symmetry can be enforced easily. This also leads to a parametrization that is independent of the system size, enabling transfer learning to larger systems. We find considerably improved results compared to NDO based on RBM, using far fewer parameters. To the best of our knowledge, this represents the first time that competitive results are achieved by a non-positive neural network parametrization of the density matrix.

In this paper we first introduce a neural network architecture based on convolutional neural networks to encode the density matrix of translationally invariant open quantum systems and then present our results for different physical models. In the Methods section we formulate the search for the steady-state solution of the Lindblad master equation as an optimization problem.

## Results

### Neural-network density operator based on purification

The neural-network density operator (NDO) based on a purified RBM is defined by analytically tracing out additional ancillary nodes **a** in an extended system described by a parametrized wavefunction *ψ*(**σ**, **a**)^{11}. The reduced density matrix for the physical spin configurations \({{\boldsymbol{\sigma }}},{{{\boldsymbol{\sigma }}}}^{{\prime} }\) is then obtained by marginalizing over these bath degrees of freedom **a**

\[\rho ({{\boldsymbol{\sigma }}},{{{\boldsymbol{\sigma }}}}^{{\prime} })=\sum_{{{\boldsymbol{a}}}}\psi ({{\boldsymbol{\sigma }}},{{\boldsymbol{a}}})\,{\psi }^{* }({{{\boldsymbol{\sigma }}}}^{{\prime} },{{\boldsymbol{a}}}).\qquad (1)\]

This can be done as long as the dependence on **a** in *ψ* can be factored out, as is the case for the RBM wavefunction^{1} ansatz \(\psi ({{\boldsymbol{\sigma }}},{{\boldsymbol{a}}})=\exp [-E({{\boldsymbol{\sigma }}},{{\boldsymbol{a}}})]\) adopted in refs. ^{12,13,14}, where *E* is an Ising-type interaction energy and thus a linear function of **a**. But this particular design does not represent the most general density matrix, where the ancillary bath degrees of freedom would be spins in the visible layer that are traced out, instead of another set of hidden nodes. This would require a computationally expensive sampling of hidden layers because the ancillary nodes cannot be traced out analytically any more, as is the case when extending the full purification approach to deep networks^{23}. It was recently shown, however, that better results can be achieved^{24} by keeping the RBM coupling only for the ancillary nodes and parametrizing the dependence on the visible nodes with a deep neural network.
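The effect of tracing out the ancillary nodes can be illustrated numerically. The following is a minimal sketch, assuming random complex amplitudes as a stand-in for a trained network *ψ*(**σ**, **a**) and hypothetical sizes of 2 physical and 3 ancillary spins; it only shows that the marginalized matrix is Hermitian and positive semidefinite by construction.

```python
import numpy as np

def purified_density_matrix(psi):
    """Trace out the ancilla index: rho[s, s'] = sum_a psi[s, a] * conj(psi[s', a])."""
    return psi @ psi.conj().T

rng = np.random.default_rng(0)
n_phys, n_anc = 2, 3                       # hypothetical sizes, chosen for illustration
dim_s, dim_a = 2**n_phys, 2**n_anc

# Random complex amplitudes stand in for a trained network psi(sigma, a).
psi = rng.normal(size=(dim_s, dim_a)) + 1j * rng.normal(size=(dim_s, dim_a))

rho = purified_density_matrix(psi)
rho = rho / np.trace(rho).real             # normalize to unit trace

is_hermitian = np.allclose(rho, rho.conj().T)
min_eigenvalue = np.linalg.eigvalsh(rho).min()
print(is_hermitian, min_eigenvalue >= -1e-12)
```

Any matrix of the form *ψψ*^† has these properties, which is why the purified ansatz never leaves the set of physical density matrices.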

These purified RBM NDO, which represent a reduced density matrix in an extended system, have the advantage of being positive semidefinite and Hermitian by design. However, a variational ansatz encoding the density operator does not necessarily need to enforce these properties exactly, as was previously shown^{15,25}. Once the optimization problem yielding the steady-state solution is solved, the resulting density matrix should have these properties within some approximation error. Letting the optimization deal with this enables us to use new classes of potentially more expressive networks, such as CNN, as variational tools for open quantum systems.

### Convolutional neural networks

Several studies show that deep network architectures, which have more than one hidden layer of non-linear transformations, offer considerable advantages in expressing highly entangled quantum wavefunctions compared to shallow networks like RBM^{21,22}. Especially CNN have been applied very successfully to closed quantum many-body systems and problems in continuous space^{3,10,26,27}.

Convolutional neural networks are used in most modern neural network architectures, for example in image recognition^{28}. They work by applying convolution filters, which constitute a part of the variational parameters, to the input data, followed by a non-linear activation function and possibly some pooling or averaging to decrease the feature size^{29}. The output of the *n*-th convolutional layer is computed as

\[{F}_{x,y,k}^{(n)}=f\left(\sum_{{x}^{{\prime} }=1}^{X}\sum_{{y}^{{\prime} }=1}^{Y}\sum_{c=1}^{C}{K}_{{x}^{{\prime} },{y}^{{\prime} },c}^{(n,k)}\,{F}_{x+{x}^{{\prime} },\,y+{y}^{{\prime} },\,c}^{(n-1)}\right)\qquad (2)\]

with the *k*-th kernel *K* of size (*X*, *Y*, *C*), a non-linear activation function *f* and setting *F*^{(0)} to be the input layer. Here the indices *x* and *y* run over the spatial dimensions, whereas *c* indexes the channels, i.e. how many kernels there were in the previous layer. In the case of image recognition, for example, the channels dimension in the first layer is used to encode the colour channel in RGB input images. In our case of spin configurations, the channels describe the left and right Hilbert spaces of the density matrix, as discussed below.

Essentially, for each layer a fixed-size matrix of parameters is scanned over the input, and the output is an array of inner products of this kernel with the input at the respective positions. Such a convolutional layer produces so-called feature maps *F* as the output. These indicate the locations where the convolution filters *K* match well with the corresponding part of the input^{29}. From this it is apparent that convolutional kernels extract only local information or short-range correlations between the input nodes for that layer. However, by successively applying multiple such layers, the field of view of the output nodes is increased. A convolutional layer can also be understood as a fully connected layer of neurons in which some parameters are shared, hence there are many more connections than parameters.
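The scanning of a kernel over the input can be sketched as follows. This is a minimal illustration for a 1D input with channels, assuming a unit stride, no padding and no bias term; the function name `conv_layer` and all sizes are hypothetical.

```python
import numpy as np

def conv_layer(F_in, kernels, f):
    """1D convolution without padding.

    F_in:    (N, C) input, N sites and C channels
    kernels: (K, X, C) stack of K kernels of width X
    returns: (N - X + 1, K) feature maps
    """
    n_sites, _ = F_in.shape
    n_kernels, width, _ = kernels.shape
    out = np.empty((n_sites - width + 1, n_kernels))
    for x in range(n_sites - width + 1):
        patch = F_in[x:x + width, :]                  # local window of the input
        # inner product of every kernel with the same window
        out[x] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))
    return f(out)

def leaky_relu(z):
    return np.where(z > 0, z, 0.01 * z)

rng = np.random.default_rng(0)
F0 = rng.choice([-1.0, 1.0], size=(6, 2))             # 6 sites, 2 channels
kernels = rng.normal(size=(6, 3, 2))                  # 6 kernels of width 3

F1 = conv_layer(F0, kernels, leaky_relu)
n_parameters = kernels.size                            # shared weights
n_connections = F1.size * kernels.shape[1] * kernels.shape[2]
print(F1.shape, n_parameters, n_connections)
```

Because the same weights are reused at every position, shifting the input by one site shifts the feature map by one site, and the number of connections exceeds the number of parameters.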

### Convolutional neural-network density operator

To represent a Hermitian density operator we start by parametrizing

with the network output *A*_{θ} and a set of variational parameters **θ**. It makes sense to consider the locality of the spins *σ*_{i} and \({\sigma }_{i}^{{\prime} }\) at site *i* in the design of the network. This can be done by stacking **σ** and \({{{\boldsymbol{\sigma }}}}^{{\prime} }\) instead of concatenating them, essentially introducing a new dimension. For a one-dimensional chain of *N* spins, the input then becomes a 2D image of size *N* × 2 where the pairs \(({\sigma }_{i},{\sigma }_{i}^{{\prime} })\) stay together. In common neural network software libraries the channels dimension of the input layer can be used for this. Lattices with more than one dimension are equally easy to implement in this manner. We then apply to the input nodes two or more convolutional layers with fixed-size kernels. The kernel sizes together with the depth of the network determine how well long-range correlations can be represented. The resulting feature maps in each layer are transformed element-wise by the leaky variant of the rectified linear unit (ReLU), defined as

\[f(x)=\begin{cases}x, & x\ge 0\\ \alpha x, & x<0\end{cases}\qquad (4)\]

with *α* = 0.01. To obtain a complex density matrix amplitude *A*, the final feature maps are taken as the input to a fully connected layer with two output neurons *F*_{(0, 1)} representing the real and imaginary parts of *A* = *F*_{0} + *iF*_{1}. In this way, all computations inside the model can be done using real-valued parameters. In terms of Eq. (2) this step can be understood as applying two kernels of the same size as the previous layer’s output.

The variational parameters consist of the convolution kernels *K* in Eq. (2) as well as the weights of the final dense layer. The input *F*^{(0)} of the network is constructed from a configuration \(({{\boldsymbol{\sigma }}},{{{\boldsymbol{\sigma }}}}^{{\prime} })\). The network architecture is depicted in Fig. 1, including a so-called pooling layer that is described below.
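The construction described in this section can be sketched end to end for a very small chain. The sketch below makes several assumptions that are not taken from the text: the Hermitian form is taken to be *ρ*(**σ**, **σ**′) = *A*(**σ**, **σ**′) + *A*^*(**σ**′, **σ**), the convolutions use periodic padding, there is no pooling, and all parameters are random rather than optimized. The function names are hypothetical.

```python
import itertools
import numpy as np

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def conv_periodic(F, K):
    """F: (N, C_in), K: (X, C_in, C_out) -> (N, C_out), periodic in the site index."""
    return sum(np.roll(F, -dx, axis=0) @ K[dx] for dx in range(K.shape[0]))

def amplitude(sigma, sigma_p, params):
    """Network output A(sigma, sigma') = F_0 + i F_1 from real-valued parameters."""
    F = np.stack([sigma, sigma_p], axis=-1).astype(float)   # (N, 2): pairs stay together
    for K in params["kernels"]:
        F = leaky_relu(conv_periodic(F, K))
    out = F.reshape(-1) @ params["W"] + params["b"]          # dense layer, two outputs
    return out[0] + 1j * out[1]

def density_matrix(params, n_sites):
    """Assumed Hermitian form: rho(s, s') = A(s, s') + conj(A(s', s))."""
    configs = list(itertools.product([-1, 1], repeat=n_sites))
    A = np.array([[amplitude(np.array(s), np.array(sp), params) for sp in configs]
                  for s in configs])
    return A + A.conj().T

rng = np.random.default_rng(1)
N = 4
params = {
    "kernels": [rng.normal(size=(3, 2, 6)), rng.normal(size=(3, 6, 20))],
    "W": rng.normal(size=(N * 20, 2)),     # dense weights act on the flattened feature maps
    "b": np.zeros(2),
}
rho = density_matrix(params, N)
print(rho.shape, np.allclose(rho, rho.conj().T))
```

The resulting matrix is Hermitian for any parameter values, while positivity is not guaranteed and has to emerge from the optimization.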

Periodic boundary conditions (pbc) in the physical system can be accounted for by applying them to each convolutional layer’s input. Even though the same parameters are used repeatedly over the length of the spin chain, the resulting feature maps for a given configuration are not translation invariant, as there is still information about where in the lattice a feature occurred. Translation invariance for systems with periodic boundary conditions can be easily imposed using a single pooling layer on the output of the last convolution, averaging all nodes along the physical dimensions

\[{\bar{F}}_{k}=\frac{1}{{N}_{x}{N}_{y}}\sum_{x=1}^{{N}_{x}}\sum_{y=1}^{{N}_{y}}{F}_{x,y,k},\qquad (5)\]

where *N*_{x} × *N*_{y} is the spatial size of the feature maps *F* of the last convolutional layer.

This greatly reduces the number of parameters and connections in the fully connected layer and, more importantly, the resulting network does not depend on the size of the input, so the same parameter set can be used for any size of the physical system under consideration. This enables what is called transfer learning, which amounts to pre-training the model, for example on a smaller spin chain, and using these parameters as the initial values for a larger system^{3}. In this way, the kernels can be trained to assume shapes that detect relevant features in the spin configurations and their correlations occurring in the steady state, which are likely to also appear in larger physical systems. This often improves the obtained results and can accelerate convergence. In contrast to applying the translation operator to all input configurations and summing over the symmetry group members, as is conventionally done^{30,31}, no additional effort is needed with this architecture.
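Both properties, translation invariance and independence of the chain length, can be checked with a small numerical sketch. It assumes periodic padding in every convolution, random untrained parameters and hypothetical function names; the layer sizes follow the 1D network used later in the text.

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def conv_periodic(F, K):
    """F: (N, C_in), K: (X, C_in, C_out) -> (N, C_out), periodic in the site index."""
    return sum(np.roll(F, -dx, axis=0) @ K[dx] for dx in range(K.shape[0]))

def pooled_amplitude(sigma, sigma_p, kernels, W, b):
    F = np.stack([sigma, sigma_p], axis=-1).astype(float)
    for K in kernels:
        F = leaky_relu(conv_periodic(F, K))
    pooled = F.mean(axis=0)                 # average over all sites -> one value per channel
    out = pooled @ W + b                    # dense layer size no longer depends on N
    return out[0] + 1j * out[1]

rng = np.random.default_rng(2)
kernels = [rng.normal(size=(3, 2, 6)), rng.normal(size=(3, 6, 20))]
W, b = rng.normal(size=(20, 2)), rng.normal(size=2)

sigma = rng.choice([-1, 1], size=8)
sigma_p = rng.choice([-1, 1], size=8)
a_original = pooled_amplitude(sigma, sigma_p, kernels, W, b)
a_shifted = pooled_amplitude(np.roll(sigma, 3), np.roll(sigma_p, 3), kernels, W, b)

# The same parameter set evaluated on a longer chain of 12 sites.
a_long = pooled_amplitude(rng.choice([-1, 1], size=12),
                          rng.choice([-1, 1], size=12), kernels, W, b)
print(np.isclose(a_original, a_shifted), np.isfinite(a_long))
```

Shifting both configurations by the same number of sites leaves the amplitude unchanged, and no parameter has to be added or reshaped when the chain grows, which is what makes pre-training on a small chain possible.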

### Dissipative transverse-field Ising model on a spin chain

To evaluate the expressive power of the neural network ansatz described in the previous section, we apply it to the problem of finding the non-equilibrium steady state (NESS) of the 1D dissipative transverse-field Ising (TFI) model of *N* spins with periodic boundary conditions. The Hamiltonian for this system is

with the Pauli matrices \({\sigma }_{j}^{x,y,z}\) at site *j*, an energy scale *V* and a magnetic field strength *g*. The homogeneous dissipation is described by *γ*_{k} = *γ* (in Eq. (9)) and by the jump operators \({L}_{j}={\sigma }_{j}^{-}=\frac{1}{2}({\sigma }_{j}^{x}-i{\sigma }_{j}^{y})\) on all sites *j*. We set *V*/*γ* = 2 to compare with the results of ref. ^{14}.

For the spin chain we use the network architecture depicted in Fig. 1, using two convolutional layers with 6 and 20 feature maps with kernel sizes (*X*, *Y*, *C*) = (3, 1, 2) and (3, 1, 6) respectively, followed by a mean pooling over the spatial dimension and a fully connected final layer. In Fig. 2 we plot the observables *σ*^{x}, *σ*^{y} and *σ*^{z} averaged over all sites for different magnetic field strengths *g* using our CNN ansatz compared to exact values and the results obtained using RBM from ref. ^{14}. We can see that the CNN produces results with good accuracy, also in the range of the magnetic field *g*/*γ* from 1 to 2.5, where the RBM had trouble producing the correct result even with an increased computational effort (compare ref. ^{14}).

This is achieved using only 438 parameters, considerably fewer than the 2752 trainable parameters of the RBM with hidden and ancillary node densities 1 and 4, respectively, as used in ref. ^{14}. We chose a relatively small number of feature maps in the first layer, as they tend to become redundant, whereas in subsequent layers more parameters improve the results.
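The quoted parameter count can be reproduced with a short calculation. The sketch below assumes that the convolutional layers carry no bias terms and that the final dense layer, acting on the mean-pooled feature maps, has two outputs with one bias each. This convention is not stated in the text; it is inferred here because it reproduces the quoted number.

```python
def count_parameters(kernel_shapes, n_outputs=2):
    """Count trainable parameters of the CNN with mean pooling.

    kernel_shapes: list of (X, Y, C, n_kernels), one entry per convolutional layer.
    Assumes bias-free convolutions and a dense layer with bias (an inferred convention).
    """
    conv = sum(x * y * c * k for x, y, c, k in kernel_shapes)
    last_channels = kernel_shapes[-1][3]
    dense = last_channels * n_outputs + n_outputs
    return conv + dense

# 1D chain: kernels (3, 1, 2) and (3, 1, 6) with 6 and 20 feature maps.
chain_1d = [(3, 1, 2, 6), (3, 1, 6, 20)]
# 2D lattice network described later in the text: three layers of six 2 x 2 kernels.
lattice_2d = [(2, 2, 2, 6), (2, 2, 6, 6), (2, 2, 6, 6)]

print(count_parameters(chain_1d))     # 438
print(count_parameters(lattice_2d))   # 350
```

With the same counting rule the three-layer network used for the 2D Heisenberg model comes out at 350 parameters, in agreement with the number given for that model.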

We found that the initialization of the parameters of the convolutional layers has a big impact on the performance. We initialized the kernels following a normal distribution with zero mean and a standard deviation \(\sqrt{2/{v}_{n}}\), with *v*_{n} being the number of parameters in the *n*-th layer, in order to control the variance throughout the network^{32}. We then further initialized the parameters by pre-training on a 6-site chain. During optimization, the sums in Eq. (11) were evaluated using a sample size of 1024 for 5000 to 20000 iterations until convergence. The final observables were computed according to Eq. (14) with 100000 samples to reduce the variance.
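The initialization rule can be written down in a few lines. This is a minimal sketch that takes *v*_{n} literally as the total number of entries in the kernel tensor of a layer; the function name `init_kernel` and the shape convention (*X*, *Y*, *C*, number of kernels) are assumptions made for illustration.

```python
import numpy as np

def init_kernel(rng, shape):
    """Zero-mean normal with standard deviation sqrt(2 / v_n),
    where v_n is the number of parameters in the layer."""
    v_n = int(np.prod(shape))
    return rng.normal(loc=0.0, scale=np.sqrt(2.0 / v_n), size=shape)

rng = np.random.default_rng(0)
layer_shapes = [(3, 1, 2, 6), (3, 1, 6, 20)]       # the two layers of the 1D network
kernels = [init_kernel(rng, shape) for shape in layer_shapes]

for shape, K in zip(layer_shapes, kernels):
    target = np.sqrt(2.0 / K.size)
    print(shape, K.size, round(target, 4))
```

Layers with more parameters start with proportionally smaller weights, which keeps the magnitude of the feature maps comparable from layer to layer.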

Interestingly, with RBM it was not possible (contrary to the CNN) to achieve an accurate result for certain parameter ranges with reasonable effort, even for such a small system of 6 spins, as can be seen in Fig. 3, where we compare 〈*σ*^{x}〉 and the cost function Eq. (11) during the optimization. This is probably due to the limited expressivity of the purified RBM description. The computational effort per iteration for different chain lengths is plotted in Fig. 4, measured on a typical notebook computer with 8 CPU cores. Both RBM and CNN show polynomial scaling, and the RBM is about a factor of 2 faster to evaluate than our particular CNN implementation. However, a difference in convergence behaviour can outweigh this, as Fig. 3 demonstrates.

The constant number of parameters in principle makes it easier to scale to larger systems. For 30 spins we obtained expectation values similar to those depicted in Fig. 2, such as 〈*σ*^{x}〉 = 0.27 at *g*/*γ* = 2. Since in this setup the exact solution of these observables has no strong dependence on the chain length, this result seems plausible. For the 1D spin chain we found comparable results with network architectures without the mean pooling layer, but then the final dense layer has more parameters, depending on the spatial dimension of the input, which leads to a higher computational effort. The advantage of the architecture without pooling is that one can treat systems that are not translation invariant, for example without periodic boundary conditions, such as an asymmetrically driven dissipative spin chain as in ref. ^{17}.

### Transverse-field Ising model with rotated Hamiltonian

To further test our approach, we address the 1D dissipative transverse field Ising model with rotated Hamiltonian

and unchanged dissipative jump operators \({L}_{j}={\sigma }_{j}^{-}\). This was investigated, for example, in the case of a 1D array of coupled optical cavities^{33,34}, and previous results suggested that neural network quantum states had difficulty obtaining correct results^{35}. To compare with the literature, a spin chain with open boundary conditions is used, which also addresses the interesting question of whether systems without translation invariance can be solved by the CNN. We set *V* = −2 and *γ* = 1 and look at the correlation functions \(\langle {\sigma }_{j}^{x}{\sigma }_{j+l}^{x}\rangle\) for varying magnetic field *g*. Observables in the middle of the chain with *j* = *N*/2 are investigated to avoid edge effects^{34}. In Fig. 5 we compare the results for *N* = 8 spins, obtained by optimizing the same CNN model as described in the previous section, to exact values. The exact diagonalization results are reproduced with good accuracy. We found good results with or without the pooling layer, suggesting that edge effects played no major role. To analyse the positivity of the approximate density matrix, all positive and negative eigenvalues are shown in the bottom panel. The largest positive eigenvalue is more than 4 orders of magnitude larger than the largest absolute negative eigenvalue, and the sum of all positive eigenvalues amounts to 1.000042 whereas the trace is 1, indicating that the optimization retrieved an essentially positive matrix. For the \(\langle {\sigma }_{j}^{z}{\sigma }_{j+1}^{z}\rangle\) expectation values, for which some difficulties were recently reported even for a 6-spin chain^{35}, we achieved relative errors with respect to exact diagonalization of no more than 0.3%.
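The positivity diagnostics mentioned above, namely the sum of the positive eigenvalues compared to the trace and the ratio between the largest positive and the largest negative eigenvalue, can be computed as in the following sketch. The test matrix is synthetic, with made-up eigenvalues chosen only for illustration; it is not data from this work, and the function name `positivity_report` is hypothetical.

```python
import numpy as np

def positivity_report(rho):
    """Eigenvalue diagnostics for an approximately positive Hermitian matrix."""
    rho = rho / np.trace(rho)                              # normalize to unit trace
    eig = np.linalg.eigvalsh((rho + rho.conj().T) / 2)     # real eigenvalues
    pos, neg = eig[eig > 0], eig[eig < 0]
    return {
        "sum_positive": pos.sum(),
        "sum_negative_abs": np.abs(neg).sum(),
        "largest_positive": pos.max(),
        "largest_negative_abs": np.abs(neg).max() if neg.size else 0.0,
    }

# Synthetic example: a Hermitian matrix with one slightly negative eigenvalue.
rng = np.random.default_rng(5)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))
rho = Q @ np.diag([0.7, 0.3001, -0.0001]) @ Q.conj().T

report = positivity_report(rho)
print(report)
```

Since the trace is fixed to 1, the amount by which the sum of the positive eigenvalues exceeds 1 equals the total weight of the negative eigenvalues, so a value close to 1 signals an essentially positive matrix.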

Scaling to larger systems, in Fig. 6 we plot the \(\langle {\sigma }_{j}^{x}{\sigma }_{j+l}^{x}\rangle\) correlation function for 40 spins at the critical points *g* = ± 1 to investigate long-range correlations. In order to compare with the numerical matrix product operator results by ref. ^{34} we use a smaller dissipation *γ* = 1/2. The anti-ferromagnetic ordering of the *x*-components at positive *g* and the *π* rotation for positive and negative fields are perfectly reproduced and our results are generally in good agreement with the reference values.

We noticed that the initialization of parameters can have a major impact on the speed of convergence. Initializing the model with pre-trained parameters of a smaller chain improves the convergence of larger models. This can be seen in Fig. 7 where we plot the convergence of the cost function, its variance and the nearest-neighbour correlation function with and without pre-training for 16 spins. A good approximation of the observable is already obtained after very few iterations, while the residue of the Lindblad equation still decreases further. We also show the larger 40 spin chain, which is more difficult to optimize but displays a similar convergence.

A stronger dissipation may also lead to faster convergence and can be used in a pre-training step. As a numerical detail, we found better convergence when rotating the computational basis to the *x*-axis. In this case the Hamiltonian from Eq. (6) is retrieved, but with rotated dissipation operators and observables.

### 2D dissipative Heisenberg model

To demonstrate expanding the network to higher dimensional systems, we look at the 2D dissipative Heisenberg model with periodic boundary conditions. The Hamiltonian reads

Following the setup in ref. ^{19}, a uniform dissipation rate *γ* for the jump operators \({L}_{j}={\sigma }_{j}^{-}\) and *J*_{x} = 0.9*γ*, *J*_{z} = *γ* are set. In Fig. 8 the steady-state results of the *σ*^{z} expectation value for a 3 × 3 lattice are plotted for different values of *J*_{y}/*γ*, obtained by optimizing the CNN ansatz and compared to exact values from ref. ^{19}. Using only 350 variational parameters, the CNN achieves an accuracy comparable to the variational POVM solution of ref. ^{19}. In that work, the variational results were improved by running real-time evolution steps starting from the final optimized state. This could potentially be a method for further improving an already converged CNN result as well. We also plot the results we obtained for 4 × 4 and 6 × 6 lattices, which indicate a decrease in the absolute *z* expectation values with system size. These calculations were again initialized with pre-trained parameters.

Here we chose a smaller 2 × 2 kernel size in the physical dimensions and 3 convolutional layers with 6 kernels each, followed by a mean pooling and a fully connected layer, analogous to Fig. 1. Due to the symmetry of the Hamiltonian and the Lindblad operators, \(\rho ({{\boldsymbol{\sigma }}},{{{\boldsymbol{\sigma }}}}^{{\prime} })\) is nonzero only in sectors where \({\sum }_{i}({\sigma }_{i}-{\sigma }_{i}^{{\prime} })=2n\) with \(n\in {\mathbb{Z}}\). Following ref. ^{17}, this restriction can be implemented in the Monte-Carlo sampling by proposing only allowed configurations, leading to a faster convergence. We also rescaled the Monte-Carlo weights from Eq. (12) using ∣*ρ*∣^{2β} with *β* between 0.2 and 0.5 to better cover the configuration space, as described in ref. ^{17}. This new probability distribution is easier to sample from, as can be seen in Fig. 9, where for a 2 × 2 lattice the rescaled exact density matrix ∣*ρ*∣^{2β} is displayed, reordered according to the total spin of the configurations to show the allowed sectors. We again used a sample size of 1024 during the optimization.
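The effect of the exponent *β* on the sampling distribution can be illustrated with a toy example. The sketch below uses a random matrix with widely varying magnitudes as a stand-in for *ρ*, and it assumes that averages with respect to the original weights are recovered by a standard importance-sampling correction with weights ∣*ρ*∣^{2(1−β)}; the details of the procedure in ref. ^{17} may differ.

```python
import numpy as np

def rescaled_distribution(rho, beta):
    """Probability distribution proportional to |rho|^(2 beta)."""
    w = np.abs(rho) ** (2 * beta)
    return w / w.sum()

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

rng = np.random.default_rng(6)
# Toy stand-in for rho(sigma, sigma') with magnitudes spanning several orders.
rho = np.exp(3 * rng.normal(size=(16, 16))) * np.exp(1j * rng.uniform(0, 2 * np.pi, size=(16, 16)))
f = rng.normal(size=(16, 16))               # arbitrary quantity to be averaged

p = rescaled_distribution(rho, 1.0)
target = (p * f).sum()                      # average under the original weights |rho|^2

entropies = {}
estimates = {}
for beta in (1.0, 0.5, 0.2):
    q = rescaled_distribution(rho, beta)
    w = np.abs(rho) ** (2 * (1 - beta))     # importance weights (assumed correction)
    entropies[beta] = entropy(q)
    estimates[beta] = (q * w * f).sum() / (q * w).sum()

print(entropies)
print(target, estimates)
```

Smaller *β* gives a flatter distribution with higher entropy, so rarely visited configurations are proposed more often, while the reweighted average stays the same.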

## Conclusions

We demonstrated how a deep neural network ansatz can improve upon previous variational results by parametrizing the mixed density matrix directly and not enforcing positivity. We rearranged the left and right Hilbert space of the spin configurations, which enabled a simple convolutional network architecture to efficiently capture the NESS of the dissipative transverse-field Ising model with considerably fewer parameters than neural density operator ansatz functions based on RBM. Furthermore, with the same architecture we successfully obtained correct solutions for a rotated Hamiltonian. We also exemplified how to expand the model to 2D systems and reported some results for the dissipative Heisenberg model, applying transfer learning to accelerate computations for larger systems. These results encourage the exploration of other powerful neural network architectures to represent mixed density matrices without the explicit constraint of positivity, which RBM density operators were designed around.

Convolutions capture local correlations only, but by varying the kernel sizes and the depth of the network, it can be tuned to better express longer-range correlations. The simplicity of the network architecture makes it easily expandable and possibly interpretable, as the first-layer kernels, for example, should learn important spin-spin correlations that are then connected to each other in the following layers.

By introducing a pooling layer over all physical dimensions, translation invariance is enforced at no additional cost and the number of parameters is reduced at the same time. This, combined with the fact that the convolutional layers have no size-dependent fixed connections, enables transfer learning, i.e. using the same set of parameters as initialization for different physical system sizes. While this leads to a good fit for translation-invariant models, it also encourages looking into CNN density matrices for physical models without such symmetries by leaving out the pooling step to keep the locality information in the network. Designing and applying complex-valued networks and parameters could be another interesting area for further investigation.

## Methods

### Optimizing for the nonequilibrium steady-state density operator

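The dynamics of an open quantum system with Hamiltonian *H* coupled to a Markovian environment is described by the Lindblad master equation

\[\frac{d\widehat{\rho }}{dt}={{\mathcal{L}}}\widehat{\rho }=-i[H,\widehat{\rho }]+\sum_{k}{\gamma }_{k}\left({L}_{k}\widehat{\rho }{L}_{k}^{{\dagger} }-\frac{1}{2}\left\{{L}_{k}^{{\dagger} }{L}_{k},\widehat{\rho }\right\}\right)\qquad (9)\]

with the jump operators *L*_{k} leading to non-unitary dissipation. The non-equilibrium steady-state (NESS) density matrix \(\widehat{\rho }\), for which \(d\widehat{\rho }/dt=0\) and which is reached in the long-time limit, can be obtained directly via a variational scheme by minimizing a norm of the time derivative in Eq. (9)^{36}.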
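For very small systems the steady state can be obtained exactly, which is useful as a reference for the variational scheme. The following sketch builds the Lindblad superoperator as a matrix acting on the row-major flattened density matrix and extracts its null vector. The Hamiltonian prefactors (*V*/4 for the *σ*^{z}*σ*^{z} coupling and *g*/2 for the transverse field) are an assumed convention, and all function names are hypothetical.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
sm = (sx - 1j * sy) / 2                          # sigma^- as defined in the text

def site_op(op, j, n):
    """Embed a single-site operator at site j of an n-site chain."""
    mats = [np.eye(2, dtype=complex)] * n
    mats[j] = op
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

def tfi_hamiltonian(n, V, g):
    """Periodic chain; the prefactors V/4 and g/2 are an assumed convention."""
    H = np.zeros((2**n, 2**n), dtype=complex)
    for j in range(n):
        H += V / 4 * site_op(sz, j, n) @ site_op(sz, (j + 1) % n, n)
        H += g / 2 * site_op(sx, j, n)
    return H

def liouvillian(H, jumps, gammas):
    """Matrix of the Lindblad superoperator for row-major flattening of rho."""
    d = H.shape[0]
    I = np.eye(d)
    L = -1j * (np.kron(H, I) - np.kron(I, H.T))
    for Lk, gk in zip(jumps, gammas):
        LdL = Lk.conj().T @ Lk
        L += gk * (np.kron(Lk, Lk.conj())
                   - 0.5 * np.kron(LdL, I) - 0.5 * np.kron(I, LdL.T))
    return L

n, V, g, gamma = 3, 2.0, 1.0, 1.0
H = tfi_hamiltonian(n, V, g)
jumps = [site_op(sm, j, n) for j in range(n)]
Lmat = liouvillian(H, jumps, [gamma] * n)

# The steady state is the right singular vector with (numerically) zero singular value.
_, singular_values, Vh = np.linalg.svd(Lmat)
rho_ss = Vh[-1].conj().reshape(2**n, 2**n)
rho_ss = rho_ss / np.trace(rho_ss)

def cost(rho):
    """Squared L2 norm of the time derivative, normalized by the norm of rho."""
    v = rho.reshape(-1)
    return np.linalg.norm(Lmat @ v) ** 2 / np.linalg.norm(v) ** 2

print(cost(rho_ss), cost(np.eye(2**n) / 2**n))
```

The cost vanishes at the exact steady state and is strictly positive for other matrices such as the maximally mixed state, which is the property the variational optimization exploits.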

Neural networks as variational ansatz for the density operator \({\widehat{\rho }}_{{{\boldsymbol{\theta }}}}={\sum }_{{{\boldsymbol{\sigma }}}{{{\boldsymbol{\sigma }}}}^{{\prime} }}{\rho }_{{{\boldsymbol{\theta }}}}({{\boldsymbol{\sigma }}},{{{\boldsymbol{\sigma }}}}^{{\prime} })\left\vert {{\boldsymbol{\sigma }}}\right\rangle \left\langle {{{\boldsymbol{\sigma }}}}^{{\prime} }\right\vert\) parametrized by the set of parameters **θ**, with the complete many-body basis of spin-1/2 configurations \(\left\vert {{\boldsymbol{\sigma }}}\right\rangle\), have been used to find the NESS solution to different open quantum spin systems^{11,12,13,14,15}. We use the *L*_{2} norm as the cost function to be minimized, as described in ref. ^{14}

This function and its derivative with respect to the parameters **θ** can then be evaluated as the statistical expectation value over the probability distribution

using Monte-Carlo samples. This avoids the first sum over the entire Hilbert space, while the inner sum in Eq. (11), which adds up the sparse Lindblad matrix elements \({{{\mathcal{L}}}}_{{{\boldsymbol{\sigma }}}{{{\boldsymbol{\sigma }}}}^{{\prime} }\tilde{{{\boldsymbol{\sigma }}}}{\tilde{{{\boldsymbol{\sigma }}}}}^{{\prime} }}\), can usually be carried out exactly. The parameters are then iteratively updated to find the steady state as the solution to the optimization problem
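The way the outer sum is replaced by sampling can be checked on a toy problem. The sketch below assumes the commonly used form in which the cost is the average of ∣Σ ℒ *ρ* / *ρ*(**σ**, **σ**′)∣^2 over configurations drawn with probability proportional to ∣*ρ*(**σ**, **σ**′)∣^2. A random sparse matrix serves as a hypothetical stand-in for the Lindblad matrix and a random vector for the flattened ansatz.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4                                              # Hilbert-space dimension of a toy system
dim = d * d                                        # number of (sigma, sigma') pairs

# Hypothetical stand-ins: a random sparse "Lindblad" matrix and a random ansatz rho.
mask = rng.random((dim, dim)) < 0.2
L = (rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))) * mask
rho = rng.normal(size=dim) + 1j * rng.normal(size=dim)    # flattened rho(sigma, sigma')

def local_estimator(idx):
    """Inner sum over the connected configurations, divided by rho at the sampled one."""
    row = L[idx]
    connected = np.nonzero(row)[0]                 # only nonzero matrix elements contribute
    return (row[connected] @ rho[connected]) / rho[idx]

p = np.abs(rho) ** 2 / np.sum(np.abs(rho) ** 2)    # sampling distribution
local_sq = np.array([abs(local_estimator(i)) ** 2 for i in range(dim)])

exact = np.linalg.norm(L @ rho) ** 2 / np.linalg.norm(rho) ** 2
summed = p @ local_sq                              # full outer sum, no sampling
samples = rng.choice(dim, size=20000, p=p)
monte_carlo = local_sq[samples].mean()             # stochastic estimate of the same sum

print(exact, summed, monte_carlo)
```

The full sum reproduces the normalized squared norm exactly, and the Monte-Carlo average approximates it using only the sampled configurations, each of which requires only the few nonzero matrix elements in its row.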

To update the parameters, the stochastic reconfiguration (SR) method^{37} is often used for optimizing neural quantum states^{1}, where a system of equations is solved in each iteration to adapt the metric to the current cost surface. However, we find improved convergence and often better results using a backtracking Nesterov accelerated gradient descent optimization scheme as described in ref. ^{17} (NAGD+), especially when optimizing deeper neural networks. With automatic differentiation, the gradients of Eq. (11) are evaluated using the same Monte-Carlo samples.
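To give an idea of the type of update rule involved, the following is a generic sketch of Nesterov accelerated gradient descent with a simple backtracking step-size rule, applied to a toy quadratic cost. It is not the NAGD+ scheme of ref. ^{17}, whose details are not reproduced here; the function name, the sufficient-decrease condition and all hyperparameters are assumptions.

```python
import numpy as np

def nesterov_backtracking(cost, grad, theta0, lr=1.0, momentum=0.9,
                          shrink=0.5, n_iter=300):
    """Nesterov momentum with a monotonically shrinking step size."""
    theta = np.array(theta0, dtype=float)
    velocity = np.zeros_like(theta)
    for _ in range(n_iter):
        lookahead = theta + momentum * velocity     # Nesterov look-ahead point
        g = grad(lookahead)
        c0 = cost(lookahead)
        # Backtracking: shrink the step until a sufficient-decrease condition holds.
        while cost(lookahead - lr * g) > c0 - 0.5 * lr * (g @ g) and lr > 1e-12:
            lr *= shrink
        new_theta = lookahead - lr * g
        velocity = new_theta - theta
        theta = new_theta
    return theta, lr

# Toy quadratic cost with condition number 10 as a stand-in for the real cost surface.
A = np.diag([1.0, 10.0])
cost = lambda th: 0.5 * th @ A @ th
grad = lambda th: A @ th

theta_opt, final_lr = nesterov_backtracking(cost, grad, [1.0, 1.0])
print(cost(theta_opt), final_lr)
```

In the variational setting the exact cost and gradient would be replaced by their Monte-Carlo estimates, so a backtracking test of this kind has to tolerate statistical noise.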

Once the optimization is converged, for a given set of parameters the expectation values of physical observables \(\widehat{O}\) are computed as an expectation value

over Monte-Carlo samples from the probability distribution *p*(**σ**) = *ρ*(**σ**, **σ**). This way, the summands are independent of the normalization of *ρ*. Again the inner sum can typically be carried out exactly. A large part of the computational effort is taken up by the Monte-Carlo sampling, which, however, can be highly parallelized and in some cases accelerated by implicitly enforcing symmetries during sampling^{17}.
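The estimator for observables can be verified on a small example. The sketch below assumes the usual form in which Tr(*ρO*)/Tr(*ρ*) is written as an average over diagonal configurations **σ** of the local quantity Σ_{σ′} *ρ*(**σ**, **σ**′)*O*(**σ**′, **σ**)/*ρ*(**σ**, **σ**). The density matrix and the observable are random toy matrices, not quantities from this work.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 8
M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = 3.7 * (M @ M.conj().T)          # unnormalized, Hermitian, positive toy density matrix
O = np.diag(rng.choice([-1.0, 1.0], size=d)) + 0.3 * (M + M.conj().T)   # Hermitian observable

def local_observable(s, rho):
    """Inner sum over sigma' for a sampled diagonal configuration s."""
    return (rho[s, :] @ O[:, s]) / rho[s, s]

p = rho.diagonal().real / np.trace(rho).real       # p(sigma) proportional to rho(sigma, sigma)
local = np.array([local_observable(s, rho) for s in range(d)])

exact = (np.trace(rho @ O) / np.trace(rho)).real
summed = (p @ local).real                          # full sum over diagonal configurations
samples = rng.choice(d, size=50000, p=p)
monte_carlo = local[samples].mean().real

# Rescaling rho leaves every summand unchanged.
local_rescaled = np.array([local_observable(s, 10.0 * rho) for s in range(d)])
print(exact, summed, monte_carlo)
```

Because each summand is a ratio of density-matrix elements, the result does not depend on the normalization of *ρ*, which is why an unnormalized network output can be used directly.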

## Data availability

All relevant data are available from the corresponding author upon request.

## Code availability

The source code is available at ref. ^{38}. The implementation was based on the open source libraries Jax^{39} and NetKet^{40}.

## References

1. Carleo, G. & Troyer, M. Solving the quantum many-body problem with artificial neural networks. *Science* **355**, 602 (2017).
2. Hibat-Allah, M., Ganahl, M., Hayward, L. E., Melko, R. G. & Carrasquilla, J. Recurrent neural network wave functions. *Phys. Rev. Res.* **2**, 023358 (2020).
3. Choo, K., Neupert, T. & Carleo, G. Study of the two-dimensional frustrated J1-J2 model with neural network quantum states. *Phys. Rev. B* **100**, 125124 (2019).
4. Choo, K., Mezzacapo, A. & Carleo, G. Fermionic neural-network states for ab-initio electronic structure. *Nat. Commun.* **11**, 2368 (2020).
5. Schmitt, M. & Heyl, M. Quantum many-body dynamics in two dimensions with artificial neural networks. *Phys. Rev. Lett.* **125**, 100503 (2020).
6. Hermann, J. Deep-neural-network solution of the electronic Schrödinger equation. *Nat. Chem.* **12**, 11 (2020).
7. Pfau, D., Spencer, J. S., Matthews, A. G. D. G. & Foulkes, W. M. C. Ab initio solution of the many-electron Schrödinger equation with deep neural networks. *Phys. Rev. Res.* **2**, 033429 (2020).
8. Gerard, L., Scherbela, M., Marquetand, P. & Grohs, P. Gold-standard solutions to the Schrödinger equation using deep learning: how much physics do we need? *Adv. Neural Inf. Process. Syst.* **35**, 10282 (2022).
9. Carrasquilla, J. & Torlai, G. How to use neural networks to investigate quantum many-body physics. *PRX Quantum* **2**, 040201 (2021).
10. Roth, C., Szabó, A. & MacDonald, A. H. High-accuracy variational Monte Carlo for frustrated magnets with deep neural networks. *Phys. Rev. B* **108**, 054410 (2023).
11. Torlai, G. & Melko, R. G. Latent space purification via neural density operators. *Phys. Rev. Lett.* **120**, 240503 (2018).
12. Hartmann, M. J. & Carleo, G. Neural-network approach to dissipative quantum many-body dynamics. *Phys. Rev. Lett.* **122**, 250502 (2019).
13. Nagy, A. & Savona, V. Variational quantum Monte Carlo method with a neural-network ansatz for open quantum systems. *Phys. Rev. Lett.* **122**, 250501 (2019).
14. Vicentini, F., Biella, A., Regnault, N. & Ciuti, C. Variational neural network ansatz for steady states in open quantum systems. *Phys. Rev. Lett.* **122**, 250503 (2019).
15. Yoshioka, N. & Hamazaki, R. Constructing neural stationary states for open quantum many-body systems. *Phys. Rev. B* **99**, 214306 (2019).
16. Kaestle, O. & Carmele, A. Sampling asymmetric open quantum systems for artificial neural networks. *Phys. Rev. B* **103**, 195420 (2021).
17. Mellak, J., Arrigoni, E., Pock, T. & von der Linden, W. Quantum transport in open spin chains using neural-network quantum states. *Phys. Rev. B* **107**, 205102 (2023).
18. Schmale, T., Reh, M. & Gärttner, M. Efficient quantum state tomography with convolutional neural networks. *npj Quantum Inf.* **8**, 115 (2022).
19. Luo, D., Chen, Z., Carrasquilla, J. & Clark, B. K. Autoregressive neural network for simulating open quantum systems via a probabilistic formulation. *Phys. Rev. Lett.* **128**, 090501 (2022).
20. Zhao, H., Carleo, G. & Vicentini, F. Empirical sample complexity of neural network mixed state reconstruction. *Quantum* **8**, 1358 (2024).
21. Gao, X. & Duan, L.-M. Efficient representation of quantum many-body states with deep neural networks. *Nat. Commun.* **8**, 662 (2017).
22. Levine, Y., Sharir, O., Cohen, N. & Shashua, A. Quantum entanglement in deep learning architectures. *Phys. Rev. Lett.* **122**, 065301 (2019).
23. Nomura, Y., Yoshioka, N. & Nori, F. Purifying deep Boltzmann machines for thermal quantum states. *Phys. Rev. Lett.* **127**, 060601 (2021).
24. Vicentini, F., Rossi, R. & Carleo, G. Positive-definite parametrization of mixed quantum states with deep neural networks. *arXiv:2206.13488* https://doi.org/10.48550/arXiv.2206.13488 (2022).
25. Cui, J., Cirac, J. I. & Bañuls, M. C. Variational matrix product operators for the steady state of dissipative quantum systems. *Phys. Rev. Lett.* **114**, 220601 (2015).
26. Saito, H. & Kato, M. Machine learning technique to find quantum many-body ground states of bosons on a lattice. *J. Phys. Soc. Japan* **87**, 014001 (2018).
27. Pescia, G., Han, J., Lovato, A., Lu, J. & Carleo, G. Neural-network quantum states for periodic systems in continuous space. *Phys. Rev. Res.* **4**, 023138 (2022).
28. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. *Nature* **521**, 436 (2015).
29. LeCun, Y. et al. Handwritten digit recognition with a back-propagation network. *Adv. Neural Inf. Process. Syst.* **2**, 396 (1989).
30. Nomura, Y. Helping restricted Boltzmann machines with quantum-state representation by restoring symmetry. *J. Phys. Condensed Matter* **33**, 174003 (2021).
31. Nigro, D. Invariant neural network ansatz for weakly symmetric open quantum lattices. *Phys. Rev. A* **103**, 062406 (2021).
32. He, K., Zhang, X., Ren, S. & Sun, J. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In *2015 IEEE International Conference on Computer Vision (ICCV)* 1026–1034 (IEEE, Santiago, Chile, 2015).
33. Bardyn, C.-E. & İmamoglu, A. Majorana-like modes of light in a one-dimensional array of nonlinear cavities. *Phys. Rev. Lett.* **109**, 253606 (2012).
34. Joshi, C., Nissen, F. & Keeling, J. Quantum correlations in the one-dimensional driven dissipative XY model. *Phys. Rev. A* **88**, 063835 (2013).
35. Kothe, S. & Kirton, P. Liouville-space neural network representation of density matrices.

*Phys. Rev. A***109**, 062215 (2024).Weimer, H. Variational principle for steady states of dissipative quantum many-body systems.

*Phys. Rev. Lett.***114**, 040402 (2015).Sorella, S., Casula, M. & Rocca, D. Weak binding between two aromatic rings: feeling the van der Waals attraction by quantum Monte Carlo methods.

*J. Chem. Phys.***127**, 014105 (2007).Mellak, J. NAGDopen Z

*enodo*https://doi.org/10.5281/zenodo.12758824 (2024).Bradbury, J. et al.

*JAX: Composable Transformations of Python+NumPy Programs*https://github.com/google/jax0/ (2018).Vicentini, F. et al. NetKet 3: Machine learning toolbox for many-body quantum systems.

*SciPost Physics Codebases*https://doi.org/10.21468/SciPostPhysCodeb.7 (2022b).

## Acknowledgements

We would like to thank Thomas Pock for providing his expertise in insightful discussions. This research was funded in part by the Austrian Science Fund (FWF) [Grant DOI:10.55776/P33165] and by NaWi Graz. For the purpose of open access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.

## Author information

### Contributions

J.M. developed and performed the calculations in collaboration with E.A. and W.vdL., who both supervised the project and contributed to the discussion. All authors participated in the preparation of the manuscript.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

## Peer review

### Peer review information

*Communications Physics* thanks Sirui Lu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

## About this article

### Cite this article

Mellak, J., Arrigoni, E. & von der Linden, W. Deep neural networks as variational solutions for correlated open quantum systems.
*Commun Phys* **7**, 268 (2024). https://doi.org/10.1038/s42005-024-01757-9

DOI: https://doi.org/10.1038/s42005-024-01757-9
