Noise-assisted variational quantum thermalization

Preparing thermal states on a quantum computer can have a variety of applications, from simulating many-body quantum systems to training machine learning models. Variational circuits have been proposed for this task on near-term quantum computers, but several challenges remain, such as finding a scalable cost-function, avoiding the need of purification, and mitigating noise effects. We propose a new algorithm for thermal state preparation that tackles those three challenges by exploiting the noise of quantum circuits. We consider a variational architecture containing a depolarizing channel after each unitary layer, with the ability to directly control the level of noise. We derive a closed-form approximation for the free-energy of such circuit and use it as a cost function for our variational algorithm. By evaluating our method on a variety of Hamiltonians and system sizes, we find several systems for which the thermal state can be approximated with a high fidelity. However, we also show that the ability for our algorithm to learn the thermal state strongly depends on the temperature: while a high fidelity can be obtained for high and low temperatures, we identify a specific range for which the problem becomes more challenging. We hope that this first study on noise-assisted thermal state preparation will inspire future research on exploiting noise in variational algorithms.

Noise is often considered to be one of the strongest adversaries of practical quantum computation. Decoherence effects due to a noisy environment can create errors in the final output of a circuit, destroying the advantage of many quantum algorithms. In contrast, noise is also what underlies stochastic processes, and is therefore a crucial element in classical computing, solving tasks such as sampling and optimization. In quantum systems, noise has also been shown to be a useful resource in several applications: carefully engineered dissipative processes can lead to universal quantum computation 1 , shot-noise in the measurement process can drive variational algorithms out of local minima 2,3 , and amplitude-damping channels can significantly improve quantum autoencoders for mixed states 4 . We investigate in the present paper a novel way to exploit noise in near-term quantum devices, with the objective of studying a central task in quantum computing: thermal state preparation. Placing a quantum system driven by a Hamiltonian H and weakly-coupled to a reservoir with an effective temperature T = 1 β , the system will asymptotically reach a thermal equilibrium state, given by the quantum Gibbs distribution where Z = Tr[e −βH ] is the partition function 5 . Efficiently preparing a thermal state on a quantum computer is a problem of broad practical importance, with applications ranging from quantum chemistry and many-body physics simulations in an open environment [6][7][8] to semi-definite programming 9,10 and quantum machine learning 11,12 . However, sampling from a general Gibbs distribution is a computationally hard task for classical computers, due to the complexity of calculating the partition function 13 . Most techniques rely on Monte-Carlo Markov Chain (MCMC) algorithms, which are often provably efficient only above a certain threshold temperature 14 .
Many algorithms have been proposed to prepare the thermal state on a quantum computer. A growing body of work has suggested using variational algorithms to solve the task of thermal state preparation on Noisy Intermediate Scale Quantum (NISQ) devices. Since a unitary circuit acting on the zero-state cannot directly output a mixed state, most variational thermalization methods consist either in preparing a purification of the thermal state and tracing out the ancillary qubits at the end of the circuit [15][16][17][18] , or in choosing an appropriate mixed state as input [19][20][21] .
(1) www.nature.com/scientificreports/ One of the main challenges associated to those methods is to design an appropriate cost function to be minimized during the variational training loop. While the ground-state of a Hamiltonian can be prepared by minimizing the average energy of the state, the thermal state can be prepared by minimizing the free energy F = H − TS of the state, where S = −Tr[ρ log(ρ)] is its Von Neumann entropy. However, the Von Neumann entropy is not an observable and can often only be computed approximately 18,22 . A second problem is the need for additional qubits, which can be costly in near-term devices. Finally, none of those methods take into account the noise of the circuit, which can change the spectrum of the final state and affect the performance of the preparation algorithm 23 .
In this paper, we propose a new method that we call Noise-Assisted Variational Quantum Thermalizer (NAVQT). Our algorithm assumes the ability to control the noise in the system down to some minimal noise level determined by the hardware. This type of control has been demonstrated in the context of error mitigation, where noise is increased in order to perform zero-noise extrapolation 24,25 . More precisely, we construct a variational circuit with a parametrized depolarizing channel after each layer of unitary gates, as illustrated in Fig. 1(a). To simplify the optimization process, we have only considered the case where all the depolarizing parameters take the same value. By varying both gate and noise parameters, we seek to minimize the free energy of the final state.
In order to compute the free energy (and its gradient), we derive an analytical expression for the entropy of a slightly different circuit: one where all the depolarizing gates have been displaced at the beginning of the circuit, as shown in Fig. 1(b). Using this approximation, we can compute the gradient of the free energy with respect to both the noise and the unitary parameters. While this might be a rough estimate of the actual gradient, we show that this approximate optimization problem exhibits similar performance as when minimizing the true free energy.
We then empirically investigate our algorithm on three different types of Hamiltonians: the Ising chain, with and without a transverse field, and the Heisenberg model. For each model, we consider both uniform coefficients and coefficients drawn from a standard normal distribution, and train our variational algorithm for several choices of hyperparameters (number of layers, learning rates, initialization, etc.). To study the performance of our approach, we extract the fidelity of the prepared state compared to the actual thermal state for a range of different temperatures.
Our results reveal different patterns. On the one hand, fidelities above 0.9 are reached for uniform Ising chains, with and without a transverse field, for all temperatures and system sizes up to 7 qubits. On the other hand, the performance tend to decrease with the system size and for specific ranges of temperatures, with fidelities that can get below 0.7 for some of the most complex systems tested in this work.
Our paper is organized as follows. We start by reviewing previous work on variational thermalization in "Related work" section. We then introduce NAVQT in "Noise-assisted variational quantum thermalization" section. We follow this up by a description of our experiments in "Methods" section, and present our results in "Results" section. Finally, we discuss our work and provide ideas for future studies in "Discussion" section.

Related work
Variational circuits have recently been proposed for thermal state preparation, due to the existence of a natural cost function for this task: the free energy. Using variational circuits to prepare a thermal state presents two challenges specific to this task: (1) finding an ansatz that can prepare mixed states, (2) finding a scalable optimization strategy.
Choice of the ansatz. A common approach to VQT consists in preparing a purification of the thermal state using a variational circuit that acts on 2N qubits-N system qubits and N ancilla/environment qubits-, and tracing the ancilla qubits out at the end of the circuit [15][16][17][18] . An example of purification often considered in the literature is the thermofield double (TFD) state 15,16 . For a Hamiltonian H and an inverse temperature β , it is given by  One advantage of this approach is the ability to simulate the TFD, which can be interesting in in its own right, for instance for studying black holes 26 . The obvious disadvantage is that it requires twice as many qubits that the thermal state we want to simulate. A converse approach consists in starting with a mixed state ρ 0 and applying a unitary circuit ansatz on the N qubits of the system. The initial ρ 0 can either be fixed 19 or modified during the optimization process 20,21 . In Ref. 19 , ρ 0 is the fixed thermal state of H I = N i=1 Z i , where Z i is the Pauli Z operator applied to qubit i of the system. It can easily be prepared using the purification However, since the spectrum does not change when we apply the unitary ansatz, having a static ρ 0 freezes the spectrum of the final state. Therefore, if the spectrum of the thermal state we want to approximate is far from the spectrum of ρ 0 , this approach will fail. In Ref. 20 , they use the thermal state 2 as an initial state and ε = {ε 1 , . . . , ε n } are parameters optimized during the training process. Finally, Ref. 21 proposes to use a unitary with stochastic parameters to prepare ρ 0 . More precisely, where V (x) is a unitary ansatz and X θ ∼ p θ is a random vector with parametrized density p θ . The density p θ can be given by a classical model, such as an energy-based model (e.g. restricted Boltzmann machine) or a normalizing flow, which will be trained to get a ρ 0 with a spectrum close to the thermal state of interest.
Optimization strategies. Once the ansatz has been fixed, the parameters within needs to be optimized.
Two main approaches have been proposed in the literature: (1) explicitly minimizing the free energy, (2) using imaginary-time evolution. In the following, we describe both these methods.
Free energy methods. The thermal state is the density matrix that minimizes the free energy. Therefore, in the same way as VQE uses the energy as a cost function, any thermal state preparation method can use the free energy as its cost function 15,16,19,21 . However, one main difference with VQE is that the free energy cannot be easily estimated. Indeed, the Von Neumann entropy term, as a non-linear function of ρ , cannot be turned into an observable, and doing a full quantum state tomography would be very costly. Several methods have been proposed to solve this challenge: • Computing several Renyi entropies S α = 1 1−α Tr[ρ α ] (using multiple copies of ρ ) and approximating the Von Neumann entropy with them 15,27 .
• Computing the Von Neumann entropies locally on a small subsystem 15 • Approximate the Von Neumann entropy by truncating its Taylor 18 or Fourier 22 decomposition.
In our work, the entropy term does not come from a purification procedure, but from the presence of depolarizing gates in the circuits. This led us to propose a different type of approximation that we will study in "Noiseassisted variational quantum thermalization".
Imaginary-time evolution. Thermal state preparation can be seen as the application of imaginary-time evolution during a time �t = iβ/2 on the maximally-mixed state ρ m = 1 d I , using the decomposition This imaginary-time evolution can be simulated using a variational circuit and a specific update rule 28,29 . In Ref. 17 , the authors use a variational circuit U(θ) on 2N qubits, initialized such that where + is a maximally-entangled state. An imaginary-time update rule with a small learning rate τ will lead to a unitary U(θ 0 ) such that: Repeating it during k = β 2 steps will give the state www.nature.com/scientificreports/ which will be the thermal state after tracing out the environment. In Ref. 30 , the authors also use imaginary-time evolution to prepare the thermal state, but manage to reduce the number of qubits to N when the Hamiltonian is diagonal in the Z-basis. Finally, an ansatz-independent imaginary-time evolution method has been proposed for thermal state preparation 31,32 .
In this work, we optimize the ansatz parameters using the free energy approach. Adapting imaginary-time evolution to a noisy ansatz could however be an interesting alternative, that we let for future work.

Noise-assisted variational quantum thermalization
We introduce here the Noise-Assisted Variational Quantum Thermalizer (NAVQT), a variational algorithm where depolarizing noise is used as the source of entropy for preparing the thermal state. We consider a noise model where each layer of unitary gates is followed by a one-qubit depolarizing channel where I is the identity matrix. The channel is represented in Fig. 1. For the purpose of this work, we consider that we have the same noise value ∈ [ min , 1] everywhere in the circuit, where min is the minimum noise reachable by the hardware. We note ρ θ , the output of the noisy circuit with unitary parameters θ and noise parameter , and want to find the optimal parameters {θ * , * } such that ρ θ * , * ≈ ρ β where the latter is given by Eq. (1).
The thermal state ρ β can be approximated by minimizing the free energy of the system, given by: where is the energy and is the Von Neumann entropy of the state. The energy term and its gradient are easy to evaluate: we can use the parameter shift-rule 33 to compute ∇ θ E(θ, ) , and the finite-difference method to calculate ∂ E(θ, ) . The entropy term is much harder to evaluate as it is a non-linear function of the state. To approximate it, we consider the circuit where all the noise has been put at the beginning, as shown in Fig. 1(b). While the resulting free energy will not be equal to the free energy of our original circuit in general, they tend to follow similar trajectories when varying the noise level (see Supplementary Fig. S1). The new entropy does not depend on θ and can be calculated analytically as if there were no unitary gates. For a circuit with N qubits and m layers, this approximate entropy S( ) is given by where d = 2 N . The full derivation is given in the Supplementary material. Using this approximation, we get the following gradient-based update rule at each optimization step: where η θ and η are the learning rates for θ and , respectively.

Methods
In this section, we will briefly describe the basis of conducted experiments. All quantum circuit simulations are done in Cirq 34 and TensorFlow-Quantum 35 . Ansatz. For the unitary layers of our circuit, we chose an ansatz inspired by the Quantum Approximate Optimization Ansatz (QAOA) applied to the Ising chain Hamiltonian 36 . More precisely, if we define a problem Hamiltonian and a mixing Hamiltonian This ansatz, whose explicit construction is represented in Fig. 2, has been well-studied in the context of groundstate preparation 37 and has been shown to be universal 38 in the limit p → ∞ . We test two different versions of this ansatz. In the first one, denoted restricted QAOA, gates of the same type from a given layer share the same parameters β i and γ i . In the second version, which we call flexible QAOA, every gate has its own parameter. We ran some preliminary tests to verify that this unitary ansatz is at least able to express the ground-state of all the systems tested in our work, and found it to be the case when the number of layers is fixed at ⌈ N 2 ⌉ . Hence the noisy ansatz should in principle be able to represent the correct thermal state for large β , by setting = 0 and fitting the unitary parameters corresponding to the ground-state. Moreover, NAVQT is also able to represent the maximally-mixed state, corresponding to a low β , by setting = 1 . In Supplementary Figure S3 37 . Finally, we extract the solution that gives the lowest (approximated) free energy among all the tested hyperparameters and initializations. We also include the same grid-search using finite-difference on the true free-energy in Supplementary Fig. S2.

Noisy circuit simulation.
To simulate the noise in our circuit, we use the fact that depolarizing gates can also be written as 39 which can be interpreted as applying a random Pauli error with probability p = 3 4 and nothing with probability p = 1 − 3 4 . We can therefore simulate depolarizing gates as stochastic mixtures over unitary circuits containing errors. More precisely, if we sample K unitaries U (k) , each being a combination of the unitary part of the ansatz and some random errors, we can extract the corresponding density matrix as: www.nature.com/scientificreports/ We found that taking a sample size of K = 500N was sufficient to get stable gradients and reach the maximum entropy S ≤ log 2 N . However, we also found that K could be smaller, especially when β was large and hence the target entropy was low.
Performance metric. For each experiment, we report the fidelity between the thermal state and the output state of the trained circuit. Tracking the fidelity requires us to compute the true thermal state ρ β for each Hamiltonian H and temperature β . In practice, taking the exponential of a matrix containing potentially large numerical values (e.g. when β is large) can result in numerical issues. To alleviate those issues, we calculate the thermal state density matrix ρ β by taking the log on both sides of Eq. (1) and using the log-sum-exp trick 40 : where c is the largest eigenvalue of H.

Models.
We evaluated our algorithm on three different models: the Ising chain, with and without a transverse field, and the Heisenberg model. For each model, we considered two cases: when the coefficients J i = h i = 1 for all i, denoted the uniform version, and when J i , h i ∼ N (0, 1) for all i, denoted the random version. Between five seeds for the random version, we pick the Hamiltonian with the lowest spectral gap as this could be considered the hardest Hamiltonian. In the case for Hamiltonians with random coefficients, we normalized the set of all coefficients such that the vector containing all coefficients had unit length. See Supplementary Fig. S4 for a plot of the model energy scales.
Ising chain. The 1D Ising model, or Ising chain (IC), considers a set of spins on a chain such that all spins have exactly two coupled neighbors when considering N > 2 . The Hamiltonian associated with such system is given by where Z i is the Pauli Z operator acting on qubit i.
Transverse field Ising chain. The transverse-field Ising chain (TFI) adds quantum effects to the previous model by including some non-diagonal terms in its Hamiltonian. It is defined as where X i is the Pauli X operator acting on qubit i.
Heisenberg model. Finally, we consider the 1D Heisenberg model, whose Hamiltonian is given by The Heisenberg model is of fundamental importance in the study of quantum materials [41][42][43][44] and is therefore a standard benchmark for thermal state preparation methods 31,32,45 .

Results
We first present the optimization curves for N = 4, at three different temperatures β ∈ {0.1, 0.5, 10} in Fig. 3. We report the fidelity between the learned state and the thermal state as a function of the inverse temperature β for all the different models in Fig. 4. Finally, we also report the final noise level as a function of β for all models in Fig. 5. We can notice a few phenomena from those figures:  Fig. 3 show that the optimization procedure improves the solution compared to a random initialization, both when a very high fidelity is reached at the end and when the fidelity is lower. It eliminates the possibility that random states being closed to the desired thermal states would explain our results. Moreover, the fidelity tends to increase with the number of iterations, showing that our approximate cost-function might be well-suited to our optimization goal. 2. Thermal states at low and high temperatures are easily approximated by our method, for all models and system sizes. Looking at the curves, we see that the optimizer is indeed able to find = 0 for very large β and = 1 for very low β . Hence, when the thermal state gets close to a maximally-mixed state or to a pure state, the algorithm learns to respectively maximize or minimize the noise, independently of the initial noise level. 3. The performance tends to degrade at intermediate temperatures, reaching for instance a fidelity of 0.6 for the Heisenberg model with random coefficients. However, there are several temperatures for which a nontrivial noise level is learned and the fidelity remains high, such as the same model at β = 10 −1 , for which a fidelity above 96% is reached for all system sizes with a noise level between 0.5 and 0.8. Hence the algorithm can actually find the correct thermal state in non-trivial temperature regimes.
From those results, an important question to consider is whether the low fidelity obtained for some systems is due to a failure of the optimization procedure or to the potentially low expressibility of our noisy ansatz. To tackle this question, we tested different methods to optimize the parameters of the ansatz, including a grid-search in the parameter space for systems that are small enough to allow it to run in a reasonable time. We found no significant improvement in the fidelity compared to the original optimization method. We also tried to initialize the unitary ansatz to the ground-state solution before turning on the noise, but it did not result in a significant increase of fidelity neither. Finally, to evaluate the effect of our free energy approximation, we performed all the experiments previously mentioned using finite-difference on the true free energy. The corresponding results can be found in Supplementary Figure S2, where we observe very similar fidelities as with the approximate free energy method. It means that for the hardest systems tested in this work, the noisy ansatz was probably not expressible enough to output an accurate approximation of the thermal state, independently of the optimization algorithm.  www.nature.com/scientificreports/ Changing the depolarizing gates to more general noise channels could help improve the expressibility of the ansatz and is let for future work.

Discussion
In this paper, we introduced a novel type of variational algorithms, in which the noise is parameterized and optimized together with the unitary gates. We used this architecture to prepare thermal states, overcoming some of the most common challenges for this task, such as the need of ancilla qubits and the adverse effect of noise. To optimize our ansatz, we used a closed-form approximation of the free-energy and performed gradient-descent with it. We investigated various Hamiltonians and deduced that the ability of our method to learn the correct thermal states strongly depends on the model, the temperature and the system size. While we systematically  www.nature.com/scientificreports/ obtained fidelities above 0.9 for both the transverse-field and the classical Ising chain, we had fidelities below 0.7 at some temperatures for the 1D Heisenberg model with random coefficients. We also identified a specific range of temperatures for each model, for which the task is harder for NAVQT to solve. Our experiments with different optimization algorithms reveal that the failure of the ansatz to learn the correct thermal state in those cases is probably an expressibility rather than an optimization issue. This paper serves as a starting point in the study of noise-assisted thermalization, and many avenues are still open for future work. For instance, we only considered a single shared parameter for all the depolarizing gates, as it allowed us to derive an approximation of the free energy, which simplified the optimization process. Varying the noise across each layer and each qubit independently could significantly increase the expressibility of Figure 5. Final noise level as a function of the inverse temperature β for various models and system sizes. We used a symlog scale for the y-axis, hence the scale becomes linear below 10 −3 . We observe a clear decrease of the noise level with β , with ≈ 1 for β = 10 −3 (corresponding to the maximally-mixed state) and ≈ 0 for β ≈ 10 2 (corresponding to the ground-state). It shows that the general relationship between the noise and the temperature has overall been correctly learned by our model. www.nature.com/scientificreports/ the ansatz. More generally, replacing the depolarizing gates by channels that are more tailored for thermal state preparation would be an interesting avenue to improve our method. For instance, Davies maps are non-unital channels that can model the evolution of quantum systems weakly-coupled to a thermal reservoir, making them particularly adapted to thermal state preparation 46 . Moreover, their unitary and dissipative parts commute, making the calculation of the entropy potentially easier than for our ansatz. A second important aspect for future work would be to better understand the theory behind noise-assisted variational circuits. For instance, what are the conditions on the Hamiltonian and the temperature under which NAVQT can approximate the thermal state with an arbitrary high fidelity? How does our method scale with the system size? What type of noise is necessary to approximate a given thermal state?
Finally, it could be interesting to study the optimization landscape of NAVQT and potentially come up with optimization algorithms that are more tailored to this problem. For instance, it has been shown that a barren plateau phenomenon occurs in noisy circuits that are similar to our ansatz 47 . It can potentially hinder the scalability of our method, as it relies explicitly on increasing the noise. Finding the relationship between the temperature β , the system size N and the magnitude of the gradient could be an interesting direction for future research.