## Introduction

As the search for high-performance materials persists in research and industry alike, there is a need to better understand the thermal properties of materials. In particular, the thermal properties of nanomaterials are of great interest, with applications ranging from energy production to nanoelectronics1,2. Quantum effects can dominate at the nanoscale, and because the resources required to simulate quantum systems on classical computers grow exponentially, simulations of such materials quickly exceed the capabilities of even the largest classical supercomputers3. Quantum computers, by contrast, can simulate quantum many-body systems efficiently4,5, and therefore offer a promising route to studying the thermal properties of quantum materials. While a plethora of zero-temperature simulations have been demonstrated on quantum devices in recent years6,7,8,9,10,11,12,13, quantum algorithms for calculating finite-temperature properties remain comparatively scarce14,15,16. The main challenge in exploring finite-temperature properties on quantum computers lies in the preparation of thermal states.

Current quantum algorithms for thermal state preparation fall into two main categories. The first comprises algorithms that initialize the qubits into the full thermal (i.e., mixed) state. In this case, the thermal average of an observable can be computed directly by measuring the observable in this state. Examples include algorithms that prepare the Gibbs state using phase estimation17,18,19, which require quantum circuits that are too large for near-term quantum devices, otherwise known as noisy intermediate-scale quantum (NISQ) computers20. Other examples include variational quantum thermalizers21,22 and methods that prepare thermofield double states1,23, both of which rely on variational techniques. The variational nature of these algorithms necessitates the use of a cost function, which generally becomes hard to compute as system size increases. Such methods are therefore difficult to scale to large or complex systems. Still other methods for generating the full thermal state require a number of ancilla qubits that scales with system size or complexity24,25, thus limiting the size of systems that can be simulated on current quantum hardware.

Algorithms in the second category prepare an ensemble of pure states, one pure state at a time, where each pure state has been sampled according to the correct thermal distribution. Existing examples rely on Monte Carlo sampling techniques, Markov chains, or both26,27,28,29,30,31. To calculate thermal averages, the desired observable is measured in each of the different pure states and the results are averaged over the ensemble. As pure states are much easier to prepare on a quantum computer than mixed states, this model for thermal state preparation is more promising for NISQ computers. However, the number of samples required generally grows with the size of the system being simulated, which can lead to significant resource requirements for large systems.

Canonical thermal pure quantum (TPQ) states32 offer a promising way to estimate thermal averages on quantum computers. Thermal state approximation by TPQ states falls into a separate, third category, as it neither incurs the quantum resources required to prepare a mixed state nor relies on a number of samples that grows with system size. A TPQ state is formed by applying a specific non-unitary transformation, which is a function of the system Hamiltonian and the inverse temperature, to a random state. The resulting state can be shown to represent thermal equilibrium, in the sense that observables measured in it approximate thermal averages. Remarkably, the error in the expectation value of the observable is bounded by a function that decreases exponentially with system size N. Thus, at sufficiently large N, the expectation value of an observable in a single TPQ state yields a very close approximation to the thermal average. The effectiveness of canonical TPQ states has been demonstrated classically32, but, to our knowledge, they have not yet been implemented on a quantum computer.

Here, we present an algorithm for generating canonical TPQ states on quantum computers, enabling the estimation of finite-temperature properties of materials on NISQ devices. Our algorithm relies on a straightforward and scalable protocol for preparing the random state33, which allows the circuit depth to be tuned to balance the desired accuracy against the feasibility of execution on NISQ hardware. Furthermore, the algorithm is agnostic to how the non-unitary transformation of the random state is implemented, so this step can be tailored to the resource constraints of different quantum devices. We compare three possible implementations of the non-unitary transformation: (i) the quantum imaginary time evolution (QITE) algorithm26, which can be better suited to devices constrained in qubit count; (ii) the dilated operator approach34, which offers some advantages in resource requirements over QITE when a single ancillary qubit is available; and (iii) the FABLE method35, which is better suited to devices with limited coherence times. Quantum signal processing may also be used to approximate this non-unitary operation, as discussed in Ref.36.

We demonstrate our algorithm with each implementation on the Heisenberg model, a quintessential model for studying a range of behaviors in materials37,38,39,40,41. We anticipate that this algorithm will facilitate estimates of finite-temperature properties of materials on near-term quantum devices. Furthermore, since the error in estimating thermal averages with TPQ states decreases with increasing system size, we expect this algorithm to become increasingly useful as quantum hardware continues to grow in size in the coming years.

## Theoretical framework

The (unnormalized) canonical TPQ state of a system of size N, governed by Hamiltonian H, at inverse temperature $$\beta$$ is defined as32

\begin{aligned} |\beta ,N\rangle =\hat{Q}|\Psi _R\rangle \equiv e^{-\beta H/2}|\Psi _R\rangle , \end{aligned}
(1)

where $$|\Psi _R\rangle =\sum _{i=1}^{2^N} c_i |i\rangle$$ is a random state whose complex amplitudes $$\{c_i\}$$ are sampled uniformly from the unit hypersphere, so that $$\sum _{i=1}^{2^N} |c_i|^2 = 1$$, and $$\{|i\rangle \}$$ is an arbitrary orthonormal basis. Defined in this way, $$|\Psi _R\rangle$$ is a Haar-random pure state42. We define

\begin{aligned} \langle \hat{A}\rangle _{\beta ,N}^{ens} \equiv \frac{\hbox {Tr}{[e^{-\beta H}\hat{A}]}}{\hbox {Tr}{[e^{-\beta H}]}}, \end{aligned}
(2)

as the ensemble expectation value of an operator $$\hat{A}$$ for a system of size N at inverse temperature $$\beta$$, and

\begin{aligned} \langle \hat{A}\rangle _{\beta ,N}^{TPQ} \equiv \frac{ \langle \beta ,N|\hat{A}|\beta ,N\rangle }{\langle \beta ,N|\beta ,N\rangle }, \end{aligned}
(3)

as the corresponding expectation value of $$\hat{A}$$ in a single TPQ state. For the following analysis to hold, $$\hat{A}$$ must be a low-degree polynomial of local operators; this class nonetheless includes many widely used observables, such as energy, magnetization, and a number of relevant correlation functions. The error between $$\langle \hat{A}\rangle _{\beta ,N}^{TPQ}$$ and $$\langle \hat{A}\rangle _{\beta ,N}^{ens}$$ is bounded by a quantity that becomes exponentially small with increasing system size N32. In practice, this means that for sufficiently large N, measuring the desired observable in a single TPQ state provides a good approximation of the true thermal average. Indeed, Ref.32 found an error of less than 1% when estimating thermal properties from a single TPQ state of a quantum spin model with $$N=30$$. At smaller N, accuracy may be improved by averaging the measured values over multiple TPQ states32.
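To make Eqs. (2) and (3) concrete, the NumPy sketch below evaluates both quantities exactly for small systems, applying $$\hat{Q}=e^{-\beta H/2}$$ by dense diagonalization rather than on a quantum device. As a stand-in Hamiltonian it uses the Heisenberg chain of the Demonstration section with open boundary conditions (our assumption), and it takes $$\hat{A}=H$$, so the comparison is for the thermal energy. All function names are illustrative.

```python
from functools import reduce

import numpy as np

PAULI = {
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def pauli_string(n, factors):
    """Kronecker product with PAULI[factors[i]] on qubit i and identity elsewhere."""
    return reduce(np.kron, [PAULI[factors[i]] if i in factors else np.eye(2) for i in range(n)])


def heisenberg_chain(n, J=(0.5, 1.25, 2.0), hx=1.0):
    """Eq. (7) on an open 1D chain (boundary condition assumed)."""
    H = sum(Ja * pauli_string(n, {i: a, i + 1: a}) for a, Ja in zip("xyz", J) for i in range(n - 1))
    return H + hx * sum(pauli_string(n, {i: "x"}) for i in range(n))


def haar_random_state(n, rng):
    c = rng.normal(size=2 ** n) + 1j * rng.normal(size=2 ** n)
    return c / np.linalg.norm(c)


def boltzmann(H, beta, power):
    """exp(-power * beta * (H - E_0)); the constant shift by E_0 cancels in Eqs. (2)-(3)."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-power * beta * (w - w[0]))) @ V.conj().T


def ensemble_expectation(H, A, beta):
    """Eq. (2): Tr[e^{-beta H} A] / Tr[e^{-beta H}]."""
    rho = boltzmann(H, beta, 1.0)
    return np.einsum("ij,ji->", rho, A).real / np.trace(rho).real


def tpq_expectation(H, A, beta, psi_r):
    """Eq. (3) for the TPQ state |beta,N> = e^{-beta H/2}|Psi_R> of Eq. (1)."""
    phi = boltzmann(H, beta, 0.5) @ psi_r
    return np.vdot(phi, A @ phi).real / np.vdot(phi, phi).real


rng = np.random.default_rng(seed=3)
beta = 1.0
for n in (4, 6, 8):
    H = heisenberg_chain(n)
    exact = ensemble_expectation(H, H, beta)
    single = tpq_expectation(H, H, beta, haar_random_state(n, rng))
    print(n, exact / n, abs(single - exact) / n)  # energy per site, single-TPQ error per site
```

The printed error per site is a single random draw, so it fluctuates; its typical size shrinks as N grows, which is the behavior the bound above describes.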

The first step in preparing a TPQ state on a quantum computer is the preparation of $$|\Psi _R\rangle$$. We can approximate this Haar-random state using random quantum circuits constructed in the manner proposed in Ref.33 and illustrated in Fig. 1.
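For small N, where exact classical simulation is feasible, an ideal Haar-random state can also be sampled directly, which gives a useful reference when checking random-circuit states. A minimal NumPy sketch (function name ours) normalizes a vector of i.i.d. complex Gaussian amplitudes; that distribution is invariant under unitary rotations, so the normalized vector is uniform on the unit hypersphere. This is a classical reference only, not the circuit protocol of Ref.33.

```python
import numpy as np


def haar_random_state(n_qubits: int, rng: np.random.Generator) -> np.ndarray:
    """Sample |Psi_R> = sum_i c_i |i> uniformly from the unit hypersphere in C^(2^N)."""
    dim = 2 ** n_qubits
    # I.i.d. complex Gaussian amplitudes are invariant under unitary rotations,
    # so normalizing them yields a Haar-distributed pure state.
    c = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return c / np.linalg.norm(c)


rng = np.random.default_rng(seed=7)
psi_r = haar_random_state(4, rng)
print(psi_r.shape, np.sum(np.abs(psi_r) ** 2))  # 16 amplitudes, total probability 1
```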

The random circuits we consider are composed of “blocks,” each consisting of a layer of single-qubit gates on every qubit followed by a layer of two-qubit entangling gates. The single-qubit gates are selected from the finite set $$\mathbb {A}=\{RX(\frac{\pi }{2}),RY(\frac{\pi }{2}),T\}$$, with the constraint that the same gate may not be applied to a given qubit in two consecutive blocks. In other words, if $$R_{i,j}$$ is the single-qubit gate acting on qubit i in block j, then $$R_{i,j} \in \mathbb {A}\setminus \{R_{i,j-1}\}$$.
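The gate-selection rule can be sketched in a few lines of Python. The string labels stand for the gates in $$\mathbb {A}$$; the function name and the use of NumPy's random generator are our assumptions.

```python
import numpy as np

GATE_SET = ("RX(pi/2)", "RY(pi/2)", "T")  # the set A from the text


def sample_single_qubit_layers(n_qubits, n_blocks, rng):
    """Return layers[j][i], the label of gate R_{i,j} applied to qubit i in block j.

    Each gate is drawn uniformly from A minus {R_{i,j-1}}, so no qubit receives
    the same gate in two consecutive blocks (block 0 may use any gate).
    """
    layers, previous = [], [None] * n_qubits
    for _ in range(n_blocks):
        layer = []
        for i in range(n_qubits):
            allowed = [g for g in GATE_SET if g != previous[i]]
            layer.append(allowed[rng.integers(len(allowed))])
        layers.append(layer)
        previous = layer
    return layers


layers = sample_single_qubit_layers(n_qubits=4, n_blocks=3, rng=np.random.default_rng(0))
for j, layer in enumerate(layers):
    print(f"block {j}: {layer}")
```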

The two-qubit gate layers, as shown in Fig. 1b,c, follow a fixed pattern determined by the dimension of the Hamiltonian of interest; given a system dimension of k, there will be 2k different two-qubit gate layer patterns to loop through to ensure all coupling directions have been accounted for. In this way, the random circuits are easily generalizable to higher dimensions. For example, Fig. 1b shows the two 2-qubit gate patterns that must be looped through for a one-dimensional (1D) system. Similarly, Fig. 1c shows the four 2-qubit gate patterns to loop through for 2D systems.
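The exact layouts are those drawn in Fig. 1b,c. The sketch below assumes a common brickwork arrangement (even bonds, then odd bonds, along each lattice direction) and row-major qubit numbering on a grid; both are our assumptions, as are the function names. Block j then uses pattern j mod 2k.

```python
def patterns_1d(n):
    """Two layers for a 1D chain: bonds (0,1),(2,3),... then (1,2),(3,4),..."""
    return [[(i, i + 1) for i in range(start, n - 1, 2)] for start in (0, 1)]


def patterns_2d(lx, ly):
    """Four layers for an lx-by-ly grid (qubit x + lx*y): even/odd bonds along x, then along y."""
    def q(x, y):
        return x + lx * y

    horizontal = [[(q(x, y), q(x + 1, y)) for y in range(ly) for x in range(s, lx - 1, 2)]
                  for s in (0, 1)]
    vertical = [[(q(x, y), q(x, y + 1)) for x in range(lx) for y in range(s, ly - 1, 2)]
                for s in (0, 1)]
    return horizontal + vertical


def entangling_layer(patterns, block_index):
    """Block j uses pattern j mod 2k, cycling through all coupling directions."""
    return patterns[block_index % len(patterns)]


print(patterns_1d(6))                                    # [[(0, 1), (2, 3), (4, 5)], [(1, 2), (3, 4)]]
print(entangling_layer(patterns_2d(3, 3), block_index=2))  # [(0, 3), (1, 4), (2, 5)]
```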

Random circuits of this form are defined by a single parameter d, the number of blocks, which controls the circuit depth. As d increases, such circuits approximate successively higher t-designs43,44,45,46. (A unitary t-design is an ensemble of unitaries that reproduces the first t moments of the Haar distribution; higher-order t-designs therefore generate states whose statistical properties converge to those of Haar-random states.) This convergence can be seen by plotting the entropy of the resulting random state, i.e., the Shannon entropy of its computational-basis output distribution. Such a plot is useful for determining how large d must be for a given system size: the necessary d is the point at which the entropy has sufficiently converged to the value $$\ln {2^N}-1+\gamma$$ characteristic of Haar-random N-qubit states, where $$\gamma \approx 0.577$$ is the Euler–Mascheroni constant. See Sect. I in the Supplementary Information (SI) for these convergence plots with and without simulated device noise. An N-qubit random circuit of this form with d blocks has Nd single-qubit rotation gates and O(Nd) two-qubit entangling gates. Rigorous bounds predict that the number of required blocks scales polynomially with system size N, but in practice sublinear scaling has been observed46. Here, we set $$d=20$$ unless otherwise specified.
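The convergence check can be reproduced in a small statevector simulation. The sketch below assumes CZ as the entangling gate and the 1D brickwork pattern (neither is fixed by the text above), builds a d-block random circuit on N qubits starting from $$|0\cdots 0\rangle$$, and compares the Shannon entropy of its output distribution with the Haar (Porter-Thomas) value $$N\ln 2-1+\gamma$$. Function names are ours.

```python
import numpy as np

# Single-qubit gate set A = {RX(pi/2), RY(pi/2), T}.
RX90 = np.array([[1, -1j], [-1j, 1]]) / np.sqrt(2)
RY90 = np.array([[1, -1], [1, 1]], dtype=complex) / np.sqrt(2)
T = np.diag([1, np.exp(1j * np.pi / 4)])
GATES = (RX90, RY90, T)


def apply_1q(psi, gate, q):
    """Apply a 2x2 gate to axis q of the state tensor psi (shape (2,)*N)."""
    return np.moveaxis(np.tensordot(gate, psi, axes=([1], [q])), 0, q)


def apply_cz(psi, a, b):
    """CZ (assumed entangler): flip the sign of amplitudes where qubits a and b are both 1."""
    psi = psi.copy()
    index = [slice(None)] * psi.ndim
    index[a] = index[b] = 1
    psi[tuple(index)] *= -1
    return psi


def random_circuit_state(n, d, rng):
    """Run d blocks from |0...0> and return the 2^n statevector."""
    psi = np.zeros((2,) * n, dtype=complex)
    psi[(0,) * n] = 1.0
    patterns = [[(i, i + 1) for i in range(s, n - 1, 2)] for s in (0, 1)]
    previous = [None] * n
    for j in range(d):
        for q in range(n):
            # No qubit gets the same gate in two consecutive blocks.
            k = rng.choice([g for g in range(len(GATES)) if g != previous[q]])
            psi = apply_1q(psi, GATES[k], q)
            previous[q] = k
        for a, b in patterns[j % len(patterns)]:
            psi = apply_cz(psi, a, b)
    return psi.reshape(-1)


def output_entropy(psi):
    """Shannon entropy of the computational-basis distribution p_i = |c_i|^2."""
    p = np.abs(psi) ** 2
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def porter_thomas_entropy(n):
    """ln(2^n) - 1 + gamma, the value for Haar-random n-qubit states."""
    return n * np.log(2) - 1 + np.euler_gamma


rng = np.random.default_rng(seed=2)
n = 8
for d in (1, 2, 5, 10, 20, 40):
    print(d, output_entropy(random_circuit_state(n, d, rng)), porter_thomas_entropy(n))
```

Note that shallow circuits can overshoot the target (a layer of $$RX(\pi/2)$$ gates alone gives a uniform distribution with entropy $$N\ln 2$$), so it is the convergence to $$N\ln 2-1+\gamma$$, not a large entropy per se, that signals Haar-like behavior.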

Once the random state $$|\Psi _R\rangle$$ has been generated, the non-unitary operator $$\hat{Q}$$ must be applied to generate a canonical TPQ state. Because a quantum circuit can only implement unitary operations (supplemented by measurements), this step must be realized through a unitary approximation of the non-unitary transformation. Several approximation methods exist, and in practice the choice among them is informed by the available quantum resources. Here, we demonstrate our algorithm with three different approaches: QITE26, a dilated operator approach34, and the FABLE method35.

With QITE, neither ancillary qubits nor post-selection of results are required. This, however, comes at the cost of deeper circuits that require significant classical resources to generate. Circuit generation time with QITE grows quickly with the simulated system size, though in practice this can be mitigated by employing the so-called “inexact QITE” method, which truncates the domain size26. When the average correlation length of the simulated system is small, a domain size significantly smaller than the total system size can be chosen, dramatically reducing the computational cost of generating the QITE circuit. However, accuracy suffers when the correlation length of the system exceeds the selected domain size.

The unitary dilation method requires a single ancilla qubit as well as post-selection of results. Any experiment in which the ancilla qubit is measured to be in the ‘1’ state must be discarded, which increases the number of shots required to generate good results. Overall, the unitary dilation method tends to generate shallower circuits than QITE, but at the price of requiring more shots.

Finally, FABLE requires a register of ancilla qubits that grows linearly with the simulated system size, as well as post-selection of results. For a system size of N, FABLE requires $$N+1$$ ancilla qubits, all of which must be measured in the ‘0’ state (otherwise the result must be discarded). Although post-selection on such a large ancilla register can necessitate a very large number of shots, FABLE offers the most favorable circuit depths and circuit generation times of the three methods.

The algorithm shown in Fig. 2 summarizes our method for approximating thermal averages with TPQ states on quantum computers. First, the d-layer circuit is constructed to generate $$|\Psi _R\rangle$$. Layers of randomly chosen single-qubit rotation gates are alternated with layers of 2-qubit entangling gates following a pattern set by the Hamiltonian dimension k. Next, a unitary approximation of the non-unitary transformation $$\hat{Q}$$ is applied. The algorithm is flexible in how this non-unitary operation is implemented. The observable of interest is then measured in the resulting canonical TPQ state, and this process is repeated R times to construct and measure R distinct TPQ states. Finally, if applicable, the average of the measured thermal values is taken.
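The loop of Fig. 2 can be sketched with the two quantum steps passed in as functions, mirroring the algorithm's agnosticism to how $$\hat{Q}$$ is implemented. In this hedged sketch, a classically sampled Haar-random state stands in for the d-block random circuit, exact diagonalization stands in for QITE, the dilated operator, or FABLE, and a random Hermitian matrix serves as a placeholder Hamiltonian; all names are illustrative.

```python
import numpy as np


def haar_random_state(n_qubits, rng):
    """Stand-in for the random circuit that prepares |Psi_R>."""
    c = rng.normal(size=2 ** n_qubits) + 1j * rng.normal(size=2 ** n_qubits)
    return c / np.linalg.norm(c)


def exact_apply_q(H, beta, psi):
    """Stand-in for the unitary approximation of Q = exp(-beta H / 2) (line 10 of Fig. 2)."""
    w, V = np.linalg.eigh(H)
    return V @ (np.exp(-beta * (w - w[0]) / 2) * (V.conj().T @ psi))


def tpq_thermal_average(H, A, beta, n_qubits, R, rng,
                        prepare_random_state=haar_random_state, apply_q=exact_apply_q):
    """Prepare R TPQ states, measure <A> in each, return the mean and its standard error."""
    values = []
    for _ in range(R):
        phi = apply_q(H, beta, prepare_random_state(n_qubits, rng))
        values.append(np.vdot(phi, A @ phi).real / np.vdot(phi, phi).real)
    values = np.asarray(values)
    return values.mean(), values.std(ddof=1) / np.sqrt(R)


def ensemble_average(H, A, beta):
    """Exact Eq. (2) reference for checking small examples."""
    w, V = np.linalg.eigh(H)
    weights = np.exp(-beta * (w - w[0]))
    a_diag = np.einsum("ij,jk,ki->i", V.conj().T, A, V).real  # <E_i|A|E_i>
    return float(np.sum(weights * a_diag) / np.sum(weights))


rng = np.random.default_rng(seed=5)
n = 6
M = rng.normal(size=(2 ** n, 2 ** n)) + 1j * rng.normal(size=(2 ** n, 2 ** n))
H = (M + M.conj().T) / 2
H /= np.linalg.norm(H, 2)  # placeholder Hamiltonian with unit spectral norm
mean, err = tpq_thermal_average(H, H, beta=1.0, n_qubits=n, R=10, rng=rng)
print(mean, err, ensemble_average(H, H, 1.0))
```

Swapping `apply_q` for a function that simulates one of the three unitary approximations (and `prepare_random_state` for a random-circuit simulator) leaves the outer loop unchanged.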

If QITE is chosen to implement line 10 in Fig. 2, the non-unitary operator $$\hat{Q}$$ is viewed as an evolution through imaginary time. To find a unitary approximation, this evolution is broken up into imaginary time-steps, and each time-step is approximated by a unitary sub-circuit26. Increasing the number of time-steps improves the transformation fidelity, i.e., the accuracy of the approximation. The size of the quantum circuit grows linearly with the number of time-steps, but a number of techniques have been developed to reduce the depth of these QITE circuits47,48,49,50. Other options for reducing the algorithmic error of QITE include randomized compiling51 and reinforcement learning52. The sub-circuit for each time-step is constructed on a so-called domain, whose size can be set equal to or less than the simulated system size and should be chosen based on the average correlation length of the simulated system. The accuracy of QITE improves as the domain size on which the unitaries act increases, but the classical computational cost of finding these unitaries grows exponentially with the domain size. When using QITE, the number of imaginary time-steps and the domain size must therefore be chosen appropriately by the user for the simulated system.
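A single QITE step can be sketched as a small least-squares problem26: find real coefficients $$a_I$$ such that the unitary $$e^{-i\Delta \tau A}$$, with $$A=\sum _I a_I\sigma _I$$, reproduces the normalized step $$e^{-\Delta \tau H}|\psi \rangle$$ to first order in $$\Delta \tau$$. Writing $$|v_I\rangle =-i\sigma _I|\psi \rangle$$ and $$|\Delta \rangle$$ for the finite-difference target, minimizing $$\Vert \sum _I a_I|v_I\rangle -|\Delta \rangle \Vert$$ over real $$a$$ gives $$\mathrm {Re}(V^\dagger V)\,a=\mathrm {Re}(V^\dagger \Delta )$$. The NumPy sketch below is a dense, full-domain variant (domain size equal to N, all Pauli strings), practical only for a few qubits; it does not implement the domain truncation of inexact QITE, and the open-chain Hamiltonian is our assumption.

```python
import itertools
from functools import reduce

import numpy as np

# Single-qubit Paulis I, X, Y, Z (indices 0..3).
PAULI_1Q = [np.eye(2), np.array([[0, 1], [1, 0]]),
            np.array([[0, -1j], [1j, 0]]), np.diag([1.0, -1.0])]


def pauli_string(ks):
    """Tensor product of PAULI_1Q[k] over qubits (qubit 0 leftmost)."""
    return reduce(np.kron, [PAULI_1Q[k] for k in ks]).astype(complex)


def heisenberg_chain(n, J=(0.5, 1.25, 2.0), hx=1.0):
    """Eq. (7) on an open chain (assumed boundary condition)."""
    H = np.zeros((2 ** n, 2 ** n), dtype=complex)
    for i in range(n - 1):
        for k, Jk in zip((1, 2, 3), J):
            ks = [0] * n
            ks[i] = ks[i + 1] = k
            H += Jk * pauli_string(ks)
    for i in range(n):
        ks = [0] * n
        ks[i] = 1
        H += hx * pauli_string(ks)
    return H


def herm_fn(M, f):
    """f(M) for Hermitian M via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * f(w)) @ V.conj().T


def qite_step(psi, H, dtau, paulis):
    """One imaginary-time step approximated by exp(-i dtau A), A = sum_I a_I sigma_I."""
    target = herm_fn(H, lambda w: np.exp(-dtau * w)) @ psi
    target /= np.linalg.norm(target)
    delta = (target - psi) / dtau
    # Columns -i sigma_I |psi>: first-order action of exp(-i dtau A) per coefficient.
    V = np.stack([-1j * (P @ psi) for P in paulis], axis=1)
    a = np.linalg.lstsq((V.conj().T @ V).real, (V.conj().T @ delta).real, rcond=None)[0]
    A = sum(ai * P for ai, P in zip(a, paulis))  # real coefficients, so A is Hermitian
    return herm_fn(A, lambda w: np.exp(-1j * dtau * w)) @ psi


def qite(psi0, H, beta, n_steps):
    """Approximate the normalized e^{-beta H/2}|psi0> with n_steps unitary steps."""
    n = int(np.log2(psi0.size))
    paulis = [pauli_string(ks) for ks in itertools.product(range(4), repeat=n)][1:]
    dtau = beta / (2 * n_steps)
    psi = psi0.copy()
    for _ in range(n_steps):
        psi = qite_step(psi, H, dtau, paulis)
    return psi


rng = np.random.default_rng(seed=1)
n, beta = 3, 1.0
H = heisenberg_chain(n)
psi0 = rng.normal(size=2 ** n) + 1j * rng.normal(size=2 ** n)
psi0 /= np.linalg.norm(psi0)
exact = herm_fn(H, lambda w: np.exp(-beta * w / 2)) @ psi0
exact /= np.linalg.norm(exact)
for steps in (5, 20, 100):
    print(steps, abs(np.vdot(exact, qite(psi0, H, beta, steps))))  # overlap with exact TPQ state
```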

If line 10 in Fig. 2 is instead implemented with the dilated operator approach, an ancillary qubit initialized in state $$|1\rangle$$ must be added to create the augmented input state $$\rho _{in} = (|1\rangle \langle 1|) \otimes \rho _R$$, where $$\rho _R = |\Psi _R\rangle \langle \Psi _R|$$. A dilated unitary operator $$\hat{\Omega }$$ is then constructed in the new $$2^{N+1}$$ dimensional Hilbert space that will approximate the action of $$\hat{Q}$$ on the original $$2^N$$ dimensional system Hilbert space34. $$\hat{\Omega }$$ is defined in terms of the non-unitary operator $$\hat{Q}$$ as

\begin{aligned} \hat{\Omega } \equiv \exp \left( i\epsilon \begin{pmatrix} 0&{}-i\hat{Q}\\ i\hat{Q}^\dagger &{}0 \end{pmatrix}\right) , \end{aligned}
(4)

where $$\epsilon$$ is a parameter that controls the performance of the operator. The dilated unitary operator $$\hat{\Omega }$$ is then applied to the augmented initial state $$\rho _{in}$$, and the ancillary qubit is measured. If it is measured in the state $$|0\rangle$$, the original N-qubit system of interest has been successfully transformed by an approximation to the non-unitary operator $$\hat{Q}$$. The probability that the ancillary qubit is measured in state $$|0\rangle$$, also called the probability of success, is denoted $$P_0$$. If the ancillary qubit is measured in state $$|1\rangle$$, the results of the entire circuit are discarded34.

To help quantify the performance of this dilated operator, it is useful to define a fidelity metric that measures how close the effective transformation on the N-qubit system is to the desired non-unitary transformation under $$\hat{Q}$$. This fidelity can be defined as:

\begin{aligned} F = \hbox {Tr}{\sqrt{\sqrt{\rho _0} |\beta ,N\rangle \langle \beta ,N|\sqrt{\rho _0}}}. \end{aligned}
(5)

Here, $$\rho _0$$ is the density matrix of the N-qubit system after tracing out the ancillary qubit, conditioned on the ancilla being measured in the state $$|0\rangle$$, and $$|\beta ,N\rangle$$ is taken to be normalized. The probability of success $$P_0$$ and the transformation fidelity F are ultimately controlled by the user’s choice of $$\epsilon$$. While a small $$\epsilon$$ is required to obtain an accurate approximation to the non-unitary operator (i.e., a high transformation fidelity F), decreasing $$\epsilon$$ generally decreases the probability of success $$P_0$$, which means that more shots must be run before the ancilla is measured in the $$|0\rangle$$ state that indicates a successful transformation. See Sect. II of the SI for more information on the behavior of F and $$P_0$$ with varying $$\epsilon$$.
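The trade-off between F and $$P_0$$ can be checked numerically with dense matrices. The NumPy sketch below builds $$\hat{\Omega }$$ from Eq. (4), applies it to $$|1\rangle \otimes |\Psi _R\rangle$$ with the ancilla as the first (most significant) tensor factor, matching the block structure of Eq. (4), and post-selects the ancilla on $$|0\rangle$$. For Hermitian $$\hat{Q}$$ (as $$e^{-\beta H/2}$$ is), expanding the exponential shows the surviving branch is exactly $$\sin (\epsilon \hat{Q})|\Psi _R\rangle$$, which approaches $$\epsilon \hat{Q}|\Psi _R\rangle$$ as $$\epsilon \rightarrow 0$$. The random placeholder Hamiltonian and function names are our assumptions.

```python
import numpy as np


def herm_fn(M, f):
    """f(M) for a Hermitian matrix M via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * f(w)) @ V.conj().T


def dilated_operator(Q, eps):
    """Eq. (4): Omega = exp(i eps G), with G = [[0, -iQ], [iQ^dagger, 0]] Hermitian."""
    Z = np.zeros_like(Q)
    G = np.block([[Z, -1j * Q], [1j * Q.conj().T, Z]])
    return herm_fn(G, lambda w: np.exp(1j * eps * w))


def apply_dilated(Q, eps, psi_r):
    """Apply Omega to |1>_anc |Psi_R>, post-select the ancilla on |0>, return (state, P_0)."""
    dim = psi_r.size
    out = dilated_operator(Q, eps) @ np.concatenate([np.zeros(dim, dtype=complex), psi_r])
    branch0 = out[:dim]
    p0 = float(np.vdot(branch0, branch0).real)  # probability of success P_0
    return branch0 / np.sqrt(p0), p0


def fidelity(phi, tpq_state):
    """Eq. (5) for pure states: |<beta,N|phi>| with both vectors normalized."""
    tpq_state = tpq_state / np.linalg.norm(tpq_state)
    return abs(np.vdot(tpq_state, phi))


rng = np.random.default_rng(seed=4)
n, beta = 4, 1.0
M = rng.normal(size=(2 ** n, 2 ** n))
H = 4.0 * (M + M.T) / np.linalg.norm(M + M.T, 2)  # placeholder Hamiltonian, spectral norm 4
Q = herm_fn(H, lambda w: np.exp(-beta * w / 2))
psi_r = rng.normal(size=2 ** n) + 1j * rng.normal(size=2 ** n)
psi_r /= np.linalg.norm(psi_r)
tpq = Q @ psi_r
for eps in (1.0, 0.3, 0.1, 0.01):
    phi, p0 = apply_dilated(Q, eps, psi_r)
    print(f"eps={eps}: F={fidelity(phi, tpq):.6f}, P0={p0:.2e}")
```

Because $$P_0 = \Vert \sin (\epsilon \hat{Q})|\Psi _R\rangle \Vert ^2 \approx \epsilon ^2 \Vert \hat{Q}|\Psi _R\rangle \Vert ^2$$ for small $$\epsilon$$, the expected number of shots per successful run grows roughly as $$1/\epsilon ^2$$.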

While this method introduces a single ancillary qubit, its advantage is that increasing the accuracy of the approximate non-unitary transformation only requires increasing the average number of shots needed to achieve a successful transformation. This stands in contrast to the QITE algorithm, where increasing the accuracy of the approximation requires a larger number of imaginary time-steps and therefore deeper circuits. However, this is still considered a near-term approach, since it requires decomposing the $$(N+1)$$-qubit unitary into single- and two-qubit gates, and the decomposition of an m-qubit unitary generally requires a number of two-qubit entangling gates that is exponential in m. Consider the column-by-column decomposition53 technique used by Qiskit, with the multi-qubit diagonal gates that emerge during this process decomposed recursively via Theorem 7 of Ref.54. Counting the controlled single-qubit unitaries that must be determined, decomposing an m-qubit unitary in this way requires solving at most $$2^{2m} +(m-1)2^m -m$$ $$(2\times 2)$$ matrix equations and $$2^{2m+1} -(m+4)2^m+m+2$$ scalar equations before any circuit optimizations. For the 5-qubit systems simulated in Fig. 4 ($$m=6$$ including the ancilla), this yields upper bounds of 4410 matrix equations and 7560 scalar equations, and the decomposition carries a classical time complexity of $$O(m 2^{3m})$$55. If the dilated operator approach is preferred, the only change to Fig. 2 is the addition of an ancillary qubit initialized in the $$|1\rangle$$ state, which is not included in the random circuit construction. The non-unitary transformation step then takes the additional argument $$\epsilon$$.
It is noted that other recently-developed probabilistic methods of approximating the non-unitary imaginary time evolution operator with a single ancillary qubit can skirt this exponential scaling by leveraging forward and backward real-time evolution operators, for which efficient circuit implementations have previously been developed56.
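The equation-count bounds quoted above are easy to evaluate. For the 5-qubit simulations of Fig. 4, the dilated unitary acts on $$m=N+1=6$$ qubits; the short Python check below (function name ours) reproduces the quoted figures and tabulates the $$m 2^{3m}$$ time-complexity factor for a few sizes.

```python
def decomposition_bounds(m):
    """Upper bounds on (2x2) matrix equations and scalar equations for an m-qubit unitary."""
    matrix_eqs = 2 ** (2 * m) + (m - 1) * 2 ** m - m
    scalar_eqs = 2 ** (2 * m + 1) - (m + 4) * 2 ** m + m + 2
    return matrix_eqs, scalar_eqs


print(decomposition_bounds(6))  # (4410, 7560)
for m in range(2, 9):
    matrix_eqs, scalar_eqs = decomposition_bounds(m)
    print(f"m={m}: {matrix_eqs} matrix eqs, {scalar_eqs} scalar eqs, m*2^(3m) = {m * 2 ** (3 * m)}")
```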

The third approach we follow to implement line 10 in Fig. 2 is a block-encoding of the non-unitary operator $$\hat{Q}$$, i.e., its embedding in the leading principal block of a larger unitary U,

\begin{aligned} U = \begin{pmatrix} \hat{Q} &{} *\, \\ * &{} *\, \end{pmatrix}. \end{aligned}
(6)

Here, $$*$$ denotes arbitrary matrix elements. Because every block of a unitary has spectral norm at most one, in practice a rescaled operator $$\hat{Q}/\lambda$$ with $$\lambda \ge \Vert \hat{Q}\Vert$$ is encoded; this constant factor cancels in the normalized TPQ expectation value of Eq. (3). Assuming $$a$$ ancilla qubits are used to block-encode the $$2^N$$-dimensional $$\hat{Q}$$, the operator U acts on the $$2^{N+a}$$-dimensional Hilbert space. Next, we apply U to the augmented input state $$\rho _{\text {in}} = (|0\rangle ^{\otimes a} \langle 0|^{\otimes a}) \otimes \rho _R$$. If all $$a$$ ancillary qubits are measured in the $$|0\rangle$$ state, the N-qubit operator $$\hat{Q}$$ has been successfully applied to $$\rho _R$$. The likelihood of a successful measurement can be increased through amplitude amplification.
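FABLE constructs its block-encoding circuit differently, but the post-selection logic of Eq. (6) can be illustrated with the textbook one-ancilla unitary dilation $$U=\begin{pmatrix} A & \sqrt{I-A^2}\\ \sqrt{I-A^2} & -A \end{pmatrix}$$, where $$A=\hat{Q}/\lambda$$ and $$\lambda =\Vert \hat{Q}\Vert$$ is the spectral norm. This construction is our choice for illustration only: it is a dense matrix rather than a gate-level circuit, it relies on $$\hat{Q}$$ being Hermitian (so that $$A$$ and $$\sqrt{I-A^2}$$ commute), and the Hamiltonian is a random placeholder.

```python
import numpy as np


def herm_fn(M, f):
    """f(M) for a Hermitian matrix M via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * f(w)) @ V.conj().T


def block_encode(Q):
    """Return (U, lam) with U = [[A, S], [S, -A]], A = Q/lam, S = sqrt(I - A^2)."""
    lam = np.linalg.norm(Q, 2)  # spectral norm, so ||A|| <= 1
    A = Q / lam
    S = herm_fn(A, lambda w: np.sqrt(np.clip(1.0 - w ** 2, 0.0, None)))
    return np.block([[A, S], [S, -A]]), lam


def apply_block_encoding(U, psi_r):
    """Apply U to |0>_anc |Psi_R>, post-select the ancilla on |0>, return (state, P_0)."""
    dim = psi_r.size
    out = U @ np.concatenate([psi_r, np.zeros(dim, dtype=complex)])
    branch0 = out[:dim]  # equals (Q / lam) |Psi_R>
    p0 = float(np.vdot(branch0, branch0).real)
    return branch0 / np.sqrt(p0), p0


rng = np.random.default_rng(seed=6)
n, beta = 3, 1.0
M = rng.normal(size=(2 ** n, 2 ** n))
H = (M + M.T) / 2  # placeholder Hamiltonian
Q = herm_fn(H, lambda w: np.exp(-beta * w / 2))
U, lam = block_encode(Q)
psi_r = rng.normal(size=2 ** n) + 1j * rng.normal(size=2 ** n)
psi_r /= np.linalg.norm(psi_r)
phi, p0 = apply_block_encoding(U, psi_r)
print(f"lambda={lam:.3f}, P0={p0:.3f}")
```

The success probability here is $$\Vert \hat{Q}|\Psi _R\rangle \Vert ^2/\lambda ^2$$, which is what amplitude amplification would boost.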

We use FABLE35 to generate a circuit for Eq. (6) based on the non-unitary operator $$\hat{Q}$$. FABLE circuits require $$N+1$$ ancilla qubits to encode an N-qubit operator and are fast to generate for small to medium-sized problems, as the circuit generation algorithm scales as $$\mathcal {O}(N 4^N)$$. The advantage of FABLE over the dilated operator approach is that no performance parameter $$\epsilon$$ is required, and the circuits can be generated more efficiently while requiring fewer CNOT gates. The main disadvantage is that significantly more ancillary qubits are required.
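The $$\mathcal {O}(N 4^N)$$ generation cost is consistent with transforming $$M=4^N$$ rotation angles derived from the matrix entries with a fast Walsh-Hadamard transform, which takes $$\mathcal {O}(M\log M)=\mathcal {O}(N 4^N)$$ operations. The sketch below illustrates only this classical step, not FABLE's circuit: it assumes a real matrix rescaled so that all entries lie in $$[-1,1]$$, uses angles $$\theta =\arccos (a_{ij})$$, applies a transform normalized by $$1/M$$, omits the Gray-code reordering, and counts the angles above a threshold to mimic the compression idea. These conventions and the helper names are our assumptions; see Ref.35 for the exact procedure.

```python
import numpy as np


def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform (Sylvester order), O(M log M) for M = 2^k."""
    x = np.array(x, dtype=float)  # work on a copy
    h = 1
    while h < x.size:
        blocks = x.reshape(-1, 2, h)  # view: [block, half, offset]
        a, b = blocks[:, 0, :].copy(), blocks[:, 1, :].copy()
        blocks[:, 0, :], blocks[:, 1, :] = a + b, a - b
        h *= 2
    return x


def transformed_angles(A, threshold=1e-3):
    """Angle-transform step for a real 2^N x 2^N matrix (hypothetical helper)."""
    a = np.asarray(A, dtype=float).ravel()
    a = a / np.max(np.abs(a))             # entries must lie in [-1, 1] for arccos
    theta = np.arccos(a)
    theta_hat = fwht(theta) / theta.size  # normalization convention assumed
    n_kept = int(np.sum(np.abs(theta_hat) > threshold))
    return theta_hat, n_kept


rng = np.random.default_rng(seed=9)
n = 3
M = rng.normal(size=(2 ** n, 2 ** n))
Q = M + M.T  # placeholder real symmetric operator
theta_hat, n_kept = transformed_angles(Q)
print(theta_hat.size, n_kept)  # 4^N angles in total, and how many exceed the threshold
```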

In Tables 1, 2 and 3, we compare the three methods used in this work for implementing the non-unitary evolution step of TPQ state preparation on quantum computers. Specifically, the methods are compared across three metrics relevant to practical implementations on near-term devices: the CNOT gate count of the involved circuits (Table 1), ancillary qubit requirements (Table 2), and the time to classically generate the required circuits (Table 3). The latter matters because executing the required quantum circuits can take on the order of milliseconds, so the classical computational cost of generating them can dominate the overall cost of the quantum simulation. All data are generated for simulations of N-spin Heisenberg models (described in more detail in the following section). We first attempted to generate the QITE circuits with the XACC software package57; however, this code requires the domain size to be set equal to the system size, which made going to a system size of $$N=4$$ prohibitively expensive for our chosen model. For $$N > 3$$, we therefore used the ArQTiC software package58 to generate the QITE circuits. However, ArQTiC only allows a maximum domain size of 3, so for systems with $$N > 3$$ a truncated domain size of 3 had to be used, resulting in reduced accuracy. In Tables 1, 2 and 3, results from XACC are provided in the column labeled ‘QITE’, while results from ArQTiC are provided in the column labeled ‘Inexact QITE’, since we expect these results to have reduced accuracy due to domain-size truncation26. Since gate counts and circuit generation times of the QITE and dilated operator methods may vary between circuit realizations even for a given system size, the data shown for these methods are averaged over 10 circuit realizations.
All data is collected for an inverse temperature of $$\beta =1$$, and for QITE data, the imaginary time evolution operator is broken into 10 imaginary time steps.

As seen in Tables 1, 2 and 3, while the QITE algorithm has a clear advantage in that it requires no ancillary qubits, its circuit generation time scales prohibitively even at relatively small system sizes. This method therefore seems best suited for simulating very small systems on equally small near-term devices. Switching to inexact QITE with the domain size truncated to 3 significantly alleviates the scaling of circuit generation time with system size, at the cost of increased entangling gate counts and reduced accuracy. The dilated operator method appears to bridge the resource requirement gaps between QITE and FABLE: it requires only a single ancillary qubit, independent of the simulated system size, and its circuit generation time scales more favorably than that of QITE, but at the cost of increased CNOT gate counts. For small systems, the latter may be mitigated by circuit synthesis techniques such as QFAST59 and LEAP60. If sufficient ancillary qubit resources are available, FABLE displays the best CNOT count and circuit generation time scaling of the three methods. It is therefore the best option when coherence times are the dominant limitation on experiments and ample qubits are available.

## Demonstration

We now demonstrate our method by calculating the thermal energy of a Heisenberg spin model under an external magnetic field at various system sizes and inverse temperatures. The Hamiltonian of an N-spin Heisenberg model is given by

\begin{aligned} \hat{H} = \sum _\alpha \sum _{\langle i,j \rangle } J_{\alpha ,ij} \sigma ^\alpha _i\sigma ^\alpha _{j} + h_x\sum _{i=1}^{N} \sigma ^x_i, \end{aligned}
(7)

where $$J_{\alpha ,ij}$$ ($$\alpha \in \{x,y,z\}$$) gives the strength of the exchange coupling interaction between nearest neighbor spin pair $$\langle i,j \rangle$$ in the $$\alpha$$-direction, $$h_x$$ is the strength of an externally applied magnetic field in the x-direction, and $$\sigma ^\alpha$$ are Pauli matrices. For results presented in this work, we set $$J_x = 0.5$$, $$J_y = 1.25$$, $$J_z = 2.0$$, and $$h_x = 1.0$$.
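For small systems, Eq. (7) can be assembled as a dense matrix from Kronecker products of Pauli matrices. The NumPy sketch below takes a list of nearest-neighbour bonds so that the same function covers 1D chains and 2D grids; open boundary conditions, row-major grid numbering, uniform couplings $$J_{\alpha ,ij}=J_\alpha$$, and the function names are our assumptions.

```python
from functools import reduce

import numpy as np

PAULI = {
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def pauli_string(n, factors):
    """Kronecker product with PAULI[factors[i]] on qubit i and identity elsewhere."""
    return reduce(np.kron, [PAULI[factors[i]] if i in factors else np.eye(2) for i in range(n)])


def chain_bonds(n):
    return [(i, i + 1) for i in range(n - 1)]


def grid_bonds(lx, ly):
    """Nearest-neighbour bonds of an lx-by-ly grid, qubit index x + lx*y."""
    bonds = [(x + lx * y, x + 1 + lx * y) for y in range(ly) for x in range(lx - 1)]
    return bonds + [(x + lx * y, x + lx * (y + 1)) for y in range(ly - 1) for x in range(lx)]


def heisenberg_hamiltonian(n, bonds, J=(0.5, 1.25, 2.0), hx=1.0):
    """Eq. (7) with uniform couplings (J_x, J_y, J_z) and a field h_x along x."""
    H = np.zeros((2 ** n, 2 ** n), dtype=complex)
    for i, j in bonds:
        for alpha, J_alpha in zip("xyz", J):
            H += J_alpha * pauli_string(n, {i: alpha, j: alpha})
    for i in range(n):
        H += hx * pauli_string(n, {i: "x"})
    return H


H_1d = heisenberg_hamiltonian(6, chain_bonds(6))
H_2d = heisenberg_hamiltonian(6, grid_bonds(3, 2))
print(np.linalg.eigvalsh(H_1d)[0], np.linalg.eigvalsh(H_2d)[0])  # ground-state energies
# Two spins, no field: the Bell states are eigenstates, with energies -3.75, -0.25, 1.25, 2.75.
print(np.linalg.eigvalsh(heisenberg_hamiltonian(2, chain_bonds(2), hx=0.0)))
```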

From TPQ state formalism it is known that, on average, the squared error from estimating a thermal value with a single TPQ state is bounded by a function that becomes exponentially small with increasing simulated system size32. At small system sizes, some of this error may be mitigated through averaging over multiple TPQ state realizations. For more details and numerical results, see Sect. III in the SI.

In Fig. 3, we present numerical results demonstrating the efficacy of the method in calculating thermal energies of 12-spin 1D and 2D Heisenberg models at various inverse temperatures. The random circuits are built through the aforementioned protocol and the non-unitary operation $$\hat{Q}$$ is numerically simulated. The thermal energies in both the 1D and 2D systems are closely approximated by averaging over just 10 canonical TPQ states.

Next, we present quantum simulator results of our method using the three different quantum circuit implementations of $$\hat{Q}$$. In Fig. 4, the thermal energy of a 5-qubit 1D Heisenberg model as a function of inverse temperature is approximated using the dilated operator, inexact QITE, and FABLE approaches to implement $$\hat{Q}$$. Inexact QITE circuits were generated and simulated with the ArQTiC package at a truncated domain size of $$D=3$$. Results using two different values of the dilated operator parameter $$\epsilon$$ are included to demonstrate its impact on performance. While a smaller $$\epsilon$$ leads to significantly better results at larger $$\beta$$, it requires many more executions, as the success probability decreases with smaller $$\epsilon$$. After constructing the requisite dilated operator to approximate the non-unitary transformation $$\hat{Q}$$, we used Qiskit to decompose it into a circuit with the basis gate set of IBM’s “ibmq$$\_$$brooklyn” device. Presented results are averaged over $$R=10$$ distinct TPQ states, and the error bars show the associated uncertainty.

The dilated operator method appears to be consistently accurate at low $$\beta$$, while its performance begins degrading past some threshold inverse temperature. This threshold can be increased by decreasing the performance parameter $$\epsilon$$. Results derived from using the FABLE technique have comparable accuracy to the dilated operator results at high temperatures and do not appear to exhibit a similar degradation at low temperatures. Results using the inexact QITE method exhibit some reduced accuracy compared to the other techniques at high temperatures, but still significantly outperform the dilated operator methods at low temperatures.

Finally, Fig. 5 compares results from circuits run both on a noiseless Qiskit quantum simulator and on IBM’s “ibmq$$\_$$brooklyn” quantum computer. Using the QITE version of the algorithm with no domain-size truncation, we calculated the thermal energy of a 3-site Heisenberg model as a function of inverse temperature $$\beta$$ while employing readout error mitigation (EM) and zero-noise extrapolation61,62 (ZNE) techniques to reduce error. The QITE circuits were generated using XACC57 then compressed through numerical optimization using the QSearch63 tool, reducing the total number of gates by around two orders of magnitude.

Figure 5 shows that while raw hardware data from generating and utilizing canonical TPQ states to measure thermal values has a significant amount of noise, this can be mitigated by standard techniques such as readout EM and ZNE. These error mitigation techniques can effectively accelerate the timeline in which this method for calculating thermal averages is achievable on near-term devices.

## Conclusion

We have presented a new method for approximating finite-temperature properties of materials on quantum computers using TPQ states, and demonstrated its efficacy by approximating thermal energies of Heisenberg models in one and two dimensions. To demonstrate flexibility in how the non-unitary step of the algorithm is implemented, we presented quantum simulator results in which this transformation is performed with the QITE algorithm, a dilated operator approach, and the recently developed FABLE method. We also presented hardware results from IBM’s “ibmq$$\_$$brooklyn” quantum computer, showing the efficacy of the method when combined with standard error mitigation techniques. Because the accuracy of thermal observables derived from TPQ states increases with system size, and because our algorithm is flexible in its implementation, we expect our method for computing thermal properties of materials to become increasingly useful as higher-quality qubits continue to be added to quantum computers through the NISQ era and beyond.