Exploring finite temperature properties of materials with quantum computers

Thermal properties of nanomaterials are crucial to not only improving our fundamental understanding of condensed matter systems, but also to developing novel materials for applications spanning research and industry. Since quantum effects arise at the nano-scale, these systems are difficult to simulate on classical computers. Quantum computers can efficiently simulate quantum many-body systems, yet current quantum algorithms for calculating thermal properties of these systems incur significant computational costs in that they either prepare the full thermal state on the quantum computer, or they must sample a number of pure states from a distribution that grows with system size. Canonical thermal pure quantum (TPQ) states provide a promising path to estimating thermal properties of quantum materials as they neither require preparation of the full thermal state nor require a growing number of samples with system size. Here, we present an algorithm for preparing canonical TPQ states on quantum computers. We compare three different circuit implementations for the algorithm and demonstrate their capabilities in estimating thermal properties of quantum materials. Due to its increasing accuracy with system size and flexibility in implementation, we anticipate that this method will enable finite temperature explorations of relevant quantum materials on near-term quantum computers.


I. INTRODUCTION
As the search for high-performance materials persists in research and industry alike, there is a need for better understanding the thermal properties of materials.In particular, exploring thermal properties of nanomaterials is of great interest, with applications ranging from energy production to nanoelectronics [1,2].Quantum effects can dominate at the nano-scale, and the exponential growth of resources required to simulate quantum systems with classical computers makes the simulation of such materials quickly exceed the capabilities of even the largest classical supercomputers [3].Quantum computers, by contrast, are able to simulate quantum many-body systems efficiently [4,5].Therefore, quantum computers offer a promising route to studying thermal properties of quantum materials.While a plethora of simulations of systems at zero-temperature have been demonstrated on quantum devices in recent years [6][7][8][9][10][11][12][13], the landscape of quantum algorithms to calculate finite temperature properties remains more sparse [14][15][16].The main challenge in exploring finite-temperature properties on quantum computers lies in the preparation of thermal states.
Current quantum algorithms for thermal state preparation fall into two main categories.The first comprises algorithms that initialize the qubits into the full thermal (i.e., mixed) state.In this case, the thermal average of an observable can be computed directly by measuring the observable in this state.Examples include algorithms that prepare the Gibbs state using phase estimation [17][18][19], which require quantum circuits that are too large for near-term quantum devices, otherwise known as noisy intermediate-scale quantum (NISQ) computers [20].Other examples include the variational quantum thermalizer [21] and methods that prepare thermofield double states [1,22], both of which rely on variational techniques.The variational nature of these algorithms necessitates the use of a cost function, which generally becomes hard to compute as system size increases.Such methods are therefore difficult to scale to large or complex systems.Still other methods for generating the full thermal state require a number of ancilla qubits that scales with system size or complexity [23,24], thus limiting the size of systems that can be simulated on current quantum hardware.
Algorithms in the second category prepare an ensemble of pure states, one pure state at a time, where each pure state has been sampled according to the correct thermal distribution.Existing examples rely on Monte Carlo sampling techniques, Markov chains, or both [25][26][27][28][29][30].To calculate thermal averages, the desired observable is measured in each of the different pure states and the results are averaged over the ensemble.As pure states are much easier to prepare on a quantum computer than mixed states, this model for thermal state preparation is more promising for NISQ computers.However, the number of samples required generally grows with the size of the system being simulated, which can lead to significant resource requirements for large systems.
Canonical thermal pure quantum (TPQ) states [31] offer a promising way to estimate thermal averages on quantum computers.Thermal state approximation by TPQ states lies in a separate third category, as it neither incurs the quantum resources required to prepare a mixed state, nor relies on a number of samples that grows with system size.TPQ states are formed by applying a specific non-unitary transformation, which is a function of the system Hamiltonian and inverse temperature, to a random state.The resulting state is shown to be representative of the thermal equilibrium in that observables measured in this state will approximate thermal averages.Remarkably, the error in the expectation value of the ob-servable is bounded by an exponentially decreasing function of system size N .Thus, at sufficiently large N , the expectation value of an observable in only a single TPQ state will yield a very close approximation to the thermal average.The effectiveness of canonical TPQ states has been demonstrated classically [31], but to our knowledge, has not been implemented on a quantum computer.
Here, we present an algorithm for generating canonical TPQ states on quantum computers, enabling the estimation of finite temperature properties of materials on NISQ devices.Our algorithm relies on a straightforward and scalable protocol for preparing the random state [32], which allows for circuit depths to be tuned to find a balance between desired accuracy and feasibility of execution on NISQ hardware.Furthermore, the algorithm is agnostic to implementation of the non-unitary transformation of the random state, which can be tailored to the resource constraints of different quantum devices.We compare three possible implementations for approximating the non-unitary transformation: (i) the quantum imaginary time evolution (QITE) algorithm [25], which can be better suited to devices constrained in qubit count, (ii) the dilated operator approach [33], which provides some advantages in resource requirements over the QITE algorithm when a single ancillary qubit is available, and (iii) the FABLE method [34] which is better suited for devices with limited coherence times.It is noted that Quantum Signal Processing may also be used to approximate this non-unitary operation, as discussed in Ref. [35].
We demonstrate our algorithm using each implementation for the Heisenberg model, a quintessential model used for studying a range of behaviors in materials [36][37][38][39][40].We anticipate that this algorithm will facilitate estimations of finite temperature properties of materials on near-term quantum devices.Furthermore, since error in estimating thermal averages with TPQ states decreases with increasing system size, we believe this algorithm will only become increasingly useful as quantum hardware continues to grow in size in the coming years.

II. THEORETICAL FRAMEWORK
The (unnormalized) canonical TPQ state of a system of size N , governed by Hamiltonian H, at inverse temperature β is defined as [31] |β, where i=1 c i |i is a random state, defined by complex amplitudes c i which are uniformly sampled from the unit hypersphere such that and |i is an arbitrary orthonormal basis.Defined this way, |Ψ R is characterized as a Haar-random pure state [41].We define as the ensemble expectation value of an operator Â for a system of size N at inverse temperature β, and as the corresponding expectation value of Â in a single TPQ state.It is noted that Â must be a low-degree polynomial of local operators for the following analysis to hold, but this group contains many prominently used observables including energy, magnetization, and a number of relevant correlation functions.The error between Â T P Q β,N and Â ens β,N is bounded by a value that becomes exponentially small with increasing system size N [31].In practice, this means that for sufficiently large N , measuring the desired observable in a single TPQ state will provide a good approximation of the true thermal average.Indeed, Ref. [31] found less than a 1% error when estimating thermal properties using a single TPQ state for a quantum spin model with N = 30.At lower N , fidelity may be increased by averaging over the measured values from multiple TPQ states [31].
The first step in preparing a TPQ state on a quantum computer is the preparation of |Ψ R .We can approximate this Haar-random state using random quantum circuits constructed in the manner proposed in Ref. [32] and illustrated in Fig. 1.The random circuits we consider are composed of "blocks," where each block is composed of a layer of single-qubit rotation gates on every qubit followed by a layer of two-qubit entangling gates.The singlequbit rotation gates are selected from a finite set A = {RX( π 2 ), RY ( π 2 ), T }, with the constraint that no gate may be chosen for the same qubit two blocks in a row.In other words, if R i,j is the jth single-qubit gate acting on qubit i, then R i,j ∈ A \ R i,j−1 .
The two-qubit gate layers, as shown in Fig. 1b-c, follow a fixed pattern determined by the dimension of the Hamiltonian of interest; given a system dimension of k, there will be 2k different two-qubit gate layer patterns to loop through to ensure all coupling directions have been accounted for.In this way, the random circuits are easily generalizable to higher dimensions.For example, Fig. 1b shows the two 2-qubit gate patterns that must be looped through for a one-dimensional (1D) system.Similarly, Fig. 1c shows the four 2-qubit gate patterns to loop through for 2D systems.
Random circuits of this form can be defined by a single parameter d, which sets the number of blocks and ultimately controls the circuit depth.Such circuits can be seen as approximating successively higher t-designs as d is increased [42][43][44][45].(A unitary t-design is an approximation of a Haar random unitary which accurately simulates the first t moments of Haar random unitaries.Higher order t-designs generate states whose properties converge to those of Haar-random states).We can see this convergence by plotting the entropy of the resulting random state.This plot is useful for determining how large d needs to be for a given system size; the necessary d will be the point at which the state entropy is sufficiently converged to the value ln N − 1 + γ characteristic of Haar-random states, where γ ≈ 0.577 is the Euler-Mascheroni constant.See Section I in the Supplementary Information (SI) for these convergence plots with and without simulated device noise.Here, we set d = 20, unless otherwise specified.
Once the random state |Ψ R has been generated, the non-unitary operator Q must be applied to generate a canonical TPQ state.Given that quantum computers can only perform unitary operations, this must be implemented by a unitary approximation of the non-unitary transformation.Several approximation methods exist and in practice, the method choice is informed by available quantum resources.Here, we demonstrate our algorithm with three different approaches, including QITE [25], a dilated operator approach [33], and the FABLE method [34].
With QITE, neither ancillary qubits nor post-selection of results are required.This however, comes at the cost of deeper circuits that require significant classical resources to generate.Circuit generation time with QITE grows quickly as the simulated system size is increased, though this can be mitigated by decreasing the domain-size chosen to generate the QITE approximation.When average correlation lengths of the simulated system are small, a domain-size significantly smaller than total system size can be chosen to dramatically decrease the computational complexity of generating the QITE circuit.However, this can result in reduced accuracy when correlation lengths of the system are larger than the selected domain size.
The unitary dilation method requires a single ancilla qubit as well as post-selection of results.Any experiment in which the ancilla qubit is measured to be in the '1' state must be discarded, which increases the number of shots required to generate good results.Overall, the unitary dilation method tends to generate shallower circuits than QITE, but at the price of requiring more shots.
Finally, FABLE requires a register of ancilla qubits that grows linearly with the simulated system size as well as post-selection of results.For a system size of N , FA-BLE requires N + 1 ancilla qubits, all of which must be measured to be in the '0' state (otherwise the result must be discarded).While its post-selection requirement based on such a large ancilla register will necessitate a very large number of shots, FABLE offers the most favorable circuit depths and circuit generation times of the three methods.
The algorithm shown in Fig. 2 summarizes our method for approximating thermal averages with TPQ states on quantum computers.First, the d-layer circuit is constructed to generate |Ψ R .Layers of randomly chosen single-qubit rotation gates are alternated with layers of 2-qubit entangling gates following a pattern set by the Hamiltonian dimension k.Next, a unitary approximation of the non-unitary transformation Q is applied.The algorithm is flexible in how this non-unitary operation is implemented.The observable of interest is then measured in the resulting canonical TPQ state, and this process is repeated R times to construct and measure R distinct TPQ states.Finally, if applicable, the average of the measured thermal values is taken.
3 Similarly, Fig. 1c shows the four 2-qubit gate patterns to loop through for 2D systems.
Random circuits of this form can be defined by a single parameter d, which sets the number of blocks and ultimately controls the circuit depth.Such circuits can be seen as approximating successively higher t-designs as d is increased [41][42][43][44].(A unitary t-design is an approximation of a Haar random unitary which accurately simulates the first t moments of Haar random unitaries.Higher order t-designs generate states whose properties converge to those of Haar-random states).We can see this convergence by plotting the entropy of the resulting random state.This plot is useful for determining how large d needs to be for a given system size; the necessary d will be the point at which the state entropy is su ciently converged to the value ln N 1 + characteristic of Haar-random states, where ⇡ 0.577 is the Euler-Mascheroni constant.See Section I in the Supplementary Information (SI) for these convergence plots with and without simulated device noise.Here, we set d = 20, unless otherwise specified.
Once the random state | R i has been generated, the non-unitary operator Q must be applied to generate a canonical TPQ state.Given that quantum computers can only perform unitary operations, this must be implemented by a unitary approximation of the non-unitary transformation.Several approximation methods exist and in practice, the method choice is informed by available quantum resources.Here, we demonstrate our algorithm with three di↵erent approaches, including QITE [25], a dilated operator approach [33], and the FABLE method [34].
With QITE, neither ancillary qubits nor post-selection of results are required.This however, comes at the cost of deeper circuits that require significant classical resources to generate.Circuit generation time with QITE grows quickly as the simulated system size is increased, though this can be mitigated by decreasing the domain-size chosen to generate the QITE approximation.When average correlation lengths of the simulated system are small, a domain-size significantly smaller than total system size can be chosen to dramatically decrease the computational complexity of generating the QITE circuit.However, this can result in reduced accuracy when correlation lengths of the system are larger than the selected domain size.
The unitary dilation method requires a single ancilla qubit as well as post-selection of results.Any experiment in which the ancilla qubit is measured to be in the '1' state must be discarded, which increases the number of shots required to generate good results.Overall, the unitary dilation method tends to generate shallower circuits than QITE, but at the price of requiring more shots.
Finally, FABLE requires a register of ancilla qubits that grows linearly with the simulated system size as well as post-selection of results.For a system size of N , FA-BLE requires N + 1 ancilla qubits, all of which must be measured to be in the '0' state (otherwise the result must be discarded).While its post-selection requirement based on such a large ancilla register will necessitate a very large number of shots, FABLE o↵ers the most favorable circuit depths and circuit generation times of the three methods.
Algorithm 14 summarizes our method for approximating thermal averages with TPQ states on quantum computers.First, the d-layer circuit is constructed to generate | R i. Layers of randomly chosen single-qubit rotation gates are alternated with layers of 2-qubit entangling gates following a pattern set by the Hamiltonian dimension k.Next, a unitary approximation of the non-unitary transformation Q is applied.The algorithm is flexible in how this non-unitary operation is implemented.The observable of interest is then measured in the resulting canonical TPQ state, and this process is repeated R times to construct and measure R distinct TPQ states.Finally, if applicable, the average of the measured thermal values is taken.
If QITE is chosen to implement line 10 in Algorithm 14, the non-unitary operator Q is viewed as an evolution through imaginary time.To find a unitary approximation, this evolution is broken up into time-steps of imaginary time, and each time-step is approximated by a unitary sub-circuit.Increasing the number of timesteps into which the imaginary-time evolution is broken will increase the transformation fidelity, i.e., the accuracy of the approximation.However, the size of the quantum circuit grows linearly with the number of time-steps.The sub-circuit for each time-step is constructed based on a so-called domain-size, which can be set to be equal to or less than the simulated system size, and should be chosen based on the average correlation length within the If QITE is chosen to implement line 10 in Fig. 2, the non-unitary operator Q is viewed as an evolution through imaginary time.To find a unitary approximation, this evolution is broken up into time-steps of imaginary time, and each time-step is approximated by a unitary subcircuit.Increasing the number of time-steps into which the imaginary-time evolution is broken will increase the transformation fidelity, i.e., the accuracy of the approximation.However, the size of the quantum circuit grows linearly with the number of time-steps.The sub-circuit for each time-step is constructed based on a so-called domain-size, which can be set to be equal to or less than the simulated system size, and should be chosen based on the average correlation length within the simulated system.The larger the domain-size the more accurate the QITE approximation will be, though this comes at the cost of increased computation complexity in generating the sub-circuits.When using QITE, the number of imaginary time-steps and the domain-size must be chosen appropriately by the user for the simulated system.
If line 10 in Fig. 2 is instead implemented with the dilated operator approach, an ancillary qubit initialized in state |1 must be added to create the augmented input state A dilated unitary operator Ω is then constructed in the new 2 N +1 dimensional Hilbert space that will approximate the action of Q on the original 2 N dimensional system Hilbert space [33].Ω is defined in terms of the nonunitary operator Q as where is a parameter that controls the performance of the operator.The dilated unitary operator Ω is then applied to the augmented initial state ρ in , and the ancillary qubit is measured.If it is measured in the state |0 , the original N -qubit system of interest has been successfully transformed by an approximation to the non-unitary operator Q.The probability that the ancillary qubit is measured in state |0 , also called the probability of success, is denoted P 0 .If the ancillary qubit is measured in state |1 , the results of the entire circuit are discarded.[33].
To help quantify the performance of this dilated operator, it is useful to define a fidelity metric that measures how close the effective transformation on the N -qubit system is to the desired non-unitary transformation under Q.This fidelity can be defined as: Here, ρ 0 is the density matrix of the N -qubit system after tracing out the ancillary qubit, provided that it is measured in the state |0 .The probability of success P 0 and the transformation fidelity F are ultimately controlled by the user's choice of .While a small is required to obtain an accurate approximation to the non-unitary operator (i.e.high transformation fidelity F ), decreasing generally decreases the probability of success P 0 , which means more shots need to be run until the ancilla is measured in the |0 state, indicating a successful transformation.See Section II of the SI for more information regarding the behavior of F and P 0 with varying .While this method introduces a single ancillary qubit, its advantage is that increasing the accuracy of the ap-proximate non-unitary transformation only requires increasing the average number of shots needed to see achieve a successful transformation.This stands in contrast to the QITE algorithm, where increasing the accuracy of the approximation requires a larger number of imaginary time-steps and subsequently deeper circuits.If the dilated operator approach is preferred, then the only change in Fig. 2 is the addition of an ancillary qubit initialized in the |1 state which is not included in the random circuit construction.The non-unitary transformation approximation step then takes in the additional argument .
The third approach we follow to implement line 10 in Fig. 2 is a block-encoding of the non-unitary operator Q, which is the embedding of the non-unitary in the leading principal block of a larger unitary U , Here, * indicate arbitrary matrix elements.Assuming a ancilla qubits are used to block encode the 2 N dimensional Q, the operator U is constructed in the 2 N +a dimensional Hilbert space.Next, we apply U to the augmented input state ρ in = (|0 ⊗a 0| ⊗a ) ⊗ ρ R .If all a ancillary qubits are measured in the |0 state, the N -qubit operator Q has been successfully applied to ρ R .The likelihood of a successful measurement can be increased through amplitude amplification.
We use FABLE [34] to generate a circuit for Eq. 6 based on the non-unitary operator Q. FABLE circuits require N + 1 ancilla qubits to encode an N -qubit operator and are fast to generate for small to medium-sized problems as the circuit generation algorithm scales as O(N 4 N ).The advantage of FABLE over the dilated operator approach is that no performance parameter is required and the circuits can be generated more efficiently while requiring fewer CNOT gates.The main disadvantage is that significantly more ancillary qubits are required.
In Tables I-III, we compare the three different methods utilized in this work for implementing the non-unitary evolution step of TPQ state preparation on quantum computers.Specifically, the three methods are compared across three metrics relevant to practical implementations on near-term devices: CNOT gate count of involved circuits (Table I), ancillary qubit requirements (Table II), and the time to classically generate the required circuits (Table III).All data is generated for simulation of N -spin Heisenberg models (described in more detail in the following section).We first attempted to generate the QITE circuits with the XACC software package [46], however, this code requires the domain-size to be set equal to the system size, which made going to a system size of N = 4 prohibitively expensive for our chosen model.For N > 3, we therefore used the ArQTiC software package [47] to generate the QITE circuits.However, ArQTiC only allows a maximum domain-size of 3. Therefore, for systems with N > 3, a truncated domain-size of 3 had to be used, resulting in reduced accuracy of results.In Tables I-III, results from XACC are provided in the column labeled 'QITE', while results from ArQTiC are provided in the column labeled 'Inexact QITE', since we expect these results to have lowered accuracy due to domain-size truncation [25].Since gate counts and/or circuit generation times of the QITE and dilated operator methods may vary between circuit realizations even for a given system size, data shown for these methods have been averaged over 10 circuit realizations.All data is collected for an inverse temperature of β = 1, and for QITE data, the imaginary time evolution operator is broken into 10 imaginary time steps.As seen in Tables I-III, while the QITE algorithm has a clear advantage in that it requires no ancillary qubits, its scaling of circuit generation time is seen to be practically prohibitive even at relatively small system size.Therefore, this method seems to be best suited for simulating very small systems on equally small near-term devices.
Switching to an inexact QITE with the domain-size truncated to 3 significantly alleviates the circuit generation time scaling with system size at the cost of increased entangling gate counts and reduced accuracy of results.
The dilated operator method appears to bridge the resource requirement gaps of QITE and FABLE methods; it requires a minimal ancillary qubit requirement that is independent of simulated system size, and has more favorable circuit generation time scaling than QITE, but at a cost of increased CNOT gate counts.For small systems, the latter may be mitigated by some circuit synthesis techniques such as QFAST [48] and LEAP [49].If sufficient ancillary qubit resources are available, the FA-BLE technique displays the best CNOT count and circuit generation time scalings of the three methods.Therefore, it is the best option when coherence times are a dominant limitation on experiments and ample qubit counts are available.

III. DEMONSTRATION
We now demonstrate our method by calculating the thermal energy of a Heisenberg spin model under an external magnetic field at various system sizes and inverse temperatures.The Hamiltonian of an N -spin Heisenberg model is given by where J α,ij (α ∈ {x, y, z}) gives the strength of the exchange coupling interaction between nearest neighbor spin pair i, j in the α-direction, h x is the strength of an externally applied magnetic field in the x-direction, and σ α are Pauli matrices.For results presented in this work, we set J x = 0.5, J y = 1.25, J z = 2.0, and h x = 1.0.
From TPQ state formalism it is known that, on average, the squared error from estimating a thermal value with a single TPQ state is bounded by a function that becomes exponentially small with increasing simulated system size.[31] At small system sizes, some of this error may be mitigated through averaging over multiple TPQ state realizations.For more details and numerical results, see Section III in the SI.
In Fig. 3, we present numerical results demonstrating the efficacy of the method in calculating thermal energies of 12-spin 1D and 2D Heisenberg models at various inverse temperatures.The random circuits are built through the aforementioned protocol and the non-unitary operation Q is numerically simulated.The thermal energies in both the 1D and 2D systems are closely approximated by averaging over just 10 canonical TPQ states.Next, we demonstrate quantum simulator results of our method using the three different quantum circuit implementations of Q.In Fig. 4a, the thermal energy of a 3-qubit 1D Heisenberg model as a function of inverse temperature β is computed using QITE to approximate Q.The QITE circuit was generated using XACC [46].Results are averaged over R = 10 TPQ states, with error bars showing the uncertainty given by one standard deviation divided by √ R. In Fig. 4b, the thermal energy of a 5-qubit 1D Heisenberg model as a function of inverse temperature is approximated using the dilated operator and FABLE approaches to implement Q. Results using two different values of the dilated operator parameter are included to demonstrate its impact on performance.While smaller leads to significantly better results for larger β, it requires many more executions as the success probability decreases with smaller .After constructing the requisite dilated operator to approximate the nonunitary transformation Q, Qiskit was utilized to decompose it into a circuit with the basis gate set of the IBM's "ibmq brooklyn" device.Presented results are averaged over R = 10 distinct TPQ states, and the error bars show uncertainty.
The dilated operator method appears to be consistently accurate at low β, while its performance begins degrading past some threshold inverse temperature.This threshold can be increased by decreasing the performance parameter .Results derived from using the FABLE technique have comparable accuracy to the dilated operator results at high temperatures and do not appear to exhibit a similar degradation at low temperatures.
Finally, Fig. 5 demonstrates results from circuits run on IBM's "ibmq brooklyn" quantum computer.Using the QITE version of the algorithm, we calculated the thermal energy of a 3-site Heisenberg model as a function of inverse temperature β while employing readout error mitigation (EM) and zero-noise extrapolation [50,51] (ZNE) techniques to reduce error.The QITE circuits were compressed through numerical optimization using the QSearch [52] tool.Fig. 5 shows that while raw hardware data from generating and utilizing canonical TPQ states to measure thermal values has a significant amount of noise, this can be mitigated by standard techniques such as readout EM and ZNE.These error mitigation techniques can effectively accelerate the timeline in which this method for calculating thermal averages is achievable on near-term devices.

IV. CONCLUSION
We have presented a new method for approximating finite temperature properties of materials on quantum computers using TPQ states.We demonstrated its efficacy through approximating thermal energies of Heisenberg models in one and two dimensions.To demonstrate flexibility in how the non-unitary step of the algorithm is implemented, we presented results from a quantum simulator derived from performing this transformation with the QITE algorithm, a dilated operator approach, and the recently-developed FABLE method.We also present hardware results executed on IBM's "ibmq brooklyn" quantum computer, showing the efficacy of the method when combined with standard error mitigation techniques.Due to the increasing accuracy of thermal observables derived from TPQ states with system size, as well as the flexibility in the implementation of our algorithm, we expect our method for computing thermal properties of materials to be increasingly useful as higher quality qubits continue to be added to current quantum computers as we progress through the NISQ era and beyond.

SUPPLEMENTARY MATERIAL A. Haar-random behavior of random circuit states
A useful characteristic of a random state |ψ R is its entropy H: where In a true Haar-random state, p k values follow the Porter-Thomas distribution, which can be shown to lead to H Haar = ln 2 N − 1 + γ where γ ≈ 0.577 is Euler's constant [45].
As shown in Fig. S1, the entropy of the states generated by this random circuit structure approaches what is characteristic of Haar-random states in relatively few layers when circuits are ideally simulated.The two-qubit gates can be either CZ or CNOT gates, but since CZ sets have shown to converge upon Haar-random values slightly faster than CNOT sets [32], results presented here utilize CZ gates.We also illustrate the effects of simulated device noise; the entropy of the N -qubit random state generated with noisy circuits converges to a value slightly higher than what would be expected of an N -qubit Haar-random state.This is not anticipated to meaningfully affect resulting thermal value estimations.A B-spline of degree 2 is fit to the data for readability.The entropy of ideal circuits approaches the value expected in a 5-qubit Haar-random state at d ≤ 10, in line with the findings of Ref. [32], while the simulated device noise causes the entropy to approach a slightly higher value.

B. Dilated operator fidelity and probability behavior
As mentioned in the main text, the performance of the dilated operator approach to approximating Q exhibits complex dependencies on the parameter .The main performance metrics considered are the fidelity of transformation and the probability of a successful transformation occurring, defined as F and P 0 in the main text respectively, so it is useful to examine how these properties change with .As an example task, we will seek to determine the thermal energy of a 3-site Heisenberg model with J x = 0.5, J y = 1.25, J z = 2.0, and h x = 1.0 at inverse temperature β = 0.5.Fig. S2 shows the thermal energy H T P Q β,H averaged over R = 100 independent TPQ states realized using the dilated operator approach for varying , along with the probability of success P 0 , transformation fidelity F , and the true ensemble thermal energy H ens β,H .As seen in Fig. S2, the transformation fidelity is highest at low , while this is where the probability of success is lowest.Therefore, one must pick a value of such that the transformation fidelity is high enough to recover a good enough approximation of the observable of interest, while making sure the probability of success is large enough to be practical.This threshold will depend on how many shots are feasible to run in a particular research setting.

C. TPQ State Error Trends
Fig. S3a demonstrates observed error trends with increasing system size N , averaged over results from R = 100 distinct TPQ states, for a 1D Heisenberg model at inverse temperature β = 0.5.We define the average squared error as D(H) 2 = ( Ĥ T P Q − Ĥ ens ) 2 in alignment with the TPQ state formalism laid out in Ref. [31].
Results are shown from TPQ states formed from random circuits with d = 2, when the approximation to a Haarrandom state is expected to be poor, and d = 50, when this approximation is expected to be close.In the latter case, the average squared error is shown to generally trend down as system size increases, but this trend is not observed when the TPQ states are formed from poor approximations to Haar-random states.This aligns with expectations from Ref. [31].For low N , the error from utilizing a single TPQ state may become large.Fig. S3b demonstrates that averaging over the results from multiple TPQ states can reduce such error.In this example, the observable of interest is the thermal energy as a func-tion of inverse temperature β, and the simulated system is a 6-spin Heisenberg model.

Figure 1 .
Figure 1.(a) Gate pattern for approximate Haar-random circuits.Ri,j are randomly selected single-qubit rotation gates selected from a finite set with constraints (see main text).In 1D systems 2-qubit layers follow the pattern ABAB... with individual layer patterns specified in (b), while in 2D systems, 2-qubit layers follow the pattern ABCDABCD..., with individual layer patterns specified in (c).

Figure 2 .
Figure 2. Pseudocode algorithm for computation of thermal averages with canonical TPQ states on quantum computers.

Figure 3 .
Figure 3. Thermal energy, as a function of inverse temperature β of a 1D Heisenberg model on a 12-qubit lattice (blue) and a 2D Heisenberg model on a 4x3 lattice (black).Shown are results averaged over 10 TPQ state realizations.The true ensemble thermal energies (dashed) are plotted for reference.

Figure 4 .Figure 5 .
Figure 4. Thermal energy, as a function of inverse temperature β, calculated with TPQ states on a noiseless quantum simulator for a (a) 3-qubit 1D Heisenberg model using the QITE algorithm to approximate Q (b) 5-qubit 1D Heisenberg model using FABLE and a dilated operator approach to approximate Q.

Figure S1 .
Figure S1.Entropy in 5-qubit random states prepared by random circuits of increasing depth d with and without simulated device noise mimicking the IBM "Quito" quantum computer.A B-spline of degree 2 is fit to the data for readability.The entropy of ideal circuits approaches the value expected in a 5-qubit Haar-random state at d ≤ 10, in line with the findings of Ref.[32], while the simulated device noise causes the entropy to approach a slightly higher value.

Figure
Figure S3.(a) Squared error of thermal energy calculated from TPQ states with d = 2 and d = 50 as a function of system size N for a 1D Heisenberg model at inverse temperature β = 0.5.Results are averaged over 100 TPQ state realizations.(b) Comparing thermal energies of a 6-spin Heisenberg model as a function of inverse temperature β calculated from a single TPQ state and 100 TPQ states (R=1 and R=100 respectively).Exact values are shown for comparison.

TABLE I .
CNOT gate count at various system sizes N for approximating the imaginary time evolution step of TPQ state preparation with the QITE, inexact QITE, dilated operator, and FABLE algorithms at β = 1.

TABLE III
. Wall-clock circuit generation time at various system sizes N for approximating the imaginary time evolution step of TPQ state preparation with the QITE, inexact QITE, dilated operator, and FABLE algorithms at β = 1.