Variational preparation of finite-temperature states on a quantum computer

The preparation of thermal equilibrium states is important for the simulation of condensed-matter and cosmology systems using a quantum computer. We present a method to prepare such mixed states with unitary operators, and demonstrate this technique experimentally using a gate-based quantum processor. Our method targets the generation of thermofield double states using a hybrid quantum-classical variational approach motivated by quantum-approximate optimization algorithms, without prior calculation of optimal variational parameters by numerical simulation. The fidelity of generated states to the thermal-equilibrium state smoothly varies from 99 to 75% between infinite and near-zero simulated temperature, in quantitative agreement with numerical simulations of the noisy quantum processor with error parameters drawn from experiment.


INTRODUCTION
The potential for quantum computers to simulate other quantum mechanical systems is well known [1], and the ability to represent the dynamical evolution of quantum many-body systems has been demonstrated [2]. However, the accuracy of these simulations depends on efficient initial state preparation within the quantum computer. Much progress has been made on the efficient preparation of non-trivial quantum states, including spinsqueezed states [3] and entangled cat states [4]. Studying phenomena like high-temperature superconductivity [5] requires preparation of thermal equilibrium states, or Gibbs states. Producing mixed states with unitary quantum operations is not straightforward, and has only recently begun to be explored [6,7]. In this work, we demonstrate the use of a variational quantum-classical algorithm to realize Gibbs states using (ideally unitary) gate control on a transmon quantum processor.
Our approach is mediated by the generation of thermofield double (TFD) states, which are pure states sharing entanglement between two identical quantum systems with the characteristic that when one of the systems is considered independently (by tracing over the other), the result is a mixed state representing equilibrium at a specific temperature. TFD states are of interest not only in condensed matter physics but also for the study of black holes [8,9] and traversable wormholes [10,11]. We use a variational protocol [12] motivated by quantumapproximate optimization algorithms (QAOA) that re-lies on alternation of unitary intra-and inter-system operations to control the effective temperature, eliminating the need for a large external heat bath. Recently, verification of TFD state preparation was demonstrated on a trapped-ion quantum computer [6]. Our work experimentally demonstrates the first generation of finitetemperature states in a superconducting quantum computer by variational preparation of TFD states in a hybrid quantum-classical manner.

RESULTS
Theory Consider a quantum system described by Hamiltonian H with eigenstates |j and corresponding eigenenergies E j : The Gibbs state ρ Gibbs of the system is ρ Gibbs (β) = 1 Z systems A and B as Tracing out either system yields the desired Gibbs state in the other.
To prepare the TFD states, we follow the variational protocol proposed by Wu and Hsieh [12] and consider two systems each of size n. In the first step of the procedure, the TFD state at β = 0 is generated by creating Bell pairs Φ + i = (|0 Bi |0 Ai + |1 Bi |1 Ai ) / √ 2 between corresponding qubits i in the two systems. Tracing out either system yields a maximally mixed state on the other, and vice versa. The next steps to create the TFD state at finite temperature depend on the relevant Hamiltonian. Here, we choose the transverse field Ising model in a onedimensional chain of n spins [13], with n = 2 [ Fig. 1(a)]. We map spin up (down) to the computational state |0 (|1 ) of the corresponding transmon. The Hamiltonian describing system A is where ZZ A = Z A2 Z A1 , X A = X A2 +X A1 , and g is proportional to the transverse magnetic field. The Hamiltonian for system B is the same. We focus on g = 1, where a phase transition is expected in the transverse field Ising model at large n [14]. We use a QAOA-motivated variational ansatz [12,15], where intra-system evolution is interleaved with a Hamiltonian enforcing interaction between the systems: where XX BA = X B2 X A2 + X B1 X A1 , and analogously for ZZ BA . For single-step state generation, the unitary operation describing the TFD protocol is The variational parameters γ = (γ 1 , γ 2 ), α = (α 1 , α 2 ) are optimized by the hybrid classical-quantum algorithm to generate states closest to the ideal TFD states. A single step of intra-and inter-system interaction ideally produces the state |ψ( α, γ) = U ( α, γ) Φ + 2 ⊗ Φ + 1 [16]. The variational algorithm extracts the cost function after each state preparation. We engineer a cost function C to be minimized when the generated state is closest to an ideal TFD state [16]. This cost function is Principle and generation of Thermofield Double state. (a) Two identical systems A and B are variationally prepared in an ideally pure, entangled joint state such that tracing out one system yields the Gibbs state on the other. (b) Corresponding qubits in the two systems are first pairwise entangled to produce the β = 0 TFD state. Next, intra-and inter-system Hamiltonians are applied with optimized variational angles ( α, γ) to approximate the TFD state corresponding to the desired temperature.
We compare the performance of this engineered cost function C 1.57 to that of the non-optimized cost function C 1.00 , using the reduction of infidelity to the Gibbs state as the ultimate metric of success (see [17]). The engineered cost function achieves an average improvement of 54% across the β range covered ([10 −2 , 10 2 ] in units of 1/g), as well as a maximum improvement of up to 98% for intermediate temperatures (β ∼ 1). Our choice of the class of cost functions to optimize lets us trade off a slight decrease in low-temperature performance with a significant increase in performance at intermediate temperatures. See [16] for further details on the theory.
The quantum portion of the algorithm prepares the state according to a given set of angles ( α, γ), performs the measurements, and returns these values to the classical portion. The classical portion then evaluates the cost function according to the returned measurements, performs classical optimization, generates and returns the next set of variational angles to evaluate on the quantum portion.

mm
Device and optimized quantum circuit. (a) Optical image of the transmon processor used in this experiment, with false color highlighting the four transmons employed and the dedicated bus resonators providing their nearest-neighbor coupling. (b) Optimized circuit equivalent to that in Fig. 1(b) and conforming to the native gate set in our architecture. All variational parameters are mapped onto rotation axes and angles of single-qubit gates. Tomographic pre-rotations R1-R4 are added to reconstruct the terms in the cost function C and to perform two-qubit state tomography of each system following optimization.

Experiment
We implement the algorithm using four of seven transmons in a monolithic quantum processor [ Fig. 2(a)]. The four transmons (labelled A 1 , A 2 , B 1 , and B 2 ) have square connectivity provided by coupling bus resonators, and are thus ideally suited for implementing the circuit in Fig. 1(b). Each transmon has a microwave-drive line for single-qubit gating, a flux-bias line for two-qubit controlled-Z (CZ) gates, and a dispersively coupled resonator with dedicated Purcell filter [18,19]. The four transmons can be simultaneously and independently read out by frequency multiplexing, using the common feedline connecting to all Purcell filters. All transmons are biased to their flux-symmetry point (i.e., sweetspot [20]) using static flux bias to counter residual offsets. Device details and a summary of measured transmon parameters are provided in [17].
Landscape of the cost function for infinite temperature. Panels (a) and (c): Landscape cuts obtained in noiseless simulation and in experiment, respectively, when varying control angles γ while keeping α = 0 (the ideal solution value). Panels (b) and (d): Corresponding cuts obtained when varying control angles α while keeping γ = 0 (the ideal solution value). See text for details. These landscape cuts are sampled at 100 points and interpolated using the Python package adaptive [21].
In order to realize the theoretical circuit in Fig. 1(b), we first map it to the optimized depth-13 equivalent circuit shown in Fig. 2(b), which conforms to the native gate set in our control architecture. This gate set consists of arbitrary single-qubit rotations about any equatorial axis of the Bloch sphere, and CZ gates between nearestneighbor transmons. Conveniently, all variational angles are mapped to either the axis or angle of single-qubit rotations. Further details on the compilation steps are reported in the Methods section and [17]. Bases prerotations are added at the end of the circuit to first extract all the terms in the cost function C and finally to perform two-qubit state tomography of each system.
Prior to implementing any variational optimizer, it is helpful to build a basic understanding of the costfunction landscape. To this end, we investigate the cost function C at β = 0 using two-dimensional cuts: we sweep γ while keeping α = 0 to study the effect of U intra and vice versa to study the effect of U inter . Note that owing to the β −1.57 divergence, the cost function reduces to − H BA in the β = 0 limit. Consider first the landscape for an ideal quantum processor, which is possible to compute for our system size. The γ landscape at α = 0 is π-periodic in both directions due to the invariance of |TFD(β = 0) under bit-flip (X) and phase-flip (Z) operations on all qubits. The cost function is minimized to -4 at even multiples of π/2 on γ 1 and γ 2 : |TFD(β = 0) is a simultaneous eigenstate of XX BA and ZZ BA with eigenvalue +2 due to the symmetry of the constituting Bell states Φ + i . In turn, the cost function is maximized to +4 at odd multiples of π/2, at which the Φ + i are transformed to sin- The α landscape at γ = 0 is constant, reflecting that |TFD(β = 0) is a simultaneous eigenstate of XX BA and ZZ BA and thus also of any exponentiation of these operators. The corresponding experimental landscapes show qualitatively similar behavior. The γ landscape clearly shows the π periodicity with respect to both angles, albeit with reduced contrast. The α landscape is not strictly constant, showing weak structure particularly with respect to α 2 . These experimental deviations reflect underlying errors in our noisy intermediate-scale quantum (NISQ) processor, which include transmon decoherence, residual ZZ coupling at the bias point, and leakage during CZ gates. We discuss these error sources in detail further below.
The challenge faced by the variational algorithm is to balance the mixture of the states at each β, in order to generate the corresponding Gibbs state. When working with small systems, it is possible and tempting to predetermine the variational parameters at each β by a prior classical simulation and optimization for an ideal or noisy quantum processor. We refer to this common practice [6,22] as cheating, since this approach does not scale to larger problem sizes and skips the main quality of variational algorithms: to arrive at the parameters variationally. Here, we avoid cheating altogether by starting at β = 0, with initial guess the obvious optimal variational parameters for an ideal processor ( γ = α = 0), and using the experimentally optimized ( α, γ) at the last β as an initial guess when stepping β in the range [0, 5] (in units of 1/g). This approach only relies on the assumption that solutions (and their corresponding optimal variational angles) vary smoothly with β. At each β, we use the Gradient-Based Random-Tree optimizer of the scikitoptimize [23] Python package to minimize C, using 4096 averages per tomographic pre-rotation necessary for the calculation of C. After 200 iterations, the optimization is stopped. The best point is remeasured two times, each with 16384 averages per tomographic pre-rotation needed to perform two-qubit quantum state tomography of each system. A new optimization is then started for the next β, using the previous solution as the initial guess.
To begin comparing the optimized states ρ Exp produced in experiment to the target Gibbs states ρ Gibbs , we first visualize their density matrices (in the computational basis) for a sampling of the β range covered (Fig. 4). Starting from the maximally-mixed state II/4 at β = 0, the Gibbs state monotonically develops coherences (off-diagonal terms) between all states as β increases. Coherences between states of equal (opposite) parity have 0 (π) phase throughout. Populations (diagonal terms) monotonically decrease (increase) for even (odd) parity states. By β = 5, the Gibbs state is very close to the pure state |Υ Υ|, where Qualitative comparison of optimized states to the Gibbs state. Visualization of the density matrices (in the computational basis) for the targeted Gibbs states ρ Gibbs (left) and the optimized experimental states ρExp (right) at (a-b) β = 0, (c-d) β = 1 and (e-f) β = 5. As β increases, the Gibbs state monotonically develops coherence between all states, with phase 0 (π) for states with the same (opposite) parity. Populations in even (odd) parity states decrease (increase). The optimized experimental states show qualitatively similar trends.
The noted trends are reproduced in ρ Exp . However, the matching is evidently not perfect, and to address this we proceed to a quantitative analysis.
We employ two metrics to quantify experimental performance: the fidelity F of ρ Exp to ρ Gibbs and the purity P of ρ Exp , given by At β = 0, F = 99% and P = 0.262, revealing a very close match to the ideal maximally-mixed state. However, F smoothly worsens with increasing β, decreasing to 92% at β = 1 and 75% by β = 5. Simultaneously, P does not Performance of the variational algorithm. (a) Fidelity to the Gibbs state as a function of inverse temperature β for experimental states obtained by optimization and cheating. (b) Purity of experimental states as a function of β, and comparison to the purity of the Gibbs state. Added curves in both panels are obtained by numerical simulation of a noisy quantum processor with incremental error models based on calibrated error sources for our device: qubit relaxation and dephasing times, increased dephasing from flux noise during CZ gates, residual ZZ crosstalk at the bias point, and leakage during CZ gates. Leakage is identified as the dominant error source.
closely track the increase of purity of the Gibbs state. By β = 5, the Gibbs state is nearly pure, but P peaks at 0.601.
In an effort to quantitatively explain these discrepancies, we perform a full density-matrix simulation of a four-qutrit system using quantumsim [24]. Our simulation incrementally adds calibrated errors for our NISQ processor, starting from an ideal processor (model 0): transmon relaxation and dephasing times at the bias point (model 1), increased dephasing from flux noise during CZ gates (model 2), crosstalk from residual ZZ coupling at the bias point (model 3), and transmon leakage to the second-excited state during CZ gates (model 4). The experimental input parameters for each increment are detailed in the Methods section and [17]. The added curves in Fig. 5 clearly show that model 4 quantitatively matches the observed dependence of F and P over the full β range, and identifies leakage from CZ gates as the dominant error.

DISCUSSION
The power of variational algorithms relies on their adaptability: the optimizer is meant to find its way through the variational parameter space, adapting to mitigate coherent errors as allowed by the chosen parametrization. For completeness, we compare in Fig. 5 the performance achieved with our variational strategy to that achieved by cheating, i.e., using the pre-calculated optimal ( α, γ) for an ideal processor. Our variational approach, whose sole input is the obvious initial guess at β = 0, achieves comparable performance at all β. This aspect is crucial when considering the scaling with problem size, as classical pre-simulations will require prohibitive resources beyond ∼ 50 qubits, but variational optimizers would not. Given the dominant role of leakage as the error source, which cannot be compensated by the chosen parametrization, it is unsurprising in hindsight that both approaches yield nearly identical performance.
In summary, we have presented the first generation of finite-temperature Gibbs states in a quantum computer by variational targeting of TFD states in a hybrid quantum-classical manner. The algorithm successfully prepares mixed states for the transverse field Ising model with Gibbs-state fidelity ranging from 99% to 75% as β increases from 0 to 5/g. The loss of fidelity with decreasing simulated temperature is quantitatively matched by a numerical simulation with incremental error models based on experimental input parameters, which identifies leakage in CZ gates as dominant. This work demonstrates the suitability of variational algorithms on NISQ processors for the study of finite-temperature problems of interest, ranging from condensed-matter physics to cosmology. Our results also highlight the critical importance of continuing to reduce leakage in two-qubit operations when employing weakly-anharmonic multi-level systems such as the transmon.
During the preparation of this manuscript, we became aware of related experimental work [22] on a trapped-ion system, applying a non-variationally prepared TFD state to the calculation of a critical point.

METHODS
We map the theoretical circuit in Fig. 1(b) to an equivalent circuit conforming to the native gate set in our control architecture and exploiting virtual Z-gate compilation [25] to minimize circuit depth. Single-qubit rotations R XY (φ, θ), by arbitrary angle θ around any equatorial axis cos(φ)x + sin(φ)ŷ on the Bloch sphere, are realized using 20 ns DRAG pulses [26,27]. Two-qubit CZ gates are realized by baseband flux pulsing [28,29] using the Net Zero scheme [30,31], completing in 80 ns. In the optimized circuit [ Fig. 2(b)], CZ gates only appear in pairs. These pairs are simultaneously executed and tuned as one block. Single-qubit rotations R 1 -R 4 are used to change the measurement bases, as required to measure C during optimization and to perform twoqubit tomography [32] in each system to extract F and P . A summary of single-and two-qubit gate performance and a step-by-step derivation of the optimized circuit are provided in [17].
The models used to simulate the performance of the algorithm are incremental: model k contains all the noise mechanisms in model k − 1 plus one more, which we use for labeling in Fig. 5. Model 0 corresponds to an ideal quantum processor without any error. Model 1 adds the measured relaxation and dephasing times measured for the four transmons at their bias point. These times are tabulated in [17]. Model 2 adds the increased dephasing that flux-pulsed transmons experience during CZ gates. For this we extrapolate the echo coherence time T echo 2 to the CZ flux-pulse amplitude using a 1/f noise model [33,34] with amplitude √ A = 1µΦ 0 . This noise model is implemented following [35]. Model 3 adds the idling crosstalk due to residual ZZ coupling between transmons. This model expands on the implementation of idling evolution used for coherence times: the circuit gates are simulated to be instantaneous, and the idling evolution of the system is trotterized. In this case, the residual ZZ coupling operator uses the measured residual ZZ coupling strengths at the bias point [17]. Finally, model 4 adds leakage to the CZ gates, based on randomized benchmarking with modifications to quantify leakage [30,36], and implemented in simulation using the procedure described in [35].
Leakage to transmon second-excited states is found es-sential to quantitatively match the performance of the algorithm by simulation. To reach this conclusion it was necessary to first thoroughly understand how leakage affects the two-qubit tomographic reconstruction procedure employed. The readout calibration only considers computational states of the two transmons involved. Moreover, basis pre-rotations only act on the qubit subspace, leaving the population in leaked states unchanged. Using an overcomplete set of basis pre-rotations for state tomography, comprising both positive (X, Y, Z) and negative (−X, −Y, −Z) bases for each transmon, leads to the misdiagnosis of a leaked state as a maximally mixed state qubit state for that transmon. This is explained in [17]. This supplement provides additional information in support of statements and claims made in the main text. Section I presents the optimization of the engineered cost function. Section II provides a step-by-step description of the transformation of the circuit in Fig. 1(b) into the equivalent, optimized circuit in Fig. 2(b). Section III provides further information on the device and transmon parameters measured at the bias point. Section IV presents a detailed description of the fridge wiring and electronic-control setup. Section V summarizes single-and two-qubit gate performance. Section VI characterizes residual ZZ coupling at the bias point. Section VII details the measurement procedures used for cost-function evaluation and for two-qubit state tomography. Section VIII explains the impact of transmon leakage on two-qubit tomography. Section IX explains the package and error models used in the numerical simulation.

I. OPTIMIZATION OF THE COST FUNCTION
We optimize the cost function to maximize fidelity of the variationally-optimized state |ψ( α, γ) to the TFD state |TFD(β) (assuming an ideal processor). We consider the class of cost functions defined by and perform nested optimization of parameter ς to minimize the infidelity of the variationally-optimized state to the TFD state over a range of inverse temperatures We define the minimization quantity of interest as where O is the set of operators and |Ψ(ς, β) is the state optimized using C ς (β). We find the minimum value of Ξ at ς = 1.57 [see Fig. S1(a)].
We compare the performance of the optimized cost function C 1.57 to that used in prior work, C 1.00 , in two ways. First, we compare the simulated infidelity to the TFD state of states optimized with both cost functions in the range β ∈ [0.1, 10]. The optimized cost function C 1.57 performs better over the entire range. Finally, we compare the simulated fidelity F of the reduced state of system A to the targeted Gibbs state. As shown in Fig. S2, using C 1.57 significantly reduces the infidelity 1 − F for β 3, We also observe that the purity of the reduced state tracks that of the Gibbs state more closely when using C 1.57 .

II. CIRCUIT COMPILATION
In this section we present the step-by-step transformation of the circuit in Fig. 1(b) into the equivalent circuit in Fig. 2(b) realizable with the native gate set in our control architecture.
Exponentiation of ZZ and XX: We first substitute the standard decomposition of the operations e −iφZZ/2 and e −iφXX/2 using controlled-NOT (CNOT) gates and single-qubit rotations, shown in Fig. S3. The decomposition of e −iφZZ/2 uses an initial CNOT to transfer the two-qubit parity into the target qubit, followed by a rotation R Z (φ)  on this target qubit, and a final CNOT inverting the parity. The decomposition of e −iφXX/2 simply dresses the transformations above by pre-and post-rotations transforming from the X basis to the Z basis and back, respectively. The result of these substitutions is shown in Fig. S4.
Compilation using native gate set: The native gate set consists of single-qubit rotations of the form R XY (φ, θ) and CZ gates. We compile every CNOT in Fig. S4 as a circuit using native gates, shown in Fig. S5. Note that R Y (θ) = R XY (90 • , θ). Using this replacement together with the identities R Y (−90 S4. Compilation step 1. Depth-14 circuit obtained by replacing the ZZ and XX exponentiation steps in Fig. 1(b) with the circuits of Fig. S3.   Reduction of circuit depth: Exploiting the commutations in Fig. S7 together with the identities R Y (−90 , we can bring two identical pairs of CZ gates back-toback and cancel them out (since CZ is its own inverse). This leads to the circuit in Fig. S8.
Elimination of Z rotations: All the R Z gates in Fig. S8 can be propagated to the beginning of the circuit using the commutation relation and noting that R Z commutes with CZ. State |0 is an eigenstate of all R Z rotations, so we can ignore all R Z gates at Compilation step 3. Depth-12 circuit obtained by applying the commutation rule in Fig. S7 and simple identities to the circuit of Fig. S6.
the start because they only produce a global phase. This action leads to the final depth-11 circuit shown in Fig. S9, which matches that of Fig. 2(b) upon adding measurement pre-rotations and final measurements on all qubits.
Compilation step 4. Depth-11 circuit obtained by propagating all RZ gates in Fig. S8 to the beginning of the circuit and then eliminating them.

III. DEVICE AND TRANSMON PARAMETERS AT BIAS POINT
Our experiment makes use of four transmons with square connectivity within a seven-transmon processor. Figure S10 provides an optical image zoomed in to this transmon patch. Each transmon has a flux-control line for two-qubit gating, a microwave-drive line for single-qubit gating, and dispersively-coupled resonator with Purcell filter for readout [S1, S2]. The readout-resonator/Purcell-filter pair for B 1 is visible at the center of this image. A vertically running common feedline connects to all Purcell filters, enabling simultaneous readout of the four transmons by frequency multiplexing. Air-bridge crossovers enable the routing of all input and output lines to the edges of the chip, where they connect to a printed circuit board through aluminum wirebonds. The four transmons are biased to their sweetspot using static flux bias to counter any residual offset. Table S1 presents measured transmon parameters at this bias point. S10. Optical image of the device, zoomed in to the four transmons used in the experiment. Added false color highlights the transmon pair of system A (blue, A1, A2) the transmon pair of system B (red, B1, B2), and the dedicated bus resonators used to achieve intra-system (read and blue) and inter-system coupling (purple).

IV. EXPERIMENTAL SETUP
The device was mounted on a copper sample holder attached to the mixing chamber of a Bluefors XLD dilution refrigerator with 12 mK base temperature. For radiation shielding, the cold finger was enclosed by a copper can coated with a mixture of Stycast 2850 and silicon carbide granules (15−1000 nm diameter) used for infrared absorption. To shield against external magnetic fields, the can was enclosed by an aluminum can and two Cryophy cans. Microwavedrive lines were filtered using ∼ 60 dB of attenuation with both commercial cryogenic attenuators and home-made Eccosorb filters for infrared absorption. Flux-control lines were also filtered using commercial low-pass filters and Eccosorb filters with stronger absorption. Flux pulses for CZ gates were coupled to the flux-bias lines via roomtemperature bias tees. Amplification of the readout signal was done in three stages: a travelling-wave parametric amplifier (TWPA, provided by MIT-LL [S3]) located at the mixing chamber plate, a Low Noise Factory HEMT at the 4 K plate, and finally a Miteq amplifier at room temperature.
Room-temperature electronics used both commercial hardware and custom hardware developed in QuTech. Rohde & Schwarz SGS100 sources provided all microwave signals for single-qubit gates and readout. Home-built current sources (IVVI racks) provided static flux biasing. QuTech arbitrary waveform generators (QWG) generated the modulation envelopes for single-qubit gates and a Zurich Instruments HDAWG-8 generated the flux pulses for CZ gates. A Zurich Instruments UHFQA was used to perform independent readout of the four qubits. QuTech mixers were used for all frequency up-and down-conversion. The QuTech Central Controller (QCC) coordinated the triggering of the QWG, HDAWG-8 and UHFQA.
All measurements were controlled at the software level with QCoDeS [S4] and PycQED [S5] packages. The QuTech OpenQL compiler translated high-level Python code into the eQASM code [S6] forming the input to the QCC.

V. GATE PERFORMANCE
The gate set in our quantum processor consists of single-qubit rotations R XY (φ, θ) and two-qubit CZ gates. Singlequbit rotations are implemented as DRAG-type microwave pulses with total duration 4σ = 20 ns, where σ is the Gaussian width of the main-quadrature Gaussian pulse envelope. We characterize single-qubit gate performance by single-qubit Clifford randomized benchmarking (100 seeds per run) with modifications to detect leakage, keeping all other qubits in |0 . Two-qubit CZ gates are implemented using the Net Zero flux-pulsing scheme, with strong pulses acquiring the conditional phase in 70 ns and weak pulses nulling single-qubit phases in 10 ns. Intra-system and inter-system CZ gates were simultaneously tuned in pairs (using conditional-oscillation experiments as in [S7]) in order to reduce circuit depth. However, we characterize CZ gate performance individually using two-qubit interleaved randomized benchmarking (100 seeds per run) with modifications to detect leakage, keeping the other two qubits in |0 . Figure S12 presents the extracted infidelity and leakage for single-qubit gates (circles) and CZ gates (squares).

VI. RESIDUAL ZZ COUPLING AT BIAS POINT
Coupling between nearest-neighbor transmons in our device is realized using dedicated coupling bus resonators. The non-tunability of these couplers leads to residual ZZ coupling between the transmons at the bias point. We quantify the residual ZZ coupling between every pair of transmons as the shift in frequency of one when the state of the other changes from |0 to |1 . We extract this frequency shift using the simple time-domain measurement shown in Fig. S13(a): we perform a standard echo experiment on one qubit (the echo qubit), but add a π pulse on the other qubit (control qubit) halfway through the free-evolution period simultaneous with the refocusing π pulse on the echo qubit. An example measurement with B 2 as the echo qubit and B 1 as the control is shown in Fig. S13(b). The complete results for all Echo-qubit, control-qubit combinations are presented as a matrix in Fig. S13(c). We observe that the residual ZZ coupling is highest between B 1 and the mid-frequency qubits B 2 and A 1 . This is consistent with the higher (lower) absolute detuning and the lower (higher) transverse coupling between A 2 (B 1 ) and the mid-frequency transmons.

VII. MEASUREMENT MODELS, COST FUNCTION EVALUATION, AND TWO-QUBIT STATE TOMOGRAPHY
In this section we present detailed aspects of measurement as needed for evaluation of C and for performing two-qubit state tomography. We begin by characterizing the fidelity and crosstalk of simultaneous single-qubit measurements using the cross-fideltiy matrix as defined in [S8]: where e j (g j ) denotes the assignment of qubit j to the |1 (|0 ) state, and π i (I i ) denotes the preparation of qubit i in |1 (|0 ). The measured cross-fidelity matrix for the four qubits is shown in Fig. S14. From diagonal element F ii we extract the average assignment fidelity for qubit i, the latter given by 1/2 + F ii /2 and quoted in Table S1. The magnitude of the off-diagonal elements F ji with j = i quantifies readout crosstalk, and is below 2% for all pairs. This low level of crosstalk justifies using the simple measurement models that we now describe.

A. Measurement models
The evaluation of the cost function and two-qubit state tomography require estimating the expected value of single-qubit and two-qubit Pauli operators. We do so by least-squares linear inversion of the experimental average of single-transmon measurements and two-transmon correlation measurements.
When measuring transmon i, we 1-bit discretize the integrated analog signal for its readout channel at every shot, outputting m i = +1 when declaring the transmon in |0 and m i = −1 when declared it in |1 . The expected value of m i is given by m i = Tr (M i ρ Exp ), where the measurement operator (in view of the low crosstalk) is modelled as Prepared qubit, rewrite the measurement operator in the form also with real-valued c i I and c i Z ∈ [−1, 1].
When correlating measurements on transmons i and j, we compute the product of the 1-bit discretized output for each transmon. The expected value of m ji = m j × m i is given by m ji = Tr (M ji ρ Exp ), where the measurement operator (also in view of the low crosstalk) is modelled as with real-valued coefficients c ji lk ∈ [−1, 1]. Making use of the 2-qubit Pauli operators given by tensor products, and again truncating to three transmon levels, we can rewrite the measurement operator in the form In experiment, we calibrate the coefficients c i P and c ji P Q (P, Q ∈ I, Z) by linear inversion of the experimental average of single-transmon and correlation measurements with the four transmons prepared in each of the 16 computational states (for which P i = ±1 and Q j P i = ±1). We do not calibrate the coefficients c i 2 , c ji 2P , c ji Q2 , or c ji 22 .
Measurement pre-rotations change the measurement operator as follows: Pre-rotations do not transform the projectors |2 i 2 i | as they only act on the qubit subspace.

B. Cost function evaluation
To evaluate the cost function C, we must estimate the expected value of all single-qubit Pauli operators X i , the two intra-system two-qubit Pauli operators Z j Z i , and the inter-system two-qubit Pauli operators X j X j and Z j Z i [the latter only between corresponding qubits in the two systems (e.g., B 1 and A 1 )]. We estimate these by linear inversion of the experimental averages (based on 4096 measurements) of single-transmon and relevant correlation measurements with the transmons measured in the bases specified in Table S2. As an example, Fig. S15. shows the raw data for the estimation of C with variational parameters ( α, γ) = 0. Note that every evaluation of the cost function includes readout calibration measurements to extract measurement-operator coefficients c i P and c ji P Q (P, Q ∈ I, Z).

C. Two-qubit state tomography
After optimization, we perform two-qubit state tomography of each system separately to assess peformance. To do this, we obtain experimental averages (from 16394 shots) of single-transmon and correlation measurements using an over-complete set of measurement bases. This set consists of the 36 bases obtained by using all combinations of bases for each transmon i and j, drawn from the set {+X, +Y, +Z, −X, −Y, −Z}. The expectation values of single-qubit X -1 X -2 X -3 X -4 X -5 X -6 X -7 X -8 X -9 Z -1 Z -2 Z -3 Z -4 Z -5 Z -6 Z -7 Z -8 Z -9  0000  0001  0010  0011  0100  0101  0110  0111  1000  1001  1010  1011  1100  1101  1110  Single-qubit signals (a) X -1 X -2 X -3 X -4 X -5 X -6 X -7 X -8 X -9 Z -1 Z -2 Z -3 Z -4 Z -5 Z -6 Z -7 Z -8 Z -9  0000  0001  0010  0011  0100  0101  0110  0111  1000  1001  1010  1011  1100  1101  1110  Correlation signals (e)  Table S2 and for calibration of the measurement operators (following preparation of the 16 computational states of the four transmons). Panels show the experimental averages of (a-d) single-transmon measurements, (e,h) intra-system correlations and (f,g) inter-system correlations. Our linear inversion procedure for converting measurement averages into estimates of the expected value of oneand two-qubit Pauli operators is only valid for |l j k i l j k l | = 0 whenever either k or l ≥ 2, i.e., there is no leakage on either transmon. It is therefore essential, particularly for simulation model 4, to understand precisely how leakage in either or both transmons infiltrates our extraction of the two-qubit density matrix ρ Exp .
First we consider the estimation of expected values of single-qubit Pauli operators P i , taking Z i as a concrete example. The expected value of all measurements on this transmon for the basis combinations (6 in total) where this specific transmon is measured in the +Z basis is: For measurement bases +X j and −Z i , For measurement bases −X j and +Z i , Finally, for measurement bases −X j and −Z i , Our least-squares linear inversion of these 4 experimental averages to estimate X j Z i is Clearly, owing to the balanced nature of this linear combination (all coefficients of equal magnitude, 2 positive and 2 negative), this estimator is not biased by c ji 2I , c ji 2Z , c ji I2 , c ji Z2 , and c ji 22 . In other words, the average of X j Z i est is independent of the value of these coefficients.
We are finally in position to describe how leakage in the two-transmon system infiltrates into our two-qubit tomographic reconstruction procedure. Evidently, the complete description of the two-transmon system would be a two-qutrit density matrix ρ 2Qutrit , but our procedure returns a two-qubit density matrix ρ Exp . It is therefore key to understand how elements of ρ 2Qutrit are mapped onto ρ Exp . Table S3 summarizes these mappings and Fig. S16 illustrates them, including several examples. We have verified the mappings by exactly replicating the tomographic procedure in our numerical simulation using quantumsim. To incorporate this into the simulations, we have made use of the measurement coefficients c i experimentally obtained. These leakage mappings have also been used when adding leakage in simulation model 4.
Elements of ρ2Qutrit Mapping onto ρExp l j , k i lj, ki| l j , k i lj, ki| l j , 2i lj, 2i|

IX. ERROR MODEL FOR NUMERICAL SIMULATIONS
Our numerical simulations use the quantumsim [S9] density-matrix simulator with the error model described in the quantumsim dclab subpackage.
Single-qubit gates are modeled as perfect rotations, sandwiched by two 10 ns idling blocks. The idling model takes into account amplitude damping (T 1 ), phase damping (T echo 2 ) (noise model 1 of the main text), and residual ZZ crosstalk (noise model 3). To implement it, we first split the idling intervals into slices of 10 ns or less. These slides include amplitude and phase damping. Between these slices, we add instantaneous two-qubit gates capturing the residual coupling described by the Hamiltonian: The measurements of T 1 , T echo 2 and ζ ij are detailed in previous sections.
The error model for CZ gates is described in detail in [S10]. The dominant error sources are identified to be leakage of the fluxed transmon to the second-excited state through the |11 ↔ |02 channel and increased dephasing (reduced T echo 2 ) due to the fact that the fluxed transmon is pulsed away from the sweetspot. These two effects implement noise models 4 and 2, respectively. The two-transmon process is modeled as instantaneous, and sandwiched by two idling blocks of 35 ns with decreased T echo 2 on the fluxed transmon. Quasistatic flux noise is suppressed to first order in the Net Zero scheme and is therefore neglected. Residual ZZ crosstalk is not inserted during idling for the transmon pair, because it is absorbed by the gate calibration. The error model described in [S10] allows for higher-order leakage effects, e.g., so-called leakage conditional phases and leakage mobility. We do not include these effects.
The simulation finishes by including the effect of leakage on the tomographic procedure as discussed in Section VIII. The density matrix is obtained at the qutrit level, and the correct mapping for the density-matrix elements is applied. We take special care to use the experimental readout coefficients c i to model the readout signal for the simulated density matrix, according to Eq. (S4) and Eq. (S5). The simulation produces data for the same basis set as shown in Fig. S15. Afterwards, the same tomographic state reconstruction routine as in the experiment is applied to these data. in this way, noise model 4 properly accounts for the imperfect reconstruction of leaked states, providing a fair comparison to experiment.