Independent, extensible control of same-frequency superconducting qubits by selective broadcasting

A critical ingredient for realizing large-scale quantum information processors will be the ability to make economical use of qubit control hardware. We demonstrate an extensible strategy for reusing control hardware on same-frequency transmon qubits in a circuit QED chip with surface-code-compatible connectivity. A vector switch matrix enables selective broadcasting of input pulses to multiple transmons with individual tailoring of pulse quadratures for each, as required to minimize the effects of leakage on weakly anharmonic qubits. Using randomized benchmarking, we compare multiple broadcasting strategies that each pass the surface-code error threshold for single-qubit gates. In particular, we introduce a selective-broadcasting control strategy using five pulse primitives, which allows independent, simultaneous Clifford gates on arbitrary numbers of qubits.

Building a fault-tolerant quantum computer requires the ability to efficiently address and control individual qubits in a large-scale system. Many leading experimental quantum information platforms, among them trapped ions [1], electronic spins in impurities and quantum dots [2] and superconducting circuits [3], employ qubits with level transitions in the microwave frequency domain. Addressing these transitions often involves expensive microwave electronics scaling linearly with the number of qubits. To move beyond the state of the art in microwave-frequency quantum processors, such as those recently used for small-scale quantum error correction in superconducting circuits [4][5][6], it will already be beneficial to have a hardware-efficient control strategy that harnesses economies of scale. One approach is to use microwave pulses from a single control source for multiple qubits [7], requiring frequency-matched qubits and high-speed routing of pulses to separate control lines. The linear scaling of control equipment could then be reduced to a constant overhead for the most expensive resources.
Using control equipment for multiple qubits has previously been demonstrated for optical addressing in atomic systems, where qubits naturally have the same frequency [8][9][10][11]. Such frequency reuse also becomes possible in circuit quantum electrodynamics (cQED) [12] in the context of fault-tolerant computation strategies [13][14][15] which rely only on local interactions between qubits mediated by bus resonators. The natural isolation between different lattice sites allows the use of repeating patterns of qubit frequencies with selectivity provided by spatial separation. A tileable unit cell with a handful of qubit frequencies [16] could therefore provide a promising route towards scalability. Crucially, this also solves the frequency-crowding problem which arises when trying to fit many distinct-frequency qubits within the finite useful bandwidth of circuit-based devices, particularly for designs based on weakly anharmonic qubits where higher levels must also be avoided [17,18]. While no qubit experiments have yet shown the viability of this approach, Hornibrook et al. have recently demonstrated a cryogenic switching matrix for pulse distribution operating at 20 mK, triggered by a field-programmable gate array at 4 K [7]. Cryogenic control equipment may shorten feedback latency and reduce wiring complexity across temperature stages, but the isolation and operational frequency range achieved are currently insufficient for typical cQED experiments.
In this Letter, we demonstrate frequency reuse in an extensible solid-state multiqubit architecture. Specifically, we show independent simultaneous control of two same-frequency qubits with a home-built room-temperature vector switch matrix (VSM). The VSM allows tailoring of control pulses to individual qubit properties, and routing of the pulses to either one or both of the qubits using fast digital markers. We develop several different approaches to selective pulse broadcasting, including a simple scheme for implementing independent Clifford control on an arbitrary number of qubits with a constant overhead in time. The device for this experiment is designed to allow testing in a circuit with the correct connectivity of a relevant surface-code lattice [19,20]. Using randomized benchmarking (RB), we show that all control schemes exceed the fidelity threshold for the surface code and are limited mainly by qubit relaxation. We also develop a method for measuring leakage to the second excited state directly within the context of RB [21,22]. We characterize the limitations of our system and find no major obstacles to scaling up to larger implementations.
FIG. 1. The device (gold-colored box on right) connects two transmons with matched frequency (QT and QB: 6.220 GHz) indirectly through coupling buses and a third non-matched transmon (QM: 6.550 GHz). This provides the smallest relevant subunit of the four-frequency surface-code fabric illustrated above right. The VSM (blue box) allows independent, simultaneous transmon control with tailored DRAG pulsing by combining and directing Gaussian and derivative-of-Gaussian input pulses to separate outputs T and B connected to dedicated drive lines for each qubit. The link between inputs and outputs can be switched on nanosecond timescales using the digital marker inputs MT and MB (orange lines). DRAG pulses for the targeted qubit are independently tuned in both amplitude and phase for each input-output pair (top left).

Demonstrating frequency reuse in a context relevant for increasing system sizes requires two key elements: a method for distributing control pulses to multiple qubits using a single qubit-control source, and a multiqubit device containing same-frequency qubits with relevant connectivity. Here, we focus on a particular implementation of the surface code, where the connectivity between nearest-neighbor qubits is achieved via bus resonators [13]. Figure 1 illustrates a conceptual design based on repeated tiling of a unit cell consisting of four qubits with unique frequencies that are coupled via bus resonators.
Our device contains a small block of this design, consisting of two same-frequency transmon qubits (Q T and Q B ), which are connected to a third qubit (Q M ) via separate bus resonators (Fig. 1). Each qubit has a capacitively-coupled drive line for individual qubit control [23], a readout resonator coupled to a common feedline for frequency-division multiplexing readout [24,25], and a flux-bias line for individual frequency tuning (Fig. 1). While Q T and Q B were designed to be identical, fabrication uncertainties resulted in a sweet-spot (maximum) frequency of Q T 57 MHz higher than that of Q B . With Q B and Q M kept at their respective sweet-spots (6.220 GHz and 6.550 GHz, respectively), Q T was then flux tuned to match Q B with an accuracy of 50 kHz, determined using Ramsey measurements. The coherence times at the operating frequency can be found in the Supplemental Material [26]. Because of the transmon's weak anharmonicity [27], high-fidelity fast single-qubit control is achieved using the method of derivative-removal-via-adiabatic-gate (DRAG) pulsing, where the in-phase Gaussian pulse is combined with an in-quadrature derivative-of-Gaussian pulse [28,29]. For each qubit, this requires independent amplitude control of the two constituent quadrature pulses.
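As an illustration of the two constituent DRAG quadratures, the following minimal Python sketch samples a Gaussian in-phase envelope and its time derivative for the quadrature channel. The pulse width `sigma_ns`, the scaling `drag_coeff`, and the sample count are hypothetical values chosen for the example; in the experiment the relative amplitude and phase of the two inputs are set per output by the VSM rather than in software.

```python
import math

def drag_envelopes(duration_ns=16.0, sigma_ns=4.0, amplitude=1.0,
                   drag_coeff=0.5, n_samples=17):
    """Sample a Gaussian envelope and its scaled derivative.

    sigma_ns, drag_coeff and n_samples are illustrative assumptions,
    not values taken from the experiment.
    """
    t0 = duration_ns / 2.0  # pulse centre
    times = [i * duration_ns / (n_samples - 1) for i in range(n_samples)]
    # In-phase quadrature: Gaussian centred on the pulse window.
    in_phase = [amplitude * math.exp(-((t - t0) ** 2) / (2.0 * sigma_ns ** 2))
                for t in times]
    # In-quadrature: derivative of the Gaussian, d/dt G = -(t - t0)/sigma^2 * G.
    quadrature = [drag_coeff * (-(t - t0) / sigma_ns ** 2) * g
                  for t, g in zip(times, in_phase)]
    return times, in_phase, quadrature

times, in_phase, quadrature = drag_envelopes()
```

The in-phase envelope is symmetric about the pulse centre and the quadrature envelope is antisymmetric, which is why the two can be supplied on separate VSM inputs and scaled independently for each qubit.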
The VSM was designed to accept multiple input pulses and selectively fan them out to multiple qubits with individual pulse tuning for each qubit (Fig. 1). Our home-built room-temperature 4 × 2 (four input, two output) VSM allows independent control of amplitude and phase for each of its input-output combinations. Fast marker-controlled digital switches enable routing of pulses to the qubits at nanosecond timescale, with approximately 50 dB isolation in the frequency range from 4 to 8 GHz (see [26] for VSM specifications). By directing the two constituent pulses of DRAG control through separate inputs, the VSM allows independent, in situ DRAG tuning for both same-frequency qubits using four AWG channels [26].
The first critical test of our control architecture is to assess the VSM's ability to implement high-precision control of one qubit while leaving the other qubit idle. To do this, we use the standard technique of single-qubit RB based on Clifford gates [30][31][32], which allows us to characterize control performance independently of state preparation and measurement errors. After initializing all qubits in the ground state by relaxation, we use the VSM to selectively apply random sequences of Clifford gates to only one of the same-frequency qubits, with lengths m ranging from 1 to 800 gates and the results averaged over 50 different random sequences. We decompose each gate into the standard minimal sequence of π and ±π/2 pulses around the x and y axes (16 ns pulses separated by a 4 ns buffer; tp = 20 ns) [22], requiring on average Np = 1.875 pulses per Clifford. This is in contrast to atomic pulses, where the 24 single-qubit Cliffords can each be implemented with a single pulse [33]. After applying a final Clifford that inverts the cumulative effect of all m previous Cliffords, the driven qubit is ideally returned to the ground state, but as a result of imperfections such as gate errors and decoherence, the final ground-state population decays as a function of m. The decay rate can be related to the average fidelity per Clifford FC [30,31]. In a strictly two-level system, the measured ground- and excited-state populations averaged over many sequences (P0 and P1) both converge to 0.5 for large m. For weakly anharmonic transmon qubits, leakage to the second excited state can be an important additional source of gate error, which can lead to a shift of the asymptotic value away from 0.5. We address this issue by performing the RB protocol both with and without an additional final π pulse [34], which allows us to explicitly estimate the populations of the first three transmon states (see [26] for details).
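The link between the RB decay and the Clifford fidelity can be sketched as follows. This minimal Python example assumes the standard single-exponential model P0(m) = A p^m + B and the usual conversion F = 1 - (1 - p)(d - 1)/d with d = 2. Fixing the offset at B = 0.5 is a simplification that ignores leakage (the fits in the text leave the offset free), and the decay parameter used below is an illustrative number, not measured data.

```python
import math

def rb_ground_population(m, p, amplitude=0.5, offset=0.5):
    """Ideal RB decay of the ground-state population after m Cliffords."""
    return amplitude * p ** m + offset

def clifford_fidelity(p, dim=2):
    """Average fidelity per Clifford from the depolarizing parameter p."""
    return 1.0 - (1.0 - p) * (dim - 1) / dim

def estimate_decay(m_values, populations, offset=0.5):
    """Least-squares slope of ln(P0 - offset) versus m, returned as p.

    Assumes a known offset; a real analysis would fit the offset too.
    """
    ys = [math.log(pop - offset) for pop in populations]
    n = len(m_values)
    mean_m = sum(m_values) / n
    mean_y = sum(ys) / n
    num = sum((m - mean_m) * (y - mean_y) for m, y in zip(m_values, ys))
    den = sum((m - mean_m) ** 2 for m in m_values)
    return math.exp(num / den)

# Synthetic, noise-free example with a hypothetical decay parameter.
p_true = 0.9964
lengths = [1, 50, 100, 200, 400]
data = [rb_ground_population(m, p_true) for m in lengths]
p_fit = estimate_decay(lengths, data)
fidelity = clifford_fidelity(p_fit)
```

With noisy data and leakage, a nonlinear fit with a free offset would be the more appropriate choice; the log-linear fit here only keeps the example dependency-free.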
FIG. 2. Single-qubit control of same-frequency qubits using the VSM. (a,b) Schematics showing DRAG pulses routed exclusively to either QT or QB using the corresponding markers (always on or off). Details of the tune-up procedure for optimizing pulse amplitude and phase parameters for each qubit are provided in Ref. 26. (c,d) Characterization of single-qubit control by randomized benchmarking (RB) of Clifford gates. Average populations of QT and QB in the ground, first- and second-excited states (P0, P1 and P2, respectively) as a function of the number of Clifford gates applied. Curves are best fits of single exponentials with offsets. Single-qubit Clifford fidelities for each qubit are extracted from the decay of ground-state populations. For both qubits, the fidelity surpasses the surface-code fault-tolerance threshold and lies close to the theoretical value expected for T1-only relaxation-limited performance. (e,f) Expanded plots of second-excited-state leakage during RB. Curves are best fits according to Eq. (2). (g,h) Cross-excitation of the undriven qubit resulting from control pulses applied to the driven qubit. Further measurements indicate that this effect can be attributed to microwave leakage.

We implement the above characterization for one qubit at a time by switching off the marker for the undriven qubit at the VSM [Fig. 2(a,b)], in each case measuring the effect on both qubits simultaneously via multiplexed readout. From the results in Fig. 2(c,d), we calculate the average Clifford fidelities for the two individually driven qubits to be 0.9982(2) (QT) and 0.9986(2) (QB), in both cases surpassing the best known surface-code fault-tolerance threshold for single-qubit gates of ∼0.99 [35-37]. We compare these values with the expected average Clifford fidelities assuming only T1 decay [38]. The comparison shows that our results are predominantly limited by relaxation effects, the difference in performance being consistent with the different T1 times for the two qubits.
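To give a feel for a T1-only fidelity bound, the sketch below evaluates the average gate fidelity of a pure amplitude-damping channel acting for one Clifford duration, F = (3 + 2 e^{-t/2T1} + e^{-t/T1})/6. Whether this is exactly the expression used for the comparison in the text is an assumption, and the T1 value below is hypothetical (the measured coherence times are in the Supplemental Material); only Np = 1.875 and tp = 20 ns are taken from the text.

```python
import math

N_P = 1.875      # average pulses per Clifford (from the text)
T_P_NS = 20.0    # pulse slot: 16 ns pulse plus 4 ns buffer (from the text)
t_clifford_ns = N_P * T_P_NS

def t1_limited_fidelity(t_gate_ns, t1_ns):
    """Average fidelity of an amplitude-damping channel of duration t_gate.

    Follows from F = (d + sum_i |Tr K_i|^2) / (d (d + 1)) with d = 2 and
    Kraus operators K0 = diag(1, sqrt(1 - gamma)), K1 = sqrt(gamma)|0><1|.
    """
    survival = math.exp(-t_gate_ns / t1_ns)  # 1 - gamma
    return (3.0 + 2.0 * math.sqrt(survival) + survival) / 6.0

HYPOTHETICAL_T1_NS = 20_000.0  # assumed for illustration only
bound = t1_limited_fidelity(t_clifford_ns, HYPOTHETICAL_T1_NS)
```

For t much shorter than T1 this reduces to F ≈ 1 - t/(3 T1), which makes clear why a qubit with a longer T1 shows a slightly higher fidelity for the same pulse sequence.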
Additional measurements in the Supplemental Material furthermore show that there is no difference in performance when both qubits are driven simultaneously by the same pulse sequence (both markers on). From the measured leakage populations P2, we also extract estimated per-Clifford leakage rates κ of 4.1(2) × 10^-6 (QT) and 1.3(4) × 10^-6 (QB) by fitting a simple leakage model, Eq. (2), to the data (see [26] for details); in this model, T2→1 is the second- to first-excited-state relaxation time. As these leakage rates are much smaller than the gate errors (1 − FC), it is reasonable to neglect them when estimating the Clifford fidelity.

We next explore the effect of the single-qubit control pulses on the undriven qubit [Fig. 2(g,h)], which should ideally remain in the ground state. We fabricated the qubits as close to each other as possible in order to study the worst-case scenario for such cross-excitation effects. While QT remains largely unaffected when driving QB, a substantial deviation from the ground state is measured in QB when driving QT. There are two possible mechanisms for cross-excitation effects in our system: cross-coupling (higher-order quantum coupling mediated by the bus resonators and QM) and cross-driving (spurious driving of the idle channel by microwave leakage resulting from imperfect isolation either in the VSM or on chip). From single-excitation swap experiments (see Ref. 26), we observe a residual exchange interaction [12] between QT and QB with strength J/2π ≤ 36 ± 1 kHz. Because this swapping of excitation is symmetric, it is unlikely to explain the strong asymmetry in the amount of cross-excitation measured for the different qubits. Furthermore, in RB experiments, where the state of the driven qubit is moved randomly around the Bloch sphere during each run of the experiment, we expect the effect of cross-coupling to be dramatically reduced, because the period π/J is far longer than the average Clifford gate time.
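The separation of timescales invoked here can be checked with a short calculation. The sketch below uses only numbers quoted in the text (J/2π = 36 kHz as the upper bound, Np = 1.875, tp = 20 ns); the function names are ours.

```python
def swap_period_s(j_over_2pi_hz):
    """Period pi/J of the exchange interaction, with J = 2*pi*(J/2pi)."""
    return 1.0 / (2.0 * j_over_2pi_hz)

def clifford_time_s(pulses_per_clifford=1.875, pulse_slot_s=20e-9):
    """Average Clifford duration from the pulse count and pulse slot."""
    return pulses_per_clifford * pulse_slot_s

period = swap_period_s(36e3)       # about 13.9 microseconds
t_clifford = clifford_time_s()     # 37.5 ns
ratio = period / t_clifford        # a few hundred Cliffords per pi/J
```

The period π/J spans several hundred average Clifford gates, so the randomly changing state of the driven qubit averages the exchange interaction over many gates.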
Direct, independent measurements of microwave isolation in both the VSM and on chip indicate that the on-chip cross-driving dominates [26]. This was also confirmed in situ by performing the same RB measurements with the undriven qubit physically disconnected from the VSM, which showed no significant reduction in cross-excitation. Furthermore, numerical simulations show that the observed effects are consistent with cross-driving alone at levels similar to the measured values [26]. We note, however, that while the plots in Fig. 2(g,h) are useful diagnostics for identifying the presence of the cross-driving effect, they should not be interpreted in the same way as the RB plots for the driven qubit. A more comparable way to study this effect is to use interleaved RB [39], where pulses on the driven qubit (ideally identity operations on the undriven qubit) are interleaved with a random sequence of Cliffords applied to the undriven qubit [26]. From this, we estimate the average idling fidelity for QB to be 0.9986(5), which is consistent with the error due to T1 decay during idling. This confirms that cross-excitation effects do not dominate the error per Clifford, as characterized by RB.
FIG. 3. Selective-broadcasting schemes (Sequential, Compiled, 5-primitives), illustrated for Cliffords C2 and C13. In the sequential scheme, the pulses implementing C2 are directed to QT, after which the pulses implementing C13 are directed to QB. In the compiled scheme, the two Cliffords are realized concurrently using a pre-determined pulse sequence, with appropriate markers, which minimizes the total number of pulses, Np (see [26] for the compilation algorithm). Finally, in the 5-primitives scheme, a fixed sequence of five pulses is repeated in each round (Np = 5). The targeted Cliffords are then applied simultaneously by selecting the appropriate subset of pulses for each qubit (see [26] for the 5-primitives marker table). Top-right: Scaling of the average pulses per multiqubit combination of Cliffords, Np, versus qubit number n. The constant scaling achieved by the 5-primitives scheme provides a dramatic improvement over the linear scaling of the sequential scheme. While Np is always lowest for the compiled scheme, pre-compiling the optimal pulse and marker combinations is impractical for n ≳ 5, and the improvement over the simpler 5-primitives scheme is negligible by n ∼ 10.

FIG. 4. [...] (e) Average single-qubit Clifford-gate fidelities for QT and QB in each scheme, extracted from the decay of the corresponding ground-state population. All fidelities surpass the surface-code fault-tolerance threshold and closely track those expected for T1-relaxation-limited performance. Compiled selective broadcasting performs best, as expected, having the lowest total number of pulses (see Fig. 3).

The defining test of extensibility in our control architecture is to demonstrate the simultaneous, independent, single-qubit control over same-frequency qubits that is enabled by selective broadcasting using the VSM. We explore three paradigmatic schemes for implementing selective broadcasting of Cliffords on an arbitrary number of qubits n (Fig. 3). In the most straightforward selective-broadcasting scheme, the individual qubits are driven sequentially, with each pulse being directed to one qubit at a time. This results in a linear scaling of the average number of pulses per Clifford round (Np = 1.875 × n). By contrast, the second paradigm takes best advantage of the VSM's capability to broadcast simultaneously to multiple qubits by compiling the constituent Clifford pulses to minimize Np for each Clifford combination in the sequence. The compilation is performed by searching all possible combinations of single-qubit Clifford decompositions and finding the one that minimizes Np (see [26] for further information). In Fig. 3, we show exact values and estimates for Np for the compiled scheme up to n = 10. Unfortunately, the compilation run-time increases exponentially with the number of qubits. This motivates our final broadcasting paradigm, where all Clifford gates can be implemented using the same fixed, ordered sequence of five pulse primitives (Fig. 3). Independent Cliffords can be applied to all qubits, irrespective of n, by selectively directing the appropriate subset of pulses to each qubit, achieving a constant overhead in time for single-qubit Clifford control. While the compiled scheme by definition always provides the minimum sequence length, our estimates of Np for the compiled scheme suggest that it asymptotes to the same value of 5 achieved by the simple, prescriptive 5-primitives scheme. We note that the specific choice of pulse primitives is not unique, but at least five primitives are required (four pulses allow a maximum of 16 unique gate decompositions, compared with the 24 single-qubit Cliffords).
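The counting argument for five primitives can be checked by brute force. The following Python sketch represents each pulse by its Bloch-sphere rotation matrix (so global phases drop out automatically) and enumerates every subset of a fixed, ordered five-pulse sequence. The particular sequence used here (Xπ/2, Yπ/2, Xπ/2, Xπ, Yπ) is an assumption made for illustration and is not necessarily the primitive set or marker table used in the experiment.

```python
from itertools import product

def matmul(a, b):
    """Product of two 3x3 integer matrices stored as nested tuples."""
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3))
                       for j in range(3)) for i in range(3))

IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
X90 = ((1, 0, 0), (0, 0, -1), (0, 1, 0))   # pi/2 rotation about x
Y90 = ((0, 0, 1), (0, 1, 0), (-1, 0, 0))   # pi/2 rotation about y
X180 = matmul(X90, X90)
Y180 = matmul(Y90, Y90)

def clifford_group():
    """All 24 single-qubit Cliffords (as rotations), generated by X90, Y90."""
    group, frontier = {IDENTITY}, [IDENTITY]
    while frontier:
        g = frontier.pop()
        for gen in (X90, Y90):
            h = matmul(gen, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

def reachable_gates(primitives):
    """Gates obtained by gating each pulse of a fixed sequence on or off."""
    gates = set()
    for mask in product((0, 1), repeat=len(primitives)):
        total = IDENTITY
        for on, pulse in zip(mask, primitives):
            if on:
                total = matmul(pulse, total)  # later pulses act on the left
        gates.add(total)
    return gates

def sequential_pulses(n_qubits):
    """Average pulses per Clifford round in the sequential scheme."""
    return 1.875 * n_qubits

# Assumed illustrative primitive sequence (time-ordered).
FIVE_PRIMITIVES = [X90, Y90, X90, X180, Y180]
covered = reachable_gates(FIVE_PRIMITIVES)
```

Because each qubit only needs its own on/off marker pattern over the same five pulses, the pulse count per round stays at five for any n, whereas the sequential count 1.875 × n exceeds five from n = 3 onward.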
To demonstrate the full functionality of this control architecture, we implement all three selective-broadcasting schemes and measure their performance using parallel single-qubit RB with independent Clifford sequences for each qubit. Figure 4 shows that the compiled scheme performs best, followed by the sequential and then 5-primitives schemes, consistent with the average number of pulses required for each (Fig. 3). In all cases, the average fidelity per Clifford still surpasses the surface-code fault-tolerance threshold, and the average error is again dominated by relaxation. The results are fully consistent with the values obtained in the test for isolated single-qubit control, indicating no substantial decrease in gate performance using selective-broadcasting schemes.
Our VSM allows efficient reuse of qubit control equipment on same-frequency qubits. It enables high-precision single-qubit control of multiple qubits with a performance that is mainly limited by relaxation. We have demonstrated three selective-broadcasting schemes, all of which achieve a performance that surpasses the fault-tolerance threshold for the surface code. In particular, the 5-primitives scheme implements arbitrary Clifford control with a fixed five-pulse sequence, where the target Clifford is selected by routing a subset of the pulses using digital markers. By adding a sixth, non-Clifford gate to the five pulse primitives, this can be extended to achieve universal single-qubit control. Combining the connectivity of our device, the VSM-based control, and the fixed pulse overhead of the 5-primitives broadcasting strategy, our experiment realizes the simplest element of an extensible qubit control architecture. While we do not yet see explicit savings in control hardware for two qubits, this design can be expanded to more same-frequency qubits without any further increase in microwave sources or arbitrary waveform generators. This experiment suggests that surface-code tiling with frequency reuse is a viable path towards large-scale quantum processors.

SUPPLEMENTAL MATERIAL
This supplement provides experimental details and additional data supporting the claims in the main text. The device and experimental setup, including images of the device and a full wiring diagram, are described in Section I. The microwave performance of the vector switch matrix (VSM) and its use for independent control of two same-frequency qubits are demonstrated in Sec. II A. The techniques employed for tuning qubit pulses are discussed in Sec. III. Section IV contains the results of randomized benchmarking (RB) experiments in the two-qubit global broadcasting context. Our technique for assessing the effects of leakage is detailed in Sec. V. Cross-coupling and cross-driving effects are characterized in Sec. VI, along with numerical simulations of the effects of cross-excitations on RB. In Section VI D, we introduce a method for generating selective-broadcasting pulse sequences that are robust to cross-excitation effects. The decompositions of the 24 single-qubit Cliffords into a minimal set of pulses and the 5-primitive pulses are given in Sec. VII. Finally, the algorithm used to compile optimal pulse sequences for implementing independent single-qubit Clifford gates on multiple qubits is explained in Sec. VIII.

A. Chip design and fabrication
Our quantum chip consists of three transmons (top: Q T , middle (ancilla): Q M , and bottom: Q B ) with dedicated voltage drive lines (D T , D M and D B , respectively), flux-bias lines, and readout resonators. All readout resonators are capacitively coupled to one common feedline which crosses various on-chip components using airbridge crossovers. Qubits Q T and Q M are coupled by one bus resonator, and Q M and Q B by another (fundamental frequencies 4.9 and 5.0 GHz, respectively).
All resonators are open-ended on the coupling side, and short-circuited at the other.
The chip fabrication method is similar to that in Ref. 1, but with some important differences which we now explain. Rather than sapphire, we use a high-resistivity intrinsic silicon substrate prepared by HF dip and HMDS surface passivation before sputtering a 300-nm-thick film of NbTiN, as introduced in Ref. 2. This change aims to improve the substrate-metal interface and thereby intrinsic quality factors for both resonators and qubits. After sputtering, the patterns are etched into the superconducting layer using reactive-ion etching with an SF 6 /O 2 plasma. In contrast with the Al transmon capacitor plates commonly used, in this experiment we make them also from NbTiN, with an aim to improve the substrate-metal interface and avoid large AlO x surfaces which may house unwanted two-level systems. Only the Josephson junctions are made by the standard technique of Al-AlO x -Al double-angle evaporation. A further HF dip just prior to evaporation also helps to contact the junctions directly to the NbTiN capacitor plates. In Ref. 1, air bridges were already used to cross the feedline over flux-bias lines on chip. Here, we extend this technique to allow the feedline to cross three readout resonators.
A key requirement for this experiment was the ability to match qubit frequencies without sacrificing coherence. Flux-bias lines allow easy compensation for mismatch, but at the cost of reduced coherence in the qubit detuned from its maximal frequency (coherence sweet spot [3]). We aimed for identical maximum frequencies of QT and QB, as determined by capacitor and junction geometries. The capacitors were easily matched in fabrication. We then selected the chip with the closest matching room-temperature resistance values for the relevant qubit junctions.

B. Experimental Setup
The chip is anchored to a copper cold-finger connected to the mixing chamber of a Leiden Cryogenics CS81 3 He/ 4 He dilution refrigerator with 7 mK base temperature. A copper can seals the sample space, with an inner surface that is coated with a mixture of Stycast 2850 and silicon carbide granules (15 to 1000 nm diameter) used for infrared absorption [4]. The copper can is in turn magnetically shielded by an aluminum enclosure and two outer Cryophy enclosures (1 mm thick) [2].
A complete wiring schematic showing all cryogenic and room-temperature components is shown in Fig. S2. The four analog channels of the Tektronix AWG5014C create the in-phase and in-quadrature pulses for QT and QB by single-sideband modulation of a common carrier. Because single-sideband modulation requires two AWG channels to modulate an IQ mixer, independent derivative-removal-via-adiabatic-gate (DRAG) tuning with the VSM requires four AWG channels, irrespective of the number of qubits. These pulses are input at ports 1 and 2 of the VSM, which combines them with individually tuned insertion loss and phase to each of two outputs (labelled T and B). The VSM can be scaled up to many output channels, and direct hardware savings can be realized as soon as three or more same-frequency qubits are driven by a single set of AWG inputs.

[...] bottom) times for the three qubits at the bias point. When measuring QT or QB, the other qubit is detuned by −50 MHz to suppress cross-coupling effects. Ramsey fringes for QT (middle, left panel) fit better to a Gaussian (shown) than an exponential decay, reflecting the susceptibility of QT to low-frequency flux noise away from its sweet-spot. P1 denotes excited-state population.

A. Measured isolation
To characterize the isolation of the VSM in the range 4 to 8 GHz, we have measured the insertion loss between all input (1 and 2) and output ports (T and B) with static settings at the two gate inputs (Fig. S4). Ideally, each gate activates (on state) and deactivates (off state) the link of both inputs 1 and 2 to one output, independent of the other gate. As shown in Fig. S4, the typical relative isolation with the relevant gate in the off state is ∼50 dB.

FIG. S4. Insertion loss between inputs and outputs of the VSM for four static combinations of the gate inputs. The insertion loss is measured relative to the level with both gate inputs activated (onTonB). The black curves indicate the noise background in our scalar network analyzer measurement. The dashed vertical lines indicate the common frequency (6.220 GHz) of QT and QB at the bias point.
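To put the ∼50 dB figure in perspective, the sketch below converts an isolation in dB to a drive-amplitude ratio and estimates the excitation a single leaked π pulse would produce on an idle, resonant qubit. It assumes that the rotation angle scales linearly with drive amplitude and that both outputs are otherwise tuned identically; this is a rough estimate for one pulse, not a model of the accumulated effect over an RB sequence.

```python
import math

def amplitude_ratio(isolation_db):
    """Voltage (drive-amplitude) ratio for a given power isolation in dB."""
    return 10.0 ** (-isolation_db / 20.0)

def leaked_excitation(isolation_db, target_angle=math.pi):
    """Excited-state population from one leaked pulse on an idle qubit.

    Assumes rotation angle proportional to drive amplitude (illustrative).
    """
    theta = target_angle * amplitude_ratio(isolation_db)
    return math.sin(theta / 2.0) ** 2

ratio_50db = amplitude_ratio(50.0)
p1_leak = leaked_excitation(50.0)
```

Under these assumptions, 50 dB of isolation corresponds to an amplitude ratio of a few parts in a thousand and a per-pulse excitation of order 10^-5.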

B. Individual qubit tune-up
The VSM enables independent control of the on/off state, insertion loss and phase for every input-output combination. We exploit this feature to perform DRAG-compensated pulses on QT and QB that are individually tailored for each qubit. Different types of gate errors, such as non-ideal in-phase and in-quadrature amplitudes, can be distinguished using an AllXY sequence [5], consisting of 21 combinations of two pulses drawn from the set {I, Xπ, Yπ, Xπ/2, Yπ/2} (Table S2). Figure S5 shows AllXY sequence results for QT and QB as the amplitude of each quadrature on QT is varied independently. While the AllXY signature of QT reveals changing levels of amplitude and phase errors, there is no noticeable change in the AllXY signature of QB. This demonstrates the use of the VSM for individual tune-up of pulses for same-frequency qubits.

TABLE S2. The 21 two-pulse combinations comprising the AllXY pulse sequence [5]. Columns: sequence ID, ideal P1, pulse pair.
ID    P1    Pulses
...   0.5   Yπ/2 Xπ/2
10    0.5   Xπ/2 Yπ
11    0.5   Yπ/2 Xπ
...
20    1     Xπ/2 Xπ/2
21    1     Yπ/2 Yπ/2

III. PULSE-CALIBRATION ROUTINES
We tune up qubit pulses by alternating the calibration of in-phase and in-quadrature pulse amplitudes until a simultaneous optimum is found. The two calibration routines are discussed below.

A. Accurate in-phase pulse amplitude calibration
FIG. S6. Fine calibration of pulse amplitude by initial π/2 pulse, followed by 2N repeated π pulses. The initial slope determines if the qubit is over-driven (positive slope) or under-driven (negative slope).

The in-phase quadrature amplitude is calibrated by first applying a π/2 pulse to the qubit, followed by a train of π pulses. The pulse sequence is (Xπ)^{2N} Xπ/2 |g⟩, where |g⟩ is the qubit ground state and N ∈ [0, 49]. In the absence of gate errors and decoherence, the driven qubit would end on the equator of its Bloch sphere for all N. However, over- or under-driving produces a positive or negative initial slope on P1 versus N, respectively (Fig. S6). We choose the in-phase amplitude that minimizes the absolute slope.
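The sign of the initial slope can be reproduced with a simple model. The sketch below assumes that a fractional amplitude error ε scales every rotation angle by (1 + ε), that all rotations are about the same axis, and that decoherence is absent; `amp_error` is a hypothetical parameter introduced for the example.

```python
import math

def excited_population(n_pairs, amp_error):
    """P1 after an X(pi/2) pulse followed by 2N X(pi) pulses.

    All rotations share one axis, so the angles simply add; amp_error is
    the assumed fractional over-driving (positive) or under-driving (negative).
    """
    theta = (1.0 + amp_error) * (math.pi / 2.0 + 2.0 * n_pairs * math.pi)
    return math.sin(theta / 2.0) ** 2

def initial_slope(amp_error):
    """Change in P1 between N = 0 and N = 1."""
    return excited_population(1, amp_error) - excited_population(0, amp_error)

slope_over = initial_slope(+0.01)   # over-driven by 1 %
slope_under = initial_slope(-0.01)  # under-driven by 1 %
```

In this model P1 = [1 + sin(ε(π/2 + 2Nπ))]/2, so the error accumulates linearly in N for small ε, which is what makes the long pulse train a sensitive amplitude probe.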

B. DRAG-parameter calibration
To minimize phase errors resulting from the presence of the second and higher excited states, we optimize the scaling of the in-quadrature pulse. As in conventional DRAG [6,7], we choose as the envelope of the in-quadrature pulse the derivative of the Gaussian envelope on the in-phase pulse. The DRAG scaling parameter is calibrated using the method detailed in Ref. [5]. Specifically, we measure the difference in excited-state population produced by the Yπ Xπ/2 and Xπ Yπ/2 pulse combinations (AllXY ID 10 and 11). Ideally, for both, the final qubit state would lie on the equator. However, any phase error shifts the final excited-state population in opposite directions in these cases. We choose the DRAG scaling parameter minimizing this shift.

IV. GLOBAL BROADCASTING
Aside from single-qubit control and selective broadcasting, the VSM also allows global broadcasting of pulses to all qubits simultaneously by keeping the markers for both qubits on (Fig. S7). While this does provide simultaneous control of QT and QB, marker control is needed to achieve independent control. Using RB, we measure the performance of both qubits when broadcasting pulses to both qubits, and compare the results with those obtained from single-qubit control (Fig. S7). The global broadcasting RB measurements were alternated with the single-qubit RB measurements, and aside from marker settings all other settings were identical. Comparison of the results in Fig. S7 shows that the qubit gate performance does not depend on whether a single qubit is controlled, or both are controlled simultaneously through global broadcasting.

FIG. S7. Global broadcasting of DRAG pulses to same-frequency qubits. (a) Illustration of global broadcasting. Two simultaneous pulses, one with Gaussian envelope at input 1, and another with derivative-of-Gaussian envelope at input 2, are simultaneously directed to QT and QB (both markers always on). The insertion loss and phase shift of each pulse is separately optimized for each output to produce precision DRAG pulses for each qubit. (b,c) Comparison of single-qubit driving versus driving both qubits (broadcasting) by RB of Clifford gates composed from π/2 and π pulses [8]. Average population of QT and QB in the ground, first- and second-excited state (P0, P1 and P2, respectively) as a function of the number of Clifford gates applied. Curves are the best fits of single exponentials with offsets to the populations. The single-qubit Clifford-gate fidelity for each qubit is extracted from the decay of the corresponding ground-state population when using global broadcasting.

V. LEAKAGE TO SECOND EXCITED STATE
Leakage is fundamentally different from unitary qubit errors. To quantify leakage, we monitor the populations P i of the three lowest energy states (i ∈ {0, 1, 2}) during RB and calculate the average values P i over all seeds. To do this, we calibrate the average signal levels V i for the transmons in level i, and perform each RB measurement twice, the second time with an added final π pulse on the 0-1 transition. This final π pulse swaps P 0 and P 1 , leaving P 2 unaffected. Under the assumption that higher levels are unpopulated (P 0 + P 1 + P 2 = 1), the measured signals satisfy

\[
\begin{pmatrix} S \\ S' \\ 1 \end{pmatrix} =
\begin{pmatrix} V_0 & V_1 & V_2 \\ V_1 & V_0 & V_2 \\ 1 & 1 & 1 \end{pmatrix}
\begin{pmatrix} P_0 \\ P_1 \\ P_2 \end{pmatrix},
\]

where S (S′) is the measured signal level without (with) final π pulse. The populations are extracted by matrix inversion.
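The matrix inversion described above can be sketched numerically. The snippet below is a minimal illustration, assuming that each measured signal is the population-weighted sum of the calibrated levels V i and that the final π pulse exactly exchanges P 0 and P 1. The function name and the example numbers are hypothetical.

```python
import numpy as np

def extract_populations(S, S_pi, V):
    """Solve for (P0, P1, P2) from the signals without (S) and with (S_pi)
    a final pi pulse, given calibrated levels V = (V0, V1, V2).
    Assumes P0 + P1 + P2 = 1 (higher levels unpopulated)."""
    V0, V1, V2 = V
    M = np.array([[V0, V1, V2],    # signal without the final pi pulse
                  [V1, V0, V2],    # final pi pulse swaps P0 and P1
                  [1.0, 1.0, 1.0]])  # normalization
    return np.linalg.solve(M, np.array([S, S_pi, 1.0]))

# Round trip with made-up calibration levels and populations
V = (0.0, 1.0, 1.6)
P_true = np.array([0.70, 0.25, 0.05])
S = V[0] * P_true[0] + V[1] * P_true[1] + V[2] * P_true[2]
S_pi = V[1] * P_true[0] + V[0] * P_true[1] + V[2] * P_true[2]
print(extract_populations(S, S_pi, V))
```

The system is solvable only when V 0 ≠ V 1 and V 2 differs from the mean of V 0 and V 1, since the determinant of the matrix is (V 0 − V 1)(V 0 + V 1 − 2V 2).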
Measuring P 2 as a function of the number of Clifford gates allows us to estimate an average leakage per Clifford, κ. Because the populations are ensemble averages over different random seeds, we assume that leakage of the average qubit-space populations to P 2 is incoherent, and, provided P 2 remains small (κ small), we also assume that leakage is irreversible. We therefore model leakage using a difference equation for P 2 in which T 2→1 is the second- to first-excited-state relaxation time. Assuming no initial population in the second-excited state, the solution is Eq. (2), which shows good agreement with measured data. We extract κ by fitting Eq. (2) to P 2 data. T 2→1 is obtained from the best-fit decay constant (not directly measured) and κ from the best-fit prefactor.
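As a rough illustration of this kind of leakage model, the sketch below iterates an assumed recurrence in which each Clifford adds κ(1 − P 2) to the second-excited-state population and relaxation removes a fraction τ/T 2→1 of it. The exact form, the per-Clifford duration τ and the example values are assumptions made for illustration, not necessarily the equation used for the fits.

```python
def leakage_recurrence(kappa, tau, t21, n_cliffords):
    """Iterate p[m+1] = p[m] + kappa*(1 - p[m]) - (tau/t21)*p[m], with p[0] = 0."""
    p2 = [0.0]
    for _ in range(n_cliffords):
        p = p2[-1]
        p2.append(p + kappa * (1.0 - p) - (tau / t21) * p)
    return p2

def leakage_closed_form(kappa, tau, t21, m):
    """Closed-form solution of the recurrence above: saturating growth in m."""
    rate = kappa + tau / t21          # total per-Clifford rate toward steady state
    return (kappa / rate) * (1.0 - (1.0 - rate) ** m)

# Hypothetical numbers: kappa = 1e-4 per Clifford, tau/T21 = 4e-3
trace = leakage_recurrence(1e-4, 40e-9, 10e-6, 800)
print(trace[800], leakage_closed_form(1e-4, 40e-9, 10e-6, 800))
```

In this assumed model the saturation time constant is set mainly by τ/T 2→1 and the prefactor by κ, which matches the way the two fit parameters are described in the text.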

VI. CROSS-COUPLING AND CROSS-DRIVING EFFECTS
There are several sources of spurious cross-qubit interactions [see Fig. 2(g,h) of the main text] which play a role in our experiments. We divide these into two main classes: cross-coupling, where the qubits themselves are coupled via a direct or indirect quantum interaction, and cross-driving, where input microwaves directly drive the untargeted qubit with a reduced amplitude as a result of imperfect isolation either on or off chip. These cross-excitation effects depend strongly on specific chip design, in our case chosen according to our primary aim to study the potential of frequency reuse in a circuit fully compatible with a larger surface-code lattice. This governed both how qubits and resonators were connected and the selection and arrangement of frequencies. In addition, in order to fully explore the limitations of such techniques, the qubits were positioned on the chip as close to each other as possible to provide a worst-case scenario for cross-excitation effects.

A. Cross-coupling
In our device, the same-frequency qubits Q T and Q B are connected through a linear chain of coupled quantum elements: the two bus resonators and intermediate ancilla qubit Q M . They are also coupled capacitively through the ground plane. When the intermediate elements are detuned in frequency away from the near-resonant qubits, such geometries typically give rise to higher-order exchange-type interactions between the qubits of the form $J(\sigma_T^+ \sigma_B^- + \sigma_T^- \sigma_B^+)$ [9,10]. Consistent with this picture, we observe coherent swapping of excitations between Q T and Q B at a rate strongly dependent on the qubit detuning (Fig. S8).
In order to characterise the cross-coupling between Q T and Q B , we measure the evolution of excited-state populations after a single excitation is injected at one of the qubits with a π pulse. To place a tight upper bound on the interaction strength J, the qubit frequencies must be matched as closely as possible. We achieve an accuracy of around 50 kHz using Ramsey experiments, limited by a combination of factors: the resolution of the flux tuning, the fitting resolution limit imposed by qubit T 2 dephasing times, and also the frequency shifting induced by the qubit-qubit exchange interaction itself. As shown in Fig. S8, the total excitation number exhibits a typical exponential T 1 decay, while the individual populations show a symmetric oscillation of the excitation between the two qubits. The measured common frequency of decaying oscillations for both qubits sets an upper bound on the coupling strength of J/2π ≤ 36 ± 1 kHz.
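The dependence of the excitation swap on detuning can be illustrated with a small numerical sketch. It assumes ħ = 1, restricts the exchange Hamiltonian to the single-excitation subspace, ignores T 1 decay, and treats J and the detuning as angular frequencies. These conventions and the function name are assumptions for illustration.

```python
import numpy as np

def swap_population(J, delta, t):
    """Population transferred to Q_B at time t after exciting Q_T,
    for H = [[delta/2, J], [J, -delta/2]] in the basis {|T excited>, |B excited>}."""
    H = np.array([[delta / 2.0, J], [J, -delta / 2.0]])
    evals, evecs = np.linalg.eigh(H)
    coeffs = evecs.conj().T @ np.array([1.0, 0.0])   # excitation starts on Q_T
    psi_t = evecs @ (np.exp(-1j * evals * t) * coeffs)
    return abs(psi_t[1]) ** 2

J = 2 * np.pi * 36e3                 # bound quoted above, read in this convention
t_swap = np.pi / (2 * J)             # full swap on resonance, about 6.9 microseconds
print(t_swap, swap_population(J, 0.0, t_swap))
```

On resonance the transfer is complete, while for detuning δ the maximum transfer drops to J²/(J² + δ²/4), which is why the frequencies must be matched closely before an oscillation frequency can be read as a bound on J.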

B. Cross-driving
When driving one qubit, the signal applied to its dedicated voltage line residually drives the distant qubits on the chip. This results from microwave leakage in the VSM, near the sample, or from direct on-chip coupling of the drive line to the distant qubits. In Section II A, we fully characterise the isolation in the VSM. This leads to leakage of around −57 dB and −54 dB on Q T and Q B , respectively, at the qubit operating frequency for the conditions used in the main experiments. To characterize the residual cross-driving at the device, we disconnect the VSM and compare the amplitude required for pulses applied on one drive line (either D T or D B ) to perform a π rotation on both Q T and Q B . We define the cross-driving strength of each drive line as the ratio r c of the π-pulse amplitude for the directly-driven to that for the cross-driven qubit. For this test, pulses are first amplified and then attenuated using a step attenuator to allow the large amplitude range required. Results shown in Fig. S9 demonstrate on-chip cross-driving strengths below 1% (−53 dB and −45 dB on Q T and Q B , respectively), but still somewhat larger than cross-driving effects resulting from finite isolation in the VSM. During the main experiments, both VSM and on-chip leakage are able to contribute to cross-driving effects.
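To relate the quoted decibel figures to amplitude ratios, a short conversion can help. The sketch below assumes the dB values are power ratios, so that the amplitude ratio is 10^(dB/20). The function name is hypothetical.

```python
def db_to_amplitude_ratio(db):
    """Convert a power ratio in dB to the corresponding amplitude ratio."""
    return 10.0 ** (db / 20.0)

for label, db in (("Q_T on-chip", -53.0), ("Q_B on-chip", -45.0)):
    print(label, db_to_amplitude_ratio(db))
```

Under this assumption the on-chip values correspond to amplitude ratios of roughly 0.0022 and 0.0056, both below 1%.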
In the excitation-swap experiments of the previous section, it is clear the effects result from cross-coupling rather than cross-driving, because the exchange dynamics occur while the drive pulses are off. Similarly, we can rule out these residual driving effects being related to cross-coupling, because the 16-ns Rabi pulses take only a fraction of the time required for a cross-coupling exchange oscillation. As a further check that the effects observed are indeed due to cross-driving, we measure the cross-driving on Q B through D T as Q T is tuned through resonance with Q B [Fig. S9(c)]. The cross-driving ratio is essentially constant, showing no significant dependence on the detuning between Q T and Q B .

C. Cross-excitation in randomized benchmarking
The isolated single-qubit control experiments in the main text [Fig. 2(g,h)] show that significant spurious excitations can build up in the idling qubit over the course of the long gate sequences tested in RB (particularly in the case of idling Q B while driving Q T ). It may therefore initially be somewhat surprising that virtually the same individual qubit-control performance is achieved in both selective-broadcasting (Fig. 4, main text) and global-broadcasting (Fig. S7) scenarios. As discussed in the main text, the observed cross-excitation is unlikely to result from cross-coupling, primarily because a symmetrical quantum coupling should not result in strongly asymmetric effects on the different qubits. We now show the results are, however, consistent with the effects of cross-driving by numerically simulating RB with cross-driving under experimentally realistic conditions (using independently measured qubit and cross-driving parameters). Simulations are performed using QuTiP [11,12].

We model our system as two uncoupled qubits, Q T and Q B , subject to T 1 relaxation (with corresponding relaxation times) and cross-driving. We approximate the system dynamics using instantaneous unitary pulse operators from the standard Pauli set {X π , Y π , X ±π/2 , Y ±π/2 } with 20 ns delays of T 1 -only qubit relaxation between pulses implemented using a master equation. When applying a pulse to one qubit, cross-driving of the other qubit is implemented by applying a pulse with the same rotation axis, but with the original rotation angle multiplied by the relevant cross-driving ratio. We note that additionally modelling the effects of qubit dephasing using a simple master equation does not produce RB data consistent with the experimental observations (e.g., Figs. 2 and 4 of the main text). This reflects the non-uniform phase noise spectrum which affects the transmon qubit. The long RB pulse sequences consisting of π and π/2 pulses around the X and Y axes seem to provide some form of dynamical decoupling which makes the RB measurements robust to qubit dephasing. As will be seen, the experimental results can be well modelled using only T 1 -type noise processes.

FIG. S10. The results shown are averaged over ten runs, using the single-qubit minimal-set decomposition (red), and using the selective-broadcasting asymmetric and symmetric 5-primitives schemes (green and blue, respectively). Cross-driving effects are largely suppressed in the 5-primitives schemes by choosing the five pulse primitives such that constituent pulses largely cancel out. The symmetric 5-primitives scheme further reduces cross-driving effects by alternating between the five pulse primitives and the inverse pulses.
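The cross-driving rule used in the model (same rotation axis, rotation angle scaled by the cross-driving ratio) can be sketched in a few lines. This is a minimal NumPy illustration of that single rule, not the QuTiP simulation itself: T 1 relaxation between pulses is omitted, and the function names are hypothetical.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)

def rotation(axis, angle):
    """2x2 unitary for a rotation by `angle` about the x or y axis."""
    sigma = SX if axis == "x" else SY
    return np.cos(angle / 2) * np.eye(2) - 1j * np.sin(angle / 2) * sigma

def apply_pulse(state_driven, state_idle, axis, angle, r_c):
    """Apply a pulse to the driven qubit and the scaled-down copy to the idle qubit."""
    return (rotation(axis, angle) @ state_driven,
            rotation(axis, r_c * angle) @ state_idle)

# Ten identical X_pi pulses with the cross-driving ratio quoted for Q_B (0.0076)
driven = np.array([1, 0], dtype=complex)
idle = np.array([1, 0], dtype=complex)
for _ in range(10):
    driven, idle = apply_pulse(driven, idle, "x", np.pi, 0.0076)
print(abs(idle[1]) ** 2)   # spurious excitation accumulated on the idle qubit
```

Because identical pulses add coherently, the idle-qubit excitation here grows as sin²(N r c π/2) with the number of pulses N, which is the worst case that the randomized sequences partly average away.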
Randomized benchmarking is implemented by generating independent Clifford sequences for each qubit. We decompose Clifford gates using either the minimal set decomposition or one of the selective-broadcasting schemes. Figure S10 shows simulated results of cross-driving for the isolated single-qubit control scenario reported in Fig. 2 of the main text. In this section, we are only concerned with the red curves, which correspond to implementing single-qubit RB with the standard set of pulse decompositions [8]. These simulations can be compared directly with the curves in Fig. 2(g,h). While the maximum excitation population observed in the simulations is larger than the value observed in the experiments, the simulations for both qubits show the same qualitative behaviour as the measured data. The quantitative difference may be explained by the fact that the direct measurements of cross-driving were made at a different time from the main measurement run and we observed some small fluctuations in cross-driving levels over time.

FIG. S11. Simulation of sequential (interleaved) RB for several cross-driving ratios. For each cross-driving ratio, 50 simulation runs were performed, using sequential RB up to m = 800 Clifford rounds. Under the cross-driving levels pertaining in our experiments (0.0037 and 0.0076 for QT and QB, respectively), the error per Clifford for the idling operation is dominated by the effects of T1 relaxation as calculated from Eq. (1).
As discussed in the main text, while the plots of cross-excitation during RB are useful diagnostics of the presence of a spurious cross-driving effect, they may give a misleading impression when presented in parallel with RB results. Although the decay curves look superficially similar, they should not be interpreted in the same way. By contrast, the technique of interleaved RB (IRB), which was introduced to enable rigorous quantification of the performance of individual gates, allows us to calculate a meaningful error per Clifford for the idling operation [13]. In IRB, the usual random sequence of Cliffords is alternated with identical repetitions of an individual gate. By comparing the interleaved decay rate with the decay rate for a standard RB measurement, it is possible to calculate a robust error per gate for the individual gate in question. In this context, the target gate is the nominal identity operation on one qubit which results from a random Clifford being applied to the other qubit. The IRB pulse sequence is therefore identical to the sequence implemented in the sequential selective-broadcasting scheme (see Fig. 4 of the main text). In the main text, we use the formulas in Ref. [13] to calculate the idling error per Clifford, but for these simulations, the performance of sequential selective-broadcasting already provides a simple way to assess the performance of cross-excitation during idling. Figure S11 shows that idling performance as quantified by RB is limited mainly by T 1 relaxation. Finally, when identical gate sequences are being applied to both qubits, cross-driving will result in a small amount of over-driving on each qubit (over-driving ratio r o ), which would also look like an error in pulse rotation angle. Figure S12 shows that the Clifford error is insensitive to both cross-driving and over-driving to first order.

FIG. S12. Simulations of Clifford fidelity FC as a function of the cross-driving ratio (rc: red) and the relative over-driving rotation error (ro−1: green). For each error type, 50 simulation runs were performed, using sequential RB on both qubits simultaneously up to m = 800 Clifford rounds, after which FC was extracted from an exponential fit to averaged data. The T1 limit [Eq. (1)] is given by a horizontal dashed line. The data shows that FC is first-order insensitive to both cross-driving and to over-rotations.
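The comparison of interleaved and reference decay rates can be made concrete with the commonly used interleaved-RB estimate, r = (d − 1)(1 − p int / p ref)/d, where p ref and p int are the per-step depolarizing parameters from the exponential fits and d = 2 for a single qubit. The sketch below assumes this standard form is what Ref. [13] prescribes. The function name and the example numbers are hypothetical.

```python
def irb_gate_error(p_ref, p_int, dim=2):
    """Interleaved-RB estimate of the error of the interleaved gate.
    p_ref: decay parameter of standard RB; p_int: decay parameter with the
    gate under test (here, idling) interleaved between random Cliffords."""
    return (dim - 1) * (1.0 - p_int / p_ref) / dim

# Made-up decay parameters, for illustration only
print(irb_gate_error(p_ref=0.99, p_int=0.98))
```

If the interleaved and reference decays coincide, the estimated idling error is zero, as expected for a perfect identity operation.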

D. Making pulse sequences robust to cross-driving
We have already shown that cross-excitation does not have a dominant effect on single-qubit control in either global or selective broadcasting. We show here that any residual effect can also be largely eliminated while a qubit is idling by choosing robust pulse sequences for decomposing the Clifford gates.
If a qubit is idle, every pulse that is applied to the driven qubit rotates the idle qubit by an amount depending on the cross-driving ratio. The random application of successive pulses to the driven qubit can therefore be viewed as a random walk for the idle qubit. As we will discuss in more detail in Sec. VIII, there are many ways to compose a given Clifford gate from a small set of standard rotations. By choosing the constituent pulses in such a way that their combined application largely cancels out, cross-driving effects can be greatly reduced. In the standard set of Clifford decompositions [8], the decompositions involve a majority of pulses rotating in the positive direction, biasing the random walk and producing a pronounced net cross-driving effect. This effect can be countered by choosing decompositions which minimize the bias. We have implemented this in the 5-primitives scheme, by choosing the first three pulses, {X π/2 , Y π/2 , X π/2 }, to be positive rotations, and the last two, {X −π , Y −π }, to move in the negative direction. Even though the pulse subset that is applied depends on the Clifford chosen, the pulses still largely cancel out after applying many Cliffords. Furthermore, as the single-qubit Clifford operations form a group, the inverses of all Cliffords also form the Clifford group. The complete inverse of the five pulse primitives, {X π , Y π , X −π/2 , Y −π/2 , X −π/2 }, can therefore also generate each of the 24 Cliffords using an appropriate subset of the pulses. By alternating between the normal five-pulse primitives and the inverted five pulses, cross-driving effects can be further reduced. (In fact, we note that this exactly eliminates all cross-driving that occurs via leakage in the VSM, because all pulses are always present at that distribution stage.) We refer to this as the symmetric 5-primitives technique and this is the technique we implement in the main experiments described in Fig. 4 of the main text. Our simulations in Fig. S10 show that the asymmetric 5-primitives technique already dramatically reduces the effect of cross-driving, and in the case of isolated single-qubit control, cross-driving is effectively eliminated completely using the symmetric 5-primitives scheme. This is also confirmed by measurements of cross-driving for the three selective-broadcasting schemes (Fig. S13), where we implement the symmetric 5-primitives technique. While we have only demonstrated this technique for the 5-primitives scheme, it could also be relatively straightforwardly applied to the sequential scheme, but the far better scaling of the 5-primitives scheme makes it more interesting for scaling up to larger system sizes.

FIG. S13. Measured cross-driving results when performing selective broadcasting. Even though the selective broadcasting schemes are meant for multiple qubits, the markers of the measured qubit are turned off to measure cross-driving effects. The cross-driving effects of QB are stronger than for QT, in agreement with single-qubit results. With pulse decompositions whose cumulative effect largely cancels out, cross-driving is strongly reduced in the 5-primitives method.
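The statement that the five primitives, and equally their inverses, reach all 24 single-qubit Cliffords through subsets of the pulses can be checked by brute force. The sketch below represents each pulse by its integer 3×3 rotation matrix on the Bloch vector, which discards global phase, and enumerates all 32 ordered subsets. The helper names are hypothetical.

```python
import itertools
import numpy as np

# Bloch-sphere rotation matrices for +pi/2 rotations about x and y
X90 = np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]])
Y90 = np.array([[0, 0, 1], [0, 1, 0], [-1, 0, 0]])
X180, Y180 = X90 @ X90, Y90 @ Y90   # a -pi rotation equals a +pi rotation here

PRIMITIVES = [X90, Y90, X90, X180, Y180]           # X_pi/2, Y_pi/2, X_pi/2, X_-pi, Y_-pi
INVERSES = [X180, Y180, X90.T, Y90.T, X90.T]       # X_pi, Y_pi, X_-pi/2, Y_-pi/2, X_-pi/2

def reachable(pulses):
    """All rotations obtained by applying a subset of `pulses`, left to right."""
    found = set()
    for mask in itertools.product([False, True], repeat=len(pulses)):
        R = np.eye(3, dtype=int)
        for use, pulse in zip(mask, pulses):
            if use:
                R = pulse @ R        # later pulses act after earlier ones
        found.add(tuple(R.flatten().tolist()))
    return found

print(len(reachable(PRIMITIVES)), len(reachable(INVERSES)))
```

Both sets contain 24 distinct rotations, so every Clifford has at least one subset decomposition in either primitive ordering.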

VII. CLIFFORD PULSE DECOMPOSITION
The decompositions of the 24 single-qubit Clifford gates into a minimal set of π/2 and π pulses and into the 5-primitives scheme are shown in Table S3.

TABLE S3. Two decompositions of the 24 single-qubit Clifford gates. The first, taken from Ref. [8], minimizes the number of π/2 and π pulses around the ±x and ±y axes. The second is our decomposition into 5 primitives. Pulses are applied from left to right. Columns: Clifford ID; minimal-set decomposition (first, second and third pulse); 5-primitives decomposition (X π/2 , Y π/2 , X π/2 , X −π , Y −π ).

VIII. COMPILED SELECTIVE BROADCASTING ALGORITHM
A. Finding the optimal pulse sequence
When using a selective broadcasting architecture to send pulse sequences to multiple qubits, pulses can be directed to any subset of the qubits, but distinct pulses may not be applied simultaneously. In compiled selective broadcasting, the total number of pulses required to implement single-qubit gates on all qubits of a multi-qubit system is minimized by searching all possible combinations of single-qubit Clifford decompositions and grouping together like pulses where possible. In this section, we introduce an algorithm for determining the shortest compiled pulse sequence implementing independent single-qubit Clifford gates on n qubits.
On average, there are approximately 38 distinct decompositions for each single-qubit Clifford gate, given the basis set of X and Y pulses: {I, X π , Y π , X ±π/2 , Y ±π/2 }, resulting in approximately 38^n different decompositions for a given n-qubit combination of Cliffords. Here, we only consider sequences of up to four pulses, because the 5-primitives decomposition identified in Table S3 already provides a recipe for decomposing an arbitrary n-qubit Clifford combination into five pulses. We do not include trivial decompositions where sequential pulses cancel out.
Given a particular choice of n Cliffords $(C_{\alpha_1}, \ldots, C_{\alpha_n})$, where $\alpha_i$ is the Clifford ID for qubit i, we write a specific decomposition as $P^1_1, \ldots, P^1_{m_1}, \ldots, P^n_1, \ldots, P^n_{m_n}$, where $P^i_j$ is the jth of $m_i$ pulses which implement $C_{\alpha_i}$. While this already fixes the order in which pulses must be applied to individual qubits, we still have the freedom to choose in which order the distinct pulses are applied to different qubits. For each possible decomposition, we use the following recursive algorithm to search and minimize over all possible pulse orderings.
We first define an empty broadcasting sequence P seq to store the compiled multi-qubit pulse sequence. In order to convert from parallel single-qubit pulse sequences to the single broadcasting sequence P seq , we define a vector of indices $\beta = (\beta_1, \ldots, \beta_n)$ to store the current position in each single-qubit sequence. Initially, β = (1, ..., 1). At each instant, $P_\beta = (P^1_{\beta_1}, \ldots, P^n_{\beta_n})$ contains the next pulses to be applied to each qubit. When $\beta_i = m_i + 1$, the pulse sequence for that qubit is completed, and so there is no $P^i_{\beta_i}$ to be added to $P_\beta$. The recursive part of the algorithm then proceeds as follows:

1. Define $\tilde{P}_\beta$ to be the set of distinct pulses in $P_\beta$. If $\tilde{P}_\beta$ is empty, store the number of pulses in P seq and abort this recursion branch (P seq is a completed pulse sequence such that all Cliffords are applied to the corresponding qubits).
2. For each pulse P in $\tilde{P}_\beta$, perform the following steps: (a) append P to P seq ; (b) increment $\beta_i$ for every qubit i whose next pulse $P^i_{\beta_i}$ equals P; (c) return to step 1 with the updated P seq and β.

After considering all possible pulse sequences and looping over all possible decompositions, choose the sequence with the minimum number of pulses N P in P seq . This algorithm determines the minimum number of pulses N P required to implement a given n-qubit combination of single-qubit Clifford gates. However, this becomes prohibitively resource intensive as the number of qubits increases. It is therefore important to assess how the performance of the optimal compiled decomposition compares with the 5-primitives decomposition, which can be applied to any number of qubits without any extra overhead in resources (in either calculation time or sequence length). To do this, we use the average number of pulses N P required per n-qubit combination of Cliffords (per Clifford).
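The recursive search over pulse orderings can be sketched for one fixed choice of decompositions. The version below is a minimal illustration under stated assumptions: pulses are plain string labels, indices are zero-based rather than one-based, and each broadcast pulse advances every qubit whose next pulse matches it. The outer loop over alternative decompositions and the pruning optimizations are omitted, and the function name is hypothetical.

```python
def shortest_broadcast(sequences, beta=None):
    """Minimum number of broadcast pulses that realizes every per-qubit
    pulse sequence in `sequences` (one tuple of pulse labels per qubit)."""
    if beta is None:
        beta = tuple(0 for _ in sequences)          # start of every sequence
    # Distinct next pulses over all qubits that are not yet finished
    pending = {seq[b] for seq, b in zip(sequences, beta) if b < len(seq)}
    if not pending:
        return 0                                    # all sequences completed
    best = None
    for pulse in sorted(pending):
        # Broadcast `pulse` to every qubit waiting for it and advance those indices
        new_beta = tuple(b + 1 if b < len(seq) and seq[b] == pulse else b
                         for seq, b in zip(sequences, beta))
        length = 1 + shortest_broadcast(sequences, new_beta)
        if best is None or length < best:
            best = length
    return best

print(shortest_broadcast([("X90", "Y90"), ("Y90", "X90")]))
print(shortest_broadcast([("X90", "Y90"), ("X90",), ("Y90",)]))
```

In the first example the two qubits need the same pulses in opposite order, so three broadcast pulses are required. In the second, both single-pulse qubits can share the pulses already sent to the first qubit, so two suffice.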
In the case of compiled selective broadcasting, finding N P requires minimizing the sequence length for all 24^n possible Clifford combinations. This problem again scales exponentially with n. For example, for n = 5 qubits, this requires 24^5 · 38^5 ≈ 6.3 × 10^14 repetitions of the complete recursive search described above. Nevertheless, by employing a number of optimizations described in the next section, we have exactly calculated N P for 1 ≤ n ≤ 5 qubits in under 2 hours. Using random sampling (finding the shortest pulse sequence for a random sample of Clifford combinations), we also approximated N P for 5 ≤ n ≤ 10. Exact and approximate results for n = 5 agree. As shown in Table S4, the improvement offered by compiled selective broadcasting over the 5-primitives method is already less than one pulse per Clifford and continues to decrease rapidly. Considering how badly the resource overhead scales for increasing numbers of qubits for finding a compiled sequence, it is questionable whether compiling offers any significant benefit over using the prescriptive 5-primitives approach when scaling up to larger system sizes.
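The size of the unoptimized search can be checked with a short calculation. The sketch assumes exactly 38 decompositions per Clifford, whereas the text quotes this as an approximate average. The function name is hypothetical.

```python
def search_space_size(n, n_cliffords=24, avg_decompositions=38):
    """Number of (Clifford combination, decomposition combination) pairs for n qubits."""
    return (n_cliffords ** n) * (avg_decompositions ** n)

total = search_space_size(5)
print(f"{total:.2e}")
```

For n = 5 this gives about 6.3 × 10^14, and each additional qubit multiplies the count by 24 × 38 = 912.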

B. Optimizations for the Clifford compilation algorithm
In the first optimization, we place an upper bound N ub on the pulse sequence length. The upper bound N ub is given by the minimum number of pulses found so far that can compile a given Clifford combination. At each stage, the algorithm checks if the sum of the pulses in P seq and all distinct pulses left is equal to or greater than N ub . If this is the case, a shorter combination of pulses using P seq is not possible, so we stop considering this sequence and proceed to the next one. Initially N ub = 5, as the 5-primitives method proves that there is always a decomposition of an arbitrary number of Cliffords into 5 pulses. Note that, as the limit N ub moves down, the frequency at which the algorithm stops considering sequences increases.
The second optimization relies on decompositions with fewer pulses being more likely to result in an optimal Clifford compilation. The decompositions of every Clifford are therefore arranged in ascending number of pulses. The first decompositions compared are then those with the minimum number of pulses; these have the highest probability of finding an optimal Clifford compilation. Even if an optimal Clifford compilation is not found, it is more likely that N ub will be low. This optimization is especially effective in combination with the first optimization.
The third optimization places a lower bound N lb on the number of pulses. For a given Clifford combination (C α1 , . . . , C αn ), N lb is found by looking at the minimum number of pulses N P previously found for all n−1 Clifford subsets. Since N P for the n Cliffords can never be less than N P for any of the n − 1 Clifford subsets, the maximum length of the n − 1 Clifford subsets therefore places a lower bound N lb on N P for the n Cliffords. This means that if a pulse sequence is found whose length is equal to N lb , it is an optimal Clifford compilation, and all further search is aborted. This is in contrast to the first optimization, where only the particular sequence of pulses is aborted upon reaching N ub . Furthermore, as n increases, it becomes increasingly likely that the lower bound is 5. In this case, N lb = N ub , and so the 5-primitives method is an optimal Clifford compilation. This optimization results in the largest gain in computation time, by several orders of magnitude.
In the fourth and most complicated optimization, all decompositions composed of three pulses or fewer are separated from those composed of four pulses. First, all combinations of Clifford decompositions composed of three pulses or fewer are compared. This reduces the average number of decompositions per Clifford from 38 to 7, resulting in an exponentially reduced number of total decomposition combinations. It is, however, not always the case that the optimal Clifford compilation is found using only up to three pulses per decomposition; sometimes an optimal Clifford compilation requires that one of the decompositions is composed of four pulses. However, after comparing decompositions of three pulses or fewer, these four-pulse decompositions only need to be considered when N lb ≤ 4 and N ub = 5. If there is a sequence containing a four-pulse decomposition that outperforms any found using up to three-pulse decompositions and the 5-primitives method, the sequence must consist of four pulses. Only one Clifford then has a four-pulse decomposition, while all other Cliffords are subsets of these four pulses. We therefore loop, for every Clifford, over each of the four-pulse decompositions, and test whether every other Clifford can be decomposed into a subset of these four pulses. This changes the comparison of four-pulse decompositions from scaling exponentially with n to scaling linearly.
The fifth and final optimization is only of use when all different Clifford combinations need to be considered to determine N P . It stems from the observation that an optimal compilation for a certain Clifford combination $(C_{\alpha_1}, \ldots, C_{\alpha_n})$ is the same as for any permutation of those Cliffords. We therefore only determine an optimal Clifford compilation when $\alpha_1 \le \cdots \le \alpha_n$. This reduces the number of calculations exponentially (81 times fewer computations for n = 5).
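The saving from considering only sorted Clifford IDs can be verified by counting. Ordered n-tuples of 24 Cliffords number 24^n, whereas sorted tuples correspond to multisets, of which there are C(24 + n − 1, n). The function name below is hypothetical.

```python
from math import comb

def permutation_saving(n, n_cliffords=24):
    """Ratio of ordered Clifford combinations to sorted (multiset) combinations."""
    ordered = n_cliffords ** n
    sorted_only = comb(n_cliffords + n - 1, n)
    return ordered / sorted_only

print(permutation_saving(5))
```

For n = 5 the ratio is 7,962,624 / 98,280 ≈ 81, in agreement with the factor quoted above.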