Leakage reduction in fast superconducting qubit gates via optimal control

Reaching high-speed, high-fidelity qubit operations requires precise control over the shape of the underlying pulses. For weakly anharmonic systems, such as superconducting transmon qubits, short gates lead to leakage to states outside of the computational subspace. Control pulses designed with open-loop optimal control may reduce such leakage. However, model inaccuracies can severely limit the usability of such pulses. We implemented a closed-loop optimization that simultaneously adapts all control parameters based on measurements of a cost function built from Clifford gates. We directly optimize the amplitude and phase of each sample point of the digitized control pulse. We thereby fully exploit the capabilities of the pulse generation electronics and create a 4.16 ns single-qubit pulse with 99.76 % fidelity and 0.044 % leakage. This is a sevenfold reduction of the leakage rate and a threefold reduction in standard errors of the best DRAG pulse we have calibrated at such short durations on the same system.


INTRODUCTION
Superconducting qubits are a promising candidate to realize large-scale quantum computing systems [1][2][3] . The architecture is scalable 4 , microwave control electronics are well developed and readily available, and transmon-type qubit designs 5 allow for stable operations. To accurately manipulate the quantum state of the weakly anharmonic qubit, control methods have been steadily improved to address common problems such as frequency crowding [6][7][8] and cross talk 9 . In particular, with the powerful tools provided by open-loop optimal control theory preparing target states [10][11][12] and gates 13,14 can be realized with high fidelity. In these methods, numerically simulated system models are used to optimize hundreds of parameters that determine the shape of the control fields applied to the quantum system 15,16 . When the system model is accurate enough, the optimized control pulses can immediately be applied in experiment, yielding high performance and reliable control [17][18][19] . However, applying such control methods to superconducting qubits produces less accurate results in comparison to ion traps 20,21 and nuclear magnetic resonance systems 15,22 since models with sufficient accuracy are not available for superconducting qubits. Effects that are hard to accurately reproduce in simulation include instrument noise, transfer functions [23][24][25] , additional modes and coupling to unwanted quantum systems [26][27][28][29] . As a result, pulse shaping for superconducting qubits requires closed-loop optimal control, i.e., direct optimization on the experimental system, which limits the amount of tunable parameters defining the pulse shapes 14,30 .
Furthermore, optimal control can suppress leakage out of the computational subspace occurring for fast qubit gates. Fast gates are required to lower the limits on gate errors set by decoherence. Moreover, in combination with short readout times 31 and fast qubit reset 32,33 fast gates are useful to reduce the overall execution time of quantum algorithms, such as variational quantum eigensolvers 34 that require many repetitions to gather statistics on quantum measurements [35][36][37] . However, fast qubit gates suffer from leakage effects and additional unitary errors caused by the large bandwidth of the short control pulse 6 . Reducing leakage out of the computational subspace is paramount for error correction since correcting such errors requires significantly more resources than correcting errors in the computational subspace [38][39][40] .
In this work, we increase the fidelity of short duration singlequbit gates. Control shapes are optimized in a closed-loop fashion to capture the full system dynamics and avoid model limitations. The resulting pulses significantly mitigate leakage effects. As optimization is limited by the experimental runtime, we investigate the time budget of the setup. We employ methods such as restless measurements 41 to speed-up the optimization and use the covariance matrix adaptation-evolution strategy (CMA-ES) algorithm 42 instead of, for instance, Nelder-Mead 43 to handle the large number of parameters. With these improvements we experimentally optimize pulses with up to 55 parameters.

Setup and system control
The system consists of two transmon-type fixed-frequency superconducting qubits 44 coupled by a flux tunable coupler 45,46 , with an identical architecture as used in 36 . Experiments are carried out on one of the qubits with a transition frequency of ω 01 /2π = 5117.22 MHz, an anharmonicity Δ/2π = − 315.28 MHz and coherence times of 105 μs and 39 μs for T 1 and T 2 , respectively. The qubit is controlled by microwave pulses applied via a readout resonator capacitively coupled to the qubit. Pulses consist of two control components Ω x (t) and Ω y (t), which are combined into a single drive signal Ω = Ω x + iΩ y . The pulse is up-converted to the qubit frequency using a microwave vector signal generator in a single-sideband configuration.
The in-phase-and quadrature-(IQ) components in the mixing process are the real and imaginary parts of the pulse modulated at an intermediate sideband frequency ΩðtÞ expfiðω ssb t þ ϕÞg with ω ssb /2π = 100 MHz. Synthesizing this signals by an arbitrary waveform generator (AWG) results in real-time control over phase, frequency and amplitude 47 .
In a frame rotating at the qubit frequency, the transmon Hamiltonian is given bŷ where terms rotating at twice the qubit frequency have been omitted. Here, e Ω x;y are the IQ-components of the drive after the displacement due to the microwave resonator 48 . The i th level of the transmon is denoted by Þ couple adjacent energy levels. Therefore, Ω x -pulses at the resonance frequency ω 01 drive rotations about the x − axis of the Bloch sphere spanned by f 0 j i; 1 j ig, see Fig. 1. The total area of the pulse envelope defines the rotation angle Θ. The rotation axis can be freely chosen in the xy-plane by changing the phase of the drive signal ϕ. By selecting ϕ = nπ/2 (n = 0, 1, …) and Θ = π/2, ±X/ 2 and ±Y/2 single-qubit operations are realized.
Since transmons have a low anharmonicity, fast pulses with a wide frequency response lead to leakage out of the computational subspace defined by the two lowest-lying energy eigenstates. This process is suppressed by derivative removal gates (DRAG) 6,49,50 , designed to reduce leakage and phase errors caused by inadvertent driving of the 1 j i $ 2 j i transition. The first-order DRAG correction ( Fig. 1(a); dashed lines) to a Gaussian shaped pulse Ω x ðtÞ ¼ A exp Àt 2 =ð2σ 2 Þ f gwith amplitude A and width σ, is The correction in the imaginary component of Ω DRAG (t) with the scaling parameter β eliminates the spectral weight of the pulse at the 1 j i $ 2 j i transition. Although being designed for fast, short gates DRAG fails to produce high fidelities when the gate duration is lower than~10/ Δ 6 . To overcome this, either higher-order correction terms or pulses with more degrees of freedom have to be employed. To find suitable pulses we use a parameterization that applies a correction δ n = a n + ib n at each point in time to a calibrated DRAG pulse, similar to common optimal control approaches 17,51 . This results in a list of piecewise-constant (PWC) control amplitudes as shown in Fig. 1(a). The time discretization Δt is naturally given by the sampling rate of the AWG generating the pulse envelope. We use a Zurich Instruments HDAWG 52 operating at a sampling rate of f s = 2.4GS/s. The optimization parameters are the amplitude corrections a n and b n to the n-th sample of Ω x and Ω y , respectively, with the initial guess a n = b n = 0.

Pulse parameter optimization
Since the parametrization in Eq. (3) no longer permits an individual optimization of each parameter we simultaneously optimize all of them using the CMA-ES optimization algorithm 42 (see "Methods" section). It is based on generating sets of parameters S k that describe k = 1, . . . , λ different pulse shapes as candidate solutions. The parameters in S k are defined by the parametrization of the pulse shape. The fidelity of each candidate solution is evaluated by a cost function, which serves to generate a new set of candidate solutions. This process is repeated until convergence is reached and the best solution is found.
As a cost function we use randomized benchmarking (RB) sequences with a fixed number of m Clifford gates 30 averaged over K sequence realizations, see Fig. 2(a). This corresponds to evaluating only a single-point in a standard RB measurement 53,54 , which reduces the runtime to evaluate the cost function. We construct the Clifford gates by composing ±X/2 and ±Y/2 pulses,  The Ω x and Ω y components are displayed in red and blue, respectively. c Sketch of ideal RB curves 1 þ F m ð Þ =2 with the cost function for m = 120 Clifford gates indicated as points for several fidelities F . d Experimental data of a full optimization run for a 23 dimensional parameter space. Each blue point represents the cost function of each of the λ = 30 candidate pulse shapes based on a unique parameter set S k evaluated using 20 Clifford sequences. The red points represent the average cost function at each iteration of the optimizer. each based on the pulse shape defined by S k , see Fig. 2(b). The average ground state population p 0 ðmÞ of the final qubit state defines the cost function, which is maximized by the optimizer. To estimate the fidelity of the optimized pulses we finally perform a full randomized benchmarking measurement.

Fidelity estimates of optimized short pulses
We optimize single-qubit pulses of varying duration ranging from N = 10 to N = 26 samples per pulse, corresponding to a duration τ = N ⋅ f s ranging from 4.16 ns to 10.83 ns. We use K = 20 sequences of m = 120 Clifford gates. Each sequence is measured 1000 times using the restless measurement protocol 41 at a rate of 100 kHz. We first use the CMA-ES-based optimization procedure to calibrate DRAG pulses, defined in Eq. (2). For this we choose the amplitude A, the DRAG parameter β and the sideband frequency ω ssb as optimization parameters, i.e., S ¼ A; β; ω ssb f g . The results of our CMA-ES-based calibration are shown in Fig. 3 (blue circles). The resulting fidelities compare well with standard sequential error amplification calibration methods 50 . The optimized DRAG pulse then serves as initial guess for a second optimization step in which we extend S by the amplitude corrections to S 0 ¼ A; β; ω ssb ; a 1 ; b 1 ; :::; a N ; b N f g . Initializing the PWC optimization with optimal DRAG pulses reduces the number of iterations, as they already produce high fidelities (see Supplementary Notes 5).
For gates longer than τ = 5 ns we find a constant fidelity of F = 99.87(1) % both for the DRAG pulse and the piecewise-constant optimized pulse, see Fig. 3. For gates shorter than 5ns we observe a decrease of fidelity for the DRAG pulses consistent with the 10/Δ limit (see the gray line in Fig. 3), while the fidelity of the piecewiseconstant optimized pulses remains constant even for the shortest gate duration. Note that drive power limitations prevent us from implementing gates shorter than 4ns. The simulated DRAG pulses have higher fidelity than the experimental pulses for gates below 5ns (dashed blue line in Fig. 3). This indicates the presence of unknown error sources, such as potential non-linearities in the transfer function at high power 55 , and demonstrates the effectiveness of the subsequent optimization as the PWC optimized pulses overcome these unknown effects. The optimized DRAG and PWC pulses are not decoherence limited as shown by the discrepancy between the noiseless simulations (solid lines) and the measured fidelities. Possible explanations for the observed noise floor are amplitude noise, phase noise, or a combination thereof in the drive signal, which we investigate numerically (dashed lines in Fig. 3) in Supplementary Note 3.
To assess the influence of leakage on the shortest 4.16 ns pulse displayed in Fig. 4(a) and (b) we follow the leakage randomized benchmarking protocol outlined in ref. 40 . The leakage RB analysis requires measuring the probabilities p j to occupy the states j j i with j ∈ {0, 1, 2} after the standard RB gate sequences. The probability p χ 1 ¼ p 0 þ p 1 ¼ 1 À p 2 of remaining in the computational subspace χ 1 ¼ f 0 j i; 1 j ig is fitted using the decay model A þ Bλ n 1 to find the average leakage per Clifford Here, n is the number of Clifford gates while A, B, and λ 1 are fit parameters.
Using the extracted leakage decay Bλ n 1 we fit p 0 (n) using the double decay model A 0 þ Bλ n 1 þ C 0 λ n 2 to find the average Clifford gate fidelity  The leakage rate of the optimized piecewise-constant pulses L PWC 1 ¼ 0:044ð25Þ % is five times lower than the leakage rate of the DRAG pulse L DRAG 1 ¼ 0:29ð3Þ %, see Fig. 4(c). Additionally, we observe a reduction of standard errors from 1 À λ DRAG 2 ¼ 1:49ð15Þ % to 1 À λ PWC 2 ¼ 0:44ð15Þ %, see Fig. 4(d). The resulting average fidelity per Clifford gate, computed using Eq. (4), is F PWC ¼ 99:76ð8Þ % for the piecewise-constant pulse and F DRAG ¼ 99:11ð8Þ % for the DRAG pulse.

DISCUSSION
Our results show that optimal pulse shaping using a piecewiseconstant basis improves the gate fidelity of short pulses, reducing leakage errors by a factor of seven and standard errors by a factor of three. Increasing the fidelity of short pulses is beneficial for quantum computers 56 , in particular considering that the singlequbit gates have similar length than two-qubit gates in certain systems 57 . Moreover, the closed-loop optimal control method we have implemented allows us to overcome leakage in short pulses benefiting low anharmonicity qubits, as discussed in more detail in Supplementary Note 2.
At longer gate durations, controlling the pulse shapes beyond analytical DRAG pulses does not improve the fidelity. All our pulses, aside from the DRAG pulses shorter than 5.5 ns, are limited to an error per gate of 0.13(1) % on average. The fidelities that we measured are, however, not limited by the T 1 -time, which sets an error per gate limit of 5 ⋅ 10 −5 , see Fig. 3. Although we show that the observed fidelity limit can be reproduced using a combination of noises on the control electronics (see Supplementary Note 3), determining the exact composition of error sources is subject of ongoing research.
The improvements with more complex pulse shapes come at the expense of long calibration times. Optimizing the longest pulse shape with N = 26 samples (i.e., 55 parameters) took up to 25 h. To understand how this time can be reduced we have measured the time taken to create the pulse sequences, initialize the control electronics, and gather the data (see "Methods" section). Creating the pulse sequences and initializing the control electronics at each iteration consumes the most time. Gathering the required data is only a small fraction of the total experimental runtime. With further improvements of the control electronics, for instance an internal generation of the 100 MHz sideband modulation, we expect further significant reductions in the overall runtime of the optimizer.
Our work demonstrates that optimizing-or calibrating-pulses with up to 55 parameters is experimentally feasible. This opens up the possibility to explore more elaborate optimal control methods on superconducting qubit platforms. We plan to extend this scheme to multi-qubit gates, where system dynamics are more complex and analytic optimal control methods are not as developed as for single-qubit gates 16 . While a piecewiseconstant parametrization, as done for single-qubit gates, is harder due to the long duration of two-qubit gates, other analytical pulse representations, such as its spectral components, will be explored to improve on gate performance.

Multidimensional parameter calibration
To optimize all parameters of the pulse shape simultaneously on the experimental setup, we have chosen the CMA-ES optimization algorithm as a noise-resilient and time-efficient optimizer 42 . This algorithm optimizes a population of λ candidate solutions, which are normally distributed in the parameter space. The center and spread of the distribution are given as starting conditions and are updated by the optimizer at each iteration 42 . See Supplementary Notes 6 for an example optimization showing the distribution of parameters during calibration.
Generally, the choice of the population size λ is a trade-off between fast convergence speed and avoiding local optima 42 . However, experimentally we have to consider the time required to process the pulse sequences (i.e., the time required to compile the pulse sequences into AWG files), to initialize the hardware (including data transfer) and to measure the cost function for different population sizes λ, see Fig. 5. We benchmark these three times using a set of 20 Clifford gate sequences per candidate solution, each with 100 Clifford gates. By dividing the total time required to evaluate the entire population by λ we calculate the effective time required to asses a single candidate solution. This allow us to gauge how efficiently the hardware is being used, see Fig. 5(c). The instrument initialization introduces a constant overhead, see Fig. 5(a), which decreases the efficiency of the optimization algorithm for lower population sizes. Singlepoint optimizers, such as Nelder-Mead 43 , are thus an inefficient choice. However, when evaluating larger populations the contribution of the constant offset of the initialization is distributed over multiple measurements, leading to a convergence of the evaluation time per candidate. The data acquisition time is small due to our implementation of restless measurements 41 , which allows for a 100 kHz repetition rate. The data analysis time is negligible in comparison to the three main contributions and is not included in the analysis.

Numerical results
To obtain the numerical results (lines in Fig. 3) we model the qubit as a driven an-harmonic oscillator with a Hilbert space of dimension d = 4. See Supplementary Note 4 for a discussion on leakage to higher energy states. We simulate the quantum dynamics and perform quantum optimal control with the q-optimize package 58,59 . For each of the ± X/2 and ± Y/2 gates, we obtain the dynamics as superoperator 60 matrices E ± X=2 and E ± Y=2 by solving the master equation in Lindblad form, which includes T 1 and T 2 . Next, we compose the gates C i in the Clifford group C from these atomic operations, for example C 6 ¼ E ÀX=2 E ÀY=2 E þX=2 . The pulse parameters S are optimized using the L-BFGS gradient search 61 to maximize the average Clifford gate fidelity F C ¼ C j j À1 P i F Ci , where F Ci is the fidelity of each Clifford gate 62 . This optimizer is more efficient for numerical simulations than the CMA-ES optimizer if measurement noise is neglected and gradients can be computed. Additional details regarding the simulations are provided in Supplementary Notes 1.

DATA AVAILABILITY
All relevant data supporting the main conclusions and figures of the document are available on request. Please refer to Max Werninghaus at maw@zurich.ibm.com.
Received: 24 March 2020; Accepted: 4 November 2020; Fig. 5 Runtime distribution of typical experiments. a Experimental runtime consisting of processing the pulse sequences (red right triangles), initializing the setup (blue circles) and measuring the cost function (gray left triangles). b Time per iteration of CMA-ES split into those three categories. In one iteration the cost function of each candidate solution in the whole population of size λ is measured. Error bars are smaller than the size of the data points. c Time per evaluation, as a function of population size. Each candidate solution in a given population requires one evaluation. As the population size increase the experimental runtime to evaluate a full iteration increases and the average time to evaluate a candidate solution decreases.