Leakage reduction in fast superconducting qubit gates via optimal control

Reaching high speed, high fidelity qubit operations requires precise control over the shape of the underlying pulses. For weakly anharmonic systems, such as superconducting transmon qubits, short gates lead to leakage to states outside of the computational subspace. Control pulses designed with open-loop optimal control may reduce such leakage. However, model inaccuracies can severely limit the usability of such pulses. We implemented a closed-loop optimization that simultaneously adapts all control parameters based on measurements of a cost function built from Clifford gates. By parameterizing pulses with a piecewise-constant representation that matches the capabilities of the control hardware we create a $4.16~\rm{ns}$ single-qubit pulse with $99.76\,\%$ fidelity and $0.044\,\%$ leakage. This is a seven-fold reduction of the leakage rate of the best DRAG pulse we have calibrated at such short durations on the same system.


I. INTRODUCTION
Superconducting qubits are a promising candidate to realize large scale quantum computing systems [1][2][3]. The architecture is scalable [4], microwave control electronics are well developed and readily available, and transmon-type qubit designs [5] allow for stable operations. To accurately manipulate the quantum state of the weakly anharmonic qubit, control methods have been steadily improved to address common problems such as frequency crowding [6][7][8] and cross talk [9]. In particular, with the powerful tools provided by open-loop optimal control theory preparing target states [10][11][12] and gates [13,14] can be realized with high fidelity. In these methods numerically simulated system models are used to optimize hundreds of parameters that determine the shape of the control fields applied to the quantum system [15,16]. When the system model is accurate enough, the optimized control pulses can immediately be applied in experiment, yielding high performance and reliable control [17][18][19]. However, applying such control methods to superconducting qubits produces less accurate results in comparison to ion traps [20,21] and nuclear magnetic resonance systems [15,22] since models with sufficient accuracy are not available for superconducting qubits. Effects that are hard to accurately reproduce in simulation include instrument noise, transfer functions [23][24][25], additional modes and coupling to unwanted quantum systems [26]. As a result, pulse shaping for superconducting qubits requires closed-loop optimal control, i.e. direct optimization on the experimental system, which limits the amount of tunable parameters defining the pulse shapes [14,27].
Furthermore, optimal control can suppress leakage out of the computational subspace occurring for fast qubit gates. Fast gates are required to lower the limits on gate errors set by decoherence. Moreover, in combination with short readout times [28] and fast qubit reset [29,30] fast gates are useful to reduce the overall execution time of quantum algorithms, such as variational quantum eigensolvers [31] that require many repetitions to gather statistics on quantum measurements [32][33][34]. However, fast qubit gates suffer from leakage effects and additional unitary errors caused by the large bandwidth of the short control pulse [6]. Reducing leakage out of the computational subspace is paramount for error correction since correcting such errors requires significantly more resources than correcting errors in the computational sub-space [35][36][37].
In this work, we increase the fidelity of short duration single-qubit gates. Control shapes are optimized in a closed-loop fashion to capture the full system dynamics and avoid model limitations. The resulting pulses significantly mitigate leakage effects. As optimization is limited by the experimental runtime, we investigate the time budget of the setup. We employ methods such as restless measurements [38] to speed-up the optimization and use the CMA-ES algorithm [39] instead of, for instance, Nelder-Mead [40] to handle the large number of parameters. With these improvements we experimentally optimize pulses with up to 55 parameters.

II. RESULTS
The system consists of two transmon-type fixedfrequency superconducting qubits [41] coupled by a flux tunable coupler [42,43]. Experiments are carried out on one of the qubits with a transition frequency of ω 01 /2π = 5117.22 MHz, an anharmonicity ∆/2π = −315.28 MHz and coherence times of 105 µs and 39 µs for T 1 and T 2 , respectively. The qubit is controlled by microwave pulses applied via a readout resonator capacitively coupled to the qubit. Pulses consist of two control components Ω x (t) and Ω y (t), which are combined into a single drive signal Ω = Ω x + iΩ y . The pulse is up-converted to the qubit frequency using a microwave vector signal generator in a single-sideband configuration.
The in-phase-and quadrature-components in the mixing process are the real and imaginary parts of the pulse modulated at an intermediate sideband frequency thesizing this signals by an arbitrary waveform generator (AWG) results in real-time control over phase, frequency and amplitude [44].
In a frame rotating at the qubit frequency, the transmon Hamiltonian is given bŷ where terms rotating at twice of the qubit frequency have been omitted. The i th level of the transmon is denoted by |i . The operatorsσ x j,j−1 = √ j (|j j − 1| + |j − 1 j|) andσ y j,j−1 = i √ j (|j j − 1| − |j − 1 j|) couple adjacent energy levels. Therefore, Ω x -pulses at the resonance frequency ω 01 drive rotations about the x−axis of the Bloch sphere spanned by {|0 , |1 }, see Fig. 1. The total area of the pulse envelope defines the rotation angle Θ. The rotation axis can be freely chosen in the xy-plane by changing the phase of the drive signal φ. By selecting φ = nπ/2 (n = 0, 1, . . .) and Θ = π/2, ±X/2 and ±Y /2 single-qubit operations are realized.
Since transmons have a low anharmonicity, fast pulses with a wide frequency response lead to leakage out of the computational subspace defined by the two lowestlying energy eigenstates. This process is suppressed by derivative removal gates (DRAG) [6,45,46], designed to reduce leakage and phase errors caused by inadvertent driving of the |1 ↔ |2 transition. The first-order DRAG correction ( Fig. 1(a); dashed lines) to a Gaussian shaped pulse Ω x (t) = A exp −t 2 /(2σ 2 ) with amplitude A and width σ, is The correction in the imaginary component of Ω DRAG (t) with the scaling parameter β eliminates the spectral weight of the pulse at the |1 ↔ |2 transition.
Although being designed for fast, short gates DRAG fails to produce high fidelities when the gate duration is lower than ∼ 10/∆ [6]. To overcome this, either higherorder correction terms or pulses with more degrees of freedom have to be employed. To find suitable pulses we use a parameterization that applies a correction δ n = a n + ib n at each point in time to a calibrated DRAG pulse, similar to common optimal control approaches [17,47]. This results in a list of piecewise-constant control amplitudes as shown in Fig. 1(a). The time discretization ∆t is naturally given by the sampling rate of the AWG generating the pulse envelope. We use a Zurich Instruments HDAWG [48] operating at a sampling rate of f s = 2.4 GS/s. The optimization parameters are the amplitude corrections a n and b n to the n-th sample of Ω x and Ω y , respectively, with the initial guess a n = b n = 0.

A. Pulse parameter optimization
Since the parametrization in Eq. (3) no longer permits an individual optimization of each parameter we simultaneously optimize all of them using the Covariance Matrix Adaptation -Evolution Strategy (CMA-ES) optimization algorithm [39] (see Methods section). It is based on generating sets of parameters S k that describe k = 1, ..., λ different pulse shapes as candidate solutions. The parameters in S k are defined by the parametrization of the pulse shape. The fidelity of each candidate solution is evaluated by a cost function, which serves to generate a new set of candidate solutions. This process is repeated until convergence is reached and the best solution is found.
As a cost function we use randomized benchmarking (RB) sequences with a fixed number of m Clifford gates [27] averaged over K sequence realizations, see Fig. 2(a). This corresponds to evaluating only a single point in a standard RB measurement [49,50] which reduces the runtime to evaluate the cost function. We construct the Clifford gates by composing ±X/2 and ±Y /2 pulses, each based on the pulse shape defined by S k , see Fig. 2 The average ground state population p 0 (m) of the final qubit state defines the cost function, which is maximized by the optimizer. To estimate the fidelity of the optimized pulses we finally perform a full randomized benchmarking measurement.

B. Fidelity estimates of optimized short pulses
We optimize single-qubit pulses of varying duration ranging from N = 10 to N = 26 samples per pulse, corresponding to a duration τ = N ·f s ranging from 4.16 ns to 10.83 ns. We use K = 20 sequences of m = 120 Clifford gates. Each sequence is measured 1000 times using the Cost function per evaluation restless measurement protocol [38] at a rate of 100 kHz. We first use the CMA-ES based optimization procedure to calibrate DRAG pulses, defined in Eq. (2). For this we choose the amplitude A, the DRAG parameter β and the sideband frequency ω ssb as optimization parameters, i.e. S = {A, β, ω ssb }. The results of our CMA-ES based calibration is shown in Fig. 3 (blue circles). The resulting fidelities compare well with standard sequential error amplification calibration methods [46]. The optimized DRAG pulse then serves as initial guess for a second optimization step in which we extend S by the amplitude corrections to S = {A, β, ω ssb , a 1 , b 1 , ..., a N , b N }. For gates longer than τ = 6 ns we find a constant fidelity of F = 99.87(1) % both for the DRAG pulse and the piecewise-constant optimized pulse, see Fig. 3. For gates shorter than 6 ns we observe a decrease of fidelity for the DRAG pulses consistent with the 10/∆ limit (see the grey line in Fig. 3), while the fidelity of the piecewiseconstant optimized pulses remains constant even for the shortest gate duration. Drive power limitations prevent us from implementing gates shorter than 4 ns.
To assess the influence of leakage on the shortest 4.16 ns pulse displayed in Fig. 4(a) and Fig. 4(b) we follow the leakage randomized benchmarking protocol outlined in [37]. The leakage RB analysis requires measuring the probabilities p j to occupy the states |j with j ∈ {0, 1, 2} after the standard RB gate sequences. The probability p χ1 = p 0 + p 1 = 1 − p 2 of remaining in the computational subspace χ 1 = {|0 , |1 } is fitted using the decay model A + Bλ n 1 to find the average leakage per Clifford Here n is the number of Clifford gates while A, B, and λ 1 are fit parameters.
Using the extracted leakage decay Bλ n 1 we fit p 0 (n) using the double decay model A 0 + Bλ n 1 + C 0 λ n 2 to find the average Clifford gate fidelity The leakage rate of the optimized piecewise-constant pulses L PWC

III. DISCUSSION
Our results show that optimal pulse shaping using a piecewise-constant basis improves the gate fidelity of short pulses, reducing leakage errors by a factor of seven and standard errors by a factor of three. At longer gate durations, controlling the pulse shapes beyond analytical DRAG pulses does not improve the fidelity. All our pulses, aside from the DRAG pulses shorter than 5.5 ns, are limited to an error per gate of 0.13(1) % on average.
The fidelities that we measured are, however, not limited by the T 1 -time, which sets an error per gate limit of 5 · 10 −5 , see Fig. 3. Instead, the fidelity limitation we observe may be explained by a dephasing proportional to the Rabi rate of the drive [46], as illustrated by the simulated fidelities shown in Fig. 3 (see Methods).
The improvements with more complex pulse shapes come at the expense of long calibration times. Optimizing the longest pulse shape with N = 26 samples (i.e. 55 parameters) took up to 25 hours. To understand how this time can be reduced we have measured the time taken to create the pulse sequences, initialize the control electronics, and gather the data (see Methods section). Creating the pulse sequences and initializing the control electronics at each iteration consumes the most time. Gathering the required data is only a small fraction of the total experimental run time. With further improvements of the control electronics, for instance an internal generation of the 100 MHz side-band modulation, we expect further significant reductions in the overall runtime of the optimizer.
Our work demonstrates that optimizing -or calibrating -pulses with up to 55 parameters is experimentally feasible. This opens up the possibility to explore more elaborate optimal control methods on superconducting qubit platforms. We plan to extend this scheme to multiqubit gates, where system dynamics are more complex and analytic optimal control methods are not as developed as for single-qubit gates [16]. While a piecewiseconstant parametrization, as done for single-qubit gates, is harder due to the long duration of two-qubit gates, other analytical pulse representations, such as its spectral components, will be explored to improve on gate performance.

IV. METHODS
To optimize all parameters of the pulse shape simultaneously on the experimental setup, we have chosen the Covariance Matrix Adaptation -Evolution Strategy (CMA-ES) optimization algorithm as a noise-resilient and time-efficient optimizer [39]. This algorithm optimizes a population of λ candidate solutions which are normally distributed in the parameter space. The center and spread of the distibution are chosen as starting conditions of the optimization.
Generally, the choice of the population size λ is a tradeoff between fast convergence speed and avoiding local optima [39]. However, experimentally we have to consider the time required to process the pulse sequences (i.e. the time required to compile the pulse sequences into AWG files), to initialize the hardware (including data transfer) and to measure the cost function for different population sizes λ, see Fig. 5. We benchmark these three times using a set of 20 Clifford gate sequences per candidate solution, each with 100 Clifford gates. By dividing the total time required to evaluate the entire population by λ we calculate the effective time required to asses a single candidate solution. This allow us to gauge how efficiently the hardware is being used, see Fig. 5(c). The instrument initialization introduces a constant overhead, see Fig. 5(a), which decreases the efficiency of the optimization algorithm for lower population sizes. Single-point optimizers, such as Nelder-Mead [40], are thus an inefficient choice. However, when evaluating larger populations the contribution of the constant offset of the initialization is distributed over multiple measurements, leading to a convergence of the evaluation time per candidate. The data acquisition time is small due to our implementation of restless measurements [38] which allow for a 100 kHz repetition rate. The data analysis time is negligible in comparison to the three main contributions and is not included in the analysis.

A. Numerical results
To obtain the numerical results in Fig. 3 (blue dashed and red dotted lines) we model the qubit as a driven an-harmonic oscillator with a Hilbert space of dimension d = 4. We simulate the quantum dynamics and perform quantum optimal control with the q-optimize package [51] which is built using TensorFlow [52]. This allows for fast simulations of the dynamics of open and closed quantum systems and gradient-based optimization of the goal function via automatic-differentiation.
For each of the ±X/2 and ±Y /2 gates, we convolve both the DRAG and the piecewise-constant pulses, as sampled by the AWG, with a Gaussian window to produce a 0.3 ns rise-time, thus, emulating the limited bandwidth of the AWG. We obtain the dynamics as superoperator matrices Λ ±X/2 and Λ ±Y /2 by solving the master equation in Lindblad form, which includes T 1 and T 2 . Next, we compose the gates C i in the Clifford group C from these atomic operations, for example The pulse parameters S are optimized using the L-BFGS gradient search [53] to maximize the average Clifford gate fidelity F C = |C| −1 i F Ci . This optimizer is more efficient for numerical simulations than the CMA-ES optimizer if measurement noise is neglected and gradients can be computed. The fidelity of each Clifford gate is F Ci = 1 d+1 (dX 0,0 + 1). Here, X 0,0 is the (0, 0) th element of the Choi matrix X representing the gate error Λ = C † i • C i between the perfect Clifford gate C i and the implemented Clifford gate C i [54]. In addition to T 1 and T 2 relaxation we include an amplitude-dependent dephasing error channel D(ρ) = (1 − γ φ ) IρI +γ φ ZρZ, applied as a superoperator to each single-qubit operation composing a Clifford gate, e.g. C 6 = D•Λ −X/2 •D•Λ −Y /2 •D•Λ X/2 . The dephasing strength is γ φ = k · Ω · t gate , where Ω and t gate are the average amplitude and length, respectively, of the pulses implementing the gates. The constant k is chosen to match the data.