## Introduction

Superconducting qubits are a promising candidate to realize large-scale quantum computing systems1,2,3. The architecture is scalable4, microwave control electronics are well developed and readily available, and transmon-type qubit designs5 allow for stable operations. To accurately manipulate the quantum state of the weakly anharmonic qubit, control methods have been steadily improved to address common problems such as frequency crowding6,7,8 and cross talk9. In particular, with the powerful tools provided by open-loop optimal control theory preparing target states10,11,12 and gates13,14 can be realized with high fidelity. In these methods, numerically simulated system models are used to optimize hundreds of parameters that determine the shape of the control fields applied to the quantum system15,16. When the system model is accurate enough, the optimized control pulses can immediately be applied in experiment, yielding high performance and reliable control17,18,19. However, applying such control methods to superconducting qubits produces less accurate results in comparison to ion traps20,21 and nuclear magnetic resonance systems15,22 since models with sufficient accuracy are not available for superconducting qubits. Effects that are hard to accurately reproduce in simulation include instrument noise, transfer functions23,24,25, additional modes and coupling to unwanted quantum systems26,27,28,29. As a result, pulse shaping for superconducting qubits requires closed-loop optimal control, i.e., direct optimization on the experimental system, which limits the amount of tunable parameters defining the pulse shapes14,30.

Furthermore, optimal control can suppress leakage out of the computational subspace occurring for fast qubit gates. Fast gates are required to lower the limits on gate errors set by decoherence. Moreover, in combination with short readout times31 and fast qubit reset32,33 fast gates are useful to reduce the overall execution time of quantum algorithms, such as variational quantum eigensolvers34 that require many repetitions to gather statistics on quantum measurements35,36,37. However, fast qubit gates suffer from leakage effects and additional unitary errors caused by the large bandwidth of the short control pulse6. Reducing leakage out of the computational subspace is paramount for error correction since correcting such errors requires significantly more resources than correcting errors in the computational subspace38,39,40.

In this work, we increase the fidelity of short duration single-qubit gates. Control shapes are optimized in a closed-loop fashion to capture the full system dynamics and avoid model limitations. The resulting pulses significantly mitigate leakage effects. As optimization is limited by the experimental runtime, we investigate the time budget of the setup. We employ methods such as restless measurements41 to speed-up the optimization and use the covariance matrix adaptation-evolution strategy (CMA-ES) algorithm42 instead of, for instance, Nelder-Mead43 to handle the large number of parameters. With these improvements we experimentally optimize pulses with up to 55 parameters.

## Results

### Setup and system control

The system consists of two transmon-type fixed-frequency superconducting qubits44 coupled by a flux tunable coupler45,46, with an identical architecture as used in36. Experiments are carried out on one of the qubits with a transition frequency of ω01/2π = 5117.22 MHz, an anharmonicity Δ/2π = − 315.28 MHz and coherence times of 105 μs and 39 μs for T1 and T2, respectively. The qubit is controlled by microwave pulses applied via a readout resonator capacitively coupled to the qubit. Pulses consist of two control components Ωx(t) and Ωy(t), which are combined into a single drive signal Ω = Ωx + iΩy. The pulse is up-converted to the qubit frequency using a microwave vector signal generator in a single-sideband configuration.

The in-phase- and quadrature-(IQ) components in the mixing process are the real and imaginary parts of the pulse modulated at an intermediate sideband frequency $${{\Omega }}(t)\exp \{i({\omega }_{\text{ssb}}t+\phi )\}$$ with ωssb/2π = 100 MHz. Synthesizing this signals by an arbitrary waveform generator (AWG) results in real-time control over phase, frequency and amplitude47.

In a frame rotating at the qubit frequency, the transmon Hamiltonian is given by

$$\frac{{\hat{H}}^{\text{R}}}{\hslash }={{\Delta }}\left|2\right\rangle \left\langle 2\right|+\frac{{\widetilde{{{\Omega }}}}_{x}(t)}{2}\mathop{\sum }\limits_{j=1}^{2}{\hat{\sigma }}_{j,j-1}^{x}+\frac{{\widetilde{{{\Omega }}}}_{y}(t)}{2}\mathop{\sum }\limits_{j=1}^{2}{\hat{\sigma }}_{j,j-1}^{y},$$
(1)

where terms rotating at twice the qubit frequency have been omitted. Here, $${\widetilde{{{\Omega }}}}_{x,y}$$ are the IQ-components of the drive after the displacement due to the microwave resonator48. The ith level of the transmon is denoted by $$\left|i\right\rangle$$. The operators $${\hat{\sigma }}_{j,j-1}^{x}=\sqrt{j}\left(\left|j\right\rangle \left\langle j-1\right|+\left|j-1\right\rangle \left\langle j\right|\right)$$ and $${\hat{\sigma }}_{j,j-1}^{y}=i\sqrt{j}\left(\left|j\right\rangle \left\langle j-1\right|-\left|j-1\right\rangle \left\langle j\right|\right)$$ couple adjacent energy levels. Therefore, Ωx-pulses at the resonance frequency ω01 drive rotations about the x − axis of the Bloch sphere spanned by $$\{\left|0\right\rangle ,\left|1\right\rangle \}$$, see Fig. 1. The total area of the pulse envelope defines the rotation angle Θ. The rotation axis can be freely chosen in the xy-plane by changing the phase of the drive signal ϕ. By selecting ϕ = nπ/2 (n = 0, 1, …) and Θ = π/2, ±X/2 and ±Y/2 single-qubit operations are realized.

Since transmons have a low anharmonicity, fast pulses with a wide frequency response lead to leakage out of the computational subspace defined by the two lowest-lying energy eigenstates. This process is suppressed by derivative removal gates (DRAG)6,49,50, designed to reduce leakage and phase errors caused by inadvertent driving of the $$\left|1\right\rangle \leftrightarrow \left|2\right\rangle$$ transition. The first-order DRAG correction (Fig. 1(a); dashed lines) to a Gaussian shaped pulse $${{{\Omega }}}_{x}(t)=A\exp \left\{-{t}^{2}/(2{\sigma }^{2})\right\}$$ with amplitude A and width σ, is

$${{{\Omega }}}_{\text{DRAG}}(t)={{{\Omega }}}_{x}(t)+i\frac{\beta }{{{\Delta }}}\frac{d\ {{{\Omega }}}_{x}(t)}{dt}.$$
(2)

The correction in the imaginary component of ΩDRAG(t) with the scaling parameter β eliminates the spectral weight of the pulse at the $$\left|1\right\rangle \leftrightarrow \left|2\right\rangle$$ transition.

Although being designed for fast, short gates DRAG fails to produce high fidelities when the gate duration is lower than ~ 10/Δ6. To overcome this, either higher-order correction terms or pulses with more degrees of freedom have to be employed. To find suitable pulses we use a parameterization that applies a correction δn = an + ibn at each point in time to a calibrated DRAG pulse, similar to common optimal control approaches17,51. This results in a list of piecewise-constant (PWC) control amplitudes

$${{{\Omega }}}_{n}={{{\Omega }}}_{\text{DRAG}}(n{{\Delta }}t)+{\delta }_{n},$$
(3)

as shown in Fig. 1(a). The time discretization Δt is naturally given by the sampling rate of the AWG generating the pulse envelope. We use a Zurich Instruments HDAWG52 operating at a sampling rate of fs = 2.4GS/s. The optimization parameters are the amplitude corrections an and bn to the n-th sample of Ωx and Ωy, respectively, with the initial guess an = bn = 0.

### Pulse parameter optimization

Since the parametrization in Eq. (3) no longer permits an individual optimization of each parameter we simultaneously optimize all of them using the CMA-ES optimization algorithm42 (see “Methods” section). It is based on generating sets of parameters $${{\mathcal{S}}}^{k}$$ that describe k = 1, . . . , λ different pulse shapes as candidate solutions. The parameters in $${{\mathcal{S}}}^{k}$$ are defined by the parametrization of the pulse shape. The fidelity of each candidate solution is evaluated by a cost function, which serves to generate a new set of candidate solutions. This process is repeated until convergence is reached and the best solution is found.

As a cost function we use randomized benchmarking (RB) sequences with a fixed number of m Clifford gates30 averaged over K sequence realizations, see Fig. 2(a). This corresponds to evaluating only a single-point in a standard RB measurement53,54, which reduces the runtime to evaluate the cost function. We construct the Clifford gates by composing ±X/2 and ±Y/2 pulses, each based on the pulse shape defined by $${{\mathcal{S}}}^{k}$$, see Fig. 2(b). The average ground state population $${\overline{p}}_{0}(m)$$ of the final qubit state defines the cost function, which is maximized by the optimizer. To estimate the fidelity of the optimized pulses we finally perform a full randomized benchmarking measurement.

### Fidelity estimates of optimized short pulses

We optimize single-qubit pulses of varying duration ranging from N = 10 to N = 26 samples per pulse, corresponding to a duration τ = Nfs ranging from 4.16 ns to 10.83 ns. We use K = 20 sequences of m = 120 Clifford gates. Each sequence is measured 1000 times using the restless measurement protocol41 at a rate of 100 kHz. We first use the CMA-ES-based optimization procedure to calibrate DRAG pulses, defined in Eq. (2). For this we choose the amplitude A, the DRAG parameter β and the sideband frequency ωssb as optimization parameters, i.e., $${\mathcal{S}}=\left\{A,\beta ,{\omega }_{{\rm{ssb}}}\right\}$$. The results of our CMA-ES-based calibration are shown in Fig. 3 (blue circles). The resulting fidelities compare well with standard sequential error amplification calibration methods50. The optimized DRAG pulse then serves as initial guess for a second optimization step in which we extend $${\mathcal{S}}$$ by the amplitude corrections to $${\mathcal{S}}^{\prime} =\left\{A,\beta ,{\omega }_{{\rm{ssb}}},{a}_{1},{b}_{1},...,{a}_{N},{b}_{N}\right\}$$. Initializing the PWC optimization with optimal DRAG pulses reduces the number of iterations, as they already produce high fidelities (see Supplementary Notes 5).

For gates longer than τ = 5 ns we find a constant fidelity of F = 99.87(1) % both for the DRAG pulse and the piecewise-constant optimized pulse, see Fig. 3. For gates shorter than 5ns we observe a decrease of fidelity for the DRAG pulses consistent with the 10/Δ limit (see the gray line in Fig. 3), while the fidelity of the piecewise-constant optimized pulses remains constant even for the shortest gate duration. Note that drive power limitations prevent us from implementing gates shorter than 4ns. The simulated DRAG pulses have higher fidelity than the experimental pulses for gates below 5ns (dashed blue line in Fig. 3). This indicates the presence of unknown error sources, such as potential non-linearities in the transfer function at high power55, and demonstrates the effectiveness of the subsequent optimization as the PWC optimized pulses overcome these unknown effects. The optimized DRAG and PWC pulses are not decoherence limited as shown by the discrepancy between the noiseless simulations (solid lines) and the measured fidelities. Possible explanations for the observed noise floor are amplitude noise, phase noise, or a combination thereof in the drive signal, which we investigate numerically (dashed lines in Fig. 3) in Supplementary Note 3.

To assess the influence of leakage on the shortest 4.16 ns pulse displayed in Fig. 4(a) and (b) we follow the leakage randomized benchmarking protocol outlined in ref. 40. The leakage RB analysis requires measuring the probabilities pj to occupy the states $$\left|j\right\rangle$$ with j {0, 1, 2} after the standard RB gate sequences. The probability $${p}_{{\chi }_{1}}={p}_{0}+{p}_{1}=1-{p}_{2}$$ of remaining in the computational subspace $${\chi }_{1}=\{\left|0\right\rangle ,\left|1\right\rangle \}$$ is fitted using the decay model $$A+B{\lambda }_{1}^{n}$$ to find the average leakage per Clifford $${L}_{1}=\left(1-A\right)\left(1-{\lambda }_{1}\right)$$. Here, n is the number of Clifford gates while A, B, and λ1 are fit parameters.

Using the extracted leakage decay $$B{\lambda }_{1}^{n}$$ we fit p0(n) using the double decay model $${A}_{0}+B{\lambda }_{1}^{n}+{C}_{0}{\lambda }_{2}^{n}$$ to find the average Clifford gate fidelity

$$\overline{F}=\frac{1}{2}\left[{\lambda }_{2}+1-{L}_{1}\right].$$
(4)

The leakage rate of the optimized piecewise-constant pulses $${L}_{1}^{\,\text{PWC}\,}=0.044(25)\ \%$$ is five times lower than the leakage rate of the DRAG pulse $${L}_{1}^{\,\text{DRAG}\,}=0.29(3)\ \%$$, see Fig. 4(c). Additionally, we observe a reduction of standard errors from $$1-{\lambda }_{2}^{\,\text{DRAG}\,}=1.49(15)\ \%$$ to $$1-{\lambda }_{2}^{\,\text{PWC}\,}=0.44(15)\ \%$$, see Fig. 4(d). The resulting average fidelity per Clifford gate, computed using Eq. (4), is $${\overline{F}}_{\text{PWC}}=99.76(8)\ \%$$ for the piecewise-constant pulse and $${\overline{F}}_{\text{DRAG}}=99.11(8)\ \%$$ for the DRAG pulse.

## Discussion

Our results show that optimal pulse shaping using a piecewise-constant basis improves the gate fidelity of short pulses, reducing leakage errors by a factor of seven and standard errors by a factor of three. Increasing the fidelity of short pulses is beneficial for quantum computers56, in particular considering that the single-qubit gates have similar length than two-qubit gates in certain systems57. Moreover, the closed-loop optimal control method we have implemented allows us to overcome leakage in short pulses benefiting low anharmonicity qubits, as discussed in more detail in Supplementary Note 2.

At longer gate durations, controlling the pulse shapes beyond analytical DRAG pulses does not improve the fidelity. All our pulses, aside from the DRAG pulses shorter than 5.5 ns, are limited to an error per gate of 0.13(1) % on average. The fidelities that we measured are, however, not limited by the T1-time, which sets an error per gate limit of 5  10−5, see Fig. 3. Although we show that the observed fidelity limit can be reproduced using a combination of noises on the control electronics (see Supplementary Note 3), determining the exact composition of error sources is subject of ongoing research.

The improvements with more complex pulse shapes come at the expense of long calibration times. Optimizing the longest pulse shape with N = 26 samples (i.e., 55 parameters) took up to 25 h. To understand how this time can be reduced we have measured the time taken to create the pulse sequences, initialize the control electronics, and gather the data (see “Methods” section). Creating the pulse sequences and initializing the control electronics at each iteration consumes the most time. Gathering the required data is only a small fraction of the total experimental runtime. With further improvements of the control electronics, for instance an internal generation of the 100 MHz sideband modulation, we expect further significant reductions in the overall runtime of the optimizer.

Our work demonstrates that optimizing—or calibrating—pulses with up to 55 parameters is experimentally feasible. This opens up the possibility to explore more elaborate optimal control methods on superconducting qubit platforms. We plan to extend this scheme to multi-qubit gates, where system dynamics are more complex and analytic optimal control methods are not as developed as for single-qubit gates16. While a piecewise-constant parametrization, as done for single-qubit gates, is harder due to the long duration of two-qubit gates, other analytical pulse representations, such as its spectral components, will be explored to improve on gate performance.

## Methods

### Multidimensional parameter calibration

To optimize all parameters of the pulse shape simultaneously on the experimental setup, we have chosen the CMA-ES optimization algorithm as a noise-resilient and time-efficient optimizer42. This algorithm optimizes a population of λ candidate solutions, which are normally distributed in the parameter space. The center and spread of the distribution are given as starting conditions and are updated by the optimizer at each iteration42. See Supplementary Notes 6 for an example optimization showing the distribution of parameters during calibration.

Generally, the choice of the population size λ is a trade-off between fast convergence speed and avoiding local optima42. However, experimentally we have to consider the time required to process the pulse sequences (i.e., the time required to compile the pulse sequences into AWG files), to initialize the hardware (including data transfer) and to measure the cost function for different population sizes λ, see Fig. 5. We benchmark these three times using a set of 20 Clifford gate sequences per candidate solution, each with 100 Clifford gates. By dividing the total time required to evaluate the entire population by λ we calculate the effective time required to asses a single candidate solution. This allow us to gauge how efficiently the hardware is being used, see Fig. 5(c). The instrument initialization introduces a constant overhead, see Fig. 5(a), which decreases the efficiency of the optimization algorithm for lower population sizes. Single-point optimizers, such as Nelder-Mead43, are thus an inefficient choice. However, when evaluating larger populations the contribution of the constant offset of the initialization is distributed over multiple measurements, leading to a convergence of the evaluation time per candidate. The data acquisition time is small due to our implementation of restless measurements41, which allows for a 100 kHz repetition rate. The data analysis time is negligible in comparison to the three main contributions and is not included in the analysis.

### Numerical results

To obtain the numerical results (lines in Fig. 3) we model the qubit as a driven an-harmonic oscillator with a Hilbert space of dimension d = 4. See Supplementary Note 4 for a discussion on leakage to higher energy states. We simulate the quantum dynamics and perform quantum optimal control with the q-optimize package58,59. For each of the ± X/2 and ± Y/2 gates, we obtain the dynamics as superoperator60 matrices $${{\mathcal{E}}}_{\pm X/2}$$ and $${{\mathcal{E}}}_{\pm Y/2}$$ by solving the master equation in Lindblad form, which includes T1 and T2. Next, we compose the gates Ci in the Clifford group C from these atomic operations, for example $${C}_{6}={{\mathcal{E}}}_{-X/2}\circ {{\mathcal{E}}}_{-Y/2}\circ {{\mathcal{E}}}_{+X/2}$$. The pulse parameters $${\mathcal{S}}$$ are optimized using the L-BFGS gradient search61 to maximize the average Clifford gate fidelity $${F}_{C}={\left|C\right|}^{-1}{\sum }_{i}{F}_{{C}_{i}}$$, where $${F}_{{C}_{i}}$$ is the fidelity of each Clifford gate62. This optimizer is more efficient for numerical simulations than the CMA-ES optimizer if measurement noise is neglected and gradients can be computed. Additional details regarding the simulations are provided in Supplementary Notes 1.