Silicon qubit fidelities approaching incoherent noise limits via pulse engineering


Spin qubits created from gate-defined silicon metal–oxide–semiconductor quantum dots are a promising architecture for quantum computation. The high single qubit fidelities possible in these systems, combined with quantum error correcting codes, could potentially offer a route to fault-tolerant quantum computing. To achieve fault tolerance, however, gate error rates must be reduced to below a certain threshold and, in general, correlated errors must be removed. Here we show that pulse engineering techniques can be used to reduce the average Clifford gate error rates for silicon quantum dot spin qubits down to 0.043%. This represents a factor of three improvement over state-of-the-art silicon quantum dot devices and extends the randomized benchmarking coherence time to 9.4 ms. By including tomographically complete measurements in our randomized benchmarking, we infer a higher-order feature of the noise called the unitarity, which measures the coherence of noise. This, in turn, allows us to theoretically predict that average gate error rates as low as 0.026% may be achievable with further pulse improvements. These spin qubit fidelities are ultimately limited by incoherent noise, which we attribute to charge noise from the silicon device structure or the environment.


The implementation of fault-tolerant quantum computing systems will require precise control of qubits and error rates below the tolerance requirement for quantum error correction. In particular, qubits must be manipulated, coupled and measured with error rates well below 1% (refs. 1,2). Among semiconductor implementations, silicon quantum dot spin qubits have demonstrated average single-qubit Clifford gate error rates approaching this threshold3,4,5,6, with error rates of 0.14% in isotopically enriched 28Si/SiGe devices7.

In these previous demonstrations, the gate fidelities were characterized using Clifford-based randomized benchmarking. Randomized benchmarking8,9,10,11,12,13 is the gold standard for quantifying the performance of quantum gates and can be used to efficiently obtain accurate estimates of the average gate fidelity, independently of state preparation and measurement (SPAM) errors. The standard method for randomized benchmarking, however, is based on measuring many random gate sequences and is therefore designed to provide only an average of the system and not any further details about the noise. To improve quantum gates further, information about the characteristics of the noise process, such as its frequency spectrum and its primary source (whether it comes from qubit interaction with the environment or control errors), would be useful. Quantum state tomography methods can provide such information but are generally inefficient and highly sensitive to SPAM errors. To overcome these challenges, variants of randomized benchmarking that quantify higher-order noise features, as well as the average gate fidelity, have been developed14,15,16.

In an early example of this approach3, the randomized benchmarking data—demonstrating average Clifford gate fidelities of 99.59% in silicon metal–oxide–semiconductor (SiMOS) qubits—exhibited non-exponential decay features. These features were subsequently attributed to low-frequency detuning noise in the system17. Thus, the randomized benchmarking approach can also provide details about the noise characteristics, which could be used to further reduce the gate infidelity. In particular, low-frequency noise can be addressed with pulse engineering techniques that can exploit the quasi-static nature of the noise process. Such an approach could, in principle, lead to higher fidelities.

In this Article, we exploit recent developments in randomized benchmarking to give precise estimates of the average gate fidelity. We use pulse engineering techniques to increase the average Clifford gate fidelity of single-qubit gate operations from 99.83% to 99.96% on the same SiMOS quantum dot (Fig. 1). This increased gate fidelity represents a 3.2 times improvement compared with state-of-the-art silicon devices7. In terms of coherence times, and compared to using standard square pulses, this leads to an improvement in randomized benchmarking coherence time \(T_2^{{\mathrm{RB}}}\) from 620 μs to 9.4 ms, an increase of 15 times. This improved coherence time is also 180 times longer than in state-of-the-art silicon devices (\(T_2^{{\mathrm{RB}}} = 52\,{\mathrm{\mu s}}\); ref. 7); see Supplementary Table 1 for a detailed comparison with earlier work. Unlike filtering approaches based on dynamical-decoupling (DD) pulses such as Carr–Purcell–Meiboom–Gill (CPMG), which assume that the spin is in a certain state and with no computational degree of freedom, the \(T_2^{{\mathrm{RB}}}\) gives a more practical benchmarking metric for qubits serving as memories. However, \(T_2^{{\mathrm{RB}}}\) is still much shorter than the spontaneous emission time of the qubit (T1 ≈ 1 s). Accordingly, the qubit performance is not limited by relaxation processes.

Fig. 1: Device image, experimental set-up and GRAPE-optimized Clifford gates.

a, Scanning electron micrograph (SEM) image of a SiMOS qubit device with the design studied here. The quantum dot (Q) that holds our qubit is confined by gate CB laterally and under gate G1, and has a diameter of ~40 nm. An in-phase and quadrature (IQ)-modulated microwave source is connected to the electron spin resonance (ESR) line, driving the control field B1. \(\Omega^{\prime}_x\) and \(\Omega^{\prime}_y\) are the microwave source amplitudes. VTG and VG1 are controlled/pulsed during the experiment, and spin-to-charge conversion is detected via Isensor. b, Axes corresponding to terms of the Hamiltonian acting on the qubit in a rotating frame referenced to the microwave, where \(H = \Omega _x\sigma _x + \Omega _y\sigma _y + \epsilon _z\sigma _z\). The qubit sees B1 as effective Ωx and Ωy control axes. Additional noise from \(\epsilon _z\) acts on the direction of the d.c. field B0. c, Microwave modulation Ωx (blue) and Ωy (red) for the seven basic types of Clifford gate (I, X, Z, 2π/3, X/2, Z/2, H), found through GRAPE iteration. Standard square pulses (black) for I, X, X/2 are used to construct the standard Clifford sequence. All other gates can be constructed with a phase shift on the microwave, for example Y-phase gates are generated by having a π/2 phase offset.

Furthermore, by using tomographic measurements in our randomized benchmarking, we are able to quantify the unitarity14 of the noise: a higher-order feature of the noise that quantifies the average change in the purity of a state, averaged over a given gate set (Fig. 2). Extracting the unitarity can help quantify the coherence in the noise independently of the error rate. By measuring both the unitarity and the average error rate, we can estimate how much of an experimental error budget is due to control errors and low-frequency noise and how much is due to uncorrectable decoherence. Our measurements demonstrate that the improved gate performance via pulse engineering is primarily due to the reduced unitary component of the noise, which also suggests that greater gains could potentially be made from further improvements in pulse engineering and control.

Fig. 2: Density matrix reconstruction through tomographic readout.

a, Single shot sequencing of tomographic readout. Immediately after the control pulse sequence finishes, the projection pulse follows, cycling through spin projection axes as X → Y → Z → −X → −Y → −Z → X and so on. b, Microwave modulation for the six axis readout pulses. Ωproj is the master projection pulse modulation, and the other six axis readout pulses are generated via mask multiplexing (grey shaded area) of the microwave output. c, XYZ spin projection, SX,Y,Z of a Rabi oscillation chevron map, with varying microwave pulse time τESR and frequency shift ΔfESR, measured and maintained with feedback control (Fig. 3) over 14 h. d, Coloured density matrix of c. The colour scale of e is used. e, CIELAB colour space coded Bloch sphere, with the following prime axis colours: ±X (−b* channel, blue–yellow), ±Y (a* channel, red–green), ±Z (L channel, white–black); the centre point colour grey represents a fully dephased state. (Colour may saturate due to conversion to RGB.) f, Simulation of the chevron map in d, with 80% readout visibility and exact 1.75 μs π-pulse time; no fitting attempted.

Pulse engineering and calibration

Figure 1 provides details of our qubit experiment and the shaped optimized pulse used. Our previous study into the cause of qubit gate infidelity for SiMOS qubits17 identified low-frequency drift in the qubit detuning as the dominant noise term. The timescale of this drift process is very long compared to the timescale for control of the qubit, enabling pulse engineering techniques to be used to identify compensating pulses for this noise term. Specifically, we use gradient ascent pulse engineering (GRAPE)18 to identify pulses for our qubit control that are robust against low-frequency detuning noise. This method uses a theoretical model for the noise, together with gradient ascent methods to identify (locally) optimal pulses.

We identified seven improved Clifford gate operators by using this procedure, as detailed in the Methods and Fig. 1c. The full set of 24 single-qubit Clifford gates can be achieved simply by phase-shifting one of the seven basic operators (manipulating the sign of Ωx and Ωy and/or swapping them). For example, a Y gate can be constructed by swapping Ωx and Ωy of the X gate.
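The quadrature swap described above can be checked numerically with square pulses. The following is a minimal sketch, not the authors' code: it assumes the propagator convention U = exp(−iπHt) with H in hertz, chosen so that an amplitude of 1/(2Tπ) produces a π rotation in a time Tπ, and the function names are illustrative.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

T_PI = 1.75e-6              # square pi-pulse time quoted in the text (s)
OMEGA_MAX = 1 / (2 * T_PI)  # about 285.7 kHz
DT = 10e-9                  # 10 ns sample spacing

def step_unitary(ox, oy, ez, dt):
    """Propagator for constant controls over dt (assumed: U = exp(-i*pi*H*dt))."""
    norm = np.sqrt(ox**2 + oy**2 + ez**2)
    if norm == 0:
        return np.eye(2, dtype=complex)
    axis = (ox * SX + oy * SY + ez * SZ) / norm
    half_angle = np.pi * norm * dt
    return np.cos(half_angle) * np.eye(2) - 1j * np.sin(half_angle) * axis

def pulse_unitary(omega_x, omega_y, ez=0.0, dt=DT):
    """Multiply the piecewise-constant propagators in time order."""
    u = np.eye(2, dtype=complex)
    for ox, oy in zip(omega_x, omega_y):
        u = step_unitary(ox, oy, ez, dt) @ u
    return u

def process_fidelity(u, target):
    """Phase-insensitive overlap |Tr(V^dagger U)|^2 / d^2 for d = 2."""
    return abs(np.trace(target.conj().T @ u)) ** 2 / 4

n = 175                                   # 175 x 10 ns = 1.75 us
x_pulse = (np.full(n, OMEGA_MAX), np.zeros(n))
y_pulse = (x_pulse[1], x_pulse[0])        # swap the two quadratures

f_x = process_fidelity(pulse_unitary(*x_pulse), SX)
f_y = process_fidelity(pulse_unitary(*y_pulse), SY)
print(f_x, f_y)
```

The same swap applies sample by sample to a shaped pulse, because it only relabels which quadrature carries each amplitude.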

Four different controllers, illustrated in Fig. 3, were used to ensure that the spin qubit environment and control parameters did not drift during the entire 35 h experiment (see Methods for full details). Calibration data in Fig. 3f suggest that the main source of noise, \(\epsilon _z\), comes from 29Si nuclear spins: the resonance frequency changes in a strongly step-like manner and shows no clear correlation with charge rearrangement (Fig. 3b,d). Similar nuclear-spin-like behaviour has been observed in the same device while operating in two-qubit mode19.

Fig. 3: Feedback control and calibration for randomized benchmarking over 35 h.

a, Feedback control of the single electron transistor (SET) sensor bias current. The difference between the SET current and the desired bias current is fed back into VTG, with a gain of β. b, Change of VTG over the whole randomized benchmarking measurement period. c, Feedback control of the spin to charge readout level. The dark blip counts (blips that occur at the later stage of the readout window) are maintained at a particular rate via changing VG1, ensuring that the readout visibility is constant. d, Change of VG1 over the whole randomized benchmarking measurement period. e, Calibration of resonance frequency. The resonance frequency is tracked by taking the difference of two ESR pulse sequences; one does an X/2 followed by Y/2, and the other X/2 followed by −Y/2. The two pulse sequences are interlaced in a single acquisition with a total of 500 single shots, taking ~5 s to execute. f, Change of fESR over the whole randomized benchmarking measurement period. g, Calibration of ESR amplitude, ensuring a fixed 1.75 μs π-pulse time. The process is similar to the process in e, but now both sequences perform an X/2 pulse 32 times followed by an X/2 or −X/2. h, Change of microwave amplitude ΩESR over the whole randomized benchmarking measurement period. No strong correlations are observed among the four feedback parameters over 35 h, suggesting that most slow drifts/glitches in the qubit system are independent.

Qubit tomography

Figure 2a presents an example of a single shot sequence during randomized benchmarking. The projection pulses that measure each spin projection are shown in Fig. 2b. They are designed so that they can be easily multiplexed and have built-in echoing ability. To confirm that these projection pulses are able to correctly construct a robust density matrix, a tomographic Rabi chevron map was constructed, as shown in Fig. 2c,d, combined with the calibration technique described above. To make the correlation between XYZ axes clearer, the colour-coded density matrix map integrates all the spin projection maps into one. The measured data (Fig. 2d) and the non-fitted simulation (Fig. 2f) appear nearly identical, apart from the fact that the simulated map has less background white noise. We can still see XY phase oscillation for both sets of data, even at far detuning (ΔfESR > 3 MHz), where a normal Z projection-only Rabi map would appear to have no readout signal. The coloured Rabi chevron measurement serves as a strong validation of our tomographic readout, feedback control and microwave calibration, and confirms the data quality in our randomized benchmarking experiment.
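As an illustration of how six-axis projections can be turned into a density matrix, the sketch below takes each Bloch component as the difference between the spin-up fractions on the + and − axes. This estimator is an assumption made for illustration (the text does not spell out the reconstruction), and the function names and example numbers are hypothetical.

```python
import numpy as np

def bloch_vector(up_fraction):
    """Bloch components from spin-up fractions on the +/-X, +/-Y, +/-Z axes.

    Assumed estimator: s_a = f(+a) - f(-a), which cancels a readout
    offset common to both directions.
    """
    return np.array([up_fraction["+" + a] - up_fraction["-" + a] for a in "XYZ"])

def density_matrix(s):
    """Single-qubit density matrix rho = (I + s . sigma) / 2."""
    sx, sy, sz = s
    return 0.5 * np.array([[1 + sz, sx - 1j * sy],
                           [sx + 1j * sy, 1 - sz]])

# Hypothetical example: a spin-up state read out with 80% visibility.
example = {"+X": 0.5, "-X": 0.5, "+Y": 0.5, "-Y": 0.5, "+Z": 0.9, "-Z": 0.1}
s = bloch_vector(example)
rho = density_matrix(s)
print(s)
print(rho)
```

With 120 single shots per density matrix and six axes, each fraction in such a scheme would come from 20 shots, so individual components are noisy before averaging.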

Randomized benchmarking

We assessed the performance of our improved gates using randomized benchmarking8,9,10,11,12. Randomized benchmarking and its variants are fully scalable protocols that allow for the partial characterization of quantum devices. Here we use a variant of randomized benchmarking to determine the average gate fidelity as well as the coherence (unitarity) of the noise14. An overview of randomized benchmarking is provided in the Methods.

The results of our randomized benchmarking experiments, determining the average gate fidelities of both the original (square) pulses scheme S and the improved optimized pulses scheme O, are shown in Fig. 4. Both pulse schemes are performed for each of the measurement projections in an alternating manner, using the identical square projection pulses shown in Fig. 2b, with calibrations activated. For scheme S, this gave a measured randomized benchmarking decay factor (p) of 99.66(5)%, which equates to an average per-Clifford fidelity of 99.83(2)%. With scheme O, this resulted in a decay factor of 99.914(9)%, which equates to an average per-Clifford fidelity of 99.957(4)%, where the error indicates the 95% confidence levels. For comparison purposes we note that the literature often reports not only a Clifford gate fidelity but also a fidelity based on gate generators. Here we report only the fidelity returned by randomized benchmarking, namely the per Clifford fidelity. The relevant comparison fidelities are therefore the 99.96% achieved here, compared to 99.86% (ref. 7), 99.90% (ref. 20) and 99.24% (refs. 3,17). Fitting assumptions and methods are detailed in the Methods. Bayesian analysis was carried out, leading to the tight credible regions seen in Fig. 4a.
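The conversion from decay factor to average fidelity quoted above follows the standard relation F = p + (1 − p)/d, which reduces to (1 + p)/2 for a single qubit (d = 2). A minimal sketch, with an illustrative function name:

```python
def average_fidelity(p, d=2):
    """Average gate fidelity from the randomized benchmarking decay factor p."""
    return p + (1 - p) / d

for label, p in [("S (square)", 0.9966), ("O (optimized)", 0.99914)]:
    f = average_fidelity(p)
    print(f"scheme {label}: F = {100 * f:.3f}%, error rate = {100 * (1 - f):.3f}%")
```

Because the relation is linear, an uncertainty in p maps to half that uncertainty in F for d = 2.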

Fig. 4: Randomized benchmarking result and noise profile.

a, Clifford gate infidelity (1 − F) and incoherence (ω) (see equation (2)) for sequences using square pulses (scheme S) and GRAPE-optimized pulses (scheme O), both with calibrations activated. Error bars are calculated using weighted nonlinear least squares. The value of ω (here in red) indicates the amount of infidelity (grey) that is attributable to incoherent noise. Perfect unitary control should allow the infidelity to be reduced to the value of ω. Green lines are sequential Monte Carlo (SMC) estimates of the pulse fidelities, which show the tight credible region on the estimate of the average fidelity for the reference optimized-pulse data set. b, Analysis of the interleaved gates for scheme O. The error bars to the side of the interleaved gate sequences are calculated using the original method of ref. 34 and then using an improved method incorporating unitarity23. c, Noise frequency impact on the different types of randomized benchmarking scheme. With scheme N, standard square pulses without feedback calibration (not performed in this work), any noise frequency starting from its sub-gate time to d.c. would affect the system. For scheme S, the calibration should reduce the effect of noise slower than the calibration period. With tomographic readout, the measurement of incoherence will also remain unaffected by noise slower than the tomography time. For scheme O, both fidelity and coherence will have a reduced impact from noise slower than the actual gate time, but may be more affected by higher frequency noise—up to its shaped-pulse bandwidth. Other d.c. errors will have a direct impact on fidelity, but less so on incoherence. Error bars indicate 95% confidence levels. *Sequence time varies with Clifford gate length.

Unitarity and coherence

The data obtained from the tomographic measurements in our randomized benchmarking experiments allow us to determine the unitarity, which is a higher-order feature of the noise afflicting the system14. The unitarity can be used to distinguish ‘unitary’ errors, which may arise, for example, from control errors and/or low-frequency noise, from stochastic errors (which are generally associated with high-frequency noise). The low-frequency noise can be treated as a non-stochastic error based on the assumption that the noise in each single shot is identical during each tomography sequence time, which is the time where 120 single shots are measured in order to reconstruct a single density matrix.

For a system of dimension d, the unitarity is defined as an integral of pure states (ψ) over the Haar measure:

$$u({\cal E}) = \frac{d}{{d - 1}}{\int} {\mathrm{d}} \psi \,{\mathrm{Tr}}\left[ {{\cal E}\left( {\left| \psi \right\rangle \left\langle \psi \right| - \frac{1}{d}{\Bbb I}} \right)^2} \right]\tag{1}$$

and provides a measure to characterize the noise within the range 0–1: completely coherent noise corresponds to where the unitarity achieves its maximum value of 1; completely depolarizing noise corresponds to its minimum value. The minimum value depends on the fidelity and we have \(u({\cal E}) \ge [1 - \frac{{dr}}{{(d - 1)}}]^2\), which is saturated by a completely depolarizing channel with average infidelity r = 1 − F. Another way to think of unitarity is to note its equivalence to the averaged squared length of the generalized Bloch vector after applying \({\cal E}\) with the component due to the identity subtracted. We can define a new quantity, the incoherence16, which is related to the unitarity as follows:

$$\omega ({\cal E}) = \frac{{d - 1}}{d}\left( {1 - \sqrt {u({\cal E})} } \right)\tag{2}$$

The incoherence is defined so that it takes a maximum value given by the infidelity and a minimum value of 0 (purely coherent noise), so that \(0 \le \omega ({\cal E}) \le r({\cal E})\). The value of the incoherence represents the minimum infidelity that might be achievable if one had perfect unitary control over the system. The incoherence (when compared to the infidelity) directly gives an indication of the amount of the infidelity that can be attributed to incoherent (statistical) noise sources. See Fig. 5d for a geometrical comparison between infidelity and incoherence.
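The two limits of the incoherence can be illustrated numerically. The sketch below assumes the channel is given as a Pauli transfer matrix (PTM) and uses the equivalent form of the unitarity from ref. 14, u = Tr(E_u^T E_u)/(d² − 1), where E_u is the unital block of the PTM; the function names and example channels are illustrative.

```python
import numpy as np

def unitarity_from_ptm(ptm):
    """Unitarity from the unital (non-identity) block of a real PTM."""
    unital = ptm[1:, 1:]
    return float(np.trace(unital.T @ unital) / (ptm.shape[0] - 1))

def infidelity_from_ptm(ptm):
    """Average gate infidelity r = 1 - F from the PTM trace."""
    d2 = ptm.shape[0]
    d = int(round(np.sqrt(d2)))
    f_process = np.trace(ptm) / d2
    return float(1 - (d * f_process + 1) / (d + 1))

def incoherence(u, d=2):
    """Incoherence omega = (d - 1)/d * (1 - sqrt(u))."""
    return (d - 1) / d * (1 - np.sqrt(u))

# Purely incoherent example: depolarizing channel with decay factor p.
p = 0.999
depolarizing = np.diag([1.0, p, p, p])

# Purely coherent example: small over-rotation about Z.
phi = 0.05
c, s = np.cos(phi), np.sin(phi)
rotation = np.array([[1, 0, 0, 0],
                     [0, c, -s, 0],
                     [0, s, c, 0],
                     [0, 0, 0, 1.0]])

for name, ptm in [("depolarizing", depolarizing), ("rotation", rotation)]:
    r = infidelity_from_ptm(ptm)
    w = incoherence(unitarity_from_ptm(ptm))
    print(f"{name}: r = {r:.3e}, omega = {w:.3e}")
```

In this sketch the depolarizing channel saturates ω = r, while the rotation has a non-zero infidelity but vanishing incoherence, matching the bounds stated above.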

Fig. 5: Randomized benchmarking experimental data.

a, The realigned partial density matrix data for each acquisition, produced via tomographic readout. The density matrix is realigned to spin up with respect to its expected final state (±XYZ), and the phase information is removed. The partial density matrix is colour encoded via the semicircle shown in d. b, The sequence of randomized benchmarking experiments. Each grid box represents the collection of density matrices for a single interleaved gate, formatted as in a. (1) Step through each interleaved gate sequence using standard square pulses: reference, I, X, Y, X/2, −X/2, Y/2, −Y/2, for a total of eight acquisitions. (2) Similar to (1) but with GRAPE-optimized pulses, for a total of eight acquisitions. c, Sequence of randomized benchmarking experiments after stepping through each interleaved gate sequence. (3) Repeat the 8 + 8 = 16 acquisitions five times, stacked up on the y axis. (4) Step through sequence lengths m of [1–6, 8, 10, 13, 16, 20, 25, 32, 40, 50, 63, 79, 100, 126, 158, 200, 251, 316, 398, 501, 631, 794, 1,000, 1,259, 1,585, 1,995, 2,512, 3,162], stacked up on the x axis, for a total of 33 steps. (5) Repeat everything again, stacked up on the y axis, for a total of nine repetitions. d, Colour semicircle representation of a partial (phase-less) density matrix. Greater lightness means higher fidelity, and higher red component means higher XY spin component mixed in.

The incoherence can therefore be used to estimate useful information about the type of noise afflicting the system, as well as to provide a guide as to how much improvement in fidelity can be achieved by correcting purely coherent errors (such as over-rotations). Furthermore, it can be used to provide tighter bounds on the likely diamond distance of the average noise channel21,22 and to reduce uncertainty in the interleaved benchmarking protocol23 (Fig. 4b).

The incoherence (equation (2)) for scheme S is 0.53(10) × 10−3 and for scheme O is 0.26(08) × 10−3. Using the incoherence allows a direct comparison with the reported infidelities (Fig. 4a). With scheme S the incoherence is ~30% of the infidelity, and with the improved pulses it is 61% of the infidelity. Two conclusions can be drawn from this. First, the data provide strong, quantitative evidence that the improved pulses have reduced the errors on the gates primarily by reducing coherent errors. Second, we observe that the infidelity for the optimized pulses is below the incoherence for the square pulses, and that there are still coherent errors in the improved gates. Therefore, by using scheme O we have not only improved our unitary control but have also reduced the incoherent noise in the system.
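The error budget can be reproduced approximately from the rounded values quoted in the text. In this sketch the scheme O incoherence is taken as 0.026%, the achievable error rate stated in the abstract, so the resulting fractions agree with the quoted percentages only to within rounding.

```python
# Infidelities from the quoted per-Clifford fidelities (rounded values).
r_square = 1 - 0.9983
r_optimized = 1 - 0.99957

# Incoherence: scheme S from the text; scheme O assumed equal to the
# 0.026% achievable error rate quoted in the abstract.
w_square = 0.53e-3
w_optimized = 0.026e-2

frac_square = w_square / r_square
frac_optimized = w_optimized / r_optimized
print(f"incoherent fraction, scheme S: {frac_square:.2f}")
print(f"incoherent fraction, scheme O: {frac_optimized:.2f}")
print("optimized infidelity below square-pulse incoherence:",
      r_optimized < w_square)
```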

Figure 4c presents an intuitive explanation of these results. The shaded regions in each row correspond to the effective frequency of noise that can couple into the pulsing schemes, where different schemes act as different noise filters over their fidelities and coherences. Scheme O minimizes the effect of noise on timescales greater than 8 μs, decreasing the infidelity of the system and the coherence of the remaining infidelity. The small trade-off, however, is that the pulse-optimized gates are slightly more susceptible to higher-frequency noise, up to the bandwidth of the pulse, leaving us with some coherent noise. This trade-off is worthwhile because the noise spectrum generally follows a 1/f trend, so less noise power is present at these higher frequencies. Finally, near-d.c. imperfections such as miscalibrations and microwave phase errors will also degrade the fidelity, but with a lesser impact on the incoherence.


We have shown that the unitarity can be used to characterize noise and also as a tool to increase gate fidelities. The Clifford gate fidelities for a single qubit reported here (99.957%) are higher than those reported for other contemporary electronic platforms for scalable quantum computers, such as donor spin qubits (99.90%)20 and superconducting qubits (99.89%)24. (For this comparison, we have normalized all reported fidelities to per Clifford fidelities, rather than fidelities for non-composite pulses.) In addition, these fidelities have the potential to reach the level of atomic spin platforms such as trapped ion qubits (99.993%)25 and nitrogen-vacancy centre qubits in diamond (99.995%)26. Specifically, the data indicate that, with the improved pulses, the fidelity of the gates could be as high as 99.974% if perfect unitary control can be achieved. Furthermore, if combined with suitable qubit driving (such as the π-pulse time of 120 ns achieved in previous experiments7), the long \(T_2^{{\mathrm{RB}}}\) reported here would lead to a control fidelity exceeding 99.998% in silicon.


Methods

Stochastic GRAPE

We model our qubit system using the Hamiltonian \(H = \Omega _x\sigma _x + \Omega _y\sigma _y + \epsilon _z\sigma _z\), where Ωx/Ωy are the I/Q (in-phase/quadrature) microwave amplitudes, and \(\epsilon _z\) is a fixed (d.c.) random variable representing the Z detuning for a single pulse sequence.

The amplitudes Ωx and Ωy, as functions of time, are the two controls available that define our shaped microwave pulse. In each iteration of GRAPE, we calculate the derivative \(\frac{{\delta \Psi }}{{\delta \Omega }}\) of the target operator fidelity Ψ with respect to each sample point of Ωx and Ωy, and update the samples accordingly to maximize Ψ. Our GRAPE implementation is stochastic, sampling \(\epsilon _z\) on every iteration from a Gaussian distribution with noise strength \(\frac{1}{{2T_2^ \ast }} = 16.7\,{\mathrm{kHz}}\), where \(T_2^ \ast = 30\,{\mathrm{\mu s}}\). (Note that \(\epsilon _z\) is constant within a single iteration.) In our search for improved pulses, we constrain the maximum pulse length to 8 μs, about 4.6 times the 1.75 μs length of a square π pulse. The amplitude of each pulse is also constrained by \(\Omega _x^2 + \Omega _y^2 \le \Omega _{{\mathrm{max}}}^2\), where \(\Omega _{{\mathrm{max}}} = \frac{1}{{2T_\pi }} = \frac{1}{{2 \times 1.75\,\mu {\mathrm{s}}}} = 285.7\,{\mathrm{kHz}}\) is the maximum allowed effective B1 amplitude.

Each optimized pulse is constructed from 800 Ω samples at a sample interval of 10 ns, giving a total length of 8 μs.

For a given, small, learning factor η, a single iteration step can be written as follows:

  1. Randomize \(\epsilon _z\)
  2. Calculate \(\frac{{\delta \Psi }}{{\delta \Omega }}\) for all Ω pointwise, with the current Hamiltonian H
  3. Update \(\Omega \to \Omega + \eta \frac{{\delta \Psi }}{{\delta \Omega }}\)
  4. Filter Ω for smoothness and bound condition \(\Omega _{{\mathrm{max}}}^2 \ge \Omega _x^2 + \Omega _y^2\)
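The four steps above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the authors' MATLAB code: it assumes the propagator convention U = exp(−iπHt) with H in MHz and t in μs, uses the standard first-order GRAPE gradient, replaces the 800 × 10 ns grid with a coarse 40 × 0.2 μs grid for speed, and uses a three-point moving average as a stand-in for the unspecified smoothness filter. All function names are hypothetical.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Units: MHz and microseconds.
OMEGA_MAX = 1 / (2 * 1.75)   # 0.2857 MHz, from T_pi = 1.75 us
SIGMA_EZ = 1 / (2 * 30.0)    # 0.0167 MHz, from T2* = 30 us
N, DT = 40, 0.2              # coarse grid (assumption); 40 x 0.2 us = 8 us

def step_unitary(ox, oy, ez):
    """Propagator for one sample (assumed: U = exp(-i*pi*H*DT))."""
    norm = np.sqrt(ox**2 + oy**2 + ez**2)
    if norm == 0:
        return I2.copy()
    axis = (ox * SX + oy * SY + ez * SZ) / norm
    half = np.pi * norm * DT
    return np.cos(half) * I2 - 1j * np.sin(half) * axis

def fidelity_and_gradient(omega, ez, target):
    """Fidelity |Tr(V^dag U)|^2 / 4 and its first-order gradient.

    omega has shape (2, N): row 0 is Omega_x, row 1 is Omega_y.
    """
    steps = [step_unitary(ox, oy, ez) for ox, oy in omega.T]
    fwd = [I2]                       # fwd[k] = U_k ... U_1
    for u in steps:
        fwd.append(u @ fwd[-1])
    bwd = [I2]
    for u in reversed(steps):
        bwd.append(bwd[-1] @ u)
    bwd.reverse()                    # bwd[k] = U_N ... U_{k+1}
    vdag = target.conj().T
    overlap = np.trace(vdag @ fwd[-1])
    grad = np.zeros_like(omega)
    for k in range(1, N + 1):
        for a, sigma in enumerate((SX, SY)):
            d_overlap = np.trace(
                vdag @ bwd[k] @ (-1j * np.pi * DT * sigma) @ fwd[k])
            grad[a, k - 1] = 0.5 * np.real(np.conj(overlap) * d_overlap)
    return abs(overlap) ** 2 / 4, grad

def filter_pulse(omega):
    """Smooth each quadrature, then rescale samples that exceed OMEGA_MAX."""
    kernel = np.array([0.25, 0.5, 0.25])
    smooth = np.vstack([np.convolve(row, kernel, mode="same") for row in omega])
    radius = np.hypot(smooth[0], smooth[1])
    scale = np.minimum(1.0, OMEGA_MAX / np.maximum(radius, 1e-12))
    return smooth * scale

def grape_iteration(omega, target, eta, rng):
    ez = rng.normal(0.0, SIGMA_EZ)                       # step 1
    _, grad = fidelity_and_gradient(omega, ez, target)   # step 2
    omega = omega + eta * grad                           # step 3
    return filter_pulse(omega)                           # step 4

rng = np.random.default_rng(0)
omega = np.vstack([np.full(N, 0.5 * OMEGA_MAX), np.zeros(N)])
for _ in range(50):
    omega = grape_iteration(omega, SX, 0.01, rng)
print(fidelity_and_gradient(omega, 0.0, SX)[0])
```

The learning factor, iteration count and starting pulse here are arbitrary choices; a practical search would run many more iterations and evaluate the result over a distribution of detunings rather than at a single value.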

The pulse optimization runs at roughly 100 iterations per second in MATLAB; within a few minutes, it finds solutions with infidelities close to those of the pulses in Fig. 1c. Here, we optimize seven basic Clifford gate operators using the GRAPE method described above. These basic gates can be expanded to the complete group of 24 Clifford gates by phase-shifting one of the seven basic operators (manipulating the sign of Ωx and Ωy, and/or swapping them). For example, a Y gate can be constructed by swapping Ωx and Ωy of the X gate. Figure 1c shows the optimized Clifford gates that were found and used for the randomized benchmarking experiment. The normal square pulses in black are plotted on the same scale for comparison. See the Supplementary Information for analysis of the expected performance of these gate pulses.

Experimental set-up

The device being measured is the same as that described in refs. 19,27, fabricated on an isotopically enriched 900 nm 28Si epilayer28 with an 800 ppm residual concentration of 29Si with multi-level gate-stack silicon MOS technology29,30. The measurements were conducted in a wet dilution refrigerator with base temperature of T = 20 mK. Stanford Research System SIM928 rechargeable isolated voltage sources were used to supply all the d.c. voltages, and a LeCroy ArbStudio 1104 arbitrary waveform generator (AWG) was combined with the d.c. voltages through a resistive voltage divider (1/5 for d.c., 1/25 for AWG). The shaped microwave pulses were delivered by an Agilent E8267D vector signal generator; we used its own internal AWG for IQ modulation. The SET current signals were detected by a FEMTO transimpedance amplifier DLPCA-200 and finally acquired using an Alazar ATS9440 waveform digitizer with a PCIe interface.

Supplementary Fig. 3 presents the stability diagram and read/control point for the qubit. Notice that there is a faint horizontal transition that shows there is a quantum dot sitting under G2; this is being controlled in our other work on two-qubit randomized benchmarking19.

Randomized benchmarking

Randomized benchmarking sequence

We perform the randomized benchmarking experiment using the methods described earlier. The results are shown in the main text (Fig. 4) and in Fig. 5. We also present the data as follows: for every measurement acquisition of a randomized benchmarking sequence, we obtain a density matrix that is reconstructed from 120 single shot spin readouts with tomographic measurement (see main text). The density matrix is rotated so that its expected final state is aligned to spin up (+Z), and the XY phase angle is then removed while its magnitude is maintained. This produces a realigned partial density matrix map that is colour encoded in Fig. 5a according to the colour semicircle in Fig. 5d. The maps are grouped by interleaved gate and contain the complete measurement data set for every single acquisition studied in this Article, before any averaging or analysis takes place. To present the measurement data in as raw a form as possible, no corrections (including SPAM error renormalization) were performed other than removing the realigned phase information, which has only trivial physical meaning in a randomized benchmarking experiment. A brighter colour point means that the measured final state of a randomized benchmarking sequence has higher fidelity. If red is mixed into a data point, this suggests that unitary errors have occurred, which may result in a measurement with low fidelity but high coherence/unitarity (analysis in Fig. 4). Note that there is no colour saturation in Fig. 5a, meaning that there are no data compression losses other than those imposed by the viewing/printing device for this Article. The colour semicircle is saturated at high coherence/visibility, but no experimental data points lie within those regions. The grey boxes at the top right corner of each map in Fig. 5a are unperformed data points due to early termination of the measurement.

Figure 5b,c describes how the whole randomized benchmarking experiment is stepped through in time; the numbers in the figure represent the order of stepping. Note first that the Clifford gate sequence for every data point shown in Fig. 5a is re-randomized and therefore different. Processes (1) (square) and (2) (optimized) step through the different interleaved gates in Fig. 5b. We start from the standard square pulse reference (no interleaved gate) with a randomized Clifford gate sequence. Once the tomographic readout acquisition is done, we move on to the next interleaved gate, I, and generate a new randomized Clifford gate sequence with the same sequence length, m; this step takes about 1.7 s (at short m). Completing the acquisition for the last interleaved gate, −Y/2, concludes process (1); the same measurement is then repeated with the GRAPE-optimized pulses, referred to as process (2). At the end of (1) + (2), the frequency and power calibrations adjust the qubit environment (see above). A total of 16 acquisitions are cycled through (eight interleaved gates and two types of Clifford gate pulse), which takes about 40 s (at short m) including the calibration. When the interleaved gate cycle is done, we move to Fig. 5c, where process (3) starts. Process (3) is a simple five-repetition sequence of (1) + (2) + calibration; this is repeated on the y axis in Fig. 5a and takes ~3 min (at short m) to complete. Process (4) changes m after completion of process (3), stepping through [1–6, 8, 10, 13, 16, 20, 25, 32, 40, 50, 63, 79, 100, 126, 158, 200, 251, 316, 398, 501, 631, 794, 1,000, 1,259, 1,585, 1,995, 2,512, 3,162] sequentially, a total of 33 steps, as shown on the x axis of Fig. 5a, and takes ~250 min to complete. Finally, process (5) repeats all of the above a total of nine times, stacked up on the y axis of Fig. 5a, giving a final product of 45 rows. The complete measurement can be expressed as the process stack ((1) + (2) + calibration) × (3) × (4) × (5), and lasts for 35 h.
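The bookkeeping of this process stack can be checked with a few lines of arithmetic. The sketch below also notes that the listed sequence lengths coincide with the distinct rounded values of 10^(k/10) for k = 0 to 35; this is an observation about the listed numbers, not a statement of how the authors generated them. The totals are nominal, since the measurement was terminated early.

```python
# Sequence lengths: distinct rounded values of ten points per decade.
lengths = sorted({round(10 ** (k / 10)) for k in range(36)})

acquisitions_per_cycle = 8 * 2      # eight interleaved gates, two pulse types
repeats_inner = 5                   # process (3)
repeats_outer = 9                   # process (5)
shots_per_acquisition = 120         # single shots per density matrix

rows = repeats_inner * repeats_outer
acquisitions = (acquisitions_per_cycle * repeats_inner
                * len(lengths) * repeats_outer)
shots = acquisitions * shots_per_acquisition

print(len(lengths), lengths)
print(rows, acquisitions, shots)
```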

Eliminating the nuisance parameter B

We note that using tomographic measurements also allows a variation of the RB protocol similar to variations previously discussed in the literature8,9,10,17,20. For any particular sequence, the tomographic measurements at the end of the sequence include not only a measurement that corresponds to the expected ‘maximal-overlap’ measurement of the state, but also one that corresponds to a ‘minimal-overlap’ measurement. This ‘minimal-overlap’ measurement can be included by replacing \(\bar q(m,s)\) with \(1 - \bar q(m,s)\) for each such measurement and combining it into the average estimate of the survival probability for each sequence length, m. If this is done, the constant B is mapped to (B + (1 − B))/2 = 1/2. This removal of the SPAM parameter B leaves only two free parameters with which to fit the data, leading to tighter credible regions for the parameter of interest (p).
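The mapping of B to 1/2 can be seen in a small numerical example. This sketch assumes, for illustration only, that the maximal-overlap and minimal-overlap survival probabilities decay as A p^m + B and −A p^m + B with the same amplitude A; the values of A and B are arbitrary.

```python
import numpy as np

m = np.array([1, 10, 100, 1000])
A, B, p = 0.4, 0.45, 0.99914        # A and B are illustrative values

q_max = A * p**m + B                # maximal-overlap measurement
q_min = -A * p**m + B               # minimal-overlap measurement (assumed form)

# Flip the minimal-overlap data and average it with the maximal-overlap data.
combined = 0.5 * (q_max + (1 - q_min))
print(combined - A * p**m)          # constant offset after combining
```

If the two amplitudes differ, the combined decay still has the offset 1/2, with the amplitude replaced by the mean of the two.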

The randomized benchmarking procedure described above was carried out for 33 different sequence lengths of m (Fig. 4). The survival percentage for each sequence of a particular length was averaged (as discussed above) and a weighted least-squares nonlinear fit was performed to the data, using Supplementary equation (2), with B set to 0.5. The data points were weighted by the inverse variance of the observed data at a particular m.
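As an illustration of such a fit, the sketch below assumes the decay model q(m) = A p^m + B with B fixed to 0.5 (the form of Supplementary equation (2) is assumed here, not quoted) and weights each point by its inverse variance. It is not the authors' fitting code: for a fixed p the best A has a closed form, so the example simply performs a golden-section search over p. All names are hypothetical.

```python
import math


def best_amplitude(p, ms, ys, ws):
    """Closed-form weighted least-squares amplitude A for a fixed p (B = 0.5)."""
    num = sum(w * (y - 0.5) * p ** m for m, y, w in zip(ms, ys, ws))
    den = sum(w * p ** (2 * m) for m, w in zip(ms, ws))
    return num / den


def weighted_cost(p, ms, ys, ws):
    """Weighted sum of squared residuals at the best amplitude for this p."""
    amp = best_amplitude(p, ms, ys, ws)
    return sum(w * (y - 0.5 - amp * p ** m) ** 2 for m, y, w in zip(ms, ys, ws))


def fit_decay(ms, ys, variances, lo=0.9, hi=1.0, n_iter=100):
    """Return (A, p) minimizing the inverse-variance-weighted cost on [lo, hi]."""
    ws = [1.0 / v for v in variances]
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - ratio * (b - a)
    d = a + ratio * (b - a)
    fc = weighted_cost(c, ms, ys, ws)
    fd = weighted_cost(d, ms, ys, ws)
    for _ in range(n_iter):
        if fc < fd:               # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = weighted_cost(c, ms, ys, ws)
        else:                     # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = weighted_cost(d, ms, ys, ws)
    p = 0.5 * (a + b)
    return best_amplitude(p, ms, ys, ws), p


# Synthetic, noise-free example data (not experimental values).
demo_ms = [1, 10, 100, 1000]
demo_ys = [0.5 + 0.45 * 0.999 ** m for m in demo_ms]
demo_variances = [1e-4, 2e-4, 1e-4, 4e-4]
demo_amp, demo_p = fit_decay(demo_ms, demo_ys, demo_variances)
```

The search assumes the cost has a single minimum inside the bracket [lo, hi]; a general-purpose nonlinear least-squares routine would serve equally well.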

To take into account possible gate-dependent noise, the nonlinear fit to the data was re-analysed, this time ignoring m of less than four (these are the only m that are likely to be noticeably affected by gate-dependent noise)13, with no significant impact on the results. To finalize the analysis, QInfer31,32 was used to estimate the parameter p using Bayesian techniques (sequential Monte Carlo estimation). As can be seen in Fig. 4b, the credible region found is in accordance with the least-squares fit. This agreement provides an indication as to the correctness of the model, which might not be the case if the system were still impacted by low-frequency noise33. Finally, we note that the use of repeat sequences complicates the analysis surrounding the use of least-squares estimates and the Bayesian techniques used by QInfer. However, using bootstrapping methods on the data confirms the robustness of the estimates.
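The text does not specify how the bootstrap was performed, so the following is only a sketch of one plausible scheme: whole repetitions (rows) are resampled with replacement, the survival probabilities are averaged at each m, and p is re-estimated each time. For brevity the estimator is a log-linear least-squares fit that assumes q(m) = A p^m + 1/2, which is a simplification and not the weighted nonlinear fit or the QInfer analysis described above. All names and the synthetic data are hypothetical.

```python
import math
import random


def estimate_p(ms, mean_survival):
    """Log-linear estimate of p, assuming q(m) = A * p**m + 0.5 and q > 0.5."""
    logs = [math.log(q - 0.5) for q in mean_survival]
    n = len(ms)
    m_bar = sum(ms) / n
    l_bar = sum(logs) / n
    s_xy = sum((m - m_bar) * (l - l_bar) for m, l in zip(ms, logs))
    s_xx = sum((m - m_bar) ** 2 for m in ms)
    return math.exp(s_xy / s_xx)   # slope of ln(q - 0.5) versus m is ln(p)


def bootstrap_interval(ms, rows, n_boot=500, level=0.95, seed=0):
    """Percentile interval for p; rows[r][i] is repetition r at length ms[i]."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]   # resample rows with replacement
        means = [sum(row[i] for row in sample) / len(sample)
                 for i in range(len(ms))]
        estimates.append(estimate_p(ms, means))
    estimates.sort()
    tail = (1.0 - level) / 2.0
    lo = estimates[int(tail * n_boot)]
    hi = estimates[int((1.0 - tail) * n_boot) - 1]
    return lo, hi


# Synthetic data: 45 rows with small Gaussian noise (not experimental values).
data_rng = random.Random(1)
demo_ms = [1, 10, 100, 400]
demo_rows = [[0.5 + 0.45 * 0.999 ** m + data_rng.gauss(0.0, 0.005)
              for m in demo_ms] for _ in range(45)]
demo_lo, demo_hi = bootstrap_interval(demo_ms, demo_rows)
```

A narrow interval that is stable under resampling is the kind of evidence the robustness statement above refers to.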

Determining the unitarity from tomographic measurements

The tomographic measurements allow the unitarity of the average noise channel to be measured14. The protocol is similar to a randomized benchmarking experiment, except that no inverting gate is applied and the resulting state is best measured as an average over the non-identity Pauli operators, known as the purity measurement. For a single qubit this can be accomplished by measuring \({\cal Q} = \left\langle {S_x} \right\rangle ^2 + \left\langle {S_y} \right\rangle ^2 + \left\langle {S_z} \right\rangle ^2\), where each expectation value is taken with respect to the state in question. The projective measurements carried out by the tomography allow us to make numerical estimates for each of the components of the purity measurement and thus for \({\cal Q}\). Then, using the techniques discussed above, this is fit to a curve of the form \({\cal Q}(m) = A + Bu({\cal E})^{(m - 1)}\), where \(u({\cal E})\) is the unitarity and A and B are parameters that absorb SPAM noise.
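The purity estimate and the fitted model can be written out as a short sketch. It assumes that each expectation value is normalized to the range −1 to 1 (so that a pure state gives a purity measurement of 1) and that the tomography supplies, for each axis, the probability of the positive outcome; the function names are hypothetical.

```python
def expectation_from_probability(p_plus):
    """Convert the probability of the +1 outcome into an expectation value."""
    return 2.0 * p_plus - 1.0


def purity_measurement(p_x, p_y, p_z):
    """Q = <S_x>^2 + <S_y>^2 + <S_z>^2 from the three projective probabilities."""
    s_x = expectation_from_probability(p_x)
    s_y = expectation_from_probability(p_y)
    s_z = expectation_from_probability(p_z)
    return s_x ** 2 + s_y ** 2 + s_z ** 2


def purity_model(m, A, B, u):
    """Decay curve Q(m) = A + B * u**(m - 1), with u the unitarity."""
    return A + B * u ** (m - 1)


print(purity_measurement(1.0, 0.5, 0.5))   # state fully polarized along x
print(purity_measurement(0.5, 0.5, 0.5))   # maximally mixed state
```

Fitting purity_model to the measured values as a function of m then yields the unitarity in the same way that the survival probability yields p.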

Feedback and calibration

We implemented four different controllers to ensure that the spin qubit environment and parameters do not drift. The first two controllers are responsible for the spin-to-charge readout process, and the other two for the Hamiltonian coefficients. Figure 3 shows how the four controllers and their respective parameters change throughout the whole 35 h of the randomized benchmarking experiment. Figure 3a is a schematic of the circuitry controlling the sensor current Isensor. The difference between Isensor and the desired sensing point Iref is passed through a gain of β and fed back into VTG. This controller ensures that the sensing signal Isensor always sits on the most sensitive point for blip detection. Figure 3c presents a schematic of the circuitry controlling the dark blip count, blipdark. The dark blip count refers to the excess blips that occur even when the qubit spin is down, gathered as the blip detection count in the latter half of the readout time window. Dark counts are usually caused by not biasing the readout level in the middle of the Zeeman splitting energy. A dark blip count that is too high or too low may cause the readout visibility to become saturated and will affect the analysis of the randomized benchmarking decay rate. Here we set blipref to 0.16 for maximum readout visibility for the controller. These two controllers are applied automatically through extra analysis of the Isensor traces for each acquisition (a single digitizer data transfer of collective single-shot traces of Isensor), and do not require additional adjustment. The next two controllers require interleaved measurements that are independent of the randomized benchmarking sequence. These are done periodically, after every 16 acquisitions. Here, we can modify our Hamiltonian into

$$H = \Omega _{{\mathrm{drift}}}\Omega _{{\mathrm{ESR}}}(\Omega _x\sigma _x + \Omega _y\sigma _y) + (f_{{\mathrm{ESR}}} + \epsilon _z)\sigma _z$$

where fESR is the ESR centre frequency adjustment, which can be seen as a multiplier on σz. This can cancel the effect of the detuning noise offset, \(\epsilon _z\). Ωdrift is the effective physical ESR amplitude multiplier, which drifts over time and is balanced by ΩESR through the controller to maintain the relation ΩESRΩdrift = 1. We also have the relation \(\Omega^{\prime}_{x,y} = \Omega _{{\mathrm{ESR}}}\Omega _{x,y}\), which is shown in Fig. 1a. fESR is updated by measuring the difference between the two control sequences, as shown in Fig. 3e. We have one sequence of X/2 → Y/2 with a 0.2 μs gap, while the other one has the Y/2 changed to −Y/2. The two calibration sequences would have equal spin up probability—close to 0.5—if no resonance frequency offset exists, and will have a different probability if \(f_{{\mathrm{ESR}}} + \epsilon _z \ne 0\), regardless of other SPAM errors. We then take the spin up probability difference of these two sequences and feed this back into fESR with a certain stable gain, where now the controller will enforce \(f_{{\mathrm{ESR}}} + \epsilon _z\sim 0\), because \(\epsilon _z\) has a very slow drift over the calibration period (in the range of minutes). Similarly, after calibrating fESR we perform another calibration sequence pair (shown in Fig. 3g), where now the first has X/2 repeated 32 times, followed by another X/2 at the end, versus the second having −X/2 at the end. Given that the \(f_{{\mathrm{ESR}}} + \epsilon _z\) term is negligible at this stage, the spin up probabilities of these two sequences are also close to 0.5 and only the same when ΩESRΩdrift = 1. Any difference in these two probabilities will feed back into ΩESR. The repetition of 32 is chosen for higher accuracy of calibrating ΩESR while still maintaining a stable controller. A repetition number that is higher will give better accuracy but with less tolerance of the drift range. 
In particular, the two sequences can then yield the same spin up probability at ΩESRΩdrift = A, where A is a number close to 1. On average, 16 acquisitions take around 35 s and the two calibrations of fESR and ΩESR take ~5 s each. Figure 3b,d,f,h presents plots of feedback values over the measurement time period for the controllers shown on the left of the figure.
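The two calibration signals can be illustrated with an idealized Bloch-vector model. The sketch below assumes perfect, instantaneous pulses, a qubit initialized spin down, no SPAM error, and a precession angle in the gap of 2π × detuning × gap time; the sign conventions, the feedback gain and all function names are hypothetical and are not taken from the experiment.

```python
import math

GAP_S = 0.2e-6   # gap between the two pulses, from the text
N_REPEAT = 32    # number of repeated X/2 pulses, from the text


def frequency_error_signal(detuning_hz, gap_s=GAP_S):
    """P_up(X/2, gap, Y/2) - P_up(X/2, gap, -Y/2) for ideal pulses."""
    phase = 2.0 * math.pi * detuning_hz * gap_s   # precession during the gap
    p_plus = 0.5 * (1.0 + math.sin(phase))
    p_minus = 0.5 * (1.0 - math.sin(phase))
    return p_plus - p_minus


def amplitude_error_signal(amplitude, n_repeat=N_REPEAT):
    """P_up(n X/2 then X/2) - P_up(n X/2 then -X/2); amplitude = Omega_ESR * Omega_drift."""
    theta_plus = (n_repeat + 1) * amplitude * math.pi / 2.0
    theta_minus = (n_repeat - 1) * amplitude * math.pi / 2.0
    p_plus = 0.5 * (1.0 - math.cos(theta_plus))
    p_minus = 0.5 * (1.0 - math.cos(theta_minus))
    return p_plus - p_minus


def track_frequency(offset_hz, gain_hz=4.0e5, n_rounds=20):
    """Integral feedback on f_ESR that drives f_ESR + eps_z toward zero."""
    f_esr = 0.0
    for _ in range(n_rounds):
        f_esr -= gain_hz * frequency_error_signal(f_esr + offset_hz)
    return f_esr


print(track_frequency(1.0e4))           # settles near -1.0e4 Hz
print(amplitude_error_signal(1.001))    # small positive signal for a 0.1% error
```

In this idealized model the amplitude signal reduces to sin(16πa) sin(πa/2) for 32 repetitions, where a = ΩESRΩdrift. It therefore vanishes not only at a = 1 but at every multiple of 1/16, which illustrates how a larger repetition number sharpens the response near a = 1 while narrowing the range of drift over which the controller locks to the correct point.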

During the randomized benchmarking measurement, the traces of VTG, VG1 and fESR appear to be binary or step-like, suggesting that changes in the qubit environment are event-like rather than drift-like. The cause of these jump events could include local charge rearrangement, battery switching of gate sources or local nuclear spin flips. However, ΩESR appears to follow a drift-like mechanism, which we believe is due to the high sensitivity of the microwave source to temperature and the power supply. Interestingly, we observe no clear correlation among the four traces. This is a strong indication that the fESR jumps of the qubit come from nuclear spin flips rather than local charge rearrangement, which would require a large offset in readout level VG1 for a given Stark shift level27.

Data availability

The data sets generated during and/or analysed during the current study are available from the corresponding authors on reasonable request.

Code availability

The analysis code that supports the findings of the current study is available from the corresponding authors on reasonable request.


1. Knill, E. Quantum computing with realistically noisy devices. Nature 434, 39–44 (2005).
2. Fowler, A. G., Mariantoni, M., Martinis, J. M. & Cleland, A. N. Surface codes: towards practical large-scale quantum computation. Phys. Rev. A 86, 032324 (2012).
3. Veldhorst, M. et al. An addressable quantum dot qubit with fault-tolerant control-fidelity. Nat. Nanotechnol. 9, 981–985 (2014).
4. Kawakami, E. et al. Gate fidelity and coherence of an electron spin in an Si/SiGe quantum dot with micromagnet. Proc. Natl Acad. Sci. USA 113, 11738–11743 (2016).
5. Watson, T. F. et al. A programmable two-qubit quantum processor in silicon. Nature 555, 633–637 (2018).
6. Zajac, D. M. et al. Resonantly driven CNOT gate for electron spins. Science 359, 439–442 (2017).
7. Yoneda, J. et al. A quantum-dot spin qubit with coherence limited by charge noise and fidelity higher than 99.9%. Nat. Nanotechnol. 13, 102–106 (2017).
8. Emerson, J., Alicki, R. & Życzkowski, K. Scalable noise estimation with random unitary operators. J. Opt. B 7, S347–S352 (2005).
9. Knill, E. et al. Randomized benchmarking of quantum gates. Phys. Rev. A 77, 012307 (2008).
10. Dankert, C., Cleve, R., Emerson, J. & Livine, E. Exact and approximate unitary 2-designs and their application to fidelity estimation. Phys. Rev. A 80, 012304 (2009).
11. Magesan, E., Gambetta, J. M. & Emerson, J. Scalable and robust randomized benchmarking of quantum processes. Phys. Rev. Lett. 106, 180504 (2011).
12. Magesan, E., Gambetta, J. M. & Emerson, J. Characterizing quantum gates via randomized benchmarking. Phys. Rev. A 85, 042311 (2012).
13. Wallman, J. J. Randomized benchmarking with gate-dependent noise. Quantum 2, 47 (2018).
14. Wallman, J. J., Granade, C., Harper, R. & Flammia, S. T. Estimating the coherence of noise. New J. Phys. 17, 113020 (2015).
15. Kimmel, S., da Silva, M. P., Ryan, C. A., Johnson, B. R. & Ohki, T. Robust extraction of tomographic information via randomized benchmarking. Phys. Rev. X 4, 011050 (2014).
16. Feng, G. et al. Estimating the coherence of noise in quantum control of a solid-state qubit. Phys. Rev. Lett. 117, 260501 (2016).
17. Fogarty, M. A. et al. Nonexponential fidelity decay in randomized benchmarking with low-frequency noise. Phys. Rev. A 92, 022326 (2015).
18. Khaneja, N., Reiss, T., Kehlet, C., Schulte-Herbrüggen, T. & Glaser, S. J. Optimal control of coupled spin dynamics: design of NMR pulse sequences by gradient ascent algorithms. J. Magn. Reson. 172, 296–305 (2005).
19. Huang, W. et al. Fidelity benchmarks for two-qubit gates in silicon. Nature (in the press); preprint available at
20. Muhonen, J. T. et al. Quantifying the quantum gate fidelity of single-atom spin qubits in silicon by randomized benchmarking. J. Phys. Condens. Matter 27, 154205 (2015).
21. Wallman, J. J. Bounding experimental quantum error rates relative to fault-tolerant thresholds. Preprint at (2015).
22. Kueng, R., Long, D. M., Doherty, A. C. & Flammia, S. T. Comparing experiments to the fault-tolerance threshold. Phys. Rev. Lett. 117, 170502 (2016).
23. Dugas, A. C., Wallman, J. J. & Emerson, J. Efficiently characterizing the total error in quantum circuits. Preprint at (2018).
24. Barends, R. et al. Superconducting quantum circuits at the surface code threshold for fault tolerance. Nature 508, 500–503 (2014).
25. Ballance, C. J., Harty, T. P., Linke, N. M., Sepiol, M. A. & Lucas, D. M. High-fidelity quantum logic gates using trapped-ion hyperfine qubits. Phys. Rev. Lett. 117, 060504 (2016).
26. Rong, X. et al. Experimental fault-tolerant universal quantum gates with solid-state spins under ambient conditions. Nat. Commun. 6, 8748 (2015).
27. Chan, K. W. et al. Assessment of a silicon quantum dot spin qubit environment via noise spectroscopy. Phys. Rev. Appl. 10, 044017 (2018).
28. Itoh, K. M. & Watanabe, H. Isotope engineering of silicon and diamond for quantum computing and sensing applications. MRS Commun. 4, 143–157 (2014).
29. Angus, S. J., Ferguson, A. J., Dzurak, A. S. & Clark, R. G. Gate-defined quantum dots in intrinsic silicon. Nano Lett. 7, 2051–2055 (2007).
30. Lim, W. H. et al. Observation of the single-electron regime in a highly tunable silicon quantum dot. Appl. Phys. Lett. 95, 242102 (2009).
31. Granade, C., Ferrie, C. & Cory, D. G. Accelerated randomized benchmarking. New J. Phys. 17, 013042 (2015).
32. Granade, C. et al. QInfer: statistical inference software for quantum applications. Quantum 1, 5 (2017).
33. Ball, H., Stace, T. M., Flammia, S. T. & Biercuk, M. J. Effect of noise correlations on randomized benchmarking. Phys. Rev. A 93, 022303 (2016).
34. Magesan, E. et al. Efficient measurement of quantum gate error by interleaved randomized benchmarking. Phys. Rev. Lett. 109, 080505 (2012).


The authors acknowledge support from the US Army Research Office (W911NF-13-1-0024, W911NF-14-1-0098, W911NF-14-1-0103 and W911NF-17-1-0198), the Australian Research Council (CE170100009 and CE170100012) and the NSW Node of the Australian National Fabrication Facility. B.H. acknowledges support from the Netherlands Organization for Scientific Research (NWO) through a Rubicon Grant. K.M.I. acknowledges support from a Grant-in-Aid for Scientific Research by MEXT, NanoQuine, FIRST and the JSPS Core-to-Core Program. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Office or the US Government.

Author information




C.H.Y. conceived and designed the GRAPE pulse sequences and the feedback control systems for the experiments. C.H.Y. and K.W.C. performed the experiments. C.H.Y., R.H., T.E., S.T.F. and S.D.B. analysed the data. K.W.C. and F.E.H. fabricated the device. K.M.I. prepared and supplied the 28Si epilayer wafer. All authors contributed materials, analysis and/or tools. C.H.Y., R.H., S.T.F., S.D.B. and A.S.D. wrote the paper with input from all co-authors. A.S.D. supervised the project.

Corresponding authors

Correspondence to C. H. Yang or S. D. Bartlett or A. S. Dzurak.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–2, Supplementary equations 1–2, Supplementary Table 1

About this article

Cite this article

Yang, C.H., Chan, K.W., Harper, R. et al. Silicon qubit fidelities approaching incoherent noise limits via pulse engineering. Nat Electron 2, 151–158 (2019).
