Main

The fidelities of qubit operations need to be consistent over time and across different qubits to achieve complex quantum computations, such as those in recent groundbreaking research1,2,3,4,5. The physical mechanisms behind the entanglement between qubits are key to the success of a two-qubit gate and have a large impact on the performance of algorithms and error correction schemes6,7,8,9. Exchange-based entangling gates between silicon spin qubits have only recently matured enough to achieve high-fidelity operation10,11,12,13,14.

The variation of the operational parameters is particularly important in the case of spin qubits owing to their nanometric physical size and nanosecond-scale operation time. While all solid-state systems are subjected to materials noise and disorder, spin qubits probe these imperfections at nearly the atomic scale. In the case of silicon metal-oxide-semiconductor quantum dot qubits15, this is especially pronounced since the qubits are pressed against the amorphous Si–SiO2 interface. The issue of the consistency of high-fidelity operations then becomes central to translating this fabrication scalability into a qubit control scalability.

Our goal is to analyse the statistical characteristics and temporal stability of primitive two-qubit gate operations involving electron spins in silicon metal-oxide-semiconductor quantum dots, which are based on the Heisenberg exchange interaction. Our entangling gates are based on controlling the exchange between spins of electrons in neighbouring quantum dots by pulsing the height of the tunnel barrier between dots with an interstitial exchange-control electrode. We perform entangling gates with two strategies: a simple square pulse of voltage, which leads to an effective Ising interaction between spins and implements a controlled phase (CZ) gate16; or a composite gate consisting of two voltage pulses separated by a microwave pulse that performs single-qubit dynamical decoupling, referred to as a decoupled controlled phase (DCZ) gate17. We analyse the errors introduced by each of these strategies in detail, leveraging three state-of-the-art methods of validation.
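As a minimal sketch of how a square exchange pulse can yield a CZ gate, the following assumes an idealized Ising Hamiltonian H = (J/4)·ZZ with ħ = 1 and J in rad/s. The function names, the value of J and the choice of correction angles are illustrative assumptions, not the calibrated parameters of the experiment.

```python
import numpy as np

# Two-qubit Pauli operators in the basis |00>, |01>, |10>, |11>.
I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
ZZ, ZI, IZ = np.kron(Z, Z), np.kron(Z, I2), np.kron(I2, Z)


def diag_exp(op, angle):
    """Return exp(-1j * angle * op) for a diagonal operator op."""
    return np.diag(np.exp(-1j * angle * np.diag(op)))


def cz_from_exchange(J, t):
    """Square exchange pulse followed by single-qubit Z-phase corrections.

    Assumes the Ising model H = (J / 4) * ZZ with hbar = 1 and J in rad/s.
    Each correction is a -pi/2 rotation about z (an illustrative choice).
    """
    pulse = diag_exp(ZZ, J * t / 4)
    corrections = diag_exp(ZI, -np.pi / 4) @ diag_exp(IZ, -np.pi / 4)
    return corrections @ pulse


J = 2 * np.pi * 1e6                  # hypothetical exchange strength, rad/s
U = cz_from_exchange(J, np.pi / J)   # pulse duration t = pi / J
U_no_phase = U / U[0, 0]             # strip the global phase
print(np.round(U_no_phase.real.diagonal(), 6))
```

Under these assumptions, a pulse of duration π/J combined with the two phase corrections equals diag(1, 1, 1, −1) up to a global phase.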

In addition to the multiple validation methods, we verify the consistency of the two-qubit gate operations by reproducing them in three different devices. Two devices (A and B) are nominally identical three-dot chains, having been fabricated in the same batch, with the gate layout shown in Fig. 1a. The third device (C in Extended Data Fig. 1b) has four dots instead, but the same choice of material stack (aluminium gates and thermally grown Al2O3). All experiments shown here are based on forming only two of the dots at a time.

Fig. 1: Electrostatic quantum dots with tunable exchange.

a, False colour scanning electron micrograph (SEM) of a device similar to A and B. The qubit dots are accumulated under the plunger (P) gates. The charge state of the double dot is monitored by the sensor dot, which is confined between the barrier gates SLB and SRB. b, False colour transmission electron micrograph (TEM) of a cross-section of a device similar to A and B. The shaded area shows the extension of electron wave functions under each dot. c, Charge stability map of device A in isolated mode together with the important operation points for qubit operation (Jon and Joff) and readout (RO). d, Exchange energy for devices A, B and C as a function of VJ. The rate of increase of exchange is shown in the legend. e, CZ oscillations using the measurement sequence in g with two different control qubit initializations at a fixed level of J. f, Oscillations of DCZ gate sequence as a function of J-gate pulse time and voltage level. g, Pulse sequences used in the experiments for CZ (left) and DCZ (right) in e and f. Here, ‘X’ refers to a \(\frac{\uppi }{2}\) rotation around the x axis. For the CZ gate we apply phase θ rotations around the z axis for both qubits to correct for the Stark shift from pulsing the J-gate.

The level of isotopic purification of the silicon substrate is also different—800 ppm 29Si for devices A and B and 50 ppm for device C. We do not present results comparing silicon substrate purification levels, but qualitatively the more purified device C was operated with high fidelity without as much need for active feedback on the qubit parameters (only a single parameter, instead of nine or seven in the cases of devices A and B, respectively).

Our electron spin qubits reside under the plunger gates, P1 and P2, and the oscillating B1 field from a nearby antenna drives the electron spin resonance induced by magnetic field B0. The exchange interaction is tuned via the voltage on the interstitial exchange gate, J. For measurement, we use a single-electron transistor (SET) that senses charge movement and becomes conditional on the spins when we try to move both electrons into the same dot due to the Pauli exclusion principle. The detailed cross-section image in Fig. 1b displays the active area where the qubits are formed and also reveals the oxide variation and fabrication inconsistencies that lead to variations in device parameters and performance that, in turn, lead to the difficulty in obtaining consistent fidelities across devices18. The differences between devices are laid out in Extended Data Table 1.

The isolated mode19,20 stability map in Fig. 1c for device A with four electrons reveals the charge transitions between dots and allows us to choose the operational points in the device (see Extended Data Fig. 1a,c for devices B and C). We choose a symmetric operation point for our exchange-on voltage to ensure that our two-qubit gate is not sensitive to the detuning noise. The d.c. biases are typically chosen so that the exchange-on point is at the high end of our dynamic range (400 mV), allowing us to lower the tunnel rates for the blockade readout.

The spins in all our experiments are read out through the relative parity of the two spins using Pauli spin blockade for spin-to-charge conversion21,22. After finding the resonance frequencies of the qubits, we analyse their coherence, Rabi frequencies and noise spectra. In one case, for device A, we use a vector magnet to study the effect of the direction of the constant magnetic field and identify the microscopic origin of the decoherence. Our analysis on device A (see magnetic field dependence measurement in ref. 23) shows that \({T}_{2}^{\,*}{\rm{s}}\) are limited by comparative spin–orbit and 29Si noise. Device B, being from the same batch as device A, is likely to have similar noise limitations. Device C is probably limited mostly by electric noise, thanks to its superior isotopic purity compared with devices A and B. Our \({T}_{2}^{\,{\rm{Hahn}}}\) is more strongly limited by the charge rather than hyperfine noise in all devices. To fight the low-frequency components of the intrinsic 1/f noise and slow jumps due to hyperfine interactions with 29Si nuclear spins, we apply frequency feedback to devices A and B to keep the microwave control in resonance24. This is not necessary for device C.
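The parity readout mentioned above can be illustrated with a short sketch. It assumes an ideal measurement that only distinguishes odd-parity from even-parity two-spin states, modelled by the projector (I − ZZ)/2. The function name and the test states are hypothetical, and readout errors are ignored.

```python
import numpy as np

# Projector onto the odd-parity subspace spanned by |01> and |10>.
Z = np.diag([1.0, -1.0])
ZZ = np.kron(Z, Z)
P_ODD = (np.eye(4) - ZZ) / 2


def odd_parity_probability(state):
    """Probability of an odd-parity outcome for a pure two-qubit state.

    `state` holds amplitudes in the basis |00>, |01>, |10>, |11>.
    """
    psi = np.asarray(state, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    return float(np.real(np.vdot(psi, P_ODD @ psi)))


for label, psi in {
    "|00>": [1, 0, 0, 0],
    "|01>": [0, 1, 0, 0],
    "|11>": [0, 0, 0, 1],
    "|0>(|0>+|1>)/sqrt(2)": [1, 1, 0, 0],
}.items():
    print(label, odd_parity_probability(psi))
```

In this idealized model, |00⟩ and |11⟩ give the same outcome, which is why a parity-only readout needs extra care in tomographic analysis.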

To run high-fidelity two-qubit gates without being limited by the residual exchange at the nominally off state, we aim at an exchange swing that provides at least a \({10}^{4}\) ratio between on and off states. This is achievable for all devices with our 400 mV dynamical voltage range of the J gates, as shown in Fig. 1d.
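To make the required exchange swing concrete, the sketch below assumes that the exchange grows exponentially with the J-gate voltage, so that its rate of increase can be quoted in decades per volt. The exponential model, the function names and the normalization to the off-state exchange are assumptions for illustration only.

```python
import math


def required_slope_dec_per_volt(on_off_ratio, voltage_range_v):
    """Minimum exchange slope (decades per volt) needed to reach the given
    on/off ratio within the available voltage range, assuming exponential
    dependence of exchange on gate voltage."""
    return math.log10(on_off_ratio) / voltage_range_v


def exchange_at(delta_v, j_off, slope_dec_per_volt):
    """Exchange at a voltage delta_v above the off point (same units as j_off)."""
    return j_off * 10 ** (slope_dec_per_volt * delta_v)


slope = required_slope_dec_per_volt(1e4, 0.4)  # ratio of 10^4 over 400 mV
print(f"required slope: {slope:.1f} decades per volt")
print(f"on/off ratio check: {exchange_at(0.4, 1.0, slope):.3g}")
```

Under this assumption, a swing of four decades over 400 mV corresponds to a slope of about 10 decades per volt.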

Multi-qubit physical errors

We utilize three different validation methods: interleaved randomized benchmarking (IRB)25,26, fast Bayesian tomography (FBT)27 and gate set tomography (GST)28,29. These are illustrated in Fig. 2a–c, summarized in Extended Data Table 2 and described in further detail in the ‘Comparison with other high-fidelity two spin systems in silicon’ section in Supplementary Discussion. The estimated process matrices for all gates both from FBT and GST are shown in Extended Data Figs. 2 and 3, respectively. Our tomographic analyses allow us to identify the underlying quantum process and form conjectures about the physical mechanisms causing the error. We explored not only the entangling gates but also the two-qubit processes generated by single qubit gates, which elucidates effects such as the exposure of a qubit to decoherence when idling while the other qubit is being controlled, the effect of crosstalk and contextual errors.

Fig. 2: Summary of tomographic methods and selection of major identified physical error sources of the gate implementations based on GST for one- and two-qubit gates in devices A and B.

a, Measurement sequence principle used in randomized benchmarking. The gate of interest (G) is interleaved with N random Clifford gates (Ci) together with recovery Clifford (CR) that are composed of five primitive gates each. Recovery probability of the randomized benchmarking sequence as a function of the number of Clifford gates for both interleaved and reference (ref) sequences in all devices. b, Simplified FBT workflow from experiment to result. FBT can analyse any gate sequence; IRB is used here as an example. c, GST workflow used in our experiment. d, Dephasing (stochastic) errors: these noise channels occur due to the \({T}_{2}^{\,*}\)-like decay during the operation and also non-Markovian contextual error sources (Fig. 3). The reduced error rate for DCZ is due to noise limited by \({T}_{2}^{\,{\rm{Hahn}}}\)-like decay instead. e, Physical Hamiltonian errors that result from operations such as AC-Stark shift, off-resonant driving or residual exchange during the single qubit operation. f, Calibration (systematic) errors, due to the errors in the calibration of the gates. g, Other major errors with no major physical attribution.


We run a GST experiment in device A with CZ implementation and in device B with DCZ implementation. We generate the circuits for the analysis with special consideration for the measurement effects of the parity readout, as opposed to the more common readout of individual qubits. Two strategies are adopted to that end, one entailing the addition of projections of single qubit states into the parity and repeating the experiments for these projections, and the other by adapting the analysis tool itself (pyGSTi29) to handle the measurement effects directly at the analysis stage. Incorporating the native measurement operation explicitly and generating measurement fiducials accordingly is found to be the most efficient strategy (see more details in ‘Gate set tomography with parity readout’ section in Supplementary Discussion).

The most immediate form of analysis of the errors detected by GST is the breakdown between Hamiltonian and stochastic parts, which can be done through a mathematical framework without assumptions of the underlying mechanisms30. The Hamiltonian part of the error is in general associated with calibration issues, unintended Hamiltonian terms or forms of contextual error that only become evident once long circuits are performed (such as the consistent heating of the chip due to the long string of microwave pulses). Feedback into the gate control parameters allows us to minimize these errors until we achieve consistently high fidelities.

Removing these peculiarities of the qubit performance from the picture, the infidelity becomes dominated by stochastic effects that we can compare across devices (noting that the infidelity contribution of Hamiltonian errors is naturally weaker, with a quadratic dependence, compared with the linear dependence on stochastic errors). In Fig. 2d–g, we have grouped the major error channels resulting from this optimization loop according to their physical interpretation: the dephasing errors, physical errors, calibration errors and other uncategorized errors, respectively.
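The quadratic versus linear scaling noted above can be checked numerically with a single-qubit toy model. The sketch below is only an illustration under stated assumptions: the Hamiltonian error is a unitary exp(−iεZ), the stochastic error is a Z flip applied with probability p, and the figure of merit is the entanglement infidelity with respect to the identity. None of the names or values come from the GST analysis itself.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)


def entanglement_infidelity(kraus_ops):
    """1 - F_e relative to the identity, with F_e = sum_k |Tr K_k|^2 / d^2."""
    d = kraus_ops[0].shape[0]
    return 1.0 - sum(abs(np.trace(k)) ** 2 for k in kraus_ops) / d**2


def hamiltonian_z_error(eps):
    """Kraus representation of the coherent error exp(-1j * eps * Z)."""
    return [np.diag(np.exp(-1j * eps * np.diag(Z)))]


def stochastic_z_error(p):
    """Kraus representation of a Z flip applied with probability p."""
    return [np.sqrt(1 - p) * I2, np.sqrt(p) * Z]


for rate in (1e-1, 1e-2, 1e-3):
    coherent = entanglement_infidelity(hamiltonian_z_error(rate))
    stochastic = entanglement_infidelity(stochastic_z_error(rate))
    print(f"rate={rate:.0e}  Hamiltonian: {coherent:.2e}  stochastic: {stochastic:.2e}")
```

In this model the coherent error contributes sin²ε ≈ ε², whereas the stochastic error contributes p, so for equal small rates the Hamiltonian contribution is much smaller.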

One striking difference between the CZ and DCZ implementations is that both the stochastic and Hamiltonian IZ and ZI errors are an order of magnitude smaller for DCZ. By its design, DCZ echoes out phase accumulation owing to quasistatic shifts in the spin Larmor frequencies and, hence, suppresses stochastic IZ and ZI errors. It also suppresses Hamiltonian IZ and ZI errors incurred due to spurious Stark shifts created by the voltage pulse on the exchange gate. This is the main reason our DCZ gates perform better in device B, even though the coherence time, \({T}_{2}^{\,* }\), is worse for both qubits in this device than in device A, and it is the primary contributor to the difference in fidelities between the two implementations.
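The echo mechanism described above can be illustrated with a minimal sketch. It assumes a diagonal Hamiltonian H = (J/4)·ZZ + (δ1/2)·ZI + (δ2/2)·IZ during each exchange pulse (ħ = 1), with δ1 and δ2 standing in for quasistatic Larmor or Stark shifts, and ideal simultaneous π pulses on both qubits between the two pulses. All names and numerical values are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)
ZZ, ZI, IZ, XX = np.kron(Z, Z), np.kron(Z, I2), np.kron(I2, Z), np.kron(X, X)


def exchange_pulse(J, d1, d2, t):
    """exp(-1j * H * t) for the diagonal H = J/4 ZZ + d1/2 ZI + d2/2 IZ."""
    h_diag = np.diag(J / 4 * ZZ + d1 / 2 * ZI + d2 / 2 * IZ)
    return np.diag(np.exp(-1j * h_diag * t))


def dcz(J, d1, d2, t_half):
    """Two exchange pulses separated by ideal pi pulses on both qubits."""
    pulse = exchange_pulse(J, d1, d2, t_half)
    return pulse @ XX @ pulse


J = 2 * np.pi * 1e6                                # hypothetical exchange, rad/s
t_half = np.pi / (2 * J)                           # each pulse is half a CZ
d1, d2 = 2 * np.pi * 20e3, -2 * np.pi * 35e3       # hypothetical frequency shifts

echo_ok = np.allclose(dcz(J, d1, d2, t_half), dcz(J, 0, 0, t_half))
square_ok = np.allclose(exchange_pulse(J, d1, d2, 2 * t_half),
                        exchange_pulse(J, 0, 0, 2 * t_half))
print("DCZ unaffected by shifts:", echo_ok)
print("single square pulse unaffected by shifts:", square_ok)
```

Because X⊗X anticommutes with ZI and IZ but commutes with ZZ, the single-qubit phases acquired in the first pulse are cancelled in the second, while the ZZ phase adds up. In this model the composite gate is therefore independent of δ1 and δ2, whereas a single square pulse of the same total duration is not.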

The unintentional driving errors in Fig. 2e are created by frequency crosstalk. The difference in qubit frequencies, ΔEZ, is large compared with the Rabi frequencies, ΩR, but is still relevant when high-fidelity operation is attempted, generating alternating current (AC) Stark shift (Z error) and off-resonant driving (X error). These errors appear Hamiltonian since they are systematic. Calibration errors in Fig. 2f are also Hamiltonian and are caused by the realistic limits in calibration accuracy of the gates, leading to errors such as under/overrotation of the gates and exchange level errors. We also note that the environmental conditions under which calibration is performed are different from those during the circuits, which leads to contextual errors.
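The size of these crosstalk errors can be estimated with a two-level rotating-frame sketch. It assumes the idle qubit sees a drive of Rabi frequency Ω detuned by Δ, described by H = (Δ/2)·Z + (Ω/2)·X with ħ = 1. The function name and the numerical values are hypothetical and chosen only to satisfy Δ ≫ Ω.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])


def off_resonant_errors(rabi, detuning):
    """Return (AC Stark shift, maximum spin-flip probability) for a qubit
    driven off resonance, using H = detuning/2 * Z + rabi/2 * X."""
    h = 0.5 * detuning * Z + 0.5 * rabi * X
    energies = np.linalg.eigvalsh(h)            # ascending eigenvalues
    splitting = energies[1] - energies[0]       # dressed-state splitting
    stark_shift = splitting - abs(detuning)     # shift of the level splitting
    max_flip = rabi**2 / (rabi**2 + detuning**2)
    return stark_shift, max_flip


rabi = 2 * np.pi * 1e6        # hypothetical Rabi frequency, rad/s
detuning = 2 * np.pi * 10e6   # hypothetical qubit frequency difference, rad/s
shift, flip = off_resonant_errors(rabi, detuning)
print(f"Stark shift / 2pi: {shift / (2 * np.pi):.3e} Hz")
print(f"perturbative estimate / 2pi: {rabi**2 / (2 * detuning) / (2 * np.pi):.3e} Hz")
print(f"maximum off-resonant flip probability: {flip:.3e}")
```

For Δ ≫ Ω the shift approaches Ω²/(2Δ) (a Z-type error) and the maximum flip probability approaches Ω²/Δ² (an X-type error). Both are small but not negligible at the fidelities targeted here.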

We note that some errors, such as the ones in Fig. 2g, remain unexplained. Examples include the strong presence of IY Hamiltonian errors even in the CZ implementation (which does not contain any microwave pulses) and the appearance of a stochastic XZ error in both devices. These errors are consistent enough across devices and measurement setups that they motivate future investigations of, as yet, unknown microscopic physical mechanisms impacting spins in quantum dots. Together, these sources of error add to a considerable amount and are part of the impediment to achieving the next level of gate fidelities. All the error channels extracted from GST are shown in Extended Data Figs. 4 and 5.

Another striking observation that we obtain both from GST and FBT in Extended Data Table 3 is that for all devices the single qubit gates are the lowest in fidelity. A more careful GST analysis, in Supplementary Table 3, reveals that on-target fidelity—that is, the fidelity of the operation on one qubit discarding effects on the other qubit—is very high. The idling qubit, however, suffers strong impacts from both dephasing while idling and crosstalk considered in the total error.

A caveat for all of our analysis methods is that they all intrinsically assume that the gates and their associated errors are Markovian processes. We have identified physical effects that do not agree with this assumption31. Figure 3b, for instance, shows that the qubit Larmor frequencies shift substantially depending on how long the microwave has been operating, an effect potentially associated with the excitation of the two-level fluctuators in the oxide32. This leads to consistent biases in calibration for the gates applied late in a long circuit compared with the same gate applied early, a form of contextual noise33 that is shown in Fig. 3a. Another non-Markovian effect is the high-amplitude, low-frequency components in noise stemming from either nuclear spins or slow two-level fluctuators that compose the 1/f electric noise spectrum34. In these circumstances, it becomes evident that differences in the statistical inference approach (frequentist or Bayesian) and the length of circuits (shorter circuits in GST or longer random Cliffords for IRB and FBT) will provide different information about the gates and how they change over time. We also note that, depending on the statistical treatment of the data, these contextual drifts in Hamiltonian error would be interpreted as stochastic noise.

Fig. 3: Contextual Larmor frequency shift and errors.

a, Estimated gate noise channel components XIYI for CZ, IZIY for \({{\mathrm{X}}}_{1}^{\uppi /2}\) and IYIX for \({{\mathrm{X}}}_{2}^{\uppi /2}\) as a function of number of gates in 10,000 randomized sequences with 100 single shots. The shift indicates the changes in the process over the time of one shot of experiment. b, The Larmor frequency of both qubits in device A after applying an off-resonant pre-pulse. The Q1 frequency shifts to lower frequencies with longer applied pre-pulse, due to a transient effect possibly similar to those in refs. 32,33,48. This leads to contextual errors observed in a. The error bars in both a and b correspond to the 95% confidence interval.


Two-qubit fidelity statistics

Achieving a detailed picture of errors such as the one in Fig. 2 requires a minimum level of stability of high-fidelity operations such that long experiments can be run with consistent results over long periods of time. Our evaluations of two-qubit fidelities from different experimental runs are shown in Fig. 4 and Extended Data Table 3. Most statistics are collected using the simplest validation method: IRB. Based on IRB experiments, we achieved average two-qubit entangling gate fidelities of 98.4% (CZ), 99.37% (DCZ) and 99.76% (DCZ) in devices A, B and C respectively. These numbers indicate sufficient operational fidelity for sustainable error correction6,7,8,9. More details on the IRB method can be found in ‘Randomized benchmarking’ section in Supplementary Discussion.

Fig. 4: Two-qubit gate fidelities.

Two-qubit gate fidelities extracted from different error characterization measurements (IRB and GST). Where an IRB data set was also analysed with FBT, the FBT fidelity is shown as the neighbouring data point. These data are also presented in Extended Data Table 3. The error bars are the errors of the fit. The average and the standard deviation of the IRB measurements are indicated with lines and shaded areas around the lines. Each IRB run consists of 200 (A) or 500 (B and C) circuit randomizations with 100 single shots for each run. Inset: DCZ fidelity as a function of lab time, where the transient fidelities are extracted from FBT analysis on IRB experiment data. The error bars in the main figure and the shaded region in the inset indicate the 95% confidence interval of the fidelity based on the error coming from the fit.


We note, however, that non-Markovian effects create notable challenges to the IRB approach and can lead to an overestimated fidelity35. In some instances, we found that the circuits performed better when interleaved with DCZ gates, resulting in unphysical estimated fidelities surpassing 100%. On the other hand, relying only on GST experiments exposes our results to instability in the active feedback procedure since they are based on long experiments and a frequentist approach to the statistical treatment of the data. This degradation of the quality of the gates over time is to some extent captured in a Bayesian analysis, as seen in the inset of Fig. 4, which plots the evolution of the FBT estimates for two-qubit gates over more than four consecutive hours of operation.
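The way an IRB estimate can exceed 100% follows directly from the standard interleaved-benchmarking estimator, sketched below. The decay parameters are hypothetical values chosen for illustration, not fitted values from these experiments, and the function name is an assumption.

```python
def irb_gate_fidelity(p_ref, p_int, n_qubits=2):
    """Interleaved-RB estimate of the average gate fidelity.

    p_ref and p_int are the depolarizing decay parameters fitted to the
    reference and interleaved sequences. The gate error is estimated as
    r = (d - 1) / d * (1 - p_int / p_ref), with d = 2 ** n_qubits.
    """
    d = 2**n_qubits
    gate_error = (d - 1) / d * (1 - p_int / p_ref)
    return 1 - gate_error


# Hypothetical decay parameters: interleaving the gate speeds up the decay.
print(f"{irb_gate_fidelity(p_ref=0.950, p_int=0.940):.4%}")

# If the interleaved sequence decays more slowly than the reference, for
# instance because of non-Markovian noise, the estimate exceeds 100%.
print(f"{irb_gate_fidelity(p_ref=0.950, p_int=0.952):.4%}")
```

The estimator depends only on the ratio of the two decay parameters, so any effect that makes the interleaved circuits decay more slowly than the reference circuits pushes the estimated fidelity above unity.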

The FBT analysis is performed directly on the outcomes of any arbitrary input circuit (we note that GST may also be adapted for analysing arbitrary circuits). We use the data from IRB runs as input, decomposing all the circuits into their five primitive gates (for additional information, see also ‘Implementation of fast Bayesian tomography’ section in Supplementary Discussion).

To understand better the experimental runs yielding unphysical fidelities (>100% in B and C), we chose those for the FBT analysis. Tomographic analysis that yields a physical process matrix cannot have above 100% fidelity, and FBT has the additional benefit of allowing the analysis of the process in lab time through Bayesian inference27. The fidelities estimated across the full experiment by FBT in Fig. 4 show a very large uncertainty, which can be understood from the plot of the estimated fidelity of the DCZ gate in device B as a function of lab time. Over time, the gate calibration degrades and infidelities more than double.

Implications for large-scale spin quantum computers

Besides the benefits for operation at scale, the consistency of the operations demonstrated here greatly improves the quality of the physical conclusions that can be drawn from experiments. We were able to identify commonalities between gates implemented in different devices and setups and adopting different strategies.

The stability of the operations over time, combined with the theoretical and methodological improvements in process tomography, opens a window into the physics of spin qubit errors. With the possibility of consistently performing high-fidelity one- and two-qubit gates for several hours, we can form hypotheses and test them in a repeatable manner over many hours or days of experiments. Our work not only offers compelling evidence for the microscopic nature of some of the error sources, but it also reveals some unknown processes with, as yet, unclear physical origins.

One of the most important conclusions is that the entangling gate fidelities achieved here can be systematically improved with a combination of better materials to reduce noise36,37, active use of these tomographic results to recalibrate gates against Hamiltonian errors12,31 and pulse engineering to reduce the stochastic errors with robust gates and dynamical decoupling38,39,40,41 (see also ‘Comparison with other high-fidelity two spin systems in silicon’ section in Supplementary Discussion). The fidelity estimates quoted here are far from being fundamentally limited by the physics of spin qubits or by the minimum noise levels in materials.

The scalability of the traditional Loss and DiVincenzo approach42 to spin qubits can be obscured by the overhead imposed by the strategies required for high-fidelity operation. Due to the relative phases, the number of parameters requiring feedback grows prohibitively fast with the number of qubits if no mitigation strategy is adopted43. Moreover, circumventing errors with increasingly convoluted engineered pulses tailored to the idiosyncrasies of each qubit leads to control signals that are hard to generate at scale in an automatized manner. Finally, the degraded performance of idling qubits leads to a steep price in multi-qubit operation. Innovative control solutions might be required to evade these difficulties, including continuously dynamically decoupled driven qubit implementations and pulse shapes aimed at maximizing performance regardless of the particular properties of each qubit44,45,46,47.

Spin qubits in MOS-based quantum dots now join the select group of qubit technologies with two-qubit gate fidelities exceeding the 99% barrier, with the measured average IRB fidelity being 99.17% for three devices with a standard deviation of 0.56%. The prospects for fault-tolerant operation with this platform are further enhanced when the structure of the errors is taken into account; the strong bias towards dephasing errors instead of depolarizing errors opens up the possibility for notable gains in error correction code performance. This will help push the average fidelity up and standard deviation down.

Methods

Experimental devices

The three devices studied in this work were fabricated using multi-level aluminium gate-stack silicon MOS technology49,50 on isotopically enriched silicon-28 substrates of 800 ppm residual 29Si (devices A and B) and of 50 ppm residual 29Si (device C). A layer of SiO2 of ~8 nm was thermally grown above the silicon substrates. The devices are designed with plunger gate width of 30 nm and gate pitch as small as 50 nm. This allows a 20 nm gap between the plunger gates for the J gate.

Measurement setup

Device A was measured in an Oxford Kelvinox 400HA dilution refrigerator. The d.c. bias voltages were generated from Stanford Research Systems SIM928 Isolated Voltage Sources. Gate pulse waveforms were generated by a Quantum Machines (QM) Operator-X+ (OPX+) and combined with d.c. biases using custom linear bias combiners at room temperature.

Devices B and C were measured in a Bluefors XLD400 dilution refrigerator. The d.c. bias voltages were generated with Basel Precision Instruments SP927 DACs. Gate pulse waveforms were generated by a QM OPX and combined with d.c. biases using custom linear bias combiners at the 4K stage.

The SET current of device A was amplified using a room-temperature IV converter (Basel SP983c) and sampled by a QM OPX. The SETs of devices B and C were connected to a tank circuit for reflectometry measurement, with the tone generated by the QM OPX. The return signal was amplified by a Cosmic Microwave Technology CITFL1 LNA at the 4K stage, and a Mini-circuits ZX60-P33ULN+ and a Mini-circuits ZFL-1000LN+ at room temperature, before being digitized and demodulated by the QM OPX.

For all devices, microwave pulses were generated with a Keysight PSG8267D Vector Signal Generator, with in-phase and quadrature (I/Q) and pulse modulation waveforms generated by the QM OPXs.