Recent progress in quantum information has led to the start of several large national and industrial efforts to build a quantum computer. Researchers are now working to overcome many scientific and technological challenges. The biggest obstacle, a potential showstopper for the entire effort, is the need for high-fidelity qubit operations in a scalable architecture. This challenge arises from the fundamental fragility of quantum information, which can only be overcome with quantum error correction.1 In a fault-tolerant quantum computer the qubits and their logic interactions must have errors below a threshold: scaling up with more and more qubits then brings the net error probability down to the levels needed for running complex algorithms. Reducing error requires solving problems in physics, control, materials and fabrication, which differ for every implementation. I explain here the common key driver for continued improvement: the metrology of qubit errors.

We must focus on errors because classical and quantum computation are fundamentally different. The classical NOT operation in CMOS electronics can have zero error, even with moderate changes in voltages or transistor thresholds. This enables digital circuits of enormous complexity to be built, as long as there are reasonable tolerances on fabrication. In contrast, quantum information is inherently error-prone because it has continuous amplitude and phase variables, and logic is implemented using analog signals. The corresponding quantum NOT, a bit-flip operation, is produced by applying a control signal that can vary in amplitude, duration and frequency. More fundamentally, the Heisenberg uncertainty principle states that it is impossible to directly stabilise a single qubit, as any measurement of a bit-flip error produces a random flip in phase. The key to quantum error correction is measuring qubit parities, which detects bit flips and phase flips in pairs of qubits. As explained in the text box, the parities are classical-like, so their outcomes can be known simultaneously.

When a parity changes, one of the two qubits had an error, but which one is not known. To identify which, the encoding must use a larger number of qubits. This idea can be understood with a simple classical example, the 3-bit repetition code described in Figure 1. Logical states 0 (1) are encoded as 000 (111), and measurement of parities between adjacent bits A–B and B–C allows the identification (decoding) of errors as long as no more than a single bit changes. To improve the encoding so that both order n=1 and n=2 errors can be decoded, the repetition code is simply increased in size to 5 bits, with four parity measurements between them. Order n errors can be decoded from 2n+1 bits and 2n parity measurements.
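
As an illustration of this decoding logic, here is a minimal Python sketch (my own, not from the article) of the 3-bit code of Figure 1: the two parities A–B and B–C form a syndrome that points to the flipped bit, and decoding fails, as expected, once two bits have flipped.

# Syndrome decoding for the 3-bit repetition code of Figure 1.
def parities(bits):
    """Parity checks (A xor B, B xor C) for bits [A, B, C]."""
    a, b, c = bits
    return (a ^ b, b ^ c)

# Syndrome -> index of the flipped bit (None means no error detected),
# assuming at most a single bit error (order n = 1).
DECODE = {(0, 0): None, (1, 0): 0, (1, 1): 1, (0, 1): 2}

def correct(bits):
    """Apply the correction implied by the measured parities."""
    flipped = DECODE[parities(bits)]
    out = list(bits)
    if flipped is not None:
        out[flipped] ^= 1
    return tuple(out)

print(correct((0, 1, 0)))  # single error on B -> (0, 0, 0), decoded correctly
print(correct((1, 0, 1)))  # double error -> (1, 1, 1), a decoding failure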

Figure 1

3-bit classical repetition code for bits A, B and C with parity measurements between A–B and B–C. The table shows all combinations of inputs and the resulting parity measurements. For an initial state of all zeros, a unique decoding from the measurements to the actual error is obtained only for the top four entries, where there is no more than a single bit error (order n=1).

Quantum codes allow for the decoding of both bit- and phase-flip errors given a set of measurement outcomes. As for the above example, they decode the error properly as long as the number of errors is order n or less. The probability for a decoding error can be computed numerically using a simple depolarisation model that assumes a random bit- or phase-flip error of probability ϵ for each physical operation used to measure the parities. By comparing the known input errors with those determined using a decoding algorithm, the decoding or logical error probability is found to be

(1) Pl ≈ Λ^−(n+1),
(2) Λ = ϵt/ϵ,

where ϵt is the threshold error, fit from the data. The error suppression factor Λ is the key metrological figure of merit: it quantifies how much the decoding error drops as the order n increases by one. Note that Pl scales as ϵ^(n+1), as expected for n+1 independent errors. The key idea is that once the physical errors ϵ are lower than the threshold ϵt, then Λ>1 and making the code larger decreases the decoding error exponentially with n. When Λ<1 error detection fails, and even billions of bad qubits do not help.
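
To make the scaling of equations (1) and (2) concrete, the following toy Python calculation (a simplification of my own, not the circuit-level depolarisation simulation referred to above) evaluates the majority-vote failure probability of the (2n+1)-bit repetition code with an independent bit-flip probability ϵ per bit; the ratio of logical errors at consecutive orders then plays the role of Λ.

from math import comb

def logical_error(eps, n):
    """Failure probability of majority-vote decoding for the (2n+1)-bit repetition code."""
    bits = 2 * n + 1
    return sum(comb(bits, k) * eps**k * (1 - eps)**(bits - k)
               for k in range(n + 1, bits + 1))

eps = 0.005  # illustrative physical error probability
for n in (1, 2, 3):
    ratio = logical_error(eps, n) / logical_error(eps, n + 1)
    print(f"n={n}: Pl={logical_error(eps, n):.2e}, Lambda~{ratio:.0f}")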

A key focus for fault tolerance is making qubit errors less than the threshold. For Λ to be as large as possible, we wish to encode with the highest threshold ϵt. The best practical choice is the surface code,2,3 which can be thought of as a two-dimensional version of the repetition code that corrects for both bit and phase errors. A (4n+1) by (4n+1) array of qubits performs n-th order error correction, where about half of the qubits are used for the parity measurements. It is an ideal practical choice for a quantum computer because of other attributes: (i) only nearest-neighbour interactions are needed, making it manufacturable with integrated circuits; (ii) the code is upward compatible with logical gates, where measurements are simply turned off; (iii) the code is tolerant up to a significant density (~10%) of qubit dropouts from fabrication defects; (iv) the high error threshold arises from the low complexity of the parity measurement; a code with a higher threshold is unlikely; (v) the simplicity of the measurement shifts complexity into the classical decoding algorithm, which fortunately is efficiently scalable; (vi) detected errors can be tracked in software, so physical feed-forward corrections using bit- or phase-flip gates are not needed; (vii) the prediction equation (1) for Pl is strictly valid only in the operative range Λ ≥ 10, where the threshold is ϵt ≈ 2%. At break-even Λ=1, the threshold is significantly smaller, 0.7%.

Typical quantum algorithms use ~10^18 operations,3 so we target a logical error Pl = 10^−18. Assuming an improvement Λ=10 for each order, we need n=17 encoding. The number of qubits for the surface code is (4·17+1)^2 = 4,761. For Λ=100, this number lowers by a factor of 4. Although this seems like a large number of qubits from the perspective of present technology, we should remember that a cell phone with 10^12 transistors, now routinely owned by most people in the world, was inconceivable only several decades ago.
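
The qubit counts quoted above follow directly from equations (1) and (2); a short Python helper (an illustration of the arithmetic, not code from the article) reproduces them:

def surface_code_size(target_pl, Lambda):
    """Smallest order n with Lambda**-(n+1) <= target_pl, and the (4n+1)^2 qubit count."""
    n = 1
    while Lambda ** (n + 1) < 1.0 / target_pl:
        n += 1
    return n, (4 * n + 1) ** 2

for Lambda in (10, 100):
    n, qubits = surface_code_size(1e-18, Lambda)
    print(f"Lambda={Lambda}: n={n}, {qubits} qubits")
# Lambda=10 -> n=17 and 4,761 qubits; Lambda=100 -> n=8 and 1,089 qubits.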

Hardware requirements can be further understood by separating out the entire parity operation into one- and two-qubit logic and measurement components. Assuming errors in only one of these components, break-even thresholds are, respectively, 4.3%, 1.25% and 12%: the two-qubit error is clearly the most important, whereas measurement error is the least important. For the practical case when all components have non-zero errors, I propose the threshold targets

(3) ϵ1 ≲ 0.1%
(4) ϵ2 ≲ 0.1%
(5) ϵm ≲ 0.5%,

which gives Λ≥17. It is critical that all three error thresholds be met, as the worst-performing error limits the logical error Pl. Measurement errors ϵm can be larger because their single-component threshold of 12% is high. The two-qubit error ϵ2 is the most challenging because its physical operation is much more complex than for single qubits. This makes ϵ2 the primary metric around which the hardware should be optimised. The single-qubit error ϵ1, being easier to optimise, should readily be met if the two-qubit threshold is reached. Note that although it is tempting to isolate qubits from the environment to lower one-qubit errors, in practice this often makes it harder to couple them together for two-qubit logic; I call such a strategy ‘neutrino-ised qubits’.
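
As a bookkeeping aid, a small Python helper (my own suggestion, with purely illustrative numbers) compares measured component errors against the targets of equations (3)–(5) and flags the component that limits Pl:

# Proposed component targets from equations (3)-(5).
TARGETS = {"one-qubit": 1e-3, "two-qubit": 1e-3, "measurement": 5e-3}

def budget_report(measured):
    """measured: dict mapping component name -> error probability (e.g. from RB)."""
    for name, eps in measured.items():
        ratio = eps / TARGETS[name]
        status = "meets target" if ratio <= 1 else "over target"
        print(f"{name:>12}: eps={eps:.1e} vs {TARGETS[name]:.1e} ({status}, x{ratio:.1f})")

# Hypothetical device numbers, not measurements from any real hardware.
budget_report({"one-qubit": 8e-4, "two-qubit": 3e-3, "measurement": 4e-3})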

In the life cycle of a qubit technology, experiments start with a single qubit and then move to increasingly more complex multi-qubit demonstrations and metrology. The typical progression4 is illustrated in Figure 2, where the technology levels and their metrics are shown together.

Figure 2

Life cycle of a qubit. Illustration showing the increasing complexity of qubit experiments, built upon each other, described by technology levels I through VII. Numbers in parentheses show approximate qubit numbers. Key metrics are shown at the bottom. Errors for one-qubit, two-qubit and measurement operations are described by ϵ1, ϵ2 and ϵm, respectively, which together lead to an error suppression factor Λ. Fault-tolerant error correction is achieved when Λ>1. Scaling to large n leads to Pl→0.

In level I, one- and two-qubit experiments measure coherence times T1 and T2, and show basic functionality of qubit gates. Along with the one-qubit gate time tg1, an initial estimate of gate error can be made. Determining the performance of a two-qubit gate is much harder, as other decoherence or control errors will typically degrade performance. Swapping an excitation between two qubits is a simple method to determine whether coherence has changed. Quantum process tomography is often performed on one- and two-qubit gates,5 which is important as it proves that proper quantum logic has been achieved. At this initial stage, it is not necessary to have low measurement errors, and data often have arbitrary units on the measurement axis. This is fine for initial experiments that are mostly concerned with the performance of qubit gates.
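
A typical level-I analysis of this kind can be sketched in a few lines of Python: fit an exponential decay to the measured excited-state population to extract T1, then form a rough decay-limited gate-error estimate of order tg1/T1. The data and numbers below are synthetic and purely illustrative, and the estimate is a common rule of thumb rather than a formula from the text.

import numpy as np
from scipy.optimize import curve_fit

def decay(t, T1, a, b):
    return a * np.exp(-t / T1) + b

# Synthetic excited-state population for a qubit with T1 = 50 us.
t = np.linspace(0, 200e-6, 40)
rng = np.random.default_rng(0)
p1 = decay(t, 50e-6, 0.95, 0.02) + rng.normal(0, 0.01, t.size)

(T1_fit, _, _), _ = curve_fit(decay, t, p1, p0=(30e-6, 1.0, 0.0))
tg1 = 20e-9  # assumed single-qubit gate time
print(f"T1 ~ {T1_fit*1e6:.0f} us, decay-limited gate error ~ {tg1/T1_fit:.1e}")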

In level II, more qubits are measured in a way that mimics the scale-up process. This initiates more realistic metrology tests of how a qubit technology will perform in a full quantum computer. Here the application of many gates in sequence through randomised benchmarking (RB) enables the total error to grow large enough for accurate measurement, even if each gate error is tiny.6 Interleaved RB is useful for measuring the error probability of specific one- and two-qubit logic gates, and gives important information on error stability. Although RB represents an average error and provides no information on error coherence between gates, it is a practical metric to characterise overall performance.7 For example, RB can be used to tune up the control signals for lower errors.8 Process tomography can be performed for multiple qubits, but is typically abandoned because (i) the number of necessary measurements scales rapidly with increasing numbers of qubits, (ii) information on error coherence is hard to use and (iii) it is difficult to separate out initialisation and measurement errors. Measurement error is also obtained at this level; a distinction should be made between measurements that destroy the qubit state and those that do not, as the latter are eventually needed in level IV for logical qubits. A big concern is crosstalk between various logic gates and measurement outcomes, and whether residual couplings between qubits create errors when no interactions are desired. A variety of crosstalk measurements based on RB are useful metrology tools.
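
The standard RB analysis referred to here amounts to fitting an exponential decay of sequence fidelity. A minimal Python sketch with synthetic data (illustrative only) fits F(m) = A·p^m + B and converts the decay p into an average error per Clifford, r = (1 − p)(1 − 1/d) with d = 2 for a single qubit:

import numpy as np
from scipy.optimize import curve_fit

def rb_model(m, p, A, B):
    return A * p**m + B

# Synthetic single-qubit RB data with a decay of p = 0.998.
m = np.arange(1, 400, 10)
rng = np.random.default_rng(1)
fidelity = rb_model(m, 0.998, 0.5, 0.5) + rng.normal(0, 0.005, m.size)

(p, A, B), _ = curve_fit(rb_model, m, fidelity, p0=(0.99, 0.5, 0.5))
r = (1 - p) * (1 - 1 / 2)  # average error per Clifford for a single qubit
print(f"p = {p:.4f}, error per Clifford ~ {r:.1e}")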

In level III an error detection or correction algorithm is performed,9 representing a complex systems test of all components. Qubit errors have to be low enough to perform many complex qubit operations. Experiments work to extend the lifetime of an encoded logical state, typically by adding errors to the various components to show improvement from the detection protocol relative to the added errors.

At level IV, the focus is measuring Λ>1, demonstrating how a logical qubit can have less and less error by scaling up the order n of error correction. The logical qubit must be measured in first and second orders, which requires parity measurements that are repeated in time so as to include the effect of measurement errors. Note that extending the lifetime of a qubit state in first order is not enough to determine Λ. Measuring Λ>1 indicates that all first-order decoding errors have been properly corrected, and that further scaling up should give lower logical errors. Because 81 qubits are needed for the surface code with n=2, a useful initial test is for bit-flip errors only, which requires a linear array of nine qubits. These experiments are important as they connect the error metrics of the qubits, obtained in level II, to actual fault-tolerant performance Λ. As there are theoretical and experimental approximations in this connection, including the depolarisation assumption for theory and RB measurement for experiment, this checks the whole framework of computing fault tolerance. A fundamentally important test for n≥2 is whether Λ remains constant, as correlated errors would cause Λ to decrease. Level IV tests continue until the order n is high enough to convincingly demonstrate an exponential suppression of error. A significant challenge here is to achieve all error thresholds in one device and in a scalable design.
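
The extraction of Λ at this level is simple bookkeeping based on equation (1): plotting the measured logical error against n+1 on a logarithmic scale should give a straight line whose slope yields Λ, and any flattening signals correlated errors. A short Python sketch with placeholder numbers (not data from any experiment):

import numpy as np

orders = np.array([1, 2, 3])
pl = np.array([3e-3, 3e-4, 3.2e-5])  # hypothetical logical error per order
slope, _ = np.polyfit(orders + 1, np.log(pl), 1)
print(f"Lambda ~ {np.exp(-slope):.1f}")  # a constant slope means Lambda holds as n grows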

An experiment measuring the bit-flip suppression factor ΛX has been done with a linear chain of nine superconducting qubits.10 The measurement ΛX=3.2 shows logical errors have been reduced, with a magnitude that is consistent with the bit-flip threshold of 3% and measured errors. This is the first demonstration that individual error measurements can be used to predict fault tolerance. For bit and phase fault tolerance, we need to improve only two-qubit errors and then scale.

In level V, as the lifetime of a logical state has been extended, the goal is to perform logical operations with minuscule error. Similar to classical logic that can be generated from the NOT and AND gates, arbitrary quantum logic can be generated from a small set of quantum gates. Here all the Clifford gates are implemented, such as the S, Hadamard or controlled NOT. The logical error probabilities should be measured and tested for degradation during logical gates.

In level VI, the test is for the last and most difficult logic operation, the T gate, which phase shifts the logical state by 45°. Here state distillation must be demonstrated, and feed-forward from qubit errors conditionally controls a logical S gate.3 Because logical errors can be readily accounted for in software for all the logical Clifford gates in level V, feed-forward is only needed for this non-Clifford logical T gate.

Level VII is for the full quantum computer.

The strategy for building a fault-tolerant quantum computer is as follows. At level I, the coherence time should be at least 1,000 times greater than the gate time. At level II, all errors need to be less than threshold, with particular attention given to hardware architecture and gate design for the lowest two-qubit error. The design should allow scaling without increasing errors. Scaling begins at level IV: nine qubits give the first measurement of fault tolerance with ΛX, 81 qubits give the proper quantum measure of Λ, and then about 10^3 qubits allow for exponentially reduced errors. At levels V through VII, 10^4 qubits are needed for logical gates and, finally, about 10^5 qubits will be used to build a demonstration quantum computer.

The discussion here focuses on optimising Λ, but having fast qubit logic is desirable to obtain a short run time. Run times can also be shortened by using a more parallel algorithm, as has been proposed for factoring. Quantum logic that is 1,000 times slower can be compensated for with about 1,000 times more qubits.

Scaling up the number of qubits while maintaining low error is a crucial requirement for level IV and beyond. Scaling is significantly more difficult than for classical bits, as system performance will be affected by small crosstalk between the many qubits and control lines. This criterion makes large qubits desirable, as more room is then available for separating signals and incorporating integrated control logic and memory. Note that this differs from standard classical scaling of CMOS and Moore’s law, where the main aim is to decrease transistor size.

Superconducting qubits have macroscopic wavefunctions and are therefore well suited for the challenges of scaling with control. I expect qubit cells to be in the 30–300 μm size range, but clearly any design with millions of qubits will have to properly trade off density against control area based on experimental capabilities.

In conclusion, progress in making a fault-tolerant quantum computer must be closely tied to error metrology, as improvements with scaling will only occur when errors are below threshold. Research should particularly focus on two-qubit gates, as they are the most difficult to operate well with low errors. As experiments are now within the fault-tolerant range, many exciting developments are possible in the next few years.

Quantum parity. An arbitrary qubit state is written as |Ψ⟩ = cos(θ/2)|0⟩ + e^(iϕ) sin(θ/2)|1⟩, where the continuous variables θ and ϕ are the bit amplitude and phase. A bit measurement collapses the state into |0⟩ (|1⟩) with probability cos²(θ/2) (sin²(θ/2)), thus digitising the error. In general, measurement allows qubit errors to be described as either a bit flip X̂ (|0⟩↔|1⟩) or a phase flip Ẑ (|1⟩↔−|1⟩). According to the Heisenberg uncertainty principle, it is not possible to simultaneously measure the amplitude and phase of a qubit, so obtaining information on a bit flip induces a loss of information on phase equivalent to a random phase flip, and vice versa. This property comes fundamentally from bit and phase flips not commuting, [X̂, Ẑ] = X̂Ẑ − ẐX̂ ≠ 0; the sequence of the two operations matters. Quantum error correction takes advantage of an interesting property of qubits, X̂Ẑ = −ẐX̂, so that a change in sequence just produces a minus sign. With X̂₁X̂₂ and Ẑ₁Ẑ₂ corresponding to two-qubit bit and phase parities, they now commute because a minus sign is picked up from each qubit

(6) [X̂₁X̂₂, Ẑ₁Ẑ₂] = X̂₁X̂₂Ẑ₁Ẑ₂ − Ẑ₁Ẑ₂X̂₁X̂₂
(7) = X̂₁X̂₂Ẑ₁Ẑ₂ − (−1)² X̂₁X̂₂Ẑ₁Ẑ₂
(8) = 0.

The two parities can now be known simultaneously, implying they are classical-like: a change in one parity can be measured without affecting the other.
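
These relations are easy to verify numerically; the following Python check (my own sketch) confirms that single-qubit X̂ and Ẑ anticommute while the two-qubit parities X̂₁X̂₂ and Ẑ₁Ẑ₂ commute, exactly as in equations (6)–(8):

import numpy as np

X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])

def commutator(a, b):
    return a @ b - b @ a

XX = np.kron(X, X)  # two-qubit bit parity
ZZ = np.kron(Z, Z)  # two-qubit phase parity

print(np.allclose(X @ Z, -Z @ X))          # True: XZ = -ZX (anticommute)
print(np.allclose(commutator(X, Z), 0))    # False: [X, Z] != 0
print(np.allclose(commutator(XX, ZZ), 0))  # True: the parities commute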