Characterizing quantum noise is an essential step in the development of quantum hardware1,2. Remarkably, despite recent progress in both gate-level and scalable noise characterization methods3,4,5,6,7,8,9,10,11,12,13,14,15,16, the full characterization of the noise channel of a single CNOT/CZ gate remains infeasible. This is unlikely to be caused by limitations of existing benchmarking algorithms. Instead, it is believed to be related to the fundamental question of what information about a quantum system can be learned, in a setting where initial states, gates, and measurements are all subject to unknown quantum noise. It is well known that some information about quantum noise can be learned (such as the gate fidelity learned by randomized benchmarking3,4,5,6,7 or cycle benchmarking9), but not everything can be learned (due to the gauge freedom in gate set tomography17,18,19). The boundary of learnability of quantum noise, that is, a precise understanding of what information is learnable and what is not, remains an open question.

Recently, there has been an interest in formulating noise characterization as learning unknown gate-dependent Pauli noise channels9,11. This is motivated by randomized compiling, a technique that has been proposed to suppress coherent errors via inserting random Pauli gates20,21. As an added benefit, randomized compiling twirls the gate-dependent CPTP noise channel into Pauli noise, thus reducing the number of parameters to be learned. Note that the twirled Pauli noise channel corresponds to the diagonal of the process matrix of the CPTP map, so Pauli noise learning is a necessary step for characterizing the CPTP map, regardless of whether randomized compiling is performed.

However, even under this simplified setting of Pauli noise learning, all prior experimental attempts can only partially characterize the noise channel of a single CNOT/CZ gate21,22,23, which only has 15 degrees of freedom. A natural question is whether this limitation is caused by the fundamental unlearnability of the noise channel, and if so, which part of the noise channel and how many degrees of freedom among the 15 are unlearnable?

In this paper, we give a precise characterization of what information in the Pauli noise channel attached to Clifford gates is learnable, in a way that is robust against state preparation and measurement (SPAM) noise. We develop a systematic method for characterizing learnable degrees of freedom of a Clifford gate set using notions from algebraic graph theory and show that learnable information exactly corresponds to the cycle space of the Pauli pattern transfer graph, while unlearnable information exactly corresponds to the cut space. This characterization can be used to write down a list of linear functions of the noise model that corresponds to all independent learnable degrees of freedom. As an example, we show that the Pauli noise channel of an arbitrary 2-qubit Clifford gate has at most 2 unlearnable degrees of freedom. We perform an experimental characterization of a CNOT gate on IBM Quantum hardware24 up to 2 unlearnable degrees of freedom. Although the unlearnable information cannot be estimated with high precision, we can determine a feasible region of those freedoms using the constraint that the noise model must be physical (i.e., all Pauli error rates are nonnegative).

A corollary of our result is that cycle benchmarking is optimal in the setting we consider, in the sense that it can learn all the information that is learnable. This reveals a fundamental fact about noise benchmarking: cycle benchmarking, the idea of repeatedly applying the same gate sequence interleaved by single-qubit gates, is the “right” algorithm for benchmarking Clifford gates, because learnable information forms a cycle space. As an interesting side remark, the term “cycle” in cycle benchmarking originally refers to parallel gates applied in a clock cycle. Here we show that the term can also be understood in a graph-theoretical context.

In addition, we explore ways to overcome the unlearnability barrier. It has been recognized that the unlearnability does not apply if the initial state \({\left|0\right\rangle }^{\otimes n}\) can be prepared perfectly15,23, and it has been suggested that state preparation noise could be much smaller than gate and/or measurement noise in practice25,26,27, which would make gate noise fully learnable up to small error. We develop an algorithm based on cycle benchmarking that fully learns gate-dependent Pauli noise channels assuming perfect initial state preparation, and experimentally demonstrate the method on IBM’s CNOT gate. Based on the experimental data, we conclude that this assumption is unlikely to be correct in our experiment, as it gives unphysical estimates that lie outside the feasible region we determined. Furthermore, we use the data to obtain a lower bound on the state preparation noise and conclude that it has the same order of magnitude as gate noise on the device we used. Therefore, the issue of unlearnability is a practically relevant concern, for which the noise on initial states is an important factor that cannot be neglected on current quantum hardware.


Theory of learnability

We start by considering the learnability of the Pauli noise channel of a single n-qubit Clifford gate. A Pauli channel can be written as

$$\Lambda (\cdot )=\mathop{\sum}\limits_{a\in {{\mathsf{P}}}^{n}}{p}_{a}{P}_{a}(\cdot ){P}_{a},$$

where \(\{{p}_{a}\}\) is a probability distribution on \({{\mathsf{P}}}^{n}={\{I,X,Y,Z\}}^{n}\). The goal is to learn this distribution, which has \({4}^{n}-1\) degrees of freedom. Considering Λ as a linear map, its eigenvectors exactly correspond to all n-qubit Pauli operators, as

$$\Lambda ({P}_{a})={\lambda }_{a}{P}_{a},\quad \forall a\in {{\mathsf{P}}}^{n}$$

where \({\lambda }_{a}={\sum }_{b\in {{\mathsf{P}}}^{n}}{p}_{b}{(-1)}^{\langle a,b\rangle }\) is the Pauli fidelity associated with the Pauli operator \({P}_{a}\), and \(\langle a,b\rangle\) equals 0 if \({P}_{a}\) and \({P}_{b}\) commute and 1 if they anticommute. Therefore Λ is a linear map with known eigenvectors and unknown eigenvalues, so a natural way to learn Λ is to first learn all the Pauli fidelities \({\lambda }_{a}\), and then reconstruct the Pauli errors via \({p}_{a}=\frac{1}{{4}^{n}}{\sum }_{b\in {{\mathsf{P}}}^{n}}{\lambda }_{b}{(-1)}^{\langle a,b\rangle }\).
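The conversion between Pauli error rates and Pauli fidelities described above can be sketched numerically. The snippet below is a minimal illustration, not the authors' code: the helper names and the toy error rates are assumptions made for this example, and \(\langle a,b\rangle\) is taken to be 0 for commuting and 1 for anticommuting Pauli strings.

```python
import itertools
import numpy as np


def anticommute(a, b):
    """Return <a,b>: 0 if the Pauli strings commute, 1 if they anticommute."""
    # Two single-qubit Paulis anticommute iff both are non-identity and differ.
    return sum(p != "I" and q != "I" and p != q for p, q in zip(a, b)) % 2


def pauli_labels(n):
    """All 4**n Pauli strings on n qubits, in lexicographic order."""
    return ["".join(s) for s in itertools.product("IXYZ", repeat=n)]


def walsh_hadamard(n):
    """Matrix W with W[a, b] = (-1)**<a,b>; it satisfies W @ W = 4**n * identity."""
    labels = pauli_labels(n)
    return np.array([[(-1) ** anticommute(a, b) for b in labels] for a in labels])


n = 2
labels = pauli_labels(n)
W = walsh_hadamard(n)

# Toy error distribution (hypothetical numbers chosen for illustration).
p = np.zeros(4 ** n)
p[labels.index("II")] = 0.97
p[labels.index("IX")] = 0.02
p[labels.index("ZZ")] = 0.01

lam = W @ p                  # Pauli fidelities from error rates
p_back = W @ lam / 4 ** n    # error rates recovered from the fidelities

print(dict(zip(labels, np.round(lam, 4))))
```

In this toy model the fidelity of IZ is 0.97 − 0.02 + 0.01 = 0.96, because IX anticommutes with IZ while ZZ commutes with it.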

The convenience of working with Pauli fidelities is further demonstrated by the fact that some Pauli fidelities can be directly learned by cycle benchmarking, even with noisy state preparation and measurement. For example, consider the CNOT gate, which maps the Pauli operator IX to itself. Figure 1(a) shows the cycle benchmarking circuit. Imagine that we put the Pauli operator IX after the left red box and evolve it with the circuit; then the evolved operator (before the right red box) equals \({\lambda }_{IX}^{3}\cdot IX\), up to a ± sign (which comes from the random Pauli gates and can always be accounted for during post-processing). Here we use the convention that the noise channel happens before each CNOT gate. In experiments, we prepare a +1 eigenstate of IX (such as \(\left |+\right\rangle \left |+\right\rangle\)), measure the expectation value of IX at the end, and average over random Pauli twirling sequences. These SPAM operations are noisy and are represented as the red boxes. It has been shown9 that the measured expectation value equals

$${\mathbb{E}}\langle IX\rangle={A}_{IX}\cdot {\lambda }_{IX}^{d}$$

where the expectation is over random Pauli twirling gates and the randomness of quantum measurement, and \({A}_{IX}\) depends on SPAM noise but is independent of the circuit depth d. From this, \({\lambda }_{IX}\) can be learned by estimating the observable IX at several different depths and performing a curve fit.
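The curve-fitting step can be sketched as follows. This is a minimal illustration on noiseless synthetic data, assuming a log-linear least-squares fit; the fitting procedure actually used in the experiments is not specified here, and the function name and numbers are hypothetical.

```python
import numpy as np


def fit_decay(depths, expectations):
    """Fit E(d) = A * lam**d by linear least squares on log E(d).

    Returns (A, lam). Requires strictly positive expectation values, so a
    nonlinear fit may be preferable for noisy data near zero.
    """
    slope, intercept = np.polyfit(depths, np.log(expectations), 1)
    return np.exp(intercept), np.exp(slope)


# Synthetic decay: A models the SPAM-dependent prefactor, lam the Pauli fidelity.
depths = np.array([2, 4, 8, 16, 32])
A_true, lam_true = 0.93, 0.985
data = A_true * lam_true ** depths

A_est, lam_est = fit_decay(depths, data)
print(A_est, lam_est)
```

The SPAM-dependent prefactor only shifts the intercept of the fit, which is why the decay rate is insensitive to SPAM noise.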

Fig. 1: Cycle benchmarking for learning the Pauli noise channel of a CNOT gate.

a Standard CB circuits, where CNOT gates are interleaved by random Pauli gates (green boxes), with initial stabilizer states and Pauli basis measurements (red boxes). b CB circuits with additional interleaved single qubit Clifford gates (blue boxes).

The Pauli operator IX is special as it is invariant under CNOT. Consider another example: CNOT maps XZ to YY and vice versa. Consider Fig. 1(b), where we insert additional layers of single-qubit Clifford gates \(\sqrt{Z}\otimes \sqrt{X}\) that also map XZ to YY and vice versa (up to a minus sign that can always be accounted for during post-processing). After XZ picks up a coefficient \({\lambda }_{XZ}\) in front of the CNOT gate, it gets mapped to \({\lambda }_{XZ}\cdot YY\) by CNOT but is then rotated back to \({\lambda }_{XZ}\cdot XZ\) by \(\sqrt{Z}\otimes \sqrt{X}\). Following the same argument we conclude that both λXZ and λYY are learnable. For simplicity, here we assume that single-qubit gates are noiseless, motivated by the fact that single-qubit gates are one to two orders of magnitude less noisy than 2-qubit gates on today’s quantum hardware24. In practice, it is a standard assumption to model noise on single-qubit gates as gate-independent (e.g., ref. 23), and our noise characterization result can be interpreted as the noise channel induced by a dressed cycle which consists of a CNOT gate and two single-qubit gates20.

The main challenge comes with the next example: CNOT maps IZ to ZZ and vice versa. By directly applying cycle benchmarking as in Fig. 1(a) (with even depth d) we obtain

$${\mathbb{E}}\langle IZ\rangle={A}_{IZ}\cdot {\lambda }_{IZ}{\lambda }_{ZZ}{\lambda }_{IZ}{\lambda }_{ZZ}\cdots={A}_{IZ}{\left({\lambda }_{IZ}{\lambda }_{ZZ}\right)}^{d/2},$$

and curve fitting gives \(\sqrt{{\lambda }_{IZ}{\lambda }_{ZZ}}\) (similar results have been obtained in9,21,22,23). To learn λIZ, we may consider applying the same technique in Fig. 1(b). However, the problem is that once IZ gets mapped to ZZ, it cannot be rotated back to IZ because I is invariant under single qubit unitary gates. The main difference between this example and previous examples is that here the Pauli weight pattern (an n-bit binary string with 0 indicating identity and 1 indicating non-identity) changes from 01 to 11, thus making the single qubit rotation tool inapplicable.

In fact, we can go on to prove that λIZ (as well as λZZ) is unlearnable. Here unlearnable means that there exist two noise models in which the parameter λIZ differs, but which are indistinguishable by any quantum experiment, meaning that any quantum experiment generates exactly the same output statistics under the two noise models. The result also generalizes to arbitrary n-qubit Clifford gates.

Theorem 1

Given an n-qubit Clifford gate \({{{{{{{\mathcal{G}}}}}}}}\) and an n-qubit Pauli operator Pa, the Pauli fidelity λa of the noise channel attached to \({{{{{{{\mathcal{G}}}}}}}}\) is learnable if and only if \({{{{{{{\rm{pt}}}}}}}}({{{{{{{\mathcal{G}}}}}}}}({P}_{a}))={{{{{{{\rm{pt}}}}}}}}({P}_{a})\). Here pt denotes the Pauli weight pattern.

The “if” part follows directly from cycle benchmarking as discussed above. For the “only if” part, when \({{{{{{{\rm{pt}}}}}}}}({{{{{{{\mathcal{G}}}}}}}}({P}_{a}))\ne {{{{{{{\rm{pt}}}}}}}}({P}_{a})\), we construct a gauge transformation to prove the unlearnability of λa, following ideas from gate set tomography17,18,19. A gauge transformation is an invertible linear map \({{{{{{{\mathcal{M}}}}}}}}\) that converts a noise model (initial states ρi, POVM operators Ej, noisy gates Gk) to a new noise model as

$${\rho }_{i}\mapsto {{{{{{{\mathcal{M}}}}}}}}({\rho }_{i}),\quad {E}_{j}\mapsto {({{{{{{{{\mathcal{M}}}}}}}}}^{-1})}^{{{{\dagger}}} }({E}_{j}),\quad {G}_{k}\mapsto {{{{{{{\mathcal{M}}}}}}}}\circ {G}_{k}\circ {{{{{{{{\mathcal{M}}}}}}}}}^{-1},$$

with the constraint that the new noise model is physical. Note that the old and new noise models are indistinguishable by definition. To construct such a gauge transformation, as \({{{{{{{\rm{pt}}}}}}}}({{{{{{{\mathcal{G}}}}}}}}({P}_{a}))\, \ne \,{{{{{{{\rm{pt}}}}}}}}({P}_{a})\), there exists a bit on which the two Pauli weight patterns differ. We then define \({{{{{{{\mathcal{M}}}}}}}}\) as a single-qubit depolarizing noise channel on the corresponding qubit. In this way we can show that the old and new noise models assign different values to λa, which means λa is unlearnable. This proof naturally implies that using other noisy gates from the gate set (that are subject to different unknown noise channels) does not change the learnability of Pauli fidelities. More details of the proof are given in Supplementary Section II B. As a side remark, it is known that under the stronger assumption of gate-independent noise (where different multi-qubit gates are assumed to have the same noise channel), the noise channel is fully learnable28,29,30.
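The condition of Theorem 1 can be checked mechanically for a CNOT gate. The sketch below assumes the convention used in the examples above (first qubit is the control, second is the target, signs ignored) and represents each Pauli by its (x, z) bits; the helper names are illustrative.

```python
import itertools

# Symplectic (x, z) representation of single-qubit Paulis.
TO_BITS = {"I": (0, 0), "X": (1, 0), "Y": (1, 1), "Z": (0, 1)}
FROM_BITS = {bits: p for p, bits in TO_BITS.items()}


def cnot(pauli):
    """Conjugate a 2-qubit Pauli string by CNOT (control first), ignoring the sign."""
    (xc, zc), (xt, zt) = TO_BITS[pauli[0]], TO_BITS[pauli[1]]
    # X on the control spreads to the target; Z on the target spreads to the control.
    return FROM_BITS[(xc, zc ^ zt)] + FROM_BITS[(xt ^ xc, zt)]


def pattern(pauli):
    """Pauli weight pattern: '0' for identity, '1' for non-identity."""
    return "".join("0" if p == "I" else "1" for p in pauli)


def is_learnable(gate, pauli):
    """Theorem 1: lambda_a is learnable iff pt(G(P_a)) == pt(P_a)."""
    return pattern(gate(pauli)) == pattern(pauli)


paulis = ["".join(s) for s in itertools.product("IXYZ", repeat=2)]
learnable = [a for a in paulis if is_learnable(cnot, a)]
unlearnable = [a for a in paulis if not is_learnable(cnot, a)]

print("individually learnable:", learnable)
print("individually unlearnable:", unlearnable)
```

Under this convention the check reproduces the three examples in the text: IX and XZ preserve their patterns, while IZ (mapped to ZZ) does not.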

Theorem 1 provides a simple condition for determining the learnability of individual Pauli fidelities, but it is not sufficient for characterizing the learnability of joint functions of different Pauli fidelities. In the CNOT example, we know that both λIZ and λZZ are unlearnable, but we also know that their product λIZλZZ is learnable. This means that there is only one unlearnable degree of freedom in the two parameters {λIZ, λZZ}. In the following we show how to determine learnable and unlearnable degrees of freedom of Pauli noise, and also generalize the discussion from a single gate to a gate set.

We start by defining learnable information. Consider a Clifford gate set with m gates, where we model each gate as an n-qubit gate associated with an n-qubit Pauli noise channel. This model is applicable to both individual gates (e.g. a 2-qubit system where each 2-qubit gate is implemented by a different physical process and subject to a different noise channel) as well as parallel applications of gates (e.g. an n-qubit system where each “gate” in the gate set is implemented by a layer of 2-qubit gates; the n-qubit noise channel models the crosstalk among the 2-qubit gates). The goal is to characterize the learnable degrees of freedom among the \(m\cdot {4}^{n}\) parameters.

Recall that the output of cycle benchmarking is a product of Pauli fidelities (including SPAM noise). We further show that without loss of generality this is the only type of information that we need to obtain from quantum experiments for the purpose of noise learning. This is because in general the output probability of any quantum experiment can be expressed as a sum of products of Pauli fidelities, and each individual product can be learned by cycle benchmarking (Supplementary Section IV). We therefore consider learning functions of the noise model that can be expressed as a product of Pauli fidelities (also see below Eq. (7) for a related discussion). This can be reduced to considering functions of the form \(f={\sum }_{a,{{{{{{{\mathcal{G}}}}}}}}}{v}_{a}^{{{{{{{{\mathcal{G}}}}}}}}}\cdot {l}_{a}^{{{{{{{{\mathcal{G}}}}}}}}}\), where \({l}_{a}^{{{{{{{{\mathcal{G}}}}}}}}}:\!\!=\log {\lambda }_{a}^{{{{{{{{\mathcal{G}}}}}}}}}\) is the log Pauli fidelity, \({v}_{a}^{{{{{{{{\mathcal{G}}}}}}}}}\in {\mathbb{R}}\), and the superscript \({{{{{{{\mathcal{G}}}}}}}}\) denotes the corresponding Clifford gate. In the CNOT example lIZ + lZZ is a learnable function. The idea of learning log Pauli fidelities in benchmarking has also been considered in15,31. The advantage of considering log Pauli fidelities here is that the set of all learnable functions f forms a vector space. Therefore to characterize all independent learnable degrees of freedom, we only need to determine a basis of the vector space.

Recall that the reason that lIZ + lZZ is learnable in the CNOT example is that the path of the Pauli operator in the cycle benchmarking circuit forms a cycle IZ → ZZ → IZ → ⋯, and the product of Pauli fidelities along the cycle (λIZλZZ) can be learned via curve fitting. In general, as we can also insert single-qubit Clifford gates in between, we do not need to differentiate between X, Y, Z. We therefore consider the pattern transfer graph associated with a Clifford gate set, where vertices correspond to binary Pauli weight patterns and each edge is labeled by the Pauli fidelity of the incoming Pauli operator. The graph has \({2}^{n}\) vertices and \(m\cdot {4}^{n}\) directed edges. Figure 2 shows the pattern transfer graphs of CNOT and SWAP, which can also be merged to form the pattern transfer graph of the gate set {CNOT, SWAP}. Consider an arbitrary cycle in the pattern transfer graph C = (e1, …, ek) where each edge ei is associated with some Pauli fidelity λi. Following Fig. 1(b), a cycle benchmarking circuit can be constructed which learns the product of the Pauli fidelities along the cycle, or equivalently the function \({f}_{C}:\!\!={\sum }_{{e}_{i}\in C}\log {\lambda }_{i}\) can be learned. This implies that every function in the set of linear combinations of cycles \(\{{\sum }_{C\in {\rm{cycles}}}{\alpha }_{C}{f}_{C}:{\alpha }_{C}\in {\mathbb{R}}\}\) is learnable. In the following we show that this in fact corresponds to all learnable information about Pauli noise.

Fig. 2: Pattern transfer graph of CNOT, SWAP, and a gate set consisting of CNOT and SWAP.

Here, multiple edges are represented by a single edge with multiple labels. The labels on the first two graphs are gate dependent, though we omit the superscripts of CNOT or SWAP. The labels on the last graph are a combination of the first two graphs and are omitted for clarity.

We label the edges of the pattern transfer graph as e1, …, eM, where \(M=m\cdot {4}^{n}\) and each edge ei is a variable that represents some log Pauli fidelity. The goal is to characterize the learnability of linear functions of the edge variables \(f=\mathop{\sum }\nolimits_{i=1}^{M}{v}_{i}{e}_{i}\), \({v}_{i}\in {\mathbb{R}}\). The set of linear functions can be equivalently understood as a vector space of dimension M, called the edge space of the graph, where f corresponds to a vector (v1, …, vM) and we think of e1, …, eM as the standard basis. Following the above discussion, the cycle space of the graph is defined as \({\rm{span}}\{{\sum }_{e\in C}e:C\ {\rm{is\ a\ cycle}}\}\), which is a subspace of the edge space. We also define another subspace, the cut space, as \({\rm{span}}\{{\sum }_{e\in C}{(-1)}^{e\ {\rm{from}}\ {V}_{1}\ {\rm{to}}\ {V}_{2}}e:C\ {\rm{is\ a\ cut\ between\ a\ partition\ of\ vertices}}\ {V}_{1},{V}_{2}\}\). It is known that the edge space is the orthogonal direct sum of the cycle space and the cut space for any graph32. Interestingly, we show that the complementarity between cycle and cut space happens to be the dividing line that determines the learnability of Pauli noise.
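One way to make the cycle/cut decomposition concrete is through the vertex-edge incidence matrix of the pattern transfer graph: its rows span the cut space, so a coefficient vector lies in the cycle space exactly when the incidence matrix annihilates it. The sketch below builds this matrix for a single CNOT gate. The helper names are illustrative, and identifying the cycle space with the kernel of the incidence matrix is a standard graph-theory convention assumed here, not a construction quoted from the text.

```python
import itertools
import numpy as np

TO_BITS = {"I": (0, 0), "X": (1, 0), "Y": (1, 1), "Z": (0, 1)}
FROM_BITS = {bits: p for p, bits in TO_BITS.items()}
PAULIS = ["".join(s) for s in itertools.product("IXYZ", repeat=2)]
PATTERNS = ["".join(b) for b in itertools.product("01", repeat=2)]


def cnot(pauli):
    """Conjugate a 2-qubit Pauli string by CNOT (control first), ignoring the sign."""
    (xc, zc), (xt, zt) = TO_BITS[pauli[0]], TO_BITS[pauli[1]]
    return FROM_BITS[(xc, zc ^ zt)] + FROM_BITS[(xt ^ xc, zt)]


def pattern(pauli):
    return "".join("0" if p == "I" else "1" for p in pauli)


def incidence_matrix(gate):
    """Rows: weight patterns. Columns: edges, one per Pauli a, from pt(a) to pt(gate(a))."""
    B = np.zeros((len(PATTERNS), len(PAULIS)), dtype=int)
    for e, a in enumerate(PAULIS):
        B[PATTERNS.index(pattern(a)), e] += 1        # edge leaves pt(a)
        B[PATTERNS.index(pattern(gate(a))), e] -= 1  # edge enters pt(gate(a)); self-loops cancel
    return B


def in_cycle_space(B, coefficients):
    """Check whether f = sum_a v_a * l_a is orthogonal to every row of B."""
    v = np.array([coefficients.get(a, 0.0) for a in PAULIS])
    return bool(np.allclose(B @ v, 0))


B = incidence_matrix(cnot)
print(in_cycle_space(B, {"IZ": 1, "ZZ": 1}))  # l_IZ + l_ZZ
print(in_cycle_space(B, {"IZ": 1}))           # l_IZ alone
print(in_cycle_space(B, {"IX": 1}))           # l_IX (self-loop)
```

For this gate the check agrees with the examples discussed so far: the sum lIZ + lZZ and the self-loop lIX pass, while lIZ on its own does not.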

Theorem 2

The vector space of learnable functions of the Pauli noise channels associated with an n-qubit Clifford gate set is equivalent to the cycle space of the pattern transfer graph. In other words,

$$\begin{array}{rcl}{\rm{All\ information}}&\equiv &{\rm{Edge\ space}},\\ {\rm{Learnable\ information}}&\equiv &{\rm{Cycle\ space}},\\ {\rm{Unlearnable\ information}}&\equiv &{\rm{Cut\ space}}.\end{array}$$

This implies that the number of unlearnable degrees of freedom equals \({2}^{n}-c\), where c is the number of connected components of the pattern transfer graph.

The learnability of cycle space follows from cycle benchmarking as discussed above. To prove the unlearnability of cut space, we use a similar argument as in Theorem 1 and show that a gauge transformation can be constructed for each cut in the pattern transfer graph. By linearity, this implies that any vector in the cut space corresponds to a gauge transformation. By definition, a learnable function must be orthogonal to all such vectors and thus orthogonal to the entire cut space. More details of the proof are given in Supplementary Section II C.

It is a well-known fact in graph theory32 that the cycle space of a directed graph G = (V, E) has dimension \(|E|-|V|+c\) while the cut space has dimension \(|V|-c\), where c ≥ 1 is the number of connected components in G (a (weakly) connected component is a maximal subgraph in which every vertex is reachable from every other vertex via an undirected path). Theorem 2 implies that among the \(m\cdot {4}^{n}\) degrees of freedom of the Pauli noise associated with a Clifford gate set, there are \({2}^{n}-c\) unlearnable degrees of freedom. This shows that while the number of unlearnable degrees of freedom can be exponentially large, they only occupy an exponentially small fraction of the entire space. In addition, a cycle and cut basis can be efficiently determined for a given graph, though in our case this takes exponential time because the pattern transfer graph itself is exponentially large. However, computing the cycle/cut basis is not the bottleneck, as the information to be learned also grows exponentially with the number of qubits. For small system sizes such as 2-qubit Clifford gates, we can write down a cycle basis as shown in Table 1(a) for the CNOT and SWAP gates, which represents all learnable information about these gates. The CNOT gate has 2 unlearnable degrees of freedom while the SWAP gate has 1 unlearnable degree of freedom. As the pattern transfer graph has at least 2 connected components, we conclude that the Pauli noise channel of a 2-qubit Clifford gate has at most 2 unlearnable degrees of freedom. Note that when treating {CNOT, SWAP} together as a gate set, there are only 2 unlearnable degrees of freedom according to Theorem 2 instead of 2 + 1 = 3, because there is one additional learnable degree of freedom (such as \({l}_{IZ}^{{\rm{CNOT}}}+{l}_{XX}^{{\rm{CNOT}}}+{l}_{XI}^{{\rm{SWAP}}}\)) that is a joint function of the two gates.
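The counts quoted above for CNOT, SWAP, and the joint gate set can be reproduced by counting weakly connected components of the pattern transfer graph. The following is a small sketch restricted to 2-qubit gates; the function names and the union-find implementation are choices made for this illustration.

```python
import itertools

TO_BITS = {"I": (0, 0), "X": (1, 0), "Y": (1, 1), "Z": (0, 1)}
FROM_BITS = {bits: p for p, bits in TO_BITS.items()}
PAULIS = ["".join(s) for s in itertools.product("IXYZ", repeat=2)]
PATTERNS = ["".join(b) for b in itertools.product("01", repeat=2)]


def cnot(pauli):
    """Conjugate a 2-qubit Pauli string by CNOT (control first), ignoring the sign."""
    (xc, zc), (xt, zt) = TO_BITS[pauli[0]], TO_BITS[pauli[1]]
    return FROM_BITS[(xc, zc ^ zt)] + FROM_BITS[(xt ^ xc, zt)]


def swap(pauli):
    """Conjugate a 2-qubit Pauli string by SWAP."""
    return pauli[::-1]


def pattern(pauli):
    return "".join("0" if p == "I" else "1" for p in pauli)


def unlearnable_dof(gates):
    """Return 2**n - c for n = 2, with c the number of weakly connected components."""
    parent = {w: w for w in PATTERNS}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    # Each Pauli a contributes an edge pt(a) -> pt(gate(a)); direction is irrelevant here.
    for gate in gates:
        for a in PAULIS:
            parent[find(pattern(a))] = find(pattern(gate(a)))

    components = len({find(w) for w in PATTERNS})
    return len(PATTERNS) - components


print(unlearnable_dof([cnot]), unlearnable_dof([swap]), unlearnable_dof([cnot, swap]))
```

The pattern 00 is always its own component because the identity is mapped to itself, which is why a 2-qubit gate can have at most 4 − 2 = 2 unlearnable degrees of freedom.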

Table 1 A complete basis for the learnable linear functions of log Pauli fidelities and Pauli error rates for a single CNOT/SWAP gate

Finally, the learnability of Pauli errors can be determined by the learnability of Pauli fidelities according to the Walsh-Hadamard transform \({p}_{a}=\frac{1}{{4}^{n}}{\sum }_{b\in {{\mathsf{P}}}^{n}}{\lambda }_{b}{(-1)}^{\langle a,b\rangle }\). An issue here is that Pauli errors are linear functions of {λb} instead of \(\{\log {\lambda }_{b}\}\). Here we make a standard assumption in the literature9,10 that the total Pauli error is sufficiently small. In this case all individual Pauli errors are close to 0 while all individual Pauli fidelities are close to 1. Therefore the Pauli errors can be estimated via

$${p}_{a}=\frac{1}{{4}^{n}}\mathop{\sum}\limits_{b\in {{\mathsf{P}}}^{n}}{\lambda }_{b}{(-1)}^{\langle a,b\rangle }\approx \frac{1}{{4}^{n}}\mathop{\sum}\limits_{b\in {{\mathsf{P}}}^{n}}{(-1)}^{\langle a,b\rangle }\left(1+\log {\lambda }_{b}\right),$$

which means that their learnability can be determined by Theorem 2. In fact it has been suggested31 that any function of Pauli fidelities can be estimated in this way (as a linear function of log Pauli fidelities) up to a first-order approximation, which means that the learnability of any function of Pauli fidelities can be determined by Theorem 2. In Table 1 (c) we show the learnable Pauli errors for CNOT and SWAP, where “learnable” is in an approximate sense up to Eq. (7). Interestingly, for these two gates, the learnable functions of Pauli errors have the same form as the cycle basis, i.e. the cycle space is invariant under Walsh-Hadamard transform. We calculate the learnable Pauli errors for up to 4-qubit random Clifford gates and this seems to be true in general. We leave a rigorous investigation into this phenomenon for future work.
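The quality of the first-order approximation \({\lambda }_{b}\approx 1+\log {\lambda }_{b}\) can be checked numerically. The sketch below uses a hypothetical 2-qubit Pauli channel with a total error rate of 3% (a value chosen only for illustration) and compares the exact error rates with those reconstructed from log Pauli fidelities.

```python
import itertools
import numpy as np


def anticommute(a, b):
    """Return <a,b>: 0 if the Pauli strings commute, 1 if they anticommute."""
    return sum(p != "I" and q != "I" and p != q for p, q in zip(a, b)) % 2


labels = ["".join(s) for s in itertools.product("IXYZ", repeat=2)]
W = np.array([[(-1) ** anticommute(a, b) for b in labels] for a in labels])

# Hypothetical channel: every non-identity Pauli error occurs with probability 0.2%.
p = np.full(16, 0.002)
p[labels.index("II")] = 1 - 15 * 0.002

lam = W @ p                                   # exact Pauli fidelities
p_first_order = W @ (1 + np.log(lam)) / 16    # linearized reconstruction

max_dev = np.max(np.abs(p_first_order - p))
print(max_dev)
```

In this example the largest deviation appears in the identity component and is second order in the total error rate, consistent with the small-error assumption stated above.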

Experiments on IBM Quantum hardware

We demonstrate our theory on IBM quantum hardware24 using a minimal example – characterizing the noise channel of a CNOT gate. In our experiments both the gate noise and SPAM noise are twirled into Pauli noise using randomized compiling. In the following we show how to extract all learnable information of Pauli noise SPAM-robustly, and also attempt to estimate the unlearnable degrees of freedom by making additional assumptions.

First, we conduct two types of cycle benchmarking (CB) experiments, the standard CB and CB with interleaved single-qubit gates (called interleaved CB), as shown in Fig. 1. The results are shown in Fig. 3. Here a set of two Pauli labels on the x-axis (e.g., {IZ, ZZ}) corresponds to the geometric mean of the two Pauli fidelities (e.g., \(\sqrt{{\lambda }_{IZ}{\lambda }_{ZZ}}\)). Comparing to Table 1, we see that all learnable information about the Pauli fidelities (both individual fidelities and products of two fidelities) is successfully extracted. Also note from Fig. 3 that the two types of CB experiments give consistent estimates, in terms of both the process fidelity and individual Pauli fidelities (e.g., \(\sqrt{{\lambda }_{XZ}{\lambda }_{YY}}\) estimated from standard CB is consistent with λXZ and λYY from interleaved CB).

Fig. 3: Estimates of Pauli fidelities of IBM’s CNOT gate via standard CB (left) and CB with interleaved gates (right), using circuits shown in Fig. 1.

Data are collected from ibmq_montreal on 2022-03-23. Each Pauli fidelity is fitted using seven different circuit depths \(L=[2,{2}^{2},\ldots,{2}^{7}]\). For each depth, C = 60 random circuits and 1000 shots of measurements are used. Throughout this paper, the error bar represents the standard error.

We have shown that all 13 learnable degrees of freedom (excluding the trivial λII = 1) are extracted in Fig. 3 by comparing with Table 1, and there remain 2 unlearnable degrees of freedom. We can bound the feasible region of the 2 unlearnable degrees of freedom using physical constraints, i.e., the reconstructed Pauli noise channel must be completely positive. This is equivalent to requiring pa ≥ 0 for all Pauli error rates pa. We choose λXX and λZZ as a representation of the unlearnable degrees of freedom, and plot the calculated feasible region in Fig. 4(a), which happens to be a rectangular area. We also calculate the feasible region for each unlearnable Pauli fidelity and Pauli error rate, which are presented in Fig. 4(b), (c). In particular, we choose two extreme points (blue and green dots in Fig. 4(a)) in the feasible region and plot the corresponding noise model in Fig. 4(b), (c). Note that the (approximately) learnable Pauli error rates (on the left of the red vertical dashed line) are nearly invariant under change of gauge degrees of freedom, but they can be estimated to be negative due to statistical fluctuation. Thus, when we calculate the physical constraints, we only require those unlearnable Pauli error rates (on the right of the red vertical dashed line) to be non-negative.
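The idea of bounding an unlearnable degree of freedom by physicality can be sketched on synthetic data. The snippet below is an illustration under several assumptions, not the analysis used for Fig. 4: the noise model is hypothetical (uniform error rates of 0.1%), only one gauge direction is scanned (the cut separating the pattern 01 from the other patterns), non-negativity is imposed on all error rates, and the sign convention of the cut vector is a choice made here.

```python
import itertools
import numpy as np

TO_BITS = {"I": (0, 0), "X": (1, 0), "Y": (1, 1), "Z": (0, 1)}
FROM_BITS = {bits: p for p, bits in TO_BITS.items()}
LABELS = ["".join(s) for s in itertools.product("IXYZ", repeat=2)]


def cnot(pauli):
    """Conjugate a 2-qubit Pauli string by CNOT (control first), ignoring the sign."""
    (xc, zc), (xt, zt) = TO_BITS[pauli[0]], TO_BITS[pauli[1]]
    return FROM_BITS[(xc, zc ^ zt)] + FROM_BITS[(xt ^ xc, zt)]


def pattern(pauli):
    return "".join("0" if p == "I" else "1" for p in pauli)


def anticommute(a, b):
    return sum(p != "I" and q != "I" and p != q for p, q in zip(a, b)) % 2


W = np.array([[(-1) ** anticommute(a, b) for b in LABELS] for a in LABELS])


def cut_vector(gate, subset):
    """+1 for edges leaving `subset`, -1 for edges entering it, 0 otherwise."""
    return np.array(
        [int(pattern(a) in subset) - int(pattern(gate(a)) in subset) for a in LABELS]
    )


def feasible_interval(p, cut, etas):
    """Scan gauge shifts l_a -> l_a + eta * cut_a and keep those with all p_a >= 0."""
    lam = W @ p
    ok = [eta for eta in etas
          if np.min(W @ (lam * np.exp(eta * cut)) / len(LABELS)) >= 0]
    return min(ok), max(ok)


# Hypothetical "true" model; every point in the interval gives the same learnable data.
p = np.full(16, 0.001)
p[0] = 1 - 15 * 0.001
cut = cut_vector(cnot, {"01"})
eta_min, eta_max = feasible_interval(p, cut, np.linspace(-0.05, 0.05, 2001))
print(eta_min, eta_max)
```

Because the shift is along a cut vector, every learnable function (such as lIZ + lZZ) is unchanged across the scanned interval; only the physicality constraint limits how far the shift can go.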

Fig. 4: Feasible region of the learned Pauli noise model, using data from Fig. 3.

a Feasible region of the unlearnable degrees of freedom in terms of λXX and λZZ. b Feasible region of individual Pauli fidelities. c Feasible region of individual Pauli errors.

Next, we explore an approach to estimate the unlearnable information with additional assumptions. Suppose that one can prepare \({\left|0\right\rangle }^{\otimes n}\) perfectly. Since we assume noiseless single-qubit gates, this means we can prepare a set of perfect tomographically complete states \(\{\left|0/1\right\rangle,\left|\pm \right\rangle,\left|\pm i\right\rangle \}\). In this case, all the unlearnable degrees of freedom become learnable, as one can first perform a measurement device tomography, and then directly estimate the process matrix of a noisy gate with measurement error mitigated25. Following this general idea, we propose a variant of cycle benchmarking for Pauli noise characterization, which we call intercept CB as it uses the information of the intercept in a standard cycle benchmarking protocol. Given an n-qubit Clifford gate \({\mathcal{G}}\), let \({m}_{0}\) be the smallest positive integer such that \({{\mathcal{G}}}^{{m}_{0}}={\mathcal{I}}\). For any Pauli fidelity \({\lambda }_{a}\) (regardless of whether learnable or not according to Theorem 1), consider the following two CB experiments using the standard circuit as in Fig. 1(a). First, prepare an eigenstate of \({P}_{a}\), run CB with depth \(l{m}_{0}+1\) for some non-negative integer l, and estimate the expectation value of \({P}_{b}:\!\!={\mathcal{G}}({P}_{a})\). The result equals

$${\mathbb{E}}{\langle {P}_{b}\rangle }_{l{m}_{0}+1}={\lambda }_{{P}_{a}}^{S}{\lambda }_{{P}_{b}}^{M}{\lambda }_{a}{\left(\mathop{\prod }\limits_{k=1}^{{m}_{0}}{\lambda }_{{{{{{{{{\mathcal{G}}}}}}}}}^{k}({P}_{a})}\right)}^{l},$$

where \({\lambda }_{{P}_{a/b}}^{S/M}\) is the Pauli fidelity of the state preparation and measurement noise channel, respectively (earlier we have absorbed these two coefficients into a single coefficient A for simplicity). Second, prepare an eigenstate of Pb, run CB with depth lm0, and estimate the expectation value of Pb. The result equals

$${\mathbb{E}}{\langle {P}_{b}\rangle }_{l{m}_{0}}={\lambda }_{{P}_{b}}^{S}{\lambda }_{{P}_{b}}^{M}{\left(\mathop{\prod }\limits_{k=1}^{{m}_{0}}{\lambda }_{{{{{{{{{\mathcal{G}}}}}}}}}^{k}({P}_{a})}\right)}^{l}.$$

By fitting both \({\mathbb{E}}{\langle {P}_{b}\rangle }_{l{m}_{0}+1}\) and \({\mathbb{E}}{\langle {P}_{b}\rangle }_{l{m}_{0}}\) as exponential decays in l, extracting the intercepts (function values at l = 0), and taking the ratio, we obtain an estimator \({\widehat{\lambda }}_{a}^{\,{\rm{ICB}}}\) that is an asymptotically unbiased estimate of \({\lambda }_{a}\cdot {\lambda }_{{P}_{a}}^{S}/{\lambda }_{{P}_{b}}^{S}\). This estimator is robust against measurement noise. Note that \({\lambda }_{{P}_{a}}^{S}={\lambda }_{{P}_{b}}^{S}=1\) if we assume perfect initial state preparation, and in this case the above shows that \({\lambda }_{a}\) is learnable, and thus the entire Pauli noise channel is learnable. We note that, instead of fitting an exponential decay in l, one could in principle just take l = 0 and estimate the ratio of \({\mathbb{E}}{\langle {P}_{b}\rangle }_{1}\) to \({\mathbb{E}}{\langle {P}_{b}\rangle }_{0}\), which also yields a consistent estimate for \({\lambda }_{a}\cdot {\lambda }_{{P}_{a}}^{S}/{\lambda }_{{P}_{b}}^{S}\). If one has already obtained all the learnable information from previous experiments, this could be a more efficient approach. However, if one has not done those experiments, the intercept CB with multiple depths can estimate the intercept (unlearnable information) and slope (learnable information) simultaneously, which is more sample efficient.
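The intercept-ratio estimator can be illustrated with noiseless synthetic decays for the CNOT example with a = IZ and b = ZZ (so \({m}_{0}=2\)). The sketch below assumes a log-linear fit and hypothetical fidelity values; it only demonstrates the algebra of the two decays given above, not the authors' simulation.

```python
import numpy as np


def intercept(ls, expectations):
    """Intercept (value at l = 0) of an exponential decay fitted on a log scale."""
    _slope, log_intercept = np.polyfit(ls, np.log(expectations), 1)
    return np.exp(log_intercept)


def simulate(lam_a, lam_b, sp_a, sp_b, meas_b, ls):
    """Ideal intercept-CB signals for a gate of order 2 that swaps P_a and P_b."""
    cycle = lam_a * lam_b                          # product of fidelities along the cycle
    odd = sp_a * meas_b * lam_a * cycle ** ls      # prepare P_a, depth 2l + 1, measure P_b
    even = sp_b * meas_b * cycle ** ls             # prepare P_b, depth 2l, measure P_b
    return odd, even


ls = np.arange(0, 6)
lam_IZ, lam_ZZ = 0.99, 0.97

# Perfect state preparation, strong measurement noise: the estimate equals lam_IZ.
odd, even = simulate(lam_IZ, lam_ZZ, sp_a=1.0, sp_b=1.0, meas_b=0.9, ls=ls)
est_perfect_sp = intercept(ls, odd) / intercept(ls, even)

# Noisy state preparation: the estimate is biased by the factor sp_a / sp_b.
odd, even = simulate(lam_IZ, lam_ZZ, sp_a=0.995, sp_b=0.98, meas_b=0.9, ls=ls)
est_noisy_sp = intercept(ls, odd) / intercept(ls, even)

print(est_perfect_sp, est_noisy_sp)
```

With the chosen state-preparation fidelities the second estimate exceeds 1, which shows how a violated state-preparation assumption can produce unphysical values.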

We numerically simulate intercept CB for characterizing the CNOT gate under different state preparation (SP) and measurement (M) noise. As shown in Fig. 5, this method yields a relatively precise estimate when there is only measurement noise, even if that noise is orders of magnitude stronger than the gate noise, but deviates substantially from the true noise model even under small state preparation noise. We refer the reader to Supplementary Section III for more details about the numerical simulation.

Fig. 5: Simulation of intercept CB on CNOT under different SPAM noise rates.

The simulated noise channel is a 2-qubit amplitude damping channel with effective noise rate 5%, and SPAM noise are modeled as bit-flip errors. For the blue (green) lines, we introduce random bit-flip errors to the measurement (state preparation). The solid lines show the l1-distance of the estimated Pauli fidelities from the true Pauli fidelities. The solid lines show the l1-distance of the (individually) learnable Pauli fidelities from the ground truth.

Finally, we experimentally implement intercept CB to estimate λXX and λZZ, which are the two unlearnable degrees of freedom of CNOT, allowing us to determine all the Pauli fidelities and Pauli error rates. One challenge in interpreting the results is that we do not know in general whether the low SP noise assumption holds, therefore it is unclear if the learned results should be trusted. However, for the estimate to be correct, it should at least lie in the physically feasible region we obtained earlier in Fig. 4. In Fig. 6, we present our experimental results of intercept CB. It turns out that certain Pauli fidelities are far away from the physical region by several standard deviations. This gives strong evidence that the low SP noise assumption was not true on the platform we used.

Fig. 6: The learned Pauli noise model using intercept CB.

The feasible regions (blue bars) are taken from Fig. 4. Estimates of Pauli fidelities (a) and Pauli error rates (b). Each data point is fitted using seven different circuit depths \(L=[2,{2}^{2},\ldots,{2}^{7}]\). For each depth, C = 150 random circuits and 2000 shots of measurements are used. Data are collected from ibmq_montreal on 2022-03-23.

The data collected here can further be used to give a lower bound for the SP noise. Suppose we obtain the physical region of \({\lambda }_{a}\) to be \([{\widehat{\lambda }}_{a,\min },\, {\widehat{\lambda }}_{a,\max }]\). Combining this with the expression for the intercept CB estimator, we have

$${\widehat{\lambda }}_{a}^{{{\,{{{{{\rm{ICB}}}}}}}}}/{\widehat{\lambda }}_{a,\max }\le {\lambda }_{{P}_{a}}^{S}/{\lambda }_{{P}_{b}}^{S}\le {\widehat{\lambda }}_{a}^{{{\,{{{{{\rm{ICB}}}}}}}}}/{\widehat{\lambda }}_{a,\min }.$$

Applying this to the data of IZ and ZZ in Fig. 6(a), we have \({\lambda }_{ZZ}^{S}/{\lambda }_{IZ}^{S}\le 0.9879(23)\). If we make the physical assumption that the state preparation noise is a random bit-flip during qubit initialization, we can conclude that the bit-flip rate on the first qubit is lower bounded by 0.61(12)%. One can in principle bound the bit-flip rate on the second qubit by looking at \({\lambda }_{XX}^{S}/{\lambda }_{XI}^{S}\). Unfortunately, our estimate of \({\lambda }_{XX}\) from intercept CB falls in the physical region within one standard deviation, so there is no nontrivial lower bound. One might expect to obtain a useful lower bound by looking at a CNOT gate with reversed control and target. The lower bound on the SP noise obtained here is completely independent of the measurement noise and does not suffer from the issue of gauge freedom19, as long as all of our noise assumptions are valid, i.e., there is no significant contribution from time non-stationary, non-Markovian, or single-qubit gate-dependent noise.
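The arithmetic behind these bounds can be sketched in a few lines. The function names are illustrative assumptions, and the conversion to a bit-flip rate assumes the simple model in which a flip with probability p during initialization scales the relevant ratio of SP fidelities to 1 − 2p, so that an upper bound r on that ratio gives p ≥ (1 − r)/2. The uncertainty is propagated linearly.

```python
def sp_ratio_bounds(lam_icb, lam_min, lam_max):
    """Bounds on lambda^S_{P_a} / lambda^S_{P_b} from the displayed inequality.

    lam_icb:          intercept CB estimate of lambda_a
    lam_min, lam_max: endpoints of the physical region of lambda_a
    """
    if not 0 < lam_min <= lam_max:
        raise ValueError("expected 0 < lam_min <= lam_max")
    return lam_icb / lam_max, lam_icb / lam_min

def bitflip_lower_bound(ratio_upper, ratio_sigma=0.0):
    """Lower bound on the bit-flip rate p, assuming ratio = 1 - 2p.

    ratio <= ratio_upper implies p >= (1 - ratio_upper) / 2.
    Returns (p_min, sigma_p) with linear error propagation.
    """
    p_min = (1.0 - ratio_upper) / 2.0
    return p_min, ratio_sigma / 2.0

# Using the quoted bound 0.9879(23) on the ratio of SP fidelities.
p_min, p_sigma = bitflip_lower_bound(0.9879, 0.0023)
print(f"p >= {100 * p_min:.3f}% +/- {100 * p_sigma:.3f}%")
```

With the rounded input 0.9879(23), this gives a bound of roughly 0.6% with an uncertainty of roughly 0.12%, consistent with the quoted 0.61(12)% up to rounding of the input.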


We have shown how to characterize the learnability of Pauli noise of Clifford gates and discussed a method to extract unlearnable information by assuming perfect initial state preparation. It is also interesting to consider other physically motivated assumptions on the noise model to avoid unlearnability. For example, we can write down a parameterization of the noise model based on the underlying physical mechanism, which may have fewer than \({4}^{n}\) parameters. The main issue here is that these assumptions are highly platform-dependent and should be decided case-by-case. Moreover, it is unclear to what extent the learned results should be trusted when additional assumptions are made, since in general we cannot test whether the assumptions hold due to unlearnability.

Another direction to overcome the unlearnability is to change the model of quantum experiments. Here we have been working with the standard model as in gate set tomography, where a quantum measurement decoheres the system and only outputs classical information. However, some platforms might support quantum non-demolition (QND) measurements, and in this case measurements can be applied repeatedly, which could potentially allow more information to be learned33.

Recently, ref. 30 considered similar issues of noise learnability. They studied a different Pauli noise model with perfect initial state \(\left|0\right\rangle\), perfect computational basis measurement, and noisy single-qubit gates, and showed the existence of unlearnable information. In contrast, here we focus on the learnability of Pauli noise of multi-qubit Clifford gates assuming perfect single-qubit gates (with noisy SPAM), and in practice we make the standard assumption that noise on single-qubit gates is gate-independent (e.g., ref. 23), in which case our noise learning results are interpreted as characterizing a dressed cycle.

This work leaves open the question of noise learnability for non-Clifford gates. An issue here is that randomized compiling is not known to work with non-Clifford gates in general, so it is unclear if the general CPTP noise learnability problem can be reduced to Pauli noise. Recent work14 shows that random quantum circuits can effectively twirl the CPTP noise channel into Pauli noise and can be used to learn the total Pauli error. The question of whether more information can be learned still remains open.

Another issue to address is the scalability of noise learning. It is impossible to estimate all learnable degrees of freedom efficiently, as there are exponentially many of them (an exponential lower bound on the sample complexity is shown in ref. 16). One way to avoid the exponential scaling issue is to assume the noise model has a certain special structure (such as sparsity or low weight) such that the noise model only has polynomially many parameters10,11,22,34. It is an interesting open direction to study the characterization of learnability under these assumptions, and we give some related discussions in Supplementary Section II D.
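To make the scaling contrast concrete, the following sketch counts parameters under one possible reading of a low-weight assumption, namely that only Pauli operators acting non-trivially on at most w qubits carry independent parameters. This reading and the function names are assumptions for illustration; the structured models in the cited works may be defined differently.

```python
from math import comb

def full_pauli_parameters(n):
    """Number of non-identity n-qubit Pauli operators: 4**n - 1."""
    return 4 ** n - 1

def low_weight_pauli_parameters(n, w):
    """Number of non-identity Paulis supported on at most w of n qubits.

    Choose k qubits for the support, then one of X, Y, Z on each.
    """
    return sum(comb(n, k) * 3 ** k for k in range(1, min(w, n) + 1))

for n in (2, 10, 50):
    print(n, full_pauli_parameters(n), low_weight_pauli_parameters(n, 2))
```

For n = 2 both counts equal 15, the number of degrees of freedom of a two-qubit Pauli channel, while for fixed w the low-weight count grows only polynomially in n.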