Introduction

Quantum Computation is an emerging field that sits at an intersection between computer science, electrical engineering and quantum physics. By now, scientists and engineers alike are optimistic of its computational and technological potential, achieving a so-called computational supremacy1,2. Countless of applications for quantum devices have been proposed from building a quantum internet3,4 to enhancing and feeding into scientific discoveries5 to revolutionizing simulation6, neural network7 and optimization8. Some of these applications are already being tested and even practically implemented9. That said, quantum computer implementation technologies are still a work in progress. Results on different qubit technologies, computer models, and interconnects and more have recently been published10,11,12.

Building large-scale quantum computers is still a challenging task due to a plethora of engineering obstacles13. One prominent challenge is the intrinsic noise. In fact, implementing scalable and reliable quantum computers requires implementing quantum gates with sufficiently low error rates. There has been substantial progress in characterizing noise in a quantum system14,15,16 and in building error correcting schemes that can detect and correct certain types of errors17,18,19.

Numerous protocols have been constructed to characterize the noise in quantum devices. Many of these protocols fail in achieving one of the following desirables: scalability to large-scale quantum computers and efficient characterization of the noise. Quantum Process Tomography20 is a protocol that can give a complete description of the dynamics of a quantum black box, however, it’s not scalable to large-scale quantum systems. Randomized Benchmarking (RB) is another protocol that’s typically used to estimate the error rate of some set of quantum gates21,22. Although RB is a scalable protocol in principle, it can only measure a single error rate that’s used to approximate the average gate infidelity thus providing an incomplete description of noise. Various other protocols based on RB protocol are able to characterize the correlations of noise between the different qubits, however, these protocols lack scalability21,23,24. Quantum Error Mitigation25 (QEM) is a recently emerging field that aims to improve the accuracy of near-term quantum computational tasks26 many times through data post-processing. It is considered a much more feasible alternative, for the time being, to Quantum Error Correction (QEC) which revolves around encoding quantum states in multi-qubit entangled states27,28 to achieve universal fault-tolerant quantum computation. QEM protocols include zero-noise Richardson extrapolation29, probabilistic error cancellation through sampling circuits and taking their weighted average30,31, and exploiting state-dependent bias through invert-and-measure techniques to map the predicted state to the strongest one32. Others have worked on using diverse ansatzs/models to predict observables’ noise-free values33, distribution correction factors34 or even circuit noise metrics (to behave as an objective function for a quantum compiler35). To mention a few more, there is quantum subspace expansion, symmetry verification and countless learning-based techniques available recently36,37.

Measurement Error Mitigation (MEM) is a QEM protocol that models the noise in a quantum circuit as a measurement noise matrix \({\varvec{E}}_{meas}\) applied to the ideal output of the circuit. The columns of \({\varvec{E}}_{meas}\) are the probability distributions obtained through preparing and immediately measuring all possible \(2^n\) basis input states38.

Recently, the authors in39 developed a protocol based on the RB that relies on the concept of a Gibbs Random Field (GRF) to completely and efficiently estimate the error rates of the Pauli Channel and detect correlated errors between the qubits in a quantum computer. Their effort paves the way to enable quantum error correction and/or mitigation schemes. Herein, we refer to their efficient learning protocol as the {EL protocol}. In this paper, we build upon the EL protocol and decompose the average noise of a quantum circuit of specific depth into State Preparation and Measurement (SPAM) error and average gate error. We propose a linear algebraic based protocol and proof to efficiently construct and model the average behavior of noise in a quantum system for any desired circuit depth without having to run a large number of quantum circuits on the quantum computer or simulator. We then rely on this model to mitigate the noisy output of the quantum device. Noise mitigation schemes are often categorized into two, either active involving online processing dependent on each run and operation, or passive in which it is independent of the run and operation. Our proposed technique is considered a passive one since only one noise characterization is made for the quantum device at hand and then used to mitigate errors for any arbitrary circuit of specific depth on that device. For an n-qubit quantum system, the average behavior of the noise can be well approximated as a special form of a Pauli Channel40,41,42. A Pauli channel \(\varepsilon \) acts on a qubit state \(\varvec{\rho }\) to produce

$$\begin{aligned} \varepsilon ({\varvec{\rho }})=\sum _{i}p_i{\varvec{P}}_{i}{\varvec{\rho }}{\varvec{P}}_{i} \end{aligned}$$
(1)

where \(p_i\) is an error rate associated with the Pauli operator \({\varvec{P}}_{i}\). The \(p_i\)’s form a probability distribution \((\sum _{i}p_{i}=1)\), and are related to the eigenvalues, \(\varvec{\lambda }\), of the Pauli Channel defined as

$$\begin{aligned} \lambda _i=2^{-n}Tr({\varvec{P}}_{i}\varepsilon ({\varvec{P}}_{i})) \end{aligned}$$
(2)

Thus, when a state \({\varvec{\rho }}\) is subjected to the noisy channel \(\varepsilon \), \(p_i\) describes the probability of a multiqubit Pauli error \({\varvec{P}}_{i}\) affecting the system, while \(\lambda _i\) describes how faithfully a given multispin Pauli operator is transmitted. \({\varvec{p}}\) and \({\varvec{\lambda }}\) are related by Walsh–Hadamard transform, \({\varvec{W}}\) where

$$\begin{aligned} {\varvec{\lambda }}={\varvec{Wp}} \end{aligned}$$
(3)

While RB only estimates the average value of all \(\lambda _i\) of the Pauli Channel, the EL protocol estimates the individual \(\lambda _i\). A complete characterization of the Pauli channel requires learning more than the eigenvalues or error rates associated with single-qubit Pauli operators such as \({\varvec{\sigma }}_{z}^{(1)}\) or \({\varvec{\sigma }}_{x}^{(3)}\); it requires learning all of the noise correlations in the system, that is, also learning the eigenvalues and error rates associated with multiqubit Pauli operators such as \({\varvec{\sigma }}_{z}^{(1)} \otimes {\textbf{1}}^{(2)} \otimes {\varvec{\sigma }}_{x}^{(3)}\) and how they vary compared to the ones obtained under independent local noise. Estimating these correlations is essential for performing optimal QEC and/or QEM. However, these correlations increase exponentially as the number of qubits increases, so having an efficient noise characterization protocol is crucial to direct the error mitigation efforts to capture the critical noise correlations.

Our method relies on the error rates vector \({\varvec{p}}\) of the Pauli-Channel to decompose the average behavior of noise for circuits of depth m into two noise components: a SPAM error matrix denoted by the matrix \({\varvec{N}}\) and a depth dependent component comprising an average gate error matrix denoted by the matrix \({\varvec{M}}\). We evaluate our model for the average noise by predicting the average probability distribution for circuits of depth m and computing the distance between this predicted distribution and the empirically obtained one. Finally, we use our proposed decomposition to mitigate noisy outputs of random circuits and compare our mitigation protocol with the MEM protocol38. We applied our noise characterization and mitigation protocols on the following IBM Q 5-qubit quantum computers: Manila, Lima, and Belem43.

Results

Proposed protocol theory

The ideal output probability distribution of an n-qubit quantum circuit with depth m is perturbed by the SPAM and the average gate errors. Our aim is to construct a comprehensive linear algebraic model that takes into account both these errors for an arbitrary depth m. Matrix algebra can then be employed to mitigate the noise as follows:

$$\begin{aligned} {\varvec{C}}_{ideal}= {\varvec{Q}}_{m}^{-1}{\varvec{C}}_{noisy} \end{aligned}$$
(4)

where \({\varvec{Q}}_{m}\) is the characterized noise matrix for circuits of depth m, \({\varvec{C}}_{ideal}\) and \({\varvec{C}}_{noisy}\) are the ideal and noisy outputs of a given circuit of depth m, respectively. The straight-forward approach would be to construct \({\varvec{Q}}_m\) from empirical simulations in a similar fashion to the \({\varvec{E}}_{meas}\) noise matrix that was characterized in the MEM scheme. The columns of \({\varvec{Q}}_{m}\) comprise the emperical average probability distributions for basis input states \({|{{\varvec{in}}}\rangle }\in \{ {|{{\varvec{0}}}\rangle },\,{|{{\varvec{1}}}\rangle },\,\dots ,\,{|{{\varvec{2^n-1}}}\rangle }\}\), denoted by \(\hat{{\varvec{q}}}(m,{|{{\varvec{in}}}\rangle })\), where \(\hat{{\varvec{q}}}(m,{|{{\varvec{in}}}\rangle })\) are obtained through sampling a number of depth m circuits to incorporate the average gate and SPAM errors.

$$\begin{aligned} {\varvec{Q}}_m= \begin{bmatrix} \hat{{\varvec{q}}}(m,{|{{\varvec{0}}}\rangle })&\hat{{\varvec{q}}}(m,{|{{\varvec{1}}}\rangle })&\ldots&\hat{{\varvec{q}}}(m,{|{{\varvec{2^n-1}}}\rangle }) \end{bmatrix} \end{aligned}$$
(5)

Building \({\varvec{Q}}_m\), however, through empirical simulations can be expensive especially when the circuit depth is large. Herein, we propose a method for an efficient estimation of \({\varvec{Q}}_m\) where the individual probability distributions \(\hat{{\varvec{q}}}(m,{|{{\varvec{in}}}\rangle })\) are estimated as follows:

$$\begin{aligned} {\varvec{q}}'(m, {|{{\varvec{in}}}\rangle }) = {\varvec{N}}_{in}{\varvec{M}}_{in}^m{|{{\varvec{in}}}\rangle } \end{aligned}$$
(6)

where \({\varvec{N}}_{in}\) and \({\varvec{M}}_{in}\) are input-specific matrices that represent the SPAM error matrix and average gate error for input \({|{{\varvec{in}}}\rangle }\), respectively. Both \({\varvec{M}}_{in}\) and \({\varvec{N}}_{in}\) are extracted empirically using random circuits from a set of small circuit depths T and then used in mitigating the outputs for circuits with higher depths. We first show the construction of \({\varvec{N}}_{0}\) and \({\varvec{M}}_{0}\).

The construction of \({\varvec{N}}_{0}\) and \({\varvec{M}}_{0}\) proceeds by estimating the error rates vector \({\varvec{p}}\) associated with the Pauli Channel based on the assumption in Eq. 1 for the average behavior of the noisy quantum device at hand using the EL protocol. The protocol proceeds by constructing K random identity circuits of depth \(m \in T\)23,39. Each circuit is constructed by initializing the qubits to the all-zeros state \({|{{\varvec{0}}}\rangle }\) followed by choosing a random sequence \(s \in S_{m}\), the set of all length m sequences of one-qubit Clifford gates applied independently on each qubit, followed by an inverse gate for the chosen sequence to ensure an identity circuit. It then estimates the resulting empirical probability distribution \(\hat{{\varvec{q}}}(m,{|{{\varvec{0}}}\rangle })\) by averaging over all the empirical probability distributions \(\hat{{\varvec{q}}}(m,s,{|{{\varvec{0}}}\rangle })\) for the constructed random identity circuits of depth m, that is,

$$\begin{aligned} \hat{{\varvec{q}}}(m,{|{{\varvec{0}}}\rangle })= \frac{1}{K}\sum \hat{{\varvec{q}}}(m,s,{|{{\varvec{0}}}\rangle }) \end{aligned}$$
(7)

\(\hat{{\varvec{q}}}(m,{|{{\varvec{0}}}\rangle })\) is a vector with \(2^n\) entries each corresponding to the possible observed outcome. A Walsh–Hadamard transform is then applied on each \(\hat{{\varvec{q}}}(m,{|{{\varvec{0}}}\rangle })\) to obtain

$$\begin{aligned} {\varvec{\Lambda }}(m)={\varvec{W}}\hat{{\varvec{q}}}(m,{|{{\varvec{0}})}\rangle } \end{aligned}$$
(8)

Each parameter \(\Lambda _{i}(m)\) in \({\varvec{\Lambda }}(m)\) is fitted to the model

$$\begin{aligned} \Lambda _{i}(m)=A_{i}\lambda _{i}^{m} \end{aligned}$$
(9)

where \(A_i\) is a constant that absorbs SPAM errors and the vector \({\varvec{\lambda }}\) of all fitted parameters \(\lambda _i\) is a SPAM-free estimate to the eigenvalues of the Pauli Channel defined in Eq. 2. Notice that we can rewrite Eq. 9 as

$$\begin{aligned} {\varvec{\Lambda }}(m)={\varvec{A\lambda }}^{m} \end{aligned}$$
(10)

where \({\varvec{A}}\) is a diagonal matrix where the diagonal entries are \(A_i\) and \({\varvec{\lambda }}^m\) is an element-wise exponentiation of a vector. An inverse Walsh–Hadamard Transform is then applied on \({\varvec{\lambda }}\) to get the error rate vector \({\varvec{p}}\) of the Pauli Channel as

$$\begin{aligned} {\varvec{p}}={\varvec{W}}^{-1}{\varvec{\lambda }} \end{aligned}$$
(11)

\({\varvec{p}}\) is then projected onto a probability simplex to ensure \(\sum _{i}p_i=1\). Introducing the GRF model by the EL protocol allows the scalability of estimating \({\varvec{p}}\) with the increase in the number of qubits. The GRF model assumes the noise correlations are bounded between a number of neighboring qubits depending on the architecture of the quantum computer at hand. Thus, decreasing the number of noise correlations to be estimated.

The final outcome \({\varvec{p}}\) of the EL protocol represents the SPAM-free probability distribution of the average noise in the quantum computer. Each element \(p_i \in {\varvec{p}}\) corresponds to the probability of an error of the form binary(i) on an input state \({|{{\varvec{0}}}\rangle }\). For example, for a 5-qubit quantum computer, \(p_0\) corresponds to the probability of no bit flips on the input state, i.e., error of the form IIIII, \(p_1\) to the error of the form IIIIX, \(p_2\) to the error of the form IIIXI, etc...

In order to proceed with the proof for our proposed decomposition of Eq. (6) for input state \({|{{\varvec{0}}}\rangle }\), we first state the following lemma (the detailed proof of the lemma can be found Section I in the supplementary):

Lemma 1

Let \({\varvec{\lambda }}\) and \({\varvec{p}}\) be the respective eigenvalues and error rates of a Pauli Channel with n qubits, then \({\varvec{\lambda }}^m={\varvec{WM}}^m{|{{\varvec{0}}}\rangle }\) where \({\varvec{M}}\) is a \(2^n \times 2^n\) matrix such that \(M_{ij}=p_{i\oplus j}\) (\(i \oplus j\) is the bitwise exclusive-OR operator).

Using Lemma 1 and Eqs. 8 and 10, \({\hat{q}}(m,{|{0}\rangle })\) can be estimated as

$$\begin{aligned} {\varvec{q}}'(m,{|{{\varvec{0}}}\rangle })={\varvec{W}}^{-1}{\varvec{AWM}}^{m}{|{{\varvec{0}}}\rangle } \end{aligned}$$
(12)

The transition matrix \({\varvec{M}}={\varvec{M}}_{0}\) represents the average error per gate while the \({\varvec{N}}={\varvec{W}}^{-1}{\varvec{AW}}={\varvec{N}}_{0}\) matrix represents the SPAM errors for an input state \({|{{\varvec{0}}}\rangle }\). Notice that the average noise for depth m circuits on an input state \({|{{\varvec{0}}}\rangle }\) behaves as a sequence of m average noise gates \({\varvec{M}}_{0}\) followed by SPAM errors \({\varvec{N}}_{0}\).

The construction of \({\varvec{N}}_{in}\) and \({\varvec{M}}_{in}\) for input state \({|{{\varvec{in}}}\rangle }\) proceeds similar to the procedure of constructing \({\varvec{N}}_{0}\) and \({\varvec{M}}_{0}\), however, a permutation of \(\hat{{\varvec{q}}}(m,{|{{\varvec{in}}}\rangle })\) is required before applying a Walsh–Hadamard transform to ensure that each element \(p_{i}({|{{\varvec{in}}}\rangle })\) in the input-specific error rate vector \({\varvec{p}}({|{{\varvec{in}}}\rangle })\) corresponds to the probability of an error of the form binary(i) on an input state \({|{{\varvec{in}}}\rangle }\). This permutation is done by applying an input-specific permutation matrix \({\varvec{\pi }}_{in}\) on \(\hat{{\varvec{q}}}(m,{|{{\varvec{in}}}\rangle })\) \(\forall m\) where \(\pi _{in_{ij}}=1\) if \(i\oplus j=in\) and 0 otherwise. It must be noted that \({\varvec{Q}}_m\) can be nearly singular especially for large \({\varvec{m}}\) as can be directly inferred from Eq. 12 (\(\,\left| {\varvec{M}}^{m}\right| \sim 0\,\)). In this case, the results are random hence the desire to use some enhancement technique to address this issue as much as feasible. For example, one way to do so is by utilizing unfolding techniques44. In this work, we did not enhance the results as we only experimented with relatively smaller values of \({\varvec{m}}\).

Experiments

In this section, we evaluate the accuracy of the model in Eq. 12 in predicting the average probability output, \(\hat{{\varvec{q}}}(m,{|{{\varvec{0}}}\rangle })\), for identity circuits of higher depths by estimating \({\varvec{A}}_0\) and \({\varvec{p}}({|{{\varvec{0}}}\rangle })\) using only simulations of lower depths identity circuits. Denote by \({\varvec{q}}'(m,{|{{\varvec{0}}}\rangle })\) the predicted average probability distribution obtained using Eq. 12. We select a training set of depths \(T=\{1,\,2,\,\dots ,\,m_{max}\}\) to estimate \({\varvec{A}}_0\) and \({\varvec{p}}\) using the EL protocol followed by the construction of the average gate error matrix \({\varvec{M}}_{0}\) and SPAM error matrix \({\varvec{N}}_{0}\) where \(M_{0_{ij}}=p_{i \oplus j}({|{{\varvec{0}}}\rangle })\) and \({\varvec{N}}_{0}={\varvec{W}}^{-1}{\varvec{A}}_{0}{\varvec{W}}\). A new testing set of depths \(T'=\{m_{max}+1,\,m_{max}+2,\,\dots ,\,100\}\) is then selected where we compute the Jensen–Shannon Divergence (JSD) between \(\hat{{\varvec{q}}}(m',{|{{\varvec{0}}}\rangle })\) and \({\varvec{q}}'(m',{|{{\varvec{0}}}\rangle })\) \(\forall m'\in T'\). The JSD measures the similarity between the two probability distributions45. The lower the JSD, the closer the two distributions are. More information about the JSD can be found in Section II in the supplementary. Figure 1 presents the computed JSD for different quantum computers while varying \(m_{max}\). Figure 2 presents the average and standard deviation for the test JSD values for the different quantum computers. The average test JSD varies between 0.024 and 0.056 for the different \(m_{max}\) values with lower average JSD values noted for high m for \(m_{max}=80\) as indicated in Fig. 2b.

Figure 1
figure 1

\(JSD(\hat{{\varvec{q}}}(m,{|{{\varvec{0}}}\rangle }),\,{\varvec{q}}'(m,{|{{\varvec{0}}}\rangle }))\) for training sets of depths T and testing sets of depths \(T'\) with variable maximum training depth \(m_{max}\in \{20,\,50,\,80\}\) on different IBM Q 5-qubit quantum computers.

Figure 2
figure 2

The average and standard deviation of \(JSD(\hat{{\varvec{q}}}(m,{|{{\varvec{0}}}\rangle }),\,{\varvec{q}}'(m,{|{{\varvec{0}}}\rangle }))\); (a) over all depths \(m \in [m_{max}+1,100]\) and (b) over depths \(m \in [80,100]\) while varying the maximum training depth \(m_{max}\) on different IBM Q 5-qubit quantum computers.

We rely on \({\varvec{q}}'(m, {|{{\varvec{in}}}\rangle })\) to construct and evaluate the mitigation power of \({\varvec{Q}}_m\) for different depths. We first select a training set of depths \(T=\{1,\,20,\,40,\,60,\,80,\,100\}\) to estimate \({\varvec{A}}_{in}\) and \({\varvec{p}}({|{{\varvec{in}}}\rangle })\) for each input state \({|{{\varvec{in}}}\rangle }\) using the EL protocol followed by the construction of \({\varvec{M}}_{in}\) using \({\varvec{p}}({|{{\varvec{in}}}\rangle })\) and \({\varvec{N}}_{in}={\varvec{W}}^{-1}{\varvec{A}}_{in}{\varvec{W}}\). We then estimate \(\hat{{\varvec{q}}}(m,{|{{\varvec{in}}}\rangle })\) as \({\varvec{q}}'(m,{|{{\varvec{in}}}\rangle })\) for all inputs using Eq. 6 in order to construct \({\varvec{Q}}_{m}\) using Eq. 5. We then choose a new testing set of depths \(T'=\{10,\,30,\,50,\,70,\,90\}\) so that \({\varvec{Q}}_m\) is used in mitigating the outputs for circuits of depth \(m \in T'\) where for a given identity circuit of depth m with input \({|{{\varvec{in}}}\rangle }\) and sequence s of gates, the mitigated output \(\hat{{\varvec{q}}}(m,s,{|{{\varvec{in}}}\rangle })_{mit}\) in obtained as

$$\begin{aligned} \hat{{\varvec{q}}}(m,s,{|{{\varvec{in}}}\rangle })_{mit}={\varvec{Q}}_{m}^{-1}\hat{{\varvec{q}}}(m,s,{|{{\varvec{in}}}\rangle }) \end{aligned}$$
(13)

\(\hat{{\varvec{q}}}(m,s,{|{{\varvec{in}}}\rangle })_{mit}\) is projected onto a probability simplex to ensure a probability distribution. The JSD between \(\hat{{\varvec{q}}}(m,s,{|{{\varvec{in}}}\rangle })_{mit}\) and the ideal output \({|{{\varvec{in}}}\rangle }\) is computed and then averaged over all input states and all random circuits of depth m. We also compare our proposed mitigation protocol using \({\varvec{Q}}_m\) with the MEM scheme (Fig. 3). We report upto 88% improvement in the JSD value for the proposed approach compared to the unmitigated approach, and upto 69% improvement compared to MEM approach. Note that for the results presented here, we rely on the average SPAM free error rate, \({\varvec{p}}_{avg}=\frac{1}{2^n}\sum _{in=0}^{2^n-1}{{\varvec{p}}({|{{\varvec{in}}}\rangle })}\) to construct \({\varvec{M}}_{in}={\varvec{M}}_{avg}\) for all inputs. We compare the results using \({\varvec{p}}_{avg}\) and \({\varvec{p}}({|{{\varvec{in}}}\rangle })\) in the supplementary Section V. \({\varvec{N}}_{in}\) remains input specific. Further elaborations on the results are presented in supplementary Section VI.

Figure 3
figure 3

Average JSD between the ideal output \({|{{\varvec{in}}}\rangle }\) and each of the unmitigated output \(\hat{{\varvec{q}}}(m,s,{|{{\varvec{in}}}\rangle })\), mitigated output by the MEM protocol, and mitigated output by our proposed noise model for each depth m on IBM Q 5-qubit quantum computers.

Complexity

So far in the estimation of \({\varvec{M}}_{in}\) and \({\varvec{N}}_{in}\) for each input state \({|{{\varvec{in}}}\rangle }\) using the EL protocol, K random circuits are generated for each depth \(1\le m \le m_{max}\) where the EL protocol requires \(O(2^{2n})\) for the Walsh–Hadamard transform which can be reduced into \(O(n^2)\) using fast Walsh–Hadamard transform. Thus, the overall complexity of the construction of \({\varvec{M}}_{in}\) and \({\varvec{N}}_{in}\) for all input states is \(O(m_{max}Kn^22^{n})\). Furthermore, the GRF model factors the error rates vector into a product of \(\sim O(n)\) factors, depending on the architecture of the quantum computer, where each factor depends on a subset of adjacent qubits of cardinality \(N<<n\) (typically \(N=4\)). Thus, the complexity is reduced further into \(O(nm_{max}KN^{2}2^{n})\). The construction of \({\varvec{Q}}_m\) would be based on Eq. 6 for each input state \({|{{\varvec{in}}}\rangle }\) where \({\varvec{M}}_{in}^m\) can be computed efficiently using the Singular Value Decomposition (SVD) of \({\varvec{M}}_{in}\), thus the construction of \({\varvec{Q}}_m\) is \(O(2^{3n})\). For the MEM scheme, the construction of \({\varvec{E}}_{meas}\) requires only generating K circuits with no gates for each input state \({|{{\varvec{in}}}\rangle }\), thus the complexity is \(O(K2^{n})\). For mitigation, both protocols are based on matrix inversion, thus the complexity for mitigation is \(O(2^{3n})\).

Discussion

The proposed mitigation protocol builds upon the SPAM-free noise characterization protocols for low circuit depths to generate a SPAM-error matrix \({\varvec{N}}_{in}\) and an average gate error matrix \({\varvec{M}}_{in}\) for each input state \({|{{\varvec{in}}}\rangle }\). It then constructs a noise mitigation matrix \({\varvec{Q}}_{m}\) for arbitrary circuit depths m where the columns of \({\varvec{Q}}_{m}\) are the estimated average probability distributions \({\varvec{q'}}(m,{|{{\varvec{in}}}\rangle })={\varvec{N}}_{in}{\varvec{M}}_{in}^m\). The mitigated output \(\hat{{\varvec{q}}}(m,\,s)_{mit}\) of a given circuit of depth m with sequence s of gates is obtained by applying \({{\varvec{Q}}_{m}}^{-1}\) on the empirical circuit output \(\hat{{\varvec{q}}}(m,\,s)\).

We evaluated the accuracy of our model in estimating the average probability distributions for high depth circuits and evaluated our mitigation protocol on the IBM Q 5-qubit quantum devices: Belem, Lima, and Manila. For the model accuracy evaluations, for the different \(m_{max}\) values, we reported on average a test \(JSD(\hat{{\varvec{q}}}(m,{|{{\varvec{0}}}\rangle }),{\varvec{q}}'(m,{|{{\varvec{0}}}\rangle }))\) value around 0.022–0.028 for Manilla, 0.03–0.055 for Lima, and 0.028–0.048 for Belem. For \(m_{max}=20\), the test JSD values varied between 0.005 and 0.05 for Lima computer, 0.01 and 0.06 for Manila computer, and 0.02 and 0.09 for Belem computer. We note that for \(m_{max}=20\) the test spans \(m \in \) [21–100]. For higher depths \(m \in \) [80–100], on average \(m_{max}=80\) resulted in better model error than \(m_{max}=50\) and \(m_{max}=20\). Results for IBM Q Athens are presented in the supplementary.

Finally, we report upto 88% JSD improvement for the proposed approach compared to the unmitigated approach with significant mitigation improvement compared to MEM at mid to higher depths. Specifically, for \(m=90\), we reported 58%, 66% and 85% JSD improvement for the proposed approach compared to the unmitigated on Belem, Manilla, and Lima respectively. This is compared 12%, 17% and 51% respectively for the MEM. On average for all the test depths across the different machines, we report 68.4% JSD improvement for the proposed versus 38.2% for the MEM improvement compared to the unmitigated approach.

In addition, we compared the proposed mitigation technique against zero-noise Richardson extrapolation (ZNRE) method29. Our results shows superior performance over ZNRE. the full details of the comparison and the results can be found in the supplementary file in Section VII.

Methods

In evaluating the accuracy of the model, we run \(K=1000\) random identity circuits with each submission requesting 1024 shots for each depth \(m \in \{1,\,2,\,\dots ,\,100\}\). In evaluating the mitigation power of \({\varvec{Q}}_m\), we run \(K=1000\) random identity circuits for depths \(m \in \{1,\,10,\,20,\,30,\,40,\,50,\,60,\,70,\,80,\,90,\,100\}\) for each basis input state \({|{{\varvec{in}}}\rangle }\) with each circuit requesting 1024 shots. The constructed circuits contain single qubit Clifford gates only. We run the circuits on the following IBM Q 5-qubit quantum computers: Manila, Lima, and Belem. Theoretical derivations and numerical details essential to the study are presented in the Results section. More details can be found in the Supplementary Information. For the configurations and noise profiles of the IBM quantum machines, please go to IBM Quantum Experience at http://www.research.ibm.com/quantum.