Adaptive quantum error mitigation using pulse-based inverse evolutions

Quantum Error Mitigation (QEM) enables the extraction of high-quality results from the presently-available noisy quantum computers. In this approach, the effect of the noise on observables of interest can be mitigated using multiple measurements without additional hardware overhead. Unfortunately, current QEM techniques are limited to weak noise or lack scalability. In this work, we introduce a QEM method termed `Adaptive KIK' that adapts to the noise level of the target device, and therefore, can handle moderate-to-strong noise. The implementation of the method is experimentally simple -- it does not involve any tomographic information or machine-learning stage, and the number of different quantum circuits to be implemented is independent of the size of the system. Furthermore, we have shown that it can be successfully integrated with randomized compiling for handling both incoherent as well as coherent noise. Our method handles spatially correlated and time-dependent noise which enables to run shots over the scale of days or more despite the fact that noise and calibrations change in time. Finally, we discuss and demonstrate why our results suggest that gate calibration protocols should be revised when using QEM. We demonstrate our findings in the IBM quantum computers and through numerical simulations.


I. INTRODUCTION
Quantum computers have now entered a stage where even the most powerful classical computers cannot rival them at certain specific tasks [1][2][3].Despite this enormous progress, current quantum computers are subjected to substantial levels of noise that must first be handled for quantum algorithms to be competitive in problems of practical interest.Quantum error correction stands as the primary candidate towards this goal, yet its integration in problems such as Shor's factoring algorithm may require thousands of physical qubits per each encoded logical qubit [4,5].
The goal of QEM is to mitigate errors in post processing rather than correcting them in real time.For example, the zero-noise extrapolation (ZNE) approach [13] employs circuits that are logically equivalent to the ideal target evolution but have a controllable error rate.The noiseless quantity is estimated via extrapolation to the zero-noise limit, after a fitting of noisy data to some noise scaling ansatz.However, it has been noted that ZNE is typically restricted to weak noise and may fail to mitigate errors of moderate magnitude [6,10].In addition, a proper noise scaling for ZNE depends on simplified noise models such as time-independent noise [6] or global depolarization [13], or assumes the ability to precisely scale the noise via circuits that ideally reproduce the identity operation [30].Alternative QEM approaches do not rely on these assumptions but consider errors featuring limited spatial or temporal correlations [6,8,9,20,21], or resort to machine learning protocols [9,10] with a non-established computational cost.Consequently, it is unclear whether these alternatives are applicable to large quantum circuits.
In this work, we introduce the "KIK method" for QEM.This technique can be used to mitigate the effect of timedependent and spatially correlated noise, by executing circuit sequences that only contain the target evolution and its inverse.Furthermore, the QEM accuracy is determined by the number of circuit sequences, irrespective of the size of the system.The KIK method adapts to the noise level of the system for improved QEM under strong noise.In the weak noise limit, we find that it reproduces the prediction of Richardson ZNE using circuit unitary folding [13], and elucidates the proper implementation of the corresponding inverse circuits -a key aspect not discussed in the circuit folding approach.
We test the KIK method on a ten-swap circuit and for calibrating a CNOT gate, using the IBM quantum computing platform.In both cases, we show that it consistently outperforms QEM based on circuit folding, regardless of the intensity of the noise.In the calibration experiment, noise induces biased gate parameters that translate into coherent errors.KIK-based calibration can efficiently mitigate these coherent errors by reducing the associated bias.Conversely, we find that circuit folding produces erroneous and inconsistent results.We also simulate the fidelity obtained with a noisy ten-step Trotterization [31] of the transverse Ising model on five qubits.For unmitigated fidelities as low as 0.85, we show that second order mitigation produces final fidelities beyond 0.99.

II. THE KIK FORMULA FOR TIME-DEPENDENT NOISE
We adopt the Liouville-space formalism of Quantum Mechanics [32] (see Appendix SI in [33]), in which density matrices that describe quantum states are written as vectors, and quantum operations as matrices that act on these vectors.In the following, we will employ calligraphic fonts to denote quantum operations.For example, the unitary evolution associated with an ideal (noise-free) quantum circuit and its noisy implementation will be written as U and K, respectively.As explained in Sec.III, the KIK method operates by executing circuit sequences where K is alternated with a suitable inverse evolution K I .In the absence of noise, K I coincides with the inverse unitary U † , and it is performed through a pulse schedule that relates in a simple way to the pulse schedule used for K. Therefore, both K and K I require the same level of control.We refer the reader to Appendix SII [33] for more details.
Our main result is the "KIK formula" where U KIK denotes a (not necessarily unitary) approximation to U derived in Appendix SIII [33], and the name "KIK" is a reference to the product K I K in Eq. (1).To derive this equation, we solve the open-system dynamics for the noisy evolution and represent the solution to K using a Magnus expansion [34], from which we can extract the approximation U KIK by neglecting Magnus terms beyond the first order.Details on the master equation used to characterize the noisy evolution are given in Appendix SII [33].
We remark that Eq. ( 1) is applicable to quantum circuits K subjected to time-dependent and spatially correlated noise, and also encompasses gate-dependent errors.In Appendix SIII [33], we also discuss the scenario where noise parameters drift during the experiment, which occurs for example due to temperature variations or laser instability.We show that the impact of noise drifts can be practically eliminated, if the execution order of the circuits K (K I K) m in Eq. ( 2) is properly chosen.

III. QEM USING THE KIK FORMULA
Since (K I K) − 1 2 is not directly implementable in a quantum device, we utilize polynomial expansions of (K I K) The notation U (M ) KIK represents an M th-order approximation to U KIK , with real coefficients {a m } M m=0 .To estimate an error-free expectation value we separately execute the circuits K (K I K) m on the initial state ρ, for 0 ≤ m ≤ M , measure the observable of interest on the resulting final states, and compute the sum of the outcomes weighted by the a (M ) m .Different expansions of the KIK formula can be considered depending on the strength of the noise affecting a single KIK cycle K I K.For very weak noise, K I K would be close to the identity operator I and therefore all its eigenvalues are expected to be close to 1. Letting λ denote a general eigenvalue of K I K, a Taylor expansion of λ − 1 2 around 1 constitutes a reasonable approximation in the weak noise regime.In this case, the coefficients a T ay,m λ m .We analytically derive the coefficients a (M ) T ay,m in Appendix SIV [33], for any order M , showing also that they predict QEM using Richardson ZNE, with linear noise scaling through circuit unitary folding [13].Nevertheless, we stress that a distinctive characteristic of our approach is the pulse-based inverse K I .As proven in Appendix SIV, for circuits that satisfy U 2 = I, using instead a "circuit-based" inverse K I = K introduces an additional error term that afflicts U (M ) KIK (cf.Eq. ( 2)) for any M .Thus, ignoring the pulse inverse hinders QEM in paradigmatic gates such as the CNOT, swap, or Toffoli gate.
In addition, standard circuit folding makes no distinction between noise amplification using powers of K I K or KK I , as both choices reproduce the identity operation in the absence of noise.However, proper QEM involves powers of K I K, as shown in Appendix SIII [33].

IV. QEM IN THE STRONG NOISE REGIME
To better cope with moderate and strong noise, we introduce the quantity where µ = Tr (ρ ρ), ρ is the initial state, and ρ is the state obtained by evolving ρ with the KIK cycle K I K.
For a pure state ρ, µ is the survival probability under the evolution K I K, and the lower limit g(µ) in the integration interval [g(µ), 1] is a monotonically increasing function of µ such that 0 ≤ g(µ) L2 can be interpreted as the total error in the approximation (2), over the interval [g(µ), 1].The coefficients a (M ) m can be chosen to minimize ε KIK constitutes a trace-preserving map.This leads to "adaptive mitigation" with the coefficients {a Adap,m } M m=0 , which we analytically derive in Appendix SIV [33] for 0 ≤ M ≤ 3. We also note that, by virtue of Eq. ( 3), these coefficients are functions of g(µ).
When the noise is strong one can expect the smallest eigenvalue K I K to be substantially smaller than one, and the function g(µ) should be chosen accordingly.In particular, if g(µ) is much larger than this eigenvalue the quantity ε L2 may fail to properly quantify the error in the approximation U (M ) KIK .On the contrary, if g(µ) is smaller than the lowest λ, it is always possible to systematically improve the approximation by increasing the mitigation order M .We confirm this through a simulation of the transverse Ising model of five qubits in Appendix SV [33], implemented with ten cycles of Trotterization [31].In this simulation, we use functions {g(µ)} = {1, µ, µ 2 , µ 2.5 }, and observe that µ 2 features the best performance with respect to the final fidelity.This fact was corroborated by additional simulations (not included in Appendix SV), and by the ten-swap experiment presented in the following section.We leave the study of experimental criteria for choosing g(µ) and the potential improvements that this possibility can entail for the KIK method for future work.Along the same direction, one could also explore the use of norms other than the L2 norm employed in Eq. ( 3), and weight functions w(λ) = 1 in the integrand of this expression.
Finally, we remark that, apart from the circuits K (K I K) m , used for the error mitigation itself, the estimation of µ only involves the additional circuit K I K.The variance in this estimation is given by µ(1 − µ) and has the maximum value 0.25, irrespective of the size of the system.Therefore, the number of shots needed to estimate µ is small (below 1000).

V. EXPERIMENTAL RESULTS
In the experiments described below, the KIK mitigation of noise on the target evolution K is complemented by an independent mitigation of readout errors and a small preparation error that affects the default computational state ρ = |00 00| [35].The results of Sec.VB also include the application of randomized compiling [36] to the evolutions K and K I , where circuits logically equivalent to the ideal evolution U are randomly implemented.This is useful for turning coherent errors into incoherent noise, which can be addressed by our method.Details concerning these experimental methods can be found in Appendix SVI [33].

A. KIK-based gate calibration for mitigating coherent errors
A usual approach to handle coherent errors in QEM is to first transform them into incoherent errors via randomized compiling [36], and then apply QEM.In this section, we discuss the application of the KIK formula to directly mitigate the coherent errors caused by a faulty calibration of the CNOT gate.The same approach can be used in the case of single-qubit gates.Our idea is to complement the KIK mitigation for a whole circuit, with a KIK-based calibration of the individual CNOTs.
Figure 1 shows the results of our effective calibration of a CNOT in the IBM processor Jakarta.We apply the gate on the initial state ρ = 1 √ 2 (|0 + |1 ) ⊗ |0 , and measure the expectation value of the observable Y 1 (where Y 1 is the y-Pauli matrix acting on the target qubit) at different amplitudes of the cross resonance pulse [37], which constitutes the two-qubit interaction in the IBM CNOT implementation.Experimental details can be found in Appendix SVI.Each data point of Fig. 1 is obtained by applying "Taylor mitigation" (i.e. using Eq. ( 2) with the coefficients a T ay,m ), for 0 ≤ M ≤ 3, and linear regression (least squares) is applied to determine the line that best fits the experimental data.We also verify that in this case mitigation with the adapted coefficients a (M ) Adap,m does not yield a noticeable advantage.This indicates that noise is sufficiently weak, which is further evidenced by the quick convergence of the lines corresponding to M ≥ 1 in Fig. 1A.
Keeping in mind that the calibrated amplitude must reproduce the ideal expectation value Y 1 = 0, we can see from Fig. 1 that the predicted amplitude without QEM (M = 0) and with QEM are different.Since the CNOT is subjected to stochastic noise, without QEM the measured expectation values will be shifted and the corresponding linear regression results in a calibrated amplitude that is also shifted with respect to the correct value.The ensuing effect is a coherent error due to unmitigated stochastic noise.By using the KIK method, the error-mitigated expectation values mitigate such a coherent error.It is important to stress that the benefit of this calibration would manifest when combined with QEM of the target circuit in which the CNOTs participate.The reason is that the calibrated field is consistent with gates of reduced (stochastic) noise (due to the use of QEM in the calibration process), and therefore it is not useful if the target circuit is implemented with all its original noise.
In Fig. 1 we also observe that a proper implementation of KIK QEM requires the pulse-based inverse K I (Fig. 1A), characterized in Appendix SII [33], while the use of another CNOT for K I (Fig. 1B) does not show the expected convergence as the mitigation order M increases.Note also that despite a CNOT is its own inverse in the noiseless scenario, it leads to a coefficient of determination R 2 whose values evidence a poor linear fit.

B. Quantum error mitigation in a ten-swap circuit
In Fig. 2A, we show the results of QEM for a circuit K given by a sequence of 10 swap gates.The experiments were executed in the IBM quantum processor Quito.The schematic of K is illustrated in Fig. 2B.
We mitigate errors in the survival probability Tr (ρσ), where σ is the noisy final state that results from applying K to ρ.To perform QEM, we consider the truncated expansion (2) with mitigation orders 1 ≤ M ≤ 3. The blue curve in Fig. 2A corresponds to Taylor mitigation a Adap,m that are adapted with functions g(µ) = µ and g(µ) = µ 2 in Eq. (3) give rise to the orange and green curves, respectively.Furthermore, for K I we perform the pulse-based inverse according to the pulse schedule described in Appendix SII [33].While in the absence of noise there is no difference between different realizations of the inverse evolution, the KIK method privileges the pulse inverse over other alternatives.This is in contrast with circuit folding [13], which does not make this distinction.Importantly, using the proper inverse is relevant even for evolutions where noise is likely to be time independent.2B, as a function of the mitigation order.The ideal survival probability is 1 (dashed black line).Green and orange curves show QEM adapted to the noise intensity, and the blue curve stands for mitigation assuming weak noise (Taylor mitigation).The thin colored areas connect small experimental error bars.We see that Taylor mitigation is outperformed by adapted mitigation.(B) The circuit used in the experiments.Each swap is implemented as a sequence of three CNOTs.
In Fig. 2A we observe that the adapted coefficients a Adap,m outperform Taylor mitigation.This shows that, beyond the limit of weak noise, QEM can be substantially improved by adapting it to the noise intensity.Although we can also see that in this experiment the noise-free survival probability is almost fully recovered, we emphasize that Eq. ( 1) only includes the first Magnus term in the solution of the noisy dynamics that produces K.In cases where the noise is even stronger, or when very high accuracy is needed, the higher-order Magnus terms cannot be ignored.This aspect is missing in the circuit folding analysis.In Appendix SVII [33], we present a numerical example that illustrates a situation where neglecting higher-order Magnus terms limits the accuracy of QEM.
Due to experimental limitations, it was not possible to implement the ten-swap circuit using CNOTs calibrated through the KIK method.Specifically, we could not guarantee that calibration circuits and error mitigation circuits would run sequentially, and without the interference of intrinsic (noncontrollable) calibrations from the processor.Moreover, this demonstration requires that all the parameters of the gate are calibrated using the KIK method, and not just the cross resonance amplitude.However, we numerically verified in Appendix SVI [33] that coherent errors vanish for a gate calibrated using KIK QEM, to the point that randomized compiling is no longer needed.

VI. CONCLUDING REMARKS
Quantum error mitigation (QEM) is becoming a standard practice in NISQ experiments.However, QEM methods that are free from intrinsic scalability issues lack a physically rigorous formulation, or are unable to cope with significant levels of noise.The KIK method allows for scalable QEM whenever noise is not excessively strong, as demonstrated by error bounds on the QEM accuracy derived in Appendix SVIII [33].This QEM technique is based on a master equation analysis that incorporates time-dependent and spatially correlated noise, and does not require that the noise is trace-preserving.As such, based on elementary simulations we observe that it can also mitigate leakage noise and photon loss, which can take place in superconducting circuits and photonic circuits, respectively.In the limit of weak noise, the KIK method reproduces defining features of zero noise extrapolation using circuit unitary folding, and outperforms it.This is achieved thanks to the use of pulse-based inverses for the implementation of QEM circuits, and the adaptation of QEM parameters to the noise intensity for handling moderate and strong noise.
We have demonstrated our findings using the IBM quantum processors Quito and Jakarta.In Quito, we implemented KIK mitigation in a circuit composed of 10 swap gates (30 sequential CNOTs).Despite the substantial noise in this setup, we achieved the ideal result within the experimental accuracy.Using the processor Jakarta, we also showed that even the calibration of a basic building block of quantum computing, such as the CNOT gate, can be affected by unmitigated noise.As a consequence, calibrated gate parameters feature erroneous values leading to coherent errors.These errors can be avoided by incorporating the KIK method in the calibration process.The integration of randomized compiling into our technique also enables the mitigation of general coherent errors that can affect the total circuit and not only individual gates.This is possible because randomized compiling transforms coherent errors into incoherent noise, which can be tackled by the KIK method.
Despite these successful demonstrations, we believe that there is room for improvement by exploring some of the possibilities mentioned in Sec.IV.We also hope that the performance shown here can be exploited for new demonstrations of quantum algorithms on NISQ devices, with the potential of achieving quantum supremacy in applications of interest.
We acknowledge the use of IBM Quantum services for this work.The views expressed are those of the authors, and do not reflect the official policy or position of IBM or the IBM Quantum team.Raam Uzdin is grateful for support from the Israel Science Foundation (Grant No. 2556/20).In the standard description of Quantum Mechanics, a system of dimension d is represented by a density matrix ρ of dimension d × d.Moreover, a CPTP (completely positive and trace preserving) quantum operation can be expressed as where ρ is a density matrix and {K i } are Kraus operators that satisfy the completeness relation i K † i K i = I, and I is the d × d identity matrix.Observables correspond to hermitian operators A, and the associated expectation value for a system in a state ρ reads The Liouville space formalism is an alternative formulation that is particularly useful to simplify notation and handle quantum operations.In this framework, a density matrix is replaced by a vector |ρ of dimension d 2 and a quantum operation is a matrix of dimension d 2 × d 2 .Using the calligraphic notation O for a quantum operation, the analogous of Eq. (S1) in Liouville space is given by Here, we adopt the approach of Ref. [32], where |ρ is the column vector whose first d components correspond to the first row of ρ, the next d components correspond to the second row of ρ, and so forth.More formally, the vector representation of a d × d generic matrix B (not necessarily a density matrix) is given by With this convention, in Liouville space the quantum operation (S1) takes the form [32] where K * i is the element-wise complex conjugate of K i .For example, a unitary operation ρ = U ρU † is written as Equation (S4) is obtained by following the rule to vectorize a product of three matrices B, C and D. Denoting the associated vector as |BCD , this rule states that [32] where the superscript T denotes transposition.Setting B = K i , C = ρ, and D = K † i , Eq. (S4) follows by applying (S5) to (S1) and using the linearity property of the vectorization.

S-II. DYNAMICAL DESCRIPTION OF NOISE FOR THE TARGET EVOLUTION AND ITS INVERSE
In this appendix, we set the framework for the derivation of the KIK formula (Eq.( 1) in the main text).We consider a continuous-time description of the system evolution, modeled by the master equation Here, H(t) is a time-dependent Hamiltonian and L(t) is a time-dependent dissipator that accounts for the non-unitary contribution to the dynamics, which is induced by external noise.The hat symbol in L(t) is used to emphasize that it represents a superoperator in Hilbert space.
To specify the form of L(t) one could invoke a microscopic description of the dynamics, where the system is coupled to some external environment and the total system obeys the Schrodinger equation.The time-independent case is extensively studied in [38], and various time-dependent Markovian master equations have been derived [39].The dissipator is often given in the Lindblad form, which represents the most general dissipator for a Markovian and CPTP evolution.For our purposes, this is not necessary.For example, L(t) could incorporate leakage noise, which does not preserve probability and thus is not trace preserving.Now, let us rewrite Eq. (S7) in Liouville space.Using the linearity of the vectorization operation, we have that where |[H(t), ρ] and | L(t)[ρ] are the vectors corresponding to [H(t), ρ] and L(t)[ρ], respectively.By applying the rule (S5) to the commutator [H(t), ρ] = H(t)ρ − ρH(t), we obtain where D is identified with the identity I for the product H(t)ρ, and with H(t) for ρH(t).In both cases, C is associated with ρ.For a general dissipator L(t) we can simply write | L(t)[ρ] = L(t)|ρ , because ρ can always be associated with C in Eq. (S5).
In this way, the Liouville-space representation of (S7) reads We note that both H(t) and L(t) are linear operators that correspond to matrices of dimension d 2 × d 2 .Restrictions on L(t) will be specified in what follows.

A. Noise for pulse-based inverse evolution
Suppose that applying the driving H(t) in Eq. (S10) during a total time T leads to an ideal unitary evolution U = T e − T 0 iH(t)dt , where T is the time-ordering operator.Even in the circuit model of quantum computing, where unitary operations are composed of discrete quantum gates, each elementary gate is itself generated by a pulse schedule that can be represented as a time-dependent Hamiltonian.Hence, any quantum circuit is ultimately generated by some pulse schedule H(t).
Different pulse schedules can result in the same ideal evolution U, and naturally the same is true for its inverse U † .However, in the presence of noise this equivalence does not hold in general.The derivation of the KIK formula relies on relating the pulse schedule for U † with the pulse schedule for U in a specific manner.Denoting the driving that generates U † by H I (t), this relationship reads In combination with Eq. (S11), the other ingredient for obtaining the KIK formula has to do with how noise comes into play for a given driving H(t).On the one hand, Eq. (S10) describes noise that acts locally in time, i.e. that L(t) only depends on the current instant t and not on the previous history of the evolution.On the other hand, this time dependence can have two origins.One of them is the time-dependence of H(t) itself, and the other are intrinsic fluctuations of the noise that are related e.g. to changes in the environment or miscalibrations that occur during the execution of an experiment.The second possibility is discussed in detail in Appendix SIIIB.As for the influence of the driving H(t) on the noise, we assume that L(t) does not depend on the sign of H(t), but only on its amplitude.This reflects the fact that noise cannot be undone when running the reverse schedule described in Eq. (S11).Taking into account (S11), the dissipator L I (t) for the "pulse inverse" H I (t) should then satisfy Equation (S12) implies that if different gates are affected by different noise mechanisms (e.g.very fast gates may be prone to leakage noise due to non-adiabatic couplings to the levels outside the computational basis), the order in which these noise mechanisms operate is reversed when applying the pulse inverse (S11), as shown in Fig. S1.This is a consequence of the time locality of both L(t) and L I (t), and the reversed time schedule that H I (t) imprints on the corresponding inverse gates.To summarize, our sole assumptions on the noise are: 1.The linearity and the time locality of the dissipator L(t).
2. The relationship between the dissipator L I (t), which affects the pulse inverse H I (t) (cf.Eq. (S11)) at time t, and L(t).This relationship is encapsulated by Eq. (S12).
We remark that L(t) is not restricted to have a Lindblad form or to give rise to a trace preserving map.For example, it can incorporate leakage noise, which does not conserve the total probability and therefore is not trace preserving.Following Eq. (S10), the noisy evolutions K and K I that appear in the KIK formula are thus given by It is important to remark that no restrictions are imposed on the pulse H(t), so long as it reproduces the noise-free evolution U. On the other hand, we will see in Appendix SIII that Eqs.(S11) and (S12) allow us to approximate the noise channel for K as , which is why the form of the inverse driving in (S11) is important for our main finding.
We also note that the level of control required for implementing H I (t) is very similar to that used for H(t).In essence, we only need to time-reverse the pulse schedule corresponding to H(t) and flip its sign.In this work, we use the pulse-gate capabilities of the IBM processors to implement H I (t).No stretching of the pulses or any modification of their shape is involved.Therefore, the powers of K I K that enter the implementation of the KIK method are basically as easy to execute K itself.

S-III. DERIVATION OF THE KIK FORMULA
In this appendix, we derive the KIK formula where U KIK is a first-order Magnus approximation to the ideal evolution U that we will clarify in what follows.
Hereafter, we will refer to K and K I as target evolution and inverse evolution, respectively.In particular, K is the noisy evolution over which we intend to perform error mitigation.For now, we assume that the noise-free unitary is given by U = T e − T 0 iH(t)dt , meaning that the pulse schedule H(t) is perfectly calibrated.Hence, the KIK formula is useful to mitigate errors caused by the dissipator L(t).On the other hand, we will see in Appendix SVI that randomized compiling [36] complements and enhances the error mitigation achieved with the KIK method.Accordingly, integrating randomized compiling into our QEM technique also allows for the mitigation of coherent errors, related to miscalibrations of H(t).
To arrive at Eq. (S15) we shall proceed as follows.We consider that the driving H(t) acts in the time interval (0, T ) and the driving H I (t) is applied in the interval (T, 2T ).Thus, the total evolution at time t = 2T is K I K.For any time t ∈ (0, 2T ) the dynamics is modeled according to where Note that the action of H I and L I on (T, 2T ) requires the time shift by T as described in Eqs.(S17).With this notation, the ideal evolution for t ∈ (0, 2T ) is given by Ũ(t) = T e −i t 0 H(t )dt , and therefore, Ũ(T ) = U, and Ũ(2T ) = I.Similarly, for the noisy evolution we have that K(t) = T e t 0 (−i H(t )+ L(t ))dt , K(T ) = K, and K(2T ) = K I K.By expressing Eq. (S16) in the interaction picture we can write the evolution operator in the form K = UN , where the noise channel N is the solution to Eq. (S16) in interaction picture, at time t = T .Next, using the Magnus expansion [34], we will find that N can be approximated by (K I K) 1 2 .These are the main ingredients for the derivation of the KIK formula (S15).
To define the transformed states and operators in interaction picture, we use the noiseless evolution Ũ(t).Denoting interaction picture vectors and matrices with the subscript "int", we have that The solution |ρ int (t) = Kint (t)|ρ int (0) to Eq. (S20) is related to the original (Schrodinger-picture) solution by Kint (t) = Ũ † (t) K(t).Therefore, where we express Kint (t) in terms of the Magnus expansion [34].The first-order Magnus term Ω 1 (t) is central to our analysis, and is given by Regarding higher order terms Ω n≥2 (t), we only mention that they contain nested commutators that obey time ordering.
]. Setting t = T and t = 2T in Eq. (S21) leads us to K = Ue Ω(T ) , (S23) From these expressions, our final step in the derivation of (S15) is to show that If we approximate e Ω(T ) and e Ω(2T ) by keeping only the first Magnus term in the corresponding exponents, Eq. (S25) implies that the noise channel N = e Ω(T ) in (S23) can be approximated by In this way, the KIK formula is obtained by multiplying K ≈ UN KIK by the inverse Let us now prove Eq. (S25).First, we note that ).In addition, for the same time interval Eqs.(S12) and (S17) lead to L(t) = L(2T − t).Therefore, where the last line follows by performing the change of variable t = 2T − t.
To conclude this appendix, we stress that Eq. (S12) is key for the proof of Eq. (S25).In turn, within our characterization of noise it is specifically the inverse driving (S11) which provides the form taken by the dissipator (S12).This shows the crucial role of using the pulse schedule (S11) for the inverse evolution, rather than a different alternative that generates the ideal unitary U † in the absence of noise.

A. Relation between KI K and KKI in the KIK formula
In this appendix, we show that Clearly, this implies that we cannot substitute K I K by KK I in the KIK formula or in the corresponding expansions.
In particular, the coincidence with Richardson ZNE applying circuit unitary folding, discussed in Appendix SIV, is sound whenever noise amplification is performed using the correct ordering K I K.This is different from the heuristic approach taken in Ref. [13], where KK I could be an equally valid choice because it also reproduces the identity operation in the absence of noise.
More specifically, we show that the relation (S28) holds under the same approximation that leads to Eq. (S15).Namely, when the Magnus expansion used to express the evolution KK I is also truncated to the first Magnus term.Following our noise model, this evolution is the solution to the equation where In interaction picture, the first Magnus term for the solution of (S29) at time t = 2T reads 2T 0 Lint (t)dt, where Lint (t) = Ū † (t) L(t) Ū(t).Taking this into account, we will show that Accordingly, up to first order in the Magnus expansion we have that , which is tantamount to Eq. (S28).
To prove Eq. (S31), we derive the following two alternative forms of T 0 Lint (t)dt: Noting that Ū(t) = U(t) for t ∈ (0, T ), we obtain where the change of variable t = T − t is performed in the third line, and in the last line we use the relation . This proves Eq. (S32).
For the time interval t ∈ (T, 2T ), the evolution which (in combination with (S34)) proves Eq. (S33).Equation (S31) follows straightforwardly by combining Eqs.(S32) and (S33).We note that the simulations studied in Ref. [13] are based on the assumption of a global depolarizing channel that is identical for K and K I .Because global depolarizing noise commutes with any unitary U, in this case the total noise channel for both K I K and KK I is simply another depolarizing channel with an increased error rate.Hence, for this simple model both the KIK formula and circuit unitary folding can be applied using either K I K or KK I .However, as we have shown in this appendix, in a more realistic scenario the proper time ordering corresponding to K I K is crucial for a correct application of QEM.

B. KIK formula and noise drifts
Until now, we have approached the time dependence of L(t) (cf.Eq. (S17)) as being a consequence of the time dependence associated with the pulse schedules, H(t).In this framework, any implementation of K or K I would be affected by the same dissipators L(t) and L I (t − T ).However, it is more realistic to include the possibility of noise sources that also change in time.For example, a varying temperature or external electromagnetic field can be such that the dissipator L(t) acting during a given implementation of K differs from the dissipator L (t), associated with an execution of the same evolution at a later time.In the present section we discuss a technique to collect the QEM data that minimizes the effect of noise drifts.As we shall see, this is possible by distributing the circuits for QEM into suitable sets, and separately applying the KIK formula (S15) to each of these sets.
As discussed in the main text (see also Appendix SIV), performing QEM with the KIK method involves executing circuits of the form K (K I K) m , for 0 ≤ m ≤ M , where M is the mitigation order.Therefore, the time for running T , where T is the evolution time of K or K I .In the computation of expectation values, it is necessary to implement each K (K I K) m a certain number of times N m .Following standard terminology in quantum computing, a single execution of a circuit, including the preparation of the initial state |ρ and the measurement of the final state, is dubbed a "shot".Hence, N m shots are used for each K (K I K) m , and the "shot budget" to collect all the QEM data characteristic of the KIK method reads N = M m=0 N m (note that this excludes the shots invested in the estimation of the survival probability µ = ρ|K I K|ρ , in the case of adaptive mitigation).Assuming for now that the time for preparing |ρ and the time for measuring the corresponding final states are negligible with respect to T , performing N shots takes a total time For our analysis, it is useful to extend the time domain of L(t), to account for the behavior of the noise under repetitions of the evolutions K and K I .In this way, the time for an arbitrary repetition of K or K I can be expressed as t + 2kT , with 0 ≤ t ≤ 2T and k a positive integer, and stationary noise is characterized by the condition L(t + 2kT ) = L(t), where L(t) is the dissipator in Eq. (S17).Conversely, noise drifts take place within the total time interval (0, t N ) if L(t + 2kT ) = L(t) for some k.
Let us now suppose that noise unavoidably drifts in the interval (0, t N ).In this scenario, the consistency of the evolutions K or K I in different shots can break down and prevent a correct implementation of the KIK formula (S15).However, we can avoid or at least alleviate this effect through a proper distribution of the shot budget N .Consider Fig. S2, where two strategies for implementing the circuits {K (K I K) m } 2 m=0 (second-order mitigation) are illustrated.In the case of Fig. S2(a), all the shots corresponding to a given N m are sequentially implemented, i.e., N 0 shots are first performed, followed by N 1 shots, and finally by N 2 shots.On the other hand, the strategy of Fig. S2(b) relies on dividing the N shots into S sets {n 0 , n 1 , n 2 } of N S = n 0 + n 1 + n 2 shots each, where n m shots are employed for the circuit K (K I K) m .Therefore, all the mitigation circuits K (K I K) m appear in each set.Without loss of generality for our argumentation, we can focus on the simple case where n m = N S /3 for 0 ≤ m ≤ 2, i.e. when the shots of each set are equally distributed into the different circuits K (K I K) m .Since each set {n 0 , n 1 , n 2 } contains data produced by all the circuits K (K I K) m , the KIK formula can be individually applied to these data sets.Let denote the KIK formula corresponding to evolutions K s and K I,s that are executed in shots of the sth set {n 0 , n 1 , n 2 } (s) .If there were no noise drifts, K s = K 1 = K and K I;s = K I;1 = K I for 1 ≤ s ≤ S. Therefore, the two strategies depicted .Two strategies to collect data for second-order error mitigation, using the KIK method.The arrows indicate the order of implementation of different circuits.In (a), the implementations (shots) are separated into repetitions of each circuit K (KI K) m , for 0 ≤ m ≤ 2 (horizontal groups).In (b), the shots are divided into S sets (vertical groups) that include implementations of all the circuits K (KI K) m .Figure S3.KIK method applied to a system with noise drifts.(a) The four time-dependent amplitudes f k (t) that appear in the dissipator (S39), and characterize the drift in noise parameters.(b) First order mitigation and second order mitigation using the average formula (S38), as a function of the number of sets S (see Fig. S2 and main text for details).
in Fig. S2 would lead to the same result.Nevertheless, the non-stationary character of the noise may cause that K s or K I;s change significantly when moving between different sets {n 0 , n 1 , n 2 } (s) , or even within a fixed set.The second possibility is less likely though, if the time M m=0 n m (2m + 1)T invested in implementing each set {n 0 , n 1 , n 2 } (s) is smaller than the characteristic time for noise drifts to be significant.In other words, if the time scale over which noise drifts occur is sufficiently large to allow a consistent execution of U KIK,s .Assuming that N S is sufficiently small (equivalently, S sufficiently large) for this to happen, for any 1 ≤ s ≤ S we can implement the formula (S37) without worrying about variations in the evolutions K s or K I;s .In this way, for the shot budget N we utilize the "average KIK formula" To illustrate this methodology, in Fig. S3 we consider two qubits subjected to a dissipator where ξ = 0.05, and The noise amplitudes f k (t) are plotted in Fig. S3(a), for 1 ≤ t ≤ 1000.Moreover, we assume a time-independent Hamiltonian (cf.Eq. ( S9)) where I = 1 0 0 1 and X = 0 1 1 0 .
L(t) is the Liouville-space representation of the Hilbert space dissipator that satisfies L(t)[ρ] = ξ 4 k=1 f k (t) Lk , which follows by applying the vectorization rule (S5) to each term Lk Since the associated master equation has GKSL (Gorini-Kossakowski-Sudarshan-Lindblad) form, it is guaranteed that its integration results in a completely positive and trace-preserving evolution.We also stress that L(t) is now defined in the total time interval (0, t N ), to account for the many repetitions of the pulses H(t) and H I (t − T ) that come with the N shots.We numerically simulate QEM using the average formula (S38), by setting N = 1000, (S52) ) and 1 ≤ S ≤ 20.An execution of K or K I is performed in a unit of time T = 1, for which we assume that noise is essentially time independent.In other words, a more precise description of the noise occurring during the N shots is given by the discrete dissipator with L(t) satisfying Eq. (S39) and 0 ≤ n ≤ N − 1.

Figure S3(b) shows the error-mitigated expectation value ρ
KIK,av ρ , which quantifies the overlap with the initial state ρ = I/2 ⊗ |0 0|.We apply Taylor error mitigation, using the coefficients a (M ) T ay,m given in Eq. (S60).If S = 1, the standard strategy represented in Fig. S2(a) is recovered.We observe in Fig. S3(b) that in this case ρ mit deviates drastically from the noiseless expectation value ρ id = ρ |U| ρ .As S increases, second-order mitigation quickly approaches ρ id and converges to it at S ≈ 10.We also stress that for S ≥ 10 the quality of error mitigation is maintained, both for M = 1 and M = 2, which shows that averaging the KIK formula over more sets does not degrade the performance of the KIK method.The success of this strategy is explained because KIK,s ρ , and if noise is approximately stationary for each U (M ) KIK,s then the corresponding ρ U (M ) KIK,s ρ is not affected by the action of noise drifts.However, the number of shots N S associated with each set is not sufficiently large to achieve the accuracy shown in Fig. S3(b).This accuracy is achieved after averaging over the total number of sets.The convergence observed in Fig. S3(b) confirms that in this example increasing the number of sets enhances the QEM performance, and solves the noise drift problem.

S-IV. IMPLEMENTATIONS OF THE INVERSE OF THE NOISE CHANNEL NKIK
The KIK formula (S15) provides a compact approximation for the ideal target evolution U.However, performing QEM with this formula also requires being able to physically implement the inverse In this appendix, we compute various approximations to this inverse, given as polynomials of the KIK cycle K I K.
A. Weak noise limit and Taylor approximation First, we consider the limit of weak noise.As mentioned in the main text, in this case any eigenvalue λ of K I K is close to 1, and we can obtain the eigenvalues of (K I K) − 1 2 by Taylor expanding λ − 1 2 around λ = 1.Note that here we assume that noise is such that K I K is still diagonalizable, and therefore the eigenvalues of (K I K) − 1 2 can be obtained as λ − 1 2 .A Taylor expansion of λ − 1 2 around λ = 1 is thus equivalent to expand (K I K) − 1 2 around the identity I. Namely, where Since the series (S57) involves infinite powers of K I K, we must truncate it to some fixed order M for the implementation of (K I K) In this way, where B. Richardson ZNE using Circuit folding and linear scaling of noise In the following, we show that the expansion coefficients (S60) predict the result of QEM using Richardson ZNE and noise amplification through a method known as circuit folding [13], under the assumption of a linear scaling of the noise.To put this result into context, we start by presenting the basics of Richardson ZNE and circuit folding.
The goal of ZNE is to infer the noise-free expectation value of an observable A, by measuring this observable at different levels of noise and then extrapolating to the zero-noise limit.Therefore, the application of ZNE requires assuming a certain functional dependence A (λ), between the expectation value A and some noise parameter λ over which the experimentalist should have control.By measuring expectation values A k corresponding to different levels of noise λ k , an experimentalist can fit the data [λ k , A k ] to the model A (λ) and thereby estimate the noiseless expectation value as A (0).In the case of Richardson extrapolation, for M + 1 data points [λ k , A k ], A (λ) is taken as a polynomial in λ of degree M .
There exists a unique polynomial P (λ) of degree M that intersects all the points [λ k , A k ].This polynomial can be constructed as Therefore, the noise-free expectation value is estimated by Equation (S62) gives A (0) in terms of A k and the noise strengths λ k .One of the first techniques proposed to artificially increase the value of λ is pulse stretching [6], which involves pulse control from the user.In addition, we point out that pulse stretching also assumes that the noise is time-independent.Unitary folding is an alternative that does not require this level of control.If U describes the target ideal evolution, unitary folding operates by adding quantum gates to U that in the noise-free case are just identity operations.This can be done either by using "circuit foldings" U † U, or by inserting products between gates and their own inverses.Noting that the polynomial (S59) contains powers of the (noisy) implementation of U † U, we are specifically interested in the connection between this polynomial and the use of circuit folding for ZNE, rather than folding at the level of gates.In this context, the assumption of linear scaling of the noise means that each power (K I K) k increases the noise characteristic of K by a factor of 2k, i.e. that the noise increases proportionally to the depth of the circuit (K I K) k .If λ 0 corresponds to the natural noise in the target circuit K, then, the folding m |ρ , it follows that the application of the KIK formula in the weak noise limit reproduces the prediction of Richardson ZNE, with circuit folding and linear scaling of noise [cf.Eqs.
(S62) and (S63)].As a final remark, it is worth noting that in circuit folding the realization of U † using circuit level of control does not involve modifying gates that are their own inverse.A fundamental example in this respect is the CNOT gate, which is the basic unit of two-qubit interactions.In contrast, the pulse inverse H I (t) used in our method reverses also the schedules of the CNOT gates, to keep consistency with the pulse-based inverse K I appearing in the KIK formula (S15).
In what follows, we also show that using K = K I in the case of target circuits that are their own inverse introduces an additional error term that is absent when K I is the pulse inverse.This further demonstrates the importance of the correct implementation of K I for KIK QEM.T ay,m reproduce Richardson ZNE when noise is linearly scaled through circuit folding.However, even in this limit of weak noise the KIK method provides insights that elude a naive application of circuit folding.An example of this was already given in Appendix SIIIA, by showing that, even though KK I andK I K are equivalent in the noiseless scenario, the product K I K is the correct choice for using the KIK formula (S15).This is valid in particular for Eq.(S59).
Here, we will see that ignoring the pulse inverse K I also has negative consequences for QEM using Eq.(S59), which for simplicity we call "Taylor mitigation".Specifically, we consider circuits such that which suggests the application of (S59) with K I = K.In this way, the M th order approximation to U (cf. Eq. ( 2) in the main text) reads T ay,m Ue Ω1 2m+1 , (S65) where K has been replaced by Eq. (S23) and Ω 1 = Ω 1 (T ).
Next, we approximate Ue Ω1 by Ue Ω1 ≈ U + UΩ 1 , and keep only terms that are linear in KIK .Using U 2m+1 = U, we have that where we have applied T ay,m = 1.Now we divide the sum 2m+1 k=1 U k Ω 1 U 2m+1−k into two sums such that one of them contains only even powers U k , and the other contains only odd powers U k .Taking into account that U −1 = U and that any even power of U yields I, we obtain Substituting Eq. (S68) into Eq.(S67), we find that T ay,0 UΩ 1 where in the second line we added and subtracted the term corresponding to m = 0 in the sum.Finally, we use again T ay,m = 1, and T ay,m m = − 1 2 .In this way, for any M ≥ 1, Because Taylor mitigation corresponds to weak noise, in the limit when M tends to infinity KIK should converge to the evolution U KIK in the KIK formula (S15).By construction, this is the case if K I is given by the pulse inverse.However, we see from Eq. (S70) that when KIK , irrespective of the mitigation order M .Note also that this term cannot be avoided in general by considering higherorder (nonlinear) contributions in Ω 1 .In this way, Eq. (S70) shows that using K I = K instead of the pulse inverse leads to an inconsistent application of QEM, which is afflicted by an additional error term 1  2 [U, Ω 1 ].Finally, we remark that for global depolarizing noise the associated noise channel commutes with any unitary U, and one can easily check that in this case the KIK formula (S15) is exact when K I = K.Thus, while the pulse-based inverse K I becomes unnecessary for such a simplified noise model [13], it is of paramount importance in practical applications under realistic noise.To obtain the coefficients a Adap,m , we minimize the quantity . Therefore, we have to solve the equations The M th coefficient is obtained by imposing the constraint a Adap,m .This is equivalent to the normalization condition m is trace-preserving if K and K I are trace-preserving.For the sake of notational simplicity, in the following we will write g(µ) as g.
Let us now check that the obtained coefficients minimize ε in the subspace determined by the variables . For M = 1, the second derivative and therefore a Adap,0 minimizes ε L2 with respect to a  L2 and the second partial derivative , and check their positivity.Since we only need to check the determinant of the Hessian matrix (cf.Appendix SIVD).In particular, g(µ) = 1 corresponds to Taylor mitigation.Keeping in mind Eq. (S90), the inverse K I is given by Figure S5 shows the fidelity F ρ as a function of M , for ξ = 0.00106 and ξ = 0.00223.Overall, we observe that the best performance among the tested functions g(µ) is achieved by g(µ) = µ 2 (blue curve).In particular, F ρ reaches a value extremely close to one already for M = 1.A clearer comparison between the different functions g(µ) is possible by looking at the ratio between the infidelity before QEM and after QEM, which quantifies the infidelity suppression provided by the KIK method.This quantity is plotted in Fig. S6.In this figure, we see that g(µ) = µ 2 outperforms g(µ) = 1 and g(µ) = µ for all 1 ≤ M ≤ 3.Although the ratio r F (M ) corresponding to g(µ) = µ 2 is slightly below that associated with g(µ) = µ 2.5 , in the case of M = 2, 3, g(µ) = µ 2 yields a substantially larger r Finally, it is interesting to observe how the different functions g(µ) produce physically consistent fidelities ≤ 1 as M increases.In particular, we can see that the unphysical fidelity F ρ > 1 corresponding to g(µ) = µ 2.5 is quickly fixed by going to the next mitigation order M = 2.The explanation for this behavior is as follows.The quality of the approximation Adap,m [g(µ)]λ m approximates the function λ −1/2 , where λ is an eigenvalue of K I K.If g(µ) is too small, as compared to the smallest eigenvalue of (K with low M may be a rough approximation to (K I K) −1/2 .This leads to undesired effects such as the aforementioned fidelity F ρ > 1 (note that in our example the function g(µ) = µ 2.5 yields the smallest g(µ) for any value of µ).However, by increasing M we can always improve the approximation in the whole interval (g(µ), 1), which by assumption contains all the eigenvalues of (K I K) −1/2 .This would explain not only the recovery of a physical fidelity but also that for M = 2 and M = 3 the maximum values of this quantity correspond to g(µ) = µ 2.5 .
On the contrary, when g(µ) is larger than the smallest eigenvalue of (K I K) −1/2 , the interval (g(µ), 1) does not contain all the eigenvalues of (K I K) −1/2 and increasing M does not necessarily improves the QEM.This is likely the reason that g(µ) = µ 2 yields fidelities smaller for M = 2 and M = 3, as compared to M = 1.

S-VI. DESCRIPTION OF THE EXPERIMENTAL PROCEDURES
Here, we provide additional details and complementary information concerning the experiments described in the main text.In Appendix SVIA we describe techniques to tackle different classes of errors, which complement the KIK For both experiments, initial rotations RZ (±π/2) (green squares) are applied on each qubit to mitigate a potential coherent error affecting the state |00 .After that, the initial state for the CNOT calibration experiment is prepared (this preparation is not part of the ten-swap experiment), and the evolutions K and KI are interleaved by RC operations (only in the case of the ten-swap experiment).Blue and magenta squares distinguish potentially different RC realizations, chosen from Table S1.(b) The evolution K used for the CNOT calibration experiment (left), and for the ten-swap experiment (right).
error mitigation.In Appendix SVIB, we explain in detail the steps to estimate error-mitigated expectation values, as well as the corresponding uncertainties.Then, information specific to the CNOT calibration experiment and the ten-swap experiment is given in appendices SVIC and SVID, respectively.We conclude in appendix SVIE with a simulated calibration of the CNOT gate.This simulation shows results very similar to those of the CNOT calibration experiment, and provides additional evidence of the usefulness of KIK-based calibration.

A. Integration with complementary error mitigation methods
We consider three complementary techniques for addressing errors that can affect an experiment in the three stages of its execution.Namely, the preparation of the initial state, the evolution, and the measurement.
In the IBM quantum processors the preparation of the single-qubit (default) computational state |0 may possess a small deviation angle δθ [35].As a result, the prepared state is given by |ψ(δθ) = cos δθ 2 |0 + e iϕ sin δθ 2 |1 .To cope with this error, we apply rotations R Z (±π/2) = e ∓i π 4 Z on each qubit, in such a way that R Z (π/2) and R Z (−π/2) are equally distributed among the number of shots.This produces a mixed state whose population Tr (|0 0| ) = cos 2 δθ 2 coincides with | 0|ψ(δθ) | 2 .However, the coherent error associated with the rotation angles δθ and ϕ is eliminated through this procedure.Since both experiments involve circuits acting on two qubits, there are four possible rotation operations characterized by the pairs of angles {(±π/2, ±π/2)}.This is illustrated by the green squares in Fig. S7(a).
Coherent errors present during the evolution stage of a quantum circuit can be specially harmful for quantum computing [41].Randomized compiling (RC) [36] is a standard method that allows to transform these errors into stochastic noise, which is amenable to the application of QEM using the KIK method.The essential idea of RC is to replace the target circuit by an average over compiled realizations that are logically equivalent to the original evolution.Each RC realization is obtained by randomly replacing and/or adding some gates that leave the noisefree target circuit invariant.However, in the presence of coherent errors, different RC realizations produce different evolutions, and coherent errors characteristic of single realizations can be averaged out into stochastic noise.
In our method, the implementation U (M ) KIK of the KIK formula contains both the target evolution K and its inverse K I .Therefore, for each KIK power K (K I K) m we independently applied RC on K and K I , as indicated in Fig. S7(a).
RC1 RC2 Table S1.Possible combinations of RC operations used for the evolutions K and KI in the ten-swap experiment.Here, RC1 and RC2 stand for Pauli gates that are performed as indicated in Fig. S7(a).Since in this circuit the ideal unitary associated with K and KI is the identity, the gate RC3 must coincide with RC1, and the gate RC4 must coincide with RC2.
This is done by using the Pauli gates shown in Table S1, which are performed before and after the execution of K and K I in the ten-swap experiment.A sequence of ten swap gates is logically equivalent to the identity operation, and one can straightforwardly check that each RC realization in Table S1 reproduces this operation.Furthermore, we remark that an independent application of RC on K and K I is important to perform an effective randomization of the coherent errors affecting both evolutions.This ensures that the first Magnus terms appearing in the evolutions K and K I K (cf.Eqs.(S23) and (S24)) are still related through Eq. (S25), and thus validates the KIK formula with the randomized versions of K and K I .Along the same line, we avoid the simplification of RC layers in any sequence K (K I K) m .This means that consecutive single-qubit gates used for RC are not merged into a single gate.Otherwise, the independence of the randomization would be affected by preventing that the gates of the original pair act separately on K and K I .
On the other hand, for the CNOT calibration experiment we omitted the use of RC.The reason is that, in this experiment, we explored an alternative mitigation of coherent errors affecting the calibrated gate, which operates differently from RC.More specifically, a KIK-based mitigation of stochastic noise in the calibration process can improve the quality of the calibration, which translates into a calibrated gate with reduced coherent errors.
A final ingredient for our QEM experiments is the mitigation of readout errors.For a circuit executed on N qubits, the measurement process outputs counts of the states |i 1 i 2 ...i N (i j ∈ {0, 1} for 1 ≤ j ≤ N ) in the computational basis.A readout error occurs when the state registered upon the measurement is incorrect.For example, for a single qubit whose final state is |1 , some counts may erroneously register the state as being |0 .
Let us denote a general N -bit string by k = (k 1 k 2 ...k N ), and the corresponding computational state by |k .When a quantum circuit produces a final state σ, readout errors cause wrong estimates of the probabilities p k = Tr (|k k|σ).A simple way to relate the ideal probabilities p k and the erroneous ones is by considering the probability distributions associated with each computational state.If p (l|k) denotes the conditional probability to register the state |l , given that the actual state is |k , for the final state σ the probability to measure |l in the presence of readout errors is given by Thus, the information about the readout errors is contained in a 2 N ×2 N matrix M with entries p (l|k), whose columns and rows are associated with k and l, respectively.To counteract the effect of the readout errors, we can invert the "measurement matrix" M. In this way, the vector of error-free probabilities can be obtained by applying the inverse M −1 to the vector of measured probabilities.Although the procedure described above is not scalable in N , because the size of M is exponential in the number of qubits, our experiments involve two qubits and admit an efficient estimation of the p (l|k).To this end, we prepared the four computational states {|ij } 1 i,j=0 using circuits with the appropriate X gates.For example, |01 is prepared by applying X on the second qubit.The measurement matrices were experimentally determined and then inverted for readout error mitigation in both the CNOT calibration experiment and the ten-swap experiment.The number of shots invested in the estimation of the distributions {p (l|k)} l is given in Table S2.
As a final comment, it is important to mention that the study of methods to efficiently cope with readout errors in circuits containing a large number of qubits is an active area of research.For example, in many cases the noisy probability distribution q l is mostly concentrated around the ideal distribution p k .This allows to construct a measurement matrix of reduced dimension that is defined over the subspace of bit strings with q l = 0 [17].In particular, the maximum size of such a matrix is determined by the maximum number of shots used to sample q l , and is independent of the number of qubits.

B. Statistical analysis of experimental data
The experimental estimates of different probability distributions are computed as frequencies over the number of shots.Ultimately, the goal is to estimate probability distributions {p (m,i) k } k for each m and i, where p (m,i) k denotes the probability to measure |k when the circuit K (K I K) m is implemented in combination with the ith RC realization.These probabilities are the key element for the computation of expectation values and the application of QEM.
In the case of the ten-swap experiment, we applied 16 RC realizations of each circuit K (K I K) m (see Appendix SVID for details).Hence, for any m the expectation value Tr (Oσ) of an observable O is estimated as where N RC = 16.Importantly, for observables that are not diagonal in the computational basis, the final state σ in Tr (Oσ) must contain a rotation that performs the corresponding change of basis.For example, to measure the observable O = Y 1 on qubit 1, for the CNOT calibration experiment, before the measurement we applied a rotation R X1 (π/2) = e −i π 4 X1 on this qubit.Since different RC realizations correspond to independent experiments, the variance for O m reads where N i is the number of shots used in the ith RC realization.In the CNOT calibration experiment RC was not implemented, as mentioned before.However, Eqs. (S104)-(S106) are still applicable in this case, with N RC = 8 being the number of circuits associated with different initial rotations (see Appendix SVIC), and the index i labeling any of these circuits.
The KIK method provides the error-mitigated expectation value with the coefficients a (M ) m chosen depending on the QEM strategy.Namely, Taylor mitigation, or adaptive mitigation.The independence of the experiments associated with different circuits K (K I K) m leads to the variance The error bars in the plots of the main text and the plots of Figs.S8 and S9 correspond to one standard deviation.That is, half of each error bar is given by Var ( O M ).

C. CNOT calibration experiment
The calibration experiment was performed on the IBM processor Jakarta, using the qubits labeled by 0 and 1.The qubit 0 was employed as control for the CNOT gate and the qubit 1 as target.In the IBM processors, the two-qubit interaction used to generate the CNOT gate is effectively implemented via the so called cross-resonance interaction [37] H CR = Z ⊗ X, where Z and X are the Pauli matrices Z = 1 0 0 −1 and X = 0 1 1 0 acting on the control qubit and target qubit, respectively.The CNOT thus involves a π/2 rotation with the Hamiltonian H CR .This shots per KIK circuit shots for readout mitigation RC operations CNOT calibration experiment 256000 512000 n/a ten-swap experiment 320000 240000 16 Table S2.Summary of experimental resources.The "shots per KIK circuit" are the number of shots associated with each circuit K (KI K) m , for 0 ≤ m ≤ 3.For both experiments, these shots include the initial rotations, and also RC operations in the case of the ten-swap experiment.The "shots for readout mitigation" are the total number of shots used to compute M −1 .
operation is performed in the quantum processor by applying a microwave pulse characterized by various calibrated parameters such as amplitude and duration.However, the values obtained for these parameters may be affected by systematic errors, due to noise present in the measurements used for calibration.We focus on the calibration of the pulse amplitude, using the KIK method to mitigate the effect of noise.As explained in the main text, we prepare the initial state 1 √ 2 (|0 + |1 ) ⊗ |0 and measure the observable Y = 0 −i i 0 on the target qubit for different pulse amplitudes.This choice of initial state and observable is convenient because it produces a calibration curve that is very well described by a straight line.We measured the expectation value Y 1 for the amplitude factors F ∈ {0.98, 0.9866, 0.9933, 1, 1.0066, 1.0133, 1.02}, where F = 1 corresponds to the default amplitude used by IBM, and fitted a straight line to the resulting data points.The calibrated amplitude factor is extracted from the intersection of the fitted line with the x axis, which corresponds to Y 1 = 0.The range of amplitudes F was chosen to be sufficiently narrow, so that one could observe the linear behavior previously described.
To increase the precision of the calibration, the experiment was performed using a sequence of 11 CNOT gates.Ideally, this circuit is equivalent to a single CNOT, and the repetition of CNOTs has the effect of amplifying the variations of Y 1 associated with different values of F .Accordingly, the inverse K I for the left circuit in Fig. S7(b) consists of a sequence of gates such that each of them is the pulse inverse of a single CNOT.
In the processor Jakarta, the maximum number of shots per circuit is 32000.For the mitigation of readout errors, we repeated four times the circuits employed in the preparation of each computational state, which yields a total of 4 × 32000 = 128000 shots used to estimate each probability distribution {p (l|k)} l .The circuits K (K I K) m , used for QEM, were preceded by any of the four rotation operations used to mitigate the preparation coherent error.These rotations were executed before the preparation of the initial state, as seen in Fig. S7(a).We repeated two times the circuit corresponding to any rotation.Hence, 8 × 32000 = 256000 shots were employed to measure the expectation value of Y 1 on each K (K I K) m , for each amplitude F .Since we applied KIK mitigation up to order 3, the value of m varies between 0 and 3.These experimental details are summarized in Table S2.

D. Ten-swap experiment
In this experiment, the error-mitigated quantity was the probability for the system to remain in the initial state |00 , after applying a sequence of ten swap gates.The IBM processor employed was Quito.In practice, three CNOT gates were used to implement each swap, as shown in Fig. 2B of the main text.Apart from the application of RC, detailed below, we reduced the impact of coherent errors by alternating a normal swap with a pulse-based inverse swap (see Fig. S7(b)).This strategy has been previously applied to gates that are their own inverse, with the purpose of mitigating coherent errors such as overrotations [42].Since a swap gate is its own inverse, our ideal target circuit is not modified by interleaving standard swaps with their pulse-inverse counterparts.
Let us now describe in more detail the distribution of the different circuits for this experiment.As in the previous case, any of the circuits K (K I K) m was preceded by one of the four rotation operations applied on the state |00 .For 0 ≤ m ≤ 3, a circuit K (K I K) m is accompanied by a rotation operation and a RC operation chosen at random from Table S1.We repeated four times each possible rotation, which results in a total of 16 circuits per each value of m, involving 16 (not necessarily different) RC realizations.Keeping in mind that the maximum number of shots for the IBM Quito device is 20000, the total number of shots used for each circuit K (K I K) m was 16 × 20000 = 320000.For readout mitigation, the circuits that prepare the computational states were repeated three times.Therefore, 3 × 20000 = 60000 shots were employed to estimate each probability distribution {p (l|k)} l .
We also include here the error mitigation curves for the initial states |01 , |10 , and |11 , which complement the curves shown in Fig. 2A of the main text.These plots are given in Fig. S8.As in the case of the state |00 , we can see that adaptive mitigation with the function g(µ) = µ 2 leads to the best results.We also remark that the slightly worse results corresponding to the state |11 may be due to several experimental factors.Namely, imperfect averaging of the initial state (see Eq. (S102)), drifts in gate parameters that characterize the single-qubit gates used for RC, and drifts in the measurement matrix M.However, for the present experiment we do not have sufficient information for determining the most dominant factor.
In order to study the effect of coherent errors in the ten-swap circuit, we executed an additional experiment on the IBM processor Quito.As in the case of Fig. S8(c), the quantity measured was the survival probability for the system to remain in the initial state |11 .The results presented in Fig. S9 were obtained by applying adaptive mitigation with g(µ) = µ 2 .Coherent errors are manifested in Fig. S9 .This provides strong evidence that the modest performance observed in Fig. S8(c) is not intrinsic to the state |11 , but possibly related to experimental factors already mentioned.We also remark that the orange curve is obtained by omitting the application of RC in any of the four repetitions of each initial rotation operation.The construction of both curves in Fig. S9(b) involves the same number of shots per KIK circuit, as per Table S2.
m that characterize the truncated Taylor expansion λ −

Figure 1 . 1 √ 2 (
Figure 1.Calibration of the pulse amplitude of a CNOT gate in the IBM processor Jakarta, using the KIK method.In (A) and (B) KI is given by the pulse inverse and the circuit inverse KI = K, respectively.The initial state is ρ = |ψ ψ|, with |ψ = 1 √ 2 (|0 + |1 ) ⊗ |0 .The default amplitude is increased by the factors F shown in the x axis of the figure, and for each factor we apply Eq. (2) to evaluate the expectation value Y1 , where Y1 is the y-Pauli matrix acting on the target qubit.The factor F Y 1 =0 corresponds to the ideal expectation value Y1 = 0 and yields the calibrated amplitude.The factors F Y 1 =0 associated with the magenta and black dashed lines are different, which indicates a shift in the amplitude obtained without KIK calibration.In Fig. 1B, we see that the convergence achieved for increasing M in Fig. 1A is spoiled by the use of the circuit inverse.

Figure 2 .
Figure 2. Experimental QEM in the IBM processor Quito.(A) Error-mitigated survival probability for the circuit of Fig. 2B, as a function of the mitigation order.The ideal survival probability is 1 (dashed black line).Green and orange curves show QEM adapted to the noise intensity, and the blue curve stands for mitigation assuming weak noise (Taylor mitigation).The thin colored areas connect small experimental error bars.We see that Taylor mitigation is outperformed by adapted mitigation.(B) The circuit used in the experiments.Each swap is implemented as a sequence of three CNOTs.
KIK formula for time-dependent noise III.QEM using the KIK formula IV.QEM in the strong noise regime V. Experimental results A. KIK-based gate calibration for mitigating coherent errors B. Quantum error mitigation in a ten-swap circuit VI.Concluding remarks S-III.Derivation of the KIK formula A. Relation between K I K and KK I in the KIK formula B. KIK formula and noise drifts S-IV.Implementations of the inverse of the noise channel N KIK A. Weak noise limit and Taylor approximation B. Richardson ZNE using Circuit folding and linear scaling of noise C. Error induced by the circuit inverse K I = K in circuits that satisfy U 2 = I D. Coefficients for Adaptive QEM based on the KIK formula, for mitigation orders M = 1, 2, 3 S-V.KIK QEM applied to the transverse Ising model A. Error mitigation S-VI.Description of the experimental procedures A. Integration with complementary error mitigation methods B. Statistical analysis of experimental data C. CNOT calibration experiment D. Ten-swap experiment E. Simulation of a noisy calibration of a CNOT gate, using the KIK method S-VII.Numerical example to illustrate saturation of the KIK formula 33 S-VIII.Derivation of upper bounds for the performance of the KIK method 34 A. Lower bound on the smallest eigenvalue of K I K 34 B. Sufficient condition for K I = K † 35 C. First error bounds for Adaptive mitigation and Taylor mitigation 37 D. Alternative bound for Taylor mitigation 39 E. Traceless observables and second (tighter) error bounds for Adaptive error mitigation and Taylor error mitigation

Figure S1 .
Figure S1.Time dependence of dissipators L(t) and LI (t).(a) Illustrative noisy quantum circuit, composed of three gates that act on three qubits.The total dissipator L(t) is given by a sequence of dissipators Li, 1 ≤ i ≤ 3, which affect each gate.(b) Upon reversing the drive pulse schedule (cf.Eq. (S11)), the order of the gates and the corresponding dissipators is also reversed.Hence, if the sequence of dissipators in (a) is L1L2L3, in (b) it is given by L3L2L1.

Figure S2
FigureS2.Two strategies to collect data for second-order error mitigation, using the KIK method.The arrows indicate the order of implementation of different circuits.In (a), the implementations (shots) are separated into repetitions of each circuit K (KI K) m , for 0 ≤ m ≤ 2 (horizontal groups).In (b), the shots are divided into S sets (vertical groups) that include implementations of all the circuits K (KI K) m .
S63) which coincides with the coefficient a (M ) T ay,m [cf.Eq. (S60)].Taking into account that A m = A|K (K I K)

C
. Error induced by the circuit inverse KI = K in circuits that satisfy U 2 = I Previously, we showed that the coefficients a (M )

D
. Coefficients for Adaptive QEM based on the KIK formula, for mitigation orders M = 1, 2, 3

Figure S6 .
Figure S6.Enhancement ratio (S101), which quantifies the infidelity suppression obtained by the KIK method.The color code in Fig.(b) is the same as in (a).

Figure S7 .
Figure S7.Schematic of the QEM circuits used in the experiments.(a) General form of a KIK circuit K (KI K) m .For both experiments, initial rotations RZ (±π/2) (green squares) are applied on each qubit to mitigate a potential coherent error affecting the state |00 .After that, the initial state for the CNOT calibration experiment is prepared (this preparation is not part of the ten-swap experiment), and the evolutions K and KI are interleaved by RC operations (only in the case of the ten-swap experiment).Blue and magenta squares distinguish potentially different RC realizations, chosen from TableS1.(b) The evolution K used for the CNOT calibration experiment (left), and for the ten-swap experiment (right).

Figure S8 .
Figure S8.Experimental QEM in the IBM processor Quito.Error-mitigated survival probability for the ten-swap experiment, for initial states |01 (a), |10 (b), and |11 (c).The ideal survival probability is 1 (dashed black line).Green and orange curves show adaptive QEM, and the blue curve stands for mitigation assuming weak noise (Taylor mitigation).The thin colored areas connect small experimental error bars.Here, we see again that Taylor mitigation is outperformed by adaptive mitigation.
(a)  by the significant separation between different QEM curves, which correspond to different initial rotations.Furthermore, Fig.S9(b) shows how the application of RC (blue curve) produces substantially more accurate QEM, for all 1 ≤ M ≤ 3. It is also worth stressing that in the absence of KIK QEM the enhancement provided by RC disappears, as evidenced by the matching of the blue and orange curves at M = 0.Each curve in Fig.S9(a) shows the survival probability for a given rotation operation, obtained by averaging over the associated RC realizations.Thus, if use the subscript r to label any of the four possible initial rotations, and i r to label RC realizations that accompany the rotation r, for M th order mitigation the corresponding survival probability is given byO M,r = M m=0 a (M ) Adap,m µ 2 O m,r ,(S109)where O = ρ = |00 00| and O mthe calculation of the corresponding variances takes into account the independence of different RC realizations for a given rotation.As compared to Fig. S8(c), the blue curve in Fig. S9(b) features a performance more consistent with that observed in Figs.S8(a) and S8(b)

Figure S10 .
Figure S10.Simulation of the calibration of a noisy CNOT gate, for noise strength ξ = 2/100.The curves show the expectation value of Y1 = I ⊗ Y as a function of the angle amplitude A θ , which determines the angle θ = A θ π 2 in the cross-resonance interaction HCR (cf.Eq. (S111)).The calibrated amplitude corresponds to Y1 = 0 (cyan dashed line).Different curves stand for different mitigation orders 0 ≤ M ≤ 4. Figures (a) and (b) correspond to the application of the pulse inverse and the circuit inverse KI = K, respectively.

Figure S11 .
Figure S11.Simulation of the calibration of a noisy CNOT gate, for noise strength ξ = 1/100.The curves show the expectation value of Y1 = I ⊗ Y as a function of the angle amplitude A θ , which determines the angle θ = A θ π 2 in the cross resonance interaction HCR (cf.Eq. (S111)).The calibrated amplitude corresponds to Y1 = 0 (cyan dashed line).Different curves stand for different mitigation orders 0 ≤ M ≤ 4. Figures (a) and (b) correspond to the application of the pulse inverse and the circuit inverse KI = K, respectively.