Our ability to manipulate quantum systems underpins potential advantages over classical technologies1. Their dynamics are often idealized as noiseless or, where noise is unavoidable, it is assumed to be uncorrelated. However, interactions with the environment generally perpetuate past information about the system to the future, thereby serving as a memory. Memory effects thus pervade physical evolutions, resulting in non-Markovian dynamics2. The complexity of describing processes grows exponentially with memory length; therefore many simulation techniques invoke memory cutoffs3. Although several metrics have been proposed to quantify memory (and the consequences of neglecting it)4, most do not consider the influence of interventions, overlooking the operational reality of sequentially probed dynamics. Indeed, the impact of memory depends on how a system is controlled3,4,5,6,7,8,9,10,11,12,13,14: generically, the system–environment state at any time is correlated, so an interrogation directly influences the system state and conditions the environment, both of which affect the future. Detected memory properties are thus naturally related to the interrogation method14. This operational perspective has important consequences for dynamical decoupling15,16,17, erasure or transmission of information18,19, correlated error correction and characterization20,21, and (operational) quantum thermodynamics22,23,24,25,26,27.

This can be illustrated with the shallow pocket model16,17,28,29, comprising a qubit coupled to a continuous degree of freedom (see Fig. 1 and Supplementary Methods A). The joint dynamics induces pure-dephasing Lindblad evolution for the qubit, with exponentially decaying coherences. Non-classical correlations between any preparation and measurement similarly vanish, so the reduced dynamics forgets the initial state. However, the evolution following a σx unitary reverts the system to its original state; in this sense, the process displays infinitely long memory. This example highlights that although certain temporal correlations of the unperturbed system may decay rapidly, they do not account for the whole story; more generally, there exist detectable correlations between the history and future processes.
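The revival described above can be sketched numerically. The following is a minimal illustration, not the calculation behind Fig. 1: it assumes the commonly used form of the shallow pocket model, with Hamiltonian (g/2) σz ⊗ x and a Lorentzian environment distribution of width γ, for which the qubit coherence decays as exp(−gγ|s|), where s is the net phase-accumulation time. The function name `coherence` and the parameter values are hypothetical, and the code tracks the coherence magnitude rather than the mutual information plotted in the figure.

```python
import numpy as np

def coherence(t, g=1.0, gamma=0.1, t_flip=None):
    """Relative coherence magnitude |rho_01(t)| / |rho_01(0)| of the qubit.

    Assumes H = (g/2) sigma_z (x) x with a Lorentzian environment of width
    gamma (an assumption of this sketch). A sigma_x flip at t_flip reverses
    the sign of the subsequently accumulated phase.
    """
    t = np.asarray(t, dtype=float)
    if t_flip is None:
        s = t                                   # free evolution
    else:
        # net phase-accumulation time: t before the flip, 2*t_flip - t after
        s = np.where(t < t_flip, t, 2.0 * t_flip - t)
    return np.exp(-g * gamma * np.abs(s))

times = np.array([0.0, 5.0, 7.5, 10.0])
free = coherence(times)                 # monotonic exponential decay
echo = coherence(times, t_flip=5.0)     # decay, then full revival at t = 10
```

Under these assumptions the unperturbed coherence keeps decaying, whereas the flipped evolution returns to its initial value at twice the flip time, which is the sense in which the memory is long-lived.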

Fig. 1: Memory in shallow pocket dynamics.

The mutual information I(S:A) between a system and an auxiliary initially prepared in a Bell pair decays exponentially as the system undergoes shallow pocket evolution (black). However, this is not the case if an intervention is applied at t1 (=5 above). We depict σx (blue); an offset rotation \(\sqrt{0.95}{\sigma }_{x}+\sqrt{0.05}{\sigma }_{z}\) (green); measurement of + in the x-basis (red); and trash-and-reprepare (purple).

Disentangling the non-Markovian memory, carried by the environment, from the temporal correlations that result from probing the system is a challenge that has only recently been resolved through an operational framework for describing quantum stochastic processes10,11. Subsequently, a notion of memory length, or quantum Markov order, was introduced, which reduces the complexity of describing processes with short-term memory by only retaining the minimal number of recent timesteps relevant to the dynamics12,13. Higher-order Markov models capture all memory effects below the Markov order and are therefore more accurate than Markovian approximations. However, a missing element from this theory so far has been a method for quantifying the strength of non-Markovian memory and developing efficient approximations accordingly: truncating weak temporal correlations will yield a more efficient description for a process without compromising the accuracy.

In this article, we construct an operational notion of memory strength for quantum processes. That is, we quantify the correlations between history and future processes with respect to an intermediate (multi-time) probing schema (see Fig. 2). Our main result links said memory strength with process recoverability: with respect to any interrogation sequence, one can approximate the process by discarding future-history correlations. If the memory strength is small for some instrument (family), then the approximate process accurately predicts expectation values for related observables, namely those in the linear span of the original instrument elements. This connection is akin to that between the conditional mutual information (CMI) and the fidelity of recovery for quantum states30,31, and involves a generalization of the measured relative entropy32 to quantum stochastic processes. A corollary follows for the “do nothing” instrument—which many approximations assume—which bounds the accuracy of predicting future states. Moreover, the memory strength for an informationally complete instrument bounds the distinguishability between the actual and recovered process for any experimental protocol. Lastly, we demonstrate our results via a solvable non-Markovian model, highlighting the complex memory structures amenable to our framework.

Fig. 2: Quantum stochastic process.

An open quantum process with an initial system–environment state and unitary evolutions (green) can be represented as a process tensor \({{{{\Upsilon }}}}_{{FMH}}\) (outline). For each event of a memory instrument sequence, \({{{{\mathcal{J}}}}}_{M}=\{{{\mathsf{O}}}_{M}^{({x}_{M})}\}\) (purple), a conditional future–history process \({\widetilde{{{\Upsilon }}}}_{FH}^{({x}_{M})}\) (blue, red) results; future–history correlations evidence memory effects across M.

Classical stochastic processes

We begin by reviewing the pertinent ingredients from the theory of classical stochastic processes before turning our attention to quantum stochastic processes. A classical stochastic process over a discrete set of times \({{{\mathcal{T}}}}:= \{{t}_{1},\ldots ,{t}_{n}\}\) is described by an n-point joint probability distribution \({\mathbb{P}}({x}_{n},\ldots ,{x}_{1})\). A process has finite-length memory whenever the probability of each event xk at time \({t}_{k}\in {{{\mathcal{T}}}}\) only conditionally depends upon the past \(\ell\) events:

$${\mathbb{P}}({x}_{k}| {x}_{k-1},\ldots ,{x}_{1})={\mathbb{P}}({x}_{k}| {x}_{k-1},\ldots ,{x}_{k-\ell }).$$

Here, \(\ell\), the minimum number for which Eq. (1) holds, denotes the Markov order; a Markovian process has \(\ell \le 1\). Markov order captures the complexity of characterizing a process, which grows exponentially in \(\ell\). Although \(\ell\) may be large for many processes, their memory can be truncated, permitting efficient approximation. Grouping the times into three segments: the history \(H=\{{t}_{1},\ldots ,{t}_{k-\ell -1}\}\), memory \(M=\{{t}_{k-\ell },\ldots ,{t}_{k-1}\}\) and future \(F=\{{t}_{k},\ldots ,{t}_{n}\}\) (see Fig. 2), Markov order implies the conditional factorization

$${{\mathbb{P}}}_{FH}({x}_{F},{x}_{H}| {x}_{M})={{\mathbb{P}}}_{F}({x}_{F}| {x}_{M}){{\mathbb{P}}}_{H}({x}_{H}| {x}_{M}),$$

i.e., the future and history are conditionally independent given memory events. The Markov order condition is equivalently expressed by the vanishing classical CMI

$$I(F:H| M):= {{\mathsf{h}}}_{FM}+{{\mathsf{h}}}_{MH}-{{\mathsf{h}}}_{FMH}-{{\mathsf{h}}}_{M},$$

where \({{\mathsf{h}}}_{X}:= -{\sum }_{x}{{\mathbb{P}}}_{X}(x){{\mathrm{log}}}\,[{{\mathbb{P}}}_{X}(x)]\). The CMI is interpreted as the memory strength of the process. For processes with Markov order \(\ell\), I(F : H|M) = 0 by Eq. (2); however, in general \({{\mathbb{P}}}_{FMH}\) does not conditionally factorize, and the CMI quantifies the correlations between F and H, given M.
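As a concrete illustration of Eq. (3), the sketch below evaluates the classical CMI for two toy three-time distributions, stored as arrays indexed `[f, m, h]`. The distributions, the transition matrix `T` and the function names are hypothetical choices made for this example only; entropies are computed in nats.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a probability array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]                      # 0 log 0 is taken to be 0
    return -np.sum(p * np.log(p))

def cmi(p_fmh):
    """I(F:H|M) = h_FM + h_MH - h_FMH - h_M for an array indexed [f, m, h]."""
    h_fm = entropy(p_fmh.sum(axis=2))
    h_mh = entropy(p_fmh.sum(axis=0))
    h_m = entropy(p_fmh.sum(axis=(0, 2)))
    return h_fm + h_mh - entropy(p_fmh) - h_m

p_h = np.array([0.3, 0.7])
T = np.array([[0.9, 0.2],             # column-stochastic: T[new, old]
              [0.1, 0.8]])

# Markov chain h -> m -> f: future depends on history only through memory.
p_markov = np.einsum('fm,mh,h->fmh', T, T, p_h)
# Future copies the history while the memory event is an independent coin flip.
p_copy = np.einsum('fh,m,h->fmh', np.eye(2), np.array([0.5, 0.5]), p_h)

cmi_markov = cmi(p_markov)            # vanishes up to rounding
cmi_copy = cmi(p_copy)                # equals the entropy of p_h
```

The first distribution satisfies the factorization of Eq. (2) and so has vanishing CMI; in the second, the memory event reveals nothing about the history, so the full history entropy survives as future–history correlation.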

The significance of approximately-finite Markov order is best encapsulated through the recovery map, \({{{{\mathcal{R}}}}}_{M\to FM}\), that acts only on M to approximate the correct future statistics: \({{\mathbb{P}}}_{FMH}({x}_{F},{x}_{M},{x}_{H})\simeq {{{{\mathcal{R}}}}}_{M\to FM}[{{\mathbb{P}}}_{MH}({x}_{M},{x}_{H})]\), with equality holding for processes with Markov order \(\le \ell\). Intuitively, the recovery map discards conditional future–history correlations and uses the r.h.s. of Eq. (2) as an approximate description. Importantly, the recovery map approximates the process with an error that is bounded by the CMI30,31, with complexity reduced to the approximate Markov order. Thus, whenever the memory is weak, the recovery map provides an accurate and efficient approximation.
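The classical recovery map and its error bound can be checked on a small example. The sketch below is illustrative only: it draws a random joint distribution (hypothetical data, indexed `[f, m, h]`), rebuilds it from the marginal on MH and the conditional of F given M, and compares the resulting one-norm error against the square root of twice the CMI, which follows from Pinsker's inequality when entropies are measured in nats.

```python
import numpy as np

def recover(p_mh, p_f_given_m):
    """Classical recovery map R_{M->FM}: append the future using only the
    memory outcome, P(f|m) P(m,h), discarding direct F-H correlations."""
    return np.einsum('fm,mh->fmh', p_f_given_m, p_mh)

rng = np.random.default_rng(0)
p = rng.random((2, 2, 2))            # generic joint distribution P(f, m, h)
p /= p.sum()

p_mh = p.sum(axis=0)                 # P(m, h)
p_fm = p.sum(axis=2)                 # P(f, m)
p_f_given_m = p_fm / p_fm.sum(axis=0)   # P(f|m): divide column m by P(m)

q = recover(p_mh, p_f_given_m)       # recovered distribution

# The relative entropy D(p || q) coincides with the CMI I(F:H|M) in nats.
cmi = np.sum(p * np.log(p / q))
l1_error = np.abs(p - q).sum()
bound = np.sqrt(2.0 * cmi)           # Pinsker: ||p - q||_1 <= sqrt(2 D(p||q))
```

For a distribution with exactly finite Markov order the recovered array would equal the original and both `cmi` and `l1_error` would vanish.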

Quantum stochastic processes

We now move to the realm of quantum stochastic processes. To quantify the strength of memory in quantum processes, we begin by introducing the process tensor framework, detailed in “Methods”, which generalizes Eq. (1) to the quantum domain. Consider a joint system–environment SE, with (correlated) initial state \({\rho }_{1}^{SE}\). The system is interrogated at t1 and a quantum event x1 is observed, with an associated completely positive (CP) transformation \({{{{\mathcal{O}}}}}_{1}^{({x}_{1})}\) on S. The events are described by an instrument \({{{{\mathcal{J}}}}}_{1}=\{{{{{\mathcal{O}}}}}_{1}^{({x}_{1})}\}\), which is trace preserving (TP), i.e., \({{{{\mathcal{O}}}}}_{1}^{{{{{\mathcal{J}}}}}_{1}}:= {\sum }_{{x}_{1}}{{{{\mathcal{O}}}}}_{1}^{({x}_{1})}\) is CPTP33. Following this intervention, SE evolves unitarily for time t2 − t1 according to the superoperator \({{{{\mathcal{U}}}}}_{2:1}^{SE}\). Then, S is probed at t2, and so on, until tn. The probability to observe x1, …, xn =: xn:1 using instruments \({{{{\mathcal{J}}}}}_{1},\ldots ,{{{{\mathcal{J}}}}}_{n}=:{{{{\mathcal{J}}}}}_{n:1}\) is

$${\mathbb{P}}({x}_{n:1}| {{{{\mathcal{J}}}}}_{n:1})={{{\rm{tr}}}}\left[{{{{\mathcal{O}}}}}_{n}^{({x}_{n})}{{{{\mathcal{U}}}}}_{n:n-1}^{SE}\ldots {{{{\mathcal{U}}}}}_{2:1}^{SE}{{{{\mathcal{O}}}}}_{1}^{({x}_{1})}\ {\rho }_{1}^{SE}\right].$$
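Equation (4) can be evaluated directly for a small toy model. The sketch below is a minimal illustration under assumptions that are not taken from the text: S and E are single qubits, the joint unitaries are random, the initial state is a random (generically correlated) pure state, and every instrument is a projective z-basis measurement on S. All names are hypothetical.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

def random_unitary(dim):
    """Unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(z)
    return q

# Instrument on S (first tensor factor): z-basis projectors, identity on E.
projectors = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
kraus = [np.kron(P, np.eye(2)) for P in projectors]

def sequence_probability(rho_se, unitaries, outcomes):
    """tr[O_n U_{n:n-1} ... U_{2:1} O_1 rho_1^{SE}] for one outcome sequence."""
    rho = rho_se
    for k, x in enumerate(outcomes):
        if k > 0:                                  # SE evolution between probes
            U = unitaries[k - 1]
            rho = U @ rho @ U.conj().T
        K = kraus[x]                               # CP map for outcome x on S
        rho = K @ rho @ K.conj().T                 # unnormalized post-event state
    return np.trace(rho).real

psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)
rho_1 = np.outer(psi, psi.conj())                  # initial SE state
unitaries = [random_unitary(4) for _ in range(2)]  # three probing times

probs = {x: sequence_probability(rho_1, unitaries, x)
         for x in itertools.product(range(2), repeat=3)}
total = sum(probs.values())                        # normalized, since each instrument is TP
```

Because each instrument sums to a trace-preserving map, the eight sequence probabilities add up to one regardless of the environment dynamics.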

As per Fig. 2, abstracting everything outside of experimental control defines the process itself and yields a multi-linear map from instruments to probability distributions called the process tensor10,11. On the other hand, the instruments on S are collected to define the generalized instrument \({{{{\mathcal{J}}}}}_{n:1}=\{{{{{\mathcal{O}}}}}_{n:1}^{({x}_{n:1})}\}\). A generalized instrument can be temporally correlated, although we focus on uncorrelated instrument sequences for simplicity. Intuitively, the process tensor encapsulates the uncontrollable effect of the environment, i.e., the process per se, whereas the generalized instrument represents the controllable influence. Any open quantum dynamics probed at a (causally ordered) number of times can be described by a process tensor, and any probing sequence by a generalized instrument8,34. (These objects are commonly known as quantum combs and have appeared elsewhere8,29,33,34,35,36,37,38,39,40,41.)

Both the process tensor and the instrument elements are higher-order quantum maps8,34,36,42 that can respectively be represented as quantum states \({{{{\Upsilon }}}_{n:1}}\) and \(\{{{\mathsf{O}}}_{n:1}^{({x}_{n:1})}\}\) via the Choi–Jamiołkowski isomorphism. The joint probability of realizing any sequence of events is given by the generalized Born-rule43 (see Methods)

$${{\mathbb{P}}}_{n:1}({x}_{n:1}| {{{{\mathcal{J}}}}}_{n:1})={{{\rm{tr}}}}\left[{{\mathsf{O}}}_{n:1}^{({x}_{n:1}){{{\rm{T}}}}}{{{\Upsilon }}}_{n:1}\right]=:{\langle {{\mathsf{O}}}_{n:1}^{({x}_{n:1})}\rangle }_{{{{\Upsilon }}}_{n:1}}.$$

The process tensor encodes all probabilities for any choice of instruments and thus characterizes the process. Grouping the times as before, the quantum generalization of the l.h.s. of Eq. (2) is obtained by projecting the memory of \({{{{\Upsilon }}}_{FMH}}\) onto the conditioning element of \({{{{\mathcal{J}}}}}_{M}=\{{{\mathsf{O}}}_{M}^{({x}_{M})}\}\). The result is the conditional future–history process (see Fig. 2)

$${\widetilde{{{\Upsilon }}}}_{FH}^{({x}_{M})}={{{{\rm{tr}}}}}_{M}\left[{{\mathsf{O}}}_{M}^{({x}_{M}){{{\rm{T}}}}}{{{\Upsilon }}}_{FMH}\right].$$

The tilde in Eq. (6) signifies that the conditional objects are not necessarily proper process tensors, since realizing memory events post-selects the history35,44; nonetheless, summing these yields a proper process tensor \({{{\Upsilon }}}_{FH}^{{{{{\mathcal{J}}}}}_{M}}:= {\sum }_{{x}_{M}}{\widetilde{{{\Upsilon }}}}_{FH}^{({x}_{M})}\). Mirroring the classical setting, if the conditional processes are uncorrelated, i.e., \({\widetilde{{{\Upsilon }}}}_{FH}^{({x}_{M})}={{{\Upsilon }}}_{F}^{({x}_{M})}\otimes {\widetilde{{{\Upsilon }}}}_{H}^{({x}_{M})}\), then the process has Markov order \(\ell\) with respect to \(\mathcal{J}_M\)12 [see r.h.s. of Eq. (2)]. The causality constraint on the process tensor ensures that \({{{\Upsilon }}}_{F}^{({x}_{M})}\) is causally ordered for each event. This operational notion means that by applying \({{{{\mathcal{J}}}}}_{M}\), no future–history correlations are possible for any history and future instruments, i.e., no memory lasting longer than \(\ell\) is detectable. In general, the conditional processes are correlated. However, long memory does not imply strong memory; we now show how weak correlations can be truncated for accurate and efficient approximation.


The first ingredient to developing finite Markov order approximations is to quantify the memory strength over a block of length \(\ell\), which we provide in Eq. (7). We then construct an approximate process that neglects long-term memory. We first use the knowledge from the interrogation to (partially) tomographically reconstruct the process on the memory block, ensuring that the approximate process acts correctly on all instruments lying in the span of the original one. This reconstruction typically exhibits future–history correlations, which are subsequently discarded to yield the more efficient approximate process of Eq. (10). We then bound the error of expectation values computed with the approximate process in terms of the memory strength, thereby endowing it with operational meaning. By iterating our procedure over translations of the memory block, significant savings in the complexity of description are possible.

Memory strength and recoverability

Given a particular realization xM of instrument \({{{{\mathcal{J}}}}}_{M}\), the future and history can be more or less correlated. Such correlations can be detected by applying any choice of instruments \({{{{\mathcal{J}}}}}_{F},{{{{\mathcal{J}}}}}_{H}\) on the conditional future-history process. Taking the supremum over such instruments provides an operational definition of memory strength, quantifying the largest detectable conditional future–history correlations. We thus define

$${{\Theta }}({{{{\mathcal{J}}}}}_{M})\ := \ \ \ \mathop{{{{\rm{sup}}}}}\limits_{{{{{\mathcal{J}}}}}_{F},{{{{\mathcal{J}}}}}_{H}\in {\mathbb{J}}}\mathop{\sum}\limits_{{x}_{M}}p({x}_{M}| {{{{\mathcal{J}}}}}_{M}){I}_{{\mathbb{J}}}{(F:H| {x}_{M})}_{{{\Upsilon }}},$$

where \(p({x}_{M}| {{{{\mathcal{J}}}}}_{M}):= {{{\rm{tr}}}}\left[{\widetilde{{{\Upsilon }}}}_{FH}^{({x}_{M})}\right]\) is the probability of observing xM and the supremum is taken over uncorrelated instruments \({{{{\mathcal{J}}}}}_{Y}:= \{{{\mathsf{O}}}_{Y}^{({x}_{Y})}\}\) for Y ∈ {F, H} belonging to a set \({\mathbb{J}}\). We have defined the measured CMI

$${I}_{{\mathbb{J}}}{(F:H| {x}_{M})}_{{{\Gamma }}}:= I{(F:H)}_{{{\mathbb{P}}}_{{{\Gamma }}}({x}_{F},{x}_{H}| {x}_{M})},\quad \,{{\mbox{where}}}\,$$
$${{\mathbb{P}}}_{{{\Gamma }}}({x}_{F},{x}_{H}| {x}_{M}):= {{{\rm{tr}}}}\left[{({{\mathsf{O}}}_{F}^{({x}_{F})}\otimes {{\mathsf{O}}}_{H}^{({x}_{H})})}^{{{{\rm{T}}}}}{\widetilde{{{\Gamma }}}}_{FH}^{({x}_{M})}\right]$$

is the conditional probability distribution for any process Γ and independent future and history instruments conditioned on a fixed memory event. Intuitively, the memory strength in Eq. (7) captures the largest detectable future-history correlation, conditioned on the outcomes recorded on the memory block, aggregated to the level of \({{{{\mathcal{J}}}}}_{M}\) by averaging over xM.

Unlike classical memory, quantum memory effects depend upon the choice of probing scheme, suggesting that a universal quantum recovery map may not exist. However, we now construct a quantum process recovery map that efficiently builds up longer processes from shorter ones whenever the process is stationary. We begin with an ansatz process with finite quantum Markov order with respect to \({{{{\mathcal{J}}}}}_{M}=\left\{{{\mathsf{O}}}^{({x}_{M})}\right\}\):

$${\underline{{{\Lambda }}}}_{FMH}^{{{{{\mathcal{J}}}}}_{M}}:= \mathop{\sum}\limits_{{x}_{M}}{{{\Upsilon }}}_{F}^{({x}_{M})}\otimes {{\mathsf{D}}}_{M}^{({x}_{M})}\otimes {\widetilde{{{\Upsilon }}}}_{H}^{({x}_{M})},$$

where the \({{\mathsf{D}}}_{M}^{({x}_{M})}\) are dual operators satisfying \(\,{{\mbox{tr}}}\,\left[{{\mathsf{D}}}_{M}^{({x}_{M}){{{\rm{T}}}}}{{\mathsf{O}}}_{M}^{({x}_{M}^{\prime})}\right]={\delta }_{{x}_{M}{x}_{M}^{\prime}}\)7,8. This is the tomographic representation of the process via linear inversion of the instrument outcomes, with conditional FH correlations discarded. The recovered process can exhibit correlations between H and M, M and F (as well as within each block), but not between H and F. By construction, \({\underline{{{\Lambda }}}}_{{FMH}}^{{{{{\mathcal{J}}}}}_{M}}\) is positive on its domain, which is the span of \({{{{\mathcal{J}}}}}_{M}\). When \({{{{\mathcal{J}}}}}_{M}\) is not informationally complete, i.e., does not span the full space, then \({\underline{{{\Lambda }}}}_{FMH}^{{{{{\mathcal{J}}}}}_{M}}\) only approximates the original process in said subspace, which we denote by the underline. Such processes are called restricted process tensors45 and are commonly encountered in experiments46,47,48,49. Nevertheless, its action on its domain is guaranteed to reproduce the correct statistics for any multi-time observable of the form \(C={\sum }_{x}{c}_{x}^{{{{\mathcal{J}}}}}{{\mathsf{O}}}^{(x)}\) with \({{\mathsf{O}}}^{(x)}={\sum }_{{x}_{M}}\ {{\mathsf{E}}}_{FH}^{(x,{x}_{M})}\ \otimes {{\mathsf{O}}}_{M}^{({x}_{M})}\), with arbitrary \({{\mathsf{E}}}_{FH}^{(x,{x}_{M})}\). This is a consequence of linearity, as the observable form ensures a linear decomposition in terms of \({{\mathsf{O}}}_{M}^{({x}_{M})}\), upon which the recovered process acts correctly due to \(\,{{\mbox{tr}}}\,\left[{{\mathsf{D}}}_{M}^{({x}_{M}){{{\rm{T}}}}}{{\mathsf{O}}}_{M}^{({x}_{M}^{\prime})}\right]={\delta }_{{x}_{M}{x}_{M}^{\prime}}\).
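The duality condition can be made concrete with a small linear-inversion sketch. The code below is an illustration under stated assumptions rather than the construction used for the results: it takes four linearly independent qubit projectors (a hypothetical stand-in for the Choi operators of an instrument), obtains one valid dual set from the pseudo-inverse of the vectorized operators, and verifies the defining relation. Dual sets are not unique when the operators do not span the full space; the pseudo-inverse is simply one convenient choice.

```python
import numpy as np

def dual_set(ops):
    """One choice of duals D^(x) with tr[D^(x)T O^(x')] = delta_{x x'},
    valid when the operators O^(x) are linearly independent."""
    A = np.stack([op.reshape(-1) for op in ops], axis=1)  # columns: vec(O^(x))
    V = np.linalg.pinv(A)                                 # left inverse: V @ A = identity
    return [row.reshape(ops[0].shape) for row in V]

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
plus = (ket0 + ket1) / np.sqrt(2)
plus_i = (ket0 + 1j * ket1) / np.sqrt(2)
ops = [np.outer(k, k.conj()) for k in (ket0, ket1, plus, plus_i)]

duals = dual_set(ops)

# Overlap matrix tr[D^(x)T O^(x')]; it should equal the identity.
overlaps = np.array([[np.trace(D.T @ O) for O in ops] for D in duals])

# The duals are Hermitian here but not positive semidefinite.
min_eig = min(np.linalg.eigvalsh(D).min() for D in duals)
```

The negative eigenvalue found in `min_eig` illustrates why the duals cannot in general be treated as instrument elements themselves.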

The above ansatz is the process analogue of a quantum Markov chain state30,31, which is widely studied in the context of Petz’s recovery map50,51. Like a quantum Markov chain state, the process above is generally not separable, as F and H can share entanglement with M, as per the example in Supplementary Methods B. However, since the dual elements here can be non-positive and cannot necessarily be decomposed into orthogonal parts, care must be taken in defining the recovery map \({{{{\mathcal{R}}}}}_{M\to FM}^{{{{{\mathcal{J}}}}}_{M}}:{\underline{{{\Lambda }}}}_{MH}^{{{{{\mathcal{J}}}}}_{M}}\to {\underline{{{\Lambda }}}}_{FMH}^{{{{{\mathcal{J}}}}}_{M}}\) (see Supplementary Methods C). The concept of the quantum recovery map is analogous to the classical case [see below Eq. (3)]. The key advantage of the quantum process recovery map is that its repeated action on the ansatz propagates the process arbitrarily far into the future with fixed \(\ell\)-dependent complexity. If the process \({{{{\Upsilon }}}_{{FMH}}}\) has weak memory \({{\Theta }}({{{{\mathcal{J}}}}}_{M})\), the expectation value of any valid observable calculated from the recovered process \({\underline{{{\Lambda }}}}_{FMH}^{{{{{\mathcal{J}}}}}_{M}}\) accurately approximates that of the original:

Theorem 1

For any multi-time observable C with support on M within the span of the elements of \({{{{\mathcal{J}}}}}_{M}\),

$$\left|{\langle C\rangle }_{{{{\Upsilon }}}_{FMH}}-{\langle C\rangle }_{{\underline{{{\Lambda }}}}_{FMH}^{{{{{\mathcal{J}}}}}_{M}}}\right|\le | {{{\bf{C}}}}| \sqrt{2{{\Theta }}({{{{\mathcal{J}}}}}_{M})},$$

with \(| {{{\bf{C}}}}| := {\inf }_{{{{\mathcal{J}}}}}\ \sqrt{{\sum }_{x}| {c}_{x}^{{{{\mathcal{J}}}}}{| }^{2}}\).

This and the following statements are proven in the Methods section. Thm. 1 is fully general inasmuch as it holds without any assumptions on the dynamics or instruments employed. The r.h.s. involves a supremum over instruments on the future and history; as the memory strength takes the form of a generalized divergence, recent numerical techniques can be used for its estimation52,53,54. In Supplementary Methods D, we provide an easier-to-compute (and looser) bound based on the relative entropy between the original and recovered process—which foregoes the requirement for optimization—by adapting results from refs. 30,32,51,55 to first bound a generalized measured relative entropy and then the l.h.s. of Eq. (11) via Pinsker’s inequality. We also prove another bound, which is tighter in some cases, by restricting to unbiased instruments satisfying \({{{{\rm{tr}}}}}_{M}\left[{{\mathsf{O}}}_{FMH}^{{{{\mathcal{J}}}}}\right]\propto {{\mathbb{1}}}_{FH}\). Such instruments have the unconditional action of a completely depolarising channel, e.g., a randomly sampled Clifford gate. Deriving tighter bounds under various assumptions remains an open problem.

While Thm. 1 applies to multi-time observables, often one only requires the time-evolved density operator; a corollary bounds its prediction error:

Corollary 2

Let \({\rho }_{j}^{({x}_{M})}\) be the true density operator at any time tj ∈ F following outcome xM of \({{{{\mathcal{J}}}}}_{M}\) applied to the memory, and \({\rho ^{\prime} }_{j}^{({x}_{M})}\) be the approximated one. Then:

$${\left\Vert {\rho }_{j}^{({x}_{M})}-{\rho ^{\prime} }_{j}^{({x}_{M})}\right\Vert }_{1}\le \sqrt{2{{\Theta }}({{{{\mathcal{J}}}}}_{M})}\ \forall \ {t}_{j}\in F.$$

The future states result from applying identity maps at all history and future times except tj and the instrument \({{{{\mathcal{J}}}}}_{M}\) to the memory block to both the true and recovered process. See ref. 48 for a detailed analysis of using finite Markov order approximations to accurately prepare future states.

Whenever \({{{{\mathcal{J}}}}}_{M}\) spans the full space, i.e., is informationally complete, then any multi-time expectation value can be accurately approximated. In this case the distinguishability, by any means, is bounded by the memory strength:

Theorem 3

For informationally complete \({{{{\mathcal{J}}}}}_{M}\), the recovered process \({{{\Lambda }}}_{FMH}^{{{{{\mathcal{J}}}}}_{M}}\) gives sensible predictions for any instrument on M and

$${\left\Vert {{{\Upsilon }}}_{FMH}-{{{\Lambda }}}_{FMH}^{{{{{\mathcal{J}}}}}_{M}}\right\Vert }_{\Diamond}\le \sqrt{2{{\Theta }}({{{{\mathcal{J}}}}}_{M})},$$

where \(\parallel X{\parallel }_{\Diamond}:= {\sup }_{{{{\mathcal{J}}}} = \{{{\mathsf{O}}}^{(x)}\}}\parallel {\sum }_{x}{{{\rm{tr}}}}[{{\mathsf{O}}}^{(x)}X\otimes {\mathbb{1}}]\left|x\right\rangle \left\langle x\right|\ {\parallel }_{1}\) generalizes the diamond norm56 to quantum processes.

The memory strength thus provides an operationally clear measure: if there exists some informationally complete instrument for which \({{\Theta }}({{{{\mathcal{J}}}}}_{M})\) is small, then Thm. 3 states that one can closely approximate the process for all instruments, even those for which the memory strength is large. If, additionally, \({{{\Upsilon }}}_{FH}^{({x}_{M})}\approx {{{\Upsilon }}}_{FH}^{({x}_{M}^{\prime})}\,\forall \,{x}_{M},{x}_{M}^{\prime}\) in the informationally complete instrument, then the process has small memory strength for all instruments; such processes have properties similar to those of approximately finite-memory classical stochastic processes (where there is only one instrument).

Case study

Consider a qubit S coupled to another qubit E, which is cooled by an external bath. The joint evolution follows:

$$\frac{\partial {\rho }_{t}^{SE}}{\partial t}=-i\xi [{\sigma }_{x}^{S}\otimes {\sigma }_{x}^{E},{\rho }_{t}^{SE}]+\kappa {{{\mathcal{L}}}}[{\sigma }_{-}^{E}]({\rho }_{t}^{SE}),$$

where the dissipator acts on E: \({{{\mathcal{L}}}}[{\sigma }_{-}^{E}]({\rho }_{t}^{SE}):= {\sigma }_{-}^{E}{\rho }_{t}^{SE}{\sigma }_{+}^{E}-\frac{1}{2}\{{\sigma }_{+}^{E}{\sigma }_{-}^{E},{\rho }_{t}^{SE}\}\), with \({\sigma }_{\pm }^{E}:= {\sigma }_{x}^{E}\pm i{\sigma }_{y}^{E}\). In ref. 57, it was shown that for \({\kappa }^{2}\ge 64{\xi }^{2}\), the process is CP-divisible, which is a common proxy for quantum Markovianity58,59; however, CP-divisibility only implies an absence of some kinds of memory60,61. Non-Markovianity ‘measures’ built upon two-time considerations, many of which are contradictory4, overlook multi-time effects. Indeed, this model contains higher-order correlations; by constructing the process tensor, we quantify the (non-vanishing) non-Markovianity for all (ξ, κ) ∈ [0, 2] × [0, 10] (see Supplementary Methods E). We then examine the memory strength in various regimes by constructing three 6-time process tensors ϒ6:1(ξ, κ): one CP-divisible, one strongly non-Markovian, and one intermediate, and let M range from t2 to t5. We consider: (i) the identity map, which captures the natural memory strength; (ii) the “causal break” instrument, where the system is measured and independently reprepared in an informationally complete manner, breaking information flow through the system; and (iii) the completely noisy instrument, which replaces the state with white noise, quantifying noise-resistant memory14.
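The joint evolution of Eq. (14) is straightforward to integrate numerically for two qubits. The sketch below is a minimal illustration, not the code used for the case study: it builds the Liouvillian in a row-major vectorization, exponentiates it with a simple scaling-and-squaring Taylor series (a hypothetical helper named `expm`, used here only to avoid extra dependencies), and propagates a product initial state. The parameter values and the initial state are arbitrary choices, and σ± is taken exactly as defined in the text, without a factor of 1/2.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
id2 = np.eye(2, dtype=complex)
I4 = np.eye(4, dtype=complex)

def liouvillian(xi, kappa):
    """Generator of Eq. (14) acting on row-major vec(rho), S (x) E ordering.
    Uses vec(A rho B) = (A kron B^T) vec(rho)."""
    H = xi * np.kron(sx, sx)
    Lm = np.kron(id2, sx - 1j * sy)            # sigma_-^E as defined in the text
    LdL = Lm.conj().T @ Lm                     # sigma_+^E sigma_-^E
    ham = -1j * (np.kron(H, I4) - np.kron(I4, H.T))
    diss = np.kron(Lm, Lm.conj()) - 0.5 * (np.kron(LdL, I4) + np.kron(I4, LdL.T))
    return ham + kappa * diss

def expm(A, terms=30):
    """Matrix exponential via a Taylor series with scaling and squaring."""
    norm = np.linalg.norm(A, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    B = A / 2**s
    out = np.eye(A.shape[0], dtype=complex)
    term = np.eye(A.shape[0], dtype=complex)
    for k in range(1, terms):
        term = term @ B / k
        out = out + term
    for _ in range(s):
        out = out @ out
    return out

def evolve(rho, xi, kappa, t):
    vec = expm(liouvillian(xi, kappa) * t) @ rho.reshape(-1)
    return vec.reshape(4, 4)

psi_s = np.array([np.sqrt(0.8), np.sqrt(0.2)], dtype=complex)
rho_s = np.outer(psi_s, psi_s.conj())
rho_e = np.diag([0.0, 1.0]).astype(complex)    # E starts in the state sigma_- maps into
rho_0 = np.kron(rho_s, rho_e)

rho_t = evolve(rho_0, xi=1.0, kappa=2.0, t=1.0)
sx_before = np.trace(np.kron(sx, id2) @ rho_0).real
sx_after = np.trace(np.kron(sx, id2) @ rho_t).real
```

Since σx on S commutes with the coupling and the dissipator acts only on E, the expectation value of σx on S is conserved, which gives a quick sanity check on the generator.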

All processes have vanishing memory strength for the completely noisy instrument, which can be implemented by applying random unitaries sampled from a set whose average is the depolarizing channel, providing a convenient way to bound memory46. For cases (i) and (ii), in Fig. 3 we plot the error in the multi-time expectation value (i.e., l.h.s. of Thm. 1) and a memory strength proxy based on the relative entropy between the Choi states of the true and recovered processes (see Supplementary Methods D), which upper bounds the r.h.s. of Thm. 1. The observable C is chosen as an initial preparation, followed by doing nothing for four steps, before a final measurement. Each process displays significant memory strength for the identity instrument, indicating that unperturbed memory does not decay rapidly. In contrast, the effects of interventions are seen for the causal break: here, all memory effects detected result from environmental interactions, since the causal break ensures that no temporal correlations can be transmitted through the system (cf. the identity instrument). The CP-divisible process exhibits negligible memory strength, the intermediate process some, and the strongly non-Markovian one stronger still. We emphasize that the unperturbed evolution is better approximated by the process recovered from the informationally complete recovery scheme than that from the identity instrument, demonstrating Thm. 3.

Fig. 3: Case study.

We plot \(| {\langle C\rangle }_{{{{\Upsilon }}}_{FMH}}-{\langle C\rangle }_{{\underline{{{\Lambda }}}}_{FMH}^{{{{{\mathcal{J}}}}}_{M}}}|\) (hollow, dashed) and the proxy memory strength based on the relative entropy between the Choi states of the true and recovered processes (see Supplementary Methods D) (solid) for (i) the identity map and (ii) a causal break. We construct a six-step process tensor in the strongly non-Markovian (red, circles), CP-divisible (blue, squares), and intermediate (green, diamonds) regimes, and consider the memory strength over applications of said instruments.


We have introduced the concept of memory strength for quantum stochastic processes, which is shown to bound process recoverability. Its applicability is exemplified by the case study, where we are able to accurately and efficiently reconstruct dynamics with a memory cutoff, even in a highly non-Markovian regime. We expect these tools to be broadly applicable to modern techniques for efficient simulation, where operationally motivated memory approximations with quantitative error bounds are desired, such as transfer tensor62,63,64,65,66 and machine-learning methods that either attempt to learn non-Markovian features67,68 or compress the memory to low-dimensional effective environments69,70. Our notion of memory strength and the associated concept of recoverability will play an important role in the characterization and mitigation of noise in quantum experiments where multi-time memory effects are present46,47,48,49. Of particular relevance in this direction, in ref. 46, the authors recovered the restricted process tensors on four IBM quantum computers and reported a reconstruction fidelity of order \(10^{-3}\). In addition, in refs. 48,49, the authors directly applied our tools to drastically reduce the number of conditional circuits required to be estimated to characterize a multi-time process on the IBM quantum computer. In particular, they demonstrate that a finite Markov order model of length \(\ell\) = 2, 3 suffices to prepare future target states on a five-step non-Markovian process with approximately 88% and 93% fidelity, respectively, in the presence of correlated noise. This is a huge improvement over previous ‘gold standard’ techniques using gate-set tomography to mitigate state-preparation-and-measurement errors, which gives fidelity values of around 75% in the same circumstances.
Our present work lays the conceptual foundations for approximating processes with finite memory with requirements only as large as the complexity of the memory and these examples highlight the efficacy of our framework in realistic settings. Further developments in this direction will bridge the gap between efficient characterization and simulation of quantum processes with memory48,49,71,72.


Introduction to process tensor

Here we provide an introduction to the process tensor formalism; for details, see, e.g., refs. 8,11,34,73.

A discrete-time classical stochastic process is characterized by the joint probability distribution \({\mathbb{P}}\) over all sequences of events, \({\mathbb{P}}({x}_{n},\ldots ,{x}_{1})\), where we drop the explicit time labels with the understanding that xj represents an event at time tj. In multi-time quantum processes, it is important to not only capture the outcome of a measurement, but also the transformation induced on the state, which together constitute an event. Thus, an interrogation of a quantum stochastic process at time tj is described by an instrument \({{{{\mathcal{J}}}}}_{j}=\{{{{{\mathcal{O}}}}}_{j}^{({x}_{j})}\}\), which is a collection of completely positive (CP) maps that sum to a completely positive and trace preserving (CPTP) map. Instruments represent general quantum operations, including projective measurements, unitary transformations, and anything in between. Each CP map corresponds to a particular event realized and the fact that the maps sum to a CPTP one encodes the assumption that some event is observed. A discrete-time quantum stochastic process is uniquely described once the probability \({\mathbb{P}}({x}_{n},\ldots ,{x}_{1}| {{{{\mathcal{J}}}}}_{n},\ldots ,{{{{\mathcal{J}}}}}_{1})\) for all possible events {x1, …, xn} for all possible instruments \(\{{{{{\mathcal{J}}}}}_{1},\ldots ,{{{{\mathcal{J}}}}}_{n}\}\) are known. As a consequence of the linearity of mixing principle43, there exists a multi-linear functional \({{{{\mathcal{T}}}}}_{n:1}\) that takes any sequence of CP maps to the correct probability distribution via \({\mathbb{P}}({x}_{n},\ldots ,{x}_{1}| {{{{\mathcal{J}}}}}_{n},\ldots ,{{{{\mathcal{J}}}}}_{1})={{{{\mathcal{T}}}}}_{n:1}[{{{{\mathcal{O}}}}}_{n}^{({x}_{n})},\ldots ,{{{{\mathcal{O}}}}}_{1}^{({x}_{1})}]\), known as the process tensor11. The process tensor generalizes classical stochastic processes9 and reproduces classical properties appropriately74,75. 
As it encodes all detectable memory effects, it has been used to develop operationally meaningful notions of quantum Markovianity10,11 and memory length12,13,14.

Since all of the CP maps constituting the instruments, as well as the process tensor itself, are linear maps, they can be represented as matrices through the Choi–Jamiołkowski isomorphism (CJI)8. Any map \({{{\mathcal{O}}}}:{{{\mathcal{B}}}}({{{{\mathcal{H}}}}}_{{{\rm{i}}}})\to {{{\mathcal{B}}}}({{{{\mathcal{H}}}}}_{{{{o}}}})\), where \({{{\mathcal{B}}}}({{{\mathcal{X}}}})\) denotes the space of bounded linear operators on \({{{\mathcal{X}}}}\), can be mapped isomorphically to a matrix \({\mathsf{O}}\in {{{\mathcal{B}}}}({{{{\mathcal{H}}}}}_{{{{o}}}}\otimes {{{{\mathcal{H}}}}}_{{{\rm{i}}}})\) through its action on half of an (unnormalized) maximally entangled state \({{{\Psi }}}^{+}:= {\sum }_{i,j}\left|ii\right\rangle \left\langle jj\right|\in {{{\mathcal{B}}}}({{{{\mathcal{H}}}}}_{{{\rm{i}}}}\otimes {{{{\mathcal{H}}}}}_{{{\rm{i}}}})\), i.e., \({\mathsf{O}}:= ({{{\mathcal{O}}}}\otimes {{{\mathcal{I}}}})[{{{\Psi }}}^{+}]\). Note that the time of the event is associated to both an input and output Hilbert space, and the Choi matrix is a supernormalized bipartite state. In this representation, the properties of CP and TP for the maps respectively translate to O ≥ 0 and \({{{{\rm{tr}}}}}_{{{{o}}}}\left[{\mathsf{O}}\right]={{\mathbb{1}}}_{{{\rm{i}}}}\). To aid intuition, note that the output state \({\rho }^{\prime}:= {{{\mathcal{O}}}}[\rho ]\) of the map \({{{\mathcal{O}}}}\) acting on an arbitrary input state ρ is computed in the Choi picture via \({\rho }^{\prime}={{{{\rm{tr}}}}}_{{\mathtt{i}}}\left[({{\mathbb{1}}}_{{{{o}}}}\otimes {\rho }_{{{\rm{i}}}\,}^{{{{\rm{T}}}}}){\mathsf{O}}\right]\). If the initial state is not subject to a (CPTP) quantum channel but instead a particular event is observed, associated to a CP map of an instrument \({{{\mathcal{J}}}}=\{{{\mathsf{O}}}^{(x)}\}\), then the corresponding probability is computed via

$${\mathbb{P}}(x| {{{\mathcal{J}}}})={{{\rm{tr}}}}\left[({{\mathbb{1}}}_{{{{o}}}}\otimes {\rho }_{{{\rm{i}}}\,}^{{{{\rm{T}}}}}){{\mathsf{O}}}_{\,{{{o}}}{{\rm{i}}}\,}^{(x)}\right].$$
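As a concrete illustration of the Choi picture described above, the following minimal sketch builds the Choi matrix of a qubit amplitude-damping channel and of its two instrument elements, then checks the CP and TP conditions, the output-state formula, and the outcome probabilities. The channel, the damping strength `gamma`, the input state `rho`, and all function names are assumptions chosen for illustration; they do not appear in the text.

```python
import numpy as np

def choi(kraus_ops):
    """Choi matrix O = sum_ij O[|i><j|] (x) |i><j|, ordered as H_o (x) H_i."""
    d_out, d_in = kraus_ops[0].shape
    O = np.zeros((d_out * d_in, d_out * d_in), dtype=complex)
    for i in range(d_in):
        for j in range(d_in):
            E = np.zeros((d_in, d_in), dtype=complex)
            E[i, j] = 1.0
            out = sum(K @ E @ K.conj().T for K in kraus_ops)
            O += np.kron(out, E)
    return O

def trace_out(X, d_out, d_in):
    """Partial trace over the output space H_o."""
    return np.einsum('aiaj->ij', X.reshape(d_out, d_in, d_out, d_in))

def trace_in(X, d_out, d_in):
    """Partial trace over the input space H_i."""
    return np.einsum('aibi->ab', X.reshape(d_out, d_in, d_out, d_in))

gamma = 0.3  # hypothetical damping strength
K0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
K1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)

# Example input state with complex coherences, so the transpose matters.
rho = np.array([[0.5, -0.5j], [0.5j, 0.5]])

O_channel = choi([K0, K1])               # CPTP map (sum over outcomes)
O_events = [choi([K0]), choi([K1])]      # CP maps of the instrument

is_cp = np.linalg.eigvalsh(O_channel).min() >= -1e-12
is_tp = np.allclose(trace_out(O_channel, 2, 2), np.eye(2))

weight = np.kron(np.eye(2), rho.T)       # 1_o (x) rho^T
rho_out = trace_in(weight @ O_channel, 2, 2)
rho_direct = K0 @ rho @ K0.conj().T + K1 @ rho @ K1.conj().T

probs = [float(np.real(np.trace(weight @ O_x))) for O_x in O_events]
```

Here `rho_out` reproduces the directly computed channel output `rho_direct`, and `probs` sums to one because the instrument elements add up to a TP map.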

Similarly, the action of a process tensor map \({{{{\mathcal{T}}}}}_{n:1}\) on a sequence of instrument elements \(\{{{{{\mathcal{O}}}}}_{1}^{({x}_{1})},\ldots ,{{{{\mathcal{O}}}}}_{n}^{({x}_{n})}\}\) can be expressed in terms of a multiplication of their Choi matrices and a trace as follows43

$${\mathbb{P}}({x}_{n},\ldots ,{x}_{1}| {{{{\mathcal{J}}}}}_{n},\ldots ,{{{{\mathcal{J}}}}}_{1}) ={{{\rm{tr}}}}\left[{({{\mathsf{O}}}_{n}^{({x}_{n})}\otimes \ldots \otimes {{\mathsf{O}}}_{1}^{({x}_{1})})}^{{{{\rm{T}}}}}{{{\Upsilon }}}_{n:1}\right],$$

where \({{{\Upsilon }}}_{n:1}\in {{{\mathcal{B}}}}({{{{\mathcal{H}}}}}_{{n}^{{{{i}}}}}\otimes {{{{\mathcal{H}}}}}_{n-{1}^{{\circ}}}\otimes \ldots \otimes {{{{\mathcal{H}}}}}_{{1}^{{{{i}}}}})\) is the (2n − 1)-partite Choi matrix of the process tensor map \({{{{\mathcal{T}}}}}_{n:1}\) and each \({{\mathsf{O}}}_{j}^{({x}_{j})}\) is the Choi matrix of \({{{{\mathcal{O}}}}}_{j}^{({x}_{j})}\). The Choi state of the process \({{{{\Upsilon }}}_{n:1}}\) plays the role of a quantum state over time, inasmuch as it encodes all observable probability distributions for all possible instrument sequences (just as a quantum state encodes all observable probability distributions for any choice of POVM). In the (one-time) spatial setting, \({{{\Upsilon }}}_{1}={{\mathbb{1}}}_{{{{o}}}}\otimes {\rho }_{{{\rm{i}}}\,}^{{{{\rm{T}}}}}\) and Eq. (16) reduces to Eq. (15). Note that we label the Hilbert spaces logically from the perspective of the experimenter (i.e., the experimenter receives a state from the process that is “input” into their instrument of choice, transforming it into an “output” state that is fed back into the process); hence, \(\mbox{i}\) denotes outputs of the process and \(\mbox{o}\) denotes inputs to the process. Whenever a process tensor acts on an instrument sequence, the degrees of freedom with the same labels (timestep and input/output) are contracted over.

The natural generalization \({{{{\Upsilon }}}_{n:1}}\) of the CJI applied to the multilinear map \({{{{\mathcal{T}}}}}_{n:1}\) is constructed by feeding one half of an (unnormalized) maximally entangled state into the dynamics at each time8. More precisely, begin with the system-environment dilated dynamics shown in Fig. 2 (green), and denote the initial system-environment state by ρ and the unitary maps describing the joint evolution between times tj−1 and tj by \({{{{\mathcal{U}}}}}_{j:j-1}\). Now consider n − 1 additional maximally entangled pairs, \({{{\Psi }}}_{{j}^{{\circ}}{j}^{{\circ}}}^{+}\), associated to auxiliary systems \({A}_{{j}^{{{\circ}}}}\simeq S\) and collectively described as \({{{\Psi }}}_{n-1}^{{+}}:={\bigotimes}_{j=1}^{n-1}{{{\Psi }}}_{{j}^{\circ}{j}^{\circ}}^{+}\). Letting the unitary maps between each timestep act on the environment and one half of the appropriate auxiliary systems, i.e., \({{{{\mathcal{U}}}}}_{j:j-1}:{{{\mathcal{B}}}}({{{{\mathcal{H}}}}}_{j-{1}^{\circ}}\otimes {{{{\mathcal{H}}}}}_{E})\to {{{\mathcal{B}}}}({{{{\mathcal{H}}}}}_{{j}^{i}}\otimes {{{{\mathcal{H}}}}}_{E})\), yields the Choi state of the process tensor

$${{{\Upsilon }}}_{n:1}=\ {{{{\rm{tr}}}}}_{E}[{{{{\mathcal{U}}}}}_{n:n-1}\ldots {{{{\mathcal{U}}}}}_{2:1}(\rho \otimes {{{\Psi }}}_{n-1}^{+})].$$

It is straightforward (albeit arduous) to verify the correctness of Eq. (16) via direct insertion of Eq. (17). Natural generalizations of complete positivity and trace preservation to multi-time processes translate respectively to \({{{{\Upsilon }}}_{n:1}}\) ≥ 0 and the following hierarchy of trace conditions

$${\rm{tr}}_{{j}^{{{i}}}}\left[{{{\Upsilon }}}_{j:1}\right]={{\mathbb{1}}}_{j-{1}^{{\circ}}}\otimes {{{\Upsilon }}}_{j-1:1},\quad \forall j.$$

Conversely, any operator satisfying the above represents some (causally-ordered) quantum dynamics inasmuch as it corresponds to a fixed underlying system–environment circuit11,34.
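The construction of Eq. (17) and the contraction rule of Eq. (16) can be checked numerically on a small example. The sketch below assumes n = 2 with a qubit system and a qubit environment, a randomly drawn joint unitary and initial state, and a hypothetical Kraus operator `K` and final POVM element `Pi`; none of these choices come from the text. It builds the tripartite Choi state on the spaces ordered as 2i, 1o, 1i, verifies positivity and the trace condition of Eq. (18), and compares the Choi-picture probability with a direct circuit simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2  # assumption: system and environment are both qubits

# Random joint unitary on S (x) E, obtained from a QR decomposition.
G = rng.normal(size=(d * d, d * d)) + 1j * rng.normal(size=(d * d, d * d))
U, _ = np.linalg.qr(G)

# Random (generally correlated) initial system-environment state.
A = rng.normal(size=(d * d, d * d)) + 1j * rng.normal(size=(d * d, d * d))
rho_se = A @ A.conj().T
rho_se /= np.trace(rho_se)

U4 = U.reshape(d, d, d, d)            # [s_out, e_out, s_in, e_in]
rho4 = rho_se.reshape(d, d, d, d)     # [s, e, s', e']
psi4 = np.einsum('oa,pb->oapb', np.eye(d), np.eye(d))  # Psi^+ on 1o (x) A

# Eq. (17): feed one half of Psi^+ into the unitary, then trace out E.
ups = np.einsum('xfae,secd,oapb,yfbd->xosypc', U4, rho4, psi4, U4.conj())
ups = ups.reshape(d**3, d**3)         # ordering: 2i (x) 1o (x) 1i

# Positivity and the trace condition of Eq. (18) for j = 2.
is_positive = np.linalg.eigvalsh((ups + ups.conj().T) / 2).min() >= -1e-12
rho_s = np.einsum('sece->sc', rho4)   # initial reduced system state
tr_2i = np.einsum('xmxn->mn', ups.reshape(d, d * d, d, d * d))
trace_condition = np.allclose(tr_2i, np.kron(np.eye(d), rho_s))

# Eq. (16): one CP map at t1 (Kraus operator K), one POVM element at t2.
K = np.array([[0.8, 0.0], [0.3, 0.5]], dtype=complex)   # hypothetical
Pi = 0.5 * np.array([[1, -1j], [1j, 1]])                 # hypothetical
v = K.reshape(-1)
O1 = np.outer(v, v.conj())            # Choi matrix of K on 1o (x) 1i
O2 = Pi.T                             # Choi matrix of the POVM element
p_choi = np.trace(np.kron(O2, O1).T @ ups)

KI = np.kron(K, np.eye(d))
p_direct = np.trace(np.kron(Pi, np.eye(d)) @ U @ KI @ rho_se
                    @ KI.conj().T @ U.conj().T)
```

In this sketch the Choi matrix of a POVM element is its transpose, since the associated map has a trivial output space; this is a convention choice made here so that Eq. (16) reproduces the Born rule.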

Eq. (16) constitutes a special case of how (higher-order) quantum maps act on each other: here, we are contracting all open slots of the process tensor with an operation associated to each time in order to yield a probability distribution. More generally, it is possible to consider applying instruments to only a subset of times, yielding a conditional process defined upon the remaining times, which describes the correct behaviour of the concatenated dynamics. In other words, it contains all of the information required to compute the correct probability distribution for any instruments applied to the remaining times. To compute such an object in the Choi representation, one uses the link product defined in ref. 36. Essentially, this amounts to restricting both the trace and the transposition in Eq. (16) to only the common Hilbert spaces associated to the relevant subset of times where the instrument is being applied.

For instance, grouping the times into history \(H = \{t_{1}, \, \ldots , t_k\}\), memory \(M = \{t_{k+1}, \, \ldots , t_{k + \ell}\}\) and future \(F = \{t_{k + \ell + 1}, \, \ldots, t_n\}\) and choosing an instrument \({{{{\mathcal{J}}}}}_{M}=\{{{\mathsf{O}}}_{M}^{({x}_{M})}\}\) on the memory block, the conditional future-history process that occurs given any particular event sequence \({{\mathsf{O}}}_{M}^{({x}_{M})}\) on M alone is

$${\widetilde{{{\Upsilon }}}}_{FH}^{({x}_{M})}={{{{\rm{tr}}}}}_{M}\left[{{\mathsf{O}}}_{M}^{({x}_{M}){{{\rm{T}}}}}{{{\Upsilon }}}_{FMH}\right].$$

Such a conditional process is generically correlated across F and H; however, if it is of tensor product form \({\widetilde{{{\Upsilon }}}}_{FH}^{({x}_{M})}={{{\Upsilon }}}_{F}^{({x}_{M})}\otimes {\widetilde{{{\Upsilon }}}}_{H}^{({x}_{M})}\) for each event xM of the instrument \({{{{\mathcal{J}}}}}_{M}\), the process has Markov order \(\ell =| M|\) with respect to said instrument12. (In general, each \({{\mathsf{O}}}_{M}^{({x}_{M})}\) may act on only a subspace of M, with the history and future retaining the rest, to yield \({\widetilde{{{\Upsilon }}}}_{F{M}_{F}{M}_{H}H}^{({x}_{M})}\), where MF and MH can depend on xM (see Supplementary Methods B); for brevity, we absorb these into F and H.)

Lastly, a Markovian process corresponds to one for which the process tensor has the specific tensor product structure of an uncorrelated sequence of CPTP maps \(\{{{{\Lambda }}}_{{j}^{{{{i}}}}:j-{1}^{{\circ}}}\}\) connecting adjacent timesteps, and an initial quantum state \({\rho }_{{1}^{{{\rm{i}}}}}\)10

$${{{\Upsilon }}}_{n:1}^{{{{\rm{Markov}}}}}={{{\Lambda }}}_{{n}^{{{{i}}}}:n-{1}^{{\circ}}}\otimes {{{\Lambda }}}_{n-{1}^{{{{i}}}}:n-{2}^{{\circ}}}\otimes \ldots \otimes {{{\Lambda }}}_{{2}^{{{{i}}}}:{1}^{{\circ}}}\otimes {\rho }_{{1}^{{{\rm{i}}}}}.$$
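To make the conditional process of Eq. (19) and the product structure of Eq. (20) concrete, the following sketch assumes a three-time Markovian process built from two copies of a qubit amplitude-damping channel and a hypothetical initial state. It contracts the memory slot (time t2) with two different instrument elements: a measure-and-prepare operation, which is a product across the input and output spaces of t2, and the identity map. The helper names, the channel, and all numerical values are illustrative assumptions.

```python
import numpy as np

def choi(kraus_ops):
    """Choi matrix sum_ij Lambda[|i><j|] (x) |i><j| (output (x) input)."""
    d = kraus_ops[0].shape[1]
    O = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            E = np.zeros((d, d), dtype=complex)
            E[i, j] = 1.0
            O += np.kron(sum(K @ E @ K.conj().T for K in kraus_ops), E)
    return O

def conditional_process(O_M, ups):
    """tr_M[O_M^T Upsilon] for qubit slots ordered 3i, 2o, 2i, 1o, 1i."""
    ups10 = ups.reshape((2,) * 10)
    O4 = O_M.reshape(2, 2, 2, 2)          # [2o, 2i, 2o', 2i']
    cond = np.einsum('acbe,facosgbept->fosgpt', O4, ups10)
    return cond.reshape(8, 8)             # F = 3i, H = 1o (x) 1i

def is_product(X, dF=2, dH=4):
    """Check X tr[X] == tr_H[X] (x) tr_F[X], i.e. X is a product on F:H."""
    X4 = X.reshape(dF, dH, dF, dH)
    XF = np.einsum('fhgh->fg', X4)
    XH = np.einsum('fhfk->hk', X4)
    return bool(np.allclose(X * np.trace(X), np.kron(XF, XH)))

gamma = 0.3  # hypothetical damping strength
K0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
K1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
Lam = choi([K0, K1])
rho1 = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)

# Eq. (20) with n = 3: Lambda_{3i:2o} (x) Lambda_{2i:1o} (x) rho_{1i}.
ups_markov = np.kron(np.kron(Lam, Lam), rho1)

sigma = np.array([[1, 0], [0, 0]], dtype=complex)   # re-prepared state
Pi = 0.5 * np.array([[1, 1], [1, 1]], dtype=complex)  # measured effect
O_break = np.kron(sigma, Pi.T)     # measure Pi on 2i, prepare sigma on 2o
O_ident = choi([np.eye(2)])        # Psi^+: leave the system untouched

break_is_product = is_product(conditional_process(O_break, ups_markov))
ident_is_product = is_product(conditional_process(O_ident, ups_markov))
```

Under these assumptions the measure-and-prepare element leaves future and history uncorrelated, whereas the identity map simply concatenates the two channels, so the conditional process remains correlated across F and H even though the underlying process is Markovian.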


To begin with, we introduce the following definition:

Definition 4

(Instrument relative entropy). For any family of instruments \({\mathbb{J}}\) and process tensors \({\Upsilon},{\Gamma}\),

$${S}_{{\mathbb{J}}}({{\Upsilon }}\parallel {{\Gamma }}):= \mathop{{{{\rm{sup}}}}}\limits_{{{{\mathcal{J}}}}\in {\mathbb{J}}}S\left({{{{\mathcal{P}}}}}_{{{{\mathcal{J}}}}}[{{\Upsilon }}]\parallel {{{{\mathcal{P}}}}}_{{{{\mathcal{J}}}}}[{{\Gamma }}]\right),$$

where \(S(A\parallel B):= {{{\rm{tr}}}}\left[A({{\mathrm{log}}}\,A-{{\mathrm{log}}}\,B)\right]\) is the quantum relative entropy and \({{{{\mathcal{P}}}}}_{{{{\mathcal{J}}}}}\) is a CP map from process tensors to classical pointer states, whose elements form a probability distribution over outcomes of the instrument \({{{\mathcal{J}}}}=\{{{\mathsf{O}}}^{(x)}\}\)

$${{{{\mathcal{P}}}}}_{{{{\mathcal{J}}}}}[{{\Upsilon }}]:= \mathop{\sum}\limits_{x}{{{\rm{tr}}}}\left[{{\mathsf{O}}}^{(x){{{\rm{T}}}}}{{\Upsilon }}\right]\left|x\right\rangle \left\langle x\right|.$$
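Because the pointer states of Eq. (22) are diagonal, the relative entropy inside the supremum of Definition 4 reduces to a classical relative entropy between outcome distributions. The sketch below illustrates this in a deliberately simplified single-time setting, where two full-rank qubit states stand in for the process tensors and the family of instruments consists of the three Pauli-basis projective measurements. These stand-ins, the function names, and the numerical values are assumptions made for illustration only.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def logm_pd(A):
    """Matrix logarithm of a positive-definite Hermitian matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.conj().T

def quantum_rel_ent(A, B):
    """S(A || B) = tr[A (log A - log B)], in nats."""
    return float(np.real(np.trace(A @ (logm_pd(A) - logm_pd(B)))))

def pointer_probs(elements, state):
    """Diagonal of P_J[state]: one probability tr[O^T state] per outcome."""
    return np.array([np.real(np.trace(O.T @ state)) for O in elements])

def kl(p, q):
    """Classical relative entropy between strictly positive distributions."""
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Two full-rank qubit states playing the role of Upsilon and Gamma.
ups = 0.5 * (I2 + 0.6 * Z + 0.3 * X)
gam = 0.5 * (I2 + 0.2 * Y - 0.4 * Z)

# Family of instruments: projective measurements along X, Y and Z.
family = [[(I2 + P) / 2, (I2 - P) / 2] for P in (X, Y, Z)]

per_instrument = [kl(pointer_probs(J, ups), pointer_probs(J, gam))
                  for J in family]
measured = max(per_instrument)            # supremum over the finite family
unmeasured = quantum_rel_ent(ups, gam)
```

In this toy setting `measured` cannot exceed `unmeasured`, since the relative entropy is non-increasing under the measurement map; enlarging the family can only increase the supremum.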

Proposition 5

For any \({{{\Upsilon }}}_{FMH},{{{{\mathcal{J}}}}}_{M}\) and \({\underline{{{\Lambda }}}}_{FMH}^{{{{{\mathcal{J}}}}}_{M}}\) as defined in Eq. (10),

$${{\Theta }}({{{{\mathcal{J}}}}}_{M})={S}_{{\mathbb{J}}\cap {{{\rm{span}}}}({{{{\mathcal{J}}}}}_{M})}\left({{{\Upsilon }}}_{FMH}\parallel {\underline{{{\Lambda }}}}_{FMH}^{{{{{\mathcal{J}}}}}_{M}}\right),$$

with \({{\Theta }}({{{{\mathcal{J}}}}}_{M})\) taken to be the measured CMI, and \({\mathbb{J}}\cap {{{\rm{span}}}}({{{{\mathcal{J}}}}}_{M})\) a family of instruments whose elements have support on M only in the linear span of the elements of \({{{{\mathcal{J}}}}}_{M}\).

Proof. Consider first the measured conditional probability distributions for a fixed memory instrument \({{{{\mathcal{J}}}}}_{M}\) and arbitrary (independent) \({{{{\mathcal{J}}}}}_{F},{{{{\mathcal{J}}}}}_{H}\) arising from the process tensor \({{{{\Upsilon }}}_{FMH}}\) and the recovered process \({\underline{{{\Lambda }}}}_{FMH}^{{{{{\mathcal{J}}}}}_{M}}\), which are respectively given by:

$${{\mathbb{P}}}_{{{{\Upsilon }}}_{FMH}}({x}_{F},{x}_{H}| {x}_{M})\ =\ {{{\rm{tr}}}}\left[{({{\mathsf{O}}}_{F}^{({x}_{F})}\otimes {{\mathsf{O}}}_{H}^{({x}_{H})})}^{{{{\rm{T}}}}}{\widetilde{{{\Upsilon }}}}_{FH}^{({x}_{M})}\right]$$


$${{\mathbb{P}}}_{{\underline{{{\Lambda }}}}_{FMH}^{{{{{\mathcal{J}}}}}_{M}}}({x}_{F},{x}_{H}| {x}_{M})\ =\ {{{\rm{tr}}}}\left[{({{\mathsf{O}}}_{F}^{({x}_{F})}\otimes {{\mathsf{O}}}_{H}^{({x}_{H})})}^{{{{\rm{T}}}}}{\widetilde{\underline{{{\Lambda }}}}}_{FH}^{({x}_{M})}\right]\ .$$

Here, we have defined \({\widetilde{\underline{{{\Lambda }}}}}_{FH}^{({x}_{M})}:= {{{{\rm{tr}}}}}_{M}\left[{{\mathsf{O}}}_{M}^{({x}_{M}){{{\rm{T}}}}}{\underline{{{\Lambda }}}}_{FMH}^{{{{{\mathcal{J}}}}}_{M}}\right]\), which, by construction, factorizes as

$$\begin{array}{ll}{{\mathbb{P}}}_{{\underline{{{\Lambda }}}}_{FMH}^{{{{{\mathcal{J}}}}}_{M}}}({x}_{F},{x}_{H}| {x}_{M})&={{{\rm{tr}}}}\left[{({{\mathsf{O}}}_{F}^{({x}_{F})}\otimes {{\mathsf{O}}}_{H}^{({x}_{H})})}^{{{{\rm{T}}}}}({{{\Upsilon }}}_{F}^{({x}_{M})}\otimes {\widetilde{{{\Upsilon }}}}_{H}^{({x}_{M})})\right]\\ &={{\mathbb{P}}}_{{{{\Upsilon }}}_{FMH}}({x}_{F}| {x}_{M}){{\mathbb{P}}}_{{{{\Upsilon }}}_{FMH}}({x}_{H}| {x}_{M}),\end{array}$$

since \({\widetilde{\underline{{{\Lambda }}}}}_{FH}^{({x}_{M})}={{{\Upsilon }}}_{F}^{({x}_{M})}\otimes {\widetilde{{{\Upsilon }}}}_{H}^{({x}_{M})}\).

With this, we can express the mutual information in the correlated distribution \(I{(F:H)}_{{{\mathbb{P}}}_{{{\Upsilon }}}({x}_{F},{x}_{H}| {x}_{M})}= I_{\mathbb{J}}(F:H)_{\Upsilon}\) [see Eq. (8)] in terms of the relative entropy between said distribution and the uncorrelated one arising from measurements on the recovered process, i.e.,

$$\begin{array}{ll}{I}_{{\mathbb{J}}}{(F:H)}_{{{\Upsilon }}}=S[{{\mathbb{P}}}_{{{{\Upsilon }}}_{FMH}}({x}_{F},{x}_{H}| {x}_{M})\parallel {{\mathbb{P}}}_{{\underline{{{\Lambda }}}}_{FMH}^{{{{{\mathcal{J}}}}}_{M}}}({x}_{F},{x}_{H}| {x}_{M})].\end{array}$$

Thus, beginning with the definition of Eq. (7), we have

$${{\Theta }}({{{{\mathcal{J}}}}}_{M})=\mathop{{{{\rm{sup}}}}}\limits_{{{{{\mathcal{J}}}}}_{F},{{{{\mathcal{J}}}}}_{H}\in {\mathbb{J}}}\left(\mathop{\sum}\limits_{{x}_{M}}p({x}_{M}| {{{{\mathcal{J}}}}}_{M}){I}_{{\mathbb{J}}}{(F:H| {x}_{M})}_{{{\Upsilon }}}\right)={S}_{{{\mathbb{J}}}_{FmH}}({{{\Upsilon }}}_{FmH}^{{{{{\mathcal{J}}}}}_{M}}\parallel {{{\Lambda }}}_{FmH}^{{{{{\mathcal{J}}}}}_{M}}).$$

Here \({{\mathbb{J}}}_{FmH}\) is the original set of uncorrelated instruments \({\mathbb{J}}\) on FH, combined with a POVM on the pointer space m, i.e., the supremum in the measured relative entropy is taken over \({{{\mathcal{J}}}}=\{{{\mathsf{O}}}_{FmH}^{(x)}\}\in {{\mathbb{J}}}_{FmH}\) of the form

$${{\mathsf{O}}}_{FmH}^{(x)}=\mathop{\sum}\limits_{{x}_{M}}{{\mathsf{E}}}_{FH}^{(x,{x}_{M})}\otimes \left|{x}_{M}\right\rangle {\left\langle {x}_{M}\right|}_{m},$$

where \({{\mathsf{E}}}_{FH}^{(x,{x}_{M})}\) can be any operator, \({{\mathsf{O}}}_{FmH}^{{{{\mathcal{J}}}}}={\sum }_{x}{{\mathsf{O}}}_{FmH}^{(x)}\) satisfies the relevant trace conditions on the FH part and \({{{{\rm{tr}}}}}_{FH}\left[{{\mathsf{O}}}_{FmH}^{{{{\mathcal{J}}}}}\right]={D}_{FH}^{{\mathtt{i}}}{{\mathbb{1}}}_{m}\). Since, for \({{{{\mathcal{J}}}}}_{M}=\{{{\mathsf{O}}}_{M}^{({x}_{M})}\}\),

$${{{{\rm{tr}}}}}_{m}\left[\left|{x}_{M}\right\rangle {\left\langle {x}_{M}\right|}_{m}{{{\Gamma }}}_{FmH}^{{{{{\mathcal{J}}}}}_{M}}\right]={{{{\rm{tr}}}}}_{M}\left[{{\mathsf{O}}}_{M}^{({x}_{M})}\mathop{\sum}\limits_{{y}_{M}}{{{\Gamma }}}_{FH}^{({y}_{M})}\otimes {{\mathsf{D}}}_{M}^{({y}_{M})}\right],$$

with \({{\Gamma }}\in \{{{\Upsilon }},{{\Lambda }}\}\) and \(\{{{\mathsf{D}}}_{M}^{({x}_{M})}\}\) the dual set to \({{{{\mathcal{J}}}}}_{M}\), we have

$${S}_{{{\mathbb{J}}}_{FmH}}({{{\Upsilon }}}_{FmH}^{{{{{\mathcal{J}}}}}_{M}}\parallel {{{\Lambda }}}_{FmH}^{{{{{\mathcal{J}}}}}_{M}})={S}_{{\mathbb{J}}\cap {{{\rm{span}}}}({{{{\mathcal{J}}}}}_{M})}({{{\Upsilon }}}_{FMH}\parallel {\underline{{{\Lambda }}}}_{FMH}^{{{{{\mathcal{J}}}}}_{M}}),$$

with \({\underline{{{\Lambda }}}}_{FMH}^{{{{{\mathcal{J}}}}}_{M}}={\sum }_{{x}_{M}}{{{\Upsilon }}}_{F}^{({x}_{M})}\otimes {{\mathsf{D}}}_{M}^{({x}_{M})}\otimes {\widetilde{{{\Upsilon }}}}_{H}^{({x}_{M})}\). This proves the claim. □
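The proof above relies on the standard identity that the mutual information of a joint distribution equals its relative entropy to the product of its marginals. A minimal numerical check, using a hypothetical joint distribution over two binary outcomes x_F and x_H (for a fixed memory outcome), could read:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability entries."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def kl(p, q):
    """Classical relative entropy S(p || q) in nats."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    keep = p > 0
    return float(np.sum(p[keep] * (np.log(p[keep]) - np.log(q[keep]))))

# Hypothetical joint distribution: rows index x_F, columns index x_H.
joint = np.array([[0.30, 0.10],
                  [0.05, 0.55]])
p_F = joint.sum(axis=1)
p_H = joint.sum(axis=0)
product = np.outer(p_F, p_H)   # uncorrelated distribution, same marginals

mi_from_entropies = entropy(p_F) + entropy(p_H) - entropy(joint)
mi_from_rel_ent = kl(joint, product)
mi_of_product = kl(product, product)
```

The two expressions for the mutual information agree, and the quantity vanishes when the joint distribution is already of product form, which is the situation realized by the recovered process.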

We are now in a position to prove our main results.

Proof of Thm. 1

First, we note that, for any set of instruments \({\mathbb{J}}\) and process tensors \(\Upsilon\) and \(\Gamma\),

$${S}_{{\mathbb{J}}}({{\Upsilon }}\parallel {{\Gamma }})=\mathop{{{{\rm{sup}}}}}\limits_{{{{\mathcal{J}}}}\in {\mathbb{J}}}\mathop{\sum}\limits_{x}{p}_{x}^{{{{\mathcal{J}}}}}({{\mathrm{log}}}\,{p}_{x}^{{{{\mathcal{J}}}}}-{{\mathrm{log}}}\,{q}_{x}^{{{{\mathcal{J}}}}}),$$

with \({p}_{x}^{{{{\mathcal{J}}}}}:= {{{\rm{tr}}}}\left[{{\mathsf{O}}}^{(x){{{\rm{T}}}}}{{\Upsilon }}\right]\) and \({q}_{x}^{{{{\mathcal{J}}}}}:= {{{\rm{tr}}}}\left[{{\mathsf{O}}}^{(x){{{\rm{T}}}}}{{\Gamma }}\right]\) the probabilities associated with the instrument \({{{\mathcal{J}}}}=\{{{\mathsf{O}}}^{(x)}\}\). We can then use Pinsker’s inequality to write

$$\begin{array}{ll}{S}_{{\mathbb{J}}}({{\Upsilon }}\parallel {{\Gamma }})&\ge \frac{1}{2}\mathop{{{{\rm{sup}}}}}\limits_{{{{\mathcal{J}}}}\in {\mathbb{J}}}{\left({\sum }_{x}| {p}_{x}^{{{{\mathcal{J}}}}}-{q}_{x}^{{{{\mathcal{J}}}}}| \right)}^{2}\\ &=\frac{1}{2}{\left(\mathop{{{{\rm{sup}}}}}\limits_{{{{\mathcal{J}}}}\in {\mathbb{J}}}{\sum }_{x}\left|{{{\rm{tr}}}}\left[{{\mathsf{O}}}^{(x){{{\rm{T}}}}}({{\Upsilon }}-{{\Gamma }})\right]\right|\right)}^{2}.\end{array}$$

Any multi-time operator C can be decomposed in terms of the elements of a single instrument \({{{\mathcal{J}}}}=\{{{\mathsf{O}}}^{(x)}\}\in {\mathbb{J}}\), as long as those elements span a sufficiently large space. That is, \(C={\sum }_{x}{c}_{x}^{{{{\mathcal{J}}}}}{{\mathsf{O}}}^{(x)}\) with \({c}_{x}^{{{{\mathcal{J}}}}}\in {\mathbb{C}}\); in general, the norm \(| C{| }_{{{{\mathcal{J}}}}}:= \sqrt{{\sum }_{x}| {c}_{x}^{{{{\mathcal{J}}}}}{| }^{2}}\) will vary with the instrument involved in the decomposition. We therefore have, using the Cauchy–Schwarz inequality

$$\begin{array}{ll}\left|{{{\rm{tr}}}}\left[C{{\Xi }}\right]\right|&\le \mathop{\mathrm{inf}}\limits_{{{{\mathcal{J}}}}\in {\mathbb{J}}}\left(\sqrt{{\sum }_{x}| {c}_{x}^{{{{\mathcal{J}}}}}{| }^{2}}\sqrt{{\sum }_{x}{\left|{{{\rm{tr}}}}\left[{{\mathsf{O}}}^{(x){{{\rm{T}}}}}{{\Xi }}\right]\right|}^{2}}\right)\\ &\le \mathop{\mathrm{inf}}\limits_{{{{\mathcal{J}}}}\in{\mathbb{J}}}\left(| C{| }_{{{{\mathcal{J}}}}}{\sum }_{x}\left|{{{\rm{tr}}}}\left[{{\mathsf{O}}}^{(x){{{\rm{T}}}}}{{\Xi }}\right]\right|\right)\\ \qquad&\le \mathop{\mathrm{inf}}\limits_{{{{\mathcal{J}}}}\in {\mathbb{J}}}\left(| C{| }_{{{{\mathcal{J}}}}}\right)\mathop{{{{\rm{sup}}}}}\limits_{{{{\mathcal{J}}}}\in {\mathbb{J}}}\left({\sum }_{x}\left|{{{\rm{tr}}}}\left[{{\mathsf{O}}}^{(x){{{\rm{T}}}}}{{\Xi }}\right]\right|\right),\end{array}$$

for any operator Ξ. Choosing \({{\Xi }}={{{\Upsilon }}}_{FMH}-{\underline{{{\Lambda }}}}_{FMH}^{{{{{\mathcal{J}}}}}_{M}}\) and restricting the set \({\mathbb{J}}\) to \({\mathbb{J}}\cap {{{\rm{span}}}}({{{{\mathcal{J}}}}}_{M})\), such that \({{{\rm{tr}}}}\left[C{{\Xi }}\right]={\langle C\rangle }_{{{{\Upsilon }}}_{FMH}}-{\langle C\rangle }_{{\underline{{{\Lambda }}}}_{FMH}^{{{{{\mathcal{J}}}}}_{M}}}\), and combining Eqs. (31) and (32) leads to the bound

$$\left|{\langle C\rangle }_{{{{\Upsilon }}}_{FMH}}-{\langle C\rangle }_{{\underline{{{\Lambda }}}}_{FMH}^{{{{{\mathcal{J}}}}}_{M}}}\right| \le \,| {{{\bf{C}}}}| \sqrt{2{S}_{{\mathbb{J}}\cap {{{\rm{span}}}}({{{{\mathcal{J}}}}}_{M})}({{{\Upsilon }}}_{FMH}\parallel {\underline{{{\Lambda }}}}_{FMH}^{{{{{\mathcal{J}}}}}_{M}})},$$

where \(| {{{\bf{C}}}}| =\mathop{\min }\limits_{{{{\mathcal{J}}}}\in {\mathbb{J}}\cap {{{\rm{span}}}}({{{{\mathcal{J}}}}}_{M})}| C{| }_{{{{\mathcal{J}}}}}\) (equivalent to the definition given in Thm. 1). Invoking Prop. 5 proves Thm. 1. □
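The chain of inequalities used in this proof can be sanity-checked numerically at the level of outcome distributions. In the sketch below, randomly drawn distributions `p` and `q` play the role of the measured statistics of the true and recovered processes for one fixed instrument, and a random real vector `c` plays the role of the expansion coefficients of C; these are illustrative assumptions, and the natural logarithm is assumed throughout.

```python
import numpy as np

rng = np.random.default_rng(1)

def kl(p, q):
    """Classical relative entropy in nats (natural logarithm)."""
    return float(np.sum(p * (np.log(p) - np.log(q))))

pinsker_ok, cauchy_ok, bound_ok = True, True, True
for _ in range(1000):
    p = rng.dirichlet(np.ones(6))      # statistics from the true process
    q = rng.dirichlet(np.ones(6))      # statistics from the recovered one
    c = rng.normal(size=6)             # coefficients of C in the instrument
    l1 = np.abs(p - q).sum()
    lhs = abs(c @ (p - q))             # difference of expectation values
    s = kl(p, q)
    # Pinsker: S(p || q) >= (1/2) * ||p - q||_1^2.
    pinsker_ok = pinsker_ok and bool(s >= 0.5 * l1**2 - 1e-12)
    # Cauchy-Schwarz followed by ||v||_2 <= ||v||_1.
    cauchy_ok = cauchy_ok and bool(lhs <= np.linalg.norm(c) * l1 + 1e-12)
    # Combined bound: |<C>_p - <C>_q| <= |C| * sqrt(2 S).
    bound_ok = bound_ok and bool(
        lhs <= np.linalg.norm(c) * np.sqrt(2 * s) + 1e-12)
```

All three flags remain true over the sampled instances, as expected from the inequalities; the check does not probe the supremum over instruments, which only tightens the relative-entropy side.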

Cor. 2 follows directly as shown below.

Proof of Cor. 2

Choose \(C={{\mathsf{P}}}_{j}\otimes {{{{\Psi }}}^{+}}^{\otimes j-k-2}\otimes {{\mathsf{O}}}_{M}^{({x}_{M})}\otimes {{{{\Psi }}}^{+}}^{\otimes k-\ell -2}\), with \({{{\Psi }}}^{+}:= {\sum }_{\alpha \beta }\left|\alpha \alpha \right\rangle \left\langle \beta \beta \right|\) the Choi state of the identity map and Pj a projector. Then \(| {{{\bf{C}}}}| =1\), since C is an element of the instrument in which the system is left to evolve freely on H, \({{{{\mathcal{J}}}}}_{M}\) is applied, the system again evolves freely to time tj, and then the POVM {Pj, 1 − Pj} is applied. It follows that \({\langle C\rangle }_{{{\Upsilon }}}={{{\rm{tr}}}}\left[{{\mathsf{P}}}_{j}{\rho }_{j}^{({x}_{M})}\right]\), with \({\rho }_{j}^{({x}_{M})}:= {{{{\rm{tr}}}}}_{FMH\backslash j}\left[\left({{{\Psi }}}_{H}^{+}\otimes {{\mathsf{O}}}_{M}^{({x}_{M})}\otimes {{{\Psi }}}_{F\backslash j}^{+}\right){{{\Upsilon }}}_{FMH}\right]\) the state, at time tj, of the system undergoing the process specified by \({{{{\Upsilon }}}_{FMH}}\), acted on by \({{{{\mathcal{J}}}}}_{M}\) with outcome xM occurring, and with no other active interventions. Similarly, the predicted state is \({\rho }_{j}^{{\prime} ({x}_{M})}:= {{{{\rm{tr}}}}}_{FMH\backslash j}\left[\left({{{\Psi }}}_{H}^{+}\otimes {{\mathsf{O}}}_{M}^{({x}_{M})}\otimes {{{\Psi }}}_{F\backslash j}^{+}\right){\underline{{{\Lambda }}}}_{FMH}^{{{{{\mathcal{J}}}}}_{M}}\right]\). Here, \({{{\Psi }}}_{X}^{+}\) is shorthand for a sequence of identity maps applied at all times in the block X, and j corresponds to a subspace of the future Hilbert space associated to time tj. The l.h.s. of Eq. (11) of the main text then reduces to \(| {{{\rm{tr}}}}[{{\mathsf{P}}}_{j}({\rho }_{j}^{({x}_{M})}-{\rho ^{\prime} }_{j}^{({x}_{M})})]|\); since the bound must hold for any Pj, it must hold for the one for which the l.h.s. is largest; i.e., \({\sup }_{{{\mathsf{P}}}_{j}}| {{{\rm{tr}}}}[{{\mathsf{P}}}_{j}({\rho }_{j}^{({x}_{M})}-{\rho ^{\prime} }_{j}^{({x}_{M})})]| =\parallel {\rho }_{j}^{({x}_{M})}-{\rho ^{\prime} }_{j}^{({x}_{M})}{\parallel }_{1}\) is bounded. □

For informationally complete instruments, a combination of the results derived above leads to Thm. 3.

Proof of Thm. 3

When \({{{{\mathcal{J}}}}}_{M}\) is informationally complete, \({\underline{{{\Lambda }}}}_{FMH}^{{{{{\mathcal{J}}}}}_{M}}\) is a full process tensor and any instrument can be applied to it, since \({\mathbb{J}}\cap {{{\rm{span}}}}({{{{\mathcal{J}}}}}_{M})={\mathbb{J}}\) by definition. Therefore, we can use Eq. (31), along with Prop. 5, to write:

$${\left(\mathop{{{{\rm{sup}}}}}\limits_{{{{\mathcal{J}}}}\in {\mathbb{J}}}{\sum }_{x}\left|{{{\rm{tr}}}}\left[{{\mathsf{O}}}^{(x){{{\rm{T}}}}}({{{\Upsilon }}}_{FMH}-{\underline{{{\Lambda }}}}_{FMH}^{{{{{\mathcal{J}}}}}_{M}})\right]\right|\right)}^{2}\le 2{{\Theta }}({{{{\mathcal{J}}}}}_{M}).$$

The square root of the l.h.s. of this equation is the generalized diamond distance \({\left\Vert {{{\Upsilon }}}_{FMH}-{\underline{{{\Lambda }}}}_{FMH}^{{{{{\mathcal{J}}}}}_{M}}\right\Vert }_{\Diamond}\) with X defined in Thm. 3. Equation (13), and hence Thm. 3, follows. □