Non-Markovian memory strength bounds quantum process recoverability

Generic non-Markovian quantum processes have infinitely long memory, implying an exact description that grows exponentially in complexity with observation time. Here, we present a finite memory ansatz that approximates (or recovers) the true process with errors bounded by the strength of the non-Markovian memory. The introduced memory strength is an operational quantity and depends on the way the process is probed. Remarkably, the recovery error is bounded by the smallest memory strength over all possible probing methods. This allows for an unambiguous and efficient description of non-Markovian phenomena, enabling compression and recovery techniques pivotal to near-term technologies. We highlight the implications of our results by analyzing an exactly solvable model to show that memory truncation is possible even in a highly non-Markovian regime.


INTRODUCTION
Our ability to manipulate quantum systems underpins potential advantages over classical technologies [1]. Their dynamics are often idealized as noiseless or, if unavoidable, noise is assumed to be uncorrelated. However, interactions with the environment generally perpetuate past information about the system to the future, thereby serving as a memory. Memory effects thus pervade physical evolutions, resulting in non-Markovian dynamics [2]. The complexity of describing processes grows exponentially with memory length; therefore many simulation techniques invoke memory cutoffs [3]. Although several metrics have been proposed to quantify memory (and the consequences of neglecting it) [4], most do not consider the influence of interventions, overlooking the operational reality of sequentially probed dynamics. Indeed, the impact of memory depends on how a system is controlled [3–14]: generically, the system-environment state at any time is correlated, so an interrogation directly influences the system state and conditions the environment, both of which affect the future. Detected memory properties are thus naturally related to the interrogation method [14]. This operational perspective has important consequences for dynamical decoupling [15–17], erasure or transmission of information [18,19], correlated error correction and characterization [20,21], and (operational) quantum thermodynamics [22–27].
This can be illustrated with the shallow pocket model [16,17,28,29], comprising a qubit coupled to a continuous degree of freedom (see Fig. 1 and Appendix A). The joint dynamics induces pure-dephasing Lindblad evolution for the qubit, with exponentially decaying coherences. Non-classical correlations between any preparation and measurement similarly vanish, so the reduced dynamics forgets the initial state. However, the evolution following a $\sigma_x$ unitary reverts the system to its original state; in this sense, the process displays infinitely long memory. This example highlights that, although certain temporal correlations of the unperturbed system may decay rapidly, they do not account for the whole story; more generally, there exist detectable correlations between the history and future processes.

FIG. 1. Memory in shallow pocket dynamics. The mutual information $I(S:A)$ between a system and ancilla initially in a Bell pair decays exponentially as the system undergoes shallow pocket evolution (black). However, this is not the case if an intervention is applied at $t_1$ (= 5 above). We depict $\sigma_x$ (blue); an offset rotation $\sqrt{0.95}\,\sigma_x + \sqrt{0.05}\,\sigma_z$ (green); measurement of $+$ in the $x$-basis (red); and trash-and-reprepare (purple).
Disentangling the non-Markovian memory, carried by the environment, from the temporal correlations that result from probing the system is a challenge that has only recently been resolved through an operational framework for describing quantum stochastic processes [10,11]. Subsequently, a notion of memory length or quantum Markov order was introduced, which reduces the complexity of describing processes with short-term memory by only retaining the minimal number of recent timesteps relevant to the dynamics [12,13]. Higher-order Markov models capture all memory effects below the Markov order and are therefore more accurate than Markovian approximations. However, a missing element from this theory so far has been a method for quantifying the strength of non-Markovian memory and developing efficient approximations accordingly: truncating weak temporal correlations will yield a more efficient description for a process without compromising the accuracy.
In this article, we construct an operational notion of memory strength for quantum processes. That is, we quantify the correlations between history and future processes with respect to an intermediate (multi-time) probing schema (see Fig. 2). Our main result links said memory strength with process recoverability: with respect to any interrogation sequence, one can approximate the process by discarding future-history correlations. If the memory strength is small for some instrument (family), then the approximate process accurately predicts expectation values for related observables, namely those in the linear span of the original instrument elements. This connection is akin to that between conditional mutual information and fidelity of recovery for quantum states [30,31], and involves a generalization of the measured relative entropy [32] to quantum stochastic processes. A corollary follows for the 'do nothing' instrument (which many approximations assume), which bounds the accuracy of predicting future states. Moreover, the memory strength for an informationally-complete instrument bounds the distinguishability between the actual and recovered process for any experimental protocol. Lastly, we demonstrate our results via a solvable non-Markovian model, highlighting the complex memory structures amenable to our framework.

BACKGROUND

A. Classical Stochastic Processes
We begin by reviewing the pertinent ingredients from the theory of classical stochastic processes before turning our attention to quantum stochastic processes. A classical stochastic process over a discrete set of times $T := \{t_1, \ldots, t_n\}$ is described by an $n$-point joint probability distribution $P(x_n, \ldots, x_1)$. A process has finite-length memory whenever the probability of each event $x_k$ at time $t_k \in T$ only conditionally depends upon the past $\ell$ events:

$$P(x_k \mid x_{k-1}, \ldots, x_1) = P(x_k \mid x_{k-1}, \ldots, x_{k-\ell}). \tag{1}$$

Here $\ell$, the minimum number for which Eq. (1) holds, denotes the Markov order; a Markovian process has $\ell \leq 1$. Markov order captures the complexity of characterizing a process, which grows exponentially in $\ell$. Although $\ell$ may be large for many processes, their memory can be truncated, permitting efficient approximation. Grouping the times into three segments: the history $H = \{t_1, \ldots, t_{k-\ell-1}\}$, memory $M = \{t_{k-\ell}, \ldots, t_{k-1}\}$ and future $F = \{t_k, \ldots, t_n\}$ (see Fig. 2), Markov order $\ell$ implies the conditional factorization

$$P(x_F, x_H \mid x_M) = P(x_F \mid x_M)\, P(x_H \mid x_M), \tag{2}$$

i.e., the future and history are conditionally independent given memory events. This is equivalently expressed by the vanishing classical conditional mutual information (CMI),

$$I(F:H|M) := H(F|M) + H(H|M) - H(FH|M), \tag{3}$$

where $H(X|Y) := H(XY) - H(Y)$ is the conditional Shannon entropy, with $H(X) := -\sum_x P(x) \log P(x)$. The CMI is interpreted as the memory strength of the process. For processes with Markov order $\ell$, $I(F:H|M) = 0$ by Eq. (2); however, in general $P_{FMH}$ does not conditionally factorize, and the CMI quantifies the correlations between $F$ and $H$, given $M$. The significance of approximately-finite Markov order is best encapsulated through the recovery map, $\mathcal{R}_{M \to FM}$, that acts only on $M$ to approximate the correct future statistics:

$$P_{FMH} \approx \mathcal{R}_{M \to FM}(P_{MH}) = P_{F|M}\, P_{MH},$$

with equality holding for processes with Markov order $\leq \ell$. Intuitively, the recovery map discards conditional future-history correlations and uses the r.h.s. of Eq. (2) as an approximate description. Importantly, the recovery map approximates the process with an error that is bounded by the CMI [30,31], with complexity reduced to the approximate Markov order. Thus, whenever the memory is weak, the recovery map provides an accurate and efficient approximation.

FIG. 2. Quantum stochastic process. An open quantum process with an initial system-environment state and unitary evolutions (green) can be represented as a process tensor $\Upsilon_{FMH}$ (outline).
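The classical statements above can be checked numerically. The following is a minimal sketch, assuming a joint distribution stored as an array `p[f, m, h]` with one variable per block; the helper names `cmi` and `recover` are illustrative and not taken from the article. It uses the standard identity that the relative entropy between $P_{FMH}$ and the recovered distribution $P_{F|M} P_{MH}$ equals the CMI, so Pinsker's inequality bounds the recovery error.

```python
import numpy as np

def cmi(p_fmh):
    """Conditional mutual information I(F:H|M) in nats for an array p[f, m, h]."""
    p_m = p_fmh.sum(axis=(0, 2))   # P(m)
    p_fm = p_fmh.sum(axis=2)       # P(f, m)
    p_mh = p_fmh.sum(axis=0)       # P(m, h)
    total = 0.0
    for f, m, h in np.ndindex(p_fmh.shape):
        p = p_fmh[f, m, h]
        if p > 0:
            total += p * np.log(p * p_m[m] / (p_fm[f, m] * p_mh[m, h]))
    return total

def recover(p_fmh):
    """Recovered distribution P(f|m) P(m, h): F-H correlations given M are discarded."""
    p_m = p_fmh.sum(axis=(0, 2))
    p_fm = p_fmh.sum(axis=2)
    p_mh = p_fmh.sum(axis=0)
    p_f_given_m = np.divide(p_fm, p_m, out=np.zeros_like(p_fm), where=p_m > 0)
    return p_f_given_m[:, :, None] * p_mh[None, :, :]

rng = np.random.default_rng(0)

# A Markov chain h -> m -> f has Markov order one with respect to the block M.
p_h = rng.dirichlet(np.ones(2))
p_m_given_h = rng.dirichlet(np.ones(3), size=2)   # indexed [h, m]
p_f_given_m = rng.dirichlet(np.ones(2), size=3)   # indexed [m, f]
p_markov = np.einsum('mf,hm,h->fmh', p_f_given_m, p_m_given_h, p_h)

# A generic distribution has F-H correlations that M does not screen off.
p_generic = rng.dirichlet(np.ones(12)).reshape(2, 3, 2)

l1_error = np.abs(p_generic - recover(p_generic)).sum()
pinsker_bound = np.sqrt(2 * cmi(p_generic))
```

For the Markov chain the CMI vanishes and the recovery is exact; for the generic distribution the $\ell_1$ error stays below $\sqrt{2\, I(F:H|M)}$.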

B. Quantum Stochastic Processes
We now move to the realm of quantum stochastic processes. To quantify the strength of memory in quantum processes, we begin by introducing the process tensor framework, detailed in Methods, which generalizes Eq. (1) to the quantum domain. Consider a joint system-environment $SE$, with (correlated) initial state $\rho^{SE}_1$. The system is interrogated at $t_1$ and a quantum event $x_1$ is observed, with an associated completely-positive (CP) transformation $\mathcal{O}_1^{(x_1)}$ on $S$. The events are described by an instrument $J_1 = \{\mathcal{O}_1^{(x_1)}\}$, whose elements sum to a map that is CPTP [33]. Following this intervention, $SE$ evolves unitarily for time $t_2 - t_1$ according to the superoperator $\mathcal{U}^{SE}_{2:1}$. Then, $S$ is probed at $t_2$, and so on, until $t_n$. The probability to observe $x_1, \ldots, x_n =: x_{n:1}$ using instruments $J_1, \ldots, J_n =: J_{n:1}$ is

$$P(x_{n:1} \mid J_{n:1}) = \mathrm{tr}\!\left[\mathcal{O}_n^{(x_n)}\, \mathcal{U}^{SE}_{n:n-1} \cdots \mathcal{U}^{SE}_{2:1}\, \mathcal{O}_1^{(x_1)}\, \rho^{SE}_1\right]. \tag{4}$$
As per Fig. 2, abstracting everything outside of experimental control defines the process itself and yields a multi-linear map from instruments to probability distributions called the process tensor [10,11]. On the other hand, the instruments on $S$ are collected to define the generalized instrument $J_{n:1} = \{\mathcal{O}_{n:1}^{(x_{n:1})}\}$. A generalized instrument can be temporally correlated, although we focus on uncorrelated instrument sequences for simplicity. Intuitively, the process tensor encapsulates the uncontrollable effect of the environment, i.e., the process per se, whereas the generalized instrument represents the controllable influence. Any open quantum dynamics probed at a (causally-ordered) number of times can be described by a process tensor, and any probing sequence by a generalized instrument [8,34]. (These objects are commonly known as quantum combs and have appeared elsewhere [18,29,33–41].) Both the process tensor and the instrument elements are higher-order quantum maps [8,34,36,42] that can respectively be represented as quantum states $\Upsilon_{n:1}$ and $\{O_{n:1}^{(x_{n:1})}\}$ via the Choi-Jamiołkowski isomorphism. The joint probability of realizing any sequence of events is given by the generalized Born-rule [43] (see Methods):

$$P(x_{n:1} \mid J_{n:1}) = \mathrm{tr}\!\left[O_{n:1}^{(x_{n:1})\,T}\, \Upsilon_{n:1}\right]. \tag{5}$$

The process tensor encodes all probabilities for any choice of instruments and thus characterizes the process. Grouping the times as before, the quantum generalization of the l.h.s. of Eq. (2) is obtained by projecting the memory of $\Upsilon_{FMH}$ onto the conditioning element of $J_M = \{O_M^{(x_M)}\}$. The result is the conditional future-history process (see Fig. 2)

$$\tilde{\Upsilon}_{FH}^{(x_M)} := \mathrm{tr}_M\!\left[O_M^{(x_M)\,T}\, \Upsilon_{FMH}\right]. \tag{6}$$

The tilde in Eq. (6) signifies that the conditional objects are not necessarily proper process tensors, since realizing memory events post-selects the history [35,44]; nonetheless, summing these yields a proper process tensor on $FH$.
Mirroring the classical setting, if the conditional processes are uncorrelated, i.e., $\tilde{\Upsilon}_{FH}^{(x_M)} = \Upsilon_F^{(x_M)} \otimes \tilde{\Upsilon}_H^{(x_M)}$ for each $x_M$, the process has Markov order $\ell$ with respect to $J_M$ [12] [see r.h.s. of Eq. (2)]. (Note that the causality constraint on the process tensor ensures that $\Upsilon_F^{(x_M)}$ is causally ordered for each event.) This operational notion of memory strength means that by applying $J_M$, no future-history correlations are possible for any history and future instruments, i.e., no memory lasting longer than $\ell$ is detectable. In general, the conditional processes are correlated. However, long memory does not imply strong memory; we now show how weak correlations can be truncated for accurate and efficient approximation.
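The sequential probing described in this section (events on $S$ interleaved with joint unitaries on $SE$) can be sketched numerically. This is an illustrative toy, not the article's model: it assumes a single environment qubit, Haar-like random unitaries, computational-basis projective measurements as instruments, and a random pure initial $SE$ state. All names are hypothetical.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

def random_unitary(dim):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

# Projective measurement of the system qubit S; outcome x has Kraus operator |x><x|.
projectors = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
identity_e = np.eye(2)

def apply_event(kraus_s, rho_se):
    """Apply the CP map with Kraus operator kraus_s on S and the identity on E."""
    k = np.kron(kraus_s, identity_e)
    return k @ rho_se @ k.conj().T

def joint_probability(xs, unitaries, rho_se):
    """P(x_n, ..., x_1): events on S interleaved with joint SE unitaries."""
    state = apply_event(projectors[xs[0]], rho_se)
    for x, u in zip(xs[1:], unitaries):
        state = u @ state @ u.conj().T
        state = apply_event(projectors[x], state)
    return np.trace(state).real

# Correlated initial SE state (random pure state on two qubits).
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)
rho_initial = np.outer(psi, psi.conj())

unitaries = [random_unitary(4), random_unitary(4)]   # evolutions t1 -> t2 -> t3
probs = {xs: joint_probability(xs, unitaries, rho_initial)
         for xs in itertools.product(range(2), repeat=3)}
```

Because each instrument sums to a trace-preserving map, the eight joint probabilities in `probs` are non-negative and sum to one.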

RESULTS
The first ingredient to developing finite Markov order approximations is to quantify the memory strength over a block of length $\ell$, which we provide in Eq. (7). We then construct an approximate process which neglects long-term memory. We first use the knowledge from the interrogation to (partially) tomographically reconstruct the process on the memory block, ensuring that the approximate process acts correctly on all instruments lying in the span of the original one. This reconstruction typically exhibits future-history correlations, which are subsequently discarded to yield the approximate process of Eq. (10). We then bound the error of expectation values computed with the approximate process in terms of the memory strength, thereby endowing it with operational meaning. By iterating our procedure over translations of the memory block, significant savings in the complexity of description are possible [45,46].

A. Memory Strength and Recoverability
Given a particular realization $x_M$ of instrument $J_M$, the future and history can be more or less correlated. Such correlations can be detected by applying any choice of instruments $J_F$, $J_H$ on the conditional future-history process. Taking the supremum over such instruments provides an operational definition of memory strength, quantifying the largest detectable conditional future-history correlations. We thus define where p(x M |J M ) := tr Υ is the probability of observing $x_M$ and the supremum is taken over uncorrelated instruments } for $Y \in \{F, H\}$ belonging to a set $\mathsf{J}$.
We have defined the measured conditional mutual information , where (8) is the conditional probability distribution for any process $\Gamma$ and (independent) future-history instruments conditioned on a fixed memory event. Intuitively, the memory strength in Eq. (7) captures the largest detectable future-history correlation, conditioned on the outcomes recorded on the memory block, aggregated to the level of $J_M$ by averaging over $x_M$.
Unlike classical memory, quantum memory effects depend upon the choice of probing scheme [12,13], suggesting that a universal quantum recovery map may not exist. However, we now construct a quantum process recovery map that efficiently builds up longer processes from shorter ones when the process is stationary. We begin with an ansatz process with finite quantum Markov order with respect to $J_M = \{O^{(x_M)}\}$: where the $\{D^{(x_M)}\}$ form the dual set to $J_M$ [7,8]. This is the tomographic representation of the process via linear inversion of the instrument outcomes, with conditional $FH$ correlations discarded. The recovered process can exhibit correlations between $H$ and $M$, $M$ and $F$ (as well as within each block), but not between $H$ and $F$. By construction, $\Lambda^{J_M}_{FMH}$ is positive on its domain, which is the span of $J_M$. When $J_M$ is not informationally-complete, i.e., does not span the full space, then $\Lambda^{J_M}_{FMH}$ only approximates the original process in said subspace, which we denote by the underline. Such processes are called restricted process tensors [47] and are commonly encountered in experiments [45,46,48,49]. Nevertheless, its action on its domain is guaranteed to reproduce the correct statistics for any multi-time observable whose support on $M$ lies in the span of the elements of $J_M$. This is a consequence of linearity, as the observable form ensures a linear decomposition in terms of the $O^{(x_M)}$, upon which the recovered process acts correctly due to the duality between the $D^{(x_M)}$ and the $O^{(x_M)}$.

The above ansatz is the process analogue of a quantum Markov chain state [30,31], which is widely studied in the context of Petz's recovery map [50,51]. Like a quantum Markov chain state, the process above is generally not separable, as $F$ and $H$ can share entanglement with $M$, as per the example in Appendix B. However, since the dual elements here can be non-positive and cannot necessarily be decomposed into orthogonal parts, care must be taken in defining the recovery map. The concept of the quantum recovery map is analogous to the classical case [see below Eq. (3)]. The key advantage of the quantum process recovery map is that its repeated action on the ansatz propagates the process arbitrarily far into the future with fixed $\ell$-dependent complexity. If the process $\Upsilon_{FMH}$ has weak memory $\Theta(J_M)$, the expectation value of any valid observable calculated from the recovered process $\Lambda^{J_M}_{FMH}$ accurately approximates that of the original:

Theorem 1. For any multi-time observable $C$ with support on $M$ within the span of the elements of $J_M$, with

This and the following statements are proven in the Methods section. Thm. 1 is fully general inasmuch as it holds without any assumptions on the dynamics or instruments employed. The r.h.s. involves a supremum over instruments on the future and history; as the memory strength takes the form of a generalized divergence, recent numerical techniques can be used for its estimation [52–54]. In Appendix D, we provide an easier-to-compute (and looser) bound based on the relative entropy between the original and recovered process (which foregoes the requirement for optimization) by adapting results from Refs. [30,32,51,55] to first bound a generalized measured relative entropy and then the left-hand side of Eq. (11) via Pinsker's inequality. We also prove another bound, which is tighter in some cases, by restricting to unbiased instruments satisfying tr the unconditional action of a completely depolarising channel, e.g., a randomly sampled Clifford gate. Deriving tighter bounds under various physically-motivated (e.g., thermodynamic) assumptions remains an open problem. While Thm. 1 applies to multi-time observables, often one only requires the time-evolved density operator; a corollary bounds its prediction error:

be the true density operator at any time $t_j \in F$ following outcome $x_M$ of $J_M$ applied to the memory, and ρ (x M ) j be the approximated one. Then:

The future states result from applying identity maps at all history and future times except $t_j$ and the instrument $J_M$ to the memory block to the true and recovered process. See Ref. [45] for a detailed analysis of using finite Markov order approximations to accurately prepare future states. Whenever $J_M$ spans the full space, i.e., is informationally complete, then any multi-time expectation value can be accurately approximated. In this case the distinguishability, by any means, is bounded by the memory strength:

Theorem 3. For informationally-complete $J_M$, the recovered process $\Lambda^{J_M}_{FMH}$ gives sensible predictions for any instrument on $M$ and where generalizes the diamond norm [56] to quantum processes.
The memory strength thus provides an operationally-clear measure: if there exists some informationally-complete instrument for which $\Theta(J_M)$ is small, then Thm. 3 states that one can closely approximate the process for all instruments, even those for which the memory strength is large. If, additionally, Υ x M in the informationally-complete instrument, then the process has small memory strength for all instruments; such processes have properties similar to those of approximately finite-memory classical stochastic processes (where there is only one instrument).
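The linear-inversion step behind the dual elements can be illustrated on a small example. The sketch below assumes the duality condition $\mathrm{tr}[D^{(y)\dagger} O^{(x)}] = \delta_{xy}$ and uses four linearly independent single-qubit operators as stand-ins for the instrument elements (real instrument elements would be Choi states of CP maps). The function name `dual_set` is hypothetical.

```python
import numpy as np

def dual_set(operators):
    """Matrices D_y satisfying tr[D_y^dagger O_x] = delta_xy (linear inversion)."""
    shape = operators[0].shape
    a = np.array([op.reshape(-1) for op in operators], dtype=complex)  # rows: vec(O_x)
    # A @ pinv(A) is the identity when the O_x are linearly independent, so the
    # rows of pinv(A)^dagger are the vectorized dual elements.
    d = np.linalg.pinv(a).conj().T
    return [row.reshape(shape) for row in d]

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
ket_plus = (ket0 + ket1) / np.sqrt(2)
ket_plus_i = (ket0 + 1j * ket1) / np.sqrt(2)
operators = [np.outer(k, k.conj()) for k in (ket0, ket1, ket_plus, ket_plus_i)]
duals = dual_set(operators)

# gram[y, x] = tr[D_y^dagger O_x] should be the identity matrix.
gram = np.array([[np.trace(d.conj().T @ o) for o in operators] for d in duals])

# Any operator in the span decomposes as C = sum_x c_x O_x with c_x = tr[D_x^dagger C].
sigma_z = np.diag([1.0, -1.0]).astype(complex)
coeffs = [np.trace(d.conj().T @ sigma_z) for d in duals]
rebuilt = sum(c * o for c, o in zip(coeffs, operators))
```

The same decomposition is what lets an observable in the span of the probing operators be evaluated from the statistics of those operators alone.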

B. Case Study
Consider a qubit $S$ coupled to another qubit $E$, which is cooled by an external bath. The joint evolution follows where the dissipator acts on $E$: In Ref. [57], it was shown that for $\kappa^2 \geq 64\xi^2$, the process is CP-divisible, which is a common proxy for quantum Markovianity [58,59]; however, CP-divisibility only implies an absence of some kinds of memory [60,61]. Non-Markovianity 'measures' built upon two-time considerations (many of which are contradictory [4]) overlook multi-time effects. Indeed, this model contains higher-order correlations; by constructing the process tensor, we quantify the (non-vanishing) non-Markovianity for all $(\xi, \kappa) \in [0, 2] \times [0, 10]$ (see Appendix E). We then examine the memory strength in various regimes by constructing three 6-time process tensors $\Upsilon_{6:1}(\xi, \kappa)$: one CP-divisible, one strongly non-Markovian, and one intermediate, and let $M$ range from $t_2$ to $t_5$. We consider: i) the identity map, which captures the natural memory strength; ii) the "causal break" instrument, where the system is measured and independently reprepared in an informationally-complete manner, breaking information flow through the system; and iii) the completely-noisy instrument, which replaces the state with white noise, quantifying noise-resistant memory [14].
All processes have vanishing memory strength for the completely-noisy instrument, which can be implemented by applying random unitaries sampled from a set whose average is the depolarizing channel, providing a convenient way to bound memory [48]. For cases i) and ii), in Fig. 3 we plot the error in the multi-time expectation value (i.e., l.h.s. of Thm. 1) and a memory strength proxy based on the relative entropy between the Choi states of the true and recovered processes (see Appendix D) which upper bounds the r.h.s. of Thm. 1. The observable $C$ is chosen as an initial preparation, followed by doing nothing for four steps, before a final measurement. Each process displays significant memory strength for the identity instrument, indicating that unperturbed memory does not decay rapidly. In contrast, the effects of interventions are seen for the causal break: here, all memory effects detected result from environmental interactions since the causal break ensures no temporal correlations can be transmitted through the system (cf. the identity instrument). The CP-divisible process exhibits negligible memory strength, the intermediate process some, and the strongly non-Markovian one stronger still. We emphasize that the unperturbed evolution is better approximated by the process recovered from the informationally-complete recovery scheme than that from the identity instrument, demonstrating Thm. 3.

DISCUSSION
We have introduced the concept of memory strength for quantum stochastic processes, which is shown to bound process recoverability. Its applicability is exemplified by the case study, where we are able to accurately and efficiently reconstruct dynamics with a memory cutoff, even in a highly non-Markovian regime. We expect these tools to be broadly applicable to modern techniques for efficient simulation, where operationally-motivated memory approximations with quantitative error bounds are desired, such as transfer tensor [62–66] and machine-learning methods that either attempt to learn non-Markovian features [67,68] or compress the memory to low-dimensional effective environments [69,70]. Our notion of memory strength and the associated concept of recoverability will play an important role in the characterization and mitigation of noise in quantum experiments where multi-time memory effects are present [45,46,48,49]. Of particular relevance in this direction, in Ref. [48] the authors recovered the restricted process tensors on four IBM quantum computers and reported a reconstruction fidelity of order $10^{-3}$. Additionally, in Refs. [45,46] the authors directly applied our tools to drastically reduce the number of conditional circuits required to be estimated to characterize a multi-time process on the IBM quantum computer. In particular, they demonstrate that a finite Markov order model of length $\ell = 2, 3$ suffices to prepare future target states on a five step non-Markovian process with approximately 88% and 93% fidelity, respectively, in the presence of correlated noise. This is a huge improvement over previous 'gold standard' techniques using gate-set tomography to mitigate state-preparation-and-measurement errors, which gives fidelity values of around 75% in the same circumstances. Our present work lays the conceptual foundations for approximating processes with finite memory with requirements only as large as the complexity of the memory and these examples highlight the efficacy of our framework in realistic settings. Further developments in this direction will bridge the gap between efficient characterization and simulation of quantum processes with memory [45,46,48,49,71,72].
A discrete-time classical stochastic process is characterized by the joint probability distribution P over all sequences of events, P(x n , . . ., x 1 ), where we drop the explicit time labels with the understanding that x j represents an event at time t j .
In multi-time quantum processes, it is important to not only capture the outcome of a measurement, but also the transformation induced on the state, which together constitute an event. Thus, an interrogation of a quantum stochastic process at time $t_j$ is described by an instrument $J_j = \{\mathcal{O}_j^{(x_j)}\}$, which is a collection of completely positive (CP) maps that sum to a completely positive and trace preserving (CPTP) map. Instruments represent general quantum operations, including projective measurements, unitary transformations, and anything in between. Each CP map corresponds to a particular event realized and the fact that the maps sum to a CPTP one encodes the assumption that some event is observed. A discrete-time quantum stochastic process is uniquely described once the probabilities $P(x_n, \ldots, x_1 \mid J_n, \ldots, J_1)$ for all possible events $\{x_1, \ldots, x_n\}$ and all possible instruments $\{J_1, \ldots, J_n\}$ are known. As a consequence of the linearity of mixing principle [43], there exists a multi-linear functional $\mathcal{T}_{n:1}$ that takes any sequence of CP maps to the correct probability distribution via $P(x_n, \ldots, x_1 \mid J_n, \ldots, J_1) = \mathcal{T}_{n:1}[\mathcal{O}_n^{(x_n)}, \ldots, \mathcal{O}_1^{(x_1)}]$, known as the process tensor [11]. The process tensor generalizes classical stochastic processes [9] and reproduces classical properties appropriately [74,75]. As it encodes all detectable memory effects, it has been used to develop operationally meaningful notions of quantum Markovianity [10,11] and memory length [12–14].
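The defining property of an instrument (outcome-wise CP maps that sum to a trace-preserving map) is easy to check in a Kraus representation. The following is a small sketch under that assumption: each outcome is given as a list of Kraus operators, and the helper name `is_instrument` is illustrative.

```python
import numpy as np

def is_instrument(kraus_by_outcome, tol=1e-10):
    """True if the outcome-wise CP maps (given by Kraus operators) sum to a CPTP map."""
    dim = kraus_by_outcome[0][0].shape[1]
    total = np.zeros((dim, dim), dtype=complex)
    for kraus_ops in kraus_by_outcome:
        for k in kraus_ops:
            total += k.conj().T @ k   # trace preservation: sum of K^dagger K = identity
    return np.allclose(total, np.eye(dim), atol=tol)

plus = np.array([1, 1]) / np.sqrt(2)
minus = np.array([1, -1]) / np.sqrt(2)

# Projective measurement in the x-basis: two outcomes, one Kraus operator each.
x_measurement = [[np.outer(plus, plus)], [np.outer(minus, minus)]]

# A unitary is a single-outcome instrument.
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
unitary_instrument = [[sigma_x]]

# A lone projector is a valid event but not a complete instrument.
incomplete = [[np.outer(plus, plus)]]
```

Here `x_measurement` and `unitary_instrument` pass the check, whereas `incomplete` fails because the missing outcome leaves the sum trace-decreasing.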
Since all of the CP maps constituting the instruments, as well as the process tensor itself, are linear maps, they can be represented as matrices through the Choi-Jamiołkowski isomorphism (CJI) [8]. Any map $\mathcal{O} : B(\mathcal{H}_i) \to B(\mathcal{H}_o)$, where $B(\mathcal{X})$ denotes the space of bounded linear operators on $\mathcal{X}$, can be mapped isomorphically to a matrix $O \in B(\mathcal{H}_o \otimes \mathcal{H}_i)$ through its action on half of an (unnormalized) maximally entangled state If the initial state is not subject to a (CPTP) quantum channel but instead a particular event is observed, associated to a CP map of an instrument $J = \{\mathcal{O}^{(x)}\}$, then the corresponding probability is computed via Similarly, the action of a process tensor map $\mathcal{T}_{n:1}$ on a sequence of instrument elements can be expressed in terms of a multiplication of their Choi matrices and a trace as follows [43]

$$P(x_n, \ldots, x_1 \mid J_n, \ldots, J_1) = \mathrm{tr}\!\left[\left(O_n^{(x_n)} \otimes \cdots \otimes O_1^{(x_1)}\right)^{T} \Upsilon_{n:1}\right]. \tag{16}$$

The Choi state of the process $\Upsilon_{n:1}$ essentially plays the role of a quantum state over time, inasmuch as it encodes all observable probability distributions for all possible instrument sequences (just as a quantum state encodes all observable probability distributions for any choice of POVM). In the (one-time) spatial setting, i and Eq. (16) reduces to Eq. (15). Note that we label the Hilbert spaces logically from the perspective of the experimenter (i.e., the experimenter receives a state from the process that is "input" into their instrument of choice, transforming it into an "output" state that is fed back into the process); hence, $i$ denotes outputs of the process and $o$ denotes inputs to the process. Thus, whenever a process tensor acts on an instrument sequence, the degrees of freedom with the same labels (timestep and input/output) are contracted over.
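As a concrete illustration of the isomorphism, the sketch below builds the Choi matrix of a map from its Kraus operators and recovers the map's action through a partial trace. It assumes the common convention $O = \sum_{ij} \mathcal{O}(|i\rangle\langle j|) \otimes |i\rangle\langle j|$ on $\mathcal{H}_o \otimes \mathcal{H}_i$, for which $\mathcal{O}(\rho) = \mathrm{tr}_i[(\mathbb{1}_o \otimes \rho^T)\, O]$; the function names are hypothetical, and the article's own convention may differ in ordering.

```python
import numpy as np

def choi(kraus_ops):
    """Unnormalized Choi matrix on H_o (x) H_i: sum_ij O(|i><j|) (x) |i><j|."""
    d_out, d_in = kraus_ops[0].shape
    c = np.zeros((d_out * d_in, d_out * d_in), dtype=complex)
    for i in range(d_in):
        for j in range(d_in):
            e_ij = np.zeros((d_in, d_in), dtype=complex)
            e_ij[i, j] = 1.0
            mapped = sum(k @ e_ij @ k.conj().T for k in kraus_ops)
            c += np.kron(mapped, e_ij)
    return c

def apply_choi(c, rho, d_out):
    """Action of the map on rho via tr_i[(1_o (x) rho^T) C]."""
    d_in = rho.shape[0]
    m = np.kron(np.eye(d_out), rho.T) @ c
    m = m.reshape(d_out, d_in, d_out, d_in)
    return np.einsum('aibi->ab', m)   # partial trace over the input space

rho = np.array([[0.75, 0.25], [0.25, 0.25]], dtype=complex)

# A unitary channel (Hadamard): the Choi action matches the direct calculation.
hadamard = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
via_choi = apply_choi(choi([hadamard]), rho, 2)
direct = hadamard @ rho @ hadamard.conj().T

# A single event (projector onto |0>): the trace of the output is its probability.
p0 = np.diag([1.0, 0.0]).astype(complex)
prob_event = np.trace(apply_choi(choi([p0]), rho, 2)).real
prob_direct = np.trace(p0 @ rho @ p0).real
```

For a trace-non-increasing event such as the projector, the output is subnormalized and its trace gives the probability of that event.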
The natural generalisation $\Upsilon_{n:1}$ of the CJI applied to the multilinear map $\mathcal{T}_{n:1}$ is constructed by feeding one half of an (unnormalized) maximally entangled state into the dynamics at each time [8]. More precisely, begin with the system-environment dilated dynamics shown in Fig. 2 (green), and denote the initial system-environment state by $\rho$ and the unitary maps describing the joint evolution between times $t_{j-1}$ and $t_j$ by $\mathcal{U}_{j:j-1}$. Now consider $n - 1$ additional maximally entangled pairs, Ψ + j o j o associated to auxiliary systems A j o S, collectively described as Letting the unitary maps between each timestep act on the environment and one half of the appropriate auxiliary systems, i.e., U j:j−1 : yields the Choi state of the process tensor: It is straightforward (albeit arduous) to verify the correctness of Eq. (16) via direct insertion of Eq. (17). Natural generalizations of complete positivity and trace preservation to multi-time processes translate respectively to $\Upsilon_{n:1} \geq 0$ and the following hierarchy of trace conditions: Conversely, any operator satisfying the above represents some (causally-ordered) quantum dynamics inasmuch as it corresponds to an underlying system-environment circuit [11,34]. Eq. (16) constitutes a special case of how (higher-order) quantum maps act on each other: here, we are contracting all open slots of the process tensor with an operation associated to each time in order to yield a probability distribution. More generally, it is possible to consider applying instruments to only a subset of times, yielding a "conditional" process defined upon the remaining times, which describes the correct behaviour of the concatenated dynamics. In other words, it contains all of the information required to compute the correct probability distribution for any instruments subsequently applied to the remaining times. To compute such an object in the Choi representation, one uses the link product defined in Ref. [36]. Essentially, this amounts to restricting both the trace and the transposition in Eq. (16) to only the common Hilbert spaces associated to the relevant subset of times where the instrument is being applied.
For instance, grouping the times into history $\{t_1, \ldots, t_k\}$, memory $\{t_{k+1}, \ldots, t_{k+\ell}\}$ and future $\{t_{k+\ell+1}, \ldots, t_n\}$ and choosing an instrument $J_M = \{O_M^{(x_M)}\}$ on the memory block, the conditional future-history process that occurs given any particular event sequence $O_M^{(x_M)}$ is

$$\tilde{\Upsilon}_{FH}^{(x_M)} = \mathrm{tr}_M\!\left[O_M^{(x_M)\,T}\, \Upsilon_{FMH}\right].$$

Such a conditional process is generically correlated across $F$ and $H$; however, if it is of tensor product form $\tilde{\Upsilon}_{FH}^{(x_M)} = \Upsilon_F^{(x_M)} \otimes \tilde{\Upsilon}_H^{(x_M)}$ for each event $x_M$ of the instrument $J_M$, the process has Markov order $\ell := |M|$ with respect to said instrument [12]. (In general, each $O_M^{(x_M)}$ may act on only a subspace of $M$, with the history and future retaining the rest, to yield Υ , where $M_F$ and $M_H$ can depend on $x_M$ [see Appendix B]; for brevity, we absorb these into $F$ and $H$.) Lastly, a Markovian process corresponds to one for which the process tensor has the specific tensor product structure of an uncorrelated sequence of CPTP maps $\{\Lambda_{j^i:(j-1)^o}\}$ connecting adjacent timesteps, and an initial quantum state $\rho_{1^i}$ [10]:

$$\Upsilon_{n:1} = \Lambda_{n^i:(n-1)^o} \otimes \cdots \otimes \Lambda_{2^i:1^o} \otimes \rho_{1^i}.$$

B. Preliminaries
To begin with, we introduce the following definition:

Definition 4 (Instrument relative entropy). For any family of instruments $\mathsf{J}$ and process tensors $\Upsilon$, $\Gamma$, where $S(A \| B) := \mathrm{tr}[A(\log A - \log B)]$ is the usual quantum relative entropy and $\mathcal{P}_J$ is a CP map from process tensors to classical pointer states, whose elements form a probability distribution over outcomes of the instrument $J = \{O^{(x)}\}$:

Proposition 5. For any $\Upsilon_{FMH}$, $J_M$ and $\Lambda^{J_M}_{FMH}$ as defined in Eq. (10), with $\Theta(J_M)$ taken to be the measured CMI, $\mathsf{J} \cap \mathrm{span}(J_M)$ a family of instruments whose elements have support on $M$ only in the linear span of the elements of $J_M$.
Proof. Consider first the measured conditional probability distributions for a fixed memory instrument $J_M$ and arbitrary (independent) $J_F$, $J_H$ arising from the process tensor $\Upsilon_{FMH}$ and the recovered process $\Lambda^{J_M}_{FMH}$, which are respectively given by: (24) and Here, we have defined Λ F M H , which, by construction, factorizes as With this, we can express the mutual information in the correlated distribution $I(F:H)_{P_\Upsilon(x_F, x_H | x_M)}$ in terms of the relative entropy between said distribution and the uncorrelated one arising from measurements on the recovered process, i.e., Thus, beginning with the definition of Eq. (7), we have: Here J F mH is the original set of uncorrelated instruments $\mathsf{J}$ on $FH$, combined with a POVM on the pointer space $m$, i.e., the supremum in the measured relative entropy is taken over where satisfies the relevant trace conditions on the $FH$ part and tr } the dual set to $J_M$, we have . Hence, the claim is asserted.
We are now in a position to prove our main results.
C. Proof of Thm. 1

First, we note that, for any set of instruments $\mathsf{J}$ and process tensors $\Upsilon$ and $\Gamma$, with $p^J_x := \mathrm{tr}[O^{(x)T} \Upsilon_{FMH}]$ and $q^J_x := \mathrm{tr}[O^{(x)T} \Lambda^{J_M}_{FMH}]$ the probabilities associated with the instrument $J = \{O^{(x)}\}$. We can then use Pinsker's inequality to write Any multi-time operator $C$ can be decomposed in terms of the elements of a single instrument $J = \{O^{(x)}\} \in \mathsf{J}$, as long as those elements span a sufficiently large space. That is, $C = \sum_x c^J_x O^{(x)}$ with $c^J_x \in \mathbb{C}$; in general, the norm $|C|_J := \sum_x |c^J_x|^2$ will vary with the instrument involved in the decomposition. We therefore have, using the Cauchy-Schwarz inequality, for any operator $\Xi$.
Cor. 2 follows directly as shown below.

D. Proof of Cor. 2
Choose αβ |αα ββ| the Choi state of the identity map and P j a projector.Then |C| = 1, since C is an element of the instrument where the system is left to freely evolve on H, J M is applied, it again freely evolves to time t j and then the POVM {P j , 1 − P j } is applied.
the state, at time t j , of the system undergoing the process specified by Υ F M H , acted on by J M with outcome x M occurring, and with no other active interventions.Similarly, the predicted state is ρ Here, Ψ + denotes the Choi state of the identity map and Ψ + X is shorthand for a sequence of identity maps applied at all times in the block X, and j corresponds to a subspace of the future Hilbert space associated to time t j .The l.h.s. of Eq. ( 11) of the main text then reduces to |tr[P j (ρ )]|; since the bound must be true for any P j , it must be true for the one for which the l.h.s. is largest; i.e., sup Pj |tr[P j (ρ 1 is bounded.For informationally complete instruments, a combination of the results derived above leads to Thm. 3.

E. Proof of Thm. 3
When $J_M$ is informationally complete, $\Lambda^{J_M}_{FMH}$ is a full process tensor and any instrument can be applied to it, since $\mathcal{J} \cap \mathrm{span}(J_M) = \mathcal{J}$ by definition. Therefore, we can use Eq. (31), along with Prop. 5, to write: The square root of the l.h.s. of this equation is the generalized with X defined in Thm. 3. Equation (13) and hence Thm. 3 follow.

APPENDICES

Appendix A: Memory Effects in Shallow Pocket Model
Here we detail the shallow pocket model considered as a motivating example [16,17,28,29].It describes a qubit system coupled to a linear degree of freedom that acts as its environment.The dynamics is generated by the interaction Hamiltonian where x is the position operator and g the coupling strength.
The joint state of the system-environment evolves as where ρ ij 0 corresponds to the matrix element (i, j) of the system density operator at time t = 0.
In order to examine memory effects in this process, we consider the initial state of the system to be maximally entangled with an additional ancilla A. The environment begins uncorrelated with SA in the state |ψ ψ| E , which is such that We track the mutual information I(S : A) := S(ρ S ) + S(ρ A ) − S(ρ SA ) as the system is subject to this dynamics.
Firstly, suppose that no interventions are made to the system as it evolves.Throughout the process, the system builds up correlations with the environment at the expense of those shared with the ancilla.This can be seen by tracing out the environmental degrees of freedom: in doing so, one obtains-through the inverse Fourier transform of a Lorentzian distribution-the following system-ancilla time-evolved state The mutual information of this state decays exponentially in time, as depicted by the black, solid curve in Fig. 1 of the main text (note that we choose g = 0.8, γ = 0.3 for all curves in this figure).
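As a rough illustration of the no-intervention curve, the sketch below computes $I(S:A)$ for an initially maximally entangled system-ancilla pair whose coherence is damped by a factor $c(t) = e^{-g\gamma t}$. That damping factor is an assumption (it is what a Lorentzian environment distribution of width $\gamma$ would give under a coupling of the form $\tfrac{g}{2}\sigma_z \otimes x$); the displayed state itself is not reproduced in this text, so the numbers should be read as indicative only. The parameter values $g = 0.8$, $\gamma = 0.3$ are those quoted above.

```python
import numpy as np

def binary_entropy(x):
    """Shannon entropy (in bits) of the two-outcome distribution (x, 1 - x)."""
    x = float(np.clip(x, 0.0, 1.0))
    terms = [v * np.log2(v) for v in (x, 1.0 - x) if v > 0]
    return -float(sum(terms))

def mutual_information(t, g=0.8, gamma=0.3):
    """I(S:A) in bits for a Bell pair dephased by the ASSUMED factor exp(-g*gamma*t)."""
    c = np.exp(-g * gamma * abs(t))
    # rho_SA = (|00><00| + |11><11| + c|00><11| + c|11><00|)/2
    # has non-zero eigenvalues (1 + c)/2 and (1 - c)/2.
    s_joint = binary_entropy((1.0 + c) / 2.0)
    # Both marginals stay maximally mixed, contributing one bit each.
    return 2.0 - s_joint

times = np.linspace(0.0, 10.0, 6)
curve = [mutual_information(t) for t in times]
```

Under this assumption the mutual information starts at two bits and decreases monotonically towards one bit, the value carried by the remaining classical correlations.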
We now examine the effect of implementing an operation on the system at some intermediary time. We consider the natural evolution above to occur up until some fixed time $t_1$ (= 5 for illustration), at which point an arbitrary quantum operation can be applied to the system; the system subsequently evolves according to the shallow pocket model up to some later time $t_1 + \tau$. It is crucial to track the entire system-ancilla-environment state throughout the process to understand the evolution of the correlations between the parties (as this is how memory effects are made manifest), with the environment only being discarded at the conclusion.
Consider first applying the Pauli rotation $\sigma_x$ to the system at $t_1$. The subsequent joint system-ancilla state at $t_1 + \tau$ is where the notation means that $\sigma_x$ was applied at fixed time $t_1$, followed by shallow pocket evolution for variable time $\tau$. It is clear that by time $\tau = t_1$, the system-ancilla has returned to a maximally correlated state. The mutual information as this state evolves is depicted by the blue, long-dashed line in Fig. 1 of the main text. Indeed, this analysis recovers the well-known result that application of $\sigma_x$ at time $t_1$ reverses the dynamics and leads to the system returning to its initial state (of maximal correlation with the ancilla, in our case) by time $2t_1$ [16,17]. This follows directly from the identity $\sigma_x H \sigma_x^\dagger = -H$, which leads to $e^{i t_1 H} \sigma_x e^{i t_1 H} \sigma_x^\dagger = \mathbb{1}$.
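The echo identity can be verified numerically on a small toy model. This is a minimal sketch, assuming an interaction of the form $H = \tfrac{g}{2}\sigma_z \otimes x$ and replacing the continuous position operator by a hypothetical seven-point grid; neither choice is taken from the text, and any $H$ that anticommutes with $\sigma_x$ would behave the same way.

```python
import numpy as np

sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

# Hypothetical stand-in for the environment: position operator on a coarse grid.
x_grid = np.linspace(-3.0, 3.0, 7)
X = np.diag(x_grid).astype(complex)

g = 0.8
H = 0.5 * g * np.kron(sigma_z, X)           # ASSUMED form H = (g/2) sigma_z (x) x
Sx = np.kron(sigma_x, np.eye(len(x_grid)))  # sigma_x acting on the system only

def U(t):
    """exp(i t H); H is diagonal here, so exponentiate its diagonal entries."""
    return np.diag(np.exp(1j * t * np.diag(H)))

t1 = 5.0
# sigma_x H sigma_x^dagger = -H
anticommutes = np.allclose(Sx @ H @ Sx.conj().T, -H)
# e^{i t1 H} sigma_x e^{i t1 H} sigma_x^dagger = identity
echo = U(t1) @ Sx @ U(t1) @ Sx.conj().T
echo_is_identity = np.allclose(echo, np.eye(H.shape[0]))
```

Because the identity holds at the operator level, the check is independent of the environment state and of the value of `t1`.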
One can also consider the experimenter applying other operations. For instance, perhaps the operation implemented at $t_1$ is some offset Pauli rotation $\sigma_{\mathrm{offset}} := \sqrt{p}\,\sigma_x + \sqrt{1-p}\,\sigma_z$. In this case, the subsequent system-ancilla state is The mutual information of this state is depicted by the green dotted line in Fig. 1 of the main text. Interestingly, this first induces a decrease in the mutual information between the system and ancilla that is steeper than the exponential decay that occurs when no operation is implemented, before correlations build back up as the system evolves with the environment, which retains memory of the system's past.
Similarly, a measurement of the system state could be made. In Fig. 1 of the main text, the red, dot-dashed curve depicts this for a measurement in the x-basis spanned by $\{|\pm\rangle := (|0\rangle \pm |1\rangle)/\sqrt{2}\}$; here we show the mutual information for a measurement yielding the outcome $+$. The post-measurement system-ancilla state is Directly after the result is observed, i.e., at $\tau = 0$, the mutual information drops to 0 as the system and ancilla are uncorrelated; however, again, correlations build back up as the system evolves with its environment due to memory.
Lastly, the experimenter could attempt to completely erase any historic information by discarding the system (i.e., measuring without recording the outcomes) and preparing a fixed, known state to feed into the dynamics. The purple, short-dashed curve in Fig. 1 depicts this scenario. For the shallow pocket model considered, it does not matter which state is prepared by the experimenter: the discarding of measurement outcomes destroys any memory of the history and no correlations between the system and ancilla can ever be built up again. Crucially, this is not the case in general and only occurs because the dynamics is CP-divisible.
Let the state input at H o be the first halves of three maximally entangled states x∈{a,b,c} |φ + 2 (|00 + |11 ); here, the tilde denotes systems that are fed into the process, whereas the spaces without a tilde refer to systems kept outside of it.The states input at M o , and M o are labeled similarly.In between times H o and M i , the process makes use of the second part of the common cause state |e + to apply a controlled quantum channel X, which acts on all three qubits a, b, c.Following this, qubits a and b are discarded.The ab qubits input at M o , as well as all three qubits input at M o are sent forward into the process, which applies a joint channel Y on all of these systems, as well as the first part of the common cause state |e − .Three of the output qubits are sent out to F i , and the rest are discarded.The c qubit input at M o is sent to M i , after being subjected to a channel Z, which interacts with the first part of the common cause state |e − , i.e., the φ 0 , φ 1 register.
Consider the process where $|e^+\rangle$ is sent to $H_i$ and $M_i$ and $|e^-\rangle$ to $M_i$ and $F_i$. The process tensor for this case is Next, consider the process where $|e^+\rangle$ is sent to $H_i$ and $M_i$ and $|e^-\rangle$ to $M_i$ and $F_i$. The process tensor for this scenario is In the first case, there is entanglement between $H_{io}$ and $M_i$, as well as between $M_o M_{io}$ and $F_i$. In the second case, there is entanglement between $H_{io}$ and $M^c_i M^{ab}_i$, as well as between $M^{ab}_i M_o M^c_i M_o$ and $F_i$. The overall process is the average of these two, which will still have entanglement across the same cuts for generic probability distributions that the common cause states are sent out with.

FIG. 4. A process with finite quantum Markov order with parts of $M$ kept by $H$ and $F$. On the left is the first process, described in Eq. (B2), in which part of the common cause state $|e^+\rangle$ is sent to $M_i$ and $|e^-\rangle$ is sent to $M_i$. On the right is the second process, described in Eq. (B3), which has the recipients flipped. The process tensor is depicted in gray, and entanglement between parties color-coded in green and mauve. The overall process is a probabilistic mixture of both scenarios. See also Ref. [73] for discussion on this process.
This process has vanishing memory strength $\Theta$ because we can make a parity measurement on the $ab$ parts of $M_i$ and $M_i$. The parity measurement applies two controlled phases to an ancilla initially prepared in the state $|+\rangle$, with the control registers being qubits $a$ and $b$. If the two control qubits are in states $|00\rangle$ or $|11\rangle$, then $|+\rangle \to |+\rangle$. However, if the control qubits are in states $|01\rangle$ or $|10\rangle$, then $|+\rangle \to |-\rangle$. By measuring the final ancilla, which can be perfectly distinguished since it is in one of two orthogonal states, we can know which process we have in a given run; in either case, there are no H correlations. Lastly, note that this process also has vanishing quantum CMI; this agrees with the analysis in Ref. [13], as the instrument that erases the history comprises only orthogonal projectors.

and for any bounded operators X ∈ B(H) and x ∈ B(H ). This is satisfied when E [1] op ≤ 1 and E [1] op ≤ 1, with $\|X\|_{\mathrm{op}} := \max\{|\lambda| : X - \lambda\mathbb{1} \text{ is not invertible}\}$, i.e., the largest singular value of $X$ [55].
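The parity measurement described above can be simulated directly on computational-basis control states. The sketch below assumes that each "controlled phase" is a controlled-Z gate (a phase of $\pi$), which is the choice consistent with $|+\rangle \to |-\rangle$ for odd parity; the helper name is illustrative.

```python
import numpy as np

plus = np.array([1.0, 1.0]) / np.sqrt(2)
minus = np.array([1.0, -1.0]) / np.sqrt(2)

# Controlled-Z in the ordering (control, ancilla); ASSUMED to be the controlled phase.
CZ = np.diag([1.0, 1.0, 1.0, -1.0])

def ancilla_after_parity_check(a, b):
    """Ancilla state after controlled phases from basis-state controls a and b."""
    basis = np.eye(2)
    ancilla = plus
    for bit in (a, b):
        joint = CZ @ np.kron(basis[bit], ancilla)
        # The control is a basis state, so the joint state remains a product;
        # read the ancilla off the block selected by the control value.
        ancilla = joint.reshape(2, 2)[bit]
    return ancilla

results = {(a, b): ancilla_after_parity_check(a, b) for a in (0, 1) for b in (0, 1)}
```

Even-parity controls leave the ancilla in $|+\rangle$ and odd-parity controls flip it to $|-\rangle$, so a single measurement of the ancilla in the $\{|+\rangle, |-\rangle\}$ basis reveals the parity without disturbing the control basis states.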
k , then this is equivalent to the following constraints on the Kraus operators: For the maps P J appearing in Def. 4, it is possible to show that these conditions do not hold.However, we can write P J = D i P Ĵ , where such that the Kraus operators of P Ĵ take the form E xk = |x k| O (x) /D i , where {|k } is an orthonormal basis for the space the O (x) 's act on.Since the trace conditions required for the overall action of any instrument imply that tr a positive operator and hence the sum of its singular values (its trace) cannot be smaller than its largest.Therefore, P Ĵ satisfies the Cauchy-Schwarz inequality in both directions and we have where, in the first equality, we have used the simply demonstrated fact that S(αρ ασ) = αS(ρ σ) for any scalar α.Since Eq. (D6) holds for any J ∈ J, Eq. (D2) follows directly from Def. 4. Since the measured relative entropy on the r.h.s. of Eq. (D2) is equal to the memory strength (see Prop. 5), one can replace the r.h.s. of Thms. 1 and 3 and Cor. 2 by the proxy memory strength is the projected process tensor onto the classical pointer basis m and similarly for Λ J M F mH , and D F H is the dimension that the future-history process tensor acts upon.This is precisely what is plotted as a proxy for memory strength in Fig. 3 of the main text.The advantage of doing so is that the proxy does not require computing an optimization over future and history instruments, therefore making it easier to calculate.The disadvantage is that the bound is looser than the bound based upon the optimized memory strength by a factor of D F H . 
Importantly, this is smaller than the full system-environment dimension as it only accounts for the environmental impact on the system dynamics and can be tomographically reconstructed.Note further that the complexity of F H is often a choice: for a given d-dimensional system, one may be interested in only predicting its instantaneous state at some future time following an initial (historic) preparation for some memory instruments-in this case, D F H = d 2 × d 2 (in general, the dimension of a process tensor grows exponentially in the number of timesteps).If the memory strength vanishes, the recovered process can be used to perfectly simulate the dynamics arbitrarily far into the future and for any number of timesteps.
Alternatively, by restricting the measured relative entropy to the special class of unbiased instruments J ub ∈ J ub (on F H) with deterministic action satisfying O J ub = 1/D o , we can prove the following bound: Proposition 7. For any Υ F M H , J M and Λ J M F M H as defined in Eq. (10) of the main text, with J ub|J M = J ∩ span(J M ) ∩ J ub F H the set of instruments whose elements have support on M only in the linear span of the elements of J M and for which the F H part is unbiased.
Proof.First we note that, for unbiased instruments, where the trace-normalized Choi state Υ is a physical density operator and x D o O (x) = 1, i.e., the rescaled instrument elements form a POVM.In this case, the action of the instrument map P J ub on the process tensor looks like a tracepreserving measurement map on the normalized Choi state, for which the usual monotonicity of relative entropy holds.That is, for process tensors Υ and Γ, We can therefore follow the same steps as in the proof of Prop. 5 to arrive at where J ub F mH is the set of F mH instruments J ub F mH satisfying O J ub F mH = 1 F mH /D o F H .In the same manner as the full set of instruments on F mH can be identified with the set of instruments J ∩ span(J M ), J ub F mH can be identified with J ∩ span(J M ) ∩ J ub F H , and Eq.(D7) follows accordingly.
are insufficient to characterize the full memory effects of the process, as they necessarily fail to capture multi-time effects. Before we go on to examine the memory length of said process, we first show that it is non-Markovian for all parameters in the considered regime using a recently introduced notion of non-Markovianity that accounts for all (multi-time) temporal correlations [10]. See also Ref. [14] for further details on this case study. Due to the simplicity of the model, the analytic form of the equation of motion on the level of the system alone can be derived and is written as [57] ∂ρ where the time-dependent coefficient is given by (E2)

A necessary and sufficient criterion for the dynamics to be CP-divisible is that the coefficients of the dissipation terms in the above master equation for the system, i.e., $-\dot{c}_t/(2c_t)$, are non-negative for all times [59]. Explicit calculation shows that for $\kappa^2 \geq 64\xi^2$, $-\dot{c}_t/(2c_t)$ is always non-negative, whereas for We therefore see an abrupt transition between CP-divisible and non-CP-divisible dynamics across the line $\kappa^2 = 64\xi^2$, as shown here in Fig. 5a).
In the CP-divisible regime, the trace-distance between any two states subject to the evolution is always nonincreasing [59].This fact allows for the total quantification of two-time non-Markovianity N 2 by integrating any increases in the trace-distance over all time, which has been shown in Ref. [57] to yield the analytic result for κ 2 < 64ξ 2 and zero otherwise.This is plotted in Fig. 5b).However, CP-divisibility does not imply Markovianity [60]; as such, Figs.5a) and b) do not provide a comprehensive picture of the many prevalent memory effects for different choices of parameters κ and ξ.Here, we explicitly calculate the process tensor for the dynamics and show that it is non-Markovian for the entire parameter regime, before exploring the behavior of the instrument-specific memory strength introduced in the main text.
We consider a parameter grid ξ ∈ [0, 2] and κ ∈ [0, 10] with increments of 0.1 in each direction and construct the n = 6 step process tensor, Υ 6:1 (ξ, κ).Here, for simplicity, we assume an initially uncorrelated system-environment state, such that the process tensor begins on an output space.We also choose uniform spacing between timesteps of dt = 0.3, which corresponds to the natural timescale over which the trace-distance between arbitrary initial system states increases for most values in the parameter space [57].This means that the final time of the process tensor is T = 1.5, which corresponds to where the CP-divisibility criteria would witness non-Markovianity for a wide range of parameters.
At each point, we calculate the multi-time non-Markovianity $N$ in the process by considering the distance between the process tensor and its nearest Markovian counterpart [10]. Here, we choose the pseudo-distance to be the relative entropy, $N := D(\Upsilon_{6:1}(\xi, \kappa)\,\|\,\Upsilon^{\mathrm{Markov}})$, in which case the minimum occurs for the Markovian process that is simply built up from the marginals of the original process tensor [76]; i.e., using the relative entropy circumvents the normally necessary minimization. The corresponding results are depicted in Fig. 5c), which indicates that the process is non-Markovian for all parameters in the chosen range. In particular, there is no abrupt transition between regimes. Although the non-Markovianity is small above the line $\kappa = 8\xi$ (across which the dynamics transitions from CP-divisible to non-CP-divisible), it is non-zero, indicating a weak but present memory. The two-time witness of non-Markovianity in Eq. (E3) is insensitive to such effects, which leads to the abrupt transition between regimes; by capturing all multi-time correlations, the non-Markovianity calculated via the process tensor shows this transition to be artificial.

This raises the question: how long does the memory persist? In the main text, we study the behavior of memory with respect to a number of instruments of interest [14], namely:

i) The 'do nothing' or identity instrument, which intuitively corresponds to the natural memory strength of the process as the system is not actively intervened on throughout its evolution. This single 'outcome' instrument sequence on $M$ consists of identity maps, whose Choi state is $\Psi^+_M = \bigotimes_{j=k-\ell}^{k-1} \Psi^+_j$; the natural memory strength is quantified by the correlations in $\Upsilon^{I_M}_{FH} := \mathrm{tr}_M[\Psi^+_M \Upsilon_{FMH}]$ (the memory strength with respect to any unitary transformations can be defined similarly).
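The role of the marginals in the relative-entropy measure can be illustrated with a two-party toy analogue: for a bipartite state, the relative entropy to the product of its own marginals equals the mutual information, so no minimization is needed. The sketch below is an analogy only; it uses a hypothetical full-rank two-qubit state in place of a process tensor and does not reproduce the computation behind Fig. 5c).

```python
import numpy as np

def log2_psd(rho):
    """Base-2 matrix logarithm of a full-rank density matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(rho)
    return (vecs * np.log2(vals)) @ vecs.conj().T

def relative_entropy(rho, sigma):
    """S(rho || sigma) = tr[rho (log rho - log sigma)] in bits; both assumed full rank."""
    return float(np.real(np.trace(rho @ (log2_psd(rho) - log2_psd(sigma)))))

def entropy(rho):
    """Von Neumann entropy in bits."""
    vals = np.linalg.eigvalsh(rho)
    vals = vals[vals > 1e-12]
    return float(-np.sum(vals * np.log2(vals)))

# Hypothetical full-rank correlated state: noisy Bell pair.
bell = np.zeros((4, 4))
bell[np.ix_([0, 3], [0, 3])] = 0.5
rho_ab = 0.7 * bell + 0.3 * np.eye(4) / 4

# Marginals via partial traces; tensor indices are (a, b, a', b').
r = rho_ab.reshape(2, 2, 2, 2)
rho_a = np.einsum('ibjb->ij', r)
rho_b = np.einsum('aiaj->ij', r)

distance_to_product = relative_entropy(rho_ab, np.kron(rho_a, rho_b))
mutual_info = entropy(rho_a) + entropy(rho_b) - entropy(rho_ab)
```

The two quantities agree to numerical precision, mirroring how the nearest Markovian process in relative entropy is assembled from marginals of the original process tensor.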
ii) The causal break instrument, comprising an informationally complete set of independently measured and reprepared states. The causal break sequence chosen is a symmetric single-qubit informationally-complete POVM comprising elements where $\{\alpha^{(x)}\} = \{(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)\}$ are tetrahedral coordinate vectors, followed by the independent repreparation into one of the set of states (with uniform probability) $\{|0\rangle\langle 0|, |1\rangle\langle 1|, |+_x\rangle\langle +_x|, |+_y\rangle\langle +_y|\}$, where $|+_{x/y}\rangle$ is the $+1$ eigenstate of $\sigma_x/\sigma_y$. Aggregating the correlations for each outcome to the corresponding instrument level provides another notion of memory strength amenable to our framework; we take the average with respect to the probability distribution $p_x = \mathrm{tr}[B^{(x_m)T}_M \Upsilon_{FMH}]$, where $B^{(x_m)}_M$ are the elements of the causal break instrument above.

iii) The completely-noisy instrument, which discards the output and reprepares the maximally-mixed state, capturing the strength of noise-resistant memory. The Choi state of this (single-outcome) instrument is the identity matrix and so the correlations in the marginal process tensor quantify this type of memory.

We focus on the instrument-specific memory strength for the instruments above for three specific process tensors, one in each regime of interest for the dynamics described, chosen as [see Fig. 5c)]

$\Upsilon^{\mathrm{CP}} := \Upsilon_{6:1}(1, 10), \quad \Upsilon^{\mathrm{Int}} := \Upsilon_{6:1}(1, 8), \quad \Upsilon^{\mathrm{SNM}} := \Upsilon_{6:1}(1, 1).$ (E5)

In Fig. 3 of the main text, we consider the three process tensors above upon which the identity, completely noisy and causal break instruments described previously are applied. We plot the memory strength proxy $2d^{u} S(\Upsilon^{J_M}_{FmH} \| \Lambda^{J_M}_{FmH})$ and the difference between expectation values $\langle C \rangle_{\Upsilon_{FMH}} - \langle C \rangle_{\Lambda^{J_M}_{FMH}}$ (i.e., the l.h.s. of Thm. 1), where $d = 2$ and $u = 4 - \ell$, and we choose $C$ to be an observable corresponding to an initial preparation of state $|0\rangle$, doing nothing to the process for the middle four timesteps and then finally making a measurement yielding outcome 1 of the POVM defined in Eq. (E4).

FIG. 5. Abrupt transition between CP-divisible and non-divisible dynamics [14]. In panel a), we plot $\partial_t |c_t|$ with $\xi = 1$. As $\mathrm{sgn}(\dot{c}_t c_t) = \mathrm{sgn}(\partial_t |c_t|)$, this implies the dynamics is CP-divisible for $\kappa \geq 8$, but not for $\kappa < 8$. In particular, there is an abrupt transition along the line $\kappa = 8$. In panel b), we plot the two-time non-Markovianity $N_2$ as per Eq. (E3). This is plotted in the parameter space $\xi \in [0, 5]$ and $\kappa \in [0, 10]$. Note that this measure of non-Markovianity vanishes for everything above the black line $\kappa = 8\xi$. In panel c) we plot the multi-time non-Markovianity $N$ of $\Upsilon_{6:1}(\xi, \kappa)$. An important distinction to note is that the two-time non-Markovianity in panel b) results from an integration of positive memory contributions over all times, whereas the multi-time non-Markovianity in panel c) is computed for fixed process tensors in the parameter space. Although the non-Markovianity is small above the line $\kappa = 8\xi$, it is non-zero. Moreover, there is no abrupt transition between regimes, as all memory effects are accounted for. We also depict with crosses the three specific process tensors defined in Eq. (E5) for which we calculate the instrument-specific memory strength in the main text.

FIG. 3. Case study. We plot $|\langle C \rangle_{\Upsilon_{FMH}} - \langle C \rangle_{\Lambda^{J_M}_{FMH}}|$

i.e., $O := (\mathcal{O} \otimes \mathcal{I})[\Psi^+]$. Note that the time of the event is associated to both an input and output Hilbert space, and the Choi matrix is a supernormalized bipartite state. In this representation, the properties of CP and TP for the maps respectively translate to $O \geq 0$ and $\mathrm{tr}_o[O] = \mathbb{1}_i$. To aid intuition, note that the output state $\rho' := \mathcal{O}[\rho]$ of the map $\mathcal{O}$ acting on an arbitrary input state $\rho$ is computed in the Choi picture via ρ 1)partite Choi matrix of the process tensor map T n:1 and each O (xj ) j is the Choi matrix of O (xj ) j
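The tetrahedral POVM used in the causal break can be sanity-checked numerically. The explicit form of the elements is not reproduced in this text, so the sketch below assumes the standard single-qubit symmetric informationally complete form $E_x = \tfrac{1}{4}\left(\mathbb{1} + \alpha^{(x)} \cdot \vec{\sigma}/\sqrt{3}\right)$ built from the tetrahedral vectors quoted above; the variable names are illustrative.

```python
import numpy as np

paulis = [
    np.array([[0, 1], [1, 0]], dtype=complex),     # sigma_x
    np.array([[0, -1j], [1j, 0]], dtype=complex),  # sigma_y
    np.array([[1, 0], [0, -1]], dtype=complex),    # sigma_z
]
alphas = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

def povm_element(alpha):
    """ASSUMED form E = (1 + alpha . sigma / sqrt(3)) / 4 for a tetrahedral vector alpha."""
    bloch = sum(a * s for a, s in zip(alpha, paulis)) / np.sqrt(3)
    return 0.25 * (np.eye(2) + bloch)

elements = [povm_element(a) for a in alphas]

# Completeness: the elements sum to the identity.
completeness = np.allclose(sum(elements), np.eye(2))
# Positivity: every element is positive semidefinite.
positive = all(np.linalg.eigvalsh(E).min() > -1e-12 for E in elements)
# Symmetry: all pairwise overlaps tr(E_x E_y), x != y, coincide.
overlaps = [np.real(np.trace(elements[i] @ elements[j]))
            for i in range(4) for j in range(4) if i != j]
# Informational completeness: the four elements span the 2x2 operator space.
rank = np.linalg.matrix_rank(np.array([E.flatten() for E in elements]))
```

With this assumed form the pairwise overlaps all equal $1/12$ and the rank is 4, which is what makes the outcome statistics sufficient to reconstruct any single-qubit state.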