Introduction

A foundational question of modern physics is to understand the origins of irreversibility1, and in particular to determine whether fundamental laws, which are fully reversible, are consistent with phenomena like equilibration and thermalization. The dynamical version of this conundrum concerns the emergence of forgetful processes from isolated ones. In quantum mechanics, an isolated process is unitary and cannot lose information; past behavior in one part of the system will always be remembered, eventually returning to influence the future.

However, nature manifests forgetful processes in many ways: a system's evolution is often determined with a seeming disregard for its previous interactions with its surroundings. For example, a carbon atom does not typically remember its past and behaves like any other carbon atom. Such processes are not isolated, and the general intuition is that the dynamics of a system in contact with a large environment can be approximately described as memoryless2. Yet formal derivations of memoryless quantum processes require several assumptions about the coupling strength with the environment, the timescales of dynamical correlations, and an infinite-dimensional reservoir. For finite-sized environments, memorylessness can only be achieved exactly by continually refreshing (discarding and replacing) the environment's state, i.e., by artificially throwing away information from the environment. This is akin to the assumption made by the Fundamental Postulate of Statistical Mechanics1, which a priori sets the probabilities of a closed system to be in any of its accessible microstates as equal.

Thus the foundational question remains open: can forgetful processes arise from isolated processes without any artificial discarding of information? Because forgetful processes are often called Markovian, we refer to the mechanism for forgetting as Markovianization, in the same spirit as the terms equilibration and thermalization1,3,4,5,6,7,8. Indeed, Markovianization is likely to come about through mechanisms intimately related to these other processes. For instance, dissipative Markov processes have fixed points to which the system relaxes; this is a mechanism for equilibration, and also possibly for thermalization. We have previously argued for the emergence of Markovianization for mathematically typical processes, using averages with respect to the Haar measure9; however, such processes are far from physically typical1.

In this paper, we identify a class of isolated physical processes that approximately Markovianize in a strong sense, where even the multi-time quantum correlations vanish. To do so, we employ large deviation bounds for approximate unitary designs derived by R. Low10, and apply them to the process tensor formalism11,12,13, which describes quantum stochastic processes. We show that, similarly to the way in which quantum states thermalize, quantum processes can Markovianize, in the sense that they converge to a class of typical processes and satisfy a meaningful large deviation principle whenever they take place within a large environment and under complex enough, but not necessarily fully random, dynamics. As a proof of principle, we employ a recent efficient construction of approximate unitary designs with quantum circuits14 to illustrate how a dilute gas would quickly Markovianize. These results directly impose bounds on complexity and timescales for standard master equations employed in the theory of open systems. Finally, we discuss possible extensions of our results to many-body systems with time-independent Hamiltonians. Our results are timely given the ever-increasing interest in, and relevance of, determining the breakdown of the Markovian approximation in modern experiments15,16,17,18.

Results

Quantum stochastic processes

A classical stochastic process on a discrete set of times is the joint probability distribution of a time-ordered random variable, \({\mathbb{P}}({x}_{k},\ldots ,{x}_{0})\). A process is said to have finite memory whenever the state of the system at a given time is only conditionally dependent on its previous m states: \({\mathbb{P}}({x}_{k}| {x}_{k-1},\ldots ,{x}_{0})={\mathbb{P}}({x}_{k}| {x}_{k-1},\ldots ,{x}_{k-m})\). Here, m is the Markov order; when m = 1 the process is called Markovian, and when m = 0 the process is called random. Finite memory processes, and in particular Markov processes, have garnered significant attention in the sciences for two principal reasons. First, the complexity of a process grows with the Markov order and thus it is easier to work with finite memory processes. Second, many physical processes tend to be well approximated by those with finite memory.
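The classical Markov condition above can be checked directly on a joint distribution. The following is a minimal sketch in Python with NumPy, assuming a three-time distribution stored as an array indexed `[x2, x1, x0]` with full support on the conditioning variables; the helper names `markov_chain_joint` and `is_markov_order_one` are hypothetical and introduced only for illustration.

```python
import numpy as np

def markov_chain_joint(p0, T, steps):
    """Joint distribution P(x_steps, ..., x_0) of a Markov chain.

    T[a, b] is the probability of moving to state a from state b, so each
    column of T sums to one. The returned array is indexed [x_steps, ..., x_0].
    """
    joint = np.asarray(p0, dtype=float)
    for _ in range(steps):
        # P(x_{j+1}, x_j, ...) = T[x_{j+1}, x_j] * P(x_j, ...)
        joint = np.einsum("ab,b...->ab...", T, joint)
    return joint

def is_markov_order_one(joint, atol=1e-12):
    """Check P(x2 | x1, x0) == P(x2 | x1) for an array indexed [x2, x1, x0]."""
    cond_full = joint / joint.sum(axis=0, keepdims=True)   # P(x2 | x1, x0)
    pair = joint.sum(axis=2)                                # P(x2, x1)
    cond_short = pair / pair.sum(axis=0, keepdims=True)     # P(x2 | x1)
    return np.allclose(cond_full, cond_short[:, :, None], atol=atol)

p0 = np.array([0.6, 0.4])
T = np.array([[0.9, 0.3],
              [0.1, 0.7]])
markov_joint = markov_chain_joint(p0, T, steps=2)

# A process with Markov order 2: x2 copies x0, while x1 is an independent fair coin.
copy_joint = np.zeros((2, 2, 2))
for x1 in range(2):
    for x0 in range(2):
        copy_joint[x0, x1, x0] = 0.25

print(is_markov_order_one(markov_joint))  # True
print(is_markov_order_one(copy_joint))    # False
```

The second example has memory of order two: knowing x1 alone leaves x2 completely undetermined, while knowing x0 fixes it.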

Generalizations of Markov processes and Markov order to the quantum realm have been plagued with technical difficulties19, which have their origin in the fundamentally invasive nature of quantum measurement. However, recently, a generalized and unambiguous characterization of quantum stochastic processes within the process tensor framework11,20 has paved the way to alleviating these difficulties. The success of this framework lies in generalizing the notion of time-ordered events in the quantum realm.

Consider a system-environment composite SE of dimension dSE = dSdE with an initial state ρ(0) that undergoes an evolution \({{\mathcal{U}}}_{0}\). An intervention \({{\mathcal{A}}}_{0}\) is then made on the system S alone, followed by evolution \({{\mathcal{U}}}_{1}\) and then a second intervention \({{\mathcal{A}}}_{1}\) on S alone. For concreteness, we henceforth consider \({{\mathcal{U}}}_{i}\,\ne\, {{\mathcal{U}}}_{j}\). This continues until a final intervention \({{\mathcal{A}}}_{k}\) is performed following \({{\mathcal{U}}}_{k}\). A quantum event xi at the ith time step corresponds to an outcome of the corresponding intervention, and is represented by a completely positive (CP) map \({{\mathcal{A}}}_{{x}_{i}}(\cdot ):={\sum }_{\nu }{A}_{{x}_{i}}^{\nu }(\cdot ){A}_{{x}_{i}}^{\nu \dagger }\) with Kraus operators \(\{{A}_{{x}_{i}}^{\nu }\}\) satisfying \({\sum }_{\nu }{A}_{{x}_{i}}^{\nu \dagger }{A}_{{x}_{i}}^{\nu }\le {\mathbb{1}}\). In other words, an intervention is the action of an instrument \({\mathcal{J}}={\{{{\mathcal{A}}}_{{x}_{i}}\}}_{{x}_{i}}^{{X}_{i}}\), where \({{\mathcal{A}}}_{i}={\sum }_{{x}_{i}}{{\mathcal{A}}}_{{x}_{i}}\) is a completely positive trace-preserving (CPTP) map. This is depicted schematically in Fig. 1. In general, the evolution \({\mathcal{U}}\) is allowed to be a CPTP map on SE. In this paper, however, we are interested in an isolated SE, where the \({\mathcal{U}}\)s are unitary transformations: \({\mathcal{U}}(\cdot ):=U(\cdot ){U}^{\dagger }\), with U a unitary operator.

Fig. 1: Quantum processes and the process tensor.
figure 1

a A k-step quantum process ϒ on system S alone is due to the time evolution of an initial system-environment (SE) state ρ(0) with distinct unitary transformations \({{\mathcal{U}}}_{i}\) with i = 0, 1, …, k. In between each pair of unitaries, an external operation (e.g., a measurement) \({{\mathcal{A}}}_{i}\) for i = 0, 1, …, k is applied; this can also be described by a tensor Λ. b An n-qubit SE-system (\(\left|0\right\rangle\) depicting a single qubit) with two-qubit gate interactions (depicted by vertical lines between squares) only: a subsystem qubit is probed at the ith step through \({{\mathcal{A}}}_{i}\). While the standard approach towards typicality or equilibrium properties concerns the whole SE dynamics and/or a single measurement on system S as in Standard Statistical Mechanics, we show that complex—not necessarily uniformly random—dynamics within large environments will be highly Markovian with high probability.

The probability to observe a sequence of quantum events is given by

$${\mathbb{P}}({x}_{k},\ldots ,{x}_{0}| {{\mathcal{J}}}_{k},\ldots ,{{\mathcal{J}}}_{0})={\rm{tr}}\left[{{\mathcal{A}}}_{{x}_{k}}{{\mathcal{U}}}_{k-1}\ \ldots {{\mathcal{A}}}_{{x}_{0}}{{\mathcal{U}}}_{0}{\rho }^{(0)}\right].$$
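To make the sequential expression concrete, here is a minimal numerical sketch, assuming a qubit system coupled to a qubit environment, Haar-random SE unitaries, and interventions that are projective measurements of S in its computational basis (one particular choice of instrument). Each unitary is applied before the intervention with the same index, following the ordering described in the text; summing over all outcome sequences recovers unit total probability because the full instruments are trace preserving. The function names are hypothetical.

```python
import itertools
import numpy as np

def haar_unitary(d, rng):
    """Haar-random d x d unitary from the QR decomposition of a complex Ginibre matrix."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # rescale column j by the phase of r[j, j]

def sequence_probability(rho0, unitaries, outcomes, dS, dE):
    """P(x_k, ..., x_0) for alternating SE unitaries and projective measurements on S."""
    rho = rho0
    for U, x in zip(unitaries, outcomes):
        rho = U @ rho @ U.conj().T            # SE evolution U_i
        proj = np.zeros((dS, dS))
        proj[x, x] = 1.0
        K = np.kron(proj, np.eye(dE))         # Kraus operator acting on S only
        rho = K @ rho @ K.conj().T            # unnormalized post-measurement state
    return float(np.real(np.trace(rho)))

rng = np.random.default_rng(seed=1)
dS, dE, steps = 2, 2, 3
unitaries = [haar_unitary(dS * dE, rng) for _ in range(steps)]
rho0 = np.zeros((dS * dE, dS * dE), dtype=complex)
rho0[0, 0] = 1.0  # initial SE state |0><0| (x) |0><0|

# Outcome tuples are listed in time order (x_0, x_1, x_2).
probs = {xs: sequence_probability(rho0, unitaries, xs, dS, dE)
         for xs in itertools.product(range(dS), repeat=steps)}
print(sum(probs.values()))  # approximately 1.0
```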

This can be rewritten, clearly separating the influence of the environment from that of the interventions, in a multi-time generalization of the Born rule21,22,23:

$${\mathbb{P}}({x}_{k},\ldots ,{x}_{0}| {{\mathcal{J}}}_{k},\ldots ,{{\mathcal{J}}}_{0})={\rm{tr}}\left[{{\Upsilon }}{{{\Lambda }}}^{{\rm{T}}}\right],$$
(1)

where T denotes transpose, \({{\Lambda }}:={{\mathcal{A}}}_{{x}_{0}}\otimes \cdots \otimes {{\mathcal{A}}}_{{x}_{k}}\), and the effects on the system due to interaction with the environment have been isolated in the so-called process tensor ϒ. We have depicted ϒ and Λ in Fig. 1a as the red and green comb-like regions, respectively. A circuit depiction of the same process ϒ, along with the instruments Λ is given in Fig. 1b.

Maps like the process tensor are abstract objects with many different representations12. In this manuscript, for convenience, we work with the Choi state representation12,24 of the process tensor, shown in Eq. (10) of the Methods section. The process tensor ϒ is a complete representation of the stochastic quantum process, containing all accessible multi-time correlations25,26,27,28. Similarly, the tensor Λ contains all of the details of the instruments and their outcomes. This tensor is, in general, also a quantum comb, where the bond represents information fed forward through an ancillary system. Finally, the process tensor can be formally shown to be the quantum generalization of a classical stochastic process: it satisfies a generalized extension theorem, with consistency conditions for a family of joint probabilities that guarantee the existence of an underlying continuous quantum stochastic process13, and it reduces to a classical stochastic process in the correct limit29,30.
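For a single step, the process tensor reduces to the Choi state of a channel. A minimal sketch of this Choi-Jamiołkowski construction follows, assuming the normalized convention \((1/d){\sum }_{ij}{\mathcal{E}}(\left|i\right\rangle \left\langle j\right|)\otimes \left|i\right\rangle \left\langle j\right|\) with the output factor first, and using an amplitude-damping channel purely as an illustrative example; the helper names are hypothetical.

```python
import numpy as np

def apply_kraus(kraus, rho):
    """Apply the CP map with Kraus operators `kraus` to the operator `rho`."""
    return sum(K @ rho @ K.conj().T for K in kraus)

def choi_state(kraus, d):
    """Normalized Choi state sum_ij E(|i><j|) (x) |i><j| / d, output factor first."""
    choi = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            unit = np.zeros((d, d), dtype=complex)
            unit[i, j] = 1.0
            choi += np.kron(apply_kraus(kraus, unit), unit)
    return choi / d

def channel_from_choi(choi, rho, d):
    """Recover E(rho) = d * tr_in[choi (1 (x) rho^T)] from the Choi state."""
    m = (choi @ np.kron(np.eye(d), rho.T)).reshape(d, d, d, d)  # [out, in, out', in']
    return d * np.trace(m, axis1=1, axis2=3)

gamma = 0.3  # illustrative damping strength
kraus = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
         np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)]
choi = choi_state(kraus, d=2)

# Trace preservation <=> tracing out the output leaves the maximally mixed input.
input_marginal = np.trace(choi.reshape(2, 2, 2, 2), axis1=0, axis2=2)
print(np.allclose(input_marginal, np.eye(2) / 2))  # True

rho = np.array([[0.3, 0.1 + 0.2j], [0.1 - 0.2j, 0.7]])
print(np.allclose(channel_from_choi(choi, rho, 2), apply_kraus(kraus, rho)))  # True
```

The Choi state is itself a valid density operator, which is what lets temporal questions about the channel be phrased as questions about a state.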

Measuring non-Markovianity

The convenience of using the Choi state ϒ is that it translates temporal correlations between timesteps into spatial correlations. Furthermore, as detailed in the Methods section on the process tensor, ϒ can be efficiently described when written as a matrix product operator11,31, whose bond dimension represents the dimension of a quantum environment that could mediate the non-Markovian correlations. In particular, when the bond dimension is one, the process is Markovian. Specifically, a process ϒ(M) is Markovian if and only if it has the form

$${{{\Upsilon }}}^{({\rm{M}})}={{\mathcal{E}}}_{1:0}\otimes \cdots \otimes {{\mathcal{E}}}_{k:k-1},$$
(2)

with \({{\mathcal{E}}}_{j:i}\) a CPTP map on the system connecting the ith to the (i + 1)th time12,20. This quantum Markov condition in Eq. (2) allows for a precise quantification of memory effects; it is fully consistent with the classical Markov condition, and contains all of the popular witnesses of quantum non-Markovianity19. Importantly, it allows for operationally meaningful measures of non-Markovianity. For instance, the relative entropy of the process tensor with respect to the product of its marginals, which happens to be the closest Markovian process tensor, i.e. \({{\mathcal{N}}}_{{\mathcal{S}}}:={\min }_{{{{\Upsilon }}}^{({\rm{M}})}}{\mathcal{S}}({{\Upsilon }}\parallel {{{\Upsilon }}}^{({\rm{M}})})\), quantifies the probability of mistaking ϒ for ϒ(M), which decreases with the number n of realizations of the process as \(\exp (-n{{\mathcal{N}}}_{{\mathcal{S}}})\).
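The relative-entropy measure can be illustrated on the simplest bipartite case. For a two-party state, the minimum of the relative entropy to product states is attained at the product of the marginals and equals the quantum mutual information. The sketch below checks this numerically for a noisy Bell state, a stand-in for a two-leg Choi state chosen only for illustration, and evaluates the asymptotic error scaling \(\exp (-n{{\mathcal{N}}}_{{\mathcal{S}}})\); the state, the weight `p`, and the helper names are assumptions.

```python
import numpy as np

def herm_log(a):
    """Natural matrix logarithm of a positive-definite Hermitian matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.conj().T

def entropy(rho):
    """Von Neumann entropy in nats."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-15]
    return float(-np.sum(w * np.log(w)))

def relative_entropy(rho, sigma):
    """S(rho || sigma) = tr[rho (log rho - log sigma)], assuming full-rank arguments."""
    return float(np.real(np.trace(rho @ (herm_log(rho) - herm_log(sigma)))))

def marginals(rho):
    r = rho.reshape(2, 2, 2, 2)  # indices [a, b, a', b']
    return np.trace(r, axis1=1, axis2=3), np.trace(r, axis1=0, axis2=2)

p = 0.8  # weight of the Bell state; the rest is white noise, so rho is full rank
bell = np.zeros(4, dtype=complex)
bell[[0, 3]] = 1 / np.sqrt(2)
rho = p * np.outer(bell, bell.conj()) + (1 - p) * np.eye(4) / 4

rho_a, rho_b = marginals(rho)
n_s = relative_entropy(rho, np.kron(rho_a, rho_b))
mutual_info = entropy(rho_a) + entropy(rho_b) - entropy(rho)
print(np.isclose(n_s, mutual_info))  # True

# Asymptotic scaling of the probability of mistaking rho for the product of its marginals.
for n in (1, 10, 100):
    print(n, np.exp(-n * n_s))
```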

For the current considerations, a natural choice is the so-called diamond norm. Just as trace distance is a natural metric for differentiating two quantum states, in the sense of having a clear operational definition, the natural distance for differentiating two quantum channels is the diamond norm, which allows for the use of additional ancillas32. We are interested in optimally differentiating a non-Markovian process from a Markovian one, which leads to the multi-time diamond distance:

$${{\mathcal{N}}}_{\blacklozenge }:=\frac{1}{2}\mathop{{{\min}}}\limits_{{{{\Upsilon }}}^{({\rm{M}})}}\parallel {{\Upsilon }}-{{{\Upsilon }}}^{({\rm{M}})}{\parallel }_{\blacklozenge },$$
(3)

where \(\parallel X{\parallel }_{\blacklozenge }:={\sup }_{\{{{\mathcal{O}}}_{i}\},i}\parallel {\sum }_{i}{\rm{tr}}\left[{{\mathcal{O}}}_{i}\left(X\otimes {\mathbb{1}}\right)\right]\left|i\right\rangle \left\langle i\right|{\parallel }_{1}\) is a generalized diamond norm27,33, with the supremum taken over i ≥ 1 and over sets of CP maps \(\{{{\mathcal{O}}}_{i}\}\). This definition generalizes the diamond norm for quantum channel distinguishability34 (also called cb-norm35 or completely bounded trace norm24) and reduces to it for a single-step process tensor. Similarly, it can be interpreted as the optimal probability of discriminating a process from the closest Markovian one in a single shot, given any set of measurements, which may be performed together with an ancilla.
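Computing the generalized diamond norm requires an optimization (for channels it is usually posed as a semidefinite program), so the sketch below only illustrates its single-step operational meaning with a simpler quantity. Feeding half of a maximally entangled state into each channel is one admissible strategy, so the trace distance between normalized Choi states lower-bounds the diamond distance, and the Helstrom formula converts a trace distance into an optimal single-shot success probability for equal priors. The channels compared (identity versus amplitude damping) and the helper names are illustrative assumptions.

```python
import numpy as np

def choi_state(kraus, d):
    """Normalized Choi state sum_ij E(|i><j|) (x) |i><j| / d."""
    choi = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            unit = np.zeros((d, d), dtype=complex)
            unit[i, j] = 1.0
            out = sum(K @ unit @ K.conj().T for K in kraus)
            choi += np.kron(out, unit)
    return choi / d

def trace_distance(rho, sigma):
    """(1/2)||rho - sigma||_1 for Hermitian arguments."""
    return 0.5 * float(np.sum(np.abs(np.linalg.eigvalsh(rho - sigma))))

def helstrom_success(rho, sigma):
    """Optimal single-shot probability of telling rho from sigma with equal priors."""
    return 0.5 + 0.5 * trace_distance(rho, sigma)

gamma = 0.3
identity = [np.eye(2, dtype=complex)]
damping = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
           np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)]

j_id, j_ad = choi_state(identity, 2), choi_state(damping, 2)
lower_bound = trace_distance(j_id, j_ad)  # never exceeds (1/2)||id - E||_diamond
print(lower_bound, helstrom_success(j_id, j_ad))
```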

Vanishing non-Markovianity in Eq. (3) would imply that the process has the form of Eq. (2). Derivations of such processes typically make ad hoc assumptions, such as artificially refreshing the environment between time-steps (i.e., assuming an infinite bath), which underlie approximations such as the Born-Markov approximation. Classical processes additionally require randomness to be injected by hand for stochasticity. Here, we show that a class of underlying quantum mechanisms leads to the emergence of Markovianity without ad hoc assumptions. Namely, we show that the measure of non-Markovianity in Eq. (3) vanishes as the global SE dynamics becomes more complex. This is entirely analogous to entanglement being the underlying mechanism that explains the emergence of statistical mechanics from quantum dynamics alone and accounts for the artificial postulate of equal a priori probabilities3.

Markovianization with unitary designs

The generic form of open quantum dynamics is non-Markovian but, despite this, it is often very well approximated by simpler Markovian dynamics. How this memorylessness emerges is not dissimilar to questions regarding the emergence of thermodynamic behavior, which have pervaded quantum mechanics since its conception. Indeed, it can be shown that canonical quantum states are typical36,37,38,39, and we now know that the fundamental postulate of equal a priori probabilities of statistical mechanics can be traced back to the entanglement between subsystems and their environment3. It turns out that, very similarly, if we sample a generic quantum process occurring in a large finite environment at random, it will be almost Markovian with very high probability9.

This sampling procedure can be formalized through the so-called Haar probability measure, μh, over the d-dimensional unitary group \({\mathbb{U}}(d)\), which is the unique (up to a multiplicative constant) measure with the property that, if \(U\in {\mathbb{U}}(d)\) is distributed according to the Haar measure, then so is any composition UV or VU, with a fixed \(V\in {\mathbb{U}}(d)\). It can be normalized to one, so as to constitute a legitimate probability measure40. The Haar measure allows one to swiftly obtain statistical properties of uniformly distributed quantities40,41,42,43,44,45 and, furthermore, to prove concentration of measure results46,47,48; these somewhat surprisingly imply that, when drawn from the right distribution, certain quantities will become overwhelmingly likely to be close to another fixed quantity as the Hilbert space dimension is increased. Henceforth, we write U ~ μh to refer to U as distributed according to the Haar measure and, similarly, we use \({{\mathbb{P}}}_{{\mathsf{h}}}\) and \({{\mathbb{E}}}_{{\mathsf{h}}}\) to denote probabilities and expectations with respect to the Haar measure.
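Sampling from the Haar measure is straightforward in practice. A common recipe, sketched below with NumPy's QR routine, takes the QR decomposition of a matrix of i.i.d. complex Gaussian entries and fixes the phases of the diagonal of R so that the resulting distribution is exactly Haar. As a sanity check, a single entry satisfies \({{\mathbb{E}}}_{{\mathsf{h}}}[| {U}_{00}{| }^{2}]=1/d\); the function name and sample sizes are illustrative choices.

```python
import numpy as np

def haar_unitary(d, rng):
    """Draw U ~ mu_h on U(d) via QR of a complex Ginibre matrix with phase correction."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    diag = np.diag(r)
    # Without this correction the distribution depends on the QR sign convention.
    return q * (diag / np.abs(diag))

rng = np.random.default_rng(seed=2024)
d, samples = 4, 10_000
draws = [haar_unitary(d, rng) for _ in range(samples)]

print(np.allclose(draws[0].conj().T @ draws[0], np.eye(d)))  # True
second_moment = np.mean([abs(U[0, 0]) ** 2 for U in draws])
print(second_moment, 1 / d)  # the empirical mean should be close to 0.25
```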

The result of Modi et al. on Markovian typicality9, which is reproduced in detail in the Methods section, establishes a mathematically sound concentration of measure around Markovian processes. However, it assumes a Haar-distributed uniform sampling of unitary dynamics, and we know that nature seldom behaves randomly49,50. The dynamics of a vast number of physically relevant models can be approximated as Markovian51, so can we say that these also satisfy a concentration of measure with respect to Markovianity?

In some circumstances, sets of physical processes can approximate some of the statistical features of the Haar measure1,52,53,54. For example, consider the toy model depicted in Fig. 2, comprising a dilute gas of n particles evolving autonomously in a closed box. The gas particles interact with each other in one of two ways as they randomly move inside the box. Following and intervening on a special impurity particle, taken to be the system, this model can be approximately described by a circuit such as the one in Fig. 1b. The simplicity of this system suggests that it can only uniformly randomize after a large number of random two-qubit interactions, progressively resembling genuine Haar-random dynamics.

Fig. 2: A toy model analogous to a system with dynamics given by an approximate unitary design with two kinds of two-qubit interactions only.
figure 2

An impurity particle (teal) immersed in a gas of nE particles (arrows depicting direction of motion) within a closed box, where all particles interact in pairs in one of two ways (dashed circles) at random, can be similarly described by an approximate unitary design. The result of Theorem 1 ensures that for a large enough nE and number of interactions, most processes analogous to this one with approximate unitary designs will be almost Markovian.

One possible way to quantify this progressive resemblance of the Haar measure is given by the concept of unitary designs. In general an ϵ-approximate t-design, which we denote \({\mu }_{{{\mathsf{t}}}_{\epsilon }}\), can be defined through

$$\left\Vert {{\mathbb{E}}}_{{{\mathsf{t}}}_{\epsilon }}\left[{{\mathcal{V}}}^{\otimes s}(X)\right]-{{\mathbb{E}}}_{{\mathsf{h}}}\left[{{\mathcal{U}}}^{\otimes s}(X)\right]\right\Vert \le \epsilon ,\ \forall s\le {\mathsf{t}}$$
(4)

for a suitable norm \(\Vert \cdot \Vert\), where \({\mathcal{U}}(\cdot ):=U(\cdot ){U}^{\dagger }\) and \({\mathcal{V}}(\cdot ):=V(\cdot ){V}^{\dagger }\) are unitary maps with \(U,V\in {\mathbb{U}}(d)\). Here, as above, the notation \({{\mathbb{E}}}_{{{\Omega }}}\) indicates the expectation value with respect to a given probability measure μΩ, i.e. \(V \sim {\mu }_{{{\mathsf{t}}}_{\epsilon }}\) and U ~ μh. That is, \({\mu }_{{{\mathsf{t}}}_{\epsilon }}\) approximates the Haar measure up to the tth moment with a small error ϵ. In the case of interest here, the unitary maps correspond to SE unitaries, as depicted in Fig. 1a, distributed according to either the Haar measure or a unitary design. We do not assume anything about the parameter t other than that it is a positive integer.

Notice what this would mean for a model similar to that of Fig. 2: as individual random two-body interactions of each kind accumulate, we expect the dynamics to start scrambling information across the whole gas in the box, progressively becoming more complex and uniformly random55. Unitary designs give us a finite quantification of the approximation to uniform Haar randomness and, in this case, a precise way to account for the progressive emergence of complexity from seemingly simple individual two-body interactions.

Unitary designs for t = 2, 3 have been widely studied56,57,58,59,60,61,62,63,64,65 and efficient constructions are known for larger values of t14,59,66. The latter are of particular relevance, precisely as designs for large t, i.e., those with a higher complexity55, are expected to satisfy tighter large deviation bounds, approaching concentration of measure as the level and quality of the design increases.
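Checking the definition in Eq. (4) directly requires averaging t-fold tensor powers. For exact designs an equivalent and cheaper test is the frame potential \({F}_{t}=\frac{1}{| {\mathcal{S}}{| }^{2}}{\sum }_{U,V\in {\mathcal{S}}}| {\rm{tr}}({U}^{\dagger }V){| }^{2t}\), which is never below its Haar value and equals it exactly when the ensemble \({\mathcal{S}}\) is a t-design. The sketch below swaps in this criterion (it is not the norm used in Eq. (4)) for the single-qubit Clifford group, a standard example that is an exact 3-design but not a 4-design; for d = 2 the Haar values are the Catalan numbers 1, 2, 5, 14. The helper names are hypothetical.

```python
import itertools
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]], dtype=complex)

def phase_free_key(U):
    """Hashable key for U up to a global phase."""
    flat = U.ravel()
    first = flat[np.flatnonzero(np.abs(flat) > 1e-9)[0]]
    V = np.round(U * (abs(first) / first), 8) + 0.0  # "+ 0.0" turns -0.0 into 0.0
    return tuple(np.concatenate([V.real.ravel(), V.imag.ravel()]))

def generate_group(generators):
    """Breadth-first closure of a finite matrix group, modulo global phases."""
    identity = np.eye(2, dtype=complex)
    elements = {phase_free_key(identity): identity}
    frontier = [identity]
    while frontier:
        new_frontier = []
        for U in frontier:
            for G in generators:
                W = G @ U
                key = phase_free_key(W)
                if key not in elements:
                    elements[key] = W
                    new_frontier.append(W)
        frontier = new_frontier
    return list(elements.values())

def frame_potential(ensemble, t):
    total = sum(abs(np.trace(U.conj().T @ V)) ** (2 * t)
                for U, V in itertools.product(ensemble, repeat=2))
    return total / len(ensemble) ** 2

clifford = generate_group([H, S])
haar_values = {1: 1, 2: 2, 3: 5, 4: 14}  # Haar frame potentials for d = 2
print(len(clifford))  # 24
for t in range(1, 5):
    print(t, round(frame_potential(clifford, t), 6), haar_values[t])
```

For t = 4 the Clifford frame potential evaluates to 15 rather than 14, consistent with the group failing to be a 4-design.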

Such large deviation bounds over approximate unitary designs were derived in a general form by R. Low10 for a polynomial function satisfying a concentration of measure bound, and we now use them to demonstrate the phenomenon of Markovianization for corresponding classes of processes.

Theorem 1

Given a k-step process ϒ on a dS-dimensional subsystem, generated by global unitary dynamics on the dSE-dimensional composite SE distributed according to an ϵ-approximate unitary t-design \({\mu }_{{{\mathsf{t}}}_{\epsilon }}\), the probability that its non-Markovianity exceeds any δ > 0 is bounded as

$${{\mathbb{P}}}_{{{\mathsf{t}}}_{\epsilon }}[\ {{\mathcal{N}}}_{\blacklozenge }\ge \delta ]\le {\mathsf{B}},$$
(5)

where B is defined as

$${\mathsf{B}}:=\frac{{d}_{{\mathsf{S}}}^{3m(2k+1)}}{{\delta }^{2m}}\left[{\left(\frac{m}{{\mathcal{C}}}\right)}^{m}+{(2{\mathcal{B}})}^{2m}+\frac{\epsilon }{{d}_{{\mathsf{SE}}}^{\ {\mathsf{t}}}}{\eta }^{2m}\right],$$
(6)

for any \(m\in (0,{\mathsf{t}}/4]\) and

$$\eta :=\left({d}_{{\mathsf{SE}}}^{4}{d}_{{\mathsf{S}}}^{2k}+{d}_{{\mathsf{S}}}^{-(2k+1)}\right)/4,$$
(7)

where \({\mathcal{C}}\) is defined in Eq. (14) and \({\mathcal{B}}\) an upper bound on the expected norm-1 non-Markovianity \({{\mathbb{E}}}_{{\mathsf{h}}}[{{\mathcal{N}}}_{1}]\), defined in Eq. (15).

The proof is displayed in full in the Methods section. The overall strategy follows that of R. Low10: a bound on the moments \({{\mathbb{E}}}_{{{\mathsf{t}}}_{\epsilon }}[{{\mathcal{N}}}_{\blacklozenge }^{\ m}]\) is given in terms of \({\mathcal{B}}\), \({\mathcal{C}}\) and η, and Markov's inequality is then applied. The quantity η is related to the ϵ-approximate unitary t-design \({\mu }_{{{\mathsf{t}}}_{\epsilon }}\) through

$${{\mathbb{E}}}_{{{\mathsf{t}}}_{\epsilon }}\left[{{\mathcal{N}}}_{2}^{\ 2m}\right]\le {{\mathbb{E}}}_{{\mathsf{h}}}\left[{{\mathcal{N}}}_{2}^{\ 2m}\right]+\frac{\epsilon }{{d}_{{\mathsf{SE}}}^{\ {\mathsf{t}}}}\ {\eta }^{2m},$$
(8)

for any m > 0, where η corresponds to the sum of the moduli of the coefficients of \({{\mathcal{N}}}_{2}^{\ 2}\). We explicitly determine a bound on this quantity within the proof of Theorem 1 in the Methods section, and it is this bound that we take as the definition in Eq. (7).
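The final step of this strategy, Markov's inequality applied to a moment, \({\mathbb{P}}[X\ge \delta ]\le {\mathbb{E}}[{X}^{2m}]/{\delta }^{2m}\) for a nonnegative X, can be illustrated numerically. In the sketch below, which uses a stand-in random variable rather than the non-Markovianity itself, X is the squared overlap between a Haar-random pure state in dimension d and a fixed state (distributed as Beta(1, d − 1)); scanning m shows how the choice of moment changes the tail bound, mirroring the optimization over m in Theorem 1. The sample sizes, δ and grid of m values are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
d, samples, delta = 32, 50_000, 0.1

# Squared overlap |<0|psi>|^2 of Haar-random pure states psi in dimension d.
z = rng.standard_normal((samples, d)) + 1j * rng.standard_normal((samples, d))
x = np.abs(z[:, 0]) ** 2 / np.sum(np.abs(z) ** 2, axis=1)

empirical_tail = np.mean(x >= delta)
moment_bounds = {m: np.mean(x ** (2 * m)) / delta ** (2 * m) for m in (0.5, 1, 2, 4, 8)}
best_m = min(moment_bounds, key=moment_bounds.get)

print("empirical P[X >= delta]:", empirical_tail)
for m, bound in moment_bounds.items():
    print(f"m = {m}: bound = {bound:.3e}")
print("tightest bound at m =", best_m)
```

Higher moments are not automatically better: beyond some m the growth of the moment outpaces the gain from the factor \({\delta }^{-2m}\), which is why m is optimized rather than taken as large as possible.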

The choice of \(0 < m\le {\mathsf{t}}/4\) can be made to optimize the right-hand side of the inequality, which ideally should be small whenever δ is. The factor \({d}_{{\mathsf{S}}}^{3(2k+1)}/{\delta }^{2}\), raised to the power m in Eq. (6), arises from bounding \({{\mathcal{N}}}_{\blacklozenge}\) and applying Markov's inequality, while the three summands within square brackets will be small provided i) \({\mathcal{C}}\) is large, ii) \({\mathcal{B}}\) is small, and iii) the unitary design is well-approximate and of high enough order, i.e., ϵ is sufficiently small and t sufficiently large. For conditions i) and ii), we require a fixed k such that \({d}_{{\mathsf{E}}}\gg {d}_{{\mathsf{S}}}^{2k+1}\): this implies \({\mathcal{B}}\approx 0\), so that, ignoring subleading terms, we require \(\epsilon \ll {\delta }^{2m}{(2{d}_{{\mathsf{E}}}^{-2}{d}_{{\mathsf{S}}}^{-(10k+11)/4})}^{4m}{d}_{{\mathsf{SE}}}^{\ {\mathsf{t}}}\) for a meaningful bound, as detailed in the Methods section on Convergence towards Markovianity.
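Because the prefactor in Eq. (6) grows quickly, the bound is best evaluated in log space. The sketch below does so and optimizes m on a grid over (0, t/4]. The quantities \({\mathcal{C}}\) and \({\mathcal{B}}\) are defined in the Methods section (Eqs. (14) and (15)) and are not reproduced here, so the values passed in the usage example are hypothetical placeholders rather than the ones behind Fig. 3; the function names are likewise illustrative.

```python
import math

def log_bound(dS, dE, k, t, eps, delta, m, C, B):
    """Natural log of the right-hand side B of Eq. (6) for one choice of m in (0, t/4]."""
    if not 0 < m <= t / 4:
        raise ValueError("m must lie in (0, t/4]")
    dSE = dS * dE
    eta = (float(dSE) ** 4 * dS ** (2 * k) + dS ** (-(2 * k + 1))) / 4  # Eq. (7)
    log_terms = [
        m * math.log(m / C),                                         # (m / C)^m
        2 * m * math.log(2 * B),                                     # (2B)^(2m)
        math.log(eps) - t * math.log(dSE) + 2 * m * math.log(eta),   # eps eta^(2m) / dSE^t
    ]
    top = max(log_terms)  # log-sum-exp for the bracketed sum
    log_bracket = top + math.log(sum(math.exp(v - top) for v in log_terms))
    return 3 * m * (2 * k + 1) * math.log(dS) - 2 * m * math.log(delta) + log_bracket

def optimized_log_bound(dS, dE, k, t, eps, delta, C, B, grid=400):
    """Minimize log_bound over a uniform grid of m values in (0, t/4]."""
    ms = [(t / 4) * (i + 1) / grid for i in range(grid)]
    return min((log_bound(dS, dE, k, t, eps, delta, m, C, B), m) for m in ms)

# Hypothetical placeholder values for C and B; see Eqs. (14)-(15) for their definitions.
log_b, best_m = optimized_log_bound(dS=2, dE=2**40, k=2, t=10, eps=1e-12,
                                    delta=0.1, C=1e9, B=1e-6)
print(f"log B = {log_b:.2f} at m = {best_m:.3f}")
```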

Overall, the bound in Eq. (5) approaches concentration whenever dE is large relative to dS and k, together with large enough t, as shown in Fig. 3. By inspection, the bound vanishes polynomially in dE and exponentially in t (upon appropriate choice of the parameter m), and it loosens polynomially in dS and exponentially in k. Therefore, in this limit, the vast majority of processes sampled from such a t-design are nearly indistinguishable from Markovian ones. Intuitively, processes of small subsystems in large environments (\({d}_{{\mathsf{E}}}\gg {d}_{{\mathsf{S}}}^{2k+1}\)) undergoing complex enough dynamics (large enough t) will look almost Markovian with high probability, provided the system is probed only a few times (small k). We now show how these processes can be modeled in terms of random circuits.

Fig. 3: Upper bound on the probability for non-Markovianity to exceed a small amount for processes with distinct number of interventions and design dynamics against environment size.
figure 3

Upper bound B, defined by Eq. (6), on \({{\mathbb{P}}}_{{{\mathsf{t}}}_{\epsilon }}[{{\mathcal{N}}}_{\blacklozenge }\ge 0.1]\), the probability \({{\mathbb{P}}}_{{{\mathsf{t}}}_{\epsilon }}\) over an ϵ-approximate t-design for the non-Markovianity \({{\mathcal{N}}}_{\blacklozenge }\) to exceed δ = 0.1, against log2(dE), where dE is the environment dimension, for a subsystem qubit undergoing a joint closed approximate unitary design interaction between a given number of interventions k. We fix an \(\epsilon ={10}^{-12}\) approximate unitary t-design for different values 2 ≤ t ≤ 10 and fixed values of timesteps k, optimizing m for each case.

Markovianization by circuit design

While no explicit sets forming unitary t-designs for t ≥ 4 are known to date, several efficient constructions generating approximate unitary designs by quantum circuits are known. Using these constructions we can highlight the physical implications of the theorem above. We begin by discussing the details of one such construction. As suggested in Fig. 1b, this construction only requires simple two-qubit interactions and, under certain conditions, yields an approximate unitary design, from which we can use Eq. (5) in our main Theorem to verify that Markovianization emerges.

We focus specifically on Result 2 by Winter et al.14, reproduced in the Methods section on efficient unitary designs, where a circuit with interactions mediated by two-qubit diagonal gates with three random parameters is introduced. The intuition behind this construction is that repeated alternate applications of these diagonal gates quickly randomize the system. Notice that this idea fully captures the gas scenario depicted in Fig. 2, where only two types of random two-body interactions repeatedly occur and we focus on one of the particles of the gas.

We illustrate this idea in Fig. 4, where we depict an n-qubit SE composite with k interventions on one of the qubits; the interactions within the circuit act only between pairs of qubits and are of only two kinds. These form blocks of unitaries between each time-step i that we label \({{\mathcal{W}}}_{{\ell }_{i}}\), where ℓ is related to the number of two-qubit interactions, as explicitly defined in Eq. (42). Result 2 of Winter et al.14 states that for an n-qubit system, when t is of order \(\sqrt{n}\), a circuit \({{\mathcal{W}}}_{\ell }\) yields an ϵ-approximate unitary t-design if \(\ell \gtrsim {\mathsf{t}}-{\log }_{2}(\epsilon )/n\), up to leading order in n and t.
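The alternating structure can be simulated directly for a few qubits. The statevector sketch below is a hypothetical illustration in the spirit of the \({{\mathcal{W}}}_{\ell }\) circuits, not a reproduction of Eq. (42): each layer applies a random two-qubit diagonal gate \({\rm{diag}}(1,{e}^{i\alpha },{e}^{i\beta },{e}^{i\gamma })\) (three random phases) to every pair of qubits, and Hadamard layers switch between the Z and X bases. It tracks the purity of one probed qubit, which can be compared against the Haar-average value \((2+{2}^{n-1})/({2}^{n}+1)\) for a random global pure state. The layer layout, qubit count and helper names are assumptions.

```python
import itertools
from functools import reduce
import numpy as np

def random_diagonal_layer(n, rng):
    """Phases of a product of random two-qubit diagonal gates on every pair of qubits."""
    idx = np.arange(2**n)
    bits = (idx[:, None] >> (n - 1 - np.arange(n))) & 1  # bits[s, q] = bit of qubit q in state s
    phase = np.zeros(2**n)
    for i, j in itertools.combinations(range(n), 2):
        angles = np.concatenate([[0.0], rng.uniform(0, 2 * np.pi, size=3)])
        phase += angles[2 * bits[:, i] + bits[:, j]]
    return np.exp(1j * phase)

def probe_purity(psi, n):
    """Purity of the reduced state of qubit 0 for an n-qubit pure state."""
    m = psi.reshape(2, 2 ** (n - 1))
    rho = m @ m.conj().T
    return float(np.real(np.trace(rho @ rho)))

n, repetitions = 8, 4
rng = np.random.default_rng(seed=5)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
H_all = reduce(np.kron, [H] * n)  # Hadamard on every qubit: Z basis <-> X basis

psi = np.zeros(2**n, dtype=complex)
psi[0] = 1.0
purities = [probe_purity(psi, n)]
for _ in range(repetitions):
    psi = random_diagonal_layer(n, rng) * psi                      # diagonal in the Z basis
    psi = H_all @ (random_diagonal_layer(n, rng) * (H_all @ psi))  # diagonal in the X basis
    purities.append(probe_purity(psi, n))

print(purities)
print("Haar average:", (2 + 2 ** (n - 1)) / (2**n + 1))
```

Because all gates within one layer are diagonal in the same basis, they commute, so each layer counts as a single step of non-commuting depth.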

Fig. 4: Circuit diagram for a quantum process which can Markovianize under only two different types of 2-qubit interaction dynamics.
figure 4

For an n-qubit system (where each \(\left|0\right\rangle\) is a single qubit), the unitaries \({{\mathcal{W}}}_{\ell }\), composed of alternate repetitions of only two distinct types of random interactions (depicted by diamonds and squares joined by the interacting qubits) and defined by Eq. (42), generate an ϵ-approximate unitary t-design whenever \(\ell \gtrsim {\mathsf{t}}-{\log }_{2}(\epsilon )/n\), as shown by Winter et al.14. This can be thought of as stemming from repeated alternate applications of random 2-qubit gates diagonal in only two Pauli bases. A qubit probed with a set of operations \(\{{{\mathcal{A}}}_{i}\}\) on a system undergoing ϵ-approximate unitary t-design dynamics \({{\mathcal{W}}}_{\ell }\) on a large environment will Markovianize for small design error ϵ and large complexity t, as specified in the main text.

Furthermore, of great relevance in this result is the fact that almost all 2-qubit gates in each repetition of \({{\mathcal{W}}}_{\ell }\) can be applied simultaneously because they commute64,67. Therefore, if \({{\mathcal{W}}}_{\ell }\) yields an approximate unitary design as above, the order of the non-commuting gate depth \({\mathfrak{D}}\), defined by Winter et al.67 as the circuit depth when each commuting part of the circuit is counted as a single part, will coincide with the bound on the order of the number of repetitions ℓ. That is, the non-commuting gate depth asymptotes to

$${\mathfrak{D}} \sim {\mathsf{t}}-{{{\log}}}_{2}(\epsilon )/n.$$
(9)
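Eq. (9) is a leading-order scaling rather than an exact count, but it is easy to evaluate. The sketch below is a hypothetical helper that drops all constants and subleading terms, computes \({\mathsf{t}}-{\log }_{2}(\epsilon )/n\), and rounds it up to an integer number of repetitions.

```python
import math

def depth_estimate(t, eps, n):
    """Leading-order non-commuting depth t - log2(eps)/n from Eq. (9); constants dropped."""
    return t - math.log2(eps) / n

for n in (10, 20, 30, 40):
    est = depth_estimate(t=10, eps=1e-12, n=n)
    print(n, round(est, 3), math.ceil(est))
```

For instance, t = 10, \(\epsilon ={10}^{-12}\) and n = 30 give about 11.33; the ϵ-dependent term shrinks as 1/n, so the estimate approaches t for large systems.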

We can now think of the system from the toy model of Fig. 2 as a spin locally interacting with a large, nE-qubit environment via a random time-independent Hamiltonian, with Eq. (5) statistically predicting under which conditions memory effects can be neglected. Notice that this is only a physical picture evoked by the \({{\mathcal{W}}}_{\ell }\) circuits rather than exactly the model they describe. In Fig. 5 we take such a system for a single qubit and demand a bound B ≤ 0.01 on the probability \({{\mathbb{P}}}_{{{\mathsf{t}}}_{\epsilon }}[{{\mathcal{N}}}_{\blacklozenge }\ge 0.1]\) for a k = 2 timestep process; with this, we plot the scaling of the non-commuting gate depth \({\mathfrak{D}}\) required to achieve an \(\epsilon ={10}^{-12}\) approximate unitary t-design using \({{\mathcal{W}}}_{\ell }\) circuits for different values of 2 ≤ t ≤ 10. While the number of 2-qubit gates is on the order of \({10}^{4}\), the number of repetitions ℓ is at most 12 for an approximate 10-design and stays mostly constant as the number of environment qubits increases.

Fig. 5: Scaling of the non-commuting gate depth of the approximate unitary design by Winter et al.14 for a 2-step process on a single qubit to Markovianize with respect to environment size.
figure 5

Scaling of the non-commuting gate depth \({\mathfrak{D}}\), given by the minimum number of alternate repetitions of the two kinds of random two-qubit diagonal gates within the unitary \({{\mathcal{W}}}_{\ell }\), plotted against the number of environment qubits nE = log2(dE), required to generate an \(\epsilon ={10}^{-12}\) approximate unitary t-design for 2 ≤ t ≤ 10. This is such that, for a single-qubit system undergoing a process with k = 2 timesteps, the probability \({{\mathbb{P}}}_{{{\mathsf{t}}}_{\epsilon }}\) for the non-Markovianity \({{\mathcal{N}}}_{\blacklozenge }\) to exceed 0.1 is at most 0.01, i.e. \({{\mathbb{P}}}_{{{\mathsf{t}}}_{\epsilon }}[{{\mathcal{N}}}_{\blacklozenge }\ge 0.1]\le {\mathsf{B}}\le 0.01\).

This construction naturally accommodates the cartoon example in Fig. 2. As long as the two interactions in the example together generate the necessary level of complexity, Markovianization will emerge. This shows, in principle, how simple dynamics described by approximate unitary designs can Markovianize under the right conditions. Moreover, taking the physical interpretation of a qubit locally interacting through two-qubit diagonal unitaries with a large environment, it also hints at how macroscopic systems can display Markovianization of small subsystem dynamics in circuits requiring just a small gate depth. Furthermore, for macroscopic systems with coarse observables, the same Markovianization behavior would remain resilient to a much larger number of interventions.

Discussion

We have shown how physical quantum processes Markovianize, i.e., forget the past, for a class of physically motivated systems that can finitely approximate random ones. Forgetfulness is indeed a common feature of the world around us, and one that is crucial for doing science: without forgetfulness, repeatability would be impossible. After all, if each carbon atom remembered its own past, it would be unique and there would be no sense in classifying atoms and molecules. Beyond these foundational considerations, our results have direct consequences for the study of open systems using standard tools, such as master equations and dynamical maps. The latter can be seen as a family of one-step process tensors (with initial SE correlations, a minimum of two steps must be considered16,68). Specifically, our results for the case of k ≤ 2 can be used to estimate the time scale, using gate depth as a proxy, on which an approximate unitary design's open dynamics can be described (with high probability) with a truncated memory kernel2,69,70, or even a Markovian master equation.

Conversely, for larger k, our results would have implications for approximations made in computing higher order correlation functions, such as the quantum regression theorem71. These higher order approximations are independent of those at the level of dynamical maps, which can, e.g., be divisible, even when the process is non-Markovian72. This is reflected in the loosening behavior of the bound in Eq. (5) as the number of timesteps increases, which can be interpreted as a growing potential for temporal correlations to become relevant when more information about the process is accessible.

This breadth of applicability contrasts with the results of Modi et al.9, who showed that quantum processes satisfy a concentration of measure around Markovian ones with respect to the Haar measure. That result has two main drawbacks. First, as stated above, Haar random interactions do not exist in nature, which limits its relevance. Second, the rate of Markovianization is far too strong: almost all processes sampled according to the Haar measure simply look random, i.e., have Markov order m = 0 even for large k. Unlike our current result, this misses almost all interesting physical dynamical processes. Although our large deviation bound decays polynomially rather than exponentially, and thus does not exhibit concentration per se, we have nevertheless illustrated how, with modestly large environments and relatively simple interactions, almost Markovian processes can arise with high probability. Physical macroscopic environments will be far larger than the scale shown in Figs. 3 and 5.

Despite the fundamental relevance of our result, typicality arguments are well known to have limited reach. For instance, the exotic Hamiltonians introduced by Gemmer et al.73, which lead to strange relaxation, may not Markovianize even though the SE process is highly complex with a large E. There is also significant scope for addressing further physical aspects, such as whether, and how, a time-independent Hamiltonian can give rise to an approximate unitary design14, the relevant time scales of Markovianization, or the potential role of other approaches to pseudo-randomness, such as that of Kastoryano et al.74, who showed that driven quantum systems can converge rapidly to the uniform distribution. Furthermore, interest in thermalization has been renewed by the so-called Eigenstate Thermalization Hypothesis (ETH), a stronger and seemingly more fundamental condition for thermalization75,76,77,78,79,80; we would thus expect a deep connection between Markovianization and thermalization, in the sense of the ETH, to be forthcoming. In any case, it is clear that many physical systems Markovianize at some scale, and it only remains to discover how.

Methods

The process tensor

The Choi state representation of the process tensor is given by

$${{\Upsilon }}={{\rm{tr}}}_{{\mathsf{E}}}\left[\ {{\mathsf{U}}}_{k:0}\left(\rho \otimes {{{\Psi }}}^{\otimes k}\right)\ {{\mathsf{U}}}_{k:0}^{\dagger }\right],$$
(10)

where each Ψ is a maximally entangled state on an ancillary space of dimension \({d}_{{\mathsf{S}}}^{2}\), and where

$${{\mathsf{U}}}_{k:0}:=({U}_{k}\otimes {\mathbb{1}}){{\mathcal{S}}}_{k}\cdots ({U}_{1}\otimes {\mathbb{1}}){{\mathcal{S}}}_{1}({U}_{0}\otimes {\mathbb{1}}),$$
(11)

is a unitary operator acting on the whole SE together with the 2k ancillas. All identities act on the ancillary system, the Ui are SE unitary operators at step i, and \({{\mathcal{S}}}_{i}\) is a SWAP operator between system S and half of the ith ancillary space at the ith time-step of the process.
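As an illustration of Eqs. (10) and (11), the following numpy sketch builds the normalized Choi state ϒ for the smallest nontrivial case, k = 1, with a qubit system and a qubit environment. It is a sketch under stated assumptions: the tensor ordering E, S, A, B (with A, B the single ancilla pair holding Ψ), the random choice of ρ, U0 and U1, and the helper names (`random_unitary`, `swap`, `partial_trace_first`) are ours, not part of the original construction. Besides unit trace and positivity, it checks one structural property: tracing out the final system output leaves the first-step system state tensored with a maximally mixed ancilla half.

```python
import numpy as np

rng = np.random.default_rng(7)


def random_unitary(d):
    """Haar-random d x d unitary from the QR decomposition of a Ginibre matrix."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))


def random_density_matrix(d):
    """Random full-rank density matrix (unit trace, positive semidefinite)."""
    g = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def swap(d):
    """SWAP operator on C^d (x) C^d."""
    s = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            s[i * d + j, j * d + i] = 1.0
    return s


def partial_trace_first(op, d_first, d_rest):
    """Trace out the first tensor factor of an operator on C^d_first (x) C^d_rest."""
    return np.einsum("iaib->ab", op.reshape(d_first, d_rest, d_first, d_rest))


dE, dS = 2, 2
# Assumed tensor ordering: E, S, A, B, with (A, B) the ancilla pair holding Psi (k = 1).
rho = random_density_matrix(dE * dS)                # initial SE state
U0, U1 = random_unitary(dE * dS), random_unitary(dE * dS)
psi = np.eye(dS).reshape(-1) / np.sqrt(dS)          # normalized sum_i |ii> / sqrt(dS)
Psi = np.outer(psi, psi.conj())

I_AB = np.eye(dS * dS)
S1 = np.kron(np.eye(dE), np.kron(swap(dS), np.eye(dS)))  # swaps S <-> A
W = np.kron(U1, I_AB) @ S1 @ np.kron(U0, I_AB)           # Eq. (11) with k = 1

full = W @ np.kron(rho, Psi) @ W.conj().T
Upsilon = partial_trace_first(full, dE, dS**3)           # tr_E, Eq. (10)

print(np.trace(Upsilon).real)                  # unit trace (normalized Choi state)
print(np.linalg.eigvalsh(Upsilon).min())       # non-negative up to round-off

# Causality check: tracing out the final output S leaves rho_1 (x) 1/dS on A, B,
# where rho_1 = tr_E[U0 rho U0^dagger] is the system state fed out at the first step.
rho1 = partial_trace_first(U0 @ rho @ U0.conj().T, dE, dS)
reduced = partial_trace_first(Upsilon, dS, dS * dS)
print(np.allclose(reduced, np.kron(rho1, np.eye(dS) / dS)))
```

The final unitary U1 drops out of that last check because the trace over both E and S is invariant under any unitary acting on E and S.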

The definition in Eq. (10) is a generalization of the standard Choi state for quantum channels, as given by the Choi-Jamiołkowski isomorphism (CJI)24. For quantum channels, the CJI establishes a one-to-one correspondence between a channel and a quantum state on a larger Hilbert space, obtained by letting the channel act on half of a maximally entangled state. The standard definition uses unnormalized maximally entangled states; however, since here we are concerned with the distinguishability of Choi states through the diamond norm in Eq. (3) and through Schatten norms, we normalize the Choi states by definition to avoid carrying a normalization factor in these quantities. An in-depth discussion of the process tensor, its different representations, properties, and relevance is given by Modi et al.12.

As stated in the main text, ϒ can be efficiently described when written as a matrix product operator11,31. A matrix product operator (MPO) gets its name from the representation of an n-body operator \(\hat{O}\) as \(\hat{O}={\sum }_{\{p,q\}}{O}_{{p}_{1}\ldots {p}_{n}}^{{q}_{1}\ldots {q}_{n}}\left|{p}_{1}\ldots {p}_{n}\right\rangle \left\langle {q}_{1}\ldots {q}_{n}\right|\), where the coefficients can be represented as a product of matrices, \({O}_{{p}_{1}\ldots {p}_{n}}^{{q}_{1}\ldots {q}_{n}}={\rm{tr}}[{M}_{1}^{{p}_{1}{q}_{1}}{M}_{2}^{{p}_{2}{q}_{2}}\cdots {M}_{n}^{{p}_{n}{q}_{n}}]\). In particular, a matrix product density operator is an MPO with \({M}_{i}^{{p}_{i}{q}_{i}}={\sum }_{\ell }{A}_{i}^{{p}_{i}\ell }\otimes {({A}_{i}^{{q}_{i}\ell })}^{\dagger }\), where the dimension of the matrices \({A}_{i}^{{p}_{i}\ell }\) is known as the bond dimension. For the process tensor ϒ, these matrices generically correspond to \({M}_{i}^{{r}_{i}{r}_{i-1}^{\prime}{s}_{i}{s}_{i-1}^{\prime}}=\langle {r}_{i}| {U}_{i}| {r}_{i-1}^{\prime}\rangle \otimes \langle {s}_{i}| {U}_{i}^{\dagger }| {s}_{i-1}^{\prime}\rangle\), where \(\left|{r}^{({\prime} )}\right\rangle\) and \(\left|{s}^{({\prime} )}\right\rangle\) are subsystem S basis vectors and Ui is an SE unitary at timestep i11. This means that the bond dimension of ϒ is generically dE; in practice, the effective bond dimension should be much smaller, given that only part of the environment interacts with the system at any given time.
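To make the MPO form concrete, the sketch below contracts a generic periodic MPO into a dense operator using the coefficient formula above, and checks it on a sum of two product operators, which has bond dimension 2. It does not build the process-tensor MPO itself; the function name `mpo_to_dense` and the (d, d, χ, χ) tensor layout are assumptions for illustration. The brute-force loop is only meant for tiny n, since the point of the MPO is precisely to avoid the d^(2n) dense representation.

```python
import numpy as np


def mpo_to_dense(tensors):
    """Contract a periodic (trace-closed) MPO into a dense operator.

    tensors[i] has shape (d, d, chi, chi), and tensors[i][p, q] is the
    chi x chi matrix M_i^{p q}; the coefficients are
    O_{p1..pn}^{q1..qn} = tr[M_1^{p1 q1} M_2^{p2 q2} ... M_n^{pn qn}].
    """
    n = len(tensors)
    d, chi = tensors[0].shape[0], tensors[0].shape[2]
    dims = [d] * n
    dense = np.zeros((d**n, d**n), dtype=complex)
    for p in np.ndindex(*dims):
        for q in np.ndindex(*dims):
            prod = np.eye(chi, dtype=complex)
            for i in range(n):
                prod = prod @ tensors[i][p[i], q[i]]
            dense[np.ravel_multi_index(p, dims), np.ravel_multi_index(q, dims)] = np.trace(prod)
    return dense


rng = np.random.default_rng(1)
d, n, chi = 2, 3, 2
A = [rng.standard_normal((d, d)) for _ in range(n)]
B = [rng.standard_normal((d, d)) for _ in range(n)]

# Bond dimension 2: M_i^{pq} = diag(A_i[p, q], B_i[p, q]), so the trace of the
# matrix product equals prod_i A_i[p_i, q_i] + prod_i B_i[p_i, q_i].
tensors = []
for Ai, Bi in zip(A, B):
    t = np.zeros((d, d, chi, chi))
    t[:, :, 0, 0] = Ai
    t[:, :, 1, 1] = Bi
    tensors.append(t)

O = mpo_to_dense(tensors)
expected = np.kron(np.kron(A[0], A[1]), A[2]) + np.kron(np.kron(B[0], B[1]), B[2])
print(np.allclose(O, expected))
print("MPO entries:", n * d**2 * chi**2, "dense entries:", d ** (2 * n))
```

The storage count n·d²·χ² grows linearly in n, while the dense count d^(2n) grows exponentially; for n = 3 the two are still comparable.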

A non-ambiguous measure of non-Markovianity

As with any distinguishability measure, the non-Markovianity measure of Eq. (3) is not unique, and we choose the diamond norm for the operational significance mentioned above. More generally, however, for any Schatten p-norm \(\parallel X{\parallel }_{p}:={\rm{tr}}{(| X{| }^{p})}^{\frac{1}{p}}\), a similar quantity \({{\mathcal{N}}}_{p}:=\frac{1}{2}{\min }_{{{{\Upsilon }}}^{({\rm{M}})}}\parallel {{\Upsilon }}-{{{\Upsilon }}}^{({\rm{M}})}{\parallel }_{p}\) can be defined, as done with p = 1 by Modi et al.9, whenever ϒ is normalized such that \({\rm{tr}}[{{\Upsilon }}]={\rm{tr}}[{{{\Upsilon }}}^{({\rm{M}})}]=1\). We then have the hierarchy \({{\mathcal{N}}}_{1}\ge {{\mathcal{N}}}_{2}\ge \ldots\), induced by that of the Schatten norms. As \({{\mathcal{N}}}_{\blacklozenge }\) is generally difficult to compute exactly, a particularly useful relation to Eq. (3) is \({d}_{{\mathsf{S}}}^{-2k-1}{{\mathcal{N}}}_{\blacklozenge }\le {{\mathcal{N}}}_{1}\le {{\mathcal{N}}}_{\blacklozenge }\): once \({{\mathcal{N}}}_{1}\) is known, or bounded through another Schatten norm, \({{\mathcal{N}}}_{\blacklozenge }\) is automatically bounded.
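The Schatten norms can be computed from singular values. The following numpy sketch (the helper names are hypothetical) evaluates ‖X‖_p for the difference X of two random density matrices of dimension D = d_S^(2k+1), with d_S = 2 and k = 1, and checks both the hierarchy ‖X‖_1 ≥ ‖X‖_2 ≥ … and the standard conversion ‖X‖_1 ≤ √D ‖X‖_2. X only stands in for ϒ − ϒ^(M); no minimization over Markovian processes is performed.

```python
import numpy as np


def schatten_norm(X, p):
    """Schatten p-norm (tr|X|^p)^(1/p), computed from the singular values of X."""
    s = np.linalg.svd(X, compute_uv=False)
    if np.isinf(p):
        return float(s.max())
    return float(np.sum(s**p) ** (1.0 / p))


def random_density_matrix(D, rng):
    g = rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


rng = np.random.default_rng(0)
D = 2 ** 3  # d_S^(2k+1) with d_S = 2, k = 1
X = random_density_matrix(D, rng) - random_density_matrix(D, rng)

norms = [schatten_norm(X, p) for p in (1, 2, 3, np.inf)]
print(norms)  # non-increasing in p
# ||X||_1 <= sqrt(D) ||X||_2 turns a 2-norm estimate into a 1-norm bound
print(norms[0] <= np.sqrt(D) * norms[1])
```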

Nevertheless, we highlight that, in general, any distinguishability measure \({\mathcal{N}}\) between a process ϒ and the closest Markovian one ϒ(M) captures all non-Markovian features across multiple time steps, i.e., all multi-time phenomena and memory effects20. This is in contrast to other measures of non-Markovianity proposed in recent years, e.g., the trace-distance-based measure19 and others based on divisibility81. In particular, all measures relying on completely positive divisibility can only account for temporal correlations across at most three time-steps and are not sufficient to enforce the multi-time Markov condition82. This is true even in the classical case. Concretely, there are explicit examples of multi-time non-Markovian processes that are completely positive divisible, and which are thus also deemed Markovian by the trace-distance-based measure20,82. On the other hand, if a process satisfies the multi-time Markov condition, then it is completely positive divisible.

In other words, the multi-time Markov condition is strictly stronger than, and implies, the Markov conditions based on completely positive divisibility. This is why we consider the multi-time Markov condition in this manuscript.

Markovian typicality

In general, we say that a function f from a metric space \({\mathfrak{S}}\), with metric \({{{\Delta }}}_{{\mathfrak{S}}}\) and probability measure μσ, to the real numbers satisfies a concentration of measure around its mean if, for \(x\in {\mathfrak{S}}\) distributed according to μσ and any δ > 0,

$${{\mathbb{P}}}_{\sigma }[f(x)\ge {{\mathbb{E}}}_{\sigma }(f)+\delta ]\le {\alpha }_{\sigma }(\delta /{\rm{L}}),$$
(12)

where, as in the remainder of this manuscript, \({{\mathbb{P}}}_{\sigma }\) and \({{\mathbb{E}}}_{\sigma }\) explicitly refer to the probability and expectation with x ~ μσ, and where L > 0 is the so-called Lipschitz constant of f, which satisfies \(| f(x)-f(y)| \le {\rm{L}}\ {{{\Delta }}}_{{\mathfrak{S}}}(x,y)\) for any two points \(x,y\in {\mathfrak{S}}\). Intuitively, a small L means that f varies slowly over this space. Finally, the function ασ is called a concentration rate; it must vanish as δ increases for Eq. (12) to constitute concentration of measure, and it quantifies how strong that concentration is.

A particularly well-known example is concentration of measure on a high-dimensional hypersphere: for any function that does not change too rapidly, i.e., that has a small Lipschitz constant L, its value at a point picked uniformly at random will be close to its mean with high probability; specifically, ασ decays as \(\exp (-c{\delta }^{2})\), with a constant c that grows with the dimension. This is also known as Levy's lemma46, and it has, remarkably, also been used by Winter et al.3 to show that the fundamental postulate of statistical mechanics arises from entanglement.
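A quick Monte Carlo illustration of concentration on the sphere, using only the Python standard library: f(x) = x_1 is 1-Lipschitz with mean zero on the unit sphere in R^n, and the fraction of uniformly sampled points with |f(x)| ≥ δ collapses as n grows. The sample size, the seed, and δ = 0.2 are arbitrary choices made for this illustration.

```python
import math
import random


def random_unit_vector(n, rng):
    """Uniform point on the unit sphere in R^n (a normalized Gaussian vector)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def tail_fraction(n, delta, samples=1000, seed=0):
    """Empirical P[|f(x) - E f| >= delta] for f(x) = x_1 (1-Lipschitz, mean 0)."""
    rng = random.Random(seed)
    hits = sum(abs(random_unit_vector(n, rng)[0]) >= delta for _ in range(samples))
    return hits / samples


for n in (3, 30, 500):
    print(n, tail_fraction(n, delta=0.2))
```

For n = 3 the coordinate x_1 is uniform on [−1, 1], so roughly 80% of the samples exceed δ = 0.2; for n = 500 essentially none do.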

Similarly, Modi et al.9 showed that quantum processes satisfy a concentration of measure around Markovian ones, explaining the emergence of Markovianity without a-priori assumptions. There, the trace distance \({{\mathcal{N}}}_{1}\) was used as the measure of non-Markovianity; strictly speaking, it quantifies the distinguishability of explicitly constructed Choi states of the corresponding process tensors and has no direct operational meaning. However, we can use the relation \({d}_{{\mathsf{S}}}^{-2k-1}{{\mathcal{N}}}_{\blacklozenge }\le {{\mathcal{N}}}_{1}\le {{\mathcal{N}}}_{\blacklozenge }\) to relate it to the stricter notion of non-Markovianity defined in terms of the diamond norm in Eq. (3). This implies that the main result by Modi et al.9, where all SE unitaries of Eq. (10) were randomly sampled according to the Haar measure, can be recast as

$${{\mathbb{P}}}_{{\mathsf{h}}}\left[{{\mathcal{N}}}_{\blacklozenge }\ge {d}_{{\mathsf{S}}}^{2k+1}{\mathcal{B}}+\delta \right]\le \exp \left\{-4\ {\mathcal{C}}\ {\delta }^{2}{d}_{{\mathsf{S}}}^{-2(2k+1)}\right\},$$
(13)

where

$${\mathcal{C}}=\frac{{d}_{{\mathsf{SE}}}(k+1)}{16}{\left(\frac{{d}_{{\mathsf{S}}}-1}{{d}_{{\mathsf{S}}}^{k+1}-1}\right)}^{2},$$
(14)

is the Lipschitz constant of \({{\mathcal{N}}}_{1}\), and

$${\mathcal{B}}=\left\{\begin{array}{ll}\frac{1}{2}\sqrt{{d}_{{\mathsf{E}}}\ {{\mathbb{E}}}_{{\mathsf{h}}}[{\rm{tr}}({{{\Upsilon }}}^{2})]-x}+\frac{y}{2}&{\rm{if}}\ {d}_{{\mathsf{E}}}<{d}_{{\mathsf{S}}}^{2k+1}\\ \frac{1}{2}\sqrt{{d}_{{\mathsf{S}}}^{2k+1}{{\mathbb{E}}}_{{\mathsf{h}}}[{\rm{tr}}({{{\Upsilon }}}^{2})]-1}&{\rm{otherwise}},\end{array}\right.$$
(15)

is an upper bound on \({{\mathbb{E}}}_{{\mathsf{h}}}[{{\mathcal{N}}}_{1}]\), the expected non-Markovianity over the Haar measure, with \(x:={d}_{{\mathsf{E}}}{d}_{{\mathsf{S}}}^{-(2k+1)}\left(1+y\right)\) and \(y:=1-{d}_{{\mathsf{E}}}{d}_{{\mathsf{S}}}^{-(2k+1)}\), and

$${{\mathbb{E}}}_{{\mathsf{h}}}[{\rm{tr}}({{{\Upsilon }}}^{2})]=\frac{{d}_{{\mathsf{E}}}^{2}-1}{{d}_{{\mathsf{E}}}({d}_{{\mathsf{SE}}}+1)}{\left(\frac{{d}_{{\mathsf{E}}}^{2}-1}{{d}_{{\mathsf{SE}}}^{2}-1}\right)}^{k}+\frac{1}{{d}_{{\mathsf{E}}}},$$
(16)

the expected purity of the process ϒ over the Haar measure, which quantifies how noisy (mixed) ϒ is. Holding everything else constant, the bound \({\mathcal{B}}\ge {{\mathbb{E}}}_{{\mathsf{h}}}[{{\mathcal{N}}}_{1}]\) satisfies

$$\mathop{{{\lim}}}\limits_{{d}_{{\mathsf{E}}}\to \infty }{\mathcal{B}}=0\ {\rm{and}}\ \mathop{{{\lim}}}\limits_{k\to \infty }{\mathcal{B}}=1,$$
(17)

so that the expected non-Markovianity vanishes in the \({d}_{{\mathsf{E}}}\to \infty\) limit, whereas the bound becomes trivial, \({\mathcal{B}}\to 1\), in the \(k\to \infty\) limit.
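The quantities in Eqs. (13) to (16) are straightforward to evaluate. The following standard-library sketch implements them as written (the function names are our own) and reproduces the two limits of Eq. (17) numerically: B shrinks as d_E grows at fixed k, and approaches 1 as k grows at fixed d_E. The `max(0, ·)` inside the square roots only guards against floating-point round-off.

```python
import math


def expected_purity(dS, dE, k):
    """Haar-averaged purity E_h[tr(Upsilon^2)], Eq. (16)."""
    dSE = dS * dE
    first = (dE**2 - 1) / (dE * (dSE + 1))
    return first * ((dE**2 - 1) / (dSE**2 - 1)) ** k + 1 / dE


def bound_B(dS, dE, k):
    """Upper bound B on E_h[N_1], Eq. (15)."""
    D = float(dS) ** (2 * k + 1)
    purity = expected_purity(dS, dE, k)
    if dE < D:
        y = 1 - dE / D
        x = dE / D * (1 + y)
        return 0.5 * math.sqrt(max(0.0, dE * purity - x)) + y / 2
    return 0.5 * math.sqrt(max(0.0, D * purity - 1))


def constant_C(dS, dE, k):
    """Constant C of Eq. (14)."""
    return dS * dE * (k + 1) / 16 * ((dS - 1) / (dS ** (k + 1) - 1)) ** 2


def haar_tail(dS, dE, k, delta):
    """Right-hand side of Eq. (13)."""
    return math.exp(-4 * constant_C(dS, dE, k) * delta**2 * float(dS) ** (-2 * (2 * k + 1)))


# Eq. (17): B -> 0 as dE grows, and B -> 1 as k grows
for dE in (2, 2**10, 2**20, 2**40):
    print("dE =", dE, "B =", bound_B(2, dE, k=1))
for k in (1, 5, 20, 60):
    print("k =", k, "B =", bound_B(2, 4, k))
```

Large integer dimensions such as 2**40 are fine here because Python integers are exact and only the final ratios are converted to floats.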

The significance of Eq. (15) is thus that quantum processes with not too many interventions in high dimensional environments will look to be almost Markovian with high probability. This means that, even when processes generically carry temporal correlations, these are typically low, explaining the emergence of Markovian processes without ad-hoc assumptions such as the Born-Markov approximation of weak coupling51.

Unitary designs

The result in Eq. (15) assumes that the dynamics are Haar distributed; however, implementing a Haar random unitary requires an exponential number of two-qubit gates and random bits83, thus Haar random dynamics cannot be obtained efficiently in a physical setting.

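An exact unitary t-design is defined10 as a probability measure μt on \({\mathbb{U}}(d)\) such that, for all \({d}^{s}\times {d}^{s}\) complex matrices X, with \({{\mathcal{V}}}^{\otimes s}(X):={V}^{\otimes s}X{({V}^{\dagger })}^{\otimes s}\) for V distributed according to μt and \({{\mathcal{U}}}^{\otimes s}\) defined likewise for Haar-distributed U,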

$${{\mathbb{E}}}_{{\mathsf{t}}}\left[{{\mathcal{V}}}^{\otimes s}(X)\right]={{\mathbb{E}}}_{{\mathsf{h}}}\left[{{\mathcal{U}}}^{\otimes s}(X)\right],\ \forall s\le {\mathsf{t}}.$$
(18)

As per the definition in Eq. (18), a unitary t-design reproduces the moments of the uniform (Haar) distribution up to order t. In particular, μt can consist of a finite ensemble \({\{{V}_{i},{p}_{i}\}}_{i = 1}^{N}\) of unitaries Vi with probabilities pi, as is now common in applications such as the so-called randomized benchmarking of error rates in quantum gates60,62.
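One convenient way to test whether a finite, uniformly weighted ensemble is an exact t-design uses the frame potential F^(t) = E_{U,V}|tr(U†V)|^(2t). This criterion differs from the monomial formulation adopted below. The frame potential is never smaller than its Haar value and equals it exactly for t-designs; for d ≥ t the Haar value is t!. The numpy sketch below (helper names hypothetical) generates the single-qubit Clifford group modulo global phases from the Hadamard and phase gates and compares it with the Pauli group. Both ensembles match the Haar value for t = 1, but only the Cliffords do so for t = 2.

```python
import itertools
import math

import numpy as np

I2 = np.eye(2, dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]], dtype=complex)
PAULIS = [I2, np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]]), np.array([[1, 0], [0, -1]], dtype=complex)]


def canonical(U):
    """Hashable key for U modulo a global phase (first non-zero entry made real positive)."""
    flat = U.reshape(-1)
    pivot = flat[np.argmax(np.abs(flat) > 1e-9)]
    return tuple(np.round(U * (abs(pivot) / pivot), 8).reshape(-1).tolist())


def generate_group(generators):
    """Closure of the generators under multiplication, modulo global phases."""
    group = {canonical(I2): I2}
    frontier = [I2]
    while frontier:
        new = []
        for U in frontier:
            for G in generators:
                V = G @ U
                key = canonical(V)
                if key not in group:
                    group[key] = V
                    new.append(V)
        frontier = new
    return list(group.values())


def frame_potential(ensemble, t):
    """F^(t) = average over pairs (U, V) of |tr(U^dagger V)|^(2t), uniform weights."""
    total = sum(abs(np.trace(U.conj().T @ V)) ** (2 * t)
                for U, V in itertools.product(ensemble, repeat=2))
    return total / len(ensemble) ** 2


cliffords = generate_group([H, S])
print(len(cliffords))  # size of the single-qubit Clifford group modulo phases
for t in (1, 2):
    # For d >= t the Haar value of the frame potential is t!
    print(t, frame_potential(PAULIS, t), frame_potential(cliffords, t), math.factorial(t))
```

The Pauli group gives F^(2) = 4 > 2!, so it is a 1-design but not a 2-design, whereas the Clifford group attains the Haar value.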

Moreover, this definition can be relaxed by letting a unitary design approximate the Haar measure up to a small error ϵ. In this manuscript, we specifically employ the definition of unitary designs by R. Low10. It uses the fact that the definition of an exact t-design, μt, can be written in terms of balanced monomials Θ of degree at most t in the elements of the unitaries U. A balanced monomial of degree t is a monomial in the unitary elements with precisely t conjugated and t unconjugated elements: for example, \({U}_{ab}{U}_{cd}{U}_{ef}^{* }{U}_{hg}^{* }\) is a balanced monomial of degree 2. Thus, writing Eq. (18) in terms of matrix elements, it can be seen to be equivalent to requiring \({{\mathbb{E}}}_{{\mathsf{t}}}[{{\Theta }}(V)]={{\mathbb{E}}}_{{\mathsf{h}}}[{{\Theta }}(U)]\) for all monomials Θ of degree s ≤ t. Similarly, for an ϵ-approximate t-design, we adopt the definition by R. Low10, with Eq. (4) implying

$$\left|{{\mathbb{E}}}_{{{\mathsf{t}}}_{\epsilon }}{{\Theta }}(V)-{{\mathbb{E}}}_{{\mathsf{h}}}{{\Theta }}(U)\right|\le \frac{\epsilon }{{d}^{{\mathsf{t}}}},$$
(19)

for monomials Θ of degree s ≤ t. From now on, we focus on the more general approximate designs. We will see below that the error ϵ by which the distribution \({\mu }_{{{\mathsf{t}}}_{\epsilon }}\) of the unitary dynamics differs from an exact design, for a given t, depends on the complexity of the model.
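To see the monomial formulation at work, the sketch below compares Pauli-ensemble averages of two balanced monomials in the (1, 1) entry with their Haar values: |U_11|^2 (degree 1), whose Haar value is 1/d, and |U_11|^4 (degree 2), whose Haar value is 2/(d(d+1)). Both Haar values follow from |U_11|^2 being Beta(1, d − 1) distributed for Haar-random U. The Pauli group matches at degree 1 but not at degree 2, and Eq. (19) with t = 2 then gives a lower bound on the error ϵ this ensemble could have as an approximate 2-design. Only one monomial is checked, so this is a necessary condition, not a full test; the function names are illustrative.

```python
import numpy as np

d = 2
PAULIS = [np.eye(2), np.array([[0, 1], [1, 0]]),
          np.array([[0, -1j], [1j, 0]]), np.array([[1, 0], [0, -1]])]


def degree1(U):
    """Balanced monomial U_11 conj(U_11) = |U_11|^2."""
    return abs(U[0, 0]) ** 2


def degree2(U):
    """Balanced monomial U_11 U_11 conj(U_11) conj(U_11) = |U_11|^4."""
    return abs(U[0, 0]) ** 4


def ensemble_average(ensemble, monomial):
    return sum(monomial(U) for U in ensemble) / len(ensemble)


# Haar values: |U_11|^2 ~ Beta(1, d - 1), with mean 1/d and second moment 2/(d(d+1))
haar1, haar2 = 1 / d, 2 / (d * (d + 1))

gap1 = abs(ensemble_average(PAULIS, degree1) - haar1)
gap2 = abs(ensemble_average(PAULIS, degree2) - haar2)
print(gap1, gap2)

# Eq. (19) with t = 2 requires gap2 <= eps / d**2, so for the Pauli ensemble
# this single monomial already forces eps >= d**2 * gap2.
print("eps must be at least", d**2 * gap2)
```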

Large deviation bounds for t-designs

The general idea behind the main result by R. Low10 (similarly applied before by Horodecki et al.66) is that, given an ϵ-approximate unitary t-design \({\mu }_{{{\mathsf{t}}}_{\epsilon }}\) and a concentration result for a polynomial \({\mathcal{X}}\) of degree p, one can compute the last term \({f}_{{{\mathsf{t}}}_{\epsilon }}\) in

$${{\mathbb{E}}}_{{{\mathsf{t}}}_{\epsilon }}{{\mathcal{X}}}^{m}={{\mathbb{E}}}_{{\mathsf{h}}}{{\mathcal{X}}}^{m}+{f}_{{{\mathsf{t}}}_{\epsilon }},$$
(20)

with m ≤ t/(2p), where in general \({f}_{{{\mathsf{t}}}_{\epsilon }}={f}_{{{\mathsf{t}}}_{\epsilon }}(\epsilon ,{\mathsf{t}},{\mathcal{X}})\). Using Markov's inequality for non-negative \({\mathcal{X}}\), we then have

$$\begin{aligned}{{\mathbb{P}}}_{{{\mathsf{t}}}_{\epsilon }}({\mathcal{X}}\ge \delta ) & = {{\mathbb{P}}}_{{{\mathsf{t}}}_{\epsilon }}({{\mathcal{X}}}^{m}\ge {\delta }^{m})\\ & \le \frac{{{\mathbb{E}}}_{{{\mathsf{t}}}_{\epsilon }}{{\mathcal{X}}}^{m}}{{\delta }^{m}}\\ & = \frac{1}{{\delta }^{m}}\left[{{\mathbb{E}}}_{{\mathsf{h}}}{{\mathcal{X}}}^{m}+{f}_{{{\mathsf{t}}}_{\epsilon }}\right],\end{aligned}$$
(21)

which is the form of the main large-deviation bound.
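The moment form of Markov's inequality used in Eq. (21) can be checked exactly on a simple example of our own choosing: for X = |Z| with Z standard normal, both the tail P(X ≥ δ) = erfc(δ/√2) and the moments E X^m = 2^(m/2) Γ((m+1)/2)/√π are known in closed form. The standard-library sketch below shows that every E X^m/δ^m bounds the tail, and that higher moments give tighter bounds up to an optimal m. This mirrors the role of the admissible range m ≤ t/(2p) in Eq. (21).

```python
import math


def abs_normal_moment(m):
    """E|Z|^m for a standard normal Z: 2^(m/2) Gamma((m+1)/2) / sqrt(pi)."""
    return 2 ** (m / 2) * math.gamma((m + 1) / 2) / math.sqrt(math.pi)


def markov_moment_bound(m, delta):
    """P(|Z| >= delta) <= E|Z|^m / delta^m, the step used in Eq. (21)."""
    return abs_normal_moment(m) / delta**m


delta = 3.0
exact_tail = math.erfc(delta / math.sqrt(2))  # P(|Z| >= delta)
bounds = {m: markov_moment_bound(m, delta) for m in range(1, 13)}
best_m = min(bounds, key=bounds.get)
print(exact_tail, best_m, bounds[best_m])
```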

Specifically, we employ the following results, proved by R. Low10.

Theorem 2

(Large deviation bounds for t-designs by R. Low10) Let f be a polynomial of degree T in the entries of U and their complex conjugates, written as \(f(U)={\sum }_{i}{\alpha }_{i}{{{\Theta }}}_{{s}_{i}}(U)\), where the \({{{\Theta }}}_{{s}_{i}}(U)\) are monomials, and let \(\alpha (f)={\sum }_{i}| {\alpha }_{i}|\). Suppose that f has probability concentration

$${{\mathbb{P}}}_{{\mathsf{h}}}[| f-\zeta | \ge \delta ]\le C\exp \left(-{\mathfrak{C}}{\delta }^{2}\right),$$
(22)

and let \({\mu }_{{{\mathsf{t}}}_{\epsilon }}\) be an ϵ-approximate unitary t-design. Then

$${{\mathbb{P}}}_{{\mu }_{{{\mathsf{t}}}_{\epsilon }}}[| f-\zeta | \ge \delta ]\le \frac{1}{{\delta }^{2m}}\left(C{\left(\frac{m}{{\mathfrak{C}}}\right)}^{m}+\frac{\epsilon }{{d}^{{\mathsf{t}}}}{(\alpha +| \zeta | )}^{2m}\right),$$
(23)

for any integer m with 2mT ≤ t.
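A direct transcription of the right-hand side of Eq. (23) is given below, minimized over the admissible integers m with 2mT ≤ t. This is a sketch under one assumption: the prefactor and the constant in the exponent of Eq. (22) are treated as two separate constants (`prefactor` and `rate`), entering Eq. (23) as prefactor·(m/rate)^m. The function name and the example numbers are hypothetical.

```python
import math


def low_deviation_bound(delta, prefactor, rate, eps, d, t, alpha, zeta, T):
    """Right-hand side of Eq. (23), minimized over admissible integers m.

    prefactor and rate are the constants of the Haar concentration in Eq. (22),
    P_h[|f - zeta| >= delta] <= prefactor * exp(-rate * delta^2).
    Admissible m are positive integers with 2 m T <= t; returns inf if none exist.
    """
    best = math.inf
    for m in range(1, t // (2 * T) + 1):
        haar_part = prefactor * (m / rate) ** m
        design_part = eps / d**t * (alpha + abs(zeta)) ** (2 * m)
        best = min(best, (haar_part + design_part) / delta ** (2 * m))
    return best


# t = 4 and T = 2 leave only m = 1; a higher design order t admits larger m
print(low_deviation_bound(1.0, prefactor=2, rate=10, eps=1e-3, d=4, t=4, alpha=1, zeta=0, T=2))
print(low_deviation_bound(1.0, prefactor=2, rate=10, eps=1e-3, d=4, t=40, alpha=1, zeta=0, T=2))
```

Because the design term is suppressed by d^t, raising the design order mainly helps by enlarging the range of admissible m.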

This result provides a general large-deviation bound for approximate unitary designs, where ζ can be any quantity, in particular the expectation of f. As outlined above, the main idea (similarly applied before by Horodecki et al.66) is that, given an ϵ-approximate unitary t-design \({\mu }_{{{\mathsf{t}}}_{\epsilon }}\) and a concentration result for a polynomial f of degree T, one can compute

$${{\mathbb{E}}}_{{{\mathsf{t}}}_{\epsilon }}\left[{f}^{m}\right]={{\mathbb{E}}}_{{\mathsf{h}}}\left[{f}^{m}\right]+g(\epsilon ,{\mathsf{t}},f),$$
(24)

where m ≤ t/(2T). Using Markov's inequality for non-negative f, we have

$${{\mathbb{P}}}_{{{\mathsf{t}}}_{\epsilon }}(f\ge \delta )\le \frac{1}{{\delta }^{m}}\left[{{\mathbb{E}}}_{{\mathsf{h}}}\left[{f}^{m}\right]+g(\epsilon ,{\mathsf{t}},f)\right],$$
(25)

which is the form of the main large-deviation bound in Eq. (23). More precisely, the two other main results accompanying the proof of Theorem 2 by R. Low10, which allow one to compute the right-hand side of Eq. (25), are the following.

Lemma 3

(Lemma 3.4 of R. Low10) Let f be a polynomial of degree T, written as \(f(U)={\sum }_{i}{\alpha }_{i}{{{\Theta }}}_{{s}_{i}}(U)\), where the \({{{\Theta }}}_{{s}_{i}}(U)\) are monomials, let \(\alpha (f)={\sum }_{i}| {\alpha }_{i}|\), and let ζ be any constant. Then, for an integer m such that 2mT ≤ t and \({\mu }_{{{\mathsf{t}}}_{\epsilon }}\) an ϵ-approximate unitary t-design,

$${{\mathbb{E}}}_{{{\mathsf{t}}}_{\epsilon }}\left[| f-\zeta {| }^{2m}\right]\le {{\mathbb{E}}}_{{\mathsf{h}}}\left[| f-\zeta {| }^{2m}\right]+\frac{\epsilon }{{d}^{{\mathsf{t}}}}{\left(\alpha +| \zeta | \right)}^{2m}.$$
(26)

Lemma 4

(Lemma 5.2 of R. Low10) Let X be any non-negative random variable with probability concentration

$${\mathbb{P}}(X\ge \delta +\gamma )\le C\exp (-{\mathfrak{C}}\ {\delta }^{2}),$$
(27)

where γ ≥ 0, then

$${\mathbb{E}}[{X}^{m}]\le C{\left(\frac{2m}{{\mathfrak{C}}}\right)}^{m/2}+{(2\gamma )}^{m},$$
(28)

for any m > 0.
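Lemma 4 can be sanity-checked on X = |Z| with Z standard normal. This variable satisfies Eq. (27) with γ = 0, C = 2 and 𝔠 = 1/2, because P(|Z| ≥ δ) ≤ 2 exp(−δ²/2) (the Chernoff bound applied to each tail). The standard-library sketch below, an illustration of ours, compares the exact moments of |Z| with the right-hand side of Eq. (28), including non-integer m.

```python
import math


def abs_normal_moment(m):
    """E|Z|^m for a standard normal Z."""
    return 2 ** (m / 2) * math.gamma((m + 1) / 2) / math.sqrt(math.pi)


def lemma4_bound(m, C, rate, gamma=0.0):
    """Right-hand side of Eq. (28): C (2m / rate)^(m/2) + (2 gamma)^m."""
    return C * (2 * m / rate) ** (m / 2) + (2 * gamma) ** m


# X = |Z| obeys Eq. (27) with gamma = 0, C = 2 and rate 1/2.
for m in (0.5, 1, 2, 4, 8):
    print(m, abs_normal_moment(m), lemma4_bound(m, C=2, rate=0.5))
```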

In essence, given these results, we evaluate the right-hand sides of Eqs. (26) and (28) for the measure of non-Markovianity in Eq. (3), expressing all other relevant quantities in the same terms.

Proof of Theorem 1

A bound on the Haar moments of \({{\mathcal{N}}}_{2}\)

Let us start by noting that \(\parallel X{\parallel }_{1}\ge \parallel X{\parallel }_{2}\), and hence \({{\mathcal{N}}}_{1}\ge {{\mathcal{N}}}_{2}\). Therefore, the concentration result for \({{\mathcal{N}}}_{1}\), \({{\mathbb{P}}}_{{\mathsf{h}}}[{{\mathcal{N}}}_{1}\ge {\mathcal{B}}+\delta ]\le \exp (-{\mathcal{C}}{\delta }^{2})\), where now \({\mathcal{C}}=\frac{{d}_{{\mathsf{SE}}}(k+1)}{4}{\left(\frac{{d}_{{\mathsf{S}}}-1}{{d}_{{\mathsf{S}}}^{k+1}-1}\right)}^{2}\) (four times the constant defined in Eq. (14)) and \({\mathcal{B}}\) is defined in Eq. (15), also implies

$${{\mathbb{P}}}_{{\mathsf{h}}}[{{\mathcal{N}}}_{2}\ge {\mathcal{B}}+\delta ]\le {{\rm{e}}}^{-{\mathcal{C}}{\delta }^{2}},$$
(29)

so that in turn Lemma 4 through Eq. (28) implies that

$$\begin{aligned}{{\mathbb{E}}}_{{\mathsf{h}}}[{{\mathcal{N}}}_{2}^{2m}] & \le {\left(\frac{4m}{{\mathcal{C}}}\right)}^{m}+{(2{\mathcal{B}})}^{2m}\\ & = {\left[\frac{16m}{(k+1){d}_{{\mathsf{SE}}}}{\left(\frac{{d}_{{\mathsf{S}}}^{k+1}-1}{{d}_{{\mathsf{S}}}-1}\right)}^{2}\right]}^{m}+{(2{\mathcal{B}})}^{2m},\end{aligned}$$
(30)

for any m > 0.

A bound on the design moments of \({{\mathcal{N}}}_{2}\)

When the unitaries at each step are sampled independently, \({{\mathcal{N}}}_{2}^{\ 2}\) is a polynomial of degree T = 2 when the unitaries are all distinct (random interaction type). We can thus apply Lemma 3 to \({{\mathcal{N}}}_{2}^{\ 2}\) for an ϵ-approximate unitary t-design \({\mu }_{{{\mathsf{t}}}_{\epsilon }}\) with t ≥ 4m, which in fact holds for real m > 0, obtaining

$${{\mathbb{E}}}_{{{\mathsf{t}}}_{\epsilon }}[{{\mathcal{N}}}_{2}^{\ 2m}]\le {{\mathbb{E}}}_{{\mathsf{h}}}[{{\mathcal{N}}}_{2}^{\ 2m}]+\frac{\epsilon }{{d}_{{\mathsf{SE}}}^{\ {\mathsf{t}}}}\ {\eta }^{2m}$$
(31)

where η is the sum of the moduli of the coefficients of

$$\begin{aligned}{{\mathcal{N}}}_{2}^{\ 2} & = {\left(\frac{1}{2}\mathop{{{\min}}}\limits_{{{{\Upsilon }}}^{({\rm{M}})}}\parallel {{\Upsilon }}-{{{\Upsilon }}}^{({\rm{M}})}{\parallel }_{2}\right)}^{2}\\ & \le \frac{1}{4}\parallel {{\Upsilon }}-\frac{{\mathbb{1}}}{{d}_{{\mathsf{S}}}^{2k+1}}{\parallel }_{2}^{2}\\ & = \frac{1}{4}\left[{\rm{tr}}({{{\Upsilon }}}^{2})-{d}_{{\mathsf{S}}}^{-(2k+1)}\right].\end{aligned}$$
(32)
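The last equality in Eq. (32) uses only tr ϒ = 1, since ‖ϒ − 𝟙/D‖_2² = tr(ϒ²) − 2/D + 1/D. Here is a quick numpy check, using a random unit-trace positive matrix as a stand-in for ϒ (it is not an actual process tensor), with D = d_S^(2k+1) for d_S = 2 and k = 1:

```python
import numpy as np

rng = np.random.default_rng(3)
D = 2 ** 3  # d_S^(2k+1) with d_S = 2, k = 1
g = rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))
Upsilon = g @ g.conj().T
Upsilon /= np.trace(Upsilon).real   # unit trace, as for normalized Choi states

lhs = np.linalg.norm(Upsilon - np.eye(D) / D, "fro") ** 2   # ||Upsilon - 1/D||_2^2
rhs = np.trace(Upsilon @ Upsilon).real - 1 / D               # tr(Upsilon^2) - 1/D
print(lhs, rhs)
```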

The proof of Lemma 3.4 by R. Low10 relies on the multinomial theorem and therefore requires m to be an integer. In the notation of the cited paper, m can be relaxed to a real number by applying the multinomial theorem for a real power, whose convergence requires an ordering such that \(| {\alpha }_{t}{\mathbb{E}}{M}_{t}| > {2}^{1-n}| {\alpha }_{t-n}{\mathbb{E}}{M}_{t-n}|\) for each n = 1, …, t − 1, for both the approximate-design and the Haar expectations.

Let us explicitly write the process ϒ, defined in Eq. (10), as a function of the set of unitaries \({\mathfrak{U}}:={\{{U}_{i}\}}_{i = 0}^{k}\), i.e.

$${{\Upsilon }}[{\mathfrak{U}}]={{\rm{tr}}}_{{\mathsf{E}}}[{U}_{k}{{\mathcal{S}}}_{k}\cdots {U}_{1}{{\mathcal{S}}}_{1}{U}_{0}(\rho \otimes {{{\Psi }}}^{\otimes k}){U}_{0}^{\dagger }{{\mathcal{S}}}_{1}{U}_{1}^{\dagger }\cdots {{\mathcal{S}}}_{k}{U}_{k}^{\dagger }],$$
(33)

where each U implicitly stands for \(U\otimes {{\mathbb{1}}}_{2k\ {\rm{ancillas}}}\), and the maximally entangled states Ψ are taken to be normalized. As the swap between the system and half of the ith ancillary space is given by \({{\mathcal{S}}}_{i}=\sum {{\mathfrak{S}}}_{\alpha \beta }\otimes {\mathbb{1}}\otimes \left|\beta \right\rangle \ {\left\langle \alpha \right|}_{i}\otimes {\mathbb{1}}\), where \({{\mathfrak{S}}}_{\alpha \beta }:={{\mathbb{1}}}_{{\mathsf{E}}}\otimes \left|\alpha \right\rangle \ {\left\langle \beta \right|}_{{\mathsf{S}}}\), this can be written as

$$\begin{aligned}{{\Upsilon }}[{\mathfrak{U}}] & = {d}_{{\mathsf{S}}}^{-k}\sum {{\rm{tr}}}_{{\mathsf{E}}}\left[{U}_{k}{{\mathfrak{S}}}_{{\alpha }_{k}{\beta }_{k}}\cdots {U}_{1}{{\mathfrak{S}}}_{{\alpha }_{1}{\beta }_{1}}{U}_{0}\rho {U}_{0}^{\dagger }{{\mathfrak{S}}}_{{\delta }_{1}{\gamma }_{1}}{U}_{1}^{\dagger }\cdots {{\mathfrak{S}}}_{{\delta }_{k}{\gamma }_{k}}{U}_{k}^{\dagger }\right]\\ & \quad \otimes \left|{\beta }_{1}{\alpha }_{1}\cdots {\beta }_{k}{\alpha }_{k}\right\rangle \ \left\langle {\delta }_{1}{\gamma }_{1}\cdots {\delta }_{k}{\gamma }_{k}\right|.\end{aligned}$$
(34)

Now, the standard way to compute the sum of the moduli of the coefficients of a polynomial is to replace each coefficient by its modulus and evaluate the polynomial at an argument whose entries all equal one (here a dSE × dSE matrix), so that every monomial evaluates to one. We follow this approach; however, we first note that the environment part in Eq. (34) is just a product of the environment parts of all unitaries and of the initial state. To see this, let \(U=\sum {U}_{{e}^{\prime}{s}^{\prime}}^{es}\left|es\right\rangle \ \left\langle {e}^{\prime}{s}^{\prime}\right|\), where \(\left|e\right\rangle\) and \(\left|s\right\rangle\) are E and S bases. Unitarity then implies \(\sum {\overline{U}}_{es}^{ab}{U}_{\epsilon \sigma }^{ab}={\delta }_{e\epsilon }{\delta }_{s\sigma }\), where the overline denotes complex conjugation, and hence \({{\rm{tr}}}_{{\mathsf{E}}}[V{{\mathfrak{S}}}_{\alpha \beta }U\rho {U}^{\dagger }{{\mathfrak{S}}}_{\gamma \delta }{V}^{\dagger }]=\sum {V}_{{e}^{\prime}{s}^{\prime}}^{es}{\overline{V}}_{{e}^{\prime}{\sigma }^{\prime}}^{e\sigma }{U}_{b{s}_{2}^{\prime}}^{{e}^{\prime}{s}_{2}}{\overline{U}}_{b{\sigma }_{2}^{\prime}}^{{e}^{\prime}{\sigma }_{2}}{\rho }_{bt}^{br}\ \phi (S)\), where ϕ(S) stands for the system S part; for each index b, the remaining terms are summed over e, and this generalizes similarly to any number of unitaries. This implies that at most dE terms need to be set to one, and we can evaluate ϒ on the set of matrices \({\mathcal{J}}=\{{{\mathbb{1}}}_{{\mathsf{E}}}\otimes {J}_{{\mathsf{S}}},\cdots \ ,{{\mathbb{1}}}_{{\mathsf{E}}}\otimes {J}_{{\mathsf{S}}},{J}_{{\mathsf{E}}}\otimes {J}_{{\mathsf{S}}}\}\), with J a matrix whose elements all equal one in the respective E or S system. Letting \(\rho =\sum {\rho }_{es{e}^{\prime}{s}^{\prime}}\left|es\right\rangle \ \left\langle {e}^{\prime}{s}^{\prime}\right|\), we then have

$$\begin{aligned}{{\Upsilon }}[{\mathcal{J}}] & = {d}_{{\mathsf{S}}}^{-k}\sum {\rho }_{es{e}^{\prime}{s}^{\prime}}{\rm{tr}}[{d}_{{\mathsf{E}}}{J}_{{\mathsf{E}}}\left|e\right\rangle \ \left\langle {e}^{\prime}\right|]{J}_{{\mathsf{S}}}\left|{\alpha }_{k}\right\rangle \ \left\langle {\beta }_{k}\right|\cdots \left|{\alpha }_{1}\right\rangle \ \left\langle {\beta }_{1}\right|{J}_{{\mathsf{S}}}\left|s\right\rangle \ \left\langle {s}^{\prime}\right|{J}_{{\mathsf{S}}}\left|{\delta }_{1}\right\rangle \\ & \quad \left\langle {\gamma }_{1}\right|\cdots \left|{\delta }_{k}\right\rangle \ \left\langle {\gamma }_{k}\right|{J}_{{\mathsf{S}}}\otimes \left|{\beta }_{1}{\alpha }_{1}\cdots {\beta }_{k}{\alpha }_{k}\right\rangle \ \left\langle {\delta }_{1}{\gamma }_{1}\cdots {\delta }_{k}{\gamma }_{k}\right|\\ & = \frac{{d}_{{\mathsf{E}}}}{{d}_{{\mathsf{S}}}^{k}}\sum {\rho }_{es{e}^{\prime}{s}^{\prime}}\ {J}_{{\mathsf{S}}}\left|{\alpha }_{k}\right\rangle \ \left\langle {\beta }_{k}\right|\cdots \left|{\alpha }_{1}\right\rangle \ \left\langle {\beta }_{1}\right|{J}_{{\mathsf{S}}}\left|s\right\rangle \ \left\langle {s}^{\prime}\right|{J}_{{\mathsf{S}}}\left|{\delta }_{1}\right\rangle \ \left\langle {\gamma }_{1}\right|\cdots \left|{\delta }_{k}\right\rangle \\ & \quad \left\langle {\gamma }_{k}\right|{J}_{{\mathsf{S}}}\otimes \left|{\beta }_{1}{\alpha }_{1}\cdots {\beta }_{k}{\alpha }_{k}\right\rangle \ \left\langle {\delta }_{1}{\gamma }_{1}\cdots {\delta }_{k}{\gamma }_{k}\right|,\end{aligned}$$
(35)

and hence (we now omit the subindex S on the J matrices for simplicity),

$$\begin{aligned}{\left(\frac{{d}_{{\mathsf{S}}}^{k}}{{d}_{{\mathsf{E}}}}\right)}^{2}{\rm{tr}}[{{{\Upsilon }}}^{2}({\mathcal{J}})] & = \sum {\rho }_{es{e}^{\prime}{s}^{\prime}}{\rho }_{\epsilon \sigma {\epsilon }^{\prime}{\sigma }^{\prime}}{\rm{tr}}\left[J\left|{\alpha }_{k}\right\rangle \ \left\langle {\beta }_{k}\right|\cdots \left|{\alpha }_{1}\right\rangle \ \left\langle {\beta }_{1}\right|J\left|s\right\rangle \ \left\langle {s}^{\prime}\right|J\left|{\delta }_{1}\right\rangle \ \left\langle {\gamma }_{1}\right|\cdots \ J\left|{\delta }_{k}\right\rangle \right.\\ & \quad\quad \left.\ \left\langle {\gamma }_{k}\right|{J}^{2}\left|{\gamma }_{k}\right\rangle \ \left\langle {\delta }_{k}\right|J\cdots \left|{\gamma }_{1}\right\rangle \ \left\langle {\delta }_{1}\right|J\left|\sigma \right\rangle \ \left\langle {\sigma }^{\prime}\right|J\left|{\beta }_{1}\right\rangle \ \left\langle {\alpha }_{1}\right|J\cdots \left|{\beta }_{k}\right\rangle \ \left\langle {\alpha }_{k}\right|J\right]\\ & = {d}_{{\mathsf{S}}}^{2}\sum {\rho }_{es{e}^{\prime}{s}^{\prime}}{\rho }_{\epsilon \sigma {\epsilon }^{\prime}{\sigma }^{\prime}}{\rm{tr}}\left[J\left|{\alpha }_{k}\right\rangle \ \left\langle {\beta }_{k}\right|\cdots \left|{\alpha }_{1}\right\rangle \ \left\langle {\beta }_{1}\right|J\left|s\right\rangle \ \left\langle {s}^{\prime}\right|J\left|{\delta }_{1}\right\rangle \ \left\langle {\gamma }_{1}\right|\cdots \ \right.\\ & \quad\quad \left.\left\langle {\gamma }_{k-1}\right|J\left|{\delta }_{k}\right\rangle \ \left\langle {\delta }_{k}\right|J\left|{\gamma }_{k-1}\right\rangle \cdots \left|{\gamma }_{1}\right\rangle \ \left\langle {\delta }_{1}\right|J\left|\sigma \right\rangle \ \left\langle {\sigma }^{\prime}\right|J\left|{\beta }_{1}\right\rangle \ \left\langle {\alpha }_{1}\right|J\cdots \left|{\beta }_{k}\right\rangle \ \left\langle {\alpha }_{k}\right|J\right]\\ & = {d}_{{\mathsf{S}}}^{2k+1}\sum {\rho }_{es{e}^{\prime}{s}^{\prime}}{\rho }_{\epsilon \sigma {\epsilon }^{\prime}{\sigma }^{\prime}}\ {\rm{tr}}[J\left|{\alpha }_{k}\right\rangle \ \left\langle {\beta }_{k}\right|\cdots \left|{\alpha }_{1}\right\rangle \ \left\langle {\beta }_{1}\right|J\left|s\right\rangle \ \left\langle {s}^{\prime}\right|J\left|\sigma \right\rangle \ \left\langle {\sigma }^{\prime}\right|J\left|{\beta }_{1}\right\rangle \ \left\langle {\alpha }_{1}\right|J\cdots \left|{\beta }_{k}\right\rangle \ \left\langle {\alpha }_{k}\right|J]\\ & = {d}_{{\mathsf{S}}}^{2k+3}\sum {\rho }_{es{e}^{\prime}{s}^{\prime}}{\rho }_{\epsilon \sigma {\epsilon }^{\prime}{\sigma }^{\prime}}\left\langle {\beta }_{k}\right|J\left|{\alpha }_{k-1}\right\rangle \cdots \left\langle {\alpha }_{2}\right|J\left|{\alpha }_{1}\right\rangle \ \left\langle {\beta }_{1}\right|J\left|s\right\rangle \ \left\langle {s}^{\prime}\right|J\left|\sigma \right\rangle \ \left\langle {\sigma }^{\prime}\right|J\left|{\beta }_{1}\right\rangle \ \left\langle {\alpha }_{1}\right|J\cdots \left\langle {\alpha }_{k-1}\right|J\left|{\beta }_{k}\right\rangle \\ & = {d}_{{\mathsf{S}}}^{2k+5}\sum {\rho }_{es{e}^{\prime}{s}^{\prime}}{\rho }_{\epsilon \sigma {\epsilon }^{\prime}{\sigma }^{\prime}}\left\langle {\beta }_{k-1}\right|J\left|{\alpha }_{k-2}\right\rangle \cdots \left\langle {\alpha }_{2}\right|J\left|{\alpha }_{1}\right\rangle \ \left\langle {\beta }_{1}\right|J\left|s\right\rangle \ \left\langle {s}^{\prime}\right|J\left|\sigma \right\rangle \ \left\langle {\sigma }^{\prime}\right|J\left|{\beta }_{1}\right\rangle \ \left\langle {\alpha }_{1}\right|J\cdots \left\langle {\alpha }_{k-1}\right|J\left|{\beta }_{k-1}\right\rangle \\ & = {d}_{{\mathsf{S}}}^{2(2k+1)}\sum {\rho }_{es{e}^{\prime}{s}^{\prime}}{\rho }_{\epsilon \sigma {\epsilon }^{\prime}{\sigma }^{\prime}},\end{aligned}$$
(36)

where, to obtain the second line, we used \({J}^{n}={d}_{{\mathsf{S}}}^{\,n-1}J\) for positive integers n (here with n = 2), together with the trace over system S, \(\sum \left\langle {\gamma }_{k}\right|\cdot \left|{\gamma }_{k}\right\rangle\). The third line follows similarly from \(\sum \left|{\delta }_{k}\right\rangle \ \left\langle {\delta }_{k}\right|={{\mathbb{1}}}_{{\mathsf{S}}}\) and the trace over \(\left|{\gamma }_{k-1}\right\rangle\), which can be repeated for all \(\left|{\gamma }_{i}\right\rangle\) and \(\left|{\delta }_{i}\right\rangle\). For the fourth line, we used the cyclicity of the trace, then summed over \(\left|{\alpha }_{k}\right\rangle\) to obtain an identity, used \({J}^{2}={d}_{{\mathsf{S}}}J\), and took the trace. Repeating this for all remaining indices gives the last equality. Together with Eq. (32), this implies that (now writing simply i, j for SE indices)

$$\begin{aligned}4\eta & \le {d}_{{\mathsf{E}}}^{2}{d}_{{\mathsf{S}}}^{2(k+1)}{\left(\sum | {\rho }_{ij}| \right)}^{2}+\frac{1}{{d}_{{\mathsf{S}}}^{2k+1}}\\ & \le {d}_{{\mathsf{E}}}^{4}{d}_{{\mathsf{S}}}^{2(k+2)}\sum | {\rho }_{ij}{| }^{2}+\frac{1}{{d}_{{\mathsf{S}}}^{2k+1}}\\ & \le {d}_{{\mathsf{E}}}^{4}{d}_{{\mathsf{S}}}^{2(k+2)}+\frac{1}{{d}_{{\mathsf{S}}}^{2k+1}},\end{aligned}$$
(37)

where in the second line we used the Cauchy-Schwarz inequality \(\parallel X{\parallel }_{1}^{2}\le n\parallel X{\parallel }_{2}^{2}\) for the element-wise norms \(\parallel X{\parallel }_{p}^{p}=\sum | {x}_{ij}{| }^{p}\) of a matrix with n entries (here \(n={d}_{{\mathsf{SE}}}^{2}\)), and in the third line we used \(\parallel \rho {\parallel }_{2}^{2}={\rm{tr}}({\rho }^{2})\le 1\).
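Two ingredients of this derivation can be checked numerically. The numpy sketch below first illustrates the all-ones evaluation on a polynomial whose coefficients are easy to list: for f(U) = tr(U c U†) = Σ_ijk c_jk U_ij Ū_ik, each triple (i, j, k) labels a distinct balanced monomial, so the sum of the coefficient moduli is Σ_ijk |c_jk|. It also checks J² = dJ. It then checks, for a random SE density matrix, the Cauchy-Schwarz step with n = d_SE² entries, the purity bound, and that the first line of Eq. (37) is dominated by the last. The coefficient matrix c, the dimensions, and the names are illustrative assumptions.

```python
import itertools

import numpy as np

rng = np.random.default_rng(11)

# 1) All-ones trick on f(U) = tr(U c U^dagger) = sum_{ijk} c[j, k] U[i, j] conj(U[i, k]),
#    where each triple (i, j, k) is a distinct balanced monomial with coefficient c[j, k].
d = 3
c = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))   # arbitrary coefficients
eta_direct = sum(abs(c[j, k]) for i, j, k in itertools.product(range(d), repeat=3))
J = np.ones((d, d))
eta_ones = np.trace(J @ np.abs(c) @ J.T).real   # moduli of coefficients, evaluated at U = J
print(np.isclose(eta_direct, eta_ones), np.allclose(J @ J, d * J))   # J^2 = d J


# 2) The inequalities of Eq. (37) for a random SE density matrix
def eta_bound(dS, dE, k):
    """Last line of Eq. (37) divided by 4, an upper bound on eta."""
    return (dE**4 * dS ** (2 * (k + 2)) + dS ** (-(2 * k + 1))) / 4


dS, dE, k = 2, 3, 2
dSE = dS * dE
g = rng.standard_normal((dSE, dSE)) + 1j * rng.standard_normal((dSE, dSE))
rho = g @ g.conj().T
rho /= np.trace(rho).real

n = dSE**2                         # number of matrix entries
l1 = np.abs(rho).sum()             # element-wise 1-norm
l2_sq = (np.abs(rho) ** 2).sum()   # element-wise squared 2-norm, equal to tr(rho^2)
first_line = dE**2 * dS ** (2 * (k + 1)) * l1**2 + dS ** (-(2 * k + 1))
print(l1**2 <= n * l2_sq, l2_sq <= 1, first_line <= 4 * eta_bound(dS, dE, k))
```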

Markov’s inequality on \({{\mathcal{N}}}_{\blacklozenge }\)

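As \({d}_{{\mathsf{S}}}^{-(2k+1)}{{\mathcal{N}}}_{\blacklozenge }\le {{\mathcal{N}}}_{1}\le \sqrt{{d}_{{\mathsf{S}}}^{2k+1}}{{\mathcal{N}}}_{2}\), we have \({{\mathcal{N}}}_{\blacklozenge }\le \sqrt{{d}_{{\mathsf{S}}}^{3(2k+1)}}\,{{\mathcal{N}}}_{2}\), and hence, for 0 < m ≤ t/4,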

$${{\mathbb{P}}}_{{{\mathsf{t}}}_{\epsilon }}[{{\mathcal{N}}}_{\blacklozenge }\ge \delta ] \le \,{{\mathbb{P}}}_{{{\mathsf{t}}}_{\epsilon }}\left[\sqrt{{d}_{{\mathsf{S}}}^{3(2k+1)}}\ {{\mathcal{N}}}_{2}\ge \delta \right]={{\mathbb{P}}}_{{{\mathsf{t}}}_{\epsilon }}\left[{{\mathcal{N}}}_{2}^{\ 2m}\ge \frac{{\delta }^{2m}}{{d}_{{\mathsf{S}}}^{3m(2k+1)}}\right]\\ \le \, \frac{{d}_{{\mathsf{S}}}^{3m(2k+1)}\ {{\mathbb{E}}}_{{{\mathsf{t}}}_{\epsilon }}{{\mathcal{N}}}_{2}^{\ 2m}}{{\delta }^{2m}}\le {\left(\frac{{d}_{{\mathsf{S}}}^{3(2k+1)}}{{\delta }^{2}}\right)}^{m}\left[{\left(\frac{4m}{{\mathcal{C}}}\right)}^{m}+{(2{\mathcal{B}})}^{2m}+\frac{\epsilon }{{d}_{{\mathsf{SE}}}^{\ {\mathsf{t}}}}{\eta }^{2m}\right]\\ = \, {\left(\frac{{d}_{{\mathsf{S}}}^{3(2k+1)}}{{\delta }^{2}}\right)}^{m}\left\{{\left[\frac{16m}{(k+1){d}_{{\mathsf{SE}}}}{\left(\frac{{d}_{{\mathsf{S}}}^{k+1}-1}{{d}_{{\mathsf{S}}}-1}\right)}^{2}\right]}^{m}+{(2{\mathcal{B}})}^{2m}+\frac{\epsilon }{1{6}^{m}{d}_{{\mathsf{SE}}}^{\ {\mathsf{t}}}}{\left({d}_{{\mathsf{E}}}^{4}{d}_{{\mathsf{S}}}^{2(k+2)}+\frac{1}{{d}_{{\mathsf{S}}}^{2k+1}}\right)}^{2m}\right\},$$
(38)

where in the second line we used Markov’s inequality, and in the third line the bound on η from Eq. (37). This concludes the proof of Theorem 1.
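To get a feel for when Eq. (38) is non-trivial, its right-hand side can be evaluated directly. The sketch below is ours: the function name and argument names are hypothetical, \({\mathcal{B}}\) is supplied by hand because its definition lies outside this section, and everything is computed in \(\log_{10}\) to avoid overflow from factors such as \({d}_{{\mathsf{SE}}}^{\ {\mathsf{t}}}\).

```python
import math

def log10_eq38_bound(d_S, d_E, k, t, m, delta, eps, B=0.0):
    """Return log10 of the right-hand side of Eq. (38).

    B is the quantity written as calligraphic B in the text and is passed in by
    hand (default 0, the regime d_E >> d_S^(2k+1)). Assumes d_S >= 2.
    """
    if not 0 < m <= t / 4:
        raise ValueError("Eq. (38) is stated for 0 < m <= t/4")
    d_SE = d_S * d_E
    # (d_S^{3(2k+1)} / delta^2)^m
    log_prefactor = m * (3 * (2 * k + 1) * math.log10(d_S) - 2 * math.log10(delta))
    # [16 m / ((k+1) d_SE) * ((d_S^{k+1} - 1) / (d_S - 1))^2]^m
    geom = (d_S ** (k + 1) - 1) / (d_S - 1)
    log_terms = [m * math.log10(16 * m * geom ** 2 / ((k + 1) * d_SE))]
    if B > 0:
        log_terms.append(2 * m * math.log10(2 * B))          # (2B)^{2m}
    if eps > 0:
        # eps / (16^m d_SE^t) * (d_E^4 d_S^{2(k+2)} + d_S^{-(2k+1)})^{2m}
        eta_bound = d_E ** 4 * d_S ** (2 * (k + 2)) + d_S ** (-(2 * k + 1))
        log_terms.append(math.log10(eps) - m * math.log10(16)
                         - t * math.log10(d_SE) + 2 * m * math.log10(eta_bound))
    top = max(log_terms)  # base-10 log-sum-exp of the bracketed terms
    return log_prefactor + top + math.log10(sum(10 ** (x - top) for x in log_terms))

# Small environment: log10 bound > 0, so the bound on the probability is trivial.
print(log10_eq38_bound(d_S=2, d_E=2, k=1, t=8, m=1, delta=0.1, eps=0.0))
# Large environment: log10 bound < 0, so the bound is meaningful.
print(log10_eq38_bound(d_S=2, d_E=2 ** 40, k=1, t=8, m=1, delta=0.1, eps=0.0))
```

Keeping each term separate in log form makes it easy to see which of the three contributions dominates for a given choice of parameters.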

Convergence towards Markovianity

We first examine the right-hand side of Eq. (38) to see when it yields a meaningful bound on \({{\mathbb{P}}}_{{{\mathsf{t}}}_{\epsilon }}[{{\mathcal{N}}}_{\blacklozenge }\ge \delta ]\). The factor \({d}_{{\mathsf{S}}}^{3(2k+1)}/{\delta }^{2}\) arises from bounding the diamond norm and from Markov’s inequality; while δ is arbitrary, the factor \({d}_{{\mathsf{S}}}^{3(2k+1)}\) can still be significant when multiplied by \({{\mathbb{E}}}_{{{\mathsf{t}}}_{\epsilon }}{{\mathcal{N}}}_{2}^{\ 2m}\). This latter term is small provided that 1) \({\mathcal{C}}\) is large, 2) \({\mathcal{B}}\) is small, and 3) the unitary design is of sufficiently high order t and a sufficiently good approximation (small ϵ).

For 1) and 2), as detailed by Modi et al.9, we require a fixed k such that \({d}_{{\mathsf{E}}}\gg {d}_{{\mathsf{S}}}^{2k+1}\). This implies \({\mathcal{B}}\approx 0\), so that

$${{\mathbb{P}}}_{{{\mathsf{t}}}_{\epsilon }}[{{\mathcal{N}}}_{\blacklozenge }\ge \delta ]\, \lesssim \,{\left(\frac{{d}_{{\mathsf{S}}}^{3(2k+1)}}{{\delta }^{2}}\right)}^{m}\left\{{\left[\frac{16m}{(k+1){d}_{{\mathsf{SE}}}}{\left(\frac{{d}_{{\mathsf{S}}}^{k+1}-1}{{d}_{{\mathsf{S}}}-1}\right)}^{2}\right]}^{m}+\frac{\epsilon }{1{6}^{m}{d}_{{\mathsf{SE}}}^{\ {\mathsf{t}}}}{\left({d}_{{\mathsf{E}}}^{4}{d}_{{\mathsf{S}}}^{2(k+2)}+\frac{1}{{d}_{{\mathsf{S}}}^{2k+1}}\right)}^{2m}\right\}\\ \approx \, \left\{{\left[\frac{16m}{{\delta }^{2}(k+1)}\frac{{d}_{{\mathsf{S}}}^{2(4k+1)}}{{d}_{{\mathsf{E}}}}\right]}^{m}+\epsilon \frac{{d}_{{\mathsf{E}}}^{8m-{\mathsf{t}}}{d}_{{\mathsf{S}}}^{m(10k+11)-{\mathsf{t}}}}{{\delta }^{2m}1{6}^{m}}\right\}.$$
(39)
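For completeness, the step from the first to the second line of Eq. (39) can be written out term by term. In our reading it uses \({\left(({d}_{{\mathsf{S}}}^{k+1}-1)/({d}_{{\mathsf{S}}}-1)\right)}^{2}\approx {d}_{{\mathsf{S}}}^{2k}\), which holds up to a factor of at most \({({d}_{{\mathsf{S}}}/({d}_{{\mathsf{S}}}-1))}^{2}\), and drops \({d}_{{\mathsf{S}}}^{-(2k+1)}\) against \({d}_{{\mathsf{E}}}^{4}{d}_{{\mathsf{S}}}^{2(k+2)}\):

$${\left(\frac{{d}_{{\mathsf{S}}}^{3(2k+1)}}{{\delta }^{2}}\right)}^{m}{\left[\frac{16m}{(k+1){d}_{{\mathsf{S}}}{d}_{{\mathsf{E}}}}{d}_{{\mathsf{S}}}^{2k}\right]}^{m}={\left[\frac{16m}{{\delta }^{2}(k+1)}\frac{{d}_{{\mathsf{S}}}^{6k+3+2k-1}}{{d}_{{\mathsf{E}}}}\right]}^{m}={\left[\frac{16m}{{\delta }^{2}(k+1)}\frac{{d}_{{\mathsf{S}}}^{2(4k+1)}}{{d}_{{\mathsf{E}}}}\right]}^{m},\\ {\left(\frac{{d}_{{\mathsf{S}}}^{3(2k+1)}}{{\delta }^{2}}\right)}^{m}\frac{\epsilon \,{\left({d}_{{\mathsf{E}}}^{4}{d}_{{\mathsf{S}}}^{2(k+2)}\right)}^{2m}}{1{6}^{m}{({d}_{{\mathsf{S}}}{d}_{{\mathsf{E}}})}^{{\mathsf{t}}}}=\epsilon \frac{{d}_{{\mathsf{E}}}^{8m-{\mathsf{t}}}{d}_{{\mathsf{S}}}^{m(6k+3)+m(4k+8)-{\mathsf{t}}}}{{\delta }^{2m}1{6}^{m}}=\epsilon \frac{{d}_{{\mathsf{E}}}^{8m-{\mathsf{t}}}{d}_{{\mathsf{S}}}^{m(10k+11)-{\mathsf{t}}}}{{\delta }^{2m}1{6}^{m}}.$$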

Now, supposing the t-design is exact, i.e., ϵ = 0, we require \(m<{\delta }^{2}\frac{(k+1){d}_{{\mathsf{E}}}}{16\,{d}_{{\mathsf{S}}}^{2(4k+1)}}\), together with m ≤ t/4, so that the bracket in the first term of Eq. (39) is smaller than one. On the other hand, if ϵ is non-zero, we require

$$\epsilon \ll {\left[{\delta }^{2}{\left(\frac{2}{{d}_{{\mathsf{E}}}^{2}{d}_{{\mathsf{S}}}^{(10k+11)/4}}\right)}^{4}\right]}^{m}{d}_{{\mathsf{E}}}^{\ {\mathsf{t}}}{d}_{{\mathsf{S}}}^{\ {\mathsf{t}}}.$$
(40)
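Condition (40) follows from requiring the second term of Eq. (39) to be small and rearranging the powers of \({d}_{{\mathsf{E}}}\) and \({d}_{{\mathsf{S}}}\):

$$\epsilon \frac{{d}_{{\mathsf{E}}}^{8m-{\mathsf{t}}}{d}_{{\mathsf{S}}}^{m(10k+11)-{\mathsf{t}}}}{{\delta }^{2m}1{6}^{m}}\ll 1\iff \epsilon \ll {\left[\frac{16\,{\delta }^{2}}{{d}_{{\mathsf{E}}}^{8}{d}_{{\mathsf{S}}}^{10k+11}}\right]}^{m}{d}_{{\mathsf{E}}}^{\ {\mathsf{t}}}{d}_{{\mathsf{S}}}^{\ {\mathsf{t}}}={\left[{\delta }^{2}{\left(\frac{2}{{d}_{{\mathsf{E}}}^{2}{d}_{{\mathsf{S}}}^{(10k+11)/4}}\right)}^{4}\right]}^{m}{d}_{{\mathsf{E}}}^{\ {\mathsf{t}}}{d}_{{\mathsf{S}}}^{\ {\mathsf{t}}}.$$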

The real parameter m is only restricted by 0 < m ≤ t/4 and is otherwise arbitrary. The right-hand side of Eq. (38) is not monotonic in m for all values of the remaining parameters, so no single fixed choice of m is optimal in general; instead, one may optimize m numerically for each particular case.
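A minimal sketch of such a numerical optimization, using the approximate form in the last line of Eq. (39) (the regime \({\mathcal{B}}\approx 0\)) and a plain grid search over m in (0, t/4]. The function names and the grid resolution are our own illustrative choices, not part of the original analysis.

```python
import math

def log10_eq39(m, d_S, d_E, k, t, delta, eps):
    """log10 of the approximate bound in the last line of Eq. (39)."""
    # [16 m d_S^{2(4k+1)} / (delta^2 (k+1) d_E)]^m
    log_terms = [m * (math.log10(16 * m) + 2 * (4 * k + 1) * math.log10(d_S)
                      - 2 * math.log10(delta) - math.log10(k + 1) - math.log10(d_E))]
    if eps > 0:
        # eps d_E^{8m - t} d_S^{m(10k+11) - t} / (delta^{2m} 16^m)
        log_terms.append(math.log10(eps) + (8 * m - t) * math.log10(d_E)
                         + (m * (10 * k + 11) - t) * math.log10(d_S)
                         - 2 * m * math.log10(delta) - m * math.log10(16))
    top = max(log_terms)  # base-10 log-sum-exp of the two terms
    return top + math.log10(sum(10 ** (x - top) for x in log_terms))

def best_m(d_S, d_E, k, t, delta, eps, grid=400):
    """Grid search over real m in (0, t/4] for the smallest value of Eq. (39)."""
    candidates = [(t / 4) * (i + 1) / grid for i in range(grid)]
    return min(candidates, key=lambda m: log10_eq39(m, d_S, d_E, k, t, delta, eps))

# Exact design: the first term decreases over the whole range here, so m = t/4 is chosen.
print(best_m(d_S=2, d_E=2 ** 60, k=1, t=40, delta=0.1, eps=0.0))
# Approximate design: the eps-term grows quickly with m, pushing the optimum inside (0, t/4].
print(best_m(d_S=2, d_E=2 ** 60, k=1, t=40, delta=0.1, eps=1e-6))
```

A grid search is the simplest choice for a one-dimensional, non-monotonic objective on a bounded interval; a finer grid or a local refinement around the best grid point would sharpen the result.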

Efficient circuit unitary designs

As mentioned in the main text, we focus on Result 2 of Winter et al.14. To begin with, Winter et al.14 show an efficient approximation of a unitary design on a system of n qubits for a circuit labeled \({\rm{RDC}}({{\mathcal{I}}}_{2})\), where RDC stands for Random Diagonal Circuit. Here \({{\mathcal{I}}}_{2}=\{{I}_{i}\}\) is a set of subsets of qubit labels \({I}_{i}\subseteq \{1,\ldots ,n\}\) with \(| {I}_{i}| =2\), i.e., at step i, \({I}_{i}\) picks a pair of qubits, to which a gate diagonal in the Pauli-Z basis, with three random parameters, is applied. This construction can already be seen in the results of Winter et al.64 as arising from only two types of random diagonal interactions, which can be simplified into a product of Z-diagonal gates.

A further simplified special case is denoted by \({{\rm{RDC}}}_{{\rm{disc}}}^{({\mathsf{t}})}({{\mathcal{I}}}_{2})\), where the subscript disc and the superscript t refer to the discrete sets from which the parameters of the diagonal gates are sampled, and which are determined by a given natural number t. Specifically, all gates in \({{\rm{RDC}}}_{{\rm{disc}}}^{({\mathsf{t}})}({{\mathcal{I}}}_{2})\) have the simplified form

$$({\rm{diag}}\{1,{e}^{i{\phi }_{1}}\}\otimes {\rm{diag}}\{1,{e}^{i{\phi }_{2}}\})\,{\rm{diag}}\{1,1,1,{e}^{i\vartheta }\},$$
(41)

where diag{·} denotes a matrix diagonal in the Pauli-Z (computational) basis, ϕ1 and ϕ2 are chosen independently from the discrete set {2πm/(t + 1): m ∈ {0, …, t}}, and ϑ is chosen from {2πm/(t/2 + 1): m ∈ {0, …, t/2}}. We emphasise that this is still a circuit of 2-qubit diagonal gates with only three random parameters each, and therein lies its simplicity.
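Multiplying out the two factors of Eq. (41) gives diag{1, e^{iϕ2}, e^{iϕ1}, e^{i(ϕ1+ϕ2+ϑ)}} in the basis |00⟩, |01⟩, |10⟩, |11⟩, so only ϑ couples the two qubits. The minimal Python sketch below samples one such gate; it is an illustration written for this text, and for odd t it uses ⌊t/2⌋ in the set for ϑ, which is our assumption.

```python
import cmath
import math
import random

def sample_rdc_disc_gate(t, rng):
    """Diagonal of one Eq. (41) gate, basis order |00>, |01>, |10>, |11>.

    phi1, phi2 ~ uniform over {2*pi*m/(t+1): m = 0..t};
    vartheta  ~ uniform over {2*pi*m/(h+1): m = 0..h}, with h = t // 2
    (floor for odd t is an assumption).
    """
    phi1 = 2 * math.pi * rng.randint(0, t) / (t + 1)   # randint bounds are inclusive
    phi2 = 2 * math.pi * rng.randint(0, t) / (t + 1)
    h = t // 2
    vartheta = 2 * math.pi * rng.randint(0, h) / (h + 1)
    # diag{1, e^{i phi1}} (x) diag{1, e^{i phi2}}
    local = [1, cmath.exp(1j * phi2), cmath.exp(1j * phi1), cmath.exp(1j * (phi1 + phi2))]
    # diag{1, 1, 1, e^{i vartheta}}
    controlled = [1, 1, 1, cmath.exp(1j * vartheta)]
    return [a * b for a, b in zip(local, controlled)], (phi1, phi2, vartheta)

diag, params = sample_rdc_disc_gate(t=6, rng=random.Random(1))
print(diag, params)
```

Every entry has unit modulus, so each sampled gate is unitary, and only three random numbers are drawn per gate.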

Now let \({{\mathsf{H}}}_{n}={H}^{\otimes n}\) denote n copies of the Hadamard gate H. Result 2 of Winter et al.14 then states that, for an n-qubit system, when t is of order \(\sqrt{n}\), a circuit of the form

$${{\mathcal{W}}}_{\ell }:={\left({{\rm{RDC}}}_{{\rm{disc}}}^{({\mathsf{t}})}({{\mathcal{I}}}_{2}){{\mathsf{H}}}_{n}\right)}^{2\ell }\,{{\rm{RDC}}}_{{\rm{disc}}}^{({\mathsf{t}})}({{\mathcal{I}}}_{2}),$$
(42)

yields an ϵ-approximate unitary t-design if

$$\ell \ge {\mathsf{t}}-{{{\log}}}_{2}(\epsilon )/n,$$
(43)

up to leading order in n and t.
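As a rough illustration of Eq. (43), the smallest integer number of repetitions ℓ, and the resulting numbers of layers in Eq. (42), can be computed directly. Because Eq. (43) holds only to leading order, the integer below is indicative rather than exact; the concrete choice t = round(√n) is a hypothetical instance of "t of order √n".

```python
import math

def min_repetitions(t, eps, n):
    """Smallest integer ell with ell >= t - log2(eps)/n (Eq. (43), leading order)."""
    if not 0 < eps < 1:
        raise ValueError("eps should lie in (0, 1)")
    return math.ceil(t - math.log2(eps) / n)

n = 100
t = round(math.sqrt(n))               # hypothetical concrete choice of t ~ sqrt(n)
ell = min_repetitions(t, 2 ** -50, n)
hadamard_layers = 2 * ell             # Eq. (42): (RDC H_n)^{2 ell} RDC
diagonal_layers = 2 * ell + 1
print(t, ell, hadamard_layers, diagonal_layers)
```

For fixed n, the ϵ-dependence enters only through the additive term \(-{\log }_{2}(\epsilon )/n\), so ℓ grows essentially linearly in t.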

All the 2-qubit gates within each \({{\rm{RDC}}}_{{\rm{disc}}}^{({\mathsf{t}})}({{\mathcal{I}}}_{2})\) layer of \({{\mathcal{W}}}_{\ell }\) are diagonal and therefore commute64,67, so they can be applied simultaneously, with consecutive layers separated only by the Hadamard layers \({{\mathsf{H}}}_{n}\). Thus, as explained in the main text, if \({{\mathcal{W}}}_{\ell }\) yields an approximate unitary design, the order of the non-commuting gate depth \({\mathfrak{D}}\) coincides with the bound on the order of the number of repetitions ℓ.
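The depth argument can be made concrete with a small state-vector simulation of \({{\mathcal{W}}}_{\ell }\). In the sketch below, each RDC layer collapses into a single diagonal of length \({2}^{n}\) (the entry-wise product of its commuting gates), so the only non-commuting steps are the \(2\ell\) Hadamard layers. This is our illustration, not code from Winter et al.; the choice of \({{\mathcal{I}}}_{2}\) as every qubit pair, independent resampling in each layer, and ⌊t/2⌋ for odd t are all assumptions.

```python
import cmath
import itertools
import math
import random

def gate_diag(t, rng):
    """Diagonal of one Eq. (41) gate in the basis |00>, |01>, |10>, |11>."""
    phi1 = 2 * math.pi * rng.randint(0, t) / (t + 1)
    phi2 = 2 * math.pi * rng.randint(0, t) / (t + 1)
    h = t // 2  # floor(t/2): assumption for odd t
    theta = 2 * math.pi * rng.randint(0, h) / (h + 1)
    return [1, cmath.exp(1j * phi2), cmath.exp(1j * phi1), cmath.exp(1j * (phi1 + phi2 + theta))]

def rdc_layer(n, t, rng, pairs):
    """One RDC layer as a single length-2^n diagonal.

    All gates are diagonal, so they commute and their product is the entry-wise
    product of their diagonals: the whole layer counts as one step of depth."""
    phases = [1 + 0j] * (2 ** n)
    for a, b in pairs:
        g = gate_diag(t, rng)
        for x in range(2 ** n):
            bit_a = (x >> (n - 1 - a)) & 1  # qubit 0 is the most significant bit
            bit_b = (x >> (n - 1 - b)) & 1
            phases[x] *= g[2 * bit_a + bit_b]
    return phases

def hadamard_layer(state, n):
    """Apply H_n = H^{(x) n} to a state vector."""
    s = 1 / math.sqrt(2)
    for q in range(n):
        bit = 1 << (n - 1 - q)
        new = state[:]
        for x in range(2 ** n):
            if not x & bit:
                a, b = state[x], state[x | bit]
                new[x], new[x | bit] = s * (a + b), s * (a - b)
        state = new
    return state

def apply_W(state, n, t, ell, rng):
    """Apply W_ell = (RDC H_n)^{2 ell} RDC; return the state and the number of H_n layers."""
    pairs = list(itertools.combinations(range(n), 2))  # hypothetical I_2: every pair once
    hadamard_layers = 0
    for _ in range(2 * ell):
        state = [p * amp for p, amp in zip(rdc_layer(n, t, rng, pairs), state)]
        state = hadamard_layer(state, n)
        hadamard_layers += 1
    state = [p * amp for p, amp in zip(rdc_layer(n, t, rng, pairs), state)]
    return state, hadamard_layers

n = 3
psi = [0j] * 2 ** n
psi[0] = 1 + 0j
out, layers = apply_W(psi, n, t=4, ell=2, rng=random.Random(7))
print(layers, sum(abs(a) ** 2 for a in out))
```

Since the sequence RDC, H, RDC, …, H, RDC is symmetric, applying it left to right reproduces the operator product in Eq. (42), and the count of Hadamard layers grows as 2ℓ.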