Introduction

Every experiment involves applying actions to some system, and recording corresponding output responses. Both inputs and outputs are recorded as classical bits of information, and the system’s operational behavior can always be regarded as an input–output process that transforms inputs to outputs. Quantitative science aims to capture such behavior within mathematical models—algorithmic abstractions that can simulate future behavior based on past observations.

There is keen interest in finding the simplest models: models that replicate a system’s future behavior while storing the least past information.1, 2 The motivations are twofold. First, by Occam’s razor, we should posit no more causes of natural things than are necessary to explain their appearances. Every piece of past information a model requires represents a potential cause of future events, and thus simpler models better isolate the true indicators of future behavior. The second motivation is practical. As we wish to simulate and engineer systems of increasing complexity, there is a constant need for methods with more modest memory requirements.

This has motivated systematic methods for constructing such models. The state of the art is the ε-transducer, a model of input–output processes that is provably optimal: no other means of modeling a given input–output process can use less past information.3 The amount of past information such a transducer requires thus presents a natural measure of the process’s intrinsic complexity. This heralded new ways to understand structure in diverse systems, ranging from evolutionary dynamics to action-perception cycles.4–7 Yet ε-transducers are classical, and their optimality is proven only among classical models. Recent research indicates that quantum models can simulate stochastic processes that evolve independently of input more simply than classical models can.8–10 Can quantum theory also surpass classical limits in modeling general processes that behave differently on different inputs?

Here, we present systematic methods to construct quantum transducers: quantum models that can be simpler than their optimal classical counterparts. The resulting constructions are general, in that they improve upon optimal classical models whenever it is physically possible to do so. Our work indicates that classical models unavoidably waste information, and that this waste can be mitigated via quantum processing.

Framework

We adopt the framework of computational mechanics.13 An input–output process describes a system that, at each discrete time-step \(t\in {\Bbb{Z}}\), can be ‘kicked’ in a number of different ways, denoted by some \({x}^{(t)}\) selected from a set of possible inputs \({\cal{X}}\). In response, the system emits some \({y}^{(t)}\) from a set of possible outputs \({\cal{Y}}\). For each possible bi-infinite input sequence \(\mathop{x}\limits^{\leftrightarrow}=\ldots \,{x}^{(-1)}{x}^{(0)}{x}^{(1)}\ldots\), the output of the system can be described by a stochastic process, \(\mathop{Y}\limits^{\leftrightarrow}=\ldots \,{Y}^{(-1)}{Y}^{(0)}{Y}^{(1)}\ldots\), a bi-infinite string of random variables where each \({Y}^{(t)}\) governs the output \({y}^{(t)}\). The black-box behavior of any input–output process is characterized by a family of stochastic processes, \({\{\mathop{Y}\limits^{\leftrightarrow}|\mathop{x}\limits^{\leftrightarrow}\}}_{\mathop{x}\limits^{\leftrightarrow}\in \mathop{X}\limits^{\leftrightarrow}}\). When the input \(\mathop{x}\limits^{\leftrightarrow}\) is governed by some stochastic process \(\mathop{X}\limits^{\leftrightarrow}\), the input–output process outputs \(\mathop{y}\limits^{\leftrightarrow}\) with probability

$$P[\mathop{Y}\limits^{\leftrightarrow}=\mathop{y}\limits^{\leftrightarrow}]=\sum _{\mathop{x}\limits^{\leftrightarrow}}P[\mathop{Y}\limits^{\leftrightarrow}=\mathop{y}\limits^{\leftrightarrow}|\mathop{X}\limits^{\leftrightarrow}=\mathop{x}\limits^{\leftrightarrow}]P[\mathop{X}\limits^{\leftrightarrow}=\mathop{x}\limits^{\leftrightarrow}].$$
(1)

Therefore, an input–output process acts as a transducer on stochastic processes—it takes one stochastic process as input, and transforms it into another.
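As a hedged illustration of Eq. (1) restricted to a finite window of L time-steps, the Python sketch below marginalizes a conditional output distribution over an input distribution. The toy process `noisy_copy` (each output copies the current input and is flipped with probability `e`) and the i.i.d. input parameter `u` are hypothetical choices made only for illustration; they are not processes from the text.

```python
from itertools import product


def output_distribution(cond, input_dist, length, outputs=(0, 1)):
    """Finite-window version of Eq. (1): P[Y=y] = sum_x P[Y=y | X=x] P[X=x]."""
    return {
        y: sum(cond(y, x) * px for x, px in input_dist.items())
        for y in product(outputs, repeat=length)
    }


def noisy_copy(y, x, e=0.1):
    """Hypothetical toy process: each output equals the current input,
    flipped with probability e, independently at every time-step."""
    prob = 1.0
    for yt, xt in zip(y, x):
        prob *= (1 - e) if yt == xt else e
    return prob


# Hypothetical i.i.d. input process: x = 1 with probability u at every time-step.
u, L = 0.3, 2
inputs = {
    x: u ** sum(x) * (1 - u) ** (L - sum(x))
    for x in product((0, 1), repeat=L)
}
P_out = output_distribution(noisy_copy, inputs, L)
```

Because this toy process is memoryless, each output equals 1 with probability u(1−e) + (1−u)e = 0.34, so `P_out[(1, 1)]` should be 0.34² ≈ 0.1156.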

Computational mechanics typically assumes processes are causal and stationary. Causality requires that future inputs do not retroactively affect past outputs. That is, for all \(L\in {{\Bbb{Z}}}^{+}\), we require \(P\left({Y}^{(t:t+L)}|\mathop{X}\limits^{\leftrightarrow}\right)=P({Y}^{(t:t+L)}|{\overleftarrow{X}}^{(t+L+1)})\), where \({\overleftarrow{X}}^{(t+L+1)}=\ldots \,{X}^{(t+L-1)}\,{X}^{(t+L)}\) and \({Y}^{(t:t+L)}={Y}^{(t)}\ldots {Y}^{(t+L)}\). This naturally bipartitions each stochastic process \(\mathop{Y}\limits^{\leftrightarrow}\) into two halves, \({\overleftarrow{y}}^{(t)}=\ldots {y}^{(t-2)}{y}^{(t-1)}\), representing events in the past of t, and \({\overrightarrow{y}}^{(t)}={y}^{(t)}{y}^{(t+1)}\ldots\), describing events in the future, governed, respectively, by \({\overleftarrow{Y}}^{(t)}\) and \({\overrightarrow{Y}}^{(t)}\). Stationarity implies the process is invariant with respect to time translation, such that \(P({Y}^{(t:t+L)}|{\mathop{x}\limits^{\leftrightarrow}}^{(t)})=P({Y}^{(0:L)}|{\mathop{x}\limits^{\leftrightarrow}}^{(0)})\) and \(P({\mathop{Y}\limits^{\leftrightarrow}}^{(t)}|{\mathop{x}\limits^{\leftrightarrow}}^{(t)})=P({\mathop{Y}\limits^{\leftrightarrow}}^{(0)}|{\mathop{x}\limits^{\leftrightarrow}}^{(0)})\), where \({\mathop{x}\limits^{\leftrightarrow}}^{(t)}=\ldots {x}^{(t-1)}{x}^{(t)}{x}^{(t+1)}\ldots\) is governed by \({\mathop{X}\limits^{\leftrightarrow}}^{(t)}\). Hence, we can take the present to be t = 0 and omit the superscript (t).

Each instance of an input–output process has some specific past \(\overleftarrow{z}=(\overleftarrow{x},\,\overleftarrow{y})\). On future input \(\overrightarrow{x}\), the process will then exhibit a corresponding conditional future governed by \(P\left[\overrightarrow{Y}|\overrightarrow{X}=\overrightarrow{x},\,\overleftarrow{Z}=\overleftarrow{z}\right]\). A mathematical model of the process should replicate its future black-box behavior when given information about the past. That is, each model records \(s(\overleftarrow{z})\) in some physical memory \(\Xi\) in place of \(\overleftarrow{z}\), such that upon future input \(\overrightarrow{x}\) the model can generate a random variable \(\overrightarrow{Y}\) according to \(P[\overrightarrow{Y}|\overrightarrow{x},\,\overleftarrow{z}]\) (see Fig. 1).

Fig. 1

Modeling a general input–output process. Each instance of an input–output process features some specific sequence of past inputs (a) and past outputs (b). A model of such a process describes a systematic method of storing relevant information within a physical system (c), such that for any future input (d), it can replicate the correct statistical output (e).

Simplest classical models

Numerous mathematical models exist for each input–output process. A brute force approach involves storing all past inputs and outputs. This is clearly inefficient. Consider the trivial input–output process that outputs a completely random sequence regardless of input. Storing all past information would take an unbounded amount of memory. Yet, this process can be simulated by flipping an unbiased coin—requiring no information about the past.

A more refined approach reasons that replicating future behavior does not require differentiating pasts with statistically identical future behavior. Formally, we define the equivalence relation \(\overleftarrow{z}{\sim }_{\varepsilon }\overleftarrow{z}^{\prime}\) whenever two pasts, \(\overleftarrow{z}\) and \(\overleftarrow{z}^{\prime}\), exhibit statistically coinciding future input–output behavior, i.e., whenever \(P\left[\overrightarrow{Y}|\overrightarrow{X},\,\overleftarrow{Z}=\overleftarrow{z}\right]=P\left[\overrightarrow{Y}|\overrightarrow{X},\,\overleftarrow{Z}=\overleftarrow{z}^{\prime}\right]\). This partitions the space of all pasts into equivalence classes \({\cal{S}}=\{{s}_{i}\}\). Each \({s}_{i}\in {\cal{S}}\) is known as a causal state, and ε denotes the encoding function that maps each past to its corresponding causal state. In general, a process can have an infinite number of causal states. Classical studies generally concentrate on cases where \(n=|{\cal{S}}|\) is finite, and we likewise focus our presentation on such cases.
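The equivalence relation \({\sim }_{\varepsilon }\) can be sketched in Python by grouping pasts according to their conditional future statistics. Two assumptions keep the sketch finite: pasts are truncated to a fixed length, and futures are compared only up to a finite horizon, so it only approximates the true causal-state partition. For concreteness, the hypothetical oracle `coin_future` encodes the actively perturbed coin of the Example below (x = 1 flips with probability p, x = 0 with probability q), with illustrative values p = 0.2 and q = 0.4; all function names are illustrative.

```python
from itertools import product


def coin_future(past, xs, p=0.2, q=0.4):
    """Oracle for P[y^(0:h-1) | x^(0:h-1), past] of the actively perturbed coin.
    `past` is a tuple of (x, y) pairs; its last output is the current coin face."""
    face = past[-1][1]
    probs = []
    for ys in product((0, 1), repeat=len(xs)):
        prob, cur = 1.0, face
        for x, y in zip(xs, ys):
            flip = p if x == 1 else q        # flip probability for this input
            prob *= flip if y != cur else 1 - flip
            cur = y
        probs.append(prob)
    return tuple(probs)


def causal_partition(pasts, future, horizon, inputs=(0, 1), digits=12):
    """Group pasts whose conditional futures coincide for every future input
    word of length `horizon` (a truncated stand-in for ~_epsilon)."""
    classes = {}
    for past in pasts:
        signature = tuple(
            tuple(round(v, digits) for v in future(past, xs))
            for xs in product(inputs, repeat=horizon)
        )
        classes.setdefault(signature, []).append(past)
    return list(classes.values())


pairs = list(product((0, 1), repeat=2))   # all (x, y) pairs
pasts = list(product(pairs, repeat=2))    # all length-2 pasts
states = causal_partition(pasts, coin_future, horizon=2)
```

With p, q ≠ 0.5, the sixteen length-2 pasts should collapse into two classes distinguished only by the last output, matching the two causal states described in the Example; with p = q = 0.5 they collapse into one.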

This motivates the ε-transducer, which stores the causal state \(\varepsilon (\overleftarrow{z})\) in place of \(\overleftarrow{z}\). It then operates according to the transition elements

$${T}_{ij}^{y|x}=P[{S}^{(t)}={s}_{j},\,{Y}^{(t)}=\,y|{S}^{(t-1)}={s}_{i},\,{X}^{(t)}=x];$$
(2)

the probability that a transducer in causal state \({s}_{i}\in {\cal{S}}\) will transition to causal state \({s}_{j}\in {\cal{S}}\) while emitting \(y\in {\cal{Y}}\), conditioned on receiving input \(x\in {\cal{X}}\). Note that this construction is naturally unifiliar: given the state of the transducer at the current time-step, its state at the subsequent time-step can be completely deduced by observing the next input–output pair.3 Thus, iterating this procedure generates output behavior statistically identical to that of the original input–output process.
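The following Python sketch illustrates how an ε-transducer can be run from its transition elements. It stores \({T}_{ik}^{y|x}\) in a dictionary keyed by `(i, k, y, x)` (an illustrative convention, not one from the text), checks unifiliarity, and samples outputs step by step. The hypothetical helper `coin_transitions` builds the transition elements of the actively perturbed coin from the Example below, assuming, as stated there, that x = 1 flips the coin with probability p and x = 0 with probability q.

```python
import random


def coin_transitions(p, q):
    """T[(i, k, y, x)] = T_ik^{y|x} for the actively perturbed coin."""
    T = {}
    for i in (0, 1):
        for x in (0, 1):
            flip = p if x == 1 else q
            T[(i, i, i, x)] = 1 - flip        # no flip: output and next state keep face i
            T[(i, 1 - i, 1 - i, x)] = flip    # flip: output and next state become 1 - i
    return T


def is_unifilar(T):
    """True if (current state, input, output) always determines the next state."""
    successor = {}
    for (i, k, y, x), prob in T.items():
        if prob > 0 and successor.setdefault((i, x, y), k) != k:
            return False
    return True


def simulate(T, state, xs, seed=0):
    """Sample outputs y^(t) for the input sequence xs, starting in causal state `state`."""
    rng = random.Random(seed)
    ys = []
    for x in xs:
        options = [(y, k, prob) for (i, k, y, x2), prob in T.items()
                   if i == state and x2 == x and prob > 0]
        y, state, _ = rng.choices(options, weights=[o[2] for o in options])[0]
        ys.append(y)
    return ys


T = coin_transitions(p=0.2, q=0.4)
ys = simulate(T, state=0, xs=[1] * 20000)
```

With x = 1 at every step, the empirical fraction of steps on which the coin flips should approach p = 0.2.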

For each stationary input process \(\mathop{X}\limits^{\leftrightarrow}\), causal state \({s}_{i}\) occurs with probability \({p}_{X}(i)=P[\varepsilon (\overleftarrow{z})={s}_{i}]\). The ε-transducer thus exhibits internal entropy

$${C}_{X}=-\sum _{i}{p}_{X}(i)\,\mathrm{log}\,{p}_{X}(i).$$
(3)

ε-transducers are the provably simplest classical models: any other encoding function \(s(\overleftarrow{z})\) that generates correct future statistics will exhibit greater entropy.3

Complexity theorists regard \({C}_{X}\) as a quantifier of complexity,3 the rationale being that it characterizes the minimum memory any model must store when simulating \({\{\mathop{Y}\limits^{\leftrightarrow}|\mathop{x}\limits^{\leftrightarrow}\}}_{\mathop{x}\limits^{\leftrightarrow}\in \mathop{X}\limits^{\leftrightarrow}}\) on input \(\mathop{X}\limits^{\leftrightarrow}\). More precisely, consider the simulation of N such input–output processes, each driven by \(\mathop{X}\limits^{\leftrightarrow}\). \({C}_{X}\) then specifies that, in the asymptotic limit \(N\to \infty\), we can use the ε-transducer to replicate the future statistics of the ensemble by storing the past within a system of \(N{C}_{X}\) bits.
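Under the additional assumption that the input process is i.i.d. (as for the input process \({\mathop{X}\limits^{\leftrightarrow}}_{u}\) in the Example below), the causal states form a Markov chain with transition matrix \({M}_{ik}={\sum }_{x,y}P(x)\,{T}_{ik}^{y|x}\), so \({p}_{X}(i)\) and \({C}_{X}\) from Eq. (3) can be sketched in Python as follows. Correlated input processes would also require tracking the input's own memory, which this sketch does not attempt; the function names and the `(i, k, y, x)` dictionary convention are illustrative.

```python
import math


def coin_transitions(p, q):
    """T[(i, k, y, x)] = T_ik^{y|x}; x = 1 flips with probability p, x = 0 with q."""
    T = {}
    for i in (0, 1):
        for x in (0, 1):
            flip = p if x == 1 else q
            T[(i, i, i, x)] = 1 - flip
            T[(i, 1 - i, 1 - i, x)] = flip
    return T


def stationary_distribution(T, n, input_probs, iters=100_000, tol=1e-14):
    """Power iteration for p_X(i), assuming i.i.d. inputs with P(x) = input_probs[x].
    (Power iteration may fail to converge for periodic chains.)"""
    M = [[0.0] * n for _ in range(n)]
    for (i, k, y, x), prob in T.items():
        M[i][k] += input_probs[x] * prob
    pi = [1.0 / n] * n
    for _ in range(iters):
        new = [sum(pi[i] * M[i][k] for i in range(n)) for k in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def entropy_bits(probs):
    """Shannon entropy in bits, i.e. C_X of Eq. (3) with a base-2 logarithm."""
    return -sum(pr * math.log2(pr) for pr in probs if pr > 0)


u = 0.3   # illustrative input process X_u: x = 1 with probability u
p_X = stationary_distribution(coin_transitions(0.2, 0.4), 2, {0: 1 - u, 1: u})
C_X = entropy_bits(p_X)
```

For the perturbed coin, the symmetric transition elements should give equiprobable causal states and hence \({C}_{X}=1\) bit, regardless of the illustrative values chosen.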

In the special case where the input–output process is input independent (i.e., \(\overrightarrow{Y}|\mathop{x}\limits^{\leftrightarrow}\) is the same for all \(\mathop{x}\limits^{\leftrightarrow}\)), the causal states reduce to equivalence classes on the set of past outputs \(\overleftarrow{{\cal{Y}}}\). The transition elements \({T}_{ij}^{y|x}\) then become independent of x and are denoted \({T}_{ij}^{y}\). Here, ε-transducers are known as ε-machines and \({C}_{X}\) as the statistical complexity. This measure has been applied extensively to quantify the structure of various stochastic processes.12–15

For general input–output processes, \({C}_{X}\) is \(\mathop{X}\limits^{\leftrightarrow}\)-dependent and is known as the input-dependent statistical complexity. In certain pathological cases (e.g., supplying the same input at every time-step), the transducer may have zero probability of being in a particular causal state, potentially leading to a drastic reduction in \({C}_{X}\). Here, we consider non-pathological inputs, such that the transducer has non-zero probability of being in each causal state, i.e., \({p}_{X}(i)>0\) for all i. It is also often useful to quantify the intrinsic structure of input–output processes without referring to a specific input process.3 One proposal is the structural complexity, \(\bar{C}={\sup }_{X}\,{C}_{X}\), which measures how much memory is required in the worst-case scenario.

Example

We illustrate these ideas with a simple example: an actively perturbed coin. Consider a box with two buttons, containing a single coin. At each time-step, the box accepts a single bit \(x\in \{0,1\}\) as input, representing which of the two buttons is pressed. In response, it flips the coin with probability p if x = 1, and with probability q if x = 0, where 0 < p, q < 1. The box then outputs the new state of the coin, y. The behavior of the box is described by an input–output process.

First note that when p = q = 0.5, the output of the device becomes completely random, all pasts collapse to a single causal state, and the statistical complexity is trivially zero. In all other cases, the past partitions into two causal states, \({s}_{k}=\{\overleftarrow{z}:{y}^{(-1)}=k\}\), k = 0,1, corresponding to the two possible outcomes of the previous coin toss. Future statistics can then be generated via appropriate transition elements. For details, see the graphical representation in Fig. 2.

Fig. 2

ε-transducer for the actively perturbed coin. Here, node 0 (1) is identified with causal state \({s}_{0}\) (\({s}_{1}\)) and represents pasts where the last coin toss resulted in tails (heads). Each non-zero transition element \({T}_{kj}^{y|x}\) is represented by a directed edge from node k to j labeled by ‘\(y|x:{T}_{kj}^{y|x}\)’. Future statistics can then be generated by the transition elements \({T}_{01}^{1|1}={T}_{10}^{0|1}=p\), \({T}_{00}^{0|1}={T}_{11}^{1|1}=1-p\), \({T}_{01}^{1|0}={T}_{10}^{0|0}=q\), \({T}_{00}^{0|0}={T}_{11}^{1|0}=1-q\).

Consider a simple input process \({\mathop{X}\limits^{\leftrightarrow}}_{u}\) where x = 1 is chosen with some fixed probability u at each time-step. The symmetry of the transition elements implies that \({s}_{0}\) and \({s}_{1}\) occur with equal probability. Thus \({C}_{{X}_{u}}=1\). Furthermore, as this is the maximum entropy a two-state machine can have, the classical structural complexity of the actively perturbed coin, \(\bar{C}\), is also 1.

Classical inefficiency

Even though the ε-transducer is classically optimal, it may still store unnecessary information. Consider the example above. The ε-transducer must store the last state of the coin (i.e., whether the past is in \({s}_{0}\) or \({s}_{1}\)). However, irrespective of input x, both \({s}_{0}\) and \({s}_{1}\) have non-zero probability of transitioning to the same \({s}_{k}\) while emitting the same output y. Once this happens, some of the information used to perfectly discriminate between \({s}_{0}\) and \({s}_{1}\) is irrevocably lost: there exists no systematic method to perfectly retrodict whether an ε-transducer was initialized in \({s}_{0}\) or \({s}_{1}\) from its future behavior, regardless of how we choose future inputs. Thus, the transducer appears to store information that will never be reflected in future observations, and is therefore wasted.

We generalize this observation by introducing step-wise inefficiency. Consider an ε-transducer equipped with causal states \({\cal{S}}\) and transition elements \({T}_{ij}^{y|x}\). Suppose there exist \({s}_{i},\,{s}_{j}\in {\cal{S}}\) such that, irrespective of input x, both \({s}_{i}\) and \({s}_{j}\) have non-zero probability of transitioning to some coinciding \({s}_{k}\in {\cal{S}}\) while emitting a coinciding output \(y\in {\cal{Y}}\); i.e., for all \(x\in {\cal{X}}\) there exist \(y\in {\cal{Y}},\,{s}_{k}\in {\cal{S}}\) such that both \({T}_{ik}^{y|x}\) and \({T}_{jk}^{y|x}\) are non-zero. This implies that at the subsequent time-step, it is impossible to infer with certainty which of the two causal states the transducer was previously in. Thus, some of the information stored during the previous time-step is inevitably lost. We refer to any ε-transducer that exhibits this condition as being step-wise inefficient.
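The step-wise inefficiency condition can be checked mechanically from the transition elements. In the Python sketch below (using an illustrative dictionary convention `T[(i, k, y, x)]` for \({T}_{ik}^{y|x}\) and hypothetical function names), a pair \(({s}_{i},{s}_{j})\) is flagged when, for every input x, some output y and successor \({s}_{k}\) are reachable from both states.

```python
from itertools import combinations


def coin_transitions(p, q):
    """T[(i, k, y, x)] = T_ik^{y|x}; x = 1 flips with probability p, x = 0 with q."""
    T = {}
    for i in (0, 1):
        for x in (0, 1):
            flip = p if x == 1 else q
            T[(i, i, i, x)] = 1 - flip
            T[(i, 1 - i, 1 - i, x)] = flip
    return T


def can_merge(T, i, j, x):
    """True if some (y, k) has both T_ik^{y|x} > 0 and T_jk^{y|x} > 0."""
    return any(
        prob > 0 and T.get((j, k, y, x), 0) > 0
        for (i2, k, y, x2), prob in T.items()
        if i2 == i and x2 == x
    )


def stepwise_inefficient_pairs(T, states, inputs):
    """All pairs of causal states that can merge under every input."""
    return [
        (i, j) for i, j in combinations(states, 2)
        if all(can_merge(T, i, j, x) for x in inputs)
    ]


print(stepwise_inefficient_pairs(coin_transitions(0.2, 0.4), (0, 1), (0, 1)))  # [(0, 1)]
```

By contrast, a hypothetical deterministic period-2 process with a single input, `{(0, 1, 0, 0): 1.0, (1, 0, 1, 0): 1.0}`, yields no such pair, since its two states never emit the same output while moving to the same successor.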

Results

Quantum processing can mitigate this inefficiency. Whenever the ε-transducer of a given input–output process is step-wise inefficient, we can construct a quantum transducer that is provably simpler. Our construction assigns each \({s}_{i}\) a corresponding quantum causal state

$$\left|{s}_{i}\right\rangle =\mathop{\otimes }\limits_{x}\left|{s}_{i}^{x}\right\rangle,\,\,{\rm{with}}\,\,\left|{s}_{i}^{x}\right\rangle =\sum _{k}\sum _{y}\sqrt{{T}_{ik}^{y|x}}\left|y\right\rangle \left|k\right\rangle,$$
(4)

where \({\otimes }_{x}\) represents the tensor product over all possible inputs \(x\in {\cal{X}}\), while \(\left|y\right\rangle\) and \(\left|k\right\rangle\) denote orthonormal bases for spaces of dimension \(|{\cal{Y}}|\) and \(|{\cal{S}}|\), respectively. The set of quantum causal states \(\{\left|{s}_{i}\right\rangle\}_{i}\) forms a state space for the quantum transducer. Note that although each state is represented as a vector of dimension \({(|{\cal{Y}}||{\cal{S}}|)}^{|{\cal{X}}|}\), in practice the states span a Hilbert space of dimension at most \(n=|{\cal{S}}|\) whenever \(|{\cal{S}}|\) is finite. In such scenarios, the quantum causal states can always be stored losslessly within an n-dimensional quantum system (see the subsequent example and Methods).
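Because every amplitude in Eq. (4) is a non-negative real number, the overlap of two quantum causal states factorizes as \(\langle {s}_{i}|{s}_{j}\rangle ={\prod }_{x}{\sum }_{y,k}\sqrt{{T}_{ik}^{y|x}{T}_{jk}^{y|x}}\). The Python sketch below builds the components \(\left|{s}_{i}^{x}\right\rangle\) as sparse dictionaries (an illustrative representation with hypothetical function names) and evaluates this product for the actively perturbed coin, assuming x = 1 flips with probability p and x = 0 with probability q.

```python
import math


def coin_transitions(p, q):
    """T[(i, k, y, x)] = T_ik^{y|x}; x = 1 flips with probability p, x = 0 with q."""
    T = {}
    for i in (0, 1):
        for x in (0, 1):
            flip = p if x == 1 else q
            T[(i, i, i, x)] = 1 - flip
            T[(i, 1 - i, 1 - i, x)] = flip
    return T


def quantum_causal_state(T, i, inputs):
    """Components of Eq. (4): {x: {(y, k): sqrt(T_ik^{y|x})}}; |s_i> is their tensor product."""
    components = {x: {} for x in inputs}
    for (i2, k, y, x), prob in T.items():
        if i2 == i and prob > 0:
            components[x][(y, k)] = math.sqrt(prob)
    return components


def overlap(a, b):
    """<s_i|s_j> = prod_x <s_i^x|s_j^x>; all amplitudes are real and non-negative."""
    result = 1.0
    for x in a:
        result *= sum(amp * b[x].get(key, 0.0) for key, amp in a[x].items())
    return result


T = coin_transitions(p=0.2, q=0.4)
s0, s1 = (quantum_causal_state(T, i, inputs=(0, 1)) for i in (0, 1))
```

Each factor is a sum of non-negative terms, so the overlap is strictly positive exactly when, for every input x, the two states share some (y, k) with non-zero transition elements, which is the step-wise inefficiency condition for that pair. For the coin, it should equal \(4\sqrt{pq(1-p)(1-q)}\).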

For an input process \(\mathop{X}\limits^{\leftrightarrow}\), the quantum transducer thus has input-dependent complexity of

$${{\cal{Q}}}_{X}=-{\rm{Tr}}[{\rho }_{X}\,\mathrm{log}\,{\rho }_{X}],$$
(5)

where \({\rho }_{X}={\sum }_{i}{p}_{X}(i)\,\left|{s}_{i}\right\rangle \left\langle {s}_{i}\right|\). In general, \({{\cal{Q}}}_{X}\le {C}_{X}\).16 Note that when there is only one possible input, \(\left|{s}_{i}\right\rangle = \left|{s}_{i}^{0}\right\rangle\). This recovers existing quantum ε-machines, which can model autonomously evolving stochastic processes more simply than their simplest classical counterparts.8 Our transducers generalize this result to input–output processes. We show that they have the following properties:

  1. Correctness: For any past \(\overleftarrow{z}\) in causal state \({s}_{i}\), a quantum transducer initialized in state \(\left|{s}_{i}\right\rangle\) can exhibit correct future statistical behavior. That is, there exists a systematic protocol that, when given \(\left|{s}_{i}\right\rangle\), generates \(\overrightarrow{y}\) according to \(P[\overrightarrow{Y}|\overrightarrow{X}=\overrightarrow{x},\,\overleftarrow{Z}=\overleftarrow{z}]\) for each possible sequence of future inputs \(\overrightarrow{x}\).

  2. Reduced complexity: \({Q}_{X}<{C}_{X}\) for all non-pathological input processes \(\mathop{X}\limits^{\leftrightarrow}\) whenever the process has a step-wise inefficient ε-transducer.

  3. Generality: Quantum transducers require less memory whenever it is physically possible to do so. Given an input–output process, either \({Q}_{X}<{C}_{X}\) for all non-pathological \(\mathop{X}\limits^{\leftrightarrow}\), or there exists no physically realizable model that achieves this.
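For two quantum causal states, \({\rho }_{X}\) restricted to their span has trace 1 and determinant \({p}_{X}(0){p}_{X}(1)(1-|\langle {s}_{0}|{s}_{1}\rangle {|}^{2})\), so its eigenvalues are \((1\pm \sqrt{1-4{p}_{X}(0){p}_{X}(1)(1-|\langle {s}_{0}|{s}_{1}\rangle {|}^{2})})/2\). The Python sketch below uses this closed form to compare \({{\cal{Q}}}_{X}\) of Eq. (5) with \({C}_{X}\) of Eq. (3); it is valid only for two states (larger n would need a numerical eigensolver), and the function names are hypothetical.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits of a probability (or eigenvalue) list."""
    return -sum(pr * math.log2(pr) for pr in probs if pr > 0)


def two_state_complexities(p0, overlap):
    """Return (C_X, Q_X) for causal states occurring with probabilities (p0, 1 - p0)
    whose quantum causal states satisfy |<s_0|s_1>| = overlap."""
    p1 = 1 - p0
    disc = math.sqrt(max(0.0, 1 - 4 * p0 * p1 * (1 - overlap ** 2)))
    eigenvalues = ((1 + disc) / 2, (1 - disc) / 2)
    return entropy_bits((p0, p1)), entropy_bits(eigenvalues)


C, Q = two_state_complexities(0.5, 0.5)
```

With equiprobable states and overlap 0.5, the eigenvalues are 0.75 and 0.25, so \({C}_{X}=1\) while \({{\cal{Q}}}_{X}\approx 0.811\). Orthogonal states recover \({{\cal{Q}}}_{X}={C}_{X}\), and identical states give \({{\cal{Q}}}_{X}=0\).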

Correctness guarantees that quantum transducers behave statistically identically to their classical counterparts, and are thus operationally indistinguishable from the input–output processes they simulate. The proof is by explicit construction (see Fig. 3 and details in Methods).

Fig. 3

The quantum circuit that illustrates how a quantum transducer, initialized in \(\left|{s}_{i}\right\rangle\) at time t−1, simulates future behavior when supplied with an input sequence \({x}^{(t)}{x}^{(t+1)}\ldots\). Upon receiving x at time-step t, it applies a selection operator \({S}_{x}:\left|{s}_{i}\right\rangle \to \left|{s}_{i}^{x}\right\rangle\), followed by the quantum operation A that takes each \(\left|{s}_{i}^{x}\right\rangle \left\langle {s}_{i}^{x}\right|\) to the bipartite state \({\sum }_{y,k}{T}_{ik}^{y|x}\left|y\right\rangle \left|{s}_{k}\right\rangle \left\langle y\right|\left\langle {s}_{k}\right|\), with bipartitions Σ, spanned by \(\left|y\right\rangle\), and \({\mathcal{W}}\), spanned by \(\left|{s}_{k}\right\rangle\) (this is always possible with suitable ancillas; see Methods). Σ is emitted as output, while \({\mathcal{W}}\) is retained as the quantum causal state at the next time-step. Measurement of Σ in the \(\{\left|y\right\rangle\}\) basis by any outside observer yields outcome \({y}^{(t)}\). Iterating this procedure then generates correct outputs at each future time-step.

Reduced complexity implies that step-wise inefficiency is sufficient for quantum transducers to be simpler than their classical counterparts. The proof involves showing that if, for every potential input x, both \({s}_{i}\) and \({s}_{j}\) have non-zero probability of transitioning to some coinciding causal state \({s}_{k}\) while emitting identical y, then \(\left|{s}_{i}\right\rangle\) and \(\left|{s}_{j}\right\rangle\) are non-orthogonal (see Methods). Thus, provided two such causal states exist (guaranteed by step-wise inefficiency of the transducer), and each occurs with non-zero probability (guaranteed by non-pathology of the input), \({Q}_{X}<{C}_{X}\).

Generality establishes that step-wise inefficiency is a necessary condition for any physically realizable quantum model to outperform its classical counterpart. Combined with reduced complexity, this implies that step-wise inefficiency is the sole source of avoidable classical inefficiency, and that our particular quantum transducers are general in mitigating this inefficiency. The proof is detailed in Methods, and involves showing that any model which improves upon a step-wise efficient ε-transducer would allow perfect discrimination of non-orthogonal quantum states.

Together, these results isolate step-wise inefficiency as the necessary and sufficient condition for quantum models to be simpler than their classical counterparts, and furthermore, establish an explicit construction of such a model. It follows that whenever the ε-transducer is step-wise inefficient, the upper-bound,

$$\overline{Q}=\mathop{\sup }\limits_{X}\,{Q}_{X},$$
(6)

will be strictly less than \(\bar{C}={\sup }_{X}\,{C}_{X}\), provided \({\sup }_{X}\,{C}_{X}\) is attained for a non-pathological \(\mathop{X}\limits^{\leftrightarrow}\). Intuitively, this clause appears natural: an agent wishing to drive a transducer to exhibit the greatest entropy would generally benefit from ensuring that the transducer has non-zero probability of being in each causal state. Nevertheless, as the maximization is highly non-trivial to evaluate, this remains an open conjecture.

Example revisited

We illustrate these results for the aforementioned actively perturbed coin. Recall that the ε-transducer of this process features two causal states, s 0 and s 1 (see Fig. 2). As this transducer is step-wise inefficient, a more efficient quantum transducer exists. Specifically, set the quantum causal states to

$$\left|{\tau }_{0}\right\rangle =\sqrt{r}\left|0\right\rangle +\sqrt{1-r}\left|1\right\rangle,\quad \quad \left|{\tau }_{1}\right\rangle =\left|0\right\rangle,$$
(7)

where r = 16pq(1−p)(1−q). Note that while these states do not resemble the standard form in Eq. (4), they are unitarily equivalent to states of that form. Given \(\{\left|{\tau }_{i}\right\rangle\}_{i}\), we can initialize the joint four-qubit state \(\left|{\tau }_{i}000\right\rangle\) (with three ancilla qubits), and implement an appropriate 4-qubit unitary \(U:\left|{\tau }_{i}000\right\rangle \to \left|{s}_{i}\right\rangle\) for i = 0, 1, where

$$\left|{s}_{0}\right\rangle = \left|{\phi }_{p}\right\rangle _{12}\otimes \left|{\phi }_{q}\right\rangle_{34},\quad \quad \left|{s}_{1}\right\rangle ={\hat{X}}_{1}{\hat{X}}_{2}{\hat{X}}_{3}{\hat{X}}_{4}\left|{s}_{0}\right\rangle$$

are of standard form. Here subscripts 1…4 label each of the four qubits, \(\left|{\phi }_{p}\right\rangle =\sqrt{1-p}\left|00\right\rangle +\sqrt{p}\left|11\right\rangle\) and \(\left|{\phi }_{q}\right\rangle =\sqrt{1-q}\left|00\right\rangle +\sqrt{q}\left|11\right\rangle\), and \({\hat{X}}_{i}\) represents the Pauli X operator on the ith qubit. The \(\left|{\tau }_{i}\right\rangle\) representation makes it clear that these states can be encoded within a single qubit. Figure 4a then outlines a quantum circuit that generates the desired future behavior.
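Two families of pure states with identical Gram matrices are related by an isometry, which is what guarantees the existence of U above. The Python sketch below is a plain-list check rather than a circuit construction: it builds the four-qubit states \(\left|{s}_{0}\right\rangle\) and \(\left|{s}_{1}\right\rangle\) explicitly and compares \(\langle {s}_{0}|{s}_{1}\rangle\) with \(\langle {\tau }_{0}|{\tau }_{1}\rangle =\sqrt{r}\) for illustrative values p = 0.2 and q = 0.4. The helper names are hypothetical.

```python
import math


def kron(a, b):
    """Tensor (Kronecker) product of two state vectors given as lists."""
    return [x * y for x in a for y in b]


def phi(f):
    """|phi_f> = sqrt(1-f)|00> + sqrt(f)|11> in the basis |00>, |01>, |10>, |11>."""
    return [math.sqrt(1 - f), 0.0, 0.0, math.sqrt(f)]


def flip_all(v):
    """Pauli X on every qubit maps basis index b to its bitwise complement,
    which for a full register is the same as reversing the vector."""
    return v[::-1]


def inner(a, b):
    """Inner product for real amplitudes only."""
    return sum(x * y for x, y in zip(a, b))


p, q = 0.2, 0.4
s0 = kron(phi(p), phi(q))     # qubits 1-2 carry phi_p, qubits 3-4 carry phi_q
s1 = flip_all(s0)             # X_1 X_2 X_3 X_4 |s_0>
r = 16 * p * q * (1 - p) * (1 - q)
tau0, tau1 = [math.sqrt(r), math.sqrt(1 - r)], [1.0, 0.0]
```

Both pairs should be normalized with the same overlap \(\sqrt{r}\), so the single-qubit encoding \(\{\left|{\tau }_{i}\right\rangle\}\) loses no information about the standard-form states.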

Fig. 4

The quantum transducer for the perturbed coin can generate appropriate future statistics via the quantum circuit in a. Suppose a transducer, in state \(\left|{\tau }_{i}\right\rangle\), receives input x at time t. To generate output y, it transforms \(\left|{\tau }_{i}\right\rangle\) to the 4-qubit quantum state \(\left|{s}_{i}\right\rangle\) by applying an appropriate unitary U to \(\left|{\tau }_{i}\right\rangle\) and three ancilla qubits in state \(\left|000\right\rangle\). The transducer then discards qubits one and two if x = 0, or qubits three and four if x = 1. The two remaining qubits, labeled \({B}_{1}\) and \({B}_{2}\), are subsequently transformed by a unitary V that maps \(\left|00\right\rangle\) to \(\left|0{\tau }_{0}\right\rangle\) and \(\left|11\right\rangle\) to \(\left|1{\tau }_{1}\right\rangle\) (this is always possible as \(\langle 0{\tau }_{0}|1{\tau }_{1}\rangle =0\)). \({B}_{1}\) is emitted as output while \({B}_{2}\) is retained by the transducer as the causal state for the subsequent time-step. Measurement of \({B}_{1}\) in the computational basis yields y. Iteration of this procedure replicates correct future input–output statistics. The resulting improved efficiency is highlighted in b, which depicts the maximum memory required by a quantum transducer, \(\bar{Q}\) (orange surface), to simulate the actively perturbed coin vs. its classical counterpart, the structural complexity \(\bar{C}\) (blue surface), for various p and q. While the ε-transducer generally requires 1 bit of memory, the quantum transducer requires less, and becomes increasingly more efficient as \(p,q\to 0.5\).

The resulting quantum transducer is more efficient: \(|\langle {\tau }_{0}|{\tau }_{1}\rangle |=\sqrt{r}>0\) whenever 0 < p, q < 1. Thus \({Q}_{X}<{C}_{X}\) for all input processes \(\mathop{X}\limits^{\leftrightarrow}\). Furthermore, the quantum structural complexity \(\bar{Q}={\sup }_{X}\,{Q}_{X}\) is attained for any input process in which \(\left|{\tau }_{0}\right\rangle\) and \(\left|{\tau }_{1}\right\rangle\) occur with equal probability, such as any \({\mathop{X}\limits^{\leftrightarrow}}_{u}\). The improvement is significant and can be evaluated explicitly (see Fig. 4b). In particular, \({\lim }_{p,q\to 0.5}{Q}_{X}/{C}_{X}=0\). Thus, in limiting cases, the quantum transducer can use negligible memory compared to its classical counterpart.
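The trend shown in Fig. 4b can be approximated with a few lines of Python. Since \(\bar{Q}\) is attained when \(\left|{\tau }_{0}\right\rangle\) and \(\left|{\tau }_{1}\right\rangle\) are equiprobable with overlap \(\sqrt{r}\), the eigenvalues of \({\rho }_{X}\) are \((1\pm \sqrt{r})/2\). The sketch below evaluates the resulting entropy for a few illustrative (p, q), to be compared with \(\bar{C}=1\); the function name is hypothetical.

```python
import math


def coin_quantum_complexity(p, q):
    """Q-bar for the actively perturbed coin: entropy (bits) of eigenvalues (1 +/- sqrt(r)) / 2,
    with r = 16 p q (1 - p)(1 - q)."""
    r = min(1.0, 16 * p * q * (1 - p) * (1 - q))
    lam = ((1 + math.sqrt(r)) / 2, (1 - math.sqrt(r)) / 2)
    return -sum(v * math.log2(v) for v in lam if v > 0)


for p, q in [(0.1, 0.1), (0.2, 0.4), (0.45, 0.45), (0.5, 0.5)]:
    print(p, q, round(coin_quantum_complexity(p, q), 4))
```

The printed values should decrease toward zero as p and q approach 0.5, while the classical \(\bar{C}\) stays at 1, consistent with \({\lim }_{p,q\to 0.5}{Q}_{X}/{C}_{X}=0\).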

The intuition is that as we approach this limit, the output of the process becomes progressively more random. Thus, the future black-box behavior of a process whose last output is heads becomes increasingly similar to that of one whose last output is tails. Knowing whether the past belongs to \({s}_{0}\) or \({s}_{1}\) then conveys progressively less information about future outputs. A classical transducer must nevertheless distinguish these two scenarios, and thus exhibits an entropy of 1 bit when the scenarios are equally likely. A quantum transducer, however, has the freedom to distinguish the two scenarios only partially, to the extent that they affect future statistics. In particular, \({\lim }_{p,q\to 0.5}|\langle {\tau }_{0}|{\tau }_{1}\rangle |=1\). As the process becomes more random, the quantum transducer saves memory by encoding the two scenarios in progressively less orthogonal quantum states.

Future directions

There is potential in viewing predictive modeling as a communication task, where Alice sends information about a process’s past to Bob, so that he may generate correct future statistical behavior.3, 9 The simpler a model, the less information Alice needs to communicate to Bob. The entropic benefits of harnessing non-orthogonal causal states mirror similar advantages in exploiting non-orthogonal codewords to perform certain communication tasks.17 Quantum transducers could thus identify a larger class of such tasks, and provide a general strategy to supersede classical limits. Meanwhile, one may also consider generalizations of statistical complexity that use other measures of entropy. The max entropy, for example, captures the minimum dimensionality required to simulate general input–output processes. This may complement existing work in quantum dimensionality testing,18–20 suggesting that the dimensionality of a system could be tested by observing how it transforms stochastic processes.

Another interesting question is how quantum transducers relate to quantum advantage in randomness processing.21 In this context, it was shown that quantum sources of randomness (named quoins) of the form \(\left|p\right\rangle =\sqrt{p}\left|0\right\rangle +\sqrt{1-p}\left|1\right\rangle\) can be a much more powerful resource than classical coins of bias p for sampling a coin with a p-dependent bias f(p). Subsequent experimental implementations have used quoins to sample certain f(p) that are impossible to synthesize when equipped with only p-coins.22 Quantum transducers appear to utilize similar effects. The quantum causal states in Eq. (4) also resemble a quantum superposition of classical measurement outcomes, which can be used to generate desired future output statistics more efficiently than classically possible. Is this resemblance merely superficial? Quantum transducers and the quantum Bernoulli factory certainly also have significant differences, both in how they quantify efficiency and in what they consider as input and output. As such, this question remains very much open.

There could also be considerable interest in establishing what resources underpin the performance advantage of quantum transducers. Non-orthogonal quantum causal states, and thus coherence, are clearly necessary. This non-orthogonality immediately implies that quantum correlations (in the sense of discord23) necessarily exist between the state of the transducer and its past outputs. Could the amount of such resources be related quantitatively to the quantum advantage of a particular transducer? Whether more stringent quantum resources, such as entanglement, are also required at some point to generate correct future statistics also remains an open question. Certainly, all quantum transducers described here exploit highly entangling operations to generate statistically correct future behavior, and involve a significant amount of entanglement during their operation. Is the existence of such entanglement at some stage during the simulation process essential for quantum advantage?

Discussion

In computational mechanics, ε-transducers are the provably simplest models of input–output processes. Their internal entropy is a quantifier of structure: any device capable of replicating the process’s behavior must track at least this much information. Here, we generalize this formalism to the quantum regime. We propose a systematic method to construct quantum transducers that are generally simpler than their simplest classical counterparts, in the sense that quantum transducers store less information whenever it is physically possible to do so. Our work indicates that the perceived complexity of input–output processes generally depends on what type of information theory we use to describe them.

A natural continuation is to explore the feasibility of such quantum transducers in real-world conditions. A proof-of-principle demonstration is well within reach of present-day technology. The quantum transducer for the actively perturbed coin can be implemented by a single qubit undergoing one of two different weak measurements at each time-step. Demonstrating a quantum advantage in real-world applications would also motivate new theory. For example, noise will certainly degrade the performance of quantum transducers, forcing the use of more distinguishable quantum causal states. Deriving bounds on the resulting entropic cost would thus help establish thresholds that guarantee quantum advantage in real-world conditions.

Ultimately, the interface between quantum and computational mechanics opens the potential for tools of each community to impact the other. Classical transducers, for example, are used to capture the emergence of complexity under evolutionary pressures,24 distil structure within cellular automata,25, 26 and characterize the dynamics of sequential learning,27 and it would be interesting to see how these ideas change in the quantum regime. Meanwhile, recent results suggest that the inefficiency of classical models may incur unavoidable thermodynamic costs.28–30 In reducing this inefficiency, quantum transducers could offer more energetically efficient methods of transforming information. Any such developments could demonstrate the impact of quantum technologies in domains where their use has not been considered before.

Methods

Definitions

Let \(X^{(t)}\), \(Y^{(t)}\) and \(S^{(t)}\) represent, respectively, the random variables governing the input, output and causal state at time t. Let \(Y^{(0:t)}=Y^{(0)}\ldots Y^{(t)}\) govern the outputs \(y^{(0:t)}=y^{(0)}\ldots y^{(t)}\) and \(X^{(0:t)}=X^{(0)}\ldots X^{(t)}\) govern the inputs \(x^{(0:t)}=x^{(0)}\ldots x^{(t)}\). We introduce ordered pairs \(Z^{(t)}=(X^{(t)},Y^{(t)})\), which take values \(z^{(t)}=(x^{(t)},y^{(t)})\in{\cal{Z}}\), where \({\cal{Z}}\) represents the space of potential input–output pairs. Analogously, let \(\overleftarrow{z}=(\overleftarrow{x},\,\overleftarrow{y})\) represent a particular past, \(\overleftarrow{{\cal{Z}}}\) the space of all possible pasts, and \(z^{(0:t)}=z^{(0)}\ldots z^{(t)}\in{\cal{Z}}^{t+1}\) the input–output pairs from time-steps 0 to t. Let \(|{\cal{X}}|\) denote the size of the input alphabet, \(|{\cal{Y}}|\) the size of the output alphabet and \(n=|{\cal{S}}|\) the number of causal states (if finite). Without loss of generality, we assume the inputs and outputs are labeled numerically, such that \({\cal{X}}=\{x_i\}_{i=0}^{|{\cal{X}}|-1}\) and \({\cal{Y}}=\{y_i\}_{i=0}^{|{\cal{Y}}|-1}\).

Each instance of an input–output process \(\{\mathop{Y}\limits^{\leftrightarrow}|\mathop{x}\limits^{\leftrightarrow}\}\) exhibits a specific past \(\overleftarrow{z}=(\overleftarrow{x},\,\overleftarrow{y})\). When supplied \(x^{(0)}=x\) as input, it emits \(y^{(0)}=y\) with probability \(P[Y^{(0)}=y|X^{(0)}=x,\,\overleftarrow{Z}=\overleftarrow{z}]\). The system's past then transitions from \(\overleftarrow{z}\) to \(\overleftarrow{z}^{\prime}=(\overleftarrow{x}x,\,\overleftarrow{y}y)=(\overleftarrow{x}^{\prime},\,\overleftarrow{y}^{\prime})\). This motivates the propagating functions \(\mu_{x,y}\) on the space of pasts, \(\mu_{x,y}(\overleftarrow{z})=(\overleftarrow{x}x,\,\overleftarrow{y}y)\), which characterize how the past updates upon observation of (x, y) at each time-step. Iterating this process for t + 1 time-steps yields the output sequence \(y^{(0:t)}\) with probability \(P(Y^{(0:t)}=y^{(0:t)}|X^{(0:t)}=x^{(0:t)},\,\overleftarrow{Z}=\overleftarrow{z})\) upon receipt of the input sequence \(x^{(0:t)}\). Taking t → ∞ gives the expected future input–output behavior \(P[\overrightarrow{Y}|\overrightarrow{X}=\overrightarrow{x},\,\overleftarrow{Z}=\overleftarrow{z}]\) upon future input sequence \(\overrightarrow{x}\in\overrightarrow{{\cal{X}}}\).
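As a minimal sketch of this iteration, the Python below represents a past as a pair of tuples (inputs, outputs) and implements the propagating function \(\mu_{x,y}\) as tuple concatenation. The single-step rule `step_prob` is a hypothetical process invented for illustration (the previous output flips with an input-dependent probability `P_FLIP[x]`), and the sketch assumes, as the iteration above does, that each output depends only on the current input and the past.

```python
from typing import Sequence, Tuple

# A past is (past inputs, past outputs), oldest first.
Past = Tuple[Tuple[int, ...], Tuple[int, ...]]

# Hypothetical process for illustration only: on input x, the previous
# output is flipped with probability P_FLIP[x].
P_FLIP = {0: 0.5, 1: 0.25}


def step_prob(past: Past, x: int, y: int) -> float:
    """P[Y^(0) = y | X^(0) = x, past] for the hypothetical process."""
    last = past[1][-1]
    return P_FLIP[x] if y != last else 1.0 - P_FLIP[x]


def mu(past: Past, x: int, y: int) -> Past:
    """Propagating function mu_{x,y}: append the pair (x, y) to the past."""
    return (past[0] + (x,), past[1] + (y,))


def sequence_prob(past: Past, xs: Sequence[int], ys: Sequence[int]) -> float:
    """P(Y^(0:t) = ys | X^(0:t) = xs, past), built up one time-step at a time."""
    prob = 1.0
    for x, y in zip(xs, ys):
        prob *= step_prob(past, x, y)
        past = mu(past, x, y)  # the past grows by one input-output pair
    return prob


past = ((0,), (0,))
print(sequence_prob(past, [1, 1], [0, 1]))  # 0.75 * 0.25 = 0.1875
```

Keeping the whole past, rather than a compressed summary of it, mirrors the definition literally; the causal-state models discussed below exist precisely to avoid storing it.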

A quantum model of an input–output process defines an encoding function \(\aleph\) that maps each past \(\overleftarrow{z}\) to some state \(\aleph(\overleftarrow{z})=\rho_{\overleftarrow{z}}\) within some physical system \(\Xi\). The model is correct if there exists a family of operations \({\Bbb{M}}=\{{\cal M}_{x}\}_{x\in{\cal{X}}}\) such that applying \({\cal M}_{x}\) to \(\Xi\) replicates the behavior of inputting x. That is, \({\cal M}_{x}\) acting on \(\rho_{\overleftarrow{z}}\) should (1) generate output y with probability \(P[Y^{(0)}=y|X^{(0)}=x,\,\overleftarrow{Z}=\overleftarrow{z}]\) and (2) transition \(\Xi\) into state \(\aleph[\mu_{x,y}(\overleftarrow{z})]=\rho_{\mu_{x,y}(\overleftarrow{z})}\). Condition (1) ensures the model outputs a statistically correct \(y^{(0)}\), while condition (2) ensures the model's internal memory is updated to record the event \(z^{(0)}=(x^{(0)},y^{(0)})\), allowing correct prediction upon receipt of future inputs. Sequential application of \({\cal M}_{\overrightarrow{x}}={\cal M}_{x^{(0)}},\,{\cal M}_{x^{(1)}},\ldots\) then generates output \(\overrightarrow{y}\) with probability \(P[\overrightarrow{Y}=\overrightarrow{y}|\overrightarrow{X}=\overrightarrow{x},\,\overleftarrow{Z}=\overleftarrow{z}]\). Let \(\Omega=\{\rho_{\overleftarrow{z}}\}_{\overleftarrow{z}\in\overleftarrow{{\cal{Z}}}}\) be the image of \(\aleph\). We now define quantum models as follows.

Definition 1

A general quantum model of an input–output process is a triple \({\cal{Q}}=(\aleph,\Omega,{\Bbb{M}})\), where \(\aleph\), Ω and \({\Bbb{M}}\) satisfy the conditions above.

Each stationary input process \(\mathop{X}\limits^{\leftrightarrow}\) induces a probability distribution \(p_X(\overleftarrow{z})\) over the set of pasts, and thus a steady state of the machine, \(\rho_X=\sum_{\overleftarrow{z}}p_X(\overleftarrow{z})\,\rho_{\overleftarrow{z}}\). The resulting entropy of \(\Xi\),

$${Q}_{X}[{\cal{Q}}]=-{\rm{Tr}}\,({\rho }_{X}\,\mathrm{log}\,{\rho }_{X}),$$
(8)

then defines the model's input-dependent complexity. For the quantum transducers in the main body, \(\Omega=\{|s_i\rangle\langle s_i|\}_i\) corresponds to the set of quantum causal states, and the encoding function \(\aleph\) maps each past \(\overleftarrow{z}\) to \(|s_i\rangle\langle s_i|\) whenever \(\overleftarrow{z}\in s_i\). Specifically, if we define the classical encoding function \(\varepsilon:\overleftarrow{{\cal{Z}}}\to{\cal{S}}\) that maps pasts onto causal states such that \(\varepsilon(\overleftarrow{z})=s_i\) iff \(\overleftarrow{z}\in s_i\), then \(|s_i\rangle=|\varepsilon(\overleftarrow{z})\rangle\).
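To make Eq. (8) concrete, the sketch below evaluates \(Q_X\) for pure quantum causal states, where the steady state reduces to \(\rho_X=\sum_i\pi_i|s_i\rangle\langle s_i|\) with \(\pi_i\) the probability of causal state \(s_i\) under the input process. It assumes logarithms in base 2 (bits) and uses a hypothetical pair of equiprobable states with a chosen overlap. The Shannon entropy of \(\pi\), which is what an ε-transducer storing the classical causal state requires, serves as the point of comparison \(C_X\).

```python
import numpy as np


def von_neumann_entropy(rho: np.ndarray, base: float = 2.0) -> float:
    """-Tr(rho log rho), computed from the eigenvalues of the Hermitian matrix rho."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]  # drop zero eigenvalues (0 log 0 = 0)
    return float(-np.sum(evals * np.log(evals)) / np.log(base))


def shannon_entropy(p, base: float = 2.0) -> float:
    """Shannon entropy of a probability vector, in units set by `base`."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)) / np.log(base))


def steady_state(weights, states) -> np.ndarray:
    """rho_X = sum_i pi_i |s_i><s_i| for pure quantum causal states |s_i>."""
    return sum(w * np.outer(s, s.conj()) for w, s in zip(weights, states))


# Hypothetical example: two equiprobable causal states with overlap cos(pi/6).
theta = np.pi / 6
s0 = np.array([1.0, 0.0])
s1 = np.array([np.cos(theta), np.sin(theta)])
probs = [0.5, 0.5]

Q_X = von_neumann_entropy(steady_state(probs, [s0, s1]))
C_X = shannon_entropy(probs)
print(f"Q_X = {Q_X:.4f} bits, C_X = {C_X:.4f} bits")
```

Replacing `s1` with a state orthogonal to `s0` makes the two entropies coincide, which is the situation characterized in the proof of D below.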

We also introduce input strategies. Suppose Bob receives a model of some known input–output process, initialized in some (possibly unknown) state \(\rho_{\overleftarrow{z}}\in\Omega\). Bob now wants to drive the model to exhibit some particular future behavior by using an input strategy: a specific algorithm for deciding which input \(x^{(t)}\) to feed the model at each time t ≥ 0, based purely on the model's black-box behavior.

Definition 2

An input strategy is a family of functions \(F=\{f^{(t)}|t\in{\Bbb{Z}}^{+}\}\), where \(f^{(t)}:{\cal{Z}}^{t}\to{\cal{X}}\) maps the previously observed input–output pairs onto the input at time t, such that \(x^{(t)}=f^{(t)}(z^{(0:t-1)})\). We denote the sequence of future inputs determined by \(x^{(t)}=f^{(t)}(z^{(0:t-1)})\) as \(\overrightarrow{x}_{F}\).

In subsequent proofs, we will invoke input strategies on classical ε-transducers. Here, we denote

$${P}_{{s}_{i},F}[\overrightarrow{y}]=P[\overrightarrow{Y}=\overrightarrow{y}\,|\,\overrightarrow{X}={\overrightarrow{x}}_{F},\,{S}^{(-1)}={s}_{i}],$$
(9)

as the probability distribution that governs future outputs \(\overrightarrow{Y}\), when an input strategy F is used to select the future inputs \({\overrightarrow{x}}_{F}\) to an ε-transducer initialized in some causal state \({s}_{i}\in {\cal{S}}\).
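For a finite horizon, \(P_{s_i,F}\) can be tabulated by enumerating output strings and letting F choose each input from the history so far. The sketch below assumes a hypothetical two-state ε-transducer stored as an array `T[x, y, i, k]` \(=T_{ik}^{y|x}\), in which the output equals the next causal state (so it is jointly unifilar), and a hypothetical strategy `copy_last_output` that feeds back the most recent output. Both are illustrative choices, not taken from the text.

```python
import itertools

import numpy as np

# Hypothetical two-state transducer, T[x, y, i, k] = T_ik^{y|x}: input x flips
# the state with probability p_flip[x], and the output y is the new state.
p_flip = [0.5, 0.25]
T = np.zeros((2, 2, 2, 2))
for x in range(2):
    for i in range(2):
        for k in range(2):
            T[x, k, i, k] = p_flip[x] if k != i else 1.0 - p_flip[x]


def output_distribution(T, s0, strategy, horizon):
    """Finite-horizon P_{s0,F}: maps each output string to its probability."""
    n_y = T.shape[1]
    dist = {}
    for ys in itertools.product(range(n_y), repeat=horizon):
        prob, state, history = 1.0, s0, []
        for y in ys:
            x = strategy(tuple(history))   # x^(t) = f^(t)(z^(0:t-1))
            row = T[x, y, state]           # weights of each next causal state
            prob *= row.sum()
            if prob == 0.0:
                break
            state = int(np.argmax(row))    # unifilar: (state, x, y) fixes it
            history.append((x, y))
        if prob > 0.0:
            dist[ys] = prob
    return dist


def copy_last_output(history):
    """Hypothetical strategy F: input 0 at t = 0, then feed back the last output."""
    return history[-1][1] if history else 0


P0 = output_distribution(T, 0, copy_last_output, horizon=3)
P1 = output_distribution(T, 1, copy_last_output, horizon=3)
```

Because the inputs depend on earlier outputs, \(\overrightarrow{x}_F\) is itself random; the enumeration handles this by recomputing each input along every output string.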

We also make use of the trace distance \(D[P,\,Q]=\frac{1}{2}\sum_{\overrightarrow{y}\in\overrightarrow{{\cal{Y}}}}|P[\overrightarrow{y}]-Q[\overrightarrow{y}]|\) between two probability distributions \(P[\overrightarrow{y}]\) and \(Q[\overrightarrow{y}]\). Similarly, any two quantum states τ and σ have trace distance \(D[\tau,\sigma]=\frac{1}{2}{\rm{Tr}}|\tau-\sigma|\), where \(|A|\equiv\sqrt{A^{\dagger}A}\).
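Both trace distances are straightforward to evaluate numerically. In the sketch below, distributions over finite-length output strings are plain dictionaries (a truncation of the infinite-sequence sum above), and the quantum version uses the fact that for Hermitian \(\tau-\sigma\), the eigenvalues of \(|\tau-\sigma|\) are the absolute values of the eigenvalues of \(\tau-\sigma\). For pure states this reproduces the standard formula \(D=\sqrt{1-|\langle\psi|\phi\rangle|^2}\).

```python
import numpy as np


def classical_trace_distance(P: dict, Q: dict) -> float:
    """D[P, Q] = 1/2 sum_y |P[y] - Q[y]|; missing keys have probability 0."""
    support = set(P) | set(Q)
    return 0.5 * sum(abs(P.get(y, 0.0) - Q.get(y, 0.0)) for y in support)


def quantum_trace_distance(tau: np.ndarray, sigma: np.ndarray) -> float:
    """D[tau, sigma] = 1/2 Tr|tau - sigma|, via the eigenvalues of tau - sigma."""
    return 0.5 * float(np.sum(np.abs(np.linalg.eigvalsh(tau - sigma))))


ket0 = np.array([1.0, 0.0])
ket_plus = np.array([1.0, 1.0]) / np.sqrt(2)
D = quantum_trace_distance(np.outer(ket0, ket0), np.outer(ket_plus, ket_plus))
# For pure states, D = sqrt(1 - |<0|+>|^2) = sqrt(1/2).
```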

Proof of correctness

Here, we prove that a quantum transducer can generate the correct future statistical behavior when supplied with a quantum system \(\Xi\) initialized in state \(|s_i\rangle\), encoding information about \(\overleftarrow{z}\in s_i\). That is, there exists a family of quantum processes \({\Bbb{M}}=\{{\cal M}_x\}\) whose action on \(\Xi\) produces an output y sampled from \(P[Y^{(0)}=y|X^{(0)}=x,S^{(-1)}=s_i]\), while transforming the state of \(\Xi\) to \(|\varepsilon(\overleftarrow{x}x,\,\overleftarrow{y}y)\rangle\).

Proof

Recall that \(|s_i\rangle=\otimes_x|s_i^x\rangle\), where \(|s_i^x\rangle=\sum_{k=0}^{n-1}\sum_{y\in{\cal{Y}}}\sqrt{T_{ik}^{y|x}}|y\rangle|k\rangle\). Let Σ be the \(|{\cal{Y}}|\)-dimensional Hilbert space spanned by \(\{|y\rangle\}_{y\in{\cal{Y}}}\) and \({\cal{K}}\) the n-dimensional Hilbert space spanned by \(\{|k\rangle\}_{k=0}^{n-1}\). Then each \(|s_i^x\rangle\) lies in \(\omega=\Sigma\otimes{\cal{K}}\), and each causal state \(|s_i\rangle\) lies within \({\cal{W}}=\omega^{\otimes|{\cal{X}}|}\). The set \(\{|s_i\rangle\}_{i=0}^{n-1}\) spans some subspace of \({\cal{W}}\) of dimension at most n. This implies that the causal states can be stored within \({\cal{K}}\) without loss, i.e., there exist quantum states \(\{|\tau_i\rangle\}_{i=0}^{n-1}\) in \({\cal{K}}\) and a unitary process U such that \(U:|\tau_i\rangle\to|s_i\rangle\) for all i (this assumes appending suitable ancillary subspaces to each \(|\tau_i\rangle\)). An explicit form for \(|\tau_i\rangle\) can be systematically constructed through Gram–Schmidt orthogonalization.31 We refer to \(|\tau_i\rangle\) as compressed causal states, and to U as the decompression operator.
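The construction of \(|s_i\rangle\) and of the compressed states \(|\tau_i\rangle\) can be sketched numerically. Here `T[x, y, i, k]` stores \(T_{ik}^{y|x}\) for a hypothetical two-state transducer (flip probabilities 0.5 and 0.25 on the two inputs), basis states \(|y\rangle|k\rangle\) are indexed as `y * n + k`, and the Gram–Schmidt step is carried out with NumPy's reduced QR factorization, which orthonormalizes the same columns (numerically via Householder reflections, up to signs). Column i of `R` then holds the coordinates of \(|\tau_i\rangle\in{\cal{K}}\), and the isometry `Q` plays the role of the decompression U restricted to the span of the causal states.

```python
from functools import reduce

import numpy as np


def toy_transducer(p_flip):
    """Hypothetical T[x, y, i, k] = T_ik^{y|x}: flip with prob p_flip[x]; y is the new state."""
    T = np.zeros((len(p_flip), 2, 2, 2))
    for x, p in enumerate(p_flip):
        for i in range(2):
            for k in range(2):
                T[x, k, i, k] = p if k != i else 1.0 - p
    return T


def quantum_causal_states(T):
    """Columns are |s_i> = (x) over inputs of sum_{y,k} sqrt(T_ik^{y|x}) |y>|k>."""
    n_x, n_y, n, _ = T.shape
    cols = []
    for i in range(n):
        factors = [np.sqrt(T[x, :, i, :]).reshape(n_y * n) for x in range(n_x)]
        cols.append(reduce(np.kron, factors))  # tensor product over inputs
    return np.column_stack(cols)


def compress(S):
    """Orthonormalize the columns of S (reduced QR, S = Q R)."""
    Q, R = np.linalg.qr(S)
    return Q, R


S = quantum_causal_states(toy_transducer([0.5, 0.25]))
Q, R = compress(S)
print("overlap <s_0|s_1> =", S[:, 0] @ S[:, 1])
print("compressed states (columns):\n", R)
```

Since \(R^{\dagger}R=S^{\dagger}S\), the compressed states reproduce every overlap of the original causal states while living in an n-dimensional space.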

We define the selection operator \(S_x:{\cal{W}}\to\omega\) such that \(S_{x'}:\otimes_x|\phi_x\rangle\to|\phi_{x'}\rangle\). Physically, if \({\cal{W}}\) represents the state space of \(|{\cal{X}}|\) qudits, each with state space ω and labeled from 0 to \(|{\cal{X}}|-1\), then \(S_x\) represents discarding (or tracing out) all but the xth qudit. Meanwhile, let B be a quantum operation on ω such that \(B:|y\rangle|k\rangle\langle y|\langle k|\to|y\rangle|\tau_k\rangle\langle y|\langle\tau_k|\). This operation always exists, since it can be implemented by the Kraus operators \(E_{yk}=|y\rangle|\tau_k\rangle\langle y|\langle k|\), which satisfy \(\sum_{y,k}E_{yk}^{\dagger}E_{yk}=\sum_{y,k}|y\rangle|k\rangle\langle y|\langle k|={\Bbb{I}}\) (Fig. 5).
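A quick numerical check of the operation B: the sketch builds the Kraus operators \(E_{yk}=|y\rangle|\tau_k\rangle\langle y|\langle k|\) on \(\omega=\Sigma\otimes{\cal{K}}\), confirms completeness, and applies B to a single factor \(|s_i^x\rangle\). The compressed states `taus` and the transition table `T_x` are hypothetical values chosen for illustration; any normalized \(|\tau_k\rangle\) would pass the completeness check. Because the causal states are product states over inputs, the selection operator \(S_x\) simply picks out the factor \(|s_i^x\rangle\), so it is not coded separately.

```python
import numpy as np


def kraus_B(taus, n_y):
    """Kraus operators E_{yk} = |y>|tau_k><y|<k| on Sigma tensor K (index y * n + k)."""
    n = len(taus)
    ops = []
    for y in range(n_y):
        e_y = np.eye(n_y)[y]
        for k in range(n):
            ket = np.kron(e_y, taus[k])         # |y>|tau_k>
            bra = np.kron(e_y, np.eye(n)[k])    # |y>|k>
            ops.append(np.outer(ket, bra.conj()))
    return ops


def apply_channel(ops, rho):
    """B(rho) = sum_{y,k} E_{yk} rho E_{yk}^dagger."""
    return sum(E @ rho @ E.conj().T for E in ops)


# Hypothetical inputs: compressed states with overlap sqrt(3)/2, and
# T_x[y, k] = T_{ik}^{y|x} for one causal state s_i and one input x.
taus = [np.array([1.0, 0.0]), np.array([np.sqrt(3) / 2, 0.5])]
T_x = np.array([[0.75, 0.0], [0.0, 0.25]])

ops = kraus_B(taus, n_y=2)
s_ix = np.sqrt(T_x).reshape(-1)             # |s_i^x> = sum sqrt(T) |y>|k>
rho_out = apply_channel(ops, np.outer(s_ix, s_ix))
```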

Fig. 5

a The process by which the quantum transducer generates future statistics as outlined in Fig. 3. b A more detailed breakdown of. Upon input x at time t, the transducer first applies the selection operator \(S_x\). The subsequent operator A can be decomposed into two operators: a linear mapping \(B:|s_i^x\rangle\langle s_i^x|\to\sum_{y,k}T_{ik}^{y|x}|y\rangle|\tau_k\rangle\langle y|\langle\tau_k|\), and a decompression operator U that rotates each \(|\tau_k\rangle\) into \(|s_k\rangle\) (always possible when suitable ancillary systems in states \(|0\rangle\) are supplied). Σ is emitted as output, while \({\cal{W}}\) is retained as the subsequent causal state at time t. This circuit makes it clear that the \(|\tau_k\rangle\) are also perfectly valid causal states.

The quantum transducer operates as follows. Upon input x at time t, it applies \(S_x\) to \(\Xi\), followed by B. This transforms the state of the system from \(|s_i^x\rangle\langle s_i^x|\) to \(\sum_{k=0}^{n-1}\sum_{y\in{\cal{Y}}}T_{ik}^{y|x}|y\rangle|\tau_k\rangle\langle y|\langle\tau_k|\). Applying the decompression operator then gives \(\sum_{k=0}^{n-1}\sum_{y\in{\cal{Y}}}T_{ik}^{y|x}|y\rangle|s_k\rangle\langle y|\langle s_k|\) on state space \(\Sigma\otimes{\cal{W}}\). The machine then emits Σ as the output. Measurement of Σ by an external observer in the basis \(\{|y\rangle\}_{y\in{\cal{Y}}}\) gives the output y at time t, while inducing \({\cal{W}}\) to transition to the subsequent quantum causal state (by joint unifilarity, each y is paired with a single k, so this post-measurement state is pure). \({\cal{W}}\) is then retained in \(\Xi\).
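The measurement step can be checked on a hypothetical example of the same kind. The sketch below forms the post-B state \(\sum_{y,k}T_{ik}^{y|x}|y\rangle|\tau_k\rangle\langle y|\langle\tau_k|\) directly, measures Σ in the \(\{|y\rangle\}\) basis, and returns each outcome probability together with the conditional memory state. It measures before decompressing, which gives the same statistics because U acts only on the memory. The values of `taus` and `T_x` are illustrative assumptions; since y determines k in `T_x`, each conditional memory state comes out pure.

```python
import numpy as np


def measure_output(rho_out, n_y, n):
    """Measure Sigma in the |y> basis; return {y: (P(y), memory state given y)}."""
    blocks = rho_out.reshape(n_y, n, n_y, n)   # indices [y, k, y', k']
    outcomes = {}
    for y in range(n_y):
        block = blocks[y, :, y, :]              # <y| rho_out |y>, an operator on K
        p = float(np.real(np.trace(block)))
        if p > 0:
            outcomes[y] = (p, block / p)
    return outcomes


# Hypothetical compressed states and T_x[y, k] = T_{ik}^{y|x} (y determines k).
taus = [np.array([1.0, 0.0]), np.array([np.sqrt(3) / 2, 0.5])]
T_x = np.array([[0.75, 0.0], [0.0, 0.25]])
n_y, n = T_x.shape

# State after S_x and B, as given in the text.
rho_out = sum(
    T_x[y, k] * np.outer(np.kron(np.eye(n_y)[y], taus[k]), np.kron(np.eye(n_y)[y], taus[k]))
    for y in range(n_y) for k in range(n)
)
outcomes = measure_output(rho_out, n_y, n)
for y, (p, memory) in outcomes.items():
    print(y, p, np.trace(memory @ memory))   # outcome, probability, purity
```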

The above procedure establishes a family of quantum operations \(\{{\cal M}_x\}_{x\in{\cal{X}}}\) that maps each quantum causal state \(|s_i\rangle\) to \(|s_j\rangle\) while emitting output y with probability \(T_{ij}^{y|x}\). Thus, the quantum transducer is statistically indistinguishable from its classical counterpart and is therefore correct. To simulate the future behavior of the input–output process upon future input \(\overrightarrow{x}\), the quantum transducer iterates the above process by sequential application of \({\cal M}_{x^{(0)}}\), \({\cal M}_{x^{(1)}}\), etc. This establishes that the transducer is a valid quantum model.

Proof of reduced complexity and generality

Given an arbitrary input–output process whose ε-transducer and quantum transducer have respective input-dependent complexities \(C_X\) and \(Q_X\), we show that

  • Reduced Complexity: \(Q_X<C_X\) for all non-pathological input processes \(\mathop{X}\limits^{\leftrightarrow}\) whenever the ε-transducer is step-wise inefficient.

  • Generality: Given an input–output process, either \(Q_X<C_X\) for all non-pathological input processes \(\mathop{X}\limits^{\leftrightarrow}\), or there exists no physically realizable model that does this.

Our strategy makes use of the following theorem.

Theorem 1. For any input–output process the following statements are equivalent:

(I) \(\langle {s}_{i}|{s}_{j}\rangle \mathrm{=0}\) for any \(\left|{s}_{i}\right\rangle,\left|{s}_{j}\right\rangle \in \{\left|{s}_{i}\right\rangle\}_{i}\) where i≠j.

(II) For any pair \(({s}_{i},{s}_{j})\in {\cal{S}}\times {\cal{S}}\) where i≠j, \(\exists x\in {\cal{X}}\) such that \(\forall y\in {\cal{Y}},\,{s}_{k}\in {\cal{S}}\), the product \({T}_{ik}^{y|x}{T}_{jk}^{y|x}\mathrm{=0}\).

(III) For any pair \(({s}_{i},{s}_{j})\in {\cal{S}}\times {\cal{S}}\) where i≠j, there exists an input strategy F such that \(D[{P}_{{s}_{i},F},{P}_{{s}_{j},F}]=1\).

(IV) For any input process \(\mathop{X}\limits^{\leftrightarrow}\), any physically realizable (quantum) model must have complexity at least \(C_X\).

To prove this theorem, we show A: (I) is equivalent to (II); B: (II) implies (III); C: (III) implies (IV); and D: (IV) implies (I).

Proof of A. We prove this by showing (I) is false iff (II) is false. First, assume (I) is false, such that there exist \(|s_i\rangle,|s_j\rangle\in\{|s_i\rangle\}_i\) with \(i\ne j\) and \(\langle s_i|s_j\rangle=\prod_{x\in{\cal{X}}}\left(\sum_{y,k}\sqrt{T_{ik}^{y|x}T_{jk}^{y|x}}\right)>0\). This implies that for all \(x\in{\cal{X}}\) there exist \(s_k\in{\cal{S}},\,y\in{\cal{Y}}\) such that \(T_{ik}^{y|x}T_{jk}^{y|x}\ne0\). Thus (II) is false. Conversely, assume (II) is false, i.e., there exist \(s_i\) and \(s_j\) such that \(\forall x\) we can find \(y_x\in{\cal{Y}},\,s_{k_x}\in{\cal{S}}\) for which \(T_{ik_x}^{y_x|x}T_{jk_x}^{y_x|x}>0\). Since every term in the overlap is non-negative, it follows that \(\langle s_i|s_j\rangle\ge\prod_x\sqrt{T_{ik_x}^{y_x|x}T_{jk_x}^{y_x|x}}>0\), and (I) is false.
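Statement A can be spot-checked numerically. The sketch below evaluates the overlap formula used in the proof and condition (II) for a pair of causal states. The two-state transducer family is hypothetical (input x flips the state with probability `p_flip[x]`, and the output is the new state): an input that never flips or always flips satisfies (II) and yields orthogonal causal states, while strictly stochastic inputs give a nonzero overlap. A handful of examples illustrates the equivalence; it does not replace the proof.

```python
import numpy as np


def toy_transducer(p_flip):
    """Hypothetical T[x, y, i, k] = T_ik^{y|x}; the output y is the new state."""
    T = np.zeros((len(p_flip), 2, 2, 2))
    for x, p in enumerate(p_flip):
        for i in range(2):
            for k in range(2):
                T[x, k, i, k] = p if k != i else 1.0 - p
    return T


def overlap(T, i, j):
    """<s_i|s_j> = prod_x sum_{y,k} sqrt(T_ik^{y|x} T_jk^{y|x})."""
    return float(np.prod(np.sqrt(T[:, :, i, :] * T[:, :, j, :]).sum(axis=(1, 2))))


def condition_II(T, i, j):
    """True iff some input x gives T_ik^{y|x} T_jk^{y|x} = 0 for every y and k."""
    products = T[:, :, i, :] * T[:, :, j, :]      # shape (n_x, n_y, n)
    return bool(np.any(np.all(products == 0, axis=(1, 2))))


for p_flip in ([0.5, 0.25], [0.0, 0.25], [1.0, 0.25]):
    T = toy_transducer(p_flip)
    print(p_flip, condition_II(T, 0, 1), round(overlap(T, 0, 1), 4))
```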

Proof of B. To prove this, we introduce the update function \(g:{\cal{S}}\times{\cal{X}}\times{\cal{Y}}\to{\cal{S}}\), such that \(g(s_i,x,y)=s_k\) iff \(T_{ik}^{y|x}\ne0\). Note that g is always a function by joint unifilarity of the ε-transducer.3 That is, the triple \((s^{(t-1)},x^{(t)},y^{(t)})\) uniquely determines \(s^{(t)}\).

We also use the following game to elucidate the proof. At time t = −1, Alice initializes an ε-transducer in either \(s_i\) or \(s_j\) and seals it inside a black box. Alice gives this box to Bob and challenges him to infer whether \(S^{(-1)}=s_i\) or \(S^{(-1)}=s_j\), based purely on the transducer's future black-box behavior. We first prove that if (II) is true, then for each pair \((s_i,s_j)\) there exists an input strategy \(F_{ij}\) that allows Bob to discriminate \(S^{(-1)}=s_i\) from \(S^{(-1)}=s_j\) to arbitrary accuracy purely from the transducer's output behavior.

Specifically, (II) implies that for all pairs \((s_i,s_j)\in{\cal{S}}\times{\cal{S}}\) with \(i\ne j\), there exists some \(x_{ij}\) such that \(\forall y\in{\cal{Y}},\,s_k\in{\cal{S}}\), \(T_{ik}^{y|x_{ij}}T_{jk}^{y|x_{ij}}=0\). At t = 0, Bob inputs \(X^{(0)}=x_{ij}\). Let \(y=Y^{(0)}\) be the corresponding output. Because y was observed, there must have been some non-zero probability of observing y on input \(x_{ij}\), i.e., at least one of (i) \(P[Y^{(0)}=y|X^{(0)}=x_{ij},S^{(-1)}=s_i]\ne0\) or (ii) \(P[Y^{(0)}=y|X^{(0)}=x_{ij},S^{(-1)}=s_j]\ne0\) must be true. This presents two possible scenarios:

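  • (a) Only one of (i) and (ii) is true. That is, one of \(s_i\) or \(s_j\) never outputs y upon input \(x_{ij}\), such that either \(P[Y^{(0)}=y|X^{(0)}=x_{ij},S^{(-1)}=s_i]=0\) or \(P[Y^{(0)}=y|X^{(0)}=x_{ij},S^{(-1)}=s_j]=0\).

  • (b) Both (i) and (ii) are true, implying \(g(s_i,x_{ij},y)\ne g(s_j,x_{ij},y)\), since by (II) no causal state \(s_k\) can be reached from both \(s_i\) and \(s_j\) on input \(x_{ij}\) with output y.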

If (a) occurs, then Bob can immediately determine whether \(S^{(-1)}=s_i\) or \(S^{(-1)}=s_j\), and we are done. If (b) occurs, let \(s_{i'}=g(s_i,x_{ij},y)\) be the new causal state if \(S^{(-1)}=s_i\), and let \(s_{j'}=g(s_j,x_{ij},y)\) be the new causal state if \(S^{(-1)}=s_j\). Due to joint unifilarity of the ε-transducer, Bob is able to uniquely determine \(s_{i'}\) and \(s_{j'}\) upon observation of \(z^{(0)}=(x_{ij},y)\). Since \(s_{i'}\ne s_{j'}\), (II) implies Bob can find \(x_{i'j'}\) such that \(\forall y\in{\cal{Y}},\,s_k\in{\cal{S}}\) the product \(T_{i'k}^{y|x_{i'j'}}T_{j'k}^{y|x_{i'j'}}=0\). Thus, Bob can repeat the steps above, choosing \(x^{(1)}=x_{i'j'}\). Iterating this procedure defines an input strategy \(F_{ij}\), which determines each input \(x^{(t)}\) as a function of the observed inputs and outputs. At each time t, Bob can identify some \(s_i^{(t)}\), the current causal state if \(S^{(-1)}=s_i\), and some \(s_j^{(t)}\), the current causal state if \(S^{(-1)}=s_j\).

Eventually, either scenario (a) occurs, allowing Bob to rule out one of \(S^{(-1)}=s_i\) or \(S^{(-1)}=s_j\) with certainty, or, in the limit of an infinite number of time-steps, Bob can synchronize the ε-transducer based on the observed inputs and outputs (that is, the causal state at time t is entirely determined by observation of the past in the limit of large t; refs. 1, 3). Thus, Bob can determine \(s^{(t)}\) in the limit t → ∞, allowing inference of whether \(S^{(-1)}=s_i\) or \(S^{(-1)}=s_j\). This constitutes an explicit input strategy \(F_{ij}\) that allows Bob to discriminate between \(S^{(-1)}=s_i\) and \(S^{(-1)}=s_j\) to arbitrary accuracy. Accordingly, \(D[P_{s_i,F_{ij}},P_{s_j,F_{ij}}]=1\).
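Bob's strategy \(F_{ij}\) can be simulated on a small, hypothetical three-state transducer in which every pair of causal states satisfies (II). Input 0 ("reveal") emits 1 only from \(s_0\); input 1 ("rotate") emits a uniformly random bit and moves \(s_i\) to \(s_{(i+1)\bmod 3}\). Bob sees only his own inputs and the outputs; the hidden state is used solely to sample the black box. This sketch picks the first input satisfying (II) for the current pair of hypotheses and caps the number of steps, since it does not implement the asymptotic synchronization argument.

```python
import numpy as np

# Hypothetical three-state transducer, T[x, y, i, k] = T_ik^{y|x}.
n_x, n_y, n = 2, 2, 3
T = np.zeros((n_x, n_y, n, n))
T[0, 1, 0, 0] = 1.0          # "reveal": s_0 emits 1 and stays put
T[0, 0, 1, 1] = 1.0          # s_1 emits 0 and stays put
T[0, 0, 2, 1] = 1.0          # s_2 emits 0 and moves to s_1
for i in range(n):           # "rotate": random bit, s_i -> s_{(i+1) mod 3}
    for y in range(n_y):
        T[1, y, i, (i + 1) % n] = 0.5


def discriminating_input(T, a, b):
    """First input x with T_ak^{y|x} T_bk^{y|x} = 0 for all y, k (condition II)."""
    for x in range(T.shape[0]):
        if np.all(T[x, :, a, :] * T[x, :, b, :] == 0):
            return x
    return None


def bob_guess(T, hidden_state, a, b, rng, max_steps=1000):
    """Bob's inference of S^(-1) in {a, b} from his inputs and the box's outputs."""
    a0, b0 = a, b                      # the two original hypotheses
    state = hidden_state               # inside the black box; Bob never reads it
    for _ in range(max_steps):
        x = discriminating_input(T, a, b)
        if x is None:
            return None
        joint = T[x, :, state, :]      # P(y, next state | x, current state)
        idx = rng.choice(joint.size, p=joint.ravel())
        y, state = divmod(int(idx), joint.shape[1])
        p_a, p_b = T[x, y, a].sum(), T[x, y, b].sum()
        if p_a == 0:
            return b0                  # scenario (a): hypothesis a ruled out
        if p_b == 0:
            return a0                  # scenario (a): hypothesis b ruled out
        # scenario (b): unifilarity fixes each hypothesis's next causal state
        a, b = int(np.argmax(T[x, y, a])), int(np.argmax(T[x, y, b]))
    return None


rng = np.random.default_rng(0)
print(bob_guess(T, hidden_state=2, a=1, b=2, rng=rng))
```

For the pair \((s_1,s_2)\), the reveal input fails (II), so the sketch first rotates (scenario (b)) and then resolves the new pair \((s_2,s_0)\) with a reveal (scenario (a)).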

Proof of C. We prove this via its contrapositive. Suppose (IV) is false, such that there exists a quantum model \({\cal{Q}}^{\prime}\) with input–output relations identical to those of the process's ε-transducer and \(Q_X[{\cal{Q}}^{\prime}]<C_X\) for some \(\mathop{X}\limits^{\leftrightarrow}\). We show that (III) must then be false, since otherwise the data processing inequality would be violated.16

We first make use of the following observation: if a model \({\cal{Q}}^{\prime}=(\aleph^{\prime},\Omega^{\prime},{\Bbb{M}}^{\prime})\) satisfies \(\aleph^{\prime}(\overleftarrow{z})\ne\aleph^{\prime}(\overleftarrow{z}^{\prime})\) for some \(\overleftarrow{z}\sim_{\varepsilon}\overleftarrow{z}^{\prime}\), then we can always construct an alternative model \({\cal{Q}}=(\aleph,\Omega,{\Bbb{M}})\) such that \(Q_X[{\cal{Q}}]\le Q_X[{\cal{Q}}^{\prime}]\) for all input processes \(\mathop{X}\limits^{\leftrightarrow}\), and \(\aleph(\overleftarrow{z})=\aleph(\overleftarrow{z}^{\prime})\) iff \(\varepsilon(\overleftarrow{z})=\varepsilon(\overleftarrow{z}^{\prime})\) for all \(\overleftarrow{z},\,\overleftarrow{z}^{\prime}\in\overleftarrow{{\cal{Z}}}\). (This is a consequence of the concavity of entropy; see the Methods of ref. 32.) That is, for any model \({\cal{Q}}^{\prime}\) whose quantum states Ω′ are not in 1-1 correspondence with the classical causal states, there always exists a simpler model \({\cal{Q}}\) whose quantum states are in 1-1 correspondence with the causal states.

Thus, falsehood of (IV) implies there must exist some quantum model \({\cal{Q}}=(\aleph,\Omega,{\Bbb{M}})\) such that (i) \(\aleph(\overleftarrow{z})=\aleph(\overleftarrow{z}^{\prime})\) if and only if \(\overleftarrow{z}\sim_{\varepsilon}\overleftarrow{z}^{\prime}\) and (ii) \(Q_X[{\cal{Q}}]<C_X\) for some \(\mathop{X}\limits^{\leftrightarrow}\). If every pair of states in Ω had orthogonal supports, the entropy of their mixture would be at least the Shannon entropy of the causal-state distribution, i.e., \(Q_X[{\cal{Q}}]\ge C_X\). By virtue of (ii), there must therefore exist two states \(\rho_i,\rho_j\in\Omega\) such that the trace distance

$$D[{\rho }_{i},{\rho }_{j}] < 1.$$
(10)

The data processing inequality, therefore, implies that any quantum operation \({ {\cal M} }_{\overrightarrow{x}}:\Xi \to \overrightarrow{{\cal{Y}}}\) that generates future output statistics must satisfy \(D[{ {\cal M} }_{\overrightarrow{x}}({\rho }_{i}),\,{ {\cal M} }_{\overrightarrow{x}}({\rho }_{j})]\le D[{\rho }_{i},\,{\rho }_{j}] < 1.\)

However, all models of the same input–output process have identical black-box behavior. In particular, the ε-transducer of the input–output process that \({\cal{Q}}\) models must behave identically to \({\cal{Q}}\). As such, there exist two causal states of the classical ε-transducer, \(s_i,s_j\in{\cal{S}}\) (those encoded by \(\rho_i\) and \(\rho_j\)), such that \(D[P_{s_i,F},P_{s_j,F}]=D[{\cal M}_{\overrightarrow{x}_F}(\rho_i),\,{\cal M}_{\overrightarrow{x}_F}(\rho_j)]<1\) for all possible input strategies F. This implies (III) is false. Thus, by contrapositive, (III) implies (IV).
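The data processing inequality used in the proof of C can be spot-checked numerically. The sketch below draws random pairs of pure qutrit states and random measurement bases, standing in for the operation \({\cal M}_{\overrightarrow{x}}\) with a single projective measurement (a simplifying assumption), and confirms that the trace distance between the outcome distributions never exceeds the trace distance between the states. A random check of this kind illustrates the inequality; it does not prove it.

```python
import numpy as np

rng = np.random.default_rng(7)


def trace_distance(rho, sigma):
    """1/2 Tr|rho - sigma| for Hermitian rho, sigma."""
    return 0.5 * float(np.sum(np.abs(np.linalg.eigvalsh(rho - sigma))))


def random_pure(d):
    """Density matrix of a random pure state in dimension d."""
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    v /= np.linalg.norm(v)
    return np.outer(v, v.conj())


def random_basis(d):
    """Columns of a random unitary, obtained from a QR decomposition."""
    q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return q


def outcome_distribution(rho, basis):
    """Probabilities <b|rho|b> for each basis vector b (a projective measurement)."""
    return np.array([np.real(b.conj() @ rho @ b) for b in basis.T])


worst_gap = np.inf
for _ in range(500):
    rho, sigma, basis = random_pure(3), random_pure(3), random_basis(3)
    d_quantum = trace_distance(rho, sigma)
    d_classical = 0.5 * np.abs(
        outcome_distribution(rho, basis) - outcome_distribution(sigma, basis)
    ).sum()
    worst_gap = min(worst_gap, d_quantum - d_classical)  # should never go negative

print("smallest D_quantum - D_classical:", worst_gap)
```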

Proof of D. The quantum transducer is a physically realizable model. Thus (IV) implies that \(Q_X\ge C_X\) for all \(\mathop{X}\limits^{\leftrightarrow}\). However, by construction \(Q_X\le C_X\) for all \(\mathop{X}\limits^{\leftrightarrow}\). Therefore, \(Q_X=C_X\). Since the causal states of the quantum transducer are all pure, and the entropy of a mixture of pure states equals the Shannon entropy of its weights only when the states are mutually orthogonal, all \(|s_i\rangle\) are mutually orthogonal.16

Proof of Main Result. Reduced complexity and generality are consequences of the above theorem. Specifically, given a particular input–output process, falsehood of (II) implies that its transducer is step-wise inefficient. Meanwhile, falsehood of (I) implies \(Q_X<C_X\) for all non-pathological \(\mathop{X}\limits^{\leftrightarrow}\). Thus, reduced complexity is implied by the equivalence of (I) and (II). Generality is proven by contradiction. Assume that for some non-pathological \(\mathop{X}\limits^{\leftrightarrow}\), quantum transducers yield no improvement (i.e., \(Q_X=C_X\)), but some other physically realizable model has complexity less than \(C_X\). The former implies (I) is true, while the latter implies (IV) is false, contradicting the equivalence of (I) and (IV) established by the theorem. Thus, both reduced complexity and generality must hold.