Abstract
All natural things process and transform information. They receive environmental information as input, and transform it into appropriate output responses. Much of science is dedicated to building models of such systems—algorithmic abstractions of their input–output behavior that allow us to simulate how such systems can behave in the future, conditioned on what has transpired in the past. Here, we show that classical models cannot avoid inefficiency—storing past information that is unnecessary for correct future simulation. We construct quantum models that mitigate this waste, whenever it is physically possible to do so. This suggests that the complexity of general input–output processes depends fundamentally on what sort of information theory we use to describe them.
Introduction
Every experiment involves applying actions to some system, and recording corresponding output responses. Both inputs and outputs are recorded as classical bits of information, and the system’s operational behavior can always be regarded as an input–output process that transforms inputs to outputs. Quantitative science aims to capture such behavior within mathematical models—algorithmic abstractions that can simulate future behavior based on past observations.
There is keen interest in finding the simplest models—models that replicate a system’s future behavior while storing the least past information.^{1, 2} The motivations are twofold. Firstly, from the rationale of Occam’s razor, we should posit no more causes of natural things than are necessary to explain their appearances. Every piece of past information a model requires represents a potential cause of future events, and thus simpler models better isolate the true indicators of future behavior. The second is practical. As we wish to simulate and engineer systems of increasing complexity, there is always need to find methods that utilize more modest memory requirements.
This motivated systematic methods for constructing such models. The state of the art are ε-transducers, models of input–output processes that are provably optimal: no other means of modeling a given input–output process can use less past information.^{3} The amount of past information such a transducer requires thus presents a natural measure of the process’s intrinsic complexity. This heralded new ways to understand structure in diverse systems, ranging from evolutionary dynamics to action-perception cycles.^{4–7} Yet ε-transducers are classical, and their optimality is only proven among classical models. Recent research indicates that quantum models can more simply simulate stochastic processes that evolve independently of input.^{8–10} Can quantum theory also surpass classical limits in modeling general processes that behave differently on different input?
Here, we present systematic methods to construct quantum transducers—quantum models that can be simpler than their optimal classical counterparts. The resulting constructions exhibit significant generality: they improve upon optimal classical models whenever it is physically possible to do so. Our work indicates that classical models waste information unavoidably and this waste can be mitigated via quantum processing.
Framework
We adopt the framework of computational mechanics.^{1–3} An input–output process describes a system that, at each discrete timestep $t\in \mathbb{Z}$, can be ‘kicked’ in a number of different ways, denoted by some x^{(t)} selected from a set of possible inputs $\mathcal{X}$. In response, the system emits some y^{(t)} among a set of possible outputs $\mathcal{Y}$. For each possible bi-infinite input sequence $\overleftrightarrow{x}=\dots x^{(-1)}x^{(0)}x^{(1)}\dots$, the output of the system can be described by a stochastic process, $\overleftrightarrow{Y}=\dots Y^{(-1)}Y^{(0)}Y^{(1)}\dots$, a bi-infinite string of random variables where each Y^{(t)} governs the output y^{(t)}. The black-box behavior of any input–output process is characterized by a family of stochastic processes, $\{\overleftrightarrow{Y}\mid \overleftrightarrow{x}\}_{\overleftrightarrow{x}\in \overleftrightarrow{X}}$. When the input $\overleftrightarrow{x}$ is governed by some stochastic process $\overleftrightarrow{X}$, the input–output process outputs $\overleftrightarrow{y}$ with probability
$$P[\overleftrightarrow{Y}=\overleftrightarrow{y}]=\sum_{\overleftrightarrow{x}}P[\overleftrightarrow{Y}=\overleftrightarrow{y}\mid \overleftrightarrow{X}=\overleftrightarrow{x}]\,P[\overleftrightarrow{X}=\overleftrightarrow{x}].$$
Therefore, an input–output process acts as a transducer on stochastic processes—it takes one stochastic process as input, and transforms it into another.
Computational mechanics typically assumes processes are causal and stationary. Causality implies that future inputs do not retroactively affect past outputs. That is, for all $L\in \mathbb{Z}^{+}$, we require $P(Y^{(t:t+L)}\mid \overleftrightarrow{X})=P(Y^{(t:t+L)}\mid \overleftarrow{X}^{(t+L+1)})$, where $\overleftarrow{X}^{(t+L+1)}=\dots X^{(t+L-1)}X^{(t+L)}$ and Y^{(t:t + L)} = Y^{(t)}…Y^{(t + L)}. This naturally bipartitions each stochastic process $\overleftrightarrow{Y}$ into two halves, $\overleftarrow{y}^{(t)}=\dots y^{(t-2)}y^{(t-1)}$ to represent events in the past of t, and $\overrightarrow{y}^{(t)}=y^{(t)}y^{(t+1)}\dots$ to describe events in the future, governed, respectively, by $\overleftarrow{Y}^{(t)}$ and $\overrightarrow{Y}^{(t)}$. Stationarity implies the process is invariant with respect to time translation, such that $P(Y^{(t:t+L)}\mid \overleftrightarrow{x}^{(t)})=P(Y^{(0:L)}\mid \overleftrightarrow{x}^{(0)})$ and $P(\overleftrightarrow{Y}^{(t)}\mid \overleftrightarrow{x}^{(t)})=P(\overleftrightarrow{Y}^{(0)}\mid \overleftrightarrow{x}^{(0)})$, where $\overleftrightarrow{x}^{(t)}=\dots x^{(t-1)}x^{(t)}x^{(t+1)}\dots$ is governed by $\overleftrightarrow{X}^{(t)}$. Hence, we can take the present to be t = 0, and omit the superscript (t).
Each instance of an input–output process has some specific past $\overleftarrow{z}=(\overleftarrow{x},\overleftarrow{y})$. On future input $\overrightarrow{x}$, the process will then exhibit a corresponding conditional future governed by $P[\overrightarrow{Y}\mid \overrightarrow{X}=\overrightarrow{x},\overleftarrow{Z}=\overleftarrow{z}]$. A mathematical model of the process should replicate its future black-box behavior when given information about the past. That is, each model records $s(\overleftarrow{z})$ in some physical memory $\Xi$ in place of $\overleftarrow{z}$, such that upon future input $\overrightarrow{x}$ the model can generate a random variable $\overrightarrow{Y}$ according to $P[\overrightarrow{Y}\mid \overrightarrow{x},\overleftarrow{z}]$ (see Fig. 1).
Simplest classical models
Numerous mathematical models exist for each input–output process. A brute force approach involves storing all past inputs and outputs. This is clearly inefficient. Consider the trivial input–output process that outputs a completely random sequence regardless of input. Storing all past information would take an unbounded amount of memory. Yet, this process can be simulated by flipping an unbiased coin—requiring no information about the past.
A more refined approach reasons that replicating future behavior does not require differentiation of pasts with statistically identical future behavior. Formally, we define the equivalence relation $\overleftarrow{z}\sim_{\epsilon}\overleftarrow{z}'$ whenever two pasts, $\overleftarrow{z}$ and $\overleftarrow{z}'$, exhibit statistically coinciding future input–output behavior, i.e., whenever $P[\overrightarrow{Y}\mid \overrightarrow{X},\overleftarrow{Z}=\overleftarrow{z}]=P[\overrightarrow{Y}\mid \overrightarrow{X},\overleftarrow{Z}=\overleftarrow{z}']$. This partitions the space of all pasts into equivalence classes $\mathcal{S}=\{s_{i}\}$. Each $s_{j}\in \mathcal{S}$ is known as a causal state, and ε denotes the encoding function that maps each past to its corresponding causal state. In general, a process can have an infinite number of causal states. Classical studies generally concentrate on cases where $n=|\mathcal{S}|$ is finite. In light of this, we focus our presentation on such cases.
This motivates the ε-transducer, which stores the causal state $\epsilon(\overleftarrow{z})$ in place of $\overleftarrow{z}$. It then operates according to the transition elements
$$T_{ij}^{y|x}=P[S^{(t)}=s_{j},\,Y^{(t)}=y\mid S^{(t-1)}=s_{i},\,X^{(t)}=x],$$
the probability a transducer in causal state $s_{i}\in \mathcal{S}$ will transition to causal state $s_{j}\in \mathcal{S}$ while emitting $y\in \mathcal{Y}$, conditioned on receiving input $x\in \mathcal{X}$. Note that this construction is naturally unifilar: given the state of the transducer at the current timestep, its state at the subsequent timestep can be completely deduced by observation of the next input–output pair.^{3} Thus iterating through this procedure generates output behavior statistically identical to that of the original input–output process.
For each stationary input process $\overleftrightarrow{X}$, causal state s_{ i } occurs with probability $p_{X}(i)=P[\epsilon(\overleftarrow{z})=s_{i}]$. The ε-transducer will thus exhibit internal entropy
$$C_{X}=-\sum_{i}p_{X}(i)\log_{2}p_{X}(i).$$
ε-transducers are the provably simplest classical models: any other encoding function $s(\overleftarrow{z})$ that generates correct future statistics will exhibit greater entropy.^{3}
Complexity theorists regard $C_{X}$ as a quantifier of complexity,^{3} the rationale being that it characterizes the minimum memory any model must store when simulating $\{\overleftrightarrow{Y}\mid \overleftrightarrow{x}\}_{\overleftrightarrow{x}\in \overleftrightarrow{X}}$ on input $\overleftrightarrow{X}$. More precisely, consider the simulation of N such input–output processes, where each instance is driven by $\overleftrightarrow{X}$. C_{ X } then specifies that in the asymptotic limit (N→∞) we can use the ε-transducer to replicate the future statistics of the ensemble by storing the past within a system of NC_{ X } bits.
In the special case where the input–output process is input independent (i.e., $\overrightarrow{Y}\mid \overleftrightarrow{x}$ is the same for all $\overleftrightarrow{x}$), the causal states are reduced to equivalence classes on the set of past outputs $\overleftarrow{\mathcal{Y}}$. $T_{ij}^{y|x}$ becomes independent of x and is denoted $T_{ij}^{y}$. Here, the ε-transducers are known as ε-machines and C_{ X } as the statistical complexity. This measure has been applied extensively to quantify the structure of various stochastic processes.^{12–15}
For general input–output processes, C_{ X } is $\overleftrightarrow{X}$-dependent and is known as the input-dependent statistical complexity. In certain pathological cases (e.g., always inputting the same input at every timestep), the transducer may have zero probability of being in a particular causal state, potentially leading to a drastic reduction in $C_{X}$. Here, we consider non-pathological inputs, such that the transducer has nonzero probability of being in each causal state, i.e., p_{ X }(i) > 0 for all i. It is also often useful to quantify the intrinsic structure of input–output processes without referring to a specific input process.^{3} One proposal is the structural complexity, $\bar{C}=\sup_{X}C_{X}$, which measures how much memory is required in the worst-case scenario.
Example
We illustrate a simple example of an actively perturbed coin. Consider a box with two buttons, containing a single coin. At each timestep, the box accepts a single bit x∈{0,1} as input representing which of the two buttons is pressed. In response, it flips the coin with probability p if x = 1, and probability q if x = 0, where 0 < p, q < 1. The box then outputs the new state of the coin, y. The behavior of the box is described by an input–output process.
First note that when p = q = 0.5, the output of the device becomes completely random, all pasts collapse to a single causal state, and the statistical complexity is trivially zero. In all other cases, the past partitions into two causal states, $s_{k}=\{\overleftarrow{z}:y^{(-1)}=k\}$, k = 0,1, corresponding to the two possible outcomes of the previous coin toss. Future statistics can then be generated via appropriate transition elements. For details, see the graphical representation in Fig. 2.
Consider a simple input process $\overleftrightarrow{X}_{u}$ where x = 1 is chosen with some fixed probability u at each timestep. The symmetry of the transition elements implies s_{0} and s_{1} occur with equal probability. Thus $C_{X_{u}}=1$. Furthermore, as this is the maximum entropy a two-state machine can take, the classical structural complexity of the actively perturbed coin, $\bar{C}$, is also 1.
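The claim that the two causal states are equiprobable can be checked numerically. The following is a minimal sketch, assuming causal state s_i records the last coin face i, the output y is the new face, and the next causal state is therefore s_y; the function names and the chosen values of p, q and u are illustrative and do not come from the text.

```python
import math

def coin_transitions(p, q):
    """Transition elements for the actively perturbed coin.

    T[(x, i)] maps (y, k) to the probability of emitting y and moving to
    causal state s_k, given input x and current causal state s_i.
    Assumption: the coin flips with probability p on x=1 and q on x=0.
    """
    T = {}
    for x, flip in ((0, q), (1, p)):
        for i in (0, 1):
            T[(x, i)] = {(i, i): 1.0 - flip, (1 - i, 1 - i): flip}
    return T

def stationary_distribution(T, u, steps=10_000):
    """Iterate the causal-state distribution under i.i.d. inputs, P[x=1] = u."""
    dist = [1.0, 0.0]  # start with certainty in s_0
    for _ in range(steps):
        new = [0.0, 0.0]
        for i in (0, 1):
            for x, px in ((0, 1.0 - u), (1, u)):
                for (y, k), t in T[(x, i)].items():
                    new[k] += dist[i] * px * t
        dist = new
    return dist

def shannon_entropy(dist):
    """Shannon entropy in bits."""
    return -sum(pi * math.log2(pi) for pi in dist if pi > 0)

T = coin_transitions(p=0.3, q=0.2)
dist = stationary_distribution(T, u=0.5)
C_X = shannon_entropy(dist)
print(dist, C_X)
```

Plain iteration is used here instead of an eigenvector solver to keep the sketch dependency-free; it converges quickly whenever the effective flip probability is not close to 0 or 1.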
Classical inefficiency
Even though the ε-transducer is classically optimal, it may still store unnecessary information. Consider the example above. The ε-transducer must store the last state of the coin (i.e., whether the past is in s_{0} or s_{1}). However, irrespective of input x, both s_{0} and s_{1} have nonzero probability of transitioning to the same s_{ k } while emitting the same output y. Once this happens, some of the information being used to perfectly discriminate between s_{0} and s_{1} will be irrevocably lost, i.e., there exists no systematic method to perfectly retrodict whether an ε-transducer was initialized in s_{0} or s_{1} from its future behavior, regardless of how we choose future inputs. Thus, the transducer appears to store information that will never be reflected in future observations, and is, therefore, wasted.
We generalize this observation by introducing stepwise inefficiency. Consider an ε-transducer equipped with causal states $\mathcal{S}$ and transition elements $T_{ij}^{y|x}$. Suppose there exist $s_{i},s_{j}\in \mathcal{S}$ such that, irrespective of input x, both s_{ i } and s_{ j } have nonzero probability of transitioning to some coinciding $s_{k}\in \mathcal{S}$ while emitting a coinciding output $y\in \mathcal{Y}$; i.e., for all $x\in \mathcal{X}$ there exist $y\in \mathcal{Y}$, $s_{k}\in \mathcal{S}$ such that both $T_{ik}^{y|x}$ and $T_{jk}^{y|x}$ are nonzero. This implies that at the subsequent timestep, it is impossible to infer with certainty which of the two causal states the transducer was previously in. Thus, some of the information being stored during the previous timestep is inevitably lost. We refer to any ε-transducer that exhibits this condition as being stepwise inefficient.
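The stepwise-inefficiency condition is a finite check on the transition elements. The sketch below is one way to test it, assuming the transition elements are stored as a dictionary keyed by (input, state); the data layout and function names are illustrative choices. The deterministic case p = 1 lies outside the range 0 < p, q < 1 used in the example and is included only as a contrast.

```python
def coin_transitions(p, q):
    """Transition elements T[(x, i)][(y, k)] for the actively perturbed coin.

    Assumption: state s_i is the last coin face, output y is the new face,
    and the coin flips with probability p on x=1 and q on x=0.
    """
    T = {}
    for x, flip in ((0, q), (1, p)):
        for i in (0, 1):
            T[(x, i)] = {(i, i): 1.0 - flip, (1 - i, 1 - i): flip}
    return T

def is_stepwise_inefficient(T, states, inputs):
    """Return True if some pair of distinct states s_i, s_j has, for every
    input x, an output/next-state pair (y, k) reachable from both."""
    for a in states:
        for b in states:
            if a >= b:
                continue  # each unordered pair is examined once
            shared_for_all_inputs = all(
                any(
                    t > 0 and T[(x, b)].get(yk, 0.0) > 0
                    for yk, t in T[(x, a)].items()
                )
                for x in inputs
            )
            if shared_for_all_inputs:
                return True
    return False

generic = is_stepwise_inefficient(coin_transitions(0.3, 0.2), (0, 1), (0, 1))
deterministic = is_stepwise_inefficient(coin_transitions(1.0, 0.2), (0, 1), (0, 1))
print(generic, deterministic)
```

In the deterministic variant, pressing button 1 always flips the coin, so the previous state can be read off the next output and the condition fails for that input.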
Results
Quantum processing can mitigate this inefficiency. Whenever the ε-transducer of a given input–output process is stepwise inefficient, we can construct a quantum transducer that is provably simpler. Our construction assigns each s_{ i } a corresponding quantum causal state
$$|s_{i}\rangle=\bigotimes_{x}|s_{i}^{x}\rangle,\qquad |s_{i}^{x}\rangle=\sum_{k}\sum_{y}\sqrt{T_{ik}^{y|x}}\,|y\rangle|k\rangle, \tag{4}$$
where $\otimes_{x}$ represents the direct product over all possible inputs $x\in \mathcal{X}$, while $|y\rangle$ and $|k\rangle$ denote some orthonormal basis for spaces of dimension $|\mathcal{Y}|$ and $|\mathcal{S}|$, respectively. The set of quantum causal states $\{|s_{i}\rangle\}_{i}$ forms a state space for the quantum transducer. Note that although each state is represented as a vector of dimension $(|\mathcal{Y}||\mathcal{S}|)^{|\mathcal{X}|}$, in practice they span a Hilbert space of dimension at most $n=|\mathcal{S}|$, whenever $|\mathcal{S}|$ is finite. In such scenarios, the quantum causal states can always be stored losslessly within an n-dimensional quantum system (see subsequent example and methods).
For an input process $\overleftrightarrow{X}$, the quantum transducer thus has input-dependent complexity of
$$Q_{X}=-\mathrm{Tr}(\rho_{X}\log_{2}\rho_{X}),$$
where $\rho_{X}=\sum_{i}p_{X}(i)\,|s_{i}\rangle\langle s_{i}|$. In general $Q_{X}\le C_{X}$.^{16} Note that when there is only one possible input, $|s_{i}\rangle=|s_{i}^{0}\rangle$. This recovers existing quantum ε-machines that can model autonomously evolving stochastic processes better than their simplest classical counterparts.^{8} Our transducers generalize this result to input–output processes. We show that they exhibit the following properties:

1. Correctness: For any past $\overleftarrow{z}$ in causal state s_{ i }, a quantum transducer initialized in state $|s_{i}\rangle$ can exhibit correct future statistical behavior, i.e., there exists a systematic protocol that, when given $|s_{i}\rangle$, generates $\overrightarrow{y}$ according to $P[\overrightarrow{Y}\mid \overrightarrow{X}=\overrightarrow{x},\overleftarrow{Z}=\overleftarrow{z}]$, for each possible sequence of future inputs $\overrightarrow{x}$.

2. Reduced Complexity: Q_{ X } < C_{ X } for all non-pathological input processes $\overleftrightarrow{X}$, whenever the process has a stepwise inefficient ε-transducer.

3. Generality: Quantum transducers store less memory whenever it is physically possible to do so. Given an input–output process, either Q_{ X } < C_{ X } for all non-pathological $\overleftrightarrow{X}$, or there exists no physically realizable model that does this.
Correctness guarantees quantum transducers behave statistically identically to their classical counterparts—and are thus operationally indistinguishable from the input–output processes they simulate. The proof is done by explicit construction (see Fig. 3 and details in the methods).
Reduced complexity implies stepwise inefficiency is sufficient for quantum transducers to be simpler than their classical counterparts. The proof involves showing that if, for any potential input x, both s_{ i } and s_{ j } have nonzero probability of transitioning to some coinciding causal state s_{ k } while emitting identical y, then $|s_{i}\rangle$ and $|s_{j}\rangle$ are non-orthogonal (see methods). Thus, provided two such causal states exist (guaranteed by stepwise inefficiency of the transducer), and each occurs with some nonzero probability (guaranteed by non-pathology of the input), Q_{ X } < C_{ X }.
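The non-orthogonality argument can be illustrated numerically for the actively perturbed coin. The sketch below assumes the standard form $|s_{i}\rangle=\bigotimes_{x}\sum_{y,k}\sqrt{T_{ik}^{y|x}}\,|y\rangle|k\rangle$, which is consistent with the example states given later in the text; under that assumption the overlap factorizes into one term per input. The helper names and parameter values are illustrative.

```python
import math

def coin_transitions(p, q):
    """Transition elements T[(x, i)][(y, k)] for the actively perturbed coin
    (assumed: flip probability p on x=1, q on x=0, next state equals output)."""
    T = {}
    for x, flip in ((0, q), (1, p)):
        for i in (0, 1):
            T[(x, i)] = {(i, i): 1.0 - flip, (1 - i, 1 - i): flip}
    return T

def overlap(T, i, j, inputs):
    """Inner product <s_i|s_j> under the assumed standard form.

    Each input x contributes the factor sum_{y,k} sqrt(T_ik^{y|x} T_jk^{y|x}),
    which is nonzero exactly when s_i and s_j share a reachable (y, k).
    """
    result = 1.0
    for x in inputs:
        result *= sum(
            math.sqrt(t * T[(x, j)].get(yk, 0.0))
            for yk, t in T[(x, i)].items()
        )
    return result

p, q = 0.3, 0.2
c = overlap(coin_transitions(p, q), 0, 1, inputs=(0, 1))
r = 16 * p * q * (1 - p) * (1 - q)
print(c, math.sqrt(r))
```

The overlap vanishes as soon as one input has no shared (y, k), which is why the condition must hold irrespective of input.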
Generality establishes that stepwise inefficiency is a necessary condition for any physically realizable quantum model to outperform its classical counterpart. Combined with ‘reduced complexity’, it implies that stepwise inefficiency is the sole source of avoidable classical inefficiency and that our particular quantum transducers are general in mitigating this inefficiency. The proof is detailed in the methods, and involves showing that any model which improves upon an ε-transducer that is stepwise efficient allows perfect discrimination of non-orthogonal quantum states.
Together, these results isolate stepwise inefficiency as the necessary and sufficient condition for quantum models to be simpler than their classical counterparts, and furthermore, establish an explicit construction of such a model. It follows that whenever the ε-transducer is stepwise inefficient, the upper bound
$$\bar{Q}=\sup_{X}Q_{X}$$
will be strictly less than $\bar{C}=\sup_{X}C_{X}$, provided $\sup_{X}C_{X}$ is attained for a non-pathological $\overleftrightarrow{X}$. Intuitively, this clause appears natural. If an agent wished to drive a transducer to exhibit the greatest entropy, then it would generally be advantageous to ensure the transducer has nonzero probability of being in each causal state. Nevertheless, as the maximization is highly nontrivial to evaluate, this remains an open conjecture.
Example revisited
We illustrate these results for the aforementioned actively perturbed coin. Recall that the ε-transducer of this process features two causal states, s_{0} and s_{1} (see Fig. 2). As this transducer is stepwise inefficient, a more efficient quantum transducer exists. Specifically, set the quantum causal states to
$$|\tau_{0}\rangle=|0\rangle,\qquad |\tau_{1}\rangle=\sqrt{r}\,|0\rangle+\sqrt{1-r}\,|1\rangle,$$
where r = 16pq(1−p)(1−q). Note that while these states do not resemble the standard form in Eq. (4), they are unitarily equivalent to it. Given $\{|\tau_{i}\rangle\}_{i}$, we can initialize the joint four-qubit state $|\tau_{i}000\rangle$ (with three ancillae), and implement an appropriate 4-qubit unitary $U:|\tau_{i}000\rangle \to |s_{i}\rangle$ for i = 0,1, where
$$|s_{0}\rangle=|\varphi_{p}\rangle_{12}\otimes |\varphi_{q}\rangle_{34},\qquad |s_{1}\rangle=\hat{X}_{1}\hat{X}_{2}\hat{X}_{3}\hat{X}_{4}|s_{0}\rangle$$
are of standard form. Here subscripts 1…4 label each of the four qubits, while $|\varphi_{p}\rangle=\sqrt{1-p}\,|00\rangle+\sqrt{p}\,|11\rangle$ and $|\varphi_{q}\rangle=\sqrt{1-q}\,|00\rangle+\sqrt{q}\,|11\rangle$, and $\hat{X}_{i}$ represents the Pauli X operator on the i^{th} qubit. The $|\tau_{i}\rangle$ representation makes it clear that these states can be encoded within a single qubit. Figure 4a then outlines a quantum circuit that generates desired future behavior.
The resulting quantum transducer is clearly more efficient. $|\langle \tau_{0}|\tau_{1}\rangle|=\sqrt{r}>0$, provided 0 < p,q < 1. Thus Q_{ X } < C_{ X } for all input processes $\overleftrightarrow{X}$. Furthermore, the quantum structural complexity $\bar{Q}=\sup_{X}Q_{X}$ is attained for any input process where $|\tau_{0}\rangle$ and $|\tau_{1}\rangle$ occur with equal probability, such as any $\overleftrightarrow{X}=\overleftrightarrow{X}_{u}$. The improvement is significant, and can be explicitly evaluated (see Fig. 4b). In particular, $\lim_{p,q\to 0.5}Q_{X}/C_{X}=0$. Thus the quantum transducer, in limiting cases, can use negligible memory compared to its classical counterpart.
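The size of the improvement can be evaluated in closed form when the two quantum causal states are equally likely. The sketch below relies on a standard fact: an equal mixture of two pure states with overlap c has eigenvalues (1 ± c)/2, so its von Neumann entropy is the binary entropy of (1 + c)/2. Here c = √r as stated above and C_X = 1; the function names are illustrative.

```python
import math

def binary_entropy(x):
    """Binary entropy in bits, with the convention 0 * log(0) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def quantum_complexity(p, q):
    """Q_X for equiprobable quantum causal states with overlap sqrt(r)."""
    r = 16 * p * q * (1 - p) * (1 - q)
    c = min(1.0, math.sqrt(r))  # guard against rounding slightly above 1
    return binary_entropy((1.0 + c) / 2.0)

# C_X = 1 bit for equiprobable causal states, so Q_X is also the ratio Q_X / C_X.
for eps in (0.3, 0.1, 0.01, 0.001):
    print(eps, quantum_complexity(0.5 - eps, 0.5 - eps))
```

As eps shrinks, the printed values fall toward zero, in line with the stated limit as p, q → 0.5.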
The intuition is that as we approach this limit, the output of the process becomes progressively more random. Thus the future black-box behavior of a process whose last output is heads becomes increasingly similar to one whose last output is tails. The equivalence class, s_{0} or s_{1}, then contains progressively less information about future outputs. A classical transducer nevertheless must distinguish these two scenarios, and thus exhibits an entropy of 1 when the scenarios are equally likely. A quantum transducer, however, has the freedom to only partially distinguish the two scenarios, to the extent to which they affect future statistics. In particular, $\lim_{p,q\to 0.5}|\langle \tau_{0}|\tau_{1}\rangle|=1$. As the process becomes more random, the quantum transducer saves memory by encoding the two scenarios in progressively less orthogonal quantum states.
Future directions
There is potential in viewing predictive modeling as a communication task, where Alice sends information about a process’s past to Bob, so that he may generate correct future statistical behavior.^{3, 9} The simpler a model, the less information Alice needs to communicate to Bob. The entropic benefits of harnessing non-orthogonal causal states mirror similar advantages in exploiting non-orthogonal codewords to perform certain communication tasks.^{17} Quantum transducers could thus identify a larger class of such tasks, and provide a general strategy to supersede classical limits. Meanwhile, one may also consider generalizations of statistical complexity that use other measures of entropy. The max entropy, for example, captures the minimum dimensionality required to simulate general input–output processes. This may complement existing work in quantum dimensionality testing,^{18–20} pointing to testing the dimensionality of systems by seeing how they transform stochastic processes.
Another interesting question is how quantum transducers relate to quantum advantage in randomness processing.^{21} In this context, it was shown that quantum sources of randomness (named quoins), in the form $|p\rangle=\sqrt{p}\,|0\rangle+\sqrt{1-p}\,|1\rangle$, can be a much more powerful resource for sampling a coin with a p-dependent bias f(p) than classical coins of bias p. Subsequent experimental implementations have used quoins to sample certain f(p), which are impossible to synthesize when equipped with only p-coins.^{22} Quantum transducers appear to utilize similar effects. The quantum causal states in Eq. (4) also resemble a quantum superposition of classical measurement outcomes, which can be used to generate desired future output statistics more efficiently than classically possible. Is this resemblance merely superficial? Quantum transducers and the quantum Bernoulli factory certainly also exhibit significant differences, both in how they quantify efficiency and in what they consider as input and output. As such, this question remains very much open.
There could also be considerable interest in establishing what resources underpin the performance advantage in quantum transducers. Non-orthogonal quantum causal states, and thus coherence, are clearly necessary. This non-orthogonality then immediately implies that quantum correlations (in the sense of discord^{23}) necessarily exist between the state of the transducer and its past outputs. Could the amount of such resources be related quantitatively to the quantum advantage of a particular transducer? Whether more stringent quantum resources, such as entanglement, are also required at some point to generate correct future statistics also remains an open question. Certainly, all quantum transducers described here exploit highly entangling operations to generate statistically correct future behavior, and exhibit a significant amount of entanglement during their operation. Is the existence of such entanglement at some stage during the simulation process essential for quantum advantage?
Discussion
In computational mechanics, ε-transducers are the provably simplest models of input–output processes. Their internal entropy is a quantifier of structure: any device capable of replicating the process’s behavior must track at least this much information. Here, we generalize this formalism to the quantum regime. We propose a systematic method to construct quantum transducers that are generally simpler than their simplest classical counterparts, in the sense that quantum transducers store less information whenever it is physically possible to do so. Our work indicates the perceived complexity of input–output processes generally depends on what type of information theory we use to describe them.
A natural continuation is to explore the feasibility of such quantum transducers in real-world conditions. A proof-of-principle demonstration is well within reach of present-day technology. The quantum transducer for the actively perturbed coin can be implemented by a single qubit undergoing one of two different weak measurements at each timestep. To demonstrate a quantum advantage for real-world applications would also motivate new theory. For example, noise will certainly degrade the performance of quantum transducers, forcing the use of more distinguishable quantum causal states. The derivation of bounds on the resultant entropic cost would thus help us establish thresholds that guarantee quantum advantage in real-world conditions.
Ultimately, the interface between quantum and computational mechanics motivates the potential for tools of each community to impact the other. Classical transducers, for example, are used to capture emergence of complexity under evolutionary pressures,^{24} distil structure within cellular automata,^{25, 26} and characterize the dynamics of sequential learning,^{27} and it would be interesting to see how these ideas change in the quantum regime. Meanwhile, recent results suggest the inefficiency of classical models may incur unavoidable thermodynamic costs.^{28–30} In reducing this inefficiency, quantum transducers could offer more energetically efficient methods of transforming information. Any such developments could demonstrate the impact of quantum technologies in domains where their use has not been considered before.
Methods
Definitions
Let X^{(t)}, Y^{(t)} and S^{(t)} represent, respectively, the random variables governing the input, output and causal state at time t. Let Y^{(0:t)} = Y^{(0)}…Y^{(t)} govern the outputs y^{(0:t)} = y^{(0)}…y^{(t)} and X^{(0:t)} = X^{(0)}…X^{(t)} govern the inputs x^{(0:t)} = x^{(0)}…x^{(t)}. We introduce ordered pairs Z^{(t)} = (X^{(t)}, Y^{(t)}) which take values z^{(t)} = (x^{(t)}, y^{(t)}) ∈ $\mathcal{Z}$, where $\mathcal{Z}$ represents the space of potential input–output pairs. In analogy, let $\overleftarrow{z}=(\overleftarrow{x},\overleftarrow{y})$ represent a particular past, $\overleftarrow{\mathcal{Z}}$ be the space of all possible pasts and z^{(0:t)} = z^{(0)}…z^{(t)} ∈ $\mathcal{Z}^{t+1}$ the input–output pairs from timesteps 0 to t. Let $|\mathcal{X}|$ denote the size of the input alphabet, $|\mathcal{Y}|$ the size of the output alphabet and $n=|\mathcal{S}|$ the number of causal states (if finite). Without loss of generality, we assume the inputs and outputs are labeled numerically, such that $\mathcal{X}=\{x_{i}\}_{0}^{|\mathcal{X}|-1}$ and $\mathcal{Y}=\{y_{i}\}_{0}^{|\mathcal{Y}|-1}$.
Each instance of an input–output process $\{\overleftrightarrow{Y}\mid \overleftrightarrow{x}\}$ exhibits a specific past $\overleftarrow{z}=(\overleftarrow{x},\overleftarrow{y})$. When supplied x^{(0)} = x as input, it emits y^{(0)} = y with probability $P[Y^{(0)}=y\mid X^{(0)}=x,\overleftarrow{Z}=\overleftarrow{z}]$. The system’s past then transitions from $\overleftarrow{z}$ to $\overleftarrow{z}'=(\overleftarrow{x}x,\overleftarrow{y}y)=(\overleftarrow{x}',\overleftarrow{y}')$. This motivates the propagating functions μ_{ x,y } on the space of pasts, $\mu_{x,y}(\overleftarrow{z})=(\overleftarrow{x}x,\overleftarrow{y}y)$, which characterize how the past updates upon observation of (x,y) at each timestep. Iterating through this process for t + 1 timesteps gives expected output y^{(0:t)} with probability $P(Y^{(0:t)}=y^{(0:t)}\mid X^{(0:t)}=x^{(0:t)},\overleftarrow{Z}=\overleftarrow{z})$ upon receipt of input sequence x^{(0:t)}. Taking t→∞ gives the expected future input–output behavior $P[\overrightarrow{Y}\mid \overrightarrow{X}=\overrightarrow{x},\overleftarrow{Z}=\overleftarrow{z}]$ upon future input sequence $\overrightarrow{x}\in \overrightarrow{\mathcal{X}}$.
A quantum model of an input–output process defines an encoding function $\aleph$ that maps each past $\overleftarrow{z}$ to some state $\aleph(\overleftarrow{z})=\rho_{\overleftarrow{z}}$ within some physical system $\Xi$. The model is correct if there exists a family of operations $\mathbb{M}=\{\mathcal{M}_{x}\}_{x\in \mathcal{X}}$ such that application of $\mathcal{M}_{x}$ on $\Xi$ replicates the behavior of inputting $x$. That is, $\mathcal{M}_{x}$ acting on $\rho_{\overleftarrow{z}}$ should (1) generate output y with probability $P[Y^{(0)}=y\mid X^{(0)}=x,\overleftarrow{Z}=\overleftarrow{z}]$ and (2) transition $\Xi$ into state $\aleph[\mu_{x,y}(\overleftarrow{z})]=\rho_{\mu_{x,y}(\overleftarrow{z})}$. Condition (1) ensures the model outputs statistically correct y^{(0)}, while condition (2) ensures the model’s internal memory is updated to record the event z^{(0)} = (x^{(0)},y^{(0)}), allowing correct prediction upon receipt of future inputs. Sequential application of $\mathcal{M}_{\overrightarrow{x}}=\mathcal{M}_{x^{(0)}},\mathcal{M}_{x^{(1)}},\dots$ then generates output $\overrightarrow{y}$ with probability $P[\overrightarrow{Y}\mid \overrightarrow{X}=\overrightarrow{x},\overleftarrow{Z}=\overleftarrow{z}]$. Let $\Omega=\{\rho_{\overleftarrow{z}}\}_{\overleftarrow{z}\in \overleftarrow{\mathcal{Z}}}$ be the image of $\aleph$. We now define quantum models as follows.
Definition 1
A general quantum model of an input–output process is a triple $\mathcal{Q}=(\aleph,\Omega,\mathbb{M})$, where $\aleph$, Ω and $\mathbb{M}$ satisfy the conditions above.
Each stationary input process $\overleftrightarrow{X}$ induces a probability distribution $p_X(\overleftarrow{z})$ over the set of pasts, and thus a steady state of the machine ρ_{X}. The resulting entropy of $\Xi$,
$$Q_X[\mathcal{Q}]=-\mathrm{Tr}\left(\rho_X\log\rho_X\right),$$
then defines the model’s input-dependent complexity. For the quantum transducers in the main body, $\Omega=\{|s_i\rangle\langle s_i|\}_i$ corresponds to the set of quantum causal states, and the encoding function $\aleph$ maps each past $\overleftarrow{z}$ to $|s_i\rangle\langle s_i|$ whenever $\overleftarrow{z}\in s_i$. Specifically, if we define the classical encoding function ε: $\overleftarrow{\mathcal{Z}}\to\mathcal{S}$ that maps pasts onto causal states such that $\epsilon(\overleftarrow{z})=s_i$ iff $\overleftarrow{z}\in s_i$, then $|s_i\rangle=|\epsilon(\overleftarrow{z})\rangle$.
We also introduce input strategies. Suppose Bob receives a model of some known input–output process, initialized in some (possibly unknown) state $\rho_{\overleftarrow{z}}\in\Omega$. Bob now wants to drive the model to exhibit some particular future behavior by using an input strategy: a specific algorithm for deciding what input x^{(t)} he will feed the model at each specific t ≥ 0, purely from the model’s black-box behavior.
Definition 2
An input strategy is a family of functions $F=\{f^{(t)}\mid t\in\mathbb{Z}^{+}\}$, where $f^{(t)}:\mathcal{Z}^{t}\to\mathcal{X}$ is a map from the space of pre-existing inputs–outputs onto the input at time t, such that x^{(t)} = f^{(t)}(z^{(0:t−1)}). We denote a sequence of future inputs determined using x^{(t)} = f^{(t)}(z^{(0:t−1)}) as $\overrightarrow{x}_F$.
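A minimal Python sketch of Definition 2 follows. It represents the whole family F by a single function of the observed history, whose length plays the role of t. The names `run_strategy`, `feedback_strategy` and `inverter` are hypothetical and chosen only for illustration.

```python
def run_strategy(strategy, black_box, steps):
    """Feed a black box inputs chosen from the observed history z^(0:t-1)."""
    history = []  # list of (x, y) pairs observed so far
    for _ in range(steps):
        x = strategy(tuple(history))  # x^(t) = f^(t)(z^(0:t-1))
        y = black_box(x)
        history.append((x, y))
    return history


def feedback_strategy(history):
    """Hypothetical strategy: start with 0, then feed back the last output."""
    return history[-1][1] if history else 0


def inverter(x):
    """Hypothetical memoryless black box that flips its input bit."""
    return 1 - x


history = run_strategy(feedback_strategy, inverter, steps=4)
inputs = [x for x, _ in history]
```

The resulting list `inputs` plays the role of the first few symbols of $\overrightarrow{x}_F$ for this particular strategy and black box.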
In subsequent proofs, we will invoke input strategies on classical ε-transducers. Here, we denote
$$P_{s_i,F}[\overrightarrow{Y}]=P[\overrightarrow{Y}\mid\overrightarrow{X}=\overrightarrow{x}_F,S^{(-1)}=s_i]$$
as the probability distribution that governs future outputs $\overrightarrow{Y}$, when an input strategy F is used to select the future inputs $\overrightarrow{x}_F$ to an ε-transducer initialized in some causal state $s_i\in\mathcal{S}$.
We also make use of the trace distance, $D[P,Q]=\frac{1}{2}\sum_{\overrightarrow{y}\in\overrightarrow{\mathcal{Y}}}\left|P[\overrightarrow{y}]-Q[\overrightarrow{y}]\right|$, between two probability distributions $P[\overrightarrow{y}]$ and $Q[\overrightarrow{y}]$. Similarly, any two quantum states τ and σ have trace distance $D[\tau,\sigma]=\frac{1}{2}\mathrm{Tr}|\tau-\sigma|$, where $|A|\equiv\sqrt{A^{\dagger}A}$.
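Both distances can be sketched in a few lines of Python. Distributions are assumed to be given as dictionaries over a finite set of outcomes, and quantum states are restricted to pure states with real amplitudes, for which the standard identity D = sqrt(1 − |⟨a|b⟩|²) applies. The function names are illustrative only.

```python
import math


def trace_distance(P, Q):
    """D[P, Q] = 1/2 * sum_y |P[y] - Q[y]| for distributions given as dicts."""
    support = set(P) | set(Q)
    return 0.5 * sum(abs(P.get(y, 0.0) - Q.get(y, 0.0)) for y in support)


def pure_state_trace_distance(a, b):
    """Trace distance between two pure states with real amplitude lists,
    using the identity D = sqrt(1 - |<a|b>|^2)."""
    overlap = sum(ai * bi for ai, bi in zip(a, b))
    return math.sqrt(max(0.0, 1.0 - overlap ** 2))


d_disjoint = trace_distance({"0": 1.0}, {"1": 1.0})
d_partial = trace_distance({"a": 0.5, "b": 0.5}, {"a": 0.25, "b": 0.75})
d_orthogonal = pure_state_trace_distance([1.0, 0.0], [0.0, 1.0])
```

A distance of 1 corresponds to perfectly distinguishable distributions or states, which is the case that matters in the proofs below.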
Proof of correctness
Here, we prove a quantum transducer can generate correct future statistical behavior when supplied with a quantum system $\Xi$ initialized in state $|s_i\rangle$, encoding information about $\overleftarrow{z}\in s_i$. That is, there exists a family of quantum processes $\mathbb{M}=\{\mathcal{M}_x\}$ whose action on $\Xi$ produces an output y sampled from $P[Y^{(0)}=y\mid X^{(0)}=x,S^{(-1)}=s_i]$, while transforming the state of $\Xi$ to $|\epsilon(\overleftarrow{x}x,\overleftarrow{y}y)\rangle$.
Proof
Recall that $|s_i\rangle=\otimes_x|s_i^x\rangle$ where $|s_i^x\rangle=\sum_{k=0}^{n-1}\sum_{y\in\mathcal{Y}}\sqrt{T_{ik}^{y\mid x}}|y\rangle|k\rangle$. Let Σ be the $|\mathcal{Y}|$-dimensional Hilbert space spanned by $\{|y\rangle\}_{y\in\mathcal{Y}}$ and $\mathcal{K}$ be the n-dimensional Hilbert space spanned by $\{|k\rangle\}_{k=0}^{n-1}$. Then each $|s_i^x\rangle$ lies in $\omega=\Sigma\otimes\mathcal{K}$, and each quantum causal state $|s_i\rangle$ lies within $\mathcal{W}=\omega^{\otimes|\mathcal{X}|}$. The set $\{|s_i\rangle\}_{i=0}^{n-1}$ spans some subspace of $\mathcal{W}$ of dimension at most n. This implies that the causal states can be stored within $\mathcal{K}$ without loss, i.e., there exist quantum states $\{|\tau_i\rangle\}_{i=0}^{n-1}$ in $\mathcal{K}$ and a unitary process U such that $U:|\tau_i\rangle\to|s_i\rangle$ for all i (note that this assumes appending suitable ancillary subspaces to each $|\tau_i\rangle$). An explicit form for $|\tau_i\rangle$ can be systematically constructed through Gram–Schmidt decomposition.^{31} We refer to $|\tau_i\rangle$ as compressed causal states, and U as the decompression operator.
We define the selection operator $S_x:\mathcal{W}\to\omega$ such that $S_{x'}:\left(\otimes_x|\varphi_x\rangle\right)\to|\varphi_{x'}\rangle$. Physically, if $\mathcal{W}$ represents a state space of $|\mathcal{X}|$ qudits, each with state space ω, labeled from 0 to $|\mathcal{X}|-1$, then S_{x} represents discarding (or tracing out) all except the x^{th} qudit. Meanwhile, let B be a quantum operation on ω such that $B:|y\rangle|k\rangle\langle y|\langle k|\to|y\rangle|\tau_k\rangle\langle y|\langle\tau_k|$. This operation always exists, as it can be implemented by Kraus operators $E_{yk}=|y\rangle|\tau_k\rangle\langle y|\langle k|$ (Fig. 5).
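As a consistency check, the following short derivation uses only the definitions above and assumes that each compressed state $|\tau_k\rangle$ is normalized. The Kraus operators then satisfy the completeness relation on ω, so B is a valid trace-preserving operation:

$$\sum_{y,k}E_{yk}^{\dagger}E_{yk}=\sum_{y,k}|y\rangle|k\rangle\langle\tau_k|\tau_k\rangle\langle y|\langle k|=\sum_{y,k}|y\rangle\langle y|\otimes|k\rangle\langle k|=\mathbb{1}_{\omega}.$$

Applied to $|s_i^x\rangle\langle s_i^x|$, and using $\langle y|\langle k|s_i^x\rangle=\sqrt{T_{ik}^{y\mid x}}$, the same operators give

$$\sum_{y,k}E_{yk}|s_i^x\rangle\langle s_i^x|E_{yk}^{\dagger}=\sum_{y,k}T_{ik}^{y\mid x}|y\rangle|\tau_k\rangle\langle y|\langle\tau_k|.$$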
The quantum transducer operates as follows. Upon input x at time t, it applies S_{x} on $\Xi$, followed by execution of B. This transforms the state of the system from $|s_i^x\rangle$ to $\sum_{k=0}^{n-1}\sum_{y\in\mathcal{Y}}T_{ik}^{y\mid x}|y\rangle|\tau_k\rangle\langle y|\langle\tau_k|$. Application of the decompression operator then gives $\sum_{k=0}^{n-1}\sum_{y\in\mathcal{Y}}T_{ik}^{y\mid x}|y\rangle|s_k\rangle\langle y|\langle s_k|$ on state space $\Sigma\otimes\mathcal{W}$. The machine then emits Σ as the output. Measurement of Σ by an external observer in basis $\{|y\rangle\}_{y\in\mathcal{Y}}$ gives the output y at time t, while inducing $\mathcal{W}$ to transition to the subsequent quantum causal state. $\mathcal{W}$ is then retained in $\Xi$.
The above procedure establishes a family of quantum operations $\{\mathcal{M}_x\}_{x\in\mathcal{X}}$ that maps each quantum causal state $|s_i\rangle$ to $|s_j\rangle$ while emitting output y with probability $T_{ij}^{y\mid x}$. Thus, the quantum transducer operates statistically indistinguishably from its classical counterpart, and must be correct. To simulate the future behavior of the input–output process upon future input $\overrightarrow{x}$, the quantum transducer iterates through the above process by sequential application of $\mathcal{M}_{x^{(0)}}$, $\mathcal{M}_{x^{(1)}}$, etc. This establishes that the transducer is a valid quantum model.
Proof of reduced complexity and generality
Given an arbitrary input–output process whose ε-transducer and quantum transducer have respective input-dependent complexities C_{X} and Q_{X}, we show that

Reduced Complexity: Q_{X} < C_{X} for all non-pathological input processes $\overleftrightarrow{X}$, whenever the ε-transducer is step-wise inefficient.

Generality: Given an input–output process, either Q_{X} < C_{X} for all non-pathological input processes $\overleftrightarrow{X}$, or there exists no physically realizable model that does this.
Our strategy makes use of the following theorem.
Theorem 1. For any input–output process the following statements are equivalent:
(I) $\langle s_i\mid s_j\rangle=0$ for any $|s_i\rangle,|s_j\rangle\in\{|s_i\rangle\}_i$ where i≠j.
(II) For any pair $(s_i,s_j)\in\mathcal{S}\times\mathcal{S}$ where i≠j, $\exists x\in\mathcal{X}$ such that $\forall y\in\mathcal{Y}, s_k\in\mathcal{S}$, the product $T_{ik}^{y\mid x}T_{jk}^{y\mid x}=0$.
(III) For any pair $(s_i,s_j)\in\mathcal{S}\times\mathcal{S}$ where i≠j, there exists an input strategy F such that $D[P_{s_i,F},P_{s_j,F}]=1$.
(IV) For any input process $\overleftrightarrow{X}$, any physically realizable (quantum) model must have complexity at least C_{X}.
To prove this theorem, we show that A. (I) is equivalent to (II), B. (II) implies (III), C. (III) implies (IV), and D. (IV) implies (I).
Proof of A. We prove this by showing (I) is false iff (II) is false. First, assume (I) is false, such that there exist $|s_i\rangle,|s_j\rangle\in\{|s_i\rangle\}_i$ with i≠j and $\langle s_i\mid s_j\rangle=\prod_{x\in\mathcal{X}}\left(\sum_{y,k}\sqrt{T_{ik}^{y\mid x}T_{jk}^{y\mid x}}\right)>0$. This implies that for all $x\in\mathcal{X}$ there exist $s_k\in\mathcal{S}, y\in\mathcal{Y}$ such that $T_{ik}^{y\mid x}T_{jk}^{y\mid x}\ne 0$. Thus (II) is false. Conversely, assume (II) is false, i.e., there exist s_{i} and s_{j} such that $\forall x$ we can find $y_x\in\mathcal{Y}, s_{k_x}\in\mathcal{S}$ for which $T_{ik_x}^{y_x\mid x}T_{jk_x}^{y_x\mid x}>0$. Since every term in each sum is non-negative, it follows that $\langle s_i\mid s_j\rangle\ge\prod_x\sqrt{T_{ik_x}^{y_x\mid x}T_{jk_x}^{y_x\mid x}}>0$ and (I) is false.
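The equivalence of (I) and (II) can be checked numerically on small examples. The sketch below assumes the transition tensor is stored as a nested dictionary `T[x][i]` mapping `(y, k)` to $T_{ik}^{y\mid x}$. Both toy tensors are hypothetical and serve only to exercise the two cases.

```python
import math

# Hypothetical toy transition tensors: T[x][i] maps (y, k) -> T_ik^{y|x}.
T_OVERLAPPING = {
    0: {0: {(0, 0): 0.5, (1, 1): 0.5}, 1: {(0, 0): 0.2, (1, 1): 0.8}},
    1: {0: {(0, 1): 1.0}, 1: {(0, 1): 0.5, (1, 0): 0.5}},
}
T_ORTHOGONAL = {
    0: {0: {(0, 0): 0.5, (1, 1): 0.5}, 1: {(0, 0): 0.2, (1, 1): 0.8}},
    1: {0: {(0, 1): 1.0}, 1: {(1, 0): 1.0}},
}


def overlap(T, i, j):
    """<s_i|s_j> = prod_x sum_{y,k} sqrt(T_ik^{y|x} * T_jk^{y|x})."""
    result = 1.0
    for x in T:
        result *= sum(
            math.sqrt(p * T[x][j].get(yk, 0.0)) for yk, p in T[x][i].items()
        )
    return result


def has_separating_input(T, i, j):
    """Condition (II) for the pair (i, j): some x makes every product vanish."""
    return any(
        all(p * T[x][j].get(yk, 0.0) == 0.0 for yk, p in T[x][i].items())
        for x in T
    )


for T in (T_OVERLAPPING, T_ORTHOGONAL):
    # (I) holds for the pair exactly when (II) holds for the pair.
    assert (overlap(T, 0, 1) == 0.0) == has_separating_input(T, 0, 1)
```

In the first tensor no input separates the two states and the overlap is strictly positive; in the second, input 1 separates them and the overlap vanishes.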
Proof of B. To prove this, we introduce the update function $g:\mathcal{S}\times\mathcal{X}\times\mathcal{Y}\to\mathcal{S}$, such that g(s_{i},x,y) = s_{k} iff $T_{ik}^{y\mid x}\ne 0$. Note that g is always a function by joint unifilarity of the ε-transducer.^{3} That is, the triple (s^{(t−1)},x^{(t)},y^{(t)}) uniquely determines s^{(t)}.
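A minimal sketch of how g could be tabulated from the transition tensor, with an explicit check of joint unifilarity. The nested-dictionary encoding `T[x][i]` (mapping `(y, k)` to $T_{ik}^{y\mid x}$) and the two-state example are assumptions made for illustration.

```python
def build_update_function(T):
    """Return g as a dict {(i, x, y): k}; raise if unifilarity fails.

    T[x][i] maps (y, k) -> T_ik^{y|x} (a hypothetical encoding of the tensor).
    """
    g = {}
    for x, rows in T.items():
        for i, row in rows.items():
            for (y, k), prob in row.items():
                if prob == 0.0:
                    continue
                # A second successor for the same (i, x, y) breaks unifilarity.
                if g.setdefault((i, x, y), k) != k:
                    raise ValueError(f"not unifilar at state {i}, x={x}, y={y}")
    return g


# Hypothetical two-state example.
T = {
    0: {0: {(0, 0): 0.5, (1, 1): 0.5}, 1: {(0, 0): 0.2, (1, 1): 0.8}},
    1: {0: {(0, 1): 1.0}, 1: {(0, 1): 0.5, (1, 0): 0.5}},
}
g = build_update_function(T)
```

Triples (s_{i}, x, y) that occur with zero probability are simply absent from the table, which is what scenario (a) in the game below exploits.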
We also use the following game to elucidate the proof. At time t = −1, Alice initializes an ε-transducer in either s_{i} or s_{j} and seals it inside a black box. Alice gives this box to Bob and challenges him to infer whether S^{(−1)} = s_{i} or S^{(−1)} = s_{j}, based purely on the transducer’s future black-box behavior. We first prove that if (II) is true, then for each pair (s_{i},s_{j}) there exists an input strategy F_{ij} that allows Bob to discriminate S^{(−1)} = s_{i} from S^{(−1)} = s_{j} to arbitrary accuracy purely from the transducer’s output behavior.
Specifically, (II) implies that for all pairs $(s_i,s_j)\in\mathcal{S}\times\mathcal{S}$ with i≠j, there exists some x_{ij} such that $\forall y\in\mathcal{Y}, s_k\in\mathcal{S}$, $T_{ik}^{y\mid x_{ij}}T_{jk}^{y\mid x_{ij}}=0$. At t = 0, Bob inputs X^{(0)} = x_{ij}. Let y = Y^{(0)} be the corresponding output. Note that because we have observed y = Y^{(0)}, there must have been some nonzero probability of observing y on input x_{ij}, i.e., at least one of (i) P[Y^{(0)} = y | X^{(0)} = x_{ij}, S^{(−1)} = s_{i}] ≠ 0 or (ii) P[Y^{(0)} = y | X^{(0)} = x_{ij}, S^{(−1)} = s_{j}] ≠ 0 must be true. This presents two different possible scenarios:

(a) Only one of (i) and (ii) is true. That is, one of s_{i} or s_{j} never outputs y upon input x_{ij}, such that either P[Y^{(0)} = y | X^{(0)} = x_{ij}, S^{(−1)} = s_{i}] = 0 or P[Y^{(0)} = y | X^{(0)} = x_{ij}, S^{(−1)} = s_{j}] = 0.

(b) Both (i) and (ii) are true, implying g(s_{i}, x_{ij}, y) ≠ g(s_{j}, x_{ij}, y).
If (a) occurs, then Bob can immediately determine whether S^{(−1)} = s_{i} or S^{(−1)} = s_{j} and we are done. If (b) occurs, let s_{i′} = g(s_{i}, x_{ij}, y) be the new causal state if S^{(−1)} = s_{i}, and let s_{j′} = g(s_{j}, x_{ij}, y) be the new causal state if S^{(−1)} = s_{j}. Due to joint unifilarity of the ε-transducer, Bob is able to uniquely determine s_{i′} and s_{j′} upon observation of z^{(0)} = (x_{ij}, y). Given s_{i′} and s_{j′}, (II) implies Bob can find x_{ij}′ such that $\forall y\in\mathcal{Y}, s_k\in\mathcal{S}$ the product $T_{i'k}^{y\mid x_{ij}'}T_{j'k}^{y\mid x_{ij}'}=0$. Thus, we can repeat the steps above, choosing x^{(1)} = x_{ij}′. Iterating this procedure defines an input strategy F_{ij}, which determines each input x^{(t)} as a function of observed inputs and outputs. At each point in time t, Bob will be able to identify some s_{i}^{(t)} which is the current causal state if S^{(−1)} = s_{i}, and some s_{j}^{(t)} which is the current causal state if S^{(−1)} = s_{j}.
Eventually, either scenario (a) will occur, allowing Bob to perfectly rule out one of S^{(−1)} = s_{i} or S^{(−1)} = s_{j}, or, in the limit of an infinite number of time steps, Bob can synchronize the ε-transducer based on the observed inputs and outputs (that is, the causal state at time t is entirely determined by observation of the past in the limit of large t^{1, 3}). In the latter case, Bob can determine s^{(t)} in the limit as t→∞, allowing inference of whether S^{(−1)} = s_{i} or S^{(−1)} = s_{j}. This constitutes an explicit input strategy F_{ij} that allows Bob to discriminate between S^{(−1)} = s_{i} and S^{(−1)} = s_{j} to any arbitrary accuracy. Accordingly, $D[P_{s_i,F_{ij}},P_{s_j,F_{ij}}]=1$.
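Bob’s procedure can be sketched in Python under simplifying assumptions. The transition tensor is the hypothetical nested dictionary `T[x][i]` mapping `(y, k)` to $T_{ik}^{y\mid x}$, the black box is simulated locally from a hidden state, and the search stops after `max_steps` rather than taking the t→∞ limit, so the synchronization argument is not reproduced. The three-state tensor `T3` is a toy example that passes through scenario (b) once before scenario (a) occurs.

```python
import random


def separating_input(T, i, j):
    """Return an input x with T_ik^{y|x} * T_jk^{y|x} = 0 for all (y, k)."""
    for x in T:
        if all(p * T[x][j].get(yk, 0.0) == 0.0 for yk, p in T[x][i].items()):
            return x
    return None


def step(T, state, x, rng):
    """Simulate the black box: sample (output, next state) from T[x][state]."""
    outcomes = list(T[x][state].items())
    r, acc = rng.random(), 0.0
    for (y, k), p in outcomes:
        acc += p
        if r < acc:
            return y, k
    return outcomes[-1][0]  # guard against floating-point round-off


def successor(T, state, x, y):
    """Update function g(state, x, y); None if y is impossible from state."""
    for (yy, k), p in T[x][state].items():
        if yy == y and p > 0.0:
            return k
    return None


def discriminate(T, i, j, true_start, rng, max_steps=50):
    """Infer whether the box started in i or j; None if still undecided."""
    hidden, ci, cj = true_start, i, j
    for _ in range(max_steps):
        x = separating_input(T, ci, cj)
        if x is None:
            return None
        y, hidden = step(T, hidden, x, rng)
        ni, nj = successor(T, ci, x, y), successor(T, cj, x, y)
        if ni is None:   # scenario (a): y is impossible if the start was i
            return j
        if nj is None:   # scenario (a): y is impossible if the start was j
            return i
        ci, cj = ni, nj  # scenario (b): track both candidate causal states
    return None


# Hypothetical three-state tensor; input 0 resets every state identically.
T3 = {
    0: {s: {(0, 0): 0.5, (1, 0): 0.5} for s in range(3)},
    1: {0: {(0, 1): 1.0}, 1: {(0, 2): 1.0}, 2: {(1, 0): 1.0}},
}
verdict_0 = discriminate(T3, 0, 1, true_start=0, rng=random.Random(0))
verdict_1 = discriminate(T3, 0, 1, true_start=1, rng=random.Random(0))
```

In this toy case the first input leads to scenario (b) for both possible starting states, and the second input resolves the question through scenario (a).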
Proof of C. We prove this via its contrapositive. That is, suppose (IV) is false, such that there exists a quantum model $\mathcal{Q}'$ with identical input–output relations to the process’s ε-transducer, which stores $Q_X[\mathcal{Q}']<C_X$ for some $\overleftrightarrow{X}$. We show that if (III) is true then the data processing inequality is violated.^{16}
We first make use of the following observation: If a model $\mathcal{Q}'=(\aleph',\Omega',\mathbb{M}')$ satisfies $\aleph'(\overleftarrow{z})\ne\aleph'(\overleftarrow{z}')$ for some $\overleftarrow{z}\sim_{\epsilon}\overleftarrow{z}'$, then we can always construct an alternative model $\mathcal{Q}=(\aleph,\Omega,\mathbb{M})$ such that $Q_X[\mathcal{Q}]\le Q_X[\mathcal{Q}']$ for all input processes $\overleftrightarrow{X}$, and $\aleph(\overleftarrow{z})=\aleph(\overleftarrow{z}')$ iff $\epsilon(\overleftarrow{z})=\epsilon(\overleftarrow{z}')$ for all $\overleftarrow{z},\overleftarrow{z}'\in\overleftarrow{\mathcal{Z}}$ (this is a consequence of the concavity of entropy; see methods in ref. 32). That is, for any model $\mathcal{Q}'$ with quantum states Ω′ not in 1-1 correspondence with classical causal states, there always exists a simpler model $\mathcal{Q}$ whose quantum states are in 1-1 correspondence with the causal states.
Thus, falsehood of (IV) implies there must exist some quantum model $\mathcal{Q}=(\aleph,\Omega,\mathbb{M})$ such that (i) $\aleph(\overleftarrow{z})=\aleph(\overleftarrow{z}')$ if and only if $\overleftarrow{z}\sim_{\epsilon}\overleftarrow{z}'$ and (ii) $Q_X[\mathcal{Q}]<C_X$ for some $\overleftrightarrow{X}$. Now by virtue of (ii), there must exist two states ρ_{i}, ρ_{j} ∈ Ω such that the trace distance
$$D[\rho_i,\rho_j]<1.$$
The data processing inequality therefore implies that any quantum operation $\mathcal{M}_{\overrightarrow{x}}:\Xi\to\overrightarrow{\mathcal{Y}}$ that generates future output statistics must satisfy $D[\mathcal{M}_{\overrightarrow{x}}(\rho_i),\mathcal{M}_{\overrightarrow{x}}(\rho_j)]\le D[\rho_i,\rho_j]<1$.
However, all models of the same input–output process have identical black-box behavior. In particular, the ε-transducer of the input–output process that $\mathcal{Q}$ models must behave identically to $\mathcal{Q}$. As such, there exist two causal states of the classical ε-transducer, $s_i,s_j\in\mathcal{S}$, such that $D[P_{s_i,F},P_{s_j,F}]=D[\mathcal{M}_{\overrightarrow{x}_F}(\rho_i),\mathcal{M}_{\overrightarrow{x}_F}(\rho_j)]<1$ for all possible input strategies F. This implies (III) is false. Thus we have shown, by contraposition, that (III) implies (IV).
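The contractivity used in this argument has a simple classical analogue, sketched below: pushing two distributions through the same stochastic map cannot increase their trace distance. This illustrates the data processing inequality for classical distributions only, not the quantum statement used in the proof. The distributions `P`, `Q` and the map `readout` are hypothetical.

```python
def trace_distance(P, Q):
    """D[P, Q] = 1/2 * sum_a |P[a] - Q[a]| for distributions given as dicts."""
    support = set(P) | set(Q)
    return 0.5 * sum(abs(P.get(a, 0.0) - Q.get(a, 0.0)) for a in support)


def apply_channel(P, channel):
    """Push distribution P through a stochastic map channel[a][b] = P(b | a)."""
    out = {}
    for a, pa in P.items():
        for b, pb in channel[a].items():
            out[b] = out.get(b, 0.0) + pa * pb
    return out


# Hypothetical memory distributions and readout map.
P = {"s0": 0.9, "s1": 0.1}
Q = {"s0": 0.4, "s1": 0.6}
readout = {"s0": {"y0": 0.8, "y1": 0.2}, "s1": {"y0": 0.3, "y1": 0.7}}

before = trace_distance(P, Q)
after = trace_distance(apply_channel(P, readout), apply_channel(Q, readout))
```

If the distance between two memory states is already below 1, no readout can make the resulting output statistics perfectly distinguishable, which is the step that contradicts (III).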
Proof of D. The quantum transducer is a physically realizable model. Thus (IV) implies that Q_{X} ≥ C_{X} for all $\overleftrightarrow{X}$. However, we note from our construction that Q_{X} ≤ C_{X} for all $\overleftrightarrow{X}$. Therefore, Q_{X} = C_{X}. Since the causal states of the quantum transducer are all pure, all $|s_i\rangle$ are mutually orthogonal.^{16}
Proof of Main Result. Reduced complexity and generality are consequences of the above theorem. Specifically, given a particular input–output process, falsehood of (II) implies that its transducer is step-wise inefficient. Meanwhile, falsehood of (I) implies Q_{X} < C_{X} for all non-pathological $\overleftrightarrow{X}$. Thus reduced complexity is implied by equivalence of (I) and (II). Generality is proven by contradiction. Assume that for some non-pathological $\overleftrightarrow{X}$, quantum transducers yield no improvement (i.e., Q_{X} = C_{X}) but some other physically realizable model has complexity less than C_{X}. The former implies (I) is true, the latter implies (IV) is false, which violates the theorem. Thus, both reduced complexity and generality must hold.
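The gap between Q_{X} and C_{X} can be made concrete for a two-state example. The sketch below assumes the stationary probabilities of the two causal states, (p, 1 − p), are given, and uses the closed-form eigenvalues of a mixture of two pure states with overlap |⟨s_0|s_1⟩|. The numerical values are hypothetical.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits, skipping zero-probability entries."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def two_state_quantum_entropy(p, overlap):
    """von Neumann entropy of rho = p|s0><s0| + (1-p)|s1><s1|,
    where overlap = |<s0|s1>|; the eigenvalues are (1 +/- disc) / 2."""
    disc = math.sqrt(max(0.0, 1.0 - 4.0 * p * (1.0 - p) * (1.0 - overlap ** 2)))
    return shannon_entropy([(1.0 + disc) / 2.0, (1.0 - disc) / 2.0])


p = 0.5  # hypothetical stationary probability of the first causal state
C_X = shannon_entropy([p, 1.0 - p])
Q_overlapping = two_state_quantum_entropy(p, overlap=math.sqrt(0.45))
Q_orthogonal = two_state_quantum_entropy(p, overlap=0.0)
```

With orthogonal states the two entropies coincide, while any nonzero overlap gives a strictly smaller quantum entropy whenever both states occur with nonzero probability.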
References
 1.
Shalizi, C. R. & Crutchfield, J. P. Computational mechanics: Pattern and prediction, structure and simplicity. J. Stat. Phys. 104, 817–879 (2001).
 2.
Crutchfield, J. P. & Young, K. Inferring statistical complexity. Phys. Rev. Lett. 63, 105 (1989).
 3.
Barnett, N. & Crutchfield, J. P. Computational mechanics of input–output processes: Structured transformations and the ε-transducer. J. Stat. Phys. 161, 404–451 (2015).
 4.
Meddis, R. Simulation of auditory-neural transduction: Further studies. J. Acoust. Soc. Am. 83, 1056–1063 (1988).
 5.
Rieke, F. Spikes: Exploring the Neural Code (MIT press, 1999).
 6.
Tishby, N. & Polani, D. In Perception-Action Cycle, 601–636 (Springer, 2011).
 7.
Gordon, G. et al. Toward an integrated approach to perception and action: conference report and future directions. Front. System Neurosci. 5, 20 (2011).
 8.
Gu, M., Wiesner, K., Rieper, E. & Vedral, V. Quantum mechanics can reduce the complexity of classical models. Nat. Commun. 3, 762 (2012).
 9.
Mahoney, J. R., Aghamohammadi, C. & Crutchfield, J. P. Occam’s quantum strop: Synchronizing and compressing classical cryptic processes via a quantum channel. Sci. Rep. 6, 20495 (2016).
 10.
Tan, R., Terno, D. R., Thompson, J., Vedral, V. & Gu, M. Towards quantifying complexity with quantum mechanics. Eur. Phys. J. Plus. 129, 1–12 (2014).
 11.
Kallenberg, O. Foundations of Modern Probability (Springer Science & Business Media, 2006).
 12.
Tiňo, P. & Köteles, M. Extracting finite-state representations from recurrent neural networks trained on chaotic symbolic sequences. IEEE T Neural Networ 10, 284–302 (1999).
 13.
Larrondo, H., González, C., Martin, M., Plastino, A. & Rosso, O. Intensive statistical complexity measure of pseudorandom number generators. Phys. A. 356, 133–138 (2005).
 14.
Gonçalves, W., Pinto, R., Sartorelli, J. & de Oliveira, M. Inferring statistical complexity in the dripping faucet experiment. Phys. A. 257, 385–389 (1998).
 15.
Park, J. B., Lee, J. W., Yang, J.-S., Jo, H.-H. & Moon, H.-T. Complexity analysis of the stock market. Phys. A. 379, 179–187 (2007).
 16.
Nielsen, M. A. & Chuang, I. L. Quantum Computation and Quantum Information (Cambridge University Press, 2010).
 17.
Perry, C., Jain, R. & Oppenheim, J. Communication tasks with infinite quantum-classical separation. Phys. Rev. Lett. 115, 030504 (2015).
 18.
Ahrens, J., Badziąg, P., Cabello, A. & Bourennane, M. Experimental device-independent tests of classical and quantum dimensions. Nat. Phys. 8, 592–595 (2012).
 19.
Gallego, R., Brunner, N., Hadley, C. & Acín, A. Device-independent tests of classical and quantum dimensions. Phys. Rev. Lett. 105, 230501 (2010).
 20.
Kleinmann, M., Gühne, O., Portillo, J. R., Larsson, J.-Å. & Cabello, A. Memory cost of quantum contextuality. New J. Phys. 13, 113011 (2011).
 21.
Dale, H., Jennings, D. & Rudolph, T. Provable quantum advantage in randomness processing. Nat. Commun. 6, 8203 (2015).
 22.
Yuan, X. et al. Experimental quantum randomness processing using superconducting qubits. Phys. Rev. Lett. 117, 010502 (2016).
 23.
Modi, K., Brodutch, A., Cable, H., Paterek, T. & Vedral, V. The classical-quantum boundary for correlations: discord and related measures. Rev. Mod. Phys. 84, 1655 (2012).
 24.
Crutchfield, J. P. & Görnerup, O. Objects that make objects: the population dynamics of structural complexity. J. R. Soc. Interface 3, 345–349 (2006).
 25.
Hanson, J. E. & Crutchfield, J. P. The attractor-basin portrait of a cellular automaton. J. Stat. Phys. 66, 1415–1462 (1992).
 26.
Shalizi, C. R. Causal Architecture, Complexity and Self-Organization in the Time Series and Cellular Automata. Ph.D. thesis, Univ. Wisconsin-Madison (2001).
 27.
Crutchfield, J. P. & Whalen, S. Structural drift: The population dynamics of sequential learning. PLoS Comput. Biol. 8, e1002510 (2012).
 28.
Wiesner, K., Gu, M., Rieper, E. & Vedral, V. Information-theoretic lower bound on energy cost of stochastic computation. In Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, vol. 468, 4058–4066 (The Royal Society, 2012).
 29.
Garner, A. J. P., Thompson, J., Vedral, V. & Gu, M. When is simpler thermodynamically better? [arXiv:1510.00010] (2015).
 30.
Cabello, A., Gu, M., Gühne, O., Larsson, J.-Å. & Wiesner, K. Thermodynamical cost of some interpretations of quantum theory. Phys. Rev. A 94, 052127 (2016).
 31.
Noble, B. & Daniel, J. W. Applied Linear Algebra, Vol. 3 (Prentice-Hall, New Jersey, USA, 1988).
 32.
Suen, W. Y., Thompson, J., Garner, A. J. P., Vedral, V. & Gu, M. The classical-quantum divergence of complexity in the Ising spin chain. [arXiv:1511.05738] (2015).
Acknowledgements
We thank Suen Whei Yeap, Blake Pollard, Liu Qing, and Yang Chengran for their input and helpful discussions. This work was funded by the John Templeton Foundation Grant 53914 ‘Occam’s Quantum Mechanical Razor: Can Quantum theory admit the Simplest Understanding of Reality?’; the Oxford Martin School; the Ministry of Education in Singapore, the Academic Research Fund Tier 3 MOE2012T31009; the Foundational Questions Institute Grant ‘Observer-dependent complexity: The quantum-classical divergence over “what is complex?”’; and the National Research Foundation of Singapore, in particular NRF Award No. NRF–NRFF2016–02.
Author contributions
All authors conceptualized the project and developed the examples. J.T. worked through the detailed calculations. J.T. and M.G. wrote the manuscript. M.G. led the project.
Competing interests
The authors declare that they have no competing interests.
Author information
Affiliations
Centre for Quantum Technologies, National University of Singapore, 3 Science Drive 2, Singapore, 117543, Singapore
 Jayne Thompson
 , Andrew J. P. Garner
 , Vlatko Vedral
 & Mile Gu
Atomic and Laser Physics, University of Oxford, Clarendon Laboratory, Parks Road, Oxford, OX1 3PU, UK
 Vlatko Vedral
Department of Physics, National University of Singapore, 3 Science Drive 2, Singapore, 117543, Singapore
 Vlatko Vedral
School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, 639673, Singapore
 Mile Gu
Complexity Institute, Nanyang Technological University, Singapore, 639673, Singapore
 Mile Gu
Corresponding authors
Correspondence to Jayne Thompson or Mile Gu.
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/