A space–time tradeoff for implementing a function with master equation dynamics

Master equations are commonly used to model the dynamics of physical systems, including systems that implement single-valued functions like a computer’s update step. However, many such functions cannot be implemented by any master equation, even approximately, which raises the question of how they can occur in the real world. Here we show how any function over some “visible” states can be implemented with master equation dynamics—if the dynamics exploits additional, “hidden” states at intermediate times. We also show that any master equation implementing a function can be decomposed into a sequence of “hidden” timesteps, demarcated by changes in what state-to-state transitions have nonzero probability. In many real-world situations there is a cost both for more hidden states and for more hidden timesteps. Accordingly, we derive a “space–time” tradeoff between the number of hidden states and the number of hidden timesteps needed to implement any given function.


I. INTRODUCTION
Many problems in science and engineering involve understanding how a physical system can implement a given map taking its initial, "input" state to its "output" state at some fixed later time. Often such a map is represented by some stochastic matrix M over the state space X of our system. For example, M may be a conditional distribution that governs the evolution of some naturally occurring system between two particular moments, and we wish to understand what underlying physical process could result in that distribution. Alternatively, M might be the update function of a system we wish to design, e.g., it could be the rule updating the logical state of a digital computer, and we wish to understand how to physically construct such a system.
In this paper we consider how the characteristics of M restrict the possible physical systems that could implement it. As is conventional in many fields of science (including in particular much of stochastic thermodynamics [1][2][3][4][5][6]), we restrict attention to physical systems with master equation dynamics, i.e., that evolve according to time-inhomogeneous continuous-time Markov chain (CTMC) dynamics over a discrete state space.
The problem of constructing a CTMC to implement a given stochastic matrix is called the embedding problem in the mathematics literature [7][8][9]. Perhaps surprisingly, not every matrix M can be implemented with a CTMC. For example, it is known that any M that can be implemented with a CTMC must obey [7,8,10]

∏_i M_ii ≥ det M > 0.   (1)

* Massachusetts Institute of Technology; Arizona State University

This restriction holds no matter how much time we allow the CTMC to run, and no matter whether the CTMC is reversible (i.e., obeys detailed balance) or not. In the context of thermodynamics, this restriction holds for both closed and driven physical systems, and irrespective of how much work is done or how much heat is dissipated as the system evolves.
Note that any single-valued function f (other than the identity), when expressed as a stochastic matrix M, has ∏_i M_ii = 0, thus failing the conditions of Eq. (1). Therefore no single-valued function can be exactly implemented with a CTMC. At the same time, CTMCs are often used to analyze physical systems implementing single-valued functions. In particular, the bit erasure function, represented by the stochastic matrix M = (1 1; 0 0) (rows separated by a semicolon), has been the subject of numerous studies in stochastic thermodynamics, in which the state of the bit (which may be a coarse-graining of an underlying phase space) undergoes continuous-time Markovian dynamics [11][12][13][14][15][16][17].
A partial resolution of this paradox is that some stochastic matrices can be approximated arbitrarily accurately as the limit of CTMCs that take an arbitrarily long time and/or have a rate matrix with arbitrarily large entries, called the quasistatic limit. Stochastic matrices that can be accurately approximated in the quasistatic limit have determinant 0, and so are infinitesimally close to satisfying the conditions of Eq. (1). In fact, all logically non-invertible single-valued functions, including bit erasure, have determinant 0 and, as we demonstrate formally below, can be modeled this way.
On the other hand, for logically invertible functions M (except the identity map), the product of the diagonals ∏_i M_ii is equal to 0, while the determinant is either 1 or −1. Thus, such maps are not infinitesimally close to satisfying the conditions of Eq. (1), and such functions cannot be embedded even in the quasistatic limit of CTMCs, as we formally establish below. As an example, the bit flip function, represented by the matrix M = (0 1; 1 0), cannot be implemented, even approximately, by any CTMC over a two-state system.
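The necessary condition of Eq. (1) is easy to check numerically for small matrices. A minimal sketch in plain Python; the helper name `embeddable_necessary` is ours, not from the paper:

```python
import math

def embeddable_necessary(M):
    """Check the necessary condition of Eq. (1): prod_i M_ii >= det M > 0.

    M is given as a list of rows; the condition itself does not depend on
    whether rows or columns are normalized.
    """
    n = len(M)

    def det(A):
        # cofactor expansion along the first row (fine for tiny matrices)
        if len(A) == 1:
            return A[0][0]
        return sum((-1) ** j * A[0][j]
                   * det([row[:j] + row[j + 1:] for row in A[1:]])
                   for j in range(len(A)))

    diag_prod = math.prod(M[i][i] for i in range(n))
    return diag_prod >= det(M) > 0

# identity: diagonal product 1 >= det 1 > 0, so the condition holds
assert embeddable_necessary([[1, 0], [0, 1]])
# bit erasure: det = 0, so "det M > 0" fails
assert not embeddable_necessary([[1, 1], [0, 0]])
# bit flip: diagonal product 0, det = -1, fails as well
assert not embeddable_necessary([[0, 1], [1, 0]])
```

A strictly positive example such as a lazy random walk passes the test, consistent with Eq. (1) being only a necessary condition for embeddability.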
How is it possible, then, that logically invertible functions are performed by actual physical systems whose dynamics can be accurately modeled as CTMCs? The work in [18,19] showed that any map M over state space X can in fact be implemented with a quasistatic CTMC, if the process has access to a set of hidden, "auxiliary" states, in addition to the visible states X (see also [20], where similar results are derived under coarse-graining). If M is a single-valued function, then it is implemented using a sequence of quasistatic protocols that carry out many-to-one functions over an augmented space Y that includes both X and the hidden states. As a result of the entire sequence, the map M, which may be logically reversible or not, is implemented over the subspace X, while the map over the entire space Y is by construction logically irreversible. Note that repeating such a CTMC will iteratively apply the function M to the states in X. As one consequence, this analysis can be used to model the update function of a conventional digital computer, which is repeatedly applied to the computer's logical state until the computation ends.
The minimal number of hidden states needed to implement a given stochastic matrix M depends on the details of that matrix. As mentioned above, some functions, like bit erasure, can be implemented without any hidden states. The minimal number of hidden states needed to implement a given stochastic matrix M can be viewed as a new kind of cost of implementing M , not previously considered in the literature. We refer to this as the state space cost (or sometimes simply as the space cost) of M . The earlier work on physical implementations of arbitrary functions [18][19][20] did not investigate the details of this cost, focusing instead on issues related to thermodynamic reversibility. In a companion paper [21], we investigate this cost for arbitrary stochastic matrices (not just single-valued functions), and derive bounds that relate the details of the matrix M to the minimal number of hidden states needed by any CTMC that implements M . Importantly, the results in that companion paper are universal, applying to all systems governed by continuous-time Markov dynamics, not just the precise systems analyzed in [18][19][20].
In addition to exploiting a set of hidden states to implement a given matrix M, the constructions in these earlier works used a sequence of distinct "steps" to implement M. In each of those steps, some degrees of freedom of the physical system are allowed to dynamically evolve while others are held fixed. The sets of frozen and changing degrees of freedom vary from step to step. Thus, each step is characterized by a different set of allowed transitions that the system can make. We refer to such steps as the hidden timesteps of the CTMC.
Everything else being equal, it is often more difficult to implement CTMCs which have a large number of hidden timesteps. For example, suppose that an engineer implements a given map using a physical system that is connected to a heat bath and undergoes a Hamiltonian driving protocol, and evolves according to CTMC dynamics [1,22,23]. Additional hidden timesteps can arise when one of two things happen: either some state of the physical system goes from having a finite energy to an infinite energy (or vice versa), or some transition rate between two states goes from zero to non-zero (or vice versa). These changes often involve the introduction or removal of infinite-sized energy barriers, either in the state space of the CTMC or some "underlying" state space. 1 The engineer implementing the map will often want to minimize the number of times that infinite-sized energy barriers must be inserted or removed, since such transformations can be difficult to achieve. Accordingly, the minimal number of hidden timesteps needed by any CTMC to implement a given stochastic matrix M can be seen as another cost of implementing M, in addition to the cost of the number of hidden states required to implement M. We refer to this as the timestep cost of M.
Our definition of timestep cost is similar in spirit to some work in control theory, in which the cost of implementing a given control protocol is quantified by the number of (some type of) elementary operations needed to implement the protocol. For instance, the difficulty of carrying out some control protocol which implements a stochastic matrix M is sometimes quantified by counting the number of Poisson matrices (which correspond to extremal control actions) whose product equals M [28][29][30]. Surprisingly, as we prove below, our more physically motivated timestep cost (i.e., the minimal number of hidden timesteps needed to implement M ) can also be evaluated by counting the number of elementary matrices whose product is M . However in our case, the elementary matrices are idempotent matrices rather than Poisson matrices.
In this paper, we analyze state space costs and timestep costs, as well as how they affect one another, for single-valued functions. In Section IV, we show that any non-invertible function can be implemented with no hidden states, while any invertible function (other than the identity) requires exactly one hidden state. However, implementing functions using the minimal possible number of hidden states can impose a very large timestep cost. We show that there is a tradeoff between state space and timestep costs, meaning that timestep cost can be reduced if additional hidden states are available. We also show that the precise nature of the tradeoff between these costs depends on the details of the implemented function.
This tradeoff is analogous to the tradeoff of space and time costs considered in computer science. As an example, consider a digital circuit, implemented using some physical system that obeys master equation dynamics. The visible states of the circuit are the possible joint values of the inputs and outputs of all the gates in the circuit, and each visible timestep can be taken to be an iteration of the update function f which specifies how the circuit updates that visible state. The number of "visible" timesteps needed for the circuit to implement some given overall Boolean function g (i.e., the number of times f must be iterated to implement g) is the depth of the circuit. Circuit complexity theory is concerned with the tradeoff between the number of visible states and the number of visible timesteps needed for the best possible circuit to implement such a given g [31,32].
Our results, on the other hand, specify the "hidden" requirements of implementing each iteration of f . For example, they specify the minimal number of hidden states and hidden timesteps that any system implementing a desired digital circuit needs to implement a single pass through that circuit. On a smaller scale, our results also apply individually to each gate in such a circuit, if we redefine the visible states to be the possible joint values of a gate's input and output bits. In this case, they give the minimal number of hidden timesteps and the minimal number of hidden states needed to operate the gate, and specify the space/time tradeoff between these two numbers. At the other extreme, our analysis also applies to any underlying physical system that implements a single iteration of a set of multiple circuits coupled together to form an entire digital computer. Our results even apply to the (partial) function that specifies transitions from a digital computer's initial state to its final state that is given by repeating such passes through the entire computer until it halts [33,34]. In this sense, our results give the minimal "hidden computer" of any physical system that implements a given "visible computer", for visible computers that range from a single gate all the way up to the partial function given by running an entire (finite) digital computer until it halts.
It is now widely known that there are fundamental constraints on physical systems that perform information processing. Probably the most famous such constraint is Landauer's principle [20,23,[35][36][37][38][39], which gives the minimum amount of heat that must be generated by any physical system that implements a given map. The results in this paper, as well as those in the companion manuscript [21], derive novel constraints on information processing in terms of state space and timestep resources, as well as novel tradeoffs between such resources. Though these results are physically motivated, in particular by the usual derivations of master equation dynamics in stochastic thermodynamics, they are independent from more "traditional" thermodynamic bounds on heat and work. In fact, the results derived in this paper are independent of considerations like whether local detailed balance holds, how much entropy is produced, how many heat baths (if any) the system is connected to, etc. At the same time, it is likely that future work may uncover additional tradeoffs between requirements of space, timesteps, and thermodynamic resources like heat and work.

A. Roadmap and nomenclature
In Section II we start by formally defining concepts, including hidden states and hidden timesteps, that will be central to our analysis and establishing some of their properties. After this, we restrict attention to stochastic matrices M that are single-valued maps. We sometimes refer to such matrices as functions, with the term "single-valued" implicit. We make a connection between timestep costs and idempotent functions in Section III, and then derive results about timestep costs, and their tradeoffs with state space costs, in Section IV. Section V provides some discussion of various subtleties and implications of our results. We end with a summary and suggestions for future work in Section VI.
In this paper we focus on CTMCs over a finite state space X. We will sometimes use variants of a lower case x to indicate elements of X (e.g., x_0 is the state of X at time t = 0). For visual clarity we often use subscripts i, j, . . . to indicate generic elements of X. Given a rate matrix R, we write the rate for going from x ∈ X to x′ ∈ X as R_x′x. We write the matrix of transition probabilities of a given CTMC going from time t to time t′ ≥ t as a function of those times, e.g., T(t, t′), and use T(t, t′)_ij to indicate the particular transition probability from state j at time t to state i at time t′. For a CTMC with time-independent rate matrix R, T(t, t′) = e^{(t′−t)R}. We adopt the convention that if M is a stochastic matrix then ∑_i M_ij = 1, meaning that probability distribution vectors are multiplied on the left by stochastic matrices. A stochastic matrix M is called embeddable [7,8] if there exists a CTMC with rate matrix R(t) such that M equals the time-ordered exponential of R(t) over the unit interval, which we write as OE[R](0, 1). In our discussion of limits of matrices, any natural topology can be assumed, e.g., the metric topology induced by any matrix norm.
All proofs not in the text are in the appendices.

II. THE NUMBER OF HIDDEN TIMESTEPS IN A CTMC
As mentioned, any embeddable stochastic matrix must have strictly positive determinant (Eq. (1)). This follows from the fact that if M = OE[R](0, 1), then det M = exp( ∫_0^1 dt Tr R(t) ) [8], and the RHS is strictly positive. This reveals that even simple functions like bit erasure, which have determinant 0, are not embeddable. However, bit erasure can be implemented arbitrarily accurately by taking the limit of an appropriate sequence of CTMCs.
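The determinant identity above is easy to check numerically in the time-independent case, where det T(0, t) = exp(t Tr R). A small sketch in pure Python, using a truncated Taylor series for the 2×2 matrix exponential (an illustration, not the paper's construction):

```python
import math

def expm2(R, t, terms=60):
    """exp(t*R) for a 2x2 matrix via a truncated Taylor series."""
    A = [[1.0, 0.0], [0.0, 1.0]]   # accumulates the partial sums
    P = [[1.0, 0.0], [0.0, 1.0]]   # current term (t*R)^k / k!
    for k in range(1, terms):
        P = [[sum(P[i][m] * R[m][j] * t / k for m in range(2))
              for j in range(2)] for i in range(2)]
        A = [[A[i][j] + P[i][j] for j in range(2)] for i in range(2)]
    return A

# two-state rate matrix: leave state 0 at rate a, leave state 1 at rate b
# (columns sum to zero, matching the left-multiplication convention)
a, b, t = 1.3, 0.7, 0.9
R = [[-a, b], [a, -b]]
T = expm2(R, t)

det_T = T[0][0] * T[1][1] - T[0][1] * T[1][0]
assert abs(det_T - math.exp(t * (R[0][0] + R[1][1]))) < 1e-9  # det = exp(t Tr R)
assert abs(T[0][0] + T[1][0] - 1) < 1e-9                      # columns are normalized
```

Since exp(t Tr R) is strictly positive for any finite rate matrix and time, the determinant of an embeddable matrix can only approach zero in a limit of diverging rates or times.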
To establish this formally, we first note that because the set of stochastic matrices is closed, any limit of a sequence of stochastic matrices is itself a stochastic matrix. This allows us to construct limits of CTMCs in terms of limits of the associated transition matrices:

Definition 1. Let {T^(n) : n = 1, 2, . . .} be a sequence of transition matrices, each generated by a CTMC with a finite rate matrix, such that lim_{n→∞} T^(n)(0, 1) = M. In this case we say that {T^(n) : n = 1, 2, . . .} (or sometimes just T) limit-embeds the stochastic matrix M.
A stochastic matrix M may be limit-embeddable but not embeddable, i.e., there may be no CTMC with finite rate matrices that implements M . In particular, it may be that while each T (n) in a sequence that limit-embeds M can be described in terms of finite rate matrices, in the limit these rate matrices have infinite components. For example, this is the case with bit erasure (see Example 1 below).
As mentioned, we measure the difficulty of implementing a given CTMC by counting how often the adjacency matrix of its transition matrix changes. We can use Definition 1 to quantify this difficulty. To see how, first recall that the adjacency matrix A(M) of a stochastic matrix M is defined by A(M)_ij = 1 if M_ij > 0 and A(M)_ij = 0 otherwise. Given our definition of implementation difficulty, the matrix that is easiest to implement is one with a limit-embedding in which the adjacency matrix of the transition matrices never changes:

Definition 2. A stochastic matrix M is stable if it is limit-embedded by some sequence {T^(n)} such that for each n, T^(n)(0, t) is continuous in t and A(T^(n)(0, t)) is the same for all t ∈ (0, 1).

Note that in our definition, the adjacency matrix is only assumed to be constant over the open interval t ∈ (0, 1). The continuity condition guarantees that though the adjacency matrix may change at t ∈ {0, 1}, the transition matrices are still well-behaved at those endpoints.

Example 1.
In the model of bit erasure described in [14] a classical bit is stored in a quantum dot, which can be either empty (state 0) or filled with an electron (state 1). The dot is brought into contact with a metallic lead at temperature T which can transfer an electron to/from the dot. The propensity of the lead to give an electron is set by its chemical potential, indicated by µ(t) at time t.
The energy of an electron in the dot is indicated by E. Let p(t) indicate the two-dimensional vector of probabilities, with p_0(t) and p_1(t) being the probability of an empty and full dot, respectively. These probabilities evolve according to the rate matrix [14]

R(t) = ( −C f(E, t)   C [1 − f(E, t)] ; C f(E, t)   −C [1 − f(E, t)] ),   (2)

where C sets the time-scale of the exchange of electrons between the dot and the lead and f(E, t) is the Fermi distribution of the lead,

f(E, t) = 1 / (1 + e^{(E − µ(t))/kT}).

Assume for simplicity that the chemical potential is constant over time, µ(t) = µ, and define α := f(E) = 1/(1 + e^{(E − µ)/kT}). Solving the master equation then gives

T(t, t′) = ( 1 − α[1 − e^{−C(t′−t)}]   (1 − α)[1 − e^{−C(t′−t)}] ; α[1 − e^{−C(t′−t)}]   α + (1 − α) e^{−C(t′−t)} )

for any t′ ≥ t. Note that in the limit of µ → −∞, α vanishes. Taking this limit along with the long-time limit C → ∞ gives

T(t, t′) = ( 1 1 ; 0 0 )   (3)

for t′ > t. Using our previous definitions, the matrix in Eq. (3) corresponds to T(t, t′). This shows that bit erasure is limit-embeddable: T(0, 1) carries out bit erasure. Furthermore, the adjacency matrix of T(0, t) is the same for all t ∈ (0, 1). Thus, bit erasure is stable.
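The limits in Example 1 can be checked numerically. A sketch of the two-state relaxation in pure Python, where `alpha` plays the role of the Fermi factor defined above (our helper name, not from [14]):

```python
import math

def erasure_T(alpha, C, dt):
    """Transition matrix T(t, t+dt) of the two-state dot/lead model.

    Columns index the 'from' state; p_1 relaxes toward alpha at total rate C.
    """
    e = math.exp(-C * dt)
    return [[1 - alpha * (1 - e), (1 - alpha) * (1 - e)],
            [alpha * (1 - e),     alpha + (1 - alpha) * e]]

# mu -> -infinity corresponds to alpha -> 0; the long-time limit to C*dt -> infinity
T = erasure_T(alpha=1e-9, C=1e9, dt=1.0)

# the result is numerically indistinguishable from bit erasure, (1 1; 0 0)
assert abs(T[0][0] - 1) < 1e-6 and abs(T[0][1] - 1) < 1e-6
assert T[1][0] < 1e-6 and T[1][1] < 1e-6
```

For any finite α > 0 and C, all four entries of T(t, t′) are strictly positive, so the adjacency matrix is constant on (0, 1), illustrating why each member of the limiting sequence is stable.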
We use our definition of stable matrices to count the minimal number of times that the adjacency matrix must change in order to implement a given stochastic matrix:

Definition 3. The timestep cost of a stochastic matrix M is the minimal ℓ such that M can be written as a product of stable matrices, M = L^(ℓ) L^(ℓ−1) · · · L^(1).

We extend this definition in the natural way to encompass the use of hidden states:

Definition 4. Let f be a function over X. The timestep cost of f with k hidden states, written S_k(f), is the minimal timestep cost of any stochastic matrix M over a state space Y ⊇ X with |Y| = |X| + k whose restriction to X equals f, i.e., which sends each j ∈ X to f(j) with probability 1.

Example 2. Suppose we wish to flip the state of a bit (X := {0, 1}), which is physically represented by the voltage level across a particular capacitor. Assume the voltage level can be either V_0 (corresponding to X = 0) or V_1 (corresponding to X = 1), and evolves according to a CTMC. One way to do this begins by replacing the original two-state system with a three-state system, by introducing a third voltage level, V_2. Given that the original voltage was either V_0 or V_1, we can flip between those two voltages using the following sequence of idempotent functions over the voltage level: first V_0 → V_2, then V_1 → V_0, then V_2 → V_1, where at each step all other voltage levels are held fixed. (See Fig. 1.) Note that the third voltage level served as a hidden state, so Y := X ∪ {V_2}. Thus, we have implemented a bit flip using one hidden state and three hidden timesteps. In fact, results below together with Example 3 show that this is the minimal number of hidden timesteps needed to implement a bit flip. In other words, the timestep cost of a bit flip with 1 hidden state is 3.
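The bit-flip construction of Example 2 (first V0 → V2, then V1 → V0, then V2 → V1) can be verified by composing the three steps as 0/1 matrices. A Python sketch, with states ordered V0, V1, V2 as 0, 1, 2:

```python
def func_to_matrix(f, n):
    """Column-stochastic 0/1 matrix of a function on {0, ..., n-1}."""
    return [[1 if f[j] == i else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

# step 1: V0 -> V2; step 2: V1 -> V0; step 3: V2 -> V1 (others fixed)
steps = [[2, 1, 2], [0, 0, 2], [0, 1, 1]]
for f in steps:                      # each step is idempotent: f(f(x)) = f(x)
    assert all(f[f[x]] == f[x] for x in range(3))

M = func_to_matrix(steps[0], 3)
for f in steps[1:]:
    M = matmul(func_to_matrix(f, 3), M)   # later steps multiply on the left

# restricted to the visible states {V0, V1}, the product is the bit flip
assert M[1][0] == 1 and M[0][1] == 1
```

Note that no probability ends up on the hidden state V2 when the input starts in {V0, V1}, as required for the construction to be repeatable.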
Note that a matrix M can be stable even if there are times t′ < t and pairs of states i, j such that A(T(t′, t))_ij = 0 while A(T(0, t))_ij = 1, i.e., even if there is a time before t′ at which probability can flow from j to i, but that flow is cut off between t′ and t. Thus, the timestep cost is a lower bound on the number of times the rate matrix adjacency structure actually changes when implementing some matrix M. However, while it is possible to strengthen Definition 2 to avoid the possibility of such "invisible" changes, this would not affect any of our results. Moreover, as we show below (Lemma 2 and associated results in the appendix), this lower bound is always achievable: if an arbitrary function f has timestep cost τ, it is always possible to implement f while only changing the rate matrix τ times.
Note as well that by normalization of probability under the map M, if the product ∏_i L^(i) in Def. 3 is applied to an initial distribution with no probability mass on any of the hidden states, then it produces a final distribution with no probability mass on any of them. This means that if we iterate ∏_i L^(i) over the augmented space Y then we iterate M over the subspace X, thereby fulfilling a common (albeit often implicit) desideratum of digital devices.
Our final two definitions will be central to our analysis of the space and timestep costs of implementing a computation.
The first of these is common in the literature:

Definition 5. A function f : X → X is idempotent if f(f(x)) = f(x) for all x ∈ X.

Accordingly, a 0/1 stochastic matrix M represents an idempotent function if and only if M² = M. The second definition formalizes a type of transitivity relation for stochastic matrices:

Definition 6. A stochastic matrix M is transitive if, for all states i, j, k, M_ji > 0 and M_kj > 0 together imply M_ki > 0.

In particular, an embeddable M is transitive if the CTMC that embeds it cannot both send probability from i to j and from j to k without also sending probability all the way from i to k.
Note that there is an intimate connection between stable matrices and transitivity, which we will make extensive use of in proving the results that follow. Recall that a stable matrix is implemented by a CTMC with a constant adjacency matrix. One outcome of this definition is the following:

Lemma 1. Any stable matrix is transitive.
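Transitivity is straightforward to test directly. Under the paper's convention that M_ji is the probability of the transition i → j, a small Python sketch (the helper name `is_transitive` is ours):

```python
def is_transitive(M):
    """True if M_ji > 0 and M_kj > 0 always imply M_ki > 0."""
    n = len(M)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if M[j][i] > 0 and M[k][j] > 0 and M[k][i] == 0:
                    return False
    return True

erase = [[1, 1], [0, 0]]
flip  = [[0, 1], [1, 0]]
assert is_transitive(erase)       # consistent with bit erasure being stable
assert not is_transitive(flip)    # so, by the lemma, the bit flip is not stable
```

The bit flip fails because it sends 0 → 1 and 1 → 0 but not 0 → 0, giving an independent view of why it cannot be implemented in a single timestep without hidden states.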

III. TIMESTEP COST AND IDEMPOTENT FUNCTIONS
In this section, we make a formal connection between timestep cost and idempotent functions. To do so, we first provide a lemma showing that idempotent functions are stable, which is proved by construction (details in the appendix). We then show that it is in fact sufficient to consider idempotent functions in order to compute the timestep cost of any function f.

Lemma 2. Any idempotent function over a finite X is a stable matrix.
Lemma 2 means that we can get an upper bound on the timestep cost of a single-valued matrix M over a finite X by finding the minimal number of idempotent functions whose product equals M. The next result shows that any function which can be implemented with ℓ hidden timesteps can be equivalently implemented as a product of ℓ idempotent functions. Thus, the upper bound in terms of idempotent functions is tight: Lemma 3. Suppose the stochastic matrix M over Y ⊇ X has timestep cost ℓ and the restriction of M to X is a function f : X → X. Then there is a product of ℓ idempotent functions over Y whose restriction to X equals f.
As a special case of this result, we note that if X = Y, and M is single-valued and stable (so ℓ = 1), then M must be an idempotent function.
Together these lemmas allow us to reduce calculation of timestep cost from the number of stable matrices we need to the number of idempotent functions we need:

Theorem 4. For any function f over X and any k ≥ 0, the timestep cost S_k(f) equals the minimal number of idempotent functions over a space Y ⊇ X with |Y| = |X| + k whose product, restricted to X, equals f.

IV. EVALUATING TIMESTEP COSTS
Theorem 4 established a link between timestep cost and idempotent functions. Idempotent functions are an important object of study in the mathematical field of semigroup theory [40][41][42][43][44]. Thus, we are able to use existing results from this field to calculate the timestep cost (to within 1) for any function.
First, for any function f : X → X, write fix(f) for the number of fixed points of f, and |img(f)| for the size of the image of f. We also define cycl(f) as the number of cyclic orbits of f, i.e., as the number of distinct subsets of X of the form {x, f(x), f(f(x)), . . .} that cycle back to x, where each element has a unique inverse under f. 2

No non-identity product of idempotents is invertible. Thus, any invertible function other than the identity requires at least one hidden state to implement. At the same time, non-invertible functions can be implemented without any hidden states. The associated number of timesteps required to implement the function is specified by the following theorem, proved in [44]:

Theorem 5. Let f : X → X be a non-invertible function. Then the timestep cost of f with no hidden states is

S_0(f) = ⌈ (|X| − fix(f) + cycl(f)) / (|X| − |img(f)|) ⌉ + b_0(f),

where b_0(f) equals either zero or one.

2 Equivalently, a cyclic orbit of f is a subset S of X such that for all x ∈ S there is some c ≥ 2 such that f^(c)(x) = x (where f^(c) indicates the composition of f with itself c times), and there is no x ∈ S, y ∉ S such that f(y) = x [42].
The expression for b 0 (f ) in [44] is not easy to calculate, though some sufficient conditions for b 0 (f ) = 0 are known.
We extend this result to arbitrary functions, while also allowing hidden states:

Theorem 6. Let f : X → X. For any number of hidden states k > 0, the timestep cost is

S_k(f) = ⌈ (|X| − fix(f) + cycl(f)) / (|X| + k − |img(f)|) ⌉ + b_k(f),

where b_k(f) equals either zero or one.
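The quantities fix(f), |img(f)|, and cycl(f) are easy to compute for a function given as a lookup table. A Python sketch, which also evaluates the resulting ratio for the bit flip with one hidden state, up to the additive b_k ∈ {0, 1} term (helper names are ours):

```python
import math

def fix(f):
    return sum(1 for x, y in enumerate(f) if x == y)

def img(f):
    return len(set(f))

def cycl(f):
    """Number of cyclic orbits: cycles of length >= 2 with no outside preimages."""
    n = len(f)
    preimages = [[x for x in range(n) if f[x] == y] for y in range(n)]
    seen, count = set(), 0
    for x in range(n):
        if x in seen or f[x] == x:
            continue
        orbit, y = [x], f[x]
        while y != x and y not in seen and len(orbit) <= n:
            orbit.append(y)
            y = f[y]
        # a genuine cyclic orbit returns to x and every member has a unique inverse
        if y == x and all(len(preimages[z]) == 1 for z in orbit):
            count += 1
        seen.update(orbit)
    return count

flip, k = [1, 0], 1
ratio = math.ceil((len(flip) - fix(flip) + cycl(flip)) / (len(flip) + k - img(flip)))
assert ratio == 3   # matches the three timesteps found in Example 2
```

For the bit flip, fix = 0, |img| = 2, and cycl = 1, so with a single hidden state the ceiling term alone already forces three timesteps.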
An immediate outcome of the above two theorems is the following:

Corollary 7. Any non-invertible function can be implemented with no hidden states, while any invertible function can be implemented with one additional hidden state.
Proof. Observe that S_0 is defined and finite for any non-invertible function (Theorem 5), and S_1 is defined and finite for any invertible function (Theorem 6).
Theorem 6 gives the minimal number of hidden timesteps we need to implement a function f if we are allowed to use at most k hidden states. An important special case is the following example:

Example 3. Let f be an invertible function other than the identity, and suppose a single hidden state is available, k = 1. Then Theorem 6 gives S_1(f) = |X| − fix(f) + cycl(f) + b_1(f). For the bit flip, |X| = 2, fix(f) = 0, and cycl(f) = 1, so S_1 is either 3 or 4; since the construction in Example 2 uses only three hidden timesteps, the timestep cost of a bit flip with one hidden state is exactly 3.

Another interesting issue to consider is the minimal timestep cost necessary for some function f, provided that there are no restrictions on the number of hidden states that can be used. The following example shows that any f can be implemented in two timesteps, as long as |img(f)| hidden states are available.

Example 4. Let Y contain X along with a hidden "copy" of each element of img(f). Consider first the idempotent function over Y that sends each x ∈ X to the hidden copy of f(x), leaving the hidden states fixed, followed by the idempotent function that sends each hidden copy back to the corresponding visible state, leaving the visible states fixed. The composition of these two idempotents, restricted to X, equals f. Thus any f can be implemented with two hidden timesteps and |img(f)| hidden states.

More generally, while Theorem 6 gives the minimal number of timesteps needed to implement f if we are given k hidden states, the following rough "converse" of that theorem gives the minimal number of hidden states needed to implement f if we are allowed to use at most τ timesteps:
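The two-timestep construction with |img(f)| hidden states can be made concrete: the first idempotent writes f(x) into a hidden copy of the image, and the second copies it back. A Python sketch (hidden states indexed after the visible ones; helper names are ours):

```python
def two_step_implementation(f):
    """Implement f on X = {0,...,n-1} in two idempotent steps over Y.

    Hidden states are copies of the image of f, indexed n, n+1, ....
    Returns the two steps g, h and the size m of the augmented space Y.
    """
    n = len(f)
    image = sorted(set(f))
    hidden = {y: n + i for i, y in enumerate(image)}  # visible y -> hidden copy
    m = n + len(image)
    # step 1: send each visible x to the hidden copy of f(x); fix hidden states
    g = [hidden[f[x]] for x in range(n)] + list(range(n, m))
    # step 2: send each hidden copy back to its visible state; fix visible states
    h = list(range(n)) + image
    return g, h, m

f = [1, 0, 0, 2]                   # an arbitrary function on four states
g, h, m = two_step_implementation(f)
assert all(g[g[x]] == g[x] for x in range(m))       # step 1 is idempotent
assert all(h[h[x]] == h[x] for x in range(m))       # step 2 is idempotent
assert all(h[g[x]] == f[x] for x in range(len(f)))  # restriction to X equals f
```

Note that f here is invertible on part of its domain and noninvertible elsewhere; the construction does not care, which is why two timesteps suffice universally once enough hidden states are available.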

Corollary 8. Let τ > 3 and define

k(τ) = ⌈ (|X| − fix(f) + cycl(f)) / (τ − 1) ⌉ − (|X| − |img(f)|).

We can implement f in τ timesteps if we have at least k(τ) hidden states.

FIG. 2. Timestep cost of two functions over the state space X = {0, . . . , 15} as the number of available hidden states varies. The first of these functions, indicated by "Cycle", is the map x → x + 1 mod 16, i.e., a complete permutation of X. The other function, indicated by "Complement", represents each element of X as a four-bit string and then applies the bitwise NOT operation, e.g., 1 = 0001 → 1110 = 14. Note that Complement generally requires more steps to implement than Cycle, due to its large number of cyclic orbits.
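The two functions compared in Fig. 2 are permutations, for which every orbit of length at least 2 is a cyclic orbit, so cycl(f) reduces to a simple cycle count. A Python sketch:

```python
def cycl_permutation(f):
    """Number of cycles of length >= 2 of a permutation given as a lookup table."""
    seen, count = set(), 0
    for x in range(len(f)):
        if x in seen or f[x] == x:
            continue
        y = f[x]
        while y != x:        # walk the cycle containing x
            seen.add(y)
            y = f[y]
        seen.add(x)
        count += 1
    return count

cycle      = [(x + 1) % 16 for x in range(16)]   # x -> x + 1 mod 16
complement = [x ^ 0b1111 for x in range(16)]     # bitwise NOT on four bits
assert cycl_permutation(cycle) == 1              # one 16-cycle
assert cycl_permutation(complement) == 8         # eight 2-cycles
```

Both functions have fix = 0 and |img| = 16, so the gap between their timestep costs in Fig. 2 is driven entirely by the difference between one cyclic orbit and eight.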

V. DISCUSSION

A. Space-timestep tradeoffs under constraints
Our results on the tradeoffs between timestep and space costs assume that CTMC dynamics can be sufficiently controlled so as to produce any desired idempotent function. In this sense, these results can be understood as quantifying the time-space tradeoff under the "best-case" scenario.
In real world situations, there will often be severe constraints on the kinds of idempotent functions that can be practically carried out. In particular, the number of idempotent functions over any finite set X is ∑_{k=1}^{|X|} (|X| choose k) k^{|X|−k} [45]. As an illustration of how big this number is in practice, suppose that X is the state space of a set of N binary spins, so |X| = 2^N. For even quite moderately large N, the number of possible idempotents is extremely large. To suppose that we can design a system to implement an arbitrary sequence of idempotents selected from such an enormous set of possibilities seems far-fetched.
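The count of idempotent functions can be verified by brute force for small |X|; an idempotent is determined by choosing its k-element image of fixed points and then mapping the remaining |X| − k elements into that image. A Python sketch:

```python
from itertools import product
from math import comb

def num_idempotents_formula(n):
    # choose the k fixed points (the image), then map the other n-k points into it
    return sum(comb(n, k) * k ** (n - k) for k in range(1, n + 1))

def num_idempotents_bruteforce(n):
    # enumerate all n^n functions and keep those with f(f(x)) = f(x)
    return sum(1 for f in product(range(n), repeat=n)
               if all(f[f[x]] == f[x] for x in range(n)))

for n in range(1, 6):
    assert num_idempotents_formula(n) == num_idempotents_bruteforce(n)
```

Already for |X| = 2^N with N = 10, the formula gives a number far beyond what any realistic control apparatus could enumerate, which motivates the restricted gate sets considered next.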
Limitations on the set of idempotent functions that can actually be implemented can have major consequences on time-space tradeoffs. To use the previous example, consider implementing some arbitrary function over the 2 N states using a physical system consisting of N binary spins (representing N bits), along with some number of additional "hidden" spins (which encode hidden states). Suppose that we are limited in the types of idempotent functions that we can use. In particular, suppose we are only able to run idempotent functions that manipulate the state of one or two spins at once (with no restrictions on what spins we manipulate, nor whether the spins are visible or hidden), while the state of all other spins remains unchanged.
It is shown in Appendix E that we can use such a restricted set of idempotent functions to implement any single-valued function over the visible spins -provided we have enough hidden spins. However, implementing a function with this restricted set of idempotent functions will in general require more hidden states and more hidden timesteps than if we were able to run arbitrary idempotent operations on the system. Indeed, the analysis in Appendix E relies on expressing the sequence of idempotent functions needed to implement a function f over the visible spins as a Boolean circuit, the idempotent functions being identified with gates in the circuit. The number of timesteps is related to the depth of that circuit, whereas the number of hidden states is determined by the number and type of gates in that circuit. As such, the analysis in that appendix is related to ongoing research in circuit complexity theory [31,32].

B. Coarse-graining
There are other ways to model the implementation of single-valued functions with CTMCs, some of which might at first seem to violate our results. In particular, suppose we identify X as a set of coarse-grained macrostates, where the microstate dynamics evolve according to a CTMC in such a way that X evolves according to the desired function f. By appropriate choice of the coarse-graining and underlying CTMC, we can (appear to) violate the timestep cost bounds derived above for implementing f. As an example, consider a system involving two capacitors, one with binary state space W and one with binary state space Z. The elements of W partition the joint space of microstates W × Z into two macrostates. There are many ways to design dynamics over the microstates to implement a bit flip between the two macrostates W = 0 and W = 1. For example, so long as the state of Z is 0 before it starts, the following single idempotent function over W × Z would implement a bit flip over W, since it would send (0, 0) → (1, 1) and (1, 0) → (0, 1):

(w, 0) → (1 − w, 1),    (w, 1) → (w, 1).

Note though that Z = 1 at the end of this idempotent function, whereas it equalled 0 at the beginning. Therefore if we iterated this function, i.e., repeated the underlying CTMC, we would not implement a second bit flip. Indeed, since this would iterate an idempotent function, such a second implementation would leave the full state (W, Z) unchanged.
So flipping a bit with this single idempotent function violates a common (though often implicit) desideratum of how to design digital devices: in general we want the same function over the visible states to be repeated if we repeat the underlying CTMC. In contrast, this desideratum is met by the constructions analyzed above involving hidden states; those CTMCs all implement the associated function x → f (x) no matter how many times they are run. (See [46] for a careful analysis of how to meet this desideratum when the visible states are identified with macrostates, as in the capacitor example.)
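The capacitor example can be checked directly: a single idempotent over W × Z flips the macrostate W once, but iterating it leaves the state unchanged. A Python sketch with states encoded as (w, z) pairs:

```python
def g(state):
    """Idempotent over W x Z: (w, 0) -> (1 - w, 1); states with z = 1 are fixed."""
    w, z = state
    return (1 - w, 1) if z == 0 else (w, z)

# idempotency: applying g twice equals applying it once
for s in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    assert g(g(s)) == g(s)

# the first application flips the macrostate W ...
assert g((0, 0))[0] == 1 and g((1, 0))[0] == 0
# ... but a second application does not flip it back
assert g(g((0, 0)))[0] == 1
```

This makes concrete why the apparent one-timestep bit flip does not contradict the bounds above: it is not repeatable, whereas the hidden-state constructions implement x → f(x) on every run.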

VI. SUMMARY AND FUTURE WORK
In this paper we analyzed the hidden resources that any physical system must exploit in order to implement some specified conditional distribution over a set of "visible" states. A prototypical example is given by the hidden resources needed by any physical system that implements a desired logical gate, i.e., that transforms the joint value of the input and output bits of such a gate in a desired manner.
We restrict attention to physical systems that can be modeled as a time-inhomogeneous continuous-time Markov process. Many single-valued functions f over a set of "visible" states X cannot be implemented by any such process, even approximately. However, any f can be implemented by such a process if the process has access to some additional "hidden" states not in X. Motivated by engineering considerations, we consider a natural decomposition of the dynamical evolution of such a system into a sequence of "hidden" timesteps, demarcated by changes in the set of allowed transitions between states. We demonstrate a tradeoff between the number of hidden states and the number of hidden timesteps needed to implement any given f using a CTMC, analogous to space/time tradeoffs in computer science theory [31,32,47]. At one extreme of the tradeoff, we find that any logically non-invertible function can be implemented with no hidden states, while any logically invertible function can be implemented with one hidden state; however, these "space-constrained" implementations may require a very large number of hidden timesteps. At the other extreme, we find that any function f : X → X can be implemented with two hidden timesteps as long as |img(f)| hidden states are available, where img(f) is the image of f. Since |img(f)| ≤ |X|, it suffices to have one additional bit of storage, which doubles the state space, to implement any f within two timesteps.
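The two-timestep extreme can be sketched as follows. This is a hedged illustration consistent with the claim above, not the paper's own construction: the helper names and the tuple encoding of hidden states are assumptions. Each timestep is an idempotent map (idempotent functions are shown to be stable in Appendix B), with one hidden state per element of img(f):

```python
# Hypothetical sketch: implement any f : X -> X in two timesteps using
# |img(f)| hidden states, with each timestep an idempotent map.

def two_step_implementation(f, X):
    img = sorted(set(f[x] for x in X))
    hidden = {y: ('h', y) for y in img}        # one hidden state per image element

    def step1(s):   # copy f(x) into the hidden states; fixes hidden states
        return hidden[f[s]] if s in X else s

    def step2(s):   # release hidden states back into X; fixes visible states
        return s[1] if isinstance(s, tuple) else s

    return step1, step2

X = [0, 1]
f = {0: 1, 1: 0}    # a bit flip: logically invertible, so hidden states are needed
step1, step2 = two_step_implementation(f, X)
for x in X:
    assert step2(step1(x)) == f[x]             # the composition implements f on X
# each step is idempotent over the full (visible + hidden) space:
full_space = X + [('h', 0), ('h', 1)]
for s in full_space:
    assert step1(step1(s)) == step1(s)
    assert step2(step2(s)) == step2(s)
```

Note that the composition implements f no matter how many times it is repeated, meeting the repeatability desideratum discussed in Section V.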
As mentioned in the Introduction, our results can be seen as uncovering novel constraints on physical systems that perform information processing. These constraints concern resources like available state space and difficulty of implementation, as quantified by our notion of timestep cost. They are independent from, and complementary to, well-known thermodynamic constraints on the heat and work required to perform computation, such as Landauer's bound [35,37]. However, it is interesting to note that in standard treatments of the thermodynamics of computation, logically reversible maps are viewed as free, whereas many-to-one maps are viewed as costly. Moreover, noisy maps, whose outputs have inherent randomness, can be cheaper than single-valued ones, in the sense that the minimal free energy they require to run can actually be negative [18,20,48]. In contrast, when considering the number of hidden states required to implement a computation, many-to-one maps are free and logically reversible ones are costly. Moreover, as shown in our companion paper [21], noisy maps may require more hidden states to implement than single-valued ones. So the relative benefits of many-to-one, invertible, and noisy maps in thermodynamics are ordered in the opposite way from how those maps are ordered when considering hidden states.
There are several important directions for future work. First, in this paper we only calculated the timestep cost up to an additive term of 0 or 1 timesteps. Future work involves trying to calculate the timestep cost exactly, e.g., by expressing the additive term as an already well-characterized optimization problem.
Second, as we emphasized in a previous section, our results assume that there are no significant restrictions on the kinds of dynamics that an engineer can apply to a system. If such restrictions exist, then different, and generally worse, tradeoffs will govern the relationship between hidden state and hidden timestep requirements. Investigating these kinds of "constrained" tradeoffs remains for future work.
Third, in this paper we focused on the space and timestep tradeoffs involved in implementing single-valued functions. Future work involves relaxing this restriction, and computing the minimal number of stable matrices required to implement an arbitrary (possibly non-single-valued) stochastic matrix. This is closely related to our work in a companion paper [21], because the "local relaxations" studied there are stable matrices. In particular, upper bounds are derived there on the number of hidden states required to implement any stochastic matrix with a sequence of local relaxations, and so those bounds also give an upper bound on the number of hidden states needed to implement that stochastic matrix using a sequence of stable matrices. In addition, those bounds are derived in [21] by explicitly constructing a sequence of local relaxations that implements a given stochastic matrix. Simply by counting how long those sequences are, we obtain an explicit bound on the timestep cost of implementing such a stochastic matrix.
An important consequence of this generalization would be the ability to analyze tradeoffs between space, timesteps, and accuracy of computing some function.
In particular, suppose that one wanted to implement a given single-valued function f but were willing to use any stochastic matrix P within some distance ε of f. One could then analyze how hidden state and timestep requirements change with increasing ε. Finally, other future work involves extending this framework to evaluate space and timestep tradeoffs for functions over countably infinite (and ultimately uncountably infinite) state spaces. In particular, this should allow us to extend our results to apply to Turing machines.

Appendix B: Idempotent functions are stable
To show that idempotent functions f are stable, we first prove a more general result (which is not presented in the main text): Theorem 11. If M is a block-diagonal stochastic matrix where each block has rank 1, then M is stable.
Proof. Consider first a single block B of M, which by assumption is rank 1. Define a family of time-independent rate matrices, parameterized by a rate parameter γ:

R_γ = γ(B − I).

For a time-invariant rate matrix the ordered exponential reduces to the ordinary matrix exponential, so we can evaluate the transition matrix associated with R_γ for any two times t′ > t and any γ:

T(t → t′) = e^{(t′ − t) R_γ}.

Since B is stochastic and rank 1, it is idempotent (each column of B is the same probability vector b, so B = b1ᵀ and B² = b(1ᵀb)1ᵀ = b1ᵀ = B). This implies R_γ² = −γ²(B − I), and more generally R_γⁿ = −(−γ)ⁿ(B − I) for n ≥ 1. This lets us write the matrix exponential as

e^{(t′ − t) R_γ} = I − (B − I) ∑_{n=1}^∞ [−γ(t′ − t)]ⁿ / n! = I + (1 − e^{−γ(t′ − t)})(B − I).

Taking the quasistatic limit of this transition matrix (given by taking γ → ∞), we get

lim_{γ→∞} e^{(t′ − t) R_γ} = I + (B − I) = B.

Since this holds for all t′ > t ∈ [0, 1], B is stable. To see that the full matrix M is stable, simply consider a rate matrix with the same block structure as M, with each block B replaced by the associated matrix R_γ defined above, and repeat the above argument.
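The key steps of this proof can be checked numerically. The following sketch (assuming a column-stochastic convention, with SciPy's `expm` for the matrix exponential) verifies the idempotence of a rank-1 stochastic block, the closed form of the matrix exponential, and the quasistatic limit:

```python
import numpy as np
from scipy.linalg import expm

# Rank-1 (column-)stochastic block: every column equals the same
# probability vector b, i.e. B = b 1^T.
b = np.array([0.2, 0.5, 0.3])
B = np.outer(b, np.ones(3))
assert np.allclose(B @ B, B)            # B is idempotent

I = np.eye(3)
for gamma in [1.0, 10.0, 100.0]:
    # transition matrix over unit time for the rate matrix R_gamma = gamma (B - I)
    T = expm(gamma * (B - I))
    # closed form from the proof: e^{t R_gamma} = I + (1 - e^{-gamma t})(B - I)
    assert np.allclose(T, I + (1 - np.exp(-gamma)) * (B - I))

# quasistatic limit gamma -> infinity recovers B itself
assert np.allclose(expm(100.0 * (B - I)), B, atol=1e-8)
```

The error in the quasistatic limit decays as e^{−γ(t′−t)}, so moderate values of γ already reproduce B to machine precision.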
The next result is a consequence of Theorem 11: Lemma 2. Any idempotent function over a finite X is a stable matrix.
Proof. As mentioned, f : X → X over a finite set X is idempotent if there is a partition of X, {X_i : i = 1, 2, …}, such that every x in a given partition element X_i gets mapped by f to the same element of X_i. Such a function corresponds to a block-diagonal matrix M, with one block for each X_i, where each block has rank 1. The claim then follows from Theorem 11.

For each leaf v of Γ_i, define W_i(v) as the union of v and all of its ancestors, and arbitrarily choose a single element y_i(v) from the communicating class of that leaf node v. Label the leaves with the natural numbers, and iteratively define σ_i. Again using the fact that L^(i) is transitive, we see that for all leaves v of Γ_i, every y ∈ W_i(v) has nonzero probability of going to y_i(v) under L^(i). Define an idempotent function L^(i) that sends all x in each σ_i(j) to y_i(j) for all j. Let x, x′ ≠ x be any two elements of U. Note that the composition of the L^(i) sends nonzero probability mass from x to x′ only if f does. Since f is single-valued, it follows that the restriction of that composition to X equals the function f.
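As a sanity check on Lemma 2, one can verify directly that the matrix of an idempotent function is itself idempotent and relaxes to itself in the quasistatic limit of the Theorem 11 construction. This sketch assumes a column-stochastic convention; the particular function is an arbitrary illustration:

```python
import numpy as np
from scipy.linalg import expm

# The 0/1 column-stochastic matrix of an idempotent function f is itself
# an idempotent (block-diagonal, rank-1-per-block) stochastic matrix.
f = {0: 1, 1: 1, 2: 2, 3: 2}            # idempotent: f(f(x)) = f(x) for all x
n = len(f)
M = np.zeros((n, n))
for x, fx in f.items():
    M[fx, x] = 1.0                       # column x carries all probability to f(x)

assert all(f[f[x]] == f[x] for x in f)   # f is idempotent as a function
assert np.allclose(M @ M, M)             # ... and so is its matrix
# quasistatic limit of the CTMC with rate matrix gamma (M - I) recovers M:
assert np.allclose(expm(50.0 * (M - np.eye(n))), M, atol=1e-8)
```

Here the partition elements are {0, 1} and {2, 3}, matching the block-diagonal structure invoked in the proof of Lemma 2.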
An important special case of Lemma 3 arises by taking ℓ = 1 and X = Y: Corollary 12. If the stable matrix M is single-valued, then it must be an idempotent function.
Combining Lemma 2 and Lemma 3 gives the following result, where b_k(f) equals either zero or one.
Proof. Let Y = X ∪ Z where Z ∩ X = ∅ and |Z| = k. By definition, S_k(f) is the minimum of S_0(g) over all non-invertible functions g : Y → Y that equal f when restricted to X. Moreover, by Theorem 5,
Due to the constraint that g(X) = f(X), our problem is to determine the optimal behavior of g over Z. For any fixed img(g), this means finding the g(Z) that minimizes cycl(g) − fix(g). Since f(X) ⊆ X, the constraint tells us that there are no cyclic orbits of g that include both elements of X and elements of Z. So every cyclic orbit of g lies either wholly within Z or wholly within X. Moreover, changing g so that all elements of a cyclic orbit Ω lying wholly in Z become fixed points of g does not violate the constraint, and reduces the timestep cost. Therefore under the optimal g, all z ∈ Z must either be fixed points or get mapped into f(X).
Our problem then reduces to determining precisely where g should map those elements it sends into f(X). To determine this, note that g might map an element of Z into an x that lies in a cyclic orbit Ω of f. If that happens, Ω will not be a cyclic orbit of g, and so the timestep cost will be reduced. Thus, to ensure that cycl(g) is minimal, we can assume that all elements of Z that are not fixed points of g get mapped into f(X), with as many as possible being mapped into cyclic orbits of f. This quantity is minimized if m is as large as possible, which establishes the result.
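For concreteness, here is a small helper for the quantities appearing in the proof above, under the assumption (mine, based on the surrounding usage) that cycl(f) counts cyclic orbits of length at least 2 and fix(f) counts fixed points:

```python
# Hedged helper: compute (cycl(f), fix(f)) for a function f given as a dict,
# assuming cycl(f) = number of cyclic orbits of length >= 2 and
# fix(f) = number of fixed points.

def cycl_and_fix(f):
    fixed = {x for x in f if f[x] == x}
    # x lies on a cycle iff some iterate of f returns to x
    cyclic = set()
    for x in f:
        y = f[x]
        for _ in range(len(f)):
            if y == x:
                cyclic.add(x)
                break
            y = f[y]
    cyclic -= fixed                      # exclude length-1 cycles (fixed points)
    # group the cyclic states into orbits and count them
    orbits, seen = 0, set()
    for x in cyclic:
        if x in seen:
            continue
        orbits += 1
        y = x
        while y not in seen:
            seen.add(y)
            y = f[y]
    return orbits, len(fixed)

# bit flip on {0, 1}, plus one fixed point and one transient state:
f = {0: 1, 1: 0, 2: 2, 3: 2}
assert cycl_and_fix(f) == (1, 1)
```

With this convention, making the elements of Z fixed points increases fix(g) without creating new cyclic orbits, which is why the optimal g in the proof treats Z this way.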

Corollary 8. Let τ > 3 and define
We can implement f in τ timesteps if we have at least k hidden states, where

Proof. Since b_k(f) is always 0 or 1, by Theorem 6 we know that we can implement f if τ and k obey
This inequality will hold if
The RHS is non-increasing in k. So we can implement f in τ timesteps, as desired, if k is the smallest integer that obeys the inequality. First hypothesize that the smallest such k is less than cycl(f). In this case max{cycl(f) − k, 0} = cycl(f) − k, so our bound becomes
If instead the least k that obeys our inequality is greater than or equal to cycl(f), then our bound becomes
The fact that k must be a non-negative integer completes the proof.