## Abstract

Irreversible information processing cannot be carried out without some inevitable thermodynamical work cost. This fundamental restriction, known as Landauer’s principle, is increasingly relevant today, as the energy dissipation of computing devices impedes the development of their performance. Here we determine the minimal work required to carry out any logical process, for instance a computation. It is given by the entropy of the discarded information conditional to the output of the computation. Our formula takes precisely into account the statistically fluctuating work requirement of the logical process. It enables the explicit calculation of practical scenarios, such as computational circuits or quantum measurements. On the conceptual level, our result gives a precise and operational connection between thermodynamic and information entropy, and explains the emergence of the entropy state function in macroscopic thermodynamics.

### Similar content being viewed by others

## Introduction

Thermodynamics in essence is an information theory—its purpose is to make statements about systems for which we only have certain partial information, such as a gas of many particles for which only macroscopic quantities like temperature, volume and pressure are accessible. Following this point of view, Jaynes showed that the entropy function derived in statistical mechanics corresponds to the information-theoretic entropy of the gas associated with a macroscopic observer who is maximally ignorant of the microscopic degrees of freedom^{1}, resorting to Shannon’s mathematical theory of information^{2} developed in the context of telecommunications.

When the observers have access to knowledge about microscopic quantities, such as positions and velocities of particles in a gas, the second law of thermodynamics seems to break down, as was illustrated by Maxwell’s demon. To address this problem, Szilard^{3} studied a one-particle gas that can be located on either side of a box, left (‘L’) or right (‘R’), and noted that by isothermally compressing the gas or letting the gas expand, one can trade this one bit of information for *kT* ln 2 work, as depicted in Fig. 1a (in the presence of a heat bath at temperature *T*, and where *k* is Boltzmann’s constant). Landauer and Bennett later realized that the information content of data stored in a memory register, independently of the nature of its physical representation, counts as thermodynamic entropy when considering thermodynamical operations on that register^{4,5,6,7,8,9,10,11,12,13}. For example, given a bit in an unknown state, any operation that resets it to zero must dissipate at least *kT* ln 2 heat, and thus the corresponding amount of work must be supplied (this is known as Landauer’s principle). This fact salvages the second law of thermodynamics and resolves the paradox of Maxwell's demon.

More recently with the advent of quantum information, efforts were made to understand the laws of quantum thermodynamics from an information-theoretic viewpoint^{14,15,16,17,18}, while the increasing technological ability to control and manipulate nanoscale systems^{19,20} has prompted the study of particular operational models and frameworks, leading to characterization of the work cost of various information-theoretic tasks such as erasure and work extraction^{21,22,23,24,25,26,27,28,29,30,31,32}. For a more specific review of existing results, we refer to (Supplementary Note 1).

The aim of this work is to study thermodynamics in such generalized scenarios, where one may have knowledge about microscopic degrees of freedom, by resorting to modern tools of information theory^{33,34}. We provide a fundamental lower bound to the work cost of a physical implementation of a logical process, discuss several examples and illustrate how traditional thermodynamics emerges from our micrsocopic result in the limit of macroscopic systems.

## Results

### The Framework

We determine a general expression for the minimal amount of work needed to carry out any given logical process . This can be for example an AND gate or any quantum or classical computation; most generally is defined as any completely positive, trace-preserving map from quantum states on an input Hilbert space to quantum states on an output Hilbert space . We assume these spaces to be of finite dimension for simplicity; note that such a space can be a subspace of an infinite-dimensional Hilbert space in which the relevant computation or logical process takes place. The terminology ‘logical process’ is meant to emphasize that the mathematical object only specifies for each input state the corresponding output state and does not prescribe its physical realization, which would consist of a full description of a physical system including the parts of its environment that are relevant to determine its time evolution. Note that in performing a logical process one does not merely transform one quantum state into another; rather, the output must be related to the input in a precisely specified way. In the case where the input is a classical value, this means that the output depends on the particular input value received, and not only on the distribution of inputs. This might be checked in practice, for example, if one keeps a copy of the input as a reference system and observes the correlations between the output and the reference system.

There are, generally, many ways of actually realizing a logical process with an actual physical device. The device and its interactions with the environment (for example, a heat bath) may for example be described by a Hamiltonian or a Liouvillian. For our purposes, it is sufficient to specify the set of operations which the device is allowed to perform as well as the associated work cost. We then optimize the work expenditure over precisely those strategies, which realize the given logical process . Observe that the more permissive our framework is, the more robust our bound will be. In our model, we shall be allowed to implement at no work cost any trace-preserving completely positive map that is unital, that is, which preserves the identity operator. Note that if we were to allow any logical process that is not unital to be performed for free, one could flagrantly violate the second law of thermodynamics on a macroscopic scale: in this sense, unital maps are the most permissive logical operation that we can allow for free. The model must also include a description of a ‘battery’ that provides the energy required to drive the process. For this we resort to Bennett’s idea of an ‘information fuel tape’^{5,11}: such a battery consists of a large number of qubits with a degenerate Hamiltonian. Initially, a certain number *λ*_{1} of these qubits are in the maximally mixed state and the rest are pure. We may freely implement any joint unital map on the system and battery. At the end of the operation, the state of the battery consists of a possibly different number *λ*_{2} of qubits in a maximally mixed state, while the rest should be pure (The requirement that these *λ*_{2} qubits be maximally mixed is not a restriction, see Methods section.). We then count the amount of work consumed as *W*=*kT* ln 2·(*λ*_{2}−*λ*_{1}), which is the amount of work required to restore the battery system into its initial state. Indeed, a vast amount of literature has well underscored the correspondence between possessing a pure degenerate qubit, or storing *kT* ln 2 work, and vice versa^{3,5,11,12}. The quantity *W* may be negative, indicating that work can be extracted from the battery when restoring it to its initial state. In addition, we assume that the input to the logical process is encoded in a system whose initial Hamiltonian is degenerate. The same is assumed about the output system at the end of the computation. Note that this does not exclude making use of systems with nontrivial Hamiltonians during the implementation of the process. Also, this requirement is in practice not a limitation, as many other frameworks may be mapped to this setting^{26,28,29}; indeed the assumption should rather be regarded as a technicality to ensure a clean way of accounting for work.

To obtain physically relevant results, we also have to exclude overwhelmingly unlikely events from our considerations. This is actually quite common in thermodynamics and is usually done implicitly. For example, consider a stone lying on the ground. There is a very small chance that by thermal fluctuation the stone spontaneously jumps in the air. However, this event is so disproportionately unlikely that in a physical theory we may safely choose to ignore this possibility. Within our framework, we do this more explicitly. That is, we consider a parameter that specifies the total probability of all events we want to exclude. In the quantum regime, where events are generally not well-defined, this idea is captured by -approximations: the stone has a very small amplitude of being found in the air, but its state is -close to a state completely located on the ground. Analogously, we study the work requirement of logical processes that are -approximations of the desired logical process. This is a standard procedure in information theory^{33,34}, and is justified by the fact that an -approximation cannot be distinguished from the original logical process with probability greater than .

### The main result

To formulate our main claim, we represent the logical process by its Stinespring dilation^{35}. This is an isometry (which can be seen as part of a unitary) that maps *X* onto *X*′ as well as an extra system *E* such that the original map is retrieved by ignoring *E* (see Fig. 1b). Our main result asserts that , the work one needs to supply to execute the operation up to an -approximation, is lower bounded by

The right hand side is the smooth max-entropy of *E* conditioned on *X*′ and may be interpreted as a measure for the irreversibility of the logical process. More precisely, the smooth max-entropy is an information-theoretic measure defined in the Methods section, and quantifies the uncertainty one has about *E* when given access to *X*′. The parameters and are related by may be chosen arbitrarily. We stress that the system *E* is an abstract mathematical concept used to represent the logical map , and can be interpreted as the information discarded by the mapping. In particular, our bound is independent of the choice of this representation.

The form of the bound (1) naturally expresses our intuition that the amount of work that needs to be provided corresponds to the amount of information that is logically discarded, and which therefore has to be dumped into the environment. This consideration is done from the viewpoint of the observer who has completed the computation, and thus has access to *X*′, explaining the occurrence of the conditional entropy. Also, if *E* is classical, the max-entropy has the operational interpretation of being the amount of memory space needed to compress the information contained in *E* when possessing knowledge of *X*′ (ref. 36) (In the fully quantum case, it corresponds to quantum state merging^{37}.).

The proof of our main result proceeds by first considering the special case in which . The bound one then obtains is

where Π_{X} is the projector onto the support of the input state. This expression proves particularly useful for calculating some simple practical examples.

The proof of this special case, and its generalization to the regime where , is presented in the Methods section. An alternative proof, using techniques from majorization, is given in (Supplementary Note 5).

### Classical mappings and dependence on the logical process

Our result, which is applicable to arbitrary quantum processes, applies to all classical computations as a special case. Classically, logical processes correspond to stochastic maps, of which deterministic functions are a special case. As a simple example, consider the AND gate. This is one of the elementary operations computing devices can perform, from which more complex circuits can be designed. The gate takes two bits as input, and outputs a single bit that is set to 1 exactly when both input bits are 1, as illustrated in Fig. 2a.

The logical process is manifestly irreversible, as the output alone does not allow to infer the input uniquely. If one of the inputs is zero, then the logical process effectively has to reset a three-level system to zero, forgetting which of the three possible inputs 00, 01 or 10 was given; this information can be viewed as being discarded, and hence dumped into the environment. We can confirm this intuition with our main result, using the fact that a general classical mapping is given by the specification of the conditional probability *p*(*x*′|*x*) of observing *x*′ at the output if the input was *x*. Embedding the classical probability distributions into the diagonals of quantum states, the infinity norm in expression (2) becomes simply

where the sum ranges only over those *x* that have a non-zero probability of occurring. In the case of deterministic mappings *p*(*x*′|*x*)∈{0,1}, this corresponds to the maximum number of input states that map to a same output state. For the AND gate, provided all four states 00, 01, 10 and 11 have non-negligible probability of occurring, there are three input states mapping to the same output state, so (3) gives us simply . Also, in simple examples as considered here, the expression (3) is stable to considering an -approximation (Supplementary Note 4); this quantity is thus physically justified.

Crucially, our result reveals that the minimal work requirement in general depends on the specific logical process, and not only on the input and output states. This contrasts with traditional thermodynamics for large systems, where the minimal work requirement of a state transformation can always be written as a difference of a thermodynamical potential, such as the free energy. For example, the minimal work cost of performing specifically an AND gate may differ from that of another logical process mapping an input distribution (*p*_{00}, *p*_{01}, *p*_{10}, *p*_{11}) (with _{i} *p*_{i}=1) to the distribution (*p*′_{0}, *p*′_{1})=(*p*_{00}+*p*_{01}+*p*_{10}, *p*_{11}) (Recall that the classical counterpart of a quantum state is a probability distribution.). To see this, consider the XOR gate, which outputs a 1 exactly when both inputs are different (see Fig. 2b). The minimal work cost requirement of this gate, as given by (3), is now only *kT* ln 2, as in the worst case, only a single bit of information is erased (again supposing that all four input states have non-negligible probability of occurring). Now, suppose that, for some reason, the input distribution is such that *p*_{01}+*p*_{10}=*p*_{11}, that is, the input 11 occurs with the same probability as of either 01 or 10 appearing. Then, the XOR gate reproduces the exact same output distribution as the AND gate: in both cases, we have *p*′_{0}=*p*_{00}+*p*_{10}+*p*_{01}=*p*_{00}+*p*_{11} and *p*′_{1}=*p*_{11}=*p*_{01}+*p*_{10}. In other words, both logical processes have the same input and output state, yet the XOR gate only requires work *kT* ln 2 compared with the AND gate, which requires 1.6*kT* ln 2. Furthermore, we point out that this difference, which appears small in this case, may be arbitrarily large in certain scenarios (Supplementary Note 4).

On the one hand, we are by definition interested in the work cost of a given logical process, so one might have expected that this work cost should not only depend on the input and output states. On the other hand, it might seem contradictory that the full logical process matters even though we have fixed an input state *σ*_{X}. However, this makes sense if we consider preparing the input state as part of a pure state on the input system and a reference system. In this case, the logical process that is implemented influences the (in principle detectable) correlations between the output and the reference system, even if the reduced state on the input is the fixed state *σ*_{X}.

We emphasize that the phenomenon observed here is fundamentally different from the notion of thermodynamic irreversibility. Here we always consider the optimal procedure for implementing the logical process, whereas a thermodynamically irreversible process is in fact an ‘inefficient’ physical process that could be replaced by a more efficient, reversible one. In our framework, the thermodynamically irreversibile processes are those physical implementations that do not achieve the bound (1). A longer discussion with examples is provided in (Supplementary Note 2).

### Work extraction

While erasure requires work, it is well known that in a wide range of frameworks one can in general extract work with the reverse logical process, which corresponds to taking a register of bits that are all in the zero state and making them maximally mixed^{3,5}. Our result intrinsically reproduces this fact: the Stinespring dilation of a logical process that generates randomness in fact creates entanglement between the output *X*′ and *E* (see Fig. 2c). The conditional entropy then becomes negative, such that the bound (1) allows work to be extracted. We remark that, even if the logical process is classical, the relevant state for the entropic term in (1) is entangled, and thus all but classical; this is due to the construction of *E* as a purifying system for the logical process.

### Erasure with a quantum memory and tightness of our bound

Recently, del Rio *et al*.^{25} have constructed an explicit procedure capable of resetting a quantum system *S* to a pure state using an erasure mechanism assisted by a quantum memory *M*, and doing so at a work cost of approximately

The approximation holds up to terms of the order of the logarithm of and are negligible in typical scenarios (Supplementary Note 4).

Our main result implies that their procedure is nearly optimal (Fig. 2d). Indeed, consider the total system , in the initial state *σ*_{SM}, with the logical process , denoting symbolically with a prime the output system *S*′ (The state on *M* remains unchanged.). One then straightforwardly sees that the resulting joint state on *E* and the output is obtained from the initial state on *S* and *M* by isometrically ‘transferring’ the *S* part to *E* and replacing it by a fixed pure state. The entropy term in our bound (1) then becomes , the latter entropy being evaluated on the input state. This matches the term in (4).

Conversely, this optimal erasure procedure can be used to show that for any arbitrary logical process, the minimal amount of work our result associates to it can be in principle achieved to good approximation. Given a logical process and an input state *σ*_{X}, calculate its Stinespring dilation as explained above, and consider an ancillary system *A*_{E} of the same dimension as *E*. This ancilla system is initialized in a pure state . One can then carry out a unitary on *X* and *A*_{E}, chosen such that

In effect, *A*′_{E} impersonates the abstract system *E* while we perform a unitary corresponding to the Stinespring dilation of (see inset of Fig. 1b). This unitary operation can be implemented at no work cost because it is reversible. The aforementioned optimal erasure procedure can then be used to restore the ancilla *A*′_{E} to its original pure state, using the output system *X*′ as the quantum memory, at a work cost of approximately . As *A*′_{E} corresponds to *E*, this matches our bound (1) and therefore proves its tightness.

### The work requirement of a quantum measurement

The problem of determining the amount of work needed to carry out a quantum measurement has been the subject of much literature^{38,39,40}, especially in the context of Maxwell’s demon^{5,6,12,41}. A quantum measurement is a logical process (depicted in Fig. 3a) acting on a system *X* to be measured and a classical register *C* initially set to a pure state, and outputting systems *C*′ and *X*′, with *C*′ containing the measurement result and *X*′ the quantum post-measurement state. We will consider a projective measurement for simplicity, treating the more general case in (Supplementary Note 4). The logical process corresponding to the measurement described by a complete set of projectors {*P*_{i}}_{i} takes the form

Our bound (2) for this map is at most zero (since ), implying that the measurement can be carried out in principle at no work cost, as was already stated by Bennett^{5}. Note that a work cost is required if the classical register *C* was not initially pure^{40}.

A related question is the work cost of erasing the information contained in the register *C*′ after the measurement. Doing so would allow us to construct a cycle. The cost of this erasure can be reduced using the post-measurement state as a quantum memory, by employing the procedure presented above, to . But because *C*′ and *X*′ may only be classically correlated, no work may be extracted in this way^{25}. In some cases this work cost may be zero, for example for projective measurements on a maximally mixed state (Supplementary Note 4). This might seem to save Maxwell’s demon from Bennett’s information-theoretic exorcism, which argues that the demon must pay work to reset its memory^{5} (see Fig. 3c). However, the key point is to notice that the demon cannot use the post-measurement state to both extract work and to reset its internal memory register.

## Discussion

Our main result exposes various features of thermodynamics in the microscopic regime that are not present in the standard setting of large systems. In particular, as argued above, the minimum work cost of a logical process cannot be given in terms of a state function, such as the entropy or the free energy in thermodynamics.

Traditional thermodynamics is concerned with macroscopic systems, and we may retrieve this limit by considering logical processes that consist of many individual operations. Under appropriate independence assumptions and using typicality arguments^{42}, one can show that the average minimal work cost per process as determined by (1) simply takes the form *kT* ln (2)·[*H*(*X*)−*H*(*X*′)], where *H*(*X*)=−tr(*ρ*_{X} log_{2} *ρ*_{X}) is the usual von Neumann entropy (see Methods section): the minimal work requirement is now given by a function of state *H*(*X*), and no longer depends on the logical process that maps *X* to *X*′ (see Methods).

Our result thus provides the following fresh view on the macroscopic regime. Thermodynamics can be seen as a general framework, in which the second law postulates the existence of a state function, the thermodynamic entropy, which relates to the heat flow in processes. Many standard results of thermodynamics follow from that starting point. It is now the role of a microscopic theory to construct a state function with this property, based on the microscopic dynamics of the particular system. In textbook statistical mechanics, this construction is given for several physical setups, such as gases or lattices; one usually considers, for example, the configuration entropy, or an appropriately normalized Shannon or von Neumann entropy of the density of the statistical ensemble. Our result generalizes this construction and clarifies when it is justified: the state function, in general, appears whenever the inherent fluctuations due to the microscopic stochastic nature of the process vanish by typicality. The existence of an entropy state function is therefore not a property of the microscopic system; it is rather an emergent quantity that appears whenever the full system is typical, such as in the limit of macroscopic processes (Fig. 4).

Finally, one should note that the system in consideration need not be large for the typicality arguments to apply. For example, if one considers the work requirement of performing many independent repetitions of a single given logical process (seen as one big joint process), then the work requirement per repetition converges to the average work requirement as calculated via statistical mechanics, even if the individual system is small: in this case, the entropy function emerges. This further justifies the usage of the von Neumann entropy in statistical mechanics even for small systems. Conversely, a large system does not necessarily display typicality; such is the case for systems out of thermodynamic equilibrium. An explicit example is provided in (Supplementary Note 4).

In summary, our main result quantifies the minimal required work to perform a logical process on the microscopic level. On the conceptual level, our result shows how, for macroscopic systems, the information-theoretic von Neumann entropy emerges as a state function and can thus be strictly identified with the thermodynamic entropy.

## Methods

### Mathematical formulation and proof of the main result

The task is to implement the logical process . Recall the framework allows for the implementation of any unital map, that is, , to be performed on the systems at hand. We first adapt a well-known classical result about doubly stochastic and doubly sub-stochastic matrices^{43} to relate unital quantum maps to so-called subunital maps, that is, maps that satisfy . Note also that the composition of two unital maps is unital, and similarly the composition of two subunital maps is subunital. We will need the following proposition, which we prove in (Supplementary Note 6) as Prop. 17.

*Proposition I (dilation of a subunital map)*. Let and be finite dimensional Hilbert spaces, and let be a completely positive, trace-nonincreasing, subunital map. Then there exists finite dimensional Hilbert spaces and , and a completely positive, trace-preserving, unital map such that

for some pure states |i〉_{Q}, |f〉_{Q′}. In addition, dim ()= dim ().

Let’s now denote by *A* the ‘information battery’ system, which is the physical system that tracks how much work we have used or extracted. The system *A* may be as large as we might wish (but finite) and starts in a state with some given number of mixed qubits *λ*_{1}. The system *X* starts in a given state *σ*_{X}, and we assume that the Hamiltonians of *X* and *A* vanish at the beginning and at the end of the physical process.

Our framework specifies that we are allowed to perform any sequence of joint unital operations on any subsystems of *X* and *A*. The final state on should be a product state, with the state on *A*′ of the form . Note that the structure imposed on this state is not a restriction: if the final state on *A*′ is not of this form, an additional unital map can be applied on the support of the final state on *A*′ to replace the latter by a maximally mixed state on its support. However, this condition does assume that there is no way to extract work while transforming a state *ρ* to a maximally mixed state of the same rank, or, equivalently, that the worst-case erasure cost of a state *ρ* is *kT* ln 2 log_{2} rank *ρ*. This can usually be seen as a consequence of the choice of framework, and is in line with the findings of refs 28, 29. Alternatively, given a state *ρ*, let *m* be its rank, *p*_{min} its smallest non-zero eigenvalue and Π the projector on its support. The state *ρ* may be written as a statistical mixture of with probability *m*·*p*_{min} and some state (*ρ*−*p*_{min}Π)/(1−*m p*_{min}) with probability 1−*m*·*p*_{min}. In the event where the system is prepared in the maximally mixed state of rank *m*, the work requirement for erasure is deterministic because the state is uniform, and equals *kT* ln 2 log_{2} *m* (refs 3, 5, 11, 12); it follows that the work required for erasing *ρ* with certainty is at least *kT* ln 2 log_{2} rank *ρ*.

Observe that our framework is equivalent to allowing the agent to perform a single unital operation on the whole of *X* and *A*, leaving both systems in the state : indeed the composition of unital maps is unital, and extending a unital map by an identity map still yields a unital map.

Even though we have presented our results while hinting that *X* and *X*′ represent the same system, and are thus of the same dimension, this need not be the case: our results are valid for arbitrary finite dimensions of *X* and *X*′. However, we will assume that one can bring in ancillas of arbitrary finite dimension in pure states and dispose of ancillas restored to a pure state for free. Henceforth, we will assume that such ancillas are counted as part of the pure systems composing the work storage systems *A* and *A*′ (The systems *A* and *A*′ hence need not be of same dimension.).

We must in addition require that the physical process implement the logical process . Let |*σ*〉_{XR} be a purification of *σ*_{X} on a system *R*. If one applies the physical process to *X* while leaving *R* untouched, then the state on that results from the physical process must be equal to the state *ρ*_{X′R} that would result by applying the mapping on *σ*_{XR}, that is, . Observe that this constraint is equivalent to requiring the logical mapping corresponding to the physical process to be exactly on the support of *σ*_{X}, due to the Choi-Jamiołkowski isomorphism. So, even with a fixed given input state *σ*_{X}, the full information about the mapping can be observed in the resulting state on , by keeping a purification of *σ*_{X}: in other words, the full information about the mapping and the input state is one-to-one encoded in the bipartite state *ρ*_{X′R}.

Let’s now state a formal version of our problem, in the case where we do not yet consider an -approximation. The task is to find the minimal *kT* ln2·(*λ*_{2}−*λ*_{1}), such that there exists a unital, trace-preserving, map satisfying

where and where an identity mapping on *R* is implicitly understood (We henceforth omit the pure states on system *A*, that is, the factors ‘’ above, for readability.).

At this point, note that whenever for given *λ*_{1}, *λ*_{2}, there is such a unital map, then there is also a subunital map achieving the same logical process and vice versa. Let’s write this as a proposition:

*Proposition II*. Let *λ*_{1}, *λ*_{2}⩾0 and let be given. Then are equivalent

(1) For a large enough *A*, and corresponding *A*′, there exists a trace-preserving unital map such that

(2) For a large enough *B*, and large enough *B*′, there exists a trace-nonincreasing subunital map such that

*Proof.* The forward direction is straightforward, as a unital map is in particular subunital. For the converse, we will dilate the given subunital map to a unital map using Prop. 1, with and : let , and be given by the Proposition. Now define and . We would like to show that , where we have defined and (as pure states, |i〉_{Q} and |f〉_{Q′} do not alter the amount of work stored in the work storage systems *A* and *A*′). Define also the shorthand . By construction, and using (7), we have

Since is trace-preserving, we have tr ()=1 and

as the expression in (11) has unit trace. It follows that lies in the support of , and from (11) we conclude as requested that

We can now characterize the allowed operations in our framework and their work costs with the following proposition.

*Proposition III*. Let *σ*_{X}, be given. Choose system *B* big enough and let be given integers *λ*_{1}, *λ*_{2}⩾0. Then are equivalent:

(1) There exists a trace-nonincreasing subunital map such that

(2) There exists a trace-nonincreasing map , mapping linear operators on to linear operators on , such that , and ;

(3) The map satisfies , where Π_{X} is the projector onto the support of *σ*_{X}.

*Proof.* (i)⇒(ii): Define . Then, . Also, , because is subunital.

(ii)⇒(iii): We have because the maps are equal on the support of *ρ*_{X} (alternatively, operate tr_{R}[(·)*ρ*_{R}^{−1}] on both sides of noting that *ρ*_{R}=*σ*_{R}); then because Π_{X}≤_{X}, we have .

(iii)⇒(i): Let . Observe that is subunital: . Also, , because the input to is inside the support of . Hence, satisfies the conditions of (i).

With these propositions, we can calculate straightforwardly and explicitly the minimization in the formulation of the main problem. It now reduces to the simple question of minimizing *λ*_{2}−*λ*_{1} subject to ; we have thus proven (2).

### Entropic form of the bound

Some basic facts about the smooth entropy framework are necessary to understand the rest of this section. For a more complete introduction on the smooth entropy framework, we refer to (Supplementary Note 3).

An equivalent definition of the Rényi-zero conditional entropy, also known as alternative max-entropy, for a bipartite state *ρ*_{AB}, is given as

where Π_{AB} is the projector on the support of *ρ*_{AB}. For consistency with the standard literature, we will express our final result in terms of the max-entropy, which is related to the Rényi-zero entropy up to factors logarithmic in (ref. 34). The non-smooth conditional max-entropy can be defined as

where is the fidelity between two quantum states^{35}, and where the optimization ranges over density operators on *B*. The smooth conditional max-entropy is defined by ‘smoothing’ the max-entropy on states that are -close to *ρ*_{AB} in fidelity distance:

where the minimization ranges over all such that .

Let’s now return to our bound (2). Consider the Stinespring dilation of , given by an isometry *V*_{X→X′E} including an additional system . Defining the pure state *ρ*_{X′ER}=*Vσ*_{XR}*V*^{†} is obviously compatible with our previous definition of *ρ*_{X′R}, as . It follows that *V*Π_{X}*V*^{†}=Π_{X′E}, where Π_{X′E} is the projector on the support of *ρ*_{X′E}. Recalling (12), we have

and our bound (2) takes the form

### Considering an -approximation

A ‘smooth’ version of the result is straightforward to obtain. In this case, we allow the actual process to not implement precisely , but only approximate it well. The best strategy to detect this inexactness is to prepare |*σ*〉_{XR} and send *σ*_{X} into the process, and then perform a measurement on *ρ*_{X′R}. To ensure that the approximate process is not distinguishable from the ideal process with probability greater than , we require that the trace distance between the ideal output of the process *ρ*_{X′R} and the actual output must not exceed . We can apply our main result to the approximate process that brings σ to , and lower bound the work cost of that process by

where the second inequality is shown in ref. 44 This relaxation of *H*_{0} to *H*_{max} is done for the sake of presentation and consistency with other results within the smooth entropy framework. When smoothing with a parameter , there is no significant difference with this relaxation: indeed, the two quantities are equivalent up to adjustment of the parameter and up to a logarithmic term in (Lemma 18 of ref. 44).

If we optimize (17) over all possible maps that output such , we obtain a bound on the work requirement of the -approximation,

where the first optimization ranges over all such that the trace distance , and where the second optimization ranges over all such that , with , where is the fidelity between the quantum states *ρ* and .^{35}

### Macroscopic limit: many independent repetitions

As we have seen in the introduction, considerable previous work has focused on the limit cases where many i.i.d. systems are provided. In such a case, the process is applied on *n* independent copies of the input , and outputs . A smoothing parameter is chosen freely. We may simply apply our (smoothed) main result to get an expression for our bound on the work cost,

However, it is known that the smooth entropies converge to the von Neumann entropy in the i.i.d. limit^{42},

which allows us to simplify the expression of the work cost per particle, or per repetition of the process, to

where the last equality holds because *ρ*_{EX} and *σ*_{X} have the same spectrum being both purifications of the same *ρ*_{R}=*σ*_{R}. We conclude that in the asymptotic i.i.d. case, the work cost is simply given by the difference of entropy between the initial and final state,

Here *W* is the average work cost per particle, or per repetition of the process. In the case for example of many independent particles undergoing a similar, independent process, the total work *W* required is obtained by considering the entropy of the full system of all particles in both terms in (21).

## Additional information

**How to cite this article**: Faist, P. *et al*. The minimal work cost of information processing. *Nat. Commun.* 6:7669 doi: 10.1038/ncomms8669 (2015).

## References

Jaynes, E. T. Information theory and statistical mechanics.

*Phys. Rev.***106**, 620–630 (1957).Shannon, C. E. A mathematical theory of communication.

*Bell Syst. Tech. J.***27**, 379–423 (1948).Szilard, L. Über die Entropieverminderung in einem thermodynamischen System bei Eingriffen intelligenter Wesen.

*Zeitschrift für Physik***53**, 840–856 (1929).Landauer, R. Irreversibility and heat generation in the computing process.

*IBM J. Res. Dev.***5**, 183–191 (1961).Bennett, C. H. The thermodynamics of computation—a review.

*Int. J. Theor. Phys.***21**, 905–940 (1982).Bennett, C. H. Notes on Landauer's principle, reversible computation and Maxwell's demon.

*Stud. Hist. Phil. Mod. Phys.***34**, 501–510 (2003).Bennett, C. H. Logical reversibility of computation.

*IBM J. Res. Dev.***17**, 525–532 (1973).Shizume, K. Heat generation required by information erasure.

*Phys. Rev. E***52**, 3495–3499 (1995).Maruyama, K., Nori, F. & Vedral, V. Colloquium: the physics of Maxwell's demon and information.

*Rev. Mod. Phys.***81**, 1–23 (2009).Piechocinska, B. Information erasure.

*Phys. Rev. A.***61**, 062314 (2000).Feynman, R. P.

*Lectures on Computation*Westview Press (1996).Leff, H. S. & Rex, A. F.

*Maxwell's Demon 2: entropy, classical and quantum information, computing*Taylor & Francis (2010).Sagawa, T. & Ueda, M. in

*Nonequilibrium Statistical Physics of Small Systems: Fluctuation Relations and Beyond*(eds Klages R., Just W., Jarzynski C., Schuster H. G. 181–211Wiley-VCH (2013).Lloyd, S. Ultimate physical limits to computation.

*Nature***406**, 1047–1054 (2000).Plenio, M. B. & Vitelli, V. The physics of forgetting: Landauer's erasure principle and information theory.

*Contemp. Phys.***42**, 25–60 (2001).Popescu, S., Short, A. J. & Winter, A. Entanglement and the foundations of statistical mechanics.

*Nat. Phys.***2**, 754–758 (2006).Gemmer, J., Michel, M. & Mahler, G.

*Quantum thermodynamics: Emergence of thermodynamic behavior within composite quantum systems*Springer Verlag (2009).Oppenheim, J., Horodecki, M., Horodecki, P. & Horodecki, R. Thermodynamical approach to quantifying quantum correlations.

*Phys. Rev. Lett.***89**, 180402 (2002).Hänggi, P. & Marchesoni, F. Artificial brownian motors: Controlling transport on the nanoscale.

*Rev. Mod. Phys.***81**, 387–442 (2009).Baugh, J., Moussa, O., Ryan, C. A., Nayak, A. & Laflamme, R. Experimental implementation of heat-bath algorithmic cooling using solid-state nuclear magnetic resonance.

*Nature***438**, 470–473 (2005).Alicki, R., Horodecki, M., Horodecki, P. & Horodecki, R. Thermodynamics of quantum information systems—hamiltonian description.

*Open Syst. Inf. Dyn.***11**, 205–217 (2004).Janzing, D.

*Computer Science Approach to Quantum Control*Habilitation, Universität Karlsruhe (2006).Linden, N., Popescu, S. & Skrzypczyk, P. How small can thermal machines be? the smallest possible refrigerator.

*Phys. Rev. Lett.***105**, 130401 (2010).Dahlsten, O. C. O., Renner, R., Rieper, E. & Vedral, V. Inadequacy of von Neumann entropy for characterizing extractable work.

*New J. Phys.***13**, 53015 (2011).del Rio, L., Åberg, J., Renner, R., Dahlsten, O. & Vedral, V. The thermodynamic meaning of negative entropy.

*Nature***474**, 61–63 (2011).Egloff, D., Dahlsten, O. C. O., Renner, R. & Vedral, V. Laws of thermodynamics beyond the von Neumann regime. Preprint at http://arxiv.org/abs/1207.0434 (2012).

Brandão, F. G. S. L., Horodecki, M., Oppenheim, J., Renes, J. M. & Spekkens, R. W. Resource theory of quantum states out of thermal equilibrium.

*Phys. Rev. Lett.***111**, 250404 (2013).Åberg, J. Truly work-like work extraction via a single-shot analysis.

*Nat. Commun.***4**, 1925 (2013).Horodecki, M. & Oppenheim, J. Fundamental limitations for quantum and nanoscale thermodynamics.

*Nat. Commun.***4**, 2059 (2013).Skrzypczyk, P., Short, A. J. & Popescu, S. Extracting work from quantum systems. Preprint at http://arxiv.org/abs/1302.2811 (2013).

Reeb, D. & Wolf, M. M. An improved Landauer principle with finite-size corrections.

*N J. Phys.***16**, 103011 (2014).Brandão, F., Horodecki, M., Ng, N., Oppenheim, J. & Wehner, S. The second laws of quantum thermodynamics.

*Proc. Natl Acad. Sci. USA***112**, 201411728 (2015).Renner, R.

*Security of Quantum Key Distribution*. Ph.D. thesis, ETH Zürich (2005). Security of Quantum Key Distribution. Preprint at http://arxiv.org/abs/quant-ph/0512258 (2005).Tomamichel, M.

*A Framework for Non-Asymptotic Quantum Information Theory*. Ph.D. thesis, ETH Zurich (2012). A Framework for Non-Asymptotic Quantum Information Theory Preprint at http://arxiv.org/abs/1203.2142 (2012).Nielsen, M. A. & Chuang, I. L.

*Quantum Computation and Quantum Information*Cambridge University Press (2000).König, R., Renner, R. & Schaffner, C. The operational meaning of min- and max-entropy.

*IEEE Trans. Inf. Theory***55**, 4337–4347 (2009).Horodecki, M., Oppenheim, J. & Winter, A. Partial quantum information.

*Nature***436**, 673–676 (2005).Sagawa, T. & Ueda, M. Minimal energy cost for thermodynamic information processing: Measurement and information erasure.

*Phys. Rev. Lett.***102**, 250602 (2009).Buscemi, F., Hayashi, M. & Horodecki, M. Global information balance in quantum measurements.

*Phys. Rev. Lett.***100**, 210504 (2008).Jacobs, K. Quantum measurement and the first law of thermodynamics: The energy cost of measurement is the work value of the acquired information.

*Phys. Rev. E***86**, 040106 (2012).Earman, J. & Norton, J. D. Exorcist XIV: The wrath of Maxwell's demon. part II. from Szilard to Landauer and beyond.

*Stud. Hist. Phil. Sci.***30**, 1–40 (1999).Tomamichel, M., Colbeck, R. & Renner, R. A fully quantum asymptotic equipartition property.

*IEEE Trans. Inf. Theory***55**, 5840–5847 (2009).Bhatia, R.

*Matrix Analysis*Springer (1997).Tomamichel, M., Schaffner, C., Smith, A. & Renner, R. Leftover hashing against quantum side information.

*IEEE Trans. Inf. Theory***57**, 5524–5535 (2011).

## Acknowledgements

We thank Johan Åberg, Francesco Buscemi, Lea Krämer Gabriel Joe Renes, Lídia del Rio and Paul Skrzypczyk for discussions. P.F., F.D. and R.R. were supported by the Swiss National Science Foundation (SNSF) through the National Center of Competence in Research ‘Quantum Science and Technology’, through grant No. 200020-135048, and by the European Research Council through grant No. 258932. F.D. was also supported by the SNSF through grants PP00P2-128455 and 20CH21-138799, as well as by the German Science Foundation (grant CH 843/2-1). J.O. is funded by the Royal Society of London. This work was also supported by the COST Action MP1209.

## Author information

### Authors and Affiliations

### Contributions

The main ideas were developed by all authors. P. F. wrote the manuscript.

### Corresponding author

## Ethics declarations

### Competing interests

The authors declare no competing financial interests.

## Supplementary information

### Supplementary Information

Supplementary Figures 1-3, Supplementary Note 1-6 and Supplementary References (PDF 1175 kb)

## Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

## About this article

### Cite this article

Faist, P., Dupuis, F., Oppenheim, J. *et al.* The minimal work cost of information processing.
*Nat Commun* **6**, 7669 (2015). https://doi.org/10.1038/ncomms8669

Received:

Accepted:

Published:

DOI: https://doi.org/10.1038/ncomms8669

## Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.