A general derivation and quantification of the third law of thermodynamics

The most accepted version of the third law of thermodynamics, the unattainability principle, states that no process can reach absolute zero temperature in a finite number of steps and within a finite time. Here, we provide a derivation of the principle that applies to arbitrary cooling processes, even those exploiting the laws of quantum mechanics or involving an infinite-dimensional reservoir. We quantify the resources needed to cool a system to any temperature, and translate these resources into the minimal time or number of steps, by considering the notion of a thermal machine that obeys similar restrictions to universal computers. We generally find that the obtainable temperature can scale as an inverse power of the cooling time. Our results also clarify the connection between two versions of the third law (the unattainability principle and the heat theorem), and place ultimate bounds on the speed at which information can be erased.


A. The system being cooled
We refer to the system to be cooled as "the system". We assume that the system is finite-dimensional, and treat the infinite-dimensional case in Section I G. We denote its energy eigenstates and eigenvalues by $|s\rangle$ and $E_s$, where $s = 1, 2, \ldots, d$, such that the Hamiltonian is
$$H_S = \sum_{s=1}^{d} E_s\, |s\rangle\langle s| .$$
Without loss of generality we set $0 = \min_s E_s$ and define $J = \max_s E_s$. We denote by $g$ the degeneracy of the ground state, and by $\Delta$ the energy gap above it. The initial state of the system $\rho_S$ is arbitrary, with spectral decomposition
$$\rho_S = \sum_{s=1}^{d} \lambda_s\, |\phi_s\rangle\langle\phi_s| .$$
The results that we obtain depend on the density matrix's eigenvalues $\lambda_s$, but not on its eigenstates $|\phi_s\rangle$. In particular, these can be coherent superpositions of energy eigenstates $|s\rangle$, but this does not affect our bounds. Special attention is given to the thermal (or Gibbs) state at temperature $T_S = 1/\beta_S$,
$$\rho_S = \frac{1}{Z_S}\, e^{-\beta_S H_S} .$$
As usual, the normalization factor $Z_S = \mathrm{tr}\, e^{-\beta_S H_S}$ is the partition function. If the final state is thermal, we denote its temperature and partition function by $T'_S$ and $Z'_S$.
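As an illustration of the notation above, the following minimal sketch computes the parameters $d$, $g$, $\Delta$, $J$, $Z_S$, $\lambda_{\min}$ and $\lambda_{\max}$ for a thermal initial state. The function name `system_parameters` and the example spectrum are hypothetical choices made only for this illustration; units are such that Boltzmann's constant equals one.

```python
import math

def system_parameters(energies, temperature, tol=1e-12):
    """Parameters of a Gibbs state for a finite list of energy levels.

    Hypothetical helper for illustration only (not taken from the paper).
    """
    e0 = min(energies)
    levels = sorted(e - e0 for e in energies)      # convention: min_s E_s = 0
    weights = [math.exp(-e / temperature) for e in levels]
    z = sum(weights)                               # partition function Z_S
    g = sum(1 for e in levels if e <= tol)         # ground-state degeneracy
    excited = [e for e in levels if e > tol]
    return {
        "d": len(levels),
        "g": g,
        "gap": excited[0] if excited else None,    # Delta (None if fully degenerate)
        "J": levels[-1],                           # largest energy
        "Z": z,
        "lambda_max": weights[0] / z,              # equals 1 / Z_S
        "lambda_min": weights[-1] / z,             # equals exp(-J / T_S) / Z_S
    }

# Example spectrum (an assumption): g = 2, Delta = 1, J = 2.5.
params = system_parameters([0.0, 0.0, 1.0, 2.5], temperature=0.5)
```

For a thermal state the ratio $\lambda_{\max}/\lambda_{\min}$ equals $e^{J/T_S}$, which the sketch reproduces.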

B. The thermal bath
In order to assist the cooling process there is a bath or reservoir with finite volume $V$, and hence, finite heat capacity $C$. The reason for limiting the volume is that, in finite time, the system can only (fully) interact with a bath of finite volume. At the very least, the Lieb-Robinson bound [5] establishes a limit on the speed at which information propagates within a system with local interactions. Roughly, the time it takes for a system to interact with a bath of volume $V$, in a $D$-dimensional space, is
$$t \geq \frac{1}{v}\, V^{1/D} ,$$
where $v$ is proportional to the "speed of sound" of the bath.
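The scaling $t \geq V^{1/D}/v$ can be turned into a small calculator. The sketch below is only illustrative: the function names are hypothetical, and all order-one constants are set to one, which the rough estimate above does not fix.

```python
def min_interaction_time(volume, dimension, speed):
    """Rough lower bound t >= V**(1/D) / v on the time needed to reach a bath of volume V."""
    if volume <= 0 or dimension <= 0 or speed <= 0:
        raise ValueError("volume, dimension and speed must be positive")
    return volume ** (1.0 / dimension) / speed

def max_bath_volume(time, dimension, speed):
    """Inverse relation: largest volume V <= (v * t)**D reachable within time t."""
    if time < 0 or dimension <= 0 or speed <= 0:
        raise ValueError("time must be non-negative; dimension and speed positive")
    return (speed * time) ** dimension

# Example (assumed numbers): a 3-dimensional bath with v = 2.
t_min = min_interaction_time(volume=1000.0, dimension=3, speed=2.0)
```

Read in the inverse direction, the volume available as a cooling resource grows at most as $t^D$.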
If the volume of the bath is not well defined, then the parameter $V$ can be understood as the number of bosonic or fermionic modes, spins, subsystems, etc. Note that, despite its finite extension, the bath can have an infinite-dimensional Hilbert space. Moreover, the bath can be a quantum field, in which case it has an infinite number of modes. Even so, the spatial finiteness guarantees that the number of levels with energy below any finite value remains finite. In the pathological case where there are bosonic modes with zero energy, we ignore them without affecting the cooling properties of the bath. We show below that baths with infinite-dimensional Hilbert space have more cooling capacity.
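The energy eigenstates of the bath $|b\rangle$ are labeled by $b = 1, 2, \ldots$, and the corresponding energies by $E_b$. We assume that the $E_b$ are increasingly ordered, and that $E_1 = 0$. We assume that the bath is in the thermal or canonical state $\rho_B = \frac{1}{Z_B} e^{-\beta H_B}$, at temperature $T = 1/\beta$, with the normalization factor $Z_B = \mathrm{tr}\, e^{-\beta H_B}$ being the partition function. The free energy of the canonical state at inverse temperature $\beta$ is
$$F_{\rm can}(\beta) = -\frac{1}{\beta} \ln Z_B ,$$
and the density of free energy is
$$f_{\rm can}(\beta) = \frac{1}{V}\, F_{\rm can}(\beta) .$$
The heat capacity of the canonical state is
$$C = \frac{\partial \langle H_B \rangle}{\partial T} = \beta^2 \left( \langle H_B^2 \rangle - \langle H_B \rangle^2 \right) ,$$
where $\langle \cdot \rangle = \mathrm{tr}(\rho_B\, \cdot)$. The number of states with energy at most $E$ is denoted by
$$I(E) = \left| \{ b : E_b \leq E \} \right| .$$
The number of states with energy inside the energy window $(E - \omega, E]$ is denoted by
$$\Omega(E) = I(E) - I(E - \omega) ,$$
and we refer to it as "the density of states". As usual, we fix the width of the energy window to match the width of the energy distribution in the canonical state,
$$\omega = \sqrt{\langle H_B^2 \rangle - \langle H_B \rangle^2} = \frac{\sqrt{C}}{\beta} . \qquad (10)$$
It is often the case that, for large volume $V$, almost all energy levels in the interval $(E - \omega, E]$ are clustered around the upper limit of the interval $E$. This is due to the fast increase of the number of levels as the energy grows (see for example [6] for a proof). When this is the case, the quantity $\Omega(E)$ does not depend much on $\omega$.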
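To make these definitions concrete, the following sketch evaluates $Z_B$, $C$, $\omega$, $I(E)$ and $\Omega(E)$ for a toy bath. The toy model (a collection of $n$ non-interacting two-level systems with unit gap, so that level $k$ has degeneracy $\binom{n}{k}$) and all function names are assumptions made for illustration; the text above does not single out any particular bath.

```python
import math

def spin_bath_levels(n):
    """(energy, degeneracy) pairs for n unit-gap two-level systems (assumed toy bath)."""
    return [(k, math.comb(n, k)) for k in range(n + 1)]

def canonical_stats(levels, beta):
    """Partition function Z_B, heat capacity C and window width omega = sqrt(C)/beta."""
    z = sum(g * math.exp(-beta * e) for e, g in levels)
    mean = sum(g * e * math.exp(-beta * e) for e, g in levels) / z
    mean_sq = sum(g * e * e * math.exp(-beta * e) for e, g in levels) / z
    heat_capacity = beta ** 2 * (mean_sq - mean ** 2)
    omega = math.sqrt(heat_capacity) / beta
    return z, heat_capacity, omega

def count_states(levels, energy):
    """I(E): number of states with energy at most E."""
    return sum(g for e, g in levels if e <= energy)

def density_of_states(levels, energy, omega):
    """Omega(E) = I(E) - I(E - omega): states inside the window (E - omega, E]."""
    return count_states(levels, energy) - count_states(levels, energy - omega)

levels = spin_bath_levels(20)
z_b, c_b, omega = canonical_stats(levels, beta=1.0)
omega_at_10 = density_of_states(levels, 10, omega)
```

For this toy bath the closed forms $Z_B = (1 + e^{-\beta})^n$ and $C = n \beta^2 e^{-\beta}/(1 + e^{-\beta})^2$ provide a check of the numerical values.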
The logarithm of the density of states is the micro-canonical entropy at energy $E$,
$$S(E) = \ln \Omega(E) .$$
It is often the case that, in the large volume limit, micro-canonical and canonical entropies become equal. This is proven in [6] for the case of local Hamiltonians. The function $S(E)$ is discontinuous, but it is usually the case that, for sufficiently large $V$, the relative size of its discontinuities is very small, and the quotient $S(E)/V$ tends to a smooth function of the energy density $E/V$ (see [6]). However, here we do not make any smoothness assumption, and hence use the discrete derivatives $S'(E)$ and $S''(E)$. The temperature of the micro-canonical state with energy $E$ is given by
$$\frac{1}{T_{\rm mic}(E)} = S'(E) ,$$
and the corresponding heat capacity is
$$C_{\rm mic}(E) = -\frac{[S'(E)]^2}{S''(E)} . \qquad (15)$$
All reasonable systems (black holes excluded) have a positive heat capacity, and a negative one would allow any system to be cooled to absolute zero, violating the third law. The finiteness of the heat capacity is a consequence of the finite volume of the bath. The density of free energy for the micro-canonical state at inverse temperature $\beta_0$ is
$$f_{\rm mic}(\beta_0) = \frac{1}{V} \left[ E_0 - \frac{1}{\beta}\, S(E_0) \right] ,$$
where $E_0$ is the solution of $S'(E_0) = \beta_0$. Note the difference between the background inverse temperature $\beta$ and the one defining the state, $\beta_0$.
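The micro-canonical quantities can be explored numerically. The sketch below is a hedged illustration: it assumes a toy bath of $n$ unit-gap two-level systems, and it assumes backward finite differences with step $\omega$ as the discrete derivatives, which may differ from the precise definition used in this work. All function names are hypothetical.

```python
import math

def entropy(n, energy, omega):
    """S(E) = ln Omega(E) for n unit-gap two-level systems; window (E - omega, E]."""
    lo = math.floor(energy - omega) + 1      # smallest integer level strictly above E - omega
    hi = math.floor(energy)                  # largest integer level not above E
    count = sum(math.comb(n, k) for k in range(max(lo, 0), min(hi, n) + 1))
    return math.log(count)                   # raises ValueError if the window is empty

def entropy_slope(n, energy, omega):
    """Assumed discrete first derivative S'(E): backward difference with step omega."""
    return (entropy(n, energy, omega) - entropy(n, energy - omega, omega)) / omega

def entropy_curvature(n, energy, omega):
    """Assumed discrete second derivative S''(E)."""
    return (entropy_slope(n, energy, omega) - entropy_slope(n, energy - omega, omega)) / omega

def micro_heat_capacity(n, energy, omega):
    """C_mic(E) = -S'(E)**2 / S''(E)."""
    return -entropy_slope(n, energy, omega) ** 2 / entropy_curvature(n, energy, omega)

# Example (assumed numbers): n = 200 spins, window width 2, energy below half filling.
slope = entropy_slope(200, 60, 2.0)          # inverse micro-canonical temperature
curvature = entropy_curvature(200, 60, 2.0)
c_mic = micro_heat_capacity(200, 60, 2.0)
```

Below half filling this toy bath has $S'(E) > 0$ and $S''(E) < 0$, hence a positive micro-canonical heat capacity, as required of the baths considered here.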

C. The cooling process
The cooling process consists of a joint transformation of system, bath and weight. We follow [4] in that the work storage device is modeled by a weight with Hamiltonian
$$H_W = \int_{\mathbb{R}} dw\; w\, |w\rangle\langle w| ,$$
where the orthonormal basis $\{ |w\rangle : w \in \mathbb{R} \}$ corresponds to the position of the weight. The Hermitian operator $\Pi$ generates the translations of the energy eigenbasis, $|w\rangle \mapsto |w + x\rangle$, for all $w, x \in \mathbb{R}$.
In recent approaches to nano-thermodynamics [1][2][3], the work consumed or generated in a process is constrained to have small fluctuations around its mean value. Here, however, we want to be maximally general, so we allow the work consumed to fluctuate arbitrarily. Our weight is therefore an ideal source or sink of work, which can be used to simulate any other work storage device appearing in the literature.
As mentioned in the main text, to transform cooling bounds in terms of work into bounds in terms of time, we can assume that injecting work into the system or the bath requires an amount of time that grows with the amount of work injected. Hence, since we require the transformation to be implemented within a certain given time, we restrict the worst-case work consumed to be at most a given constant $w_{\max}$. We stress that the average work is not restricted, and in particular, it can be larger than the free energy difference of the transformation, which breaks thermodynamic reversibility.
Abstractly, a cooling process is characterized by a completely positive and trace-preserving map satisfying the following requirements:
1. Microscopic reversibility: the map is implemented by a unitary operator $U$ jointly acting on the Hilbert spaces of system, bath and weight.
2. Conservation of total energy: $[U, H_S + H_B + H_W] = 0$.
3. Independence of the weight's "position": $[U, \Pi] = 0$. (23)
4. Bound on worst-case work transferred: the work exchanged with the weight is at most $w_{\max}$, for every initial position $w \in \mathbb{R}$ of the weight. (24)
The first point follows from the fact that closed systems evolve according to the Schrödinger Equation.
The second point expresses the conservation of energy (First Law of Thermodynamics). As noted in the main section, this is not a restrictive assumption, and just ensures that we account for all sources of energy.
The third point ensures that the weight is only used as a source of work, and not, for instance, as an entropy dump [4]. More concretely, in Section II A it is proven that any $U$ satisfying (23) induces a map on system and bath which is a mixture of unitaries:
$$\Gamma_{SB}(\rho_{SB}) = \int dw\; p(w)\, u_w\, \rho_{SB}\, u_w^\dagger ,$$
where $p(w)$ is a probability measure. This prevents the entropy of system and bath from decreasing (Second Law of Thermodynamics). Essentially, for the work system, all that should matter is how much work is delivered to our system: energy differences matter, but the zero of the energy should not. The energy that is subtracted from or added to the weight is in general not fixed; it fluctuates depending on the micro-state of system and bath. However, we assume that in finite time, the worst-case work fluctuation is bounded by a given value $w_{\max}$. This is encapsulated in point four. We assume that the variable $w_{\max}$ can increase with the time invested in the transformation.

D. Summary of the Argument
In this subsection we summarize the derivation of the quantitative third law by presenting the steps of the proof as a series of results. The proofs are given in Section II.
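We start by showing that the action of $U$ on system and bath (when the weight is traced out) cannot reduce their entropy. In particular, this prevents the weight from being used as an entropy dump.

Result 1
The map on system and bath induced by $U$ is a mixture of unitaries,
$$\Gamma_{SB}(\rho_{SB}) = \int dw\; p(w)\, u_w\, \rho_{SB}\, u_w^\dagger ,$$
where $p(w)$ is a probability measure and $u_w$ are unitaries.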
As a result, the action of the cooling process on the system and bath is at best unitary (which preserves the von Neumann entropy), and possibly entropy increasing if it is a mixture of unitaries. As we will show in Section II A, the unitaries $u_w$ depend on the global unitary $U$, but not on the state of the weight $\rho_W$. The error of the cooling process is quantified by the probability of not being in the ground space,
$$\epsilon = 1 - \mathrm{tr}\left[ P\, \Gamma_S(\rho_S) \right] ,$$
where $P$ is the projector onto the ground space of $H_S$ and $\Gamma_S$ is the action of the cooling process on the system,
$$\Gamma_S(\rho_S) = \mathrm{tr}_B\, \Gamma_{SB}(\rho_S \otimes \rho_B) .$$
For the following result it is useful to define
$$\xi = w_{\max} + J + T \ln \frac{\lambda_{\max}}{\lambda_{\min}} , \qquad (29)$$
where $\lambda_{\max}$, $\lambda_{\min}$ are the largest and smallest eigenvalues of $\rho_S$.

Result 2
The error $\epsilon$ made when cooling a system to absolute zero satisfies
$$\epsilon \geq \frac{\lambda_{\min}}{Z_B}\, e^{-\beta (E_0 + \omega)} \left[ d\, \Omega(E_0 + \omega) - g\, \Omega(E_0 + \xi + \omega) \right] , \qquad (30)$$
where $E_0$ is the smallest energy violating
$$d\, I(E) \leq g\, I(E + \xi) . \qquad (31)$$
Recall that $\omega$ is defined in (10) and $\xi$ in (29). The dependence of (30) and (29) on the smallest eigenvalue $\lambda_{\min}$ makes $\epsilon$ a discontinuous function of the state $\rho_S$, which is unphysical. In Section I G we apply a standard smoothing technique to make $\epsilon$ a continuous function of the state. This also allows us to adapt our results to infinite-dimensional systems ($d \to \infty$).
The above bound on $\epsilon$ is valid in full generality. However, solving equation (31) is in general difficult. Next we assume the positivity of the (micro-canonical) heat capacity and derive a more usable bound.

Result 3
The error $\epsilon$ made when cooling a system to absolute zero satisfies the bound (32), where $E_0$ is the (unique) solution of equation (33), provided that conditions (34) and (35) hold for all $E$.
Our bound (32) depends implicitly on the two parameters $V$ and $w_{\max}$, which quantify the amount of resources. Bound (32) is valid only in the range of parameters ($V$ and $w_{\max}$) satisfying condition (34). However, this regime includes the late-time situations we are interested in, since we can take the volume of the bath $V$ sufficiently large, and $C_{\rm can}(V)$ grows at least linearly with the volume. An important consequence of Result 3 is the following. The faster $\Omega(E)$ or $S(E)$ grow, the slower $S'(E)$ decreases and the larger the solution $E_0$ of equation (33) is. A large $E_0$ in (32) gives a smaller bound for $\epsilon$. As we will see in Result 6, baths with faster $\Omega$-growth allow for better cooling. In fact, when this growth is exponential or faster, perfect cooling ($\epsilon = 0$) can be achieved with finite $w_{\max}$ (see Section I H). However, these cases correspond to negative heat capacity, which is unphysical.

Result 4
If the final state is thermal, then its temperature $T'_S$ satisfies a lower bound determined by $\epsilon$ and $\Delta$, where $\epsilon$ is the probability of the system not being in the ground state (32), and $\Delta$ is the energy of the first excited state above the ground space of the system.
The following result applies Result 3 to the case where the initial state of the system is thermal at temperature $T_S$, in which case we have $\lambda_{\max}/\lambda_{\min} = e^{J/T_S}$. It also uses Result 4 to translate the error probability $\epsilon$ into the final temperature of the system $T'_S$.

Result 5
If the microcanonical entropy of the bath scales as then, the final temperature of the system cannot be lower than where only leading terms in V are considered, and we take the regime given by Equations (34) and (35).
This result appears in the main section, as Equation (7). Here we can observe the very natural fact that the smaller the initial temperature $T_S$ is, the smaller the final temperature $T'_S$ becomes. The above result is already a third law, in the sense that it places a limitation on the temperature which is achievable, given a restriction on the resources $V$ and $w_{\max}$. Also note that, in most known types of bath, when $V$ is large we have $f_{\rm mic}(\beta_0) \approx f_{\rm can}(\beta_0)$, and no distinction between the two free energy densities is necessary.
In what follows we consider a family of entropy functions for the bath. In particular, this family contains the entropy of a box of electromagnetic radiation in $D$ spatial dimensions and volume $V$ (in the large $V$ limit). This result illustrates that the faster the entropy grows (larger $\nu > 0$), the lower the achievable temperature.

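Result 6
If the entropy function of the bath is
$$S(E) = \alpha\, V^{1-\nu} E^{\nu} , \qquad (40)$$
with $\alpha$ a positive constant and $\nu \in [1/2, 1)$, then equation (39) takes an explicit form, up to leading terms in $V$ and $\xi$.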
The largest work fluctuation $w_{\max}$ and the volume of the bath $V$ are resources that we associate with the time invested in the cooling process. The larger these quantities are, the lower the final temperature can be. In the following result we express the lowest achievable temperature in terms of time. To obtain a simpler expression, we suppress all constant terms.

Result 7
If the entropy function is (40) and up to leading terms.

E. Cooling processes with non-constant H S
Our previous results only apply to the case where the Hamiltonian of the system is kept constant during the cooling process. However, our results can be easily adapted to the case where the Hamiltonian changes.
It is straightforward to check that, if the parameters $g$ and $\Delta$ appearing in our formulae are those of the final Hamiltonian, then Results 2, 3, 4, 5, 6 and 7 hold for processes with non-constant $H_S$.

F. Cooling processes which discard part of the system
A well known example of this type of process is evaporative cooling, in which the cooled final system contains only a fraction of the atoms of the initial system. In general, we write the Hilbert space of the initial system $\mathcal{H}_S$ as the product of the final system $\mathcal{H}'_S$ times the discarded part $\mathcal{H}''_S$, that is $\mathcal{H}_S = \mathcal{H}'_S \otimes \mathcal{H}''_S$. This translates into a relation between the respective dimensions of these Hilbert spaces, $d = d' d''$.
Now, we can repeat the argument that led to Results 1-7, but replacing $\mathcal{H}^0_S$ by our new target subspace $\mathcal{H}'^{\,0}_S \otimes \mathcal{H}''_S$, where $\mathcal{H}'^{\,0}_S$ is the ground space of $H'_S$. Hence, if $P'$ is the projector onto $\mathcal{H}'^{\,0}_S$ then the projector onto the target subspace is $P = P' \otimes \mathbb{1}''_S$, where $\mathbb{1}''_S$ is the identity on $\mathcal{H}''_S$. Also, if $g$ and $g'$ are the ranks of $P$ and $P'$, respectively, then we have the relation $g = g' d''$. Now we apply our results to the subspace $P$. We just have to keep in mind that $d$, $g$, $J$, $\lambda_{\min}$, $\lambda_{\max}$, $T_S$ refer to the initial system (before the partial trace), and $\Delta$, $T'_S$ refer to the final system. Also note that the dependence of our bounds on $d$ and $g$ is via $d/g$, hence there is no difference when using either $d/g$ or $d'/g'$.

G. Continuity and infinite-dimensional systems
Some of our bounds on the error $\epsilon$ and the temperature $T'_S$ depend on the lowest eigenvalue $\lambda_{\min}$, which makes them a discontinuous function of the initial state $\rho_S$. For example, Result 3 provides a lower bound for $\epsilon$ which tends to zero as $\lambda_{\min}$ tends to zero. Hence, for very small $\lambda_{\min}$ the result is useless.
One way to fix this problem is by truncating the Hilbert space of the system $\mathcal{H}_S$ such that the smallest eigenvalues of $\rho_S$ are eliminated. After truncating $\mathcal{H}_S$, the system's parameters $\lambda_{\min}$, $J$ and $d$ may take new values. The physical meaning of truncation is that the truncated subspace is assumed to be mapped to the ground space without interfering with the map of the untruncated subspace. Hence, the truncated subspace does not contribute to the error $\epsilon$.
Ideally, one should optimize over all possible truncations, until the lower bounds for $\epsilon$ and $T'_S$ are maximal. The truncation always has an optimal non-trivial point, because if one truncates everything except for a 1-dimensional subspace then $\epsilon = 0$, which cannot be maximal. Hence, there is an optimal truncation dimension $d'$ in the interval $1 < d' \leq d$ for which $\epsilon$ and $T'_S$ are maximal.
This method can also be used to apply all our results to infinite-dimensional systems. Any finite-dimensional truncation gives a strictly positive $\lambda_{\min}$ and finite $J$ and $d'$, which provide a non-trivial bound when substituted in any of our results. Now, let us apply the truncation method to a harmonic oscillator with energy levels $E_s = s$ for $s = 1, 2, \ldots$, with the initial state being a thermal state at the same temperature as the bath, $T_S = T$. This system suffers from the two problems mentioned above: $\lambda_{\min} = 0$ and $d = \infty$. We solve these problems by truncating out all energy levels $s > d'$, and substituting the resulting parameters in (32). Setting $d' = 2$ gives a non-trivial bound. However, to obtain the optimal value of $d'$, one needs to jointly optimize the two resulting equations.
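The trade-off behind the choice of $d'$ can be made explicit with a short numerical sketch. It is only an illustration under stated assumptions: the eigenvalues are those of the untruncated, normalized thermal state, $\lambda_s = (e^{\beta} - 1)\, e^{-\beta s}$, and $J$ is taken to be the largest retained energy $E_{d'} = d'$ with the labeling $E_s = s$ used above. The function name is hypothetical.

```python
import math

def truncated_oscillator(beta, d_trunc):
    """Parameters of a thermal oscillator state (E_s = s, s >= 1) truncated to s <= d_trunc."""
    if d_trunc < 1:
        raise ValueError("d_trunc must be at least 1")
    inv_z = math.expm1(beta)     # 1/Z, since Z = sum_{s>=1} exp(-beta*s) = 1/(e^beta - 1)
    lam = [inv_z * math.exp(-beta * s) for s in range(1, d_trunc + 1)]
    return {
        "d": d_trunc,
        "J": float(d_trunc),                     # assumption: largest retained energy
        "lambda_max": lam[0],                    # equals 1 - exp(-beta)
        "lambda_min": lam[-1],
        "kept": sum(lam),                        # probability inside the truncation
        "discarded": math.exp(-beta * d_trunc),  # probability of the levels s > d_trunc
    }

# Scan a few truncation dimensions at beta = 1 (assumed value).
scan = [truncated_oscillator(1.0, d) for d in (2, 4, 8)]
```

Increasing $d'$ lowers the discarded probability but also lowers $\lambda_{\min}$ and raises $J$, which is why the optimal $d'$ is found at an intermediate value.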

H. Negative heat capacity allows perfect cooling
In this subsection we show that when the heat capacity of the bath is negative for all $E$, perfect cooling to absolute zero ($\epsilon = 0$) with finite $w_{\max}$ is possible. Using relation (15) we see that the violation of the above conditions implies $S''(E) \geq 0$, which in turn implies that $S(E)$ grows at least linearly, or that $\Omega(E)$ grows at least exponentially. First note that the integral $I(E)$ never grows slower than $\Omega(E)$. Hence $I(E)$ is also exponential or super-exponential. This implies that for any value of $d/g$ there is a sufficiently large energy $w_{\max}$ such that condition (49) holds for all $E$. Therefore the smallest energy violating (49) is $E_0 = \infty$, which, when substituted in (30), gives $\epsilon = 0$. In other words, the whole space $\mathcal{H}_S \otimes \mathcal{H}_B$ can be mapped into the ground space $\mathcal{H}^0_S \otimes \mathcal{H}_B$. There is no unattainability principle.

II. SUPPLEMENTARY METHODS
A. The weight as a work storage system (Result 1)
In this subsection we show that, if the global transformation $U$ commutes with the translations on the weight, then the effect of $U$ on system and bath is a mixture of unitaries. Hence, it can never decrease the entropy of system and bath, which amounts to a statement of the second law of thermodynamics.
Let $U$ be a unitary acting on system, bath and weight, $\mathcal{H}_{SB} \otimes \mathcal{H}_W$, which commutes with the translations on the weight, $[U, \mathbb{1}_{SB} \otimes \Pi] = 0$. This implies that $U$ can be expanded in terms of a family of operators $A_x$ acting on $\mathcal{H}_{SB}$. Imposing unitarity constrains this family through relation (52), which involves the Dirac delta distribution $\delta(y)$. Now, let us work out the structure of the reduced map on system and bath. It involves $p(k) = \langle k | \sigma | k \rangle$, which is a probability measure, where $|k\rangle$ is the eigenstate $\Pi |k\rangle = k |k\rangle$ for all $k \in \mathbb{R}$. Also, note that, in general, the map $\Gamma$ depends on the state of the weight $\sigma$.
If we define a suitable family of operators $u_k$ in terms of the $A_x$, then we can write the reduced map as
$$\Gamma(\rho_{SB}) = \int dk\; p(k)\, u_k\, \rho_{SB}\, u_k^\dagger . \qquad (55)$$
The operators $u_k$ are unitary, as follows from (52). In summary, for any initial state of the weight $\sigma$, the reduced map (55) is a mixture of unitaries. Interestingly, the set of unitaries $u_k$ is independent of $\sigma$, but the probability measure $p(k)$ does depend on $\sigma$.
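The fact that a mixture of unitaries cannot decrease the entropy can be checked numerically in a simple special case. The sketch below restricts to states that are diagonal in a fixed basis and to unitaries that are permutation matrices, so that the map acts on probability vectors and the von Neumann entropy reduces to the Shannon entropy. This restriction, the function names and the example numbers are assumptions made for illustration; the general statement follows from the concavity of the von Neumann entropy.

```python
import math

def shannon_entropy(p):
    """Entropy -sum_i p_i ln p_i of a probability vector (0 ln 0 taken as 0)."""
    return -sum(x * math.log(x) for x in p if x > 0)

def apply_mixture(p, permutations, weights):
    """Apply sum_k weights[k] * (permutation k) to the probability vector p."""
    out = [0.0] * len(p)
    for perm, w in zip(permutations, weights):
        for i, j in enumerate(perm):
            out[j] += w * p[i]           # permutation k sends level i to level j
    return out

p = [0.7, 0.2, 0.1, 0.0]                                # eigenvalues of a diagonal state
perms = [(0, 1, 2, 3), (1, 0, 3, 2), (3, 2, 1, 0)]      # three permutation "unitaries"
weights = [0.5, 0.3, 0.2]                               # probabilities p(k) of the mixture
q = apply_mixture(p, perms, weights)
```

A single permutation leaves the entropy unchanged, while a non-trivial mixture, as in this example, strictly increases it.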

B. Optimal cooling (Result 2)
Here, we formalize the intuition described in Figure 1 for quantifying the probability of all the states which cannot be mapped to the ground space.
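We define the subspaces of the bath with energy lower than a given value $E$, and recall that the number of bath levels with energy at most $E$ is $I(E)$. The optimal cooling unitary $u$ is the one that maps the largest amount of probability from $\rho_S \otimes \rho_B$ to the ground space $\mathcal{H}^0_S \otimes \mathcal{H}_B$. Hence, it is useful to denote the subspace corresponding to the largest eigenvalues of $\rho_S \otimes \rho_B$ by
$$Q_X = \mathrm{span} \left\{ |\phi_s\rangle \otimes |b\rangle : \lambda_s\, e^{-\beta E_b} \geq e^{-\beta X} \right\} ,$$
where $X$ is a convenient way to parametrize the probability, and we have used that the eigenvalues of $\rho_S \otimes \rho_B$ are $\lambda_s e^{-\beta E_b} / Z_B$. The dimension of this "large probability" subspace satisfies
$$\dim Q_X = \sum_s I(X + T \ln \lambda_s) \geq d\, I(X + T \ln \lambda_{\min}) ,$$
where $\lambda_{\min} = \min_s \lambda_s$. Using $J = \max_s E_s$ and (62) we obtain that every state in $Q_X$ has energy at most $X + J + T \ln \lambda_{\max}$, where $\lambda_{\max} = \max_s \lambda_s$. Assumption "Bound on worst-case work transferred", stated in (24), limits the energy gained in the process to $w_{\max}$. Combining the two above statements, the image of $Q_X$ has energy at most $X + w_0$ (66), where
$$w_0 = w_{\max} + J + T \ln \lambda_{\max} .$$
Note that, if $Q_X$ is (fully) mapped into the ground space $\mathcal{H}^0_S \otimes \mathcal{H}_B$, then (66) implies $\dim Q_X \leq g\, I(X + w_0)$, which in turn implies (substituting (63) and (61))
$$d\, I(X + T \ln \lambda_{\min}) \leq g\, I(X + w_0) . \qquad (69)$$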
For the subsequent analysis it is convenient to define the threshold value $X_0$, which is the infimum of the $X$'s violating (69). With this definition, we write the decomposition of the total space into the orthogonal subspaces $Q_{X_0+\omega}$ and $Q^{\perp}_{X_0+\omega}$. In order to obtain an upper bound for the amount of probability that can be mapped from $\rho_S \otimes \rho_B$ to the ground space, we assume that there is no constraint on where $Q^{\perp}_{X_0+\omega}$ is mapped, and that the only constraint on the image of $Q_{X_0+\omega}$ is (66). However, the definition of $X_0$ via (69) prevents mapping all of $Q_{X_0+\omega}$ into the ground space. Clearly, the optimum is to map the subspace of $Q_{X_0+\omega}$ containing the largest eigenvalues of $\rho_S \otimes \rho_B$ to the ground space. The complement $\mathcal{W}$ of this subspace cannot be mapped into the ground space. Also, we know that its dimension satisfies
$$\dim \mathcal{W} \geq d\, I(X_0 + T \ln \lambda_{\min} + \omega) - g\, I(X_0 + w_0 + \omega)$$
$$= d\, \Omega(X_0 + T \ln \lambda_{\min} + \omega) - g\, \Omega(X_0 + w_0 + \omega) + d\, I(X_0 + T \ln \lambda_{\min}) - g\, I(X_0 + w_0)$$
$$\geq d\, \Omega(X_0 + T \ln \lambda_{\min} + \omega) - g\, \Omega(X_0 + w_0 + \omega) ,$$
where the equality uses $I(E + \omega) = \Omega(E + \omega) + I(E)$ and in the last inequality we have used the definition of $X_0$ via (69). The subspace $\mathcal{W}$ is mapped into the complement of the ground space, and hence, it contributes to the error $\epsilon$. Our lower bound for $\epsilon$ is obtained by only taking into account this contribution. Equation (62) tells us that the smallest eigenvalue in $Q_{X_0+\omega}$, and hence in $\mathcal{W}$, is not smaller than $\frac{1}{Z_B} e^{-\beta (X_0 + \omega)}$. Therefore, we can bound $\epsilon$ by the product of this number with the dimension of $\mathcal{W}$,
$$\epsilon \geq \frac{\lambda_{\min}}{Z_B}\, e^{-\beta (E_0 + \omega)} \left[ d\, \Omega(E_0 + \omega) - g\, \Omega(E_0 + \xi + \omega) \right] , \qquad (72)$$
where we have used the definitions $E_0 = X_0 + T \ln \lambda_{\min}$ and $\xi = w_0 - T \ln \lambda_{\min}$. In these new variables, $E = E_0$ is the smallest energy violating
$$d\, I(E) \leq g\, I(E + \xi) . \qquad (75)$$

C. Simpler bound for the error (Result 3)
In this subsection we derive an upper bound $E_1$ for $E_0$ which is easier to obtain than solving (75). This bound $E_1$ is used to write a lower bound for $\epsilon$ that is simpler than (72). We start by assuming that $E_1$ satisfies condition (77), and later we prove that this assumption reduces to premise (34). On the other hand, premise (35) guarantees property (78), which is used below.
Taylor's theorem implies that for any pair $E, \xi > 0$ there is $\xi^* \in [0, \xi]$ such that
$$S(E + \xi) = S(E) + S'(E)\, \xi + \frac{1}{2}\, S''(E + \xi^*)\, \xi^2 .$$
This and (78) imply $S(E + \xi) \leq S(E) + S'(E)\, \xi$. Assumption (77) then yields the upper bound (81). We can also write the lower bound
$$I(E) \geq \Omega(E) = e^{S(E)} . \qquad (82)$$
Substituting bounds (81) and (82) in (75) we obtain
$$d \leq g\, \frac{e^{S'(E)\, \xi}}{1 - e^{-S'(E)\, \omega}} . \qquad (83)$$
Substituting $\omega = \sqrt{C}/\beta$ and using assumption (77) we obtain $1 - e^{-S'(E) \sqrt{C}/\beta} > \frac{2}{3}$, which allows us to write (83) as
$$d \leq \frac{3}{2}\, g\, e^{S'(E)\, \xi} . \qquad (84)$$
We define $E_1$ to be the infimum value of $E$ satisfying condition (85). If $S'(E)$ is strictly monotonic, then equation (85) has a unique solution $E_1$. Let us show that $S'(E)$ is strictly monotonic. Equation (85) implies $S'(E_1) > 0$, which together with the finiteness of the micro-canonical heat capacity (35) forces $S''(E_1) < 0$. This in turn implies that $S'(E)$ is strictly decreasing around $E_1$.

D. Relationship between error and temperature (Result 4)
In this subsection we assume that the final state is thermal at temperature $T'_S$ and has partition function $Z'_S = Z_S(T'_S)$. Because of our convention $\min_s E_s = 0$, we have
$$\epsilon = 1 - \frac{g}{Z'_S} . \qquad (87)$$
In principle, the function $Z_S(T'_S)$ can be inverted, and the bound for $\epsilon$ (32) can be transformed into a bound for $T'_S$. However, this is in general a hard task. In what follows we obtain a general relation between $Z'_S$ and $T'_S$ which avoids having to invert $Z_S(T'_S)$. For any Hamiltonian $H_S$ following the convention $\min_s E_s = 0$, the partition function obeys the general bound (88). Combining this with (87) we obtain (89) or, equivalently, (90).
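The translation between error and temperature can be illustrated with an elementary bound. The sketch below uses $Z'_S \leq g + (d - g)\, e^{-\Delta/T'_S}$, which holds because every excited level has energy at least $\Delta$, together with $\epsilon = 1 - g/Z'_S$, to obtain $T'_S \geq \Delta / \ln[(d - g)(1 - \epsilon)/(g\, \epsilon)]$. This particular inequality is an assumption chosen for illustration and is not necessarily the form used in the derivation above; the function names are hypothetical.

```python
import math

def thermal_error(energies, temperature, tol=1e-12):
    """epsilon = 1 - g/Z for a Gibbs state, with the convention min_s E_s = 0."""
    e0 = min(energies)
    shifted = [e - e0 for e in energies]
    z = sum(math.exp(-e / temperature) for e in shifted)
    g = sum(1 for e in shifted if e <= tol)
    return 1.0 - g / z

def temperature_lower_bound(eps, d, g, gap):
    """Lower bound gap / ln[(d - g)(1 - eps) / (g eps)] on the temperature of a thermal state."""
    ratio = (d - g) * (1.0 - eps) / (g * eps)
    if ratio <= 1.0:
        raise ValueError("eps is not reachable by a thermal state at finite temperature")
    return gap / math.log(ratio)

# Example spectrum (an assumption): d = 4, g = 2, Delta = 1, final temperature 0.5.
levels = [0.0, 0.0, 1.0, 2.5]
eps = thermal_error(levels, temperature=0.5)
bound = temperature_lower_bound(eps, d=4, g=2, gap=1.0)
```

The bound is saturated when all excited levels sit exactly at $\Delta$, for instance for a two-level system, and is strictly below the true temperature otherwise.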