No second law of entanglement manipulation after all

Many fruitful analogies have emerged between the theories of quantum entanglement and thermodynamics, motivating the pursuit of an axiomatic description of entanglement akin to the laws of thermodynamics. A long-standing open problem has been to establish a true second law of entanglement, and in particular a unique function that governs all transformations between entangled systems, mirroring the role of entropy in thermodynamics. Contrary to previous promising evidence, here we show that this is impossible and that no direct counterpart to the second law of thermodynamics can be established. This is accomplished by demonstrating the irreversibility of entanglement theory from first principles. Assuming only the most general microscopic physical constraints of entanglement manipulation, we show that entanglement theory is irreversible under all non-entangling transformations. We furthermore rule out reversibility without significant entanglement expenditure, showing that reversible entanglement transformations require the generation of macroscopically large amounts of entanglement according to certain measures. Our results not only reveal fundamental differences between quantum entanglement transformations and thermodynamic processes, but also showcase a unique property of entanglement that distinguishes it from other known quantum resources. A formal analysis of the physical limits of entanglement manipulation shows that it cannot be done reversibly, highlighting an important difference from thermodynamics.

as communication [8][9][10], computation [11], and cryptography [12]. The parallel with thermodynamics prompted a debate concerning the axiomatisation of entanglement theory [13][14][15][16] and the possible emergence of a single entanglement measure, akin to entropy, which would govern all entanglement transformations and establish the reversibility of this resource [13,16-18]. Although later results suggested that entanglement may often be quite different from thermodynamics, even exhibiting irreversibility in some of the most practically relevant settings [19,20], hope persisted for an axiomatic framework for entanglement manipulation that would exactly mirror thermodynamic properties. Notably, identifying a unique entropic measure of entanglement was long known to be possible for the special case of pure states [13,21], and several proposals for general reversible frameworks have been formulated [16,22,23]. The seminal work of Brandão and Plenio [23,24] then provided further evidence in this direction by showing that reversible manipulation may [25] be possible when the physical restrictions governing entanglement transformations are suitably relaxed. These findings strengthened the belief that a fully reversible and physically consistent theory of entanglement could be established.
Here, however, we prove a general no-go result which shows that entanglement theory is fundamentally irreversible. Equivalently, we show from first principles that entanglement transformations cannot be governed by a single measure, and that an axiomatic second law of entanglement manipulation cannot be established.
Our sole assumption is that entanglement manipulation by separated parties should be accomplished by means of operations that make the theory fully consistent, namely, operations that never transform an unentangled system into an entangled one. This can be thought of as the analogue in the entanglement setting of the Kelvin-Planck statement of the second law, which in classical thermodynamics forbids the creation of resources (work) from objects which are not resourceful themselves (a single heat bath) [2,26]. By imposing only this requirement, we dispense with the need to make any assumptions about the structure of the considered processes: for example, we do not even posit that all intermediate transformations obey the laws of standard quantum mechanics, as previous works implicitly did. Instead, we only look at the initial and final states of the system, and demand that no resource, in this case entanglement, is generated in the overall transformation. This philosophy, hereafter termed axiomatic, is analogous to that followed by the pioneers of thermodynamics, and more recently by Lieb and Yngvason [5], to establish truly universal versions of the second law. Such a general approach allows us to preclude the reversibility of entanglement under all physically motivated manipulation protocols.
Importantly, however, our conclusions remain unaffected even when the above assumptions are substantially relaxed. It is natural to ask whether irreversibility could be avoided with just a small amount of generated entanglement, restoring the hope for reversible transformations in practice. We disprove such a possibility by strengthening our result to show that, with a suitable choice of an entanglement measure such as the entanglement negativity [27], it is necessary to generate macroscopically large quantities of entanglement in the process; any smaller amount cannot break the fundamental irreversibility revealed in our work. In particular, as we argue below, macroscopic entanglement generation is the price one would have to pay in Brandão and Plenio's framework [23,24] to restore reversibility.
The most surprising aspect of our findings is not only the stark contrast with thermodynamics, but also the fact that several other quantum phenomena, including quantum coherence and purity, have been shown to be reversible in analogous axiomatic settings [28], and no quantum resource has ever been found to be irreversible under similar assumptions. Our result is thus a first of its kind: it highlights a fundamental difference between entanglement on one side, and thermodynamics and all other quantum resource theories known to date on the other.
The generality of our approach allows for an extension of the results beyond the theory of entanglement of quantum states, to the manipulation of quantum operations [29]. This corresponds to the setting of quantum communication, where the resource under consideration is the ability to reliably transmit quantum systems. Importantly, thermodynamics allows for reversible manipulation of operations [30] as well, so irreversibility of communication theory is, once again, in stark contrast with thermodynamics.

ENTANGLEMENT MANIPULATION
The framework of entanglement theory features two separated parties, conventionally named Alice and Bob, who share a large number of identical copies of a bipartite quantum state $\rho_{AB}$ and wish to transform them into as many copies as possible of some target state, all while making a vanishingly small error in the asymptotic limit. We introduce this setting in Fig. 1. Here, an entanglement transformation protocol allows us to obtain two copies of a target state $\omega$ for every three copies of an initial state $\rho$, with the transformation error improving as more copies of $\rho$ are provided. More generally, the initial global state is represented by an $n$-fold tensor product $\rho_{AB}^{\otimes n}$, where $\rho_{AB}$ is a density operator on some tensor product separable Hilbert space $\mathcal{H}_A \otimes \mathcal{H}_B$. In contrast with previous works, we do not assume that such a Hilbert space is finite-dimensional. By means of some quantum operation $\Lambda_n: A^nB^n \to A'^mB'^m$ that acts on $n$ copies of $\rho_{AB}$ and outputs $m$ copies of a (different) bipartite system $A'B'$, the initial state is transformed into $\Lambda_n(\rho_{AB}^{\otimes n})$. Given a desired target state $\omega_{A'B'}^{\otimes m}$, we require that the output state of the protocol be operationally almost indistinguishable from this target state, in the sense that any attempt to discriminate them by means of a quantum measurement should incur an error close to that of a random guess. By the Helstrom-Holevo theorem [31,32], this property can be captured mathematically by requiring that the distance between the output of the transformation and the target state, as quantified by the trace norm $\|\cdot\|_1$, vanish. Therefore, by requiring that $\lim_{n\to\infty} \big\|\Lambda_n(\rho_{AB}^{\otimes n}) - \omega_{A'B'}^{\otimes m}\big\|_1 = 0$, we guarantee that the conversion of $\rho^{\otimes n}$ into $\omega^{\otimes m}$ gets increasingly better as more copies of the state become available, culminating in an asymptotically perfect transformation.
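The operational meaning of the trace-norm criterion can be checked numerically. The sketch below (Python with NumPy; the function names are ours, not the paper's) uses the Helstrom-Holevo formula for two equiprobable states, $p_{\mathrm{succ}} = \tfrac12 + \tfrac14\|\rho-\sigma\|_1$, so a vanishing trace distance means that no measurement beats a coin flip.

```python
import numpy as np

def trace_norm(a: np.ndarray) -> float:
    """Trace norm of a Hermitian matrix: sum of the absolute values of its eigenvalues."""
    return float(np.sum(np.abs(np.linalg.eigvalsh(a))))

def helstrom_success(rho: np.ndarray, sigma: np.ndarray) -> float:
    """Optimal probability of correctly guessing which of two equiprobable states was prepared."""
    return 0.5 + 0.25 * trace_norm(rho - sigma)

ket0 = np.array([1.0, 0.0])
ket_plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho = np.outer(ket0, ket0)
sigma = np.outer(ket_plus, ket_plus)

print(helstrom_success(rho, rho))    # identical states: 0.5, a random guess
print(helstrom_success(rho, sigma))  # |0> versus |+>: 1/2 + sqrt(2)/4
```

In the asymptotic protocols above, the two states being compared are $\Lambda_n(\rho^{\otimes n})$ and $\omega^{\otimes m}$; driving their trace distance to zero drives this success probability to $1/2$.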
The figure of merit in transforming the input quantum state $\rho$ into a target state $\omega$ is the transformation rate $R(\rho\to\omega)$, defined as the maximum ratio $m/n$ that can be achieved in the limit $n\to\infty$ under the condition that $n$ copies of $\rho$ are transformed into $m$ copies of $\omega$ with asymptotically vanishing error. Such a rate depends crucially on the set of allowed operations. In keeping with our axiomatic approach, we consider the largest physically consistent class of transformations: namely, those which are incapable of generating entanglement and can only manipulate entanglement already present in the system.
To formalise this, we introduce the set of separable (or unentangled) states on a bipartite system $AB$, composed of all those states that admit a decomposition of the form [33,34]

$$\sigma_{AB} = \int \mathrm{d}\mu(\psi,\phi)\; |\psi\rangle\!\langle\psi|_A \otimes |\phi\rangle\!\langle\phi|_B , \qquad (1)$$

where $\mu$ is an appropriate probability measure on the set of pairs of local pure states. Our assumption is that any allowed operation $\Lambda$ should transform quantum states on $AB$ into valid quantum states on some output system $A'B'$, in such a way that $\Lambda(\sigma_{AB})$ is separable for all separable states $\sigma_{AB}$. We refer to such operations as non-entangling (NE); they are also known as separability preserving. Hereafter, all transformation rates are understood to be with respect to this family of protocols.
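In finite dimensions, and for a measure $\mu$ supported on finitely many points, the separable decomposition (1) reduces to a finite convex combination of product pure states. The sketch below (Python with NumPy; all names are ours and purely illustrative) builds such a state and checks that it is a valid density matrix whose partial transpose is positive semi-definite. Positivity of the partial transpose is necessary but not sufficient for separability, so this is a sanity check, not a separability test.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def random_pure_state(d: int) -> np.ndarray:
    """Uniformly random pure state vector: a normalised complex Gaussian vector."""
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

def partial_transpose(rho: np.ndarray, dA: int, dB: int) -> np.ndarray:
    """Transpose the B factor of an operator on C^dA (x) C^dB."""
    r = rho.reshape(dA, dB, dA, dB)
    return r.transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)

def separable_mixture(dA: int, dB: int, k: int) -> np.ndarray:
    """Discrete instance of Eq. (1): k product pure states mixed with random weights."""
    weights = rng.random(k)
    weights /= weights.sum()
    sigma = np.zeros((dA * dB, dA * dB), dtype=complex)
    for w in weights:
        prod = np.kron(random_pure_state(dA), random_pure_state(dB))
        sigma += w * np.outer(prod, prod.conj())
    return sigma

sigma = separable_mixture(3, 3, k=10)
is_unit_trace = bool(np.isclose(np.trace(sigma).real, 1.0))
is_psd = bool(np.linalg.eigvalsh(sigma).min() > -1e-12)
is_ppt = bool(np.linalg.eigvalsh(partial_transpose(sigma, 3, 3)).min() > -1e-12)
print(is_unit_trace, is_psd, is_ppt)
```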
We say that two states $\rho$, $\omega$ can be interconverted reversibly if $R(\rho\to\omega)\,R(\omega\to\rho) = 1$, as visualised in Fig. 2. However, to demonstrate or disprove the reversibility of entanglement theory as a whole, it is not necessary to check all possible pairs $\rho$, $\omega$. Instead, we can fix one of the two states, say the second, to be the standard unit of entanglement, the two-qubit maximally entangled state ('entanglement bit') $\Phi_2 := \frac12 \sum_{i,j=1}^{2} |ii\rangle\!\langle jj|$ [7]. The two quantities $E_{d,\mathrm{NE}}(\rho) := R(\rho\to\Phi_2)$ and $E_{c,\mathrm{NE}}(\rho) := R(\Phi_2\to\rho)^{-1}$ are referred to as the distillable entanglement and the entanglement cost of $\rho$, respectively. Entanglement theory is then reversible if $E_{d,\mathrm{NE}}(\rho) = E_{c,\mathrm{NE}}(\rho)$ for all states $\rho$.
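Why fixing $\Phi_2$ suffices can be seen from a short concatenation argument. This is a sketch under the standard assumption, not spelled out in the text, that asymptotic rates compose, $R(\rho\to\tau) \ge R(\rho\to\omega)\,R(\omega\to\tau)$, which follows by running one protocol after the other and using that positive trace-preserving maps do not increase the trace norm. With $E_{d,\mathrm{NE}}(\rho) = R(\rho\to\Phi_2)$ and $E_{c,\mathrm{NE}}(\rho) = R(\Phi_2\to\rho)^{-1}$:

```latex
\begin{aligned}
R(\rho\to\omega) &\ge R(\rho\to\Phi_2)\,R(\Phi_2\to\omega) = \frac{E_{d,\mathrm{NE}}(\rho)}{E_{c,\mathrm{NE}}(\omega)},\\[2pt]
E_{d,\mathrm{NE}}(\rho) = R(\rho\to\Phi_2) &\ge R(\rho\to\omega)\,R(\omega\to\Phi_2) = R(\rho\to\omega)\,E_{d,\mathrm{NE}}(\omega).
\end{aligned}
```

If $E_{d,\mathrm{NE}} = E_{c,\mathrm{NE}} =: E$ for all states and $E(\omega) > 0$, the two lines give $E(\rho)/E(\omega) \le R(\rho\to\omega) \le E(\rho)/E(\omega)$, so $R(\rho\to\omega) = E(\rho)/E(\omega)$. Exchanging the roles of $\rho$ and $\omega$ (with $E(\rho) > 0$) gives $R(\omega\to\rho) = E(\omega)/E(\rho)$, and the product of the two rates equals 1.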
Fig. 2. Reversible interconversion between two states $\rho$ and $\omega$. In this example, in the asymptotic many-copy limit it is possible to obtain 2 copies of $\omega$ from each 3 copies of $\rho$, and vice versa, so that $R(\rho\to\omega) = 2/3$ and $R(\omega\to\rho) = 3/2$.

IRREVERSIBILITY OF ENTANGLEMENT MANIPULATION
By exhibiting an explicit example of a state that cannot be reversibly manipulated, we show that entanglement theory is irreversible in general. We formalise this as follows.

Theorem 1. The theory of entanglement manipulation is irreversible under non-entangling operations.
More precisely, for the two-qutrit state $\omega_3 := \frac16 \sum_{i,j=1}^{3} \big(|ij\rangle\!\langle ij| - |ij\rangle\!\langle ji|\big)$ it holds that

$$E_{d,\mathrm{NE}}(\omega_3) \le \log_2 \frac32 < 1 = E_{c,\mathrm{NE}}(\omega_3).$$

To show this result, we introduce a general lower bound on the entanglement cost that can be efficiently computed as a semi-definite program. Our approach relies on a new entanglement monotone which we call the tempered negativity, defined through a suitable modification of a well-known entanglement measure called the negativity [27]. The situation described by Theorem 1 is depicted in Fig. 3. The full proof of the result is sketched in the Methods and described in detail in Supplementary Notes I-III.

Fig. 3. Irreversibility of entanglement manipulation. Our main result in Theorem 1 shows that the two-qutrit state $\omega_3$ cannot be reversibly manipulated under non-entangling transformations: we can extract at most $\log_2(3/2) \approx 7/12$ entanglement bits per copy of $\omega_3$ asymptotically, but one full entanglement bit per copy is needed to generate it.

Theorem 1 can be strengthened and extended in several ways, which we overview in the Methods section and expound on in Supplementary Notes IV-VI:
(1) We show, in particular, that irreversibility persists beyond non-entangling transformations: the conclusion of Theorem 1 holds even when we allow for the generation of small amounts of entanglement (sub-exponential in the number of copies of the state), as quantified by several choices of entanglement measures such as the negativity or the standard robustness of entanglement [35]. Hence, in order to reversibly manipulate the state $\omega_3$, one would need to generate macroscopic (exponential) amounts of entanglement.
(2) We furthermore show that the irreversibility cannot be alleviated by allowing for a small non-vanishing error in the asymptotic transformation, a property known as the pretty strong converse [36].
(3) Finally, Theorem 1 can also be extended to the theory of point-to-point quantum communication, exploiting the connections between entanglement manipulation and communication schemes [10,37]. This is considered in detail in a follow-up work [29]. These extensions further solidify the fundamental character of the irreversibility uncovered in our work, showing that it affects both quantum states and channels, and that there is no way to avoid it without incurring very large transformation errors or generating significant amounts of entanglement.
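The state $\omega_3$ of Theorem 1 is the normalised projector onto the three-dimensional antisymmetric subspace of two qutrits, $\omega_3 = (\mathbb{1} - F)/6$ with $F$ the swap operator. The sketch below (Python with NumPy; our own illustration, not code from the paper) builds it directly from the sum in Theorem 1, checks that it is a valid state, confirms that its partial transpose has a negative eigenvalue (so it is entangled), and compares $\log_2(3/2)$ with $7/12$. The logarithmic negativity printed here, $\log_2(5/3)$, is for orientation only; it is not the quantity used to prove Theorem 1, and nothing here verifies the theorem itself.

```python
import numpy as np

def partial_transpose(rho, dA, dB):
    """Transpose the second (B) tensor factor."""
    r = rho.reshape(dA, dB, dA, dB)
    return r.transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)

d = 3
omega3 = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        omega3[d * i + j, d * i + j] += 1 / 6   # + |ij><ij|
        omega3[d * i + j, d * j + i] -= 1 / 6   # - |ij><ji|

state_eigs = np.linalg.eigvalsh(omega3)
pt_eigs = np.linalg.eigvalsh(partial_transpose(omega3, d, d))
log_neg = float(np.log2(np.sum(np.abs(pt_eigs))))

print(np.trace(omega3))          # approximately 1: normalised
print(state_eigs)                # six zeros and three eigenvalues equal to 1/3
print(pt_eigs.min())             # approximately -1/3: negative, so omega3 is entangled
print(log_neg, np.log2(5 / 3))   # logarithmic negativity equals log2(5/3)
print(np.log2(3 / 2), 7 / 12)    # the bound in Theorem 1 and its approximation
```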

WHY NON-ENTANGLING TRANSFORMATIONS?
The intention behind our general, axiomatic framework is to prove irreversibility in as broad a setting as possible. The key strength of this approach is that irreversibility under the class of non-entangling transformations enforces irreversibility under any smaller class of processes, which includes the vast majority of different types of operations studied in the manipulation of entanglement [7,28]; furthermore, our result shows that even enlarging the previously considered classes of processes cannot enable reversibility, as long as the resulting transformations are non-entangling.
To better understand the need for and the consequences of such a general approach, let us compare our framework to another commonly employed model, that where entanglement is manipulated by means of local operations and classical communication (LOCC). In this context, irreversibility was first found by Vidal and Cirac [19]. Although historically important, the LOCC model is built with a 'bottom-up' mindset, and rests on the assumption that the two parties can only employ local quantum resources at all stages of the protocol. Already in the early days of quantum information, it was realised that relaxing such restrictions (e.g. by supplying some additional resources) can lead to improvements in the capability to manipulate entanglement [16,22]. Although attempts to construct a reversible theory of entanglement along these lines have been unsuccessful [16,38], the assumptions imposed therein have left open the possibility of the existence of a larger class of operations which could remedy the irreversibility.
The limitations of such bottom-up approaches are best illustrated with a thermodynamical analogy: in this context, they would lead to operational statements of the second law concerning, say, the impossibility of realising certain transformations by means of mechanical processes, but would not tell us much about electrical or nuclear processes. Indeed, since we have no guarantee that the ultimate theory of Nature will be quantum mechanical, it is possible to envision a situation where, for instance, the exploitation of some exotic physical phenomena by one of the parties could enhance entanglement transformations. To construct a theory as powerful as thermodynamics, we instead followed a 'top-down', axiomatic approach, which, as discussed above, imposes only the weakest possible requirement on the allowed transformations, thereby ruling out reversibility under any physical processes.
The non-entangling operations considered here are examples of 'resource non-generating operations', commonly employed in the study of many other quantum resource theories [28]. In all of these other contexts, such operations have always been shown to lead to the reversibility of the given theory. For instance, Gibbs-preserving maps [39,40] in quantum thermodynamics are a broad, axiomatic formulation of the constraints governing thermodynamic transformations of quantum systems analogous to non-entangling operations. Under such operations, the theory of thermodynamics is known to be reversible [30]. An equivalent result has also been shown in the resource theory of quantum coherence [41][42][43], suggesting that reversibility could be a generic feature in the manipulation of different resources under all resource non-generating transformations. Our result, however, shows entanglement theory to be fundamentally different from thermodynamics and from all other known quantum resources: not even the vast class of all non-entangling maps can enable reversible entanglement manipulation. What this means is that, under the exact same assumptions that suffice to facilitate the reversibility of other quantum resources, entanglement remains irreversible.

MACROSCOPIC ENTANGLEMENT GENERATION IS NECESSARY FOR REVERSIBILITY
A similar axiomatic mindset to the one employed in our work has already proved to be useful. Notably, it led Brandão and Plenio [23,24] to construct a theory of entanglement which was claimed to be fully reversible. Recently, an issue that casts some doubt on the validity of their mathematical proof has come to light [25]. In spite of this, it remains possible that the theory of entanglement proposed by Brandão and Plenio is actually reversible [25], so let us discuss it here in detail. This theory features so-called asymptotically non-entangling operations, defined as those that may generate some limited amounts of entanglement, provided that any such supplemented resources are vanishingly small in the asymptotic limit. On the surface, this appears consistent with how fluctuations are treated in the theory of thermodynamics. However, the key question to ask here is: according to what measure should one require the generated entanglement to be small? Brandão and Plenio choose to quantify entanglement with the generalised robustness [35]. As we argue below, this a priori arbitrary choice turns out to be crucial in deciding between reversibility and irreversibility. That is, there are reasonable entanglement measures under which reversibility only becomes possible at the price of exponential entanglement generation. In fact, even a minor change of the quantifier from the generalised robustness to the closely related standard robustness of entanglement [35] makes reversibility impossible. This entails that Brandão and Plenio's operations, despite generating vanishingly little entanglement with respect to the generalised robustness, create macroscopically large amounts of it as quantified by other entanglement measures.
We show, in fact, that this is not simply an issue with the particular framework of Brandão and Plenio [23,24], but rather a fundamental property of entanglement theory: any attempt to achieve reversibility must necessarily lead to macroscopic entanglement generation.
To make this precise, consider a modified version of asymptotic entanglement manipulation. As before, given $n$ copies of an initial state $\rho$, we want to transform them into $m$ copies of a target state $\omega$ with asymptotically vanishing error. To define the set of allowed transformations, based on Brandão and Plenio's approach [24], we fix an entanglement measure $M$ and consider all those transformations $\Lambda_n$ on $n$ copies of the system that are $(M,\eta_n)$-approximately non-entangling, in the sense that $M(\Lambda_n(\sigma_n)) \le \eta_n$ for all separable states $\sigma_n$, where the numbers $\eta_n$ quantify the magnitude of the entanglement fluctuations at each step of the process. The maximum ratio $m/n$ that can be achieved in the limit $n\to\infty$ determines the transformation rate under these operations. The modified notion of distillable entanglement, denoted $E^{(M)}_{d,\mathrm{NE}_\eta}(\rho)$, is then defined by choosing the entanglement bit $\Phi_2$ as the target state, and analogously the modified entanglement cost $E^{(M)}_{c,\mathrm{NE}_\eta}(\rho)$ corresponds to the transformation from $\Phi_2$ to a given state $\rho$. By choosing a suitable measure of entanglement $M$ and setting $\eta_n = 0$ for all $n$, we recover our original definition of non-entangling transformations NE.
The problem of choosing what measure to employ has no straightforward solution, as it is well known that there are many asymptotically inequivalent ways to quantify entanglement [7]; hence, constraining one such measure cannot guarantee that the supplemented entanglement is truly small according to all measures. From a methodological perspective, this arbitrariness is problematic: resource quantifiers should be endowed with an operational interpretation by means of a task defined in purely natural terms; presupposing a particular measure and using it to define the task in the first place makes the framework somewhat contrived, and does not take into consideration what happens when a different monotone is used.
Indeed, the choice of $M$ turns out to be pivotal. Brandão and Plenio's main result [23,24] claims [25] that, with the specific choice of $M$ being the generalised robustness of entanglement [35], entanglement can be manipulated reversibly even if we take $\eta_n \to 0$ as $n\to\infty$. In stark contrast, we now show that the completely opposite conclusion is reached when $M$ is taken to be either the standard robustness [35] or the entanglement negativity [27].
Comparing this to Brandão and Plenio's conclusion, we can observe that the operations employed there may only hope to achieve reversibility by generating exponential amounts of entanglement, as measured by either the negativity or the standard robustness.
We stress that there is no a priori operationally justified reason to prefer the generalised robustness over the other monotones. If anything, the most operationally meaningful monotones to select here would be those defined directly in terms of practical tasks, such as the entanglement cost itself; however, following this route actually trivialises the theory [23,24], so different choices of monotones need to be employed to obtain meaningful results. Even between the generalised robustness (as employed by Brandão and Plenio) and the standard robustness $R_S$, it is actually the latter that admits a clearer operational interpretation in this context: $R_S$ quantifies exactly the entanglement cost of a state in the one-shot setting [44], when asymptotic transformations are not allowed. These ambiguities in the choice of a 'good' measure, and the vastly disparate physical consequences of the different choices, call into question the physicality of the reversibility result claimed by Brandão and Plenio [23,24]: why should one such framework be considered more physical than the other, irreversible ones? Importantly, since the core concept of separability is independent of the particular choice of a measure, our axiomatic assumption of strict no-entanglement-generation bypasses the above problems completely: it removes the dependence on any entanglement measure and ensures that the physical constraints are enforced at all scales, therefore yielding an unambiguously physical model of general entanglement transformations. However, should such a requirement be considered too strict, our Theorem 2 shows that the irreversibility of entanglement is robust to fluctuations in the generated resources.
Let us also point out that the assumptions of Brandão and Plenio (and of Theorem 2) are in fact more permissive than those typically employed in quantum thermodynamic frameworks [40,45,46], where one usually allows fluctuations in the sense of the consumption of small ancillary resources, but not fluctuations in the very physical laws governing the process. In a thermodynamic sense, entanglement transformations under approximately non-entangling maps could be compared to the manipulation of systems under transformations that do not conserve the overall energy, a relaxation which would go against standard axiomatic assumptions [45,46]. Importantly, no such 'unphysical' fluctuations are necessary in order to establish the reversibility of thermodynamics [30,40] or other known quantum resources [28]. We refer the interested reader to Supplementary Note V, where we discuss different notions of resource fluctuations in more detail.

DISCUSSION
Our results close a major open problem in the characterisation of entanglement [47] by showing that a reversible theory of this resource cannot be established under any set of 'free' transformations which do not generate entanglement. Indeed, from our characterisation we can conclude not only that entanglement generation is necessary for reversibility, but also that macroscopically large amounts of it must be supplemented. This shows that the framework proposed by Brandão and Plenio [23,24] is effectively the smallest possible one that could allow reversibility, although only at the cost of significant entanglement expenditure.
That a seemingly minor revision of the underlying technical assumptions, namely the strict entanglement non-generation that we advocate, can have such far-reaching consequences, precluding reversibility altogether, is truly unexpected. In fact, as remarked before, the opposite of this phenomenon has been observed in a number of fundamentally important quantum resource theories, where the set of all resource-non-generating operations suffices to enable reversibility. It is precisely the necessity to generate entanglement in order to reversibly manipulate it that distinguishes the theory of entanglement from thermodynamics and other quantum resources. This fundamental difference contrasts not only with the previously established information-theoretic parallels, but also with the many links that have emerged between entanglement and thermodynamics in broader contexts such as many-body and relativistic physics [48][49][50][51]. It then becomes an enthralling foundational problem to understand what makes entanglement theory special in this respect, and where its fundamental irreversibility may come from. Additionally, the axiomatic theory of entanglement manipulation delineated here leaves several outstanding follow-up questions: for instance, it would be very interesting to understand whether a closed expression for the associated entanglement cost can be established, and whether the phenomenon of entanglement catalysis [7,52] can play a role in this setting.
We remark that the recently identified gap in Brandão and Plenio's proof [25], which came to light after this work was completed, does not affect our results or conclusions in any way, since the methods that we use are independent of Refs. [23,24]. Our main finding -that of entanglement irreversibility under non-entangling operations -is complementary to the result of Brandão and Plenio [23,24], as we discussed above and in Supplementary Notes IV-V. This recent development does, however, rekindle the question of whether entanglement can be reversibly manipulated whatsoever [47], even in a more permissive framework such as Brandão and Plenio's.
In conclusion, we have highlighted a fundamental difference between the theory of entanglement manipulation and thermodynamics, proving that no microscopically consistent second law can be established for the former. At its heart, our work reveals an inescapable restriction precipitated by the laws of quantum physics, one that has no analogue in classical theories and was previously unknown even within the realm of quantum theory.

METHODS
In the following we sketch the main ideas needed to arrive at a proof of our main result, Theorem 1, and extensions thereof.

A. Asymptotic transformation rates under non-entangling operations
We start by defining rigorously the fundamental quantities we are dealing with. Given two separable Hilbert spaces $\mathcal{H}$ and $\mathcal{H}'$ and the associated spaces of trace class operators $\mathcal{T}(\mathcal{H})$ and $\mathcal{T}(\mathcal{H}')$, a linear map $\Lambda: \mathcal{T}(\mathcal{H}) \to \mathcal{T}(\mathcal{H}')$ is said to be positive and trace preserving if it transforms density operators on $\mathcal{H}$ into density operators on $\mathcal{H}'$. As is well known, physically realisable quantum operations need to be completely positive and not merely positive [53]. While we could enforce this additional assumption without affecting any of our results, we will only need to assume the positivity of the transformations, thereby establishing limitations also for processes more general than quantum channels.
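The gap between positivity and complete positivity can be illustrated with the transpose map, the textbook example of a positive, trace-preserving map that is not completely positive (a standard fact, not specific to this work). In the sketch below (Python with NumPy; names are ours), transposition sends a random qubit state to a valid state, but applying it to one half of the entanglement bit, i.e. taking the partial transpose, produces an operator with a negative eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def random_density_matrix(d: int) -> np.ndarray:
    """Random density matrix G G^dagger / Tr(G G^dagger) with complex Gaussian G."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

# Transposition maps every qubit state to a valid state (positive and trace preserving).
rho = random_density_matrix(2)
rho_t = rho.T
maps_state_to_state = bool(
    np.linalg.eigvalsh(rho_t).min() > -1e-12 and np.isclose(np.trace(rho_t).real, 1.0)
)

# Applied to one half of the entanglement bit (id (x) T, i.e. the partial transpose),
# it outputs an operator with a negative eigenvalue: the map is not completely positive.
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)
Phi2 = np.outer(phi, phi)
Phi2_pt = Phi2.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)
min_eig = float(np.linalg.eigvalsh(Phi2_pt).min())

print(maps_state_to_state, min_eig)   # True, and -0.5 up to rounding
```

Non-entangling maps in the sense used here are only required to be positive, so maps of this kind are not excluded a priori; this is what makes the no-go result apply beyond quantum channels.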
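Since we are dealing with entanglement, both $\mathcal{H}$ and $\mathcal{H}'$ need to describe bipartite systems. We shall therefore assume that $\mathcal{H} = \mathcal{H}_A \otimes \mathcal{H}_B$ and $\mathcal{H}' = \mathcal{H}_{A'} \otimes \mathcal{H}_{B'}$ have a tensor product structure.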
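Separable states on $AB$ are defined as those that admit a decomposition as in (1). A positive trace-preserving operation $\Lambda: \mathcal{T}(\mathcal{H}_A\otimes\mathcal{H}_B) \to \mathcal{T}(\mathcal{H}_{A'}\otimes\mathcal{H}_{B'})$, which we shall denote compactly as $\Lambda: AB \to A'B'$, is said to be non-entangling or separability-preserving if it transforms separable states on $AB$ into separable states on $A'B'$. We will denote the set of non-entangling operations from $AB$ to $A'B'$ as $\mathrm{NE}(AB \to A'B')$. The central questions in the theory of entanglement manipulation are the following. Given a bipartite state $\rho_{AB}$ and a set of quantum operations, how much entanglement can be extracted from $\rho_{AB}$? How much entanglement does it cost to generate $\rho_{AB}$ in the first place? The ultimate limitations to these two processes, called entanglement distillation and entanglement dilution, respectively, are well captured by looking at the asymptotic limit of many copies. As remarked above, this procedure is analogous to the thermodynamic limit. The resulting quantities are called the distillable entanglement and the entanglement cost, respectively. We already discussed their intuitive operational definitions, so we now give their mathematical forms:

$$E_{d,\mathrm{NE}}(\rho_{AB}) := \sup\Big\{ r \,:\, \lim_{n\to\infty} \inf_{\Lambda_n \in \mathrm{NE}(A^nB^n \to A_0^{\lfloor rn\rfloor}B_0^{\lfloor rn\rfloor})} \big\| \Lambda_n(\rho_{AB}^{\otimes n}) - \Phi_2^{\otimes \lfloor rn\rfloor} \big\|_1 = 0 \Big\},$$

$$E_{c,\mathrm{NE}}(\rho_{AB}) := \inf\Big\{ r \,:\, \lim_{n\to\infty} \inf_{\Lambda_n \in \mathrm{NE}(A_0^{\lceil rn\rceil}B_0^{\lceil rn\rceil} \to A^nB^n)} \big\| \Lambda_n(\Phi_2^{\otimes \lceil rn\rceil}) - \rho_{AB}^{\otimes n} \big\|_1 = 0 \Big\}.$$

Here, $A^nB^n$ is the system composed of $n$ copies of $AB$, $A_0B_0$ denotes a fixed two-qubit quantum system, and $\Phi_2$ is the maximally entangled state of $A_0B_0$, also called the 'entanglement bit'.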
One question that could be raised at this point is whether our definition of transformation rates is too restrictive. Such a reservation could be motivated by the fact that, e.g., in the resource theory of quantum thermodynamics, energy-conserving unitary transformations alone are known to be insufficient to achieve general transformations [39]; to avoid this issue, additional resources are provided in the form of ancillary systems composed of a sublinear number of qubits, allowing one to circumvent the restrictions of energy conservation without affecting the underlying physics [54,55]. Such an approach can be adapted to more general resources [56]. In our setting, however, this is already implicitly included in the definitions of $E_{d,\mathrm{NE}}$ and $E_{c,\mathrm{NE}}$, since such ancillary systems can be absorbed into the asymptotic transformation rates. That is, we could have equivalently defined both quantities with the protocols $\Lambda_n$ additionally acting on ancillary systems $C_n$, where the $C_n$ are arbitrary (possibly entangled) systems such that $\dim C_n = 2^{o(n)}$. The rates are not affected by the addition of such an ancilla, since its sub-exponential size means that any contributions to the rate due to $C_n$ vanish asymptotically. This is addressed in more detail in Supplementary Note V.
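To see what 'sub-exponential' means quantitatively, measure the ancilla in qubits, $\log_2 \dim C_n$ (writing $C_n$ for the ancillary system at step $n$, our notation). The per-copy size of an admissible ancilla then vanishes; the two dimension choices below are illustrative examples, not taken from the text:

```latex
\dim C_n = 2^{o(n)} \iff \lim_{n\to\infty} \frac{\log_2 \dim C_n}{n} = 0;
\qquad
\dim C_n = 2^{\sqrt{n}} \;\Rightarrow\; \frac{\log_2 \dim C_n}{n} = \frac{1}{\sqrt{n}} \to 0,
\qquad
\dim C_n = 2^{n/10} \;\Rightarrow\; \frac{\log_2 \dim C_n}{n} = \frac{1}{10}.
```

The first ancilla is admissible; the second is exponentially large, carries a constant number of qubits per copy, and is therefore excluded.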

B. The main idea: tempered negativity
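Let us commence by looking at a well-known entanglement measure called the logarithmic negativity [27,57]. For a bipartite state $\rho_{AB}$, this is formally defined by

$$E_N(\rho) := \log_2 \big\| \rho^\Gamma \big\|_1 . \qquad (8)$$

Here, $\Gamma$ denotes the partial transpose, i.e. the linear map $\Gamma: \mathcal{T}(\mathcal{H}_A\otimes\mathcal{H}_B) \to \mathcal{B}(\mathcal{H}_A\otimes\mathcal{H}_B)$, where $\mathcal{B}(\mathcal{H}_A\otimes\mathcal{H}_B)$ is the space of bounded operators on $\mathcal{H}_A\otimes\mathcal{H}_B$, that acts as $\Gamma(X\otimes Y) = (X\otimes Y)^\Gamma := X \otimes Y^\intercal$, with $\intercal$ denoting the transposition with respect to a fixed basis, and is extended by linearity and continuity to the whole of $\mathcal{T}(\mathcal{H}_A\otimes\mathcal{H}_B)$ [58]. It is understood that $E_N(\rho) = \infty$ if $\rho^\Gamma$ is not of trace class. Remarkably, the logarithmic negativity does not depend on the basis chosen for the transposition. Also, since $\sigma^\Gamma$ is a valid state for any separable $\sigma$ [58], this measure vanishes on separable states, i.e.

$$E_N(\sigma) = 0 \quad \text{for all separable states } \sigma .$$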
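For finite-dimensional states, definition (8) is a few lines of linear algebra. The sketch below (Python with NumPy; helper names are ours) computes $E_N$ for the entanglement bit (one ebit) and for the product state $|00\rangle$ (zero), and illustrates the basis independence by transposing $B$ in a randomly rotated basis $\{U|i\rangle\}$, i.e. using $T_U(Y) = U (U^\dagger Y U)^\intercal U^\dagger$ on the second factor.

```python
import numpy as np

def partial_transpose(rho: np.ndarray, dA: int, dB: int) -> np.ndarray:
    """Gamma: transpose the B factor in the computational basis."""
    r = rho.reshape(dA, dB, dA, dB)
    return r.transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)

def trace_norm(a: np.ndarray) -> float:
    """Sum of absolute eigenvalues of a Hermitian matrix."""
    return float(np.sum(np.abs(np.linalg.eigvalsh(a))))

def log_negativity(rho: np.ndarray, dA: int, dB: int) -> float:
    """E_N(rho) = log2 ||rho^Gamma||_1, the finite-dimensional case of Eq. (8)."""
    return float(np.log2(trace_norm(partial_transpose(rho, dA, dB))))

def max_entangled(d: int) -> np.ndarray:
    """Phi_d = (1/d) sum_{i,j} |ii><jj|."""
    v = np.zeros(d * d)
    v[[d * i + i for i in range(d)]] = 1 / np.sqrt(d)
    return np.outer(v, v)

phi2 = max_entangled(2)
product = np.zeros((4, 4))
product[0, 0] = 1.0                       # |00><00|, a separable state

en_phi2 = log_negativity(phi2, 2, 2)
en_product = log_negativity(product, 2, 2)

# Transpose B with respect to the rotated basis {U|i>} instead of the computational one.
rng = np.random.default_rng(seed=3)
u, _ = np.linalg.qr(rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2)))
W = np.kron(np.eye(2), u)                 # I (x) U
rotated_pt = W @ partial_transpose(W.conj().T @ phi2 @ W, 2, 2) @ W.conj().T
en_phi2_rotated = float(np.log2(trace_norm(rotated_pt)))

print(en_phi2, en_product, en_phi2_rotated)   # approximately 1, 0, 1
```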
Given a non-negative real-valued function E on bipartite states, which we think of as an 'entanglement measure', when can it be used to give bounds on the operationally relevant quantities $E_d$ and $E_c$? It is often claimed that in order for this to be the case, E should obey, among other things, a particular technical condition known as asymptotic continuity. Since a precise technical definition of this term is not crucial for this discussion, it suffices to say that it amounts to a strong form of uniform continuity, in which the approximation error does not grow too large in the dimension of the underlying space. While asymptotic continuity is certainly a critical requirement in general [17,59], it is not always indispensable [17,27,[60][61][62][63]. The starting point of our approach is the elementary observation that the logarithmic negativity $E_N$, for instance, is not asymptotically continuous, yet it yields an upper bound on the distillable entanglement [27]. The former claim can be easily understood by casting (8) into the equivalent form
$$E_N(\rho) = \log_2 \sup\left\{\operatorname{Tr} X\rho:\ \left\|X^\Gamma\right\|_\infty \le 1\right\}, \tag{10}$$
where $\|X\|_\infty := \sup_{|\psi\rangle} |\langle\psi|X|\psi\rangle|$ is the operator norm of the self-adjoint operator X, and the supremum is taken over all normalised state vectors $|\psi\rangle$. Since the trace norm and the operator norm are dual to each other, the continuity of $E_N$ with respect to the trace norm is governed by the operator norm of X in the optimisation (10). However, while the operator norm of $X^\Gamma$ is at most 1, that of X can only be bounded as $\|X\|_\infty \le d\,\|X^\Gamma\|_\infty \le d$, where $d := \min\{\dim(\mathcal{H}_A), \dim(\mathcal{H}_B)\}$ is the minimum of the local dimensions. This bound is generally tight; since d grows exponentially in the number of copies, it implies that $E_N$ is not asymptotically continuous. But then why is it that $E_N$ still gives an upper bound on the distillable entanglement? A careful examination of the proof by Vidal and Werner [27] (see the discussion surrounding Eq. (46) there) reveals that this is only possible because the exponentially large number d actually matches the value taken by the supremum in (10) on the maximally entangled state, that is, on the target state of the distillation protocol. Let us try to adapt this key observation to our needs. Since we want to employ a negativity-like measure to lower bound the entanglement cost instead of upper bounding the distillable entanglement, we need a substantial modification.
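The tightness of the bound $\|X\|_\infty \le d$, and its match with the value of (10) on the maximally entangled state, can be checked numerically. In this sketch (NumPy, finite d, helper names ours), the feasible point $X = d\,\Phi_d$ of (10) has $X^\Gamma$ equal to the swap operator, so $\|X^\Gamma\|_\infty = 1$, while $\|X\|_\infty = d$ and $\operatorname{Tr} X\Phi_d = d$.

```python
import numpy as np

def partial_transpose(X, dA, dB):
    """Partial transpose on subsystem B, in the computational basis."""
    r = X.reshape(dA, dB, dA, dB)
    return r.transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)

def op_norm(X):
    """Operator norm of a Hermitian matrix (largest |eigenvalue|)."""
    return float(np.max(np.abs(np.linalg.eigvalsh(X))))

rows = []
for d in (2, 3, 4, 8):
    v = np.eye(d).reshape(d * d) / np.sqrt(d)
    phi = np.outer(v, v)                   # maximally entangled state Phi_d
    X = d * phi                            # a feasible point of (10)
    XG = partial_transpose(X, d, d)        # equals the swap operator
    rows.append((d, op_norm(XG), op_norm(X), float(np.trace(X @ phi))))

for d, norm_XG, norm_X, value in rows:
    print(f"d={d}: ||X^G||={norm_XG:.3f}  ||X||={norm_X:.3f}  Tr[X Phi_d]={value:.3f}")
```

The operator norm of X grows linearly with d, which is exactly why $E_N$ fails to be asymptotically continuous, yet on $\Phi_d$ this growth is matched by the objective value.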
The above discussion inspired our main idea: let us tweak the variational program in (10) by imposing that the operator norm of X be controlled by the final value of the program itself. The logic of this reasoning may seem circular at first sight, but we will see that it is not so. For two bipartite states ρ, ω, we define the tempered negativity by
$$N_\tau(\rho|\omega) := \sup\left\{\operatorname{Tr} X\rho:\ \left\|X^\Gamma\right\|_\infty \le 1,\ \|X\|_\infty = \operatorname{Tr} X\omega\right\} \tag{11}$$
and the corresponding tempered logarithmic negativity by
$$E_\tau(\rho) := \log_2 N_\tau(\rho|\rho). \tag{12}$$
This definition encapsulates the above idea of tying together the value of the function and its continuity properties, and indeed will turn out to yield the desired lower bound on the entanglement cost. Note the critical fact that in the definition of $E_\tau(\rho)$ the operator norm of X is given precisely by the value $\operatorname{Tr} X\rho$ of the objective function itself.
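Solving (11) exactly requires an SDP solver, which we do not attempt here. As a lighter-weight sketch (NumPy, finite dimensions; the function name `tempered_certificate` is hypothetical), one can test whether a candidate X satisfies both constraints of (11). Any feasible X certifies the lower bound $N_\tau(\rho|\omega) \ge \operatorname{Tr} X\rho$. The example shows how the tempering constraint ties X to ω: $X = 3\Phi_3$ is feasible when $\omega = \Phi_3$, but not when ω is maximally mixed.

```python
import numpy as np

def partial_transpose(X, dA, dB):
    """Partial transpose on subsystem B, in the computational basis."""
    return X.reshape(dA, dB, dA, dB).transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)

def op_norm(X):
    """Operator norm of a Hermitian matrix."""
    return float(np.max(np.abs(np.linalg.eigvalsh(X))))

def tempered_certificate(X, rho, omega, dA, dB, tol=1e-9):
    """Return Tr[X rho] if X is feasible for (11), i.e. ||X^Gamma|| <= 1 and
    ||X|| = Tr[X omega]; otherwise return None. A returned value lower-bounds N_tau(rho|omega)."""
    if op_norm(partial_transpose(X, dA, dB)) > 1 + tol:
        return None
    if abs(op_norm(X) - np.trace(X @ omega).real) > tol:
        return None
    return float(np.trace(X @ rho).real)

d = 3
v = np.eye(d).reshape(d * d) / np.sqrt(d)
phi = np.outer(v, v)                  # Phi_3
noise = np.eye(d * d) / d**2          # maximally mixed state

print(tempered_certificate(np.eye(d * d), phi, phi, d, d))  # identity: always feasible, value 1
print(tempered_certificate(d * phi, phi, phi, d, d))        # feasible, value 3
print(tempered_certificate(d * phi, phi, noise, d, d))      # infeasible: None
```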

C. Properties of the tempered negativity
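The tempered negativity $N_\tau(\rho|\omega)$ given by (11) can be computed as a semi-definite program for any given pair of states ρ and ω, which means that it can be evaluated efficiently (in time polynomial in the dimension [64]). Moreover, it obeys three fundamental properties, the proofs of which can be found in Supplementary Note II. In what follows, the states ρ, ω (as well as $\rho_1, \rho_2, \omega_1, \omega_2$) are entirely arbitrary.
(a) Bounds: $1 \le N_\tau(\rho|\omega) \le \left\|\rho^\Gamma\right\|_1$.
(b) Super-multiplicativity: $N_\tau(\rho_1\otimes\rho_2\,|\,\omega_1\otimes\omega_2) \ge N_\tau(\rho_1|\omega_1)\,N_\tau(\rho_2|\omega_2)$; equivalently, $\log_2 N_\tau$ is super-additive.
(c) The 'ε-lemma': if $\frac12\|\rho-\omega\|_1 \le \varepsilon$, then $N_\tau(\rho|\omega) \ge (1-2\varepsilon)\,N_\tau(\omega|\omega)$.
The tempered negativity, just like the standard (logarithmic) negativity, is monotonic under several sets of quantum operations commonly employed in entanglement theory, such as LOCC or positive partial transpose operations [65], but not under non-entangling operations. Quite remarkably, it nonetheless plays a key role in our approach.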
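The mechanism behind property (c) is a single Hölder estimate: any X feasible for $N_\tau(\omega|\omega)$ is also feasible for $N_\tau(\rho|\omega)$, since the constraints involve ω only, and $\operatorname{Tr} X\rho \ge \operatorname{Tr} X\omega - \|X\|_\infty\|\rho-\omega\|_1 \ge (1-2\varepsilon)\operatorname{Tr} X\omega$. The NumPy sketch below checks this for $\omega = \Phi_3$, $X = 3\Phi_3$ and a hypothetical isotropic-noise family $\rho_p = (1-p)\Phi_3 + p\,\mathbb{1}/9$, chosen only for illustration.

```python
import numpy as np

def trace_norm(A):
    """Trace norm of a Hermitian matrix."""
    return float(np.sum(np.abs(np.linalg.eigvalsh(A))))

d = 3
v = np.eye(d).reshape(d * d) / np.sqrt(d)
phi = np.outer(v, v)            # omega = Phi_3
X = d * phi                     # feasible for N_tau(Phi_3 | Phi_3), with value d

checks = []
for p in (0.0, 0.1, 0.3, 0.5):
    rho = (1 - p) * phi + p * np.eye(d * d) / d**2   # illustrative noise model
    eps = 0.5 * trace_norm(rho - phi)
    lhs = float(np.trace(X @ rho))                    # lower bound on N_tau(rho | Phi_3)
    rhs = (1 - 2 * eps) * float(np.trace(X @ phi))    # (1 - 2 eps) Tr[X omega]
    checks.append((p, eps, lhs, rhs))
    print(f"p={p}: eps={eps:.4f}  Tr[X rho]={lhs:.4f} >= {rhs:.4f}")
```

For this noise model one finds $\operatorname{Tr} X\rho_p = d(1-\varepsilon)$, comfortably above the guaranteed $(1-2\varepsilon)\,d$.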

D. Sketch of the proof of Theorem 1
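To prove Theorem 1, we start by establishing the general lower bound
$$E_{c,\mathrm{NE}}(\rho) \ge E_\tau(\rho) \tag{16}$$
on the entanglement cost of any state ρ under non-entangling operations. To show (16), let r > 0 be any number belonging to the set in the definition of $E_{c,\mathrm{NE}}$ (S20); in quantum information, this is known as an achievable rate for entanglement dilution. By definition, there exists a sequence of non-entangling operations $\Lambda_n \in \mathrm{NE}$ such that
$$\varepsilon_n := \tfrac12\left\|\Lambda_n\big(\Phi_2^{\otimes\lfloor rn\rfloor}\big) - \rho^{\otimes n}\right\|_1 \xrightarrow[n\to\infty]{} 0 . \tag{17}$$
Here we denoted by $\Phi_d := |\Phi_d\rangle\!\langle\Phi_d|$, with $|\Phi_d\rangle := \frac{1}{\sqrt d}\sum_{i=1}^d |ii\rangle$, a two-qudit maximally entangled state, and observed that $\Phi_2^{\otimes m} = \Phi_{2^m}$. A key step in our derivation is to write $\Phi_d$ (which is, naturally, a highly entangled state) as the difference of two multiples of separable states. (In fact, this procedure leads to the construction of a related entanglement monotone called the standard robustness of entanglement [35]; we consider it in detail in the Supplementary Information.) It has long been known that this can be done by setting
$$\Phi_d = d\,\sigma_+ - (d-1)\,\sigma_-, \qquad \sigma_+ := \frac{\mathbb{1} + d\,\Phi_d}{d(d+1)}, \qquad \sigma_- := \frac{\mathbb{1} - \Phi_d}{d^2-1},$$
where $\mathbb{1}$ stands for the identity on the two-qudit, $d^2$-dimensional Hilbert space. Crucially, $\sigma_\pm$ are both separable [66]. Applying a non-entangling operation Λ acting on a two-qudit system yields $\Lambda(\Phi_d) = d\,\Lambda(\sigma_+) - (d-1)\,\Lambda(\sigma_-)$. Since $\Lambda(\sigma_\pm)$ are again separable, we can then employ the observation that $\|\sigma^\Gamma\|_1 = 1$ for separable states σ (recall (9)) together with the triangle inequality for the trace norm, and conclude that
$$\left\|\Lambda(\Phi_d)^\Gamma\right\|_1 \le d + (d-1) = 2d-1 . \tag{18}$$
We are now ready to present our main argument, expressed by the chain of inequalities
$$2^{\lfloor rn\rfloor+1} - 1 \ge \left\|\Lambda_n\big(\Phi_2^{\otimes\lfloor rn\rfloor}\big)^\Gamma\right\|_1 \ge N_\tau\Big(\Lambda_n\big(\Phi_2^{\otimes\lfloor rn\rfloor}\big)\,\Big|\,\rho^{\otimes n}\Big) \ge (1-2\varepsilon_n)\,N_\tau\big(\rho^{\otimes n}\big|\rho^{\otimes n}\big) \ge (1-2\varepsilon_n)\,N_\tau(\rho|\rho)^n, \tag{19}$$
derived using (18) together with the above properties (a)-(c) of the tempered negativity. Evaluating the logarithm of both sides, dividing by n, and then taking the limit $n\to\infty$ gives $r \ge E_\tau(\rho)$, because $\frac1n\log_2(1-2\varepsilon_n) \to 0$ as $\varepsilon_n \to 0$. A minimisation over the achievable rates r > 0 then yields (16), according to the definition (S20). We now apply (16) to the two-qutrit state
$$\omega_3 := \frac16\sum_{i,j=1}^3\Big(|ii\rangle\!\langle ii| - |ii\rangle\!\langle jj|\Big) = \frac12\big(P_3 - \Phi_3\big), \qquad P_3 := \sum_{i=1}^3 |ii\rangle\!\langle ii| .$$
To compute its tempered logarithmic negativity, we construct an ansatz for the optimisation in the definition (11) of $N_\tau$ by setting $X_3 := 2P_3 - 3\Phi_3$. Since it is straightforward to verify that $\|X_3^\Gamma\|_\infty = 1$ and $\|X_3\|_\infty = 2 = \operatorname{Tr} X_3\omega_3$, this yields
$$E_{c,\mathrm{NE}}(\omega_3) \ge E_\tau(\omega_3) \ge \log_2 \operatorname{Tr} X_3\omega_3 = 1 . \tag{20}$$
In Supplementary Note III, we show that the above inequalities are in fact all equalities.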
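Both algebraic ingredients of this argument can be checked in a few lines of NumPy (a sketch; helper names are ours). Part (i) verifies one explicit decomposition $\Phi_d = d\,\sigma_+ - (d-1)\,\sigma_-$, with $\sigma_+ = (\mathbb{1}+d\Phi_d)/(d(d+1))$ and $\sigma_- = (\mathbb{1}-\Phi_d)/(d^2-1)$ both isotropic and PPT (hence separable [66]), which is consistent with the bound $2d-1$ in (18). Part (ii) checks the ansatz $X_3 = 2P_3 - 3\Phi_3$ for $\omega_3 = (P_3-\Phi_3)/2$, where $P_3 = \sum_i |ii\rangle\langle ii|$.

```python
import numpy as np

def partial_transpose(X, dA, dB):
    """Partial transpose on subsystem B (computational basis)."""
    return X.reshape(dA, dB, dA, dB).transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)

def max_entangled(d):
    """Projector onto |Phi_d> = d^{-1/2} sum_i |ii>."""
    v = np.eye(d).reshape(d * d) / np.sqrt(d)
    return np.outer(v, v)

def min_eig(X):
    return np.linalg.eigvalsh(X).min()

# (i) Phi_d = d * sigma_plus - (d - 1) * sigma_minus with isotropic, PPT sigma_pm
d = 4
phi = max_entangled(d)
I = np.eye(d * d)
sigma_plus = (I + d * phi) / (d * (d + 1))
sigma_minus = (I - phi) / (d * d - 1)
decomposition_ok = np.allclose(d * sigma_plus - (d - 1) * sigma_minus, phi)
ppt_ok = all(min_eig(s) > -1e-12 and min_eig(partial_transpose(s, d, d)) > -1e-12
             for s in (sigma_plus, sigma_minus))

# (ii) Ansatz X_3 = 2 P_3 - 3 Phi_3 for omega_3 = (P_3 - Phi_3) / 2
P3 = np.diag(np.eye(3).reshape(9))     # projector onto span{|ii>}
phi3 = max_entangled(3)
omega3 = (P3 - phi3) / 2
X3 = 2 * P3 - 3 * phi3
norm_X3_gamma = np.abs(np.linalg.eigvalsh(partial_transpose(X3, 3, 3))).max()  # expected 1
norm_X3 = np.abs(np.linalg.eigvalsh(X3)).max()                                 # expected 2
value = np.trace(X3 @ omega3)                                                   # expected 2
print(decomposition_ok, ppt_ok, norm_X3_gamma, norm_X3, value)
```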
It remains to upper bound the distillable entanglement of $\omega_3$. This can be done by estimating its relative entropy of entanglement [67], which quantifies its distance from the set of separable states as measured by the quantum relative entropy [68]. Simply taking the separable state $P_3/3$ as an ansatz shows that
$$E_{d,\mathrm{NE}}(\omega_3) \le E_R(\omega_3) \le D\big(\omega_3\,\big\|\,P_3/3\big) = \log_2\frac32 \approx 0.585, \tag{21}$$
and once again this estimate turns out to be tight. Combining (20) and (21) demonstrates a gap between $E_{d,\mathrm{NE}}$ and $E_{c,\mathrm{NE}}$, thus proving Theorem 1 on the irreversibility of entanglement theory under non-entangling operations.
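The relative-entropy estimate can likewise be checked numerically. This sketch (NumPy; `rel_entropy_bits` is our own helper, computed from eigendecompositions) evaluates $D(\omega_3\,\|\,P_3/3)$ for $\omega_3 = (P_3-\Phi_3)/2$ and compares it with the value 1 obtained for the tempered logarithmic negativity in (20).

```python
import numpy as np

def rel_entropy_bits(rho, sigma, tol=1e-12):
    """Quantum relative entropy D(rho||sigma) in bits, via eigendecompositions.
    Returns inf if supp(rho) is not contained in supp(sigma)."""
    p, A = np.linalg.eigh(rho)
    q, B = np.linalg.eigh(sigma)
    overlap = np.abs(A.conj().T @ B) ** 2      # |<a_i|b_j>|^2
    value = 0.0
    for i, p_i in enumerate(p):
        if p_i <= tol:
            continue
        value += p_i * np.log2(p_i)
        for j, q_j in enumerate(q):
            if overlap[i, j] > tol:
                if q_j <= tol:
                    return np.inf
                value -= p_i * overlap[i, j] * np.log2(q_j)
    return float(value)

v = np.eye(3).reshape(9) / np.sqrt(3)
phi3 = np.outer(v, v)
P3 = np.diag(np.eye(3).reshape(9))
omega3 = (P3 - phi3) / 2
upper_Ed = rel_entropy_bits(omega3, P3 / 3)   # right-hand side of (21)
lower_Ec = 1.0                                 # from (20)
print(upper_Ed, np.log2(1.5), lower_Ec - upper_Ed)
```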

E. Consequences and further considerations
Our results explicitly show that there cannot exist a single quantity that governs asymptotic entanglement transformations, thus ruling out a 'second law' of entanglement theory under non-entangling operations. Specifically, it is already known that, were such a quantity to exist, it would have to equal the regularised relative entropy of entanglement $E^\infty_{R,\mathcal S}$ [16,59]. But then consider the state $\omega_3$: its regularised relative entropy of entanglement equals $\log_2(3/2) < 1$, so a second law would force $E_{c,\mathrm{NE}}(\omega_3) = E_{d,\mathrm{NE}}(\omega_3) = \log_2(3/2)$. Thus, if the second law held, then from two copies of $\Phi_2$ one should be able to obtain (asymptotically) three copies of $\omega_3$. But Theorem 1 explicitly shows that only two copies of $\omega_3$ can be obtained from two copies of $\Phi_2$. An interesting aspect of our lower bound (20) on the entanglement cost is that it can therefore be strictly better than the (regularised) relative entropy bound. Previously known lower bounds on the entanglement cost which can be computed in practice are actually weaker than the relative entropy [38,69], which means that our methods provide a bound that is both computable and can improve on previous approaches.
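The copy counting above is plain arithmetic with asymptotic rates. As a quick sketch, taking the rates quoted in the text ($\log_2(3/2)$ ebits per copy of $\omega_3$ if a second law held, versus at least 1 ebit per copy by Theorem 1):

```python
import math

e_R_inf = math.log2(3 / 2)   # regularised relative entropy of omega_3, as quoted in the text
e_c_lower = 1.0              # lower bound on E_c,NE(omega_3) from Eq. (20)

ebits = 2
copies_if_second_law = ebits / e_R_inf   # about 3.42: at least three copies
copies_at_most = ebits / e_c_lower       # at most two copies, by Theorem 1
print(f"{copies_if_second_law:.3f} vs {copies_at_most:.3f}")
```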
As a final remark, we note that instead of the class of non-entangling (separability-preserving) operations, we could have considered all positive-partial-transpose-preserving maps, which are defined as those that map states with positive partial transpose to states with positive partial transpose. Within this latter approach we are able to establish an analogous irreversibility result for the theory of entanglement manipulation, recovering and strengthening the findings of Wang and Duan [38]. Explicit details are provided in the Supplementary Information.

F. Necessity of macroscopic entanglement generation
In Theorem 2, we strengthen the result of Theorem 1 further by considering operations that are not required to be non-entangling, but only approximately so, allowing for the possibility of microscopic fluctuations in the form of small amounts of entanglement being generated.
As we discussed in the main text, this mirrors the approach taken by Brandão and Plenio [23,24], where reversibility of entanglement was claimed under similar constraints. The reason we call that framework into question is that the entanglement generated by the 'asymptotically non-entangling maps' (ANE) employed there, despite being small when quantified by the generalised robustness, can actually be very large when gauged with another measure, such as the standard robustness or the negativity. Instead of demonstrating this with an explicit example, we prove an even stronger statement, namely, that irreversibility must persist if the generated entanglement is required to be small with respect to these other measures. It follows logically that any claimed restoration of reversibility requires macroscopic entanglement generation in the process.
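To this end, as described in the main text, we consider a sequence of operations $\Lambda_n$ which are $(M,\delta_n)$-approximately non-entangling, in the sense that
$$M\big(\Lambda_n(\sigma)\big) \le \delta_n \qquad \text{for all separable } \sigma, \tag{22}$$
where $(\delta_n)_{n\in\mathbb{N}} \subseteq \mathbb{R}_+$ is a sequence governing the restrictions on entanglement generation, and M is a choice of an entanglement measure. We denote the above class of operations as $\mathrm{NE}_M^{(\delta_n)}$, and the associated distillable entanglement and entanglement cost as $E_{d,\mathrm{NE}_M^{(\delta_n)}}$ and $E_{c,\mathrm{NE}_M^{(\delta_n)}}$. We take either $M = N$, where $N(\rho) := \frac12\left(\|\rho^\Gamma\|_1 - 1\right)$ is the negativity [27] (whose logarithmic version we already encountered in Eq. (8)), or $M = R_{\mathcal S}$, the standard robustness of entanglement [35], defined by
$$R_{\mathcal S}(\rho) := \inf\left\{ s\ge 0:\ \exists\ \text{separable state } \sigma:\ \tfrac{\rho + s\sigma}{1+s}\ \text{separable}\right\}.$$
Compare this with Brandão and Plenio's choice of the generalised robustness, given by
$$R^g_{\mathcal S}(\rho) := \inf\left\{ s\ge 0:\ \exists\ \text{state } \omega:\ \tfrac{\rho + s\omega}{1+s}\ \text{separable}\right\};$$
the only difference between the latter two expressions is whether or not the auxiliary state is required to be separable.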
Theorem 2 then tells us that as long as the generated entanglement stays sub-exponential according to $M = N$ or $M = R_{\mathcal S}$, irreversibility persists. The key step in proving this result is an approximate monotonicity of the two measures under all $(M,\delta_n)$-approximately non-entangling operations; specifically, we can show that under the application of any map $\Lambda_n$ satisfying (22), the corresponding measure cannot increase by more than a factor $(1+\delta_n)$. But if $\delta_n = 2^{o(n)}$, then any such additional term will vanish in the limit $n\to\infty$, because $\frac1n\log_2(1+\delta_n)\to 0$; this means that the basic idea of our proof of Theorem 1 can be applied almost unchanged, as the asymptotic bounds will not be affected by such sub-exponential entanglement generation. A full discussion of the proof and of the amount of entanglement generation required to achieve reversibility can be found in Supplementary Note IV. This contrasts with the result claimed by Brandão and Plenio [23,24]: there, choosing M as the generalised robustness $R^g_{\mathcal S}$ is conjectured [25] to yield full reversibility of the theory. In support of this conjecture, note that due to Brandão and Plenio's result concerned with entanglement dilution (whose proof is not affected by the aforementioned issue [25]), the entanglement cost of an arbitrary state under $(R^g_{\mathcal S},\delta_n)$-approximately non-entangling operations, with $\delta_n \to 0$ as $n\to\infty$, coincides with its regularised relative entropy of entanglement. In the case of $\omega_3$, this equals $\log_2(3/2)$, which matches its distillable entanglement. Therefore, while we still lack a general proof of reversibility that holds for all states, at least $\omega_3$ is a reversible state under Brandão and Plenio's asymptotically non-entangling operations, provided that one makes the choice $M = R^g_{\mathcal S}$. However, modifying this choice ever so slightly by picking the standard instead of the generalised robustness shatters reversibility altogether. The choice of the measure M in (22) is for all intents and purposes a free parameter and, as we just showed, a crucial one, on which the conclusion hinges.
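The role of the sub-exponential condition can be seen from the per-copy overhead $\frac1n\log_2(1+\delta_n)$ that a multiplicative factor $(1+\delta_n)$ contributes once logarithms are taken and divided by n. The sketch below compares two hypothetical schedules, $\delta_n = 2^{\sqrt n}$ (sub-exponential) and $\delta_n = 2^{n/2}$ (exponential); `np.logaddexp2` computes $\log_2(2^a + 2^b)$ without overflow.

```python
import numpy as np

def per_copy_overhead(log2_delta, n):
    """(1/n) * log2(1 + delta_n), computed stably from log2(delta_n)."""
    return float(np.logaddexp2(0.0, log2_delta)) / n

ns = [10**2, 10**4, 10**6]
sub_exponential = [per_copy_overhead(np.sqrt(n), n) for n in ns]   # delta_n = 2^sqrt(n)
exponential = [per_copy_overhead(0.5 * n, n) for n in ns]          # delta_n = 2^(n/2)
for n, s, e in zip(ns, sub_exponential, exponential):
    print(f"n={n}: sub-exponential {s:.4f}, exponential {e:.4f}")
```

The first overhead decays like $1/\sqrt n$ and disappears from the asymptotic rates, while the second stays at 1/2 per copy and can shift them.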
This ambiguity is precisely why no one framework of this type can be deemed more physical than another: there does not appear to be a reason to consider $M = R^g_{\mathcal S}$ a better motivated choice than $M = R_{\mathcal S}$. Due to the inability to unambiguously define a sensible notion of 'small' entanglement, especially when the macroscopic limit is involved, we thus posit that the only way to enforce fully physically consistent manipulation of entanglement is to forbid any entanglement generation whatsoever, as we have done in our approach based on non-entangling operations.

G. Extension to quantum communication
The setting of quantum communication is a strictly more general framework in which the manipulated objects are quantum channels themselves. Specifically, consider the situation where the separated parties Alice and Bob are attempting to communicate through a noisy quantum channel $\Lambda: \mathcal{T}(\mathcal{H}_A)\to\mathcal{T}(\mathcal{H}_B)$. To every such channel we associate its Choi-Jamiołkowski state, defined through the application of the channel Λ to one half of a maximally entangled state:
$$J_\Lambda := \big(\mathrm{id}_{A'}\otimes\Lambda\big)\big(\Phi_{d_A}\big),$$
where id denotes the identity channel and $d_A$ is the local dimension of Alice's system, assumed for now to be finite. Such a state encodes all information about a given channel [70,71]. The parallel with entanglement manipulation is then made clear by noticing that communicating one qubit of information is equivalent to Alice and Bob realising a noiseless qubit identity channel, $\mathrm{id}_2$. But the Choi-Jamiołkowski operator $J_{\mathrm{id}_2}$ is just the maximally entangled state $\Phi_2$, so the process of quantum communication can be understood as Alice and Bob trying to establish a 'maximally entangled state' in the form of a noiseless communication channel. The distillable entanglement in this setting is the (two-way assisted) quantum capacity of the channel [10], corresponding to the rate at which maximally entangled states can be extracted by the separated parties, and therefore the rate at which quantum information can be sent through the channel with asymptotically vanishing error. In a similar way, we can consider the entanglement cost of the channel [37], that is, the rate of pure entanglement that needs to be used in order to simulate the channel Λ.
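For finite dimensions, the Choi-Jamiołkowski state can be computed directly from its definition using $\Phi_d = \frac1d\sum_{i,j}|i\rangle\langle j|\otimes|i\rangle\langle j|$. This NumPy sketch (the function name `choi_state` is ours; channels are passed as Python functions acting on matrices) confirms that the identity channel gives $\Phi_2$, while the completely depolarising channel gives the product state $\mathbb{1}/4$.

```python
import numpy as np

def choi_state(channel, d_in):
    """J = (id (x) channel)(Phi_{d_in}), with Phi_d = (1/d) sum_ij |i><j| (x) |i><j|."""
    J = 0
    for i in range(d_in):
        for j in range(d_in):
            E = np.zeros((d_in, d_in))
            E[i, j] = 1.0                                # matrix unit |i><j|
            J = J + np.kron(E, channel(E)) / d_in
    return J

def max_entangled(d):
    v = np.eye(d).reshape(d * d) / np.sqrt(d)
    return np.outer(v, v)

identity_channel = lambda X: X
depolarising = lambda X: np.trace(X) * np.eye(2) / 2   # replaces any input by 1/2

J_id = choi_state(identity_channel, 2)
J_dep = choi_state(depolarising, 2)
print(np.allclose(J_id, max_entangled(2)), np.allclose(J_dep, np.eye(4) / 4))
```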
We sketch the basic idea here, as it is very similar to the approach we took for quantum states above. The complete details of the proof in the channel setting will be published elsewhere [29].
The major difference between quantum communication and the manipulation of static entanglement arises in the way that Alice and Bob can implement the processing of their channels. Having access to n copies of a quantum state ρ is fully equivalent to having the tensor product $\rho^{\otimes n}$ at one's disposal, but the situation is significantly more complex when n uses of a quantum channel Λ are available, as they can be exploited in many different ways: they can be used in parallel as $\Lambda^{\otimes n}$; or sequentially, where the output of one use of the channel can be used to influence the input to the subsequent uses; or even in more general ways which do not need to obey a fixed causal order between channel uses, and can exploit phenomena such as superposition of causal orders [72,73]. This motivates us, once again, to consider a general, axiomatic approach that covers all physically consistent ways to manipulate quantum channels, as long as they do not generate entanglement between Alice and Bob if it was not present in the first place. Specifically, we will consider the following. Given channels $\Lambda_1,\ldots,\Lambda_n$, we define an n-channel quantum process to be any n-linear map Υ such that $\Upsilon(\Lambda_1,\ldots,\Lambda_n)$ is also a valid quantum channel whenever $\Lambda_1,\ldots,\Lambda_n$ are. Now, channels $\Gamma_{A\to B}$ whose Choi-Jamiołkowski state $J_\Gamma$ is separable are known as entanglement-breaking channels [74]. We define a non-entangling process to be one such that $\Upsilon(\Gamma_1,\ldots,\Gamma_n)$ is entanglement breaking whenever $\Gamma_1,\ldots,\Gamma_n$ are all entanglement breaking.
The quantum capacity $Q_{\mathrm{NE}}(\Lambda)$ is then defined as the maximum rate r at which non-entangling n-channel processes can establish the noiseless communication channel $\mathrm{id}_2^{\otimes\lfloor rn\rfloor}$ when the channel Λ is used n times. As in the case of quantum state manipulation, the transformation error here is only required to vanish asymptotically. Analogously, the (parallel) entanglement cost $E_{c,\mathrm{NE}}(\Lambda)$ is given by the minimal rate at which noiseless identity channels $\mathrm{id}_2$ are required in order to simulate n parallel copies $\Lambda^{\otimes n}$ of the given communication channel Λ.
The first step of the extension of our results to the channel setting is then conceptually simple: we define the tempered (logarithmic) negativity of a channel as
$$E_\tau(\Lambda) := \sup_{\rho_{A'A}} E_\tau\Big(\big(\mathrm{id}_{A'}\otimes\Lambda\big)(\rho_{A'A})\Big),$$
where the supremum is over all bipartite quantum states $\rho_{A'A}\in\mathcal{T}(\mathcal{H}_{A'}\otimes\mathcal{H}_A)$ on two copies of the Hilbert space of Alice's system. A careful extension of the arguments we made for states, accounting in particular for the more complicated topological structure of quantum channels, can be shown [29] to give
$$E_{c,\mathrm{NE}}(\Lambda) \ge E_\tau(\Lambda) \tag{24}$$
for any $\Lambda: A\to B$, whether finite- or infinite-dimensional. For our example of an irreversible channel, we will use the qutrit-to-qutrit channel $\Omega_3$ whose Choi-Jamiołkowski state is $\omega_3$; namely,
$$\Omega_3(X) := \frac12\Big(3\,\Delta(X) - X\Big),$$
where $\Delta(\cdot) := \sum_{i=1}^3 |i\rangle\!\langle i|\,\cdot\,|i\rangle\!\langle i|$ is the completely dephasing channel. Our lower bound (24) on the entanglement cost then gives
$$E_{c,\mathrm{NE}}(\Omega_3) \ge E_\tau(\Omega_3) \ge E_\tau(\omega_3) = 1,$$
where we chose $\rho_{A'A} = \Phi_3$ in the supremum. To upper bound the quantum capacity of $\Omega_3$, several approaches are known. If the manipulation protocols we consider were restricted to adaptive quantum circuits, we could follow established techniques [10,75,76] and use the relative entropy to obtain a bound very similar to the one we employed in the state case (Eq. (21)). However, to maintain full generality, we will instead employ a recent result [63] which showed that an upper bound on $Q_{\mathrm{NE}}$ under the action of arbitrary non-entangling protocols (not restricted to quantum circuits, and not required to have a definite causal order) is given by the max-relative entropy [77] between a channel and all entanglement-breaking channels. Using the completely dephasing channel Δ, which is entanglement breaking, as an ansatz, we then get
$$Q_{\mathrm{NE}}(\Omega_3) \le D_{\max}(\Omega_3\|\Delta) = \log_2\frac32 < 1 \le E_{c,\mathrm{NE}}(\Omega_3),$$
establishing the irreversibility in the manipulation of quantum channels under the most general transformation protocols.
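As a numerical sanity check (a sketch, not part of the original argument), note that the channel whose Choi state is $\omega_3 = (P_3-\Phi_3)/2$ can be written as $\Omega_3(X) = \tfrac12(3\Delta(X) - X)$. The NumPy code below confirms that this map is trace preserving, that its Choi state is $\omega_3$ while that of Δ is $P_3/3$, and evaluates the max-relative entropy between the two Choi states. We assume here the standard fact that the channel max-relative entropy reduces to that of the Choi states; our helper `dmax_bits` also assumes $\operatorname{supp}\rho\subseteq\operatorname{supp}\sigma$.

```python
import numpy as np

def choi_state(channel, d_in):
    """J = (id (x) channel)(Phi_{d_in})."""
    J = 0
    for i in range(d_in):
        for j in range(d_in):
            E = np.zeros((d_in, d_in))
            E[i, j] = 1.0
            J = J + np.kron(E, channel(E)) / d_in
    return J

def dephase(X):
    """Completely dephasing channel Delta in the computational basis."""
    return np.diag(np.diag(X))

def omega3_channel(X):
    """Omega_3(X) = (3 Delta(X) - X) / 2."""
    return (3 * dephase(X) - X) / 2

def dmax_bits(rho, sigma, tol=1e-12):
    """log2 min{lam : rho <= lam sigma}; assumes supp(rho) is inside supp(sigma)."""
    q, B = np.linalg.eigh(sigma)
    keep = q > tol
    inv_sqrt = B[:, keep] @ np.diag(q[keep] ** -0.5) @ B[:, keep].conj().T
    return float(np.log2(np.linalg.eigvalsh(inv_sqrt @ rho @ inv_sqrt).max()))

v = np.eye(3).reshape(9) / np.sqrt(3)
P3 = np.diag(np.eye(3).reshape(9))
omega3 = (P3 - np.outer(v, v)) / 2
J_omega = choi_state(omega3_channel, 3)
J_delta = choi_state(dephase, 3)
print(np.allclose(J_omega, omega3), np.allclose(J_delta, P3 / 3))
print(dmax_bits(J_omega, J_delta), np.log2(1.5))
```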

No second law of entanglement manipulation after all - Supplementary Information
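When normalised to have trace equal to one, operators in $\mathcal{T}_+(\mathcal{H})$ form the set of density operators, which we will denote by $\mathcal{D}(\mathcal{H})$. Hereafter, the subscript sa (e.g. $\mathcal{B}_{\mathrm{sa}}$) indicates a restriction to self-adjoint operators. A linear map $\Lambda: \mathcal{T}(\mathcal{H}_A)\to\mathcal{T}(\mathcal{H}_B)$, i.e. from system A to system B, is said to be: (i) positive, if $\Lambda(X)\ge 0$ whenever $X\ge 0$; (ii) completely positive, if $\mathrm{id}_k\otimes\Lambda$ is positive for all $k\in\mathbb{N}$, where $\mathrm{id}_k$ denotes the identity channel on the space of $k\times k$ complex matrices; (iii) trace preserving, if $\operatorname{Tr}\Lambda(X) = \operatorname{Tr} X$ for all X.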
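We will denote the set of positive (respectively, completely positive) trace preserving maps from A to B with $\mathrm{PTP}_{A\to B}$ (respectively, $\mathrm{CPTP}_{A\to B}$). Given a bounded linear map $\Lambda: \mathcal{T}(\mathcal{H}_A)\to\mathcal{T}(\mathcal{H}_B)$ (here, 'bounded' is intended in the Banach space sense; that is, we require that $\|\Lambda(X)\|_1 \le C\|X\|_1$ for some constant $C<\infty$ and all $X\in\mathcal{T}(\mathcal{H}_A)$), we can consider its adjoint $\Lambda^\dagger: \mathcal{B}(\mathcal{H}_B)\to\mathcal{B}(\mathcal{H}_A)$, defined by the identity
$$\operatorname{Tr}\big[Y\,\Lambda(X)\big] = \operatorname{Tr}\big[\Lambda^\dagger(Y)\,X\big] \qquad \forall\ X\in\mathcal{T}(\mathcal{H}_A),\ Y\in\mathcal{B}(\mathcal{H}_B).$$
If Λ is a positive and trace preserving linear map, then it satisfies that
$$\|\Lambda(X)\|_1 \le \|X\|_1 \qquad \forall\ X\in\mathcal{T}_{\mathrm{sa}}(\mathcal{H}_A).$$
This in particular implies that Λ is bounded in the Banach space sense. Its adjoint $\Lambda^\dagger$ is positive and unital, meaning that $\Lambda^\dagger(\mathbb{1}_B) = \mathbb{1}_A$, and more generally
$$-\|Y\|_\infty\,\mathbb{1}_A \le \Lambda^\dagger(Y) \le \|Y\|_\infty\,\mathbb{1}_A \qquad \forall\ Y\in\mathcal{B}_{\mathrm{sa}}(\mathcal{H}_B). \tag{S3}$$
As mentioned in the main text, Werner, Holevo, and Shirokov [34] have shown that a state $\sigma_{AB}$ is separable if and only if it can be expressed as
$$\sigma_{AB} = \int \psi_A\otimes\phi_B\ \mathrm{d}\mu(\psi,\phi),$$
where μ is a Borel probability measure on the product of the sets of local (normalised) pure states. The cone generated by the set of separable states will be denoted with
$$\mathcal{S}_{A:B} := \left\{\lambda\,\sigma_{AB}:\ \lambda\ge0,\ \sigma_{AB}\ \text{separable}\right\}. \tag{S6}$$
As an outer approximation to $\mathcal{S}_{A:B}$, one often employs the cone of positive operators that also have a positive partial transpose (PPT). In formula, this is given by
$$\mathrm{PPT}_{A:B} := \left\{X\in\mathcal{T}_+(\mathcal{H}_A\otimes\mathcal{H}_B):\ X^\Gamma\ge 0\right\}, \tag{S7}$$
and Γ stands for the partial transpose. As recalled already in the main text, this is defined by the expression $(X\otimes Y)^\Gamma := X\otimes Y^\intercal$ on simple tensors, and is extended to the whole $\mathcal{T}(\mathcal{H}_A\otimes\mathcal{H}_B)$ by linearity and continuity. Some subtleties related to the infinite-dimensional case are discussed in Supplementary Note VII. It has been long known that [58]
$$\mathcal{S}_{A:B} \subseteq \mathrm{PPT}_{A:B} \tag{S9}$$
for all bipartite systems AB. Leveraging this fact, one can introduce an easily computable entanglement measure known as the logarithmic negativity, given by [27,57]
$$E_N(\rho) := \log_2\left\|\rho^\Gamma\right\|_1 = \log_2\sup\left\{\operatorname{Tr} X\rho:\ \left\|X^\Gamma\right\|_\infty\le 1\right\}. \tag{S10}$$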

Note.
Hereafter, unless otherwise specified, K will denote one of the two cones K = S or K = PPT defined by (S6) and (S7), respectively.
All states from now on are understood to be on a bipartite system AB, although we will often drop the subscripts for the sake of readability. The (standard) K-robustness of a state ρ is defined as
$$R_K(\rho) := \inf\left\{\operatorname{Tr}\Delta:\ \Delta\in K,\ \rho+\Delta\in K\right\}. \tag{S11}$$
Here, it is understood that the variable Δ is a trace-class operator. Much of the appeal of the expression in (S11) is that it is a convex optimisation program, and even a semi-definite program (SDP) for the special case K = PPT (however, it is an infinite-dimensional optimisation when ρ acts on an infinite-dimensional space). Note that $R_{\mathrm{PPT}}(\rho) \le R_{\mathcal S}(\rho)$ holds for all states ρ, as a simple inspection of (S11) reveals.
Note. Our definition of robustness $R_K$ follows the convention of Vidal and Tarrach [35]. The robustness as constructed in Refs. [78,79], instead, would be expressed as $1 + R_K$ in our current notation.
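It turns out that $1 + 2R_K(\rho)$ is nothing but the base norm of $\rho = \rho_{AB}$ as computed in the base norm space $\left(\mathcal{T}_{\mathrm{sa}}(\mathcal{H}_{AB}), K, \operatorname{Tr}\right)$. This is proved in Ref. [78, Lemma 25] for the case K = S, and in Supplementary Note VII for the case K = PPT. Leveraging this correspondence, one can establish the dual representation [78, Eq. (23) and Lemma 25]
$$1 + 2R_K(\rho) = \sup\left\{\operatorname{Tr} X\rho:\ X\in[-\mathbb{1},\mathbb{1}]_{K^*}\right\}, \tag{S12}$$
where
$$K^* := \left\{X\in\mathcal{B}_{\mathrm{sa}}(\mathcal{H}_{AB}):\ \operatorname{Tr} XY\ge 0\ \ \forall\,Y\in K\right\}, \tag{S13}$$
$$[-\mathbb{1},\mathbb{1}]_{K^*} := \left\{X\in\mathcal{B}_{\mathrm{sa}}(\mathcal{H}_{AB}):\ \mathbb{1}-X\in K^*,\ \mathbb{1}+X\in K^*\right\}. \tag{S14}$$
The notation $[-\mathbb{1},\mathbb{1}]_{K^*}$ is motivated by the fact that this set can be understood as an operator interval with respect to the cone $K^*$, in the sense that $[-\mathbb{1},\mathbb{1}]_{K^*} = (\mathbb{1}-K^*)\cap(-\mathbb{1}+K^*)$. Combining expressions (S11) and (S12) allows us to compute the robustness exactly in some cases. For instance, for a pure state $\Psi = |\Psi\rangle\!\langle\Psi|$ with Schmidt decomposition $|\Psi\rangle = \sum_{i=0}^\infty \sqrt{\lambda_i}\,|e_i\rangle\otimes|f_i\rangle$ it holds that [35,78,79]
$$R_K(\Psi) = \Big(\sum\nolimits_i\sqrt{\lambda_i}\Big)^2 - 1. \tag{S15}$$
The special case of this formula where $\Psi = \Phi_2^{\otimes m}$ is made of m copies of an entanglement bit $|\Phi_2\rangle = \frac{1}{\sqrt2}\left(|00\rangle + |11\rangle\right)$ is especially useful. We obtain that
$$R_K\big(\Phi_2^{\otimes m}\big) = 2^m - 1. \tag{S16}$$
On a different note, by combining the expression in (S12) with the elementary observation that every $X\in\mathcal{B}_{\mathrm{sa}}(\mathcal{H}_{AB})$ with $\|X^\Gamma\|_\infty\le 1$ belongs to $[-\mathbb{1},\mathbb{1}]_{K^*}$, one deduces the following.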
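Formula (S15) only needs the Schmidt coefficients $\sqrt{\lambda_i}$, which are the singular values of the $d_A\times d_B$ coefficient matrix of $|\Psi\rangle$. A NumPy sketch (helper names ours, finite dimensions assumed) reproduces (S16), using that $\Phi_2^{\otimes m}$ coincides with $\Phi_{2^m}$ up to a reordering of local subsystems, which leaves the Schmidt coefficients unchanged.

```python
import numpy as np

def robustness_pure(psi, dA, dB):
    """(sum_i sqrt(lambda_i))^2 - 1, as in (S15); the singular values of the
    dA x dB coefficient matrix of |psi> are the Schmidt coefficients sqrt(lambda_i)."""
    s = np.linalg.svd(psi.reshape(dA, dB), compute_uv=False)
    return float(np.sum(s) ** 2 - 1)

def ebits(m):
    """|Phi_{2^m}>, i.e. Phi_2^{(x) m} with Alice's and Bob's qubits grouped together."""
    d = 2 ** m
    return np.eye(d).reshape(d * d) / np.sqrt(d)

product = np.kron([1.0, 0.0], [0.0, 1.0])                  # |0>|1>, no entanglement
theta = np.pi / 8
partial = np.array([np.cos(theta), 0.0, 0.0, np.sin(theta)])  # cos|00> + sin|11>

for m in range(1, 5):
    print(m, robustness_pure(ebits(m), 2**m, 2**m))   # matches 2^m - 1
print(robustness_pure(product, 2, 2), robustness_pure(partial, 2, 2))  # 0 and sin(2 theta)
```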
Lemma S1. For all states ρ, it holds that
$$\left\|\rho^\Gamma\right\|_1 \le 1 + 2R_K(\rho). \tag{S17}$$
We will also need some notation for the set of quantum channels that preserve separability and PPT-ness or, in short, K-ness. Remember that we denote with $(\mathrm{C})\mathrm{PTP}_{A\to B}$ the set of (completely) positive and trace-preserving maps from A to B. Since our results actually hold for all positive transformations, even those that are not completely positive, we will drop the complete positivity assumption from now on.
Thus, let us define K-preserving operations between two bipartite systems AB and A'B' by
$$\mathrm{KP}(AB\to A'B') := \left\{\Lambda\in\mathrm{PTP}_{AB\to A'B'}:\ \Lambda(K_{A:B})\subseteq K_{A':B'}\right\}. \tag{S18}$$
When K = S, the above identity defines the set of non-entangling (or separability-preserving) operations employed in the main text, denoted NE. When K = PPT, we obtain the family of PPT-preserving operations instead, denoted PPTP. Note that the name of 'PPT-preserving' maps has been used in the literature to refer to several distinct concepts; we stress that here we only impose that Λ(X) is PPT whenever X is.
Let us comment briefly on this choice of transformations. As discussed in the main text, the intention here is to be as general as possible: it would be beside the point to ask ourselves whether all non-entangling transformations are physically implementable in any given theory; in general this might not always be the case. In exactly the same way, not all transformations that obey the second law of thermodynamics are physical: consider e.g. one that does not preserve electric charge or angular momentum. The assumption of no entanglement generation is merely a necessary condition for a transformation to be physical: any practical process used for entanglement manipulation should not create extra entanglement from nothing, and hence should be in $\mathrm{KP}(AB\to A'B')$. We record the following elementary yet important observation. Lemma S2. For K = S, PPT, the K-robustness $R_K$ defined in (S11) is monotonic under K-preserving operations, i.e. $R_K(\Lambda(\rho)) \le R_K(\rho)$ for all states ρ and all $\Lambda\in\mathrm{KP}(AB\to A'B')$.
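Proof. The proof follows standard arguments, and indeed an even stronger variant of the monotonicity of the robustnesses (selective monotonicity) has been shown e.g. in Ref. [80]; we repeat the basic argument here only for the sake of convenience. For a bipartite state ρ, an arbitrary η > 0, and some $\Lambda\in\mathrm{KP}(AB\to A'B')$, let $\Delta\in K$ be such that $\rho+\Delta\in K$ and $\operatorname{Tr}\Delta \le R_K(\rho) + \eta$. Then $\Lambda(\Delta)\in K$ and also $\Lambda(\rho) + \Lambda(\Delta) = \Lambda(\rho+\Delta)\in K$, while $\operatorname{Tr}\Lambda(\Delta) = \operatorname{Tr}\Delta \le R_K(\rho)+\eta$. Since this holds for arbitrary η > 0, we deduce that $R_K(\Lambda(\rho)) \le R_K(\rho)$, as claimed.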
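We now recall the definitions of distillable entanglement and entanglement cost of a state under K-preserving operations. We present here a slightly more general construction than in the Methods section of the main text, namely, one which incorporates a non-zero asymptotic error. Following e.g. Vidal and Werner [27, Eq. (43)], for an arbitrary ε ∈ [0,1) let us set
$$E^{(\varepsilon)}_{d,\mathrm{KP}}(\rho) := \sup\left\{r:\ \exists\,(\Lambda_n)_n\subseteq\mathrm{KP}:\ \limsup_{n\to\infty}\tfrac12\left\|\Lambda_n(\rho^{\otimes n}) - \Phi_2^{\otimes\lfloor rn\rfloor}\right\|_1\le\varepsilon\right\}, \tag{S19}$$
$$E^{(\varepsilon)}_{c,\mathrm{KP}}(\rho) := \inf\left\{r:\ \exists\,(\Lambda_n)_n\subseteq\mathrm{KP}:\ \limsup_{n\to\infty}\tfrac12\left\|\Lambda_n\big(\Phi_2^{\otimes\lfloor rn\rfloor}\big) - \rho^{\otimes n}\right\|_1\le\varepsilon\right\}. \tag{S20}$$
For a fixed ρ, the function $\varepsilon\mapsto E^{(\varepsilon)}_{d,\mathrm{KP}}(\rho)$ is non-decreasing in ε, while $\varepsilon\mapsto E^{(\varepsilon)}_{c,\mathrm{KP}}(\rho)$ is non-increasing. Also, note that $E_{d,\mathrm{KP}}(\rho) := E^{(0)}_{d,\mathrm{KP}}(\rho)$ and $E_{c,\mathrm{KP}}(\rho) := E^{(0)}_{c,\mathrm{KP}}(\rho)$ coincide with the quantities discussed in the Methods section of the main text. A variation on the notions of distillable entanglement and entanglement cost can be obtained by looking only at exact transformations. The corresponding modified entanglement measures, the exact distillable entanglement $E^{\mathrm{exact}}_{d,\mathrm{KP}}(\rho)$ (S21) and the exact entanglement cost $E^{\mathrm{exact}}_{c,\mathrm{KP}}(\rho)$ (S22), are defined as in (S19) and (S20), respectively, but with the asymptotic error condition replaced by the exact equality $\Lambda_n(\rho^{\otimes n}) = \Phi_2^{\otimes\lfloor rn\rfloor}$ (respectively, $\Lambda_n(\Phi_2^{\otimes\lfloor rn\rfloor}) = \rho^{\otimes n}$). Although less operationally meaningful than their error-tolerant counterparts (S19)-(S20), the exact distillable entanglement and the exact entanglement cost are nevertheless useful sometimes. For instance, they can come in handy in establishing bounds, thanks to the simple inequalities
$$E^{\mathrm{exact}}_{d,\mathrm{KP}}(\rho) \le E^{(\varepsilon)}_{d,\mathrm{KP}}(\rho), \qquad E^{(\varepsilon)}_{c,\mathrm{KP}}(\rho) \le E^{\mathrm{exact}}_{c,\mathrm{KP}}(\rho), \tag{S23}$$
which hold for all bipartite states ρ and all ε ∈ [0,1). We now introduce formally the notion of reversibility for the theory of entanglement manipulation under K-preserving transformations: we call the theory reversible if $E_{d,\mathrm{KP}}(\rho) = E_{c,\mathrm{KP}}(\rho)$ for all bipartite states ρ. It is not difficult to realise that neither of the two families of K-preserving operations, for K = S and K = PPT, is a subset of the other. Hence, one would be tempted to deduce that the distillable entanglement and the entanglement cost under non-entangling and PPT-preserving operations do not obey any general inequality. This is however not the case, and the reason must be traced back to the very high degree of symmetry exhibited by the maximally entangled state and, more precisely, to the fact that for isotropic states the 'PPT criterion' [58] is necessary and sufficient for separability [66]. The operational relation between the two classes can be inferred from [24, Remark on pp. 843-844] already, and we can formalise this observation as follows.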

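Lemma S4. For all bipartite states ρ and all ε ∈ [0,1),
$$E^{(\varepsilon)}_{d,\mathrm{PPTP}}(\rho) \le E^{(\varepsilon)}_{d,\mathrm{NE}}(\rho), \qquad E^{(\varepsilon)}_{c,\mathrm{PPTP}}(\rho) \le E^{(\varepsilon)}_{c,\mathrm{NE}}(\rho).$$
In particular, $E_{d,\mathrm{PPTP}}(\rho) \le E_{d,\mathrm{NE}}(\rho)$ and $E_{c,\mathrm{PPTP}}(\rho) \le E_{c,\mathrm{NE}}(\rho)$.
Proof. We prove only the first inequality, as all the others are completely analogous. Let r be an achievable rate for $E^{(\varepsilon)}_{d,\mathrm{PPTP}}(\rho)$ at error threshold ε ∈ [0,1), and let $\Lambda_n\in\mathrm{PPTP}$ be PPT-preserving operations satisfying that $\limsup_{n\to\infty}\frac12\left\|\Lambda_n(\rho^{\otimes n}) - \Phi_2^{\otimes\lfloor rn\rfloor}\right\|_1 \le \varepsilon$. To proceed further, set $m := \lfloor rn\rfloor$ and define the twirling operation $\mathcal{T}$ on a $2^m\times 2^m$ bipartite system by [66]
$$\mathcal{T}(X) := \int \mathrm{d}U\ (U\otimes\bar U)\,X\,(U\otimes\bar U)^\dagger, \tag{S28}$$
where dU denotes the Haar measure over the (local) unitary group. Note that $\mathcal{T}\in\mathrm{NE}\cap\mathrm{PPTP}$ (in fact, $\mathcal{T}$ can be physically implemented with local operations and shared randomness) and that the output states of $\mathcal{T}$ are all isotropic, that is, they are linear combinations of $\Phi_2^{\otimes m}$ and the maximally mixed state. For states of this form, it is known that the PPT criterion is necessary and sufficient for separability [66].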
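We now claim that $\mathcal{T}\circ\Lambda_n\in\mathrm{NE}$. To see why this is the case, note that for all states $\sigma\in\mathcal{S}\subseteq\mathrm{PPT}$ it holds that $\Lambda_n(\sigma)\in\mathrm{PPT}$ and hence $\mathcal{T}\circ\Lambda_n(\sigma)\in\mathrm{PPT}$. However, since the latter is an isotropic state, we conclude that in fact $\mathcal{T}\circ\Lambda_n(\sigma)\in\mathcal{S}$, proving the claim. Observe also that
$$\tfrac12\left\|\mathcal{T}\circ\Lambda_n(\rho^{\otimes n}) - \Phi_2^{\otimes m}\right\|_1 = \tfrac12\left\|\mathcal{T}\big(\Lambda_n(\rho^{\otimes n}) - \Phi_2^{\otimes m}\big)\right\|_1 \le \tfrac12\left\|\Lambda_n(\rho^{\otimes n}) - \Phi_2^{\otimes m}\right\|_1,$$
where the equality holds because $\mathcal{T}(\Phi_2^{\otimes m}) = \Phi_2^{\otimes m}$, and the inequality is a consequence of the contractivity of the trace norm under positive trace preserving maps [81], or more mundanely of the triangle inequality applied to the integral representation (S28) of $\mathcal{T}$. The above relation implies that the distillation rate r is achievable at error threshold ε by means of the separability-preserving operations $\mathcal{T}\circ\Lambda_n$, i.e. $E^{(\varepsilon)}_{d,\mathrm{NE}}(\rho)\ge r$. Taking the supremum over all such r, we obtain the sought inequality $E^{(\varepsilon)}_{d,\mathrm{NE}}(\rho) \ge E^{(\varepsilon)}_{d,\mathrm{PPTP}}(\rho)$.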
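The twirl (S28) has a standard closed form: it keeps the overlap $F = \operatorname{Tr} X\Phi_d$ and spreads the remaining weight uniformly over the orthogonal complement, $\mathcal{T}(X) = F\,\Phi_d + (\operatorname{Tr}X - F)\,\frac{\mathbb{1}-\Phi_d}{d^2-1}$. The NumPy sketch below uses this closed form (rather than evaluating a Haar integral), checks that its output is invariant under a random $U\otimes\bar U$, and checks that an isotropic state is PPT exactly when $F\le 1/d$, which is also the separability threshold [66]. All function names are ours.

```python
import numpy as np

def max_entangled(d):
    """Projector onto |Phi_d> = d^{-1/2} sum_i |ii>."""
    v = np.eye(d).reshape(d * d) / np.sqrt(d)
    return np.outer(v, v)

def partial_transpose(X, dA, dB):
    """Partial transpose on subsystem B (computational basis)."""
    return X.reshape(dA, dB, dA, dB).transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)

def isotropic_twirl(X, d):
    """Closed form of the U (x) conj(U) twirl."""
    phi = max_entangled(d)
    f = np.trace(X @ phi).real
    t = np.trace(X).real
    return f * phi + (t - f) * (np.eye(d * d) - phi) / (d * d - 1)

def isotropic(F, d):
    """Isotropic state with fidelity F = Tr[rho Phi_d]."""
    phi = max_entangled(d)
    return F * phi + (1 - F) * (np.eye(d * d) - phi) / (d * d - 1)

def is_ppt(X, d, tol=1e-12):
    return bool(np.linalg.eigvalsh(partial_transpose(X, d, d)).min() >= -tol)

d = 3
product = np.zeros((d * d, d * d))
product[0, 0] = 1.0                      # |00><00|, a separable input
out = isotropic_twirl(product, d)

# Invariance under U (x) conj(U) for a random unitary U (QR of a complex Gaussian matrix)
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
W = np.kron(U, U.conj())
invariant = np.allclose(W @ out @ W.conj().T, out)

fidelities = [0.1, 1 / d, 0.4, 0.9]
ppt_flags = [is_ppt(isotropic(F, d), d) for F in fidelities]
print(invariant, ppt_flags)              # PPT exactly when F <= 1/d
```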

II. TEMPERED ROBUSTNESS AND TEMPERED NEGATIVITY
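The main idea is to introduce a modified version of the standard K-robustness in (S11) by modifying the dual program in (S12). Namely, for a pair of states ρ, ω, let us define the tempered K-robustness by
$$1 + 2R^\tau_K(\rho|\omega) := \sup\left\{\operatorname{Tr} X\rho:\ X\in[-\mathbb{1},\mathbb{1}]_{K^*},\ \|X\|_\infty = \operatorname{Tr} X\omega\right\}. \tag{S30}$$
Here, the operator interval $[-\mathbb{1},\mathbb{1}]_{K^*}$ is defined by (S14). Since $\operatorname{Tr} X\omega \le \|X\|_\infty$ holds automatically for any state ω, the constraint $\|X\|_\infty = \operatorname{Tr} X\omega$ can be rewritten as $-(\operatorname{Tr} X\omega)\,\mathbb{1} \le X \le (\operatorname{Tr} X\omega)\,\mathbb{1}$. Therefore, the expression in (S30) is a convex program, and even an SDP for the special case K = PPT. What this additional constraint is telling us is that the support supp ω of ω lies entirely within the eigenspace of X corresponding to its largest eigenvalue, which must then coincide with $\|X\|_\infty$. At this point, a little thought shows that the feasible set of (S30) depends on ω only through supp ω, so that $R^\tau_K(\rho|\omega)$ depends in fact only on ρ and supp ω.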
We also introduce a further quantity, the tempered negativity, defined by
$$N_\tau(\rho|\omega) := \sup\left\{\operatorname{Tr} X\rho:\ \left\|X^\Gamma\right\|_\infty\le 1,\ \|X\|_\infty = \operatorname{Tr} X\omega\right\}. \tag{S32}$$
Exactly as above, it does not take long to realise that the expression in (S32) is in fact an SDP, and that $N_\tau(\rho|\omega)$ depends only on ρ and supp ω. The corresponding tempered logarithmic negativity is
$$E_\tau(\rho) := \log_2 N_\tau(\rho|\rho). \tag{S34}$$
The main elementary properties of the tempered robustness and negativity, related to their monotonicity, multiplicativity, and various bounds between the quantities, are gathered in the following proposition.
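Proposition S5. For K = S, PPT and for all pairs of states ρ, ω on a bipartite system AB, it holds that:
(a) $0 \le R^\tau_K(\rho|\omega) \le R_K(\rho)$, and moreover $R_K(\rho) = \sup_\omega R^\tau_K(\rho|\omega)$;
(b) $1 \le N_\tau(\rho|\omega) \le \|\rho^\Gamma\|_1$, and moreover $\|\rho^\Gamma\|_1 = \sup_\omega N_\tau(\rho|\omega)$;
(c) $R^\tau_K$ is monotonic under the simultaneous action of any K-preserving map $\Lambda\in\mathrm{KP}(AB\to A'B')$, in formula $R^\tau_K\big(\Lambda(\rho)\,\big|\,\Lambda(\omega)\big) \le R^\tau_K(\rho|\omega)$;
(d) the inequalities
$$1 + 2R^\tau_{\mathcal S}(\rho|\omega) \ge 1 + 2R^\tau_{\mathrm{PPT}}(\rho|\omega) \ge N_\tau(\rho|\omega) \tag{S36}$$
are satisfied;
(e) $N_\tau$ is super-multiplicative: $N_\tau(\rho_1\otimes\rho_2\,|\,\omega_1\otimes\omega_2) \ge N_\tau(\rho_1|\omega_1)\,N_\tau(\rho_2|\omega_2)$ for all states $\rho_1,\omega_1$ on $A_1B_1$ and $\rho_2,\omega_2$ on $A_2B_2$, the partial transpose being taken with respect to the bipartition $A_1A_2:B_1B_2$.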
Proof. We proceed one claim at a time.
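(a) Taking $X = \mathbb{1}$ in the definition (S30) of $R^\tau_K$ yields immediately that $R^\tau_K(\rho|\omega)\ge 0$. Also, since we obtained (S30) by adding one more constraint to the dual program (S12) for the standard robustness, it is clear that the value of the supremum can only decrease, implying that $R^\tau_K(\rho|\omega)\le R_K(\rho)$. On the other hand, it is not difficult to verify that the operators X in the dual formulation (S12) of $R_K$ can always be assumed to be compact, and in fact even of finite rank. Indeed, thanks to the fact that $\mathcal{H}_A$ and $\mathcal{H}_B$ are separable Hilbert spaces, we can pick sequences of finite-dimensional projectors $(P_n)_{n\in\mathbb{N}}$ on $\mathcal{H}_A$ and $(Q_n)_{n\in\mathbb{N}}$ on $\mathcal{H}_B$ converging to the identity in the strong operator topology, and set $X_n := (P_n\otimes Q_n)\,X\,(P_n\otimes Q_n)$. Since local projections map K into itself without increasing the trace, $X_n\in[-\mathbb{1},\mathbb{1}]_{K^*}$ whenever $X\in[-\mathbb{1},\mathbb{1}]_{K^*}$; moreover, $\operatorname{Tr} X_n\rho\to\operatorname{Tr} X\rho$ as $n\to\infty$. This shows that X in (S12) can be taken to be of finite rank. For any such X, there will exist a state ω such that $\|X\|_\infty = \operatorname{Tr} X\omega$; in fact, it suffices to have the support of ω span the eigenspace of X corresponding to the eigenvalue with maximum modulus. Therefore, $R_K(\rho) = \sup_\omega R^\tau_K(\rho|\omega)$. (b) The lower bound $N_\tau(\rho|\omega)\ge 1$ can be retrieved by setting $X = \mathbb{1}$ in (S32). The fact that $N_\tau(\rho|\omega)\le\|\rho^\Gamma\|_1$ follows by comparing (S32) with the dual form of the negativity on the rightmost side of (S10). The equality $\|\rho^\Gamma\|_1 = \sup_\omega N_\tau(\rho|\omega)$ is proved as for claim (a). One starts by showing that the operator X in the rightmost side of (S10) can be assumed to have finite rank. To see this, it suffices to observe that $X_n = (P_n\otimes Q_n)\,X\,(P_n\otimes Q_n)$, defined as before, satisfies $X_n^\Gamma = (P_n\otimes Q_n^\intercal)\,X^\Gamma\,(P_n\otimes Q_n^\intercal)$, hence $\|X_n^\Gamma\|_\infty\le\|X^\Gamma\|_\infty\le 1$; since moreover $\operatorname{Tr} X_n\rho\to\operatorname{Tr} X\rho$, considering the sequence of finite-rank operators $(X_n)_n$ instead of X in (S10) leads to the same value of the optimisation. The rest of the argument proceeds as in (a).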
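(c) It suffices to write that
$$1 + 2R^\tau_K\big(\Lambda(\rho)\big|\Lambda(\omega)\big) = \sup\left\{\operatorname{Tr} Y\Lambda(\rho):\ Y\in[-\mathbb{1},\mathbb{1}]_{K^*_{A'B'}},\ \|Y\|_\infty = \operatorname{Tr} Y\Lambda(\omega)\right\}$$
$$\overset{\text{(i)}}{=} \sup\left\{\operatorname{Tr}\Lambda^\dagger(Y)\,\rho:\ Y\in[-\mathbb{1},\mathbb{1}]_{K^*_{A'B'}},\ \|Y\|_\infty = \operatorname{Tr}\Lambda^\dagger(Y)\,\omega\right\} \overset{\text{(ii)}}{\le} \sup\left\{\operatorname{Tr} X\rho:\ X\in[-\mathbb{1},\mathbb{1}]_{K^*_{AB}},\ \|X\|_\infty = \operatorname{Tr} X\omega\right\} = 1 + 2R^\tau_K(\rho|\omega).$$
Note that in (i) we just used the definition of adjoint map. Justifying (ii) requires a bit more care. We start by observing that $\Lambda^\dagger(Y)\in[-\mathbb{1},\mathbb{1}]_{K^*_{AB}}$: for all $Z\in K_{AB}$ with $\operatorname{Tr} Z = 1$,
$$\operatorname{Tr}\big(\mathbb{1}\pm\Lambda^\dagger(Y)\big)Z = \operatorname{Tr}\big(\mathbb{1}\pm Y\big)\Lambda(Z) \ge 0,$$
where the inequality holds because $\Lambda(Z)$ is a normalised quantum state and belongs to K. (We remind the reader that our definition of K-preserving maps imposes that any such map is also positive and trace preserving.) Moreover, $\|\Lambda^\dagger(Y)\|_\infty\le\|Y\|_\infty = \operatorname{Tr} Y\Lambda(\omega) = \operatorname{Tr}\Lambda^\dagger(Y)\,\omega$ thanks to the positivity and unitality of $\Lambda^\dagger$ (see (S3)); this is in fact an equality, because on the other hand $\operatorname{Tr}\Lambda^\dagger(Y)\,\omega\le\|\Lambda^\dagger(Y)\|_\infty\|\omega\|_1 = \|\Lambda^\dagger(Y)\|_\infty$. Since $X = \Lambda^\dagger(Y)$ satisfies $X\in[-\mathbb{1},\mathbb{1}]_{K^*}$ and moreover $\|X\|_\infty = \operatorname{Tr} X\omega$, we deduce the inequality in (ii).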
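(d) It is easy to see from (S30) that $R^\tau_K(\rho|\omega)$ is monotonically decreasing with respect to the inclusion ordering on the cone K for all fixed ρ and ω, meaning that $K_1\subseteq K_2$ implies that $R^\tau_{K_1}(\rho|\omega)\ge R^\tau_{K_2}(\rho|\omega)$ (indeed, $K_1\subseteq K_2$ entails $K_2^*\subseteq K_1^*$, so the feasible set of (S30) can only grow when passing from $K_2$ to $K_1$). Since S ⊆ PPT, the first inequality in (S36) follows. We now move on to the second. Note that $\|X^\Gamma\|_\infty\le 1$ entails that $X\in[-\mathbb{1},\mathbb{1}]_{\mathrm{PPT}^*}$, simply because for all $Y\in\mathrm{PPT}$ with $\operatorname{Tr} Y = 1$ one has that
$$\left|\operatorname{Tr} XY\right| = \left|\operatorname{Tr} X^\Gamma Y^\Gamma\right| \le \left\|X^\Gamma\right\|_\infty\left\|Y^\Gamma\right\|_1 \le 1,$$
where we remembered that $Y^\Gamma\ge 0$ and hence $\|Y^\Gamma\|_1 = \operatorname{Tr} Y^\Gamma = \operatorname{Tr} Y = 1$. Hence, the set on the right-hand side of (S32) is contained in that on the right-hand side of (S30) for K = PPT, which shows that $N_\tau(\rho|\omega)\le 1 + 2R^\tau_{\mathrm{PPT}}(\rho|\omega)$; by the first inequality, this also gives $N_\tau(\rho|\omega)\le 1 + 2R^\tau_{\mathcal S}(\rho|\omega)$.
(e) If $X_1$ and $X_2$ are feasible for $N_\tau(\rho_1|\omega_1)$ and $N_\tau(\rho_2|\omega_2)$, respectively, then $X_1\otimes X_2$ is feasible for $N_\tau(\rho_1\otimes\rho_2|\omega_1\otimes\omega_2)$, because $(X_1\otimes X_2)^\Gamma = X_1^\Gamma\otimes X_2^\Gamma$ and the operator norm is multiplicative under tensor products, so that $\|X_1\otimes X_2\|_\infty = \operatorname{Tr} X_1\omega_1\operatorname{Tr} X_2\omega_2 = \operatorname{Tr}(X_1\otimes X_2)(\omega_1\otimes\omega_2)$. Since the objective factorises in the same way, taking suprema yields the claim.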
In addition to the basic properties established above, our main results will rely on one more technical property of the tempered quantities. This is a perturbative version of Proposition S5(a), allowing us to relate the robustness $R_K(\rho)$ of a given state ρ with the tempered robustness $R^\tau_K(\omega|\omega)$ of another state ω which is sufficiently close to it. The following lemma can quite rightly be regarded as lying at the heart of our method.
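Lemma S6 (The ε-lemma). For all states ρ, ω such that
$$\tfrac12\|\rho-\omega\|_1\le\varepsilon, \tag{S41}$$
it holds that
$$R_K(\rho) \ge R^\tau_K(\rho|\omega) \ge (1-2\varepsilon)\,R^\tau_K(\omega|\omega) - \varepsilon, \tag{S42}$$
and also
$$\left\|\rho^\Gamma\right\|_1 \ge N_\tau(\rho|\omega) \ge (1-2\varepsilon)\,N_\tau(\omega|\omega). \tag{S43}$$
Proof. The first inequality in (S42) is just an application of Proposition S5(a). As for the second, note that the feasible sets of (S30) for $R^\tau_K(\rho|\omega)$ and for $R^\tau_K(\omega|\omega)$ coincide, and that for every feasible X Hölder's inequality gives
$$\operatorname{Tr} X\rho \ge \operatorname{Tr} X\omega - \|X\|_\infty\|\rho-\omega\|_1 \ge (1-2\varepsilon)\operatorname{Tr} X\omega,$$
where we used $\|X\|_\infty = \operatorname{Tr} X\omega$. Taking the supremum over X yields $1 + 2R^\tau_K(\rho|\omega)\ge(1-2\varepsilon)\left(1 + 2R^\tau_K(\omega|\omega)\right)$, which becomes (S42) upon elementary algebraic manipulations. The proof of (S43) is entirely analogous: the first inequality is Proposition S5(b), and the second follows from the same Hölder estimate applied to (S32). This concludes the proof.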

III. MAIN RESULTS: IRREVERSIBILITY OF ENTANGLEMENT MANIPULATION
Here we state our main results concerning the theory of entanglement manipulation for quantum states. The extension of the argument to quantum channels will be tackled in full detail separately [29].
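Theorem S7. For K = S or K = PPT, the entanglement cost under K-preserving operations satisfies that
$$E^{(\varepsilon)}_{c,\mathrm{KP}}(\rho) \ge \liminf_{n\to\infty}\frac1n\log_2\Big(1 + 2R^\tau_K\big(\rho^{\otimes n}\big)\Big) \qquad \forall\ \varepsilon\in[0,1/2), \tag{S46}$$
$$\liminf_{n\to\infty}\frac1n\log_2\Big(1 + 2R^\tau_K\big(\rho^{\otimes n}\big)\Big) \ge E_\tau(\rho), \tag{S47}$$
where $R^\tau_K(\omega) := R^\tau_K(\omega|\omega)$, and the tempered logarithmic negativity is defined by (S34).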

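Remark S8. An interesting consequence of the above result is that the tempered logarithmic negativity is a lower bound on the standard entanglement cost under local operations and classical communication (LOCC), denoted $E_{c,\mathrm{LOCC}}$; since LOCC operations are non-entangling, in formula
$$E_{c,\mathrm{LOCC}}(\rho) \ge E_{c,\mathrm{NE}}(\rho) \ge E_\tau(\rho).$$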
The entanglement cost under LOCC is a notoriously hard quantity to compute; it is given by the regularised entanglement of formation [82], and the regularisation is known to be necessary due to Hastings's counterexample to the additivity conjectures [83,84]. Previously known lower bounds include the regularised relative entropy of entanglement [67] and the squashed entanglement [85][86][87], both of which are extremely hard to evaluate in general (albeit for different reasons). The former can in turn be lower bounded either by Piani's measured relative entropy of entanglement [69], which has the advantage of doing away with regularisations, or by the quantity recently proposed by Wang and Duan [38], which is particularly convenient computationally because it is given by a semi-definite program (SDP). Both of these lower bounds on the LOCC entanglement cost, the one inferred from Piani's results and the one relying on Wang and Duan's quantity, are quite useful, but are known to be weaker than that given by the regularised relative entropy of entanglement.
The tempered negativity provides us with an independent lower bound on the LOCC entanglement cost that can strictly improve on the regularised relative entropy one; this will be apparent from the proof of Theorem S9. Moreover, since it is also given by an SDP, our bound remains computationally friendly. We are aware of no other quantity possessing these properties.
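Proof of Theorem S7. The following argument could be marginally simplified at the level of notation by resorting to the results of Brandão and Plenio [24]. However, for the sake of readability we prefer to give a more direct and self-contained proof. Call AB the bipartite system where ρ lives. Let r be an achievable rate for the entanglement cost $E^{\varepsilon}_{c,\mathrm{KP}}(\rho)$ at some error threshold ε ∈ [0, 1/2). Consider a sequence of operations $\Lambda_n\in\mathrm{KP}\big(A_0^{\lceil rn\rceil}B_0^{\lceil rn\rceil}\to A^nB^n\big)$, with $A_0, B_0$ being single-qubit systems, such that the errors $\varepsilon_n := \frac12\big\|\Lambda_n\big(\Phi_2^{\otimes\lceil rn\rceil}\big)-\rho^{\otimes n}\big\|_1$ satisfy $\limsup_{n\to\infty}\varepsilon_n\le\varepsilon$ (S50). For all sufficiently large n, we then write a chain of inequalities starting from $2^{\lceil rn\rceil} = 1 + R^s_{\mathcal K}\big(\Phi_2^{\otimes\lceil rn\rceil}\big)$. Here, in (i) we just recalled the value of the standard robustness of maximally entangled states (S16), (ii) comes from the monotonicity of the standard robustness under K-preserving operations (Lemma S2), and (iii) is an application of the τ-lemma (Lemma S6) to the states $\rho^{\otimes n}$ and $\Lambda_n\big(\Phi_2^{\otimes\lceil rn\rceil}\big)$. Taking the logarithm, dividing by n, and computing the limit for n → ∞ yields the claimed bound, where (iv) is a consequence of the fact that ε_n is bounded away from 1/2, as per (S50). This completes the proof of the first inequality (S46).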
As for the second inequality (S47), we observe a further chain of inequalities. Here, (v) is an application of the lower bound in Proposition S5(d), in (vi) we leveraged the super-multiplicativity of the tempered negativity (Proposition S5(e)), and finally (vii) is just the definition (S34) of the tempered logarithmic negativity.
Before we state and prove our result on the irreversibility of entanglement, we need to recall and discuss two well-known bounds on the distillable entanglement. Lower bounds on $E^{\varepsilon}_{d,\mathrm{KP}}(\rho)$ can be obtained by looking at smaller classes of operations included in the set of all K-preserving ones. A typical choice is the set of local operations assisted by one-way classical communication, say from Alice to Bob, denoted by $\mathrm{LOCC}_\to$. In this setting, Devetak and Winter's hashing inequality [88] states that $E^{\varepsilon}_{d,\mathrm{LOCC}_\to}(\rho)\ge I_{\mathrm{coh}}(A\rangle B)_\rho := S(\rho_B)-S(\rho_{AB})$ (S54), where $S(\rho) := -\operatorname{Tr}\rho\log_2\rho$ is the von Neumann entropy, and $\rho_B := \operatorname{Tr}_A\rho_{AB}$ is the reduced state of ρ on Bob's side. Since local operations assisted by one-way classical communication are both non-entangling and PPT-preserving, in formula $\mathrm{LOCC}_\to\subseteq\mathrm{KP}$, we see that $E^{\varepsilon}_{d,\mathrm{LOCC}_\to}(\rho)\le E^{\varepsilon}_{d,\mathrm{KP}}(\rho)$. In particular, the rightmost side of the hashing inequality (S54) lower bounds the distillable entanglement under K-preserving operations, i.e. $E^{\varepsilon}_{d,\mathrm{KP}}(\rho)\ge I_{\mathrm{coh}}(A\rangle B)_\rho$ (S55).
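The hashing bound is easy to evaluate numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `entropy`, `reduced_B` and `coherent_information` are our own and not part of the text. It computes $I_{\mathrm{coh}}(A\rangle B)_\rho = S(\rho_B)-S(\rho_{AB})$ for a density matrix on $\mathbb C^{d_A}\otimes\mathbb C^{d_B}$, ordered with Alice's factor first.

```python
import numpy as np


def entropy(rho):
    """Von Neumann entropy S(rho) = -Tr rho log2 rho, in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]  # discard numerically zero eigenvalues (0 log 0 = 0)
    return float(-np.sum(w * np.log2(w)))


def reduced_B(rho, dA, dB):
    """Partial trace over Alice's system A of a state on C^dA ⊗ C^dB."""
    return np.trace(rho.reshape(dA, dB, dA, dB), axis1=0, axis2=2)


def coherent_information(rho, dA, dB):
    """Hashing lower bound I_coh(A>B) = S(rho_B) - S(rho_AB), in bits."""
    return entropy(reduced_B(rho, dA, dB)) - entropy(rho)


# Two-qubit maximally entangled state Phi_2 = |phi><phi|, |phi> = (|00> + |11>)/sqrt(2)
phi = np.zeros(4)
phi[0] = phi[3] = 1 / np.sqrt(2)
Phi2 = np.outer(phi, phi)

print(round(coherent_information(Phi2, 2, 2), 6))           # 1.0
print(round(coherent_information(np.eye(4) / 4, 2, 2), 6))  # -1.0
```

A negative value, as for the maximally mixed two-qubit state, only means that the hashing bound is trivial there, since distillable entanglement is never negative.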
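To establish an upper bound on $E^{\varepsilon}_{d,\mathrm{KP}}(\rho)$, instead, we can introduce a relative entropy measure defined by [67] $E_{R,\mathcal K}(\rho) := \inf_{\sigma\in\mathcal K\cap\mathcal D(\mathcal H)} D(\rho\|\sigma)$ (S56), where $D(\rho\|\sigma) := \operatorname{Tr}\rho\,(\log_2\rho-\log_2\sigma)$ is the relative entropy. Since $\mathcal K\otimes\mathcal K\subseteq\mathcal K$, the function $E_{R,\mathcal K}$ is sub-additive, and then Fekete's lemma [89] implies that its regularisation $E^{\infty}_{R,\mathcal K}(\rho) := \lim_{n\to\infty}\frac1n E_{R,\mathcal K}(\rho^{\otimes n})$ (S57) is well defined and satisfies $E^{\infty}_{R,\mathcal K}(\rho)\le E_{R,\mathcal K}(\rho)$. It is easy to see that $E_{R,\mathcal K}$ is monotonic under K-preserving operations: this is an elementary consequence of the fact that the relative entropy is non-increasing under the simultaneous application of any positive trace-preserving map to both of its arguments [90]. Since $E_{R,\mathcal K}$ is also asymptotically continuous [91] (see also [92, Lemma 7]), its regularisation can be shown to be an upper bound on the distillable entanglement under K-preserving operations [14,93], as expressed in (S58). We are now ready to make use of Theorem S7 to prove irreversibility of entanglement manipulation under both non-entangling and PPT-preserving operations. To this end, according to Definition S3 (cf. (S24)), it suffices to exhibit an example of a bipartite state whose distillable entanglement under K-preserving operations is strictly smaller than its entanglement cost. Our candidate is a two-qutrit state, with Hilbert space $\mathcal H_A\otimes\mathcal H_B = \mathbb C^3\otimes\mathbb C^3$. Denote the local computational basis of the two qutrits by $\{|i\rangle\}_{i=1,2,3}$. Define the projector onto the maximally correlated subspace and the maximally entangled state by $P_3 := \sum_{i=1}^{3}|ii\rangle\langle ii|$ and $\Phi_3 := \frac13\sum_{i,j=1}^{3}|ii\rangle\langle jj|$ (S59), respectively. Then, construct the state $\omega_3 := \frac12\,(P_3-\Phi_3)$ (S60). We now show the following, proving and extending Theorem 1 from the main text of the paper.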
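As a quick sanity check of the definitions, the sketch below (NumPy assumed; 0-based labels stand for |1⟩, |2⟩, |3⟩) builds $P_3=\sum_i|ii\rangle\langle ii|$, $\Phi_3=\frac13\sum_{i,j}|ii\rangle\langle jj|$ and $\omega_3=\frac12(P_3-\Phi_3)$, and verifies that ω_3 is a state equal to half of a rank-2 projector, with a maximally mixed marginal on Bob's side. Both facts enter the computation of the coherent information in the proof of Theorem S9.

```python
import numpy as np

d = 3
e = np.eye(d)


def ket(i, j):
    """Product basis vector |ij> in C^3 ⊗ C^3 (0-based labels)."""
    return np.kron(e[i], e[j])


# P_3: projector onto the maximally correlated subspace span{|11>, |22>, |33>}
P3 = sum(np.outer(ket(i, i), ket(i, i)) for i in range(d))
# Phi_3 = |Phi_3><Phi_3| with |Phi_3> = (|11> + |22> + |33>)/sqrt(3)
v = sum(ket(i, i) for i in range(d)) / np.sqrt(d)
Phi3 = np.outer(v, v)
# omega_3 = (P_3 - Phi_3)/2
omega3 = (P3 - Phi3) / 2

eigs = np.linalg.eigvalsh(omega3)  # ascending order
omega3_B = np.trace(omega3.reshape(d, d, d, d), axis1=0, axis2=2)

print(np.isclose(np.trace(omega3), 1.0))       # True: unit trace
print(np.allclose(eigs, [0] * 7 + [0.5] * 2))  # True: half of a rank-2 projector
print(np.allclose(omega3_B, np.eye(d) / d))    # True: Bob's marginal is maximally mixed
```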
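Theorem S9. For K = S or K = PPT, the two-qutrit state ω_3 defined by (S60) satisfies $E^{\varepsilon}_{d,\mathrm{KP}}(\omega_3) = \log_2\frac32$ (S61) and $E^{\varepsilon}_{c,\mathrm{KP}}(\omega_3) = 1$ (S62) for all ε ∈ [0, 1/2). In particular, the resource theory of entanglement is irreversible under either non-entangling or PPT-preserving operations.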
Remark S10. The above result not only guarantees that the entanglement cost of the state ω_3 under non-entangling operations is 1; it also establishes a 'pretty strong' converse [36] for this value of the rate. Namely, every protocol that attempts to prepare ω_3 from entanglement bits at a rate smaller than 1 must incur an asymptotic error that is not only non-vanishing, but actually larger than a constant. This constant is 1/2 in the current formulation of Theorem S9; however, we will see in Lemma S20 that a careful analysis actually yields a slightly larger value of 2/3. An even stronger statement (a strong converse) can be shown for distillable entanglement, where no error smaller than 1 can improve the transformation rates whatsoever. For simplicity, we have omitted these extensions from the statement of Theorem S9, and we instead refer the interested reader to Supplementary Note VI B for a more in-depth discussion of (pretty) strong converses and error-rate trade-offs.
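Proof of Theorem S9. We have $\log_2\frac32 \overset{\text{(i)}}{=} I_{\mathrm{coh}}(A\rangle B)_{\omega_3} \overset{\text{(ii)}}{\le} E^{\varepsilon}_{d,\mathrm{KP}}(\omega_3) \overset{\text{(iii)}}{\le} E^{\infty}_{R,\mathcal K}(\omega_3) \overset{\text{(iv)}}{\le} E_{R,\mathcal K}(\omega_3) \overset{\text{(v)}}{\le} D(\omega_3\|P_3/3) \overset{\text{(vi)}}{=} \log_2\frac32$. Here, (i) is an elementary computation: ω_3 is half of a rank-2 projector, so that $S(\omega_3)=1$, while its reduced state on Bob's side is $\mathbb 1/3$, so that $S(\operatorname{Tr}_A\omega_3)=\log_2 3$; (ii) follows from the hashing inequality (S55), (iii) is a consequence of the upper bound on distillable entanglement in (S58), (iv) descends from the aforementioned sub-additivity of the relative entropy of K-ness, (v) is deduced by taking as ansatz in (S56) the state $\sigma = P_3/3\in\mathcal S\subseteq\mathrm{PPT}$, and finally (vi) comes again from a direct calculation. This proves (S61).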
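The two direct calculations (i) and (vi) can be reproduced numerically. This is a minimal NumPy sketch under our own naming; the relative entropy is evaluated on supports, which is legitimate here because ω_3 is supported inside the range of $P_3$. Both printed checks compare against $\log_2\frac32\approx 0.585$.

```python
import numpy as np


def entropy(rho):
    """Von Neumann entropy in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))


def log2_on_support(M, tol=1e-12):
    """Matrix log2 of a PSD matrix on its support (set to 0 on the kernel)."""
    w, V = np.linalg.eigh(M)
    logw = np.where(w > tol, np.log2(np.clip(w, tol, None)), 0.0)
    return (V * logw) @ V.conj().T


def relative_entropy(rho, sigma):
    """D(rho||sigma) = Tr rho (log2 rho - log2 sigma); assumes supp(rho) ⊆ supp(sigma)."""
    return float(np.real(np.trace(rho @ (log2_on_support(rho) - log2_on_support(sigma)))))


d = 3
e = np.eye(d)
P3 = sum(np.outer(np.kron(e[i], e[i]), np.kron(e[i], e[i])) for i in range(d))
v = e.reshape(d * d) / np.sqrt(d)  # |Phi_3> = sum_i |ii>/sqrt(3)
Phi3 = np.outer(v, v)
omega3 = (P3 - Phi3) / 2

# Step (i): coherent information S(omega_B) - S(omega_3), the hashing lower bound
omega_B = np.trace(omega3.reshape(d, d, d, d), axis1=0, axis2=2)
lower = entropy(omega_B) - entropy(omega3)
# Steps (v)-(vi): relative entropy to the separable ansatz sigma = P_3/3
upper = relative_entropy(omega3, P3 / 3)

print(np.isclose(lower, np.log2(1.5)), np.isclose(upper, np.log2(1.5)))  # True True
```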
As for the entanglement cost, irreversibility of entanglement under K-preserving operations hinges on the crucial inequality $E^{\varepsilon}_{c,\mathrm{KP}}(\omega_3)\ge 1$. Hereafter, ε ∈ [0, 1/2) is a fixed constant. Thanks to Theorem S7, it suffices to show that $N_\tau(\omega_3)\ge 2$, i.e. $E_N^{\tau}(\omega_3)\ge 1$. To this end, using the notation defined in (S59), we consider a suitable test operator, which yields the chain of inequalities (S69). For completeness, we now show that the inequalities in (S69) are in fact all tight; this will establish (S62) and conclude the proof. Start by observing a chain of upper bounds on the entanglement cost, ending with the zero-error cost $E^{\mathrm{exact}}_{c,\mathrm{NE}}(\omega_3)$, in which (vii) follows from Lemma S4, while (viii) is an application of the elementary inequality (S24). We now argue that $E^{\mathrm{exact}}_{c,\mathrm{NE}}(\omega_3)\le 1$, by providing an explicit example of a non-entangling operation Λ from a two-qubit to a two-qutrit system such that $\Lambda(\Phi_2)=\omega_3$. Construct $\Lambda(X) := \operatorname{Tr}[\Phi_2X]\,\omega_3 + \operatorname{Tr}[(\mathbb 1-\Phi_2)X]\,\tau_3$, where $\tau_3 := (\mathbb 1-P_3)/6$. To show that Λ is non-entangling, it suffices to prove that $p\,\omega_3+(1-p)\,\tau_3\in\mathcal S$ for all p ∈ [0, 1/2], because the overlap of any separable two-qubit state with Φ_2 is at most 1/2. Since the claim is trivial for p = 0, because τ_3 is manifestly separable, by convexity it suffices to prove it for p = 1/2. This is done in (S72), where $|\pm\rangle := \frac{1}{\sqrt2}(|1\rangle\pm|2\rangle)$, and P is the non-entangling quantum operation defined by (S73), with $S_3$ denoting the symmetric group over a set of 3 elements. Note that the last equality in (S72), which can be proved by inspection using the representation in (S73), amounts to the sought separable decomposition of the state $\frac12(\omega_3+\tau_3)$. This establishes that $E^{\varepsilon}_{c,\mathrm{PPTP}}(\omega_3)\le E^{\varepsilon}_{c,\mathrm{NE}}(\omega_3)\le 1$ and concludes the proof.
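The two ingredients of this argument can be checked numerically. The sketch below rests on two assumptions that are ours, not statements of the text: (a) that the tempered negativity in (S34) has the form $N_\tau(\rho)=\max\{\operatorname{Tr}X\rho : \|X^\Gamma\|_\infty\le 1,\ \|X\|_\infty=\operatorname{Tr}X\rho\}$, so that any feasible X lower bounds it; the operator $X = 2P_3-3\Phi_3$ is one such feasible choice and gives $N_\tau(\omega_3)\ge 2$; (b) that the separable state in the construction of Λ is $\tau_3=(\mathbb 1-P_3)/6$, and that the operation in (S73) can be realised as an $S_3$ symmetrisation combined with a local phase twirl. Part (b) builds $\frac12(\omega_3+\tau_3)$ explicitly as an average of 24 product states, which certifies its separability; the exact form of (S73) may differ.

```python
import itertools

import numpy as np

d = 3
e = np.eye(d)


def partial_transpose(M, d):
    """Partial transpose on the second factor of an operator on C^d ⊗ C^d."""
    return M.reshape(d, d, d, d).transpose(0, 3, 2, 1).reshape(d * d, d * d)


# omega_3 = (P_3 - Phi_3)/2, with 0-based labels for |1>, |2>, |3>
P3 = sum(np.outer(np.kron(e[i], e[i]), np.kron(e[i], e[i])) for i in range(d))
v = e.reshape(d * d) / np.sqrt(d)  # |Phi_3> = sum_i |ii>/sqrt(3)
Phi3 = np.outer(v, v)
omega3 = (P3 - Phi3) / 2

# (a) Feasible test operator for the (assumed) SDP form of the tempered negativity
X = 2 * P3 - 3 * Phi3
value = float(np.trace(X @ omega3))                                         # Tr[X omega_3]
norm_X = float(np.abs(np.linalg.eigvalsh(X)).max())                         # ||X||_inf
norm_XG = float(np.abs(np.linalg.eigvalsh(partial_transpose(X, d))).max())  # ||X^Gamma||_inf
print(round(value, 6), round(norm_X, 6), round(norm_XG, 6))                 # 2.0 2.0 1.0

# (b) (omega_3 + tau_3)/2 as an explicit average of product states (tau_3 is assumed)
tau3 = (np.eye(d * d) - P3) / 6
plus, minus = (e[0] + e[1]) / np.sqrt(2), (e[0] - e[1]) / np.sqrt(2)
rho_sep = np.zeros((d * d, d * d), dtype=complex)
for perm in itertools.permutations(range(d)):  # symmetrisation over S_3
    Pm = e[:, list(perm)]                       # permutation matrix |k> -> |perm[k]>
    for phase in (1, 1j):                       # local phase twirl U ⊗ conj(U)
        U = np.diag([phase, 1, 1])
        for a, b in ((plus, minus), (minus, plus)):
            psi = np.kron(Pm @ U @ a, Pm @ U.conj() @ b)  # a product vector
            rho_sep += np.outer(psi, psi.conj()) / 24
print(np.allclose(rho_sep, (omega3 + tau3) / 2))  # True
```

The phase twirl removes the coherence between |12⟩ and |21⟩ while leaving the one between |11⟩ and |22⟩ intact, and the symmetrisation then spreads the result over all three pairs of levels.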
Remark S11. One may wonder which other states cannot be reversibly manipulated. This is far from obvious, since the axiomatic classes of operations NE and PPTP are typically much more powerful than previously employed types of transformations; in particular, several types of states which have been used to show irreversibility in specific settings are actually reversible under NE or PPTP transformations.
The prime example of this is the antisymmetric state, defined on a bipartite system with Hilbert space $\mathbb C^d\otimes\mathbb C^d$ by $\alpha_d := \frac{\mathbb 1-F}{d(d-1)}$, where $F := \sum_{i,j=1}^{d}|ij\rangle\langle ji|$ is the flip operator. This state gained fame as the 'universal counterexample', which violates many properties obeyed by other types of quantum states [94]: for example, it is known that its manipulation is highly irreversible under LOCC, since its distillable entanglement is of order 1/d, while its entanglement cost is lower bounded by a d-independent non-zero constant [95]. Curiously, however, α_d was also the first example of a mixed state whose manipulation is reversible under all PPT operations [22]. These transformations (hereafter simply denoted by PPT) are all maps Λ such that $\mathrm{id}_k\otimes\Lambda$ is PPT-preserving for all ancillary dimensions k [65], and are therefore a strict subset of the PPT-preserving operations considered herein. The reason why reversibility can be achieved in this setting is that the entanglement cost of α_d can be significantly lowered by considering PPT transformations instead of LOCC, allowing it to reach order 1/d.
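For intuition on the 1/d scaling, here is a minimal NumPy sketch (helper names are our own) computing the logarithmic negativity of α_d. A short calculation using $F^\Gamma = d\,\Phi_d$ gives $E_N(\alpha_d)=\log_2\frac{d+2}{d}$, which is of order 1/d; the code checks this numerically. We use it only to illustrate the scaling and do not claim that it reproduces the PPT-cost analysis of [22].

```python
import numpy as np


def flip(d):
    """Flip (swap) operator F = sum_{i,j} |ij><ji| on C^d ⊗ C^d."""
    F = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            F[i * d + j, j * d + i] = 1.0
    return F


def partial_transpose(M, d):
    """Partial transpose on the second factor of an operator on C^d ⊗ C^d."""
    return M.reshape(d, d, d, d).transpose(0, 3, 2, 1).reshape(d * d, d * d)


def antisymmetric_state(d):
    """alpha_d = (1 - F) / (d (d - 1))."""
    return (np.eye(d * d) - flip(d)) / (d * (d - 1))


def log_negativity(rho, d):
    """E_N(rho) = log2 ||rho^Gamma||_1; for Hermitian matrices the trace norm is sum |eigenvalues|."""
    return float(np.log2(np.abs(np.linalg.eigvalsh(partial_transpose(rho, d))).sum()))


for d in range(2, 7):
    EN = log_negativity(antisymmetric_state(d), d)
    print(d, round(EN, 6), round(d * EN, 3))  # d * E_N stays bounded, i.e. E_N = O(1/d)
```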
However, in Ref. [38], a related class of states supported on the antisymmetric subspace was used to show the irreversibility of entanglement manipulation under PPT operations. In particular, for the state considered there, it was shown that $E_{d,\mathrm{PPT}} < 1 = E_{c,\mathrm{PPT}}$. One might then wonder whether these states could serve as a similar example of irreversibility for the larger class of PPT-preserving operations. However, this cannot be the case. To see this, we can use the fact that the quantity considered in [38] constitutes a lower bound on the distillable entanglement $E_{d,\mathrm{PPTP}}$, and already in [38] it was shown to equal 1 on this state. Since also $E_{d,\mathrm{PPTP}}\le E_{c,\mathrm{PPTP}}\le E_{c,\mathrm{PPT}} = 1$, we obtain $E_{d,\mathrm{PPTP}} = E_{c,\mathrm{PPTP}} = 1$, meaning that this state is actually reversible under PPT-preserving maps.
In a way, this suggests that the state ω_3 is somewhat special, since its entanglement cost cannot be brought down low enough to match its distillable entanglement, even if we allow the extended classes of operations NE or PPTP. It would be interesting to study in more detail the special properties of ω_3 which induce this behaviour, and to understand exactly what types of states exhibit irreversibility in entanglement manipulation under NE and PPTP.
Remark S12. The proof of Theorem S9 actually allows us to also compute the zero-error costs of ω_3, namely $E^{\mathrm{exact}}_{c,\mathrm{NE}}(\omega_3) = E^{\mathrm{exact}}_{c,\mathrm{PPTP}}(\omega_3) = 1$. As it turns out, the same exact entanglement cost of 1 can be achieved by means of a strict subset of PPT-preserving operations, namely, the aforementioned PPT operations. In fact, already the result by Audenaert et al. [22] guarantees that $E^{\mathrm{exact}}_{c,\mathrm{PPT}}(\omega_3) = E_N(\omega_3) = 1$, where $E_N$ is the logarithmic negativity (S10), because ω_3 has vanishing 'binegativity', as a straightforward check reveals. We can arrive at the same conclusion thanks to the complete characterisation of the exact PPT entanglement cost recently proposed by Wang and Wilde [96].
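Both facts used in this remark can be checked in a few lines. This NumPy sketch (our own helper names) computes $E_N(\omega_3)=\log_2\|\omega_3^\Gamma\|_1$ and the smallest eigenvalue of $|\omega_3^\Gamma|^\Gamma$; we read 'vanishing binegativity' as the statement that this operator is positive semidefinite.

```python
import numpy as np

d = 3
e = np.eye(d)


def partial_transpose(M, d):
    """Partial transpose on the second factor of an operator on C^d ⊗ C^d."""
    return M.reshape(d, d, d, d).transpose(0, 3, 2, 1).reshape(d * d, d * d)


def abs_herm(M):
    """|M| = sqrt(M^dagger M) for a Hermitian M, via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.abs(w)) @ V.conj().T


P3 = sum(np.outer(np.kron(e[i], e[i]), np.kron(e[i], e[i])) for i in range(d))
v = e.reshape(d * d) / np.sqrt(d)  # |Phi_3> = sum_i |ii>/sqrt(3)
omega3 = (P3 - np.outer(v, v)) / 2

G = partial_transpose(omega3, d)
EN = float(np.log2(np.abs(np.linalg.eigvalsh(G)).sum()))  # logarithmic negativity
bineg_min = float(np.linalg.eigvalsh(partial_transpose(abs_herm(G), d)).min())
print(round(EN, 6))        # 1.0
print(bineg_min > -1e-12)  # True: |omega_3^Gamma|^Gamma is positive semidefinite
```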

IV. HOW MUCH ENTANGLEMENT MUST BE GENERATED TO ACHIEVE REVERSIBILITY?
We first recall the framework and the claimed results of Brandão and Plenio [24] in detail. To begin, we need to fix some notation. For two bipartite quantum systems A, B, a given non-negative function $G:\mathcal D(\mathcal H_B)\to\mathbb R_+\cup\{+\infty\}$ on the set of states on B that vanishes on $\mathcal K\cap\mathcal D(\mathcal H_B)$, and some δ ≥ 0, we define the set of (G, δ)-approximately K-preserving maps by $\mathrm{KP}_G(\delta)(A\to B) := \{\Lambda_{A\to B}\ \text{CPTP} : G(\Lambda(\sigma))\le\delta\ \ \forall\,\sigma\in\mathcal K\cap\mathcal D(\mathcal H_A)\}$ (S76). Typically, G will be chosen to be an entanglement measure [7,67]. In what follows, we will in fact assume that G is actually a family of functions defined on each bipartite quantum system. Given a sequence $(\delta_n)_{n\in\mathbb N}$, we define the distillable entanglement and the entanglement cost under $(G,\delta_n)$-approximately K-preserving maps as in (S19) and (S20), with the operation acting on n copies now taken from $\mathrm{KP}_G(\delta_n)$. To state Brandão and Plenio's conjecture in this framework, we first need to introduce another entanglement monotone closely related to the standard robustness. Recalling first the definition (S11) of $R^s_{\mathcal K}$, namely $R^s_{\mathcal K}(\rho) = \inf\{\operatorname{Tr}\sigma : \sigma\in\mathcal K,\ \rho+\sigma\in\mathcal K\}$, the generalised robustness (or global robustness) [35,97,98] is defined similarly as $R^g_{\mathcal K}(\rho) := \inf\{\operatorname{Tr}\sigma : \sigma\ge 0,\ \rho+\sigma\in\mathcal K\}$. It is also an entanglement monotone, and many similarities between the two robustness measures have been found; for example, for any pure state Ψ it holds that $R^s_{\mathcal K}(\Psi) = R^g_{\mathcal K}(\Psi)$ [78,79,97,98]. The two can, however, exhibit very different properties, as we will explicitly demonstrate below (see also Supplementary Note VI). With this language, Brandão and Plenio's claim is that entanglement becomes reversible under $R^g_{\mathcal S}$-asymptotically S-preserving maps. Employing the simplified notation $\mathrm{KP}^g(\delta) := \mathrm{KP}_{R^g_{\mathcal K}}(\delta)$, we formalise their claim as follows.
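To make the value $R^s_{\mathcal K}(\Phi_d)=d-1$ from (S16) more concrete, the sketch below (NumPy assumed, helper names our own) checks the upper-bound half of it: the choice $\sigma=(d-1)\tau$ with $\tau=(\mathbb 1-\Phi_d)/(d^2-1)$ is feasible in the definition of $R^s_{\mathcal K}$. The code verifies positivity of partial transposes, which settles K = PPT; for K = S we additionally rely on the known fact that isotropic states are separable exactly when they are PPT, i.e. when their overlap with Φ_d is at most 1/d. The matching lower bound d − 1 is not checked here.

```python
import numpy as np


def partial_transpose(M, d):
    """Partial transpose on the second factor of an operator on C^d ⊗ C^d."""
    return M.reshape(d, d, d, d).transpose(0, 3, 2, 1).reshape(d * d, d * d)


def max_entangled(d):
    """Phi_d = |Phi_d><Phi_d| with |Phi_d> = sum_i |ii>/sqrt(d)."""
    v = np.eye(d).reshape(d * d) / np.sqrt(d)
    return np.outer(v, v)


def robustness_decomposition_check(d):
    """Data certifying that sigma = (d - 1) tau, tau = (1 - Phi_d)/(d^2 - 1),
    is a feasible choice in the definition of the standard robustness of Phi_d."""
    Phi = max_entangled(d)
    tau = (np.eye(d * d) - Phi) / (d * d - 1)
    mixed = Phi + (d - 1) * tau  # unnormalised, trace d
    min_pt_tau = np.linalg.eigvalsh(partial_transpose(tau, d)).min()
    min_pt_mixed = np.linalg.eigvalsh(partial_transpose(mixed, d)).min()
    overlap = np.trace(Phi @ mixed) / np.trace(mixed)  # isotropic fidelity, should be 1/d
    return min_pt_tau, min_pt_mixed, overlap


for d in (2, 3, 4):
    a, b, f = robustness_decomposition_check(d)
    print(d, a > -1e-12, b > -1e-12, np.isclose(f, 1 / d))  # e.g. "2 True True True"
```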
Conjecture S13 (Reversibility under asymptotically non-entangling operations [24]). For any state ρ acting on a finite-dimensional Hilbert space, there exists a sequence $(\delta_n)_{n\in\mathbb N}$ with $\lim_{n\to\infty}\delta_n = 0$ such that the distillable entanglement and the entanglement cost of ρ under $\mathrm{NE}^g(\delta_n) := \mathrm{KP}^g(\delta_n)$, with K = S, both equal $E^{\infty}_{R,\mathcal S}(\rho)$, where $E^{\infty}_{R,\mathcal K}$ is the regularised relative entropy measure defined by (S57). Brandão and Plenio argue that the above means that entanglement can be reversibly interconverted without generating macroscopic amounts of it, since the supplemented entanglement is constrained by δ_n, which vanishes in the asymptotic limit. This is certainly true if one quantifies entanglement with the generalised robustness. However, this is an a priori arbitrary choice: one could analogously choose the standard robustness $R^s_{\mathcal K}$ as a quantifier, and consider the (sequence of) sets of operations $\mathrm{KP}^s(\delta_n) := \mathrm{KP}_{R^s_{\mathcal K}}(\delta_n)$ defined by (S76) for the special case $G = R^s_{\mathcal K}$. Alternatively, one could quantify the generated entanglement with the negativity N, and consider the sets $\mathrm{KP}_N(\delta_n)$ defined by (S76) with G = N.
We bring to the reader's attention the recently discovered technical issues underlying the proof of the main result of [24], as detailed in [25]. For this reason, we state the result here as a 'conjecture', and its validity is an open question. We nevertheless find it useful to discuss the result in detail, as we take conceptual inspiration from the framework of [24]. Our findings are independent of whether this result is ultimately found to be correct or not.
An extension of our result is then as follows.
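Theorem S14. For any sequence $(\delta_n)_n$ such that $\delta_n = 2^{o(n)}$, and for all ε ∈ [0, 1/2), the two-qutrit state ω_3 defined by (S60) has distillable entanglement at most $\log_2\frac32$ and entanglement cost at least 1 under both $\mathrm{KP}^s(\delta_n)$ and $\mathrm{KP}_N(\delta_n)$ operations. In particular, the resource theory of entanglement is irreversible under any class of operations which does not generate an amount of entanglement that grows exponentially in n, as quantified by either the standard robustness $R^s_{\mathcal K}$ or the negativity N.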
Here $\delta_n = 2^{o(n)}$ means that δ_n has a sub-exponential behaviour in n: for any c > 0, there exists $n_0\in\mathbb N$ such that $\delta_n < 2^{cn}$ for all $n\ge n_0$. Consequently, to have any hope of recovering reversibility, one needs δ_n (and hence the generated entanglement) to grow exponentially: there must exist a choice of c > 0 such that, for all $n_0\in\mathbb N$, $\delta_n\ge 2^{cn}$ for at least one $n\ge n_0$; in other words, δ_n is lower bounded by $2^{cn}$ infinitely often.
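To make the dichotomy concrete, the pure-Python sketch below tracks the exponent $\frac1n\log_2\delta_n$ for three illustrative sequences of our choosing: $\delta_n=2^{\sqrt n}$ and $\delta_n=n^5$ are of the form $2^{o(n)}$, so the exponent tends to 0, whereas $\delta_n=2^{0.1n}$ is not, since it stays at or above $2^{cn}$ with c = 0.1. Working with $\log_2\delta_n$ directly avoids overflow for large n.

```python
import math


def exponent(log2_delta, n):
    """(1/n) log2(delta_n); delta_n = 2^{o(n)} means limsup_n of this is <= 0."""
    return log2_delta(n) / n


examples = {
    "2^sqrt(n)": lambda n: math.sqrt(n),  # sub-exponential
    "n^5": lambda n: 5 * math.log2(n),    # polynomial, hence sub-exponential
    "2^(0.1 n)": lambda n: 0.1 * n,       # exponential: at least 2^{cn} with c = 0.1
}
for name, log2_delta in examples.items():
    # The first two rows decrease towards 0, the last one stays at 0.1
    print(name, [round(exponent(log2_delta, n), 4) for n in (10**2, 10**4, 10**6)])
```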
Remark. One may wonder whether $R^g_{\mathcal K}$ can be considered a more operationally meaningful measure of the supplemented entanglement, justifying its use over other measures such as $R^s_{\mathcal K}$ or N, and thus substantiating the reversibility conjecture of [24] over the irreversibility result of Theorem S14. We do not believe that there is any compelling reason to do so: although $R^g_{\mathcal K}$ admits a very general operational interpretation as the quantifier of the advantage that a given state provides in channel discrimination tasks [79,101,102], $R^s_{\mathcal K}$ has an arguably even more relevant application, as it exactly quantifies the one-shot entanglement cost under non-entangling operations [44]. On the technical side, both quantities suffer from very similar issues in the many-copy limit, as they do not satisfy asymptotic continuity [91].
This is no coincidence, as Brandão and Plenio have shown that the choice of an asymptotically continuous monotone to quantify the supplemented entanglement leads to the trivialisation of the framework [24, Section V]. Note that almost all the most commonly used entanglement measures and all of those with the strongest operational meanings are in fact asymptotically continuous. Examples include the entanglement of formation [92,103], the (LOCC) entanglement cost [92], the squashed entanglement [86,104], and the (regularised) relative entropy of entanglement [91,92]. In fact, among the most widely used entanglement measures, the only one that is not asymptotically continuous is the logarithmic negativity [27,57]. For this reason, we regard the failure of asymptotic continuity for the robustnesses as an issue of some conceptual importance, one that may cast some doubts on the status of approximately K-preserving maps.
Following the original notation introduced by Hardy and Littlewood [99], we could denote this behaviour as $\delta_n = 2^{\Omega(n)}$. However, the commonly used notation Ω(n) actually refers to a stronger property [100], so we do not use it here.
The choice of $R^g_{\mathcal K}$ in [24] is motivated a posteriori by the fact that it is claimed to lead to reversibility, rather than by a priori physical considerations. We are therefore inclined to believe that there is no unique and indisputable choice of a suitable entanglement measure, and we regard Theorem S14 as evidence that the irreversibility of entanglement revealed in our work is very robust, and that avoiding it requires a very careful and deliberate choice of an entanglement monotone: according to other, equally reasonable choices, the generated entanglement must be exponentially large.
Remark. We should also note in passing that, between the two sets of operations $\mathrm{NE}^s(\delta)$ and $\mathrm{NE}_N(\delta)$ (i.e. $\mathrm{KP}^s(\delta)$ and $\mathrm{KP}_N(\delta)$ with K = S) that we considered in Theorem S14, the former is perhaps closer to our intuitive notion of approximately non-entangling maps. Indeed, since the standard robustness of entanglement $R^s_{\mathcal S}$ is a faithful measure, i.e. it is strictly positive on all entangled states, transformations in $\mathrm{NE}^s(0)$ are in fact non-entangling. Transformations in $\mathrm{NE}_N(0)$, on the contrary, map separable states to PPT states that can very well be entangled. However, since $\mathrm{NE}^s(\delta)\subseteq\mathrm{NE}_N(\delta)$, showing the irreversibility of entanglement under the operations $\mathrm{NE}_N(\delta)$ constitutes a strictly stronger result, and indeed shows also that generating PPT entangled states is not sufficient to recover reversibility: any reversible protocol must create highly non-PPT entanglement.
The first step in proving the theorem is the following lemma, which establishes an approximate monotonicity of the standard robustness under approximately K-preserving maps, whether the deviation from exact K-preservation is quantified by the standard robustness itself or by the negativity.
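Lemma S15. For any $\Lambda\in\mathrm{KP}^s(\delta)(A\to B)$, it holds that $R^s_{\mathcal K}(\Lambda(\rho))+1\le(1+2\delta)\big(R^s_{\mathcal K}(\rho)+1\big)$ for every state ρ on A. Similarly, for any $\Lambda\in\mathrm{KP}_N(\delta)(A\to B)$, an analogous approximate monotonicity bound holds, with the negativity quantifying the deviation from exact K-preservation. Proof. Let us take $\Lambda\in\mathrm{KP}^s(\delta)(A\to B)$ and consider any feasible decomposition for the standard robustness of ρ as $\rho = \sigma_+-\sigma_-$, where $\sigma_\pm\in\mathcal K$ (noting that these are in general only unnormalised states), with $\operatorname{Tr}\sigma_- = s$ and hence $\operatorname{Tr}\sigma_+ = 1+s$. Since $R^s_{\mathcal K}\big(\Lambda(\sigma_+)/\operatorname{Tr}\sigma_+\big)\le\delta$ and $R^s_{\mathcal K}\big(\Lambda(\sigma_-)/\operatorname{Tr}\sigma_-\big)\le\delta$, for any η > 0 there exist decompositions $\Lambda(\sigma_\pm) = \alpha_\pm-\beta_\pm$ for some $\alpha_\pm,\beta_\pm\in\mathcal K$ such that $\operatorname{Tr}\beta_\pm/\operatorname{Tr}\sigma_\pm\le\delta+\eta$. Then $\Lambda(\rho) = (\alpha_++\beta_-)-(\alpha_-+\beta_+)$ is a feasible decomposition for Λ(ρ), and since $\operatorname{Tr}\alpha_- = s+\operatorname{Tr}\beta_-$ by trace preservation, we obtain $R^s_{\mathcal K}(\Lambda(\rho))\le\operatorname{Tr}(\alpha_-+\beta_+)\le s+(\delta+\eta)(1+2s)$, whence $R^s_{\mathcal K}(\Lambda(\rho))+1\le(1+2\delta+2\eta)(s+1)$. Since this holds for any feasible value $s = \operatorname{Tr}\sigma_-$, it must also hold that $R^s_{\mathcal K}(\Lambda(\rho))+1\le(1+2\delta+2\eta)\big(R^s_{\mathcal K}(\rho)+1\big)$, as $R^s_{\mathcal K}$ is defined precisely as the infimum of all feasible values of $\operatorname{Tr}\sigma_-$. Since η > 0 was arbitrary, the desired statement follows.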
Theorem S14 then relies on the following extension of Theorem S7.
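Theorem S16. For K = S or K = PPT, and for any sequence $(\delta_n)_n$ such that $\delta_n = 2^{o(n)}$, the entanglement cost under approximately K-preserving operations obeys the same lower bounds as in Theorem S7: both bounds hold for operations in $\mathrm{KP}^s(\delta_n)$, and the bound in terms of the tempered logarithmic negativity also holds for operations in $\mathrm{KP}_N(\delta_n)$.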
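Proof. In complete analogy with the proof of Theorem S7, consider any feasible sequence $(\Lambda_n)_{n\in\mathbb N}$ of maps such that $\Lambda_n\in\mathrm{KP}^s(\delta_n)$ and $\Lambda_n\big(\Phi_2^{\otimes\lceil rn\rceil}\big)$ approximates $\rho^{\otimes n}$ up to an error ε_n with $\limsup_n\varepsilon_n\le\varepsilon$. The chain of inequalities is the same as before, except that we now use Lemma S15 to incorporate the approximate entanglement non-generation, at the price of an additional factor $(1+2\delta_n)$. After taking logarithms and dividing by n, this factor contributes a vanishing term, because $\lim_{n\to\infty}\frac1n\log_2(1+2\delta_n) = 0$ (S94) by hypothesis. The rest of the proof of the first part of the theorem is then exactly the same as in Theorem S7. The second part of the proof is very similar in that it follows Theorem S7, but it goes directly to the tempered negativity rather than the intermediate quantity used there. Taking now any feasible sequence $(\Lambda_n)_{n\in\mathbb N}$ with $\Lambda_n\in\mathrm{KP}_N(\delta_n)$, we obtain a chain of inequalities where in (i) we used Lemma S15, in (ii) Proposition S5(b), and in (iii) the τ-lemma (Lemma S6). This gives the claimed bound, using the super-multiplicativity of the tempered negativity (Proposition S5(e)) and the assumption that $\delta_n = 2^{o(n)}$.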
As the final ingredient that will be required in the proof of Theorem S14, we need to show that the distillable entanglement cannot increase even if we allow sub-exponential entanglement generation.
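Lemma S17. Consider any state ρ and let K = S or K = PPT. For any ε ∈ [0, 1) and any non-negative sequence $(\delta_n)_n$, the optimal distillation performance under $\mathrm{KP}^s(\delta_n)$, $\mathrm{KP}^g(\delta_n)$ and $\mathrm{KP}_N(\delta_n)$ operations is characterised by the expressions derived in the proof below. Moreover, if $\delta_n = 2^{o(n)}$, then the corresponding distillable entanglement coincides with $E^{\varepsilon}_{d,\mathrm{KP}}(\rho)$, i.e. sub-exponential entanglement generation does not increase it. Proof. The proof will proceed in two steps. First, we establish expressions for the minimal error achievable in distillation with $(G,\delta)$-approximately K-preserving operations. Then, we argue that, for sub-exponential δ_n, the contributions from the parameter δ_n to the performance of the distillation task can be absorbed into the transformation rates, effectively preventing any improvement in the asymptotic distillation error.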
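Consider first any operation $\Lambda\in\mathrm{KP}^g(\delta)\big(A^nB^n\to A_0^mB_0^m\big)$ for a fixed δ, where m is a generic positive integer. We would then like to understand exactly the error in the transformation from $\rho^{\otimes n}$ to the maximally entangled state at some rate, which we will for now quantify using the fidelity $F(\rho,\sigma) := \|\sqrt\rho\sqrt\sigma\|_1^2$. We have $F\big(\Lambda(\rho^{\otimes n}),\Phi_2^{\otimes m}\big) = \operatorname{Tr}\big[\Lambda(\rho^{\otimes n})\,\Phi_2^{\otimes m}\big] = \operatorname{Tr}\big[\rho^{\otimes n}\,\Lambda^\dagger(\Phi_2^{\otimes m})\big]$, using that $\Phi_2^{\otimes m}$ is a pure state. Notice now that, by definition of the generalised robustness, for any $\sigma\in\mathcal K\cap\mathcal D(\mathcal H)$ we have $\Lambda(\sigma)\le(1+\delta)\,\sigma'$ for some $\sigma'\in\mathcal K\cap\mathcal D(\mathcal H)$. Thus, for any $\sigma\in\mathcal K\cap\mathcal D(\mathcal H)$, $\operatorname{Tr}\big[\sigma\,\Lambda^\dagger(\Phi_2^{\otimes m})\big] = \operatorname{Tr}\big[\Lambda(\sigma)\,\Phi_2^{\otimes m}\big]\le(1+\delta)\operatorname{Tr}\big[\sigma'\,\Phi_2^{\otimes m}\big]\le(1+\delta)\,2^{-m}$, where in the first inequality we used the positivity of $\Phi_2^{\otimes m}$, and in the second that the maximal overlap of $\Phi_2^{\otimes m}$ with a separable (or, more generally, PPT) state is $2^{-m}$ [105]. Noting also that $0\le\Lambda^\dagger(\Phi_2^{\otimes m})\le\mathbb 1$ due to the fact that Λ is positive and trace preserving, we get $F\big(\Lambda(\rho^{\otimes n}),\Phi_2^{\otimes m}\big)\le f_{n,m}(\delta) := \sup\big\{\operatorname{Tr}M\rho^{\otimes n} : 0\le M\le\mathbb 1,\ \operatorname{Tr}M\sigma\le(1+\delta)\,2^{-m}\ \forall\,\sigma\in\mathcal K\cap\mathcal D(\mathcal H)\big\}$ (S101). The argument for operations $\mathrm{KP}_N(\delta)$, where the negativity is now the figure of merit, proceeds analogously: for any $\sigma\in\mathcal K\cap\mathcal D(\mathcal H)$ and any $\Lambda\in\mathrm{KP}_N(\delta)$ one obtains a similar chain, where in the second line we used the Cauchy-Schwarz inequality and in the third a further elementary bound. The resulting estimate (S103) is exactly the same as (S101) up to the substitution δ ↦ 2δ, which we will see to be immaterial.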
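The overlap bound from [105] used above is easy to probe numerically. Up to a reordering of tensor factors, $\Phi_2^{\otimes m}$ is the maximally entangled state $\Phi_D$ of local dimension $D=2^m$, and for a product vector $|a\rangle\otimes|b\rangle$ the overlap is $|\sum_i a_ib_i|^2/D\le 1/D$ by the Cauchy-Schwarz inequality; by convexity, the same bound holds for all separable states. The NumPy sketch below (random sampling, our own helper names) only illustrates the separable case and is not a proof; the PPT case is not covered by it.

```python
import numpy as np

rng = np.random.default_rng(seed=0)


def random_unit_vector(D):
    """Haar-random unit vector in C^D (normalised complex Gaussian)."""
    z = rng.normal(size=D) + 1j * rng.normal(size=D)
    return z / np.linalg.norm(z)


def overlap_with_phi(a, b):
    """|<Phi_D| a ⊗ b>|^2 for |Phi_D> = sum_i |ii>/sqrt(D); equals |sum_i a_i b_i|^2 / D."""
    D = len(a)
    phi = np.eye(D).reshape(D * D) / np.sqrt(D)
    return abs(np.vdot(phi, np.kron(a, b))) ** 2


m = 3
D = 2 ** m
samples = [overlap_with_phi(random_unit_vector(D), random_unit_vector(D)) for _ in range(2000)]
a = random_unit_vector(D)
best = overlap_with_phi(a, a.conj())  # b = conj(a) saturates Cauchy-Schwarz
print(max(samples) <= 1 / D + 1e-12, np.isclose(best, 1 / D))  # True True
```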
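For the other direction, define the separable [66] state $\tau := (\mathbb 1-\Phi_2^{\otimes m})/(4^m-1)\in\mathcal K\cap\mathcal D(\mathcal H)$, which also satisfies [66] $\Phi_2^{\otimes m}+(2^m-1)\,\tau\in\mathcal K$. Note that $R^s_{\mathcal K}(\Phi_2^{\otimes m}) = 2^m-1$ by (S16), and therefore τ is the state that achieves the minimum in the definition (S11) of $R^s_{\mathcal K}$ for $\Phi_2^{\otimes m}$. Take any operator M such that $0\le M\le\mathbb 1$ and $\operatorname{Tr}M\sigma\le(1+\delta)\,2^{-m}$ for all $\sigma\in\mathcal K\cap\mathcal D(\mathcal H)$, where we assume that $(1+\delta)\,2^{-m}\le 1$ without loss of generality. Define the measure-and-prepare map $\Gamma_M(X) := \operatorname{Tr}[MX]\,\Phi_2^{\otimes m}+\operatorname{Tr}[(\mathbb 1-M)X]\,\tau$. This map is explicitly completely positive and trace preserving, and we can furthermore verify that, since $0\le\operatorname{Tr}M\sigma\le(1+\delta)\,2^{-m}$, for any $\sigma\in\mathcal K\cap\mathcal D(\mathcal H)$ we have $\Gamma_M(\sigma)\in\operatorname{conv}\{\tau,\zeta_\delta\}$, where conv denotes the convex hull and $\zeta_\delta := (1+\delta)\,2^{-m}\,\Phi_2^{\otimes m}+\big(1-(1+\delta)\,2^{-m}\big)\,\tau$. Since $\tau\in\mathcal K$ and $\Phi_2^{\otimes m}+(2^m-1)\,\tau\in\mathcal K$, also $\zeta_\delta = (1+\delta)\,2^{-m}\big(\Phi_2^{\otimes m}+(2^m-1)\,\tau\big)-\delta\,\tau$ is a feasible decomposition showing that $R^s_{\mathcal K}(\zeta_\delta)\le\delta$. The convexity of $R^s_{\mathcal K}$ (which follows directly from the convexity of K itself) then implies that $R^s_{\mathcal K}(\Gamma_M(\sigma))\le\delta$, and hence $\Gamma_M\in\mathrm{KP}^s(\delta)$. Noting that $F\big(\Gamma_M(\rho^{\otimes n}),\Phi_2^{\otimes m}\big) = \operatorname{Tr}M\rho^{\otimes n}$, optimising over all feasible M shows that the bound $f_{n,m}(\delta)$ defined in (S101) is achieved. Using the inclusions between the sets of operations $\mathrm{KP}^s(\delta)\subseteq\mathrm{KP}^g(\delta)$ and $\mathrm{KP}^s(\delta)\subseteq\mathrm{KP}_N(\delta)$, we therefore obtain that the maximal fidelity achievable with $\mathrm{KP}^s(\delta)$ or $\mathrm{KP}^g(\delta)$ operations equals $f_{n,m}(\delta)$, and that achievable with $\mathrm{KP}_N(\delta)$ operations lies between $f_{n,m}(\delta)$ and $f_{n,m}(2\delta)$, for all n, m, δ such that $(1+\delta)\,2^{-m}\le 1$ (S108). We now pass from the fidelity to the trace distance. We claim that, for these classes of operations, the minimal trace-distance error equals one minus the maximal fidelity (S109). To see why this is the case, it suffices to observe that we can always twirl the output of Λ without loss of generality, i.e. we can substitute Λ ↦ T ∘ Λ, where T is defined as in (S28). Doing so does not change the fact that $\Lambda\in\mathrm{KP}_G(\delta)$, simply because G is convex and invariant under local unitaries; furthermore, twirling leaves $F\big(\Lambda(\rho^{\otimes n}),\Phi_2^{\otimes m}\big) = \operatorname{Tr}\big[\Lambda(\rho^{\otimes n})\,\Phi_2^{\otimes m}\big]$ invariant and never increases $\frac12\big\|\Lambda(\rho^{\otimes n})-\Phi_2^{\otimes m}\big\|_1$ (see (S29)). At the same time, since the output state of Λ is now twirled, and hence it is a convex combination of the orthogonal states $\Phi_2^{\otimes m}$ and τ, we have that $\frac12\big\|\Lambda(\rho^{\otimes n})-\Phi_2^{\otimes m}\big\|_1 = 1-F\big(\Lambda(\rho^{\otimes n}),\Phi_2^{\otimes m}\big)$.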

V. FLUCTUATIONS IN ENTANGLEMENT MANIPULATION PROTOCOLS AND COMPARISON WITH THERMODYNAMICS
When studying the asymptotic behaviour of quantum systems, allowing for some type of microscopic fluctuation is arguably very natural from a thermodynamical perspective. Indeed, thermodynamics is a theory of macroscopic states and quantities, which should be unaffected by microscopic degrees of freedom and by the fluctuations associated with them. Although in Supplementary Note IV we justified our framework for entanglement manipulation by demonstrating the issues that come with enforcing only asymptotic (rather than exact) entanglement non-generation, it is natural to wonder: Is it not more physically meaningful to allow for fluctuations at the level of microscopic systems, rather than completely forbid entanglement generation? Let us discuss the types of fluctuations that are allowed in our entanglement manipulation framework, to illustrate why our setting is in fact very similar to others considered previously in the literature, and to pinpoint the key difference with Brandão and Plenio's approach [23,24].
Although fluctuations are a natural component of our theory, not all fluctuations are equal: in the context of state-to-state transformations in entanglement theory (cf. the discussion in Section I B), and more generally in that of quantum resource theories [28], we find it useful to introduce the following classification.
(I) Fluctuations at the level of microscopic transformation error. Specifically, one typically requires that transformations of the form ρ → ω are only realised approximately, as $\Lambda_n(\rho^{\otimes n})\approx\omega^{\otimes\lfloor rn\rfloor}$, with some non-zero error (usually measured by the trace norm), and only in the limit n → ∞ does the error vanish. Such fluctuations are common to most approaches to quantum resource theories, including quantum thermodynamics as well as entanglement theory [28].
Crucially, without such fluctuations, even quantum thermodynamics is no longer a reversible theory, and many state transformations become impossible [63,106,107].
We do note, however, that in some contexts one may in fact be interested in exact, i.e. zero-error, entanglement distillation [108,109] or dilution [22,96,110]. In the language we are using here, such settings would explicitly rule out microscopic transformation errors (as in (S22) and (S23)), or would require them to vanish exceedingly fast, i.e. super-exponentially [111]. For these reasons, these approaches are generally regarded as less physically motivated than the one we take here, which explicitly allows for vanishingly small transformation errors.
(II) Fluctuations in the resources consumed by the process. For example, in quantum thermodynamics, one often includes a small ancillary system to activate a desired transformation [39,54]. This is arguably necessary to circumvent the energy super-selection rule that prohibits the creation of coherence between different energy levels under an energy-preserving unitary: without the ancilla, an energy-incoherent state would remain energy-incoherent under thermal operations, thus preventing dilution of resources altogether [39]. Including a small ancillary system that carries some coherence in the energy eigenbasis solves the problem, but its small size means that its contributions to the rate will asymptotically vanish, leaving the underlying physics unaffected.
More detailed insights on sensible physical requirements to impose on the ancillary system can be deduced from [54]. There the authors look at transformations in which an energy-preserving unitary U acts jointly on the system and on a 'small' and 'not too energetic' ancilla, after which the ancillary modes are traced out (Tr_2 denotes this partial trace). Although the setting of that work is slightly different from the one considered here or in [39], the conclusions are nevertheless insightful: in [54] it is found that, although fluctuations (II) are indeed necessary to enable the reversibility of the theory, ancillae with sub-exponential dimension (with the logarithm of the dimension growing sublinearly, as specified in [54]) and sublinear Hamiltonian operator norm, $\|H\|_\infty = O(n^{2/3})$, suffice to establish the desired result.
Since they play such a key role in thermodynamics, it is certainly meaningful to try to take into account fluctuations of this type in entanglement theory as well. However, although it may not be apparent at first sight, these fluctuations are already implicitly included in our framework, and in fact in most of the commonly encountered formulations of entanglement theory. They take the form of sublinear fluctuations in the number of entanglement bits consumed by either the distillation or the dilution process. But in the definitions (S19) and (S20) we only care about asymptotic rates, so sublinear fluctuations are suppressed in the limit.
In other words, suppose that in our original definitions in Section I B we chose to consider a larger class of transformations than K-preserving ones; a general transformation of this new class would be obtained by (a) attaching an ancilla of o(n) many qubits per party, initialised in any (possibly entangled) state, and (b) performing a K-preserving operation on the joint system. In this way, the allowed transformations are of the form $X\mapsto\Lambda_n(X\otimes\eta_n)$, where $\Lambda_n\in\mathrm{KP}$ is an arbitrary K-preserving operation, and $\eta_n = \eta_{A'_nB'_n}$ is an ancillary state over a system $A'_nB'_n$. Our assumptions on this ancilla are as follows: (i) $A'_nB'_n$ can be an arbitrary bipartite system, possibly dependent on n, but it has to have sub-exponential dimension, i.e. $d_n := \dim(A'_nB'_n) = 2^{o(n)}$; that is, $\log_2 d_n = o(n)$ is required to be sublinear, or in other words $A'_nB'_n$ should be made of sublinearly many qubits; (ii) apart from this, η_n is completely arbitrary. It could, for instance, be a maximally entangled state composed of sublinearly many ebits.
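Formally, the new distillable entanglement $\widetilde E^{\varepsilon}_{d,\mathrm{KP}}$ and entanglement cost $\widetilde E^{\varepsilon}_{c,\mathrm{KP}}$ are defined as in (S19) and (S20), except that the operation acting on n copies is now of the ancilla-assisted form above (S121)-(S122). Here $d_n$ denotes the dimension of the ancillary system $A'_nB'_n$.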
We can now ask ourselves: can it be that $\widetilde E^{\varepsilon}_{d,\mathrm{KP}}(\rho) > E^{\varepsilon}_{d,\mathrm{KP}}(\rho)$ or $\widetilde E^{\varepsilon}_{c,\mathrm{KP}}(\rho) < E^{\varepsilon}_{c,\mathrm{KP}}(\rho)$? In other words, can granting a sublinear number of ancillary qubits allow for better rates in either entanglement distillation or entanglement dilution? The answer turns out to be negative, and the fundamental reason is that ancillae made of a sublinear number of qubits cannot change the rate. In full detail, a proof can be constructed as follows.
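Lemma S18. For any state ρ, it holds that $\widetilde E^{\varepsilon}_{d,\mathrm{KP}}(\rho) = E^{\varepsilon}_{d,\mathrm{KP}}(\rho)$ and $\widetilde E^{\varepsilon}_{c,\mathrm{KP}}(\rho) = E^{\varepsilon}_{c,\mathrm{KP}}(\rho)$, where the tilde denotes the ancilla-assisted quantities. Proof. It is clear that $\widetilde E^{\varepsilon}_{d,\mathrm{KP}}(\rho)\ge E^{\varepsilon}_{d,\mathrm{KP}}(\rho)$ and $\widetilde E^{\varepsilon}_{c,\mathrm{KP}}(\rho)\le E^{\varepsilon}_{c,\mathrm{KP}}(\rho)$, so we only need to prove the converse inequalities. To do so, assume that r is an achievable rate for $\widetilde E^{\varepsilon}_{c,\mathrm{KP}}(\rho)$, so that we can find a sequence of ancillae η_n and K-preserving operations $\Lambda_n\in\mathrm{KP}$ with the property that $\limsup_{n\to\infty}\frac12\big\|\Lambda_n\big(\Phi_2^{\otimes\lceil rn\rceil}\otimes\eta_n\big)-\rho^{\otimes n}\big\|_1\le\varepsilon$ (S124). Fix κ > 0, and take n large enough so that $2^{\lfloor\kappa n\rfloor}\ge d_n$, which is possible because $\log_2 d_n = o(n)$. Since any state can be prepared from the maximally entangled state of the same dimension via LOCC, we can prepare η_n starting from $\Phi_2^{\otimes\lfloor\kappa n\rfloor}$ and using an LOCC Ξ_n, i.e. consuming ⌊κn⌋ many ebits. In formula, $\Xi_n\big(\Phi_2^{\otimes\lfloor\kappa n\rfloor}\big) = \eta_n$ (S125). Now, consider the operation $\Lambda'_n := \Lambda_n\circ\big(\mathrm{id}^{\otimes\lceil rn\rceil}\otimes\Xi_n\big)$, where $\mathrm{id}^{\otimes\lceil rn\rceil}$ denotes the identity channel on the first ⌈rn⌉ ebit systems. Note that Λ'_n is still K-preserving, because the set of K-preserving operations is closed under composition, and LOCC operations are K-preserving. Moreover, it satisfies $\Lambda'_n\big(\Phi_2^{\otimes(\lceil rn\rceil+\lfloor\kappa n\rfloor)}\big) = \Lambda_n\big(\Phi_2^{\otimes\lceil rn\rceil}\otimes\eta_n\big)$. Thus, thanks to (S124), the protocol $(\Lambda'_n)_n$ prepares $\rho^{\otimes n}$ from $\lceil rn\rceil+\lfloor\kappa n\rfloor$ ebits with asymptotic error at most ε (S127). Since now there is no ancilla, we obtain that the rate r + κ is actually achievable for $E^{\varepsilon}_{c,\mathrm{KP}}(\rho)$. Therefore, $E^{\varepsilon}_{c,\mathrm{KP}}(\rho)\le r+\kappa$. Taking the infimum over r yields $E^{\varepsilon}_{c,\mathrm{KP}}(\rho)\le\widetilde E^{\varepsilon}_{c,\mathrm{KP}}(\rho)+\kappa$. Since κ > 0 was arbitrary, we obtain that in fact $E^{\varepsilon}_{c,\mathrm{KP}}(\rho)\le\widetilde E^{\varepsilon}_{c,\mathrm{KP}}(\rho)$, which is what we wanted to show.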
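To show that $\widetilde E^{\varepsilon}_{d,\mathrm{KP}}(\rho)\le E^{\varepsilon}_{d,\mathrm{KP}}(\rho)$, assume that r is an achievable rate for $\widetilde E^{\varepsilon}_{d,\mathrm{KP}}(\rho)$, i.e. that there exist a valid choice of ancillae $(\eta_n)_n$ and K-preserving operations $\Lambda_n\in\mathrm{KP}$ such that $\limsup_{n\to\infty}\frac12\big\|\Lambda_n(\rho^{\otimes n}\otimes\eta_n)-\Phi_2^{\otimes\lfloor rn\rfloor}\big\|_1\le\varepsilon$. Noting that $A'_nB'_n$ is a finite-dimensional system, the optimal decomposition for the standard robustness in (S11) exists, and we can write $\eta_n = (1+s_n)\,\sigma_+-s_n\,\sigma_-$ with $s_n := R^s_{\mathcal K}(\eta_n)$, for some normalised states $\sigma_\pm\in\mathcal K$. Observe then that for any state $\sigma\in\mathcal K$, it holds that $\sigma\otimes\eta_n = (1+s_n)\,\sigma\otimes\sigma_+-s_n\,\sigma\otimes\sigma_-$ with $\sigma\otimes\sigma_\pm\in\mathcal K$, due to the fact that K is closed under tensor product. This implies that $R^s_{\mathcal K}\big(\Lambda_n(\sigma\otimes\eta_n)\big)\le s_n\le d_n-1$, where the first inequality holds because Λ_n maps $\sigma\otimes\sigma_\pm$ into K, and the last one is due to the fact that the value $R^s_{\mathcal K}(\Phi_d) = d-1$ attained by the maximally entangled state is maximal among states of the same dimension. As $d_n = 2^{o(n)}$ by assumption, this entails that each $\Lambda_n(\cdot\otimes\eta_n)$ is an $(R^s_{\mathcal K}, 2^{o(n)})$-approximately K-preserving operation (cf. Supplementary Note IV). The proof is thus concluded by recalling from Lemma S17 that such operations cannot increase the rate of distillation compared to exactly K-preserving operations, that is, $r\le E^{\varepsilon}_{d,\mathrm{KP}}(\rho)$.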
In the context of quantum thermodynamics, we can compare this to works which studied the manipulation of quantum systems under so-called Gibbs-preserving operations (e.g. [30]), defined to be all channels Λ such that $\Lambda(\gamma_A) = \gamma_B$, where γ_A, γ_B are the equilibrium states of the input and output system, respectively. Although no explicit resource fluctuations are allowed in such a definition, the Gibbs-preserving framework recovers exactly the same asymptotic rates as frameworks that do consider fluctuations in the consumed resources [39,54]; this is precisely because type-II fluctuations are implicit there, as one only looks at the rates.
(III) Genuine fluctuations in the fundamental physical laws governing the process. These fluctuations are excluded in most of the literature on quantum thermodynamics: for example, in Refs. [39,54] the unitary operators acting on the joint system (including the ancilla) are required to be exactly, and not only approximately, energy-preserving. This is well justified from a physical perspective: analogously, we would not claim that energy is only approximately preserved in a piece of uranium because we lost track of some neutrinos; include those back into the picture, and you will restore exact energy preservation. It should be noted at this point that although the law of energy conservation is, to the best of our knowledge, exactly obeyed, this fact alone does not play a decisive role for the validity of thermodynamics, which is a macroscopic rather than microscopic theory. Still, the fact that it is conceivable that Nature preserves energy exactly shows that a setting excluding genuine type-III fluctuations, yet encompassing type-I and type-II fluctuations, is somewhat reasonable, if not completely satisfactory.
Importantly, in most of the known physical theories including thermodynamics, disallowing type-III fluctuations does not change the underlying physics whatsoever. Genuine fluctuations of the physical laws governing thermodynamical processes (such as global unitarity and energy conservation [39]) can in principle be considered, as we elaborate below. However, they are not necessary to construct a reversible theory of quantum thermodynamics [30,39,112,113].
A possible objection to this claim could be that the maps considered in [39,54] are in fact not exactly but only approximately Gibbs-preserving. While this could seem prima facie an example of a type-III fluctuation, it is only a spurious one. Essentially, this is because of the aforementioned fact that these maps can be thought of as Gibbs-preserving operations acting on a larger quantum system with an ancilla of sublinear size. These apparent type-III fluctuations are thus in fact type-II fluctuations in disguise, and as such they are included also in our original framework.
What our results show in this context is that Brandão and Plenio's asymptotically non-entangling operations [23,24] admit fluctuations that are genuinely of type III (see Section IV). Specifically, these fluctuations cannot be re-absorbed into the type-II category. If they could, then the entanglement cost $E_{c,\mathrm{ANE}}$ under asymptotically non-entangling operations [24, Definition III.2] would necessarily be equal to that under non-entangling operations, which we denoted by $E_{c,\mathrm{NE}}$. Since the former is given by the regularised relative entropy of entanglement [24,114], in formula $E_{c,\mathrm{ANE}} = D^\infty_{\mathcal S}$, we would deduce that the entanglement cost under non-entangling operations must also equal $E_{c,\mathrm{NE}} = D^\infty_{\mathcal S}$. But that this is not the case in general is precisely the content of our main result (Theorem S9), whose proof reveals that $E_{c,\mathrm{NE}}(\omega_3) > D^\infty_{\mathcal S}(\omega_3)$. Nevertheless, one can argue that allowing type-III fluctuations is actually the sensible thing to do in entanglement manipulation, as it allows one to achieve an even greater level of generality and a stricter adherence to the spirit of thermodynamics. This is especially important when discussing no-go results such as asymptotic irreversibility. If reversibility could be restored with just an unequivocally vanishing amount of fluctuations (even ones of type III), then, arguably, such irreversibility would not be robust. This is precisely the motivation for our Theorem S14, where we clarify that it is impossible to have any reversible framework of entanglement that generates only small amounts of it. Using as an entanglement quantifier either the standard robustness of entanglement or the negativity, we show that these 'fluctuations' must in fact be macroscopically large, putting into question the physicality of any conceivable reversible theory of entanglement.
What is pivotal to understanding the consequences of our results is that thermodynamics is a reversible theory even when fluctuations of type III are not allowed. This is addressed in works such as Refs. [39,112,113], where the manipulation of quantum states under thermal operations was considered. This was extended also to quantum channels in [30] under the Gibbs-preserving framework. That is, fluctuations of types I and II fully suffice to recover the entropy as the unique quantity governing the thermodynamical transformations of macroscopic systems. Moreover, the physical constraints of energy conservation can be enforced at all scales, including microscopic ones. In fact, we stress that in the Gibbs-preserving framework, which is completely analogous to the non-entangling operations used in our work, even fluctuations of type I are sufficient to establish the reversibility of thermodynamics [30], but one can equivalently consider type-I and type-II fluctuations.
Therefore, our entanglement manipulation framework under non-entangling operations, which does allow type-I and type-II but forbids type-III fluctuations, follows exactly the same reasoning as the quantum thermodynamics frameworks of Refs. [30,39,40,45,46,54,112,113,115]. And yet it leads to a completely opposite conclusion, as per Theorem S9. We find the stark contrast surprising. Thermodynamics is asymptotically reversible, whereas entanglement theory turns out to be fundamentally irreversible under the same assumptions, no matter how much one strives to avoid it.
We note here that, despite the issue with the proof of [24] uncovered in [25], the part of Brandão and Plenio's argument establishing $E_{c,\mathrm{ANE}} = D^\infty_{\mathcal S}$ is not affected. The proof can also be obtained by combining the framework of [24] with the asymptotic equipartition property of $R^g_{\mathcal S}$, for which an alternative proof is given in [114].
That is not to say that there is something inherently wrong with the framework proposed by Brandão and Plenio [24] -quite the contrary, there is no intrinsic reason why fluctuations of type III cannot be allowed, even if they are not needed in thermodynamics. However, due to the ambiguity in defining 'small' amounts of entanglement in the asymptotic limit, even such permissive fluctuations are not enough in light of our Theorem S14, unless they are made non-vanishingly large. Our work therefore provides insight into how any potential reversible entanglement framework would function and the size of 'fluctuations' it would need to allow.
In conclusion, although connections between entanglement theory and thermodynamics such as the one conjectured in [24] can possibly be established, our work conclusively shows that reversibility of entanglement, if at all possible, cannot play out in the same way as it does in the theory of thermodynamics (with type-I and type-II fluctuations, but without type-III fluctuations), or indeed not even in a similar way (with universally small type-III fluctuations). A fundamental and inexorable difference thus exists between the two theories.

A. Confessions of $\omega_3$
Here we explain the intuition behind our choice of the state $\omega_3$ as the counterexample to the reversibility of entanglement manipulation (Theorems S9 and S14). It relies on the relation between two entanglement monotones that we have encountered in the course of this work: the standard robustness $R^s_{\mathcal K}$ and the generalised robustness $R^g_{\mathcal K}$. Crucially, the work of Brandão and Plenio [24] connected each of these quantities with the operational tasks of entanglement distillation and dilution. In particular, Ref. [24] (cf. [44]) considered only a single copy of a state rather than an asymptotic rate. It showed that, in this setting, the entanglement cost under K-preserving operations is given exactly by $\log_2\big(1 + R^s_{\mathcal K}(\rho)\big)$. This then implies that the asymptotic exact cost of entanglement (Eq. (S23)) is given by
$$E^{\mathrm{exact}}_{c,\mathrm{KP}}(\rho) = \lim_{n\to\infty} \frac1n \log_2\big(1 + R^s_{\mathcal K}(\rho^{\otimes n})\big). \qquad (S135)$$
The entanglement cost $E_{c,\mathrm{KP}}$ also needs to account for an asymptotically vanishing transformation error. It can thus be expressed by suitably 'smoothing' the robustness over the $\varepsilon$-ball around a given state, i.e.
$$E_{c,\mathrm{KP}}(\rho) = \lim_{\varepsilon\to0^+} \limsup_{n\to\infty} \frac1n \inf_{\substack{\tau_n \in \mathcal D(\mathcal H_{AB}^{\otimes n}),\\ \frac12\|\tau_n - \rho^{\otimes n}\|_1 \le \varepsilon}} \log_2\big(1 + R^s_{\mathcal K}(\tau_n)\big). \qquad (S136)$$
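For the reader's convenience, the two robustness measures compared in this subsection can be summarised as follows. This is the standard form, written here under the assumption that it agrees with the definitions given earlier (cf. (S11) for the standard robustness). In it, $\xi$ denotes an arbitrary, possibly entangled, state. Every decomposition feasible for $R^s_{\mathcal K}$ is also feasible for $R^g_{\mathcal K}$, so one always has $R^g_{\mathcal K} \le R^s_{\mathcal K}$.

```latex
R^s_{\mathcal K}(\rho) = \inf\left\{ s \ge 0 \;:\; \rho = (1+s)\,\sigma^+ - s\,\sigma^-,\ \ \sigma^\pm \in \mathcal K \right\},
\qquad
R^g_{\mathcal K}(\rho) = \inf\left\{ s \ge 0 \;:\; \rho = (1+s)\,\sigma^+ - s\,\xi,\ \ \sigma^+ \in \mathcal K,\ \xi \in \mathcal D(\mathcal H_{AB}) \right\}.
```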
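On the other hand, employing a connection established between the generalised robustness $R^g_{\mathcal K}$ and the regularised relative entropy $D^\infty_{\mathcal K}$ [24,114], the distillable entanglement can be tightly upper bounded as [24,93]
$$E_{d,\mathrm{KP}}(\rho) \le D^\infty_{\mathcal K}(\rho) = \lim_{\varepsilon\to0^+} \lim_{n\to\infty} \frac1n \inf_{\substack{\tau_n \in \mathcal D(\mathcal H_{AB}^{\otimes n}),\\ \frac12\|\tau_n - \rho^{\otimes n}\|_1 \le \varepsilon}} \log_2\big(1 + R^g_{\mathcal K}(\tau_n)\big). \qquad (S137)$$
These expressions lead to a very curious fact. Showing the irreversibility of entanglement manipulation under the class of K-preserving maps can be done by exhibiting a gap between the smoothed regularisations of $R^s_{\mathcal K}$ and $R^g_{\mathcal K}$ in (S136) and (S137).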
The daunting expressions for the regularised robustness measures do not immediately make studying reversibility any easier. Let us then start by asking a more basic question: is there a gap between the standard and the generalised entanglement robustness, $R^s_{\mathcal S}$ and $R^g_{\mathcal S}$? We stumbled upon the state $\omega_3$ precisely when looking for a way to demonstrate such a gap. Indeed, the value of the standard robustness $R^s_{\mathcal S}(\omega_3)$ was already established in the course of proving Theorem S9, see (S138). It is also not difficult to show that the decomposition $\omega_3 + \frac12 \Phi_3 = \frac12 P_3$ is optimal for the generalised robustness of this state, giving
$$R^g_{\mathcal S}(\omega_3) = \frac12. \qquad (S139)$$
We thus see an explicit gap between the two robustness measures. $R^s_{\mathcal S}$ and $R^g_{\mathcal S}$ are directly connected to the entanglement cost and the distillable entanglement, respectively. This connection motivated us to look into $\omega_3$ as a possible candidate for a state whose cost could be strictly larger than the entanglement that can be distilled from it. This leads directly to the considerations expounded in the main text of the paper and to the main results of this work.
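The decomposition used for the generalised robustness can be verified numerically. The sketch below assumes the definition $\omega_3 = \frac12(P_3 - \Phi_3)$ with $P_3 = \sum_i |ii\rangle\langle ii|$, which is consistent with the decomposition $\omega_3 + \frac12\Phi_3 = \frac12 P_3$ quoted above. It checks four things:

- $\omega_3$ is a state.
- $\omega_3$ is supported on $\mathrm{span}\{|ii\rangle\}$, i.e. it is maximally correlated.
- $\omega_3 = (1+s)\,(P_3/3) - s\,\Phi_3$ with $s = \frac12$ and $P_3/3$ separable.
- The resulting single-copy value $\log_2(1+s)$.

Combined with the standard submultiplicativity of $1 + R^g$, this value upper bounds $D^\infty_{\mathcal K}(\omega_3)$ in (S137). The check only certifies $R^g_{\mathcal S}(\omega_3) \le \frac12$; optimality is not tested.

```python
import numpy as np

m = 3
basis_ii = [i * m + i for i in range(m)]     # indices of |ii> in C^3 (x) C^3

P3 = np.zeros((m * m, m * m))
P3[basis_ii, basis_ii] = 1.0                 # P_3 = sum_i |ii><ii|

v = np.zeros(m * m)
v[basis_ii] = 1.0 / np.sqrt(m)
Phi3 = np.outer(v, v)                         # maximally entangled state Phi_3

omega3 = (P3 - Phi3) / 2                      # assumed definition of omega_3

# omega_3 is a state (unit trace, positive semidefinite)
eigs = np.linalg.eigvalsh(omega3)
is_state = bool(np.isclose(np.trace(omega3), 1.0) and eigs.min() > -1e-12)

# omega_3 is maximally correlated: it lives on span{|ii>}, whose projector is P_3
maximally_correlated = np.allclose(P3 @ omega3 @ P3, omega3)

# Generalised-robustness decomposition: omega_3 + s * Phi_3 = (1 + s) * (P_3 / 3),
# where P_3 / 3 is an equal mixture of the product states |ii>, hence separable
s = 0.5
decomposition_ok = np.allclose(omega3 + s * Phi3, (1 + s) * (P3 / 3))

# Single-copy value log2(1 + s) bounding the distillable entanglement, cf. (S137)
distillation_bound = np.log2(1 + s)
print(is_state, maximally_correlated, decomposition_ok, round(distillation_bound, 4))
```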

B. Strong converses and error-rate trade-offs
According to Definition S3, the theory of entanglement manipulation is considered reversible under some set of operations if the distillable entanglement and the entanglement cost coincide for all states. As we have seen, these latter two quantities are defined in terms of asymptotic transformations in which the allowed error is required to vanish as the number of copies of the state grows. In information theory, classical as well as quantum, one can also study weaker notions of transformation rates, dubbed strong converse rates. In this setting, one requires instead only that the error, as measured by half of the trace norm distance between the output and the target state, stay bounded away from its maximum value of 1. Intuitively, this means that attempting to distill or dilute entanglement at a rate larger than the strong converse one necessarily incurs an error that tends to 1, making the protocol useless. Formally, we can define the strong converse rates of distillation and dilution, respectively, by
$$E^\dagger_{d,\mathrm{KP}}(\rho) := \sup_{\varepsilon \in [0,1)} E^{(\varepsilon)}_{d,\mathrm{KP}}(\rho), \qquad E^\dagger_{c,\mathrm{KP}}(\rho) := \inf_{\varepsilon \in [0,1)} E^{(\varepsilon)}_{c,\mathrm{KP}}(\rho), \qquad (S140)$$
where $E^{(\varepsilon)}_{d,\mathrm{KP}}$ and $E^{(\varepsilon)}_{c,\mathrm{KP}}$ denote the rates achievable with asymptotic error at most $\varepsilon$.
As for the entanglement cost, we have not yet been able to prove that no improvement can be obtained by considering the corresponding strong converse rate. However, we deem such a possibility quite unlikely. We present strong evidence against it by showing that any such error would have to be very large, which effectively rules out this approach as a practically viable way of improving entanglement dilution. We leave the complete solution of this problem, which we formalise below, as an open question for future work.
are considered. Crucially, our counterexample $\omega_3$ is a maximally correlated state. This is relevant because, for any maximally correlated state, the NE-based and the MIO-based quantities are both given by the relative entropy of coherence [41,42,65] of the corresponding single-party state.
Intuitively, any MIO transformation can be seen to give rise to a transformation which is non-entangling, but only when restricted to the maximally correlated subspace; our result shows that such maps cannot always be extended to a transformation which is non-entangling for any input state, and so NE operations are weaker at manipulating maximally correlated states than MIO operations are at manipulating their corresponding single-party systems.

D. More open questions
Our results motivate a number of extensions and follow-up results that would strengthen the understanding of general entanglement manipulation. We have already remarked on several of them throughout this Supplementary Information; let us collect them here and discuss other open questions.
First, our lower bound on $E_{c,\mathrm{NE}}(\rho)$ is good enough to establish the irreversibility of entanglement manipulation. Still, one could ask whether a tighter computable bound can be obtained. For instance, does the regularised quantity in Theorem S7 admit a single-letter expression? Even more ambitiously, one could ask whether an exact expression for the entanglement cost $E_{c,\mathrm{NE}}$ itself can be established. Such questions are interesting not only from an axiomatic perspective, but also because any such result would provide improved bounds on $E_{c,\mathrm{LOCC}}$.
Additionally, as mentioned in Supplementary Note III, several quantum states which have previously been used as examples of irreversibility under smaller sets of operations are actually reversible under NE and PPTP. It would help to understand what exactly makes a state such as $\omega_3$ irreversible under all non-entangling transformations. More generally, a complete characterisation of all irreversible states could help shed light on the stronger type of irreversibility that we have revealed in this work.
Another result uncovered in Supplementary Note IV of our work concerns the amount of entanglement generated. Brandão and Plenio's framework [23,24] suggests that entanglement can be reversibly manipulated while generating only small (asymptotically vanishing) amounts of the generalised robustness $R^g_{\mathcal S}$. By contrast, requiring that the standard robustness $R^s_{\mathcal S}$ also be small completely breaks reversibility. It would be very interesting to understand exactly why such a 'phase transition' in the task of entanglement dilution occurs, and whether one can tighten our and Brandão and Plenio's results with other choices of monotones.
Also, the phenomenon of catalysis [52] is a remarkable feature that can significantly enhance feasible entanglement transformations. The fact that a transformation $\rho \to \sigma$ is impossible does not necessarily mean that $\rho \otimes \tau \to \sigma \otimes \tau$ cannot be accomplished with the same set of allowed processes. One could then ask about catalytic transformation rates. Instead of requiring that $\Lambda_n(\rho^{\otimes n}) \to \sigma^{\otimes \lfloor rn \rfloor}$, one asks that $\Lambda_n(\rho^{\otimes n} \otimes \tau) \to \sigma^{\otimes \lfloor rn \rfloor} \otimes \tau$ for some state $\tau$, and defines the distillable entanglement and entanglement cost correspondingly. Additionally, the operations NE and PPTP are rather curious in that they are not closed under tensor product. That is, $\Lambda, \Lambda' \in \mathrm{NE}$ does not imply $\Lambda \otimes \Lambda' \in \mathrm{NE}$ when acting on a larger system. Such properties leave open the possibility of significantly different behaviour when catalysis is employed, something that was already remarked in [23,24].
Finally, a fundamentally important question is: what is it about entanglement that makes it irreversible? As discussed in the main text, several examples of quantum resource theories have been shown to be reversible when all resource-non-generating operations (counterparts to non-entangling or PPT-preserving maps) are allowed [28]. There is no a priori reason to expect reversibility to be a generic property of quantum resources. Still, one can note that the framework and main results of Brandão and Plenio may be adapted to more general convex resource theories [118]. This suggests [25] that reversibility can be guaranteed when asymptotically resource-non-generating transformations are allowed and the generated resource is quantified with a measure of generalised-robustness type. Our question then becomes: for which types of resources can we enforce strict resource non-generation and still maintain reversibility? Is entanglement truly unique in its general irreversibility?

VII. SUBTLETIES RELATED TO INFINITE DIMENSION
Throughout this section we will elaborate on some issues that arise specifically in dealing with infinite-dimensional spaces.

A. On the definition of partial transpose
The definition of the partial transposition [58] requires some further care when one deals with infinite-dimensional spaces, the main reason being that this operation does not preserve the space of trace class operators. We first define it on the dense subspace T (

B. Topological properties of the cone of PPT operators
An important property of the two cones of separable and of PPT operators is that they are closed with respect to the topology induced on $\mathcal T(\mathcal H_{AB})$ by its pre-dual, the Banach space of compact operators on $\mathcal H_{AB}$. We will refer to this topology as the weak*-topology. The fact that $\mathcal S$ is weak*-closed has been established in Ref. [78, Lemma 25]. An analogous statement for PPT can be proved even more directly (cf. [119, Lemma 13]).
Lemma S21. The cone $\mathrm{PPT} \subseteq \mathcal T(\mathcal H_{AB})$ defined by (S7) is weak*-closed, i.e. closed with respect to the topology induced on $\mathcal T(\mathcal H_{AB})$ by its pre-dual, the Banach space of compact operators on $\mathcal H_{AB}$.
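Proof. Note that $\mathrm{PPT} = \mathcal T_+(\mathcal H_{AB}) \cap \big(\Gamma(\mathcal T_+(\mathcal H_{AB})) \cap \mathcal T(\mathcal H_{AB})\big)$. The intersection with $\mathcal T(\mathcal H_{AB})$ in the second factor reminds us of the fact that $\Gamma(\mathcal T_+(\mathcal H_{AB})) \subseteq \mathcal B(\mathcal H_{AB})$ contains operators that are not of trace class. Since $\mathcal T_+(\mathcal H_{AB})$ is well known to be weak*-closed, it suffices to show that $\Gamma(\mathcal T_+(\mathcal H_{AB})) \cap \mathcal T(\mathcal H_{AB})$ is weak*-closed as well. Pick local orthonormal bases $\{|i\rangle_A\}_{i\in\mathbb N}$ and $\{|j\rangle_B\}_{j\in\mathbb N}$. Let us say that an operator $Z \in \mathcal T(\mathcal H_{AB})$ has a finite expansion if $\langle i,j|Z|k,l\rangle \neq 0$ only for a finite number of quadruples $(i,j,k,l) \in \mathbb N^4$. If this is the case, it is simple to verify that the partial transpose $Z^\Gamma$ also has a finite expansion; in particular, both $Z$ and $Z^\Gamma$ are compact operators.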
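We now claim that $X \in \mathcal T(\mathcal H_{AB})$ satisfies $X \in \Gamma(\mathcal T_+(\mathcal H_{AB}))$ if and only if $\operatorname{Tr} X Z^\Gamma \ge 0$ for all $Z \ge 0$ with a finite expansion. To see why, note that a necessary and sufficient condition for $X^\Gamma \ge 0$ is that all (finite) principal minors of $X^\Gamma$ be nonnegative. In particular, $X^\Gamma \ge 0$ if and only if $\operatorname{Tr} X^\Gamma Z = \operatorname{Tr} X Z^\Gamma \ge 0$ for all $Z \ge 0$ with a finite expansion. Since any such $Z^\Gamma$ is compact, the functionals $X \mapsto \operatorname{Tr} X Z^\Gamma$ are weak*-continuous. Hence $\Gamma(\mathcal T_+(\mathcal H_{AB})) \cap \mathcal T(\mathcal H_{AB})$, being an intersection of weak*-closed half-spaces, is weak*-closed. This concludes the proof.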
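The proof uses two elementary facts: the duality $\operatorname{Tr} X^\Gamma Z = \operatorname{Tr} X Z^\Gamma$, and the preservation of finite expansions under partial transposition. The sketch below checks both on finite-dimensional truncations. It also shows how a positive $Z$ with a finite expansion certifies that a state is not PPT: the singlet projector gives $\operatorname{Tr} \Phi_2 Z^\Gamma = -\frac12 < 0$. This is only a finite-dimensional illustration and does not capture the weak*-topological content of Lemma S21. The dimensions, the random seed and the helper names are arbitrary choices made here.

```python
import numpy as np

def partial_transpose_B(X, dA, dB):
    """Partial transpose on B: <i,j|X^Gamma|k,l> = <i,l|X|k,j>."""
    return X.reshape(dA, dB, dA, dB).transpose(0, 3, 2, 1).reshape(dA * dB, dA * dB)

rng = np.random.default_rng(seed=7)
dA, dB = 3, 4
n = dA * dB
X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

# Duality used in the proof: Tr X^Gamma Z = Tr X Z^Gamma (holds for any matrices)
duality_gap = abs(np.trace(partial_transpose_B(X, dA, dB) @ Z)
                  - np.trace(X @ partial_transpose_B(Z, dA, dB)))

# A finite expansion stays finite: keep only coefficients with all indices < cutoff
cutoff = 2
Z4 = rng.normal(size=(dA, dB, dA, dB))
mask = np.zeros((dA, dB, dA, dB), dtype=bool)
mask[:cutoff, :cutoff, :cutoff, :cutoff] = True
Z4[~mask] = 0.0
ZG4 = partial_transpose_B(Z4.reshape(n, n), dA, dB).reshape(dA, dB, dA, dB)
support_preserved = bool(np.all(ZG4[~mask] == 0.0))

# Positive Z = singlet projector detects that Phi_2 is not PPT
phi = np.array([1, 0, 0, 1]) / np.sqrt(2)
singlet = np.array([0, 1, -1, 0]) / np.sqrt(2)
witness_value = np.trace(
    np.outer(phi, phi) @ partial_transpose_B(np.outer(singlet, singlet), 2, 2)
).real
print(duality_gap < 1e-10, support_preserved, round(witness_value, 6))
```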