Fundamental limitations on distillation of quantum channel resources

Quantum channels underlie the dynamics of quantum systems, but in many practical settings it is the channels themselves that require processing. We establish universal limitations on the processing of both quantum states and channels, expressed in the form of no-go theorems and quantitative bounds for the manipulation of general quantum channel resources under the most general transformation protocols. Focusing on the class of distillation tasks — which can be understood either as the purification of noisy channels into unitary ones, or the extraction of state-based resources from channels — we develop fundamental restrictions on the error incurred in such transformations, and comprehensive lower bounds for the overhead of any distillation protocol. In the asymptotic setting, our results yield broadly applicable bounds for rates of distillation. We demonstrate our results through applications to fault-tolerant quantum computation, where we obtain state-of-the-art lower bounds for the overhead cost of magic state distillation, as well as to quantum communication, where we recover a number of strong converse bounds for quantum channel capacity.


Supplementary Note 1: Setting
All of our discussions will take place in finite-dimensional Hilbert spaces. Given the Hilbert space of a system $A$, we will write $L(A)$ for the linear operators, and $D(A)$ for the density operators acting on this space. We use CPTP($A \to B$) to denote the set of quantum channels, i.e. completely positive and trace-preserving (CPTP) maps from $L(A)$ to $L(B)$. We associate with each channel $\mathcal{E} : A \to B$ its Choi matrix $J_{\mathcal{E}} := \mathrm{id} \otimes \mathcal{E}(\Phi^+) \in L(R \otimes B)$, where $\Phi^+ = \sum_{i,j} |ii\rangle\langle jj|$ is the unnormalised maximally entangled state in $L(R \otimes A)$ and $R \cong A$. The normalised Choi state is then $\tilde{J}_{\mathcal{E}} := J_{\mathcal{E}} / d_A$, with $d_A$ the dimension of $A$. We use $\langle X, Y \rangle = \mathrm{Tr}(X^\dagger Y)$ for the Hilbert-Schmidt inner product between operators. All logarithms will be taken to the base 2, unless specified otherwise.
We will be concerned with schemes for transformations of quantum channels, that is, maps CPTP($A \to B$) → CPTP($C \to D$). Recall that such superchannels [1] can be written as $\Theta(\mathcal{E}) = \mathcal{M}_{R \otimes B \to D} \circ (\mathrm{id}_R \otimes \mathcal{E}) \circ \mathcal{N}_{C \to R \otimes A}$, where $\mathcal{N}$, $\mathcal{M}$ are some pre- and post-processing quantum channels, $\mathrm{id}$ is the identity channel, and $R$ denotes an ancillary system (see Fig. 1 in the main text).
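As a concrete illustration of the Choi matrix introduced above, the following minimal sketch builds it from a Kraus representation, with the reference system ordered first. The helper names `choi_matrix` and `depolarising_kraus`, and the choice of the qubit depolarising channel as an example, are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def choi_matrix(kraus_ops):
    """Unnormalised Choi matrix (id ⊗ E)(Φ+), with Φ+ = Σ_ij |ii><jj|.

    kraus_ops: list of (d_out x d_in) arrays K with Σ K†K = identity.
    The reference system R is the first tensor factor.
    """
    d_out, d_in = kraus_ops[0].shape
    J = np.zeros((d_in * d_out, d_in * d_out), dtype=complex)
    for i in range(d_in):
        for j in range(d_in):
            e_ij = np.zeros((d_in, d_in), dtype=complex)
            e_ij[i, j] = 1.0
            # Apply the channel to the matrix unit |i><j|.
            out = sum(K @ e_ij @ K.conj().T for K in kraus_ops)
            J += np.kron(e_ij, out)
    return J

def depolarising_kraus(p):
    """Kraus operators of E(rho) = (1 - p) rho + p * I/2 (illustrative example)."""
    I = np.eye(2, dtype=complex)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)
    return [np.sqrt(1 - 3 * p / 4) * I] + [np.sqrt(p / 4) * P for P in (X, Y, Z)]

J = choi_matrix(depolarising_kraus(0.3))
J_normalised = J / 2  # normalised Choi state, dividing by the input dimension
# Trace preservation: tracing out the output system leaves the identity on R.
marginal_R = np.trace(J.reshape(2, 2, 2, 2), axis1=1, axis2=3)
```

Complete positivity corresponds to `J` being positive semidefinite, and trace preservation to `marginal_R` being the identity, which is a convenient numerical sanity check for any channel fed into the measures discussed below.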

Resource theories
As discussed in the main text, a given set of channels $\mathbb{O}$ is designated as free: these are the channels which are freely available to use within the constraints of the given physical setting, and all channels outside of this set are the resourceful ones. To ensure broad applicability, we only make the natural assumptions that the set $\mathbb{O}$ is closed and convex. When discussing channels acting on different spaces, we will assume that each space in consideration has its own associated set of free channels $\mathbb{O}$, and for simplicity of notation we often do not explicitly indicate the relevant spaces.
The transformations of channels, and the issue of what exactly constitutes a free channel transformation, are approached in various ways [2-5]. Often, one is interested in the manipulation of channels under superchannels of the form $\Theta(\mathcal{E}) = \mathcal{M} \circ (\mathrm{id} \otimes \mathcal{E}) \circ \mathcal{N}$ where the pre- and post-processing channels $\mathcal{N}$ and $\mathcal{M}$ are both free. In order to apply our results to more general settings, we will make no such assumption, and instead take the weakest possible constraint on the considered set of free superchannels: that for any $\mathcal{M} \in \mathbb{O}$, we necessarily have $\Theta(\mathcal{M}) \in \mathbb{O}$. We use $\mathbb{S}$ to denote the set of all such resource-preserving superchannels. By studying these transformations, we will therefore obtain the most general bounds on the achievable performance of any free channel manipulation protocol, since any valid choice of free transformations will necessarily be a subset of $\mathbb{S}$.
The first type of resources that we consider is concerned with the investigation of intrinsic channel resources; this includes the various resource theories of quantum communication [4,6-9], the related setting of quantum memories [10,11], and quantum error mitigation [12,13]. The other type is concerned with an underlying state-based resource and the manipulation of channels in order to extract or utilise the state resource more effectively; this includes, for instance, quantum entanglement [14-19], coherence [20-24], thermodynamics [25], or non-stabiliser (magic) states [26,27]. In the case of such state-based theories, we will use $\mathbb{F}$ to denote the corresponding (convex and closed) set of free states. We will then study three levels of resources: the free states $\mathbb{F}$, for example separable states in entanglement theory; the free channels $\mathbb{O}$, for example the closure of the set of local operations and classical communication (LOCC) or other chosen classes of operations in entanglement manipulation; and $\mathbb{S}$, which we will always take to be the set of all superchannels that preserve the chosen set $\mathbb{O}$, and which can be understood as the most general way to manipulate free channels while preserving their freeness.
We remark that although we focus here on the framework accounting for all channels as valid objects under study, it is sometimes useful to impose certain constraints on the considered channels (both free and non-free), restricting the attention to CPTP maps possessing a certain structure. Such an approach can be used to investigate settings such as Bell nonlocality [28][29][30][31][32][33], post-quantum correlations [34][35][36][37], and quantum contextuality [38][39][40][41]. Our results can indeed be extended to those cases, which we discuss in more detail elsewhere [42].

Benchmarking channel transformations and resources
In its general formulation, the task of distillation can be understood as the transformation of a noisy resource channel $\mathcal{E} : A \to B$ into 'pure' or 'perfect' resources, which are represented by some target channel $\mathcal{T} : C \to D$. Our task then is to understand when one can achieve transformations of the type $\Theta(\mathcal{E}) = \mathcal{T}$, with $\Theta \in \mathbb{S}$ being a free transformation.
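However, in any practical setting, it is important to allow for the possibility of error in the transformation, reflecting the physical imperfections in the manipulation of channels. To account for this, we will employ the worst-case fidelity between two channels $\mathcal{E}, \mathcal{F} : A \to B$ [43,44]
$$F(\mathcal{E}, \mathcal{F}) := \min_{\rho \in D(R \otimes A)} F\big(\mathrm{id} \otimes \mathcal{E}(\rho),\, \mathrm{id} \otimes \mathcal{F}(\rho)\big)$$
$$\phantom{F(\mathcal{E}, \mathcal{F})} = \min_{\psi} F\big(\mathrm{id} \otimes \mathcal{E}(\psi),\, \mathrm{id} \otimes \mathcal{F}(\psi)\big),$$
where $F(\rho, \sigma) = \|\sqrt{\rho}\sqrt{\sigma}\|_1^2$ is the state fidelity, and in the second line the optimisation is constrained to pure input states $\psi$. We will thus aim to achieve a transformation such that $F(\Theta(\mathcal{E}), \mathcal{T}) \ge 1 - \varepsilon$ for some small error $\varepsilon$. This fidelity metric is very closely related to the channel diamond norm $\|\cdot\|_\diamond$ [45] as $1 - \sqrt{F(\mathcal{E}, \mathcal{F})} \le \frac{1}{2}\|\mathcal{E} - \mathcal{F}\|_\diamond \le \sqrt{1 - F(\mathcal{E}, \mathcal{F})}$ [43,46], meaning that our results can be straightforwardly adapted also to cases when diamond norm error is considered.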
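The state-level counterpart of the fidelity and of its relation to the trace distance can be checked numerically. The sketch below is only an illustration at the level of states (the worst-case channel fidelity additionally requires an optimisation over inputs, which is not attempted here); it assumes the squared convention $F(\rho,\sigma) = \|\sqrt{\rho}\sqrt{\sigma}\|_1^2$, and the function names are hypothetical.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a positive semidefinite matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # remove tiny negative eigenvalues from rounding
    return (v * np.sqrt(w)) @ v.conj().T

def fidelity(rho, sigma):
    """State fidelity F(rho, sigma) = || sqrt(rho) sqrt(sigma) ||_1^2."""
    singular_values = np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma), compute_uv=False)
    return float(np.sum(singular_values) ** 2)

def trace_distance(rho, sigma):
    """Trace distance (1/2) || rho - sigma ||_1."""
    return 0.5 * float(np.sum(np.abs(np.linalg.eigvalsh(rho - sigma))))

rho = np.diag([1.0, 0.0])      # pure state |0><0|
sigma = np.diag([0.75, 0.25])  # noisy version of it
F = fidelity(rho, sigma)       # equals <0|sigma|0> = 0.75 for a pure rho
T = trace_distance(rho, sigma)
# State version of the two-sided relation quoted above for channels.
holds = (1 - np.sqrt(F) <= T + 1e-12) and (T <= np.sqrt(1 - F) + 1e-12)
```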
In order to quantify the resources of general quantum channels, we will employ two different resource measures which were first defined in resource theories of states. These quantities are also directly related to more general geometric concepts based on Hilbert's projective metric [47,48], and specifically the Funk metric [49]. Let us begin by recalling their definitions in the state-based setting. First, the (generalised) robustness [50,51] is given by
$$R_{\mathbb{F}}(\rho) := \min \{ \lambda : \rho \le \lambda \sigma,\ \sigma \in \mathbb{F} \}, \qquad (2)$$
where the inequality is with respect to the cone of positive semidefinite operators. This can be understood as the least amount $\lambda$ such that, when mixed with another state $\omega$, the mixture $\rho + (\lambda - 1)\omega = \lambda\sigma$ has $\sigma \in \mathbb{F}$. The measure previously found a number of applications in different operational tasks [51-61]. In order to avoid pathological cases, throughout this work we assume that the robustness $R_{\mathbb{F}}(\rho)$ is finite (that is, the minimum in Eq. (2) exists); this can be guaranteed for any state by requiring that the set of free states $\mathbb{F}$ contains at least one state of full rank, which is a natural requirement in physical theories of interest.
Note that the optimisation in the definition of $F(\mathcal{E}, \mathcal{F})$ can be understood as being over arbitrary ancillary spaces $R$, but in fact it suffices to take $R \cong A$ [43,44]. We also note that our notational conventions are slightly different from ones typically used in the literature; in particular, the robustness is often defined as $R_{\mathbb{F}}(\rho) - 1$ in our notation, while the resource weight is often defined as $1 - W_{\mathbb{F}}(\rho)$.
A closely related quantity is the resource weight:
$$W_{\mathbb{F}}(\rho) := \max \{ \lambda : \rho \ge \lambda \sigma,\ \sigma \in \mathbb{F} \}.$$
The resource weight generalises a measure of entanglement known as the best separable approximation [62], and was recently shown to be a meaningful quantifier of general resources [63-65]. The optimal $\lambda$ here can be understood as the largest weight that a free state $\sigma$ can take in a convex decomposition $\rho = \lambda\sigma + (1 - \lambda)\omega$ for an arbitrary state $\omega$. This monotone is a peculiar quantity: although it gives a faithful and strongly monotonic measure in any convex resource theory [66], it exhibits unusual behaviour. Specifically, if there exists a free state $\sigma$ supported on $\mathrm{supp}(\rho)$, we necessarily have $W_{\mathbb{F}}(\rho) > 0$; conversely, if a given $\rho$ does not have any free states in its support, the weight achieves the minimum value $W_{\mathbb{F}}(\rho) = 0$. In particular, any resourceful pure state $\phi \notin \mathbb{F}$ has $W_{\mathbb{F}}(\phi) = 0$.
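The measures can be extended to quantum channels [4,21,57,63,67] by considering the corresponding Choi matrices:
$$R_{\mathbb{O}}(\mathcal{E}) := \min \{ \lambda : J_{\mathcal{E}} \le \lambda J_{\mathcal{M}},\ \mathcal{M} \in \mathbb{O} \}, \qquad W_{\mathbb{O}}(\mathcal{E}) := \max \{ \lambda : J_{\mathcal{E}} \ge \lambda J_{\mathcal{M}},\ \mathcal{M} \in \mathbb{O} \}.$$
This can be equivalently understood as the optimisation over free channels $\mathcal{M}$ such that $\lambda\mathcal{M} - \mathcal{E}$ (in the case of the robustness) or $\mathcal{E} - \lambda\mathcal{M}$ (for the resource weight) is a completely positive map. Both of the quantities are valid resource monotones under any free superchannel $\Theta \in \mathbb{S}$: the robustness satisfies $R_{\mathbb{O}}(\Theta(\mathcal{E})) \le R_{\mathbb{O}}(\mathcal{E})$, with the resource weight obeying a reverse monotonicity, $W_{\mathbb{O}}(\Theta(\mathcal{E})) \ge W_{\mathbb{O}}(\mathcal{E})$. Other useful properties of the measures, such as their submultiplicativity, can also be established (see Supplementary Notes 1.3 and 3). The quantities have a simple structure which allows for an efficient computation as a semidefinite program (SDP) whenever the set of channels $\mathbb{O}$ can be characterised using semidefinite constraints, which we will see to be the case in many relevant settings.
We will use one additional resource monotone in order to quantify how close a target channel $\mathcal{T}$ is to a free channel, and thus characterise how difficult it is to purify noisy channels into the target $\mathcal{T}$. We thus define the fidelity-based overlap measure
$$F_{\mathbb{O}}(\mathcal{T}) := \max_{\mathcal{M} \in \mathbb{O}} F(\mathcal{M}, \mathcal{T}).$$
In state-based theories, one analogously has
$$F_{\mathbb{F}}(\phi) := \max_{\sigma \in \mathbb{F}} F(\sigma, \phi).$$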

Properties of the robustness and resource weight
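We will use the fact that both the robustness and weight measures can be expressed in terms of the max-relative entropy $D_{\max}(\rho\|\sigma) := \log \inf \{ \lambda : \rho \le \lambda\sigma \}$ [68], whose generalisation to channels obeys some useful properties [67]. In particular, defining $R_{\max}(\rho\|\sigma) := 2^{D_{\max}(\rho\|\sigma)}$, for any channels $\mathcal{E}, \mathcal{F} : A \to B$ one can define the optimised channel divergence [67]
$$R_{\max}(\mathcal{E}\|\mathcal{F}) := \max_{\rho} R_{\max}\big(\mathrm{id} \otimes \mathcal{E}(\psi_\rho) \,\big\|\, \mathrm{id} \otimes \mathcal{F}(\psi_\rho)\big),$$
where $\psi_\rho \in D(R \otimes A)$ and the maximisation can be regarded as being over the convex and compact set of density operators $\rho$ on $A$, with $\psi_\rho$ thought of as a purification of a given $\rho$. Crucially, it holds that [67, Lemma 12]
$$R_{\max}(\mathcal{E}\|\mathcal{F}) = R_{\max}(J_{\mathcal{E}}\|J_{\mathcal{F}}),$$
that is, it suffices to consider the Choi matrices of the channels to evaluate the max-relative entropy. We can then write the robustness measure as
$$R_{\mathbb{O}}(\mathcal{E}) = \min_{\mathcal{M} \in \mathbb{O}} \max_{\rho} R_{\max}\big(\mathrm{id} \otimes \mathcal{E}(\psi_\rho) \,\big\|\, \mathrm{id} \otimes \mathcal{M}(\psi_\rho)\big).$$
In a very similar way, we notice that
$$W_{\mathbb{O}}(\mathcal{E}) = \max_{\mathcal{M} \in \mathbb{O}} \min_{\rho} R_{\max}\big(\mathrm{id} \otimes \mathcal{M}(\psi_\rho) \,\big\|\, \mathrm{id} \otimes \mathcal{E}(\psi_\rho)\big)^{-1}.$$
We will use the fact that the minimisation and maximisation problems in the above can be interchanged, which can be shown through the application of Sion's minimax theorem [69] similarly to how it was done in Refs. [5,27].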
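For intuition, the max-relative entropy with respect to a single fixed full-rank reference operator has a closed form: the smallest $\lambda$ with $\rho \le \lambda\sigma$ is the largest eigenvalue of $\sigma^{-1/2}\rho\,\sigma^{-1/2}$. The sketch below evaluates only this fixed-reference quantity; the full robustness and weight additionally optimise over the free set, which in general requires an SDP solver and is not shown. Function names are hypothetical.

```python
import numpy as np

def r_max(rho, sigma):
    """inf{lam : rho <= lam * sigma}, assuming sigma is full rank."""
    w, v = np.linalg.eigh(sigma)
    inv_sqrt = (v / np.sqrt(w)) @ v.conj().T  # sigma^{-1/2}
    return float(np.max(np.linalg.eigvalsh(inv_sqrt @ rho @ inv_sqrt)))

def d_max(rho, sigma):
    """Max-relative entropy in bits, D_max = log2 of r_max."""
    return float(np.log2(r_max(rho, sigma)))

plus = 0.5 * np.ones((2, 2))   # |+><+|
mixed = 0.5 * np.eye(2)        # maximally mixed qubit state
noisy = np.diag([0.9, 0.1])    # full-rank state

robustness_like = r_max(plus, mixed)      # smallest lam with plus <= lam * mixed
weight_like = 1.0 / r_max(mixed, noisy)   # largest lam with noisy >= lam * mixed
```

Here `robustness_like` evaluates to 2 (so `d_max(plus, mixed)` is 1 bit), and `weight_like` evaluates to 0.2, which is the dimension times the smallest eigenvalue of `noisy`.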

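Lemma 4. We have that
$$R_{\mathbb{O}}(\mathcal{E}) = \max_{\rho} \min_{\mathcal{M} \in \mathbb{O}} R_{\max}\big(\mathrm{id} \otimes \mathcal{E}(\psi_\rho) \,\big\|\, \mathrm{id} \otimes \mathcal{M}(\psi_\rho)\big),$$
$$W_{\mathbb{O}}(\mathcal{E}) = \min_{\rho} \max_{\mathcal{M} \in \mathbb{O}} R_{\max}\big(\mathrm{id} \otimes \mathcal{M}(\psi_\rho) \,\big\|\, \mathrm{id} \otimes \mathcal{E}(\psi_\rho)\big)^{-1},$$
$$F_{\mathbb{O}}(\mathcal{T}) = \min_{\rho} \max_{\mathcal{M} \in \mathbb{O}} F\big(\mathrm{id} \otimes \mathcal{M}(\psi_\rho),\, \mathrm{id} \otimes \mathcal{T}(\psi_\rho)\big).$$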
Proof. The minimax optimisation can be regarded as being over two convex, compact sets: the given set of free channels $\mathbb{O}$ and the set of density matrices $\rho$ on $A$, with the state $\psi_\rho$ taken to be a purification of $\rho$. To apply Sion's minimax theorem to $R_{\mathbb{O}}$, we then need that the objective function is quasi-concave in $\rho$ and quasi-convex in $\mathcal{M}$ [69]. Since $R_{\max}$ is the composition of $D_{\max}$ with the non-decreasing function $x \mapsto 2^x$, quasi-convexity in $\mathcal{M}$ follows from the quasi-convexity of $D_{\max}$ [68], and quasi-concavity in $\rho$ follows from the concavity of $D_{\max}$ (see [27, Prop. 13] and [5, Thm. 2]). The case of $W_{\mathbb{O}}$ follows similarly. The fidelity $F(\cdot, \cdot)$ is known to be concave in $\mathcal{M}$ [43] and can be shown to be convex in $\rho$ following Refs. [5,27].
We proceed to establish other useful properties of the two measures.

Lemma 5. The robustness and weight measures can be expressed in their dual forms as
Proof. Follows from standard convex duality results (see e.g. [57,70]) and Lem. 4.

Lemma 6.
For any superchannel Θ ∈ S, it holds that The result below applies to state-based resource theories, and shows an exact relation between the channel-based and state-based measures.

Lemma 7.
For any replacement channel R : → defined as R (·) = Tr(·) with some fixed , it holds that If the class of operations O contains all replacement channels R : → with ∈ F, then equality holds in all of the above.
Proof. Taking any feasible dual solution for F ( ) in Eq. (12) and any ∈ F, we can see that the operator ⊗ is feasible for O since ⊗ , M = , M ( ) ≤ 1 for any M ∈ O using the Choi-Jamiołkowski isomorphism. This immediately gives that Now, assume that R ∈ O ∀ ∈ F. From the definition of the robustness, we know that there exists a ∈ F such that The case of O proceeds analogously. For the fidelity, consider that Taking the ansatz ⊗ for the input state, where ∈ D( ) is an arbitrary state and ∈ F, we get where the second line follows from the data processing inequality for the fidelity, and the third line since M ( ) ∈ F for any ∈ F. For the converse inequality, assuming that any R is in O gives

Supplementary Note 2: No-go theorems for quantum channel distillation
The results of this section establish Theorem 1 of the main text, explicitly divided into two cases depending on whether the target channel is a unitary or replacement channel.

Unitary channel distillation
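Within the context of channel-based resource theories, it is natural to regard some unitary channel $\mathcal{U}(\cdot) = U \cdot U^\dagger$ as the target of distillation protocols. Notice that in this case the measure $F_{\mathbb{O}}$ simplifies to
$$F_{\mathbb{O}}(\mathcal{U}) = \max_{\mathcal{M} \in \mathbb{O}} \min_{\psi} \big\langle \mathrm{id} \otimes \mathcal{U}(\psi),\, \mathrm{id} \otimes \mathcal{M}(\psi) \big\rangle$$
since $\mathrm{id} \otimes \mathcal{U}(\psi)$ is a pure state. We also recall that, in cases where this quantity might not be easy to compute, one can instead choose to employ the more straightforward fidelity measure which uses the corresponding (normalised) Choi matrices:
$$\tilde{F}_{\mathbb{O}}(\mathcal{U}) := \max_{\mathcal{M} \in \mathbb{O}} \big\langle \tilde{J}_{\mathcal{U}}, \tilde{J}_{\mathcal{M}} \big\rangle.$$
Indeed, this quantity is the figure of merit in many communication scenarios [71-73], and can be alternatively expressed as the fidelity averaged over all input states [44,74]. As our first main result, we then establish a general bound on the error necessarily incurred in any transformation of a channel under free superchannels.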

Theorem 1(a).
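If there exists a free superchannel $\Theta \in \mathbb{S}$ such that $F(\Theta(\mathcal{E}), \mathcal{U}) \ge 1 - \varepsilon$ for some resourceful unitary channel $\mathcal{U}$, then
$$\varepsilon \ge 1 - R_{\mathbb{O}}(\mathcal{E})\, F_{\mathbb{O}}(\mathcal{U})$$
and
$$\varepsilon \ge \big[1 - F_{\mathbb{O}}(\mathcal{U})\big]\, W_{\mathbb{O}}(\mathcal{E}). \qquad (25)$$

Notice from the weight-based bound in Eq. (25) that, when $\varepsilon = 0$, the transformation $\mathcal{E} \to \mathcal{U}$ is impossible for any channel with $W_{\mathbb{O}}(\mathcal{E}) > 0$. Recalling that $W_{\mathbb{O}}(\mathcal{E})$ is 0 if and only if $\mathcal{E}$ has no free Choi matrices $J_{\mathcal{M}}$ with $\mathcal{M} \in \mathbb{O}$ in its support, we conclude that zero-error distillation is impossible whenever the given channel has any free channels in $\mathrm{supp}(J_{\mathcal{E}})$. This recovers a no-go result of Ref. [75] and extends insights from quantum state distillation [59,75].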
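To make the interplay of the two bounds tangible, the following sketch evaluates them for given numerical values of the monotones. It assumes the bounds take the form $\varepsilon \ge 1 - R_{\mathbb{O}}(\mathcal{E}) F_{\mathbb{O}}(\mathcal{U})$ and $\varepsilon \ge [1 - F_{\mathbb{O}}(\mathcal{U})] W_{\mathbb{O}}(\mathcal{E})$; the function name and the example values are hypothetical, and computing the monotones themselves is outside the scope of the sketch.

```python
def error_lower_bound(robustness, weight, target_overlap):
    """Best lower bound on the distillation error from the two bounds.

    robustness:     R_O(E) >= 1
    weight:         W_O(E) in [0, 1]
    target_overlap: F_O(U) in (0, 1]
    """
    robustness_bound = 1.0 - robustness * target_overlap
    weight_bound = (1.0 - target_overlap) * weight
    # An error is never negative, so trivial bounds are clipped at zero.
    return max(0.0, robustness_bound, weight_bound)

# Illustrative numbers only: a weakly resourceful input and a target with overlap 1/2.
eps_min = error_lower_bound(robustness=1.5, weight=0.2, target_overlap=0.5)
# A highly robust input with zero weight makes both bounds trivial.
eps_trivial = error_lower_bound(robustness=4.0, weight=0.0, target_overlap=0.5)
```

In the first example the robustness bound (0.25) dominates the weight bound (0.1); in the second neither bound excludes error-free distillation.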
For clarity, we will divide the proof of Theorem 1(a) into two parts and consider each bound separately. We begin with the robustness $R_{\mathbb{O}}$.
Proof. Recall from Lem. 5 that for any channel we have where Now, for the given target channel U : → , let ★ ∈ D( ) with denote a state such that Notice now that 1 O ( U) id ⊗ U ( ★ ) is a feasible witness to the dual formulation of the robustness by definition of ★ . Using the monotonicity of the robustness under free superchannels, we then get where in the last inequality we used the fact that where the first line is by assumption, second by definition of (N , U), and third since id ⊗ U ( ★ ) is rank one.

Proposition 9.
If there exists a free supermap Θ ∈ S such that Θ(E) = N for some channel N with (N , U) ≥ 1 − , then Proof. Using Lem. 5 we have once again that where In a way similar to the proof of Prop. 8, we let ★ ∈ D( ) be a state achieving the minimum for O (U). We then notice that which means that and using Eq. (32) we conclude that which is precisely the statement of the Proposition.

State-based resources
Although the idea of distilling noisy resources into pure ones makes sense in many physical settings, some resource theories are instead concerned with extracting state-based resources. Here, we will assume that there is an underlying set of free states $\mathbb{F}$, and the operations $\mathbb{O}$ are free operations in this theory. In such cases, the target channel in distillation can be chosen as the replacement channel $\mathcal{R}_\phi$, which substitutes any input with a target state: $\mathcal{R}_\phi(\cdot) = \mathrm{Tr}(\cdot)\,\phi$. A special case of such channels are preparation channels $\mathcal{P}_\phi$, which have trivial input and simply prepare a single copy of a chosen resourceful pure state $\phi$. To characterise the resourcefulness of the target channel, we consider the overlap
$$F_{\mathbb{F}}(\phi) = \max_{\sigma \in \mathbb{F}} \langle \phi, \sigma \rangle.$$
We then obtain an analogous bound for all transformations into replacement channels.

Theorem 1(b).
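If there exists a free superchannel $\Theta \in \mathbb{S}$ such that $F(\Theta(\mathcal{E}), \mathcal{R}_\phi) \ge 1 - \varepsilon$ for some resourceful pure state $\phi$, then
$$\varepsilon \ge 1 - R_{\mathbb{O}}(\mathcal{E})\, F_{\mathbb{F}}(\phi)$$
and
$$\varepsilon \ge \big[1 - F_{\mathbb{F}}(\phi)\big]\, W_{\mathbb{O}}(\mathcal{E}). \qquad (41)$$

Once again, the weight bound (41) gives a no-go result: no resourceful pure state replacement channel can be distilled with $\varepsilon = 0$ from a channel $\mathcal{E}$ such that $W_{\mathbb{O}}(\mathcal{E}) > 0$, that is, such that $\mathrm{supp}(J_{\mathcal{E}})$ contains any free channels. As a special case, the results apply also to the manipulation of states themselves, that is, transformations of resourceful quantum states under free transformations in the form of channels $\mathbb{O}$. Thus, for any free channel $\mathcal{M} \in \mathbb{O}$ such that $F(\mathcal{M}(\rho), \phi) \ge 1 - \varepsilon$, we obtain:
$$\varepsilon \ge 1 - R_{\mathbb{F}}(\rho)\, F_{\mathbb{F}}(\phi), \qquad (42)$$
$$\varepsilon \ge \big[1 - F_{\mathbb{F}}(\phi)\big]\, W_{\mathbb{F}}(\rho). \qquad (43)$$
Here we note that an analogous robustness bound for states (42) previously appeared in Ref. [59].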
Another approach to no-go results in the distillation of resources from quantum states was studied in Ref. [75]. Our new weight-based bound (43) strictly improves on that result. Let us explicitly compare our result with the bound of Ref. [75], which applies only to full-rank input states $\rho$, and is given by $\varepsilon \ge [1 - F_{\mathbb{F}}(\phi)]\, \lambda_{\min}(\rho)$, where $\lambda_{\min}$ denotes the smallest eigenvalue of $\rho$. First, our bounds require no assumption about the rank of the input state $\rho$, thus extending the applicability of the fundamental restrictions on quantum resource distillation. More importantly, our approach replaces the dependence on the eigenvalues of the input state with a bound which explicitly takes into consideration the resources contained in $\rho$, which provides more accurate restrictions. Further, for a full-rank state $\rho$ one can notice that $\lambda_{\min}(\rho)$ can be written as $\min_{\omega \in D} \max \{ \lambda : \rho \ge \lambda\omega \}$. From this it follows that $W_{\mathbb{F}}(\rho) \ge \lambda_{\min}(\rho)$ in any resource theory (with the inequality typically strict), and so the weight-based bound in Eq. (43) is tighter than the result of Ref. [75]. Indeed, in Supplementary Note 4 we will see this improvement to be significant.
As before, we split the proof for clarity.

Proposition 11. Consider the replacement channel R :
→ . If there exists a free supermap Θ such that When the input is a preparation channel P : C → and the target is the preparation channel P : C → , the problem reduces to manipulating quantum states, and we have Proof. Noticing that, for a fixed E, the function O (E) that we considered in Prop. 9 is concave in , we can relax the optimisation to write since the minimum will be achieved on a pure state anyway. Choosing ★ = ⊗ for arbitrary ∈ D( ) and ∈ F, we use the reverse monotonicity of O to obtain where we used that M ( ) ∈ F. Notice now that 1 − ≥ 0 and 1 − , we then have which concludes the proof.
The above reduces to the case of quantum states when the input and target are preparation channels, since this constrains any output of the transformation to also be a preparation channel.

Proposition 12.
If there exists a free supermap Θ such that Θ(E) = N for some channel N : If E = P and the system is trivial, we have that Proof. Analogously as in Prop. 11, we choose ★ = ⊗ for some ∈ F to get where in the first line we used the monotonicity of O , in the second line we used that O is convex in so we can optimise over mixed states, and in the fourth line we used that ≥ 0 and , ≤ F ( ) ∀ ∈ F which means that F ( ) is a valid feasible dual solution for the robustness F ( ).

Bounds for many-copy manipulation
Recall that the most general physically realisable manipulation protocols involving multiple quantum channels are dubbed quantum processes [76-81]. We will be interested in processes which transform multiple uses of a quantum channel into one output map; we thus understand a quantum process $\Upsilon$ as any transformation such that $\Upsilon(\mathcal{N}_1, \ldots, \mathcal{N}_n) \in \mathrm{CPTP}$ for any $\mathcal{N}_1, \ldots, \mathcal{N}_n \in \mathrm{CPTP}$, and we take the set of free quantum processes as
$$\mathbb{S}^{(n)} := \big\{ \Upsilon : \Upsilon(\mathcal{M}_1, \ldots, \mathcal{M}_n) \in \mathbb{O} \ \ \forall\, \mathcal{M}_1, \ldots, \mathcal{M}_n \in \mathbb{O} \big\}. \qquad (53)$$
Depending on the given resource theory, different ways to manipulate multiple channels might be of interest. For instance, when the theory is closed under tensor product, i.e. $\mathcal{M}, \mathcal{M}' \in \mathbb{O} \Rightarrow \mathcal{M} \otimes \mathcal{M}' \in \mathbb{O}$, then any free protocol which manipulates $n$ copies of a channel in parallel as $\Theta(\mathcal{E}^{\otimes n})$ is a free quantum process. Similarly, when the theory is closed under composition, i.e. $\mathcal{M}, \mathcal{M}' \in \mathbb{O} \Rightarrow \mathcal{M} \circ \mathcal{M}' \in \mathbb{O}$, then any sequential protocol of the form $\Theta_n(\mathcal{E}) \circ \cdots \circ \Theta_1(\mathcal{E})$ belongs to the set $\mathbb{S}^{(n)}$. However, a general channel theory need not be closed under tensor product or composition; for instance, the tensor product of operations which preserve separability in entanglement theory is not always separability preserving itself [82]. Therefore, to take into consideration the most general way of manipulating quantum channels allowed by the constraints of the given resource theory, we employ the formalism of free quantum processes $\mathbb{S}^{(n)}$. By considering such transformations, we can establish fundamental bounds on the performance of any adaptive, multi-copy protocol for manipulating channels or states.

Theorem 2.
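Given any distillation protocol $\Upsilon \in \mathbb{S}^{(n)}$ (parallel, sequential, or adaptive, with or without a definite causal order) which transforms $n$ uses of a channel $\mathcal{E}$ to some target unitary $\mathcal{U}$ up to accuracy $\varepsilon > 0$, it necessarily holds that
$$n \ge \log_{1/W_{\mathbb{O}}(\mathcal{E})} \frac{1 - F_{\mathbb{O}}(\mathcal{U})}{\varepsilon}, \qquad (54)$$
$$n \ge \log_{R_{\mathbb{O}}(\mathcal{E})} \frac{1 - \varepsilon}{F_{\mathbb{O}}(\mathcal{U})}. \qquad (55)$$
Analogously, when the target channel is a replacement channel $\mathcal{R}_\phi$ which prepares a pure state $\phi$, we have
$$n \ge \log_{1/W_{\mathbb{O}}(\mathcal{E})} \frac{1 - F_{\mathbb{F}}(\phi)}{\varepsilon}, \qquad n \ge \log_{R_{\mathbb{O}}(\mathcal{E})} \frac{1 - \varepsilon}{F_{\mathbb{F}}(\phi)}.$$
The result applies also to the case of state manipulation, where we obtain that the number of copies of a state $\rho$ needed to perform the distillation $\mathcal{M}(\rho^{\otimes n}) \to \phi$ up to error $\varepsilon$ must obey
$$n \ge \log_{1/W_{\mathbb{F}}(\rho)} \frac{1 - F_{\mathbb{F}}(\phi)}{\varepsilon}, \qquad n \ge \log_{R_{\mathbb{F}}(\rho)} \frac{1 - \varepsilon}{F_{\mathbb{F}}(\phi)}.$$
This, again, improves on the bound obtained in Ref. [75] for resource theories of states.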
The two bounds exhibit very different properties. Intuitively, we see that the weight-based bound (54) will perform better for small $\varepsilon$, establishing in particular that $n$ must scale as $\log(1/\varepsilon)$ as $\varepsilon \to 0$ for distillation to be possible. On the other hand, the robustness-based bound (55) increases in performance with decreasing $F_{\mathbb{O}}(\mathcal{U})$, i.e. with increasing resourcefulness of $\mathcal{U}$. One can use both of these insights to one's advantage when aiming to obtain more accurate bounds. For instance, a straightforward way to decrease $F_{\mathbb{O}}(\mathcal{U})$ is, instead of considering $\mathcal{U}$ as a target channel, to consider several copies of it. In practice, such an approach can be employed in block distillation protocols, which could provide a more efficient way of purifying the given resource. As long as one can bound or compute the quantity $F_{\mathbb{O}}(\mathcal{U}^{\otimes m})$, which we will shortly see to be possible in relevant cases, this leads to an immediate improvement in the robustness bound (cf. [60]).
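The different behaviour of the two overhead bounds can be seen in a short numerical sketch. It assumes the bounds have the form $n \ge \log_{1/W}[(1-F)/\varepsilon]$ and $n \ge \log_{R}[(1-\varepsilon)/F]$, with $R = R_{\mathbb{O}}(\mathcal{E})$, $W = W_{\mathbb{O}}(\mathcal{E})$ and $F = F_{\mathbb{O}}(\mathcal{U})$; the function name and the example values are hypothetical.

```python
import math

def copies_lower_bound(eps, robustness, weight, target_overlap):
    """Largest of the two lower bounds on the number of channel uses n.

    eps: target error (0 < eps < 1); robustness: R >= 1;
    weight: W in [0, 1]; target_overlap: F in (0, 1).
    Returns math.inf when the bounds exclude the transformation for every n.
    """
    bounds = [0.0]
    # Weight-based bound, non-trivial only if W > 0 and eps < 1 - F.
    if weight > 0.0 and eps < 1.0 - target_overlap:
        if weight >= 1.0:
            return math.inf  # W = 1: the error can never drop below 1 - F
        bounds.append(math.log((1.0 - target_overlap) / eps) / math.log(1.0 / weight))
    # Robustness-based bound, non-trivial only if 1 - eps > F.
    if 1.0 - eps > target_overlap:
        if robustness <= 1.0:
            return math.inf  # R = 1: the fidelity can never exceed F
        bounds.append(math.log((1.0 - eps) / target_overlap) / math.log(robustness))
    return max(bounds)

# Illustrative values: the weight bound dominates at small error.
n_real = copies_lower_bound(eps=1e-3, robustness=1.2, weight=0.5, target_overlap=0.5)
n_min = math.ceil(n_real)
# Tightening the error increases the weight-based requirement logarithmically.
n_tighter = copies_lower_bound(eps=1e-6, robustness=1.2, weight=0.5, target_overlap=0.5)
```

With these numbers the weight-based term is about 8.97 and the robustness-based term about 3.80, so at least 9 channel uses are required.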
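The result of Thm. 2 is a consequence of a general sub- or supermultiplicativity result for the monotones $R_{\mathbb{O}}$, $W_{\mathbb{O}}$.

Theorem 13. Consider a collection of $n$ channels $(\mathcal{E}_1, \ldots, \mathcal{E}_n)$. For any free protocol $\Upsilon \in \mathbb{S}^{(n)}$ it holds that
$$R_{\mathbb{O}}\big(\Upsilon(\mathcal{E}_1, \ldots, \mathcal{E}_n)\big) \le \prod_{i=1}^{n} R_{\mathbb{O}}(\mathcal{E}_i)$$
and
$$W_{\mathbb{O}}\big(\Upsilon(\mathcal{E}_1, \ldots, \mathcal{E}_n)\big) \ge \prod_{i=1}^{n} W_{\mathbb{O}}(\mathcal{E}_i).$$
In particular, for any free protocol $\Upsilon \in \mathbb{S}^{(n)}$ which takes $n$ uses of a quantum channel $\mathcal{E}$ to another quantum channel $\mathcal{E}'$, it holds that
$$n \ge \frac{\log R_{\mathbb{O}}(\mathcal{E}')}{\log R_{\mathbb{O}}(\mathcal{E})}, \qquad n \ge \frac{\log W_{\mathbb{O}}(\mathcal{E}')}{\log W_{\mathbb{O}}(\mathcal{E})},$$
where in the second inequality we take $\log 0 = -\infty$ and assume that $W_{\mathbb{O}}(\mathcal{E}')$ and $W_{\mathbb{O}}(\mathcal{E})$ are not both 0.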
Notice that this establishes bounds which go beyond distillation protocols, and imposes constraints on arbitrary manipulation of channels. In particular, the robustness-based bound gives a general restriction on the capabilities of channel dilution, i.e. transformations U → E and R → E , which can be understood as simulating the action of a channel E by employing the pure channel U or R .
An interesting difference between the bounds of Thm. 13 emerges in the case when $\mathcal{E}'$ is a unitary or pure replacement channel with $W_{\mathbb{O}}(\mathcal{E}') = 0$. Here, the weight-based bound shows that increasing the number of uses of a channel cannot allow perfect distillation when $W_{\mathbb{O}}(\mathcal{E}) \in (0, 1]$, strengthening the no-go result of Thm. 1(a). However, the bound based on $W_{\mathbb{O}}$ does not provide information on the distillation of channels with $W_{\mathbb{O}}(\mathcal{E}) = 0$ (notably, unitary-to-unitary transformations), while the robustness-based bound can also be applied in such cases. This complements the no-go results provided by $W_{\mathbb{O}}$ and can reveal errors even in transformations where the weight bound becomes trivial.
Proof of Thm. 13. We consider O first. For each E , let ∈ R + and M ∈ O be such that E ≥ M . Using the -linearity of the transformation Υ, we can expand . . .
By the positivity of Υ, each term on the right-hand side is positive semidefinite, and so for some M ∈ O due to the fact that Υ is a free quantum process. Choosing M as optimal channels such that Taking logarithm of both sides of the equation and recalling that O (E) ∈ [0, 1], we get If the resource theory is closed under

Asymptotic rates of distillation
To understand the ultimate limitations on transforming a given state or channel, one can study the maximal rate at which the conversion can be performed with an asymptotic number of channel uses, allowing for conversion error that vanishes asymptotically. Specifically, given two channels E, F , we define the maximal achievable rate of transformation under any adaptive protocol as Again, the transformations that we consider include both parallel and sequential protocols as relevant special cases, and thus provide an upper bound for both. Although (68) characterises the conversion rate with the perfect fidelity in asymptotic limit, it does not give insights into how robust the rate is against error. To characterise the maximum rate at which the asymptotic transformation is possible with some non-vanishing error, we define the strong converse rate as [83] † In other words, as soon as a rate exceeds the strong converse rate, the fidelity necessarily goes to 0. This places a threshold for the achievable distillation rates, even when non-zero error is allowed. Applying our result in Thm. 2 allows us to obtain a general bound on the rate of distillation protocols.

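Theorem 3(a).
If the target channel $\mathcal{U}$ satisfies $F_{\mathbb{O}}(\mathcal{U}^{\otimes n}) = F_{\mathbb{O}}(\mathcal{U})^n$ for all $n$, then we have a strong converse bound as
$$r^\dagger(\mathcal{E} \to \mathcal{U}) \le \frac{\log R_{\mathbb{O}}(\mathcal{E})}{\log F_{\mathbb{O}}(\mathcal{U})^{-1}}.$$
Alternatively, if the target is a replacement channel $\mathcal{R}_\phi$ such that $F_{\mathbb{F}}(\phi^{\otimes n}) = F_{\mathbb{F}}(\phi)^n$, then
$$r^\dagger(\mathcal{E} \to \mathcal{R}_\phi) \le \frac{\log R_{\mathbb{O}}(\mathcal{E})}{\log F_{\mathbb{F}}(\phi)^{-1}}.$$
In the above, $F_{\mathbb{O}}(\mathcal{U})$ (or $F_{\mathbb{F}}(\phi)$) can be replaced with any other multiplicative quantity that upper bounds it, and the results hold analogously.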
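A strong converse bound of this type is a simple ratio of two logarithms once the monotones are known. The sketch below assumes the bound reads $r^\dagger \le \log R_{\mathbb{O}}(\mathcal{E}) / \log F_{\mathbb{O}}(\mathcal{U})^{-1}$ with a multiplicative target overlap; the function name and numerical values are hypothetical and chosen only for illustration.

```python
import math

def strong_converse_rate_bound(robustness, target_overlap):
    """Upper bound log R / log(1/F) on the strong converse distillation rate.

    robustness:     R_O(E) >= 1 of the input channel
    target_overlap: multiplicative overlap F in (0, 1) of a single target copy
    """
    return math.log2(robustness) / math.log2(1.0 / target_overlap)

# Illustrative values: one target copy "costs" log(1/F) = 1 bit of log-robustness.
rate_a = strong_converse_rate_bound(robustness=2.0, target_overlap=0.5)
rate_b = strong_converse_rate_bound(robustness=4.0, target_overlap=0.5)
```

Read this way, the logarithm of the input robustness acts as a budget and $\log F^{-1}$ as the price of each target copy: doubling the log-robustness doubles the bound on the rate.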
Proof. Let be any achievable rate at error threshold ∈ [0, 1), that is, assume that there exists a sequence {Υ } of free quantum processes such that 1 − Υ (E × ), U ⊗ with lim inf →∞ = < 1. From Thm. 2, for each we have that where we used the multiplicativity of O (U). Dividing by and taking lim sup →∞ in both sides gives the claimed result.
When the target is a replacement channel, we can follow the proof of Lem. 7 to get and the rest of the proof proceeds analogously.
In some cases, it is reasonable to restrict our attention to parallel protocols (illustrated in Fig. 2(a) of the main text), which are indeed how many communication and channel manipulation schemes are often considered [71,72,84]. To separately characterise this scenario, we define the rate of transformation with parallel protocols as and its strong converse rate † par (E → F ) analogously. Then, we can get better strong converse bounds for parallel protocols by suitably 'smoothing' the definition of the robustness over all channels within a small distance of the original input E [4,5,7,21]. We thus define the regularised log-robustness (max-relative entropy) , and use it as follows.

Theorem 3(b). If the target channel
Alternatively, if the target is a preparation channel P such that F ( ⊗ ) = F ( ) , then we have a strong converse bound as In the above, O (U) (or F ( )) can be replaced with any other multiplicative quantity , and the results hold analogously.
For both of the results of this section to be valid, we require the multiplicativity of the channel fidelity $F_{\mathbb{O}}$ of the target unitary $\mathcal{U}$ or state $\phi$. In practice, the bound can be relaxed by using the Choi fidelity $\tilde{F}_{\mathbb{O}}(\mathcal{U}) \ge F_{\mathbb{O}}(\mathcal{U})$, which is often easier to show to be multiplicative. Although the multiplicativity condition might not hold in full generality, the majority of relevant resource theories do indeed satisfy it. This includes (cf. Table I):
• The theory of magic for multi-qubit quantum channels. Here, the fidelity $F_{\mathbb{O}}(\mathcal{R}_\phi)$ of any replacement channel $\mathcal{R}_\phi$ reduces to $F_{\mathbb{F}}(\phi)$, which is known to be multiplicative as long as $\phi$ is a state of up to three qubits [85]. We will furthermore show in Supplementary Note 4.2 the multiplicativity of the channel fidelity $F_{\mathbb{O}}(\mathcal{U})$ of any 1-, 2-, or 3-qubit diagonal unitary channel from the third level of the Clifford hierarchy, which are common choices of target channels in practical settings.
• The theory of magic for multi-qudit quantum channels, where a multiplicative bound for the fidelity of any target replacement channel $\mathcal{R}_\phi$ is given by the min-thauma of magic [27,86]. The fidelity $F_{\mathbb{F}}$ itself can also be shown to be multiplicative in some relevant cases.
• Other state-based theories such as entanglement, coherence, athermality, and purity, where the quantities $F_{\mathbb{F}}(\phi)$ are known to be multiplicative for any target pure state $\phi$.
Our results therefore give broadly applicable bounds for the asymptotic performance of general distillation protocols. In order to prove Thm. 3(b), we first show the following lemma.
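where in the third line we defined $\tilde{\psi}$ to be the minimiser of the second term in the second line and also defined $\tilde{\psi}_{\mathcal{U}} = \mathrm{id} \otimes \mathcal{U}(\tilde{\psi})$, and in the fourth line we used that $0 \le |\tilde{\psi}_{\mathcal{U}}\rangle\langle\tilde{\psi}_{\mathcal{U}}| \le \mathbb{1}$.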

Proof of Thm. 3(b).
Let be any achievable rate at error threshold ∈ [0, 1), that is, assume that there exists a sequence {Θ } of free superchannels such that 1 − Θ (E ⊗ ), U ⊗ with lim inf →∞ = < 1. Let with 0 < ≤ 1 be some constant and N be a channel such that log O (Θ (E ⊗ )) = log O (N ). Then, using Lem. 15, Since doing nothing is a valid free comb, N can be transformed to U ⊗ with fidelity 1 − − √ for free. Applying Thm. 2, we get where we used the assumption of the multiplicativity O U ⊗ = O (U) . Let { } be a subsequence such that lim →∞ = . Then, since 0 ≤ < 1, there exist some integer and positive real number such that 0 < + √ < 1 for any > . Using (80), we get for > and ≤ that where in the last inequality we used the monotonicity of max-relative entropy measure under free superchannels [8]. Noting that lim sup we take lim →0 lim sup →∞ in both sides of (81) to get showing that the quantity in the right hand side is a strong converse bound. The state case can be shown analogously.

Quantum communication
A central goal of quantum communication is to enable reliable transmission of quantum states to another party through a noisy channel. A way of accomplishing this task is to apply encoding (respectively, decoding) operations to input (output) states so that one can offset the effect of noise, and there has been an enormous amount of work to investigate the ultimate limit on how much information can be reliably sent [15,83,[87][88][89][90][91][92][93][94][95][96][97][98][99]. By understanding this problem as the purification of a noisy quantum channel to the identity channel by means of some allowed free channel transformation, we can directly apply the results of our work.
To encompass the most general communication setting, we consider assisted communication scenarios, where both parties may share some correlations (e.g. entanglement) or are able to perform some joint operations (e.g. communicate classically) in order to enhance their quantum communication capabilities [89,100]. The given type of assistance can be specified by the set of free superchannels in our framework, and our general results can be applied by considering O as the set of free channels that are preserved by the free superchannels. We will thus investigate the maximum size of quantum systems that can be reliably sent using free superchannels S A where A denotes the type of assisting operations. Of particular interest in the theory of quantum communication is the asymptotic capacity where many uses of the given channel are considered. We define the A-assisted adaptive quantum capacity A,adap as the rate at which the given channel can be converted to the qubit identity channel id 2 using the given choice of superchannels S = S A , that is, adap (E → id 2 ) in the notation of Supplementary Note 3.2. Another important quantity is the strong converse capacity, similarly defined as † A,adap (E) † (E → id 2 ). We note that our methods naturally apply also to the setting of generalised resource theories of communication [9], where transformations with an indefinite causal order are allowed.
We also consider a simpler scenario where multiple uses of the given channel are applied in parallel. We define the $A$-assisted quantum capacity under a parallel communication protocol as $Q_A(\mathcal{E}) := r_{\mathrm{par}}(\mathcal{E} \to \mathrm{id}_2)$ and similarly the strong converse capacity as $Q^{\dagger}_A(\mathcal{E}) := r^{\dagger}_{\mathrm{par}}(\mathcal{E} \to \mathrm{id}_2)$. Considering assisted capacities is insightful in that they serve as upper bounds for capacities with weaker assistance, including the unassisted quantum capacity $Q(\mathcal{E})$, whose single-letter formula is not known and which is thus hard to analyse in general [101]. Below, we apply our method to two assisted settings that are often considered in the literature.
for any state , , and any pair of states We denote the set of superchannels realised by no-signalling channels as $\mathbb{S}_{\mathrm{NS}}$ and call them no-signalling superchannels. Intuitively, such transformations do not create a side channel that allows for free communication. In this setting, the free channels are the channels which are useless for transmitting any information: this set is formed by the replacement channels $\mathbb{O}_R := \{ \mathcal{R}_\sigma : A \to B \mid \mathcal{R}_\sigma(\cdot) = \mathrm{Tr}(\cdot)\,\sigma,\ \sigma \in \mathbb{D}(B) \}$, which simply replace the input with a fixed output state. Indeed, any superchannel in $\mathbb{S}_{\mathrm{NS}}$ preserves the set of replacement channels. Motivated by this, Ref.  (53), which map multiple constant channels to a constant channel. A class of free many-copy transformations are the quantum feedback-assisted adaptive protocols, where the receiving party is allowed to send a part of their quantum system back to the sender, followed by the application of a no-signalling bipartite channel between the channel uses. This includes the quantum feedback-assisted communication with entanglement assistance discussed in Refs. [84,106].
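To evaluate $F_{\mathbb{O}}(\mathrm{id}_d)$, we will use the following lemma to relate it to the Choi fidelity $\widetilde{F}_{\mathbb{O}}(\mathrm{id}_d) := \max_{\mathcal{M} \in \mathbb{O}} \langle \widetilde{J}_{\mathrm{id}_d}, \widetilde{J}_{\mathcal{M}} \rangle$, which is easier to compute. This is conceptually similar to an argument we made in the proof of Prop. 8.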

Proof. We have that
where in the first line we chose M ★ ∈ O as an optimal channel for O in (9), in the second line we chose ★ as a state such that is a feasible solution for the dual form of max in (12), and the last line is by definition.
For the theory of no-signalling channels, a direct computation tells us that the Choi fidelity is $\widetilde{F}_{\mathbb{O}_R}(\mathrm{id}_d) = \frac{1}{d^2}$. Coupled with the fact that the robustness is $R_{\mathbb{O}_R}(\mathrm{id}_d) = d^2$ [8], we obtain from Lem. 16 that $F_{\mathbb{O}_R}(\mathrm{id}_d) = \frac{1}{d^2}$. This can alternatively be seen by a direct calculation: where the first line is due to Lem. 4, and in the second line we used that a bipartite pure state allows for a Schmidt decomposition | = −1 =0 | | where max max in the third line. This implies in particular that $F_{\mathbb{O}_R}(\mathrm{id}_2^{\otimes m}) = \frac{1}{2^{2m}}$.

First, it is insightful to see what the bounds of Thm. 1(a) tell us about one-shot transformations $\mathcal{E} \to \mathrm{id}_d$. Here, the maximal fidelity achievable under no-signalling codes can be computed with an SDP [102]. We will also use the fact that the robustness $R_{\mathbb{O}_R}$ has been computed analytically for some simple channels [7], and the weight $W_{\mathbb{O}_R}$ can also be computed in such cases.
For instance, for the depolarising channel $\mathcal{D}_p(\rho) = (1-p)\rho + p\,\frac{\mathbb{1}}{d}$ we get $W_{\mathbb{O}_R}(\mathcal{D}_p) = p$, and the robustness and weight-based bounds are actually equal: we have $\varepsilon \geq p\,(d^2-1)/d^2$. In fact, the bounds match the achievable fidelity, meaning that Thm. 1(a) quantifies the error in the one-shot transformation $\mathcal{D}_p \to \mathrm{id}_d$ under $\mathbb{S}_{\mathrm{NS}}$ exactly. On the other hand, channels such as the qubit dephasing channel $\mathcal{Z}_p(\rho) = (1-p)\rho + p\,Z\rho Z$ or the amplitude damping channel $\mathcal{N}_\gamma$ have no constant channels in their support, and thus the weight bound becomes useless with $W_{\mathbb{O}_R}(\mathcal{Z}_p) = W_{\mathbb{O}_R}(\mathcal{N}_\gamma) = 0$. However, the robustness bound gives $\varepsilon \geq \frac{1}{2} - \left| p - \frac{1}{2} \right|$ for the dephasing channel and $\varepsilon \geq \frac{1}{4}\left( 2 + \gamma - 2\sqrt{1-\gamma} \right)$ for the amplitude damping channel, and these bounds, again, exactly equal the fidelity achievable under no-signalling assistance. Thus, we see that the error bounds of Thm. 1(a) can be tight in relevant cases. We refer to Fig. 4 of the main text for an explicit comparison.
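As a numerical consistency check on the three closed-form expressions quoted above, the following sketch computes the infidelity between the normalised Choi state of each qubit channel and the maximally entangled state. It is not a derivation of the bounds of Thm. 1(a). The parameter names `p` and `gamma`, the Kraus representations, and the function names are assumptions introduced here for illustration. For the dephasing channel the agreement holds for `p <= 1/2`, where $\frac{1}{2} - |p - \frac{1}{2}| = p$.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)

def choi_infidelity(kraus):
    """1 - <Phi|(id x E)(Phi)|Phi> for the normalised maximally entangled state Phi.

    Uses <Phi|(1 x K)|Phi> = Tr(K)/d, so the overlap is sum_k |Tr K_k|^2 / d^2.
    """
    d = kraus[0].shape[0]
    return 1.0 - sum(abs(np.trace(K)) ** 2 for K in kraus) / d**2

def depolarising(p):
    # (1-p) rho + p 1/2, written with Pauli Kraus operators
    return [np.sqrt(1 - 3 * p / 4) * I2] + [np.sqrt(p / 4) * P for P in (X, Y, Z)]

def dephasing(p):
    # (1-p) rho + p Z rho Z
    return [np.sqrt(1 - p) * I2, np.sqrt(p) * Z]

def amplitude_damping(gamma):
    K0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
    K1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
    return [K0, K1]

p, gamma, d = 0.3, 0.4, 2
err_depol = choi_infidelity(depolarising(p))       # compare with p (d^2 - 1) / d^2
err_deph = choi_infidelity(dephasing(p))           # compare with 1/2 - |p - 1/2|
err_ad = choi_infidelity(amplitude_damping(gamma)) # compare with (2 + gamma - 2 sqrt(1 - gamma)) / 4
```

The trace formula is used instead of building the Choi matrix explicitly because only the overlap with the maximally entangled state is needed here.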
We now apply our result to get insights into asymptotic capacity. From Thm. 3(a), we immediately obtain a strong converse bound for adaptive capacity as For the capacity with parallel protocols, we can use the asymptotic equipartition property of Ref. [7] which showed that )) being the mutual information between and . Combining this result with Thm. 3(b), we obtain $Q^{\dagger}_{\mathrm{NS}}(\mathcal{E}) \leq \frac{1}{2} I(\mathcal{E})$. Together with the known achievable capacity $Q_{\mathrm{NS}}(\mathcal{E}) \geq \frac{1}{2} I(\mathcal{E})$ [89,102], this ensures the strong converse property of the no-signalling-assisted quantum capacity, $Q_{\mathrm{NS}}(\mathcal{E}) = Q^{\dagger}_{\mathrm{NS}}(\mathcal{E}) = \frac{1}{2} I(\mathcal{E})$. Importantly, no-signalling-assisted communication includes entanglement-assisted communication as a subclass, and thus the strong converse bound for the no-signalling-assisted capacity also serves as one for the entanglement-assisted capacity. We note that the strong converse property for both settings was previously shown in several different ways [7,8,84,92,107,108]; our general resource-theoretic approach provides an alternative perspective that employs the asymptotic equipartition property of Ref. [7] to show this important relation.

PPT and separable assistance
We now study another approach to quantum communication, where assistance by local operations and (two-way) classical communication is allowed. Since such operations are extremely difficult to characterise mathematically [109], various relaxations are frequently employed. A particularly useful set of operations, amenable to both efficient numerical computation and a simplified analytical characterisation, is the set of PPT (positive partial transpose) codes [102,110]. In the general case of bipartite channels → , a map $\mathcal{M}$ is called PPT if the partial transpose of $J_{\mathcal{M}}$ across the : bipartition is positive [110]. Analogously, a map is separable if $J_{\mathcal{M}}$ is a separable operator [111]. Quantum communication through a channel $\mathcal{E} : A \to B$ with PPT (or SEP) assistance is then defined as allowing Alice and Bob to perform joint PPT (or separable) operations between their successive channel uses (see e.g. [97]). The goal is then, again, to simulate a number of uses of the identity channel $\mathrm{id}_2$. Using $\mathbb{O}_{\mathrm{PPT}}$ to denote the set of all PPT channels and analogously for $\mathbb{O}_{\mathrm{SEP}}$, for both classes of channels we have [110,112] Using now the fact that $R_{\mathbb{O}_{\mathrm{PPT}}}(\mathrm{id}_d) = R_{\mathbb{O}_{\mathrm{SEP}}}(\mathrm{id}_d) = d$ [11], from Lem. 16 we conclude that $F_{\mathbb{O}_{\mathrm{PPT}}}(\mathrm{id}_d) = F_{\mathbb{O}_{\mathrm{SEP}}}(\mathrm{id}_d) = \frac{1}{d}$, and in particular $F_{\mathbb{O}}(\mathrm{id}_2^{\otimes m}) = \frac{1}{2^m}$.

In order to describe these operations in our framework, one can notice that any PPT code, understood as a superchannel acting on a channel $\mathcal{E} : A \to B$, preserves the set of PPT channels [18,19]. This motivates us to define the class of PPT-preserving superchannels: $\mathbb{S}_{\mathrm{PPT}} := \{ \Theta \mid \Theta(\mathcal{M}) \in \mathbb{O}_{\mathrm{PPT}}\ \forall \mathcal{M} \in \mathbb{O}_{\mathrm{PPT}} \}$. Any bound obtained for such superchannels will then upper bound the capabilities of PPT codes. Similarly, one defines $\mathbb{S}_{\mathrm{SEP}} := \{ \Theta \mid \Theta(\mathcal{M}) \in \mathbb{O}_{\mathrm{SEP}}\ \forall \mathcal{M} \in \mathbb{O}_{\mathrm{SEP}} \}$ as the separability-preserving superchannels. More general PPT- and separability-preserving quantum processes are defined analogously.
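The value $1/d$ for the overlap between the identity channel and PPT or separable channels can be made plausible at the level of Choi states with a short numerical sketch. For any PPT state $\sigma$ one has $\mathrm{Tr}(\Phi\sigma) = \mathrm{Tr}(\Phi^{\Gamma}\sigma^{\Gamma}) \leq \|\Phi^{\Gamma}\|_\infty$, where $\Phi$ is the normalised maximally entangled state and $\Gamma$ is the partial transpose, and the product state $|00\rangle$ attains this value. The function names below are hypothetical, and the check treats general bipartite states rather than only valid Choi states, so it illustrates the upper bound rather than reproducing the argument of the text.

```python
import numpy as np

def max_entangled(d):
    """Projector onto the normalised maximally entangled state sum_i |ii>/sqrt(d)."""
    v = np.eye(d).reshape(d * d) / np.sqrt(d)
    return np.outer(v, v)

def partial_transpose(X, d):
    """Transpose the second tensor factor of an operator on C^d x C^d."""
    return X.reshape(d, d, d, d).transpose(0, 3, 2, 1).reshape(d * d, d * d)

def ppt_overlap_bound(d):
    """Operator norm of Phi^Gamma, an upper bound on Tr(Phi sigma) over PPT states sigma."""
    return np.linalg.norm(partial_transpose(max_entangled(d), d), ord=2)

def product_state_overlap(d):
    """Overlap of Phi with the separable state |00><00|."""
    e00 = np.zeros(d * d)
    e00[0] = 1.0
    return e00 @ max_entangled(d) @ e00

bounds = {d: (ppt_overlap_bound(d), product_state_overlap(d)) for d in (2, 3, 4)}
```

Both entries of `bounds[d]` coincide with `1/d`, since the partial transpose of the maximally entangled projector is the swap operator divided by `d`.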
Using this approach, our results immediately provide several bounds on the capabilities of both PPT and separable codes in assisting quantum communication, both in the non-asymptotic and asymptotic settings. We discuss in particular the applications to upper bounding quantum capacity.
Denoting by $Q_{\mathrm{SEP,adap}}(\mathcal{E})$ the quantum capacity of a channel $\mathcal{E}$ assisted by general (adaptive, two-way) separable codes, Thm. 3(a) gives Here, $\log R_{\mathbb{O}_{\mathrm{SEP}}}$ is the so-called max-relative entropy of entanglement of a channel, and our result recovers exactly the strong converse bound of Ref. [16] (see below for a clarification regarding this quantity). Our approach thus provides a remarkable simplification of the proof methods used to show this relation. Note that the quantity $R_{\mathbb{O}_{\mathrm{SEP}}}$ was recently also considered in Ref. [11] in the different context of quantifying the memory of a quantum channel, where it was evaluated for a number of cases, and in particular shown to be computable exactly for low-dimensional channels: as long as ≤ 3 and = 2, we have This provides an analytically computable strong converse bound for the capacities of all qubit channels.
Although the asymptotic equipartition property was shown for the diamond norm smoothing, it can be straightforwardly extended to the fidelity smoothing using the relation between the diamond norm and fidelity [43,46].
Similarly, for PPT codes we get PPT,adap This gives a general, SDP-computable bound on the quantum capacity and is similar to the results of Refs. [93,96], although the robustness (max-relative entropy)-based quantities in those works optimise over a larger set of maps than our O PPT . Specifically, following the observations in Refs. [19,99], the precise result of Refs. [93,96] can be recovered exactly in the framework of this work if one considers the class of completely PPT-preserving superchannels [18] instead of PPT-preserving superchannels S PPT as we have done here. We also note that O PPT (E) = max{1, E ∞ } for any channel with = 2 [11]. Of note is the fact that previous approaches to establishing such bounds typically relied on so-called amortised monotones [16,96,97,113] defined at the level of quantum states, while our method is based on resource measures directly at the level of channels. Our results not only provide a streamlined approach to recovering the specialised bounds of Refs. [16,96], but also reveal these approaches to be a part of a broad resource-theoretic framework. A notable technical difference between our approach and the previous methods is that we do not need to make any assumptions about the structure of the communication protocol, and in particular we do not need to impose that it is constructed by sequentially composing free channel transformations, as previous works did.
Let us also note that the generality of our methods allows them to be immediately applicable to related settings such as secret key agreement [114] (giving bounds on the private capacities and recovering a result of Ref. [16]) and bidirectional quantum communication [115] (giving bounds on quantum and private capacities, recovering results similar to Ref. [116]).
Before proceeding further, let us clarify a difference between our measures O SEP , O PPT and related quantities found in literature. Max-relative entropy quantities for channels are often defined in the literature with respect to a set of quantum states, rather than trace-preserving maps, which might seem different from our definitions. For instance, the max-relative entropy of entanglement of a channel was originally defined as [16] where ∈ F SEP optimises over all separable states -not necessarily valid Choi matrices -which can be understood as an optimisation over a cone of completely positive maps. However, it is not difficult to show that this definition is equal to ours.  This result shows a bound of Ref. [16,Thm. 5.2] (also [117,Prop. 1]) to be tight. In the formalism of 'resource-generating power' found in Refs. [3,4,20], the result can be understood as the fact that the entanglement-generating power of a channel E : → in terms of the generalised robustness of entanglement F SEP is equal to the robustness O SEP (E).

Quantum gate synthesis and magic states
To realise reliable computation in the presence of noise, it is essential to encode quantum states into higher-dimensional spaces by a quantum error correcting code and carry out computation within the logical subspace in a fault-tolerant manner [118]. Among the whole class of quantum gates, Clifford gates are known to be easier to implement on many codes of interest [119–123], whereas logical implementation of non-Clifford gates results in a large overhead cost [124,125]. The cost is often quantified in terms of the T gates, with T being the single-qubit unitary $T = \mathrm{diag}(1, e^{i\pi/4})$ [125], which enables universal quantum computation together with Clifford gates and can be considered as the most difficult to implement. In particular, any circuit composed of only Clifford gates can be efficiently simulated on a classical computer [121], but a circuit built with the Clifford+T gate set has a simulation time exponential in the number of T gates [126]. Much effort has therefore been devoted to studying the optimal ways to realise circuits with minimal T-gate cost [127–129].
However, choices of relevant operations other than the commonly used set of Clifford+T gates can be made. Indeed, the Clifford unitaries are not the only types of channels that admit efficient classical simulation schemes: this holds true also for stabiliser operations [130] and an even larger class of completely stabiliser-preserving channels [26]. Such extensions have shown potential to significantly improve the cost of implementing quantum circuits in terms of the T-gate count [131–133] as well as the simulation cost of noisy circuits [60,85,130,134–136]. It is therefore of interest to understand the limitations on manipulating quantum circuits through the most general means.
The task of synthesising gates and circuits is often realised through magic state distillation [125], which aims to prepare clean magic states -states that cannot be obtained with stabiliser operations alone -and use them to implement costly quantum gates through the scheme of state injection [124]. Magic state distillation can provide feasible ways to synthesise gates and circuits [129], and investigating the precise relations between distillation and gate synthesis is highly important in paving the way to fault-tolerant quantum computation [129,137,138].
The resource theory of magic was thus introduced to understand the limitations of manipulating and distilling states using different free operations [26,130,136]. We can apply our general results to this formalism to obtain fundamental limitations on both magic state distillation and more general gate manipulation protocols which act at the level of quantum channels directly. In this resource theory, stabiliser operations $\mathbb{O}_{\mathrm{STAB}}$ are built through Clifford gates, Pauli measurements, and preparations of ancillary states in the computational basis [130]. Then, the set of stabiliser states $\mathbb{F}_{\mathrm{STAB}}$ is defined to be the states that can be created by such operations, which coincides with the convex hull of eigenstates of Pauli operators. The larger set of completely stabiliser-preserving operations [26] is defined as all channels whose Choi state satisfies $\widetilde{J}_{\mathcal{E}} \in \mathbb{F}_{\mathrm{STAB}}$, which allows the computation of quantifiers such as $R_{\mathbb{O}}$ and $W_{\mathbb{O}}$ as semidefinite programs (although their size scales superexponentially with the number of qubits). By choosing this set as our free operations $\mathbb{O}$, we ensure that all of our bounds will apply also to smaller sets of transformations, which includes all channels typically considered in the study of magic. The channel manipulation protocols $\mathbb{S}$ that we consider include, in particular, any pre- and post-processing of the channel with other completely stabiliser-preserving channels.
Of particular importance are the channels which can be realised through state injection, such as unitary gates from the third level of the Clifford hierarchy C 3 [124], that is, unitaries which map Pauli operators to Clifford gates. This includes many gates of practical relevance such as the T gate, the controlled-phase gate CS, the controlled-controlled-Z gate CCZ, or the Toffoli gate. All channels which can be implemented in this way obey the following relation, which we will prove and generalise in the next section.

Lemma 18.
Let N : → be any channel which can be implemented by state injection, that is, for which there exists a free state ∈ F STAB ( ⊗ ) and free operation G ∈ O such that the state = id ⊗ N ( ) allows for the implementation of the channel N as N ( ) = G( ⊗ ) ∀ . Then That is, the channel resource measures all reduce to the corresponding state-based resource measures of the associated state = id ⊗ N ( ).
The Lemma can be considered as an application of the idea of resource simulability [27,97], which generalises the notions of teleportation-based simulation from entanglement theory [15,74,100]. Importantly, this result is valid for all unitary channels $\mathcal{U}(\cdot) = U \cdot U^{\dagger}$ where $U$ is an $n$-qubit unitary $U \in \mathcal{C}_3$ and the free state is the $2n$-qubit maximally entangled state [124]. When $U$ is additionally a diagonal gate, the state injection can be performed more easily with the state $U\,|+\rangle\langle+|^{\otimes n}\,U^{\dagger}$. Notably, when $U$ is a 1-, 2-, or 3-qubit diagonal unitary in $\mathcal{C}_3$, then the quantities $R_{\mathbb{O}}(\mathcal{U})$ and $F_{\mathbb{O}}(\mathcal{U})$ become multiplicative, owing to the multiplicativity of the associated state-based monotones [60,85]. We will hereafter restrict our discussion to diagonal unitaries $U \in \mathcal{C}_3$ of up to 3 qubits for simplicity, as this is enough to encompass most gates of practical interest (or gates Clifford-equivalent thereto). We then use $|U\rangle := U\,|+\rangle^{\otimes n}$ for the corresponding states. An interesting property of Lem. 18 for such gates is that the result actually does not depend on the choice of free operations, meaning that the robustness, weight, and fidelity measures all have the same values for any set of channels $\mathbb{O}$ which can implement state injection gadgets; this ranges from the practically relevant stabiliser operations to all stabiliser-preserving channels [26].

Let us now look at how our main results can be understood in this setting. In general, the bound of Thm. 13 provides insight into the best achievable performance of any free channel transformation protocol; namely, for any transformation $V^{\otimes n} \to U^{\otimes m}$ which takes $n$ uses of a diagonal gate $V \in \mathcal{C}_3$ to $m$ copies of a target diagonal gate $U \in \mathcal{C}_3$, it holds that For instance, using the known values of $F_{\mathbb{F}_{\mathrm{STAB}}}(|T\rangle\langle T|)$ and $F_{\mathbb{F}_{\mathrm{STAB}}}(|CCZ\rangle\langle CCZ|)$ [85] we conclude that $n \geq 3.6335$ and so 4 T gates are necessary to perfectly synthesise a CCZ gate. This is a slight strengthening of previous results which showed this optimality in other settings [136,139], as we establish that even the most general, adaptive channel manipulation protocols cannot do better.
In practice, the input gate might be affected by noise, and similarly the output of the protocol might only be required to be a good approximation of the target gate up to small errors. We can then apply Thm. 2 to show that, for any channel $\mathcal{E}$ and any deterministic gate synthesis protocol $\Upsilon$ which transforms $n$ uses of a noisy channel $\mathcal{E}$ to $m$ copies of the target unitary channel $\mathcal{U}(\cdot) = U \cdot U^{\dagger}$ up to accuracy $F(\Upsilon(\mathcal{E}, \ldots, \mathcal{E}), \mathcal{U}^{\otimes m}) \geq 1 - \varepsilon$, it holds that We stress that the coefficient $F_{\mathbb{F}_{\mathrm{STAB}}}(\psi) = \max_{\sigma \in \mathbb{F}_{\mathrm{STAB}}} \langle\psi|\sigma|\psi\rangle$ is the stabiliser fidelity [85] of the associated state $|U\rangle$, which is known exactly for most gates of interest [85,139]: for instance, $F_{\mathbb{F}_{\mathrm{STAB}}}(T) = \frac{1}{4}(2 + \sqrt{2})$ and $F_{\mathbb{F}_{\mathrm{STAB}}}(CCZ) = \frac{9}{16}$. The bounds thus establish universal, efficiently computable restrictions on gate synthesis protocols, providing in particular a fundamental lower bound on the associated resource cost that any physical protocol must satisfy. Considering the previous example of the transformation $T^{\otimes n} \to CCZ$, we obtain $n > 3$ for error values up to $\varepsilon \approx 0.095$, showing that a large error is necessary if one employs fewer than 4 T gates in the transformation.
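The two numbers quoted for the T-to-CCZ conversion (about 3.63 copies in the exact case, and more than 3 copies for errors up to roughly 0.095) can be reproduced from the stabiliser fidelities alone. The sketch below assumes that, for these gates, the overhead bound takes the form $n \geq \log\!\big[(1-\varepsilon)/F_{\mathrm{target}}^{\,m}\big] / \log F_{\mathrm{input}}^{-1}$; this form is an assumption chosen because it matches the quoted values, and the exact statements of the theorems are not reproduced here. The function name `min_input_copies` is hypothetical.

```python
import math

# Stabiliser fidelities quoted in the text
F_T = (2 + math.sqrt(2)) / 4
F_CCZ = 9 / 16

def min_input_copies(f_in, f_target, m=1, eps=0.0):
    """Assumed form of the lower bound on the number n of input gates.

    f_in, f_target: stabiliser fidelities of the input and target gate states
    m: number of target copies, eps: allowed infidelity of the output
    """
    return math.log((1 - eps) / f_target**m) / math.log(1 / f_in)

n_exact = min_input_copies(F_T, F_CCZ)   # bound for perfect synthesis of one CCZ gate
t_gates_needed = math.ceil(n_exact)      # smallest integer number of T gates allowed by the bound
eps_star = 1 - F_CCZ / F_T**3            # largest error for which the bound still exceeds n = 3
```

Here `eps_star` is obtained by setting the assumed bound equal to 3 and solving for the error, which gives the threshold below which three T gates cannot suffice.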
As a special case, Thm. 2 gives also fundamental restrictions on the resource cost of magic state distillation protocols. In particular, for any protocol M ∈ O which satisfies (M ( ⊗ ), | | ⊗ ) ≥ 1 − , we necessarily have Our weight-based bound improves on the previous bound of Ref. [75] and extends its applicability beyond full-rank input states. An explicit comparison in Fig. 3 of the main text reveals that this improvement can be substantial. We note that the robustness bound previously appeared in Ref. [60] in this context. Moreover, from Thm. 3(a) we obtain a strong converse bound on all transformations of channels as † for any diagonal ∈ C 3 of up to 3 qubits. This establishes limitations on the capabilities of general adaptive protocols in gate synthesis in the asymptotic limit. We now consider some explicit examples to compare the different bounds.
In the examples below, we compute the robustness $R_{\mathbb{O}}$ and weight $W_{\mathbb{O}}$ with the choice of $\mathbb{O}$ as the completely stabiliser-preserving operations. We refer to Fig. 3 of the main text. First, we compare the bounds on the error incurred in one-shot transformations, as stated in Thm. 1(a) and Cor. 10. In Fig. 3a, our aim is to transform the noisy T gate $\mathcal{N}_p = \mathcal{D}_p \circ \mathcal{T}$ to the CCZ gate, where $\mathcal{D}_p(\rho) = (1-p)\rho + p\,\frac{\mathbb{1}}{2}$ is the depolarising channel. Notably, we see that the robustness bound indicates a significant error also in the noiseless case ($p = 0$), whereas the weight bound becomes trivial for noiseless inputs. In Fig. 3b, we study the error incurred in distilling the $|CCZ\rangle$ state from three copies of the noisy T state $\mathcal{D}_p(|T\rangle\langle T|)$. The result explicitly shows the substantial improvement of our bounds over the bounds of Ref. [75], which become nearly 0 in this case. We also see the previously remarked fact that a significant ($\approx 0.1$) error is unavoidable in the conversion $|T\rangle^{\otimes 3} \to |CCZ\rangle$.

When considering the distillation of noisy resources, noise applied at the level of channels can affect the results in different ways than noise applied at the level of states. For example, in subfigures c-d we present a comparison between the bounds for magic state distillation from the noisy T state and for gate synthesis from the noisy T gate $\mathcal{N}_p$. The result shows a difference between the gate synthesis properties of the two: we can see that the bounds impose much higher requirements on the number of noisy states required to succeed. The comparatively weaker bound on the required copies of the noisy T gate indicates a possibility that, as long as a certain type of noise is fixed (e.g. depolarising noise here), manipulating the noisy gate at the level of channels might offer an improvement over methods which rely on distilling $|T\rangle$ states from the noisy state.
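The difference between the state and channel cases can be understood as follows. Letting $\mathcal{G}$ be the standard state injection gadget such that $\mathcal{G}(\cdot \otimes |T\rangle\langle T|) = \mathcal{T}(\cdot)$ [124], one can easily verify that $\mathcal{G}(\cdot \otimes \frac{\mathbb{1}}{2})$ is the completely dephasing channel $\mathcal{Z}_{1/2}(\rho) = \frac{1}{2}\rho + \frac{1}{2} Z\rho Z$. This means that the noisy state $\mathcal{D}_p(|T\rangle\langle T|)$ is equivalent to the channel $\mathcal{N}^{Z}_p = (1-p)\,\mathcal{T} + p\,\mathcal{Z}_{1/2}$, and thus depolarising noise at the level of states corresponds to dephasing, rather than depolarising, at the level of channels. In particular, Lem. 18 gives $R_{\mathbb{O}}(\mathcal{N}^{Z}_p) = R_{\mathbb{F}}(\mathcal{D}_p(|T\rangle\langle T|))$ and analogously for the other quantities.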
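The correspondence between depolarised T states and dephased T gates can be checked numerically. The sketch below uses one common textbook realisation of the injection gadget (a CNOT from the data qubit onto the ancilla, a computational-basis measurement of the ancilla, and an S correction on outcome 1); taking this circuit as "the standard gadget" of Ref. [124] is an assumption, as are the noise parameter name `p` and all function names.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Z = np.diag([1, -1]).astype(complex)
S = np.diag([1, 1j])
T = np.diag([1, np.exp(1j * np.pi / 4)])
# CNOT with the data qubit (first factor) as control and the ancilla (second factor) as target
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

def injection_gadget(rho, ancilla):
    """Apply the assumed T-injection gadget to data state rho with the given ancilla state."""
    joint = CNOT @ np.kron(rho, ancilla) @ CNOT.conj().T
    out = np.zeros((2, 2), dtype=complex)
    for k, correction in enumerate((I2, S)):
        bra = np.zeros((1, 2), dtype=complex)
        bra[0, k] = 1
        K = correction @ np.kron(I2, bra)  # project ancilla onto <k|, then correct the data qubit
        out += K @ joint @ K.conj().T
    return out

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = A @ A.conj().T
rho /= np.trace(rho)                      # random single-qubit density matrix

t = T @ np.array([1, 1], dtype=complex) / np.sqrt(2)
T_state = np.outer(t, t.conj())           # |T><T| with |T> = T|+>

p = 0.2
noisy_T_state = (1 - p) * T_state + p * I2 / 2
ideal = T @ rho @ T.conj().T
dephased = 0.5 * rho + 0.5 * Z @ rho @ Z
```

Because the gadget is linear in the ancilla, feeding in `noisy_T_state` yields the mixture `(1 - p) * ideal + p * dephased`, which is the dephased T gate described above.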

Remarks on the case of qudits
Our results apply in an analogous way to the resource theory of magic for qudit systems of prime dimension [130]. Here, our robustness-based bounds can be used to recover results related to Refs. [27,86], although these works considered a slightly different approach based on the discrete Wigner function and associated norms. The technical differences in the framework of Ref. [86], and in particular the use of sub-normalised operators rather than normalised quantum states, mean that these results do not exactly fit within our framework. Nevertheless, we can use them to get insights about important states in this theory.
In particular, in Table I of the main text we reported the value of F (and analogously O ) for a selection of quantum states, and in particular the + state and the Norrell state. These results are obtained by leveraging the findings of Ref. [86]: for the aforementioned states, Ref. [86] showed that $\log R_{\mathbb{F}}(\psi)$ (max-relative entropy of magic) equals a quantity called the 'min-thauma' $\theta_{\min}(\psi)$. But in general it holds that $$\log R_{\mathbb{F}}(\psi) \geq \log F_{\mathbb{F}}(\psi)^{-1} \geq \theta_{\min}(\psi),$$ where the first inequality holds for any pure state [66,140] (cf. Lem. 16), and the second is a consequence of the definition of $\theta_{\min}$ [86]. Therefore, whenever $\theta_{\min}(\psi) = \log R_{\mathbb{F}}(\psi)$, we conclude that also $\log F_{\mathbb{F}}(\psi)^{-1}$ has the same value. For channels and states whose value of $F_{\mathbb{O}}$ or $F_{\mathbb{F}}$ has not been established or is not known to be multiplicative (such as the qutrit T state), one can instead substitute $F_{\mathbb{F}}$ with $2^{-\theta_{\min}}$ in all of our bounds. In particular, since $\theta_{\min}$ is additive on tensor products [86], the many-copy results of Thms. 3(a) and 3(b) can always be applied by choosing $F_{\mathbb{F}}(\psi) = 2^{-\theta_{\min}(\psi)}$.

Proof of Lem. 18 and extension to other resources
The basic idea behind Lem. 18 is to exploit the fact that some channels can be reversibly interconverted with state resources through free operations, which means that the two types of resources can be considered as equivalent. This idea was first applied in the theory of entanglement [100,Sec. 5], later extended to general entanglement manipulation protocols [15] and to other types of resource theories [17,97]. Within the theory of magic states, the idea can be understood as a generalisation of gate teleportation [124].
A general formulation of this property is as follows (see also [17,Sec. VI]).

Lemma 19.
Let O be any class of free operations which can prepare all free states F, that is, R ∈ O ∀ ∈ F. Let N : → be any channel such that: (i) there exists a free superchannel Γ ∈ S and a state such that Γ(R ) = N , Then In practice, the superchannel Γ is often realised as a state injection protocol which provides as an ancillary system and processes the joint state with a free operation in O. This is the way in which this property is usually applied both in entanglement theory [15,74,100] and in more general settings [17,97]. For completeness, we provide a statement of the property in this form (see also [27]).

Lemma 20. Consider any resource theory such that $\sigma, \sigma' \in \mathbb{F} \Rightarrow \sigma \otimes \sigma' \in \mathbb{F}$ and let $\mathbb{O}$ be a chosen class of free operations. Let N : → be any channel such that: (i) it can be implemented through state injection, that is, there exists a free operation G ∈ O( → ) and a state ∈ D( ) such that G(· ⊗ ) = N (·), (ii) there exists a free transformation Θ ∈ S such that Θ(N ) = R ; for example, N ( ) = for some ∈ F, or Then

Remark. In particular, the result holds in the resource theory of magic for any $n$-qubit unitary channel $\mathcal{U}(\cdot) = U \cdot U^{\dagger}$ from the third level of the Clifford hierarchy [124]. In this case: $\mathbb{O}$ can be any subset of completely stabiliser-preserving operations which allows for the implementation of state injection gadgets; the injected state is given by the Choi state $(\mathbb{1} \otimes U)\,|\Phi^+\rangle\langle\Phi^+|\,(\mathbb{1} \otimes U^{\dagger})$ where $|\Phi^+\rangle$ is the $2n$-qubit maximally entangled state; and the free superchannel $\Theta$ consists of simply preparing the free state $|\Phi^+\rangle\langle\Phi^+|$ before the application of $\mathcal{U}$. In the case of diagonal gates in the third level of the Clifford hierarchy, we can use the state $U\,|+\rangle\langle+|^{\otimes n}\,U^{\dagger}$ and the result is valid for the larger class of stabiliser-preserving operations. A similar state injection protocol can also be employed in qudit systems with odd prime dimensions [141,142].
Proof. The first part of the proof can be shown following [27,Prop. 22]: where the first inequality is since G(· ⊗ ) ∈ O for any ∈ F, the second inequality is by the data processing inequality of max (E F ) [67], and the equality in the second-to-last line is by the data processing inequality of max ( ) [68]. On the other hand, using the monotonicity of O under free superchannels, we have where the last inequality is by Lem. 7. The cases of O and O proceed in the same way.
An interesting consequence of such an operational equivalence of channels N and states is that it allows us to simplify general, adaptive channel manipulation protocols into simply acting on copies of the state through the use of so-called teleportation stretching [15] (see also [97]).

Supplementary Note 5: Extension to probabilistic protocols
We now consider probabilistic sub-superchannels that need not preserve quantum channels, but can instead map them to their probabilistic implementations: completely positive and trace-non-increasing maps. Recall that the set of free subchannels $\widetilde{\mathbb{O}}$ with respect to the given set of free channels $\mathbb{O}$ is given by and the corresponding set of sub-superchannels is $\widetilde{\mathbb{S}} := \{ \widetilde{\Theta} \mid \forall \mathcal{M} \in \mathbb{O},\ \widetilde{\Theta}(\mathcal{M}) \in \widetilde{\mathbb{O}} \}$. Our figure of merit in probabilistic channel transformations is then the conditioned fidelity where $p(\psi) = \mathrm{Tr}[\mathrm{id} \otimes \widetilde{\Theta}(\mathcal{E})(\psi)]$.
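It is instructive to see how the above notions translate to state theories. By taking preparation channels as free channels and relabelling $\mathbb{O} \to \mathbb{F}$ and $\mathbb{S} \to \mathbb{O}$, Eq. (104) reduces to $\widetilde{\mathbb{F}} := \{ \widetilde{\sigma} \mid \widetilde{\sigma} \in \mathrm{cone}(\mathbb{F}),\ \mathrm{Tr}[\widetilde{\sigma}] \leq 1 \}$ and correspondingly we get $\widetilde{\mathbb{O}} := \{ \widetilde{\mathcal{M}} \mid \forall \sigma \in \mathbb{F},\ \widetilde{\mathcal{M}}(\sigma) \in \widetilde{\mathbb{F}} \}$. As for the conditional fidelity, since our free superchannels transform preparation channels to preparation channels, we replace the target unitary with a target preparation channel that prepares the target state $\phi$. Writing $\widetilde{\mathcal{M}}(\rho) = p\,\eta$ with $\eta \in \mathbb{D}$, the conditional fidelity reduces to the much clearer form $F_{\mathrm{cond}}(\phi, \widetilde{\mathcal{M}}(\rho)) = F(\phi, \eta)$.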
With this, we are now ready to present an extension of Thm. 1(a) as follows. and where O (U) := max M ∈O (id ⊗ U ( ), id ⊗ M ( )). Alternatively, taking M ∈ O to be a free channel such that E ≥ O (E) M , it holds that We also obtain corresponding bounds for state-based theories.

Proposition 22.
Suppose there exists a free subchannel $\widetilde{\mathcal{M}} \in \widetilde{\mathbb{O}}$ such that $\widetilde{\mathcal{M}}(\rho) = p\,\eta$ with $F(\eta, \phi) \geq 1 - \varepsilon$ for some resourceful pure state $\phi$. Then, it holds that and Alternatively, taking $\sigma \in \mathbb{F}$ to be a free state such that $\rho \geq W_{\mathbb{F}}(\rho)\,\sigma$, it holds that

It can be easily seen that when $\widetilde{\Theta}$ is a superchannel, $p(\psi) = 1$ holds for any $\psi$ and the bounds of Thm. 21 reproduce Thm. 1(a). An interesting question is whether the no-go statement implied by Thm. 1(a), which says that perfect purification with $\varepsilon = 0$ is impossible for any channel with $W_{\mathbb{O}}(\mathcal{E}) > 0$, remains valid in probabilistic cases. For the choice of $\mathcal{M}$ as an optimal channel satisfying $\mathcal{E} \geq W_{\mathbb{O}}(\mathcal{E})\,\mathcal{M}$, Eq. (108) (and (111) for state theories) implies that if $\mathrm{Tr}[\mathrm{id} \otimes \widetilde{\Theta}(\mathcal{M})(\psi)] > 0$, the no-go theorem still holds. On the other hand, if $\mathrm{Tr}[\mathrm{id} \otimes \widetilde{\Theta}(\mathcal{M})(\psi)] = 0$, meaning that the free part of $\mathcal{E}$ is completely cut off by the selective operation $\widetilde{\Theta}$, then this does not give us any insight into $\varepsilon$. This is a natural consequence, because such a perfect purification is indeed possible, implying that one cannot expect a bound in which $\varepsilon > 0$ for $W_{\mathbb{O}}(\mathcal{E}) > 0$ (see also [143]).
As an illustrative example, let us consider a state theory, in particular, the theory of coherence where $\mathbb{F}$ is the set $\mathrm{conv}\{|0\rangle\langle 0|, |1\rangle\langle 1|, |2\rangle\langle 2|\}$ defined on a three-dimensional system. Take $\rho = \frac{1}{2}|0\rangle\langle 0| + \frac{1}{2}|+_{12}\rangle\langle +_{12}|$ where $|+_{12}\rangle = \frac{1}{\sqrt{2}}(|1\rangle + |2\rangle)$. Consider a free subchannel $\widetilde{\mathcal{M}}(\cdot) = P_{12} \cdot P_{12}$ where $P_{12} := |1\rangle\langle 1| + |2\rangle\langle 2|$ is the projector onto the subspace spanned by $\{|1\rangle, |2\rangle\}$. Then we get $\widetilde{\mathcal{M}}(\rho) = \frac{1}{2}|+_{12}\rangle\langle +_{12}|$, indicating that we can obtain a pure state $|+_{12}\rangle$ with probability $1/2$, although $W_{\mathbb{F}}(\rho) = 1/2 > 0$. This shows a clear contrast to the deterministic case of Thm. 1(a) and Cor. 10 as well as the probabilistic bounds shown in Ref. [75], which showed that perfect purification is impossible even probabilistically for full-rank states.

Another interesting observation from Thm. 21 is that unlike Eqs. (106) and (107), where the lower bounds for the error decrease as the success probability decreases, Eq. (108) appears to have the opposite behaviour with respect to the success probability. In fact, there is an intricate trade-off between the probability of detecting the free component, $\mathrm{Tr}[\mathrm{id} \otimes \widetilde{\Theta}(\mathcal{M})(\psi)]$, and the success probability $p(\psi)$. This provides a practical instruction: to accomplish a purification protocol with high accuracy, it is crucial to reduce the probability of detecting the free part of the given channel, as characterised by the resource weight $W_{\mathbb{O}}$.
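The qutrit coherence example can be verified directly with a few lines of linear algebra. The sketch below builds the state, applies the projection subchannel, and reads off the success probability and the fidelity of the conditional output with the target state. It also checks that the subchannel maps each incoherent basis state to a diagonal (sub-normalised incoherent) operator. Variable names are illustrative only.

```python
import numpy as np

def ket(i, d=3):
    """Computational basis vector |i> in dimension d."""
    v = np.zeros(d)
    v[i] = 1.0
    return v

plus12 = (ket(1) + ket(2)) / np.sqrt(2)
rho = 0.5 * np.outer(ket(0), ket(0)) + 0.5 * np.outer(plus12, plus12)

# Projector onto span{|1>, |2>}; the subchannel is X -> P12 X P12
P12 = np.outer(ket(1), ket(1)) + np.outer(ket(2), ket(2))

out = P12 @ rho @ P12
p_success = np.trace(out)          # probability of the selected outcome
eta = out / p_success              # conditional output state
fidelity = plus12 @ eta @ plus12   # overlap with the pure target |+_12>

# Freeness check: every incoherent basis state is mapped to a diagonal operator
images = [P12 @ np.outer(ket(i), ket(i)) @ P12 for i in range(3)]
maps_free_to_free = all(np.allclose(X, np.diag(np.diag(X))) for X in images)
```

The output has unit fidelity with the target at success probability one half, which is the perfect probabilistic purification described in the text.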

Proof of
The last line is because where we used thatΘ(M) ∈Õ, ensuring by (104) To get the third line, we bound the first term in the second line by using (113) and identifying = Tr[id ⊗Θ(M) ( )], which can be seen by taking trace in both sides of (114). We used a similar reasoning to bound the second term; for any channel N , sub-superchannelΘ, and state , there exists another channel N such that id ⊗Θ(N ) ( ) = · id ⊗ N ( ) with we equate the right-hand side of the third line of (115)  Using Lem. 23 we get that namely, 1 − F ( ) also shows the strong monotonicity (with the opposite direction of inequality). Then, since one can always construct a free instrument by complementingM with a replacement subchannelR defined by the Choi matrix R = (1 − Tr M ) ⊗ , ∈ F, (120) implies that where in the last inequality we used Cor. 10. A simple reordering of the terms leads to the bound in the statement.