Lossless quantum data compression with exponential penalization: an operational interpretation of the quantum Rényi entropy

Based on the problem of lossless quantum data compression, we present an operational interpretation for the family of quantum Rényi entropies. To do so, we appeal to a very general quantum encoding scheme that satisfies a quantum version of the Kraft-McMillan inequality. In the standard situation, where one aims to minimize the usual average length of the quantum codewords, we recover the known result that the von Neumann entropy of the source bounds the average length of the optimal codes. In contrast, we show that by invoking an exponential average length, associated with an exponential penalization of large codewords, the quantum Rényi entropies arise as the natural quantities relating the optimal encoding schemes to the source description, playing a role analogous to that of the von Neumann entropy.

One of the main concerns in classical and quantum information theory is the problem of encoding information using as few resources as possible. This task is known as data compression, and it can be carried out either in a lossy or a lossless way, depending on whether the original data is recovered with or without errors, respectively.
Here, we are interested in lossless quantum data compression. In order to state our proposal, let us first recall how this task works in the classical domain. The mathematical foundations of classical data compression can be found in the seminal paper of Shannon 1 (see e.g. 2 for an introduction to the topic), and can be summarized as follows. Let $\mathcal{S} = \{p_i, s_i\}$ be a classical source where each symbol $s_i$ has an associated probability of occurrence $p_i$. The idea is to assign to each symbol a codeword $c(s_i)$, i.e., a finite string over an alphabet of $k$ letters $\{0, \dots, k-1\}$, in an adequate way. In particular, a $k$-ary classical code $c$ of $\mathcal{S}$ is said to be uniquely decodable if this assignment of codewords is injective for any possible concatenation. A celebrated result states that any uniquely decodable code necessarily satisfies the Kraft-McMillan inequality 3,4:
$$\sum_i k^{-\ell_i} \le 1,$$
where $\ell_i$ is the length of the codeword $c(s_i)$ (measured in bits if $k = 2$). Conversely, given a set of codeword lengths $\{\ell_i\}$ satisfying this inequality, there exists a uniquely decodable code with these lengths. Thus, lossless data compression consists in finding a uniquely decodable code that takes into account the statistical description of the source. Formally, this is carried out by minimizing the average codeword length $L = \sum_i p_i \ell_i$ subject to the Kraft-McMillan inequality. In the end, one obtains a variable-length code where shorter codewords are assigned to symbols with a high probability of occurrence, whereas longer codewords are assigned to symbols with low probability (see 2, chap. 5). Moreover, in the limit of a large number of independent and identically distributed sources, the average length of the optimal code is arbitrarily close to the Shannon entropy 1 of the source, $H(p) = -\sum_i p_i \log_k p_i$. As noticed by Campbell, this solution has the disadvantage that the codeword length can turn out to be very large for symbols with a sufficiently low probability of occurrence 5.
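The classical quantities above are easy to check numerically. The following Python sketch (the probabilities and code lengths are an illustrative dyadic example, not taken from the paper) verifies the Kraft-McMillan inequality and compares the average codeword length with the Shannon entropy:

```python
import math

def kraft_sum(lengths, k=2):
    """Left-hand side of the Kraft-McMillan inequality, sum_i k^(-l_i)."""
    return sum(k ** -l for l in lengths)

def shannon_entropy(probs, k=2):
    """Shannon entropy H(p) = -sum_i p_i log_k p_i."""
    return -sum(p * math.log(p, k) for p in probs if p > 0)

# A dyadic source and the matching uniquely decodable binary code lengths.
probs = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]

assert kraft_sum(lengths) <= 1                   # the code is admissible
L = sum(p * l for p, l in zip(probs, lengths))   # average codeword length
H = shannon_entropy(probs)
# For dyadic probabilities the optimal code attains the entropy: L = H = 1.75.
```

For non-dyadic probabilities the optimal average length lies strictly between $H(p)$ and $H(p)+1$.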
Indeed, the use of the average codeword length as a criterion of performance carries the implicit assumption that the cost varies linearly with the codeword length, which is not always desirable. For instance, adding a letter to a long codeword may have a larger impact than adding a letter to a shorter one, e.g. in terms of the memory needed to store the codeword. This problem has given rise to several other measures of codeword length (see e.g. [6][7][8][9]), for which the average length is a limiting case. In particular, a generalized average $t$-length, also called exponential average, is defined as 5
$$L_t = \frac{1}{t} \log_k \left( \sum_i p_i \, k^{t \ell_i} \right),$$
where $t > 0$ is a parameter related to the cost. Notice that in the limiting case $t \to 0$ one recovers $L_t \to L$ and, as $t$ increases, a greater penalization of long codewords holds. Indeed, Campbell obtained a source coding theorem taking such a penalization into account. His theorem is similar to the standard one, but the encoding is made in such a way that the generalized codeword length turns out to be arbitrarily close to the Rényi entropy 10 of the source,
$$H_\alpha(p) = \frac{1}{1-\alpha} \log_k \left( \sum_i p_i^\alpha \right), \quad \text{with} \quad \alpha = \frac{1}{1+t}.$$
This remarkable result provides an operational interpretation of the Rényi entropy as the natural information measure for the problem of optimal data compression with penalization of long codewords (see also 11 for a discussion of an axiomatic derivation of entropy related to the coding problem).
As we have seen, variable-length codes arise naturally in the problem of lossless classical data compression. In the quantum information theory realm, the formulation of this problem presents intrinsic difficulties. These difficulties are mainly related to the fact that a quantum source can possibly emit mutually non-orthogonal states. Thus, one has to deal with superpositions of quantum codewords. Even worse, these superpositions may correspond to codewords of different lengths. Schumacher and Westmoreland were the first to establish a general approach to the problem of quantum variable-length coding 12. Furthermore, they provided the first quantum version of the Kraft-McMillan inequality and found that the von Neumann entropy of the source ρ, $S(\rho) = -\mathrm{Tr}(\rho \log_2 \rho)$ (binary logarithm for coding in qubits), plays a role analogous to that of the Shannon entropy in the classical source coding theorem. Several other authors have contributed to this subject, proposing alternative or extended schemes [12][13][14][15][16][17][18][19][20]. In general, these approaches face the same disadvantage as in the classical case: they do not consider the fact that long codewords, even if they appear with low probability, may have a large impact in terms of the resources needed for the encoding. This drawback is even more relevant nowadays, since the practical implementation of quantum information protocols poses the challenge of manipulating coherent superpositions of qubits. While chains of qubits of arbitrary length may arise naturally in some theoretical considerations, it can be very expensive and difficult to implement large chains in the lab, especially at the early stages of the development of quantum information technology devices. Thus, our goal is to provide a quantum version of Campbell's strategy for the problem of coding with penalization of large codewords.
As a consequence, we show that in this framework the quantum Rényi entropies emerge as the natural quantities relating the optimal encoding schemes with the source description. Accordingly, we provide an operational interpretation for those entropies.

Uniquely decodable quantum code and quantum Kraft-McMillan inequality.
In this section, we summarize some definitions and results from the literature related to our proposal. We begin by stating the problem of lossless quantum compression, which consists in compressing a quantum source, given by an ensemble of quantum states, by using a variable-length quantum code, in such a way that the original states can be exactly recovered, i.e., without error. More precisely, the situation to deal with is the following. Let us assume that a quantum source produces an ensemble of quantum states $\{p_n, |s_n\rangle\}_{n=1}^N$, with $|s_n\rangle \in \mathcal{H}_S \equiv \mathbb{C}^d$. The first task is to encode in an unambiguous, or uniquely decodable, way not only every single quantum state $|s_n\rangle$ of the source, but also any string of quantum states of the source. In this sense, let us first introduce a very general definition of a uniquely decodable quantum source code.

Definition 1. A uniquely decodable quantum source code of $\mathcal{S}$ over a quantum $k$-ary alphabet $\mathcal{A} = \{|0\rangle, \dots, |k-1\rangle\}$ is given by an isometry $U$ mapping each source state to a quantum codeword, i.e., a superposition of finite strings of alphabet states. In this way, the fact that $U$ is an isometry guarantees an injective mapping for each string of source states. With this observation, one can define an extended encoder $U^\infty$ acting on arbitrary sentences of the source. The physical interpretation of $U^\infty$ is that for each sentence of the source, we obtain the corresponding coded sentence. It is important to remark that all these coding schemes are lossless in the sense of definition 1.
As is well known in classical data compression, the Kraft-McMillan inequality gives a necessary and sufficient condition for the existence of a uniquely decodable code (see e.g. 2). This result was originally extended to the quantum domain in 12, within a particular formalism. We proceed here to obtain a quantum version of the Kraft-McMillan inequality compatible with the previous construction.
Let us first introduce the length observable Λ, whose eigenspaces collect the codewords of each definite length; this allows one to obtain a further notion of codeword length.

Theorem 1. For any lossless quantum encoding scheme U given by Eq. (1), the following inequality must be satisfied:
$$\mathrm{Tr}\left(k^{-\Lambda}\right) \le 1,$$
where the trace is taken over the code subspace. The proof of this theorem, which relies mainly on its classical counterpart, is given in the section Methods, along with the proofs of the subsequent theorems.
Source coding and von Neumann entropy bounds. As in the classical case, we are interested in quantum codes that minimize the amount of resources involved. However, in the quantum case an extra difficulty arises in quantifying the resources, since there is no unique way of defining the notion of length of a quantum codeword. For a given encoding scheme U, the standard definition of quantum codeword length is the following.

Definition 3. The length of a quantum codeword $|\omega\rangle \equiv U|s\rangle$ is the expectation value of the length observable, $\ell(\omega) = \langle \omega | \Lambda | \omega \rangle$.
Thus, from this definition, codewords may not have a definite length, in the sense that they are not, in general, eigenstates of the length operator. For that reason, a quantum code given by the encoding scheme (1) is sometimes called a quantum indeterminate-length code 12.
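To make the length observable concrete, here is a small numerical sketch (a hypothetical four-codeword code, not from the paper) in which Λ is diagonal on a basis of definite-length codewords; it checks the quantum Kraft-McMillan inequality of theorem 1 and evaluates the expected length of a superposed codeword:

```python
import numpy as np

# Hypothetical code subspace spanned by four codewords of definite lengths
# 1, 2, 3, 3 (a Kraft-admissible set for a binary alphabet, k = 2).
lengths = np.array([1.0, 2.0, 3.0, 3.0])
Lambda = np.diag(lengths)          # length observable on the code subspace
k = 2

# Quantum Kraft-McMillan inequality: Tr(k^{-Lambda}) <= 1 on the code subspace.
kraft = np.trace(np.diag(k ** -lengths))
assert kraft <= 1 + 1e-12

# A codeword superposing lengths 1 and 3 with equal weights is not an
# eigenstate of Lambda: its length is the expectation <w|Lambda|w> = 2,
# while its largest contributing length (the base length below) is 3.
omega = np.zeros(4)
omega[[0, 2]] = 1 / np.sqrt(2)
ell = float(omega @ Lambda @ omega)
```

Such a superposition is precisely an indeterminate-length codeword: its expected length is 2, but storing it faithfully requires a register of 3 letters.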
As we have noticed, one can introduce other measures of the length of a quantum codeword. One used in the literature is the base length 13:
$$\bar{\ell}(\omega) = \max \{ \ell : \Pi_\ell |\omega\rangle \ne 0 \},$$
where $\Pi_\ell$ is the projector onto the subspace of codewords of definite length $\ell$. Notice that the base length plays a key role, as it determines the minimum size of the quantum register necessary to store a quantum codeword.
The base length of a quantum codeword is an integer, whereas the quantum codeword length in general is not. However, both lengths are related by $\ell(\omega) \le \bar{\ell}(\omega)$, with equality if and only if $|\omega\rangle = U|s\rangle$ is an eigenstate of Λ, i.e., if the codeword has definite length.
Henceforth, we consider that the state of the quantum source $\mathcal{S}$ is given by the density operator ρ, i.e., a positive semi-definite operator of trace one acting on $\mathbb{C}^d$. We write the density operator using the decomposition over the ensemble states, $\rho = \sum_{n=1}^N p_n |s_n\rangle\langle s_n|$, or, equivalently, its spectral decomposition $\rho = \sum_{i=1}^d \rho_i |i\rangle\langle i|$, where $\rho_i$ is the eigenvalue corresponding to the eigenstate $|i\rangle$. In addition, we denote by $U \rho U^\dagger$ the output of the quantum encoder (1). Then, according to definition 3, the average codeword length of $\mathcal{S}$ is given by
$$L = \mathrm{Tr}\left( U \rho U^\dagger \Lambda \right).$$
On the other hand, according to definition 4, the base length of $\mathcal{S}$ is the largest base length among the codewords of the ensemble. We now have all the ingredients to introduce optimal quantum lossless codes.
Definition 5. A quantum encoding scheme U is optimal for the source $\mathcal{S}$ if it minimizes the average codeword length, and thus the minimal average codeword length for the source $\mathcal{S}$ is given by
$$L^{\mathrm{opt}} = \min_U \mathrm{Tr}\left( U \rho U^\dagger \Lambda \right).$$
In the classical setting, to find the optimal code one has to look for the set of integers $\{\ell_i\}$ that minimizes the average length subject to the Kraft-McMillan inequality. It is well known that the Huffman code provides the optimal solution 21. Let us see that the quantum optimal code, or quantum version of the Huffman code, is obtained for an encoding scheme U with basis given by the eigenstates of ρ and the classical code c given by the Huffman code for the symbols $\{1, \dots, d\}$ with probabilities given by the eigenvalues of ρ.
Theorem 2. The optimal quantum code of the quantum source $\mathcal{S}$ is of the form $U^{\mathrm{opt}} = \sum_{i=1}^d |c(i)\rangle\langle i|$, where $|i\rangle$ are the eigenstates of ρ and c is the classical Huffman code for its eigenvalues.

Alternatively, one can take the integer lengths $\ell_i = \lceil -\log_k \rho_i \rceil$ and construct a corresponding code using the Kraft tree (see 2 for more details). This is a well-known method, called Shannon coding, for which the average length is close to the optimal one (which is given by the Huffman code). Accordingly, we can say that the quantum version of the Shannon code is given by an encoding scheme of the form (13), where the classical code c is now given by the Shannon code. Nevertheless, without explicitly expressing the optimal code, it is possible to upper and lower bound the optimal average codeword length in terms of the von Neumann entropy of the source, as previously proved in a different formalism in 12.
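Since the quantum optimal code reduces to a classical Huffman code on the spectrum of ρ, it can be sketched in a few lines of Python. The implementation below is a minimal illustration that only tracks codeword lengths, applied to a hypothetical spectrum:

```python
import heapq

def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given distribution.
    Minimal sketch: merges the two least probable groups at each step and
    tracks only the resulting lengths, not the codewords themselves."""
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, ids1 = heapq.heappop(heap)
        p2, ids2 = heapq.heappop(heap)
        for i in ids1 + ids2:       # every symbol in a merged group gets
            lengths[i] += 1         # one letter deeper in the Kraft tree
        heapq.heappush(heap, (p1 + p2, ids1 + ids2))
    return lengths

# Eigenvalues of a hypothetical source density operator rho (diagonalized):
eigvals = [0.5, 0.25, 0.125, 0.125]
lengths = huffman_lengths(eigvals)
# The optimal quantum code applies this classical code in the eigenbasis of
# rho, so the quantum average length equals the classical one.
L = sum(p * l for p, l in zip(eigvals, lengths))
```

For this dyadic spectrum the Huffman lengths are 1, 2, 3, 3 and the average length of 1.75 bits coincides with the von Neumann entropy, as the bounds of theorem 3 demand.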

Theorem 3. The average length of the optimal code is lower and upper bounded as follows:
$$S(\rho) \le L^{\mathrm{opt}} < S(\rho) + 1.$$
As a consequence, in the limit $K \to \infty$ of many independent and identically prepared sources $\rho^{\otimes K}$, the optimal average codeword length per source tends to the von Neumann entropy $S(\rho)$. We end this section discussing what happens to the average codeword length when the encoding scheme is designed for a "wrong" density operator τ instead of the correct one ρ. This could be useful, for instance, when τ is the best available estimate of the state of the source. In such a situation, the average codeword length of the quantum Shannon code corresponding to τ is again bounded, as follows (see, e.g., refs 19,20):
$$S(\rho) + S(\rho \| \tau) \le L^{\mathrm{Sh}}_\tau < S(\rho) + S(\rho \| \tau) + 1,$$
where $S(\rho \| \tau)$ is the quantum relative entropy. Notice that this gives an operational interpretation to the quantum relative entropy: $S(\rho \| \tau)$ measures the deviation of the average codeword length of the quantum Shannon code when the code is designed using a density operator that differs from the one associated with the source (see also 22,23 for a further understanding of the role of the quantum relative entropy in the context of data compression).
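The mismatch bound can be illustrated numerically in the commuting case, where ρ and τ are diagonal in the same basis and the quantum relative entropy reduces to the classical one. The distributions below are hypothetical:

```python
import math

def relative_entropy(p, q, k=2):
    """Classical relative entropy D(p||q) = sum_i p_i log_k(p_i/q_i); for
    commuting density operators this equals the quantum S(rho||tau)."""
    return sum(pi * math.log(pi / qi, k) for pi, qi in zip(p, q) if pi > 0)

# True source spectrum p and a mistaken (here uniform) estimate q, assumed
# diagonal in the same basis for this sketch.
p = [0.5, 0.25, 0.125, 0.125]
q = [0.25, 0.25, 0.25, 0.25]

# The Shannon code designed for q uses lengths ceil(-log2 q_i) = 2 everywhere,
# but its average length is evaluated with the TRUE probabilities p.
lengths = [math.ceil(-math.log2(qi)) for qi in q]
L_wrong = sum(pi * l for pi, l in zip(p, lengths))

H = -sum(pi * math.log2(pi) for pi in p)   # entropy of the true source
D = relative_entropy(p, q)                 # price of the wrong description
assert H + D <= L_wrong + 1e-9 < H + D + 1
```

Here $H = 1.75$, $D = 0.25$ and the mismatched code spends exactly $H + D = 2$ bits on average: the relative entropy is the excess cost of the wrong description.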

Source coding and quantum Rényi entropy bounds.
Let us first note that definition 5 of the optimal code and the results given above are closely linked to the standard definition 3 of the length of a quantum codeword. However, there could be problems for which the relevant measure of length is not the usual one. In this sense, Müller et al. have used the average of the base lengths of the source in order to define a different optimal code 18 and have obtained a result complementary to that of theorem 3. In this section we follow an alternative strategy, based on an extension of Campbell's proposal to the quantum case 5. Let us first introduce a notion of exponential quantum codeword length; the standard quantum codeword and base lengths turn out to be particular cases of our definition.

Definition 6. The t-exponential length of a quantum codeword $|\omega\rangle \equiv U|s\rangle$, for some $|s\rangle \in \mathcal{H}_S$, is given by the expectation value
$$\ell_t(\omega) = \frac{1}{t} \log_k \langle \omega | k^{t \Lambda} | \omega \rangle,$$
where $t \ge 0$ is a parameter related to the cost assigned to large codewords. In the limiting cases, one has $\ell_t(\omega) \to \ell(\omega)$ as $t \to 0$ and $\ell_t(\omega) \to \bar{\ell}(\omega)$ as $t \to \infty$, and in general $\ell_t(\omega) \le \ell_{t'}(\omega)$ for $t \le t'$. Thus, by changing the parameter t, one can move continuously and increasingly from the standard quantum codeword length to the base length. In other words, the t-exponential codeword length allows one to strike a compromise between minimizing the average length and the base length. Finally, note that if $|\omega\rangle$ is an eigenstate of the length observable, then $\ell_t(\omega) = \ell(\omega)$, which is a reasonable property for a quantum codeword length measure.
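The interpolation between the standard length and the base length can be seen numerically. The following sketch evaluates $\ell_t$ for increasing t on a hypothetical codeword that superposes two definite lengths:

```python
import numpy as np

def t_exponential_length(omega, Lambda_diag, t, k=2):
    """t-exponential length (1/t) log_k <omega| k^{t Lambda} |omega> for a
    codeword expressed in the eigenbasis of the length observable Lambda."""
    weights = np.abs(omega) ** 2
    return np.log(np.sum(weights * k ** (t * Lambda_diag))) / (t * np.log(k))

# Codeword superposing definite lengths 1 and 3 with equal amplitudes.
Lambda_diag = np.array([1.0, 3.0])
omega = np.array([1.0, 1.0]) / np.sqrt(2)

ells = [t_exponential_length(omega, Lambda_diag, t) for t in (1e-9, 1.0, 50.0)]
# t -> 0 recovers the mean length 2.0; large t approaches the base length 3.0.
```

The three values increase monotonically from 2.0 toward the base length 3.0, illustrating the compromise that the cost parameter t controls.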
According to definition 6, the t-exponential average codeword length of the quantum source $\mathcal{S}$ is given by
$$L_t = \frac{1}{t} \log_k \mathrm{Tr}\left( U \rho U^\dagger k^{t \Lambda} \right).$$
We now introduce the notion of optimal quantum code corresponding to our previously defined t-exponential codeword length. A natural choice is as follows.

Definition 7. A quantum encoding scheme U is t-exponential optimal for the source $\mathcal{S}$ if it minimizes the t-exponential average codeword length; thus the minimal t-exponential average codeword length for the source $\mathcal{S}$ is given by
$$L_t^{\mathrm{opt}} = \min_U \frac{1}{t} \log_k \mathrm{Tr}\left( U \rho U^\dagger k^{t \Lambda} \right).$$
In the classical setting, to find the t-exponential optimal code one has, as in the standard context, to look for the set of integers $\{\ell_i\}$ that minimizes the t-exponential average length subject to the Kraft-McMillan inequality. This problem has already been solved in 7,24,25. In the quantum context, we prove here that the optimal code is again obtained for an encoding scheme U with basis given by the eigenstates of ρ, together with the classical t-exponential optimal code $c_t$ for the symbols $\{1, \dots, d\}$ with probabilities given by the eigenvalues of ρ. As in the standard case, there is no analytic formula for the individual optimal integer lengths $\ell_i$ leading to the minimum t-exponential average length of the classical code. But, again, if one drops the integer restriction on $\{\ell_i\}$ in the minimization problem, one obtains the optimum "lengths" $-\log_k \rho_i^{(t)}$, where the $\rho_i^{(t)} = \rho_i^{1/(1+t)} / \sum_j \rho_j^{1/(1+t)}$ are the "escort probabilities", i.e., the eigenvalues of the "escort" density operator $\rho^{(t)}$. Moreover, it is possible to upper and lower bound the optimal t-exponential average quantum codeword length in terms of the quantum Rényi entropy of the source.
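The escort construction can be verified directly: with the non-integer lengths $-\log_k \rho_i^{(t)}$, the t-exponential average length equals the Rényi entropy of order $1/(1+t)$ exactly. A small Python check on a hypothetical spectrum:

```python
import math

def escort(probs, t):
    """Escort distribution p_i^(t) = p_i^(1/(1+t)) / sum_j p_j^(1/(1+t))."""
    a = 1 / (1 + t)
    w = [p ** a for p in probs]
    Z = sum(w)
    return [x / Z for x in w]

def exponential_length(probs, lengths, t, k=2):
    """Campbell's t-exponential average length L_t."""
    return math.log(sum(p * k ** (t * l) for p, l in zip(probs, lengths)), k) / t

def renyi_entropy(probs, alpha, k=2):
    """Renyi entropy H_alpha of a probability vector."""
    return math.log(sum(p ** alpha for p in probs), k) / (1 - alpha)

probs = [0.5, 0.25, 0.125, 0.125]   # hypothetical eigenvalues of rho
t = 1.0
esc = escort(probs, t)
# Dropping the integer constraint, the t-exponential optimum assigns
# "lengths" equal to -log_k of the escort probabilities:
ideal_lengths = [-math.log2(e) for e in esc]

# With these lengths, L_t meets the Renyi entropy H_{1/(1+t)} exactly:
assert abs(exponential_length(probs, ideal_lengths, t)
           - renyi_entropy(probs, 1 / (1 + t))) < 1e-9
```

Rounding the ideal lengths up to integers is what produces the "+1" slack in the bounds of theorem 6 below.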

Theorem 6. The t-exponential average length of the t-exponential optimal code is lower and upper bounded as follows:
$$H_\alpha(\rho) \le L_t^{\mathrm{opt}} < H_\alpha(\rho) + 1,$$
where $H_\alpha(\rho) = \frac{1}{1-\alpha} \log_k \mathrm{Tr}(\rho^\alpha)$, with $\alpha = \frac{1}{1+t}$, is the quantum Rényi entropy of the density operator of the source ρ.

We recall that our aim is to provide a scheme to address the problem of encoding the codewords of a quantum source allowing chains of variable length, but with a penalization of large codewords. This aim is achieved by appealing to definitions 6 and 7 and theorems 5 and 6. In particular, we can interpret theorem 6 as the quantum version of Campbell's source coding theorem 5. Hence, the quantum Rényi entropy plays a role similar to that of the von Neumann entropy in the standard quantum source coding theorem, when an exponential penalization is considered. Indeed, theorem 3 results as a particular case of our theorem 6 (in the limit $t \to 0$), recovering the results of Schumacher and Westmoreland 12. This situation is completely analogous to that of the classical setting, with regard to the roles played by the Rényi and Shannon measures in the cases with and without penalization, respectively. Consequently, this allows us to provide a natural operational interpretation for the quantum Rényi entropy in relation to the problem of lossless quantum data compression. Finally, notice that this is an alternative approach to that of Müller et al. 18, who studied an analogous problem, but minimizing the average of the individual base lengths of the source instead of considering a penalization of large codewords.
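Theorem 6 can be checked on a toy spectrum by building the Shannon-type code on the escort probabilities: its integer lengths satisfy the Kraft-McMillan inequality and its t-exponential average length brackets the quantum Rényi entropy (the spectrum is a hypothetical example):

```python
import math

# Eigenvalues of a hypothetical source density operator rho.
probs = [0.5, 0.25, 0.125, 0.125]
t = 1.0
alpha = 1 / (1 + t)

# Escort probabilities and the integer (Shannon-type) code built on them.
Z = sum(p ** alpha for p in probs)
escort = [p ** alpha / Z for p in probs]
lengths = [math.ceil(-math.log2(e)) for e in escort]
assert sum(2 ** -l for l in lengths) <= 1    # Kraft-McMillan holds

# t-exponential average length of this code vs the Renyi entropy H_alpha.
Lt = math.log2(sum(p * 2 ** (t * l) for p, l in zip(probs, lengths))) / t
Ha = math.log2(Z) / (1 - alpha)
assert Ha <= Lt < Ha + 1                     # the bounds of theorem 6
```

For this spectrum the escort code uses lengths 2, 2, 3, 3, giving $L_t \approx 2.32$ against $H_{1/2}(\rho) \approx 1.87$, comfortably inside the one-bit window.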
According to theorem 6, the quantum Rényi entropy of the source bounds the compression capacity when an exponential penalization is considered. As in the case without penalization, one can attain the lower bound in the case of K independent and identically prepared sources for large K. Thus, consider the density operator $\rho^{\otimes K}$ and denote by $L_t^{\mathrm{opt}}(K)/K$ the t-exponential optimal average length per source. Then, using the additivity of the quantum Rényi entropy, $H_\alpha(\rho^{\otimes K}) = K H_\alpha(\rho)$, one obtains that the optimal t-exponential length per source tends to $H_\alpha(\rho)$ as $K \to \infty$. Let us point out that the quantum Rényi entropy also appears naturally in the determination of the exponent of the average error of quantum fixed-length source coding 26,27, which is closely related to the Chernoff exponent appearing in classical discrimination problems. That exponent provides another interpretation of the quantum Rényi entropy. Our approach differs in that we study the role played by the quantum Rényi entropy in the problem of lossless quantum data compression with penalization.
As at the end of the previous section, we now discuss what happens with the t-exponential average codeword length when the encoding scheme is designed for a density operator τ, i.e., using the escort density operator $\tau^{(t)}$. In that case the t-exponential average codeword length of the quantum Shannon code corresponding to $\tau^{(t)}$ is again bounded, as follows.
Scientific Reports | 7: 14765 | DOI:10.1038/s41598-017-13350-y

Theorem 7. Let τ be a density operator whose diagonal form is $\tau = \sum_{i=1}^d \tau_i |\tau_i\rangle\langle\tau_i|$. Then the t-exponential average length $L_t^{\mathrm{Sh}}$ of the quantum Shannon code designed for the escort operator $\tau^{(t)}$ is bounded as follows:
$$H_\alpha(\rho) + S_\alpha(\rho \| \tau) \le L_t^{\mathrm{Sh}} < H_\alpha(\rho) + S_\alpha(\rho \| \tau) + 1,$$
where $S_\alpha(\rho \| \tau)$, with $\alpha = \frac{1}{1+t}$, is the quantum Rényi divergence. It is important to remark that this theorem provides an operational interpretation for the quantum Rényi divergence: $S_\alpha(\rho \| \tau)$ quantifies the deviation of the t-exponential average codeword length of the quantum Shannon code when the code is designed using an escort density operator which differs from the one associated with the source.
It would be desirable to have an expression indicating how the standard average and the base length of the t-exponential optimal code behave when an exponential penalization is considered. However, this is not possible in general, as there is no analytic formula for the individual codeword lengths in this case. An interesting alternative is to analyze how the standard average length of the quantum Shannon code is affected by an exponential penalization.
Theorem 8. Let $C_t$ be the quantum Shannon code designed for the escort density operator $\rho^{(t)}$, for which the classical codeword lengths are given by $\ell_i = \lceil -\log_k \rho_i^{(t)} \rceil$. The average length of this code is bounded as follows:
$$\frac{1}{1+t} S(\rho) + \frac{t}{1+t} H_\alpha(\rho) \le L^{C_t} < \frac{1}{1+t} S(\rho) + \frac{t}{1+t} H_\alpha(\rho) + 1,$$
with $\alpha = \frac{1}{1+t}$. Notice that the bounds are basically a convex combination of the von Neumann entropy (related to the minimum average length) and the Rényi entropy (related to the minimal t-exponential average length) of the source.
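The convex-combination form of the bound stems from the identity $\sum_i \rho_i \, (-\log_k \rho_i^{(t)}) = \alpha S(\rho) + (1-\alpha) H_\alpha(\rho)$ with $\alpha = 1/(1+t)$, which the following sketch (on a hypothetical spectrum) verifies numerically:

```python
import math

probs = [0.5, 0.25, 0.125, 0.125]   # hypothetical spectrum of rho
t = 1.0
alpha = 1 / (1 + t)

Z = sum(p ** alpha for p in probs)
escort = [p ** alpha / Z for p in probs]

# Ideal (non-integer) lengths of the Shannon code built on the escort state,
# averaged with the TRUE eigenvalues of rho:
ideal = [-math.log2(e) for e in escort]
L_ideal = sum(p * l for p, l in zip(probs, ideal))

S = -sum(p * math.log2(p) for p in probs)   # von Neumann entropy
H = math.log2(Z) / (1 - alpha)              # Renyi entropy of order alpha
mix = alpha * S + (1 - alpha) * H           # convex combination in theorem 8
assert abs(L_ideal - mix) < 1e-9
```

As t grows, the weight shifts from the von Neumann entropy toward the Rényi entropy, quantifying how the penalization trades average length for base length.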
On the contrary, for the base length, one can see that when $\rho_i$ is small enough, there exists a sufficiently large parameter t such that the base length of $C_t$ is smaller than that of the standard quantum Shannon code. In particular, the base length can be reduced by increasing t, at the price of increasing the average length. The optimal choice of the cost parameter depends on the particularities of the problem in question (e.g., the size of the quantum register, etc.). Finally, notice that in the exceptional case where all $-\log_k \rho_i^{(t)}$ are integers, the quantum Shannon code thus designed coincides with the t-exponential optimal code of theorem 5, and the lower bound above is achieved.

Discussion
We have addressed the problem of lossless quantum data compression. In particular, we have considered the case in which the encoding of large codewords is penalized. Our work can be regarded as a quantum version of Campbell's work 5. First, we have provided an expression for the optimal code in the case of exponential penalization (theorem 5) in terms of its classical counterpart 7,24,25. We have shown that this penalization affects the optimal code in such a way that the Rényi entropy of the source bounds the t-exponential average codeword length (theorem 6). As a corollary, in the limit of a large number of independent and identically prepared sources, we have found that the compression capacity equals the Rényi entropy of the source. Thus, the quantum Rényi entropy acquires a natural operational interpretation. In addition, we have found that a wrong description of the source produces an excess term in the bound of the average codeword length, which is related to the quantum Rényi divergence (theorem 7). Given that we recover the results of Schumacher and Westmoreland 12 when the penalization is negligible, our work can be seen as a generalization of theirs.
Finally, we have discussed how the average and base lengths of the quantum Shannon code behave in terms of the cost parameter, which is related to the penalization (theorem 8). Indeed, there is a tradeoff between these two quantities, in the sense that it is possible to reduce the base length, but with the side effect of increasing the average length, and vice versa.
It is worth noticing that our approach provides an alternative to that of Müller et al. 18, who studied an analogous problem, but minimizing the average of the individual base lengths of the source. Our results are complementary to theirs.

Methods
In this section, we give the proofs of all theorems.
Proof of theorem 1.

Proof. Notice first that the length observable Λ, restricted to the code subspace, is diagonal with integer eigenvalues $\ell_i$, so that $\mathrm{Tr}(k^{-\Lambda}) = \sum_i k^{-\ell_i}$, and the claim follows from the classical Kraft-McMillan inequality. □

Proof of theorem 2.
Proof. For an encoding scheme U with basis given by the eigenstates of ρ and classical code c, $\mathrm{Tr}(U \rho U^\dagger \Lambda) = \sum_i \rho_i \ell_i$ is the classical average length of the classical code c. Finally, one has to find the classical optimal code c, whose solution is well known in the literature and given by the Huffman code 21. □

Proof of theorem 3.
Proof. Let us first introduce the density operator $\tilde{\rho}$ with eigenvalues $k^{-\ell_i} / \sum_j k^{-\ell_j}$ in the eigenbasis of ρ; it is then straightforward to show that the lower bound follows from the non-negativity of the relative entropy $S(\rho \| \tilde{\rho})$. The upper bound immediately follows from this inequality and from the fact that the optimal average length is at most that of the quantum Shannon code, whose lengths can be computed by the algorithms proposed in 7,24,25. □

Proof of theorem 6.