Quantum tomography protocols with positivity are compressed sensing protocols

Characterising complex quantum systems is a vital task in quantum information science. Quantum tomography, the standard tool used for this purpose, uses a well-designed measurement record to reconstruct quantum states and processes. It is, however, notoriously inefficient. Recently, the classical signal reconstruction technique known as ‘compressed sensing’ has been ported to quantum information science to overcome this challenge: accurate tomography can be achieved with substantially fewer measurement settings, thereby greatly enhancing the efficiency of quantum tomography. Here we show that compressed sensing tomography of quantum systems is essentially guaranteed by a special property of quantum mechanics itself—that the mathematical objects that describe the system in quantum mechanics are matrices with non-negative eigenvalues. This result has an impact on the way quantum tomography is understood and implemented. In particular, it implies that the information obtained about a quantum system through compressed sensing methods exhibits a new sense of ‘informational completeness.’ This has important consequences on the efficiency of the data taking for quantum tomography, and enables us to construct informationally complete measurements that are robust to noise and modelling errors. Moreover, our result shows that one can expand the numerical tool-box used in quantum tomography and employ highly efficient algorithms developed to handle large dimensional matrices on a large dimensional Hilbert space. Although we mainly present our results in the context of quantum tomography, they apply to the general case of positive semidefinite matrix recovery. Researchers in the USA have demonstrated a connection between the reconstruction of quantum states and classical signal reconstruction techniques. A team led by Amir Kalev at the University of New Mexico found that the fact that quantum states describe probability distributions means that reconstruction methods for quantum states all belong to the family of compressed sensing protocols. The reconstruction of quantum states using quantum tomography becomes increasingly complex for systems with a large number of degrees of freedom. To reduce the complexity of information retrieval for practical applications, compressed sensing techniques are employed, as they are capable of signal reconstruction based on incomplete information and thus require fewer experimental measurements. The finding of Kalev’s team unifies quantum tomography methods previously considered to be distinct, and consequently could lead to better measurement protocols.


I. INTRODUCTION
Determining an unknown signal from a set of measurements is a fundamental problem in science and engineering.In the context of quantum information science, the "signals" we seek to reconstruct are quantum states and processes, and the protcol for reconstruction is known as quantum tomography (QT).In QT, we say that a measurement is informationally complete (IC) [1][2][3][4] if it is possible to uniquely determine the state (or the process) from the probabilities of the measurement outcomes.The notion of IC is related to the concept of identifiability in the theory of system identification [5].Because the standard protocols for QT scale poorly (growing as some power of the total Hilbert space dimension, which in turn grows exponentially with the number of subsystems) there has been a concentrated effort to develop techniques that minimize the resources necessary for tomography.To this end, the methodology of compressed sensing (CS) has been ported from classical signal recovery [6][7][8][9][10][11][12][13][14] and has been applied to the problem of QT [15][16][17][18][19][20][21][22][23][24].CS theory applies under the assumption that there is an a priori knowledge that the signal has a concise representation, e.g., that it is a sparse vector with a few nonzero elements or a low-rank matrix with a few nonzero singular values.Then, according to the CS methodology, one can reconstruct the signal with very high accuracy with a substantially reduced number of measurement settings when compared to a general signal with no known structure.
The CS methodology implicitly relies on the notion of IC to allow faithful recovery of a signal.However, for QT the detailed nature of the relation between measurements in CS and IC measurements has not been made explicit.In this work, we establish rigorously the connection between CS and IC through a key feature that arises in the quantum context: The density matrix (or the process matrix) is positive semidefinite.For brevity, we refer to this property as "positivity."We show that the positivity of the signal matrix implies a strict relationship between the measurements employed to achieve CS and an IC measurement record in QT.This relation has far reaching consequences for QT.First, tools provided by the CS methodology enable the construction of special types of IC measurements, which additionally are robust to noise and to small model imperfections manifesting as approximate sparsity and/or a few large eigvenvalues.Secondly, standard algorithms can then be employed that efficiently analyze data for QT from large dimensional systems without the need for specialized CS solvers.

II. BRIEF REVIEW OF QT AND CS
In quantum theory, a measurement is represented by a POVM, E = {E µ |E µ ≥ 0, µ E µ = 1}.In the context of quantum-state tomography (QST), a POVM is said to be IC, if [4], holds for all ρ b .In words, no two distinct states ρ a and ρ b yield the same measurement outcome probabilities, and thus a (noise-free) measurement record uniquely determines the state of the system.In general, for a ddimensional Hilbert space an IC POVM contains at least d 2 elements.Examples include the set of orthogonal bases associated with d 2 − 1 Hermitian observables {O i } that span the Lie algebra of SU(d), d+1 mutual unbiased bases [25], and SIC POVMs [26].The latter two are cases in which the POVMs are also tight frames: the closest to being orthonormal operator bases for the set of quantum states, with SIC POVMs being the unique minimal set of rank-one operators [26].While Eq. ( 1) gives a general formulation of the notion of IC measurements, if one has prior information about the state of the system, a different, more specific notion of IC measurements can be defined [27,28].If the state is known a priori to be of a special class, P, e.g., the class of density matrices of at most rank r, then one can consider a restricted notion of IC measurements [27].A P restricted-IC POVM uniquely identifies a quantum state within the subset P, but cannot necessarily uniquely identify it, say, from the set of all physical quantum states.More formally, in the context of QST, a POVM is said to be P restricted-IC, if [27], holds for all ρ b ∈ P. The motivation for considering P restricted-IC measurements is that they are composed of fewer POVM elements compared to IC POVMs.For example, Heinosaari et.al [27] showed that when P is the set of density matrices of at most rank r, then P restricted-IC POVMs, i.e., rank-r restricted-IC POVMs, are composed of O(rd) elements, rather than O(d 2 ) elements of IC POVMs.
The CS methodology also employs prior information to reduce the number of measurements required to reconstruct an unknown signal.Consider a d × d Hermitian matrix, M , and assume that the measurement record is specified as a vector-valued linear map, , where A is known as the sensing map.In general, when the set {A i } forms a matrix frame with d 2 elements [3] (e.g., an orthonormal basis for the space of d × d Hermitian matrices), then the measurement record is IC, and in the absence of measurement noise the signal can be recovered uniquely.However, according to the CS methodology, we can substantially reduce the number of measurement settings from which we sample, if it is a priori known that rank(M ) ≤ r, with r d.From m d 2 samples {y i }, and a suitable choice of {A i }, one can recover the matrix by solving the convex optimization, [13,14], Here, the nuclear (or trace) norm, M * = Tr √ M † M , serves as the convex proxy for rank minimization.The key requirement for CS is that the sensing map, A, satisfies the restricted isometry property (RIP) [9].The restricted isometry constant δ r is defined as the smallest number such that holds for all matrices with rank ≤ r, where M F = Tr(M † M ).Loosely speaking, when δ r is small, the sensing map A acts almost like an isometry when applied to rank ≤ r matrices: almost preserving norm and orthogonality.

III. THE RELATION BETWEEN CS AND IC
The relation between the recovery of the signal matrix and the RIP is codified in a theorem by Candès et al. [14], restated as follows: Theorem [CS].Let P r be the set of Hermitian matrices with rank ≤ r.Assume y = A[M 0 ] for some M 0 ∈ P r , where A satisfies the restricted isometry property with constant δ 4r < Importantly for our discussion, the set {M | A[M ] = y} generally contains an infinite number of elements.The above CS theorem states that among all of the elements in this set, there is a single element that has rank ≤ r.Thus, the CS theorem implies that the measurement record y = A[M 0 ] is not IC in the sense of Eq. ( 1), but rather rank-r restricted-IC in the sense of Eq. ( 2).Therefore, to reconstruct M 0 we must search only within the restricted set of matrices, P r .This is generally an NP-hard problem, but the CS methodology ensures that if δ 4r is small enough, then instead of looking for the solution within the set P r , we can instead solve the convex program, Eq. ( 3).The nuclear norm minimization serves as a convex heuristic of rank minimization.To recover the signal, it is therefore essential to minimize the nuclear norm.Minimizing any other convex function would not guarantee a search within the restricted set P r , and since the measurement record is not strictly-IC, we would not be guaranteed to find M 0 [29].For example, the convex programs arg min M Tr(M ) s.In what follows, we specialize the CS paradigm to the case of QST.There, the aim is to recover the state of the system, ρ, which has the key property of positivity, ρ ≥ 0. As we shall see, this constraint has profound theoretical and practical implications in conjunction with the CS methodology.Our results are derived from the following theorem: Theorem 1.Let H be the set of all positive semidefinite Hermitian matrices.Assume p = A[ρ 0 ] for some ρ 0 ∈ P r ∩H.If A satisfies the restricted isometry property with constant δ 4r < This is an analogous Theorem to the one presented by Bruckstein et al. [30] for the case of positive sparse vector solutions for an underdetermined set of linear equations.Its proof is given in Appendix A. It also extends a result by Candès et al. [31] from rank-1 matrices to matrices with rank ≤ r for all permissible r.The key difference between Theorem 1 and the standard CS theorem is that the RIP now guarantees that there is a unique solution within the set of all positive Hermitian matrices, not solely within the restricted set of low rank matrices.This difference has important consequences that we elaborate now.

A. Strictly-IC POVMs and CS
In general the RIP of A implies that the measurement recored is rank-r restricted-IC.Therefore, for the recovery of arbitrary (not necessarily positive) matrices, it was essential to search for the solution within the subset of matrices with rank ≤ r.Theorem 1 states that if the matrix to be estimated is positive (e.g., a density matrix), then the RIP of A implies that there is no other density matrix of any rank which yields the measurement probabilities A[ρ 0 ] = p, except ρ 0 , provided that: (i) the rank(ρ 0 ) ≤ r and (ii) δ 4r < √ 2−1.This Theorem, therefore, suggests that for positive matrix recovery, the RIP of A corresponds to a rank-r strictly-IC POVM.In this context, we call a POVM E "rank-r strictly-IC POVM" if Eq. ( 1) holds for all density matrices ρ b of at most rank r.Note that in this definition ρ a can be of any rank.A similar notion of IC POVM is also studied in [28] Geometrically, Theorem 1 states that the rankdeficient subset of the positive matrices cone is "pointed" [31].Therefore, under the promise that rank(ρ 0 ) ≤ r and A satisfies the RIP with δ 4r < √ 2 − 1, the space of matrices ρ that satisfy A[ρ] = p and the cone of positive matrices intersect in a single point ρ = ρ 0 .
Theorem 1 has a few important practical implications for CS applications in QT in general and QST in particular.First, because the measurement is rank-r strictly-IC, the solution set contains only one matrix in the set of all positive matrices.That means that we can use any optimization program to search for this single element, and by Theorem 1, we are guaranteed to find it.Thus, we have the following result: Given p = A[ρ 0 ], with rank(ρ 0 ) ≤ r, A satisfies the RIP with δ 4r < √ 2−1, then the solution to ρ = arg min ρ C(ρ) s.t.A[ρ] = p and ρ ≥ 0, (5) or to is unique, ρ = ρ 0 , where C(ρ) is a convex function of ρ, and • is any norm function.By confining the feasible set of matrices to positive matrices, we ensure the measurement is rank-r strictly-IC, and thus any convex function of ρ, or of the measurement error may serve as a cost function, not solely the convex heuristic of rank minimization, the nuclear norm (or 1 norm for the case of sparse vectors).In particular, this result suggests that the body of work of sophisticated and fast algorithms developed to handle large dimensional matrices can be employed for CS of QST (or process-maps) of large dimensional Hilbert space.Second, since for a positive matrix reconstruction RIP corresponds to rank-r strictly-IC, Theorem 1 implies that we can construct rank-r strictly-IC POVMs by means of RIP.For example, it was shown by Candès and coworkers [14] that a random Gaussian sensing map with or- The "number of measurement settings" refers to the number of samples used in CS or the number of elements in a POVM.Uniquely identifying a d×d (Hermitian or positive) matrix from the set of all permissible matrices requires an IC measurement record, typically composed of O(d 2 ) data points.If the Hermitian matrix is a priori known to be of rank ≤ r, then it can be uniquely identified from the set of all permissible matrices with rank ≤ r using a CS sensing map with m = O(rd poly(log d)) columns that satisfies the appropriate RIP condition.Such a map therefore corresponds to a rank-r restricted-IC measurement.On the other hand if the matrix is positive (with rank ≤ r), then CS measurement allows its unique identification from the set of all positive matrices.Such a CS measurement therefore corresponds to a rank-r strictly-IC measurement.
der m = O(rd poly(log d)) columns satisfies the appropriate RIP with overwhelming probability.Therefore, such maps correspond to rank-r strictly-IC measurements (with overwhelming probability) for positive signals.Moreover, in the context of a many-qubit system, Liu [32] showed that a set of expectation values of Pauli products, w = ⊗ n i=1 σ αi , where σ α ∈ {I, σ x , σ y , σ z }, when sampled and random with the same order of elements, satisfy the RIP with overwhelming probability.Therefore, this set of expectation values is, with high probability, rank-r strictly-IC measurement record.Similar results hold for sparse quantum process matrix reconstruction, e.g., it is shown in [17] that if the sensing map is constructed from random input states, and random observables, then the restricted isometry holds with high probability.The relation between RIP and the various types of IC measurements is illustrated in Fig. 1.

B. Robustness
So far, we have discussed the ideal case of a noiseless measurement record, where in the context of QT, p denoted a probability vector.The CS methodology, however, assures a robust reconstruction of the signal in the presence of noise.Our analysis inherits this crucial feature.In a realistic scenario, we allow for a noisy measurement record, f = A[ρ 0 ] + e, where we assume that the noise contribution can be bounded by some norm e ≤ .In the context of QT, we consider f to denote a vector of the observed frequencies of measurement out-comes corresponding to the POVM elements.
Theorem 1 ensures robust recovery of the positive density matrix if the noise level is small by solving any convex optimization problem.Under the assumptions of Theorem 1, any convex minimization problem that looks for a solution within the cone of positive matrices must yield a solution ρ such that ρ − ρ 0 ≤ g( ), where g( ) → 0 as → 0. From the geometrical point of view, when the noisy data arises from a rank-deficient state, as we gain data, there are fewer states that could have given rise to that data because the convex set of physical states is highly constrained near the point.In the idealized limit of perfect (noiseless) data, there in only one state compatible with the data.Therefore, qualitatively, we expect a "CS effect" no matter how we search for the solution whenever the data arises from low rank positive matrices.Quantitatively, of course, different heuristics may perform differently, yielding different estimations.Choosing the program to use depends, in part, on the noise model.For example, in Appendix B we derive a specific bound on ρ − ρ 0 F , where ρ is the solution for the nonnegative least-squares (LS) program We find that where C 0 and C 1 depend only on the RIP constant δ 4r and Tr[(ρ) c ] equals the sum of the smallest d − r eigenvalues of the estimated density matrix ρ.Note that as → 0, Tr[(ρ) c ] → 0. This bound suggests that we may regard the nonnegative-LS program as a CS heuristic.
Though we have considered above the case rank(ρ 0 ) ≤ r, the CS methodology is not restricted to rank-deficient signal matrices, see e.g., [14].The CS theory ensures the robust recovery of the dominant rank r part of the density matrix.Our analysis also shares this important and nontrivial property.Lemma 3 (given in Appendix B) is the root of this feature.
Note, nowhere have we constrained the normalization of the state to be Trρ = 1.In the noiseless case this was unnecessary, as the solution is ρ = ρ 0 , i.e., a normalized state.In the case of a noisy measurement record, one can include this normalization constraint in the optimization, but it is generally unnecessary.In fact, sometimes one can actually improve the robustness to noise by choosing Trρ = 1, as we discuss below.In general, the output of the optimization should then be renormalized to give the final estimate.
Recently, Gross and coworkers [16,18,32] have studied the problem of QST for a system of n qubits.It was shown [16,18,32] that m = O(rd poly(log d)) expectation values of random Pauli observables satisfy the RIP with the appropriate constant with high probability.If these expectation values are obtained through measurements in Pauli bases, i.e., local projective measurements on individual qubits in the eigenbasis of the Pauli observables, then, in fact, we obtain much more information, as we also obtain the frequency of occurrence of each of the POVM elements, E µ = ⊗ n i=1 P αi , where µ indexes the series of α i , α = x, y, z, and This in turn implies that m = O(rd poly(log d)) measurement outcomes in random Pauli bases correspond to a rank-r strictly-IC POVM.Since from the frequencies of the POVM outcomes one can calculate more than m expectation values, we expect that we can obtain a rank-r strictly-IC POVM using substantially fewer than m = O(rd poly(log d)) measurements of random Pauli bases.This further reduction in resources for QST has practical import for QST of large dimensional systems.
Gross et.al [16] obtained a CS version of QST by solving the minimization problem, which is equivalent to minimizing the nuclear norm of ρ under the same constraints.As noted above, minimizing the trace of the matrix in the absence of the positivity constraint is not equivalent to minimizing the nuclear norm, and therefore, would not achieve CS.While both Eq. ( 7) and Eq. ( 8) are CS programs, in general they return different estimations.However, in Appendix C we show that the constrained LS program, is formally equivalent to the nuclear norm minimization of Eq. ( 8) for a given value of t.This fact was observed empirically in a recent experiment by Smith et al, [21], in which QST via continuous measurement was achieved at a equivalent rate by both LS and trace minimization, with the positivity constraint included.The difference between the final estimate was attributed to a difference in the robustness of the two estimators to noise.Since Eqs. ( 8) and ( 9) are formally equivalent, the noisy measurement can be equivalently accommodated by solving ( 9) with a choice of t that depends on the noise bound .As always, we renormalize to obtain the final density matrix.

IV. NUMERICAL ANALYSIS
To exemplify the implication of Theorem 1, we perform numerical experiments on an n-qubit system.In our numerical experiments, we simulate independent measurements of random Pauli bases on a Haar-random pure state of dimension d = 2 n , ρ 0 = |Ψ 0 Ψ 0 |.The measurement record, given by the frequency of outcomes, f , is generated by sampling N rep times from the probability distribution p = Tr(Eρ 0 ), where E is the vector of POVM elements, each corresponding to a tensor product of projectors onto the eigenbasis of Pauli observables.Eq. ( 7) Eq. ( 8) ML FIG. 2. The average infidelity distance between the solution of different estimators and a Haar-random three qubit pure state, as a function of the number of Pauli bases.The estimations are obtained by solving three different heuristics within the cone of positive matrices: (i) the program of ( 7), (ii) the program of (8), and (iii) the maximum-(log)likelihood procedure (using the algorithm described in [33]).In the figure inset we consider the ideal noiseless case where the measurement record is p, while in the main plot we simulate noisy measurement record.
The measurement record is then used in various estimators.We measure the performance by the average infidelity over 10 random pure states, 1 − Ψ 0 |ρ|Ψ 0 .
In Fig. 2, we simulate measurements on a three qubit system, d = 8, and use four different estimators, restricted to the cone of positive matrices.In the inset we show the idealized case where the "measurement record" is the vector of probabilities p, while in the main plot includes statistical noise, and the the measurement record corresponds to frequency of outcomes for N rep = 200 repetitions.The plots clearly show that once restricted to the positive cone, the performance of the different estimators is qualitatively the same.When the number of Pauli bases satisfy the appropriate RIP, and thus corresponding to a rank-1 strictly-IC POVM, the various estimators find the exact state in idealized case of a noiseless measurement record, and find a good estimation close the state of the system in the presence of statistical noise.
In Fig. 3, we treat a large dimensional Hilbert space: a ten qubit system, d = 2 10 = 1024.We simulate 30 random Pauli bases of a Haar-random pure state with N rep = 100d repetitions for each observable.We estimate the state by solving Eq. ( 7) with a convex optimization program that can efficiently handle such large dimensional data sets.In the plot we see the CS effect due to the positivity constraint -all the information is captured in about 28 random Pauli bases.

V. SUMMARY
We have established a rigorous connection between the measurements employed in CS that satisfy RIP and IC measurements used in QT.CS relies on the a priori knowledge that the (signal) matrix has rank ≤ r (or close to such a matrix) as well as the RIP of the sensing map to ensure exact recovery of the unknown matrix in the ideal noiseless case.In the presence of noise, the RIP also ensures a robust estimation of the signal.The estimation is obtained by solving a particular convex proxy program for rank minimization.Therefore, given CS measurements, it is essential to look for the solution within the set of matrices with rank ≤ r.Hence, in general, CS measurement corresponds to rank-r restricted-IC measurement.We showed that when considering the problem of QT, because the physical matrix to be reconstructed is positive, the RIP of the sensing map implies a stronger notion: rank-r strictly-IC.This relation has two important implications.First, it suggests that solving any optimization program, be it LS minimization, trace minimization, or maximum likelihood, is effectively preforming CS, as long as the search is confined to the feasible set of positive matrices.Therefore, from a practical perspective one can achieve the CS effect with any efficient convex optimization, such as nonnegative-LS, using state-of-the art algorithms developed to handle large dimensional matrices.Second, the RIP enable us to construct strictly-IC POVMs that are robust against noise.That is, when there is measurement noise and/or the assumption that the rank ≤ r is not satisfied (but the density matrix is only close to a density matrix with rank ≤ r), it is guaranteed that the estimation will be close, in some distance measure, to the unknown matrix.
Though we have presented our results in the context of QST, they are general and apply to the case of positive sparse vectors and positive rank-deficient matrices.Lemma 3, is somewhat different than Lemma 3.2 proved in [14].However the proof of Lemma 3.2 applies directly to Lemma 3.
An important special case of this Lemma 3 is for a signal matrix M 0 with rank(M 0 ) ≤ r, that satisfies A(M 0 ) − f 2 ≤ .For this case, M c = 0, and therefore M − M 0 F ≤ C 0 . (B3) We are now ready to bound ρ − ρ 0 F .Using the triangle inequality and the result of Eq. (B3) we get where ρ * is the solution for Eq.(B1), and 2)δ4r .To bound ρ−ρ * F we use the result of Lemma 3 which give an upper bound on ρ − ρ * F .The only assumption regarding ρ that entered the proof of Lemma 3 is that it is a feasible matrix, A(ρ) − f ≤ .However, ρ is a feasible matrix for the problem of (B1) since by its definition it minimizes A(•)−f .Therefore, necessarily, A(ρ) − f ≤ .
t. y = A[M ] and arg min M y − A[M ] 2 , do not achieve CS, and will generally require m ∼ d 2 samples to yield M 0 .

FIG. 1 .
FIG.1.Venn diagram illustrating of the relation between the sensing map used for CS that satisfies the RIP, and the various types of IC POVMs used for the recovery of a general d×d Hermitian matrix (left) and of positive matrix (right).The "number of measurement settings" refers to the number of samples used in CS or the number of elements in a POVM.Uniquely identifying a d×d (Hermitian or positive) matrix from the set of all permissible matrices requires an IC measurement record, typically composed of O(d 2 ) data points.If the Hermitian matrix is a priori known to be of rank ≤ r, then it can be uniquely identified from the set of all permissible matrices with rank ≤ r using a CS sensing map with m = O(rd poly(log d)) columns that satisfies the appropriate RIP condition.Such a map therefore corresponds to a rank-r restricted-IC measurement.On the other hand if the matrix is positive (with rank ≤ r), then CS measurement allows its unique identification from the set of all positive matrices.Such a CS measurement therefore corresponds to a rank-r strictly-IC measurement.

FIG. 3 .
FIG.3.The average infidelity distance between the solution of the heuristic of (7) and a Haar-random ten qubit pure state, as a function of the number of Pauli bases (30 in total).The error bars denote standard deviation.