Implementation of quantum measurements using classical resources and only a single ancillary qubit

We propose a scheme to implement general quantum measurements, also known as Positive Operator Valued Measures (POVMs), in dimension $d$ using only classical resources and a single ancillary qubit. Our method is based on the probabilistic implementation of $d$-outcome measurements, followed by postselection of some of the received outcomes. We conjecture that the success probability of our scheme is larger than a constant independent of $d$ for all POVMs in dimension $d$. Crucially, this conjecture implies the possibility of realizing an arbitrary nonadaptive quantum measurement protocol on a $d$-dimensional system using a single auxiliary qubit with only a \emph{constant} overhead in sampling complexity. We show that the conjecture holds for typical rank-one Haar-random POVMs in arbitrary dimensions. Furthermore, we carry out extensive numerical computations showing success probability above a constant for a variety of extremal POVMs, including SIC-POVMs in dimension up to 1299. Finally, we argue that our scheme can be favourable for the experimental realization of POVMs, as noise compounding in circuits required by our scheme is typically substantially lower than in the standard scheme that directly uses Naimark's dilation theorem.

Quantum measurements recover classical information stored in quantum systems and, as such, constitute an essential part of virtually any quantum information protocol. Every physical platform has its native measurements that can be realized with relative ease. In many cases, the class of easily implementable measurements contains projective (von Neumann) measurements. However, there are numerous applications [1][2][3][4][5][6][7][8][9] in which more general quantum measurements, so called Positive-Operator-Valued Measures (POVMs), need to be implemented. Implementation of these measurements requires additional resources. A recent generalization [10] of Naimark's dilation theorem [11] showed that the most general measurement on N qubits requires N auxiliary qubits, when projective measurements can be implemented on the combined system in a randomized manner.
From the perspective of implementation in near-term quantum devices [12], it is desirable to implement arbitrary POVMs with fewer resources. In particular, one would like to reduce the number of auxiliary qubits needed to implement a complex quantum measurement. A related problem is to quantify the relative power that generalized measurements in $d$-dimensional quantum systems have with respect to projective measurements in the same dimension. POVMs appear as natural measurements for a variety of quantum information tasks: quantum state discrimination [13], quantum tomography [14][15][16], multiparameter metrology [17,18], randomness generation [19], entanglement [20] and nonlocality detection [21], hidden subgroup problem [22,23], and port-based teleportation [24][25][26], to name just a few. It is, however, not clear in general what quantitative advantage the more complex measurements offer over their simpler projective counterparts. This is because non-projective quantum measurements can be realized via randomization and post-processing of simpler measurements [10][27][28][29][30][31][32]. Specifically, taking convex combinations of projective measurements can result in the implementation of a priori quite complicated nonprojective POVMs [10,32].
In this work we advance the understanding of the relative power of projective and generalized measurements by focusing on a simpler problem, namely the relation between $d$-outcome POVMs and general POVMs (with an arbitrary number of outcomes) acting on a $d$-dimensional Hilbert space $\mathcal{H} \approx \mathbb{C}^d$. We find strong evidence that general quantum measurements do not offer an asymptotically increasing advantage over $d$-outcome POVMs for general quantum state discrimination problems [13] as $d$ tends to infinity. Specifically, we generalize the method of POVM simulation from [32], based on randomized implementation of restricted-class POVMs followed by post-processing and postselection (defined later, see also Fig. 1). Here by postselection we mean disregarding certain measurement outcomes and accepting only the selected ones. In [32] it was shown that postselection allows one to implement an arbitrary POVM on $\mathbb{C}^d$ using only projective measurements and classical resources. This, however, comes at a cost: the method outputs a sample from a target quantum measurement with success probability $q_{\mathrm{succ}} = 1/d$. In this work we find that, surprisingly, there exists a protocol that simulates a very broad class of POVMs on $\mathbb{C}^d$ via $d$-outcome POVMs and postselection with success probability $q_{\mathrm{succ}}$ above a constant that is independent of the dimension $d$. Importantly, our construction ensures that the $d$-outcome POVMs used in the simulation can be implemented using projective measurements in a Hilbert space of dimension $2d$. Therefore, our method gives a way to implement quantum measurements on $\mathbb{C}^d$ using only a single auxiliary qubit and projective measurements, with constant success probability. We note that there exist schemes implementing arbitrary POVMs on $\mathbb{C}^d$ using a sequence of von Neumann instruments (i.e., a description of quantum measurements which includes the post-measurement state of the system) on a system extended by a single auxiliary qubit [33,34].
Our method is potentially simpler to implement as, in a given round of the experiment, only a single projective measurement has to be realized on the extended system and post-measurement states need not be considered.
While we do not prove that the success probability $q_{\mathrm{succ}}$ of our scheme is lower bounded by a dimension-independent constant for all POVMs on $\mathbb{C}^d$, we give strong evidence that this is indeed the case. First, we prove that for generic $d$-outcome Haar-random rank-one POVMs in $\mathbb{C}^d$ [35] the success probability is above 6.5% (numerically we observe ≈ 25%). We also support our conjecture by numerically studying specific examples of symmetric informationally complete POVMs (SIC-POVMs) [36][37][38] and a class of nonsymmetric informationally complete POVMs (IC-POVMs) [39], both for dimensions up to 1299. As the dimension increases, we observe that the success probability $q_{\mathrm{succ}}$ for both SIC-POVMs and IC-POVMs is ≈ 1/5. Importantly, if true, our conjecture implies that any non-adaptive measurement protocol can be realized using only a single ancillary qubit with a sampling overhead that does not depend on the system size.
Finally, our scheme gives a possibility of more reliable implementation of complicated POVMs in noisy quantum devices. To support this claim, we employ the noise model used in Google's recent demonstration of quantum computational advantage [40]. We make the following comparison between our method and the standard Naimark's scheme of POVM implementation: for implementing typical random POVMs on N qubits, the fidelity of circuits which implement our scheme is exponentially higher than for Naimark's implementation. This is due to the lower number of ancillary qubits required.
Preliminaries-We start by introducing notation and the concepts necessary to explain our POVM implementation scheme. We will be studying generalized quantum measurements on a $d$-dimensional Hilbert space $\mathcal{H} \approx \mathbb{C}^d$. An $n$-outcome POVM is an $n$-tuple of linear operators on $\mathbb{C}^d$ (usually called effects), i.e., $\mathbf{M} = (M_1, M_2, \ldots, M_n)$, satisfying $M_i \geq 0$ and $\sum_{i=1}^{n} M_i = \mathbb{1}$, where $\mathbb{1}$ is the identity on $\mathbb{C}^d$. A POVM $\mathbf{P}$ is called projective if all its effects satisfy the following relations: $P_i P_j = \delta_{ij} P_i$. Measurement of $\mathbf{M}$ on a quantum state $\rho$ results in a random outcome $i$, distributed according to the Born rule $p(i|\rho, \mathbf{M}) = \mathrm{tr}(\rho M_i)$. We will denote the set of all $n$-outcome POVMs by $\mathcal{P}(d, n)$. The set $\mathcal{P}(d, n)$ is convex [30]: for $\mathbf{M}, \mathbf{N} \in \mathcal{P}(d, n)$ and $p \in [0, 1]$ we define $p\mathbf{M} + (1 - p)\mathbf{N}$ to be an $n$-outcome POVM with the $i$-th effect given by $[p\mathbf{M} + (1 - p)\mathbf{N}]_i = pM_i + (1 - p)N_i$. A convex mixture $p\mathbf{M} + (1 - p)\mathbf{N}$ can be operationally interpreted as a POVM realized by applying, in a given experimental run, measurements $\mathbf{M}$, $\mathbf{N}$ with probabilities $p$ and $1 - p$ respectively. A POVM $\mathbf{M} \in \mathcal{P}(d, n)$ is called extremal if it cannot be decomposed as a nontrivial convex combination of other POVMs.
Another classical operation that can be applied to POVMs is classical post-processing [29,41]: given a POVM $\mathbf{M}$, we obtain another POVM $\mathcal{Q}(\mathbf{M})$ by probabilistically relabeling the outcomes of the measurement $\mathbf{M}$. Effects of $\mathcal{Q}(\mathbf{M})$ are given by $\mathcal{Q}(\mathbf{M})_i = \sum_j q_{i|j} M_j$, where $q_{i|j}$ are conditional probabilities, i.e., $q_{i|j} \geq 0$ and $\sum_i q_{i|j} = 1$. Lastly, postselection, i.e., the process of disregarding certain outcomes, can be used to implement otherwise inaccessible POVMs. We say that a POVM $\mathbf{L} = (L_1, \ldots, L_n, L_{n+1})$ simulates a POVM $\mathbf{M} = (M_1, \ldots, M_n)$ with postselection probability $q$ if $L_i = qM_i$ for $i = 1, \ldots, n$. This nomenclature is motivated by the observation that when we implement $\mathbf{L}$, then, conditioned on getting one of the first $n$ outcomes, we obtain samples from $\mathbf{M}$. Thus, we can simulate $\mathbf{M}$ by implementing $\mathbf{L}$ and post-selecting on not observing outcome $n + 1$. The probability of successfully doing so is $q$, which means that a single sample of $\mathbf{M}$ is obtained by implementing $\mathbf{L}$ on average $1/q$ times. The reader is referred to [32] for a more detailed discussion of simulation via post-selection.
We will use $\|A\|$ to denote the operator norm of a linear operator $A$, and $[n]$ to denote the $n$-element set $\{1, \ldots, n\}$. Moreover, we will use $\mu_n$ to refer to the Haar measure on the $n$-dimensional unitary group $\mathrm{U}(n)$, and by $\mathbb{P}_{U \sim \mu_n}(\mathcal{A})$ we will denote the probability of occurrence of an event $\mathcal{A}$ according to this probability measure. Finally, for two positive-valued functions $f(x)$, $g(x)$ we will write $f = \Theta(g)$ if there exist positive constants $c, C > 0$ such that $cf(x) < g(x) < Cf(x)$ for sufficiently large $x$.
General POVM simulation protocol-The following theorem gives a general lower bound on the success probability of simulation of $n$-outcome POVMs via measurements with a bounded number of outcomes and postselection.
Theorem 1. Let $\mathbf{M} = (M_1, M_2, \ldots, M_n)$ be an $n$-outcome POVM on $\mathbb{C}^d$. Let $m \leq d$ be a natural number and let $\{X_\gamma\}_{\gamma=1}^{\alpha}$ be a partition of $[n]$ into disjoint subsets $X_\gamma$ satisfying $|X_\gamma| \leq m - 1$. Then, there exists a simulation scheme that uses measurements having at most $m$ outcomes, classical randomness and post-selection that implements $\mathbf{M}$ with success probability
$$q_{\mathrm{succ}} = \left( \sum_{\gamma=1}^{\alpha} \Big\| \sum_{i \in X_\gamma} M_i \Big\| \right)^{-1}. \qquad (1)$$
Furthermore, if $\mathrm{rank}\, M_i \leq 1$ and $m \leq d$, then the measurements realizing the scheme can be implemented by projective measurements in dimension $2d$, i.e., using a single auxiliary qubit.
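Proof. In what follows we give an explicit simulation protocol that generalizes an earlier result from [32,42] that concerned the case of simulation via dichotomic measurements ($m = 2$). The idea of the scheme is given in Fig. 1. We start by defining, for every element $X_\gamma$ of the partition, auxiliary measurements $\mathbf{N}^{X_\gamma}$, each having $n + 1$ outcomes, whose purpose is to "mimic" the measurement $\mathbf{M}$ for outputs belonging to $X_\gamma$ and to collect other results (i.e., those not belonging to $X_\gamma$) in the "trash" output labelled by $n + 1$. Writing $\lambda_\gamma = \| \sum_{i \in X_\gamma} M_i \|$, the effects of $\mathbf{N}^{X_\gamma}$ are defined by
$$N^{X_\gamma}_i = \lambda_\gamma^{-1} M_i \ \text{ for } i \in X_\gamma, \qquad N^{X_\gamma}_i = 0 \ \text{ for } i \in [n] \setminus X_\gamma, \qquad N^{X_\gamma}_{n+1} = \mathbb{1} - \sum_{i \in X_\gamma} N^{X_\gamma}_i.$$
We then define a probability distribution $\{q_{\mathrm{succ}} \lambda_\gamma\}_{\gamma=1}^{\alpha}$; its normalization fixes $q_{\mathrm{succ}} = (\sum_{\gamma} \lambda_\gamma)^{-1}$, in agreement with Eq. (1). The simulation of $\mathbf{M}$ is realized by considering a convex combination of the $\mathbf{N}^{X_\gamma}$ according to this distribution: $\mathbf{L} = \sum_{\gamma=1}^{\alpha} q_{\mathrm{succ}} \lambda_\gamma \mathbf{N}^{X_\gamma}$. An explicit computation shows that $L_i = q_{\mathrm{succ}} M_i$ for $i \in [n]$, and therefore $\mathbf{L}$ simulates the target measurement $\mathbf{M}$ with success probability $q_{\mathrm{succ}}$.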
Finally, each of the measurements $\mathbf{N}^{X_\gamma}$ comprising $\mathbf{L}$ has at most $|X_\gamma| + 1$ nonzero effects, and therefore they can be implemented with POVMs with at most $m$ outcomes. From the standard Naimark scheme of implementation of POVMs (cf. [11]) we see that the dimension needed to implement a POVM $\mathbf{N}^{X_\gamma}$ via projective measurements equals at most the sum of the ranks of the effects of $\mathbf{N}^{X_\gamma}$. In the case of rank-one $\mathbf{M}$ and $|X_\gamma| \leq m - 1$, this sum for each $\mathbf{N}^{X_\gamma}$ is at most $d + m - 1 \leq 2d$, which completes the proof.
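The construction used in the proof can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the normalization $\lambda_\gamma = \|\sum_{i \in X_\gamma} M_i\|$ implied by the probability distribution $\{q_{\mathrm{succ}} \lambda_\gamma\}$, and the function name `postselection_scheme` as well as the choice of the qubit trine POVM as a target are illustrative.

```python
import numpy as np

def op_norm(A):
    """Operator (spectral) norm: the largest singular value of A."""
    return np.linalg.norm(A, 2)

def postselection_scheme(effects, partition):
    """Build the auxiliary POVMs and mixing weights for a given partition.

    effects   -- list of n PSD d x d matrices summing to the identity
    partition -- list of disjoint index lists (0-based) covering range(n)
    Returns (q_succ, weights, aux); aux[g] is a list of n + 1 effects whose
    last entry is the "trash" outcome.
    """
    n, d = len(effects), effects[0].shape[0]
    # Assumed normalisation: lambda_gamma = || sum_{i in X_gamma} M_i ||.
    lambdas = [op_norm(sum(effects[i] for i in X)) for X in partition]
    q_succ = 1.0 / sum(lambdas)
    aux = []
    for X, lam in zip(partition, lambdas):
        N = [np.zeros((d, d), dtype=complex) for _ in range(n + 1)]
        for i in X:
            N[i] = effects[i] / lam
        N[n] = np.eye(d) - sum(N[i] for i in X)  # trash effect
        aux.append(N)
    weights = [q_succ * lam for lam in lambdas]
    return q_succ, weights, aux

# Illustrative target: the qubit "trine" POVM (three rank-one effects, d = 2).
angles = [k * np.pi / 3 for k in range(3)]
trine = [(2 / 3) * np.outer(v, v.conj())
         for v in (np.array([np.cos(a), np.sin(a)], dtype=complex) for a in angles)]

# m = d = 2 forces blocks of size at most m - 1 = 1.
q_succ, weights, aux = postselection_scheme(trine, [[0], [1], [2]])

# Effects of the mixture L = sum_gamma weights[gamma] * N^{X_gamma}.
L = [sum(w * N[i] for w, N in zip(weights, aux)) for i in range(4)]
print(round(q_succ, 6))
```

For the trine each block has norm 2/3, so the sketch reproduces the dichotomic ($m = 2$) value $q_{\mathrm{succ}} = 1/d$; larger blocks only become available for $m > 2$.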
Crucially, we recall that an arbitrary quantum measurement on $\mathbb{C}^d$ can be implemented by a convex combination of rank-one POVMs having at most $d^2$ outcomes followed by suitable post-processing [27,30]. This implies that our protocol facilitates the simulation of any POVM on $\mathbb{C}^d$ using only a single ancillary qubit: first by decomposing the target POVM into a convex combination of rank-one measurements with at most $d^2$ outcomes, and subsequently applying Theorem 1 to each of them.
Importantly, the standard Naimark implementation of a general POVM would require appending an extra system of dimension $d$ (which can be realised by $\log_2 d$ ancillary qubits) and carrying out a global projective measurement. Our simulation protocol greatly reduces this requirement on the dimension cost of implementing $\mathbf{M}$, with the possible downside being the probabilistic nature of the scheme. The success probability $q_{\mathrm{succ}}$ depends on the choice of the partition $\{X_\gamma\}_{\gamma=1}^{\alpha}$, and finding the optimal one (for a given bound on the size of $X_\gamma$) is in general a difficult combinatorial problem. In what follows we collect analytical and numerical results suggesting the following
Conjecture. For an arbitrary extremal rank-one POVM $\mathbf{M} = (M_1, \ldots, M_n)$ on $\mathbb{C}^d$, there exists a partition $\{X_\gamma\}_{\gamma=1}^{\alpha}$ of $[n]$ satisfying $|X_\gamma| \leq d - 1$ such that the corresponding value of the success probability $q_{\mathrm{succ}}$ from Eq. (1) is larger than a positive constant independent of $d$.
Let us explore the conceptual consequences of the validity of this conjecture. First, consider a general nonadaptive measurement protocol that utilizes some quantum measurement $\mathbf{M}$ on $\mathbb{C}^d$. Such a protocol consists of $S$ independent measurement rounds on a quantum state $\rho$, resulting in outcomes $i_1, i_2, \ldots, i_S$ distributed according to the probability distribution $p(i|\mathbf{M}, \rho) = \mathrm{tr}(M_i \rho)$. This experimental data is then processed to solve a specific problem at hand. If we can simulate an arbitrary $\mathbf{M}$ (see the comment below the proof of Theorem 1) via POVMs that can be implemented using only a single auxiliary qubit with probability $q$, which is independent of the dimension $d$, then we can, on average, exactly reproduce the implementation of the above protocol for $qS$ of the total $S$ rounds. Importantly, we also know in which rounds the simulated protocol was successful, so we know which part of the output data generated by our simulation comes from the target distribution. Crucially, the above considerations are completely oblivious to the figure of merit and the structure of the problem that measurements of $\mathbf{M}$ aim to solve.
For many quantum information tasks, losing only a constant fraction of the measurement rounds is not prohibitive and hence, assuming the validity of the conjecture, our POVM simulation scheme offers a way to significantly reduce quantum resources needed for said POVM's implementation. Such exemplary tasks include quantum state tomography [16], quantum state discrimination [13], multi-parameter quantum metrology [17,18] or port-based teleportation [24], and will be explored by us in future works.
Our simulation protocol and the above conjecture are also relevant from the perspective of POVM simulability [10,32,43], which has attracted a lot of attention recently in the context of resource theories [44][45][46][47][48][49][50]. Namely, the maximal post-selection probability $q^{(m)}(\mathbf{M})$, with which a target POVM $\mathbf{M}$ on $\mathbb{C}^d$ can be simulated using strategies utilizing randomized POVMs with at most $m$ outcomes, quantifies how far $\mathbf{M}$ is from the set of $m$-outcome simulable POVMs in $\mathbb{C}^d$, denoted by $\mathcal{S}_m$. Moreover, $q^{(m)}(\mathbf{M})$ imposes bounds on the so-called white noise critical visibility $t^{(m)}(\mathbf{M})$ [10] and the robustness $R^{(m)}(\mathbf{M})$ [44] against simulation via POVMs from $\mathcal{S}_m$. Here, by critical visibility we mean a parameter $t^{(m)}(\mathbf{M})$ associated with the minimal amount of white noise that ensures that the noisy version of $\mathbf{M}$ belongs to the subset $\mathcal{S}_m$. By robustness with respect to $\mathcal{S}_m$ we mean the minimal amount of mixing of $\mathbf{M}$ with a POVM from $\mathcal{S}_m$ so that the resulting POVM belongs to $\mathcal{S}_m$. The above quantities are bounded in terms of the success probability of our scheme via the inequalities of Eq. (4) (see Appendix A). Importantly, we note that the robustness $R^{(m)}(\mathbf{M})$ has an appealing operational interpretation: it is also expressible as the maximal relative advantage that $\mathbf{M}$ offers over any POVM in $\mathcal{S}_m$ for a state discrimination task [44] (Eq. (5)), where $\mathcal{E}$ is an ensemble of quantum states and $P_{\mathrm{succ}}(\mathcal{E}, \mathbf{M})$ is the success probability of minimum-error discrimination of the states from $\mathcal{E}$ with $\mathbf{M}$. Now, from the second inequality in (4) and the (conjectured) constant lower bound on $q^{(d)}$ we get a surprising conclusion: general POVMs on $\mathbb{C}^d$ do not offer an asymptotically increasing (with $d$) advantage over $d$-outcome simulable measurements for general quantum state discrimination problems.
Haar Random POVMs-We want to qualitatively understand how $q_{\mathrm{succ}}$ depends on the total number of outcomes $n$, the number $m$ of POVM outcomes used in the simulation, and the dimension $d$. To make the problem feasible we turn to study Haar-random POVMs on $\mathbb{C}^d$. Quantum measurements comprising this ensemble can be realized by a construction motivated by Naimark's extension theorem: (i) attach to $\mathbb{C}^d$ an ancillary system $\mathbb{C}^a$ so that the composite system is $n$-dimensional: $\mathbb{C}^d \otimes \mathbb{C}^a \approx \mathbb{C}^n$, (ii) apply on this composite system a random unitary $U$ chosen from the Haar measure $\mu_n$ on $\mathrm{U}(\mathbb{C}^n)$, and (iii) measure the composite system in the computational basis. We denote the resulting measurement by $\mathbf{M}^U$; its effects are determined by $U$. Haar-random POVMs were first introduced in [23] in the context of the hidden subgroup problem and are a special case of a more general family of random POVMs studied recently in [35]. Measurements $\mathbf{M}^U$ are extremal for almost all $U \in \mathrm{U}(n)$. Furthermore, all extremal rank-one POVMs in $\mathbb{C}^d$ are of the form $\mathbf{M}^U$ for some $U \in \mathrm{U}(n)$ and $n \in \{d, d + 1, \ldots, d^2\}$. Hence, Haar-random POVMs form an ensemble consisting of extremal non-projective measurements, making them a natural test-bed for studying the performance of our simulation algorithm.
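Steps (i)-(iii) can be sketched numerically as follows. This is an illustrative sketch under stated assumptions: the Haar unitary is sampled by the standard QR decomposition of a complex Gaussian matrix with a phase correction, and the effects are taken to be $M^U_i = |v_i\rangle\langle v_i|$ with $v_i$ the $i$-th column of the first $d$ rows of $U$ (a convention chosen here to match the matrix $U_X$ used in the proof sketch; the ancilla is assumed to start in a fixed basis state). The function names are hypothetical.

```python
import numpy as np

def haar_unitary(n, rng):
    """Sample an n x n unitary from the Haar measure (QR of a Ginibre matrix)."""
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, R = np.linalg.qr(Z)
    # Fix the phase ambiguity of QR so that the distribution is exactly Haar.
    phases = np.diag(R) / np.abs(np.diag(R))
    return Q * phases  # multiplies column j of Q by phases[j]

def haar_random_povm(d, n, rng):
    """n-outcome rank-one POVM on C^d built from the first d rows of U."""
    V = haar_unitary(n, rng)[:d, :]          # d x n, rows are orthonormal
    return [np.outer(V[:, i], V[:, i].conj()) for i in range(n)]

rng = np.random.default_rng(1)
d, n = 3, 9                                   # d^2-outcome POVM on a qutrit
effects = haar_random_povm(d, n, rng)
total = sum(effects)                          # should equal the identity on C^d
```

The effects sum to the identity because the first $d$ rows of a unitary are orthonormal, so $\sum_i |v_i\rangle\langle v_i| = V V^\dagger = \mathbb{1}$.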
Moreover, let $q^{(m)}(\mathbf{M}^U)$ be the maximal success probability of implementing $\mathbf{M}^U$ with postselection via a convex combination of $m$-outcome measurements using any simulation protocol. We then have the bound of Eq. (7). The above result shows that when simulating Haar-random POVMs on $\mathbb{C}^d$ with $m$-outcome measurements in our scheme, the success probability scales as $m/d$. Furthermore, Eq. (7) shows the optimality of our method up to a factor logarithmic in $d$. Specifically, we obtain the following crucial result: when $m = d$, with overwhelming probability over the choice of the random POVM $\mathbf{M}^U$, the success probability $q^{(d)}_{\mathrm{succ}}(\mathbf{M}^U)$ is above 6.74%. Below we sketch the proof of Theorem 2. We provide a complete proof in Appendix C, with expressions for finite $d$ for the bounds in Eqs. (6) and (7).

Sketch of Proof. An explicit computation shows that for any subset $X \subset [n]$ we have $\| \sum_{i \in X} M^U_i \| = \| U_X \|^2$, where $U_X$ is a $d \times |X|$ matrix obtained by choosing the first $d$ rows of $U$ and then taking from the resulting matrix those columns with indices in $X$. With this we analyze the statistical behaviour of $q_{\mathrm{succ}}(\mathbf{M}^U)$ in the regime $d \to \infty$ using tools from random matrix theory. Specifically, the proof relies on the phenomenon of concentration of measure [51] on the unitary group $\mathrm{U}(n)$ equipped with the Haar measure and the distance induced by the Hilbert-Schmidt norm. It shows that as $n \to \infty$, Lipschitz-continuous random variables on $\mathrm{U}(n)$ are with high probability close to their Haar averages. This is captured by large deviation bounds (also known as concentration inequalities), which upper bound the probability that a random variable takes values drastically different from its Haar average.
In order to prove Eq. (6), we choose $\| U_X \|$ as the random variable to which we apply the machinery of concentration of measure. An upper bound on its Haar average is obtained by performing a discrete optimization over an $\epsilon$-net of an $(m - 1)$-dimensional complex sphere. Since the concentration inequality is true for all subsets $X$ in the partition of $[n]$, the union bound shows that $\sum_{X} \| \sum_{i \in X} M^U_i \|$ also exhibits concentration of measure, which gives Eq. (6).
In order to prove Eq. (7), we invoke the inequality in Eq. (4) and use it to upper bound $q^{(m)}$ with the robustness $R^{(m)}(\mathbf{M}^U)$ of a random POVM $\mathbf{M}^U$ with respect to $m$-outcome simulable POVMs in $\mathbb{C}^d$. Using the interpretation of robustness in the context of state discrimination (see Eq. (5)), we lower bound it by constructing a specific ensemble of quantum states obtained by rescaling the effects of $\mathbf{M}^U$. In this way, a lower bound on the robustness (hence an upper bound on the success probability) becomes a function of the matrix elements $|U_{ij}|^2$ of the Haar-random unitary $U$. Finally, we prove a concentration of measure inequality for this resulting function, by again invoking the union bound and the cumulative distribution function of $|U_{ij}|^2$, which was obtained in [52].
[Fig. 2, caption fragment: success probability $q_{\mathrm{succ}}$ (Eq. (1)), obtained from ≤ 24 random partitions. For random POVMs, in each dimension, we generate 10 to 500 random POVMs (lower number for higher dimensions) and plot the minimum $q_{\mathrm{succ}}$ across them. For IC-POVMs, the measurement operators are specified by a single parameter $\alpha$ which we keep at a fixed value across all dimensions (see Appendix E for details).]
Numerical results-We tested the performance of our POVM simulation scheme by computing $q_{\mathrm{succ}}$ for SIC-POVMs [36][37][38], IC-POVMs [39] and Haar-random $d^2$-outcome POVMs. We focused on simulation strategies via POVMs that can be implemented with a single auxiliary qubit (this corresponds to setting $m = d$ in Theorem 1). For every dimension, we generated the effects of symmetric POVMs numerically by applying a family of transformations to a single fiducial pure state. For IC-POVMs we used a one-parameter family of fiducial states $|\psi_\alpha\rangle$ described in Ref. [39] for the specific value $\alpha = \frac{1}{2}(1 + i)$ (we remark that POVMs originating from other values of $\alpha$ exhibited a similar behaviour). For SIC-POVMs we used fiducial states from a catalogue in Ref. [53] for $d < 100$ and states in higher dimensions (up to $d = 1299$), which were provided to us by Markus Grassl in private correspondence. The construction of random POVMs is described in Appendix E.
Results of our numerical investigation are given in Fig. 2. For every considered measurement, the success probability was obtained via direct maximization over only ≤ 24 random partitions of $[d^2]$. The graph shows that with increasing dimension, $q_{\mathrm{succ}}$ approaches ≈ 25% for SIC-POVMs and random POVMs, while for IC-POVMs it is above ≈ 20% even up to $d = 1299$.
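The random-partition search described above can be sketched for small dimensions as follows. This is not the code used for Fig. 2; it is a minimal illustration that assumes rank-one effects $M_i = |v_i\rangle\langle v_i|$ built from the first $d$ rows of a Haar-random unitary, uses the identity $\|\sum_{i \in X} |v_i\rangle\langle v_i|\| = \|V_X\|^2$ for the block norms, and evaluates $q_{\mathrm{succ}} = (\sum_\gamma \lambda_\gamma)^{-1}$. All function names and the seed are hypothetical.

```python
import numpy as np

def haar_isometry_rows(d, n, rng):
    """First d rows of an n x n Haar-random unitary (QR of a Ginibre matrix)."""
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, R = np.linalg.qr(Z)
    Q = Q * (np.diag(R) / np.abs(np.diag(R)))   # phase fix for exact Haar
    return Q[:d, :]

def q_succ_for_partition(V, blocks):
    """Success probability for rank-one effects |v_i><v_i|, v_i = V[:, i]."""
    # || sum_{i in X} |v_i><v_i| || equals the squared spectral norm of V_X.
    lambdas = [np.linalg.norm(V[:, X], 2) ** 2 for X in blocks]
    return 1.0 / sum(lambdas)

def best_of_random_partitions(V, block_size, trials, rng):
    """Maximise q_succ over `trials` random partitions into blocks of given size."""
    n = V.shape[1]
    best = 0.0
    for _ in range(trials):
        perm = rng.permutation(n)
        blocks = [perm[k:k + block_size] for k in range(0, n, block_size)]
        best = max(best, q_succ_for_partition(V, blocks))
    return best

rng = np.random.default_rng(7)
d = 4
V = haar_isometry_rows(d, d * d, rng)            # d^2-outcome POVM on C^d
best = best_of_random_partitions(V, block_size=d - 1, trials=24, rng=rng)
num_blocks = len(range(0, d * d, d - 1))         # number of blocks per partition
print(f"d={d}: best q_succ over 24 random partitions = {best:.3f}")
```

Since every block norm is at most 1, any partition into $\alpha$ blocks gives $q_{\mathrm{succ}} \geq 1/\alpha$, which serves as a simple sanity check on the output.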
Noise Analysis-Let us now discuss the effects of experimental imperfections on the practical implementation of our scheme for generic POVMs. The quantum circuits implementing Haar-random POVMs can be considered generic random circuits. The simplest noise model often adopted for such circuits (see Ref. [54]) is a global completely depolarizing channel described by a "visibility" parameter $\eta$. In what follows we assume that this noise affects the implementation of circuits used to realize a target POVM $\mathbf{M}$ (either via Naimark's construction or via our method), and thereby the effects of the realized $n$-outcome POVM. To quantitatively compare noisy and ideal implementations of a POVM, we use the total-variation distance (TVD) $d_{\mathrm{TV}}(p(\cdot|\rho, \mathbf{M}), p(\cdot|\rho, \mathbf{N}))$ between the outcome distributions that two POVMs $\mathbf{M}$ and $\mathbf{N}$ generate on a quantum state $\rho$. In particular, we will be interested in the worst-case distance, i.e., the TVD maximized over quantum states $\rho$, which can be interpreted as a measure of statistical distinguishability of $\mathbf{M}$ and $\mathbf{N}$ (without using entanglement [55]). This notion of distance is used to benchmark the quality of quantum measurements on near-term devices [56][57][58].
The following result, proven in Section D of the Appendix, gives a lower bound for the average worst-case distance between ideal and noisy implementation of Haar-random POVMs.
To make a qualitative comparison between our implementation of POVMs and the standard one (i.e., based on Naimark's dilation theorem), we use the noise model used in Google's recent demonstration of quantum advantage [40]. Assuming that the main source of errors are multiple two-qubit gates, we get that the dominating term in the visibility is an exponentially decaying function: $\eta = \eta(r_2, g_2) \approx \exp(-r_2 g_2)$, where $r_2$ is the two-qubit error rate and $g_2$ is the number of two-qubit gates needed to construct a given circuit. Now recall that for the implementation of a $d^2$-outcome POVM using Naimark's dilation, one needs to implement circuits on the Hilbert space with a doubled number of qubits $2N$ (we assume $d = 2^N$), while our post-selection scheme requires only a single additional qubit, hence the target space has only $N + 1$ qubits. We note that for the implementation of generic circuits on $2N$ qubits, the theoretical lower bound [59] for the needed number of CNOT gates is $g_2^{\mathrm{Naimark}} = \Theta(4^{2N}) = \Theta(16^N)$, while our scheme gives the scaling $g_2^{\mathrm{post}} = \Theta(4^N)$. Finally, combining the above considerations with Proposition 1, we get that the expected worst-case distance between the ideal and noisy Naimark implementation of a generic $d^2$-outcome measurement is lower bounded by $\approx (1 - \exp(-\Theta(16^N)))\, e^{-1}$, which corresponds to $\eta^{\mathrm{Naimark}} = \exp(-\Theta(16^N))$. We compare this to the quality of the probability distribution $p^{\mathrm{noise}}_{\mathrm{post}}(\mathbf{M}|\rho)$ generated by the noisy version of our simulation scheme, which is based on the implementation of projective measurements on $N + 1$ (not $2N$) qubits and hence incurs noise with $\eta^{\mathrm{post}} \approx \exp(-\Theta(4^N))$. In Appendix D we show that the postselection step in our scheme does not significantly affect the quality of the produced samples, by proving a bound for typical Haar-random $\mathbf{M}^U$ that involves only an absolute constant $C$. Therefore, for generic measurements, implementation via our scheme will be affected by much lower noise than in the case of Naimark's.
We expect that similar behaviour (i.e., amount of noise in our scheme compared to Naimark's dilation) should be exhibited also for more realistic noise models -the high reduction of the dimension of the Hilbert space is, reasonably, expected to highly reduce the noise.
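The gap between the two visibilities can be illustrated with a few lines of arithmetic. The sketch below makes two loudly stated assumptions that are not taken from the text: the hidden constants in $\Theta(16^N)$ and $\Theta(4^N)$ are set to 1, and the two-qubit error rate is set to the hypothetical value `R2 = 0.005`. The numbers are therefore only indicative of the scaling, not of any real device.

```python
import math

def visibility(r2, g2):
    """Dominant term of the visibility, eta ~ exp(-r2 * g2)."""
    return math.exp(-r2 * g2)

R2 = 0.005  # hypothetical two-qubit error rate (assumption, not from the text)

rows = []
for n_qubits in range(1, 5):
    g_naimark = 16 ** n_qubits   # Theta(16^N) gates, constant set to 1 (assumption)
    g_post = 4 ** n_qubits       # Theta(4^N) gates, constant set to 1 (assumption)
    rows.append((n_qubits, visibility(R2, g_post), visibility(R2, g_naimark)))

for n_qubits, eta_post, eta_naimark in rows:
    print(f"N={n_qubits}: eta_post={eta_post:.3e}, eta_naimark={eta_naimark:.3e}")
```

Under these assumptions the ratio $\eta^{\mathrm{post}} / \eta^{\mathrm{Naimark}} = \exp(r_2 (16^N - 4^N))$ grows exponentially in $16^N$, which is the sense in which the postselection scheme is less affected by noise.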
Discussion and open problems-Aside from their practical relevance, our results shed light on the question of whether POVMs are more powerful (in quantum information tasks requiring sampling) than projective measurements. Indeed, since typical POVMs in $\mathbb{C}^d$ can be implemented using $d$-outcome measurements, this suggests (and, if our conjecture is true, implies) that if there exists a gap in the relative usefulness (quantified for example via robustness), then it is between projective measurements and $d$-outcome POVMs. Moreover, the surprisingly high value of $q^{(d)}_{\mathrm{succ}}$ will likely have applications to nonlocality. First, it significantly limits (due to inequality (4)) the amount of local depolarizing noise that can be tolerated in schemes for generating secure quantum randomness using extremal $d^2$-outcome measurements [19,60]. We also anticipate that our results can be used to construct new local models for entangled quantum states that undergo general POVM measurements (by using techniques similar to those of [10,61]).
We conclude with giving directions for future research. First, naturally, is to verify whether our conjecture is true. The difficulty in proving it comes from the combinatorial nature of the optimization problem in Eq. (1) -it is difficult to analytically find the optimal partition of [n] that maximizes q succ for a target POVM M. Effects of Haar random POVMs have similar properties -in particular, they have (on average) equal operator norms -this symmetry allowed us to study them analytically. However, general POVMs can be highly unbalanced (in the sense of having effects whose operator norms can vary significantly) and suitable strategies need to be devised to tackle such situations. Second, it is desirable to devise an algorithmic method which, when given the circuit description of some POVM, returns the circuits needed to implement it with postselection. Another direction is to identify and quantify the real-time implementation costs of randomisation and post-processing, and how these cost considerations can be taken into account for suitable modifications of the scheme. Finally, it would be interesting to see if the success probability is connected to other properties of POVMs -for instance, their entanglement cost [62].
Data availability The data obtained in numerical simulations is available from authors upon request.
Code availability The code used to obtain numerical simulations is available from authors upon request.
Acknowledgements We are sincerely grateful to Markus Grassl for fruitful discussions and for sharing with us the numerical form of fiducial kets of SIC POVMs for high dimensions. We thank Zbigniew Puchała for the discussions at the initial stage of this project and Michał Horodecki for suggesting potential application of our scheme in PBT. The authors acknowledge the financial support by TEAM-NET project co-financed by EU within the Smart Growth Operational Programme (contract no. POIR.04.04.00-00-17C1/18-00). A portion of this work was done while TS was in Fudan university, and TS acknowledges support from the National Natural Science Foundation of China (Grant No. 11875110) and Shanghai Municipal Science and Technology Major Project (Grant No. 2019SHZDZX01).
Author Contributions TS had a leading role in proving Theorem 2, Proposition 1 and many auxiliary technical results. FBM carried out numerical simulations and proved results concerning noise robustness of POVM implementation methods. MO contributed the main idea of the project, proved Theorem 1 and supervised the other parts of the project. All authors contributed equally to writing the manuscript.

Appendix
We collect here technical results that are used in the main part of the paper, as well as more detailed descriptions of some of the presented concepts. In Section A, we discuss the relation between the success probability of our implementation scheme and two resource-theoretic quantities: visibility and robustness of POVMs. In Section B we explain concentration of measure for general random variables on probability spaces, especially for the special cases of the unitary group $\mathrm{U}(n)$ and the complex $(n - 1)$-sphere. The contents of this section should be treated as preliminaries for further sections. The proof of the technical version of Theorem 2 is provided in Section C. In Section D we describe in more detail the effects that completely depolarizing noise has on the implementation of quantum measurements. Finally, in Section E we provide details of the numerical simulations presented in the main text.
For the benefit of the reader, we explain below the notation used in the Appendix.
Hilbert-Schmidt norm: norm induced by the Hilbert-Schmidt inner product on linear operators.
$d_{\mathrm{TV}}(p, q)$: total variation distance between probability distributions $p$ and $q$.

Appendix B: Preliminaries
In this part we provide some basic theoretical background that will be used in Lemmas 1, 2, 4, 5, 6, and Theorems 5 and 6. In Subsection B 1, we introduce the notion of concentration of measure, which will be used extensively for proving the aforementioned lemmas and theorems. Related concepts such as Lipschitz constants of functions, log-Sobolev inequalities and log-Sobolev constants are also explained alongside. The metric spaces which we use in this work are the unitary group $\mathrm{U}(n)$ (with the metric induced by the Hilbert-Schmidt inner product) and the complex $(n - 1)$-sphere $S^{n-1}_{\mathbb{C}}$, with the metric it inherits from $\mathbb{C}^n$. The Haar measure on $\mathrm{U}(n)$ and the uniform measure on $S^{n-1}_{\mathbb{C}}$ will be introduced in Subsections B 2 and B 3 respectively, and the corresponding log-Sobolev constants are also mentioned.

Concentration of Measure: Lipschitz constants and log-Sobolev inequalities
We start by recalling the notions of Lipschitz constants and log-Sobolev inequalities. Let $(\mathcal{X}, d)$ be a metric space, and let $f : \mathcal{X} \to \mathbb{R}$ be a real function on $\mathcal{X}$. We say that $f$ is $L$-Lipschitz on $\mathcal{X}$ with respect to the metric $d$ if $f$ satisfies the following condition: $|f(x) - f(y)| \leq L\, d(x, y)$ for all $x, y \in \mathcal{X}$. (B.1) Now let $\mu$ be a probability measure on $(\mathcal{X}, d)$, and let the function $f$ be such that the length of the gradient of $f$ can be defined at any point $x$ in $\mathcal{X}$ (Eq. (B.2)). Then for any such function, the concentration inequalities (B.3) and (B.4) hold, where $C$ is called the log-Sobolev constant of $\mu$ with respect to the metric $d$ of $\mathcal{X}$. We note that the inequality (B.4) can be derived from (B.3) (see Theorem 5.39 in [51]). We refer the reader to [51] for more details on log-Sobolev inequalities.

2. Haar measure on U(n)
The group of n × n unitary matrices U(n) is endowed with a well-known probability measure, the Haar measure $\mu$. For any integrable function $f$ on U(n), its expectation value with respect to the Haar measure is invariant under left and right multiplication by a fixed unitary,
$$\int d\mu(U)\, f(U) = \int d\mu(U)\, f(WU) = \int d\mu(U)\, f(UW),$$
where $W$ is an arbitrary fixed unitary in U(n). U(n) inherits a metric from the Hilbert-Schmidt inner product on the space of n × n complex matrices. The distance between two unitaries $U$, $W$ with respect to the Hilbert-Schmidt metric is
$$\|U - W\|_{HS} = \sqrt{\mathrm{tr}\!\left[(U - W)^\dagger (U - W)\right]}.$$
The following theorem then gives the log-Sobolev constant for the Haar measure with respect to the Hilbert-Schmidt metric.

Theorem. The Haar measure on U(n) satisfies a log-Sobolev inequality with constant $C = 6/n$ with respect to the Hilbert-Schmidt metric.

3. Uniform measure on $S^{n-1}_{\mathbb{C}}$
The complex (n − 1)-sphere $S^{n-1}_{\mathbb{C}}$ is defined as
$$S^{n-1}_{\mathbb{C}} := \left\{ |x\rangle \in \mathbb{C}^n : \langle x | x \rangle = 1 \right\}.$$
For any n × n unitary $U$, the action $|x\rangle \mapsto U|x\rangle$ is norm-preserving. Thus, the Haar measure on U(n) induces a rotationally invariant probability measure on $S^{n-1}_{\mathbb{C}}$ in the following way: fix some arbitrary $|x\rangle$ in $S^{n-1}_{\mathbb{C}}$; then, for Haar-random $U$, $|z\rangle = U|x\rangle$ is a random variable in $S^{n-1}_{\mathbb{C}}$, distributed according to what is called the uniform probability measure on $S^{n-1}_{\mathbb{C}}$. In particular, one can choose $|x\rangle$ to be a standard basis vector $|e_i\rangle$, which tells us that when $U$ is Haar-random, its columns are distributed according to the uniform measure on $S^{n-1}_{\mathbb{C}}$. The uniform probability measure on $S^{n-1}_{\mathbb{C}}$ has log-Sobolev constant $C = \frac{1}{2n-1}$ with respect to the usual norm-induced metric on $S^{n-1}_{\mathbb{C}}$ (see Table 5.4 in [51]; note that $S^{n-1}_{\mathbb{C}} \simeq S^{2n-1}$, which is the (2n − 1)-sphere in $\mathbb{R}^{2n}$).

Let $\{|e_j\rangle\}_{j=1}^n$ denote the standard basis for $\mathbb{C}^n$. Each vector $|\psi\rangle$ in $S^{n-1}_{\mathbb{C}}$ can be mapped to an n-probability vector as follows:
$$|\psi\rangle \mapsto (p_1, \ldots, p_n), \quad p_i := |\langle e_i | \psi \rangle|^2.$$
Imposing the uniform measure on $S^{n-1}_{\mathbb{C}}$ turns $p_i$ into a random variable on the interval [0, 1]. Denoting $p_i$ by $x$, the probability density of this random variable is given by [52]
$$p(x) = (n - 1)(1 - x)^{n-2}.$$
It is easy to see that the expectation value of $x$ is $\frac{1}{n}$. Also, the distribution of $x$ is given by $P(x \ge y) = (1 - y)^{n-1}$, and since $1 - y \le e^{-y}$ it follows that
$$P(x \ge y) \le \exp\!\left(-(n - 1)\, y\right). \tag{B.10}$$
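The statements about the mean and the tail of $x = |\langle e_1 | z \rangle|^2$ can be checked numerically. The following is a minimal sketch, assuming that a Haar-random unitary is sampled by the QR decomposition of a complex Gaussian matrix with a phase correction (a standard technique, not necessarily the one used by the authors); the helper names `haar_unitary` and `first_entry_probabilities` are hypothetical.

```python
import numpy as np

def haar_unitary(n, rng):
    """Sample an n x n unitary from the Haar measure (QR of a complex Gaussian matrix)."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    # QR is unique only up to column phases; fixing them makes the law exactly Haar.
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # multiplies column j of q by phases[j]

def first_entry_probabilities(n, samples, rng):
    """Samples of x = |U_11|^2, i.e. |<e_1|z>|^2 for z uniform on the complex sphere."""
    xs = np.empty(samples)
    for s in range(samples):
        xs[s] = np.abs(haar_unitary(n, rng)[0, 0]) ** 2
    return xs

rng = np.random.default_rng(0)
n, y = 8, 0.2
xs = first_entry_probabilities(n, 5000, rng)

empirical_mean = xs.mean()              # compare with 1/n
empirical_tail = (xs >= y).mean()       # compare with (1 - y)**(n - 1)
exact_tail = (1 - y) ** (n - 1)
exponential_bound = np.exp(-(n - 1) * y)  # upper bound on the tail
```

Up to sampling error, `empirical_mean` should agree with 1/n and `empirical_tail` with `exact_tail`, which in turn lies below `exponential_bound`.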

4. Haar-random POVMs
In this subsection we recall the construction of rank-one Haar-random POVMs. An n-outcome, rank-one POVM $M^U$ on $\mathbb{C}^d$ can be constructed from a Haar-random unitary $U \in$ U(n) using the following steps:
1. Extend the principal system $\mathbb{C}^d$ to a larger system $\mathbb{C}^n$ using an ancillary system, which is prepared in a fixed state $|0\rangle$.
2. Rotate the composite system by the unitary $U$ in U(n).
3. Measure the composite system in the computational basis $\{|e_i\rangle\}_{i=1}^n$.

Let us denote by $P^U$ the rank-one n-outcome projective measurement on the composite system, whose effects are given by
$$P^U_i = U |e_i\rangle\langle e_i| U^\dagger, \quad i = 1, \ldots, n.$$
If the ancillary system is prepared in the state $|0\rangle\langle 0|$, then performing the above measurement on the composite system implements on the original system $\mathbb{C}^d$ a rank-one n-outcome measurement $M^U$ with effects given by
$$M^U_i = \mathrm{tr}_{\mathrm{anc}}\!\left[\left(\mathbb{1} \otimes |0\rangle\langle 0|\right) P^U_i\right].$$
Identifying the principal system with the span of the first d basis vectors, the matrix elements of $M^U_i$ can be related to the matrix elements of $U$ via
$$\langle e_j | M^U_i | e_k \rangle = U_{ji}\, \overline{U_{ki}}, \quad j, k \in [d].$$
Finally, when $U$ is distributed according to the Haar measure on U(n), the POVM $M^U$ also becomes a random variable. This is called a Haar-random POVM.
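The construction above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions: the effects are taken to be $M_i = |v_i\rangle\langle v_i|$, with $v_i$ the i-th column of the block formed by the first d rows of $U$ (the row/column convention is an assumption; by Haar invariance the opposite convention yields the same distribution), and the function names are hypothetical.

```python
import numpy as np

def haar_unitary(n, rng):
    """Sample an n x n Haar-random unitary via QR of a complex Gaussian matrix."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases

def haar_random_povm(d, n, rng):
    """Return an array of n rank-one effects on C^d (shape n x d x d)."""
    u = haar_unitary(n, rng)
    block = u[:d, :]  # rows in [d], all n columns
    # effects[i, j, k] = U_{ji} * conj(U_{ki}), i.e. |v_i><v_i| for column v_i of the block
    return np.einsum("ji,ki->ijk", block, block.conj())

rng = np.random.default_rng(0)
d, n = 3, 7
effects = haar_random_povm(d, n, rng)

# The effects sum to the identity because the rows of a unitary are orthonormal.
completeness_error = np.linalg.norm(effects.sum(axis=0) - np.eye(d))
ranks = [np.linalg.matrix_rank(m, tol=1e-10) for m in effects]
```

Here `completeness_error` should vanish up to floating-point error and every entry of `ranks` should equal one.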

Appendix C: Proof of Theorem 2
In this section we prove Theorem 2, concerning bounds on the success probability of implementing Haar-random POVMs with postselection. The first three subsections contain auxiliary lemmas needed in the proof of the main result, which we provide in Section C 4. From now on, unless stated otherwise, we denote by $X$ a subset of [n] such that $|X| = m$, by $U$ an n × n unitary matrix, and by $U_X$ the truncation of the unitary $U$ occurring at the intersection between rows in [d] and columns in $X$. Furthermore, $\{|e_i\rangle\}_{i=1}^n$ is the standard orthonormal basis in $\mathbb{C}^n$, and by $P = \sum_{i=1}^d |e_i\rangle\langle e_i|$ we denote the projector onto the space of its first d components.

1. Lipschitz constants for functions used in the proof of Theorem 2
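We first bound the Lipschitz constants of some functions which will be used later.

Lemma 1. The function $U \mapsto \|U_X\|$ is 1-Lipschitz on U(n) with respect to the Hilbert-Schmidt metric.

Proof. Let $U$, $W$ be two n × n unitaries such that $U \neq W$. Then
$$\big|\, \|U_X\| - \|W_X\| \,\big| \le \|U_X - W_X\| \le \|U - W\| \le \|U - W\|_{HS},$$
where the first inequality is the reverse triangle inequality, the second holds because $U_X - W_X$ is a submatrix of $U - W$, and the third because the operator norm is upper bounded by the Hilbert-Schmidt norm.

Lemma 2. For any $|z\rangle$ in $S^{n-1}_{\mathbb{C}}$, the function $|z\rangle \mapsto \|P|z\rangle\|$ is 1-Lipschitz.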
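Both Lipschitz statements are easy to sanity-check numerically. The sketch below, which assumes QR-based Haar sampling and uses hypothetical helper names, draws random pairs and records the largest observed ratio between the change of the function and the distance of its arguments; by Lemmas 1 and 2 this ratio can never exceed one.

```python
import numpy as np

def haar_unitary(n, rng):
    """Sample an n x n Haar-random unitary via QR of a complex Gaussian matrix."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases

def truncation_norm(u, d, cols):
    """Operator norm of U_X: rows in [d], columns in X = cols."""
    return np.linalg.norm(u[:d, cols], ord=2)

rng = np.random.default_rng(1)
n, d = 12, 4
cols = [0, 1, 2]  # an example subset X with m = 3

ratios_unitary = []  # Lemma 1: | ||U_X|| - ||W_X|| | / ||U - W||_HS
ratios_sphere = []   # Lemma 2: | ||P z|| - ||P w|| | / ||z - w||
for _ in range(200):
    u, w = haar_unitary(n, rng), haar_unitary(n, rng)
    num = abs(truncation_norm(u, d, cols) - truncation_norm(w, d, cols))
    ratios_unitary.append(num / np.linalg.norm(u - w, "fro"))
    z, v = u[:, 0], w[:, 0]  # columns of Haar unitaries are uniform on the sphere
    num = abs(np.linalg.norm(z[:d]) - np.linalg.norm(v[:d]))
    ratios_sphere.append(num / np.linalg.norm(z - v))

max_ratio_unitary = max(ratios_unitary)
max_ratio_sphere = max(ratios_sphere)
```

A random search of this kind cannot prove the lemmas, but any ratio above one would indicate an error in the statement or in the conventions used.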

2. Upper bound to the Haar-averaged norm of truncations of unitary matrices
The following auxiliary results allow us to upper bound the expected value of the operator norm of truncations of Haar-random unitaries.
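Let $E_X$ be an $\epsilon$-net for $S_X$. Then $\langle e_i | P | x \rangle = 0$ for $i \ge d + 1$ for all $|x\rangle \in \mathbb{C}^n$, and we have
$$\|U_X\| \le \frac{1}{1 - \epsilon} \max_{|x\rangle \in E_X} \|P U |x\rangle\|.$$

Proof. From the singular value decomposition of $U_X$, we get that
$$\|U_X\| = \max_{|a\rangle \in S_X} \|P U |a\rangle\| = \|P U |\tilde{a}\rangle\|, \tag{C.5}$$
where $|\tilde{a}\rangle \in S_X$ is the (or is a) vector at which the maximization in equation (C.5) is attained. Now, to discretize the optimization in equation (C.5), we optimize over $E_X$ instead, and we note that there exists $|x\rangle \in E_X$ such that $\| |x\rangle - |\tilde{a}\rangle \| \le \epsilon$. Hence we get that $\|P U |\tilde{a}\rangle\| \le \|P U |x\rangle\| + \epsilon \|U_X\|$, which gives us
$$\|U_X\| \le \frac{1}{1 - \epsilon} \max_{|x\rangle \in E_X} \|P U |x\rangle\|.$$

[...] for all $0 < \epsilon < 1$. Additionally, when $m = d - 1$, [...]

Remark 1. The proof of Lemma 4 is inspired by the proof of equation (18) and Theorem 7 in [63] (see Section 2 of the appendix in [63]). In Remark 2 below we briefly explain the differences between the proof presented here and the proof in [63].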
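Proof. Let $|z\rangle \in S^{n-1}_{\mathbb{C}}$, and consider the function $|z\rangle \mapsto \|P|z\rangle\|$. This function is 1-Lipschitz on $S^{n-1}_{\mathbb{C}}$ (Lemma 2). Define $S_X$ as in equation (C.3), and fix some $|x\rangle \in S_X$. Let $U \in$ U(n) be Haar-random, and let $|z\rangle = U|x\rangle$. Then $|z\rangle$ is uniformly distributed on $S^{n-1}_{\mathbb{C}}$ (see Subsection B 3). Thus the function $|z\rangle \mapsto \|P|z\rangle\|$ satisfies the following log-Sobolev inequality with constant $C = \frac{1}{2n-1}$ with respect to the uniform measure on $S^{n-1}_{\mathbb{C}}$:
$$\int d\mu(U)\, \exp\!\left[\lambda\left(\|P U |x\rangle\| - A\right)\right] \le \exp\!\left(\frac{\lambda^2}{2(2n-1)}\right),$$
where $A := \langle \|P U |x\rangle\| \rangle_{\mathrm{Haar}}$. Since $\exp(-\lambda A)$ is independent of the integration variable, we get
$$\int d\mu(U)\, \exp\!\left(\lambda \|P U |x\rangle\|\right) \le \exp\!\left(\lambda A + \frac{\lambda^2}{2(2n-1)}\right).$$
First we prove that $A \le \sqrt{d/n}$. Using the well-known result $\langle |U_{ij}|^2 \rangle_{\mathrm{Haar}} = \frac{1}{n}$ and the concavity of the square root, one obtains
$$A \le \sqrt{\left\langle \|P U |x\rangle\|^2 \right\rangle_{\mathrm{Haar}}} = \sqrt{\sum_{i=1}^{d} \left\langle |U_{ij}|^2 \right\rangle_{\mathrm{Haar}}} = \sqrt{\frac{d}{n}},$$
where we chose $|x\rangle = |e_j\rangle$ for some $j \in X$. Now note that, for $\lambda > 0$,
$$\int d\mu(U)\, \exp\!\left(\lambda \|P U |x\rangle\|\right) \le \exp\!\left(\lambda \sqrt{\frac{d}{n}} + \frac{\lambda^2}{2(2n-1)}\right). \tag{C.12}$$
Now let $E_X$ be an $\epsilon$-net for $S_X$. We sum the inequality (C.12) over all $|x\rangle \in E_X$, and we get
$$\sum_{|x\rangle \in E_X} \int d\mu(U)\, \exp\!\left(\lambda \|P U |x\rangle\|\right) \le |E_X| \exp\!\left(\lambda \sqrt{\frac{d}{n}} + \frac{\lambda^2}{2(2n-1)}\right).$$
For each $U \in$ U(n) there is some $|x_U\rangle \in E_X$ such that
$$\|P U |x_U\rangle\| = \max_{|x\rangle \in E_X} \|P U |x\rangle\|. \tag{C.14}$$
It is not difficult to see that $U \mapsto \|P U |x_U\rangle\|$ is a continuous function, which implies that $\exp(\lambda \|P U |x_U\rangle\|)$ is integrable on U(n). Thus we get
$$\int d\mu(U)\, \exp\!\left(\lambda \|P U |x_U\rangle\|\right) \le |E_X| \exp\!\left(\lambda \sqrt{\frac{d}{n}} + \frac{\lambda^2}{2(2n-1)}\right). \tag{C.15}$$
Since the exponential function is convex, Jensen's inequality can be applied in equation (C.15), which gives
$$\exp\!\left(\lambda \int d\mu(U)\, \|P U |x_U\rangle\|\right) \le |E_X| \exp\!\left(\lambda \sqrt{\frac{d}{n}} + \frac{\lambda^2}{2(2n-1)}\right).$$
Now taking the (natural) logarithm (and assuming that $\lambda > 0$) we get
$$\int d\mu(U)\, \|P U |x_U\rangle\| \le \sqrt{\frac{d}{n}} + \frac{\log |E_X|}{\lambda} + \frac{\lambda}{2(2n-1)}. \tag{C.17}$$
Since the inequality (C.17) is valid for all $\lambda > 0$, we directly minimize the RHS over $\lambda$, and we get
$$\left\langle \max_{|x\rangle \in E_X} \|P U |x\rangle\| \right\rangle_{\mathrm{Haar}} \le \sqrt{\frac{d}{n}} + \sqrt{\frac{2 \log |E_X|}{2n-1}}, \tag{C.18}$$
which is obtained at the value $\lambda = \sqrt{2(2n-1) \log |E_X|}$. Note that we have used equation (C.14) in the LHS of equation (C.18). There is a well-known theorem (see, e.g., [51,64]) stating that an $\epsilon$-net for $S_X$ can be chosen with at most $(1 + 2/\epsilon)^{2m}$ points. This gives us an upper bound for $|E_X|$, which inserted into the RHS of (C.18) yields
$$\left\langle \max_{|x\rangle \in E_X} \|P U |x\rangle\| \right\rangle_{\mathrm{Haar}} \le \sqrt{\frac{d}{n}} + \sqrt{\frac{4 m \log(1 + 2/\epsilon)}{2n-1}},$$
which is valid for any $\epsilon \in (0, 1)$. Now recall that in our scheme we are interested in the case $m \le d - 1$, which allows us to rewrite the above inequality as [...] where we have used the fact that $1 < \log(1 + 2/\epsilon)$ for any $\epsilon \in (0, 1)$ and we assume that n is large. With this approximation it is possible to perform the minimization over $\epsilon$, which gives us the inequality (C.7).
Note that the result of the minimization will generally depend on the relative values of d and m, and so for the special case $m = d - 1$ we get inequality (C.8).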
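The two ingredients of this proof that concern a single vector, namely the bound $A \le \sqrt{d/n}$ and the concentration of $\|P|z\rangle\|$ around its mean, can be illustrated by Monte Carlo sampling. The sketch below is only a numerical illustration: it assumes that uniform vectors on the complex sphere are generated by normalizing complex Gaussian vectors, and it assumes the tail bound in the form $\exp(-(2n-1)t^2/2)$, which corresponds to a 1-Lipschitz function and log-Sobolev constant $C = 1/(2n-1)$. All names are hypothetical.

```python
import numpy as np

def uniform_sphere(n, samples, rng):
    """Rows are independent vectors distributed uniformly on the complex (n-1)-sphere."""
    z = rng.standard_normal((samples, n)) + 1j * rng.standard_normal((samples, n))
    return z / np.linalg.norm(z, axis=1, keepdims=True)

rng = np.random.default_rng(2)
n, d = 16, 4
z = uniform_sphere(n, 20000, rng)

proj_norm = np.linalg.norm(z[:, :d], axis=1)  # ||P|z>|| with P onto the first d components
mean_norm = proj_norm.mean()                  # Monte Carlo estimate of A
mean_sq = (proj_norm ** 2).mean()             # estimate of <||P|z>||^2>, exactly d/n
bound_on_mean = np.sqrt(d / n)                # the bound A <= sqrt(d/n)

# Empirical tail P(||P|z>|| >= A + t) against the assumed concentration bound.
t = 0.2
empirical_tail = (proj_norm >= mean_norm + t).mean()
tail_bound = np.exp(-(2 * n - 1) * t ** 2 / 2)
```

For these parameters `mean_sq` should be close to d/n = 0.25, `mean_norm` slightly below 0.5, and `empirical_tail` well below `tail_bound`.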

Remark 2. There are two differences between the proof that we gave above and the proof of equation (18) in Theorem 7 of [63]. Firstly, the goal of Lemma 4 is to find an upper bound to $\langle \|U_X\| \rangle_{\mathrm{Haar}}$, while in [63] the upper bound being sought is for the corresponding quantity maximized over subsets $X, Y \subset [n]$ such that $|X| = d$ and $|Y| = m$, where $U_{X,Y}$ is the d × m truncation of $U$ lying at the intersection between rows in $X$ and columns in $Y$. For this purpose, the optimization in [63] is over an $\epsilon$-net whose cardinality is $\binom{n}{d}\binom{n}{m}\left(1 + \frac{2}{\epsilon}\right)^{2(d+m)}$. The second difference is that we use equation (C.4) for the optimization, whereas in [63] the authors used $\|U_{X,Y}\| = \max_{|x\rangle, |y\rangle} \mathrm{Re}\, \langle x | U | y \rangle$, where $|x\rangle \in E_X$, which is an $\epsilon$-net for $S_X$, and $|y\rangle \in E_Y$, which is an $\epsilon$-net for $S_Y$. Our reason for choosing equation (C.4) is that it allows us to obtain a tighter upper bound in inequalities (C.7) and (C.8). This is important because this upper bound is closely associated with the success probability, as can be seen in the proof of Theorem 5.

Proof. Consider the event
$$E := \left\{ \max_{i \in [d],\, j \in [n]} |U_{ij}|^2 \ge \frac{r(1 + \epsilon) \log n}{n} \right\},$$
where r is a constant that will be determined later to get a decent concentration. The event $E$ implies that there exist some $i \in [d]$ and some $j \in [n]$ such that the following event is true:
$$|U_{ij}|^2 \ge \frac{r(1 + \epsilon) \log n}{n}.$$
Now we note that for $\epsilon \in \left(-1, \frac{n}{r \log n} - 1\right)$ and $y = \frac{r(1 + \epsilon) \log n}{n}$, from inequality (B.10) we have
$$P\!\left(|U_{ij}|^2 \ge y\right) \le e^{-(n-1) y} = e^{y}\, n^{-r(1 + \epsilon)} < e\, n^{-r(1 + \epsilon)},$$
where we used the fact that $1 < \exp\!\left(\frac{r(1 + \epsilon) \log n}{n}\right) < e$. Using the union bound gives
$$P(E) \le \sum_{i \in [d]} \sum_{j \in [n]} P\!\left(|U_{ij}|^2 \ge y\right) < e\, n\, d\, n^{-r(1 + \epsilon)}. \tag{C.27}$$
Note that for the probabilities appearing on the RHS of the inequality (C.27) to be meaningful, it is necessary to revise the interval for $\epsilon$ as follows:
$$\frac{1}{r \log n} < \epsilon < \frac{n}{r \log n} - 1, \tag{C.28}$$
provided that r is chosen so that $n^r \ge n d$. The maximum value of d in terms of n is attained when d = n. Thus we choose r = 2, which proves the lemma.
Proof. Since $\sum_{j=1}^n w^U_j = d$, we get that $\left(\frac{w_1}{d}, \frac{w_2}{d}, \cdots, \frac{w_n}{d}\right)$ (where we dispense with the superscript $U$) is an n-probability vector. For any n-probability vector $p$, the function $p \mapsto \sum_{j=1}^n p_j^2$ is Schur-convex [65] and therefore its minimum value is
$$\min_{p} \sum_{j=1}^n p_j^2 = \frac{1}{n},$$
where the minimization goes over all n-probability vectors, and is attained at the uniform distribution, $p = \left(\frac{1}{n}, \frac{1}{n}, \cdots, \frac{1}{n}\right)$. Finally, to prove the lemma we note that for the Fourier matrix $F$, with elements $F_{jk} = \frac{1}{\sqrt{n}} e^{2\pi i jk/n}$, [...]

4. Technical version of Theorem 2 in the main text
Now we are ready to provide a technical version of the first part of Theorem 2 from the main text. Since the methods used in the proofs of inequalities (6) and (7) comprising Theorem 2 differ, we formulate two auxiliary technical theorems (Theorem 5 and Theorem 6 below), each covering one of the inequalities.
Remark 3. One can directly obtain a bound for the m = d case by evaluating the RHS of inequality (C.32) for m = d − 1. But in that case the success probability one gets is 4.65%, which is lower than the success probability in inequality (C.33) (which is 6.74%). Thus, a separate derivation for (C.33) is warranted.
Proof. Let $U_j$ be a truncation of $U$, occurring at the intersection between rows in [d] and columns in $X_\gamma$. Using Lemma 4 we obtain the following upper bound to $\langle \|U_j\| \rangle_{\mathrm{Haar}}$: [...] where $c \approx 1.92$ and $\gamma = \frac{2(m-1)}{d}$. For the case m = d, the upper bound is simpler: [...] where $c' \approx 3.85$. To simplify the presentation, define [...]

From Lemma 1 it follows that the function $U \mapsto \|U_j\|$ is 1-Lipschitz on U(n) with respect to the Hilbert-Schmidt metric. Therefore, the function satisfies the following concentration inequality (see Subsection B 1) [...] where we have used the fact that [...] By defining [...] we can rewrite the inequality (C.37) as [...]

Suppose $U$ is such that it satisfies $\sum_{j=1}^{\alpha} \|U_j\|^2 \ge \alpha (A + t)^2$. This implies that for at least one $j \in [\alpha]$, $U \in E_j$. Using $\alpha \le \frac{n}{m-1}$, we obtain [...] where we have used the union bound on the event $\bigcup_{j=1}^{\alpha} E_j$. When $U$ satisfies the inequality $\sum_{j=1}^{\alpha} \|U_j\|^2 \ge \frac{n}{m} (A + t)^2$, then using the fact that [...], we get that the success probability of our scheme is bounded by [...] Finally, by taking $\epsilon := \frac{t}{A}$ and using equation (C.36), the event (C.42) can be rewritten as [...] where $c'' = \frac{1}{2c^2} \approx 0.136$. By plugging this into equation (C.12) we get the inequality (C.32). For the special case when m = d, we follow the same reasoning as above, starting from inequality (C.35), and then obtain (C.33).
Theorem 6 (Technical formulation of inequality (7) from Theorem 2). Let $n \in \{d, \ldots, d^2\}$ and $m \le d$. Let $M^U$ denote a rank-one n-outcome Haar-random POVM on $\mathbb{C}^d$. Let $q^{(m)}(M^U)$ be the maximal success probability of implementing $M^U$ with postselection via a convex combination of m-outcome measurements. We then have [...]

Remark 4. Theorem 6 is meaningful only for values of d, m and n such that $2 m \log n < d$. Moreover, inequality (7) is reproduced by setting $\epsilon = 1$ in Eq. (C.44).
Proof. Let $S_m$ be the set of all n-outcome POVMs simulable by quantum measurements with at most m outcomes. Let $M$ be an arbitrary n-outcome POVM on $\mathbb{C}^d$. To establish inequality (C.44) we shall use the following inequality between $q^{(m)}(M)$ and the robustness $R^{(m)}(M)$ (cf. Appendix A): [...]

The robustness $R^{(m)}(M)$ has an operational interpretation: it can be expressed via the maximal relative advantage that $M$ can offer over all possible POVMs in $S_m$ in quantum state discrimination (see Theorem 2 in [44]): [...] where $\mathcal{E}$ is an n-element ensemble of quantum states, and $P_{\mathrm{succ}}(\mathcal{E}, M)$ ($P_{\mathrm{succ}}(\mathcal{E}, N)$) is the success probability for the minimum-error discrimination of the states with the POVM $M$ (or $N$, respectively). For a given $M$, we construct the following ensemble of states: [...] Note that the convexity of $S_m$ implies that max [...] When $U$ is distributed according to the Haar measure, we can use inequality (C.51) from Lemma 5, which proves the theorem.

Depolarizing noise in implementation with post-selection
In this part we study how global depolarizing noise affects the quality of our POVM implementation scheme involving postselection. Recall that our scheme implements a measurement
$$N = (q M_1, \ldots, q M_n, (1 - q) \mathbb{1}), \tag{D.17}$$
where $M$ is a target POVM (which we assume consists of rank-one effects) and q is the success probability of the implementation. The above measurement is realized as a convex mixture of m-outcome measurements (for simplicity we assume here that m − 1 divides n) as
$$N = \sum_{\gamma} p_\gamma N_\gamma, \tag{D.18}$$
where each $N_\gamma$ has n + 1 formal outcomes, such that [...] where $X_\gamma$ is a subset of $|X_\gamma| \le m - 1$ outcomes and the probability distribution $\{p_\gamma\}$ is defined by [...] (D.20). Each of the measurements $\{N_\gamma\}$ is implemented via Naimark's dilation theorem (i.e., as a projective measurement on an extended Hilbert space). As explained in the main text, if the target POVM $M$ is rank-one and $m \le d$, then the POVMs $N_\gamma$ can be implemented using a Hilbert space of dimension $m - 1 + d \le 2d =: d_{\mathrm{tot}}$. Now, due to the noise, the effects of the implemented POVM are distorted as [...]

[...] group), where d is the dimension of the system. Such a POVM has $d^2$ rank-one effects and is shown to be informationally complete [39]. A fiducial vector is constructed as [...] where $\alpha$ is a parameter characterizing the POVM, which has to fulfill the condition $0 < |\alpha| < 1$. The vectors defining the other effects of that POVM are obtained as
$$|\psi_{m,n}\rangle = U_{m,n} |\psi\rangle,$$
where $|\psi\rangle$ denotes the fiducial vector and $U_{m,n}$ is a (projective) unitary representation of $\mathbb{Z}_d \times \mathbb{Z}_d$ given by
$$U_{m,n} = \sum_{k=0}^{d-1} e^{2\pi i k n / d}\, |k \oplus m\rangle\langle k|,$$
with $m, n \in [0, d - 1]$ and $\oplus$ denoting addition modulo d. See Ref. [39] for more details. In our simulations we arbitrarily choose the free parameter to be $\alpha = \frac{1}{2}(1 + i)$. We note that we checked a few other instances of this parameter and did not observe quantitative differences in the probability of success of POVM simulation using our scheme.

b. Symmetric Informationally Complete measurements
A measurement is called symmetric if its effects have equal pairwise Hilbert-Schmidt scalar products. The search for symmetric and informationally complete (SIC) measurements is an active area of research [38], and even the existence of SICs in arbitrary dimension d is an open problem. To date, SIC POVMs have been found either numerically or analytically for a restricted collection of dimensions [36,66,67]. SIC POVMs are, similarly to IC POVMs, represented by a single fiducial vector, and we generate the other measurement operators from that vector by the action of the $\mathbb{Z}_d \times \mathbb{Z}_d$ group (we note that all SIC POVMs found to date are covariant with respect to some group, and most of them are covariant with respect to the $\mathbb{Z}_d \times \mathbb{Z}_d$ group).
In were provided by Markus Grassl in private correspondence.
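Generating a group-covariant POVM from a single fiducial vector can be sketched as follows. This is an illustration under assumptions: the representation is taken in the standard Weyl-Heisenberg form $U_{m,n} = \sum_k \omega^{kn} |k \oplus m\rangle\langle k|$ with $\omega = e^{2\pi i/d}$, the effects are normalized as $\frac{1}{d} U_{m,n}|\psi\rangle\langle\psi|U_{m,n}^\dagger$, and the fiducial used in the example is the well-known d = 3 SIC fiducial $(0, 1, -1)/\sqrt{2}$, not one of the fiducials used in the simulations. Function names are hypothetical.

```python
import numpy as np

def weyl_heisenberg(d, m, n):
    """U_{m,n} = sum_k omega^(k n) |k (+) m><k|, with omega = exp(2 pi i / d)."""
    omega = np.exp(2j * np.pi / d)
    u = np.zeros((d, d), dtype=complex)
    for k in range(d):
        u[(k + m) % d, k] = omega ** (k * n)
    return u

def covariant_povm(fiducial):
    """Return the d^2 rank-one effects generated from a fiducial vector."""
    psi = np.asarray(fiducial, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    d = len(psi)
    effects = []
    for m in range(d):
        for n in range(d):
            phi = weyl_heisenberg(d, m, n) @ psi
            effects.append(np.outer(phi, phi.conj()) / d)
    return np.array(effects)

# Example with an assumed d = 3 SIC fiducial.
d = 3
effects = covariant_povm([0.0, 1.0, -1.0])
completeness_error = np.linalg.norm(effects.sum(axis=0) - np.eye(d))

# Pairwise squared overlaps |<psi_a|psi_b>|^2 = d^2 tr(M_a M_b); 1/(d+1) for a SIC.
overlaps = d ** 2 * np.einsum("aij,bji->ab", effects, effects).real
off_diagonal = overlaps[~np.eye(d * d, dtype=bool)]
```

The completeness check holds for any normalized fiducial, while the constant off-diagonal overlaps are specific to SIC fiducials.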

Haar-random POVMs
In this work, we are interested in generating Haar-random d-dimensional POVMs with $d^2$ outcomes. A straightforward method to do so would be to generate a Haar-random $d^2 \times d^2$ unitary matrix and take its $d^2 \times d$ submatrix as defining such a POVM. However, generating such random matrices quickly becomes infeasible: due to the large amount of memory required, we were not able to generate them for high d. As a workaround, instead of generating random $d^2 \times d^2$ unitary matrices, we generated random $d^2 \times d$ isometries. To do so, we implemented the following algorithm. [...] It follows that $\{e_k\}_{k=1}^{d}$ forms an orthonormal set of $d^2$-dimensional random vectors. Hence those vectors can be used to construct a $d^2 \times d$ isometry.
5. To construct a POVM, one simply treats the rows of this isometry as a set of $d^2$ vectors of dimension d. Since the matrix is an isometry, it follows that those rows define the rank-one effects of a $d^2$-outcome random POVM.
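A memory-friendly way to obtain such an isometry and the corresponding POVM can be sketched as below. Note the assumption: instead of the orthogonalization steps of the algorithm above, this sketch uses the reduced QR decomposition of a $d^2 \times d$ complex Gaussian matrix with a phase correction, which is a standard way to sample the first d columns of a Haar-random unitary without ever storing the full $d^2 \times d^2$ matrix. The function names are hypothetical.

```python
import numpy as np

def random_isometry(rows, cols, rng):
    """Sample a rows x cols isometry (rows >= cols) via reduced QR of a Gaussian matrix."""
    z = (rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)
    q, r = np.linalg.qr(z)  # reduced mode: q has shape (rows, cols)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases

def povm_from_isometry(v):
    """Each row of the isometry defines one rank-one effect on C^d."""
    # effects[i, j, k] = V_{ij} * conj(V_{ik})
    return np.einsum("ij,ik->ijk", v, v.conj())

rng = np.random.default_rng(4)
d = 3
v = random_isometry(d * d, d, rng)
effects = povm_from_isometry(v)

isometry_error = np.linalg.norm(v.conj().T @ v - np.eye(d))
completeness_error = np.linalg.norm(effects.sum(axis=0) - np.eye(d))
```

Only $d^3$ complex numbers are stored for the isometry, compared with $d^4$ for the full unitary, which is the motivation for working with isometries at large d.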