Quantum uncertainty switches on or off the error-disturbance tradeoff

The indeterminacy of quantum mechanics was originally presented by Heisenberg through the tradeoff between the error of measuring an observable A and the consequent disturbance to the value of another observable B. This tradeoff has now become a popular interpretation of the uncertainty principle. However, the historic idea has never been exactly formulated and has recently been called into question. A theory built upon operational and state-relevant definitions of error and disturbance is called for to rigorously reexamine the relationship. Here, by putting forward such natural definitions, we demonstrate both theoretically and experimentally that there is no tradeoff if the outcome of measuring B is more uncertain than that of A. Otherwise, the tradeoff is switched on and is well characterized by the Jensen-Shannon divergence. Our results reveal the hidden effect of the uncertain nature possessed by the measured state, and show that the state-relevant relation between error and disturbance is not almost everywhere a tradeoff, as is usually believed.

Quantum uncertainty is conventionally explained through two sequential quantum measurements 1,2 : the error of the first and its disturbance to the precision of the second are qualitatively claimed to have an essential tradeoff. The core effect in the relation between error and disturbance is the intrinsic back-action of quantum measurements; as Dirac wrote, "a measurement always causes the system to jump into an eigenstate of the dynamical variable that is being measured" 3 . Heisenberg gave only an intuitive and informal argument for the probable error-disturbance tradeoff. The famous uncertainty inequalities, Kennard's 4 $\Delta x\,\Delta p \ge \hbar/2$ ($\Delta$ is the standard deviation), Robertson's 5 $\Delta A\,\Delta B \ge |\langle\psi|[A,B]|\psi\rangle|/2$
To close the gap between mathematical inequalities and physical interpretations, Ozawa first quantified the error and the disturbance and analyzed their relationship. He proposed the inequality 10 :

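$$\varepsilon_A\,\eta_B + \varepsilon_A\,\Delta B + \eta_B\,\Delta A \ge \frac{1}{2}\,|\langle\psi|[A,B]|\psi\rangle| \tag{1}$$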
and claimed that the Heisenberg-type tradeoff $\varepsilon_A\,\eta_B \ge |\langle\psi|[A,B]|\psi\rangle|/2$ is incorrect. Therein, the error term, $\varepsilon_A$, is defined as the root-mean-square of the difference between the observable actually measured and A, the observable we aim at. The disturbance to B, $\eta_B$, is defined in a similar manner. We shall not repeat the expressions of $\varepsilon_A$ and $\eta_B$ but emphasize that they are state-relevant, i.e., they are defined for each specific input state |ψ〉. The mathematical correctness of Eq. (1) has been verified extensively in qubit experiments [11][12][13][14][15][16] . However, its physical exactness was questioned, triggering a heated debate on the error-disturbance tradeoff [17][18][19][20][21][22][23][24][25][26][27] . The $\varepsilon_A$ and $\eta_B$ defined by Ozawa were criticized for violating the operational requirements for exact and faithful definitions of error and disturbance, which are explicitly suggested in ref. 27. They require
• error to be nonzero if the outcome distribution produced in an actual measurement of A deviates from that predicted by Born's rule;
• disturbance to be nonzero if the back-action introduced by the actual measurement alters the original distribution with respect to B.
These requirements emphasize the probabilistic feature of quantum mechanics, and further imply that 〈ψ|[A, B]|ψ〉 should not appear alone on the right-hand side of the inequalities that describe the tradeoff 27 . This conclusion sharply conflicts with Eq. (1) and leaves open the question of what can appear there instead.
Meanwhile, Busch, Lahti and Werner (BLW) developed theories to support Heisenberg-type relations 24,25 . In their inequalities, the relevance to states is erased by searching for the extreme case over all possible inputs. The relevance to the input state has thus not been well taken into account 26 . These theories effectively benchmark the quality of the measurement device. Unlike the state-relevant formalisms, they cannot describe the tradeoff in each particular experiment.
In spite of these intricate features, we notice the presence of ΔA and ΔB, which characterize the quantum uncertainty, in Eq. (1). This inspires us to ask: does the quantum uncertainty possessed by the quantum state with respect to A and B play some hidden but intrinsic role in the error-disturbance tradeoff? Meanwhile, the state-irrelevant inequalities hint that the state-relevant relation may not be a pure tradeoff. In this paper, we propose a state-relevant formalism of the error-disturbance relation with definitions of the error and the disturbance that obey the operational requirements. We demonstrate both theoretically and experimentally that a condition controlled by quantum uncertainty determines the existence of the error-disturbance tradeoff. We also give an inequality for the case where the tradeoff does exist.

Results
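Error and disturbance. Suppose we measure observable A on a quantum system in state |ψ〉 with an imperfect real-life device. In the process, the employed device (together with the environment) selects a preferred pointer basis $\{|a'_i\rangle\}$ in the system's Hilbert space 28 , so the outcomes follow the distribution $P' = \{p'_i = |\langle a'_i|\psi\rangle|^2\}$, which may deviate from $P = \{p_i = |\langle a_i|\psi\rangle|^2\}$, the distribution predicted theoretically by Born's rule for the eigenstates $|a_i\rangle$ of A. Thus, we define the error, $\mathrm{Err}_\psi(A)$, to be specified below, to quantify the difference between P and P′ (see Fig. 1).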
By the notation "|ψ〉" and the specified measurement process, we have assumed that the problem under consideration concerns pure-state systems and projective measurements that are maximally informative 29 . In fact, all quantum measurements can be modeled by tracing back to projective measurements performed on a complete system described by a pure state. The concepts of mixed states and positive-operator valued measurements (POVM) emerge by statistically ignoring some sub-systems and losing some classical information. We will discuss them at the end and focus here on the most fundamental structure.
Now we define the disturbance. Statistically, the back-action of the real-life measurement maps the original state |ψ〉 to $\rho = \sum_i p'_i\,|a'_i\rangle\langle a'_i|$. This is the disturbance. When referring to observable B, it means that the original distribution determined by Born's rule, $Q = \{q_j = |\langle b_j|\psi\rangle|^2\}$, cannot be perceived from observing B; what is observed instead is $\tilde{Q} = \{\tilde{q}_j = \langle b_j|\rho|b_j\rangle\}$, the distribution induced by the real-life measurement of A. So we define the disturbance to B, $\mathrm{Dis}_\psi(B)$, by the divergence between $\tilde{Q}$ and Q. It can be seen that our definitions satisfy the operational requirements introduced above in a minimal way. That is, our definitions satisfy no additional requirements other than the operational ones. Furthermore, the operational definitions offer a direct experimental implementation, as illustrated in Fig. 2. With these preparations, we can now give one of the main results.
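As a minimal sketch of these definitions, the following Python snippet computes the four outcome distributions for a qubit. The helper names (`born`, `disturbed_distribution`, `rotated_basis`), the choice of the Z and X eigenbases for A and B, and the tilt of the real-life pointer basis are illustrative assumptions, not taken from the text.

```python
import math

def inner(u, v):
    """Inner product <u|v> of two state vectors given as lists of numbers."""
    return sum(complex(x).conjugate() * y for x, y in zip(u, v))

def born(basis, psi):
    """Born-rule outcome distribution of the pure state psi in an orthonormal basis."""
    return [abs(inner(vec, psi)) ** 2 for vec in basis]

def disturbed_distribution(b_basis, real_basis, psi):
    """Distribution of B after a projective measurement in real_basis
    whose outcome is not recorded (the state becomes a mixture)."""
    p_real = born(real_basis, psi)
    return [sum(p * abs(inner(b, a)) ** 2 for p, a in zip(p_real, real_basis))
            for b in b_basis]

def rotated_basis(angle):
    """Real orthonormal qubit basis tilted by `angle` on the Bloch sphere
    (hypothetical helper, used only for this illustration)."""
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    return [[c, s], [-s, c]]

theta = math.pi / 3                       # assumed input-state angle
psi = [math.cos(theta / 2), math.sin(theta / 2)]
a_basis = rotated_basis(0.0)              # assumed ideal A: the Z eigenbasis
b_basis = rotated_basis(math.pi / 2)      # assumed B: the X eigenbasis
real_basis = rotated_basis(0.2)           # assumed imperfect pointer basis

P = born(a_basis, psi)                    # ideal distribution of A
P_real = born(real_basis, psi)            # distribution actually produced
Q = born(b_basis, psi)                    # ideal distribution of B
Q_tilde = disturbed_distribution(b_basis, real_basis, psi)
print(P, P_real, Q, Q_tilde)
```

Any divergence between `P` and `P_real` then plays the role of the error, and any divergence between `Q` and `Q_tilde` the role of the disturbance.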
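Quantum uncertainty and error-disturbance tradeoff. We find a criterion by which quantum uncertainty completely determines the existence of the state-relevant error-disturbance tradeoff.
Theorem 1. There exists a real-life measurement basis $\{|a'_i\rangle\}$ with $\mathrm{Err}_\psi(A) = \mathrm{Dis}_\psi(B) = 0$ if and only if $P \succ Q$.
Here $P \succ Q$ denotes that P majorizes Q, i.e., $\sum_{i=1}^{k} p_i^{\downarrow} \ge \sum_{i=1}^{k} q_i^{\downarrow}$ for every k, where ↓ indicates that the probabilities are sorted in non-increasing order 30 . That means the occurrence probability of a top-k high-probability outcome in the ideal measurement of A is no less than that of B, for all possible values of k. Thus, majorization gives a rigorous criterion for the partial order of certainty or uncertainty. Moreover, $P \succ Q$ leads to $H(P) \le H(Q)$, where $H(P) = -\sum_i p_i \log p_i$ is the Shannon entropy. As widely accepted, larger Shannon entropy means more uncertainty. So Theorem 1 concludes that, if the outcome of measuring A is more certain than the outcome of measuring B, there will be no state-relevant error-disturbance tradeoff; otherwise, the tradeoff will be switched on and a positive lower bound of $\mathrm{Err}_\psi(A) + \mathrm{Dis}_\psi(B)$ is expected.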
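The majorization criterion can be checked mechanically. The sketch below is an illustration only: the function names and the example distributions are assumptions, and base-2 logarithms are chosen arbitrarily. It compares top-k partial sums and also shows that majorization is only a partial order.

```python
import math

def majorizes(p, q, tol=1e-12):
    """True if p majorizes q: every top-k partial sum of p is at least that of q."""
    ps, qs = sorted(p, reverse=True), sorted(q, reverse=True)
    cum_p = cum_q = 0.0
    for x, y in zip(ps, qs):
        cum_p += x
        cum_q += y
        if cum_p < cum_q - tol:
            return False
    return True

def shannon_entropy(p):
    """Shannon entropy in bits (zero-probability outcomes contribute nothing)."""
    return -sum(x * math.log2(x) for x in p if x > 0)

eigenstate = [1.0, 0.0, 0.0]      # P for an eigenstate of A
uniform = [1 / 3] * 3             # uniform distribution
print(majorizes(eigenstate, uniform))

# Two distributions that are incomparable: neither majorizes the other.
u, v = [0.6, 0.2, 0.2], [0.5, 0.4, 0.1]
print(majorizes(u, v), majorizes(v, u))
```

When `majorizes(P, Q)` holds, `shannon_entropy(P) <= shannon_entropy(Q)` follows, but the converse does not hold in general, which is why the criterion is stated in terms of majorization rather than entropy.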
The necessary part of Theorem 1 can be derived from a historic mathematical theorem 31 . It says that if P and Q are generated in sequential measurements, P must majorize Q. That is, measurement is a way of extracting information, and thus the entropy generally increases. The sufficient part is more technical and we leave the proof to the Methods. Our proof sets up an algorithm that generally finds $2^{d-1}$ different realizations of $\{|a'_i\rangle\}$ that close the tradeoff. There are two known extreme examples where there is no tradeoff. One is the case where |ψ〉 is an eigenstate of A. The other is the zero-noise and zero-disturbance (ZNZD) states defined in ref. 27. In the first example of eigenstates, P = {1, 0, …, 0}; for ZNZD states, P and Q are both the uniform distribution {1/d, 1/d, …}. So P majorizes Q in both cases. Here, by revealing the effect of quantum uncertainty behind the scenes, Theorem 1 links these isolated point-like examples together and extends the no-tradeoff conclusion to a much wider range of situations.
Theorem 1 does not conflict with the common understanding that non-commuting observables cannot be precisely measured simultaneously with a state-independent strategy, because the vanishing tradeoff between error and disturbance implied by Theorem 1 is conditioned on the input state. An inverse question may be of interest: given the real-life measurements along $\{|a'_i\rangle\}$, how many input states share the merits of zero error and zero disturbance? We leave it to the Discussion.
Quantification of the tradeoff. If $P \nsucc Q$, the error-disturbance tradeoff is switched on. To describe the tradeoff, we shall quantify $\mathrm{Err}_\psi(A)$ and $\mathrm{Dis}_\psi(B)$. Based on the above analysis, we can quantify them using any non-negative functional D(·, ·) that vanishes only when the two associated probability distributions are identical. The relative entropy $H(P\|Q) = \sum_m p_m \log (p_m/q_m)$, a central concept in information theory with wide applications, is one such example. Explicitly, we quantify the error and the disturbance as
$$\mathrm{Err}_\psi(A) = H(P\|P'), \qquad \mathrm{Dis}_\psi(B) = H(Q\|\tilde{Q}).$$
These information-theoretical definitions are independent of the unessential eigenvalues of the relevant observables, like the entropic uncertainty relation 6,7 . This is different from the Wasserstein 2-deviation used in the works of BLW. In the following, we first give the result for the 2-dimensional case.
Theorem 2. When d = 2 and $P \nsucc Q$, let us label the outcomes so that $p_1 \ge p_2$ and $q_1 \ge q_2$ and quantify error and disturbance by the relative entropy as above. Then
$$\mathrm{Err}_\psi(A) + \mathrm{Dis}_\psi(B) \ge D_{JS}(P, Q),$$
where the Jensen-Shannon divergence is $D_{JS}(P,Q) = H\!\left(P\,\middle\|\,\tfrac{P+Q}{2}\right) + H\!\left(Q\,\middle\|\,\tfrac{P+Q}{2}\right)$.
In Fig. 3, we illustrate both $D_{JS}(P, Q)$ and the exact bound obtained by numerical calculations. $D_{JS}(P, Q)$ is shown to be a valid lower bound which can be very close to the exact one. Theorem 2 answers the open question asked in ref. 27 of what can serve on the right-hand side of the state-relevant inequality. The Jensen-Shannon divergence is also a distance function of probability distributions that has been applied in bioinformatics, machine learning and social science 32 . Interestingly, the well-known Holevo bound 33 , the upper bound to the accessible information of a quantum state, is its quantum generalization.
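To make the d = 2 bound concrete, the sketch below evaluates a Jensen-Shannon-type quantity and compares it with a brute-force minimum. Several things here are assumptions rather than statements of the text: error and disturbance are taken as the relative entropies D(P‖P′) and D(Q‖Q̃), logarithms are base 2, and `js_sum` uses the un-halved sum D(P‖M) + D(Q‖M) with M = (P + Q)/2, which is twice the conventionally normalized Jensen-Shannon divergence. The grid search runs over free binary distributions with P′ majorizing Q̃, not over actual quantum measurements.

```python
import math

def rel_entropy(p, q):
    """Relative entropy D(p||q) in bits; infinite if q vanishes where p does not."""
    total = 0.0
    for x, y in zip(p, q):
        if x > 0:
            if y <= 0:
                return math.inf
            total += x * math.log2(x / y)
    return total

def js_sum(p, q):
    """D(p||m) + D(q||m) with m the midpoint (un-halved Jensen-Shannon form)."""
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return rel_entropy(p, m) + rel_entropy(q, m)

def relaxed_minimum(p, q, steps=400):
    """Grid-search minimum of D(p||p') + D(q||q~) over free binary
    distributions p', q~ subject to p' majorizing q~."""
    best = math.inf
    grid = [k / steps for k in range(1, steps)]
    for p1 in grid:
        err = rel_entropy(p, [p1, 1 - p1])
        for q1 in grid:
            if max(p1, 1 - p1) < max(q1, 1 - q1):
                continue  # p' does not majorize q~
            best = min(best, err + rel_entropy(q, [q1, 1 - q1]))
    return best

P, Q = [0.6, 0.4], [0.9, 0.1]     # example with P not majorizing Q
print(js_sum(P, Q), relaxed_minimum(P, Q))
```

For this example the midpoint (0.75, 0.25) lies on the grid, so the two printed numbers should agree up to rounding; with a different normalization of the divergence the comparison would differ by a constant factor.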
Next we shall give the strategy for constructing the inequalities, and then generalize Theorem 2 to the case of higher dimensions.

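Strategy for the lower bound. Taken as coordinates, all pairs $(P', \tilde{Q})$ as defined above compose a set we call $\mathcal{S}_2$. We can release the specified definitions and view P′ and $\tilde{Q}$ as free distributions. Then the pairs $(P', \tilde{Q})$ compose a larger set, $\mathcal{S}_0$. An analytical solution for the minimum over $\mathcal{S}_2$ seems hard to obtain and unwieldy in form because of the involved geometry of $\mathcal{S}_2$ embedded in $\mathcal{S}_0$. Instead, we define the set of pairs satisfying $P' \succ \tilde{Q}$ as $\mathcal{S}_1$. According to Horn's theorem 30 , we have $\mathcal{S}_2 \subset \mathcal{S}_1 \subset \mathcal{S}_0$, so the minimum of $\mathrm{Err}_\psi(A) + \mathrm{Dis}_\psi(B)$ over $\mathcal{S}_1$ is a lower bound for its values on $\mathcal{S}_2$. When $P \nsucc Q$, the point $(P', \tilde{Q}) = (P, Q)$ at which the sum vanishes lies outside $\mathcal{S}_1$, such that the minimum over $\mathcal{S}_1$ must be attained on the surface of $\mathcal{S}_1$. This surface consists of many faces, especially those defined by majorization. The geometry of $\mathcal{S}_1$ is much simpler and an analytical solution becomes reachable. So the strategy is to find the minimum over $\mathcal{S}_1$. An illustration of this strategy for the case d = 2 is given in Fig. 4. The strategy also works for many other quantifications of the difference between probability distributions, aside from the relative entropy.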
Physically speaking, the strategy is equivalent to replacing the ideal measurements of B (in the part of sequential measurements in Fig. 1) with an imprecise apparatus performing projective measurements in a random basis.
To treat higher dimensions, label the outcomes so that $p_1 \ge p_2 \ge \dots \ge p_d$ and $q_1 \ge q_2 \ge \dots \ge q_d$, then cut the subscript string 1 ~ d into short sections $(1 \sim j_1), (j_1 + 1 \sim j_2), \dots$. For each section, say the t-th one, $S_t$, we find the probabilities according to the subscripts in this section and take their sums, $P_{S_t} = \sum_{i \in S_t} p_i$ and $Q_{S_t} = \sum_{i \in S_t} q_i$. Then we say P majorizes Q by sections if the relation
$$\left\{\frac{p_i}{P_{S_t}}\right\}_{i \in S_t} \succ \left\{\frac{q_i}{Q_{S_t}}\right\}_{i \in S_t} \tag{4}$$
holds for all the short sections. (If a zero-valued probability makes a denominator vanish, take the limit from infinitesimal positive values.) Equation (4) says that P is relatively more certain than Q in each section. We use $P \succ_\sigma Q$ to denote this relation, where the index σ labels this particular partition of the subscript string. In addition, $P_\sigma = \{P_{S_t}\}$ and $Q_\sigma = \{Q_{S_t}\}$ are two distributions coarse-grained from P and Q by this partition.
Next, let us find all the coarsest partitions under which P majorizes Q by sections. We say a partition is coarser than another if the latter can be obtained from the former by additional cutting, such as cutting (2 ~ 9) into (2 ~ 4) (5 ~ 9). ("Coarser" defined in such a way is a partial order: we cannot say (1 ~ 5) (5 ~ 9) is coarser than (1 ~ 4) (4 ~ 7) (7 ~ 9).) Let us denote the set of all the coarsest partitions under which P majorizes Q by sections as $\Sigma_{PQ}$. No partition in $\Sigma_{PQ}$ can be obtained by further cutting another partition in it. $\Sigma_{PQ}$ is never an empty set, since P always majorizes Q by sections for the finest partition, where each short section consists of only one subscript. Then, according to each partition in $\Sigma_{PQ}$ (say, the one labeled by σ), we coarsen P and Q to obtain distributions in the way given above. With these preparations, we can now present our tradeoff relation.
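The bookkeeping of sections and coarsest partitions can be sketched in a few lines of Python. This is an illustration under stated assumptions: both distributions are already sorted in non-increasing order and strictly positive, a partition is encoded by its cut positions (a cut at position c separates index c − 1 from index c, counting from 0), and all function names are hypothetical. The brute-force enumeration over all 2^(d−1) cut sets is only practical for small d.

```python
from itertools import combinations

def majorizes(p, q, tol=1e-12):
    """True if p majorizes q (top-k partial sums of p dominate those of q)."""
    ps, qs = sorted(p, reverse=True), sorted(q, reverse=True)
    cum_p = cum_q = 0.0
    for x, y in zip(ps, qs):
        cum_p += x
        cum_q += y
        if cum_p < cum_q - tol:
            return False
    return True

def section_ranges(d, cuts):
    """Index ranges of the sections obtained by cutting 0..d-1 at the given positions."""
    edges = [0, *sorted(cuts), d]
    return [range(lo, hi) for lo, hi in zip(edges, edges[1:])]

def coarse_grain(dist, cuts):
    """Sum the probabilities inside each section."""
    return [sum(dist[i] for i in sec) for sec in section_ranges(len(dist), cuts)]

def majorizes_by_sections(p, q, cuts):
    """Check majorization of the renormalized distributions inside every section."""
    for sec in section_ranges(len(p), cuts):
        sp, sq = sum(p[i] for i in sec), sum(q[i] for i in sec)
        if not majorizes([p[i] / sp for i in sec], [q[i] / sq for i in sec]):
            return False
    return True

def coarsest_partitions(p, q):
    """Cut sets that satisfy majorization by sections and contain no smaller valid cut set."""
    d = len(p)
    valid = [set(c) for k in range(d) for c in combinations(range(1, d), k)
             if majorizes_by_sections(p, q, c)]
    return [sorted(c) for c in valid if not any(other < c for other in valid)]

P, Q = [0.4, 0.4, 0.2], [0.5, 0.3, 0.2]
for cuts in coarsest_partitions(P, Q):
    print(cuts, coarse_grain(P, cuts), coarse_grain(Q, cuts))
```

When P majorizes Q outright, the only coarsest partition is the one with no cuts, and both coarse-grained distributions collapse to a single outcome with probability one.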

Experiment
Being based on operational definitions, our theory can be experimentally tested in a straightforward manner. By comparison, the reported experimental tests of Ozawa's inequality (qubit cases) require the three-state method or the technique of weak measurement [11][12][13][14][15][16] . These experimental configurations depart from the original physical picture and are thus not so direct 34 .
For single-photon experiments, a weak coherent state measured by single-photon detectors is a good approximation to a true single-photon source 35 . Thus, as a photon source, we used a pulsed laser (788-nm central wavelength, 120-fs pulse width and 76-MHz repetition rate; Coherent Verdi-18/Mira-900F) and strongly attenuated its mean photon number to ~0.004 per pulse. We measure $\sigma_z$ with two single-photon detectors after a polarizing beam splitter (PBS, extinction ratio >500), which transmits horizontal and reflects vertical polarization (Fig. 2a). The $\sigma_x$ measurement, which corresponds to the polarization measurement in the ±45° linear polarization basis, is realized by inserting a half-wave plate (HWP) rotated at 22.5° before the PBS (Fig. 2b). The real-life measurements of A actually measure the observables $\sigma_{\vec{n}} = \vec{n} \cdot \vec{\sigma}$, where the direction $\vec{n}$ is a unit vector. They are realized by a PBS, which performs the $\sigma_z$ measurement, and groups of HWPs and quarter-wave plates (QWPs) that implement unitary rotations between the basis {|H〉, |V〉} and the eigenbasis of $\sigma_{\vec{n}}$ (Fig. 2c). The ratio of the counts of D1 + D1' to the total counts determines $p'_1$, and $\tilde{q}_1$ is determined by the ratio of the counts of D1' + D2 to the total counts. The distributions P′ and $\tilde{Q}$ are thus obtained.
In the experiment, a polarizer was used to prepare the photons in the state $|\psi(\theta)\rangle = \cos\frac{\theta}{2}\,|H\rangle + \sin\frac{\theta}{2}\,|V\rangle$ with $\theta \in [0, \pi/2]$. Therein, |H〉 and |V〉, the horizontal and vertical polarization states, are viewed as the eigenstates of $\sigma_z$, i.e., $\sigma_z|H\rangle = |H\rangle$ and $\sigma_z|V\rangle = -|V\rangle$.
We calculated the optimal directions $\vec{n}$ so that the measurements of $\sigma_{\vec{n}} = \vec{n} \cdot \vec{\sigma}$ could reach the minimal sum of $\mathrm{Err}_\psi(A)$ and $\mathrm{Dis}_\psi(B)$, where $\vec{\sigma} = (\sigma_x, \sigma_y, \sigma_z)$. These directions are used to carry out the imprecise measurements of A of which the experimental results are illustrated in Fig. 3a
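How such optimal directions might be located numerically can be sketched with a plain grid search over the Bloch sphere. The following is not the authors' procedure, which the text does not describe. It assumes A = σ_z and B = σ_x, relative-entropy error and disturbance with base-2 logarithms, the input state |ψ(θ)〉 with Bloch vector (sin θ, 0, cos θ), and a 1° grid; all function names are hypothetical.

```python
import math

def rel_entropy(p, q):
    """Relative entropy D(p||q) in bits; infinite if q vanishes where p does not."""
    total = 0.0
    for x, y in zip(p, q):
        if x > 0:
            if y <= 0:
                return math.inf
            total += x * math.log2(x / y)
    return total

def binary(x):
    """Two-outcome distribution with first probability x."""
    return [x, 1 - x]

def error_plus_disturbance(theta, polar, azimuth):
    """Err + Dis for input angle theta and measurement direction n(polar, azimuth)."""
    r = (math.sin(theta), 0.0, math.cos(theta))       # Bloch vector of the input
    n = (math.sin(polar) * math.cos(azimuth),
         math.sin(polar) * math.sin(azimuth),
         math.cos(polar))
    rn = sum(a * b for a, b in zip(r, n))
    p = binary((1 + r[2]) / 2)                        # ideal sigma_z statistics
    q = binary((1 + r[0]) / 2)                        # ideal sigma_x statistics
    p_real = binary((1 + rn) / 2)                     # statistics along n
    q_tilde = binary((1 + rn * n[0]) / 2)             # post-measurement vector is (r.n) n
    return rel_entropy(p, p_real) + rel_entropy(q, q_tilde)

def minimal_sum(theta, step_deg=1):
    """Smallest Err + Dis found on a polar/azimuth grid with the given spacing."""
    best = math.inf
    for i in range(0, 181, step_deg):
        for k in range(0, 360, step_deg):
            val = error_plus_disturbance(theta, math.radians(i), math.radians(k))
            best = min(best, val)
    return best

for theta in (math.pi / 6, math.pi / 3):
    print(theta, minimal_sum(theta))
```

Under these assumptions P majorizes Q exactly when θ ≤ π/4, so the search should return a value close to zero for θ = π/6 and a clearly positive value for θ = π/3.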

Discussion
If we focus on a sub-system of a pure-state system, mixed input states must be taken into account, and classical uncertainty then becomes involved. We show in the Methods that Theorem 2 and Theorem 3 are valid for all mixed input states, and that Theorem 1 is robust to depolarization noise, i.e., valid for at least the ensembles described by $\eta\,|\psi\rangle\langle\psi| + \frac{1-\eta}{d}\,I$, as well as for all qubit states, pure or mixed. The validity of Theorem 1 in more general situations is still an open question. Additionally, we considered only projective measurements. The more general POVMs are realized from projective measurements on enlarged systems. The correlation between the measured system and the ancilla established in the implementation of a POVM, and the consequent information flow between them, make the problem more complicated 27 . The state-dependent relation between error and disturbance thus still has rich structures to be discovered.
For some state |ψ〉, if $P \succ Q$ we can find $\{|a'_i\rangle\}$ so that $\mathrm{Err}_\psi(A) = \mathrm{Dis}_\psi(B) = 0$. Conversely, given this $\{|a'_i\rangle\}$, how many input states, mixed or pure, ensure zero error and zero disturbance? The d-dimensional quantum state ρ can be parameterized by a vector $\vec{r}$ through its expansion in the generators $\lambda_i$ of the Lie algebra of the group SU(d), which satisfy $\mathrm{tr}(\lambda_i \lambda_j) = 2\delta_{ij}$. $\vec{r}$ should satisfy some conditions to make ρ positive. $|a'_i\rangle$, $|a_i\rangle$ and $|b_j\rangle$ can also be represented in the same way (by $\vec{a}'_i$, $\vec{a}_i$ and $\vec{b}_j$, respectively). Then zero error and zero disturbance, i.e., $p'_i = p_i$ and $\tilde{q}_j = q_j$, are described by a series of equations for $\vec{r}$. This series contains 2(d − 1) independent equations while $\vec{r}$ has $d^2 - 1$ coordinates. Therefore, the solution set of $\vec{r}$ has at most $(d - 1)^2$ dimensions, and includes at least |ψ〉 and the mixed states $\eta\,|\psi\rangle\langle\psi| + \frac{1-\eta}{d}\,I$.
To conclude, we have revealed the hidden effect of quantum uncertainty on the error-disturbance tradeoff. Our results also shed new light on overcoming problems engendered by the back-action of quantum measurements in fields such as quantum control, quantum metrology and measurement-based quantum information protocols 36 . We hope this work can inspire more research on quantum uncertainty and measurement.

Methods
Proof of Theorem 1. The proposition that error and disturbance can be zero simultaneously is equivalent to the proposition that there is a unitary matrix which satisfies two conditions simultaneously, expressing $P' = P$ and $\tilde{Q} = Q$, respectively. Here we just need to show sufficiency. For B, we have the freedom to define the phases of its eigenstates so that the state |ψ〉 can be written as $|\psi\rangle = \sum_j \sqrt{q_j}\,|b_j\rangle$. We proceed by induction on the dimension. When d = 2 (for convenience, we assume $p_1 > p_2$; the case $p_1 = p_2$ is trivial), a unitary satisfying the two conditions can be written down explicitly in terms of the parameters φ, $\theta_1$ and $\theta_2$. Actually, we get a second solution by taking −φ, $-\theta_1$ and $-\theta_2$ in that matrix. Here, we do not require the normalization $\sum_i p_i = 1$. Then we assume the validity of the cases where the dimension equals d − 1. For the d-dimensional case, we have a unitary $U_1$ that acts only on the subspace belonging to $p_1$, $p_j$, such that it changes the diagonal elements $p_1$, $p_j$ to $q_1$, $p_1 + p_j - q_1$; the majorization between the remaining d − 1 elements must then be valid, so the induction assumption provides a unitary $U_2$ for them. Then $U_1 U_2$ is the unitary we want for the d-dimensional case.
Since we have two solutions when d = 2, it can be seen from the induction that generally $2^{d-1}$ solutions can be found.
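The inductive step can be followed at the level of diagonal entries alone. The sketch below tracks only how two-element transfers turn the sorted string P into Q; it does not construct the unitaries or their phases. The choice of the index j as the first entry not exceeding $q_1$ follows the standard textbook argument for majorization and is an assumption about the omitted details, as are the function name and the interpretation of the weight t as the squared modulus of a matrix element of the 2 × 2 rotation.

```python
def pairwise_transfers(p, q, tol=1e-12):
    """Reduce p to q by transfers inside pairs (p[0], p[j]).

    Precondition: p majorizes q and both have the same sum.
    Returns the list of steps (p_first, p_j, t) with q_first = t*p_first + (1-t)*p_j,
    plus the last remaining element of p and of q, which should coincide.
    """
    p = sorted(p, reverse=True)
    q = sorted(q, reverse=True)
    steps = []
    while len(p) > 1:
        q1 = q[0]
        # first index whose entry does not exceed q1, so that p[j] <= q1 <= p[0]
        j = next(k for k in range(1, len(p)) if p[k] <= q1 + tol)
        gap = p[0] - p[j]
        t = 1.0 if gap < tol else min(1.0, max(0.0, (q1 - p[j]) / gap))
        steps.append((p[0], p[j], t))
        rest = p[0] + p[j] - q1          # what is left in the pair after producing q1
        p = sorted(p[1:j] + [rest] + p[j + 1:], reverse=True)
        q = q[1:]
    return steps, p[0], q[0]

steps, leftover, target = pairwise_transfers([0.7, 0.2, 0.1], [0.5, 0.3, 0.2])
print(steps, leftover, target)
```

Each step fixes one entry of Q and leaves a problem of one dimension less, which mirrors the induction from d − 1 to d; the two sign choices available at every step are what would give the $2^{d-1}$ solutions.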
Proof of Theorems 2 and 3. Theorem 2 is covered by Theorem 3, thus we give only the proof of the latter. For convenience, we make use of the freedom of relabeling to assume that $p_1 \ge p_2 \ge \dots \ge p_d$, and the same applies for the distribution Q. Then P′ and $\tilde{Q}$, which are also labeled in such an order, satisfy the string of relations
$$\sum_{i=1}^{k} p'_i \ge \sum_{i=1}^{k} \tilde{q}_i, \qquad k = 1, \dots, d-1, \tag{15}$$
since this ordering gives the minimum to $\mathrm{Err}_\psi(A) + \mathrm{Dis}_\psi(B)$. The proof is omitted here. Without loss of generality, we assume that the elements in P and Q are all positive. Now we use n to denote the dimension of the manifold, i.e., n = 2(d − 1). An (n − j)-dimensional surface of $\mathcal{S}_1$ is produced by taking equality in j relations of the above string, accompanied by the restriction that $P' \succ \tilde{Q}$. Now let us consider the minimum value on an (n − j)-dimensional surface (j ranges from 1 to d − 1). The j equations cut the subscript string 1 ~ d into j + 1 sections such that the sum of $p'_i$ and the sum of $\tilde{q}_i$ within each section are equal. We use $S_t$ to denote the different sections and define the notation $P'_{S_t} = \sum_{i \in S_t} p'_i$, and do the same for the distributions P, Q and $\tilde{Q}$. Then we use the Lagrange multiplier method to search for the extreme value, where the equality in each section already implies that $\sum_i \tilde{q}_i = 1$. A simple calculation shows that the minimum is obtained at the point
$$p'_i = \frac{P_{S_t} + Q_{S_t}}{2 P_{S_t}}\,p_i, \qquad \tilde{q}_i = \frac{P_{S_t} + Q_{S_t}}{2 Q_{S_t}}\,q_i \tag{18}$$
if the subscript "i" is in the t-th section, where the multipliers are $\lambda_p = 2$ and $\lambda_t = -2 Q_{S_t}/(P_{S_t} + Q_{S_t})$. To write down the minimum value, we define two distributions obtained from P, Q by coarse graining, $P_\sigma = \{P_{S_t}\}$ and $Q_\sigma = \{Q_{S_t}\}$, in terms of which the minimum value is $D_{JS}(P_\sigma, Q_\sigma)$.
Now we have to check whether this point is located on the (n − j)-dimensional surface of $\mathcal{S}_1$, i.e., whether $(P', \tilde{Q})$ given by Eq. (18) satisfies the requirement that $P' \succ \tilde{Q}$. Since $P'_{S_t} = \tilde{Q}_{S_t}$, $P' \succ \tilde{Q}$ holds if and only if P and Q satisfy the following condition within every section $S_t$: after re-normalizing $P_{S_t}$ and $Q_{S_t}$ to a common factor, we must have $P \succ Q$ in each section. More rigorously, $P' \succ \tilde{Q}$ if and only if Eq. (4) is valid for all these sections. This is the concept of "majorization by sections" introduced above.
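The stationary point of the constrained minimization can be verified numerically. The sketch below assumes that error and disturbance are the relative entropies D(P‖P′) and D(Q‖Q̃) (base-2 logarithms), rescales the entries of P and Q inside every section to the common weight $(P_{S_t} + Q_{S_t})/2$, and checks that the resulting sum equals the un-halved Jensen-Shannon form D(·‖M) + D(·‖M) of the coarse-grained distributions. Function names, the cut encoding and the example distributions are hypothetical.

```python
import math

def rel_entropy(p, q):
    """Relative entropy D(p||q) in bits for strictly positive q."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)

def js_sum(p, q):
    """D(p||m) + D(q||m) with m the midpoint (un-halved Jensen-Shannon form)."""
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return rel_entropy(p, m) + rel_entropy(q, m)

def section_ranges(d, cuts):
    """Index ranges of the sections obtained by cutting 0..d-1 at the given positions."""
    edges = [0, *sorted(cuts), d]
    return [range(lo, hi) for lo, hi in zip(edges, edges[1:])]

def stationary_point(p, q, cuts):
    """Rescale p and q inside each section so that both carry the mean section weight."""
    p_real, q_tilde = [], []
    for sec in section_ranges(len(p), cuts):
        sp, sq = sum(p[i] for i in sec), sum(q[i] for i in sec)
        mean = (sp + sq) / 2
        p_real += [p[i] * mean / sp for i in sec]
        q_tilde += [q[i] * mean / sq for i in sec]
    return p_real, q_tilde

P, Q, cuts = [0.4, 0.4, 0.2], [0.5, 0.3, 0.2], [1]
P_real, Q_tilde = stationary_point(P, Q, cuts)
value = rel_entropy(P, P_real) + rel_entropy(Q, Q_tilde)
P_cg = [sum(P[i] for i in sec) for sec in section_ranges(len(P), cuts)]
Q_cg = [sum(Q[i] for i in sec) for sec in section_ranges(len(Q), cuts)]
print(value, js_sum(P_cg, Q_cg))
```

The agreement of the two printed numbers reflects that, inside a section, the ratio of original to rescaled probabilities is constant, so only the section weights enter the final value.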
If the above condition is not satisfied, the point defined by Eq. (18) lies outside $\mathcal{S}_1$, so we should consider the edges of the (n − j)-dimensional surface, i.e., we should add another equation from Eq. (15) and study the (n − j − 1)-dimensional case. If the above condition is satisfied, a finer partition, i.e., adding extra equations from Eq. (15), will not bring a lower value.
So we have to find the family of all the coarsest partitions of the subscript string (no partition in this family is a refinement of another one in it) under which P majorizes Q by sections. Then we calculate the Jensen-Shannon divergence corresponding to each of these partitions, and the minimum of these values is just the minimum over $\mathcal{S}_1$. With the notation $\Sigma_{PQ}$, the above analysis leads to Theorem 3.
One may wonder whether the solution given by Eq. (18) still respects the assumed ordering of P′ and $\tilde{Q}$. Actually, we do not need to care. This is because $P \succ_\sigma Q$ ensures $P' \succ \tilde{Q}$, such that the solution is in $\mathcal{S}_1$. Thus all the values derived from $\Sigma_{PQ}$ can be reached in $\mathcal{S}_1$, and meanwhile the real minimum over $\mathcal{S}_1$ must be linked with one partition in $\Sigma_{PQ}$. So minimizing over $\Sigma_{PQ}$ will always give the minimum we want. The second part of Theorem 3, which gives the sufficient condition for the situations in which $D_{JS}(P, Q)$ serves as the lower bound, can be checked straightforwardly. Theorem 2 is covered by this case.
Generalization to mixed input states. Mixed states emerge by statistically ignoring some sub-systems and losing some classical information, i.e., by including classical uncertainty in the scenario. Theorem 3, the inequality for the tradeoff, is still valid because the relation $\mathcal{S}_2 \subset \mathcal{S}_1 \subset \mathcal{S}_0$ holds for mixed input states. Actually, one can do more analysis in $\mathcal{S}_1$ to get a tighter bound, since P′ is also majorized by the string of eigenvalues of the input ρ. Other versions of Theorem 3 could be derived with other measures of distance between probability distributions. The relative entropy can be infinite; a measure with a finite upper bound can be used if one wants to derive a BLW-type theory from ours.
As to Theorem 1, let us give another proof for qubits. The density matrix of the input state, A, B and the observable of the real-life measurement, $O_A$, can be represented by four vectors $\vec{r}$, $\vec{a}$, $\vec{b}$ and $\vec{a}'$ in the Bloch sphere. The probability distributions are in one-to-one correspondence with inner products such as $\vec{r} \cdot \vec{a}$ and $\vec{r} \cdot \vec{b}$. These can be assumed to be positive owing to the freedom of relabeling the eigenstates of A and B. Suppose the angle between $\vec{r}$ and $\vec{a}$ is $\theta_a$ and the angle between $\vec{r}$ and $\vec{b}$ is $\theta_b$. Now $P \succ Q$ implies that $\theta_a \le \theta_b$. $\vec{a}'$ can be obtained by rotating $\vec{a}$ around $\vec{r}$, such that P′ = P. Thus the angle ξ between $\vec{a}'$ and $\vec{b}$ will range from $\theta_b - \theta_a$ to $\theta_a + \theta_b$. Then there must be a case where we have $\cos\theta_a \cos\xi = \cos\theta_b$, which then leads to $\tilde{Q} = Q$. It can be seen from the proof that what matters is not the norm of $\vec{r}$, but rather its direction and the angles between the vectors representing |ψ〉, $|a'_i\rangle$ and $|b_j\rangle$. For higher dimensions, projectors of pure states can also be represented by coherent vectors. We have proved Theorem 1 for pure states. Drawing an analogy with the qubit case, we conclude that Theorem 1 is valid for mixed states of the form $\eta\,|\psi\rangle\langle\psi| + \frac{1-\eta}{d}\,I$, which have parallel but shorter coherent vectors compared with |ψ〉. In other words, Theorem 1 is robust under depolarizing decoherence. The validity of Theorem 1 for more general mixed states is still open.
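For a concrete instance of the qubit argument, the direction of the real-life measurement can be written in closed form. The sketch below assumes A = σ_z, B = σ_x and an input Bloch vector η(sin θ, 0, cos θ) with θ ∈ [0, π/2); the closed-form direction is obtained by solving the two conditions P′ = P and Q̃ = Q for this special geometry and is not quoted from the text. Function names are hypothetical.

```python
import math

def zero_tradeoff_direction(theta, sign=1):
    """Unit vector n with P' = P and Q~ = Q, or None when tan(theta) > 1,
    i.e. when P does not majorize Q. `sign` selects one of the two solutions."""
    t = math.tan(theta)
    if t > 1:
        return None
    return (t, sign * t * math.sqrt(1 - t * t), 1 - t * t)

def first_outcome_probabilities(theta, n, eta=1.0):
    """Return (p1, p1', q1, q1~) for the Bloch vector eta * (sin theta, 0, cos theta)."""
    r = (eta * math.sin(theta), 0.0, eta * math.cos(theta))
    rn = sum(a * b for a, b in zip(r, n))
    p1 = (1 + r[2]) / 2              # ideal sigma_z measurement
    p1_real = (1 + rn) / 2           # measurement along n
    q1 = (1 + r[0]) / 2              # ideal sigma_x measurement
    q1_tilde = (1 + rn * n[0]) / 2   # sigma_x after the measurement along n
    return p1, p1_real, q1, q1_tilde

theta = math.pi / 6
for eta in (1.0, 0.5):               # pure state and a depolarized version
    for sign in (1, -1):             # the two solutions for d = 2
        n = zero_tradeoff_direction(theta, sign)
        print(eta, sign, first_outcome_probabilities(theta, n, eta))
```

The same direction works for every η, which illustrates why shortening the Bloch vector without changing its direction leaves the conclusion intact.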