Quantum metrology with imperfect measurements

The impact of measurement imperfections on quantum metrology protocols has not been approached in a systematic manner so far. In this work, we tackle this issue by generalising firstly the notion of quantum Fisher information to account for noisy detection, and propose tractable methods allowing for its approximate evaluation. We then show that in canonical scenarios involving N probes with local measurements undergoing readout noise, the optimal sensitivity depends crucially on the control operations allowed to counterbalance the measurement imperfections—with global control operations, the ideal sensitivity (e.g., the Heisenberg scaling) can always be recovered in the asymptotic N limit, while with local control operations the quantum-enhancement of sensitivity is constrained to a constant factor. We illustrate our findings with an example of NV-centre magnetometry, as well as schemes involving spin-1/2 probes with bit-flip errors affecting their two-outcome measurements, for which we find the input states and control unitary operations sufficient to attain the ultimate asymptotic precision.


Supplementary Note 2. Properties of γM
In this section we discuss several properties of γ M .
a. Classical interpretation of γ P . For commuting {M x } x , namely when the imperfect measurement is specified by the noisy detection channel P ∼ {p(x|i)} such that ∀ x : M x = i p(x|i) Π i , γ P ≡ γ M has a classical interpretation. Note that in this case Eq. (6) reads: where a, b are real normalised, orthogonal vectors: a·b = 0, |a| 2 = |b| 2 = 1. It is simple to see that a, b can be assumed to be real vectors. In order to gain further intuition, we can define p i := a 2 i and a 'derivative' dp i := a i b i such that then a i = √ p i and b i = 2d( √ p i ). As a result, it can be seen that the constraints of |a| 2 = 1, a · b = 0, |b| 2 = 1 are equivalent to the constraints: i p i = 1, i dp i = 0, i dp 2 i pi = 1. In nuce, {p i } i is a probability distribution, {dp i } i is the derivative vector of the probability distribution and the constraint of i dp 2 i pi = 1 is a normalization constraint: the original FI equals to 1. In this new notation, Eq. (8) reads: with the constraint of i dp 2 i pi = 1. Hence, γ P can interpreted as the optimal noisy classical FI, optimised over all {p i } i , {dp i } i with the original FI of 1.
b. Data processing inequality γ P2•P1 ≤ γ P1 . Let us consider again the setting of γ P in Eq. (8) specified by the noisy detection channel, P, that constitutes a stochastic map with entries [P] xi = p(x|i) and yields the imperfect measurement {M x = i [P] xi Π i } x . Now, considering a composition of any two noisy detection channels P 2 •P 1 as an effective stochastic map, the corresponding imperfect measurement simply reads {M x = i [P 2 P 1 ] xi Π i } x . We demonstrate that the resulting γ-coefficient (8) obtained via such a composition must be contractive, i.e. γ P2•P1 ≤ γ P1 .
c. Monotonically increasing with the number of probes, i.e. ∀ M : γ M ≤ γ M⊗I ≤ γ M⊗M . Let us first prove that γ M⊗I ≤ γ M⊗M for any imperfect measurement M, which follows from convexity. In general, for any two positive semidefinite operators M 1 , M 2 ≥ 0: which follows from the convexity of the function x 2 y , where we take x i = Re ξ ⊥ |M i |ξ , y i = ξ|M i |ξ . This convexity implies: and, hence, γ M⊗I ≤ γ M⊗M . Now, γ M ≤ γ M⊗I is assured by the fact that the maximisation over all orthogonal |ξ and |ξ ⊥ in Eq. (6) defining 2-dimensional subspace in the support of M, is trivially contained within the maximisation over all orthogonal |ξ , |ξ ⊥ lying in the support of M ⊗ I. In particular, for any two |ξ and |ξ ⊥ one may choose |ξ = |ξ |χ and |ξ ⊥ = |ξ ⊥ |χ with any |χ , so that d. Sufficient condition for γ M = 1. We mention in the main text that a sufficient condition for a perfect QFI, i.e. γ M = 1, is perfect distinguishability between two states: there exist two orthogonal states, |ξ and |ξ ⊥ , such that for every M x either M x |ξ = 0 or M x |ξ ⊥ = 0. In order to see this, observe from the Cauchy-Schwarz inequality (7) It is straightforward to see that given perfect distinguishability this condition is satisfied, with the proportionality constant being exactly zero for all x.
To confirm that we should always choose projective measurement before the noisy detection channel whenever possible, i.e., m = s * = 1, we put in s * = 1/y * into g x (y * ) for an explicit convex function of y * . One can then verify readily that the roots are now not greater than 1, and therefore, the optimal choice of y * is 1. Finally, as all the above optimizations hold for all f x , it follows that they apply to the total FI, and thereforē Note that as Eq. (14) depends on ϑ, which is an angle defined relative to r(θ), it follows that in this case here we have a freedom to fix either the measurement or the input state and optimize over the other, as long as both are restricted to the equatorial plane. In particular, we may fix ρ = |+ y + y |, i.e., r 0 = e y , and optimize over {Π i, φ = |Π i Π i |} with |Π 1,2 = (|0 ± e iφ |1 )/ √ 2, i.e., m = cos φ e x + sin φ e y . Equivalently, we may optimize over ρ = |ψ ψ| with |ψ = e iφσz/2 |+ y = (|0 + ie iφ |1 )/ √ 2, i.e., r 0 = − sin φ e x + cos φ e y , with fixed Π 1, φ = |+ +|, Π 2, φ = |− −|, i.e., m = e x . Both give the same expression with ϑ = θ+φ−π/2 in Eq. (14), which turns into Eq. (9) in the main text.
Supplementary Note 4. Relations of quantum metrology with imperfect measurements to the "standard" setting of noisy parameter encoding In this section, we discuss in detail the differences and relation between our quantum metrology with imperfect measurement model and the "standard" noisy metrology model. In the latter, the measurement is taken to be perfect, and the noise is described by a CPTP channel that acts before the measurement stage.
Supplementary Figure 1. Different models for noisy quantum metrology. (a) Quantum metrology with imperfect measurement M. V φ is some control unitary operation for optimizing the measurement basis. (b) Equivalent picture of (a), with Π being a projective measurement which can be chosen and fixed arbitrarily, and Λ is a CPTP channel. (c) Quantum metrology with "standard" noisy parameter encoding, where the noise, described by a CPTP channel Λ, is independent of the choice of the measurement settings φ.
Firstly, let us denote B(H d ) as the set of bounded linear operators on the Hilbert space H d with dimension d. Then, recall that our quantum metrology with imperfect measurement on the physical level is as depicted in Suppl. Fig. 1(a): an encoded qudit described by the state ρ(θ) ∈ B(H d ), undergoes a control unitary opera- Here, all the M x and the V φ are elements of B(H d ) as well. The probability of getting the outcome x is given by the Born's rule, We can also think of it in a mathematically equivalent picture, as in Suppl. Fig. 1(b), where we describe the imperfect measurement M ∼ {M x } |X| x=1 rather by a noisy CPTP channel Λ that is followed by a perfect projective measurement Π ∼ {Π x = |x x|} |X| x=1 . That is, we can always find some quantum (CPTP) map Λ ∼ {k } that allows for the conjugate-map decomposition of M, i.e.: or, rewriting the above in a compact way: Note that: (i) the channel Λ : B(H d ) → B(H |X| ) acts on operators of d dimension and outputs operators of |X| dimension; (ii) given M, there are multiple Λ (and Π) satisfying Eq. (16); and (iii) Λ is defined independently of the control unitary V φ -one can consider the l.h.s. of Eq. (15) as V † φ M x V φ instead, and define φ-dependent Λ, but doing so is neither necessary nor helpful.
One can prove that the condition (15), or (16), can always be satisfied, by considering as an example the quantum-classical channel that we used as well in the main text (Lemma 3 and Corollary 2) [1], namely, Another example of a CPTP map that also always satisfies Eq. (16) is whose rank is now d, rather than d · |X| in Eq. (17). Note that, both the orthogonal sets of basis ket {|x } x and bra { i|} i in Eqs. (17) and (18) can essentially be understood as flags, and can be chosen arbitrarily. Importantly, despite this equivalent picture in Fig.1(b), it is however still not the same as the "standard" noisy metrology scheme as depicted in Fig.1(c), where the noise, described by a CPTP map Λ, acts independently of the choice of the control unitary V φ . The crucial difference is that, in the former the control operation V φ is applied before the channel Λ, whereas in the latter it is applied after the channel Λ. Then, unless V φ commutes with Λ, which is hardly ever true, the two noise model will not be equivalent.
Let us still study more closely the two models from the perspective of parameter estimation, whereby the focus is on the QFI and channel QFI (perfect versus imperfect). Denote F [ρ(θ), M] as the classical Fisher information for θ with the state ρ(θ) and measurement M ∼ {M x } x , i.e., Tr{ρ(θ)Mx} . Then, for the imperfect measurement model, the imperfect QFI is given by (see Eq. (5) in the main text): Note that the optimal control unitary V opt ∼ {V opt } generally depends on the given ρ(θ), though for simplicity of the notation we will not write out this dependence explicitly for now. It then follows that where W is a unitary operator in B(H |X| ), and so we may formally upper bound the imperfect QFI by the QFI of (Λ • V opt )[ρ(θ)], which can be interpreted as a state that undergoes a "standard" noisy channel Λ•V opt . Moreover, given the imperfect measurement M, one can further optimize over different possible Λ that satisfies Eq. (16), and get Despite the established formal relations Eqs. (20) and (21), note that Λ • V opt is defined using the knowledge about optimal control unitary. However, if we had known V opt , we would have already in fact obtained F (im) , and there is no need for the upper bounds. In other words, the formal bounds Eqs. (20) and (21) are not that meaningful in practice.

A. Proof of the Observation 1 in Methods
Still, there is a special case worth mentioning, where we can indeed meaningfully evaluate a QFI-based upper bound without solving exactly for the V opt . Suppose that from the symmetry of the estimation problem we know that the optimal control operation must also carry it, so that its optimisation may be restricted to elements in some compact group G, i.e., V opt ∈ G. Then, if Λ satisfying Eq. (16) is known to be G-covariant, i.e., where W g is some unitary representation of G in H |X| -in particular, for W opt correspondingly to V opt in Eq. (22), we have from Eq. (20) that where the equality in (23) follows from the fact that QFI is invariant under parameter-independent unitary transformation.
Extension to channel QFI is straightforward. Let us denote the (perfect) parameter-encoding channel as E θ , such that ρ(θ) = E θ [ρ] for the input probe state ρ, which we will eventually optimize over. For the imperfect measurement case, we have by definition (see Eq. (5) in the main text): where V opt and ρ opt are respectively the optimal control unitary and input state, andĖ θ = ∂ θ E θ is the derivative of the encoding channel w.r.t. the parameter, such that for . Similarly to Eq. (20), one then obtains where σ is some state in H d , W is a unitary operator in B(H |X| ), andF is the channel QFI. Again, although Eq. (25) now provides a formal relation between the imperfect channel QFI and the channel QFI of a "standard" noisy encoding (Λ • V opt • E θ ), it requires knowledge of the optimal control unitary V opt (which implicitly depends on the optimal state ρ opt ), and is hence in general not immediately applicable.
Still, when we know that, thanks to some symmetry of the problem, the optimisation over control V φ can be restricted to elements of some compact group G, meaningful upper bound on the imperfect channel QFI can be formulated. Firstly, following directly from the Observation 1, in case the conjugate map Λ satisfies not only Eq. (16) but also the G-covariant condition Eq. (22), we can immediately conclude by maximising Eq. (23) over the input probe-states thatF

B. Proof of the Observation 2 in Methods
Secondly, suppose that the encoding channel E θ , as well as its derivate,Ė θ = ∂ θ E θ , are both G-covariant locally around the parameter value θ, i.e., where W g is some unitary representation of G in H d -in particular, W opt correspondingly for V opt . In this case, from (25), where the equality in (27) follows from the fact that channel QFI is invariant under parameter-independent unitary transformation on the input probe state. Note that it is necessary to include the local G-covariant condition forĖ θ here, as FI is not a function of the state alone but also its derivative, c.f. explicitly Eq. (24) and Eq. (28) below.
The QFI is defined as a function of a quantum state ρ ≡ ρ(θ) and its derivativeρ ≡ ∂ θ ρ(θ), as follows [2]: Equivalently, it may be expressed as the maximisation of the error propagation formula over all the quantum observables, i.e., Hermitian operators O = O † , as [3]: which is always maximised by O opt = L − Tr{ρL} with L being the SLD operator defined implicitly in Eq. (28).
However, the fraction in Eq. (29) can always be rewritten by introducing another maximisation, i.e.: with the maximum occurring at α opt = | Ȯ | ∆ 2 O . Hence, we may write again Eq. (29) as while noticing that the two maximisations can be recast into one by defining O := αO. Moreover, for any O above we may define a shifted operator X := O − O , so that substituting O = X + Tr{ρO } into Eq. (32), we obtain [4]: where the maximum is now performed over all Hermitian operators X = X † . We have also dropped the absolute value, as Ẋ = Tr{ρX} = ∂ θ Tr{ρX} is real, while the first (quadratic) term above is unaffected by the X → −X transformation-and so must be the maximal value attained in Eq. (34). Let us note that Eq. (34) constitutes a valid lower bound on the QFI for any fixed X, i.e.
while the optimal X opt , yielding the maximum in Eq. (34), is related to the actual optimal observable O opt in Eq. (29) via Hence, by substituting further O opt = L − Tr{ρL}, we can explicitly relate X opt to the SLD and the QFI, i.e.: Let us consider for our purposes the case of unitary parameter encoding, i.e. E θ = U θ ∼ {e −iθH } so thaṫ ρ = i[ρ, H], but the following analysis can be directly generalised to allow for arbitrary E θ andĖ θ . Then, using the expression (28) for the QFI, we may rewrite the potentially valid-e.g. given the G-covariance (22) or (26)upper bound on the imperfect QFIF (im) in Eq. (27) for the encoding U θ = e −iθH as: Moreover, we may now use Eq. (34) to further re-express the above channel QFI (for Λ • U θ ) as (42) As a result, we can formulate a numerical 'seesaw' algorithm that allows us to compute the channel QFI (41) by exploiting Eq. (42), as follows [4]: 1. Select (randomly) a starting state σ 0 and calculate the corresponding QFI F as well as the SLD L it would lead to in Eq. (41), i.e. without performing the maximisation over the input states σ.
2. Use the obtained F and L to compute the optimal operator X 0 = X opt according to Eq. (38), which then maximises Eq. (42) for the fixed state σ 0 .
3. Maximise the expression (42) over the input states with the operator X 0 being now fixed, in order to determine the best state σ 1 that then yields the tightest lower bound (35) on QFI for X 0 .
4. Return to step 1 and use σ 1 as the new starting state.
The above procedure is computationally efficient, as F and L in step 1 are obtained solving a linear programme, as in Eq. (28), while finding the optimal input state in step 3 corresponds to solving the maximal eigenvalue of a Hermitian operator defined within (. . . ) of Eq. (42). Although the convergence of the algorithm is generally assured [4], even if its rate is slow, at any stage it yields a valid lower bound (35) on the channel QFI (42). Recall that Λ in Eq. (27), and so in Eq. (42), corresponds to some valid conjugate-map decomposition of a given imperfect measurement M, for which it must fulfil the condition (16). As we now show, the above 'seesaw' formulation allows naturally to incorporate in Eq. (42) also the maximisation over all such conjugate maps, i.e. all quantum (CPTP) channels Λ satisfying M = Λ † [Π] for some projective measurement Π.
Let us define the corresponding maximum as where have substituted forF[H, Λ] according to Eq. (41). Note that in the above expression any projective measurement, Π, can be used as a reference, becausē which we simplify further by denoting the action of any CPTP map Λ via its Choi-Jamio lkowski (CJ) state [5].
In particular, for any input state σ it is true that Λ[σ] = Tr B Λ (I ⊗ σ T ) , where the CJ state of the map Λ is defined as Λ := Λ⊗I [|I I|]. Here, any square matrix A defines a bipartite state |A = ij A ij |i, j AB = ij A ij |i A |j B , so that |I = i |i, i AB is the (unnormalised) maximally entangled state. Then, it is straightforward to prove that the CJ state for the conjugate map of Λ, i.e. Λ † , is nothing but Λ † = S AB * Λ S AB , where S AB is the swap operator such that S AB |ψ A |φ B = |φ A |ψ B for any states |φ and |ψ . Consequently, the action of Λ † corresponds to Finally, using Eq. (47) to replace the maximisation over quantum maps Λ in Eq. (45) by the corresponding CJ states Λ , we obtain which we evaluate by adding to the aforementioned 'seesaw' algorithm one more step in which we maximise over Λ ≥ 0 (given the linear constraints to reproduce the elements of the imperfect measurement), while fixing the input state, σ, and the Hermitian operator, X.
Consider also the input probe state in the equatorial Bloch plane, such that for example the encoded state is ρ(θ) = |ψ(θ) ψ(θ)|, |ψ(θ) = 1 √ 2 (|0 + e iθ |1 ). As we have established above in Supplementary Note 3, the optimal control unitary in this case must have the structure V opt = e iϕoptσz for some ϕ opt and, hence, belongs to the U(1) group with unitary representation: V g ∼ {e igσz } and g ∈ [0, 2π). Consequently, when Λ can be chosen to not only fulfil (16) but also be phase-covariant [6][7][8], i.e. ∀ g : [Λ, V g ] = 0, the G-covariance condition (22) is satisfied. For the special case of symmetric mixing with p = q and any 0 ≤ p ≤ 1, one can show that the dephasing channel fulfilling (16) Fig. 2 for p = q = 0.9. For asymmetric mixing p = q, however, one can show that for a wide range of p and q any channel Λ satisfying Eq. (16) cannot exhibit phase-covariance. In particular, Λ can only be phase-covariant when there exists φ ∈ [0, 2π) such that | cos φ| ≥ |δ| and 1 ≥ 4η 2 where η := p + q − 1 and δ := p − q as in the main text, and the above conditions originate from the CP-constraints on any phase-covariant channel (see e.g. Ref. [8]). Nonetheless, note that if conditions (49) q (apart from special q = p), given the value of p = 0.9 chosen.
On the contrary, within the range of p = q in Suppl. Fig. 2 that yield imperfect measurements M whose valid conjugate-map decompositions in Eq. (16) may not exhibit phase-covariance (or more generally, Gcovariance)-range of q without any solution marked in blue-the Observation 1 is not applicable and the upper bound (23) can no longer be taken for granted. Indeed, as demonstrated by the red solid line in Suppl. Fig. 2 for the quantum-classical channel (15), which yields a valid conjugate-map decomposition for any q, its QFI no longer upper-bounds the imperfect QFI with, in fact, . It is so, as F[Λ[ρ(θ)]] equals then the classical FI for the imperfect measurement M with no control (V φ = 1) and, hence, by definition is always smaller or equal to F (im) . Interestingly, in this particular qubit example with asymmetric bit-flip noise, F[Λ[ρ(θ)]] coincides with the imperfect QFI, F (im) , when Λ is chosen to be the channel defined in Eq. (18). The same exemplary model can also be used to illustrate the applicability of the Observation 2 discussed in Supplementary Note 4 A, which applies rather to the imperfect channel QFI,F (im) , incorporating maximisation over the input states in Eqs. (41) and (42). As explained in the main text, see particularly Fig. 2 therein, when allowing for arbitrary control any input state lying on the equator of the Bloch sphere is optimal, so that F (im) = F (im) for the input |+ and the corresponding curve (black solid line) in Suppl. Fig. 3 is just the same as the one for F (im) in Suppl. Fig. 2 (given by Eq. (11) in the main text). Now, as the phase encoding e iθσz/2 commutes with the group of control unitaries e iϕσz for any ϕ, the G-covariance condition (26) is satisfied instead, and at the level of the imperfect channel QFI the inequalitȳ F (im) ≤F[(Λ • E θ )] holds for any channel Λ satisfying Eq. (16).
In Fig. 3, with help of the 'seesaw' algorithm introduced in Supplementary Note 4 C, we computeF[(Λ • E θ )] in Eq. (45) with Λ representing: phase-covariant channels, quantum-classical channel (17) and the channel (18). In the first case, as any state from the equator is still optimal thanks to the phase-covariance property, we can equivalently utilise Eq. (50) and recover the same upper bounds as in Suppl. Fig. 2 (blue oval and dot). However, in the latter two cases, we find the resulting upper bounds to coincide with the imperfect channel QFI, i.e.F (im) =F[(Λ • E θ )] with Λ of Eq. (17) or Eq. (18). Note that this contrasts the case of quantum-classical channel in Suppl. Fig. 2, which thanks to the optimisation of the input state (for each value of q) yields now a valid "upper-bound"-actually reproducing the exact F (im) .
Furthermore, we resort to the generalisation of the 'seesaw' algorithm that includes maximisation over all possible conjugate-map decompositions of the imperfect measurement, as discussed in Supplementary Note 4 D. In this way, we obtainF[(Λ • E θ )] maximised over all Λ satisfying Eq. (16) for each bit-flip error values p and q-orange line in Suppl. Fig. 3. We observe that such an upper is not even convex as the true imperfect channel QFI,F (im) (black line) and, hence, tight only at the extremal values of q = 0 or 1. In this example, we demonstrate a scenario where now the G-covariance condition (26) is not satisfied and, hence, also Observation 2 is not applicable. We consider generalisation of the above phase-sensing example to the case of two (N = 2) qubits, in which each qubit probe state undergoes the same encoding and imperfect measurement. For our purposes, we consider only the case of symmetric (p = q) bit-flip as the detection noise, as it is sufficient to explore the possibility of the optimal input probe state now being entangled and, crucially, the control operation V φ now acting globally on both qubits, , obtained by determining numerically for each p = q the optimal control unitary, Vopt, which now crucially acts globally on both qubits and may not commute with neither the conjugate-map decompositions considered, Λ ⊗2 , nor the encoding, E ⊗2 θ . As result, both the G-covariance conditions (22) and (26) need not be satisfied. This is confirmed by observing that Eq. (27) is explicitly invalidated, i.e.F (im) ≤F[(Λ • E θ ) ⊗2 ], for conjugate-map decompositions with Λ corresponding to either: the dephasing channel (green), quantum-classical channel (17) (red ) or the channel defined in Eq. (18) (blue)-observe all the corresponding curves lying below the one ofF (im) . On the other hand, the 'seesaw' algorithm allows us to find also the local and global channelsΛ ⊗2 and Λ (2) , respectively, that constitute valid conjugate-map decompositions (16) θ ] = 4 (black dots), i.e. the perfect quantum channel QFI. This emphasises further that, already for N = 2, results of quantum metrology with noisy encoding may not be directly used to estimate performance with imperfect measurements.
First, from Lemma 1, we know that the imperfect measurement does not change the optimal input state, which is thus: |Φ + = (|00 + |11 )/ √ 2. Then, one may prove that the optimal two-qubit control unitary must take the form (see also the discussion below in Supplementary Note 5): V opt ∼ {V opt = W e iϕopt(σz⊗1+1⊗σz) } for some optimal real coefficient ϕ opt , where W is the unitary that transforms the Bell basis onto the product ba- However, one can also verify that unitaries of such a form generally do not commute with the encoding E ⊗2 θ , so there is no straightforward identification that the Gcovariance condition (26) can be satisfied. In fact, since satisfying the G-covariance condition (26) implies that Eq. (27) must hold for any Λ, finding an example of Λ that violates Eq. (27) and, hence, the Observation 2, would imply that (26) cannot be fulfilled.
We evaluate the aforementioned channel QFIs in Suppl. Fig. 4 for the three types of conjugate-map decompositions by resorting to the 'seesaw' algorithm described in Supplementary Note 4 C. However, one should also recall from Suppl. Fig. 3 (see the maximum of the orange line) that in the single-qubit case there exists Λ satisfying (16) its corresponding channel QFI actually equals the perfect channel QFI and, hence, is useless in estimating the impact of the detection (bit-flip) noise. We verify that such a conjugate-map decomposition remains uninformative also in the two-qubit scenario, in θ ] = 4 for any p = q-see the black dotted line in Suppl. Fig. 4. For consistency, by resorting to the 'seesaw' algorithm of Supplementary Note 4 D that includes maximisation over conjugate-map decompositions, we also verify that there exists a global Λ (2) that satisfies the condition (16) for N = 2, i.e. M ⊗2 = Λ (2) † [Π ⊗2 ], and similarly leads tō Interestingly, we observe that neitherΛ ⊗2 nor Λ (2) commute in general with V opt , so that G-covariance condition (22) still does not applymaking the connection between metrological protocols with imperfect (local) measurements and the "standard" setting of quantum metrology with noisy encoding on multiple probes even less apparent.

Supplementary Note 5. Proof of Theorem 1 and convergence rate to the perfect QFI
As the results and expressions derived in Lemma 1 are independent of the number of probes, it follows that Equations (1-4) hold still. Applying specifically to the multi-probe scenario discussed in the main text with the structure of measurement as M ⊗N , i.e. each probe is still measured independently but in an imperfect manner, Eqs. (3) and (4) read with γ( Φ, ψ N (θ)) := where now x = (x 1 , x 2 , · · · , x N ), with M ⊗N ∼ {M x }, where M x is the noisy measurement operator for outcome x for the -th probe. A lower bound on γ( Φ, ψ N (θ)), and hence on F N , for any given choice of V Φ can be constructed as fol- , where in order not to overload the notation, we have kept the θ and Φ dependence implicit. Then, the numerator of 4γ( Φ, ψ N (θ)) is while the denominator is which by the Cauchy-Schwartz inequality is smaller or equal to Hence, we have Let us now denote: and note that p ± (x) depends solely on our choice of |ζ N , |ζ N ⊥ . To attain the lower bound in Eq. (17) of Thm. 1, consider a specific choice of |ζ N , |ζ N ⊥ (obtained by a suitable V Φ ) : for some orthogonal single-qudit states {|ζ , |ζ ⊥ }. Then, with p + (x) := ζ|M x |ζ , p − (x) := ζ ⊥ |M x |ζ ⊥ , and c := x p + (x)p − (x), we have p ± (x) = N j=1 p ± (x j ), and thus Evidently, we have 0 ≤ c ≤ 1, and so we complete the proof of Thm 1.
A few remarks are in order: Given a specific choice of {|ζ ⊗N , |ζ ⊥ ⊗N } (equivalently V Φ and p ± (x)}) the FI converges to the perfect QFI exponentially fast with an approximate prefactor (see Eqs. (57) and (60)): Note that this prefactor is the Hellinger distance, H (p + (x) , p − (x)), between the distributions p + (x) and p − (x) [12], while the convergence rate χ (c ≡ e −χ ) reads For very close distributions it can be observed that the convergence rate is equal to the Fisher metric: . As N → ∞, generally we can use the central limit theorem to approximate H (p + (x) , p − (x)) as the Hellinger distance between two Gaussian distributions. This would then imply a convergence rate of: 1 , where µ ± , σ ± are the average and standard deviation of p ± (x) respectively, which of course depends on our choice of |ζ , |ζ ⊥ . We can thus apply this analysis to obtain the convergence rate for specific cases. For Poissonian channel (with coefficients λ |0 , λ |1 ), such as in NV centres, we find a convergence rate of: 1 2 . This convergence rate is achieved by taking |ζ N , |ζ N ⊥ to be |0 ⊗N , |1 ⊗N (similar convergence rate is obtained by taking them to be any superposition of |0 ⊗N , |1 ⊗N ). This implies that for realistic experimental values of NV centres, λ |0 = 0.1, λ |1 = 0.07 (without the use of nuclear spins as memory), the number of probes that obtains 95% of the perfect QFI would be ∼ 2000. For a binary asymmetric bit-flip channel with probabilities p, q, we find a convergence rate of: 1 4 (p+q−1) 2 p(1−p)+q(1−q) , given a similar choice of |ζ = |0 ⊗N , |ζ ⊥ = |1 ⊗N . As a side note, we may also compute c directly in this case, and it can be verified that indeed c N = A note on optimality: Let us justify our choice of unitary, namely the choice of |ζ N = |0 ⊗N , |ζ N ⊥ = |1 ⊗N . Given the approximated value of the convergence rate, Eq. (62), we claim that this choice yields the optimal convergence rate. That is, we would like to choose |ζ N , |ζ N ⊥ that maximize the Hellinger distance between x . As before, we focus on the classical noise channel, namely commuting M x : M x = i p(x|i) Π i . Observe that: • The Hellinger distance is convex in the probability distributions: • Let {|j } j be the common eigenbasis of {M x } x , then given |ζ = √ In words: taking superpositions of the eigenstates leads to convex combinations of the probabilities.
The above two observations imply that the maximal Hellinger distance is achieved by taking |ζ N , |ζ N ⊥ to be elements in the common eigenbasis (and not superpositions of them). Hence, we just need to find the two basis states that yield maximal Hellinger distance. For N qudits, we thus need to find the two states |j 1 , |j 2 with maximal Hellinger distance out of the d-dimensional eigenbasis, and then the optimal choice of |ζ N , |ζ N ⊥ would be |j 1 ⊗N , |j 2 ⊗N . For the case of NV centres with the local projective measurement {Π 1 = |0 0|, Π 2 = |1 1|} for each of the N NV centres, this immediately implies that the optimal |ζ N , |ζ N ⊥ are |0 ⊗N , |1 ⊗N . In fact this intuition together with some numerical evidence leads us to conjecture that for any classical noise channel, M x = i p(x|i) Π i that is applied independently on each of the N probes, the optimal |ζ N , |ζ N ⊥ take the form of "cat states": where |j , |k can be found numerically for just a single probe, and θ depends on N and should be found numerically. In particular for N qubits we connjecture that the optimal |ζ N , |ζ N ⊥ take the form of cos (θ) |0 ⊗N + sin (θ) |1 ⊗N , − sin (θ) |0 ⊗N + cos (θ) |1 ⊗N .

Supplementary Note 6. Proof of Lemma 2
With the encoded state ρ N r (θ) = r ψ N (θ) + (1 − r)1 d N /d N , where ψ N (θ) = |ψ N (θ) ψ N (θ)|, similar to the proof of Thm. 1, it is straightforward to establish that the FI can be written as where γ r ( Φ, ψ N (θ)) with p (x) := Tr{M x }/d N . Note that p (x) is a legit probability distribution, i.e., p (x) ≥ 0 ∀x, and x p (x) = 1. Moreover, evidently γ r ( Φ, ψ N (θ)) ≤ rγ( Φ, ψ N (θ)) ≤ r, and so we arrive at an upper bound . Now, using essentially the same observation and notation leading to Eqs. (54-56), the following inequality holds: . (65) Consider then three different ways of grouping all the x. First, consider two sets, A + and B + , defined as Similarly, define the sets as well as which can be interpreted as the minimal error in discriminating the two probability distributions {p ± (x)} and {p (x)} with prior r and 1 − r, respectively. Thus, Then, for the r.h.s. of Eq. (65), we shall evaluate the sum over all x into A and B respectively. For x ∈ A, using the identity (1 + x) −1 ≥ 1 − x for any (1 + x) ∈ R + , we get Meanwhile, for x in the set B, observe that Hence, we have Combining Eqs. (65, 70, 71, 73), we obtain Finally, upon choosing V Φ as in Eq. (59), we have To complete the proof of Lemma 2, note that the choice of V Φ in Eq. (59) gives us p ± (x) = N j=1 p ± (x j ), and since p (x) = N j=1 p (x j ) where p (x j ) = Tr{M xj }/d, ± is now the minimal error in discriminating two probability distributions {p ± (x j )} and p (x j ) over N repetitions, which goes to zero in the N → 0 limit.

Supplementary Note 7. Proof of Lemma 3
A particular Kraus representation of the (single-probe) channel Λ θ, φ = Λ M • V φ • U θ , with the quantum-classical channel Λ M being defined in Eq. (17) for a given imperfect measurement M, reads Note that as mentioned earlier, the set of orthogonal bra basis { j|} d j=1 can be chosen arbitrarily, corresponding to different choice of quantum-classical channel Λ. Then, as elaborated in Methods in the main text, the asymptotic CE bound F (CE,as) N is defined when there exists some other Kraus representation {K x,j (θ, φ)} of Λ θ, φ such that βK = 0 for any φ. Moreover, following Refs. [13][14][15], it suffices to consider Kraus representations that have the following properties: where g( φ) is an arbitrary Herimitian matrix satisfying g x,j ;x ,j ( φ) = g * x ,j ;x,j ( φ), and the βK = 0 condition is equivalent to the existence of g such that Putting in Eq. (75) into Eq. (78), the βK = 0 condition is then given by ( with A x ( φ) := 2 j,j g x,j;x,j ( φ)|j j |, which is Eq. (22) in Lemma 3-here, without specifying explicitly which probe we are referring to.

Supplementary Note 8. Proof of Corollary 2
We say that a detection channel P acts non-trivially on a given subsetĨ ⊆ I of 'inaccessible' outcomes, if for any pair i, i ∈Ĩ there exists at least one 'observable' outcome x such that the transition probabilities of P satisfy p(x|i)p(x|i ) > 0. Moreover, if thisĨ contains the outcomes spanning the subspace of the encoding Hamiltonian, i.e. h = i,i ∈Ĩ h i,i |i i | with det{h} = 0, we say that P acts non-trivially on the encoding subspace. However, for our purposes we consider P that act nontrivially on all the outcomes in I, and refer to these as non-trivial.
Suppose the imperfect measurement M is composed of a perfect von Neumann measurement, i.e., Π ∼ {Π i = |i i|} i with Π i Π i = δ i,i Π i , followed by a noisy detection channel P ∼ {p(x|i)}, such that M x = i p(x|i)Π i . As a result, the condition (79) is equivalent to where the entries i|A x ( φ)|i = 2g x,i;x,i ( φ) can be chosen arbitrarily (of some Hermitian matrix). Now, given that P is non-trivial, so that for any pair i, i ∈ I there exists x such that p(x|i)p(x|i ) = 0, we can define a Hermitian matrix C whose all entries, C i,i := 2 x p(x|i)p(x|i ) g x,i;x,i ( φ) for all i, i ∈ I, can be freely chosen (in particular, non-zero) by varying g. Consequently, we may again rewrite Eq. (79) as h = V † CV, where V := i,i ∈I i|V φ |i |i i | is an (invertible) unitary matrix. Hence, we can always satisfy the condition (79) by choosing g such that C = VhV † . In summary, whenever the stochastic map P representing the detection noise is non-trivial, Eq. (79) can always be fulfilled, which by the virtue of the asymptotic CE bound forces the MSE to asymptotically follow the SS.
Then, note that the calculations for F can be made simpler using the fact that we are looking at estimation precision locally around some underlying true value of θ, say θ 0 . Consequently, instead of considering the most general unitaries u with arbitrary θ dependence, such thatK x,j (θ, φ) = x ,j u x,j;x ,j (θ, φ)K x ,j (θ, φ), it suffices to consider all the Kraus representationsK(θ, φ) that differ from the canonical one K(θ, φ) only by their first derivatives with respect to θ. That is, we only need to consider unitaries u = e i(θ−θ0)g for some Hermitian generator g, such that at θ = θ 0 eventually, the Kraus operators obey Eqs (76 and 77). As result we can replace abstract minimization minK in Eqs. (80) and (81) by min g , i.e. minimization over all Hermitian matrices g of dimension d|X| × d|X|.
The bounds F ( φ) involve calculations of operator norms αK and βK , which can be cast as a SDP problem. We refer the readers again to Refs. ( [14,15]) for its complete derivation, and for here we shall just outline the algorithm and result. In essence, upon defining λ 2 a := αK and λ 2 b := βK 2 , and stacking up all the Kraus operators into a vector of matrices, such that Eq. (76) now readsK = K := [K x=0,j=1 (θ, φ), K x=1,j=1 (θ, φ), · · · ] T andK =K −igK, we can rewrite Eq. (80) as where and d out = d(|X| 2 + 1). In order to evaluate F (CE,as) N ( φ) of Eq. (81), an additional constraint should just be added to Eq. (82), i.e. iK † K = K † gK that is simply equivalent to the condition βK = 0 imposed in Eq. (81). For particular simple examples analytical answers can be obtained. In particular, for the case of N qubits each sensing the phase θ in the encoding U θ ∼ {e iθσz/2 }, and are subject to measurement noise corresponding to binary asymmetric channel P mixing each binary outcome of measuring {Π i = |Π i Π i |} with |Π 1(2) = |± , starting from the canonical Kraus representation, we find that by selecting g = r(σ z ⊕ σ z ) in Eq. (77) with Interestingly this result is independent of the choice of φ with the local control unitaries taking the form V φ = e iσzφ , which is found to be optimal over all choices of V φ , and so we have actually obtainedF (85), and hence, Eq. (24) in the main text.
We also take note that the bound (85) can also be obtained from a conjecture, as a consequence of the Gcovariance formalism (Observation 2 in main text), applied to the case with local control unitaries. That is, upon conjecturing that the optimal local control unitaries take the form of V opt = e iσzφ (which as said is supported by our numerical findings), the generalisation of Eq. (27) to N independent copies of channel leads tō where Λ is some conjugate-map decomposition Λ satisfying the G-covariant condition (16), e.g., the quantumclassical channel (17). As (Λ • U θ ) ⊗N is of the form of uncorrelated noisy encoding, then, we may apply the standard technique of CE formalism to upper bound [15], which, in this case, turns out to be given exactly by (85).

Supplementary Note 10. Error-propagation formula with imperfect measurement
The mean squared error of estimators obtained from measuring the mean of some observableÔ with large number of repetitions ν, is well approximated by the socalled "error-propagation formula" [16] where ∆ 2 O = Ô 2 − Ô 2 , with A being the expectation value of the operator A over the quantum state ρ(θ), i.e. A = Tr{ρ(θ)A}. For our quantum-classical channel scenario with noisy detection channel represented by the stochastic map P ∼ {p(x|i)}, while we have the freedom to choose the measurement basis Π i, φ , we need to keep in mind that the only observable and effective measurement that we have is {M x, φ = i p(x|i)Π i, φ }, and it is not projective in general. The observable that we measure is thuŝ O = x f x M x, φ for some {f x } defining the observable (which can be chosen quite arbitrarily). For N independent quantum-classical channel, we can then construct the joint observableÔ = N j=1Ô where j labels the different channels. Alternatively, we may also consider a second kind of joint observable, where instead of summing over the constituent single-particle operators, we perform product over them: While it maybe tempting, we cannot however simply to compute ∆ 2Ô in Eq. (87). The reason is, Eq. (87) uses the implicit assumption that the observableÔ is measured at its eigenbasis, and that is not the case here. The effectivê O 2 that we should use in Eq. (87) is one which mimics the statistics that we would get as if we are measuring the eigenbasis, i.e., as if {M x, φ } are projective. That is, we have, for the first kind of observables, and, for the second kind of observables, and then for our quantum-classical channel. We apply Eq. (91) to the case of N qubits, each of which undergoes a projective measurement Π 1, φ = |+ +|, Π 2, φ = |− −| where σ x |± = ±|± , and |X| = 2. We consider a binary mixing channel P that flips the measurement outcomes with p(1|1) = p, p(2|2) = q, so that the effective measurements corresponding to the observed outcomes read M 1, φ = (1 + δ)1/2 + ησ x /2 and M 2, φ = (1 − δ)1/2 − ησ x /2 with η := p + q − 1, δ := p − q. Then, for the first kind of observables, we havê 2 is the usual total angular momentum operator in the -direction with = {x, y, z}. After some straightforward algebra, one obtains Given U θ = e iθσz/2 to be the unitary encoding the estimated parameter θ onto each probe, the state of all the probes just before the measurement reads ρ N (θ) := e iθĴz ρ N e −iθĴz , where ρ N is the input N -qubit probe state. In such as case, we have that ∂ θ Ĵ x = − sin θĴ x + cos θĴ y ρ N = Tr{ρ N (sin θĴ x + cos θĴ y )}. For the second kind of observables, let us consider for example the (imperfect) parity operator, i.e., with f 1 = −f 2 = 1. Then, we havê and finally Consider the measurement of the (imperfect) parity operator,P = , with M 1, φ = (1 + δ)1/2 + ησ x /2 and M 2, φ = (1 − δ)1/2 − ησ x /2 as above. Then, by the error-propagation formula once more (see Supplement for details), we have where P = Tr{e iθĴz ρ N e −iθĴzP }. Using the (rotated) GHZ input state, ρ N = |ψ ψ|, |ψ = e iφĴz 1 √ 2 |0 . . . 0 + |1 . . . 1 , we thus get with ϕ = φ + θ as before. While the parity measurement with GHZ state will perform poorly for large N by virtue of the exponential factor η −2N , it does however make a good candidate for small N regime where the 1/N 2 factor dominates. Indeed, upon optimising over φ, we obtain the blue curve in Fig. 6 of the main text.
Supplementary Note 11. Hierarchy of moment-based lower bounds on the FI Using only partial information from a full probability distribution, such as considering only up to certain finite moments, one obtains a lower bound for the FI. For the case of univariate and single-parameter estimation, we provide here a simple "physicist's" reformulation for constructing such a lower bound, which consistently agrees with more abstract considerations [9][10][11].
We first rewrite the FI F = xq φ,θ (x) 2 /q θ, φ (x) by making use of essentially a simple identity: any real quadratic function g(y) = −ay 2 + 2by with a > 0, b ∈ R, has its maximum given by max y g(y) = g(b/a) = b 2 /a, so equivalently F = x max yx {−q θ, φ (x)y 2 x +2q φ,θ (x)y x }. Using a series ansatz y x = K k=0 α k w(x) k for some chosen function w(x) with K smaller than the cardinality of the probability distribution, we obtain a lower bound F (K) on FI after maximizing now over the finite set {α k } K k=0 . By construction, we have F (0) ≤ F (1) ≤ F (2) ≤ · · · ≤ F (K) ≤ F , and F (K) can be computed straightforwardly as where α = α 0 , α 1 , · · · , α K T , and A and b are as in