Quantum speedup in the identification of cause–effect relations

The ability to identify cause–effect relations is an essential component of the scientific method. The identification of causal relations is generally accomplished through statistical trials where alternative hypotheses are tested against each other. Traditionally, such trials have been based on classical statistics. However, classical statistics becomes inadequate at the quantum scale, where a richer spectrum of causal relations is accessible. Here we show that quantum strategies can greatly speed up the identification of causal relations. We analyse the task of identifying the effect of a given variable, and we show that the optimal quantum strategy beats all classical strategies by running multiple equivalent tests in a quantum superposition. The same working principle leads to advantages in the detection of a causal link between two variables, and in the identification of the cause of a given variable.


SUPPLEMENTARY NOTES
Supplementary Note 1: Complementarity relation between tests of the causal structure and tests of the functional dependency between cause and effect.
Here we provide the proof of the complementarity relation (7) in the main text.
Bound on the error probability for parallel strategies with no reference system
The two causal hypotheses are that the quantum channel from A to the composite system B ⊗ C is either of the form $\mathcal{C}_{1,U_1} = \mathcal{U}_{1,B} \otimes I_C/d$, or of the form $\mathcal{C}_{2,U_2} = I_B/d \otimes \mathcal{U}_{2,C}$, with $\mathcal{U}_1(\cdot) := U_1 \cdot U_1^\dagger$ and $\mathcal{U}_2(\cdot) := U_2 \cdot U_2^\dagger$. Here, $U_1$ and $U_2$ are unitary operations, unknown to the experimenter but fixed throughout the N rounds of the experiment.
Here we consider parallel strategies, where the channel $\mathcal{C}^{\otimes N}_{x,U_x}$ (with x = 1 or x = 2) is applied in parallel on a multipartite input state ρ, which may include a reference system R of fixed dimension, and the output is measured with a measurement $\{P_1, P_2\}$. The probability to obtain the outcome $\hat{x}$ when the channel is $\mathcal{C}_{x,U_x}$ is equal to $p(\hat{x}|x) = \mathrm{Tr}\big[P_{\hat{x}}\,(\mathcal{C}^{\otimes N}_{x,U_x} \otimes \mathcal{I}_R)(\rho)\big]$.
Since $U_1$ and $U_2$ are unknown, we consider the worst-case error probability, namely $p^{\rm wc}_{\rm err} := \max_{U_1,U_2\in\mathsf{U}} p_{\rm err}(U_1,U_2)$, where $\mathsf{U}$ is a set of unitary operators. For example, $\mathsf{U}$ can be the group of permutation operators of the form $U_\pi = \sum_{i=1}^{d} |\pi(i)\rangle\langle i|$, where π is an element of the permutation group $S_d$.
The worst-case error probability is lower bounded by the average error probability
$$p^{\rm ave}_{\rm err} := \frac{1}{|\mathsf{U}|^2}\sum_{U_1,U_2} p_{\rm err}(U_1,U_2). \qquad (6)$$
By definition, the average error probability is equal to the error probability in distinguishing between the average channels $\mathcal{C}^{(N)}_1$ and $\mathcal{C}^{(N)}_2$. Now, suppose that the experimenter prepares an N-particle state $|\Psi\rangle \in \mathcal{H}^{\otimes N}$, without using a reference system. The average error probability has the tight lower bound achieved by Helstrom's minimum error measurement [2]. The distance between the average output states can be expressed as with ω := Now, the pure states are purifications of Ψ ⊗ ω and Σ, respectively. Hence, the monotonicity of the trace distance yields the bound Inserting this bound into Equation (8), we then obtain Now, note that we have Hence, Equation (13) yields the bound It is clear that the minimum of the right-hand side is obtained when the state Ψ is pure, in which case the bound becomes $p^{\rm wc}_{\rm err} \ge 1/(2d^N)$.

Bound on the success probability in the identification of a unitary gate
More generally, the bound (15) can be interpreted as a complementarity relation between the estimation of the causal structure and the estimation of the functional dependence between cause and effect.
Lemma 1. Consider the task of guessing the gate $U \in \mathsf{U}$ from the state $|\Psi_U\rangle := U^{\otimes N}|\Psi\rangle$. If $\mathsf{U}$ is a generalised N-design for some group representation $\{U\}_{U\in\mathsf{G}}$, then the probability of a correct guess satisfies the bound The bound is attained by the square-root measurement [3], with operators Then, the Yuen-Kennedy-Lax bound implies the inequality Since the unitaries U form a generalised N-design, the operator Λ is invariant under the action of the group representation $\{U^{\otimes N}\}_{U\in\mathsf{G}}$. Moreover, every invariant operator Γ can be written as Λ for some suitable Λ (in fact, it suffices to take Λ = Γ). Hence, one has the bound In particular, one can take Γ = c Ψ for some suitable constant c. With this choice, the condition Γ ≥ Ψ/|U| is equivalent to which in turn is equivalent to Then, the bound (19) becomes p U guess ≤ Tr[ Ψ ] 2 /|U|. The bound is attained by the square-root measurement for every U.
Combining the above lemma with Equation (15) we obtain the relation (23)

Supplementary Note 2: Optimal universal strategy
Here we derive the optimal strategy for identifying the causal intermediary when the cause-effect relationship is described by an arbitrary unitary gate.

Reduction to the minimisation of the average probability
The problem is to find the strategy that minimises the worst-case error probability. Thanks to the symmetry of the problem, the minimisation of the worst-case error probability can be reduced to the minimisation of the average error probability:
Lemma 2. For every fixed reference system R and for every fixed N, the minimum worst-case error probability in the discrimination of the channels $\mathcal{C}_{1,U_1}$ and $\mathcal{C}_{2,U_2}$ with N uses is equal to the average error probability where dU is the normalised invariant measure. In turn, the average error probability is equal to the minimum error probability in the discrimination of the average channels $\mathcal{C}^{(N)}_1$ and $\mathcal{C}^{(N)}_2$. There exist a state ρ and a measurement $\{P_1, P_2\}$ that are optimal for both problems.
We omit the proof, which is a simple adaptation of Holevo's argument on the optimality of covariant measurements [5], see also [6].

Optimal form of the input states
Let us search for the optimal quantum strategy. Note that the channels $\mathcal{C}^{(N)}_1$ and $\mathcal{C}^{(N)}_2$ satisfy the condition where $\mathcal{T}^{(N)}_{\rm in}$ is the twirling channel Eq. (26) implies that the search of the optimal input state can be restricted to invariant states, i.e. states satisfying the condition The structure of the invariant states can be made explicit using the Schur-Weyl duality [7], whereby the tensor product Hilbert space $\mathcal{H}^{\otimes N}$ is decomposed as
$$\mathcal{H}^{\otimes N} = \bigoplus_{\lambda\in\mathsf{Y}_{N,d}} \mathcal{R}_\lambda\otimes\mathcal{M}_\lambda ,$$
where $\mathsf{Y}_{N,d}$ is the set of Young diagrams of N boxes arranged in d rows, while $\mathcal{R}_\lambda$ and $\mathcal{M}_\lambda$ are representation and multiplicity spaces for the tensor action of SU(d), respectively. Using the Schur-Weyl decomposition, every invariant state on $\mathcal{H}^{\otimes N}\otimes\mathcal{H}_R$ can be decomposed as
$$\rho = \bigoplus_{\lambda\in\mathsf{Y}_{N,d}} q_\lambda\,\frac{P_\lambda}{d_\lambda}\otimes\rho_{\lambda R} ,$$
where $\{q_\lambda\}$ is a probability distribution, $P_\lambda$ is the identity operator on the representation space $\mathcal{R}_\lambda$, $d_\lambda = \mathrm{Tr}[P_\lambda]$, and $\rho_{\lambda R}$ is a density matrix on the Hilbert space $\mathcal{M}_\lambda\otimes\mathcal{H}_R$. Note that the set of invariant states (30) is convex. Since the (average) error probability is a linear function of ρ, the minimisation can be restricted to the extreme points of this convex set. Hence, we have the following
Proposition 1. Without loss of generality, the optimal input state for a parallel strategy with reference system R can be taken of the form $\rho = P_{\lambda_0}/d_{\lambda_0}\otimes\Psi_{\lambda_0 R}$, where $\lambda_0\in\mathsf{Y}_{N,d}$ is a fixed Young diagram and $\Psi_{\lambda_0 R}$ is a pure state on $\mathcal{M}_{\lambda_0}\otimes\mathcal{H}_R$.

Error probability for states of the optimal form
The problem is to find the input state that makes the output states most distinguishable. To this purpose, it is convenient to label operators with the corresponding systems and to use the notation $\mathbf{A} := A_1A_2\cdots A_N$, $\mathbf{B} := B_1B_2\cdots B_N$, $\mathbf{C} := C_1C_2\cdots C_N$, and $\mathbf{R} := R$.
When applied to an invariant state of the composite system AR, the two channels C up to a convenient reordering of the Hilbert spaces.
The minimum error probability in the discrimination of the output states is given by Helstrom's theorem [2]. Specifically, one has In the following, we compute the trace norm explicitly for input states of the optimal form. It is convenient to decompose the identity operator $I^{\otimes N}$ as $I^{\otimes N} = \bigoplus_{\lambda\in\mathsf{Y}_{N,d}} P_\lambda\otimes Q_\lambda$, where $P_\lambda$ is the identity operator on the representation space $\mathcal{R}_\lambda$ and $Q_\lambda$ is the identity operator on the multiplicity space $\mathcal{M}_\lambda$. In the following, we denote by $m_\lambda = \mathrm{Tr}[Q_\lambda]$ the dimension of $\mathcal{M}_\lambda$. Combining Eqs. (32), (34), and (35), we obtain It remains to compute the trace norm in the first summand. To this purpose, it is convenient to define the states where ρ is the marginal state of $\Psi_{\lambda_0R}$ on the multiplicity space $\mathcal{M}_{\lambda_0}$, and $\{|n\rangle,\ n = 1,\dots,m_{\lambda_0}\}$ are the eigenvectors of ρ. With this definition, the states are mutually orthogonal. For example, one has the second equality coming from the fact that ρ is diagonal in the basis $\{|n\rangle\}$.
In terms of the vectors (37), one can rewrite the relevant terms as Then, the trace norm is The maximum trace norm is reached when the eigenvalues of ρ are all equal. In that case, one has where r is the rank of ρ. Combining the above equation with Eqs. (36) and (33) we obtain the error probability Note that the function f(r) is monotonically decreasing, and therefore the error probability is minimised by maximising the rank r, i.e. by choosing where $d_R$ is the dimension of the reference system.

Minimum error probability
The probability of error is given by Eq. (43). When the reference system has dimension larger than the multiplicity m λ0 , one has the equality and the error probability becomes with f defined as in Equation (43).
The only way to beat the classical scaling $1/d^N$ is to make $f(m_{\lambda_0})$ exponentially small. Since f is positive and monotonically decreasing, this means that $m_{\lambda_0}$ must be exponentially large. Note that, for large $m_{\lambda_0}$, the probability of error has the asymptotic expression (47). Asymptotically, the problem is reduced to the minimisation of the ratio $d_{\lambda_0}/m_{\lambda_0}$. To find the minimum, it is useful to apply the notion of majorisation of Young diagrams. Given two diagrams λ and µ of N boxes arranged in d rows, we say that λ majorises µ if $\sum_{i=1}^{k}\lambda_i \ge \sum_{i=1}^{k}\mu_i$ for every $k\in\{1,\dots,d\}$, where $\lambda_i$ ($\mu_i$) is the length of the i-th row of the diagram λ (µ).
Here the pair (i, j) labels a box in the diagram, with the indices i and j labelling the row and the column, respectively, and hook(i, j) denotes the length of the hook consisting of boxes to the right and to the bottom of the box (i, j). Using the above expressions, the dimension/multiplicity ratio reads Now, since λ majorises µ, one has the bounds
Summarizing, we showed that
1. when N is a multiple of d, the optimal Young diagram corresponds to the trivial representation of SU(d);
2. when N is not a multiple of d, the optimal Young diagram corresponds to the totally antisymmetric representation acting on $N - d\lfloor N/d\rfloor$ particles;
3. asymptotically, the symmetric subspace is the worst possible choice, leading to the classical rate $R_C = \log d$.
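The comparison between Young diagrams can be checked by brute force for small N and d. The sketch below is an illustration, not part of the proof: it assumes the standard hook-length formula for the multiplicity $m_\lambda$ and the hook-content formula for the dimension $d_\lambda$, and all function names are chosen here for illustration only.

```python
from fractions import Fraction
from math import factorial


def young_diagrams(n, max_rows, max_len=None):
    """Yield all Young diagrams with n boxes and at most max_rows rows."""
    if max_len is None:
        max_len = n
    if n == 0:
        yield ()
        return
    if max_rows == 0:
        return
    for first in range(min(n, max_len), 0, -1):
        for rest in young_diagrams(n - first, max_rows - 1, first):
            yield (first,) + rest


def hooks_and_contents(lam):
    """Return (hook length, content j - i) for every box (i, j) of a non-empty diagram."""
    col_len = [sum(1 for row in lam if row > j) for j in range(lam[0])]
    return [(lam[i] - j + col_len[j] - i - 1, j - i)
            for i in range(len(lam)) for j in range(lam[i])]


def multiplicity(lam):
    """m_lambda: dimension of the multiplicity space (hook-length formula)."""
    prod = 1
    for hook, _ in hooks_and_contents(lam):
        prod *= hook
    return factorial(sum(lam)) // prod


def su_dimension(lam, d):
    """d_lambda: dimension of the SU(d) representation (hook-content formula)."""
    dim = Fraction(1)
    for hook, content in hooks_and_contents(lam):
        dim *= Fraction(d + content, hook)
    return int(dim)


def majorises(lam, mu):
    """True if every partial sum of the row lengths of lam is >= that of mu."""
    rows = max(len(lam), len(mu))
    a = list(lam) + [0] * (rows - len(lam))
    b = list(mu) + [0] * (rows - len(mu))
    return all(sum(a[:k]) >= sum(b[:k]) for k in range(1, rows + 1))


def best_diagram(n, d):
    """Diagram minimising the ratio d_lambda / m_lambda."""
    return min(young_diagrams(n, d),
               key=lambda lam: Fraction(su_dimension(lam, d), multiplicity(lam)))


if __name__ == "__main__":
    N, d = 6, 3
    for lam in young_diagrams(N, d):
        print(lam, su_dimension(lam, d), multiplicity(lam))
    print("minimising diagram:", best_diagram(N, d))
```

For N a multiple of d, the minimising diagram returned by this search is the rectangle with d rows of length N/d, which every other diagram majorises.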
In conclusion, we proved the following
Proposition 3. When N is a multiple of d, the optimal input state is $P_{\lambda_0}/d_{\lambda_0}\otimes|\Psi\rangle\langle\Psi|_{\lambda_0R}$, where $\lambda_0$ is the trivial representation of SU(d) in the N-fold tensor product $U^{\otimes N}$, $d_R \ge m_{\lambda_0}$, and $|\Psi\rangle_{\lambda_0R}\in\mathcal{M}_{\lambda_0}\otimes\mathcal{H}_R$ is a maximally entangled state.
Since the trivial representation is one-dimensional, the error probability (47) takes the form (52) Moreover, the trivial representation of SU(d) corresponds to the Young diagram with d rows, each of length N/d. Hence, its multiplicity is given by For fixed d, the Stirling approximation yields the expression where c(N ) is a function tending to 1 in the large N limit. Taking the logarithm on both sides, one obtains Inserting this value into the expression of the error probability (47), we obtain the rate (56)
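The growth of the multiplicity can also be examined numerically. The following sketch assumes the hook-length formula specialised to the rectangular diagram with d rows of length N/d; the function names are illustrative. It evaluates $\log_d(m)/N$, which should approach 1 as N grows if m grows like $d^N$ up to polynomial factors, consistent with the Stirling estimate.

```python
from math import factorial, log


def trivial_multiplicity(n_particles, d):
    """Multiplicity of the trivial SU(d) representation in the N-fold tensor product.

    Counts the standard Young tableaux of the d x (N/d) rectangle
    via the hook-length formula.
    """
    if n_particles % d != 0:
        raise ValueError("N must be a multiple of d")
    cols = n_particles // d
    num = factorial(n_particles)
    den = 1
    for i in range(d):
        num *= factorial(i)
        den *= factorial(cols + i)
    return num // den


def multiplicity_exponent(n_particles, d):
    """Return log_d(m) / N for the trivial representation."""
    m = trivial_multiplicity(n_particles, d)
    return log(m) / (n_particles * log(d))


if __name__ == "__main__":
    for n in (10, 50, 200, 1000):
        print(n, multiplicity_exponent(n, 2))
```

For d = 2 the multiplicity reduces to a Catalan number, and the printed exponent increases slowly towards 1, the gap being due to the polynomial prefactor.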

Quantum superposition of equivalent setups
Here we prove that the optimal state can be realized as a coherent superposition of equivalent setups, where the N input variables are divided in groups of d, and all the variables in the same group are initialized in the SU(d) singlet state.
is an orthonormal basis for the reference system, indexed by the possible ways to group N objects into groups of d, and |S ⊗N/d i is the product of N/d singlet states, distributed according to the grouping i. Then, 1. the state |Ψ AR is optimal for the identification of the causal intermediary 2. the number r of linearly independent vectors of the form |S where c(N ) is a function tending to 1 in the large N limit.
Proof. By definition, $|\Psi\rangle_{AR}$ is invariant under the N-fold action of SU(d) on system A, meaning that the corresponding density matrix has the optimal form $|\Psi\rangle\langle\Psi|_{AR} = P_{\lambda_0}/d_{\lambda_0}\otimes|\Psi\rangle\langle\Psi|_{\lambda_0R}$, where $\lambda_0$ is the trivial representation of SU(d).
In fact, since the trivial representation is one-dimensional, we may equivalently write |Ψ Ψ| AR ≡ |Ψ Ψ| λ0R . Now, the marginal state is invariant under permutations. Hence, the Schur lemma implies the relation Since |Ψ AR is a purification of ρ A , we conclude that |Ψ AR is a maximally entangled state between R and the multiplicity system M λ0 . Hence, |Ψ AR coincides with the optimal input state of Proposition 3 . Moreover, comparing Equations (59) and (60) we obtain that the rank of ρ A is equal to the multiplicity m λ0 . Since the rank of ρ A is the number of linearly independent vectors of the form |S , we conclude that the number of such vectors is m λ0 . Finally, m λ0 can be expressed as in Equation (54).
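A quick count illustrates why the rank statement is not trivial: the number of ways to group N variables into groups of d generally exceeds the multiplicity, so the singlet products associated with different groupings cannot all be linearly independent. The sketch below uses illustrative function names and the hook-length formula for the rectangular diagram.

```python
from math import factorial


def number_of_groupings(n, d):
    """Ways to split n labelled objects into n/d unordered groups of size d."""
    if n % d != 0:
        raise ValueError("n must be a multiple of d")
    groups = n // d
    return factorial(n) // (factorial(d) ** groups * factorial(groups))


def trivial_multiplicity(n, d):
    """Multiplicity of the trivial SU(d) representation (rectangular diagram)."""
    cols = n // d
    num = factorial(n)
    den = 1
    for i in range(d):
        num *= factorial(i)
        den *= factorial(cols + i)
    return num // den


if __name__ == "__main__":
    for n, d in [(4, 2), (6, 2), (6, 3), (8, 2)]:
        print(n, d, number_of_groupings(n, d), trivial_multiplicity(n, d))
```

For example, four qubits admit three pairings but only two linearly independent products of singlets.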

Supplementary Note 3: Optimal classical strategy for k causal hypotheses
Here we provide the optimal classical strategy for the case where exactly one out of k possible variables $B_1, B_2, \dots, B_k$ is the causal intermediary of A. The result is stated in the following
Lemma 4. The minimum error probability in the identification of the causal intermediary among k ≥ 2 alternatives is
Proof. Suppose that the i-th output variable is not the causal intermediary. The probability that it takes values compatible with a permutation is P Hence, the probability that the i-th variable (and only the i-th variable) is confusable with the true causal intermediary is Similarly, the probability that the variables $i_1, i_2, \dots, i_t$ (and only the variables $i_1, i_2, \dots, i_t$) are confusable with the true causal intermediary is When this situation arises, one has to resort to a random guess, with probability of error t/(t + 1). In total, the probability of error is equal to Since the coefficient P(d, v) is minimum when v = 1, the optimal strategy is to initialize all input variables in the same value, thus obtaining probability of error p C err = k−1

Supplementary Note 4: Optimal quantum strategy for k hypotheses without reference system
Here we provide the best strategy among all quantum strategies that do not use a reference system.
Lemma 5. The best quantum strategy without reference system is to divide the N input variables into N/d groups of d elements each and, within each group, to prepare the singlet state $|S\rangle = \frac{1}{\sqrt{d!}}\sum_{k_1,\dots,k_d}\epsilon_{k_1k_2\dots k_d}\,|k_1\rangle|k_2\rangle\cdots|k_d\rangle$, where $\epsilon_{k_1k_2\dots k_d}$ is the totally antisymmetric tensor and the sum ranges over all vectors in the computational basis. The corresponding error probability is
Proof. Let us denote by x the "true causal intermediary", namely the quantum system $B_x$ whose state depends on the state of A, and by $\mathcal{C}_{x,U}$ the channel defined by the relation where the subscript x indicates that the operator $\mathcal{U}(\rho)$ acts on the Hilbert space of system $B_x$ and the subscript $\bar{x}$ indicates that the operator acts on the Hilbert space of the remaining k − 1 systems.
By the same arguments used in Lemma 2 , the discrimination of the causal hypotheses can be reduced to the discrimination of the channels Again, one can show that, for every reference system R, the optimal state can be chosen of the form where P λ0 is the projector on the SU(d) representation space with Young diagram λ 0 , d λ0 = Tr[P λ0 ], and Ψ λ0R is a pure state of the composite system M λ0 ⊗ H R , M λ0 being the SU(d) multiplicity space associated to λ 0 .
Here we consider the case where the reference system R is trivial. In this case, the problem is to distinguish among the states Using the Yuen-Kennedy-Lax formula [4], the maximum success probability in distinguishing among these states is Note that the states $\{\rho_x,\ x = 1,\dots,k\}$ commute. Hence, they can be diagonalized in the same basis and the operator Γ can be chosen to be diagonal in that basis without loss of generality. With a similar argument, one can restrict the search for the optimal Γ over the operators of the form where $\Gamma_{\lambda_1,\dots,\lambda_k}$ is an operator acting on the tensor product space $\mathcal{M}_{\lambda_1}\otimes\mathcal{M}_{\lambda_2}\otimes\cdots\otimes\mathcal{M}_{\lambda_k}$. Note that the operators $\Gamma_{\lambda_1,\dots,\lambda_k}$ can be set to zero for all k-tuples $(\lambda_1,\dots,\lambda_k)$ such that $\lambda_i \neq \lambda_0$ for every $i\in\{1,\dots,k\}$. Now, suppose that $\lambda_i = \lambda_0$ and $\lambda_j \neq \lambda_0$ for the remaining $j \neq i$. In this case, we must have where $Q_\lambda$ is the identity operator on the multiplicity space $\mathcal{M}_\lambda$. Taking the trace on both sides, we obtain the relation Similar bounds can be found for the operators $\Gamma_{\lambda_1,\dots,\lambda_k}$ where two or more indices are equal to $\lambda_0$. For example, consider the terms where $\lambda_i = \lambda_j = \lambda_0$, while $\lambda_l \neq \lambda_0$ for the remaining values of l. In this case, we have the conditions where we introduced the shorthand notation Conditions (73) and (74) can be combined into a single condition. To this purpose, we expand $Q_{\lambda_0}$ as which allows for rewriting (73) and (74) as Now, since $\Psi_{\lambda_0}\otimes\Psi^{\perp}_{\lambda_0}$ and $\Psi^{\perp}_{\lambda_0}\otimes\Psi_{\lambda_0}$ are orthogonal vectors, it is also true that which can be rewritten as Tracing on both sides, one obtains Likewise, a term with $\lambda_{i_1} = \lambda_{i_2} = \cdots = \lambda_{i_t} = \lambda_0$ and all the remaining $\lambda_l$ different from $\lambda_0$ will satisfy the condition leading to the inequality Note that one can choose the operator Γ in such a way that the equality holds in all bounds. With this choice, the probability of success is having defined the Schur-Weyl measure $p_\lambda := d_\lambda m_\lambda/d^N$.
Expanding the term in square brackets, we obtain Hence, the error probability is Again, the optimal choice for N multiple of d is to pick λ 0 to be the trivial representation of SU(d), in which case the error probability is Note that, however, the choice of representation λ 0 does not affect the asymptotic rate: indeed, for every λ 0 we have Note also that the rate is independent of the number of hypotheses, as in the case of the Chernoff bound for quantum states [8].

Supplementary Note 5: Optimal quantum strategy for k causal hypotheses with arbitrary reference system
Here we provide the optimal quantum strategy using a reference system. We will prove the following lemma: Lemma 6 . The optimal input state is where m(N, d) is the dimension of the multiplicity space of the trivial representation, given by (for N/d being an integer) with lim N →∞ c(N ) = 1.
The proof consists of four steps:
Step 1: reduction to the permutation register. We apply N uses of the channel $\mathcal{C}_x$ to a state of the optimal form (31), where the pure state $|\Psi_{\lambda_0}\rangle$ is set to be the maximally entangled state |Φ λ0 = where the subscript x indicates that the corresponding operator acts on the N Hilbert spaces with label x (and on the reference), while the subscript $\bar{x}$ indicates that the corresponding operator acts on all systems except those with label x.
Breaking down the identity operator as I = (P λ0 ⊗ Q λ0 ) ⊕ (I − P λ0 ⊗ Q λ0 ), we can decompose ρ out x into orthogonal blocks where exactly l output systems are in the sector λ 0 . Explicitly, we have where S l denotes the set of all l-element subsets of {1, 2, . . . , k}, ρ x,A is the quantum state defined by χ A is the quantum state defined by and q(A|x) is the conditional probability distribution defined by Mathematically, this means performing a non-demolition measurement with outcomes (l, A), which projects the state into the block labelled by (l, A). When such a measurement is performed on the state ρ out x , the outcome (l, A) can occur only if A contains x-in which case the probability of occurrence is q(A|x). Conditionally on the outcome, the system is left in the state ρ A,x ⊗ χ A and the problem is to identify x within the set A. Hence, the probability of success for fixed x is where p succ (x) is the probability of correctly identifying the state ρ A,x ⊗ χ A . Note that, for x ∈ A, the optimal success probability p where we used the shorthand notation Φ x := (Φ λ0 ) x , and used I m to denote the identity matrix in dimension m, with m = m λ0 (these are the states that arise from Eq. (92) after discarding the representation spaces). We denote by p (l) succ the average success probability Averaging the success probability (95) over x, we obtain The next step is to compute p (l) succ .
Step 2: reduction to type states. The state $\sigma_x$ in Eq. (96) is the product of a maximally entangled state and (l − 1) copies of the maximally mixed state. The latter can be diagonalized as where $|\mathbf{j}\rangle$ is the basis vector $|\mathbf{j}\rangle = |j_1\rangle\otimes|j_2\rangle\otimes\cdots\otimes|j_{l-1}\rangle$ corresponding to the sequence $\mathbf{j} = (j_1, j_2,\dots,j_{l-1})\in\{1,\dots,m\}^{\times(l-1)}$. Now, let us introduce the shorthand Note that for x ≤ y one has Let $\mathbf{n} = (n_1, n_2,\dots,n_m)$ be a partition of l − 1 into m nonnegative integers. Recall that the sequence $\mathbf{j} = (j_1, j_2,\dots,j_{l-1})$ is said to be of type $\mathbf{n}$ if $n_1$ entries of $\mathbf{j}$ are equal to 1, $n_2$ entries are equal to 2, and so on. Eq. (101) tells us that the vectors $|\Phi_{x,\mathbf{j}}\rangle$ and $|\Phi_{y,\mathbf{k}}\rangle$ are orthogonal whenever the sequences $\mathbf{j}$ and $\mathbf{k}$ are of different type. Using this fact, we can define the orthogonal subspaces where $S_{\mathbf{n}}$ is the set of all sequences of length l − 1 and of type $\mathbf{n}$. Hence, we can decompose the states $\sigma_x$ in Eq. (96) as with $p(\mathbf{n}) = C_{\mathbf{n}}/m^{l-1}$ and σ n, where $C_{\mathbf{n}} = (l-1)!/[n_1!\,n_2!\cdots n_m!]$ is the number of sequences of type $\mathbf{n}$. Eq. (103) tells us that, in order to distinguish the states $\sigma_x$, one can perform an orthogonal measurement that projects on the subspaces $\{\mathcal{H}_{\mathbf{n}}\}$ (102). If the measurement outcome is $\mathbf{n}$, one is left with the task of distinguishing among the states $\sigma_{\mathbf{n},x}$. The success probability of this strategy is where $p^{(\mathbf{n})}_{\rm succ}$ is the probability of correctly distinguishing the states $\{\sigma_{\mathbf{n},x}\ |\ x\in\{1,\dots,l\}\}$.
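The type decomposition can be made concrete with a short enumeration. The sketch below, with illustrative function names, lists all types $\mathbf{n}$ for given l − 1 and m, evaluates the multinomial count $C_{\mathbf{n}}$, and computes the weights $p(\mathbf{n}) = C_{\mathbf{n}}/m^{l-1}$, which form a probability distribution.

```python
from fractions import Fraction
from math import factorial


def types(total, m):
    """Yield all tuples (n_1, ..., n_m) of nonnegative integers summing to total."""
    if m == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in types(total - first, m - 1):
            yield (first,) + rest


def sequences_of_type(n):
    """C_n = (l-1)! / (n_1! n_2! ... n_m!), the number of sequences of type n."""
    count = factorial(sum(n))
    for n_i in n:
        count //= factorial(n_i)
    return count


def type_probability(n):
    """p(n) = C_n / m^(l-1), with m = len(n) and l - 1 = sum(n)."""
    return Fraction(sequences_of_type(n), len(n) ** sum(n))


if __name__ == "__main__":
    l_minus_1, m = 3, 2
    for n in types(l_minus_1, m):
        print(n, sequences_of_type(n), type_probability(n))
```

Summing `type_probability` over all types returns exactly 1, since the counts $C_{\mathbf{n}}$ add up to the $m^{l-1}$ possible sequences.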
Step 3: lower bound on the probability of success. The probability of correctly distinguishing the states $\{\sigma_{\mathbf{n},x}\ |\ x\in\{1,\dots,l\}\}$ is lower bounded by the probability of correctly distinguishing among all their eigenstates Note that the total number of states is $l\,C_{\mathbf{n}}$. We now construct a measurement that distinguishes these states with high success probability. The measurement is constructed through a Gram-Schmidt orthogonalization procedure. We define a first batch of $C_{\mathbf{n}}$ vectors as This definition is well-posed, because the above vectors are orthonormal, due to Eq. (101). A second batch of vectors is constructed from the vectors $\{|\Phi_{2,\mathbf{j}}\rangle,\ \mathbf{j}\in S_{\mathbf{n}}\}$ via the Gram-Schmidt procedure, which yields where $\mathbf{j}_{12}$ is the sequence such that $\langle\Phi_{1,\mathbf{j}_{12}}|\Phi_{2,\mathbf{j}}\rangle = 1/m$. A third batch of vectors is constructed from the vectors $\{|\Phi_{3,\mathbf{j}}\rangle,\ \mathbf{j}\in S_{\mathbf{n}}\}$. Now, the Gram-Schmidt procedure yields where $|\Gamma_{3,\mathbf{j}}\rangle$ is a vector of the form $|\Phi_{1,\mathbf{k}}\rangle$ for some suitable $\mathbf{k}$ and $|{\rm Rest}_{3,\mathbf{j}}\rangle$ is a suitable unit vector, which is irrelevant for computing the leading order of the success probability. In general, the x-th batch of vectors is where $|\Gamma_{x,\mathbf{j}}\rangle$ is a normalized combination of vectors of the form $|\Phi_{z,\mathbf{k}_z}\rangle$, z < x − 2, while $|{\rm Rest}_{x,\mathbf{j}}\rangle$ is a suitable unit vector.
Note that one has having used the fact that the product $\langle\Phi_{x,\mathbf{j}}|\Gamma_{x,\mathbf{j}}\rangle$ is O(1/m).
Using Eq. (111), we can now evaluate the probability of correctly distinguishing the states {|Φ x,j }. On average over all possible states, the probability of success is Since measuring on the basis {|Ψ x,j } is not necessarily the optimal strategy, we arrived at the lower bound Note that the (leading order of the) r.h.s. is independent of the type n.
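The mechanism behind this bound can be illustrated on a toy family of vectors. The sketch below is an assumption-laden illustration and does not use the states of the proof: it takes real unit vectors whose pairwise overlaps all equal a small value `eps` (playing the role of 1/m), orthonormalises them with the Gram-Schmidt procedure, and evaluates the average probability of a correct guess when measuring on the resulting basis.

```python
from math import sqrt


def inner(u, v):
    """Euclidean inner product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Orthonormalise a list of linearly independent real vectors, in order."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = inner(b, v)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = sqrt(inner(w, w))
        basis.append([wi / norm for wi in w])
    return basis


def equal_overlap_vectors(count, eps):
    """Toy family (hypothetical, not the states of the proof): `count` unit
    vectors in R^count whose pairwise inner products all equal eps."""
    shift = (sqrt(1 - eps + count * eps) - sqrt(1 - eps)) / count
    return [[sqrt(1 - eps) * (i == j) + shift for j in range(count)]
            for i in range(count)]


def average_success(vectors):
    """Average of <psi_x|phi_x>^2 when measuring on the Gram-Schmidt basis."""
    basis = gram_schmidt(vectors)
    return sum(inner(b, v) ** 2 for b, v in zip(basis, vectors)) / len(vectors)


if __name__ == "__main__":
    for m in (5, 10, 100):
        print(m, average_success(equal_overlap_vectors(3, 1.0 / m)))
```

In this toy model the second orthonormalised vector has squared overlap $1 - \epsilon^2$ with the vector it was built from, so the loss in success probability is of second order in the overlap.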
Step 4: putting everything together. Combining the results obtained so far, we can lower bound the success probability in distinguishing among k causal structures. Inserting the lower bound (113) into Eq. (105), we obtain Then, we can insert the above bound into Eq. (98). Reverting to the full notation m λ0 ≡ m, we obtain Hence, the error probability of the optimal quantum strategy is upper bounded as Recalling that the ratio d λ /m λ is minimised by the representation with "minimal" Young diagram (in the majorisation order), we conclude that, when N is a multiple of d, the optimal error probability satisfies the bound Hence, the asymptotic decay rate is lower bounded as On the other hand, the r.h.s. is equal to the decay rate for k = 2, which is a lower bound for the decay rate for k ≥ 2.
In conclusion, we obtained that the optimal decay rate is equal to R Q = 2 log d.

Supplementary Note 6: Quantum speedup in the identification of a cause
We consider the scenario where k quantum variables $A_1,\dots,A_k$ are candidate causes of a given effect B. For simplicity, we assume that all variables are quantum systems of dimension d < ∞. The causal relation is described by a quantum channel $\mathcal{C}_{x,U}$ of the form $\mathcal{C}_{x,U}(\rho) = \mathcal{U}(\mathrm{Tr}_{\bar{x}}[\rho])$, where $\mathrm{Tr}_{\bar{x}}$ denotes the partial trace over all input systems except $A_x$, with $x\in\{1,\dots,k\}$, and $\mathcal{U}$ is a generic unitary channel, acting on the remaining system $A_x$. The problem is to identify the value of x.

Fixed unitary gates
Suppose first that the unitary gate U is fixed. Without loss of generality, we can assume U = I, so that the channel $\mathcal{C}_{x,I}$ is simply the partial trace over all systems except x. The distinguishability of the channels $\{\mathcal{C}_{x,I}\}_{x=1}^{k}$ has been studied extensively in the optimization of port-based teleportation [9]. A simple strategy is to entangle each input system with a reference system, obtaining the output state $\rho_x = (\Phi^+)_{BR_x}\otimes\big[(I/d)^{\otimes(k-1)}\big]_{\bar{x}}$, where $\Phi^+$ is the maximally entangled state, $R_x$ is the x-th reference system, and the subscript $\bar{x}$ indicates that the operator $(I/d)^{\otimes(k-1)}$ acts on the Hilbert space of all reference systems except $R_x$.
For k ≥ d, the optimal probability of success in distinguishing between the states $\{\rho_x\}_{x=1}^{k}$ is $p_{\rm succ} = d^2/(k-1+d^2)$ [9]. If the unknown process is probed for N times, the output state is $\rho_x^{\otimes N}$ and the probability of success is

Unknown unitary gates
Let us consider the scenario where the unitary gate U is completely unknown. By the same argument as in Supplementary Note 1, the minimum worst-case error probability is equal to the minimum error probability in distinguishing between the average channels The symmetry of the problem implies that the optimal input states are of the form where $P_{\lambda_i}$ is the projector on the representation space $\mathcal{R}_{\lambda_i}$ in the tensor product $(\mathcal{H}^{\otimes N})_i$ of the N systems corresponding to variable $A_i$, and the subscript $\mathcal{M}_{\lambda_i}$ denotes the multiplicity space in $(\mathcal{H}^{\otimes N})_i$. When the input variables are initialized in the state $\rho_{AR}$, the output is where $\mathrm{Tr}_{\mathcal{M}_{\lambda_{\bar{x}}}}$ is the trace over all multiplicity spaces except $\mathcal{M}_{\lambda_x}$. We now show that the true cause can be identified using $O(\log_d k)$ queries to the unknown process. We first provide an exact strategy using $\log_d k$ queries (at the leading order), and then show that the number of queries can be reduced to $\frac{1}{2}\log_d k$ (at the leading order) if a small error, vanishing in the large k limit, is tolerated.
Our exact strategy disregards the reference system R. In this strategy, we prepare the multiplicity systems in the product state We divide the indices i into L groups, labelled as $G_1, G_2,\dots,G_L$, and assign a distinct Young diagram to each group, so that $\lambda_i = \lambda_j$ for i, j in the same group. Within each group, we choose the states $|\psi_i\rangle_{\mathcal{M}_{\lambda_i}}$ to be orthogonal. This choice constrains the number of indices in group $G_l$ to be at most the dimension of the multiplicity space $\mathcal{M}_{\lambda_{G_l}}$, where $\lambda_{G_l}$ is the Young diagram assigned to the group $G_l$. In turn, this implies that the condition must be satisfied. Both bounds can be saturated, as one can choose L to be the number of Young diagrams in the decomposition of the tensor representation $U^{\otimes N}$. On the other hand, the multiplicities are lower bounded as is the multinomial coefficient [10,11]. Hence, we have the bound $\sum_\lambda m_\lambda \ge d^N/(N+1)^{d(d-1)/2}$, meaning that condition (122) can be satisfied with $N \ge \log_d k + O(\log\log k)$. Hence, the unknown cause can be identified with zero error using approximately $\log_d k$ queries.
We now construct a strategy that identifies the correct cause with $\frac{1}{2}\log_d k + O(\log\log k)$ queries and with vanishing error probability. In this strategy, all the input variables are initialized in the same sector, namely $\lambda_1 = \lambda_2 = \cdots = \lambda_k \equiv \lambda$. Specifically, we take N to be a multiple of d and choose λ to be the Young diagram corresponding to the trivial representation of SU(d). The strategy uses the reference system $R = \mathcal{M}_\lambda^{\otimes k}$ and the input state where $\Phi^+_\lambda$ is the projector on the maximally entangled state of two identical copies of $\mathcal{M}_\lambda$. Then, the output state is where the maximally entangled state $(\Phi^+_\lambda)_x$ involves the output system B and the x-th reference system, while all the remaining reference systems are in the maximally mixed state $Q_\lambda/m_\lambda$. Distinguishing among the states $\rho_{BR,x}$ is equivalent to distinguishing the states $\Phi_x$.
This problem has been solved in the context of port-based teleportation, and the minimum error probability is known to be $p_{\rm err} = (k-1)/(m_\lambda^2 + k - 1)$ [9]. Using Equation (89), we then obtain with $\lim_{N\to\infty} c(N) = 1$. Hence, a vanishing error probability can be obtained by setting $N = (\log_d k)(1+\epsilon)/2$ with $\epsilon > 0$.
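Both query counts can be explored numerically for small parameters. The sketch below rests on two readings of the text that should be treated as assumptions: the exact strategy is taken to require that the total multiplicity $\sum_\lambda m_\lambda$ is at least k, and the approximate strategy is evaluated with the formula $p_{\rm err} = (k-1)/(m_\lambda^2+k-1)$ for the rectangular diagram. Multiplicities are computed with the hook-length formula, and all function names are illustrative.

```python
from fractions import Fraction
from math import factorial


def young_diagrams(n, max_rows, max_len=None):
    """Yield all Young diagrams with n boxes and at most max_rows rows."""
    if max_len is None:
        max_len = n
    if n == 0:
        yield ()
        return
    if max_rows == 0:
        return
    for first in range(min(n, max_len), 0, -1):
        for rest in young_diagrams(n - first, max_rows - 1, first):
            yield (first,) + rest


def multiplicity(lam):
    """m_lambda via the hook-length formula (lam must be non-empty)."""
    col_len = [sum(1 for row in lam if row > j) for j in range(lam[0])]
    prod = 1
    for i in range(len(lam)):
        for j in range(lam[i]):
            prod *= lam[i] - j + col_len[j] - i - 1
    return factorial(sum(lam)) // prod


def total_multiplicity(n, d):
    """Sum of m_lambda over all diagrams with n boxes and at most d rows."""
    return sum(multiplicity(lam) for lam in young_diagrams(n, d))


def min_queries_exact(k, d):
    """Smallest N whose total multiplicity is at least k (assumed condition)."""
    n = 1
    while total_multiplicity(n, d) < k:
        n += 1
    return n


def error_with_reference(k, d, n):
    """(k-1)/(m^2+k-1), with m the multiplicity of the trivial representation."""
    if n % d != 0:
        raise ValueError("N must be a multiple of d")
    m = multiplicity((n // d,) * d)
    return Fraction(k - 1, m * m + k - 1)


if __name__ == "__main__":
    k, d = 1000, 2
    print("exact strategy, queries:", min_queries_exact(k, d))
    for n in (6, 10, 14, 18):
        print(n, float(error_with_reference(k, d, n)))
```

The error of the second strategy drops once $m_\lambda^2$ exceeds k, which is the origin of the factor 1/2 in the leading-order query count.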
When p is small enough, the rate can be larger than log d, the best classical rate in the noiseless scenario. Since noise can only increase the error probability, this implies a quantum-over-classical advantage in the noisy scenario. The same result holds for the discrimination of k ≥ 2 hypotheses, as the quantum Chernoff bound for multiple states is equal to the worst-case Chernoff bound among all pairs [8].
We now provide a partial discussion of the scenario where the functional dependence between cause and effect is unknown. This scenario can be modelled by concatenating the depolarizing channel with a completely unknown unitary gate acting on the input variable. The full analysis of the probability of error is substantially more complex, and we leave it as a topic of future research. Here we evaluate the error probability in the simplified scenario where the depolarization is heralded, meaning that when the system is depolarized to the maximally mixed state, the process outputs a classical outcome. Taking this piece of information into account, the error probability becomes $p_{\rm err} = \sum_{k=0}^{N}\binom{N}{k}(1-p)^k\,p^{N-k}\,p_{{\rm err},k}$, where $p_{{\rm err},k}$ is the probability of error with k noiseless experiments.
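The binomial mixture over the number of noiseless rounds is straightforward to evaluate once $p_{{\rm err},k}$ is known. In the sketch below, `placeholder_error` is a hypothetical stand-in for $p_{{\rm err},k}$ (a pure exponential decay chosen only for illustration, not the expression derived in the text), and the function names are assumptions of this example.

```python
from fractions import Fraction
from math import comb


def heralded_error(n_rounds, p, p_err_noiseless):
    """Sum over k of C(N,k) (1-p)^k p^(N-k) p_err,k.

    p is the depolarization probability and p_err_noiseless(k) returns the
    error probability with k noiseless experiments.
    """
    return sum(comb(n_rounds, k) * (1 - p) ** k * p ** (n_rounds - k)
               * p_err_noiseless(k)
               for k in range(n_rounds + 1))


def placeholder_error(k, d=2):
    """Hypothetical stand-in for p_err,k: (1/2) d^(-2k)."""
    return Fraction(1, 2 * d ** (2 * k))


if __name__ == "__main__":
    p = Fraction(1, 10)
    for n in (1, 5, 10):
        print(n, float(heralded_error(n, p, placeholder_error)))
```

For a purely exponential $p_{{\rm err},k} = a^k/2$ the sum collapses, by the binomial theorem, to $\frac{1}{2}\,[p + (1-p)\,a]^N$, which gives a quick consistency check of the implementation.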
The evaluation of $p_{{\rm err},k}$ is as follows. The input state of k maximally entangled states, averaged over all possible unitary gates is

Supplementary Note 8: Proof of Equation (29) in the main text
Step 1. Observe that the channels $\mathcal{C}_\pm(\rho) = \frac{2}{d^N\pm1}\,P_\pm\,(\rho\otimes I^{\otimes N})\,P_\pm$ are no-signalling. Indeed, for every subset $S\subseteq\{1,\dots,N\}$ one has that the input system $A(S) := \bigotimes_{i\in S}A_i$ cannot signal to the output system $BC(\bar{S}) := \bigotimes_{i\notin S}(B_i\otimes C_i)$. To check the no-signalling condition, we use the relation where $I_{B_iC_i}$ is the identity operator on the composite system $B_iC_i$, and ${\rm SWAP}_{B_iC_i}$ is the unitary operator that swaps systems $B_i$ and $C_i$. The state of the output system BC(S) is and depends only on the state of the input system A(S).
Step 2. Show that there exist coefficients a and b such that the maps a C + + b C − − 1/2 C 1,I and a C + + b C − − 1/2 C 2,I are completely positive.
Let us consider the N = 1 case first. By definition, one has with m 00 = a 2(d + 1) .
As an ansatz, we choose α = √ d − 1 x and β = √ d + 1 x, for some x > 0. Then, condition (138) becomes x ≥ 1 8d Note that the choice x = x 0 satisfies both conditions (138) and (137). Finally, note that the above derivation holds for arbitrary N , by replacing d with d N .
Step 3. Define the constant λ := a + b and the no-signalling channel C := (a C + + b C − )/λ. By construction, the maps λ C − 1/2 C 1,I and λ C − 1/2 C 2,I are completely positive. Explicit evaluation yields Finally, observe that the maps λ C −1/2 C 1,I and λ C −1/2 C 2,I are completely positive if and only if the Choi operators C, C 1,I , and C 2,I corresponding to C, C 1,I , and C 2,I satisfy the inequalities λ C ≥ 1/2 C 1,I and λ C ≥ 1/2 C 2,I . Inserting the expression of λ into Equation (26) of the main text, we then obtain the desired bound