Introduction

Quantum measurements are a key resource behind most quantum technologies1 and, moreover, they reveal some of the most startling non-classical features of quantum theory.2,3 Indeed, performing joint quantum measurements on composite systems is a key feature behind quantum teleportation, superdense coding, metrology, cryptography,4 quantum repeaters,1 and quantum networks more generally. Hence the ability to certify the non-classical nature of quantum measurements is vitally important for the functioning of quantum technology and additionally, for understanding some of the fundamental differences between quantum and classical physics. Moreover, as the manufacturers of quantum measurement devices may not always be trusted, such certifications should be device-independent. That is, they should rely only on output measurement statistics rather than any intrinsic quantum properties, such as knowledge of the underlying Hilbert space dimension.

Previous work on the certification of joint quantum measurements largely falls into two categories. The first uses witnesses to certify the presence of non-classical measurements,5,6,7 but is manifestly not device-independent. The second is device-independent, but requires post-selection (the ability to discard all runs of an experiment in which a specific event did not occur) to certify the presence of a quantum measurement.8 In the case of entanglement, such certification exploits the fact that applying an entangled measurement to two initially independent entangled states and post-selecting the outcome, that is, considering only those runs in which a single fixed outcome occurred, induces entanglement between the states, which can then be certified device-independently by violating a Bell inequality. This method hence detects quantum measurements through their action on states. That is, it certifies an entangled measurement through what it does, not what it is. This is in stark contrast with entangled states, whose non-classicality is easily certified through the violation of a Bell inequality. Such a violation implies a denial of (at least one of) the assumptions underlying Bell’s theorem, the modern treatment of which utilises the classical causal model framework to unify Bell’s original assumptions.9,10,11 Composite states are thus said to be non-classical if the correlations generated by locally measuring each composite system are inconsistent with an underlying classical causal model.

This paper remedies the discrepancy between the treatment of non-classicality in quantum states and measurements. In analogy with Bell’s theorem, a joint quantum measurement is said to be non-classical if the correlations generated by performing it on local preparations on each composite system are inconsistent with an underlying classical causal model. In the following section this classical causal model is introduced and a non-linear inequality on any distribution generated by it is derived. Violation of this inequality entails that the observed correlations are in conflict with the classical causal model. As the inequality depends only on observed output statistics, it is manifestly device-independent. Additionally, it will be demonstrated that this inequality provides a finer-grained notion of joint measurement non-classicality for general quantum measurements than the post-selection approach of ref. 8, discussed above, as it certifies the presence of non-classicality that cannot be revealed by examining correlations arising from post-selecting the outcomes of measurement alone.

Results

Certifying non-classical joint measurements

Recently, tools and techniques from the classical causal models framework have begun to see myriad applications in quantum information.9,10,11,12,13,14,15,16 For connections between related notions of causality and quantum information, see refs. 17,18,19,20,21,22,23,24,25. In this framework, the inputs and outputs of the agents’ measurement and preparation devices are represented by nodes in directed acyclic graphs (DAGs), with the arrows denoting the causal relationships between nodes. The structure of each DAG encodes conditional independence relations among the nodes (here the faithfulness condition is being assumed, see refs. 11,26 for a discussion). For instance, the no-signalling conditions P(A|X, Y) = P(A|X) and P(B|X, Y) = P(B|Y) follow directly11 from the structure of the DAG in Fig. 1. Indeed, the structure of the DAG specifies all the conditional independences between the nodes.26,27 In short, every relation between the inputs and outputs of the different agents is specified by the DAG.

Fig. 1

DAG representing the classical causal model. This causal structure is similar to the bilocality structure introduced in ref. 28, with a few key differences. Firstly, Alice and Bob have preparation devices, rather than measurement devices. Moreover, in the structure considered here, there is an arrow from the preparation outcome to the hidden variable, rather than the other way around as in the bilocality set-up. Hence, here, the hidden variables can a priori depend on the choice of preparation. That is, it does not follow from the above DAG that P(λ1|x) = P(λ1).

Consider three agents Alice, Bob, and Charlie. Alice and Bob both have devices which prepare a quantum state from some ensemble of states, given a choice between different ensembles. Charlie has a measurement device which jointly measures the states prepared by Alice and Bob. The actions of these devices are represented in a black-box manner. Alice and Bob’s devices have a classical input x, y (the choice of ensemble) respectively, and a classical output a, b (the state prepared from the chosen ensemble) respectively. Here it is assumed that a, b, x, y ∈ {0, 1}. Charlie has no classical input, as his device only performs a single measurement, but has a classical output C indexing the possible measurement outcomes. It is assumed in this section that C takes four values and hence is indexed by two bits, C = c0c1 ∈ {00, 01, 10, 11}. Preparing and measuring states in this manner gives rise to a conditional probability distribution P(a, b, c0c1|x, y).

In analogy with Bell’s theorem, a classical causal model for P(a, b, c0c1|x, y) is described by the DAG in Fig. 1, where λ1, λ2 are unobserved, independent random variables. If the correlations generated by performing Charlie’s measurement on Alice and Bob’s preparations are consistent with the DAG in Fig. 1, then they are said to be classical. That is, they are mediated by the hidden random variables λ1, λ2. One might wonder why there are two hidden variables, rather than one. This is due to the independence of Alice and Bob’s devices: \(P(a,b|x,y) = \mathop {\sum}\nolimits_{c_0c_1} P (a,b,c_0c_1|x,y) = P(a|x)P(b|y)\). If the correlations between Alice, Bob, and Charlie were mediated by a single hidden variable, then Alice and Bob’s marginal distribution would not, in general, be independent. A bound on the possible classically generated correlations is now presented.

Result 1. A distribution P(a, b, c0c1|x, y) generated by the DAG of Fig. 1 satisfies:

$$\sqrt {|M|} + \sqrt {|N|} \le 1,$$
(1)
$$\begin{array}{lll}\mathrm{where}\qquad M &=& \frac{1}{4}\mathop {\sum}\limits_{xy} {\langle A_xB_yC^0\rangle } ,\\ \qquad \qquad \quad N &=& \frac{1}{4}\mathop {\sum}\limits_{xy} {( - 1)^{x + y}} \langle A_xB_yC^1\rangle ,\\ \mathrm{and}\quad \langle A_xB_yC^i\rangle &=& \mathop {\sum}\limits_{abc_0c_1} {( - 1)^{a + b + c_i}} P(a,b,c_0c_1|x,y).\end{array}$$
(2)
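The quantities M and N in Eq. (2) are straightforward to evaluate from a table of observed statistics. The following is a minimal sketch, not part of the original derivation; it assumes the distribution is stored as a Python dictionary keyed by the tuple (a, b, c0, c1, x, y), and the names `correlator` and `result1_lhs` are hypothetical helpers introduced only for illustration.

```python
import itertools
import math

BITS = (0, 1)

def correlator(P, x, y, i):
    """<A_x B_y C^i> of Eq. (2) for P keyed as P[(a, b, c0, c1, x, y)]."""
    total = 0.0
    for a, b, c0, c1 in itertools.product(BITS, repeat=4):
        c_i = (c0, c1)[i]
        total += (-1) ** (a + b + c_i) * P[(a, b, c0, c1, x, y)]
    return total

def result1_lhs(P):
    """Left-hand side sqrt|M| + sqrt|N| of inequality (1)."""
    M = sum(correlator(P, x, y, 0) for x in BITS for y in BITS) / 4
    N = sum((-1) ** (x + y) * correlator(P, x, y, 1)
            for x in BITS for y in BITS) / 4
    return math.sqrt(abs(M)) + math.sqrt(abs(N))

# Two trivially classical examples (illustrative choices, not from the text):
# uniformly random outputs, and devices that always output a = b = c0 = c1 = 0.
uniform = {key: 1 / 16 for key in itertools.product(BITS, repeat=6)}
constant = {key: 1.0 if key[:4] == (0, 0, 0, 0) else 0.0
            for key in itertools.product(BITS, repeat=6)}

print(result1_lhs(uniform))   # M = N = 0
print(result1_lhs(constant))  # M = 1, N = 0, so the bound is met with equality
```

Both examples respect inequality (1); the constant-output devices already sit on the boundary of the classical set.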

Note that, in contrast to standard Bell inequalities, the inequality presented above is non-linear in the joint distribution P(a, b, c0c1|x, y). This is due to the independence of Alice and Bob’s preparations. The proof of Result 1 is similar to the derivation of the bilocality inequality from ref. 28, with a few key differences. First, in the case considered here, the hidden variables can a priori depend on the choice of preparation. That is, it does not follow from the DAG of Fig. 1 that P(λ1|x) = P(λ1). Lastly, Alice and Bob have preparation devices, rather than measurement devices.

Proof. Given the structure of the DAG from Fig. 1, it follows that P(a, b, c0c1|x, y) decomposes as

$${\int\!\!\!\!\!\int} d \lambda _1d\lambda _2P(a|x)P(b|y)P(c_0c_1|\lambda _1\lambda _2)P(\lambda _1|x)P(\lambda _2|y).$$
(3)

Define \(\langle A_x\rangle = \mathop {\sum}\nolimits_a {( - 1)^a} P(a|x)\), \(\langle B_y\rangle = \mathop {\sum}\nolimits_b {( - 1)^b} P(b|y)\), and \(\left\langle {C^i} \right\rangle _{\lambda _1\lambda _2} = \mathop {\sum}\nolimits_{c_0c_1} {( - 1)^{c_i}} P(c_0c_1|\lambda _1\lambda _2).\) It follows from the above decomposition that one can write \(\langle A_xB_yC^i\rangle\) as

$${\int\!\!\!\!\!\int} d \lambda _1d\lambda _2\langle A_x\rangle \langle B_y\rangle \left\langle {C^i} \right\rangle _{\lambda _1\lambda _2}P(\lambda _1|x)P(\lambda _2|y).$$
(4)

This, together with \(|\left\langle {C^i} \right\rangle _{\lambda _1\lambda _2}| \le 1\), implies

$$\begin{array}{l}|M| \le \left( {{\int} d \lambda _1\frac{{|\langle A_0\rangle P(\lambda _1|0) + \langle A_1\rangle P(\lambda _1|1)|}}{2}} \right)\\ \quad \quad \quad \cdot \left( {{\int} d \lambda _2\frac{{|\langle B_0\rangle P(\lambda _2|0) + \langle B_1\rangle P(\lambda _2|1)|}}{2}} \right).\end{array}$$
(5)

One can similarly bound |N|, the key difference being that the +’s in the above bound are replaced with −’s due to the occurrence of the \(( - 1)^{x + y}\) term in N. For real z, w, z′, w′ ≥ 0, it was proved in ref. 28 that the inequality \(\sqrt {zw} + \sqrt {z^{\prime}w^{\prime}} \le \sqrt {z + z^{\prime}} \sqrt {w + w^{\prime}}\) holds. Hence

$$\begin{array}{l}\sqrt {|M|} + \sqrt {|N|} \le \sqrt {{\int} d \lambda _1\left( {{\textstyle{{|\langle A_0\rangle P(\lambda _1|0) + \langle A_1\rangle P(\lambda _1|1)|} \over 2}} + {\textstyle{{|\langle A_0\rangle P(\lambda _1|0) - \langle A_1\rangle P(\lambda _1|1)|} \over 2}}} \right)} \\ \cdot \sqrt {{\int} d \lambda _2\left( {{\textstyle{{|\langle B_0\rangle P(\lambda _2|0) + \langle B_1\rangle P(\lambda _2|1)|} \over 2}} + {\textstyle{{|\langle B_0\rangle P(\lambda _2|0) - \langle B_1\rangle P(\lambda _2|1)|} \over 2}}} \right)} \\ \le \sqrt {{\int} d \lambda _1\max (|\langle A_0\rangle P(\lambda _1|0)|,|\langle A_1\rangle P(\lambda _1|1)|)} \\ \cdot \sqrt {{\int} d \lambda _2\max (|\langle B_0\rangle P(\lambda _2|0)|,|\langle B_1\rangle P(\lambda _2|1)|)} \le 1\qquad \square \end{array}$$
(6)
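For readers who wish to retrace the first two steps of Eq. (6), the two elementary facts involved can be spelled out as follows. This is a sketch rather than a restatement of the argument in ref. 28; the symbols u and v are shorthand introduced here for \(\langle A_0\rangle P(\lambda _1|0)\) and \(\langle A_1\rangle P(\lambda _1|1)\) (and likewise for Bob). The first line is the Cauchy–Schwarz inequality applied to the vectors \((\sqrt z ,\sqrt {z^{\prime}} )\) and \((\sqrt w ,\sqrt {w^{\prime}} )\); the second is an identity valid for all real u, v.

$$\begin{array}{l}\sqrt {zw} + \sqrt {z^{\prime}w^{\prime}} = (\sqrt z ,\sqrt {z^{\prime}} ) \cdot (\sqrt w ,\sqrt {w^{\prime}} ) \le \sqrt {z + z^{\prime}} \sqrt {w + w^{\prime}} ,\\ \frac{{|u + v| + |u - v|}}{2} = \max (|u|,|v|).\end{array}$$

The first fact combines the separate bounds on |M| and |N| into a product of two square roots, and the second converts each integrand into the maximum appearing in the third line of Eq. (6).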

The bound from Result 1 can be classically saturated. To see this, consider the following. Let x, y be independent, uniformly distributed random bits. Let Alice’s (Bob’s) device output a = x1 (b = y1) with probability one, and let λ1 (λ2) equal a1 (b1) with probability one. Let Charlie have two independent and identically distributed random bits μ0 and μ1. When both μ0 and μ1 equal zero, Charlie’s device outputs (c0, c1) = (λ1λ2, ν) with probability one, where ν is another random bit. When μ0 and μ1 equal one, Charlie’s device outputs (c0, c1) = (ν, λ1λ2) with probability one. When μ0 ≠ μ1, Charlie’s device outputs (c0, c1) = (λ1, λ2) with probability one. When μ0 = μ1 = 0 it follows by a straightforward calculation that M = 1 and N = 0, and when μ0 = μ1 = 1, M = 0 and N = 1. In all remaining cases M = N = 0. As the probability that μ0 = μ1 = 0 is \(r^2\) and the probability that μ0 = μ1 = 1 is \((1 - r)^2\), where r = P(μ0 = 0) = P(μ1 = 0), all points \((M,N) = (r^2,(1 - r)^2)\) can be achieved. The boundary \(\sqrt {|M|} + \sqrt {|N|} = 1\) is thus classically saturated.

Quantum violation

Recall that Alice and Bob’s devices prepare a single state from an ensemble of two states, given a choice between two possible ensembles. For the quantum violation of the bound from Result 1, the preparation device used by Alice and Bob is the same. The functioning of this device will now be specified. For x = 0 (y = 0, respectively) one of the two states from the basis

$$\left\{ \cos \left( {\frac{\pi }{8}} \right)|0\rangle + \sin \left( {\frac{\pi }{8}} \right)|1\rangle ,\cos \left( {\frac{\pi }{8}} \right)|0\rangle - \sin \left( {\frac{\pi }{8}} \right)|1\rangle \right\}$$
(7)

is prepared, with the value of a (b)—the preparation outcome—denoting whether the first (‘0’th) or second (‘1’st) state is prepared. Each state is equally likely. For x = 1 (y = 1) one of the two states from the basis

$$\left\{ \cos \left( {\frac{{3\pi }}{8}} \right)|0\rangle + \sin \left( {\frac{{3\pi }}{8}} \right)|1\rangle ,\cos \left( {\frac{{3\pi }}{8}} \right)|0\rangle - \sin \left( {\frac{{3\pi }}{8}} \right)|1\rangle \right\}$$
(8)

is prepared. Again, the value of a (b) denotes whether the first (‘0’th) or second (‘1’st) state is prepared. For instance, if Alice chooses preparation x = 1 and observes outcome a = 0, then the state

$$\cos \left( {\frac{{3\pi }}{8}} \right)|0\rangle + \sin \left( {\frac{{3\pi }}{8}} \right)|1\rangle$$
(9)

has been prepared by her device. It is clear that the states received by Charlie depend on the choice of preparation.

A simple quantum realisation of Alice’s (Bob’s) preparation device is to prepare a maximally entangled |ψ〉 state between Alice’s (Bob’s) system and an ancilla, and perform a measurement on the ancilla to prepare a state on Alice’s (Bob’s) system. Given two distinct measurements that can be performed on the ancilla, there are two distinct ensembles of states to which Alice’s (Bob’s) system can be steered. The specific measurement outcome prepares a fixed state from the ensemble associated with that measurement. Note that as Alice has control of both her original system and the ancilla, she knows the measurement outcome on the ancilla system and hence what state is prepared on her system. There is hence a mathematical correspondence between the choice of measurement on the ancilla and the choice of preparation on the original system. In effect, when a single agent holds both systems, measuring one half of an entangled state is mathematically equivalent to preparing a state on the other system. This is schematically depicted in Fig. 2, see the caption for further details.

Fig. 2

Physical realisation of Alice and Bob’s preparation devices. The box corresponds to the preparation device, and the vertical line emerging from it is the system Alice (Bob) sends Charlie. Here, on the left-hand side of the diagram, x denotes the different possible ensembles and a denotes the specific state prepared from each ensemble. As Alice and Bob control both system and ancilla in the right-hand side diagram, they know the measurement outcome a′. Hence, they know the exact state prepared on their original system. More specifically, the choice of measurement on their ancilla, denoted by x, specifies the two different ensembles the original system can be steered to. Moreover, the specific measurement outcome a′ prepares a fixed state a from the ensemble associated with that choice of measurement.

Now, to achieve the specific state preparations described at the start of this section, Alice (Bob) performs either \((\sigma _Z + \sigma _X)/\sqrt 2\) (for x = y = 0) or \((\sigma _Z - \sigma _X)/\sqrt 2\) (for x = y = 1) on her (his) ancilla. This provides a concrete physical implementation of the preparation devices held by Alice and Bob, described earlier in this section.

Finally, Charlie performs the “noisy” Bell state measurement \(\{ E_{c_0c_1}\}\) on his system, where \(E_{c_0c_1} = p{\mathrm{ }}|\psi _{c_0c_1}\rangle \langle \psi _{c_0c_1}| + (1 - p){\Bbb I}/4\), and {|ψ00〉〈ψ00|, |ψ01〉〈ψ01|, |ψ10〉〈ψ10|, |ψ11〉〈ψ11|} is the Bell state measurement. As \(E_{c_0c_1} \ge 0,\forall c_0c_1\), and \(\mathop {\sum}\nolimits_{c_0c_1} {E_{c_0c_1}} = {\Bbb I}\), this is a valid measurement.
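The validity conditions just stated can be confirmed numerically. The sketch below is an illustration only: it assumes the four Bell states are written in the computational basis in the order Φ+, Φ−, Ψ+, Ψ− (the assignment of outcome labels c0c1 to Bell states does not affect validity), and `noisy_bell_povm` and `is_valid_povm` are hypothetical helper names.

```python
import numpy as np

def noisy_bell_povm(p):
    """Elements p|psi><psi| + (1 - p) I/4 for the four Bell states."""
    s = 1 / np.sqrt(2)
    bell_states = [
        s * np.array([1.0, 0.0, 0.0, 1.0]),   # Phi+
        s * np.array([1.0, 0.0, 0.0, -1.0]),  # Phi-
        s * np.array([0.0, 1.0, 1.0, 0.0]),   # Psi+
        s * np.array([0.0, 1.0, -1.0, 0.0]),  # Psi-
    ]
    return [p * np.outer(v, v) + (1 - p) * np.eye(4) / 4 for v in bell_states]

def is_valid_povm(elements, tol=1e-12):
    """Check positivity of every element and completeness of the set."""
    positive = all(np.linalg.eigvalsh(E).min() >= -tol for E in elements)
    complete = np.allclose(sum(elements), np.eye(4), atol=tol)
    return positive and complete

print(is_valid_povm(noisy_bell_povm(0.7)))  # noise parameter inside [0, 1]
print(is_valid_povm(noisy_bell_povm(1.2)))  # positivity fails for p > 1
```

Each element has eigenvalues p + (1 − p)/4 and (1 − p)/4, so positivity holds for every p in [0, 1], while completeness holds for any p because the Bell projectors sum to the identity.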

The correlations generated by the above preparation and measurement procedure are the same as those considered in Section III A of ref. 28, namely:

$$P(a,b,c_0c_1|x,y) = \frac{1}{{16}}\left( {1 + p( - 1)^{a + b}\left\{ \frac{{( - 1)^{c_0} + ( - 1)^{x + y + c_1}}}{2}\right\} } \right).$$
(10)

From this one obtains \(\sqrt {|M|} + \sqrt {|N|} = \sqrt {2p} ,\) providing a quantum violation for p > 1/2.
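The value \(\sqrt {2p}\) can be reproduced by inserting Eq. (10) directly into the definitions of M and N from Eq. (2). The following sketch does this by brute-force summation; the function names `P_quantum` and `M_and_N` are hypothetical and introduced only for this illustration.

```python
import itertools
import math

def P_quantum(a, b, c0, c1, x, y, p):
    """The distribution of Eq. (10) for noise parameter p."""
    bracket = ((-1) ** c0 + (-1) ** (x + y + c1)) / 2
    return (1 + p * (-1) ** (a + b) * bracket) / 16

def M_and_N(p):
    """M and N of Eq. (2), summed over all outcomes and settings."""
    M = N = 0.0
    for a, b, c0, c1, x, y in itertools.product((0, 1), repeat=6):
        prob = P_quantum(a, b, c0, c1, x, y, p)
        M += (-1) ** (a + b + c0) * prob / 4
        N += (-1) ** (x + y) * (-1) ** (a + b + c1) * prob / 4
    return M, N

p = 0.8
M, N = M_and_N(p)
lhs = math.sqrt(abs(M)) + math.sqrt(abs(N))
print(M, N)                    # both equal p / 2
print(lhs, math.sqrt(2 * p))   # the two values agree and exceed 1
```

Both M and N evaluate to p/2, so the left-hand side of inequality (1) equals \(2\sqrt {p/2} = \sqrt {2p}\), which exceeds one exactly when p > 1/2.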

Post-selection

Ref. 8 demonstrated that the presence of an entangled measurement can be certified in a device-independent fashion using post-selection. This was achieved by exploiting the fact that performing an entangled measurement on two initially independent entangled states and post-selecting the outcome, that is, considering only those runs of the experiment in which a specific fixed outcome occurs, induces entanglement between the states, which can then be certified device-independently by violating a Bell inequality. (Note that post-selection here does not refer to finite sampling effects. It is hence not related to the fair sampling loophole in Bell experiments, which concerns practical limitations on the efficiencies of measurement devices.) This method detects entangled measurements through their action on states by showing that for each fixed measurement outcome the induced correlations are non-classical. In the current work a novel method has been introduced which certifies general measurement non-classicality not through what the measurement does, but what it is. These two approaches coincide for entangled measurements,8 but do they coincide for general non-classical measurements? That is, if a measurement is non-classical in the sense that it violates the bound from Result 1, are the correlations induced between Alice and Bob’s devices on post-selection of Charlie’s outcome always non-classical? It will now be shown that, surprisingly, the existence of a separate classical model for each post-selected measurement outcome does not imply the measurement is classical in the sense of Fig. 1.

Note that given the realisations, introduced in the previous section, of Alice and Bob’s preparation devices involving steering using projective measurements on an ancilla, it follows that non-classical correlations between Alice and Bob’s preparation devices are equivalent to non-classical correlations between projective measurements performed on their ancillas.

Now, consider the following. Allow Charlie to perform a noisy Bell state measurement with noise parameter p and post-select on an arbitrary fixed outcome. If Alice and Bob each have their own Bell state, then Charlie’s joint measurement on two of their systems induces a noisy Bell state, with the same noise parameter p, between Alice and Bob’s ancillas. For instance, if Charlie post-selects outcome \(E_{00} = p{\mathrm{ }}|\psi ^ - \rangle \langle \psi ^ - | + (1 - p){\Bbb I}/4\), then Alice and Bob’s ancillas will be in the state \(p|\psi ^ - \rangle \langle \psi ^ - | + (1 - p){\Bbb I}/4\). Hence, classically simulating Charlie’s joint noisy Bell measurement on Alice and Bob’s preparations is equivalent to classically simulating local projective measurements on Alice and Bob’s ancillas in the induced noisy Bell state. As shown in ref. 29, such correlations can be classically simulated for p < 0.68. But, as shown in the previous section, the non-post-selected measurement is non-classical as long as p > 1/2. To summarise, the following has been shown:

Result 2. The existence of a separate classical model for each joint measurement outcome, adhering to the constraints imposed by Fig. 1, does not imply the joint measurement is classical in the sense of Fig. 1 and Result 1.

An intuitive explanation of this result could be that, as Charlie’s measurement outcomes can overlap on certain states, classical models for each individual measurement outcome cannot always be combined consistently.
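The entanglement-swapping step used in the argument for Result 2 can be checked numerically. The sketch below is illustrative and rests on assumptions not fixed by the text: Alice and Bob are each taken to hold the Bell state Φ+ (the text only specifies "their own Bell state"), the qubits are ordered as Alice's ancilla, Alice's system, Bob's system, Bob's ancilla, and `induced_ancilla_state` is a hypothetical helper name.

```python
import numpy as np

def induced_ancilla_state(p):
    """State left on the two ancillas after Charlie obtains the noisy
    outcome p|psi-><psi-| + (1 - p) I/4 on the two transmitted systems."""
    s = 1 / np.sqrt(2)
    phi_plus = s * np.array([1.0, 0.0, 0.0, 1.0])    # assumed shared state
    psi_minus = s * np.array([0.0, 1.0, -1.0, 0.0])
    E = p * np.outer(psi_minus, psi_minus) + (1 - p) * np.eye(4) / 4

    # Tensor indices [a', a, b, b']: ancilla/system for Alice, system/ancilla for Bob.
    Psi = np.kron(phi_plus, phi_plus).reshape(2, 2, 2, 2)
    E4 = E.reshape(2, 2, 2, 2)  # E4[a, b, x, y] = <ab|E|xy>

    # Unnormalised state Tr_{AB}[(I x E x I) |Psi><Psi|] on the two ancillas.
    sigma = np.einsum('abxy,pxyq,rabs->pqrs', E4, Psi, Psi.conj()).reshape(4, 4)
    prob = np.trace(sigma).real
    return sigma / prob, prob

p = 0.6
state, prob = induced_ancilla_state(p)
s = 1 / np.sqrt(2)
psi_minus = s * np.array([0.0, 1.0, -1.0, 0.0])
expected = p * np.outer(psi_minus, psi_minus) + (1 - p) * np.eye(4) / 4
print(np.allclose(state, expected), prob)
```

Under these assumptions the post-selected ancilla state is the noisy singlet with the same parameter p, and the outcome occurs with probability 1/4, in line with the statement above.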

Generalisation to n systems and \(2^k\) outcomes

The inequality from Result 1 will now be generalised to allow for n systems, k choices for each preparation device (each of which has two possible outcomes) and \(2^k\) possible outcomes for Charlie’s joint measurement, indexed using k bits c0...ck−1. Result 1 corresponds to the n = k = 2 case. As before, the classical causal model is depicted graphically, now in Fig. 3.

Fig. 3

DAG for composite joint measurement

Result 3. A distribution

$$P(a_1, \ldots ,a_nc_0 \ldots c_{k - 1}|x_1, \ldots ,x_n),$$
(11)

with ai, cj ∈ {0, 1} and xi ∈ {0, …, k − 1}, generated by the DAG of Fig. 3 satisfies the following inequality:

$${\cal{S}}: = \mathop {\sum}\limits_{i = 0}^{k - 1} {|I_i|^{1/n}} \le k - 1,$$
(12)

where \(I_i = \frac{1}{{2^n}}\mathop {\sum}\nolimits_{x_1, \ldots ,x_n = i}^{i + 1} {\langle A_{x_1}^1 \cdots A_{x_n}^nC^i\rangle }\), for i ranging from 0 to k − 1, with \(A_k^j = - A_0^j\) and \(\langle A_{x_1}^1 \cdots A_{x_n}^nC^i\rangle = \mathop {\sum}\nolimits_{a_1, \ldots ,a_n,c_0 \cdots c_{k - 1}} {( - 1)^{c_i + \mathop {\sum}\nolimits_{j = 1}^n {a_j} }} P(a_1, \ldots ,a_nc_0 \cdots c_{k - 1}|x_1, \ldots ,x_n)\).
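As a consistency check, the n = k = 2 case can be worked out explicitly. The sketch below assumes that the convention identifying the k-th setting with minus the zeroth is read as a sign flip of any correlator containing that setting, and it identifies the two parties \(A^1, A^2\) with Alice and Bob of Result 1.

$$\begin{array}{lll}I_0 &=& \frac{1}{4}\mathop {\sum}\limits_{x_1,x_2 = 0}^1 {\langle A_{x_1}^1A_{x_2}^2C^0\rangle } = M,\\ I_1 &=& \frac{1}{4}\mathop {\sum}\limits_{x_1,x_2 = 1}^2 {\langle A_{x_1}^1A_{x_2}^2C^1\rangle } \\ &=& \frac{1}{4}\left( {\langle A_1^1A_1^2C^1\rangle - \langle A_1^1A_0^2C^1\rangle - \langle A_0^1A_1^2C^1\rangle + \langle A_0^1A_0^2C^1\rangle } \right) = N.\end{array}$$

Hence \({\cal{S}} = \sqrt {|M|} + \sqrt {|N|} \le k - 1 = 1\), which is the inequality of Result 1.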

Proof. Given the decomposition of the distribution over the agents’ preparations and Charlie’s measurement,

$$P(a_1, \ldots ,a_nc_0 \cdots c_{k - 1}|x_1, \ldots ,x_n),$$
(13)

implied by the structure of Fig. 3, it follows that

$$|I_i| \le \mathop {\prod}\limits_{j = 1}^n \left( \frac{1}{2}{\int} | \mathop {\sum}\limits_{x_j = i}^{i + 1} {\langle A_{x_j}^j\rangle } p(\lambda _j|x_j)|d\lambda _j\right),$$
(14)

where \(\langle A_{x_j}^j\rangle = \mathop {\sum}\nolimits_{a_j} {( - 1)^{a_j}} P(a_j|x_j)\).

It was shown in ref. 30 that, for \(c_i^k \in {\Bbb R}_ +\) and \(m,n \in {\Bbb N}\), the following holds:

$$\mathop {\sum}\limits_{k = 1}^m {\left(\mathop {\prod}\limits_{i = 1}^n {c_i^k} \right)^{1/n}} \le \mathop {\prod}\limits_{i = 1}^{n} \left( {c_i^1 + c_i^2 + \cdots + c_i^m} \right)^{1/n} .$$
(15)
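The inequality of ref. 30, with the product on the right-hand side running over i = 1, …, n, can be sanity-checked on random non-negative inputs. The sketch below is a numerical illustration rather than a proof; it assumes the coefficients are stored in an array `c` of shape (m, n) with `c[k, i]` playing the role of \(c_i^k\), and `lemma_sides` is a hypothetical helper name.

```python
import numpy as np

def lemma_sides(c):
    """Return (lhs, rhs) of the inequality for c of shape (m, n), c >= 0."""
    m, n = c.shape
    lhs = np.sum(np.prod(c, axis=1) ** (1.0 / n))   # sum over k of (prod over i)^(1/n)
    rhs = np.prod(np.sum(c, axis=0) ** (1.0 / n))   # prod over i of (sum over k)^(1/n)
    return lhs, rhs

rng = np.random.default_rng(0)
for _ in range(1000):
    m, n = rng.integers(1, 6, size=2)
    lhs, rhs = lemma_sides(rng.random((m, n)))
    assert lhs <= rhs + 1e-12

# A small hand-checkable instance with m = n = 2.
print(lemma_sides(np.array([[1.0, 4.0], [4.0, 1.0]])))
```

For the hand-checkable instance, the left-hand side is \(\sqrt 4 + \sqrt 4 = 4\) and the right-hand side is \(\sqrt 5 \cdot \sqrt 5 = 5\). For n = 2 the inequality reduces to the one used in the proof of Result 1.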

Applying this result to \({\cal{S}} = \mathop {\sum}\limits_{i = 0}^{k - 1} {|I_i|^{1/n}}\) yields

$${\cal{S}} \le \left[ {\mathop {\prod}\limits_{j = 1}^n {\frac{1}{2}} {\int} {\left( {|\langle A_0^j\rangle p(\lambda _j|0) + \langle A_1^j\rangle p(\lambda _j|1)| \,+} \right.} } \right.$$
(16)
$$\left. {\left. { \cdots + |\langle A_{k - 1}^j\rangle p(\lambda _j|k - 1) - \langle A_0^j\rangle p(\lambda _j|0)|} \right)d\lambda _j} \right]^{1/n}.$$
(17)

The following upper bound holds:

$$\begin{array}{l}\frac{1}{2}\left( {|\langle A_0^j\rangle p(\lambda _j|0) + \langle A_1^j\rangle p(\lambda _j|1)| + } \right.\\ \left. { \cdots + |\langle A_{k - 1}^j\rangle p(\lambda _j|k - 1) - \langle A_0^j\rangle p(\lambda _j|0)|} \right) \le k - 1.\end{array}$$
(18)

Hence, one has

$${\cal{S}} \le \left(\mathop {\prod}\limits_{j = 1}^n {{\int} {\left( {k - 1} \right)^n} } d\lambda _j\right)^{1/n} = k - 1\qquad \qquad \square$$
(19)

Discussion

This paper has introduced a novel notion of non-classicality for joint quantum measurements. This notion took its cue from Bell’s theorem and the device-independent certification of entangled quantum states by stipulating a joint quantum measurement to be non-classical if the correlations generated by performing it on local preparations are inconsistent with an underlying classical causal model. A non-linear inequality bounding the correlations achievable within this causal model was then derived as a witness for this inconsistency: a violation entails non-classicality. In future work it would be interesting to investigate the corresponding bounds for LOCC, unentangled, and entangled measurements, as was done in the semi-device-independent case by refs. 5,7.

Moreover, this approach was shown to provide a more fine-grained notion of non-classicality than the post-selection method of ref. 8. That is, there exist quantum joint measurements which admit a classical hidden variable model for each post-selected measurement outcome, but which are nevertheless non-classical and violate the inequality from Result 1. It would be interesting to determine whether there exists a quantum protocol exhibiting an information-theoretic advantage due to this discrepancy. That is, can an agent with access to the entire collection of correlations generated by a quantum joint measurement gain an advantage over an agent who only has access to a post-selected subset of those correlations?

In future work, connections between the notion of non-classicality introduced here and that of contextuality discussed in ref. 9 will be explored. Moreover, possible extensions to other experimental set-ups involving joint quantum measurements, such as the “triangle scenario” studied in refs. 31,32, will also be explored.

There has recently been a surge of interest in self-testing entangled measurements.33,34 While these methods provide robust ways to certify the presence of entangled measurements, they do not provide a clear definition of non-classicality for general joint quantum measurements, as Bell’s theorem does for entangled states. The current work remedied this situation by providing a clean notion of when a joint measurement should be said to be non-classical. Future work will look at the connections between these different approaches.

Note also that the work of ref. 35 gave a method to device-independently certify the presence of local quantum measurements. However, this method was relational in the sense that it only certified how non-classical one local measurement was with respect to another. That is, it only certified how two local measurements relate to each other, but not what they are individually. In the current work, the case of individual joint quantum measurements (i.e., a single measurement acting jointly on multipartite systems) is considered.

Finally, it is hoped that the current work will lead to further fruitful applications of the causal model framework to research in quantum information.