Information-Theoretic Implications of Quantum Causal Structures

The correlations that can be observed between a set of variables depend on the causal structure underpinning them. Causal structures can be modeled using directed acyclic graphs, where nodes represent variables and edges denote functional dependencies. In this work, we describe a general algorithm for computing information-theoretic constraints on the correlations that can arise from a given interaction pattern, where we allow for classical as well as quantum variables. We apply the general technique to two relevant cases: First, we show that the principle of information causality appears naturally in our framework and go on to generalize and strengthen it. Second, we derive bounds on the correlations that can occur in a networked architecture, where a set of few-body quantum systems is distributed among a larger number of parties.


INTRODUCTION
A causal structure for a set of classical variables is a graph in which every variable is associated with a node and a directed edge denotes functional dependence. Such a causal model offers a means of explaining dependencies between variables by specifying the process that gave rise to them. More formally, variables $X_1, \dots, X_n$ form a Bayesian network with respect to a directed acyclic graph if every variable $X_i$ depends only on its graph-theoretic parents $\mathrm{pa}_i$. This is the case [1,2] if and only if the distribution factorizes as
$$p(x_1, \dots, x_n) = \prod_{i=1}^{n} p(x_i \mid x_{\mathrm{pa}_i}). \tag{1}$$
One can ask the following fundamental question: Given a subset of variables, which correlations between them are compatible with a given causal structure? In this work, we measure "correlations" in terms of the collection of joint entropies of the variables (a precise definition will be given below). This problem appears in several contexts. In the young field of causal inference, the goal is to learn causal dependencies from empirical data [1,2]. If observed correlations are incompatible with a presumed causal structure, it can be discarded as a possible model. This is close to the reasoning employed in Bell's theorem [3], a connection which is increasingly appreciated among quantum physicists [4][5][6][7][8][9][10]. In the context of communication theory, these joint entropies describe the capacities that can be achieved in network coding protocols [11].
In this work, we are interested in quantum generalizations of causal structures. Nodes are now allowed to represent either quantum or classical systems, and edges are quantum operations. An important conceptual difference to the purely classical setup is rooted in the fact that quantum operations disturb their input.
Put differently, quantum mechanics does not assign a joint state to the input and the output of an operation. Therefore, there is in general no analogue to (1), i.e., a global density operator for all nodes in a quantum causal structure cannot be defined. However, if we pick a set of nodes that do coexist (e.g. because they are classical, or because they are created at the same instant of time), then we can again ask: Which joint entropies of coexisting nodes can result from a given quantum causal structure? The main contribution of this work is to describe a systematic algorithm for answering this question, generalizing previous results on the classical case [6,7,12,13,14]. We illustrate the versatility and practical relevance with two examples. The details, along with more examples including, e.g., dense coding schemes [15], are presented in the main text.
Distributed architectures. Consider a scenario where, in a first step, several few-body quantum states are distributed among a number of parties. In a second step, each party processes those parts of the states it has access to (e.g. by performing a coherent operation or a joint measurement). Such setups are studied e.g. in distributed quantum computing [16,17], quantum networks [18], quantum non-locality [19], and quantum repeaters [20]. Which limits on the resulting correlations are implied by the network topology alone? Our framework can be used to compute these systematically. We will, e.g., prove certain monogamy relations between the correlations that can result from measurements on distributed quantum states.
Information causality. The "no-signalling principle" alone is insufficient to explain the "degree of nonlocality" exhibited by quantum mechanics [21]. This has motivated the search for stronger, operationally motivated principles that may single out quantum mechanical correlations [22][23][24][25][26][27][28][29][30]. One of these is information causality (IC) [24,31], which posits that an $m$-bit message from Alice to Bob must not allow Bob to learn more than $m$ bits about a string held by Alice. A precise formulation of the protocol involves a relatively complicated quantum causal structure (Fig. 1b). It implies an information-theoretic bound on the mutual information between bits $X_i$ held by Alice and guesses $Y_i$ of these by Bob [24]. Here, we note that the IC setup falls into our framework and we put the machinery to use to generalize and strengthen it. We will show below that by taking additional information into account, our strengthened IC principle can identify super-quantum correlations that could not have been detected in the original formulation.

QUANTUM CAUSAL STRUCTURES
Informally, a quantum causal structure specifies the functional dependency between a collection of quantum and classical variables. We find it helpful to employ a graphical notation, where we aim to closely follow the conventions of classical graphical models [1,2]. There are two basic building blocks: Root nodes are labeled by a set of quantum systems and represent a density operator $\rho_{AB}$ for these systems.
The second type is given by nodes with incoming edges. Again, both the edges and the node carry the labels of quantum systems. Such symbols represent a quantum operation (completely positive, trace-preserving map) from the systems associated with the edges to the ones associated with the node. These blocks may be combined: a node containing a system $X$ can be connected to an edge with the same label. The interpretation is, of course, that $X$ serves as the input to the associated operation. For example, such a combined diagram can state that the state of system $C$ is the result of applying an operation $\Phi_{AB \to C}$ to a product state on $AB$. To avoid ambiguities, we will never use the same label in two different nodes (in particular, we always assume that the output systems of an operation are distinct from the input systems). For a more involved example, note that Fig. 1(a) gives a fairly readable representation of the cumbersome algebraic statement (2), in which the operation defined in the first line acts on the state defined in the second line. The graphical representation does not indicate which input state or which operation to employ. We suppress this information because we will be interested only in constraints on the resulting correlations that are implied by the topology of the interactions alone, regardless of the choice of states and maps. We will use round edges to denote classical variables (equivalently, quantum systems described by states which are diagonal in a given basis). In principle, classical variables could have more than one outgoing edge, though this does not happen in the examples considered here. Of course, the no-cloning principle precludes a quantum system being used as the input to two different operations. Only graphs that are free of cyclic dependencies can be interpreted as specifying a causal structure. Thus, as is the case in classical Bayesian networks, every quantum causal structure is associated with a directed acyclic graph (commonly abbreviated DAG).
We note that graphical notations for quantum processes have been used frequently before. The most popular graphical calculus is probably the gate model of quantum computation [32], where, directly opposite to our conventions, operations are nodes and systems are edges. Quantum communication scenarios are often visualized the same way we employ here [33]. The recently introduced generalized Bayesian networks of [9] are closely related to our system. There, the authors even allow for post-quantum resources.
We have noted in the introduction that a classical Bayesian network not only defines the functional dependencies between random variables, but also provides a structural formula (1) for the joint distribution of all variables in the graph. Again, such a joint state for all systems that appear in a quantum causal structure is not in general defined. However, other authors have considered quantum versions of distributions that factor as in (1) and have developed graphical notations to this end. Well-known examples include the related constructions that go by the name of finitely correlated states, matrix-product states, tree-tensor networks, or projected entangled pair states (a highly incomplete set of starting points to the literature is given by [34][35][36]). Also, certain definitions of quantum Bayesian networks [37] fall into that class.

ENTROPIC DESCRIPTION OF QUANTUM CAUSAL STRUCTURES
The entropic description of classical-quantum DAGs can be seen as a generalization of the framework for the case of purely classical variables [6,7,12,13,14], which consists of three main steps. In the first, we describe the constraints (given in terms of linear inequalities) on the entropies of the $n$ variables describing a DAG. In the second step, one adds to this basic set of inequalities the causal entropic constraints encoded in the conditional independencies implied by the DAG. In the last step, we eliminate from our description all terms involving variables that are not observable. The final result of this three-step program is the description of the marginal entropic constraints implied by the model under test.
We denote the set of indices of the random variables by $[n] = \{1, \dots, n\}$ and its power set (i.e., the set of subsets) by $2^{[n]}$. For every subset $S \in 2^{[n]}$ of indices, let $X_S$ be the random vector $(X_i)_{i \in S}$ and denote by $H(S) := H(X_S)$ the associated entropy (for some, still unspecified entropy function $H$). Entropy is then a function $H: 2^{[n]} \to \mathbb{R}$, $S \mapsto H(S)$ on the power set.
Note that, as entropies must fulfill some constraints, not all entropy vectors are possible. That is, given the linear space $R_n$ of all set functions and a function $h \in R_n$, the region of vectors in $R_n$ that correspond to entropies is given by $\{h \in R_n \mid h(S) = H(S) \text{ for some entropy function } H\}$.
Clearly, this region will depend on the chosen entropy function.
For purely classical variables, $H$ is chosen to be the Shannon entropy, given by $H(X_S) = -\sum_{x_S} p(x_S) \log_2 p(x_S)$. In this case an outer approximation to the associated entropy region has been studied extensively in information theory, the so-called Shannon cone $\Gamma_n$ [11], which is the basis of the entropic approach in classical causal inference [14]. The Shannon cone is the polyhedral closed convex cone of set functions $h$ that respect two elementary inequalities, known as polymatroidal axioms:
$$h(S \cup \{i\}) + h(S \cup \{j\}) \ge h(S \cup \{i, j\}) + h(S), \qquad h([n]) \ge h([n] \setminus \{i\}),$$
for all $i \neq j \in [n]$ and $S \subseteq [n] \setminus \{i, j\}$. The first relation is the sub-modularity (also known as strong subadditivity) condition, which is equivalent to the non-negativity of the conditional mutual information, e.g. $I(A : B|C) = H(A, C) + H(B, C) - H(A, B, C) - H(C) \ge 0$. The second inequality, known as monotonicity, is equivalent to the non-negativity of the conditional entropy, e.g. $H(A|B) = H(A, B) - H(B) \ge 0$.
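To make the polymatroidal axioms concrete, the following Python sketch computes the full Shannon entropy vector of a small joint distribution and checks sub-modularity and monotonicity in their elemental forms. The toy distribution (two uniform bits and their AND) and all function names are illustrative assumptions, not taken from the analysis above.

```python
from itertools import combinations, product
from math import log2

def marginal(p, subset):
    """Marginal distribution of the variables at the positions in `subset`."""
    out = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in subset)
        out[key] = out.get(key, 0.0) + prob
    return out

def entropy(p, subset):
    """Shannon entropy H(X_S) in bits."""
    return -sum(q * log2(q) for q in marginal(p, subset).values() if q > 0)

def entropy_vector(p, n):
    """Map every subset S of {0, ..., n-1} to H(X_S)."""
    return {frozenset(s): entropy(p, s)
            for k in range(n + 1) for s in combinations(range(n), k)}

def satisfies_polymatroid_axioms(h, n, tol=1e-9):
    full = frozenset(range(n))
    # Monotonicity (elemental form): h([n]) >= h([n] \ {i}).
    for i in range(n):
        if h[full] < h[full - {i}] - tol:
            return False
    # Sub-modularity (elemental form): h(S+i) + h(S+j) >= h(S+i+j) + h(S).
    for i, j in combinations(range(n), 2):
        rest = [k for k in range(n) if k not in (i, j)]
        for r in range(len(rest) + 1):
            for s in combinations(rest, r):
                S = frozenset(s)
                if h[S | {i}] + h[S | {j}] < h[S | {i, j}] + h[S] - tol:
                    return False
    return True

# Assumed toy distribution: X0, X1 are uniform independent bits, X2 = X0 AND X1.
p = {(a, b, a & b): 0.25 for a, b in product((0, 1), repeat=2)}
h = entropy_vector(p, 3)
```

Any vector obtained from a genuine distribution passes the check, whereas a set function that lowers the entropy of the full set below that of one of its subsets fails monotonicity.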
Here lies the first difference between classical and quantum variables, the latter being described in terms of the quantum analog of the Shannon entropy, the von Neumann entropy $H(\rho_{A,B}) = -\mathrm{Tr}(\rho_{A,B} \log \rho_{A,B})$. While quantum variables respect sub-modularity, the von Neumann entropy fails to comply with monotonicity. Note, however, that for sets consisting of both classical and quantum variables, monotonicity may still hold. That is because the uncertainty about a classical variable $A$ cannot be negative, even if we condition on an arbitrary quantum variable $\rho$, from which it follows that $H(A|\rho) \ge 0$ [31]. Furthermore, for a classical variable $A$, the entropy $H(A)$ reduces to the Shannon entropy [38].
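A standard worked example of the failure of monotonicity, given here as an illustration and assuming a maximally entangled two-qubit state (a choice not made in the text above), reads:

```latex
\begin{align}
  |\Phi^+\rangle_{AB} &= \tfrac{1}{\sqrt{2}}\bigl(|00\rangle + |11\rangle\bigr), &
  \rho_A &= \operatorname{Tr}_B |\Phi^+\rangle\langle\Phi^+| = \tfrac{1}{2}\,\mathbb{1}, \\
  H(\rho_{A,B}) &= 0, &
  H(\rho_A) &= 1, \\
  H(B|A) &= H(\rho_{A,B}) - H(\rho_A) = -1 < 0 .
\end{align}
```

The joint state is pure and therefore has zero entropy, while each marginal is maximally mixed, so the conditional entropy is negative. Weak monotonicity, $H(B|A) + H(B) \ge 0$, is still satisfied, here with equality.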
Another important difference in the quantum case is the fact that measurements (or, more generally, completely positive and trace-preserving (CPTP) maps) on a quantum state will generally disturb or destroy the state. To illustrate this, consider the classical-quantum DAG in Fig. 1 and the classical, observable variable $A$. It can without loss of generality be considered a deterministic function of its parents $A_1$ and $A_2$, as any additional local parent can be absorbed in the latter. For the variable $A$ to assume a definite outcome, a joint CPTP map is applied to both parents $A_1$ and $A_2$, which will in general disturb these variables. The variable $A$ does not coexist with the variables $A_1$ and $A_2$. Therefore, no entropy can be associated to these variables simultaneously, that is, $H(A, A_1, A_2)$ cannot be part of the entropic description of the classical-quantum DAG. Classically, this problem does not arise, as the underlying classical hidden variables could be accessed without disturbing them.
The elementary inequalities discussed above encode the constraints that the entropies of any set of classical or quantum random variables are subject to. Classically, the causal relationships between the variables are encoded in the conditional independencies (CIs) implied by the graph. These can be algorithmically enumerated using the so-called d-separation criterion [1]. Therefore, if one further demands that classical random variables form a Bayesian network with respect to some given DAG, their entropies will also satisfy the additional CI relations implied by the graph. The CIs, relations of the type $p(x, y|z) = p(x|z)p(y|z)$, define non-linear constraints in terms of probabilities, but translate faithfully into homogeneous linear constraints on the level of entropies, e.g. $I(X : Y|Z) = 0$. The CIs involving jointly coexisting variables also hold for the quantum causal structures considered here [9]. However, some classically valid CIs may, in the quantum case, involve non-coexisting variables and are therefore not valid for quantum variables. An example of this is illustrated below for the information causality scenario.
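The translation from a conditional independence to a linear entropic constraint can be checked numerically. The sketch below is a hedged illustration: it assumes a hypothetical classical common-cause network $Z \to X$, $Z \to Y$ with binary variables and a flip probability of 0.1, neither of which appears in the text, and evaluates $I(X : Y|Z)$ and $I(X : Y)$ from joint entropies.

```python
from itertools import product
from math import log2

def H(p, idx):
    """Shannon entropy (bits) of the marginal on the positions in `idx`."""
    m = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in idx)
        m[key] = m.get(key, 0.0) + prob
    return -sum(q * log2(q) for q in m.values() if q > 0)

def cond_mutual_info(p, a, b, c=()):
    """I(A : B | C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    a, b, c = tuple(a), tuple(b), tuple(c)
    return H(p, a + c) + H(p, b + c) - H(p, a + b + c) - H(p, c)

# Assumed network Z -> X, Z -> Y (positions: X=0, Y=1, Z=2).
# Z is a fair bit; X and Y are independent noisy copies of Z.
flip = 0.1
p = {}
for x, y, z in product((0, 1), repeat=3):
    px = 1 - flip if x == z else flip
    py = 1 - flip if y == z else flip
    p[(x, y, z)] = 0.5 * px * py  # p(z) p(x|z) p(y|z)

i_xy_given_z = cond_mutual_info(p, [0], [1], [2])  # vanishes: X and Y are d-separated by Z
i_xy = cond_mutual_info(p, [0], [1])               # strictly positive: common cause
```

The factorization $p(x, y|z) = p(x|z)p(y|z)$ is non-linear in the probabilities, yet it shows up as the single linear equation $I(X : Y|Z) = 0$ on the entropy vector.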
Furthermore, because terms like $H(A, A_1, A_2)$ are not part of our description, we need, together with the CIs implied by the quantum causal structure, a rule telling us how to relate the underlying quantum variables to their classical descendants, for example, how the entropies of $A_1$ and $A_2$ constrain those of $A$. This is achieved by the data processing (DP) inequality, another basic property that is valid both for the classical and quantum cases [32]. The DP inequality basically states that the information content of a system cannot be increased by acting locally on it. To exemplify, one DP inequality implied by the DAG in Fig. 1 is given by $I(A : B) \le I(A_1, A_2 : B_1, B_2)$, that is, the mutual information between the classical variables cannot be larger than the information shared by their underlying quantum parents.
Finally, we are interested in situations where not all joint distributions are accessible. Most commonly, this is because the variables of a DAG can be divided into observable and not directly observable ones (e.g. the underlying quantum states in Fig. 1). Given the set of observable variables, in the classical case, it is natural to assume that any subset of them can be jointly observed. However, in quantum mechanics the situation is more subtle. For example, position $Q$ and momentum $P$ of a particle are individually measurable; however, there is no way to consistently assign a joint distribution to both position and momentum of the same particle [3]. That is, while $H(Q)$ and $H(P)$ are part of the entropic description of classical-quantum DAGs, joint terms like $H(Q, P)$ cannot be part of it. This motivates the following definition: Given a set of variables $X_1, \dots, X_n$ contained in a DAG, a marginal scenario $\mathcal{M}$ is the collection of those subsets of $X_1, \dots, X_n$ that are assumed to be jointly measurable.
Given the inequality description of the DAG and the marginal scenario $\mathcal{M}$ under consideration, the last step consists of eliminating from this inequality description the variables that are not directly observable, that is, the variables that are not contained in $\mathcal{M}$. This is achieved, for example, via a Fourier-Motzkin (FM) elimination (see appendix for further details). In two of the examples below (information causality and quantum networks), all the observable quantities correspond to classical variables, corresponding, for example, to the outcomes of measurements performed on quantum states. Therefore, the marginal description will be given in terms of linear inequalities involving Shannon entropies only. For the super dense coding case, the final description involves a quantum variable, therefore implying a mixed inequality with Shannon as well as von Neumann entropy terms.

INFORMATION CAUSALITY
The IC principle can be understood as a kind of game: Alice receives a bit string $x$ of length $n$, while Bob receives a random number $s$ ($1 \le s \le n$). Bob's task is to make a guess $Y_s$ about the $s$th bit of the bit string $x$ using as resources i) an $m$-bit message $M$ sent to him by Alice and ii) some correlations shared between them. It would be expected that the amount of information available to Bob about $x$ should be bounded by the amount of information contained in the message, that is, $H(M)$. IC makes this notion precise, stating that the following inequality is valid in quantum theory [24]:
$$\sum_{s=1}^{n} I(X_s : Y_s) \le H(M), \tag{3}$$
where $I(X : Y)$ is the classical mutual information between the variables $X$ and $Y$ and the input bits of Alice are assumed to be independent. This inequality is valid for quantum correlations but is violated by all nonlocal correlations beyond Tsirelson's bound [24,31,39].
Consider the case where $X = (X_1, X_2)$ is a 2-bit string. The causal structure corresponding to the IC game is then the one shown in Fig. 1 b). The only relevant CI is given by $I(X_1, X_2 : A, B) = 0$. Note that classically the CI $I(X_1, X_2 : Y_s|M, B) = 0$ (with $s = 1, 2$) would also be part of our entropic description. However, because we cannot assign a joint entropy to $Y_s$ and $B$, this is no longer possible in the quantum case. We can now proceed with the general framework. Before doing so, we first need to specify which marginal scenario we are interested in. In Ref. [24] the authors implicitly restricted their attention to a particular marginal scenario. Proceeding with this marginal scenario, we find that the only non-trivial inequality characterizing the marginal entropic cone is inequality (4), which corresponds exactly to the IC inequality obtained in [38], where the input bits are not assumed to be independent.
Note, however, that by using the aforementioned marginal scenario, available information is being discarded. The most general possible marginal scenario is given by $\{X_1, X_2, Y_s, M\}$ (with $s = 1, 2$). That is, in this case we are also interested in how much information the guess $Y_1$ of the bit $X_1$, together with the message $M$, may contain about the bit $X_2$ (and similarly for $Y_2$ and $X_1$). Proceeding with this marginal scenario, we find different classes of non-trivial tight inequalities describing the marginal information causality cone. Of particular relevance is a tighter version of the original IC inequality, inequality (5). Two different interpretations can be given to this inequality: as a monogamy of correlations or as a classical quantification of causal influence.
For the first interpretation, consider for simplicity the case where the input bits are independent, that is, $I(X_1 : X_2) = 0$. These independent variables may, however, become correlated once we know the values of other variables that depend on them. That is, in general $I(X_1 : X_2|Y_2, M) \neq 0$. However, the underlying causal relationships between the variables impose constraints on how much we can correlate these variables. In fact, as we can see from (5), the more information the message $M$ and the guess $Y_i$ contain about the input bit $X_i$, the smaller is the correlation we can generate between the input bits. As an extreme example, suppose Alice decides to send $M = X_1 \oplus X_2$. Then $X_1$ and $X_2$ are fully correlated given $M$, but $M$ does not contain any information about the individual inputs $X_1$ and $X_2$.
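The extreme example $M = X_1 \oplus X_2$ can be verified directly. The following Python sketch assumes uniform, independent input bits and computes the relevant mutual informations from joint entropies; the helper names are illustrative and not part of the original analysis.

```python
from itertools import product
from math import log2

def H(p, idx):
    """Shannon entropy (bits) of the marginal on the positions in `idx`."""
    m = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in idx)
        m[key] = m.get(key, 0.0) + prob
    return -sum(q * log2(q) for q in m.values() if q > 0)

def cond_mutual_info(p, a, b, c=()):
    """I(A : B | C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    a, b, c = tuple(a), tuple(b), tuple(c)
    return H(p, a + c) + H(p, b + c) - H(p, a + b + c) - H(p, c)

# Positions: X1=0, X2=1, M=2. Inputs are uniform and independent; M = X1 XOR X2.
p = {(x1, x2, x1 ^ x2): 0.25 for x1, x2 in product((0, 1), repeat=2)}

i_inputs = cond_mutual_info(p, [0], [1])               # I(X1 : X2)
i_inputs_given_m = cond_mutual_info(p, [0], [1], [2])  # I(X1 : X2 | M)
i_x1_m = cond_mutual_info(p, [0], [2])                 # I(X1 : M)
i_x2_m = cond_mutual_info(p, [1], [2])                 # I(X2 : M)
```

The one bit carried by the message is spent entirely on correlating the inputs, $I(X_1 : X_2|M) = H(M) = 1$, leaving nothing for the individual bits.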
As for the second interpretation, we need to rely on the classical concept of how to quantify causal influence between two sets of variables $X$ and $Y$. As shown in [40], a good measure $C_{X \to Y}$ of the causal influence of a variable $X$ over a variable $Y$ should be lower bounded as $C_{X \to Y} \ge I(X : Y|\mathrm{Pa}^{X}_{Y})$, where $\mathrm{Pa}^{X}_{Y}$ stands for all the parents of $Y$ except $X$. That is, once the correlations between $X$ and $Y$ that are mediated via $\mathrm{Pa}^{X}_{Y}$ are excluded, the remaining correlations give a lower bound on the direct causal influence between the variables. Consider for instance that we allow for an arrow between the input bits $X$ and the guess $Y$. Then the classical CI $I(X_1, X_2 : Y_1, Y_2|M, B) = 0$, which is valid for the DAG in Fig. 1 b), no longer holds. In this case the corresponding conditional mutual information, an object that is part of the entropic description in the classical case, need not vanish. Proceeding with the general framework one can prove inequality (6): the degree of violation of (5) (for example, via a PR-box) gives exactly the minimum amount of direct causal influence required to obtain the same level of correlations within a classical model. Inequality (5) refers to the particular case of two input bits for Alice. As we prove in the appendix, a generalization (7) for any number of input bits is valid within quantum theory. We further notice that the IC scenario is quite similar to the super dense coding scenario [15], the only difference being that for the latter the message $M$ is a quantum state. On the level of the entropies, this difference translates into the fact that the monotonicity $H(M|X_0, X_1, B) \ge 0$ must be replaced by the weak monotonicity $H(M|X_0, X_1, B) + H(M) \ge 0$. As proved in the appendix, this implies that an inequality similar to (7) is also valid for the super dense coding scenario if one replaces $H(M)$ by $2H(M)$, that is, a quantum message (combined with the shared entangled state) may allow twice as much information to be transmitted.
Finally, to understand how much more powerful inequality (5) may be in witnessing postquantum correlations, we perform an analysis similar to the one in Ref. [41]. We consider the following section of the nonsignalling polytope:
$$p(a, b|x, y) = \gamma P_{\mathrm{PR}} + \epsilon P_{\mathrm{det}} + (1 - \gamma - \epsilon) P_{\mathrm{white}}, \tag{8}$$
with $P_{\mathrm{PR}}(a, b|x, y) = (1/2)\,\delta_{a \oplus b, xy}$, $P_{\mathrm{white}}(a, b|x, y) = 1/4$ and $P_{\mathrm{det}}(a, b|x, y) = \delta_{a,0}\,\delta_{b,0}$ corresponding, respectively, to the PR-box, white noise and a deterministic box. The results are shown in Fig. 2, where it can be seen that the new inequality is considerably more powerful than the original one. It can, for instance, witness the postquantumness of distributions that could not be detected before, even in the limit of many copies.
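For readers who wish to explore this slice numerically, the sketch below builds the mixture of a PR-box, a deterministic box and white noise, and checks normalization and no-signalling. The weight of the deterministic box is called `eps` here, and the CHSH value is evaluated only as a familiar diagnostic; it is an assumption of this illustration and not the criterion used to obtain the curves in Fig. 2.

```python
from itertools import product

def box(gamma, eps):
    """p(a,b|x,y) = gamma*P_PR + eps*P_det + (1-gamma-eps)*P_white, keyed by (a, b, x, y)."""
    p = {}
    for a, b, x, y in product((0, 1), repeat=4):
        pr = 0.5 if (a ^ b) == (x & y) else 0.0   # PR-box
        det = 1.0 if (a, b) == (0, 0) else 0.0    # deterministic box
        white = 0.25                              # white noise
        p[(a, b, x, y)] = gamma * pr + eps * det + (1 - gamma - eps) * white
    return p

def is_nonsignalling(p, tol=1e-12):
    for a, x in product((0, 1), repeat=2):
        # Alice's marginal p(a|x) must not depend on Bob's input y.
        m0, m1 = (sum(p[(a, b, x, y)] for b in (0, 1)) for y in (0, 1))
        if abs(m0 - m1) > tol:
            return False
    for b, y in product((0, 1), repeat=2):
        # Bob's marginal p(b|y) must not depend on Alice's input x.
        m0, m1 = (sum(p[(a, b, x, y)] for a in (0, 1)) for x in (0, 1))
        if abs(m0 - m1) > tol:
            return False
    return True

def chsh(p):
    """CHSH value E(0,0) + E(0,1) + E(1,0) - E(1,1) with E(x,y) = <(-1)^(a+b)>."""
    def corr(x, y):
        return sum((-1) ** (a ^ b) * p[(a, b, x, y)]
                   for a, b in product((0, 1), repeat=2))
    return corr(0, 0) + corr(0, 1) + corr(1, 0) - corr(1, 1)

p = box(gamma=0.5, eps=0.2)
```

With these conventions the CHSH value of the mixture is $4\gamma + 2\epsilon$, since the PR-box reaches 4, the deterministic box 2, and white noise 0.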

QUANTUM NETWORKS
Quantum networks are ubiquitous in quantum information. The basic scenario consists of a collection of entangled states that are distributed among several spatially separated parties in order to perform some informational task, e.g., entanglement percolation [18], entanglement swapping [43] or distributed computing [16,17]. A similar setup is of relevance in classical causal inference, namely the inference of latent common ancestors [14,44]. As we will show next, the topology alone of these quantum networks implies non-trivial constraints on the correlations that can be obtained between the different parties. We will consider the particular case where all the parties can be connected by at most bipartite states. We note, however, that our framework applies as well to the most general case and results along this line are presented in the appendix.
FIG. 2. A slice of the non-signalling polytope corresponding to the distributions (8). The lower black dashed line is an upper limit on quantum correlations obtained via the criterion in Ref. [42], while the upper solid black line bounds the set of non-signalling correlations. The solid red, blue and orange curves correspond, respectively, to the boundaries obtained with the IC inequalities (5), (3) and (4). Above each of these curves, the corresponding inequality is violated. See the appendix for details of how these curves are computed.
The problem can be restated as follows. Consider $n$ observable variables that may be assumed to have no direct causal influence on each other (as they are space-like separated). Given some observed correlations between them, the basic question is then: Can the correlations between these $n$ variables be explained by (hidden) common ancestors connecting at most 2 of them? The simplest of such common ancestor scenarios ($n = 3$), the so-called triangle scenario [5,44,45], is illustrated in Fig. 1.
In the case where the underlying hidden variables are classical (for example, separable states), the entropic marginal cone associated to this DAG has been completely characterized in Ref. [7]. Following the framework delineated before, we can prove that the same cone is obtained if we replace the underlying classical variables by quantum states (see appendix). This implies that entropically there are no quantum correlations in the triangle scenario.
The natural question is how to generalize this result to more general common ancestor structures for arbitrary $n$. With this aim, we prove in the appendix that the monogamy relation (9) recently derived in [14] is also valid for quantum theory. We also prove in the appendix that this inequality is valid for general non-signalling theories, generalizing the result obtained in [9] for $n = 3$. In addition, we show that for any nontrivial common ancestor structure there are entropic corollaries even if we allow for general non-signalling parents.
The inequality (9) can be seen as a kind of monogamy of correlations. Consider for instance the case $n = 3$ and label the common ancestor (any nonsignalling resource) connecting variables $V_i$ and $V_j$ by $\rho_{i,j}$. If the dependency between $V_1$ and $V_2$ is large, that means that $V_1$ has a strong causal dependence on their common ancestor $\rho_{1,2}$. That implies that $V_1$ should depend only mildly on its common ancestor $\rho_{1,3}$ and therefore its correlation with $V_3$ should also be small. The inequality (9) makes this intuition precise.

DISCUSSION
In this work, we have introduced a systematic algorithm for computing information-theoretic constraints arising from quantum causal structures. Moreover, we have demonstrated the versatility of the framework by applying it to a set of diverse examples from quantum foundations, quantum communication, and the analysis of distributed architectures. In particular, our framework readily allows one to obtain a much stronger version of information causality.
These examples aside, we believe that the main contribution of this work is to highlight the power of systematically analyzing entropic marginals. A number of future directions for research immediately suggest themselves. In particular, it will likely be fruitful to consider multi-partite versions of information causality or other information-theoretic principles and to further look into the operational meaning of entropy inequality violations.
We acknowledge support by the Excellence Initiative of the German Federal and State Governments (Grant ZUK 43) and the Research Innovation Fund from the University of Freiburg. DG's research is supported by the US Army Research Office under contracts W911NF-14-1-0098 and W911NF-14-1-0133 (Quantum Characterization, Verification, and Validation). CM acknowledges support by the German National Academic Foundation.

APPENDIX
A linear program framework for entropic inequalities
Given the inequality description of the entropic cone describing a causal structure, to obtain the description of an associated marginal scenario $\mathcal{M}$ we need to eliminate from the set of inequalities all variables not contained in $\mathcal{M}$. After this elimination procedure, we obtain a new set of linear inequalities, constraints that correspond to facets of a convex cone, more precisely the marginal entropic cone characterizing the compatibility region of a certain causal structure [7]. This can be achieved via a Fourier-Motzkin (FM) elimination, a standard linear programming algorithm for eliminating variables from systems of inequalities [46]. The problem with the FM elimination is that it is a doubly exponential algorithm in the number of variables to be eliminated. As the number of variables in the causal structure of interest increases, this elimination typically becomes computationally intractable.
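A single Fourier-Motzkin step can be sketched in a few lines of Python. The function below is a minimal illustration for homogeneous systems of the form $r \cdot h \ge 0$ with integer coefficients; it removes duplicate rows but performs no further redundancy removal, which any practical implementation would need. The function name and the toy system (the basic inequalities of two variables, from which $H(A, B)$ is eliminated) are assumptions made for this example.

```python
from functools import reduce
from math import gcd

def fm_eliminate(rows, k):
    """Project the cone {h : r . h >= 0 for all r in rows} by eliminating coordinate k.

    Rows are sequences of integer coefficients; the returned rows no longer
    contain column k.
    """
    pos = [r for r in rows if r[k] > 0]
    neg = [r for r in rows if r[k] < 0]
    out = [list(r) for r in rows if r[k] == 0]
    for p in pos:
        for q in neg:
            # Positive combination of p and q that cancels coordinate k.
            out.append([-q[k] * pc + p[k] * qc for pc, qc in zip(p, q)])
    result, seen = [], set()
    for r in out:
        r = [c for i, c in enumerate(r) if i != k]
        g = reduce(gcd, (abs(c) for c in r), 0)
        if g == 0:          # trivial inequality 0 >= 0
            continue
        r = tuple(c // g for c in r)
        if r not in seen:   # drop exact duplicates (after scaling)
            seen.add(r)
            result.append(r)
    return result

# Coordinates: (H(A), H(B), H(A,B)).
basic = [
    (-1, 0, 1),   # H(A,B) - H(A) >= 0   (monotonicity)
    (0, -1, 1),   # H(A,B) - H(B) >= 0   (monotonicity)
    (1, 1, -1),   # H(A) + H(B) - H(A,B) >= 0   (sub-modularity)
]
marginal = fm_eliminate(basic, 2)  # eliminate H(A,B)
```

In this toy case the projection onto the single-variable entropies returns only $H(A) \ge 0$ and $H(B) \ge 0$. Each step can multiply the number of rows, which is the source of the unfavourable scaling noted above.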
While it can be computationally very demanding to obtain the full description of a marginal cone, checking whether a given candidate inequality is respected by a causal structure is relatively easy. Consider that a given causal structure leads to a number $N$ of possible entropies. These are organized in an $N$-dimensional vector $h$. In the purely classical case, the graph consisting of $n$ nodes $(X_1, \dots, X_n)$ will lead to an $N = 2^n$-dimensional entropy vector that can be organized as $h = (H(\emptyset), H(X_n), H(X_{n-1}), H(X_{n-1} X_n), \dots, H(X_1, \dots, X_n))$. In the quantum case, since not all subsets of variables may jointly coexist, we will typically have that $N$ is strictly smaller than $2^n$.
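One way to realize this ordering in code is binary counting, where bit $j$ of the position index (least significant first) marks the presence of $X_{n-j}$. This reading is an assumption consistent with the first entries listed above; the function name is hypothetical.

```python
def subset_order(n):
    """Subsets of {1, ..., n} in the order (), (n,), (n-1,), (n-1, n), ..., (1, ..., n)."""
    order = []
    for k in range(2 ** n):
        # Bit j of k (least significant first) marks variable X_{n-j}.
        order.append(tuple(sorted(n - j for j in range(n) if (k >> j) & 1)))
    return order

labels = subset_order(3)
```

For $n = 3$ this yields the eight index sets from the empty set up to $(1, 2, 3)$. In the quantum case one would simply drop the positions belonging to subsets of variables that do not coexist.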
As explained in detail in the main text, for this entropy vector to be compatible with a given causal structure, a set of linear constraints must be fulfilled. These linear constraints can be cast as a system of inequalities of the form $Mh \ge 0$, where $M$ is an $m \times N$ matrix with $m$ being the number of inequalities characterizing the causal structure.
Given the entropy vector $h$, any linear entropic inequality can be written simply as the inner product $\langle I, h \rangle \ge 0$, where $I$ is the vector associated with the inequality. A sufficient condition for a given inequality to be valid for a given causal structure is that $\langle I, h \rangle \ge 0$ holds for every vector $h$ satisfying the associated set of inequalities $Mh \ge 0$. That is, to check the validity of a test inequality, one simply needs to solve the following linear program:
$$\min_{h \in \mathbb{R}^N} \langle I, h \rangle \quad \text{subject to} \quad Mh \ge 0.$$
Because the feasible set is a cone, the optimal value is either zero, in which case the inequality is implied by the constraints, or unbounded from below. In general, this linear program only provides a sufficient but not necessary condition for the validity of an inequality. The reason for that is the existence of non-Shannon type inequalities, which are briefly discussed below.
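By Farkas' lemma, a candidate vector $I$ is implied by the constraints $Mh \ge 0$ precisely when it can be written as a non-negative combination of the rows of $M$. The sketch below checks such a certificate rather than solving the linear program itself, so it illustrates the dual view, not the procedure used above. The toy rows, the candidate $H(A) + H(B) + H(C) \ge H(A, B, C)$ and the hand-picked weights are assumptions for this example; in practice the weights would be obtained from an LP solver.

```python
def combine(rows, weights):
    """Return sum_k weights[k] * rows[k] as a sparse dict, dropping zero coefficients."""
    assert all(w >= 0 for w in weights), "certificate weights must be non-negative"
    total = {}
    for row, w in zip(rows, weights):
        for key, coeff in row.items():
            total[key] = total.get(key, 0) + w * coeff
    return {key: coeff for key, coeff in total.items() if coeff != 0}

# Rows of M as sparse vectors indexed by subsets (H of the empty set is zero and omitted).
M = [
    {"A": 1, "B": 1, "AB": -1},     # I(A : B) >= 0    (sub-modularity)
    {"AB": 1, "C": 1, "ABC": -1},   # I(AB : C) >= 0   (sub-modularity)
    {"ABC": 1, "AB": -1},           # H(C|AB) >= 0     (monotonicity, unused below)
]

candidate = {"A": 1, "B": 1, "C": 1, "ABC": -1}  # H(A) + H(B) + H(C) - H(ABC) >= 0
y = [1, 1, 0]                                    # hand-picked certificate weights

is_certified = combine(M, y) == candidate
```

Adding the two sub-modularity rows cancels $H(A, B)$ and reproduces the candidate, so the inequality is of Shannon type. Failing to find a certificate for one particular choice of weights proves nothing; only the linear program decides whether any valid weights exist.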

Details about the new IC inequality
In the following we will discuss how to characterize the most general marginal scenario in the information causality scenario. We will start by discussing the purely classical case (i.e. Alice and Bob share classical correlations) and afterwards apply the linear program framework to prove that all inequalities characterizing the classical Shannon cone are also valid for quantum mechanical correlations.
The classical causal structure associated with information causality contains six classical variables $S = \{X_1, X_2, Y_1, Y_2, M, \lambda\}$. The variable $\lambda$ stands here for the classical analog of the quantum state $\rho_{AB}$. The most general marginal scenario that is compatible with the information causality game, and thus with protocols using more general resources such as nonlocal boxes, is given by $\mathcal{M} = \{X_1, X_2, Y_i, M\}$ (with $i = 1, 2$). The relevant conditional independencies implied by the graph are given by $I(X_1, X_2 : \lambda) = 0$ and $I(X_1, X_2 : Y_1, Y_2|M, \lambda) = 0$. The remaining conditional independencies are implied by the relevant ones together with the polymatroidal axioms for the set $S$ of variables, and in this sense are redundant. Given this inequality description (basic inequalities plus CIs), we need to eliminate from our description, via an FM elimination, all the variables not contained in $\mathcal{M}$.
Our first step was to eliminate the variable $\lambda$ from the system of inequalities. Doing that, one obtains a new set of inequalities for the five variables $S = \{X_1, X_2, Y_1, Y_2, M\}$. This set of inequalities is simply given by the basic inequalities plus one single nontrivial inequality implied by the CIs. We then proceed by eliminating all variables not contained in $\mathcal{M}$. The final inequality description of the marginal cone of $\mathcal{M}$ can be organized in two groups. The first group contains all inequalities that are valid for the collection of variables in $\mathcal{M}$ independently of the underlying causal relationships between them, that is, they follow from the basic inequalities alone. The second group contains the inequalities that follow from the basic inequalities plus the conditional independencies implied by the causal structure. These are the inequalities capturing the causal relations implied by information causality and there are 54 of them. Among these 54 inequalities, one of particular relevance is the tighter IC inequality (5) given in the main text.
One can prove, using the linear program framework delineated before, that this inequality is also valid for the corresponding quantum causal structure shown in Fig. 1 b). Following the discussion in the main text, the sets of jointly existing variables in the quantum case are given by $S_0 = \{X_1, X_2, A, B\}$, $S_1 = \{X_1, X_2, M, B\}$ and $S_2 = \{X_1, X_2, M, Y_i\}$ (with $i = 1, 2$). One can think about these sets of variables in a time-ordered manner. At time $t = 0$ the jointly existing variables are the inputs $X_1$ and $X_2$ of Alice, together with the shared quantum state $\rho_{AB}$. At time $t = 1$ Alice encodes the input bits into the message $M$, also using her correlations with Bob obtained through the shared quantum state. In doing so, Alice disturbs her part $A$ of the quantum system, which therefore no longer coexists with the variables in $S_1$. In the final step of the protocol, at time $t = 2$, Bob uses the received message $M$ and his part $B$ of the quantum state to make a guess $Y_1$ or $Y_2$ about Alice's inputs. Once more, by doing so $B$ ceases to coexist with the variables contained in $S_2$.
Following the general idea, we write down all the basic inequalities for the sets $S_0$, $S_1$ and $S_2$, together with the conditional independencies and the data processing inequalities. As discussed before, because $I(X_1, X_2 : Y_1, Y_2 | M, \lambda) = 0$ has no analogue in the quantum case, the only CI implied here is $I(X_1, X_2 : A, B) = 0$. The causal relations encoded in the other CIs are taken care of by the data processing (DP) inequalities. Below we list all the data processing inequalities used:

Note that some of these DP inequalities may be redundant, that is, they may be implied by other DP inequalities together with the basic inequalities.
We organize all the above constraints into a matrix M and, given a candidate inequality I, we run the linear program discussed before. In this way one can easily prove that inequality (5) is also valid in the quantum case.
Note that this computational analysis will in general be restricted by the number of variables involved in the causal structure.To circumvent that we provide in the following an analytical proof of the validity of the generalized IC inequality (7) for the quantum causal structure in Fig. 1 b).
Proof. First rewrite the following conditional mutual information as

The LHS of the inequality (7) can then be rewritten as

This quantity can be upper bounded as

leading exactly to the inequality (7). In the proof above we have used consecutively i) the data processing inequalities, ii) an inequality that can easily be proved inductively using the strong subadditivity property of entropies, iii) the monotonicity $H(M|X_1, \ldots, X_n, B) \ge 0$, iv) the independence relation $I(X_1, \ldots, X_n : B) = 0$ and v) the positivity of the mutual information $I(B : M) \ge 0$.

Note that this proof can easily be adapted to the case where the message $M$ sent from Alice to Bob is a quantum state. In this case there are two differences. First, because the message is disturbed in order to create the guess $Y_i$, we cannot assign an entropy to $M$ and $Y_i$ simultaneously. That is, on the LHS of the inequality (7) we replace $I(X_i : Y_i, M) \to I(X_i : Y_i)$ and $I(X_1 : X_i | Y_i, M) \to I(X_1 : X_i | Y_i)$. The second difference is in step iii), where we used the monotonicity $H(M|X_1, \ldots, X_n, B) \ge 0$, which is not valid for a quantum message. Instead, we can use a weak monotonicity inequality, namely $H(M|X_1, \ldots, X_n, B) + H(M) \ge 0$. Therefore, in the final inequality (7), $I(X_i : Y_i, M) \to I(X_i : Y_i)$, $I(X_1 : X_i | Y_i, M) \to I(X_1 : X_i | Y_i)$ and $H(M)$ is replaced by $2H(M)$ (here $H$ stands for the von Neumann entropy), which is exactly what should be expected from superdense coding [15].

Proving that in the triangle scenario the classical and quantum marginal cones coincide
Since 1998 it has been known that, for a number of variables $n \ge 4$, there are inequalities valid for Shannon entropies that cannot be derived from the elemental set of polymatroidal axioms (submodularity and monotonicity) [47]. These are the so-called non-Shannon type inequalities [11]. More precisely, the existence of these inequalities implies that the true entropic cone (denoted by $\Gamma^*_n$) is a strict subset of the Shannon cone $\Gamma_n$, that is, $\Gamma^*_n \subset \Gamma_n$. Recall that a convex cone has a dual description, either in terms of its facets or in terms of its extremal rays. In terms of the half-space description, the strict inclusion $\Gamma^*_n \subset \Gamma_n$ implies that while all Shannon type inequalities are valid for any true entropy vector, they may fail to be tight. In terms of the extremal rays, it implies that some of the extremal rays of the Shannon cone are not populated, that is, there is no well-defined probability distribution with an entropy vector corresponding to them.
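The direction "every true entropy vector satisfies all Shannon type inequalities" can be checked numerically for any given distribution. The following is a minimal sketch, assuming a joint distribution stored as a dictionary from outcome tuples to probabilities; the function names and the XOR example are illustrative choices, not taken from the text.

```python
from itertools import combinations
from math import log2


def entropy_vector(p, n):
    """Map every subset S of range(n) (as a sorted tuple) to the entropy H(S) in bits."""
    h = {(): 0.0}
    for size in range(1, n + 1):
        for S in combinations(range(n), size):
            marg = {}
            for outcome, prob in p.items():
                key = tuple(outcome[i] for i in S)
                marg[key] = marg.get(key, 0.0) + prob
            h[S] = -sum(q * log2(q) for q in marg.values() if q > 0)
    return h


def cond_mutual_information(h, i, j, K):
    """I(i : j | K) = H(iK) + H(jK) - H(ijK) - H(K), read off the entropy vector."""
    def H(*extra):
        return h[tuple(sorted(K + extra))]
    return H(i) + H(j) - H(i, j) - H()


def elemental_inequalities_hold(h, n, tol=1e-9):
    """Check monotonicity H(N) >= H(N minus i) and submodularity I(i : j | K) >= 0."""
    full = tuple(range(n))
    for i in full:
        rest = tuple(k for k in full if k != i)
        if h[full] - h[rest] < -tol:
            return False
    for i, j in combinations(full, 2):
        others = [k for k in full if k not in (i, j)]
        for size in range(len(others) + 1):
            for K in combinations(others, size):
                if cond_mutual_information(h, i, j, K) < -tol:
                    return False
    return True


# Illustrative distribution: two uniform bits and their XOR.
xor_triple = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
h = entropy_vector(xor_triple, 3)
print(h[(0, 1, 2)], elemental_inequalities_hold(h, 3))
```

The converse question, whether a given vector inside the Shannon cone is populated by some distribution, is the hard one and is not addressed by this check.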
Sometimes the projection of the outer approximation $\Gamma_n$ onto a subspace, described by the marginal cone $\Gamma_{\mathcal{M}}$, may lead to the true cone in the marginal space, that is, $\Gamma_{\mathcal{M}} = \Gamma^*_{\mathcal{M}}$ [6]. A sufficient condition for that to happen is that all the extremal rays of $\Gamma_{\mathcal{M}}$ are populated. Using this idea, in the following we prove that all the extremal rays of the Shannon marginal cone of the classical triangle scenario are populated, so that in this case the Shannon and true marginal cones coincide. We then use the linear program framework delineated previously to prove that all the corresponding inequalities are also valid for underlying quantum states, therefore proving that, entropically, the sets of classical and quantum correlations coincide.
Proceeding with the three-step program delineated in the main text, one can see that the marginal scenario $\{A, B, C\}$ of the triangle scenario is completely characterized by the following non-trivial Shannon type inequalities (and permutations thereof) [7]

plus the polymatroidal axioms for the three variables $A, B, C$. Given the inequality description of the marginal cone, we used the software PORTA [48] to recover its extremal rays. There are only 10 extremal rays, which can be organized in the 4 different types listed in Table I. Below we list the probability distributions reproducing the 4 different types of entropy vectors:

Since all the extremal rays are populated, this proves that $\Gamma_{\mathcal{M}} = \Gamma^*_{\mathcal{M}}$ for the marginal scenario $\mathcal{M} = \{A, B, C\}$ of the triangle scenario.
To prove that the same entropic cone holds for the associated quantum causal structure (Fig. 1 a)), we just need to prove that all the inequalities defining $\Gamma_{\mathcal{M}}$ hold true in the quantum case. Clearly, the polymatroidal axioms for $\{A, B, C\}$ also hold true in the quantum case, since these variables are classical. Using the linear programming framework detailed above, one can also prove that the inequalities (20), (21) and (22) hold if the underlying hidden variables stand for quantum states.
The sets of jointly existing variables in the quantum case are given by sets $S_i$, the last of which is $S_7 = \{A, B, C\}$. The fact that the quantum states are assumed to be independent is translated into a corresponding CI. The causal constraint that the observable variables have no direct influence on each other (all the correlations are mediated by the underlying quantum states) is encoded in the CIs $I(A : B|A_1) = I(A : B|B_1) = 0$ (and similarly for permutations of the variables). Below we also list all the data processing inequalities used:

and similarly for permutations of all variables. Again, note that some of these DP inequalities may be redundant, that is, they may be implied by other DP inequalities together with the basic inequalities.

Proving the monogamy relations of quantum networks
In the following we provide an analytical proof of the monogamy inequality (9) in the main text.
We start with the case $n = 3$. For a Hilbert space $\mathcal{H}$ we denote the set of quantum states on it, i.e. the set of positive semidefinite operators with trace one, by $\mathcal{S}(\mathcal{H})$.
Proof. Data processing yields

Then we exploit the chain rule twice and afterwards data processing again,

We therefore have

from which it follows that

where in the second line we used strong subadditivity and in the last line we used that the entropy of a classical state conditioned on a quantum state is non-negative.
This proof can easily be generalized to the case of an arbitrary number of random variables resulting from a classical-quantum Bayesian network in which each parent has at most two children.

$i \neq j$ and let be an arbitrary measurement for $i = 1, \ldots, n$. Then

Proof. First, utilize the independences in the same way as in the proof of Theorem 1 to conclude

Now assume

Using the proof of Theorem 1 again and stopping before the last inequality in (31) we get

i.e. we get

where we defined the primed systems by $\mathcal{H}'^{(n-1,1)} = \mathcal{H}^{(n-1,1)} \otimes \mathcal{H}^{(n,1)}$, observing that this yields a classical-quantum Bayesian network with $n - 1$ nodes and connectivity two, and used the induction hypothesis.

Proving the monogamy relation for GPTs
We want to prove the inequality

$$\sum_{j \in \{1, \ldots, n\},\ j \neq i} I(V_i : V_j) \le H(V_i) \qquad (38)$$

for random variables that constitute a generalized Bayesian network [9] with respect to a DAG where each parent correlates at most two of them, i.e. the random variables are results of measurements on a set of arbitrary non-signaling resources shared between two parties. The case of three random variables has been proven in [9]; the purpose of this appendix is to prove the generalization to an arbitrary number of random variables. We also want to prove that there are entropic corollaries for any fixed connectivity number of the parent nodes. To this end we have to introduce a framework, developed in [49], for handling generalized probabilistic theories that are non-signaling and have a property called local discriminability.
An operational probabilistic theory has two basic notions, systems and tests. Tests are the objects that represent any physical operation that is performed, e.g. the preparation of a state or a measurement. A test has input and output systems and can have a classical random variable as measurement outcome as well. An outcome together with the corresponding output system state is called an event. The components are graphically represented by a directed acyclic graph (DAG) where the nodes represent tests and the edges represent systems. We use the convention that the diagram is read from bottom to top, i.e. a test's input systems are represented by edges coming from below and its output systems are edges emerging from the top of the node. Finally we assume, just as is done in [9], that there exists a unique way of discarding a system. We call this the discarding test, and it is shown in [49] that its existence and uniqueness is equivalent to the non-signaling condition. These elements can now be connected by using the output system of one test as input system for another. An arrangement of tests is called a generalized Bayesian network (GBN). We also say that the arrangement forms a GBN with respect to a DAG $G$, or that a GBN has shape $G$, if the tests are arranged according to it, analogous to the classical case introduced in the main text.
The main ingredient for the proof of (38) for $n = 3$ in [9] is the following.

Lemma 3 ([9], Thm. 23). For any probability distribution $p(x, y, z)$ of random variables that are the classical output of a GBN with respect to the DAG in Figure 3 there is a probability distribution $p'$ such that

$$p'(x, z) = p'(x)\,p'(z), \qquad (39)$$
$$p'(x, y) = p(x, y),$$
$$p'(y, z) = p(y, z).$$

For our purposes we need a generalization of this result. The GBN for the scenario in which any parent connects at most $m$ children can be described as follows. The $n$ random variables $V_1, \ldots, V_n$ arise from $n$ measurement tests. For any $m$ of these measurement tests there is a preparation test whose output systems are input for exactly these measurements. We denote the preparation test corresponding to a subset $I \subset \{1, \ldots, n\}$, $|I| = m$, by $\sigma_I$. In total there are therefore $\binom{n}{m}$ preparations. This GBN can be found in Figure 4, and we denote the corresponding DAG by $G_{n,m}$.

Lemma 4. For any probability distribution $p$ arising from a GBN of shape $G_{n,m}$ and any index $i \in \{1, \ldots, n\}$ there is a probability distribution $p'$ such that any bivariate marginal involving $V_i$ is equal to the corresponding marginal of $p$ and $p'(v_1, \ldots, v_{i-1}, v_{i+1}, \ldots, v_n)$ is compatible with $G_{n-1,m-1}$.
Proof. Analogous to the proof of Lemma 3 in [9], we define a new GBN from the old one as follows:
• Any preparation test $\sigma_I$ with $i \in I$ is left as it is.
• Any preparation test $\sigma_I$ with $i \notin I$ is copied. In one copy the first outgoing edge is discarded and in the second copy all edges except the first are discarded.
The modified GBN is depicted in Figure 5 for $n = 3$, $m = 2$ and $i = 2$. It can be seen, analogously to the proof of Theorem 23 in [9], that the probability distribution arising from this GBN, taken as $p'$, has the desired properties.
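The copying construction can be made concrete in the purely classical special case, where every preparation test is a shared random bit. The sketch below treats $n = 3$, $m = 2$ and $i = 2$ by exact enumeration. Everything specific in it is an assumption made for illustration: the sources are uniform bits, the response functions (OR, XOR, OR) are arbitrary choices, and positions in the code are 0-indexed, so that $V_2$ sits at position 1.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def joint(modified):
    """Exact distribution of (V1, V2, V3) for classical sources s12, s23, s13.

    In the modified network the source s13 (the only one not feeding V2) is
    replaced by two independent copies: a feeds V1 only and b feeds V3 only.
    """
    dist = Counter()
    for s12, s23, a, b in product((0, 1), repeat=4):
        s13_for_v1, s13_for_v3 = (a, b) if modified else (a, a)
        v1 = s12 | s13_for_v1   # illustrative response functions
        v2 = s12 ^ s23
        v3 = s23 | s13_for_v3
        dist[(v1, v2, v3)] += Fraction(1, 16)
    return dist


def marginal(dist, keep):
    """Marginal distribution on the positions listed in keep."""
    out = Counter()
    for outcome, prob in dist.items():
        out[tuple(outcome[k] for k in keep)] += prob
    return out


p = joint(modified=False)
p_prime = joint(modified=True)

# Bivariate marginals involving V2 (position 1) are unchanged.
same_12 = marginal(p, (0, 1)) == marginal(p_prime, (0, 1))
same_23 = marginal(p, (1, 2)) == marginal(p_prime, (1, 2))

# V1 and V3 become independent under p', although they are correlated under p.
m1, m3, m13 = (marginal(p_prime, k) for k in ((0,), (2,), (0, 2)))
product_form = all(m13[(x, z)] == m1[(x,)] * m3[(z,)] for x in (0, 1) for z in (0, 1))
print(same_12, same_23, product_form)
```

In this toy case the three properties required of $p'$ in Lemma 3 hold exactly; the general statement for non-signaling resources rests on the argument of [9], not on this enumeration.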
We are now ready to prove the inequality (38).
Theorem 5. Let $V_1, \ldots, V_n$ be random variables defined by a GBN of shape $G_{n,2}$. Then for any $i \in \{1, \ldots, n\}$

$$\sum_{j \in \{1, \ldots, n\},\ j \neq i} I(V_i : V_j) \le H(V_i).$$

Proof. Without loss of generality we take $i = 1$. We proceed by induction over $n$. For $n = 2$ the inequality is trivially true. Assume now that the statement is true for $n - 1$. We construct the probability distribution $p'$ according to Lemma 4 and observe that $p'(v_2, \ldots, v_n)$ arises from $G_{n-1,1}$, i.e. it is a product distribution. Denote the modified random variables by $V'_1, \ldots, V'_n$ and calculate

$$\sum_{j=2}^{n} I(V_1 : V_j) = \sum_{j=2}^{n} I(V'_1 : V'_j) \le I(V'_1 : V'_2, V'_3) + \sum_{j=4}^{n} I(V'_1 : V'_j),$$

where the inequality follows from the independence of $V'_2$ and $V'_3$ and strong subadditivity. Now observe that with $X = (V'_2, V'_3)$ the distribution of $V'_1, X, V'_4, \ldots, V'_n$ is compatible with $G_{n-1,2}$ and therefore we have, using the induction hypothesis,

$$I(V'_1 : X) + \sum_{j=4}^{n} I(V'_1 : V'_j) \le H(V'_1).$$

But $p'(v_1) = p(v_1)$ and therefore (38) is proven.
For general $m$ the situation is somewhat less simple, but for the special case of $n = m + 1$ we can still prove a nontrivial inequality.

Theorem 6. Let $V_1, \ldots, V_{m+1}$ be random variables corresponding to a GBN of shape $G_{m+1,m}$. Then

and there is a set of random variables $X_1, \ldots, X_{m+1}$ incompatible with $G_{m+1,m}$ that violates this inequality.
Note that this inequality is, in particular, also true for quantum-classical Bayesian networks and, to our knowledge, provides the only known entropic corollaries in this case as well.
Proof (by induction). For $m = 1$ the statement is trivially true, as then the two random variables are independent and therefore $I(V_1 : V_2) = 0 \le 0$. Assume now that the inequality has been proven for $m - 1$. Construct random variables $V'_1, \ldots, V'_{m+1}$ according to Lemma 4. Then calculate

where the first inequality follows from strong subadditivity and the second inequality follows from the induction hypothesis and the trivial bound $I(X : Y) \le H(X)$. This completes the proof of the first assertion, i.e. that the inequality is fulfilled by random variables from a GBN of shape $G_{m+1,m}$. To see that more general random variables violate this inequality, let $X_i = X$, $i = 1, \ldots, m + 1$, where $X$ is an unbiased coin. In other words, the $X_i$ are maximally correlated. Then $H(X_i) = 1$ and $I(X_1 : X_i) = 1$. Therefore we have

hence the inequality (45) is violated.

Note that this inequality yields nontrivial constraints for the entropies of random variables resulting from a GBN of any shape $G_{n,m}$, $n > m$, as it can be applied to any $m + 1$ of the $n$ variables.
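The entropic quantities quoted for the maximally correlated example can be reproduced directly. The sketch below builds $m + 1$ perfect copies of an unbiased coin and evaluates $H(X_1)$ and the mutual informations $I(X_1 : X_i)$. The function names and the choice $m = 3$ are illustrative assumptions, and the sum over $i$ is computed on the assumption that the left-hand side of the inequality is a sum of these pairwise terms.

```python
from math import log2


def entropy(dist, idx):
    """Shannon entropy in bits of the marginal on the positions in idx."""
    marg = {}
    for outcome, prob in dist.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + prob
    return -sum(q * log2(q) for q in marg.values() if q > 0)


def mutual_information(dist, i, j):
    """I(X_i : X_j) = H(X_i) + H(X_j) - H(X_i, X_j)."""
    return entropy(dist, (i,)) + entropy(dist, (j,)) - entropy(dist, (i, j))


def copied_coin(k):
    """k perfectly correlated copies of one unbiased coin."""
    return {(x,) * k: 0.5 for x in (0, 1)}


m = 3                                   # illustrative choice
dist = copied_coin(m + 1)
single = entropy(dist, (0,))            # H(X_1)
pairwise = [mutual_information(dist, 0, i) for i in range(1, m + 1)]
total = sum(pairwise)                   # grows linearly with m
print(single, pairwise, total)
```

Each pairwise term saturates the trivial bound $I(X : Y) \le H(X)$, so the sum equals $m$ bits while $H(X_1)$ stays at one bit.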

FIG. 1. (a) An example of a distributed architecture involving bipartite entangled states. Each of the underlying quantum states can connect at most two of the observable variables, which implies a non-trivial monogamy of correlations as captured in (9). (b) The quantum causal structure associated with the information causality principle.

If a system has trivial input or trivial output we omit the edge and represent the node by a half-moon shape. Tests with trivial input are called preparations, tests with trivial output are called measurements. If we do not need to talk about the systems we omit the labels of the edges, and preparation tests are given Greek letter labels, as they have, without loss of generality, only a single event.

TABLE I. Extremal rays of the marginal triangle scenario. Columns: type, # of permutations, $H_C$, $H_B$, $H_{BC}$, $H_A$, $H_{AC}$, $H_{AB}$, $H_{ABC}$. The four kinds of extremal rays defining the marginal entropic cone of the triangle scenario.
FIG. 5. An example of the modified GBN from Lemma 4 for $n = 3$ and $i = 2$.