A quantum causal discovery algorithm

Finding a causal model for a set of classical variables is now a well-established task---but what about the quantum equivalent? Even the notion of a quantum causal model is controversial. Here, we present a causal discovery algorithm for quantum systems. The input to the algorithm is a process matrix describing correlations between quantum events. Its output consists of different levels of information about the underlying causal model. Our algorithm determines whether the process is causally ordered by grouping the events into causally-ordered non-signaling sets. It detects if all relevant common causes are included in the process, which we label Markovian, or alternatively if some causal relations are mediated through some external memory. For a Markovian process, it outputs a causal model, namely the causal relations and the corresponding mechanisms, represented as quantum states and channels. Our algorithm provides a first step towards more general methods for quantum causal discovery.


I. INTRODUCTION
The discovery of causal relations is a basic and universal task across all scientific disciplines. The very nature of causal relations, however, has been a long-standing subject of controversy, the central question being what, if anything, distinguishes causation from correlation.
It is only recently that a rigorous framework for causal discovery has been developed [1,2]. Its core ingredients are causal mechanisms that are responsible for correlations between observed events, with the possibility of external interventions on the events. The possibility of interventions is what provides an empirically well-defined notion of causation, distinct from correlation: an event A is a cause for an event B if an intervention on A results in a change in the observed statistics of B. A causal model is typically defined as a set of direct-cause relations and a quantitative description of the corresponding causal mechanisms. The causal relations are represented as arrows in a graph and the causal mechanisms are usually described in terms of transition probabilities (Figure 1).
Among the most important achievements of causal models is the development of algorithms for causal discovery. The objective of such algorithms is to infer a causal model based on observational and interventional data. Such algorithms have found countless applications and constitute one of the backbones in the rising field of machine learning.
It is a natural question whether similar algorithms can be developed for quantum systems. In simple quantum experiments, causal relations are typically known and well under control. However, the fast growth of quantum technology is leading to networks of increasing size and complexity. Hence, appropriate tools to recover causal relations might become necessary for the functioning of large, distributed quantum networks, as is already the case for classical ones [3]. Causal discovery might further detect the presence of "hidden common causes", namely external sources of correlations that might introduce systematic errors. Finally, from a foundational perspective, the possibility of discovering causal relations from empirical data opens a route to recovering causal structure from more fundamental primitives.
Classical causal discovery algorithms, however, fail to discover causal relations in quantum experiments [4]. A considerable effort has recently been devoted to resolving this tension and transferring causal modeling tools to the quantum domain [5][6][7][8][9][10][11][12][13][14], leading to the formulation of a quantum causal modeling framework [15,16]. (See Refs. [17,18] for a broader philosophical context.) Here we introduce a first algorithm for the discovery of causal relations in quantum systems. The starting point of the algorithm is a description of a quantum experiment (or "process") that makes no prior assumption on the causal relations or temporal order between events [19]. Given such a description, encoded in a process matrix, the algorithm extracts different levels of causal information for the events in the experiment. It determines whether or not they are causally ordered, namely whether they can be organized in a sequence where later events cannot influence earlier ones. If a causal order exists, the algorithm finds whether all common causes are modeled as events in the process matrix, a property expressed by the condition of Markovianity, as defined in Ref. [15]. If the process is Markovian, the algorithm outputs a causal model for it: a causal structure (depicted as arrows connecting events) together with a list of quantum channels and states that generate the process.
The complexity of our algorithm scales quadratically with the number of events, although the size of the problem itself (the dimension of the process matrix) is exponential. This suggests that the algorithm can be used efficiently given an efficient encoding of the input to the code. We further comment on possible extensions of the algorithm to deal with processes that are not Markovian, not causally ordered, or that follow a different definition of Markovianity [16]. We provide the MATLAB code of the algorithm, a Mathematica code to generate process matrices of random causal structures for testing the code, and a manual with instructions [20]. Both codes require some functions by Toby Cubitt [21], which we include.

A. Process framework
We will use a formulation of quantum mechanics that can assign probabilities to quantum events with no prior knowledge of their causal relations [19]. This formulation is based on the "combs" formalism for quantum networks [22], with the main difference that the causal order between events is not assigned in advance.
In this framework, a quantum event A can be thought of as being performed by a party inside a closed laboratory (Figure 2), which is associated with an input and an output Hilbert space, H^{A_I} and H^{A_O} respectively, and is represented by a completely positive (CP) map from H^{A_I} to H^{A_O}. It was found that, for a set of parties {A^1, ..., A^n}, the joint probability of their CP maps being realized, given their instruments, is a function of their maps and of a matrix that mediates their correlations:

P(M^{A^1}, ..., M^{A^n}) = Tr[ W (M^{A^1} ⊗ ⋯ ⊗ M^{A^n})^T ].    (1)

Here M^{A^j} is the matrix representing the CP map of party A^j under a version of the Choi-Jamiolkowski (CJ) isomorphism [23,24], defined with respect to an orthonormal basis on H^{A_I}, and T denotes matrix transposition in that basis and some basis of H^{A_O}. The matrix W lives on the combined Hilbert space of all input and output systems of the parties and is called the process matrix. Equation (1) can be seen as a generalization of the Born rule, and the process matrix as a generalization of the quantum state, as it is the resource that allows calculating joint probabilities for all possible events. Just as the Born rule is the only non-contextual probability assignment for POVM measurements [25], Equation (1) is the only non-contextual probability rule for CP maps [26].
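To make Equation (1) concrete, the following pure-Python sketch evaluates the probability rule for a single party, with toy process matrix W = ρ^{A_I} ⊗ 1^{A_O} and a measure-and-prepare CP map. The CJ convention used here (M = E^T ⊗ σ for POVM element E and repreparation σ) and the A_I ⊗ A_O tensor ordering are assumptions made for this illustration; all matrices are real, so the choice of transposition basis does not affect the result.

```python
# Toy check of the generalized Born rule P = Tr[W M^T] for one party.
# CJ convention and tensor ordering A_I (x) A_O are assumptions for
# this illustration; see the text for the general n-party rule (1).

def kron(A, B):
    """Kronecker product of two square matrices (nested lists)."""
    n, m = len(A), len(B)
    return [[A[i // m][j // m] * B[i % m][j % m]
             for j in range(n * m)] for i in range(n * m)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

I2    = [[1.0, 0.0], [0.0, 1.0]]
rho   = [[1.0, 0.0], [0.0, 0.0]]   # input state |0><0| on A_I
E0    = [[1.0, 0.0], [0.0, 0.0]]   # POVM element for outcome 0
sigma = [[0.5, 0.0], [0.0, 0.5]]   # repreparation on A_O

# Single-party process matrix: W = rho^{A_I} (x) 1^{A_O}
W = kron(rho, I2)

# CJ matrix of the measure-and-prepare map (assumed convention),
# chosen so that Tr[W M^T] = Tr[rho E0] * Tr[sigma]
M = kron(transpose(E0), sigma)

P = trace(matmul(W, transpose(M)))
print(P)   # Tr[rho E0] = 1.0 for this choice
```

The probability reduces to the ordinary Born rule Tr[ρE], as expected when the process matrix encodes a single fixed input state.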
Here we are interested in situations where causal relations define a partial order, which we call causal order. We identify causal relations with the possibility of signaling: if the probability of obtaining an outcome in laboratory B can depend on the settings in laboratory A, we say that A causally precedes B, and write A ≺ B. (We write A||B if no signaling is possible and say A and B are causally independent.) The process matrices that define a causal order between the events are called causally ordered.

B. From mathematical to graphical representation
The causal structure encoded in the process matrix can be represented by a Directed Acyclic Graph (DAG). A directed graph is a pair G = (V, E), where V = {V_1, ..., V_n} is a set of vertices (or nodes) and E ⊂ V × V is a set of ordered pairs of vertices, representing directed edges (Figure 3). A directed path is a sequence of directed edges where, for each edge, the second vertex is the first one in the next edge. A directed cycle is a directed path that ends up in a vertex already used by the path. A DAG is a directed graph with no directed cycles. We refer to edges as causal arrows.

Following Ref. [15], we define a quantum causal model by associating a specific type of process matrix to a DAG. We associate a party, with input and output spaces, to each node of the DAG. If the node has more than one outgoing arrow, the output space is composed of subsystems, with one subsystem for each arrow. We refer to them as output subsystems. We define the parent space Γ_A of a node A as the tensor product of all output subsystems associated with an arrow ending in A. A Markov quantum causal model is then defined by a collection of quantum channels, one for each node A, connecting the parent space of A to its input space.

Now let us see what a process matrix whose causal structure is represented by a DAG looks like. It is a tensor product of three types of factors: input states for the parties with no incoming arrow in the DAG, channels connecting the input system of each remaining party with its parent space, and the identity matrix 1 for the output systems of the parties with no outgoing arrows in the DAG. For example, if {F^1, F^2, ..., M^1, M^2, ..., L^1, L^2, ...} is a set of parties, where F, M and L label the three sets of parties described above (first, middle, and last, respectively), then their process matrix is

W = ⊗_F ρ^{F_I} ⊗ ⊗_{A∉F} T^{Γ_A A_I} ⊗ ⊗_L 1^{L_O},    (2)

where the middle product runs over all parties A with incoming arrows (the M's and the L's), and T^{Γ_A A_I} is the CJ representation of the channel from the parent space of A to its input. This isomorphism is the same as the one used to describe the CP maps of the parties, but without transposition.
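The graph-theoretic definitions above translate directly into code. The following is a minimal sketch (independent of the provided MATLAB code) that tests the defining property of a DAG, acyclicity, using Kahn's algorithm: a directed graph is a DAG iff repeatedly removing vertices of in-degree zero exhausts all vertices.

```python
from collections import deque

def is_dag(vertices, edges):
    """Return True iff the directed graph (vertices, edges) has no
    directed cycle, i.e. it is a DAG (Kahn's algorithm)."""
    indeg = {v: 0 for v in vertices}
    out = {v: [] for v in vertices}
    for a, b in edges:            # a causal arrow a -> b
        out[a].append(b)
        indeg[b] += 1
    queue = deque(v for v in vertices if indeg[v] == 0)
    removed = 0
    while queue:
        v = queue.popleft()
        removed += 1
        for w in out[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    # A directed cycle leaves some vertices with nonzero in-degree.
    return removed == len(indeg)

print(is_dag("ABC", [("A", "B"), ("A", "C"), ("B", "C")]))  # True
print(is_dag("ABC", [("A", "B"), ("B", "C"), ("C", "A")]))  # False
```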
A representation of Markovian processes as in Equation (2) is also employed in the study of open quantum systems [27]. The above condition for the causal structure of the process matrix to be described by a DAG is a quantum generalisation of the Markov condition for classical variables, and so it can be called the quantum Markov condition [15]. (We will comment below on a slightly different possible definition [16].) Such a process matrix is also causally ordered, with a partial order defined by the DAG. However, the class of causally ordered process matrices is strictly broader than that of Markovian ones; causally ordered processes are represented by quantum combs [22].

III. QUANTUM CAUSAL DISCOVERY
The input to the code is a process matrix, which can be obtained from experimental data. The procedure is similar to quantum state tomography: one can reconstruct the process matrix from the probabilities arising from informationally complete instruments [15].

A. The linear constraints
A process matrix of the form of Equation (2) satisfies a set of linear constraints. This set identifies a DAG; in fact, each constraint corresponds to a particular element of the DAG. There are two types of constraints.
Open output: A party A has an open output when the process matrix W contains an identity matrix on the corresponding output system A_O. This translates to the following linear constraint:

(1^{A_O}/d_{A_O}) ⊗ Tr_{A_O} W = W.    (3)

When this condition is satisfied, party A cannot signal to any other party and is considered last. In the case where the output system of the party is decomposed into subsystems A_{O_i}, i = 1, ..., n, each corresponding to an outgoing arrow, the corresponding identity matrix in the process matrix lives on the Hilbert space of that output subsystem A_{O_i}. We also call this subsystem open, and the linear constraint is

(1^{A_{O_i}}/d_{A_{O_i}}) ⊗ Tr_{A_{O_i}} W = W.    (4)

Channel: A quantum channel between the input of a party A and its parent space Γ_A is represented by a factor T^{Γ_A A_I} in the process matrix, as we have already mentioned. It is a positive matrix that lives on the tensor product of the Hilbert spaces of the output and input systems involved, and has the property that upon tracing out the output of the channel (the input of A) what remains is the identity on its input (the space of output systems Γ_A):

Tr_{A_I} T^{Γ_A A_I} = 1^{Γ_A}.    (5)

This property is necessary and sufficient for the channel to be trace preserving, and we use it to discover channels in the process matrix: we trace out the input of A, A_I, and check whether in the remaining process matrix there is now (and not before) an identity on the output system of a given party, say B:

(1^{B_O}/d_{B_O}) ⊗ Tr_{B_O A_I} W = Tr_{A_I} W,  while  (1^{B_O}/d_{B_O}) ⊗ Tr_{B_O} W ≠ W.    (6)

This is the linear constraint that a process matrix satisfies when there is a channel from the output of B to the input of A.
If the output of party B is decomposed into subsystems, we use the above constraint for each subsystem separately, replacing B_O with each output subsystem B_{O_i}:

(1^{B_{O_i}}/d_{B_{O_i}}) ⊗ Tr_{B_{O_i} A_I} W = Tr_{A_I} W,  while  (1^{B_{O_i}}/d_{B_{O_i}}) ⊗ Tr_{B_{O_i}} W ≠ W.    (7)

The maximal set of output systems and subsystems for which this condition holds is the parent space of A, Γ_A.
In the concrete implementation of the algorithm, the above equalities are tested up to some precision defined by a small number ε, which can be adjusted depending on the working precision. This accounts for the numerical rounding of irrational numbers, which leads to small errors. Naturally, this number can also be adjusted to account for experimental inaccuracies.
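As an illustration of constraint (3) together with the ε tolerance just discussed, the sketch below checks for an open output in a single-party, two-qubit example. The tensor ordering A_I ⊗ A_O and all helper names are assumptions made for this sketch, not the paper's actual implementation.

```python
EPS = 1e-9  # numerical tolerance, the epsilon of the text

def kron(A, B):
    n, m = len(A), len(B)
    return [[A[i // m][j // m] * B[i % m][j % m]
             for j in range(n * m)] for i in range(n * m)]

def ptrace_last(W, d):
    """Partial trace over the last tensor factor of dimension d."""
    n = len(W) // d
    return [[sum(W[i * d + k][j * d + k] for k in range(d))
             for j in range(n)] for i in range(n)]

def scale(A, c):
    return [[c * x for x in row] for row in A]

def close(A, B, eps=EPS):
    """Element-wise comparison up to the tolerance eps."""
    return all(abs(a - b) <= eps
               for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def has_open_output(W, d_out):
    """Check constraint (3): (1/d_out) 1^{A_O} (x) Tr_{A_O} W == W,
    assuming ordering A_I (x) A_O with A_O as the last factor."""
    I = [[1.0 if i == j else 0.0 for j in range(d_out)]
         for i in range(d_out)]
    lhs = kron(scale(ptrace_last(W, d_out), 1.0 / d_out), I)
    return close(lhs, W)

rho = [[0.5, 0.5], [0.5, 0.5]]        # qubit state |+><+| on A_I
I2  = [[1.0, 0.0], [0.0, 1.0]]
W_open   = kron(rho, I2)              # W = rho^{A_I} (x) 1^{A_O}
W_closed = kron(I2, rho)              # identity on the input instead
print(has_open_output(W_open, 2))     # True
print(has_open_output(W_closed, 2))   # False
```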

B. The code
The causal discovery code subjects the process matrix to the above types of linear constraints; the set of constraints that are satisfied defines the DAG.
The code takes as input: the number of parties, the dimension of each input system, output system, output subsystem, and the process matrix.
Briefly, the procedure of causal discovery goes as follows: first the code identifies and traces out any open output subsystems; then it determines whether the process matrix is causally ordered; if it is, it outputs a possible causal order and proceeds to determine whether the process is Markovian. For a Markovian process, it outputs the DAG and the mechanisms it represents. Below we expand on these three stages.

b. Checking if W is causally ordered: We call a set of parties a non-signaling set if the parties are causally independent, namely they cannot signal to each other. A non-signaling set is maximal if it is not a proper subset of another non-signaling set. The first output of the algorithm is the collection of all maximal non-signaling sets and their causal order. This is done through the linear constraint that detects open output systems, Equation (3). The set of parties whose output systems satisfy the constraint is labeled the last set. Note that the constraint has to be satisfied by the whole output system, not only by some subsystems. To determine the next set, the second last, the code traces out the last set from the process matrix and, using the same constraint, identifies the new last set, and so on. Note that the partition into maximal non-signaling sets does not uniquely identify the partial order of the parties, in the sense that it is not guaranteed that parties in different non-signaling sets can signal to each other. What is guaranteed is that at least one party from a set X can signal to at least one party in a succeeding set Y (Figure 4). Note also that the partition into maximal non-signaling sets is not unique, much like a foliation of space-time into space-like hypersurfaces. The process matrix is causally ordered if and only if the algorithm succeeds in grouping all parties into maximal non-signaling sets.
This is because, given the non-signaling sets, we can define a total order among the parties by adding arbitrary order relations among members of each set. For example, we can order the parties in different time steps where, when A ≺ B, A occurs at a time before B and, when A||B, we pick an arbitrary time ordering (Figure 5). With the parties ordered in this way, the process matrix satisfies the condition defining a quantum comb [22]. This is a recursive version of Equation (3), which holds for the output of each system after all systems that come after it are traced out. A central result in the theory of quantum networks is that, whenever this condition holds, the corresponding process can be realized as a channel with memory [22,28,29]. Thus, this part of the algorithm determines whether the input process matrix has a physical realisation as a causally ordered process.

(Figure 5: given the non-signaling sets of Figure 4, we can order all events in time, obtaining a total order of the parties, by putting an arbitrary order between parties in the same non-signaling set.)
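At the level of the signaling relations themselves, the peeling procedure of stage b can be sketched as follows. Here the signaling relation is assumed to be given directly as a set of ordered pairs, whereas the actual code extracts it from the process matrix via constraint (3).

```python
def non_signaling_layers(parties, signaling):
    """Group parties into causally ordered maximal non-signaling sets
    by repeatedly peeling off the current 'last' set (the parties with
    no outgoing signaling), mirroring how the code repeatedly applies
    constraint (3) and traces out the last set.  `signaling` is a set
    of pairs (a, b) meaning a can signal to b."""
    remaining = set(parties)
    layers = []
    while remaining:
        last = {p for p in remaining
                if not any((p, q) in signaling for q in remaining)}
        if not last:          # every remaining party signals onward:
            return None       # a causal loop, not causally ordered
        layers.append(sorted(last))
        remaining -= last
    layers.reverse()          # earliest set first
    return layers

# A -> B -> D and A -> C -> D: B and C are causally independent (B||C)
print(non_signaling_layers("ABCD",
      {("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")}))
# [['A'], ['B', 'C'], ['D']]
print(non_signaling_layers("AB", {("A", "B"), ("B", "A")}))  # None
```

The `None` branch corresponds to the process not being causally ordered: no set of parties qualifies as "last".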
c. Causal discovery and Markovianity: After the algorithm has traced out all open output subsystems and has established the maximal non-signaling sets of the parties, it is time to determine the DAG. The algorithm checks all possible causal arrows between pairs consisting of the input system of one party and the output system of another, using Equation (6). If a party's output system is divided into subsystems, each subsystem is checked using the linear constraint in Equation (7). Every time the constraint is satisfied, an arrow is associated with the corresponding systems, and the output system or subsystem is marked as used and is not checked again. The collection of all output systems and subsystems that satisfy the constraints for a single input system of a party A uniquely identifies the parent space of A, Γ_A. Figure 6 shows the information input to the code and the output information obtained during the three stages described above.
At this stage, the code outputs the DAG if the process is Markovian, namely if the process matrix is of the form of Equation (2). To determine this, the code constructs a test-matrix that is Markovian with respect to the found DAG: it contains all (and only) the factors of Equation (2) that correspond to the elements of the DAG. There are three kinds of such elements: first parties, causal arrows, and last parties; the corresponding terms in the process matrix are input states for the first parties, channels living on the input and output systems and subsystems of the associated parties, and identity matrices on the output systems of the last parties, respectively. To construct the test process matrix, these factors are extracted from the original process matrix by tracing out all systems except the desired ones. If the process is Markovian, then the test-matrix will be equal to the original process matrix that was input to the code.
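In its simplest instance, two tensor factors, the test-matrix construction amounts to rebuilding W from its marginals and comparing with the original. The sketch below assumes that two-factor case (the actual code handles arbitrarily many factors and systems); all helper names are illustrative.

```python
def kron(A, B):
    n, m = len(A), len(B)
    return [[A[i // m][j // m] * B[i % m][j % m]
             for j in range(n * m)] for i in range(n * m)]

def ptrace_first(W, d):
    """Trace out the first tensor factor (dimension d)."""
    n = len(W) // d
    return [[sum(W[k * n + i][k * n + j] for k in range(d))
             for j in range(n)] for i in range(n)]

def ptrace_last(W, d):
    """Trace out the last tensor factor (dimension d)."""
    n = len(W) // d
    return [[sum(W[i * d + k][j * d + k] for k in range(d))
             for j in range(n)] for i in range(n)]

def is_product(W, d1, eps=1e-9):
    """Two-factor instance of the Markovianity test: rebuild a
    test-matrix from the marginals and compare with W.
    If W = A (x) B, then kron(Tr_last W, Tr_first W) = Tr(W) * W."""
    d2 = len(W) // d1
    A = ptrace_last(W, d2)      # marginal on the first factor
    B = ptrace_first(W, d1)     # marginal on the second factor
    t = sum(W[i][i] for i in range(len(W)))
    T = kron(A, B)
    return all(abs(T[i][j] - t * W[i][j]) <= eps
               for i in range(len(W)) for j in range(len(W)))

P0 = [[1.0, 0.0], [0.0, 0.0]]
P1 = [[0.0, 0.0], [0.0, 1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]

W_markov = kron(P0, I2)                           # rho^{A_I} (x) 1^{A_O}
W_corr = [[0.5 * (kron(P0, P0)[i][j] + kron(P1, P1)[i][j])
           for j in range(4)] for i in range(4)]  # classically correlated

print(is_product(W_markov, 2))   # True
print(is_product(W_corr, 2))     # False: no tensor-product form
```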

C. Minimality
The code is guaranteed to give a unique and minimal DAG for a Markovian process. A process matrix is said to be Markov with respect to a DAG if every channel in the process matrix (found via Equation (6) and Equation (7)) is represented by an arrow in the DAG. However, a W can be Markov with respect to more than one DAG: some DAGs will have arrows that are allowed by the causal order but correspond to no actual channel in W. In other words, a W can be of the tensor product form (2), but with some factor of the form T^{Γ_M M_I} = 1^{Γ_M} ⊗ ρ^{M_I}, for some normalized density matrix ρ. This represents a channel that always produces the state ρ. Hence, this W is Markovian with respect to a DAG with arrows representing such channels, from Γ_M to M_I, but also with respect to a DAG without such arrows.
If every arrow in the DAG corresponds to a non-trivial channel in the process matrix, the DAG is called minimal. From another perspective, a DAG is minimal if removing any arrow from it leaves a DAG with respect to which W is no longer Markov.
The fact that the output of the code is always the minimal DAG is guaranteed by the first step of the algorithm, where the open subsystems are established and discarded. Indeed, an "extra arrow" in a nonminimal DAG would necessarily be associated with an open subsystem-an identity tensor factor in the process matrix.
Note also that, in Ref. [15], it was proven that a DAG can in principle be recovered under the additional assumption of faithfulness. Our algorithm does not require such an extra assumption, proving that causal discovery is always possible for a quantum Markov causal model.

IV. COMPLEXITY OF THE ALGORITHM
The dimension of the process matrix is given by the product of the input and output dimensions of each party. Thus, the size of the process matrix generally scales exponentially with the number of parties. This is expected, as the dimension of ordinary density matrices also scales exponentially with the number of parties.
One can however consider situations where, under appropriate assumptions and approximations, the physical scenario under consideration is described by a polynomial number of parameters. Then, the main cost of the algorithm lies in two parts: the one that establishes the non-signaling sets and the one that searches for causal arrows between parties. The first step tests condition (3) for all parties, to determine each non-signaling set, and the second step tests condition (6) (or (7)) for pairs of nodes; in both cases the number of tests required is thus quadratic in the number of parties. Therefore, given an efficient encoding of the input process matrix, the algorithm scales quadratically with the number of parties.

V. NON-MARKOVIAN PROCESSES
A Markovian process is one with a process matrix of the form of Equation (2), and is represented by a DAG. In a non-Markovian process the process matrix is not of that form, i.e. it is not a tensor product of factors representing input states for the parties with no incoming arrows, channels, and identity matrices for the output of the parties in the last set. In other words, in a non-Markovian process, these factors alone-or their representation in a DAG-cannot account for the observed correlations between the events.

A. Latent nodes
If the code outputs that the process is causally ordered but non-Markovian, it may be the case that there are extra nodes, not considered in the process, which affect the local outcomes of the nodes considered. These are called latent nodes [15].
For example, the outcomes of quantum measurements performed at some measurement stations (nodes) in a laboratory may be affected by the temperature, or another system may be leaking into one of the stations, such as stray light affecting the detection stage and causing correlated noise. If these effects produce a significant change in the data, higher than the noise tolerance of the code, the process will appear non-Markovian.
To recover a causal model by introducing latent nodes, we would need to extend the algorithm so that it adds nodes and arrows until it finds a process that is Markovian. Computationally, this task can be hard, because the code has to find the right combination of the number of nodes needed, their position in the DAG, and the exact channels around them. However, although the original process is non-Markovian, the code still outputs the causal order of the parties for a causally ordered process matrix. From that, one could make guesses about the right causal model, by introducing nodes with specific input and output systems and channels connecting them to the rest of the parties. To do this, one adds the corresponding factors to the current test-matrix W_test and runs the code using as input the updated number of parties, the dimensions of the systems, and W_test as the process matrix, to see whether the process is now Markovian.

B. Mixture of causal orders
Another possible reason why a process is non-Markovian is that it represents a probabilistic mixture of two or more Markovian processes with different causal orders, resulting in a non-causally ordered process matrix. There is a Semidefinite Program (SDP) for this problem that finds the right decomposition [32]. For instance, for a bipartite process, the SDP reads as follows.
given W
find q, W^{A≺B}, W^{B≺A}    (8)
such that W = q W^{A≺B} + (1 − q) W^{B≺A}, 0 ≤ q ≤ 1,

where W^{X≺Y} denotes a valid process matrix in which Y is last and therefore contains a factor 1^{Y_O}. In the case with more parties, one simply has to write a decomposition that includes all different causal orders for the given parties. Given the result, one can apply the causal discovery algorithm to each term in the decomposition.
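The SDP itself requires a semidefinite solver. As a toy stand-in, if the two causally ordered matrices W^{A≺B} and W^{B≺A} are already fixed, the weight q can be recovered by a closed-form Hilbert-Schmidt projection. The matrices below are small placeholders, not valid process matrices; the whole sketch is illustrative rather than the paper's method.

```python
def flat(A):
    return [x for row in A for x in row]

def find_mixing_weight(W, W1, W2, eps=1e-9):
    """Toy stand-in for the SDP (8): with candidate causally ordered
    matrices W1 = W^{A<B} and W2 = W^{B<A} already fixed, solve
    W = q W1 + (1 - q) W2 for q by projection onto the line through
    W1 and W2.  Returns q if a valid mixture exists, otherwise None."""
    w, w1, w2 = flat(W), flat(W1), flat(W2)
    diff = [a - b for a, b in zip(w1, w2)]
    norm2 = sum(d * d for d in diff)
    if norm2 == 0:  # W1 == W2: any q works iff W matches them
        return 1.0 if all(abs(a - b) <= eps for a, b in zip(w, w1)) else None
    q = sum((a - b) * d for a, b, d in zip(w, w2, diff)) / norm2
    # verify the decomposition actually holds and q is a probability
    ok = all(abs(a - (q * b + (1 - q) * c)) <= eps
             for a, b, c in zip(w, w1, w2))
    return q if ok and -eps <= q <= 1 + eps else None

# Small placeholder matrices (real process matrices are much larger).
W1 = [[1.0, 0.0], [0.0, 0.0]]
W2 = [[0.0, 0.0], [0.0, 1.0]]
W  = [[0.3, 0.0], [0.0, 0.7]]    # 0.3 * W1 + 0.7 * W2

print(find_mixing_weight(W, W1, W2))   # approximately 0.3
```

When no convex combination reproduces W, the function returns None, signaling that this explanation of non-Markovianity does not apply.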

C. Dynamical and indefinite causal order
So far, we have seen that when events have a definite causal order, they can be represented either by a fixed causal order process or by a mixture of causal orders. However, it may be the case that the process matrix represents a situation of more than two parties where the causal order of some parties depends on the operations of parties in their past. That is, a party may influence the causal order of future parties. Such a dynamical causal order was studied in Ref. [33], where a definition of causality compatible with dynamical causal order was proposed. For the tripartite case, it was found that the process matrix describing such a situation should obey certain conditions; however, similar conditions were not found for the case of arbitrarily many parties. In such cases, the notion of causal discovery is not clear: depending on some events in the past, the DAG of future ones would change. Hence the output would be different DAGs for different operations of certain parties. We do not know whether the discovery of those DAGs is possible.

D. Different definitions of Markovianity
Our algorithm relies on the definition of quantum Markov causal model of Ref. [15]. A different definition was proposed in Ref. [16], where the output systems of the parties are not assumed to factorize into subsystems in the presence of multiple outgoing arrows. In Ref. [16], arrows in the DAG are still associated with a quantum channel from the output space of the parent nodes to the input space of the child but, rather than defining a factorization of the output space into subsystems, multiple outgoing arrows are more generally associated with commuting channels. For example, in a tripartite scenario where A is a parent of both B and C, a Markovian process matrix would have the form

W = ρ^{A_I} ⊗ T^{A_O B_I} T^{A_O C_I} ⊗ 1^{B_O C_O},  with [T^{A_O B_I}, T^{A_O C_I}] = 0.    (9)

Thus, according to Ref. [16], a Markovian process matrix does not need to be a tensor product but can more generally be a product of commuting matrices. To distinguish the two definitions, we will call tensor-Markovian and commuting-Markovian a process matrix that satisfies the condition of Ref. [15] (used in our code) and of Ref. [16], respectively. Note that all tensor-Markovian processes are commuting-Markovian, but the converse is not true (see footnote 3).
Our algorithm could be adapted to discover the causal structure of commuting-Markovian processes. Note that the strategy used in our code, detecting the parent space of each node by checking Equation (5), would not work: indeed, tracing out B_I from matrix (9) does not result in a matrix with identity on A_O. A possible approach could be to instead detect all the children of each node A, namely all the nodes with an incoming arrow departing from A.

Footnote 3: In Ref. [16] it is further assumed that the input and output spaces of each node are isomorphic. Thus, strictly speaking, not all tensor-Markovian processes considered here satisfy the definition of Ref. [16], but only those with input and output of equal dimension. This difference is of little consequence from the point of view of a causal discovery algorithm, since in any case the dimension of each space has to be specified as input to the code.
The children are then identified as the smallest subset of parties {C^1, ..., C^k} such that

(1^{A_O}/d_{A_O}) ⊗ Tr_{A_O C^1_I ⋯ C^k_I} W = Tr_{C^1_I ⋯ C^k_I} W.    (10)

As this condition must be checked for subsets of parties, the number of tests is exponential in the number of parties in the worst case. In contrast, we have seen that a quadratic number of tests is sufficient to discover a tensor-Markovian causal structure. Another potential complication is that our test for Markovianity relies on the tensor-product form of the process matrices; it is not clear whether there is a simple way to test whether a process is commuting-Markovian. An alternative approach is to retain the definition of tensor-Markovian processes and model commuting-Markovian processes as non-Markovian ones.
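The subset search just described can be sketched as follows, with a hypothetical predicate standing in for the identity-on-A_O matrix condition. Enumerating subsets in order of increasing size makes the number of tests exponential in the worst case, as noted in the text.

```python
from itertools import combinations

def smallest_children_set(parties, condition_holds):
    """Find the smallest subset {C1, ..., Ck} satisfying the
    (here abstracted) identity-on-A_O condition, enumerating subsets
    in order of increasing size: up to 2^n tests in the worst case."""
    for k in range(len(parties) + 1):
        for subset in combinations(sorted(parties), k):
            if condition_holds(set(subset)):
                return set(subset)
    return None

# Hypothetical stand-in for the matrix condition: pretend it holds
# exactly when the subset contains the true children {B, C} of A.
true_children = {"B", "C"}
cond = lambda s: true_children <= s

print(sorted(smallest_children_set({"B", "C", "D"}, cond)))  # ['B', 'C']
```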
Indeed, since a commuting-Markovian process is causally ordered, it can always be recovered from a tensor-Markovian one by adding an appropriate number of latent nodes [15]. An extension of our code to detect latent nodes could thus be used to detect the causal structure of a commuting-Markovian process. In Figure 7 we show an example of a DAG of a commuting-Markovian process (left) and how it would be represented as a tensor-Markovian process with a latent node (right).

(Figure 7: a process that is commuting-Markovian according to Ref. [16], e.g. for the DAG on the left, is generally described by a DAG with a latent node (filled node in the DAG on the right) according to the definition of Markovianity of Ref. [15], on which our algorithm is based.)

VI. CONCLUSIONS
We have presented an algorithm (whose implementation we provide) that can discover an initially unknown causal structure in a quantum network. The first of its type, it is an important proof of principle: it shows that causal structure has a precise empirical meaning in quantum mechanics. Just as other physical properties, it can be unknown and discovered. This is of particular significance for foundational approaches where causal structure is seen as emergent from more fundamental primitives. Causal discovery provides the methodology to determine when and how causal structure emerges.
Causal discovery can also have broad applications for protocols based on large and complex quantum networks. Our algorithm is guaranteed to find a minimal causal model for any Markovian process, namely a process in which all causally relevant events are under experimental control, with no extra assumptions; this improves on the results of Ref. [15], where the additional condition of faithfulness was invoked. Even for non-Markovian processes, the algorithm still recovers important causal information, namely a causal order of the events.
Another important use of our algorithm is to tackle the difficult problem of non-Markovianity. An extensive body of research is currently devoted to the problem of detecting non-Markovianity [34]. Our algorithm provides a concrete solution: it allows discovering when some external memory is affecting the correlations in the observed system. Detecting non-Markovianity can also have important practical applications for large quantum networks: the presence of "latent nodes" can indicate a possible source of systematic correlated noise in a process, which might affect the working of a quantum protocol. It can further have applications in cryptography, for detecting the presence of an eavesdropper.
Finally, our algorithm has promising possible extensions. A natural extension is an algorithm that can make "good guesses" for causal structure in the presence of latent nodes. Promising is also the extension of causal discovery to mixtures of causal order, dynamical and indefinite causal structure.