Quantum causal unravelling

Complex processes often arise from sequences of simpler interactions involving a few particles at a time. These interactions, however, may not be directly accessible to experiments. Here we develop the first efficient method for unravelling the causal structure of the interactions in a multipartite quantum process, under the assumption that the process has bounded information loss and induces causal dependencies whose strength is above a fixed (but otherwise arbitrary) threshold. Our method is based on a quantum algorithm whose complexity scales polynomially in the total number of input/output systems, in the dimension of the systems involved in each interaction, and in the inverse of the chosen threshold for the strength of the causal dependencies. Under additional assumptions, we also provide a second algorithm that has lower complexity and requires only local state preparation and local measurements. Our algorithms can be used to identify processes that can be characterized efficiently with the technique of quantum process tomography. Similarly, they can be used to identify useful communication channels in quantum networks, and to test the internal structure of uncharacterized quantum circuits.


I. Introduction
Many processes in nature arise from sequences of basic interactions, each involving a small number of physical systems. Determining the causal structure of these interactions is important both for basic science and for engineering.
Often, however, the sequence of interactions giving rise to a process of interest may not be directly accessible to experiments. For example, scattering experiments in high energy physics can probe the relation between a set of incoming particles and a set of outgoing particles, but typically cannot access the individual events taking place

within the scattering region. In this and similar scenarios, a fundamental problem is to characterize the causal structure of the interactions by accessing only the inputs and outputs of the process of interest, while treating the intermediate steps as a black box. We call this problem, illustrated in Figure 1, the causal unravelling of an unknown physical process. Explicitly, the problem of causal unravelling is to determine whether an unknown process can be broken down into a sequence of simpler interactions, to determine the order of such interactions, and to determine which systems take part in each interaction. Causal unravelling can be viewed as a special case of the broader problem of causal discovery [1,2], namely the task of identifying the causal relations between a given set of variables. In the broad class of causal discovery problems, the distinctive features of causal unravelling are that (i) the goal is to identify a linear causal structure, corresponding to the sequence of interactions underlying the given process, and (ii) certain variables are a priori known to be 'inputs' (and therefore potential 'causes'), while other variables are a priori known to be 'outputs' (and therefore potential 'effects'). This scenario often arises in experimental physics, where the input/output structure is typically clear from the design of the experiment, as in the aforementioned example of scattering experiments. In principle, candidate answers can be extracted from a full tomographic characterization of the process under consideration. However, the complexity of process tomography grows exponentially in the number of inputs and outputs, making this approach unfeasible when the process involves a large number of systems.
In the classical domain, the problem of causal unravelling can be efficiently addressed with a variety of algorithms developed for the general problem of causal discovery [1-3]. Classical causal discovery algorithms often formulate causal relationships with graphical models and solve such models with structure learning algorithms, such as the PC algorithm (named after Peter Spirtes and Clark Glymour) [1], Greedy Equivalence Search [4], and Max-Min Hill Climbing [5]. These algorithms cover a wide variety of problems, by making different sets of assumptions on the process under consideration. Typical assumptions include causal sufficiency, meaning that no variables are hidden, and causal faithfulness, meaning that the conditional independences among the variables are precisely those associated to an underlying graph used to model the causal structure. In general, however, causal discovery is intrinsically a hard problem: when no assumption is made, the complexity of all the known algorithms becomes exponential in the worst case over all possible instances [6,7].
In the quantum domain, the problem of causal unravelling is made even more challenging by the presence of correlations that elude a classical explanation [8,9]. In recent years, the quantum extension of the notion of causal model has been addressed in a series of works [10-15], providing a solid conceptual foundation for the field of quantum causal discovery. On the algorithmic side, however, the study of quantum causal models has remained relatively underdeveloped. Specific instances of quantum causal discovery were studied in Refs. [16-18], showing that quantum resources offer appealing advantages. These examples, however, were limited to simple instances, typically involving a small number of variables and/or a small number of hypotheses on the causal structure. In more general scenarios, one approach could be to perform quantum process tomography and then to infer the causal structure from the full description of the process under consideration [19]. As in the classical case, however, the number of queries needed by full process tomography grows exponentially with the number of systems involved in the process, making this approach impractical as the size of the problem increases.

FIG. 1. Causal unravelling.
A physical process involves a set of input systems (in the figure, A1, A2, and A3) and a set of output systems (in the figure, B1, B2, and B3). The process is the result of a sequence of interactions, each of which involves a subset of the inputs and a subset of the outputs. The problem is to infer the causal structure of the interactions solely from the input-output behaviour of the process.
In this paper, we provide an efficient algorithm for unravelling the causal structure of multipartite quantum processes without resorting to full process tomography. Our algorithm is similar to the PC algorithm [1] for classical causal discovery, in that it is based on a set of tests that establish the independence relations between subsets of input and output systems. We show that the algorithm has the following features:

1. The number of independence tests needed to infer the causal structure scales polynomially with the number of inputs/outputs of the process. This feature is possible thanks to the special structure of the causal unravelling problem, where the goal is to establish a linear ordering of the interactions giving rise to the process under consideration.
2. The independence tests produce, as a byproduct, an estimate of the strength of correlation between the various inputs and outputs of the process.In Methods, we show that this estimate can be obtained by performing a number of measurements that grows polynomially with the dimension of the systems under consideration, and that the number of measurements needed to conclude independence scales polynomially with the number of systems.
3. The algorithm is exact whenever the process has bounded information loss and satisfies a form of causal faithfulness, namely that the strength of the causal relations, when present, is above a given threshold. When these assumptions are not satisfied, the algorithm produces an approximate result. The details of the approximate case are in Supplementary Note 6.
Moreover, the efficiency of our algorithm can be further boosted in special cases, including (i) the case where each input of a given interaction has a non-trivial causal influence on all the outputs of subsequent interactions, and (ii) the case where the process belongs to a special class of Markovian processes [12,19-21], in which each output depends only on one previous input and each input affects only one later output. We study these cases in the Results, where we devise an alternative algorithm that only requires local state preparations and local measurements. The number of queries to the process is only logarithmic in the number of input and output wires, thanks to a method, described in the Methods, that efficiently determines the correlations between input-output pairs.
The remaining part of this paper is structured as follows. In the Results, we first formulate the quantum causal unravelling problem. In the second subsection, we give the main body of the efficient causal unravelling algorithm, discuss its assumptions, and analyze its efficiency. The third subsection of the Results describes the alternative algorithm designed for special cases. At the end of the Results, we briefly discuss a generalization of our algorithm. Future work, interpretations, and applications of the causal unravelling algorithms are addressed in the Discussion. The Methods section contains the detailed implementation of the independence tests used by the algorithms in the Results.

II. Results
Problem formulation. Let us start by giving a precise formulation of the problem of quantum causal unravelling. In this problem, an experimenter is given access to a multipartite quantum process, with inputs labelled as A_1, ..., A_{n_in}, and outputs labelled as B_1, ..., B_{n_out}. Note that, in general, different labels may refer to the same physical system: for example, system A_1 could be a single photon with a given frequency, entering the interaction region, and system B_1 could be a single photon with the same frequency, exiting the interaction region. Mathematically, the process is described by a quantum channel, that is, a completely positive trace-preserving (CPTP) linear map C transforming operators on the tensor product space H_{A_1} ⊗ ··· ⊗ H_{A_{n_in}} into operators on the tensor product space H_{B_1} ⊗ ··· ⊗ H_{B_{n_out}}. Note that, without loss of generality, one can always assume n_in = n_out = n, as this condition can be satisfied by adding a number of dummy systems with one-dimensional Hilbert space. In the following, we will denote by L(H) the set of linear operators on a generic Hilbert space H, and by S(H) the subset of density operators on H, that is, the subset of operators ρ ∈ L(H) that are positive semidefinite and have unit trace.
FIG. 2. Decomposition of a physical process into a sequence of m interactions. The inputs (outputs) of the process are divided into m non-overlapping subsets. The i-th subset of the inputs (outputs) consists of the systems that enter (exit) the interaction at the i-th step (rectangular boxes in the picture). The wires connecting one interaction to the next represent intermediate systems that at this stage are not directly accessible to experiments. For example, a photon could enter the first interaction, remain as an intermediate system between the first and second interaction, and then exit the interaction region after the second interaction.
The problem of causal unravelling is to determine whether a multipartite process can be broken down into a sequence of interactions, as in Figure 2, and, in the affirmative case, to determine which systems are involved in each interaction. Mathematically, the problem is to find a partition {P_i}_{i=1}^m of the set {A_1, ..., A_n} and a partition {Q_i}_{i=1}^m of the set {B_1, ..., B_n}, such that the multipartite process can be decomposed into a sequence of interactions, with the i-th interaction involving input systems in P_i and output systems in Q_i. Such a sequential structure matches the framework of quantum combs [22,23]. A quantum comb is a quantum process that can be broken down into a sequence of interactions C_1, ..., C_m as in Figure 2, where each interaction C_i is a CPTP map, called a tooth of the comb. Refs. [22,23] give a set of necessary and sufficient conditions for determining whether a given process conforms to a quantum comb, which is equivalent to asking whether the process admits a causal unravelling with partitions {P_i} and {Q_i}. We say a process C has a causal unravelling (P_1, Q_1), ..., (P_m, Q_m) if it can be decomposed into the form of a quantum comb with m teeth as in Figure 2.
The causal unravelling of a given multipartite process determines the possible signalling relations between inputs and outputs. With respect to this decomposition, input systems at a given time can only signal to output systems at later times. The resulting pattern can be graphically illustrated by a causal graph [1,2,14], as shown in Figure 3.
Note that the causal graph includes all the signalling relations that are in principle compatible with the structure of the interactions. However, a specific quantum process may not exhibit any signalling from a specific input to a specific output, even though the causal structure of the interactions would in principle permit it. When this occurs, the causal unravelling of a process may not be unique. For example, consider a bipartite process with inputs {A_1, A_2} and outputs {B_1, B_2}, with the property that A_1 signals only to B_1 and A_2 signals only to B_2. It is impossible to decide whether the signalling A_1 → B_1 happens before or after A_2 → B_2, since the two are causally uncorrelated. Therefore, the process admits two causal unravellings: (A_1, B_1), (A_2, B_2) and (A_2, B_2), (A_1, B_1). In this case, either of the two options is a valid solution of the causal unravelling problem.
Generally, causal discovery is a hard problem, even in the classical setting [6,7]. However, we will now show that, under a few assumptions, the more specific problem of quantum causal unravelling defined above can be solved efficiently. The first assumption is that the basic interactions appearing in the causal unravelling involve a small number of systems, independent of the number of inputs and outputs. At the fundamental level, this assumption is motivated by the fact that interactions are local, and typically involve a small number of systems. Mathematically, the assumption is that the cardinality of all the sets in the partitions {P_i}_{i=1}^m and {Q_i}_{i=1}^m is no larger than a constant c independent of n. For simplicity, we will first restrict our attention to the special case c = 1, meaning that the process can be broken down into interactions involving only one input and one output at a time. We will discuss larger c at the end of the Results and in Supplementary Note 7. Hereafter, we will denote by Comb[(A_1, B_1), ..., (A_n, B_n)] the set of quantum combs with n teeth where the i-th tooth has input A_i and output B_i.
In the basic c = 1 scenario illustrated above, the problem of causal unravelling is to find out which pair of systems is involved in the first interaction, which pair is involved in the second, and so on. Formally, our goal is to identify an ordering of the inputs and outputs, (A_{σ(1)}, B_{π(1)}), ..., (A_{σ(n)}, B_{π(n)}), with σ and π being permutations of {1, ..., n}, such that C belongs to Comb[(A_{σ(1)}, B_{π(1)}), ..., (A_{σ(n)}, B_{π(n)})]. To quantify the efficiency of our algorithms, we will focus on the sample complexity, namely the number of black-box queries to the channel C, and on the computational complexity, including additional quantum and classical computation time measured by the number of elementary quantum gates and classical operations.
Efficient quantum causal unravelling. We now provide an efficient quantum algorithm for quantum causal unravelling. The main idea of the algorithm is to recursively find the last interaction in the decomposition of a given process. To illustrate this idea, consider the example of Figure 3. When we remove B_4, the node A_4 becomes disconnected from all the other nodes, meaning that the state of system A_4 does not affect the state of the other systems. By testing this independence condition, we can in principle check whether the graph of Figure 3 is an appropriate model for the process under consideration. In general, suppose (A_x, B_y) are the input and output of the last interaction. Since A_x signals only to B_y, if we ignore B_y, then A_x is independent of the joint system formed by all systems excluding A_x and B_y. This gives the following criterion that the last interaction must satisfy:

Proposition 1. [22] The last interaction of a process C involves the input/output pair (A_x, B_y) if and only if A_x is independent of A_{≠x} B_{≠y}, the joint system containing all systems other than A_x and B_y.
More details can be found in Supplementary Note 1. By scanning the possible pairs (A_x, B_y), we can find whether one of them satisfies the above criterion and, in the affirmative case, assign that pair to the last interaction. Note that, in general, there may be more than one pair that satisfies the required condition. After one pair is found, one can reduce the problem to a smaller graph containing n − 1 inputs and n − 1 outputs. If at every step a suitable pair is found, then the final result is a valid causal unravelling of the original process. If at one step no pair can be found, the algorithm concludes that no causal unravelling with c = 1 exists for the remaining subgraph. At this point, the algorithm can continue by considering causal unravellings with higher values of c for the remaining subgraph, which is discussed at the end of the Results and detailed in Supplementary Note 7.
A description of the algorithm is provided in Algorithm 1. The algorithm runs in a recursive manner: for an n-tooth comb, it finds the last tooth of the comb, removes the tooth (feeding its input with an arbitrary state and discarding its output), and reduces the problem to finding the causal unravelling of an (n − 1)-tooth comb. We repeat the above procedure until we reach the base case n = 1, and thus obtain the order of all inputs and outputs. If the last tooth cannot be found in some iteration, it means that the current channel cannot be further decomposed, and the algorithm outputs the trivial causal unravelling (P, Q), where P (Q) is the set of all input (output) wires of the current channel, and exits. We now discuss the efficiency of this algorithm. First, we show that the number of independence tests is polynomial in n. Let T_test(n) be the number of independence tests required for an n-to-n channel. In each iteration, an independence test is performed for at most each of the n² input-output pairs (A_x, B_y), resulting in O(n²) independence tests. After finding a last tooth, the problem size is reduced to n − 1. Therefore, T_test(n) satisfies the recursion T_test(n) ≤ T_test(n − 1) + O(n²), which yields T_test(n) = O(n³). Second, we show that the independence tests can be efficiently realized. This is a non-trivial problem, because testing whether two generic systems are in a product state is computationally hard in the worst-case scenario [24].
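As a minimal illustration, the recursive peeling strategy of Algorithm 1 can be sketched as follows. Here `is_last_tooth` is a hypothetical caller-supplied oracle standing in for the circuit-level independence test of Proposition 1 described in the Methods; the function names are our own.

```python
from typing import Callable, List, Tuple

def causal_unravel(inputs: List[int], outputs: List[int],
                   is_last_tooth: Callable[[int, int, List[int], List[int]], bool]
                   ) -> List[Tuple[int, int]]:
    """Recursively peel off the last interaction of a candidate comb.

    is_last_tooth(x, y, ins, outs) stands in for the test of Proposition 1:
    it should return True iff, for the current reduced process, input A_x is
    independent of all remaining systems other than A_x and B_y.
    """
    order: List[Tuple[int, int]] = []
    ins, outs = list(inputs), list(outputs)
    while len(ins) > 1:
        found = None
        for x in ins:                      # scan the O(n^2) candidate pairs
            for y in outs:
                if is_last_tooth(x, y, ins, outs):
                    found = (x, y)
                    break
            if found is not None:
                break
        if found is None:                  # no c = 1 decomposition remains
            raise ValueError("no further decomposition with c = 1")
        ins.remove(found[0])               # remove the tooth: feed A_x with a
        outs.remove(found[1])              # fixed state and discard B_y
        order.insert(0, found)             # teeth are discovered last-to-first
    order.insert(0, (ins[0], outs[0]))
    return order                           # interactions listed first-to-last
```

Each recursion level performs at most O(n²) oracle calls, reproducing the O(n³) bound on the total number of independence tests.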
Nevertheless, in the Methods we design a quantum circuit that performs the independence tests efficiently under assumptions of bounded information loss and causal faithfulness. Our circuit converts the independence test into the estimation of the distance between quantum states, which is done by the SWAP test [25]. Now, we give the efficiency guarantee for our algorithm. We first discuss the exact case, in which our algorithm produces the exact causal unravelling of C based on two assumptions. We will use a few parameters related to the information loss introduced by the process.
An important parameter entering into the analysis is the degree of independence between systems, defined in the following. For a state ρ, we say two disjoint subsystems S and T are independent if ρ_ST = ρ_S ⊗ ρ_T, where ρ_S, ρ_T and ρ_ST are the marginal states of ρ on the subsystems S, T and the joint system ST, respectively. The degree of independence between S and T is then defined as χ_1(S;T)_ρ := ‖ρ_ST − ρ_S ⊗ ρ_T‖_1, where ‖X‖_1 := Tr√(X†X) denotes the trace norm. Clearly, χ_1(S;T)_ρ = 0 if and only if ρ_ST = ρ_S ⊗ ρ_T. More generally, the trace distance is related to the probability of distinguishing the states, and thus χ_1(S;T)_ρ measures the probability that an observer correctly decides whether the subsystems are independent or correlated. In the following, we will apply this definition to the Choi state C of the process C. With this choice, χ_1(S;T)_C satisfies the conditions for a quantum causality measure, as defined in Ref. [27].
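As a numerical illustration of the definition (not the measurement-based estimation procedure of the Methods), χ_1 can be computed directly from a density matrix; the helper names below are our own, and we assume for simplicity that every subsystem in S precedes every subsystem in T.

```python
import numpy as np

def partial_trace(rho, dims, keep):
    """Marginal of rho on the subsystems listed in `keep` (ascending order)."""
    n = len(dims)
    t = rho.reshape(tuple(dims) * 2)        # axes (s_1..s_n, s'_1..s'_n)
    cur = n
    for i in sorted(set(range(n)) - set(keep), reverse=True):
        t = np.trace(t, axis1=i, axis2=i + cur)   # trace out subsystem i
        cur -= 1
    d = int(np.prod([dims[i] for i in keep]))
    return t.reshape(d, d)

def chi1(rho, dims, S, T):
    """chi_1(S;T)_rho = || rho_ST - rho_S (x) rho_T ||_1.
    Assumes every index in S precedes every index in T."""
    diff = (partial_trace(rho, dims, sorted(S) + sorted(T))
            - np.kron(partial_trace(rho, dims, sorted(S)),
                      partial_trace(rho, dims, sorted(T))))
    # trace norm of a Hermitian matrix = sum of |eigenvalues|
    return float(np.sum(np.abs(np.linalg.eigvalsh(diff))))
```

For a two-qubit product state χ_1 vanishes, while for a maximally entangled state it equals 3/2, reflecting the strong correlation between the two halves.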
To facilitate the efficiency analysis of our algorithm, we first put the process C in a standard form where all input and output wires have the same dimension d_A. This standard form does not limit the generality of the quantum processes we investigate. For a process C with input dimensions d_{A_1}, ..., d_{A_n} and output dimensions d_{B_1}, ..., d_{B_n}, we can pick d_A = max{d_{A_1}, ..., d_{A_n}, d_{B_1}, ..., d_{B_n}} and regard each input or output wire of C as a subspace of a d_A-dimensional system, thus transforming C into a process whose wires all have dimension d_A. In our analysis, we will use the Kraus rank of the process in the standard form to characterize the information loss. We assume that the information loss is bounded, as expressed by the following assumption:

Assumption 1. The Kraus rank of C, after transforming it into the standard form, is bounded by a polynomial in n.
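The subspace embedding underlying the standard form can be sketched as follows: a wire of dimension d ≤ d_A is viewed inside a d_A-dimensional system via an isometry. This is one possible construction with our own helper name; the text does not prescribe a specific embedding.

```python
import numpy as np

def embed_state(rho, d_A):
    """Embed a d-dimensional state into a d_A-dimensional system (d <= d_A)
    via the isometry V = [I_d; 0], i.e. rho -> V rho V^dagger."""
    d = rho.shape[0]
    V = np.zeros((d_A, d), dtype=complex)
    V[:d, :] = np.eye(d)
    return V @ rho @ V.conj().T
```

The embedding preserves the trace and all expectation values of observables supported on the subspace, so it does not alter the correlations being tested.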
To ensure that the algorithm outputs the correct causal unravelling, we further require that the process satisfies a form of causal faithfulness, meaning that the strength of causal relations is either zero or above a threshold. Using χ_1 as a quantitative measure of correlation, we adopt the following assumption:

Assumption 2. There exists a number χ_min > 0 such that, for any two disjoint sets of wires S and T being tested for independence, either χ_1(S;T)_C = 0 or χ_1(S;T)_C ≥ χ_min, where C is the Choi state of the quantum process C.
The threshold χ_min determines the resolution of the independence tests. To guarantee the correctness of Algorithm 1, the independence tests must be precise enough to detect correlations above this threshold with high probability. The efficiency and correctness of Algorithm 1 are given in the following theorem, whose proof is in Supplementary Note 2:

Theorem 1. Under Assumptions 1 and 2, for any confidence parameter κ_0 > 0, Algorithm 1 satisfies the following conditions:

1. With probability 1 − κ_0, the output of Algorithm 1 is a correct causal unravelling for C.
2. The number of queries to C, denoted T_sample, is polynomial in n, d_A, r_C, and 1/χ_min (the explicit bound is given in Supplementary Note 2), where r_C is the Kraus rank of C in the standard form with all wires having dimension d_A.
3. The computational complexity is of order O(T_sample · n log d_A).
Theorem 1 guarantees that, under appropriate assumptions, the sample complexity of our algorithm is polynomial in the number of input and output systems of the process under consideration. This feature is in stark contrast with the exponential complexity of full process tomography. As a consequence, our algorithm offers a speedup over other algorithms, such as the one proposed in Ref. [19], which require process tomography as an intermediate step.
Assumptions 1 and 2 guarantee that Algorithm 1 produces an exact causal unravelling. However, both assumptions can be lifted if we only require an approximate causal unravelling, meaning that the process C is within a certain error of another process compatible with the causal unravelling output by the algorithm. In Supplementary Note 6, we formulate this approximate case and prove that the error is small under the condition that every marginal Choi state of C, obtained by taking only the first k inputs and k − 1 outputs in the causal unravelling of C, has polynomial rank up to a small error. This condition can be verified efficiently during the execution of Algorithm 1.
Causal unravelling with local observations.We now show that, under some assumptions on the input-output relations, one can design algorithms that have much lower sample complexity in terms of n compared to Algorithm 1, and are more experimentally friendly, in that they require only local state preparation and local measurements.
In the Methods, we show an efficient algorithm to detect the pairwise correlations between input and output wires of C with local state preparation and local measurements. The algorithm computes a Boolean matrix ind_ij such that, with high probability, for every i and j, A_i and B_j are approximately independent whenever ind_ij = true, and are correlated whenever ind_ij = false. Under suitable assumptions, this Boolean matrix is sufficient to give the exact causal unravelling. The first case is given by the following assumption on the process C:

Assumption 3. C has a causal unravelling (A_{σ(1)}, B_{π(1)}), ..., (A_{σ(n)}, B_{π(n)}), and there exists a constant χ_min > 0 such that, for any pair of input and output wires A_{σ(i)} and B_{π(j)} with i ≤ j, χ_1(A_{σ(i)}; B_{π(j)})_C ≥ χ_min.

Assumption 3 indicates a non-trivial correlation between any pair consisting of an input system and an output system, with the property that the input system appears before the output system in the overall causal order. In other words, this assumption defines a total order of the input (output) wires, and ensures a unique causal unravelling with which C is compatible.
Under this assumption, if A_i is the k-th input, namely i = σ(k), then A_i is correlated with n − k + 1 output wires, namely every output B_{π(j)} with j ≥ k, and is independent of the other outputs. In other words, if we find that A_i is correlated with exactly c_A(i) output wires, it must be the (n − c_A(i) + 1)-th input. With this, the order of the input wires can be exactly determined, and a similar argument can be applied to order the output wires.
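The counting argument above can be sketched directly on the Boolean matrix ind_ij (a hypothetical data layout of our own: `ind[i][j]` is False when A_i and B_j are found to be correlated):

```python
def order_inputs(ind):
    """Recover the input order sigma from the independence matrix,
    assuming Assumption 3 holds. If A_i is the k-th input, it is
    correlated with exactly n - k + 1 outputs, so its zero-based
    position is n - c_A(i)."""
    n = len(ind)
    sigma = [None] * n
    for i in range(n):
        c = sum(1 for j in range(n) if not ind[i][j])   # c_A(i)
        sigma[n - c] = i
    return sigma            # sigma[k] = label of the (k+1)-th input
```

The same count over the columns of ind_ij orders the output wires.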
These observations lead to the following theorem:

Theorem 2. For a quantum process C satisfying Assumption 3, there is an algorithm that outputs a causal unravelling satisfying the following conditions:

1. With probability 1 − κ, the causal unravelling is correct.

2. The algorithm uses only local state preparations and local measurements, and its number of queries to C, denoted N, grows only logarithmically with n (the explicit bound, which also involves the wire dimensions and χ_min, is given in the Supplementary Material).

3. The computational complexity is polynomial in N and n.

Note that the sample complexity of this algorithm grows only logarithmically with n.
A similar idea can be adopted in the case where the process belongs to a special class of Markovian processes [12,19-21], in which each output depends only on one previous input and each input affects only one later output. This means that the process is decomposable into a tensor product of n channels, each with one input and one output. In our problem, the order of inputs and outputs is unknown, and we adopt the following assumption:

Assumption 4. The process C can be written as a tensor product of n single-input, single-output channels, with the i-th channel having input A_i and output B_{π(i)}, for some permutation π.
Since each output is related to at most one input and each input affects at most one output, after obtaining ind_ij, we can obtain the causal unravelling by matching each input-output pair (A_i, B_j) with ind_ij = false.
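The matching step for this Markovian case can be sketched as follows (our own hypothetical helper; as above, `ind[i][j]` is False when A_i and B_j are found to be correlated):

```python
def match_pairs(ind):
    """Pair each input with its unique correlated output, assuming each
    input is correlated with at most one output (Assumption 4)."""
    n = len(ind)
    pi = {}
    for i in range(n):
        partners = [j for j in range(n) if not ind[i][j]]
        if partners:            # at most one partner under Assumption 4
            pi[i] = partners[0]
    return pi                   # inputs absent from pi signal to no output
```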
If there exists a threshold χ_min > 0 such that either χ_1(A_i; B_j)_C = 0 or χ_1(A_i; B_j)_C ≥ χ_min holds for every A_i and B_j, then the algorithm has the same complexity as in Theorem 2, which is logarithmic in n. However, in case a threshold χ_min is not known, we can still show that the algorithm is efficient, although it produces an approximate answer with an error bound defined by the diamond norm [28]. The diamond norm, also known as the completely bounded trace norm, is a distance measure between channels C and D, defined as ‖C − D‖_⋄ := max_ρ ‖((C − D) ⊗ I)(ρ)‖_1, where the maximization is over all states ρ of the input system together with an arbitrary reference system.

Theorem 3. For a quantum process C satisfying Assumption 4, there is an algorithm that outputs a causal unravelling (A_1, B_{π(1)}), ..., (A_n, B_{π(n)}) satisfying the following conditions:

1. With probability 1 − κ, the causal unravelling is approximately correct, in the sense that C is within a bounded diamond-norm distance of a process compatible with it.

2. The algorithm uses only local state preparations and local measurements, and its number of queries to C grows only logarithmically with n, with a bound involving d_A := max_i d_{A_i} and d_B := max_j d_{B_j}.
3. The computational complexity is polynomial in the number of queries and in n.

Causal unravelling with interactions between more inputs and outputs. In the algorithms shown so far, we assumed that the process under consideration admits a causal unravelling where each interaction involves exactly one input and one output. More generally, Algorithm 1 can be easily extended to the scenario where each interaction involves at most c inputs and c outputs of the original process. Instead of considering each wire separately, the idea is to consider subsets of at most c input (output) wires and perform independence tests on these subsets.
In Algorithm 1, one enumerates an input-output pair (A_x, B_y) and checks whether it is the last tooth by performing an independence test between A_x and A_{≠x} B_{≠y}. To deal with larger c, we replace this procedure by enumerating subsets of at most c input wires and at most c output wires, and testing whether they can form the last interaction. This process is done recursively until one reaches the base case. We give the detailed algorithm and analysis in Supplementary Note 7. For constant c, under some assumptions, the complexity of the algorithm is still polynomial in n and d_A, although the exponent depends on c.

III. Discussion
In this paper we developed an efficient algorithm for discovering linear causal structures between the inputs and outputs of a multipartite quantum process. Our algorithm provides a partial solution to the more general quantum causal discovery problem, whose goal is to produce a full causal graph describing arbitrary causal correlations in an arbitrary set of quantum variables. Our algorithm can be used as the first step of quantum causal discovery; to obtain the full causal structure, additional tests may be adopted to detect signalling between further subsets of input and output wires. Since the most general quantum causal discovery problem is intrinsically hard, an interesting direction for future work is to examine to what extent the problem of quantum causal discovery can be solved by an efficient algorithm in scenarios beyond the linear structure analyzed in this work.
The efficiency of our algorithms relies on some assumptions. For Algorithm 1, the low-rank assumption is key both in the exact case (Assumption 1) and in the approximate case discussed in Supplementary Note 6. This assumption avoids the computational difficulty of deciding whether a completely general state is a product state [24]. Physically, a quantum process has low rank if the number of uncontrolled particles entering and/or exiting the interaction region is small. In this picture, the uncontrolled particles in the input can be regarded as sources of environmental noise, and the uncontrolled particles in the output are responsible for information loss in the process. Intuitively, without the low-rank assumption, the causal correlations will be obscured by the noise, and it will be hard to discover them without additional prior knowledge. On the other hand, if one has prior knowledge, as in the case of processes satisfying Assumptions 3 and 4, the low-rank assumption may be lifted.
The ability to infer the underlying causal structure of a process is useful for a variety of applications. Classically, discovering causal relationships is the goal of many research areas, with numerous applications in the social and biomedical sciences [1], such as the construction of gene expression networks [29]. Causal discovery allows us to understand complex systems whose internal structures are not directly accessible, and to discover possible models for the internal mechanisms. Quantum causal discovery, likewise, enables the modelling of quantum physical processes with inaccessible internal structure, for example, discovering the individual interactions in scattering experiments. Below we list some specific examples where our causal unravelling algorithms can be applied to the detection of correlations and the modelling of the internal structure of complex processes.
First, causal relations among quantum variables are relevant to the study of quantum networks [30-32], where the presence of a causal relation between two systems can be used to test whether it is possible to send signals from one node to another. In a realistic setting, the signalling patterns within a quantum network may change dynamically, depending on the number of users of the network at a given moment of time, on the way the messages are routed from the senders to the receivers, and also on changes in the environment, which may affect the availability of transmission paths between nodes. Such a dynamical structure occurs frequently in classical wireless networks [33,34], and is likely to arise in a future quantum internet. In this context, our algorithms provide an efficient way to detect dynamical changes in the availability of data transmission paths.
The detection of causal relations is also relevant to the verification of quantum devices, as it can be used as an initial test to determine whether a given quantum device generates input-output correlations with a desired causal structure. Such a test could serve as an initial screening to rule out devices that are not suitable for a given task, and could be followed by more refined quantum benchmarks [35] which quantify how well the device performs a desired task. In this context, the benefit of the causal unravelling test is that it could save the effort of performing more refined tests in case the process under consideration does not comply with the desired causal structure.
Finally, our causal unravelling algorithm can be used as a preliminary step to full process tomography. By detecting the causal structure of multipartite quantum processes, one can sometimes design a tailor-made tomography scheme that ignores unnecessary correlations, and achieves full process tomography without requiring an exponentially large number of measurement setups. For example, a process that admits a causal unravelling with systems of bounded dimension at every step can be efficiently represented by a tensor network state [36,37], for which tomography can be performed efficiently [38]. In a quantum communication network, efficient tomography of the transmission paths is crucial for the design of encoding, decoding and calibration schemes for more efficient data transmission. In physics experiments, the causal structure and tomography data are useful for modelling the underlying physical process, for example, by finding the smallest quantum model that reproduces the observed data [39-41]. More generally, characterizing the causal structure of a multipartite process as a tensor network enables the use of efficient protocols that exploit the tensor network structure, including simulation protocols [37,42,43] and compression protocols [44].

IV. Methods
Efficient tests for the last tooth via the SWAP test. In the Results, we have given the framework of Algorithm 1. In this section, we discuss how the tests for the last tooth, namely line 5 of Algorithm 1, can be carried out efficiently.
Consider a process C with three input wires A_1, A_2, A_3 and three output wires B_1, B_2, B_3, and suppose that we want to test whether (A_1, B_1) is the last tooth, which is equivalent to the independence test between A_1 and A_2A_3B_2B_3 according to Proposition 1. Testing the independence between A_1 and A_2A_3B_2B_3 can be converted to the estimation of the distance between marginal Choi states. Generally, the Hilbert-Schmidt distance between two states ρ and σ is defined as ‖ρ − σ‖_2, where ‖·‖_2 denotes the Schatten 2-norm, also known as the Frobenius norm. It is related to the trace distance by the following inequality [45]:

‖ρ − σ‖_1 ≤ √(rank(ρ − σ)) · ‖ρ − σ‖_2 ≤ √(rank(ρ) + rank(σ)) · ‖ρ − σ‖_2 .    (5)

The Hilbert-Schmidt distance between two quantum states can be estimated via SWAP tests [25]. The SWAP test uses the quantum circuit in Figure 4 to estimate Tr[ρσ] for two given quantum states ρ and σ.
If we run the circuit in Figure 4 N times and let c_+ be the number of times the outcome |+⟩ is observed, then the quantity 2c_+/N − 1 estimates Tr[ρσ] with error no more than ε with probability 1 − κ, as shown in Lemma 1.

Proof. For each run, the probability that the outcome is |+⟩ is (1 + Tr[ρσ])/2. By Hoeffding's inequality, Pr[|c_+/N − (1 + Tr[ρσ])/2| ≥ ε/2] ≤ 2e^{−Nε²/2}, so N = O(ε⁻² log(1/κ)) runs suffice.

The SWAP test is efficient in the sense that its circuit complexity is linear in the number of qubits representing the systems. For d-dimensional states ρ and σ, the controlled-SWAP gate acting on ρ and σ can be decomposed into log d controlled-SWAP gates acting on each pair of corresponding qubits of ρ and σ, and thus can be implemented with O(log d) elementary controlled-SWAP gates.

Running SWAP tests on ρ ⊗ σ, ρ ⊗ ρ, and σ ⊗ σ, we are able to obtain estimates of Tr[ρσ], Tr[ρ²] and Tr[σ²], respectively. From these estimates we can compute the Hilbert-Schmidt distance between ρ and σ according to the following equation:

‖ρ − σ‖_2² = Tr[ρ²] + Tr[σ²] − 2 Tr[ρσ] .

Coming back to the independence tests: by applying SWAP tests to the marginal Choi states C_{A_1A_2A_3B_2B_3} and C_{A_1} ⊗ C_{A_2A_3B_2B_3}, we can estimate their Hilbert-Schmidt distance. Using Eq. (5), we can bound their trace distance, obtain a bound on χ_1(A_1; A_2A_3B_2B_3)_C, determine the independence between A_1 and A_2A_3B_2B_3, and decide whether (A_1, B_1) is the last tooth. This procedure is summarized in Function 2, and completes Algorithm 1. The choices of ε and δ are made in the detailed error analysis in Supplementary Note 2. To ensure correctness and efficiency, we further need Eq. (5) to provide a useful bound, which requires upper bounds on the ranks of the marginal Choi states. Such upper bounds can be obtained from Assumption 1, and the details are in Supplementary Note 2.
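The estimation procedure above can be simulated classically. The following sketch (plain numpy; the states, seed, and run counts are arbitrary illustrative choices, not from the paper) samples the |+⟩ statistics of the SWAP test to estimate Tr[ρσ], combines three such estimates into the Hilbert-Schmidt distance, and checks the rank-based inequality of Eq. (5):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_state(d, rank):
    """Random density matrix of the given rank (arbitrary test data)."""
    g = rng.normal(size=(d, rank)) + 1j * rng.normal(size=(d, rank))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def swap_test_estimate(rho, sigma, n_runs):
    """Simulate n_runs SWAP tests: outcome |+> occurs with probability
    (1 + Tr[rho sigma]) / 2; the estimate of Tr[rho sigma] is 2 c_+ / N - 1."""
    p_plus = (1 + np.trace(rho @ sigma).real) / 2
    c_plus = rng.binomial(n_runs, p_plus)
    return 2 * c_plus / n_runs - 1

def hs_distance_estimate(rho, sigma, n_runs):
    """||rho - sigma||_2 via Tr[rho^2] + Tr[sigma^2] - 2 Tr[rho sigma]."""
    t_rs = swap_test_estimate(rho, sigma, n_runs)
    t_rr = swap_test_estimate(rho, rho, n_runs)
    t_ss = swap_test_estimate(sigma, sigma, n_runs)
    return np.sqrt(max(t_rr + t_ss - 2 * t_rs, 0.0))

rho, sigma = random_state(4, 2), random_state(4, 2)
exact = np.linalg.norm(rho - sigma)            # exact Hilbert-Schmidt distance
est = hs_distance_estimate(rho, sigma, 200_000)

# Rank-based bound of Eq. (5): trace norm <= sqrt(rank) * Hilbert-Schmidt norm.
X = rho - sigma
tr_norm = np.abs(np.linalg.eigvalsh(X)).sum()  # Schatten 1-norm
assert tr_norm <= np.sqrt(np.linalg.matrix_rank(X)) * exact + 1e-9
print(round(exact, 3), round(est, 3))
```

As in the Hoeffding bound above, the statistical error of each overlap estimate shrinks as O(1/√N), so with N = 200 000 runs the two printed values agree to a few decimal places with high probability.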
We have shown the possibility of using the SWAP test as an efficient test of independence. The SWAP test could in principle be replaced by any algorithm for estimating the Hilbert-Schmidt distance, with possibly lower sample complexity [46]. In a quantum network scenario, causal unravelling may be performed by multiple parties, each of which has access to one input system or one output system. A bonus of the SWAP test is that, since a controlled-SWAP gate for two multipartite states can be decomposed into controlled-SWAP gates on the local systems of each party, only the control qubit needs to be transferred from one party to another in each round of the test. Additionally, the control qubit is measured after each round, and therefore the parties do not need to carry quantum memories over subsequent rounds.
Testing the independence between all input-output pairs. In this section, we show the algorithm that tests the independence between every pair of input and output wires, resulting in a Boolean matrix ind_{ij}. The first step is to collect statistics of the channel C. When collecting data, we need to ensure that the information encoded in the quantum data is preserved, for which purpose we use informationally complete POVMs [47]. The elements of such a POVM on a Hilbert space H form a spanning set of L(H), and the number of elements can be chosen to be (dim H)².
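For concreteness, a standard example of an informationally complete POVM on a qubit (dim H = 2), with exactly (dim H)² = 4 elements, is the tetrahedral SIC-POVM. The check below (plain numpy, our own illustration rather than code from the paper) verifies that the four effects form a valid POVM and span L(H):

```python
import numpy as np

# Bloch vectors of the four tetrahedron states.
vecs = np.array([[0.0, 0.0, 1.0],
                 [2 * np.sqrt(2) / 3, 0.0, -1 / 3],
                 [-np.sqrt(2) / 3,  np.sqrt(2 / 3), -1 / 3],
                 [-np.sqrt(2) / 3, -np.sqrt(2 / 3), -1 / 3]])

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)

# Effects P_alpha = (I + v . sigma) / 4: subnormalized tetrahedron projectors.
effects = [(I2 + v[0] * X + v[1] * Y + v[2] * Z) / 4 for v in vecs]

assert np.allclose(sum(effects), I2)     # the effects sum to the identity
M = np.array([e.flatten() for e in effects])
assert np.linalg.matrix_rank(M) == 4     # the effects span L(H)
print("tetrahedral SIC-POVM is informationally complete")
```

Because the vectorized effects have full rank 4, the outcome statistics of this POVM determine the qubit state uniquely, which is exactly the property used in the data-collection step below.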
We pick an informationally complete POVM {P^{A_i}_α} for each input wire A_i and an informationally complete POVM {Q^{B_j}_β} for each output wire B_j. We pick N to be the number of queries to the channel C. In each query, we measure the Choi state of C with the product POVM obtained by combining the local POVMs of all wires (Eq. (8)). This POVM can be realized with local measurements on each wire. We record the tuple of local outcomes of the k-th query, and write the outcomes of all N queries in a matrix, as shown in Figure 5.
FIG. 5. Matrix of measurement outcomes. Each row corresponds to a query to the process, and each column corresponds to an input or output wire. When we investigate A_i and B_j, we only look at the corresponding columns of the matrix and check the independence from the data in these columns.
This matrix will be used to determine the independence between every pair of input and output wires. When we investigate A_i and B_j, we only look at the corresponding columns of the matrix and check the independence from the data in these columns. Since we are using informationally complete POVMs, the independence between the two columns of the matrix is equivalent to the independence between the input A_i and the output B_j. With this idea, we will be able to compute a Boolean matrix ind_{ij} such that ind_{ij} = true if and only if A_i and B_j are independent, up to an error related to N, the number of queries.
One may notice that the POVM in Eq. (8) is enough for a full process tomography of C. This is true if we pick an exponentially large N, as large as the square of the product of all input and output dimensions. However, to compute ind_{ij} with a small error, N does not need to be this large. We show that N can be chosen to grow only logarithmically with n, far less than what a full process tomography requires.
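The column-wise check can be sketched with purely classical post-processing. The toy version below (hypothetical data and threshold; the paper's Function 4 uses the χ_1 estimator of Function 3 instead of this l1 statistic) compares the empirical joint distribution of two outcome columns with the product of their empirical marginals:

```python
import numpy as np

rng = np.random.default_rng(1)

def column_independence(col_a, col_b, threshold):
    """Declare two outcome columns independent when the l1 distance between
    their empirical joint distribution and the product of their empirical
    marginals falls below the threshold."""
    n = len(col_a)
    vals_a, vals_b = np.unique(col_a), np.unique(col_b)
    joint = np.zeros((len(vals_a), len(vals_b)))
    for a, b in zip(col_a, col_b):
        joint[np.searchsorted(vals_a, a), np.searchsorted(vals_b, b)] += 1
    joint /= n
    marg_a, marg_b = joint.sum(axis=1), joint.sum(axis=0)
    stat = np.abs(joint - np.outer(marg_a, marg_b)).sum()
    return stat <= threshold   # True: the columns look independent

# Hypothetical outcome columns: b1 depends on a1, b2 does not.
a1 = rng.integers(0, 4, size=5000)
b1 = (a1 + rng.integers(0, 2, size=5000)) % 4
b2 = rng.integers(0, 4, size=5000)
print(column_independence(a1, b1, 0.1), column_independence(a1, b2, 0.1))
```

Because the statistical fluctuation of each empirical probability is O(1/√N), a number of rows logarithmic in the number of wire pairs (at fixed error) already separates the two cases, in line with the scaling stated above.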
Lemma 2. Given a channel C with input wires A_1, …, A_n and output wires B_1, …, B_n, there exists an algorithm that satisfies the following conditions:
1. With probability 1 − n²κ_0, the algorithm produces an output ind_{ij} satisfying: if ind_{ij} = false, then χ_1(A_i; B_j)_C > χ_− − ε_0, and if ind_{ij} = true, then χ_1(A_i; B_j)_C ≤ χ_− + ε_0, where χ_− > 0 is a threshold that can be freely chosen.
2. The number of queries to C grows only logarithmically with n.
The detailed algorithm, the precise query count, and the proof are in Supplementary Note 3.
The Boolean matrix ind_{ij} gives a partial order on the wires: if ind_{ij} = false, then A_i must be before B_j. This partial order may not produce the full ordering of the wires, but it provides a convenient initial guess for the causal unravelling, since its complexity is much lower than that of the general Algorithm 1 for large n. In the Results, we show that under Assumption 3 or 4, this partial order is enough to infer the full order.
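A linear extension consistent with this partial order can be extracted with a standard topological sort. The sketch below (Python's stdlib graphlib; the wire labels are illustrative) encodes "ind[i][j] = false implies A_i before B_j" as graph edges:

```python
from graphlib import TopologicalSorter

def wire_order(ind):
    """One linear extension of the partial order 'A_i before B_j whenever
    ind[i][j] is False', over wire labels A0..A(n-1), B0..B(n-1)."""
    n = len(ind)
    ts = TopologicalSorter()
    for i in range(n):
        ts.add(f"A{i}")
        ts.add(f"B{i}")
    for i in range(n):
        for j in range(n):
            if not ind[i][j]:
                ts.add(f"B{j}", f"A{i}")   # A_i is a predecessor of B_j
    return list(ts.static_order())

# ind for a process where A_0 signals to B_1 and A_1 signals to B_0:
ind = [[True, False],
       [False, True]]
order = wire_order(ind)
assert order.index("A0") < order.index("B1")
assert order.index("A1") < order.index("B0")
print(order)
```

When the partial order does not fix a unique sequence, any linear extension returned here is only a candidate, which is why the general case still needs the independence tests of Algorithm 1.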
Furthermore, the matrix of outcomes (Figure 5) is a conversion of the quantum process into classical data, making it possible to adopt classical causal discovery algorithms [1,3-5]. For example, one could use algorithms based on graphical models, and try to find a graph of the form of Figure 3 that best fits the observed data in Figure 5.
We assume that the process C is already in its standard form, with all input and output wires having dimension d_A, and thus r_C = rank(C).

a. Accuracy of independence tests
We first consider the accuracy of the independence test implemented in Function 2, in an arbitrary iteration of Algorithm 1. By Lemma 1, each estimate p_i (i = 1, 2, 3) is ε-close to its true value, so the quantity p_1 + p_2 − 2p_3 estimates the squared Hilbert-Schmidt distance between C_1 and C_2 within error 4ε (Eq. (11)). Next, using Eq. (5), the trace distance between C_1 and C_2 is bounded in terms of their Hilbert-Schmidt distance and their ranks (Eq. (12)). The rank of C_1 can be bounded in terms of rank(C) (Eq. (13)). Therefore, by Eq. (12), χ_1(A_x; A_{≠x}B_{≠y})_C is bounded by a function of p_1 + p_2 − 2p_3, ε, d_A and rank(C) (Eq. (14)). Now we consider the output of Function 2. If Function 2 returns true, then p_1 + p_2 − 2p_3 ≤ δ; from Eq. (11), the true squared Hilbert-Schmidt distance is at most δ + 4ε (Eq. (15)), and then from Eq. (14) we obtain an upper bound on χ_1(A_x; A_{≠x}B_{≠y})_C (Eq. (16)). If Function 2 returns false, then p_1 + p_2 − 2p_3 > δ, and thus the distance is bounded away from zero (Eq. (17)). Summarizing, we have the following lemma:

Lemma 3. Let C be the input channel of Function 2. With probability 1 − 3κ, Function 2 satisfies the following:
1. If Function 2 returns true, then χ_1(A_x; A_{≠x}B_{≠y})_C is upper bounded as in Eq. (18), where rank(C) is the Kraus rank of C;
2. If Function 2 returns false, then χ_1(A_x; A_{≠x}B_{≠y})_C > 0.

b. Bounding the rank during recursion

In the following, we first give some lemmas on the Kraus rank of channels, and then prove Lemma 5, which can be used in the inductive proof of Theorem 1 to upper bound the rank of C in any recursion. Throughout, A_{≠x} (B_{≠y}) denotes the set of input (output) wires excluding A_x (B_y).
Proof. According to P(k), C^(k) ∈ Comb[(A_{≠x}, B_{≠y}), (A_x, B_y)], and tracing out the last tooth (A_x, B_y) yields the updated channel C^(k−1), whose Choi state is C^(k−1). According to Lemma 4, rank(C^(k−1)) ≤ rank(C^(k)) · dim H_{B_y}/dim H_{A_x} = rank(C^(k)), where we used dim H_{A_x} = dim H_{B_y} = d_A, since we have assumed that the process is in its standard form in Assumption 1.
Now we are ready to prove Theorem 1.
Proof of Theorem 1. In Algorithm 1, using the relation T_test(n) ≤ n² + T_test(n − 1), we find that there are T_test(n) ≤ n³ independence tests, each of which invokes three SWAP tests. Each SWAP test produces an estimate within error ε with probability no less than 1 − κ, according to Lemma 1. Let κ = κ_0/(3n³); from now on, we assume that all SWAP tests are within error ε, which holds with probability no less than 1 − 3n³κ = 1 − κ_0.
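The recursion count used here can be checked directly. A minimal sketch (the base case T(1) = 0 is our own reading of the recursion: a single input-output pair is output without any test):

```python
def t_test(n):
    """Worst-case number of independence tests in Algorithm 1, from the
    recursion T(n) <= n^2 + T(n-1), with T(1) = 0 assumed."""
    return 0 if n <= 1 else n * n + t_test(n - 1)

# The bound T_test(n) <= n^3 used in the proof of Theorem 1:
for n in range(1, 100):
    assert t_test(n) <= n ** 3
print(t_test(10))  # 384
```

The closed form is the partial sum of squares, which is at most n(n + 1)(2n + 1)/6 ≤ n³ for every n ≥ 1, matching the bound in the proof.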
Let C^(n) be the initial value of C in Algorithm 1, and let C^(k), which has k input wires and k output wires, be the value of C in the (n − k + 1)-th recursive call of Algorithm 1.
Consider testing whether (A_x, B_y) is the last tooth of C^(k) with Function 2. Let S be the set of all wires of C^(k) excluding A_x and B_y. If (A_x, B_y) passes the test, namely Function 2 returns true, then Lemma 3 gives an upper bound on χ_1(A_x; S)_{C^(k)}. Now we pick δ ≤ χ_min²/(8 d_A rank(C^(k))) and ε = δ/5. Eq. (18) then indicates χ_1(A_x; S)_{C^(k)} < χ_min, and since all causal dependencies are assumed to have strength at least χ_min, this forces χ_1(A_x; S)_{C^(k)} = 0. Therefore, if Function 2 returns true, then (A_x, B_y) must be the last tooth of the comb, in some causal unravelling of C^(k). Using the notation of Lemma 5, P(k) holds for this iteration. On the other hand, if Function 2 returns false, then by Lemma 3 χ_1(A_x; S)_{C^(k)} > 0, and (A_x, B_y) does not satisfy the condition for being the last tooth of the comb.
Let r_C := rank(C^(n)). Take the definitions of P(k) and Q(k) from Lemma 5. Now we perform an induction to prove P(k) and Q(k) for every k.
First, Q(n) obviously holds according to Assumption 1. Now, we assume that Q(k) holds, and want to show that Q(k − 1) also holds.
By Q(k), we have rank(C^(k)) ≤ r_C. In any recursive call of Algorithm 1, if Q(k) holds, then we can choose δ = χ_min²/(8 d_A r_C) ≤ χ_min²/(8 d_A rank(C^(k))), and by the argument above, P(k) holds. Combining this with Lemma 5, we obtain Q(k − 1). This completes the induction, and we conclude that Q(k) and P(k) hold for every k. P(k) being true for every k shows the correctness of our algorithm.
The sample complexity is then T_sample = O(n³ · (d_A r_C/χ_min²)² · log(n/κ_0)), since each of the O(n³) independence tests runs SWAP tests with error ε = Θ(χ_min²/(d_A r_C)) and confidence κ = κ_0/(3n³), and the computational complexity is O(T_sample · n log d_A) from the realization of the SWAP tests.
Here λ_min(X) denotes the minimum eigenvalue of an operator X, and F and G are defined in Function 3. We then define the estimate of χ_1(ρ_{A,B}) as the output of Function 3. The error of this estimate is given in the following lemma, whose proof is in Supplementary Note 3 a.

Lemma 6. For any positive real number ε > 0, over the N measurement outcomes on independent copies of ρ_{AB}, with probability 1 − κ_0 the output of Function 3 estimates χ_1(ρ_{A,B}) up to an error ε_0 depending on ε, N and the chosen POVMs; that is, the inequality Pr[|χ̂_1(ρ_{A,B}) − χ_1(ρ_{A,B})| ≤ ε_0] ≥ 1 − κ_0 holds for any positive real number ε > 0. Here, Pr(C) expresses the probability that the condition C is satisfied.
In other words, to reach confidence 1 − κ_0 and error ε_0, the number of samples N grows polynomially in the local dimensions and in 1/ε_0, and only logarithmically in 1/κ_0. One can also choose {P_α} and {Q_β} to be SIC-POVMs [48], in which case the number of samples improves further. Function 3 can be used on quantum channels to discover the correlation between input and output wires. To characterize a quantum channel, we could transform it into its Choi state by feeding the input with half of a maximally entangled state, and apply Function 3 on the Choi state. To save the cost of creating and maintaining entangled states, we could use Function 3 in an alternative way. Let ψ_α := P_α^T/Tr[P_α]. One samples ψ_α from {ψ_α} with probability Tr[P_α]/d_A, applies the quantum channel C : L(H_A) → L(H_B) to ψ_α, and measures the output with {Q_β}. This process generates the same statistics for (α, β), according to the following equation:

Tr[P_α]/d_A · Tr[Q_β C(ψ_α)] = Tr[(P_α ⊗ Q_β) C] ,    (25)

where C is the Choi state of C. The left-hand side of Eq. (25) is the probability of sampling ψ_α times the probability of outcome β on the output conditioned on the input being ψ_α, and the right-hand side of Eq. (25) is the probability of outcome (α, β) if one measures the Choi state C with the POVM {P_α ⊗ Q_β}. Now we adopt Function 3 for causal unravelling. By testing the correlation between every pair of input and output wires, one is able to infer causal unravellings: if the output B_j is correlated with the input A_i, then B_j must be after A_i. In this section, we study cases where such reasoning yields the full causal unravelling.
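The statistical equivalence of Eq. (25) can be verified numerically. The sketch below (plain numpy; the Kraus operators, the effects, and the output ⊗ input ordering of the Choi matrix are our own conventions, which fix where the transpose in ψ_α = P_α^T/Tr[P_α] lands) checks that sampling ψ_α and measuring the output reproduces the Choi-state statistics:

```python
import numpy as np

rng = np.random.default_rng(2)
dA, dB = 2, 3

# A random CPTP map given by Kraus operators (arbitrary test channel).
K = [rng.normal(size=(dB, dA)) + 1j * rng.normal(size=(dB, dA)) for _ in range(4)]
S = sum(k.conj().T @ k for k in K)
M = np.linalg.inv(np.linalg.cholesky(S).conj().T)   # enforce sum_k K†K = I
K = [k @ M for k in K]

def channel(rho):
    return sum(k @ rho @ k.conj().T for k in K)

# Choi state (output ⊗ input ordering): C = (1/dA) sum_{jl} C(|j><l|) ⊗ |j><l|
choi = np.zeros((dB * dA, dB * dA), dtype=complex)
for j in range(dA):
    for l in range(dA):
        E = np.zeros((dA, dA), dtype=complex)
        E[j, l] = 1
        choi += np.kron(channel(E), E) / dA

def rand_effect(d):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return g @ g.conj().T             # a positive operator (POVM effect)

P, Q = rand_effect(dA), rand_effect(dB)
psi = P.T / np.trace(P).real          # psi_alpha = P_alpha^T / Tr[P_alpha]

lhs = np.trace(P).real / dA * np.trace(Q @ channel(psi)).real
rhs = np.trace(np.kron(Q, P) @ choi).real
assert abs(lhs - rhs) < 1e-9          # the two statistics of Eq. (25) coincide
print(round(lhs, 6))
```

The agreement holds for any channel and any pair of effects, which is why the entanglement-free sampling scheme above can replace the preparation of the full Choi state.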
We introduce an algorithm that discovers the correlations between every pair of input and output wires. The idea is to first collect all the required classical data, and then deduce all the correlations from the statistics. During this procedure, the data are reused for different input-output pairs, leading to fewer queries to the channel C.

To understand Function 4, we observe that, for each pair (i, j), the recorded outcomes are sampled from the same distribution as the outcomes of measuring C_{A_i,B_j} with {P^i_α ⊗ Q^j_β}, and that estimatechi1 produces an estimate of χ_1(A_i; B_j)_C. Conditioned on all n² calls to estimatechi1 succeeding, which has probability no less than 1 − n²κ_0, all the estimates of χ_1(A_i; B_j)_C have error no greater than ε_0. We summarize these observations in the following lemma:

Lemma 7. With probability 1 − n²κ_0, Function 4 produces an output satisfying:
1. if ind_{i,j} = false, then χ_1(A_i; B_j)_C > χ_− − ε_0;
2. if ind_{i,j} = true, then χ_1(A_i; B_j)_C ≤ χ_− + ε_0.

Over the N measurement outcomes, p̂_αβ is the average of independent Bernoulli random variables, and has mean p_αβ. The same is true for p̂^A_α and p̂^B_β, whose means are p^A_α and p^B_β, respectively.
Here we consider the case where no assumption on χ_min is made, and establish correctness and efficiency (Theorem 3) based solely on Assumption 4.
Proof of Theorem 3. Picking χ_− = ε_0, according to Lemma 7, with probability 1 − n²κ_0:
1. if ind_{i,j} = false, then χ_1(A_i; B_j)_C > 0;
2. if ind_{i,j} = true, then χ_1(A_i; B_j)_C ≤ 2χ_−.
By Assumption 4, for every input A_i there is at most one output B_j such that χ_1(A_i; B_j)_C > 0, and therefore there is at most one j satisfying ind_{i,j} = false. Let A_matched := {A_i | ∃j, ind_{i,j} = false} and B_matched := {B_{π(i)} | A_i ∈ A_matched} be the inputs and outputs that are matched up to line 12 of Algorithm 3.
Define the channel D by its action on product states as the tensor product of the single-wire channels C_{A_i→B_{π(i)}} for the matched pairs and of the constant channels C̄_{A_i B_{π(i)}} for the unmatched pairs. Clearly, D ∈ Comb[(A_1, B_{π(1)}), …, (A_n, B_{π(n)})], since it is a tensor product of channels from A_i to B_{π(i)}. Now we show that the difference between C and D is small.
According to Assumption 4, C = ⊗_{i=1}^n C_{A_i→B_{π′(i)}} = ⊗_{i∈A_matched} C_{A_i→B_{π′(i)}} ⊗ ⊗_{i∉A_matched} C_{A_i→B_{π′(i)}}. For the channel C and each i ∈ A_matched, A_i is correlated with B_{π′(i)} but not with any other output, and by the way π is computed, one must have π(i) = π′(i). Therefore, the channels C and D agree on all matched pairs, and their difference is supported on the unmatched wires only.

We prove Theorem 4 in the following subsections. In Supplementary Note 6 a, we analyze the accuracy of the independence tests in the approximate case. In Supplementary Note 6 b, we analyze how the error in each recursion step affects the final error of the algorithm. The main proof of Theorem 4 is in Supplementary Note 6 c.
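The matching step referred to above (line 12 of Algorithm 3) can be sketched as follows; the function name and the arbitrary completion of unmatched pairs are our own illustrative choices, not the paper's pseudocode:

```python
def match_wires(ind):
    """Build a matching pi from the Boolean matrix ind, where
    ind[i][j] == True means A_i and B_j were found independent.
    Under Assumption 4, each row of ind has at most one False entry."""
    n = len(ind)
    pi = [None] * n
    for i in range(n):
        deps = [j for j in range(n) if not ind[i][j]]
        assert len(deps) <= 1, "Assumption 4 violated"
        if deps:
            pi[i] = deps[0]            # A_i is matched to its dependent output
    # Unmatched inputs are paired with the remaining outputs arbitrarily,
    # mirroring the constant channels used for the unmatched pairs.
    free = [j for j in range(n) if j not in pi]
    for i in range(n):
        if pi[i] is None:
            pi[i] = free.pop()
    return pi

ind = [[True, False, True],   # A_0 depends on B_1
       [True, True, True],    # A_1 is uncorrelated with every output
       [False, True, True]]   # A_2 depends on B_0
print(match_wires(ind))  # [1, 2, 0]
```

Any choice for the unmatched pairs yields the same comb up to the constant channels, which is exactly why the difference between C and D is confined to the unmatched wires.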

a. Accuracy of independence tests
The following lemma is useful for converting the Hilbert-Schmidt distance to the trace distance in the approximate case.

Choi state [26] of the process C, defined as the state C := (C ⊗ I)(|I⟩⟨I|)/d_in, where d_in is the total dimension of all the input systems, and |I⟩ := Σ_{j=1}^{d_in} |j⟩ ⊗ |j⟩ is the canonical (unnormalized) maximally entangled state. The rank of the Choi state C is called the Kraus rank of the process C. Since a unitary evolution, namely a process without information loss, has Kraus rank equal to one, the Kraus rank can be interpreted as a measure of the degree of information loss. The approximate case is analyzed in terms of the diamond norm ‖C − D‖_⋄ := max_ρ ‖((C − D) ⊗ I)(ρ)‖_1, where the maximum is over all states ρ of the input extended by a reference system, and I is the identity map. The diamond norm measures the maximum probability of distinguishing two channels, and is tighter than the trace distance, since ‖C − D‖_1 ≤ ‖C − D‖_⋄ for all channels C and D with Choi states C and D. The error bound is stated in the following theorem, whose proof is in Supplementary Note 5.
The quantity χ_1(A_1; A_2A_3B_2B_3)_C is the distance between marginal Choi states. Each copy of C_{A_1A_2A_3B_2B_3} or C_{A_2A_3B_2B_3} can be prepared with one use of the process C. Note that C_{A_1} equals I_{A_1}/d_{A_1} by the definition of a CPTP map. Given the ability to prepare the marginal Choi states, we now consider the estimation of their distance, which gives the value of χ_1(A_1; A_2A_3B_2B_3)_C. In the following, we first discuss the estimation of another distance measure, the Hilbert-Schmidt distance, and then use it to bound the trace distance used by χ_1.

Function 2: checklast(C, A_x, B_y, ε, δ, κ)
Input: A quantum channel C, input wire A_x, output wire B_y, error thresholds ε, δ, confidence κ
Output: Whether (A_x, B_y) is the last tooth of C
1 Prepare the circuits that generate the states C_1 := Tr_{B_y}[C] and C_2 := I_{A_x}/d_{A_x} ⊗ Tr_{A_x B_y}[C];

Proof (accuracy of Function 2). Define C_1 := Tr_{B_y}[C] and C_2 := I_{A_x}/d_{A_x} ⊗ Tr_{A_x B_y}[C] as in Function 2. By Lemma 1, each of the estimates p_i (i = 1, 2, 3) is ε-close to its true value with probability 1 − κ. By the union bound, with probability at least 1 − 3κ, all three estimates are ε-close to their corresponding true values, and therefore the bounds of Lemma 3 follow.

Function 4: independence(C, N, χ_−)
Input: Black-box access to a quantum comb C with input wires A_1, …, A_n and output wires B_1, …, B_n, the number of queries N, and a threshold χ_−
Output: An array ind, where ind_{i,j} = true indicates that A_i and B_j are independent
1 In each of the N queries, measure each input wire with the corresponding {P^i_α} and each output wire with the corresponding {Q^j_β}, and record the outcomes;

where C_{A_i→B_{π(i)}}(ρ_{A_i}) := Tr_{B_{≠π(i)}}[C(ρ_{A_i} ⊗ I_{A_{≠i}}/d_{A_{≠i}})] is the channel C restricted to input A_i and output B_{π(i)}, and C̄_{A_i B_{π(i)}}(ρ_{A_i}) := Tr[ρ_{A_i}] C_{B_{π(i)}} is a constant channel that outputs C_{B_{π(i)}}, the marginal Choi state of C on system B_{π(i)}.
Algorithm 1: Efficient quantum causal unravelling algorithm
Input: Black-box access to a quantum channel C
Output: A causal unravelling of C
1 if C has only one input wire A_x and one output wire B_y then
2   Output (A_x, B_y) and exit;
3 end
4 foreach input-output pair (A_x, B_y) of C do
5   if (A_x, B_y) is the last tooth then   // Done by checking whether A_x and A_{≠x}B_{≠y} are independent, using the quantum circuit described in the Methods
6     Let C_{A_{≠x},B_{≠y}}(ρ) := Tr_{B_y}[C(ρ ⊗ τ_{A_x})], where τ_{A_x} is an arbitrary state on system A_x;   // Remove A_x and B_y from C
7     Recursively run this algorithm on the reduced channel C_{A_{≠x},B_{≠y}};
8     Output (A_x, B_y) and exit;   // Append (A_x, B_y) to the end of the output
9   end
10 end

Function 2 (continued): Let P (Q) be the set of all input (output) wires of C; if the estimated distance is larger than δ, return false; otherwise return true;   // (A_x, B_y) is the last tooth

In the proof of Lemma 6, Hoeffding's inequality guarantees that, with probability no less than 1 − 2(d_A² d_B² + d_A² + d_B²) e^{−2ε_1² N} = 1 − κ(ε), all (d_A² d_B² + d_A² + d_B²) of the following inequalities hold simultaneously: |p_αβ − p̂_αβ| ≤ ε_1 for all α, β, together with the analogous bounds for the marginals p̂^A_α and p̂^B_β.    (33)